The value of Benchmarking. Still. Always

February 11, 2023
dev architecting benchmark

Benchmarking code is important because you learn something you can always use. Follow me on this short journey through "is there something better?"

Recently I had a need to get two items from a list of numbers. Given a value, I need the items before or after. It is guaranteed that there will always be a before and after. 

I went ahead and used mu skills and came up with something. And after I did, I wondered to myself:

"Self, is there a better way?"

And I didn't answer myself. Which is a good thing, because otherwise…

I then did what every great developer does: I googled with Bing. Which led to a few places and ultimately a StackOverflow question and answer. believe it or not, there weren't many posts for this particular need. I suppose it is a bit esoteric and easy enough to solve that no one really asks.

Though there were plenty of questions around “how I get the index of…” so I am surprised I didn't get more hits - getting an index seems like a much simpler problem to solve on your own.

Stay on topic.

I found a similarly asked question that our boy over the pond, Jon Skeet himself answered. And it was a whole lot more complicated than my way. Being the author of a c# book I suppose I cannot say I am surprised. Afterall, I went the lazy dev way and used LINQ. Which let's be honest, this si why I asked Self could there be a better way. Deep down I know that using LINQ is bad for me and will eventually be the bottleneck in all .NET programs. Mark my words. (I didn't mention I used LINQ so that you wouldn't run off and laugh too soon and I wanted to intrigue you with Skeet's name). Case in point: you're still here.

So, what was Jon's way?

How to get the item before current and after current in a dictionary with Linq / C#? - Stack Overflow

public IEnumerable<Tuple<T, T, T>> WithNextAndPrevious<T>
    (this IEnumerable<T> source)
{
    // Actually yield "the previous two" as well as the current one - this
    // is easier to implement than "previous and next" but they're equivalent
    using (IEnumerator<T> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            yield break;
        }
        T lastButOne = iterator.Current;
        if (!iterator.MoveNext())
        {
            yield break;
        }
        T previous = iterator.Current;
        while (iterator.MoveNext())
        {
            T current = iterator.Current;
            yield return Tuple.Create(lastButOne, previous, current);
            lastButOne = previous;
            previous = current;
        }
    }        
}

OK, that's some code. I haven't used yield in an iterator in soooo long. I forgot it was a thing! And yes, the post is 12 years old as of today, but like I said, not much on this topic.

What was my code? Fine:

var lower = list.Where(x => val >= x).OrderByDescending(x => x).FirstOrDefault();
var higher = list.Where(x => val <= x).OrderBy(x => x).FirstOrDefault(); 

Let's be honest, that's pretty clean looking and so easy to understand. But then I think, hmm.. Jon is actually just iterating over the list - maybe I shouldn't be afraid of that.

Then I remembered that FOR i is pretty damn fast - and these are small lists of data…let's make a new version:

    public (double min, double max) WhileI(IEnumerable<double> source, double myVal)
    {
        double before = 0;
        double after = 0;
        foreach (var x in source)
        {
            if (x <= myVal && x > before)
            {
                before = x;
            }
            if (x >= myVal && x <= after)
            {
                after = x;
            }
        }

        return (before, after);
    }

That looks pretty good. Easy to understand which is nice. Jon's version is a bit messier (albeit probably better).

Now what? Remember the title? Let's find out.

I set up Benchmark.NET and ran 1000s of tests. Literally. I did 5, 10, 100, 1000, 10000 - to see if I got anything different - 10K tests take a bit.

I tested all three version. And here are the results:

LINQ: no, surprise: slowest and memoriest (which means it uses the most allocations at 1344B.

Jon's/Iterator & My/FOR i version both use the same memory (912B).

The FOR i version is a little faster (most of the time) by about 10-30 ns - until we do 10K at which point it showed a bigger delta

So, I guess it doesn't matter which of the two you prefer. I'll go with the FOR i version because it's clearer to me. And when we're talking about nanoseconds (a billionth of a second) do we really care?

Summary

Benchmark that shit. And don't forget how fast FOR loops are - they are insanely fast.

And if you were reading this and thinking “don't pre-optimize” or “don't worry until you have a problem" go read this. (I thought I had another older post but I can't find it).

One of the benchmarks: