The value of Benchmarking. Still. Always

Benchmarking code is important because you learn something you can always use. Follow me on this short journey through "is there something better?"

Recently I had a need to get two items from a list of numbers. Given a value, I need the items before or after. It is guaranteed that there will always be a before and after.

I went ahead and used mu skills and came up with something. And after I did, I wondered to myself:

"Self, is there a better way?"

And I didn't answer myself. Which is a good thing, because otherwise…

I then did what every great developer does: I googled with Bing. Which led to a few places and ultimately a StackOverflow question and answer. believe it or not, there weren't many posts for this particular need. I suppose it is a bit esoteric and easy enough to solve that no one really asks.

Though there were plenty of questions around “how I get the index of…” so I am surprised I didn't get more hits - getting an index seems like a much simpler problem to solve on your own.

Stay on topic.

I found a similarly asked question that our boy over the pond, Jon Skeet himself answered. And it was a whole lot more complicated than my way. Being the author of a c# book I suppose I cannot say I am surprised. Afterall, I went the lazy dev way and used LINQ. Which let's be honest, this si why I asked Self could there be a better way. Deep down I know that using LINQ is bad for me and will eventually be the bottleneck in all .NET programs. Mark my words. (I didn't mention I used LINQ so that you wouldn't run off and laugh too soon and I wanted to intrigue you with Skeet's name). Case in point: you're still here.

So, what was Jon's way?

How to get the item before current and after current in a dictionary with Linq / C#? - Stack Overflow

public IEnumerable<Tuple<T, T, T>> WithNextAndPrevious<T>
    (this IEnumerable<T> source)
{
    // Actually yield "the previous two" as well as the current one - this
    // is easier to implement than "previous and next" but they're equivalent
    using (IEnumerator<T> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            yield break;
        }
        T lastButOne = iterator.Current;
        if (!iterator.MoveNext())
        {
            yield break;
        }
        T previous = iterator.Current;
        while (iterator.MoveNext())
        {
            T current = iterator.Current;
            yield return Tuple.Create(lastButOne, previous, current);
            lastButOne = previous;
            previous = current;
        }
    }        
}

OK, that's some code. I haven't used yield in an iterator in soooo long. I forgot it was a thing! And yes, the post is 12 years old as of today, but like I said, not much on this topic.

What was my code? Fine:

var lower = list.Where(x => val >= x).OrderByDescending(x => x).FirstOrDefault();
var higher = list.Where(x => val <= x).OrderBy(x => x).FirstOrDefault();

Let's be honest, that's pretty clean looking and so easy to understand. But then I think, hmm.. Jon is actually just iterating over the list - maybe I shouldn't be afraid of that.

Then I remembered that FOR i is pretty damn fast - and these are small lists of data…let's make a new version:

    public (double min, double max) WhileI(IEnumerable<double> source, double myVal)
    {
        double before = 0;
        double after = 0;
        foreach (var x in source)
        {
            if (x <= myVal && x > before)
            {
                before = x;
            }
            if (x >= myVal && x <= after)
            {
                after = x;
            }
        }

        return (before, after);
    }

That looks pretty good. Easy to understand which is nice. Jon's version is a bit messier (albeit probably better).

Now what? Remember the title? Let's find out.

I set up Benchmark.NET and ran 1000s of tests. Literally. I did 5, 10, 100, 1000, 10000 - to see if I got anything different - 10K tests take a bit.

I tested all three version. And here are the results:

LINQ: no, surprise: slowest and memoriest (which means it uses the most allocations at 1344B.

Jon's/Iterator & My/FOR i version both use the same memory (912B).

The FOR i version is a little faster (most of the time) by about 10-30 ns - until we do 10K at which point it showed a bigger delta

So, I guess it doesn't matter which of the two you prefer. I'll go with the FOR i version because it's clearer to me. And when we're talking about nanoseconds (a billionth of a second) do we really care?

Summary

Benchmark that shit. And don't forget how fast FOR loops are - they are insanely fast.

And if you were reading this and thinking “don't pre-optimize” or “don't worry until you have a problem" go read this. (I thought I had another older post but I can't find it).

One of the benchmarks:

The value of Benchmarking. Still. Always

February 11, 2023