Very Fast HashMap in C++: Hopscotch & Robin Hood Hashing (Part 1)

A while ago I spent a significant amount of time researching and implementing a fast Hopscotch hash table for C++. My current source code can be found in my GitHub repository at martinus/robin-hood-hashing. After spending some time optimizing, I am mostly happy with the results: insertion is much faster than std::unordered_map, and it uses far less memory.


In this simple benchmark I measured the time while sequentially adding entries (int -> int) to the hashmap. As the hash function, I am using std::hash:

Whenever a jump occurs in the graph, the hashmap has become too full and is reallocating. Interestingly, std::unordered_map has to allocate memory for every inserted element (since it is a linked list of nodes), while the Hopscotch hash table allocates once and reuses that memory.


I am using Visual Studio 2015 Update 3, 64 bit, on an Intel i5-4670 @ 3.4 GHz. Here is a comparison table (best values bold):

| Hashtable | Time, insert 30M integers | Memory, insert 30M integers | Time, query 30M existing | Time, query 30M nonexisting |
|---|---|---|---|---|
| std::unordered_map | 11.56 sec (100%) | 1493 MB (100%) | 1.84 sec (100%) | 1.92 sec (100%) |
| Hopscotch | **3.44 sec (30%)** | **321 MB (22%)** | **1.50 sec (81%)** | **1.48 sec (77%)** |

Insertion is really fast and much more memory efficient, and query time is also a bit faster than std::unordered_map, even though we need to check a hop bitmap covering 32 buckets.

Robin Hood Hashing vs. Hopscotch

After contemplating for a while, I have come to the conclusion that Hopscotch is just a bad version of Robin Hood hashing. Here is my reasoning:

  1. Hopscotch’s bitmap is necessarily quite sparse. In a perfectly full hashmap, where each bucket contains an entry that hashed directly to it, just a single bit of the 32 hop bits will be set to 1. Wikipedia has a nice illustration of this: hopscotch-wiki-example
  2. Is there a better way to represent the hop bitmap? One way would be a linked list of offsets. Unfortunately, linked lists are not very cache friendly.
  3. To store the linked list more efficiently, we can put it into an array. How about just storing the offsets next to the buckets? The example above could then be stored like this: hopscotch_as_offsetarray
    Will this work? Surprisingly, yes! When querying for an element, we just need to sequentially check the offsets. Say we want to check if ‘b’ is in the map:

    1. We hash ‘b’ and get index 6.
    2. The offset at index 6 is 0: that means the element at index 6 actually belongs there. It’s not ‘b’, so we have to continue checking.
    3. At index 7 we would expect an offset of 1 if it contained an element that hashed to index 6. Since the bucket is taken by something else, it will have a different offset, so we can be certain this is not the bucket we are looking for.
    4. At index 8 we expect an offset of 2 if it contains an element that hashed to index 6. It is 2, so we check the key, and bam! We found the element.

    If we query an element that is not in the map, we need to check at most hop-size offsets.

  4. All this sounds increasingly similar to Robin Hood hashing. Once we have the offset, we can use it as cleverly as Robin Hood hashing does: it enables us to use the cool “take from the rich” algorithm. So now we can ditch the hop size limit, and just keep swapping elements exactly like Robin Hood hashing does.
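The offset-array lookup and the “take from the rich” insertion described above can be sketched like this (an illustrative toy, not the author’s actual implementation: fixed capacity, no resizing, distinct keys assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Each bucket stores, next to its key/value, how far it sits from its
// ideal index (-1 marks an empty bucket).
struct Table {
    struct Bucket {
        int8_t offset = -1;
        int key = 0;
        int value = 0;
    };
    std::vector<Bucket> buckets;
    explicit Table(std::size_t n) : buckets(n) {}

    std::size_t idealIndex(int key) const {
        return std::hash<int>{}(key) % buckets.size();
    }

    // Robin Hood insertion: whenever the element being placed is further
    // from its ideal bucket than the resident element, swap them ("take
    // from the rich") and continue placing the displaced element.
    void insert(int key, int value) {
        std::size_t idx = idealIndex(key);
        int8_t dist = 0;
        while (true) {
            Bucket& b = buckets[idx];
            if (b.offset < 0) {          // empty bucket: place here
                b = Bucket{dist, key, value};
                return;
            }
            if (b.offset < dist) {       // resident is "richer": swap
                std::swap(dist, b.offset);
                std::swap(key, b.key);
                std::swap(value, b.value);
            }
            idx = (idx + 1) % buckets.size();
            ++dist;
        }
    }

    // Lookup as in the walkthrough above: probe forward from the ideal
    // index; once a bucket's stored offset drops below our probe distance
    // (or the bucket is empty), the key cannot be in the table.
    std::optional<int> find(int key) const {
        std::size_t idx = idealIndex(key);
        int8_t dist = 0;
        while (true) {
            const Bucket& b = buckets[idx];
            if (b.offset < dist) return std::nullopt;  // also covers empty (-1)
            if (b.offset == dist && b.key == key) return b.value;
            idx = (idx + 1) % buckets.size();
            ++dist;
        }
    }
};
```

Note how the stored offset serves double duty: it terminates unsuccessful lookups early and drives the displacement decision during insertion.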

What now?

With these insights, I believe I have a great idea for a highly efficient variant of the Robin Hood hash table that takes some ideas from the Hopscotch implementation. In my next post, part 2, I will explain these ideas and (hopefully) have a fantastically fast and memory-efficient hash table in my repository.

Analyzing Keto Calculator Demographics

Recently my Keto Calculator has grown huge. So big, that in January 2016 alone I’ve had 222,781 page views. That is a record for my site, and I have no reason to believe that it will slow down anytime soon. In my quest to make all the visitors as happy as possible, I’ve done quite a bit of analysis of my visitors to find out what I need to do with my homepage. Here is some of the data that I have gathered using Google Analytics.

Each graph is color coded: blue stands for mobile devices, orange for tablets, and green for desktops.

Where Do the Users Come From?

I have created a bar chart showing the total number of sessions for January 2016, sorted by top countries.


By far the most visitors come from the US, followed by Canada, the UK, and Australia – all English-speaking countries. The first non-English-speaking country is Germany, with only 0.69% of all sessions.

For fast access for the majority of my users, my hosting server should sit in the US. I have recently switched to T1Hosting, which is both cheaper and faster than my previous web host. Also, they responded really quickly to my questions. So far I am pretty happy with them!

Translating the calculator is currently not really useful. I am sure that translating it into German would increase the number of users from Germany, but it does not look like the effort would really be worth it. What I have done is add an automatic translation feature at the top of the calculator to provide some usability for non-English speakers. Unfortunately, the translation quality is quite bad.

Almost half of my users come from mobile devices! To be precise: 48.8% mobile, 41.7% desktop, 9.5% tablets. That means it is absolutely essential that the keto calculator works well both on mobile and on desktop. It took me quite a while to make it look good on mobile, and now I’m happy with the result. I did most of the testing with Chrome DevTools, which is absolutely fantastic!

How Old Are My Visitors?


Most visitors are 25-34 years old. I guess that’s a typical result for a website about nutrition. Interestingly, the graph does not show any data for visitors younger than 18. I have updated the calculator with some safety checks so that even when somebody very young tries to use it, it gives reasonable advice. If you are under 18, it shows a warning; under 15, the calculator even refuses to give you any results.

What Gender Are My Visitors?


There are a lot more female visitors! It seems that females are much more interested in nutrition topics. Also, females tend to use mobile devices more.

It would probably help to have some gender-specific advice on the keto calculator, especially for women. E.g. breastfeeding comes up from time to time. Unfortunately, I know nothing about these topics, so it’s best to go to the dedicated subreddit /r/xxketo with this kind of question.

How Big Are the Screens?


Lots of visitors use smartphones, and the screen resolutions reflect this. The sizes range from quite narrow at just 360 pixels to quite large at 1440 pixels. I’ve tuned the layout to be nicely visible across this whole range. Thanks to @media queries, it gracefully switches between layouts and looks quite readable on both very wide and very small screens.

Google Analytics Experiments is Buggy

Google Analytics has a very nice feature: it can be used for A/B testing, in a purely client-side way using JavaScript. I am using this to improve my keto calculator website. Interestingly, Google uses a multi-armed bandit implementation, which continuously adapts the A/B split ratio while the experiment runs.

Unfortunately, Google’s implementation is quite buggy. Here are a few problems that I ran into while testing:

Inaccurate Number

While an experiment is running, Google Analytics shows a “Probability of Outperforming Original”. This number is often incorrect, and does not match the A/B split ratio actually used on the homepage. It seems that the numbers in the Analytics UI are calculated from data that is only partly up to date. In reality the probability should be at least 99%, but the UI shows 84.1%. The number is updated about twice a day, and it jumps around wildly even though the relative ratio between the A and B variants barely changes.

Incorrect Graphs

The above graph is sometimes not up to date; the graph data can be delayed by 1-2 days. Sometimes it shows no data for the current day, even though the data is already recorded in the “AdSense” view.

By the way, when optimizing for AdSense revenue, the “AdSense” view seems to be much more accurate than the “Conversions” view. What really counts is the “AdSense eCPM” value (which, by the way, should be renamed to RPM).

Incorrect/Inaccurate Results with Multiple Variants

I’ve tried to use the multi-armed bandit to find the optimal variant out of 20 different ones. This seems to have failed completely. Looking at the experiments.js file that Google serves, you can see that some variants had a selection probability of 0, so they were not considered at all anymore. With the multi-armed bandit, Google normally assigns each variant a selection probability that is exactly its probability of outperforming the original. The JavaScript file for the above screenshot contains this data:

So even though the screenshot says an 84.1% probability of outperforming, the file contains a selection probability of 98.1% for the variant, which should be more accurate. In my test with 20 variants, about 10 variants had a weight of 0, which means they were never shown to users. When I noticed this, I added my own selection code (in 50% of cases, just present a random variant to the user) so that the 0-probability variants had a chance to actually be seen by some users. While this had some effect on the results, it still seemed that the multi-armed bandit simply cannot deal with so many variants at the same time. Some variants that are actually very similar got completely different probabilities of outperforming the original.


All in all, Google Analytics is really excellent for A/B testing, but it has a few quirks that one should be aware of. When A/B testing, a chi-squared test for comparison is always helpful. For example, a chi-squared test of the click-through ratio in the above example (OK, it’s not the same as testing AdSense revenue like Google does, but the difference should be marginal in my case) shows that with a confidence level of 99% we can say that the variant is more successful.

Body Fat Comparison Pictures

Here is a collection of visual body fat charts. Use these to estimate your personal body fat for the keto calculator. See below for men.


low female body fat percentages
15% female body fat comparison
female at 8-9 percent body fat

As you can see, even the same body fat percentage (here twice 15%) can look quite different, depending on your muscles and where the body fat is located. The minimum possible body fat percentage is 8-9%. Going this low is not sustainable, and very unhealthy! If you are female, don’t go that low.


Men at different body fat levels

male body fat percentages


Picture of a man at 3-4% body fat

As with the female pictures above, here are two images with the same body fat percentage: 10% can look quite different, depending on how muscular you are and where your body fat is located. The lowest a male can go is 3-4%, but this is not healthy at all, and only possible to reach for a very short period of time. I’d say the lowest you should go while staying healthy is 10%.

Back to the Keto Calculator

Now that you have made up your mind about where your body fat percentage might be, head back to the keto calculator and enter your number.


Beautiful Git Logs

git has very configurable logging options. I’ve played with the configuration for a while, and found an awesome alias that looks just beautiful. It only works as of git 1.8.3 (March 24, 2013) because it uses auto coloring.

git l


git ls
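The exact alias definitions behind `git l` and `git ls` are not shown above; as an illustration only (a plausible pair in this style, not necessarily the post’s actual aliases), something like this could go into ~/.gitconfig:

```ini
# Illustrative aliases only -- not necessarily the exact ones from the post.
# %C(auto) requires git 1.8.3 or later.
[alias]
    # compact colored graph of the current branch's history
    l = log --graph --pretty=format:'%C(auto)%h%d %s %C(bold black)(%cr) <%an>'
    # the same, but across all branches
    ls = log --graph --all --pretty=format:'%C(auto)%h%d %s %C(bold black)(%cr) <%an>'
```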