Visual Studio C++ 2015 Compiler Bug

I think I’ve unfortunately found a compiler bug in Visual Studio 2015 that seems to go back to at least VS2010. Here is the code:

Compile and run in 64bit, Release. This is the expected output (as with g++, clang++, or in debug):

This is what I get with 64bit, Release:

Also see my Stack Overflow post. The compiler seems to optimize too aggressively: it directly uses a 64-bit number instead of making sure the 32-bit value is cast to 64 bit. I have already reported the bug to Microsoft and am waiting for feedback.
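The post’s snippet itself is not reproduced here, but the class of miscompilation described can be illustrated with a small standalone example (my own construction, not the original repro): in standard C++, arithmetic on 32-bit operands must wrap at 32 bits before the result is widened.

```cpp
#include <cstdint>

// Illustration of the class of miscompilation described (my own example,
// not the original repro). In standard C++, the addition below is done in
// 32 bits and wraps to 0; only then is the result widened to 64 bits.
uint64_t widen_after_wrap() {
    uint32_t x = 0xFFFFFFFF;
    return uint64_t(x + 1); // correct result: 0
    // an optimizer performing the addition directly in 64 bits would
    // wrongly yield 0x100000000
}
```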

Very Fast HashMap in C++: Benchmark Results (Part 3)

As promised in part 2 of this series, I have some interesting benchmarks of the different hash map variants. In this part I will discuss the results, and present my conclusions. Click any benchmark to get a much larger graph.

How I Benchmark

I’m inserting 100 million int -> int pairs into different hashmap variants, and every 100,000 inserts I take measurements: insertion time, memory usage, and the time to look up 1 million elements, where half of the lookups fail. As a hash function I am using std::hash. My computer is an Intel i5-4670 @ 3.4 GHz, 16GB RAM, Visual Studio 2015 Update 3, 64bit.

Insertion Time Benchmark


In this graph the reallocations are very clearly visible. All variants double the size of the internally used array when it gets full. As predicted in part 2, insertion slows down when “Robin Hood with Infobyte” gets full, and likewise for “Robin Hood with Infobyte & Fastforward”. This is because more and more data needs to be moved around so the hashmap can stay sorted.

Dealing with the fastforward field has a clear overhead: insertion is quite a bit slower.

For “Robin Hood with Infobits & Hashbits”, insertion time is very fast, mostly because the offset is limited: if it were to exceed the maximum, the hashmap’s size is doubled. Hopscotch insertion is almost as fast, just a tad slower.

All variants are much faster than std::unordered_map. The comparison is a bit unfair, because in my implementations I do not have to support the same interface as std::unordered_map does.

Memory Usage


Robin Hood with Infobyte is the most compact representation. I resize the map when it gets 95% full, or when an overflow occurs, which is highly unlikely with an appropriate hash function. There is just 1 byte of overhead for each entry.

The second best representation is “Robin Hood with Infobyte & Fastforward”. This has exactly the same reallocation properties as when just using the infobyte, but it uses one additional byte per bucket.

When limiting the maximum offset as in “Robin Hood with Infobits & Hashbits”, resizes occur much sooner (the maximum offset is 15 instead of 127), so the memory usage jumps up earlier.

For Hopscotch this reallocation seems to happen even a bit earlier. The overhead in my implementation is 16 bits per bucket, so the line overlaps a lot with the Infobyte & Fastforward variant.

All my implemented variants need much less memory than std::unordered_map. While std’s implementation allocates new memory for each insert, in all my variants the memory stays constant until reallocation occurs.

Lookup Time


This is probably the most interesting benchmark for most use cases, and I find this graph extremely educational and interesting.

First, the Robin Hood with Infobyte variant: lookup starts out pretty fast, but gets slower and slower as the hashmap fills up. This is due to the longer and longer chains that need to be traversed to find the correct element. So the highly compact format is bought with ever-increasing lookup (and insertion) times. Still, lookup is almost always faster than for std::unordered_map.

The Robin Hood with Infobits & Hashbits variant shows the same behaviour: lookup time goes up as the hashmap gets full. This increase is capped early though, due to the earlier reallocations. So faster lookups can be bought with earlier reallocations and more memory usage.

To improve lookup times compared to the infobyte variant, I have introduced the fastforward byte to skip directly to the elements that belong to a bucket. That’s the gray line. When the hashmap is quite empty, extracting this fastforward information has quite a bit of overhead; I did not expect it to be this large when implementing the variant. It works very well in improving lookup time when the map gets very full though, as can be seen around the 60 million elements mark: lookup with just the infobyte takes 0.0676 seconds, while with the fastforward byte it takes 0.0487 seconds, 38% faster. The lookup time is also quite linear.

Last but not least, the Hopscotch variant. After quite some time spent tuning, this variant turned out to be the fastest of all. Lookup time does not show the spiky behaviour that robin hood has, and it is extremely fast. One trick I am using, which I haven’t seen mentioned anywhere else, is to shift the whole hop field and stop the while loop as soon as the hop field is zero. So when only one element is there and it is in the first bucket, I need to check just one bit of the hop field. When the element is not there, the hop field is zero, so no bits have to be checked at all.


I’ve summed up the average insertion time, average lookup time, and average memory usage over all the different samples. std::unordered_map is the reference with 100%. Here are my conclusions:

  • The Hopscotch implementation is the best overall for most use cases, if you have a reasonable hash function. Insertion time is just a tad slower than the fastest variant, memory usage is a bit higher than the other variants but still almost 3 times better than std::unordered_map, and lookup is blazingly fast with almost twice the speed of unordered_map. 0.0404 seconds to look up 1 million elements, or just 40.4 nanoseconds per lookup on average, is not too shabby.
  • If your hash function is bad or compactness is of utmost importance, use the Robin Hood with Infobyte variant.
  • If the std interface is important, stick to std::unordered_map. While it uses lots of memory, lookup times seem to be ok and fairly consistent.

That’s my tour of hashtable implementations. You can find my code on GitHub at martinus/robin-hood-hashing. It still needs lots of cleanup, and most importantly, the ability to remove elements…

Very Fast HashMap in C++: Implementation Variants (Part 2)

In part 1 I discussed Hopscotch and Robin Hood hash tables. Since then I have implemented several hashmap variants that combine tricks from these two approaches, and I have some interesting results to share. Here are the variations that I have implemented:

Robin Hood with Infobyte

Robin Hood Hashing uses the hash value to calculate the position to place an element, then does linear probing until it finds an empty spot. While probing, it swaps out entries that have a smaller distance to their original bucket. This minimizes the maximum time a lookup can take. For a lookup, it is only necessary to probe linearly until the distance to the original bucket is larger than the current element’s distance.

In this “Infobyte” implementation variant, instead of storing the full 64 bit hash value I directly store the distance to the original bucket in a byte. One bit of this byte is used to mark the bucket as taken or empty. The bit layout is defined like this:


When storing four elements a, b, c, d, where a, b map to hash position 1 and c, d to hash position 2, this is how they would be stored:


The advantage is that I have just 1 byte of overhead instead of 8, and linear probing is faster because I only need a comparison of a single byte instead of calculating offsets from the hash. A byte is more than enough for the offset, since on average the distance to the original bucket is very small. Here is the full code for lookup:

This search is very simple and straightforward, and fast.

Robin Hood with Infobits & Hashbits

Using a full byte for the offset (actually 7 bits) is fairly large: it would allow an offset of 127, which would lead to very long and slow linear probing. My idea is to limit the offset to a much lower maximum and use the remaining bits of the byte to store a few bits of the hash, so that these hash bits can reduce the number of key comparisons. These hash bits could also be used when resizing the hashmap, so we would not have to rehash all elements on each resize. After some tuning, this is the bit layout I am using for this variant:


The much smaller offset means that when inserting an element and the maximum offset is reached, I have to resize the hashmap. This is intentional, since a very large offset would lead to slow searches. In this way, this variant is more similar to Hopscotch. Here is the full code to look up an element with this byte:

The find code is almost exactly like in the pure Infobyte variant, except that I am or’ing 3 hash bits into the info byte, and instead of ++info I am using info += Traits::OFFSET_INC, where OFFSET_INC is 16 (10000b).

Robin Hood with Infobyte & Fastforward

In the first Robin Hood variant, finding an element becomes slow as the hashmap gets full, because a lot of elements need to be skipped before the right one is found. Robin hood hashing has the property that it always keeps elements that belong to the same bucket together. The infobyte introduced in the first variant basically acts as a backpointer to the bucket the element originally belongs to. It is therefore possible to introduce another byte, stored at the original bucket, that acts as a forward pointer to where that bucket’s first element is stored.

Using the above example where a,b maps to 1 and c,d to 2, fastforward byte looks like this:


When looking up element c, the fastforward byte can be used to skip one element, so the while (info < _info[idx]) { ... } can be replaced with the skip operation. Here is the code:

I am using a struct that contains two bytes: the info field and the fastforward byte. Now I have two bytes of overhead for each entry, but I can quickly skip lots of buckets, which should be especially useful when the hashmap is very full.


Hopscotch

Hopscotch can be seen as quite similar to robin hood hashing with fastforward, except that it replaces the info and fastforward bytes with a bitmask. With some tuning, I am now using a 16 bit field where 1 bit defines whether the current bucket is full, and the other bits define which offsets relative to the current bucket are taken.


If the first bit is set, the current bucket is taken, but not necessarily by an element belonging to this hop bitfield’s bucket. This bit is not strictly necessary, because that information is also encoded in the other hop fields, but it speeds up the insertion code. With 16 bits available, this limits the hop bits to 15 though. As with “Robin Hood with Infobits & Hashbits”, when the hop bits are full, the hashmap has to be resized.

Benchmark Results

I’m leaving the most important information for part 3: how do all these variants perform? Which one is best?

Very Fast HashMap in C++: Hopscotch & Robin Hood Hashing (Part 1)

A while ago I spent significant time researching and implementing a fast Hopscotch hash table for C++. My current source code can be found in my GitHub repository at martinus/robin-hood-hashing. After spending some time optimizing, I am mostly happy with the results: insertion time is much faster than std::unordered_map, and it uses far less memory.


In this simple benchmark I have measured the time while sequentially adding entries (int -> int) to the hashmap. As a hash function, I am using std::hash:
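The measurement loop can be sketched like this (my reconstruction, not the original benchmark code; the function name and reporting interval are my own choices):

```cpp
#include <chrono>
#include <cstdio>
#include <unordered_map>

// Sketch of the measurement loop: sequentially insert int -> int pairs
// and report the elapsed time at regular intervals. The post measures
// 30 million inserts.
double benchmark_inserts(int n, int report_every) {
    std::unordered_map<int, int> map; // std::hash<int> is the default hasher
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        map.emplace(i, i);
        if ((i + 1) % report_every == 0) {
            std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;
            std::printf("%9d inserts: %.3f sec\n", i + 1, elapsed.count());
        }
    }
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - start).count();
}
```

Calling benchmark_inserts(30'000'000, 100'000) reproduces the shape of the measurement; swapping std::unordered_map for another table yields its curve.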

Whenever a jump occurs, the hashmap has become too full and is reallocating. Interestingly, std::unordered_map has to allocate memory for every insert (since it is node-based, like a linked list per bucket), while the Hopscotch hash table allocates just a single array and works within that memory.


I am using Visual Studio 2015 Update 3, 64 bit, on an Intel i5-4670 @ 3.4 GHz. Here is a comparison table (best values bold):

| Hashtable | Time insert 30M integers | Memory insert 30M integers | Time query 30M existing | Time query 30M nonexisting |
|---|---|---|---|---|
| unordered_map | 11.56 sec (100%) | 1493 MB (100%) | 1.84 sec (100%) | 1.92 sec (100%) |
| Hopscotch | **3.44 sec (30%)** | **321 MB (22%)** | **1.50 sec (81%)** | **1.48 sec (77%)** |

Insertion is really fast and much more memory efficient; query time is also a bit faster than std::unordered_map, even though we need to check the hop bitmap of up to 32 elements.

Robin Hood Hashing vs. Hopscotch

After contemplating a while, I have come to the conclusion that Hopscotch is just a bad version of Robin Hood Hashing. Here is my reasoning:

  1. Hopscotch’s bitmap naturally has to be quite sparse. For a perfectly full hashmap, where each bucket contains a corresponding entry, of the 32 hop bits there will be just a single bit that is set to 1. Wikipedia has a nice representation: hopscotch-wiki-example
  2. Is there a better way to represent the hop bitmap? One way would be a linked list of offsets. Unfortunately, linked lists are not very cache friendly.
  3. To store the linked list more efficiently, we can put them into an array. How about just storing the offsets next to the buckets? So the above example from a) could be stored like this: hopscotch_as_offsetarray
    Will this work? Surprisingly, yes! When querying for an element, we just need to sequentially check the offsets. Say we want to check if ‘b’ is in the map:

    1. We hash ‘b’ and get index 6.
    2. The offset at index 6 is 0: that means the element at index 6 actually belongs there. It’s not b, so we have to continue checking.
    3. At index 7 we would expect an offset 1 if it contains an element that was indexed to 6. Since it is taken by something else, it will have a higher value, so we are certain this is not the bucket we are looking for.
    4. At index 8 we expect an offset of 2 if it contains an element that was hashed to index 6. It is, so we check the value, and bam! We found the element.

    If we query an element that is not in the map, we just need to check at most hop size offsets.

  4. All this sounds increasingly similar to Robin Hood Hashing. Once we have the offset, we can use it as cleverly as robin hood hashing does: it enables us to use the cool “take from the rich” algorithm. So now we can ditch the hop size, and just keep swapping elements exactly like robin hood hashing does.

What now?

With these insights, I believe I have a great idea for implementing a highly efficient variant of the robin hood hash table that takes some ideas from the hopscotch implementation. In my next post, part 2, I will explain these ideas and (hopefully) have a fantastically fast and memory efficient hash table in my repository.

Analyzing Keto Calculator Demographics

Recently my Keto Calculator has grown huge. So big that in January 2016 alone it had 222,781 page views. That is a record for my site, and I have no reason to believe that growth will slow down anytime soon. In my quest to make all the visitors as happy as possible, I’ve done quite a bit of analysis of my visitors to find out what I need to do with my homepage. Here is some of the data I have gathered using Google Analytics.

Each graph is color coded: blue stands for mobile devices, orange for tablets, and green for desktops.

Where Do the Users Come From?

I have created a bar chart showing the total number of sessions for January 2016, sorted by top countries.


By far the most visitors come from the US, followed by Canada, the UK, and Australia – all English-speaking countries. The first non-English-speaking country is Germany, with only 0.69% of all sessions.

For fast access for the majority of my users, my hosting server should sit in the US. I have recently switched to T1Hosting, which is both cheaper and faster than my previous web host. Also, they were really fast to respond to my questions. So far I am pretty happy with them!

Translating the calculator is currently not really useful. I am sure that translating it into German would increase the number of users from Germany, but it does not look like the effort would really be worth it. What I have done is add an automatic translation feature to the top of the calculator to provide some usability for non-English speakers. Unfortunately the quality is quite bad.

Almost half of my users come from mobile devices! To be precise: 48.8% mobile, 41.7% desktop, 9.5% tablets. That means it is absolutely essential that the keto calculator works well both on mobile and on the desktop. It took me quite a while to make it look good on mobile, and now I’m happy with the result. I’ve done most of the testing with Chrome DevTools, which is absolutely fantastic!

How Old Are My Visitors?


Most visitors are 25-34 years old. I guess that’s a typical result for a website about nutrition. Interestingly, the graph does not show any data for visitors younger than 18. I have updated the calculator with some safety checks so that even when somebody very young tries to use it, it will give reasonable advice: if you are under 18, it shows a warning, and under 15 the calculator even refuses to give you any results.

What Gender Are My Visitors?


There are a lot more female visitors! It seems that females are much more sensitive to nutrition topics. Also, females tend to use mobile devices more.

It would probably help to have some gender-specific advice on the keto calculator, especially for women; e.g. breastfeeding comes up from time to time. Unfortunately I know nothing about these topics, so it’s best to go to the dedicated subreddit /r/xxketo with this kind of question.

How Big Are the Screens?


Lots of visitors use smartphones, and the screen resolutions reflect this. The sizes range from quite narrow with just 360 pixels to quite large with 1440 pixels. I’ve tuned the layout to be nicely visible on all these ranges. Thanks to @media it gracefully switches between these layouts and looks quite readable on very wide screens and very small screens.