It’s been over 3 years since I’ve spent considerable time finding the best C++ hashmap. After several requests I finally gave in and redid the benchmark with state of C++ hashmaps as of August 2022. This took much more work than I initially anticipated, mostly due to the fact that benchmarks take a looong time, and writing everything up and creating a representation that is actually useful takes even more time. Thanks everyone who annoyingly kept asking me for updates

TL;DR: Here are the benchmark results!

This time I have evaluated 29 different hashmaps, and also added several variants with special allocators. Each of these was combined with 6 differend hashes, resulting in 174 different combinations to benchmark. Each of these combinations was evaluated in 11 different benchmarks, totaling in 1914 benchmark evaluations. This almost doubles the number of benchmarks from my evaluation in 2019.

Benchmark Infrastructure

Hardware

All benchmarks ran on an Intel i7-8700, locked at 3200 MHz.
The benchmarks ran on an isolated core dedicated to benchmarking
I disabled frequency scaling and turbo boost.
the PC was kept completely idle otherwise while running the benchmarks.

Software

I used Manjaro Linux with an up to date kernel.
All benchmarks are done with clang++ 13, which at that time was the default compiler on Manjaro Linux.
I used the compile flags -O3 -march=native.
Each benchmark was run multiple times, and I’m using the median to get rid of any potential outliers.

Benchmarks

Stable References

The hashmaps can be divided into two types: ones where the inserted elements always reside at the same location, thus pointers & references to the elements are stable, and ones where elements can be shuffled around. This is usually the case in open address hashing.

Hashmaps with no reference stability tend to be faster because they can usually get rid of one memory indirection and can better optimize for cache locality and allocations.

Modifying Numbers Benchmarks

Copy

This benchmark first creates a map uint64_t → uint64_t with 1M random entries. This map is then copied 200 times into a brand new map, and copied 200 times into an existing map.

for (size_t n = 0; n < 200; ++n) {
    Map<uint64_t, uint64_t> m = mapForCopy;
}
Map<uint64_t, uint64_t> m;
for (size_t n = 0; n < 200; ++n) {
    m = mapForCopy;
}

There is some minor modification steps added so the compiler can’t optimize the copy away. Full sourcecode of Copy.

Insert then Erase 100M int

This benchmarks bulk insertion and bulk erase:

Insert 100 million random int into a Map<int, int>.
Clear all entries with clear().
Reinsert 100 million random int into the same cleared map.
Remove all of the inserted entries one by one until the map is empty again.
Destruct the empty map.

100 million int-int pairs take at least 1526 MB. It is interesting to see how much overhead the maps have here, and how they deal with resizing their data. clear() is interesting too, for flat maps it might be possible to optimize for data that is trivially destructible, then clear() can be very fast. Reinsertion is interesting to see if a map reuses initialized memory, and if it can gain any speed from that. Removing elements one by one is interesting to see removal performance - some maps need to rearrange entries (e.g. robin-hood based maps) which might slow down their performance.

Random Insert & Access uint64_t

This inserts random elements into a map, but with bounded random numbers. The bencharked loop looks like this:

Map<int, int> map;
for (size_t i = 0; i < 50'000'000; ++i) {
    checksum += ++map[rng(max_rng)];
}

I is an adapted benchmark from attractivechaos’ Revisiting hash table performance code.

Here rng(max_rng) creates a random number in the range [0, max_rng(. If max_rng is small, not many elements will be inserted but most of them will be accessed. If max_rng is large, mostly new elements will be inserted.

In fact the benchmark is run 4 times, with different max_rng settings:

5% distinct: max_rng = 50M * 5% = 250k, so many values will be duplicates. After 50 million iterations, all 250k elements are inserted. So most map operations will be accesses.
25% distinct: max_rng = 50M * 25% = 12.5M. More inserts, less modifications.
50% distinct: max_rng = 50M * 50% = 25M. Note that due to randomness not all numbers from 0-25M will be inserted. Here the final map’s size contains 21.6M entries. So actually its about 43% of the value range instead of 50%.
100% distinct: Here we make use of the full range of int, so 2^32 numbers are available. Practically all operations are new insertions.

Random Insert & Erase uint64_t

The core of the benchmark is this loop:

Map<uint64_t, uint64_t> map;
for (size_t i = 0; i < 50'000'000; ++i) {
    map.emplace(rng() & bitMask, i);
    map.erase(rng() & bitMask);
}

On first glance it looks similar to the previous benchmark, but it bechmarks something completely different. Each iteration it randomly inserts an element, and then randomly erases an element. Instead of using a maximum number for the random number generator, here I am using a bit mask to extract several bits. The purpose of this is to ensure the maps work still well, even when numbers are not sequentially or small and don’t always change in the lower bits.

This benchmark loop is repeated 6 times. Initially, 4 random bits are set in bitMask. After each benchmark 4 additional bits are set, thus the number of available random numbers increases and the map will find a larger equilibrium. Here is the list of bitMasks used in the benchmark:

1001000000000000000000000000000000000000000100000000000000001000 4 bits set, 16 distinct numbers. Equilibrium will be around 16/2 = 8 entries, so the map stays very small.
1001000000000010001100000000000000000000000101000000000000001000 8 bits set, 256 distinct numbers. Averaging 128 entries.
1001000000000110001100000000000000010000000101100000000000001001 12 bits set, 4096 distinct numbers, averaging 2048 entries
1001000000000110001100000001000000010000000101110000000100101001 16 bits set, 65k distinct numbers, 32.8k entries equilibrium.
1101100000000110001100001001000000010000000101110001000100101001 20 bits set, 1M distinct numbers, 524k entries equilibrium.
1101100000001110001100001001001000010000100101110001000100101011 24 bits set, 16.8M distinct numbers, 8.4M entries equilibrium.

So the map’s average size increases by a factor of 16 each benchmark. Ideally a hashmap has O(1) amortized operations, so the speed should be constant regardless the size. This is unfortunately not the case, mostly due to lots and lots of cache misses the larger you get.

Access & Find Benchmarks

Iterate

Iteration benchmark works like this: I create a Map<uint64_t, uint64_t>, and each time I insert one random element the whole map is iterated. Once 50k elements are inserted, elements are removed one by one and each time the map is iterated again. In total, exactly 2.5 billion entries are iterated.

Map<uint64_t, uint64_t> map;

auto const initialRngState = rng.state();

// measure insert than iterate whole map
for (size_t n = 0; n < num_iters; ++n) {
    map[rng()] = n;
    for (auto const& keyVal : map) {
        result += keyVal.second;
    }
}

// reset rng back to inital state
rng.state(initialRngState);

// measure erase than iterate whole map
for (size_t n = 0; n < num_iters; ++n) {
    map.erase(rng());
    for (auto const& keyVal : map) {
        result += keyVal.second;
    }
}

Find 1 – 200 uint64_t

In many use cases find performance is probably considered the most important benchmark. This benchmark was tricky to implement, as it should be as unbiased as possible. It does the following:

Lookup with different probability of being found: 0%, 25%, 50%, 75%, and 100% success ratio, to measure get a broad spectrum of lookup probability.
Lookups with different amount of data in the map, to make sure we have a wide range of fullness factors.
Lookups with different bitmasks of the entry, to make sure we are not biased towards small numbers.

In detail, it works this way:

Each iteration, 4 random entries are created in a Map<size_t, size_t>. Depending on the desired lookup probability, 0, 1, 2, 3, or all 4 entries will be chosen either from a unique random number generator, or one with the same initial seed used for the lookups. Additionally, the order of these 4 entries is shuffled to introduce even more randomness.
After insertion, a number of lookups are performed, and when an entry is found it’s value is accessed.
Repeat until the map is full.

Here, we perform 5 million lookups for every 4 inserts, until the map contains 200 elements.

Find 1 – 2000 uint64_t

Same as benchmark Find 1 - 200 uint64_t, but with 500k lookups every 4 inserts until the map contains 2000 entries.

Find 1 – 500k uint64_t

Same as benchmark Find 1 - 200 uint64_t, but with 1000 lookups every 4 inserts until the map contains 500k entries. Note that there is a overhead measured with the find benchmark, namely inserting. In my tests this overhead is minimal and negligable to find results. E.g. for jg::dense_hash_map which has very fast find and very slow inserts, the overhead is only ~0.49%.

Modifying String Benchmarks

Random Insert & Erase string

This benchmark is very similar to Random Insert & Erase uint64_t. It constantly inserts and removes random strings. The benchmark is run with different sizes of std::string: 7, 8, 13, 100, and 1000 bytes. Each time a string is inserted or queried, only the last few bytes are modified. That means if two hashes of the string are the same, std::equal will have to check quite a few bytes, especially for longer strings. Since comparisons and hashing takes much longer for longer strings, I have adapted the runtime for each benchmark:

7 bytes: 20M iterations
8 bytes: 20M iterations
13 bytes: 20M iterations
100 bytes: 12M iterations
1000 bytes: 6M iteratons

The benchmark code looks like this. It takes great care that the random string is not unnecessarily copied or recreated.

std::string str(string_length, 'x');

Map<std::string, std::string> map;
for (size_t i = 0; i < max_n; ++i) {    
    // create an entry.
    randomize(str);
    map[str];

    // find and erase entry.
    randomize(str);
    auto it = map.find(str);
    if (it != map.end()) {
        ++verifier;
        map.erase(it);
    }
}

Find String Benchmarks

Find 1 – 100k string

This benchmark is practically the same as the uint64_t find benchmarks, except that it uses 100 byte long std::string. There are 1000 lookups every 4 inserts, until 100k entries are inserted.

Find 1 – 1M string

Same as Find 1 - 100k string, but with 13 bytes long strings, 200 lookups every 4 inserts until 1M strings are in the map.

Other

Memory Usage

Each benchmark is run in a separate process, and I measure the peak resident set size for each benchmark. The number presented here calculates the geometric mean of the two very memory-heavy benchmarks, Insert then Erase 100M int and Random Insert & Erase uint64_t.

Flat maps tend to have a bad maximum memory behavior, because whenever data has to be relocated they allocate a temporary array that’s twice the size of the existing data, then move data over to the new array, and after that they erase the memory.

So for a fill factor of 80%, the worst case for peak memory usage (Insert one element after the map is full) is m = numElements / 0.8 * 3, which gives an overhead of 275%.

Combined Results

Geometric Mean Number Find

The geometric mean of the number related find benchmarks Find 1 – 200 uint64_t, Find 1 – 2000 uint64_t, Find 1 – 500k uint64_t. Thus, if you only care about search performance for integral types, this is your most important result.

Geometric Mean String Find

The geometric mean of std::string find benchmarks Find 1 – 100k string, Find 1 – 1M string. If searching for std::string is all you care about, look at the top results here.

Geometric Mean All

The geometric mean of all benchmarks, including memory usage. If you are interested overall speed in a wide range of scenarios, have a look at the top results here.

Benchmark Results Table

Each column shows benchmark runtime normalized to 100 for the best performer. So 100 means fastest (or lowest memory), 110 means 10% slower than the fastest. The last 3 rows show summarized results. Click a row for sorting, enter text in map/hash field for filtering.

Result Analysis

Containers

absl::flat_hash_map ↑

Google’s Abseil’s absl::flat_hash_map stores value_type directly in the slot array, and Google recommends these for general use. They were brand new in 2019 and pushed the boundary on what’s possible to achieve for unordered_maps. It uses several interesting optimizations, described in CppCon 2017: Matt Kulukundis “Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step.

The Good: 3 years ago absl::flat_hash_map was one of the fastest maps. It still is quite fast, and seems to perform especially well for large maps. This map and gtl::flat_hash_map, which is based on that map, are the fastest in the Find 1 – 500k uint64_t benchmark. Other find benchmarks are reasonably fast too, especially for strings.
The Bad: Copying and iterating the map is comparatively slow. The map is highly sensitive to the used hash, and benchmarks are incredibly slow (timeout) when bad hash is used. E.g. std::hash or boost::hash for number types.
About: Website: https://abseil.io/docs/cpp/guides/container, Tested version: 736458b5 (master), License: Apache License 2.0

absl::node_hash_map ↑

Google’s absl::node_hash_map is a near drop-in replacement for std::unordered_map with stable pointers & references. Bound to be a bit slower than absl::flat_hash_map due to the indirection.

The Good: Being node based pointers & references are stable. Search performance is still very good, there is only little slowdown compared to absl::flat_hash_map.
The Bad: Memory usage, copying, inserting elements is very slow, even much slower than std::unordered_map.
About: Website: https://abseil.io/docs/cpp/guides/container, Tested version: 736458b5 (master), License: Apache License 2.0

ankerl::unordered_dense::map ↑

Full disclaimer: I’m the author! This map is designed to be very fast, simple, but still feature rich. It achieves that by combining ideas from robin_hood::unordered_flat_map and using a simple std::vector as storage. Iteration is therefore as fast as it gets since all data is stored linearly in memory. I consider this implementation as a successor of my old robin_hood map.

The Good: The map is an excellent allrounder. Search is very fast, string search is fastest in one benchmark. Iteration speed is unbeatable because all the data lies in a contiguous block of memory. It has support for custom allocators, custom containers, and fancy pointers. It is e.g. possible to simply replace the internally used std::vector with other types to make use of shared memory.
The Bad: Removing an element can be relatively slow, since it requires two lookups because the map keeps a densely stored vector at all times.
About: Website: https://github.com/martinus/unordered_dense, Tested version: v1.0.2, License: MIT License

boost::multi_index::hashed_unique ↑

Boost’s boost::multi_index library is extremely powerful. You can use multiple indices at once. In this benchmark I’m just using boost::multi_index::hashed_unique to see how well it performs.

The Good: Lookup is reasonably fast, and memory usage is ok. If you need multiple indices for the same data this is the most user friendly choice.
The Bad: Copying the map is really slow - a whopping 100 (hundred) times slower than ankerl::unordered_dense::map.
About: Website: https://www.boost.org/doc/libs/1_80_0/libs/multi_index/doc/index.html, Tested version: 1.80.0, License: Boost Software License 1.0

boost::unordered_map ↑

In version 1.80 there has been a complete rewrite of boost::unordered_map. That was actually the main reason why I have decided to redo this whole benchmark. It comes with extensive documentation and benchmarks. I took the opportunity to test the map, and also try it with different allocators as my initial experiments indicated quite a big performance difference with a specialized allocator.

The Good: Compared to std::unordered_map, lookup is almost twice as fast! This is a huge improvement, given the limitations the maps are under (stable references, and keeping the bucket API). If you need high compatibility with std::unordered_map, this is the best choice.
The Bad: Insert, erase, copy are relatively slow and memory usage is quite high.
About: Website: https://www.boost.org/doc/libs/1_80_0/libs/multi_index/doc/index.html, Tested version: 1.80.0, License: Boost Software License 1.0

boost::unordered_map & PoolAllocator ↑

The PoolAllocator is a custom allocator for node based containers. Since boost::unordered_map is node based, it has to allocate one node for each element. Thus it can potentially gain a lot from a custom allocator. I actually wrote PoolAllocator for Bitcoin, where an std::unordered_map is heavily used and using this PoolAllocator speeds up initial block indexing significantly.

The Good: Memory usage is reduced quite a lot, and copying the map is more than twice as fast.
The Bad: For some reason search performance is quite a bit slower. Theoretically using a different allocator shouldn’t make any difference, but at least in my benchmarks search performance for small maps are much slower.
About: Website: https://github.com/martinus/map_benchmark/blob/master/src/app/pool.h, Tested version: 644b2fa (master), License: MIT License

boost::unordered_map & boost::container::pmr::unsynchronized_pool_resource ↑

Boost comes with its own implementation of Polymorphic Memory Resources, which should behave similar to PoolAllocator. And it does in most cases, except the Copy benchmark, here PoolAllocator is quite a bit faster.

The Good: It’s boost’s own implementation, and it gives a big speedup for inserting elements, and is also much faster than a plain boost::unordered_map when many inserts & erase happen. Using this custom allocator also brings down memory usage quite a lot, because it doesn’t have to pay the malloc overhead for each node.
The Bad: The time to copy the map is about the same than without a custom allocator. In comparison, with PoolAllocator the performnace here is almost doubled, this looks like a lost opportunity to me!
About: Website: https://www.boost.org/doc/libs/1_80_0/libs/multi_index/doc/index.html, Tested version: 1.80.0, License: Boost Software License 1.0

emhash7::HashMap ↑

These are very high performance hashmaps and the author is constantly updating them. There are different versions of the maps with different propertices concerning performance vs. memory. This map has very fast iteration speed.

The Good: One of the fastest map. It is fastest for insert & erase, and find is also extremely fast.
The Bad: It’s a bit slower than the best for std::string keys, but overall its a very fast implementation.
About: Website: https://github.com/ktprime/emhash, Tested version: 9a3f7189 (master), License: MIT License

emhash8::HashMap ↑

This variant from ktprime has behaves similar to emhash7::Hash, but requires less memory.

The Good: Very fast iteration speed, search & insert is very fast. Low memory usage.
The Bad: Erase is slower than emhash7, but it’s still quite good.
About: Website: https://github.com/ktprime/emhash, Tested version: 9a3f7189 (master), License: MIT License

folly::F14NodeMap ↑

A supposedly high performance hashmap implementation from Facebook / Meta. It stores values indirectly, calling malloc for each node. Also see folly::F14ValueMap

The Good: Relatively good search performance, Insert & erase is not ok-ish.
The Bad: Memory usage is high. It’s a large dependency.
About: Website: https://github.com/facebook/folly, Tested version: v2022.06.27.00, License: Apache License 2.0

folly::F14ValueMap ↑

A supposedly high performance hashmap implementation from Facebook / Meta. It stores values directly. Also see folly::F14NodeMap

The Good: Overall relatively fast search, and very low memory usage.
The Bad: It’s a big dependency
About: Website: https://github.com/facebook/folly, Tested version: v2022.06.27.00, License: Apache License 2.0

fph::DynamicFphMap ↑

A very interesting new contender: This hashmap implementation uses a perfect hash, thus it doesn’t have any collisions. This should make it extremely fast for lookups, but with a potentially high overhead for insert/removal.

The Good: Find is extremely fast, regardless of the hash. In fact, std::hash or boost::hash is best here even with their bad hash quality.
The Bad: Insert and erase and copying the map is very very slow, the slowest in the benchmark. Memory usage is very high. I’d say this map is good for stable data that is never modified, and when you can afford the high memory usage.
About: Website: https://github.com/renzibei/fph-table/tree/noseed, Tested version: 1a613aba (noseed), License: none specified

gtl::btree_map ↑

Containers from greg’s template library. This is actually not a hashmap at all, but it is an ordered container much like std::map. They store multiple elements per node which should make them faster because it is a more cache-friendly layout. I added this one to see if it is possible if non-hashmap implementations could compete in this benchmark.

The Good: Memory usage is excellent, this container has the lowest memory usage of all. Insert & erase are of medium speed.
The Bad: Lookup is very slow.
About: Website: https://github.com/greg7mdp/gtl, Tested version: v1.1.2, License: Apache License 2.0

gtl::flat_hash_map ↑

A hashmap implementation based on Google’s Abseil. It lists changes to the original implementation here. This one is the flat variant.

The Good: This has very similar performance characteristics to absl::flat_hash_map, with the added bonus that it also works well for bad hashes like std::hash and boost::hash. This is a single-header file.
The Bad: Same as for absl::flat_hash_map, copy & iterating is relatively slow.
About: Website: https://github.com/greg7mdp/gtl, Tested version: v1.1.2, License: Apache License 2.0

gtl::node_hash_map ↑

The node hashmap based on google abseil’s absl::node_hash_map. It lists changes to the original implementation here.

The Good: Performance is very similar to absl::node_hash_map. For some reason copying is a bit faster. It works well with std::hash and boost::hash.
The Bad: Copying & iterating is still slow, insert & erase are slow. Memory usage is quite high.
About: Website: https://github.com/greg7mdp/gtl, Tested version: v1.1.2, License: Apache License 2.0

gtl::parallel_flat_hash_map ↑

The parallel variants of the hashmaps have reduced peak memory usage and multithreading support. This is done by splitting up the data into multiple sub-hashmaps. See Parallel hash containers provided by gtl for more information.

The Good: Memory usage is very good, copying is a bit faster than gtl::flat_hash_map. It has nice thread safety properties, see here.
The Bad: Search is a bit slower than the non-parallel variant. It’s still fast though.
About: Website: https://github.com/greg7mdp/gtl, Tested version: v1.1.2, License: Apache License 2.0

gtl::parallel_node_hash_map ↑

The Good: Copying is a bit faster than gtl::flat_hash_map. It has nice thread safety properties, see here.
The Bad: Memory usage is not really much better due to the node overhead. This is the slowerst variant in terms of find speed.
About: Website: https://github.com/greg7mdp/gtl, Tested version: v1.1.2, License: Apache License 2.0

jg::dense_hash_map ↑

A simple replacement for std::unordered_map with better performance but loose stable addressing as a trade-off. The author has nice blog posts about the hashmap’s properties.

The Good: Most operations are really fast, and since data is densly stored iteration is very fast too. For small number of elements (200-2000) this is actually the fastest map!
The Bad: For larger maps the search is still fast but not the best. String search is ok but not among the best performer.
About: Website: https://github.com/Jiwan/dense_hash_map, Tested version: 74277fc4 (master), License: MIT License

robin_hood::unordered_flat_map ↑

Full disclaimer: I’m the author! This is a flat map that is very fast, and I have spent considerable time optimizing it. At that point though it has become hard for me to further support it, and will only provide bug fixes. I consider ankerl::unordered_dense::map its successor!

The Good: Search for numbers is quite fast, inserting & erasing is very good.
The Bad: Iteration is relatively slow, search for numbers could be better. Due to the design when bad hash quality is used it can overflow though (throws std::overflow_error). In my own usage, which is quite heavy, this never happened to me. There are reports from users though where it happened. I’ve stopped developing it in favor of ankerl::unordered_dense::map.
About: Website: https://github.com/martinus/robin-hood-hashing, Tested version: 3.11.5, License: MIT License

robin_hood::unordered_node_map ↑

Similar to robin_hood::unordered_flat_map, but with stable references & pointers. To make this fast it uses a specialized allocator implementation. At that point though it has become hard for me to further support it, and will only provide bug fixes. I consider ankerl::unordered_dense::map its successor!

The Good: It’s really fast and memory usage is quite low. For a node based container iteration is quite good.
The Bad: Due to the design when bad hash quality is used it can overflow though (throws std::overflow_error). In my own usage, which is quite heavy, this never happened to me. There are reports from users though where it happened. I’ve stopped developing it in favor of ankerl::unordered_dense::map.
About: Website: https://github.com/martinus/robin-hood-hashing, Tested version: 3.11.5, License: MIT License

ska::bytell_hash_map ↑

This map was developed by Malte Skarupke in response to Google’s absl::flat_hash_map. It is described with benchmarks in the blog post A new fast hash table in response to Google’s new fast hash table.

The Good: Insert & erase is very fast, find speed is ok. Memory usage is quite good, the same as robin_hood::unordered_flat_map. I suspect it too uses 1 byte overhead per entry and a default fill rate of 80%.
The Bad: Iteration is slow. The map has not received any updates in the last 4 years.
About: Website: https://github.com/skarupke/flat_hash_map, Tested version: 2c468743 (master), License: Boost Software License 1.0

ska::flat_hash_map ↑

This map came before ska::bytell_hash_map and is described in Malte Skarupke’s blog post I Wrote The Fastest Hashtable

The Good: Search performanc is really good, especially for small number of entries.
The Bad: Contrary to its claims it is not the fastest hashtable in any of my benchmarks. Iteration speed is very slow, 26 times slower than the top hash maps. Memory usage is very high.
About: Website: https://github.com/skarupke/flat_hash_map, Tested version: 2c468743 (master), License: Boost Software License 1.0

spp::sparse_hash_map ↑

The map is derived from Google’s sparsehash implementation, but with a modernized C++11 interface.

The Good: Memory usage is very low. Iteration speed is good. Find performance is quite good for such a compact hashmap.
The Bad: Copy is relatively slow, I’m pretty sure this could be optimized.
About: Website: https://github.com/greg7mdp/sparsepp, Tested version: 1.22, License: modified MIT

std::unordered_map ↑

This is the golden standard to which every implementation likes to compare against.

The Good: It’s the standard. It is rock solid, and for most use cases fast enough.
The Bad: Slow across the board. There is no benchmark where the map is fast compared to most of the competitors.
About: Website: https://gcc.gnu.org/onlinedocs/libstdc++/, Tested version: v3, License: modified GPL

std::unordered_map & PoolAllocator ↑

PoolAllocator provides a generic allocator that works especially well for node based maps like std::unordered_map.

The Good: Copying the map becomes about twice as fast with the allocator. Memory usage drops quite a bit. There is some change in find performance, sometimes it’s faster with the pool, sometimes slower.
The Bad: Initialization is a bit more complex, requiring one additional line of code to initialize the PoolAllocator.
About: Website: https://github.com/martinus/map_benchmark/blob/master/src/app/pool.h, Tested version: 644b2fa (master), License: MIT License

std::unordered_map & std::pmr::unsynchronized_pool_resource ↑

Theoretically std::pmr::unsynchronized_pool_resource should behave very similar to PoolAllocator, but it comes from the standard library and requires support for polymorphic memory resources. This is in the C++17 standard, but not everyone implements this.

The Good: Same as for PoolAllocator find is mostly unaffected, and memory usage is lower.
The Bad: Iteration speed is slowed down significantly. Theoretically this shouldn’t make any difference, but it does. Copying the map also got slower.
About: Website: https://gcc.gnu.org/onlinedocs/libstdc++/, Tested version: v3, License: modified GPL

tsl::hopscotch_map ↑

Hashmap based on hopscotch method. This method should be cache friendly and is relatively similar to google::dense_hash_map.

The Good: Insert & erase is very fast. Search performance is ok but not stellar. Memory usage is ok-ish.
The Bad: General performance is not too bad, but also not good.
About: Website: https://github.com/Tessil/hopscotch-map, Tested version: v2.3.0, License: MIT license

tsl::robin_map ↑

A map implemented that makes use of robin hood’s backward shift deletion.

The Good: When a proper hash is used (anything except std::hash or boost::hash), insert & erase performance is top.
The Bad: It is very sensitive with the hash quality, and times out in my random insert & erase benchmarks with std::hash and boost::hash. Memory usage is very high. std::string search is ok but not the fastest. Iteration speed is very slow.
About: Website: https://github.com/Tessil/robin-map, Tested version: 1.0.1, License: MIT License

tsl::sparse_map ↑

A map based on Google’s google::sparse_hash_map, which also makes it similar to spp::sparse_hash_map.

The Good: Excellent low memory usage, only gtl::btree_map is better. For such a low memory usage search performance is very good. Copy is fast too.
The Bad: As with tsl::robin_map this implementation highly depends on a good quality hash. My benchmarks time out with std::hash and boost::hash.
About: Website: https://github.com/Tessil/sparse-map, Tested version: v0.6.2, License: MIT license

Hashes

I have benchmarked all hashmaps with a combination of different hashes. All the hashes have a generic implementation, but they can be basically differenciated into two different modes:

Calculating a hash of an integral type (int, size_t, uint64_t, …)
Calculating a hash of a string (std::string, std::string_view, …)

Many hashmap implementation depend on a reasonably good hash, where reasonably good usually means that it has sufficiently good avalanching so that changes in the input have an effect on the storage location in the hashmap.

Unfortunately std::hash and boost::hash use the identity function for integral types. This is obviously very fast to calculate (simply return the input value), but has no avalanching properties whatsoever. Depending on the input value this can be disastrous for the hashmap performance. It can make the difference between 1 second and 10 minutes runtime. Most of these problems can be found when operating with numbers where only the upper bits change while the lower bits stay constant.

std::hash ↑

Integral Types

Most implementations use the identity hash, which makes them incredible fast and incredible bad. Identity hash works for some hashmap implementations, either by design or because they contain mitigation measures. E.g. ankerl::unordered_dense::map specifically implements additional mixer when it believes hashes are of bad quality. Many maps don’t have these mitigation steps. Thus many of my benchmarks time out for the maps absl::flat_hash_map, absl::node_hash_map, emhash7::HashMap, emhash8::HashMap, jg::dense_hash_map, spp::sparse_hash_map, tsl::hopscotch_map, tsl::robin_map, tsl::sparse_map. Here are some hashes and the corresponding hash values. You can see that any hashmap that depends on the lower or upper bits to change on different input might not get what they want:

0x0000000000000000 → 0x0000000000000000 0000000000000000000000000000000000000000000000000000000000000000
0x0000000000000001 → 0x0000000000000001 0000000000000000000000000000000000000000000000000000000000000001
0x0000000000000002 → 0x0000000000000002 0000000000000000000000000000000000000000000000000000000000000010
0x0001000000000000 → 0x0001000000000000 0000000000000001000000000000000000000000000000000000000000000000
0x0002000000000000 → 0x0002000000000000 0000000000000010000000000000000000000000000000000000000000000000
0x0004000000000000 → 0x0004000000000000 0000000000000100000000000000000000000000000000000000000000000000
0x0001000000000001 → 0x0001000000000001 0000000000000001000000000000000000000000000000000000000000000001
0x0002000000000001 → 0x0002000000000001 0000000000000010000000000000000000000000000000000000000000000001
0x0004000000000001 → 0x0004000000000001 0000000000000100000000000000000000000000000000000000000000000001

Note that Microsoft’s implementation of the STL uses the fnv1a algorithm. This is a very simple hash with ok-ish quality and very bad performance. See godbolt for a comparison.

String Types: String hashing performance is ok for libstdc++.

boost::hash ↑

Integral Types

Uses identity hash, therefore it should behave here exactly the same as std::hash. The same maps timeout: absl::flat_hash_map, absl::node_hash_map, emhash7::HashMap, emhash8::HashMap, jg::dense_hash_map, spp::sparse_hash_map, tsl::hopscotch_map, tsl::robin_map, tsl::sparse_map. As with std::hash, the hash values have no avalanching.

0x0000000000000000 → 0x0000000000000000 0000000000000000000000000000000000000000000000000000000000000000
0x0000000000000001 → 0x0000000000000001 0000000000000000000000000000000000000000000000000000000000000001
0x0000000000000002 → 0x0000000000000002 0000000000000000000000000000000000000000000000000000000000000010
0x0001000000000000 → 0x0001000000000000 0000000000000001000000000000000000000000000000000000000000000000
0x0002000000000000 → 0x0002000000000000 0000000000000010000000000000000000000000000000000000000000000000
0x0004000000000000 → 0x0004000000000000 0000000000000100000000000000000000000000000000000000000000000000
0x0001000000000001 → 0x0001000000000001 0000000000000001000000000000000000000000000000000000000000000001
0x0002000000000001 → 0x0002000000000001 0000000000000010000000000000000000000000000000000000000000000001
0x0004000000000001 → 0x0004000000000001 0000000000000100000000000000000000000000000000000000000000000001

String Types

String hashing performance is slow! It’s much slower than std::hash and it seems to use by far the slowest string hashing algorithm of all hashes that I tested.

absl::Hash ↑

Integral Types

This mixes the numbers well and has good avalanching properties. It is also fast, works well with all hashmaps. Here are some inputs and the corresponding hash values (in hex and in bits)

0x0000000000000000 → 0x9a21db8e77112ea2 1001101000100001110110111000111001110111000100010010111010100010
0x0000000000000001 → 0x3801ed914ce8fc3a 0011100000000001111011011001000101001100111010001111110000111010
0x0000000000000002 → 0xd5e1f799a1a0d391 1101010111100001111101111001100110100001101000001101001110010001
0x0001000000000000 → 0xc78a3c6e611a3b6a 1100011110001010001111000110111001100001000110100011101101101010
0x0002000000000000 → 0xf4f29e4e1be30032 1111010011110010100111100100111000011011111000110000000000110010
0x0004000000000000 → 0x4fc7420e2ffd79c3 0100111111000111010000100000111000101111111111010111100111000011
0x0001000000000001 → 0x656a0a715ae3e9f2 0110010101101010000010100111000101011010111000111110100111110010
0x0002000000000001 → 0x92d2a851201ad2aa 1001001011010010101010000101000100100000000110101101001010101010
0x0004000000000001 → 0xeda774111404ab5a 1110110110100111011101000001000100010100000001001010101101011010

String Types

String hashing performance is very fast, I believe it is based on wyhash.

ankerl::unordered_dense::hash ↑

A very fast hash across the board, with good avalanching properties. Note that this always returns an uint64_t and not a size_t like std::hash, so it produces the same 64bit numbers also on 32bit systems (and might be relatively slow on these). It also adds a property using is_avalanching = void; to the hash which makes it possible for hashmap implementations to differentiate their behavior based on the quality of the hash they are getting. This is used in ankerl::unordered_dense::map.

Integral Types

This hash uses a very simple but effective mixer. It performs a 128bit multiplication of the input with a constant and then XOR’s the input. It produces good avalanching (at least good enough for hashmaps to perform well), and can be calculated extremely fast. It usually compiles down to 4 assembly instructions (movabs, mov, mul, xor. See godbolt example). Note that for the input 0 the output is also 0. I believe it has worse mixing quality than the hash used in absl::Hash, but it is faster and generally good enough.

0x0000000000000000 → 0x0000000000000000 0000000000000000000000000000000000000000000000000000000000000000
0x0000000000000001 → 0x9e3779b97f4a7c15 1001111000110111011110011011100101111111010010100111110000010101
0x0000000000000002 → 0x3c6ef372fe94f82b 0011110001101110111100110111001011111110100101001111100000101011
0x0001000000000000 → 0x7c159e3779b97f4a 0111110000010101100111100011011101111001101110010111111101001010
0x0002000000000000 → 0xf82b3c6ef372fe94 1111100000101011001111000110111011110011011100101111111010010100
0x0004000000000000 → 0xf05678dde6e5fd29 1111000001010110011110001101110111100110111001011111110100101001
0x0001000000000001 → 0x1a4ce78e06f3035e 0001101001001100111001111000111000000110111100110000001101011110
0x0002000000000001 → 0x966045d78c388280 1001011001100000010001011101011110001100001110001000001010000000
0x0004000000000001 → 0x8e89016499af813f 1000111010001001000000010110010010011001101011111000000100111111

String Types

This uses wyhash. This is an extremely fast hash for both small and large strings with very high hashing quality. The code size is also relatively small which makes inlining easier for the compiler.

robin_hood::hash ↑

Integral Types

The hash of robin_hood::unordered_map has gone through quite a bit of evolution. Currently it is basically murmurhash3. It is relatively fast and should have relatively good avalanching properties.

0x0000000000000000 → 0x0000000000000000 0000000000000000000000000000000000000000000000000000000000000000
0x0000000000000001 → 0xff51afd792fd5b26 1111111101010001101011111101011110010010111111010101101100100110
0x0000000000000002 → 0xfea35fafa5fab64d 1111111010100011010111111010111110100101111110101011011001001101
0x0001000000000000 → 0x64b8f6aaf43afb55 0110010010111000111101101010101011110100001110101111101101010101
0x0002000000000000 → 0xc971ed55e875f6aa 1100100101110001111011010101010111101000011101011111011010101010
0x0004000000000000 → 0x92e3daab50ebed55 1001001011100011110110101010101101010000111010111110110101010101
0x0001000000000001 → 0x640aa68281b95f8c 0110010000001010101001101000001010000001101110010101111110001100
0x0002000000000001 → 0xc8c39d2d1e43425b 1100100011000011100111010010110100011110010000110100001001011011
0x0004000000000001 → 0x92358a834ff5498c 1001001000110101100010101000001101001111111101010100100110001100

String Types

Uses a slightly streamlined MurmurHash2 hash which is quite compact and fast.

mumx ↑

Simple hash that uses exactly the same algorithm as ankerl::unordered_dense::hash for integral types (but with a different multiplier constatn), and reverts to std::hash for all other types.

Integral Types

Same as ankerl::unordered_dense::hash, avalanching is good.

0x0000000000000000 → 0x0000000000000000 0000000000000000000000000000000000000000000000000000000000000000
0x0000000000000001 → 0x2ca7aea0ebd71d49 0010110010100111101011101010000011101011110101110001110101001001
0x0000000000000002 → 0x594f5d41d7ae3a92 0101100101001111010111010100000111010111101011100011101010010010
0x0001000000000000 → 0x1d492ca7aea0ebd7 0001110101001001001011001010011110101110101000001110101111010111
0x0002000000000000 → 0x3a92594f5d41d7ae 0011101010010010010110010100111101011101010000011101011110101110
0x0004000000000000 → 0x7524b29eba83af5c 0111010100100100101100101001111010111010100000111010111101011100
0x0001000000000001 → 0x49f082074577f69e 0100100111110000100000100000011101000101011101111111011010011110
0x0002000000000001 → 0x6739f7efb696cae7 0110011100111001111101111110111110110110100101101100101011100111
0x0004000000000001 → 0xa1cb1c3e5154b215 1010000111001011000111000011111001010001010101001011001000010101

String Types

Same as std::hash.

Conclusion

This benchmark, like the last one, was a lot more effort than I originally anticipated. Redoing the benchmark took my computer weeks, and I’ve completely revamped the presentation with the big table which hopefully is easier to browse and search than the charts I’ve done in the last benchmark. I had a look at each hashmap, the different hashes, and tried to figure out why they behave one way or the other.

So what’s actually the best map to choose? As you saw above, it depends. There are a lot of excellent implementations to choose from, each with different properties. I can’t give any advice here. Although I’ve spent a lot of time making the benchmarks as good as possible, they might not necessarily represent your real world needs.

All of the work here is available as open source here: https://github.com/martinus/map_benchmark

If you like my work, I’d really appreciate it if you can become my sponsor!

Errata

2022-09-07

I claimed that std::hash always uses identity hash. That’s not true, Microsoft’s STL uses fnv1a.

2022-09-08

I wrote that absl::Hash is the slowest for std::string. This is totally not true, that statement should have gone to boost::hash.

Table of Contents

Benchmark Infrastructure

Hardware

Software

Benchmarks

Stable References

Modifying Numbers Benchmarks

Copy

Insert then Erase 100M int

Random Insert & Access uint64_t

Random Insert & Erase uint64_t

Access & Find Benchmarks

Iterate

Find 1 – 200 uint64_t

Find 1 – 2000 uint64_t

Find 1 – 500k uint64_t

Modifying String Benchmarks

Random Insert & Erase string

Find String Benchmarks

Find 1 – 100k string

Find 1 – 1M string

Other

Memory Usage

Combined Results

Geometric Mean Number Find

Geometric Mean String Find

Geometric Mean All

Benchmark Results Table

Result Analysis

Containers

absl::flat_hash_map ↑

absl::node_hash_map ↑

ankerl::unordered_dense::map ↑

boost::multi_index::hashed_unique ↑

boost::unordered_map ↑

boost::unordered_map & PoolAllocator ↑

boost::unordered_map & boost::container::pmr::unsynchronized_pool_resource ↑

emhash7::HashMap ↑

emhash8::HashMap ↑

folly::F14NodeMap ↑

folly::F14ValueMap ↑

fph::DynamicFphMap ↑

gtl::btree_map ↑

gtl::flat_hash_map ↑

gtl::node_hash_map ↑

gtl::parallel_flat_hash_map ↑

gtl::parallel_node_hash_map ↑

jg::dense_hash_map ↑

robin_hood::unordered_flat_map ↑

robin_hood::unordered_node_map ↑

ska::bytell_hash_map ↑

ska::flat_hash_map ↑

spp::sparse_hash_map ↑

std::unordered_map ↑

std::unordered_map & PoolAllocator ↑

std::unordered_map & std::pmr::unsynchronized_pool_resource ↑

tsl::hopscotch_map ↑

tsl::robin_map ↑

tsl::sparse_map ↑

Hashes

std::hash ↑

boost::hash ↑

absl::Hash ↑

ankerl::unordered_dense::hash ↑

robin_hood::hash ↑

mumx ↑

Conclusion

Errata