Table of Contents
- Overview
- Construction Benchmarks
- Modifying Benchmarks
- Accessing
- Conclusion
The core of the benchmark is this loop:
Map<uint64_t, uint64_t> map;
for (size_t i = 0; i < 50'000'000; ++i) {
map.emplace(rng() & bitMask, i);
map.erase(rng() & bitMask);
}
On first glance it looks similar to the previous benchmark, but it bechmarks something completely different. Each iteration it randomly inserts an element, and then randomly erases an element. Instead of using a maximum number for the random number generator, here I am using a bit mask to extract several bits. The purpose of this is to ensure the maps work still well, even when numbers are not sequentially or small and don’t always change in the lower bits.
This benchmark loop is repeated 6 times. Initially, 4 random bits are set in bitMask. After each benchmark 4 additional bits are set, thus the number of available random numbers increases and the map will find a larger equilibrium. Here is the list of bitMasks used in the benchmark:
10010000000000000000000000000000000000000001000000000000000010004 bits set, 16 distinct numbers. Equilibrium will be around 16/2 = 8 entries, so the map stays very small.10010000000000100011000000000000000000000001010000000000000010008 bits set, 256 distinct numbers. Averaging 128 entries.100100000000011000110000000000000001000000010110000000000000100112 bits set, 4096 distinct numbers, averaging 2048 entries100100000000011000110000000100000001000000010111000000010010100116 bits set, 65k distinct numbers, 32.8k entries equilibrium.110110000000011000110000100100000001000000010111000100010010100120 bits set, 1M distinct numbers, 524k entries equilibrium.110110000000111000110000100100100001000010010111000100010010101124 bits set, 16.8M distinct numbers, 8.4M entries equilibrium.
So the map’s average size increases by a factor of 16 each benchmark. Ideally a hashmap has O(1) amortized operations, so the speed should be constant regardless the size. This is unfortunately not the case, mostly due to lots and lots of cache misses the larger you get.
Results
Hashes
Yet another win for robin_hood::hash, and absl::Hash comes in second. Not that this time libstdc++-v3 hash has extremely bad behavior - as predicted. Lots of map implementations simply timeout. Notably, a few hashmaps still work with reasonable performance, I suspect that they have some kind of bad-hash-prevention built to protect against dumb hashes like libstdc++-v3. Having this bad-hash-prevention unfortunately also incures an overhead when a good hash is used.
Hashmaps
tsl::robin_map wins again, with robin_hood::underered_flat_map closely behind. Again, robin_hood::unordered_flat_map has lower memory usage.
Interestingly, the next on the pareto front is robin_hood::unordered_node_map. Even though there is additional memory and runtime overhead due to the indirection necessary for a node maps, this map manages to get to the pareto front. It does so with a few tricks:
- It uses a custom bulk pool allocator. When entries are removed, the memory is not freed but put back into a pool to be reused when another entry is inserted.
- The memory requirement of flat maps usually have a large peak while resizing. This peak can be somewhat diminished with a node-based map, because the flat map only contains pointers instead of actual data. So in this case, the node is 16 bytes (a
pair<uint64_t, uint64_t>), while a pointer is just 8 bytes. Thus, peak memory can be lower androbin_hood::unordered_node_maptakes advantage of that.
I find it quite amusing that in this benchmark, which consists of purely inserting and removing elements, two hashmaps with robin-hood hashing technique are the fastest. It is often assumend that due to the need to shuffle data around in these implementations they are slower than other techniques, but in practice this seems to not be the case.
Chart
Each entry is total runtime for 50M emplace() and erase() of a random number with a bitmask.
- blue: 4 bits set
- orange: 8 bits set
- green: 12 bits set
- red: 16 bits set
- magenta: 20 bits set
- brown: 24 bits set