Table of Contents


The core of the benchmark is this loop:

Map<uint64_t, uint64_t> map;
for (size_t i = 0; i < 50'000'000; ++i) {
    map.emplace(rng() & bitMask, i);
    map.erase(rng() & bitMask);
}

On first glance it looks similar to the previous benchmark, but it bechmarks something completely different. Each iteration it randomly inserts an element, and then randomly erases an element. Instead of using a maximum number for the random number generator, here I am using a bit mask to extract several bits. The purpose of this is to ensure the maps work still well, even when numbers are not sequentially or small and don’t always change in the lower bits.

This benchmark loop is repeated 6 times. Initially, 4 random bits are set in bitMask. After each benchmark 4 additional bits are set, thus the number of available random numbers increases and the map will find a larger equilibrium. Here is the list of bitMasks used in the benchmark:

  1. 1001000000000000000000000000000000000000000100000000000000001000 4 bits set, 16 distinct numbers. Equilibrium will be around 16/2 = 8 entries, so the map stays very small.
  2. 1001000000000010001100000000000000000000000101000000000000001000 8 bits set, 256 distinct numbers. Averaging 128 entries.
  3. 1001000000000110001100000000000000010000000101100000000000001001 12 bits set, 4096 distinct numbers, averaging 2048 entries
  4. 1001000000000110001100000001000000010000000101110000000100101001 16 bits set, 65k distinct numbers, 32.8k entries equilibrium.
  5. 1101100000000110001100001001000000010000000101110001000100101001 20 bits set, 1M distinct numbers, 524k entries equilibrium.
  6. 1101100000001110001100001001001000010000100101110001000100101011 24 bits set, 16.8M distinct numbers, 8.4M entries equilibrium.

So the map’s average size increases by a factor of 16 each benchmark. Ideally a hashmap has O(1) amortized operations, so the speed should be constant regardless the size. This is unfortunately not the case, mostly due to lots and lots of cache misses the larger you get.

Results

Hashes

Yet another win for robin_hood::hash, and absl::Hash comes in second. Not that this time libstdc++-v3 hash has extremely bad behavior - as predicted. Lots of map implementations simply timeout. Notably, a few hashmaps still work with reasonable performance, I suspect that they have some kind of bad-hash-prevention built to protect against dumb hashes like libstdc++-v3. Having this bad-hash-prevention unfortunately also incures an overhead when a good hash is used.

Hashmaps

tsl::robin_map wins again, with robin_hood::underered_flat_map closely behind. Again, robin_hood::unordered_flat_map has lower memory usage.

Interestingly, the next on the pareto front is robin_hood::unordered_node_map. Even though there is additional memory and runtime overhead due to the indirection necessary for a node maps, this map manages to get to the pareto front. It does so with a few tricks:

  • It uses a custom bulk pool allocator. When entries are removed, the memory is not freed but put back into a pool to be reused when another entry is inserted.
  • The memory requirement of flat maps usually have a large peak while resizing. This peak can be somewhat diminished with a node-based map, because the flat map only contains pointers instead of actual data. So in this case, the node is 16 bytes (a pair<uint64_t, uint64_t>), while a pointer is just 8 bytes. Thus, peak memory can be lower and robin_hood::unordered_node_map takes advantage of that.

I find it quite amusing that in this benchmark, which consists of purely inserting and removing elements, two hashmaps with robin-hood hashing technique are the fastest. It is often assumend that due to the need to shuffle data around in these implementations they are slower than other techniques, but in practice this seems to not be the case.

Chart

Each entry is total runtime for 50M emplace() and erase() of a random number with a bitmask.

  1. blue: 4 bits set
  2. orange: 8 bits set
  3. green: 12 bits set
  4. red: 16 bits set
  5. magenta: 20 bits set
  6. brown: 24 bits set