Essential Linux Dev Stuff

I’ve recently started developing more in Linux, here is a collection of tools, tips and tricks to get the most out of the box. I’ll expand this list from time to time.

Appearance

Terminal & Editor font: Hack

Here is a test pattern I am using to evaluate fonts:

This is how the pattern looks like with Hack, my favourite programming font:

testpattern

All characters are very distinguishable, e.g. 0 and O, 1 and l. Subpixel hinting is also excellent.

colormake

Simple wrapper around make to colorize its output.

redshift

Install redshift-gtk, which is similar to f.lux but is more robust on my graphics card. I use this config for redshift (geoloc provider has problems).

In file ~/.config/redshift.conf:

Tools

Terminator

Multi-window terminal with lots of features.

ripgrep

Extremely fast grep tool, similar to ack but much faster, even faster than silver searcher. It searches through my 48GB subversion folder in 1 second. Also available for Windows! See here for installation instructions.

I’ve also added some aliases in ~/.bash_aliases:

geany

Replacement for gedit, for basic editing tasks. It’s not excellent but usually does the job. I’m not a vim/emacs guy.

Visual Studio Code

Nice and highly configurable editor, also great in Linux. Just open a directory and you are ready to go. Get it here. Essential extensions are:

There are many others, but these are the bare minimum I use.

glances

Glances gives a fantastic overview of what’s going on in the system. Using you can see at a glance if e.g. your memory, disk space, swap is running out. The version you’ll get with apt is unfortunately quite outdate, so I am installing it with pip (see the homepage).

glances

Useful Time Savers

Clear Terminal buffer

In ~/.bash_aliases, put this:

While Ctrl+L just clears the screen, cls will now clear the whole buffer. Convenient for a large build job and you want to scroll up to find something without scrolling too far.

Sleep Until

I use this ruby script to sleep until a given time. It’s in ruby so I can also easily use it in Windows:

I use this to schedule builds, so that when I arrive at the office everything is already built with the latest version so I can start working right away.

bash prompt with runtime and errorcode

See my Linux Bash Prompt post. It’s awesome.

screenshot_2016-11-07_06-24-23

Optimizations

ccache

Use ccache, and use it on an SSD disk. Here are my settings from ~/.ccache/ccache.conf:

I’m using this for a huge C++ project where I build debug, release both in 32bit and 64bit, so a large ccache helps. Using compression_level is only a very minor slowdown, but practically increases cache size by a factor of 3-5 or so.

Use tmpfs for /tmp

Much faster compilation. Add this to /etc/fstab:

Compressed RAM

Especially in virtual machines or low-end machines where memory is constrained, zram provides memory compression.

Reboot your machine, and everything is well configured. For more information, see

Disk Cleanup

Remove libreoffice

Remove old Kernels

Source: How to Easily Remove Old Kernels in Ubuntu 16.04

Virtualbox

Dynamic Disk Size

Don’t do it. It’s much slower. If you have to, be aware of this:

I’m using Linux in a Windows Host, my .vdi image is on an SSD. When using dynamic disk size, this unfortunately grows the space continuously when compiling a lot, because deleted files won’t shrink the .vdi image. But it is possible to do this with e.g. zerofree (a hazzle), or better with TRIM:

  • In the Windows host, enable SSD and TRIM support for the image (see here):

  • In Linux guest, perform TRIM, you’ll see the .vdi disk shrinking while this runs:

  • Add a nightly task to crontab with sudo crontab -e, there add this line:

I also tried to mount the image with the discard option (see here), but it caused my Linux to hang so I disabled it again

Mount shared folder

Add the user to the vboxsf group:

Edit /etc/fstab e.g. like this:

GNOME keyring Without Password Prompt

It’s unfortunate when a scheduled svn up asks for password because it has not been entered for a while. To disable the password question, do this:

  1. sudo apt install seahorse
  2. Follow this guide.

Visual Studio C++ 2015 Compiler Bug

I think I’ve unfortunately found a compiler bug in Visual Studio 2015, all the way back to at least VS2010. Here is the code:

Compile and run in 64bit, Release. This is the expected output (as with g++, clang++, or in debug):

This is what I get with 64bit, Release:

Also see my stackoverflow post. The compiler seems to optimize too much, and directly uses a 64bit number instead of making sure the 32bit value is casted to 64bit. I have already reported the bug to Microsoft, and wait for feedback.

Linux Bash Prompt

Here is my bash prompt, with the following features:

  • Red ✘ if the previous command has failed, otherwise a green ✔.
  • Shows running time of the previous command
  • Separate line for path and command

Example

screenshot_2016-11-07_06-24-23

Installation

Add this to your .bashrc:

Updates

  • 2017-04-28: Now prints days, hours, minutes, seconds. Much better readable for long running tasks.
  • 2016-11-04: Initial version

Sources

Very Fast HashMap in C++: Benchmark Results (Part 3)


As promised in part 2 of this series, I have some interesting benchmarks of the different hash map variants. In this part I will discuss the results, and present my conclusions. Click any benchmark to get a much larger graph.

How I Benchmark

I’m inserting 100 million int -> int pairs into different hashmap variants, and every 100000 inserts I am taking measurements (insertion time, memory usage, time to lookup 1 million elements, where half of the lookups fail.). As a hash function I am using std::hash. My computer is an Intel i5-4670 @ 3.4 GHz, 16GB RAM, Visual Studio 2015 Update 3, 64bit.

Insertion Time Benchmark

all_insertion_time

In this graph the reallocations are very clearly visible. All variants are doubling the size of the internally used array when it gets full. As expected in part 2, it can be seen that insertion time slows down when “Robin Hood with Infobyte” gets full, and also when “Robin Hood with Infobyte & Fastforward” gets full. This is due to the fact that more and more data needs to be moved around so the hashmap can stay sorted.

Dealing with the fastforward field has a clear overhead associated with it, insertion is quite a bit slower.

For “Robin Hood with Infobits and Hashbit”, insertion time is very fast: mostly because the offset is limited to 16, if it would get larger the hashmap’s size is doubled. Hopscotch insertion is almost as fast, just a tad slower.

All variants are much faster than std::unordered_map. The comparison is a bit unfair, because in my implementations I do not have to support the same interface as std::unordered_map does.

Memory Usage

all_memory

Robin Hood with Infobyte is the most compact representation. I am resizing the map when it gets 95% full, or when an overflow occurs which is highly unlikely when using an appropriate hash function. There is just 1 byte overhead for each entry.

The second best representation is “Robin Hood with Infobyte & Fastforward”. This has exactly the same reallocation properties as when just using the infobyte, but it uses one additional byte per bucket.

When limiting the maximum offset as in “Robin Hood with Infobits & Hashbits”, resizes are occuring much quicker (maximum offset is 15 instead of 127), so the memory usage jumps up earlier.

For Hopscotch this reallocation seems to happen still a bit earlier. The overhead in my implementation is 16 bit, so the line overlaps a lot with the Infobyte & Fastforward variant.

All my implemented variants need much less memory than std::unordered_map. While std’s implementation allocates new memory for each insert, in all my variants the memory stays constant until reallocation occurs.

Lookup Time

all_lookup_time

This is probably the most interesting benchmark for most use cases, and I find this graph extremely educational and interesting.

First, the Robin Hood with Infobyte variant: Lookup time starts pretty fast, but it gets slower and slower as the hashmap gets full. This is due to the longer and longer chains that needs to be traversed to find the correct element. So the highly compact format is bought with ever increasing lookup (and insertion) times. Still, lookup is almost always faster than lookup for std::unordered_map.

The Robin Hood with Infobits & Hashbits variant shows the same behaviour, lookup time goes up when the hashmap gets full. This increase is capped early though due to the earlier reallocations. So faster lookup times can be bought with earlier reallocations and more memory usage.

To improve lookup times compared to the infobyte variant, I have introduced the fastforward byte to quickly skip directly to the elements that belong to a bucket. That’s the gray line. When the hashmap is quite empty it seems that extracting this fastforward information has quite a bit of an overhead. When implementing this variant I did not expect such a large overhead. It seems to work very well though in improving lookup time when the map gets very full, as can be seen around the 60 million elements line. Here, lookup for the variant with just the infobyte takes 0.0676 seconds, and when using fastforward byte it takes 0.0487 seconds, 38% faster. Also the lookup time is quite linear.

Last but not least, the Hopscotch variant. After spending quite some time tuning, this variant turned out as the fastest of all. Lookup time does not have the spiky behaviour that robin hood has, and it is extremly fast. One trick I am using is that I using that I didn’t see anywhere else mentioned, is that I shift the whole hop field and can stop the while loop as soon as the hop field is zero. So when only one element is there and it is in the first bucket, I need to check just one bit of the hopfield. When the element is not there, hopfield is zero so no bits have to be checked at all.

Summary

summary
I’ve summed up the average insertion time, average lookup time, and average memory usage over all the different samples. std::unordered_map is the reference with 100%. Here are my conclusions:

  • The HopScotch implementation is the best overall for most use cases, if you have a reasonable hash function. Insertion time is just a tad slower than the fastest variant, memory usage is a bit higher than the other variants but still almost 3 times better than std::unordered_map, and lookup is blazingly fast with almost twice the speed as unordered_map. 0.0404 seconds to look up 1 million elements, or just 40.4 nanoseconds per lookup on average is not too shabby.
  • If your hash function is bad or compactness is of utmost importance, use the Robin Hood with Infobyte variant.
  • If the std interface is important, stick to std::unordered_map. While it uses lots of memory, lookup times seem to be ok and fairly consistent.

That’s my tour of hashtable implementations. You can find my code on Github at martinus/robin-hood-hashing. It still needs lots of cleanup, and most importantly, the ability to remove elements…

Very Fast HashMap in C++: Implementation Variants (Part 2)


In part 1 I have discussed Hopscotch and Robin Hood Hashing table. Since then I have implemented several hashmap variants that combine some tricks from these two variants, and have some interesting results to share. Here are the variations that I have implemented:

Robin Hood with Infobyte

Robin Hood Hashing uses the hash value to calculate the position to place it, than does linear probing until it finds an empty spot to place it. While doing so it swaps out entries that have a lesser distance to its original bucket. This minimizes the maximum time a lookup takes. For a lookup, it is only necessary to lineary probe until the distance to the original bucket is larger than the current element’s distance.

In this “Infobyte” implementation variant, instead of storing the full 64 bit hash value I directly store the distance to the original bucket in a byte. One bit of this byte is used to mark the bucket as taken or empty. The bit layout is defined like this:

infobyte

When storing four elements a, b, c, d, where a, b map to hash position 1 and c, d to hash position 2, this is how they would be stored:

infobyte_layout

The advantage is that I have just 1 byte instead of 8 overhead, and linear probing is faster because I need only comparison operation on a single byte instead of calculating offsets with the hash. A byte is more than enough for the offset, since on average the distance to the the original bucket is very small. Here is the full code for lookup:

This search is very simple and straightforward, and fast.

Robin Hood with Infobits & Hashbits

Using a full byte for the offset (actually 7 bit) is fairly large. This would allow an offset of 127, which would lead to a very long and slow linear probing. My idea then is to limit the offset to a much lower maximum number. The remaining bits of the byte can be used to store a few bits from the hash, so that we can use these few bits to reduce the number of key comparisons. Also these hashbits could be used when resizing the hashmap so we would not have to rehash all elements each time we resize. After some tuning, this is the bit layout I am using for this variant:

infobyte_hashbits

The much smaller offset means that when inserting an element and the maximum offset is reached, I have to resize the hashmap. This is intentional, since very large offset would lead to slow searches. This way this hashing variant is more similar to Hopscotch. Here is the full code to lookup an element with this byte:

The find code is almost exactly like in the pure Infobyte variant, except that I am or’ing 3 hash bits into the info byte, and instead of ++info I am using info += Traits::OFFSET_INC where OFFSET_INC is 16 (1000b).

Robin Hood with Infobyte & Fastforward

In the first variant of Robin Hood Hashing finding an element will be slow as the hashmap gets full, because a lot of elements need to be skipped before finding the full one. Robin hood hashing has the property that it always keeps elements that belong to the same bucket together. The infobyte introduced in the first variant basically acts as a backpointer to the bucket that element originally belongs to, and it is possible to introduce another byte that is stored at the original bucket, that acts as a forward pointer to where the first element is stored for this original bucket.

Using the above example where a,b maps to 1 and c,d to 2, fastforward byte looks like this:

infobyte_fastforward

When looking up element c, the fastforward byte can be used to skip one element, so the while (info < _info[idx]) { ... } can be replaced with the skip operation. Here is the code:

I am using a struct that contains two bytes: info field, and fastforward. Now I have two bytes overhead for each entry, but can quickly skip lots of buckets which should be especially useful when the hashmap is very full.

Hopscotch

Hopscotch can be seen as quite similar to the robin hood hashing with fastforward, except that it replaces the info and fastforward byte with a bitmask. With some tuning I am now using a 16 bit field, where 1 bit is used to define if the current bucket is full, and the other bits define which offsets to the current bucket are taken.

hopscotch_mask

If the first bit is set, the current bucket is taken; but not nessarily by the bucket from this hop bitfield. This bit is not strictly necessary because that information is also encoded in the different hop tables, but it speeds up insertion code. With 16 bits available, this limits the hop bits to 15 though. As with “Robin Hood Hashing with Infobits & Hashbits”, when the hop bits are full, the hashmap has to be resized.

Benchmark Results

I’m leaving the most important information for part 3: how do all these variant perform? What’s best?