Category Archives: rant

Cleverness Considered Harmful

I have just read this nice quote at the stackoverflow question “Why is cleverness considered harmful in programming by some people?“:

Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.
– Alan Perlis

Which reminds me of a little code piece I have written recently. I’ve recently tried to implement a small, little parser for a very simple custom data format, in C++. To do this, I have tried several approaches:

1. Boost.Spirit

Since we use Boost in our projects, I have started reading about Boost.Spirit, and took some time to decipher the tutorials which contains code like this:
[cpp]bool r = phrase_parse(first, last,
// Begin grammar
(
‘(‘ >> double_[ref(rN) = _1]
>> -(‘,’ >> double_[ref(iN) = _1]) >> ‘)’
| double_[ref(rN) = _1]
),
// End grammar
space);[/cpp]

After half an hour I got annoyed because it simply is too much effort. I don’t care how well thought out the library is and how powerful it is, it is simply unusable. Maybe I am too stupid, but I am sure that even when I manage to understand it enough to write a decent parser, half a year later I can never debug my code again: it’s simply too clever.

2. Coco

I’ve ditched Boost.Spirit, and tried to use Coco. I am unfamiliar with this but have seen a colleague use it, so I gave it a try. I was reading the documentation, which looks nice but has 42 pages and since I am a lazy bastard I stopped right there because I just want to get something working, and quickly.

3. Hand Written Large Switch

I have ditched Cocomo, and started to write my own, very simple code that basically looked like this:

[cpp]while (instream >> sym) {
switch (symbol_map[sym]) {
case START:
// do this
break;
case WHATEVER:
// do that
break;
}
}[/cpp]
After just 10 minutes I got a minimal parser that worked good enough and was extremly readable and understandable code. Everybody with basic C++ understanding can skim over this code and get it. The number of WTF’s per minute when reading this code is close to zero. I am very happy with this approach, and it really rocks because it is so dead simple.

You can rightfully say that a simple switch is very inflexible, and not extensible. You are right, but who cares? Almost all code that I have seen that was planed ahead for flexibility that you might need, gets too complicated because what you planed ahead for might never be needed; even worse: most of the time you need flexibility that you cannot know in advance, it only becomes apparent when you have something and running and use it for a while.

Final Words

Back to the original quote, based on my experience I would extend it a bit:

Ignorants add complexity; fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.
– Martin Ankerl

If you find this interesting you might also like consider reading the Three Laws of Software Development.

What do you think?

Job Interview Question: Sorting Records

Dev102.com has a few postings about interesting job interview question. His fifth challenge is this neat task:

You are asked to sort a collection of records. The records are stored in a file on the disk, and the size of the file is much bigger than available memory you can work with. Assume a record size is 10 bytes and file size is 1GB. you read and write data to/from disk in units of pages where each page is 1KB big. You have few (less than 10) available pages of physical memory. Assume you are given a method to compare two records.

Here is how I would approach this problem. It was just a quick direct writeup of the thoughts that popped up in my mind:

  1. Easiest solution: use Unix’s sort tool, with -S option to specify max memory usage.
  2. If this is not allowed, the second easiest solution is to implement a simple Merge sort, and use multiple passes with one temporary file until everything is sorted.
  3. If speed is an issue, you can have a pre-pass before the merge sort that loads as much data junks into memory as possible, sorts this junk with Quicksort, and write each sorted junk back into the temporary file.
  4. If you want to go even faster, you can go crazy with 10 passes at once and use a sorting network to merge these 10 streams. Here comes the point where you should benchmark before coding, because caching, harddisk properties etc. might have an enormous influence.

I believe this is a quite reasonable approach. It is important to have multiple solutions for a problem to select from, then you can choose the simplest one that works. This is the Do The Simplest Thing That Could Possibly Work (DTSTTCPW) – Principle. This approach is very important in test driven development, and in general a big part of the scientific method (see also Occams’ Razor). If the simplest solution is enough, that’s great! If this is not enough, you can try again with the second simplest solution. Repeat until the problem is solved :-)

I have use this approach for a long time now, and its really productive. Never go for cool and interesting but complex solutions just because they are cool and interesting. These kind of solutions are just for your own ego. There are even 10 commandments for egoless programming.