
Job Interview Question: Sorting Records has a few postings about interesting job interview questions. His fifth challenge is this neat task:

You are asked to sort a collection of records. The records are stored in a file on disk, and the size of the file is much bigger than the available memory you can work with. Assume a record size is 10 bytes and the file size is 1 GB. You read and write data to/from disk in units of pages, where each page is 1 KB big. You have a few (less than 10) available pages of physical memory. Assume you are given a method to compare two records.

Here is how I would approach this problem. It was just a quick direct writeup of the thoughts that popped up in my mind:

  1. Easiest solution: use Unix’s sort tool, with -S option to specify max memory usage.
  2. If this is not allowed, the second easiest solution is to implement a simple Merge sort, and use multiple passes with one temporary file until everything is sorted.
  3. If speed is an issue, you can have a pre-pass before the merge sort that loads as much data into memory as possible in chunks, sorts each chunk with Quicksort, and writes each sorted chunk back into the temporary file.
  4. If you want to go even faster, you can go crazy and merge 10 streams at once, using a sorting network to pick the smallest head element. This is the point where you should benchmark before coding, because caching, hard disk properties etc. might have an enormous influence.
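To make steps 2 and 3 concrete, here is a rough sketch of a chunked sort plus k-way merge. It is a simplification, not a literal solution to the task above: it uses one text line per record instead of fixed 10-byte records, the built-in sort instead of a hand-rolled Quicksort, and a priority queue instead of a sorting network. All class and method names are my own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    // maxRecordsInMemory stands in for the "few pages" of available memory.
    static Path sort(Path input, int maxRecordsInMemory) throws IOException {
        // Pass 1: read memory-sized chunks, sort each in memory,
        // and write every sorted chunk out as a "run".
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> chunk = new ArrayList<>();
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                chunk.add(line);
                if (chunk.size() >= maxRecordsInMemory) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
        }

        // Pass 2: k-way merge. The heap always yields the run whose
        // current head record is smallest.
        Path output = Files.createTempFile("sorted-", ".txt");
        PriorityQueue<Run> heap =
                new PriorityQueue<>(Comparator.comparing((Run r) -> r.head));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run);
                readers.add(reader);
                String head = reader.readLine();
                if (head != null) heap.add(new Run(head, reader));
            }
            while (!heap.isEmpty()) {
                Run smallest = heap.poll();
                out.write(smallest.head);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) heap.add(new Run(next, smallest.reader));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
        return output;
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        chunk.sort(null); // natural order; the given record comparator would go here
        Path run = Files.createTempFile("run-", ".txt");
        Files.write(run, chunk);
        return run;
    }

    /** One open run with its current head record. */
    private static final class Run {
        final String head;
        final BufferedReader reader;
        Run(String head, BufferedReader reader) { this.head = head; this.reader = reader; }
    }
}
```

With 1 GB of data and 1 KB pages you would of course merge page-buffered byte records rather than lines, but the two-pass structure is the same.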

I believe this is quite a reasonable approach. It is important to have multiple solutions to a problem to select from; then you can choose the simplest one that works. This is the Do The Simplest Thing That Could Possibly Work (DTSTTCPW) principle. This approach is very important in test-driven development, and in general it is a big part of the scientific method (see also Occam's Razor). If the simplest solution is enough, that's great! If it is not enough, you can try again with the second simplest solution. Repeat until the problem is solved :-)

I have used this approach for a long time now, and it's really productive. Never go for cool and interesting but complex solutions just because they are cool and interesting. These kinds of solutions are just for your own ego. There are even 10 commandments for egoless programming.

Three Laws of Software Development

secretGeek has a nice blog entry about the Three Laws of Software Development, inspired by Isaac Asimov's Three Laws of Robotics (minor adaptations by me):

  1. A developer must write code that creates value.
  2. A developer must make their code easy to maintain, except where such expenditure will conflict with the first law.
  3. A developer must reduce their code to the smallest size possible, as long as such reduction does not conflict with the first two laws.

That's about all there is to software development. Print this out, make copies for all your coworkers, and engrave it in stone tablets so that our children's children will still be able to live by this timeless wisdom!

Statistical Unit Tests with ensure4j

As part of another project I am developing ensure4j. The syntax (see the examples here) works quite nicely, and ensure4j is already very useful for internal use.

Lately I was busy adding tests that are able to verify whether some code (e.g. an optimizer that uses randomness, like a genetic algorithm, simulated annealing, …) produces the desired result in, say, 95% of the cases (Wikipedia has a nice practical example for confidence intervals).
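The underlying idea can be sketched independently of ensure4j. This is not ensure4j's actual algorithm, just a minimal illustration with names of my own choosing: run the randomized code many times, and use a normal-approximation confidence interval on the observed success rate to decide whether the true rate is plausibly at least the target.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a statistical check (not ensure4j's implementation):
// the test passes only if the *lower* 95% confidence bound of the observed
// success rate clears the target rate.
public class StatisticalCheck {

    static boolean succeedsOften(BooleanSupplier trial, int runs, double target) {
        int successes = 0;
        for (int i = 0; i < runs; i++) {
            if (trial.getAsBoolean()) successes++;
        }
        double p = (double) successes / runs;
        // Normal-approximation standard error of the observed proportion;
        // 1.96 is the two-sided z value for ~95% confidence.
        double stderr = Math.sqrt(p * (1 - p) / runs);
        return p - 1.96 * stderr >= target;
    }
}
```

Requiring the lower confidence bound to clear the target is deliberately conservative: a routine whose true success rate sits exactly at the target will often fail this check, which is usually what you want in a regression test.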

Example Usage

Here is an example that tests the nonsense code Math.random() * 2.
[sourcecode language="java"]
ensure(new Experiment() {
    public double measure() {
        return Math.random() * 2;
    }
}).between(0.9, 1.1, 0.95).sample(10, 100);
[/sourcecode]

The code most likely does not make much sense out of context like this, so here is an explanation of what it does:

Read more