Google Analytics Experiments is Buggy

Google Analytics has a very nice feature: it can be used for A/B testing, in a purely client-side way that just uses javascript. I am using this to improve my keto calculator website. Interestingly, Google uses a multi-armed bandit implementation, which continuously adjusts the A/B split ratio while the experiment is running.
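The client-side part is tiny; it boils down to something like the following sketch (the experiment ID and the way a variation is applied are placeholders, not my actual code):

    <!-- load the Content Experiments client-side API for one experiment -->
    <script src="//www.google-analytics.com/cx/api.js?experiment=YOUR_EXPERIMENT_ID"></script>
    <script>
      // Ask Google which variation this visitor should see. The answer is
      // driven by the multi-armed bandit weights computed on Google's side.
      var variation = cxApi.chooseVariation();

      // 0 means the original page; anything else selects a variant.
      if (variation === 1) {
        document.documentElement.className += ' variant-1';
      }
    </script>

As far as I can tell, chooseVariation() also stores the choice in a cookie, so returning visitors keep seeing the same variant.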

Unfortunately, Google’s implementation is quite buggy. Here are a few problems that I ran into while testing:

Inaccurate Numbers

While testing, Google Analytics shows a “Probability of Outperforming Original”. This number is often incorrect and does not match what is actually used for the homepage A/B split ratio. It seems that the numbers in the Analytics UI are calculated from data that is only partially up to date. In reality the probability should be at least 99%, but the UI shows 84.1%. The number is updated about twice a day, and it jumps around wildly even though the relative ratio between the A and B results barely changes.
[Screenshot: the experiment’s “Probability of Outperforming Original” graph, showing 84.1%]

Incorrect Graphs

The above graph is sometimes not up to date. It seems that the graph data can be delayed by 1–2 days. Sometimes it shows the current day as having no data, even though data is already recorded in the “AdSense” view.

By the way, when optimizing for AdSense revenue, the “AdSense” view seems to be much more accurate than the “Conversions” view. What really counts is the “AdSense eCPM” value (which really should be renamed to RPM).
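To put illustrative numbers on it (not my real data): $4 of AdSense revenue on 2,000 experiment pageviews is an eCPM/RPM of 4 / 2,000 × 1,000 = $2 per thousand pageviews, and that per-variant value is what you actually want to compare.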

Incorrect/Inaccurate Results with Multiple Variants

I’ve tried to use the multi-armed bandit to find the optimal variant out of 20 different ones. This seems to have failed completely. Looking at the experiments.js file that Google embeds, you can see that some variants had a selection probability of 0, so they were not considered at all any more. With the multi-armed bandit, Google’s selection probability for each variant is normally exactly the probability of it outperforming the original. The javascript file for the above screenshot contains these selection weights directly.

So even though the screenshot says an 84.1% probability of outperforming, the file lists a selection probability of 98.1% for that variant, which should be the more accurate number. In my test with 20 variants, about 10 of them had a weight of 0, which means they were never shown to users. When I noticed this, I added my own selection code: in 50% of the cases I just present a random variation to the user, so that the 0-probability variants have a chance to actually be seen. While this had some effect on the results, it still seems that the multi-armed bandit just cannot deal with so many variants at the same time. Some variants that are actually very similar got completely different probabilities of outperforming the original.
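The 50% override is roughly this sketch (not the exact code; the variant count is hard-coded here, and cxApi is Google’s client-side experiments API):

    // In 50% of cases, bypass the bandit and pick a variation uniformly at
    // random, so that variants with a weight of 0 still get some traffic.
    var NUM_VARIATIONS = 20; // original + 19 variants in my test

    var chosenVariation;
    if (Math.random() < 0.5) {
      chosenVariation = Math.floor(Math.random() * NUM_VARIATIONS);
      cxApi.setChosenVariation(chosenVariation);
    } else {
      // Otherwise let the multi-armed bandit decide as usual.
      chosenVariation = cxApi.chooseVariation();
    }
    // ...then render the page for chosenVariation as before.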

Verdict

All in all, Google Analytics is really excellent for A/B testing, but it has a few quirks that one should be aware of. When A/B testing, a chi-squared test for comparison is always helpful. For example, a chi-squared test of the click-through ratio in the above example (ok, it’s not the same as testing AdSense revenue like Google does, but the difference should be marginal in my case) shows that at a confidence level of 99% the variant is more successful.
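For reference, a minimal sketch of such a chi-squared test on a 2×2 table (clicked vs. not clicked, original vs. variant), with made-up counts:

    // Pearson chi-squared test on a 2x2 table: clicked vs. not clicked,
    // original vs. variant. The counts below are made up.
    function chiSquared2x2(aClicks, aTotal, bClicks, bTotal) {
      var observed = [
        [aClicks, aTotal - aClicks],
        [bClicks, bTotal - bClicks]
      ];
      var rowSums = [aTotal, bTotal];
      var colSums = [aClicks + bClicks, (aTotal - aClicks) + (bTotal - bClicks)];
      var total = aTotal + bTotal;
      var chi2 = 0;
      for (var r = 0; r < 2; ++r) {
        for (var c = 0; c < 2; ++c) {
          var expected = rowSums[r] * colSums[c] / total;
          var diff = observed[r][c] - expected;
          chi2 += diff * diff / expected;
        }
      }
      return chi2;
    }

    // With 1 degree of freedom, chi2 > 6.63 means the difference in
    // click-through ratio is significant at the 99% confidence level.
    console.log(chiSquared2x2(120, 4000, 180, 4100) > 6.63); // true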

Beautiful Git Logs

git has very configurable logging options. I’ve played with the configuration for a while and found a pair of awesome aliases that look just beautiful; the screenshots below show their output, and a sketch of the definitions follows them. They only work as of git 1.8.3 (March 24, 2013) because they use auto coloring.

git l

[Screenshot: output of git l]

git ls

[Screenshot: output of git ls]
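The definitions boil down to a .gitconfig snippet along these lines (a sketch relying on %C(auto) coloring; the exact format strings are approximate and may differ from the screenshots):

    [alias]
        # compact, colored graph of the current branch
        l  = log --graph --abbrev-commit --decorate --format=format:'%C(auto)%h%d%C(reset) %s %C(dim white)(%ar, %an)%C(reset)'
        # the same graph, but across all branches
        ls = log --graph --abbrev-commit --decorate --all --format=format:'%C(auto)%h%d%C(reset) %s %C(dim white)(%ar, %an)%C(reset)'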

FreeArc Compression Settings

I have been playing around with compression tools for quite a while, and FreeArc is one of the best of the bunch. It has the best compression/runtime trade-off of any compression tool I know. Here are a few parameter sets I especially like:

Fast compression

Uses 4 cores to the max on a 4 GB RAM machine, at ~10 MB per second. The compression ratio is usually better than 7z at its maximum setting, while being much faster. Decompression is very fast.

Maximum compression

Requires 2 passes on a 4 GB RAM machine, at ~1 MB per second. It usually compresses better than FreeArc’s default maximum compression. Decompression is fast too, and requires only one pass.

Extreme compression

Can take practically forever, needs the precomp042 and srep executables, and requires two temporary files that are about as large as (or larger than) the whole uncompressed data set.