Monthly Archives: September 2008

Data set competitions

One of the reproducibility problems with many current papers is that authors apply their new algorithm to their own data set. I did the same in my super-resolution work. The problem is that it is very difficult to assess whether a data set was used (a) because it was the one the author had at hand, (b) because it was the most representative one, or (c) because the algorithm performed best on it.

To allow fairer comparisons, competitions are being set up in various fields. Often, in the period before a conference, the organizers provide a common data set on which everyone can try their algorithm.


Reproducible Research History (1)

To my knowledge, the reproducible research efforts in computational sciences were started by Jon Claerbout (who retired earlier this year) in the early 90s. In the Stanford Exploration Project at Stanford University, Claerbout and his colleagues (working in seismic imaging) developed a system using Makefiles that makes it possible to remove all figures and regenerate them with a single Unix command. This allows anyone with a Unix/Linux system to reproduce all the results in their work. I think it is about as close to “one-click reproducibility” as one can get! Claerbout and his lab did much of the pioneering work in promoting reproducible research, which later spread to various disciplines. A history by Claerbout himself is available here.
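To make the idea of removing all figures and regenerating them a bit more concrete, here is a minimal Python sketch. It is not Claerbout's Makefile system, only an illustration of the same principle under my own assumptions: every figure has a recipe that rebuilds it from code and data. The names `burn_and_rebuild` and `make_sine_table` are hypothetical, and a small text table stands in for a real figure.

```python
import math
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def burn_and_rebuild(fig_dir: Path, recipes: Dict[str, Callable[[Path], None]]) -> List[str]:
    """Delete every figure that has a recipe, then regenerate it from scratch."""
    rebuilt = []
    for name, recipe in recipes.items():
        target = fig_dir / name
        target.unlink(missing_ok=True)  # "burn": remove the old figure (Python 3.8+)
        recipe(target)                  # "build": regenerate it from code and data
        if not target.exists():
            raise RuntimeError(f"recipe for {name} did not produce a file")
        rebuilt.append(name)
    return rebuilt


def make_sine_table(target: Path) -> None:
    # Hypothetical stand-in for a real plotting script: writes a small data table.
    rows = [f"{i}\t{math.sin(i / 10):.6f}" for i in range(5)]
    target.write_text("\n".join(rows) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(burn_and_rebuild(Path(tmp), {"sine.txt": make_sine_table}))
```

The point of deleting before rebuilding is that a stale figure can never survive unnoticed: if a recipe is broken, the run fails instead of silently keeping the old file.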

In their work, Claerbout and his colleagues distinguish between three types of figures/results. First, and most common, there are easily reproducible results, which a reader can reproduce using the code and data contained in the electronic document. Second, conditionally reproducible results are results for which the commands and data are given, but which either depend on certain resources being available (such as Matlab or Mathematica) or take more than 20 minutes to reproduce. Finally, non-reproducible results is the label used for results that cannot be reproduced, such as hand-drawn figures, scans, or images taken from other documents for comparison.
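The three categories can be written down as a small decision rule. The sketch below is my own reading of the description above, not code from Claerbout's lab; the `Result` fields and the `classify` function are hypothetical names, and the 20-minute threshold is the one mentioned in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Reproducibility(Enum):
    EASY = "easily reproducible"
    CONDITIONAL = "conditionally reproducible"
    NONE = "non-reproducible"


TIME_LIMIT_MINUTES = 20  # threshold mentioned in the text


@dataclass
class Result:
    has_code_and_data: bool              # False for hand-drawn figures, scans, borrowed images
    needs_extra_resources: bool = False  # e.g. Matlab or Mathematica
    runtime_minutes: float = 0.0


def classify(result: Result) -> Reproducibility:
    """Assign one of the three labels to a figure or result."""
    if not result.has_code_and_data:
        return Reproducibility.NONE
    if result.needs_extra_resources or result.runtime_minutes > TIME_LIMIT_MINUTES:
        return Reproducibility.CONDITIONAL
    return Reproducibility.EASY


if __name__ == "__main__":
    print(classify(Result(has_code_and_data=True, runtime_minutes=45)).value)
```

A figure that comes with code and data but takes 45 minutes to rebuild is labeled conditionally reproducible, while a scanned image with no code behind it is non-reproducible whatever the other fields say.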

Their Makefile setup was recently developed further by Fomel et al. in the Madagascar project, using SCons, a Python-based build tool similar to Make, which should make reproducibility even simpler, and cross-platform! See their project page for more details.