Category Archives: reproducible research

Scientific fraud

A few months ago, I read in a Belgian newspaper that 9% of the participants in a study among 2,000 American scientists said they had witnessed scientific fraud within the past three years. And it seems they were not talking about cases where people use Photoshop to touch up an image, but about inventing fake results or falsifying articles.

Although I wasn’t able to find this again on the web with Google, I am quite sure the original authors checked the number. Wikipedia reports on another study, where the actual number was 3%. Anyhow, whether it is 3 or 9 percent, this number is much too high. Let us hope it can be brought down by requiring higher reproducibility of our research work. I do realize that there will always be people who cheat and falsify results (Wikipedia even keeps a list of the most famous cases). But I also strongly believe that, in the end, most researchers just want to do good work. Many of them perform non-reproducible work simply because they don’t (yet) feel the need to make it reproducible, or because they are too busy with their next piece of work to properly finish off the current one…

Data set competitions

One of the reproducibility problems with many current papers is that everyone applies their new algorithm to their own set of data; I did the same in my super-resolution work. The problem is that it is very difficult to assess whether a data set was used (a) because it was the one the author had at hand, (b) because it was the most representative one, or (c) because the algorithm performed best on it.

To allow fairer comparisons, competitions are being set up in various fields. Often, in the run-up to a conference, a competition is organized in which everyone can try their algorithm on a common data set provided by the organizers.


Reproducible Research History (1)

To my knowledge, the reproducible research efforts in the computational sciences were started by Jon Claerbout (who retired earlier this year) in the early 90s. At the Stanford Exploration Project at Stanford University, Claerbout and his colleagues (working in seismic imaging) developed a system of Makefiles that makes it possible to delete all figures in a document and reproduce them with a single Unix command. This allows anyone with a Unix/Linux system to reproduce all the results in their work. I think it is about as close to “one-click reproducibility” as one can get! Claerbout and his lab performed much of the pioneering work in promoting reproducible research, which later spread to various disciplines. A history by Claerbout himself is available here.
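As a rough sketch of the idea (the file names, scripts, and target names below are made up for illustration, and are not taken from Claerbout’s actual system), such a Makefile ties every figure to the code and data that generate it:

```make
# Hypothetical sketch: each figure depends on the script and data
# that produce it, so `make` regenerates whatever is missing.
FIGS = fig1.eps fig2.eps

all: $(FIGS)

fig1.eps: make_fig1.py data.bin
	python make_fig1.py data.bin fig1.eps

fig2.eps: make_fig2.py data.bin
	python make_fig2.py data.bin fig2.eps

# Delete all generated results.
burn:
	rm -f $(FIGS)
```

After `make burn` removes every result, a single `make` rebuilds all figures from the original code and data, which is essentially the one-command reproducibility described above.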

In their work, Claerbout and his colleagues distinguish three types of figures/results. First, and most common, there are easily reproducible results, which a reader can reproduce using the code and data contained in the electronic document. Second, there are conditionally reproducible results, for which the commands and data are given, but which require certain resources (such as Matlab or Mathematica) or more than 20 minutes of computation to reproduce. And finally, there are non-reproducible results, a label used for results that cannot be reproduced, such as hand-drawn figures, scans, or images taken from other documents for comparison.

Their Makefile setup was recently developed further by Fomel et al. in the Madagascar project, using SCons, a build tool similar to Make that should make reproducibility even simpler, and cross-platform! See their project page for more details.

Middlebury Stereo

An article close to my current work on 3D now:

D. Scharstein and R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision, 47(1/2/3), pp. 7-42, April-June 2002.

In their article, Scharstein and Szeliski compare stereo estimation algorithms. But they do not just offer this overview. On their webpage, they also provide the source code and a widely used data set of stereo images, and they invite other researchers to try their own algorithms on this data set and upload the results. Over the years, this has resulted in a performance comparison of almost 50 stereo algorithms, nicely listed on their webpage.

A nice example of what reproducible research can do! I think we need a lot more of these comparisons on common (representative) datasets.

Reproducible Research in Medicine

I just read the following article:

C. Laine, S. N. Goodman, M. E. Griswold, and H. C. Sox, Reproducible Research: Moving toward Research the Public Can Really Trust, Annals of Internal Medicine, Vol. 146, Nr. 6, pp. 450-453, 2007.

A very interesting article about how the journal “Annals of Internal Medicine” is promoting reproducible research. They do not require that all papers be reproducible, but they do ask the authors of each paper whether theirs is. If it is, they provide links to the protocol, data, or statistical code that was used.

While, certainly in medicine, this still does not guarantee that the entire research work is reproducible, it does add a lot of information (and credibility) to the presented work. As an outsider to the field, I also found it very interesting to read the description of the thorough editorial process each paper undergoes. I have put an overview of reproducible research initiatives by journals on our RR links page (that is, the initiatives I know about, of course). Feel free to let me know if you know of other examples!

This initiative was prompted, among other things, by an article on this topic by Peng et al. It would be great if other journals followed these examples, and reproducible research became the ‘default’ for a paper…

Putting reproducible research papers online

OK, you’ve written a reproducible research (RR) paper. How do you make it available online?

At LCAV, we started off by simply making an HTML web page with all the paper details and additional information (data, code, additional figures), and putting that on our lab website.

While this is a very straightforward way of working, it is probably not very practical in the long term, as a new web page has to be made for each publication. I can also imagine that writing HTML seems like a big step (or just too cumbersome) to some. That is why, in collaboration with the EPrints team, we have developed a Reproducible Research Repository setup. It has to be configured once (by your system administrator?), and then lets you upload new RR papers by filling out a form with all the necessary information and uploading the PDF, code, data, and/or any other additional material. I think it is really a lot more user-friendly than creating HTML pages each time. At the same time, it ensures that all the information is there, and it creates nice web pages (see here for an example of the repository front page, and here for an example of a page for a paper).


Our reproducible research paper

We (Jelena Kovacevic, Martin Vetterli, and I) recently submitted a paper on reproducible research to IEEE Signal Processing Magazine: “What, Why and How of Reproducible Research in Signal Processing”. It describes our experiences with reproducible research at LCAV (EPFL) and CMU, as well as a study on the reproducibility of articles that appeared in IEEE Transactions on Image Processing (to which many people contributed). Feel free to take a look!

It also describes a setup for putting reproducible research publications online, about which I hope to post more info soon. One thing I already want to mention is that it is freely available for download!

We also did some advertising for this paper recently by e-mail, and I must admit it is always nice to see the reactions you get! It was also picked up in the blogosphere by John Cook, who hosts www.reproducibleresearch.org, here and here, and by Greg Wilson on The Third Bit.

But more about reproducible research soon…