What is the best reproducible research?

What is best research practice in terms of reproducibility? At the recent workshop in Ås (Norway), I had a discussion with Marc-Oliver Gewaltig, similar to discussions I had earlier with other colleagues. So I decided to put it up here. All feedback is welcome!

The discussion boils down to the following question: Is it better (in terms of reproducibility) to make code and data available online, so that readers can repeat your experiments (or simulations, as Marc-Oliver would call them) and obtain the same results, or to describe your theory (model, in Marc-Oliver’s terminology) in sufficient detail that readers can verify your results by re-implementing your experiments and checking that they obtain the same thing?

I personally believe both approaches have their pros and cons. With the first one, readers can download the code and data and easily verify that they obtain the same results as presented in the paper. If they want to analyze things further, there is already a working implementation to start from, or to test on other data. However, that certainly doesn’t take away the need for a good, clear description in the paper!
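
To make this concrete, here is a minimal sketch (entirely hypothetical, not taken from any actual paper) of what the first approach could look like: a small script shipped with the paper that regenerates a reported number on demand. The fixed seed and the synthetic data are assumptions for illustration; in practice the data would be distributed alongside the code.

    # repeat_experiment.py -- hypothetical companion script to a paper.
    # Fixing the random seed makes every run produce exactly the same
    # numbers, so a reader can simply re-run it and compare the output
    # against the values reported in the paper.
    import random
    import statistics

    random.seed(42)  # fixed seed: the basis of repeatability here

    # Stand-in for the data set shipped with the paper; in reality this
    # would be loaded from a distributed data file.
    data = [random.gauss(5.0, 2.0) for _ in range(10000)]

    print(f"sample mean  = {statistics.mean(data):.4f}")
    print(f"sample stdev = {statistics.stdev(data):.4f}")

Running this script twice yields identical output; that is exactly the “double-click” verification discussed below.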

With the second approach, one avoids the risk that a bug in the code producing the results goes uncaught simply because readers can just “double-click” to repeat the experiment without scrutinizing it. The second approach allows a thorough verification of the presented concept/theory, as the reader independently re-implements the work and checks the results. I believe certain standardization bodies, such as MPEG, use this approach to make sure that their specifications are sufficiently precise.
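
Continuing the same toy example (again hypothetical), the second approach would mean the paper states precisely what is computed, say “the sample mean and the unbiased sample standard deviation”, and the reader writes an independent implementation from that description and checks it against the published numbers. Here the standard library stands in for the paper’s reference results:

    # verify_reimplementation.py -- hypothetical independent check.
    # The statistics are implemented directly from the paper's textual
    # description, without reusing the authors' code.
    import random
    import statistics

    def mean(xs):
        return sum(xs) / len(xs)

    def stdev(xs):
        m = mean(xs)
        # unbiased estimator (n - 1 denominator), as described
        return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

    # Same shared data as in the previous sketch.
    random.seed(42)
    data = [random.gauss(5.0, 2.0) for _ in range(10000)]

    # statistics.mean/stdev stand in for the published reference values;
    # tolerances allow for floating-point rounding differences.
    assert abs(mean(data) - statistics.mean(data)) < 1e-6
    assert abs(stdev(data) - statistics.stdev(data)) < 1e-6
    print("independent re-implementation matches the reference results")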

Personally, I think the second approach is the better, more thorough one in an ideal world. Currently, though, I prefer the first, because most people won’t go to the depth of re-implementing things, and the first approach already gives those people something: something more than just the paper, something to get their hands dirty with. And “more interested readers” may still re-implement the work, or start analyzing the code in detail.