I just read a very well-written article on reproducible research, giving 10 simple but important rules for making your results (more) reproducible:
Happy reading!
The latest issue of IEEE Computing in Science and Engineering is a special issue on reproducible research, featuring several articles on tools and approaches for making research reproducible.
I also contributed a paper, “Code Sharing Is Associated with Research Impact in Image Processing”, in which I show that there is a relation between making the code for a paper available online and the number of citations the paper receives. For academics, I believe this is one of the most important motivations for making code available online.
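As a toy illustration of the kind of comparison behind that claim (definitely not the actual data or analysis from the paper), here is a small Python sketch that compares citation counts for papers with and without code online. The numbers are made up, and the rank-sum test is just one reasonable choice of test.

```python
# Toy illustration only: made-up citation counts, not the data from the paper.
from statistics import median
from scipy.stats import mannwhitneyu  # rank-sum test; one reasonable choice

# Hypothetical citation counts for two groups of papers
with_code = [34, 12, 51, 8, 27, 19, 40, 15]
without_code = [9, 4, 16, 2, 11, 7, 13, 5]

print("median citations, code available:   ", median(with_code))
print("median citations, no code available:", median(without_code))

# Test whether the difference could plausibly be due to chance
stat, p = mannwhitneyu(with_code, without_code, alternative="greater")
print(f"Mann-Whitney U = {stat:.1f}, one-sided p = {p:.4f}")
```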
Have fun reading the entire issue!
At this year’s ICIP conference (IEEE International Conference on Image Processing) in Brussels, a round table was organized on reproducible research. Martin Vetterli (EPFL) was one of the panel members; the others were Thrasos Pappas (Northwestern Univ.), Thomas Sikora (Technical University of Berlin), Edward Delp (Purdue University), and Khaled El-Maleh (Qualcomm). Unfortunately, I was not able to attend the panel discussion myself, but I’d be very happy to read your feedback on it in the comments below. And let the discussion continue here…!
The call for papers also announced a “Reproducible code available” label, as well as a best code prize; however, I did not hear anything more about either of them afterwards. I am curious how many submissions were received. When scanning through the papers, I found 9 that mention their code being available online:
On two recent occasions, I heard about Elsevier’s Executable Paper contest. The intention was to show concepts for the next generation of publications. Or as Elsevier put it:
Executable Paper Grand Challenge is a contest created to improve the way scientific information is communicated and used.
It asks:
By now, the contest is over, and the winners have been announced:
First Prize: The Collage Authoring Environment by Nowakowski et al.
Second Prize: SHARE: a web portal for creating and sharing executable research papers by Van Gorp and Mazanek.
Third Prize: A Universal Identifier for Computational Results by Gavish and Donoho.
Congratulations to all! At the AMP Workshop where I am now, we were lucky to have a presentation about the work by Gavish and Donoho, which sounds very cool! I also know the work by Van Gorp and Mazanek, which uses virtual machines to allow others to reproduce results. I still need to look into the first-prize winners’ work…
If any of this sounds interesting to you, and I believe it should, please take a look at the Grand Challenge website, and also check out some of the other participants’ contributions!
Here at the workshop, we also had an interesting related presentation yesterday by James Quirk about all that can be done with a PDF. Quite impressive! For examples, see his Amrita work and webpage.
I was just reading the following two articles/notes. While they are not entirely about reproducible research, I think they capture well the worries that many researchers have about current “publish or perish” research practices. I am not sure I agree with everything in them, but they do make a number of good remarks.
D. Geman, Ten Reasons Why Conference Papers Should be Abolished, Johns Hopkins University, Nov. 2007.
Y. Ma, Warning Signs of Bogus Progress in Research in an Age of Rich Computation and Information, ECE, University of Illinois, Nov. 2007.
I just got a pointer to ResearchAssistant, a Java tool to keep better track of research experiments. It stores the exact circumstances under which each experiment was performed, such as the parameter values used. It looks like a very nice tool for reproducible research; at the very least, it should make it much easier, when writing a paper, to trace back how particular results were obtained. If you’re doing research in Java, I’d certainly recommend taking a look at RA!
The tool is also described in the following paper:
D. Ramage and A. J. Oliner, RA: ResearchAssistant for the Computational Sciences, Workshop on Experimental Computer Science (ExpCS), June 2007.
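RA itself is a Java tool, and I have not tried to reproduce its API here. Just to illustrate the general idea of recording the exact circumstances of an experiment next to its results, here is a small sketch in Python; all names in it are mine, not RA’s.

```python
# Illustrative sketch only; this is NOT the ResearchAssistant API.
import json, platform, subprocess, time
from pathlib import Path

def record_run(params: dict, results: dict, outdir="runs"):
    """Save parameters, results, and environment info for one experiment."""
    run = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "params": params,
        "results": results,
        "python": platform.python_version(),
        "machine": platform.node(),
    }
    # If the code lives in a git repository, also record the exact revision
    try:
        run["git_revision"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
    except Exception:
        run["git_revision"] = None

    out = Path(outdir)
    out.mkdir(exist_ok=True)
    path = out / f"run_{int(time.time())}.json"
    path.write_text(json.dumps(run, indent=2))
    return path

# Example: record the settings used to produce a figure in a paper
record_run({"filter": "wavelet", "threshold": 0.1}, {"psnr_db": 31.4})
```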
Keith Baggerly and Kevin Coombes have just published a great paper about forensic bioinformatics: “Deriving chemosensitivity from cell lines: forensic bioinformatics and reproducible research in high-throughput biology.” The article will appear in the next issue of Annals of Applied Statistics and is available here.
The blog post Make up your own rules of probability discusses a couple of the innovative rules of probability Baggerly and Coombes discovered while reverse engineering bioinformatics articles.
Note that the Reproducible Ideas blog is winding down. I’m in the process of handing over the reproducibleresearch.org domain to the owners of reproducibleresearch.net. Eventually the .net site will move over to the .org name.
Some more interesting reading:
K. Price, Anything You Can Do, I Can Do Better (No You Can’t)…, Computer Vision, Graphics, and Image Processing, Vol. 36, pp. 387-391, 1986, doi:10.1016/0734-189X(86)90083-6.
Abstract: Computer vision suffers from an overload of written information but a dearth of good evaluations and comparisons. This paper discusses why some of the problems arise and offers some guidelines we should all follow.
Very nice reading material. Although I know these ideas have been around for quite some time, I was amazed to see so many parallels with our recent IEEE Signal Processing Magazine paper already in this paper by Price from 1986, more than 20 years ago! Price discusses the reproducibility problems in computer vision and image processing, writing that we should “stand on others’ shoulders, not on others’ toes”. He also studied the reproducibility of a set of about 42 papers, checking the size of the datasets used and the clarity of the problem statements. Price concludes as follows: “Researchers should make the effort to obtain implementations of other researchers’ systems so that we can better understand the limitations of our own work.”
Again, interesting to see how these issues and worries have been around for more than 20 years in the field of image processing. It’s about time to drastically improve our standards, I think!
I would really recommend this article to anyone interested in issues around reproducible research.
I just read the following paper:
A. J. Rossini and F. Leisch, Literate statistical practice, UW Biostatistics Working Paper Series 194, University of Washington, WA, USA, 2003.
Although I am not a statistician, this was a very interesting paper to me. It gives a nice description of a possible literate programming approach in statistics. The authors propose a very versatile type of document combining documentation and code/statistical analyses, interwoven as in Knuth’s original description of literate programming. From this versatile document, which contains a complete description of the research work, multiple reports can be extracted, such as an article, an internal report, an overview of the various analyses that were performed, etc.
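To make the idea a bit more concrete, here is a tiny sketch of my own (not the authors’ tooling) that mimics noweb-style chunk markers and pulls the code chunks out of a combined document, so the analysis can be run separately from the text; real literate tools also “weave” the document into a report containing the results.

```python
# Tiny "tangle" sketch: extract code chunks from a literate document.
# The <<chunk>>= / @ delimiters mimic the noweb style; this is only an
# illustration of the concept, not the authors' actual workflow.
import re

document = """\
We summarize the signal by its mean level.

<<analysis>>=
import statistics
signal = [1.0, 1.2, 0.9, 1.1, 1.3]
print("mean level:", statistics.mean(signal))
@

The mean level is reported in Table 1 of the article.
"""

def tangle(text):
    """Return only the code chunks, ready to be executed."""
    chunks = re.findall(r"<<.*?>>=\n(.*?)\n@", text, flags=re.DOTALL)
    return "\n".join(chunks)

code = tangle(document)
print(code)      # the extracted program
exec(code)       # running it reproduces the numbers quoted in the text
```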
I just read the article about Netflix’s Million Dollar Programming Prize on IEEE Spectrum.
Robert M. Bell, Jim Bennett, Yehuda Koren, and Chris Volinsky, The Million Dollar Programming Prize, IEEE Spectrum Online, http://www.spectrum.ieee.org/may09/8788.
An interesting article, showing again how such challenges can inspire a lot of great work and allow an ‘objective’ comparison between algorithms. I think they provide a great way to motivate researchers to work on real problems, with testing on standardized datasets.
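As a reminder of how simple the ‘objective’ part of such a contest can be, here is a minimal sketch of the kind of scoring that runs on a hidden, standardized test set. The Netflix Prize ranked submissions by RMSE on ratings withheld from the entrants; the ratings and predictions below are of course made up for illustration.

```python
# Minimal sketch of contest-style scoring on a standardized test set.
# The ratings and predictions below are made up for illustration.
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

hidden_ratings = [4, 3, 5, 2, 4, 1]            # held back by the organizers
submission_a   = [3.8, 3.2, 4.6, 2.4, 3.9, 1.5]
submission_b   = [4.5, 2.0, 5.0, 3.5, 3.0, 2.0]

print("team A:", round(rmse(submission_a, hidden_ratings), 4))
print("team B:", round(rmse(submission_b, hidden_ratings), 4))
# Every team is scored on exactly the same hidden data, which is what
# makes the comparison between algorithms 'objective'.
```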