Given the fact that you are reading this blog, I am assuming you are into sharing your code and data. What is not so clear to me, however, is where we should be sharing these data (taking data to include code, which people often seem to forget).
- On our own personal web page? Seems like an excellent place to me, but too many of those personal sites are extremely short-lived. I notice this every time I update links.
- On our institution’s publication pages? That would probably be my preferred choice at this moment. These pages often have a longer lifespan than personal web pages, and are still “close enough to yourself”. Some issues arise for people working at a company (but then, are you often allowed to share code/data in such a situation?), or for people moving from one institution to another, but those all seem fairly limited compared to some of the alternatives.
- On the publisher’s web pages? That would make it consistent with the related publication. However, I’m not sure I want to transfer ownership of my code and data to the publisher as well.
- On “social media” such as ResearchGate or Academia.edu? At first I was enthusiastic about these, but I am starting to have my doubts. Who is behind these sites? How do they plan to make money from my data? Now that some of them have started spamming me with e-mail, asking whether I have questions for the authors of a paper I downloaded, I have become even more skeptical.
- Any other suggestions?
Maybe I am too critical about this, or too old-fashioned. Or just too commercially oriented, and not open enough to share with everyone potentially interested in my work. Who can tell?
The latest issue of IEEE Computing in Science and Engineering is a special issue on reproducible research. It features several articles on tools and approaches for reproducible research.
I also contributed a paper, “Code Sharing Is Associated with Research Impact in Image Processing”, where I show that there is a relation between making code available online for your paper and the paper’s number of citations. For academics, I believe this is one of the most important motivations for making code available online.
Have fun reading the entire issue!
At this year’s ICIP conference (IEEE International Conference on Image Processing) in Brussels, a round table was organized on reproducible research. Martin Vetterli (EPFL) was one of the panel members; the others were Thrasos Pappas (Northwestern Univ.), Thomas Sikora (Technical University of Berlin), Edward Delp (Purdue University), and Khaled El-Maleh (Qualcomm). Unfortunately, I was not able to attend the panel discussion myself, but I’d be very happy to read your feedback on the discussion in the comments below. And let the discussion continue here…!
The conference also specifically mentioned in the call for papers that it would give a “Reproducible code available” label. A best code prize would also be awarded; however, I did not hear anything more about it afterwards. I am curious how many submissions were received. When scanning through the papers, I could find 9 papers mentioning something about their code being available online:
- Chuohao Yeo, Yih Han Tan, Zhengguo Li, Susanto Rahardja, CHROMA INTRA PREDICTION USING TEMPLATE MATCHING WITH RECONSTRUCTED LUMA COMPONENTS, http://iphome.hhi.de/suehring/tml/download/.
- Li Chen, Yang Xiang, YaoJie Chen, XiaoLong Zhang, RETINAL IMAGE REGISTRATION USING BIFURCATION STRUCTURES, http://www.mathworks.com/matlabcentral/fileexchange/23015-feature-based-retinal-image-registration.
- Christian Keimel, Manuel Klimpke, Julian Habigt and Klaus Diepold, NO-REFERENCE VIDEO QUALITY METRIC FOR HDTV BASED ON H.264/AVC BITSTREAM FEATURES, www.ldv.ei.tum.de/videolab.
- Athanasios Voulodimos, Dimitrios Kosmopoulos, Georgios Vasileiou, Emmanuel Sardis, Anastasios Doulamis, Vassileios Anagnostopoulos, Constantinos Lalos, Theodora Varvarigou, A DATASET FOR WORKFLOW RECOGNITION IN INDUSTRIAL SCENES, http://www.scovis.eu/.
- Roland Kwitt, Peter Meerwald, Andreas Uhl and Geert Verdoolaege, TESTING A MULTIVARIATE MODEL FOR WAVELET COEFFICIENTS, http://www.wavelab.at/sources/.
- Yizhen Huang, WAVELET-BASED QUALITY CONSTRAINED COMPRESSION USING BINARY SEARCH, http://pages.cs.wisc.edu/~huangyz/imageCompression.rar.
- Thomas Stütz and Andreas Uhl, EFFICIENT WAVELET PACKET BASIS SELECTION IN JPEG2000, http://www.wavelab.at/sources/.
- E. Gil-Rodrigo, J. Portilla, D. Miraut, R. Suarez-Mesa, EFFICIENT JOINT POISSON-GAUSS RESTORATION USING MULTI-FRAME L2-RELAXED-L0 ANALYSIS-BASED SPARSITY (code announced, but I could not find it yet).
- J. Portilla, E. Gil-Rodrigo, D. Miraut, R. Suarez-Mesa, CONDY: ULTRA-FAST HIGH PERFORMANCE RESTORATION USING MULTI-FRAME L2-RELAXED-L0 SPARSITY AND CONSTRAINED DYNAMIC HEURISTICS, to become available on http://www4.io.csic.es/PagsPers/JPortilla/portada/software.
Another data competition:
Machine Learning for Signal Processing (MLSP) TC Announces the Winners of the 6th Annual Data Analysis Competition
See here for more info.
I was just reading the following two articles/notes. While they are not entirely about reproducible research, I think they nicely reflect the worries that many researchers have about current “publish or perish” research practices. I am not sure I agree with all of it, but they do make a number of good remarks.
D. Geman, Ten Reasons Why Conference Papers Should be Abolished, Johns Hopkins University, Nov. 2007.
Y. Ma, Warning Signs of Bogus Progress in Research in an Age of Rich Computation and Information, ECE, University of Illinois, Nov. 2007.
Making publications reproducible is tough…
I recently experienced it again in some of my work. In the stress of preparing a publication for a submission deadline, it is very challenging to take the (precious) time to verify all of the results once more and make sure they are perfectly reproducible. It is so easy for a result or figure to slip in for which the exact parameter settings have not been checked or written down…
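One way to make it harder for unrecorded settings to slip in is to store every result together with the parameters that produced it. The following is only a minimal sketch of that idea, not a description of any particular tool: the function names, the file name, the parameter names, and the result value are all hypothetical placeholders.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def run_record(params, result):
    """Bundle a result with the exact settings that produced it."""
    # Serialize with sorted keys so identical settings always hash identically.
    canonical = json.dumps(params, sort_keys=True)
    return {
        "params": params,
        "params_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "result": result,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def save_record(record, path):
    """Write the record next to the figure or table it belongs to."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical settings and a placeholder result, for illustration only.
    params = {"wavelet": "db4", "levels": 3, "threshold": 0.05, "seed": 42}
    record = run_record(params, result={"psnr_db": 0.0})
    save_record(record, "figure_record.json")
```

Saving one such file per figure means the settings behind each figure can be looked up later, and the hash gives a quick check of whether two results came from the same settings.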
I just got pointed to the author guidelines for CVPR 2010. They state that reviewers will be asked about (indicative) reproducibility (or repeatability, as it is called there):
Repeatability Criteria: The CVPR 2010 reviewer form will include the following additional criteria, with rating and associated comment field: “Are there sufficient algorithmic and experimental details and available datasets that a graduate student could replicate the experiments in the paper? Alternatively, will a reference implementation be provided?”. During paper registration, authors will be asked to answer the following two checkbox questions: “1. Are the datasets used in this paper already publicly available, or will they be made available for research use at the time of submission of the final camera-ready version of the paper (if accepted)? 2. Will a reference implementation adequate to replicate results in the paper be made publicly available (if accepted)?” If either these boxes are checked, the authors should specify in the submitted paper the scope of such datasets and/or implementations so that the reviewers can judge the merit of that aspect of the submission’s contribution. The Program Chairs realize that for certain CVPR subfields providing such datasets, implementations, or detailed specification is impractical, but in other areas it is reasonable and sometimes even standard, so on balance repeatability is a relevant criteria for reviewer consideration. “N.A.” will be an available reviewer score for this field, as it is for other fields.
Very exciting developments!
Some more interesting reading:
K. Price, Anything You Can Do, I Can Do Better (No You Can’t)…, Computer Vision, Graphics, and Image Processing, Vol. 36, pp. 387-391, 1986, doi:10.1016/0734-189X(86)90083-6.
Abstract: Computer vision suffers from an overload of written information but a dearth of good evaluations and comparisons. This paper discusses why some of the problems arise and offers some guidelines we should all follow.
Very nice reading material, and (although I know these ideas have been around for quite some time already) I was amazed to see so many parallels to our recent IEEE Signal Processing Magazine paper in this paper by Price from 1986. That’s more than 20 years ago! Price talks about the reproducibility problems in computer vision and image processing, writing that we should “stand on other’s shoulders, not on other’s toes”. He also did a study on the reproducibility of a set of about 42 papers, verifying the size of the dataset and clarity of the problem statement. Price concludes as follows: “Researchers should make the effort to obtain implementations of other researchers’ systems so that we can better understand the limitations of our own work.”
Again, interesting to see how these issues and worries have been around for more than 20 years in the field of image processing. It’s about time to drastically improve our standards, I think!
I would really recommend this article to anyone interested in issues around reproducible research.
I just read the following paper:
A. J. Rossini and F. Leisch, Literate statistical practice, UW Biostatistics Working Paper Series 194, University of Washington, WA, USA, 2003.
Although I am not a statistician, this was a very interesting paper to me. It gives a nice description of a possible literate programming approach in statistics. The authors propose a very versatile type of document combining documentation and code/statistical analyses, interwoven as in the original description of literate programming by Knuth. From this versatile document, which contains a complete description of the research work, multiple reports can be extracted, such as an article, an internal report, an overview of the various analyses that were performed, etc.
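To give a feel for how one source document can yield several outputs, here is a toy sketch. It is not the system described by Rossini and Leisch: it assumes a simplified noweb-like convention in which a line of the form `<<name>>=` opens a code chunk and a line containing only `@` closes it, and the function names and sample document are made up for illustration.

```python
def split_chunks(source):
    """Split a literate document into ('doc' | 'code', text) chunks."""
    chunks, kind, buf = [], "doc", []
    for line in source.splitlines():
        stripped = line.strip()
        if kind == "doc" and stripped.startswith("<<") and stripped.endswith(">>="):
            # A chunk header closes the current prose and opens a code chunk.
            chunks.append((kind, "\n".join(buf)))
            kind, buf = "code", []
        elif kind == "code" and stripped == "@":
            # A lone '@' closes the code chunk and returns to prose.
            chunks.append((kind, "\n".join(buf)))
            kind, buf = "doc", []
        else:
            buf.append(line)
    chunks.append((kind, "\n".join(buf)))
    return [(k, t) for k, t in chunks if t.strip()]


def tangle(source):
    """Extract only the code, in document order."""
    return "\n".join(t for k, t in split_chunks(source) if k == "code")


def weave(source, include_code=True):
    """Produce a report: prose, optionally with indented code listings."""
    parts = []
    for kind, text in split_chunks(source):
        if kind == "doc":
            parts.append(text)
        elif include_code:
            parts.append("\n".join("    " + line for line in text.splitlines()))
    return "\n\n".join(parts)


# Hypothetical sample document mixing prose and one analysis chunk.
DOCUMENT = """\
We first compute the sample mean.
<<mean>>=
values = [2.0, 4.0, 6.0]
mean = sum(values) / len(values)
@
The mean is then reported in the article.
"""
```

With this, `tangle(DOCUMENT)` gives the runnable analysis, `weave(DOCUMENT)` gives an internal report with code listings, and `weave(DOCUMENT, include_code=False)` gives article-style text without the code, all from the same source.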