It’s probably the way things go, but I still feel sad about it. One of the reproducible research tools linked on our site no longer seems to exist: ResearchAssistant (by Daniel Ramage). A typical story: a PhD student graduates and moves on to another position (I did find a “now at Google” when searching for what happened), and the web pages with useful links and tools disappear. RIP.
(If you read this blog in the next month, and you know that ResearchAssistant is still alive, let me know. I promise to keep the link alive for another month.)
Since you are reading this blog, I assume you are into sharing your code and data. What is less clear to me, however, is where we should be sharing them (assuming “data” includes code, which people often seem to forget).
- On our own personal web page? It seems like an excellent place to me, but too many of those personal sites are extremely short-lived. I notice this every time I update links.
- On our institution’s publication pages? That would probably be my preferred choice at the moment. Such pages often have a longer lifespan than personal web pages, while still staying “close enough to yourself”. Some issues arise for people working at a company (but then, how often are you allowed to share code and data in that situation anyway?) or for people moving from one institution to another, but these all seem fairly limited compared to some of the alternatives.
- On the publisher’s web pages? That would keep code and data together with the related publication. However, I’m not sure I want to transfer ownership of my code and data to the publisher as well.
- On “social media” such as ResearchGate or Academia.edu? At first I was enthusiastic about these, but I am starting to have my doubts. Who is behind these sites? How do they plan to make money from my data? Now that some of them have started spamming me with e-mail, asking whether I have questions for the authors of a paper I downloaded, I am becoming even more skeptical.
- Any other suggestions?
Maybe I am too critical about this, or too old-fashioned. Or just too commercially oriented, and not open enough to share with everyone potentially interested in my work. Who can tell?
The latest issue of IEEE Computing in Science and Engineering is a special issue on reproducible research. It features several articles on tools and approaches for reproducible research.
I also contributed a paper, “Code Sharing Is Associated with Research Impact in Image Processing”, in which I show that there is a relation between making a paper’s code available online and the number of citations that paper receives. For academics, I believe this is one of the most important motivations for making code available online.
Have fun reading the entire issue!
I recently learned about the RunMyCode portal, a website where you can easily create a companion website for a paper, allowing others to redo your experiments (thanks, Victoria). The website looks very nice, and it seems a very attractive proposition.
Yet another case of scientific fraud has attracted a lot of media attention in the Netherlands over the past months. Mr Stapel, a former professor of social psychology at Tilburg University, was caught after years of scientific misconduct.
Note that in this case we are not talking about removing an outlier, or ‘enhancing’ some results. No, Mr Stapel made up the data for (some of) his publications entirely. Over the past years, his work has received considerable media attention, with research findings like “meat eaters are more selfish than vegetarians”, or “disordered environments make people more prone to stereotyping and discrimination”. Both results have been withdrawn.
When preparing my post on ICIP 2011 and the reproducible research round table, I ran into a presentation that Steve Eddins (The MathWorks) gave at ICIP in 2006:
Take Control of Your Code
It may be 5 years old now, but it is still very relevant. Recommended reading for all software writers among us (and I guess most of us write code in some way or another these days)!
At this year’s ICIP conference (IEEE International Conference on Image Processing) in Brussels, a round table was organized on reproducible research. Martin Vetterli (EPFL) was one of the panel members; the others were Thrasos Pappas (Northwestern Univ.), Thomas Sikora (Technical University of Berlin), Edward Delp (Purdue University), and Khaled El-Maleh (Qualcomm). Unfortunately, I was not able to attend the panel discussion myself, but I’d be very happy to read your impressions of it in the comments below. And let the discussion continue here…!
The call for papers also explicitly mentioned that papers would be given a “Reproducible code available” label. A best code prize would also be awarded; however, I never heard anything more about it. I am curious how many submissions were received. Scanning through the papers, I found 9 papers mentioning something about their code being available online:
- Chuohao Yeo, Yih Han Tan, Zhengguo Li, Susanto Rahardja, CHROMA INTRA PREDICTION USING TEMPLATE MATCHING WITH RECONSTRUCTED LUMA COMPONENTS, http://iphome.hhi.de/suehring/tml/download/.
- Li Chen, Yang Xiang, YaoJie Chen, XiaoLong Zhang, RETINAL IMAGE REGISTRATION USING BIFURCATION STRUCTURES, http://www.mathworks.com/matlabcentral/fileexchange/23015-feature-based-retinal-image-registration.
- Christian Keimel, Manuel Klimpke, Julian Habigt and Klaus Diepold, NO-REFERENCE VIDEO QUALITY METRIC FOR HDTV BASED ON H.264/AVC BITSTREAM FEATURES, www.ldv.ei.tum.de/videolab.
- Athanasios Voulodimos, Dimitrios Kosmopoulos, Georgios Vasileiou, Emmanuel Sardis, Anastasios Doulamis, Vassileios Anagnostopoulos, Constantinos Lalos, Theodora Varvarigou, A DATASET FOR WORKFLOW RECOGNITION IN INDUSTRIAL SCENES, http://www.scovis.eu/.
- Roland Kwitt, Peter Meerwald, Andreas Uhl and Geert Verdoolaege, TESTING A MULTIVARIATE MODEL FOR WAVELET COEFFICIENTS, http://www.wavelab.at/sources/.
- Yizhen Huang, WAVELET-BASED QUALITY CONSTRAINED COMPRESSION USING BINARY SEARCH, http://pages.cs.wisc.edu/~huangyz/imageCompression.rar.
- Thomas Stütz and Andreas Uhl, EFFICIENT WAVELET PACKET BASIS SELECTION IN JPEG2000, http://www.wavelab.at/sources/.
- E. Gil-Rodrigo, J. Portilla, D. Miraut, R. Suarez-Mesa, EFFICIENT JOINT POISSON-GAUSS RESTORATION USING MULTI-FRAME L2-RELAXED-L0 ANALYSIS-BASED SPARSITY (code announced, but I could not find it yet).
- J. Portilla, E. Gil-Rodrigo, D. Miraut, R. Suarez-Mesa, CONDY: ULTRA-FAST HIGH PERFORMANCE RESTORATION USING MULTI-FRAME L2-RELAXED-L0 SPARSITY AND CONSTRAINED DYNAMIC HEURISTICS, to become available on http://www4.io.csic.es/PagsPers/JPortilla/portada/software.
Two months ago, I wrote about the mini-symposium “Store-Share-and-Cite” at TU Delft, where I gave a talk. The slides of all presentations are now available online. Enjoy!
Early this year, IEEE changed its policy with respect to making your publications available online. You are now only allowed to put a (final) preprint on your personal web page (or your institution’s), including the copyright notice and the full reference of the published version. This holds for all papers published after January 1st, 2011. Previously, you were also allowed to make the published paper itself available online.
While I do understand that this protects (some of) the additional work done by IEEE to make the final publication look nice, and thus should encourage people to subscribe, I am not happy with this measure. Maybe this just aligns the IEEE policy with what most publishers already do, but still.
Why do I prefer the published version? First of all, it ensures that only a single version of a paper circulates on the web. I find it very annoying to come across a paper, start reading it because it looks different from what I’ve seen before, and then notice that it is actually the same paper in a different typesetting. Even more so if the two versions differ. The final published version should be the most correct one, I assume. Secondly, it also increases the chances that a paper is cited correctly. Because, let’s face it, not everyone will dutifully add the “full citation to the original IEEE publication and a link to the article in the IEEE Xplore digital library“.
Correctly citing a paper may become even more difficult…
On two recent occasions, I heard about Elsevier’s Executable Paper contest. The intention was to show concepts for the next generation of publications. Or, as Elsevier puts it:
Executable Paper Grand Challenge is a contest created to improve the way scientific information is communicated and used.
- How can we develop a model for executable files that is compatible with the user’s operating system and architecture and adaptable to future systems?
- How do we manage very large file sizes?
- How do we validate data and code, and decrease the reviewer’s workload?
- How to support registering and tracking of actions taken on the ‘executable paper?’
By now, the contest is over, and the winners have been announced:
First Prize: The Collage Authoring Environment by Nowakowski et al.
Second Prize: SHARE: a web portal for creating and sharing executable research papers by Van Gorp and Mazanek.
Third Prize: A Universal Identifier for Computational Results by Gavish and Donoho.
Congratulations to all! At the AMP Workshop where I am now, we were lucky to have a presentation about the work by Gavish and Donoho, which sounds very cool! I am also familiar with the work by Van Gorp and Mazanek, which uses virtual machines to allow others to reproduce results. I still need to look into the winner’s work…
If any of this sounds interesting to you, and I believe it should, please take a look at the Grand Challenge website, and also check out some of the other participants’ contributions!
Here at the workshop, we also had an interesting related presentation yesterday by James Quirk about all that can be done with a PDF. Quite impressive! For examples, see his Amrita work and webpage.