While preparing my talk last week in Leuven, I wanted to update the slide with non-reproducible results making news headlines. Unfortunately, it’s always very easy to find recent material for that slide…
An interesting site I recently discovered in that respect is RetractionWeb: an overview of retracted works. Reading some of the recent posts, I can fully see how the authors have a hard time keeping up with all the related news. The authors also describe why they started this blog. Great site! Sad to see so much news on it.
It has been silent here for a while again, which by no means implies that people promoting reproducible research were sitting still.
ResearchCompendia.org is a site set up by Victoria Stodden and her colleagues at Columbia University. It invites everyone to upload their research compendia (paper with source code) onto the site. It has a clean interface, and the source code for the site is maintained on GitHub. Currently the site already contains more than 200 compendia, maybe soon also yours?
Another nice article on reproducible research by Victoria Stodden in IMS Bulletin:
V. Stodden, Resolving Irreproducibility in Empirical and Computational Research, IMS Bulletin, Nov 2013.
I just read a very well-written article on reproducible research, giving 10 simple but important rules for making your results (more) reproducible:
It’s probably the way things go, but still I feel sad about it. One of the reproducible research tools linked on our site does not seem to exist anymore: ResearchAssistant (by Daniel Ramage). Typical story: a PhD student graduates and moves on to another position (I did find a “now at Google” when searching for what happened), and the web pages with useful links and tools disappear. RIP.
(If you read this blog in the next month, and you know that ResearchAssistant is still alive, let me know. I promise to keep the link alive for another month.)
Given that you are reading this blog, I am assuming you are into sharing your code and data. However, what is not so clear to me is where we should be sharing these data (assuming data includes code, which people often seem to forget).
- On our own personal web page? Seems like an excellent place to me, but too many of those personal sites are extremely short-lived. I notice this every time I update links.
- On our institution’s publication pages? That would probably be my preferred choice at this moment. Often with a longer life-span than personal webpages, and still “close enough to yourself”. Some issues arise with people working at a company (but then, are you often allowed to share code/data in such a situation?), or with people moving from one institution to another, but those all seem fairly limited compared to some of the alternatives.
- On the publisher’s web pages? That would make it consistent with the related publication. However, I’m not sure I want to transfer ownership of my code and data to the publisher as well.
- On “social media” such as ResearchGate, or Academia.edu? At first I was enthusiastic about these, but I am starting to have my doubts. Who is behind these sites? How are they counting on making money, based on my data? Now that some of them have started spamming me with e-mail, and asking me whether I have questions for the authors of a paper I downloaded, I have become even more skeptical.
- Any other suggestions?
Maybe I am too critical about this, or too old-fashioned. Or just too commercially oriented, and not open enough to share with everyone potentially interested in my work. Who can tell?
The latest issue of IEEE Computing in Science and Engineering is a special issue on reproducible research. It features several articles on tools and approaches for reproducible research.
I also contributed a paper, “Code Sharing Is Associated with Research Impact in Image Processing”, where I show that there is a relation between making code available online for your paper and the paper’s number of citations. For academics, I believe this is one of the most important motivations for making code available online.
Have fun reading the entire issue!
I recently learned about the RunMyCode portal, a website where you can easily create a companion website for a paper, allowing others to redo your experiments (thanks, Victoria). The website looks very nice, and it seems a very attractive proposition.
Yet another case of scientific fraud caught a lot of media attention in The Netherlands in the past months. In social psychology, Mr Stapel, former professor at Tilburg University, got caught after years of scientific misconduct.
Note that in this case we are not talking about removing an outlier, or ‘enhancing’ some results. No, Mr Stapel actually made up the data for (some of) his publications entirely. Over the past years, his work has received quite a bit of media attention, with research findings like “meat eaters are more selfish than vegetarians”, or “disordered environments make people more prone to stereotyping and discrimination”. Both results have been withdrawn.
When preparing my post on ICIP 2011 and the reproducible research round table, I ran into a presentation which Steve Eddins (The MathWorks) gave at ICIP in 2006:
Take Control of Your Code
Maybe 5 years old now, but still very relevant. Recommended reading for all software writers among us (and I guess most of us are writing code in some way or another these days)!