Yet another case of scientific fraud caught a lot of media attention in The Netherlands in the past months. In social psychology, Mr Stapel, former professor at Tilburg University, got caught after years of scientific misconduct.
Note that in this case we are not talking about removing an outlier, or ‘enhancing’ some results. No, Mr Stapel actually made up the data for (some of) his publications entirely. Over the past years, his work has received quite a lot of media attention, with research findings like “meat eaters are more selfish than vegetarians”, or “disordered environments make people more prone to stereotyping and discrimination”. Both results have been withdrawn.
When preparing my post on ICIP 2011 and the reproducible research round table, I ran into a presentation which Steve Eddins (The Mathworks) gave at ICIP in 2006:
Take Control of Your Code
It may be 5 years old now, but it is still very relevant. Recommended reading for all software writers among us (and I guess most of us are writing code in some way or another these days)!
At this year’s ICIP conference (IEEE International Conference on Image Processing) in Brussels, a round table was organized on reproducible research. Martin Vetterli (EPFL) was one of the panel members, the others were Thrasos Pappas (Northwestern Univ.), Thomas Sikora (Technical University of Berlin), Edward Delp (Purdue University), and Khaled El-Maleh (Qualcomm). Unfortunately, I was not able to attend the panel discussion myself, but I’d be very happy to read your feedback and comments on the discussion in the comments below. And let the discussion continue here…!
The call for papers also specifically mentioned that the conference would give a “Reproducible code available” label. A best code prize was also to be awarded; however, I did not hear anything more about it afterwards. I am curious how many submissions were received. When scanning through the papers, I could find 9 papers mentioning something about their code being available online:
- Chuohao Yeo, Yih Han Tan, Zhengguo Li, Susanto Rahardja, CHROMA INTRA PREDICTION USING TEMPLATE MATCHING WITH RECONSTRUCTED LUMA COMPONENTS, http://iphome.hhi.de/suehring/tml/download/.
- Li Chen, Yang Xiang, YaoJie Chen, XiaoLong Zhang, RETINAL IMAGE REGISTRATION USING BIFURCATION STRUCTURES, http://www.mathworks.com/matlabcentral/fileexchange/23015-feature-based-retinal-image-registration.
- Christian Keimel, Manuel Klimpke, Julian Habigt and Klaus Diepold, NO-REFERENCE VIDEO QUALITY METRIC FOR HDTV BASED ON H.264/AVC BITSTREAM FEATURES, www.ldv.ei.tum.de/videolab.
- Athanasios Voulodimos, Dimitrios Kosmopoulos, Georgios Vasileiou, Emmanuel Sardis, Anastasios Doulamis, Vassileios Anagnostopoulos, Constantinos Lalos, Theodora Varvarigou, A DATASET FOR WORKFLOW RECOGNITION IN INDUSTRIAL SCENES, http://www.scovis.eu/.
- Roland Kwitt, Peter Meerwald, Andreas Uhl and Geert Verdoolaege, TESTING A MULTIVARIATE MODEL FOR WAVELET COEFFICIENTS, http://www.wavelab.at/sources/.
- Yizhen Huang, WAVELET-BASED QUALITY CONSTRAINED COMPRESSION USING BINARY SEARCH, http://pages.cs.wisc.edu/~huangyz/imageCompression.rar.
- Thomas Stütz and Andreas Uhl, EFFICIENT WAVELET PACKET BASIS SELECTION IN JPEG2000, http://www.wavelab.at/sources/.
- E. Gil-Rodrigo, J. Portilla, D. Miraut, R. Suarez-Mesa, EFFICIENT JOINT POISSON-GAUSS RESTORATION USING MULTI-FRAME L2-RELAXED-L0 ANALYSIS-BASED SPARSITY, – announced code, but I could not find it yet – .
- J. Portilla, E. Gil-Rodrigo, D. Miraut, R. Suarez-Mesa, CONDY: ULTRA-FAST HIGH PERFORMANCE RESTORATION USING MULTI-FRAME L2-RELAXED-L0 SPARSITY AND CONSTRAINED DYNAMIC HEURISTICS, to become available on http://www4.io.csic.es/PagsPers/JPortilla/portada/software.
I wrote two months ago about the mini-symposium “Store-Share-and-Cite” at TU Delft, where I gave a talk. The slides for all presentations are available online now. Enjoy!
Early this year, IEEE changed its policy with respect to making your publications available online. Now you are only allowed to put a (final) preprint on your personal web page (or your institution’s), mentioning the copyright and final referencing data. This holds for all papers published after January 1st, 2011. Before, you were also allowed to make the published paper itself available online.
While I do understand that this protects (some of the) additional work done by IEEE to make that final publication look nice, and thus should encourage people to subscribe, I am not happy with this measure. Maybe this is just aligning the IEEE policy with what most publishers do already, but still.
Why do I prefer the published one? First of all, this makes sure only a single version of a paper circulates on the web. I personally find it very annoying to see a paper, start reading it because it looks different from what I have seen before, and then notice that it is actually the same paper in a different typesetting. Even more so if the two versions differ. The final published one would be the most correct one, I assume. Secondly, it also increases the chances that a paper is cited correctly. Because, let’s face it, not everyone will nicely add the “full citation to the original IEEE publication and a link to the article in the IEEE Xplore digital library”.
Correctly citing a paper may become even more difficult…
I am currently attending the AMP Workshop on Reproducible Research: Tools and Strategies for Scientific Computing, organized by Ian Mitchell, Randy LeVeque and Victoria Stodden. This workshop is organized as one of the satellite workshops around ICIAM, the International Congress on Industrial and Applied Mathematics. It’s been a very interesting workshop already, with nice talks and tutorials about tools and policies. Some more about it probably later on this blog. I definitely already have some new input for the links pages on the rest of this website… The talks and presentations will also be posted online on the workshop website.
At the end of this workshop, there will also be a community forum discussion, which I am really looking forward to as well.
It’s been quiet here over the past months… which is actually a sign of how busy real life has been over that period, and how lousy I have been at posting and updating here, and reacting to the little tidbits in my mailbox.
So let’s get things started and more active again over here.
Earlier this month I gave a talk on reproducible research at a mini-symposium “Store, Share and Cite” at TU Delft (The Netherlands). Together with the other technical universities in The Netherlands, they have created a data center for long-term storage of data sets. Great initiative. The mini-symposium was also very interesting, with talks from other data centers and publishers as well. In my opinion it was too much of a library activity, with not enough ‘real’ researchers joining, but I think that’s always difficult. Hey, there were some, and I believe they were interested.
I just got pointed to a very interesting article in The New Yorker. It’s not entirely on reproducible research itself, but on the scientific method, which I believe is related enough to mention it here. The article is about how difficult it is to do ‘good’ science, and not to be tricked into believing what we want to believe.
J. Lehrer, The Truth Wears Off, The New Yorker, December 2010.
Here’s an interview with Pieter Van Gorp about SHARE, a tool to share a research environment, allowing others to reproduce your results.
I got pointed to another recent article published as a column in Nature:
Nick Barnes, Publish your computer code: it is good enough, Nature 467, 753 (2010), doi:10.1038/467753a.
Many very recognizable arguments pro (and contra) putting your code online! I enjoyed reading it, and I hope you will too.