Author Archives: Patrick

CVPR 2010

I just got pointed to the author guidelines for CVPR 2010. They state that reviewers will be asked about (indicative) reproducibility (or repeatability, as it is called there):

Repeatability Criteria: The CVPR 2010 reviewer form will include the following additional criteria, with rating and associated comment field: “Are there sufficient algorithmic and experimental details and available datasets that a graduate student could replicate the experiments in the paper? Alternatively, will a reference implementation be provided?”. During paper registration, authors will be asked to answer the following two checkbox questions: “1. Are the datasets used in this paper already publicly available, or will they be made available for research use at the time of submission of the final camera-ready version of the paper (if accepted)? 2. Will a reference implementation adequate to replicate results in the paper be made publicly available (if accepted)?” If either these boxes are checked, the authors should specify in the submitted paper the scope of such datasets and/or implementations so that the reviewers can judge the merit of that aspect of the submission’s contribution. The Program Chairs realize that for certain CVPR subfields providing such datasets, implementations, or detailed specification is impractical, but in other areas it is reasonable and sometimes even standard, so on balance repeatability is a relevant criteria for reviewer consideration. “N.A.” will be an available reviewer score for this field, as it is for other fields.

Very exciting developments!

New paper in forensic bioinformatics

Keith Baggerly and Kevin Coombes have just published a great paper about forensic bioinformatics: “Deriving chemosensitivity from cell lines: forensic bioinformatics and reproducible research in high-throughput biology.” The article will appear in the next issue of Annals of Applied Statistics and is available here.

The blog post Make up your own rules of probability discusses a couple of the innovative rules of probability Baggerly and Coombes discovered while reverse engineering bioinformatics articles.

Note that the Reproducible Ideas blog is winding down. I’m in the process of handing over the reproducibleresearch.org domain to the owners of reproducibleresearch.net. Eventually the .net site will move over to the .org name.

Research data. Who cares?

Today I attended the mini-symposium “Research data. Who cares?”, organized by Leon Osinski at TU Eindhoven. The symposium marked the launch of the 3TU.Data Centre, an organization set up by the libraries of the three Dutch technical universities for the storage, sharing and preservation of research data. I gave a presentation there about my experiences with reproducible research.

Another presentation I liked a lot was given by Pieter Van Gorp, about “SHARE” (Sharing Hosted Autonomous Research Environments), an exciting setup he developed. It allows researchers to put their research results in a safe and well-controlled environment on a virtual machine. Other researchers can then log in to that virtual machine and reproduce the results in exactly the same environment as used by the author, as if they were working on the author’s machine. While I am not entirely sure yet about its advantage for my typical Matlab scripts (which do not require complex installations), it is certainly of tremendous help when presenting more complex tools and results. It seems like a great step towards one-click reproduction of results, and I am certainly going to try it out!

Welcome (back) !

Welcome (or welcome back) to this blog! September 1st, a good moment for a (new) start!

First of all, I’d like to welcome all readers from John Cook’s Reproducible Ideas blog at reproducibleresearch.org. I hope I’ll be able to live up to the standards John has set.

And of course also welcome to reproducibleresearch.net readers, and readers that join from pixeltje.be.

This blog was created by merging reproducibleresearch.net and reproducibleresearch.org. I took the occasion to merge John’s posts (thus keeping those links valid) with my earlier posts related to reproducible research from two other sites: blog.epfl.ch/rr and blog.pixeltje.be.

I hope I’ll be able to write many interesting posts here. Please feel free to comment on any of my writings! And if you are interested in writing guest posts, please let me know!

Anything You Can Do, I Can Do Better (No You Can’t)…

Some more interesting reading:

K. Price, Anything You Can Do, I Can Do Better (No You Can’t)…, Computer Vision, Graphics, and Image Processing, Vol. 36, pp. 387-391, 1986, doi:10.1016/0734-189X(86)90083-6.

Abstract: Computer vision suffers from an overload of written information but a dearth of good evaluations and comparisons. This paper discusses why some of the problems arise and offers some guidelines we should all follow.

Very nice reading material, and (although I know these ideas have been around for quite some time) I was amazed to see, already in this 1986 paper by Price, so many parallels to our recent IEEE Signal Processing Magazine paper. That’s more than 20 years ago! Price talks about the reproducibility problems in computer vision and image processing, writing that we should “stand on others’ shoulders, not on others’ toes”. He also did a study on the reproducibility of a set of about 42 papers, verifying the size of the dataset and the clarity of the problem statement. Price concludes as follows: “Researchers should make the effort to obtain implementations of other researchers’ systems so that we can better understand the limitations of our own work.”

Again, interesting to see how these issues and worries have been around for more than 20 years in the field of image processing. It’s about time to drastically improve our standards, I think!

I would really recommend this article to anyone interested in issues around reproducible research.

Literate Statistical Practice

I just read the following paper:

A. J. Rossini and F. Leisch, Literate statistical practice, UW Biostatistics Working Paper Series 194, University of Washington, WA, USA, 2003.

Although I am not a statistician, this was a very interesting paper to me. It gives a nice description of a possible literate programming approach in statistics. The authors propose a very versatile type of document combining documentation and code/statistical analyses, interwoven as in the original description of literate programming by Knuth. From this single document, which contains a complete description of the research work, multiple reports can be extracted, such as an article, an internal report, an overview of the various analyses that were performed, etc.
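To make the idea concrete, here is a toy sketch (in Python, not the R/Sweave tooling the paper actually builds on) of the “tangle” step: extracting the executable program from a noweb-style literate document. The chunk syntax and chunk names below are purely illustrative.

```python
import re

# A toy literate document: prose interleaved with named code chunks in a
# noweb-like syntax (<<name>>= ... @). Illustrative only; the paper's
# actual workflow is built around R and Sweave/ESS.
doc = """
We first load the data.
<<load>>=
data = [1, 2, 3, 4]
@
Then we compute a summary statistic.
<<summary>>=
mean = sum(data) / len(data)
@
"""

def tangle(text):
    """Extract the code chunks ('tangling'), in document order."""
    chunks = re.findall(r"<<(\w+)>>=\n(.*?)\n@", text, re.DOTALL)
    return "\n".join(body for _, body in chunks)

code = tangle(doc)
exec(code)  # running the tangled program reproduces the analysis
print(mean)  # 2.5
```

The complementary “weave” step would typeset the same document (prose plus analysis output) into a report; the key point is that article and analysis come from one source.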

Challenges and Prizes

I just read the article about Netflix’s Million Dollar Programming Prize on IEEE Spectrum.

Robert M. Bell, Jim Bennett, Yehuda Koren, and Chris Volinsky, The Million Dollar Programming Prize, IEEE Spectrum Online, http://www.spectrum.ieee.org/may09/8788.

An interesting article, showing again how contests built around a challenge can inspire a lot of great work and allow an ‘objective’ comparison between algorithms. I think they provide a great way to motivate researchers to work on real problems, with testing on standardized datasets.

Reproducible Research in Signal Processing – What, why, and how

I am glad to let you know that our paper has been published in the latest issue of IEEE Signal Processing Magazine:

P. Vandewalle, J. Kovacevic and M. Vetterli, Reproducible Research in Signal Processing – What, why, and how, IEEE Signal Processing Magazine, Vol. 26, Nr. 3, pp. 37-47, 2009, DOI: 10.1109/MSP.2009.932122.

Have you ever tried to reproduce the results presented in a research paper? For many of our current publications, this would unfortunately be a challenging task. For a computational algorithm, details such as the exact data set, initialization or termination procedures, and precise parameter values are often omitted in the publication for various reasons, such as a lack of space, a lack of self-discipline, or an apparent lack of interest to the readers, to name a few. This makes it difficult, if not impossible, for someone else to obtain the same results. In our experience, it is often even worse as even we are not always able to reproduce our own experiments, making it difficult to answer questions from colleagues about details. Following are some examples of e-mails we have received:

“I just read your paper X. It is very completely described, however I am confused by Y. Could you provide the implementation code to me for reference if possible?”

“Hi! I am also working on a project related to X. I have implemented your algorithm but cannot get the same results as described in your paper. Which values should I use for parameters Y and Z?”
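The remedy for those missing details can be as simple as writing them down mechanically. As a minimal sketch (my own illustration, not code from the paper; all dataset and parameter names below are hypothetical), one could log the dataset identity, every parameter value, and the random seed right next to each result:

```python
import json
import random

# Record the details that are often omitted from publications: which
# dataset, which parameter values, and how the randomness was initialized.
params = {
    "dataset": "example_set_v1",  # hypothetical dataset identifier
    "threshold": 0.8,             # hypothetical algorithm parameters
    "iterations": 100,
    "seed": 42,
}

random.seed(params["seed"])       # fixed seed -> repeatable initialization
result = sum(random.random() for _ in range(params["iterations"]))

# Store the full configuration alongside the result, so the experiment
# can be rerun exactly as it was.
with open("experiment_log.json", "w") as f:
    json.dump({"params": params, "result": result}, f, indent=2)
```

With such a log in hand, answering the e-mails quoted above reduces to pointing at the stored parameter values.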

Enjoy reading! And feel free to post your comments!

A sobering experience

Last month, a few former colleagues at LCAV did some cross-testing of the reproducible research compendia available at rr.epfl.ch. And I must say, from the results I have seen so far, it is quite a sobering experience. Many of the compendia I considered definitely reproducible didn’t (entirely) pass the test. I guess that shows again how difficult it is to make work really reproducible, even when you fully intend to do so. It also strengthens my conviction that for papers without code and data online, it is almost impossible to reproduce the exact results. There is work to be done on the road to reproducible research!

I’ll need to look further into the reasons why even some of my own work did not pass the test.

reproducibleresearch.net

I am glad to announce our new website on reproducible research: www.reproducibleresearch.net. Yes, as I discussed before, various sites on this topic have recently (or less recently) popped up. However, I still think this site can add something extra to the existing ones. First of all, it mainly addresses the signal/image processing community, a research domain not yet specifically addressed by the other sites.

It contains information on reproducible research and how to make signal processing research reproducible. It also offers references to articles about reproducible research, a discussion forum, and various other related links.

And then there is, in my opinion, an important extra for people interested in signal processing: we added a listing of papers for which code/data are available (with, of course, links to them). I really believe this can be extremely useful when doing research. For copyright reasons, we cannot (in most cases) host the PDFs on our own site, and I am also not sure we should want to. But if developed and maintained well, this can become a one-stop site when looking for the code/data related to a paper. So please feel free to send me your additions. I will be happy to add all signal/image processing related works!

I’m really excited about this site, so let me know what you think!