Author Archives: Patrick

Reproducible Research in Medicine

I just read the following article:

C. Laine, S. N. Goodman, M. E. Griswold, and H. C. Sox, Reproducible Research: Moving toward Research the Public Can Really Trust, Annals of Internal Medicine, Vol. 146, No. 6, pp. 450-453, 2007.

It is a very interesting article about how the journal “Annals of Internal Medicine” is promoting reproducible research. They do not require that all papers be reproducible, but they do ask the authors of each paper whether theirs is reproducible or not. If it is, they provide links to the protocol, data, or statistical code that was used.

Certainly in medicine, this still does not guarantee that the entire research work is reproducible, but it does give a lot of additional information (and credibility) about the presented work. I (as an ignorant researcher) also found it very interesting to read the description of the thorough editorial process that each paper undergoes. I have put an overview of reproducible research initiatives by journals on our RR links page, or at least of the initiatives I know about. Feel free to let me know if you know other examples!

This initiative was prompted, among other things, by an article on this topic by Peng et al. It would be great if other journals followed these examples, and reproducible research became the ‘default’ for a paper…

Putting reproducible research papers online

OK, you’ve made a reproducible research (RR) paper. How do you make it available online?

At LCAV, we started off by simply making an HTML web page with all the paper details and additional information (data, code, additional figures), and putting that on our lab website.

While this is a very straightforward way of working, it is probably not very practical in the long term, as a new web page has to be made for each new publication. I can also imagine that writing HTML may seem a big step (or just too cumbersome) to some. That’s why we have developed, in collaboration with the EPrints team, a Reproducible Research Repository setup. It has to be configured once (by your system administrator?), and then allows you to upload new RR papers by filling out a form with all the necessary information and uploading PDF, code, data, and/or any other additional material. I think it is really a lot more user-friendly than creating HTML pages each time. At the same time, it ensures that all the information is there, and creates nice web pages (see here for an example of the repository front page, and here for an example of a page for a paper).


Our reproducible research paper

We (Jelena Kovacevic, Martin Vetterli, and I) recently submitted a paper on reproducible research to IEEE Signal Processing Magazine: “What, Why and How of Reproducible Research in Signal Processing”. It describes our experiences with reproducible research at LCAV (EPFL) and CMU, as well as a study on the reproducibility of articles that appeared in IEEE Transactions on Image Processing (to which many people contributed). Feel free to take a look!

It also describes a setup for putting reproducible research publications online, about which I hope to post more info soon. One thing I already want to mention is that it is freely available for download!

We also did some advertising for this paper recently by e-mail, and I must admit it is always nice to see the reactions you get! It was also picked up in the blogosphere by John Cook, who hosts www.reproducibleresearch.org, here and here, and by Greg Wilson on The Third Bit.

But more about reproducible research soon…

Welcome!

Welcome to my personal blog!

On these pages, I plan to post thoughts and ideas on reproducible research, image processing research, or other things I find interesting enough to share with “the world” (that means you). It is also meant for experimenting with this medium, so it is still a bit unclear to me what and how often I will post here. I guess that will also depend on your feedback…

To be honest, it’s not my first attempt at blogging. When I was still at EPFL, I already started a blog on reproducible research, but somehow I never managed to publish things regularly enough there. So this time I’ll try to keep it a bit broader, and write a bit more regularly.

Reproducible Research in the Blogosphere

Things have been rather quiet here recently… Not because nothing was happening on reproducible research, but mainly because I was not sure about the purpose and usefulness of this blog. Please feel free to let me know whether you think such a blog is useful or not.

After some interesting discussions about reproducible research and open access, some colleagues have reported on our reproducible research initiatives on their blogs:
– Peter Murray-Rust wrote an article “Open Data is critical for Reproducible Research” on his blog at the University of Cambridge. He is quite active on Open Access to publications and data in chemistry. He and his colleagues have built a robot that extracts crystallographic information from publications and gathers it in an online database, CrystalEye. Their community also has the Blue Obelisk, which collects open source code and data in chemistry.
– Peter Suber referred to reproducible research on his Open Access News blog: OA for text, data, and code to make research reproducible. Peter is a policy strategist for open access to scientific and scholarly research literature. On his blog, he gives a lot of news about new initiatives, publishing policies, etc.

And thanks of course also to Stevan Harnad for his kind and helpful reactions, and for putting me in contact with these people!

repository server for publications

I think it’s probably a lot easier, and more consistent, if instead of making a web page for each RR paper we write (http://lcavwww.epfl.ch/reproducible_research), we have a setup (a bit) like Infoscience, where everyone can enter publications by filling in the required and optional fields. I would like to build such a setup based on EPrints (http://www.eprints.org/software/) and make it public, so that other labs/universities can also easily set up a similar server. We will probably let the people from EPrints develop this system, but for that we need accurate requirements… So your comments on this would be very welcome!

I was thinking about the following fields:
– standard publication fields (title, author, reviewing status, journal, volume, number, pages, year, DOI, abstract, keywords, PDF, publisher, official URL)
– specifically for RR:
* code and data (in a zip archive, specifying also the type of code), mandatory
* tested configurations, mandatory
* contact e-mail address, mandatory
* figures, optional
– additional features for readers (cf. http://clare.eprints.org/10/ for an example of the last one)
* a check box saying ‘I have tested this code and it runs/does not run’
* a check box saying ‘I was/was not able to reproduce the results described in this paper’
* a field where anyone can add comments
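
To make the proposal a bit more concrete, here is a minimal sketch in Python of how such a record and its mandatory fields could be checked. This is not EPrints code or its API: the class name RRPaperRecord, the field names, and the method missing_mandatory are all hypothetical, chosen only to mirror the list above, and only a few of the standard publication fields are included for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RRPaperRecord:
    """Hypothetical record for one RR paper (names are illustrative only)."""

    # A few of the standard publication fields
    title: str
    authors: List[str]
    year: int
    # RR-specific fields proposed as mandatory
    code_and_data_zip: str = ""   # path or URL of the zip archive
    code_type: str = ""           # e.g. "Matlab" or "C"
    tested_configurations: List[str] = field(default_factory=list)
    contact_email: str = ""
    # RR-specific field proposed as optional
    figures: List[str] = field(default_factory=list)

    def missing_mandatory(self) -> List[str]:
        """Return the names of mandatory RR fields that are still empty."""
        required = {
            "code_and_data_zip": self.code_and_data_zip,
            "code_type": self.code_type,
            "tested_configurations": self.tested_configurations,
            "contact_email": self.contact_email,
        }
        return [name for name, value in required.items() if not value]


if __name__ == "__main__":
    record = RRPaperRecord(
        title="An example paper",
        authors=["A. Author"],
        year=2008,
        contact_email="author@example.org",
    )
    # Lists the mandatory fields the submitter still has to fill in.
    print(record.missing_mandatory())
```

A submission form could refuse to publish a record as long as missing_mandatory() returns a non-empty list, which is one way to make sure all the information is there.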

Any comments? More/less things needed?
Some specific questions:
– should we link these ‘Additional features’ to a name and/or date, so that we can avoid the author clicking 10 times? 😉
– should we separate code and data? Data might get quite large, while code is generally small.
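
On the first question, one possible answer is to store reader feedback keyed by name, together with a date, so that repeated submissions from the same person replace each other instead of adding up. The Python sketch below only illustrates that idea: the names Feedback and FeedbackLog are hypothetical, and it assumes that readers identify themselves honestly, which a real system would have to enforce with some kind of login.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Feedback:
    """One reader's feedback on a paper (hypothetical structure)."""

    reader: str                                # name given by the reader
    submitted: date                            # date of the submission
    code_runs: Optional[bool] = None           # 'it runs / does not run'
    results_reproduced: Optional[bool] = None  # 'was / was not able to reproduce'
    comment: str = ""


class FeedbackLog:
    """Keeps at most one feedback entry per reader name."""

    def __init__(self) -> None:
        self._by_reader: Dict[str, Feedback] = {}

    def submit(self, feedback: Feedback) -> None:
        # Normalize the name so "Alice" and " alice " count as one reader;
        # a later submission simply replaces the earlier one.
        key = feedback.reader.strip().lower()
        self._by_reader[key] = feedback

    def count_code_runs(self) -> int:
        """Number of distinct readers who reported that the code runs."""
        return sum(1 for fb in self._by_reader.values() if fb.code_runs)

    def __len__(self) -> int:
        return len(self._by_reader)


if __name__ == "__main__":
    log = FeedbackLog()
    # The author clicking 10 times still counts only once.
    for _ in range(10):
        log.submit(Feedback("The Author", date(2008, 1, 15), code_runs=True))
    log.submit(Feedback("A Reader", date(2008, 1, 16), code_runs=False))
    print(len(log), log.count_code_runs())
```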

licenses – which one to use?

OK, there we go again after some pretty silent months… with a first note about a usable license for our reproducible research.

I have done some more reading about licensing possibilities, and believe me, there are plenty of them: 😉 see for example
www.opensource.org,
GNU Licenses, or
Creative Commons licenses (although these are not intended for software, so they do not seem really useful for our purpose).

Some features that I would find desirable for our license:
– be an ‘open’ license, meaning that people can get it freely (without paying) and easily on the web, and can even contribute etc.
– it would sound fair to me that if someone wants to build a commercial application using my code, he has to somehow ask (and pay) for it.
– if I want to commercialize my code myself, I need to be able to do this 😉

This second point seems to be a problem with most open source licenses. The GPL says that all derived works that are distributed need to use the GPL too, whereas many other open licenses allow any kind of redistribution, under whatever commercial or noncommercial license the redistributor chooses. As far as I can see, the third item is not a problem, as the author himself can apparently re-license things any way he wants. Except, of course, for the fact that some version of your code may already be floating around on the internet.

I currently feel attracted to the dual licensing I saw in some places on the internet (MySQL uses this, and so do our neighbors from CVLab): make the work freely available under the GPL by default, but with a remark that people who want to use it commercially can contact us for a commercial license. This should give a very open distribution and force other people to use the GPL too if they want to redistribute it, while also keeping open the possibility to commercialize it.

Any comments on this? Is this the way we should license our reproducible work?

Academic Free License (AFL) v. 3.0

— just copying an e-mail from Christophe below about a possible license: AFL —

Hello to the reproducible research group 😉

Although I should still read it once more (and in its entirety) to understand it, this license could be a good candidate for Reproducible Research.
Does anyone know about it and if it is good or bad in any sense?

HTML: http://www.opensource.org/licenses/afl-3.0.php

Christophe

licenses

One important issue to discuss is which license all our reproducible code and information should go under. Currently most of our stuff is under GPL, but that does not permit even us to commercialize things later. All derived code and products necessarily have to go under GPL too in that case. So that may not be so great if someone wants to do a startup based on his PhD research.