Monthly Archives: July 2008

When your hair is on fire …

I heard a line the other day something like this:

When you’re working with your hair on fire, if you see anything that doesn’t look like a bucket of water, you’re not interested.

I think I heard this on the PowerScripting podcast. The context was a discussion about the design of Microsoft’s PowerShell, a shell and scripting environment aimed at system administrators. The idea was that since many sys admins are working with their hair on fire, PowerShell was designed to look like a bucket of water, something that will bring relief rather than yet another thing to learn.

How can we make reproducible research look like a bucket of water? In the long term, even in the not so long term, reproducibility habits can improve productivity and reduce stress. But many people will not be receptive unless they also see short term benefits, the shorter the better.

I think templates are one way reproducibility can look like a bucket of water to someone with their hair on fire. You’ve got an analysis to do? Here’s a template. Fill in your specifics at the top, compile it, and out comes a beautiful report. Along these lines, at M. D. Anderson we’ve created some Sweave templates for microarray data analysis. One of the things I’d like to see happen on the ReproducibleResearch.org web site is a collection of Sweave templates for statistical analysis. If you have anything to contribute, please send a note.
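To make the "fill in your specifics at the top" idea concrete, here is a minimal sketch of what such a Sweave template might look like. It is not one of the M. D. Anderson templates; the file name `mydata.csv`, the column name `y`, and the variable names `data.file` and `response` are all hypothetical placeholders chosen for illustration.

```latex
\documentclass{article}
\begin{document}

<<settings, echo=FALSE>>=
# Fill in your specifics here. All names below are hypothetical.
data.file <- "mydata.csv"   # assumed: a CSV file with a header row
response  <- "y"            # assumed: the name of a numeric column
@

<<load, echo=FALSE>>=
# Read the data once; later chunks reuse the object.
dat <- read.csv(data.file)
@

\section*{Summary}

The data set contains \Sexpr{nrow(dat)} rows.
A summary of the response variable:

<<summary, echo=FALSE>>=
summary(dat[[response]])
@

\section*{Histogram}

\begin{center}
<<histogram, fig=TRUE, echo=FALSE>>=
# Histogram of the response variable chosen in the settings chunk.
hist(dat[[response]], main = "", xlab = response)
@
\end{center}

\end{document}
```

If this were saved as, say, `report.Rnw`, running `R CMD Sweave report.Rnw` followed by `pdflatex report.tex` should produce the report. Only the first chunk would need editing for a new data set, which is the point of a template.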

Putting reproducible research papers online

OK, you’ve made a reproducible research (RR) paper. How do you make it available online?

At LCAV, we started off by simply making an HTML web page with all the paper details and additional information (data, code, additional figures), and putting that on our lab website.

While this is a very straightforward way of working, it is probably not very practical in the long term, as a new web page has to be made for each new publication. I can also imagine that writing HTML may seem a big step (or just too cumbersome) to some. That’s why we have developed, in collaboration with the EPrints team, a Reproducible Research Repository setup. It has to be configured once (by your system administrator?), and then allows you to upload new RR papers by filling out a form with all the necessary information and uploading PDF, code, data, and/or any other additional material. I think it is really a lot more user-friendly than creating HTML pages each time. At the same time, it ensures that all the information is there, and creates nice web pages (see here for an example of the repository front page, and here for an example of a page for a paper).


Reproducible presentations

Yesterday I wrote about my experience trying out Beamer for writing presentations in LaTeX. Some of the images that I want to include in my presentations are plots produced in R, so one simplification would be to combine Beamer with Sweave. That way I could include code for producing the images directly in my presentation file rather than referencing external image files. Any change to the R code would be automatically reflected in my presentation.

One problem I had when turning a LaTeX Beamer file into a Sweave file was image sizes. When an Sweave file has \documentclass{article}, plots are modestly sized and centered. But when I tried including a plot with an Sweave file with \documentclass{beamer}, the image was so large that it covered up other material on the slide. The solution was to include the following line immediately after the \begin{document} command:

\setkeys{Gin}{width=0.6\textwidth}

(See section 4.1.2, page 14 of the Sweave manual.) This command made the image the size I wanted, but the image was no longer centered. To center the image, I added \begin{center} before and \end{center} after the Sweave figure command. This worked. A sketch of the code is included below.

\documentclass{beamer}
\begin{document}
\setkeys{Gin}{width=0.6\textwidth}

\begin{frame}
\frametitle{...}

Slide verbiage ...

\begin{center}
<<fig=TRUE, echo=FALSE>>=
# R code ...
@
\end{center}
\end{frame}

\end{document}

Our reproducible research paper

We (Jelena Kovacevic, Martin Vetterli, and I) recently submitted a paper on reproducible research to IEEE Signal Processing Magazine: “What, Why and How of Reproducible Research in Signal Processing”. It describes our experiences with reproducible research at LCAV (EPFL) and CMU, as well as a study on reproducibility of articles that appeared in IEEE Transactions on Image Processing (to which many people contributed). Feel free to take a look!

It also describes a setup for putting reproducible research publications online, about which I hope to post more info soon. One thing I already want to mention is that it is freely available for download!

We also did some advertising for this paper recently by e-mail, and I must admit it is always nice to see the reactions you get! It was also picked up in the blogosphere by John Cook, who hosts www.reproducibleresearch.org, here and here, and by Greg Wilson on The Third Bit.

But more about reproducible research soon…

Reproducibility badge

Greg Wilson points out in his badge of reproducibility post a couple of things I missed in my previous post announcing the signal processing article.

The first is the reproducible research badge, the green check mark logo on the preprint site. I don’t know who owns that or what their rules are. Perhaps it belongs to EPFL.

The other is the user evaluation form. Users can select one of the following four options:

  1. I have tested this code and it works
  2. I have tested this code and it does not work (on my computer)
  3. I have tested this code and was able to reproduce the results from the paper
  4. I have tested this code and was unable to reproduce the results from the paper

That is huge. You cannot say whether your own research is reproducible. It’s reproducible when someone else can reproduce it.

Welcome!

Welcome to my personal blog!

On these pages, I plan to post thoughts and ideas on reproducible research, image processing research, or other things I find interesting enough to share with “the world” (that means you). It is also meant for experimenting with this medium, so it is still a bit unclear to me what and how often I will post here. I guess that will also depend on your feedback…

To be honest, it’s not my first attempt at blogging. When I was still at EPFL, I started a blog on reproducible research, but somehow I never managed to publish things regularly enough there. So this time I’ll try to keep it a bit broader, and write a bit more regularly.