Monthly Archives: October 2008

Biggest barrier to reproducibility

My previous post discussed an AMSTAT News article about Keith Baggerly and his efforts as a “forensic bioinformatician.”

In that article, the reporter asks Keith to name the biggest problem he sees in trying to reproduce results.

It’s not sexy, it’s not higher mathematics. It’s bookkeeping … keeping track of the labels and keeping track of what goes where. The thing that we have found repeatedly in our analyses is that it actually is one of the most difficult steps in performing some of these analyses.

I’ve seen presentations where Keith discusses specific bookkeeping errors. Quite often columns get transposed in spreadsheets, so researchers are not analyzing the data they say they are analyzing.
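To make the bookkeeping point concrete, here is a minimal sketch in R of the kind of sanity check that catches such errors. The file names and column layout (an expression matrix with one column per sample, and a sample-information sheet with a sample_id column) are hypothetical, not taken from Keith's work.

    # Hypothetical inputs: an expression matrix with one column per sample,
    # and a sample-information sheet with one row per sample.
    expr <- read.csv("expression_matrix.csv", row.names = 1, check.names = FALSE)
    info <- read.csv("sample_info.csv", stringsAsFactors = FALSE)

    # The matrix columns and the sample sheet should describe the same samples,
    # in the same order; anything else means the labels have drifted.
    if (ncol(expr) != nrow(info)) {
      stop("Expected ", nrow(info), " samples but the matrix has ", ncol(expr), " columns")
    }
    mismatch <- which(colnames(expr) != info$sample_id)
    if (length(mismatch) > 0) {
      stop("Sample labels disagree at columns: ", paste(mismatch, collapse = ", "))
    }

A check like this costs a few lines of code, runs before any analysis starts, and immediately flags a mismatch between the labels in the data file and the sample sheet.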

Forensic bioinformatics

The October 2008 issue of AMSTAT News has an article entitled “Forensic Bioinformatician Aims To Solve Mysteries of Biomarker Studies.” The article is about Keith Baggerly of M. D. Anderson Cancer Center and his efforts to reproduce analyses in bioinformatics papers.

The article quotes David Ransohoff, professor of medicine at UNC Chapel Hill, saying this about Keith Baggerly.

I think Keith is doing a wonderful and needed job … But the fact that we need people like him means that our journals are failing us. The kinds of things that Keith spends time finding out — what [the researchers] actually do — that’s what methods and results are supposed to be for in journals. … We need to figure out how to do science without needing people like Keith.

One reason for the lack of reproducibility is that journals press authors to save space, so statistics sections get abbreviated. (Why not put the full details online?) Another reason is that bioinformatics articles are inherently cross-disciplinary, and it may be that no single person is responsible for, or even understands, the entire article.

What’s in a name?

Reproducible research, literate programming, open science, and science 2.0: all different names for what is (in my opinion) largely the same topic, namely sharing the code and/or data that complement a publication as a presentation of your research work. Literate programming focuses more on adding documentation to code, and science 2.0 seems to assume that you put work in progress online, but there is a very large intersection between these topics.

This clearly shows that the same ideas pop up in very different corners of the scientific community and in very different fields of science. That is exciting! It also shows that there is a real need for such open publication of research. And I think everyone will agree that there is nothing nicer than being able to start from the current state of the art when beginning research in a new field.

Should all these efforts be merged under a single “label”? It would certainly be exciting, and it would have a huge impact: a joint effort for “open science”, “reproducible research”, or whatever the name may be, would receive a lot of attention and could no longer be overlooked. At the same time, every research domain needs its own specifics and fine-tuning, and it is not yet clear to me what the “best” setup would be for the kind of work I do now. So maybe we should let these variations coexist for a while longer and see which ones survive, which are the simplest to use, and which tools can be combined into an optimal method for doing research.

But of course (if anyone is reading these posts), I would be very happy to hear your own opinion on this!

Scientific fraud

A few months ago I read in a Belgian newspaper that 9% of the participants in a study of 2,000 American scientists said they had witnessed scientific fraud within the past three years. And it seems they were not talking about cases where people use Photoshop to crop an image, but about inventing fake results or falsifying articles.

Although I wasn’t able to find the study again on the web with Google, I am quite sure the original authors checked the number. Wikipedia reports on another study in which the figure was 3%. Whether it is 3 or 9 percent, the number is much too high. Let us hope it can be brought down by demanding greater reproducibility in our research work. I do realize that there will always be people who cheat and falsify results (Wikipedia even keeps a list of the most famous cases). But I also strongly believe that, in the end, most researchers simply want to do good work. Many of them produce non-reproducible work only because they don’t feel the need to make it reproducible (yet), or because they are too busy with the next piece of work to properly finish off the current one…

Embedding .NET code in Office documents

I recently heard about some interesting tools from Blue Reference. I haven’t had a chance to try them out yet, but they look promising.

Sweave has received a fair amount of attention with regard to reproducibility because it lets you embed R code in LaTeX. The code stays with the presentation document, reducing the chance of error and increasing transparency. However, the number of people who use R and LaTeX is small, and asking people to learn both packages before they can do reproducible research is not going to fly. The number of people who use C# and Microsoft Word is orders of magnitude larger than the number who use R and LaTeX.
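For readers who haven’t seen Sweave, here is a minimal sketch of what such a document looks like; the analysis is made up, using a data set that ships with R. The R code sits in chunks inside the LaTeX source, and running the file through Sweave replaces each chunk with its output before the document is typeset.

    \documentclass{article}
    \begin{document}

    We fit a linear model to R's built-in \texttt{cars} data set.

    <<fit, echo=TRUE>>=
    fit <- lm(dist ~ speed, data = cars)
    coef(fit)
    @

    The estimated slope is \Sexpr{round(coef(fit)["speed"], 2)} feet per mph.

    \end{document}

Because the numbers in the text are computed from the code in the same file, rerunning Sweave after the data change updates the paper as well.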

It looks like Blue Reference’s product Inference for .NET lets .NET programmers do the kinds of things Sweave lets R programmers do: embed .NET code in Microsoft Office documents. They also make a product, Inference for MATLAB, for embedding MATLAB code in Office documents.

Python developers who don’t think of themselves as .NET developers might want to use Inference for .NET to embed Python code in Word documents via IronPython. Ruby developers might want to use IronRuby similarly.

Programming is understanding

Bjarne Stroustrup’s book The C++ Programming Language begins with the quote

Programming is understanding.

Many times I’ve thought I understood something until I tried to implement it in software. Then the process of writing and testing the software exposed my lack of understanding.

One thing that can make reproducible research difficult is that you have to deeply understand what you’re doing. Making work reproducible may require automating steps that you do not fully understand, and that you don’t realize you don’t understand until you try to automate them.

Stated more positively, attempts to make research reproducible can lead to new insights into the research itself.

Related post: Paper doesn’t abort

Musical chairs and reproducibility drills

In a recent interview on the Hanselminutes podcast, Jeff Webb said that if he were to teach a computer science class, he would have the class work on an assignment, then a week later make everyone move over one chair, i.e., have everyone take over the code their neighbor started. Aside from the difficulty of assigning individual grades in such a class, I think that’s a fantastic idea.

Suppose students did have to take over a new code base every week. People who write mysterious code would be chastised by their peers. Hopefully people who think they write transparent code would realize that they don’t. The students might even hold a meeting outside of class to set a strategy. I could imagine someone standing up to argue that they’re all going to do poorly in the class unless they agree on some standards. It would be fantastic if the students would discover a few principles of software engineering out of self-defense.

I had a small taste of this in college. My first assignment in one computer science class was to add functionality to a program the instructor had started. Then he asked us to add the same functionality to a program that a typical student had written the previous semester. As the instructor emphasized, he didn’t pick out the worst program turned in, only a typical one. As I recall, the student code wasn’t terrible, but it wasn’t exactly clear either. This was by far the most educational homework problem I had in a CS class. I realized that the principles we’d been taught about how to write good code were not just platitudes but survival skills. Later my experience as a professional programmer and as a project manager reinforced the same conclusion.

In some environments, it’s not practical to have people switch projects unless it is absolutely necessary. Maybe the code is high quality (and maybe it’s not!) but there is a large amount of domain knowledge necessary before someone could contribute to the code. But at least software developers ought to be able to build each other’s code, even if they couldn’t maintain it.

When I was managing a group of around 20 programmers, mostly working on one-person projects, I had what I called reproducibility drills. These were similar to Jeff Webb’s idea for teaching computer science. I had everyone try to build someone else’s project. These exercises turned out to be far more difficult than anyone anticipated, but they caused us to improve our development procedures.

We later added a policy that a build master would have to extract a project from version control and build it without help from the developer before the project could be deployed. The developer was allowed (required) to create written instructions for how to build the project, and these instructions were to be in a location dictated by convention. The build master position rotated so we wouldn’t become too dependent on one person’s implicit knowledge.

Having a rotating build master is a great improvement, but it lacks some of the benefits of the reproducibility drills. The build master procedure only requires a project to be reproducible before it’s deployed. That is essential, but it could foster an attitude that it’s OK for a project to be in bad shape until the very end. Also, some projects, such as research projects, never actually deploy, and so they never go to the build master.