I wrote a post on my personal blog recently called Medieval software project management. The post compares software project management to the medieval practice of “beating the bounds,” having young boys walk the perimeter of a parish to memorize the boundaries in order to preserve this information for their lifetimes. Many research projects use a similar strategy, assigning one person to a project for life, depending on that person’s memory rather than capturing project information in prose or in software.
Johanna Rothman wrote a response to the medieval project management post in which she gives good advice on how businesses can avoid such traps. Here’s an excerpt.
Here’s what I did when I was a manager inside organizations, and what I suggest to clients now: make sure a team works on each project. That means no single-person projects, ever. A team to me contains all the people necessary to release a product. Certainly a developer and a tester. Maybe a writer, maybe a release engineer, maybe an analyst. Maybe a DBA. Whatever it takes to release a product, everyone’s on the team. Everyone participates. If they can automate their work and explain it to other people, great. But it’s not a team unless the team can release the product. (Emphasis added.)
It would be terrific progress if more scientific programming were done this way. In theory, science strives for a higher standard. Not only should your team of colleagues be able to reproduce your work, so should anonymous scientists around the world. But in practice, science often has lower standards than business with regard to software development.
Here’s an excerpt from Jon Udell’s interview with Roger Barga explaining the idea of provenance in art and science.
JU: Explain what you mean by provenance.
RB: Think about it in terms of art. For a given piece of art, we’re able to establish through authorities that it’s original, where it came from, and who’s had their hands on it through its lifetime. Provenance for a workflow result is the same thing. Minimally we want to be able to establish trust in a result. If you think about how that happens, it often starts by considering who wrote the workflow. So with Trident you can click on a result and interrogate the history of the workflow: who wrote it, who reviewed it, who revised it, when it first entered the system.
Jon Udell has a post this morning about reproducible research in oceanography.
Trident: A workflow system for doing data-intensive science with reproducible results
The post describes an interview with Roger Barga about the Trident system.
Here's an article closely related to my current work on 3D:
D. Scharstein and R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision, 47(1/2/3), pp. 7-42, April-June 2002.
In their article, Scharstein and Szeliski compare stereo estimation algorithms. But they do more than offer an overview of algorithms. On their webpage, they also provide the source code and a widely used dataset of stereo images. They invite other researchers to try their own algorithms on this dataset and upload the results. Over the years this has produced a performance comparison of almost 50 stereo algorithms, nicely listed on their webpage.
A nice example of what reproducible research can do! I think we need a lot more of these comparisons on common (representative) datasets.
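To make concrete the kind of comparison such a shared benchmark enables, here is a minimal sketch of a "bad pixel" error metric of the sort commonly used to score dense stereo algorithms: the fraction of pixels whose estimated disparity differs from the ground truth by more than a threshold. The function name, threshold, and data layout here are illustrative assumptions, not taken from the authors' code.

```python
def bad_pixel_rate(estimated, ground_truth, threshold=1.0):
    """Fraction of pixels whose disparity error exceeds `threshold`.

    `estimated` and `ground_truth` are equal-length flat lists of
    disparities, one value per pixel.
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("disparity maps must have the same size")
    bad = sum(1 for e, g in zip(estimated, ground_truth)
              if abs(e - g) > threshold)
    return bad / len(estimated)

# Example: two of these four pixels are off by more than one disparity level.
rate = bad_pixel_rate([10.0, 12.5, 8.0, 20.0], [10.0, 14.0, 8.2, 18.0])
```

Because every entrant is scored by the same metric on the same images, the resulting numbers are directly comparable, which is exactly what makes the public ranking meaningful.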
Test-driven software development has much in common with reproducible research. Here’s an excerpt from a talk by Kent Beck, one of the most visible proponents of test-driven development. He says test-driven development isn’t about testing.
Testing really isn’t the point. The point here is about responsibility. When you say it’s done, is it done? Can you go to sleep at night knowing that the software you finished today works and will help and isn’t going to take anything away from people?
You could say similar things about reproducible research. RR is about responsibility, really finishing a project rather than sorta finishing it. Can other people build on top of your work with confidence? Can you build with confidence tomorrow on the work you did today?
Software unit tests exist not only to verify that code is correct, but to ensure that the code stays correct over time. These tests act as tripwires. The hope is that if a future change introduces a bug, a unit test will fail. Again, similar remarks apply to RR. With RR, you’re not just interested in producing a result. You’re also giving some thought to producing a variation on that result with minimum effort and maximum confidence in the future when something changes.
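The tripwire idea is easy to see in a tiny example. Suppose an analysis depends on a helper function; unit tests pin down its expected behavior, so any future change that silently alters the results fails loudly instead. This is a minimal sketch in Python's standard `unittest` framework; the function and values are invented for illustration.

```python
import unittest

def normalize(values):
    """Scale values so they sum to 1, e.g. to turn raw counts
    into proportions in an analysis."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize an all-zero list")
    return [v / total for v in values]

class TestNormalize(unittest.TestCase):
    # These tests act as tripwires: if a later "optimization"
    # changes the behavior of normalize, they fail immediately.
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([2, 3, 5])), 1.0)

    def test_known_values(self):
        self.assertEqual(normalize([1, 1]), [0.5, 0.5])

    def test_rejects_zero_total(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

Running `python -m unittest` passes today and keeps the function honest tomorrow, which is precisely the "build with confidence on yesterday's work" property described above.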
I just read the following article:
C. Laine, S. N. Goodman, M. E. Griswold, and H. C. Sox, Reproducible Research: Moving toward Research the Public Can Really Trust, Annals of Internal Medicine, Vol. 146, Nr. 6, pp. 450-453, 2007.
A very interesting article, about how the journal “Annals of Internal Medicine” is promoting reproducible research. They do not require that all papers are reproducible, but they do ask the authors of each paper whether theirs is reproducible or not. If it is reproducible, they provide links to the protocol, data, or statistical code that was used.
While this still does not guarantee, certainly in medicine, that the entire research effort is reproducible, it does provide a lot of additional information about (and credibility for) the presented work. I (as an ignorant researcher) also found it very interesting to read the description of the thorough editorial process that each paper undergoes. I have put an overview of reproducible research initiatives by journals on our RR links page. That is, the initiatives I know about, of course. Feel free to let me know if you know of other examples!
This initiative was prompted, among other things, by an article on the topic by Peng et al. It would be great if other journals followed these examples, and reproducible research became the ‘default’ for a paper…
Last week Greg Wilson asked me what I would do if I had one hour to teach a group about reproducible research. He said to assume that the group is already convinced of the need for reproducibility.
First, here are some thoughts on what I’d say if the group had not given much thought to reproducibility. I would start impersonal and then become more personal. I’d begin by relating some horror stories of how someone else’s work was impossible to reproduce and contained false conclusions. It’s easy to gang up on some third-party researcher, griping about how sloppy someone not in the room was in their research. This plants the idea that at least some people need to think more about reproducibility. Then I’d transition by talking about times when I’ve had difficulty reproducing my own work. Finally, I would try to convince them that their own work is probably not reproducible, or at least not easily reproducible. So my outline would be: they have problems, I have problems, you have problems.
I believe that convincing people of the need to be concerned about reproducibility is most of the problem. If people are highly motivated, they will come up with their own ways to make their work easier to reproduce and they will gladly take advantage of tools they are introduced to.
Back to Greg’s original question: now what? First I’d expound the merits of version control systems. You can’t possibly reproduce software if you can’t put your hands on the source code, and you can’t reproduce software as it existed at a particular point in time without revision history. Then I’d emphasize that version control is necessary but not sufficient. When people first understand version control, they tend to think it takes care of all their reproducibility problems, when in fact it’s just the first step. I’d share some war stories of projects that have taken many hours to build even though we had all the source code. (If I had a semester rather than an hour, I’d let them experience this for themselves rather than just telling them about it, by bringing in some outside projects for them to rebuild.) I’d also emphasize that it’s not enough to put code in version control: data needs to be versioned as well.
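One lightweight way to make "version the data too" concrete is to record a cryptographic hash of each input file alongside the code, so a later checkout can verify it is running against exactly the data the results were produced with. This is a minimal sketch under my own assumptions; the manifest file name and JSON format are invented for illustration, not any standard.

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path):
    """SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_files, manifest_path="data_manifest.json"):
    """Record the hash of each data file; commit this manifest with the code."""
    manifest = {str(p): file_sha256(p) for p in data_files}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

def verify_manifest(manifest_path="data_manifest.json"):
    """Return the files whose contents no longer match the manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, digest in manifest.items() if file_sha256(p) != digest]
```

The manifest itself is small text, so it lives happily in version control even when the data files themselves are too large to.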
Once they grok version control, I’d discuss automation. When a process is 99% automated and 1% manual, the reproducibility problems come from the 1% that is manual. The principle behind many reproducibility tools is automating steps that are otherwise manual, undocumented, and error-prone. (See Programming the last mile.)
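The principle is that every step, including "last mile" chores like producing the summary report, lives in a script rather than in someone's fingers. Here is a minimal sketch of such a driver; the file names, column name, and statistics are invented for illustration.

```python
import csv
import statistics

def run_analysis(raw_path="raw_measurements.csv",
                 summary_path="summary.txt"):
    """Read raw data, compute summary statistics, write the report.

    Nothing here is done by hand: re-running this one function
    regenerates the report from the raw data, so the 1% manual
    step (and its reproducibility problems) disappears.
    """
    with open(raw_path, newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    mean = statistics.mean(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    with open(summary_path, "w") as f:
        f.write(f"n = {len(values)}\n")
        f.write(f"mean = {mean:.4f}\n")
        f.write(f"stdev = {stdev:.4f}\n")
```

If the raw data changes, or a reviewer asks for a variation, one call regenerates everything; no number in the report was ever typed by a human.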
Finally, I’d emphasize the need for auditing. As I pointed out in an earlier post “You cannot say whether your own research is reproducible. It’s reproducible when someone else can reproduce it.” Again if I had a semester rather than an hour, I’d let them experience this by having them reproduce each other’s assignments. I could hear it now: “What do you mean you can’t reproduce my homework? It’s all right there!”
Greg Wilson and I have been discussing the importance of tools in reproducible research lately. Would more people adopt reproducible research practices if tools made doing so more convenient? Would better tools appear if more people cared about reproducibility?
I believe both statements are true, and I believe Greg does as well. However, he and I have different emphases. Greg says “In my experience, most people won’t adopt a programming practice unless there is at least some basic support for it.” I agree, but I think the biggest obstacle to more widespread reproducibility is that few people realize or care that their work is irreproducible. I think that when more people care about reproducibility, some percentage of them will develop and give away tools and we’ll have enough tool support.
We are not in a chicken-and-egg scenario. It’s not as if Greg is saying first we need tools and I’m saying first we need users. We have both tools and users. There are people who care about reproducibility, and some of them have produced tools that make it easier for others to follow. But not many of these people know each other or know about their tools. I hope that the ReproducibleResearch.org web site and this blog will change this.
It helps to look at the early history of object-oriented programming. Some people were writing object-oriented programs before there were (popular) object-oriented languages. For example, some people were writing object-oriented C before C++ baked support for OO into the language. This was painful, but some pioneers did it. To Greg’s point, the number of programmers writing OO programs took off once there were OO languages with good tool support. To my point, first there were programmers wanting to write OO code; these were the folks who developed the tools and were the early adopters of the tools.