
Musical chairs and reproducibility drills

In a recent interview on the Hanselminutes podcast, Jeff Webb said that if he were to teach a computer science class, he would have the class work on an assignment, then a week later make everyone move over one chair, i.e. have everyone take over the code their neighbor started. Aside from the difficulty of assigning individual grades in such a class, I think that’s a fantastic idea.

Suppose students did have to take over a new code base every week. People who write mysterious code would be chastised by their peers. Hopefully people who think they write transparent code would realize that they don’t. The students might even hold a meeting outside of class to set a strategy. I could imagine someone standing up to argue that they’re all going to do poorly in the class unless they agree on some standards. It would be fantastic if the students would discover a few principles of software engineering out of self-defense.

I had a small taste of this in college. My first assignment in one computer science class was to add functionality to a program the instructor had started. Then he asked us to add the same functionality to a program that a typical student had written the previous semester. As the instructor emphasized, he didn’t pick out the worst program turned in, only a typical one. As I recall, the student code wasn’t terrible, but it wasn’t exactly clear either. This was by far the most educational homework problem I had in a CS class. I realized that the principles we’d been taught about how to write good code were not just platitudes but survival skills. Later my experience as a professional programmer and as a project manager reinforced the same conclusion.

In some environments, it’s not practical to have people switch projects except when absolutely necessary. Maybe the code is high quality (and maybe it’s not!), but a large amount of domain knowledge is necessary before someone can contribute to it. But at least software developers ought to be able to build each other’s code, even if they couldn’t maintain it.

When I was managing a group of around 20 programmers, mostly working on one-person projects, I had what I called reproducibility drills. These were similar to Jeff Webb’s idea for teaching computer science. I had everyone try to build someone else’s project. These exercises turned out to be far more difficult than anyone anticipated, but they caused us to improve our development procedures.

We later added a policy that a build master would have to extract a project from version control and build it without help from the developer before the project could be deployed. The developer was allowed (required) to create written instructions for how to build the project, and these instructions were to be in a location dictated by convention. The build master position rotated so we wouldn’t become too dependent on one person’s implicit knowledge.

Having a rotating build master was a great improvement, but it lacked some of the benefits of the reproducibility drills. The build master procedure only requires a project to be reproducible before it’s deployed. That is essential, but it could foster an attitude that it’s OK for a project to be in bad shape until the very end. Also, some projects, such as research projects, never actually deploy, and so they never go to the build master.

Medieval project management

I wrote a post on my personal blog recently called Medieval software project management. The post compares software project management to the medieval practice of “beating the bounds,” having young boys walk the perimeter of a parish to memorize the boundaries in order to preserve this information for their lifetimes. Many research projects use a similar strategy, assigning one person to a project for life, depending on that person’s memory rather than capturing project information in prose or in software.

Johanna Rothman wrote a response to the medieval project management post in which she gives good advice on how businesses can avoid such traps. Here’s an excerpt.

Here’s what I did when I was a manager inside organizations, and what I suggest to clients now: make sure a team works on each project. That means no single-person projects, ever. A team to me contains all the people necessary to release a product. Certainly a developer and a tester. Maybe a writer, maybe a release engineer, maybe an analyst. Maybe a DBA. Whatever it takes to release a product, everyone’s on the team. Everyone participates. If they can automate their work and explain it to other people, great. But it’s not a team unless the team can release the product. (Emphasis added.)

It would be terrific progress if more scientific programming were done this way. In theory, science strives for a higher standard. Not only should your team of colleagues be able to reproduce your work, so should anonymous scientists around the world. But in practice, science often has lower standards than business with regard to software development.

Provenance in art and science

Here’s an excerpt from Jon Udell’s interview with Roger Barga explaining the idea of provenance in art and science.

JU: Explain what you mean by provenance.

RB: Think about it in terms of art. For a given piece of art, we’re able to establish through authorities that it’s original, where it came from, and who’s had their hands on it through its lifetime. Provenance for a workflow result is the same thing. Minimally we want to be able to establish trust in a result. If you think about how that happens, it often starts by considering who wrote the workflow. So with Trident you can click on a result and interrogate the history of the workflow: who wrote it, who reviewed it, who revised it, when it first entered the system.

Test-driven development

Test-driven software development has much in common with reproducible research. Here’s an excerpt from a talk by Kent Beck, one of the most visible proponents of test-driven development. He says test-driven development isn’t about testing.

Testing really isn’t the point. The point here is about responsibility. When you say it’s done, is it done? Can you go to sleep at night knowing that the software you finished today works, will help, and isn’t going to take anything away from people?

You could say similar things about reproducible research. RR is about responsibility, really finishing a project rather than sorta finishing it. Can other people build on top of your work with confidence? Can you build with confidence tomorrow on the work you did today?

Software unit tests exist not only to verify that code is correct, but to ensure that the code stays correct over time. These tests act as tripwires: the hope is that if a future change introduces a bug, a unit test will fail. Again, similar remarks apply to RR. With RR, you’re not just interested in producing a result. You’re also giving some thought to producing a variation on that result in the future, with minimum effort and maximum confidence, when something changes.
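To make the tripwire idea concrete, here is a minimal sketch in Python. The function and its test are hypothetical stand-ins for a real analysis step; the point is only that the test pins down today’s known-good behavior so a future change that breaks it fails loudly.

```python
def normalize(values):
    """Scale a list of numbers so they sum to 1 (hypothetical analysis step)."""
    total = sum(values)
    return [v / total for v in values]

def test_normalize():
    # Tripwire: if a future edit changes normalize()'s behavior, this fails.
    result = normalize([2.0, 3.0, 5.0])
    assert abs(sum(result) - 1.0) < 1e-12
    assert result == [0.2, 0.3, 0.5]

test_normalize()  # run directly here; a test runner would collect it automatically
```

A test runner such as pytest would pick up `test_normalize` by name and run it on every change, which is what turns a one-time check into a tripwire.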

How to teach RR in one hour

Last week Greg Wilson asked me what I would do if I had one hour to teach a group about reproducible research. He said to assume that the group is already convinced of the need for reproducibility.

First here are some thoughts on what I’d say if the group had not given much thought to reproducibility. I would start impersonal and then become more personal. I’d start by relating some horror stories of how someone else’s work was impossible to reproduce and contained false conclusions. It’s easy to gang up on some third party researcher, griping about how sloppy someone not in the room was in their research. This plants the idea that at least some people need to think more about reproducibility. Then I’d transition by talking about times when I’ve had difficulty reproducing my own work. Then I would try to convince them that their own work is probably not reproducible or at least not easily reproducible. So my outline would be they have problems, I have problems, you have problems.

I believe that convincing people of the need to be concerned about reproducibility is most of the problem. If people are highly motivated, they will come up with their own ways to make their work easier to reproduce and they will gladly take advantage of tools they are introduced to.

Back to Greg’s original question: now what? First I’d expound the merits of version control systems. You can’t possibly reproduce software if you can’t put your hands on the source code, and you can’t reproduce software as it existed at a particular point in time without revision history. Then I’d emphasize that version control is necessary but not sufficient. When people first understand version control, they tend to think it takes care of all their reproducibility problems, when in fact it’s just the first step. I’d share some war stories of projects that have taken many hours to build even though we had all the source code. (If I had a semester rather than an hour, I’d bring in some outside projects for them to rebuild so they could experience this for themselves rather than just hearing about it.) I’d also emphasize that it’s not enough to put code in version control: data needs to be versioned as well.
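One lightweight way to version data alongside code, when the data files themselves are too large or too awkward to commit, is to commit a checksum manifest instead. Here is a sketch in Python; the file names are hypothetical, and this is one possible approach rather than a standard tool.

```python
# Sketch: record a checksum manifest for the data files used in an analysis,
# so the manifest can be committed to version control alongside the code.
import hashlib
import json
from pathlib import Path

def checksum(path):
    """SHA-256 digest of a file, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_dir, manifest_path="data_manifest.json"):
    """Map each file in data_dir to its checksum and save as JSON."""
    manifest = {p.name: checksum(p)
                for p in sorted(Path(data_dir).glob("*")) if p.is_file()}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Anyone reproducing the analysis later can recompute the checksums and compare them against the committed manifest to confirm they have the same data the original run used.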

Once they grok version control, I’d discuss automation. When a process is 99% automated and 1% manual, the reproducibility problems come from the 1% that is manual. The principle behind many reproducibility tools is automating steps that are otherwise manual, undocumented, and error-prone. (See Programming the last mile.)
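The shape of that automation can be as simple as a driver script: every formerly manual step becomes a function, and one entry point runs them in order. The sketch below uses hypothetical placeholder steps; the real versions would download data, filter records, and run the actual analysis.

```python
# Sketch of a one-command analysis driver. Each step that would otherwise
# be a manual, undocumented action becomes an explicit function.

def fetch_data():
    return [1.0, 2.0, 3.0, 4.0]          # stand-in for loading raw data

def clean(data):
    return [x for x in data if x > 1.0]  # stand-in for filtering bad records

def analyze(data):
    return sum(data) / len(data)         # stand-in for the real analysis

def run_all():
    """The whole pipeline, end to end, with no manual steps in between."""
    return analyze(clean(fetch_data()))

print(run_all())  # the entire analysis reproduces with one command
```

Once a pipeline has this shape, the remaining manual 1% is visible by its absence: anything not reachable from `run_all` is a reproducibility risk.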

Finally, I’d emphasize the need for auditing. As I pointed out in an earlier post “You cannot say whether your own research is reproducible. It’s reproducible when someone else can reproduce it.” Again if I had a semester rather than an hour, I’d let them experience this by having them reproduce each other’s assignments. I could hear it now: “What do you mean you can’t reproduce my homework? It’s all right there!”

Which comes first, users or tools?

Greg Wilson and I have been discussing the importance of tools in reproducible research lately. Would more people use reproducible research practices if tools made doing so more convenient? Would better tools appear if more people cared about reproducibility?

I believe both statements are true, and I believe Greg does as well. However, he and I have different emphases. Greg says “In my experience, most people won’t adopt a programming practice unless there is at least some basic support for it.” I agree, but I think the biggest obstacle to more widespread reproducibility is that few people realize or care that their work is irreproducible. I think that when more people care about reproducibility, some percentage of them will develop and give away tools and we’ll have enough tool support.

We are not in a chicken-and-egg scenario. It’s not as if Greg is saying first we need tools and I’m saying first we need users. We have both tools and users. There are people who care about reproducibility, and some of them have produced tools that make it easier for others to follow. But not many of these people know each other or know about their tools. I hope that the ReproducibleResearch.org web site and this blog will help change that.

It helps to look at the early history of object oriented programming. Some people were writing object oriented programs before there were (popular) object oriented languages. For example, some people were writing object oriented C before C++ baked support for OO into the language. This was painful, but some pioneers did it. To Greg’s point, the number of programmers writing OO programs took off once there were OO languages with good tool support. To my point, first there were programmers wanting to write OO code; these were the folks who developed the tools and the early adopters of the tools.

When your hair is on fire …

I heard a line the other day something like this:

When you’re working with your hair on fire, if you see anything that doesn’t look like a bucket of water, you’re not interested.

I think I heard this on the PowerScripting podcast. The context was a discussion about the design of Microsoft’s PowerShell, a shell and scripting environment targeted for system administrators. The idea was that since many sys admins are working with their hair on fire, PowerShell was designed to look like a bucket of water, something that will bring relief rather than yet another thing to learn.

How can we make reproducible research look like a bucket of water? In the long term, even in the not so long term, reproducibility habits can improve productivity and reduce stress. But many people will not be receptive unless they also see short term benefits, the shorter the better.

I think templates are one way reproducibility can look like a bucket of water to someone with their hair on fire. You’ve got an analysis to do? Here’s a template. Fill in your specifics at the top, compile it, and out comes a beautiful report. Along these lines, at M. D. Anderson we’ve created some Sweave templates for microarray data analysis. One of the things I’d like to see happen on the ReproducibleResearch.org web site is a collection of Sweave templates for statistical analysis. If you have anything to contribute, please send a note.

Reproducible presentations

Yesterday I wrote about my experience trying out Beamer for writing presentations in LaTeX. Some of the images that I want to include in my presentations are plots produced in R, so one simplification would be to combine Beamer with Sweave. That way I could include code for producing the images directly in my presentation file rather than referencing external image files. Any change to the R code would be automatically reflected in my presentation.

One problem I had when turning a LaTeX Beamer file into a Sweave file was image size. When a Sweave file has \documentclass{article}, plots are modestly sized and centered. But when I tried including a plot in a Sweave file with \documentclass{beamer}, the image was so large that it covered up other material on the slide. The solution was to include the following line immediately after the \begin{document} command:

\setkeys{Gin}{width=0.6\textwidth}

(See section 4.1.2, page 14 of the Sweave manual.) This command made the image the size I wanted, but the image was no longer centered. To center the image, I added \begin{center} before and \end{center} after the Sweave figure chunk. This worked. A sketch of the code is included below.

\documentclass{beamer}
\begin{document}
\setkeys{Gin}{width=0.6\textwidth}

\begin{frame}
\frametitle{...}

Slide verbiage ...

\begin{center}
<<fig=TRUE, echo=FALSE>>=
# R code ...
@
\end{center}
\end{frame}
\end{document}

Reproducibility badge

Greg Wilson points out in his badge of reproducibility post a couple things I missed in my previous post announcing the signal processing article.

The first is the reproducible research badge, the green check mark logo on the preprint site. I don’t know who owns that or what their rules are. Perhaps it belongs to EPFL.

The other is the user evaluation form. Users can select one of the following four options:

  1. I have tested this code and it works
  2. I have tested this code and it does not work (on my computer)
  3. I have tested this code and was able to reproduce the results from the paper
  4. I have tested this code and was unable to reproduce the results from the paper

That is huge. You cannot say whether your own research is reproducible. It’s reproducible when someone else can reproduce it.