Last week, I attended the Berlin 6 Open Access Conference in Düsseldorf, Germany. It was an interesting conference on various aspects of Open Access: making publications freely available online. There was a wide variety of talks, ranging from publishers’ perspectives and financial models for Open Access and open standards to the benefits of Open Access for developed and developing countries.
One of the sessions was organized by Mark Liberman around the topic of reproducible research. I gave a talk there about my experiences with reproducible research, but that’s not what I want to discuss here. I found it very interesting to see the wide range of subjects and perspectives that Mark gathered in that session. Slides of the entire session are available here for those who are interested.
It was particularly interesting to hear about the experiences of Mark and Kai in linguistics. Mark explained that, historically, all machine translation funding by the US government was stopped in the late 1960s after a series of highly critical reviews. Twenty years later, funding was resumed, but only under the strict requirement that all such research be evaluated by a neutral agent using an objective evaluation metric. In essence, all research results had to be reproduced by an independent party, and the results had to be measured objectively. While many researchers were initially upset about this way of handling their results, it did foster a culture of reproducibility, public datasets, and comparison of results using the same metrics. One could object that this creates the risk of over-fitting to a data set, and that an objective quality measure is not always easy to develop, but I think that, overall, such a development would be a great step forward in many fields. I do hope, though, that we can find other ways of making it happen than cutting the funding for all non-RR domains for 20 years…
Another thing I personally learned is that I should be more careful when talking about the “history” of reproducible research. I generally tend to call Claerbout the initiator of reproducible research in the computational sciences. This is very subjective, though. I still believe our work on reproducible research in signal processing has its foundations in his work, but of course, worries about reproducibility are as old as the scientific method itself. And as the story above shows, people in linguistics have long been thinking about standardized data sets and evaluation metrics within the broad framework of ‘computational sciences’. So it is not easy to say who coined the term ‘reproducible research’; it probably developed gradually, emerging simultaneously in various fields of research.
Some other reports on Berlin 6 and our reproducible research session are available on