Category Archives: general

On reproducing someone’s results…

It’s a good idea to try to reproduce someone else’s results. But when doing so, it’s important to give credit to the original authors by citing them appropriately.

It’s not a good idea to copy someone else’s article, only changing the author list, and submitting it to another conference/journal. I’m glad to see IEEE takes appropriate measures when such a thing happens. And I am still amazed it actually happened…

Workshop in Computational Systems Biology

Earlier this week, I was at the Workshop in Computational Systems Biology: Models, Methods, Meaning at the Norwegian University of Life Sciences in Ås, Norway. I gave a talk there on reproducible research, and there were several other excellent talks on modeling and simulation, research methods, and more. I enjoyed it a lot; it was a truly excellent workshop! Thanks for organizing it, Hans Ekkehard!

As it says on the site, the workshop was devoted to the following topic, which to me described both its content and its spirit very well: “Modeling and simulation are essential tools in systems biology and many other branches of science. This workshop is an invitation to step back from the day-to-day struggle with our simulations and to reflect about the nature of modeling and its relation to simulation: How do modeling and simulation contribute to the development of knowledge? Is a simulation per se a valid scientific experiment?”

Both the speakers and the audience came from very diverse backgrounds, ranging from physicists, chemists and engineers (like me) all the way to philosophers working on metaphysics. This often led to very enthusiastic and interesting discussions. It was also very interesting for me to see how neuroscientists struggle with the same issues as I do, and how they approach them. I learned a lot of new things, some of which will pop up in separate posts on this blog in the future. Stay tuned!

Climate science

Just like many other domains, climate science is a mixture of theory, models and empirical results. Often, different scientists work on the different parts (theory/models/experiments), each claiming that their part is (by far) the most important of the three. A nice analysis is given on the IEEE Spectrum site. Unlike in many other domains, it seems hard to me (not being a climate scientist) to run many small experiments to validate the models. This makes it even more important to be open about the precise models, the parameters, and the data used to validate those models.

We’ve only got one planet Earth to validate models on. And it takes soooo long to check whether a model is correct, that we’d better be open about it, collaborate, check each other’s assumptions, and make sure it’s the best model we can make!

For some more discussion on the recent climate study scandal and reproducible research, see also Victoria Stodden’s blog (or also here).

ResearchAssistant

I just got a pointer to ResearchAssistant (RA), a Java tool to keep better track of research experiments. It stores the exact circumstances under which experiments are performed, including parameter values. It looks like a very nice tool for reproducible research. At the very least, it should make it easy, when writing a paper, to trace back how particular results were obtained. If you’re doing research in Java, I’d certainly recommend taking a look at RA!

The tool is also described in the following paper:

D. Ramage and A. J. Oliner, RA: ResearchAssistant for the Computational Sciences, Workshop on Experimental Computer Science (ExpCS), June 2007.
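To make the idea of "storing the exact circumstances of an experiment" a bit more concrete, here is a minimal sketch in Python of what such a record could look like. This is not RA's API (RA is a Java tool and I haven't looked at its internals); the helper `record_run`, the field names and the example parameters are all hypothetical, chosen only for illustration. It bundles the parameter values, the results, a timestamp and some basic environment information into one JSON record that can be stored next to the results.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def record_run(params, results, path=None):
    """Bundle parameters, results and environment info into one record.

    Hypothetical helper for illustration only; this is NOT ResearchAssistant's API.
    If `path` is given, the record is also written to that file as JSON.
    """
    record = {
        # When the run happened (UTC, ISO 8601 format).
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Basic environment information that may affect results.
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        # Copies, so later changes to the caller's dicts don't alter the record.
        "parameters": dict(params),
        "results": dict(results),
    }
    if path is not None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(record, f, indent=2, sort_keys=True)
    return record


if __name__ == "__main__":
    # Hypothetical experiment: parameter names and values are made up.
    params = {"learning_rate": 0.01, "seed": 42, "iterations": 1000}
    results = {"final_error": 0.137}
    rec = record_run(params, results)
    print(json.dumps(rec, indent=2, sort_keys=True))
```

Writing one such file per run (and keeping it with the figures it produced) is the simplest version of the idea; a dedicated tool like RA can of course capture much more, such as the exact code version used.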

Making research reproducible

Making publications reproducible is tough…

I recently experienced this again in my own work. Under the stress of preparing a publication for a submission deadline, it is very hard to take the (precious) time to verify all of the results once more and make sure every one of them is perfectly reproducible. It is so easy for a result or figure to slip in for which the exact parameter settings have not been checked or written down…

New York Times about R

Earlier this week, I got a pointer to a New York Times article about R. It is a very interesting article about the use of R in scientific communities and industrial research, mainly for statistical analysis. R is open source software, so it is free and has already benefited from contributions by many authors. And although I haven’t used it myself yet, it is a great tool for reproducible research. Using the Sweave package, authors can write a single document containing both their article and the R code that reproduces the results and puts them in place. This ensures that all the material is in a single place.

It also shows something about the amazing power of open source software developed by a community of authors (and typically users at the same time).

Domain names

I seem to be spending quite some time on the web lately… After my post about the lifetime of URLs, here’s one about domain names and reproducibility. While looking around recently, I noticed that there are quite a few websites and domain names related to reproducible research.

reproducibleresearch.org is an overview website by John D. Cook containing links to reproducible research projects, articles about the topics, and relevant tools. It also contains a blog about reproducible ideas.

reproducibleresearch.com is owned by the people at Blue Reference, who created Inference for Office, a commercial tool to perform reproducible research from within Microsoft Office.

reproducibility.org is used by Sergey Fomel and his colleagues as home for their Madagascar open source package for reproducible research experiments.

reproducible.org is a reproducible research archive maintained by R. Peng at the Johns Hopkins Bloomberg School of Public Health, intended as a place to host reproducible research packages.

Quite a range of domain names containing the word “reproducible” (or a derivative), if you ask me! And I haven’t even started on the Open Research or Research 2.0 sites. Let’s hope this also means that research itself will soon see a big boost in reproducibility!

The volatility of URLs

I am getting worried these days about the volatility of URLs and web pages. I guess you all know the problem: it is very easy to create a web page, and hence many people do so. Great! However, after some years, only a few of those web pages are still available. Common reasons include people retiring or moving elsewhere, after which their web pages on their employer’s site disappear. Similarly, registering a domain name at some point in time does not mean you will keep paying the yearly fees forever. Also, a complete redesign of a website often results in broken URLs.

Why does this worry me so much?
