Workshop in Computational Systems Biology

Earlier this week, I was at the Workshop in Computational Systems Biology: Models, Methods, Meaning at the Norwegian University of Life Sciences in Ås, Norway. I gave a talk there on reproducible research, and there were some other excellent talks on modeling and simulation, research methods, etc. It was an excellent workshop, and I enjoyed it a lot. Thanks for organizing it, Hans Ekkehard!

As it says on the site, the workshop was on the following topic, which to me describes both the content and the spirit of the workshop very well: “Modeling and simulation are essential tools in systems biology and many other branches of science. This workshop is an invitation to step back from the day-to-day struggle with our simulations and to reflect about the nature of modeling and its relation to simulation: How do modeling and simulation contribute to the development of knowledge? Is a simulation per se a valid scientific experiment?”

Both the speakers and the audience had very diverse backgrounds, ranging from physicists, chemists and engineers (like me) all the way to philosophers working on metaphysics. This often resulted in very enthusiastic and interesting discussions. It was also very interesting for me to see how scientists in neuroscience struggle with issues similar to mine, and how they approach them. I learned a lot of new things, some of which will pop up in separate posts on this blog in the future. Stay tuned!

On doing research

I was just reading the two articles/notes below. While they are not entirely about reproducible research, I think they reflect well the worries many researchers have about current “publish or perish” research practices. I'm not sure I agree with all of it, but they do make a number of good points.

D. Geman, Ten Reasons Why Conference Papers Should be Abolished, Johns Hopkins University, Nov. 2007.

Y. Ma, Warning Signs of Bogus Progress in Research in an Age of Rich Computation and Information, ECE, University of Illinois, Nov. 2007.


Climate science

Just like many other domains, climate science is a mixture of theory, models and empirical results. Often different scientists work on the different parts (theory/models/experiments), each claiming their part to be (by far) the most important of the three. A nice analysis is given on the IEEE Spectrum site. Unlike in many other domains, it seems hard to me (not being a climate scientist) to do a lot of small experiments to validate the models. This makes it even more important to be open about the precise models used, their parameters, and the data used to validate those models.

We’ve only got one planet Earth to validate models on. And it takes soooo long to check whether a model is correct that we’d better be open about it, collaborate, check each other’s assumptions, and make sure it’s the best model we can make!

For some more discussion on the recent climate study scandal and reproducible research, see also Victoria Stodden’s blog (or here).

ResearchAssistant

I just got a pointer to ResearchAssistant, a Java tool to keep better track of research experiments. It stores the exact circumstances under which experiments are performed, parameter values, etc. It looks like a very nice tool for reproducible research. At the very least, it should make it much easier, when writing a paper, to trace back how certain results were obtained. If you’re doing research in Java, I’d certainly recommend taking a look at RA!

The tool is also described in the following paper:

D. Ramage and A. J. Oliner, RA: ResearchAssistant for the Computational Sciences, Workshop on Experimental Computer Science (ExpCS), June 2007.
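To give a flavor of the general idea (this is just a minimal sketch of experiment logging, not the actual RA API), something along these lines records the parameter settings of each run so you can trace back later how a result was obtained:

```java
// Minimal sketch of experiment logging (hypothetical, NOT the ResearchAssistant API):
// append a timestamped line with the experiment name and its parameters for every run.
import java.io.FileWriter;
import java.io.IOException;
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;

public class ExperimentLog {

    public static void logRun(String experimentName, Map<String, String> parameters) throws IOException {
        try (FileWriter out = new FileWriter("experiments.log", true)) {  // open in append mode
            out.write(LocalDateTime.now() + " " + experimentName + " " + parameters + System.lineSeparator());
        }
    }

    public static void main(String[] args) throws IOException {
        // Example parameter values; names and values are made up for illustration.
        Map<String, String> params = new TreeMap<>();
        params.put("noiseLevel", "0.05");
        params.put("iterations", "1000");
        logRun("denoising-test", params);  // one log line per run: timestamp, name, parameters
    }
}
```

Even such a simple log already helps a lot when trying to reconstruct how a particular figure or number was produced.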

ORCID: on being a number

I just learned about ORCID, the Open Researcher and Contributor ID initiative. Its goal is to provide a unique ID for every researcher, and in that way better traceability of all of a researcher’s work. It should avoid ambiguity between authors with the same name, as well as typos. They even intend to include not only ‘standard’ conference/journal publications, but also more ‘exotic’ research output like data sets, blog posts, etc. The initiative is supported by a large number of major publishers, such as Springer, Elsevier and Nature.

A very nice initiative, which should solve a few real problems. However, I am not sure how it is supposed to work in practice. Does it mean that we will soon add an ORCID number (without typos) below the title and the author name? And cite works by their ORCID and DOI (digital object identifier)? And will we write these numbers with fewer errors than we write author names now?

It indeed makes me think of that other unique number, the DOI, which was introduced to uniquely identify a document (publications, as far as I have seen). I’ve seen it for some time now when I look up articles, and I have no doubt it uniquely identifies those articles, but what is it actually used for? Maybe DOIs have their use… but I haven’t seen it yet.

If you know of practical cases where the DOI is used, feel free to comment! (Others too, of course.)

Citations

Something struck me lately, when reading a paper…

In academia, the game is all about publishing, and getting others to cite your articles. And I guess, to a certain extent, article counts and citation counts do give a measure of someone’s work. Until people start overfitting the system. But anyway, that’s another story…

So, to get back to my story: citations measure the quality of a work. In general, people try to be correct, and cite the researchers who started a certain line of work. And then, once the work gets really well known, it is somehow not cited anymore. So the ultimate reward for good work is to no longer be cited. Or did you cite a reference the last time you wrote about the Fourier transform, wavelets, least squares or filtering? For some of them I don’t even know who invented them, but someone must have…

Making research reproducible

Making publications reproducible is tough…

I recently experienced it again in some of my own work. In the stress of preparing a publication for a submission deadline, it is very hard to take the (precious) time to verify all of the results once more and make sure they are perfectly reproducible. A result or figure for which the exact parameter settings have not been checked or written down slips in all too easily…

Open Access Week

This week is the first International Open Access Week. You can find more information here for the international events, or here for the Dutch website, to which I made my modest contribution. I am truly convinced that making publications available online through open access is a great start. And the next step is to do the same for your code and data!