One of the reproducibility problems with many current papers is that everyone applies their new algorithm to their own set of data. I did the same in my super-resolution work. The problem is that it is very difficult to assess whether a data set was used (a) because it was the one the author had at hand, (b) because it was the most representative one, or (c) because the algorithm happened to perform best on that data set.
To allow fairer comparisons, competitions are being set up in various fields. Often, in the run-up to a conference, a competition is organized in which everyone can try their algorithm on a common data set provided by the organizers.
For example, in medical imaging, before the latest MICCAI conference, a “segmentation in the clinic” challenge was organized. For three different applications (coronary arteries, MS lesions, and liver tumors), the goal was to compute the best segmentation. For each application, a training set with ground-truth segmentations is provided to tune the algorithms. Next, a test set is given without segmentations; the participating algorithms must segment it, and the results are compared. Finally, a third data set is distributed only during the workshop and used for an on-site comparison of the algorithms.
In neural networks, a competition for time series prediction is organized. Participants have to forecast the next 56 data points of daily cash withdrawals at cash machines, based on the daily withdrawals of the past two years. Participants can choose between a reduced data set of 11 such time series or the full data set of 111, and have to predict the next points of each series with a single methodology. The results can be submitted to a few related conferences.
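To make the format of that forecasting task concrete, here is a minimal sketch of the kind of trivial baseline a participant might start from: a seasonal-naive forecast that simply repeats the last observed week, since daily cash withdrawals typically show strong weekly seasonality. The function name and the synthetic series are my own illustration, not part of the actual competition.

```python
def seasonal_naive_forecast(history, horizon=56, season=7):
    """Forecast the next `horizon` points by repeating the last full season.

    `history` is a list of past observations (e.g. two years of daily
    withdrawals); `season` is the assumed period (7 days for a weekly cycle).
    """
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

if __name__ == "__main__":
    # Two years (104 weeks) of synthetic daily withdrawals with a weekly pattern.
    weekly_pattern = [120, 100, 95, 110, 150, 200, 180]
    history = weekly_pattern * 104
    forecast = seasonal_naive_forecast(history, horizon=56)
    print(len(forecast))   # 56
    print(forecast[:7])    # repeats the last observed week
```

Such a baseline is exactly what a common data set makes valuable: every proposed method can be measured against the same naive reference on the same held-out horizon.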
I strongly believe such competitions, together with more standardized data sets, will appear more frequently in the future, and will provide a good way of truly comparing and evaluating algorithms.