One of the reproducibility problems with many current papers is that everyone applies their new algorithm to their own set of data. I did the same in my super-resolution work. A problem with this is that it is very difficult to assess whether a data set was chosen (a) because it was the one the author had at hand, (b) because it was the most representative one, or (c) because the algorithm happened to perform best on it.
To allow fairer comparisons, competitions are being set up in various fields. Often, in the run-up to a conference, a challenge is organized in which everyone can try their algorithm on a common data set provided by the organizers.
For example, in medical imaging, a “segmentation in the clinic” challenge was organized before the latest MICCAI conference. For three different applications (coronary arteries, MS lesions, and liver tumors), a competition was set up to find the best segmentation. For each application, a training set with ground-truth segmentations is provided so that participants can tune their algorithms. Next, a test set without segmentations is released; the participating algorithms segment it, and the results are compared. Finally, a third data set is distributed only during the workshop and is used for an on-site comparison of the algorithms.
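To give a concrete idea of how submitted segmentations can be compared against the ground truth (the challenge used its own set of evaluation measures, which I won't reproduce here), here is a minimal sketch using the Dice overlap, one of the most common segmentation metrics; the masks are just toy data:

```python
import numpy as np

def dice_overlap(pred, truth):
    """Dice coefficient between two binary segmentation masks (1.0 = perfect overlap)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

# Toy example: two 3D masks that mostly agree.
truth = np.zeros((10, 10, 10), dtype=bool)
truth[2:8, 2:8, 2:8] = True
pred = np.zeros_like(truth)
pred[3:8, 2:8, 2:8] = True
print(f"Dice overlap: {dice_overlap(pred, truth):.3f}")
```

A fixed, shared metric like this is exactly what makes the ranking of participants comparable across very different algorithms.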
In the neural-network community, a competition for time series prediction is organized along similar lines. Participants have to forecast the next 56 data points of daily cash withdrawals at cash machines, based on the daily withdrawals of the past two years. They can choose between a reduced dataset of 11 such time series and the full dataset of 111 series, and have to predict the continuation of every series with a single methodology. The results can be submitted to a few related conferences.
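As a rough illustration of what a simple baseline entry for such a task might look like (this is not any participant's actual method, and the series below is synthetic), one could repeat the average weekly withdrawal pattern over the 56-day horizon:

```python
import numpy as np

def seasonal_naive_forecast(history, horizon=56, season=7):
    """Forecast `horizon` future points by repeating the average weekly
    pattern observed over the last eight weeks of the history."""
    history = np.asarray(history, dtype=float)
    recent = history[-8 * season:]
    pattern = recent.reshape(-1, season).mean(axis=0)   # average per day of week
    reps = int(np.ceil(horizon / season))
    return np.tile(pattern, reps)[:horizon]

# Toy example: two years of synthetic daily withdrawals with a weekly cycle.
rng = np.random.default_rng(0)
days = np.arange(2 * 365)
series = 100 + 20 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 5, days.size)
forecast = seasonal_naive_forecast(series, horizon=56)
print(forecast[:7])
```

Competitions like this typically rank entries by how much they improve on exactly such naive baselines.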
I strongly believe such competitions, together with more standardized data sets, will pop up more frequently in the future, and provide a nice way of truly comparing and evaluating algorithms.
I certainly agree with your thoughts on the data set problem, since I encountered the same difficulty in my research. When I wanted to compare my algorithm with others for video coding, it was almost impossible to find a common test set that others had used. Moreover, implementations of most of the proposed algorithms cannot be found on the authors’ websites, which makes comparison very difficult. I really think research should be made more ‘transparent’ and open to competition!
So I like your idea of a ‘data set competition’. Although not only the ‘best’ algorithms should be appreciated, and new concepts, trials, and ideas should also be promoted, providing a benchmark is definitely a useful thing to do. :)