I am getting worried these days about the volatility of URLs and web pages. I guess you all know the problem: it is very easy to create a web page, and hence many people do so. Great! However, after some years, only a few of those web pages are still available. Common reasons include people retiring or moving to another institution, after which their web pages at their employer’s site disappear. Similarly, registering a domain name at some point in time does not mean you will keep on paying the yearly fees forever. And when a web site gets a complete redesign, the result is often broken URLs.
Why does this worry me so much?
I think web pages are the easiest, fastest, and most practical way of making reproducible research and articles available to colleagues. But if these web pages have a short lifetime, the research stays reproducible (by others) for only a short time as well. We conducted a study last year about reproducibility of articles, and when going through the obtained URLs now, I can already see quite a few that don’t work anymore. No way to retrieve the information…
And the worst is: what can we do about this? I do believe institutional repositories, with their larger scale and more long-term support, will have a longer lifetime. But those are often not flexible enough to allow for adding code, data, or other metadata. Each paper these days also has a unique identifier, its DOI. This allows one to track a paper using something more permanent than a URL. So a DOI can resolve the site redesign problem. But if a page disappears, it is still gone. And which individual can, or wants to, guarantee to maintain a page forever?
From some recent discussions, I get the impression that quite a few people would not mind uploading their code and data to a centralized service. But currently I don’t feel like setting up such a service, as it is a lot of work, and I do not think I want to be responsible for keeping it alive “forever”, taking care of backups, etc. So I think such a service should definitely not be an individual initiative, but be backed by an institution. And even then, what will the lifetime be?
Any better solutions?