Are there any government or public-sector sites that have cool URIs?
This week on Radio 4, Woman’s Hour has been featuring excerpts from a book by the pseudonymous WPC E.E. Bloggs entitled Diary of an On-Call Girl - a reference to Diary of a Call Girl, the blog-book of ‘Belle’, a top-class London prostitute. I started poking through PC Bloggs, the blog behind the book of the blog (so to speak). I went back and read the first two months of entries, and what struck me most was how many links that pointed to .police.uk domains 404ed. The links pointed to quite important stuff: reports about police performance, recruiting and other things that are definitely in the public interest.
Let’s not pick on the Police - I’m sure there are just as many broken links out to other UK public services on domains ending .nhs.uk, .gov.uk, .ac.uk etc. And those broken links break the commentary from the interesting bloggers who work in the various public sector organisations. Link rot exists in every sector, but it is a particular problem in the public sector because, of course, these sites are paid for from the public purse.
What public sector link rot means is that we get stuck in a perpetual present tense. We can’t go back and see the historical data about our hospitals, schools, police constabularies and government agencies. All that information gets put in the Internet equivalent of the memory hole. The memory hole, for those not versed in Nineteen Eighty-Four, is the hole that sits next to the desks of the workers in the Ministry of Truth, where anything politically inconvenient was dropped for destruction. The difference between URIs disappearing and a document being placed in the memory hole is not one of category but of degree. Just because it’s motivated by having a “sexy new website” doesn’t make it okay. If someone came along to your office and said they’d install a “sexy new phone system”, but you’d have to change all your phone numbers, you’d throw the tosser out. We need to make the public sector ashamed of this kind of silliness. I think there are a few ways we can try to resolve this.
The first is that we need to help people understand that URIs are equivalent to phone numbers or addresses and should change as often as they do, which is to say as little as humanly possible, and only when there’s good reason. An example: the emergency services phone number, 999. Or rather, the two emergency services phone numbers, 999 and 112. Ofcom state that 999 has been the number of the emergency operator since 1937, and that it will be retained in addition to the pan-European 112. If the number for the emergency services were changed every few years, it would cause chaos. The same is true, to a lesser degree, of other phone numbers. As we build more and more stuff on top of the web, we will find unchanging URIs to be as desirable as phone numbers and postal addresses that don’t change. We need to get it into the heads of public-sector management across the country that URIs must be persistent, and that they should insist on this when commissioning websites.
A second way we can encourage the public sector to do this right is to get together a small group of people who know about the Web and about URIs (the sort of people who give a crap about dorky things like accurate DNS resolution, open data formats, open government and so on) to produce a list of a significant number of public sector resources, and then to produce what is in essence a mechanical test of continued resource availability. What this means is simply taking each URI, pulling a few strings out of the document behind it that will uniquely identify that piece of data, and then putting them in something a bit like a unit test. Of course, we’d have to figure out a way of parsing the various things we might get back from their servers. That would mean having reliable conversion or parsing libraries for PDF, Microsoft Word, Microsoft Excel and OpenDocument Format. HTML (and XML, CSV, JSON, YAML, RDF etc.) is easy, but handling the other formats might be a pain. Why anyone is publishing documents in Word is frankly baffling. Then every six months, we’d run the unit tests, figure out which URIs have broken and fire off some snotty e-mails to their web people pointing out that they suck and are going to hell (or something a little bit more polite). And maybe we could send out some press releases to the broadsheets. Basically, make this an issue: point out to people that putting up reports and data and information about public services on the Web is great, but keeping them there is just as important.
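To make the idea a bit more concrete, here is a rough sketch of what one of those mechanical tests might look like in Python. Everything named here (ResourceCheck, fingerprints, still_available, broken) is made up for illustration, and it only copes with text formats like HTML; PDF, Word and friends would need the conversion libraries mentioned above. Treat it as a starting point under those assumptions, not a finished tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple
import urllib.error
import urllib.request


@dataclass
class ResourceCheck:
    """One document we expect to stay put (hypothetical structure)."""
    uri: str
    fingerprints: Sequence[str]  # strings that should still appear in it


def fetch(uri: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Fetch a URI, following redirects, and return (status, body as text)."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # 404, 410, 500 and so on are raised as exceptions by urllib.
        return err.code, ""
    except urllib.error.URLError:
        # DNS failure, refused connection etc.; 0 is our own "no answer" marker.
        return 0, ""


Fetcher = Callable[[str], Tuple[int, str]]


def still_available(check: ResourceCheck, fetcher: Fetcher = fetch) -> bool:
    """True if the URI answers 200 and every fingerprint is still in the body."""
    status, body = fetcher(check.uri)
    return status == 200 and all(s in body for s in check.fingerprints)


def broken(checks: Sequence[ResourceCheck], fetcher: Fetcher = fetch) -> List[str]:
    """Return the URIs that have rotted since the list was compiled."""
    return [c.uri for c in checks if not still_available(c, fetcher)]
```

The fetcher is passed in as a parameter so the checks can be exercised against canned responses without hammering anyone's server. Because urllib follows redirects by default, a URI that has been properly redirected to its new home still passes.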
If we could say to the newspapers “int(x) public-sector reports that were available last year are no longer available at the same address”, then we could hope that people would see that those are reports they have paid to have produced. This could be an important first step in freeing a lot of other important government data. The one which pisses me off most is the situation with law reports. We as citizens of this country are actually bound by law which we don’t own the rights to. Yes, there are laws which bind us but which reside in the hands of legal publishers. Every law that binds us as citizens should be freely readable online, but it isn’t. That’s bullshit.
Of course, it will be necessary for us to sometimes change uncool URIs to cool ones, but if we are to remain cool, we’ll redirect the uncool ones to cool ones using the appropriate 30x status code. After all, Cool URIs Don’t Change.
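For anyone wondering how you would tell a cool redirect from an uncool one, here is a small Python sketch. It asks for the headers only, does not follow the redirect, and sorts the answer into buckets. The function names and bucket labels are my own invention for illustration; the status codes are the standard HTTP ones, with 301 and 308 being the permanent redirects you would want to see on a moved document.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

PERMANENT = {301, 308}       # "this has moved for good, update your links"
TEMPORARY = {302, 303, 307}  # "look over there for now"


def classify(status: int, location: Optional[str]) -> str:
    """Sort a raw HTTP answer into a rough bucket (labels are illustrative)."""
    if status == 200:
        return "ok"
    if status in PERMANENT and location:
        return "moved permanently"
    if status in TEMPORARY and location:
        return "moved temporarily"
    if status in (404, 410):
        return "gone"
    return "unknown"


def head(uri: str, timeout: float = 30.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request without following redirects."""
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling classify(*head(uri)) on an old address tells you whether its owner has been cool about the move: "moved permanently" is what we are hoping for, and "gone" is what earns the snotty e-mail.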
So, who is up for unit testing the UK public sector’s adherence to Cool URIs Don’t Change?