Tom Morris

23 January 2009

A pungent mix of programming, philosophy, pedanticism, procrastination, perplexity, peripheral political polemic, and platters of preposterousness.

Using XML-RPC encoding to avoid messy XML-to-JSON transformations

There is a big mismatch in programming between the structure of XML and the tools used to parse it programmatically. In XML, you have elements, attributes, CDATA sections, processing instructions, namespaces and text nodes - often in a rather spaghetti-like construction (and sometimes vaguely resembling something like a schema). There isn’t a simple mapping from this to the common types in programming languages: booleans, number primitives, strings, tuples/lists, key-value structures, classes, objects and functions. And so we get rather clumsy XML interfaces like the XML Document Object Model, with all those lovely method names like “getElementsByTagName”, “getAttribute” and so on. Or, even more fun, one can use a SAX-based parser - well-known to be about as much fun as getting raped by a javelin.

Today, Jeremy Keith posted a blog entry about how he added a feature to the excellent Huffduffer using Amazon’s product information API and machine tags, but had to handle XML coming out of Amazon. Jeremy then wrote an XSLT to turn the XML into JSON. You can have a look at the XSLT here. It’s pretty good. I’d have added some xsl:text elements, and maybe explicitly added an xsl:output element with a ‘text’ method. Not the best XSLT I’ve ever seen, but not at all shabby.

There is a problem with this approach - character encoding. If you were to get a text node that contained some encoded ampersand or something, that wouldn’t be properly transformed - JSON uses UTF-8 text, whereas XML uses whatever character encoding is declared in the XML declaration. How to solve this problem? Easy, don’t turn it into JSON. Turn it into XML-RPC instead. XML-RPC gives you pretty much the same data model as JSON, except that you don’t get a nil (a glaring error in XML-RPC). So, you take the XML from Amazon, turn it into an XML-RPC message, then use an XML-RPC library to turn that into language-native data.

For PHP users (as Jeremy is), there is Simon Willison’s XML-RPC library (Python and Ruby both have built-in XML-RPC libraries - be aware that Ruby’s XML-RPC server has had/may still have a notorious security hole, so watch yourself). You should be able to parse the XML-RPC message with Simon’s XML-RPC class by just instantiating an IXR_Message object, running the parse() method on that object and then getting the data out of the object’s ‘params’ property. For super-laziness, just wrap those three lines up in a function called xmlrpc_parse(), and you can use it just like the json_decode() function in PHP.
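For the Python crowd, the same trick can be sketched with the built-in library (xmlrpc.client in Python 3). This is only a rough illustration: the message below is a made-up example of what an XSLT might emit, not Amazon’s actual output, and xmlrpc_parse() is just the hypothetical helper name from above.

```python
import xmlrpc.client

# Hypothetical XML-RPC message of the kind an XSLT could emit from product XML.
# The field names and values are invented for illustration.
MESSAGE = """<?xml version="1.0" encoding="UTF-8"?>
<methodResponse>
  <params>
    <param>
      <value>
        <struct>
          <member>
            <name>title</name>
            <value><string>Crime &amp; Punishment</string></value>
          </member>
          <member>
            <name>pages</name>
            <value><int>576</int></value>
          </member>
        </struct>
      </value>
    </param>
  </params>
</methodResponse>"""


def xmlrpc_parse(message):
    """Return the first parameter of an XML-RPC message as native Python data."""
    # loads() returns a (params, method_name) pair; for a response the name is None.
    params, _method_name = xmlrpc.client.loads(message)
    return params[0]


book = xmlrpc_parse(MESSAGE)
print(book["title"])  # Crime & Punishment
print(book["pages"])  # 576
```

The XML parser deals with the encoding declaration and the escaped ampersand, so what comes out is an ordinary dict with an ordinary string and an ordinary integer in it.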

It’s not much of a difference (unless, of course, you are transmitting lots of nullness around) - but it does mean that you can neatly side-step character encoding issues going between XML and JSON - something that XSLT 1.0 won’t just fix for you by magic.
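To make the risk a bit more concrete, here is a small Python sketch of one way text pasted straight from XML into a JSON template can go wrong. It assumes the failure is a double quote in a text node, which is only one example of the class of problem; the element name and the template are invented for illustration and are not taken from Jeremy’s XSLT.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical fragment; the element name is made up for illustration.
xml_fragment = "<Title>The &quot;Good&quot; Parts &amp; More</Title>"
title = ET.fromstring(xml_fragment).text  # entities are decoded by the XML parser

# Roughly what a text-pasting template does: no JSON escaping at all.
naive = '{"title": "%s"}' % title

try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# A real JSON serializer escapes the embedded quotes.
safe = json.dumps({"title": title})
round_tripped = json.loads(safe)["title"]

print(naive_is_valid)          # False
print(round_tripped == title)  # True
```

Staying in XML all the way to the parsing library means nobody has to write that escaping logic by hand in a stylesheet.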

As for XML-RPC not having nil? Wikipedia says there’s an unofficial extension to XML-RPC that allows one to use a nil element - here it is. Having a nil element is simple common sense. In Jeremy’s case, this is not a problem. It’s not likely that a book on Amazon is going to, say, not be a string but be a nil value. (The other approach to this is to use SOAP, but that’s too heavyweight - or to use something like WDDX, which does have a nil value, but doesn’t have the tool support that XML-RPC does - the PHP WDDX parser is a compile-in extension, which is notoriously unpossible on shared hosting sites. As for me, I’d turn it into RDF, but I’m not even going to go there.)
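Python’s built-in XML-RPC library happens to support the nil extension as an opt-in, which makes for a quick demonstration. This is a sketch with an invented record, and other libraries (including the PHP one) may or may not behave the same way.

```python
import xmlrpc.client

# Invented record with a missing value.
record = {"title": "Some Book", "subtitle": None}

# Plain XML-RPC has no nil, so marshalling None fails by default.
try:
    xmlrpc.client.dumps((record,), methodresponse=True)
    strict_ok = True
except TypeError:
    strict_ok = False

# With the extension switched on, None is written as a <nil/> element.
message = xmlrpc.client.dumps((record,), methodresponse=True, allow_none=True)
params, _method_name = xmlrpc.client.loads(message)

print(strict_ok)            # False
print("<nil/>" in message)  # True
print(params[0] == record)  # True
```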

Right, having suggested that people stop using JSON and start using XML-RPC - in PHP no less - I shall retire to my bed in disgrace.

Ben Russell in The Independent: Proposals in the Coroners and Justice Bill include measures to authorise ministers to move huge amounts of data between government departments and other agencies and public bodies. Bodies that hold personal information include local councils, the DVLA, benefits offices and HM Revenue and Customs. The Bill will allow ministers to use data-sharing orders to overturn strict rules that require information to be used only for the purpose it was taken. But it places no limit on the information that could eventually be shared between public bodies, potentially allowing vast amounts of personal data to be shared by officials across Whitehall, agencies or other public bodies.

Norman Geras: Here’s my advice to the new president. Get yourself a team of philosophers. Philosophers would have been able to tell you, providing as much backup as you could want, that even with ‘faithfully’ out of place in that word string, the oath remained the same. For the oath resideth not in the precise order of the words (although that doesn’t mean any old order would do) but in the meaning of what the words that are uttered express. There may well be philosophers somewhere out there right now explaining why to take the oath of office twice renders each occurrence nugatory from a legal point of view: for the second occasion negates the validity of the first by superseding it, and the first occasion renders the second one superfluous, so ensuring that it can’t count for anything. Come to think of it, Obama should make sure that the philosophers he gets are up to scratch. If philosophers were kings, the Oath would be in predicate logic.

Obama White House should use public-key encryption

MSNBC quotes John Pescatore from Gartner on Barack Obama’s electronic security: Take an innocuous example. If (Obama) were to sit down at his personal PC, log into his (presidential) e-mail account and send a congratulatory e-mail to the pilot of the US Airways jet […] how would the pilot know it was really Obama? If someone else sent out a doctored e-mail pretending to be Obama, how would we know it wasn’t really him?

Well, the White House could use GnuPG. PGP, the software that GnuPG is an open source clone of, used algorithms strong enough that a previous U.S. government determined them to be military strength and subjected them to a totally futile and rather laughable attempt at export control (something which Vice-President Biden presided over - see PGP creator forgives Biden).

Just imagine. You turn on your TV for a message from President Obama, and he says something like: “in the interest of national security, I have decided to use the GNU Privacy Guard on all my e-mails, and I humbly request citizens to adopt similar technology to provide for themselves the guarantees of liberty and freedom through code as the Bill of Rights attempts to provide through law. All e-mail communication from the U.S. federal government will be digitally signed. My public key is (whatever it is), and the keys of all the members of the cabinet are cross-signed.”

Diffie and Hellman have solved this problem. There’s no reason not to start using their solution in government. Perhaps the Obama administration could put an open source bounty out there for someone to port GPG to the BlackBerry.
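For anyone who hasn’t seen the idea in action, here is a toy Python sketch of signing and verifying with a public/private key pair. It is textbook RSA with a deliberately tiny, hard-coded key and no padding, written purely to show the shape of the thing. It is not how GnuPG works internally and must not be used for anything real; the function names and the message are invented for the example.

```python
import hashlib

# Toy key built from two known Mersenne primes.
# Far too small and too structured to be secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this value."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone who knows the public pair (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


message = b"Congratulations on the landing."
tampered = b"Congratulations on the landing. Please wire me $500."
signature = sign(message)

print(verify(message, signature))
print(verify(tampered, signature))
```

The first check passes and the second fails, which is exactly the property the pilot would want: a doctored e-mail doesn’t match the signature, and nobody without the private key can make a new signature that does.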

On live blogging

Martin has blogged my conversation with him the other day about liveblogging. For me, the reason liveblogging is of lesser significance (‘dead’ being a Steve Gillmor-esque abbreviation) is that now that events are routinely videoed, the truly important thing is original analysis.

Also, I find that unless you dose yourself up on caffeine, liveblogging requires a lot of focus. It was a necessity in times gone by. Now, analysis is far more important. I’d rather read a blog post from someone describing a talk they went to which contains some unique analysis of whether or not the speaker is actually right.

I didn’t live blog much when I was in Paris for Le Web, but I have liveblogged at events where the video did not come out for months - and a few where it never came out at all.

The other realization I had was that none of it matters. Who cares about some tech conference? I don’t read other people’s liveblogging. Why would I want to write something that I find absolutely no value in? Why do we need it live? Think about live television. In the last decade, there has been only one event that has really needed live television - September the 11th. When you watch the news and they have live, outside broadcasts, it’s more a clever gimmick. When Britain was flooded last summer, we watched BBC reporters in wellies standing in flooded streets. Yes, it’s flooded. Why do we need a live, outside broadcast unit to tell us that? There are those key defining moments - the attacks on the Twin Towers, the death of Princess Di and the inauguration of President Obama - where the rawness and importance of the event are based on its liveness. Reading notes about a PowerPoint-addled presentation at a conference on social media or Web 2.0 or whatever is not one of them. If it’s important, it’ll bubble up as video later.

Which reminds me, I finally got around to reading The Onion’s election blog War for the White House today. There are some hilarious personas on there, but Oliver Thayer is my favourite. I’ll use this as a teaser: Who needs book knowledge when you have blog knowledge? Heh.
