XML, JSON


Disclaimer: This entry contains about a dozen acronyms and may make non-techies cross-eyed.

***

One of the nice things about designing a new system is that you get to use the latest and greatest toys (err, tools). 😀 It has been mentioned before that the use of XML in Evergreen is pretty pervasive. We make use of MARC21Slim and MODS for bibliographic data, XMPP for Jabber messaging, XUL for widget layout, XHTML for the OPAC, and we even subclass LibXML for some internal domain objects.

However, we’re not suffering from some weird desire to use XML everywhere and anywhere. We’re not likely to make much use of the RDF resources available in Mozilla, for example, and eyeballing Jabber logs with (escaped) XML message bodies is not something we’re particularly fond of.

I explained to someone today that it’s okay if we replace some XML-based components with non-XML-based alternatives, and that, for example, there is no particular synergy between our use of XML with Jabber and our use of XUL with the GUI. That’s not to say that there aren’t some benefits from using XML in certain places. It’s a ubiquitous standard, and there is a lot of existing work out there being freely shared. The use of XSLT to go from MARC21Slim to MODS is very convenient, and in our Demo OPAC, Bill used XSLT to transform MODS to XHTML (with embedded links for Subject Headings, etc.). Apache and mod_perl make it very easy for us to use XSLT as filters at various points in webpage generation. We could have stylesheet transforms for PDAs, screen readers, RSS, you name it.

But those advantages really show up when we’re dealing with documents. Mike showed us something recently that could be really useful for when we’re just dealing with data interchange: JSON, or the JavaScript Object Notation (http://www.json.org/). I’m really interested in this because of its simplicity and because we can eval JSON strings (which are basically serialized JavaScript objects) straight into the Mozilla staff client, which may not always be running on hardware fast enough to handle lots of DOM/XML manipulation. Since we’re representing basic and universal data structures (key/value pairs and ordered arrays of values), it’s relatively simple to write JSON transcoders for any modern language. Mike wrote one for Perl on a whim. We use Perl heavily on our back end, and JavaScript heavily on our front end. To put it differently, imagine Perl’s Data::Dumper spitting out JavaScript, and also being able to go in the other direction. That could be pretty useful in our environment.
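To make the round trip concrete, here is a minimal sketch of the JavaScript side. It assumes an engine with the built-in JSON.parse and JSON.stringify functions, which I am swapping in for a plain eval since they never execute code embedded in the string. The `message` structure and its `payload` field are made up for illustration and are not taken from Evergreen.

```javascript
// A nested structure built only from key/value pairs and ordered arrays,
// the same shapes a Perl hash and array would hold.
// (Hypothetical data, not an actual Evergreen message.)
const message = {
  type: "STATUS",
  threadTrace: "0",
  payload: ["Connection Successful", 200],
};

// Serialize to a JSON string, the direction Data::Dumper covers in Perl.
const wire = JSON.stringify(message);

// Parse the string back into a live object. Unlike eval(), JSON.parse
// only accepts data and will throw on anything that is not JSON.
const roundTripped = JSON.parse(wire);

console.log(wire);
console.log(roundTripped.payload[0]); // strings come back as strings
console.log(roundTripped.payload[1] + 1); // numbers come back as numbers
```

The string in `wire` is what would travel between the Perl back end and the staff client; either side only needs a transcoder for these two container types plus strings, numbers, booleans, and null.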

Here’s an example of a data structure (taken from Evergreen) represented in both XML and JSON:

First, the XML:

<oils:domainObject name="oilsMessage">
   <oils:domainObjectAttr value="STATUS" name="type"/>
   <oils:domainObjectAttr value="0" name="threadTrace"/>
   <oils:domainObjectAttr value="1" name="protocol"/>
   <oils:domainObject name="oilsConnectStatus">
     <oils:domainObjectAttr value="Connection Successful" name="status"/>
     <oils:domainObjectAttr value="200" name="statusCode"/>
   </oils:domainObject>
</oils:domainObject>

A direct JSON translation would look like this:

{ "oilsMessage" : {
                    "type" : "STATUS",
                    "threadTrace" : "0",
                    "protocol" : "1",
                    "oilsConnectStatus" : {
                                            "status" : "Connection Successful",
                                            "statusCode" : 200
                                          }
                  }
}

However, Mike has an idea for expanding this by embedding class name “hints” into comments. The following still parses as JSON wherever the parser accepts comments (strict JSON parsers reject them), and the special comments can tell the parser how to “bless” the data into objects, which is much closer to what the XML is actually representing:

/*-- oilsMessage --*/ {
        "type" : "STATUS",
        "threadTrace" : "0",
        "protocol" : "1",
        "content" :
                /*-- oilsConnectStatus --*/ {
                        "status" : "Connection Successful",
                        "statusCode" : 200
                }
}
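
One way the hint idea could work on the JavaScript side is sketched below. This is only my guess at an approach, not Mike's design: the `parseWithHints` function, the `__class` key, the `registry` map, and the two empty classes are all hypothetical names invented for the example. It rewrites each hint comment into an ordinary key and then lets a JSON.parse reviver build the objects, and it assumes hints only appear directly before non-empty objects and never inside string values.

```javascript
// Hypothetical stand-ins for the real domain object classes.
class oilsMessage {}
class oilsConnectStatus {}

// Maps hint names to constructors; unknown hints are left as plain objects.
const registry = new Map([
  ["oilsMessage", oilsMessage],
  ["oilsConnectStatus", oilsConnectStatus],
]);

function parseWithHints(text) {
  // Turn `/*-- Name --*/ {` into `{"__class":"Name",` so the result is
  // comment-free JSON that a strict parser accepts.
  const tagged = text.replace(
    /\/\*--\s*(\w+)\s*--\*\/\s*\{/g,
    '{"__class":"$1",'
  );

  // The reviver runs bottom-up, so nested objects are blessed first.
  return JSON.parse(tagged, (key, value) => {
    if (value && typeof value === "object" && typeof value.__class === "string") {
      const Ctor = registry.get(value.__class);
      if (!Ctor) return value;
      const { __class, ...fields } = value;
      return Object.assign(new Ctor(), fields);
    }
    return value;
  });
}

const hinted = `/*-- oilsMessage --*/ {
  "type" : "STATUS",
  "threadTrace" : "0",
  "protocol" : "1",
  "content" : /*-- oilsConnectStatus --*/ {
    "status" : "Connection Successful",
    "statusCode" : 200
  }
}`;

const msg = parseWithHints(hinted);
console.log(msg instanceof oilsMessage);
console.log(msg.content instanceof oilsConnectStatus);
console.log(msg.content.statusCode);
```

A regular expression is a shortcut here; a real transcoder would more likely handle the hints inside its own tokenizer so that comment-like text within strings is left alone.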

It’s definitely something we want to look into.