XMLisms and binary XML rants 

There is a furious discussion going on (again) on the XML-DEV mailing list about the necessity of binary XML representation format. Tons of press ink has also been spilled on this issue.


In essence, the XML data model consists of three layers:

  1. XML Serialization Syntax (often called XML 1.0, XML 1.1)
  2. XML Infoset (abstract data model)
  3. PSVI (Post Schema Validation Infoset), (typed data model, brought by XML Schema)

The problem lies in number 1. XML serialization stacks produce nasty angle bracket wire serialization format which needs to be parsed into the XML Infoset before it can be exposed by any programmatic XML-exposing technology, like a DOM, SAX or what have you. In the reverse things get done in the opposite direction.


If we rule schema out of the view temporarily, there are currently two ways to encode a single XML document. One being the ordinary XML 1.0 + Namespaces 1.0 wire syntax, represented in XML Infoset 1.0 form by the parsers/stacks. The second one is XML 1.1 + Namespaces 1.1 wire syntax, represented in XML Infoset 1.1 form, which didn't gain enough momentum and it's a question whether it will in the future.


Question still remains about whether the XML industry has reached a sweet spot in the (non)complexity of the serialization syntax to allow fast processing in the future. It is my belief that we will not see a great adoption of any binary XML serialization format, like XOP or BinaryXML outside the MTOM area, which pushes XOP into SOAP. That stated, one should recognize the importance of main vendors not reaching the agreement for quite some time. Even if they do reach it some time in the future, the processing time gap will long be gone, squashed by the Moore's law. This will essentially kill the push behind binary serialization advantages outside the transport mechanisms (read SOAP). Actually, having a 33% penalty on base64 encoded data is not something the industry could really be concerned about.


There are numerous limiting factors in designing an interoperable serialization syntax for binary XML. It all comes down to optimization space. What do we want to optimize? Parsing speed? Transfer speed? Wire size? Generation speed? Even if those don't seem connected, it turns out that they are sometimes orthogonal. You cannot optimize for generation speed and expect small wire size.


We will, in contrary, see a lot more XML Infoset binary representations that are vendor-centric, being only compatible in intra-vendor-technology scenarios. Microsoft's Indigo is one such technology, which will allow proprietary binary XML encoding (see System.ServiceModel.Channels.BinaryMessageEncoderFactory class) for all SOAP envelopes traveling between Indigo endpoints being on the same or different machines.

Categories:  XML
Tuesday, April 12, 2005 3:59:39 PM (Central Europe Standard Time, UTC+01:00)  #    Comments


Copyright © 2003-2024 , Matevž Gačnik
Recent Posts
RSS: Atom:

The opinions expressed herein are my own personal opinions and do not represent my company's view in any way.

My views often change.

This blog is just a collection of bytes.

Copyright © 2003-2024
Matevž Gačnik

Send mail to the author(s) E-mail