Advantages of XML Over S-Expressions Notation


I would like to ask a question about XML and S-expression(-ish) notation. S-expressions are pretty old; they are also really simple. Consider two forms that are equal in meaning but different in syntax:

(XML code taken from Polish Wikipedia)

<?xml version="1.0" encoding="UTF-8"?>
<ksiazka-telefoniczna kategoria="bohaterowie książek">
 <!-- komentarz -->
  <osoba charakter="dobry">
    <imie>Ambroży</imie>
    <nazwisko>Kleks</nazwisko>
    <telefon>123-456-789</telefon>
  </osoba>
  <osoba charakter="zły">
    <imie>Alojzy</imie>
    <nazwisko>Bąbel</nazwisko>
    <telefon/>
  </osoba>
</ksiazka-telefoniczna>

S-Expression(-ish) version:

(:version "1.0" :encoding "utf-8")
(ksiazka-telefoniczna :kategoria "bohaterowie książek"
  ; komentarz (a comment)
  (osoba :charakter "dobry"
    (imie Ambroży)
    (nazwisko Kleks)
    (telefon 123-456-789))
  (osoba :charakter "zły"
    (imie Alojzy)
    (nazwisko Bąbel)
    (telefon)))

The S-expression version is much more concise. We avoid redundancy by using simple list notation, yet we can still define syntax for the things we want to have (e.g., properties). Of course, this is just an example, and an actual standard could have been better or simply different; however, it's shorter and easier to parse. Why did XML win?

Best Answer

We know the designers of XML were familiar with S-expressions, since XML is based on SGML, and SGML has a style sheet language, DSSSL, which uses S-expression syntax (and Scheme as its embedded scripting language).

Nevertheless, they chose a different syntax than S-expressions because of the use cases for XML. XML was initially designed to support both machine-generated structured data and markup languages like HTML, which are authored manually and contain mixed content (text intermingled with elements that carry metadata).

Redundancy

Markup text documents are often longer than a screenful. If you see a ) and you can't see the beginning of the structure, you are pretty lost; you don't know if it was a chapter or a sidebar that just ended. The redundancy of repeating the tag name in XML end tags like </sidebar> makes this much easier for the human writer. It also makes the format more robust: if you accidentally delete an end tag, you can often infer which end tag is missing.

SGML (the predecessor to XML) allowed you to optionally shorten the end tag to a single character, but this feature was left out of XML for simplicity.

So in short, XML is more verbose by design, because it is designed to support human-editable documents. Today XML is used for a wide variety of purposes, including pure machine-to-machine communication, where this redundancy is not needed.
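To make the robustness point concrete, here is a minimal Python sketch. It is not a real XML or S-expression parser; `check_tags` and `check_parens` are hypothetical helpers written only for this illustration, and the regex handles just simple tags. Both inputs have one closing delimiter deleted.

```python
import re

# Matches simple start, end, and self-closing tags; comments and
# processing instructions are skipped because they do not start with a letter.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")

def check_tags(text):
    """Return a message describing the first tag mismatch, or None."""
    open_tags = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue
        if not closing:
            open_tags.append(name)
            continue
        if not open_tags:
            return f"unexpected </{name}>"
        expected = open_tags.pop()
        if name != expected:
            return f"expected </{expected}> but found </{name}>"
    if open_tags:
        return f"missing </{open_tags[-1]}>"
    return None

def check_parens(text):
    """Return a message describing unbalanced parentheses, or None."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unexpected )"
    return f"{depth} unclosed (" if depth else None

broken_xml = "<chapter><sidebar><p>text</p></chapter>"   # </sidebar> deleted
broken_sexpr = "(chapter (sidebar (p text))"              # one ) deleted

print(check_tags(broken_xml))     # expected </sidebar> but found </chapter>
print(check_parens(broken_sexpr)) # 1 unclosed (
```

The named end tag lets the checker say which element was left open and where the problem was noticed. The parenthesis counter can only report that something, somewhere, was not closed.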

Mixed content

Your suggested syntax would not support mixed content very well. Take this example in HTML:

<p>Hi! <a href="example.com">Click here</a>!</p>

How would you express this in your syntax? You would need some kind of additional delimiter to distinguish between attributes and text content. Suddenly it is not so concise anymore.
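As an illustration, here is a small Python sketch of one possible answer: put every text run in double quotes so it cannot be confused with attributes or element names. The quoting convention and the `to_sexpr` helper are assumptions made for this example, not part of the proposed notation or of any standard. The sketch relies on the standard library's `xml.etree.ElementTree`, which stores mixed content in each element's `text` and `tail` attributes.

```python
import xml.etree.ElementTree as ET

def quote(s):
    """Wrap a text run in double quotes, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexpr(elem):
    """Render an element as (tag :attr "value" ... children-and-text...)."""
    parts = [elem.tag]
    for name, value in elem.attrib.items():
        parts.append(f":{name} {quote(value)}")
    if elem.text:
        parts.append(quote(elem.text))       # text before the first child
    for child in elem:
        parts.append(to_sexpr(child))
        if child.tail:
            parts.append(quote(child.tail))  # text following this child
    return "(" + " ".join(parts) + ")"

p = ET.fromstring('<p>Hi! <a href="example.com">Click here</a>!</p>')
print(to_sexpr(p))
# (p "Hi! " (a :href "example.com" "Click here") "!")
```

Under this convention the result is still compact, but every piece of text now needs quotes and escaping, which is the extra delimiter the paragraph above refers to.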

Special characters

Angle brackets are much rarer in ordinary text than parentheses and colons, so text in angle-bracket markup needs escaping less often.
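A quick way to see this is to count the reserved characters in an ordinary sentence. The sketch below is only illustrative: the sample sentence is made up, and treating `(`, `)` and `:` as the reserved characters of the S-expression notation is an assumption based on the example in the question.

```python
from xml.sax.saxutils import escape

def count_chars(text, chars):
    """Count how many characters of `text` belong to the set `chars`."""
    return sum(text.count(c) for c in chars)

sentence = "Call Ambroży Kleks (evenings only): 123-456-789"

xml_hits = count_chars(sentence, "<>&")    # characters escaped in XML text
sexpr_hits = count_chars(sentence, "():")  # assumed reserved set for S-expressions

print(xml_hits, sexpr_hits)           # 0 3
print(escape(sentence) == sentence)   # True: nothing to escape for XML
```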

Compatibility

HTML was already wildly successful at the time XML was designed, and it made sense to choose a similar syntax.

Why did XML win?

S-expressions were never an alternative to XML. The XML spec is much more than angle brackets; it defines a syntax for elements, attributes, and mixed content, as well as escaping, character encoding, DTD syntax, validation, and so on. Nothing similar existed for S-expressions. Of course you can define a similar standard, as you propose here, but nobody had done this at the time. XML got blessed by the W3C and was therefore adopted by major players and became the de facto standard for data exchange.
