XML Type Safety – Why is XML Type Safe?

type-safety, xml

Why do they say that XML provides type safety and how is it expressed in the XML itself?

How is it different from JSON (for example) which (as I understand) is not type safe?

Best Answer

Because of the XML Schema Definition (XSD).

With XML, you can have an additional file which describes the schema. It indicates, for example, that the element /a/b is an array and contains from 1 to 10 elements, or that the element /a/c is an integer. You can find an example of an XSD here.
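To make this concrete, here is a minimal sketch in Python using only the standard library. The schema is a hypothetical one written for this illustration: an element a holding 1 to 10 b elements and one integer c. The helper names read_rules and check are made up, and the checker only understands occurrence counts and xs:integer. It is a toy to show where the type information lives, not a real XSD validator; in practice you would hand the schema to a validating library.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: <a> holds 1 to 10 <b> elements followed by one integer <c>.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="b" type="xs:string" minOccurs="1" maxOccurs="10"/>
        <xs:element name="c" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def read_rules(xsd_text):
    """Collect (minOccurs, maxOccurs, type) for each child declared inside <a>."""
    schema = ET.fromstring(xsd_text)
    rules = {}
    for decl in schema.iter(XS + "element"):
        if decl.get("name") == "a":
            continue  # skip the root declaration itself
        rules[decl.get("name")] = (
            int(decl.get("minOccurs", "1")),  # XSD defaults both bounds to 1
            int(decl.get("maxOccurs", "1")),
            decl.get("type"),
        )
    return rules


def check(xml_text, rules):
    """Toy check of occurrence counts and xs:integer content only."""
    root = ET.fromstring(xml_text)
    errors = []
    for name, (lo, hi, xs_type) in rules.items():
        found = root.findall(name)
        if not lo <= len(found) <= hi:
            errors.append(f"{name}: expected {lo}..{hi} elements, got {len(found)}")
        if xs_type == "xs:integer":
            for el in found:
                try:
                    int((el.text or "").strip())
                except ValueError:
                    errors.append(f"{name}: {el.text!r} is not an integer")
    return errors


rules = read_rules(XSD)
print(check("<a><b>x</b><b>y</b><c>42</c></a>", rules))  # conforming document
print(check("<a><c>forty-two</c></a>", rules))           # no <b>, non-integer <c>
```

The point is that the XML document itself carries no types: `<c>42</c>` is just text. The type and cardinality constraints live in the separate schema file, and a validator applies them.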

Validation of a given XML file against an XSD is supported by many languages. For example, a .NET application may request an XML file from an untrusted source and check that it matches the XSD; it can then save it to a Microsoft SQL Server database, which can in turn hold an XSD and run the check again (to ensure that every client which has access to the database complies).

XSD is not the only schema language.

  • If you've done web development, you have certainly heard about Document Type Definition (DTD), a language which defines the structure of XML and is used especially in validation of HTML-related content. While it cannot do everything XSD can, such as ensuring that an element or an attribute contains an integer, it can still perform a number of structure checks.

  • RELAX NG has the benefit of being relatively simple compared to other schema languages, and it can also be written in a compact, non-XML syntax.

  • Schematron is another “rule-based validation language for making assertions about the presence or absence of patterns in XML trees” (Wikipedia) and presents a slightly different approach, based on XPath assertions.

Similar initiatives for JSON are not that popular (especially, I believe, in the Microsoft-centric corporate world). One of the reasons is that JSON is intended for situations where the data structure is rather basic (i.e. it can be expressed as a tree, without the need for attributes, for instance) and doesn't necessarily need to be validated. An excellent example is a REST API consumed by a client written in a dynamically typed language:

  • the client is very easy and fast to implement,
  • the API is trusted not to change,
  • the client can easily deal with specific leaves where validation is necessary (for instance, check that /something/percentage is an actual number and is in the 0..100 range).
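As a rough sketch of that last point, a Python client could check the one leaf it cares about by hand. The function name read_percentage and the sample payloads are made up for this illustration; only the /something/percentage path and the 0..100 range come from the example above.

```python
import json


def read_percentage(payload):
    """Return /something/percentage if it is a number in 0..100, else raise ValueError."""
    data = json.loads(payload)
    try:
        value = data["something"]["percentage"]
    except (KeyError, TypeError) as exc:
        raise ValueError("missing /something/percentage") from exc
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"percentage is not a number: {value!r}")
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value!r}")
    return value


print(read_percentage('{"something": {"percentage": 42}}'))

try:
    read_percentage('{"something": {"percentage": "42"}}')  # a string, not a number
except ValueError as err:
    print("rejected:", err)
```

Nothing outside the payload describes its shape here, so the client decides which leaves to check and how strictly, and the rest of the document is trusted as-is.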