XML Implementation – Tips and Tricks for Implementing XML or XML Schema

Tags: data-structures, design, xml, xsd

I'm starting to build a standard XML format for our users. The objectives are:

  • XML that can be used for multiple organisations
  • XML that can be used to map external system data to our system data
  • An XML structure that follows best practices
  • XML that is adaptable to change

With those objectives in mind, our initial users XML would look like this:

<users>
    <user>
        <firstName></firstName>
        <lastName></lastName>
        <email></email>
        <!-- .. some user child node here -->
    </user>
</users>

But what if this structure grows to include other objects associated with a user, something like this:

<users>
    <user>
        <firstName></firstName>
        <lastName></lastName>
        <email></email>
        <element1>
            <child1></child1>
        </element1>
        <element2>
            <child1></child1>
            <child2>
                <innerChild1></innerChild1>
            </child2>
        </element2>
    </user>
</users>

Then I would have to use URN namespaces to uniquely identify elements that share the same name. This is where namespaces are useful.
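
As a sketch of what that could look like, here is the same structure with two namespaces declared once on the root. The URNs urn:example:users and urn:example:profile and the p prefix are placeholders, not an existing standard; an organisation would substitute identifiers it controls.

```xml
<!-- Hypothetical namespace URNs; substitute your own. -->
<users xmlns="urn:example:users"
       xmlns:p="urn:example:profile">
    <user>
        <firstName></firstName>
        <lastName></lastName>
        <email></email>
        <!-- Unprefixed: element1 and its child1 are in the default (users) namespace. -->
        <element1>
            <child1></child1>
        </element1>
        <!-- Prefixed: this child1 belongs to the profile vocabulary. -->
        <p:element2>
            <p:child1></p:child1>
        </p:element2>
    </user>
</users>
```

A namespace-aware parser sees the two child1 elements as {urn:example:users}child1 and {urn:example:profile}child1, so they stay distinct even if the two vocabularies define them differently.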

My questions are:

  • Should I introduce namespaces now, in the initial XML structure?
  • When should I use an attribute instead of a child element?
  • What best practices can keep our XML adaptable to change?
  • What should I avoid when designing and structuring XML?

Note: we are using XML instead of JSON because most of our users use XML.

Best Answer

Honestly, my answer would initially be: don't use XML. I've been working with XML for many years, and the reality is that it's a terrible format for data exchange. JSON has its own flaws, but it is much better. XML is actually not a bad way to create documents, but even that usage is being replaced by HTML5.

However, given that you are 'forced' to do this, here's my list of recommendations:

XML

  • XML namespaces suck to deal with, but if you might need them, you are better off using them from the start. Retrofitting them is a huge pain in my experience.
  • Use attributes only for metadata. Elements are much more powerful: when something you thought was simple becomes complex, an element can take on child elements, while an attribute can only ever hold a single text value. An element can also be more than one thing depending on the context.
  • Never, ever, ever allow mixed content. That is, don't allow text nodes and child elements as content of the same element. It's one or the other.
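  • Do not allow entity references beyond XML's five predefined ones (&amp;, &lt;, &gt;, &quot;, &apos;). Entities declared in a DTD, especially external ones, are a serious security risk (XML External Entity attacks, "billion laughs" expansion).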
  • Declare all namespaces in the root and use prefixes. Putting namespace declarations on every element will add a lot of bloat to an already bloated document.
  • Remove all insignificant whitespace (indentation and line breaks between elements) if you are doing any sort of encryption or signing.
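
The sketch below illustrates several of these points on the users format from the question. The namespace URNs, the p:contactNote element, and the id and source attributes are hypothetical additions for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Namespaces declared once, on the root, each with a prefix.
     Both URNs are placeholders. -->
<u:users xmlns:u="urn:example:users"
         xmlns:p="urn:example:profile">
    <!-- Attributes hold metadata about the record (hypothetical id and source);
         the user's actual data lives in elements. -->
    <u:user id="1001" source="crm">
        <u:firstName>Jane</u:firstName>
        <u:lastName>Doe</u:lastName>
        <u:email>jane.doe@example.com</u:email>
        <!-- Avoid mixed content such as
             <u:note>Call <u:firstName>Jane</u:firstName> after 5pm</u:note>
             and keep text in leaf elements instead: -->
        <p:contactNote>
            <p:text>Call after 5pm</p:text>
        </p:contactNote>
    </u:user>
</u:users>
```

If source later needed structure of its own (for example, a system name plus a record key), it would have to be promoted to an element, which is exactly why attributes are best reserved for simple metadata.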

XSD

  • Forget all the stuff about the Salami Slice and Venetian Blind schema design patterns. Create global element definitions only for the elements you want to use as the root of a document. Everything else should be a type.
  • Use sequences (xs:sequence) pretty much always. Choices (xs:choice) can be useful but complicate things.
  • Do not specify nillable="true"; use minOccurs="0" instead. The element is either there with a value, there and empty, or not there at all. Introducing null values at the interface level is a bad idea.
  • You can't say things like "at least two of the following three options" in XSD without getting nutty. Let it go and move on.
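
A minimal schema sketch following these rules for the users format from the question, assuming the placeholder target namespace urn:example:users. The type names UsersType and UserType, the id attribute, and the choice of email as the optional field are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:u="urn:example:users"
           targetNamespace="urn:example:users"
           elementFormDefault="qualified">

    <!-- The only global element: the document root. -->
    <xs:element name="users" type="u:UsersType"/>

    <!-- Everything else is a named type. -->
    <xs:complexType name="UsersType">
        <xs:sequence>
            <xs:element name="user" type="u:UserType"
                        minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="UserType">
        <xs:sequence>
            <xs:element name="firstName" type="xs:string"/>
            <xs:element name="lastName" type="xs:string"/>
            <!-- Optional field: omitted when absent, never nillable="true". -->
            <xs:element name="email" type="xs:string" minOccurs="0"/>
        </xs:sequence>
        <!-- Metadata about the record is an attribute (hypothetical). -->
        <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
</xs:schema>
```

Because users is the only global element declaration, it is the only element a validator will accept as the document root. A new field can later be appended to the end of a sequence with minOccurs="0" without invalidating documents that already conform.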

I will add more if I can think of anything.