XElement Parsing – XElement.Parse vs Serialization for Strongly Typed Objects

c# performance serialization

We have a rather large code base that interacts with many SOAP based XML services.

Each of these services makes 1 to n service calls. A typical low-level web service call looks like this (simplified):

public XElement ExecuteWebService(string xmlRequest) 

We use WCF to build the SOAP message, send it, and get back the response. We get the body using GetReaderAtBodyContents() and then convert the resulting string to an XElement using:

XElement.Parse(response)

Then we use that XElement throughout the rest of the layers of the application. There are no strongly typed data contracts or classes marked with XML serialization attributes.

This structure makes tests very difficult to write, since an XElement can hold any valid XML structure. It also takes many additional lines of code to read, parse, and update the XElement objects as they are passed around, which makes for some pretty messy code.

Are there valid reasons for this type of structure? I was told it was done for performance reasons and it's more flexible.

Is XElement parsing and reading really faster than one-time serialization (data contract, XML serialization)? Is this really more flexible than using a strongly typed object model?

On other systems I have always used serialization and strongly typed objects because it's easier to understand and maintain. I am not certain if this XElement approach is valid.

Best Answer

I did some research and testing and here are my results:

Analysis

We have several possibilities when reading and writing messages for web services under the .Net platform.

  • Strongly Typed Classes using XmlSerializer
  • Strongly Typed Classes implementing IXmlSerializable
  • Strongly Typed Classes using Data Contract Serializer
  • Loading to XElement/Document and using Linq to Xml
  • Custom

XmlSerializer can serialize both elements and attributes. It’s the default choice when dealing with legacy or existing message structures. When creating data model classes, one can simply decorate the public properties or fields with the appropriate attributes to produce or consume the outgoing or incoming Xml.
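As an illustration of the attribute-driven approach, here is a minimal sketch. The FoobarXml type, its id attribute, and the FoobarXmlReader helper are hypothetical names made up for this example, not part of the code base discussed in the question.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical message type, loosely modelled on the Foobar test class used later.
[XmlRoot("Foobar")]
public class FoobarXml
{
    [XmlAttribute("id")]      // read from / written to an Xml attribute
    public int Id { get; set; }

    [XmlElement("Name")]      // read from / written to a child element
    public string Name { get; set; }

    [XmlElement("Age")]
    public int Age { get; set; }
}

public static class FoobarXmlReader
{
    // Constructing an XmlSerializer is expensive, so create it once and reuse it.
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(FoobarXml));

    public static FoobarXml Read(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (FoobarXml)Serializer.Deserialize(reader);
        }
    }

    public static string Write(FoobarXml value)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Calling FoobarXmlReader.Read on a message such as <Foobar id="7"><Name>Ann</Name><Age>30</Age></Foobar> yields a populated FoobarXml instance, so the rest of the code never has to touch the raw Xml.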

Implementing IXmlSerializable is very similar to using the XmlSerializer, except that one writes the code that reads and writes the Xml by implementing GetSchema(), ReadXml(), and WriteXml(). This method still uses an XmlReader and an XmlWriter to read and write the Xml message.
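A minimal sketch of what such an implementation might look like is shown here. FoobarSerializable is a hypothetical type for illustration, and the element names and their order are assumptions.

```csharp
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type that controls its own Xml representation.
[XmlRoot("Foobar")]
public class FoobarSerializable : IXmlSerializable
{
    public string Name { get; set; }
    public int Age { get; set; }

    public XmlSchema GetSchema()
    {
        return null; // no schema is supplied in this sketch
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the wrapper element's start tag.
        if (reader.IsEmptyElement)
        {
            reader.Read(); // consume <Foobar /> and leave the properties at their defaults
            return;
        }

        reader.ReadStartElement();   // consume <Foobar>
        reader.MoveToContent();      // skip any whitespace before <Name>
        Name = reader.ReadElementContentAsString("Name", "");
        reader.MoveToContent();      // skip any whitespace before <Age>
        Age = reader.ReadElementContentAsInt("Age", "");
        reader.ReadEndElement();     // consume </Foobar>
    }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer writes the wrapper element; only the children are written here.
        writer.WriteElementString("Name", Name);
        writer.WriteStartElement("Age");
        writer.WriteValue(Age);
        writer.WriteEndElement();
    }
}
```

The type is still read and written through XmlSerializer, for example with new XmlSerializer(typeof(FoobarSerializable)); the serializer simply delegates to ReadXml() and WriteXml() instead of using reflection-generated code.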

The Data Contract Serializer is also very easy to implement and use. It does not support Xml attributes (data members are always written as elements), so it is best suited to green field development. The data contract serializer is ideal for code first scenarios where the exact message structure is not important.
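A minimal sketch of a data contract and a pair of helper methods follows. FoobarContract and FoobarContractSerializer are hypothetical names, and the empty namespace and explicit member order are choices made for this example only.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical contract type. Every data member is written as an element.
[DataContract(Name = "Foobar", Namespace = "")]
public class FoobarContract
{
    [DataMember(Order = 1)]
    public string Name { get; set; }

    [DataMember(Order = 2)]
    public int Age { get; set; }
}

public static class FoobarContractSerializer
{
    // Reuse one serializer instance instead of creating one per call.
    private static readonly DataContractSerializer Serializer =
        new DataContractSerializer(typeof(FoobarContract));

    public static string Write(FoobarContract value)
    {
        var builder = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(builder, settings))
        {
            Serializer.WriteObject(writer, value);
        }
        return builder.ToString();
    }

    public static FoobarContract Read(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return (FoobarContract)Serializer.ReadObject(reader);
        }
    }
}
```

Note that the data contract serializer expects elements in member order (alphabetical unless Order is set), which is one reason it can be awkward to fit to an existing message format.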

It may be difficult or not viable to use the data contract serializer when an existing message structure is present. In those cases, one should fall back to the XmlSerializer, as it offers much more granular control over the format of the incoming and outgoing message structure.

The Data Contract Serializer is the default serializer for WCF and usually offers similar or better performance than the Xml Serializer when serializing Xml. The same data contract classes can also be serialized to JSON (using DataContractJsonSerializer), which makes this a more modern approach when dealing with messages.
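For the JSON case, a minimal sketch using DataContractJsonSerializer could look like this. FoobarJson and FoobarJsonSerializer are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical contract type reused for JSON output.
[DataContract]
public class FoobarJson
{
    [DataMember(Order = 1)]
    public string Name { get; set; }

    [DataMember(Order = 2)]
    public int Age { get; set; }
}

public static class FoobarJsonSerializer
{
    private static readonly DataContractJsonSerializer Serializer =
        new DataContractJsonSerializer(typeof(FoobarJson));

    public static string Write(FoobarJson value)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.WriteObject(stream, value);   // writes UTF-8 encoded JSON
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static FoobarJson Read(string json)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (FoobarJson)Serializer.ReadObject(stream);
        }
    }
}
```

The contract attributes are the same ones used for Xml; only the serializer class changes.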

Data Contract Serializer should always be the first choice for green field development as it produces the least amount of code to maintain and offers good performance.

XElement/XDocument loading and parsing allows for great flexibility when reading and writing Xml messages.

Some of the things to keep in mind:

  • An XElement/XDocument is not strongly typed; it is a representation of an Xml document.
  • One must write code to pull information from the XElement/XDocument. This is straightforward with Linq to Xml, but complexities can arise.
  • One must convert the string values to appropriate data types as needed.
  • One must know some internals of the message structure (XPaths, etc.).
  • Testing is cumbersome when using XElement and XDocument structures in code.
  • More code is needed than with serialization techniques.

This method is suitable for cherry picking data off of a large Xml structure. For example, if one has an Xml message with 100 elements and only a few of those elements are needed, this method is appropriate. It is recommended that message processing be centralized and that the data elements that are parsed be put into a strongly typed object that can be used throughout the code base. Do not pass XElement or XDocument instances around in the code, as this will be difficult to test. Also, if message processing is not centralized, XElement parsing may end up duplicated throughout the code base, with different approaches being used in different places.
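The recommendation above can be sketched as follows: one parser class owns all knowledge of the message structure and hands a small typed object to the rest of the code. FoobarSummary and FoobarMessageParser are hypothetical names, and the element names are assumptions made for this example.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical strongly typed result that the rest of the code base works with.
public sealed class FoobarSummary
{
    public FoobarSummary(string name, int age)
    {
        Name = name;
        Age = age;
    }

    public string Name { get; private set; }
    public int Age { get; private set; }
}

// The only place that knows the element names of the message.
public static class FoobarMessageParser
{
    public static FoobarSummary Parse(string xml)
    {
        XElement root = XElement.Parse(xml);

        // Pick out only the elements that are needed and ignore the rest.
        XElement nameElement = root.Element("Name");
        XElement ageElement = root.Element("Age");
        if (nameElement == null || ageElement == null)
        {
            throw new FormatException("Message is missing the Name or Age element.");
        }

        // The explicit XElement conversions parse the element value
        // independently of the current culture.
        return new FoobarSummary((string)nameElement, (int)ageElement);
    }
}
```

Tests for downstream code can then construct a FoobarSummary directly, and only the parser itself needs tests that feed it raw Xml.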

Typically, this method offers better performance than serialization when a small amount of parsing is done. As the amount of manual processing and parsing increases, serialization is a better option, even if the performance is slightly worse as there will be less code to maintain and test.

When performance is of utmost concern, implementing a custom approach can be suitable, but costly from a code and maintenance standpoint. Here’s a naïve implementation that produces the same Xml that is being used for the performance tests. We have a constructor that takes in an Xml string and creates the object. We have also overridden ToString() to create the Xml string.

using System;
using System.Globalization;
using System.Text;

public class FoobarHandRolled
{
    public FoobarHandRolled(string name, int age, bool isContent, DateTime birthDay)
    {
        Name = name;
        Age = age;
        IsContent = isContent;
        BirthDay = birthDay;
    }

    public FoobarHandRolled(string xml)
    {
        if (string.IsNullOrWhiteSpace(xml))
        {
            return;
        }

        SetName(xml);
        SetAge(xml);
        SetIsContent(xml);
        SetBirthday(xml);
    }

    public string Name { get; set; }
    public int Age { get; set; }
    public bool IsContent { get; set; }
    public DateTime BirthDay { get; set; }

    /// <summary>
    ///     Takes this object and creates an XML representation.
    /// </summary>
    /// <returns>An XML string that represents this object.</returns>
    public override string ToString()
    {
        var builder = new StringBuilder();
        builder.Append("<FoobarHandRolled>");

        if (!string.IsNullOrWhiteSpace(Name))
        {
            builder.Append("<Name>" + Name + "</Name>");
        }

        builder.Append("<Age>" + Age + "</Age>");
        builder.Append("<IsContent>" + IsContent + "</IsContent>");
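        // Format with the invariant culture so the output does not depend on the current culture's calendar.
        builder.Append("<BirthDay>" + BirthDay.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + "</BirthDay>");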
        builder.Append("</FoobarHandRolled>");

        return builder.ToString();
    }

    private void SetName(string xml)
    {
        Name = GetSubString(xml, "<Name>", "</Name>");
    }

    private void SetAge(string xml)
    {
        var ageString = GetSubString(xml, "<Age>", "</Age>");
        int result;
        var success = int.TryParse(ageString, out result);
        if (success)
        {
            Age = result;
        }
    }

    private void SetIsContent(string xml)
    {
        var isContentString = GetSubString(xml, "<IsContent>", "</IsContent>");
        bool result;
        var success = bool.TryParse(isContentString, out result);
        if (success)
        {
            IsContent = result;
        }
    }

    private void SetBirthday(string xml)
    {
        var dateString = GetSubString(xml, "<BirthDay>", "</BirthDay>");
        DateTime result;
        var success = DateTime.TryParseExact(dateString, "yyyy-MM-dd", CultureInfo.InvariantCulture, DateTimeStyles.None, out result);
        if (success)
        {
            BirthDay = result;
        }
    }

    private string GetSubString(string xml, string startTag, string endTag)
    {
        var startIndex = xml.IndexOf(startTag, StringComparison.Ordinal);
        if (startIndex < 0)
        {
            return null;
        }

        startIndex = startIndex + startTag.Length;

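        // Search for the end tag only after the start tag; otherwise an end tag that
        // appears earlier in the string would produce a negative length below.
        var endIndex = xml.IndexOf(endTag, startIndex, StringComparison.Ordinal);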
        if (endIndex < 0)
        {
            return null;
        }

        return xml.Substring(startIndex, endIndex - startIndex);
    }
}

Here we are using string parsing techniques and hard-coded tag names to read and write the Xml message. This method offers the best performance at the cost of additional code, maintenance and custom implementation, and it does not handle Xml features such as escaping, attributes, or namespaces. This method is not recommended unless there is a critical need for the best performance.

Performance Summary

The table lists the time of the first read and the average time over 1000 reads. The hardware used was a W530 laptop with an i7-3840QM processor at 2.80 GHz, running Visual Studio 2013 and .NET 4.5.2. Once warmed up, every approach reads a message in microseconds (at roughly 100 ns per tick, the averages correspond to about 3 to 30 microseconds). The warm up times can be mitigated by performing proper initialization at startup. For the Xml Serializer, the warm up cost can be reduced by generating the serialization assembly ahead of time with SGEN.

Serializer                      First Time (ticks)  Average of 1000 Reads (ticks)
XmlSerializer                   2448965             245
Implementing IXmlSerializable   2051813             208
Custom                          161105              29
Using XElement/XDocument        247024              113
Data Contract (Json)            1979593             303

All times are in ticks. Ticks are hardware dependent but as a guideline there are roughly 10,000,000 ticks in 1 second.
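The measurement code itself is not shown in this answer. As a rough sketch of how a first-call time and a warm average could be collected with Stopwatch, and how ticks convert to microseconds, consider the following. ReadBenchmark and its members are hypothetical names, and this is not necessarily how the numbers above were produced.

```csharp
using System;
using System.Diagnostics;

public static class ReadBenchmark
{
    public struct Result
    {
        public long FirstTicks;      // cost of the first (cold) call
        public double AverageTicks;  // average cost of the following warm calls
    }

    public static Result Measure(Action read, int iterations)
    {
        if (read == null) throw new ArgumentNullException("read");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        // First call: includes one-time costs such as serializer construction.
        var watch = Stopwatch.StartNew();
        read();
        watch.Stop();
        long first = watch.ElapsedTicks;

        // Warm calls: timed together and averaged.
        watch.Restart();
        for (int i = 0; i < iterations; i++)
        {
            read();
        }
        watch.Stop();

        return new Result
        {
            FirstTicks = first,
            AverageTicks = (double)watch.ElapsedTicks / iterations
        };
    }

    // Converts a tick count to microseconds for a given tick rate.
    // For Stopwatch ticks, pass Stopwatch.Frequency as ticksPerSecond.
    public static double TicksToMicroseconds(double ticks, long ticksPerSecond)
    {
        return ticks * 1000000.0 / ticksPerSecond;
    }
}
```

With the guideline of 10,000,000 ticks per second, TicksToMicroseconds(245, 10000000) gives 24.5, so the XmlSerializer average above is about 24.5 microseconds per read.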

Final Recommendations

The following guidelines should be followed when designing the messages that services use to exchange data:

  • Exchange the minimal amount of data that is needed to satisfy the requirements
  • Avoid complex or generic data structures
  • Avoid messages that contain metadata that describes the data
  • Avoid data within data (Example: Xml within Xml)
  • Be mindful of the size of the message

By following these guidelines, clients consuming those messages can be made simpler and faster.