C# – Gracefully handle validation errors in an XML file in C#

c# .net xml

The description is a bit on the longer side, so please bear with me. I would like to process and validate a huge XML file, log the node that triggered each validation error, and continue processing with the next node. A simplified version of the XML file is shown below.

On encountering any error (either XmlException or XmlSchemaValidationException) while processing a node 'A' or its children, I would like to stop processing the current node, log the error together with the XML for that node 'A', and move on to the next node 'A'.

<Root>
  <A id="A1">
    <B Name="B1">
      <C>
        <D Name="ID" >
          <E>Test Text 1</E>
        </D>
        <D Name="text" >
          <E>Test Text 1</E>
        </D>
      </C>
    </B>
  </A>
  <A id="A2">
    <B Name="B2">
      <C>
        <D Name="id" >
          <E>Test Text 3</E>
        </D>
        <D Name="tab1_id"  >
          <E>Test Text 3</E>
        </D>
        <D Name="text" >
          <E>Test Text 3</E>
        </D>
      </C>
    </B>
</Root>

I am currently able to recover from an XmlSchemaValidationException by using a ValidationEventHandler with the XmlReader; the handler throws an exception that I handle in the XML processing code. In some cases, however, an XmlException is raised instead, which terminates the process.

The following snippets of the code illustrate the current structure I am using; it is messy and code improvement suggestions are also welcome.

    // Set up a validating XmlReader
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ConformanceLevel = ConformanceLevel.Auto;
    settings.IgnoreWhitespace = true;
    settings.CloseInput = true;
    settings.IgnoreComments = true;
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(null, "schema.xsd");
    settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
    XmlReader reader = XmlReader.Create("Sample.xml", settings);

    // Process the XML: hand each <A> element to processA as a subtree reader
    while (reader.Read()) {
        if (reader.NodeType == XmlNodeType.Element && reader.Name.Equals("A")) {
            processA(reader.ReadSubtree());
        }
    }
    reader.Close();

    // Process node A
    private static void processA(XmlReader A) {
        try {
            // Perform some book-keeping
            // Process node B by calling processB(A.ReadSubtree())
        }
        catch (InvalidOperationException ex) {
            // Log the error and move on
        }
        catch (XmlException xmlEx) {
            // Log the error and move on
        }
        catch (ImportException impEx) {
            // Log the error and move on
        }
        finally { if (A != null) A.Close(); }
    }

    // All the lower-level process-node functions propagate the exception to the caller.
    private static void processB(XmlReader B) {
        try {
            // Book-keeping and call processC
        }
        catch (Exception) {
            throw; // rethrow without resetting the stack trace (was: throw ex;)
        }
        finally { if (B != null) B.Close(); }
    }

    // Validation event handler: turns a schema validation error into an ImportException
    private static void ValidationCallBack(object sender, ValidationEventArgs e) {
        String msg = "Validation Error: " + e.Message + " at line " + e.Exception.LineNumber +
            " position number " + e.Exception.LinePosition;
        throw new ImportException(msg);
    }

When an XmlSchemaValidationException is encountered, the finally block invokes Close() and the original XmlReader is positioned on the EndElement of the subtree, so once the finally block in processA has run, processing continues with the next node A.

However, when an XmlException is encountered, invoking Close() does not position the original reader on the EndElement node of the subtree, and an InvalidOperationException is thrown.

I tried methods such as Skip() and the ReadToXYZ() family, but these invariably lead to an XmlException or InvalidOperationException when invoked on any node that triggered an exception.

The following is an excerpt from MSDN regarding the ReadSubtree method.

When the new XmlReader has been closed, the original XmlReader will be positioned on the EndElement node of the sub-tree. Thus, if you called the ReadSubtree method on the start tag of the book element, after the sub-tree has been read and the new XmlReader has been closed, the original XmlReader is positioned on the end tag of the book element.

Note: I cannot use .NET 3.5 for this; however, .NET 3.5 suggestions are welcome.

Best Answer

See this question:
XML Parser Validation Report

You need to distinguish between well-formed XML (it follows the rules required to be real XML) and valid XML (it follows additional rules given by a specific XML schema). From the spec:

Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).

For better or worse, the XML tools included with Visual Studio need to follow that spec very closely, and therefore will not continue processing once they hit a well-formedness error. The link I provided might give you some alternatives.
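To make the distinction concrete, here is a minimal sketch, not the asker's actual code. It assumes a tiny inline schema in which every A element requires an id attribute; the class name ValidationSketch, the Run method, and the schema itself are made up for illustration. The validation handler records each schema error instead of throwing, so the reader keeps going, while an XmlException is treated as the end of the document. It avoids lambdas and var so that it does not depend on C# 3.0 features.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class ValidationSketch
{
    // Hypothetical schema for illustration only: <Root> holds <A> elements,
    // and each <A> must carry an "id" attribute.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        " <xs:element name='Root'><xs:complexType><xs:sequence>" +
        "  <xs:element name='A' maxOccurs='unbounded'><xs:complexType>" +
        "   <xs:attribute name='id' type='xs:string' use='required'/>" +
        "  </xs:complexType></xs:element>" +
        " </xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    // Reads the whole document and returns a log of the problems found.
    public static List<string> Run(string xml)
    {
        List<string> log = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        // Schema violations are recoverable: record them and do not throw,
        // so the reader continues with the rest of the document.
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            log.Add("invalid: " + e.Message);
        };

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read())
                {
                    // Per-node processing would go here.
                }
            }
        }
        catch (XmlException ex)
        {
            // Well-formedness errors are fatal: the reader cannot resume,
            // so all we can do is record where parsing stopped.
            log.Add("fatal: " + ex.Message);
        }
        return log;
    }
}
```

With an input such as `<Root><A id='A1'/><A/></Root>`, the missing id is logged and reading continues to the end. With an unclosed A element, as in the sample file above, the log ends with a single fatal entry and nothing after that point is read. If the file must survive such errors, it would have to be repaired or split into fragments before it reaches XmlReader.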