C# – Best Way to Consume Very Dynamic/Inconsistent XML/JSON

c# parsing xml

I don't know whether to call the data dynamic or inconsistent, but I need to create profile pages for people, generated from XML or JSON. The challenge is in the data that is returned. It is bibliographical data about a person, and the APIs returning it are old (11 years old) and need revisiting.

The data can look something like this:

    <person>
        <personalinfo>
            <!-- Always the same... -->
        </personalinfo>
        <categories>
            <publications>
                <item> <!-- The issues lie inside here. -->
                    <authors>
                    </authors>
                    <publications>
                    </publications>
                </item>
            </publications>
        </categories>
    </person>

The issue lies with the categories and the item information. A new category can be added at any time and be called anything. An item can also have any fields, with any names, and those fields can be added at any time. Essentially, I don't know what I am getting back. On top of that, no token is returned giving any hint about how to display the information, nor are there any format requirements on the data entered into the feed that these APIs return.

I know that these APIs need updating but that isn't on the table right now. I just got a deadline pushed forward 2 weeks and need to have profile pages done very soon.

Are there any good tools that can handle this mess of information? Does anyone have any suggestions for getting this done quickly? I imagine this is going to be an iterative project, so whatever I use will most likely be an interim solution. The data returned can also be JSON, but it keeps the same structure.

The site I am making these pages for is a .NET MVC site. I'm using Razor for everything else, but I think there may be a better approach for this particular page.

My concern isn't parsing the data. I know you can use dynamic objects with many libraries. My concern is formatting the data once I have it. There are no good identifiers or tokens to use when stepping through the data to format it. With these dynamic objects, is there a good way to format them before passing the model to a view, or am I going to have to write a huge XSL stylesheet to handle all possible cases? There are 2000+ different fields that items can have, and more could be added, so I don't want to do it that way.

Formatting is an issue because they want things like authors to be formatted differently based on category (APA vs. MLA, for example), and they want date formats and phone/mail formats to differ based on category as well. The problem originates in the APIs, which I can't fix now: they aren't my project, and the owners aren't listening to me. When the data structure for the backend of the APIs was set up, they wanted to allow for any data, which is good, but they didn't set up any structured guidelines for formatting or for creating new fields.

So publications may have authors, while poems may have author and articles may have AUTHORS. A date in one category may be date, startdate, enddate, or birthday. I realize there isn't a very elegant solution to this without fixing the root cause; I was just hoping someone had advice for a quick, easy interim solution until the APIs can be tackled properly. They also want me to reorder fields in some spots, based not on category but on adjacent fields.

Best Answer

You're going to have to write a lot of custom formatters. As an example, here are some solutions for formatting phone numbers:

https://stackoverflow.com/questions/188510/how-to-format-a-string-as-a-telephone-number-in-c-sharp

As you can see, lots of variations on a theme.

You might try a factory approach where you pass a type and a category and it returns a formatter. All formatters implement an interface like IFormat. Some examples:

    // Telephone formatting depends on the category.
    var phoneFormatter = FormatFactory.Create(FormatType.Telephone, "CategoryX");
    var formattedPhone = phoneFormatter.Format(phoneNumberFromJsonString);

    // Author formatting depends on the citation style.
    var authorFormatter = FormatFactory.Create(FormatType.Author, "APA");
    var formattedAuthor = authorFormatter.Format(authorFromJsonString);

At least you would be able to keep the format logic focused instead of having a single formatter trying to handle all the scenarios.
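To make the idea more concrete, here is a minimal sketch of what such a factory could look like. Everything in it is an assumption for illustration: `FormatType`, `IFormat` and `FormatFactory` just follow the names used above, while the concrete classes (`PassThroughFormat`, `UsTelephoneFormat`, `ApaAuthorFormat`), their deliberately simplified rules, and the case-insensitive registry with a pass-through fallback are hypothetical choices, not something your APIs or any library dictate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names throughout; the formatting rules are illustrative only.
public enum FormatType { Telephone, Author }

public interface IFormat
{
    string Format(string raw);
}

// Fallback for unknown type/category pairs: returns the value trimmed.
public sealed class PassThroughFormat : IFormat
{
    public string Format(string raw) => (raw ?? string.Empty).Trim();
}

// Assumes 10-digit North American numbers; anything else is only trimmed.
public sealed class UsTelephoneFormat : IFormat
{
    public string Format(string raw)
    {
        if (raw == null) return string.Empty;
        var digits = new string(raw.Where(c => char.IsDigit(c)).ToArray());
        return digits.Length == 10
            ? $"({digits.Substring(0, 3)}) {digits.Substring(3, 3)}-{digits.Substring(6)}"
            : raw.Trim();
    }
}

// Simplified APA-like rule: "First [Middle] Last" becomes "Last, F. M."
// Suffixes, multiple authors and "et al." are not handled here.
public sealed class ApaAuthorFormat : IFormat
{
    public string Format(string raw)
    {
        var text = (raw ?? string.Empty).Trim();
        var parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2) return text;
        var initials = parts.Take(parts.Length - 1)
                            .Select(p => char.ToUpperInvariant(p[0]) + ".");
        return parts[parts.Length - 1] + ", " + string.Join(" ", initials);
    }
}

public static class FormatFactory
{
    private static readonly IFormat Fallback = new PassThroughFormat();

    // Keyed by "type|category"; the comparer ignores case so that "APA"
    // and "apa" resolve to the same formatter.
    private static readonly Dictionary<string, IFormat> Registry =
        new Dictionary<string, IFormat>(StringComparer.OrdinalIgnoreCase)
        {
            [Key(FormatType.Telephone, "CategoryX")] = new UsTelephoneFormat(),
            [Key(FormatType.Author, "APA")] = new ApaAuthorFormat(),
        };

    private static string Key(FormatType type, string category) => type + "|" + category;

    // Returns the registered formatter, or the pass-through fallback when
    // the type/category pair has not been registered yet.
    public static IFormat Create(FormatType type, string category)
    {
        return Registry.TryGetValue(Key(type, category), out var formatter)
            ? formatter
            : Fallback;
    }
}
```

With this sketch, `FormatFactory.Create(FormatType.Author, "apa").Format("Jane Q Public")` yields `Public, J. Q.`, and a category nobody has registered yet (say `"MLA"`) falls back to the trimmed raw value instead of throwing. That fallback matters when new categories can show up at any time: the page still renders, and you add a formatter when someone complains about how a field looks.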