HTML and XML are both markup languages (hence the *ML). XML is a generic markup language suitable for representing arbitrary data, while HTML is a specific markup language suitable only for representing web pages.
HTML (up to version 4.01) and XHTML are both SGML-based, except that XHTML carries additional constraints so that it is also well-formed XML. Think of XML as XHTML's influential godfather.
Because all three languages are related to SGML, they share a lot of similarities, but they are considered different languages. Much of what defines each one is how it restricts SGML:
- HTML restricts SGML by defining a list of tags that are allowed to be used.
- XML restricts SGML by disallowing unclosed or empty start and end tags and by requiring attributes to be explicit. XML also adds a large number of restrictions that are not found in SGML.
- XHTML restricts SGML with the tags from HTML (with some exclusions, such as frameset and others) and with the tag and entity restrictions from XML.
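To make the tag and attribute restrictions concrete, here is a minimal sketch using only Python's standard library. The sample strings and the names `TagCollector` and `is_well_formed_xml` are my own illustrations, not anything from a spec. It shows a lenient HTML parser accepting markup with unclosed tags and a minimized attribute, while an XML parser rejects the same input.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def is_well_formed_xml(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: unclosed <p> and <br>, attribute without a value.
sloppy = "<p>First<p>Second<br><input disabled>"
# The same content written the XML/XHTML way: everything closed and explicit.
strict = '<div><p>First</p><p>Second<br/><input disabled="disabled"/></p></div>'

collector = TagCollector()
collector.feed(sloppy)
collector.close()

print(collector.tags)               # the HTML parser happily reports all four tags
print(is_well_formed_xml(sloppy))   # False
print(is_well_formed_xml(strict))   # True
```

Note that this only checks well-formedness. Validating against the actual XHTML DTD (allowed tags, nesting rules) is a separate step that this sketch does not attempt.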
You may find this document helpful, although the technical terms may be hard to digest. http://www.w3.org/TR/NOTE-sgml-xml-971215
XML is not a metalanguage for defining markup languages. Really that's just SGML. XML is simply a data formatting markup language. Your quoted source is using technical terms imprecisely, which is why they are confusing.
Purposes
XML is for defining your own data format. If you wish to pass data between two systems, XML is often the way to do it.
If, for example, you needed to pass a sales order from your website to your billing system, you could create this XML payload:
<order id="12345">
  <name>John Doe</name>
  <item id="443">Adult Diapers</item>
</order>
Your website would then send that XML to your billing system, which could then parse the data from that XML.
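As a rough illustration of the receiving side, here is how a billing system might pull the data back out of that payload. This is a sketch with Python's standard `xml.etree.ElementTree`; the function name `parse_order` and the shape of the returned dictionary are assumptions of mine, not part of any real billing API.

```python
import xml.etree.ElementTree as ET

# The same sales order payload shown above.
payload = """<order id="12345">
  <name>John Doe</name>
  <item id="443">Adult Diapers</item>
</order>"""


def parse_order(xml_text):
    """Turn the order XML into a plain dictionary (hypothetical shape)."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": root.get("id"),          # attribute on <order>
        "customer": root.findtext("name"),   # text of the <name> child
        "items": [
            {"id": item.get("id"), "description": item.text}
            for item in root.findall("item")
        ],
    }


order = parse_order(payload)
print(order["order_id"])   # 12345
print(order["customer"])   # John Doe
print(order["items"])      # [{'id': '443', 'description': 'Adult Diapers'}]
```

Every value comes back as a string. XML itself carries no type information, so converting ids to integers (if you want that) is up to the receiving code or a schema.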
XHTML and HTML are obviously just for web pages. XHTML's primary purpose is to remove a lot of the ambiguity that we had in previous years (decades) of web development. Back in the late 90s when I started, we were using HTML 3.2, which allowed for seriously sloppy code. HTML 4+ and XHTML try to remedy that by either strongly suggesting or enforcing explicit closing tags and explicit attributes, and by disallowing certain tags. That makes things easier for both browsers and humans, and avoids unexpected differences in behaviour across browsers.
You're going to have to write a lot of custom formatters. As an example, here are some solutions for formatting phone numbers:
https://stackoverflow.com/questions/188510/how-to-format-a-string-as-a-telephone-number-in-c-sharp
As you can see, lots of variations on a theme.
You might try a factory approach where you pass a type and a category and it returns a formatter. All formatters inherit from an interface like IFormat. Some examples:
// Telephone formatter for a given category
var phoneFormatter = FormatFactory.Create(FormatType.Telephone, "CategoryX");
var formattedPhone = phoneFormatter.Format(phoneNumberFromJsonString);
// Author formatter for APA style
var authorFormatter = FormatFactory.Create(FormatType.Author, "APA");
var formattedAuthor = authorFormatter.Format(authorFromJsonString);
At least you would be able to keep the format logic focused instead of having a single formatter trying to handle all the scenarios.
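For what the factory itself might look like, here is a small self-contained sketch. It is written in Python rather than C# purely to keep it short and runnable, and everything in it is hypothetical: the class names, the registry, what "CategoryX" means for phone numbers (I assumed a dashed 10-digit layout), and the heavily simplified APA author rule.

```python
import re
from abc import ABC, abstractmethod
from enum import Enum


class FormatType(Enum):
    TELEPHONE = "telephone"
    AUTHOR = "author"


class Formatter(ABC):
    """Common interface, playing the role of IFormat."""

    @abstractmethod
    def format(self, value: str) -> str:
        ...


class DashedPhoneFormatter(Formatter):
    """Assumed rule for 'CategoryX': 10 digits rendered as 555-123-4567."""

    def format(self, value: str) -> str:
        digits = re.sub(r"\D", "", value)
        if len(digits) != 10:
            return value  # leave anything unexpected untouched
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


class ApaAuthorFormatter(Formatter):
    """Simplified APA-like rule: 'Jane Q Doe' becomes 'Doe, J. Q.'."""

    def format(self, value: str) -> str:
        parts = value.split()
        if len(parts) < 2:
            return value.strip()
        *given, family = parts
        initials = " ".join(f"{name[0]}." for name in given)
        return f"{family}, {initials}"


# Each (type, category) pair maps to exactly one small, focused formatter.
_REGISTRY = {
    (FormatType.TELEPHONE, "CategoryX"): DashedPhoneFormatter,
    (FormatType.AUTHOR, "APA"): ApaAuthorFormatter,
}


def create_formatter(format_type: FormatType, category: str) -> Formatter:
    """Factory: look up and instantiate the formatter for this pair."""
    try:
        return _REGISTRY[(format_type, category)]()
    except KeyError:
        raise ValueError(
            f"no formatter registered for {format_type.name}/{category}"
        ) from None


phone = create_formatter(FormatType.TELEPHONE, "CategoryX").format("(555) 123-4567")
author = create_formatter(FormatType.AUTHOR, "APA").format("Jane Q Doe")
print(phone)    # 555-123-4567
print(author)   # Doe, J. Q.
```

Adding a new format then means writing one small class and one registry entry, without touching the existing formatters.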
XML and JSON are both capable of transmitting the same data, but which is better depends mostly on what you want to do with it.
This does touch on existing tooling, but you're unlikely to hand-roll a parser for either format, so tooling is relevant.
XML
JSON
Because each is capable of storing anything that can be stored in the other, usability and tooling are about the only criteria that are useful for comparing what it's like to use them.
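To illustrate the "same data, either format" point, here is a small round-trip sketch using Python's standard `json` and `xml.etree.ElementTree` modules. The sample order and the mapping between dictionary keys and XML attributes/elements are my own choices. JSON maps onto the dictionary directly, while XML needs an explicit decision about what becomes an attribute and what becomes an element, which is part of the usability difference.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record used for both formats.
order = {
    "id": "12345",
    "name": "John Doe",
    "items": [{"id": "443", "description": "Adult Diapers"}],
}


def to_json(data):
    return json.dumps(data)


def from_json(text):
    return json.loads(text)


def to_xml(data):
    """Serialize the order; ids become attributes, the rest element text."""
    root = ET.Element("order", {"id": data["id"]})
    ET.SubElement(root, "name").text = data["name"]
    for item in data["items"]:
        ET.SubElement(root, "item", {"id": item["id"]}).text = item["description"]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Reverse of to_xml, using the same attribute/element mapping."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "items": [
            {"id": item.get("id"), "description": item.text}
            for item in root.findall("item")
        ],
    }


# Both formats carry the record there and back without loss.
print(from_json(to_json(order)) == order)   # True
print(from_xml(to_xml(order)) == order)     # True
```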
If all that were wiped away, I'd probably use neither, and transmit the data as chunks of Lisp.
It's lower ceremony than JSON (barely), much easier to transform than XML, and easier to write a parser for than either (which I'd have to do if all the tooling were gone).
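To give a feel for the "easy to write a parser for" claim, here is a rough sketch of a reader for a tiny S-expression subset (nested lists, bare symbols, double-quoted strings). It is a deliberately simplified assumption of what "chunks of Lisp" would look like on the wire: no numbers, comments, or quoting forms, and symbols and strings both come back as plain Python strings.

```python
import re

# A token is a parenthesis, a double-quoted string, or a bare symbol.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def read(text):
    """Parse one S-expression into nested Python lists and strings."""
    tokens = TOKEN.findall(text)
    expr, pos = _read_at(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return expr


def _read_at(tokens, pos):
    """Parse the expression starting at tokens[pos]; return (value, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = _read_at(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("missing closing parenthesis")
        return items, pos + 1  # skip the ")"
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    if token.startswith('"'):
        # Drop the surrounding quotes and undo backslash escapes.
        return re.sub(r"\\(.)", r"\1", token[1:-1]), pos + 1
    return token, pos + 1


text = '(order (id "12345") (name "John Doe") (item (id "443") "Adult Diapers"))'
parsed = read(text)
print(parsed)
# ['order', ['id', '12345'], ['name', 'John Doe'], ['item', ['id', '443'], 'Adult Diapers']]
```

The whole reader is one regular expression and one short recursive function, which is roughly the point being made about parser effort.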