HTML vs XML vs XHTML – Key Differences and Relations

html, xhtml, xml

  1. I was wondering what "profile" means in Wikipedia:

    XML is a profile of an ISO standard SGML, and most of XML comes from
    SGML unchanged.

  2. According to
    http://xml-tips.assistprogramming.com/sgml-xml-html-xhtml-all-together.html:

    HTML is a subset of SGML.

    XML is a highly functional subset of SGML.

    XHTML extends and subsets HTML.

    Does "one being a subset of another" mean that code in the first is
    also syntactically correct and semantically the same as in the
    second?

    As in the sense of elementary set theory,

    • are HTML, XML and XHTML all different subsets of SGML?
    • do XML and HTML almost not intersect each other?
    • is XHTML a superset of both XML and HTML?
  3. Is there a more concise and clear summary of the differences in
    the purposes of the four, and/or of when to use which, than the
    link above? I am really confused about where the line falls between their intended purposes.
  4. According to
    http://xml-tips.assistprogramming.com/sgml-xml-html-xhtml-all-together.html:

    XML is not a single Markup Language. It is a metalanguage to let users design their own markup language.

    How should I understand that XML and HTML are both subsets of
    SGML, yet HTML is a markup language while XML is not a markup
    language but a metalanguage for designing markup languages?

    Are SGML and XHTML also metalanguages for designing markup
    languages?

  5. Both links mention that HTML is an application of SGML as well as a subset of SGML, and that XHTML is an application of XML. What is the difference between saying one language is an application
    of another and saying one language is a subset of another?

Best Answer

HTML and XML are both markup languages (hence the *ML). XML is a generic markup language suitable for representing arbitrary data, while HTML is a specific markup language suitable only for representing web pages.

HTML and XHTML are both subsets of SGML; the difference is that XHTML carries additional specifications so that it also validates as XML. Think of XML as XHTML's influential godfather.

Because all three of these languages are related to SGML, they share a lot of similarities, but they are considered different languages. Much of what defines each one is the set of restrictions it places on SGML.

  • HTML restricts SGML by defining a list of tags that are allowed to be used.
  • XML restricts SGML by disallowing unclosed or empty start and end tags and by requiring attribute values to be explicit. XML also has a large number of additional restrictions that are not found in SGML.
  • XHTML restricts SGML with the tags from HTML (with some exclusions, such as frameset), and with the tag and entity restrictions from XML.
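
To make the XML restrictions concrete, here is a rough sketch using Python's standard xml.etree.ElementTree parser. The helper name is_well_formed_xml and the two sample fragments are my own illustrations, not taken from any specification. The sketch only checks XML well-formedness; it does not validate against an HTML or XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: <br> is never closed and "disabled" has no explicit value.
html_style = '<p>First line<br>Second line <input disabled></p>'

# XHTML-style markup: every tag is closed and every attribute has a value.
xhtml_style = '<p>First line<br />Second line <input disabled="disabled" /></p>'

print(is_well_formed_xml(html_style))   # False
print(is_well_formed_xml(xhtml_style))  # True
```

A lenient HTML parser would typically accept both fragments; only the second satisfies the tag and attribute rules that XML imposes.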

You may find this document helpful, although the technical terms may be hard to digest. http://www.w3.org/TR/NOTE-sgml-xml-971215

XML is not a metalanguage for defining markup languages. Strictly speaking, that role belongs to SGML. XML is simply a data formatting markup language. Your quoted source is using technical terms imprecisely, which is why they are confusing.

Purposes

XML is for defining your own data format. If you wish to pass data between two systems, XML is often the way to do it.

If, for example, you needed to pass a sales order from your website to your billing system, you could create this XML payload:

<order id="12345">
    <name>John Doe</name>
    <item id="443">Adult Diapers</item>
</order>

Your website would then send that XML to your billing system, which could then parse the data from that XML.
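
As a minimal sketch of the receiving side, assuming the billing system is written in Python and uses the standard xml.etree.ElementTree module, parsing that payload might look like this. The parse_order function and the shape of the dictionary it returns are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The payload from the example above, as the billing system might receive it.
payload = """<order id="12345">
    <name>John Doe</name>
    <item id="443">Adult Diapers</item>
</order>"""


def parse_order(xml_text):
    """Turn the order XML into a plain dict (hypothetical shape)."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": root.get("id"),        # attribute on <order>
        "name": root.findtext("name"),     # text of the <name> child
        "items": [
            {"id": item.get("id"), "description": item.text}
            for item in root.findall("item")
        ],
    }


order = parse_order(payload)
print(order["order_id"], order["name"], order["items"])
```

Because the element and attribute names are ones you chose yourself, both systems only need to agree on that structure; the XML parser handles the rest.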

XHTML and HTML are obviously just for web pages. XHTML's primary purpose is to remove a lot of the ambiguity that we had in previous years (decades) of web development. Back in the late 90s when I started, we were using HTML 3.2, which allowed for seriously sloppy code. HTML 4+ and XHTML try to remedy that by either strongly suggesting or enforcing explicit closing tags and explicit attributes, and by disallowing certain tags. That makes things easier for both browsers and humans, and avoids unexpected cross-browser differences in behaviour.
