XML Language – Understanding Why XML is Called a Language

language-design languages markup xml

I've been wondering why XML has an L in its name.

By itself, XML doesn't "do" anything. It's just a data storage format, not a language! Languages "do" things.

The way you get XML to "do" stuff, to turn it into a language proper, is to add xmlns attributes to its root element. Only then does it tell its environment what it's about.
One example is XHTML. It's active, it has links, hypertext, styles etc, all triggered by the xmlns. Without that, an XHTML file is just a bunch of data in markup nodes.

So why then is XML called a language? It doesn't describe anything, it doesn't interpret, it just is.

Edit: Maybe my question should have been broader. Since the answer is currently "because XML was named after SGML, which was named after GML, etc" the question should have been, why are markup languages (like XML) called languages?

Oh, and WRT the close votes: no, I'm not asking about the X. I'm asking about the L!

Best Answer

The real answer is that XML has an L in its name because a man named Raymond Lorie was among the designers of the first "markup language" at IBM in the 1970s. The developers had to find a name for the language, so they chose GML because it was the initials of the three developers (Goldfarb, Mosher and Lorie). They then created the backronym Generalized Markup Language.

This was later standardized as SGML (Standard Generalized Markup Language), and when XML was created, the developers wanted to retain the ML suffix to indicate the family relationship to SGML. They added the X in front because they thought it looked cool. (Even though it doesn't actually make sense: XML is a metalanguage which allows you to define extensible languages, but XML is not really extensible itself.)

As for your second question, whether XML can legitimately be called a language:

Any structured textual (or even binary) format which can be processed computationally can be called a language. A language doesn't "do" anything as such, but some software might process input in the language and "do" something based on it.
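To make that concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The document and the two functions (total and listing) are made up for illustration. The XML text itself does nothing; two different pieces of software read the same input and each "does" something different with it.

```python
import xml.etree.ElementTree as ET

# A hypothetical document in a made-up vocabulary.
DOC = "<numbers><n>1</n><n>2</n><n>3</n></numbers>"

def total(text):
    """One program reads the document and adds the numbers up."""
    root = ET.fromstring(text)
    return sum(int(n.text) for n in root.iter("n"))

def listing(text):
    """Another program reads the same document and formats it."""
    root = ET.fromstring(text)
    return ", ".join(n.text for n in root.iter("n"))

print(total(DOC))
print(listing(DOC))
```

The behaviour lives entirely in the software, which is just as true for a source file in a programming language that has not been handed to an interpreter or compiler.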

You note that XML is a "storage format", which is true, but a textual storage format can also be called a language. These terms are not mutually exclusive.

Programming languages are a subset of languages. E.g. HTML and CSS are languages but not programming languages, while JavaScript is a real programming language. That said, there is no formal definition of programming language either, and there is a large grey zone of languages which could be called either data formats or programming languages depending on your point of view.

Given this, XML is clearly a language, just not a programming language, though it can be used to define programming languages like XSLT.

Your point about namespaces is irrelevant. Namespaces are an optional feature of XML and do not change the semantics of an XML vocabulary. They are only needed to disambiguate element names if the format may contain multiple vocabularies.
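As a small illustration of that disambiguation, here is a sketch using Python's xml.etree.ElementTree. The document mixes two vocabularies that both have a "table" element. The furniture namespace URI is invented for the example; the XHTML one is the real XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Two vocabularies in one document, both using the local name "table".
# "urn:example:furniture" is a made-up namespace for illustration.
DOC = """\
<order xmlns:h="http://www.w3.org/1999/xhtml"
       xmlns:f="urn:example:furniture">
  <h:table><h:tr><h:td>cell</h:td></h:tr></h:table>
  <f:table><f:legs>4</f:legs></f:table>
</order>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to "{namespace-uri}localname",
# so the two "table" elements end up with different names.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

The parser only uses the namespace to tell the two names apart. Whether anything treats the first one as an HTML table is up to the software consuming the tree.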


Edit: reinierpost pointed out that you might have meant something different with the question than what I understood. Maybe you meant that specific vocabularies like XHTML, RSS, XSLT etc. are languages because they associate elements and attributes with particular semantics, but the XML standard itself does not define any semantics for specific elements and attributes, so it does not feel like a "real language".

My answer to this would be that XML does define both syntax and semantics; it just defines them at a different level. For example, it defines the syntax of elements and attributes and rules about how to process them. XML is a "metalanguage", which is still a kind of language (just like metadata is still data!). As an example, EBNF is also clearly a language, but its purpose is to define the syntax of other languages, so it is also a metalanguage.
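One way to see XML's own rules at work is a well-formedness check. In this sketch (again Python's xml.etree.ElementTree; the helper name is_well_formed and the sample element names are made up), the parser accepts or rejects input purely on XML's syntax rules, without knowing anything about the vocabulary.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# An unknown vocabulary is fine, as long as XML's own rules are followed.
print(is_well_formed("<made-up><tag attr='1'/></made-up>"))

# Improperly nested elements break XML's rules, whatever the vocabulary.
print(is_well_formed("<a><b></a></b>"))
```

The first call returns True and the second False, so there is a language being recognized here even though no element has any vocabulary-specific meaning.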
