Domain-specific XML Vocabularies

Like RDF, XML is really a framework for one’s own information: it does not come with much in the way of predefined semantics, whether behavioural or extrinsic. Like HTML, XML has a simple tag-based syntax for representing elements, but unlike HTML, XML does not predefine any element names. There is also no single data model for XML, although a few data models predominate in practice, primarily DOM and XDM.
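As a small illustration of the point about element names and data models, the sketch below uses Python's standard-library DOM implementation (`xml.dom.minidom`) to read a document whose element and attribute names are entirely invented. The names `inventory`, `widget` and `colour` are assumptions made up for this example; nothing in XML itself gives them meaning.

```python
from xml.dom import minidom

# An XML document using invented names: XML predefines none of them.
dom = minidom.parseString(
    "<inventory><widget colour='blue'>Sprocket</widget></inventory>"
)

# The DOM data model exposes the document as a tree of nodes.
widget = dom.documentElement.getElementsByTagName("widget")[0]
name = widget.tagName                 # element name chosen by the author
colour = widget.getAttribute("colour")
label = widget.firstChild.data        # the text node inside <widget>
```

The parser accepts the document because it is well-formed; what a `widget` is remains up to the application or vocabulary.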

Since XML does not predefine element names, and also does not support anonymous (unnamed) elements, one needs a set of XML names for elements (and their attributes) to use. A set of element names and constraints on them can loosely be called a vocabulary. To make XML useful, there are three common paths people take, depending on their situation and the project context: to use a domain-specific XML vocabulary; to extend an existing XML vocabulary; to develop a custom XML vocabulary. Subsequent sections will describe each of these in turn, including some examples and some exceptions.
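The loose notion of a vocabulary as "element names plus constraints" can be sketched in a few lines of Python. The recipe vocabulary and the `vocabulary_errors` helper below are hypothetical, invented for illustration; in practice a vocabulary would normally be expressed in a schema language such as DTD, RELAX NG or W3C XML Schema and checked by a validating tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: each element name maps to the set of child
# element names it may contain.
VOCABULARY = {
    "recipe": {"title", "ingredient", "step"},
    "title": set(),
    "ingredient": set(),
    "step": set(),
}

def vocabulary_errors(xml_text, vocabulary=VOCABULARY):
    """Return messages for elements that fall outside the vocabulary."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag not in vocabulary:
        errors.append(f"unknown element <{root.tag}>")
    # Visit every element in document order and check its children.
    for parent in root.iter():
        allowed = vocabulary.get(parent.tag, set())
        for child in parent:
            if child.tag not in vocabulary:
                errors.append(f"unknown element <{child.tag}>")
            elif parent.tag in vocabulary and child.tag not in allowed:
                errors.append(
                    f"<{child.tag}> not allowed inside <{parent.tag}>"
                )
    return errors
```

This checks only names and parent-child containment; real schema languages also constrain order, cardinality, attributes and data types.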

Some frequently heard complaints about XML should be mentioned here. People say that XML files are large compared to, for example, JSON; this is often true, although XML compresses well. However, the names repeated in XML end tags and the quotes around attribute values also provide a level of redundancy: experience with SGML minimization showed that the ability to omit these increased support costs considerably, because it created common situations in which the document would parse and even validate, but the parser's interpretation differed from that of the human user.
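The size claim can be explored with a rough sketch such as the following, which serializes the same made-up records as XML and as JSON and then compresses both with zlib. The record layout is an assumption chosen for illustration, and the actual ratios depend heavily on the data, so this is an experiment to adapt rather than a benchmark.

```python
import json
import zlib

# Made-up sample data: 200 small records.
records = [{"id": i, "name": f"item{i}"} for i in range(200)]

# The same records as XML, with repeated start and end tags.
xml_text = "<items>" + "".join(
    f'<item id="{r["id"]}"><name>{r["name"]}</name></item>' for r in records
) + "</items>"
json_text = json.dumps(records)

xml_raw = len(xml_text.encode("utf-8"))
json_raw = len(json_text.encode("utf-8"))
xml_compressed = len(zlib.compress(xml_text.encode("utf-8")))
json_compressed = len(zlib.compress(json_text.encode("utf-8")))

print(f"XML:  {xml_raw} bytes raw, {xml_compressed} bytes compressed")
print(f"JSON: {json_raw} bytes raw, {json_compressed} bytes compressed")
```

For this data the uncompressed XML is larger than the JSON, and the repeated tag names are exactly the kind of redundancy a general-purpose compressor removes.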

Another common complaint from developers is that the APIs for working with XML are inelegant. Although this was true fifteen years ago, today there are often much more convenient APIs, ranging from XQuery and XPath to jQuery. In any case, the primary question of context for an XML project is whether the document maintainers or the developers are in control of the markup. The former case is a primary use case for XML. When developers are in control, JSON may be a better fit for short-form key-value or object-like documents. XML remains the format of choice for mixed content, where text and markup are interleaved at the same level.
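To make mixed content concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The element names `p`, `em` and `term` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# A paragraph with mixed content: text and child elements are siblings.
doc = ET.fromstring(
    "<p>XML is the <em>format of choice</em> for "
    "<term>mixed content</term>.</p>"
)

# A path expression selects the emphasised phrases.
emphasised = [e.text for e in doc.findall(".//em")]

# ElementTree stores text before the first child in .text and text
# following a child element in that child's .tail.
leading_text = doc.text
after_em = doc.find("em").tail

# itertext() walks text and element content in document order.
full_text = "".join(doc.itertext())
```

Representing the same paragraph in JSON would require an explicit array of text and object fragments, which is where markup tends to be the more natural fit.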