XHTML™: The Extensible HyperText Markup Language

XHTML is a widely used XML vocabulary that reworks HTML into XML. There are two primary versions: 1.x and 5. Version 5 is the current version, and is an XML syntax for HTML 5. The original XHTML 1 versions supported customization using XML DTDs, the original mechanism for defining an XML vocabulary. XHTML 5 does not use grammar-based validation, and is primarily intended for use in Web browsers. XHTML is also used by the EPUB standard for electronic books, where it has the advantage that a device-specific rendering engine can be used without having to worry about full HTML compatibility. This matters because Web browsers tend to be very forgiving when displaying files that contain errors, so existing HTML content often contains many syntax errors; any software that processes arbitrary HTML therefore needs a relatively complex parser and a lot of compatibility code. With XHTML, syntax checking is strict and is enforced by the XML parser.
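As a rough illustration of that strictness, the following sketch feeds two small documents to the XML parser in Python's standard library. The sample documents and the helper name is_well_formed are invented for this example; the second document uses unclosed p and br elements of the kind Web browsers usually tolerate in HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a small, well-formed XHTML document.
well_formed = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body>"
    "</html>"
)

# Hypothetical sample: HTML-style markup with unclosed <p> and <br>.
# A browser's HTML parser would typically recover from this; an XML parser must not.
tag_soup = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>First paragraph.<br><p>Second paragraph.</body>"
    "</html>"
)


def is_well_formed(text: str) -> bool:
    """Return True if the XML parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(well_formed))
print(is_well_formed(tag_soup))
```

This checks well-formedness only. It does not check that the elements used are actually XHTML elements, which would need a separate validation step.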

Since XHTML files can be read by XML parsers, they are amenable to processing with XSLT and XProc, and to being stored in databases and queried with XQuery. Modern versions of XSLT and XQuery can also create XHTML 5 files.
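The same idea can be sketched with any generic XML toolkit. The example below uses Python's standard-library ElementTree and its limited XPath subset as a stand-in for XSLT or XQuery, which are not part of the standard library. The sample document, the h prefix, and the helper name extract_links are assumptions made for illustration. Note that XHTML elements live in the XHTML namespace, so queries must be namespace-aware.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used in the path expressions below.
# The prefix "h" is an arbitrary choice for this example.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample document.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Reading list</title></head>
  <body>
    <h1>Reading list</h1>
    <ul>
      <li><a href="https://example.org/a">First item</a></li>
      <li><a href="https://example.org/b">Second item</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs for every a element in the document."""
    root = ET.fromstring(xhtml_text)
    return [
        (a.get("href"), "".join(a.itertext()))
        for a in root.iterfind(".//h:a", NS)
    ]


root = ET.fromstring(doc)
title = root.findtext("h:head/h:title", namespaces=NS)
links = extract_links(doc)

print(title)
for href, text in links:
    print(href, text)
```

No HTML-specific parser or error recovery is involved here: the document is handled exactly like any other XML file.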

XHTML, like HTML, is designed in the context of Web browsers. Like all software, Web browsers evolve, and over time elements change meaning or are dropped entirely. Despite this, HTML 5 dropped the idea of including a version number in HTML files, which may cause problems with long-term archiving. Of course, any vocabulary can change over time, but the usual mitigation is to combine grammar-based validation with explicit version marking; HTML 5 and XHTML 5 do neither.

XHTML is good for projects at the edge of XML and the Web, because XHTML files are equally usable by both tool sets. The element names are also understood by Web search engines, so serving XHTML files directly on the open Web makes sense and works. XHTML 5 is a good choice for generic documents such as blogs, as well as for Web-based applications. It has deficiencies in validation compared to domain-specific vocabularies, but it would also be fair to consider XHTML a domain-specific XML vocabulary for sharing documents on the World Wide Web. The name (X)HTML is sometimes used to refer to the vocabulary independently of whether the XML syntax or the slightly different HTML syntax is used.

If you are producing your own domain-specific vocabulary, it is worth considering using (X)HTML 5 element names for plain-text paragraphs and for markup within them, simply because (X)HTML is, overall, the most widely used vocabulary on the planet. However, beware of false promises: if you use p for paragraphs, people may expect to be able to use ol for a list, i for italics, and so on.