HTML: The HyperText Markup Language (HTML).



The first standardized version of HTML was described by a combination of prose and an SGML DTD. In this sense, and one other (see XHTML later in this paper), HTML shares some ancestry with XML. However, people working with XML and with HTML often have very different ways of looking at markup. The difference can be illustrated by the meaning of the word “semantics” in the respective communities. In the HTML world, it’s common to hear people describe the meaning of an element in terms of what a Web browser does when it encounters the start and end tags. Although most browser developers no longer refer to start and end tags as separate commands, the idea remains that operational semantics, that is, Web browser behaviour, is the primary focus of HTML design. Marking up in the sense of identifying the content of a document seems alien here: there’s no poem element, for example, and the cite element has no standard way to identify the author of a quotation or to give a bibliographic reference.

HTML offers some extensibility: one can use markup such as:

<span class="volno">3</span>

to indicate the volume number of a journal within a bibliographic reference. As with “plain XML”, there is no standard way to mark up a bibliography, and one is actually more likely to find simply

<b>3</b>

in practice, or, worse,

<span class="cn506dw">3</span>

inserted by a content management system, framework, or word processor conversion.
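The practical difference between these three forms shows up as soon as a program tries to reuse the content. The following is a minimal sketch, assuming Python's standard html.parser module; the names ClassTextCollector and volume_numbers are hypothetical and chosen only for this illustration. It recovers volume numbers from spans labelled with an agreed class name, and finds nothing when the same number is marked up with b or with a machine-generated class.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every span that carries a given class name."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # greater than 0 while inside a matching span
        self.found = []     # one string per matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # A span nested inside a matching span: track it so that
            # its end tag does not close the outer one too early.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.depth = 1
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def volume_numbers(html, class_name="volno"):
    """Return the text of each span whose class list includes class_name.

    The class name "volno" is only a local convention, not a standard.
    """
    collector = ClassTextCollector(class_name)
    collector.feed(html)
    collector.close()
    return collector.found
```

Called on a reference containing the labelled span, volume_numbers returns a list holding the text of that span; called on the b form or on the cn506dw form, it returns an empty list, because nothing in that markup says what the number means.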

HTML 5 includes elements such as nav and main, but their goal is to help people and programs navigate round documents by identifying the functions of different parts, not to try and label the parts and have the browser automatically function appropriately.

HTML, then, is strongest when the Web is the primary end output format. It should be mentioned that there are also products that support generating PDF for print from HTML and CSS, and that the EPUB 3 standard uses XHTML (with a move to HTML possible in the future).

One difficulty with HTML is that the content can be hard to reuse or repurpose. If you do the work to annotate your HTML so that audiobooks can be made, or to use class attributes consistently enough that your aircraft repair manual can be generated automatically from your twenty-thousand-page operations manual, you will be replicating work that might already have been done for you with an SGML or XML system: HTML tools are not generally designed for large, long, complex documents with precise domain-specific requirements. The primary application domain and usage context of HTML is the Web browser.

A benefit of using HTML, however, can be reduced staff training. If your writers are already familiar with HTML, they can be up and running quickly. Take care, however, that saving a few hours of training does not end up costing you weeks or months of work when you discover the files are not consistently marked up or contain errors that weren’t flagged by (error-tolerant) HTML systems or Web browsers.
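To illustrate how an error can pass unflagged, here is a small sketch, again assuming only the Python standard library; the function names html_parser_complains and xml_parser_complains are hypothetical. It feeds the same fragment, in which a span is never closed, to a lenient HTML parser and to a strict XML parser. Other HTML tools may behave differently, so treat this as an example of the general pattern, not a statement about any particular product.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


def html_parser_complains(markup):
    """Return True if the lenient HTML parser raises on this markup."""
    parser = HTMLParser()
    try:
        parser.feed(markup)
        parser.close()
    except Exception:
        return True
    return False


def xml_parser_complains(markup):
    """Return True if the strict XML parser rejects this markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return True
    return False


# The span is opened but never closed before the paragraph ends.
unclosed = '<p>Vol. <span class="volno">3</p>'

print("HTML parser complains:", html_parser_complains(unclosed))
print("XML parser complains: ", xml_parser_complains(unclosed))
```

The HTML parser accepts the fragment silently, while the XML parser stops with a well-formedness error. A check of this kind, run before files are accepted into a repository, is one way to find such problems early.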