Natural/Intuitive Tagging (Influences of Shared Vocabularies on Thinking)

Sorry, my young friend, there is nothing more natural about HTML tags than about any other tags! Intuitive markup, like an intuitive user interface, is a lovely but very fragile environment-based fiction.

Familiarity with one markup vocabulary often influences how people see others. Over the last 25 years or so, HTML has become such a well-known vocabulary that many users no longer think of it as a set of tags that someone made up and that has grown over time. Instead, many users think of the HTML tags as somehow natural. So they think it is natural to use <p> for paragraphs and <li> for list items. Even odder, they think it is natural to have different wrapper tags for numbered and unnumbered lists (<ol> and <ul>) but to distinguish between numbered and lettered lists with an attribute (type="a" on <ol>). I'm not saying that this doesn't work; clearly it does. But there is nothing more natural about it than, for example, having a single list wrapper element and distinguishing between unordered, numbered, and lettered display with an attribute. Yet users have objected to unfamiliar tags for familiar structures: "We already know HTML, so we should use HTML tags whenever possible."
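To make the contrast concrete, here is a sketch of the same three lists in both styles. The HTML half uses real HTML elements and the real type attribute. The <list>, <item>, and type="bullet|number|alpha" names in the second half are hypothetical, made up for illustration rather than taken from any particular vocabulary.

```xml
<!-- HTML: two different wrappers, plus an attribute for lettering -->
<ul><li>apples</li><li>pears</li></ul>
<ol><li>apples</li><li>pears</li></ol>
<ol type="a"><li>apples</li><li>pears</li></ol>

<!-- Hypothetical alternative: one wrapper, with the display chosen by an attribute -->
<list type="bullet"><item>apples</item><item>pears</item></list>
<list type="number"><item>apples</item><item>pears</item></list>
<list type="alpha"><item>apples</item><item>pears</item></list>
```

Both designs capture the same information. Neither is more "natural"; one is simply more familiar.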

More insidiously, we know the logical model that underlies HTML so well that we think of it as the natural model of text. HTML does not allow lists inside paragraphs. The reasoning then runs: even when a list occurs in the middle of a sentence that continues after the list ends, so that the list is obviously inside the paragraph, we should encode it as two paragraphs with a list between them, because that is how HTML would have to code it. I have been astonished by users who were shocked at the suggestion that their content might allow lists inside paragraphs rather than only between them!
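A small sketch of the two encodings of one sentence may help. The first uses a hypothetical vocabulary (the <list> and <item> names are illustrative only) that permits a list inside <p>. The second is the HTML-shaped encoding, which splits one sentence across two paragraphs.

```xml
<!-- Hypothetical vocabulary: the list stays inside the paragraph it belongs to -->
<p>The kit contains
  <list type="bullet">
    <item>a hammer,</item>
    <item>a saw, and</item>
    <item>a box of nails,</item>
  </list>
  all packed in a canvas bag.</p>

<!-- HTML-shaped encoding: one sentence becomes two "paragraphs" around a list -->
<p>The kit contains</p>
<ul>
  <li>a hammer,</li>
  <li>a saw, and</li>
  <li>a box of nails,</li>
</ul>
<p>all packed in a canvas bag.</p>
```

In the second encoding, the fact that the list is part of one sentence is no longer recorded in the markup at all.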

Of course, we get push-back in the other direction, too. A recent comment on JATS suggested that the rather old-fashioned HTML-style tags such as <bold> and <italic> (modeled on HTML's <b> and <i>) be replaced by tags such as <emph> or <hi>, which at least try to encode the ‘meaning’ of the tagged content. In my opinion, this is simply a stylistic suggestion: <i>, <italic>, and <emph type="italic"> carry exactly the same amount of semantic information! The suggestion was to use a more familiar or more comfortable tag set, not to increase the semantic richness of the vocabulary.