Introduction

Starting with P3 in 1994 (i.e., over two years before CSS1 was released), the Text Encoding Initiative Guidelines for Text Encoding and Interchange supported a mechanism to indicate a default rendition: a way of saying, for example, that all emph elements were in bold italics in the original. A default rendition was associated with an element type by giving that element type as the value of the gi attribute of the tagUsage element. The value of this attribute could be validated in broad strokes by giving it a datatype of teidata.xmlName (which boils down to xsd:NCName). Furthermore, some simple Schematron could check that the value names an element type that actually occurs in the document:

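<!-- Needs an XPath 2.0 query binding (e.g., queryBinding="xslt2"):
     distinct-values() and the trailing local-name() step are not XPath 1.0. -->
<sch:let name="instanceTypes"
         value="distinct-values( //tei:TEI/tei:text//tei:*/local-name() )"/>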
  
<sch:pattern>
  <sch:rule context="tei:tagUsage[@gi]">
    <sch:assert test="@gi = $instanceTypes">
      @gi should contain the name of an element that is within the
      &lt;text> of the document.
    </sch:assert>
  </sch:rule>
</sch:pattern>

Starting in 2015-10 with [P5] [2.9.0], TEI introduced a new method for the same purpose (and then phased out the original method). In this new method, rather than simply naming the element type to which a default rendition applies, a user specifies the elements to which it applies using the CSS selection mechanism. This allows far greater flexibility and precision in expressing which instance elements a default rendition applies to, at little or no processing cost when CSS is used to render the TEI directly. For example, it is quite common in early modern printed books to have the signatures centered at the bottom of certain pages, and the catchwords at the bottom right of each page. Each of these phenomena is encoded in TEI using the fw element, but with a different value of its type attribute. Thus it would not be surprising to find the declarations in Example 1, “Sample renditions”, in a teiHeader.

Example 1. Sample renditions

<rendition selector="fw[type='sig']">text-align: center;</rendition>
<rendition selector="fw[type='catch']">text-align: right;</rendition>


However, this improvement comes at a significant cost to our ability to validate.[26] The TEI defines selector only as teidata.text (which boils down to the RELAX NG string datatype), so any string whatsoever is a valid value.

This struck me as insufficient, and when I found a simple syntactic error in a selector in one of our textbase files, I decided to try to improve on the situation. The TEI does not say from which version of CSS the selector syntax should be taken, so I chose level 3 ([sel3][27]).[28] The only formal constraint system available in the TEI schema language[29] above and beyond enumerated lists of values and XSD datatypes is the W3C regular expression language. Thus I set about writing a regular expression to validate CSS3 selectors.
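
To make the idea concrete, the following is a minimal sketch and not the regular expression described in this paper. It is written in Python, whose re dialect differs from the W3C regular expression language, and it covers only a toy subset of selector syntax: one element name followed by zero or more attribute selectors using the = operator. The names IDENT, VALUE, ATTR, SIMPLE_SELECTOR, and is_valid_simple_selector are hypothetical, introduced only for illustration.

```python
import re

# Assumption: a heavily simplified grammar, far narrower than CSS3 selectors.
# An identifier here is an ASCII letter or underscore followed by ASCII
# letters, digits, underscores, or hyphens; real CSS identifiers allow more.
IDENT = r"[A-Za-z_][A-Za-z0-9_\-]*"

# An attribute value is an identifier or a single- or double-quoted string.
VALUE = r"(" + IDENT + r"|'[^']*'|\"[^\"]*\")"

# An attribute selector: [name] or [name=value]. Other CSS3 operators
# (~=, |=, ^=, $=, *=) are deliberately left out of this sketch.
ATTR = r"\[" + IDENT + r"(=" + VALUE + r")?\]"

# One element name followed by zero or more attribute selectors.
SIMPLE_SELECTOR = re.compile(IDENT + r"(" + ATTR + r")*")


def is_valid_simple_selector(selector: str) -> bool:
    """Return True if the whole string fits the toy grammar above."""
    return SIMPLE_SELECTOR.fullmatch(selector) is not None


if __name__ == "__main__":
    for candidate in ["fw[type='sig']", "fw[type='catch']", "fw[type='sig'"]:
        print(candidate, is_valid_simple_selector(candidate))
```

A string datatype accepts all three candidates; the sketch rejects only the last, which is missing its closing bracket. Combinators, pseudo-classes, class and ID selectors, and selector groups would all be rejected as well, which is why a usable expression has to be considerably larger.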



[26] It is also costly to use this system when one wishes to convey the indicated renditions when converting the TEI to some other markup language, e.g. XHTML or ePUB. That is a topic for another paper, though.

[27] Since I did this work, a newer version, [sel3N], has been released. At first blush the new version seems to be substantially the same as the one I was using; section 4, “Selector syntax”, is word-for-word identical.

[28] I was at the time blissfully ignorant of [sel4], which is still in Working Draft.