Because what gets generated is a complete RELAX NG schema or XSLT program rather than just a string, I could quickly and easily test that the generated regular expression was at least a valid regular expression: I simply validated an XML document (any XML document) against the generated RELAX NG schema, or transformed an XML document (any XML document) with the generated XSLT stylesheet. In either case the processor (for me that generally means jing or Saxon) generates an error message if the string being used as a regular expression is not in fact parsable as a W3C regular expression. For example, for the string “This is )bad(”, jing generates the message “invalid parameter: invalid regular expression: character is not allowed in this context: This is >>>>)bad(”, and Saxon the message “Syntax error at char 8 in regular expression: Unmatched close paren”.
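As a rough illustration of the idea (not the generator actually used here), the following Python sketch wraps a candidate regular expression in a minimal RELAX NG schema, so that a validator has to compile the expression before it can do anything else. The function name `make_rng`, the element name `test`, and the output file name are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RNG_NS = "http://relaxng.org/ns/structure/1.0"
XSD_DT = "http://www.w3.org/2001/XMLSchema-datatypes"


def make_rng(regex: str) -> str:
    """Return a tiny RELAX NG schema whose only job is to carry `regex`.

    The expression is placed in a `pattern` parameter of the XSD `string`
    datatype, so a RELAX NG processor must parse it as a W3C regular
    expression when it loads the schema. The element name "test" is
    hypothetical; any name would do.
    """
    return (
        f'<element name="test" xmlns="{RNG_NS}"\n'
        f'         datatypeLibrary="{XSD_DT}">\n'
        f'  <data type="string">\n'
        # escape() protects &, < and > so the schema stays well-formed XML
        f'    <param name="pattern">{escape(regex)}</param>\n'
        f'  </data>\n'
        f'</element>\n'
    )


if __name__ == "__main__":
    schema = make_rng("This is )bad(")
    ET.fromstring(schema)  # sanity check: the schema itself is well-formed
    with open("generated.rng", "w", encoding="utf-8") as out:
        out.write(schema)
```

Running something like `java -jar jing.jar generated.rng any.xml` (where `any.xml` stands for whatever XML document is at hand) should then report a problem with the schema if the expression cannot be parsed, regardless of what the instance document contains.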
Testing that a valid regular expression works as desired is the next step. In a practical sense, this is quite easy: just take an XML file that has a test selector on a tei:rendition/@selector, and validate it against the generated RELAX NG schema or transform it with the generated XSLT stylesheet. But in a logical sense, this is quite difficult: which selectors get tested? Which strings that are not selectors get tested?
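The practical half of this step can be sketched as follows, again in Python and again only as an illustration. The helper `make_test_doc` is hypothetical, the candidate strings are arbitrary examples rather than a proposed test suite, and a bare tei:rendition root element is of course not a valid TEI document; the sketch assumes the generated schema or stylesheet only cares about the attribute.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def make_test_doc(selector: str) -> str:
    """Return a one-element XML document carrying `selector` on
    tei:rendition/@selector, ready to be validated or transformed."""
    ET.register_namespace("tei", TEI_NS)
    rendition = ET.Element(f"{{{TEI_NS}}}rendition", {"selector": selector})
    # ElementTree escapes quotes, ampersands, etc. in the attribute value
    return ET.tostring(rendition, encoding="unicode")


if __name__ == "__main__":
    # Arbitrary candidate strings, chosen purely for illustration.
    candidates = ["p.note > hi", 'a[title="x"]', "This is )bad("]
    for number, candidate in enumerate(candidates, start=1):
        with open(f"selector-test-{number}.xml", "w", encoding="utf-8") as out:
            out.write(make_test_doc(candidate))
```

Each resulting file can then be run through jing or Saxon with the generated schema or stylesheet. Deciding which strings belong in the list of candidates is the logically difficult part, and the sketch does nothing to answer it.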