Master's Thesis

Barbara Fichte explored the topic in depth in her well-written master's thesis "Strategies for User-Oriented Conformance Testing of XML Documents".[MAFICHTE] She worked out the requirements, evaluated different schema languages, and described how she implemented a first prototype.

She looked at the schema languages W3C XML Schema, Schematron, RELAX NG and NVDL, and concluded that Schematron was the best fit for the purpose. One major factor was the existing expertise in the team. Most importantly, however, Schematron makes it easy to integrate additional information into the validation.

The other schema technology she favoured was W3C XML Schema. It is widely implemented and works well for grammar-based constraints, but it came with a number of limitations. The most important one was that some types of rules required XML Schema 1.1, and at the time the thesis was written, free or open-source schema parsers offered only limited support for XSD 1.1.

Because Schematron was such a good fit, Barbara Fichte proposed and implemented an approach in which all constraints could be expressed in Schematron. She used Schematron’s abstract patterns to implement grammar constraints that would usually be implemented with W3C XML Schema. One example of such a constraint is the order in which elements may appear under a parent element.
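
The following Schematron fragment is a minimal sketch of how an element-order constraint could be written as an abstract pattern. It is not taken from the thesis or from subcheck: the pattern id `element-order`, the parameter names `parent`, `first` and `second`, and the choice of `tt:head` before `tt:body` as the instantiated rule are assumptions made purely for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tt" uri="http://www.w3.org/ns/ttml"/>

  <!-- Hypothetical abstract pattern: under $parent, no $first element
       may appear after a $second element. -->
  <sch:pattern abstract="true" id="element-order">
    <sch:rule context="$parent/$second">
      <sch:assert test="not(following-sibling::$first)">
        Element order violated: <sch:name/> is followed by an element that must precede it.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Hypothetical instantiation: tt:head must come before tt:body in tt:tt. -->
  <sch:pattern is-a="element-order" id="head-before-body">
    <sch:param name="parent" value="tt:tt"/>
    <sch:param name="first" value="tt:head"/>
    <sch:param name="second" value="tt:body"/>
  </sch:pattern>
</sch:schema>
```

Because the parameters of an abstract pattern are substituted textually into the `context` and `test` expressions, the same pattern can be reused for other parent and child combinations by adding further `is-a` instances.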

In the later implementation of subcheck, the main schema technology used was indeed Schematron. However, we did not completely remove W3C XML Schema from the validation process. The cost of re-implementing existing grammar constraints in Schematron was too high compared to the benefit. In addition, standards bodies had already published XML Schemas for some TTML profiles. Using these schemas enabled subcheck validation to align better with the validation approaches of these organizations.