TEI to Emblem Schema

The transformation from TEI to Emblem Schema failed because a DOCTYPE declaration identified the structural metadata document as a TEI document, but the document itself did not declare the TEI namespace. Instead, the DTD declared an xmlns attribute with the default value http://www.tei-c.org/ns/1.0. For the namespace binding to be present, the XML processor therefore had to process the DTD when loading the document. The transformation failed simply because XProc used the DTD to supply the xmlns attribute with its default value, whereas the original transformation was initiated from PHP, which disables DTD processing by default. The transformation script thus expected the elements of the structural metadata document to be in the null namespace, while our pipeline placed them in the TEI namespace.
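The mismatch can be reproduced in miniature with Python's standard xml.etree.ElementTree. This is a sketch under stated assumptions: the two document strings are hypothetical and heavily reduced, and the explicit xmlns in WITH_BINDING stands in for the binding XProc obtained from the external DTD, which is not reproduced here. find_header and relocate_to_tei are hypothetical helpers; relocate_to_tei is a Python stand-in for the kind of XSL transformation described in the next paragraph, not that transformation itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily reduced structural metadata document, parsed once
# without a namespace binding (the PHP view) and once with it (the XProc view).
WITHOUT_BINDING = "<TEI><teiHeader/><text/></TEI>"
WITH_BINDING = f'<TEI xmlns="{TEI_NS}"><teiHeader/><text/></TEI>'


def find_header(root: ET.Element, namespace_aware: bool):
    """Return teiHeader the way a null-namespace or a namespace-aware script would find it."""
    if namespace_aware:
        return root.find("tei:teiHeader", {"tei": TEI_NS})
    return root.find("teiHeader")


def relocate_to_tei(root: ET.Element) -> ET.Element:
    """Hypothetical helper: move every null-namespace element into the TEI namespace."""
    for elem in root.iter():
        # ElementTree spells namespaced tags "{uri}local"; rewrite only plain
        # string tags, leaving comment/processing-instruction nodes alone.
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = f"{{{TEI_NS}}}{elem.tag}"
    return root


null_root = ET.fromstring(WITHOUT_BINDING)
tei_root = ET.fromstring(WITH_BINDING)

# A script written for the null namespace finds nothing in the TEI-namespaced tree.
print(find_header(null_root, namespace_aware=False) is not None)  # True
print(find_header(tei_root, namespace_aware=False) is not None)   # False

# After relocation, a namespace-aware lookup succeeds on the formerly null-namespace tree.
print(find_header(relocate_to_tei(null_root), namespace_aware=True) is not None)  # True
```

Relocating once and then querying with namespace-aware paths keeps every later transformation independent of whether a given parser happened to process the DTD.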

To allow consistent namespace-aware processing, we moved the elements into the TEI namespace with an appropriate XSL transformation and adapted the existing transformations accordingly. We also removed the DOCTYPE declaration and replaced the external entity references used to include facsimile.xml with XInclude. Because XInclude parses facsimile.xml as a document of its own, its elements no longer picked up the TEI namespace from the including document, so we had to add the missing namespace declaration to facsimile.xml. What we did not realize at the time (and still have to do) is that references to the tei:graphic elements must change because of XInclude's base URI fixup. Referencing attributes such as facs are typed as xsd:anyURI and must therefore point into the originating file rather than to document-local xml:ids.
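The effect of base URI fixup on facs pointers can be illustrated with Python's urllib.parse.urljoin, which resolves a URI reference against a base. This is a sketch only: the document URI, the xml:id img0001 and the helper resolve_facs are hypothetical, the structural document is assumed to carry no xml:base of its own, and facsimile.xml is assumed to sit in the same directory.

```python
from urllib.parse import urljoin

# Hypothetical URI of a structural metadata document; facsimile.xml is assumed
# to sit next to it and to be pulled in via XInclude.
STRUCTURE_URI = "http://example.org/objects/obj123/structure.xml"

# Base URI fixup typically marks the included top-level element with
# xml:base="facsimile.xml", so the tei:graphic elements inside it report the
# base URI of facsimile.xml, not that of the structural document.
GRAPHIC_BASE = urljoin(STRUCTURE_URI, "facsimile.xml")


def resolve_facs(referencing_base: str, facs: str) -> str:
    """Resolve a facs value (an xsd:anyURI) against the referencing element's base URI."""
    return urljoin(referencing_base, facs)


# A document-local pointer resolves against structure.xml, not against the
# base URI that the included tei:graphic element now reports.
local = resolve_facs(STRUCTURE_URI, "#img0001")
# A pointer into the originating file resolves to the included tei:graphic element.
qualified = resolve_facs(STRUCTURE_URI, "facsimile.xml#img0001")

print(local)      # http://example.org/objects/obj123/structure.xml#img0001
print(qualified)  # http://example.org/objects/obj123/facsimile.xml#img0001
print(qualified == f"{GRAPHIC_BASE}#img0001")  # True
```

Only the file-qualified form yields the same absolute URI as the target element's own base URI plus its xml:id.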

We also noticed that every structural metadata document set an xml:base attribute on its outermost element, and that the value of this attribute varied: some documents used the persistent URL of the digital object, others a relative URI reference to the object identifier, either with or without a trailing slash. It is unclear why xml:base was used in the first place and what caused the variations in its value. In effect, the attribute was used as if it held the digital object identifier, which in most cases happens to be the relative path to the object.
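The variations matter because a trailing slash changes how relative references are resolved. A minimal sketch with Python's urllib.parse.urljoin, assuming a hypothetical document URI, a hypothetical object identifier obj123 and a hypothetical persistent URL (whether the real persistent URLs ended in a slash is not known), resolves the same reference facsimile.xml under each kind of xml:base value:

```python
from urllib.parse import urljoin

# Hypothetical URI of a structural metadata document.
DOCUMENT_URI = "http://example.org/objects/structure.xml"


def resolve_with_xml_base(document_uri: str, xml_base: str, reference: str) -> str:
    """Resolve a relative reference inside an element that carries xml:base."""
    # A relative xml:base is itself resolved against the document URI first.
    effective_base = urljoin(document_uri, xml_base)
    return urljoin(effective_base, reference)


# The three kinds of xml:base values observed (all placeholders here).
for xml_base in ("http://example.org/obj123", "obj123/", "obj123"):
    print(f"{xml_base} -> {resolve_with_xml_base(DOCUMENT_URI, xml_base, 'facsimile.xml')}")
# http://example.org/obj123 -> http://example.org/facsimile.xml
# obj123/ -> http://example.org/objects/obj123/facsimile.xml
# obj123 -> http://example.org/objects/facsimile.xml
```

Without the trailing slash, the last path segment is replaced rather than extended, so the identifier silently drops out of the resolved URI.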

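Although xml:base caused no apparent errors, we removed it to avoid future problems: it changes the base URI against which every relative reference in the document is resolved, and its inconsistent values made the outcome of that resolution depend on which variant a document happened to use.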