The publication process as an XProc pipeline

Recreating the publication process as a sequence of XSL transformations interspersed with validation seemed straightforward. Unfortunately, we hit a first wall when we tried to reimplement the initial transformation in XSL. That transformation is done by a PHP script that uses a mixture of string concatenation and DOM manipulation to create the structural metadata document. By the looks of it, the conversion script was developed for an earlier project (the Digital Incunabula Collection) and later modified by adding one special case after another. We decided to skip the first transformation and start with the structural metadata document instead. This did not work either: the transformation from TEI to Emblem Schema ran but did not return anything. As a last resort we then turned to the Emblem Schema documents, and it was only when some of those failed to validate against the Emblem Schema that we decided to take a step back and start a deeper investigation.
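
To illustrate what "a sequence of XSL transformations interspersed with validation" means in XProc terms, the following is a minimal sketch and not the pipeline actually used. It assumes XProc 1.0, a TEI-based structural metadata document checked with RELAX NG, and an Emblem Schema available as W3C XML Schema. All file names (source-to-tei.xsl, tei.rng, tei-to-emblem.xsl, emblem.xsd) are hypothetical placeholders.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="publication">

  <!-- The source description enters here; the Emblem Schema document leaves. -->
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Transformation 1: source description to structural metadata.
       Stylesheet name is a hypothetical placeholder. -->
  <p:xslt name="to-structural-metadata">
    <p:input port="stylesheet">
      <p:document href="source-to-tei.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Validation 1: stop the pipeline if the intermediate document is invalid.
       Schema language and file name are assumptions. -->
  <p:validate-with-relax-ng name="check-structural-metadata" assert-valid="true">
    <p:input port="schema">
      <p:document href="tei.rng"/>
    </p:input>
  </p:validate-with-relax-ng>

  <!-- Transformation 2: structural metadata to Emblem Schema. -->
  <p:xslt name="to-emblem">
    <p:input port="stylesheet">
      <p:document href="tei-to-emblem.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Validation 2: the final document must validate against the Emblem Schema. -->
  <p:validate-with-xml-schema name="check-emblem" assert-valid="true">
    <p:input port="schema">
      <p:document href="emblem.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

</p:declare-step>
```

With assert-valid set to true, each validation step raises an error instead of passing an invalid document on to the next step, so a failure surfaces at the stage that caused it rather than as an empty result further down the line.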

The library has been engaged in the creation of digital content since the late nineties, and specifically in the creation of resources for digital emblem studies since the early 2000s. Not surprisingly, the biggest problem we faced overall was the lack of proper documentation. Although we were lucky to still have staff around who had been involved in the projects back then, we had to resort to a certain amount of guesswork regarding the reasons why documents looked the way they did. Human memory turned out to be unreliable and no substitute for proper documentation. Having an XML database at hand to run queries across the documents helped a lot. What struck us as odd was that, even though documents were associated with a DTD or a schema, that DTD or schema was never actually used to validate them. This meant that we had to base our guesses on existing transformation and processing scripts, the source items, and anecdotal evidence.

All in all we found the following flaws in our data and processes.