Data entry and initial transformation

Relying on a micro syntax to record emblem information is error-prone. For example, to state that the current page shows the pictura and motto of emblem E018850, and that the motto is the German text So muß es mir ergehn, Soll ich sonst fäste Stehn., an undergraduate has to enter the code E018850_P_M-de@So muß es mir ergehn. Soll ich sonst fäste Stehn..

The conversion script does not validate this micro syntax and silently drops information that does not fit the expected structure. We observed motti ending up in language tags and emblem parts missing altogether. We assume, though, that most of these errors are caught during data entry, when the emblem book's web representation is inspected manually.
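To illustrate what validating the micro syntax could look like, here is a minimal Python sketch that rejects malformed codes instead of dropping information silently. The grammar is an assumption inferred from the single example above (emblem identifier, one or more single-letter part codes, an optional language tag and text); the names CODE_RE, MicroSyntaxError and parse_code are hypothetical and not part of the actual conversion script.

```python
import re

# Assumed grammar, inferred from the one example in the text:
#   <emblem id>(_<part code>)+[-<language>@<text>]
# e.g. E018850_P_M-de@So muß es mir ergehn. ...
CODE_RE = re.compile(
    r"(?P<emblem>E\d{6})"          # emblem identifier such as E018850
    r"(?P<parts>(?:_[A-Z])+)"      # part codes such as _P (pictura), _M (motto)
    r"(?:-(?P<lang>[a-z]{2,3})"    # optional language tag such as de
    r"@(?P<text>.+))?"             # text following the @ sign
)


class MicroSyntaxError(ValueError):
    """Raised when a code does not fit the assumed grammar."""


def parse_code(code):
    """Parse one micro-syntax code or raise instead of guessing."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise MicroSyntaxError(f"code does not fit the expected structure: {code!r}")
    return {
        "emblem": match.group("emblem"),
        "parts": match.group("parts").lstrip("_").split("_"),
        "lang": match.group("lang"),
        "text": match.group("text"),
    }
```

With this approach, a code in which language tag and motto were swapped, such as E018850_P_M-So muß es mir ergehn@de, raises a MicroSyntaxError at data entry time rather than producing a motto inside a language tag.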

More problematic are "semantic errors" introduced into the document structure. In some cases the conversion script adds a superfluous section after a pictura. Our current theory is that the section is added to hold a paragraph, which in turn holds page breaks. This could be explained by a more or less informal rule for structural metadata documents stating that page break elements are only allowed inside "logical units"[13]. We are still uncertain whether this superfluous section was added deliberately or is an artefact of the conversion process. We believe the latter, because a deliberate solution could have used tei:ab, the anonymous block element that the TEI guidelines provide, which does not carry the semantics of a textual division.
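The pattern described by this theory can be searched for mechanically. The following Python sketch flags every tei:div whose only content is a single tei:p that in turn contains nothing but tei:pb elements. It assumes that sections are encoded as tei:div and paragraphs as tei:p in the TEI namespace; the function names are hypothetical, and the check does not look at whether the section actually follows a pictura.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def is_blank(text):
    """True for missing or whitespace-only text."""
    return text is None or not text.strip()


def holds_only_page_breaks(div):
    """True if div contains exactly one p, and that p contains only pb elements."""
    children = list(div)
    if len(children) != 1 or children[0].tag != TEI + "p" or not is_blank(div.text):
        return False
    paragraph = children[0]
    if not is_blank(paragraph.text) or not is_blank(paragraph.tail):
        return False
    breaks = list(paragraph)
    return bool(breaks) and all(
        el.tag == TEI + "pb" and is_blank(el.tail) for el in breaks
    )


def suspicious_sections(root):
    """Return all div elements that only wrap page breaks."""
    return [div for div in root.iter(TEI + "div") if holds_only_page_breaks(div)]


# Hypothetical sample: the second div matches the pattern, the first does not.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div n="a"><p>Text<pb n="1"/></p></div>
  <div n="b"><p><pb n="2"/><pb n="3"/></p></div>
</body>"""

if __name__ == "__main__":
    for div in suspicious_sections(ET.fromstring(SAMPLE)):
        print(div.get("n"))
```

A list produced this way only identifies candidates; whether a flagged section is really superfluous remains a judgement for a human reader.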

Given the complicated nature of this problem and the constraints of the project, we decided against trying to further analyze and fix the initial conversion. We hope that replacing the ancient structural metadata editor with the one from the Kitodo software suite will allow us to get rid of the micro syntax and to replace the PHP-based transformation with an XSLT-based one. What we could do is fix the most blatant errors (e.g. motti in language tags) and provide a list of questionable emblem structures to our expert scientist.
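As an illustration of how the most blatant errors might be located, the sketch below lists elements whose language tag does not look like a language code. It assumes that the language ends up in an xml:lang attribute, which the text does not state, and it uses a deliberately loose pattern rather than a full BCP 47 check; the names PLAUSIBLE_LANG and implausible_language_tags are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Loose plausibility check: a 2-3 letter primary tag plus optional subtags.
PLAUSIBLE_LANG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def implausible_language_tags(root):
    """Return (tag, value) pairs for xml:lang values that are not language codes."""
    findings = []
    for el in root.iter():
        value = el.get(XML_LANG)
        if value is not None and not PLAUSIBLE_LANG.fullmatch(value):
            findings.append((el.tag, value))
    return findings
```

Applied to a document in which a motto was stored as the language tag, the function returns that element together with the offending value, which can then be corrected by hand.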



[13] http://diglib.hab.de/rules/documentation/structuralMD.xml: "Note that a page break pb is always placed inside a logical unit, e.g. p, and marks the beginning of a page." (translated from the German)