Conclusions

Opening up XProc to the processing of non-XML documents is one of the primary goals of the ongoing development of XProc 3.0. In this paper we took a first look at the new document model and investigated its application in non-XML workflows. XProc still models workflows as documents flowing from the output ports of one step to the input ports of the next, but the concept of a document changes in the new version of the language. We found that the new understanding of a document as a pair of a representation and its document properties fits perfectly into the well-known world of XML, XPath and XDM.

Regarding the different instances of the new document model already defined in the language's specification, we argued that using the XDM concept for XML documents makes pipeline authoring much easier. The same holds for text documents, which smoothly open up XProc to the processing of character-based documents, supported by a (not yet complete) set of XProc steps and XPath functions. The introduction of the binary document type allows the processing of many document formats that typically occur in XML workflows: ZIP-based formats such as EPUB and the files produced by various office productivity suites on the one hand, and image files on the other.

Our discussion also raised some open questions that should be addressed in the further development of XProc 3.0. While XML-based and text-based serializations of RDF graphs are supported, there is currently no RDF graph document type. Whether such a type would improve the work of pipeline authors enough to justify its introduction needs further investigation. Likewise, the usefulness of the JSON document type remains an open question that should be settled in further discussions.