About <xml-project /> and XProc

XProc 3.0 is a pipeline language with an XML syntax. It is based on XProc (1.0), which became a W3C recommendation in 2010. Drawing on user experience with that version, a group of volunteers has worked together since 2017 as a W3C community group to improve and expand the original language. In September 2022, a community report on the core language specifications and the standard step library was published. While additional step libraries, e.g. for file processing, document validation, and paged media creation, are technically still under construction, we consider them to be very mature and in their final state. In fact, the project presented here relies heavily on steps from the additional libraries, and they have proved to be very useful and robust.

For those familiar with the original XProc, it is worth mentioning some of the changes made for XProc 3.0. The most visible change is the expansion of the basic document model from XML only to a model that better reflects current processing needs: “Native” documents in XProc 3.0 are now XML, HTML, and JSON, as well as text documents and binary documents (such as images or PDFs). The newly supported document types are accompanied by corresponding steps so that they can be used effectively in pipelines. Further highlights of XProc 3.0 are the move to XPath 3.1 as the underlying expression language and XDM typing for options and variables, along with a number of minor syntax tweaks that greatly improve the coding and debugging experience compared to the original XProc.

For those not familiar with XProc 1.0, or those who want to start over with XProc 3.0, there is now an improved base of learning material. First and foremost, there is Erik Siegel's excellent book [Siegel:2020]. Erik has also published a series of articles introducing XProc 3.0 on XML.com ([Siegel:2019], [Siegel:2020a], [Siegel:2020b]). For those who prefer videos, a series of six talks from Markup UK 2020 is available on the conference's YouTube channel. In addition to two talks on the basics of XProc 3.0, there are also talks on handling JSON documents, text documents and Zip archives.

Currently, two XProc 3.0 processors are known to be available. XML Calabash 3.0 is in its final phase of development as the successor to the well-known XML Calabash; both are developed by Norman Tovey-Walsh. This paper is based on MorganaXProc-III, which is the successor to the now retired MorganaXProc. Also developed by <xml-project />, it is a Java (or JVM) based implementation that, in addition to the core specification and the standard step libraries, implements the file step library and most of the validation library, and also supports the Extensible Validation Report Language (XVRL). It has been available as a public beta since February 2020, received many useful bug reports from users, and was released as version 1.0 in September 2022. Since then, it has received monthly updates with bug fixes and feature enhancements. MorganaXProc-IIIse is an open-source product released under GPL 3.0. Coming later this year is a second, commercial edition called MorganaXProc-IIIee (Extended Edition). It provides support for almost all optional features of XProc 3.0, with complete coverage of the proposed step libraries as well as processor-specific steps, for example for image processing.