Why is XProc Special?

XProc pipelines are essentially directed acyclic graphs in which the processing steps are the nodes and the input/output connections are the edges. However, when we at le-tex first tried to use generic browser-based graph editing frameworks for XProc in 2014, we discovered that some fundamental XProc properties are not well supported, namely: multiple docking ports per node, the distinction between input ports and options, the distinction between parameter and document inputs, encapsulation/sub-graphs, and default readable ports.
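To make these properties concrete, the following Python sketch models a step as a node with several named ports rather than a single docking point. It is an illustration only, not the data model of any editor discussed here: the class names `Port`, `Step` and `Connection`, the helper `primary_output`, and the step names `load` and `transform` are hypothetical. The port and option names of the example step are modelled loosely on the XSLT step of XProc 1.0.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Port:
    name: str
    kind: str = "document"  # "document" or "parameter"
    primary: bool = False


@dataclass
class Step:
    name: str
    inputs: List[Port] = field(default_factory=list)
    outputs: List[Port] = field(default_factory=list)
    options: Dict[str, str] = field(default_factory=dict)  # options are not ports
    subpipeline: List["Step"] = field(default_factory=list)  # encapsulated steps


@dataclass(frozen=True)
class Connection:
    # An edge docks at a named port of a step, not merely at the step itself.
    from_step: str
    from_port: str
    to_step: str
    to_port: str


def primary_output(step: Step) -> Optional[Port]:
    """Return the step's primary output port, or None if it declares none."""
    return next((p for p in step.outputs if p.primary), None)


# Hypothetical example step; port names follow XProc 1.0's XSLT step.
xslt = Step(
    name="transform",
    inputs=[
        Port("source", primary=True),
        Port("stylesheet"),
        Port("parameters", kind="parameter"),
    ],
    outputs=[Port("result", primary=True), Port("secondary")],
    options={"initial-mode": "main"},
)

# Hypothetical edge from a preceding "load" step into the XSLT step.
edge = Connection("load", "result", "transform", "source")
```

A generic graph framework that only knows nodes and edges has no place for the `kind`, `primary`, `options` and `subpipeline` fields, which is where the mismatch described above arises.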

Encapsulation, for example, is a powerful feature of XProc that allows a potentially complex pipeline to be exposed as an apparently monolithic building block with a well-defined interface and opaque innards. Some graph editing frameworks are able to fold a sub-graph so that it occupies less screen real estate. This is not genuine encapsulation, though, since it requires folded sub-graphs to be copied and pasted rather than re-used. It does not allow the “write once, use many times” approach that XProc’s language design supports so well.

This kind of encapsulation can be seen in other functional languages, too, and there are graphical editors for other programming languages. What sets XProc apart is the ability of a processing step to produce many different outputs that don’t need to be consumed at once (or at all). This is useful, for example, when an encapsulated multi-step conversion pipeline produces the conversion result on one port and intermediate results and validation reports for the input and output on other ports. But these multiple outputs, which need not be consumed simultaneously, deviate sufficiently from common programming paradigms to render visual editors for these languages unsuited for editing XProc.

Another peculiarity, XProc’s concept of “primary” and “default readable” ports, is meant to make pipeline authoring less verbose: the primary ports of steps that are adjacent in document order connect implicitly, without the need to establish explicit connections. On the two-dimensional canvas of a visual pipeline editor, however, there is no canonical document order. Turning the 2-D representation into a linear XProc XML document, a task that the visual XProc editor needs to perform, becomes an optimization problem in which a score is attached to each possible serialization, rewarding implicit connections via default readable ports. Alternatively, users of the graphical editor could be forced to make XML document order explicit, something that we wanted to avoid for usability reasons.
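The scoring idea can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the algorithm of any actual editor: the function names and the four-step toy pipeline are hypothetical, the score simply counts primary connections whose source step directly precedes the target step, and the restriction to orders in which a step never precedes a step it reads from is a readability choice of this sketch.

```python
from itertools import permutations


def implicit_connections(order, primary_edges):
    """Count primary connections whose source directly precedes the target."""
    position = {step: i for i, step in enumerate(order)}
    return sum(
        1 for src, dst in primary_edges if position[dst] == position[src] + 1
    )


def respects_dependencies(order, edges):
    """True if every source step appears before its target step."""
    position = {step: i for i, step in enumerate(order)}
    return all(position[src] < position[dst] for src, dst in edges)


def best_serialization(steps, primary_edges):
    """Return a dependency-respecting order with the most implicit connections."""
    candidates = [
        order
        for order in permutations(steps)
        if respects_dependencies(order, primary_edges)
    ]
    return max(candidates, key=lambda o: implicit_connections(o, primary_edges))


# Hypothetical toy pipeline: (source step, target step) on primary ports.
steps = ["load", "validate", "transform", "store"]
primary_edges = {
    ("load", "validate"),
    ("load", "transform"),
    ("transform", "store"),
}

best = best_serialization(steps, primary_edges)
print(best, implicit_connections(best, primary_edges))
```

In this toy pipeline at most two of the three connections can be left implicit, because `load` feeds two steps and only one of them can directly follow it. Enumerating all permutations grows factorially with the number of steps, so a real editor would need a heuristic or a smarter search.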

This means that although XProc seems to be a perfect candidate for visual or dataflow programming, its reliance on XML serialization poses an extra challenge: the natural graph has to be converted to an XML representation that actually helps people write their first pipelines.

One should acknowledge that a visual XProc programming environment will probably never replace actual coding. To a large extent this is due to the amount of XSLT that many pipelines orchestrate. At least this is what our experience as developers of the transpect framework [transpect] tells us. In most of our pipelines and libraries, the core tasks are performed by XSLT stylesheets. This is not a shortcoming of XProc but rather a feature. Apart from breaking up XSLT micropipelines, we don’t strive to replace what we do in XSLT with XProc steps. We do see a huge benefit in continuing to use XSLT’s template matching and import mechanisms while encapsulating multi-step pipelines (XSLT, zip/unzip, HTTP request, validation, etc.) in well-defined XProc step signatures with possibly multiple outputs.