The two key criteria for selecting a case study were (a) that JSON (rather than XML) should be a natural choice for the input and output data, and (b) that the data should be recursive, because it's with recursive data structures that the recursive-descent processing model becomes a necessary part of the solution.
The application that best fitted these requirements was the Java-to-C# transpiler, which I described at Markup UK in 2021 [Kay 2021]. It is written almost entirely in XSLT, with a small amount of control logic in Java and Gradle. It is a live application, used internally by Saxonica on a daily basis to transform our Java source code into C# source code, which is then used to build the SaxonCS product. The application runs in several phases:
We preprocess the Java code using a Java preprocessor to exclude parts of the code that are not needed in SaxonCS.
We run the open-source JavaParser product to generate an XML representation of the syntax tree of each (preprocessed) Java module in the Saxon product. This produces around 110 MB of XML across 2100 files.
We analyse these files to produce a digest: a single XML file, around 4 MB in size, listing the classes, interfaces, and methods across the product as a whole.
We refine the digest file, producing a modified version with augmented information. The main purpose of this step is to work out which Java methods are overridden, so that in the generated C# they can be annotated with virtual or override modifiers as appropriate. This cannot be determined by looking at each Java module in isolation.
We then transform each of the XML modules into a C# serialization, taking account of information in the digest file. This stage is a pure recursive-descent rule-based transformation, using around 350 template rules to handle each of the syntactic constructs identified by the Java parser.
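The override analysis in the refinement step needs a whole-program view, which is why it cannot be done module by module. The following minimal Python sketch (not the XSLT that the transpiler uses) illustrates the idea under simplifying assumptions: a hypothetical digest maps each class name to its superclass and the signatures of the methods it declares; interfaces, static, private and final methods, and Java's finer overriding rules are ignored; and classes outside the digest (such as java.lang.Object) are treated as declaring nothing. The names ClassInfo, ancestors and modifiers, and the example classes, are invented for illustration. A method that redefines an inherited method is marked override; a method that is not itself an override but is redefined further down the hierarchy is marked virtual.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical digest entry: a class, its superclass, and its declared methods."""
    name: str
    superclass: str | None
    methods: set[str] = field(default_factory=set)  # signatures such as "evaluate()"


def ancestors(digest: dict[str, ClassInfo], name: str) -> Iterator[str]:
    """Yield the superclasses of `name` that appear in the digest, nearest first."""
    parent = digest[name].superclass
    while parent is not None and parent in digest:
        yield parent
        parent = digest[parent].superclass


def modifiers(digest: dict[str, ClassInfo]) -> dict[tuple[str, str], str]:
    """Map (class name, signature) to "override", "virtual" or "" (no modifier)."""
    result: dict[tuple[str, str], str] = {}
    overridden: set[tuple[str, str]] = set()

    # Pass 1: a method that redefines one declared in an ancestor is an override;
    # the nearest ancestor that declares it is recorded as being overridden.
    for cls in digest.values():
        for sig in cls.methods:
            for anc in ancestors(digest, cls.name):
                if sig in digest[anc].methods:
                    result[(cls.name, sig)] = "override"
                    overridden.add((anc, sig))
                    break

    # Pass 2: every remaining method is virtual if something overrides it.
    # An override that is itself overridden keeps "override", because C#
    # override methods are implicitly virtual.
    for cls in digest.values():
        for sig in cls.methods:
            key = (cls.name, sig)
            result.setdefault(key, "virtual" if key in overridden else "")
    return result


# Usage with a made-up three-level class hierarchy:
digest = {
    "Expr": ClassInfo("Expr", None, {"evaluate()", "simplify()"}),
    "BinaryExpr": ClassInfo("BinaryExpr", "Expr", {"evaluate()", "operator()"}),
    "PlusExpr": ClassInfo("PlusExpr", "BinaryExpr", {"evaluate()"}),
}
for (cls_name, sig), mod in sorted(modifiers(digest).items()):
    print(f"{cls_name}.{sig}: {mod or '(none)'}")
# BinaryExpr.evaluate(): override
# BinaryExpr.operator(): (none)
# Expr.evaluate(): virtual
# Expr.simplify(): (none)
# PlusExpr.evaluate(): override
```

The two passes reflect the point made above: whether a method needs virtual depends on classes elsewhere in the product, so the decision can only be made once the whole hierarchy has been collected.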
As written, the application doesn't use JSON. But what if it did? It's convenient that the JavaParser product generates XML, but it didn't have to be that way: JSON would work just as well. There's no mixed content, which is the key feature that would give XML a natural advantage.
So the case study pretends that we're starting with a syntax tree in JSON rather than XML, and it also explores the use of JSON (rather than XML) for the digest file. Rather than modifying JavaParser to emit JSON, however, we start by converting its XML output to JSON, which also gives us a chance to test the new features in XSLT 4.0 for converting XML to JSON.
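Because the syntax tree is recursive, processing it as JSON still calls for recursive descent. The following minimal Python sketch (not XSLT, and not code from the transpiler) shows the shape of a rule-based transformation over a JSON syntax tree. It assumes each node is a JSON object whose "type" property names the construct; the node types and property names are hypothetical, named in the style of JavaParser's node classes, and the four rules, the RULES table and the rule and apply_rules functions are invented for illustration. Each rule handles one construct and hands its children back to the dispatcher, much as a template rule does with xsl:apply-templates.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Rule table: node type -> handler. A hypothetical stand-in for template rules.
RULES: dict[str, Callable[[dict[str, Any]], str]] = {}


def rule(node_type: str):
    """Decorator registering a handler for one node type (cf. xsl:template match)."""
    def register(handler: Callable[[dict[str, Any]], str]):
        RULES[node_type] = handler
        return handler
    return register


def apply_rules(node: dict[str, Any]) -> str:
    """Dispatch on the node's "type" property (cf. xsl:apply-templates)."""
    handler = RULES.get(node["type"])
    if handler is None:
        raise ValueError(f"no rule for node type {node['type']!r}")
    return handler(node)


@rule("NameExpr")
def name_expr(node):
    return node["name"]


@rule("IntegerLiteralExpr")
def integer_literal(node):
    return str(node["value"])


@rule("BinaryExpr")
def binary_expr(node):
    symbols = {"PLUS": "+", "MINUS": "-", "MULTIPLY": "*"}
    # Recursive descent: both operands are processed by the same rule set.
    left = apply_rules(node["left"])
    right = apply_rules(node["right"])
    return f"({left} {symbols[node['operator']]} {right})"


@rule("MethodCallExpr")
def method_call(node):
    args = ", ".join(apply_rules(arg) for arg in node.get("arguments", []))
    target = f"{apply_rules(node['scope'])}." if "scope" in node else ""
    return f"{target}{node['name']}({args})"


tree = json.loads("""
{"type": "BinaryExpr", "operator": "PLUS",
 "left": {"type": "NameExpr", "name": "count"},
 "right": {"type": "MethodCallExpr", "name": "size", "arguments": [],
           "scope": {"type": "NameExpr", "name": "list"}}}
""")
print(apply_rules(tree))  # (count + list.size())
```

Raising an error for an unmatched node type is a choice made for the sketch; an XSLT stylesheet would instead fall back to the built-in template rules of the current mode.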
Some might argue that using XSLT for converting Java to C# is not exactly a typical use case for XSLT. That's a fair criticism. However, I've seen XSLT used for many applications that you might not consider typical: for example, converting the output of a CAD tool into instructions controlling a 3-D printer. Wherever complex data needs to be structurally transformed, XSLT is a possible solution.
Note that it wasn't an aim of the case study to produce a complete working application. Rather, the aim was to establish whether such an application was likely to be feasible, what difficulties might be encountered, and how the proposed new language features might mitigate them.
The remaining sections of the paper describe what we learned from examining each part of the application.