Serializing the parse tree


In the real transpiler, the final stage of processing is to take each of the XML (now JSON) documents representing the parse tree of a module, and, with the aid of information in the digest file, to generate corresponding C# code. This combines two tasks: handling any differences between Java and C#, and then serializing the result (with sufficient indentation and spacing to make it legible, since we're going to need to debug it).

For the sake of the case study, I decided to skip the business logic of Java to C# conversion, and simply re-serialize the parse tree as Java code. This mirrored the development approach I had used for the transpiler, where I first wrote template rules to convert the parse tree back to Java, and then incrementally modified the XSLT to handle cases where the C# needed to be different.

I didn't attempt to rewrite all the template rules, but converted a sufficient subset that several of the larger Java modules could be successfully processed. I felt this would give us all the feedback we needed on whether the task was feasible.

A typical (but very simple) template rule in the transpiler might look like this:

<xsl:template match="*[@nodeType='ReturnStmt']">
    <xsl:call-template name="indent"/>
    <xsl:text>return </xsl:text>
    <xsl:apply-templates select="*"/>
    <xsl:text>;{$NL}</xsl:text>
</xsl:template>

This rule processes an element with @nodeType='ReturnStmt' and outputs the (Java or C#) text "return XXX;" with suitable indentation, followed by a newline. The XXX here is constructed by recursive application of template rules to the single operand of the return statement (if any): select="*" selects the operand, whatever it might be, and processes it using its own template rule.

The rule doesn't need much changing to handle JSON instead of XML. It becomes:

<xsl:template match=".[?_nodeType='ReturnStmt']">
    <xsl:call-template name="indent"/>
    <xsl:text>return </xsl:text>
    <xsl:apply-templates select="?expression"/>
    <xsl:text>;{$NL}</xsl:text>
</xsl:template>
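
To see the JSON version of the rule in action, here is a minimal, self-contained XSLT 3.0 sketch. Everything around the ReturnStmt rule is an assumption made for illustration and is not taken from the real transpiler: the sample JSON (modelled on the property names used above), the stub indent template that emits a fixed four spaces, the definition of $NL, and the rule for NullLiteralExpr.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    expand-text="yes">

    <xsl:output method="text"/>

    <!-- Assumed definition of the newline variable used by the rules -->
    <xsl:variable name="NL" as="xs:string" select="'&#10;'"/>

    <!-- Hypothetical sample input; expand-text is switched off here so that
         the curly braces of the JSON are not read as text value templates -->
    <xsl:variable name="sample" as="xs:string" expand-text="no">{"_nodeType":"ReturnStmt","expression":{"_nodeType":"NullLiteralExpr"}}</xsl:variable>

    <!-- Entry point: parse the JSON into maps and process the result -->
    <xsl:template name="xsl:initial-template">
        <xsl:apply-templates select="parse-json($sample)"/>
    </xsl:template>

    <!-- Stub: a real indent template would track the current nesting depth -->
    <xsl:template name="indent">
        <xsl:text>    </xsl:text>
    </xsl:template>

    <!-- The rule from the text, unchanged -->
    <xsl:template match=".[?_nodeType='ReturnStmt']">
        <xsl:call-template name="indent"/>
        <xsl:text>return </xsl:text>
        <xsl:apply-templates select="?expression"/>
        <xsl:text>;{$NL}</xsl:text>
    </xsl:template>

    <!-- Assumed rule for the operand in the sample -->
    <xsl:template match=".[?_nodeType='NullLiteralExpr']">
        <xsl:text>null</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run with the default initial template as the entry point (with Saxon, the -it option), this should emit the single indented line "return null;" followed by a newline.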

Some observations:

match="." matches anything. We could have written match="map(*)" to indicate that we're only interested in matching maps; or we could have written match="record(_nodeType, *)" to indicate that we're only interested in matching maps having a _nodeType property. I quite like the idea of combining that with the predicate to allow syntax like match="record(_nodeType='ReturnStmt', *)" but that's wishful thinking for now.
The xsl:text instruction produces a text node. The stylesheet as a whole is producing a text file (Java or C# source code), and the traditional way of doing that is to use the XSLT text output method, with a result tree consisting entirely of text nodes. There's a lot of inbuilt XML legacy there, but it works.
The variable {$NL} is used in preference to a literal newline because it doesn't disrupt the indentation of the code. This is purely a matter of personal style.
Using the latest features in the XSLT 4.0 spec, we could replace the last three lines in the template body with <xsl:text>return {apply-templates(?expression)};{$NL}</xsl:text>, which some people might prefer.
The original XSLT uses select="*" in the apply-templates instruction to select all children; the revised XSLT uses select="?expression" to select only the expression child. That is because the attributes and children of the element node in the XML have all become named properties in the JSON, and ?* would select them all. There's no convenient way with a lookup expression of saying something like select="?* except ?_nodeType" (the XPath except operator only works with nodes). We have an open issue on this.
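
Pending a resolution of that issue, one possible workaround for the last observation is to drop the unwanted entry with map:remove (a standard XPath 3.1 function) before applying the wildcard lookup. This is a sketch rather than a recommendation. It assumes the surrounding stylesheet sets expand-text="yes" and defines $NL and the indent template, and it relies on ?* delivering the remaining values in a useful order, which XPath 3.1 does not guarantee for maps.

```xml
<xsl:template match=".[. instance of map(*)][?_nodeType='ReturnStmt']"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
              xmlns:map="http://www.w3.org/2005/xpath-functions/map">
    <xsl:call-template name="indent"/>
    <xsl:text>return </xsl:text>
    <!-- map:remove returns a copy of the map without the _nodeType entry;
         ?* then selects the values of whatever entries remain -->
    <xsl:apply-templates select="map:remove(., '_nodeType')?*"/>
    <xsl:text>;{$NL}</xsl:text>
</xsl:template>
```

For a node with a single operand, such as ReturnStmt, the ordering caveat does not bite; for nodes with several child properties, naming each property explicitly (as in select="?expression") remains the safer choice.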

It turns out to be rather convenient that we can define the match patterns of template rules based on the properties of a map in the JSON, rather than on the associated key. If instead of "right":{"_nodeType":"NullLiteralExpr"} we had to cope with "NullLiteralExpr":{"_role":"right"} (a design that could equally well have been chosen), then the matching would become rather more complex, as we shall see.

While most of the template rules in this stylesheet match on the value of the nodeType attribute, this isn't true of all of them.

With the JSON tree, there's no obvious equivalent of match="/", which matches the root of the tree. There's a good reason for this: the XDM model for JSON doesn't include parent pointers, so a map or array that's at the top of the tree produced by parsing JSON doesn't actually know that it's at the top of the tree.
With this example, we know that the JSON is in the form of a singleton map with the key root: that is, the JSON starts with:
```
    { "root":{
        "_nodeType": "CompilationUnit",
        "packageDeclaration": {          
        
```
and we can take advantage of this by using match="record(root)" to match the outermost map.
In other cases where the original stylesheet matched on element name, it was usually possible to exploit redundancy in the data to match on properties instead. For example the element with name packageDeclaration always has the attribute nodeType="packageDeclaration", and the element with name imports contains a sequence of elements each having the attribute nodeType="importDeclaration".
In one or two cases a template rule that matched on an element name (to handle a particular part of an expression, such as the finally clause of a try/catch) could simply be inlined into the calling template.
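
Because record(root) depends on XSLT 4.0 syntax, it may help to see roughly the same rule for the outermost map written with XSLT 3.0 constructs only. The following is a sketch under assumptions: the stylesheet parameter input-uri is a hypothetical name for wherever the JSON for a module lives, and the tests on map:size and map:contains are only an approximation of what the record test checks.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map">

    <xsl:output method="text"/>

    <!-- Hypothetical parameter: location of the JSON parse tree of one module -->
    <xsl:param name="input-uri" as="xs:string" required="yes"/>

    <!-- Entry point: load the JSON as maps and arrays, then process it -->
    <xsl:template name="xsl:initial-template">
        <xsl:apply-templates select="json-doc($input-uri)"/>
    </xsl:template>

    <!-- Outermost map: a singleton map whose only key is "root" -->
    <xsl:template match=".[. instance of map(*)][map:size(.) eq 1][map:contains(., 'root')]">
        <xsl:apply-templates select="?root"/>
    </xsl:template>
</xsl:stylesheet>
```

Note that this pattern tests only the shape of the map. Since a map has no parent pointer, a nested singleton map that happened to have the key root would match as well, so the approach depends on knowing that no such map occurs deeper in the data.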

The conclusion from this exercise was that the conversion to handle JSON rather than XML input was straightforward, but that we had been lucky. Almost all the template rules matched on attribute values rather than element names, and the few exceptions were easy to work around; and none of them made use of features such as XML node identity, or access to parents, ancestors, or siblings, that would be difficult to replicate in the JSON world.

Also: I've glossed over the fact that in this phase, I was merely looking at the code that serializes the parse tree back to Java, and skipped the “business logic” that does the conversion from Java to C#. That code, from a fairly superficial examination, includes a few things that are rather harder to deal with:

The template rules access information from the digest file using an xsl:key definition. The key definition is essentially the same as that described in the subsequent section Refining the digest file, and creates the same challenges.
There are a number of functions and templates that use the parent or ancestor axis to examine the context of an expression. For example there is a function isInterfaceMember that distinguishes methods defined in a class from methods defined in an interface, which it does by searching the ancestor axis to see whether the containing type is a class or an interface. With a JSON model there are always two ways of tackling this: the needed context information (class or interface?) can be passed down the call tree as a tunnel parameter, or the mechanism for pinning the tree can be used to expose an equivalent to the ancestor axis. This is discussed further in a later section.
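
The first of those two options, the tunnel parameter, can be sketched in XSLT 3.0 as follows. This is an illustration only, not the transpiler's code: the property names isInterface, members and name, the parameter name inInterface, and the sample JSON are all hypothetical, and isInterface is assumed to be stored as a JSON boolean.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    expand-text="yes">

    <xsl:output method="text"/>

    <xsl:variable name="NL" as="xs:string" select="'&#10;'"/>

    <!-- Hypothetical input: one type declaration containing one method -->
    <xsl:variable name="sample" as="xs:string" expand-text="no">{"_nodeType":"ClassOrInterfaceDeclaration","isInterface":true,"members":[{"_nodeType":"MethodDeclaration","name":"size"}]}</xsl:variable>

    <xsl:template name="xsl:initial-template">
        <xsl:apply-templates select="parse-json($sample)"/>
    </xsl:template>

    <!-- The type declaration knows whether it is an interface, so it passes
         that fact down to everything processed beneath it -->
    <xsl:template match=".[?_nodeType='ClassOrInterfaceDeclaration']">
        <xsl:apply-templates select="?members?*">
            <xsl:with-param name="inInterface" tunnel="yes"
                            select="?isInterface = true()"/>
        </xsl:apply-templates>
    </xsl:template>

    <!-- The method rule picks the value up; being a tunnel parameter, it
         would also pass through any intermediate rules untouched -->
    <xsl:template match=".[?_nodeType='MethodDeclaration']">
        <xsl:param name="inInterface" as="xs:boolean" tunnel="yes" select="false()"/>
        <xsl:text>{?name}: {if ($inInterface) then 'interface member' else 'class member'}{$NL}</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The design trade-off is that every rule that establishes context has to remember to set the parameter, whereas an ancestor-style lookup asks the question only at the point where the answer is needed.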