In the real transpiler, the final stage of processing is to take each of the XML (now JSON) documents representing the parse tree of a module, and, with the aid of information in the digest file, to generate corresponding C# code. This combines two tasks: handling any differences between Java and C#, and then serializing the result (with sufficient indentation and spacing to make it legible, since we're going to need to debug it).
For the sake of the case study, I decided to skip the business logic of Java to C# conversion, and simply re-serialize the parse tree as Java code. This mirrored the development approach I had used for the transpiler, where I first wrote template rules to convert the parse tree back to Java, and then incrementally modified the XSLT to handle cases where the C# needed to be different.
I didn't attempt to rewrite all the template rules, but converted a sufficient subset that several of the larger Java modules could be successfully processed. I felt this would give us all the feedback we needed on whether the task was feasible.
A typical (but very simple) template rule in the transpiler might look like this:
<xsl:template match="*[@nodeType='ReturnStmt']">
    <xsl:call-template name="indent"/>
    <xsl:text>return </xsl:text>
    <xsl:apply-templates select="*"/>
    <xsl:text>;{$NL}</xsl:text>
</xsl:template>
This rule processes an element with @nodeType='ReturnStmt' and outputs the (Java or C#) text "return XXX;" with suitable indentation, followed by a newline. The XXX here is constructed by recursive application of template rules to the single operand of the return statement (if any): select="*" selects the operand, whatever it might be, and processes it using its own template rule.
The rule doesn't need much changing to handle JSON instead of XML. It becomes:
<xsl:template match=".[?_nodeType='ReturnStmt']">
    <xsl:call-template name="indent"/>
    <xsl:text>return </xsl:text>
    <xsl:apply-templates select="?expression"/>
    <xsl:text>;{$NL}</xsl:text>
</xsl:template>
Some observations:
match="." matches anything. We could have written match="map(*)" to indicate that we're only interested in matching maps; or we could have written match="record(_nodeType, *)" to indicate that we're only interested in matching maps having a "_nodeType" property. I quite like the idea of combining that with the predicate to allow syntax like match="record(_nodeType='ReturnStmt', *)" but that's wishful thinking for now.
The xsl:text instruction produces a text node. The stylesheet as a whole is producing a text file (Java or C# source code), and the traditional way of doing that is to use the XSLT text output method, with a result tree consisting entirely of text nodes. There's a lot of inbuilt XML legacy there, but it works.
The variable {$NL} is used in preference to a literal newline because it doesn't disrupt the indentation of the code. This is purely a matter of personal style.
Using the latest features in the XSLT 4.0 spec, we could replace the last three lines in the template body with <xsl:text>return {apply-templates(?expression)};{$NL}</xsl:text>, which some people might prefer.
The original XSLT uses select="*" in the apply-templates instruction to select all children; the revised XSLT uses select="?expression" to select only the expression child. That is because the attributes and children of the element node in the XML have all become named properties in the JSON, and ?* would select them all. There's no convenient way with a lookup expression of saying something like select="?* except ?_nodeType" (the XPath except operator only works with nodes). We have an open issue on this.
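To make the gap concrete, here is the operation being asked for, sketched in Python rather than XPath. It is an analogy only: the function name children_of and the sample node are invented for illustration and are not part of the transpiler.

```python
def children_of(node: dict) -> list:
    """Return the values of every property except the `_nodeType` marker.

    This is the effect that `?* except ?_nodeType` would have if `except`
    worked on map entries. Python dicts preserve insertion order, so the
    values come back in the order the properties were added.
    """
    return [value for key, value in node.items() if key != "_nodeType"]


# Hypothetical sample node, shaped like the ReturnStmt discussed above.
return_stmt = {
    "_nodeType": "ReturnStmt",
    "expression": {"_nodeType": "NullLiteralExpr"},
}

print(children_of(return_stmt))
```

The filter is trivial once entries can be tested by key; the difficulty in the lookup syntax is that ?* hands back only the values, with the keys already discarded.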
It turns out to be rather convenient that we can define the match patterns of template rules based on the properties of a map in the JSON, rather than on the associated key. If instead of "right":{"_nodeType":"NullLiteralExpr"} we had to cope with "NullLiteralExpr":{"_role":"right"} (a design that could equally well have been chosen), then the matching would become rather more complex, as we shall see.
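One way to see the difference between the two designs is a small Python analogue of rule dispatch. This is a sketch under stated assumptions, not the transpiler's code: the names RULES, serialize, return_stmt and find_by_role are hypothetical, and only the two JSON shapes are taken from the text.

```python
def return_stmt(node: dict) -> str:
    # The operand is optional: a bare `return;` has no expression property.
    operand = node.get("expression")
    if operand is None:
        return "return;"
    return "return " + serialize(operand) + ";"


# One entry per template rule, keyed on the value of `_nodeType`.
RULES = {
    "ReturnStmt": return_stmt,
    "NullLiteralExpr": lambda node: "null",
}


def serialize(node: dict) -> str:
    """Chosen design: the node names its own type, so dispatch is one lookup."""
    return RULES[node["_nodeType"]](node)


def find_by_role(parent: dict, role: str) -> tuple:
    """Alternative design: the type is the key, so finding the child that
    plays a given role means scanning the parent's entries."""
    for node_type, child in parent.items():
        if isinstance(child, dict) and child.get("_role") == role:
            return node_type, child
    raise KeyError(role)


chosen = {"right": {"_nodeType": "NullLiteralExpr"}}
alternative = {"NullLiteralExpr": {"_role": "right"}}

print(serialize(chosen["right"]))
node_type, child = find_by_role(alternative, "right")
print(RULES[node_type](child))
```

Both calls print null, but in the second layout the child map no longer describes itself: the type has to be recovered from the parent and carried alongside the node, which is the kind of extra work a match pattern would also have to do.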
While most of the template rules in this stylesheet match on the value of the nodeType attribute, this isn't true of all of them.
With the JSON tree, there's no obvious equivalent of match="/", which matches the root of the tree. There's a good reason for this: the XDM model for JSON doesn't include parent pointers, so a map or array that's at the top of the tree produced by parsing JSON doesn't actually know that it's at the top of the tree.
With this example, we know that the JSON is in the form of a singleton map with the key root: that is, the JSON starts with:
{ "root": {
    "_nodeType": "CompilationUnit",
    "packageDeclaration": {
and we can take advantage of this by using match="record(root)" to match the outermost map.
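The same reasoning can be sketched in Python, where a parsed JSON object likewise holds no reference to its container. This is an illustration only: is_outermost is a hypothetical helper, and it assumes that the pattern is meant to accept a map whose only key is root.

```python
import json


def is_outermost(value) -> bool:
    """Recognise the top-level map by its shape: exactly one key, `root`.

    There is no parent to inspect, so the only thing that can be tested
    is the content of the object itself.
    """
    return isinstance(value, dict) and list(value) == ["root"]


document = json.loads('{"root": {"_nodeType": "CompilationUnit"}}')

print(is_outermost(document))
print(is_outermost(document["root"]))
```

The first call prints True and the second False. The test is by shape, not by position, so it relies on no nested map happening to consist of a single root property.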
In other cases where the original stylesheet matched on element name, it was usually possible to exploit redundancy in the data to match on properties instead. For example the element with name packageDeclaration always has the attribute nodeType="packageDeclaration", and the element with name imports contains a sequence of elements each having the attribute nodeType="importDeclaration".
In one or two cases a template rule that matched on an element name (to handle a particular part of an expression, such as the finally clause of a try/catch) could simply be inlined into the calling template.
The conclusion from this exercise was that the conversion to handle JSON rather than XML input was straightforward, but that we had been lucky. The template rules mostly matched on attribute values rather than element names; and none of them made use of features such as XML node identity, or access to parents, ancestors, or siblings, that would be difficult to replicate in the JSON world.
Also: I've glossed over the fact that in this phase, I was merely looking at the code that serializes the parse tree back to Java, and skipped the “business logic” that does the conversion from Java to C#. That code, from a fairly superficial examination, includes a few things that are rather harder to deal with:
The template rules access information from the digest file using an xsl:key definition. The key definition is essentially the same as that described in the subsequent section Refining the digest file, and creates the same challenges.
There are a number of functions and templates that use the parent or ancestor axis to examine the context of an expression. For example there is a function isInterfaceMember that distinguishes methods defined in a class from methods defined in an interface, which it does by searching the ancestor axis to see whether the containing type is a class or an interface. With a JSON model there are always two ways of tackling this: the needed context information (class or interface?) can be passed down the call tree as a tunnel parameter, or the mechanism for pinning the tree can be used to expose an equivalent to the ancestor axis. This is discussed further in a later section.
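The first of these options, passing context downwards instead of looking upwards, can be sketched in Python. This is a rough analogue of a tunnel parameter, not the transpiler's logic: the node type names ClassOrInterfaceDeclaration and MethodDeclaration, the isInterface and name properties, and the function collect_methods are all assumptions made for the example.

```python
def collect_methods(node, in_interface=False, found=None):
    """Walk the tree, recording for each method whether it sits in an interface.

    `in_interface` plays the part of a tunnel parameter: it is set where
    the type declaration is processed and carried down to every descendant,
    so no node ever needs to look upward.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        node_type = node.get("_nodeType")
        if node_type == "ClassOrInterfaceDeclaration":
            # Rebind the context for everything below this declaration.
            in_interface = bool(node.get("isInterface"))
        elif node_type == "MethodDeclaration":
            found.append((node.get("name"), in_interface))
        for value in node.values():
            collect_methods(value, in_interface, found)
    elif isinstance(node, list):
        for item in node:
            collect_methods(item, in_interface, found)
    return found


# Hypothetical parse tree: one interface and one class, one method each.
tree = {
    "_nodeType": "CompilationUnit",
    "types": [
        {"_nodeType": "ClassOrInterfaceDeclaration", "isInterface": True,
         "members": [{"_nodeType": "MethodDeclaration", "name": "size"}]},
        {"_nodeType": "ClassOrInterfaceDeclaration", "isInterface": False,
         "members": [{"_nodeType": "MethodDeclaration", "name": "clear"}]},
    ],
}

print(collect_methods(tree))
```

Because the flag is an ordinary argument, each branch of the recursion sees the value set by its nearest enclosing type declaration, and sibling declarations cannot interfere with one another. The cost is that every piece of context the rules might need has to be anticipated and threaded through in advance.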