The final part of the transpiler to be examined in this case study is the stylesheet that refines the digest. This is concerned with adding attributes to the information about classes and methods: the most obvious example is to annotate C# methods as virtual if they are overridden in a subclass, or as override if they override a method in a superclass. Another task is to change the return type of a method if it overrides a superclass method with a wider return type: when we wrote the transpiler, C# did not allow covariant return types, and it still imposes restrictions that are more severe than those in Java (for example, for methods defined in interfaces).
By its nature, this stylesheet frequently needs to follow links from the use of a class to the definition of that class, and this is achieved using an XSLT key definition:
<xsl:key name="classKey" match="class | interface"
         use="ancestor::module/@package || '.' ||
              string-join(ancestor-or-self::*[self::class | self::interface]/@name, '.')"/>
This indexes every class or interface by a key that represents the full hierarchic name of the class or interface. For example, given this structure:
<module package="net.sf.saxon.tree.iter">
  <class name="EmptyIterator">
    <implements name="net.sf.saxon.om.SequenceIterator"/>
    <implements name="net.sf.saxon.tree.iter.ReversibleIterator"/>
    ...
    <field name="theInstance" type="net.sf.saxon.tree.iter.EmptyIterator" static="1"/>
    <method name="getInstance" returns="net.sf.saxon.tree.iter.EmptyIterator" static="1"/>
    <method name="nextAtomizedValue" returns="net.sf.saxon.om.AtomicSequence"/>
    ...
    <class name="OfNodes">
      <extends name="net.sf.saxon.tree.iter.EmptyIterator"/>
      <implements name="net.sf.saxon.tree.iter.AxisIterator"/>
      ...
it indexes the class OfNodes with the key net.sf.saxon.tree.iter.EmptyIterator.OfNodes.
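To make explicit what the key declaration computes, the following Python sketch applies the same rule to an abridged copy of the excerpt using the standard xml.etree.ElementTree module. ElementTree has no ancestor axis, so the sketch first builds a child-to-parent table. The function names are hypothetical, and the code only approximates the XSLT semantics (for example, it does not guard against an element having more than one module ancestor).

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged copy of the digest excerpt (closing tags added so it parses).
DIGEST_XML = """
<digest>
  <module package="net.sf.saxon.tree.iter">
    <class name="EmptyIterator">
      <implements name="net.sf.saxon.om.SequenceIterator"/>
      <method name="getInstance" returns="net.sf.saxon.tree.iter.EmptyIterator" static="1"/>
      <class name="OfNodes">
        <extends name="net.sf.saxon.tree.iter.EmptyIterator"/>
      </class>
    </class>
  </module>
</digest>
""".strip()

CLASS_TAGS = ("class", "interface")


def class_key(elem, parent):
    """Compute what the classKey 'use' expression yields for one element:
    the module's package, then the names of all enclosing classes/interfaces."""
    names, package = [], ""
    node = elem
    while node is not None:  # walk the ancestor-or-self axis upwards
        if node.tag in CLASS_TAGS:
            names.append(node.get("name"))
        elif node.tag == "module":
            package = node.get("package")
        node = parent.get(node)
    return package + "." + ".".join(reversed(names))


def build_class_index(root):
    """Map each key to a list of matching elements, since key() can return several."""
    # ElementTree has no ancestor axis, so precompute a child -> parent table.
    parent = {child: elem for elem in root.iter() for child in elem}
    index = {}
    for elem in root.iter():
        if elem.tag in CLASS_TAGS:
            index.setdefault(class_key(elem, parent), []).append(elem)
    return index


index = build_class_index(ET.fromstring(DIGEST_XML))
for key in sorted(index):
    print(key)
# net.sf.saxon.tree.iter.EmptyIterator
# net.sf.saxon.tree.iter.EmptyIterator.OfNodes
```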
XSLT keys work only with nodes, not with maps and arrays, and we have no intention of changing that. Instead, the preferred approach is to construct a map that can act as an index. Often it will be appropriate for this map to be held in a global variable. How should we construct it?
In simple cases, constructing a map is easy. For example, the XPath 4.0 equivalent of a key defined with match="employee" use="@ssn" is a map built using map:build(//employee, fn{@ssn}) [6]. Our case is more difficult because, in a tree of maps and arrays built from JSON, there is no ancestor axis to play with.
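For the simple employee case, the idea of a map acting as an index can be sketched in Python. The helper build_index is hypothetical (not part of any library): it applies a key function to each item. Employees are modelled as dicts rather than element nodes, and items that share a key are collected in a list; that is a choice made for this sketch, not a statement about how map:build treats duplicate keys.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def build_index(items: Iterable[T], key: Callable[[T], Hashable]) -> Dict[Hashable, List[T]]:
    """Group items by a computed key: a rough analogue of map:build(items, key).
    Items sharing a key are kept together, as key() returns all matching nodes."""
    index: Dict[Hashable, List[T]] = {}
    for item in items:
        index.setdefault(key(item), []).append(item)
    return index


# Hypothetical employee records standing in for <employee ssn="..."/> elements.
employees = [
    {"name": "Ann", "ssn": "123"},
    {"name": "Bob", "ssn": "456"},
]
by_ssn = build_index(employees, lambda e: e["ssn"])
print(by_ssn["456"])  # [{'name': 'Bob', 'ssn': '456'}]
```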
I experimented with several ways of constructing the index as a map. The first approach uses recursive descent template rules with tunnel parameters:
<xsl:mode name="build-index" on-no-match="deep-skip"/>
<xsl:output method="json" indent="yes"/>
<xsl:template match="record(package, *)" mode="build-index" priority="2">
  <xsl:message>Processing package {?package}</xsl:message>
  <xsl:next-match>
    <xsl:with-param name="full-name" select="?package" tunnel="yes"/>
  </xsl:next-match>
</xsl:template>
<xsl:template match="record(class, *)" mode="build-index">
  <xsl:param name="full-name" tunnel="yes"/>
  <xsl:variable name="full-class-name" select="`{$full-name}.{?class?*?name}`"/>
  <xsl:message>Processing class {$full-class-name}</xsl:message>
  <xsl:map-entry key="$full-class-name" select="{'class': ?class}"/>
  <xsl:apply-templates select="?class?*[. instance of (record(class, *) | record(interface, *))]"
                       mode="#current">
    <xsl:with-param name="full-name" select="$full-class-name" tunnel="yes"/>
  </xsl:apply-templates>
</xsl:template>
<xsl:template match="record(interface, *)" mode="build-index">
  <xsl:param name="full-name" tunnel="yes"/>
  <xsl:variable name="full-class-name" select="`{$full-name}.{?interface?*?name}`"/>
  <xsl:message>Processing interface {$full-class-name}</xsl:message>
  <xsl:map-entry key="$full-class-name" select="{'interface': ?interface}"/>
  <!-- the members of an interface are held under the 'interface' key, not 'class' -->
  <xsl:apply-templates select="?interface?*[. instance of (record(class, *) | record(interface, *))]"
                       mode="#current">
    <xsl:with-param name="full-name" select="$full-class-name" tunnel="yes"/>
  </xsl:apply-templates>
</xsl:template>
<xsl:template name="xsl:initial-template">
  <xsl:map>
    <xsl:apply-templates select="?digest?*" mode="build-index"/>
  </xsl:map>
</xsl:template>
The tunnel parameter full-name is used to build up the concatenated name as we descend the hierarchy; when we reach each class or interface, we can create a map entry using this name, so there is no need to access ancestor information. This works, but it's a lot of work to replicate a fairly simple xsl:key declaration. The xsl:message instructions are there as a reminder of how difficult I found it to get this right. The need for separate paths to handle classes and interfaces is especially irritating. They could probably be combined, but I found it was getting too complicated.
Note the use of the idiom match="record(class, *)". This matches any map that has an entry with the key "class". A single key is often enough to identify the relevant maps uniquely.
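To make the matching rule concrete, here is a Python approximation of what a pattern such as record(class, *) tests, applied to dicts standing in for maps. The helper matches_record and the sample maps are hypothetical; like record(class, *) itself, the sketch places no constraint on the values of the entries or on any additional entries.

```python
def matches_record(value, *required_keys):
    """Rough analogue of record(k1, k2, ..., *): the value must be a map (here,
    a dict) containing at least the listed keys; other entries are allowed
    because the record type is extensible ('*')."""
    return isinstance(value, dict) and all(k in value for k in required_keys)


# Hypothetical maps shaped like entries in the digest.
top_level = {"package": "net.sf.saxon.tree.iter", "class": [{"name": "EmptyIterator"}]}
member = {"method": [{"name": "getInstance"}]}

print(matches_record(top_level, "class"))    # True
print(matches_record(top_level, "package"))  # True
print(matches_record(member, "class"))       # False
print(matches_record(["class"], "class"))    # False: an array is not a map
```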
The subtlety is that a top-level class is represented in the digest by a map that has both a package key and a class key, and both contribute to the full name of the class. By giving the match="record(package, *)" template rule higher priority, and then using xsl:next-match, we ensure that both names are added to the hierarchic name, in the right order. An inner class is represented by a map that matches only the match="record(class, *)" template rule.
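The tunnel-parameter approach can be modelled in Python as a recursive descent that passes the accumulated name down as an ordinary argument. The digest shape is an assumption inferred from the template rules (each class or interface map holds an array whose members include a map with a name entry, plus maps for any nested classes or interfaces), and the sample data is hypothetical. Unlike the XSLT version, a single function handles both classes and interfaces by looking up whichever key is present.

```python
# Hypothetical digest fragment, shaped as the template rules above assume.
DIGEST = {
    "digest": [
        {
            "package": "net.sf.saxon.tree.iter",
            "class": [
                {"name": "EmptyIterator"},
                {"implements": [{"name": "net.sf.saxon.om.SequenceIterator"}]},
                {"class": [
                    {"name": "OfNodes"},
                    {"extends": [{"name": "net.sf.saxon.tree.iter.EmptyIterator"}]},
                ]},
            ],
        },
        {"package": "net.sf.saxon.om", "interface": [{"name": "SequenceIterator"}]},
    ]
}

KINDS = ("class", "interface")


def kind_of(value):
    """Return 'class' or 'interface' if value is a map (dict) with that key, else None."""
    if isinstance(value, dict):
        for kind in KINDS:
            if kind in value:
                return kind
    return None


def local_name(body):
    """The simple name is held by the first member map that has a 'name' entry."""
    return next(m["name"] for m in body if isinstance(m, dict) and "name" in m)


def index_class(value, full_name, index):
    """Add a class or interface map, and any nested types, to index.
    full_name plays the role of the full-name tunnel parameter."""
    kind = kind_of(value)
    if kind is None:
        return  # anything else is skipped, like on-no-match="deep-skip"
    body = value[kind]
    full_class_name = f"{full_name}.{local_name(body)}"
    index[full_class_name] = {kind: body}
    for member in body:
        if kind_of(member) is not None:  # descend only into nested types
            index_class(member, full_class_name, index)


def build_index(doc):
    index = {}
    for module in doc["digest"]:
        # A top-level map has both 'package' and 'class'/'interface': the package
        # seeds the name, as the priority-2 rule does before xsl:next-match.
        index_class(module, module["package"], index)
    return index


for name in sorted(build_index(DIGEST)):
    print(name)
# net.sf.saxon.om.SequenceIterator
# net.sf.saxon.tree.iter.EmptyIterator
# net.sf.saxon.tree.iter.EmptyIterator.OfNodes
```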
My second attempt to build the index also used recursive-descent template processing, but instead of passing tunnel parameters down with each call, it relied on the ability to access ancestor information in a pinned tree of maps and arrays. This worked, but it showed no benefit over the first approach.
My third attempt used the map:build function, again processing a pinned tree of maps and arrays to make ancestor information available. Here it is:
<xsl:function name="f:fullClassName">
  <xsl:param name="c" as="(record(class, *) | record(interface, *))"/>
  <xsl:variable name="upper" select="label($c)?parent ! label(.)?parent"/>
  <xsl:variable name="prefix" select="
      if ($c instance of record(package, *))
      then $c?package
      else f:fullClassName($upper)"/>
  <xsl:sequence select="$prefix || '.' || $c?('class', 'interface')?*?name"/>
</xsl:function>
<xsl:template name="xsl:initial-template">
  <xsl:sequence select="
      map:build(
        pin(?digest)??~(record(class, *) | record(interface, *)),
        f:fullClassName#1)"/>
</xsl:template>
I've cheated a little here, because it uses constructs that aren't yet implemented in Saxon, so I had to use workarounds to make it work. But it's only using features that are defined in the status-quo 4.0 specification.
Some observations:
- The construct ?? is a deep lookup operator: it does the same for maps and arrays as // does for node trees. It can be qualified by a type, so ??~record(method) searches the entire tree for values matching the type record(method). In this case we have supplied a choice type: ??~(A|B) matches items that are instances of either A or B.
- We have called pin() on the tree so that each value is labelled; the label includes information about the containing (parent) map or array.
- The first argument to map:build selects the items to be indexed. The second computes a key value for each one. This is done by calling a user-written recursive function, f:fullClassName.
- In this function, $upper navigates to the grandparent of a value. That's because the structure uses arrays of maps: to get from an inner class to its containing class, we need to go up two levels. The local name of the selected class or interface is then prefixed either with the package name (if it is a top-level class or interface), or, if it represents an inner class, with the full name of the containing (grandparent) class, computed by a recursive call.
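The labelling that pin() provides, and the grandparent navigation in f:fullClassName, can be modelled in Python by wrapping each value in a small record that points to its parent. Everything here (the Pinned class, deep_lookup, and the digest shape) is an assumption made for illustration: it mimics the behaviour described above, not Saxon's implementation or the exact specification of ?? and pin().

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional

# Hypothetical digest fragment: a top-level class with one inner class.
DIGEST = {
    "digest": [
        {
            "package": "net.sf.saxon.tree.iter",
            "class": [
                {"name": "EmptyIterator"},
                {"implements": [{"name": "net.sf.saxon.om.SequenceIterator"}]},
                {"class": [{"name": "OfNodes"}]},
            ],
        },
        {"package": "net.sf.saxon.om", "interface": [{"name": "SequenceIterator"}]},
    ]
}


@dataclass
class Pinned:
    """A value plus a label recording its parent (a loose model of pin())."""
    value: Any
    parent: Optional["Pinned"]


def deep_lookup(node: Pinned) -> Iterator[Pinned]:
    """Model of ??: yield every value below node (map entry values and array
    members, recursively), each labelled with its parent."""
    if isinstance(node.value, dict):
        children = list(node.value.values())
    elif isinstance(node.value, list):
        children = node.value
    else:
        children = []
    for child in children:
        pinned = Pinned(child, node)
        yield pinned
        yield from deep_lookup(pinned)


def is_class_or_interface(value) -> bool:
    """Stand-in for the choice type (record(class, *) | record(interface, *))."""
    return isinstance(value, dict) and ("class" in value or "interface" in value)


def full_class_name(node: Pinned) -> str:
    """Analogue of f:fullClassName."""
    c = node.value
    if "package" in c:
        prefix = c["package"]  # top-level class or interface
    else:
        # node.parent is the array holding c; its parent is the enclosing type.
        prefix = full_class_name(node.parent.parent)
    body = c["class"] if "class" in c else c["interface"]
    name = next(m["name"] for m in body if isinstance(m, dict) and "name" in m)
    return f"{prefix}.{name}"


def build_index(doc):
    """Analogue of map:build(pin(?digest)??~(...), f:fullClassName#1)."""
    root = Pinned(doc["digest"], None)
    return {full_class_name(n): n.value
            for n in deep_lookup(root) if is_class_or_interface(n.value)}


for name in sorted(build_index(DIGEST)):
    print(name)
# net.sf.saxon.om.SequenceIterator
# net.sf.saxon.tree.iter.EmptyIterator
# net.sf.saxon.tree.iter.EmptyIterator.OfNodes
```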
Which is preferable, the tunnel-parameter template rules or map:build? Opinions will probably differ. Neither is as concise as I would like, but is the requirement frequent enough to justify custom syntax for the equivalent of an ancestor axis? With the current (very early) implementation in Saxon, both take about the same time: 500 to 700 ms to index a 5 MB digest file.
[6] map:build, with two arguments, returns a map in which each key-value pair contains a value from the sequence supplied in the first argument, with a corresponding key calculated using the function supplied in the second argument. The XPath 4.0 expression fn{@ssn} represents a function that returns the value of the @ssn attribute of the node supplied as the implicit function argument: in XPath 3.1 this would be written function($node){$node/@ssn}.