Generating the digest file



Let's look at another stage of the transpilation process: generation of the digest file. In the existing transpiler, this reads the entire collection of 2100 XML files produced by the Java parser, and constructs a single XML file (the digest) containing summary details of the classes, interfaces, and methods. Here is a short extract (the real thing is about 71,000 lines):

<digest>
   <module package="net.sf.saxon.tree">
      <class name="NamespaceNode">
         <implements name="net.sf.saxon.om.NodeInfo"/>
         <constructor params="net.sf.saxon.om.NodeInfo|net.sf.saxon.om.NamespaceBinding|int"/>
         <field name="element" type="net.sf.saxon.om.NodeInfo"/>
         <field name="nsBinding" type="net.sf.saxon.om.NamespaceBinding"/>
         <field name="position" type="int"/>
         <field name="fingerprint" type="int"/>
         <method name="getTreeInfo" returns="net.sf.saxon.om.TreeInfo"/>
         <method name="head" returns="net.sf.saxon.om.NodeInfo" csReturns="net.sf.saxon.om.Item"/>
         <method name="getNodeKind" returns="int"/>
         <method name="equals" returns="boolean" sig="java.lang.Object" params="java.lang.Object"/>
         <method name="hashCode" returns="int"/>
         <method name="getSystemId" returns="java.lang.String"/>
         <method name="getPublicId" returns="java.lang.String"/>
         <method name="getBaseURI" returns="java.lang.String"/>
         <method name="getLineNumber" returns="int"/>
         <method name="getColumnNumber" returns="int"/>
         ...

The JSON equivalent, which we will be generating here, mirrors this closely:

 { "digest":[
    {
      "package": "net.sf.saxon.tree",
      "class": [
        { "name":"NamespaceNode" },
        { "implements":{ "name":"net.sf.saxon.om.NodeInfo" } },
        { "constructor":{ "params":"net.sf.saxon.om.NodeInfo|net.sf.saxon.om.NamespaceBinding|int" } },
        { "field":{ "name":"element", "type":"net.sf.saxon.om.NodeInfo" } },
        { "field":{ "name":"nsBinding", "type":"net.sf.saxon.om.NamespaceBinding" } },
        { "field":{ "name":"position", "type":"int" } },
        { "field":{ "name":"fingerprint", "type":"int" } },
        { "method":{ "name":"getTreeInfo", "returns":"net.sf.saxon.om.TreeInfo" } },
        { "method":{ "name":"head", "returns":"net.sf.saxon.om.NodeInfo", "csReturns":"net.sf.saxon.om.Item" } },
        { "method":{ "name":"getNodeKind", "returns":"int" } },
        { "method":{ "name":"equals", "returns":"boolean", "sig":"java.lang.Object", "params":"java.lang.Object" } },
        { "method":{ "name":"hashCode", "returns":"int" } },
        { "method":{ "name":"getSystemId", "returns":"java.lang.String" } },
        { "method":{ "name":"getPublicId", "returns":"java.lang.String" } },
        { "method":{ "name":"getBaseURI", "returns":"java.lang.String" } },
        { "method":{ "name":"getLineNumber", "returns":"int" } },
        { "method":{ "name":"getColumnNumber", "returns":"int" } },

Actually, what I'm showing here is the result of converting the XML digest to JSON using the 4.0 element-to-map() function. In this stage of the case study, we're looking at the code needed to generate this structure, but I didn't actually complete the exercise, partly because the code uses features not yet implemented in Saxon.

The stylesheet is fairly small (just 180 lines). It uses a mode with on-no-match="fail", so there has to be an explicit template rule for every element of interest. The top two templates (in the XML version) are:

 <xsl:template name="xsl:initial-template">
    <digest>
       <xsl:apply-templates select="collection($xmlDir || 
                             '?recurse=yes;select=*.xml')"/>
    </digest>     
 </xsl:template>
 
 <xsl:template match="root">
    <module package="{f:qualifiedName(packageDeclaration/name)}">
       <xsl:apply-templates select="types/type"/>
    </module>
 </xsl:template>

So the entry-point template reads all the XML files in a directory whose name is supplied as a parameter, and invokes a template to process each file independently. That second template selects the only elements of interest, which are the type elements (a type being typically a class or interface). These two templates translate in the JSON version to:

 <xsl:template name="xsl:initial-template">
    <xsl:map>
       <xsl:map-entry key="'digest'">
          <xsl:array>
             <xsl:for-each select="(collection($jsonDir || 
                           '?recurse=yes;select=*.json') ! pin(.)) ? *">
                <xsl:array-member>
                   <xsl:map>
                      <xsl:apply-templates select="."/>
                   </xsl:map>
                </xsl:array-member>
             </xsl:for-each>
          </xsl:array>
       </xsl:map-entry> 
    </xsl:map>
 </xsl:template>
 
 <xsl:template match="?root">  <!-- match=".[label(.)?key = 'root']" --> 
    <xsl:map-entry key="'module'">
       <xsl:map>
          <xsl:map-entry key="'_package'" 
                         select="f:qualifiedName(?packageDeclaration?name)"/>
          <xsl:apply-templates select="?types?type"/>
       </xsl:map>
    </xsl:map-entry>
 </xsl:template>

Observations:

The JSON version of the digest is a singleton map, with key "digest", whose value is an array of maps. Constructing this top-level map-of-array-of-maps is somewhat verbose, but straightforward enough.
The first template rule applies the pin() function to each of the 2100 JSON documents in the collection, before applying templates to the result. I'll have more to say about the pin() function in due course; in brief, it creates a copy of the tree of maps and arrays, with each item in the tree augmented with a label carrying information about where it was found in the tree.
It's possible we may decide that when xsl:apply-templates selects a map or array, it should be pinned automatically. That decision hasn't been made yet.
The second template rule is shown with two alternative forms of the match pattern. The form in the comment works in Saxon today: it tests whether the label of the item (created when it was pinned) has a key property of root. The form actually used, match="?root", is a proposed contraction with the same meaning. Either form only works if the tree has been pinned, because without this, a value in the tree knows nothing about its associated key. (Contrast this with the XDM model for XML, where an element name is an intrinsic property of an element node.)
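
To make the idea of labelling concrete, here is a minimal Python sketch of what pinning provides. It is an illustration only, not how Saxon implements pin(): the names Labelled, pin and children are hypothetical, the wrapper is built lazily rather than as a copy of the tree, and the use of 1-based positions as the key for array members is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Labelled:
    """A value from a JSON tree plus a label recording where it was found."""
    value: Any
    key: Optional[Any] = None            # map key or array position; None at the root
    parent: Optional["Labelled"] = None  # the labelled container it was found in


def pin(value: Any) -> Labelled:
    """Wrap the root of a JSON tree so that navigation below it yields labelled values."""
    return Labelled(value)


def children(item: Labelled) -> Iterator[Labelled]:
    """Rough analogue of ?* on a pinned item: each child value, labelled with its key."""
    if isinstance(item.value, dict):
        for k, v in item.value.items():
            yield Labelled(v, k, item)
    elif isinstance(item.value, list):
        # Assumption: array members are labelled with their 1-based position.
        for position, v in enumerate(item.value, start=1):
            yield Labelled(v, position, item)


doc = pin({"root": {"packageDeclaration": {"name": "tree"}, "types": []}})
first = next(children(doc))
print(first.key)            # root
print(first.parent is doc)  # True
```

Without the wrapper, the inner dictionary has no way of knowing it was found under the key "root", which is why a pattern that tests the key needs the tree to be pinned first.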

This stylesheet processes the input JSON by applying templates to the values found in its arrays and maps: the default processing is <xsl:apply-templates select="?*"/>. This selects the values, not the key-value pairs. I have experimented with selecting key-value pairs instead, using <xsl:apply-templates select="map:entries(.)"/>, and there are some cases where this is a good solution, but I have usually found it causes confusion. If the values are labelled with their associated key (by pinning the tree before we start), then it turns out not to be necessary.
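
The following Python sketch illustrates this style of processing under simplified assumptions: rules operate on values, are chosen by the key the value was found under, and an unmatched key is an error (as with on-no-match="fail"). The input shape and the names rule, apply_templates, process_root and process_type are all hypothetical, and the key is simply passed alongside the value rather than carried in a label.

```python
from typing import Any, Callable, Dict

Rule = Callable[[Any], Any]
RULES: Dict[str, Rule] = {}


def rule(key: str) -> Callable[[Rule], Rule]:
    """Register a handler for values found under the given key (cf. match="?root")."""
    def register(fn: Rule) -> Rule:
        RULES[key] = fn
        return fn
    return register


def apply_templates(key: str, value: Any) -> Any:
    """Dispatch on the key the value was found under; fail if no rule matches."""
    try:
        handler = RULES[key]
    except KeyError:
        raise LookupError(f"no rule matches a value with key {key!r}") from None
    return handler(value)


@rule("root")
def process_root(value: dict) -> dict:
    # Hypothetical, simplified input: the package name is a plain string here.
    return {"module": {
        "_package": value["packageDeclaration"]["name"],
        "types": [apply_templates("type", t) for t in value["types"]],
    }}


@rule("type")
def process_type(value: dict) -> dict:
    return {"name": value["name"]}


document = {"root": {"packageDeclaration": {"name": "net.sf.saxon.tree"},
                     "types": [{"name": "NamespaceNode"}]}}

# Process the values of the top-level map; each rule receives the value only.
result = [apply_templates(k, v) for k, v in document.items()]
print(result)
```

Each rule body works purely with the value; the key is used only to choose the rule, which is the role the label plays in the pinned tree.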

For many of the functions and templates in the stylesheet, the translation is fairly direct. For example the XML version has a function to test whether a Java interface has an annotation marking it as a functional interface ^[5]:

 <xsl:function name="f:isDelegate" as="xs:boolean">
    <xsl:param name="interfaceDecl" as="element()"/>
    <xsl:choose>
       <xsl:when test="$interfaceDecl/annotations/
                       annotation/name/@identifier='CSharpDelegate'">
          <xsl:sequence select="$interfaceDecl/annotations/
                       annotation[name/@identifier='CSharpDelegate']/
                       memberValue/@value = 'true'"/>
       </xsl:when>
       <xsl:otherwise>
          <xsl:sequence select="exists($interfaceDecl
                       [@nodeType='ClassOrInterfaceDeclaration']
                       [@isInterface='true']
                       [annotations/annotation/name/@identifier='FunctionalInterface']
                       [count(members/member)=1])"/>
       </xsl:otherwise>
    </xsl:choose>      
 </xsl:function>

In the JSON version this becomes:

 <xsl:function name="f:isDelegate" as="xs:boolean">
    <xsl:param name="interfaceDecl" as="item()"/>
    <xsl:choose>
       <xsl:when test="$interfaceDecl?annotations
                       ?*?name?_identifier='CSharpDelegate'">
          <xsl:sequence select="$interfaceDecl?annotations
                       ?*[?name?_identifier='CSharpDelegate']
                       ?memberValue?_value = 'true'"/>
       </xsl:when>
       <xsl:otherwise>
          <xsl:sequence select="exists($interfaceDecl
                       [?_nodeType='ClassOrInterfaceDeclaration']
                       [?_isInterface='true']
                       [?annotations?*?name?_identifier='FunctionalInterface']
                       [count(?members?*)=1])"/>
       </xsl:otherwise>
    </xsl:choose>      
 </xsl:function>

which, although it might appear a little cryptic at first sight, is actually a very direct translation.
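
For readers less used to lookup expressions, the same logic can be spelled out over plain dictionaries and lists. This Python sketch assumes an input shape inferred from the expressions above (annotations and members are arrays of maps, and attribute-derived keys such as _identifier hold strings); the function names are hypothetical.

```python
from typing import Any, Dict, List


def annotations_named(decl: Dict[str, Any], identifier: str) -> List[Dict[str, Any]]:
    """Rough analogue of ?annotations?*[?name?_identifier = identifier]."""
    return [a for a in decl.get("annotations", [])
            if a.get("name", {}).get("_identifier") == identifier]


def is_delegate(decl: Dict[str, Any]) -> bool:
    explicit = annotations_named(decl, "CSharpDelegate")
    if explicit:
        # XPath "=" is existential: true if any selected value equals 'true'.
        return any(a.get("memberValue", {}).get("_value") == "true"
                   for a in explicit)
    # Otherwise: an interface marked @FunctionalInterface with exactly one member.
    return (decl.get("_nodeType") == "ClassOrInterfaceDeclaration"
            and decl.get("_isInterface") == "true"
            and bool(annotations_named(decl, "FunctionalInterface"))
            and len(decl.get("members", [])) == 1)


functional = {
    "_nodeType": "ClassOrInterfaceDeclaration",
    "_isInterface": "true",
    "annotations": [{"name": {"_identifier": "FunctionalInterface"}}],
    "members": [{"_nodeType": "MethodDeclaration"}],
}
print(is_delegate(functional))  # True
```

A missing key yields nothing rather than an error, mirroring the way a lookup on an absent key returns an empty sequence.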

Incidentally, both versions could take advantage of the new then and else attributes of the xsl:if instruction in XSLT 4.0: the JSON version could be written:

<xsl:function name="f:isDelegate" as="xs:boolean">
    <xsl:param name="interfaceDecl" as="item()"/>
    <xsl:if test="$interfaceDecl?annotations
                  ?*?name?_identifier='CSharpDelegate'"
            then="$interfaceDecl?annotations
                  ?*[?name?_identifier='CSharpDelegate']
                  ?memberValue?_value = 'true'"
            else="exists($interfaceDecl
                  [?_nodeType='ClassOrInterfaceDeclaration']
                  [?_isInterface='true']
                  [?annotations?*?name?_identifier='FunctionalInterface']
                  [count(?members?*)=1])"/>     
 </xsl:function>

This stylesheet outputs JSON, and it is therefore greatly concerned with constructing maps and arrays. We've already seen in the top-level template that this can be rather verbose. There's a lot of this kind of code:

 <xsl:map>
    <xsl:map-entry key="'name'" select="f:degenerify(?name?_identifier)"/>
    <xsl:if test="f:isDelegate(.)">
       <xsl:map-entry key="'delegate'" select="1"/>
    </xsl:if>
    ...
    <xsl:map-entry key="'members'">
       <xsl:array>
          <xsl:for-each select="?members?*[?_nodeType='MethodDeclaration']">
             <xsl:array-member>
                <xsl:apply-templates select="."/>
             </xsl:array-member>
          </xsl:for-each>
       </xsl:array>
    </xsl:map-entry>
 </xsl:map>

and it would be nice to reduce the verbosity if we can. One way of doing this is to do more of the work in XPath expressions rather than XSLT instructions. A couple of new XSLT features are designed to facilitate this: an xsl:select instruction, which evaluates an XPath expression held in its content, and an apply-templates function, which does the same thing as the xsl:apply-templates instruction. With these enhancements, we can almost replace the above code by:

  <xsl:select> {
    'name' : f:degenerify(?name?_identifier),
    'delegate' : xs:integer(f:isDelegate(.)),
    ...
    'members' : array:build(?members?*[?_nodeType='MethodDeclaration'],
                            apply-templates#1)
  } </xsl:select>

The only ingredient missing here is that map constructors have no way to generate a map entry (such as 'delegate') conditionally. We're working on that one.

Another attempt to make map construction more concise is a new xsl:record instruction, allowing something like:

  <xsl:record
    name = "f:degenerify(?name?_identifier)"
    delegate = "xs:integer(f:isDelegate(.))"
    ...
    members = "array:build(?members?*[?_nodeType='MethodDeclaration'],
                            apply-templates#1)"/>

Again, it offers no way to generate a map entry conditionally.
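
The gap is easiest to see by comparison with a language whose map literals allow merging. The Python sketch below builds the same kind of map, including the 'delegate' entry only when the flag is set, by merging in either a one-entry map or an empty one. The function name and parameters are hypothetical; in XPath 3.1 a comparable effect can be had by passing optional single-entry maps to map:merge, at the cost of some of the conciseness these features are aiming for.

```python
from typing import Any, Dict, List


def type_summary(name: str, is_delegate: bool, members: List[Any]) -> Dict[str, Any]:
    """Build a map whose 'delegate' entry is present only when the flag is set."""
    return {
        "name": name,
        # Merge in either a one-entry map or an empty map.
        **({"delegate": 1} if is_delegate else {}),
        "members": members,
    }


print(type_summary("Predicate", True, []))
print(type_summary("NamespaceNode", False, []))
```

Note the difference from the two unconditional forms above, which always produce a 'delegate' entry (with value 0 or 1), whereas the xsl:map version omits the entry entirely for non-delegates.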

^[5] In the Java source code, we use the annotation @CSharpDelegate to mark interfaces that should be transpiled to C# delegates. The JavaParser faithfully copies this annotation into the XML syntax tree, and the transpiler picks it up from there.