Generating the digest file

Let's look at another stage of the transpilation process: generation of the digest file. In the existing transpiler, this reads the entire collection of 2100 XML files produced by the Java parser, and constructs a single XML file (the digest) containing summary details of the classes, interfaces, and methods. Here is a short extract (the real thing is about 71,000 lines):

<digest>
   <module package="net.sf.saxon.tree">
      <class name="NamespaceNode">
         <implements name="net.sf.saxon.om.NodeInfo"/>
         <constructor params="net.sf.saxon.om.NodeInfo|net.sf.saxon.om.NamespaceBinding|int"/>
         <field name="element" type="net.sf.saxon.om.NodeInfo"/>
         <field name="nsBinding" type="net.sf.saxon.om.NamespaceBinding"/>
         <field name="position" type="int"/>
         <field name="fingerprint" type="int"/>
         <method name="getTreeInfo" returns="net.sf.saxon.om.TreeInfo"/>
         <method name="head" returns="net.sf.saxon.om.NodeInfo" csReturns="net.sf.saxon.om.Item"/>
         <method name="getNodeKind" returns="int"/>
         <method name="equals" returns="boolean" sig="java.lang.Object" params="java.lang.Object"/>
         <method name="hashCode" returns="int"/>
         <method name="getSystemId" returns="java.lang.String"/>
         <method name="getPublicId" returns="java.lang.String"/>
         <method name="getBaseURI" returns="java.lang.String"/>
         <method name="getLineNumber" returns="int"/>
         <method name="getColumnNumber" returns="int"/>
         ...

The JSON equivalent, which we will be generating here, mirrors this closely:

 { "digest":[
    {
      "package": "net.sf.saxon.tree",
      "class": [
        { "name":"NamespaceNode" },
        { "implements":{ "name":"net.sf.saxon.om.NodeInfo" } },
        { "constructor":{ "params":"net.sf.saxon.om.NodeInfo|net.sf.saxon.om.NamespaceBinding|int" } },
        { "field":{ "name":"element", "type":"net.sf.saxon.om.NodeInfo" } },
        { "field":{ "name":"nsBinding", "type":"net.sf.saxon.om.NamespaceBinding" } },
        { "field":{ "name":"position", "type":"int" } },
        { "field":{ "name":"fingerprint", "type":"int" } },
        { "method":{ "name":"getTreeInfo", "returns":"net.sf.saxon.om.TreeInfo" } },
        { "method":{ "name":"head", "returns":"net.sf.saxon.om.NodeInfo", "csReturns":"net.sf.saxon.om.Item" } },
        { "method":{ "name":"getNodeKind", "returns":"int" } },
        { "method":{ "name":"equals", "returns":"boolean", "sig":"java.lang.Object", "params":"java.lang.Object" } },
        { "method":{ "name":"hashCode", "returns":"int" } },
        { "method":{ "name":"getSystemId", "returns":"java.lang.String" } },
        { "method":{ "name":"getPublicId", "returns":"java.lang.String" } },
        { "method":{ "name":"getBaseURI", "returns":"java.lang.String" } },
        { "method":{ "name":"getLineNumber", "returns":"int" } },
        { "method":{ "name":"getColumnNumber", "returns":"int" } },
        ...

Actually, what I'm showing here is the result of converting the XML digest to JSON using the 4.0 element-to-map() function. In this stage of the case study, we're looking at the code needed to generate this structure, but I didn't actually complete the exercise, partly because the code uses features not yet implemented in Saxon.

The stylesheet is fairly small (just 180 lines). It uses a mode with on-no-match="fail", so there has to be an explicit template rule for every element of interest. The top two templates (in the XML version) are:

 <xsl:template name="xsl:initial-template">
    <digest>
       <xsl:apply-templates select="collection($xmlDir || 
                             '?recurse=yes;select=*.xml')"/>
    </digest>     
 </xsl:template>
 
 <xsl:template match="root">
    <module package="{f:qualifiedName(packageDeclaration/name)}">
       <xsl:apply-templates select="types/type"/>
    </module>
 </xsl:template>      
      

So the entry-point template reads all the XML files in a directory whose name is supplied as a parameter, and applies templates to process each file independently; the template rule for the root element then selects the only elements of interest, which are the type elements (a type being typically a class or interface). These two templates translate in the JSON version to:

 <xsl:template name="xsl:initial-template">
    <xsl:map>
       <xsl:map-entry key="'digest'">
          <xsl:array>
             <xsl:for-each select="(collection($jsonDir || 
                           '?recurse=yes;select=*.json') ! pin(.)) ? *">
                <xsl:array-member>
                   <xsl:map>
                      <xsl:apply-templates select="."/>
                   </xsl:map>
                </xsl:array-member>
             </xsl:for-each>
          </xsl:array>
       </xsl:map-entry> 
    </xsl:map>
 </xsl:template>
 
 <xsl:template match="?root">  <!-- match=".[label(.)?key = 'root']" --> 
    <xsl:map-entry key="'module'">
       <xsl:map>
          <xsl:map-entry key="'_package'" 
                         select="f:qualifiedName(?packageDeclaration?name)"/>
          <xsl:apply-templates select="?types?type"/>
       </xsl:map>
    </xsl:map-entry>
 </xsl:template>      
      

Observations:

This stylesheet processes the input JSON by applying templates to the values found in its arrays and maps: the default processing is <xsl:apply-templates select="?*"/>. This selects the values, not the key-value pairs. I have experimented with selecting key-value pairs instead, using <xsl:apply-templates select="map:entries(.)"/>, and there are some cases where this is a good solution, but I have usually found it causes confusion. If the values are labelled with their associated key (by pinning the tree before we start), then selecting the key-value pairs turns out not to be necessary.
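The effect of selecting values rather than key-value pairs can be seen in a minimal sketch. It is written against XSLT 3.0 rather than 4.0, so it uses neither pinning nor labels. The sample JSON held in the hypothetical $json parameter, and the two template rules, are illustrative assumptions and not part of the transpiler stylesheet.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

  <xsl:output method="text"/>

  <!-- Hypothetical sample input; the real stylesheet reads its input with collection() -->
  <xsl:param name="json" as="xs:string">{"root": {"types": [{"name": "A"}, {"name": "B"}]}}</xsl:param>

  <xsl:template name="xsl:initial-template">
    <xsl:apply-templates select="parse-json($json)"/>
  </xsl:template>

  <!-- Maps and arrays: apply templates to the values, not to the key-value pairs -->
  <xsl:template match=".[. instance of map(*) or . instance of array(*)]">
    <xsl:apply-templates select="?*"/>
  </xsl:template>

  <!-- Atomic values: the key that led here ('name') is not visible at this point -->
  <xsl:template match=".[. instance of xs:anyAtomicType]">
    <xsl:value-of select="string(.) || '&#10;'"/>
  </xsl:template>

</xsl:stylesheet>
```

When the rule for atomic values fires, it sees only the strings A and B; nothing in the context item records that each was the value of a name entry. That is the information that labelling the values with their key is meant to supply.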

For many of the functions and templates in the stylesheet, the translation is fairly direct. For example the XML version has a function to test whether a Java interface has an annotation marking it as a functional interface [5]:

 <xsl:function name="f:isDelegate" as="xs:boolean">
    <xsl:param name="interfaceDecl" as="element()"/>
    <xsl:choose>
       <xsl:when test="$interfaceDecl/annotations/
                       annotation/name/@identifier='CSharpDelegate'">
          <xsl:sequence select="$interfaceDecl/annotations/
                       annotation[name/@identifier='CSharpDelegate']/
                       memberValue/@value = 'true'"/>
       </xsl:when>
       <xsl:otherwise>
          <xsl:sequence select="exists($interfaceDecl
                       [@nodeType='ClassOrInterfaceDeclaration']
                       [@isInterface='true']
                       [annotations/annotation/name/@identifier='FunctionalInterface']
                       [count(members/member)=1])"/>
       </xsl:otherwise>
    </xsl:choose>      
 </xsl:function>      
      

In the JSON version this becomes:

 <xsl:function name="f:isDelegate" as="xs:boolean">
    <xsl:param name="interfaceDecl" as="item()"/>
    <xsl:choose>
       <xsl:when test="$interfaceDecl?annotations
                       ?*?name?_identifier='CSharpDelegate'">
          <xsl:sequence select="$interfaceDecl?annotations
                       ?*[?name?_identifier='CSharpDelegate']
                       ?memberValue?_value = 'true'"/>
       </xsl:when>
       <xsl:otherwise>
          <xsl:sequence select="exists($interfaceDecl
                       [?_nodeType='ClassOrInterfaceDeclaration']
                       [?_isInterface='true']
                       [?annotations?*?name?_identifier='FunctionalInterface']
                       [count(?members?*)=1])"/>
       </xsl:otherwise>
    </xsl:choose>      
 </xsl:function>      
      

which, although it might appear a little cryptic at first sight, is actually a very direct translation.

Incidentally, both versions could take advantage of the then and else attributes that XSLT 4.0 adds to the xsl:if instruction; the JSON version could be written:

 <xsl:function name="f:isDelegate" as="xs:boolean">
    <xsl:param name="interfaceDecl" as="item()"/>
    <xsl:if test="$interfaceDecl?annotations
                  ?*?name?_identifier='CSharpDelegate'"
            then="$interfaceDecl?annotations
                  ?*[?name?_identifier='CSharpDelegate']
                  ?memberValue?_value = 'true'"
            else="exists($interfaceDecl
                  [?_nodeType='ClassOrInterfaceDeclaration']
                  [?_isInterface='true']
                  [?annotations?*?name?_identifier='FunctionalInterface']
                  [count(?members?*)=1])"/>     
 </xsl:function>             
      

This stylesheet outputs JSON, and it is therefore greatly concerned with constructing maps and arrays. We've already seen in the top-level template that this can be rather verbose. There's a lot of this kind of code:

 <xsl:map>
    <xsl:map-entry key="'name'" select="f:degenerify(?name?_identifier)"/>
    <xsl:if test="f:isDelegate(.)">
       <xsl:map-entry key="'delegate'" select="1"/>
    </xsl:if>
    ...
    <xsl:map-entry key="'members'">
       <xsl:array>
          <xsl:for-each select="?members?*[?_nodeType='MethodDeclaration']">
             <xsl:array-member>
                <xsl:apply-templates select="."/>
             </xsl:array-member>
          </xsl:for-each>
       </xsl:array>
    </xsl:map-entry>
 </xsl:map>      

and it would be nice to reduce the verbosity if we can. One way of doing this is to do more of the work in XPath expressions rather than XSLT instructions. A couple of new XSLT features are designed to facilitate this: an xsl:select instruction, which evaluates an XPath expression held in its content, and an apply-templates function, which does the same thing as the xsl:apply-templates instruction. With these enhancements, we can almost replace the above code by:

  <xsl:select> {
    'name' : f:degenerify(?name?_identifier),
    'delegate' : xs:integer(f:isDelegate(.)),
    ...
    'members' : array:build(?members?*[?_nodeType='MethodDeclaration'],
                            apply-templates#1)
  } </xsl:select>
      

The only ingredient missing here is that map constructors have no way to generate a map entry (such as 'delegate') conditionally. We're working on that one.
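In the meantime, one workaround already available in XPath 3.1 is to build each entry as a single-entry map and combine them with map:merge, letting the conditional entry evaluate to the empty sequence when it is not wanted. The following is a minimal sketch of that idiom, not the facility under discussion for 4.0. The $name and $isDelegate parameters are hypothetical stand-ins for f:degenerify(?name?_identifier) and f:isDelegate(.).

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all">

  <xsl:output method="json" indent="yes"/>

  <!-- Hypothetical inputs standing in for f:degenerify(...) and f:isDelegate(.) -->
  <xsl:param name="name" as="xs:string" select="'Callback'"/>
  <xsl:param name="isDelegate" as="xs:boolean" select="true()"/>

  <xsl:template name="xsl:initial-template">
    <!-- The second operand is an empty sequence when $isDelegate is false,
         so map:merge simply omits the 'delegate' entry in that case -->
    <xsl:sequence select="
      map:merge((
        map { 'name': $name },
        if ($isDelegate) then map { 'delegate': 1 } else ()
      ))"/>
  </xsl:template>

</xsl:stylesheet>
```

This keeps the work in a single XPath expression, at the cost of wrapping every entry in its own map constructor, which is more verbose than a conditional entry inside one map constructor would be.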

Another attempt to make map construction more concise is a new xsl:record instruction, allowing something like:

  <xsl:record
    name = "f:degenerify(?name?_identifier)"
    delegate = "xs:integer(f:isDelegate(.))"
    ...
    members = "array:build(?members?*[?_nodeType='MethodDeclaration'],
                            apply-templates#1)"/>
      

Again, it offers no way to generate a map entry conditionally.



[5] In the Java source code, we use the annotation @CSharpDelegate to mark interfaces that should be transpiled to C# delegates. The JavaParser faithfully copies this annotation into the XML syntax tree, and the transpiler picks it up from there.