Extensibility

FreqX uses a map of maps to describe the available reports. In the following code listing, the dc:v() function returns a string, which is the empty string when its argument is an empty sequence. This is needed because an empty sequence inside a comma-separated sequence expression simply disappears: the row would then have fewer fields, and the columns in the CSV files would not align. It is not required for numbers, as zero values are not discarded. An alternative design might have returned an array, as array members can be empty sequences. With XSLT 4, a record type would provide increased type safety.

<xsl:variable name="csv-makers" as="map(*)"
  select="
    (: This data structure drives the various different
     : comma-separated-value (CSV) reports.
     :)
    map {
      'elements' : map {
        'what' : function($counts) { $counts/elements/* },
        'headings' : 'Element,NS,Nocc,NDocs',
        'attributes' : function($count as element(count)) {
          (
            dc:v($count/@name),
            dc:v($count/@ns),
            xs:string($count),
            dc:v($count/@ndocs)
          )
        }
      },
      'element-parents' : map {
        'what' : function($counts) { $counts/elements-parents/* },
        'headings' : 'Element,NS,Parent,Parent NS,Nocc,NDocs',
        'attributes' : function($count as element(count)) {
          (
            dc:v($count/@name),
            dc:v($count/@ns),
            dc:v($count/@parent-name),
            dc:v($count/@parent-ns),
            xs:string($count),
            dc:v($count/@ndocs)
          )
        }
      }
      (: Further report entries are omitted from this listing. :)
    }"/>

The listing is incomplete: there are more entries in the actual XSLT file for FreqX, and of course a new CSV report can be added by inserting a new entry into the map. Entirely new formats, such as JSON, require a separate new template, but the hardest part is deciding how to represent the information in the report.
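As an illustration of what such a new entry might look like, the following sketch defines a hypothetical report that lists only element names and occurrence counts. The key 'element-names', its heading line and the variable name csv-makers-extra are invented for this example and are not part of FreqX; in the real stylesheet the entry would simply be one more key inside the existing map constructor. The fragment assumes the xsl, xs and dc prefixes are bound as in the enclosing stylesheet.

```xslt
<!-- Hypothetical report entry, shown in a variable of its own so that
     the fragment is well formed; it is not taken from FreqX. -->
<xsl:variable name="csv-makers-extra" as="map(*)"
  select="
    map {
      'element-names' : map {
        (: Reuse the same count elements as the 'elements' report. :)
        'what' : function($counts) { $counts/elements/* },
        'headings' : 'Element,Nocc',
        'attributes' : function($count as element(count)) {
          (
            dc:v($count/@name),
            xs:string($count)
          )
        }
      }
    }"/>
```

The number of strings returned by the 'attributes' function has to match the number of columns named in 'headings'; nothing in the map itself enforces this.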

The report generator will apply the 'attributes' function to each count element in turn, receive a sequence of strings in return, and make them into one item of the report: one comma-separated line, for example. The function reads the attributes and content of each count element to obtain information such as names, namespaces, and counts.
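The process just described can be sketched as a small, self-contained XSLT 3.0 stylesheet. This is not the FreqX generator itself: the dc namespace URI, the body of dc:v(), the report parameter, and the assumption that the input is a counts element containing an elements element with count children are all guesses made for the sake of the example. No CSV quoting or escaping is attempted.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dc="urn:example:freqx-sketch"
    exclude-result-prefixes="xs dc">

  <xsl:output method="text"/>

  <!-- Name of the report to produce (hypothetical parameter). -->
  <xsl:param name="report" as="xs:string" select="'elements'"/>

  <!-- Assumed definition of dc:v(): an empty sequence becomes an
       empty string, so every row keeps the same number of fields. -->
  <xsl:function name="dc:v" as="xs:string">
    <xsl:param name="value" as="item()?"/>
    <xsl:sequence select="string($value)"/>
  </xsl:function>

  <!-- A single entry, copied from the listing above. -->
  <xsl:variable name="csv-makers" as="map(*)"
    select="
      map {
        'elements' : map {
          'what' : function($counts) { $counts/elements/* },
          'headings' : 'Element,NS,Nocc,NDocs',
          'attributes' : function($count as element(count)) {
            (
              dc:v($count/@name),
              dc:v($count/@ns),
              xs:string($count),
              dc:v($count/@ndocs)
            )
          }
        }
      }"/>

  <!-- Assumed input shape: <counts><elements><count .../>...</elements></counts> -->
  <xsl:template match="/counts">
    <xsl:variable name="maker" as="map(*)" select="$csv-makers($report)"/>

    <!-- Heading line first. -->
    <xsl:value-of select="$maker('headings')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- 'what' selects the count elements; 'attributes' turns each
         one into a sequence of strings, joined into one CSV line. -->
    <xsl:for-each select="$maker('what')(.)">
      <xsl:value-of select="string-join($maker('attributes')(.), ',')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, a count element with name p, no ns attribute, content 42 and ndocs 3 should come out as the line p,,42,3. Had the function returned $count/@ns directly, the missing attribute would have vanished from the sequence and the row would have had only three fields.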

This architecture means that the representation of final counts as count elements can be changed with only moderate effort, and the representation of observations is entirely self-contained in the document scanner. As a result, it would be possible to experiment again with maps instead of elements, for example.