Outline of the XSLT

The approach we adopted was to pass the content of the <bibItem> element through a template a number of times, each time looking for the value of a different attribute to match in the text.

Example 17. Stages of conversion

Original content:

Deborah Furet How Illusions Pass, 176-196, 234-247. London: Gerald Duckworth, 1987

After first pass:

Deborah Furet <title>How Illusions Pass</title>, 176-196, 234-247. 
London: Gerald Duckworth, 1987

After second pass:

Deborah <surname>Furet</surname> <title>How Illusions Pass</title>, 
176-196, 234-247. London: Gerald Duckworth, 1987

After third pass:

Deborah <surname>Furet</surname> <title>How Illusions Pass</title>, 
176-196, 234-247. London: Gerald Duckworth, <year>1987</year>

The template for the original <bibItem> element is:

<xsl:template match="bibItem[
    @class = 'bookChapter' 
    or @class = 'book' 
    or @class = 'journalArticle'
    ]">
    <!-- Stage 0: the untouched content of the bibItem -->
    <xsl:variable name="content" as="node()+" select="child::node()"/>
    <!-- Stage 1: wrap the title, if the bibItem has a title attribute -->
    <xsl:variable name="content-plus-title" as="node()+">
      <xsl:choose>
        <xsl:when test="@title">
          <xsl:call-template name="getElement">
            <xsl:with-param name="class" select="@class" tunnel="yes"/>
            <xsl:with-param name="attributeName" select="'title'" tunnel="yes"/>
            <xsl:with-param name="stringToMatch" select="@title" tunnel="yes"/>
            <xsl:with-param name="matched" select="false()" tunnel="yes"/>
            <xsl:with-param name="nodesToCheck" select="$content"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="$content"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="content-plus-name" as="node()+">
        <!-- call to "getElement" but parameter $nodesToCheck is $content-plus-title -->
    </xsl:variable>

    <xsl:variable name="content-plus-date" as="node()+">
        <!-- call to "getElement" but parameter $nodesToCheck is $content-plus-name -->
    </xsl:variable>

    <!-- wrapper element for the output not shown -->
      <xsl:apply-templates select="@*"/>
      <xsl:copy-of select="$content-plus-date"/>
</xsl:template>

The content of the <bibItem> element goes through several stages within the template. Passing from one stage to the next involves a call to a template getElement to find a particular attribute value within the content and wrap it in the corresponding BITS element. The output of the first stage is passed as the input to the next, and so on, until the output of the last stage is copied into the output XML tree.

The getElement template that does the matching is:

<xsl:template name="getElement">
    <xsl:param name="class" as="xs:string" required="yes" tunnel="yes"/>
    <xsl:param name="attributeName" as="xs:string" required="yes" tunnel="yes"/>
    <xsl:param name="stringToMatch" as="xs:string" required="yes" tunnel="yes"/>
    <xsl:param name="matched" as="xs:boolean" tunnel="yes"/>
    <xsl:param name="nodesToCheck" as="node()+"/>
    <xsl:variable name="thisNode" as="node()" select="$nodesToCheck[1]"/>
    <xsl:variable name="remainingNodes" as="node()*" 
        select="$nodesToCheck except $thisNode"/>
    <xsl:variable name="nodeAsString" as="xs:string" select="string($thisNode)"/> 
    <!-- test for $stringToMatch in $nodeAsString not shown -->
    <xsl:variable name="matches" as="xs:boolean" select="..."/>

    <!--
        Code to look at $thisNode (the first node):
        either copy it (if $stringToMatch is not found,
        or if it has been matched already),
        or wrap the part of the text node matching $stringToMatch
        in an element appropriate to the $attributeName
    -->

    <!-- Recurse over the rest of the content; the other tunnel parameters pass through unchanged -->
    <xsl:if test="count($remainingNodes) != 0">
      <xsl:call-template name="getElement">
        <xsl:with-param name="matched" select="$matched or $matches" tunnel="yes"/>
        <xsl:with-param name="nodesToCheck" select="$remainingNodes"/>
      </xsl:call-template>
    </xsl:if>
</xsl:template>
The parameters to the template are:

class: The type of item that is being referenced (e.g. book, journal article).
attributeName: The name of the attribute whose value is being sought in the content (e.g. author, title). The value of this and the $class parameter determine the name of the element that is wrapped round a matching string.
stringToMatch: The value (author name, article title, etc) that the template is looking for in the content.
matched: A boolean variable, whose value is true() if the string has already been matched in the content.
nodesToCheck: The remaining content of the bibItem. The template parses one node at a time, and then recursively calls itself with the remaining nodes passed as this parameter.
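
The parameter descriptions say that $class and $attributeName together determine the wrapper element. One way such a lookup might be written is as a stylesheet function. Everything in this sketch is an assumption made for illustration: the function name local:elementName, the urn:example:local namespace, and the mapping itself (the citation elements article-title, chapter-title and source for titles, with the attribute name as a fallback) are placeholders, not the mapping the original stylesheet uses.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">

  <!-- Hypothetical lookup: choose the wrapper element name from the class of
       the reference and the name of the attribute being matched.
       The mapping is illustrative only. -->
  <xsl:function name="local:elementName" as="xs:string">
    <xsl:param name="class" as="xs:string"/>
    <xsl:param name="attributeName" as="xs:string"/>
    <xsl:choose>
      <xsl:when test="$attributeName = 'title' and $class = 'journalArticle'">
        <xsl:sequence select="'article-title'"/>
      </xsl:when>
      <xsl:when test="$attributeName = 'title' and $class = 'bookChapter'">
        <xsl:sequence select="'chapter-title'"/>
      </xsl:when>
      <xsl:when test="$attributeName = 'title'">
        <xsl:sequence select="'source'"/>
      </xsl:when>
      <!-- Fallback: use the attribute name itself as the element name -->
      <xsl:otherwise>
        <xsl:sequence select="$attributeName"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Demonstration only: emit the title attribute wrapped in the element
       that the lookup selects for this class of reference -->
  <xsl:template match="bibItem[@class and @title]">
    <xsl:element name="{local:elementName(@class, 'title')}">
      <xsl:value-of select="@title"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the mapping in one function means that supporting a new class of reference, or a new attribute, would touch only this lookup and not the recursive matching template.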