Assessment against Complex Types using Finite State Machines



As we've seen, the finite state machines used to evaluate a sequence of elements against the grammar rules for a complex type are constructed by the schema compiler and embedded in the SCM file that is used as input to the validator.

A simplified validator for a simple finite state machine could be written like this:

<xsl:iterate select="$node/*">
    <xsl:param name="state" select="$initial-state" as="element(scm:state)"/>
    <xsl:on-completion>
        <xsl:if test="not($state/@final = 'true')">
            <xsl:sequence select="map{'errors': 
                                      scm:error($node, 'Element content is 
                                      incomplete')}"/>
        </xsl:if>
    </xsl:on-completion>
    <xsl:variable name="matching-edge" as="element(scm:edge)?"
        select="$state/scm:edge[scm:get(@term)[@name = local-name(current()) 
                   and string(@targetNamespace) = namespace-uri(current())]]"/>
    <xsl:variable name="matching-wildcard-edge" as="element(scm:edge)?"
        select="$state/scm:edge[scm:get(@term)[self::scm:wildcard[
                                  scm:wildcard-matches($containing-type, ., 
                                  current())]]]"/>
    <xsl:choose>
        <xsl:when test="empty($matching-edge) and empty($matching-wildcard-edge)">
             <xsl:break select="map{'errors': scm:error(., 'Element ' || name()
                                                  || ' is not allowed here')}"/>
        </xsl:when>
        <xsl:when test="empty($matching-edge)">
            <xsl:variable name="wildcard" 
                          select="scm:get($matching-wildcard-edge/@term)" 
                          as="element(scm:wildcard)?"/>
            <xsl:sequence select="scm:check-wildcard-match($containing-type, 
                                  $wildcard, .)"/>
            <xsl:next-iteration>
                <xsl:with-param name="state" 
                                select="$states[@nr = 
                                        $matching-wildcard-edge/@to]"/>
            </xsl:next-iteration>
        </xsl:when>
        <xsl:otherwise>
            <xsl:variable name="decl" 
                          select="scm:get($matching-edge/@term)" 
                          as="element(scm:element)"/>
            <xsl:apply-templates select="." mode="explicit-decl">
                <xsl:with-param name="decl" select="$decl"/>
            </xsl:apply-templates>
            <xsl:next-iteration>
                <xsl:with-param name="state" 
                                select="$states[@nr = 
                                        $matching-edge/@to]"/>
            </xsl:next-iteration>
        </xsl:otherwise>
    </xsl:choose>            
</xsl:iterate>

The way this code works is as follows:

The xsl:iterate instruction is new in XSLT 3.0. It is rather like xsl:for-each, except that it processes the selected items strictly in sequence; the code for processing one item can set parameters for processing the next item; and it is possible to break out of the loop early. The same effect could be achieved with a recursive template, but xsl:iterate is often easier to understand. In this case we are iterating over the children of the element being validated.
There is a single parameter, the current state, which is initially set (by the calling code) to the state numbered 0.
The xsl:on-completion instruction is executed when we reach the end of the sequence of child elements. If the current state is a final state, we return nothing (meaning all is well, the input is valid). Otherwise we return a map containing an error value.
There are two kinds of transition possible in a given state: named element transitions, and wildcard transitions. We first find all the matching named element transitions (the schema compiler will have ensured there can be at most one) and all the matching wildcard transitions.
If both sets are empty, there is no legal transition for the current child element in this state, so we return an error value.
If there is a wildcard transition possible, but no named-element transition, then we check that the wildcard transition is really allowed and that the element is valid against the wildcard (this takes account of its processContents attribute), and then proceed to process the next child element in the state reached by this transition.
If there is a named-element transition possible, then we call apply-templates to check that the child element is valid against the required type for the named element, and then proceed to process the next child element in the state reached by this transition.
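
The stepping logic described above can also be sketched outside XSLT. The following Python fragment is only an illustration, not the validator's actual code: the Edge and State classes, the use of "*" to stand for a wildcard term, and the example content model are all assumptions made for this sketch. It leaves out the validation of each child against its element declaration or wildcard.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    term: str   # element name, or "*" for a wildcard (a simplification)
    to: int     # number of the state this edge leads to


@dataclass
class State:
    nr: int
    final: bool = False
    edges: list = field(default_factory=list)


def validate_children(children, states):
    """Return None if the sequence of child names is accepted, else an error message."""
    state = states[0]  # the initial state is the one numbered 0
    for child in children:
        # Look for a named-element edge first, then for a wildcard edge.
        named = next((e for e in state.edges if e.term == child), None)
        wildcard = next((e for e in state.edges if e.term == "*"), None)
        edge = named or wildcard
        if edge is None:
            # Corresponds to xsl:break: no legal transition from this state.
            return f"Element {child} is not allowed here"
        # Corresponds to xsl:next-iteration: move to the target state.
        state = states[edge.to]
    # Corresponds to xsl:on-completion: we must end in a final state.
    return None if state.final else "Element content is incomplete"


# Hypothetical machine for the content model (title, author+, any*).
states = {
    0: State(0, edges=[Edge("title", 1)]),
    1: State(1, edges=[Edge("author", 2)]),
    2: State(2, final=True, edges=[Edge("author", 2), Edge("*", 3)]),
    3: State(3, final=True, edges=[Edge("*", 3)]),
}
```

Calling validate_children(["title"], states) with this machine reports incomplete content, because state 1 is not final, while a sequence that starts with "author" is rejected at the first child.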

The actual logic is more complex than this. First, we use a finite state machine with counters, which reduces the size of the machine needed for a grammar such as <element name="book" minOccurs="100" maxOccurs="200"/>. Second, XSD 1.1 supports "open content", which permits elements matching a given wildcard to appear either (a) anywhere (interleaved content) or (b) at the end of the sequence (suffix content). Open content is not integrated into the finite state machine; instead, the validator handles it as it arises. The basic principle is nevertheless retained: we step through the children using xsl:iterate, maintaining the current state.
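
To see why counters help, consider a content model consisting only of the repeated book element from the example above. The Python sketch below is a hypothetical illustration, not a description of how the SCM file or the validator represents counters; the function name and its parameters are assumptions. A single looping state plus one integer does the work that would otherwise need about 200 distinct states.

```python
def validate_counted(children, name, min_occurs, max_occurs):
    """Validate a content model that is just `name` repeated min_occurs..max_occurs times.

    Returns None if the sequence is accepted, else an error message.
    """
    count = 0  # the counter stands in for a long chain of states
    for child in children:
        if child != name:
            return f"Element {child} is not allowed here"
        if count == max_occurs:
            # The loop edge may not be taken once the upper bound is reached.
            return f"Element {child} is not allowed here: more than {max_occurs} occurrences"
        count += 1
    if count < min_occurs:
        # The state only counts as final once the lower bound is satisfied.
        return "Element content is incomplete"
    return None
```

For example, validate_counted(["book"] * 150, "book", 100, 200) accepts the input, while 99 occurrences leave the content incomplete and the 201st occurrence is rejected.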