No serious work on optimizing (or measuring) performance has yet been done. However, it's useful to get some very preliminary data to assess whether performance is going to be a major obstacle to the feasibility of the approach.
I constructed a valid data file containing ten thousand book elements.
With the existing Saxon-EE schema validator, validation took 1.1 seconds.
With the XSLT validator, validation (using Saxon-EE on Java as the XSLT engine) took 9.4 seconds.
This gives a ballpark estimate of the relative efficiency: roughly a factor of 8.5. It is not a thorough benchmark in any way; there is no point in doing a thorough benchmark until some basic performance tuning has been done.
There are clearly many opportunities for performance improvement. Some of the obvious inefficiencies, relative to the Java validator, include:
In evaluating items with a pattern facet, the regular expression is recompiled every time an item is validated. This is because the fn:matches() function precompiles the regular expression only if it is known statically; it makes no attempt to cache the compiled regular expression if the same regex is used repeatedly. The regex in this case is read from the SCM file at run-time, so no compile-time optimization is possible.
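To illustrate, a pattern-facet check in the generated validation logic might look something like the following sketch (the names $value and $facet are invented for illustration; the actual stylesheet's variable names may differ). Because the second argument to matches() is a run-time value, the function cannot compile the regex once at stylesheet compile time:

```xslt
<!-- Hypothetical sketch: $facet?value holds the pattern facet's regex,
     read from the SCM file at run time. Since the regex is not a string
     literal, fn:matches must compile it afresh on every call. -->
<xsl:if test="not(matches($value, $facet?value))">
  <xsl:sequence select="map { 'error':
      'value does not match pattern ' || $facet?value }"/>
</xsl:if>
```

An obvious remedy would be for the implementation to maintain a cache keyed on the regex string, so that repeated validation of items against the same facet reuses the compiled form.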
Similarly, XPath expressions used in assertions may be recompiled every time they are used. There are some circumstances in which xsl:evaluate will cache compiled XPath expressions that are used repeatedly, but this does not appear to be happening in this stylesheet.
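Assertion checking with xsl:evaluate might be sketched as follows (the names $assert and $node are invented for illustration). The XPath text comes from the SCM file, so the expression reaching xsl:evaluate is a dynamic string; whether the processor reuses a compiled form across calls depends on its internal caching:

```xslt
<!-- Hypothetical sketch: $assert?test holds the assertion's XPath text,
     read from the SCM file; $node is the element being validated. -->
<xsl:evaluate xpath="$assert?test" context-item="$node">
  <xsl:with-param name="value" select="string($node)"/>
</xsl:evaluate>
```

Since the same assertion text is evaluated once per validated element, caching the compiled expression keyed on that text should be a significant win here.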
Too much data is being retained from validation and passed upwards from the validation of a child to the validation of its parent. This results in bloated maps of validation outcomes that take a long time to combine. It's probably not difficult to find "low-hanging" optimizations in this area.
It's clear that a lot of the time is being spent creating and combining the maps that are used to pass data up the tree.
The whole application relies very heavily on maps, and its performance depends on the performance of map operations such as map:put and map:merge. It's possible that it might benefit from a different implementation of maps, tailored to the usage patterns that occur when map types are declared as tuple types. It could also benefit from changes to the application to make more selective use of maps. In particular, we seem to be incurring heavy costs inspecting and copying maps that are actually empty, because all the data is valid: there's clearly an opportunity for optimizations here.
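One such optimization might look like the following sketch (the variable $childResults is invented for illustration, standing for the sequence of per-child outcome maps; the map namespace is assumed bound to http://www.w3.org/2005/xpath-functions/map). The idea is simply to filter out empty maps before merging, so that the common all-valid case does no map copying at all:

```xslt
<!-- Hypothetical sketch: combine per-child validation outcome maps,
     skipping the (common) empty maps produced when a child is valid. -->
<xsl:variable name="nonEmpty"
              select="$childResults[map:size(.) gt 0]"/>
<xsl:sequence select="if (empty($nonEmpty))
                      then map {}
                      else map:merge($nonEmpty,
                                     map { 'duplicates': 'combine' })"/>
```

This is only a sketch of the general approach; where exactly the empty-map costs arise in the actual stylesheet would need profiling to confirm.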
For the moment, all we can conclude about performance is that more work needs to be done. Making the XSLT validator as fast as the existing Java validator is probably unachievable, but we should be able to get acceptably close.