What have we learned from this case study? Quite a lot. We've learned about things that work well, we've learned about how to take best advantage of some of the new constructs in the language, we've generated ideas for further refinements to the language specs (some of which were implemented during the course of the study), and we've learned about areas where there is still room for further improvements.
Here's a list of some of the more important observations.
When converting XML to JSON, we discovered the importance of achieving a mapping that is consistent not only across a large collection of instance documents, but also over time, even though tomorrow's instance documents might not have exactly the same structure as today's. We redesigned the element-to-map function to meet this requirement.
The plan constructed by the element-to-map-plan function seems to work well on the samples we needed to convert, given a set of input documents that is sufficiently large and representative.
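The idea of deriving a plan from a corpus and then applying it uniformly can be illustrated outside XSLT. The following is a minimal Python sketch, not the algorithm used by element-to-map-plan: the names build_plan and to_json are hypothetical, and the only decision modelled is whether a child element is always emitted as an array. It assumes that a child is treated as repeating if it repeats in at least one sample document, and it ignores attributes and mixed content.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_plan(documents):
    """Map each parent tag to the child tags that repeat in at least one instance.

    Hypothetical simplification: a real plan would record much more than this.
    """
    repeating = {}
    for doc in documents:
        for elem in ET.fromstring(doc).iter():
            counts = Counter(child.tag for child in elem)
            seen = repeating.setdefault(elem.tag, set())
            seen.update(tag for tag, n in counts.items() if n > 1)
    return repeating


def to_json(elem, plan):
    """Convert an element to JSON-like data, using the plan rather than the instance."""
    children = list(elem)
    if not children:
        return elem.text or ""
    result = {}
    for child in children:
        value = to_json(child, plan)
        if child.tag in plan.get(elem.tag, set()):
            # The plan says this child can repeat, so it is always an array,
            # even in a document where it happens to occur only once.
            result.setdefault(child.tag, []).append(value)
        else:
            result[child.tag] = value
    return result


docs = [
    "<order><item>pen</item><item>ink</item></order>",
    "<order><item>paper</item></order>",
]
plan = build_plan(docs)
converted = [to_json(ET.fromstring(d), plan) for d in docs]
```

Because the decision is taken from the plan, the second order yields an array containing one item rather than a bare string, so both documents have the same JSON shape. A document that arrives later can be converted with the same stored plan, which is what keeps the mapping stable over time.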
We found that it's easiest to define template rules for maps if they can be written to depend only on the internal structure of the map, and not on the key used to identify the map within a larger structure. Whether this is possible depends on the design of the JSON to be transformed. Writing match patterns of the form match="record(real, complex)" that recognise the type of a map from the names of its fields is often a good approach. Sometimes one would also like to match on the value of a field, for example match="record(type, *)[?type='xxx']". It would be nice to have syntax that's less clumsy for this. There's a temptation for users to reduce it to match=".[?type='xxx']" but this seems to lack clarity.
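The two kinds of test described here (recognising a map by its field names, and additionally by the value of one field) can be sketched in Python as an analogy. The function names is_record and matches_type are hypothetical. The sketch assumes that a record test without a trailing * requires exactly the listed fields, while one with * allows additional fields.

```python
def is_record(value, *fields, extensible=False):
    """Return True if value is a map whose keys fit the given field names.

    Assumption: without extensible=True the keys must be exactly the named
    fields; with it, extra keys are allowed (like the trailing * in the pattern).
    """
    if not isinstance(value, dict):
        return False
    keys = set(value)
    required = set(fields)
    return required <= keys if extensible else required == keys


def matches_type(value, type_name):
    """Analogue of matching a map that has a 'type' field with a given value."""
    return is_record(value, "type", extensible=True) and value["type"] == type_name


complex_number = {"real": 1.5, "complex": -2.0}
shape = {"type": "circle", "radius": 3}

print(is_record(complex_number, "real", "complex"))  # structural match
print(matches_type(shape, "circle"))                 # structural match plus value test
```

Neither test refers to the key under which the map is stored in its container, which is the property that makes such rules easy to write.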
XSLT often processes selected child elements or attributes by inline code within a template, and then processes the remainder using a construct such as <xsl:apply-templates select="* except (X, Y, Z)"/> where X, Y, Z are the children that have been given special treatment. The except operator works only on nodes, and the lookup operator ? currently provides no similar capability to select all properties except some specifically-named ones. One option is to provide lower-priority template rules that match X, Y, and Z and do nothing with them.
For this and other reasons, it is often useful to match values appearing in a tree
of maps and arrays by their associated key. The syntax match="?keyval"
has been proposed for this. The semantics, though, depend on values being labelled with
their associated key, and the full complexities of this (and the usability problems that
it might introduce) are not yet fully understood.
It would be useful for the union, except, and intersect operators in patterns to apply to all kinds of pattern, not only patterns that match nodes. (The semantics of these operators in a pattern have already diverged in detail from their XPath semantics.)
It would be nice to have some equivalent to match="/" to match the root of a tree.
I had been concerned about how template rules should process arrays. The case study revealed no problems in this area. In most cases arrays are not processed by matching them in a template rule, but by iterating over the array in the template rule for its container.
The current syntax for constructing maps and arrays in XSLT is rather verbose, and could be improved for many common use cases. Sometimes the right answer is to do it in XPath: the introduction of the fn:apply-templates function and the xsl:select instruction both facilitate that. In other cases the new xsl:record instruction helps. An equivalent to array:build as an XSLT instruction has also been mooted.
Neither xsl:record nor XPath map constructors make it easy to include an entry in the constructed map conditionally.
Pinned maps and arrays make access to containing (ancestor) arrays and maps possible, but the current syntax for doing so is very clumsy.
We've added quite a lot of functionality in the form of modifiers for lookup expressions (such as $x?pair::y), but this case study didn't identify any situations where they proved useful.
When the JSON structure uses arrays of maps (which is quite common), paths such as ?x?*?y?*?z start to appear frequently (and are very hard to debug when they select nothing). Could this be improved?
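To show why such paths fail silently, here is a minimal Python sketch of a simplified lookup over nested dicts and lists. It is an analogy only: the functions step, lookup, and first_empty_step are hypothetical, and the sketch handles only named keys on maps and the * wildcard on maps and arrays, ignoring every other combination. The last function illustrates one possible debugging aid, reporting the first step that selects nothing.

```python
def step(items, key):
    """Apply one lookup step to every item, in the spirit of ?key or ?*."""
    out = []
    for item in items:
        if key == "*":
            if isinstance(item, dict):
                out.extend(item.values())
            elif isinstance(item, list):
                out.extend(item)
        elif isinstance(item, dict) and key in item:
            out.append(item[key])
        # Anything else contributes nothing, which is why mistakes go unnoticed.
    return out


def lookup(root, keys):
    """Evaluate a whole path; an empty list means the path selected nothing."""
    items = [root]
    for key in keys:
        items = step(items, key)
    return items


def first_empty_step(root, keys):
    """Return the index of the first step that selects nothing, or None."""
    items = [root]
    for index, key in enumerate(keys):
        items = step(items, key)
        if not items:
            return index
    return None


data = {"x": [{"y": [{"z": 1}, {"z": 2}]}, {"y": [{"z": 3}]}]}
good_path = ["x", "*", "y", "*", "z"]
bad_path = ["x", "*", "Y", "*", "z"]  # wrong case in the third step

print(lookup(data, good_path))
print(lookup(data, bad_path))
print(first_empty_step(data, bad_path))
```

With the misspelt key the plain lookup simply returns an empty result, giving no hint as to which of the five steps went wrong; the diagnostic variant identifies the third step (index 2).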