Processing Observations

This representation requires that some elements be split, although there is no requirement for a dominant hierarchy as such. A given overlap situation can be represented in a number of different ways, all of which are valid in that they correctly represent the overlap. As mentioned earlier, a representation is deemed correct if each of the original hierarchies can be generated from it without loss, which implies that there may be more than one correct representation. There may be different ways to define an 'optimal' representation, but minimising the splitting of elements seems to be a key aspect of any such definition.
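The correctness criterion can be illustrated with a minimal sketch. It assumes, purely for illustration, that each element is reduced to a character range and that the fragments of a split element share an identifier; the names `Fragment`, `element_id` and `rejoin` are hypothetical and are not part of the delta format itself.

```python
from typing import Dict, List, NamedTuple


class Fragment(NamedTuple):
    element_id: str  # hypothetical id shared by all fragments of one original element
    name: str        # element name, e.g. "b"
    start: int       # start offset in the text (inclusive)
    end: int         # end offset in the text (exclusive)


def rejoin(fragments: List[Fragment]) -> List[Fragment]:
    """Merge fragments that share an element_id back into whole elements."""
    merged: Dict[str, Fragment] = {}
    for frag in sorted(fragments, key=lambda f: f.start):
        prev = merged.get(frag.element_id)
        if prev is None:
            merged[frag.element_id] = frag
        elif prev.end == frag.start:
            # Contiguous fragment of the same element: extend it.
            merged[frag.element_id] = prev._replace(end=frag.end)
        else:
            raise ValueError(f"fragments of {frag.element_id} are not contiguous")
    # Outer elements first, so the result can be compared directly.
    return sorted(merged.values(), key=lambda f: (f.start, -f.end))


# Original overlap: <b> covers 0-10, <i> covers 5-15.
original = [Fragment("b1", "b", 0, 10), Fragment("i1", "i", 5, 15)]

# Two different split representations of the same overlap.
split_i = [Fragment("b1", "b", 0, 10),
           Fragment("i1", "i", 5, 10), Fragment("i1", "i", 10, 15)]
split_b = [Fragment("b1", "b", 0, 5), Fragment("b1", "b", 5, 10),
           Fragment("i1", "i", 5, 15)]

# Both regenerate the original ranges, so both count as correct.
print(rejoin(split_i) == original, rejoin(split_b) == original)
```

Under these assumptions, splitting either element gives a representation from which the original ranges can be recovered, which is why more than one representation can be correct.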

Some form of milestone representation may be a starting point because this is one of the most intuitive ways to represent overlap. How then do we process a milestone representation to get to an optimal solution using the delta format described here? The problem is to work out where there is overlap and which element or elements need to be split to remove it. This is fairly easy for a simple overlap of two elements, but it is not simple when there are multiple, arbitrary overlaps. Minimising splits can result in differing XML hierarchies, e.g. a particular element type in document A may both surround and be nested within the same element type from document B. Both the A and B documents can be generated from this, but it can look odd in the delta file, and a processor cannot rely on the same nesting in all situations. It is also true that although elements such as <b> or <i> could be nested either one inside the other (the nesting is commutative), this may not be the case with all element combinations.
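The simple two-element case can be sketched as follows. This is an illustrative sketch only: it assumes each pair of start and end milestones has already been reduced to a character range, and the names `Span`, `overlaps`, `split_at` and `resolve_pair` are hypothetical. It does not attempt the harder problem of minimising splits across multiple, arbitrary overlaps.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    name: str   # element name, e.g. "b"
    start: int  # offset of the start milestone (inclusive)
    end: int    # offset of the end milestone (exclusive)


def overlaps(a: Span, b: Span) -> bool:
    """True if a and b intersect but neither contains the other."""
    return (a.start < b.start < a.end < b.end
            or b.start < a.start < b.end < a.end)


def split_at(span: Span, offset: int) -> Tuple[Span, Span]:
    """Split one span into two adjacent spans at the given offset."""
    return Span(span.name, span.start, offset), Span(span.name, offset, span.end)


def resolve_pair(keep: Span, split: Span) -> List[Span]:
    """Remove the overlap between two spans by splitting the second one."""
    if not overlaps(keep, split):
        return [keep, split]
    # Cut at whichever boundary of `keep` falls inside `split`.
    cut = keep.end if keep.start < split.start else keep.start
    left, right = split_at(split, cut)
    return [keep, left, right]


b = Span("b", 0, 10)
i = Span("i", 5, 15)
print(resolve_pair(b, i))  # keeps <b> whole and splits <i> at offset 10
print(resolve_pair(i, b))  # keeps <i> whole and splits <b> at offset 5
```

Either call removes the overlap with a single split. Choosing which element to split is where the optimisation lies once several overlaps interact, because a split made for one pair can create or remove the need for splits elsewhere.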

In practice, different representations are useful for different purposes. In some cases it is preferable to avoid overlap by duplicating content, but then the issue is to find the minimum duplication needed to achieve this. It may be easy in some situations to process milestones, but again it is preferable that milestones are used only where they are needed. Once the minimal split representation has been found, it is then simple to derive a representation with the minimum number of milestones from it.