How Content Duplication Represents Any Change

Our starting point was an existing solution (a delta format) for representing change to elements, attributes and text in XML documents.[28] Any change could be represented, but changes to structure required some duplication of content. For example, two paragraphs (denoted A and B) might be:

<p>The quick brown fox.</p>

and

<p>The <s>quick</s> brown fox.</p>

This is a change only to the XML tag structure; the textual content is unchanged. However, we can represent the change by deleting the word ‘quick’ and adding the element

<s>quick</s>

This is a perfectly valid representation of the change, but it implies that text has been deleted and added, and thus that the text has changed. This representation is shown below. The dx attribute indicates the documents in which the element and its content were present. The deltaxml:textGroup and deltaxml:text elements are wrappers introduced to delineate the word that has been deleted. A wrapper is needed because a text node cannot carry attributes, so the dx attribute that applies to the text must be placed on an enclosing element. The wrapper is doubled because there may be more than one variant of the text, and hence more than one deltaxml:text element; grouping these variants in the outer deltaxml:textGroup makes them easier to process.

<p dx="A,B">The
   <deltaxml:textGroup dx="A">
      <deltaxml:text dx="A">quick</deltaxml:text>
   </deltaxml:textGroup>
   <s dx="B">quick</s>
   brown fox.
</p>
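To make the meaning of the dx attribute concrete, the following Python sketch recovers each original paragraph from the delta above. An element is kept when its dx attribute lists the requested version, and a deltaxml:textGroup contributes the text of its matching deltaxml:text child with the wrappers removed. It is a minimal sketch under stated assumptions: dx is taken to be a comma-separated list of version names, the namespace URI is a placeholder (the simplified example does not declare one, but a parser requires it), and the function names `extract` and `normalized_text` are hypothetical, not part of any DeltaXML API.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption): the simplified example in the
# text uses the deltaxml: prefix without declaring it.
NS = "urn:example:deltaxml"
TEXT_GROUP = f"{{{NS}}}textGroup"
TEXT = f"{{{NS}}}text"

DELTA = f"""<p xmlns:deltaxml="{NS}" dx="A,B">The
   <deltaxml:textGroup dx="A">
      <deltaxml:text dx="A">quick</deltaxml:text>
   </deltaxml:textGroup>
   <s dx="B">quick</s>
   brown fox.
</p>"""


def present_in(elem, version):
    """True if elem's dx list names the version (assumed comma-separated)."""
    dx = elem.get("dx")
    return dx is None or version in dx.split(",")


def extract(elem, version):
    """Hypothetical helper: copy elem, keeping only content present in version."""
    out = ET.Element(elem.tag, {k: v for k, v in elem.attrib.items() if k != "dx"})
    # Gather text strings and copied child elements in document order.
    pieces = [elem.text or ""]
    for child in elem:
        if child.tag == TEXT_GROUP:
            if present_in(child, version):
                # Splice in the matching text variant without its wrappers.
                for variant in child.findall(TEXT):
                    if present_in(variant, version):
                        pieces.append(variant.text or "")
        elif present_in(child, version):
            pieces.append(extract(child, version))
        # A child's tail belongs to the parent, so keep it even if the child is dropped.
        pieces.append(child.tail or "")
    # Reassemble: strings attach to out.text or to the previous element's tail.
    out.text = ""
    last = None
    for piece in pieces:
        if isinstance(piece, str):
            if last is None:
                out.text += piece
            else:
                last.tail += piece
        else:
            piece.tail = ""
            out.append(piece)
            last = piece
    return out


def normalized_text(elem):
    """All text content of elem with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


root = ET.fromstring(DELTA)
for version in ("A", "B"):
    doc = extract(root, version)
    print(version, normalized_text(doc), doc.find("s") is not None)
```

Running it should report the same text, "The quick brown fox.", for both A and B, with the s element found only in B. This illustrates the problem described above: the underlying text of the two versions is identical, yet the delta records the difference as a deletion of text in A and an addition of an element in B.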

It would be preferable if we could represent this change without implying change to the content. This is discussed in the next section.



[28] The delta format used here is a simplified form of the DeltaXML DeltaV2.1 format [10]. The dx attribute would normally be a deltaxml:deltaV2 attribute, and its value would also indicate whether the documents were the same or different for this element. This distinction is not important for this paper and so has been omitted to make the examples simpler.