Representing XML Element Tag Change in diff3x


XML tags present a problem for the diff3 format: in general it is not possible to ensure a well-formed result without unacceptable duplication of content. To handle tag changes in diff3x we need to treat the XML payload, i.e. the XML that is the subject of change, as text, and so escape it with CDATA markers. Later in this paper we look at treating an XML payload as XML, which is more natural but, as we shall see, cannot represent tag changes.

Here is an example of a change of structure.

Table 5. XML tag change

A.txt

<p>This is a long
paragraph where
<strong>most of
it has been made
either bold or
italic, but the rest
of the paragraph
remains unchanged -
there is no change
to the text so
we do not want it
duplicated</strong>.
</p>

O.txt

<p>This is a long
paragraph where
most of
it has been made
either bold or
italic, but the rest
of the paragraph
remains unchanged -
there is no change
to the text so
we do not want it
duplicated.
</p>

B.txt

<p>This is a long
paragraph where
<italic>most of
it has been made
either bold or
italic, but the rest
of the paragraph
remains unchanged -
there is no change
to the text so
we do not want it
duplicated</italic>.
</p>

This could be represented as shown below, but the unchanged text is duplicated in all three options. The duplication is confusing: had there been a small change within that text, the user would have found it difficult to see. Note that although the payload in this example is XML, it is not seen as part of the XML of the carrier; it is treated as text because it is enclosed in CDATA sections.

<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>This is a long paragraph 
where ]]>
<choice3>
<a><![CDATA[<strong>most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated</strong>
]]></a>
<o><![CDATA[most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated]]></o>
<b><![CDATA[<italic>most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated</italic>
]]></b></choice3>
<![CDATA[. </p>]]>
</diff3x>
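
As a rough illustration of what "payload treated as text" means for a consuming tool, the following Python sketch resolves a carrier of this shape by picking one side in every choice3. The function name `resolve` and the shortened payload text are assumptions made for this example; they are not part of diff3x. The standard-library parser delivers CDATA content as ordinary character data, so the payload tags never become carrier elements.

```python
import xml.etree.ElementTree as ET

# Shortened version of the listing above (payload text abbreviated for the example).
DIFF3X = """<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>This is a long paragraph where ]]>
<choice3>
<a><![CDATA[<strong>most of it unchanged</strong>]]></a>
<o><![CDATA[most of it unchanged]]></o>
<b><![CDATA[<italic>most of it unchanged</italic>]]></b></choice3>
<![CDATA[. </p>]]>
</diff3x>"""


def resolve(doc: str, side: str) -> str:
    """Return the payload text obtained by taking `side` ('a', 'o' or 'b')
    in every choice3 of the carrier (hypothetical helper, not a diff3x API)."""
    root = ET.fromstring(doc)
    parts = [root.text or ""]  # payload text before the first choice
    for choice in root:
        option = choice.find(side)
        if option is not None:
            parts.append(option.text or "")  # CDATA arrives as plain text
        parts.append(choice.tail or "")  # payload text after this choice
    return "".join(parts)


if __name__ == "__main__":
    for side in ("a", "o", "b"):
        print(side, "->", " ".join(resolve(DIFF3X, side).split()))
```

Whitespace between carrier elements is kept as it stands in this sketch; a real tool would need a rule for it.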

We can improve this significantly by using the id attributes described above to ensure consistent choices. In this case we have given each option within the two connected choices an id value, together with a select attribute that names the corresponding option in the other choice. Thus if option <a> with id="a42" is selected in the first choice, then the option with id="a420" in the second choice is automatically selected. The select attributes also work 'vice versa', so that the same happens the other way round: if the end tag is selected then the corresponding start tag is also selected.

<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>This is a long paragraph 
where ]]>
<choice3>
<a id="a42" select="a420" include="true"><![CDATA[<strong>]]></a>
<o/>
<b id="b43" select="b430"><![CDATA[<italic>]]></b></choice3>
most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated
<choice3>
<b select="b43" id="b430"><![CDATA[</italic>]]></b>
<o/>
<a select="a42" id="a420"><![CDATA[</strong>]]></a></choice3>
<![CDATA[. </p>]]>
</diff3x>
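
The linked selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each select attribute is taken to hold a single option id, the include attribute is ignored, and the helper names `linked_ids` and `resolve` are invented for the example rather than taken from any diff3x tool. The payload text is shortened.

```python
import xml.etree.ElementTree as ET

# Shortened version of the listing above, written without inter-element whitespace.
DIFF3X = (
    '<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>Some ]]>'
    '<choice3>'
    '<a id="a42" select="a420" include="true"><![CDATA[<strong>]]></a>'
    '<o/>'
    '<b id="b43" select="b430"><![CDATA[<italic>]]></b>'
    '</choice3>'
    'unchanged text'
    '<choice3>'
    '<b select="b43" id="b430"><![CDATA[</italic>]]></b>'
    '<o/>'
    '<a select="a42" id="a420"><![CDATA[</strong>]]></a>'
    '</choice3>'
    '<![CDATA[. </p>]]>'
    '</diff3x>'
)


def linked_ids(root, start_id):
    """Follow select="..." links from start_id until no new option ids appear.
    Assumes each select attribute names exactly one option id."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    chosen, pending = set(), [start_id]
    while pending:
        current = pending.pop()
        if current in chosen or current not in by_id:
            continue
        chosen.add(current)
        target = by_id[current].get("select")
        if target:
            pending.append(target)
    return chosen


def resolve(doc, start_id):
    """Build the payload text after the user picks the option with id start_id."""
    root = ET.fromstring(doc)
    chosen = linked_ids(root, start_id)
    parts = [root.text or ""]
    for choice in root:
        for option in choice:
            if option.get("id") in chosen:
                parts.append(option.text or "")
        parts.append(choice.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(resolve(DIFF3X, "a42"))   # start tag picked, end tag follows
    print(resolve(DIFF3X, "b430"))  # end tag picked, start tag follows
```

Because the links point in both directions, picking either the start-tag option or the end-tag option gives the same result in this sketch.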

If the GUI tool allowed multiple options to be selected, then both <strong> and <italic> could be selected, provided of course that they appeared in the correct nested order. We have assumed that the user has the knowledge to judge whether selecting two options is appropriate.
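
A tool could also help the user with that judgement by checking whether a multiple selection produces a well-formed payload. The sketch below is one possible approach, not something diff3x specifies: it assumes selected options are emitted in document order and that each select attribute holds a single id, and the names `resolve_many`, `is_well_formed` and the `SWAPPED` variant are hypothetical.

```python
import xml.etree.ElementTree as ET

HEAD = (
    '<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>Some ]]>'
    '<choice3><a id="a42" select="a420"><![CDATA[<strong>]]></a><o/>'
    '<b id="b43" select="b430"><![CDATA[<italic>]]></b></choice3>'
    'unchanged text<choice3>'
)
END_A = '<a select="a42" id="a420"><![CDATA[</strong>]]></a>'
END_B = '<b select="b43" id="b430"><![CDATA[</italic>]]></b>'
TAIL = '</choice3><![CDATA[. </p>]]></diff3x>'

# End tags in the order used in the listing above (b before a).
AS_LISTED = HEAD + END_B + '<o/>' + END_A + TAIL
# Hypothetical variant with the end tags in the other order.
SWAPPED = HEAD + END_A + '<o/>' + END_B + TAIL


def resolve_many(doc, start_ids):
    """Emit every selected option, plus the options linked via select, in document order."""
    root = ET.fromstring(doc)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    chosen, pending = set(), list(start_ids)
    while pending:
        current = pending.pop()
        if current in by_id and current not in chosen:
            chosen.add(current)
            pending.append(by_id[current].get("select"))  # None is skipped above
    parts = [root.text or ""]
    for choice in root:
        parts.extend(o.text or "" for o in choice if o.get("id") in chosen)
        parts.append(choice.tail or "")
    return "".join(parts)


def is_well_formed(payload):
    """True if the resolved payload parses as XML."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for name, doc in (("as listed", AS_LISTED), ("swapped", SWAPPED)):
        result = resolve_many(doc, {"a42", "b43"})
        print(name, is_well_formed(result), result)
```

With the end tags in the order used in the listing, selecting both start tags nests correctly; with the swapped order the tags overlap and the check fails.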