XML tags present a problem for the diff3 format: in general it is not possible to ensure a well-formed result without unacceptable duplication of content. To handle tag changes in diff3x we need to treat the XML payload, i.e. the XML that is the subject of change, as text, and so escape it with CDATA markers. Later in this paper we look at treating an XML payload as XML, which is more natural but, as we shall see, cannot represent tag changes.
Here is an example of a change of structure.
Table 5. XML tag change
A.txt | O.txt | B.txt |
---|---|---|
<p>This is a long paragraph where <strong>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</strong>. </p> | <p>This is a long paragraph where most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated. </p> | <p>This is a long paragraph where <italic>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</italic>. </p> |
This could be represented as shown below, but the unchanged text is then duplicated in every option. This is confusing: if there had also been a small change to the text, the user would have found it difficult to see. Note that in this example the payload is XML, but it is not seen as part of the XML of the carrier; the payload is treated as text because it is enclosed in CDATA sections.
<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>This is a long paragraph where ]]> <choice3> <a><![CDATA[<strong>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</strong> ]]></a> <o><![CDATA[most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated]]></o> <b><![CDATA[<italic>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</italic> ]]></b></choice3> <![CDATA[. </p>]]> </diff3x>
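The effect of the CDATA markers can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the fragment is an abbreviated, hypothetical `<o>` option rather than the full example, and the variable names are ours. It shows that the carrier parser sees no child elements inside the option, only character data.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical carrier fragment: the payload markup sits inside CDATA.
FRAGMENT = "<o><![CDATA[<p>unchanged <strong>text</strong></p>]]></o>"

option = ET.fromstring(FRAGMENT)

# Number of child elements the carrier parser sees inside the option.
child_count = len(option)

# The payload, delivered to the application as plain character data.
payload_text = option.text
```

Under these assumptions the payload tags never take part in the well-formedness of the carrier, which is what allows a start tag and its end tag to be placed in different options.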
We can reduce this duplication significantly by using the id attributes as described above to ensure consistent choices. In this case we have given the options within the two connected choices an id value. Thus if option <a> with id="a42" is selected in one choice, then the option with id="a420" in the other choice is automatically selected. We have also added 'vice versa' attributes so that the same happens the other way round: if the end tag is selected, then the corresponding start tag is also selected.
<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>This is a long paragraph where ]]> <choice3 > <a id="a42" select="a420" include="true"><![CDATA[<strong>]]></a> <o/> <b id="b43" select="b430"><![CDATA[<italic>]]></b></choice3> most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated <choice3> <b select="b43" id="b430"><![CDATA[</italic>]]></b> <o/> <a select="a42" id="a420"><![CDATA[</strong>]]></a></choice3> <![CDATA[. </p>]]> </diff3x>
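The linked selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive resolver: the paragraph text is abbreviated, whitespace between carrier elements is removed so that the output is exact, each `select` attribute is assumed to hold a single id, and the function names `linked_ids` and `resolve` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the example above (assumption: no whitespace
# between carrier elements, shortened paragraph text).
DOC = (
    '<diff3x a="A.txt" b="B.txt" o="O.txt"><![CDATA[<p>Some ]]>'
    '<choice3><a id="a42" select="a420"><![CDATA[<strong>]]></a><o/>'
    '<b id="b43" select="b430"><![CDATA[<italic>]]></b></choice3>'
    'unchanged text'
    '<choice3><b select="b43" id="b430"><![CDATA[</italic>]]></b><o/>'
    '<a select="a42" id="a420"><![CDATA[</strong>]]></a></choice3>'
    '<![CDATA[. </p>]]></diff3x>'
)


def linked_ids(root, start_id):
    """Return start_id plus every option id reachable via select attributes."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    chosen, pending = set(), [start_id]
    while pending:
        current = pending.pop()
        if current in chosen or current not in by_id:
            continue
        chosen.add(current)
        target = by_id[current].get("select")  # assumed to be a single id
        if target:
            pending.append(target)
    return chosen


def resolve(doc, start_id):
    """Build the payload text that results from selecting one option."""
    root = ET.fromstring(doc)
    chosen = linked_ids(root, start_id)
    parts = [root.text or ""]
    for choice in root:
        for option in choice:
            if option.get("id") in chosen:
                parts.append(option.text or "")
        parts.append(choice.tail or "")  # unchanged text after the choice
    return "".join(parts)
```

In this sketch `resolve(DOC, "a42")` and `resolve(DOC, "a420")` give the same payload, which reflects the 'vice versa' attributes: selecting either the start tag or the end tag pulls in its partner.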
If the GUI tool allowed multiple options to be selected, then both <strong> and <italic> could be selected, provided of course that they appeared in the correct nested order. We have assumed that the user has the knowledge to judge whether selecting two options is appropriate.
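A tool could also support the user here by checking the resolved payload mechanically. The sketch below is one possible check, not part of diff3x itself: it parses a candidate payload with Python's standard `xml.etree.ElementTree` and reports whether it is well-formed. The function name `is_well_formed` and the two abbreviated sample payloads are ours.

```python
import xml.etree.ElementTree as ET


def is_well_formed(payload: str) -> bool:
    """Return True if the resolved payload parses as well-formed XML."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return False
    return True


# Both tag pairs selected, with the end tags in the correct nested order.
nested = "<p>where <strong><italic>most of it</italic></strong>. </p>"

# Both tag pairs selected, but with the end tags crossing.
crossed = "<p>where <strong><italic>most of it</strong></italic>. </p>"

nested_ok = is_well_formed(nested)
crossed_ok = is_well_formed(crossed)
```

Such a check only confirms well-formedness. Whether bold and italic together is the intended result remains a decision for the user.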