Representing XML Element Tag Change in diff3

XML tags present a problem for the diff3 format: in general, it is not possible to guarantee a well-formed result without unacceptable duplication of content. Here is an example of a structural change.

Table 2. XML tag change

A.txt

<p>This is a 
long paragraph 
where <strong>most 
of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated</strong>.
 </p>

O.txt

<p>This is a 
long paragraph 
where most 
of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated.
 </p>

B.txt

<p>This is a 
long paragraph 
where <italic>most 
of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated</italic>.
 </p>


This could be represented as shown below, but the unchanged text is then duplicated in all three versions. Such duplication is confusing: if there had also been a small change to the text, the user would find it difficult to spot.

<p>This is a long paragraph 
where 
<<<<<<< A.txt
<strong>most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated</strong>
||||||| O.txt
most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated
=======
<italic>most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated</italic>
>>>>>>> B.txt
. </p>

We can improve this representation, but at the cost of relying on the user to make consistent choices: whichever source is chosen for the start tag must also be chosen for the end tag.

<p>This is a long paragraph 
where 
<<<<<<< A.txt
<strong>
||||||| O.txt
 
=======
<italic>
>>>>>>> B.txt
most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated
<<<<<<< A.txt
</strong>
||||||| O.txt
 
=======
</italic>
>>>>>>> B.txt
. </p>
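To illustrate why the choices must be consistent, the following minimal Python sketch tries every combination of start-tag and end-tag choice and checks the result for well-formedness. The names `START`, `END`, `resolve_pair` and `is_well_formed` are hypothetical, and the paragraph text is shortened for brevity; this is an illustration, not part of any diff3 tool.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the unchanged paragraph text (an assumption for brevity).
TEXT = "most of it has been made either bold or italic"

# Content offered by each of the two conflicts, keyed by source file.
START = {"A.txt": "<strong>", "O.txt": "", "B.txt": "<italic>"}
END = {"A.txt": "</strong>", "O.txt": "", "B.txt": "</italic>"}


def resolve_pair(start_source, end_source):
    """Build the paragraph from one choice per conflict."""
    return (
        "<p>This is a long paragraph where "
        f"{START[start_source]}{TEXT}{END[end_source]}. </p>"
    )


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Try all nine combinations of start-tag and end-tag choice.
for start_source in START:
    for end_source in END:
        ok = is_well_formed(resolve_pair(start_source, end_source))
        print(f"{start_source:6} {end_source:6} well-formed: {ok}")
```

Of the nine combinations, only the three in which both conflicts take their content from the same source file parse successfully; every mixed selection leaves a start tag or an end tag unmatched.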

What we really need here is some way to connect the related choices so that if the <strong> start tag is selected, the matching </strong> end tag is also selected automatically. One simple way to achieve this would be to add a choice id to the format. In this case, we have given both conflicts an id value of 42. This is shown below.

<p>This is a long paragraph 
where 
<<<<<<<42< A.txt
<strong>
||||||| O.txt
 
=======
<italic>
>>>>>>> B.txt
most of it has 
been made either bold or
italic, but the rest of 
the paragraph remains 
unchanged - there is no 
change to the text so
we do not want it 
duplicated
<<<<<<<42< A.txt
</strong>
||||||| O.txt
 
=======
</italic>
>>>>>>> B.txt
. </p>

There are many ways this connection could be achieved syntactically; this is just one. The rules here would be:

  1. A conflict may be labelled with an id.

  2. For any labelled conflict, there must be at least one other labelled conflict with the same id value.

  3. The selection of a choice within a conflict with an id automatically results in the selection of the corresponding choice, i.e., the choice with the same source file, within conflicts with the same id.
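The three rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the marker syntax is taken from the example above (`<<<<<<<42< A.txt` for a labelled conflict), and the function names `parse`, `check_ids` and `resolve`, along with the shortened sample text, are hypothetical.

```python
import re
from collections import Counter

# Marker patterns, assuming the syntax used in the example above.
START_RE = re.compile(r"^<<<<<<<(?:(\d+)<)? (\S+)$")  # optional id (rule 1)
BASE_RE = re.compile(r"^\|{7} (\S+)$")
END_RE = re.compile(r"^>{7} (\S+)$")
SEPARATOR = "======="


def parse(lines):
    """Split lines into plain strings and conflict dicts with 'id' and 'choices'."""
    items, sections, conflict_id = [], None, None
    for line in lines:
        if sections is None:
            m = START_RE.match(line)
            if m:
                conflict_id, sections = m.group(1), [[m.group(2), []]]
            else:
                items.append(line)
        elif (m := BASE_RE.match(line)):
            sections.append([m.group(1), []])
        elif line == SEPARATOR:
            sections.append([None, []])  # source name arrives with the end marker
        elif (m := END_RE.match(line)):
            sections[-1][0] = m.group(1)
            choices = {name: body for name, body in sections}
            items.append({"id": conflict_id, "choices": choices})
            sections = None
        else:
            sections[-1][1].append(line)
    return items


def check_ids(items):
    """Rule 2: every id must label at least two conflicts."""
    counts = Counter(i["id"] for i in items
                     if isinstance(i, dict) and i["id"] is not None)
    lonely = sorted(cid for cid, n in counts.items() if n < 2)
    if lonely:
        raise ValueError(f"ids used by only one conflict: {lonely}")


def resolve(items, pick):
    """Rule 3: ask pick() once per id and reuse that source for linked conflicts."""
    chosen, out = {}, []
    for item in items:
        if isinstance(item, str):
            out.append(item)
        elif item["id"] is None:
            out.extend(item["choices"][pick(item)])
        else:
            source = chosen.setdefault(item["id"], None) or pick(item)
            chosen[item["id"]] = source
            out.extend(item["choices"][source])
    return out


DOC = """<p>This is a long paragraph
where
<<<<<<<42< A.txt
<strong>
||||||| O.txt

=======
<italic>
>>>>>>> B.txt
most of it has
<<<<<<<42< A.txt
</strong>
||||||| O.txt

=======
</italic>
>>>>>>> B.txt
. </p>""".splitlines()

items = parse(DOC)
check_ids(items)
print("\n".join(resolve(items, lambda conflict: "A.txt")))
```

Here the `pick` callback stands in for the user's decision. Because both conflicts carry the id 42, it is consulted only for the first of them, and the end tag follows the start tag automatically.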

Adding these ids is not a big change to the format, but it would make a significant difference to the ease of use of the diff3 format for structured data.