XML tags present a problem for diff3 format in that it is, in general, not possible to ensure a well-formed result without unacceptable duplication of content. Here is an example of a change of structure.
Table 2. XML tag change
A.txt | O.txt | B.txt |
---|---|---|
<p>This is a long paragraph where <strong>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</strong>. </p> | <p>This is a long paragraph where most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated. </p> | <p>This is a long paragraph where <italic>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</italic>. </p> |
This could be represented as shown below, but the unchanged text is then
duplicated in all three choices. Such duplication is confusing: had there been a
small change to the text, the user would have found it difficult to spot.
<p>This is a long paragraph where <<<<<<< A.txt <strong>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</strong> ||||||| O.txt most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated ======= <italic>most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated</italic> >>>>>>> B.txt . </p>
We can improve this representation, but only by relying on the user to make consistent choices in both conflicts.
<p>This is a long paragraph where <<<<<<< A.txt <strong> ||||||| O.txt ======= <italic> >>>>>>> B.txt most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated <<<<<<< A.txt </strong> ||||||| O.txt ======= </italic> >>>>>>> B.txt . </p>
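To illustrate why consistent choices matter in the two-conflict representation above, the following minimal sketch enumerates every pair of independent selections and checks whether the result is well-formed XML. The names `START`, `END`, `resolve` and `is_well_formed` are hypothetical, the paragraph text is shortened, and the single-letter keys A, O and B stand for A.txt, O.txt and B.txt.

```python
from itertools import product
import xml.etree.ElementTree as ET

# Shortened stand-in for the long paragraph text in the example.
TEXT = "most of it has been made either bold or italic"

# The two independent conflicts: one at the start tag, one at the end tag.
# Keys A, O, B stand for A.txt, O.txt and B.txt (illustrative naming).
START = {"A": "<strong>", "O": "", "B": "<italic>"}
END = {"A": "</strong>", "O": "", "B": "</italic>"}


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def resolve(start_choice: str, end_choice: str) -> str:
    """Build the paragraph for one pair of independent selections."""
    return (
        "<p>This is a long paragraph where "
        + START[start_choice] + TEXT + END[end_choice]
        + ". </p>"
    )


# Try all 3 x 3 combinations of selections.
results = {
    (s, e): is_well_formed(resolve(s, e))
    for s, e in product("AOB", repeat=2)
}
well_formed = sorted(pair for pair, ok in results.items() if ok)
print(well_formed)
```

Of the nine combinations, only the three in which both conflicts take the same source file parse successfully; the other six leave a tag unopened, unclosed or mismatched.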
What we really need here is some way to connect the relevant consistent choices so that if the <strong> start tag is selected, then the matching </strong> end tag is also selected automatically. One simple way to achieve this would be to add a choice id into the format. In this case, we have given both conflicts an id value of 42. This is shown below.
<p>This is a long paragraph where <<<<<<<42< A.txt <strong> ||||||| O.txt ======= <italic> >>>>>>> B.txt most of it has been made either bold or italic, but the rest of the paragraph remains unchanged - there is no change to the text so we do not want it duplicated <<<<<<<42< A.txt </strong> ||||||| O.txt ======= </italic> >>>>>>> B.txt . </p>
There are many ways this connection could be achieved syntactically; this is just one. The rules here would be:
A conflict may be labelled with an id.
For any labelled conflict, there must be at least one other labelled conflict with the same id value.
The selection of a choice within a conflict with an id automatically results in the selection of the corresponding choice, i.e., the choice with the same source file, within conflicts with the same id.
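The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Conflict` class, its `link_id` field, and the functions `check_ids` and `select` are hypothetical names, and the document is assumed to be already parsed into a list of plain strings and conflicts rather than read from the marker syntax.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Conflict:
    """One diff3 conflict: a choice per source file plus an optional id."""
    choices: dict[str, str]        # keyed by source file name, e.g. "A.txt"
    link_id: Optional[int] = None  # hypothetical field holding the choice id


Segment = Union[str, Conflict]


def check_ids(segments: list[Segment]) -> None:
    """Second rule: every id must be shared by at least two conflicts."""
    counts: dict[int, int] = {}
    for seg in segments:
        if isinstance(seg, Conflict) and seg.link_id is not None:
            counts[seg.link_id] = counts.get(seg.link_id, 0) + 1
    lonely = [i for i, n in counts.items() if n < 2]
    if lonely:
        raise ValueError(f"ids used by only one conflict: {lonely}")


def select(segments: list[Segment], index: int, source: str) -> list[Segment]:
    """Third rule: pick `source` at segments[index] and at every conflict
    that shares its id. Unrelated conflicts are left unresolved."""
    target = segments[index]
    if not isinstance(target, Conflict):
        raise TypeError("segments[index] is not a conflict")
    out: list[Segment] = []
    for i, seg in enumerate(segments):
        linked = isinstance(seg, Conflict) and (
            i == index
            or (target.link_id is not None and seg.link_id == target.link_id)
        )
        out.append(seg.choices[source] if linked else seg)
    return out


# The example paragraph, with shortened text and both conflicts labelled 42.
TEXT = "most of it has been made either bold or italic"
doc: list[Segment] = [
    "<p>This is a long paragraph where ",
    Conflict({"A.txt": "<strong>", "O.txt": "", "B.txt": "<italic>"}, 42),
    TEXT,
    Conflict({"A.txt": "</strong>", "O.txt": "", "B.txt": "</italic>"}, 42),
    ". </p>",
]
check_ids(doc)
resolved = select(doc, 1, "A.txt")
print("".join(resolved))
```

Selecting the A.txt choice at the start-tag conflict resolves the end-tag conflict to `</strong>` as well, so the user makes one decision and the result stays well-formed.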
Adding these ids is not a big change to the format, but it would make a significant difference to the ease of use of diff3 format for structured data.