Nested Changes

Given an XML or JSON representation, we can go one step further and use the fact that the representation is hierarchical to support hierarchical or 'nested' change. A nested change is a change in one branch that modifies something that has been removed in another branch.

We will look at an XML example, showing nested changes.

Table 8. XML nested data example

A.xmlO.xmlB.xml
<author>
  <personname>
    <firstname>Nigel
        </firstname>
    <surname>Whitaker
        </surname>
  </personname>
  <address>
    <phone>+44 1684 532141
        </phone>
    <street>Geraldine Road
        </street>
    <city>Malvern</city>
    <country>UK</country>
    <postcode>WR14 3SZ
        </postcode>
  </address>
</author>
<author>
  <personname>
    <firstname>Nigel
        </firstname>
    <surname>Whitaker
        </surname>
  </personname>
  <address>
    
    <street>Geraldine Road
        </street>
    <city>Malvern</city>
    <country>UK</country>
    <postcode>WR14 3SZ
        </postcode>
  </address>
</author>
<author>
  <personname>
    <firstname>Nigel
    </firstname>
    <surname>Whitaker
    </surname>
  </personname>







</author>

In the above example, one branch, B.xml, has deleted the address sub-tree, which the other branch has modified with an added phone number.

Let us now consider how this could be represented using the proposed XML format presented in the previous section:

<d:diff3>
<author>
  <personname>
    <firstname>Nigel</firstname>
    <surname>Whitaker</surname>
  </personname>
  <d:change>
    <d:content origin="A.xml">
      <address>
        <phone>+44 1684 532141</phone>
        <street>Geraldine Road</street>
        <city>Malvern</city>
        <country>UK</country>
        <postcode>WR14 3SZ</postcode>
      </address>
    </d:content>
    <d:content origin="O.xml">
      <address>        
        <street>Geraldine Road</street>
        <city>Malvern</city>
        <country>UK</country>
        <postcode>WR14 3SZ</postcode>
      </address>
    </d:content>
    <d:content origin="B.xml"/>
  </d:change>
</author>
</d:diff3>

Here we can see the deletion of the address in B.xml, and if we carefully look at A.xml and O.xml, we can work out that a phone child element has been added. But is there a better representation we can use? Given we are now using a representation that follows the tree structure, we can also make use of this structure in the result. Here is an alternative result, where we allow change to nest:

<d:diff3>
<author>
  <personname>
    <firstname>Nigel</firstname>
    <surname>Whitaker</surname>
  </personname>
  <d:change>
    <d:content origin="A.xml, O.xml">
      <address>
        <d:change>
          <d:content origin="A.xml">
            <phone>+44 1684 532141</phone>
          </d:content>
          <d:content origin="O.xml"/>
        </d:change>
        <street>Geraldine Road</street>
        <city>Malvern</city>
        <country>UK</country>
        <postcode>WR14 3SZ</postcode>
      </address>
    </d:content>
    <d:content origin="B.xml"/>
  </d:change>
</author>
</d:diff3>

Here we can see that by allowing nested change and making some small adjustments to the format to allow multiple versions to be specified in origin attributes, we can avoid the repetition and make it easier for a human to understand. However, we have moved further from the simple diff3 representation in this step. Rather than choose one of two or three possibilities at each step, we now need to understand reuse of content, and a more complex format is used for the origin attributes.

As well as being more compact and allowing reuse, there is a further benefit: in some cases, nested changes can be ignored. Suppose we decided to accept the change made in B.xml, the deletion of the address. In this case, we would take the B.xml content, i.e., nothing, and immediately delete the other d:content alternatives at that level of the tree. We do not need to consider the nested change related to the phone element when we choose the outer B.xml alternative.

It is also possible to prove that for a three-way merge process, as used by diff3, at most two levels of nested change/content structure is required. This can be generalized so that for n-way merge algorithms (akin to the idea of 'octopus merge' used in git), a maximum of n-1 levels of nested change are required.

We have not explored how the original diff3 representation could be enhanced to handle nested change. It could be done, but it is more natural in the context of an XML representation of change.