diff3 Format as XML or JSON



Since we are working with XML and JSON, an obvious question is whether representing a three-way merge result in XML or JSON would be significantly better than the fairly basic diff3 format. The table below shows an example in diff3 format and the corresponding representation in XML and JSON, using a very simple syntax in each case. The aim is only to explore whether such a representation makes sense.

Table 6. diff3 format in XML or JSON

diff3:

1
4
5
2
<<<<<<< A.txt
3
||||||| O.txt
3
4
5
=======
4
5
3
>>>>>>> B.txt
6

XML:

<d:diff3>
1
4
5
2
<d:change>
    <d:content origin="A.txt">
3
</d:content>
    <d:content origin="O.txt">
3
4
5
</d:content>
    <d:content origin="B.txt">
4
5
3
</d:content></d:change>
6
</d:diff3>

JSON:

{
 "diff3": [
  "1",
  "4",
  "5",
  "2",
  {
   "change": {
    "A.txt": ["3"],
    "O.txt": [
     "3",
     "4",
     "5"
    ],
    "B.txt": [
     "4",
     "5",
     "3"
    ]
   }
  },
  "6"
 ]
}

For JSON, we have represented the file as an array in which each unchanged line is a string. A change is an object with a single "change" member, whose value is an object in which each member name is the name of an original file and each value is the array of lines from that file. We could instead have concatenated the lines into a single string with '\n' delimiters, but this would have been very difficult to read.
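As an illustration of this mapping, the following is a minimal Python sketch that converts diff3 merge output into the JSON layout of Table 6. The function name diff3_to_json is hypothetical, and the sketch assumes every conflict uses the four markers shown in the example, each followed by a file name where one is expected. A content line that happens to consist of "=======" inside a conflict would be misread, and two inputs sharing the same file name would collide as member names.

```python
import json


def diff3_to_json(text):
    """Convert diff3 merge output into the array-of-strings layout of Table 6."""
    result = []
    change = None    # object for the conflict being read, None outside a conflict
    current = None   # list collecting the lines of the current conflict section
    for line in text.splitlines():
        if change is None and line.startswith("<<<<<<< "):
            current = []
            change = {line[8:]: current}
        elif change is not None and line.startswith("||||||| "):
            current = []
            change[line[8:]] = current
        elif change is not None and line == "=======":
            # Third section: its file name only appears on the closing marker.
            current = []
        elif change is not None and line.startswith(">>>>>>> "):
            change[line[8:]] = current
            result.append({"change": change})
            change = current = None
        elif change is not None:
            current.append(line)
        else:
            result.append(line)
    return {"diff3": result}


sample = "\n".join([
    "1", "4", "5", "2",
    "<<<<<<< A.txt", "3",
    "||||||| O.txt", "3", "4", "5",
    "=======", "4", "5", "3",
    ">>>>>>> B.txt",
    "6",
])
print(json.dumps(diff3_to_json(sample), indent=1))
```

Applied to the diff3 example in Table 6, this yields the same structure as the JSON example there, although the whitespace produced by json.dumps differs slightly.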

This example shows that JSON changes the look and feel significantly, because every line has to be written as a quoted string. XML stays close to the original, though some detail is omitted here, for example xml:space="preserve" or CDATA sections (<![CDATA[) to preserve the formatting. If the original data is XML, then representing the changes in XML in this way would be very confusing, and it would be better to embed the changes within the original XML, assuming the original is well-formed.
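To make the omitted detail concrete, here is a minimal Python sketch that writes the same structure as XML, escaping markup characters in the text and adding xml:space="preserve". The function name to_xml and the namespace URI urn:example:diff3 are assumptions made for illustration: the example above uses the d: prefix but does not give a namespace. Escaping is used here in place of CDATA sections, and the whitespace around the elements is not identical to the layout in Table 6.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical namespace URI: the example uses the d: prefix without declaring one.
NS = "urn:example:diff3"


def to_xml(items):
    """Render a list of lines and change objects (the Table 6 JSON layout) as XML."""
    out = [f'<d:diff3 xmlns:d={quoteattr(NS)} xml:space="preserve">']
    for item in items:
        if isinstance(item, str):
            # Unchanged line: escape &, < and > so the result stays well-formed.
            out.append(escape(item))
        else:
            out.append("<d:change>")
            for origin, lines in item["change"].items():
                body = "\n".join(escape(line) for line in lines)
                out.append(
                    f"<d:content origin={quoteattr(origin)}>\n{body}\n</d:content>"
                )
            out.append("</d:change>")
    out.append("</d:diff3>")
    return "\n".join(out)


items = [
    "a < b & c",
    {"change": {"A.txt": ["3"], "O.txt": ["3", "4", "5"], "B.txt": ["4", "5", "3"]}},
    "6",
]
xml_text = to_xml(items)
print(xml_text)

# Round trip: the output parses, and the escaped line comes back unchanged.
root = ET.fromstring(xml_text)
print(repr(root.text))
```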

Adding the id (to represent connected changes) would be very simple in XML, where it can be an attribute, but a little harder in JSON because it would mean adding another member to the change object. The table below compares some characteristics of the three formats, using an informal score of three stars for good, two stars for OK and one star for poor.

Table 7. Characteristics of diff3, XML and JSON

Characteristic	diff3	XML	JSON	Comment
No processing needed for unchanged file	***	**	*
Preserve line structure	***	***	**	JSON needs strings or \n
Good for text editor (by hand)	***	*	*
Connected changes	**	***	**
Nested changes	*	***	**
Changes within a line	*	***	**
Show all resolved merges	*	***	**
Show changes to JSON data	**	**	***
Show changes to XML data	*	***	*

The table does show some potential advantages of an XML representation of diff3, especially for automated processing. Showing changes to well-formed XML in XML would require some care to preserve comments, processing instructions and the XML declaration/prolog. Attribute changes also could not be handled as text, so they would need further design thought. One approach would be to treat the XML source file as text and enclose it in CDATA sections. Embedding the changes in a well-formed XML source would probably require a different approach from simply using XML to show changes in a text file. Similar issues would arise when showing changes to JSON in JSON.

This proposal is not intended as an alternative to diff3 and it is clear that there would be issues to resolve if JSON or XML were used. XML does look more appropriate, but it lacks one desirable characteristic of diff3: no processing is needed for an unchanged file (or one with no conflicts).
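The processing cost can be seen in a short Python sketch. With diff3, a result that contains no conflict markers is already the merged file, so nothing needs to be done. With the JSON layout of Table 6, the lines still have to be extracted from the wrapper. The function name merged_text_from_json is hypothetical, and the sketch assumes that every line, including the last, ends with a newline.

```python
import json


def merged_text_from_json(doc):
    """Recover the merged file from the JSON wrapper; fail if changes remain."""
    items = json.loads(doc)["diff3"]
    if not all(isinstance(item, str) for item in items):
        raise ValueError("unresolved changes remain")
    # Assumption: every line, including the last, is newline-terminated.
    return "".join(line + "\n" for line in items)


doc = json.dumps({"diff3": ["1", "4", "5"]})
print(merged_text_from_json(doc), end="")
```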