Introduction and Background

This paper focuses on the diff3 format rather than the diff3 executable. We consider the format as produced by diff3 from GNU diffutils [1]. diff3 can produce several kinds of output, but the one we are interested in is the merged file with conflicts marked up, i.e., the output of the '-m' command-line option.

The diff3 format presents the information used in a three-way merge. It is a derivative of the two-way diff change format that uses a subset of the change markers (it does not include the ancestor information, but does use left and right angle brackets to delimit the two inputs). Many users do not invoke diff3 directly; instead, it is often invoked by a version control system such as Git or Mercurial when users merge a branch, cherry-pick, rebase, or switch branches with changes in the working directory.

The format can be used for resolving changes directly, perhaps using a simple text editor, and this was a common mode of operation with early version control systems. It is also suitable for use with a GUI that lets the user accept or reject each change, resulting in a new version of the file with the conflicts resolved.
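As an illustration of resolving conflicts mechanically, the following is a minimal Python sketch, not a description of how any particular editor or version control system works. It assumes the usual seven-character conflict markers ('<<<<<<<', '|||||||', '=======', '>>>>>>>') at the start of a line. The function name resolve_conflicts and the sample text MERGED are hypothetical and hand-written for this example.

```python
# Hand-written sample in the style of a merged file with one conflict.
# The file names after the markers are illustrative only.
MERGED = (
    "alpha\n"
    "<<<<<<< left.txt\n"
    "beta-left\n"
    "||||||| base.txt\n"
    "beta\n"
    "=======\n"
    "beta-right\n"
    ">>>>>>> right.txt\n"
    "gamma\n"
)


def resolve_conflicts(merged: str, choose: str = "left") -> str:
    """Resolve every conflict block by keeping one side ("left" or "right").

    Assumption: markers appear at the start of a line and blocks do not nest.
    Ancestor lines (between '|||||||' and '=======') are always dropped.
    """
    if choose not in ("left", "right"):
        raise ValueError("choose must be 'left' or 'right'")
    out = []
    state = "common"  # one of: common, left, ancestor, right
    for line in merged.splitlines(keepends=True):
        if state == "common" and line.startswith("<<<<<<<"):
            state = "left"
        elif state == "left" and line.startswith("|||||||"):
            state = "ancestor"
        elif state in ("left", "ancestor") and line.startswith("======="):
            state = "right"
        elif state == "right" and line.startswith(">>>>>>>"):
            state = "common"
        elif state == "common" or state == choose:
            # Keep text outside conflicts, plus the chosen side inside them.
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(resolve_conflicts(MERGED, "left"), end="")
    print(resolve_conflicts(MERGED, "right"), end="")
```

Accepting a side amounts to deleting the marker lines and the text of the other side, which is the same operation a user performs by hand in a text editor.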

To better understand the format itself, we will provide some background on how the diff3 tool identifies areas of conflict. We will not go into detail about the limitations of using line-based comparison tools on tree-structured data. That subject has been explored elsewhere, and the limitations are well known in principle if not in detail, as are a number of ways to make a line-based comparison work better with tree-structured data, e.g., reformatting into some canonical form.

It is possible to do a better job of comparison for XML and JSON if the comparison engine is aware of the tree structure. The issue then is how to represent the change in a way that is suitable for other systems, for example, Visual Studio Code [2], which understands the diff3 format. With some ingenuity, certain changes can be represented so that accepting or rejecting the change results in a well-formed output. However, such a representation is not always possible when, for example, start and corresponding end tags have been added or deleted, or when changes are nested.
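The problem with added start and end tags can be sketched in Python as follows. This is an illustration under stated assumptions rather than the output of any real tool: the merged text MERGED_XML is hand-written, the helper names resolve_each and is_well_formed are hypothetical, and well-formedness is checked with the standard library's xml.etree.ElementTree parser. Each conflict is resolved independently, as a diff3-aware editor would allow.

```python
import xml.etree.ElementTree as ET

# Hand-written example: one input wrapped the text in <b>, the other in <i>,
# so the start tags and the end tags appear as two separate conflicts.
MERGED_XML = (
    "<p>\n"
    "<<<<<<< left\n"
    "<b>\n"
    "=======\n"
    "<i>\n"
    ">>>>>>> right\n"
    "text\n"
    "<<<<<<< left\n"
    "</b>\n"
    "=======\n"
    "</i>\n"
    ">>>>>>> right\n"
    "</p>\n"
)


def resolve_each(merged, choices):
    """Resolve the n-th conflict using choices[n] ("left" or "right")."""
    out, state, index = [], "common", -1
    for line in merged.splitlines(keepends=True):
        if state == "common" and line.startswith("<<<<<<<"):
            state, index = "left", index + 1  # entering the next conflict
        elif state == "left" and line.startswith("|||||||"):
            state = "ancestor"
        elif state in ("left", "ancestor") and line.startswith("======="):
            state = "right"
        elif state == "right" and line.startswith(">>>>>>>"):
            state = "common"
        elif state == "common" or state == choices[index]:
            out.append(line)
    return "".join(out)


def is_well_formed(xml_text):
    """Return True if the text parses as a single XML element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for choices in (["left", "left"], ["right", "right"], ["left", "right"]):
        result = resolve_each(MERGED_XML, choices)
        print(choices, is_well_formed(result))
```

Consistent choices keep the tags balanced, whereas mixing them yields '<b>' closed by '</i>', which the parser rejects. Nothing in the format itself ties the two conflicts together.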

We will propose a way to extend the diff3 format to handle ‘connected changes’, where the acceptance of one change requires the acceptance (or rejection) of a connected change, for example, to keep start/end tags or braces balanced. We will then explore the difficulties in extending the format further to handle nested changes, and propose a way to use XML or JSON to represent them that is better suited to those technology stacks.