When Overlapping XML Meets Changing XML Does Confusion Reign?

Robin La Fontaine


How best to represent overlapping hierarchies in XML has been the topic of a number of papers over the years. This paper contributes to that discussion but approaches the problem from a different direction. Our goal is to represent changes to documents, and one type of change is a change to the markup hierarchy. Our ultimate goal is therefore to represent not only changes to the hierarchy, which typically result in overlapping hierarchies, but also changes to attributes and text. This is more ambitious than simply representing overlapping hierarchy, and one aspect of it is to make a clear distinction between the different hierarchical structures and the text that corresponds to each one.

Our work started with a delta format for two or more documents, which easily represents inline changes but handles hierarchy change by duplicating content. To avoid this duplication, we introduce a distinction between the name of an element (its tag) and the element's content, so that assertions can be made about each separately. We then introduce @dx (change) and @dxTag (change tag) attributes to mark changes. This representation allows us to define overlapping hierarchies entirely in XML, without declaring a dominant hierarchy and while keeping element fragmentation to a minimum. Although this solution will probably not scale to large numbers of variants, it shows promise for many classes of documents.

Table of Contents

Introduction and Background
How Content Duplication Represents Any Change
Representing Structural Change without Content Duplication
Dominant Hierarchy
Processing Observations