Open Document Format (ODF), OOXML

Whereas PDF files are a compressed binary format descended from the PostScript language and describe finished page images, ODF and OOXML files are compressed binary packages (ZIP archives) whose contents can be extracted as XML text. Rather than specifying the position of items on the page, ODF and OOXML provide an XML-based representation of editable documents.
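To illustrate what "extractable as text" means in practice, the following is a minimal sketch in Python that treats a document package as a ZIP archive and reads one XML part from it. It assumes only the standard library. The helper names (`list_xml_parts`, `read_part`, `make_demo_package`) and the tiny demo package are hypothetical stand-ins built in memory, not a real ODF or OOXML file; in real files the main text is typically found in `content.xml` (ODF) or `word/document.xml` (OOXML word-processing documents).

```python
import io
import zipfile


def list_xml_parts(package_bytes):
    """Return the names of the XML parts inside a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [name for name in package.namelist() if name.endswith(".xml")]


def read_part(package_bytes, part_name):
    """Decompress one part of the package and return it as text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(part_name).decode("utf-8")


def make_demo_package():
    """Build a tiny stand-in package in memory (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("content.xml", "<doc><p>Hello, reflowable text.</p></doc>")
        package.writestr("styles.xml", "<styles/>")
    return buffer.getvalue()


demo = make_demo_package()
for name in list_xml_parts(demo):
    print(name, len(read_part(demo, name)), "characters")
```

The same two reading functions should work on the bytes of a real `.odt` or `.docx` file, although the XML they return will be far more verbose than this demo.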

Word processors generally have strong tools for commenting on documents and for collaborative review. The model is that a reviewer sends an annotated copy of a document to the author, who in turn reviews the comments and suggested changes; this model fits well with many social contexts of writing and working with documents.

The ODF and OOXML formats were designed to serialize OpenOffice and Microsoft Word™ documents (respectively) to XML, and to support document revision. Unlike PDF documents, ODF and OOXML documents are intended to be editable by the recipient, and their text reflows rather than being fixed in position on the page.

Unfortunately, the OOXML specification, despite being very large, is also incomplete. In addition, the XML is written in the word-processing implementation domain, not in the user’s problem domain. The XML is complex and relatively difficult to work with, although freely available libraries and XSLT transformations exist to work with it. Word processor formats tend not to nest structures very much: for example, list items tend to be treated as automatically numbered paragraphs rather than being nested inside a containing list element. Points at which there had once been style changes or selection boundaries may also be retained in the markup, further complicating processing.
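The flat treatment of list items can be made concrete with a small sketch. The Python below assumes a deliberately simplified, hypothetical data model in which each paragraph is a `(text, level)` pair, where `level` is `None` for an ordinary paragraph and a 0-based depth for a list paragraph. It is not the data model of either format, and `nest_list_items` is an illustrative name. The function shows the kind of regrouping a converter has to perform to recover a nested list from a run of numbered paragraphs.

```python
def nest_list_items(paragraphs):
    """Group flat (text, level) paragraphs into nested list structures."""
    result = []
    stack = []  # stack[i] is the currently open list at depth i
    for text, level in paragraphs:
        if level is None:
            # An ordinary paragraph closes every open list.
            stack = []
            result.append({"type": "p", "text": text})
            continue
        # Close any lists deeper than this paragraph's level.
        del stack[level + 1:]
        # Open new lists until one exists at the required depth.
        while len(stack) <= level:
            new_list = {"type": "list", "items": []}
            if stack:
                parent_items = stack[-1]["items"]
                if not parent_items:
                    # A level was skipped: add an empty item to hold the sublist.
                    parent_items.append({"text": "", "children": []})
                parent_items[-1]["children"].append(new_list)
            else:
                result.append(new_list)
            stack.append(new_list)
        stack[level]["items"].append({"text": text, "children": []})
    return result


flat = [
    ("Intro", None),
    ("First", 0),
    ("Sub a", 1),
    ("Sub b", 1),
    ("Second", 0),
    ("Outro", None),
]
nested = nest_list_items(flat)
```

Here the four list paragraphs become a single list with two items, the first of which carries a two-item sublist. Real documents add further complications, such as restarted numbering and mixed list styles, which this sketch ignores.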

Word processing files are not generally ideal for archiving or interchange except in specific contexts such as the need for review or specific print-based workflows. HTML generated from them may also have accessibility problems, and unless users are systematic in their use of named styles there tends to be only weak semantic labeling. Word processors in many cases reduce the apparent cost of writing, because they make it seem easy. Unfortunately, in a wider context this can merely increase costs and complexity elsewhere in workflow processes, where finer-grained control over markup and document features may be needed. In this regard, word processor files can be similar to HTML documents. In all cases, final archived documents need to be saved in a format that is independent of any particular version of any software.