Background

Before we had XML, we had SGML, the ISO standard on which XML is based. By default, SGML is pointy-bracket markup like XML, but in SGML almost everything can be redefined, including the markup characters themselves, and there are numerous options for additional features and for abbreviations and markup shortcuts to minimise typing. Any changes to the syntax or to the configuration limits have to be made in a Declaration file, and a DTD is compulsory for every document.

Notice the words “to minimise typing”. In the beginning, there was no software with an editing interface that could use the DTD to provide a contextual menu of available element types, and the idea that the markup would be hidden from the user was counterintuitive: if you couldn’t see the tags, how could you know what was marked?

In these early stages, therefore, markup was applied by typing it in a plaintext editor. The essential piece of software to begin with was thus the parser, not the editor, so that you could check that you hadn’t got something wrong. The term parsing was often used to mean parsing-and-validating[1], as there was no concept of well-formedness separate from validity in the sense introduced with XML. There were many early parsers; among the most significant were:

Other software developed rapidly, spurred partly by the adoption of SGML for some military documentation in the US and elsewhere, and partly by its growing use in publishing, research, and academia. Editing software included:

Documents also need processing in some way: adding to a database, putting on the Web, mining it for data, or converting it for a formatting system for publishing. Conversion or processing (transformation) systems included:

There were several standalone viewers, especially for vertical-market applications, but few general-purpose browsers. As with editors, some used SGML-syntax stylesheets to format the display; others used proprietary stylesheet syntax. Formatting systems for printed output typically produced PostScript (these were pre-PDF days). Some handled SGML input directly, others via an established conversion route; output was formatted using TeX or a proprietary typesetting engine.

There was far more software available, but it is outside the scope of this report: some of it is now either uncompilable or uninstallable, or was in any case incomplete or experimental at the time. A significant amount was normal commercial software which has suffered the conventional fate of being superseded, falling out of use, or being abandoned when the company failed or was taken over. There are extensive lists of both free and commercial applications in Robin Cover’s SGML/XML Web Pages, and some of the SGML Conference CDs have a considerable amount of freely-distributable and commercial-sample software in subdirectories.

Other categories not covered here include design tools, search engines, and databases. The only three of these of which this author has direct experience (noted below) are Microstar’s Near&Far Designer, Tim Bray’s PAT search engine (in the section called “PAT □”), and the SGML DARC document management database (in the section called “SGML Darc ☑”).



[1] In fact, in the authors’ article describing the Amsterdam SGML Parser, the only instances of the term validation are in the formal references to validation services in the SGML standard itself.

[2] The editor is reputedly being resuscitated and rebuilt using INRIA’s Thot structured editor.