An Alternative Approach

The methods presented in the previous section are all suitable, to varying degrees, for converting XML to TeX. They range from ready‐to‐use applications that are difficult to configure (PassiveTeX, Pandoc) to approaches that require programming from scratch (xmltex, XSLT). What they have in common is that none of them can easily be configured for arbitrary XML vocabularies.

Eight years ago, knowing virtually nothing about TeX, I was given the task of developing an alternative to Chikrii’s Word2TeX plugin[19] for Microsoft Word, and I began thinking of a way to handle arbitrary XML vocabularies with a single library. I learned from my colleagues in the typesetting department, however, that the library must be configurable not only for different XML inputs but also for different TeX outputs: depending on the customer or product, different TeX packages and engines are used. For example, we use LuaTeX with unicode‐math[20] for the conversion of NISO STS standards to TeX, while we still use pdflatex for traditional typesetting projects.

xml2tex is a module of the le‑tex transpect[21] framework and covers various aspects of converting arbitrary XML to TeX‐based formats. It is based on XProc and XSLT and built around a configuration that is agnostic of the XML vocabulary and associates XML contexts with TeX instructions. xml2tex consists of three components, each configurable to a different degree:

  1. Convert CALS/HTML tables to tabular or htmltabs[22]

  2. Transform MathML to TeX

  3. Transform XML to TeX

The components for converting mathematical formulas and tables are prefabricated XSLT stylesheets that are configurable only to some extent. The third transformation, by contrast, is driven by a configuration.
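The idea of associating XML contexts with TeX instructions can be illustrated with a minimal Python sketch. It does not reproduce xml2tex's actual configuration format or its XSLT implementation: the rule table `RULES`, the function `to_tex`, and the element names are hypothetical, and a context is reduced here to a bare element name, with no escaping of TeX special characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> (rule type, TeX name).
# "cmd" wraps the content in a command, "env" in an environment.
RULES = {
    "title": ("cmd", "section"),
    "emphasis": ("cmd", "emph"),
    "blockquote": ("env", "quote"),
}


def to_tex(node, rules=RULES):
    """Serialize an element to TeX according to the rule table."""
    # Collect the element's own text, its children, and their tails.
    body = (node.text or "") + "".join(
        to_tex(child, rules) + (child.tail or "") for child in node
    )
    kind, name = rules.get(node.tag, (None, None))
    if kind == "cmd":
        return f"\\{name}{{{body}}}"
    if kind == "env":
        return f"\\begin{{{name}}}{body}\\end{{{name}}}"
    # Elements without a rule contribute only their content.
    return body


doc = ET.fromstring(
    "<section><title>Intro</title>"
    "<para>Some <emphasis>text</emphasis>.</para></section>"
)
print(to_tex(doc))
```

Passing a different rule table to `to_tex` yields different TeX output for the same input, which is the kind of flexibility described above for different customers and engines.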

Figure 3. xml2tex conversion pipeline


Convert CALS/HTML tables to tabular or htmltabs

First, CALS tables are normalized. For this purpose, we adopted Andrew J. Welch’s table normalization[23] and implemented it for CALS tables. If the input contains HTML tables, they are converted to CALS beforehand. Normalization resolves column and row spans into a regular grid, as shown below, which makes the later conversion of the tables to TeX easier.

+-----------+-----------+      +-----+-----+-----+-----+
| a         | b         |      | a   | a   | b   | b   |
|           +-----+-----+      +-----+-----+-----+-----+
|           | c   | d   |      | a   | a   | c   | d   |
+-----------+-----+     |  =>  +-----+-----+-----+-----+
| e               |     |      | e   | e   | e   | d   |
+-----+-----+-----+     |      +-----+-----+-----+-----+
| f   | g   | h   |     |      | f   | g   | h   | d   |
+-----+-----+-----+-----+      +-----+-----+-----+-----+
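The normalization illustrated above can be sketched in a few lines of Python. This is only an illustration of the principle, not the XSLT implementation used in xml2tex: the `Cell` class and the `normalize_table` function are hypothetical names, and the spans are given directly as numbers instead of being derived from CALS attributes.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    colspan: int = 1
    rowspan: int = 1


def normalize_table(rows):
    """Expand spanned cells so that every grid position holds a value."""
    grid = {}  # maps (row, column) to the text of the covering cell
    for r, row in enumerate(rows):
        col = 0
        for cell in row:
            # Skip positions already occupied by a row span from above.
            while (r, col) in grid:
                col += 1
            # Copy the cell into every position it covers.
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    grid[(r + dr, col + dc)] = cell.text
            col += cell.colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions not covered by any cell are returned as empty strings.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# The table from the diagram above.
table = [
    [Cell("a", colspan=2, rowspan=2), Cell("b", colspan=2)],
    [Cell("c"), Cell("d", rowspan=3)],
    [Cell("e", colspan=3)],
    [Cell("f"), Cell("g"), Cell("h")],
]

for line in normalize_table(table):
    print(" | ".join(line))
```

Once every row has the same number of entries, a later step can determine the column count and the position of each cell without tracking spans from preceding rows.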

Depending on the options used, another stylesheet converts the normalized tables to either tabular or htmltabs tables. The resulting TeX tables are inserted into the original XML document as processing instructions.



[19] Chikrii (2022) Word2TeX. Available at: http://www.chikrii.com/products/word2tex/ (Accessed: May 30, 2023)

[20] Will Robertson (2020) Experimental Unicode mathematical typesetting: The unicode‐math package. Available at: https://ctan.org/pkg/unicode-math (Accessed: May 30, 2023)

[21] le‑tex (2023) transpect framework documentation. Available at: https://transpect.io (Accessed: May 30, 2023)

[22] transpect (2023) htmltabs source code. Available at: https://github.com/transpect/xerif/blob/main/latex-oops/htmltabs.sty (Accessed: May 30, 2023)

[23] Andrew J. Welch (2006) Table Normalization in XSLT 2.0. Available at: http://ajwelch.blogspot.com/2006/09/table-normalization-in-xslt-20.html (Accessed: May 30, 2023)