Pandoc

Pandoc is a popular Haskell library for converting between markup formats [15]. It comes with an easy-to-use command-line client and supports a wide range of input and output formats, including XML formats such as DocBook and JATS. Pandoc also supports LaTeX as an output format and uses the Haskell library texmath [16] to convert MathML formulas to TeX.

With these features, Pandoc can also be used to convert XML to LaTeX. In contrast to xmltex and PassiveTeX, you don't have to be a programmer to use Pandoc: basic knowledge of the command line is sufficient. To test Pandoc, I prepared a small DocBook test document:

<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0">
  <title/>
  <section>
    <title>Area of a Triangle</title>
    <equation>
      <math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
        <mi>A</mi>
        <mo>=</mo>
        <mfrac>
          <msup>
            <mi fontstyle="italic">a</mi>
            <mn>2</mn>
          </msup>
          <mn>4</mn>
        </mfrac>
        <mo>&#x22c5;</mo>
        <mroot>
          <mn>3</mn>
          <mrow/>
        </mroot>
      </math>
    </equation>
  </section>
  <bibliography>
    <biblioentry>
      <citetitle>Geometry Workbook For Dummies</citetitle>
      <author>
        <personname><firstname>Mark</firstname><surname>Ryan</surname></personname>
      </author>
      <biblioid role="isbn">978–0471799405</biblioid>
      <pubdate>2006</pubdate>
      <publisher>
        <publishername>For Dummies</publishername>
      </publisher>
    </biblioentry>
  </bibliography>
</article>
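Converting the test document requires only a single Pandoc call that names the input and output formats. The sketch below wraps that call in Python so the conversion can be scripted. The helper names and the file names `test.xml` and `test.tex` are hypothetical. The call is equivalent to running `pandoc --from=docbook --to=latex --output=test.tex test.xml` in a shell.

```python
import shutil
import subprocess


def pandoc_command(src: str, dst: str) -> list[str]:
    """Assemble the Pandoc call for a DocBook-to-LaTeX conversion."""
    return [
        "pandoc",
        "--from=docbook",  # read DocBook XML
        "--to=latex",      # write LaTeX (a fragment unless --standalone is added)
        f"--output={dst}",
        src,
    ]


def convert(src: str, dst: str) -> None:
    """Run Pandoc and raise if it is not installed or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    # A zero exit status does not necessarily mean all content was converted.
    subprocess.run(pandoc_command(src, dst), check=True)
```

Calling `convert("test.xml", "test.tex")` would write the LaTeX fragment to `test.tex`.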

The results were not very promising. Pandoc failed to convert the MathML to TeX, even though this had worked in other attempts. Moreover, Pandoc was unable to convert the bibliographic entry. While the MathML was silently dropped from the output, the contents of the bibliography were printed as plain text, preceded by an empty \section{}. Pandoc’s output is shown below:

\section{Area of a Triangle}
\section{}
Geometry Workbook For Dummies MarkRyan 978–0471799405 2006 For Dummies
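Because the formula disappeared without a warning, it can be worth checking the output yourself. The following sketch counts the MathML `math` elements in the DocBook input and reports whether the LaTeX output contains no math markup at all. The function names are hypothetical. The regular expression is a rough heuristic rather than a complete list of LaTeX math syntax.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Heuristic: \[ ... \], \( ... \), dollar signs and a few common math environments.
# Not exhaustive; an escaped \$ in ordinary text also counts as a match.
LATEX_MATH = re.compile(r"\\\[|\\\(|\$|\\begin\{(?:equation|align|displaymath)\*?\}")


def count_mathml(xml_text: str) -> int:
    """Number of MathML <math> elements anywhere in the document."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(f"{{{MATHML_NS}}}math"))


def math_was_dropped(xml_text: str, latex_text: str) -> bool:
    """True if the input contains MathML but the LaTeX output shows no math."""
    return count_mathml(xml_text) > 0 and LATEX_MATH.search(latex_text) is None
```

Applied to the test document and the output above, `math_was_dropped` would return `True`.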

The Pandoc website describes the tool as a Swiss Army knife for markup formats. What applies to Swiss Army knives applies to Pandoc, too: if the built-in tools are not suitable for the job, you have to find another tool yourself.

Pandoc provides an interface, called filters, that lets users write their own transformations. Filters are small programs that modify Pandoc’s intermediate abstract syntax tree (AST) after the input has been parsed and before the output is written. Traditional Pandoc filters work on a JSON representation of the AST and can be written in any programming language [17]. However, this approach is very cumbersome. The filter has to read the JSON from stdin and write the modified JSON to stdout, and it only works if the runtime for its programming language is also installed on the user’s system. Starting with version 2.0, Pandoc can also run filters written in Lua. These require no additional software, because the Lua interpreter is built into Pandoc.
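To illustrate a traditional JSON filter, the sketch below targets the empty \section{} from the output above. It assumes that this heading corresponds to a `Header` element with no inline content in the AST. It only inspects top-level blocks, and the function names are hypothetical. It removes only the symptom: the bibliography still arrives as plain text.

```python
#!/usr/bin/env python3
"""Pandoc JSON filter sketch: remove headers that contain no visible text."""
import json
import sys

# Inline element types that carry no visible text.
BLANK_INLINES = {"Space", "SoftBreak", "LineBreak"}


def is_empty_header(block: dict) -> bool:
    # In the JSON AST a header looks like {"t": "Header", "c": [level, attr, inlines]}.
    if block.get("t") != "Header":
        return False
    inlines = block["c"][2]
    return all(inline.get("t") in BLANK_INLINES for inline in inlines)


def drop_empty_headers(doc: dict) -> dict:
    # Only top-level blocks are checked; headers nested inside Divs are kept.
    doc["blocks"] = [b for b in doc["blocks"] if not is_empty_header(b)]
    return doc


def main() -> None:
    # Pandoc pipes the AST to stdin; do nothing when run by hand without input.
    if sys.stdin is None or sys.stdin.isatty():
        return
    raw = sys.stdin.read()
    if not raw.strip():
        return
    json.dump(drop_empty_headers(json.loads(raw)), sys.stdout)


if __name__ == "__main__":
    main()
```

Saved as an executable file, for example `drop_empty_headers.py`, it would be passed to Pandoc with `pandoc --from=docbook --to=latex --filter ./drop_empty_headers.py test.xml`. Pandoc passes the target format as the filter's first argument, which this sketch ignores.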

Apart from programming filters, which is time-consuming and less declarative, Pandoc offers little extensibility. Unfortunately, there are also only a few configuration options for customizing the output. Another downside is that Pandoc does not always report an error when it is unable to process content. This leads me to conclude that Pandoc may not be suitable for professional scenarios. It can, however, be useful for occasional conversions of lightweight documents, where users can fix the output themselves if necessary.



[15] Pandoc (2023) Pandoc. A universal document converter. Available at https://pandoc.org/ (Accessed: May 30, 2023)

[16] John MacFarlane, Matthew Pickering (2023) texmath. Available at https://hackage.haskell.org/package/texmath (Accessed: May 30, 2023)

[17] Pandoc (2023) Pandoc Lua Filters. Available at https://pandoc.org/lua-filters.html (Accessed: May 30, 2023)