“Tagged PDF” is not a separate PDF specification. It refers to PDF files that include additional information about the logical structure of the document. Tagged PDF was first defined in PDF 1.4. Later versions of the PDF specification added more tag (‘Structure Element’) types and more properties of Structure Elements. PDF 2.0 [PDF2.0] added some new tags and deprecated some of the existing ones.
Because the tags record the logical structure of the document, the text, graphics, and images in Tagged PDF can be extracted and reused for other purposes, for example to make the content accessible to users with visual impairments. PDF/UA files (see the section called “PDF/UA”) are Tagged PDF files that also conform to additional requirements.
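To illustrate why the logical structure makes content reusable, the following minimal sketch models a structure tree as nested elements and walks it depth-first to recover text in logical order, paired with each element's structure type. The `StructElem` class and `iter_text` function are hypothetical simplifications for illustration only: in a real PDF, structure elements point at marked-content sequences on pages rather than holding strings directly.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class StructElem:
    """Hypothetical, simplified stand-in for a PDF 'StructElem'.

    A real structure element references marked content on a page;
    in this sketch the leaf content is stored as plain strings.
    """
    tag: str  # structure type, e.g. "H1", "P", "LI"
    kids: List[Union["StructElem", str]] = field(default_factory=list)


def iter_text(elem: StructElem) -> Iterator[Tuple[str, str]]:
    """Yield (structure type, text) pairs in logical order (depth-first walk)."""
    for kid in elem.kids:
        if isinstance(kid, str):
            yield elem.tag, kid        # text owned directly by this element
        else:
            yield from iter_text(kid)  # descend into a child structure element


# A tiny document: a heading, a paragraph, and a one-item list.
doc = StructElem("Document", [
    StructElem("H1", ["Title"]),
    StructElem("P", ["First paragraph."]),
    StructElem("L", [
        StructElem("LI", [StructElem("Lbl", ["1."]), StructElem("LBody", ["Item one"])]),
    ]),
])

for tag, text in iter_text(doc):
    print(f"{tag}: {text}")
```

Because each piece of text arrives with its structure type, a consumer such as a screen reader or a format converter can, for instance, announce "H1" text as a heading or rebuild the list, which is not possible from page layout alone.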
AH Formatter embeds PDF tags (‘StructElem’) for XSL Formatting Object elements as shown in Table 1, “XSL Formatting Objects and PDF tags”. Other XSL-FO formatters have similar mappings.[24]
Table 1. XSL Formatting Objects and PDF tags
FO element | PDF ‘Structure Element’ | Comment |
---|---|---|
fo:root | Document | |
fo:page-sequence | Part | |
fo:flow | Sect | |
fo:static-content | Sect | |
fo:block | P or Div | P when it has inline-level content, otherwise Div |
fo:block-container | Div or Sect | Sect when absolute-position="fixed" or "absolute", otherwise Div |
fo:inline | Span or Reference | Reference when it is a child of fo:footnote, otherwise Span |
fo:inline-container | Span | |
fo:leader | Span | |
fo:page-number | Span | |
fo:page-number-citation | Span | |
fo:page-number-citation-last | Span | |
fo:scaling-value-citation | Span | |
fo:index-page-citation-list | Span | |
fo:bidi-override | Span | |
fo:footnote | Sect | The footnote-reference-area embeds a Sect that contains all the footnotes on the page |
fo:footnote-body | Note | |
fo:float | Sect | |
fo:external-graphic | Figure or Formula | Formula in case of MathML, otherwise Figure |
fo:instream-foreign-object | Figure or Formula | Formula in case of MathML, otherwise Figure |
fo:basic-link | Link | |
fo:list-block | L | |
fo:list-item | LI | |
fo:list-item-label | Lbl | |
fo:list-item-body | LBody | |
fo:table | Table | |
fo:table-caption | Caption | |
fo:table-header | THead | |
fo:table-footer | TFoot | |
fo:table-body | TBody | |
fo:table-row | TR | |
fo:table-cell | TH or TD | TH within fo:table-header, otherwise TD |
axf:form-field | Form | |
axf:ruby | Ruby | |
axf:ruby-base | RB | |
axf:ruby-text | RT | |
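The conditional rows of Table 1 amount to a small decision procedure. The sketch below encodes it, assuming a hypothetical `FoNode` record that already carries the context each rule needs (whether a block holds only inline-level content, whether a graphic is MathML, whether a cell is inside fo:table-header). It is not AH Formatter's code or API, only an illustration of the mapping as the table states it; fo:footnote is left out because its footnotes are gathered into one Sect per page rather than tagged individually.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FoNode:
    """Hypothetical description of one formatting object and its context."""
    name: str                                            # e.g. "fo:block"
    attrs: Dict[str, str] = field(default_factory=dict)  # FO properties
    parent: Optional[str] = None       # name of the parent FO, if known
    inline_content_only: bool = False  # block contains only inline-level content
    is_mathml: bool = False            # graphic/object content is MathML
    in_table_header: bool = False      # cell lies inside fo:table-header


# Rows of Table 1 whose structure type does not depend on context.
FIXED_TAGS: Dict[str, str] = {
    "fo:root": "Document", "fo:page-sequence": "Part",
    "fo:flow": "Sect", "fo:static-content": "Sect", "fo:float": "Sect",
    "fo:inline-container": "Span", "fo:leader": "Span",
    "fo:page-number": "Span", "fo:page-number-citation": "Span",
    "fo:page-number-citation-last": "Span", "fo:scaling-value-citation": "Span",
    "fo:index-page-citation-list": "Span", "fo:bidi-override": "Span",
    "fo:footnote-body": "Note", "fo:basic-link": "Link",
    "fo:list-block": "L", "fo:list-item": "LI",
    "fo:list-item-label": "Lbl", "fo:list-item-body": "LBody",
    "fo:table": "Table", "fo:table-caption": "Caption",
    "fo:table-header": "THead", "fo:table-footer": "TFoot",
    "fo:table-body": "TBody", "fo:table-row": "TR",
    "axf:form-field": "Form", "axf:ruby": "Ruby",
    "axf:ruby-base": "RB", "axf:ruby-text": "RT",
}


def fo_to_tag(node: FoNode) -> Optional[str]:
    """Return the PDF structure type for an FO, or None if Table 1 has no row for it."""
    if node.name == "fo:block":
        return "P" if node.inline_content_only else "Div"
    if node.name == "fo:block-container":
        positioned = node.attrs.get("absolute-position") in ("fixed", "absolute")
        return "Sect" if positioned else "Div"
    if node.name == "fo:inline":
        return "Reference" if node.parent == "fo:footnote" else "Span"
    if node.name in ("fo:external-graphic", "fo:instream-foreign-object"):
        return "Formula" if node.is_mathml else "Figure"
    if node.name == "fo:table-cell":
        return "TH" if node.in_table_header else "TD"
    return FIXED_TAGS.get(node.name)
```

For example, `fo_to_tag(FoNode("fo:block-container", attrs={"absolute-position": "fixed"}))` yields `"Sect"`, while the same object without that property yields `"Div"`. Keeping the context-free rows in a dictionary and the context-dependent rows as explicit branches mirrors the split between empty and filled "Comment" cells in the table.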
AH Formatter embeds PDF tags (‘StructElem’) for HTML/CSS elements and pseudo-elements as shown in the following table:
Table 2. HTML elements and PDF tags
HTML element | PDF ‘Structure Element’ |
---|---|
html | Document |
div | Div |
h1 | H1 |
h2 | H2 |
h3 | H3 |
h4 | H4 |
h5 | H5 |
h6 | H6 |
p | P |
ul | L |
ol | L |
li | LI |
li::marker | Lbl |
dl | L |
dt | Lbl |
dd | LBody |
blockquote | BlockQuote |
caption | Caption |
table | Table |
tr | TR |
td | TD |
th | TH |
thead | THead |
tfoot | TFoot |
tbody | TBody |
ruby | Ruby |
rb | RB |
rt | RT |
span | Span |
img | Figure |
a[href] | Link |
other block elements | Div |
other inline elements | Span |
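Table 2 can likewise be read as a lookup with two special cases (an `a` element only becomes Link when it has an `href`, and `li::marker` is the one pseudo-element listed) plus the two fallback rows. The sketch below is a hypothetical illustration, not AH Formatter's implementation; the `block` flag stands in for however a renderer decides that an element is block-level (for example, from its computed CSS `display` value), which is an assumption here.

```python
from typing import Dict, Mapping, Optional

# Fixed rows of Table 2: HTML element name -> PDF structure type.
HTML_TAGS: Dict[str, str] = {
    "html": "Document", "div": "Div", "p": "P",
    "ul": "L", "ol": "L", "li": "LI",
    "dl": "L", "dt": "Lbl", "dd": "LBody",
    "blockquote": "BlockQuote", "caption": "Caption",
    "table": "Table", "tr": "TR", "td": "TD", "th": "TH",
    "thead": "THead", "tfoot": "TFoot", "tbody": "TBody",
    "ruby": "Ruby", "rb": "RB", "rt": "RT",
    "span": "Span", "img": "Figure",
}
HTML_TAGS.update({f"h{i}": f"H{i}" for i in range(1, 7)})  # h1..h6 -> H1..H6


def html_to_tag(name: str, attrs: Optional[Mapping[str, str]] = None, *,
                block: bool = False, pseudo: Optional[str] = None) -> Optional[str]:
    """Return the structure type for an element or pseudo-element per Table 2.

    `block` is a hypothetical flag meaning "this element is rendered as a block".
    """
    name = name.lower()
    attrs = attrs or {}
    if pseudo is not None:
        # li::marker is the only pseudo-element that Table 2 lists.
        return "Lbl" if (name, pseudo) == ("li", "marker") else None
    if name == "a" and "href" in attrs:
        return "Link"
    if name in HTML_TAGS:
        return HTML_TAGS[name]
    # Fallback rows: "other block elements" and "other inline elements".
    return "Div" if block else "Span"
```

Under this reading, an `a` element without `href` falls through to the "other inline elements" row and becomes Span, and an unlisted block element such as `section` becomes Div.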