Tagged PDF

“Tagged PDF” is not a separate PDF specification. It refers to PDF that includes additional information about the logical structure of the document. Tagged PDF was first defined in PDF 1.4. Later versions of the PDF specification added more tag (‘Structure Element’) types and more properties of Structure Elements. PDF 2.0 [PDF2.0] added some new tags and deprecated some of the existing tags.

The text, graphics, and images in Tagged PDF can be extracted and reused for other purposes, for example, to make content accessible to users with visual impairments. PDF/UA files (see the section called “PDF/UA”) are Tagged PDF files that also conform to additional requirements.

AH Formatter embeds PDF tags (‘StructElem’) for XSL Formatting Object elements as shown in Table 1, “XSL Formatting Objects and PDF tags”. Other XSL-FO formatters have similar mappings.[24]

Table 1. XSL Formatting Objects and PDF tags

| FO element | PDF ‘Structure Element’ | Comment |
|---|---|---|
| fo:root | Document | |
| fo:page-sequence | Part | |
| fo:flow | Sect | |
| fo:static-content | Sect | |
| fo:block | P or Div | P when it has inline-level content, otherwise Div |
| fo:block-container | Div or Sect | Sect when absolute-position="fixed" or "absolute", otherwise Div |
| fo:inline | Span or Reference | Reference when the child of fo:footnote, otherwise Span |
| fo:inline-container | Span | |
| fo:leader | Span | |
| fo:page-number | Span | |
| fo:page-number-citation | Span | |
| fo:page-number-citation-last | Span | |
| fo:scaling-value-citation | Span | |
| fo:index-page-citation-list | Span | |
| fo:bidi-override | Span | |
| fo:footnote | | The footnote-reference-area embeds a Sect that contains all the footnotes on the page |
| fo:footnote-body | Note | |
| fo:float | Sect | |
| fo:external-graphic | Figure or Formula | Formula in case of MathML, otherwise Figure |
| fo:instream-foreign-object | Figure or Formula | Formula in case of MathML, otherwise Figure |
| fo:basic-link | Link | |
| fo:list-block | L | |
| fo:list-item | LI | |
| fo:list-item-label | Lbl | |
| fo:list-item-body | LBody | |
| fo:table | Table | |
| fo:table-caption | Caption | |
| fo:table-header | THead | |
| fo:table-footer | TFoot | |
| fo:table-body | TBody | |
| fo:table-row | TR | |
| fo:table-cell | TH or TD | TH within fo:table-header, otherwise TD |
| axf:form-field | Form | |
| axf:ruby | Ruby | |
| axf:ruby-base | RB | |
| axf:ruby-text | RT | |
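
Several rows of Table 1 are conditional: the tag depends on the content or context of the formatting object, not only on its name. The following Python sketch restates those rules as a lookup function. It is an illustration of the table only, not AH Formatter code: the `FoContext` class, its field names, and the `structure_element` function are hypothetical, and `FIXED_TAGS` lists only a subset of the unconditional rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FoContext:
    """Hypothetical summary of the facts that the conditional rows depend on."""
    name: str                                 # FO element name, e.g. "fo:block"
    has_inline_content: bool = False          # does the block contain inline-level content?
    absolute_position: Optional[str] = None   # value of absolute-position, if set
    parent: Optional[str] = None              # name of the parent FO element
    is_mathml: bool = False                   # is the graphic MathML?


# A subset of the unconditional rows of Table 1.
FIXED_TAGS = {
    "fo:root": "Document",
    "fo:page-sequence": "Part",
    "fo:flow": "Sect",
    "fo:static-content": "Sect",
    "fo:footnote-body": "Note",
    "fo:float": "Sect",
    "fo:basic-link": "Link",
    "fo:table": "Table",
}


def structure_element(fo: FoContext) -> Optional[str]:
    """Return the tag that Table 1 lists for this FO, or None if not covered here."""
    if fo.name == "fo:block":
        return "P" if fo.has_inline_content else "Div"
    if fo.name == "fo:block-container":
        return "Sect" if fo.absolute_position in ("fixed", "absolute") else "Div"
    if fo.name == "fo:inline":
        return "Reference" if fo.parent == "fo:footnote" else "Span"
    if fo.name in ("fo:external-graphic", "fo:instream-foreign-object"):
        return "Formula" if fo.is_mathml else "Figure"
    return FIXED_TAGS.get(fo.name)
```

For example, `structure_element(FoContext("fo:block", has_inline_content=True))` returns `"P"`, while the same call for an fo:block without inline-level content returns `"Div"`.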

AH Formatter embeds PDF tags (‘StructElem’) for HTML/CSS elements and pseudo-elements as shown in the following table:

Table 2. HTML elements and PDF tags

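| HTML element | PDF ‘Structure Element’ |
|---|---|
| html | Document |
| div | Div |
| h1 | H1 |
| h2 | H2 |
| h3 | H3 |
| h4 | H4 |
| h5 | H5 |
| h6 | H6 |
| p | P |
| ul | L |
| ol | L |
| li | LI |
| li::marker | Lbl |
| dl | L |
| dt | Lbl |
| dd | LBody |
| blockquote | BlockQuote |
| caption | Caption |
| table | Table |
| tr | TR |
| td | TD |
| th | TH |
| thead | THead |
| tfoot | TFoot |
| tbody | TBody |
| ruby | Ruby |
| rb | RB |
| rt | RT |
| span | Span |
| img | Figure |
| a[href] | Link |
| other block elements | Div |
| other inline elements | Span |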
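
Table 2 differs from Table 1 mainly in its two fallback rows and in the selector-style entries `li::marker` and `a[href]`. The Python sketch below shows one way to read those rows. It is illustrative only: the `html_structure_element` function and its parameters are hypothetical, the `is_block` flag stands in for however the formatter decides that an element is block-level (the source does not say), and treating an `a` element without `href` as an "other" element is an assumption.

```python
from typing import Mapping, Optional

# Rows of Table 2 that depend only on the element name.
HTML_TAGS = {
    "html": "Document", "div": "Div",
    "h1": "H1", "h2": "H2", "h3": "H3", "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P", "ul": "L", "ol": "L", "li": "LI",
    "dl": "L", "dt": "Lbl", "dd": "LBody",
    "blockquote": "BlockQuote", "caption": "Caption",
    "table": "Table", "tr": "TR", "td": "TD", "th": "TH",
    "thead": "THead", "tfoot": "TFoot", "tbody": "TBody",
    "ruby": "Ruby", "rb": "RB", "rt": "RT",
    "span": "Span", "img": "Figure",
}


def html_structure_element(
    name: str,
    is_block: bool,
    attrs: Optional[Mapping[str, str]] = None,
    pseudo: Optional[str] = None,
) -> str:
    """Return the tag that Table 2 lists for an HTML element or pseudo-element."""
    if name == "li" and pseudo == "marker":
        return "Lbl"                      # li::marker
    if name == "a" and attrs is not None and "href" in attrs:
        return "Link"                     # a[href]
    if name in HTML_TAGS:
        return HTML_TAGS[name]
    # Fallback rows: other block elements and other inline elements.
    return "Div" if is_block else "Span"
```

Under these assumptions, an element that the table does not name, such as `section`, is tagged `Div` when it is block-level, and an inline element such as `em` is tagged `Span`.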



[24] The information provided by other formatters, however, was either incomplete [FOP] or not in a format that could just be pasted into this paper [XEP].