Portable Document Format (PDF)



For some applications, a disadvantage of all of the above formats is that they are not easily printed in a useful way. HTML comes closest, since Cascading Style Sheets (CSS) can be used with formatting software to make print-ready pages, but it is necessary to write separate CSS stylesheets for different HTML documents. The result of formatting HTML for print is usually PDF, which raises the question of whether PDF is itself useful as a document interchange format.

PDF documents can be printed: PDF is a page description language, and a PDF file describes the exact location on the page of everything inside it, both text and graphics. Unfortunately for document interchange, this also means that PDF files cannot easily be edited to insert or delete content: the remainder of the text on the page does not normally reflow.
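The lack of reflow can be illustrated with a minimal sketch. It is a toy model, not a PDF parser: the `TextRun` class, the `overlaps_on_line` function and the fixed `CHAR_WIDTH` are all assumptions made for illustration, standing in for the positioned text that a real PDF page contains.

```python
from dataclasses import dataclass

CHAR_WIDTH = 6.0  # assumed fixed advance per character, in points


@dataclass
class TextRun:
    """A piece of text placed at a fixed position on the page (hypothetical model)."""
    x: float
    y: float
    text: str

    @property
    def right_edge(self) -> float:
        return self.x + CHAR_WIDTH * len(self.text)


def overlaps_on_line(runs):
    """Return (left, right) text pairs whose extents overlap on the same baseline."""
    clashes = []
    ordered = sorted(runs, key=lambda r: (r.y, r.x))
    for left, right in zip(ordered, ordered[1:]):
        if left.y == right.y and left.right_edge > right.x:
            clashes.append((left.text, right.text))
    return clashes


page = [TextRun(72, 700, "PDF files"), TextRun(132, 700, "do not reflow.")]
print(overlaps_on_line(page))  # []

# Inserting a word into the first run does not move the second run:
# its position was fixed when the page was made.
page[0].text = "PDF files usually"
print(overlaps_on_line(page))  # [('PDF files usually', 'do not reflow.')]
```

In this model an editor would have to recompute the position of every later run by hand, which is in effect re-running the original formatter.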

The PDF format contains machinery for embedding fonts and for complex graphics. It is derived from the PostScript language and shares its imaging model, but it is a static file format rather than a general programming language, and its content is usually stored compressed. Using PDF can provide page fidelity in many cases, in the sense that printed pages will look the same everywhere, scaled if desired to print on differently-sized paper.

Since PDF is not easily edited, and has limited accessibility for people who are blind, who cannot distinguish certain colours, or who need low (or high) contrast, it is not suitable as a primary document format.

Reading and processing PDF in programs is difficult: there are libraries for C, JavaScript and other languages, but the resulting data structures are complex. There is no standard query language for PDF, and even determining natural-language word breaks is unreliable, because software has to use heuristics based on the position of each letter on the page to try to detect the spaces. In addition, it is not usually possible to distinguish between a line break at a hyphen in the original input and a hyphen inserted to facilitate line breaking: text formatting is lossy, so the original document cannot be reconstructed reliably.
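Both problems can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm of any particular PDF library: the `Glyph` class, the `guess_words` and `join_candidates` functions and the `gap_ratio` threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the page, in points
    width: float  # advance width of the glyph, in points


def guess_words(glyphs, gap_ratio=0.3):
    """Split one line of glyphs into words, guessing a space wherever the
    gap to the next glyph exceeds gap_ratio times the average glyph width."""
    if not glyphs:
        return []
    glyphs = sorted(glyphs, key=lambda g: g.x)
    threshold = gap_ratio * sum(g.width for g in glyphs) / len(glyphs)
    words = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold:
            words.append(cur.char)   # gap looks like a space: start a new word
        else:
            words[-1] += cur.char    # gap too small: same word
    return words


def join_candidates(line_end, next_line_start):
    """Return every plausible way to join a word split across two lines."""
    if line_end.endswith("-"):
        # The hyphen may come from the formatter or from the original text.
        return [line_end[:-1] + next_line_start, line_end + next_line_start]
    return [line_end + " " + next_line_start]


normal = [Glyph("t", 0, 3), Glyph("o", 3, 5), Glyph("b", 11, 5), Glyph("e", 16, 5)]
print(guess_words(normal))   # ['to', 'be']

# Letter-spaced text defeats the same threshold: one word becomes four.
spaced = [Glyph("w", 0, 5), Glyph("i", 7, 5), Glyph("d", 14, 5), Glyph("e", 21, 5)]
print(guess_words(spaced))   # ['w', 'i', 'd', 'e']

print(join_candidates("well-", "known"))  # ['wellknown', 'well-known']
print(join_candidates("docu-", "ment"))   # ['document', 'docu-ment']
```

Choosing between the two hyphenation candidates needs outside knowledge such as a dictionary, and even then the choice remains a guess, since the page itself does not record which hyphens the formatter added.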