Machine generation of PDF files the easy way

In the beginning, personal computers were not very friendly, and not all users enjoyed working with them in that pre-windowed state. In that sense, Windows from Microsoft was a real advancement, as it provided an easy-to-use interface to the power of the computer.

The power of the computer allowed more than just text. It also allowed for the formatting and beautifying of documents. With word processing, you could see almost exactly what the final result would look like. The fonts and other markup were immediately visible, and that gave a real advantage to the average user. No longer did you need a desktop publishing professional to create a nice-looking lost pet flyer.

Yet pretty early on, everyone noticed that "what you see is what you get" is only guaranteed if you look at the document on the exact same computer. When you opened the same document with the same program on an almost identical second computer, the layout was sometimes not quite the same as on the source machine.

Adobe cracked this nut with their Portable Document Format, PDF. Rather than keeping the format to themselves, they published the specification freely in 1993, and it became a formal open standard (ISO 32000) in 2008. To help with acceptance, they also offer their own PDF viewer for free.

This format provided the ability to generate a file on one machine or platform and have it accurately reproduced everywhere it was opened. For casual uses, it is perfect. It is possible to create documents that anyone can look at but that are not meant to be casually edited. The document can be printed from any other machine with the same layout.

This is pretty neat, but after a few decades it is old hat. It is possible to download dozens of free tools that can create or modify PDFs.

There are many toolkits that can be used to generate PDF files, but one of the neatest, in my opinion, is the one provided by Apache.

Apache Formatting Objects Processor (FOP)

This is just one implementation of the World Wide Web Consortium's (W3C) Extensible Stylesheet Language Formatting Objects, or XSL-FO. Apache FOP is simply a print formatter driven by XSL-FO, and it is an output-independent formatter: the same input can be rendered to several different output formats.

I use this technology in a lot of cases where I either need to produce PDF files or want to provide output that is nice to look at. The output formats supported by Apache FOP include PDF, PostScript, PCL, and AFP.

The XSL-FO solution is a well-thought-out decomposition: the formatting is stored in an XSLT stylesheet, while the actual data to be formatted is stored in an XML file. The stylesheet turns the data into an XSL-FO document, and the formatter renders that document to generate the final output.
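To make that split a bit more concrete, here is a minimal sketch of the first half of the pipeline, using only the XSLT support built into the JDK. The data, the element names (items, item), the page size, and the class name XmlToFoSketch are all made up for illustration and are not the files used later in this series. The sketch only produces the XSL-FO document as text; turning that into a PDF is the job of a formatter such as Apache FOP, which is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToFoSketch {

    // Hypothetical data file: the element names are invented for this example.
    static final String SAMPLE_DATA =
        "<items>"
      + "<item>apples</item>"
      + "<item>pears</item>"
      + "</items>";

    // Hypothetical stylesheet: one A4 page, one fo:block per item.
    static final String SAMPLE_STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/items\">"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"page\""
      + " page-height=\"29.7cm\" page-width=\"21cm\" margin=\"2cm\">"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"page\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<xsl:for-each select=\"item\">"
      + "<fo:block><xsl:value-of select=\".\"/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet (the format) to the data and returns XSL-FO text.
    static String transform(String dataXml, String stylesheetXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheetXml)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(dataXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the generated XSL-FO document; a formatter would render it.
        System.out.println(transform(SAMPLE_DATA, SAMPLE_STYLESHEET));
    }
}
```

The point of the sketch is that the data knows nothing about pages or fonts, and the stylesheet knows nothing about the particular items. Swapping in a different data file with the same structure gives a different document with the same look.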

I will be writing a number of posts showing how to take a very small data file and create a PDF file that shows off that list of data. I will then build on that solution to show how XSL-FO can be used to create really professional-looking documents.

The final PDF report file will look like this.

[Image: data7 (the final PDF report)]

It all begins with the page layout.
