Machine generation of PDF files the easy way – part I

The tools

In order to reproduce this work you will need to download Apache FOP and have a recent version of Java on your computer.

https://xmlgraphics.apache.org/fop/download.html

Page layout

There are a number of books that spend hundreds of pages to describe how to create the XSL-FO formatting files. I can only give a brief overview and try to describe a few examples to help bend XSL-FO to your will.

The first example will be taking a few names and creating a PDF file that is a list of these names. This is a snapshot from the actual output from step #1.data1There is no point in display the entire page here as it is only partially filled. It can be downloaded at the bottom of this post for the truly curious.

The actual layout for generating a single page PDF file using A4 as the page size and 2cm margins. In addition to the physical page definition, there are a lot of tags that appear to deal with formatting.

ie.

<fo:block font_size="16pt" font-weight="bold">
(other stuff here)

</fo:block>

The formatting

The formatting in the xslt file in a lot of cases is pretty obvious. Virtually every tag contains one more attributes. One of the nice things about the xslt is that the names chosen by the W3C are actually quite meaningful as are most of the attributes.

<fo:block
font-size=”16pt”
font-weight=”bold”
space-after=”5mm”>
List of
<xsl:value-of select=”/xmlroottag/listtype”/>
</fo:block>

It may or may not seem obvious but not every attribute can be used on every tag.  An obvious example of this is margins. Page layout is anything that relates to the physical size. It is the definition of how part of the document will look on with respect to the output page (body, heading or footer) The block could be considered a scope or style that lets you define the basic format for everything within that paragraph or block.

As not every tag has the same sets of attributes so it can be difficult to know which tags have which attributes. I suggest that you find either a good reference book. A book is nicer as you can slide printouts of specific code in between the pages to clarify any points or show examples of special functionality.

I was surprised that really none of the xslt books that cover XSL-FO received good reviews. I think this is because this is a really dense material and it is difficult to easily convey. I happened to purchase a book from O’Reilly which I used to help supplement other examples and information received from the internet.

Making XML Look Good in Print

The actual xslt layout that creates my PDF is listed below.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" exclude-result-prefixes="fo">
<xsl:template match="xmlroottag">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simpleA4" page-height="29.7cm" page-width="21cm" margin-top="2cm" margin-bottom="2cm" margin-left="2cm" margin-right="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <fo:page-sequence master-reference="simpleA4">
        <fo:flow flow-name="xsl-region-body">

          <fo:block font-size="16pt" font-weight="bold" space-after="5mm">
              List of <xsl:value-of select="/xmlroottag/listtype"/>
          </fo:block>

          <fo:block font-size="10pt">

              <xsl:apply-templates />

          <fo:block space-before="5mm">
                thats all folks!
          </fo:block>

          </fo:block>
        </fo:flow>
      </fo:page-sequence>
     </fo:root>
</xsl:template>

<xsl:template match="person">
     <fo:block font-size="12pt" >
         <xsl:value-of select="first"/>
         <xsl:value-of select="last"/>
         <xsl:value-of select="party"/>
     </fo:block>
</xsl:template>

<xsl:template match="listtype">
</xsl:template>

</xsl:stylesheet>

The data file is listed here so you can see the structure of the xml.

<xmlroottag>                                                                                                            
   <listtype>presidents </listtype>                                                                                     
   <person>                                                                                                             
      <first> Robert </first>                                                                                           
      <last> Kennedy </last>                                                                                            
      <party> Dem </party>                                                                                              
   </person>                                                                                                            
   <person>                                                                                                             
      <first> George </first>                                                                                           
      <last> Bush </last>                                                                                               
      <party> Rep </party>                                                                                              
   </person>                                                                                                            
   <person>                                                                                                             
      <first> Jimmy </first>                                                                                            
      <last> Carter </last>                                                                                             
      <party> Dem </party>                                                                                              
   </person>                                                                                                            
</xmlroottag>                               

The xslt document does contain a single stylesheet which is essentially a collection of templates that you need or want to generate specific output. The person template (lines 33-40) when executed will extract the first, last and party fields from the xml data, yet I am getting a bit ahead of myself.

The form matches up with the starting at the template for the xmlroottag which starts matching with the data at element xmlroottag of our xml data file. This xsl template (lines 3-31) defines how the pages will be formatted (lines 5 – 9). All pages will just be a collection of simple A4 sized pages with border of 2cm on each side.

It is on line 7 that the defines the body of the document. This body is actually described in lines 11 – 26. The region body is where the xslt starts to actually process the data file and as the data is matched it is then output into the body of our document.

The first thing that is done in our document body is to retrieve the “listtype” tag. I am pulling in the value form the listtype field from the data file into the document. You can see that it is possible to pull in a value from the data by giving the exact location of the data. This obviously only works due to the nature of this field existing in that particular location. If the structure of the file changes or the root tag name changes, then this data won’t be found nor will the form actually create any meaningful output. The way that this is done is neither elegant nor flexible and it will be corrected in the future examples.

One of the most innocuous lines is the apply-templates at line 21 . This is essentially asking FOP to simply perform any of the templates that are defined if they happen to match that same structure in the data file. Fop will encounter the tag listtype and execute that template, which in this case doesn’t do anything. The next tags in the data file are the three different person blocks.

This person template will be executed three times, once for each of the tag in the data file. When it is executed it will extract that data and output it into our document body.

After the last person tag is processed there is no more of the data file left, and thus control resumes in the region body on line 22. This just changes the formatting and outputs a bit of constant text.

Most of the xslt file is taken up with some of the more boilerplate parts of the code (ie page layout) with only a few lines being used to process the xml data. In the rest of my examples I will be using this single page layout using the simple-page-master. However, it is possible to create different layouts for even pages, odd pages, blank pages, first page, or last page. If you want to do anything really creative using XSL-FO you will have to investigate specifically for that point.

Making the PDF

It is possible to include the merging of the xml data and xslt inside of a Java program but for right now, I will simply run the command line program “fop” with its parameters to generate the form. This will allow us to produce the output that we desire.

fop -xml data1.xml -xsl data1.xsl -pdf data1.pdf

This command simply creates the data1.pdf file from the data1.xml and data1.xsl source files.

Note: This assumes that fop exists in your path, otherwise you will need to give the fully qualified name to the fop executable.

The next set of changes for this can be found in part II.

Download source and pdf for this example

 

This entry was posted in programming and tagged , , , . Bookmark the permalink.