Integrating XSL-FO into Web-based applications

Adelene Ng

JavaMail and PDF report creation

This article demonstrates how we can integrate XSL-FO, XSLT, and JavaMail into our existing web-based applications. I show you how we can generate PDF reports for an application through the use of XSLT and XSL-FO embedded within the Java application. I also illustrate how the generated PDF file can be sent as an e-mail attachment using JavaMail.

Although a variety of Web-based technologies such as servlets or Web services are available, I chose the JSP approach. A simple JSP-based test harness was written to demonstrate the integration of all these technologies. An HTML table report generation example is also included to show how the XSL-FO table elements correspond to the HTML table elements. Although using XSL-FO and XSLT may be overkill, the greater degree of formatting control and flexibility may prove advantageous.

Tools Used

The entire system was written in Java running on Windows 2000 Professional. The tools used to build this system include:

* Java 2 Platform, Standard Edition (J2SE): http://java.sun.com/j2se/

* JavaMail 1.3: http://java.sun.com/products/javamail/

* JavaBeans Activation Framework 1.0.2: http://java.sun.com/products/javabeans/glasgow/jaf.html

* JDOM Beta 9: www.jdom.org

* Log4j v 1.2.8 from Apache: http://logging.apache.org/log4j

* FOP-0.20.5: http://xml.apache.org/fop/index.html

* Apache Tomcat 5.0.16 (JSP and Servlet Engine): http://jakarta.apache.org/tomcat/index.html

Formatting Objects Processor (FOP)

XSL-FO is part of the Extensible Stylesheet Language (XSL) family of recommendations from the W3C. XSL is used to define XML document transformations and presentations and is made up of:

* XSLT/XPath

* XSL-FO

XSLT is a language for transforming XML documents (mainly from XML to XML or XML to HTML).

XPath allows you to identify specific parts of your XML document and to write expressions to refer to, for example, the nth child element of the specified XML file. It is used extensively by XSLT for referencing specified elements within the input document for further processing.
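To make the idea concrete, here is a minimal sketch that evaluates a few XPath expressions with the javax.xml.xpath API that ships with current JDKs. The sample document, its element names, and the date value are hypothetical stand-ins shaped like the statistics data used later in this article; the XPathSketch class is for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Hypothetical sample data; the element names and values are assumptions.
    static final String SAMPLE =
        "<statistics date=\"2004-01-15\">"
      + "<users>12</users><groups>3</groups><printers>5</printers>"
      + "</statistics>";

    /** Evaluates an XPath expression against SAMPLE and returns the result as a string. */
    static String eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(SAMPLE)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/statistics/@date"));      // the date attribute of the root element
        System.out.println(eval("name(/statistics/*[2])")); // the name of the second child element
        System.out.println(eval("/statistics/*[2]"));       // the text of the second child element
    }
}
```

The same kinds of expressions appear in the select attributes of the XSLT stylesheets shown later.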

XSL-FO is an XML language that defines page formatting and layout.

FOP is an implementation of the XSL-FO specification defined by W3C. It is both an open source library and an application used to convert your XML documents into paginated output. FOP supports a number of different output formats, such as PDF, PostScript, PCL, and text.

Why XSL-FO?

XSL-FO, in particular Apache FOP, was used because FOP allows for easy conversion from XML to PDF. The formatting commands are not embedded in the Java application but are stored in a separate file. This means I can easily change the “look” of the resulting document. Also, XSL-FO is specified in a W3C Recommendation (XSL Version 1.0).

Although there are a number of free PDF libraries (non-XSL-FO) available, such as PJ by Etymon (www.etymon.com/epub.html) and retepPDF (www.retep.org.uk/retep/home.do), I have chosen to compare iText (www.lowagie.com/iText) to Apache FOP because iText has been around much longer and is a popular package. Table 1 highlights some of the differences between iText and Apache FOP.

System Architecture

I have adopted an n-tier architecture for this system (see Figure 1). Clients communicate with the Web server, which serves up the HTML and JSP pages. The relationship between these pages is described in detail below. The JSP page instantiates helper objects (GenStatistics, Pairs) that live on the Web server. These in turn connect to the application server, retrieving the results and storing them in the helper objects. The results are then presented to the clients.

[FIGURE 1 OMITTED]

Design Overview

Figure 2 shows the relationship between the various HTML and JSP pages in the application. The start.html page is the entry point into the system. These pages are not elaborate, as their purpose is to serve as a test framework. When the “Submit” button on the start page is pressed, the genReport.jsp page is called. This checks the user selection. Depending on the type of report requested, either the genHTMLReport.jsp or genPDFReport.jsp page is invoked.

[FIGURE 2 OMITTED]

Both pages invoke the main class, GenStatistics, which connects to the application server, retrieves the data from the database, and stores the results in an array of the Pairs Bean. However, for illustration, I have removed this unnecessary complexity from the code examples and replaced it with a simple “read the data from file” illustration.

If the report is requested in HTML format, the genHTMLReport.jsp page is invoked.

If a PDF report is requested, the genPDFReport.jsp page is invoked. The results are stored in the Pairs array. This is traversed and the data is “XML-ized” using JDOM. An XSLT file containing the embedded XSL-FO commands is defined and applied to the XML-ized data, creating an FO file, which can then be passed to the Apache FOP driver to be rendered.

When the PDF report has been successfully generated, it can either be mailed out as an attachment to the specified recipient(s) or saved and displayed.

* Supporting Data Structures, Pairs Bean: I have defined a “Pairs” JavaBean to store the information retrieved from the application. It contains two attributes: a name (type String) and its value (type int). It contains methods for setting and retrieving attributes.

* The Main Class, GenStatistics: The application reads the data from the database and stores it in the “Pairs” array. When the results are returned, the data is “XML-ized.” I used JDOM for this purpose. JDOM is a Java representation of an XML document and is much easier to use than DOM.

–To use JDOM, I first create the root node:

Element root = new Element("statistics");

–Next, I create a Document object, passing in the root node:

Document myDoc = new Document(root);

–To add more nodes to the root element, I iterate through the “Pairs” array. A new Element is created for each “Pairs” element in the array:

Element e = new Element(pairs[i].getName());

–To set the text value associated with the node, I call the setText method:

e.setText(Integer.toString(pairs[i].getValue()));

–This element is added to the root node, via root.addContent.

–When the XML Document object has been populated, it will have the structure shown in Figure 3.

[FIGURE 3 OMITTED]
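For readers without JDOM on the classpath, the bean and the document-building steps above can be sketched with the DOM and JAXP classes bundled in the JDK. This is a stand-in for the JDOM calls, not the code used in the application: the StatisticsXmlSketch class, the sample names and values, and the way the date attribute is set are all assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StatisticsXmlSketch {

    /** Name/value bean as described in the text: a String name and an int value. */
    public static class Pairs {
        private String name;
        private int value;
        public Pairs(String name, int value) { this.name = name; this.value = value; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
    }

    /** Builds a statistics root element with one child element per Pairs entry. */
    static Document toDocument(Pairs[] pairs, String date) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("statistics");
            root.setAttribute("date", date); // assumed: the stylesheets read /statistics/@date
            doc.appendChild(root);
            for (Pairs p : pairs) {
                Element e = doc.createElement(p.getName()); // the name must be a legal XML element name
                e.setTextContent(Integer.toString(p.getValue()));
                root.appendChild(e);
            }
            return doc;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Serializes the document to a string using an identity transform. */
    static String toXml(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Pairs[] pairs = { new Pairs("users", 12), new Pairs("groups", 3) };
        System.out.println(toXml(toDocument(pairs, "2004-01-15")));
    }
}
```

Because each element name comes straight from the bean, a name containing spaces or starting with a digit would be rejected when the element is created.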

Finally, the transform method is called to produce the XSL-FO commands. This takes as input the Document object, an XSLT stylesheet containing the embedded XSL-FO commands, and an FO output filename. An FO file containing the XSL-FO commands is created. The PDF is then generated by invoking the FOP Driver. Alternatively, the XSLT stylesheet can be set up to transform the XML document into HTML, which is then displayed in your browser (see Figure 4).

[FIGURE 4 OMITTED]

Using the XSLT Style Sheet with XSL-FO

Although it is possible to create the XSL-FO commands by hand, it is troublesome to edit and modify them each time the XML contents change. Instead, it is easier to use an XSLT stylesheet to transform the XML data into an XSL-FO file (see Listing 1).

LISTING 1 * Transform the XML data into an XSL-FO file

1
2 <xsl:stylesheet version="1.0"
3 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
4 xmlns:fo="http://www.w3.org/1999/XSL/Format">
5
6 <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
7
8 <fo:simple-page-master master-name="simple"
9 page-height="29.7cm"
10 page-width="21cm"
11 margin-top="1cm"
12 margin-bottom="2cm"
13 margin-left="2.5cm"
14 margin-right="2.5cm">
15
16
17
18
19
20
21
22 Server Statistics
23
24
25 Server Statistics at <xsl:value-of select="/statistics/@date"/>
26
27
28
29 Page
30
31
32
33
34
35
36
37
38 <fo:table-cell border-style="solid" display-align="center"
39 border-color="black" border-width="1pt"
40 padding-before="3pt" padding-after="3pt"
41 padding-start="3pt" padding-end="3pt">
42 Type of Object
43
44 <fo:table-cell border-style="solid" display-align="center"
45 border-color="black" border-width="1pt"
46 padding-before="3pt" padding-after="3pt"
47 padding-start="3pt" padding-end="3pt">
48 Total Number
49
50
51
52
53 <fo:table-cell border-style="solid" display-align="center"
54 border-color="black" border-width="1pt"
55 padding-before="3pt" padding-after="3pt"
56 padding-start="3pt" padding-end="3pt">
57
58 <xsl:value-of select="name()"/>
59
60
61 <fo:table-cell border-style="solid" display-align="center"
62 border-color="black" border-width="1pt"
63 padding-before="3pt" padding-after="3pt"
64 padding-start="3pt" padding-end="3pt">
65
66
67
68
69
70
71
72
73
74
75
76
77
78

1. The XSLT file, dsstats.xsl, starts with the XML and namespace declaration (see lines 1-4).

2. A template rule is declared on line 5. This rule searches for and matches the root tag, replacing it with the content following it.

3. Line 6 is the root element for the XSL-FO document, <fo:root>. This typically contains the <fo:layout-master-set> followed by one or more <fo:page-sequence> elements.

4. The <fo:layout-master-set> element defines the page for our document (line 7). The <fo:simple-page-master> element embedded within the <fo:layout-master-set> tag defines the page layout required for this application (lines 8-19).

5. The page-height and page-width attributes (lines 9-10) define the size of the physical page. The master-name attribute (line 8) declares a name for this master page. It is referenced by the master-reference attribute in the <fo:page-sequence> element (line 21). The meaning of the remaining attributes is shown in Figure 5 (lines 11-18).

[FIGURE 5 OMITTED]

6. The <fo:static-content> element (lines 23 and 28) allows the data occurring within these tags to appear on every page. Lines 23-30 show how the title and page number can be made to appear on every page.

7. The <fo:block> element is used to format paragraphs, titles, figure captions, and table titles. It can also contain raw text. In the code snippets presented here, I show an example containing raw text plus an XSL command, <xsl:value-of> (line 25). The XSL command matches the attribute “date” associated with the tag “statistics”, extracting its value. The second example shows raw text plus an embedded <fo:page-number/> command (line 29).

8. The <fo:flow> element (line 31) contains the actual content. This is made up of sequences of fo:block, fo:block-container, fo:table-and-caption, fo:table and fo:list-block. The flow-name attribute specifies where the flow’s content will be placed.

9. The XSL-FO <fo:table> command is used to generate tables. The <fo:table> command (line 33) is nested inside the element that opens on line 32. The table-layout attribute is set to “fixed”. This is the only option currently supported. Next, I specify the <fo:table-column> element (lines 34-35), setting the column-width attribute to a specified value.

10. The <fo:table-body> element (line 36) contains all the <fo:table-row> elements (line 37). Each <fo:table-row> element contains the <fo:table-cell> elements (line 38). This is where all the work for the tables is done. A number of properties are associated with the <fo:table-cell> element. These control the table's look and feel.

11. Finally, to populate the cells of the table, I use the XSL <xsl:for-each> command (line 51) to iterate through all the child elements contained in the XML data. For each child element of the statistics tag, I get the name of the node (line 58) and its value (line 66), and populate the table cells.

12. To ensure that the XSLT and XSL-FO commands are error free (before attempting to integrate all the components), I verified that the XSLT file produced the correct PDF output by running the files through the fop.bat utility:

fop -xml myTest.xml -xsl dsstats.xsl -pdf myTest.pdf
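The structure walked through in the steps above can be tried out without FOP by running a condensed stylesheet through the XSLT processor bundled with the JDK. The sketch below is not Listing 1: the region extents, column widths, flow names, and select expressions are assumed values chosen to follow the description, and the FoStylesheetSketch class is hypothetical. It only produces the XSL-FO text; rendering it to PDF is still FOP's job.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FoStylesheetSketch {

    // Condensed stylesheet in the shape described above; all sizes and names are assumptions.
    static final String XSLT =
        "<?xml version='1.0'?>"
      + "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='simple' page-height='29.7cm' page-width='21cm'"
      + " margin-top='1cm' margin-bottom='2cm' margin-left='2.5cm' margin-right='2.5cm'>"
      + "<fo:region-body margin-top='2cm' margin-bottom='1.5cm'/>"
      + "<fo:region-before extent='2cm'/>"
      + "<fo:region-after extent='1.5cm'/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='simple'>"
      // Static content is repeated on every page: a heading and a page number.
      + "<fo:static-content flow-name='xsl-region-before'>"
      + "<fo:block>Server Statistics at <xsl:value-of select='/statistics/@date'/></fo:block>"
      + "</fo:static-content>"
      + "<fo:static-content flow-name='xsl-region-after'>"
      + "<fo:block>Page <fo:page-number/></fo:block>"
      + "</fo:static-content>"
      // The flow holds the table; one row is emitted per child of the statistics element.
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<fo:table table-layout='fixed'>"
      + "<fo:table-column column-width='8cm'/><fo:table-column column-width='8cm'/>"
      + "<fo:table-body>"
      + "<xsl:for-each select='/statistics/*'>"
      + "<fo:table-row>"
      + "<fo:table-cell border-style='solid'><fo:block><xsl:value-of select='name()'/></fo:block></fo:table-cell>"
      + "<fo:table-cell border-style='solid'><fo:block><xsl:value-of select='.'/></fo:block></fo:table-cell>"
      + "</fo:table-row>"
      + "</xsl:for-each>"
      + "</fo:table-body></fo:table>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the XSL-FO output. */
    static String toFo(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo(
            "<statistics date=\"2004-01-15\"><users>12</users><groups>3</groups></statistics>"));
    }
}
```

Inspecting the FO output this way is a quick check that the XSLT half is correct before involving the renderer.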

Transforming the Document

Having created the XSLT stylesheet containing the embedded XSL-FO commands, the next step is to programmatically invoke this stylesheet within the application. The output of this transformation step produces a .FO file. This contains the XSL-FO commands and data for populating the table rows. This data was extracted from the XML document created previously. The sequence of steps is outlined below:

1. A StreamSource is first created from the given XSLT style sheet File object:

StreamSource strmSource = new StreamSource(styleSheetFile);

2. Next, create the Transformer object from the TransformerFactory, passing in an instance of the StreamSource object.

TransformerFactory transformerFactory = TransformerFactory.newInstance();

Transformer transformer = transformerFactory.newTransformer(strmSource);

3. Invoke the transform method, passing in the JDOMSource and JDOMResult as arguments. Note, the JDOMSource constructor takes in a Document object as input.

transformer.transform(jdSrc, jdRes);

4. Finally, output the resulting .FO file using the XMLOutputter object:

XMLOutputter xmlOutputter = new XMLOutputter(" ", true);

This will create an XMLOutputter object with the specified indent (usually a number of spaces). If the second argument is true, new lines will be printed.

xmlOutputter.output(jdRes.getDocument(), new FileOutputStream(fopFileName));
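The four steps above can also be sketched end to end with only the JAXP classes in the JDK, substituting DOMSource and StreamResult for JDOMSource, JDOMResult, and XMLOutputter. This is an assumption-laden stand-in rather than the application's code: the FoFileSketch class, the temporary file names, and the deliberately trivial stylesheet (which exercises the plumbing but is not a renderable page layout) are all hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FoFileSketch {

    // Placeholder stylesheet: one fo:block per child element, with no page layout.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/'><fo:root>"
      + "<xsl:for-each select='/statistics/*'>"
      + "<fo:block><xsl:value-of select='name()'/>: <xsl:value-of select='.'/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:root></xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet file to the document and writes the result to foFile. */
    static void transform(Document doc, File styleSheetFile, File foFile) {
        try (OutputStream out = new FileOutputStream(foFile)) {
            StreamSource strmSource = new StreamSource(styleSheetFile);               // step 1
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer(strmSource);          // step 2
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");                  // request line breaks in the output
            transformer.transform(new DOMSource(doc), new StreamResult(out));         // steps 3 and 4 combined
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Runs the whole sequence on sample data and returns the text written to the FO file. */
    static String demo() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(
                new StringReader("<statistics><users>12</users><groups>3</groups></statistics>")));

            File xsl = File.createTempFile("dsstats", ".xsl");
            File fo = File.createTempFile("dsstats", ".fo");
            xsl.deleteOnExit();
            fo.deleteOnExit();
            Files.write(xsl.toPath(), XSLT.getBytes(StandardCharsets.UTF_8));

            transform(doc, xsl, fo);
            return new String(Files.readAllBytes(fo.toPath()), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Writing straight to a stream in step 3 removes the need for a separate output step, at the cost of not having the result available as an in-memory document.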

PDF Report Creation

Once the .FO file is created, I call the FOP Driver run() method to render the document.

1. First, set up logging. The MessageHandler object handles the global logging of all FOP processes. The FOP Driver handles per-instance logging. Both of these have to be set using an implementation of org.apache.avalon.framework.logger.Logger. I used the Log4JLogger implementation because existing code on the application server currently utilizes Log4J.

2. Create a Log4JLogger object and associate this with the static logger object created on start up in the GenStatistics class, i.e.

org.apache.avalon.framework.logger.Log4JLogger Alogger = new org.apache.avalon.framework.logger.Log4JLogger(GenStatistics.logger);

3. Next, instantiate the FOP Driver.

Driver driver = new Driver();

4. Set the logger associated with the FOP Driver to point to Alogger.

driver.setLogger(Alogger);

5. Make the screen logger point to Alogger.

MessageHandler.setScreenLogger(Alogger);

6. Set the type of rendering desired via

driver.setRenderer(Driver.RENDER_PDF);

7. Set the input source to

driver.setInputSource(new InputSource(new FileInputStream(new File(fopFileName))));

8. Set the output stream to

driver.setOutputStream(new FileOutputStream(new File(pdfFileName)));

9. Finally, render it.

driver.run();

If you are specifying XSLT and XML files as input, change steps 7 and 9 to:

InputHandler inputHandler = new XSLTInputHandler(xmlFile, xsltFile);

driver.render(inputHandler.getParser(), inputHandler.getInputSource());

Mailing the Generated PDF Report as an Attachment

After the PDF report has been generated, I use the JavaMail API to send the document out as an email attachment.

1. Obtain a default Session.

Session s = Session.getDefaultInstance(myProperties, null);

myProperties is the properties object. I supply the mail.smtp.host property defined in the Mail.properties file. The Authenticator object (the second argument) is used to indirectly check access permissions. If the Authenticator object is set to null, this means anyone can get the default session.

2. Create a new Message.

Message msg = new MimeMessage(s);

3. Set the To, From, Subject, and Date Sent fields.

4. Create the Message Body Parts.

MimeBodyPart mbp2 = new MimeBodyPart();

5. Attach the file to the message.

FileDataSource fds = new FileDataSource(attachment);

mbp2.setDataHandler(new DataHandler(fds));

mbp2.setFileName(attachment);

6. Create a multipart object and add the message body parts from step 4 to it.

Multipart mp = new MimeMultipart();

mp.addBodyPart(mbp2);

7. Add the multipart object to the message.

msg.setContent(mp);

8. Send the message.

Transport.send(msg);
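Step 1 above depends on a Properties object that carries the mail.smtp.host setting. As a rough sketch of how such an object might be loaded and checked before it is handed to Session.getDefaultInstance, the code below uses only java.util.Properties. The MailPropertiesSketch class and the host name are placeholders, and a StringReader stands in for the real Mail.properties file, whose location is not given here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class MailPropertiesSketch {

    /** Loads the properties and fails fast if mail.smtp.host is missing or blank. */
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read mail properties", e);
        }
        String host = props.getProperty("mail.smtp.host");
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalStateException("mail.smtp.host is not set");
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of Mail.properties; the host name is a placeholder.
        String contents = "mail.smtp.host=smtp.example.com\n";
        Properties myProperties = load(new StringReader(contents));
        System.out.println(myProperties.getProperty("mail.smtp.host"));
    }
}
```

Checking the property up front gives a clearer error than a failed connection attempt when the message is finally sent.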

HTML Report Generation

As an alternative to generating a PDF report, I also show how XSLT can be used to transform the XML document created previously into an HTML report. A complete listing of the XSLT commands for this can be seen in Listing 2.

LISTING 2 * XSLT commands to translate XML to HTML

1
2 <xsl:stylesheet version="1.0"
3 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
4
5
6
7 Server Statistics – HTML format
8
9
10
11 Server Statistics at <xsl:value-of select="/statistics/@date"/>
12
13
14
15 Type of Object
16 Total Number
17
18
19
20
21
22
23
24
25
26
27
28

1. Lines 1-3 show the XML and namespace declarations.

2. The template rule is shown on line 4. This searches for and matches the root tag, replacing it with the content following it.

3. Table creation begins on line 13. The table headings are specified on lines 14-17.

4. Iterate through all the child nodes of the <statistics> element, populating each row of the table with the object name and number of objects (lines 18-23).

5. From Listing 2, we see that the XSL-FO table and HTML table commands are very similar. Table 2 shows the relationship between the two.
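The HTML variant can be exercised the same way with the JDK's built-in XSLT processor. The stylesheet below is a condensed sketch in the spirit of Listing 2, not a copy of it: the exact markup, the border attribute, and the HtmlReportSketch class are assumptions, and the sample data is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlReportSketch {

    // Assumed markup: a heading, a header row, and one table row per child element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/'>"
      + "<html><head><title>Server Statistics</title></head><body>"
      + "<h1>Server Statistics at <xsl:value-of select='/statistics/@date'/></h1>"
      + "<table border='1'>"
      + "<tr><th>Type of Object</th><th>Total Number</th></tr>"
      + "<xsl:for-each select='/statistics/*'>"
      + "<tr><td><xsl:value-of select='name()'/></td><td><xsl:value-of select='.'/></td></tr>"
      + "</xsl:for-each>"
      + "</table></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the HTML output. */
    static String toHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(
            "<statistics date=\"2004-01-15\"><users>12</users><groups>3</groups></statistics>"));
    }
}
```

Comparing this with the XSL-FO sketch shows the same for-each loop driving both outputs; only the element vocabulary around it changes.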

Running the Test Application

1. Install Tomcat 5.X on your system

2. Drop your JSP files into the C:\jakarta-tomcat-5.0.16\webapps\MyApplication\jsp folder

3. Make sure that the associated JAR files are in the C:\jakarta-tomcat-5.0.16\webapps\MyApplication\WEB-INF\lib directory

4. Start up Tomcat, C:\jakarta-tomcat-5.0.16\bin\startup.bat

5. Start up your browser and point it at the start.html page

Conclusion

This article has shown you how the various technologies such as XSL-FO, XSLT, and JavaMail can be integrated into existing Web-based applications. A simple front end using JSP is provided as a test harness. I also explained how XSLT can be used to generate the XSL-FO commands, illustrated how the PDF or HTML reports are generated, and showed how JavaMail is used to e-mail the PDF reports to a specified recipient list.

Table 1 * iText vs. FOP

iText | Apache FOP

Uses its own input format, iText-xml | Uses the XSL-FO specification

Used for postprocessing FOP-generated PDF files (merging, updating, and encrypting) | -

Document generation is much faster for long documents | Slower for long documents

Experimental XML2PDF functionality | Fully supports XML to PDF conversion

Table 2 * Mapping between XSL-FO and HTML table elements

XSL-FO Element | HTML Element

 | cols attribute of <table>, i.e. <table cols=>

Acknowledgements

I want to thank Tzu-Khiau Koh and Yee-Koon Loh for their comments on early drafts of this article.

Reference

* Extensible Stylesheet Language, Version 1.0: www.w3.org/TR/xsl/

Adelene Ng is an independent software consultant. She holds a Ph.D. in Computer Science from the University of London, United Kingdom, and an M.Sc. from the University of Manchester, United Kingdom.

NG_A@HOTMAIL.COM

COPYRIGHT 2004 Sys-Con Publications, Inc.

COPYRIGHT 2004 Gale Group