xhtmlrenderer PDF Servlet Filter

August 12, 2008

I managed to get xhtmlrenderer working as a Servlet Filter, so that any HTML page you apply the filter to will be converted to PDF. Here are a few hints:

  • If you want to keep the XHTML in memory rather than writing it out to a file, then you’ll need an XML parser like Xerces to parse the XHTML into a DOM object before you can pass it to xhtmlrenderer. (iText has got to have some kind of XML parser it uses that you could probably use too, instead of including a separate library. I just didn’t take the time to figure it out.)
  • You’ll need something like a ByteArrayServletOutputStream to capture the JSP output in memory so you can then convert it to PDF. Catalina (Tomcat’s servlet container) has one.
  • If you have a table that spans multiple pages and you want to repeat the table header and footer on each page, use the xhtmlrenderer-specific CSS property “-fs-table-paginate: paginate;” on the table.
  • Be sure to pass xhtmlrenderer the URL of the page it’s rendering, so that it can access relative URLs to pull up external resources like images and CSS files.
  • I’m using JSPs as the templating solution, and I ran into some buffering issues where no content was written into the ServletOutputStream when I was applying the filter. A manual call to out.flush() at the end of the JSP works for now, but I’m looking for a better solution.
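
Here’s a minimal sketch of the capture, flush, and parse steps from the hints above, using only classes that ship with the JDK. It is not the actual filter: writePage is a hypothetical stand-in for the JSP, the class and method names are made up for illustration, and the JDK’s built-in JAXP parser stands in for Xerces. In a real filter the ByteArrayOutputStream would sit behind a response wrapper, and the resulting Document would go to xhtmlrenderer together with the page URL (if I have the signature right, ITextRenderer.setDocument(Document, String)).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class InMemoryXhtml {

    /** Hypothetical stand-in for the JSP: writes a small XHTML page to the writer. */
    static void writePage(PrintWriter out) {
        out.println("<html><head><title>Report</title></head><body>");
        // The table opts in to repeated headers/footers via the custom CSS property.
        out.println("<table style=\"-fs-table-paginate: paginate;\">");
        out.println("<thead><tr><th>Item</th></tr></thead>");
        out.println("<tfoot><tr><td>Total</td></tr></tfoot>");
        out.println("<tbody><tr><td>Widget</td></tr></tbody>");
        out.println("</table></body></html>");
    }

    /** Captures the page in memory instead of sending it to the client. */
    static byte[] capture() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out =
                new PrintWriter(new OutputStreamWriter(buffer, StandardCharsets.UTF_8));
        writePage(out);
        // Without this flush the content may still be sitting in the writer's
        // own buffer, and the byte array would come back empty or truncated.
        out.flush();
        return buffer.toByteArray();
    }

    /** Parses the captured bytes into a DOM Document without touching the disk. */
    static Document parse(byte[] xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xhtml));
        } catch (Exception e) {
            throw new IllegalArgumentException("Captured output is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse(capture());
        System.out.println(dom.getDocumentElement().getNodeName()); // prints "html"
    }
}
```

The sample page deliberately has no DOCTYPE and no named entities like &nbsp;, so the parser never needs to fetch a DTD. Real JSP output would probably need more care on that front.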

Java PDF Libraries

August 8, 2008

Looking today for a few OSS options for generating PDFs from Java. Here’s a quick summary of what I’ve found:

  • Apache FOP – generates PDFs from XSL-FO, an XML vocabulary for describing page layout. You can use XSLT to translate your own XML into XSL-FO.
  • iText – generates PDFs using API calls.
  • xhtmlrenderer (AKA Flying Saucer Project) – generates PDFs from HTML documents. You can use your regular templating engine to create the HTML, or even put the renderer in a servlet filter in order to PDFify JSP output. As a result, maintenance is easier than with the other frameworks, because you’re maintaining template files of the same kind as your webapp. HTML is going to be less flexible than native PDF commands, but there is a workaround for repeating table headers and footers on each page, for example.
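
To make the Apache FOP item a bit more concrete, here’s a rough sketch of the XSLT step only, using the transformer built into the JDK. The input format (a report element with a title and some line elements), the stylesheet, and the class name are all invented for illustration. The output is a bare-bones formatting-objects document that you would then hand to FOP to get the actual PDF; that step isn’t shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToXslFo {

    // Hypothetical stylesheet: maps <report> to a single-page-master FO document.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
        + "<xsl:template match=\"/report\">"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name=\"page\""
        + " page-height=\"297mm\" page-width=\"210mm\" margin=\"20mm\">"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference=\"page\">"
        + "<fo:flow flow-name=\"xsl-region-body\">"
        + "<fo:block font-size=\"18pt\"><xsl:value-of select=\"title\"/></fo:block>"
        + "<xsl:for-each select=\"line\">"
        + "<fo:block><xsl:value-of select=\".\"/></fo:block>"
        + "</xsl:for-each>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the XSL-FO as a string. */
    static String toXslFo(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform input XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><title>Sales</title><line>Widget</line></report>";
        System.out.println(toXslFo(xml));
    }
}
```

Compared with the xhtmlrenderer approach, this means maintaining a stylesheet per document type on top of whatever produces the source XML, which is part of why the HTML route looks easier to maintain.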

A well-known library I don’t recommend is pd4ml – it also generates PDFs from HTML documents, but, if memory serves me correctly, it isn’t as feature-rich as xhtmlrenderer, and it’s also commercial (non-free) software.