FOP 2.3 Generates Truncated/Corrupted PDF with Mathematical Unicode Characters
We use FOP 2.3 to generate PDFs from HTML, and in some very rare cases the resulting PDF appears to be truncated and will not open in any PDF viewer. I cannot tell which aspects of the HTML trigger the problem, and I would appreciate help determining what makes this particular HTML fail.
We detected the issue because we use the iText (formerly Lowagie) PdfReader to validate that each PDF we generate is well-formed. The PdfReader threw the following exception:
com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: trailer not found.; Original message: PDF startxref not found.
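For anyone trying to reproduce this without iText, the "startxref not found" message can be approximated with a plain byte check on the end of the file. This is only a rough sketch, not what PdfReader actually does internally: the class name PdfTailCheck and the 1024-byte window are my own assumptions, and passing this check does not prove the PDF is valid.

```java
import java.nio.charset.StandardCharsets;

final class PdfTailCheck {
    // Assumption: trailer markers are expected near the end of the file;
    // 1024 bytes is an illustrative window, not a value taken from iText.
    private static final int TAIL_WINDOW = 1024;

    /** Returns true if "startxref" is followed by "%%EOF" within the last TAIL_WINDOW bytes. */
    static boolean hasTrailerMarkers(byte[] pdf) {
        int from = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so offsets stay stable.
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        int xref = tail.lastIndexOf("startxref");
        int eof = tail.lastIndexOf("%%EOF");
        return xref >= 0 && eof > xref;
    }
}
```

A file that fails this check was most likely cut off before FOP wrote its trailer, which would match the exception above.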
In researching this exception, I found that in every reported case the fix was to make sure the input and output streams were flushed or closed properly. In our case we use the Java try-with-resources pattern, which invokes close() automatically, so I don't believe this is our issue.
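To make the stream handling concrete, here is a minimal sketch of the pattern I am describing. It is not our production code: PdfWriter is a hypothetical stand-in for the FOP transform step (it is not a FOP API), and the in-memory sink is only there to keep the example self-contained. The point is that the bytes are read for validation only after the try block has closed the buffered stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class RenderThenValidate {
    /** Hypothetical stand-in for the FOP transform step; not part of FOP. */
    interface PdfWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Renders into memory and returns the bytes only after the stream is closed. */
    static byte[] render(PdfWriter writer) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink)) {
            writer.writeTo(out);
            // Reading sink here would be too early: bytes may still sit in the buffer.
        }
        // close() has flushed the buffer, so the sink now holds everything written.
        return sink.toByteArray();
    }
}
```

If validation ran inside the try block instead, a missing trailer would be expected, which is why I checked that ours runs after it.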
The majority of the characters rendered in the PDF are Mathematical Double-Struck characters (e.g. https://www.compart.com/en/unicode/U+1D538), but not exclusively -- many are ordinary Latin letters. The problem seems to be tied to the number of characters rather than to any particular character: I can make the PDF render by deleting enough characters, and it still renders if I then restore the previously deleted characters and delete different ones instead. In some cases, even adding more Latin characters makes it render. Because the PDF renders correctly in those cases, I believe we have the fonts needed for the Mathematical characters.
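One detail that may matter: U+1D538 lies outside the Basic Multilingual Plane, so each such character occupies two Java chars (a surrogate pair). I have not confirmed that this is related to the failure. The sketch below is just a diagnostic for correlating input size with failures, and the class and method names are my own.

```java
final class CodePointStats {
    /** Counts code points above U+FFFF, i.e. those stored as surrogate pairs. */
    static int supplementaryCount(String text) {
        return (int) text.codePoints()
                .filter(Character::isSupplementaryCodePoint)
                .count();
    }

    /** Returns {UTF-16 char count, code point count, supplementary code point count}. */
    static int[] summarize(String text) {
        return new int[] {
            text.length(),
            text.codePointCount(0, text.length()),
            supplementaryCount(text)
        };
    }
}
```

Running summarize over the text of a failing and a passing document would show whether the failures track the UTF-16 length, the code point count, or neither.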
I understand there are many factors in play, so I've tried to provide only the relevant information -- please let me know if other details would help in diagnosing the issue. I have omitted the HTML because it is over 200 lines long, but I'm happy to provide it if you would like to look at it.