Bug 150217

Summary: Add import Filter support for PDFs holding XFA based form content
Product: LibreOffice
Reporter: mikeclemmons_2000
Component: Draw
Assignee: Not Assigned <libreoffice-bugs>
Status: UNCONFIRMED
Severity: enhancement
CC: mikekaganski, quikee, vmiklos, vsfoote
Priority: medium
Keywords: needsDevEval
Version: 7.2.7.2 release
Hardware: All
OS: All
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:
Bug Blocks: 99746, 114233
Attachments: Draw cannot open PDF

Description mikeclemmons_2000 2022-08-01 08:10:08 UTC
Created attachment 181526
Draw cannot open PDF

PDF: https://www.canb.uscourts.gov/sites/default/files/forms/denb-request-form-version-5.15.pdf

Result:

If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document. 

Expected result:

What the Firefox PDF viewer shows
Comment 1 Mike Kaganski 2022-08-01 11:00:30 UTC
Not sure it's a bug, or that we should "fix" it.

LibreOffice is *not* a PDF viewer or editor. It can import some subset of PDF into its own document model; but PDF forms, scripting, and everything dynamic are out of scope for LibreOffice, IMO.

So it's completely normal that it correctly imports the placeholder.
Comment 2 V Stuart Foote 2022-08-01 15:19:26 UTC
PDFs with scripted XFA (Adobe's XML Forms Architecture), such as this one generated by Adobe's LiveCycle Designer ES 10.1, remain a common source document that LibreOffice users need to manipulate in some fashion. The inability to even view them is distracting.

Our current PDF import filters (both the pdfium-based and poppler-based implementations) do not parse the XML describing the form.

pdfium now has an XFA parser [1], poppler is looking at it [2]. And, mozilla has added XFA support to pdf.js [3].

XFA based forms are not going away anytime soon, so ability to at least expose the PDF form as an image with our pdfium implementation seems reasonable minimal handling.

Worth the effort?

=-ref-=

[1] https://github.com/chromium/pdfium/tree/master/xfa
[2] https://gitlab.freedesktop.org/poppler/poppler/-/issues/530
[3] https://github.com/mozilla/pdf.js/issues/2373
Comment 3 Mike Kaganski 2022-08-01 15:43:54 UTC
(In reply to V Stuart Foote from comment #2)
> PDF with scripted XFA ... remain a common source document
> LibreOffice users need to manipulate in some fashion.

This needs clarifying. Why do they need that? To create a static image?

LibreOffice imports PDF as a set of graphical objects. PDF forms are means to provide data to some services. The two worlds don't intersect.

> pdfium now has an XFA parser [1], poppler is looking at it [2]. And, mozilla
> has added XFA support to pdf.js [3].

Setting aside the general-purpose PDF libraries supporting dynamic PDF content (which is natural, given their general-purposeness), Mozilla's decision is natural (given that opening PDFs in browsers is a norm, and users expect to interact with such PDFs in a normal way), and is orthogonal to how LibreOffice opens these files, so this reference is also unrelated.

> XFA based forms are not going away anytime soon, so ability to at least
> expose the PDF form as an image with our pdfium implementation seems
> reasonable minimal handling.

Again: why? Providing some "minimal" image-like support for a purely dynamic feature seems worse to me than a clear "we do not support it".
Comment 4 V Stuart Foote 2022-08-01 16:18:22 UTC
(In reply to Mike Kaganski from comment #3)
As you note in comment 1, LibreOffice is *not* a PDF viewer or an editor.

For bug 89727 the project implemented a simple insert from PDF as image. For whatever purpose the user may need, we should at least provide the ability to render an XFA-based PDF form to a static image.

To me that is a reasonable minimal function, in line with our position that LO is not a PDF viewer or editor. But we should be able to see the content of any PDF, XFA forms included. Bug 114234 is also open, covering the need for a dialog to manipulate insert/import of multi-page PDFs.

Beyond that, it is the devs' choice whether to provide conversion on export to a functional form, which would require a much more complex import. There is no requirement driving LO's ability to do so.
Comment 5 Miklos Vajna 2022-08-02 06:20:01 UTC
pdfium has a feature flag to support XFA, but as Mike says it would be quite some effort to get that working (just to set expectations). The other trouble is that XFA would mean we also bundle the V8 JavaScript engine, which is again quite some maintenance.

My take would be that it's not impossible to do this (after all, Chrome's PDF viewer is somewhat similar to how you can view PDFs in Draw, and there this works), but it's quite hard. It's easier for browsers, which already have a JS engine at hand.
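
Short of enabling XFA in pdfium, an import filter could at least recognise these files and report them as unsupported instead of importing the placeholder page. The following is only a rough sketch of that idea, not LibreOffice code: it relies on the fact that an XFA form is referenced through an /XFA key in the PDF's AcroForm dictionary, and it assumes that key is stored uncompressed. The function name looks_like_xfa is hypothetical, and a raw byte scan will miss files that keep the dictionary inside a compressed object stream.

```python
import re

# Match the PDF name "/XFA" only when it is followed by whitespace or a PDF
# delimiter, so that longer names such as "/XFAish" are not counted.
_XFA_KEY = re.compile(rb"/XFA(?=[\s()<>\[\]{}/%])")


def looks_like_xfa(data: bytes) -> bool:
    """Heuristic (assumption: the AcroForm dictionary is not compressed).

    Returns True when the bytes start with a PDF header and contain an
    /XFA key somewhere in the file.
    """
    if not data.startswith(b"%PDF-"):
        return False
    return _XFA_KEY.search(data) is not None


if __name__ == "__main__":
    sample = b"%PDF-1.7\n1 0 obj\n<< /AcroForm << /XFA 5 0 R >> >>\nendobj\n"
    print(looks_like_xfa(sample))
```

A real filter would ask the PDF library for the parsed AcroForm dictionary rather than scanning bytes; the scan is used here only to keep the example self-contained.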