Bug 85569 - FILEOPEN: OLE insertion of PDF pages, bounding box and extent are not set correctly (STR comment 16)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 4.3.1.2 release (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks:
 
Reported: 2014-10-28 19:40 UTC by Jérôme Borme
Modified: 2017-02-02 11:37 UTC

See Also:
Crash report or crash signature:


Attachments
Document with the two inserted documents (62.04 KB, application/vnd.oasis.opendocument.presentation)
2014-10-28 19:40 UTC, Jérôme Borme
screen capture of the previous document (47.14 KB, image/png)
2014-10-28 19:41 UTC, Jérôme Borme
pdf file generated by scilab 5.5.1 (12.94 KB, application/pdf)
2014-10-28 19:41 UTC, Jérôme Borme
pdf file generated by gnuplot 4.6.5 (9.28 KB, application/pdf)
2014-10-28 19:42 UTC, Jérôme Borme
PDFs filtered in as Draw objects and inserted (69.42 KB, application/vnd.oasis.opendocument.presentation)
2014-10-29 15:18 UTC, V Stuart Foote
pdf files opened in Draw, saved to odg and dropped into Impress (blue frame added for visibility) (77.82 KB, application/vnd.oasis.opendocument.presentation)
2014-10-29 21:42 UTC, Jérôme Borme
plot filter import to Draw and saved (11.97 KB, application/vnd.oasis.opendocument.graphics)
2014-10-29 23:48 UTC, V Stuart Foote
scilab plot filter imported to Draw and saved (12.94 KB, application/pdf)
2014-10-29 23:49 UTC, V Stuart Foote
scilab plot filter imported to Draw and saved (16.74 KB, application/vnd.oasis.opendocument.graphics)
2014-10-29 23:50 UTC, V Stuart Foote
Writer file with OLD copied .fodg graphs (66.06 KB, application/vnd.oasis.opendocument.text)
2014-10-30 00:07 UTC, V Stuart Foote
ODP with both PDF opened into Impress and copied (58.21 KB, application/vnd.oasis.opendocument.presentation)
2015-03-22 21:16 UTC, V Stuart Foote
Example of graphs pasted after import from Draw and using OLE. (141.07 KB, application/vnd.oasis.opendocument.presentation)
2015-03-22 22:20 UTC, Jérôme Borme
Draw import filter opened PDF copied to Impress -- resized, font set (41.55 KB, application/pdf)
2015-03-23 01:00 UTC, V Stuart Foote

Description Jérôme Borme 2014-10-28 19:40:28 UTC
Created attachment 108583 [details]
Document with the two inserted documents

The bounding box of certain pdf files imported into Impress is not correct. Two basic pdfs were created using scilab (free numerical calculation software) and gnuplot.

1. Create a new Impress document
2. Drag one of the attached pdf files onto an empty Impress slide
3. Observe how the pdf is displayed. 

Results: The scilab pdf is cropped. The gnuplot pdf has extra white margin.

Workaround: It is possible to double-click the pdf and change the limits by hand, but this has to be repeated for each pdf. Manual adjustments are also never exactly reproducible, so the inserted images cannot easily be given the same size across slides.

See attached pictures and example documents.

LibreOffice 4.3.1.2 on Gentoo Linux.

// Scilab 5.5.1 commands
x=0:2*%pi/100:(2*%pi)
plot2d(x, sin(x))
xtitle("The sine function", "abscissae (radians)", "coordinates (no unit)")
legend("sine")
// then in the graph window, choose File/Export.

# Gnuplot 4.6.5 commands
set terminal pdf enhanced
set output "test-libO.pdf"
set title "The sine function"
set xlabel "abscissae (radians)"
set ylabel "coordinates (no unit)"
plot [0:6.28] sin(x)
Comment 1 Jérôme Borme 2014-10-28 19:41:07 UTC
Created attachment 108584 [details]
screen capture of the previous document
Comment 2 Jérôme Borme 2014-10-28 19:41:45 UTC
Created attachment 108585 [details]
pdf file generated by scilab 5.5.1
Comment 3 Jérôme Borme 2014-10-28 19:42:19 UTC
Created attachment 108586 [details]
pdf file generated by gnuplot 4.6.5
Comment 4 V Stuart Foote 2014-10-29 15:14:24 UTC
Since these plots come from Scilab or gnuplot, they could easily be output to any directly readable vector or bitmap format. Why even take them through PDF? But if you must...

Draw remains the preferred work flow for filtering PDF for import to ODF.

Depending on Impress to handle the OLE import (Drag-n-Drop) of PDF seems very ill-advised. Yes, it works, sort of, but it will never be consistent. Guess we could describe it as an enhancement to Impress for better OLE object parsing of PDF for use in the Impress module.

Far better fidelity is gained by opening the PDF in Draw, which renders the PDF more completely, including honoring its bounding box and scale.  Saving the filter-converted PDF as an ODF .ODG drawing then provides the best handling within the program. The converted drawing can be inserted as OLE (copied or linked), or dragged from the system file manager, again as OLE.

External conversion of the PDF to a vector or bitmap format is probably even a better choice.  Ghostscript handles conversion well with a -sDEVICE= selection. But continue to suggest not using EPS as there are continuing issues within LibreOffice there.

Setting NEW, but not sure this is even a bug.
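
For reference, the external Ghostscript conversion mentioned above could be scripted roughly as follows. This is only a sketch: it assumes the executable is called `gs` (on Windows it is typically `gswin64c`), and the `png16m` device, the 300 dpi resolution and the file names are illustrative choices, not something prescribed in this report. The helper name `ghostscript_command` is hypothetical.

```python
import shlex


def ghostscript_command(pdf_path, out_path, device="png16m", dpi=300):
    """Build a Ghostscript argument list that renders a PDF to another format.

    Assumptions: the binary is named "gs" and the chosen -sDEVICE is
    available in the local Ghostscript build.
    """
    return [
        "gs",
        "-dSAFER",                   # restrict file access during conversion
        "-dBATCH",                   # exit when the input has been processed
        "-dNOPAUSE",                 # do not wait for a key press between pages
        f"-sDEVICE={device}",        # output device, e.g. a PNG writer
        f"-r{dpi}",                  # rendering resolution in dots per inch
        f"-sOutputFile={out_path}",
        pdf_path,
    ]


cmd = ghostscript_command("test-libO.pdf", "test-libO.png")
# Print the command instead of running it, so the sketch works without gs installed.
print(shlex.join(cmd))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)`.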
Comment 5 V Stuart Foote 2014-10-29 15:18:15 UTC
Created attachment 108631 [details]
PDFs filtered in as Draw objects and inserted
Comment 6 Jérôme Borme 2014-10-29 21:42:43 UTC
Created attachment 108658 [details]
pdf files opened in Draw, saved to odg and dropped into Impress (blue frame added for visibility)

> not sure this is even a bug.

From the point of view of the user, I was expecting OLE to "just work" with all the formats it advertises as supported. OLE does detect the boundaries of bitmap images, and there is no obvious reason to the user why pdf would behave any differently.

You raise the point that I could use other formats. pdf has the advantage that virtually all software can produce it through printer emulation. It's a format that is fine to exchange with colleagues, fine to archive as a final document, fine to print on paper. svg performs worse in all the above criteria (my colleagues would have no clue what to do with an svg, it's not directly usable to print and screen rendering varies in quality, it's less nice to archive as it takes more space and embedded objects might be a problem), and also svg support in LO is not bug-free either.

Bitmap images are not a solution for many reasons that are out of the scope of this bug (not editable, by default transparency is likely to not be used, changes in size damage quality, aliasing fixed at production, poor result when printing).

> Draw remains the preferred work flow for filtering PDF for import to ODF.

Drag-and-drop into Draw of the provided pdf files produces the same bug: boundaries are not correct.

> Then saving the filter converted PDF to ODF .ODG drawing provides
> the best handling within the program. The converted drawing can
> be inserted as OLE (copied or linked), or dragged from system
> file manager again OLE.

I finally understood that you meant that using File/Open on the file would give a better result. This was not obvious as for me File/Open is just for native formats. Formats other than native are imported into a document, not "opened". But okay, I understood. Anyway, your suggestion does not always lead to good results. If the pdf is opened in Draw, saved in odg, then inserted again by OLE drag-and-drop, then an undesirable white space the size of a full page is added around the drawing (attached document).

Also, it makes little sense from the perspective of the user that drag-and-drop would give worse results than File/Open. Drag-and-drop was invented in the first place because it is far more practical than opening one by one all the documents. That the implementation is not perfect is understandable, still the objective should be that it should perform well.

To give an example, this is my use case: every Friday morning, I am in a hurry to prepare a presentation for a meeting with my boss. Maybe I could do it the day before with more time, but we all know it's not going to happen. So I'm in a hurry, I create an empty document, drag and drop what I have to insert and try to arrange it into a meaningful presentation. It needs to be fast and it needs to work well at first try, because life is short and work is plenty. It is annoying for me, the user, if I have to give up on OLE and open the files one by one in a separate application (Draw), save them as odg, then open them again in Impress. I might have many files to process this way, and I'm using an office suite to help me with my productivity needs, after all.
Comment 7 V Stuart Foote 2014-10-29 23:48:58 UTC
Created attachment 108662 [details]
plot filter import to Draw and saved

@Jérôme,

Sorry but you miss the point. We make no claim that an OLE insertion of a PDF into any component including Draw will properly format the PDF. LibreOffice is not a PDF editor!

When dealing with PDF, only native ODF drawings (.odg, or flat XML .fodg) are assured of rendering correctly for use in the other LO components.

That is just the way the suite is structured.

What is provided is a fairly robust filter to import and render the PDF into the Draw component. It is not perfect, but does a reasonable job, and the filter is continually being improved.

Some of that filtering is marginally able to handle filtering for use in Impress and Writer. But IMHO that is a by-product and should not be relied on.

The proper work flow in dealing with PDF is Opening it with Draw and allowing its filtered import. It could then further be exported from Draw into multiple bitmap and vector formats that LibreOffice can consume, but there are other applications that are more efficient for that.

I've attached the two PDFs as converted to .ODG, they can be OLE linked or copied into an Impress .ODP.  Please note though that when correctly filtered into Draw there is no additional white space.
Comment 8 V Stuart Foote 2014-10-29 23:49:42 UTC
Created attachment 108663 [details]
scilab plot filter imported to Draw and saved
Comment 9 V Stuart Foote 2014-10-29 23:50:30 UTC
Created attachment 108664 [details]
scilab plot filter imported to Draw and saved
Comment 10 V Stuart Foote 2014-10-30 00:07:34 UTC
Created attachment 108665 [details]
Writer file with OLD copied .fodg graphs

To show that once the PDF has been reasonably rendered on filter import to Draw, it is functional. Here the two graphs, in .fodg flat XML format, are inserted--Insert --> Object --> OLE Object: Create from file--into a Writer document.

Unlike with image formats (PNG, JPG, BMP, EMF or SVG; even EPS), this is the extent of what LibreOffice can reliably do with filter handling of PDF formatted graphics: conversion to a Draw object and insertion as an OLE object.
Comment 11 V Stuart Foote 2014-11-11 14:47:20 UTC
@Jérôme,

Have you had a chance to review your work flow to satisfy yourself that the project handles PDF illustrations correctly when PDFs are opened in Draw, rather than directly inserted (or linked) into a component as an OLE object?

If so we either need to set this Resolved NotABug, or restate it as a need for enhancement of OLE handling of PDF--your call.

Stuart
Comment 12 sam tygier 2015-03-22 17:33:01 UTC
It would be very useful for this to work correctly. In my work PDF is common output format for various plots and graphs. When I need to create a presentation it would be nice to be able to drag and drop my existing PDFs onto a slide.

Otherwise I have to convert the plots into a bitmap format and figure out the compromise between file size and display quality.
Comment 13 V Stuart Foote 2015-03-22 18:21:17 UTC
@Sam,
(In reply to sam tygier from comment #12)
> It would be very useful for this to work correctly. In my work PDF is common
> output format for various plots and graphs. When I need to create a
> presentation it would be nice to be able to drag and drop my existing PDFs
> onto a slide.
> 
> Otherwise I have to convert the plots into a bitmap format and
> figure out the compromise between file size and display quality.

It *does* work correctly.

But, bug 89727 - Implement an import filter to insert single PDF pages into Impress is probably more in line with your use case needs.

We never heard back from OP, so resolving this issue as Notabug.
Comment 14 Jérôme Borme 2015-03-22 19:36:39 UTC
I'll have a look at the workflow explained at bug 89727 and see if it indeed solves my problem.

However:

(In reply to V Stuart Foote from comment #13)
> It *does* work correctly.

It does not respect the bounding box of the included document. The OLE drag-and-drop is advertised to the user as working, therefore the user expects it to work 100%. If it does not work perfectly, it's what the user will call a bug.

Maybe this particular bug cannot be fixed because of the underlying software architecture, or maybe it's not worth spending time on fixing it if the workaround is easy, but that's a completely different matter.

I did not answer sooner because there was nothing useful I could answer. I showed you an example where a feature advertised by the software (PDF OLE) does not work perfectly, and you say it works. Somebody else says they have a similar problem, and you insist it does work. Now I'm spending time discussing about what a bug is and what it is not. If you're not willing to take the input of users, go on ignoring it. We both have better things to do.
Comment 15 V Stuart Foote 2015-03-22 21:16:36 UTC
Created attachment 114252 [details]
ODP with both PDF opened into Impress and copied

With Windows 7 sp1, 64-bit en-US with
Version: 4.4.2.1
Build ID: 93fc8832889bf050a10ec6d0171dae213adc9b55
Locale: en_US

Original .ODP of attachment 108583, but added a slide with both PDF pages filter opened into Impress, and then selected and copied onto a new slide.

The steps of bug 89727--to bypass default opening of PDF into Draw, and instead open directly into Impress.

Either way, PDF bounding-boxes are correctly handled.
Comment 16 V Stuart Foote 2015-03-22 22:15:09 UTC
@Jérôme,

OK, going to adjust the summary to correctly state the issue. I *can* reproduce the issue while trying to directly insert PDF into Impress or Draw with OLE actions.  While "drag-n-drop" OLE actions might be OS dependent, the menu/dialog driven OLE placement clearly displays there is an issue with OLE handling of PDF.

STR:

1. New Draw page, Impress slide, or Writer document
2. Insert -> Object -> OLE Object -> Create from File (radio button)
3. Search button, navigate to a single page PDF containing an image or graph.
The examples attached to this bug suffice

https://bugs.documentfoundation.org/attachment.cgi?id=108585
https://bugs.documentfoundation.org/attachment.cgi?id=108586

4. complete the "Insert OLE Object" dialog selecting a PDF image or graph
5. the bounding box and/or extent of the image or graph will not show correctly
=-=
6. now for comparison, File -> Open the same single page PDF image or graph
7. the PDF page will open in Draw--but does display correct bounding box and extents for the PDF.

Would expect the PDF import filter used for OLE rendering to produce similar results as the PDF import filter for Draw (or Impress, or Writer).
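
To make "bounding box and extent" concrete: a PDF page declares its size through `/MediaBox` (and optionally `/CropBox`) entries, given in points of 1/72 inch. The sketch below pulls those entries out of raw PDF bytes with a regular expression and converts them to millimetres, which is the size one would expect the inserted object to take. It is an illustration only: it assumes the page dictionary is stored uncompressed, the sample bytes are made up rather than taken from the attachments, and it is not how the LibreOffice import filter is implemented. A robust check would use a real PDF library.

```python
import re

POINTS_PER_INCH = 72.0   # default PDF user space unit
MM_PER_INCH = 25.4

# Matches e.g. "/MediaBox [0 0 360 216]" in uncompressed PDF content.
BOX_PATTERN = re.compile(
    rb"/(MediaBox|CropBox)\s*\[\s*([-+\d.]+)\s+([-+\d.]+)\s+([-+\d.]+)\s+([-+\d.]+)\s*\]"
)


def page_boxes_mm(pdf_bytes):
    """Return {box name: (width_mm, height_mm)} for the first box of each kind."""
    boxes = {}
    for match in BOX_PATTERN.finditer(pdf_bytes):
        name = match.group(1).decode("ascii")
        if name in boxes:
            continue  # keep only the first occurrence of each box type
        x0, y0, x1, y1 = (float(v.decode("ascii")) for v in match.groups()[1:])
        boxes[name] = (
            abs(x1 - x0) * MM_PER_INCH / POINTS_PER_INCH,
            abs(y1 - y0) * MM_PER_INCH / POINTS_PER_INCH,
        )
    return boxes


# Hypothetical page dictionary, not copied from the attached files.
sample = b"<< /Type /Page /MediaBox [0 0 360 216] /Contents 4 0 R >>"
boxes = page_boxes_mm(sample)
print(boxes)
```

Comparing these values with the size of the object created by the OLE insertion and by File -> Open would show which of the two paths ignores the declared page box.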
Comment 17 Jérôme Borme 2015-03-22 22:20:24 UTC
Created attachment 114257 [details]
Example of graphs pasted after import from Draw and using OLE.

The technique you are demonstrating does allow importing a pdf, but it serves a different purpose. It allows editing lines and text, but it does not treat the figure as a whole the way OLE does.

The problem is that the paper dimensions and font sizes hardcoded into pdfs are set to dimensions that suit a paper print, and for an on-screen presentation they often need to change to fit the available space on the slide, or the focus that one wants to give to that particular graph.

With the import from Draw, changing a graph's size means that all text fonts need to be changed, the text areas need to be resized if the font is larger, and text areas need to be centered again around the axis ticks. The graph line width might also need to be changed. This takes a lot of work, and can never be done perfectly.

When using the OLE function, all imported figures behave the same, whether the file is a png or a pdf, which makes it possible to efficiently compose a slide full of images from different origins.

The document attached demonstrates what happens when resizing a graph imported through Draw and using OLE.
Comment 18 Jérôme Borme 2015-03-22 22:55:43 UTC
(In reply to V Stuart Foote from comment #16)
> OK, going to adjust the summary to correctly state the issue. I *can*

Thank you very much. Note that my comment #17 was submitted before I saw your comment #16.
Comment 19 V Stuart Foote 2015-03-22 23:15:28 UTC
(In reply to Jérôme Borme from comment #17)
> The technique you are demonstrating indeed allows to import a pdf, but it
> serves a different purpose. It allows to edit lines and texts, but it does
> not treat the figure as a whole like the OLE thing does.

No, you aren't looking deep enough.

Yes, individual elements of the filter imported PDF (now internal BMP, WMF metafile) are fully editable individually--but the entire inserted object (all elements) can be selected and acted upon. Once selected in bulk, scale and placement are the most obvious actions, but the whole range of actions in Draw or Impress is available for the entire image or graphic.

> 
> The problem is that the paper dimensions and font sizes hardcoded into pdfs
> are set to dimensions that suited a paper print, and for an on-screen
> presentation they often need to change to fit the available space on the
> slide, or the focus that one wants to give to that particular graph.

No, the FILEOPEN PDF filter will correctly size the resulting ODG, to the size of the PDF document.  Whatever is specified in the PDF will be used to set the canvas for the filter import.  That is what is broken somehow in the OLE based filter.

> 
> With the import from Draw, when in the need for changing a graph size, all
> text fonts need to be changed, the text areas need to be resized if the font
> is larger, and text areas need to be centered again around the axis tick.
> The graph linewidth might also need to be changed. This takes a lot of work,
> and can never be done perfectly.

Not at all, just be sure to select the entire set of objects so the scaling or placement applies equally.
Comment 20 V Stuart Foote 2015-03-23 01:00:33 UTC
Created attachment 114259 [details]
Draw import filter opened PDF copied to Impress -- resized, font set

This PDF is export of a new slide from your ODP. 

The entire plot was selected and copied--180 Draw elements--into a new slide.  The elements were then scaled upward, and a font size of 20 was set. With that font, just the "3 min plasma" had to be moved left a tic. Finally, a rectangle with Gray1 Area fill was drawn, and then pushed to back. Finished with export of just that slide to PDF.

Should be apparent that working with PDF, opened via the import filter for Draw, does function and offers advantages for touch up that OLE insert will not necessarily provide.

Still, getting the OLE import filter to honor the PDF boundary needs to be corrected.
Comment 21 Jérôme Borme 2015-03-24 14:09:12 UTC
(In reply to V Stuart Foote from comment #20)
> The entire plot was selected and copied--180 Draw elements--into a new
> slide.  The elements were then scaled upward, and a font size of 20 was set.

Your solution scales up the size of the text areas, which is indeed an improvement (as compared to changing the size of a grouped object). But if I understand correctly you had to set the font size manually, so it has to be done every time you want to change the size, and in several steps if there are text areas with different font size (the numbers on the scale and some comment or formula written on top of the graph may not have the same size).

That's also an example of the different purposes that the OLE import and the Draw editing capabilities serve. OLE is good for when you just want to compose a presentation with graphs already produced and you don't need to adjust anything except the final on-screen size of the graph. The editing capabilities of Draw are useful when the graph still needs some polishing, but this added flexibility also requires more work (you have to set the font size and maybe move the caption a bit).
Comment 22 sam tygier 2015-03-28 15:57:12 UTC
I tried to look into what actually happens when you insert a pdf into a slide as an OLE object (i.e. by drag and drop, or Insert->Object->OLE Object). Maybe the bounding box parsing code has a simple-to-fix bug. But I am having trouble tracking it down as I am not familiar with the LO code.

So far I have found my way to InsertEmbeddedObject() in core/comphelper/source/container/embeddedobjectcontainer.cxx, which gets called for either method, and then into createInstanceInitFromMediaDescriptor() in core/embeddedobj/source/commonembedding/xfactory.cxx. I am not sure I am that close to the PDF parser yet, so if someone could give me a hint as to where to find it that would be great. I assume it is independent of the PDFimport plugin.
Comment 23 aflux 2015-09-12 11:48:49 UTC
Dear all,

is there anything new on this topic?

At the time of writing (5.0.1.2-release) things are still not completely ok.

Opening pdf in Draw works flawlessly, but pdf import to Impress (drag'n'drop or OLE import) discards bounding box information.

It seems that the most important part of the work is already done, and what is missing is for the pdf import to share size information with the OLE container.
Comment 24 V Stuart Foote 2015-09-12 19:14:34 UTC
@aflux, the Version field is earliest present, as from OP, not the latest. Please do not change meta data until you understand the QA workflow.
Comment 25 aflux 2015-09-12 19:16:40 UTC
It makes sense indeed. 
Apologies.
Comment 26 Robinson Tryon (qubit) 2015-12-03 11:03:01 UTC
Converting Whiteboard tags to Keywords: filter:pdf
Comment 27 sam tygier 2016-01-27 14:41:48 UTC
Still a problem in 5.1.0.2 (on Linux x86-64)
Comment 28 V Stuart Foote 2016-01-27 15:09:48 UTC
@Caolán, Armin, *

ref comment 22-- any hint for Sam on where the OLE import and pdf import meet to set the bounding box and its size on the draw/impress canvas?
Comment 29 Armin Le Grand 2016-01-28 15:24:51 UTC
Easiest way is to debug, break at InsertEmbeddedObject( and look at the stack, you will find the PDF importer. AFAIK it resides in filter\source\pdf. The pdf importer uses some PDF read-tooling and creates the needed objects using the UNO API, thus the code in-between will be massive. HTH!
Comment 30 sam tygier 2016-01-31 17:59:02 UTC
filter\source\pdf looks like just the export filter. Looks like sdext/source/pdfimport has all the import.
But I think it is bigger than just an issue in the pdf import.

If in Writer you do Insert->Object->OLE Object -> Create from file, and then select a png file, you get a similar issue. I think in both cases it is creating a new Draw file, loading the pdf or png into it, then embedding the Draw file into the original document, but with the Draw file page misaligned to the OLE object bounds.

For the png it's actually a bit worse because it creates an A4 page in Draw, puts the png in the middle, and then renders the top left corner in the original document.
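
The misalignment described above can be illustrated with a little rectangle arithmetic. The sketch below models the hypothesis only, not the actual LibreOffice code: the image is centered on an A4 page (210 x 297 mm), while the embedded object shows a window anchored at the page's top left corner. The assumption that this window has the same size as the image is mine, made for illustration, and the function names are hypothetical.

```python
A4_MM = (210.0, 297.0)  # A4 page width and height in millimetres


def centered_rect(page, size):
    """Rectangle (x0, y0, x1, y1) of an image of `size` centered on `page`."""
    x0 = (page[0] - size[0]) / 2
    y0 = (page[1] - size[1]) / 2
    return (x0, y0, x0 + size[0], y0 + size[1])


def intersect(a, b):
    """Intersection of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)


def area(rect):
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def visible_fraction(page, image_size):
    """Fraction of the centered image seen through a top-left window.

    Assumption: the window has the same size as the image itself.
    """
    image = centered_rect(page, image_size)
    window = (0.0, 0.0, image_size[0], image_size[1])
    overlap = intersect(image, window)
    return 0.0 if overlap is None else area(overlap) / area(image)


small = visible_fraction(A4_MM, (100.0, 60.0))
large = visible_fraction(A4_MM, (160.0, 200.0))
print(small, large)
```

Under these assumptions a small image falls entirely outside the window, and a large one is only partly visible, which would match a cropped rendering or an object showing mostly blank page.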
Comment 31 sam tygier 2017-02-02 11:37:44 UTC
In 5.3 PDFs get embedded as an image, so this is not a problem any more.

Thanks