Bug 92519 - SVG images are wrongly embedded tens of times in FODT
Summary: SVG images are wrongly embedded tens of times in FODT
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.0.0.0.beta3
Hardware: All / OS: All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:fodt, regression
Depends on:
Blocks: ODF-Flat
Reported: 2015-07-03 10:16 UTC by g3855563
Modified: 2023-11-06 23:52 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
original sample ODT (18.48 MB, application/vnd.oasis.opendocument.text)
2023-11-06 23:52 UTC, Stéphane Guillou (stragu)

Description g3855563 2015-07-03 10:16:44 UTC
If I save a document containing an SVG image or an embedded ODG object as FODT, the document size grows immensely. If I open the FODT file in a text editor, I see that the image is embedded tens of times instead of just once.

The document is saved correctly in LibreOffice 4.4.3.2, with a file size of around 750K. If I save the same document in LibreOffice 5.0.0.0 or later, it is saved as a whopping 55MB file. I've tested this with 5.0.0.0 Beta 3 and 5.0.0.2.

To confirm, download this file (for some reason it was saved as ODT rather than FODT, despite the .fodt extension):
https://github.com/pilight/pilight-manual/raw/d8760ea825cad0421df599d117ee57818ec80e20/english/electronics-wiring.fodt

First, open it in 4.4.3.2 and save it again as FODT: the file size is 756K. The file source nicely shows the SVG XML tags embedded in the document.

Then open the exact same file and save it again as FODT in 5.0.0.2 (a26d58f11b99b6aeddf7f7884effea188cc6e512): the file size is 55.2MB. The file source no longer contains the SVG XML tags; instead, base64-encoded "office:binary-data" images are inserted 58 times (for 2 images).
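A minimal Python sketch for checking this without scrolling through the file in a text editor, assuming the FODT is well-formed XML. It counts the office:binary-data elements, how many distinct payloads they carry (after dropping base64 line breaks), and how many inline elements in the W3C SVG namespace are present. Treating "SVG XML tags" as elements in the http://www.w3.org/2000/svg namespace is an assumption about how the 4.4 output embeds them, and the function name summarize_fodt is hypothetical.

```python
import hashlib
import sys
import xml.etree.ElementTree as ET

# ODF office namespace (office:binary-data) and the W3C SVG namespace.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
W3C_SVG_NS = "http://www.w3.org/2000/svg"  # assumption: how inline SVG appears


def summarize_fodt(xml_bytes: bytes) -> dict:
    """Count embedded binary images, distinct payloads, and inline <svg> roots."""
    root = ET.fromstring(xml_bytes)
    digests = []
    for elem in root.iter(f"{{{OFFICE_NS}}}binary-data"):
        # Base64 text is wrapped over many lines; strip all whitespace first.
        payload = "".join((elem.text or "").split())
        digests.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    inline_svg = sum(1 for _ in root.iter(f"{{{W3C_SVG_NS}}}svg"))
    return {
        "binary_data": len(digests),
        "distinct_payloads": len(set(digests)),
        "inline_svg": inline_svg,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(summarize_fodt(f.read()))
```

Run against a saved FODT, a binary_data count far above distinct_payloads would indicate the same image being written repeatedly, as described above.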
Comment 1 g3855563 2015-07-03 10:36:12 UTC
I now see that a similar thing happens with the regular ODT file, but here two weird 14.6MB TTF fonts are saved inside the ODT archive (font8.ttf and font21.ttf) in 4.4.3.2, 5.0.0.0, and 5.0.0.2 alike.

To confirm (again), download this file (for some reason it was saved as ODT rather than FODT, despite the .fodt extension):
https://github.com/pilight/pilight-manual/raw/d8760ea825cad0421df599d117ee57818ec80e20/english/electronics-wiring.fodt

Rename it to .zip and open it with an archiver: you'll see the large font8.ttf and font21.ttf in the Fonts folder. The total file size is 18.4MB.

Now open the same document in 4.4.3.2 and first save it as FODT, which reduces the file size to 756K. Then save that same file again as ODT: the large fonts are inserted again, which increases the file size to a total of 18.4MB.

The same applies to 5.0.0.0 and 5.0.0.2, except that in these versions the FODT file is also massively large, at 55MB (as described in my first post).
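A minimal Python sketch for listing the embedded fonts and their sizes without renaming the ODT, assuming (as described above) that they are stored under the Fonts/ folder of the ZIP package. The helper name embedded_fonts is hypothetical; it only reads the ZIP directory, so it reports sizes but cannot tell whether a font file is corrupt.

```python
import sys
import zipfile


def embedded_fonts(odt) -> list[tuple[str, int]]:
    """Return (name, uncompressed size in bytes) for files under Fonts/, largest first.

    `odt` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt) as zf:
        fonts = [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.startswith("Fonts/") and not info.is_dir()
        ]
    return sorted(fonts, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in embedded_fonts(sys.argv[1]):
        print(f"{size / 1e6:8.2f} MB  {name}")
```

With the sample document, font8.ttf and font21.ttf would be expected at the top of the list if they are the two oversized files.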
Comment 2 Buovjaga 2015-07-04 11:02:41 UTC
Reproduced.
Lowered severity (see https://wiki.documentfoundation.org/images/0/06/Prioritizing_Bugs_Flowchart.jpg).

Win 7 Pro 64-bit, Version: 4.4.4.3
Build ID: 2c39ebcf046445232b798108aa8a7e7d89552ea8
Locale: fi_FI

Version: 5.1.0.0.alpha1+ (x64)
Build ID: 8b788891796ff0571f779cdbe8ce809c35c42754
TinderBox: Win-x86_64@62-TDF, Branch:MASTER, Time: 2015-07-02_23:09:27
Locale: fi-FI (fi_FI)
Comment 3 raal 2015-10-30 12:46:20 UTC
This began with this commit:
bibisect-win32-5.0 51edc379a18acada247bb86dfab0204e45ce1514 is the first bad commit
commit 51edc379a18acada247bb86dfab0204e45ce1514
Author: Norbert Thiebaud <nthiebaud@gmail.com>
Date:   Sat May 16 12:31:26 2015 -0500

    source f86a1dbf2a6761b23f9430b6bc61e789190290c9

    source f86a1dbf2a6761b23f9430b6bc61e789190290c9
    author     David Tardon <dtardon@redhat.com>  2015-01-06 15:09:35 (GMT)
    committer  David Tardon <dtardon@redhat.com>  2015-01-06 15:12:29 (GMT)
    commit     f86a1dbf2a6761b23f9430b6bc61e789190290c9

    fdo#78921 save embedded fonts in Flat ODF

This is probably not a bug but a feature: fonts can now be embedded in FODT files.
You can disable this feature under File - Properties - Font tab by unchecking "Embed fonts in the document". With that option unchecked, the file is 750K.
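A minimal Python sketch for checking the state of that option directly in an FODT file, assuming the checkbox is stored in the document settings as a config:config-item named EmbedFonts (an assumption, not confirmed anywhere in this report). The helper name embed_fonts_setting is hypothetical; it returns None when no such item is found.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional

CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"


def embed_fonts_setting(xml_bytes: bytes) -> Optional[bool]:
    """Return the EmbedFonts flag from a flat ODF document, or None if absent.

    Assumption: stored as <config:config-item config:name="EmbedFonts">true|false</...>.
    """
    root = ET.fromstring(xml_bytes)
    for item in root.iter(f"{{{CONFIG_NS}}}config-item"):
        if item.get(f"{{{CONFIG_NS}}}name") == "EmbedFonts":
            return (item.text or "").strip() == "true"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(embed_fonts_setting(f.read()))
```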

@David, can this be closed? The reporter also wrote "The source of the file nicely shows the SVG XML tags embedded in the document source." Thanks
Comment 4 David Tardon 2015-11-03 11:17:44 UTC
It's unclear to me what the supposed bug is. The reporter talks about an SVG image embedded multiple times, not about fonts. But maybe he's just mistaken?
Comment 5 g3855563 2015-11-03 21:13:28 UTC
I don't quite get what the confusion is. If you just follow the steps i described in the opening post and in comment #1 (with a sample document) you just can't miss it.

To be clear, the fonts are *not* the bug. The corrupt (and big) font files that appear in the *ODT* files (but not in the *FODT*) are some weird side effect of the SVG bug I'm describing.
Comment 6 g3855563 2015-11-03 21:33:21 UTC
I stand corrected. Disabling the embedded fonts in the *fodt* does fix part of the issue.

But:
1. I'm not sure a file size growth from 750KB to 62.4MB is desirable.
2. It does not explain what the weird, corrupt font files in the *odt* archives are.
3. Nor why my SVG image is saved as binary data and not as XML (as the previous version did).
Comment 7 Xisco Faulí 2016-09-11 19:44:41 UTC Comment hidden (obsolete)
Comment 8 Xisco Faulí 2016-09-26 16:57:02 UTC
Adding keywords regression, bibisected and bisected
Comment 9 tommy27 2017-07-04 11:55:28 UTC
(In reply to Xisco Faulí from comment #8)
> Adding keywords regression, bibisected and bisected

This implies that the issue has been reproduced.
Hence, status -> NEW.
Comment 10 g3855563 2018-06-22 16:32:46 UTC
I've tested LibreOffice 6.0.0.3 (x64) (64a0f66915f38c6217de274f0aa8e15618924765) and the bug still applies: the *fodt* file size has now grown to 63.2MB instead of the 55MB reported earlier. In version 6.0.5.2 (x64) (54c8cbb85f300ac59db32fe8a675ff7683cd5a16) only 54 binary-data tags are present, with a file size of 62.7MB.

The same (corrupt) fonts(?) are still present in *odt* files in both versions. I also notice that there were 17 of them in version 6.0.0.3 but only 15 in 6.0.5.2; in both versions, two of them are huge and corrupt.
Comment 11 QA Administrators 2019-06-23 02:51:40 UTC Comment hidden (obsolete)
Comment 12 QA Administrators 2021-06-23 03:48:58 UTC Comment hidden (obsolete, spam)
Comment 13 QA Administrators 2023-06-24 03:14:34 UTC Comment hidden (obsolete)
Comment 14 Stéphane Guillou (stragu) 2023-11-06 23:40:37 UTC
(In reply to g3855563 from comment #6)
> 1. I'm not sure if a file size growth of 750Kb to 62.4MB is desireable.
If this is related to font embedding, it often does take a lot of space, but I'm not sure there's a way around it.
I've tested saving as FODT and got 17.3 MB with fonts and 695 KB without fonts. The SVG XML remains for the original images, and it is not duplicated after subsequent modifications and saves.

> 2. It does not explain what the wierd corrupt font file is in the *odt*
> archives?
I can see the two larger TTF files in your sample ODT: https://github.com/pilight/pilight-manual/raw/d8760ea825cad0421df599d117ee57818ec80e20/english/electronics-wiring.fodt
But I don't know how to reproduce this issue again. I tried turning the font embedding option off in the file properties, saving, then turning it back on and saving again: the ODT is 2 MB, the TTF files are each smaller than 500 KB, and they are all functional.

> 3. And why my SVG image is saved as binary data and not as xml (as it did in
> the previous version).
That's a good point, keeping the actual SVG XML of newly inserted images would also improve version control like requested in bug 85660 (although I'm not sure what the specification says about it). And then, there's the issue of the loss of the SVG in subsequent saves, described in bug 123396.

In any case, one report should be about one issue, and IMHO this report is too scattered to lead anywhere (ODT vs FODT, font embedding size, corrupt font file, conversion of XML to binary, duplication of pictures...)
I think we should split this into smaller, precise, focused reports for the issues that are still relevant in a recent version. (Happy to open one about point 3.)

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 31fb3045dabdb27d913712f3abcade315e3ea9bd
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 15 Stéphane Guillou (stragu) 2023-11-06 23:52:07 UTC
Created attachment 190694
original sample ODT

Attaching this one, with correct extension, instead of relying on the GitHub link.