Bug 154700 - special characters in pdf export
Summary: special characters in pdf export
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 7.5.2.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-07 11:27 UTC by Bernard
Modified: 2023-04-10 23:42 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
zip of source odt file, pdf export, and pdf after processing with pdfjam (685.93 KB, application/zip)
2023-04-07 11:32 UTC, Bernard
pdf file produced by LO-7.4.6.2-2 (5.78 KB, application/pdf)
2023-04-10 14:41 UTC, Bernard

Description Bernard 2023-04-07 11:27:52 UTC
Description:
In LibreOffice 7.5.2.2 50(Build:2) on Ubuntu 22.04, PDF export behaves differently than it did in an older version (LibreOffice 7.3...).
A document containing "special characters" (e.g. old-style numerals or stylistic characters such as "Q" with a long tail) is exported as PDF. The exported PDF file is as expected. However, if that PDF file is processed with the pdfjam script (which calls the LaTeX pdfpages package), the special characters are replaced with the "X in a square" glyph.
This suggests to me that the font handling in the exported PDF file is not quite standard.



Steps to Reproduce:
1. Create a document with stylistic glyph elements
2. Export as PDF (e.g. export as pdfmwe.pdf)
3. Process that PDF file with pdfjam (e.g. pdfjam -o mwe-pdfjammed.pdf pdfmwe.pdf)
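
Step 3 can be scripted roughly as follows. This is only a sketch: it assumes pdfjam is on the PATH and reuses the file names from the steps above; the helper names build_pdfjam_command and run_pdfjam are made up for illustration and are not part of pdfjam.

```python
import shutil
import subprocess


def build_pdfjam_command(input_pdf, output_pdf):
    """Return the argument list for the plain pdfjam call from step 3."""
    # No extra options: the input is simply written to the output file.
    return ["pdfjam", "-o", output_pdf, input_pdf]


def run_pdfjam(input_pdf, output_pdf):
    """Run pdfjam with no extra options; fail early if it is not installed."""
    if shutil.which("pdfjam") is None:
        raise FileNotFoundError("pdfjam not found on PATH")
    # check=True raises CalledProcessError if pdfjam exits with an error.
    subprocess.run(build_pdfjam_command(input_pdf, output_pdf), check=True)
```

Calling run_pdfjam("pdfmwe.pdf", "mwe-pdfjammed.pdf") should then produce the file described under Actual Results on an affected system.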

Actual Results:
See the attached mwe-pdfjammed.pdf, in which the stylistic characters do not appear correctly.

Expected Results:
The file mwe-pdfjammed.pdf should appear the same as the attached pdfmwe.pdf


Reproducible: Always


User Profile Reset: No

Additional Info:
Font used: Libertinus Serif
Comment 1 Bernard 2023-04-07 11:32:21 UTC
Created attachment 186532
zip of source odt file, pdf export, and pdf after processing with pdfjam
Comment 2 ⁨خالد حسني⁩ 2023-04-09 21:30:42 UTC
I don’t see anything unusual about the PDF file LibreOffice generates, can you attach the PDFs created with the old version that were working for you?
Comment 3 ⁨خالد حسني⁩ 2023-04-09 21:42:50 UTC
Also, what options do you use with pdfjam? I tried it locally and the output PDF came out fine.
Comment 4 Bernard 2023-04-10 14:41:30 UTC
Created attachment 186564
pdf file produced by LO-7.4.6.2-2
Comment 5 Bernard 2023-04-10 14:44:57 UTC
(In reply to خالد حسني from comment #3)
> Also, what options do you use with pdfjam? I tried it locally and the output
> PDF came out fine.

The pdfjam call simply passes the original PDF file to an output file with no options, so the output should be the same as the input after going straight through LaTeX pdfpages.
Comment 6 Bernard 2023-04-10 14:57:10 UTC
(In reply to خالد حسني from comment #2)
> I don’t see anything unusual about the PDF file LibreOffice generates, can
> you attach the PDFs created with the old version that were working for you?

The file from LO-7.5 is pdfmwe.pdf, size 5,914 bytes. This is indeed fine. However, after passing straight through a call to pdfjam (simply writing the input file to an output file) the result is the file mwe-pdfjammed.pdf, size 688,403 bytes! This "faulty" file also shows font encoding as "custom".

The corresponding results with LO-7.4.6.2-2 are a file of 5,920 bytes which passes through pdfjam to give an output of 5,854 bytes - which shows font encoding as "built-in" (as do all the other files which display correctly).

Thus the problem seems to be the font encoding instructions in the file from LO-7.5 - which somehow are misinterpreted once passed through LaTeX pdfpages.
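
The size jump quoted above is a cheap signal that something large was added on the way through pdfjam. A rough sketch using only the byte counts from this comment; the function names and the threshold of 10 are arbitrary assumptions, and a large ratio alone does not say what was added.

```python
def size_ratio(input_bytes, output_bytes):
    """Return how many times larger the output PDF is than the input PDF."""
    if input_bytes <= 0:
        raise ValueError("input size must be positive")
    return output_bytes / input_bytes


def looks_bloated(input_bytes, output_bytes, threshold=10.0):
    """Flag an output that grew by more than `threshold` times (assumed cutoff)."""
    return size_ratio(input_bytes, output_bytes) > threshold


# Byte counts reported in this comment.
print(looks_bloated(5914, 688403))  # LO 7.5 file through pdfjam
print(looks_bloated(5920, 5854))    # LO 7.4 file through pdfjam
```

With these numbers the 7.5 case grows by a factor of roughly 116, while the 7.4 case shrinks slightly.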
Comment 7 ⁨خالد حسني⁩ 2023-04-10 22:49:16 UTC
I don’t see any meaningful difference between your 7.4 and 7.5-generated PDFs. The fonts embedded in the two documents are almost identical. There are more kerning adjustments in the 7.4 PDF than in the 7.5 one (probably because 7.5 is more accurate), but that shouldn’t affect font encoding.

When I use pdfjam on the 7.5 PDF, the result is still fine.

Did you update the font somehow recently? Maybe you have two different copies of the font? I don’t know how pdfjam works, but a LibreOffice-generated PDF embeds a font subset, while your broken PDF embeds a full version of the font (pdfjam/pdftex must have gotten it from somewhere else, since the original PDF does not have it).
Comment 8 ⁨خالد حسني⁩ 2023-04-10 23:42:39 UTC
OK, I can reproduce this now. If I install the libertinus-type1 package using tlmgr, pdfjam gives me a broken PDF. For some reason, pdftex is using the installed font instead of the one embedded in the PDF, and this obviously doesn’t work.

This started happening with 7.5 because in 7.4 we were not using the font’s real PostScript name and were generating one instead (fixed in bug 138325).

For libertinus, the generated name was “LibertinusSerifRegular”, but in 7.5 it is now correctly “LibertinusSerif-Regular”. Apparently pdftex is matching this with the installed font and using it instead. This is obviously not a LibreOffice bug. You might get better debugging on the pdftex side by asking on https://tex.stackexchange.com.
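
To illustrate the name collision, here is a minimal sketch. How pdftex actually selects the font is not established in this report, so the exact-name comparison below is an assumption, as are the function names, the subset tag "BAAAAA" and the contents of the installed set. The only PDF fact relied on is that subset fonts are named with a six-uppercase-letter tag followed by "+".

```python
import re

# Subset fonts in a PDF carry a tag of six uppercase letters and "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_name(pdf_font_name):
    """Strip the subset tag, e.g. 'ABCDEF+Foo-Regular' -> 'Foo-Regular'."""
    return SUBSET_TAG.sub("", pdf_font_name)


def collides_with_installed(pdf_font_name, installed_names):
    """True if the embedded font's base name equals an installed font name."""
    return base_font_name(pdf_font_name) in set(installed_names)


# Hypothetical PostScript names known to the TeX installation.
installed = {"LibertinusSerif-Regular"}

# 7.4-style generated name: no match, so the embedded subset would be kept.
print(collides_with_installed("BAAAAA+LibertinusSerifRegular", installed))   # False
# 7.5-style real PostScript name: matches the installed font.
print(collides_with_installed("BAAAAA+LibertinusSerif-Regular", installed))  # True
```

Under this assumption, the 7.4 name never triggered the substitution simply because it did not match any installed font.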