Summary: | sdext xpdfimport (poppler): Garbage characters shown when open certain PDF in Draw | ||
---|---|---|---|
Product: | LibreOffice | Reporter: | Kevin Suo <suokunlong> |
Component: | Draw | Assignee: | Not Assigned <libreoffice-bugs> |
Status: | RESOLVED NOTOURBUG | ||
Severity: | normal | CC: | himajin100000, michael.warner.ut+libreoffice |
Priority: | medium | ||
Version: | 6.4.4.2 release | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Crash report or crash signature: | Regression By: | ||
Bug Depends on: | |||
Bug Blocks: | 99746 | ||
Attachments: | 1.pdf; 1.pdf, uncompressed with qpdf --stream-data=uncompress | ||
Description
Kevin Suo
2022-05-08 02:44:31 UTC
Bug reproduced in:
Version: 7.3.3.2 / LibreOffice Community
Build ID: d1d0ea68f081ee2800a922cac8f79445e4603348
CPU threads: 4; OS: Mac OS X 10.14.6; UI render: default; VCL: osx
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded

Adobe Reader 11.0.23 and Mac OS Preview Version 10.1 (944.6.16.1) both seem to display the PDF correctly. LO input into Draw (using File>Open) results in multiple characters displaying as �.

Bug also reproduced with:
Version: 6.4.4.2
Build ID: 3d775be2011f3886db32dfd395a6a6d1ca2630ff
CPU threads: 4; OS: Mac OS X 10.14.6; UI render: default; VCL: osx;
Locale: en-GB (en_GB.UTF-8); UI-Language: en-GB
Calc: threaded

Status set to NEW, earliest version affected to 6.4.4.2.

Created attachment 180138
1.pdf, uncompressed with qpdf --stream-data=uncompress
e.g.
---
/FT8 209 Tf /GS13 gs 0.05 0 0 -0.05 153.959 742.609 Tm
<1C5F>Tj
208.797 -0 TD<0430>Tj
211.188 -0 TD<0773>Tj
208.797 -0 TD<04BC>Tj
211.188 -0 TD<2151>Tj
208.797 -0 TD<1BE9>Tj
211.188 -0 TD<303B>Tj
ET Q
q BT 0 0 0 rg /FT24 209 Tf /GS13 gs 0.05 0 0 -0.05 227.4 742.85 Tm <0754>Tj ET Q
q BT 0 0 0 rg /FT24 149 Tf /GS13 gs 0.05 0 0 -0.05 233.04 739.13 Tm <0374>Tj ET Q
q BT 0 0 0 rg /FT24 209 Tf /GS13 gs 0.05 0 0 -0.05 239.88 742.85 Tm <0D46>Tj ET
---

<2151> = U+6B21 = '次'
<1BE9> = U+65B9 = '方'
<303B> = U+7A0B = '程'
<0754> = <D835>
<0374> = U+0032 = '2'
<0D46> = U+2212 = '-'

When I tried copying <D835> with Firefox Nightly and pasting it into the text editor I normally use, I got the surrogate pair D835 DC00 (= U+1D400).
When I tried the same thing with PDF-XChange, the <D835> part was just blank.

5 0 obj
<<
(snip)
/FT24 10 0 R
(snip)
/FT8 13 0 R
>>
/XObject << /IM39 14 0 R /IM41 15 0 R >>
>>
/Rotate 0
/TrimBox [ 0 0 595.3 841.9 ]
/Type /Page
>>

10 0 obj
<<
/BaseFont /DCWGQU+CambriaMath
/DescendantFonts [ 20 0 R ]
/Encoding /Identity-H
/Subtype /Type0
/ToUnicode 21 0 R
/Type /Font
>>
endobj

13 0 obj
<<
/BaseFont /LNUHNF+SimSun
/DescendantFonts [ 26 0 R ]
/Encoding /Identity-H
/Subtype /Type0
/ToUnicode 27 0 R
/Type /Font
>>
endobj

The PDF contains mangled text; surrogate pairs are all missing the low surrogate part, making the original text unrecoverable. Garbage in, garbage out.
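
To illustrate why a missing low surrogate makes the text unrecoverable, here is a minimal Python sketch of the UTF-16 surrogate arithmetic. It is not code from LibreOffice or poppler: the helper names are hypothetical, and the mapping table is copied by hand from the values quoted in this report rather than parsed from the PDF's actual ToUnicode CMap.

```python
# Hypothetical helpers illustrating UTF-16 surrogate handling.
HIGH = range(0xD800, 0xDC00)  # high (lead) surrogates
LOW = range(0xDC00, 0xE000)   # low (trail) surrogates


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into one Unicode code point."""
    if high not in HIGH or low not in LOW:
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def find_lone_surrogates(units):
    """Return the indexes of UTF-16 code units that are unpaired surrogates."""
    lone = []
    i = 0
    while i < len(units):
        unit = units[i]
        if unit in HIGH:
            if i + 1 < len(units) and units[i + 1] in LOW:
                i += 2  # well-formed pair, skip both units
                continue
            lone.append(i)
        elif unit in LOW:
            lone.append(i)
        i += 1
    return lone


def candidates_for_high(high):
    """Return the (first, last) code points a lone high surrogate could mean."""
    first = combine_surrogates(high, 0xDC00)
    return first, first + 0x3FF


# Assumption: copied by hand from the mappings quoted in this report.
TO_UNICODE = {
    "2151": [0x6B21],
    "1BE9": [0x65B9],
    "303B": [0x7A0B],
    "0754": [0xD835],
    "0374": [0x0032],
    "0D46": [0x2212],
}

for code, units in TO_UNICODE.items():
    for index in find_lone_surrogates(units):
        unit = units[index]
        if unit in HIGH:
            first, last = candidates_for_high(unit)
            print(f"<{code}>: lone high surrogate {unit:04X}, "
                  f"could be any of U+{first:X}..U+{last:X}")
        else:
            print(f"<{code}>: lone low surrogate {unit:04X}")
```

With the values quoted here, only <0754> is flagged. A lone D835 could stand for any of the 1024 code points U+1D400..U+1D7FF, so a viewer that pastes D835 DC00 is only guessing the first one in that range.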