Description:
Opening or saving a spreadsheet in xlsx/ods format in Calc loses PUA (Private Use Area) characters.

Unicode has three PUA blocks:
U+E000-U+F8FF Private Use Area
U+F0000-U+FFFFF Supplementary Private Use Area-A
U+100000-U+10FFFF Supplementary Private Use Area-B

When the original file is saved as ODS and then reopened, the PUA characters have been removed. I lost a lot of data because of this issue. The same problem occurs with XLSX files.

Environment: Windows 10 / LibreOffice 7.6.2.1 x64

Steps to Reproduce:
1. Open the xls/xlsx file containing PUA characters.
2. Save it as an ods/xlsx file.
3. Close all files and all Calc windows.
4. Open the newly saved ods/xlsx file.
5. All PUA characters have been deleted and cannot be recovered.

Actual Results:
PUA characters are deleted.

Expected Results:
The original characters should be kept.

Reproducible: Always

User Profile Reset: Yes

Additional Info:
Test file: https://ask.libreoffice.org/t/lost-pua-chars-with-xlsx-ods-formats/97477
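For anyone who wants to check cell text against the three blocks listed in the report, a minimal Python sketch follows. The helper names `pua_block` and `find_pua` are made up for illustration, and the ranges are the block ranges exactly as given above (so the noncharacters at the end of planes 15 and 16 are counted too). It only classifies code points in a string; it does not read xls/xlsx/ods files.

```python
# Block ranges (inclusive) as listed in the report.
PUA_BLOCKS = [
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF0000, 0xFFFFF, "Supplementary Private Use Area-A"),
    (0x100000, 0x10FFFF, "Supplementary Private Use Area-B"),
]


def pua_block(ch):
    """Return the name of the PUA block containing ch, or None."""
    cp = ord(ch)
    for lo, hi, name in PUA_BLOCKS:
        if lo <= cp <= hi:
            return name
    return None


def find_pua(text):
    """List (index, code point label, block name) for each PUA character in text."""
    hits = []
    for i, ch in enumerate(text):
        block = pua_block(ch)
        if block is not None:
            hits.append((i, f"U+{ord(ch):04X}", block))
    return hits


if __name__ == "__main__":
    # Illustrative string, not taken from the attached test file.
    for hit in find_pua("A\uE000B\U000F0001"):
        print(hit)
```

Running the same check on the cell text before saving and after reopening would show whether PUA characters were really present in the source and whether they were dropped.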
Created attachment 190434
PUAtest.xls

PUAtest.xls is a file containing PUA characters.
Created attachment 190435
PUAtest2.png

PUAtest2.png shows the result of converting the xls file to ods format.
Before concluding that something is wrong in Calc, it is necessary to check what is in the test file.

From experiment (using Alt-X), the problematic cells contain a pair of Unicode code points taken from the Surrogate block, but the order within these pairs is incorrect: the first code point is a low surrogate instead of a high one.

When the members of the pair are switched, the surrogate pair is recognised as such and Calc displays an X-crossed rectangle (missing glyph in font) in my 7.5.7.1 under Fedora 38, KDE Plasma desktop.

OP claims the characters are taken from a PUA block, but decoding the surrogate pairs (at least in A9 and A10) shows they are somewhere in Plane 2.

I'd first suspect an incorrect designation for the intended characters.

More information is needed about the intended characters.
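The decoding described above can be reproduced with the standard UTF-16 arithmetic. The following is a minimal Python sketch; the function names are hypothetical and the sample values are illustrative only, not the actual contents of cells A9 and A10.

```python
def is_high_surrogate(unit):
    """High (lead) surrogates occupy U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    """Low (trail) surrogates occupy U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def decode_pair(first, second):
    """Combine a high/low surrogate pair into one code point."""
    if is_low_surrogate(first) and is_high_surrogate(second):
        raise ValueError("reversed pair: low surrogate comes first")
    if not (is_high_surrogate(first) and is_low_surrogate(second)):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)


def plane(code_point):
    """Unicode plane number: 0 is the BMP, 2 is Plane 2, 15 and 16 hold the supplementary PUAs."""
    return code_point >> 16


if __name__ == "__main__":
    # Illustrative values only.
    cp = decode_pair(0xD840, 0xDC00)
    print(f"U+{cp:04X} is in plane {plane(cp)}")
```

A pair whose high surrogate lies in U+D840..U+D87F decodes to Plane 2, whereas the supplementary PUA blocks need a high surrogate of U+DB80 or above, so the high surrogate alone is enough to tell the two cases apart.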
(In reply to ajlittoz from comment #3)
> Before concluding that something is wrong in Calc, it is necessary to check
> what is in the test file.
>
> From experiment (using Alt-X), problematic cells contain a pair of Unicode
> codepoints taken from the Surrogate block, but order in these pairs is
> incorrect. First codepoint is low surrogate instead of high.
>
> When members of the pair are switched, the surrogate pair is recognised as
> such and Calc displays an X-crossed rectangle (missing glyph in font) in my
> 7.5.7.1 under Fedora 38, KDE Plasma desktop.
>
> OP claims the characters are taken from a PUA block but decoding the
> surrogate pairs (at least in A9 and A10) shows they are somewhere in Plane 2.
>
> I'd first suspect an incorrect designation for the intended characters.
>
> More information is needed about the intended characters.

Setting to NEEDINFO while we wait for the reporter to respond.
Dear yukiguo,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INSUFFICIENTDATA due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here:
https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Status/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-NeedInfo-Ping