Bug 156819 - FILEOPEN DOCX Sub-direction is lost loading the file
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: DOCX-RTL
Reported: 2023-08-19 14:08 UTC by Hossein
Modified: 2023-08-19 14:34 UTC

See Also:
Crash report or crash signature:


Attachments
DOCX: Numbers 123 / ۱۲۳, and the text "C++" in Persian (Farsi) paragraph (12.06 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-08-19 14:08 UTC, Hossein
PDF output from LibreOffice (63.55 KB, application/pdf)
2023-08-19 14:13 UTC, Hossein
PDF output from MS Word (63.55 KB, application/pdf)
2023-08-19 14:14 UTC, Hossein
PNG: side by side comparison of the output from LibreOffice and MS Word (36.05 KB, image/png)
2023-08-19 14:15 UTC, Hossein
TXT: The output of saving the DOCX to a text file (223 bytes, text/plain)
2023-08-19 14:34 UTC, Hossein

Description Hossein 2023-08-19 14:08:02 UTC
Created attachment 189039
DOCX: Numbers 123 / ۱۲۳, and the text "C++" in Persian (Farsi) paragraph

Description:
When a DOCX file is loaded, the sub-direction of text inside an RTL paragraph is lost: an LTR part of an RTL paragraph is also displayed as RTL. This also affects the display of numerals.

Steps to Reproduce:
1. Open LibreOffice and set the number display to "Context".
2. Open the attachment in LibreOffice.
3. Open MS Word and set the numerals to context in "File > Options > Advanced > Numerals: Context".
4. Open the attachment in MS Word.
5. Compare the display and output.

Actual Results:
The rendering in LibreOffice is completely different from the rendering in MS Word. In LibreOffice, the sub-direction is lost when the file is loaded: C++ is rendered as ++C, and 123 is displayed as ۱۲۳, which is incorrect.
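
Note: one way to see why this text depends on an explicit run direction is to look at the Unicode bidirectional classes of the characters involved. The following is a minimal Python sketch using the standard unicodedata module; the helper name bidi_classes is made up for illustration and is not part of LibreOffice. The plus signs are weak (ES) rather than strong LTR characters, so without an explicit LTR direction on the run they can end up following the RTL paragraph. This is offered as a plausible explanation of the ++C rendering, not as a confirmed diagnosis.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs for each code point in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# "C" is a strong left-to-right letter (L); "+" is a European separator (ES).
print(bidi_classes("C++"))

# ASCII digits and Persian (Extended Arabic-Indic) digits are both
# European numbers (EN) as far as the bidi algorithm is concerned.
print(bidi_classes("123"))
print(bidi_classes("\u06F1\u06F2\u06F3"))  # ۱۲۳
```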

Expected Results:
The original document contains two numbers, ۱۲۳ and 123, and the text C++. All three should be displayed as written, as they are in MS Word.

Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 20f57e14362674d321ef184e1987f41a6418adc2
CPU threads: 20; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_DE); UI: en-US
Calc: CL threaded
Comment 1 Hossein 2023-08-19 14:13:41 UTC
Created attachment 189040
PDF output from LibreOffice
Comment 2 Hossein 2023-08-19 14:14:00 UTC
Created attachment 189041
PDF output from MS Word
Comment 3 Hossein 2023-08-19 14:15:42 UTC
Created attachment 189042
PNG: side by side comparison of the output from LibreOffice and MS Word
Comment 4 ⁨خالد حسني⁩ 2023-08-19 14:21:34 UTC
This can be hacked by surrounding the text with RLE/LRE and PDF (Pop Directional Formatting) Unicode control characters, but when exporting back we wouldn’t have a way to differentiate between these and user-entered control characters (though it probably does not matter in practice). Otherwise we would need internal machinery to handle the explicit direction of text portions, which we currently don’t have AFAIK.
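
For illustration only, the wrapping idea could look roughly like the Python sketch below. The helper names wrap_run and strip_embedding_controls are hypothetical, and nothing here reflects how LibreOffice's DOCX import or export is actually implemented. The second helper shows the round-trip concern: stripping on export cannot tell import-added controls apart from ones the user typed.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_run(text, direction):
    """Wrap a run in an explicit embedding; direction is "ltr" or "rtl"."""
    if direction == "ltr":
        return LRE + text + PDF
    if direction == "rtl":
        return RLE + text + PDF
    raise ValueError("direction must be 'ltr' or 'rtl'")


def strip_embedding_controls(text):
    """Remove every LRE/RLE/PDF character, regardless of who inserted it."""
    return "".join(ch for ch in text if ch not in (LRE, RLE, PDF))


# An LTR run such as "C++" that sits inside an RTL paragraph.
wrapped = wrap_run("C++", "ltr")
print([hex(ord(ch)) for ch in wrapped])

# Controls typed by the user are removed as well, which is the ambiguity
# mentioned in the comment above.
user_text = RLE + "abc" + PDF
print(strip_embedding_controls(wrapped + user_text))
```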

This might be a duplicate of bug 156582 and possibly other DOCX-RTL issues that have the same root cause.
Comment 5 Hossein 2023-08-19 14:34:03 UTC
Created attachment 189043
TXT: The output of saving the DOCX to a text file

"Add bi-directional marks" should be selected during the export. In the "File Conversion" dialog, the selected encoding is "Unicode (UTF-8)" under "Other encoding".
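
To check which bi-directional marks ended up in such a text export, a small Python sketch like the following could be used. The helper name find_bidi_controls and the sample string are assumptions made for illustration; the sample is not the content of the attached file.

```python
import unicodedata

# Unicode bidi control characters: marks, embeddings/overrides, isolates.
BIDI_CONTROLS = {
    "\u200E", "\u200F", "\u061C",
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(text):
    """Return (index, Unicode name) for each bidi control character in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# Hypothetical sample, not taken from the attachment.
sample = "\u200FC++\u200E 123"
for index, name in find_bidi_controls(sample):
    print(index, name)
```

For a real file, the text could be read with open(path, encoding="utf-8").read() and passed to the same function.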