Bug 155470 - Numbers in a Farsi (and Arabic) document change to latin, after saving in docx format.
Status: REOPENED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: DOCX-RTL
Reported: 2023-05-24 10:03 UTC by Baback Ashtari
Modified: 2023-11-07 21:14 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
PDF file of the issue (559.05 KB, application/pdf)
2023-05-24 10:04 UTC, Baback Ashtari
MS Word format (190.50 KB, application/msword)
2023-05-24 10:05 UTC, Baback Ashtari
.odt format of the same document (58.13 KB, application/vnd.oasis.opendocument.text)
2023-05-24 15:46 UTC, Baback Ashtari
DOCX Numbers 123 in Persian (Farsi) (11.78 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-06-08 11:50 UTC, Hossein
DOCX Numbers 123 in Persian (Farsi) saved in LibreOffice (8.98 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-06-08 11:52 UTC, Hossein

Description Baback Ashtari 2023-05-24 10:03:49 UTC
Description:
Everything looks fine in LibreOffice Writer. But after the document is saved in .docx format and the other party opens it in Microsoft Word, the numbers have changed on various lines (only the numbers; everything else looks good). Numbers that should be entirely in Farsi digits come out mixed, for example one digit in Latin and the next in Farsi. I have both the original files and a PDF so you can see for yourself.

Actual Results:
Just type some numbers like 1234 76 89 0472 or, in my case, 10:22; 99:66-12 (they should be in Farsi, perhaps with an Arabic font) and save the document as .docx. Then open it in Microsoft Word and you will see the result.

Expected Results:
numbers will be like 1۲3۴ or  10:22; ۹9:6۶-۱2


Reproducible: Always


User Profile Reset: Yes

Additional Info:
It should be: ۱۲۳۴ ۷۶ .... All in Farsi (complex format)
Comment 1 Baback Ashtari 2023-05-24 10:04:39 UTC Comment hidden (obsolete)
Comment 2 Baback Ashtari 2023-05-24 10:05:12 UTC Comment hidden (obsolete)
Comment 3 Baback Ashtari 2023-05-24 10:06:13 UTC Comment hidden (obsolete)
Comment 4 raal 2023-05-24 15:42:31 UTC Comment hidden (obsolete)
Comment 5 Baback Ashtari 2023-05-24 15:46:36 UTC Comment hidden (obsolete)
Comment 6 raal 2023-05-25 10:30:31 UTC Comment hidden (obsolete)
Comment 7 Saeid Kavandi 2023-06-07 10:38:34 UTC
This is not necessarily a bug. If you copy the numbers from your first cell and paste them into a plain-text editor, you will see this: ۱:7 – 5 (you typed the one as a Persian code point and the seven and the five as Latin code points). Word and LibreOffice interpret these numbers differently. If you save your file as .docx and as .doc and open both in Word, you can see the difference.
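
The check described in this comment can also be done without a text editor. The following is only a minimal sketch, not part of the original report: the function names `describe_digits` and `has_mixed_digits` are made up for illustration, and the sample string is the one quoted above, written with escapes so the code points are unambiguous.

```python
import unicodedata


def describe_digits(text):
    """List (character, code point, Unicode name) for each decimal digit in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if unicodedata.category(ch) == "Nd"
    ]


def has_mixed_digits(text):
    """Return True if the decimal digits in text come from more than one digit set."""
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            # Subtracting the digit's value gives the code point of that set's zero.
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return len(zeros) > 1


# "\u06F1" is the Persian digit one; "7" and "5" are ASCII digits.
sample = "\u06F1:7 \u2013 5"
for row in describe_digits(sample):
    print(row)
print(has_mixed_digits(sample))
```

A string that mixes Persian and ASCII digits reports more than one digit set, which is the situation this comment points out in the reporter's table cell.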
Comment 8 Hossein 2023-06-07 11:58:43 UTC
Confirmed with both LO 7.5 and LO 7.6 dev master:

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 52f70f04bdc586a072141e069d451a979c5f4cb7
CPU threads: 20; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_DE); UI: en-US
Calc: CL threaded

Version: 7.5.3.2 (X86_64) / LibreOffice Community
Build ID: 9f56dff12ba03b9acd7730a5a481eea045e468f3
CPU threads: 20; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_DE); UI: en-GB
Calc: CL threaded

As far as I remember, this is inherited from OpenOffice.org.
Comment 9 Hossein 2023-06-08 11:50:52 UTC
Created attachment 187783 [details]
DOCX Numbers 123 in Persian (Farsi)

The file was created in MS Word and contains 123 in Persian, which should be rendered as ۱۲۳ when the "Context" option is enabled for numerals. To enable it in MS Word, go to "File > Options > Advanced > Numerals: Context".
Comment 10 Hossein 2023-06-08 11:52:30 UTC
Created attachment 187784 [details]
DOCX Numbers 123 in Persian (Farsi) saved in LibreOffice

This is the same DOCX file, opened and saved in LibreOffice. When the saved file is reopened in MS Word, the numerals are rendered as 123, no longer as ۱۲۳.
Comment 11 Hossein 2023-06-08 12:10:21 UTC
The file created in MS Word (document.xml):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document>
  <w:body>
    <w:p w14:paraId="0BC83E22" w14:textId="07F9CD70" w:rsidR="00CC1601" w:rsidRPr="00DA0543" w:rsidRDefault="00DA0543" w:rsidP="00DA0543">
      <w:pPr>
        <w:bidi/>
        <w:rPr>
          <w:rFonts w:hint="cs"/>
          <w:lang w:val="en-US" w:bidi="fa-IR"/>
        </w:rPr>
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:rFonts w:hint="cs"/>
          <w:rtl/>
          <w:lang w:bidi="fa-IR"/>
        </w:rPr>
        <w:t>123</w:t>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00CC1601" w:rsidRPr="00DA0543">
      <w:pgSz w:w="11906" w:h="16838"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
      <w:cols w:space="708"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>

After being re-saved in LibreOffice:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document>
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Normal"/>
        <w:bidi w:val="1"/>
        <w:spacing w:before="0" w:after="160"/>
        <w:jc w:val="left"/>
        <w:rPr>
          <w:lang w:val="en-US" w:bidi="fa-IR"/>
        </w:rPr>
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:lang w:bidi="fa-IR"/>
        </w:rPr>
        <w:t>123</w:t>
      </w:r>
    </w:p>
    <w:sectPr>
      <w:type w:val="nextPage"/>
      <w:pgSz w:w="11906" w:h="16838"/>
      <w:pgMar w:left="1440" w:right="1440" w:gutter="0" w:header="0" w:top="1440" w:footer="0" w:bottom="1440"/>
      <w:pgNumType w:fmt="decimal"/>
      <w:formProt w:val="false"/>
      <w:textDirection w:val="lrTb"/>
      <w:docGrid w:type="default" w:linePitch="360" w:charSpace="4096"/>
    </w:sectPr>
  </w:body>
</w:document>
Comment 12 Hossein 2023-06-08 12:20:01 UTC
After adding a <w:rtl/> to the run properties in this section, the numerals look fine (rendered as ۱۲۳) in MS Word, and also in Writer.

In other words, by turning this:

      <w:r>
        <w:rPr>
          <w:lang w:bidi="fa-IR"/>
        </w:rPr>
        <w:t>123</w:t>
      </w:r>

into this:

      <w:r>
        <w:rPr>
          <w:rtl/>
          <w:lang w:bidi="fa-IR"/>
        </w:rPr>
        <w:t>123</w:t>
      </w:r>
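
For anyone who wants to repeat this manual experiment on a larger document.xml, the edit can be scripted. This is only a sketch of the hand edit described above, not the LibreOffice export fix: the function name `add_rtl_to_bidi_runs` and the `SAMPLE` fragment are hypothetical, and the rule "add <w:rtl/> to every run whose <w:lang> has a w:bidi attribute" is an assumption made for the test, not an established criterion.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally declared on <w:document>.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# Hypothetical stand-in for the run shown above, with the namespace declared.
SAMPLE = (
    f'<w:p xmlns:w="{W}"><w:r><w:rPr><w:lang w:bidi="fa-IR"/></w:rPr>'
    "<w:t>123</w:t></w:r></w:p>"
)


def add_rtl_to_bidi_runs(xml_text):
    """Insert <w:rtl/> before <w:lang> in runs that carry a w:bidi language."""
    root = ET.fromstring(xml_text)
    for run in root.iter(f"{{{W}}}r"):
        rpr = run.find(f"{{{W}}}rPr")
        if rpr is None:
            continue
        lang = rpr.find(f"{{{W}}}lang")
        # Assumption: only runs tagged with a bidi language are touched.
        if lang is None or lang.get(f"{{{W}}}bidi") is None:
            continue
        if rpr.find(f"{{{W}}}rtl") is not None:
            continue  # already marked, leave it alone
        rpr.insert(list(rpr).index(lang), ET.Element(f"{{{W}}}rtl"))
    return ET.tostring(root, encoding="unicode")


print(add_rtl_to_bidi_runs(SAMPLE))
```

Only direct run properties are changed; the <w:rPr> inside <w:pPr> is left as it is, matching the file that MS Word wrote.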
Comment 13 Commit Notification 2023-08-21 12:27:31 UTC
Hossein committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/40ed8dd3a5a16f21f2e98440c62efa0fa6ec60ff

tdf#155470 DOCX export: fix RTL numbers changed to LTR

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Regina Henschel 2023-10-04 20:49:02 UTC
(In reply to Hossein from comment #9)
> Created attachment 187783 [details]
> DOCX Numbers 123 in Persian (Farsi)
> 
> The file is created in MS Word, and contains 123 in Persian, which should be
> rendered as ۱۲۳ when context option is enabled for the numerals. To set it,
> set it in MS Word options: "File > Options > Advanced > Numerals: Context".

I could not find the MS Word option "Numerals: Context" stored anywhere in the document. I have searched the web and examined the file markup. The file itself contains only the string "123". It seems that this option is only a display setting of the application, not information stored in the file. If that is true, LibreOffice cannot know that Word shows the digits differently from what their Unicode code points imply.
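
The observation that the file holds plain "123" can be reproduced by reading word/document.xml out of the DOCX package and listing the code points of each text run. The following is a rough sketch under stated assumptions: the helper names are invented, and `build_minimal_docx` creates a tiny in-memory stand-in rather than reading the real attachment 187783.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_minimal_docx(text):
    """Build an in-memory zip holding only word/document.xml (stand-in for a real DOCX)."""
    xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def docx_text_runs(docx_bytes):
    """Return the text of every <w:t> element in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W}}}t")]


def code_points(text):
    """Format each character of text as a U+XXXX code point."""
    return [f"U+{ord(ch):04X}" for ch in text]


for run_text in docx_text_runs(build_minimal_docx("123")):
    print(run_text, code_points(run_text))
```

To inspect a real file, pass the bytes of the downloaded attachment to `docx_text_runs` instead of the in-memory stand-in.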
Comment 15 Commit Notification 2023-10-05 08:15:43 UTC
Hossein committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e0bedd3f7311bf47392a46d097304e3c7afcb246

Revert "tdf#155470 DOCX export: fix RTL numbers changed to LTR"

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Xisco Faulí 2023-10-05 08:16:17 UTC
The fix has been reverted. Reopening...