Bug 132938 - Very strange numbers inside table as UTF-8 krakozyabras
Summary: Very strange numbers inside table as UTF-8 krakozyabras
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.4.4.1 rc
Hardware: All All
Importance: medium normal
Assignee: Ming Hua
URL:
Whiteboard: target:7.1.0
Keywords: easyHack
Depends on:
Blocks:
 
Reported: 2020-05-11 00:51 UTC by Mikhail Novosyolov
Modified: 2020-11-23 22:06 UTC

See Also:
Crash report or crash signature:


Attachments
table1_tdf132938.docx: Table with strange symbols displayed as numbers (17.73 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-05-11 00:52 UTC, Mikhail Novosyolov
Screenshot: Correctly displayed as numbers in MS Office Online (104.01 KB, image/png)
2020-05-11 00:53 UTC, Mikhail Novosyolov
Screenshot: krakozyabraz in LibreOffice (65.32 KB, image/png)
2020-05-11 00:54 UTC, Mikhail Novosyolov
Full screen screenshot of LibreOffice (182.00 KB, image/png)
2020-05-11 07:54 UTC, Mikhail Novosyolov
Patch to OpenSymbol.sfd adding glyphs U+F030 to U+F039 (3.18 KB, patch)
2020-11-17 16:49 UTC, Ming Hua
screenshot: LibreOffice 6.4.7 with cherry-picked 52f1115571469 (99.74 KB, image/png)
2020-11-20 18:55 UTC, Mikhail Novosyolov
Screenshot with new font, 6.4.7 on Windows 10 (45.53 KB, image/png)
2020-11-21 19:23 UTC, Ming Hua

Description Mikhail Novosyolov 2020-05-11 00:51:42 UTC
Description:
There is a very strange table made in Microsoft Office Word.
MS Word, including MS Office Online, displays e.g. "620", but actually it is "" (I do not know what it is, just some random UTF-8 symbols!)
"symbol" font is set in the original document. If I change the font to "Times New Roman" in MS Office Online, symbols are displayed as UTF-8 krakozyabraz.
In LibreOffice, they are displayed as krakozyabraz out of the box.
I do not understand how MS Office manages to display "" as "620". LibreOffice fails to do it.

Steps to Reproduce:
.

Actual Results:
.

Expected Results:
.


Reproducible: Always


User Profile Reset: No



Additional Info:
.
Comment 1 Mikhail Novosyolov 2020-05-11 00:52:45 UTC
Created attachment 160636
table1_tdf132938.docx: Table with strange symbols displayed as numbers
Comment 2 Mikhail Novosyolov 2020-05-11 00:53:44 UTC
Created attachment 160637
Screenshot: Correctly displayed as numbers in MS Office Online

It can be viewed online in MS Office Online here: https://yadi.sk/i/lbjEKpowQGUpFg
Comment 3 Mikhail Novosyolov 2020-05-11 00:54:24 UTC
Created attachment 160638
Screenshot: krakozyabraz in LibreOffice
Comment 4 Mike Kaganski 2020-05-11 07:53:46 UTC
The numbers in the bugdoc use U+f030-U+f039 instead of usual U+0030-U+0039. The font used is Symbol. Opens OK on Windows, where Symbol font is present. Trying to apply OpenSymbol to see if substitution would work correctly on systems without Symbol font garbles the symbols - so setting to NEW, since our substitution has a problem here.
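To make the code point shift concrete: each digit in the document sits exactly 0xF000 above its ordinary counterpart. The sketch below folds U+F030-U+F039 back to U+0030-U+0039. The helper name `fold_pua_digits` and the sample cell content are assumptions for illustration only, not LibreOffice code.

```python
# Hypothetical helper: map the private-use digits U+F030-U+F039
# back to the ordinary digits U+0030-U+0039.
PUA_DIGIT_FIRST = 0xF030
PUA_DIGIT_LAST = 0xF039
PUA_OFFSET = 0xF000  # U+F030 minus U+0030


def fold_pua_digits(text: str) -> str:
    """Replace private-use digits with ASCII digits; leave other characters alone."""
    return "".join(
        chr(ord(ch) - PUA_OFFSET)
        if PUA_DIGIT_FIRST <= ord(ch) <= PUA_DIGIT_LAST
        else ch
        for ch in text
    )


# Assumed content of the cell that MS Word renders as "620".
cell = "\uf036\uf032\uf030"
print([f"U+{ord(ch):04X}" for ch in cell])
print(fold_pua_digits(cell))
```

Folding the text like this only illustrates the relationship between the two ranges; the fix discussed in this bug is to give the substitute font glyphs at the private-use code points instead.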

Tested with Version: 6.4.4.1 (x64)
Build ID: b50bc319eca5cd5b66fbfe2ebd0d3bd1eed099b5
CPU threads: 12; OS: Windows 10.0 Build 18363; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded
Comment 5 Mikhail Novosyolov 2020-05-11 07:54:32 UTC
Created attachment 160642
Full screen screenshot of LibreOffice

Some clarification: on my system (Linux) the "Symbol" font does not exist (you can see on the screenshot that the font name is shown in italics, which means that the font does not exist and was replaced). In MS Office Online that font does exist. It has also been reported that this document is displayed correctly in LibreOffice on Windows, where the "Symbol" font is installed.

So the problem is probably in how fonts are replaced in LibreOffice and/or Fontconfig, in the replacement fonts themselves, or in non-standard behaviour of the "Symbol" font.

It is weird how such strange characters become Arabic numerals in the "Symbol" font.
Comment 6 Mikhail Novosyolov 2020-05-11 07:56:36 UTC
(In reply to Mike Kaganski from comment #4)
> Trying to apply OpenSymbol to see if substitution would work correctly on
> systems without Symbol font garbles the symbols
Setting "Open Symbol" or "XO Symbol" for this text does not help. I did not try substitution via fontconfig.
Comment 7 Mike Kaganski 2020-05-11 08:43:21 UTC
Code pointer: extras/source/truetype/symbol/OpenSymbol.sfd

What's needed is to add new code points U+F030-U+F039 in the private use area [1] that reference the existing U+0030-U+0039 glyphs. git blame can show you previous commits that changed the font (see there how to increase the font version). The file is in Spline Font Database format [2]. See other hints in extras/source/truetype/symbol/README.

[1] http://www.unicode.org/charts/PDF/UE000.pdf
[2] https://fontforge.org/docs/techref/sfdformat.html
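The change described above could also be scripted rather than done by hand in the FontForge UI. This is a minimal sketch, assuming FontForge's bundled Python module is available and that the font is Unicode-encoded, so that `font[codepoint]` finds the digit glyphs. The function names are hypothetical, and the sketch does not bump the font version or touch the copyright section.

```python
def pua_digit_pairs():
    """Yield (new_codepoint, existing_codepoint) for the ten digits."""
    for digit in range(0x30, 0x3A):
        yield 0xF000 + digit, digit


def add_pua_digit_references(sfd_path: str) -> None:
    """Add U+F030-U+F039 as references to the existing digit glyphs."""
    # Imported lazily: the module ships with FontForge itself, so the
    # pure helper above stays usable where FontForge is not installed.
    import fontforge

    font = fontforge.open(sfd_path)
    for new_cp, old_cp in pua_digit_pairs():
        source = font[old_cp]             # assumes a Unicode-encoded font
        glyph = font.createChar(new_cp)   # creates the slot, or returns an existing one
        glyph.addReference(source.glyphname)
        glyph.width = source.width        # keep the same advance width as the digit
    font.save(sfd_path)                   # writes the SFD back in place
```

Saving from FontForge may also rewrite unrelated header lines in the SFD, so the resulting diff still needs a manual review.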
Comment 8 Ming Hua 2020-11-11 01:47:44 UTC
I'm interested in this one.

However as I only have access to Windows system, I'd like to take this approach:

I've downloaded the attached DOCX file, it shows "620" in the highlighted cell when using Windows "Symbol" font (it's not set to Symbol by default though, the font is Calibri when I open the file).  If I change the font to "OpenSymbol" instead, it shows some other random characters.

So my goal is to edit the OpenSymbol.sfd file, add U+F030-U+F039 glyphs as references to U+0030-U+0039 (digits 0-9), and build a new OpenSymbol TTF font. If I can use the new OpenSymbol font to make this document show numbers correctly, I'm on the right track.

Am I understanding the problem correctly?
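One way to check which private-use code points a DOCX actually contains, before editing the font, is to scan the text of `word/document.xml`. This is a sketch under assumptions: it only finds characters stored as literal text, so symbols stored as `w:sym` elements would be missed, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter

# Basic Multilingual Plane private use area.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def pua_codepoints_in_docx(docx) -> Counter:
    """Count private-use characters in the main document part.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    text = "".join(root.itertext())
    return Counter(
        ord(ch) for ch in text if PUA_FIRST <= ord(ch) <= PUA_LAST
    )


# Usage (the file name is the attachment from this report):
# for cp, count in sorted(pua_codepoints_in_docx("table1_tdf132938.docx").items()):
#     print(f"U+{cp:04X}: {count}")
```

If the scan reports only code points in U+F030-U+F039, then adding those ten glyphs covers everything this document needs.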
Comment 9 Mike Kaganski 2020-11-11 05:59:02 UTC
(In reply to Ming Hua from comment #8)

You are quite right! Please go ahead. :-)
Comment 10 Ming Hua 2020-11-17 16:49:20 UTC
Created attachment 167371
Patch to OpenSymbol.sfd adding glyphs U+F030 to U+F039

I think I've made good progress on this bug.

I've modified OpenSymbol.sfd to add 10 glyphs U+F030-U+F039 as references to glyphs 0-9.  I've also built the TTF font and installed it on my Windows system, and now attachment 160636 displays correctly for me.

Patch attached.  I believe I've made all the necessary changes:
- glyphs added
- version number increased
- copyright section updated

There are still three lines that are automatically updated by FontForge:
- line #1: SplineFontDB - 3.0 to 3.2 (this should be the SFD format version number, and is safe to change)
- line #23: ModificationTime (this should be kept, I think)
- line #784: WinInfo (this is about how the font is displayed when opened in FontForge, so probably an unnecessary change, I can manually edit it back)
Comment 11 Ming Hua 2020-11-17 17:03:36 UTC
I also need some guidance on how to proceed from here.

According to extras/source/truetype/symbol/README, after modifying the SFD file, the following work is also needed:
1. Use TTX/fonttools to verify the generated opens___.ttf font only contains intended changes
2. Upload opens___.ttf to dev-www.libreoffice.org
3. Update the build system to use the new version of OpenSymbol font, like the changes made in https://gerrit.libreoffice.org/#/c/75577
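For step 1, a full TTX dump comparison is what the README asks for. As a quicker first check, the character maps of the old and new TTF files can be compared with fontTools. This is only a sketch: it looks at the cmap alone, the function names are hypothetical, and it assumes fontTools can find a Unicode cmap subtable in both files.

```python
def cmap_diff(old_cmap: dict, new_cmap: dict):
    """Return (added, removed, changed) code point mappings."""
    added = {cp: name for cp, name in new_cmap.items() if cp not in old_cmap}
    removed = {cp: name for cp, name in old_cmap.items() if cp not in new_cmap}
    changed = {
        cp: (old_cmap[cp], new_cmap[cp])
        for cp in old_cmap.keys() & new_cmap.keys()
        if old_cmap[cp] != new_cmap[cp]
    }
    return added, removed, changed


def load_cmap(ttf_path: str) -> dict:
    """Load the code point to glyph name mapping of a font file."""
    from fontTools.ttLib import TTFont  # third-party package "fonttools"

    font = TTFont(ttf_path)
    try:
        best = font.getBestCmap()
    finally:
        font.close()
    if best is None:
        raise ValueError(f"no Unicode cmap subtable found in {ttf_path}")
    return dict(best)


# Usage (both paths are placeholders):
# added, removed, changed = cmap_diff(load_cmap("old.ttf"), load_cmap("new.ttf"))
# print(sorted(f"U+{cp:04X}" for cp in added), removed, changed)
```

For this change, the expectation would be that `added` holds exactly U+F030-U+F039 while `removed` and `changed` stay empty.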

As far as I understand, these things are parallel to updating the SFD font file.  I can do the (1) TTX/fonttools verification part.  But as I don't have a build environment for LO here, I don't feel very comfortable doing (3).  And I doubt that just anyone has the upload rights to do (2).

So what else is expected from me?  Can I just do the SFD part, and leave the TTF part to others?
Comment 12 Ming Hua 2020-11-17 17:08:56 UTC
I'm also not sure how this change will affect Dante's ongoing work [1][2] to overhaul how OpenSymbol is used in LO (hopefully not at all, as I don't think starmath would use these 10 glyphs), so adding him to CC for opinions.

1. https://lists.freedesktop.org/archives/libreoffice/2020-November/086277.html
2. https://gerrit.libreoffice.org/c/core/+/105769
Comment 13 dante19031999 2020-11-17 19:17:19 UTC
(In reply to Ming Hua from comment #12)
> I'm also not sure how this change will affect Dante's ongoing work [1][2] to
> overhaul how OpenSymbol is used in LO (hopefully not at all, as I don't
> think starmath would use these 10 glyphs), so adding him to CC for opinions.
> 
> 1.
> https://lists.freedesktop.org/archives/libreoffice/2020-November/086277.html
> 2. https://gerrit.libreoffice.org/c/core/+/105769

Your work should not interfere with mine. I'm creating a new font for starmath based on it, but it is still pending approval.

This can be solved without an insane amount of work, but OpenSymbol is actually an internal resource of LibreOffice, used here and there for its own purposes, which means that a lot of symbols are missing. I would recommend having a Symbol font available, just in case.

Those numbers are in a private use area. There are several standardization initiatives that use this area, but I don't know which one these correspond to. I don't really know how the user managed to type them.

About the TTF: FontForge allows you to export fonts without validating them. OpenSymbol itself would be unbuildable if not for that. But I haven't managed to get that far yet; I'm still working on the font itself. Maybe you should write an email to the developers list.
Comment 14 Mike Kaganski 2020-11-18 06:34:44 UTC
(In reply to Ming Hua from comment #10)
> I think I've made good progress on this bug.

Great!

> I've modified OpenSymbol.sfd to add 10 glyphs U+F030-U+F039 as references to
> glyphs 0-9.  I've also built the TTF font and installed it on my Windows system, and
> now attachment 160636 displays correctly for me.

Very good!

> Patch attached.

Please submit it to gerrit - that way would be better for review - thank you for all the work!

> There are still three lines that are automatically updated by FontForge:
> - line #1: SplineFontDB - 3.0 to 3.2 (this should be the SFD format version
> number, and is safe to change)

Please revert this line change - it's not necessary

> - line #23: ModificationTime (this should be kept, I think)

It's OK

> - line #784: WinInfo (this is about how the font is displayed when opened in
> FontForge, so probably an unnecessary change, I can manually edit it back)

Yes, please revert this line's change as unrelated - that reduces the noise.
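Reverting the two FontForge-generated lines while keeping the ModificationTime change can be done by hand, or mechanically as sketched here. The sketch assumes the two lines can be recognised by the SFD keywords `SplineFontDB:` and `WinInfo:`. The function name and the sample values are made up for illustration.

```python
# Assumed SFD keywords of the two lines that should not change.
NOISE_PREFIXES = ("SplineFontDB:", "WinInfo:")


def restore_noise_lines(original: str, edited: str,
                        prefixes=NOISE_PREFIXES) -> str:
    """Return `edited` with the noise lines taken from `original`."""
    # Remember the first original line for each prefix.
    originals = {}
    for line in original.splitlines():
        for prefix in prefixes:
            if line.startswith(prefix):
                originals.setdefault(prefix, line)

    restored = []
    for line in edited.splitlines():
        for prefix in prefixes:
            if line.startswith(prefix) and prefix in originals:
                line = originals[prefix]
                break
        restored.append(line)
    return "\n".join(restored) + "\n"


# Made-up sample content, not the real OpenSymbol.sfd.
before = "SplineFontDB: 3.0\nModificationTime: 1\nWinInfo: 0 32 12\n"
after = "SplineFontDB: 3.2\nModificationTime: 2\nWinInfo: 64 16 4\n"
print(restore_noise_lines(before, after), end="")
```

Only the header version and the window info are restored; every other line of the edited file, including ModificationTime, passes through unchanged.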

(In reply to Ming Hua from comment #11)
> According to extras/source/truetype/symbol/README, after modifying the SFD
> file, the following work is also needed:
> 1. Use TTX/fonttools to verify the generated opens___.ttf font only contains
> intended changes
> 2. Upload opens___.ttf to dev-www.libreoffice.org
> 3. Update the build system to use the new version of OpenSymbol font, like
> the changes made in https://gerrit.libreoffice.org/#/c/75577

Once you submit the change to gerrit, we will review it; then we'll ask someone with appropriate access to upload the TTF built from the final result to the server; and then you will update the patch in gerrit with the changes needed to use that file. It should be merged in the same patch, so that the source code matches the binary in use.

(In reply to Ming Hua from comment #12)
> I'm also not sure how this change will affect Dante's ongoing work [1][2] to
> overhaul how OpenSymbol is used in LO (hopefully not at all, as I don't
> think starmath would use these 10 glyphs), so adding him to CC for opinions.

Adding to Dante's answer in comment 13, you are working on *different* files. Your work continues to improve OpenSymbol, which is going to stay as the *font* (in contexts where there is a reference to the font; like when it's used explicitly, or as a substitution for another font like Symbol etc.); Dante is going to create a font that would be used in context of Math, as purely internal application resource. Dante's approach makes these two use cases clearly distinguished, and makes sure that no conflicts would appear in future.
Comment 15 Ming Hua 2020-11-18 22:13:23 UTC
(In reply to Mike Kaganski from comment #14)

Thanks for your detailed advice.  I've now made the suggested changes and submitted the patch to Gerrit:
https://gerrit.libreoffice.org/c/core/+/105997
Comment 16 Commit Notification 2020-11-19 15:15:10 UTC
Ming Hua committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/52f1115571469f210192cbce6b52e8b7d1d85dc0

tdf#132938 Add glyphs U+F030-U+F039 to OpenSymbol

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Mikhail Novosyolov 2020-11-20 18:55:06 UTC
Created attachment 167428
screenshot: LibreOffice 6.4.7 with cherry-picked 52f1115571469

I have cherry-picked 52f1115571469 to LibreOffice 6.4.7 on rosa2019.1 and confirm that this problem is fixed. Numbers are not displayed very beautifully (they are positioned strangely), but they are displayed as numbers.

Thanks to everyone involved!
Comment 18 Ming Hua 2020-11-21 19:23:28 UTC
Created attachment 167454
Screenshot with new font, 6.4.7 on Windows 10

(In reply to Mikhail Novosyolov from comment #17)

Thanks for testing!

> Numbers are not displayed very
> beautifully (they are positioned strangely), but they are displayed as
> numbers.
I don't have access to MS Office to compare with, but for me with 6.4.7 and new OpenSymbol.ttf font installed, it displays quite nicely here on Windows 10, screenshot attached.  It looks close enough to attachment 160637.

So it's probably due to rendering difference between Windows and Linux, and not related to the root cause of this bug.  If you'd like, please file a separate bug for this issue (but maybe test 7.0.x and/or master first).
Comment 19 Mike Kaganski 2020-11-21 20:15:14 UTC
(In reply to Ming Hua from comment #18)
> I don't have access to MS Office to compare with, but for me with 6.4.7 and
> new OpenSymbol.ttf font installed, it displays quite nicely here on Windows
> 10, screenshot attached.

Are you sure that testing on Windows, you actually test the OpenSymbol glyphs, and not Symbol glyphs as the document references? It might be necessary to select the cells, and set their font to OpenSymbol explicitly to see the problem from comment 17 (which indeed needs its own report).
Comment 20 Ming Hua 2020-11-23 20:54:10 UTC
(In reply to Mike Kaganski from comment #19)
> (In reply to Ming Hua from comment #18)
> > I don't have access to MS Office to compare with, but for me with 6.4.7 and
> > new OpenSymbol.ttf font installed, it displays quite nicely here on Windows
> > 10, screenshot attached.
> 
> Are you sure that testing on Windows, you actually test the OpenSymbol
> glyphs, and not Symbol glyphs as the document references?

I was pretty sure I was testing OpenSymbol glyphs, because the screenshot I posted was how my LO looks when opening the sample docx without any changes.  Without the new version of OpenSymbol, it showed strange characters when opened without changing fonts.

Until...

> It might be
> necessary to select the cells, and set their font to OpenSymbol explicitly
> to see the problem from comment 17 (which indeed needs its own report).

...I tried this.  If I select all cells with numbers, the font selection dropdown on the Formatting toolbar shows "Calibri", but the numbers are still shown correctly (and obviously don't look like digit glyphs in the Calibri font).  If I change the font to "Symbol" in the toolbar dropdown list, nothing obvious happens; if I change the font to "OpenSymbol", I now see the problem described in comment 17: the numbers are too close to the top border.

But enough of this issue here.  Let's continue when/if there is a separate bug filed.
Comment 21 Mikhail Novosyolov 2020-11-23 22:06:32 UTC
Here is a bug report about positioning inside the table: https://bugs.documentfoundation.org/show_bug.cgi?id=138442