Bug 133607

Summary: FILEOPEN: Semi-colons in front of words cause incorrect line break (ICU 60.1 change)
Product: LibreOffice
Reporter: Xisco Faulí <xiscofauli>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW
Severity: normal
CC: buzea.bogdan, dgp-mail, erack, jluth, xiscofauli
Priority: medium
Keywords: bibisected, bisected, regression
Version: 6.0 all versions   
Hardware: All   
OS: All   
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=131278
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 107838, 149092, 161022    
Attachments: Comparison MSO 2010 and LibreOffice 7.0 master
DOCX file
DOC file
Untitled 1234b.odt: copy/paste of the text into new ODT to (assumedly) avoid compat flags.
semicolonedNonBreakingWhitespace_133607.odt: cleanroom demonstration

Description Xisco Faulí 2020-06-02 15:40:46 UTC
Created attachment 161529 [details]
Comparison MSO 2010 and LibreOffice 7.0 master

Steps to reproduce:
1. Open the attached document (either the DOC or the DOCX file).

-> The first line breaks in the middle. It should reach the end of the paragraph. See the comparison image.

Reproduced in

Version: 7.0.0.0.alpha1+
Build ID: 82894d85147840f1f587e9530b12f0058f2ef2c3
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded


[Bug found by office-interoperability-tools]
Comment 1 Xisco Faulí 2020-06-02 15:41:06 UTC
Created attachment 161530 [details]
DOCX file
Comment 2 Xisco Faulí 2020-06-02 15:41:29 UTC
Created attachment 161531 [details]
DOC file
Comment 3 Xisco Faulí 2020-06-02 15:43:49 UTC
I've bisected it with bibisect-linux64-6.0 and it points to

author	Eike Rathke <erack@redhat.com>	2017-11-17 11:03:45 +0100
committer	Eike Rathke <erack@redhat.com>	2017-11-20 19:28:10 +0100
commit 9206a08ada00e8762c4a634f242bd566028964bb (patch)
tree eaa317ce6717d44f75c077a6db147b0ebd4994b7
parent a8687041c46b3fe93a76faa0a4a65e7069ef5e9d (diff)
Upgrade to ICU 60.1

So might Writer be interpreting a Unicode character as a line break?

@Justin, I thought you might be interested in this issue...
Comment 4 Justin L 2020-06-02 16:24:00 UTC
It is not being read in as a line break. (There is no linebreak character indicated with reveal formatting.) Add more spaces, and it will jump back up to the top line.
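The same check can be made outside Writer by pasting the affected text into a short script and looking at its code points. This is only a minimal sketch: the helper names `find_explicit_breaks` and `dump` are hypothetical, and the table lists the characters that commonly act as explicit line or paragraph breaks, not whatever Writer uses internally.

```python
import unicodedata

# Characters that commonly act as explicit line/paragraph breaks.
# Assumption: this table is illustrative, not Writer's internal list.
EXPLICIT_BREAKS = {
    "\u000A": "LINE FEED",
    "\u000B": "LINE TABULATION",
    "\u000C": "FORM FEED",
    "\u000D": "CARRIAGE RETURN",
    "\u0085": "NEXT LINE",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}


def find_explicit_breaks(text):
    """Return (index, code point, name) for every explicit break character."""
    return [
        (i, f"U+{ord(ch):04X}", EXPLICIT_BREAKS[ch])
        for i, ch in enumerate(text)
        if ch in EXPLICIT_BREAKS
    ]


def dump(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>")) for ch in text]


if __name__ == "__main__":
    sample = "word ;next"  # placeholder text, not taken from the attachment
    print(find_explicit_breaks(sample))
    for entry in dump(sample):
        print(entry)
```

An empty result from `find_explicit_breaks` is consistent with the observation above: the text contains no break character, so the wrap must come from how break opportunities are computed.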
Comment 5 Justin L 2020-06-02 18:52:14 UTC
Created attachment 161544 [details]
Untitled 1234b.odt: copy/paste of the text into new ODT to (assumedly) avoid compat flags.

I don't think this is related to MS formats.
Comment 6 Dieter 2020-06-05 06:45:17 UTC
I confirm it with

Version: 7.0.0.0.beta1 (x64)
Build ID: 94f789cbb33335b4a511c319542c7bdc31ff3b3c
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

and Word 2016
Comment 7 Justin L 2020-07-23 08:59:36 UTC
Created attachment 163441 [details]
semicolonedNonBreakingWhitespace_133607.odt: cleanroom demonstration

This seems somehow to be related specifically to the semi-colons (discovered through trial and error). Apparently they have a special meaning when they follow whitespace.

Reproducible steps:
1.) Type any sentence in Writer that is just one word longer than one line, so that the last word wraps to the next line.
2.) Add a semi-colon in front of the last word. Notice that the previous word now moves down to sit in front of it.
3.) Repeat with the word that is now first on the second line.

If you DELETE a semi-colon, the text does not re-flow back immediately; it only re-flows back after a save and re-open.

This is probably intentional behaviour.  I'd guess that if it is not intentional, then it is an ICU bug and NOTOURBUG. @Eike might be able to provide more knowledgeable insight.
Comment 8 Justin L 2020-11-18 11:40:53 UTC
Tested after yesterday's

author	Eike Rathke  on	2020-11-17 16:33:33 +0100
commit 8335c8c20765d4f167d9b48e6a2757864a3bc7fd 
Update to ICU 68.1

and still the same thing. A space followed by a semi-colon is treated as a keep-with-next-word flag.
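To make the observed behaviour concrete, here is a toy model of it. It is a sketch under stated assumptions, not ICU's actual line-break algorithm: `break_opportunities`, `wrap` and the `glue_semicolon` switch are hypothetical names, width is counted in characters rather than rendered glyph widths, and the only rule modelled is "no break between a space and a following semi-colon".

```python
def break_opportunities(text, glue_semicolon=True):
    """Return indices where a new line may start (first char after a space).

    With glue_semicolon=True, a position whose first character is ';' is
    skipped, which models the behaviour reported in this bug.
    """
    ops = []
    for i in range(1, len(text)):
        if text[i - 1] == " " and text[i] != " ":
            if glue_semicolon and text[i] == ";":
                continue  # hypothesis: " ;" keeps the two words together
            ops.append(i)
    return ops


def wrap(text, width, glue_semicolon=True):
    """Greedy wrap: break at the last opportunity that still fits in width."""
    ops = break_opportunities(text, glue_semicolon)
    lines = []
    start = 0
    while len(text) - start > width:
        fitting = [i for i in ops
                   if i > start and len(text[start:i].rstrip()) <= width]
        if fitting:
            cut = fitting[-1]
        else:
            later = [i for i in ops if i > start]
            if not later:
                break  # nothing left to break at; let the line overflow
            cut = later[0]
        lines.append(text[start:cut].rstrip())
        start = cut
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    text = "aaa bbb ccc ;ddd"
    print(wrap(text, 12, glue_semicolon=False))  # break allowed before ';'
    print(wrap(text, 12, glue_semicolon=True))   # behaviour seen in this bug
```

With the glue rule switched on, "ccc" is pulled down to the second line even though it would fit on the first, which matches what the cleanroom document shows.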
Comment 9 Justin L 2021-11-22 08:05:50 UTC
Still reproducible in 7.3+ with the new ICU 70.1.