Bug 118691

Summary: FILEOPEN DOCX Extra CR tag in table causes it to appear incorrectly
Product: LibreOffice
Reporter: Gabor Kelemen (allotropia) <kelemeng>
Component: Writer
Assignee: László Németh <nemeth>
Status: RESOLVED FIXED
Severity: normal
CC: nemeth, reiner.banken, xiscofauli
Priority: medium
Keywords: filter:docx
Version: Inherited From OOo   
Hardware: All   
OS: All   
URL: http://officeopenxml.com/WPtextSpecialContent.php
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=116194
Whiteboard: target:6.2.0 target:6.1.2
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 104444    
Attachments: Example document, reduced from a user doc
Screenshot of the document in Word
The document in Writer
Another example version of the reduced user doc
The other example in LO 6.2alpha and Word 2013

Description Gabor Kelemen (allotropia) 2018-07-11 12:32:51 UTC
Created attachment 143454 [details]
Example document, reduced from a user doc

The attached simplified user document contains a simple 1x1 table. The cell contains some text and a <w:cr/> tag.
When the document is opened in Writer, the content before the <w:cr/> tag appears above the table, outside the cell.
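
To check whether another document is affected, the following is a minimal Python sketch that lists <w:cr/> elements located inside table cells. It is only an inspection aid, not part of LibreOffice. The helper names are made up for illustration, and it assumes the main document part is stored at the conventional path word/document.xml inside the DOCX archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def load_main_part(docx_path):
    """Parse the main document part (assumed to be word/document.xml)."""
    with zipfile.ZipFile(docx_path) as zf:
        return ET.fromstring(zf.read("word/document.xml"))


def cr_in_table_cells(root):
    """Return every <w:cr/> element found inside a <w:tc> table cell."""
    found = {}
    for cell in root.iter(f"{{{W}}}tc"):
        for cr in cell.iter(f"{{{W}}}cr"):
            # Key by identity so nested tables do not report an element twice.
            found[id(cr)] = cr
    return list(found.values())


# Hand-written fragment that imitates the 1x1 table described above.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:tbl><w:tr><w:tc><w:p><w:r>'
    "<w:t>before</w:t><w:cr/><w:t>after</w:t>"
    "</w:r></w:p></w:tc></w:tr></w:tbl></w:body></w:document>"
)

if __name__ == "__main__":
    print(len(cr_in_table_cells(ET.fromstring(SAMPLE))))
```

For a real file, pass the result of load_main_part("some.docx") to cr_in_table_cells() instead of the sample fragment.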

Actual results: 
The text before the <w:cr/> tag appears outside the table in LibreOffice.

Expected results: 
The whole text appears in the cell. The <w:cr/> tag is removed.

LibreOffice details: 
Version: 6.2.0.0.alpha0+
Build ID: bb1d5780226bb1b9156580972eea9aa849178742
CPU threads: 1; OS: Windows 6.1; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-07-03_05:56:48
Locale: hu-HU (hu_HU); Calc: group threaded
Comment 1 Gabor Kelemen (allotropia) 2018-07-11 12:33:46 UTC
Created attachment 143455 [details]
Screenshot of the document in Word
Comment 2 Gabor Kelemen (allotropia) 2018-07-11 12:34:07 UTC
Created attachment 143456 [details]
The document in Writer
Comment 3 Gabor Kelemen (allotropia) 2018-07-11 12:40:03 UTC
Created attachment 143459 [details]
Another example version of the reduced user doc
Comment 4 Gabor Kelemen (allotropia) 2018-07-11 12:50:01 UTC
Created attachment 143461 [details]
The other example in LO 6.2alpha and Word 2013

In a more complicated table the entire table structure disappears, leaving only the cell contents behind.

We have no idea how the users managed to create the original document in Word - it contained change tracking entries and comments from multiple organizations as well.
Comment 5 Xisco Faulí 2018-07-11 13:27:43 UTC
Reproduced in

Version: 6.2.0.0.alpha0+
Build ID: c290f692dd28094d41dff686f3faa1c4e14b556e
CPU threads: 4; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group threaded

Version: 5.2.0.0.alpha0+
Build ID: 3ca42d8d51174010d5e8a32b96e9b4c0b3730a53
Threads 4; Ver: 4.10; Render: default; 

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e



LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 6 Gabor Kelemen (allotropia) 2018-09-11 08:42:48 UTC
@Laszlo, I think you should be interested in this one.
Comment 7 László Németh 2018-09-17 13:01:21 UTC
Proposed fix: https://gerrit.libreoffice.org/#/c/60585/

tdf#118691 DOCX import: fix table loss caused by <w:cr>

According to the OOXML standard, <w:cr> (carriage return, Unicode character U+000D) is equivalent to a break with null type and clear attributes, so we handle it as a <w:br/> instead of as endOfParagraph. This fixes the loss of table paragraphs and of tables containing <w:cr/>.

Note: It seems MSO cannot handle carriage return characters in table cells correctly. It shows squares (unknown characters) there, without a line break. Copying this text to a non-table paragraph in MSO gives the correct layout with line breaks. Copying the text with carriage return characters back to a table cell gives squares again. With this LO fix, it will be possible to use LO to repair the bad tables edited by MS Word, because LibreOffice import/export converts all <w:cr>s to <w:br>s (as before, but now without destroying the structure of the tables).
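
The equivalence described in the commit message can be illustrated outside LibreOffice with a short Python sketch that renames every <w:cr/> element to <w:br/> in a WordprocessingML fragment. This is not the code of the actual patch. The function name cr_to_br and the sample fragment are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Keep the familiar "w:" prefix when the tree is serialized again.
ET.register_namespace("w", W)


def cr_to_br(document_xml):
    """Return the XML text with each <w:cr/> renamed to <w:br/>."""
    root = ET.fromstring(document_xml)
    # Materialize the matches first, then rename, so the tree is not
    # modified while it is being traversed.
    for elem in list(root.iter(f"{{{W}}}cr")):
        elem.tag = f"{{{W}}}br"
    return ET.tostring(root, encoding="unicode")


# Hand-written fragment with a <w:cr/> inside a table cell.
FRAGMENT = (
    f'<w:document xmlns:w="{W}"><w:body><w:tbl><w:tr><w:tc><w:p><w:r>'
    "<w:t>before</w:t><w:cr/><w:t>after</w:t>"
    "</w:r></w:p></w:tc></w:tr></w:tbl></w:body></w:document>"
)

if __name__ == "__main__":
    print(cr_to_br(FRAGMENT))
```

A real word/document.xml declares many more namespaces than this fragment, and ElementTree would rewrite their prefixes on output, so this sketch only demonstrates the idea and should not be used to round-trip real documents.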
Comment 8 Commit Notification 2018-09-18 06:06:26 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f63a60f56156e4ac17887e6c96d15fb865a2a8eb

tdf#118691 DOCX import: fix table loss caused by <w:cr>

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2018-09-18 11:49:22 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8693f6fa799c43304741f465c23e827c3ceafd9d&h=libreoffice-6-1

tdf#118691 DOCX import: fix table loss caused by <w:cr>

It will be available in 6.1.2.

Comment 10 Gabor Kelemen (allotropia) 2018-10-28 22:34:38 UTC
*** Bug 116889 has been marked as a duplicate of this bug. ***