Bug 155272 - FILEOPEN DOCX: invalid document: fldchar begin without fldchar end - lots of text is lost.
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: DOCX-Opening
Reported: 2023-05-13 00:51 UTC by Gerry
Modified: 2023-05-17 13:08 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
This is how the .docx file looks (correctly) in MS Word (81.75 KB, image/png)
2023-05-13 00:51 UTC, Gerry
File sample resaved with word. (58.75 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-05-13 12:23 UTC, m_a_riosv
Example file (53.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-05-13 13:43 UTC, Telesto

Description Gerry 2023-05-13 00:51:31 UTC
Created attachment 187237
This is how the .docx file looks (correctly) in MS Word

The following Microsoft Word .docx file is imported and displayed incorrectly in LibreOffice Writer. The layout is completely wrong and most of the text is missing:

https://vergabe.niedersachsen.de/Satellite/public/company/project/CXTMYYDYRHF/de/documents/misc/VVB+236+-+Verplichtungserklaerung+anderer+Unternehmen+12-2017.docx

System:
Version: 7.5.2.2 (X86_64) / LibreOffice Community
Build ID: 50(Build:2)
CPU threads: 16; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: fr-FR (fr_FR.UTF-8); UI: fr-FR
Ubuntu package version: 4:7.5.2-0ubuntu1
Calc: threaded
Comment 1 m_a_riosv 2023-05-13 12:23:35 UTC
Created attachment 187246
File sample resaved with word.

After opening the file with
Microsoft® Word for Microsoft 365 MSO (version 2304, build 16.0.16327.20200), 64-bit,
and saving it, LibreOffice opens the file fine.
So I am not sure whether this is our bug.
Comment 2 Telesto 2023-05-13 13:43:14 UTC
Created attachment 187248
Example file
Comment 3 Telesto 2023-05-13 13:43:50 UTC
Confirmed with:
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 066b23115c2a360507e306a88da572554daefab7
CPU threads: 8; OS: Mac OS X 12.6.3; UI render: Skia/Raster; VCL: osx
Locale: nl-NL (nl_NL.UTF-8); UI: en-US
Calc: threaded
Comment 4 Telesto 2023-05-13 17:51:18 UTC
Also reproducible in
4.4.7.2

and in
Version 4.0.0.3 (Build ID: 7545bee9c2a0782548772a21bc84a9dcc583b89)
Comment 5 Telesto 2023-05-13 17:56:08 UTC
@Justin L
Some analysis would be nice to have, if you're interested, of course.
Comment 6 Gerry 2023-05-13 20:50:51 UTC
FYI, I encountered the same problem with the following three .docx documents. Unfortunately, these are all official documents used in public tender/procurement procedures of the Federal Republic of Germany (required templates/forms from the Vergabe- und Vertragshandbuch des Bundes, VHB-Bund), so users need to be able to work with them in public tender procedures.

https://www.evergabe.nrw.de/VMPSatellite/public/company/project/73091/de/documents/filledByCompany/VVB+234+-+Erklaerung+Bieter-_Arbeitsgemeinschaft+12-2017.docx

https://vergabe.niedersachsen.de/Satellite/public/company/project/CXS0YMTYY4J/de/documents/filledByCompany/VVB+124_LD+-+Eigenerklaerung+zur+Eignung+Liefer-_Dienstleistungen+07-2019.docx

https://www.evergabe.nrw.de/VMPSatellite/public/company/project/CXS0Y6XYYPB/de/documents/filledByCompany/VVB+235+-+Verzeichnis+der+Leistungen_Kapazitaeten+anderer+Unternehmen+12-2017.docx
Comment 7 Justin L 2023-05-15 20:14:28 UTC
Tested with bibisect-releases: the problem is inherited from OOo.

The problem is related to bookmarks in the original document that are removed when MS Word round-trips the file.
              <w:fldChar w:fldCharType="begin">
                <w:ffData>
                  <w:name w:val="Text2"/>
                  <w:enabled/>
                  <w:calcOnExit w:val="0"/>
                  <w:textInput>
                    <w:default w:val="${fn:format-date(fn:current-date(),'[D01].[M01].[Y0001]')}"/>
                  </w:textInput>
                </w:ffData>
              </w:fldChar>
            </w:r>
            <w:bookmarkStart w:id="1" w:name="Text2"/>
            <w:r>
              <w:t>29.09.2022</w:t>
            </w:r>
            <w:bookmarkEnd w:id="1"/>
Comment 8 Justin L 2023-05-15 20:36:05 UTC
The fldChar is invalid because it is missing an end:
            <w:r>
              <w:fldChar w:fldCharType="end"/>
            </w:r>
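
A quick way to check whether other documents have the same defect is to count unmatched begin markers. The following is only a minimal sketch, not anything from the LibreOffice code base: the function names and the `SAMPLE` string are hypothetical, and it assumes that only the main body part (`word/document.xml`) needs checking, so headers, footers and footnotes are ignored.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by the w: prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def unclosed_field_count(document_xml):
    """Return how many fldChar 'begin' markers never get a matching 'end'."""
    root = ET.fromstring(document_xml)
    depth = 0
    # iter() walks the tree in document order, so nesting is tracked correctly.
    for fld in root.iter(f"{{{W_NS}}}fldChar"):
        kind = fld.get(f"{{{W_NS}}}fldCharType")
        if kind == "begin":
            depth += 1
        elif kind == "end" and depth > 0:
            depth -= 1
    return depth


def unclosed_fields_in_docx(path):
    """Run the check on the main body part of a .docx file (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        return unclosed_field_count(archive.read("word/document.xml"))


# Hypothetical, heavily reduced stand-in for the reported document:
# a field begins but no run with fldCharType="end" follows.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:t>29.09.2022</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

print(unclosed_field_count(SAMPLE))  # prints 1
```

A non-zero result means that at least one field is never terminated in the main body.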
Comment 9 Justin L 2023-05-15 21:03:47 UTC
The first thing DomainMapper_Impl::finishParagraph does is return early if the field command is not finished.

We can't simply assume that the end of a paragraph closes any started field commands, because of embedded fields, paragraphs in field shapes/tables, etc.

This probably needs to be chalked up to corrupt documents, and likely WONTFIX.
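
The ambiguity can be illustrated with a toy model. This is a sketch under simplifying assumptions and not the actual writerfilter logic: paragraphs are plain token lists, `paragraphs_ending_inside_field` is a hypothetical name, and separators, shapes and tables are left out entirely.

```python
def paragraphs_ending_inside_field(paragraphs):
    """Return indexes of paragraphs that end while a field is still open.

    Each paragraph is a list of tokens: "begin", "end", or any other
    string, which is treated as ordinary text.
    """
    depth = 0
    still_open = []
    for index, tokens in enumerate(paragraphs):
        for token in tokens:
            if token == "begin":
                depth += 1
            elif token == "end" and depth > 0:
                depth -= 1
        if depth > 0:
            still_open.append(index)
    return still_open


# A well-formed field whose content spans two paragraphs.
valid = [["begin", "first line"], ["second line", "end"], ["after"]]

# The defect from this report: "begin" without any "end".
broken = [["begin", "29.09.2022"], ["more text"], ["even more text"]]

print(paragraphs_ending_inside_field(valid))   # prints [0]
print(paragraphs_ending_inside_field(broken))  # prints [0, 1, 2]
```

At the end of paragraph 0 the two inputs look identical, so an importer cannot tell a legitimate multi-paragraph field from a missing end without reading ahead. In the broken case every later paragraph is also still inside the field, which is consistent with the large amount of text that goes missing.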
Comment 10 Telesto 2023-05-16 15:36:50 UTC
(In reply to Justin L from comment #9)
> This probably needs to be chalked up to corrupt documents, and likely
> WONTFIX.

This might be a corrupt document in the technical sense. However, WONTFIX is somewhat problematic from an end-user perspective:

* The documents are officially published government forms, which need to be used.
* There are probably many more of them (because they are government documents).
* Even though they are corrupt, they somehow work with MSO.
* The documents were apparently created by Microsoft Office Word 16.0000 (if app.xml is reporting accurate information).

So WONTFIX effectively means: use MSO (or some other alternative).