Bug 106388

Summary: FILEOPEN: Flat MS Office Word 2003 XML file opens with wrong import filter
Product: LibreOffice
Reporter: sam tygier <samtygier>
Component: filters and storage
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW ---
Severity: normal
CC: ilmari.lauhakangas, vsfoote
Priority: medium    
Version: Inherited From OOo   
Hardware: All   
OS: All   
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=150964
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 109530    
Attachments: test_file_word2010.xml

Description sam tygier 2017-03-07 15:10:08 UTC
Created attachment 131701
test_file_word2010.xml

I was sent a form to fill in, a .doc file that actually contained plain XML (not packed in a zip). LibreOffice 5.2 and 5.3 open it showing the raw XML. The file opens normally in MS Word 2010. Changing the filename extension to docx or xml makes no difference.

In Word 2010 I can generate a similar file by saving as "Word XML Document".
Comment 1 Buovjaga 2017-03-11 20:19:44 UTC
Confirmed.

It is: pkg:contentType="application/vnd.openxmlformats-package.relationships+xml"
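
A minimal sketch of how such a file could be told apart from a genuine Word 2003 XML file by looking only at the root element. It assumes the attachment is a Flat OPC package, i.e. a root `pkg:package` element in the `http://schemas.microsoft.com/office/2006/xmlPackage` namespace, whereas Word 2003 XML uses a `w:wordDocument` root. The function name `sniff_word_xml` and the returned labels are made up for illustration and are not part of LibreOffice's type detection.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the Flat OPC wrapper ("Word XML Document" in Word 2007+).
PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
# Namespace of the older Word 2003 XML (WordprocessingML 2003) format.
WORDML_2003_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def sniff_word_xml(data: bytes) -> str:
    """Classify an XML byte string by its root element (hypothetical helper)."""
    try:
        # The first "start" event is always the root element, so the
        # rest of the file does not need to be inspected here.
        events = iter(ET.iterparse(io.BytesIO(data), events=("start",)))
        _event, root = next(events)
    except (ET.ParseError, StopIteration):
        return "unknown"
    if root.tag == f"{{{PKG_NS}}}package":
        return "flat-opc"
    if root.tag == f"{{{WORDML_2003_NS}}}wordDocument":
        return "word-2003-xml"
    return "unknown"
```

If the root really is `pkg:package`, the file is a different format from the one the "Word 2003 XML" filter was written for, which may explain why forcing that filter gives poor results.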

Arch Linux 64-bit, KDE Plasma 5
Version: 5.4.0.0.alpha0+
Build ID: 43af3605d7e3b372dcc61f9cbc2cabff09396ed5
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on March 10th 2016

Arch Linux 64-bit
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 2 QA Administrators 2018-03-12 03:35:50 UTC Comment hidden (obsolete)
Comment 3 sam tygier 2018-03-17 22:21:58 UTC
Still an issue in current master:
Version: 6.1.0.0.alpha0+
Build ID: 5833734027f9194e3433d82a6e8848b64e2ae3b1
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-GB (en_GB.utf8); Calc: group
Comment 4 V Stuart Foote 2018-03-18 22:17:23 UTC
Opens using the "Microsoft Word 2003 XML (*.xml, *.doc)" file type filter. You'll need to set the "UseSystemFileDialog" option to false to conveniently select the filter.

But the import filter does not seem to correctly handle all content tags of the source XML: the "<w:r><w:t>Test file</w:t></w:r>" run is not being picked up...
Comment 5 QA Administrators 2019-03-19 03:49:37 UTC Comment hidden (obsolete)
Comment 6 V Stuart Foote 2019-03-19 04:09:48 UTC
Remains an issue: you have to force use of the "Word 2003 XML (.xml, .doc)" filter to open the file as a document in Writer. Otherwise it opens as XML text.

Version: 6.3.0.0.alpha0+
Build ID: ce01727e4d6779ea128aa1be09f4af8cad4e1854
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: CL
Comment 7 Timur 2019-08-19 11:02:28 UTC
Reproduced in 6.4+
Comment 8 Stéphane Guillou (stragu) 2021-06-29 07:06:11 UTC
Reproduced in:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: f446a203fa2897bab8ae7686c948a8bf060675c6
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-24_15:16:38
Calc: threaded
Comment 9 Ken Parker 2023-10-20 22:46:40 UTC
We at OpenText have a product, WebReports, with thousands of users producing thousands of these documents per day. We would really appreciate anything that could be done to move this issue up the priority list.
Comment 10 Buovjaga 2023-10-21 04:01:04 UTC
(In reply to Ken Parker from comment #9)
> We at OpenText have a product, WebReports, with thousands of users producing
> thousands of these documents per day. We would really appreciate anything
> that could be done to move this issue up the priority list.

https://www.libreoffice.org/get-help/professional-support/
Comment 11 Justin L 2023-10-23 12:12:33 UTC
Following comment 4's instructions opens an empty page. I expect that the filter fails at everything EXCEPT accepting this as a legitimate document. So likely the styles, the settings, the header/footer, and the document itself are completely ignored. Thus it probably is not much different from opening Writer and assigning a file name, which isn't very helpful. To test this theory, I'd suggest adding some SAL_DEBUG statements in writerfilter...DomainMapper.cxx for some properties that exist in styles, like
    <w:spacing w:after="200" w:line="276" w:lineRule="auto"/>
and see if ANYTHING from the XML is loading.

Since the XML is likely completely ignored as unknown, the key to solving this should be to parse the added pkg:part XML elements and use them to direct the rest of the XML parsing into the correct "buckets" of document, styles, settings, header/footer etc.

That would involve diving into the nasty /writerfilter/source/ooxml/model.xml and related functions in OOXMLFastContext files.
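
As a rough illustration of the "buckets" idea, outside of writerfilter and in Python rather than C++, the sketch below walks the `pkg:part` elements of a Flat OPC file and groups their payloads by content type. It assumes each XML part carries `pkg:name` and `pkg:contentType` attributes and wraps its payload in `pkg:xmlData`. The function name `split_flat_opc` and the bucket labels are hypothetical; this is not how the LibreOffice importer is structured.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"

# Content-type suffix -> bucket label (labels are illustrative only).
BUCKETS = {
    "wordprocessingml.document.main+xml": "document",
    "wordprocessingml.styles+xml": "styles",
    "wordprocessingml.settings+xml": "settings",
    "wordprocessingml.header+xml": "header",
    "wordprocessingml.footer+xml": "footer",
}


def split_flat_opc(xml_text):
    """Group the XML parts of a Flat OPC package by bucket.

    Returns a dict mapping bucket label to a list of
    (part name, payload root element) tuples.
    """
    root = ET.fromstring(xml_text)
    buckets = {}
    for part in root.findall(f"{{{PKG_NS}}}part"):
        name = part.get(f"{{{PKG_NS}}}name")
        ctype = part.get(f"{{{PKG_NS}}}contentType", "")
        xml_data = part.find(f"{{{PKG_NS}}}xmlData")
        if xml_data is None or len(xml_data) == 0:
            continue  # binary parts (e.g. images) are skipped in this sketch
        bucket = next(
            (label for suffix, label in BUCKETS.items() if ctype.endswith(suffix)),
            "other",
        )
        # The single child of pkg:xmlData is the root of the embedded part.
        buckets.setdefault(bucket, []).append((name, xml_data[0]))
    return buckets
```

Each payload element could then be handed to the parser that already handles the corresponding stream of a zipped .docx, which is the redirection described above.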