Bug 158973 - (Writer) xml file is displayed in xml code rather than style
Summary: (Writer) xml file is displayed in xml code rather than style
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.6.4.1 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-02 13:25 UTC by BDF
Modified: 2024-01-11 09:04 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
xml file MS office (11.46 KB, application/pdf)
2024-01-02 13:26 UTC, BDF
xml file LibreOffice (15.54 KB, application/pdf)
2024-01-02 13:27 UTC, BDF
test file.xml (45.88 KB, application/xml)
2024-01-05 08:37 UTC, BDF
writer screenshot xml file.jpg (471.34 KB, image/jpeg)
2024-01-05 14:21 UTC, BDF
Word XML Document: test file. (49.55 KB, text/xml)
2024-01-08 10:48 UTC, Miklos Vajna

Description BDF 2024-01-02 13:25:23 UTC
Description:
An xml file that was created with Microsoft Office is not displayed as formatted text, but just as a text file with xml code in it.

Steps to Reproduce:
1. Open the xml file in Writer

Actual Results:
The file is displayed as a text file with xml code in it

Expected Results:
Show the text as it was formatted in MS Office


Reproducible: Always


User Profile Reset: No

Additional Info:
I know that MS Office does a lot of garbage with their files, so I wouldn't expect LibreOffice to do the same. Yet, it would be nice if it recognized the garbage and offered to display the file as it would have looked in MS Office.

I included two files:
1) The first one is the xml file, with altered text, printed to pdf on Windows 10.
2) The second file is the first page of the xml file as it looks in LibreOffice (I only printed the first page; there would be 102 pages in total).

I can also share the original file, but since it contains personal and/or confidential information (which I can not remove since it's EVERYWHERE in the xml code) I will not upload it here.
Comment 1 BDF 2024-01-02 13:26:03 UTC
Created attachment 191696
xml file MS office

The xml file with replaced text. The file was created with print to pdf in MS office.
Comment 2 BDF 2024-01-02 13:27:45 UTC
Created attachment 191697
xml file LibreOffice

The first page of the same xml file opened with Writer. This shows only the first page; there would be 102 pages in total.
The file was created using export to pdf in Writer.
Comment 3 V Stuart Foote 2024-01-02 15:35:41 UTC
(In reply to BDF from comment #0)
> ...
> I included two files:
> 1) The first one is the xml file with altered text with print to pdf on Windows 10.
> ...


Cannot confirm. The PDF exported from MS Word (attachment 191696) opens correctly in the LibreOffice Draw module.

If you want to open the OOXML .docx Word document into LibreOffice, just do so.

Otherwise, PDF is not an editable format, and opening a PDF opens it in Draw by default, though there are other options for handling the PDF (inserting it as an image, or using an alternate filter to open it in Writer or Impress).

=-testing-=
Version: 7.6.4.1 (X86_64) / LibreOffice Community
Build ID: e19e193f88cd6c0525a17fb7a176ed8e6a3e2aa1
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Comment 4 BDF 2024-01-03 12:13:25 UTC
@V Stuart Foote: The bug is *NOT* about the included pdf file.

It is about the "xml file" that "is displayed in xml code rather than style".
The xml file (not the included pdf file) "was created with Microsoft Office". When this xml file (not the included pdf file) is opened with LibreOffice it is "displayed as text file with xml codes in it" rather than showing "the text as it was formatted in MS office".

The first included pdf file was created "with print to pdf on Windows 10". The included pdf file is the print result of the xml file, not the actual problematic file.
The second included pdf file was created with 'Export to pdf' in LibreOffice and shows (only) the first page of the xml file opened with LibreOffice.

As said "I can also share the original [xml] file, but [...] it contains personal and/or confidential information" that "I can not fully remove" so "I will not upload it here"

Again: The uploaded files are *NOT* the xml file. It never was about the pdf files being buggy. It was always about the original xml file.

If there is a developer willing to look at the file, I can send the original xml file via mail. I can not upload the file to a part of the web where it can be accessed by everybody.
Comment 5 V Stuart Foote 2024-01-03 12:44:30 UTC
Without the OOXML generated by MS Word 2006 (sanitized as needed) we can't see an issue.  But be aware that XML is parsed based on the DTD specified in its header. If that is corrupt, or incomplete, LibreOffice will not be able to identify the correct import filter to use and the XML can be treated simply as TEXT.

Please provide a redacted test file.
Comment 6 V Stuart Foote 2024-01-03 13:07:19 UTC
And actually, since it is Word 2006, that XML would not be OOXML, so neither the default 'Word 2010-365 Document (*.docx, *.docm)' nor the 'Word 2007 (.docx)' import filter would apply.

Please check if explicitly using the 'Word 2003 XML (*.xml, *.doc)' import filter correctly parses the file on opening.
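
For reference, forcing an import filter can also be tried from the command line instead of the Open dialog. The following is only a rough sketch: it assumes that the LibreOffice launcher is reachable as `soffice` and that the internal name of the 'Word 2003 XML' filter is "MS Word 2003 XML". Both are assumptions that should be checked against the local installation, and the helper name `build_command` is hypothetical.

```python
import shlex

# Assumption: internal filter name behind the 'Word 2003 XML' UI entry.
WORD_2003_XML_FILTER = "MS Word 2003 XML"


def build_command(path, soffice="soffice"):
    """Return an argument list that opens `path` with the input filter forced."""
    # --infilter asks LibreOffice to use the named import filter if possible.
    return [soffice, f"--infilter={WORD_2003_XML_FILTER}", path]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be copied into a terminal.
    print(shlex.join(build_command("test file.xml")))
```

The sketch only builds and prints the command; it does not start LibreOffice itself.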
Comment 7 BDF 2024-01-05 08:37:45 UTC
Created attachment 191773
test file.xml

The original xml file cannot be fully sanitized manually, since it is over 12'000 lines (or 104 pages in Writer) of xml code that I cannot check individually, and I do not know what is hidden in parts of the code that I am not even aware of. Therefore it cannot be uploaded to a website that can be accessed by the public and where it is not possible to delete files.

As said, I can send the file, e.g. via mail, to a single developer who would deal with it.


However, I tried to replicate this issue on a different Windows machine. For this I created a new file in MS Word 2016 containing only a single empty page and saved it as "Word XML-Document" (not "Word 2003 XML-Document"). LO is also installed on this Windows machine, and there the not-2003-xml file looks like the problematic xml file, as described (xml code shown like a text file instead of formatted text). The same file saved as 2003-xml looks in LO as it does in MS Office. My guess is that the original problematic file was saved as a not-2003-xml file. I talked to a person from that part of the company, and he confirmed that they save it as "Word XML-Document".

The not-2003-xml test file is the one attached to this comment.
Note that this is NOT the original xml file, and the original xml file might have a different problem of its own. But since I cannot share the original xml file (as explained), the file with an empty page shows the same behaviour, and a person who also creates such files confirmed the file type, I think it can be a good first step.
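
The two save formats can usually be told apart by the root element of the file. The following is a minimal sketch under two assumptions that are not confirmed in this report: that a "Word XML-Document" is a flat OPC package with a `package` root in the `http://schemas.microsoft.com/office/2006/xmlPackage` namespace, and that a "Word 2003 XML-Document" has a `wordDocument` root in the `http://schemas.microsoft.com/office/word/2003/wordml` namespace. The function name `word_xml_flavor` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Assumed root elements of the two save formats, in Clark notation.
WORD_2003_ROOT = "{http://schemas.microsoft.com/office/word/2003/wordml}wordDocument"
FLAT_OPC_ROOT = "{http://schemas.microsoft.com/office/2006/xmlPackage}package"


def word_xml_flavor(stream):
    """Classify a binary file object by the root element of its XML."""
    # The first "start" event belongs to the root element, so the rest
    # of the document never has to be examined.
    for _event, element in ET.iterparse(stream, events=("start",)):
        if element.tag == WORD_2003_ROOT:
            return "Word 2003 XML"
        if element.tag == FLAT_OPC_ROOT:
            return "Word XML Document (flat OPC)"
        return "unknown"
    return "unknown"


if __name__ == "__main__":
    sample = b'<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"/>'
    print(word_xml_flavor(io.BytesIO(sample)))
```

For a file on disk, the same function can be called as `word_xml_flavor(open(path, "rb"))`, ideally inside a `with` block.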
Comment 8 V Stuart Foote 2024-01-05 13:25:00 UTC
Looking at attachment 191773, the "not-2003-xml" file you've indicated was saved from Word 2016: there is no w:t content in the w:body, and only the XML docProps/app.xml and docProps/core.xml elements have any values.

MS Edge, MS Wordpad, and LO Writer all render the XML tree fully, showing that there is no content.
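
A check for w:t content can also be scripted. The sketch below is only an illustration: it assumes the `w` prefix is bound to the OOXML WordprocessingML namespace `http://schemas.openxmlformats.org/wordprocessingml/2006/main` (a Word 2003 XML file uses a different namespace and would need a different value), and both the function name `body_text_runs` and the embedded sample are made up for the example, not taken from the attachment.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: namespace bound to the "w" prefix in a "Word XML Document".
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hypothetical, heavily reduced sample of a flat OPC package.
SAMPLE = b"""<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/word/document.xml">
    <pkg:xmlData>
      <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
        <w:body><w:p><w:r><w:t>test</w:t></w:r></w:p></w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>"""


def body_text_runs(stream):
    """Return the text of every w:t element found inside a w:body."""
    root = ET.parse(stream).getroot()
    return [t.text or "" for t in root.findall(".//w:body//w:t", NS)]


if __name__ == "__main__":
    print(body_text_runs(io.BytesIO(SAMPLE)))  # prints ['test']
```

An empty list means that the body holds no text runs, which is what a single blank page would be expected to produce.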

And, as I'd expect, it opens blank in LibreOffice with the 'Word 2003 XML' import filter. But I'd guess we may not have an XSLT filter in place to support Word 2007 XML? @Miklos, Mike, am I missing something on our filter handling here?

Perhaps your company needs a support contract from one of the community members?
Comment 9 BDF 2024-01-05 14:21:26 UTC
Created attachment 191776
writer screenshot xml file.jpg

As said, 'test file.xml' does not contain any text. It's just a single blank page. I guess that is what you meant by no "w:t content".

The main problem is that the xml file is displayed as shown in the attached screenshot (which is what is displayed in "xml file LibreOffice" (15.54 KB, application/pdf))
Comment 10 Miklos Vajna 2024-01-08 10:48:50 UTC
Created attachment 191809
Word XML Document: test file.

Interestingly, if I open the sample XML in Word, I get an empty document, while the XML definitely has some content.

I attach a perhaps better example: it's just "test" in Word, but it shows the XML markup in Writer as plain text. This is a "Word XML Document" in Word's UI.

Is this what you want to open in Writer?
Comment 11 BDF 2024-01-11 09:04:01 UTC
(In reply to Miklos Vajna from comment #10)
> 
> Is this what you want to open in Writer?

Yes.
The file should look in Writer as it looks in Word.

The strange thing is that my (Linux) system does not even offer to open xml files with Writer in the right-click menu.

Same on Android, where LibreOffice does not open the file.