Bug 39179 - FILEOPEN: Writer has dump and takes very long to open DOCX with 973 footnotes
Summary: FILEOPEN: Writer has dump and takes very long to open DOCX with 973 footnotes
Status: RESOLVED DUPLICATE of bug 76260
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: confirmed:4.2.0.4:OSX
Keywords: filter:docx, haveBacktrace, perf
Depends on:
Blocks: DOCX-Opening
  Show dependency treegraph
 
Reported: 2011-07-12 14:26 UTC by [REDACTED]
Modified: 2021-01-05 12:52 UTC (History)
16 users (show)

See Also:
Crash report or crash signature:


Attachments
shorter 50 page docx document (122.52 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2011-07-12 14:26 UTC, [REDACTED]
Details
119 page docx file (311.62 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2011-07-12 14:31 UTC, [REDACTED]
Details
1.5 mb docx file which opens quickly (1.46 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2011-07-12 14:35 UTC, [REDACTED]
Details
long 119 page docx file (311.62 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2011-07-13 02:01 UTC, [REDACTED]
Details
Large 1.5mb docx file (1.46 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2011-07-13 02:05 UTC, [REDACTED]
Details
some bts at random (7.39 KB, application/x-compressed)
2012-11-01 05:12 UTC, Julien Nabet
Details
kcachegrind screenshot (206.79 KB, image/png)
2016-05-16 15:47 UTC, Julien Nabet
Details
Callgrind output from master (fileopen 50 page docx) (6.82 MB, application/x-xz)
2018-09-22 17:00 UTC, Buovjaga
Details
Flamegraph (227.38 KB, application/x-bzip)
2019-07-02 21:29 UTC, Julien Nabet
Details

Note You need to log in before you can comment on or make changes to this bug.
Description [REDACTED] 2011-07-12 14:26:54 UTC
Created attachment 49025 [details]
shorter 50 page docx document

When opening a 50 page docx file it takes approximately 2 minutes to open and a 119 page docx file takes approximately 6 minutes to open. Also, the cpu is functioning at 100% for the full duration of the file opening process.

Steps to reproduce: Just open the attached 50 page file.

After the time lapse the file does open without any problems.

I first encountered the bug in LO 3.3.2 Build:202) on Pinguy OS. I have tested this bug in LO 3.3.3; 3.4.0 and 3.4.1 in Windows XP as well as in Ubuntu. Furthermore, I have also tested this bug on 2 other computers using LO 3.3.3 and 3.4.1.

Lastly, this bug does not seem to be dependent on the size of the document since a docx document with lots of pictures in it and with a size of 1.5 mb opened up very quickly. It therefore seems that this problem relates to the length of the document.
Comment 1 [REDACTED] 2011-07-12 14:31:22 UTC
Created attachment 49026 [details]
119 page docx file

This file should take around 6 minutes to open
Comment 2 [REDACTED] 2011-07-12 14:35:08 UTC
Created attachment 49027 [details]
1.5 mb docx file which opens quickly
Comment 3 Jeffrey 2011-07-12 19:17:23 UTC
Reproduced on LibreOffice 3.4 340m1(Build:103) for OpenSuse Linux.

Took me 2 minutes to open the first attached document. Other two documents are saved as text/plain files. I cannot seem to open them.
Comment 4 [REDACTED] 2011-07-13 02:01:30 UTC
Created attachment 49032 [details]
long 119 page docx file
Comment 5 [REDACTED] 2011-07-13 02:05:14 UTC
Created attachment 49033 [details]
Large 1.5mb docx file
Comment 6 [REDACTED] 2011-07-13 02:08:02 UTC
(In reply to comment #3)
> Reproduced on LibreOffice 3.4 340m1(Build:103) for OpenSuse Linux.
> 
> Took me 2 minutes to open the first attached document. Other two documents are
> saved as text/plain files. I cannot seem to open them.

Thank you Jeffrey. I re-attached the long and large files. Sorry, first ever bug I reported.
Comment 7 [REDACTED] 2011-09-21 01:40:17 UTC
Writer only takes so long to open a docx file, if the file has footnotes in it.
Comment 8 Roman Eisele 2011-09-21 05:33:11 UTC
Problem tested and confirmed with LibreOffice 3.4.3 (OOO340m1 (Build:302)) running on MacOS X 10.6.8, German, on a MacBook Pro with 2.53 GHz Intel Core i5, 4 GB RAM and Solid State Hard Drive. I mention the hardware data because the times LibreOffice needs to open your documents are a bit shorter for me, but they are still far too long. In detail:

1) The "large 1.5 mb docx file which opens quickly" (AJ Marais Research Proposal.docx) opens in less than 5 seconds for me -- fine! (But note that when closing the document again, just so, without any changes etc., LibreOffice crashes -- this is Bug 39510, I think.)

2) The "shorter 50 page docx document" (Chapter 3 draft 1.docx) needs about 60 seconds to open, while LibreOffice uses > 95% of the CPU all the time (measured with the top command in the shell). Far too long.

3) The "long 119 page docx file" (Volledige dokument.docx) needs about 3 minutes and 5 seconds to open, while LibreOffice uses > 97% of the CPU all the time (again measured with the top command in the shell). Far too long again.
Comment 9 [REDACTED] 2011-09-21 13:46:55 UTC
(In reply to comment #8)
> Problem tested and confirmed with LibreOffice 3.4.3 (OOO340m1 (Build:302))
> running on MacOS X 10.6.8, German, on a MacBook Pro with 2.53 GHz Intel Core
> i5, 4 GB RAM and Solid State Hard Drive. I mention the hardware data because
> the times LibreOffice needs to open your documents are a bit shorter for me,
> but they are still far too long. In detail:
> 
> 1) The "large 1.5 mb docx file which opens quickly" (AJ Marais Research
> Proposal.docx) opens in less than 5 seconds for me -- fine! (But note that when
> closing the document again, just so, without any changes etc., LibreOffice
> crashes -- this is Bug 39510, I think.)
> 
> 2) The "shorter 50 page docx document" (Chapter 3 draft 1.docx) needs about 60
> seconds to open, while LibreOffice uses > 95% of the CPU all the time (measured
> with the top command in the shell). Far too long.
> 
> 3) The "long 119 page docx file" (Volledige dokument.docx) needs about 3
> minutes and 5 seconds to open, while LibreOffice uses > 97% of the CPU all the
> time (again measured with the top command in the shell). Far too long again.

Thank you, I have commented at bug 39510 that these two bugs could be related to that one another.
Comment 10 Björn Michaelsen 2011-12-23 12:29:24 UTC Comment hidden (obsolete)
Comment 11 sasha.libreoffice 2012-01-06 04:24:02 UTC
reproduced on LibO 3.5.0 beta 1 on Fedora 64 bit and on LibO 3.4.3 on Windows XP 32 bit
Comment 12 Roman Eisele 2012-01-16 00:05:59 UTC
Acknowledgement: Problem is still present in LibreOffice 3.4.4 and 3.5beta2 running on MacOS X 10.6.8 German.
Comment 13 Julien Nabet 2012-10-30 20:37:53 UTC
On pc Debian x86-64 with master sources updated today, I reproduced the problem.

There are a lot of logs like these:
warn:writerfilter:22852:1:/home/julien/compile-libreoffice/libo/writerfilter/source/dmapper/DomainMapper.cxx:1451: DomainMapper::attribute() - Id: 92505 / 0x16959 value: 299 / 0x12b
warn:writerfilter:22852:1:/home/julien/compile-libreoffice/libo/writerfilter/source/dmapper/DomainMapper.cxx:1451: DomainMapper::attribute() - Id: 92505 / 0x16959 value: 300 / 0x12c
warn:writerfilter:22852:1:/home/julien/compile-libreoffice/libo/writerfilter/source/dmapper/DomainMapper.cxx:1451: DomainMapper::attribute() - Id: 92505 / 0x16959 value: 300 / 0x12c
etc.

Miklos: one for you?
Comment 14 Miklos Vajna 2012-10-31 06:34:34 UTC
Julien,

It makes little sense to reproduce a performance problem with a debug-enabled build, which has optimization disabled. :-)

Anyway, those warnings are just about some ignored properties, once those are fixed, the speed will be a bit more slow. But sure, I'll try to take a look as time permits.

Miklos
Comment 15 Julien Nabet 2012-10-31 06:53:44 UTC
(In reply to comment #14)
> It makes little sense to reproduce a performance problem with a
> debug-enabled build, which has optimization disabled. :-)

If the goal is to compare time, I agree with you of course :-)
Now it's quite a small doc and after a minute, it wasn't opened (I'm on i5 with 6Gb and had only Iceweasel and Icedove opened in addition with console and LO)
My goal was to retrieve maximum info.
I'll try to retrieve bts at random tonight after my day time job
Comment 16 Michael Meeks 2012-10-31 17:59:50 UTC
Nasty; is it a regression ? [ if so it'd be nice to have the keyword ;-] can we get a callgrind trace with debug symbols installed:

export OOO_EXIT_POST_STARTUP=1
export OOO_DISABLE_RECOVERY=1
valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 /path/to/file

and leave it overnight ;-)

gzip / up-load the callgrind.12345.txt file and (hopefully) the answer will just drop out of that.

Thanks !
Comment 17 Julien Nabet 2012-11-01 00:05:26 UTC
Michael: tried to follow your comment but after the 2 exports you quoted I runned this:
valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 ~/compile-libreoffice/bugs/39179/callgrind.txt

then got this:
10%
20%
25%
30%
35%
40%
50%
55%
60%
75%
80%
100%
end
+ a popup saying calgrind.txt doesn't exist
then:
==28753== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==28753== Collected : 4357516741 1343547732 904497078 53990330 10823051 1763992 305982 1830857 823690
==28753== 
==28753== I   refs:      4,357,516,741
==28753== I1  misses:       53,990,330
==28753== LLi misses:          305,982
==28753== I1  miss rate:          1.23%
==28753== LLi miss rate:           0.0%

and other numbers.

I made the test again this time, after having creating an empty "callgrind.txt" file but I got the same.

What did I miss?

(Valgrind Version: 1:3.7.0-6 on pc Debian testing x86-64 updated today)
Comment 18 Julien Nabet 2012-11-01 05:12:56 UTC
Created attachment 69376 [details]
some bts at random

I attached some bts taken at random
Comment 19 Michael Meeks 2012-11-01 10:03:27 UTC
Hi Julien,

valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 ~/compile-libreoffice/bugs/39179/callgrind.txt

So - the command-line you're running after ./soffice.bin is just the same as if you're launching the app itself; so you need to pass the path to the test file you want to load to provoke the slowness :-)

so /path/to/docx/file is what I should have written.

Any chance of another try ? :-)
Comment 20 Julien Nabet 2012-11-01 10:24:54 UTC
(In reply to comment #19)
> Hi Julien,
> 
> valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes
> ./soffice.bin --splash-pipe=0 ~/compile-libreoffice/bugs/39179/callgrind.txt
> 
> So - the command-line you're running after ./soffice.bin is just the same as
> if you're launching the app itself; so you need to pass the path to the test
> file you want to load to provoke the slowness :-)
> 
> so /path/to/docx/file is what I should have written.
> 
> Any chance of another try ? :-)
Sorry for this, I thought it was like memcheck process and it was the output file.
Will give a try as soon as building of the day be finished
Comment 21 Julien Nabet 2012-11-01 16:45:09 UTC
I put the file available here:
http://serval2412.free.fr/callgrind.out.fdo39179.bz2

(I tried with http://wiki.documentfoundation.org/QA/Bugzilla-Attachments but it doesn't seem to accept more than 5MB, so gave up)
Comment 22 Michael Meeks 2012-11-05 11:37:12 UTC
Thanks - firing it up in kcachegrind doesn't show any staggeringly obvious single bottleneck. I suspect that the 2+ million calls to:

writerfilter::ooxml::OOXMLFactory::createFastChildContextFromFactory

and friends are near the core of the issue - we're spending far too much time per attribute by a -biiig- margin as we parse (I think) - and (I guess) there are a ton of attributes here.

OTOH - that will require a fair bit of work digging out ways to improve there I suspect, and there is the XSLT monster to kill there too :-)
Comment 23 tommy27 2013-08-31 19:27:45 UTC
still slow opening of 50 page test file.
almost 2 minute with LibO 4.1.1.2 under Win7 64bit
Comment 24 Julien Nabet 2013-09-01 20:47:57 UTC
I updated my local master repo and had the same.
I noticed lots of logs like this:
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 394 / 0x18a
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 395 / 0x18b
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 396 / 0x18c
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 397 / 0x18d
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 398 / 0x18e
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 399 / 0x18f
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 400 / 0x190
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 401 / 0x191
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 402 / 0x192
warn:writerfilter:18467:1:writerfilter/source/dmapper/DomainMapper.cxx:1479: DomainMapper::attribute() - Id: 92521 / 0x16969 value: 402 / 0x192
Comment 25 Miklos Vajna 2013-09-02 07:14:21 UTC
That is about unhandled tokens, not related to performance. If we want to improve performance, we should first profile.
Comment 26 the_letter_j 2013-10-07 01:24:28 UTC
Regression sometime before the most recent Ubuntu LTS in 2012 -- the slow documents are still slow as ever, but the 'fast' document is *also* now very slow to open.  Sigh.  LibreOffice 3.5.7 on Ubuntu 12.04.3 LTS

 25sec	Large 1.5mb docx     [was   5sec for previous commenter but crashed]  
 50sec	shorter 50 page docx [was  60sec for previous commenter] 
118sec	long 119 page docx   [was 185sec for previous commenter] 

Anyone like to test with ApacheOpenOffice 4.x please?  
And also on msftOffice, please specify Word version and Windows version? 

See also this bug, which is a much slower docx (~400 seconds) and has plenty of footnotes to go around.  https://bugs.freedesktop.org/show_bug.cgi?id=62656
Comment 27 tommy27 2013-10-07 05:40:30 UTC
just tested the shorter 50 page .docx on Win7 64bit.

same bad performance (almost 2 minutes to load) in LibO 3.3.3, LibO 4.1.2 and in OOo 3.3.0 and AOO 4.0.0.

so it seems the bug is inherited from OOo. changing
Comment 28 the_letter_j 2013-10-07 06:47:59 UTC
> just tested the shorter 50 page .docx on Win7 64bit ... 
> so it seems the bug is inherited from OOo. changing

@Tommy, can you please test the 1.5mb "fast" document?  Allegedly it worked fine in 2011, but on my Lio357 from 2012 is is now five times slower than before.  Might be a temporarily regression in the 3.5.x series, that is now fine in the 4.x series?
Comment 29 bugquestcontri 2014-02-06 04:28:53 UTC
I opened in LibO 4.1.4.2 on XP/SP3:
- 50 page file (123kB): in 46.9 sec; bug report: 2 min
- 119 page file (312kB): in 147 sec, bug report: 6 min
- 17 page file with 1.5 MB: in 4 sec; the bug reports gives no indication on opening time

See also the AskLibO Q&A: http://ask.libreoffice.org/en/question/29282/openbenchmarking-results-of-libreoffice/?answer=29290#post-id-29290
Comment 30 Michael Meeks 2014-02-06 07:50:13 UTC
Julien your callgrind trace is 404 now (urgh) =) in general running this stuff in callgrind (with symbols) and then looking at the results carefully, adding printf's to the slow pieces and then looking at what is churned on the console is the quickest and best way to identify what silly stuff is going on =)

That's really something that users can help with ... annoying that we can't attach those traces to bugzilla though ...
Comment 31 retired 2014-02-06 11:01:03 UTC
Confirmed:4.2.0.4:OSX

CPU spikes and 1.42min to open 50 page docx. Something is very off here.
Comment 32 Michael Stahl (allotropia) 2014-02-07 21:30:30 UTC
a quick check on loading the attachment implicated in comment #28 gives:

1167072 112520 pts/15 Sl+  22:20   0:01 OOo_3.2.1/opt/openoffice.org3/program/soffice.bin
1202460 120096 pts/6  Sl+  22:20   0:01 OOo_3.3/opt/openoffice.org3/program/soffice.bin
1209924 122832 pts/14 Sl+  22:20   0:01 OOo_3.4_beta/opt/ooo-dev3/program/soffice.bin
953528 131192 pts/9   Sl+  22:20   0:01 LO_3.4.6/opt/libreoffice3.4/program/soffice.bin
1075360 121616 pts/10 Sl+  22:21   0:01 LO_3.5.7/opt/libreoffice3.5/program/soffice.bin
996916 124556 pts/11  Sl+  22:21   0:01 LO_3.6.7/opt/libreoffice3.6/program/soffice.bin
1594144 143480 pts/12 Sl+  22:21   0:02 LO_4.0.6/opt/libreoffice4.0/program/soffice.bin
1581200 135336 pts/13 Sl+  22:22   0:02 LO_4.1.4/opt/libreoffice4.1/program/soffice.bin --splash-pipe=5

a look at the CPU time column shows only 0:01/0:02 values across all releases
=> removing "regression"

if you do find a document for which loading performance has regressed,
_please_ open a separate bug, do not add "regression" here!
Comment 33 Michael Stahl (allotropia) 2014-02-07 21:31:47 UTC
Comment on attachment 49033 [details]
Large 1.5mb docx file

since this document opens quickly it is sort of off-topic for this bug
Comment 34 tommy27 2014-12-13 09:34:03 UTC
(In reply to Astrix from comment #0)
> Created attachment 49025 [details]
> shorter 50 page docx document

still takes 1 minute and 40 seconds to load this file with LibO 4.3.4.1 and recent 4.5.0.0 alpha

tested under Win8.1 x64
AMD A8-6410 APU 2GHz 8GB RAM SSD
Comment 35 Michael Meeks 2014-12-13 14:27:21 UTC
Might be worth building a new profile of a build with debug symbols on vs. master - a ton of stuff changed in the XFastParser recently - and that has a very helpful effect wrt. splitting the core implementation slowness in writer from the XML parsing - and making it more visible. Any takers ?
Comment 36 tommy27 2015-06-21 19:57:02 UTC
(In reply to tommy27 from comment #34)
> (In reply to Astrix from comment #0)
> > Created attachment 49025 [details]
> > shorter 50 page docx document
> 
> still takes 1 minute and 40 seconds to load this file with LibO 4.3.4.1 and
> recent 4.5.0.0 alpha
> 
> tested under Win8.1 x64
> AMD A8-6410 APU 2GHz 8GB RAM SSD

retested in the same computer and I see an improvement.
both LibO 4.4.3.2 and recent 5.1.0.0 alpha take 1 minute and 21 seconds
Comment 37 Jean-Baptiste Faure 2016-05-01 13:38:30 UTC
Please test again with current version.

For me with LO 5.1.4.0.0+ built at home under Ubuntu 16.04 x86-64:
short 50 pages docx: 18 seconds
long 119 pages docx: 55 seconds


Best regards. JBF
Comment 38 Julien Nabet 2016-05-16 14:49:32 UTC
I put a new callgrind trace here:
http://serval2412.free.fr/tdf_39179_callgrind.out.7206.bz2
I retrieved it with master sources updated yesterday (enable-dbgutil)
Comment 39 Michael Meeks 2016-05-16 15:16:49 UTC
Hi Julien.

Thanks so much for running that; but I guess:

> I retrieved it with master sources updated yesterday (enable-dbgutil)

dbgutil adds -so- much more debugging code and changes all the code-paths - such that the profile is no longer useful (normally) - although with some quick kcachegrinding perhaps we'll find that it doesn't distort it too much; I didn't look. Ideally an --enable-symbols build is the one to callgrind.

Thanks !
Comment 40 Julien Nabet 2016-05-16 15:47:37 UTC
Created attachment 125095 [details]
kcachegrind screenshot

I use dbg-util to try to retrieve the maximum of information (in bt and/or in console).

I'm not accustomed to interpreting kcachegrind views but it seems these might be interesting:
1) http://opengrok.libreoffice.org/xref/core/sax/source/tools/fastattribs.cxx#121
    121 sal_Bool FastAttributeList::hasAttribute( ::sal_Int32 Token ) throw (RuntimeException, std::exception)
    122 {
    123     for (sal_Int32 i : maAttributeTokens)
    124         if (i == Token)
    125             return true;
    126 
    127     return false;
    128 }
I wonder if the loop could be replaced by something else.

2) http://opengrok.libreoffice.org/xref/core/sax/source/fastparser/fastparser.cxx#639
FastSaxParserImpl::GetTokenWithPrefix
Comment 41 Michael Meeks 2016-05-16 16:07:09 UTC
Interesting: so this:

    123     for (sal_Int32 i : maAttributeTokens)
    124         if (i == Token)
    125             return true;

is I'm afraid reasonably optimal - you can use a hash which is supposedly constant time, but cache effects will probably hurt you.

In general - I'm surprised to see that at the top of the profile; are there far too many attributes being used in the XML ? is it possible that the XML is parsed again and again in error (?) - or some meta-data ?

The call-counts can be helpful for that as you look up the stack in kcachegrind - and those are ~independent of dbgutil.
Comment 42 Julien Nabet 2016-05-16 16:55:39 UTC
Trying to find the different paths, I noticed this part

   5240 void DomainMapper_Impl::substream(Id rName,
   5241         ::writerfilter::Reference<Stream>::Pointer_t const& ref)
   5242 {
   5243 #ifndef NDEBUG
   5244     size_t contextSize(m_aContextStack.size());
   5245     size_t propSize[NUMBER_OF_CONTEXTS];
   5246     for (int i = 0; i < NUMBER_OF_CONTEXTS; ++i) {
   5247         propSize[i] = m_aPropertyStacks[i].size();
   5248     }
(see http://opengrok.libreoffice.org/xref/core/writerfilter/source/dmapper/DomainMapper_Impl.cxx#5240)

for use here:
   5324     // check that stacks are the same as before substream
   5325     assert(m_aContextStack.size() == contextSize);
   5326     for (int i = 0; i < NUMBER_OF_CONTEXTS; ++i) {
   5327         assert(m_aPropertyStacks[i].size() == propSize[i]);
   5328     }

But if it allows to check that everything is ok in debug mode, I suppose this part must be ok.
Comment 43 Julien Nabet 2016-05-16 17:47:30 UTC
I tested with LO Debian package 5.1.3.2, I confirm what Jean-Baptiste indicated, it takes less than 20s to load the 50 pages version.

Also, when I kept on going up the calling tree, I found this call:
void LoggedStream::substream(Id name, writerfilter::Reference<Stream>::Pointer_t ref) from writerfilter/source/dmapper/LoggedResources.cxx
    276 void LoggedStream::substream(Id name, writerfilter::Reference<Stream>::Pointer_t ref)
    277 {
    278 #ifdef DEBUG_WRITERFILTER
    279     mHelper.startElement("substream");
    280     LoggedResourcesHelper::attribute("name", (*QNameToString::Instance())(name));
    281 #endif
    282 
    283     lcl_substream(name, ref);
    284 
    285 #ifdef DEBUG_WRITERFILTER
    286     LoggedResourcesHelper::endElement("substream");
    287 #endif
    288 }

So it seems that debug build takes lots of time because of these DEBUG_WRITERFILTER (but it's normal since it's for debugging :-)).
But these impact only debug build so for non debug release, it seems indeed ok now.

May we put this one to WFM then?
Comment 44 Michael Meeks 2016-05-17 08:54:40 UTC
Sure - performance measurements on dbgutil builds are -totally- unhelpful =) only a product build should ever be used to measure or assess performance; so I'm happy to close. But was the original reporter using a product or a dbgutil build ? =)
Comment 45 Telesto 2017-06-05 20:09:23 UTC
Still a repro. However not at the same extend as the initial report (+/-60 seconds)
Version: 5.5.0.0.alpha0+
Build ID: ec79f3453471ee9b6ae32e71ff16ea99d9b7751c
CPU threads: 4; OS: Windows 6.19; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-05-28_23:21:44
Locale: nl-NL (nl_NL); Calc: CL
Comment 46 Xisco Faulí 2017-10-04 11:30:52 UTC
(In reply to Telesto from comment #45)
> Still a repro. However not at the same extend as the initial report (+/-60
> seconds)
> Version: 5.5.0.0.alpha0+
> Build ID: ec79f3453471ee9b6ae32e71ff16ea99d9b7751c
> CPU threads: 4; OS: Windows 6.19; UI render: default; 
> TinderBox: Win-x86@42, Branch:master, Time: 2017-05-28_23:21:44
> Locale: nl-NL (nl_NL); Calc: CL

20 seconds here

Version: 6.0.0.0.alpha0+
Build ID: 6e3e4cd38b56d432c48cd7217885974e3f0519fd
CPU threads: 4; OS: Linux 4.10; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

Could you please try again with the same computer ?
Comment 47 Roman Kuznetsov 2018-07-29 11:14:29 UTC
(In reply to Xisco Faulí from comment #46)
 
> 20 seconds here
> 
> Version: 6.0.0.0.alpha0+
> Build ID: 6e3e4cd38b56d432c48cd7217885974e3f0519fd
> CPU threads: 4; OS: Linux 4.10; UI render: default; VCL: gtk3; 
> Locale: ca-ES (ca_ES.UTF-8); Calc: group
> 
> Could you please try again with the same computer ?

22 sec in LO 6.1.0.1 on Linux (Xubuntu 16.10) and I have 49 pages in the opened document

@Xisco: WFM?
Comment 48 Buovjaga 2018-09-22 17:00:26 UTC
Created attachment 145116 [details]
Callgrind output from master (fileopen 50 page docx)

Under normal conditions:
50 page docx takes 13 secs to open
119 page docx takes 32 secs to open

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 0ffa7a733d834647dfd59b864c52a015028822b6
CPU threads: 8; OS: Linux 4.18; UI render: default; VCL: gtk3_kde5; 
Locale: fi-FI (fi_FI.UTF-8); Calc: threaded
Built on September 21st 2018
Comment 49 Xisco Faulí 2019-04-02 14:04:08 UTC
50 pages document takes

real	0m23,714s
user	0m23,225s
sys	0m0,359s


in

Version: 6.3.0.0.alpha0+
Build ID: 3b518953a8141b0d5043c2f3996a92956fdc3a47
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

while the 119 pages document takes

real	1m7,827s
user	1m7,196s
sys	0m0,529s

could be improved but it's not that bad
Comment 50 Julien Nabet 2019-07-02 21:29:20 UTC
Created attachment 152517 [details]
Flamegraph

On pc Debian x86-64 with master sources updated + enable-symbols (not enable-dbgutil), I retrieved a Flamegraph during opening of 50 pages doc.
Comment 51 Justin L 2019-11-30 07:02:55 UTC
The entire footnote.xml file is opened and read for every single footnote reference. A huge speed-up could be attained by parsing once and storing the result in an array (at the cost of theoretically infinite memory). But I'm an inadequate programmer to handle that task...

Some useful commentary from commit 330b860205c7ba69dd6603f65324d0f89ad9cd5f
Author: Miklos Vajna
Date:   Wed Sep 4 16:08:49 2013 +0200
    tdf#68787 DOCX import: handle when w:separator is missing for footnotes
...
    2) The footnote stream isn't read by the importer, only when the main
    stream references the footnote one, the relevant part of it is parsed.
    At the moment we always parse the first (special, "separator") entry in
    the footnote stream, that may be optimized later if it becomes a
    bottleneck.

That optimization of only parsing the separator once instead of 973 times yields only a minor improvement (<10%). gerrit.libreoffice.org/84104
Comment 52 Justin L 2019-11-30 07:11:00 UTC
bug 76260 has much better commentary and is definitely talking about the same problem. Marking this one as the duplicate.

*** This bug has been marked as a duplicate of bug 76260 ***