Bug 54638 - Too slow to import file with lots of cells with multi-line contents.
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 3.6.1.2 release (earliest affected)
Hardware: All All
Importance: medium major
Assignee: Kohei Yoshida
URL:
Whiteboard: target:4.1.0
Keywords: perf
Depends on:
Blocks:
 
Reported: 2012-09-07 13:11 UTC by 8472
Modified: 2015-12-15 11:35 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Saving of these data took 37 minutes, but is only 330K large! (329.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2012-09-07 14:28 UTC, 8472

Description 8472 2012-09-07 13:11:54 UTC
Hi,

I updated to the recent LO 3.6.1 on my distro, Arch Linux, and have noticed that Calc has been quite slow ever since when working with bigger data files.
For example, I have an ODS spreadsheet of about 5MB in which I store various data.
Before, on LO 3.5.*, it worked fast.
Now it takes minutes to load the file or to save it after a change, and I'm updating the file quite often during the day.
When I was searching for a root cause, I noticed that only one CPU core is in use by the process when loading/saving.
I believe this might be it, as I have a laptop with a 4-core CPU (i5), and htop/top showed only one core working at 100% during load/save.
I've also tested it on a separate Arch virtual PC with only two CPU cores, and htop/top again showed only one CPU core in use at 100%.

As can be seen here: https://bbs.archlinux.org/viewtopic.php?id=148492 , I'm not the only person affected.

I've also found this problem reported for the Ubuntu distro: https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/1034999

Furthermore, I'm not sure whether this problem is present on Linux systems only, as I haven't yet tested it on other systems like Windows, but it is possible that it affects other OSes as well.
Comment 1 8472 2012-09-07 14:28:33 UTC
Created attachment 66797 [details]
Saving of these data took 37 minutes, but is only 330K large!
Comment 2 8472 2012-09-07 16:35:32 UTC
We (the people on the Arch forum thread I linked before) did some further testing. Put simply:
The more data per cell (e.g. multiple lines per cell) and the more such cells in the sheet (e.g. 10-20 columns and several thousand rows, 20x4000), the longer load/save operations take.
But if there is just one line per cell in such a 20x4000 sheet, the speed isn't affected; it's saved instantly.
It got slower with each new LO version, until it reached this sluggish speed.
Can this be fixed please?
Thx
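
A sheet of the shape described in this comment (about 20 columns by 4000 rows, several lines per cell) could be generated for testing with a short script. The following is only a sketch: the function name, the cell text and the default sizes are made up for illustration. CSV is used only because the Python standard library can write it, so the result would still have to be opened in Calc and saved as .ods by hand before timing ODS load/save. Whether it reproduces the slowdown has not been checked.

```python
import csv
import io


def build_multiline_sheet(rows=4000, cols=20, lines_per_cell=5):
    """Return CSV text in which every cell holds `lines_per_cell` lines.

    Hypothetical helper for illustration; the sizes mirror the
    20x4000 sheet mentioned in the comment above.
    """
    buffer = io.StringIO()
    # QUOTE_ALL keeps the embedded newlines inside a single cell.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for r in range(rows):
        row = [
            "\n".join(f"r{r} c{c} line {n}" for n in range(lines_per_cell))
            for c in range(cols)
        ]
        writer.writerow(row)
    return buffer.getvalue()
```

Writing the returned text to a file (for example `multiline_cells.csv`, opened with `newline=""`) gives a document that can be loaded in Calc. Setting `lines_per_cell=1` produces the single-line variant for comparison.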
Comment 3 Markus Mohrhard 2012-09-09 21:10:29 UTC
We have no multithreaded import/export so one core is normal.
Comment 4 8472 2012-09-10 15:58:34 UTC
(In reply to comment #3)
> We have no multithreaded import/export so one core is normal.

It might be normal for single-line cell content.
But in cases like the one I described, with multi-line cell content, the performance went down horribly.
Try it yourself and you'll see.
I don't know what precisely changed between the versions, but the processing performance decreased after each major release: 3.4, 3.5, 3.6.
And since this feature (multi-line cell content) is available, it would be great if it got a fix.

thx in advance
Comment 5 billhook 2012-09-12 06:43:07 UTC
Confirmed this performance problem on LO 3.6.1.2 on Windows Vista 32bit.

Opening the attached file took 5+ minutes before I killed LO.
Comment 6 Jamie Deith 2012-10-23 22:51:19 UTC
Possibly related - A workbook of mine which recalcs in <1s using 3.5.7 and previous versions now takes ~5min.  Sorry I cannot share it as there is too much proprietary data in there. Loading is not noticeably slower. As with these reports, 1 CPU is 100% utilized by scalc the entire time. I see the same results in both 3.6.0.2 and 3.6.3.1.  I am using Ubuntu Linux 12.04.  Thanks for your work on this great package.
Comment 7 Kohei Yoshida 2013-01-05 05:01:49 UTC
Did some investigation.  Basically I haven't really figured out why importing of multiline contents (which are the equivalent of rich-text contents) became so slow all of a sudden, but what's clear is that, to solve this, we need to re-write the code that parses and creates rich-text cells (aka edit cells).  The current code uses a way over-engineered UNO API for rich-text cells, and it's not humanly possible to grok that enough to be able to speed things up.  It's just not doable.

The bad news is that this won't make it into 4.0.  I'll see if I can tackle this during the 4.1 development cycle.
Comment 8 Kohei Yoshida 2013-01-05 05:11:14 UTC
Note for self for future work:

Start in ScXMLTableRowCellContext::CreateChildContext and write a new handler to parse the <text:p> elements.  Use XclImpStringHelper::CreateCell() as a reference, and import rich-text contents directly into ScEditCell.
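
For readers unfamiliar with the file format, the data shape this note refers to can be sketched as follows: in ODS content, each line of a multi-line cell is stored as its own `<text:p>` child of the cell element. The Python snippet below only illustrates that structure. It is not the LibreOffice importer (which is C++), the sample cell and the `cell_paragraphs` helper are invented for the example, and it ignores elements such as `<text:s/>` and `<text:line-break/>` that a real handler would have to deal with.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Hypothetical sample: one cell holding two paragraphs (two lines).
SAMPLE_CELL = f"""
<table:table-cell xmlns:table="{TABLE_NS}" xmlns:text="{TEXT_NS}">
  <text:p>first line</text:p>
  <text:p>second <text:span>styled</text:span> line</text:p>
</table:table-cell>
"""


def cell_paragraphs(cell):
    """Return the plain text of each direct <text:p> child of a cell."""
    return ["".join(p.itertext()) for p in cell.findall(f"{{{TEXT_NS}}}p")]


cell = ET.fromstring(SAMPLE_CELL)
paragraphs = cell_paragraphs(cell)
# More than one paragraph means the cell is multi-line (an edit cell).
is_multiline = len(paragraphs) > 1
```

Here `paragraphs` holds one string per line of the cell, with the text of nested spans flattened into it.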
Comment 9 Michael Meeks 2013-01-10 14:34:04 UTC
Jamie - your bug seems un-related to import - can you file a new bug for that - and (if at all possible) generate a callgrind trace with debuginfo installed for it :-)
Comment 10 Kohei Yoshida 2013-02-06 14:57:00 UTC
I'm getting ready to tackle this.
Comment 11 Kohei Yoshida 2013-02-08 04:36:36 UTC
The work is on-going on the feature/ods-edit-cell-import branch.
Comment 12 Kohei Yoshida 2013-02-12 03:46:29 UTC
Just merged the feature/ods-edit-cell-import branch onto master. Here are my numbers for opening the attached document.

3.5: 75.1015 sec
3.6: (too long to measure)
4.0: (too long to measure)
4.1 (master): 18.2441 sec

The numbers suggest that the perf regression is now gone, to say the least.  Not only that, the number is about 4 times better than 3.5.

I'll call this fixed.  The fix will be available in 4.1.  As I said in Comment 7, we won't be able to backport this into 4.0 because the change is very invasive.
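
As a quick check of the two measurable timings reported in this comment, the ratio between the 3.5 and 4.1 (master) load times can be computed directly. The variable names are illustrative only, and the 3.6 and 4.0 entries are left out because no number was recorded for them.

```python
# Load times in seconds, as reported in this comment.
timings_sec = {"3.5": 75.1015, "4.1 (master)": 18.2441}

# How many times faster 4.1 (master) opens the attached document.
speedup = timings_sec["3.5"] / timings_sec["4.1 (master)"]
print(f"4.1 (master) is about {speedup:.1f}x faster than 3.5")
```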
Comment 13 Daniel Szabo 2013-02-12 11:41:10 UTC
That's great news. However, can you confirm if the same regression when saving an ods file is also fixed by this?

Opening/Saving an ods we work a lot on takes following times:

LO 3.4: Open 15.5s, Save 7.9s
LO 3.6: Open 37.3s, Save 43.2s
Comment 14 Kohei Yoshida 2013-02-12 14:08:38 UTC
(In reply to comment #13)
> That's great news. However, can you confirm if the same regression when
> saving an ods file is also fixed by this?
> 
> Opening/Saving an ods we work a lot on takes following times:
> 
> LO 3.4: Open 15.5s, Save 7.9s
> LO 3.6: Open 37.3s, Save 43.2s

Could you open a separate bug for it?  It's best to separate the import and export parts, since they are two separate code sets.  Thanks.
Comment 15 Daniel Szabo 2013-02-12 17:33:59 UTC
> Could you open a separate bug for it?  It's best to separate the import and
> export parts, since they are two separate code sets.  Thanks.

Bug 60740
Comment 16 8472 2013-02-12 19:11:26 UTC
Thank you Kohei.
Anyway, I had thought that both would be fixed at once, import/load AND export/save. I also mentioned this in my initial description, so who could have told that you would concentrate on one function only (regardless of it being elsewhere in the code, as you said).
If I understood it right, this now means that it will load faster, but saving will still be affected and will take hellishly long.

Ok, so Daniel opened Bug 60740.
Do you think you could also focus on this new bug ASAP, to have both fixes available in the upcoming 4.1?

And thank you in advance for your effort on both.
Comment 17 8472 2013-09-01 16:34:28 UTC
Great, 4.1.* finally got into my distribution's official repository.
Import/load works much much faster than before (even on my older computer).
Perfect work Kohei, thank you very much.
Comment 18 Robinson Tryon (qubit) 2015-12-15 11:35:22 UTC Comment hidden (obsolete)