Bug 150141 - loading 8.5M row .csv never completes - 100% CPU entire time
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 7.5.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: perf
Reported: 2022-07-25 16:33 UTC by Pierre Fortin
Modified: 2022-07-27 07:15 UTC

Description Pierre Fortin 2022-07-25 16:33:07 UTC
Trying to load a .csv with 8559242 rows of 90 columns each.
CPU pegged at 100% the entire time.
No visual cues as to what is happening: no progress bar, no response to mouse or keyboard.
An earlier version of 7.5 loaded similar sheets in ~4 minutes.

Operating System: Mageia 9
KDE Plasma Version: 5.24.4
KDE Frameworks Version: 5.93.0
Qt Version: 5.15.2
Kernel Version: 5.18.11-server-1.mga9 (64-bit)
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 125.5 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT

$ scalc --version
LibreOfficeDev 7.5.0.0.alpha0 4827d5cb1508f6bca9489e31b877cfff36393c50
Comment 1 Pierre Fortin 2022-07-25 21:20:05 UTC
Aborted the load after waiting 6.5 hours.
Comment 2 Roman Kuznetsov 2022-07-26 19:35:30 UTC
Calc supports only ~1 million rows by default

Anyway please attach your CSV here
Comment 3 Pierre Fortin 2022-07-27 06:51:20 UTC
(In reply to Roman Kuznetsov from comment #2)
> Calc supports only ~1 million rows by default

Yes, but this report is against the new jumbo feature of 16M rows, which is almost enough for what my team needs.

> Anyway please attach your CSV here

Plenty of examples are available at https://dl.ncsbe.gov/?prefix=data/ (look for the big files).
These zip files mostly contain a single .txt file (mostly tab-separated "csv"). See also the Snapshots folder. Files may contain tab- or comma-separated data, and they vary in encoding. If the 16-bit encoded files give you trouble, you can use the Linux command:
  tr -d '\000"\r\377\376\275' < infile.txt > outfile.csv
to "clean" them up...
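
To make the effect of that tr call concrete, here is a minimal sketch that wraps it in a function. The function name clean_utf16_export is hypothetical, and LC_ALL=C is an added precaution (an assumption, not part of the original command) so that tr treats its input as raw bytes regardless of locale.

```shell
# Hypothetical wrapper around the tr command quoted above.
# It deletes these bytes: NUL (\000), double quote, CR (\r),
# and 0xFF, 0xFE, 0xBD (\377 \376 \275).
# Caveat: this only yields readable text when the 16-bit file holds
# ASCII-range characters, since other characters have a non-zero
# second byte that tr leaves behind. All double quotes are removed too.
clean_utf16_export() {
    # $1 = input file, $2 = output file
    LC_ALL=C tr -d '\000"\r\377\376\275' < "$1" > "$2"
}
```

For example, clean_utf16_export infile.txt outfile.csv is equivalent to the command above apart from the locale setting.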

Cool! This daily build has a progress bar. Loading a 5.7GB sheet: the progress bar reached the end 1 minute ± 5 seconds after starting the load, then I waited for the sheet to display. After another 1:40m, the load reported a failure: too many rows. Less than a minute after OK, the sheet appeared. HUGE speed improvement over initial tests about a week ago. Sheet showing 16,777,216 rows. This file is from https://s3.amazonaws.com/dl.ncsbe.gov/data/ncvhis_Statewide.zip

$ ll ncvhis_Statewide-20220723-070658.csv
[snip] 4265533961 Jul 23 07:06 ncvhis_Statewide-20220723-070658.csv
$ wc -l ncvhis_Statewide-20220723-070658.csv
33686293 ncvhis_Statewide-20220723-070658.csv
^^^^^^^^ 
Even if Calc doubled the number of jumbo rows to 33,554,432, I'd still leave 131,861 rows on the cutting room floor...  :)  While it would be great to load such sheets, we have to split them up. I have one sheet covering 2012-2022 which we reduced to around 77M records. But seriously, 16M rows is something we'd be happy with for a while. We have lots of ways to slice and dice these large sheets, but 16M rows is a big help. I'm using the daily builds almost exclusively when they work.
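
The row arithmetic and the splitting step can be sketched in shell. This is only an illustration under stated assumptions: the 16,777,216 limit is the row count reported in this comment, the function names are hypothetical, and split does not repeat a header line in each piece, so any header stays in the first piece only.

```shell
# Jumbo-sheet row limit as reported in this thread (assumption: every
# line of the file, including any header, occupies one sheet row).
CALC_MAX_ROWS=16777216

# Print how many pieces a file with $1 lines needs under limit $2.
chunks_needed() {
    total=$1
    limit=${2:-$CALC_MAX_ROWS}
    echo $(( (total + limit - 1) / limit ))   # ceiling division
}

# Print how many of $1 lines would not fit under limit $2 (0 if all fit).
rows_left_over() {
    total=$1
    limit=$2
    if [ "$total" -gt "$limit" ]; then
        echo $(( total - limit ))
    else
        echo 0
    fi
}

# Split file $1 into pieces named ${2}aa, ${2}ab, ... with at most
# $3 lines each (default: CALC_MAX_ROWS).
split_for_calc() {
    split -l "${3:-$CALC_MAX_ROWS}" "$1" "$2"
}
```

With the line count from wc -l above, rows_left_over 33686293 33554432 prints 131861, and chunks_needed 33686293 prints 3.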
Comment 4 Pierre Fortin 2022-07-27 06:54:23 UTC
Actually, the latest build has both resolved the reported issue and has better performance than I'd expected.  OK to close this report, unless you want to keep it open for internal reasons...  THANKS!!
Comment 5 Roman Kuznetsov 2022-07-27 07:10:20 UTC
(In reply to Pierre Fortin from comment #4)
> Actually, the latest build has both resolved the reported issue and has
> better performance than I'd expected.  OK to close this report, unless you
> want to keep it open for internal reasons...  THANKS!!

Pierre, thanks for retesting it with the latest master build.

Let's close this one as WFM per comment 4.
Comment 6 Timur 2022-07-27 07:15:51 UTC
Many bugs were resolved with meta bug 133764. So let's close.