Bug 36689 - SUMPRODUCT slow compared to SUMIF
Summary: SUMPRODUCT slow compared to SUMIF
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks:
 
Reported: 2011-04-29 10:47 UTC by Kevin Hunter
Modified: 2015-12-15 11:05 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
Workbook with 2 sheets. Bug is (hopefully) well explained on the second sheet. (850.15 KB, application/vnd.oasis.opendocument.spreadsheet)
2011-04-29 10:47 UTC, Kevin Hunter
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Hunter 2011-04-29 10:47:39 UTC
Created attachment 46163 [details]
Workbook with 2 sheets.  Bug is (hopefully) well explained on the second sheet.

When making the same calculation with SUMPRODUCT and SUMIF in the attached workbook, the former takes considerably longer than the latter.  And, with only 14,000 rows, I believe it should be near instantaneous.

I've checked with a git pull of the master branch (as of 4:30pm -0400 22 Apr) and though the performance is better than the 14+ seconds I see with LO 3.3, it is currently still taking 4 seconds.  Given that the equivalent SUMIF appears to return immediately, I wonder if there are still some gains to be had.

Is this a performance issue?  Or am I misunderstanding something about Calc's API in regards to SUMPRODUCT?

I have only tested with an x86-64 kernel (real hardware) as that's all I have.

If it's of use:

$ uname -a
Linux hani 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux
Comment 1 m_a_riosv 2011-06-26 17:32:24 UTC
With Winx64 7 ultimate i3-2200, it's slightly difference in favour of SUM.IF(),
but for both less than a second.
SUMPRODUCT() is an array function and do the job with all rows, while SUM.IF() only work with rows ok with the condition.
Comment 2 Kevin Hunter 2011-06-26 17:55:34 UTC
Can you provide a little more information?  Like the date at which you git pulled, or what version of LO you're using?

I expected a slight difference in speed, akin to what you're suggesting, but regardless, I think that 14,000 rows for the calculation I was requesting should be barely noticeable on any semi modern CPU.

Unfortunately, I still notice a 4+ second speed difference with a LO with a git pull of master as of 11a -0400 22 Jun.

If you're using a similarly-aged build, that suggests it might be a Linux-specific regression then.  Thanks for the info!
Comment 3 Winfried Donkers 2013-02-11 15:13:35 UTC
Tested with LibO version 4.0.0.3 on Windows XP with a Pentium Dual Core E5400@2.7GHz:
SUMPRODUCT took approx. one second, SUMIF less. Given the cpu and the size of the sheet, the results do not worry me.

Tested with LibO version 3.6.5.3 on Windows XP with a Pentium Dual Core E5400@2.7GHz:
SUMPRODUCT took approx. one second, SUMIF less. Given the cpu and the size of the sheet, the results do not worry me.

Tested with LibO version 3.6 on openSUSE 12.2 with a AMD FX6300 6 core @ 3GHz:
both functions were too fast to measure with stopwatch.
I finally tested with LibO version 4.1 alpha+ (code update this weekend), build with full debug information and producing traces on modified code in calc, on same openSUSE12.2 machine:
SUMPRODUCT took approx. two seconds, SUMIF less than a second. Given the build type, these results do not worry me.
 
I looked at the code of the two functions:
-SUMProduct is a matrix fucntion and calculates the summation of the product of (in your case) 2 matrices of identical size (13819x1), and one matrix is filtered before it is filled.
-SUMIF calculates the summation of (in your case) 4510 products of 2 cell values, after 13819 rows have been filtered.

These formulas are coded to handle much more complex contents (nxm matrices, with references, formulas, different data types, etc. ) and are fundamentally different, I am not surprised that SUMPRODUCT is slower, nor am I worried by the time the calculation takes.


@Kevin:
Is this still a bug for you?
Comment 4 Kevin Hunter 2013-02-11 20:18:32 UTC
On LO 4.0.0.3 (7545bee9c2a0782548772a21bc84a9dcc583b89) (same hardware as original bug report), they're not the same speed, but the difference for this small amount of data is in the "acceptable" range (~1 second for the SUMPRODUCT path, given my mental stopwatch).

In light of the speed up since my reporting of this bug, and your explanation of why (two different paths through memory), I'm happy to accept this bug report as suitably fixed.  (Marking as such now.)

In closing of this report, I'll note two items:

  - I'm having a difficult time making MS Office even get the correct
    result with the SUMPRODUCT function.  Kudos to LO and the embedded
    test that I now take for granted (i.e., the ' < B4' in
    "=SUMPRODUCT(OnLineYear < B4, CapacityMW)".

  - I can consistently crash LO (4.0.0.3) while trying to C&P rows to make
    a larger dataset for testing this bug report.  But that's for a
    new/different bug report.
Comment 5 Winfried Donkers 2013-02-12 06:55:36 UTC
(In reply to comment #4)
 
> Kudos to LO and the embedded
> test that I now take for granted (i.e., the ' < B4' in
> "=SUMPRODUCT(OnLineYear < B4, CapacityMW)".

Yes, that is really top class formula handling.
 
>   - I can consistently crash LO (4.0.0.3) while trying to C&P rows to make
>     a larger dataset for testing this bug report.  But that's for a
>     new/different bug report.

Can you CC me in that new bug? I opened your sample-ods in 4.0.0.3, copied 13819 rows (all except the header) and pasted this 7 times, which resulted in 110553 rows. No problem and I did not notice slower calculation (did not use a stopwatch). Tested on Windows XP (Linux machines busy building).
Comment 6 Kevin Hunter 2013-02-12 07:05:55 UTC
(In reply to comment #5)
>>   - I can consistently crash LO (4.0.0.3) while trying to C&P rows to make
>>     a larger dataset for testing this bug report.  But that's for a
>>     new/different bug report.
> 
> Can you CC me in that new bug? I opened your sample-ods in 4.0.0.3, copied
> 13819 rows (all except the header) and pasted this 7 times, which resulted
> in 110553 rows. No problem and I did not notice slower calculation (did not
> use a stopwatch). Tested on Windows XP (Linux machines busy building).

You know, I said that, but now I can't satisfy myself that I don't have something else going on.  LO was crashing around the ~200k row mark for me, but after attempting to narrow down the variables, I can longer even open this bug report's spreadsheet (sumproduct_slow.ods).  In fact, I can longer open LO, even with a clean profile.

When (err, if) I can /actually/ recreate it like I said I could yesterday, I will CC you.
Comment 7 Kevin Hunter 2013-02-12 07:41:35 UTC
(In reply to comment #6)
> When (err, if) I can /actually/ recreate it like I said I could yesterday, I
> will CC you.

Okay, I cannot reliably recreate said bug.  I must have had some memory corruption.  After rebooting, I can reliably start LO, and cannot reliably create the crash.  I can /intermittently/ make it crash.  Oh joy.

So, long story short, /this/ bug is closed, and my earlier reference to the crasher is null and void.  Cheers, and thanks for all your efforts, Winfried.  (Not just on this bug report.)
Comment 8 Robinson Tryon (qubit) 2015-12-15 11:05:13 UTC
Migrating Whiteboard tags to Keywords: (perf)
[NinjaEdit]