Bug 154627 - Inconsistencies in RANK() function in Calc
Summary: Inconsistencies in RANK() function in Calc
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.0.0.3 release
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:7.6.0 target:7.5.3
Keywords: implementationError
Depends on:
Blocks:
 
Reported: 2023-04-05 17:04 UTC by ady
Modified: 2023-06-06 17:52 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ady 2023-04-05 17:04:20 UTC
There seem to be some inconsistencies in the RANK() function in Calc as of LO 7.4.6 (in part since LO 4.0).

For the purpose of this report:
* Rank(Value;Data[;Type])
* L.Value is the minimum value in Data.
* G.Value is the maximum value in Data.


Sample file showing the problems
Attachment 186403 [details].


A_ Columns B:C use RANK() on Data, with the respective Values taken directly from the same Data range. Besides valid values within Data, there are also "a", "1" (potential valid value, but not within the range of values in Data) and an empty cell.

B_ Columns F:G use RANK.AVG().

C_ Column K is a set of possible values for the Value argument for RANK().

D_ Columns L:M use column K as Value for RANK(), with Data being A3:A7.

E_ Columns O:P are similar to "D_" but with duplicate values in Data, A3:A10.

F_ Columns S:T are like "D_" but with RANK.AVG().

G_ Columns V:W are like "F_" but with duplicate values in Data, A3:A10.



The following items are presented in increasing magnitude/importance of problematic results (starting from "no problem" towards "serious problem").

0.1. When Value is "string" or refers to a string, then the result is #Value!.
  This seems adequate.

0.2. When Value is "string" or refers to a string, and it is included in Data, then the Ranks corresponding to (other) valid values in Data discard the existence of the string value.
  This seems adequate.

0.3. When Value refers to an empty cell, then the result is #Value!.
  This seems adequate; although someone could argue in favor of a different result such as #N/A (as other spreadsheet tools do).


1. When Value  <  L.Value, or when Value  >  G.Value, then the result is #Value!.
  IMO, it should be #N/A, as in other spreadsheet software tools.
  For Values 1 and 10 in attachment 186403 [details] (column K), the result should be #N/A.


2. When Type is a number <> zero, and when 'L.Value  <  Value  >  G.Value', ((and) Value is not listed in Data), the result is "-1" (without quotation marks).
  I couldn't find official Release Notes on this important change, which can affect further calculations, preventing interoperability with other spreadsheet tools. This (distinct) result is not seen in LO 3.3, (nor in AOO, GSheets, Excel). I would had expected some documentation (including the reasoning, especially when it is different than other spreadsheet tools). Using other spreadsheet software tools, this results in #N/A.
  In attachment 186403 [details], when Type is 1 and Value is between 2 and 9 (but not them), the result (since LO 4.0) is "-1" (columns M and P). It should had been #N/A.


3. When Type = 0 (zero), and when 'L.Value  <  Value  >  G.Value', ((and) Value is not listed in Data), the result is still a positive integer number, which is INCORRECT!
  Similar case as point 2, but with Type = 0 (columns L and (letter) O in attachment 186403 [details]). The result (positive numbers) is simply incorrect, and worse than point 2. This should had been #N/A.


4. When using RANK.AVG() instead of RANK(), when 'L.Value < Value > G.Value', ((and) Value is not listed in Data), the result is still a positive number, which is INCORRECT! Moreover, when there are duplicate values in Data, the results can fall beyond the maximum rank.
  In attachment 186403 [details] (columns S, T, V and W), the maximum rank should had been 5 (S:T) and 7 (V:W) respectively. Here we not only have incorrect positive results where there shouldn't be (as in point 3) but the results can be greater than 7.


The functions RANK(), RANK.EQ() (equivalent to RANK()) and RANK.AVG() need a review.
Comment 1 Eike Rathke 2023-04-05 18:13:42 UTC
2.,3.,4. are clearly bugs.

Whether #N/A would be better than #VALUE! depends, for query values not being numeric #VALUE! is appropriate. If a rank number is not found #N/A can be expected.
Comment 2 Commit Notification 2023-04-06 14:58:54 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/74d39f5cff324d76092268418028bd882d8a4d60

Resolves: tdf#154627 RANK() query value not in data must return error

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 3 Eike Rathke 2023-04-06 14:59:16 UTC
Pending review https://gerrit.libreoffice.org/c/core/+/150084 for 7-5
Comment 4 ady 2023-04-07 05:51:40 UTC
(In reply to Eike Rathke from comment #1)
> 2.,3.,4. are clearly bugs.
> 
> Whether #N/A would be better than #VALUE! depends, for query values not
> being numeric #VALUE! is appropriate. If a rank number is not found #N/A can
> be expected.

Thank you.

With:
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 375f85f8518f49ce4381b6663f1e94fc02bacf93
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (es_AR); UI: en-US
Calc: CL threaded
Built: 2023-04-07

By using attachment 186403 [details], I immediately see points 0.3 and 1 solved, but points 2, 3 and 4 initially seemed still the same as they were before.

I had to manually "Recalculate Hard" in order to see points 2, 3 and 4 being resolved. If I don't re-save the ods file, the previous results show up again when reopening.

* Is the need for "Recalculate Hard" expected?

* Is there still any/some case in which the result of any of the RANK functions would result in "-1"?

* Could this bug 154627 be resolved / cherry picked for LO 7.4.7. too?
Comment 5 Commit Notification 2023-04-07 15:53:32 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/92ac45898fe6715b7dd5e12831af6f7d3df119e3

Resolves: tdf#154627 RANK() query value not in data must return error

It will be available in 7.5.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Alex Doll 2023-06-06 16:02:19 UTC
RANK should not be the same as RANK.EQ.  Item 1 has broken a number of my spreadsheets.  

I am querying a range of values and want to what where a value not explicitly in the range would appear, had it been in the range.  Now all I get is a column of NA instead of the position the search value would appear.

I am reverting to 7.5.2.
Comment 7 ady 2023-06-06 17:52:46 UTC
(In reply to Alex Doll from comment #6)
> RANK should not be the same as RANK.EQ.  Item 1 has broken a number of my
> spreadsheets.  
> 
> I am querying a range of values and want to what where a value not
> explicitly in the range would appear, had it been in the range.  Now all I
> get is a column of NA instead of the position the search value would appear.
> 

Please explain (or point to some relevant source) how or why in your opinion RANK() should not be the same as RANK.EQ().

As for your needs (or "broken" spreadsheets), please attach a sample file. Regarding sensitive info in your file, please see
 https://wiki.documentfoundation.org/QA/Bugzilla/Sanitizing_Files_Before_Submission

...before attaching it.