Bug 151851

Summary: Sorting array formula works differently in Calc compared to Excel
Product: LibreOffice Reporter: Mike Kaganski <mikekaganski>
Component: CalcAssignee: Not Assigned <libreoffice-bugs>
Status: UNCONFIRMED ---    
Severity: normal CC: 79045_79045, erack, miguelangelrv, rafael.palma.lima, telesto
Priority: medium    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard: interoperability
Crash report or crash signature: Regression By:
Bug Depends on:    
Bug Blocks: 108827    
Attachments: Sorting array formula example
Formula analysis
Comparison of Calc and Excel
Formula to sort arrays of data.

Description Mike Kaganski 2022-11-01 07:54:36 UTC
Created attachment 183357 [details]
Sorting array formula example

In the attachment, the data range "range1" (A1:A15) is sorted in C1:C15 using array formulas like

> {=INDEX(range1;MATCH(LARGE(COUNTIF(range1;">"&range1);ROW(C1:C1));COUNTIF(range1;">"&range1);0))}

The results in Excel start with sorted cells containing data, then zeroes; in Calc, zeroes are first.

It seems to me that Calc is correct in this case (because indeed, empty cells are those that have most other cells greater than them); but this is an interop problem; and anyway, maybe I miss some detail (I always am confused by array evaluation).
Comment 1 Rafael Lima 2022-11-01 14:28:03 UTC
Created attachment 183363 [details]
Formula analysis

Hi Mike... Using the function wizard to analyze the function I came to a similar conclusion. It seems Calc is right because an empty cell is "smaller" than a cell with text.

In the sample file, cell A4 is empty hence it has 7 cells that are greater than it (which are the cells with values).

The weird bit is that in Excel, testing if an empty cell is "smaller" than a cell with value also returns TRUE (as well as in Calc). But in the array function the test returns a different result, as if an empty cell is "larger" than a cell with value.

Hence the problem is in the COUNTIF(range1;">"&range1) part, but IMO Excel is doing it wrong.
Comment 2 Rafael Lima 2022-11-01 14:35:11 UTC
Created attachment 183364 [details]
Comparison of Calc and Excel

See the attached image for more details.

I used the same array formula in both files:

=COUNTIF(A1:A15;">"&A1:A15)

The formula is applied as an array formula over the range A1:A15. Notice how the results are different.

However, note below that in both Calc/Excel an empty cell is "smaller" than a cell with text.
Comment 3 Eike Rathke 2022-11-01 17:03:16 UTC
For Criteria comparison operators empty being smaller than any content seems natural to me. However, maybe Excel does that differently because Sort (that actually sorts cells) sorts empty cells to the end of the range (which also Calc does), i.e. sort column A to see.
Comment 4 m_a_riosv 2022-11-02 01:13:03 UTC
Created attachment 183368 [details]
Formula to sort arrays of data.

Seems to me that such formula in any case is not of too much confidence.

Excel: Introducing something in the empty cells between data (a4, a8) produce a result sort list with replicated values at the end. If there are numbers, the second it's not on the sorted list.
Calc: retains the '0' at the top, but if numbers in a4,a8 only a4 it's on the sorted list but duplicated.

Attached a sample file that deals with numbers, text, blank cells (at the end), working with calc and excel. Based on 'ken johnson' answer https://forum.openoffice.org/en/forum/viewtopic.php?p=290729#p290729