Bug 132389

Summary: BASIC: Replace is only case-insensitive for ASCII characters
Product: LibreOffice
Reporter: Mike Kaganski <mikekaganski>
Component: BASIC
Assignee: Andreas Heinisch <andreas.heinisch>
Status: RESOLVED FIXED
Severity: normal
CC: andreas.heinisch, himajin100000, sberg.fun
Priority: medium
Version: unspecified
Hardware: All
OS: All
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=141045
https://bugs.documentfoundation.org/show_bug.cgi?id=142243
https://bugs.documentfoundation.org/show_bug.cgi?id=142487
https://bugs.documentfoundation.org/show_bug.cgi?id=110003
https://bugs.documentfoundation.org/show_bug.cgi?id=144245
Whiteboard: target:7.0.0 target:7.2.0
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 127592    

Description Mike Kaganski 2020-04-24 22:03:18 UTC
> Sub TestReplace2
>   MsgBox Replace("АБВабв", "б", "*") ' test Cyrillic characters
>   MsgBox Replace("ABCabc", "b", "*") ' test ASCII characters
> End Sub

This code generates "АБВа*в" in the first case, while the correct result should be "А*Ва*в", since the default mode for Replace is case-insensitive [1]. It shows "A*Ca*c" correctly for the second case.

Replace should allow case-insensitive operation for non-ASCII characters, too.
Code pointer: SbRtl_Replace in basic/source/runtime/methods.cxx.

[1] https://help.libreoffice.org/6.4/en-US/text/sbasic/shared/replace.html
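
The difference between the two results can be reproduced outside of BASIC. The Python snippet below is only an illustrative sketch, not the code in SbRtl_Replace: `replace_ci` and its `ascii_only` parameter are hypothetical names, and Python's `re` module stands in for whatever comparison the runtime uses. With `ascii_only=True` the case-insensitive match is restricted to ASCII letters, which mirrors the reported behaviour. Without it the match is Unicode-aware, which is the behaviour requested here.

```python
import re


def replace_ci(text, find, repl, ascii_only=False):
    """Case-insensitive replacement of every occurrence of `find` in `text`.

    Hypothetical helper for illustration only; not LibreOffice code.
    """
    if not find:
        # Nothing to search for: return the input unchanged.
        return text
    flags = re.IGNORECASE
    if ascii_only:
        # Restrict case-insensitivity to ASCII letters (the reported behaviour).
        flags |= re.ASCII
    # re.escape makes `find` a literal pattern; the lambda makes `repl` a literal
    # replacement (no backslash or group-reference processing).
    return re.sub(re.escape(find), lambda _match: repl, text, flags=flags)


print(replace_ci("АБВабв", "б", "*", ascii_only=True))  # only the lowercase "б" is replaced
print(replace_ci("АБВабв", "б", "*"))                   # both "Б" and "б" are replaced
print(replace_ci("ABCabc", "b", "*"))                   # ASCII works in either mode
```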
Comment 1 Commit Notification 2020-05-21 06:51:35 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3ff159d35770ac3454ee909b348cb4f4ca8b0b9b

tdf#132389 - case-insensitive operation for non-ASCII characters

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 2 Commit Notification 2021-05-13 18:04:23 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/698e27d29cf0612634720c818ee773bfac6c40d1

tdf#132389 - Case-insensitive operation for non-ASCII characters

It will be available in 7.2.0.

Comment 3 Stephan Bergmann 2021-05-14 14:58:52 UTC
Note that the Unicode standard defines a concept of locale-independent "default caseless matching" (D144 in section 3.13 "Default Case Algorithms", <https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf>), which might be more appropriate to use here than any specific locale-dependent approach.
Comment 4 Andreas Heinisch 2021-05-14 16:31:09 UTC
This is something I cannot decide, because I lack insight into locale-dependent vs. locale-independent comparison.

The linked document even defines two possible ways to do default caseless matching:

D144 
A string X is a caseless match for a string Y if and only if:
toCasefold(X) = toCasefold(Y)

D145
A string X is a canonical caseless match for a string Y if and only if:
NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))

Is toCasefold the same as the method defined in https://opengrok.libreoffice.org/xref/core/i18npool/source/transliteration/transliteration_Ignore.cxx?r=c6b7f555#85, or is there another implementation?
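
To make the difference between D144 and D145 concrete, here is a small Python sketch. It assumes that Python's `str.casefold()` is an acceptable stand-in for toCasefold (it applies Unicode full case folding) and uses `unicodedata.normalize` for NFD. The function names are made up for this illustration, and it says nothing about what the i18npool transliteration code actually does.

```python
import unicodedata


def caseless_match(x, y):
    """D144: toCasefold(X) = toCasefold(Y)."""
    return x.casefold() == y.casefold()


def canonical_caseless_match(x, y):
    """D145: NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))."""
    def key(s):
        nfd = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", nfd.casefold())
    return key(x) == key(y)


# Cyrillic upper/lower case match under both definitions.
print(caseless_match("Б", "б"), canonical_caseless_match("Б", "б"))

# Precomposed "é" (U+00E9) vs. "E" followed by combining acute (U+0301):
# D144 compares the folded code points as they are and sees different strings,
# D145 normalizes first and treats them as equal.
precomposed = "\u00e9"
decomposed = "E\u0301"
print(caseless_match(precomposed, decomposed))
print(canonical_caseless_match(precomposed, decomposed))
```

Neither function takes a locale, which is the point of the "default" algorithms: the result does not depend on the user's language settings.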