Bug 107769

Summary: spell checking should normalize data first
Product: LibreOffice
Reporter: martin_hosken
Component: Linguistic
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW ---    
Severity: enhancement    
Priority: medium    
Version: 5.4.0.0.alpha1+   
Hardware: All   
OS: All   
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 96000    

Description martin_hosken 2017-05-11 11:05:45 UTC
Words to be spell checked should be converted to NFKC first so that spell checking dictionaries don't need to hold all forms (NFD, NFC, mixed) of a word.
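
To illustrate the problem, here is a minimal standalone sketch using only the C++ standard library, not LibreOffice code. The names Dictionary, isKnownWord, kCafeComposed and kCafeDecomposed are hypothetical and exist only for this example. It shows that the composed and decomposed spellings of the same word are different UTF-16 sequences, so a dictionary holding only one form misses the other unless the input is normalized first.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a spelling dictionary that stores each word once,
// in precomposed (NFC) form only.
using Dictionary = std::set<std::u16string>;

// Naive lookup: compares UTF-16 code units exactly, with no normalization.
bool isKnownWord(const Dictionary& dict, const std::u16string& word)
{
    return dict.count(word) != 0;
}

// "cafe" with an acute accent, written with precomposed U+00E9 (NFC form).
const std::u16string kCafeComposed = u"caf\u00E9";

// The same word written as 'e' followed by U+0301 COMBINING ACUTE ACCENT
// (NFD form). It renders identically but is one code unit longer.
const std::u16string kCafeDecomposed = u"cafe\u0301";
```

With a dictionary containing only kCafeComposed, isKnownWord accepts the composed spelling and rejects the decomposed one, which is the mismatch that normalizing the word before lookup is meant to remove.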

I'm going to sketch my thoughts on how to do it here in case I can't get back to the bug for a while. Anyone want to take it further?

In SpellChecker::GetSpellFailure in lingucomponent/source/spell/sspellimpl.cxx, rather than building nWord with a hand-rolled approximation of compatibility normalization, start with an nWord created something like:

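// Needs #include <unicode/normlzr.h>
// Wrap the UTF-16 buffer of rWord in an ICU string.
icu::UnicodeString rIn(reinterpret_cast<const UChar *>(rWord.getStr()), rWord.getLength());
icu::UnicodeString normal;
UErrorCode rCode = U_ZERO_ERROR; // ICU expects the status to start as U_ZERO_ERROR
icu::Normalizer::normalize(rIn, UNORM_NFKC, 0, normal, rCode);
// If normalization failed, nWord ends up empty.
OUString nWord(U_SUCCESS(rCode) ? OUString(reinterpret_cast<const sal_Unicode *>(normal.getBuffer()), normal.length()) : OUString());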

then use nWord instead of rWord for the rest of the function.

Need to find a test for this.
Comment 1 Buovjaga 2017-05-12 17:49:58 UTC
Ok -> NEW