Bug 107769

Summary:	spell checking should normalize data first
Product:	LibreOffice	Reporter:	martin_hosken
Component:	Linguistic	Assignee:	Not Assigned <libreoffice-bugs>
Status:	NEW ---
Severity:	enhancement
Priority:	medium
Version:	5.4.0.0.alpha1+
Hardware:	All
OS:	All
Whiteboard:
Crash report or crash signature:		Regression By:
Bug Depends on:
Bug Blocks:	96000

Description martin_hosken 2017-05-11 11:05:45 UTC

Words to be spell checked should be converted to NFKC first so that spell checking dictionaries don't need to hold all forms (NFD, NFC, mixed) of a word.
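
To make the problem concrete, here is a minimal, self-contained sketch (not LibreOffice code) of why an exact-match word list misses a decomposed spelling. The names composeToy and isKnown are hypothetical, and composeToy is only a stand-in that composes two hard-coded pairs; a real fix would use ICU's NFKC normalization, which also handles compatibility forms.

```cpp
#include <set>
#include <string>

// Toy stand-in for a real normalizer (assumption: illustration only).
// It composes just "e" or "a" followed by U+0301 COMBINING ACUTE ACCENT.
static std::u16string composeToy(const std::u16string& in)
{
    const char16_t kCombiningAcute = 0x0301;
    std::u16string out;
    for (char16_t c : in)
    {
        if (c == kCombiningAcute && !out.empty())
        {
            // Fold the accent into the preceding base letter where we know the pair.
            if (out.back() == u'e') { out.back() = char16_t(0x00E9); continue; }
            if (out.back() == u'a') { out.back() = char16_t(0x00E1); continue; }
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical dictionary lookup: exact code-unit match, as a plain word list would do.
static bool isKnown(const std::set<std::u16string>& dict, const std::u16string& word)
{
    return dict.count(word) != 0;
}
```

With a dictionary that stores only the precomposed u"caf\u00E9", isKnown fails for the decomposed u"cafe\u0301" but succeeds once the word is passed through composeToy first. Normalizing the input once is what lets the dictionary keep a single form of each word.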

I'm going to sketch my thoughts on how to do it here in case I can't get back to the bug for a while. Anyone want to take it further?

In SpellChecker::GetSpellFailure in lingucomponent/source/spell/sspellimpl.cxx, rather than building nWord with the current hand-rolled approximation of NFKC, start with an nWord created something like:

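// Needs <unicode/normlzer.h>. icu::Normalizer::normalize is deprecated in newer ICU; Normalizer2 is its replacement.
icu::UnicodeString rIn(reinterpret_cast<const UChar *>(rWord.getStr()), rWord.getLength());
icu::UnicodeString normal;
UErrorCode rCode = U_ZERO_ERROR;  // ICU requires the status to be initialized to success
icu::Normalizer::normalize(rIn, UNORM_NFKC, 0, normal, rCode);
// Falls back to an empty string if normalization fails.
OUString nWord(U_SUCCESS(rCode) ? OUString(reinterpret_cast<const sal_Unicode *>(normal.getBuffer()), normal.length()) : OUString());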

then use nWord instead of rWord for the rest of the function.

Need to find a test for this.

Comment 1 Buovjaga 2017-05-12 17:49:58 UTC

Ok -> NEW