Bug 155315

Summary:	Remove ambiguous Romanian autocorrect entries
Product:	LibreOffice	Reporter:	cipricus <cipricus>
Component:	Linguistic	Assignee:	Not Assigned <libreoffice-bugs>
Status:	NEW ---
Severity:	normal	CC:	sophi, stephane.guillou
Priority:	medium
Version:	7.5.2.2 release
Hardware:	All
OS:	All
Whiteboard:
Crash report or crash signature:		Regression By:
Bug Depends on:
Bug Blocks:	103341

Description cipricus 2023-05-15 10:03:02 UTC

Description:
Some misspelled forms can be corrected to more than one valid word. Auto-correction may then produce a form other than the one the user intended, which in turn needs correcting.



Steps to Reproduce:
In Romanian, type an incorrect form (e.g., "razbunata", "rasfatata", "neastamparata", etc.).


Actual Results:
These are automatically corrected to proper forms ("răzbunată"=revenged, fem., "răsfățată"=spoiled/pampered, fem., "neastâmpărată"=naughty/unruly, fem.), but which are not the only ones possible (notably, the definite forms may also be expected in these examples: "răzbunată"=the revenged one, fem., "răsfățată"=the spoiled/pampered one, fem., "neastâmpărată"= the naughty/unruly, fem.).

Expected Results:
Auto-correction should only replace an incorrect form with a word that is unambiguously the expected correction.


Reproducible: Always


User Profile Reset: No

Additional Info:
I noticed this while working on the bug about auto-correction being applied to correct Romanian words (https://bugs.documentfoundation.org/show_bug.cgi?id=155087).

I am only aware of this problem with the Romanian auto-correction, but I would like to know whether it could be adopted as a general rule: forms that might support multiple correct forms should not be auto-corrected.

If so, I could adjust the Romanian auto-correction list while I work on the linked bug report.
 
As said here (https://bugs.documentfoundation.org/show_bug.cgi?id=155087#c21):

`The autocorrection tool for any language must be prepared to require the least possible effort from user: the replacements that the tool makes must be correct on 100% cases`. 

That is not the case if the user may have to take further action. Auto-correction should not be applied when later intervention by the user cannot be ruled out.

Comment 1 cipricus 2023-05-15 10:13:51 UTC

(In reply to cipricus from comment #0)

> ...but which are not the only ones possible (notably, the definite forms
> may also be expected in these examples: "răzbunată"=the revenged one, fem.,
> "răsfățată"=the spoiled/pampered one, fem., "neastâmpărată"= the
> naughty/unruly, fem.).

I made a copy/paste error. The above should read:

the definite forms may also be expected (they are also correct): "răzbunata"=the revenged one, fem., "răsfățata"=the spoiled/pampered one, fem., "neastâmpărata"=the naughty/unruly one, fem.

That is, the definite form (ending in `a`) could be expected too, instead of the indefinite one (ending in `ă`). This pattern triggers the problem in Romanian, but it is not the only possible one, and other languages may have their own patterns that lead to the same problem. I haven't studied the auto-correction lists for other languages and am mentioning Romanian because that is where I can contribute.

The main question here is whether the rule mentioned above could be adopted: forms that might support multiple correct forms should not be auto-corrected.
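The proposed rule can be stated as a small check. The sketch below is a hypothetical illustration, not LibreOffice code: it strips diacritics (so the definite "-a" and indefinite "-ă" forms collapse to the same unaccented string) and treats a typed form as safe to auto-correct only if exactly one word in a word list matches it. The `lexicon` is a tiny made-up sample; a real check would need a full Romanian word list.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. "răzbunată" -> "razbunata"."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidates(typed: str, lexicon: set[str]) -> set[str]:
    """All valid words that reduce to `typed` once diacritics are stripped."""
    return {w for w in lexicon if strip_diacritics(w) == typed}


def safe_to_autocorrect(typed: str, lexicon: set[str]) -> bool:
    """Proposed rule: correct only when exactly one valid word fits."""
    return len(candidates(typed, lexicon)) == 1


# Hypothetical mini-lexicon, for illustration only.
lexicon = {"răzbunată", "răzbunata", "tăcută", "tăcuta", "împreună"}

for typed in ("razbunata", "tacuta", "impreuna"):
    print(typed, sorted(candidates(typed, lexicon)), safe_to_autocorrect(typed, lexicon))
```

With this sample, "razbunata" and "tacuta" each match two valid words and would be left to the spellchecker, while "impreuna" matches only "împreună" and could still be auto-corrected.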

Comment 2 cipricus 2023-05-15 10:30:09 UTC

E.g., "tacuta" is not a word and should be corrected, but both "tăcută"="silent", fem., and "tăcuta"="the silent one", fem., are correct.

In English, this would be like auto-correcting "bleack" to "black" or to "bleak", when the other word may have been intended.

Comment 3 Stéphane Guillou (stragu) 2023-05-30 13:04:44 UTC

Thank you for the report.
So isn't the solution to remove the entries that are ambiguous from the corresponding DocumentList.xml, so the erroneous form then falls back onto the spellcheck? I assume autocorrect relies exclusively on unambiguous 1-to-1 rules, and a DocumentList.xml can't contain several replacements for the same string.

Maybe this report needs to be renamed to "Remove ambiguous Romanian autocorrect entries" so it is more focused and has a chance to be resolved.
Are you planning to work on it?
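Following the suggestion to remove ambiguous entries from DocumentList.xml, candidates for removal could be found by script. This is a hypothetical sketch: it assumes the list uses the `block-list` namespace with `abbreviated-name` (typed form) and `name` (replacement) attributes, as in the acor_*.dat files as I understand them (check against the actual file), and it uses a made-up word list in place of a real Romanian dictionary. An entry is flagged when its unaccented form matches more than one valid word.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace and attribute names of the autocorrect block list.
NS = "http://openoffice.org/2001/block-list"
BLOCK = f"{{{NS}}}block"
ABBREV = f"{{{NS}}}abbreviated-name"
NAME = f"{{{NS}}}name"


def strip_diacritics(word: str) -> str:
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ambiguous_entries(document_list_xml: str, lexicon: set[str]) -> dict[str, list[str]]:
    """Map each typed form to its possible corrections when there is more than one."""
    by_skeleton = defaultdict(set)
    for word in lexicon:
        by_skeleton[strip_diacritics(word)].add(word)

    flagged = {}
    for block in ET.fromstring(document_list_xml).iter(BLOCK):
        typed = block.get(ABBREV)
        # Every valid word with the same unaccented form, plus the current replacement.
        options = by_skeleton.get(strip_diacritics(typed), set()) | {block.get(NAME)}
        if len(options) > 1:
            flagged[typed] = sorted(options)
    return flagged


# Hypothetical sample list and word list, for illustration only.
SAMPLE = """<block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
  <block-list:block block-list:abbreviated-name="razbunata" block-list:name="răzbunată"/>
  <block-list:block block-list:abbreviated-name="impreuna" block-list:name="împreună"/>
</block-list:block-list>"""

print(ambiguous_entries(SAMPLE, {"răzbunată", "răzbunata", "împreună"}))
```

Only "razbunata" is flagged here; removing it would let the misspelling fall back to the spellchecker, which can offer both forms.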

Comment 4 cipricus 2023-06-02 15:05:28 UTC

(In reply to Stéphane Guillou (stragu) from comment #3)
> Thank you for the report.
> So isn't the solution to remove the entries that are ambiguous from the
> corresponding DocumentList.xml, so the erroneous form then falls back onto
> the spellcheck? I assume autocorrect relies exclusively on unambiguous
> 1-to-1 rules, and a DocumentList.xml can't contain several replacements for
> the same string.
> 
> Maybe this report needs to be renamed to "Remove ambiguous Romanian
> autocorrect entries" so it is more focused and has a chance to be resolved.
> Are you planning to work on it?

Yes, I would like to work on it. I don't know how systematically I can do it, but I would like to propose changes as I notice the need for them at https://gerrit.libreoffice.org/c/core/+/151770

Is that ok?

Comment 5 cipricus 2023-06-02 15:07:55 UTC

(In reply to Stéphane Guillou (stragu) from comment #3)

> Maybe this report needs to be renamed to "Remove ambiguous Romanian
> autocorrect entries" so it is more focused and has a chance to be resolved.

I have renamed it.

Comment 6 cipricus 2023-06-02 15:16:27 UTC

(In reply to cipricus from comment #4)

> Yes, I would like to work on it, although I don't know how systematically I
> can do it, but I would like to be able to propose changes when I notice the
> need at https://gerrit.libreoffice.org/c/core/+/151770
> 
> Is that ok?

In fact, I now understand from another exchange (https://ask.libreoffice.org/t/where-and-how-to-report-errors-in-defaults-of-autocorrection/91034/18?u=cipricus) that once a change is merged it can no longer be amended at that address, and a new change has to be submitted. Thanks.