Summary: | Remove ambiguous Romanian autocorrect entries | | |
---|---|---|---|
Product: | LibreOffice | Reporter: | cipricus <cipricus> |
Component: | Linguistic | Assignee: | Not Assigned <libreoffice-bugs> |
Status: | NEW --- | | |
Severity: | normal | CC: | sophi, stephane.guillou |
Priority: | medium | | |
Version: | 7.5.2.2 release | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | | | |
Crash report or crash signature: | | Regression By: | |
Bug Depends on: | | | |
Bug Blocks: | 103341 | | |
Description
cipricus
2023-05-15 10:03:02 UTC
(In reply to cipricus from comment #0)
> ...but which are not the only ones possible (notably, the definite forms
> may also be expected in these examples: "răzbunată"=the revenged one, fem.,
> "răsfățată"=the spoiled/pampered one, fem., "neastâmpărată"= the
> naughty/unruly, fem.).

I made a copy/paste error. The above should read: the definite forms may also be expected (are also correct): "răzbunata" = the revenged one, fem., "răsfățata" = the spoiled/pampered one, fem., "neastâmpărata" = the naughty/unruly one, fem.

That is, the definite form (ending in `a`) could be expected too, instead of the indefinite one (ending in `ă`).

This structure may trigger the problem in Romanian, but it is not the only possible pattern, and other languages may have their own patterns that lead to the same problem. I haven't studied the autocorrect lists of other languages and am mentioning Romanian because it's here that I could intervene.

The main question is whether a rule like the one mentioned above could be specified: misspellings that could stand for more than one correct form should not be auto-corrected (e.g. "tacuta" means nothing and should be corrected, but "tăcută" = "silent", fem., and "tăcuta" = "the silent one" are both correct). For English, that would be something like auto-correcting "bleack" to either "black" or "bleak", when the other one may have been intended.

Thank you for the report.
So isn't the solution to remove the ambiguous entries from the corresponding DocumentList.xml, so that the erroneous form then falls back onto the spellcheck? I assume autocorrect relies exclusively on unambiguous 1-to-1 rules, and a DocumentList.xml can't contain several replacements for the same string.

Maybe this report needs to be renamed to "Remove ambiguous Romanian autocorrect entries" so it is more focused and has a chance to be resolved. Are you planning to work on it?

(In reply to Stéphane Guillou (stragu) from comment #3)
> Thank you for the report.
> So isn't the solution to remove the entries that are ambiguous from the
> corresponding DocumentList.xml, so the erroneous form then falls back onto
> the spellcheck? I assume autocorrect relies exclusively on unambiguous
> 1-to-1 rules, and a DocumentList.xml can't contain several replacements for
> the same string.
>
> Maybe this report needs to be renamed to "Remove ambiguous Romanian
> autocorrect entries" so it is more focused and has a chance to be resolved.
> Are you planning to work on it?

Yes, I would like to work on it, although I don't know how systematically I can do it. I would like to be able to propose changes, whenever I notice the need, at https://gerrit.libreoffice.org/c/core/+/151770

Is that ok?

(In reply to Stéphane Guillou (stragu) from comment #3)
> Maybe this report needs to be renamed to "Remove ambiguous Romanian
> autocorrect entries" so it is more focused and has a chance to be resolved.

I have renamed it.

(In reply to cipricus from comment #4)
> Yes, I would like to work on it, although I don't know how systematically I
> can do it, but I would like to be able to propose changes when I notice the
> need at https://gerrit.libreoffice.org/c/core/+/151770
>
> Is that ok?

In fact, I now understand from another exchange (https://ask.libreoffice.org/t/where-and-how-to-report-errors-in-defaults-of-autocorrection/91034/18?u=cipricus) that once the merge is made, no further changes can be made at that address, and a new set of changes has to be initiated.

Thanks.
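To make the proposal in this thread concrete, here is a rough Python sketch of how candidates for removal might be found: it flags DocumentList.xml entries whose misspelled form, once diacritics are stripped, matches more than one valid word. Several things here are assumptions and not taken from this report: the `block-list` namespace and attribute names reflect how these files are commonly laid out, `SAMPLE` and `VALID_WORDS` are invented illustrative data (a real run would need an actual Romanian word list), and `ambiguous_entries` is a hypothetical helper, not part of LibreOffice.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace of LibreOffice autocorrect block lists.
NS = "http://openoffice.org/2001/block-list"

# Invented sample in the assumed DocumentList.xml layout.
SAMPLE = f"""<block-list:block-list xmlns:block-list="{NS}">
  <block-list:block block-list:abbreviated-name="tacuta" block-list:name="tăcută"/>
  <block-list:block block-list:abbreviated-name="rasfatata" block-list:name="răsfățată"/>
  <block-list:block block-list:abbreviated-name="intotdeauna" block-list:name="întotdeauna"/>
</block-list:block-list>
"""

# Hypothetical stand-in for a real dictionary of valid Romanian words.
VALID_WORDS = {"tăcută", "tăcuta", "răsfățată", "răsfățata", "întotdeauna"}


def strip_diacritics(word: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ambiguous_entries(xml_text: str, valid_words: set) -> dict:
    """Map each misspelling to its valid candidates when there are several."""
    # Index valid words by their diacritic-free spelling.
    by_plain = defaultdict(set)
    for word in valid_words:
        by_plain[strip_diacritics(word)].add(word)

    flagged = {}
    root = ET.fromstring(xml_text)
    for block in root.iter(f"{{{NS}}}block"):
        wrong = block.get(f"{{{NS}}}abbreviated-name")
        candidates = by_plain.get(strip_diacritics(wrong), set())
        if len(candidates) > 1:  # more than one correct form is possible
            flagged[wrong] = sorted(candidates)
    return flagged


if __name__ == "__main__":
    for wrong, candidates in ambiguous_entries(SAMPLE, VALID_WORDS).items():
        print(wrong, "->", ", ".join(candidates))
```

With the sample data, "tacuta" and "rasfatata" are flagged because both the indefinite and the definite form are valid, while "intotdeauna" has a single candidate and stays. The sketch only covers the diacritics pattern discussed here; other kinds of ambiguity would need a different comparison.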