159652 – Finding a way to join a suffix to the word immediately before it, using autocorrect function

Status:	REOPENED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Documentation
Version: (earliest affected)	7.6.4.1 release
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	AutoCorrect-Complete

Reported:	2024-02-09 01:14 UTC by Mac
Modified:	2024-03-07 05:49 UTC
CC List:	5 users

See Also:
Crash report or crash signature:

Description Mac 2024-02-09 01:14:19 UTC

Description:
I'm looking for a way to merge two words together by using the autocorrect function in Writer. (I use the Polish language in my work.)

My aim is to type a word followed by a space (so that the word is corrected if necessary, e.g. diacritics are added), and then type a suffix as a separate word. The suffix should be autocorrected (again, diacritics, etc.) and the space preceding it removed, as if by hitting the Backspace key.

The reason for this request is that I'm making a huge list of autocorrect definitions. Polish is chock-full of diacritics, so I'm using autocorrect to be able to type without thinking about them, letting the software auto-add the various dots and squiggles where necessary.

To make my task even more complicated, Polish is also an inflected language, which means that every one of these words has multiple variants depending on its inflection.

Adding a suffix on top of this (each suffix taking a different form depending on inflection) leads to a huge number of necessary autocorrect definitions for one word in all its forms.

What I'd like to do is:
1. Define, for autocorrect purposes, the base form of a word (e.g. change mogl to mógł). This is not a problem, as autocorrect already does it.
2. Define a suffix typed after a space (e.g. bys) that will be autocorrected (e.g. byś), but ALSO have the space before it removed, so that the suffix is joined to the preceding word, resulting in: mógłbyś.

It seems removing that space could be done with a regex, which autocorrect currently does not support, as another user pointed out to me here: https://ask.libreoffice.org/t/can-you-auto-delete-the-space-before-an-auto-corrected-word/101757.

Steps to Reproduce:
1. In autocorrect options (I use Polish language) define: mogl to be autocorrected to: mógł.
2. In autocorrect options define: bys (that's [space]bys) to be autocorrected to: byś (that is [no space]byś).
3. Close the options.
4. In a Writer document, type: mogl bys.

Actual Results:
After the above steps the actual result is:
mógł byś (diacritics corrected properly, space persists)

Expected Results:
I'd like the result to be:
mógłbyś (diacritics in both words corrected properly, space removed, words joined)

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 7.6.4.1 (X86_64) / LibreOffice Community
Build ID: e19e193f88cd6c0525a17fb7a176ed8e6a3e2aa1
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-GB (en_GB); UI: en-GB
Calc: CL threaded

Comment 1 V Stuart Foote 2024-02-09 02:02:15 UTC

Interesting, just not sure it is feasible given that autocorrect is a simple list of 2-tuples (the entry and its replacement string). Would require substantial dev effort.

Comment 2 Mac 2024-02-09 04:05:36 UTC

Thank you for looking into it, Stuart.
Actually the list of such 2-tuples in Polish would be massive, as -byś, -bym, -by, -byście, -byśmy are super common suffixes that you add to verbs in the past tense to create conditional verbs. 

Then there are suffixes that form participles: verb + -jąc/-jący/-jącym/-jącego/-jącemu/-jąca/-jącą/-jącej/-jące/-jących/-jącymi (all of which are roughly equivalent to -ing)

You can see how long a list of definitions is required for a single verb - and that's only for two suffixes.

It would be soooo much easier and more efficient if I defined autocorrection for as many verbs as possible, and had a finite list of such suffixes that I could attach as I type.
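To make the arithmetic concrete: if every suffixed form needs its own autocorrect entry, the list grows as (number of forms) × (number of suffixes), whereas attachable suffixes would need only (forms) + (suffixes) entries. A minimal Python sketch of generating such combined entries; `expand_entries` and the `PLAIN` letter table are hypothetical helpers, not part of LibreOffice:

```python
# Each Polish diacritic letter mapped to the plain letter typed instead of it.
PLAIN = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")


def expand_entries(forms, suffixes):
    """Hypothetical helper: one (typed, corrected) autocorrect pair per form + suffix."""
    pairs = []
    for form in forms:
        for suffix in suffixes:
            word = form + suffix
            pairs.append((word.translate(PLAIN), word))
    return pairs


forms = ["mógł", "mogła", "mogło"]   # some past-tense forms of "móc"
suffixes = ["bym", "byś", "by"]      # singular conditional suffixes from above
pairs = expand_entries(forms, suffixes)
print(len(pairs))  # 9 entries for 3 forms x 3 suffixes, versus 3 + 3 = 6 if suffixes could attach
print(pairs[1])    # ('moglbys', 'mógłbyś')
```

Not every form + suffix combination is grammatical (plural forms take -byśmy/-byście), so a real list would still need filtering by hand; the sketch only shows how quickly the count multiplies.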

At the moment I've just found a way around it - I type the verbs and their suffixes separately, and then, after a day's work I Ctrl+H them: 
find: [space]-byś (etc.) / replace: [w/o space]byś
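The same end-of-day cleanup can be done with one regular expression instead of one Ctrl+H pass per suffix. A minimal Python sketch, assuming the suffixes are typed with a leading hyphen as above; `join_suffixes` is a hypothetical name. In Writer's Find & Replace with "Regular expressions" enabled, searching for ` -(byście|byśmy|byś|bym|by)\b` and replacing with `$1` should have a similar effect, though that is an untested assumption:

```python
import re

# Conditional suffixes from comment 2, longest first.
SUFFIXES = ["byście", "byśmy", "byś", "bym", "by"]

# A space, the hyphen that marks a suffix, then the suffix itself.
# \b requires the suffix to end there, so "by" cannot match the start of "byście".
SUFFIX_RE = re.compile(r" -(" + "|".join(SUFFIXES) + r")\b")


def join_suffixes(text):
    """Drop the space and hyphen in front of every marked suffix."""
    return SUFFIX_RE.sub(r"\1", text)


print(join_suffixes("mógł -byś, gdyby chciał"))  # mógłbyś, gdyby chciał
```

Requiring the hyphen keeps the standalone word "by" (as in "by tak") untouched.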

That kind of does the job, but severely obstructs the flow. As a writer, I'd be super grateful for smoothing it out.

I'm positive that the option to autocorrect a word while at the same time deleting the space preceding it (effectively joining the autocorrected word to the one before it) would find many other uses, too.

Thanks again,
Mac

Comment 3 Heiko Tietze 2024-02-09 09:40:44 UTC

László, do you have an idea?

Comment 4 Shantanu 2024-02-11 05:21:49 UTC

Such a feature would also help the Marathi language. People incorrectly write "घरा चा" or "घरा ची" (with a space) instead of "घराचा" / "घराची" (without a space). There should be a parameter to declare such suffixes as ones to be joined to the preceding word. Suffixes like "चा" or "ची" are not valid words in Marathi and are never used on their own after a space (just as "ed" is not a word in English but is correct in "worked").
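The parameter described here could work roughly like this: a list of bound suffixes that are never valid on their own, each glued to the preceding word whenever it appears after a space. A minimal Python sketch; `BOUND_SUFFIXES` and `join_bound_suffixes` are hypothetical names, not LibreOffice settings:

```python
# Hypothetical list of bound suffixes: never valid after a space.
BOUND_SUFFIXES = {"चा", "ची"}
TRAILING_PUNCT = ".,;:!?"


def join_bound_suffixes(text, suffixes=BOUND_SUFFIXES):
    """Glue each bound suffix onto the word before it.

    Splitting on whitespace also collapses runs of spaces into one.
    """
    words = []
    for token in text.split():
        core = token.rstrip(TRAILING_PUNCT)  # ignore trailing punctuation when matching
        if words and core in suffixes:
            words[-1] += token               # punctuation stays after the joined word
        else:
            words.append(token)
    return " ".join(words)


print(join_bound_suffixes("घरा चा"))  # घराचा
```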

Comment 5 László Németh 2024-02-13 12:56:13 UTC

A possible solution is to use a non-space separator, e.g. a comma, and the .* pattern to recognize the suffix at the end of the character sequence:

mogl -> mógł
.*,bys -> byś

After typing the comma, mogl changed to mógł. After following with ,bys:

mogl,bys -> mógłbyś
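A toy Python model of the two autocorrect passes in this example, assuming (as the example implies) that the comma triggers a lookup of the word typed since the last separator, while the following space triggers a lookup of the whole chunk since the last space, and that a key starting with `.*` matches only the end of that chunk. The table and function names are hypothetical; this is not LibreOffice's implementation:

```python
# Hypothetical autocorrect table from the example above. A key starting with ".*"
# matches only the end of the chunk being corrected.
TABLE = {
    "mogl": "mógł",
    ".*,bys": "byś",
}


def correct(chunk, table):
    """Toy lookup: whole-word entries first, then '.*X' suffix patterns."""
    if chunk in table:
        return table[chunk]
    for key, replacement in table.items():
        if key.startswith(".*"):
            tail = key[2:]
            if chunk.endswith(tail) and len(chunk) > len(tail):
                # Replace the matched tail (comma included), keep what precedes it.
                return chunk[: -len(tail)] + replacement
    return chunk


def simulate_typing(keys, table):
    """Correct the text before each typed comma or space."""
    out = ""
    for ch in keys:
        if ch in " ,":
            # A comma looks back to the previous comma or space;
            # a space looks back to the previous space only.
            separators = " ," if ch == "," else " "
            start = max(out.rfind(s) for s in separators) + 1
            out = out[:start] + correct(out[start:], table)
        out += ch
    return out


print(simulate_typing("mogl,bys ", TABLE))  # mógłbyś (followed by the typed space)
```

Because the comma is part of the `.*,bys` key, it is consumed by the replacement, which is what makes the two words join.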


The other solution is to use the Hunspell spell checker to add the missing diacritics, accepting its suggestions automatically. The Polish dictionary seems to cover all the Polish diacritics in its MAP table, so it can suggest the right alternatives with diacritics:

== pl_PL.aff ==
MAP 8
MAP aą
MAP cć
MAP eę
MAP lł
MAP nń
MAP oóu
MAP sś
MAP zżź
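The MAP lines above group each plain letter with its diacritic variants. A minimal Python sketch of how such groups can restore diacritics against a word list, leaving the word alone when the result is ambiguous; the function names and the tiny `WORDS` set are hypothetical stand-ins for the Hunspell dictionary, and this brute-force search only illustrates the idea rather than Hunspell's own suggestion algorithm:

```python
from itertools import product

# The MAP groups from pl_PL.aff above: letters in one group may replace each other.
MAP_GROUPS = ["aą", "cć", "eę", "lł", "nń", "oóu", "sś", "zżź"]


def variants(ch):
    """All letters that may stand in for ch (ch itself if it is in no group)."""
    for group in MAP_GROUPS:
        if ch in group:
            return group
    return ch


def restore_diacritics(word, dictionary):
    """Every dictionary word reachable by swapping letters within MAP groups.

    Brute force: the number of candidates grows exponentially with word length.
    """
    candidates = ("".join(p) for p in product(*(variants(c) for c in word)))
    return sorted({c for c in candidates if c in dictionary})


def autofix(word, dictionary):
    """Replace the word only when exactly one dictionary word matches."""
    matches = restore_diacritics(word, dictionary)
    return matches[0] if len(matches) == 1 else word


WORDS = {"mógł", "myśl", "zależny"}  # tiny hypothetical word list
print(autofix("mogl", WORDS), autofix("zalezny", WORDS))  # mógł zależny
```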

So with a LibreOffice Basic or pyUNO macro, it's possible to add the missing diacritics automatically (except when the result is ambiguous), e.g. by clicking a button at the end of editing the document. For example, see the following LibreOffice Basic snippet from https://forum.openoffice.org/en/forum/viewtopic.php?t=1222, which uses the XSpellChecker service of the LibreOffice UNO API via com.sun.star.linguistic2.LinguServiceManager to list the misspelled words of the current document in a new Writer document:

Sub WrongWordsList 

    Dim oDocModel as Variant 
    Dim oTextCursor as Variant 
    Dim oLinguSvcMgr as Variant 
    Dim oSpellChk as Variant 
    Dim oListDocFrame as Variant 
    Dim oListDocModel as Variant 
    Dim sListaPalabras as String 
    Dim aProp() As New com.sun.star.beans.PropertyValue 

    oDocModel = StarDesktop.CurrentFrame.Controller.getModel() 
    If IsNull(oDocModel) Then 
        MsgBox("There's no active document." + Chr(13)) 
        Exit Sub 
    End If 

    If Not HasUnoInterfaces (oDocModel, "com.sun.star.text.XTextDocument") Then 
        MsgBox("This document doesn't support the 'XTextDocument' interface." + Chr(13)) 
        Exit Sub 
    End If 

    oTextCursor = oDocModel.Text.createTextCursor() 
    oTextCursor.gotoStart(False) 

    oLinguSvcMgr = createUnoService("com.sun.star.linguistic2.LinguServiceManager") 
    If Not IsNull(oLinguSvcMgr) Then 
        oSpellChk = oLinguSvcMgr.getSpellChecker() 
    End If 
    ' A Variant that was never assigned is Empty, not Null, so check both
    If IsEmpty(oSpellChk) Or IsNull(oSpellChk) Then
        MsgBox("The spell checker cannot be accessed." + Chr(13))
        Exit Sub
    End If
        
    Do
        If oTextCursor.isStartOfWord() Then
            oTextCursor.gotoEndOfWord(True)
            ' Check whether the word is spelled correctly
            If Not IsEmpty(oTextCursor.getPropertyValue("CharLocale")) Then
                If Not oSpellChk.isValid(oTextCursor.getString(), oTextCursor.getPropertyValue("CharLocale"), aProp()) Then
                    sListaPalabras = sListaPalabras + oTextCursor.getString() + Chr(13)
                End If
            End If
            oTextCursor.collapseToEnd()
        End If
    Loop While oTextCursor.gotoNextWord(False)
        
    If Len(sListaPalabras) = 0 Then 
        MsgBox("There are no errors in the document.") 
        Exit Sub 
    End If 

    oListDocFrame = StarDesktop.findFrame("fListarPalabrasIncorrectas", com.sun.star.frame.FrameSearchFlag.ALL) 
    If IsNull(oListDocFrame) Then 
        oListDocModel = StarDesktop.loadComponentFromURL("private:factory/swriter", "fListarPalabrasIncorrectas", com.sun.star.frame.FrameSearchFlag.CREATE, aProp()) 
        oListDocFrame = oListDocModel.CurrentController.getFrame() 
    Else 
        oListDocModel = oListDocFrame.Controller.getModel() 
    End If 

    oTextCursor = oListDocModel.Text.createTextCursor() 
    oTextCursor.gotoEnd(False) 

    oListDocModel.Text.insertString (oTextCursor, sListaPalabras, False) 

    oListDocFrame.activate() 

End Sub


And here is another code snippet, which also replaces the misspelled words (it also includes a fix for an XSpellChecker usage problem that has since been solved):

https://forum.openoffice.org/en/forum/viewtopic.php?p=425651

Comment 6 Mac 2024-03-02 01:07:35 UTC

Awesome! László's solution works for me like a charm. Thank you.

Comment 7 Mac 2024-03-06 00:58:35 UTC

(In reply to Mac from comment #6)
> Awesome! László's solution works for me like a charm. Thank you.

Let me add that this solution works just as well with prefixes. In that case, the .* pattern has to come after the prefix and the non-space separator (in this example, a comma):

e.g.:

pol,.* -> pół
zalezny -> zależny

After typing the comma, pol changes to pół. Any word typed after it is joined to pół. In this case:

pol,zalezny -> półzależny

---

You can even create long words with both prefixes and suffixes, 
e.g. by defining in autocorrect the following:

pol,.* -> pół        (a common prefix, meaning semi-)
zalezny -> zależny
.*,ch -> ch          (one of the many inflectional morphemes)

you can type:
pol,zalezny,ch -> półzależnych

Thank you again! You're the best :)

Comment 8 Mac 2024-03-06 05:03:34 UTC

Actually, not exactly: with prefixes, whatever comes after the prefix doesn't get corrected, but it's still better than nothing.
(I'm sorry, there's no option to edit a post; please feel free to disregard whatever parts of what I'm saying you deem irrelevant or muddying.)

Comment 9 Shantanu 2024-03-06 05:24:29 UTC

This (use of comma) workaround is intended for those who already know the correct spelling. Spell checkers are designed for novice users who may think what they've typed is correct.

By the way, I was unaware that this was possible. It has rendered hundreds, if not thousands, of autocorrect entries in the Marathi language pack obsolete. I appreciate your efforts, but what I find frustrating is that it's not adequately documented, especially for all the use cases mentioned above.

Comment 10 Shantanu 2024-03-06 05:59:25 UTC

Proofreaders usually use the close-up sign ⁐ (Unicode U+2050) to mark a space that is not needed. For example, if I type "I am work ing hard." in Google Docs, it suggests "working". LibreOffice Writer's suggestions are way out of context.
If this is unrelated to the current request, I can open a new feature request.
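A close-up check of this kind could look for adjacent tokens that form a dictionary word when joined, while at least one of them is not a word by itself. A minimal Python sketch with a hypothetical `suggest_joins` function and a tiny hand-written word list; punctuation is not handled:

```python
def suggest_joins(sentence, dictionary):
    """Hypothetical close-up check: adjacent tokens that only make sense joined."""
    tokens = sentence.split()
    suggestions = []
    for left, right in zip(tokens, tokens[1:]):
        joined = left + right
        if joined.lower() not in dictionary:
            continue
        # Only flag the pair if at least one half is not a word on its own.
        if left.lower() not in dictionary or right.lower() not in dictionary:
            suggestions.append((f"{left} {right}", joined))
    return suggestions


WORDS = {"i", "am", "work", "working", "hard"}  # tiny hypothetical word list
print(suggest_joins("I am work ing hard", WORDS))  # [('work ing', 'working')]
```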

Comment 11 Heiko Tietze 2024-03-06 08:50:33 UTC

(In reply to Mac from comment #8)
> ...with prefixes whatever is after the prefix doesn't get corrected...

(In reply to Shantanu from comment #9)
> This (use of comma) workaround is intended for those who already know the
> correct spelling....
> ... I find frustrating is that it's not adequately documented...

Since the ticket is still flagged as UX-relevant, I wonder what's missing. Or should we forward the solution to the documentation?

Comment 12 Mac 2024-03-06 10:08:49 UTC

(In reply to Heiko Tietze from comment #11)
> (In reply to Mac from comment #8)
> > ...with prefixes whatever is after the prefix doesn't get corrected...
> 
> (In reply to Shantanu from comment #9)
> > This (use of comma) workaround is intended for those who already know the
> > correct spelling....
> > ... I find frustrating is that it's not adequately documented...
> 
> Since the ticket is still flagged as UX relevant I wonder what's missing. Or
> should we forward the solution to documentation?

Oh, by all means, my initial question/request has been answered beautifully. It's a neat method for highly inflected languages such as Polish (and, I gather, Marathi). So it can go to the documentation as a solution for adding suffixes and inflectional morphemes that need to be automatically attached to the root word (while also getting autocorrected, as a bonus, if needed).



---
Since you are asking what is missing: it's the autocorrection of the root word in the middle (the part without the .* pattern and non-space separator).

In other words: prefix[corrected],root[not corrected],suffix[corrected]
(The root may stay uncorrected because the autocorrect function recognises it not as the root alone, but as the chain consisting of [prefix][comma][uncorrected root].)

Although I'll have to investigate it once again, because I'm almost positive the root got corrected in a few words I initially typed as a test.

Comment 13 Mac 2024-03-07 00:00:38 UTC

(In reply to Mac from comment #12)

> ---
> Since you are asking what is missing - it's the autocorrection of the root
> word in the middle (the part without the the .* pattern and non-space
> separator). 
> 
> In other words: prefix[corrected],root[not corrected],suffix[corrected]
> (the root may stay not autocorrected because the autocorrect function
> recognises it not as the root alone, but as the chain consisting of
> [prefix][comma][uncorrected root].


Yes, I've tested it and can confirm now that it's how it is at the moment.

Someone mentioned before that it would be more difficult than suffixes, but how cool would it be if a user could "tell" the program to treat a particular non-space separator [e.g. comma] like a space? Or maybe is there such space separator already? Something other than comma? (But TBH, the comma is placed in a very convenient spot on the keyboard...)

Comment 14 Mac 2024-03-07 00:02:47 UTC

[An Edit Comment option would be handy]

I meant: Or maybe is there such a space-like separator already?

Comment 15 Mac 2024-03-07 05:49:20 UTC

One more thing worth noting when it comes to prefixes.

There's a difference between words whose root doesn't need to be autocorrected and words whose root does (e.g. roots with diacritics).

---
Example:

[prefix]
za

[roots]
baw
myśl

---

In autocorrect:

za,.* -> za
mysl -> myśl
[baw doesn't need to be autocorrected - no diacritics]

---

When I type:

za,baw -> zabaw
za,mysl -> zamysl [joined, but 'mysl' not corrected to 'myśl']
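A toy Python model that reproduces the behaviour reported in comments 8, 13 and 15, assuming the cause suggested in comment 12: only one rule fires per space-separated chunk, so once the `za,.*` prefix pattern matches, the root after the comma is never looked up. The optional `correct_rest` flag sketches the behaviour requested in comment 13 (treating the separator like a space). All names are hypothetical and this is not LibreOffice's implementation:

```python
# Hypothetical table mirroring the example above; a key ending in ".*" is a prefix pattern.
TABLE = {
    "za,.*": "za",
    "mysl": "myśl",
}


def correct_chunk(chunk, table, correct_rest=False):
    """Toy model: at most one rule fires on the text typed since the last space."""
    if chunk in table:                        # whole-word entry
        return table[chunk]
    for key, replacement in table.items():
        if key.endswith(".*"):                # prefix pattern "X.*"
            head = key[:-2]
            if chunk.startswith(head) and len(chunk) > len(head):
                rest = chunk[len(head):]
                if correct_rest:
                    # Hypothetical behaviour requested in comment 13: treat the
                    # separator like a space and look the remainder up as well.
                    rest = correct_chunk(rest, table, correct_rest)
                return replacement + rest
    return chunk


def type_words(text, table, correct_rest=False):
    """Correct each space-separated chunk as if a space had been typed after it."""
    return " ".join(correct_chunk(c, table, correct_rest) for c in text.split(" "))


print(type_words("za,baw za,mysl", TABLE))                     # zabaw zamysl
print(type_words("za,baw za,mysl", TABLE, correct_rest=True))  # zabaw zamyśl
```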