Bug 156625 - It's difficult to figure out how to export unicode characters to PDF/UA
Summary: It's difficult to figure out how to export unicode characters to PDF/UA
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.3.7.2 release
Hardware: ARM Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Special-Character
Reported: 2023-08-04 22:10 UTC by Randy
Modified: 2023-08-05 13:04 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments

Description Randy 2023-08-04 22:10:24 UTC
Description:
The Special Character dialog adds unwanted, invisible character formatting, which makes it difficult to export a document containing hundreds of these characters to PDF/UA.

Steps to Reproduce:
In a Writer document, use Insert -> Special Character.  Search for "half" and insert the ½ character.

Elsewhere in the same document, type the digits 123456. Between 3 and 4, use Insert -> Special Character and set the Hexadecimal field to U+202F (NARROW NO-BREAK SPACE). Add that character to your favourites, and insert it into your document from there.

Now use Tools -> Accessibility Check.

(The Accessibility Check is also run as part of Export to PDF, when PDF/UA is selected)


Actual Results:
At these locations in the document (and sometimes at other invisible locations) you are told "The text formatting conveys additional meaning."

But this is perplexing because you never added any formatting, and there doesn't appear to be any formatting. It is difficult to find and remove the formatting.

More likely, the user assumes that the check is flagging the wrong problem, and that Unicode simply can't be used in PDF/UA.

Expected Results:
Inserting Special Characters shouldn't add any character formatting (unless the user actually specified different formatting, such as a different font).



Reproducible: Always


User Profile Reset: No

Additional Info:
The boxes above weren't the best way to explain this.

For me, this affects these characters:
* the quantity ½, common in recipes and carpentry
* numbers of the (Canadian) form 12 345.67 (where the whitespace is U+202F)
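To make the two code points concrete, here is a minimal Python sketch (illustrative only, not part of LibreOffice) that prints their Unicode names and builds a number in the 12 345.67 form with U+202F as the group separator. The helper name format_canadian is hypothetical.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE, used as the digit-group separator
HALF = "\u00bd"   # U+00BD, the ½ character


def format_canadian(value: float, decimals: int = 2) -> str:
    """Hypothetical helper: format e.g. 12345.67 as '12 345.67' using U+202F."""
    # Python's ',' format option groups thousands with commas;
    # swap each comma for the narrow no-break space.
    return f"{value:,.{decimals}f}".replace(",", NNBSP)


if __name__ == "__main__":
    # Show exactly which code points are involved.
    for ch in (HALF, NNBSP):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # repr() makes the otherwise invisible separator visible as \u202f.
    print(repr(format_canadian(12345.67)))
```

Printing with repr() is deliberate: the narrow no-break space looks like an ordinary gap on screen, just as it does in the Writer document.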

If these characters are copied from another document, or if the whitespace is inserted with Insert -> Formatting Mark, they pass the Accessibility Check (which can be run on its own or as part of export to PDF/UA).

If these characters are inserted from the Special Character dialog, they look identical, but they fail the Accessibility Check saying "The text formatting conveys additional meaning." But as far as the user knows, there was no text formatting.  They didn't ask for any, and it looks just like the surrounding text.

When you go looking for help on this, you encounter a number of web pages claiming that you shouldn't use unicode in PDF/UA, so you think you can't use these characters. But that's not the actual problem.

The problem is that characters inserted from the Special Character dialog are getting extra character formatting that is hard to detect, but needs to be removed to pass the Accessibility Check.

If you open the Style Inspector and use the arrow keys to move over that area, Character Direct Formatting appears for the briefest moment and then immediately vanishes, which gives you, at best, a faint hint. (Is this a bug? Why does it vanish?)

You can use Clone Formatting to copy the format of characters that are OK, but that has no effect (which is itself a defect, but not THIS defect). Again, that makes you think the characters themselves must be at fault, but that's not the case.

Further, if you select text that includes that character, the Styles sidebar shows "No Character Style" (this happens even when multiple styles are selected), so you think it can't possibly be a style problem. But it is a formatting problem.

What works reliably: applying Clear Direct Formatting to the selected text removes the formatting that was silently added.
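Because the added formatting is hard to spot in the UI, it can also be located in the saved file. In an .odt file, Writer stores the text in content.xml, and direct character formatting appears as text:span elements that point at automatic text styles (the "T" styles mentioned in comment 2). The following Python sketch lists spans that contain non-ASCII characters. The function names are hypothetical, and the non-ASCII filter is only a heuristic: it will also report legitimate formatting, such as a deliberately italicised accented word.

```python
import sys
import unicodedata
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Optional, Tuple

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"
STYLE_NAME = f"{{{TEXT_NS}}}style-name"

Hit = Tuple[Optional[str], str, List[str]]


def spans_in_content_xml(xml_bytes: bytes) -> List[Hit]:
    """Return (style name, span text, character names) for each text:span
    whose text contains at least one non-ASCII character."""
    root = ET.fromstring(xml_bytes)
    hits: List[Hit] = []
    for span in root.iter(SPAN):  # nested spans are visited as well
        text = "".join(span.itertext())
        specials = [c for c in text if ord(c) > 127]
        if specials:
            # Fall back to the U+XXXX form for code points without a name.
            names = [unicodedata.name(c, f"U+{ord(c):04X}") for c in specials]
            hits.append((span.get(STYLE_NAME), text, names))
    return hits


def find_special_char_spans(odt_path: str) -> List[Hit]:
    """Read content.xml from an .odt file (a ZIP archive) and scan it."""
    with zipfile.ZipFile(odt_path) as zf:
        return spans_in_content_xml(zf.read("content.xml"))


if __name__ == "__main__" and len(sys.argv) > 1:
    for style, text, names in find_special_char_spans(sys.argv[1]):
        print(style, repr(text), ", ".join(names))
```

Run as a script with the path of a saved .odt as its argument; each reported span can then be selected in Writer and cleared with Clear Direct Formatting.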

I'm on a new install of LibreOffice so I didn't reset my profile.
Comment 1 m_a_riosv 2023-08-05 07:49:19 UTC
Looks like a duplicate of https://bugs.documentfoundation.org/show_bug.cgi?id=155267

If you disagree, please reopen this one.

*** This bug has been marked as a duplicate of bug 155267 ***
Comment 2 V Stuart Foote 2023-08-05 13:04:50 UTC
Confirmed that inserting a character with the SCD (Special Character dialog) breaks the text runs around the inserted character.

Sometimes this would be expected, even needed: when the font of the PS (paragraph style) does not include the glyph being inserted, or when the glyph is explicitly selected from a different font than the one used by the PS.

But when the PS font does include the glyph, the extra spans and T-text styles are not needed and, as with bug 155267, can disrupt PDF/UA export validation.

@Justin, Mike -- could/should the SCD insert logic be expanded to check the PS font so as to not add the additional spans when not needed?
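The check suggested above could look roughly like the following Python sketch. It is a hypothetical illustration of the decision only, not LibreOffice's actual insertion code: it assumes the set of code points covered by the paragraph style's font (its character map) is already available, and all names are made up for the example.

```python
from typing import Optional, Set


def needs_char_override(text: str, para_font_codepoints: Set[int],
                        chosen_font: Optional[str], para_font: str) -> bool:
    """Hypothetical rule: should inserted text get its own span/text style?"""
    # The user explicitly picked a different font in the dialog:
    # keep that choice as direct formatting.
    if chosen_font is not None and chosen_font != para_font:
        return True
    # Otherwise a span is only needed if the paragraph font lacks a glyph
    # for any inserted code point (so a fallback font must be applied).
    return any(ord(c) not in para_font_codepoints for c in text)
```

Under this rule, inserting ½ into a paragraph whose font covers U+00BD adds no span, which matches the Expected Results in the description.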