Bug 153043 - Writer should not declare CJK (RTL-CTL) fonts when CJK (resp. RTL-CTL) support disabled
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Languages
Reported: 2023-01-16 11:52 UTC by Eyal Rozenberg
Modified: 2023-12-04 20:50 UTC (History)

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2023-01-16 11:52:21 UTC
I use LO with CJK support disabled and RTL-CTL support disabled.

If I create a blank document in Writer and save it, the styles.xml file gets a font-face-decls with elements for CJK fonts:

<office:font-face-decls>
<!-- ... snip ... -->
  <style:font-face style:name="Noto Sans CJK SC" svg:font-family="&apos;Noto Sans CJK SC&apos;" style:font-family-generic="system" style:font-pitch="variable"/>
  <style:font-face style:name="Noto Serif CJK SC" svg:font-family="&apos;Noto Serif CJK SC&apos;" style:font-family-generic="system" style:font-pitch="variable"/>
</office:font-face-decls>

I don't believe these should be added to the file, since the user has in no way chosen these fonts for anything.

Similarly, if I had not enabled RTL-CTL, I would expect no information about RTL-CTL fonts to be written.
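
For anyone who wants to check this on their own files, here is a minimal Python sketch that lists the font-face declarations found in a package's styles.xml. It assumes the standard ODF office and style namespace URIs; the function name declared_fonts and the in-memory sample package are made up for illustration (the sample holds only a styles.xml, so it is not a complete ODT).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs for the office: and style: prefixes.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def declared_fonts(odt_file):
    """Return the style:name of each style:font-face declared in styles.xml.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as odt:
        root = ET.fromstring(odt.read("styles.xml"))
    decls = root.find(f"{{{OFFICE_NS}}}font-face-decls")
    if decls is None:
        return []
    return [
        face.get(f"{{{STYLE_NS}}}name")
        for face in decls.findall(f"{{{STYLE_NS}}}font-face")
    ]


# Hypothetical sample: a zip holding only a styles.xml with the two
# declarations quoted in this report.
SAMPLE_STYLES = f"""<office:document-styles
    xmlns:office="{OFFICE_NS}" xmlns:style="{STYLE_NS}">
  <office:font-face-decls>
    <style:font-face style:name="Noto Sans CJK SC"/>
    <style:font-face style:name="Noto Serif CJK SC"/>
  </office:font-face-decls>
</office:document-styles>"""

package = io.BytesIO()
with zipfile.ZipFile(package, "w") as archive:
    archive.writestr("styles.xml", SAMPLE_STYLES)
package.seek(0)

print(declared_fonts(package))
```

On a real file, passing the path of a saved .odt to declared_fonts should work the same way, since zipfile.ZipFile accepts either a path or a file object.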
Comment 1 Mike Kaganski 2023-01-16 12:25:57 UTC
No! :)
"Disabled support" is just a UI thing, and this must never translate to core functionality. The format must keep all the data (defaults or explicitly set) for all aspects of styles, including CTL/Asian pieces. If someone has CTL disabled, and opens a file created in a CTL environment, the program uses the data from "hidden" CTL properties; it must keep them; it must store them.

And even when a document is created in such an environment, it must have defaults, so that when it is later edited in different environments (including at the same place, using e.g. charmap), the results are consistent.
Comment 2 Eyal Rozenberg 2023-01-16 13:03:02 UTC
(In reply to Mike Kaganski from comment #1)
> "Disabled support" is just a UI thing, and this must never translate to core
> functionality. 

I'm not suggesting the ODF be dysfunctional, just that the app not specify things it doesn't have an opinion about.

> The format must keep all the data (defaults or explicitly
> set) for all aspects of styles, including CTL/Asian pieces.

Why? And when answering, please consider that we're both supporting 151215, i.e. per-language default font settings. So should we have 50, or 100, elements here for all languages?


> If someone has
> CTL disabled, and opens a file created in a CTL environment, the program
> uses the data from "hidden" CTL properties; it must keep them; it must store
> them.

Why must it? Why not just use whatever defaults it likes? After all, the document author doesn't care which CTL fonts the person opening the document uses for adding content.

> And even when a document is created in such an environment, it must have
> defaults, so that when it is later edited in different environments (including
> at the same place, using e.g. charmap), the results are consistent.

The results will be consistent, in the sense that everything the author specified will be maintained. Additions of text in other language groups _are_ consistent with the original document, regardless of the fonts they use.
Comment 3 Mike Kaganski 2023-01-16 13:14:20 UTC
(In reply to Eyal Rozenberg from comment #2)
> > If someone has
> > CTL disabled, and opens a file created in a CTL environment, the program
> > uses the data from "hidden" CTL properties; it must keep them; it must store
> > them.
> 
> Why must it? Why not just use whatever defaults it likes? After all, the
> document author doesn't care which CTL fonts the person opening the document
> uses for adding content.

This: "the document author doesn't care which CTL fonts the person opening the document uses for adding content" is ... shocking ;)
The author creates content *and styles*. The work on a document can happen collaboratively through sharing, and the end result *indeed* depends on correct behavior of the program on all systems. The change of a styling should be always an explicit action.

> > And even when a document is created in such an environment, it must have
> > defaults, so that when it is later edited in different environments (including
> > at the same place, using e.g. charmap), the results are consistent.
> 
> The results will be consistent, in the sense that everything the author
> specified will be maintained. Additions of text in other language groups
> _are_ consistent with the original document, regardless of the fonts they
> use.

Again: the results must be consistent not in your sense, but in an absolute sense: the author can start something on a system configured for en-US, and then participants from all other places should not create a mess because of "unspecified behavior". Any change from some *fixed default* must be explicit.
Comment 4 Mike Kaganski 2023-01-16 13:20:24 UTC
(In reply to Mike Kaganski from comment #3)
> The work on a document can happen collaboratively through sharing

Or sending copies, and then using Compare Document feature. Or without "Compare Document", but using comments and assembling using copy-pasting ... or joining chapters in a Master Document ...

Having inconsistent styles would hurt in any such situation. Even if you re-define the end result, people working on intermediate documents could rely on some formatting (or, say, font character repertoire support) details.
Comment 5 Mike Kaganski 2023-01-16 14:08:15 UTC
(In reply to Eyal Rozenberg from comment #2)
> > The format must keep all the data (defaults or explicitly
> > set) for all aspects of styles, including CTL/Asian pieces.
> 
> Why? And when answering, please consider that we're both supporting 151215,
> i.e. per-language default font settings. So should we have 50, or 100,
> elements here for all languages?

(In reply to Mike Kaganski from bug 151215 comment #3)
> Having a per-lang font assignment ... indeed together with a usable UI where some
> defaults would allow one to avoid assigning fonts to each of 5000+ human languages

This implies that that proposal should be implemented to allow the style to map only one (or a few, or maybe even zero) actual language to a font, and have a "Default" entry handling all the rest, which is an *explicit* setting.
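
A minimal sketch of what such a lookup could look like, assuming a style carries a plain mapping from language tag to font plus one explicit "Default" entry. The function name font_for, the mapping layout, and the font names FooCJK and FooSerif are all hypothetical; nothing here reflects an existing ODF or LibreOffice structure.

```python
DEFAULT_KEY = "Default"


def font_for(language_tag, font_map):
    """Return the font mapped to language_tag, else the explicit Default entry."""
    if language_tag in font_map:
        return font_map[language_tag]
    # Every language without its own entry is handled by the one Default
    # entry, which the style must define explicitly.
    return font_map[DEFAULT_KEY]


# Hypothetical style: one language mapped explicitly, everything else defaulted.
style_fonts = {"ja": "FooCJK", DEFAULT_KEY: "FooSerif"}

print(font_for("ja", style_fonts))  # FooCJK
print(font_for("he", style_fonts))  # FooSerif
```

With this shape, a style needs only as many entries as the languages its author actually cares about, and the fallback is still a stored, explicit setting.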
Comment 6 Eyal Rozenberg 2023-01-18 22:37:25 UTC
(In reply to Mike Kaganski from comment #3)
> This: "the document author doesn't care which CTL fonts the person opening
> the document uses for adding content" is ... shocking ;)

It may shock you, but it's true.

> The author creates content *and styles*.

Content and styles that don't involve CJK, because the author has configured their app to disable CJK support. Which happens when the author's work - including the author's collaboration work - does not involve CJK languages.

> The work on a document can happen
> collaboratively through sharing, and the end result *indeed* depends on
> correct behavior of the program on all systems. The change of a styling
> should be always an explicit action.

That is only for _defined_ styling.

> > > And even when a document is created in such an environment, it must have
> > > defaults, so that when it is later edited in different environments (including
> > > at the same place, using e.g. charmap), the results are consistent.

The results will be consistent, since the document doesn't involve any CJK LG content.

> Again: the results must be consistent not in your sense, but in an absolute
> sense: the author can start something on a system configured for en-US, and
> then participants from all other places should not create a mess because of
> "unspecified behavior".

They won't create a mess. Whenever some collaborator adds CJK content, they will need to declare/specify fonts for that language group. While collaborators don't use CJK either, that remains unnecessary. And - it is not a problem if a collaborator declares the CJK fonts, and the author sees the CJK fonts which the collaborator set, e.g. using their system's default, rather than the original author's system default - because it's the collaborator who introduced CJK.

>  Any change from some *fixed default* must be explicit.

We are arguing about whether there should be a fixed default. Even without a fixed default, of course any divergence from the absence of CJK should be explicit w.r.t. styling.
Comment 7 Mike Kaganski 2023-01-19 04:27:32 UTC
(In reply to Eyal Rozenberg from comment #6)
> > Again: the results must be consistent not in your sense, but in an absolute
> > sense: the author can start something on a system configured for en-US, and
> > then participants from all other places should not create a mess because of
> > "unspecified behavior".
> 
> They won't create a mess. Whenever some collaborator adds CJK content, they
> will need to declare/specify fonts for that language group. While
> collaborators don't use CJK either, that remains unnecessary. And - it is
> not a problem if a collaborator declares the CJK fonts, and the author sees
> the CJK fonts which the collaborator set, e.g. using their system's default,
> rather than the original author's system default - because it's the
> collaborator who introduced CJK.

Two different collaborators would create two different formattings - each on their system with their settings; and then it is a mess.
Comment 8 Mike Kaganski 2023-01-19 04:28:25 UTC
I mean: a document without CJK is sent to two CJK-using collaborators; and their results arrive like that.
Comment 9 Eyal Rozenberg 2023-05-08 18:59:49 UTC
(In reply to Mike Kaganski from comment #7)
> Two different collaborators would create two different formattings - each on
> their system with their settings; and then it is a mess.

It's not a mess, because none of them uses CJK. And if they start using it - i.e. the case of two collaborators introducing CJK in parallel - that's no different than two collaborators writing, say, the same section of a document in parallel: Their modifications conflict with each other and need to be harmonized.
Comment 10 Mike Kaganski 2023-05-08 19:53:16 UTC
(In reply to Mike Kaganski from comment #8)
> I mean: a document without CJK is sent to two CJK-using collaborators; and
> their results arrive like that.

(In reply to Eyal Rozenberg from comment #9)
> It's not a mess, because none of them uses CJK.

Don't you find that my words above explicitly say otherwise?

> And if they start using it -
> i.e.  the case of two collaborators introducing CJK in parallel - that's no
> different then two collaborators writing, say, the same section of a
> document in parallel: Their modifications conflict with each other and need
> to be harmonized.

Don't you find that the same can be said about work *without* a template (when people start creating their own document without any prior preparational work); and the template idea is exactly to *prevent* such a situation ;)
Comment 11 Eyal Rozenberg 2023-09-29 21:16:28 UTC
(In reply to Mike Kaganski from comment #10)

Let me start with a meta-reply: Our deeper disagreement seems to be about whether or not an ODF document must fully define the styling of _all_ possible content, not just the content in the actual document; or whether unused styling can remain undefined.

The argument for full-definition seems to be: 

FD1. If such a document is modified by different people, they are likely to define inconsistent styling (of those aspects not defined in the original document)

My arguments for partial definition are:

PD1. There is benefit in documents containing only what the user was aware of inserting into them. Whoever opens the document can't tell whether the author actually wanted any RTL-CTL content, for example, to be set in the font specified in styles.xml, or whether the app just inserted some default.

PD2. This allows for the creation of smaller documents and, in particular, a shorter styles.xml file. This is most relevant for test cases and sample documents.

PD3. The aspects of styling which the user does not set explicitly and does not use (e.g. RTL-CTL font), and which would be set to some default, are likely to not match the specified styles well; thus, if/when they come into effect, the document would be poorly-styled, or otherwise - the effect would be the same as with no-styling, i.e. each user (among several potential collaborators) would set it to something different. In these cases there will have been no benefit in for

PD4. The default choice of RTL-CTL font is likely to not cover many Unicode characters of various RTL-CTL languages. For those characters, even the specification of the font in styles.xml does not _really_ specify which font is used. And, in fact, LO today doesn't even have the capability of specifying fallbacks properly (*) - so if users were to add content in those languages, they would again each be using their own different fallback font.


Now back to the bickering:

> (In reply to Mike Kaganski from comment #8)
> (In reply to Eyal Rozenberg from comment #9)
> > It's not a mess, because none of them uses CJK.
> 
> Don't you find that my words above explicitly say otherwise?

You did say that, but I was talking about the scenario of CJK not being used. I assume you concede that while CJK is not used no mess is created, and proceed to the case you want to focus on, which is multiple collaborators who independently introduce CJK content without one disseminating an update to all the others.

In that case, the "mess" is a conflict of styles: Say, one user who added CJK chose font family FooCJK, and another chose BarCJK. And this conflict will need to be resolved. But like I said in my last comment - that's no different from settling differences in edits to the text.

In fact, I believe your approach sometimes results in more need for style conflict resolution, because if two people write a document from scratch, and then want to merge it - with your approach, they will need to harmonize the differences in all undefined styles they had not even given any thought to (and may not even know about).

> Don't you find that the same can be said about work *without* a template
> (when people start creating their own document without any prior
> preparational work);

I didn't mention templates anywhere... what does it matter if this scenario happens with or without a template?

> and the template idea is exactly to *prevent* such a situation ;)

We could have the same argument about templates. If a group of people don't use CJK, and want to work on some documents based on a template - why should that template define any CJK fonts?
Comment 12 Mike Kaganski 2023-09-30 10:21:16 UTC
(In reply to Eyal Rozenberg from comment #11)
> (In reply to Mike Kaganski from comment #10)
> > Don't you find that the same can be said about work *without* a template
> > (when people start creating their own document without any prior
> > preparational work);
> 
> I didn't mention templates anywhere... what does it matter if this scenario
> happens with or without a template?

You mentioned it from the start - it's just a problem of loose terminology here. We never talked about a *proper* "template" in the LibreOffice sense (e.g., ott); only about a *document* that one sends to another, and they start working on copies of that document - it's all about this original document used as a "template" here. If you prefer, you may replace the word (not *term*) "template" that I used here all along with the words "initial blank document shared among contributors to start their work".

> In fact, I believe your approach sometimes results in more need
> for style conflict resolution, because if two people write a document from
> scratch, and then want to merge it - with your approach, they will need to
> harmonize the differences in all undefined styles they had not even given
> any thought to (and may not even know about).

No.
The "from scratch" case would not make it easier in even the slightest bit. Even if you imagine the case where collaborator A has their from-scratch document (using *pre-defined, default* Style 1 and Style 2) without any other styles in the ODF, and collaborator B has their from-scratch document (using *pre-defined, default* Style 2 and Style 3) without any other styles:

* what you imagine is that copying data from collaborator B's document into collaborator A's document (on collaborator A's system) would only introduce possibly unexpected look of parts with Style 2, but not with Style 3;
* what would happen instead, would be that opening collaborator A's document on collaborator A's system would still fill all the missing *pre-defined, default* styles, including Style 3; and that would match the collaborator A's system i.e. what would be saved in the ODT anyway; pasting data from collaborator B's document would still result in the conflict between the in-memory definition of Style 3 in target document with what is in the pasted data. Same in the opposite case (opening on collaborator B's system), or even worse on collaborator C's system, having their from-scratch new document, pasting data from both other collaborators' documents.
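
The second bullet can be modelled with a toy sketch. It assumes styles are plain name-to-formatting mappings and that opening a file fills in any missing predefined style from the opening system's defaults; open_document, paste_conflicts, and the "A-look"/"B-look" values are hypothetical and do not correspond to LibreOffice code.

```python
def open_document(stored_styles, system_defaults):
    """Model opening a file: predefined styles missing from the file are
    filled in from the opening system's defaults; stored styles win."""
    in_memory = dict(system_defaults)
    in_memory.update(stored_styles)
    return in_memory


def paste_conflicts(target_styles, pasted_styles):
    """Names of styles whose pasted formatting differs from the target's."""
    return sorted(
        name
        for name, formatting in pasted_styles.items()
        if name in target_styles and target_styles[name] != formatting
    )


# Hypothetical defaults on collaborator A's system for three predefined styles.
defaults_a = {"Style 1": "A-look", "Style 2": "A-look", "Style 3": "A-look"}

# Each file stores only the styles its author actually used.
doc_a = {"Style 1": "A-look", "Style 2": "A-look"}
doc_b = {"Style 2": "B-look", "Style 3": "B-look"}

in_memory_a = open_document(doc_a, defaults_a)
print(paste_conflicts(in_memory_a, doc_b))  # ['Style 2', 'Style 3']
```

In this model Style 3 conflicts even though A's file never stored it, because A's system supplied its own definition on open.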

So there is no real scenario where your proposal would create a *usability* improvement (or at least none has been provided yet); the only actual upside is smaller XML size (not really resulting in a noticeable change in ZIPped size; of course, FODT file size could change significantly for not too large documents).

On the other hand, there is a real scenario, that benefits from the *status quo* usability-wise. I have shown it. So, we compare 0% usability improvement (and some % of XML size improvement) - i.e., your proposal - to some (non-0) % of usability improvement - i.e., to status quo.

I'd say that until we implement our discussed improvement to language handling in ODF, the Western/CTL/CJK problems (most of them) would not have any satisfactory solution. Having this triad is unfortunate. It works with the in-built magic of assigning a character to one of these categories not based on what a user wants, but based on the character's Unicode group; and that's awful. Anyone could suddenly arrive at using a character from another Unicode group, say, by copy-pasting some emoticons (there are plenty of them using Japanese, Arabic, etc. characters). Not having control over what part of a style is applied to what run means the program has to be prepared.
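
To illustrate the emoticon case, here is a rough sketch that splits text into Western/CTL/CJK runs purely by code point. The block ranges are a small illustrative subset (Hebrew, Arabic, Devanagari, Thai; Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables), everything else is lumped into "Western", and the names script_group and runs are made up; this is not the classification LibreOffice actually performs.

```python
from itertools import groupby

# Illustrative subset of Unicode blocks only; a real classifier covers far more.
CTL_RANGES = [(0x0590, 0x05FF), (0x0600, 0x06FF), (0x0900, 0x097F), (0x0E00, 0x0E7F)]
CJK_RANGES = [(0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7AF)]


def script_group(char):
    """Assign a character to a group by code point alone (toy rule)."""
    code = ord(char)
    if any(low <= code <= high for low, high in CJK_RANGES):
        return "CJK"
    if any(low <= code <= high for low, high in CTL_RANGES):
        return "CTL"
    return "Western"


def runs(text):
    """Split text into consecutive runs that share a script group."""
    return [
        (group, "".join(chars))
        for group, chars in groupby(text, key=script_group)
    ]


# The katakana letter in a pasted emoticon forms a CJK run of its own.
print(runs("(ツ)"))  # [('Western', '('), ('CJK', 'ツ'), ('Western', ')')]
```

Even in this toy version, a document written entirely in a Western language ends up needing some CJK font the moment such an emoticon is pasted.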
Comment 13 Eyal Rozenberg 2023-10-02 19:14:45 UTC
(In reply to Mike Kaganski from comment #12)

So, about templates: I think that people who collaborate should probably use a proper OTT template, which defines everything they expect to use or maybe simply everything, period, in a way that's stylistically consistent. And that means I am not that worried about collaborators who start writing from scratch having to then harmonize different style choices they each introduce during their work.



> > if two people write a document from
> > scratch, and then want to merge it - with your approach, they will need to
> > harmonize the differences in all undefined styles they had not even given
> > any thought to (and may not even know about).
> 
> No.
> The "from scratch" case would not make it easier in even the slightest bit. Even
> if you imagine the case where collaborator A has their from-scratch document
> (using *pre-defined, default* Style 1 and Style 2)

Wait, is only Style 1 a predefined default or both of them? Grammar ambiguity

>  without any other styles
> in the ODF, and collaborator B has their from-scratch document (using
> *pre-defined, default* Style 2 and Style 3) 

same question. Please clarify so that I can understand the rest of the scenario.

Also, it seems you're assuming the default pre-defined styles are the same on A and B's system. Why are you making this assumption?
Comment 14 Mike Kaganski 2023-10-02 20:18:03 UTC
(In reply to Eyal Rozenberg from comment #13)
> Wait, is only Style 1 a predefined default or both of them? Grammar ambiguity
> ...

In my scenario, the three styles: Style 1, Style 2, and Style 3, were all pre-defined; e.g., they could be "Heading 1", "Body Text", and "Block Quotation".

> Also, it seems you're assuming the default pre-defined styles are the same
> on A and B's system. Why are you making this assumption?

Grammar ambiguity ;) What specifically do you mean by "the same"? I assume that the styles are "the same" in the sense that they have the same name. But otherwise, I do not assume their same *formatting* - if they were "the same" formatting-wise, it would make it safe and would not create problems.

(In reply to Eyal Rozenberg from comment #11)
> In fact, I believe your approach sometimes results in more need
> for style conflict resolution, because if two people write a document from
> scratch, and then want to merge it - with your approach, they will need to
> harmonize the differences in all undefined styles they had not even given
> any thought to (and may not even know about).

These words imply that two people - starting *from scratch*, and wanting to *merge* - have conflicts in all *undefined* styles - i.e., they have the problem in styles they didn't use (but that implies that these styles still existed in their "from scratch" document, which leads to the standard styles).