Bug 156638 - Export to HTML results in a number of issues
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-06 14:24 UTC by robert
Modified: 2023-09-01 06:14 UTC

See Also:
Crash report or crash signature:


Attachments
Just the spreadsheet I'm using to report on all my Calc issues (1.31 MB, application/vnd.oasis.opendocument.spreadsheet)
2023-08-06 14:25 UTC, robert

Description robert 2023-08-06 14:24:08 UTC
When the attached ODS file is saved in HTML format, there are at least these issues:

1) Excessive bloat 
The resulting HTML file has a size in excess of 10 MB, and part of this is caused by

a) Excessive indentation of the generated HTML - there should be a setting to manage this, IN THE SAVE-AS DIALOGUE, and not in some obscure settings dialogue that your average Tammy, Danny, or Harriet will never look at

b) Inclusion of utterly useless data, in casu the "sdval" and "sdnum" attributes. They serve abso-(strong expletive)-ingly no purpose in an HTML document, and in my case the 2057 is quite likely something Windoze specific, even worse...

c) Lack of additional CSS. Even just three selectors, .l, .c, and .r for "text-align: left/center/right", would save considerable space: 'class="l"' saves three chars over 'align="left"'. Setting a default alignment for *all* tables would result in (potentially) substantial additional savings.

2) Navigation
a) Given that each sheet ends up in a sequentially numbered table, how hard would it be to add backward and forward links to each of them, and an "up" link to get back to the index?

b) If the top-row of a sheet is (a) frozen (heading), why not add something like
 
      .sticky {
        position: sticky;
        top: 0;
        width: fit-content;
      }

      html {
        scroll-padding-top: 3rem;
      }

to it, so that the same effect is achieved in the HTML?

3) Why are the html anchor tags in UPPERCASE???

4) How hard would it be to use the name of the ODS file in the <title> tag, when the create and modified dates are already used in (what are essentially) useless meta tags?

5) What the flippin' 'ell are the colgroup tags for? They do not stop some cells from flowing into the next row, or let me rephrase that, THEY ACTUALLY CAUSE CELLS TO WRAP! 
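
For what it's worth, items 1b and 1c can be approximated today as a post-processing step on the exported file. The following is only a rough sketch in Python, not anything LibreOffice ships: the function name `slim_html` is hypothetical, the class names `l`, `c`, `r` are taken from the suggestion in 1c, the sample cell is made up, and the regular expressions assume the attributes are written with double-quoted values (in either upper or lower case).

```python
import re

# Hypothetical helper, not part of LibreOffice: shrink an exported HTML string.
ALIGN_CLASSES = {"left": "l", "center": "c", "right": "r"}

# Matches sdval="..." or sdnum="..." together with the whitespace before it.
SD_ATTR = re.compile(r'\s+sd(?:val|num)="[^"]*"', re.IGNORECASE)
# Matches align="left|center|right"; the \b keeps valign="..." untouched.
ALIGN_ATTR = re.compile(r'\balign="(left|center|right)"', re.IGNORECASE)

# Style rules that would have to be added once to the document's <style> block.
CSS = ".l{text-align:left}.c{text-align:center}.r{text-align:right}"


def slim_html(html: str) -> str:
    """Drop sdval/sdnum attributes and replace align="..." with a short class."""
    html = SD_ATTR.sub("", html)
    return ALIGN_ATTR.sub(
        lambda m: f'class="{ALIGN_CLASSES[m.group(1).lower()]}"', html
    )


# Made-up sample cell, only to show the effect.
cell = '<td align="right" sdval="42" sdnum="2057;">42</td>'
print(slim_html(cell))
```

Being regex based, this does not tell markup from cell text, and it would produce a second class attribute on elements that already have one, so it is a stopgap for inspecting the savings rather than a substitute for a fix in the export filter.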

And as I've mentioned before: Bloat is not a technical issue, but verily a way of thinking, a "state of mind". Its cure is a simple refusal to accept, and a well directed, resounding "clean up your act and clean up your code!" - and no, don't suggest that I get involved, I only work in PL/I, Pascal, REXX and assembler.
Comment 1 robert 2023-08-06 14:25:39 UTC
Created attachment 188807
Just the spreadsheet I'm using to report on all my Calc issues
Comment 2 toi 2023-08-14 08:39:37 UTC
I will reply to your issues point by point:

1. Excessive bloat: 

a) Using tabs for indentation is already the space-saving default: compared to spaces, tabs save nearly 18% according to a small test described at https://www.madskristensen.net/blog/performance-of-tabs-vs-spaces-in-html-files/. The size of your exported file is 10744 KB. Removing the tabs results in a file of 10324 KB, a decrease of only around 4%.

b) On the supposed uselessness of the "sdval" and "sdnum" attributes: the purpose of these special attributes is described at https://help.libreoffice.org/6.1/he/text/swriter/01/04090007.html. You may have noticed that "sdnum" also contains the format of the cell, so that the HTML stays compatible with the original Calc format.

c) Using classes instead of inline CSS is a good idea too, and it would save more than removing tabs alone. Replacing inline CSS with classes saved around 7% (from 10744 KB to 9996 KB). Replacing inline CSS and also removing tabs saved around 10.9% in total (from 10744 KB to 9576 KB).

2. Navigation:

a) I confirmed that when you link a cell on one sheet to a cell on another sheet, the exported HTML simply does not honor this relationship. There should be a new mechanism to keep this compatible with the original Calc format.

b) This is a good idea too.

3. HTML anchor tags were already written in UPPERCASE as early as version 4.0; from version 5.0 onward, more of them are written in lowercase.
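
The percentages in item 1 can be rechecked from the file sizes quoted there. A small Python sketch, using only those measurements (the helper name `saving_pct` is just for illustration):

```python
# File sizes in KB as reported in item 1 of this comment.
ORIGINAL_KB = 10744
MEASURED_KB = {
    "tabs removed": 10324,
    "inline CSS replaced by classes": 9996,
    "classes and tabs removed": 9576,
}


def saving_pct(before_kb: int, after_kb: int) -> float:
    """Relative size reduction in percent."""
    return (before_kb - after_kb) / before_kb * 100


for label, size_kb in MEASURED_KB.items():
    print(f"{label}: {saving_pct(ORIGINAL_KB, size_kb):.1f}%")
```

This gives roughly 3.9%, 7.0%, and 10.9%, in line with the figures above.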
Comment 3 Buovjaga 2023-09-01 06:14:05 UTC
You list eight issues, while a report should only be about a single issue. So separate reports will have to be created. However, some of the listed issues are already covered by existing reports.

(In reply to robert from comment #0)
> When the attached ODS file is saved in HTML format, there are at least these
> issues:
> 
> 1) Excessive bloat 
> The resulting HTML file has a size in excess of 10 MB, and part of this is
> caused by
> 
> a) Excessive indentation of the generated HTML - there should be a setting
> to manage this, IN THE SAVE-AS DIALOGUE, and not in some obscure settings
> dialogue that your average Tammy, Danny, or Harriet will never look at

Possibly covered by bug 128638.

> b) Inclusion of utterly useless data, in casu the "sdval" and "sdnum"
> attributes. They serve abso-(strong expletive)-ingly no purpose in an HTML
> document, and in my case the 2057 is quite likely something Windoze
> specific, even worse...

Bug 60071 comment 4 addresses this, but for export this could possibly be covered by bug 128638.

> c) Lack of additional CSS. Even just three selectors, .l, .c, and .r for
> "text-align: left/center/right", would save considerable space: 'class="l"'
> saves three chars over 'align="left"'. Setting a default alignment for *all*
> tables would result in (potentially) substantial additional savings.

It is about Writer, but bug 95861 could possibly cover this, if the end result is a bigger rework of HTML handling.

> 2) Navigation
> a) Given that each sheet ends up in a sequentially numbered table, how hard
> would it be to add backward and forward links to each of them, and an "up"
> link to get back to the index?

Requested in bug 106656.

> b) If the top-row of a sheet is (a) frozen (heading), why not add something
> like
>  
>  .sticky {
>         position: sticky;
>         top: 0;
>         width: fit-content;
>       }
> 
>       html {
>         scroll-padding-top: 3rem;
>       }
> 
> to it, so that the same effect is achieved in the HTML?

Could not find existing report.
 
> 3) Why are the html anchor tags in UPPERCASE???

Pretty cosmetic, but could not find existing report.

> 4) How hard would it be to use the name of the ODS file in the <title> tag,
> when the create and modified dates are already used in (what are
> essentially) useless meta tags?

Could not find existing report.

> 5) What the flippin' 'ell are the colgroup tags for? They do not stop some
> cells from flowing into the next row, or let me rephrase that, THEY ACTUALLY
> CAUSE CELLS TO WRAP! 

Possibly covered by bug 128638.