Bug 144576 - Copy a table from Writer to plain text editor or as unformatted text pastes a list instead of matrix (like Calc does)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: high normal
Assignee: Adam664
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillCpp
Duplicates: 157605
Depends on:
Blocks: Writer-Tables Unify-Across-Apps Cut-Copy
Reported: 2021-09-18 01:29 UTC by Israel Enriquez
Modified: 2024-04-24 07:07 UTC
CC List: 10 users

See Also:
Crash report or crash signature:


Attachments
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 9 (10.49 KB, application/vnd.oasis.opendocument.text)
2024-03-18 09:59 UTC, Michael Weghorn
Sample doc for further discussion in https://gerrit.libreoffice.org/c/core/+/164833 PS 11 (11.09 KB, application/vnd.oasis.opendocument.text)
2024-03-19 08:28 UTC, Michael Weghorn
Screenshot for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16 (15.41 KB, image/png)
2024-03-26 16:15 UTC, Michael Weghorn
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16 (10.27 KB, application/vnd.oasis.opendocument.text)
2024-03-27 08:53 UTC, Michael Weghorn
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 20 (10.71 KB, application/vnd.oasis.opendocument.text)
2024-04-23 17:47 UTC, Adam664

Description Israel Enriquez 2021-09-18 01:29:14 UTC
Description:
When a table is copied as plain text, columns should be separated by a tab and rows by \n (newline).
Calc produces this format; Writer does not.

Example:
I opened two documents, one in Calc and one in Writer, both with a 2x2 table. This is what happens when I copy these tables and paste them right here:

Calc (correct):
(1,1)	(1,2)
(2,1)	(1,2)

Writer (incorrect):
(1,1)
(1,2)
(2,1)
(1,2)



Steps to Reproduce:
1. Open Writer.
2. Create a table.
3. Copy the table.
4. Paste it into a plain text editor such as Notepad, or into a text box like this one.

Actual Results:
A string that is not table-formatted.
Cells are separated only by newlines.

Expected Results:
A table-formatted string.
Cells are separated by tabs within a row, and rows by newlines.
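
The expected format can be illustrated with a minimal standalone sketch. This is not LibreOffice code: tableToPlainText is a hypothetical helper that works on a plain grid of strings, and it only shows the tab-per-column, newline-per-row layout requested here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of LibreOffice: serialize a grid of cell
// strings with '\t' between columns and '\n' after each row.
std::string tableToPlainText(const std::vector<std::vector<std::string>>& rows)
{
    std::string out;
    for (const auto& row : rows)
    {
        for (std::size_t col = 0; col < row.size(); ++col)
        {
            if (col > 0)
                out += '\t'; // column separator
            out += row[col];
        }
        out += '\n'; // row terminator
    }
    return out;
}
```

Pasting the result into a spreadsheet or a text editor keeps the rows and columns apart, which is what the Calc output above already does.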


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.1.2 (x64) / LibreOffice Community
Build ID: 87b77fad49947c1441b67c559c339af8f3517e22
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL
Comment 1 m_a_riosv 2021-09-18 09:40:48 UTC Comment hidden (obsolete)
Comment 2 Israel Enriquez 2021-09-18 19:57:42 UTC
The problem is not copying from Calc to Writer; it is copying from Writer to plain text.
Comment 3 QA Administrators 2021-09-21 04:54:17 UTC Comment hidden (obsolete)
Comment 4 raal 2022-01-25 21:00:43 UTC
I can confirm with Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 0c3b8792b712e939d2ad524d554f96616b4844be
CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded Jumbo
and Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)


paste table 2x2 from Calc:
1	2
3	4

paste table 2x2 from Writer:
1
2
3
4
Comment 5 Timur 2022-01-26 07:45:30 UTC
Copying a table from Microsoft Word to Notepad pastes a matrix, as it should, not a list.
So the bug is correctly confirmed.
It seems to be inherited from OOo.
Comment 6 Heiko Tietze 2023-10-20 09:05:29 UTC
*** Bug 157605 has been marked as a duplicate of this bug. ***
Comment 7 Stéphane Guillou (stragu) 2023-10-20 09:48:06 UTC
Copying my comment from duplicate bug 157605:

Same in OOo 3.3, so inherited.
Reproduced in recent trunk build:

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: b83f069101f1e6d8aaac09a805f02bbc4c619e7a
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Table copied from OnlyOffice gives the OP's expected results:

1	Line 1
2	Line 2

(and uses tabs to separate columns, as it should)

This is the same as copy-pasting from Calc:

1	Line 1
2	Line 2

So, to me, it's sensible to make Writer tables behave the same as Calc when copy-pasted.

Hossein, could this qualify as an easy hack? Merged cells should be tested too.
Comment 8 Hossein 2023-10-30 12:59:09 UTC
(In reply to Stéphane Guillou (stragu) from comment #7)
> Hossein, could this qualify as an easy hack? Merged cells should be tested
> too.
Yes, I think this can be an EasyHack with difficultyMedium.

Code pointers:

Copy/paste involves many steps, including data/format conversion and clipboard format handling. What matters here is that the document is converted to plain text via the "text" filter.

The plaintext (ascii) filter is located here in the LibreOffice core source code:

sw/source/filter/ascii

Therefore, to change the copy/paste output, you have to fix the ascii filter. This has the added benefit of fixing plain text export as well, as requested here.

In this folder, there are a few files:

$ ls sw/source/filter/ascii/
ascatr.cxx  parasc.cxx  wrtasc.cxx  wrtasc.hxx

To change the output, you have to edit this file:

sw/source/filter/ascii/wrtasc.cxx

This file contains a loop that creates the output.

 // Output all areas of the pam into the ASC file
 do {
     bool bTstFly = true;
    ...
 }

Inside this loop, the code iterates over the nodes in the document structure and extracts text from them. To check for yourself, add the one line below to the code, build LO, and then test. You will see that a * is written before the text of each node.

 SwTextNode* pNd = m_pCurrentPam->GetPoint()->GetNode().GetTextNode();
 if( pNd )
 {
+   Strm().WriteUChar('*');
  ...
 }

For example, given this table, with one blank paragraph above and one below it:

A | B
--|--
C | D

You will get this after copy/paste into a plain text editor:

*
*A
*B
*C
*D
*

To fix the bug, you have to differentiate between table cells and other nodes. Then you need to keep track of the table columns and write a tab between them.

As a next step, you can add the star only before table cells:

 if( pNd )
 {
     SwTableNode *pTableNd = pNd->FindTableNode();
     if (pTableNd)
     {
         Strm().WriteUChar('*');
     }
     ...
 }
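
To make the separator logic concrete, here is a toy model of it in standalone C++. It does not use any LibreOffice types: Node and writeNodes are hypothetical names, each node carries an optional (row, column) position instead of a real SwTableNode lookup, and the sketch assumes one paragraph per cell.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a text node; NOT a LibreOffice type. `cell` is set only
// for paragraphs inside a table and holds (row, column).
struct Node
{
    std::string text;
    std::optional<std::pair<int, int>> cell;
};

// Write every node; use a tab when the next node is in the same table row,
// and a newline in every other case.
std::string writeNodes(const std::vector<Node>& nodes)
{
    std::string out;
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        out += nodes[i].text;
        const bool hasNext = i + 1 < nodes.size();
        if (hasNext && nodes[i].cell && nodes[i + 1].cell
            && nodes[i].cell->first == nodes[i + 1].cell->first)
            out += '\t'; // same row continues: column separator
        else
            out += '\n'; // end of row, or an ordinary paragraph
    }
    return out;
}
```

With the A/B/C/D table and its two surrounding blank paragraphs, this model yields a blank line, "A<tab>B", "C<tab>D", and a final blank line. A real patch would also have to decide what to do with several paragraphs in one cell, which this sketch ignores.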

You can look at how other filters handle tables. For example, in sw/source/filter/html/htmltab.cxx you can see how a table is managed: the first cell is tracked, and the appropriate functions for handling an HTML table are called.

For merged cells, I suggest that the EasyHacker first check the behavior in other software, and then design and implement the appropriate behavior.
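
As a starting point for that design discussion, here is one possible policy for horizontally merged cells, sketched in standalone C++. It is only an assumption, not the behavior of any particular application: a cell spanning n columns writes its text once and then pads the columns it covers with empty fields, so later columns stay aligned. Cell, colSpan and writeRow are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical cell model, not a LibreOffice type.
struct Cell
{
    std::string text;
    int colSpan; // number of columns the cell covers, 1 for an unmerged cell
};

// Write one row; a merged cell is followed by empty fields for the columns
// it covers, so that the following cells land in their own columns.
std::string writeRow(const std::vector<Cell>& row)
{
    std::string out;
    bool first = true;
    for (const Cell& cell : row)
    {
        if (!first)
            out += '\t';
        first = false;
        out += cell.text;
        for (int extra = 1; extra < cell.colSpan; ++extra)
            out += '\t'; // empty field for each covered column
    }
    out += '\n';
    return out;
}
```

Whether padding like this is desirable should be decided after comparing with what other software pastes for the same table.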

To gain a better understanding of the Writer document model / layout, please see this document:

Writer/Core And Layout
https://wiki.openoffice.org/wiki/Writer/Core_And_Layout

And also this presentation:

Introduction to Writer Development - LibreOffice 2023 Conference Workshop
Miklos Vajna
https://www.youtube.com/watch?v=oM0tB1A0JHA
Comment 9 Michael Weghorn 2024-03-18 09:59:36 UTC
Created attachment 193175
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 9
Comment 10 Michael Weghorn 2024-03-19 08:28:33 UTC
Created attachment 193189
Sample doc for further discussion in https://gerrit.libreoffice.org/c/core/+/164833 PS 11
Comment 11 Michael Weghorn 2024-03-26 16:15:41 UTC
Created attachment 193324
Screenshot for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16
Comment 12 Michael Weghorn 2024-03-27 08:53:00 UTC
Created attachment 193334
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16

Output I currently get with sample file:

aabc         defghij
ySome longer text that is only in the second row    z
Comment 13 Adam664 2024-04-23 17:47:33 UTC
Created attachment 193826
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 20

If a table border or column is moved to create a custom cell width, the width reported by
SwFormatFrameSize.GetWidth() changes. See the attached test document for example tables.

Steps to Reproduce:

1. Open the test document.
2. Move the right border of Table1 to the left so that it ends up looking like Table2.
3. Select all of Table1 and copy.
4. Using a debugger, set a breakpoint on SwASCWriter::WriteTable and open the test document.
5. Evaluate the following two expressions to get the values below:
       pTableNd->GetTable().GetFrameFormat()->GetFrameSize(true).GetWidth()
       pBox->GetFrameFormat()->GetFrameSize(true).GetWidth()

              Table1 Untouched  Table1 Modified   Untouched/Modified
Table Width   65535             8640              6.57
Cell A Width  21845             3324              6.57
Cell B Width  10922             1661              6.57
Cell C Width  10923             1663              7.59

Checking the table and column widths using Table->Properties with the measurement unit set to points (1 point = 20 twips) gives the correct measurements for both tables when converted to twips.
Comment 14 Adam664 2024-04-23 18:05:39 UTC
Apologies for the noise, but the table of values in comment 13 should read:

                   Table1 Untouched      Table1 Modified       Untouched/Modified
Table Width        65535                 8640                  7.59
Cell A Width       21845                 3324                  6.57
Cell B Width       10922                 1661                  6.57
Cell C Width       10923                 1663                  6.57
Comment 15 Miklos Vajna 2024-04-24 07:07:32 UTC
The second table is normal; the first one is weird. 65535 is USHRT_MAX, which has a special meaning. Some directions you could research:

1) Use GetHTMLTableLayout() on the SwTable (this is what the RTF export tries to do), which should give you ~3000 for the A1 cell width even in the first table.

2) Look at the UI code in sw/source/ui/table/tabledlg.cxx, SwTableColumnPage and see how it manages to show something sane in the USHRT_MAX case.

3) Look at how SwCellFrame gets a sane size.
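
For illustration only, here is the arithmetic one would need if the USHRT_MAX case means that cell widths are proportional shares of 65535 rather than absolute values. That reading is an assumption and is not stated in this thread. The sketch is standalone C++ with a hypothetical helper, scaleRelativeWidth, and it takes the real table width as an input that would have to come from one of the three sources listed above.

```cpp
#include <cstdint>
#include <limits>

// 65535, the value reported as the table width in the untouched table.
constexpr std::uint32_t kRelativeBase = std::numeric_limits<std::uint16_t>::max();

// Hypothetical helper: treat relativeWidth as a share of kRelativeBase and
// scale it to an actual table width. Uses 64-bit math to avoid overflow and
// truncates toward zero.
std::uint32_t scaleRelativeWidth(std::uint32_t relativeWidth,
                                 std::uint32_t actualTableWidth)
{
    const std::uint64_t scaled
        = static_cast<std::uint64_t>(relativeWidth) * actualTableWidth;
    return static_cast<std::uint32_t>(scaled / kRelativeBase);
}
```

For example, scaleRelativeWidth(21845, 8640) gives 2880, since 21845 is one third of 65535. This only shows the scaling step; it does not replace checking how GetHTMLTableLayout(), SwTableColumnPage or SwCellFrame arrive at their values.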