Bug 132414

Summary: Allow multi-character delimiters in CSV import
Product: LibreOffice
Reporter: Mikhail Novosyolov <mikhailnov>
Component: Calc
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED WONTFIX    
Severity: enhancement
CC: 79045_79045, erack, heiko.tietze, ming.v.hua, vsfoote
Priority: medium    
Version: 6.4.3.2 release   
Hardware: All   
OS: All   
See Also: https://bugs.documentfoundation.org/show_bug.cgi?id=127718
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:    
Bug Blocks: 109239    
Attachments: Example CSV
Screenshot illustrating the bug

Description Mikhail Novosyolov 2020-04-25 19:06:18 UTC
Description:
Example CSV:

3305460;;import/gcc;;i686;;error: in `/builddir/build/BUILD/gcc-8.3.0/BUILD': ;;xxx

Here the separator is two semicolons (";;"), but when the CSV is opened in Calc, no matter how many ";" I type into the "Separator" field, LibreOffice treats a single ";" as the separator and creates empty columns.

Reproducible: Always


User Profile Reset: No



Additional Info:
.
Comment 1 Mikhail Novosyolov 2020-04-25 19:06:40 UTC
Created attachment 159931
Example CSV
Comment 2 Mikhail Novosyolov 2020-04-25 19:07:02 UTC
Created attachment 159932
Screenshot illustrating the bug
Comment 3 Ming Hua 2020-04-25 20:09:49 UTC
For me, choosing "semicolon" and "merge delimiters" got rid of the empty columns and imported the example CSV file as desired.  Can you try this as well?
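Editorial illustration: the difference between the default import and "merge delimiters" can be sketched in Python roughly as follows. This only models the behaviour described in this thread and is not LibreOffice's actual implementation; the function names `split_each` and `split_merged` are made up for the example.

```python
import re

# The example line from the bug description.
LINE = "3305460;;import/gcc;;i686;;error: in `/builddir/build/BUILD/gcc-8.3.0/BUILD': ;;xxx"


def split_each(line):
    """Default behaviour: every ';' ends a field, so ';;' yields an empty field."""
    return line.split(";")


def split_merged(line):
    """'Merge delimiters': a run of consecutive ';' counts as one separator."""
    return re.split(";+", line)


if __name__ == "__main__":
    print(len(split_each(LINE)))    # 9 fields, every second one empty
    print(len(split_merged(LINE)))  # 5 fields, no empty columns
```

Note that merging also collapses fields that are intentionally empty, so it is only a good fit when the data never contains empty fields.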
Comment 4 Roman Kuznetsov 2020-04-25 22:41:34 UTC
Your report looks like an RFE along the lines of "Add the ability to set any number of characters as a single separator".
Comment 5 Ming Hua 2020-05-02 21:38:42 UTC
Mikhail, please reply to the following:
1. Does my suggestion in comment #3 solve your problem?
2. Do you want to propose an enhancement about using multiple characters as a separator, like Roman said in comment #4?
Comment 6 Mikhail Novosyolov 2020-05-11 05:30:36 UTC
(In reply to Ming Hua from comment #5)
> Mikhail, please give a reply about
> 1. Does my suggestion in comment #3 solve your problem?
Yes, it does, thank you, but it is not obvious at all and I do not understand the logic behind this behaviour.
> 2. Do you want to propose an enhancement about using multiple characters as
> separator, like Roman said in comment #4?
I did not understand what Roman meant. If I specify ';;' as A separator (a = one), why does LibreOffice ignore the second ';'?
Comment 7 Mikhail Novosyolov 2020-05-11 05:32:01 UTC
E.g.
cat *.csv | awk -F ';;' '{print $1}'
works as I want
I expected the same behaviour from LibreOffice
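Editorial illustration: a rough Python equivalent of the awk one-liner above, treating ';;' as one literal multi-character separator. The helper names `fields` and `first_field` are hypothetical. One difference to keep in mind: awk interprets a multi-character `-F` value as a regular expression, while `str.split` matches the string literally; for ';;' both give the same result.

```python
def fields(line, separator=";;"):
    """Split a line on a literal multi-character separator."""
    return line.rstrip("\n").split(separator)


def first_field(line, separator=";;"):
    """Return the first field, like awk's $1 with -F ';;'."""
    return fields(line, separator)[0]


if __name__ == "__main__":
    print(first_field("3305460;;import/gcc;;i686;;xxx"))  # 3305460
    # A single ';' inside a field is kept as field content.
    print(fields("a;b;;c"))  # ['a;b', 'c']
```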
Comment 8 QA Administrators 2020-05-12 03:52:58 UTC Comment hidden (obsolete)
Comment 9 Buovjaga 2020-08-28 18:59:18 UTC
Indeed, discussed before: https://bugs.documentfoundation.org/show_bug.cgi?id=127718#c4

Stuart mentions that any Unicode glyphs can be used as separators. It does work with emojis, but you still need to merge the delimiters.
Comment 10 Heiko Tietze 2020-09-10 13:47:38 UTC
You can escape characters that shouldn't be used as delimiters, like "Foo;";"Bar";";";"Baz". There are plenty of options, and I would rather stick to the single-character separator to keep things simple and familiar. Consider users complaining that ; doesn't work after unintentionally adding a space after it, i.e. "; ". => WF
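Editorial illustration: how a quoting-aware parser reads the line from the comment above can be sketched with Python's standard `csv` module. This shows the general quoting convention, not Calc's import code; `parse_quoted` is a name made up for the example.

```python
import csv
import io


def parse_quoted(line, delimiter=";"):
    """Parse one CSV line; delimiters inside double quotes stay in the field."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


if __name__ == "__main__":
    print(parse_quoted('"Foo;";"Bar";";";"Baz"'))  # ['Foo;', 'Bar', ';', 'Baz']
```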
Comment 11 Eike Rathke 2020-09-10 19:40:00 UTC
Heiko, that does not address the original request, which asks to be able to specify a sequence of characters, like ';;' (two semicolons), to be treated as *one* separator, as the original data uses such a field delimiter. It has nothing to do with quoting or escaping extraneous semicolons. The data is probably delimited this way to cater for the case where a single semicolon could be part of the field content and the generator software is too lazy to quote and escape field content.

However, the Merge Delimiters option solves exactly this problem, except for the case of a semicolon embedded in field content. If that is really needed, then pre-process the data before importing to make it comply with the syntax of RFC 4180 (using any delimiter, not restricted to comma).
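Editorial illustration: a minimal sketch of such a pre-processing step in Python, assuming the input uses ';;' as the field separator and never quotes anything. The function name `to_rfc4180` and its parameters are hypothetical. It rewrites each line with a single ';' delimiter and lets the standard `csv` writer quote any field that contains a ';'.

```python
import csv
import io


def to_rfc4180(text, source_sep=";;", target_delim=";"):
    """Re-emit ';;'-separated text as quoted, single-delimiter CSV."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=target_delim, quoting=csv.QUOTE_MINIMAL)
    for line in text.splitlines():
        if line:  # skip blank lines
            writer.writerow(line.split(source_sep))
    return out.getvalue()


if __name__ == "__main__":
    # The embedded ';' in the first field survives because the field is quoted.
    print(to_rfc4180("a;b;;c\n"), end="")  # "a;b";c
```

The result can then be imported with semicolon as the separator and without merging delimiters.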

I'd rather not reimplement everything to allow a delimiter string instead of a delimiter character.


(In reply to Mikhail Novosyolov from comment #6)
> If I specify ';;' as A separator (a =
> one), why does LibreOffice ignore the second ';'?
In the Other input field one can specify a list of single character delimiters, not a string that is used as one delimiter.
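Editorial illustration: the "list of single character delimiters" semantics can be modelled in Python as a character set, which is why typing ';;' behaves the same as typing ';'. This is a sketch of the described behaviour only; `split_on_any` is a hypothetical helper and assumes a non-empty input field.

```python
import re


def split_on_any(line, other_field):
    """Split wherever any single character from other_field occurs."""
    # Duplicates collapse: ';;' names the same one-character delimiter as ';'.
    chars = "".join(sorted(set(other_field)))
    return re.split("[" + re.escape(chars) + "]", line)


if __name__ == "__main__":
    print(split_on_any("a;;b", ";;"))    # ['a', '', 'b'], same as with ';'
    print(split_on_any("a;b,c", ";,"))   # ['a', 'b', 'c']
```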
Comment 12 Justin L 2020-12-15 11:41:45 UTC
(In reply to Eike Rathke from comment #11)
> pre-process the data before importing to make it comply with the syntax
> of RFC 4180 (using any delimiter, not restricted to comma).

Yes - exactly. LibreOffice does not need to cater to every conceivable textual data model. Anyone trying to manipulate text data should be able to search/replace with something like a | or whatever will work for their data set.

> I'd rather not reimplement everything to allow a delimiter string instead of
> a delimiter character..

WONTFIX