Summary: | charset autodetection for csv imports | ||
---|---|---|---|
Product: | LibreOffice | Reporter: | Björn Michaelsen <bjoern.michaelsen> |
Component: | Calc | Assignee: | Not Assigned <libreoffice-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | lists, malik.a.rumi, samuel.mehrbrodt |
Priority: | medium | ||
Version: | 3.3.1 release | ||
Hardware: | Other | ||
OS: | All | ||
See Also: | https://launchpad.net/bugs/694188 | ||
Whiteboard: | |||
Crash report or crash signature: | Regression By: | ||
Bug Depends on: | |||
Bug Blocks: | 109236, 38637 | ||
Attachments: | Confirmation that this is an ongoing issue. |
Description
Björn Michaelsen
2011-03-04 08:11:09 UTC
There is already a basic implementation of charset detection in the Writer text import, SwIoSystem::IsDetectableText: http://opengrok.libreoffice.org/xref/writer/sw/source/filter/basflt/iodetect.cxx#427
It is old and ugly, but could be a starting point. Obviously, it would have to be moved out of Writer and polished a bit so that it can be used in other applications too.
It would be nice if the implementation also worked with bug 38637 - Better handling for csv-Files

[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it started right out as NEW without ever being explicitly confirmed. The bug is changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back to NEW, please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at: http://wiki.documentfoundation.org/QA/BugHunting_Session_3.5.0.-1
More detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html

Still an issue in Version: 4.4.2.2. I actually ended up using Excel for a bunch of CSV files because, when I tried to open them in LibreOffice, the import screen (defaulting to UTF-16) showed the file as a string of unintelligible Asian characters, and I was in a hurry. Once I had a bit more time, I realised that making them work in LO was as simple as changing the charset to UTF-8. Excel just worked.

Created attachment 122602
Confirmation that this is an ongoing issue.
If this has been around since 2011 and is unchanged, perhaps it isn't urgent because not enough people encounter it, but that surprises me. Perhaps people run into it, don't know what to do about it, and so don't report it? Anyway, here it is, and yes, if you just switch the charset back to UTF-8, the problem resolves.
Implemented (at least the loose Unicode UTF-16 detection) since 7.1 with https://git.libreoffice.org/core/+/85f12e47f4a086a3923dd3a6b097776d60c6dc82%5E%21/
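
For anyone wanting to check a CSV file outside of Calc, the kind of loose detection discussed in this report can be sketched roughly as below. This is only an illustration and is not taken from the linked commit or from SwIoSystem::IsDetectableText. The function name `guess_csv_charset`, the 4096-byte sample size and the 40% NUL threshold are all assumptions made up for this sketch. It looks for a byte order mark first, then checks whether NUL bytes cluster on one side of each 16-bit unit (which is what mostly-ASCII text looks like in UTF-16), and finally tries a strict UTF-8 decode.

```python
import codecs
from typing import Optional


def guess_csv_charset(data: bytes, sample_size: int = 4096) -> Optional[str]:
    """Hypothetical helper: return a Python codec name, or None if unsure."""
    # 1. A byte order mark is the strongest signal. UTF-32 is tested first
    #    because the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order

    sample = data[:sample_size]
    if not sample:
        return None

    # 2. Loose UTF-16 check (assumed heuristic): mostly-ASCII text encoded
    #    as UTF-16 has a NUL in one half of nearly every 16-bit unit.
    units = len(sample) // 2
    even_nuls = sample[0::2].count(0)
    odd_nuls = sample[1::2].count(0)
    if units:
        if odd_nuls > 0.4 * units and even_nuls == 0:
            return "utf-16-le"
        if even_nuls > 0.4 * units and odd_nuls == 0:
            return "utf-16-be"

    # 3. Strict UTF-8 decode. If the sample was cut short, tolerate a
    #    multi-byte sequence that is split at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        return None  # probably a legacy 8-bit charset; let the user choose
    return "utf-8"
```

A typical use would be `guess_csv_charset(open(path, "rb").read())`, falling back to asking the user when the result is `None`. Returning `None` instead of guessing a legacy 8-bit charset is a deliberate choice in this sketch, since those cannot be told apart reliably from the bytes alone.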