Bug 48413 - Windows: Command line bulk conversion including wildcards (*?) not working
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): Inherited From OOo
Hardware: All Windows (All)
Importance: low enhancement
Assignee: Deb
Whiteboard: BSA target:7.2.0
Keywords: difficultyBeginner, easyHack, skillCpp
Duplicates: 68647
Blocks: Commandline
Reported: 2012-04-07 05:20 UTC by zumbs
Modified: 2022-07-18 04:32 UTC

Description zumbs 2012-04-07 05:20:38 UTC
Problem description: 

The LibreOffice executable (soffice.exe) supports a number of command line arguments as described in http://help.libreoffice.org/Common/Starting_the_Software_With_Parameters. The linked document (as well as blog posts on the internet) suggests that it should be possible to use the command line to convert all MS Word documents in a directory to odf. However, the following command line does not convert anything:

soffice.exe --headless --convert-to odf --outdir C:\Docs\Out C:\Docs\In\*.doc

If the direct path is used, e.g.

soffice.exe --headless --convert-to odf --outdir C:\Docs\Out C:\Docs\In\a.doc

the specified document is converted to a.odf and placed in the out directory.

I run Windows 7 64 bit and have LibreOffice 3.5.1.2 installed. I have made sure that no instance of soffice is running before trying to use the command line.

Steps to reproduce:
1. Open command prompt to C:\Program Files (x86)\LibreOffice 3.5\program
2. Enter >soffice.exe --headless --convert-to odf --outdir C:\Docs\Out C:\Docs\In\*.doc
3. The documents in C:\Docs\In are not converted.
4. Enter >soffice.exe --headless --convert-to odf --outdir C:\Docs\Out C:\Docs\In\a.doc
5. The specified document, a.doc, is found in C:\Docs\Out.

Current behavior:
Using *.doc does not convert any documents in the input directory.

Expected behavior:
Using *.doc converts all documents in the input directory and copies the results to the output directory. Preferably, it would also iterate through subdirectories, maintaining the directory structure in the outdir, but I'm unsure if that is the intended behaviour.

Platform (if different from the browser): 
              
Browser: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0
Comment 1 Nino 2012-04-21 14:05:38 UTC Comment hidden (off-topic)
Comment 2 zumbs 2012-04-22 02:03:09 UTC
I installed LibreOffice 3.5.1rc1 and tried out the following command line argument:

C:\Docs\In>for %i in (*.doc) do soffice.exe --headless --convert-to odf --outdir C:\Docs\Out %i

Output:

C:\Docs\In>soffice.exe --headless --convert-to odf --outdir C:\Docs\Out File1.doc

C:\Docs\In>soffice.exe --headless --convert-to odf --outdir C:\Docs\Out File2.doc

C:\Docs\In>soffice.exe --headless --convert-to odf --outdir C:\Docs\Out File3.doc

The first file was correctly converted, but conversion of subsequent files failed. It would seem that Windows starts the three calls in parallel, which causes us to run into the bug where the command line fails if LibreOffice is already running (bug 37531). I saw a lot of instances of soffice.exe and soffice.bin in Task Manager.

I tried using an Ubuntu 11.04 live cd. With command line arguments similar to the one in my original bug report, I got full conversion of the contents of C:\Docs\In, so it seems like it is a Windows bug.

On a side note, I made a small .net tool to convert the files one at a time, so I no longer need a workaround.
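
A minimal sketch of such a one-at-a-time converter, written in Python for illustration (the .NET tool itself is not shown in this report, so this is not the reporter's code). The install path in `SOFFICE` and the helper names `build_command` and `convert_all` are assumptions. The script expands the wildcard itself and relies on `subprocess.run` blocking until each launched process exits, so calls are not started in parallel; it assumes the launcher process stays alive until the conversion has finished.

```python
import glob
import subprocess

# Assumed install location; adjust for the local machine.
SOFFICE = r"C:\Program Files\LibreOffice\program\soffice.exe"


def build_command(soffice, target_format, outdir, input_file):
    """Return the argument list for converting a single file."""
    return [soffice, "--headless", "--convert-to", target_format,
            "--outdir", outdir, input_file]


def convert_all(pattern, target_format, outdir, soffice=SOFFICE,
                runner=subprocess.run):
    """Expand the wildcard ourselves and convert the matches one by one."""
    converted = []
    for input_file in sorted(glob.glob(pattern)):
        # runner blocks until the process exits, so calls never overlap.
        runner(build_command(soffice, target_format, outdir, input_file),
               check=True)
        converted.append(input_file)
    return converted


# Example call (not executed here):
# convert_all(r"C:\Docs\In\*.doc", "odt", r"C:\Docs\Out")
```

The `runner` parameter exists only so the loop can be exercised without LibreOffice installed.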
Comment 3 QA Administrators 2015-01-02 16:58:07 UTC Comment hidden (obsolete)
Comment 4 zumbs 2015-01-11 16:46:12 UTC
Verified that bug is still present with LibreOffice 4.3.5.2 running on Windows 7 SP1 64 bit. (Note that as per the comments above, the issue is not present on Linux, but I have not verified that again.)
Comment 5 Maxim Monastirsky 2015-07-31 11:36:11 UTC
*** Bug 68647 has been marked as a duplicate of this bug. ***
Comment 6 QA Administrators 2016-09-20 10:21:07 UTC Comment hidden (obsolete)
Comment 7 zumbs 2016-09-20 17:12:10 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2017-10-23 14:11:28 UTC Comment hidden (obsolete)
Comment 9 Ljubomir Ljubojevic 2018-09-28 10:07:29 UTC
This bug is still present on LibreOffice 6.1.2.1 for Windows x86.

I have been seeing this problem for years, and ONLY on Windows, when the wildcard "*" is used.

*.doc will NOT work
<filename>.doc will always work.

Workaround:
Convert files using LibreOffice on Linux
Comment 10 QA Administrators 2019-09-30 02:51:50 UTC Comment hidden (obsolete)
Comment 11 zumbs 2019-09-30 18:07:54 UTC Comment hidden (obsolete)
Comment 12 jhack_jos 2020-04-29 10:05:03 UTC
I can confirm that the bug is still there on LibreOffice version 6.4.3.2 (x64) for Windows 10 Pro x64. Wildcards should definitely be handled nicely by the soffice executable in headless mode.

Example of not working command:
"C:\Program Files\LibreOffice\program\soffice.exe" -ArgumentList "--headless --convert-to pdf *.odt

Workaround: if anyone is in search of a temporary solution, you may use a powershell or batch script. Here is a simple example Powershell script I wrote to convert all odt documents inside a folder to pdf:

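# Convert each matching file in turn; -Wait blocks until soffice has exited.
Get-ChildItem *.odt | ForEach-Object {
  Write-Host "Converting `"$($_.FullName)`" ..."
  Start-Process -Wait -FilePath "C:\Program Files\LibreOffice\program\soffice.exe" -ArgumentList "--headless --convert-to pdf `"$($_.FullName)`""
}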

You may:
- change "Get-ChildItem *.odt" to your favourite input format. Ex. "Get-ChildItem *.docx"
- add -Recurse to it if you need to convert files in nested folders. Ex. "Get-ChildItem *.odt -Recurse"
- change "--convert-to pdf" to your needed output format. Ex. "--convert-to docx"

Hope this issue gets addressed.
Good luck everyone!
Comment 13 jhack_jos 2020-04-29 10:08:50 UTC
It seems you cannot edit a post on Bugzilla.
I made a small mistake when I copied and pasted the example command.

Here is the correct example of a command that does not work:
"C:\Program Files\LibreOffice\program\soffice.exe" --headless --convert-to pdf *.odt
Comment 14 Mike Kaganski 2020-07-06 05:35:13 UTC
This Windows-only enhancement needs LibreOffice to implement its own wildcard matching when pre-processing the passed command line on Windows. Unlike in *nix environments, on Windows there is no shell pre-processing, so the command line that LibreOffice receives is not already expanded.

For now, tricks like

> for %%f in ("input folder\*.sxd") do "C:\Program Files\LibreOffice\program\soffice.exe" --convert-to png --outdir "output folder" "%%f"

are needed (the example above is for batch files; if used in a console, %% should be replaced with %; see e.g. [1]).

Code pointer: the command line handling happens in desktop/source/app/cmdlineargs.cxx. The change must be Windows-only, since on other platforms the file path is an arbitrary byte sequence, which itself may include bytes like '*' or '?' (different flavors of FS may apply their own restrictions).

[1] https://ask.libreoffice.org/en/question/86800
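
The pre-processing step described above can be sketched as follows. This is a Python illustration of the idea only, not the C++ code in cmdlineargs.cxx. The function name `expand_wildcard_args`, the rule that arguments starting with "-" are treated as options, and the policy of keeping an unmatched pattern unchanged are all assumptions. Note also that Python's `glob` gives `[...]` a special meaning, which a Windows implementation would not necessarily do.

```python
import glob

WILDCARD_CHARS = ("*", "?")


def expand_wildcard_args(args):
    """Expand arguments containing * or ?, leaving all others untouched."""
    expanded = []
    for arg in args:
        is_option = arg.startswith("-")
        if not is_option and any(ch in arg for ch in WILDCARD_CHARS):
            matches = sorted(glob.glob(arg))
            # Assumed policy: keep the argument as-is if nothing matches.
            expanded.extend(matches if matches else [arg])
        else:
            expanded.append(arg)
    return expanded
```

With such a step in place, the rest of the command line handling would only ever see plain file names, just as it does when a *nix shell has expanded the pattern beforehand.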
Comment 15 Deb 2020-08-29 05:20:21 UTC
I will try this. I think I need to link setargv.obj.
Comment 16 Deb 2020-11-09 04:39:30 UTC Comment hidden (obsolete)
Comment 17 Mike Kaganski 2020-11-09 05:38:13 UTC
(In reply to Deb from comment #16)
> setargv.obj doesn't work with winmain. Trying direct calls to FindFirstFileA

Please never use *A WinAPI. Always use *W variants, *explicitly*.
Comment 18 Mike Kaganski 2020-11-09 06:24:52 UTC
(In reply to Deb from comment #16)
> setargv.obj doesn't work with winmain. Trying direct calls to FindFirstFileA

And also it's unexpected that "setargv.obj doesn't work with winmain", given the MS-specific documentation on this:

https://docs.microsoft.com/en-us/cpp/c-language/expanding-wildcard-arguments
Comment 19 Mike Kaganski 2020-11-09 06:28:21 UTC
(In reply to Mike Kaganski from comment #18)

Specifically, note that soffice.bin is a console application on Windows, which uses main, not WinMain. So it's unnecessary to rely on wildcard processing in soffice.exe (which indeed uses WinMain itself, but passes the command line arguments to the console-based soffice.bin).

But it's OK either way, whatever you consider easier.
Comment 20 Mike Kaganski 2020-11-09 07:15:17 UTC
(In reply to Deb from comment #16)
> setargv.obj doesn't work with winmain. Trying direct calls to FindFirstFileA

After looking at that MS documentation on setargv.obj, I now see where the problem is. Indeed, it will not work with LibreOffice, since it doesn't use crt's main arguments at all. It uses WinAPI to access the original command line arguments (so any crt pre-processing is ignored), and you would surely need to use the FindFirstFile WinAPI (or the like) to do the job.
Comment 21 Commit Notification 2020-11-30 09:36:06 UTC
Deb Barkley-Yeung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/7f477f8dd85c84c9c1a9e673b685dc0e03d1d45a

tdf#48413 handle wildcards on Windows

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Scott 2020-12-06 14:45:01 UTC
Just confirming... this won’t be available for another year in v7.2.0?
Comment 23 Mike Kaganski 2020-12-06 18:54:29 UTC
(In reply to Scott from comment #22)
This *will* be available in v7.2.0. It will be released in Aug 2021.
Comment 24 10686639125 2021-05-29 09:46:04 UTC
\soffice_safe.exe --convert-to pdf D:\inputdir\* --outdir D:\outputdir\

It works in 7.2.

However, how can it iterate through subdirectories?
Comment 25 Timur 2021-05-31 09:24:19 UTC
If it works, we set Verified.
I guess that subdirectory support would need another enhancement bug.
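
Until such an enhancement exists, recursion has to happen outside LibreOffice. A rough Python sketch of the planning step, assuming the output folder should mirror the input folder structure (as requested in the original description): the helper name `plan_recursive_conversion` and its parameters are hypothetical, and nothing here is part of LibreOffice itself.

```python
from pathlib import Path


def plan_recursive_conversion(input_root, output_root, pattern="*.doc"):
    """Pair every matching file under input_root with a mirrored output directory."""
    input_root = Path(input_root)
    output_root = Path(output_root)
    plan = []
    for source in sorted(input_root.rglob(pattern)):
        if source.is_file():
            # Mirror the file's folder, relative to the input root,
            # under the output root.
            target_dir = output_root / source.parent.relative_to(input_root)
            plan.append((source, target_dir))
    return plan
```

Each (source, target_dir) pair can then be passed to a single-file conversion call, with target_dir as the --outdir value.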
Comment 26 Timur 2022-07-14 13:19:57 UTC
As noted in bug 148275, since LO 6.3 the proper console-mode launcher on Windows is `soffice.com`, so the call should use "soffice.com" or just "soffice", not soffice.exe.
Only then does it wait for a command to finish before continuing with the next one.

https://mikekaganski.wordpress.com/2018/11/21/proper-console-mode-for-libreoffice-on-windows/
Comment 27 asbjoern.skoedt 2022-07-17 23:18:11 UTC
Is this working or not? I get this error:


At line:1 char:50
+ "C:\Program Files\LibreOffice\program\soffice" --headless --convert-t ...
+                                                  ~~~~~~~~
Unexpected token 'headless' in expression or statement.
At line:1 char:1
+ "C:\Program Files\LibreOffice\program\soffice" --headless --convert-t ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The '--' operator works only on variables or on properties.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : UnexpectedToken
Comment 28 Mike Kaganski 2022-07-18 04:32:41 UTC
(In reply to asbjoern.skoedt from comment #27)

Your problem has nothing to do with this bug; you need to learn how to use PowerShell to call programs. Specifically, operator & is used for that in that environment [1]:

> & "C:\Program Files\LibreOffice\program\soffice" --headless --convert-to ...

[1] https://www.delftstack.com/howto/powershell/powershell-run-exe/