This issue occurs when converting a .docx file to HTML from the command line with 'soffice --headless --convert-to "html:HTML:EmbedImages" --outdir ./dir file.docx'. If the file contains rectangles with text boxes in them, the output is inconsistent. A span is created for each rectangle, but in some cases the rectangle's text is placed inside this span while in others it is placed outside, which leaves empty spans followed by the text. In some cases I've also seen absolute positioning added to these spans, causing text to overlap, but I haven't been able to create an MWE that reproduces this. I can provide a file that reproduces the inconsistency if needed, but it can also be reproduced by creating a document, adding several rectangle shapes with text in them, saving it as .docx, and converting it with the command above.
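To check a converted file for the empty-span symptom, a small script could run the same command and count spans that contain no text. This is a minimal sketch under a few assumptions: `soffice` is on the PATH, the output is written to `./dir/<name>.html`, and any `<span>` holding only whitespace counts as "empty" (a span that holds only an embedded image would also be counted). The helper names `convert` and `count_empty_spans` are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from html.parser import HTMLParser
from pathlib import Path


class EmptySpanCounter(HTMLParser):
    """Counts <span> elements whose text content is only whitespace."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one text buffer per currently open <span>
        self.empty = 0
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append([])
            self.total += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            text = "".join(self.stack.pop())
            if not text.strip():
                self.empty += 1
            # Pass the text up so a span wrapping a non-empty span is not counted as empty.
            if self.stack:
                self.stack[-1].append(text)

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].append(data)


def count_empty_spans(html: str) -> tuple[int, int]:
    """Return (empty_spans, total_spans) for an HTML string."""
    parser = EmptySpanCounter()
    parser.feed(html)
    parser.close()
    return parser.empty, parser.total


def convert(docx: Path, outdir: Path) -> Path:
    """Run the conversion command from the report; assumes soffice names the output <stem>.html."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "html:HTML:EmbedImages",
         "--outdir", str(outdir), str(docx)],
        check=True,
    )
    return outdir / (docx.stem + ".html")


if __name__ == "__main__":
    html_path = convert(Path(sys.argv[1]), Path("./dir"))
    empty, total = count_empty_spans(html_path.read_text(encoding="utf-8", errors="replace"))
    print(f"{empty} of {total} spans are empty")
```

Running it as `python check_spans.py file.docx` on a document with several text-filled rectangles would give a rough count of how many rectangle spans lost their text.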
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided. (Please note that the attachment will be public; remove any sensitive information before attaching it. See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Created attachment 175759: Test file to test the inconsistencies
Repro with file.
Arch Linux 64-bit
Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: ffc23650d988051bf9fe43edeb4e16096907b080
CPU threads: 8; OS: Linux 6.0; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 19 October 2022