Bug 39795

Summary: ACCESSIBILITY: Writer XHTML export loses language information [accessibility]
Product: LibreOffice
Reporter: Christophe Strobbe <c_strobbe-fdo>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW
Severity: enhancement
CC: c_strobbe-fdo, m.weghorn, sasha.libreoffice, vsfoote
Priority: medium
Keywords: accessibility
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on: 39937    
Bug Blocks: 101912, 108799    

Description Christophe Strobbe 2011-08-03 03:18:38 UTC
When an OpenDocument Text file is exported to XHTML, the exported code contains no lang attributes, either to identify the document's default language or to mark language changes inside the document.

Steps to reproduce the issue:
1. Create a new Writer document and insert some text in English.
2. Add a paragraph in French (e.g. copy something from fr.wikipedia.org).
3. Go to File > Export and choose XHTML.
4. Inspect the exported XHTML file in a source code editor and search for 'lang="'.
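Step 4 can also be scripted. The following is a minimal sketch, assuming the exported file is well-formed XML that Python's ElementTree can parse (named HTML entities such as &nbsp; that are not declared in the file itself would make parsing fail). The function names are hypothetical helpers, not part of LibreOffice.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_lang_attributes(root: ET.Element) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (local tag name, lang, xml:lang) for each element declaring a language."""
    found = []
    for elem in root.iter():
        lang = elem.get("lang")
        xml_lang = elem.get(XML_LANG)
        if lang is not None or xml_lang is not None:
            # "{http://www.w3.org/1999/xhtml}p" -> "p"
            local_name = elem.tag.rsplit("}", 1)[-1]
            found.append((local_name, lang, xml_lang))
    return found


def report_file(path: str) -> None:
    """Print every language declaration found in an exported XHTML file."""
    root = ET.parse(path).getroot()
    declarations = find_lang_attributes(root)
    if not declarations:
        print("no lang or xml:lang attributes found")
    for tag, lang, xml_lang in declarations:
        print(f"<{tag}> lang={lang!r} xml:lang={xml_lang!r}")
```

Calling report_file() on the exported file (any path you choose) lists the html element and any paragraph or span that declares a language, which makes it easy to see whether the French paragraph from step 2 was marked up.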

What the XHTML *should* have is:
1. lang="en" (possibly lang="en-US" or lang="en-GB", depending on the language specified for the Writer document) on the HTML element;
2. lang="..." on elements where the language changes compared to the immediate context (i.e. nearest ancestor).
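Rules 1 and 2 amount to a single tree walk: the root html element always gets a lang attribute, and any other element gets one only when its language differs from the language inherited from its nearest ancestor. The sketch below uses a hypothetical Node class as a stand-in for whatever paragraph and span structure an export filter walks; it is not LibreOffice's implementation. Tags are compared case-insensitively, since BCP 47 language tags are case-insensitive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an exported element (html, body, p, span, ...)."""
    tag: str
    lang: str                                   # effective language, e.g. "en-GB"
    children: List["Node"] = field(default_factory=list)
    lang_attr: Optional[str] = None             # set by assign_lang_attributes()


def assign_lang_attributes(node: Node, inherited: Optional[str] = None) -> None:
    """Set lang_attr only where the language differs from the nearest ancestor."""
    # The root has nothing to inherit, so it always receives a lang attribute.
    if inherited is None or node.lang.lower() != inherited.lower():
        node.lang_attr = node.lang
    else:
        node.lang_attr = None
    for child in node.children:
        assign_lang_attributes(child, node.lang)


def declared(node: Node) -> List[Tuple[str, str]]:
    """List (tag, lang) for every node that ended up with a lang attribute."""
    out = [(node.tag, node.lang_attr)] if node.lang_attr else []
    for child in node.children:
        out.extend(declared(child))
    return out
```

For the test document in the steps above, this yields lang on html and on the French paragraph only, while the English paragraphs inherit from html. When writing XHTML 1.0, the same value would usually be written to both lang and xml:lang, following the compatibility guidelines in Appendix C of the XHTML 1.0 specification.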

Notes:
* xml:lang is also in use, but it is not supported by screen readers or by software for people with dyslexia. Blind users rely on screen readers to convert content to synthetic speech and/or Braille, and correct language identification is essential for both.
* Dublin Core metadata (e.g. <meta name="DCTERMS.language" content="en-US"...) specifies the language of the intended audience, not the text-processing language.

Background:
* <http://www.w3.org/International/tutorials/language-decl/#Slide0140>: "Declaring the text-processing language" (in W3C tutorial);
* WCAG 2.0 technique H57: Using language attributes on the html element: <http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H57>
* WCAG 2.0 technique H58: Using language attributes to identify changes in the human language: <http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H58.html>
Comment 1 Christophe Strobbe 2011-08-08 10:33:32 UTC
Added dependency on Bug 39937 because the XSLT for XHTML export assumes that a dc:language element exists.
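A more defensive lookup would treat dc:language as optional. The sketch below reads dc:language from meta.xml inside an .odt package and returns None when the element is missing or empty, so that a caller can fall back to another source (possibly the language set on the default style in styles.xml) instead of emitting an empty attribute. It is written in Python rather than XSLT purely for illustration, and the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def language_from_meta(meta_xml: bytes) -> Optional[str]:
    """Return the dc:language value from meta.xml, or None if absent or empty."""
    root = ET.fromstring(meta_xml)  # root element is office:document-meta
    elem = root.find("office:meta/dc:language", NS)
    if elem is None or elem.text is None or not elem.text.strip():
        return None
    return elem.text.strip()


def document_language(odt_path: str) -> Optional[str]:
    """Read meta.xml from an .odt package; None means the caller must fall back."""
    with zipfile.ZipFile(odt_path) as zf:
        try:
            meta = zf.read("meta.xml")
        except KeyError:  # no meta.xml entry in this package
            return None
    return language_from_meta(meta)
```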
Comment 2 Björn Michaelsen 2011-12-23 12:28:24 UTC Comment hidden (obsolete)
Comment 3 sasha.libreoffice 2012-01-08 21:33:47 UTC
reproduced in LibO 3.5.0 beta 1
Comment 4 Stéphane Guillou (stragu) 2021-05-18 06:29:36 UTC
Confirmed in 7.2 Alpha, although the situation seems to have slightly improved:

There is now a lang attribute on the html element at the top, but there is none on the body element or on individual paragraphs.

Note that simply copying and pasting from French Wikipedia did not assign the French language to the paragraph in LibreOffice: I had to select the text manually and set its language to French using the language menu in the status bar.

I can also confirm that "Save As > HTML" does output lang attributes on both the body element and individual paragraphs.

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 6b09276d157abada74e1a4989700139167207778
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-14_04:32:30
Calc: threaded