Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong reading of literal-block content in PO files #120

Open
rffontenelle opened this issue Sep 18, 2024 · 9 comments
Open

Wrong reading of literal-block content in PO files #120

rffontenelle opened this issue Sep 18, 2024 · 9 comments

Comments

@rffontenelle
Copy link
Contributor

rffontenelle commented Sep 18, 2024

CPython enabled literal-block in gettext_additional_targets, so literal blocks are being extracted to translation files by the gettext builder. Since then, incorrect reports are showing up when running sphinx-lint 1.0.0, pointing to lines of msgids/source messages (English) instead of the msgstr/translation message.

Examples (make sure to rename removing .txt of the upload filename if want to test):

license.po (doc) shows:

license.po:998: found an unbalanced inline literal markup. (unbalanced-inline-literals-delimiters)
license.po:1410: found an unbalanced inline literal markup. (unbalanced-inline-literals-delimiters)
license.po:2029: found an unbalanced inline literal markup. (unbalanced-inline-literals-delimiters)
license.po:2038: found an unbalanced inline literal markup. (unbalanced-inline-literals-delimiters)
license.po:2503: found an unbalanced inline literal markup. (unbalanced-inline-literals-delimiters)

and library/datetime.po (doc) shows:

library/datetime.po:1601: trailing whitespace (trailing-whitespace)
library/datetime.po:1613: trailing whitespace (trailing-whitespace)
library/datetime.po:2056: trailing whitespace (trailing-whitespace)
library/datetime.po:2159: trailing whitespace (trailing-whitespace)
library/datetime.po:2931: trailing whitespace (trailing-whitespace)
library/datetime.po:3029: trailing whitespace (trailing-whitespace)
library/datetime.po:3031: trailing whitespace (trailing-whitespace)
library/datetime.po:3041: trailing whitespace (trailing-whitespace)
library/datetime.po:3056: trailing whitespace (trailing-whitespace)
@m-aciek
Copy link

m-aciek commented Oct 31, 2024

The messages for literal-blocks in PO files are no different than regular messages. We could try to use a custom flag to mark literal-blocks to change sphinx-lint behaviour (will a non-standard flag break gettext?). It would need to be added by Sphinx. Alternatively try to mark in the translator comment.

white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgctxt previous-context
#| msgid previous-untranslated-string
msgctxt context
msgid untranslated-string
msgstr translated-string

@JulienPalard
Copy link
Collaborator

I'm really not a fan of having Python code in po files, we disabled it french-side by adding -Dgettext_additional_targets=['index'] to our sphinx-build invocation. I think I would have preferred for this to be enabled explicitly language by language instead of upstream.

I don't think a non-standard flag would break the po format, I've seen #, python-format flags, I bet those are not standard, but ... reading the po spec would probably be a better solution than guessing :D

@m-aciek
Copy link

m-aciek commented Dec 2, 2024

Trailing spaces in datetime.po are present in the original snippets and could be fixed upstream, as they don't change the rendering.

In license.po file the failing linting comes from characters used as quotes. We could also start a discussion to fix them upstream.

Also I confirm the wrong line numbers printed by sphinx-lint. I wonder if those are not connected with line-breaks in formatted PO file vs sphinx-lint counting only \n inside concatenated msgstrs.

The following changes fix the linting errors in attached files:

Index: license.po
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/license.po b/license.po
--- a/license.po	(revision 03299b43007cbf6827bc10c6e9238efb5d02608a)
+++ b/license.po	(date 1733177435150)
@@ -1024,7 +1024,7 @@
 "   may be used to endorse or promote products derived from this software\n"
 "   without specific prior written permission.\n"
 "\n"
-"THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND\n"
+"THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ''AS IS'' AND\n"
 "ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n"
 "IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n"
 "ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE\n"
@@ -1433,7 +1433,7 @@
 "   notice, this list of conditions and the following disclaimer in the\n"
 "   documentation and/or other materials provided with the distribution.\n"
 "\n"
-"THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND\n"
+"THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ''AS IS'' AND\n"
 "ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\n"
 "IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\n"
 "ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE\n"
@@ -2050,7 +2050,7 @@
 "\n"
 "Permission is hereby granted, free of charge, to any person obtaining\n"
 "a copy of this software and associated documentation files (the\n"
-"``Software''), to deal in the Software without restriction, including\n"
+"''Software''), to deal in the Software without restriction, including\n"
 "without limitation the rights to use, copy, modify, merge, publish,\n"
 "distribute, sublicense, and/or sell copies of the Software, and to\n"
 "permit persons to whom the Software is furnished to do so, subject to\n"
@@ -2059,7 +2059,7 @@
 "The above copyright notice and this permission notice shall be included\n"
 "in all copies or substantial portions of the Software.\n"
 "\n"
-"THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND,\n"
+"THE SOFTWARE IS PROVIDED ''AS IS'', WITHOUT WARRANTY OF ANY KIND,\n"
 "EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n"
 "MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n"
 "NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n"
@@ -2525,7 +2525,7 @@
 "   notice, this list of conditions and the following disclaimer in the\n"
 "   documentation and/or other materials provided with the distribution.\n"
 "\n"
-"THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR\n"
+"THIS SOFTWARE IS PROVIDED BY THE AUTHOR ''AS IS'' AND ANY EXPRESS OR\n"
 "IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES\n"
 "OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.\n"
 "IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,\n"
Index: library/datetime.po
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/library/datetime.po b/library/datetime.po
--- a/library/datetime.po	(revision 03299b43007cbf6827bc10c6e9238efb5d02608a)
+++ b/library/datetime.po	(date 1733177100036)
@@ -1669,7 +1669,7 @@
 "\n"
 ">>> # Métodos para extrair 'componentes' em diferentes calendários\n"
 ">>> t = d.timetuple()\n"
-">>> for i in t:     \n"
+">>> for i in t:\n"
 "...     print(i)\n"
 "2002                # ano\n"
 "3                   # mês\n"
@@ -1681,7 +1681,7 @@
 "70                  # 70º dia no ano\n"
 "-1\n"
 ">>> ic = d.isocalendar()\n"
-">>> for i in ic:    \n"
+">>> for i in ic:\n"
 "...     print(i)\n"
 "2002                # ano em ISO\n"
 "11                  # número da semana em ISO\n"
@@ -2114,7 +2114,7 @@
 ">>> datetime.fromisoformat('2011-11-04 00:05:23.283+00:00')\n"
 "datetime.datetime(2011, 11, 4, 0, 5, 23, 283000, tzinfo=datetime.timezone."
 "utc)\n"
-">>> datetime.fromisoformat('2011-11-04T00:05:23+04:00')   \n"
+">>> datetime.fromisoformat('2011-11-04T00:05:23+04:00')\n"
 "datetime.datetime(2011, 11, 4, 0, 5, 23,\n"
 "    tzinfo=datetime.timezone(datetime.timedelta(seconds=14400)))"
 
@@ -2202,7 +2202,7 @@
 ">>> date_string = \"02/29\"\n"
 ">>> when = datetime.strptime(f\"{date_string};1984\", \"%m/%d;%Y\")  # Evita "
 "o bug do ano bissexto.\n"
-">>> when.strftime(\"%B %d\")  \n"
+">>> when.strftime(\"%B %d\")\n"
 "'February 29'"
 
 #: ../../library/datetime.rst:1130
@@ -2974,7 +2974,7 @@
 "'2015-01-01T12:30:59.000000'"
 msgstr ""
 ">>> from datetime import datetime\n"
-">>> datetime.now().isoformat(timespec='minutes')   \n"
+">>> datetime.now().isoformat(timespec='minutes')\n"
 "'2002-12-25T00:00'\n"
 ">>> dt = datetime(2015, 1, 1, 12, 30, 59, 0)\n"
 ">>> dt.isoformat(timespec='microseconds')\n"
@@ -3116,9 +3116,9 @@
 "datetime.datetime(2005, 7, 14, 12, 30)\n"
 "\n"
 ">>> # Usando datetime.now()\n"
-">>> datetime.now()   \n"
+">>> datetime.now()\n"
 "datetime.datetime(2007, 12, 6, 16, 29, 43, 79043)   # GMT +1\n"
-">>> datetime.now(timezone.utc)   \n"
+">>> datetime.now(timezone.utc)\n"
 "datetime.datetime(2007, 12, 6, 15, 29, 43, 79060, tzinfo=datetime.timezone."
 "utc)\n"
 "\n"
@@ -3130,7 +3130,7 @@
 ">>> # Usando datetime.timetuple() para obter uma tupla de todos os "
 "atributos\n"
 ">>> tt = dt.timetuple()\n"
-">>> for it in tt:   \n"
+">>> for it in tt:\n"
 "...     print(it)\n"
 "...\n"
 "2006    # ano\n"
@@ -3145,7 +3145,7 @@
 "\n"
 ">>> # Data em formato ISO\n"
 ">>> ic = dt.isocalendar()\n"
-">>> for it in ic:   \n"
+">>> for it in ic:\n"
 "...     print(it)\n"
 "...\n"
 "2006    # ano ISO\n"

@rtobar
Copy link
Contributor

rtobar commented Dec 2, 2024

On the Spanish translation we've also encountered a few problems, with these sphinx-lint errors being one of them. I thought upstream ran sphinx-lint too? We've been fixing them on each msgstr as we find them, but haven't bothered yet reporting upstream.

I'm really not a fan of having Python code in po files, we disabled it french-side by adding -Dgettext_additional_targets=['index'] to our sphinx-build invocation

@JulienPalard our of curiosity, how does that flag work? Although I've used sphinx-build to generate the .pot files from the cpython documentation, I'm not well versed in how you can change that process, and the extra flag you mention doesn't mention anything about code blocks.

I'm currently a bit torn ATM about whether including those blocks was a good or bad idea. Yes, at the moment they are giving all these new problems (spelling is the other big one), but being able to offer translated comments in code snippets is yet another bit of help for people new to Python. I'm hopeful things will settle down in the future.

I've seen #, python-format flags, I bet those are not standard, but ...

It is: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html#index-no_002dpython_002dformat-flag

I've been working on a few po-related fixes in powrap, polib, babel and gettext, and I managed to learn one or two things on the way 😄

@rffontenelle
Copy link
Contributor Author

I'm really not a fan of having Python code in po files, we disabled it french-side by adding -Dgettext_additional_targets=['index'] to our sphinx-build invocation

@JulienPalard our of curiosity, how does that flag work? Although I've used sphinx-build to generate the .pot files from the cpython documentation, I'm not well versed in how you can change that process, and the extra flag you mention doesn't mention anything about code blocks.

You may see that Python doc's conf.py now has 'literal-block' as target of gettext_additional_targets Sphinx confval. Because of that, generating POT files will extract literal blocks as well and make available for translation. What Julien did with that flag is to override such confval disabling 'literal-block'.

In Brazilian Portuguese, we have a workaround to is basically this:

  1. changes to PO files are committed
  2. enter cpython/Doc dir,
  3. run make gettext SPHINXOPTS='-Dgettext_additional_targets=["index"]'

    NOTE: po files are placed in cpython/Doc/locales/pt_BR/LC_MESSAGES, so I don't pass --locale-dir to sphinx-intl

  4. sphinx-intl update --pot cpython/Doc/build/gettext --language pt_BR

    Now PO files do not have strings from literal blocks

  5. Now run sphinx-lint to get the usual output
  6. Finally undo the changes caused by the step 4 (we want to keep literal block translations)

@m-aciek
Copy link

m-aciek commented Dec 3, 2024

I thought upstream ran sphinx-lint too? We've been fixing them on each msgstr as we find them, but haven't bothered yet reporting upstream.

Now code-blocks are excluded from ReStructuredText linting as they are not containing the rst syntax, but in majority Python code instead. We could somehow enable or allow the trailing whitespaces check in code-blocks to make sphinx-lint recognise the issues.

@m-aciek
Copy link

m-aciek commented Dec 4, 2024

Extra trailing spaces in datetime.po are related to doctest comments, that are being removed from rendered content. Examples:

>>> for i in t:     # doctest: +SKIP

>>> for i in ic:    # doctest: +SKIP

>>> datetime.fromisoformat('2011-11-04T00:05:23+04:00')   # doctest: +NORMALIZE_WHITESPACE

>>> when.strftime("%B %d")  # doctest: +SKIP

The code in Sphinx transformation that processes the doctest blocks could remove the trailing whitespaces, not only comments starting from hash character.

@m-aciek
Copy link

m-aciek commented Dec 4, 2024

I've proposed a change in Sphinx that covers the trailing spaces in code blocks: sphinx-doc/sphinx#13164.

@m-aciek
Copy link

m-aciek commented Jan 24, 2025

I've proposed a change in Sphinx that covers the trailing spaces in code blocks: sphinx-doc/sphinx#13164.

It's merged now and planned to be released along Sphinx 8.2.0 version. That will solve the case of doctest trailing whitespaces. We are left only with unbalanced inline literal markup from the original issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants