-
-
Notifications
You must be signed in to change notification settings - Fork 818
Use gettext for recurring phrases, fix a bunch of i18n issues!!!!! #684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: lektor
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes. I'm really sweating right now after slaving away a solid 3 hours of my life that I'll never get back but am happy to give away... so need to step away from this for now, updating all the po files by hand is going to be hard, so @freakboy3742 or any adults w/ a credit card who signed up for a deepL key, maybe we should just deepL all the strings? Or I'll copy them in tomorrow or after dinner today.
BeeWare.lektorproject
Outdated
@@ -83,7 +83,7 @@ locale = fa_IR | |||
lektor-github-repos = 0.1.1 | |||
lektor-gravatar = 0.1.3 | |||
lektor-markdown-admonition = 0.3.1 | |||
git+https://github.com/beeware/lektor-i18n-plugin@v0.5.4 = | |||
../lektor-i18n-plugin = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to replace with new version after beeware/lektor-i18n-plugin#6 gets merged. That is just for extra safety, seem to work fine without it but you never know when the internal impls of any of those programs change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
../ is my local checkout I used to update the pos
@@ -1,2 +1,3 @@ | |||
[jinja2: **/templates/**.html] | |||
encoding = utf-8 | |||
trimmed = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -9,6 +9,9 @@ | |||
|
|||
@pass_context | |||
def translate(context, string, bag_name="translate"): | |||
if bag_name == 'translate': | |||
raise RuntimeError("Use the new gettext system instead") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing this PR. eventually I will replace all other bags as well.
@@ -8,16 +8,16 @@ | |||
<div class="container"> | |||
<p>{{ breadcrumbs(this) }}</p> | |||
<h1>{{ this.title }}</h1> | |||
<p>{{ "posted_by"|trans }} | |||
{% if this.mastodon_handle %} | |||
{% set author_link %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[sweating intensifies]
templates/event.html
Outdated
{% elif this.event_type == "keynote" %} | ||
<p>{% trans title=this.title, url=this.url, talk_title=this.talk_title %}{{ speakers_list }} will be keynoting at {{ title }}, giving a presentation entitled "<a href="{{ url }}">{{ talk_title }}</a>".{% endtrans %}</p> | ||
{% elif this.event_type == "tutorial" %} | ||
<p>{% trans title=this.title, url=this.url, talk_title=this.talk_title %}{{ speakers_list }} will be presenting a tutorial at {{ title }} entitled "<a href="{{ url }}">{{ talk_title }}</a>".{% endtrans %}</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the duplication here... but I realize I can't do away with it.
@HalfWhitt I apologize that some code you wrote for translation bags might get deleted. The bird has flown away... (continuing the extended metaphor in the other thread) |
Yikes... looks like if you're doing _(name) when name is a variable, it does not get extracted... this happens with the badge macro. Which means this PR must be reviewed extremely carefully for such things. |
FYI -- marked some arabic strings as fuzzy b/c cannot figure out where to put periods correctly... i'm using textedit to edit the po files directly, spent some time yak shaving to use emacs po-mode but realized I don't know how to use emacs... |
Yikes. More strings not extracted out of event.html... |
Isolating event.html into a seperate directory and deleting random components shows that deleting
will cause babel to extract properly. |
Progress: python-babel/babel#1216 reports a simple MWE for this situation. |
Given that this seems to be a bug, I'm going to work around it by separating the logic into a seperate macro. To pass Python dicts around, I will put filters that converts to and from JSON in our plugin. [sweats] |
And... now the messages are extracted! Lektorbuilding to update the translation files and then committing. |
FYI beeware/lektor-i18n-plugin#5 caused a huge diff in 42a8f3c because of the way xgetttext seems to output stuff differently, but a cursory diff will get you the conclusion that it's actually good because now the difference between the format of the pot and the po file is 0, sans the translated strings. |
@freakboy3742 @HalfWhitt (latter -- since you started the freeform) I'd like some preliminary comments on this before I go copy all the strings into the po files and suggestions on how to work around the issue linked #684 (comment) ? 'cause this is going to produce huge, multithousand-line diffs. Read the above comments though, if you have time, especially the last one. Thanks |
Maybe CI is using Ubuntu 24.04 with gettext 0.21 while I'm using gettext 0.25 on macOS and we're hitting the test case over here at https://github.com/translate/translate/pull/5439/files which differentiates b/w <=0.23 and >0.23... See: https://launchpad.net/ubuntu/noble/+package/gettext OK... time to go yak shaving tomorrow to install gettext 0.21... |
Hmm... maybe let me try my ubuntu 22.04 vm tomorrow. See how that works. |
|
||
#: https://beeware.org/ (content/contents+en.lr:button-block.label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI beeware/lektor-i18n-plugin#5 caused a huge diff in 42a8f3c because of the way xgetttext seems to output stuff differently, but a cursory diff will get you the conclusion that it's actually good because now the difference between the format of the pot and the po file is 0, sans the translated strings.
Not in the specific hash but looking at these changes
This is resolved now. |
FWIW: I can't figure out why the zh_CN has this huge wrapping diff. It's formatted using Ubuntu 24.04 on GitHub codespaces; it matches the version used in CI. Maybe some of my plugin changes? The CI isn't re-wrapping it back, either. |
Strangely, using the new po file with the state of beeware.github.io before this PR causes an update translation, but does not re-wrap the strings back. What the heck is even going on here? I am totally confused to why on Ubuntu 22.04 and 24.04 both, they both would rewrap the strings from how Weblate wraps them... but this is not happening on the main branch! And copying the new PO file onto the main branch and updating translations in the CI in my fork does not wrap them back! |
@freakboy3742 I'm sorry if I'm coming across as pushy, but now that (unlike the description at the top says) the dependency for lektor-i18n is pointing at my fork, and also since this seem to require 2 builds when I'm lektor-servering sometimes for some reason, I'd like preview applied to see if this works with the CI pipeline. Also -- I still can't figure out why the huge wrapping diff. Seems like the diff is on some newer strings I've (and the other contributers have) translated -- the old ones aren't getting modified. I can't pinpoint the exact issue, though... and now I'm sorta stuck in this aspect. If you get these translation files from Weblate into CI it'd probably rewrap all the strings every time there is a new transltion... so this is an issue we need to solve here... I have completely no idea of what's going on. Another issue is how to test lektor-i18n-plugin... some of the logic is easy to get wrong in there. [note I ended up carrying my macbook for some other reason so I have access to a machine to work on this (and togax_qt) in some extremely limited time.] Thanks for taking a look @ this! |
So - I think what you're asking here is "can you please apply the preview tag so I can see if this works in CI". However, because it's 5 paragraphs long, including some digressions about diffs and the computers you have available while travelling, it's difficult to tell. The diff thing may be a request for assistance/explanation on one aspect of the patch; if that's the case, I honestly couldn't tell you what aspect of the patch you're asking for feedback on. You have been asked, on many occasions, to be more intentional when you're communicating. Github comments aren't a social forum, and they're not made better by having an unfiltered stream of consciousness rendered in text. If you're asking someone for something specific - ask for that specific thing. Additional detail only serves to confuse the issue, obscuring what you actually want behind a veil of irrelevant details. |
Visit the preview URL for this PR (updated for commit 2ebe489): https://beeware-org--pr684-jinjai18n-04pcm3hg.web.app (expires Mon, 11 Aug 2025 08:51:55 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 Sign: b0da44bc067e7d9a4255c77cb2c5fce572218cec |
Oh… talking about how Ubuntu 24.04 keeps wanting to rewrap the Weblate output when building this PR but not on main. Might be related to the POT formatting changes… I’d need to get a minimal example working to see what is going on. |
Okay... I think it's sort of clear that the rewrapping bug may be an error in Translate Toolkit. I have made an minimum working example and posted at translate/translate#5661 @freakboy3742 So what I'm actually asking for next besides potential ways the bug is happening (which I already filed over there) is for you to take a look at the plugin changes and see if you're okay with them. Doing this one step at a time, let's get that reviewed and merged first before we discuss more issues about this PR here. Thanks! |
Fwiw before I get on an airplane: the line wrapping thing is a translate toolkit / weblate bug…not sure why it’s not happening to main yet. (probably because there hasn’t been a bunch of translation update commits in awhile… seems that wrap only happens when the pot gets altered.) |
FWIW: I had no idea why this sort of thing happened with the line wrapping... msgmerge seem to rewrap less but msgcat rewraps every time -- I was using msgcat in one of the old commits in lektor-i18n-plugin before (to reformat everything once they're done) on the final po file and that reformats. I've pasted in old-formatted strings into the new zh_CN po and clean and built and now there's no changes from the old-formatted strings -- so all this rewrap is one-off (hopefully) due to probably mismatched lektor plugin reinstalls, but I'm not reverting the rewrap b/c it seems to be consistent with what gettext would usually produce anyways. @freakboy3742 -- this is ready for you to take a look at. sorry for the noise |
@freakboy3742 I'm sort of sorry for the messy summmary... but I have no idea how to explain the changes I made in a more concise way... could a look be taken at this? Thank you very much. |
I'm under a crunch preparing for a conference trip, so I'm going to be short on time for the next couple of weeks. I'll take a look if I get a chance, but I wouldn't bet on that happening soon. However - as you've already identified - the PR is its own worst enemy. How can you explain the changes more concisely? Start by summarizing what you have done. What have you done? Why is it required? Why have you taken the approach you have? I read the PR summary here, and my eyes glaze over. I still only have the vaguest idea what you're trying to accomplish, and the mechanism by which it has been achieved. The first block of substantive content in the summary should not be a multi-paragraph ramble about the nature of fuzziness. If fuzziness is an issue - then start with "I've done X; this has some issues with fuzziness, details to follow" (or similar). |
@freakboy3742 Thanks for the tips for preparing a good PR summary. I will do some extensive refactoring. |
@freakboy3742 I have updated the PR summary to be more concise and a bit more logically organized in terms of what this does. Feel free to take a look at this and the plugin PR if you have a chance. Note -- sorry for weird formatting in the description I used Microsoft Word to write the Markdown code but it pasted as an image to I pasted into textedit and then pasted the code into this PR. Thank ya |
Fixes #683, fixes #692, fixes #689 (3rd through plugin).
Motivation
The existing model for maintaining translations for recurring phrases (databags) causes some issues; translators often do not get the nessacary context, the strings are not full sentences or the entire label, and that the strings for other languages aren't nessacarily updated when a new English key is added; in addition, this is an ad-hoc format and there is no way to keep track of whether a translation is a genuine human one or a machine translation, or how much work we've done in translating databags.
gettext is the standard solution for most translation issues; lektor-i18n-plugin provides functionality to translate templates properly; this PR contains the nessacary setup, with a PR on lektor-i18n-plugin (beeware/lektor-i18n-plugin#6) containing some flags ensuring the sorting of strings for safety.
While we're updating the lektor-i18n-plugin, since a new release will take some time, some unrelated i18n issues have also been addressed.
Summary
In this PR:
Since I figured that updating the plugin isn’t really done often, I decided to fix all bugs I can find in the plugin to avoid the need to update lektor-i18n-plugin repo again. The changes for the i18n plugin includes the following list; I was trying to fix all the bugs since I figured making a release would be complicated, some of those changes aren’t really relevant to this PR.
Project-Id-Version
header entry when generating PO files from the POT for the first time, since we’re in a position to know what the project name is in the plugin.Answers about Huge Diffs on Files
Things I marked as Fuzzy or Need Editing
Editing Done
Questions
https://github.com/beeware/beeware.github.io/blob/lektor/BeeWare.lektorproject#L17-L20 (EDIT -- normalized by me already)