Generate PO files only for components that contain some localized code #2024

Open
hejtmii opened this issue Sep 2, 2024 · 9 comments

@hejtmii

hejtmii commented Sep 2, 2024

Is your feature request related to a problem? Please describe.
Our codebase is split into a large number of small files. Many of them are just app logic / Redux reducers / thunks etc.
We prefer the option of storing localizations for each component separately using the {name} macro (ideally in the same directory as the component, so we can let AI help us with it and have it coupled with nearby source code to provide extra context; I will file a separate feature request for that...).
But it turns out that extract generates "empty" PO files even for code that doesn't need them, which kind of spams our repo.

Describe proposed solution
Do not generate PO files for source files without localizations

Describe alternatives you've considered
Make it configurable, keeping current behavior as default so that a breaking change is not introduced?

Additional context
Please let me know whether a PR for this could be accepted, or whether there is some important internal reason why you decided to generate PO files even for components without localizations.

If the idea is passable, I could create a PR for that.

@timofei-iatsenko
Collaborator

Your use case is not something Lingui users usually do. Usually catalogs are created for the whole app or for a slice of it, and the paths to these catalogs are used somewhere in the runtime code, so creating catalogs even if they are empty is expected behaviour.

Regarding your changes, I'm not sure the value of this feature would be worth the effort of having and maintaining it in the codebase.

How are you going to load these catalogs after all? Have loading code in every component?

Maybe it's better/easier for you to use the Lingui API and write your own extractor for your specific case.

@hejtmii
Author

hejtmii commented Sep 4, 2024

How are you going to load these catalogs after all? Have loading code in every component?

I assume this relates to the "ideally in the same directory as the component" part, right? I am discussing that part separately here #2024, assuming that the build process could collect the catalogs recursively from the whole app and link the result from the root of the app, similar to what is outlined here https://lingui.dev/ref/conf#catalogsmergepath

Need to say I don't yet fully understand the life cycle of it.

Anyway, this issue is mainly about the empty files. I am interested in not generating the empty files even in the typical scenarios described in the examples https://lingui.dev/ref/conf#examples

The thing is that our app consists of:

  • Several hundred .tsx files with components, of which only about half need localization. This adds several hundred "empty" files to our repo that aren't needed at all.
  • Over a thousand .ts files, mostly with business logic, of which fewer than a hundred need localization (typically ones that handle validation logic or define messages for enums or other data-driven messages). This adds about a thousand "empty" files to our repo that aren't needed at all.

So overall about 80% of our PO files are "empty", which doesn't feel right...

And we don't want to have everything in a single PO file because it would be extremely complicated to translate just parts of it related to specific component with the AI when a part of our codebase changes.

@Palid

Palid commented Oct 8, 2024

+1 for what @hejtmii is talking about, I tried having one big catalog per language (en.po and no.po), and it was unmaintainable - every time anyone changed pretty much anything in any file that had a translation, it ended up with horrible conflicts.
Unfortunately, even if you go with this suggestion from the docs, you're unable to directly load .po files and are stuck with npx lingui extract --watch (which understands neither the clean nor the overwrite option while in watch mode, unfortunately; maybe that's a separate bug?) and npx lingui compile --watch with catalogsMergePath defined in your lingui.config.* so you can properly import your translations while developing things.

@timofei-iatsenko How is this problem solved in any kind of repositories bigger than 1-man-army? Extracting things to one big file just doesn't work if you're working with literally anyone other than yourself.
It is enough of a problem for a single developer if you frequently have to work on some different branches that modify the same file!

I strongly agree with @hejtmii, as even in a POC repository where I have only one translated file I already generated 14+ .po files with only headers, which aren't easily importable with the loaders!

I'll be happy to work with @hejtmii on this one just so we could solve the issue; a separate extractor, or some options to it, could be the solution, but I'm not super keen on it yet. On the other hand, the default extractor has lots of issues with globs and it is not entering and reproducing paths with nested directories correctly, so maybe the proper way to solve it is actually a custom extractor and documenting it? 🤔

@timofei-iatsenko
Collaborator

@Palid Let's look at each point separately

I tried having one big catalog per language (en.po and no.po), and it was unmaintainable - every time anyone changed pretty much anything in any file that had a translation, it ended up with horrible conflicts.

How is this problem solved in any kind of repositories bigger than 1-man-army? Extracting things to one big file just doesn't work if you're working with literally anyone other than yourself.

In my opinion, it was a really bad decision by the original Lingui authors to implement two actions in one command. That is exactly what lingui extract does: it extracts and merges translations in one shot. That is what actually causes these horrible merge conflicts.
In all the other enterprise-grade i18n systems I have worked with, it is implemented differently. You have two steps: one to extract to a "master" file, and one to update your translation catalogs from this master file with some third-party tool. The master file is usually added to .gitignore; if you decide not to ignore it, conflicts in this file are fixed pretty easily by simply re-extracting.

Lingui supports this flow with the lingui extract-template command. That is what we use on a pretty big project, without any merge conflicts.
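
The second step of that flow (updating a locale catalog from the extracted master file) can be illustrated with a small sketch. This is not how Lingui or any particular third-party tool implements it: `Catalog` and `updateCatalogFromTemplate` are hypothetical names, and catalogs are simplified to plain id-to-translation maps.

```typescript
// Hypothetical, simplified model: a catalog maps message ids to translations.
type Catalog = Record<string, string>;

/**
 * Rebuild a locale catalog from a freshly extracted template.
 * Ids present in the template are kept (with their existing translation,
 * if there is one); ids that are no longer in the template are dropped.
 */
export function updateCatalogFromTemplate(
  template: Catalog,
  existing: Catalog,
): Catalog {
  const updated: Catalog = {};
  for (const id of Object.keys(template)) {
    // Keep the existing translation, or start with an empty one.
    updated[id] = existing[id] ?? "";
  }
  return updated;
}
```

Because the template is regenerated from the source code on every run, a conflict in it never has to be resolved by hand: re-extracting produces the correct file again, and only real translation changes touch the locale catalogs.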

Another option could be disabling line numbers or source references completely using the PO formatter settings.
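
A minimal configuration sketch of that option, assuming the `formatter` export of `@lingui/format-po` and its `lineNumbers` and `origins` options as described in the Lingui documentation for v4 and later. The catalog path and include pattern are placeholders, not taken from this thread.

```typescript
// lingui.config.ts (sketch; check the option names against your Lingui version)
import type { LinguiConfig } from "@lingui/conf";
import { formatter } from "@lingui/format-po";

const config: LinguiConfig = {
  locales: ["en", "no"],
  sourceLocale: "en",
  catalogs: [
    {
      // Placeholder paths for illustration only.
      path: "locale/{locale}/messages",
      include: ["src"],
    },
  ],
  // Keep "#: file" references but drop the ":line" suffix, so moving code
  // within a file no longer changes the catalog.
  // Use `origins: false` instead to drop source references entirely.
  format: formatter({ lineNumbers: false }),
};

export default config;
```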

On the other hand, the default extractor has lots of issues with globs and it is not entering and reproducing paths with nested directories correctly

Could you share reproductions so we can work on them? This is the first I've heard of it. We are also open to contributors and happily accept PRs.

@Palid

Palid commented Oct 9, 2024

@timofei-iatsenko Gladly can work on that, considering I have two projects that could use this. I'll provide a reproduction repository for all those separate problems!

I feel like there are a couple of separate issues, related to the tool's legacy (even though it's not that old). It seems the docs suggest a default solution that will result in tons of conflicts, but if you try to go another way, it creates other issues with the tooling that make some of Lingui's features, like loaders, no longer available. I'll try to list the problems below:

  • The @lingui/loader packages expect .po files, which makes development with translations a bit of a pain with extract-template, as you now need to either fully ignore messages object in development, or have a separate step extracting those templates with --watch. Or, for development only, have an entirely different pass with lingui extract that's in .gitignore.
  • Trying to go with catalogs per component as suggested in docs is also a bad idea, as you now not only need to lingui extract --watch, but also lingui compile --watch in development mode, as there's no easy way anymore to load all the different .po files.
  • The docs suggest that the go-to route is huge .po catalogs per language, which will end up with conflicts unless you .gitignore them, which explicitly forces you to use external services for translations, as you no longer have an easy way to sync your translations directly in the repository.
  • lingui extract-template does not have any --watch options, which makes it unusable in development at all.
  • lingui.config.js only has a {name} template placeholder, which makes generating .po files in nested directories impossible with any kind of globs, as it doesn't generate the entire path correctly.

Considering all of the above, maybe the easiest way to at least remove some of the trouble would be to define development and production steps, and figure out where we can simplify and improve the tooling? Having so many different ways to extract translations, while still not being able to configure it the way you want (e.g. generating .po files for all the languages next to your source files if you have a nested structure, while still generating empty .po files), makes this quite a problem.

Before getting to the reproduction repository, I can share the tree and lingui config where this nesting is already a problem:
Source tree:

components
├── ThemeToggle.tsx
├── router-entry.tsx
└── ui
    ├── alert-dialog.tsx
    ├── avatar.tsx
    ├── badge.tsx
    ├── button.tsx
    ├── card.tsx
    ├── input.tsx
    ├── progress.tsx
    ├── text.tsx
    └── tooltip.tsx

lingui.config.js:

/** @type {import('@lingui/conf').LinguiConfig} */
module.exports = {
  locales: ['en', 'no'],
  sourceLocale: 'en',
  catalogs: [
    {
      path: 'locale/{locale}/{name}',
      include: ['components/**/{name}'],
    },
  ],
  catalogsMergePath: '.locales/{locale}',
  format: 'po',
};

Running lingui extract now would generate this tree:

locale
├── en
│   ├── ThemeToggle.tsx.po
│   ├── alert-dialog.tsx.po
│   ├── avatar.tsx.po
│   ├── badge.tsx.po
│   ├── button.tsx.po
│   ├── card.tsx.po
│   ├── input.tsx.po
│   ├── progress.tsx.po
│   ├── router-entry.tsx.po
│   ├── text.tsx.po
│   ├── tooltip.tsx.po
│   └── ui.po
└── no
    ├── ThemeToggle.tsx.po
    ├── alert-dialog.tsx.po
    ├── avatar.tsx.po
    ├── badge.tsx.po
    ├── button.tsx.po
    ├── card.tsx.po
    ├── input.tsx.po
    ├── progress.tsx.po
    ├── router-entry.tsx.po
    ├── text.tsx.po
    ├── tooltip.tsx.po
    └── ui.po

3 directories, 24 files

Most of the files in here are entirely empty, other than the headers, which could make it entirely skippable for extraction, though it still generates them (I guess that's the problem @hejtmii mentioned).
The entire contents of cat locale/en/alert-dialog.tsx.po:

msgid ""
msgstr ""
"POT-Creation-Date: 2024-10-09 13:10+0200\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: @lingui/cli\n"
"Language: en\n"
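
As a workaround sketch (not a Lingui feature), a header-only catalog like the one above can be detected after extraction and then deleted or ignored by a cleanup script. The function name `isHeaderOnlyPo` is hypothetical, and the parsing is deliberately minimal: it only looks for a non-empty `msgid` and treats comment lines, including obsolete `#~` entries, as irrelevant.

```typescript
/** Strip the surrounding double quotes from a PO string fragment. */
function unquote(fragment: string): string {
  return fragment.trim().replace(/^"/, "").replace(/"$/, "");
}

/**
 * Return true when the PO text contains no entry with a non-empty msgid,
 * i.e. nothing but the header entry (msgid "").
 */
export function isHeaderOnlyPo(poText: string): boolean {
  let readingMsgid = false;
  let msgid = "";

  for (const rawLine of poText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("msgid ")) {
      // Start of a new entry; "msgid_plural" does not match this prefix.
      readingMsgid = true;
      msgid = unquote(line.slice("msgid ".length));
    } else if (readingMsgid && line.startsWith('"')) {
      // Continuation line of a multi-line msgid.
      msgid += unquote(line);
    } else if (readingMsgid) {
      // The msgid is complete (msgstr, comment or blank line follows).
      if (msgid.length > 0) return false;
      readingMsgid = false;
    }
  }
  return !(readingMsgid && msgid.length > 0);
}
```

A script could run this over every file under `locale/` after `lingui extract` and remove the files for which it returns true. That keeps the repository clean, but any runtime code that imports those paths would then have to tolerate missing files.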

This number of files was generated from the initial React Native Reusables template, and it already created this many empty files, which aren't even properly nested (that is currently impossible with the extraction mechanism). Even worse, you can't really use @lingui/loader anyway, because you have a ton of .po files, so the simple and nice dev experience does not apply anymore.

@timofei-iatsenko
Collaborator

@Palid

  • The @lingui/loader packages expect .po files, which makes development with translations a bit of a pain with extract-template, as you now need to either fully ignore messages object in development, or have a separate step extracting those templates with --watch. Or, for development only, have an entirely different pass with lingui extract that's in .gitignore.

Have you had the chance to try this setup, or are you just speculating about what might happen? The @lingui/loader is designed to work with templates right away. It automatically merges translations, falling back to the messages from the template if a message is not present in the translation catalog. You don't need anything from what you described.

  • Trying to go with catalogs per component as suggested in docs is also a bad idea, as you now not only need to lingui extract --watch, but also lingui compile --watch in development mode, as there's no easy way anymore to load all the different .po files.

Honestly, I have never needed that, so I have never used it.

  • The docs suggest that the go-to route is huge .po catalogs per language, which will end up with conflicts unless you .gitignore them, which explicitly forces you to use external services for translations, as you no longer have an easy way to sync your translations directly in the repository.

Yes, the suggested approach is a huge catalog per language per entry point. You don't need to add catalogs to .gitignore; you only need to ignore the template. Furthermore, you don't need to extract and commit on every commit. Commit your catalogs only when the translations change, not on every file change.

  • lingui extract-template does not have any --watch options, which makes it unusable in development at all.

You don't need it.

  • lingui.config.js only has a {name} template placeholder, which makes generating .po files in nested directories impossible with any kind of globs, as it doesn't generate the entire path correctly.

You also don't need it.

Prerequisites:

  1. Add lingui extract-template before the build of your application, like this:
     "build": "lingui extract-template && vite build",
  2. Add template.pot to .gitignore.
  3. If you don't have a translation file for a specific language, create an empty .po file, including one for the source language (en.po, for example).
  4. Use a standard loading snippet:
    import { i18n } from "@lingui/core"

    export async function loadCatalog(locale: string) {
      const { messages } = await import(`../locales/${locale}.po`)
      i18n.loadAndActivate({ locale, messages })
    }
  5. Remove any extraction or compilation from pre-commit hooks, if you have them.
The Flow

  • If you are developing locally with your source language (en), an empty catalog is loaded and all messages are taken from the source code.
  • If you are developing locally with a translation language (pl, for example), a catalog with partial translations is loaded, and untranslated messages are taken from the source code.
  • If you are bundling for production, you need an up-to-date messages.pot file, which is why extraction is added before the build command. Messages from the source code are not available, so the Lingui loader compiles your catalogs and falls back to the template.

Hope that helps.

Do you translate in feature branches, or only when a feature is merged into the main branch?

@Palid

Palid commented Oct 9, 2024

@Palid

  • The @lingui/loader packages expect .po files, which makes development with translations a bit of a pain with extract-template, as you now need to either fully ignore messages object in development, or have a separate step extracting those templates with --watch. Or, for development only, have an entirely different pass with lingui extract that's in .gitignore.

Have you had the chance to try this setup, or are you just speculating about what might happen? The @lingui/loader is designed to work with templates right away. It automatically merges translations, falling back to the messages from the template if a message is not present in the translation catalog. You don't need anything from what you described.

I actually did try to import .pot directly. My importing code in one of the projects does a couple of things, mostly because it is Next.js and I needed to support server-side rendering while keeping a nice DX (as well as supporting Turbopack, see #1854, where we had a short discussion). It unfortunately does not work with my use case at all, as having an empty .po file to make sure that the importer correctly resolved the path, and then it automatically falling back to .pot (!!!) is a very weird design choice.

Attaching the code example below, slightly modified for clarity.

import "server-only";

import { I18n, MessageDescriptor, setupI18n } from "@lingui/core";
import { msg } from "@lingui/macro";
import { setI18n } from "@lingui/react/server";
import linguiConfig from "../../../lingui.config";

export type Lang = (typeof linguiConfig.locales)[number];

export const languages: Record<Lang, MessageDescriptor> = {
  en: msg`English`,
  no: msg`Norwegian`,
};

const translations = require("src/i18n/prod-messages");
// Code for `translations` below:
/**
 *
 * if (process.env.NODE_ENV === "production" || process.env.TEST_ENV === "test") {
 *   const en = require("../locales/en/messages.js");
 *   const no = require("../locales/no/messages.js");
 *   module.exports = {
 *     en,
 *     no,
 *   };
 * }
 */

type MessagesFile = Record<string, string>;

export async function loadLinguiMessages(lang: string): Promise<MessagesFile> {
  if (
    process.env.NODE_ENV === "development" &&
    process.env.TEST_ENV !== "test"
  ) {
    const msgFile = await import(`src/locales/${lang}/messages.po`);
    return {
      [lang]: msgFile.messages,
    };
  } else {
    return {
      [lang]: translations[lang].messages,
    };
  }
}

const { locales } = linguiConfig;
// optionally use a stricter union type
type SupportedLocales = string;

type AllI18nInstances = { [K in SupportedLocales]: I18n };

let catalogs: MessagesFile[] = [];
let allMessages: MessagesFile;
let hasInitializedCatalogs = false;
let allI18nInstances: AllI18nInstances = {};
async function getAllInstances(): Promise<AllI18nInstances> {
  if (!hasInitializedCatalogs) {
    const messages = await Promise.all(locales.map(loadLinguiMessages));
    catalogs = messages;
    allMessages = catalogs.reduce((acc, oneCatalog) => {
      return { ...acc, ...oneCatalog };
    }, {});
    allI18nInstances = locales.reduce((acc, locale) => {
      const messages = allMessages[locale] ?? {};
      const i18n = setupI18n({
        locale,
        messages: { [locale]: messages } as any,
      });
      return { ...acc, [locale]: i18n };
    }, {});
    hasInitializedCatalogs = true;
  }

  return Promise.resolve(allI18nInstances);
}

export async function getI18nInstance(locale: Lang) {
  const allI18nInstances = await getAllInstances();
  return allI18nInstances[locale];
}

export async function getI18nInstanceWithLocale(locale: Lang) {
  const instance = await getI18nInstance(locale);
  setI18n(instance);
  return instance;
}

And for the loading I'm just using a webpack loader config:

/* ... */
  webpack: (config) => {
    config.module.rules.push({
      test: /\.po$/i,
      loader: "@lingui/loader",
    });
    return config;
  },
 /* ... */

Do you translate in feature branches, or only when a feature is merged into the main branch?

In that particular project I did translate in feature branches.

I think we're talking about a few different problems here.
Your suggestion does not solve the problem with conflicts, as we're still stuck with one huge .po file, though now it has to be filled in manually.
It certainly does help a bit with generating a production build, but it wasn't an issue in my case anyways. Doing the extract-template thing definitely makes things easier, but having an additional required step of adding translations later doesn't seem ideal.
Another problem is co-location of the files, which you mentioned you never needed to use. There are cases where you might have a component named exactly the same way under a different, nested path (it unfortunately does happen), and currently there's no way to have two components named Button, one under ui/button.tsx and one under ui/user/button.tsx, as those will be merged into one catalog, e.g.

#: components/button.tsx:7
msgid "First message"
msgstr "First message"

#: components/user/button.tsx:7
msgid "Second message"
msgstr "Second message"
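
The collision can be made visible with a small check. This is only a sketch under the assumption that the catalog name is derived from the file name alone, without its directory; `findNameCollisions` is a hypothetical helper, not part of Lingui.

```typescript
/**
 * Group source files by their file name (ignoring directories) and return
 * only the names that are shared by more than one file. With a name-only
 * placeholder, every such group ends up in a single catalog.
 */
export function findNameCollisions(
  sourceFiles: string[],
): Record<string, string[]> {
  const byName: Record<string, string[]> = {};
  for (const file of sourceFiles) {
    const name = file.split("/").pop() ?? file;
    (byName[name] ??= []).push(file);
  }

  const collisions: Record<string, string[]> = {};
  for (const name of Object.keys(byName)) {
    const files = byName[name];
    if (files && files.length > 1) collisions[name] = files;
  }
  return collisions;
}
```

For the two files in the example above, the result has a single key, `button.tsx`, listing both paths, which is exactly the pair that gets merged into one catalog.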

The final issue is loading those multiple catalogs in development - lingui doesn't really provide any way to do it well, as you'd have to manually define the imports in every single component, which kind of ruins the idea of good developer experience and ease of use. It'd be perfect if the loader could understand lingui config and deliver the translations based on default or defined priority (e.g .js first, then .json, .po, .pot, etc.), as even though docs allow for having this tree-based configuration, it pretty much requires having a single .po or .json file as the entrypoint for development.

Your suggestion will be good enough for this particular problem as long as I use a dedicated service for the translations and never change the .po files manually, but then docs still show a way to configure your project that's basically a footgun, as it'll just make the development a lot harder, with barely any additional benefits.

To sum it up, which one would you prefer?

  • Better configurability, so that the loader understands catalog paths from lingui.config file and imports it automatically when needed & fixing the globs/templates so that you properly generate lang/components/ui/button.po and lang/components/ui/user/button.po
  • Getting rid of the configuration feature, as it's just a confusing thing that seems to allow you to structure your translations however you want, but in the end is missing some features that would allow you to do that.
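
To make the first option concrete, here is a sketch of the path mapping it asks for: the catalog path mirrors the full source path instead of only the file name. `nestedCatalogPath` and its `root` parameter are hypothetical and do not correspond to an existing Lingui option.

```typescript
/**
 * Map a source file to a catalog path that preserves nested directories,
 * e.g. components/ui/user/button.tsx -> locale/en/components/ui/user/button.po
 */
export function nestedCatalogPath(
  sourceFile: string,
  locale: string,
  root = "locale",
): string {
  // Normalize Windows separators and a leading "./".
  const normalized = sourceFile.replace(/\\/g, "/").replace(/^\.\//, "");
  // Drop only the last extension of the file name.
  const withoutExtension = normalized.replace(/\.[^./]+$/, "");
  return `${root}/${locale}/${withoutExtension}.po`;
}
```

With this mapping, `components/ui/button.tsx` and `components/ui/user/button.tsx` get separate catalogs, so two components with the same file name no longer collide.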

I'd very much prefer the first choice, even though it's going to make maintaining it definitely harder. I'm willing to take over the development for this myself, as this feature would be really beneficial for my $DAYJOB stuff.
Whichever you choose, let me know, and I'll gladly help with the heavy lifting here. Having a trap like this in the documentation definitely isn't ideal, and having to fork/patch the library to make it do what the documentation suggests it can do is far from a great experience.

@Palid

Palid commented Oct 15, 2024

Pinging @timofei-iatsenko as you might have not noticed the wall of text above, I'd love to help on that in addition to the turbopack PR. 😄

We can have a chat on something like matrix if you'd prefer, [email protected] if you are able to chat there.

@timofei-iatsenko
Collaborator

timofei-iatsenko commented Dec 16, 2024

@Palid hey, sorry for the delay. I found time to continue discussing the issue.

Firstly, thanks for raising this concern and sharing the feedback. Secondly, I'm really happy to improve the flow in this regard.

What I don't want is to implement something half done, or something that leads users into a dead end (as is the case now with extract --files on a pre-commit hook). So let's identify the problems, find solutions, and then decompose them into issues/tasks that can be picked up.

For the issue described in this thread, we can add a CLI option "--omit-empty/--skip-empty/--no-emit-empty", intended to be used together with the catalog-per-file approach (say we fix the glob issue). But what next? How do we load them into the application?

The loader's behavior is already weird enough. Usually we specify the file we want to load, and we expect that this file is somehow preprocessed by the loader.

If we want to create a loader that loads many files by crawling the file system, should we specify some virtual path?
Something like this:

import('virtual:lingui-catalog:en')

This would load and merge all per-file catalogs under the hood. It is off the top of my head and quite a naive approach, though.

I'm afraid this is very far from a normal use case for webpack (and its derivatives such as rspack or turbopack) and could potentially lead to incompatibilities. With Vite it should be fine, because Vite supports "virtual" modules out of the box.
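
A rough sketch of what such a virtual module could look like as a Vite-style plugin. It follows the usual Vite/Rollup convention of resolving a virtual id to a `\0`-prefixed id and returning generated source from `load`. Everything else is an assumption for illustration: `virtualCatalogPlugin`, the `collectCatalogs` callback that would crawl and parse the per-file catalogs, and the plain string message format (real compiled Lingui catalogs may look different). The plugin interface is declared locally so the sketch stays self-contained.

```typescript
type Messages = Record<string, string>;

// Minimal subset of the plugin shape used by this sketch.
interface MinimalPlugin {
  name: string;
  resolveId(id: string): string | null;
  load(id: string): string | null;
}

const VIRTUAL_PREFIX = "virtual:lingui-catalog:";
const RESOLVED_PREFIX = "\0" + VIRTUAL_PREFIX;

/**
 * collectCatalogs is hypothetical: it stands for whatever finds and parses
 * all per-file catalogs for one locale.
 */
export function virtualCatalogPlugin(
  collectCatalogs: (locale: string) => Messages[],
): MinimalPlugin {
  return {
    name: "virtual-lingui-catalog",
    resolveId(id) {
      // Claim only ids such as "virtual:lingui-catalog:en".
      return id.startsWith(VIRTUAL_PREFIX) ? "\0" + id : null;
    },
    load(id) {
      if (!id.startsWith(RESOLVED_PREFIX)) return null;
      const locale = id.slice(RESOLVED_PREFIX.length);
      // Merge all per-file catalogs; later catalogs win on duplicate ids.
      let merged: Messages = {};
      for (const catalog of collectCatalogs(locale)) {
        merged = { ...merged, ...catalog };
      }
      return `export const messages = ${JSON.stringify(merged)};`;
    },
  };
}
```

Application code would then import `virtual:lingui-catalog:en` as in the line above and receive the merged `messages` export. Whether an equivalent can be built reliably for webpack-based bundlers is exactly the open question raised here.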

Maybe you had a better idea?


It unfortunately does not work with my use case at all, as having an empty .po file to make sure that the importer correctly resolved the path, and then it automatically falling back to .pot (!!!) is a very weird design choice.

It doesn't fall back to the whole file. You need to know a bit of the history to understand these decisions. Historically, the loader was a shortcut, or alias, that just triggered the same logic lingui compile uses. The loader isn't pure: it compiles the full catalog as a normal lingui compile command would, and then picks the requested file from the results.
When lingui compile compiles catalogs, it considers a few things: language fallbacks, the source locale fallback, and the template fallback. The fallbacks happen on a per-message basis. You can follow the logic by reading the tests here
So, to build the catalog correctly, the compiler loads all these files and treats them as a whole; it doesn't compile catalogs on a per-file basis.
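
The per-message fallback can be pictured with a simplified sketch. It is not Lingui's compile implementation: `resolveMessage` and `compileWithFallback` are hypothetical names, catalogs are reduced to id-to-string maps, and the fallback chain is passed in explicitly.

```typescript
type Catalog = Record<string, string>;

/**
 * Resolve one message by walking an ordered chain of catalogs, for example
 * [requested locale, fallback locale, source locale, template].
 * A missing or empty translation means "keep looking".
 */
export function resolveMessage(id: string, chain: Catalog[]): string {
  for (const catalog of chain) {
    const translation = catalog[id];
    if (translation) return translation;
  }
  // Nothing in the chain had a translation: only the raw id is left.
  return id;
}

/** Build the final catalog for a locale, falling back message by message. */
export function compileWithFallback(chain: Catalog[]): Catalog {
  const compiled: Catalog = {};
  for (const catalog of chain) {
    for (const id of Object.keys(catalog)) {
      if (!(id in compiled)) compiled[id] = resolveMessage(id, chain);
    }
  }
  return compiled;
}
```

In this model a partially translated catalog still yields a complete result, because each untranslated message is filled individually from the next catalog in the chain rather than the whole file being replaced.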

That's why, if you try to load a catalog file that is not in the Lingui configuration, you receive an error even though the file exists in the filesystem.

Another thing to consider is that Lingui removes the original messages from the source code. So there is nothing to fall back to if a message key is not present in the catalog, and you will see an ugly message id instead. That's why it's very important to have up-to-date catalogs before the application build. Even if these catalogs don't have a translation, there should be an entry with an empty translation to fall back to.

The POT file actually solves this problem. You create the POT file right before the actual production build, so it is always up to date, and then the POT is used as the backbone for creating the language catalogs. Unfortunately, there is nothing I can do better here, because it's the React ecosystem: there are plenty of different bundlers, workflows, combinations and so on.

Please contact me on Discord; I think it could be faster to iterate there, and we can come up with a good solution.

Thanks!
