Description
Automatic Localization Approach
I am contemplating adding the ability to automatically localize scripts into a user's preferred language using the OpenAI API library. To that end, I will introduce two new methods in RGP Lua 0.71:
- `FCControl:SetAutoResizeWidth` (which requests auto-sizing of supported controls before the window is running)
- `FCControl:AssureWidthForText` (which you can use while the window is running)
Then I am imagining a new Lua library, `localization`. Each script would include a global table of all of its strings, like this:
```lua
localization_base =
{
    ["Hello"] = "Hello",
    ["Goodbye"] = "Goodbye",
}
```
The library would export a function `localize(string)` that does the following:
- Extract the user's preferred 2-character language code from `finenv.UI():GetUserLocaleName`.
- Check whether a global table named `"localization_" .. language_code` exists. If so, look up the input string in that table.
- If found, return the translated string.
- Otherwise, look for `localization_base`.
- If found, and if an OpenAI key exists, auto-generate a language table for the user's language. The beauty of this approach is that it requires only a single call to OpenAI to translate all the strings in the script at once.
- If nothing else works, return the input string unchanged.
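The lookup chain above can be sketched in a few lines of Lua. This is a minimal sketch under stated assumptions, not the proposed library's API: `locale_name`, `get_language_code`, and the `auto_translate` hook are hypothetical names. In RGP Lua the locale name would come from `finenv.UI():GetUserLocaleName` (the exact call form is assumed here, so the sketch uses a fixed value), and `auto_translate` stands in for the single OpenAI request; leaving it `nil` models the case where no OpenAI key exists.

```lua
-- Minimal sketch of the proposed localize() lookup chain.
-- HYPOTHETICAL names: locale_name, get_language_code, auto_translate.
local localization = {}

-- In RGP Lua this would come from finenv.UI():GetUserLocaleName
-- (exact call form assumed); a fixed value keeps the sketch runnable.
localization.locale_name = "en_US"

-- Hypothetical hook: function(base_table, language_code) -> table or nil.
-- Leaving it nil models "no OpenAI key available".
localization.auto_translate = nil

function localization.get_language_code()
    -- Keep the 2-character language part, e.g. "es_ES" -> "es".
    return string.sub(localization.locale_name, 1, 2):lower()
end

function localization.localize(input)
    local lang = localization.get_language_code()
    local table_name = "localization_" .. lang
    local lang_table = _G[table_name]
    -- 1. A table for the user's language exists: look the string up there.
    if type(lang_table) == "table" and lang_table[input] ~= nil then
        return lang_table[input]
    end
    -- 2. No table yet: generate one from localization_base in a single request.
    local base = _G.localization_base
    if lang_table == nil and type(base) == "table" and localization.auto_translate then
        local generated = localization.auto_translate(base, lang)
        if type(generated) == "table" then
            _G[table_name] = generated -- cache it; later calls take step 1
            if generated[input] ~= nil then
                return generated[input]
            end
        end
    end
    -- 3. Nothing else worked: return the input unchanged.
    return input
end
```

One design choice worth noting: the sketch generates a table only when none exists for the language, so a partially hand-written table (say, an embedded `localization_es` missing one entry) falls back to the input string rather than being replaced by a machine translation.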
The script writer then wraps all hard-coded strings in calls to `localize`. It should be fairly simple to write a regex that does this. (Or get ChatGPT to write one for you, which is what I would do.)
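A naive version of that wrapping step, using Lua patterns (which are simpler than full regular expressions), might look like this. It is a sketch under loud assumptions: `wrap_strings` and `skip_prefixes` are hypothetical names, literals are assumed to contain no escaped quotes and not to span lines, and single-quoted strings, `[[long]]` strings, and comments are not handled. It skips table keys, strings already wrapped, and `require` paths.

```lua
-- Naive sketch: wrap double-quoted string literals in localize(...).
-- ASSUMPTIONS: no escaped quotes, no multi-line literals; single-quoted
-- strings, [[long]] strings, and comments are not handled.
local skip_prefixes = {
    "%[%s*$",             -- table keys such as ["Hello"]
    "localize%(%s*$",     -- already wrapped
    "require%s*%(?%s*$",  -- module paths
}

local function wrap_strings(source)
    local result = source:gsub('()"([^"\n]*)"', function(pos, text)
        -- pos is the position of the opening quote; inspect what precedes it.
        local before = source:sub(1, pos - 1)
        for _, pattern in ipairs(skip_prefixes) do
            if before:find(pattern) then
                return nil -- returning nil keeps the original match unchanged
            end
        end
        return 'localize("' .. text .. '")'
    end)
    return result
end
```

For example, `wrap_strings('print("Hello")')` yields `print(localize("Hello"))`. Run over a finished script, it would also wrap the values inside any `localization_*` tables, so it is best applied before those tables are added, and the result still needs a review by hand.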
Possible Optimizations
Some ways to speed up the script would be:
- Set `finenv.RetainLuaState = true` so that the script does not have to call OpenAI each time it is invoked.
- Optionally embed additional languages, e.g.:
```lua
localization_es =
{
    ["Hello"] = "Hola",
    ["Goodbye"] = "Adiós",
}
localization_ja =
{
    ["Hello"] = "こんにちは",
    ["Goodbye"] = "さようなら",
}
```
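With the Lua state retained, a generated table stored in a global survives between invocations, so the OpenAI request only has to happen once per language, and an embedded table (like `localization_es` above) means it never happens at all for that language. A minimal sketch, assuming a hypothetical `request_translation` function in place of the real OpenAI call; `get_translation_table` is likewise a hypothetical name.

```lua
-- Sketch: request each language's table at most once per retained Lua state.
-- request_translation is a HYPOTHETICAL stand-in for the OpenAI call.
if finenv then
    -- Only meaningful inside RGP Lua; keeps globals alive between invocations.
    finenv.RetainLuaState = true
end

local function get_translation_table(lang, request_translation)
    local name = "localization_" .. lang
    if _G[name] == nil then
        -- Not embedded in the script and not generated earlier in this state.
        _G[name] = request_translation(localization_base, lang)
    end
    return _G[name]
end
```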
Utility Functions
In addition to `localize`, the library would have a utility function for developers:
`generate_localization(language_code)`
This function would search the currently running script for all strings, create the `localization_` table for `language_code`, and copy it to the clipboard. The developer could then paste this table into the script and provide direct support for that language. This function could also be used to generate the `localization_base` table, in which case it would not call OpenAI.
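The extraction half of `generate_localization` might look like the sketch below when building the `localization_base` table (the case that needs no OpenAI call). `generate_base_table` is a hypothetical name, and reading the running script's own source, copying to the clipboard, and translating for other languages are left out. One deliberate deviation from "all strings": it collects only literals passed to `localize(...)`, assuming the script has already been wrapped, so module paths and table keys are not picked up.

```lua
-- Sketch of generate_localization's base-table case (no OpenAI call).
-- HYPOTHETICAL name: generate_base_table. Reading the running script's
-- source and copying to the clipboard are not shown.
local function generate_base_table(source)
    local seen, strings = {}, {}
    -- Collect distinct literals passed to localize("..."), in source order.
    for text in source:gmatch('localize%(%s*"([^"\n]*)"%s*%)') do
        if not seen[text] then
            seen[text] = true
            strings[#strings + 1] = text
        end
    end
    -- Emit the literals verbatim so any escape sequences are preserved.
    local lines = { "localization_base =", "{" }
    for _, text in ipairs(strings) do
        lines[#lines + 1] = '    ["' .. text .. '"] = "' .. text .. '",'
    end
    lines[#lines + 1] = "}"
    return table.concat(lines, "\n")
end
```

The same de-duplicated list is what would be sent to OpenAI in one request when generating a table for another language.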
Issues and Concerns
- LLM hallucinations. LLMs have become quite reliable at language translation, but there will inevitably be mistakes. We probably need a way for a user to disable auto-translation if a particular language is not well supported by the LLM.
- Right-to-left languages. I don't know that anyone has ever tested the PDK Framework dialog box system with right-to-left languages. It probably doesn't work. We need a way to detect that a language is right-to-left and not auto-generate translations for it, or at minimum a way for the user to disable the feature (as above) if it is not generating useful results.
- Strings overrunning the layout. While auto-sizing the width of individual controls is fairly straightforward, auto-sizing the control layout is beyond the scope of any change I am prepared to make in the PDK Framework. (The current code is fairly opaque.) Dialog boxes will have to be sized and laid out to fit the longest versions of the strings. (My suggestion is to use Spanish to determine the needed layout size. I'm sure there are languages with longer strings, but Spanish strings are quite long, mainly due to the lack of a direct possessive form: "Bob's score" becomes "la partitura de Bob".)
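The user opt-out and the right-to-left check could be combined into a single gate evaluated before any OpenAI request. A hedged sketch: `should_auto_translate` and the `user_disabled` set are hypothetical (the set might be loaded from a user config file), and the RTL list (Arabic, Hebrew, Persian, Urdu, Yiddish) is illustrative rather than exhaustive.

```lua
-- Sketch: decide whether auto-translation should run for a language code.
-- ASSUMPTIONS: the RTL list is illustrative, not exhaustive; user_disabled
-- is a HYPOTHETICAL set of language codes the user has opted out of.
local rtl_languages = { ar = true, he = true, fa = true, ur = true, yi = true }

local function should_auto_translate(lang, user_disabled)
    if user_disabled and user_disabled[lang] then
        return false -- user opted out, e.g. poor LLM results for this language
    end
    return not rtl_languages[lang] -- skip right-to-left languages
end
```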
I welcome ideas and concerns.