When I compared this translation with Google Translate, I got a different result. #131

Open
thawon opened this issue Jan 31, 2025 · 38 comments

Comments

@thawon

thawon commented Jan 31, 2025

Hi there!

We've noticed some users mentioning a recent dip in translation quality, and I've done some testing myself, comparing your library's results with Google Translate. I've found some discrepancies, with Google Translate sometimes performing better. Any insights you could share on why this might be happening would be greatly appreciated!

Regards,
Thawon Uttamavanit

@vitalets
Owner

Hello!

Could you share the exact discrepancies you've found?

@thawon
Author

thawon commented Jan 31, 2025

Hi vitalets,

Thank you for the reply.

Here is the text in Chinese:
"m2 正向電磁接觸器下方的過載電驛 r相欠相 已做更換(跟污水廠借了一顆)||m2 逆向電磁接觸器彈簧卡卡 s相沒有吸到很裡面 重複測試恢復正常之後 堪用||目前已投入使用"

The result we have from your library:
"โพสต์ไฟฟ้าของ M2 ที่อยู่ทางด้านขวาของคอนแทคเลนส์แม่เหล็กไฟฟ้าถูกแทนที่ (ยืมมาด้วยโรงงานน้ำเสีย) เข้าใช้งาน"

Google translate's result:
"รีเลย์โอเวอร์โหลดใต้คอนแทคเตอร์แม่เหล็กไฟฟ้าแบบหน้า M2 ได้รับการเปลี่ยนใหม่แล้ว (ยืมมาจากโรงบำบัดน้ำเสีย) สปริงของคอนแทคเตอร์แม่เหล็กไฟฟ้าแบบกลับ M2 ติดขัด และเฟส S ไม่ถูกดูดเข้าไปลึกมาก หลังจากทดสอบซ้ำแล้วซ้ำเล่า ก็ถือว่าปกติ ใช้||ใช้อยู่ในปัจจุบัน"

You probably don't speak Thai, but discrepancies like this show up in many cases.
Here is another example:

Text: "How was work today?"
The result from your library:
"วันนี้ทำงานอย่างไร?"
Google Translate's result:
"วันนี้งานเป็นอย่างไรบ้าง?"

The good results are the ones that come from Google Translate. The incorrect translations are really off, and occasionally they don't make sense. I also tried the same examples with another Node.js library, "@iamtraction/google-translate", and the results are identical to yours. Both my own testing and our users' reports suggest the discrepancies only started a few days ago.

Regards,
Thawon Uttamavanit

@vitalets
Owner

vitalets commented Jan 31, 2025

Could you show the code snippet you use to perform the request?
And which version of google-translate-api are you using?

@thawon
Author

thawon commented Jan 31, 2025

The code is copied straight from the npm page:
https://www.npmjs.com/package/@vitalets/google-translate-api

The version number is 9.2.1

import { translate } from '@vitalets/google-translate-api';
const { text } = await translate('How was work today?', { to: 'th' });
console.log(text); // => 'วันนี้ทำงานอย่างไร?'

@vitalets
Owner

I've tested with Russian and can confirm the discrepancy.
This needs investigation, as this library uses undocumented APIs.
Could you post the raw response for your query:

const { text, raw } = await translate(sourceText, { to: 'th' });
console.dir(raw, { depth: null });

@Waste2Time

I have the same question and have raised two issues in other repos. I suspect this is because Google uses different models for different APIs.

It seems Google is going to make changes to these undocumented APIs. :(

index.js

import { translate } from '@vitalets/google-translate-api';

async function translateText() {
  const { text, raw } = await translate('やれやれ、またドイツか、と僕は思った。', { to: 'ru' });
  console.log(text);
  console.dir(raw, {depth: null});
}
translateText();

output:

Я думал, что это снова Германия.
{
  sentences: [
    {
      trans: 'Я думал, что это снова Германия.',
      orig: 'やれやれ、またドイツか、と僕は思った。',
      backend: 3,
      model_specification: [ { label: 'offline' }, { label: 'offline' } ],
      translation_engine_debug_info: [
        {
          model_tracking: {
            checkpoint_md5: '07653cda2443db08a8e1f2435c678a44',
            launch_doc: 'efficient_models_2022q2.md'
          }
        },
        {
          model_tracking: {
            checkpoint_md5: 'edbff5b2398eeca464de2caaf36a7a7e',
            launch_doc: 'efficient_models_2022q2.md'
          }
        }
      ]
    },
    {
      translit: 'YA dumal, chto eto snova Germaniya.',
      src_translit: 'Yareyare, mata Doitsu ka, to boku wa omotta.'
    }
  ],
  src: 'ja',
  confidence: 1,
  spell: {},
  ld_result: {
    srclangs: [ 'ja' ],
    srclangs_confidences: [ 1 ],
    extended_srclangs: [ 'ja' ]
  }
}

@thawon
Author

thawon commented Feb 1, 2025

@vitalets here is the raw:

import { translate } from '@vitalets/google-translate-api';
const { text, raw } = await translate('How was work today?', { to: 'th' });
console.dir(raw, { depth: null });
{
  sentences: [
    {
      trans: 'วันนี้ทำงานอย่างไร?',
      orig: 'How was work today?',
      backend: 3,
      model_specification: [ { label: 'offline' } ],
      translation_engine_debug_info: [
        {
          model_tracking: {
            checkpoint_md5: '09b7ec576131a14236c5712dd8ea64aa',
            launch_doc: 'efficient_models_2022q2.md'
          }
        }
      ]
    },
    { translit: 'Wạn nī̂ thảngān xỳāngrị?' }
  ],
  src: 'en',
  confidence: 1,
  spell: {},
  ld_result: {
    srclangs: [ 'en' ],
    srclangs_confidences: [ 1 ],
    extended_srclangs: [ 'en' ]
  }
}
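
Both raw dumps in this thread carry `label: 'offline'` under `model_specification`. The following is a minimal sketch, assuming those undocumented field names stay as shown above, for logging which labels a response reports. `modelLabels` is a hypothetical helper and not part of the library, and the thread does not confirm what the label actually means.

```javascript
// Hypothetical helper (not part of @vitalets/google-translate-api).
// Collects the distinct model labels found in a raw response.
// The field names are copied from the dumps in this thread and are undocumented.
function modelLabels(raw) {
  const labels = new Set();
  for (const sentence of raw?.sentences ?? []) {
    // Transliteration entries have no model_specification, so default to [].
    for (const spec of sentence.model_specification ?? []) {
      if (spec.label) labels.add(spec.label);
    }
  }
  return [...labels];
}

// Trimmed copy of the response posted above.
const sample = {
  sentences: [
    {
      trans: 'วันนี้ทำงานอย่างไร?',
      orig: 'How was work today?',
      model_specification: [{ label: 'offline' }],
    },
    { translit: 'Wạn nī̂ thảngān xỳāngrị?' },
  ],
};

console.log(modelLabels(sample)); // [ 'offline' ]
```

Logging this next to each translation makes it easier to compare reports from different users and dates.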

@vitalets
Owner

vitalets commented Feb 3, 2025

Frankly, I'm not sure we can solve it; any ideas would be appreciated.
I've tried playing with the query params of the request to the Google endpoint, but the response is the same as above.

Google translate website performs the following request:
Endpoint:

https://translate.google.com/_/TranslateWebserverUi/data/batchexecute

Params:
Image

UPDATE:
I've found a great explanation of why the translations are worse.

@Waste2Time

@vitalets The interesting thing is that this has only happened to me in the last month. But it seems that this phenomenon has occurred to others in early 2024 or even before. Do you have any thoughts on this?

By the way, I tried making requests to that API from different IP addresses and machines, but all the results I got were wrong. If it is not tied to something on the client side, how can it become unusable after some seemingly "accidental" point in time?

@vitalets
Owner

vitalets commented Feb 3, 2025

Just one idea: maybe Google is rolling this change out gradually?
I tried asking ChatGPT to reverse engineer the minified script that calculates x-goog-batchexecute-bgr on the Google Translate page, but with no luck.
I think apps that are more than pet projects are better off using Google's public translation APIs. Those are not free, though :(

@Waste2Time

Google's forum Q&A staff have denied that there are any free undocumented APIs. I guess these APIs were used in obscure places, and Google has figured it out and is phasing them out...

Is the next step to imitate the web translation request through reverse engineering? Maybe there are other, easier ways...

@vitalets
Owner

vitalets commented Feb 4, 2025

One approach is to use a headless browser: navigate to the Google Translate page and perform the translation request there.
There is one project that does this, but I haven't tried it myself: https://github.com/alanleungcn/puppeteer-google-translate

@Waste2Time

Thank you, I thought of this, and I'll try it.

@Waste2Time

I haven't tried a headless browser yet, but in another issue others mentioned a Python lib that is not affected. It uses a different endpoint, 'https://translate.google.com/m', which returns the entire translation page containing the correct translation, with no special header required. I debugged it in a Python environment and verified in Postman that it works without any special header.

Test content

url : https://translate.google.com/m?q=僕は三十七歳で、そのときボーイング747のシートに座っていた。その巨大な飛行機はぶ厚い雨雲をくぐり抜けて降下し、ハンブルク空港に着陸しようとしているところだった。十一月の冷ややかな雨が大地を暗く染め、雨合羽を着た整備工たちや、のっぺりとした空港ビルの上に立った旗や、BMWの広告板やそんな何もかもをフランドル派の陰うつな絵の背景のように見せていた。やれやれ、またドイツか、と僕は思った。&sl=ja&tl=en

headers:
Content-Type:text/html; charset=utf-8
Vary:Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site
Cache-Control:no-cache, no-store, must-revalidate
Strict-Transport-Security:max-age=31536000; includeSubDomains; preload
X-Content-Type-Options:nosniff
X-Frame-Options:DENY
X-XSS-Protection:1; mode=block
Set-Cookie:<cookie_data>; Path=/; Secure; HttpOnly; SameSite=lax
Alt-Svc:h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Transfer-Encoding:chunked

@thawon
Author

thawon commented Feb 6, 2025

@Waste2Time I tried the 'https://translate.google.com/m' API, and I can confirm that the results are still not the same.

https://translate.google.com/m?q=各位帥哥 老闆春節不放假 泰妞陪你嗨~過年🧨&sl=ja&tl=en
result: Ladies and Gentlemen, the old man is not free of the Spring Festival.

compare to

https://translate.google.com/?sl=auto&tl=en&text=各位帥哥 老闆春節不放假 泰妞陪你嗨~過年🧨&op=translate
result: Dear handsome guys, the boss will not take a holiday during the Spring Festival. Thai girls will accompany you to celebrate the New Year 🧨

This makes me wonder whether there is another API out there that we don't know about, one that returns the same result as the website.

@Waste2Time

Waste2Time commented Feb 7, 2025

@thawon In the /m URL, change the sl param from ja to auto so that it corresponds to the web request.

Conversely, if you translate the same ad sentence on the website with ja selected as the source language, you will see the same result as your /m URL gave.
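
A mismatched `sl` value is easy to miss when the URL is assembled by hand. Here is a minimal sketch, assuming only the three query params seen in this thread (`q`, `sl`, `tl`), that builds the /m URL with `sl` defaulting to `auto`. `buildMobileTranslateUrl` is a hypothetical helper, and the endpoint itself is undocumented and may change.

```javascript
// Hypothetical helper: builds a URL for the undocumented /m endpoint
// discussed in this thread. Only q, sl and tl are used, as in the examples above.
function buildMobileTranslateUrl(text, { from = 'auto', to }) {
  const url = new URL('https://translate.google.com/m');
  // searchParams.set() percent-encodes the text, so CJK characters,
  // spaces and emoji are safe to pass in directly.
  url.searchParams.set('q', text);
  url.searchParams.set('sl', from); // defaults to 'auto' to mirror the website
  url.searchParams.set('tl', to);
  return url.toString();
}

const mobileUrl = buildMobileTranslateUrl('各位帥哥 老闆春節不放假', { to: 'en' });
console.log(mobileUrl);
```

Passing `from` explicitly (for example `{ from: 'ja', to: 'en' }`) reproduces the request that produced the unexpected result above.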

@thawon
Author

thawon commented Feb 7, 2025

@Waste2Time my mistake. I am sorry. Yes, when changing the source language (sl), the results are the same. Great work :)

@vitalets
Owner

vitalets commented Feb 7, 2025

I've tried with this url:

https://translate.google.com/m?q=各位帥哥 老闆春節不放假 泰妞陪你嗨~過年🧨&sl=ja&tl=en

and got the following result (incorrect):

Image

When I changed to sl=auto, it did not translate at all:

Image

I've also checked with "How was work today?", translating to Russian:

# /m api (sl=en):
Как была работа сегодня?

# /m api (sl=auto) - same result:
Как была работа сегодня?

# full google translate website - better than previous:
Как прошла работа сегодня?

Do you get stable results on that /m api?

@Waste2Time

I'm sure the result I got earlier was correct, but now I get the wrong one too. :(
I'm going to check whether it still works in the py lib.

@vitalets
Owner

vitalets commented Feb 7, 2025

Google is trolling us 🙂

@Waste2Time

Waste2Time commented Feb 7, 2025

Now the deep-translator py lib fails as well. It returns the following:

Error 500 (Server Error)!!1
500. That’s an error.
There was an error. Please try again later. That’s all we know.

so crazy.

edit:
I found this issue in deep-translator. The error appears to happen intermittently, and no one is sure what Google will do about it.

nidhaloff/deep-translator#282

The API is working but returns a different result; I'll submit a new issue in that repo...

@Waste2Time

Waste2Time commented Feb 9, 2025

@vitalets Chrome also has two built-in Google translation methods: partial translation and whole-page translation. I captured the requests and analyzed them.

Partial translation uses a GET request and puts the text to translate in a header as base64. Unfortunately, there is a part before and after the base64 payload that I can't work out how to generate.

Whole-page translation is easier: it uses a POST request and only needs x-goog-api-key in the header. I guess the key is unique to each user and does not change. I was able to replay the whole-page translation request in Postman with very few changes and verify that it works. I hope this information is useful to you. :)

btw, I hope Google won't make any more changes to the translation API

@ioanalexme

@Waste2Time And what is the value of the "x-goog-api-key" ? Or do you have an example?
Thanks.

@Waste2Time

I guess you can get one from Google Cloud in the normal way; it's unique to each user. I got mine by capturing requests, though.

@vitalets
Owner

@Waste2Time thanks for the interesting insights. Do you see these requests when you are not signed in to Chrome, or in an incognito window?

@Waste2Time

Okay, I feel like I have it all figured out. The whole-page translation function uses a Google Cloud service, so the key is the same for everyone. I created another API key in Google Cloud and swapped it in for the captured one, and the request returned: "translate Private API has not been used in project before or it is disabled."

@ioanalexme

@Waste2Time Can you share a screenshot with the entire request Chrome makes to translate a website?

@ioanalexme

Hello,
Here is all you need:

POST /v1/translateHtml HTTP/1.1
Host: translate-pa.googleapis.com
X-Goog-API-Key: AIzaSyATBXajvzQLTDHEQbcpq0Ihe0vWDHmO520
Content-Type: application/json+protobuf

[[["¿Cómo estás?"],"auto","en"],"wt_lib"]

And the result:

[["How are you doing?"],["sp"]]
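
The payload shape can be produced and read back with plain JSON. The following is a minimal sketch, assuming the request and response keep exactly the array layout captured above; both helper names are hypothetical, the endpoint is undocumented, and no network call is made here.

```javascript
// Hypothetical helpers for the undocumented translateHtml payload shown above.

// Builds the request body: [[[...texts], from, to], "wt_lib"]
function buildTranslateHtmlBody(texts, from, to) {
  return JSON.stringify([[texts, from, to], 'wt_lib']);
}

// Reads the response body: [[...translations], [...detectedLanguages]]
// The second array is absent in some responses in this thread, so default it.
function parseTranslateHtmlResponse(body) {
  const [translations = [], detected = []] = JSON.parse(body);
  return { translations, detected };
}

const requestBody = buildTranslateHtmlBody(['¿Cómo estás?'], 'auto', 'en');
console.log(requestBody); // [[["¿Cómo estás?"],"auto","en"],"wt_lib"]

const parsed = parseTranslateHtmlResponse('[["How are you doing?"],["sp"]]');
console.log(parsed.translations[0]); // How are you doing?
```

Because `texts` is an array, several strings can be sent in one request, assuming the endpoint accepts more than one entry; the thread only shows a single-item example.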

@star-kiss

Hello!

First of all, thank you for your valuable reminder about the translation approach.
While using it, the problem you mentioned, "You have to take care to escape and unescape certain characters when using it (and turn newlines into and back), since it communicates in HTML, but that's really about it.", is indeed a detail that needs special attention.

We have discussed the approach thoroughly, but we would still like to ask you about handling character escaping and line breaks when communicating in HTML.
Are there any efficient methods or lessons learned that can simplify this process, avoid potential problems, and keep the quality of the final result unaffected?

Thank you again for your reminder and look forward to your valuable suggestions.

Best wishes,
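
One possible way to handle the escaping question is a pair of helpers that escape text before sending and reverse the process afterwards. This is a minimal sketch with hypothetical names (`toHtml`, `fromHtml`). It assumes newlines are carried as `<br>` tags, which the quoted remark does not actually state, and it only covers the five basic entities, so other entities in a response would need extra handling.

```javascript
// Hypothetical helpers; ASSUMPTION: newlines travel as <br> tags.
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Escape special characters first, then turn newlines into <br>,
// so the inserted tags are not themselves escaped.
function toHtml(text) {
  return text
    .replace(/[&<>"']/g, (ch) => ENTITIES[ch])
    .replace(/\r?\n/g, '<br>');
}

// Reverse the steps: <br> back to newlines, then unescape entities.
// &amp; is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
function fromHtml(html) {
  return html
    .replace(/<br\s*\/?>/gi, '\n')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');
}

const original = 'a < b & "c"\nsecond line';
const escaped = toHtml(original);
console.log(escaped);
console.log(fromHtml(escaped) === original); // true
```

Whether the service preserves `<br>` tags and entity forms unchanged in its output is not confirmed in this thread, so the round trip should be checked against real responses.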

@star-kiss

Hi, you mentioned earlier that you tried to reverse engineer the minified script that calculates x-goog-batchexecute-bgr on the Google Translate page, without success. How is your progress on this now? Have you run into any new challenges or found any new clues?

@hayatnoor

I also tried using Cypress to open a browser window pointed at translate.google.com to get the translation, but it gives me the same translation as this module does. When I open the exact same URL that Cypress generates from my script in a normal Chrome window, it gives me the correct translation. I tried using the same user agent header in my Cypress test, but it still didn't work.

@ShihabZzz

@Waste2Time

It is very complex. I think that unless we reverse engineer Chrome, we can't figure out what's in that header. In fact, I tried to reverse engineer Chrome using both dynamic and static methods half a month ago, but to no avail. I have very little knowledge of reverse engineering.

What I found about this header is that the content to translate is clearly there in base64-encoded form, but there is a prefix and a suffix, and I can't work out where they come from or what they represent.

@xsxfjsm

xsxfjsm commented Mar 13, 2025

The translation request using the /v1/translateHtml endpoint did not work as expected. For instance, when translating the text "你是個好人" with the following input:
[[["你是個好人"],"auto","de"],"wt_lib"]
result:
[["你是個好人"],["en"]]
The content was not translated, and the source language was incorrectly identified as English ("en") instead of Traditional Chinese ("zh-TW").

@Waste2Time

I guess it's Google's fault. Since this API is used in Chrome, have you tried translating Google pages that contain "你是個好人"? I tried, and found that most search results containing Traditional Chinese weren't translated, whether to en or de, while Simplified Chinese content was fine.

@xsxfjsm

xsxfjsm commented Mar 14, 2025

If the source language is set to Traditional Chinese (zh-TW), it can be translated.
[[["你是個好人"],"zh-TW","de"],"wt_lib"]
result:
[ [ "Du bist ein guter Mensch" ] ]
However, the translation quality of most Traditional Chinese translations is relatively poor.
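
The exchange above suggests one workaround: if an `auto` request returns the input unchanged, retry with an explicit source language such as `zh-TW`. A minimal sketch of the detection step follows; `looksUntranslated` is a hypothetical helper, and it can give false positives for text that is legitimately identical in both languages (names, numbers, code).

```javascript
// Hypothetical check: did every output come back identical to its input?
// Compares trimmed strings pairwise; a missing output counts as a mismatch
// unless the input itself is empty.
function looksUntranslated(inputs, outputs) {
  return (
    inputs.length === outputs.length &&
    inputs.every((text, i) => text.trim() === (outputs[i] ?? '').trim())
  );
}

// Values taken from the two requests above.
console.log(looksUntranslated(['你是個好人'], ['你是個好人'])); // true
console.log(looksUntranslated(['你是個好人'], ['Du bist ein guter Mensch'])); // false
```

A caller could use this to decide whether to repeat the request with a caller-supplied source language instead of `auto`.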

@Waste2Time

I don't know German, so I can't tell whether the translation is poorer, but I suggest you try your example on the Google Translate website; the website result is the same as what you posted. A possible explanation, I guess, is that it's a fault in Google's translation of Traditional Chinese.

@xsxfjsm

xsxfjsm commented Mar 18, 2025

Yes, Google Translate doesn't work well with Traditional Chinese. Thank you for your answer.


8 participants