[Bug]: GPT missed some translations after commit 89443fc #805
Comments
I don't think that is the whole problem. Using GPT-4o-mini on commit 027c966 does not fail. commit 027c966 with GPT-4o-mini, console log:
Actually, the real issue is that the other commit you mentioned comes after this PR. The chatgpt file has not changed at all, and that commit is unrelated to this problem, so you have probably changed something yourself that differs from the repository.

Your 4omini filled the merged sentences with ellipses <|2|>..., which could be a coincidence. I tried five or six times, which is 5*15 attempts, and did not succeed even once. Retrying 15 times here is redundant. The _INVALID_REPEAT_COUNT parameter seems to be useless now: it only increases the number of retries. Let's change it to 0 in the next PR.

Are you sure that you used 4omini in both versions and did not change the model yourself? Are your prompts consistent between the two? Are you using the same API? If any of these differ, make them the same and retry.
The problem has been solved in a different way; see the new PR.
I understand the situation you described. Before this commit, it could indeed output results as you said. Previously, the untranslated content was appended directly after the first translation, so it would always output something, but the results were wrong. I would rather output nothing than mislead users with incorrect results. As I mentioned in the PR corresponding to that commit, #788 was meant to solve the problem of errors in the final translation. The fact that you cannot get normal output in the new version shows precisely that your translation results contain errors; otherwise they would be output normally. That you did not notice these errors yourself also shows the value of this PR. Here are some more detailed examples. The following output was produced with the version that you said outputs normally.
The original logic applied when the model returned too few replies, for example:
It would keep all of the results from the first translation, build a translation list from them, and then fill the remaining positions with empty strings to be filled in later: It looks as if everything has been translated, but in fact most of these translations are misaligned with the speech bubbles. The text of the first two speech bubbles has been merged into one sentence and placed in the first bubble. The text of bubbles 9 and 10 has been merged and placed in the eighth bubble, whose source text was "治安を乱す反動分子がいた場合". This goes beyond a degraded reading experience: I have often seen one character's line come out of another character's mouth. The result is very confusing, you cannot locate where the problem occurs, and if you do not understand the source language at all, it is even harder to spot.
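The padding behavior described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `pad_first_response` and the sample strings are hypothetical, and the model reply is assumed to be already split into a list with one entry per returned translation.

```python
from typing import List


def pad_first_response(translations: List[str], expected: int) -> List[str]:
    """Keep every translation from the first reply and pad the rest with ''.

    Hypothetical reconstruction of the old behavior described in the thread.
    """
    padded = list(translations[:expected])
    # Positions the model did not return are left empty for later filling.
    padded += [""] * (expected - len(padded))
    return padded


# Hypothetical example: four source bubbles, but the model merged the
# first two bubbles into a single reply line, so only three lines came back.
sources = ["src1", "src2", "src3", "src4"]
reply = ["tr1+tr2", "tr3", "tr4"]

padded = pad_first_response(reply, len(sources))
pairs = list(zip(sources, padded))
for src, tr in pairs:
    print(f"{src!r} -> {tr!r}")
```

Because the merged line takes up only one slot, every later translation shifts up by one position: `src2` is paired with the translation of the third bubble, and the empty string lands at the end instead of at the position that was actually skipped.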
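Now let's look at how the first translation result differs from the two intermediate results and the final result:
These are the two intermediate translation results:
This is the final result:
Details of part of the final result: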
The final translation result keeps the first 8 sentences from the initial attempt. The first 8 sentences of every subsequent retry are wasted, because the final translation always uses the entire content of the first attempt. The ninth sentence is appended from the corresponding position of the last retry, which happened to contain a translation there. If the ninth sentence of the last retry is empty while other retries have translations for the ninth or tenth sentence, the final result still has an empty ninth sentence. This explains why translations used to end with repeated content so often: the last sentence of the final retry happened to be the same as the last sentence of the initial translation, and it sat at a position after the end of the initial translation. Previously, we would also encounter this kind of problem:
If the first result is always kept and the first attempt returns censored content, the first position never picks up the correct result, even when a later retry translates it correctly. The second position keeps "'應該在…" instead of "應該在的…". Often a translation from one of the intermediate attempts is better than the one from the last attempt, so this approach not only wastes tokens but may also use a less reliable translation. In addition, you can see every intermediate translation, yet the program only ever picks the one at the corresponding position of the last retry, which may happen to be the worst of all the retries. That is frustrating to watch. It is better to require a reply that passes in a single attempt and leave no room for partial results.
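As a rough sketch of the two approaches discussed in this comment (an illustration under stated assumptions, not the project's actual implementation): `merge_old` models the old behavior of keeping the first attempt and filling only the missing tail from the last retry, and `accept_strict` models the alternative of accepting a reply only when its count matches the number of source texts. Both function names and all sample strings are hypothetical.

```python
from typing import List, Optional


def merge_old(first: List[str], last_retry: List[str], expected: int) -> List[str]:
    """Old behavior as described: the first attempt always wins, and only
    positions it did not cover are taken from the last retry."""
    merged = list(first[:expected])
    for i in range(len(merged), expected):
        merged.append(last_retry[i] if i < len(last_retry) else "")
    return merged


def accept_strict(reply: List[str], expected: int) -> Optional[List[str]]:
    """Strict alternative: accept a reply only if it has exactly one
    translation per source text, otherwise reject it entirely."""
    if len(reply) != expected:
        return None
    return list(reply)


# Hypothetical attempts for three source texts.
first = ["[censored]", "tr2"]          # first attempt: refused line, one missing
middle = ["good1", "good2", "good3"]   # an intermediate retry that is complete
last = ["ok1", "ok2", ""]              # last retry: third position empty

old_result = merge_old(first, last, 3)
strict_first = accept_strict(first, 3)
strict_middle = accept_strict(middle, 3)
print(old_result, strict_first, strict_middle)
```

In this example the old merge keeps the censored first position and ends with an empty third position, even though the intermediate retry had usable translations for both. The strict check rejects the short first attempt and accepts the complete intermediate reply as a whole.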
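I understand what you mean. I'll wait for #807 to be merged, then pull and try again. Although, as you said, there would be translation errors and misaligned bubbles, for the sake of perfection it now outputs nothing at all, not even the errors. But that's fine.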
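Deliberately outputting incorrect content does not seem reasonable. I had been troubled by the misalignment problem for a long time before that commit. I was only fixing a problem that the vast majority of users may run into, or rather one they have not noticed: they assume the translation is unreadable because machine translation is poor, when it is actually a problem in the code logic. I did not consider blank pages because I do not use 4omini. In fact, 4omini's misalignment problem is very severe; content moderation used to block me so completely that misalignment was not even the issue for me. I recommend deepseek: it is smart and cheap.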
I have indeed been thinking about switching to deepseek recently. I'll take this chance to try it. Thanks.
Issue
After commit 89443fc, GPT missed some translations in some cases, as shown in this example:
Source image (Eiyuu Kikan: Chapter13-004):
commit 89443fc:
commit 027c966 (before 89443fc):
Command Line Arguments
Console logs
commit 89443fc:
commit 027c966 (before 89443fc):