How to match translations of the same text arriving from a different URL #136

Closed
ttizze opened this issue Jul 31, 2024 · 4 comments · Fixed by #143
Assignees
Labels
feature New feature implementation

Comments

Owner

ttizze commented Jul 31, 2024

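Why
For example, if a translation of Alice's Adventures in Wonderland arrives from a different URL, the existing translated text can be reused as-is, so I want to match the two.

Once title extraction works, search by partial match or similar, then match the source text against the records that hit?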
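
A minimal sketch of the lookup idea above, assuming the extracted title is compared against stored titles by case-insensitive substring match. The `find_candidates` helper, the record shape, and the example URLs are hypothetical stand-ins, not the project's actual storage or API.

```python
def normalize(title: str) -> str:
    # Lowercase and collapse whitespace so trivial differences don't block a match.
    return " ".join(title.lower().split())


def find_candidates(extracted_title: str, pages: list[dict]) -> list[dict]:
    """Return stored pages whose title contains, or is contained in, the extracted title."""
    needle = normalize(extracted_title)
    if not needle:
        return []
    hits = []
    for page in pages:
        stored = normalize(page["title"])
        if stored and (needle in stored or stored in needle):
            hits.append(page)
    return hits


# Hypothetical stored records: one page per source URL.
pages = [
    {"url": "https://example.com/a", "title": "Alice's Adventures in Wonderland"},
    {"url": "https://example.com/b", "title": "Through the Looking-Glass"},
]

# A title extracted from a different URL, with a site name appended.
candidates = find_candidates(
    "Alice's Adventures in Wonderland | Example Library", pages
)
```

The pages returned here are only candidates; their source text would still need to be compared segment by segment before any translation is reused.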

Owner Author

ttizze commented Aug 3, 2024

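Hashing the title and number together as context works, except when the number differs.
Using the title plus the preceding and following text breaks when the same sentence appears with the same neighbors but in a different context (this seems likely in poetry, for example).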
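
The two candidate keys can be compared with a small sketch. It assumes SHA-256 over the joined parts and a made-up poem with a repeated refrain; `key_by_number` and `key_by_neighbors` are hypothetical names for illustration only.

```python
import hashlib


def context_hash(*parts: str) -> str:
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def key_by_number(title: str, number: int) -> str:
    # Context = title + position of the segment in the document.
    return context_hash(title, str(number))


def key_by_neighbors(title: str, segments: list[str], i: int) -> str:
    # Context = title + the segment itself + the text immediately before and after it.
    prev_text = segments[i - 1] if i > 0 else ""
    next_text = segments[i + 1] if i + 1 < len(segments) else ""
    return context_hash(title, prev_text, segments[i], next_text)


# A made-up poem whose refrain repeats several times in a row.
title = "Example Poem"
segments = ["la", "la", "la", "la", "the end"]

# Segments 1 and 2 have identical text and identical neighbors,
# so the neighbor-based key cannot tell them apart.
neighbors_collide = key_by_neighbors(title, segments, 1) == key_by_neighbors(title, segments, 2)

# The number-based key separates them, but it stops matching
# if another source numbers the same segment differently.
numbers_differ = key_by_number(title, 1) != key_by_number(title, 2)
```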

Owner Author

ttizze commented Aug 3, 2024

Identify segments by title and number. Hashing is unnecessary.
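
A minimal sketch of that decision, assuming translations are stored under a plain `(title, number)` composite key with no hash in between. The in-memory dict and the function names are hypothetical; the actual fix in PR #143 may be structured differently.

```python
from typing import Optional

# Hypothetical store: translated text keyed directly by (title, number).
translations: dict[tuple[str, int], str] = {}


def save_translation(title: str, number: int, translated: str) -> None:
    translations[(title, number)] = translated


def lookup_translation(title: str, number: int) -> Optional[str]:
    # Returns None when no translation exists for this title and number.
    return translations.get((title, number))


# Translations saved when the text was first seen at one URL.
book = "Alice's Adventures in Wonderland"
save_translation(book, 0, "translated segment 0")
save_translation(book, 1, "translated segment 1")

# The same text arrives from a different URL: the URL plays no part in the key,
# so existing translations are reused and only segment 2 still needs translating.
reused = [lookup_translation(book, n) for n in range(3)]
```

Because the key is an ordinary tuple, it can map directly onto a composite unique constraint in a database, which is one reason a hash adds nothing here. The match still depends on both sources yielding the same title and the same segment numbering.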

ttizze added the bug and feature labels and removed the bug label Aug 3, 2024
ttizze self-assigned this Aug 3, 2024

github-actions bot commented Aug 3, 2024

Created branch ttizze/feat-issue-136

github-actions bot commented Aug 3, 2024

PR #143 has been linked to this issue

Projects
Status: Done
1 participant