Add characters whose readings differ across the Taiwan Strait (補充兩岸讀音相異的單字) #38
The second commit, 「小修改」 ("minor edit"), was originally meant to delete an extra line I had typed by mistake.
You can revert the second commit.
This reverts commit c11a61e.
and update date
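Thanks. I found that the web version cannot perform this operation, so I downloaded the Windows version and did it there; it seems to have worked.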
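Thanks for the contribution.
The handling of weights for polyphonic characters needs attention.
There is one more blind spot: looking only at the code diff is not enough, the whole dictionary has to be checked.
If adding a reading turns a character into a polyphonic character, and that character also appears in a phrase entry with an explicit pronunciation annotation, then that phrase needs additional annotated entries as well.
Here is an example. Suppose the dictionary contains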
僞作 wei3 zuo4
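because 「作」 is a polyphonic character, with readings zuo1, zuo2, zuo4 and others.
The existing entry's annotation is there to disambiguate the reading of 「作」, so that the program does not automatically derive other combinations of single-character readings.
Now that 「僞」 gains a common reading wei4, both of the following have to be defined: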
僞作 wei3 zuo4
僞作 wei4 zuo4
Unfortunately, this situation currently still requires manual review to make sure the change is complete.
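The whole-dictionary check described above could be partly assisted by a script. The following is only a rough sketch, not part of this project: it assumes entries are tab-separated (text, space-separated syllables, optional weight), and the names `parse_entries` and `phrases_needing_review` are hypothetical. It lists annotated phrases that contain a character with a newly added reading but lack the corresponding annotation; whether the new reading really applies to each phrase is still a human decision.

```python
from collections import defaultdict


def parse_entries(lines):
    """Return (text, [syllables]) pairs from dictionary lines.

    Assumption: columns are tab-separated; lines with fewer than two
    columns (blank lines, comments, YAML header) are skipped.
    """
    entries = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2 or line.startswith("#"):
            continue
        entries.append((cols[0], cols[1].split()))
    return entries


def phrases_needing_review(lines, new_readings):
    """List (phrase, candidate annotation) pairs that may be missing.

    new_readings maps a character to the reading just added for it,
    e.g. {"僞": "wei4"}.
    """
    annotated = defaultdict(set)
    for text, syllables in parse_entries(lines):
        # Only phrases with one syllable per character are considered.
        if len(text) > 1 and len(text) == len(syllables):
            annotated[text].add(tuple(syllables))

    report = set()
    for text, codes in annotated.items():
        for i, ch in enumerate(text):
            new = new_readings.get(ch)
            if new is None:
                continue
            for code in codes:
                # Swap in the new reading at this position only.
                candidate = code[:i] + (new,) + code[i + 1:]
                if candidate not in codes:
                    report.add((text, " ".join(candidate)))
    return sorted(report)


if __name__ == "__main__":
    sample = ["僞作\twei3 zuo4", "僞\twei3", "作\tzuo4"]
    for phrase, code in phrases_needing_review(sample, {"僞": "wei4"}):
        print(phrase, code)
```

Each reported pair is a candidate for review, not an entry to add blindly.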
@@ -21399,6 +21433,7 @@ min_phrase_weight: 100
緛 ruan4
緜 mian2
緝 ji1 50%
ji1 and qi4 are both common readings. Their weight should be marked as 100%, or else none of the readings should carry a weight.
@@ -14420,7 +14442,9 @@ min_phrase_weight: 100
昑 qin3
昒 hu1
易 yi4
昔 cuo4
This reading should be marked as a rare reading: set the third column to 0%.
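The weight convention suggested in these review comments (common readings at 100% or unweighted, rare readings at 0%) can be linted roughly. This is a sketch under assumptions: entries are taken to be tab-separated with the weight in the third column, and `suspicious_weights` is a hypothetical helper, not an existing tool. It flags polyphonic characters that carry a weight other than 0% or 100% so that a person can look at them.

```python
from collections import defaultdict


def suspicious_weights(lines):
    """Return (char, syllable, weight) for weights that need a second look.

    Assumption: single-character entries look like "char<TAB>syllable"
    with an optional third column such as "50%".
    """
    readings = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Keep single-character entries only; phrases are out of scope here.
        if len(cols) < 2 or len(cols[0]) != 1:
            continue
        weight = cols[2] if len(cols) > 2 and cols[2] else None
        readings[cols[0]].append((cols[1], weight))

    flagged = []
    for char, items in readings.items():
        if len(items) < 2:
            continue  # not polyphonic in this dictionary
        for syllable, weight in items:
            if weight is not None and weight not in ("0%", "100%"):
                flagged.append((char, syllable, weight))
    return flagged


if __name__ == "__main__":
    sample = ["緝\tji1\t50%", "緝\tqi4", "昔\txi1", "昔\tcuo4\t0%", "易\tyi4"]
    for item in suspicious_weights(sample):
        print(*item)
```

A flagged entry is not necessarily wrong; the sketch only narrows down where to look.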
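I have just fixed the weights. I really do not know much about this area, so thank you for the pointers.
As for disambiguating polyphonic characters, that is probably beyond what I can manage; going through them one by one is a lot of work. See #32; I wonder whether it can be automated.
Thanks.
I have just added reading support based on 全字庫. However, it only covers single-character readings; phrase readings will probably need adjustment over the long term.
See issue #37.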