Use the eos token in place of the bos token when computing perplexity if the model has no bos token #253

Ktakuya332C · 2025-10-06T05:11:57Z

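Summary

Computing perplexity for a model that has no bos token (the Qwen family) puts None into the model input, and the run crashes.
When there is no bos token, this PR substitutes the eos token, which prevents the crash.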
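The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_prefix_token` and `StubTokenizer` are hypothetical names, and the stub only mirrors the `bos_token` / `eos_token` attribute names that Hugging Face tokenizers expose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTokenizer:
    """Hypothetical stand-in with the same attribute names as a Hugging Face tokenizer."""

    bos_token: Optional[str]
    eos_token: Optional[str]


def resolve_prefix_token(tokenizer) -> str:
    """Return bos_token when it is set, otherwise fall back to eos_token.

    Hypothetical helper for illustration; the actual change may be structured differently.
    """
    prefix = tokenizer.bos_token if tokenizer.bos_token is not None else tokenizer.eos_token
    if prefix is None:
        # Neither token is defined, so there is nothing sensible to prepend.
        raise ValueError("tokenizer defines neither bos_token nor eos_token")
    return prefix
```

With a tokenizer whose `bos_token` is None, the helper returns the eos token, so the prefix passed to the model is never None.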

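Details

For a model without a bos token, tokenizer.bos_token is None, so the prefix of every input segment becomes None, and None ends up in the input to the vLLM or HF model.
Models without a bos token have most likely been pretrained on data of the form text1<eos>text2, so with this change the perplexity is computed as if the <eos>text2 part of such a chunk had been taken as the input.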
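To make the `<eos>text2` reading concrete, here is a small sketch of perplexity computed with a leading prefix token. It assumes a hypothetical scorer `logprob_fn(context, token)` that returns log P(token | context); the function name and signature are illustrative and are not taken from this repository. The prefix token only supplies context for the first text token and is not scored itself.

```python
import math
from typing import Callable, Sequence


def perplexity_with_prefix(
    text_tokens: Sequence[str],
    prefix_token: str,
    logprob_fn: Callable[[Sequence[str], str], float],
) -> float:
    """Perplexity of text_tokens conditioned on a leading prefix token.

    logprob_fn is a hypothetical scorer returning log P(token | context).
    """
    if not text_tokens:
        raise ValueError("text_tokens must not be empty")
    context = [prefix_token]  # e.g. "<eos>" when the model has no bos token
    total_logprob = 0.0
    for token in text_tokens:
        total_logprob += logprob_fn(context, token)
        context.append(token)
    # Perplexity is exp of the negative mean log-probability of the scored tokens.
    return math.exp(-total_logprob / len(text_tokens))
```

Because the first text token is scored with the prefix as its context, every token of the text contributes to the average, which matches the pretraining layout described above.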

ryokan0123

LGTM

Ktakuya332C added 2 commits October 6, 2025 13:29

substitute eos if bos is not set

44efd39

apply ruff format

a9aedad

Ktakuya332C requested a review from a team October 6, 2025 06:02

Ktakuya332C marked this pull request as ready for review October 6, 2025 06:02

ryokan0123 approved these changes Oct 7, 2025

Ktakuya332C merged commit 0a10787 into main Oct 7, 2025
8 checks passed

Ktakuya332C deleted the fix/logprobs-without-bos-token branch October 7, 2025 01:37
