
Commit

another fix
ahmeda14960 committed Jan 17, 2025
1 parent fa049d6 commit 0b149b0
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/levanter/data/text.py
@@ -955,6 +955,9 @@ def preprocess_chat_example(batch, tokenizer: PreTrainedTokenizerBase, should_ap

# Tokenize sources to get lengths
sources_tokenized = tokenizer(sources, padding=False, truncation=True)

if should_append_eos:
    targets = [t + tokenizer.eos_token for t in targets]

# Combine for full examples
full_examples = [f"{s}{t}" for s, t in zip(sources, targets)]
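
The effect of the added lines can be sketched in isolation. This is a minimal illustration, not the Levanter implementation: `combine_examples` is a hypothetical helper, and the EOS string `"</s>"` and the sample chat text are assumptions chosen for the example. It shows that the EOS token is appended to each target before sources and targets are concatenated, so it ends up at the end of every full example.

```python
def combine_examples(sources, targets, eos_token, should_append_eos=True):
    """Hypothetical helper mirroring the added lines plus the combine step."""
    if should_append_eos:
        # Builds a new list; the caller's `targets` list is left untouched.
        targets = [t + eos_token for t in targets]
    # Concatenate each source with its (possibly EOS-terminated) target.
    return [f"{s}{t}" for s, t in zip(sources, targets)]


# Assumed sample data, for illustration only.
sources = ["User: hi\nAssistant: "]
targets = ["hello"]

with_eos = combine_examples(sources, targets, eos_token="</s>")
without_eos = combine_examples(sources, targets, eos_token="</s>", should_append_eos=False)

print(with_eos)
print(without_eos)
```

Note that string concatenation with `None` raises a `TypeError`, so a sketch like this assumes the tokenizer actually defines an EOS token.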
