I have added some new words to the tokenizer returned by t5.get_tokenizer(), as shown below:
def get_tokenizer(name):
    tokenizer = T5Tokenizer.from_pretrained(name, model_max_length=MAX_LENGTH)
    new_words = ['XXX', 'OOO', ......]
    tokenizer.add_tokens(new_words)
    return tokenizer
I would like to understand if I need to retrain or fine-tune the EncoderModel after adding these new words to the tokenizer. How will this modification affect the model's performance or behavior?
This question is related to the Imagen project, and I want to ensure that I am following the correct approach when incorporating new words into the tokenizer.
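To make the concern concrete, here is a minimal, library-free sketch of what adding tokens implies for an encoder's embedding table. It is a toy under stated assumptions, not the Imagen or Hugging Face implementation: `make_embedding_table`, `add_tokens`, and `resize_embeddings` are hypothetical helpers, and the three-word vocabulary is invented for illustration. In Hugging Face transformers the analogous step would be `model.resize_token_embeddings(len(tokenizer))`. Whether the resulting new rows need fine-tuning in the Imagen setup is exactly what this issue asks.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Toy stand-in for a pretrained embedding matrix (one row per token id)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def add_tokens(vocab, new_words):
    """Hypothetical mimic of adding tokens: unseen words get the next free ids."""
    added = 0
    for word in new_words:
        if word not in vocab:
            vocab[word] = len(vocab)
            added += 1
    return added


def resize_embeddings(table, new_vocab_size, seed=1):
    """Append freshly initialized (untrained) rows so every id has an embedding."""
    rng = random.Random(seed)
    dim = len(table[0])
    extra_rows = new_vocab_size - len(table)
    extra = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(extra_rows)]
    return table + extra


# Invented toy vocabulary; a real T5 vocabulary is far larger.
vocab = {"hello": 0, "world": 1, "<unk>": 2}
table = make_embedding_table(len(vocab), dim=4)

num_added = add_tokens(vocab, ["XXX", "OOO"])

# In this toy setup the new ids point past the end of the original table.
out_of_range = [vocab[w] >= len(table) for w in ("XXX", "OOO")]

# After resizing, the old rows are unchanged and the new rows are untrained.
resized = resize_embeddings(table, len(vocab))
```

In the sketch the pretrained rows survive resizing untouched, while the rows for 'XXX' and 'OOO' carry no learned meaning until some training updates them.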