Jasper and Stella: distillation of SOTA embedding models #67
Conversation
return f'Instruct: {task_description}\nQuery: {query}'

# Each query must come with a one-sentence instruction that describes the task
What if the teacher model doesn't need a prompt?
You can use a more detailed comment.
Thanks for the reminder, I will add it to the readme note.
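A minimal sketch of how the no-prompt case could be handled, assuming a hypothetical `format_query` helper (the name and signature are illustrative, not from this repo):

```python
def format_query(query, task_description=None):
    """Build the model input; skip the instruction when the model needs no prompt."""
    if task_description is None:
        # e.g. a teacher model that embeds raw queries directly
        return query
    return f'Instruct: {task_description}\nQuery: {query}'


print(format_query("what is distillation?"))
print(format_query("what is distillation?", "Retrieve relevant passages"))
```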
dic = set()
with open(train_data_path) as f:
    for line in tqdm.tqdm(f):
        data_dic = json.loads(line.strip())
data_dict = json.loads(line)
json.loads does not need strip.
Fully spelled-out variable names are never a mistake.
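For illustration, `json.loads` already ignores surrounding whitespace (including the trailing newline from file iteration), so the `strip()` call is redundant:

```python
import json

line = '{"query": "q1", "pos": ["p1"]}\n'  # raw line read from a file, newline included

# json.loads tolerates leading/trailing whitespace, so both parses are identical
assert json.loads(line) == json.loads(line.strip())
print(json.loads(line)["query"])
```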
if 'pos' in data_dic:
    for text_pos in data_dic['pos']:
for text_pos in data_dic.get('pos',[]):
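As a quick sketch with toy records (not from the actual training set), `dict.get` with an empty default collapses the membership check and the loop into one line:

```python
data_dic = {"query": "q1"}  # a record with no 'pos' field
collected = []
for text_pos in data_dic.get('pos', []):  # empty default: loop body is simply skipped
    collected.append(text_pos)
assert collected == []

data_dic = {"query": "q1", "pos": ["p1", "p2"]}
assert list(data_dic.get('pos', [])) == ["p1", "p2"]
```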
trust_remote_code=True,
device="cuda:7",
model_kwargs={
    "torch_dtype": torch.bfloat16,  # fp16 容易计算出nan ("fp16 easily produces NaN")
Remove the Chinese characters.
Also remove the comment itself; that experience does not apply to every model.
OK, thanks.
    return loss

def pair_inbatch_similarity_loss(
    self,
If you want to improve this further, consider adding type hints (typing).
Will add them in a follow-up.
    student_embeddings,   # [batch_size, dim]
    teacher_similarity,   # [batch_size, dim]
):
    loss_fct = nn.MSELoss()
Do we need to construct a new nn.MSELoss() every time the loss is computed? Unless the compiler optimizes this away, I suggest using F.mse_loss instead.
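For reference, the functional form computes the same value without constructing a module on every call (a sketch with random tensors, not the actual training batch):

```python
import torch
import torch.nn.functional as F
from torch import nn

student = torch.randn(4, 8)
teacher = torch.randn(4, 8)

# module form: allocates a fresh nn.MSELoss object each time this line runs
loss_module = nn.MSELoss()(student, teacher)

# functional form: identical math, no module construction
loss_functional = F.mse_loss(student, teacher)

assert torch.allclose(loss_module, loss_functional)
```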
Fixed.
todo: