I am experiencing an issue with batch inference using my self-trained model. When I perform inference on single samples, the results are consistent and correct. However, when I perform inference on batches of multiple samples, the results differ unexpectedly.
I also find it strange that the outputs of batch inference change when I alter the batch size. I’ve tested batch sizes ranging from 8 to 64, and the inconsistencies increase with larger batch sizes.
I've updated unsloth to version 2024.12.4, set padding_side to 'left', and set tokenizer.pad_token = tokenizer.unk_token, but it still doesn't work.
Here is my code, followed by a minimal single-vs-batch comparison sketch:
from unsloth import FastLanguageModel  # import needed for the calls below

max_seq_length = 1024  # in case of truncation
dtype = None  # None for auto detection. Float16 for Tesla T4/V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{weights_path}",  # the model used for training
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.unk_token

batch_size = 4  # 32 # 64

# Prepare the batch buffers
batch_input_strs = []
batch_data_js_items = []

# Iterate through the data and prepare inputs in batches
# (alpaca_prompt, INSTRCTION, EOS_TOKEN, data_js, eval_item_num, weights_path are defined elsewhere)
for idx, data_js_item in enumerate(data_js[:eval_item_num]):
    input_str = f"game_record:{data_js_item['game_record']}, 'target_player':{data_js_item['target_player']}"
    batch_input_strs.append(input_str)
    batch_data_js_items.append(data_js_item)

    # Once the batch size is reached, or we've processed the last item
    if len(batch_input_strs) == batch_size or idx == len(data_js[:eval_item_num]) - 1:
        # Prepare batch inputs for the tokenizer
        inputs = tokenizer(
            [alpaca_prompt.format(INSTRCTION, input_str, "") for input_str in batch_input_strs],
            return_tensors="pt",
            padding=True,
            truncation=True,
        ).to("cuda")

        # Perform batch inference
        outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True, do_sample=False)

        # Decode the batch outputs and strip padding tokens from each decoded string
        output_lst = tokenizer.batch_decode(outputs)
        output_lst = [output_text.replace(tokenizer.pad_token, "") for output_text in output_lst]

        # Process results for each item in the batch
        for i, output_text in enumerate(output_lst):
            # Extract the response text from the model output
            s_idx = output_text.find("### Response:\n") + len("### Response:\n")
            e_idx = output_text.find(EOS_TOKEN)
            predict_str = output_text[s_idx:e_idx]

        # Reset the batch buffers
        batch_input_strs = []
        batch_data_js_items = []
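
To make the mismatch easier to reproduce, here is a minimal sketch (reusing model, tokenizer, alpaca_prompt, and INSTRCTION from the code above; the prompt contents are placeholders) that generates the same prompt once on its own and once as the first row of a padded batch, then compares the decoded outputs:

def generate_batch(prompts):
    # Tokenize with left padding (set above) and greedy-decode the whole batch
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True, do_sample=False)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Placeholder prompts in the same format as above; real game records go here
probe = alpaca_prompt.format(INSTRCTION, "game_record:..., 'target_player':...", "")
filler = alpaca_prompt.format(INSTRCTION, "game_record:..., 'target_player':...", "")

single_out = generate_batch([probe])[0]                   # batch of 1
batched_out = generate_batch([probe] + [filler] * 7)[0]   # same prompt as the first row of a batch of 8

# With do_sample=False I would expect these to match, but they differ for me
print(single_out == batched_out)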