I am experiencing an issue with batch inference using my self-trained model. When I perform inference on single samples, the results are consistent and correct. However, when I perform inference on batches of multiple samples, the results differ unexpectedly.
I also find it strange that the outputs of batch inference change when I alter the batch size. I’ve tested batch sizes ranging from 8 to 64, and the inconsistencies increase with larger batch sizes.
I've updated unsloth to version 2024.12.4, set padding_side to 'left', and set tokenizer.pad_token = tokenizer.unk_token, but it still doesn't work.
Here is my code, followed by a minimal single-vs-batch comparison sketch:
from unsloth import FastLanguageModel  # import needed for the calls below

max_seq_length = 1024  # in case of truncation
dtype = None  # None for auto detection. Float16 for Tesla T4/V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{weights_path}",  # the model used for training
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.unk_token

batch_size = 4  # 32 # 64

# Prepare the batch buffers
batch_input_strs = []
batch_data_js_items = []

# Iterate through the data and prepare inputs in batches
# (alpaca_prompt, INSTRCTION, EOS_TOKEN, data_js, eval_item_num, weights_path are defined elsewhere)
for idx, data_js_item in enumerate(data_js[:eval_item_num]):
    input_str = f"game_record:{data_js_item['game_record']}, 'target_player':{data_js_item['target_player']}"
    batch_input_strs.append(input_str)
    batch_data_js_items.append(data_js_item)

    # Once the batch size is reached, or we've processed the last item
    if len(batch_input_strs) == batch_size or idx == len(data_js[:eval_item_num]) - 1:
        # Prepare batch inputs for the tokenizer
        inputs = tokenizer(
            [alpaca_prompt.format(INSTRCTION, input_str, "") for input_str in batch_input_strs],
            return_tensors="pt",
            padding=True,
            truncation=True,
        ).to("cuda")

        # Perform batch inference
        outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True, do_sample=False)

        # Decode the batch outputs and strip padding tokens from each decoded string
        output_lst = tokenizer.batch_decode(outputs)
        output_lst = [output_text.replace(tokenizer.pad_token, "") for output_text in output_lst]

        # Process results for each item in the batch
        for i, output_text in enumerate(output_lst):
            # Extract the response text from the model output
            s_idx = output_text.find("### Response:\n") + len("### Response:\n")
            e_idx = output_text.find(EOS_TOKEN)
            predict_str = output_text[s_idx:e_idx]

        # Reset the batch buffers
        batch_input_strs = []
        batch_data_js_items = []
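
To make the mismatch easier to reproduce, here is a minimal sketch (reusing model, tokenizer, alpaca_prompt, and INSTRCTION from the code above; the prompt contents are placeholders) that generates the same prompt once on its own and once as the first row of a padded batch, then compares the decoded outputs:

def generate_batch(prompts):
    # Tokenize with left padding (set above) and greedy-decode the whole batch
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True, do_sample=False)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Placeholder prompts in the same format as above; real game records go here
probe = alpaca_prompt.format(INSTRCTION, "game_record:..., 'target_player':...", "")
filler = alpaca_prompt.format(INSTRCTION, "game_record:..., 'target_player':...", "")

single_out = generate_batch([probe])[0]                   # batch of 1
batched_out = generate_batch([probe] + [filler] * 7)[0]   # same prompt as the first row of a batch of 8

# With do_sample=False I would expect these to match, but they differ for me
print(single_out == batched_out)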