
Batch inference produces inconsistent results for self-trained model #1456

Open
Xyuan13 opened this issue Dec 20, 2024 · 0 comments
Labels
unsure bug? I'm unsure

Comments


Xyuan13 commented Dec 20, 2024

I am experiencing an issue with batch inference using my self-trained model. When I perform inference on single samples, the results are consistent and correct. However, when I perform inference on batches of multiple samples, the results differ unexpectedly.

I also find it strange that the outputs of batch inference change when I alter the batch size. I’ve tested batch sizes ranging from 8 to 64, and the inconsistencies increase with larger batch sizes.

I've updated the unsloth version to 2024.12.4, set padding_side to 'left', and set tokenizer.pad_token = tokenizer.unk_token, but it still does not work.

Here is my code:

from unsloth import FastLanguageModel

max_seq_length = 1024 # cap sequence length to avoid silent truncation
dtype = None # None for auto detection. Float16 for Tesla T4/V100, bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{weights_path}", # path to the self-trained model used for training
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # enable Unsloth's inference mode

tokenizer.padding_side='left'
tokenizer.pad_token = tokenizer.unk_token

batch_size = 4 # also tried 32, 64

# Prepare the batch
batch_input_strs = []
batch_data_js_items = []

# Iterate through data and prepare inputs in batches
for idx, data_js_item in enumerate(data_js[:eval_item_num]):
    input_str = f"game_record:{data_js_item['game_record']}, 'target_player':{data_js_item['target_player']}"
    batch_input_strs.append(input_str)
    batch_data_js_items.append(data_js_item)

    # Once the batch size is reached, or we've processed the last item
    if len(batch_input_strs) == batch_size or idx == len(data_js[:eval_item_num]) - 1:
        # Tokenize the formatted prompts for the whole batch (left-padded, as set above)
        inputs = tokenizer(
            [alpaca_prompt.format(INSTRCTION, input_str, "") for input_str in batch_input_strs],
            return_tensors="pt",
            padding=True,
            truncation=True,
        ).to("cuda")
        # Perform batch inference
        outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True, do_sample = False)

        # Decode the batch outputs
        output_lst = tokenizer.batch_decode(outputs)

        # Strip pad tokens from the decoded outputs
        output_lst = [output_text.replace(tokenizer.pad_token, "") for output_text in output_lst]
        
        # Process results for each item in the batch
        for i, output_text in enumerate(output_lst):
            # Extract the response text between the "### Response:" marker and the EOS token
            s_idx = output_text.find("### Response:\n") + len("### Response:\n")
            e_idx = output_text.find(EOS_TOKEN)
            predict_str = output_text[s_idx:e_idx]
            # downstream evaluation of predict_str (e.g. against batch_data_js_items[i]) omitted here
        
        batch_input_strs = []
        batch_data_js_items = []

@shimmyshimmer shimmyshimmer added the unsure bug? I'm unsure label Dec 23, 2024