
@All4Nothing All4Nothing commented Jun 4, 2025

Description

This PR improves the batch size handling in the GRPOTrainer's compute_loss method to make it more robust and explicit.

Changes

  • Added an explicit batch size calculation derived from the advantages tensor
  • Added safe tensor slicing to keep batch sizes consistent across all tensors

Why

The current implementation assumes all tensors share the same batch size; when they do not, the mismatch surfaces as a shape error deep inside the loss computation. The improved version:

  1. Explicitly calculates batch size from the advantages tensor
  2. Ensures all tensors are properly sliced to match the batch size
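The two steps above can be sketched as follows. This is a minimal illustration, not the actual diff: the helper name `_align_batch` and the tensor names are hypothetical stand-ins for the tensors `compute_loss` works with.

```python
import torch

def _align_batch(advantages, per_token_logps, completion_mask):
    # Step 1 (hypothetical sketch): derive the batch size explicitly from
    # the advantages tensor instead of assuming all tensors agree.
    batch_size = advantages.size(0)

    # Step 2: slice every tensor to that common batch size, so a stray
    # extra row (e.g. from padding or collation) cannot cause a shape
    # mismatch later in the loss computation.
    per_token_logps = per_token_logps[:batch_size]
    completion_mask = completion_mask[:batch_size]
    return batch_size, per_token_logps, completion_mask

adv = torch.randn(4)
logps = torch.randn(5, 10)   # one extra row, to illustrate the slicing
mask = torch.ones(5, 10)
bs, logps, mask = _align_batch(adv, logps, mask)
# logps and mask are now sliced to 4 rows, matching advantages
```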

Testing

  • Tested with different batch sizes
  • Verified backward compatibility
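A minimal check along these lines (hypothetical, with `slice_to_batch` standing in for the PR's slicing step) covers both bullets: slicing is a no-op when batch sizes already match, and truncates cleanly when they differ.

```python
import torch

def slice_to_batch(t, batch_size):
    # Hypothetical helper mirroring the PR's safe-slicing step.
    return t[:batch_size]

# Backward compatibility: when sizes already match, slicing is a no-op.
x = torch.arange(12).reshape(4, 3)
assert torch.equal(slice_to_batch(x, 4), x)

# Different batch sizes: the larger tensor is truncated to match.
y = torch.arange(15).reshape(5, 3)
assert slice_to_batch(y, 4).shape == (4, 3)
```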

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
