
@All4Nothing All4Nothing commented Jun 4, 2025

Description

This PR improves the batch size handling in the GRPOTrainer's compute_loss method to make it more robust and explicit.

Changes

  • Added an explicit batch size calculation derived from the advantages tensor
  • Added safe tensor slicing to keep batch sizes consistent across all tensors

Why

The current implementation assumes all tensors share the same batch size; when they do not, the mismatch surfaces as a shape error deep inside the loss computation. The improved version:

  1. Explicitly calculates batch size from the advantages tensor
  2. Ensures all tensors are properly sliced to match the batch size
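The two steps above can be sketched as follows. This is a minimal illustration, not the actual diff: the helper name `_align_batch` and the tensor names are hypothetical stand-ins for the tensors `compute_loss` works with.

```python
import torch

def _align_batch(advantages, per_token_logps, completion_mask):
    # Step 1 (hypothetical sketch): derive the batch size explicitly from
    # the advantages tensor instead of assuming all tensors agree.
    batch_size = advantages.size(0)

    # Step 2: slice every tensor to that common batch size, so a stray
    # extra row (e.g. from padding or collation) cannot cause a shape
    # mismatch later in the loss computation.
    per_token_logps = per_token_logps[:batch_size]
    completion_mask = completion_mask[:batch_size]
    return batch_size, per_token_logps, completion_mask

adv = torch.randn(4)
logps = torch.randn(5, 10)   # one extra row, to illustrate the slicing
mask = torch.ones(5, 10)
bs, logps, mask = _align_batch(adv, logps, mask)
# logps and mask are now sliced to 4 rows, matching advantages
```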

Testing

  • Tested with different batch sizes
  • Verified backward compatibility
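A minimal check along these lines (hypothetical, with `slice_to_batch` standing in for the PR's slicing step) covers both bullets: slicing is a no-op when batch sizes already match, and truncates cleanly when they differ.

```python
import torch

def slice_to_batch(t, batch_size):
    # Hypothetical helper mirroring the PR's safe-slicing step.
    return t[:batch_size]

# Backward compatibility: when sizes already match, slicing is a no-op.
x = torch.arange(12).reshape(4, 3)
assert torch.equal(slice_to_batch(x, 4), x)

# Different batch sizes: the larger tensor is truncated to match.
y = torch.arange(15).reshape(5, 3)
assert slice_to_batch(y, 4).shape == (4, 3)
```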

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
