
@finbarrtimbers

This PR makes the following changes to DPO:

  1. Adds support for setting the ZeRO stage for DPO.
  2. Adds support for caching the reference logprobs to disk, so they don't have to be recomputed.
  3. Logs gradient norms during DPO training.
  4. Prevents DPO from crashing when wandb tags are unavailable.
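For item 1, a minimal sketch of how a configurable ZeRO stage might be turned into a DeepSpeed config. The helper name `make_deepspeed_config` and the particular extra keys chosen are assumptions for illustration, not this PR's code; `zero_optimization.stage`, `bf16.enabled`, `gradient_clipping`, and `stage3_gather_16bit_weights_on_model_save` are standard DeepSpeed config keys.

```python
from typing import Any, Dict


def make_deepspeed_config(zero_stage: int, grad_clip: float = 1.0) -> Dict[str, Any]:
    """Hypothetical helper: build a DeepSpeed config dict for a chosen ZeRO stage."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0, 1, 2, or 3, got {zero_stage}")
    zero: Dict[str, Any] = {"stage": zero_stage}
    if zero_stage == 3:
        # Under stage 3 the weights are sharded, so gather them when saving a checkpoint.
        zero["stage3_gather_16bit_weights_on_model_save"] = True
    return {
        "zero_optimization": zero,
        "bf16": {"enabled": True},
        "gradient_clipping": grad_clip,
    }
```

Validating the stage up front means a typo such as stage 4 fails immediately instead of partway through DeepSpeed initialization.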
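For item 2, here is a sketch of one way to cache reference logprobs to disk, assuming they are stored per example index as `[chosen_logp, rejected_logp]` and keyed by a hash of the settings that determine them. The function names, the JSON file format, and the choice of fields in the key are all hypothetical; the PR's actual cache layout may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List


def cache_key(model_name: str, dataset_name: str, max_seq_length: int) -> str:
    """Hash the (assumed) settings that determine the reference logprobs."""
    payload = json.dumps(
        {"model": model_name, "dataset": dataset_name, "max_seq_length": max_seq_length},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def load_or_compute_reference_logprobs(
    cache_dir: Path,
    key: str,
    compute_fn: Callable[[], Dict[int, List[float]]],
) -> Dict[int, List[float]]:
    """Return cached logprobs if present; otherwise compute them once and save them."""
    cache_path = cache_dir / f"reference_logprobs_{key}.json"
    if cache_path.exists():
        with cache_path.open() as f:
            # JSON object keys are strings, so convert example indices back to int.
            return {int(k): v for k, v in json.load(f).items()}
    logprobs = compute_fn()
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp_path = cache_path.with_suffix(".tmp")
    with tmp_path.open("w") as f:
        json.dump(logprobs, f)
    # Rename into place so an interrupted write never leaves a truncated cache file.
    tmp_path.replace(cache_path)
    return logprobs
```

Keying the file on a hash means changing the reference model or dataset produces a new cache file rather than silently reusing stale logprobs.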
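For item 3, the quantity usually logged as "grad norm" is the global L2 norm over all parameter gradients. A dependency-free sketch, assuming gradients are given as flat lists of floats (with `None` for parameters that received no gradient); in a real training loop this value would typically come from the framework's gradient-clipping call instead.

```python
import math
from typing import Iterable, Optional, Sequence


def global_grad_norm(grads: Iterable[Optional[Sequence[float]]]) -> float:
    """Global L2 norm: sqrt of the sum of squared entries across all gradients."""
    total = 0.0
    for grad in grads:
        if grad is None:  # e.g. frozen parameters with no gradient
            continue
        total += sum(x * x for x in grad)
    return math.sqrt(total)
```

Logging this per step makes loss spikes easier to diagnose, since exploding gradients show up as a jump in the norm before the loss diverges.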
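For item 4, a sketch of treating missing wandb tags as an empty list instead of an error, assuming tags arrive as an optional comma-separated string. The helper `resolve_wandb_tags` is hypothetical; its result could be passed to the `tags` argument of `wandb.init`.

```python
from typing import List, Optional


def resolve_wandb_tags(raw_tags: Optional[str]) -> List[str]:
    """Parse optional comma-separated tags; missing or empty input yields []."""
    if not raw_tags:
        return []
    return [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
```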

