Add fine-tuning code and scripts #1

Merged
llewelld merged 9 commits into main from baskerville on Jun 3, 2025

Conversation

llewelld (Collaborator) commented:

Adds the initial code for fine-tuning.

Only the small model works on an 80 GiB A100. The standard model gives an out-of-memory error on the GPU.

@llewelld force-pushed the baskerville branch 5 times, most recently from 5779493 to f9c710c on May 22, 2025 at 13:10
llewelld added 9 commits on June 3, 2025 at 11:56
Adds code for fine-tuning the models on Baskerville.
The directory contains both prediction and fine-tuning code, so the name no longer
makes sense.

Keeping both tasks in the same directory turned out to be convenient because
the downloads directory can then be shared between them more easily.
Wraps the model in FSDP, but training still runs out of memory during the
backward pass.
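
For reference, a minimal sketch of this kind of FSDP wrapping in PyTorch; the auto-wrap policy, dtypes and the `build_model` helper are illustrative assumptions, not the code in this PR:

```python
# Sketch only: wrap a model in PyTorch FSDP with an assumed size-based
# auto-wrap policy and bf16 mixed precision.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = build_model()  # hypothetical helper returning the nn.Module to fine-tune

model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=1_000_000
    ),
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16
    ),
    device_id=torch.cuda.current_device(),
)

# Sharding parameters lowers per-GPU memory for the forward pass, but
# activations and optimiser state can still exhaust memory during the
# backward pass, which matches the behaviour described above.
```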
Adds headers to various files (a sketch of such a header follows this list):
1. Shebang interpreter directive.
2. vim modeline configuration.
3. SPDX licence identifier.
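
For illustration, a header combining the three items above might look like this; the modeline options and the licence identifier are assumptions, not necessarily the values used in this PR:

```python
#!/usr/bin/env python3
# vim: et:ts=4:sts=4:sw=4
# SPDX-License-Identifier: MIT
#
# Note: the modeline settings and the MIT identifier above are illustrative
# assumptions; the repository's actual headers may differ.
```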
Renames the scripts that most closely match those working on DAWN to use the
suffix "aligned", avoiding confusion over their purpose.
Adds a loss function to align with the formula in the paper.
Fixes the loss function implementation for the aligned and FSDP cases.
Adds code for comparing DAWN and Baskerville results and generating comparison
graphs.
Adds a README with instructions for running the comparisons and generating the
comparison graphs.
@llewelld merged commit 0830e37 into main on Jun 3, 2025