
Commit 164b2d4

FSDP, PEFT and Readme adjustments
- adding FSDP checkpoint load example
- adjust the lora model merge example
- change the 7b model to use PEFT fine-tuning
- readme spelling fixes
1 parent f9ba5dd commit 164b2d4

File tree

3 files changed: +209 -43 lines changed


distributed_training/llama2/README.md

+14 -14
@@ -11,7 +11,7 @@ You can select your preferred Llama2 model size in the setup configuration betwe
 
 ## Prerequisite
 
-The key prerequisites that you would need to set tup before you can proceed to run the distributed fine-tuning process on Oracle Cloud Infrastructure Data Science Service.
+The key prerequisites that you would need to set up before you can proceed to run the distributed fine-tuning process on Oracle Cloud Infrastructure Data Science Service.
 
 * [Configure custom subnet](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/distributed_training#1-networking) - with security list to allow ingress into any port from the IPs originating within the CIDR block of the subnet. This is to ensure that the hosts on the subnet can connect to each other during distributed training.
 * [Create an object storage bucket](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/distributed_training#2-object-storage) - to save the fine tuned weights
@@ -90,8 +90,6 @@ spec:
 appdirs==1.4.4
 loralib==0.1.2
 bitsandbytes==0.39.1
-black==23.9.1
-'black[jupyter]'
 datasets==2.12.0
 fire==0.5.0
 'git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4'
@@ -128,7 +126,7 @@ spec:
 infrastructure:
 kind: infrastructure
 spec:
-blockStorageSize: 512
+blockStorageSize: 256
 logGroupId: ocid1.loggroup.<>
 logId: ocid1.log.<>
 subnetId: ocid1.subnet.<>
@@ -148,7 +146,7 @@ spec:
 --peft_method lora
 --pure_bf16
 --mixed_precision
---batch_size_training 4
+--batch_size_training 1
 --model_name $MODEL_NAME
 --output_dir /home/datascience/outputs
 --num_epochs 1
@@ -164,8 +162,6 @@ spec:
 appdirs==1.4.4
 loralib==0.1.2
 bitsandbytes==0.39.1
-black==23.9.1
-'black[jupyter]'
 datasets==2.12.0
 fire==0.5.0
 'git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4'
@@ -176,7 +172,7 @@ spec:
 scipy==1.10.0
 optimum==1.13.1
 outputDir: /home/datascience/outputs
-outputUri: oci://<bucket-for-finetuned-model>@<namespace>/$JOB_OCID
+outputUri: oci://llama2@bigdatadatasciencelarge/outputs/lvp-7b/$JOB_OCID
 env:
 - name: MODEL_NAME
 value: meta-llama/Llama-2-7b-hf
@@ -214,7 +210,7 @@ ads opctl watch <job run ocid of job-run-ocid>
 
 ### ADS Python API
 
-As we mention you could also run the distributed fine-tuning process directly via the ADS Python API. Here the examples for fine-tuning full parameters of the [7B model](https://huggingface.co/meta-llama/Llama-2-7b-hf) using [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/).
+As mentioned, you can also run the distributed fine-tuning process directly via the ADS Python API. Here is an example for fine-tuning the full parameters of the [7B model](https://huggingface.co/meta-llama/Llama-2-7b-hf) using [FSDP](https://engineering.fb.com/2021/07/15/open-source/fsdp/). Notice that in the following example we use `--dist_checkpoint_root_folder` and `--dist_checkpoint_folder`, as those are required when only the FSDP fine-tuning process is executed.
 
 ```python
 from ads.jobs import Job, DataScienceJob, PyTorchDistributedRuntime
@@ -245,8 +241,6 @@ job = (
 "appdirs==1.4.4",
 "loralib==0.1.2",
 "bitsandbytes==0.39.1",
-"black==23.9.1",
-"black[jupyter]",
 "datasets==2.12.0",
 "fire==0.5.0",
 "git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4",
@@ -264,7 +258,6 @@ job = (
 "--enable_fsdp",
 "--pure_bf16",
 "--batch_size_training 1",
-"--micro_batch_size 1",
 "--model_name $MODEL_NAME",
 "--dist_checkpoint_root_folder /home/datascience/outputs",
 "--dist_checkpoint_folder fine-tuned"
@@ -274,7 +267,7 @@ job = (
 MODEL_NAME="meta-llama/Llama-2-7b-hf",
 HUGGING_FACE_HUB_TOKEN="<access_token>",
 LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib",
-OCI__METRICS_NAMESPACE="finetune_llama2_7b_hf_peft_lora"
+OCI__METRICS_NAMESPACE="finetune_llama2_7b_hf_fsdp"
 )
 )
 )
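The hunks above touch only fragments of the ADS Python API job definition. As a point of reference, here is a minimal, hedged sketch of how a job assembled that way is typically created, submitted, and monitored with oracle-ads; `job` stands for the `Job` object built in the README's full example and is not defined in this commit.

```python
# Minimal sketch, assuming `job` is the ads.jobs.Job object assembled in the README example
# and that OCI credentials/policies are already configured for the environment.
job.create()     # register the job definition with OCI Data Science
run = job.run()  # start a job run; for the distributed runtime this launches all replicas
run.watch()      # stream the job run logs until the run finishes
```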
@@ -341,11 +334,18 @@ Additionally under the OCI Monitoring Service, if you enabled the `OCI__METRICS_
 
 After the fine-tuning process is complete, to test the new model, we have to merge the weights to the base model and upload to the OCI Data Science Model Catalog.
 
+### PEFT Weights Merging
+
 1. Create a notebook session with VM.GPU.A10.2 shape or higher. Specify the object storage location where the fine-tuned weights are saved in the mount path while creating the notebook session.
 2. Upload `lora-model-merge.ipynb` notebook to the notebook session
-3. Run the notebook for verifying the fine tuned weights.
+3. Run the notebook for verifying the fine-tuned weights.
 4. The notebook also has code to upload the fine tuned model to model catalog.
 
+### FSDP Weights Merging
+
+1. Create a notebook session with VM.GPU.A10.2 shape or higher. Specify the object storage location where the fine-tuned weights are saved in the mount path while creating the notebook session.
+2. Upload `load-back-FSDP-checkpoints` notebook to the notebook session and follow the instructions.
+
 ## Deployment
 
 We recommend to use vLLM based inference container for serving the fine-tuned model. vLLM offers various optimizations for efficient usage of GPU and offers good throughput out of the box. For the deployment, use the model that was saved to the model catalog after fine tuning job.
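For orientation only, the PEFT weights-merging step described above (and adjusted in `lora-model-merge.ipynb` by this commit) boils down to attaching the LoRA adapter to the base model and folding the weights in. The sketch below uses the public `peft`/`transformers` APIs with hypothetical placeholder paths and may differ from the notebook's exact code.

```python
# Hedged sketch of a LoRA-adapter merge; the adapter and output paths are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "meta-llama/Llama-2-7b-hf"           # base model used for fine-tuning
adapter_path = "/mnt/llama2/outputs/<job-ocid>"        # mounted bucket folder with the saved LoRA weights
merged_output = "/home/datascience/merged-llama2-7b"   # where to write the merged model

base = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)  # attach the LoRA adapter
model = model.merge_and_unload()                       # fold the adapter weights into the base model

model.save_pretrained(merged_output)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(merged_output)
```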
New file (+113 lines) — the `load-back-FSDP-checkpoints` notebook referenced above:

@@ -0,0 +1,113 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "58901f38",
+   "metadata": {},
+   "source": [
+    "# Loading back FSDP checkpoints\n",
+    "\n",
+    "For more information: https://github.com/facebookresearch/llama-recipes/blob/main/docs/inference.md#loading-back-fsdp-checkpoints"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "98b57d30",
+   "metadata": {},
+   "source": [
+    "## All of the code in this notebook should be run in the OCI Data Science Notebook Terminal!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "05a59132",
+   "metadata": {},
+   "source": [
+    "Before you start, make sure that you've installed the `pytorch20_p39_gpu_v2` Conda environment and activated it in the `Terminal`\n",
+    "\n",
+    "```bash\n",
+    "odsc conda install -s pytorch20_p39_gpu_v2\n",
+    "```\n",
+    "\n",
+    "... then activate it\n",
+    "\n",
+    "```bash\n",
+    "conda activate /home/datascience/conda/pytorch20_p39_gpu_v2\n",
+    "```\n",
+    "\n",
+    "Then install all of the required dependencies\n",
+    "\n",
+    "```bash\n",
+    "pip install tokenizers==0.13.3 -U && pip install transformers -U && pip install llama-recipes==0.0.1\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6e2aecea",
+   "metadata": {},
+   "source": [
+    "The following commands also work best when you execute them in the `terminal`!\n",
+    "\n",
+    "First you have to log in to get access to the Llama2 model\n",
+    "```bash\n",
+    "huggingface-cli login\n",
+    "```\n",
+    "\n",
+    "Then run the checkpoint converter; it looks like the following\n",
+    "\n",
+    "```bash\n",
+    "python -m llama_recipes.inference.checkpoint_converter_fsdp_hf --fsdp_checkpoint_path /mnt/llama2/outputs/lvp-7b/ocid1.datasciencejob.oc1.eu-frankfurt-1.amaaaaaan/fine-tuned-meta-llama/Llama-2-7b-hf --consolidated_model_path /mnt/llama2/fsdp_consolidated_checkpoints --HF_model_path_or_name \"meta-llama/Llama-2-13b-hf\"\n",
+    "```\n",
+    "\n",
+    "Replace the `--fsdp_checkpoint_path` with the folder you specified via `--dist_checkpoint_root_folder`, which will be the location in your object storage bucket, as per the example above. Notice that we ran this in OCI Data Science Notebooks and mounted the object storage bucket used to store the FSDP checkpoints under `/mnt/llama2`. The `--consolidated_model_path` is the path where the consolidated weights will be stored. The `--HF_model_path_or_name` is the name of the model used for the fine-tuning or, if you downloaded the model locally, the location of the downloaded model.\n",
+    "\n",
+    "If the merging process was successful, you should see in your `--consolidated_model_path` folder something like this:\n",
+    "\n",
+    "```bash\n",
+    "   0 drwxr-xr-x. 1 datascience users    0 Oct 18 15:48 .\n",
+    "   0 drwxr-xr-x. 1 datascience users    0 Oct 18 14:38 ..\n",
+    " 512 -rw-r--r--. 1 datascience users   42 Oct 18 16:35 added_tokens.json\n",
+    "1.0K -rw-r--r--. 1 datascience users  656 Oct 18 16:35 config.json\n",
+    " 512 -rw-r--r--. 1 datascience users  111 Oct 18 16:35 generation_config.json\n",
+    "9.2G -rw-r--r--. 1 datascience users 9.2G Oct 18 16:35 pytorch_model-00001-of-00003.bin\n",
+    "9.3G -rw-r--r--. 1 datascience users 9.3G Oct 18 16:36 pytorch_model-00002-of-00003.bin\n",
+    "6.7G -rw-r--r--. 1 datascience users 6.7G Oct 18 16:36 pytorch_model-00003-of-00003.bin\n",
+    " 24K -rw-r--r--. 1 datascience users  24K Oct 18 16:36 pytorch_model.bin.index.json\n",
+    " 512 -rw-r--r--. 1 datascience users   72 Oct 18 16:35 special_tokens_map.json\n",
+    "1.5K -rw-r--r--. 1 datascience users 1.2K Oct 18 16:35 tokenizer_config.json\n",
+    "489K -rw-r--r--. 1 datascience users 489K Oct 18 16:35 tokenizer.model\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2407ae40",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python [conda env:pytorch20_p39_gpu_v2]",
+   "language": "python",
+   "name": "conda-env-pytorch20_p39_gpu_v2-py"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
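Once the converter has written the consolidated HF-format weights, a quick sanity check (not part of the committed notebook, shown here only as a hedged sketch) is to load the output folder with `transformers` and generate a few tokens; the path reuses the `--consolidated_model_path` from the example above, and the prompt is arbitrary.

```python
# Hedged sketch: load the consolidated HF-format checkpoint and generate a few tokens.
# Requires the accelerate package for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

consolidated_path = "/mnt/llama2/fsdp_consolidated_checkpoints"  # --consolidated_model_path from above

tokenizer = AutoTokenizer.from_pretrained(consolidated_path)
model = AutoModelForCausalLM.from_pretrained(
    consolidated_path, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Oracle Cloud Infrastructure is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```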
