18 changes: 18 additions & 0 deletions examples/instructlab-multiphase-configs/README.md
@@ -0,0 +1,18 @@
# InstructLab Multi-Phase Training Configurations

I'd suggest we start grouping the examples under a sub-directory such as examples/training_hub, to better organize the examples from different parts of the product.

Author

Yeah, I think that makes sense. One question is whether we will have other components of the InstructLab pipeline / notebooks in this repo. @aditisaluja5, was a decision made as to whether the full end-to-end pipeline example will live here? And if so, will that be as a series of notebooks (including the training notebook), or a single all-encompassing notebook?

If it is just the training configs living here, then I think @astefanutti's suggestions are best: we can have a training_hub sub-directory and include the training notebook itself in there as well. If we are including the full pipeline in here, however, then we should probably create an instructlab sub-directory, add all of the step notebooks in there (data prep, SDG, training, eval, etc.), and have this config notebook live alongside them. LMK what the plan is and I can proceed accordingly.

Sorry for the late response. The end-to-end examples will also live in this repo.

It looks like we are leaning toward the latter option. Would examples/ai_hub be more appropriate as the sub-directory instead of instructlab?


This directory contains hardware-specific configurations for running the LAB multi-phase training pipeline with different GPU setups.

## Quick Start

See [`lab_multiphase_configs.ipynb`](./lab_multiphase_configs.ipynb) for optimized training parameters for various hardware configurations including:

- **H200**: 1x, 2x, 4x, 8x GPU configurations

Is this always single-node, or is multi-node supported? It would be useful to mention that explicitly.

Author

Good point, added a disclaimer to the README

- **H100**: 2x, 4x, 8x GPU configurations
- **A100 80GB**: 2x, 4x, 8x GPU configurations
- **A100 40GB**: 2x, 4x, 8x GPU configurations
- **L40S**: 4x, 8x GPU configurations
- **L4**: 8x GPU configurations

Each configuration includes memory-optimized settings for `max_tokens_per_gpu`, `max_seq_len`, `nproc_per_node`, and the FSDP CPU offloading parameter (`cpu_offload_params`).
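
A hypothetical sketch of how these settings can be kept together in a script. The `HardwareConfig` class, `CONFIGS` table, and `get_config` helper are illustrative names only (not part of this repo or of `training_hub`), and just a few entries are copied from the notebook. The helper also enforces the notebook's guideline that `max_seq_len` stay below `max_tokens_per_gpu`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareConfig:
    """One hardware configuration, mirroring a cell in lab_multiphase_configs.ipynb."""
    max_tokens_per_gpu: int
    max_seq_len: int
    nproc_per_node: int
    cpu_offload_params: bool


# Hypothetical lookup table: keys are (GPU model, GPUs per node).
# Values are copied from a few of the notebook's cells.
CONFIGS = {
    ("H200", 8): HardwareConfig(85000, 80000, 8, False),
    ("H100", 4): HardwareConfig(30000, 27000, 4, False),
    ("A100 80GB", 2): HardwareConfig(25000, 22000, 2, True),
    ("L4", 8): HardwareConfig(8000, 6000, 8, True),
}


def get_config(gpu: str, num_gpus: int) -> HardwareConfig:
    """Return the configuration for a GPU setup, checking max_seq_len < max_tokens_per_gpu."""
    try:
        cfg = CONFIGS[(gpu, num_gpus)]
    except KeyError:
        raise KeyError(f"no tested configuration for {num_gpus}x {gpu}") from None
    if cfg.max_seq_len >= cfg.max_tokens_per_gpu:
        raise ValueError("max_seq_len must stay below max_tokens_per_gpu")
    return cfg


cfg = get_config("H100", 4)
print(cfg.max_tokens_per_gpu, cfg.max_seq_len, cfg.nproc_per_node, cfg.cpu_offload_params)
# prints: 30000 27000 4 False
```

Raising on unknown setups, rather than falling back to a nearby entry, keeps untested combinations from being used silently.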

Note: All values assume a single node with the GPU resources listed above. For multi-node training, note that the default sharding strategy is FSDP [HYBRID_SHARD](https://docs.pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.ShardingStrategy), which shards parameters within each node and replicates them across nodes. Start with the settings that correspond to the GPUs on one of your nodes, or switch to `FULL_SHARD` if memory constraints require it.
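
A back-of-the-envelope comparison of the two strategies, to show why per-node settings are the natural starting point. `HYBRID_SHARD` applies full sharding within a node and replicates across nodes, while `FULL_SHARD` shards across every GPU in the job. This is not a memory model of the training library: it assumes a roughly 8B-parameter model held in bf16 (2 bytes per parameter) and counts only the sharded parameters, ignoring gradients, optimizer state, and activations. `per_gpu_param_gib` is a hypothetical helper.

```python
def per_gpu_param_gib(num_params: float, bytes_per_param: int,
                      gpus_per_node: int, num_nodes: int, strategy: str) -> float:
    """Approximate per-GPU parameter memory (GiB) under an FSDP sharding strategy.

    FULL_SHARD splits parameters across every GPU in the job.
    HYBRID_SHARD splits them only across the GPUs of one node and
    replicates that sharded copy on every node.
    """
    if strategy == "FULL_SHARD":
        shards = gpus_per_node * num_nodes
    elif strategy == "HYBRID_SHARD":
        shards = gpus_per_node
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return num_params * bytes_per_param / shards / 2**30


# Assumed example: ~8B parameters in bf16 on two nodes with 4 GPUs each.
for strategy in ("HYBRID_SHARD", "FULL_SHARD"):
    gib = per_gpu_param_gib(8e9, 2, gpus_per_node=4, num_nodes=2, strategy=strategy)
    print(f"{strategy}: {gib:.2f} GiB per GPU")
# prints:
# HYBRID_SHARD: 3.73 GiB per GPU
# FULL_SHARD: 1.86 GiB per GPU
```

Under these assumptions, `HYBRID_SHARD` on two 4-GPU nodes leaves each GPU with the same parameter shard as a single 4-GPU node, so the single-node values carry over. `FULL_SHARD` halves the shard here, at the cost of sharding collectives that span nodes.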

nit: "cooresponding" -> "corresponding"

324 changes: 324 additions & 0 deletions examples/instructlab-multiphase-configs/lab_multiphase_configs.ipynb
@@ -0,0 +1,324 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "# InstructLab Multi-Phase Training Hardware Configurations\n\nThis notebook contains hardware-specific parameter configurations for use with the [LAB Multi-Phase Training Tutorial](https://github.com/Red-Hat-AI-Innovation-Team/training_hub/blob/main/examples/notebooks/lab_multiphase_training_tutorial.ipynb).\n\n**Model**: These configurations are tuned for `granite-3.1-8b-starter-v2.1`.\n\nEach configuration below specifies tested parameters for a given single-node GPU setup, including:\n- `max_tokens_per_gpu`: Cap on the number of tokens each GPU processes per micro-batch, which bounds GPU memory use to prevent OOM errors\n- `max_seq_len`: Maximum sequence length in tokens; keep it below `max_tokens_per_gpu`\n- `nproc_per_node`: Number of GPUs per node used for distributed training\n- `cpu_offload_params`: FSDP CPU offloading setting for reducing GPU memory usage\n\n**Usage**: Copy the configuration parameters that match your available hardware from the sections below into your training script."

It would be better to have lab_multiphase_training_tutorial.ipynb added here so the example is standalone. WDYT?

It might make sense to have the end-to-end InstructLab pipeline example here instead of just the training example.

Author

also related to above comment: #3 (comment)

},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## FSDP Configuration Reference\n",
"\n",
"The `cpu_offload_params` parameter controls whether FSDP offloads model parameters to CPU memory, which reduces GPU memory usage at the cost of slower training steps:\n",
"\n",
"```python\n",
"from instructlab.training import FSDPOptions\n",
"\n",
"fsdp_options = FSDPOptions(\n",
" cpu_offload_params=True # Boolean: True to enable CPU offloading, False to disable\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"source": "## Usage Example\n\nHere's how to use these configurations in the [LAB Multi-Phase Training Tutorial](https://github.com/Red-Hat-AI-Innovation-Team/training_hub/blob/main/examples/notebooks/lab_multiphase_training_tutorial.ipynb):\n\n1. **Choose your hardware configuration** from the sections below based on your available GPUs\n2. **Copy the configuration values** into your training script\n3. **Apply them to your training configuration**\n\n```python\n# Example: Using the H100 4x GPU configuration\n\n# Step 1: Set hardware-specific parameters from this notebook\nmax_tokens_per_gpu = 30000\nmax_seq_len = 27000\nnproc_per_node = 4\ncpu_offload_params = False\n\n# Step 2: Use with training_hub's sft function\nfrom training_hub import sft\nfrom instructlab.training import FSDPOptions\n\n# Configure FSDP options\nfsdp_options = FSDPOptions(\n    cpu_offload_params=cpu_offload_params\n)\n\n# Step 3: Apply to your LAB training phases\n# Phase 1: Knowledge Tuning (Phase07)\nphase07_result = sft(\n    model_path=\"/path/to/base/model\",\n    data_path=\"/path/to/knowledge_data.jsonl\",\n    ckpt_output_dir=\"/path/to/phase07_checkpoints\",\n    max_tokens_per_gpu=max_tokens_per_gpu,\n    max_seq_len=max_seq_len,\n    fsdp_options=fsdp_options,\n    num_epochs=7,\n    nproc_per_node=nproc_per_node,  # GPUs (torchrun processes) on this node\n)\n\n# Phase 2: Skills + Replay Training (Phase10)\nphase10_result = sft(\n    model_path=phase07_result.checkpoint_path,  # Use Phase07 output\n    data_path=\"/path/to/skills_plus_replay_data.jsonl\",\n    ckpt_output_dir=\"/path/to/phase10_checkpoints\",\n    max_tokens_per_gpu=max_tokens_per_gpu,\n    max_seq_len=max_seq_len,\n    fsdp_options=fsdp_options,\n    num_epochs=7,\n    nproc_per_node=nproc_per_node,  # GPUs (torchrun processes) on this node\n)\n```\n\n**Note**: These values have been tested and optimized specifically for Granite starter models and LAB multiphase datasets. You may need to adjust these parameters for different student models and datasets.",
"metadata": {}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## H200 Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H200 8x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H200 8x GPU Configuration\nmax_tokens_per_gpu = 85000\nmax_seq_len = 80000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 8\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H200 4x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H200 4x GPU Configuration\nmax_tokens_per_gpu = 75000\nmax_seq_len = 70000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 4\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H200 2x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H200 2x GPU Configuration\nmax_tokens_per_gpu = 45000\nmax_seq_len = 40000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 2\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H200 1x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H200 1x GPU Configuration\nmax_tokens_per_gpu = 45000\nmax_seq_len = 40000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 1\n\n# FSDP CPU offloading configuration\ncpu_offload_params = True"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## H100 Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H100 8x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H100 8x GPU Configuration\nmax_tokens_per_gpu = 45000\nmax_seq_len = 40000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 8\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H100 4x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H100 4x GPU Configuration\nmax_tokens_per_gpu = 30000\nmax_seq_len = 27000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 4\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H100 2x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# H100 2x GPU Configuration\nmax_tokens_per_gpu = 25000\nmax_seq_len = 22000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 2\n\n# FSDP CPU offloading configuration\ncpu_offload_params = True"
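},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### H100 1x GPU Configuration (not yet validated)"
]
},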
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
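"# H100 1x GPU Configuration (NOT YET VALIDATED: placeholder values, do not use as-is)\n",
"max_tokens_per_gpu = 0  # TO BE FILLED\n",
"max_seq_len = 0  # TO BE FILLED; keep below max_tokens_per_gpu\n",
"nproc_per_node = 1\n",
"\n",
"# FSDP CPU offloading configuration\n",
"cpu_offload_params = False  # TO BE FILLED: True to enable CPU offloading, False to disable"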
Comment on lines +147 to +153

⚠️ Potential issue | 🟠 Major

Remove or complete the unfinished H100 1x config before merging.

Line 149 publishes max_tokens_per_gpu = 0, and Lines 147-153 never define max_seq_len. That makes this block unusable as a copy/paste example and can break outside a stateful notebook session. Either add tested values for all parameters or replace this with a non-runnable “not yet validated” note.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/instructlab-multiphase-configs/lab_multiphase_configs.ipynb` around
lines 147 - 153, The H100 1x GPU Configuration block is incomplete and not
runnable: fill in realistic, tested values for max_tokens_per_gpu (replace 0),
define max_seq_len, and set cpu_offload_params to the validated boolean (or
explicit tuning value) so the snippet can be copy/pasted (e.g., set
max_tokens_per_gpu to a non-zero value, add max_seq_len with the intended
sequence length, leave nproc_per_node as 1 if correct); alternatively replace
the entire H100 1x GPU Configuration section with a clear non-runnable note
stating "not yet validated" so users don't try to execute it.

]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A100 80GB Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A100 80GB 8x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# A100 80GB 8x GPU Configuration\nmax_tokens_per_gpu = 45000\nmax_seq_len = 40000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 8\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A100 80GB 4x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# A100 80GB 4x GPU Configuration\nmax_tokens_per_gpu = 30000\nmax_seq_len = 27000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 4\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A100 80GB 2x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# A100 80GB 2x GPU Configuration\nmax_tokens_per_gpu = 25000\nmax_seq_len = 22000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 2\n\n# FSDP CPU offloading configuration\ncpu_offload_params = True"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A100 40GB Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A100 40GB 8x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# A100 40GB 8x GPU Configuration\nmax_tokens_per_gpu = 15000\nmax_seq_len = 13000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 8\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A100 40GB 4x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# A100 40GB 4x GPU Configuration\nmax_tokens_per_gpu = 30000\nmax_seq_len = 27000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 4\n\n# FSDP CPU offloading configuration\ncpu_offload_params = True"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A100 40GB 2x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# A100 40GB 2x GPU Configuration\nmax_tokens_per_gpu = 25000\nmax_seq_len = 22000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 2\n\n# FSDP CPU offloading configuration\ncpu_offload_params = True"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## L40S Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### L40S 8x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# L40S 8x GPU Configuration\nmax_tokens_per_gpu = 10000\nmax_seq_len = 8000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 8\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### L40S 4x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# L40S 4x GPU Configuration\nmax_tokens_per_gpu = 8000\nmax_seq_len = 6000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 4\n\n# FSDP CPU offloading configuration\ncpu_offload_params = False"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## L4 Configurations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### L4 8x GPU Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# L4 8x GPU Configuration\nmax_tokens_per_gpu = 8000\nmax_seq_len = 6000 # adjust if needed based on actual data lengths, but keep below max_tokens_per_gpu\nnproc_per_node = 8\n\n# FSDP CPU offloading configuration\ncpu_offload_params = True"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}