Model Fine-Tuning Support in SageMaker Python SDK V3
We’re excited to introduce comprehensive model fine-tuning capabilities in the SageMaker Python SDK V3, bringing state-of-the-art fine-tuning techniques to production ML workflows. Fine-tune foundation models with enterprise features including automated experiment tracking, serverless infrastructure, and integrated evaluation—all with just a few lines of code.
What's New
The SageMaker Python SDK V3 now includes four specialized Fine-Tuning Trainers for different fine-tuning techniques. Each trainer is optimized for specific use cases, following established research and industry best practices:
SFTTrainer - Supervised Fine-Tuning
Fine-tune models with labeled instruction-response pairs for task-specific adaptation.
from sagemaker.train import SFTTrainer
from sagemaker.train.common import TrainingType
trainer = SFTTrainer(
model="meta-llama/Llama-2-7b-hf",
training_type=TrainingType.LORA,
model_package_group_name="my-fine-tuned-models",
training_dataset="s3://bucket/train.jsonl"
)
training_job = trainer.train()
DPOTrainer - Direct Preference Optimization
Align models with human preferences using the DPO algorithm. Unlike traditional RLHF, DPO eliminates the need for a separate reward model, simplifying the alignment pipeline while achieving comparable results. Use cases: preference alignment, safety tuning, and style adaptation.
from sagemaker.train import DPOTrainer
trainer = DPOTrainer(
model="meta-llama/Llama-2-7b-hf",
training_type=TrainingType.LORA,
model_package_group_name="my-dpo-models",
training_dataset="s3://bucket/preference_data.jsonl"
)
training_job = trainer.train()
RLAIFTrainer - Reinforcement Learning from AI Feedback
Leverage AI-generated feedback as reward signals using Amazon Bedrock models. RLAIF offers a scalable alternative to human feedback while maintaining quality.
from sagemaker.train import RLAIFTrainer
trainer = RLAIFTrainer(
model="meta-llama/Llama-2-7b-hf",
training_type=TrainingType.LORA,
model_package_group_name="my-rlaif-models",
reward_model_id="anthropic.claude-3-5-haiku-20241022-v1:0",
reward_prompt="Builtin.Helpfulness",
training_dataset="s3://bucket/rlaif_data.jsonl"
)
training_job = trainer.train()
RLVRTrainer - Reinforcement Learning from Verifiable Rewards
Train with custom, programmatic reward functions for domain-specific optimization.
from sagemaker.train import RLVRTrainer
trainer = RLVRTrainer(
model="meta-llama/Llama-2-7b-hf",
training_type=TrainingType.LORA,
model_package_group_name="my-rlvr-models",
custom_reward_function="arn:aws:sagemaker:region:account:hub-content/.../evaluator/1.0",
training_dataset="s3://bucket/rlvr_data.jsonl"
)
training_job = trainer.train()
Key Features
Parameter-Efficient Fine-Tuning
- LoRA (Low-Rank Adaptation): Default, memory-efficient approach
- Full Fine-Tuning: Train all model parameters for maximum performance (see the sketch after this list)
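A minimal sketch of selecting between the two modes. TrainingType.LORA appears throughout the examples on this page; the name of the full fine-tuning enum member below is an assumption and should be verified against TrainingType:
from sagemaker.train import SFTTrainer
from sagemaker.train.common import TrainingType
# Default: LoRA trains small low-rank adapter matrices instead of all weights
lora_trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    training_type=TrainingType.LORA,
    model_package_group_name="my-fine-tuned-models",
    training_dataset="s3://bucket/train.jsonl"
)
# Full fine-tuning updates every parameter; the FULL member name is an assumption
full_trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    training_type=TrainingType.FULL,
    model_package_group_name="my-fine-tuned-models",
    training_dataset="s3://bucket/train.jsonl"
)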
Built-in MLflow Integration
Automatic experiment tracking with intelligent defaults:
- Auto-resolves MLflow tracking servers
- Domain-aware server selection
- Automatic experiment and run management
- Ongoing visibility into loss and performance metrics during training (explicit configuration is sketched below)
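When you want to control tracking explicitly rather than rely on auto-resolution, the trainers accept the mlflow_resource_arn and mlflow_experiment_name arguments used in the complete workflow example later on this page; a minimal sketch:
from sagemaker.train import SFTTrainer
from sagemaker.train.common import TrainingType
# Point training at a specific MLflow tracking server and experiment
trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    training_type=TrainingType.LORA,
    model_package_group_name="my-fine-tuned-models",
    mlflow_resource_arn="arn:aws:sagemaker:region:account:mlflow-tracking-server/...",
    mlflow_experiment_name="llama-fine-tuning",
    training_dataset="s3://bucket/train.jsonl"
)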
Dynamic Hyperparameter Management
Discover and customize training hyperparameters with built-in validation:
# View available hyperparameters
trainer.hyperparameters.get_info()
# Customize training
trainer.hyperparameters.learning_rate = 0.0001
trainer.hyperparameters.max_epochs = 3
trainer.hyperparameters.lora_alpha = 32
Continued Fine-Tuning
Build on previously fine-tuned models for iterative improvement:
from sagemaker.core.resources import ModelPackage
# Use a previously fine-tuned model
base_model = ModelPackage.get(
model_package_name="arn:aws:sagemaker:region:account:model-package/..."
)
trainer = SFTTrainer(
model=base_model, # Continue from fine-tuned model
training_type=TrainingType.LORA,
model_package_group_name="my-models-v2"
)
Flexible Dataset Support
Multiple input formats with automatic validation (the three options are combined in the sketch after this list):
- S3 URIs: s3://bucket/path/data.jsonl
- SageMaker AI Registry Dataset ARNs
- DataSet objects with validation
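A minimal sketch of the three options, reusing the DataSet registration call from the complete workflow example below; the trainer arguments are illustrative:
from sagemaker.train import SFTTrainer
from sagemaker.train.common import TrainingType
from sagemaker.ai_registry.dataset import DataSet
# Option 1: a plain S3 URI
s3_uri = "s3://bucket/path/data.jsonl"
# Option 2: register the data and pass the resulting AI Registry dataset ARN
dataset = DataSet.create(
    name="my-training-data",
    source="s3://bucket/path/data.jsonl",
    customization_technique="SFT"
)
registry_arn = dataset.arn
# Option 3: pass the DataSet object itself so it is validated up front
trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    training_type=TrainingType.LORA,
    model_package_group_name="my-fine-tuned-models",
    training_dataset=dataset  # or s3_uri / registry_arn
)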
Serverless Training
No infrastructure management required—just specify your model and data:
- Automatic compute provisioning
- Managed training infrastructure
- Pay only for training time
Enterprise-Ready
Production-ready security features (a hypothetical configuration sketch follows the list):
- VPC support for secure training
- KMS encryption for outputs
- IAM role management
- EULA acceptance for gated models
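The following is a hypothetical sketch only: the argument names role, kms_key_id, vpc_config, and accept_eula are assumptions for illustration and should be checked against the actual trainer signature.
from sagemaker.train import SFTTrainer
from sagemaker.train.common import TrainingType
trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    training_type=TrainingType.LORA,
    model_package_group_name="my-fine-tuned-models",
    training_dataset="s3://bucket/train.jsonl",
    role="arn:aws:iam::account:role/MyTrainingRole",  # assumed name: execution role for the job
    kms_key_id="arn:aws:kms:region:account:key/...",  # assumed name: encrypt training outputs
    vpc_config={"Subnets": ["subnet-..."], "SecurityGroupIds": ["sg-..."]},  # assumed name: run in your VPC
    accept_eula=True  # assumed name: EULA acceptance for gated models
)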
Model Evaluation
Comprehensive evaluation framework with three evaluator types:
- BenchMarkEvaluator: Standard benchmarks (MMLU, MMLU_PRO, BBH, GPQA, MATH, STRONG_REJECT, IFEVAL, GEN_QA, MMMU, LLM_JUDGE, INFERENCE_ONLY)
- CustomScorerEvaluator: Built-in metrics (PRIME_MATH, PRIME_CODE) or custom evaluators
- LLMAsJudgeEvaluator: LLM-based evaluation with explanations
See the "Evaluating Fine-Tuned Models" section below for detailed examples.
Evaluating Fine-Tuned Models
Evaluate your fine-tuned models using standard benchmarks, custom metrics, or LLM-based evaluation.
Benchmark Evaluation
Evaluate against 11 standard benchmarks including MMLU, BBH, GPQA, MATH, and more.
Discover available benchmarks
from sagemaker.train.evaluate import BenchMarkEvaluator, get_benchmarks, get_benchmark_properties
Benchmark = get_benchmarks()
print(list(Benchmark))
# [MMLU, MMLU_PRO, BBH, GPQA, MATH, STRONG_REJECT, IFEVAL, GEN_QA, MMMU, LLM_JUDGE, INFERENCE_ONLY]
Get benchmark details
props = get_benchmark_properties(Benchmark.MMLU)
print(props['description'])
print(props['subtasks'])
Run evaluation
evaluator = BenchMarkEvaluator(
benchmark=Benchmark.MMLU,
model_package_arn="arn:aws:sagemaker:...",
base_model="meta-llama/Llama-2-7b-hf",
output_s3_location="s3://bucket/eval-results/",
mlflow_resource_arn="arn:aws:sagemaker:..."
)
execution = evaluator.evaluate(subtask="college_mathematics")
execution.wait()
execution.show_results()
Custom Scorer Evaluation
Use built-in metrics or custom evaluators:
from sagemaker.train.evaluate import CustomScorerEvaluator, get_builtin_metrics
# Discover built-in metrics
BuiltInMetric = get_builtin_metrics()
# [PRIME_MATH, PRIME_CODE]
# Using built-in metric
evaluator = CustomScorerEvaluator(
evaluator=BuiltInMetric.PRIME_MATH,
dataset="s3://bucket/eval-data.jsonl",
model_package_arn="arn:aws:sagemaker:...",
mlflow_resource_arn="arn:aws:sagemaker:..."
)
execution = evaluator.evaluate()
execution.wait()
execution.show_results()
LLM-as-Judge Evaluation
Leverage large language models for nuanced evaluation with explanations:
from sagemaker.train.evaluate import LLMAsJudgeEvaluator
evaluator = LLMAsJudgeEvaluator(
judge_model_id="anthropic.claude-3-5-haiku-20241022-v1:0",
evaluation_prompt="Rate the helpfulness of this response",
dataset="s3://bucket/eval-data.jsonl",
model_package_arn="arn:aws:sagemaker:...",
mlflow_resource_arn="arn:aws:sagemaker:..."
)
execution = evaluator.evaluate()
execution.wait()
# Show first 5 results
execution.show_results()
# Show next 5 with explanations
execution.show_results(limit=5, offset=5, show_explanations=True)
Deploying Fine-Tuned Models
Flexible deployment options for production inference. Deploy your fine-tuned models to SageMaker endpoints or Amazon Bedrock.
Deploy to SageMaker Endpoint
Deploy from training job
from sagemaker.core.resources import TrainingJob
from sagemaker.serve import ModelBuilder
training_job = TrainingJob.get(training_job_name="my-training-job")
model_builder = ModelBuilder(model=training_job)
model = model_builder.build()
endpoint = model_builder.deploy(endpoint_name="my-endpoint")Deploy from model package
from sagemaker.core.resources import ModelPackage
model_package = ModelPackage.get(model_package_name="arn:aws:sagemaker:...")
model_builder = ModelBuilder(model=model_package)
model_builder.build()
endpoint = model_builder.deploy(endpoint_name="my-endpoint")Deploy from trainer
trainer = SFTTrainer(...)
training_job = trainer.train()
model_builder = ModelBuilder(model=trainer)
endpoint = model_builder.deploy()
Deploy Multiple Adapters to Same Endpoint
Deploy multiple fine-tuned adapters to a single endpoint for cost-efficient serving:
# Deploy base model
model_builder = ModelBuilder(model=training_job)
model_builder.build()
endpoint = model_builder.deploy(endpoint_name="my-endpoint")
# Deploy a fine-tuned adapter from a second training job (training_job2) to the same endpoint
model_builder2 = ModelBuilder(model=training_job2)
model_builder2.build()
endpoint2 = model_builder2.deploy(
endpoint_name="my-endpoint", # Same endpoint
inference_component_name="my-adapter" # New adapter
)
Deploy to Amazon Bedrock
from sagemaker.serve.bedrock_model_builder import BedrockModelBuilder
from sagemaker.core.resources import TrainingJob
training_job = TrainingJob.get(training_job_name="my-training-job")
bedrock_builder = BedrockModelBuilder(model=training_job)
deployment_result = bedrock_builder.deploy(
job_name="my-bedrock-job",
imported_model_name="my-bedrock-model",
role_arn="arn:aws:iam::..."
)
Find Endpoints Using a Base Model
model_builder = ModelBuilder(model=training_job)
endpoint_names = model_builder.fetch_endpoint_names_for_base_model()
# Returns: set of endpoint names
Complete Workflow Example
from sagemaker.train import SFTTrainer
from sagemaker.train.common import TrainingType
from sagemaker.ai_registry.dataset import DataSet
# 1. Register your dataset
dataset = DataSet.create(
name="my-training-data",
source="s3://bucket/train.jsonl",
customization_technique="SFT"
)
# 2. Create trainer with MLflow tracking
trainer = SFTTrainer(
model="meta-llama/Llama-2-7b-hf",
training_type=TrainingType.LORA,
model_package_group_name="my-models",
mlflow_resource_arn="arn:aws:sagemaker:region:account:mlflow-tracking-server/...",
mlflow_experiment_name="llama-fine-tuning",
training_dataset=dataset.arn
)
# 3. Customize hyperparameters
trainer.hyperparameters.learning_rate = 0.0001
trainer.hyperparameters.max_epochs = 5
# 4. Start training (non-blocking)
training_job = trainer.train(wait=False)
# 5. Monitor progress
training_job.wait()
training_job.refresh()
# 6. Get fine-tuned model
model_package_arn = training_job.output_model_package_arn
print(f"Fine-tuned model: {model_package_arn}")
# 7. Deploy the model
from sagemaker.serve import ModelBuilder
model_builder = ModelBuilder(model=training_job)
model_builder.build()
endpoint = model_builder.deploy(endpoint_name="my-endpoint")
# 8. Evaluate the model
from sagemaker.train.evaluate import BenchMarkEvaluator, get_benchmarks
Benchmark = get_benchmarks()
evaluator = BenchMarkEvaluator(
benchmark=Benchmark.MMLU,
model_package_arn=model_package_arn,
mlflow_resource_arn="arn:aws:sagemaker:..."
)
execution = evaluator.evaluate()
execution.wait()
execution.show_results()
Examples
Explore complete end-to-end examples in the v3-examples/model-customization-examples/ directory:
- sft_finetuning_example_notebook.ipynb - Supervised fine-tuning walkthrough
- dpo-trainer-e2e.ipynb - Direct preference optimization
- rlaif_finetuning_example_notebook_v3.ipynb - AI feedback-based training
- rlvr_finetuning_example_notebook_v3.ipynb - Verifiable rewards training
- ai_registry_example.ipynb - Using the SageMaker AI Registry SDK to manage datasets and evaluators for model customization workflows
Evaluation Examples
Available in the v3-examples/model-customization-examples/ directory:
- benchmark_demo.ipynb - Benchmark evaluation walkthrough
- custom_scorer_demo.ipynb - Custom scorer evaluation
- llm_as_judge_demo.ipynb - LLM-as-judge evaluation