33 changes: 13 additions & 20 deletions src/content/workflow-guardrails/steering.md
@@ -2,32 +2,25 @@
 title: Steering
 ---
 
-Steering is a fancy word for nudging. When the model does the wrong thing, you push it in the right direction.
+Steering is nudging. When the model does the wrong thing, you push it in the right direction.
 
-## Interactive Steering
+## Direct Steering
 
-The most immediate form is interactive: you [stop](/context-management/stop) the agent mid-execution and send a follow-up prompt, or [rewind](/context-management/rewinding) to edit a previous message and steer it differently from that point.
+The most immediate form: you [stop](/context-management/stop) the agent mid-execution and send a follow-up prompt, or [rewind](/context-management/rewinding) to edit a previous message and steer it differently from that point.
 
-This is prompting in the moment. You see what the model is doing, intervene, and redirect.
+You see what the model is doing, intervene, and redirect. This is the core skill of working with AI.
 
-## Activation Steering
+## Environmental Steering
 
-With open-weight models, you can steer at inference time by manipulating the model's internals. The process:
+You can also steer indirectly through [feedback loops](/workflow-guardrails/feedback-loops). For example, when writing TypeScript, adding a lint rule that flags the `any` type steers the model away from using it. The model sees the lint errors and adjusts.
 
-1. Run good prompts and bad prompts through the model
-2. Identify the activation patterns that differ between them
-3. Adjust weights to reinforce good behaviors and suppress bad ones
+This is steering through constraints rather than through prompts. You shape the environment, and the model adapts to it.
 
-This steers the model's outputs without changing the prompt. It's more technical and requires access to model weights.
+## The Difference
 
-## Fine-Tuning
+| Type | When | How |
+|------|------|-----|
+| **Direct** | In the moment | Stop, redirect, rewind |
+| **Environmental** | Before the task | Lint rules, type systems, tests |
 
-The most robust form of steering happens during training. You take a generalized model and run a final round of fine-tuning to shape it toward the behaviors you want.
-
-Fine-tuning is permanent—the steering is baked into the model itself rather than applied at inference time.
-
-## Steering Through Constraints
-
-You can also steer indirectly through [feedback loops](/workflow-guardrails/feedback-loops). For example, when writing TypeScript, adding a lint rule that flags the `any` type steers the model away from using it. The model learns from the lint errors and adjusts.
-
-This is steering through environment rather than through the model itself.
+Direct steering is reactive—you fix problems as they happen. Environmental steering is proactive—you prevent problems before they start.
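
Note on the Environmental Steering example: the feedback loop it describes (a lint rule flags `any`, the model reads the errors and adjusts) might be sketched roughly as below. This is a toy stand-in, not part of the change itself. The names `LintError`, `findExplicitAny`, and `toFeedback` are hypothetical, the regex check is far cruder than a real rule, and an actual project would more likely enable an existing rule such as typescript-eslint's `no-explicit-any` than write its own.

```typescript
// Toy stand-in for a lint rule that flags explicit `any`.
// All names here are hypothetical; a real setup would run ESLint instead.

interface LintError {
  line: number; // 1-based line number in the checked source
  message: string;
}

// Flags lines containing an explicit `any` annotation (`: any`, `as any`, `<any>`).
// This is a crude text match, so it can misfire on comments and strings.
function findExplicitAny(source: string): LintError[] {
  const pattern = /(:\s*any\b|\bas\s+any\b|<any>)/;
  const errors: LintError[] = [];
  source.split("\n").forEach((text, index) => {
    if (pattern.test(text)) {
      errors.push({
        line: index + 1,
        message: "Unexpected any. Specify a different type.",
      });
    }
  });
  return errors;
}

// Formats errors as the feedback text an agent would see before its next attempt.
function toFeedback(errors: LintError[]): string {
  return errors.map((e) => `line ${e.line}: ${e.message}`).join("\n");
}

// First attempt uses `any`; the second is what the agent might produce
// after reading the feedback.
const firstAttempt = "function parse(input: any) {\n  return input;\n}";
const secondAttempt = "function parse(input: string) {\n  return input;\n}";

console.log(toFeedback(findExplicitAny(firstAttempt)));
console.log(findExplicitAny(secondAttempt).length === 0 ? "clean" : "still failing");
```

The sketch shows the point the section makes: nothing in the prompt changes between the two attempts. The constraint lives in the environment, and the error text is what carries the steering.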