diff --git a/src/content/workflow-guardrails/steering.md b/src/content/workflow-guardrails/steering.md
index 80be523..b1df680 100644
--- a/src/content/workflow-guardrails/steering.md
+++ b/src/content/workflow-guardrails/steering.md
@@ -2,32 +2,25 @@
 title: Steering
 ---

-Steering is a fancy word for nudging. When the model does the wrong thing, you push it in the right direction.
+Steering is nudging. When the model does the wrong thing, you push it in the right direction.

-## Interactive Steering
+## Direct Steering

-The most immediate form is interactive: you [stop](/context-management/stop) the agent mid-execution and send a follow-up prompt, or [rewind](/context-management/rewinding) to edit a previous message and steer it differently from that point.
+The most immediate form: you [stop](/context-management/stop) the agent mid-execution and send a follow-up prompt, or [rewind](/context-management/rewinding) to edit a previous message and steer it differently from that point.

-This is prompting in the moment. You see what the model is doing, intervene, and redirect.
+You see what the model is doing, intervene, and redirect. This is the core skill of working with AI.

-## Activation Steering
+## Environmental Steering

-With open-weight models, you can steer at inference time by manipulating the model's internals. The process:
+You can also steer indirectly through [feedback loops](/workflow-guardrails/feedback-loops). For example, when writing TypeScript, adding a lint rule that flags the `any` type steers the model away from using it. The model sees the lint errors and adjusts.

-1. Run good prompts and bad prompts through the model
-2. Identify the activation patterns that differ between them
-3. Adjust weights to reinforce good behaviors and suppress bad ones
+This is steering through constraints rather than through prompts. You shape the environment, and the model adapts to it.

-This steers the model's outputs without changing the prompt. It's more technical and requires access to model weights.
+## The Difference

-## Fine-Tuning
+| Type | When | How |
+|------|------|-----|
+| **Direct** | In the moment | Stop, redirect, rewind |
+| **Environmental** | Before the task | Lint rules, type systems, tests |

-The most robust form of steering happens during training. You take a generalized model and run a final round of fine-tuning to shape it toward the behaviors you want.
-
-Fine-tuning is permanent—the steering is baked into the model itself rather than applied at inference time.
-
-## Steering Through Constraints
-
-You can also steer indirectly through [feedback loops](/workflow-guardrails/feedback-loops). For example, when writing TypeScript, adding a lint rule that flags the `any` type steers the model away from using it. The model learns from the lint errors and adjusts.
-
-This is steering through environment rather than through the model itself.
+Direct steering is reactive—you fix problems as they happen. Environmental steering is proactive—you prevent problems before they start.
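
A note on the Environmental Steering paragraph this diff adds: the lint-rule feedback loop it describes could be sketched roughly as below. This is a toy illustration and not part of the patch. `findExplicitAny` and `formatFeedback` are hypothetical names, and the regex is a deliberately naive stand-in for a real lint rule such as `@typescript-eslint/no-explicit-any`.

```typescript
// Hypothetical sketch of environmental steering: a check flags explicit `any`,
// and its output is the feedback the agent sees before its next attempt.
// Assumption: a real project would use a proper linter, not a regex.

interface LintError {
  line: number; // 1-based line number in the checked source
  message: string;
}

// Flags `: any`, `as any`, and `<any>`. It is naive on purpose, so it will
// also match those patterns inside comments or string literals.
function findExplicitAny(source: string): LintError[] {
  const pattern = /(:\s*any\b|\bas\s+any\b|<any>)/;
  const errors: LintError[] = [];
  source.split("\n").forEach((text, index) => {
    if (pattern.test(text)) {
      errors.push({ line: index + 1, message: "explicit any is not allowed" });
    }
  });
  return errors;
}

// Turns the errors into the text that would be handed back to the agent.
function formatFeedback(errors: LintError[]): string {
  if (errors.length === 0) {
    return "Lint passed.";
  }
  return errors.map((e) => `line ${e.line}: ${e.message}`).join("\n");
}
```

In a loop of this kind, the string from `formatFeedback` would be appended to the agent's context after each edit, so the constraint does the redirecting without a follow-up prompt.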