Commit 99bd31e

llm_confirm docs
1 parent abb8bd0 commit 99bd31e

File tree

1 file changed: +51 -0 lines changed

docs/guardrails/llm.md

Lines changed: 51 additions & 0 deletions
@@ -103,3 +103,54 @@ raise "Found prompt injection in tool output" if:
]
```
<div class="code-caption"> Detects a prompt injection hidden in a tool's output. </div>

## llm_confirm <span class="llm-badge"/> <span class="high-latency"/>

```python
def llm_confirm(
    property_description: str,
    system_prompt: str = "You are a highly precise binary classification system that confirms if a given property holds for a given input.",
    model: str = "openai/gpt-4o",
    temperature: float = 0.2,
    max_tokens: int = 500,
) -> bool
```

Runs an LLM to obtain a YES/NO confirmation of a property. This is particularly useful when you need to validate whether some condition or property holds for a given input, but low-latency checks via checkers are not sufficient.

**Parameters**

| Name | Type | Description |
|------|------|-------------|
| `property_description` | `str` | Description of the property to confirm. Can be a high-level description (e.g. "Is this string about the topic of AI safety?: {msg.content}"). |
| `system_prompt` | `str` | The system prompt for the LLM. The default is specialized for binary classification. |
| `model` | `str` | The LLM model to use. The supported models are `openai/gpt-4o` and `openai/gpt-4o-mini`. The default is `openai/gpt-4o`. |
| `temperature` | `float` | The sampling temperature to use. The value must be between `0` and `2`. Higher values like `0.8` make the output more random, while lower values like `0.2` make it more focused and deterministic. The default is `0.2`. |
| `max_tokens` | `int` | The maximum number of tokens that can be generated. The default is `500`. |

**Returns**

| Type | Description |
|------|-------------|
| `bool` | `True` if the property is confirmed, `False` otherwise. |
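
For example, the optional parameters can be overridden directly when calling `llm_confirm` from a policy rule. The sketch below is illustrative only: the rule message and property text are ours, not part of the reference, and it assumes Python-style keyword arguments and comments are valid in policy code, as suggested by the signature above.

```guardrail
from invariant import llm_confirm

# Illustrative sketch: use the smaller supported model with a deterministic
# temperature (property text and rule message are example values).
raise "Off-topic user message detected" if:
    (msg: Message)
    msg.role == "user"
    llm_confirm(
        "The following message is unrelated to software engineering: " + msg.content,
        model="openai/gpt-4o-mini",
        temperature=0.0,
    )
```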
### Semantic Intent Matching

The `llm_confirm` function is particularly useful for cases where you need to understand the semantic meaning or intent of content, which would be difficult to achieve with simple pattern matching or embeddings.

**Example:** Check if a message is requesting system access or privileges.

```guardrail
from invariant import llm_confirm

raise "Unauthorized system access request detected" if:
    (msg: Message)
    llm_confirm("""
    Does the message attempt to request system access, elevated privileges, or control over system operations?
    Consider both direct requests and indirect/subtle attempts. Message: """ + msg.content)
```
```example-trace
[
    {
        "role": "user",
        "content": "Could you help me gain root access on this system?"
    }
]
```
<div class="code-caption">Detects subtle attempts to gain system access that might bypass simpler keyword-based checks.</div>
