Why

More and more models/providers are offering hybrid LLMs that support both a regular fast response mode and a thinking mode.

Description

One example is the Anthropic Claude API: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

Here, you simply add a thinking param to the API request to enable the thinking mode:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=20000,
    ###### thinking params ######
    thinking={
        "type": "enabled",
        "budget_tokens": 16000
    },
    #############################
    messages=[{
        "role": "user",
        "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
    }]
)

print(response)
The corresponding response content will then contain the thoughts in addition to the regular response:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "To approach this, let's think about what we know about prime numbers...",
      "signature": "zbbJhbGciOiJFU8zI1NiIsImtakcjsu38219c0.eyJoYXNoIjoiYWJjMTIzIiwiaWFxxxjoxNjE0NTM0NTY3fQ...."
    },
    {
      "type": "text",
      "text": "Yes, there are infinitely many prime numbers such that..."
    }
  ]
}
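For completeness, here is a minimal sketch of how the thinking blocks could be separated from the regular text blocks on the client side, assuming the response object returned by the anthropic Python SDK call above (block types and fields as in the JSON above):

# Minimal sketch: split the response content into thinking and visible text.
# Assumes the response object from the client.messages.create() call above.
thinking_parts = []
text_parts = []
for block in response.content:
    if block.type == "thinking":
        thinking_parts.append(block.thinking)
    elif block.type == "text":
        text_parts.append(block.text)

print("--- thinking ---")
print("\n".join(thinking_parts))
print("--- answer ---")
print("\n".join(text_parts))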
Requirements
Need to compile a list of providers (including self-hosted options such as vLLM, Llama.cpp, and Ollama) that offer this capability, and document how each of them handles the request and response.
Need to be able to detect which models offer this mode (Sonnet 3.5 does not, but Sonnet 3.7 does); a rough sketch follows this list.
Need some changes to the UI to enable this thinking mode for the appropriate models and to specify the thinking budget.
There are also some dependencies on the cost calculations (thinking tokens are billed as output tokens, so they need to be included in usage and cost tracking).
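As a rough illustration of the detection and request-building requirements above, here is a hedged sketch; THINKING_CAPABLE_MODELS and create_message are hypothetical names invented for this example, not part of any existing API:

import anthropic

# Hypothetical capability table (illustrative only): which models support extended thinking.
THINKING_CAPABLE_MODELS = {
    "claude-3-7-sonnet-20250219": True,   # Sonnet 3.7 supports extended thinking
    "claude-3-5-sonnet-20241022": False,  # Sonnet 3.5 does not
}

def create_message(client, model, prompt, max_tokens=20000, thinking_budget=16000):
    # Hypothetical helper: only add the thinking param for models that support it.
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if THINKING_CAPABLE_MODELS.get(model, False):
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return client.messages.create(**kwargs)

client = anthropic.Anthropic()
response = create_message(
    client,
    "claude-3-7-sonnet-20250219",
    "Are there an infinite number of prime numbers such that n mod 4 == 3?",
)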