A FastAPI service integrating LiteLLM's provider flexibility with Langfuse's prompt management.
APP_NAME="my-overlord"
ACCESS_KEYS='["example-secret-key-one", "example-secret-key-two", "example-secret-key-three"]'
ALLOWED_ORIGINS='["https://www.example.com"]' # origins must not have a trailing slash
RATE_LIMITS_DEFAULT='["1/second", "10/minute", "100/hour", "1000/day"]'
RATE_LIMITS_HIGH='["10/second", "100/minute", "1000/hour", "10000/day"]' # only needed if high-usage client required
# Langfuse project keys (one key pair per project)
LANGFUSE_SECRET_KEY_PROJECT="your-langfuse-secret-key-with-the-project-name"
LANGFUSE_PUBLIC_KEY_PROJECT="your-langfuse-public-key-with-the-project-name"
# AI provider API keys
OPENAI_API_KEY="your-openai-api-key"
ANTHROPIC_API_KEY="your-anthropic-api-key"
GEMINI_API_KEY="your-gemini-api-key"
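The list-valued settings above are JSON-encoded strings. A minimal sketch of how such a value can be decoded (illustrative only; the app's actual settings loader may differ):

```python
import json
import os

# Simulate the environment variable from the example .env above
os.environ["ACCESS_KEYS"] = '["example-secret-key-one", "example-secret-key-two"]'

# JSON-decode the list-valued environment variable into a Python list
access_keys = json.loads(os.environ["ACCESS_KEYS"])
```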
pip install -r requirements.txt
uvicorn main:app --no-access-log
Alternatively, use the provided Dockerfile, which installs all dependencies automatically.
Currently, only a Python client is available, intended for server-to-server communication.
The Overlord API is based on server-sent events (SSE): by sending requests to the ai/chat
endpoint and parsing the SSE stream, you can easily build your own client for front-end usage in e.g. JavaScript.
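To illustrate, here is a minimal SSE line parser (the sample lines are made up; a real client would iterate over a streaming HTTP response from the ai/chat endpoint, and the server's exact event payloads may differ):

```python
def parse_sse(lines):
    """Yield the data payload of each server-sent event."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Stand-in for a streaming response body; real lines come from the ai/chat endpoint.
sample = ["data: Hello", "", "data: world", ""]
tokens = list(parse_sse(sample))
```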
The client is async-first: when called from a synchronous application, chat.request()
and overlord.task()
must be wrapped in asyncio.run()
instead of prefixed with await.
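A minimal sketch of the difference (the request coroutine here is a stand-in for the real client methods, which need a running server):

```python
import asyncio

async def request(data):
    # Stand-in for an async client method such as chat.request(data)
    return f"echo: {data}"

# In async code:
#     response = await chat.request(data)
# In a synchronous application, wrap the coroutine instead:
response = asyncio.run(request("Hello"))
```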
- Copy client.py to your cwd
- Rename it to overlordapi.py
- Run pip install requests pydantic
from overlordapi import Overlord
overlord = Overlord("http://your-server.url", "your-api-key", "your-langfuse-project", client_type="high-usage") # or "default" client_type
# health check (optional)
print((await overlord.client.ping()).text)
The API can be called with either a prompt from Langfuse or a simple text prompt. Every chat must, however, start with a Langfuse prompt, since the model settings are derived from it.
from typing import Callable

from pydantic import BaseModel

class ChatInput(BaseModel):
    prompt: str | dict | None  # required (None only for internal tool-use calls)
    file_urls: list[str] | None = None  # optional
    metadata: dict | None = None  # optional
    tools: dict[str, Callable] | None = None  # optional

# overlord.input internalizes this schema
The internalized models are shown here to visualize the dictionary structure required by ChatInput's prompt field.
class PromptArgs(BaseModel):
    name: str
    label: str = "production"
    version: str | None = None

class PromptConfig(BaseModel):
    args: PromptArgs
    placeholders: dict | None = None  # optional
    project: str  # set internally and can be ignored
data = overlord.input(
    prompt=dict(
        args=dict(
            name="summarize_file",
            label="latest",
        ),
        placeholders=dict(
            role="professor",
        ),
    ),
    file_urls=["https://constitutioncenter.org/media/files/constitution.pdf"],
    metadata=dict(
        order_id="123456",
    ),
)
As mentioned earlier, the chat can be continued with a simple text prompt but must always start with a Langfuse prompt.
data = overlord.input(prompt="What did we just look at?")
response = await overlord.task(data)
chat = overlord.chat()
# check session id (optional)
print(chat.session_id)
response = await chat.request(data)
Optionally, an existing message history can be passed on chat init to continue from a previous point in a conversation.
chat = overlord.chat(existing_message_history=[dict(role="user", content="My name is Tom.")])
The tool schema is set when creating the Langfuse prompt.
tools = [
    dict(
        type="function",
        function=dict(
            name="get_random_words",
            description="Returns n random words as a list of strings",
            parameters=dict(
                type="object",
                properties=dict(
                    n=dict(
                        type="integer",
                        description="How many words to return",
                    ),
                ),
                required=["n"],
            ),
        ),
    )
]
langfuse.create_prompt(
    name="tools_test",
    prompt="Write a {{text}} about {{topic}}.",
    config=dict(
        model="gpt-4o-mini",
        tools=tools,
        tool_choice="auto",  # how the AI decides which tool to use
    ),
)
Provide the actual tool implementations to the client in the call. Tools can be async, in which case they are awaited in parallel; async and sync tools can also be mixed.
import random

def get_random_words(n: int):
    selection = ["cat", "dog", "horse", "fish", "human"]
    return [random.choice(selection) for _ in range(n)]
data = overlord.input(
    prompt=dict(
        args=dict(
            name="tools_test",
            label="latest",
        ),
        placeholders=dict(
            text="very short poem",
            topic="a topic comprised of the words returned by the get_random_words tool",
        ),
    ),
    tools=dict(
        get_random_words=get_random_words,
    ),
)
response = await overlord.task(data)
- every chat has its own session id, used to connect messages in the Langfuse UI
- the initially provided system prompt JSON schema from the first Langfuse prompt is used throughout a chat