BrowserAgentKit is a TypeScript library for running a code agent in the browser.
Highlights:
- Browser-first agent loop (observe → plan → act)
- Skills: prompt-based tools stored in the DOM (Codex skill markdown)
- Built-in DOM/JS tools (XPath helpers, event binding, interpreter)
- Optional chat UI subpath (`browseragentkit/ui`)
- Static skill loader with Vite plugin (DOM-backed resources)
- Minimal MCP HTTP tools adapter
- Streaming API (async generator)
```sh
npm i browseragentkit
```

```ts
import {
  createAgentMessages,
  createOpenAIResponsesAdapter,
  isAgentError,
  jsInterpreterTool,
  runAgent,
  Skill,
} from "browseragentkit";

// Somewhere in your HTML:
// <script type="text/markdown" id="skill-canvas-render">
// ---
// name: canvas.render
// description: Renders HTML inside the canvas using the JS interpreter helpers.
// ---
// # Goal
// Create or update HTML inside the canvas.
//
// # Steps
// 1) Use the JS interpreter helpers: `x()`, `replaceSubtree()`, and `viewRoot`.
// 2) Build HTML as a string and call `replaceSubtree(x("/")[0], html)`.
// 3) Return a short confirmation message.
//
// # Notes
// - Keep it deterministic and short.
// </script>

const skills = [Skill.fromDomSelector("//script[@id='skill-canvas-render']", document)];

const adapter = createOpenAIResponsesAdapter({
  model: "gpt-5.1-codex-mini",
  baseURL: "/api/llm", // your backend proxy
  apiKey: "sk-...", // DANGEROUS! DO NOT PASS YOUR OWN KEY
  dangerouslyAllowBrowser: true,
});

// Adapter shape:
// { model, generate, countTokens?, contextWindowTokens? }

const agentMessages = createAgentMessages(
  "System: This demo is for creating beautiful interfaces. Focus on elegant layout, typography, and clear visual hierarchy."
);

const tools = [jsInterpreterTool()];
const callables = [...tools, ...skills];
const agentContext = { viewRoot: document.getElementById("canvas") };

for await (const ev of runAgent(
  agentMessages,
  adapter.generate,
  "Create a hero section on the canvas",
  callables,
  25,
  agentContext,
  undefined,
  {
    tokenCounter: adapter.countTokens,
    contextWindowTokens: adapter.contextWindowTokens,
    model: adapter.model,
  }
)) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    continue;
  }
  // handle events in your UI / logs
  console.log(ev.right);
}
```

`adapter.generate(messages, tools, signal)` must return (or resolve to) an `AsyncIterable` of `Either<Error, AgentEvent>` objects. When using the OpenAI Responses stream, you can reuse `createOpenAIResponsesAdapter` from the library (see above or `examples/main.js`).
The agent preserves conversation history across runs; create a fresh `createAgentMessages()` array to clear it (the system prompt is kept).
If `runAgent()` is called again with the same `messages` array, the previous run is aborted.
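The generate contract described above can be sketched with plain types. The `Either` and `AgentEvent` shapes below are illustrative assumptions for this sketch, not the library's exact definitions:

```typescript
// Illustrative sketch: generate yields an async iterable of
// Either<Error, AgentEvent>. Field names here are assumptions.
type Either<L, R> = { left: L; right?: undefined } | { left?: undefined; right: R };
type AgentEvent = { type: string; text?: string };

async function* fakeGenerate(): AsyncIterable<Either<Error, AgentEvent>> {
  yield { right: { type: "message", text: "Hello" } };
  yield { right: { type: "done" } };
}

// Consuming it mirrors the runAgent loop: check left first, else use right.
async function collectTypes(): Promise<string[]> {
  const types: string[] = [];
  for await (const ev of fakeGenerate()) {
    if (ev.left) types.push("error");
    else types.push(ev.right!.type);
  }
  return types;
}
```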
```ts
import { createChatUi } from "browseragentkit/ui";

const chat = createChatUi({ container: document.getElementById("chatLog") });
chat.addUserMessage("Hello");
chat.appendAssistantDelta("Hi");
chat.finalizeAssistantMessage("Hi there!");
```

The UI appends DOM nodes with the classes `message`, `user`, `assistant`, `bubble`, and `status`.
Bring your own CSS to style them (see `examples/index.html` for the demo styles).
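The delta-then-finalize pattern behind `appendAssistantDelta` / `finalizeAssistantMessage` can be sketched standalone. This class is not the library's implementation, just the buffering idea:

```typescript
// Standalone sketch of streaming partial assistant text, then committing
// a final message (which may replace the streamed text, as in
// finalizeAssistantMessage("Hi there!") above).
class AssistantMessageBuffer {
  private parts: string[] = [];

  appendDelta(delta: string): void {
    this.parts.push(delta);
  }

  // If finalText is given, it wins; otherwise commit the joined deltas.
  finalize(finalText?: string): string {
    return finalText ?? this.parts.join("");
  }
}

const buf = new AssistantMessageBuffer();
buf.appendDelta("Hi");
const committed = buf.finalize("Hi there!"); // "Hi there!"
```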
BrowserAgentKit can load Codex-style skills from `SKILL.md` files at build time and inject them into the DOM.
```ts
// vite.config.ts
import { defineConfig } from "vite";
import { codexSkillPlugin } from "browseragentkit/skills/vite";

export default defineConfig({
  plugins: [codexSkillPlugin({ root: "./skills", mode: "dom" })],
});
```

At runtime, load a skill from the injected DOM:

```ts
import { Skill } from "browseragentkit";

const skill = Skill.fromDomSelector(
  "//script[@data-skill='canvas.render' and @data-kind='prompt']",
  document
);
```

The plugin injects `script[type="text/plain"]` nodes under `#bak-skills-root` with `data-skill`, `data-kind` (`prompt` | `reference` | `script`), and `data-path` attributes.
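The XPath above can be built from a skill name. The helper below is hypothetical (not part of the library), but the `data-skill` / `data-kind` attribute names match what the plugin injects:

```typescript
// Hypothetical helper: build the XPath selector for an injected skill node.
function skillSelector(
  name: string,
  kind: "prompt" | "reference" | "script" = "prompt"
): string {
  return `//script[@data-skill='${name}' and @data-kind='${kind}']`;
}

skillSelector("canvas.render");
// "//script[@data-skill='canvas.render' and @data-kind='prompt']"
```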
```ts
import { createMcpHttpClient, mcpTools } from "browseragentkit";

const mcpClient = createMcpHttpClient({
  baseUrl: "https://mcp.example.com",
  bearerToken: "token",
});

const callables = [
  ...mcpTools(mcpClient),
  // ...your other tools/skills
];
```

A skill is a tool that runs the LLM with a Markdown prompt stored in the DOM. Store prompts in a `<script>` tag (or any DOM element) as Codex skill markdown (YAML frontmatter + body) and pass an XPath selector:
```html
<script type="text/markdown" id="skill-example">
---
name: example.skill
description: One-line description (optional but recommended).
---
# Goal
...

# Steps
1) ...
2) ...

# Output
- What the agent should return.
</script>

<script type="text/markdown" id="skill-subskill">
---
name: example.subskill
description: Nested skill (only available inside this skill).
---
# Goal
...
</script>
```

```ts
const skills = [
  Skill.fromDomSelector("//script[@id='skill-example']", document)
    // Optional: scope what the skill can call.
    .withCallables([
      jsInterpreterTool(),
      Skill.fromDomSelector("//script[@id='skill-subskill']", document),
    ]),
];
```

The agent exposes each skill as a function-calling tool. When a skill runs, the agent:

- Builds a child cycle from scratch (base system prompt → skill prompt → optional history → task).
- Sanitizes the skill prompt to Markdown-only.
- Makes only the skill's `callables` available to the child cycle.

The skill tool arguments are `{ task: string; history?: EasyInputMessage[] }`. At the start of each root cycle, the agent injects a system message listing available tools and skills. If the user mentions `$name`, it is treated as a suggestion. OpenAI tool names must match `^[a-zA-Z0-9_-]+$`, so skill/tool names are normalized for function calling (e.g., `canvas.render` → `canvas_render`). When the name changes, the system list shows the call name.
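The normalization rule can be sketched as a simple substitution. The regex comes from the constraint above; the exact replacement character the library uses is an assumption here:

```typescript
// Sketch: map any character outside ^[a-zA-Z0-9_-]+$ to "_" so the name
// is a valid OpenAI function-calling name. The "_" choice is an assumption.
function toCallName(name: string): string {
  return name.replace(/[^a-zA-Z0-9_-]/g, "_");
}

toCallName("canvas.render"); // "canvas_render"
```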
A tool is an instance of the `Tool` class: name, description, action, and input/output schemas.
Keep the description near the tool definition (in `src/tools.ts`).
Tool names are also normalized for function calling using the same rule as skills.
```ts
import { Tool } from "browseragentkit";

const echoTool = new Tool(
  "echo",
  "Echo the input.",
  (args) => args,
  {
    type: "object",
    properties: { value: { type: "string" } },
    required: ["value"],
    additionalProperties: false,
  },
  { type: "object", description: "Echoed args." }
);
```

Built-in tools:

- `jsInterpreterTool` (runs JS with DOM helpers + jQuery)
- `jsRunTool` (same as above, with explicit jQuery guidance)
- `domSummaryTool`
- `domSubtreeHtmlTool`
- `domAppendHtmlTool`
- `domRemoveTool`
- `domBindEventTool`
If you load the demo via plain importmap, add jQuery:
```html
<script type="importmap">
{
  "imports": {
    "browseragentkit": "../dist/index.js",
    "browseragentkit/ui": "../dist/ui/index.js",
    "openai": "../node_modules/openai/index.mjs",
    "jquery": "../node_modules/jquery/dist/jquery.min.js"
  }
}
</script>
```

You don’t call tools directly: you pass tools and skills into the agent as one `callables` list, and the agent calls them when needed.
```sh
npm install
npm run build
python3 -m http.server 5173
```

Then open http://localhost:5173/examples/ in your browser.
```sh
npm install
npm run dev
```

Vite will open the demo and refresh on source changes. The demo includes skill and tool toggles above the chat input.
```sh
npm run build:static
```

The demo is built into `examples/dist/` and deployed by GitHub Actions on pushes to `main`.
You can prefill demo fields via query params:
`?baseUrl=https://...&apiKey=sk-...&message=Hello`
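These params can be read with the standard `URLSearchParams` API. The values below are placeholders, not real endpoints or keys:

```typescript
// Reading the prefill query params with the standard URLSearchParams API.
const params = new URLSearchParams(
  "?baseUrl=https://example.test/api&apiKey=sk-demo&message=Hello"
);

const baseUrl = params.get("baseUrl"); // "https://example.test/api"
const message = params.get("message"); // "Hello"
```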
`runAgent(messages, generate, input, callables?, maxSteps?, context?, signal?, options?)` returns an async generator of `Either<Error, AgentEvent>`.

If you don’t need persistent history, pass `undefined` for `messages` and the agent will create a fresh system prompt for the run.

To append custom app-wide instructions (like AGENTS.md), pass a string to `createAgentMessages(agentsMd)`.
Context compaction options (main cycle only):

- `tokenCounter`: function to count tokens for the current model.
- `contextWindowTokens`: max context size (defaults to 96k).
- `compactThreshold`: ratio to trigger compaction (defaults to 0.75).
- `model`: model name passed to the token counter.
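The trigger these options imply can be sketched as a ratio check. Defaults follow the documented values (96k window, 0.75 threshold); the library's actual check may differ:

```typescript
// Sketch: compact when tokenCount / contextWindowTokens >= compactThreshold.
function shouldCompact(
  tokenCount: number,
  contextWindowTokens = 96_000,
  compactThreshold = 0.75
): boolean {
  return tokenCount >= contextWindowTokens * compactThreshold;
}

shouldCompact(80_000); // true  (80k >= 96k * 0.75 = 72k)
shouldCompact(50_000); // false (50k < 72k)
```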
Typical event kinds:

- `message` (agent text)
- `thinking.delta` / `thinking` (reasoning summary text, when available)
- `tool.start` / `tool.end`
- `artifact`
- `done`
Consume it with:

```ts
const agentMessages = createAgentMessages();

for await (const ev of runAgent(agentMessages, generate, "...")) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    break;
  }
  // update UI / state
}
```

If you want status events, wrap the stream:
```ts
import { withStatus } from "browseragentkit";

for await (const ev of withStatus(runAgent(agentMessages, generate, "..."))) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    break;
  }
  if (ev.right.type === "status") {
    console.log(ev.right.status);
  }
}
```
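The wrapping idea behind `withStatus` can be shown generically: pass the underlying events through unchanged while emitting extra `status` events around them. The payloads below are assumptions, not the library's actual event shapes:

```typescript
// Generic async-generator wrapper sketch: emit a status event, forward the
// source events, then emit a closing status event.
type Ev = { type: string; status?: string };

async function* withMarkers(src: AsyncIterable<Ev>): AsyncIterable<Ev> {
  yield { type: "status", status: "running" };
  for await (const ev of src) yield ev;
  yield { type: "status", status: "done" };
}

async function* demoSource(): AsyncIterable<Ev> {
  yield { type: "message" };
}

async function collect(): Promise<string[]> {
  const types: string[] = [];
  for await (const ev of withMarkers(demoSource())) types.push(ev.type);
  return types;
}
```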