BrowserAgentKit

BrowserAgentKit is a TypeScript library for running a code agent in the browser.

Demo

Highlights:

  • Browser-first agent loop (observe → plan → act)
  • Skills: prompt-based tools stored in the DOM (Codex skill markdown)
  • Built-in DOM/JS tools (XPath helpers, event binding, interpreter)
  • Optional chat UI subpath (browseragentkit/ui)
  • Static skill loader with Vite plugin (DOM-backed resources)
  • Minimal MCP HTTP tools adapter
  • Streaming API (async generator)

Install

npm i browseragentkit

Quick start

import {
  createAgentMessages,
  createOpenAIResponsesAdapter,
  isAgentError,
  jsInterpreterTool,
  runAgent,
  Skill,
} from "browseragentkit";

// Somewhere in your HTML:
// <script type="text/markdown" id="skill-canvas-render">
// ---
// name: canvas.render
// description: Renders HTML inside the canvas using the JS interpreter helpers.
// ---
// # Goal
// Create or update HTML inside the canvas.
//
// # Steps
// 1) Use the JS interpreter helpers: `x()`, `replaceSubtree()`, and `viewRoot`.
// 2) Build HTML as a string and call `replaceSubtree(x("/")[0], html)`.
// 3) Return a short confirmation message.
//
// # Notes
// - Keep it deterministic and short.
// </script>

const skills = [Skill.fromDomSelector("//script[@id='skill-canvas-render']", document)];

const adapter = createOpenAIResponsesAdapter({
  model: "gpt-5.1-codex-mini",
  baseURL: "/api/llm", // your backend proxy
  apiKey: "sk-...", // DANGEROUS! DO NOT PASS YOUR OWN KEY
  dangerouslyAllowBrowser: true,
});

// Adapter shape:
// { model, generate, countTokens?, contextWindowTokens? }

const agentMessages = createAgentMessages(
  "System: This demo is for creating beautiful interfaces. Focus on elegant layout, typography, and clear visual hierarchy."
);
const tools = [
  jsInterpreterTool(),
];
const callables = [...tools, ...skills];
const agentContext = { viewRoot: document.getElementById("canvas") };

for await (const ev of runAgent(
  agentMessages,
  adapter.generate,
  "Create a hero section on the canvas",
  callables,
  25,
  agentContext,
  undefined,
  {
    tokenCounter: adapter.countTokens,
    contextWindowTokens: adapter.contextWindowTokens,
    model: adapter.model,
  }
)) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    continue;
  }
  // handle events in your UI / logs
  console.log(ev.right);
}

adapter.generate(messages, tools, signal) must return (or resolve to) an AsyncIterable of Either<Error, AgentEvent> objects. When using the OpenAI Responses stream, you can reuse createOpenAIResponsesAdapter from the library (see above or examples/main.js).

The agent preserves conversation history across runs; create a fresh createAgentMessages() array to clear it (the system prompt is kept). If runAgent() is called again with the same messages array, the previous run is aborted.
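To target a different provider, you can supply your own adapter. A minimal sketch (the Either shape is written out inline since the exported type names aren't shown here, and event payload fields beyond type are omitted):

const myAdapter = {
  model: "my-model",
  contextWindowTokens: 96_000,
  async *generate(messages: unknown, tools: unknown, signal?: AbortSignal) {
    try {
      // Call your own backend and translate its stream into agent events.
      yield { right: { type: "message" /* ...payload fields... */ } };
      yield { right: { type: "done" } };
    } catch (err) {
      yield { left: err instanceof Error ? err : new Error(String(err)) };
    }
  },
};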

Optional chat UI

import { createChatUi } from "browseragentkit/ui";

const chat = createChatUi({ container: document.getElementById("chatLog") });

chat.addUserMessage("Hello");
chat.appendAssistantDelta("Hi");
chat.finalizeAssistantMessage("Hi there!");

The UI appends DOM nodes with classes: message, user, assistant, bubble, and status. Bring your own CSS to style them (see examples/index.html for the demo styles).

Static skills from files (Vite)

BrowserAgentKit can load Codex-style skills from SKILL.md files at build time and inject them into the DOM.

// vite.config.ts
import { defineConfig } from "vite";
import { codexSkillPlugin } from "browseragentkit/skills/vite";

export default defineConfig({
  plugins: [codexSkillPlugin({ root: "./skills", mode: "dom" })],
});

At runtime, load a skill from the injected DOM:

import { Skill } from "browseragentkit";

const skill = Skill.fromDomSelector("//script[@data-skill='canvas.render' and @data-kind='prompt']", document);

The plugin injects script[type="text/plain"] nodes under #bak-skills-root with: data-skill, data-kind (prompt | reference | script), and data-path.
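These attributes also make it easy to enumerate every injected prompt skill at once. A sketch using the browser's standard document.evaluate XPath API (only the loop is new; the per-skill selector mirrors the one above):

import { Skill } from "browseragentkit";

const nodes = document.evaluate(
  "//*[@id='bak-skills-root']//script[@data-kind='prompt']",
  document,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);

const skills: Skill[] = [];
for (let i = 0; i < nodes.snapshotLength; i++) {
  const el = nodes.snapshotItem(i) as HTMLScriptElement;
  skills.push(
    Skill.fromDomSelector(
      `//script[@data-skill='${el.dataset.skill}' and @data-kind='prompt']`,
      document
    )
  );
}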

MCP tools (HTTP)

import { createMcpHttpClient, mcpTools } from "browseragentkit";

const mcpClient = createMcpHttpClient({
  baseUrl: "https://mcp.example.com",
  bearerToken: "token",
});

const callables = [
  ...mcpTools(mcpClient),
  // ...your other tools/skills
];

Concepts

Skills

A skill is a tool that runs the LLM with a Markdown prompt stored in the DOM. Store prompts in a script tag (or any DOM element) as Codex skill markdown (YAML frontmatter + body) and pass an XPath selector:

<script type="text/markdown" id="skill-example">
---
name: example.skill
description: One-line description (optional but recommended).
---
# Goal
...

# Steps
1) ...
2) ...

# Output
- What the agent should return.
</script>

<script type="text/markdown" id="skill-subskill">
---
name: example.subskill
description: Nested skill (only available inside this skill).
---
# Goal
...
</script>

const skills = [
  Skill.fromDomSelector("//script[@id='skill-example']", document)
    // Optional: scope what the skill can call.
    .withCallables([
      jsInterpreterTool(),
      Skill.fromDomSelector("//script[@id='skill-subskill']", document),
    ]),
];

The agent exposes each skill as a function-calling tool. When a skill runs, the agent:

  • Builds a child cycle from scratch (base system prompt → skill prompt → optional history → task).
  • Sanitizes the skill prompt to Markdown-only.
  • Makes only the skill's callables available to the child cycle.

The skill tool arguments are { task: string; history?: EasyInputMessage[] }. At the start of each root cycle, the agent injects a system message listing the available tools and skills. If the user mentions $name, it is treated as a suggestion. OpenAI tool names must match ^[a-zA-Z0-9_-]+$, so skill and tool names are normalized for function calling (e.g., canvas.render → canvas_render; see the sketch below). When the name changes, the system list shows the call name.
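The exact normalization isn't documented beyond the pattern above; an illustrative rule (an assumption, not necessarily the library's implementation) that maps disallowed characters to underscores:

// Hypothetical normalization: replace anything outside [a-zA-Z0-9_-]
// with an underscore. This reproduces canvas.render → canvas_render
// but is only a sketch of the documented behavior.
const toCallName = (name: string): string =>
  name.replace(/[^a-zA-Z0-9_-]/g, "_");

toCallName("canvas.render"); // "canvas_render"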

Tools

A tool is an instance of the Tool class: name, description, action, and input/output schemas. Keep the description near the tool definition (in src/tools.ts). Tool names are also normalized for function calling using the same rule as skills.

import { Tool } from "browseragentkit";

const echoTool = new Tool(
  "echo",
  "Echo the input.",
  (args) => args,
  {
    type: "object",
    properties: { value: { type: "string" } },
    required: ["value"],
    additionalProperties: false,
  },
  { type: "object", description: "Echoed args." }
);

Built-in tools:

  • jsInterpreterTool (runs JS with DOM helpers + jQuery)
  • jsRunTool (same as above, with explicit jQuery guidance)
  • domSummaryTool
  • domSubtreeHtmlTool
  • domAppendHtmlTool
  • domRemoveTool
  • domBindEventTool

If you load the demo via plain importmap, add jQuery:

<script type="importmap">
{
  "imports": {
    "browseragentkit": "../dist/index.js",
    "browseragentkit/ui": "../dist/ui/index.js",
    "openai": "../node_modules/openai/index.mjs",
    "jquery": "../node_modules/jquery/dist/jquery.min.js"
  }
}
</script>

You don’t call tools directly: you pass tools and skills into the agent as one callables list, and the agent calls them when needed.
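For example, everything from the sections above can be combined into one list. A sketch (it assumes the other built-ins are factory functions like jsInterpreterTool, which the quick start calls as jsInterpreterTool()):

import { jsInterpreterTool, domSummaryTool, mcpTools } from "browseragentkit";

const callables = [
  echoTool,               // the custom Tool defined above
  jsInterpreterTool(),
  domSummaryTool(),       // assumption: built-ins share jsInterpreterTool's factory shape
  ...mcpTools(mcpClient), // from the MCP section above
  ...skills,
];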

Demo

npm install
npm run build
python3 -m http.server 5173

Then open http://localhost:5173/examples/ in your browser.

Dev (hot reload)

npm install
npm run dev

Vite will open the demo and refresh on source changes. The demo includes skill and tool toggles above the chat input.

GitHub Pages (static demo)

npm run build:static

The demo is built into examples/dist/ and deployed by GitHub Actions on pushes to main.

URL presets

You can prefill demo fields via query params:

?baseUrl=https://...&apiKey=sk-...&message=Hello

Agent API (async generator)

runAgent(messages, generate, input, callables?, maxSteps?, context?, signal?, options?) returns an async generator of Either<Error, AgentEvent>. If you don’t need persistent history, pass undefined for messages and the agent will create a fresh system prompt for the run. To append custom app-wide instructions (like AGENTS.md), pass a string to createAgentMessages(agentsMd).

Context compaction options (main cycle only; a combined example follows the list):

  • tokenCounter: function to count tokens for the current model.
  • contextWindowTokens: max context size (defaults to 96k).
  • compactThreshold: ratio to trigger compaction (defaults to 0.75).
  • model: model name passed to the token counter.
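Putting the options together (a sketch mirroring the quick start; the values shown are the stated defaults):

for await (const ev of runAgent(
  agentMessages,
  adapter.generate,
  "Refine the hero section", // any task string
  callables,
  25,
  agentContext,
  undefined,
  {
    tokenCounter: adapter.countTokens,
    contextWindowTokens: 96_000, // the default window
    compactThreshold: 0.75,      // compact at 75% of the window
    model: adapter.model,
  }
)) {
  // handle events as in the quick start
}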

Typical event kinds:

  • message (agent text)
  • thinking.delta / thinking (reasoning summary text, when available)
  • tool.start / tool.end
  • artifact
  • done

Consume it with:

const agentMessages = createAgentMessages();

for await (const ev of runAgent(agentMessages, generate, "...")) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    break;
  }
  // update UI / state
}
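To route events to different handlers, switch on the type field (only type is shown in this README, so payload handling is left as comments):

for await (const ev of runAgent(agentMessages, generate, "...")) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    break;
  }
  switch (ev.right.type) {
    case "message":
      // render agent text
      break;
    case "tool.start":
    case "tool.end":
      // show or hide a tool-activity indicator
      break;
    case "done":
      // run complete
      break;
    default:
      console.log(ev.right);
  }
}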

If you want status events, wrap the stream:

import { withStatus } from "browseragentkit";

for await (const ev of withStatus(runAgent(agentMessages, generate, "..."))) {
  if (isAgentError(ev)) {
    console.error(ev.left);
    break;
  }
  if (ev.right.type === "status") {
    console.log(ev.right.status);
  }
}
