Herd - Go Library

Herd is a session-affine process pool for Go. It manages a fleet of OS subprocess "workers" and routes incoming requests to the correct worker based on an arbitrary session ID.

The Core Invariant

1 Session ID → 1 Worker, for the lifetime of the session.

This invariant transforms stateful binaries (like browsers, LLMs, or REPLs) into multi-tenant services. Because a session always hits the same process, you can maintain in-memory state, KV caches, or local file systems without a complex coordination layer.


🚀 Key Features

  • Session Affinity: Guaranteed routing of a session ID to its unique pinned worker.
  • Auto-Scaling: Dynamically scale workers between min and max bounds based on demand.
  • Idle Eviction (TTL): Automatically reclaim workers that haven't been accessed within a configurable TTL.
  • Health Monitoring: Continuous liveness checks on every worker process; dead workers are automatically replaced.
  • Singleflight Acquisition: Protects against "thundering herd" issues where multiple concurrent requests for a new session ID try to spawn workers simultaneously.
  • Generic Clients: Fully generic Pool[C] supports any client type (HTTP, gRPC, custom structs).
  • Reverse Proxy Helper: Built-in HTTP reverse proxy that handles the full session lifecycle (Acquire → Proxy → Release).

📦 Installation

go get github.com/hackstrix/herd

🌐 Quick Start: Playwright Browser Isolation

Herd is perfect for creating multi-tenant browser automation gateways. In this example, each session ID gets its own dedicated Chrome instance. Because browsers maintain complex state (cookies, local storage, open pages), we configure Herd to never reuse a worker once its TTL expires, avoiding cross-tenant state leaks.

You can find the full, runnable code for this example in examples/playwright/main.go.

1. The Code

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/hackstrix/herd"
	"github.com/hackstrix/herd/proxy"
)

func main() {
	// 1. Spawns an isolated npx playwright run-server per user
	factory := herd.NewProcessFactory("npx", "playwright", "run-server", "--port", "{{.Port}}", "--host", "127.0.0.1").
		WithHealthPath("/").
		WithStartTimeout(1 * time.Minute).
		WithStartHealthCheckDelay(500 * time.Millisecond)

	// 2. Worker reuse is disabled to prevent state leaks between sessions
	pool, _ := herd.New(factory,
	herd.WithAutoScale(1, 5), // auto-scale between 1 and 5 workers based on demand
		herd.WithTTL(15 * time.Minute),
		herd.WithWorkerReuse(false), // CRITICAL: Never share browsers between users
	)

	// 3. Setup proxy to intelligently route WebSocket connections
	mux := http.NewServeMux()
	mux.Handle("/", proxy.NewReverseProxy(pool, func(r *http.Request) string {
		return r.Header.Get("X-Session-ID") // Pin by X-Session-ID
	}))

	log.Fatal(http.ListenAndServe(":8080", mux))
}

2. Running It

Start the gateway (assuming you are in the examples/playwright directory):

sudo snap install node
npx playwright install --with-deps
# Running without sudo will disable cgroup isolation.
sudo go run .

3. Usage

Connect to the gateway using Python and Playwright. Herd guarantees that all requests with the same X-Session-ID connect to the exact same browser instance, preserving your state (like logins, cookies, and tabs) across reconnections as long as your session TTL hasn't expired!

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        # Herd routes based on X-Session-ID header
        browser = await p.chromium.connect(
            "ws://127.0.0.1:8080/", 
            headers={"X-Session-ID": "my-secure-session"}
        )
        
        ctx = await browser.new_context()
        page = await ctx.new_page()
        await page.goto("https://github.com")
        print(await page.title())
        await browser.close()

asyncio.run(main())

🛠️ Quick Start: Ollama Multi-Agent Gateway

Here is an example of turning ollama serve into a multi-tenant LLM gateway where each agent (or user) gets its own dedicated Ollama process. This is especially useful for isolating context windows or KV caches per agent without downloading models multiple times.

You can find the full, runnable code for this example in examples/ollama/main.go.

1. The Code

package main

import (
	"context"
	"log"
	"net/http"
	"time"

	"github.com/hackstrix/herd"
	"github.com/hackstrix/herd/proxy"
)

func main() {
	// 1. Define how to spawn an Ollama worker on a dynamic port
	factory := herd.NewProcessFactory("ollama", "serve").
		WithEnv("OLLAMA_HOST=127.0.0.1:{{.Port}}").
		WithHealthPath("/").
		WithStartTimeout(2 * time.Minute).
		WithStartHealthCheckDelay(1 * time.Second)

	// 2. Create the pool with auto-scaling and TTL eviction
	pool, _ := herd.New(factory,
		herd.WithAutoScale(1, 10),
		herd.WithTTL(10 * time.Minute),
		herd.WithWorkerReuse(true),
	)

	// 3. Setup a session-aware reverse proxy
	mux := http.NewServeMux()
	mux.Handle("/api/", proxy.NewReverseProxy(pool, func(r *http.Request) string {
		return r.Header.Get("X-Agent-ID") // Pin worker by X-Agent-ID header
	}))

	log.Fatal(http.ListenAndServe(":8080", mux))
}

2. Running It

Start the gateway (assuming you are in the examples/ollama directory):

sudo snap install ollama
# Running without sudo will disable cgroup isolation.
sudo go run .

3. Usage

Send requests with an X-Agent-ID header. Herd guarantees that all requests with the same ID will hit the exact same underlying ollama serve instance!

curl -X POST http://localhost:8080/api/chat \
  -H "X-Agent-ID: agent-42" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello! I am agent 42."}]
  }'

πŸ—οΈ Architecture

Read the full Architecture & Request Lifecycle Design Document

Herd is built around three core interfaces:

  • Worker[C]: Represents a single running subprocess. It provides the typed client C used to communicate with the process.
  • WorkerFactory[C]: Responsible for spawning new Worker instances. The default ProcessFactory handles local OS binaries.
  • Pool[C]: The central router. It maps session IDs to workers, manages the horizontal scaling, and handles session lifecycle.

Session Lifecycle

  1. Acquire(ctx, sessionID): Retrieves the worker pinned to the ID. If none exists, a free worker is popped from the pool (or a new one is spawned).
  2. Session.Worker.Client(): Use the returned worker to perform your logic.
  3. Session.Release(): Returns the worker to the pool. The bond to the session ID is preserved until the TTL expires or the worker crashes.

⚙️ Configuration Options

Option | Description | Default
------ | ----------- | -------
WithAutoScale(min, max) | Sets the floor and ceiling for the process fleet. | min: 1, max: 10
WithTTL(time.Duration) | Max idle time for a session before it is evicted. | 5m
WithHealthInterval(d) | How often to poll workers for liveness. | 5s
WithStartHealthCheckDelay(d) | Delay before starting health checks on newly spawned workers. | 1s
WithCrashHandler(func) | Callback triggered when a worker exits unexpectedly. | nil
WithWorkerReuse(bool) | Whether to recycle workers or kill them when the TTL expires. | true

📊 Monitoring

Pool.Stats() returns a point-in-time snapshot of both pool state and host resource usage, powered by the herd/observer subpackage.

import "fmt"

// assuming pool was created earlier via herd.New(...)
stats := pool.Stats()

fmt.Printf("Workers : %d total, %d available\n",
    stats.TotalWorkers, stats.AvailableWorkers)
fmt.Printf("Sessions: %d active, %d acquiring\n",
    stats.ActiveSessions, stats.InflightAcquires)

// Node-level resource snapshot (Linux only; zero on macOS/Windows)
fmt.Printf("Host RAM: %d MB total, %d MB available\n",
    stats.Node.TotalMemoryBytes/1024/1024,
    stats.Node.AvailableMemoryBytes/1024/1024)
fmt.Printf("CPU Idle: %.1f%%\n", stats.Node.CPUIdle*100)

Note: On Linux, Stats() blocks for ~100 ms to measure CPU idle via two /proc/stat samples. Cache the result if you expose it on a high-traffic metrics endpoint.

The Node field is zero-valued on non-Linux platforms; treat a zero TotalMemoryBytes as "metrics unavailable" rather than "machine has no RAM."


📄 License

MIT License. See LICENSE for details.
