diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 13257b714e..3c6042939b 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -21,3 +21,9 @@ A clear and concise description of how you'd like it to work. ## Alternatives Considered A clear and concise description of any alternative solutions or features you've considered. + +## Pre-submission Checklist + +- [ ] I have searched existing issues and this is not a duplicate +- [ ] I have described a concrete use case, not just a feature in the abstract +- [ ] I understand that feature requests require a linked issue before a PR will be accepted diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index 2e7e9b1d45..da6e8cdf78 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -43,6 +43,7 @@ jobs: with: path: docs/site + deploy: environment: name: github-pages diff --git a/README.md b/README.md index 1c23b44881..2350a34874 100644 --- a/README.md +++ b/README.md @@ -1,43 +1,99 @@
-altimate-code + +# altimate-code + -**The data engineering agent for dbt, SQL, and cloud warehouses.** +**The open-source data engineering harness.** -An AI-powered CLI with 99+ specialized tools — SQL analysis, schema inspection, -column-level lineage, FinOps, PII detection, and data visualization. Connects to your warehouse, -understands your data, and helps you ship faster. +The intelligence layer for data engineering AI — 99+ deterministic tools for SQL analysis, +column-level lineage, dbt, FinOps, and warehouse connectivity across every major cloud platform. + +Run standalone in your terminal, embed underneath Claude Code or Codex, or integrate +into CI pipelines and orchestration DAGs. Precision data tooling for any LLM. [![npm](https://img.shields.io/npm/v/@altimateai/altimate-code)](https://www.npmjs.com/package/@altimateai/altimate-code) [![npm](https://img.shields.io/npm/v/@altimateai/altimate-core)](https://www.npmjs.com/package/@altimateai/altimate-core) +[![npm downloads](https://img.shields.io/npm/dm/@altimateai/altimate-code)](https://www.npmjs.com/package/@altimateai/altimate-code) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE) [![CI](https://github.com/AltimateAI/altimate-code/actions/workflows/ci.yml/badge.svg)](https://github.com/AltimateAI/altimate-code/actions/workflows/ci.yml) -[![Docs](https://img.shields.io/badge/docs-altimate--code.sh-blue)](https://altimate.ai) +[![Slack](https://img.shields.io/badge/Slack-Join%20Community-4A154B?logo=slack)](https://altimate.ai/slack) +[![Docs](https://img.shields.io/badge/docs-altimateai.github.io-blue)](https://altimateai.github.io/altimate-code)
--- -## Why altimate? +## Install + +```bash +# npm (recommended) +npm install -g @altimateai/altimate-code + +# Homebrew +brew install AltimateAI/tap/altimate-code +``` + +Then — in order: + +**Step 1: Configure your LLM provider** (required before anything works): +```bash +altimate # Launch the TUI +/connect # Interactive setup — choose your provider and enter your API key +``` + +> **No API key?** Select **Codex** in the `/connect` menu — it's built-in and requires no setup. -General-purpose coding agents can write SQL, but they don't *understand* it. They can't trace lineage, detect anti-patterns, check PII exposure, or optimize warehouse costs — because they don't have the tools. +Or set an environment variable directly: +```bash +export ANTHROPIC_API_KEY=your_key # Anthropic Claude +export OPENAI_API_KEY=your_key # OpenAI +``` + +**Step 2 (optional): Auto-detect your data stack** (read-only, safe for production connections): +```bash +altimate /discover +``` + +`/discover` auto-detects dbt projects, warehouse connections (from `~/.dbt/profiles.yml`, Docker, environment variables), and installed tools (dbt, sqlfluff, airflow, dagster, and more). Skip this and start building — you can always run it later. + +> **Zero Python setup required.** On first run, the CLI automatically downloads [`uv`](https://github.com/astral-sh/uv), creates an isolated Python environment, and installs the data engine with all warehouse drivers. No `pip install`, no virtualenv management. -altimate is a fork of [OpenCode](https://github.com/anomalyco/opencode) rebuilt for data teams. It gives any LLM access to 99+ specialized data engineering tools, 12 purpose-built skills, and direct warehouse connectivity — so the AI works with your actual schemas, not guesses. +## Why a specialized harness? -## General agents vs altimate +General AI coding agents can edit SQL files. They cannot *understand* your data stack. 
+altimate gives any LLM a deterministic data engineering intelligence layer — +no hallucinated SQL advice, no guessing at schema, no missed PII. | Capability | General coding agents | altimate | |---|---|---| -| SQL anti-pattern detection | None | 19 rules with confidence scoring | -| Column-level lineage | None | Automatic from SQL | -| Schema-aware autocomplete | None | Indexes your warehouse metadata | -| Cross-dialect translation | None | Snowflake, BigQuery, Databricks, Redshift | -| FinOps analysis | None | Credit analysis, expensive queries, warehouse sizing | -| PII detection | None | Automatic column scanning | -| dbt integration | Basic file editing | Manifest parsing, test generation, model scaffolding | +| SQL anti-pattern detection | None | 19 rules, confidence-scored | +| Column-level lineage | None | Automatic from SQL, any dialect | +| Schema-aware autocomplete | None | Live-indexed warehouse metadata | +| Cross-dialect SQL translation | None | Snowflake ↔ BigQuery ↔ Databricks ↔ Redshift | +| FinOps & cost analysis | None | Credits, expensive queries, right-sizing | +| PII detection | None | 30+ regex patterns, 15 categories | +| dbt integration | Basic file editing | Manifest parsing, test gen, model scaffolding, lineage | | Data visualization | None | Auto-generated charts from SQL results | | Observability | None | Local-first tracing of AI sessions and tool calls | +> **Benchmarked precision:** 100% F1 on SQL anti-pattern detection (1,077 queries, 19 rules, 0 false positives). +> 100% edge-match on column-level lineage (500 queries, 13 categories). +> [See methodology →](experiments/BENCHMARKS.md) + +**What the harness provides:** +- **SQL Intelligence Engine** — deterministic SQL parsing and analysis (not LLM pattern matching). 19 rules, 100% F1, 0 false positives. Built for data engineers who've been burned by hallucinated SQL advice. +- **Column-Level Lineage** — automatic extraction from SQL across dialects. 
100% edge-match on 500 benchmark queries. +- **Live Warehouse Intelligence** — indexed schemas, query history, and cost data from your actual warehouse. Not guesses. +- **dbt Native** — manifest parsing, test generation, model scaffolding, medallion patterns, impact analysis +- **FinOps** — credit consumption, expensive query detection, warehouse right-sizing, idle resource cleanup +- **PII Detection** — 15 categories, 30+ regex patterns, enforced pre-execution + +**Works seamlessly with Claude Code and Codex.** altimate is the data engineering tool layer — use it standalone in your terminal, or mount it as the harness underneath whatever AI agent you already run. The two are complementary. + +altimate is a fork of [OpenCode](https://github.com/anomalyco/opencode) rebuilt for data teams. Model-agnostic — bring your own LLM or run locally with Ollama. + ## Quick demo ```bash @@ -59,6 +115,8 @@ altimate is a fork of [OpenCode](https://github.com/anomalyco/opencode) rebuilt ## Key Features +All features are deterministic — they parse, trace, and measure. Not LLM pattern matching. + ### SQL Anti-Pattern Detection 19 rules with confidence scoring — catches SELECT *, cartesian joins, non-sargable predicates, correlated subqueries, and more. **100% accuracy** on 1,077 benchmark queries. @@ -86,27 +144,6 @@ Built-in observability for AI interactions — trace tool calls, token usage, an ### AI Teammate Training Teach your AI teammate project-specific patterns, naming conventions, and best practices. The training system learns from examples and applies rules automatically across sessions. -## Install - -```bash -# npm (recommended) -npm install -g altimate-code - -# Homebrew -brew install AltimateAI/tap/altimate-code -``` - -Then: - -```bash -altimate # Launch the interactive TUI -altimate /discover # Auto-detect your data stack and go -``` - -> **Note:** `altimate-code` still works as a backward-compatible alias. 
- -`/discover` auto-detects dbt projects, warehouse connections (from `~/.dbt/profiles.yml`, Docker, environment variables), and installed tools (dbt, sqlfluff, airflow, dagster, and more). - ## Agent Modes Each agent has scoped permissions and purpose-built tools for its role. @@ -117,8 +154,12 @@ Each agent has scoped permissions and purpose-built tools for its role. | **Analyst** | Explore data, run SELECT queries, and generate insights | Read-only enforced | | **Validator** | Data quality checks, schema validation, test coverage analysis | Read + validate | | **Migrator** | Cross-warehouse SQL translation, schema migration, dialect conversion | Read/write for migrations | +| **Researcher** | Deep-dive analysis, documentation research, and knowledge extraction | Read-only | +| **Trainer** | Teach project-specific patterns, naming conventions, and best practices | Read + write training data | | **Executive** | Business-audience summaries — translates findings into revenue, cost, and compliance impact | Read-only | +> **New to altimate?** Start with **Analyst mode** — it's read-only and safe to run against production connections. + ## Supported Warehouses Snowflake · BigQuery · Databricks · PostgreSQL · Redshift · DuckDB · MySQL · SQL Server @@ -131,6 +172,12 @@ Model-agnostic — bring your own provider or run locally. Anthropic · OpenAI · Google Gemini · Google Vertex AI · Amazon Bedrock · Azure OpenAI · Mistral · Groq · DeepInfra · Cerebras · Cohere · Together AI · Perplexity · xAI · OpenRouter · Ollama · GitHub Copilot +> **No API key?** **Codex** is a built-in provider with no key required. Select it via `/connect` to start immediately. + +## Skills + +altimate ships with built-in skills for every common data engineering task — type `/` in the TUI to browse available skills and get autocomplete. No memorization required. 
+ ## Architecture ``` @@ -158,37 +205,31 @@ packages/ drivers/ Shared database drivers (10 warehouses) dbt-tools/ dbt integration (TypeScript) plugin/ Plugin system - sdk/js/ JavaScript SDK + sdk/ SDKs (includes VS Code extension) util/ Shared utilities ``` -## Documentation - -Full docs at **[altimate.ai](https://altimate.ai)**. - -- [Getting Started](https://altimate.ai/getting-started/) -- [SQL Tools](https://altimate.ai/data-engineering/tools/sql-tools/) -- [Agent Modes](https://altimate.ai/data-engineering/agent-modes/) -- [Configuration](https://altimate.ai/configure/model-providers/) - ## Community & Contributing -- **Issues**: [GitHub Issues](https://github.com/AltimateAI/altimate-code/issues) -- **Discussions**: [GitHub Discussions](https://github.com/AltimateAI/altimate-code/discussions) -- **Security**: See [SECURITY.md](./SECURITY.md) +- **Slack**: [altimate.ai/slack](https://altimate.ai/slack) — Real-time chat for questions, showcases, and feature discussion +- **Issues**: [GitHub Issues](https://github.com/AltimateAI/altimate-code/issues) — Bug reports and feature requests +- **Discussions**: [GitHub Discussions](https://github.com/AltimateAI/altimate-code/discussions) — Long-form questions and proposals +- **Security**: See [SECURITY.md](./SECURITY.md) for responsible disclosure -Contributions welcome! Please read the [Contributing Guide](./CONTRIBUTING.md) before opening a PR. +Contributions welcome — docs, SQL rules, warehouse connectors, and TUI improvements are all needed. The contributing guide covers setup, the vouch system, and the issue-first PR policy. -```bash -git clone https://github.com/AltimateAI/altimate-code.git -cd altimate-code -bun install -``` +**[Read CONTRIBUTING.md →](./CONTRIBUTING.md)** -## Acknowledgements +## What's New -altimate is a fork of [OpenCode](https://github.com/anomalyco/opencode), the open-source AI coding agent. We build on top of their excellent foundation to add data-team-specific capabilities. 
+- **v0.4.1** (March 2026) — env-based skill selection, session caching, tracing improvements +- **v0.4.0** (Feb 2026) — data visualization skill, 99+ tools, training system +- **v0.3.x** — [See full changelog →](CHANGELOG.md) ## License MIT — see [LICENSE](./LICENSE). + +## Acknowledgements + +altimate is a fork of [OpenCode](https://github.com/anomalyco/opencode), the open-source AI coding agent. We build on top of their excellent foundation to add data-team-specific capabilities. diff --git a/docs/docs/configure/skills.md b/docs/docs/configure/skills.md index f83fa1150c..66801fd2b1 100644 --- a/docs/docs/configure/skills.md +++ b/docs/docs/configure/skills.md @@ -34,16 +34,13 @@ Focus on the query: $ARGUMENTS Skills are loaded from these locations (in priority order): -1. **External directories** (if not disabled): - - `~/.claude/skills/` - - `~/.agents/skills/` - - `.claude/skills/` (project, searched up tree) - - `.agents/skills/` (project, searched up tree) - -2. **altimate-code directories**: +1. **altimate-code directories** (project-scoped, highest priority): - `.altimate-code/skill/` - `.altimate-code/skills/` +2. **Global user directories**: + - `~/.altimate-code/skills/` + 3. **Custom paths** (from config): ```json @@ -54,7 +51,11 @@ Skills are loaded from these locations (in priority order): } ``` -4. **Remote URLs** (from config): +4. **External directories & remote URLs** (if not disabled): + - `~/.claude/skills/` + - `~/.agents/skills/` + - `.claude/skills/` (project, searched up tree) + - `.agents/skills/` (project, searched up tree) ```json { @@ -66,13 +67,9 @@ Skills are loaded from these locations (in priority order): ## Built-in Data Engineering Skills -altimate includes skills for common data engineering tasks: +altimate ships with built-in skills for common data engineering tasks. Skills are loaded and surfaced dynamically at runtime — type `/` in the TUI to browse what's available and get autocomplete on skill names. 
-- SQL analysis and optimization -- dbt model generation -- Schema exploration -- Cost estimation -- Migration planning +For custom skills, see [Adding Custom Skills](#adding-custom-skills) below. ## Disabling External Skills diff --git a/docs/docs/data-engineering/guides/ci-headless.md new file mode 100644 index 0000000000..11d29da1af --- /dev/null +++ b/docs/docs/data-engineering/guides/ci-headless.md @@ -0,0 +1,155 @@ +# CI & Headless Mode + +Run any altimate prompt non-interactively from scripts, CI pipelines, or scheduled jobs. No TUI. Output is plain text or JSON. + +--- + +## Basic Usage + +```bash +altimate run "your prompt here" +``` + +Key flags: + +| Flag | Description | +|---|---| +| `--output json` | Structured JSON output instead of plain text | +| `--model <model>` | Override the configured model | +| `--connection <name>` | Select a specific warehouse connection | +| `--no-color` | Disable ANSI color codes (for CI logs) | + +See `altimate run --help` for the full flag list, or [CLI Reference](../../usage/cli.md). 
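In CI, `--output json` is what makes `altimate run` scriptable. A minimal consumption sketch follows: the payload below is an assumed shape for illustration only (inspect your own `altimate run ... --output json` result for the real schema), and `python3` is used for parsing so the snippet needs no extra dependencies:

```shell
# Assumed payload shape -- illustrative only; the real `altimate run` JSON
# schema may differ. In a real pipeline you would capture it with:
#   payload=$(altimate run "check models/staging/ for anti-patterns" --output json)
payload='{"status":"ok","findings":[{"rule":"select_star","line":3}]}'

# Count findings and surface them so the CI step can react.
count=$(printf '%s' "$payload" | python3 -c 'import json,sys; print(len(json.load(sys.stdin).get("findings", [])))')
echo "findings: $count"
```

Pair a check like this with the exit codes documented on this page to decide whether a nonzero finding count should fail the job.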
+ +--- + +## Environment Variables for CI + +Configure without committing an `altimate-code.json` file: + +```bash +# LLM provider +ALTIMATE_PROVIDER=anthropic +ALTIMATE_ANTHROPIC_API_KEY=your-key-here + +# Or OpenAI +ALTIMATE_PROVIDER=openai +ALTIMATE_OPENAI_API_KEY=your-key-here + +# Warehouse (Snowflake example) +SNOWFLAKE_ACCOUNT=myorg-myaccount +SNOWFLAKE_USER=ci_user +SNOWFLAKE_PASSWORD=${{ secrets.SNOWFLAKE_PASSWORD }} +SNOWFLAKE_DATABASE=analytics +SNOWFLAKE_SCHEMA=public +SNOWFLAKE_WAREHOUSE=compute_wh +``` + +--- + +## Exit Codes + +| Code | Meaning | +|---|---| +| `0` | Success — task completed | +| `1` | Task completed but result indicates issues (e.g., anti-patterns found) | +| `2` | Configuration error (missing API key, bad connection) | +| `3` | Tool execution error (warehouse unreachable, query failed) | + +Use exit codes to fail CI on actionable findings: + +```bash +altimate run "validate models in models/staging/ for anti-patterns" || exit 1 +``` + +--- + +## Worked Examples + +### Example 1 — Nightly Cost Check (GitHub Actions) + +```yaml +# .github/workflows/cost-check.yml +name: Nightly Cost Check + +on: + schedule: + - cron: '0 8 * * 1-5' # 8am UTC, weekdays + +jobs: + cost-check: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Install altimate + run: npm install -g @altimateai/altimate-code + + - name: Run cost report + env: + ALTIMATE_PROVIDER: anthropic + ALTIMATE_ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }} + SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_CI_USER }} + SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }} + SNOWFLAKE_DATABASE: analytics + SNOWFLAKE_WAREHOUSE: compute_wh + run: | + altimate run "/cost-report" --output json > cost-report.json + cat cost-report.json + + - name: Upload cost report + uses: actions/upload-artifact@v4 + with: + name: cost-report + path: cost-report.json +``` + +### Example 2 — Post-Deploy SQL Validation + +Add to 
your dbt deployment workflow to flag anti-patterns as soon as models are deployed: + +```yaml + - name: SQL anti-pattern check + env: + ALTIMATE_PROVIDER: anthropic + ALTIMATE_ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + run: | + altimate run "validate all SQL files in models/staging/ for anti-patterns and fail if any are found" \ + --no-color \ + --output json +``` + +### Example 3 — Automated Test Generation (Pre-commit) + +```bash +#!/bin/bash +# .git/hooks/pre-commit +# Generate tests for newly added SQL model files + +STAGED_MODELS=$(git diff --cached --name-only --diff-filter=A | grep "models/.*\.sql") + +if [ -n "$STAGED_MODELS" ]; then + echo "Generating tests for new models..." + altimate run "/generate-tests for: $STAGED_MODELS" --no-color +fi +``` + +--- + +## Tracing in Headless Mode + +Tracing works in headless mode. View traces after the run: + +```bash +altimate trace list +altimate trace view <trace-id> +``` + +See [Tracing](../../configure/tracing.md) for the full trace reference. + +--- + +## Security Recommendation + +Use a **read-only warehouse user** for CI jobs that only need to read data. Reserve write-access credentials for jobs that explicitly need them (e.g., test generation that writes files). See [Security FAQ](../../security-faq.md) and [Permissions](../../configure/permissions.md). diff --git a/docs/docs/data-engineering/tools/index.md b/docs/docs/data-engineering/tools/index.md index cc3310c9fc..b8993ee5f8 100644 --- a/docs/docs/data-engineering/tools/index.md +++ b/docs/docs/data-engineering/tools/index.md @@ -1,6 +1,6 @@ # Tools Reference -altimate has 55+ specialized tools organized by function. +altimate has 99+ specialized tools organized by function. 
| Category | Tools | Purpose | |---|---|---| diff --git a/docs/docs/data-engineering/training/team-deployment.md new file mode 100644 index 0000000000..fec7848db0 --- /dev/null +++ b/docs/docs/data-engineering/training/team-deployment.md @@ -0,0 +1,88 @@ +# Deploying Team Training + +Get every teammate's AI applying the same SQL conventions, naming standards, and anti-pattern rules automatically. This is achieved by committing `.altimate-code/memory/` to git — teammates inherit your training on `git pull`. + +--- + +## Step 1 — Create Your First Team Training Entries + +Use the `/teach` or `/train` skills to save project-specific conventions: + +``` +/teach always use QUALIFY instead of nested window function subqueries in Snowflake SQL +``` + +``` +/teach our staging models follow the pattern: stg_<source>__<entity>.sql +``` + +Verify the training was saved: + +```bash +/training-status +``` + +This shows all active training entries, their scope (global vs project), and when they were added. + +--- + +## Step 2 — Locate the Training Files + +Training is stored in `.altimate-code/memory/` in your project root. Each entry is a markdown file with YAML frontmatter: + +``` +.altimate-code/ + memory/ + sql-conventions.md + naming-standards.md + project-patterns.md +``` + +**Global vs. project scope:** +- **Project scope** (`.altimate-code/memory/`): Applies when working in this project. Commit to git to share with team. +- **Global scope** (`~/.altimate-code/memory/`): Applies across all projects. Do not commit — this is personal. + +--- + +## Step 3 — Commit to Git + +```bash +git add .altimate-code/memory/ +git commit -m "Add team SQL conventions and naming standards" +git push +``` + +Teammates who `git pull` automatically inherit all training entries. No additional setup required — the tool reads from `.altimate-code/memory/` on startup. 
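The entries committed above are plain markdown files with YAML frontmatter. Here is a minimal illustrative entry; the frontmatter field names are assumptions for this sketch, not a documented schema, so open a file generated by `/teach` in your own project for the canonical format:

```markdown
---
scope: project
topic: sql-conventions
added: 2026-03-01
---

Always use QUALIFY instead of nested window-function subqueries in Snowflake SQL.
```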
+ +--- + +## Step 4 — Verify a Teammate Got the Training + +After a teammate pulls, they can run: + +```bash +/training-status +``` + +They should see the same entries you created. If they don't, check that `.altimate-code/memory/` is not in `.gitignore`. + +--- + +## Best Practices + +**What to teach first:** +1. Your team's most common SQL mistakes (the things that keep coming up in code review) +2. Naming conventions for models, tables, and columns +3. Project-specific patterns: your medallion layer names, your warehouse, your dbt project structure + +**Handling conflicting corrections:** +Later corrections override earlier ones for the same topic. Use `/training-status` to audit and delete stale entries with `/forget <topic>`. + +**Global vs. project scope:** +Use project scope for team standards. Use global scope only for personal preferences that apply to all your projects (e.g., preferred SQL style). + +--- + +## Limitations + +Training is as good as the corrections you save. The system doesn't infer conventions from your existing codebase — you teach it explicitly. For the full description of how training works, see [Training Overview](index.md). diff --git a/docs/docs/develop/plugins.md b/docs/docs/develop/plugins.md index a12deda276..237904ea80 100644 --- a/docs/docs/develop/plugins.md +++ b/docs/docs/develop/plugins.md @@ -1,6 +1,6 @@ # Plugins -Plugins extend altimate with custom tools, hooks, and behaviors. +Plugins extend altimate with custom tools, hooks, and behaviors. Use plugins to add domain-specific rules, integrate with internal APIs, log telemetry, enforce governance policies, or customize how the agent interacts with your data stack. 
## Creating a Plugin @@ -42,6 +42,8 @@ export default definePlugin({ ## Registering Plugins +Add plugins to your `altimate-code.json` config file: + ```json { "plugin": [ @@ -52,21 +54,319 @@ export default definePlugin({ } ``` +Plugins can be specified as: + +- **npm package name** — installed from the registry (e.g., `"npm-published-plugin"`) +- **Relative path** — a local directory (e.g., `"./path/to/local-plugin"`) +- **Scoped package** — with an org prefix (e.g., `"@altimateai/altimate-code-plugin-example"`) + ## Plugin Hooks -Plugins can listen to lifecycle events: - -| Hook | Description | -|------|------------| -| `onSessionStart` | Session created | -| `onSessionEnd` | Session ended | -| `onMessage` | User message received | -| `onResponse` | AI response generated | -| `onToolCall` | Before tool execution | -| `onToolResult` | After tool execution | -| `onFileEdit` | File edited | -| `onFileWrite` | File written | -| `onError` | Error occurred | +Plugins can listen to lifecycle events. Each hook receives a context object with data relevant to the event. 
+ +| Hook | When It Fires | Data Available | +|------|--------------|----------------| +| `onSessionStart` | A new session is created | `session.id`, `session.agent`, `session.metadata` | +| `onSessionEnd` | A session is closed or expires | `session.id`, `session.duration`, `session.messageCount` | +| `onMessage` | User sends a message to the agent | `message.content`, `message.sessionId`, `message.agent` | +| `onResponse` | Agent generates a response | `response.content`, `response.sessionId`, `response.toolCalls` | +| `onToolCall` | Before a tool is executed | `call.name`, `call.parameters`, `call.sessionId` — return `false` to cancel | +| `onToolResult` | After a tool finishes executing | `result.toolName`, `result.output`, `result.duration`, `result.error` | +| `onFileEdit` | A file is modified via the agent | `edit.filePath`, `edit.oldContent`, `edit.newContent`, `edit.sessionId` | +| `onFileWrite` | A new file is created via the agent | `write.filePath`, `write.content`, `write.sessionId` | +| `onError` | An error occurs during processing | `error.message`, `error.code`, `error.stack`, `error.sessionId` | +| `onConfigChange` | Configuration is reloaded or modified | `config.previous`, `config.current`, `config.changedKeys` | + +### Hook Execution Order + +Hooks fire in this order during a typical interaction: + +1. `onSessionStart` (once per session) +2. `onMessage` (each user message) +3. `onToolCall` (before each tool runs) +4. `onToolResult` (after each tool completes) +5. `onFileEdit` / `onFileWrite` (if the tool modifies files) +6. `onResponse` (when the agent produces a response) +7. `onError` (if something fails, at any point) +8. `onSessionEnd` (when the session closes) + +## Example: SQL Anti-Pattern Plugin + +This example creates a data-engineering-specific plugin that checks for `CROSS JOIN` without a `WHERE` clause in Snowflake SQL — a common anti-pattern that can cause massive result sets and runaway costs. 
+ +### Plugin File + +```typescript +// plugins/sql-antipattern-cross-join/index.ts +import { definePlugin, defineTool } from "@altimateai/altimate-code-plugin" +import { z } from "zod" + +/** + * Detects CROSS JOIN usage without a WHERE clause in Snowflake SQL. + * This anti-pattern can produce cartesian products and consume + * excessive credits. + */ +const crossJoinChecker = defineTool({ + name: "check_cross_join_antipattern", + description: + "Checks SQL for CROSS JOIN without a WHERE clause, which can cause cartesian products in Snowflake", + parameters: z.object({ + sql: z.string().describe("The SQL query to analyze"), + severity: z + .enum(["warning", "error"]) + .default("error") + .describe("Severity level for detected anti-patterns"), + }), + async execute({ sql, severity }) { + const findings: Array<{ + line: number + message: string + severity: string + suggestion: string + }> = [] + + const lines = sql.split("\n") + const upperSql = sql.toUpperCase() + + // Check for CROSS JOIN + const crossJoinRegex = /\bCROSS\s+JOIN\b/gi + let match: RegExpExecArray | null + + while ((match = crossJoinRegex.exec(sql)) !== null) { + const lineNumber = + sql.substring(0, match.index).split("\n").length + + // Check if there's a WHERE clause after this CROSS JOIN + const afterJoin = upperSql.substring(match.index) + const hasWhere = /\bWHERE\b/.test(afterJoin) + const hasLimit = /\bLIMIT\b/.test(afterJoin) + + if (!hasWhere) { + findings.push({ + line: lineNumber, + message: `CROSS JOIN without a WHERE clause at line ${lineNumber}`, + severity, + suggestion: hasLimit + ? "Add a WHERE clause to filter the cartesian product. LIMIT alone does not prevent full computation in Snowflake." + : "Add a WHERE clause or replace with an INNER JOIN on a specific condition. 
Without filtering, this produces a full cartesian product.", + }) + } + } + + // Also detect implicit cross joins (comma-separated FROM without WHERE) + const implicitCrossRegex = + /\bFROM\s+(\w+\s*,\s*\w+(?:\s*,\s*\w+)*)\b/gi + while ((match = implicitCrossRegex.exec(sql)) !== null) { + const afterFrom = upperSql.substring(match.index) + const hasWhere = /\bWHERE\b/.test(afterFrom) + + if (!hasWhere) { + const lineNumber = + sql.substring(0, match.index).split("\n").length + findings.push({ + line: lineNumber, + message: `Implicit CROSS JOIN (comma-separated tables) without WHERE at line ${lineNumber}`, + severity: "warning", + suggestion: + "Use explicit JOIN syntax with ON conditions instead of comma-separated tables in FROM.", + }) + } + } + + return { + passed: findings.length === 0, + findingCount: findings.length, + findings, + summary: + findings.length === 0 + ? "No CROSS JOIN anti-patterns detected." + : `Found ${findings.length} potential CROSS JOIN anti-pattern(s).`, + } + }, +}) + +export default definePlugin({ + name: "sql-antipattern-cross-join", + description: "Detects CROSS JOIN anti-patterns in Snowflake SQL", + tools: [crossJoinChecker], + hooks: { + onToolCall(call) { + // Automatically check SQL when query tools are used + if ( + call.name === "warehouse_query" && + typeof call.parameters?.sql === "string" + ) { + console.log( + `[cross-join-checker] Scanning query for anti-patterns...` + ) + } + }, + onToolResult(result) { + if (result.toolName === "check_cross_join_antipattern") { + const output = result.output as { passed: boolean; summary: string } + if (!output.passed) { + console.warn(`[cross-join-checker] ${output.summary}`) + } + } + }, + }, +}) +``` + +### Register It + +Add the plugin path to your `altimate-code.json`: + +```json +{ + "plugin": [ + "./plugins/sql-antipattern-cross-join" + ] +} +``` + +Or place it directly in your project's `.altimate-code/plugins/` directory, where it will be loaded automatically. 
+ +### Use It + +Once registered, the tool is available in any session: + +``` +> check_cross_join_antipattern sql:"SELECT * FROM orders CROSS JOIN customers" + +Found 1 potential CROSS JOIN anti-pattern(s). +- Line 1: CROSS JOIN without a WHERE clause + Suggestion: Add a WHERE clause or replace with an INNER JOIN on a specific condition. +``` + +## Testing Your Plugin + +### Development Mode + +Run your plugin tests using `bun test`: + +```bash +cd plugins/sql-antipattern-cross-join +bun test +``` + +### Writing Unit Tests + +Create a test file alongside your plugin: + +```typescript +// plugins/sql-antipattern-cross-join/index.test.ts +import { describe, it, expect } from "bun:test" +import plugin from "./index" + +describe("cross-join-antipattern", () => { + const tool = plugin.tools[0] + + it("detects CROSS JOIN without WHERE", async () => { + const result = await tool.execute({ + sql: "SELECT * FROM orders CROSS JOIN customers", + severity: "error", + }) + expect(result.passed).toBe(false) + expect(result.findingCount).toBe(1) + expect(result.findings[0].message).toContain("CROSS JOIN without a WHERE") + }) + + it("passes CROSS JOIN with WHERE", async () => { + const result = await tool.execute({ + sql: "SELECT * FROM orders CROSS JOIN customers WHERE orders.id = customers.order_id", + severity: "error", + }) + expect(result.passed).toBe(true) + expect(result.findingCount).toBe(0) + }) + + it("detects implicit cross join", async () => { + const result = await tool.execute({ + sql: "SELECT * FROM orders, customers", + severity: "warning", + }) + expect(result.passed).toBe(false) + expect(result.findings[0].message).toContain("Implicit CROSS JOIN") + }) + + it("handles clean SQL", async () => { + const result = await tool.execute({ + sql: "SELECT o.id, c.name FROM orders o INNER JOIN customers c ON o.customer_id = c.id", + severity: "error", + }) + expect(result.passed).toBe(true) + }) +}) +``` + +Run the tests: + +```bash +bun test 
plugins/sql-antipattern-cross-join/index.test.ts +``` + +## Distributing Your Plugin + +### Option 1: Local Directory + +Place your plugin in the `.altimate-code/plugins/` directory of your project. Plugins in this directory are loaded automatically without explicit registration. + +``` +my-dbt-project/ + .altimate-code/ + plugins/ + sql-antipattern-cross-join/ + index.ts + package.json +``` + +### Option 2: Git Repository + +Publish your plugin as a git repository and reference it by URL: + +```json +{ + "plugin": [ + "git+https://github.com/your-org/altimate-cross-join-checker.git" + ] +} +``` + +### Option 3: npm Package + +Publish your plugin to npm for the widest distribution: + +```bash +# In your plugin directory +npm publish +``` + +Your `package.json` should include: + +```json +{ + "name": "@your-org/altimate-plugin-cross-join", + "version": "1.0.0", + "main": "index.ts", + "keywords": ["altimate-code-plugin"], + "peerDependencies": { + "@altimateai/altimate-code-plugin": ">=0.4.0" + } +} +``` + +Then consumers install and register it: + +```bash +npm install @your-org/altimate-plugin-cross-join +``` + +```json +{ + "plugin": ["@your-org/altimate-plugin-cross-join"] +} +``` ## Plugin API diff --git a/docs/docs/develop/sdk.md b/docs/docs/develop/sdk.md index 0bfcc88de3..5502660509 100644 --- a/docs/docs/develop/sdk.md +++ b/docs/docs/develop/sdk.md @@ -1,6 +1,6 @@ # SDK -The altimate SDK (`@altimateai/altimate-code-sdk`) provides a TypeScript client for programmatic access to altimate functionality. +The altimate SDK (`@altimateai/altimate-code-sdk`) provides a TypeScript client for programmatic access to altimate functionality. Use it to automate SQL analysis, manage sessions, and integrate altimate into your CI/CD pipelines or internal tools. 
## Installation @@ -8,6 +8,28 @@ The altimate SDK (`@altimateai/altimate-code-sdk`) provides a TypeScript client npm install @altimateai/altimate-code-sdk ``` +## Starting the Server + +Before using the SDK, you need a running altimate server. Start it with: + +```bash +# Start the server on the default port (3000) +altimate serve + +# Start on a custom port +altimate serve --port 8080 + +# Start with a specific config file +altimate serve --config ./altimate-code.json +``` + +Verify the server is running by hitting the health check endpoint: + +```bash +curl http://localhost:3000/health +# => {"status":"ok"} +``` + ## Client Usage ```typescript @@ -29,16 +51,157 @@ const response = await client.send({ const sessions = await client.sessions.list() ``` +## Complete Integration Example + +The following example demonstrates a full workflow: starting a session, running a SQL analysis task, reading the structured result, and handling errors. + +```typescript +import { createClient } from "@altimateai/altimate-code-sdk/client" + +async function analyzeExpensiveQueries() { + const client = createClient({ + baseURL: "http://localhost:3000", + username: "admin", + password: "secret", + }) + + // Step 1: Create a new session + const session = await client.sessions.create({ + agent: "analyst", + metadata: { project: "analytics-pipeline" }, + }) + + try { + // Step 2: Send an analysis request within the session + const response = await client.send({ + sessionId: session.id, + message: "Find the top 10 most expensive queries by credit consumption in the last 30 days", + agent: "analyst", + }) + + // Step 3: Read the structured result + console.log("Analysis complete:") + console.log("Response:", response.content) + + if (response.toolResults) { + for (const result of response.toolResults) { + console.log(`Tool: ${result.toolName}`) + console.log(`Output:`, JSON.stringify(result.output, null, 2)) + } + } + + // Step 4: Ask a follow-up question in the same session + const 
followUp = await client.send({ + sessionId: session.id, + message: "Which of those queries could benefit from clustering keys?", + agent: "analyst", + }) + + console.log("Follow-up:", followUp.content) + + return { response, followUp } + } finally { + // Step 5: Always close the session when done + await client.sessions.close(session.id) + } +} + +analyzeExpensiveQueries().catch(console.error) +``` + +## Session Management + +Sessions maintain conversation context, which is important for multi-turn interactions and batch workflows. + +```typescript +// Create a session with metadata for tracking +const session = await client.sessions.create({ + agent: "analyst", + metadata: { pipeline: "nightly-audit", runId: "2025-01-15" }, +}) + +// Reuse the session for multiple related messages +await client.send({ sessionId: session.id, message: "List all tables in ANALYTICS.PUBLIC" }) +await client.send({ sessionId: session.id, message: "Which tables have no primary key?" }) + +// List all active sessions +const activeSessions = await client.sessions.list() +console.log(`Active sessions: ${activeSessions.length}`) + +// Close the session to release resources +await client.sessions.close(session.id) +``` + +**Batch workflow tip:** When processing many projects or warehouses, create one session per unit of work and close each when done. This keeps memory usage predictable and ensures context does not leak between unrelated analyses. 
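The batch pattern described above can be sketched as a small wrapper. Note that `withSession` is a hypothetical helper written for illustration, not part of the SDK; it assumes only the `sessions.create` / `sessions.close` calls shown earlier:

```typescript
// Hypothetical helper (not part of the SDK): run one unit of work in its
// own session, guaranteeing the session is closed even if the work throws.
async function withSession<T>(
  client: {
    sessions: {
      create(opts: { agent: string }): Promise<{ id: string }>
      close(id: string): Promise<void>
    }
  },
  agent: string,
  work: (sessionId: string) => Promise<T>,
): Promise<T> {
  const session = await client.sessions.create({ agent })
  try {
    return await work(session.id)
  } finally {
    // Always release the session so context cannot leak into the next run
    await client.sessions.close(session.id)
  }
}

// Usage in a batch run: one isolated session per project, e.g.
// for (const project of projects) {
//   await withSession(client, "analyst", (id) =>
//     client.send({ sessionId: id, message: `Audit the ${project} schema` }),
//   )
// }
```

Because the close happens in a `finally` block, a failed analysis in one project cannot leave a dangling session behind for the next.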
+ +## Error Handling + +The SDK throws typed errors that you can catch and handle: + +```typescript +import { createClient } from "@altimateai/altimate-code-sdk/client" +import { + ConnectionError, + AuthenticationError, + SessionNotFoundError, + RateLimitError, + ServerError, +} from "@altimateai/altimate-code-sdk" + +const client = createClient({ + baseURL: "http://localhost:3000", + username: "admin", + password: "secret", +}) + +try { + const response = await client.send({ + message: "analyze warehouse costs", + agent: "analyst", + }) +} catch (error) { + if (error instanceof ConnectionError) { + // Server is not running or unreachable + console.error("Cannot reach altimate server. Is it running?", error.message) + } else if (error instanceof AuthenticationError) { + // Invalid credentials + console.error("Invalid username or password") + } else if (error instanceof SessionNotFoundError) { + // Session expired or does not exist + console.error("Session not found — it may have expired", error.sessionId) + } else if (error instanceof RateLimitError) { + // Too many requests — back off and retry + console.error(`Rate limited. 
Retry after ${error.retryAfterMs}ms`) + await new Promise((r) => setTimeout(r, error.retryAfterMs)) + } else if (error instanceof ServerError) { + // Internal server error + console.error("Server error:", error.statusCode, error.message) + } else { + throw error // Re-throw unexpected errors + } +} +``` + ## Exports | Import | Description | |--------|------------| -| `@altimateai/altimate-code-sdk` | Core SDK | -| `@altimateai/altimate-code-sdk/client` | HTTP client | -| `@altimateai/altimate-code-sdk/server` | Server utilities | -| `@altimateai/altimate-code-sdk/v2` | v2 API types | -| `@altimateai/altimate-code-sdk/v2/client` | v2 client | +| `@altimateai/altimate-code-sdk` | Core SDK — error types, constants, utilities | +| `@altimateai/altimate-code-sdk/client` | HTTP client — `createClient()` | +| `@altimateai/altimate-code-sdk/server` | Server utilities — for embedding altimate in your own server | +| `@altimateai/altimate-code-sdk/v2` | v2 API types — TypeScript type definitions | +| `@altimateai/altimate-code-sdk/v2/client` | v2 client — auto-generated typed client | ## OpenAPI The SDK is generated from an OpenAPI specification. The v2 client is auto-generated using `@hey-api/openapi-ts`. + +When the server is running, you can access the live OpenAPI spec at: + +``` +http://localhost:PORT/openapi.json +``` + +This is useful for exploring available endpoints, generating clients in other languages, or importing into tools like Postman or Insomnia. + +> **For contributors:** If you make changes to the API (e.g., `packages/opencode/src/server/server.ts`), run `./script/generate.ts` to regenerate the SDK and related files. See [CONTRIBUTING.md](https://github.com/AltimateAI/altimate-code/blob/main/CONTRIBUTING.md) for details. 
diff --git a/docs/docs/getting-started.md b/docs/docs/getting-started.md index bea79378eb..e4d74fcc1f 100644 --- a/docs/docs/getting-started.md +++ b/docs/docs/getting-started.md @@ -1,8 +1,10 @@ # Getting Started +> **New to altimate?** [Start with the 5-minute quickstart](quickstart.md) to go from install to your first analysis in minutes. + ## Why altimate? -Unlike general-purpose coding agents, altimate is built for data teams: +altimate is the open-source data engineering harness — 99+ deterministic tools for building, validating, optimizing, and shipping data products. Unlike general-purpose coding agents, every tool is purpose-built for data engineering: | Capability | General coding agents | altimate | |---|---|---| @@ -14,7 +16,7 @@ Unlike general-purpose coding agents, altimate is built for data teams: | PII detection | None | Automatic column scanning | | dbt integration | Basic file editing | Manifest parsing, test generation, model scaffolding | -## Installation +## Step 1: Install ```bash npm install -g altimate-code @@ -22,7 +24,9 @@ npm install -g altimate-code After install, you'll see a welcome banner with quick-start commands. On upgrades, the banner also shows what changed since your previous version. -## First run +## Step 2: Connect Your LLM (`/connect`) + +Before anything else, connect an LLM provider. Launch altimate and run: ```bash altimate @@ -30,7 +34,19 @@ altimate > **Note:** `altimate-code` still works as a backward-compatible alias. -The TUI launches with an interactive terminal. On first run, use the `/discover` command to auto-detect your data stack: +Then in the TUI: + +``` +/connect +``` + +This walks you through selecting and authenticating with an LLM provider (Anthropic, OpenAI, Bedrock, Codex, Ollama, etc.). You need a working LLM connection before the agent can do anything useful. + +## Step 3: Configure Your Warehouse + +Set up warehouse connections so altimate can query your data platform. 
You have two options: + +### Option A: Auto-discover with `/discover` ``` /discover @@ -44,9 +60,101 @@ The TUI launches with an interactive terminal. On first run, use the `/discover` 4. **Offers to configure connections** — walks you through adding and testing each discovered warehouse 5. **Indexes schemas** — populates the schema cache for autocomplete and context-aware analysis -You can also configure connections manually — see [Warehouse connections](#warehouse-connections) below. +Once complete, altimate indexes your schemas and detects your tooling, enabling schema-aware autocomplete and context-rich analysis. + +### Option B: Manual configuration + +Add a warehouse connection to your `altimate-code.json`. Here are minimal snippets for each warehouse type: + +#### Snowflake (quick-connect) + +```json +{ + "warehouses": { + "snowflake": { + "type": "snowflake", + "account": "xy12345.us-east-1", + "user": "your_user", + "password": "${SNOWFLAKE_PASSWORD}", + "warehouse": "COMPUTE_WH", + "database": "ANALYTICS" + } + } +} +``` -To set up your LLM provider, use the `/connect` command. +#### BigQuery (quick-connect) + +```json +{ + "warehouses": { + "bigquery": { + "type": "bigquery", + "project": "my-gcp-project", + "dataset": "analytics" + } + } +} +``` + +> Tip: Omit `service_account` to use Application Default Credentials (`gcloud auth application-default login`). + +#### Databricks (quick-connect) + +```json +{ + "warehouses": { + "databricks": { + "type": "databricks", + "host": "dbc-abc123.cloud.databricks.com", + "token": "${DATABRICKS_TOKEN}", + "warehouse_id": "abcdef1234567890", + "catalog": "main" + } + } +} +``` + +#### DuckDB (quick-connect) + +```json +{ + "warehouses": { + "duckdb": { + "type": "duckdb", + "database": "./dev.duckdb" + } + } +} +``` + +See [Warehouse connections](#warehouse-connections) below for full configuration options including key-pair auth, Redshift, and PostgreSQL. 
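Values like `${SNOWFLAKE_PASSWORD}` and `${DATABRICKS_TOKEN}` in the snippets above are placeholders resolved from environment variables when the config is loaded, so secrets never have to live in the file itself. A sketch of that substitution (illustrative only; altimate performs this internally):

```typescript
// Illustrative sketch of ${VAR} substitution in config values.
// Not altimate's actual implementation.
function substituteEnv(
  value: string,
  env: Record<string, string | undefined>,
): string {
  return value.replace(/\$\{(\w+)\}/g, (_match, name: string) => {
    const resolved = env[name]
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`)
    }
    return resolved
  })
}

// "${SNOWFLAKE_PASSWORD}" resolves to whatever the variable holds at load time
console.log(substituteEnv("${SNOWFLAKE_PASSWORD}", { SNOWFLAKE_PASSWORD: "s3cret" }))
```

The practical upshot: export the variable in your shell (or CI secret store) before launching altimate, and keep `altimate-code.json` safe to commit.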
+ +## Step 4: Choose an Agent Mode + +altimate offers specialized agent modes for different workflows: + +| What do you want to do? | Use this agent mode | +|---|---| +| Analyzing data without risk of changes | **Analyst** — read-only queries, cost analysis, data profiling | +| Building or generating dbt models | **Builder** — model scaffolding, SQL generation, ref() wiring | +| Validating data quality | **Validator** — test generation, anomaly detection, data contracts | +| Migrating across warehouses | **Migrator** — cross-dialect SQL translation, compatibility checks | +| Teaching team conventions | **Trainer** — learns corrections, enforces naming/style rules across team | +| Research and exploration | **Researcher** — deep-dive analysis, lineage tracing, impact assessment | +| Executive summaries and reports | **Executive** — high-level overviews, cost summaries, health dashboards | + +Switch modes in the TUI: + +``` +/mode analyst +``` + +## Step 5: Start Working + +You are ready to go. Type a natural-language prompt in the TUI and the agent will use the appropriate tools to answer. See [Example prompts](#example-prompts) at the bottom of this page for ideas. + +--- ## Configuration @@ -199,12 +307,47 @@ If you have a ChatGPT Plus/Pro subscription, you can use Codex as your LLM backe ✓ Connected successfully ``` +## Example Prompts + +Copy and paste these into the TUI to get started with common use cases: + +### Cost analysis + +``` +Analyze our Snowflake credit consumption over the last 30 days. Show the top 10 most expensive queries, which warehouses they ran on, and suggest optimizations. +``` + +### dbt model generation + +``` +Create a dbt staging model for the raw_orders table in our Snowflake warehouse. Include column descriptions, a unique test on order_id, and a not_null test on customer_id. +``` + +### SQL anti-pattern review + +``` +Scan all SQL files in the models/ directory for anti-patterns. 
Flag any SELECT *, missing WHERE clauses on DELETE statements, implicit cartesian joins, and non-sargable predicates. +``` + +### Cross-warehouse migration + +``` +Translate the following Snowflake SQL to BigQuery-compatible SQL, noting any function differences, data type changes, and features that don't have a direct equivalent: +SELECT DATEADD(day, -7, CURRENT_TIMESTAMP()), TRY_TO_NUMBER(amount), ARRAY_AGG(DISTINCT category) WITHIN GROUP (ORDER BY category) FROM sales QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sale_date DESC) = 1; +``` + +### Data quality validation + +``` +Generate data quality tests for all models in the marts/ directory. For each model, suggest unique tests, not-null tests, accepted-values tests, and relationship tests based on the column names and types. +``` + ## Next steps -- [TUI Guide](usage/tui.md) — Learn the terminal interface, keybinds, and slash commands -- [CLI Reference](usage/cli.md) — Subcommands, flags, and environment variables -- [Configuration](configure/config.md) — Full config file reference +- [Terminal UI](usage/tui.md) — Learn the terminal interface, keybinds, and slash commands +- [CLI](usage/cli.md) — Subcommands, flags, and environment variables +- [Config Files](configure/config.md) — Full config file reference - [Providers](configure/providers.md) — Set up Anthropic, OpenAI, Bedrock, Ollama, and more - [Agent Modes](data-engineering/agent-modes.md) — Builder, Analyst, Validator, Migrator, Researcher, Trainer -- [Training: Corrections That Stick](data-engineering/training/index.md) — Correct the agent once, it remembers forever, your team inherits it -- [Data Engineering Tools](data-engineering/tools/index.md) — 55+ specialized tools for SQL, dbt, and warehouses +- [Training](data-engineering/training/index.md) — Correct the agent once, it remembers forever, your team inherits it +- [Tools](data-engineering/tools/sql-tools.md) — 99+ specialized tools for SQL, dbt, and warehouses diff --git 
a/docs/docs/index.md b/docs/docs/index.md index cb57c5f30b..3abd9c34cf 100644 --- a/docs/docs/index.md +++ b/docs/docs/index.md @@ -15,9 +15,9 @@ hide: altimate-code

-

The data engineering agent for
dbt, SQL, and cloud warehouses.

+

The open-source data engineering harness.

-

An AI-powered CLI with 55+ specialized tools — SQL analysis, schema inspection, column-level lineage, FinOps, and RBAC. Connects to your warehouse, understands your data, and helps you ship faster.

+

99+ tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the harness for your data agents. Evaluate across any platform — independent of a single warehouse provider.

@@ -38,8 +38,8 @@ npm install -g altimate-code --- -

Built for data teams

-

Unlike general-purpose coding agents, every tool is purpose-built for data engineering workflows.

+

Purpose-built for the data product lifecycle

+

Every tool covers a specific stage — build, validate, optimize, or ship. Not general-purpose AI on top of SQL files.

@@ -83,6 +83,39 @@ npm install -g altimate-code --- +

Use anywhere in your stack

+

Run interactively, automate in CI, embed in DAGs, or mount as the tool layer for your AI agents.

+ +
+ +- :material-console:{ .lg .middle } **Terminal** + + --- + + Interactive TUI with 99+ tools, autocomplete for skills, and persistent memory across sessions. + +- :material-pipe-disconnected:{ .lg .middle } **CI Pipeline** + + --- + + Headless mode for automated validation, schema diffing, and anti-pattern checks in GitHub Actions or any CI system. + +- :material-graph:{ .lg .middle } **Orchestration DAGs** + + --- + + Call the harness from Airflow, Dagster, or Prefect tasks to add data quality gates and lineage checks to your pipelines. + +- :material-robot-outline:{ .lg .middle } **Data Agent Harness** + + --- + + Mount altimate as the tool layer underneath Claude Code, Codex, or any AI agent — giving it deterministic, warehouse-aware capabilities. + +
+ +--- +

Seven specialized agents

Each agent has scoped permissions and purpose-built tools for its role.

@@ -151,8 +184,8 @@ npm install -g altimate-code --- -

Connects to your warehouse

-

First-class support for 8 data platforms.

+

Evaluate across any platform

+

First-class support for 8 warehouses. Migrate, compare, and translate across platforms — not locked to one vendor.

@@ -171,8 +204,8 @@ npm install -g altimate-code diff --git a/docs/docs/llms.txt b/docs/docs/llms.txt new file mode 100644 index 0000000000..70eaa8733f --- /dev/null +++ b/docs/docs/llms.txt @@ -0,0 +1,42 @@ +# altimate-code llms.txt +# AI-friendly documentation index for altimate-code +# Generated: 2026-03-17 | Version: v0.4.1 +# Source: https://altimateai.github.io/altimate-code + +> altimate-code is an open-source data engineering harness — 99+ tools for building, validating, optimizing, and shipping data products. Use in your terminal, CI pipeline, orchestration DAGs, or as the tool layer for your data agents. Includes a deterministic SQL Intelligence Engine (100% F1 across 1,077 queries), column-level lineage, FinOps analysis, PII detection, and dbt integration. Works with any LLM provider. Local-first, MIT-licensed. + +## Get Started + +- [Quickstart (5 min)](https://altimateai.github.io/altimate-code/quickstart/): Install altimate, configure your LLM provider, connect your warehouse, and run your first query in under 5 minutes. +- [Full Setup Guide](https://altimateai.github.io/altimate-code/getting-started/): Complete installation, warehouse configuration for all 8 supported warehouses, LLM provider setup, and first-run walkthrough. +- [Network & Proxy](https://altimateai.github.io/altimate-code/network/): Proxy configuration, CA certificate setup, firewall requirements. + +## Data Engineering + +- [Agent Modes](https://altimateai.github.io/altimate-code/data-engineering/agent-modes/): 7 specialized agents — Builder (full read/write), Analyst (read-only enforced), Validator, Migrator, Researcher, Trainer, Executive — each with scoped permissions and purpose-built tool access. +- [Training Overview](https://altimateai.github.io/altimate-code/data-engineering/training/): How to teach altimate project-specific patterns, naming conventions, and corrections that persist across sessions and team members. 
+- [Team Deployment](https://altimateai.github.io/altimate-code/data-engineering/training/team-deployment/): How to commit training to git so your entire team inherits SQL conventions automatically. +- [SQL Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/sql-tools/): 9 SQL analysis tools with 19 anti-pattern rules. 100% F1 accuracy on 1,077 benchmark queries. +- [Schema Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/schema-tools/): Warehouse schema introspection, metadata indexing, and column-level analysis tools. +- [FinOps Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/finops-tools/): Credit analysis, expensive query detection, warehouse right-sizing, unused resource cleanup, RBAC auditing. +- [Lineage Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/lineage-tools/): Column-level lineage extraction from SQL. 100% edge-match accuracy on 500 benchmark queries. +- [dbt Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/dbt-tools/): dbt manifest parsing, test generation, model scaffolding, incremental logic detection. +- [Warehouse Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/warehouse-tools/): Direct connectivity to Snowflake, BigQuery, Databricks, PostgreSQL, Redshift, DuckDB, MySQL, SQL Server. +- [Memory Tools](https://altimateai.github.io/altimate-code/data-engineering/tools/memory-tools/): Session memory, persistent corrections, and team training storage. +- [Cost Optimization Guide](https://altimateai.github.io/altimate-code/data-engineering/guides/cost-optimization/): Step-by-step warehouse cost reduction with before/after SQL examples and savings estimates. +- [Migration Guide](https://altimateai.github.io/altimate-code/data-engineering/guides/migration/): Cross-warehouse SQL migration with side-by-side examples. 
+- [CI & Headless Mode](https://altimateai.github.io/altimate-code/data-engineering/guides/ci-headless/): Non-interactive use in GitHub Actions, scheduled jobs, and pre-commit hooks. + +## Configure + +- [Configuration Overview](https://altimateai.github.io/altimate-code/configure/config/): Full altimate-code.json schema, value substitution, project structure, experimental flags. +- [Providers](https://altimateai.github.io/altimate-code/configure/providers/): 17 LLM provider configurations with JSON examples: Anthropic, OpenAI, Google Gemini, Vertex AI, Amazon Bedrock, Azure OpenAI, Mistral, Groq, Ollama, and more. +- [Agent Skills](https://altimateai.github.io/altimate-code/configure/skills/): How to configure, discover, and add custom skills. +- [Permissions](https://altimateai.github.io/altimate-code/configure/permissions/): Permission levels, pattern matching, per-agent restrictions, deny rules for destructive SQL. +- [Tracing](https://altimateai.github.io/altimate-code/configure/tracing/): Local-first observability — trace schema, span types, live viewing, remote OTLP exporters, crash recovery. +- [Telemetry](https://altimateai.github.io/altimate-code/configure/telemetry/): 25 anonymized event types, privacy guarantees, opt-out instructions. + +## Reference + +- [Security FAQ](https://altimateai.github.io/altimate-code/security-faq/): 12 Q&A pairs on data handling, credentials, permissions, network endpoints, and team hardening. +- [Troubleshooting](https://altimateai.github.io/altimate-code/troubleshooting/): 6 common issues with step-by-step fixes, including Python bridge failures and warehouse connection errors. diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md new file mode 100644 index 0000000000..a0292ffaad --- /dev/null +++ b/docs/docs/quickstart.md @@ -0,0 +1,102 @@ +--- +description: "Install altimate-code and run your first SQL analysis. 
The open-source data engineering harness — 99+ tools for building, validating, optimizing, and shipping data products." +--- + +# Quickstart + +> **You need:** npm 8+ or Homebrew. An API key for any supported LLM provider — or use Codex (built-in, no key required). + +--- + +## Step 1 — Install + +```bash +# npm (recommended) +npm install -g @altimateai/altimate-code + +# Homebrew +brew install AltimateAI/tap/altimate-code +``` + +> **Zero Python setup required.** On first run, the CLI automatically downloads `uv`, creates an isolated Python environment, and installs the data engine. No `pip install`, no virtualenv management. + +--- + +## Step 2 — Configure Your LLM + +```bash +altimate # Launch the TUI +/connect # Choose your provider and enter your API key +``` + +Or set an environment variable: + +```bash +export ANTHROPIC_API_KEY=your-key-here # Anthropic Claude (recommended) +export OPENAI_API_KEY=your-key-here # OpenAI +``` + +Minimal config file option (`altimate-code.json` in your project root): + +```json +{ + "providers": { + "anthropic": { + "apiKey": "your-key-here" + } + } +} +``` + +> **No API key?** Select **Codex** in the `/connect` menu — it's a built-in provider with no setup required. + +--- + +## Step 3 — Connect Your Warehouse _(Optional)_ + +> Skip this step if you want to work locally or don't need warehouse/orchestration connections. You can always run `/discover` later. + +```bash +altimate /discover +``` + +`/discover` scans for dbt projects, warehouse credentials (from `~/.dbt/profiles.yml`, environment variables, and Docker), and installed tools. It **reads but never writes** — safe to run against production. 
+
+**No cloud warehouse?** Use DuckDB with a local file:
+
+```json
+{
+  "connections": {
+    "local": {
+      "type": "duckdb",
+      "database": "~/.altimate/local.duckdb"
+    }
+  }
+}
+```
+
+---
+
+## Step 4 — Build Your First Artifact
+
+In the TUI, try these prompts or describe your own use case:
+
+```
+Look at my Snowflake account and do a comprehensive analysis of our Snowflake credit consumption over the last 30 days. After that, generate a dashboard of my consumption.
+```
+
+```
+Build me a real-time, interactive dashboard for my MacBook system metrics and health. Use Python, Iceberg, and dbt for various time slices.
+```
+
+---
+
+## What's Next
+
+- [Full Setup](getting-started.md) — All warehouse configs, LLM providers, advanced setup
+- [Agent Modes](data-engineering/agent-modes.md) — Choose the right agent for your task
+- [CI & Automation](data-engineering/guides/ci-headless.md) — Run altimate in automated pipelines
diff --git a/docs/docs/windows-wsl.md b/docs/docs/windows-wsl.md
index 68f6f00d81..0367a64436 100644
--- a/docs/docs/windows-wsl.md
+++ b/docs/docs/windows-wsl.md
@@ -1,8 +1,24 @@
 # Windows / WSL
 
-altimate is supported on Windows through WSL (Windows Subsystem for Linux).
+altimate runs on Windows both natively (via Node.js on Windows) and through WSL (Windows Subsystem for Linux). WSL 2 is recommended for the best experience, but it is not required.
 
-## WSL Setup
+## Windows Native Install
+
+You can install and run altimate directly in PowerShell or Command Prompt without WSL:
+
+```powershell
+# PowerShell or CMD — install globally
+npm install -g @altimateai/altimate-code
+
+# Launch
+altimate
+```
+
+This works with Node.js 18+ installed natively on Windows. All core features work in native mode, including warehouse connections, agent modes, and the TUI.
+
+## WSL Setup (Recommended)
+
+For the best experience — especially with file watching, shell tools, and dbt — we recommend WSL 2:

 1. 
Install WSL: ```powershell @@ -25,6 +41,18 @@ altimate is supported on Windows through WSL (Windows Subsystem for Linux). altimate ``` +## Windows Terminal + +For the best TUI experience on Windows, use [Windows Terminal](https://aka.ms/terminal) with a Nerd Font installed. Windows Terminal supports true color, Unicode, and the full range of TUI features that altimate uses. + +To install a Nerd Font: + +1. Download a Nerd Font from [nerdfonts.com](https://www.nerdfonts.com/font-downloads) (e.g., "FiraCode Nerd Font") +2. Install the font on your system +3. In Windows Terminal, go to **Settings > Profiles > Defaults > Appearance** and set the font face to the installed Nerd Font + +> **Note:** The default `cmd.exe` and older PowerShell windows have limited Unicode support, which may cause rendering issues with altimate's TUI elements. + ## Git Bash Path If you need to use Git Bash instead of WSL: @@ -39,8 +67,52 @@ export ALTIMATE_CLI_GIT_BASH_PATH="C:\\Program Files\\Git\\bin\\bash.exe" - Some terminal features may not work in older cmd.exe or PowerShell windows - File watching may have delays due to WSL filesystem bridging +## Troubleshooting + +### Path separator issues + +Windows uses backslashes (`\`) in file paths, but altimate config files should always use **forward slashes** (`/`), even on Windows. This applies to all paths in `altimate-code.json`: + +```json +{ + "warehouses": { + "local-duckdb": { + "type": "duckdb", + "database": "C:/Users/analyst/projects/dev.duckdb" + } + } +} +``` + +**Wrong** (will cause errors): + +```json +{ + "database": "C:\\Users\\analyst\\projects\\dev.duckdb" +} +``` + +**Right:** + +```json +{ + "database": "C:/Users/analyst/projects/dev.duckdb" +} +``` + +This also applies to paths like `private_key_path`, `service_account`, and any plugin paths specified in the config. 
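If you generate config entries from a script, a one-line helper keeps paths in the expected forward-slash form. `toConfigPath` below is a hypothetical name written for illustration, not an altimate API:

```typescript
// Hypothetical helper: rewrite Windows backslash paths into the
// forward-slash form that altimate-code.json expects.
function toConfigPath(p: string): string {
  return p.replace(/\\/g, "/")
}

// "C:\Users\analyst\projects\dev.duckdb" becomes "C:/Users/analyst/projects/dev.duckdb"
console.log(toConfigPath("C:\\Users\\analyst\\projects\\dev.duckdb"))
```

Paths that already use forward slashes pass through unchanged, so the helper is safe to apply to every path value before writing the config.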
+ +### Node.js not found after install + +If you installed Node.js but `npm` or `node` is not recognized: + +- Restart your terminal after installing Node.js +- Ensure the Node.js installation directory is in your system `PATH` +- In WSL, make sure you installed Node.js inside WSL, not on the Windows side + ## Tips - Use WSL 2 for better performance - Store your projects in the WSL filesystem (`~/projects/`) rather than `/mnt/c/` for faster file operations - Set up your warehouse connections in the WSL environment +- If using both WSL and native Windows, keep separate config files — the WSL and Windows file systems have different path conventions diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 984b040204..dfb4fa1177 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -1,5 +1,5 @@ site_name: altimate-code -site_description: The data engineering agent for dbt, SQL, and cloud warehouses +site_description: The open-source data engineering harness. 99+ tools for building, validating, optimizing, and shipping data products. site_url: https://altimateai.github.io/altimate-code repo_url: https://github.com/AltimateAI/altimate-code repo_name: AltimateAI/altimate-code @@ -31,13 +31,33 @@ theme: features: - navigation.sections - navigation.top + - navigation.instant + - navigation.tracking - search.suggest - search.highlight - content.code.copy + - toc.follow + - content.tooltips extra_css: - assets/css/extra.css +extra: + social: + - icon: fontawesome/brands/github + link: https://github.com/AltimateAI/altimate-code + - icon: fontawesome/brands/discord + link: https://altimate.ai/discord + - icon: fontawesome/brands/python + link: https://pypi.org/project/altimate-engine/ + analytics: + provider: google + property: G-XXXXXXXXXX # TODO: Replace with actual GA4 property ID + consent: + title: Cookie consent + description: >- + We use cookies to measure usage and improve the documentation. 
+ markdown_extensions: - admonition - pymdownx.details @@ -53,64 +73,76 @@ markdown_extensions: nav: - Home: index.md - - Getting Started: getting-started.md - - Data Engineering: + + - Get Started: + - Quickstart: quickstart.md + - Full Setup: getting-started.md - Agent Modes: data-engineering/agent-modes.md - - Training: - - Overview: data-engineering/training/index.md - - Tools: - - Overview: data-engineering/tools/index.md - - SQL Tools: data-engineering/tools/sql-tools.md - - Schema Tools: data-engineering/tools/schema-tools.md - - FinOps Tools: data-engineering/tools/finops-tools.md - - Lineage Tools: data-engineering/tools/lineage-tools.md - - dbt Tools: data-engineering/tools/dbt-tools.md - - Warehouse Tools: data-engineering/tools/warehouse-tools.md - - Guides: - - Overview: data-engineering/guides/index.md - - Cost Optimization: data-engineering/guides/cost-optimization.md - - Migration: data-engineering/guides/migration.md - - Using with Claude Code: data-engineering/guides/using-with-claude-code.md - - Using with Codex: data-engineering/guides/using-with-codex.md - - Usage: - - TUI: usage/tui.md - - CLI: usage/cli.md - - Web: usage/web.md - - IDE: usage/ide.md - - GitHub: usage/github.md - - GitLab: usage/gitlab.md + - Interfaces: + - Terminal UI: usage/tui.md + - CLI: usage/cli.md + - IDE / VS Code: usage/ide.md + - Web UI: usage/web.md + + - Guides: + - Cost Optimization: data-engineering/guides/cost-optimization.md + - SQL Migration: data-engineering/guides/migration.md + - CI & Automation: data-engineering/guides/ci-headless.md + + - Tools: + - SQL Analysis: data-engineering/tools/sql-tools.md + - Schema & Metadata: data-engineering/tools/schema-tools.md + - Column-Level Lineage: data-engineering/tools/lineage-tools.md + - dbt Integration: data-engineering/tools/dbt-tools.md + - Cost & FinOps: data-engineering/tools/finops-tools.md + - Warehouse Tools: data-engineering/tools/warehouse-tools.md + + - Integrations: + - GitHub Actions: usage/github.md 
+ - GitLab CI: usage/gitlab.md + - Claude Code: data-engineering/guides/using-with-claude-code.md + - Codex: data-engineering/guides/using-with-codex.md + - MCP Servers: configure/mcp-servers.md + - LSP: configure/lsp.md + - ACP: configure/acp.md + - Configure: - - Overview: configure/config.md - - Providers & Models: + - Config Files: configure/config.md + - AI Providers & Models: - Providers: configure/providers.md - Models: configure/models.md - - Agents & Tools: + - Agents & Skills: - Agents: configure/agents.md - - Tools: configure/tools.md - - Agent Skills: configure/skills.md + - Skills: configure/skills.md + - Tools & Access: + - Allowed Tools: configure/tools.md - Custom Tools: configure/custom-tools.md - - Commands: configure/commands.md + - Access Control: configure/permissions.md - Behavior: - Rules: configure/rules.md - - Permissions: configure/permissions.md + - Commands: configure/commands.md - Context Management: configure/context-management.md - - Formatters: configure/formatters.md + - Memory: data-engineering/tools/memory-tools.md + - Training: + - Overview: data-engineering/training/index.md + - Team Deployment: data-engineering/training/team-deployment.md - Appearance: - Themes: configure/themes.md - Keybinds: configure/keybinds.md - - Tracing: configure/tracing.md - - Telemetry: configure/telemetry.md - - Integrations: - - LSP Servers: configure/lsp.md - - MCP Servers: configure/mcp-servers.md - - ACP Support: configure/acp.md - - Develop: + - Formatters: configure/formatters.md + - Observability: + - Tracing: configure/tracing.md + - Telemetry: configure/telemetry.md + - Network & Proxy: network.md + - Windows / WSL: windows-wsl.md + + - Extend: - SDK: develop/sdk.md - - Server: develop/server.md + - Server API: develop/server.md - Plugins: develop/plugins.md - Ecosystem: develop/ecosystem.md + - Reference: - Security FAQ: security-faq.md - - Network: network.md - Troubleshooting: troubleshooting.md - - Windows / WSL: windows-wsl.md + - 
Changelog: https://github.com/AltimateAI/altimate-code/blob/main/CHANGELOG.md