2 changes: 2 additions & 0 deletions README.md
@@ -128,6 +128,8 @@ The image is published to `ghcr.io/trusera/ai-bom` on every tagged release.

**25+ AI SDKs detected** across Python, JavaScript, TypeScript, Java, Go, Rust, and Ruby.

**Optional LLM enrichment** — use `--llm-enrich` to extract specific model names (e.g., gpt-4o, claude-3-opus) from code via OpenAI, Anthropic, or local Ollama models. See [docs/enrichment.md](docs/enrichment.md).

---

## Agent SDKs
5 changes: 3 additions & 2 deletions docs/comparison.md
@@ -15,13 +15,13 @@ The goal is to help users understand feature differences and choose the right tool.
| Scanners | 13+ (code, cloud, Docker, GitHub Actions, Jupyter, MCP, n8n, etc.) | 1 (Python-focused) | Unknown |
| Output Formats | 9 (Table, JSON, SARIF, SPDX, CycloneDX, CSV, HTML, Markdown, JUnit) | JSON, CSV | Unknown |
| CI/CD Integration | GitHub Action, GitLab CI | No | Yes |
| LLM Enrichment | No | Yes | Early access / limited preview |
| LLM Enrichment | Yes | Yes | Early access / limited preview |
| n8n Scanning | Yes | No | No |
| MCP / A2A Detection | Yes | No | No |
| Agent Framework Detection | LangChain, CrewAI, AutoGen, LlamaIndex, Semantic Kernel | Limited | Unknown |
| Binary Model Detection | Yes (.onnx, .pt, .safetensors, etc.) | No | Unknown |
| Policy Enforcement | Cedar policy gate | No | Yes |
| Best For | Multi-framework projects needing multiple formats | Python projects needing LLM enrichment | Existing Snyk customers |
| Best For | Multi-framework projects needing multiple formats and optional LLM enrichment | Python projects needing LLM enrichment | Existing Snyk customers |

---

@@ -31,6 +31,7 @@ The goal is to help users understand feature differences and choose the right tool.

- Open-source AI Bill of Materials scanner focused on discovering AI/LLM usage across codebases and infrastructure.
- Supports multiple scanners, formats, and compliance mappings (OWASP Agentic Top 10, EU AI Act).
- LLM enrichment (`--llm-enrich`) uses litellm to extract specific model names from code, supporting OpenAI, Anthropic, Ollama, and 100+ providers.
- Designed for developer workflows with CLI, CI/CD, and dashboard support.

### Cisco AIBOM
115 changes: 115 additions & 0 deletions docs/enrichment.md
@@ -0,0 +1,115 @@
# LLM Enrichment

AI-BOM can optionally use an LLM to analyze code snippets around detected AI components and extract the specific model names being used (e.g., `gpt-4o`, `claude-3-opus-20240229`, `llama3`).

This fills the `model_name` field that static pattern matching may leave empty, particularly when model names are passed as variables or constructed dynamically.
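For illustration, here is a hypothetical snippet where static pattern matching cannot resolve the model name — the environment variable and helper function are invented for this example:

```python
import os

# A static scanner sees only `model=MODEL` at the call site; the actual
# identifier is resolved at runtime, so `model_name` stays empty
# without enrichment.
MODEL = os.environ.get("CHAT_MODEL", "gpt-4o")

def ask(client, prompt: str):
    """Send a chat completion using the runtime-configured model."""
    return client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
```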

---

## Installation

LLM enrichment requires the `litellm` package:

```bash
pip install ai-bom[enrich]
```

---

## Usage

### Basic

```bash
ai-bom scan . --llm-enrich
```

This uses `gpt-4o-mini` by default (requires `OPENAI_API_KEY` environment variable).

### With a specific model

```bash
# OpenAI
ai-bom scan . --llm-enrich --llm-model gpt-4o

# Anthropic
ai-bom scan . --llm-enrich --llm-model anthropic/claude-3-haiku-20240307

# Local Ollama (no API key needed)
ai-bom scan . --llm-enrich --llm-model ollama/llama3 --llm-base-url http://localhost:11434
```

### With an explicit API key

```bash
ai-bom scan . --llm-enrich --llm-api-key sk-your-key-here
```

If `--llm-api-key` is not provided, litellm falls back to standard environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.).

---

## CLI Options

| Flag | Default | Description |
|------|---------|-------------|
| `--llm-enrich` | `False` | Enable LLM enrichment |
| `--llm-model` | `gpt-4o-mini` | litellm model identifier |
| `--llm-api-key` | None | API key (falls back to env vars) |
| `--llm-base-url` | None | Custom API base URL (e.g., Ollama) |

---

## How It Works

1. After all scanners run, components with type `llm_provider` or `model` that have an empty `model_name` are selected for enrichment.
2. For each eligible component, ~20 lines of code around the detection site are read from the source file.
3. The code snippet is sent to the configured LLM with a prompt asking it to extract the model identifier.
4. The response is parsed and cross-referenced with AI-BOM's built-in model registry to validate the name and fill in provider/deprecation metadata.
5. If the LLM call fails or returns no model name, the component is left unchanged.

Components that already have a `model_name` (from static detection) are skipped. Non-model component types (containers, tools, MCP servers, workflows) are never sent to the LLM.

---

## Privacy and Security

**Code snippets are sent to the LLM provider.** When using cloud-hosted models (OpenAI, Anthropic, etc.), approximately 20 lines of source code around each detected AI import or usage site are transmitted to the provider's API.

Recommendations:

- **For sensitive or proprietary codebases**, use a local model via Ollama (`--llm-model ollama/llama3`). No data leaves your machine.
- **Before using cloud APIs**, ensure you have organizational approval to send source code excerpts to the provider.
- **Only code around detected AI components** is sent — not entire files, not the full repository.
- AI-BOM does not intentionally include secrets in snippets, but if API keys are hard-coded near import statements, they may be included in the context window. Use `--deep` scanning to detect and remediate hard-coded keys separately.

A warning is printed when using non-local models:

```
Warning: LLM enrichment sends code snippets to an external API.
Use ollama/* models for local-only processing.
```

---

## Cost

Each eligible component triggers one or more LLM API calls. For projects with many detected AI components, this can result in non-trivial API costs when using paid providers.

- Components are batched (default: 5 per call) to reduce the number of API requests.
- Use a low-cost model like `gpt-4o-mini` for bulk enrichment.
- **Ollama is free** — run models locally with zero API cost.
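The batching behavior can be sketched as below; the function name is illustrative, and only the default batch size of 5 comes from the stated default:

```python
def batch(components: list, size: int = 5):
    """Yield groups of up to `size` components, one group per LLM call."""
    for i in range(0, len(components), size):
        yield components[i:i + size]
```

With 12 eligible components and the default size, this produces three API calls instead of twelve.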

---

## Supported Providers

LLM enrichment uses [litellm](https://docs.litellm.ai/) as its backend, which supports 100+ LLM providers including:

- OpenAI (`gpt-4o`, `gpt-4o-mini`, etc.)
- Anthropic (`anthropic/claude-3-haiku-20240307`, etc.)
- Ollama (`ollama/llama3`, `ollama/mistral`, etc.)
- Azure OpenAI, AWS Bedrock, Google Vertex AI
- Mistral, Cohere, and many more

See the [litellm providers documentation](https://docs.litellm.ai/docs/providers) for the full list.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -67,6 +67,7 @@ nav:
- CSV: outputs/csv.md
- JUnit: outputs/junit.md
- Markdown: outputs/markdown.md
- LLM Enrichment: enrichment.md
- CI/CD Integration: ci-integration.md
- Policy Enforcement: policy.md
- Compliance:
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -77,6 +77,7 @@ aws = ["boto3>=1.26.0,<2.0"]
gcp = ["google-cloud-aiplatform>=1.38.0,<2.0"]
azure = ["azure-ai-ml>=1.11.0,<2.0", "azure-identity>=1.12.0,<2.0"]
cloud-live = ["ai-bom[aws,gcp,azure]"]
enrich = ["litellm>=1.40.0,<3.0"]
callable = [] # base callable module — no SDKs required
callable-openai = ["openai>=1.0.0,<3.0"]
callable-anthropic = ["anthropic>=0.30.0,<2.0"]
@@ -88,7 +89,7 @@ callable-cohere = ["cohere>=5.0.0,<7.0"]
callable-all = [
"ai-bom[callable-openai,callable-anthropic,callable-google,callable-bedrock,callable-ollama,callable-mistral,callable-cohere]",
]
all = ["ai-bom[dashboard,docs,server,watch,cloud-live,callable-all]"]
all = ["ai-bom[dashboard,docs,server,watch,cloud-live,callable-all,enrich]"]

[project.scripts]
ai-bom = "ai_bom.cli:app"
57 changes: 57 additions & 0 deletions src/ai_bom/cli.py
@@ -403,12 +403,48 @@ def scan(
        "--telemetry/--no-telemetry",
        help="Enable/disable anonymous telemetry (overrides AI_BOM_TELEMETRY env var)",
    ),
    llm_enrich: bool = typer.Option(
        False,
        "--llm-enrich",
        help="Use an LLM to extract model names from code snippets (requires ai-bom[enrich])",
    ),
    llm_model: str = typer.Option(
        "gpt-4o-mini",
        "--llm-model",
        help="LLM model for enrichment (e.g. gpt-4o-mini, anthropic/claude-3-haiku, ollama/llama3)",
    ),
    llm_api_key: Optional[str] = typer.Option(
        None,
        "--llm-api-key",
        help="API key for the LLM provider (falls back to provider env vars like OPENAI_API_KEY)",
    ),
    llm_base_url: Optional[str] = typer.Option(
        None,
        "--llm-base-url",
        help="Custom base URL for LLM API (e.g. http://localhost:11434 for Ollama)",
    ),
) -> None:
    """Scan a directory or repository for AI/LLM components."""
    # --json / -j overrides --format
    if json_output:
        format = "json"

    # Validate --llm-enrich dependency early
    if llm_enrich:
        try:
            import litellm  # noqa: F401
        except ImportError:
            console.print(
                "[red]LLM enrichment requires litellm. "
                "Install with: pip install 'ai-bom\\[enrich]'[/red]"
            )
            raise typer.Exit(EXIT_ERROR) from None
        if not quiet and not llm_model.startswith("ollama/"):
            console.print(
                "[yellow]Warning: LLM enrichment sends code snippets to an external API. "
                "Use ollama/* models for local-only processing.[/yellow]"
            )

    # Setup logging
    _setup_logging(verbose=verbose, debug=debug)

@@ -582,6 +618,23 @@ def scan(
    end_time = time.time()
    result.summary.scan_duration_seconds = end_time - start_time

    # LLM enrichment (optional post-processing)
    if llm_enrich and result.components:
        from ai_bom.enrichment import enrich_components

        if format == "table" and not quiet:
            console.print("[cyan]Running LLM enrichment...[/cyan]")
        enriched = enrich_components(
            result.components,
            scan_path=scan_path,
            model=llm_model,
            api_key=llm_api_key,
            base_url=llm_base_url,
            quiet=quiet,
        )
        if format == "table" and not quiet:
            console.print(f"[green]Enriched {enriched} component(s) with model names[/green]")

    # Build summary
    result.build_summary()

@@ -901,6 +954,10 @@ def demo() -> None:
        validate_schema=False,
        json_output=False,
        telemetry=None,
        llm_enrich=False,
        llm_model="gpt-4o-mini",
        llm_api_key=None,
        llm_base_url=None,
    )


5 changes: 5 additions & 0 deletions src/ai_bom/enrichment/__init__.py
@@ -0,0 +1,5 @@
"""LLM-based enrichment for AI-BOM components."""

from ai_bom.enrichment.llm_enricher import enrich_components

__all__ = ["enrich_components"]