2 changes: 2 additions & 0 deletions README.md
@@ -128,6 +128,8 @@ The image is published to `ghcr.io/trusera/ai-bom` on every tagged release.

**25+ AI SDKs detected** across Python, JavaScript, TypeScript, Java, Go, Rust, and Ruby.

**Optional LLM enrichment** — use `--llm-enrich` to extract specific model names (e.g., gpt-4o, claude-3-opus) from code via OpenAI, Anthropic, or local Ollama models. See [docs/enrichment.md](docs/enrichment.md).

---

## Agent SDKs
5 changes: 3 additions & 2 deletions docs/comparison.md
@@ -15,13 +15,13 @@ The goal is to help users understand feature differences and choose the right tool.
| Scanners | 13+ (code, cloud, Docker, GitHub Actions, Jupyter, MCP, n8n, etc.) | 1 (Python-focused) | Unknown |
| Output Formats | 9 (Table, JSON, SARIF, SPDX, CycloneDX, CSV, HTML, Markdown, JUnit) | JSON, CSV | Unknown |
| CI/CD Integration | GitHub Action, GitLab CI | No | Yes |
| LLM Enrichment | No | Yes | Early access / limited preview |
| LLM Enrichment | Yes | Yes | Early access / limited preview |
| n8n Scanning | Yes | No | No |
| MCP / A2A Detection | Yes | No | No |
| Agent Framework Detection | LangChain, CrewAI, AutoGen, LlamaIndex, Semantic Kernel | Limited | Unknown |
| Binary Model Detection | Yes (.onnx, .pt, .safetensors, etc.) | No | Unknown |
| Policy Enforcement | Cedar policy gate | No | Yes |
| Best For | Multi-framework projects needing multiple formats | Python projects needing LLM enrichment | Existing Snyk customers |
| Best For | Multi-framework projects needing multiple formats and optional LLM enrichment | Python projects needing LLM enrichment | Existing Snyk customers |

---

@@ -31,6 +31,7 @@ The goal is to help users understand feature differences and choose the right tool.

- Open-source AI Bill of Materials scanner focused on discovering AI/LLM usage across codebases and infrastructure.
- Supports multiple scanners, formats, and compliance mappings (OWASP Agentic Top 10, EU AI Act).
- LLM enrichment (`--llm-enrich`) uses litellm to extract specific model names from code, supporting OpenAI, Anthropic, Ollama, and 100+ providers.
- Designed for developer workflows with CLI, CI/CD, and dashboard support.

### Cisco AIBOM
115 changes: 115 additions & 0 deletions docs/enrichment.md
@@ -0,0 +1,115 @@
# LLM Enrichment

AI-BOM can optionally use an LLM to analyze code snippets around detected AI components and extract the specific model names being used (e.g., `gpt-4o`, `claude-3-opus-20240229`, `llama3`).

This fills the `model_name` field that static pattern matching may leave empty, particularly when model names are passed as variables or constructed dynamically.
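For illustration, here is a hypothetical snippet where static pattern matching cannot resolve the model name — the environment variable and helper function are invented for this example:

```python
import os

# A static scanner sees only `model=MODEL` at the call site; the actual
# identifier is resolved at runtime, so `model_name` stays empty
# without enrichment.
MODEL = os.environ.get("CHAT_MODEL", "gpt-4o")

def ask(client, prompt: str):
    """Send a chat completion using the runtime-configured model."""
    return client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
```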

---

## Installation

LLM enrichment requires the `litellm` package:

```bash
pip install ai-bom[enrich]
```

---

## Usage

### Basic

```bash
ai-bom scan . --llm-enrich
```

This uses `gpt-4o-mini` by default (requires `OPENAI_API_KEY` environment variable).

### With a specific model

```bash
# OpenAI
ai-bom scan . --llm-enrich --llm-model gpt-4o

# Anthropic
ai-bom scan . --llm-enrich --llm-model anthropic/claude-3-haiku-20240307

# Local Ollama (no API key needed)
ai-bom scan . --llm-enrich --llm-model ollama/llama3 --llm-base-url http://localhost:11434
```

### With an explicit API key

```bash
ai-bom scan . --llm-enrich --llm-api-key sk-your-key-here
```

If `--llm-api-key` is not provided, litellm falls back to standard environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.).

---

## CLI Options

| Flag | Default | Description |
|------|---------|-------------|
| `--llm-enrich` | `False` | Enable LLM enrichment |
| `--llm-model` | `gpt-4o-mini` | litellm model identifier |
| `--llm-api-key` | None | API key (falls back to env vars) |
| `--llm-base-url` | None | Custom API base URL (e.g., Ollama) |

---

## How It Works

1. After all scanners run, components with type `llm_provider` or `model` that have an empty `model_name` are selected for enrichment.
2. For each eligible component, ~20 lines of code around the detection site are read from the source file.
3. The code snippet is sent to the configured LLM with a prompt asking it to extract the model identifier.
4. The response is parsed and cross-referenced with AI-BOM's built-in model registry to validate the name and fill in provider/deprecation metadata.
5. If the LLM call fails or returns no model name, the component is left unchanged.

Components that already have a `model_name` (from static detection) are skipped. Non-model component types (containers, tools, MCP servers, workflows) are never sent to the LLM.

---

## Privacy and Security

**Code snippets are sent to the LLM provider.** When using cloud-hosted models (OpenAI, Anthropic, etc.), approximately 20 lines of source code around each detected AI import or usage site are transmitted to the provider's API.

Recommendations:

- **For sensitive or proprietary codebases**, use a local model via Ollama (`--llm-model ollama/llama3`). No data leaves your machine.
- **Before using cloud APIs**, ensure you have organizational approval to send source code excerpts to the provider.
- **Only code around detected AI components** is sent — not entire files, not the full repository.
- AI-BOM does not intentionally include secrets in snippets, but if API keys are hard-coded near import statements, they may be included in the context window. Use `--deep` scanning to detect and remediate hard-coded keys separately.

A warning is printed when using non-local models:

```
Warning: LLM enrichment sends code snippets to an external API.
Use ollama/* models for local-only processing.
```

---

## Cost

Each eligible component triggers one or more LLM API calls. For projects with many detected AI components, this can result in non-trivial API costs when using paid providers.

- Components are batched (default: 5 per call) to reduce the number of API requests.
- Use a low-cost model like `gpt-4o-mini` for bulk enrichment.
- **Ollama is free** — run models locally with zero API cost.
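The batching behavior can be sketched as below; the function name is illustrative, and only the default batch size of 5 comes from the stated default:

```python
def batch(components: list, size: int = 5):
    """Yield groups of up to `size` components, one group per LLM call."""
    for i in range(0, len(components), size):
        yield components[i:i + size]
```

With 12 eligible components and the default size, this produces three API calls instead of twelve.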

---

## Supported Providers

LLM enrichment uses [litellm](https://docs.litellm.ai/) as its backend, which supports 100+ LLM providers including:

- OpenAI (`gpt-4o`, `gpt-4o-mini`, etc.)
- Anthropic (`anthropic/claude-3-haiku-20240307`, etc.)
- Ollama (`ollama/llama3`, `ollama/mistral`, etc.)
- Azure OpenAI, AWS Bedrock, Google Vertex AI
- Mistral, Cohere, and many more

See the [litellm providers documentation](https://docs.litellm.ai/docs/providers) for the full list.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -67,6 +67,7 @@ nav:
- CSV: outputs/csv.md
- JUnit: outputs/junit.md
- Markdown: outputs/markdown.md
- LLM Enrichment: enrichment.md
- CI/CD Integration: ci-integration.md
- Policy Enforcement: policy.md
- Compliance:
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -77,6 +77,7 @@ aws = ["boto3>=1.26.0,<2.0"]
gcp = ["google-cloud-aiplatform>=1.38.0,<2.0"]
azure = ["azure-ai-ml>=1.11.0,<2.0", "azure-identity>=1.12.0,<2.0"]
cloud-live = ["ai-bom[aws,gcp,azure]"]
enrich = ["litellm>=1.40.0,<3.0"]
callable = [] # base callable module — no SDKs required
callable-openai = ["openai>=1.0.0,<3.0"]
callable-anthropic = ["anthropic>=0.30.0,<2.0"]
@@ -88,7 +89,7 @@ callable-cohere = ["cohere>=5.0.0,<7.0"]
callable-all = [
"ai-bom[callable-openai,callable-anthropic,callable-google,callable-bedrock,callable-ollama,callable-mistral,callable-cohere]",
]
all = ["ai-bom[dashboard,docs,server,watch,cloud-live,callable-all]"]
all = ["ai-bom[dashboard,docs,server,watch,cloud-live,callable-all,enrich]"]

[project.scripts]
ai-bom = "ai_bom.cli:app"
57 changes: 57 additions & 0 deletions src/ai_bom/cli.py
@@ -403,12 +403,48 @@ def scan(
        "--telemetry/--no-telemetry",
        help="Enable/disable anonymous telemetry (overrides AI_BOM_TELEMETRY env var)",
    ),
    llm_enrich: bool = typer.Option(
        False,
        "--llm-enrich",
        help="Use an LLM to extract model names from code snippets (requires ai-bom[enrich])",
    ),
    llm_model: str = typer.Option(
        "gpt-4o-mini",
        "--llm-model",
        help="LLM model for enrichment (e.g. gpt-4o-mini, anthropic/claude-3-haiku, ollama/llama3)",
    ),
    llm_api_key: Optional[str] = typer.Option(
        None,
        "--llm-api-key",
        help="API key for the LLM provider (falls back to provider env vars like OPENAI_API_KEY)",
    ),
    llm_base_url: Optional[str] = typer.Option(
        None,
        "--llm-base-url",
        help="Custom base URL for LLM API (e.g. http://localhost:11434 for Ollama)",
    ),
) -> None:
    """Scan a directory or repository for AI/LLM components."""
    # --json / -j overrides --format
    if json_output:
        format = "json"

    # Validate --llm-enrich dependency early
    if llm_enrich:
        try:
            import litellm  # noqa: F401
        except ImportError:
            console.print(
                "[red]LLM enrichment requires litellm. "
                "Install with: pip install 'ai-bom\\[enrich]'[/red]"
            )
            raise typer.Exit(EXIT_ERROR) from None
        if not quiet and not llm_model.startswith("ollama/"):
            console.print(
                "[yellow]Warning: LLM enrichment sends code snippets to an external API. "
                "Use ollama/* models for local-only processing.[/yellow]"
            )

    # Setup logging
    _setup_logging(verbose=verbose, debug=debug)

@@ -582,6 +618,23 @@ def scan(
    end_time = time.time()
    result.summary.scan_duration_seconds = end_time - start_time

    # LLM enrichment (optional post-processing)
    if llm_enrich and result.components:
        from ai_bom.enrichment import enrich_components

        if format == "table" and not quiet:
            console.print("[cyan]Running LLM enrichment...[/cyan]")
        enriched = enrich_components(
            result.components,
            scan_path=scan_path,
            model=llm_model,
            api_key=llm_api_key,
            base_url=llm_base_url,
            quiet=quiet,
        )
        if format == "table" and not quiet:
            console.print(f"[green]Enriched {enriched} component(s) with model names[/green]")

    # Build summary
    result.build_summary()

@@ -901,6 +954,10 @@ def demo() -> None:
        validate_schema=False,
        json_output=False,
        telemetry=None,
        llm_enrich=False,
        llm_model="gpt-4o-mini",
        llm_api_key=None,
        llm_base_url=None,
    )


5 changes: 5 additions & 0 deletions src/ai_bom/enrichment/__init__.py
@@ -0,0 +1,5 @@
"""LLM-based enrichment for AI-BOM components."""

from ai_bom.enrichment.llm_enricher import enrich_components

__all__ = ["enrich_components"]