v2.1.0 Latest

cdgamarose-nv released this 19 May 21:35

v2.1.0

917271c

What's Changed

AI-Q REST API with pluggable auth middleware, entry-point-registered token validators, and async job ownership enforcement
Auth extensibility hooks (register_token_fetcher, provider lifecycle) and auth refactor eliminating the refresh race
Data source registry driving UI toggles, per-message filtering, and agent tool inheritance
New exa_web_search data source with full_text and highlights controls
Deep researcher consumes DeepAgents skills with a job-scoped Modal sandbox; built-in data-table-analysis skill and configs/config_skills.yml example
AI-Q is consumable as a portable Agent Skill (.agents/skills/aiq-research/), with .claude/skills/aiq-research/ retained as a Claude Code compatibility symlink for routed /chat and async job lifecycle against a local AI-Q server
Cost analysis tool with pricing configs and profiling example
Documented MCP client patterns scoped for 2.1: mcp_client, mcp_service_account, and user-identity tools
Prompt restructure across all agents for KV cache prefix reuse
Operability: idempotent DB init, tuned Dask/Postgres defaults, request tracing into NAT spans, UI stream-failure hardening
New authentication and MCP tools guides; new skills-and-sandbox example
Pinned to NeMo Agent Toolkit (NAT) v1.6.0; CVE bumps for Pillow, cryptography, pygments, authlib, pyopenssl, and pytest
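To make "async job ownership enforcement" concrete, here is a minimal sketch of the general idea: a job records who submitted it, and lookups by anyone else are rejected. The `JobStore`, `Job`, and `JobNotFound` names are hypothetical and are not taken from the AI-Q codebase; the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Dict


class JobNotFound(Exception):
    """Raised for missing jobs and for jobs owned by another user."""


@dataclass
class Job:
    job_id: str
    owner: str  # identity taken from the validated auth token (assumed)
    status: str = "submitted"


class JobStore:
    """Hypothetical in-memory store that scopes every read to the job owner."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self, job_id: str, owner: str) -> Job:
        job = Job(job_id=job_id, owner=owner)
        self._jobs[job_id] = job
        return job

    def get(self, job_id: str, requester: str) -> Job:
        job = self._jobs.get(job_id)
        # Treat "not yours" the same as "does not exist" so that job IDs
        # cannot be probed by other users.
        if job is None or job.owner != requester:
            raise JobNotFound(job_id)
        return job
```

Raising the same error for unknown and foreign jobs is one possible design choice; an API could equally return a distinct "forbidden" response.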

v2.1.0-rc4 Pre-release

AjayThorve released this 13 May 21:27

v2.1.0-rc4

34c3a41

What's Changed

fix(citation): register MCP tool results as sources when no URLs present by @tanleach in #227
Upgrade Python runtime to 3.13 and distroless to v4.0.5 by @efajardo-nv in #233

Full Changelog: v2.1.0-rc3...v2.1.0-rc4

Contributors

tanleach and efajardo-nv

v2.1.0-rc3 Pre-release

AjayThorve released this 13 May 02:47

v2.1.0-rc3

62ffe40

What's Changed

aligning the SKILL version with the upcoming release by @tanleach in #223
fix helm bootstrap resources by @AjayThorve in #231
fix: add headless mode header for chat endpoint (#226) by @cdgamarose-nv in #232
fix(security): remediate 20 CVEs in aiq-agent container + UI by @efajardo-nv in #229

Full Changelog: v2.1.0-rc2...v2.1.0-rc3

Contributors

tanleach, AjayThorve, and 2 other contributors

v2.1.0-rc2 Pre-release

AjayThorve released this 07 May 03:15

v2.1.0-rc2

1fba03c

What's Changed

docs(mcp): remove untested 'Publish AIQ as MCP server' section by @AjayThorve in #220
fix(helm): repair imagePullSecrets fallback that breaks pod render by @AjayThorve in #221
chore: package AIQ research as portable agent skill by @tanleach in #219

Full Changelog: v2.1.0-rc1...v2.1.0-rc2

Contributors

tanleach and AjayThorve

v2.1.0-rc1 Pre-release

AjayThorve released this 06 May 01:21

v2.1.0-rc1

d2b85a7

What's Changed

update nat version and compatibility fixes by @cdgamarose-nv in #166
fix: idempotent DB init and SSE stream reliability with connection poolers by @AjayThorve in #161
Fix silent auth transport failures in WebSocket and SSE by @AjayThorve in #170
Add register_token_fetcher plugin hook for auth extensibility by @AjayThorve in #169
Add data source registry and update related configurations by @AjayThorve in #90
Propagate auth token to Dask workers for async jobs by @AjayThorve in #174
fix bug in uploading a file, and delete unnecessary func by @DinaLaptii in #163
feat: expose AI-Q as an API with Auth Middleware by @cdgamarose-nv in #173
fix: add LangGraph checkpoint tables to init-db.sql by @AjayThorve in #176
fix: bump cryptography, pygments, authlib, pyopenssl for CVE fixes by @AjayThorve in #175
refactor(prompts): restructure all prompts for KV cache prefix reuse by @AjayThorve in #177
fix: allow access to /docs by @cdgamarose-nv in #178
fix: set size cap on reads by @cdgamarose-nv in #180
Bump Pillow to 12.2.0 (CVE-2026-40192) by @efajardo-nv in #183
chore: update dependencies and improve linting configuration by @AjayThorve in #185
fix: reduce checkpoint pool size and raise postgres max_connections by @AjayThorve in #186
fix: inject fresh idToken on WebSocket upgrade for reliable auth by @AjayThorve in #188
fix: use workflow identifier as model field in chat response by @cdgamarose-nv in #189
fix: surface unavailable tool details in user-facing error messages by @KyleZheng1284 in #184
Revert "fix: inject fresh idToken on WebSocket upgrade for reliable auth (#188)" by @AjayThorve in #191
Bump pytest to 9.0.3 (CVE-2025-71176) by @efajardo-nv in #192
fix: broken periodic_cleanup import + add Dask memory/lifetime env vars by @AjayThorve in #197
Bump authlib to >=1.6.11 by @efajardo-nv in #198
fix auth trust boundary and enforce async job ownership by @cdgamarose-nv in #199
feat: add exa_web_search data source by @maxwbuckley in #181
Add request trace classification and pseudonymous ids by @AjayThorve in #203
feat: cost analysis tool with pricing configs, one-time report generation tools, and profiling config example by @cdgamarose-nv in #172
fix performance and tests by @DinaLaptii in #205
Propagate AIQ request tags to NAT spans by @AjayThorve in #206
fix: auth refactor — eliminate refresh race, increase buffer, add error semantics by @exactlyallan in #194
feat: provider lifecycle hooks for composable auth extensions by @exactlyallan in #195
test: close remaining auth bug fix test coverage gaps by @exactlyallan in #196
Harden AIQ UI stream failure handling by @exactlyallan in #207
fix: keep commas inside URL paths in citation source extractor by @AjayThorve in #209
chore: bump NeMo Agent Toolkit pin to 1.6.0 by @AjayThorve in #208
Fix/dask cleanup and memory controls by @AjayThorve in #200
feat: add support for skills and sandbox along with example by @cdgamarose-nv in #211
docs: add authentication guide and scope MCP guide to AIQ 2.1 by @AjayThorve in #212
deleted expired sessions by @DinaLaptii in #213
docs: add v2.1.0 changelog entry by @AjayThorve in #216

New Contributors

@DinaLaptii made their first contribution in #163
@maxwbuckley made their first contribution in #181

Full Changelog: 2.0.0...v2.1.0.RC1

Contributors

maxwbuckley, exactlyallan, and 5 other contributors

2.0.0

AjayThorve released this 18 Mar 15:16

2.0.0

62101c8

Release v2.0.0

Overview

AI-Q v2.0.0 is a ground-up rewrite of the NVIDIA AI-Q Blueprint. The v1.x line provided a single deep research agent with PDF upload and a demo web application. v2.0.0 introduces a two-tier multi-agent architecture built on the NVIDIA NeMo Agent Toolkit (NAT), a new Next.js frontend, async job infrastructure, a pluggable knowledge layer, and built-in evaluation. The AI-Q NVIDIA Blueprint is an open reference example for building intelligent AI agents that connect to your enterprise data, reason using state-of-the-art models, and deliver trusted business insights.

AI-Q holds top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards. To reproduce those results, use the drb1 and drb2 branches, respectively.

Architecture

Two-tier research routing. A single-call Intent Classifier routes every query to the optimal path: instant meta responses, fast shallow research, or comprehensive deep research — eliminating unnecessary latency for simple queries.
LangGraph state machine orchestrator. The core workflow is a LangGraph StateGraph with explicit, testable routing and conversation checkpointing (in-memory, SQLite, or PostgreSQL).
Shallow Researcher agent. New bounded tool-calling agent optimized for speed with configurable tool-call budgets, context compaction, and a synthesis anchor that forces citation-backed answers when the budget is exhausted.
Deep Researcher agent. Rebuilt using the deepagents library with a three-role subagent architecture (Orchestrator, Planner, Researcher). Supports configurable research loop iterations, per-role LLM assignment, and structured multi-phase workflows: planning, iterative research, citation management, and final report generation.
Clarifier agent with HITL. New human-in-the-loop agent that gathers clarifications, generates structured research plans, and supports plan approval/rejection/feedback before deep research begins. Fully configurable and can be disabled.
Shallow-to-deep escalation. The shallow researcher can automatically escalate to deep research when it detects insufficient results, routing through the clarifier for plan approval.
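The routing and escalation behavior described above can be sketched in a few lines. This is an illustrative stand-in only: the keyword rules in `classify_intent` replace what the release notes describe as a single LLM call, and all names (`Route`, `Result`, `run`, `shallow_has_enough`) are hypothetical rather than AI-Q APIs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    META = "meta"
    SHALLOW = "shallow"
    DEEP = "deep"


@dataclass
class Result:
    route: Route
    answer: str
    escalated: bool = False


def classify_intent(query: str) -> Route:
    """Stand-in for the single-call intent classifier (keyword rules are assumed)."""
    q = query.lower().strip()
    if q in {"hi", "hello", "what can you do?"}:
        return Route.META
    if any(word in q for word in ("report", "compare", "comprehensive")):
        return Route.DEEP
    return Route.SHALLOW


def run(query: str, shallow_has_enough: Callable[[str], bool]) -> Result:
    """Route a query once, escalating shallow research to deep when results are thin."""
    route = classify_intent(query)
    if route is Route.META:
        return Result(route, "meta response")
    if route is Route.SHALLOW:
        if shallow_has_enough(query):
            return Result(route, "shallow answer")
        # Insufficient results: hand over to the deep path. In the described
        # design this passes through the clarifier for plan approval first.
        return Result(Route.DEEP, "deep report", escalated=True)
    return Result(route, "deep report")
```

The `shallow_has_enough` callback is where a real system would judge result quality; here it is injected so the escalation branch can be exercised directly.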

API and Backend

Async Jobs API. New REST API (/v1/jobs/async/) for submitting, tracking, cancelling, and streaming research jobs. Supports custom job IDs, configurable expiry, and job artifact retrieval.
SSE streaming with event replay. Real-time Server-Sent Events for all agent execution events (LLM tokens, tool calls, artifacts, citations). Full reconnection support with event replay from any point — sub-10ms latency on PostgreSQL via LISTEN/NOTIFY.
Dask-based distributed execution. Deep research jobs run on a Dask cluster with configurable workers and threads, background heartbeats, stale job reaping, and cooperative cancellation.
PostgreSQL persistence. Job store, event store, LangGraph checkpoints, and document summaries all support PostgreSQL for production deployments. SQLite remains available for local development.
Pluggable agent registration. Custom agents can be registered and exposed through the async jobs API without modifying core code.
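Event replay on reconnect can be illustrated with a small in-memory sketch. It assumes each event gets a monotonically increasing ID per job and that a reconnecting client reports the last ID it saw (the standard SSE `Last-Event-ID` mechanism). The `EventStore` class and the event type names are hypothetical; the actual service persists events in SQLite or PostgreSQL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventStore:
    """Hypothetical per-job event log with replay from a given event ID."""

    _events: Dict[str, List[dict]] = field(default_factory=dict)

    def append(self, job_id: str, event_type: str, data: str) -> int:
        events = self._events.setdefault(job_id, [])
        event_id = len(events) + 1  # IDs start at 1 and increase per job
        events.append({"id": event_id, "event": event_type, "data": data})
        return event_id

    def replay(self, job_id: str, last_event_id: Optional[int] = None) -> List[dict]:
        """Return every event after last_event_id (all events when it is None)."""
        start = last_event_id or 0
        return [e for e in self._events.get(job_id, []) if e["id"] > start]


def to_sse(event: dict) -> str:
    """Serialize one event in SSE wire format (single-line data assumed)."""
    return f"id: {event['id']}\nevent: {event['event']}\ndata: {event['data']}\n\n"
```

A client that disconnects after event 1 would reconnect with `last_event_id=1` and receive events 2 onward, so no output is lost between connections.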

Knowledge Layer

Pluggable knowledge retrieval. Backend-agnostic knowledge layer with a factory/registry pattern. Swap between LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint) without changing agent code.
Document ingestion pipeline. Async file upload with job tracking, status polling (UPLOADING → INGESTING → SUCCESS/FAILED), and collection management (create, delete, list, TTL cleanup).
Multimodal extraction. LlamaIndex backend supports VLM-powered image captioning and chart data extraction from PDFs, making visual content searchable alongside text.
Document summaries. Optional LLM-generated one-sentence summaries per document, injected into agent prompts so researchers understand available files before making tool calls.
Session-based collections. Each browser session gets an isolated collection with automatic 24-hour TTL cleanup.
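The factory/registry pattern mentioned for the knowledge layer can be sketched generically as follows. None of these names (`register_backend`, `create_retriever`, `InMemoryRetriever`) come from the AI-Q SDK; they are hypothetical and only show how agent code can request a backend by name without importing it directly.

```python
from typing import Callable, Dict, List, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int = 3) -> List[str]: ...


# Maps a backend name to a factory that builds a retriever.
_REGISTRY: Dict[str, Callable[..., Retriever]] = {}


def register_backend(name: str):
    """Decorator that registers a retriever factory under a name."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


def create_retriever(name: str, **kwargs) -> Retriever:
    """Build the backend selected by name, for example from a config file."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {sorted(_REGISTRY)}"
        ) from None
    return factory(**kwargs)


@register_backend("in_memory")
class InMemoryRetriever:
    """Toy backend: returns documents containing any query term."""

    def __init__(self, documents: List[str]):
        self.documents = documents

    def retrieve(self, query: str, top_k: int = 3) -> List[str]:
        terms = query.lower().split()
        hits = [d for d in self.documents if any(t in d.lower() for t in terms)]
        return hits[:top_k]
```

Swapping backends then amounts to changing the name passed to `create_retriever`, which is the property the release notes describe as "without changing agent code".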

Citation Verification

Deterministic citation verification pipeline. Every research response (shallow and deep) passes through post-processing that validates all citations against a source registry of actually-retrieved URLs using a five-level matching strategy (exact, truncation, prefix, child-path, query-subset). Includes report sanitization (shortened URLs, IP addresses, non-HTTP schemes) and a full audit trail of verification decisions.
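The release notes name five matching levels but do not define them, so the sketch below is one plausible interpretation rather than the project's actual rules. Assumed meanings: exact (identical after trimming a trailing slash), truncation (the citation is a cut-off form of a source URL), prefix (the citation extends a source URL), child-path (same host, cited path sits under the source path), and query-subset (same path, cited query parameters are a subset of the source's). The `match_citation` function is hypothetical.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def _norm(url: str) -> str:
    return url.strip().rstrip("/")


def match_citation(cited: str, sources: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (level, matched_source) for the strictest level that matches, else None."""
    cited_n = _norm(cited)
    c = urlsplit(cited_n)
    registry = [_norm(s) for s in sources]

    def same_host(src: str) -> bool:
        s = urlsplit(src)
        return (s.scheme, s.netloc) == (c.scheme, c.netloc)

    def child_path(src: str) -> bool:
        parent = urlsplit(src).path.rstrip("/") + "/"
        return same_host(src) and c.path.startswith(parent)

    def query_subset(src: str) -> bool:
        s = urlsplit(src)
        return (
            same_host(src)
            and s.path == c.path
            and set(parse_qsl(c.query)) <= set(parse_qsl(s.query))
        )

    # Levels are tried strictest first (assumed ordering).
    checks = [
        ("exact", lambda src: src == cited_n),
        ("truncation", lambda src: same_host(src) and src.startswith(cited_n)),
        ("prefix", lambda src: same_host(src) and cited_n.startswith(src)),
        ("child-path", child_path),
        ("query-subset", query_subset),
    ]
    for level, check in checks:
        for src in registry:
            if check(src):
                return level, src
    return None  # unverifiable citation: no retrieved source supports it
```

Returning the level alongside the matched source is what would feed an audit trail of verification decisions; a citation that returns `None` was not backed by any retrieved URL.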

Frontend

New Next.js web UI. Complete rewrite as a modern Next.js application with conversational chat interface, document upload, collection management, and real-time research progress visualization.
Optional OAuth authentication. OIDC-based authentication support with configurable providers and a REQUIRE_AUTH toggle.
Configurable file upload. Accepted file types, max file size, and max file count controllable via environment variables.

Observability

Multi-backend tracing. Built-in support for Phoenix (local trace visualization), LangSmith (LLM evaluation and prompt optimization), Weights & Biases Weave (experiment tracking with PII redaction), and a production-grade OpenTelemetry Collector exporter with configurable privacy redaction — all configurable through NAT YAML config or environment variables.

Evaluation

FreshQA benchmark. Built-in factuality evaluation on time-sensitive questions for measuring shallow researcher accuracy, runnable via the NAT evaluation harness (nat eval). Deep research benchmark reproduction is available on the dedicated drb1 and drb2 branches.

Deployment

Docker Compose stack. Production-ready three-service stack (backend, frontend, PostgreSQL) with multi-stage Dockerfile, dev/release build targets, and distroless runtime images running as non-root (UID 1000).
Helm chart for Kubernetes. Full Helm deployment with NGC registry support, Kubernetes secrets management, configurable resource limits, health checks, and Foundational RAG integration via internal service DNS.
Horizontal scaling. Stateless backend supports scaling behind a load balancer with shared PostgreSQL and optional external Dask scheduler.

NAT-Powered Configuration

Native NeMo Agent Toolkit integration. AI-Q is built directly on the NVIDIA NeMo Agent Toolkit: all agents, tools, LLMs, routing behavior, and observability are defined through NAT's YAML configuration system, with environment variable substitution (${VAR:-default}), plugin registration, and the nat run / nat serve / nat eval CLI commands.
Per-role LLM assignment. Assign different models to the orchestrator, planner, researcher, and intent classifier roles independently.
Four pre-built configs. CLI default, Web + LlamaIndex, Web + Foundational RAG, and Hybrid Frontier Model (GPT-5.2 orchestrator with open-source researchers).
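The `${VAR:-default}` substitution syntax can be illustrated with a short standalone sketch. This is not NAT's implementation: the `substitute` function is hypothetical, it follows POSIX shell semantics (the default applies when the variable is unset or empty), and raising on an unset variable without a default is an assumed design choice.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def substitute(text: str, env: Mapping[str, str]) -> str:
    """Expand ${VAR} and ${VAR:-default} references in a config string."""

    def repl(match: "re.Match[str]") -> str:
        name = match.group("name")
        default = match.group("default")
        if default is None:
            # No fallback given: fail loudly instead of inserting an empty value.
            if name not in env:
                raise KeyError(name)
            return env[name]
        # Shell-style ":-": fall back when the variable is unset or empty.
        return env.get(name) or default

    return _PATTERN.sub(repl, text)
```

For example, `substitute("model: ${MODEL:-nano}", {})` yields `model: nano`, while passing `{"MODEL": "super"}` yields `model: super`. The `model` key and both values are placeholders, not names from the shipped configs.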

Models

Default models. NVIDIA Nemotron 3 Nano 30B (agents, intent classifier), GPT-OSS 120B (deep research orchestrator/planner), Nemotron Mini 4B (document summaries), Llama Nemotron Embed VL 1B v2 (embeddings), Nemotron Nano 12B v2 VL (multimodal extraction).
Frontier model support. Optional config for GPT-5.2 as orchestrator/planner with open-source researchers.
Nemotron Super compatibility. Tested with Nemotron 3 Super 120B; temporarily commented out in default configs due to Build API availability constraints.

Developer Experience

uv workspace monorepo. uv sync installs everything; individual packages installable with uv pip install -e.
Jupyter notebook series. Three-part tutorial: Getting Started, Deep Researcher deep dive, and Customization guide.
Debug console. Built-in debug UI at /debug with real-time SSE visualization, job tracking, and state inspection.
Comprehensive documentation. Architecture docs, API reference, customization guides, knowledge layer SDK reference, and deployment guides for Docker Compose and Kubernetes.

Breaking Changes from v1.x

Complete architecture rewrite — v1.x configs and workflows are not compatible.
The demo web application from v1.x has been replaced by the new Next.js frontend.
PDF processing is now handled through the knowledge layer rather than direct RAG integration.
The v1.x single-agent deep researcher has been replaced by the multi-agent orchestrated workflow.

Dependencies

Pinned to NeMo Agent Toolkit (NAT) v1.4.0. NAT v1.5 or later is not yet supported.
Python 3.11–3.13 supported.
Node.js 22+ required for the frontend.

2.0.0.rc13 Pre-release

AjayThorve released this 17 Mar 08:04

2.0.0.rc13

a5fd0e2

What's Changed

update model instance to nano by @cdgamarose-nv in #156

Full Changelog: 2.0.0.rc12...2.0.0.rc13

Contributors

cdgamarose-nv

AIQ v2 RC9 Pre-release

AjayThorve released this 16 Mar 05:35

2.0.0.rc9

42b49ed

What's Changed

Refactor package installation to use --no-deps flag in CI and Dockerfile by @AjayThorve in #145
Fix/todo list by @AjayThorve in #147
Update MCP guide: use mcp_client, inline config, add server instructions by @PicoNVIDIA in #150
Enhance title extraction logic in citation verification to prioritize titles closest to URLs by @AjayThorve in #149
bugfix: dependency version errors and endpoint change by @cdgamarose-nv in #148
Update documentation to reflect dependency pinning for NeMo Agent Toolkit by @AjayThorve in #151
Bump to 2603.15.ext.rc11 and fix route registration problem. by @drobison00 in #146

Full Changelog: 2.0.0.rc8...2.0.0.rc9

Contributors

drobison00, AjayThorve, and 2 other contributors

2.0.0.rc12 Pre-release

AjayThorve released this 17 Mar 02:01

2.0.0.rc12

7e53572

What's Changed

docs: document tested RAG version and NIM hosted API limitations by @naimnv in #142
update super endpoints to build.nvidia by @AjayThorve in #152

Full Changelog: 2.0.0.rc9...2.0.0.rc12

Contributors

AjayThorve and naimnv

AIQ v2 RC8 Pre-release

AjayThorve released this 13 Mar 01:53

2.0.0.rc8

cff8d0a

What's Changed

Revise README for clarity and additional information by @raykallen in #138
add LangSmith observability, partner endpoint callout, and QA fixes to getting started notebook by @aadesoba-nv in #137
Add MCP tool integration to customization guide by @PicoNVIDIA in #135
bug: lower max completion tokens for notebook by @cdgamarose-nv in #141
AIQ UI polish + bug fix - final by @exactlyallan in #143
Update brev.dev launchable link in notebook by @AjayThorve in #144

New Contributors

@PicoNVIDIA made their first contribution in #135

Full Changelog: 2.0.0.rc7...2.0.0.rc8

Contributors

exactlyallan, AjayThorve, and 4 other contributors

Releases: NVIDIA-AI-Blueprints/aiq
