Releases: NVIDIA-AI-Blueprints/aiq
v2.1.0
What's Changed
- AI-Q REST API with pluggable auth middleware, entry-point-registered token validators, and async job ownership enforcement
- Auth extensibility hooks (register_token_fetcher, provider lifecycle) and auth refactor eliminating the refresh race
- Data source registry driving UI toggles, per-message filtering, and agent tool inheritance
- New exa_web_search data source with full_text and highlights controls
- Deep researcher consumes DeepAgents skills with a job-scoped Modal sandbox; built-in data-table-analysis skill and configs/config_skills.yml example
- AI-Q is consumable as a portable Agent Skill (.agents/skills/aiq-research/), with .claude/skills/aiq-research/ retained as a Claude Code compatibility symlink for routed /chat and async job lifecycle against a local AI-Q server
- Cost analysis tool with pricing configs and profiling example
- Documented MCP client patterns scoped for 2.1: mcp_client, mcp_service_account, and user-identity tools
- Prompt restructure across all agents for KV cache prefix reuse
- Operability: idempotent DB init, tuned Dask/Postgres defaults, request tracing into NAT spans, UI stream-failure hardening
- New authentication and MCP tools guides; new skills-and-sandbox example
- Pinned to NeMo Agent Toolkit (NAT) v1.6.0; CVE bumps for Pillow, cryptography, pygments, authlib, pyopenssl, and pytest
v2.1.0-rc4
What's Changed
- fix(citation): register MCP tool results as sources when no URLs present by @tanleach in #227
- Upgrade Python runtime to 3.13 and distroless to v4.0.5 by @efajardo-nv in #233
Full Changelog: v2.1.0-rc3...v2.1.0-rc4
v2.1.0-rc3
What's Changed
- aligning the SKILL version with the upcoming release by @tanleach in #223
- fix helm bootstrap resources by @AjayThorve in #231
- fix: add headless mode header for chat endpoint (#226) by @cdgamarose-nv in #232
- fix(security): remediate 20 CVEs in aiq-agent container + UI by @efajardo-nv in #229
Full Changelog: v2.1.0-rc2...v2.1.0-rc3
v2.1.0-rc2
What's Changed
- docs(mcp): remove untested 'Publish AIQ as MCP server' section by @AjayThorve in #220
- fix(helm): repair imagePullSecrets fallback that breaks pod render by @AjayThorve in #221
- chore: package AIQ research as portable agent skill by @tanleach in #219
Full Changelog: v2.1.0-rc1...v2.1.0-rc2
v2.1.0-rc1
What's Changed
- update nat version and compatibility fixes by @cdgamarose-nv in #166
- fix: idempotent DB init and SSE stream reliability with connection poolers by @AjayThorve in #161
- Fix silent auth transport failures in WebSocket and SSE by @AjayThorve in #170
- Add register_token_fetcher plugin hook for auth extensibility by @AjayThorve in #169
- Add data source registry and update related configurations by @AjayThorve in #90
- Propagate auth token to Dask workers for async jobs by @AjayThorve in #174
- fix bug in uploading a file, and delete unnecessary func by @DinaLaptii in #163
- feat: expose AI-Q as an API with Auth Middleware by @cdgamarose-nv in #173
- fix: add LangGraph checkpoint tables to init-db.sql by @AjayThorve in #176
- fix: bump cryptography, pygments, authlib, pyopenssl for CVE fixes by @AjayThorve in #175
- refactor(prompts): restructure all prompts for KV cache prefix reuse by @AjayThorve in #177
- fix: allow access to /docs by @cdgamarose-nv in #178
- fix: set size cap on reads by @cdgamarose-nv in #180
- Bump Pillow to 12.2.0 (CVE-2026-40192) by @efajardo-nv in #183
- chore: update dependencies and improve linting configuration by @AjayThorve in #185
- fix: reduce checkpoint pool size and raise postgres max_connections by @AjayThorve in #186
- fix: inject fresh idToken on WebSocket upgrade for reliable auth by @AjayThorve in #188
- fix: use workflow identifier as model field in chat response by @cdgamarose-nv in #189
- fix: surface unavailable tool details in user-facing error messages by @KyleZheng1284 in #184
- Revert "fix: inject fresh idToken on WebSocket upgrade for reliable auth (#188)" by @AjayThorve in #191
- Bump pytest to 9.0.3 (CVE-2025-71176) by @efajardo-nv in #192
- fix: broken periodic_cleanup import + add Dask memory/lifetime env vars by @AjayThorve in #197
- Bump authlib to >=1.6.11 by @efajardo-nv in #198
- fix auth trust boundary and enforce async job ownership by @cdgamarose-nv in #199
- feat: add exa_web_search data source by @maxwbuckley in #181
- Add request trace classification and pseudonymous ids by @AjayThorve in #203
- feat: cost analysis tool with pricing configs, one-time report generation tools, and profiling config example by @cdgamarose-nv in #172
- fix perfomance and tests by @DinaLaptii in #205
- Propagate AIQ request tags to NAT spans by @AjayThorve in #206
- fix: auth refactor — eliminate refresh race, increase buffer, add error semantics by @exactlyallan in #194
- feat: provider lifecycle hooks for composable auth extensions by @exactlyallan in #195
- test: close remaining auth bug fix test coverage gaps by @exactlyallan in #196
- Harden AIQ UI stream failure handling by @exactlyallan in #207
- fix: keep commas inside URL paths in citation source extractor by @AjayThorve in #209
- chore: bump NeMo Agent Toolkit pin to 1.6.0 by @AjayThorve in #208
- Fix/dask cleanup and memory controls by @AjayThorve in #200
- feat: add support for skills and sandbox along with example by @cdgamarose-nv in #211
- docs: add authentication guide and scope MCP guide to AIQ 2.1 by @AjayThorve in #212
- deleted expired sessions by @DinaLaptii in #213
- docs: add v2.1.0 changelog entry by @AjayThorve in #216
New Contributors
- @DinaLaptii made their first contribution in #163
- @maxwbuckley made their first contribution in #181
Full Changelog: 2.0.0...v2.1.0.RC1
2.0.0
Release v2.0.0
Overview
AI-Q v2.0.0 is a ground-up rewrite of the NVIDIA AI-Q Blueprint. The v1.x line provided a single deep research agent with PDF upload and a demo web application. v2.0.0 introduces a two-tier multi-agent architecture built on the NVIDIA NeMo Agent Toolkit (NAT), a new Next.js frontend, async job infrastructure, a pluggable knowledge layer, and built-in evaluation. The AI-Q NVIDIA Blueprint is an open reference example for building intelligent AI agents that connect to your enterprise data, reason using state-of-the-art models, and deliver trusted business insights.
AI-Q holds top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards. To reproduce those results, use the drb1 and drb2 branches, respectively.
Architecture
- Two-tier research routing. A single-call Intent Classifier routes every query to the optimal path: instant meta responses, fast shallow research, or comprehensive deep research — eliminating unnecessary latency for simple queries.
- LangGraph state machine orchestrator. The core workflow is a LangGraph StateGraph with explicit, testable routing and conversation checkpointing (in-memory, SQLite, or PostgreSQL).
- Shallow Researcher agent. New bounded tool-calling agent optimized for speed with configurable tool-call budgets, context compaction, and a synthesis anchor that forces citation-backed answers when the budget is exhausted.
- Deep Researcher agent. Rebuilt using the deepagents library with a three-role subagent architecture (Orchestrator, Planner, Researcher). Supports configurable research loop iterations, per-role LLM assignment, and structured multi-phase workflows: planning, iterative research, citation management, and final report generation.
- Clarifier agent with HITL. New human-in-the-loop agent that gathers clarifications, generates structured research plans, and supports plan approval/rejection/feedback before deep research begins. Fully configurable and can be disabled.
- Shallow-to-deep escalation. The shallow researcher can automatically escalate to deep research when it detects insufficient results, routing through the clarifier for plan approval.
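The routing and escalation behavior described in this list can be pictured with a small plain-Python sketch. It is not the AI-Q implementation: the keyword-based `classify_intent`, the `Route` enum, and the handler functions are hypothetical stand-ins for the LLM-based Intent Classifier and the LangGraph nodes, and the clarifier's plan-approval step is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    META = "meta"
    SHALLOW = "shallow"
    DEEP = "deep"


@dataclass
class Result:
    route: Route
    answer: str
    sufficient: bool = True


def classify_intent(query: str) -> Route:
    """Toy stand-in for the single-call intent classifier (hypothetical rules)."""
    text = query.lower().strip()
    if text in {"hi", "hello", "what can you do?"}:
        return Route.META
    if any(word in text for word in ("report", "comprehensive", "in-depth")):
        return Route.DEEP
    return Route.SHALLOW


def shallow_research(query: str, tool_budget: int = 3) -> Result:
    # Assumption for illustration: longer queries need more tool calls
    # than the budget allows, which marks the result as insufficient.
    calls_needed = len(query.split()) // 4
    return Result(Route.SHALLOW, f"short answer to: {query}",
                  sufficient=calls_needed <= tool_budget)


def deep_research(query: str) -> Result:
    return Result(Route.DEEP, f"full report on: {query}")


def handle(query: str) -> Result:
    route = classify_intent(query)
    if route is Route.META:
        return Result(Route.META, "I am a research assistant.")
    if route is Route.DEEP:
        return deep_research(query)
    result = shallow_research(query)
    if not result.sufficient:
        # Shallow-to-deep escalation (plan approval omitted in this sketch).
        return deep_research(query)
    return result
```

In this sketch a short factual question stays on the shallow path, while a query the shallow path cannot satisfy within its budget is re-run on the deep path.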
API and Backend
- Async Jobs API. New REST API (/v1/jobs/async/) for submitting, tracking, cancelling, and streaming research jobs. Supports custom job IDs, configurable expiry, and job artifact retrieval.
- SSE streaming with event replay. Real-time Server-Sent Events for all agent execution events (LLM tokens, tool calls, artifacts, citations). Full reconnection support with event replay from any point, with sub-10ms latency on PostgreSQL via LISTEN/NOTIFY.
- Dask-based distributed execution. Deep research jobs run on a Dask cluster with configurable workers and threads, background heartbeats, stale job reaping, and cooperative cancellation.
- PostgreSQL persistence. Job store, event store, LangGraph checkpoints, and document summaries all support PostgreSQL for production deployments. SQLite remains available for local development.
- Pluggable agent registration. Custom agents can be registered and exposed through the async jobs API without modifying core code.
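Event replay on reconnect can be illustrated with an append-only log keyed by increasing event IDs: a reconnecting client reports the last ID it saw (in SSE, through the standard `Last-Event-ID` header) and receives everything after it. The sketch below is an in-memory illustration only; `EventLog` and `format_sse` are hypothetical names, and the actual AI-Q event store is backed by SQLite or PostgreSQL.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[int, str, str]  # (event_id, event_type, data)


@dataclass
class EventLog:
    """In-memory, append-only event log for a single job (hypothetical)."""
    events: List[Event] = field(default_factory=list)

    def append(self, event_type: str, data: str) -> int:
        event_id = len(self.events) + 1  # IDs start at 1 and only grow
        self.events.append((event_id, event_type, data))
        return event_id

    def replay(self, last_event_id: int = 0) -> List[Event]:
        """Return every event with an ID greater than last_event_id."""
        return [event for event in self.events if event[0] > last_event_id]


def format_sse(event_id: int, event_type: str, data: str) -> str:
    """Serialize one event in the text/event-stream wire format."""
    # Multi-line payloads need one "data:" field per line.
    data_lines = "\n".join(f"data: {line}" for line in data.split("\n"))
    return f"id: {event_id}\nevent: {event_type}\n{data_lines}\n\n"
```

A client that disconnected after event 1 would call `replay(1)` and receive only the later events, so no output is lost or duplicated.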
Knowledge Layer
- Pluggable knowledge retrieval. Backend-agnostic knowledge layer with a factory/registry pattern. Swap between LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint) without changing agent code.
- Document ingestion pipeline. Async file upload with job tracking, status polling (UPLOADING → INGESTING → SUCCESS/FAILED), and collection management (create, delete, list, TTL cleanup).
- Multimodal extraction. LlamaIndex backend supports VLM-powered image captioning and chart data extraction from PDFs, making visual content searchable alongside text.
- Document summaries. Optional LLM-generated one-sentence summaries per document, injected into agent prompts so researchers understand available files before making tool calls.
- Session-based collections. Each browser session gets an isolated collection with automatic 24-hour TTL cleanup.
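A factory/registry pattern of the kind mentioned for the knowledge layer might look roughly like the following. This is a generic sketch, not the AI-Q knowledge layer SDK: `register_backend`, `create_retriever`, the `Retriever` protocol, and the toy `InMemoryRetriever` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int = 3) -> List[str]: ...


_BACKENDS: Dict[str, Callable[..., Retriever]] = {}


def register_backend(name: str):
    """Class decorator that records a backend factory under a name."""
    def decorator(factory):
        _BACKENDS[name] = factory
        return factory
    return decorator


def create_retriever(name: str, **kwargs) -> Retriever:
    """Build a backend by name; agent code never imports a concrete class."""
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend {name!r}; known: {sorted(_BACKENDS)}")
    return _BACKENDS[name](**kwargs)


@register_backend("in_memory")
class InMemoryRetriever:
    """Toy backend that ranks documents by shared lowercase words."""

    def __init__(self, documents: List[str]):
        self.documents = documents

    def retrieve(self, query: str, top_k: int = 3) -> List[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(doc.lower().split())), doc)
                  for doc in self.documents]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [doc for _, doc in ranked[:top_k]]
```

Because callers only pass a backend name, switching backends becomes a configuration change rather than a code change, which is the property the release notes describe.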
Citation Verification
- Deterministic citation verification pipeline. Every research response (shallow and deep) passes through post-processing that validates all citations against a source registry of actually-retrieved URLs using a five-level matching strategy (exact, truncation, prefix, child-path, query-subset). Includes report sanitization (shortened URLs, IP addresses, non-HTTP schemes) and a full audit trail of verification decisions.
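To make the idea of tiered matching concrete, here is a minimal sketch covering three of the five named levels. The release notes name the levels but do not define them, so the interpretation below (what counts as exact, child-path, or query-subset) is an assumption, and the truncation and prefix levels are omitted rather than guessed at. `match_level` is a hypothetical function name.

```python
from typing import FrozenSet, Iterable, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit

Parts = Tuple[str, str, str, FrozenSet[Tuple[str, str]]]


def _parts(url: str) -> Parts:
    """Split a URL into (scheme, host, path, query pairs); fragment is ignored."""
    u = urlsplit(url)
    return (u.scheme, u.netloc.lower(), u.path.rstrip("/"),
            frozenset(parse_qsl(u.query)))


def match_level(cited: str, registry: Iterable[str]) -> Optional[str]:
    """Return the strongest matching level, or None if the citation is unverified."""
    cited_parts = _parts(cited)
    sources = [_parts(source) for source in registry]

    # Assumed meaning of "exact": identical after light normalization.
    if cited_parts in sources:
        return "exact"

    scheme, host, path, query = cited_parts
    same_site = [s for s in sources if s[:2] == (scheme, host)]

    # Assumed meaning of "child-path": cited page sits under a retrieved path.
    if any(s[2] and path.startswith(s[2] + "/") for s in same_site):
        return "child-path"

    # Assumed meaning of "query-subset": same path, fewer query parameters.
    if any(s[2] == path and query < s[3] for s in same_site):
        return "query-subset"

    return None
```

A citation that matches no level would be treated as unverified; how AI-Q then handles it (removal, rewriting, or logging to the audit trail) is not specified here.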
Frontend
- New Next.js web UI. Complete rewrite as a modern Next.js application with conversational chat interface, document upload, collection management, and real-time research progress visualization.
- Optional OAuth authentication. OIDC-based authentication support with configurable providers and a REQUIRE_AUTH toggle.
- Configurable file upload. Accepted file types, max file size, and max file count controllable via environment variables.
Observability
- Multi-backend tracing. Built-in support for Phoenix (local trace visualization), LangSmith (LLM evaluation and prompt optimization), Weights & Biases Weave (experiment tracking with PII redaction), and a production-grade OpenTelemetry Collector exporter with configurable privacy redaction — all configurable through NAT YAML config or environment variables.
Evaluation
- FreshQA benchmark. Built-in factuality evaluation on time-sensitive questions for measuring shallow researcher accuracy, runnable via the NAT evaluation harness (nat eval). Deep research benchmark reproduction is available on the dedicated drb1 and drb2 branches.
Deployment
- Docker Compose stack. Production-ready three-service stack (backend, frontend, PostgreSQL) with multi-stage Dockerfile, dev/release build targets, and distroless runtime images running as non-root (UID 1000).
- Helm chart for Kubernetes. Full Helm deployment with NGC registry support, Kubernetes secrets management, configurable resource limits, health checks, and Foundational RAG integration via internal service DNS.
- Horizontal scaling. Stateless backend supports scaling behind a load balancer with shared PostgreSQL and optional external Dask scheduler.
NAT-Powered Configuration
- Native NeMo Agent Toolkit integration. AI-Q is a direct implementation of the NVIDIA NeMo Agent Toolkit: all agents, tools, LLMs, routing behavior, and observability are defined through NAT's YAML configuration system with environment variable substitution (${VAR:-default}), plugin registration, and nat run / nat serve / nat eval CLI commands.
- Per-role LLM assignment. Assign different models to the orchestrator, planner, researcher, and intent classifier roles independently.
- Four pre-built configs. CLI default, Web + LlamaIndex, Web + Foundational RAG, and Hybrid Frontier Model (GPT-5.2 orchestrator with open-source researchers).
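The `${VAR:-default}` syntax mirrors POSIX shell parameter expansion: use the variable if it is set and non-empty, otherwise fall back to the default. The sketch below illustrates those semantics in plain Python; it is not NAT's resolver, `substitute` is a hypothetical name, and raising on a bare `${VAR}` that is unset or empty is an assumption made here for safety.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}; the default may not contain "}".
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${VAR} and ${VAR:-default} references in a config string."""
    values = os.environ if env is None else env

    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = values.get(name, "")
        if value:
            return value
        if default is not None:
            # Shell ":-" semantics: unset OR empty falls back to the default.
            return default
        raise KeyError(f"{name} is unset or empty and has no default")

    return _VAR.sub(_replace, text)
```

For example, with `LLM_MODEL=nano` in the environment, `substitute("model: ${LLM_MODEL:-super}")` resolves to the environment value, while an unset variable resolves to the text after `:-`.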
Models
- Default models. NVIDIA Nemotron 3 Nano 30B (agents, intent classifier), GPT-OSS 120B (deep research orchestrator/planner), Nemotron Mini 4B (document summaries), Llama Nemotron Embed VL 1B v2 (embeddings), Nemotron Nano 12B v2 VL (multimodal extraction).
- Frontier model support. Optional config for GPT-5.2 as orchestrator/planner with open-source researchers.
- Nemotron Super compatibility. Tested with Nemotron 3 Super 120B; temporarily commented out in default configs due to Build API availability constraints.
Developer Experience
- uv workspace monorepo. uv sync installs everything; individual packages installable with uv pip install -e.
- Jupyter notebook series. Three-part tutorial: Getting Started, Deep Researcher deep dive, and Customization guide.
- Debug console. Built-in debug UI at /debug with real-time SSE visualization, job tracking, and state inspection.
- Comprehensive documentation. Architecture docs, API reference, customization guides, knowledge layer SDK reference, and deployment guides for Docker Compose and Kubernetes.
Breaking Changes from v1.x
- Complete architecture rewrite — v1.x configs and workflows are not compatible.
- The demo web application from v1.x has been replaced by the new Next.js frontend.
- PDF processing is now handled through the knowledge layer rather than direct RAG integration.
- The v1.x single-agent deep researcher has been replaced by the multi-agent orchestrated workflow.
Dependencies
- Pinned to NeMo Agent Toolkit (NAT) v1.4.0. NAT v1.5 or later is not yet supported.
- Python 3.11–3.13 supported.
- Node.js 22+ required for the frontend.
2.0.0.rc13
What's Changed
- update model intance to nano by @cdgamarose-nv in #156
Full Changelog: 2.0.0.rc12...2.0.0.rc13
AIQ v2 RC9
What's Changed
- Refactor package installation to use --no-deps flag in CI and Dockerfile by @AjayThorve in #145
- Fix/todo list by @AjayThorve in #147
- Update MCP guide: use mcp_client, inline config, add server instructions by @PicoNVIDIA in #150
- Enhance title extraction logic in citation verification to prioritize titles closest to URLs by @AjayThorve in #149
- bugfix: dependency version errors and endpoint change by @cdgamarose-nv in #148
- Update documentation to reflect dependency pinning for NeMo Agent Toolkit by @AjayThorve in #151
- Bump to 2603.15.ext.rc11 and fix route registration problem. by @drobison00 in #146
Full Changelog: 2.0.0.rc8...2.0.0.rc9
2.0.0.rc12
What's Changed
- docs: document tested RAG version and NIM hosted API limitations by @naimnv in #142
- update super endpoints to build.nvidia by @AjayThorve in #152
Full Changelog: 2.0.0.rc9...2.0.0.rc12
AIQ v2 RC8
What's Changed
- Revise README for clarity and additional information by @raykallen in #138
- add LangSmith observability, partner endpoint callout, and QA fixes to getting started notebook by @aadesoba-nv in #137
- Add MCP tool integration to customization guide by @PicoNVIDIA in #135
- bug: lower max completion tokens for notebook by @cdgamarose-nv in #141
- AIQ UI polish + bug fix - final by @exactlyallan in #143
- Update brev.dev launchable link in notebook by @AjayThorve in #144
New Contributors
- @PicoNVIDIA made their first contribution in #135
Full Changelog: 2.0.0.rc7...2.0.0.rc8