Releases: NVIDIA-AI-Blueprints/aiq

v2.1.0

@cdgamarose-nv released this 19 May 21:35
917271c

What's Changed

  • AI-Q REST API with pluggable auth middleware, entry-point-registered token validators, and async job ownership enforcement
  • Auth extensibility hooks (register_token_fetcher, provider lifecycle) and auth refactor eliminating the refresh race
  • Data source registry driving UI toggles, per-message filtering, and agent tool inheritance
  • New exa_web_search data source with full_text and highlights controls
  • Deep researcher consumes DeepAgents skills with a job-scoped Modal sandbox; built-in data-table-analysis skill and configs/config_skills.yml example
  • AI-Q is consumable as a portable Agent Skill (.agents/skills/aiq-research/) for routed /chat and the async job lifecycle against a local AI-Q server; .claude/skills/aiq-research/ is retained as a Claude Code compatibility symlink
  • Cost analysis tool with pricing configs and profiling example
  • Documented MCP client patterns scoped for 2.1: mcp_client, mcp_service_account, and user-identity tools
  • Prompt restructure across all agents for KV cache prefix reuse
  • Operability: idempotent DB init, tuned Dask/Postgres defaults, request tracing into NAT spans, UI stream-failure hardening
  • New authentication and MCP tools guides; new skills-and-sandbox example
  • Pinned to NeMo Agent Toolkit (NAT) v1.6.0; CVE bumps for Pillow, cryptography, pygments, authlib, pyopenssl, and pytest
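
To make "async job ownership enforcement" concrete, here is a minimal sketch of the general idea, not AI-Q's implementation. The names JobStore, Principal, and JobAccessError are hypothetical, and the sketch assumes the auth middleware has already validated a token and extracted a subject from it.

```python
from dataclasses import dataclass


class JobAccessError(Exception):
    """Raised when a caller may not see or act on a job."""


@dataclass(frozen=True)
class Principal:
    subject: str  # hypothetical: user ID taken from an already-validated token


class JobStore:
    """Toy in-memory store that remembers who submitted each job."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def submit(self, job_id: str, principal: Principal) -> None:
        self._owners[job_id] = principal.subject

    def authorize(self, job_id: str, principal: Principal) -> None:
        owner = self._owners.get(job_id)
        # Treat "missing" and "not yours" the same way so that callers
        # cannot probe for the existence of other users' job IDs.
        if owner is None or owner != principal.subject:
            raise JobAccessError(f"job {job_id!r} not found")
```

A status, cancel, or stream handler would call authorize() before touching the job, so every job endpoint applies the same check.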

v2.1.0-rc4

Pre-release

@AjayThorve released this 13 May 21:27
34c3a41

What's Changed

  • fix(citation): register MCP tool results as sources when no URLs present by @tanleach in #227
  • Upgrade Python runtime to 3.13 and distroless to v4.0.5 by @efajardo-nv in #233

Full Changelog: v2.1.0-rc3...v2.1.0-rc4

v2.1.0-rc3

Pre-release

@AjayThorve released this 13 May 02:47
62ffe40

What's Changed

Full Changelog: v2.1.0-rc2...v2.1.0-rc3

v2.1.0-rc2

Pre-release

@AjayThorve released this 07 May 03:15
1fba03c

What's Changed

  • docs(mcp): remove untested 'Publish AIQ as MCP server' section by @AjayThorve in #220
  • fix(helm): repair imagePullSecrets fallback that breaks pod render by @AjayThorve in #221
  • chore: package AIQ research as portable agent skill by @tanleach in #219

Full Changelog: v2.1.0-rc1...v2.1.0-rc2

v2.1.0-rc1

Pre-release

@AjayThorve released this 06 May 01:21
d2b85a7

What's Changed

New Contributors

Full Changelog: 2.0.0...v2.1.0.RC1

2.0.0

@AjayThorve released this 18 Mar 15:16
62101c8

Release v2.0.0

Overview

AI-Q v2.0.0 is a ground-up rewrite of the NVIDIA AI-Q Blueprint. The v1.x line provided a single deep research agent with PDF upload and a demo web application. v2.0.0 introduces a two-tier multi-agent architecture built on the NVIDIA NeMo Agent Toolkit (NAT), a new Next.js frontend, async job infrastructure, a pluggable knowledge layer, and built-in evaluation. The AI-Q NVIDIA Blueprint is an open reference example for building intelligent AI agents that connect to your enterprise data, reason using state-of-the-art models, and deliver trusted business insights.

AI-Q holds top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards. To reproduce those results, use the drb1 and drb2 branches, respectively.

Architecture

  • Two-tier research routing. A single-call Intent Classifier routes every query to the optimal path: instant meta responses, fast shallow research, or comprehensive deep research — eliminating unnecessary latency for simple queries.
  • LangGraph state machine orchestrator. The core workflow is a LangGraph StateGraph with explicit, testable routing and conversation checkpointing (in-memory, SQLite, or PostgreSQL).
  • Shallow Researcher agent. New bounded tool-calling agent optimized for speed with configurable tool-call budgets, context compaction, and a synthesis anchor that forces citation-backed answers when the budget is exhausted.
  • Deep Researcher agent. Rebuilt using the deepagents library with a three-role subagent architecture (Orchestrator, Planner, Researcher). Supports configurable research loop iterations, per-role LLM assignment, and structured multi-phase workflows: planning, iterative research, citation management, and final report generation.
  • Clarifier agent with HITL. New human-in-the-loop agent that gathers clarifications, generates structured research plans, and supports plan approval/rejection/feedback before deep research begins. Fully configurable and can be disabled.
  • Shallow-to-deep escalation. The shallow researcher can automatically escalate to deep research when it detects insufficient results, routing through the clarifier for plan approval.
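
The routing and escalation behavior described in this list can be sketched in plain Python. This is an illustration only: the actual workflow is a LangGraph StateGraph, and every name below (Route, ShallowResult, route_query, and the callables passed in) is a hypothetical stand-in rather than part of AI-Q.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    META = "meta"
    SHALLOW = "shallow"
    DEEP = "deep"


@dataclass
class ShallowResult:
    answer: str
    sufficient: bool  # hypothetical flag: did shallow research find enough?


def route_query(
    query: str,
    classify: Callable[[str], Route],
    shallow: Callable[[str], ShallowResult],
    deep: Callable[[str], str],
    approve_plan: Callable[[str], bool],
) -> tuple[Route, str]:
    """Dispatch a query to one path; escalate shallow to deep when needed."""
    route = classify(query)  # one classification call per query
    if route is Route.META:
        return Route.META, "meta response"  # placeholder instant reply
    if route is Route.SHALLOW:
        result = shallow(query)
        if result.sufficient:
            return Route.SHALLOW, result.answer
        # Insufficient results: fall through to the deep path.
    # Deep research starts only after the plan is approved (the HITL step).
    if not approve_plan(query):
        return Route.DEEP, "Plan rejected; deep research not started."
    return Route.DEEP, deep(query)
```

Passing the classifier, researchers, and approval step in as callables keeps the routing logic testable without any model calls, which is the same property the explicit state machine is described as providing.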

API and Backend

  • Async Jobs API. New REST API (/v1/jobs/async/) for submitting, tracking, cancelling, and streaming research jobs. Supports custom job IDs, configurable expiry, and job artifact retrieval.
  • SSE streaming with event replay. Real-time Server-Sent Events for all agent execution events (LLM tokens, tool calls, artifacts, citations). Full reconnection support with event replay from any point — sub-10ms latency on PostgreSQL via LISTEN/NOTIFY.
  • Dask-based distributed execution. Deep research jobs run on a Dask cluster with configurable workers and threads, background heartbeats, stale job reaping, and cooperative cancellation.
  • PostgreSQL persistence. Job store, event store, LangGraph checkpoints, and document summaries all support PostgreSQL for production deployments. SQLite remains available for local development.
  • Pluggable agent registration. Custom agents can be registered and exposed through the async jobs API without modifying core code.
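
Event replay on reconnect can be illustrated with a small in-memory log. This is a sketch under stated assumptions, not the AI-Q event store: EventLog is a hypothetical class, event IDs are assumed to be 1-based sequence numbers, and each payload is assumed to fit on a single line. The id/event/data framing follows the standard Server-Sent Events wire format.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only per-job event log; IDs are 1-based sequence numbers."""

    events: list[tuple[int, str, str]] = field(default_factory=list)

    def append(self, event_type: str, data: str) -> int:
        event_id = len(self.events) + 1
        self.events.append((event_id, event_type, data))
        return event_id

    def replay(self, last_event_id: int = 0):
        """Yield SSE frames for every event after last_event_id."""
        # Because IDs are sequential, slicing at last_event_id skips
        # exactly the events the client has already seen.
        for event_id, event_type, data in self.events[last_event_id:]:
            yield f"id: {event_id}\nevent: {event_type}\ndata: {data}\n\n"
```

A reconnecting client would send the ID of the last frame it received (browsers do this with the Last-Event-ID header), and the server would resume with replay(last_event_id) before switching to live events.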

Knowledge Layer

  • Pluggable knowledge retrieval. Backend-agnostic knowledge layer with a factory/registry pattern. Swap between LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint) without changing agent code.
  • Document ingestion pipeline. Async file upload with job tracking, status polling (UPLOADING → INGESTING → SUCCESS/FAILED), and collection management (create, delete, list, TTL cleanup).
  • Multimodal extraction. LlamaIndex backend supports VLM-powered image captioning and chart data extraction from PDFs, making visual content searchable alongside text.
  • Document summaries. Optional LLM-generated one-sentence summaries per document, injected into agent prompts so researchers understand available files before making tool calls.
  • Session-based collections. Each browser session gets an isolated collection with automatic 24-hour TTL cleanup.
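
A factory/registry pattern of the kind mentioned above might look like the following. The names (Retriever, register_backend, create_retriever, InMemoryRetriever) are hypothetical and are not the knowledge layer SDK; the point is only that agent code asks for a backend by name and never imports a concrete class.

```python
from typing import Callable, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[str]: ...


_REGISTRY: dict[str, Callable[..., Retriever]] = {}


def register_backend(name: str):
    """Decorator that records a backend factory under a config-facing name."""

    def decorator(factory):
        _REGISTRY[name] = factory
        return factory

    return decorator


def create_retriever(name: str, **kwargs) -> Retriever:
    """Build the backend selected by name, e.g. a value read from config."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown backend {name!r}; known: {known}") from None
    return factory(**kwargs)


@register_backend("in_memory")
class InMemoryRetriever:
    """Toy backend: case-insensitive substring match over a document list."""

    def __init__(self, docs: list[str]):
        self.docs = docs

    def retrieve(self, query: str, top_k: int) -> list[str]:
        hits = [d for d in self.docs if query.lower() in d.lower()]
        return hits[:top_k]
```

Swapping backends then means changing the name passed to create_retriever, while callers keep using the same retrieve() method.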

Citation Verification

  • Deterministic citation verification pipeline. Every research response (shallow and deep) passes through post-processing that validates all citations against a source registry of actually-retrieved URLs using a five-level matching strategy (exact, truncation, prefix, child-path, query-subset). Includes report sanitization (shortened URLs, IP addresses, non-HTTP schemes) and a full audit trail of verification decisions.

Frontend

  • New Next.js web UI. Complete rewrite as a modern Next.js application with conversational chat interface, document upload, collection management, and real-time research progress visualization.
  • Optional OAuth authentication. OIDC-based authentication support with configurable providers and a REQUIRE_AUTH toggle.
  • Configurable file upload. Accepted file types, max file size, and max file count controllable via environment variables.

Observability

  • Multi-backend tracing. Built-in support for Phoenix (local trace visualization), LangSmith (LLM evaluation and prompt optimization), Weights & Biases Weave (experiment tracking with PII redaction), and a production-grade OpenTelemetry Collector exporter with configurable privacy redaction — all configurable through NAT YAML config or environment variables.

Evaluation

  • FreshQA benchmark. Built-in factuality evaluation on time-sensitive questions for measuring shallow researcher accuracy, runnable via the NAT evaluation harness (nat eval). Deep research benchmark reproduction is available on the dedicated drb1 and drb2 branches.

Deployment

  • Docker Compose stack. Production-ready three-service stack (backend, frontend, PostgreSQL) with multi-stage Dockerfile, dev/release build targets, and distroless runtime images running as non-root (UID 1000).
  • Helm chart for Kubernetes. Full Helm deployment with NGC registry support, Kubernetes secrets management, configurable resource limits, health checks, and Foundational RAG integration via internal service DNS.
  • Horizontal scaling. Stateless backend supports scaling behind a load balancer with shared PostgreSQL and optional external Dask scheduler.

NAT-Powered Configuration

  • Native NeMo Agent Toolkit integration. AI-Q is built directly on the NVIDIA NeMo Agent Toolkit: all agents, tools, LLMs, routing behavior, and observability are defined through NAT's YAML configuration system with environment variable substitution (${VAR:-default}), plugin registration, and the nat run / nat serve / nat eval CLI commands.
  • Per-role LLM assignment. Assign different models to the orchestrator, planner, researcher, and intent classifier roles independently.
  • Four pre-built configs. CLI default, Web + LlamaIndex, Web + Foundational RAG, and Hybrid Frontier Model (GPT-5.2 orchestrator with open-source researchers).
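
The ${VAR:-default} syntax mentioned above mirrors POSIX shell parameter expansion. The sketch below shows how such substitution behaves on a config string; it is not NAT's implementation. The substitute function is hypothetical, and it assumes shell semantics (the default applies when the variable is unset or empty) and leaves a plain ${VAR} untouched when the variable is unset.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, env=None) -> str:
    """Expand ${VAR} and ${VAR:-default} placeholders in text."""
    env = os.environ if env is None else env

    def repl(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is not None:
            # Shell ':-' semantics: fall back when unset or empty.
            return value if value else default
        # Plain ${VAR}: keep the placeholder when the variable is unset.
        return match.group(0) if value is None else value

    return _PATTERN.sub(repl, text)


# LLM_MODEL is a made-up variable name used only for this example.
line = "model_name: ${LLM_MODEL:-nano}"
print(substitute(line, {}))                      # default is used
print(substitute(line, {"LLM_MODEL": "super"}))  # environment wins
```

Passing env explicitly, as in the two calls above, keeps the example independent of the real process environment.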

Models

  • Default models. NVIDIA Nemotron 3 Nano 30B (agents, intent classifier), GPT-OSS 120B (deep research orchestrator/planner), Nemotron Mini 4B (document summaries), Llama Nemotron Embed VL 1B v2 (embeddings), Nemotron Nano 12B v2 VL (multimodal extraction).
  • Frontier model support. Optional config for GPT-5.2 as orchestrator/planner with open-source researchers.
  • Nemotron Super compatibility. Tested with Nemotron 3 Super 120B; temporarily commented out in default configs due to Build API availability constraints.

Developer Experience

  • uv workspace monorepo. Running uv sync installs everything; individual packages can be installed with uv pip install -e.
  • Jupyter notebook series. Three-part tutorial: Getting Started, Deep Researcher deep dive, and Customization guide.
  • Debug console. Built-in debug UI at /debug with real-time SSE visualization, job tracking, and state inspection.
  • Comprehensive documentation. Architecture docs, API reference, customization guides, knowledge layer SDK reference, and deployment guides for Docker Compose and Kubernetes.

Breaking Changes from v1.x

  • Complete architecture rewrite — v1.x configs and workflows are not compatible.
  • The demo web application from v1.x has been replaced by the new Next.js frontend.
  • PDF processing is now handled through the knowledge layer rather than direct RAG integration.
  • The v1.x single-agent deep researcher has been replaced by the multi-agent orchestrated workflow.

Dependencies

  • Pinned to NeMo Agent Toolkit (NAT) v1.4.0. NAT v1.5 and later are not yet supported.
  • Python 3.11–3.13 supported.
  • Node.js 22+ required for the frontend.

2.0.0.rc13

Pre-release

@AjayThorve released this 17 Mar 08:04
a5fd0e2

What's Changed

Full Changelog: 2.0.0.rc12...2.0.0.rc13

AIQ v2 RC9

Pre-release

@AjayThorve released this 16 Mar 05:35
42b49ed

What's Changed

  • Refactor package installation to use --no-deps flag in CI and Dockerfile by @AjayThorve in #145
  • Fix/todo list by @AjayThorve in #147
  • Update MCP guide: use mcp_client, inline config, add server instructions by @PicoNVIDIA in #150
  • Enhance title extraction logic in citation verification to prioritize titles closest to URLs by @AjayThorve in #149
  • bugfix: dependency version errors and endpoint change by @cdgamarose-nv in #148
  • Update documentation to reflect dependency pinning for NeMo Agent Toolkit by @AjayThorve in #151
  • Bump to 2603.15.ext.rc11 and fix route registration problem. by @drobison00 in #146

Full Changelog: 2.0.0.rc8...2.0.0.rc9

2.0.0.rc12

Pre-release

@AjayThorve released this 17 Mar 02:01
7e53572

What's Changed

  • docs: document tested RAG version and NIM hosted API limitations by @naimnv in #142
  • update super endpoints to build.nvidia by @AjayThorve in #152

Full Changelog: 2.0.0.rc9...2.0.0.rc12

AIQ v2 RC8

Pre-release

@AjayThorve released this 13 Mar 01:53
cff8d0a

What's Changed

New Contributors

Full Changelog: 2.0.0.rc7...2.0.0.rc8