[0.4.6]
Added
- New State-Of-The-Art LLM Model Support
  - Added support for Amazon Nova 2 Lite model (us.amazon.nova-2-lite-v1:0, eu.amazon.nova-2-lite-v1:0)
  - Added support for Claude Opus 4.5 model (us.anthropic.claude-opus-4-5-20251101-v1:0, eu.anthropic.claude-opus-4-5-20251101-v1:0)
  - Added support for Qwen 3 VL model (qwen.qwen3-vl-235b-a22b)
  - Available for configuration across all document processing steps
- Test Studio for Comprehensive Test Management and Analysis
  - Added unified web interface for managing test sets, running tests, and analyzing results directly from the UI
  - Test Sets Tab: Create and manage reusable test collections with three creation methods:
    - Pattern-based creation with file patterns to match existing data sets (Input Bucket and Test Set Bucket)
    - Zip upload with automatic extraction of input/ and baseline/ folder structure
  - Test Executions Tab: Unified interface combining test execution and results management:
    - Real-time status monitoring
    - Multi-select comparison for side-by-side test analysis
    - Integrated export and delete operations
  - Key Features: File structure validation, progress-aware status updates, cached metrics for improved performance, dual bucket support for flexible test organization
  - Documentation: Guide in docs/test-studio.md with architecture details and workflow examples
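As an illustration of the file structure validation mentioned for zip uploads, the following is a minimal sketch, not the Test Studio implementation. It assumes that `input/` and `baseline/` sit at the root of the archive and that both must contain at least one file; the function name `validate_test_set_zip` is hypothetical.

```python
import io
import zipfile


def validate_test_set_zip(data: bytes):
    """Check that a zip archive has files under input/ and baseline/.

    Hypothetical helper: the root-level layout and the "both folders
    must be non-empty" rule are assumptions, not documented behavior.
    Returns (input_files, baseline_files) as sorted lists of archive paths.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Skip directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]

    inputs = sorted(n for n in names if n.startswith("input/"))
    baselines = sorted(n for n in names if n.startswith("baseline/"))

    if not inputs:
        raise ValueError("zip has no files under input/")
    if not baselines:
        raise ValueError("zip has no files under baseline/")
    return inputs, baselines
```

Validating before extraction lets an upload be rejected early, without writing partial test sets to a bucket.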
- MCP Integration for External Application Access
  - Added MCP (Model Context Protocol) integration enabling external applications (like Amazon Quick Suite) to access IDP analytics through AWS Bedrock AgentCore Gateway with secure OAuth 2.0 authentication
  - Implemented Analytics Agent with search_genaiidp tool for natural language queries of processed document data (statistics, trends, confidence scores, processing status)
  - Controlled by EnableMCP parameter (default: true); provides MCPServerEndpoint and authentication outputs for external application integration; documentation in docs/mcp-integration.md
- Configurable Section Splitting Strategies for Enhanced Document Segmentation Control
  - Added new sectionSplitting configuration option to control how classified pages are grouped into document sections
  - Three Strategies Available:
    - disabled: Entire document treated as single section with first detected class (simplest case)
    - page: One section per page preventing automatic joining of same-type documents (deterministic, solves Issue #146)
    - llm_determined: Uses LLM boundary detection with "Start"/"Continue" indicators (default, maintains existing behavior)
  - Key Benefits: Deterministic splitting for long documents with multiple same-type forms (e.g., multiple W-2s, multiple invoices), eliminates LLM boundary detection failures for critical government form processing, provides flexibility across simple to complex document scenarios
  - Resolves #146
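The three strategies can be sketched roughly as follows. This is an illustrative sketch, not the project's code: `PageResult`, `Section`, and `split_sections` are hypothetical names, and the rule that a change of class also starts a new section under llm_determined is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageResult:
    """Hypothetical per-page classification result."""
    page_id: int
    doc_class: str
    boundary: str = "Continue"  # "Start" or "Continue" (llm_determined only)


@dataclass
class Section:
    doc_class: str
    page_ids: List[int]


def split_sections(pages: List[PageResult], strategy: str = "llm_determined") -> List[Section]:
    """Group classified pages into sections according to sectionSplitting."""
    if not pages:
        return []
    if strategy == "disabled":
        # Whole document is one section, labeled with the first detected class.
        return [Section(pages[0].doc_class, [p.page_id for p in pages])]
    if strategy == "page":
        # One section per page; same-type neighbors are never joined.
        return [Section(p.doc_class, [p.page_id]) for p in pages]
    if strategy == "llm_determined":
        sections: List[Section] = []
        for p in pages:
            # Assumption: a "Start" indicator or a class change opens a section.
            starts_new = (
                not sections
                or p.boundary == "Start"
                or p.doc_class != sections[-1].doc_class
            )
            if starts_new:
                sections.append(Section(p.doc_class, [p.page_id]))
            else:
                sections[-1].page_ids.append(p.page_id)
        return sections
    raise ValueError(f"unknown sectionSplitting strategy: {strategy!r}")
```

With two single-page W-2s followed by a two-page invoice, page always yields four sections regardless of the boundary indicators, which is what makes it deterministic for stacks of same-type forms.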
Changed
- Improved Temperature and Top_P Parameter Logic for Deterministic Output
  - Changed inference parameter selection logic to allow temperature=0.0 for deterministic output (recommended by Anthropic and other model providers)
  - New Logic: Uses top_p only when it has a positive value (> 0); otherwise uses temperature, including temperature=0.0
  - Previous Logic: Used top_p whenever temperature=0.0, preventing proper deterministic configuration
  - Key Benefits: Enables proper deterministic output with temperature=0.0, more intuitive parameter behavior, aligns with model provider best practices (Anthropic recommends temperature=0 for consistent outputs)
  - Affected Components: Bedrock client (lib/idp_common_pkg/idp_common/bedrock/client.py), Agentic extraction service (lib/idp_common_pkg/idp_common/extraction/agentic_idp.py)
  - Configuration Guidance: Set top_p: 0 to use temperature parameter; set top_p to positive value to override temperature
  - Set temperature to 0.0 in discovery config for deterministic discovery output (was previously set to 1.0)
  - Set top_p to 0.0 in all repo config files to force use of temperature setting by default.
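The new selection rule can be sketched as below. This is a minimal illustration rather than the code in the Bedrock client: the function name `select_sampling_params` is hypothetical, and the returned keys are assumed to follow the Bedrock Converse `inferenceConfig` naming (`temperature`, `topP`).

```python
from typing import Dict, Optional


def select_sampling_params(temperature: float, top_p: Optional[float] = None) -> Dict[str, float]:
    """Pick exactly one sampling parameter to send to the model.

    top_p is used only when it is a positive value; otherwise temperature
    is used, including temperature=0.0 for deterministic output.
    """
    if top_p is not None and top_p > 0:
        return {"topP": float(top_p)}
    return {"temperature": float(temperature)}
```

For example, `select_sampling_params(0.0, 0.0)` selects temperature 0.0, while `select_sampling_params(0.0, 0.1)` selects top_p 0.1 and ignores temperature.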
- Removed page image limit entirely across all IDP services
  - Removed image limits from multimodal inference steps (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation, with info logging to track image counts for monitoring purposes.
  - Resolves #147
- Knowledge Base Vector Store Default Changed to S3 Vectors
  - Changed default KnowledgeBaseVectorStore from OPENSEARCH_SERVERLESS to S3_VECTORS for cost-optimized deployments
  - S3 Vectors provides 40-60% lower storage costs with sub-second latency suitable for most use cases
  - OpenSearch Serverless remains available for applications requiring sub-millisecond query performance
  - No action required for existing deployments - only affects new stack deployments
Fixed
- UI: Document Schema Editor Regex Fields Not Persisting - Fixed issue where Document Name Regex and Page Content Regex fields were not being saved in configuration or restored after page refresh. Fixes #151
- Document Schema Builder Enum Support - Fixed enum value handling in schema builder to properly support enumeration constraints for attribute definitions
- Agentic Extraction Parameter Passing - Fixed issue where temperature and top_p parameters were not passed to the agentic extraction service; they are now passed correctly, enabling proper model behavior control
- Document Schema Builder UI Labels - Enhanced field labels and formats in document schema builder for improved clarity and user experience
- Retry Mechanism Improvements - Enhanced retry logic for more reliable error handling and recovery across document processing workflows
- Type Safety Enhancements - Improved type annotations and fixed undefined items handling to prevent runtime errors
Templates
- us-west-2: https://s3.us-west-2.amazonaws.com/aws-ml-blog-us-west-2/artifacts/genai-idp/idp-main_0.4.6.yaml
- us-east-1: https://s3.us-east-1.amazonaws.com/aws-ml-blog-us-east-1/artifacts/genai-idp/idp-main_0.4.6.yaml
- eu-central-1: https://s3.eu-central-1.amazonaws.com/aws-ml-blog-eu-central-1/artifacts/genai-idp/idp-main_0.4.6.yaml