A Model Context Protocol (MCP) server that provides AI assistants with comprehensive access to AWS HealthOmics services for genomic workflow management, execution, and analysis.
AWS HealthOmics is a purpose-built service for storing, querying, and analyzing genomic, transcriptomic, and other omics data. This MCP server enables AI assistants to interact with HealthOmics workflows through natural language, making genomic data analysis more accessible and efficient.
This MCP server provides tools for:
- Create and validate workflows: Support for WDL, CWL, and Nextflow workflow languages
- Lint workflow definitions: Validate WDL and CWL workflows using industry-standard linting tools
- Version management: Create and manage workflow versions with different configurations
- Package workflows: Bundle workflow definitions into deployable packages
- Start and monitor runs: Execute workflows with custom parameters and monitor progress
- Task management: Track individual workflow tasks and their execution status
- Resource configuration: Configure compute resources, storage, and caching options
- Performance analysis: Analyze workflow execution performance and resource utilization
- Failure diagnosis: Comprehensive troubleshooting tools for failed workflow runs
- Log access: Retrieve detailed logs from runs, engines, tasks, and manifests
- Genomics file search: Intelligent discovery of genomics files across S3 buckets, HealthOmics sequence stores, and reference stores
- Pattern matching: Advanced search with fuzzy matching against file paths and object tags
- File associations: Automatic detection and grouping of related files (BAM/BAI indexes, FASTQ pairs, FASTA indexes)
- Relevance scoring: Smart ranking of search results based on match quality and file relationships
- Multi-region support: Get information about AWS regions where HealthOmics is available
- ListAHOWorkflows - List available HealthOmics workflows with pagination support
- CreateAHOWorkflow - Create new workflows with WDL, CWL, or Nextflow definitions from local ZIP files, S3 URIs, or base64-encoded content, with optional container registry mappings
- GetAHOWorkflow - Retrieve detailed workflow information and export definitions
- CreateAHOWorkflowVersion - Create new versions of existing workflows from local ZIP files, S3 URIs, or base64-encoded content, with optional container registry mappings
- ListAHOWorkflowVersions - List all versions of a specific workflow
- LintAHOWorkflowDefinition - Lint single WDL or CWL workflow files using miniwdl and cwltool, accepting local file paths, S3 URIs, or inline content
- LintAHOWorkflowBundle - Lint multi-file WDL or CWL workflow bundles with import/dependency support, accepting local directories, ZIP files, S3 prefixes, or inline dictionaries
- PackageAHOWorkflow - Package workflow files into base64-encoded ZIP format, accepting local file paths, S3 URIs, or inline content
- StartAHORun - Start workflow runs with custom parameters, resource configuration, and optional VPC networking mode with a named configuration
- ListAHORuns - List workflow runs with filtering by status and date ranges
- GetAHORun - Retrieve detailed run information including status and metadata
- ListAHORunTasks - List tasks for specific runs with status filtering
- GetAHORunTask - Get detailed information about specific workflow tasks
- AnalyzeAHORunPerformance - Analyze workflow run performance and resource utilization
- DiagnoseAHORunFailure - Comprehensive diagnosis of failed workflow runs with remediation suggestions
- GetAHORunLogs - Access high-level workflow execution logs and events
- GetAHORunEngineLogs - Retrieve workflow engine logs (STDOUT/STDERR) for debugging
- GetAHORunManifestLogs - Access run manifest logs with runtime information and metrics
- GetAHOTaskLogs - Get task-specific logs for debugging individual workflow steps
- SearchGenomicsFiles - Intelligent search for genomics files across S3 buckets, HealthOmics sequence stores, and reference stores with pattern matching, file association detection, and relevance scoring
- CreateAHORunGroup - Create a new run group with optional resource limits (maxCpus, maxGpus, maxDuration, maxRuns) and tags
- GetAHORunGroup - Retrieve detailed information about a specific run group
- ListAHORunGroups - List available run groups with optional name filtering and pagination
- UpdateAHORunGroup - Update an existing run group's name or resource limits
- CreateAHORunCache - Create a new run cache with a cache behavior (CACHE_ALWAYS or CACHE_ON_FAILURE), S3 URI for cache storage, and optional name, description, tags, and cross-account bucket owner ID
- GetAHORunCache - Retrieve detailed information about a specific run cache including configuration, status, and metadata
- ListAHORunCaches - List available run caches with optional filtering by name, status, or cache behavior, with pagination support
- UpdateAHORunCache - Update an existing run cache's cache behavior, name, or description
- CreateAHOSequenceStore - Create a new sequence store with optional encryption, description, fallback location, and tags
- ListAHOSequenceStores - List sequence stores with optional name filtering and pagination
- GetAHOSequenceStore - Get detailed information about a specific sequence store
- UpdateAHOSequenceStore - Update a sequence store's name, description, or fallback location (manages ETags internally)
- ListAHOReadSets - List read sets in a sequence store with filtering by sample ID, subject ID, reference ARN, status, file type, and date range
- GetAHOReadSetMetadata - Get detailed metadata for a specific read set including sequence information and file details
- StartAHOReadSetImportJob - Import genomic files from S3 into a sequence store with batch support
- GetAHOReadSetImportJob - Get status and details of a read set import job including per-source statuses
- ListAHOReadSetImportJobs - List import jobs for a sequence store with pagination
- StartAHOReadSetExportJob - Export read sets from a sequence store to S3 with batch support
- GetAHOReadSetExportJob - Get status and details of a read set export job
- ListAHOReadSetExportJobs - List export jobs for a sequence store with pagination
- ActivateAHOReadSets - Activate archived read sets for analysis access
- ListAHOReferenceStores - List reference stores with optional name filtering and pagination
- GetAHOReferenceStore - Get detailed information about a specific reference store
- ListAHOReferences - List references in a reference store with optional name and status filtering
- GetAHOReferenceMetadata - Get detailed metadata for a specific reference including file information
- StartAHOReferenceImportJob - Import reference files from S3 into a reference store with batch support
- GetAHOReferenceImportJob - Get status and details of a reference import job including per-source statuses
- ListAHOReferenceImportJobs - List import jobs for a reference store with pagination
- CreateAHOConfiguration - Create a new HealthOmics configuration for workflow runs with optional run settings, description, and tags
- GetAHOConfiguration - Get detailed information about a specific configuration including run settings and status
- ListAHOConfigurations - List available configurations with pagination support
- DeleteAHOConfiguration - Delete a configuration
- GetAHOSupportedRegions - List AWS regions where HealthOmics is available
This MCP server enables AI assistants like Kiro, Cline, Cursor, and Windsurf to help users with AWS HealthOmics genomic workflow management. Here's how to effectively use these tools:
AWS HealthOmics is designed for genomic data analysis workflows. Key concepts:
- Workflows: Computational pipelines written in WDL, CWL, or Nextflow that process genomic data
- Runs: Executions of workflows with specific input parameters and data
- Tasks: Individual steps within a workflow run
- Storage Types: STATIC (fixed storage) or DYNAMIC (auto-scaling storage)
- Creating Workflows:
  - From local files: Use `PackageAHOWorkflow` to bundle workflow files, then use the base64-encoded ZIP with `CreateAHOWorkflow`
  - From S3: Store your workflow definition ZIP file in S3 and reference it using the `definition_uri` parameter
  - Validate workflows with appropriate language syntax (WDL, CWL, Nextflow)
  - Include parameter templates to guide users on required inputs
  - Choose the appropriate method based on your workflow storage preferences
- S3 URI Support:
  - Both `CreateAHOWorkflow` and `CreateAHOWorkflowVersion` support S3 URIs as an alternative to base64-encoded ZIP files
  - Benefits of S3 URIs:
    - Better for large workflow definitions (no base64 encoding overhead)
    - Easier integration with CI/CD pipelines that store artifacts in S3
    - Reduced memory usage during workflow creation
    - Direct reference to existing S3-stored workflow definitions
  - Requirements:
    - The S3 URI must start with `s3://`
    - The S3 bucket must be in the same region as the HealthOmics service
    - Appropriate S3 permissions must be configured for the HealthOmics service
  - Usage: Specify either `definition_source` (local ZIP path, S3 URI, or base64 content) or `definition_uri`, but not both. The legacy `definition_zip_base64` parameter is still accepted as a deprecated alias.
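As a rough illustration of what packaging produces, the following Python sketch bundles a workflow directory into the base64-encoded ZIP string that `CreateAHOWorkflow` accepts. The helper name and layout are illustrative, not the server's actual implementation.

```python
import base64
import io
import zipfile
from pathlib import Path


def package_workflow(workflow_dir: str) -> str:
    """Bundle a workflow directory into a base64-encoded ZIP string
    (what PackageAHOWorkflow produces; illustrative sketch only)."""
    buf = io.BytesIO()
    root = Path(workflow_dir)
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the workflow root so imports resolve
                zf.write(path, path.relative_to(root))
    return base64.b64encode(buf.getvalue()).decode("ascii")
```

The resulting string can be passed as base64 content via `definition_source`; for large workflows, uploading the ZIP to S3 and using an S3 URI avoids the encoding overhead entirely.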
- Version Management:
  - Create new versions for workflow updates rather than modifying existing ones
  - Use descriptive version names that indicate changes or improvements
  - List versions to help users choose the appropriate one
  - Both base64 ZIP and S3 URI methods are supported for version creation
- Starting Runs:
  - Always specify the required parameters: workflow_id, role_arn, name, output_uri
  - Choose an appropriate storage type (DYNAMIC is recommended for most cases)
  - Use meaningful run names for easy identification
  - Configure caching when appropriate to save costs and time
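Under the hood, starting a run maps to the `start_run` API of the boto3 `omics` client. A hedged sketch — the run name, parameter keys, and all IDs/ARNs below are placeholders for your own resources:

```python
def start_alignment_run(omics, workflow_id, role_arn, output_uri):
    """Start a HealthOmics workflow run. `omics` is a boto3 'omics'
    client, e.g. boto3.client("omics"). Parameter keys depend on the
    workflow's parameter template; the ones here are hypothetical."""
    response = omics.start_run(
        workflowId=workflow_id,
        roleArn=role_arn,               # execution role with S3/logs access
        name="na12878-alignment",       # meaningful name for identification
        outputUri=output_uri,           # e.g. "s3://my-genomics-data/runs/"
        storageType="DYNAMIC",          # auto-scaling storage, recommended default
        parameters={"fastq_r1": "s3://my-genomics-data/input/sample_R1.fastq.gz"},
    )
    return response["id"]
```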
- Monitoring Runs:
  - Use `ListAHORuns` with status filters to track active workflows
  - Check individual run details with `GetAHORun` for comprehensive status
  - Monitor tasks with `ListAHORunTasks` to identify bottlenecks
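A simple polling loop over the API behind `GetAHORun` might look like the sketch below. The terminal-status set reflects the HealthOmics run lifecycle; confirm it against the service documentation before relying on it:

```python
import time

# Plausible terminal states for a HealthOmics run (verify against the API docs).
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "DELETED"}


def wait_for_run(omics, run_id, poll_seconds=30):
    """Poll omics.get_run until the run reaches a terminal state.
    `omics` is a boto3 'omics' client; sketch, not part of the server."""
    while True:
        status = omics.get_run(id=run_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
```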
When workflows fail, follow this diagnostic approach:
- Start with `DiagnoseAHORunFailure`: This comprehensive tool provides:
  - Failure reasons and error analysis
  - Failed task identification
  - Log summaries and recommendations
  - Actionable troubleshooting steps
- Access Specific Logs:
  - Run Logs: High-level workflow events and status changes
  - Engine Logs: Workflow engine STDOUT/STDERR for system-level issues
  - Task Logs: Individual task execution details for specific failures
  - Manifest Logs: Resource utilization and workflow summary information
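These logs live in CloudWatch Logs and can also be read directly. The log group and stream naming below (`/aws/omics/WorkflowLog`, `run/<runId>/task/<taskId>`) reflect HealthOmics' CloudWatch layout as I understand it; verify the names in your account before depending on them:

```python
def fetch_task_log_tail(logs, run_id, task_id, limit=50):
    """Fetch the newest log events for a workflow task.
    `logs` is a boto3 CloudWatch Logs client (boto3.client("logs")).
    Group/stream names are assumptions drawn from HealthOmics' layout."""
    events = logs.get_log_events(
        logGroupName="/aws/omics/WorkflowLog",
        logStreamName=f"run/{run_id}/task/{task_id}",
        limit=limit,
        startFromHead=False,  # tail from the newest events
    )["events"]
    return [e["message"] for e in events]
```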
- Performance Analysis:
  - Use `AnalyzeAHORunPerformance` to identify resource bottlenecks
  - Review task resource utilization patterns
  - Optimize workflow parameters based on analysis results
The MCP server includes built-in workflow linting capabilities for validating WDL and CWL workflows before deployment:
- Lint Workflow Definitions:
  - Single files: Use `LintAHOWorkflowDefinition` for individual workflow files
  - Multi-file bundles: Use `LintAHOWorkflowBundle` for workflows with imports and dependencies
  - Syntax errors: Catch parsing issues before deployment
  - Missing components: Identify missing inputs, outputs, or steps
  - Runtime requirements: Ensure tasks have proper runtime specifications
  - Import resolution: Validate imports and dependencies between files
  - Best practices: Get warnings about potential improvements
- Supported Formats:
  - WDL: Uses miniwdl for comprehensive validation
  - CWL: Uses cwltool for standards-compliant validation
- No Additional Installation Required: Both miniwdl and cwltool are included as dependencies and available immediately after installing the MCP server.
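For reference, the equivalent command-line invocations of the bundled linters can be sketched as follows. The server uses these tools internally; this wrapper is illustrative, and the CLI commands shown (`miniwdl check`, `cwltool --validate`) are the tools' standard validation entry points:

```python
import subprocess


def lint_command(path):
    """Choose the bundled linter for a workflow file: miniwdl for WDL,
    cwltool for CWL, matching the formats the server validates."""
    if path.endswith(".wdl"):
        return ["miniwdl", "check", path]
    if path.endswith(".cwl"):
        return ["cwltool", "--validate", path]
    raise ValueError(f"unsupported workflow file: {path}")


def lint_workflow(path):
    """Run the linter and return (passed, combined output)."""
    result = subprocess.run(lint_command(path), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```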
The MCP server includes a powerful genomics file search tool that helps users locate and discover genomics files across multiple storage systems:
- Multi-Storage Search:
  - S3 Buckets: Search configured S3 bucket paths for genomics files
  - HealthOmics Sequence Stores: Discover read sets and their associated files
  - HealthOmics Reference Stores: Find reference genomes and associated indexes
  - Unified Results: Get combined, deduplicated results from all storage systems
- Intelligent Pattern Matching:
  - File Path Matching: Search against S3 object keys and HealthOmics resource names
  - Tag-Based Search: Match against S3 object tags and HealthOmics metadata
  - Fuzzy Matching: Find files even with partial or approximate search terms
  - Multiple Terms: Support for multiple search terms with logical matching
- Automatic File Association:
  - BAM/CRAM Indexes: Automatically group BAM files with their .bai indexes and CRAM files with .crai indexes
  - FASTQ Pairs: Detect and group R1/R2 read pairs using standard naming conventions (_R1/_R2, _1/_2)
  - FASTA Indexes: Associate FASTA files with their .fai, .dict, and BWA index collections
  - Variant Indexes: Group VCF/GVCF files with their .tbi and .csi index files
  - Complete File Sets: Identify complete genomics file collections for analysis pipelines
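The grouping conventions above can be sketched in a few lines of Python. The regex and suffix lists here are simplified relative to the server's actual detection logic:

```python
import re
from collections import defaultdict


def pair_fastqs(keys):
    """Group R1/R2 FASTQ files by the standard _R1/_R2 and _1/_2 conventions."""
    groups = defaultdict(list)
    for key in keys:
        # Strip the read-number token to obtain a shared base name.
        base = re.sub(r"_R?[12](?=[._])", "", key)
        groups[base].append(key)
    return {b: sorted(ks) for b, ks in groups.items() if len(ks) == 2}


def find_index(key, all_keys):
    """Locate a companion index file (BAM -> .bai, CRAM -> .crai, VCF -> .tbi/.csi)."""
    candidates = {
        ".bam": [key + ".bai", key[:-4] + ".bai"],
        ".cram": [key + ".crai"],
        ".vcf.gz": [key + ".tbi", key + ".csi"],
    }
    for suffix, indexes in candidates.items():
        if key.endswith(suffix):
            return next((i for i in indexes if i in all_keys), None)
    return None
```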
- Smart Relevance Scoring:
  - Pattern Match Quality: Higher scores for exact matches, lower for fuzzy matches
  - File Type Relevance: Boost scores for files matching the requested type
  - Associated Files Bonus: Increase scores for files with complete index sets
  - Storage Accessibility: Consider storage class (Standard vs. Glacier) in scoring
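A toy version of such a scoring function, with invented weights, might look like this (the server's actual weights and fuzzy matcher are not documented here):

```python
from difflib import SequenceMatcher


def relevance_score(key, terms, requested_type=None, file_type=None,
                    has_indexes=False, storage_class="STANDARD"):
    """Rank a file against search terms: exact substring matches score
    highest, fuzzy matches lower; bonuses for matching file type and
    complete index sets; archived storage is penalized. Weights invented."""
    score = 0.0
    lowered = key.lower()
    for term in terms:
        t = term.lower()
        if t in lowered:
            score += 1.0  # exact path match
        else:
            score += 0.5 * SequenceMatcher(None, t, lowered).ratio()  # fuzzy match
    if requested_type and file_type == requested_type:
        score += 0.5   # file-type relevance boost
    if has_indexes:
        score += 0.25  # complete index-set bonus
    if storage_class in ("GLACIER", "DEEP_ARCHIVE"):
        score -= 0.5   # archived data requires retrieval first
    return score
```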
- Comprehensive File Metadata:
  - Access Paths: S3 URIs or HealthOmics S3 access point paths for direct data access
  - File Characteristics: Size, storage class, last modified date, and file type detection
  - Storage Information: Archive status and retrieval requirements
  - Source System: Clear indication of whether files are from S3, sequence stores, or reference stores
- Configuration and Setup:
  - S3 Bucket Configuration: Set the `GENOMICS_SEARCH_S3_BUCKETS` environment variable to a comma-separated list of bucket paths
  - Example: `GENOMICS_SEARCH_S3_BUCKETS=s3://my-genomics-data/,s3://shared-references/hg38/`
  - Permissions: Ensure appropriate S3 and HealthOmics read permissions
  - Performance: Searches run in parallel across storage systems for optimal response times
- Performance Optimizations:
  - Smart S3 API Usage: Minimizes S3 API calls by 60-90% through intelligent caching and batching
  - Lazy Tag Loading: Only retrieves S3 object tags when needed for pattern matching
  - Result Caching: Caches search results to eliminate repeated S3 calls for identical searches
  - Batch Operations: Retrieves tags for multiple objects in parallel batches
  - Configurable Performance: Tune cache TTLs, batch sizes, and tag search behavior for your use case
  - Path-First Matching: Prioritizes file path matching over tag matching to reduce API calls
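The caching behavior described above, including a TTL of 0 disabling the cache entirely, can be illustrated with a minimal TTL cache. This is a sketch, not the server's implementation:

```python
import time


class TTLCache:
    """Minimal time-to-live cache of the kind used for tag and result
    caching (illustrative only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() < expires:
            return value
        self._store.pop(key, None)  # expired entry
        return None

    def put(self, key, value):
        if self.ttl <= 0:
            return  # a TTL of 0 disables caching, as documented
        self._store[key] = (value, time.monotonic() + self.ttl)
```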
- Find FASTQ Files for a Sample:
  - User: "Find all FASTQ files for sample NA12878"
  - Use `SearchGenomicsFiles` with `file_type="fastq"` and `search_terms=["NA12878"]`
  - Returns R1/R2 pairs automatically grouped together
  - Includes file sizes and storage locations
- Locate Reference Genomes:
  - User: "Find human reference genome hg38 files"
  - Use `SearchGenomicsFiles` with `file_type="fasta"` and `search_terms=["hg38", "human"]`
  - Returns FASTA files with associated .fai, .dict, and BWA indexes
  - Provides S3 access point paths for HealthOmics reference stores
- Search for Alignment Files:
  - User: "Find BAM files from the 1000 Genomes project"
  - Use `SearchGenomicsFiles` with `file_type="bam"` and `search_terms=["1000", "genomes"]`
  - Returns BAM files with their .bai index files
  - Ranked by relevance with complete file metadata
- Discover Variant Files:
  - User: "Locate VCF files containing SNP data"
  - Use `SearchGenomicsFiles` with `file_type="vcf"` and `search_terms=["SNP"]`
  - Returns VCF files with associated .tbi index files
  - Includes both S3 and HealthOmics store results
The genomics file search includes several optimizations to minimize S3 API calls and improve performance:
- For Path-Based Searches (Recommended):

  ```bash
  # Use specific file/sample names in search terms;
  # this enables path matching without tag retrieval.
  GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH=true  # Keep enabled for fallback
  GENOMICS_SEARCH_RESULT_CACHE_TTL=600       # Cache results for 10 minutes
  ```

- For Tag-Heavy Environments:

  ```bash
  # Optimize batch sizes for your dataset.
  GENOMICS_SEARCH_MAX_TAG_BATCH_SIZE=200  # Larger batches for better performance
  GENOMICS_SEARCH_TAG_CACHE_TTL=900       # Longer tag cache for frequently accessed objects
  ```

- For Cost-Sensitive Environments:

  ```bash
  # Disable tag search if only path matching is needed.
  GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH=false  # Eliminates all tag API calls
  GENOMICS_SEARCH_RESULT_CACHE_TTL=1800       # Longer result cache to reduce repeated searches
  ```

- For Development/Testing:

  ```bash
  # Disable caching for immediate results during development.
  GENOMICS_SEARCH_RESULT_CACHE_TTL=0     # No result caching
  GENOMICS_SEARCH_TAG_CACHE_TTL=0        # No tag caching
  GENOMICS_SEARCH_MAX_TAG_BATCH_SIZE=50  # Smaller batches for testing
  ```
Performance Impact: These optimizations can reduce S3 API calls by 60-90% and improve search response times by 5-10x compared to the unoptimized implementation.
- Workflow Development:
  - User: "Help me create a new genomic variant calling workflow"
  - Option A: Use `PackageAHOWorkflow` to bundle files, then `CreateAHOWorkflow` with the base64 ZIP
  - Option B: Upload the workflow ZIP to S3, then `CreateAHOWorkflow` with an S3 URI
  - Validate syntax and parameters
  - Choose the method based on workflow size and storage preferences
- Production Execution:
  - User: "Run my alignment workflow on these FASTQ files"
  - Use `SearchGenomicsFiles` to find FASTQ files for the run
  - Use `StartAHORun` with appropriate parameters
  - Monitor with `ListAHORuns` and `GetAHORun`
  - Track task progress with `ListAHORunTasks`
- Troubleshooting:
  - User: "My workflow failed, what went wrong?"
  - Use `DiagnoseAHORunFailure` for comprehensive analysis
  - Access specific logs based on the failure type
  - Provide actionable remediation steps
- Performance Optimization:
  - User: "How can I make my workflow run faster?"
  - Use `AnalyzeAHORunPerformance` to identify bottlenecks
  - Review resource utilization patterns
  - Suggest optimization strategies
- Workflow Validation:
  - User: "Check if my WDL workflow is valid"
  - Use `LintAHOWorkflowDefinition` for single files
  - Use `LintAHOWorkflowBundle` for multi-file workflows with imports
  - Check for missing inputs, outputs, or runtime requirements
  - Validate import resolution and dependencies
  - Get detailed error messages and warnings
- IAM Permissions: Ensure proper IAM roles with HealthOmics permissions
- Regional Availability: Use `GetAHOSupportedRegions` to verify service availability
- Cost Management: Monitor storage and compute costs, especially with STATIC storage
- Data Security: Follow genomic data handling best practices and compliance requirements
- Resource Limits: Be aware of service quotas and limits for concurrent runs
When tools return errors:
- Check AWS credentials and permissions
- Verify resource IDs (workflow_id, run_id, task_id) are valid
- Ensure proper parameter formatting and required fields
- Use diagnostic tools to understand failure root causes
- Provide clear, actionable error messages to users
Install using uvx:

```bash
uvx awslabs.aws-healthomics-mcp-server
```

Or install from source:

```bash
git clone <repository-url>
cd mcp/src/aws-healthomics-mcp-server
uv sync
uv run -m awslabs.aws_healthomics_mcp_server.server
```

- `AWS_REGION` - AWS region for HealthOmics operations (default: us-east-1)
- `AWS_PROFILE` - AWS profile for authentication
- `FASTMCP_LOG_LEVEL` - Server logging level (default: WARNING)
- `HEALTHOMICS_DEFAULT_MAX_RESULTS` - Default maximum number of results for paginated API calls (default: 10)
- `GENOMICS_SEARCH_S3_BUCKETS` - Comma-separated list of S3 bucket paths to search for genomics files (e.g., "s3://my-genomics-data/,s3://shared-references/")
- `GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH` - Enable/disable S3 tag-based searching (default: true)
  - Set to `false` to disable tag retrieval and use only path-based matching
  - Significantly reduces S3 API calls when tag matching is not needed
- `GENOMICS_SEARCH_MAX_TAG_BATCH_SIZE` - Maximum objects to retrieve tags for in a single batch (default: 100)
  - Larger values improve performance for tag-heavy searches but use more memory
  - Smaller values reduce memory usage but may increase API call latency
- `GENOMICS_SEARCH_RESULT_CACHE_TTL` - Result cache TTL in seconds (default: 600)
  - Set to `0` to disable result caching
  - Caches complete search results to eliminate repeated S3 calls for identical searches
- `GENOMICS_SEARCH_TAG_CACHE_TTL` - Tag cache TTL in seconds (default: 300)
  - Set to `0` to disable tag caching
  - Caches individual object tags to avoid duplicate retrievals across searches
- `GENOMICS_SEARCH_MAX_CONCURRENT` - Maximum concurrent S3 bucket searches (default: 10)
- `GENOMICS_SEARCH_TIMEOUT_SECONDS` - Search timeout in seconds (default: 300)
- `GENOMICS_SEARCH_ENABLE_HEALTHOMICS` - Enable/disable HealthOmics sequence/reference store searches (default: true)
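A sketch of reading these variables with their documented defaults; the helper function itself is hypothetical, but the variable names and defaults come from the list above:

```python
import os


def load_search_config(env=os.environ):
    """Read the genomics-search environment variables with the
    documented defaults (illustrative helper, not the server's code)."""
    def flag(name, default):
        return env.get(name, str(default)).strip().lower() == "true"

    return {
        "buckets": [b for b in env.get("GENOMICS_SEARCH_S3_BUCKETS", "").split(",") if b.strip()],
        "tag_search": flag("GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH", True),
        "tag_batch_size": int(env.get("GENOMICS_SEARCH_MAX_TAG_BATCH_SIZE", "100")),
        "result_cache_ttl": int(env.get("GENOMICS_SEARCH_RESULT_CACHE_TTL", "600")),
        "tag_cache_ttl": int(env.get("GENOMICS_SEARCH_TAG_CACHE_TTL", "300")),
        "max_concurrent": int(env.get("GENOMICS_SEARCH_MAX_CONCURRENT", "10")),
        "timeout_seconds": int(env.get("GENOMICS_SEARCH_TIMEOUT_SECONDS", "300")),
        "healthomics": flag("GENOMICS_SEARCH_ENABLE_HEALTHOMICS", True),
    }
```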
Note for Large S3 Buckets: When searching very large S3 buckets (millions of objects), the genomics file search may take longer than the default MCP client timeout. If you encounter timeout errors, increase the MCP server timeout by adding a `"timeout"` property to your MCP server configuration (e.g., `"timeout": 300000` for five minutes, specified in milliseconds). This is particularly important when using the search tool with extensive S3 bucket configurations or when `GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH=true` is used with large datasets. The `"timeout"` value should always exceed `GENOMICS_SEARCH_TIMEOUT_SECONDS` so that the MCP timeout does not preempt the genomics search timeout.
- `AGENT` - Agent identifier appended to the User-Agent string on all boto3 API calls as `agent/<value>` (optional)
  - Use case: Attributing API calls to specific AI agents for traceability via CloudTrail and AWS service logs
  - Behavior: When set, the value is sanitized to visible ASCII characters (0x20-0x7E), stripped of leading/trailing whitespace, lowercased, and appended to the User-Agent header as `agent/<value>`
  - Validation: Empty, whitespace-only, or values that become empty after sanitization are treated as unset
  - Example: `export AGENT=KIRO` produces `User-Agent: ... agent/kiro`
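The documented sanitization steps can be expressed directly in Python; this is an illustrative reimplementation, not the server's code:

```python
def sanitize_agent(raw):
    """Reproduce the documented AGENT handling: keep visible ASCII
    (0x20-0x7E), strip whitespace, lowercase; empty results mean unset.
    Returns the 'agent/<value>' User-Agent suffix, or None if unset."""
    if raw is None:
        return None
    visible = "".join(ch for ch in raw if 0x20 <= ord(ch) <= 0x7E)
    value = visible.strip().lower()
    return f"agent/{value}" if value else None
```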
The following environment variables are primarily intended for testing scenarios, such as integration testing against mock service endpoints:
- `HEALTHOMICS_SERVICE_NAME` - Override the AWS service name used by the HealthOmics client (default: omics)
  - Use case: Testing against mock services or alternative implementations
  - Validation: Cannot be empty or whitespace-only; falls back to the default with a warning if invalid
  - Example: `export HEALTHOMICS_SERVICE_NAME=omics-mock`
- `HEALTHOMICS_ENDPOINT_URL` - Override the endpoint URL used by the HealthOmics client
  - Use case: Integration testing against local mock services or alternative endpoints
  - Validation: Must begin with `http://` or `https://`; ignored with a warning if invalid
  - Example: `export HEALTHOMICS_ENDPOINT_URL=http://localhost:8080`
  - Note: Only affects the HealthOmics client; other AWS services use default endpoints
Important: These testing configuration variables should only be used in development and testing environments. In production, always use the default AWS HealthOmics service endpoints for security and reliability.
This server requires AWS credentials with appropriate permissions for HealthOmics operations. Configure using:
- AWS CLI: `aws configure`
- Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- IAM roles (recommended for EC2/Lambda)
- AWS profiles: Set the `AWS_PROFILE` environment variable
The following IAM permissions are required:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "omics:ListWorkflows",
        "omics:CreateWorkflow",
        "omics:GetWorkflow",
        "omics:CreateWorkflowVersion",
        "omics:ListWorkflowVersions",
        "omics:StartRun",
        "omics:ListRuns",
        "omics:GetRun",
        "omics:ListRunTasks",
        "omics:GetRunTask",
        "omics:CreateRunGroup",
        "omics:GetRunGroup",
        "omics:ListRunGroups",
        "omics:UpdateRunGroup",
        "omics:CreateRunCache",
        "omics:GetRunCache",
        "omics:ListRunCaches",
        "omics:UpdateRunCache",
        "omics:ListSequenceStores",
        "omics:ListReadSets",
        "omics:GetReadSetMetadata",
        "omics:ListReferenceStores",
        "omics:ListReferences",
        "omics:GetReferenceMetadata",
        "omics:CreateSequenceStore",
        "omics:GetSequenceStore",
        "omics:UpdateSequenceStore",
        "omics:StartReadSetImportJob",
        "omics:GetReadSetImportJob",
        "omics:ListReadSetImportJobs",
        "omics:StartReadSetExportJob",
        "omics:GetReadSetExportJob",
        "omics:ListReadSetExportJobs",
        "omics:StartReadSetActivationJob",
        "omics:GetReferenceStore",
        "omics:StartReferenceImportJob",
        "omics:GetReferenceImportJob",
        "omics:ListReferenceImportJobs",
        "omics:CreateConfiguration",
        "omics:GetConfiguration",
        "omics:ListConfigurations",
        "omics:DeleteConfiguration",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectTagging",
        "s3:HeadBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*genomics*",
        "arn:aws:s3:::*genomics*/*",
        "arn:aws:s3:::*omics*",
        "arn:aws:s3:::*omics*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": "arn:aws:iam::*:role/HealthOmicsExecutionRole*"
    }
  ]
}
```

Note: The S3 permissions above use wildcard patterns for genomics-related buckets. In production, replace these with the specific bucket ARNs you want to search. For example:
```json
{
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket",
    "s3:GetObject",
    "s3:GetObjectTagging",
    "s3:HeadBucket"
  ],
  "Resource": [
    "arn:aws:s3:::my-genomics-data",
    "arn:aws:s3:::my-genomics-data/*",
    "arn:aws:s3:::shared-references",
    "arn:aws:s3:::shared-references/*"
  ]
}
```

See the Kiro IDE documentation or the Kiro CLI documentation for details.
For global configuration, edit `~/.kiro/settings/mcp.json`. For project-specific configuration, edit `.kiro/settings/mcp.json` in your project directory.
Add to your Kiro MCP configuration (`~/.kiro/settings/mcp.json`):
```json
{
  "mcpServers": {
    "aws-healthomics": {
      "command": "uvx",
      "args": ["awslabs.aws-healthomics-mcp-server"],
      "timeout": 300000,
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "your-profile",
        "HEALTHOMICS_DEFAULT_MAX_RESULTS": "10",
        "AGENT": "kiro",
        "GENOMICS_SEARCH_S3_BUCKETS": "s3://my-genomics-data/,s3://shared-references/",
        "GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH": "true",
        "GENOMICS_SEARCH_MAX_TAG_BATCH_SIZE": "100",
        "GENOMICS_SEARCH_RESULT_CACHE_TTL": "600",
        "GENOMICS_SEARCH_TAG_CACHE_TTL": "300"
      }
    }
  }
}
```

For integration testing against mock services:
```json
{
  "mcpServers": {
    "aws-healthomics-test": {
      "command": "uvx",
      "args": ["awslabs.aws-healthomics-mcp-server"],
      "timeout": 300000,
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "test-profile",
        "HEALTHOMICS_SERVICE_NAME": "omics-mock",
        "HEALTHOMICS_ENDPOINT_URL": "http://localhost:8080",
        "GENOMICS_SEARCH_S3_BUCKETS": "s3://test-genomics-data/",
        "GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH": "false",
        "GENOMICS_SEARCH_RESULT_CACHE_TTL": "0",
        "FASTMCP_LOG_LEVEL": "DEBUG"
      }
    }
  }
}
```

Configure according to your client's documentation, using:
- Command: `uvx`
- Args: `["awslabs.aws-healthomics-mcp-server"]`
- Environment variables as needed
For Windows users, the MCP server configuration format is slightly different:
```json
{
  "mcpServers": {
    "awslabs.aws-healthomics-mcp-server": {
      "disabled": false,
      "timeout": 300000,
      "type": "stdio",
      "command": "uv",
      "args": [
        "tool",
        "run",
        "--from",
        "awslabs.aws-healthomics-mcp-server@latest",
        "awslabs.aws-healthomics-mcp-server.exe"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "AWS_PROFILE": "your-aws-profile",
        "AWS_REGION": "us-east-1",
        "GENOMICS_SEARCH_S3_BUCKETS": "s3://my-genomics-data/,s3://shared-references/",
        "GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH": "true",
        "GENOMICS_SEARCH_MAX_TAG_BATCH_SIZE": "100",
        "GENOMICS_SEARCH_RESULT_CACHE_TTL": "600",
        "GENOMICS_SEARCH_TAG_CACHE_TTL": "300"
      }
    }
  }
}
```

For testing scenarios on Windows:
```json
{
  "mcpServers": {
    "awslabs.aws-healthomics-mcp-server-test": {
      "disabled": false,
      "timeout": 300000,
      "type": "stdio",
      "command": "uv",
      "args": [
        "tool",
        "run",
        "--from",
        "awslabs.aws-healthomics-mcp-server@latest",
        "awslabs.aws-healthomics-mcp-server.exe"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "DEBUG",
        "AWS_PROFILE": "test-profile",
        "AWS_REGION": "us-east-1",
        "HEALTHOMICS_SERVICE_NAME": "omics-mock",
        "HEALTHOMICS_ENDPOINT_URL": "http://localhost:8080",
        "GENOMICS_SEARCH_S3_BUCKETS": "s3://test-genomics-data/",
        "GENOMICS_SEARCH_ENABLE_S3_TAG_SEARCH": "false",
        "GENOMICS_SEARCH_RESULT_CACHE_TTL": "0"
      }
    }
  }
}
```

```bash
git clone <repository-url>
cd aws-healthomics-mcp-server
uv sync
```

```bash
# Run tests with coverage
uv run pytest --cov --cov-branch --cov-report=term-missing

# Run specific test file
uv run pytest tests/test_server.py -v
```

```bash
# Format code
uv run ruff format

# Lint code
uv run ruff check

# Type checking
uv run pyright
```

Contributions are welcome! Please see the contributing guidelines for more information.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.