
@Edison-A-N
Contributor

Overview

Adds fault tolerance support for handling unavailable MCP servers when using shared MCP config files.

Motivation

When sharing MCP config files across teams/projects, some MCP servers may become unavailable due to poor maintenance or temporary issues. Currently, a single failed server connection causes the entire adapter to fail, blocking access to all other working servers.

Discussion point: Should MCP server availability be managed solely by the config provider, or should the SDK help manage failures gracefully?

Changes

  • fail_fast parameter (default: True):
    • True: maintains backward compatibility; any connection failure raises an exception
    • False: skips failed connections and continues with the available servers
  • on_connection_error callback: optional error handler called for each failed connection, with (server_params, exception) as arguments
  • failed_connections tracking: a list of (server_params, exception) tuples, kept for transparency
  • Individual connection handling: replaced the batch connection with a separate try/except per server, so one failing server does not take down the others
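The per-server handling described above can be sketched with `contextlib.AsyncExitStack`: each session is entered separately inside its own try/except, so one unreachable server no longer aborts the rest. This is a minimal, self-contained illustration, not the PR's code. `open_all`, `fake_session` and `demo` are hypothetical names, the fake session stands in for a real MCP client session, and invoking the callback before re-raising in fail-fast mode is an assumption.

```python
from __future__ import annotations

import asyncio
from contextlib import AbstractAsyncContextManager, AsyncExitStack, asynccontextmanager
from typing import Any, Callable, Optional

ErrorCallback = Callable[[Any, Exception], None]


async def open_all(
    stack: AsyncExitStack,
    server_params_list: list[Any],
    open_session: Callable[[Any], AbstractAsyncContextManager[Any]],
    fail_fast: bool = True,
    on_connection_error: Optional[ErrorCallback] = None,
) -> tuple[list[Any], list[tuple[Any, Exception]]]:
    """Hypothetical helper: enter each server's session individually."""
    sessions: list[Any] = []
    failed_connections: list[tuple[Any, Exception]] = []
    for params in server_params_list:
        try:
            # A failure here only affects this server; sessions entered
            # earlier stay registered on the stack and are closed by it.
            session = await stack.enter_async_context(open_session(params))
        except Exception as exc:
            failed_connections.append((params, exc))
            if on_connection_error is not None:
                on_connection_error(params, exc)  # assumed: called in both modes
            if fail_fast:
                raise  # default: any failure aborts, as before
            continue  # tolerant mode: skip this server
        sessions.append(session)
    return sessions, failed_connections


@asynccontextmanager
async def fake_session(params: str):
    """Hypothetical stand-in for a real MCP client session."""
    if params.startswith("broken"):
        raise ConnectionError(f"{params} is unavailable")
    yield f"session:{params}"


async def demo(fail_fast: bool):
    async with AsyncExitStack() as stack:
        return await open_all(
            stack, ["a", "broken", "b"], fake_session, fail_fast=fail_fast
        )
```

With `asyncio.run(demo(False))` the sessions for `a` and `b` are returned along with one `("broken", ConnectionError(...))` entry, while `asyncio.run(demo(True))` re-raises the `ConnectionError` after the exit stack closes the session already opened for `a`. Entering servers one at a time, rather than in a single batch, is what makes the isolation straightforward.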

Usage Example

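import logging

from mcpadapt.core import MCPAdapt

logger = logging.getLogger(__name__)

def log_connection_error(params, exception):
    logger.warning(f"Failed to connect to {params}: {exception}")

# server1_params..server3_params and MyAdapter are placeholders for your
# own server parameters and adapter implementation.
adapt = MCPAdapt(
    [server1_params, server2_params, server3_params],
    adapter=MyAdapter(),
    fail_fast=False,  # Continue even if some servers fail
    on_connection_error=log_connection_error,
)
with adapt as tools:
    # Only tools from successfully connected servers are available here
    for params, exception in adapt.failed_connections:
        logger.info(f"Skipped {params}: {exception}")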
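Because the callback receives `(server_params, exception)`, any callable with that signature can be passed as `on_connection_error`. A minimal sketch, assuming you want to both log failures and refuse to proceed when too few servers connected; `ConnectionErrorCollector` and `require_min_connected` are hypothetical helpers, not part of the PR:

```python
import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger("mcp_connections")


@dataclass
class ConnectionErrorCollector:
    """Hypothetical on_connection_error handler: logs and records each failure."""

    errors: list = field(default_factory=list)

    def __call__(self, params: Any, exception: Exception) -> None:
        logger.warning("Failed to connect to %s: %s", params, exception)
        self.errors.append((params, exception))


def require_min_connected(total: int, failed: list, minimum: int = 1) -> int:
    """Hypothetical guard: raise unless at least `minimum` servers connected."""
    connected = total - len(failed)
    if connected < minimum:
        raise RuntimeError(
            f"only {connected} of {total} MCP servers connected (need {minimum})"
        )
    return connected
```

An instance could be passed as `on_connection_error=collector`, with `require_min_connected(len(servers), collector.errors)` called at the top of the `with` block, so a run where every server failed stops early instead of silently continuing with no tools.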

Testing

Tests added in test_core_fault_tolerance.py cover:

  • Default fail_fast=True behavior
  • Fault tolerance with fail_fast=False
  • Error callback functionality
  • Failed connection tracking
  • Both sync and async context managers

- Add fail_fast option to control connection failure behavior
- Add on_connection_error callback for custom error handling
- Track failed connections for better debugging
- Try connecting to each server individually for graceful degradation
- Add get_connection_summary() method for connection status overview
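The commit list mentions a `get_connection_summary()` method, but its return format is not shown here. As a hypothetical illustration only, a summary could be derived from the configured server list and the `(server_params, exception)` tuples in `failed_connections`; the function name `connection_summary` and the dictionary keys below are assumptions:

```python
def connection_summary(server_params_list: list, failed_connections: list) -> dict:
    """Hypothetical summary built from (server_params, exception) tuples."""
    return {
        "total": len(server_params_list),
        "connected": len(server_params_list) - len(failed_connections),
        "failed": len(failed_connections),
        # One readable line per failure, e.g. "b: TimeoutError: timed out"
        "errors": [
            f"{params}: {type(exc).__name__}: {exc}"
            for params, exc in failed_connections
        ],
    }
```

This assumes each server appears once in the configured list, so the connected count is simply the total minus the failures.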