feat: add fault tolerance for MCP server connections #76
Overview
Adds fault tolerance support for handling unavailable MCP servers when using shared MCP config files.
Motivation
When sharing MCP config files across teams/projects, some MCP servers may become unavailable due to poor maintenance or temporary issues. Currently, a single failed server connection causes the entire adapter to fail, blocking access to all other working servers.
Discussion point: Should MCP server availability be managed solely by the config provider, or should the SDK help manage failures gracefully?
Changes
- fail_fast parameter (default: True):
  - True: Maintains backward compatibility - any connection failure raises an exception
  - False: Skips failed connections and continues with available servers
- on_connection_error callback: Optional error handler called for each failed connection with (server_params, exception) as arguments
- failed_connections tracking: List of tuples (server_params, exception) for transparency
- Individual connection handling: Changed from batch connection to individual try-catch for better fault isolation
Usage Example
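A minimal, self-contained sketch of the behavior listed under Changes. It does not use the adapter's actual API: connect_all and fake_connect are hypothetical stand-ins, and only the names fail_fast, on_connection_error, and failed_connections come from this PR. Calling the callback before the exception is re-raised under fail_fast=True is an assumption of this sketch.

```python
from typing import Any, Callable, Optional

# Callback signature described in this PR: (server_params, exception).
ErrorCallback = Callable[[Any, Exception], None]


def connect_all(
    server_params_list: list,
    connect: Callable[[Any], Any],
    fail_fast: bool = True,
    on_connection_error: Optional[ErrorCallback] = None,
) -> tuple:
    """Hypothetical helper: connect to each server in its own try/except.

    Returns (connections, failed_connections), where failed_connections
    is a list of (server_params, exception) tuples.
    """
    connections = []
    failed_connections = []
    for params in server_params_list:
        try:
            connections.append(connect(params))
        except Exception as exc:
            # Assumption: the callback is notified even when fail_fast=True.
            if on_connection_error is not None:
                on_connection_error(params, exc)
            if fail_fast:
                raise  # backward-compatible behavior: first failure aborts
            failed_connections.append((params, exc))
    return connections, failed_connections


def fake_connect(params: dict) -> str:
    """Stand-in for a real MCP connection; fails for servers marked broken."""
    if params.get("broken"):
        raise ConnectionError(f"cannot reach {params['name']}")
    return f"session:{params['name']}"


servers = [
    {"name": "search"},
    {"name": "legacy", "broken": True},
    {"name": "files"},
]
reported = []

# fail_fast=False: the broken server is skipped and recorded.
connections, failed = connect_all(
    servers,
    fake_connect,
    fail_fast=False,
    on_connection_error=lambda params, exc: reported.append(params["name"]),
)
print(connections)
print([(params["name"], type(exc).__name__) for params, exc in failed])
```

With fail_fast left at its default of True, the same call would raise the ConnectionError from the second server instead of returning.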
Testing
Comprehensive test coverage added in test_core_fault_tolerance.py, covering:
- fail_fast=True behavior
- fail_fast=False