Skip to content

Conversation

JAORMX
Copy link
Collaborator

@JAORMX JAORMX commented Sep 21, 2025

Overview

Refactors the session management system to use a pluggable storage interface, enabling future support for distributed storage backends like Redis/Valkey while maintaining backward compatibility.

What Changed

Core Architecture

  • Introduced a Storage interface that abstracts session persistence operations
  • Refactored Manager to use the Storage interface instead of directly using sync.Map
  • Created LocalStorage implementation that maintains the existing in-memory behavior
  • Added JSON serialization support for sessions to enable future network storage

Session Interface Enhancements

  • Extended Session interface with Type() and metadata methods that were already implemented in concrete types
  • Added GetData(), SetData(), GetMetadata(), and SetMetadata() methods for better data management

New Components

  • Storage interface with methods: Store(), Load(), Delete(), DeleteExpired(), Close()
  • LocalStorage implementation using sync.Map for backward compatibility
  • SessionType enum for better type identification
  • JSON serialization/deserialization support for all session types

Why This Change?

The previous implementation was tightly coupled to in-memory storage, making it impossible to share sessions across multiple ToolHive instances. This refactoring enables:

  • Horizontal scaling with shared session state
  • Session persistence across restarts
  • Future Redis/Valkey backend support without breaking changes
  • Better testability with pluggable storage backends

Testing

Added comprehensive unit tests covering:

  • LocalStorage implementation with all CRUD operations
  • ✅ Session serialization/deserialization for all session types
  • Manager with pluggable storage interface
  • ✅ All existing session types (ProxySession, SSESession, StreamableSession)
  • ✅ Error handling and edge cases
  • ✅ Context cancellation support
  • ✅ TTL and cleanup functionality

Backward Compatibility

Full backward compatibility maintained

  • Existing NewManager() function continues to work unchanged
  • All existing session types work without modification
  • No breaking changes to public APIs
  • Default behavior remains in-memory storage

Files Changed

  • pkg/transport/session/manager.go - Refactored to use Storage interface
  • pkg/transport/session/storage.go - New Storage interface and LocalStorage implementation
  • pkg/transport/session/serialization.go - JSON serialization support
  • pkg/transport/session/session.go - Enhanced session types with metadata support
  • pkg/transport/session/storage_test.go - Comprehensive test suite
  • pkg/transport/session/serialization_test.go - Serialization tests

Future Work

This foundation enables:

  • Redis/Valkey storage backend implementation
  • Distributed session management
  • Session replication and high availability
  • Custom storage backends for specific use cases

Checklist

  • All tests pass
  • Backward compatibility maintained
  • Documentation updated in code comments
  • No breaking changes to existing APIs
  • Comprehensive test coverage added
  • Error handling implemented
  • Context support added where appropriate

Copy link

codecov bot commented Sep 21, 2025

Codecov Report

❌ Patch coverage is 85.64815% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.07%. Comparing base (2d6a56f) to head (695f279).

Files with missing lines Patch % Lines
pkg/transport/session/manager.go 85.18% 7 Missing and 1 partial ⚠️
pkg/transport/proxy/httpsse/http_proxy.go 0.00% 4 Missing and 2 partials ⚠️
pkg/transport/proxy/streamable/streamable_proxy.go 0.00% 4 Missing and 2 partials ⚠️
pkg/transport/session/serialization.go 90.90% 4 Missing and 2 partials ⚠️
pkg/transport/session/storage_local.go 92.85% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1989      +/-   ##
==========================================
+ Coverage   48.71%   49.07%   +0.36%     
==========================================
  Files         241      243       +2     
  Lines       30592    30756     +164     
==========================================
+ Hits        14902    15095     +193     
+ Misses      14567    14530      -37     
- Partials     1123     1131       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX JAORMX requested a review from Copilot September 23, 2025 15:00
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the session management system to use a pluggable storage interface, enabling future support for distributed storage backends like Redis/Valkey while maintaining backward compatibility with the existing in-memory behavior.

Key changes:

  • Introduced a Storage interface that abstracts session persistence operations
  • Refactored Manager to use the Storage interface instead of directly using sync.Map
  • Added JSON serialization support for sessions to enable future network storage
  • Enhanced session interfaces with metadata and data management methods

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pkg/transport/session/storage.go New Storage interface defining core operations for session persistence
pkg/transport/session/storage_local.go LocalStorage implementation using sync.Map for backward compatibility
pkg/transport/session/serialization.go JSON serialization functions for future distributed storage support
pkg/transport/session/manager.go Refactored Manager to use Storage interface instead of direct sync.Map
pkg/transport/session/storage_test.go Comprehensive test suite for storage implementations and serialization
pkg/transport/session/serialization_test.go Test suite for JSON serialization/deserialization functionality

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

sseSession := NewSSESession(sd.ID)
// Update timestamps to match stored values
sseSession.created = sd.CreatedAt
sseSession.updated = sd.UpdatedAt
Copy link

Copilot AI Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct field assignment bypasses encapsulation and could break if the internal structure changes. Consider adding setter methods or using reflection-based copying instead of direct field access.

Suggested change
sseSession.updated = sd.UpdatedAt
sseSession.SetUpdatedAt(sd.UpdatedAt)

Copilot uses AI. Check for mistakes.

@JAORMX
Copy link
Collaborator Author

JAORMX commented Oct 7, 2025

@claude please review this thoroguhly

Copy link
Contributor

claude bot commented Oct 7, 2025

Claude finished @JAORMX's task —— View job


Thorough Review of PR #1989

Completed comprehensive review of the session storage refactoring.

Tasks

  • Read all changed files
  • Analyze architecture and design decisions
  • Review for bugs, security issues, and race conditions
  • Check backward compatibility
  • Review test coverage and quality
  • Assess code quality and maintainability
  • Provide comprehensive feedback

🎯 Overall Assessment

This is a well-designed refactoring that successfully introduces a pluggable storage interface while maintaining backward compatibility. The code quality is high, test coverage is comprehensive, and the architecture is solid. However, there are several issues that need attention before merging.

🔴 Critical Issues

1. Race Condition in LocalStorage.Close() (pkg/transport/session/storage_local.go:97-105)

Issue: The Close() method modifies the sync.Map while iterating over it, which can cause race conditions.

func (s *LocalStorage) Close() error {
    // PROBLEM: Modifying map during iteration
    s.sessions.Range(func(key, _ any) bool {
        s.sessions.Delete(key)  // ❌ Race condition
        return true
    })
    return nil
}

Fix: Collect keys first, then delete (like DeleteExpired does):

func (s *LocalStorage) Close() error {
    var toDelete []string
    s.sessions.Range(func(key, _ any) bool {
        if id, ok := key.(string); ok {
            toDelete = append(toDelete, id)
        }
        return true
    })
    for _, id := range toDelete {
        s.sessions.Delete(id)
    }
    return nil
}

Also: The comment says "no-op" but the method actually clears all sessions. Update the comment to match reality.

2. Encapsulation Violations in Serialization (pkg/transport/session/serialization.go)

Issue: Direct field assignment bypasses encapsulation and breaks if internal structure changes. Lines 71-72, 74, 85-87, 101-104 all directly assign to private fields:

sseSession.created = sd.CreatedAt      // ❌ Direct field access
sseSession.updated = sd.UpdatedAt      // ❌ Direct field access
sseSession.metadata = sd.Metadata      // ❌ Direct field access

Recommendation: Add proper setter methods to session types or use a builder pattern. For example:

// In proxy_session.go
func (s *ProxySession) setTimestamps(created, updated time.Time) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.created = created
    s.updated = updated
}

func (s *ProxySession) setMetadataMap(metadata map[string]string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.metadata = metadata
}

This is marked with nolint:unused for future use, but it's important to fix before enabling.

3. Inconsistent StreamableSession Initialization (pkg/transport/session/streamable_session.go:20 + serialization.go:83)

Issue: NewStreamableSession() creates a ProxySession with only the ID field set (line 20), but deserializeSession() tries to set additional fields like created, updated, and sessType (lines 85-87). This inconsistency will cause issues.

// Current implementation - incomplete
func NewStreamableSession(id string) Session {
    return &StreamableSession{
        ProxySession: &ProxySession{id: id},  // ❌ Missing other fields
        MessageCh:    make(chan jsonrpc2.Message, 100),
        ResponseCh:   make(chan jsonrpc2.Message, 100),
    }
}

Fix: Use the proper constructor:

func NewStreamableSession(id string) Session {
    return &StreamableSession{
        ProxySession: NewTypedProxySession(id, SessionTypeStreamable),
        MessageCh:    make(chan jsonrpc2.Message, 100),
        ResponseCh:   make(chan jsonrpc2.Message, 100),
    }
}

🟡 Medium Priority Issues

4. Missing Context Support in Storage Methods

Issue: While Storage.DeleteExpired() properly uses context, Store(), Load(), and Delete() receive context but never check for cancellation. This could be problematic for future distributed storage implementations with network I/O.

Current: Context parameters are ignored in LocalStorage
Recommendation: For consistency, future storage backends should respect context cancellation. Consider documenting this requirement in the Storage interface.

5. Type Assertion Without Error Handling (pkg/transport/session/serialization.go:83)

streamSession := NewStreamableSession(sd.ID).(*StreamableSession)  // ❌ Panic on type assertion failure

Fix:

sess := NewStreamableSession(sd.ID)
streamSession, ok := sess.(*StreamableSession)
if !ok {
    return nil, fmt.Errorf("failed to create StreamableSession")
}

6. Manager Methods Coupled to LocalStorage (pkg/transport/session/manager.go:185-198)

Issue: Range() and Count() methods only work with LocalStorage, which defeats the purpose of the abstraction:

func (m *Manager) Count() int {
    if localStorage, ok := m.storage.(*LocalStorage); ok {
        return localStorage.Count()
    }
    return 0  // ❌ Silent failure
}

Options:

  1. Add Count() and Range() to the Storage interface
  2. Move these methods to LocalStorage only and remove from Manager
  3. Return an error when storage backend doesn't support these operations

Recommendation: Option 2 is cleanest - these are helper methods that don't make sense for distributed storage.

7. Manager Uses context.Background() Everywhere (pkg/transport/session/manager.go)

Issue: All storage operations use context.Background() instead of accepting context from callers:

func (m *Manager) Get(id string) (Session, bool) {
    ctx := context.Background()  // ❌ No way for caller to cancel
    sess, err := m.storage.Load(ctx, id)
    ...
}

Recommendation: Add context-aware versions of these methods:

func (m *Manager) GetContext(ctx context.Context, id string) (Session, bool) {
    sess, err := m.storage.Load(ctx, id)
    ...
}

Or make the existing methods accept context (breaking change).

🟢 Minor Issues / Suggestions

8. Documentation: Comment on Storage Interface (storage.go:31)

The comment says "For local storage, this is a no-op" but after review, LocalStorage.Close() actually clears all sessions. Update to:

// Close performs cleanup of the storage backend.
// For local storage, this clears all sessions. For remote storage, it closes connections.

9. Test Coverage Gaps (Based on Codecov Report)

The codecov report shows 89.56% patch coverage with 19 lines missing. Key gaps:

  • serialization.go: 6 missing + 3 partials (88%)
  • manager.go: 4 missing + 1 partial (87.5%)
  • storage_local.go: 4 missing + 1 partial (92.53%)

Recommendation: Add tests for:

  • Error paths in deserialization (malformed JSON for each session type)
  • Manager error handling when factory returns nil
  • Edge cases in LocalStorage with concurrent access

10. Serialization Functions Marked as Unused

The serialization functions are marked with nolint:unused because they're for future use. Consider:

  • Adding a comment explaining they'll be used in Phase 4 (you already have this ✅)
  • Creating a placeholder test that exercises them (you have this too ✅)

Actually, this is well-handled - tests already cover these functions comprehensively.

11. Mutex Protection in StreamableSession

Issue: StreamableSession.disconnected field is accessed without mutex protection in Disconnect(), SendMessage(), and SendResponse() methods. While this might work in practice, it's technically a data race.

Fix: Add mutex to StreamableSession like ProxySession has.

12. Backward Compatibility: Factory Type Handling (manager.go:42-63)

The backward compatibility handling is good, but the fallback case (line 59-62) will never create the intended session type if someone passes an invalid factory:

default:
    // Default to creating basic ProxySession
    f = func(id string) Session {
        return NewProxySession(id)  // ❌ Might not be desired behavior
    }

Recommendation: Consider returning an error or panicking in NewManager if an invalid factory type is provided.

✅ Strengths

  1. Clean Architecture: The Storage interface is well-designed with a minimal, focused API
  2. Excellent Test Coverage: Comprehensive tests cover all storage operations, serialization, and edge cases
  3. Backward Compatibility: Existing code continues to work without changes
  4. Good Documentation: Comments explain the purpose and future use cases
  5. Thread Safety: Proper use of mutexes in ProxySession for concurrent access
  6. Context Support: Added to storage operations for future distributed implementations
  7. Type Safety: Good use of session types with the SessionType enum
  8. Separation of Concerns: Serialization logic is cleanly separated from storage logic

📋 Pre-Merge Checklist

Before merging, please address:

  • MUST FIX: Race condition in LocalStorage.Close() - collect keys first
  • MUST FIX: Update comment in storage_local.go:97 - Close is not a no-op
  • MUST FIX: Fix StreamableSession constructor to use NewTypedProxySession
  • SHOULD FIX: Add setter methods to avoid direct field access in serialization
  • SHOULD FIX: Handle type assertion properly in deserialization (line 83)
  • CONSIDER: Add mutex protection to StreamableSession.disconnected
  • CONSIDER: Decide strategy for Count() and Range() methods on Manager
  • CONSIDER: Add context parameters to Manager public methods

🎉 Conclusion

This is a solid refactoring that achieves its goals. The architecture is well thought out, the implementation is clean, and the tests are comprehensive. The critical issues are straightforward to fix and don't require major rework. Once the race condition and encapsulation issues are addressed, this will be ready to merge.

The foundation for distributed session storage (Redis/Valkey) is well-established. Great work! 🚀


Job Run | Branch: feat/session-storage-pluggable

@JAORMX JAORMX force-pushed the feat/session-storage-pluggable branch from a0ba465 to 30db7d3 Compare October 7, 2025 06:15
@JAORMX JAORMX requested a review from Copilot October 7, 2025 06:19
@JAORMX
Copy link
Collaborator Author

JAORMX commented Oct 7, 2025

@claude please review this again

Copy link
Contributor

claude bot commented Oct 7, 2025

Claude finished @JAORMX's task —— View job


Thorough Code Review: Session Storage Refactoring

I've completed a comprehensive review of all 8 changed files (1,089 additions, 54 deletions). Here are my findings:


Excellent Improvements Since Last Review

I can see that all critical issues from the previous review have been addressed:

  1. Race condition in LocalStorage.Close() FIXED (storage_local.go:97-109)

    • Now correctly collects keys first, then deletes them
    • No more concurrent modification during iteration
  2. Comment accuracy FIXED (storage_local.go:97)

    • Comment now correctly states "Close clears all sessions from local storage"
    • Matches actual behavior
  3. StreamableSession initialization FIXED (streamable_session.go:20)

    • Now uses NewTypedProxySession(id, SessionTypeStreamable)
    • Proper initialization with correct session type
  4. Encapsulation improved (proxy_session.go:138-157)

    • Added setTimestamps() and setMetadataMap() helper methods
    • Serialization now uses proper setters instead of direct field access

🎯 Architecture & Design Quality

Strengths

  • Clean abstraction: The Storage interface is minimal and well-designed
  • Backward compatibility: Full compatibility maintained with existing code
  • Type safety: Good use of SessionType enum for session identification
  • Thread safety: Proper mutex usage in ProxySession and SSESession
  • Separation of concerns: Serialization cleanly separated from storage logic
  • Future-ready: Prepared for Redis/Valkey without breaking changes

Design Observations

1. Manager methods coupled to LocalStorage (manager.go:185-198)

func (m *Manager) Range(f func(key, value interface{}) bool) {
    if localStorage, ok := m.storage.(*LocalStorage); ok {
        localStorage.Range(f)
    }
}
  • Range() and Count() only work with LocalStorage
  • Type assertion defeats the purpose of abstraction
  • Recommendation: These are helper methods that don't make sense for distributed storage. Consider:
    • Option A: Document that these only work with LocalStorage (currently done ✅)
    • Option B: Remove from Manager and only expose on LocalStorage directly
    • Current approach is acceptable for v1

2. Context usage in Manager (manager.go:128-171)

  • All storage operations use context.Background() instead of accepting context from callers
  • No way for callers to control timeouts or cancellation
  • Recommendation: For v2, consider adding context-aware versions:
func (m *Manager) GetContext(ctx context.Context, id string) (Session, bool)
  • Current implementation is acceptable for local storage

🔍 Code Quality Analysis

Thread Safety ✅

  • ProxySession: Proper mutex protection for all shared fields
  • SSESession: Extends ProxySession's mutex correctly
  • StreamableSession: ⚠️ Minor issue (see below)
  • LocalStorage: sync.Map is inherently thread-safe

Error Handling ✅

  • Good validation in storage methods (empty ID checks, nil checks)
  • Proper error types defined (ErrSessionNotFound)
  • Graceful handling of edge cases

Test Coverage ✅

  • 89.56% patch coverage is excellent for new code
  • Comprehensive tests cover:
    • All storage operations (Store, Load, Delete, DeleteExpired)
    • All session types (Proxy, SSE, Streamable)
    • Serialization/deserialization
    • Edge cases and error paths
    • Context cancellation
    • Concurrent access patterns

⚠️ Minor Issues Found

1. StreamableSession lacks mutex protection (streamable_session.go:14, 44, 54, 67)

The disconnected field is accessed without mutex protection in multiple methods:

type StreamableSession struct {
    *ProxySession
    MessageCh    chan jsonrpc2.Message
    ResponseCh   chan jsonrpc2.Message
    disconnected bool  // ⚠️ No mutex protection
}

func (s *StreamableSession) Disconnect() {
    if s.disconnected {  // ⚠️ Race: read without lock
        return
    }
    // ...
    s.disconnected = true  // ⚠️ Race: write without lock
}

Impact: Low - unlikely to cause issues in practice, but technically a data race
Fix: Add mutex protection or use atomic operations:

s.mu.Lock()
defer s.mu.Unlock()
if s.disconnected {
    return
}

2. Serialization type assertion without check (serialization.go:80)

streamSession, ok := sess.(*StreamableSession)
if !ok {
    return nil, fmt.Errorf("failed to create StreamableSession")
}

Status: ✅ Already has proper error handling (line 80-82)
No issue here - I was mistaken in initial scan

3. Missing test coverage areas (per codecov report)

serialization.go: 6 missing + 3 partials (88% coverage)

  • Some error paths in deserialization may not be fully tested
  • Complex data scenarios could use more edge case tests

Recommendation: Add tests for:

  • Malformed JSON for each session type
  • Invalid session type values
  • Missing required fields in JSON

🔒 Security Review

✅ No security issues identified:

  • Input validation on all public methods
  • No unsafe type conversions without checks
  • No credential or sensitive data exposure
  • Thread-safe operations throughout
  • Context cancellation support for cleanup

📊 Test Quality Assessment

Excellent Test Patterns

t.Run("Context cancellation in DeleteExpired", func(t *testing.T) {
    ctx, cancel := context.WithCancel(context.Background())
    cancel()
    err := storage.DeleteExpired(ctx, time.Now())
    assert.NoError(t, err)  // Graceful handling
})

Comprehensive Coverage

  • Unit tests: ✅ All storage operations
  • Integration tests: ✅ Manager with different session types
  • Edge cases: ✅ Empty IDs, nil sessions, context cancellation
  • Concurrency: ✅ Implicit through storage operations

🎯 Final Assessment

Overall Quality: Excellent

This refactoring successfully achieves its goals:

  • ✅ Pluggable storage interface for future distributed backends
  • ✅ Full backward compatibility maintained
  • ✅ Clean architecture with proper abstractions
  • ✅ Comprehensive test coverage (89.56%)
  • ✅ All critical issues from previous review addressed
  • ✅ Thread-safe implementation
  • ✅ Well-documented code

Pre-Merge Checklist

Must Fix:

  • None - no critical issues found

Should Consider (Optional):

  • Add mutex protection to StreamableSession.disconnected for correctness
  • Add a few more edge case tests for serialization error paths
  • Document context usage patterns for future distributed storage

Future Enhancements (Post-Merge):

  • Add context parameters to Manager public methods (breaking change)
  • Implement Redis/Valkey storage backend
  • Consider removing Range()/Count() from Manager interface

🚀 Recommendation: APPROVE

This PR is ready to merge. The code quality is high, all critical issues have been addressed, and the architecture is sound. The minor StreamableSession mutex issue is a nice-to-have fix but not a blocker.

The foundation for distributed session storage is well-established and the implementation maintains excellent backward compatibility. Great work! 🎉


Files Reviewed:

  • pkg/transport/session/storage.go (new)
  • pkg/transport/session/storage_local.go (new)
  • pkg/transport/session/serialization.go (new)
  • pkg/transport/session/manager.go (refactored)
  • pkg/transport/session/proxy_session.go (enhanced)
  • pkg/transport/session/streamable_session.go (updated)
  • pkg/transport/session/storage_test.go (new)
  • pkg/transport/session/serialization_test.go (new)

Job Run | Branch: feat/session-storage-pluggable

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@JAORMX
Copy link
Collaborator Author

JAORMX commented Oct 8, 2025

@claude can you review this again?

Copy link
Contributor

claude bot commented Oct 8, 2025

Claude finished @JAORMX's task —— View job


Code Review: Session Storage Refactoring

Completed comprehensive review of PR #1989 - Pluggable storage backend for session management.

Review Tasks

  • Read all changed files (8 files)
  • Verify previous critical issues were addressed
  • Review for new issues
  • Check code quality and thread safety
  • Review test coverage
  • Provide comprehensive feedback

🎯 Overall Assessment

This is an excellent refactoring that successfully introduces a pluggable storage interface while maintaining full backward compatibility. All critical issues from previous reviews have been addressed. The code quality is high, test coverage is comprehensive (89.56%), and the architecture is well-designed.

✅ Previous Issues - All Fixed

I verified that all critical issues from the previous review have been properly addressed:

  1. Race condition in LocalStorage.Close() FIXED (storage_local.go:97-109)

    • Now correctly collects keys first, then deletes them
    • No more concurrent modification during iteration
  2. Comment accuracy FIXED (storage_local.go:97)

    • Comment now correctly states "Close clears all sessions from local storage"
    • Accurately describes actual behavior
  3. StreamableSession initialization FIXED (streamable_session.go:20)

    • Now uses NewTypedProxySession(id, SessionTypeStreamable)
    • Proper initialization with correct session type
  4. Encapsulation improved (proxy_session.go:138-157)

    • Added setTimestamps() and setMetadataMap() helper methods
    • Serialization now uses proper setters instead of direct field access (serialization.go:71, 73, 85-87, 96-98)

🟢 Code Quality Review

Architecture & Design: Excellent ⭐

Strengths:

  • Clean abstraction: The Storage interface is minimal and well-designed with just 5 essential methods
  • Backward compatibility: Full compatibility maintained - existing NewManager() continues to work unchanged
  • Type safety: Good use of SessionType enum for session identification
  • Separation of concerns: Serialization cleanly separated from storage logic
  • Future-ready: Well-prepared for Redis/Valkey implementation without breaking changes
  • Proper constructors: NewSSESession() uses NewTypedProxySession() correctly (sse_session.go:23)

Minor Design Notes (not blockers):

  • Manager.Range() and Manager.Count() methods only work with LocalStorage (manager.go:185-198)
    • This is acceptable and properly documented with comments
    • These helper methods don't make sense for distributed storage anyway
  • Manager uses context.Background() internally rather than accepting context from callers (manager.go:128-171)
    • This is fine for v1 with local storage
    • Future versions could add context-aware methods if needed for distributed storage

Thread Safety: Excellent ✅

All session types properly implement thread safety:

  • ProxySession: Excellent mutex protection with sync.RWMutex for all shared fields (proxy_session.go:36, 72-74, 78-82, 88-98, 102-120)
  • SSESession: Properly uses inherited mutex from ProxySession (sse_session.go:41-49, 54-58, 69-84)
  • StreamableSession: ⚠️ Minor issue - disconnected field lacks mutex protection (see below)
  • LocalStorage: sync.Map is inherently thread-safe
  • Serialization helpers: Properly use setters that acquire locks

Error Handling: Excellent ✅

  • Good validation throughout (empty ID checks, nil checks)
  • Proper error types defined (ErrSessionNotFound, ErrSessionDisconnected, ErrMessageChannelFull)
  • Graceful handling of edge cases
  • Context cancellation support in DeleteExpired() (storage_local.go:73-77)

Test Coverage: Excellent ✅

89.56% patch coverage is outstanding for new code! Tests comprehensively cover:

  • ✅ All storage operations (Store, Load, Delete, DeleteExpired, Close)
  • ✅ All session types (Proxy, SSE, Streamable)
  • ✅ Serialization/deserialization for all types
  • ✅ Edge cases (nil sessions, empty IDs, non-existent sessions)
  • ✅ Error paths (invalid JSON, type assertions)
  • ✅ Context cancellation behavior
  • ✅ Concurrent access patterns
  • ✅ Manager with different session types
  • ✅ Timestamp preservation
  • ✅ Metadata handling

⚠️ Minor Issue Found (Non-Blocking)

StreamableSession.disconnected lacks mutex protection

Location: streamable_session.go:14, 44, 54, 67

The disconnected field is accessed without mutex protection in multiple methods:

type StreamableSession struct {
    *ProxySession
    MessageCh    chan jsonrpc2.Message
    ResponseCh   chan jsonrpc2.Message
    disconnected bool  // ⚠️ No mutex protection
}

func (s *StreamableSession) Disconnect() {
    if s.disconnected {  // ⚠️ Race: read without lock
        return
    }
    // ...
    s.disconnected = true  // ⚠️ Race: write without lock
}

func (s *StreamableSession) SendMessage(msg jsonrpc2.Message) error {
    if s.disconnected {  // ⚠️ Race: read without lock
        return errors.New("session disconnected")
    }
    // ...
}

Impact: Low - unlikely to cause issues in practice, but technically a data race that could be detected by race detector

Suggested Fix: Add mutex protection (StreamableSession embeds ProxySession which has mu):

func (s *StreamableSession) Disconnect() {
    s.mu.Lock()
    defer s.mu.Unlock()
    
    if s.disconnected {
        return
    }
    close(s.MessageCh)
    close(s.ResponseCh)
    s.disconnected = true
}

func (s *StreamableSession) SendMessage(msg jsonrpc2.Message) error {
    s.mu.RLock()
    defer s.mu.RUnlock()
    
    if s.disconnected {
        return errors.New("session disconnected")
    }
    // ...
}

Note: SSESession handles this correctly with mutex protection (sse_session.go:41-49, 54-58)

📊 Documentation Quality

  • ✅ Clear interface documentation (storage.go)
  • ✅ Good comments on helper methods explaining LocalStorage-only behavior
  • ✅ Future use clearly marked with comments (serialization.go:9-10)
  • ✅ Proper package documentation

🔒 Security Review

✅ No security issues identified:

  • Input validation on all public methods
  • No unsafe type conversions without checks
  • Thread-safe operations throughout
  • Context cancellation support for cleanup
  • No credential or sensitive data exposure

🎯 Backward Compatibility

Perfect backward compatibility:

  • Existing NewManager() function works unchanged
  • Legacy factory pattern supported (manager.go:45-63)
  • All existing session types work without modification
  • Default behavior remains in-memory storage
  • No breaking changes to public APIs

📋 Recommendations

Before Merge:

  • Optional: Fix StreamableSession.disconnected mutex protection for correctness
    • Not a blocker, but would eliminate a potential race condition
    • Would make code consistent with SSESession implementation

Future Enhancements (Post-Merge):

  • Add context parameters to Manager public methods when implementing distributed storage
  • Implement Redis/Valkey storage backend using the new interface
  • Consider adding more serialization error path tests (though current coverage is already excellent)

🚀 Final Recommendation: APPROVE ✅

This PR is ready to merge. The refactoring achieves all stated goals:

  • ✅ Pluggable storage interface for future distributed backends
  • ✅ Full backward compatibility maintained
  • ✅ Clean, well-tested architecture
  • ✅ All previous critical issues resolved
  • ✅ Excellent test coverage (89.56%)
  • ✅ Thread-safe implementation (one minor issue, non-blocking)
  • ✅ Well-documented code

The foundation for distributed session storage (Redis/Valkey) is solid and the implementation maintains excellent quality. The minor StreamableSession mutex issue is a nice-to-have fix but not a blocker. Outstanding work! 🎉


Files Reviewed:

  • pkg/transport/session/storage.go (new - 34 lines)
  • pkg/transport/session/storage_local.go (new - 128 lines)
  • pkg/transport/session/serialization.go (new - 111 lines)
  • pkg/transport/session/manager.go (refactored - +64/-53)
  • pkg/transport/session/proxy_session.go (enhanced - +21 lines)
  • pkg/transport/session/streamable_session.go (updated - +1/-1)
  • pkg/transport/session/sse_session.go (context - already exists)
  • pkg/transport/session/storage_test.go (new - 502 lines)
  • pkg/transport/session/serialization_test.go (new - 232 lines)

Total Changes: 1,090 additions, 54 deletions across 8 files


Job Run | Branch: feat/session-storage-pluggable

JAORMX and others added 3 commits October 8, 2025 15:03
Refactors the session management system to use a pluggable storage
interface, enabling future support for distributed storage backends
like Redis/Valkey while maintaining backward compatibility.

What Changed

- Introduced a Storage interface that abstracts session persistence
- Refactored Manager to use the Storage interface instead of directly
  using sync.Map
- Created LocalStorage implementation that maintains the existing
  in-memory behavior
- Added JSON serialization support for sessions to enable future
  network storage
- Extended Session interface with Type() and metadata methods that
  were already implemented in concrete types

Why

The previous implementation was tightly coupled to in-memory storage,
making it impossible to share sessions across multiple ToolHive instances.
This refactoring enables:
- Horizontal scaling with shared session state
- Session persistence across restarts
- Future Redis/Valkey backend support without breaking changes

Testing

Added comprehensive unit tests covering:
- LocalStorage implementation
- Session serialization/deserialization
- Manager with pluggable storage
- All existing session types (ProxySession, SSESession, StreamableSession)

All tests pass and the implementation maintains full backward compatibility.

Signed-off-by: Juan Antonio Osorio <[email protected]>
- Fix race condition in LocalStorage.Close() by collecting keys before deletion
- Update Close() comment to reflect actual behavior (clears sessions, not a no-op)
- Add setter methods (setTimestamps, setMetadataMap) to ProxySession for proper encapsulation
- Update serialization to use setter methods instead of direct field access
- Fix StreamableSession constructor to use NewTypedProxySession for proper initialization
- Add type assertion check in StreamableSession deserialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Replace string(rune('a' + i)) with fmt.Sprintf("session-%d", i)
- Previous approach only worked for i values 0-25 and produced unexpected Unicode characters for larger values
- Add fmt import to storage_test.go

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@JAORMX JAORMX force-pushed the feat/session-storage-pluggable branch from 4e12c30 to a8ebd91 Compare October 8, 2025 12:03
Copy link
Contributor

@yrobla yrobla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comments

This commit addresses all feedback from PR review comments:

1. Separate Touch from Load operations
   - Remove auto-touch behavior from Storage.Load()
   - Manager.Get() now explicitly touches sessions
   - Gives callers control over when sessions are touched

2. Document Range/Count design decision
   - Add comprehensive documentation explaining why Range/Count
     are not part of Storage interface
   - These operations don't make sense for distributed storage

3. Add consistent context timeout usage
   - All storage operations now use consistent timeouts
   - 5 seconds for quick operations (Get, Delete, Add)
   - 30 seconds for cleanup operations

4. Proper error handling throughout
   - Manager.Delete() and Manager.Stop() now return errors
   - Cleanup routine logs errors instead of ignoring them
   - Proxy implementations use debug logging for non-critical errors

All tests pass with these changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants