Feat/local docker sandbox #172
Open
mdear wants to merge 16 commits into Intelligent-Internet:develop
Hi dev team, loving this framework! I put it to the test by first making it run air-gapped (except for secure foundation providers such as Pinecone and Anthropic), then plugging in my new MCP server (a knowledge base for the seating/mobility vertical, which can easily overwhelm any model's context unless carefully controlled and tuned). I found a few stability issues and made some proposed fixes along the way. Everything is open to constructive criticism, discussion, and debate.
I really like the idea of having a local sandbox for sensitive data or workflows. I am attempting to test this PR; thank you for your efforts.
- Add DockerSandbox provider for air-gapped/local deployments
- Add PortPoolManager for centralized port allocation (30000-30999)
- Add LocalStorage providers for ii_agent and ii_tool
- Add MCP tool image processing from sandbox containers
- Add storage factory functions with local/GCS support
- Add test suite (143 tests passing)
- Fix connect() to register ports, preventing conflicts on reconnect
- Fix delete() to clean up orphaned volumes
- Update docs with port management and local sandbox setup
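For reviewers, the port-pool idea in the list above can be pictured roughly as follows. This is a minimal sketch, not the PR's `PortPoolManager`: the class name `PortPool` and the methods `allocate`, `register`, and `release_all` are hypothetical, and only the 30000-30999 range and the "register on reconnect" behavior come from the commit message.

```python
import threading
from typing import Dict, Set


class PortPool:
    """Hypothetical stand-in for a centralized port allocator."""

    def __init__(self, start: int = 30000, end: int = 30999) -> None:
        self._range = range(start, end + 1)
        self._lock = threading.Lock()
        self._owner_by_port: Dict[int, str] = {}

    def allocate(self, sandbox_id: str) -> int:
        """Reserve the lowest free port in the range for a sandbox."""
        with self._lock:
            for port in self._range:
                if port not in self._owner_by_port:
                    self._owner_by_port[port] = sandbox_id
                    return port
        raise RuntimeError("port pool exhausted")

    def register(self, sandbox_id: str, port: int) -> None:
        """Record a port already bound by a running container (reconnect case)."""
        with self._lock:
            owner = self._owner_by_port.get(port)
            if owner not in (None, sandbox_id):
                raise ValueError(f"port {port} already owned by {owner}")
            self._owner_by_port[port] = sandbox_id

    def release_all(self, sandbox_id: str) -> Set[int]:
        """Free every port owned by a sandbox and return the freed ports."""
        with self._lock:
            freed = {p for p, o in self._owner_by_port.items() if o == sandbox_id}
            for port in freed:
                del self._owner_by_port[port]
            return freed
```

Registering ports on reconnect matters because a pool that only learns about ports through `allocate` would hand out a port that a surviving container still holds.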
Chat file handling:
- Fix file_search filtering by user_id only (not session_id) for cross-session access
- Add SHA-256 content hash deduplication in OpenAI vector store
- Reduce file_search max results to 3 to prevent context overflow
- Add file corpus discovery so AI knows which files are searchable
- Fix reasoning.effort parameter only sent to reasoning models
- Add hasattr guard for text attribute on image-only messages
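The content-hash deduplication mentioned above might look like the sketch below. The `DedupIndex` class and its `add` method are hypothetical names for illustration; the PR's actual vector store integration is not shown in this conversation.

```python
import hashlib
from typing import Dict


class DedupIndex:
    """Maps the SHA-256 digest of file content to the first file ID stored for it."""

    def __init__(self) -> None:
        self._file_id_by_hash: Dict[str, str] = {}

    def add(self, content: bytes, file_id: str) -> str:
        """Return the file ID to use: the existing one if the content was seen before."""
        digest = hashlib.sha256(content).hexdigest()
        # setdefault keeps the first ID registered for identical content.
        return self._file_id_by_hash.setdefault(digest, file_id)
```

A caller would upload to the vector store only when the returned ID equals the ID it passed in.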
Sandbox management:
- Add orphan cleanup loop (5min interval) to remove containers without active sessions
- Add /internal/sandboxes/{id}/has-active-session endpoint for session verification
- Add port_manager.scan_existing_containers() to recover state on restart
- Add LOCAL_MODE config with orphan cleanup settings
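A rough sketch of the orphan cleanup loop described above, assuming three injected async callables (`list_ids`, `has_active_session`, `remove`) that are hypothetical and stand in for the Docker and session lookups. Only the 5-minute interval comes from the commit message.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger(__name__)

ListIds = Callable[[], Awaitable[List[str]]]
HasSession = Callable[[str], Awaitable[bool]]
Remove = Callable[[str], Awaitable[None]]


async def cleanup_orphans_once(
    list_ids: ListIds, has_active_session: HasSession, remove: Remove
) -> List[str]:
    """Remove every sandbox that has no active session; return the removed IDs."""
    removed: List[str] = []
    for sandbox_id in await list_ids():
        if not await has_active_session(sandbox_id):
            await remove(sandbox_id)
            removed.append(sandbox_id)
    return removed


async def orphan_cleanup_loop(
    list_ids: ListIds,
    has_active_session: HasSession,
    remove: Remove,
    interval_seconds: float = 300.0,  # 5 minutes, as in the commit message
) -> None:
    """Run the cleanup pass forever, logging failures instead of stopping."""
    while True:
        try:
            await cleanup_orphans_once(list_ids, has_active_session, remove)
        except Exception:
            logger.exception("orphan cleanup pass failed")
        await asyncio.sleep(interval_seconds)
```

Keeping the single pass separate from the loop makes the removal logic testable without waiting on the timer.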
Resource limits:
- Add MAX_TABS=20 limit in browser with force-close of oldest tabs
- Add MAX_SHELL_SESSIONS=10 limit in shell tool
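The "force-close the oldest" policy used for browser tabs can be sketched as a small bounded registry. `BoundedRegistry` and its `add` method are hypothetical names, keys are assumed to be unique, and the real browser and shell tools may track their resources differently.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BoundedRegistry(Generic[T]):
    """Keeps at most `limit` open resources, closing the oldest first."""

    def __init__(self, limit: int, close: Callable[[T], None]) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self._limit = limit
        self._close = close
        self._items: "OrderedDict[str, T]" = OrderedDict()

    def add(self, key: str, resource: T) -> List[str]:
        """Register a resource and return the keys evicted to make room."""
        self._items[key] = resource
        self._items.move_to_end(key)
        evicted: List[str] = []
        while len(self._items) > self._limit:
            old_key, old_resource = self._items.popitem(last=False)
            self._close(old_resource)
            evicted.append(old_key)
        return evicted

    def __len__(self) -> int:
        return len(self._items)
```

With `limit=20` and a `close` callback that closes a page, this would match the MAX_TABS behavior described above.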
Tests: Add 248 unit tests covering all changes
## New Features
- expose_port(external) parameter: external=True returns localhost:port for browser access, external=False returns internal Docker IP for container-to-container communication
- LLMConfig.get_max_output_tokens(): Model-specific output token limits (64K Claude 4, 100K o1, 16K GPT-4, 8K Gemini)
- Browser MAX_TABS=20 limit with automatic cleanup of oldest tabs
- Shell session MAX_SHELL_SESSIONS=15 limit with clear error messages
- Anthropic native thinking blocks support via beta endpoint
- Extended context (1M tokens) support for Claude models
## Frontend Improvements
- Added selectIsStopped selector for proper stopped state UI handling
- Fixed agent task state transitions for cancelled sessions
- Improved subagent container with session awareness
## New Test Coverage (343 tests total)
- tests/llm/test_llm_config.py: LLMConfig.get_max_output_tokens() tests
- tests/tools/test_browser_tab_limit.py: Browser MAX_TABS enforcement
- tests/tools/test_resource_limits.py: Browser and shell session limits
- tests/tools/test_generation_config_factory.py: Image/video generation configs
- tests/tools/test_openai_dalle.py: DALL-E 3 image generation client
- tests/tools/test_openai_sora.py: Sora video generation client
- tests/storage/test_local_storage.py: LocalStorage.get_permanent_url()
- tests/storage/test_tool_local_storage.py: Tool server LocalStorage
## Code Quality
- Removed debug print statements from anthropic.py
- Removed trailing whitespace from all files
- Fixed test assertions to match implementation behavior
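To make the `external` flag concrete, here is a minimal sketch of the two address forms the commit message describes. The function name `exposed_address` and its parameters are hypothetical, and pairing the internal IP with the container port is an assumption; the PR's `expose_port` signature may differ.

```python
def exposed_address(
    host_port: int,
    container_ip: str,
    container_port: int,
    external: bool = True,
) -> str:
    """Return the address a caller should use to reach a sandbox service."""
    if external:
        # Browser on the host: go through the host-mapped port.
        return f"localhost:{host_port}"
    # Another container on the same Docker network: use the internal IP directly.
    return f"{container_ip}:{container_port}"
```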
…proposed changes.
## Resource Management
- Browser: Increase MAX_TABS from 20 to 50
- Browser: Enforce tab limit in _on_page_change handler for popups/target=_blank
- Shell: Auto-close oldest session when MAX_SHELL_SESSIONS limit reached (replaces error-based rejection)
## Admin Tools
- Add scripts/admin_credits.sh for user credit management
  - list: View all users and balances
  - show/topup/set/bonus: Manage individual user credits
## Documentation
- Fix SANDBOX_DATABASE_URL to use asyncpg driver in .stack.env.local.example
- Update feature-branch-analysis.md: Claude 3.5 → Claude 4.5, improve wording
## Tests
- Update browser tests for MAX_TABS=50
- Add tests for _on_page_change tab limit enforcement
- Update shell tests for auto-close behavior
## Build
- uv.lock: Add prerelease-mode = "allow" option
…tension architecture docs
- Refactor run_stack.sh to support both cloud (E2B/ngrok) and local-only (Docker sandbox) modes
- Add start, stop, restart, status, logs, build, and setup commands
- Implement auto-detection of local mode based on env file presence
- Auto-create env files from templates with helpful setup instructions
- Add --local flag for explicit local mode selection
- Fix color output using $'...' syntax and printf for portability
- Add VS Code extension architecture documentation with detailed design specs
- Add executive summary document for VS Code extension integration
Converts HTML files (slides, pages, etc.) to a single multi-page PDF using Playwright/Chromium. Each HTML file becomes exactly one page with full content capture (no truncation).
Features:
- Automatic content height detection
- Configurable viewport width and DPI
- Supports directory input or specific files
- Progress output with quiet mode option
…ox modes
- Add get_available_ports() method to SandboxInterface (defaults to None)
- Implement get_available_ports() in sandbox adapter to return [3000, 5173, 8080] for Docker/local mode
- Make RegisterPort tool description dynamic based on sandbox mode:
  - Local mode: instructs agent to use only ports 3000, 5173, or 8080
  - Cloud mode: allows any port (original behavior)
- Add unit tests for get_available_ports() in SandboxInterface
- Fix test_resource_limits.py to match current implementation:
  - Update MAX_TABS expected value from 20 to 50
  - Update shell session tests to expect auto-close behavior instead of rejection
This fixes the issue where agents in local Docker sandbox mode would try to use ports like 8000 that aren't pre-mapped, causing HTTP 500 errors.
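The mode-dependent tool description could be built along these lines. This is a sketch under assumptions: `register_port_description` and the wording of the strings are hypothetical; only the `None` default for cloud mode and the ports 3000, 5173, and 8080 come from the commit message.

```python
from typing import List, Optional


def register_port_description(available_ports: Optional[List[int]]) -> str:
    """Build the tool description from what the sandbox reports as usable ports."""
    base = "Register a port so the service running on it can be reached."
    if not available_ports:
        # Cloud mode: get_available_ports() returns None, so any port is allowed.
        return base + " Any port may be used."
    allowed = ", ".join(str(port) for port in available_ports)
    # Local Docker mode: only pre-mapped ports work.
    return base + f" Use only one of these pre-mapped ports: {allowed}."
```

For example, `register_port_description([3000, 5173, 8080])` yields a description that steers the agent away from unmapped ports such as 8000.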
…rvice controls to run_stack.sh
- Fix LocalStorage.get_upload_signed_url() to use internal_url_base (default) for server-to-server uploads, preventing hang when backend tries to upload via localhost URL inside Docker container
- Add individual service start/stop/restart controls to run_stack.sh (e.g., ./scripts/run_stack.sh restart frontend --local)
…control wake
Slides content processor:
- Fix deadlock when LocalStorage upload URL points to backend itself
- Use direct write for LocalStorage (avoid self-request)
- Use async httpx with timeout for external storage (GCS/S3)
Subagent interrupt handling:
- Add SUB_AGENT_INTERRUPTED event type for proper UI state
- Add emit_completion_event() helper to BaseAgentTool
- Update all subagent tools to emit correct completion/interrupted events
- Handle asyncio.CancelledError in CodexAgent
Stack control:
- Replace run_stack.sh with unified stack_control.sh
- Add 'wake' command to restart stopped sandbox containers after reboot
- Add 'recover' command to fix stuck sessions and restart backend
- Auto-detect local vs cloud mode from running containers
…rescan
Backend:
- agent_controller: Extract image paths instead of embedding base64 (prevents 413 errors)
- agent_service: Dynamic token budget (70% of model context window)
- anthropic: Replace print() with logger for 1M context logging
- openai: Strip images from tool results, fix content merging, handle None stop_sequence
- port_manager: Add rescan_containers() for runtime port sync
- sandbox main: Add /ports/rescan endpoint, provider type guards
- system_prompt: Add <user_uploads> section to 5 prompts
Frontend:
- Add SUB_AGENT_INTERRUPTED event type and handler
- Mark subagents completed on COMPLETE/ERROR events
- Improve subagent status fallback logic
- Display template names in slide selector
Scripts:
- stack_control: Add cleanup command, _resync_sandbox_ports helper
Tests:
- Add TestRescanContainers (7 tests)
- Add test_openai_tool_results.py (14 tests)
- Add test_agent_controller_images.py (9 tests)
- Add test_event_types.py (8 tests)
- Extend local storage and llm_config tests
… type
- Redirect chat sessions to /chat?id={sessionId} when accessed via /:sessionId route
- Add defensive normalization in share-agent-content for chat sessions
- Add 'stopped' to AgentContext.status type to fix TypeScript build error
Fixes validation error when chat sessions were opened via agent URL, which
caused WebSocket API to reject 'chat' as invalid agent_type.
…ing timeouts
- websocket-context: prioritize URL sessionId over stale Redux activeSessionId
- stack_control.sh: use --force-recreate for restart to pick up env changes
- anthropic provider: add explicit httpx timeout (600s read) for extended thinking
- anthropic provider: disable HTTP/2 for more reliable streaming
- service.py: log warning instead of crash when stream has no response
- router.py: add event counting and timing logs for SSE debugging
- anthropic.py: add missing logger import
- .stack.env.local.example: document VITE_API_URL and LOCAL_STORAGE_URL_BASE alignment
570dc5a to 7bae40d
…-agent handoff
Add tiered timeout protection and interrupt-during-execution support to prevent agent sessions from hanging indefinitely on stuck tool calls. Fix CAPTCHA/bot-detection handoff workflow so the agent can pause for human intervention and resume with the browser session intact.
- Wrap every tool call in asyncio.wait_for(timeout=120s) so hung browser/MCP operations no longer block the agent loop forever
- On timeout, return an error ToolResult telling the agent the browser is still running and to use browser_wait to re-assess state
- Tools can override the default 120s via a per-tool 'timeout' attribute
- Reduce MCP DEFAULT_TIMEOUT from 1800s to 300s as a hard backstop
- New _run_tool_with_interrupt() polls is_interrupted() every 2s during tool execution, enabling the Cancel button to work mid-tool
- run_tools_batch() and _run_tools_serially() accept an interrupt_check callback; serial execution stops early on interrupt
- agent_controller passes self.is_interrupted to run_tools_batch() and checks for interrupted tool results after execution
- Rewrite CAPTCHA handling instructions: agent must register port 6080 (noVNC) and hand off to user instead of attempting to solve or restart
- Expose port 6080 (noVNC) in DOCKER_AVAILABLE_PORTS so the agent can share a working noVNC URL for human interaction
- Browser session is preserved across handoff (no restart on timeout)
- Add _sanitize_thinking_blocks() to State: ensures no assistant turn ends with a ThinkingBlock/RedactedThinkingBlock (causes Claude 400)
- Apply sanitization on both save and restore (defense-in-depth)
- Add safety net in AnthropicDirectClient for both streaming and non-streaming paths: append placeholder text if needed
- Append COMPLETE_MESSAGE in agent_controller when model response ends with a thinking block
- Add _normalize_agent_type() to handle 'chat' -> AgentType.GENERAL mapping, preventing KeyError on prompt lookup
- Normalize in get_specialized_instructions() and get_system_prompt_for_agent_type()
- Track failed tool lookups separately and add error ToolResults to history so the conversation state stays consistent
- Add client_host_var ContextVar to track the browser hostname from incoming WebSocket requests
- Used by sandbox expose_port to rewrite localhost URLs for the client
- Add sandbox status tracking and websocket reconnection improvements
- Add model constants and share-agent-content fixes
- Add navigation leave-session hook update
- Add utility tests for URL/formatting helpers
- sandbox_controller recovery improvements
- Docker sandbox port exposure and README updates
- register_port tool fix
- Add comprehensive tests for: thinking block sanitization, agent type normalization, Anthropic safety nets, chat session error handling, failed tool lookup, handshake status, query handler status, sandbox handlers, system prompt noVNC, sandbox controller recovery, client host extraction, and frontend utils
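The combination of a hard timeout and interrupt polling described in this commit can be sketched as below. This is an illustration, not the PR's `_run_tool_with_interrupt()`: the function name `run_with_timeout_and_interrupt` and the dict-shaped result are assumptions, while the 120s default and the 2s polling interval come from the commit message.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def run_with_timeout_and_interrupt(
    tool_call: Awaitable[Any],
    is_interrupted: Callable[[], bool],
    timeout: float = 120.0,
    poll_interval: float = 2.0,
) -> Dict[str, Any]:
    """Run a tool call, stopping early on user interrupt or hard timeout."""
    task = asyncio.ensure_future(tool_call)

    async def _poll() -> Dict[str, Any]:
        while True:
            # Wake up every poll_interval seconds to check the interrupt flag.
            done, _ = await asyncio.wait({task}, timeout=poll_interval)
            if done:
                return {"status": "ok", "output": task.result()}
            if is_interrupted():
                return {"status": "interrupted", "output": None}

    try:
        return await asyncio.wait_for(_poll(), timeout=timeout)
    except asyncio.TimeoutError:
        return {"status": "timeout", "output": None}
    finally:
        # Cancel the underlying call if it is still running.
        if not task.done():
            task.cancel()
```

Polling rather than awaiting the tool directly is what lets a Cancel button take effect mid-tool: the caller regains control every `poll_interval` seconds even when the tool itself never yields a result.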
Pull Request: Local Docker Sandbox Provider & Resource Management
Summary
This PR introduces a complete local Docker sandbox provider for air-gapped/local deployments, comprehensive resource management limits, and extensive test coverage (~343 new tests).
Key Features
🐳 Local Docker Sandbox Provider
- /internal/sandboxes/{id}/has-active-session endpoint for session verification
💾 Local Storage Providers
- LocalStorage providers for both ii_agent and ii_tool
🔒 Resource Management
- MAX_TABS limit with automatic cleanup of oldest tabs
- _on_page_change now enforces tab limits on externally-created pages (popups, target="_blank")
- MAX_SHELL_SESSIONS=10 limit with auto-close of oldest session when limit reached
🧠 LLM Enhancements
- LLMConfig.get_max_output_tokens(): Model-specific output token limits (64K Claude 4, 100K o1, 16K GPT-4, 8K Gemini)
💬 Chat Improvements
🎨 Frontend
Test Coverage
125 files changed, 16,573 insertions(+), 297 deletions(-)
New test files:
- test_browser_tab_limit.py: Browser MAX_TABS enforcement
- test_resource_limits.py: Browser and shell session limits
- test_shell_tools.py: Shell session management
- test_llm_config.py: LLM configuration
- test_generation_config_factory.py: Image/video generation configs
- test_openai_dalle.py / test_openai_sora.py: Media generation clients
- test_local_storage.py / test_tool_local_storage.py: Storage providers
- test_file_*.py: File operation tools
- test_terminal_manager.py: Terminal session management
Code Quality
Added comprehensive documentation for architecture and design