sandbox support #110
…nput improvements
Implement comprehensive autosave functionality with interactive session restoration:
- Add autosave picker modal for TUI that displays recent sessions with metadata (message counts, timestamps)
- Sessions stored in ~/.code_puppy/autosaves with stable session IDs that persist across prompts
- Auto-restore prompt on startup allows loading previous sessions in both CLI and TUI modes
- Loading autosave sets it as active target; manual context loads rotate session ID to prevent overwrites
- Add /session commands to view current ID and rotate to new session
Enhance multiline input handling across interfaces:
- Add multiline mode toggle with Alt+M or F2 in CLI (persistent until toggled off)
- Improve newline insertion with Ctrl+J (universal) and Ctrl+Enter keybindings
- Update TUI to use Shift+Enter for newlines (more intuitive than Alt+Enter)
- Add visual feedback for multiline mode status
Improve user experience and polish:
- Preload agent/model on TUI startup with loading indicator before first prompt
- Tighten system message whitespace to reduce visual clutter
- Silence "no MCP servers" message when none are configured
- Reuse existing agent instance to avoid redundant reloads
- Align command help text columns for better readability
- Upgrade prompt-toolkit to 3.0.52 for improved terminal compatibility
…clear
- Rotate autosave session ID when switching agents to prevent cross-agent context pollution
- Rotate autosave session ID when clearing conversation history to maintain clean state
- Add finalize_autosave_session helper that persists current snapshot before rotation
- Add refresh_config method to JSONAgent to reload configuration after external edits
- Skip rotation when switching to same agent or when agent doesn't exist
- Improve /model pin command to refresh active agent config immediately
- Add comprehensive test coverage for autosave rotation logic and edge cases
… and model switching
- Update get_model_context_length() to respect agent-specific model pins via get_model_name()
- Add graceful fallback to prevent status bar crashes if model config lookup fails
- Ensure immediate agent reload when switching models in both CLI and TUI interfaces
- Call refresh_config() for JSON agents before reload to pick up new model settings
- Wrap reload operations in try-except to maintain stability during model changes
Add comprehensive attachment handling to enable users to drag files or paste URLs directly into prompts. The system automatically detects and processes images, PDFs, and other supported file types, passing them as binary content or URL references to the language model.
- Implement `attachments.py` parser with shell-like tokenization to extract file paths and URLs from raw prompt text
- Support local binary attachments (images: png/jpg/gif/webp, documents: pdf/txt/md) with MIME type detection
- Support remote URL attachments (http/https image and document links) using pydantic-ai's ImageUrl/DocumentUrl types
- Extend BaseAgent.run_with_mcp() to accept optional attachments and link_attachments parameters
- Add run_prompt_with_attachments() helper in main.py to parse, validate, and execute prompts with attachments
- Integrate attachment processing into both interactive mode and single-prompt execution flows
- Provide user-friendly warnings for unsupported file types, missing files, or permission errors
- Generate default prompt "Describe the attached files in detail" when user provides only attachments
- Add comprehensive test coverage for parsing logic, file type detection, and integration with agent execution
Previously, BinaryContent objects in messages were silently skipped during string formatting, making it difficult to debug messages containing binary data. Now BinaryContent is explicitly marked in the formatted output.
Implement comprehensive handling of terminal drag-and-drop file paths that contain backslash-escaped spaces, ensuring reliable attachment detection and proper prompt cleaning across the command processing pipeline.
- Introduce `_unescape_dragged_path()` to normalize backslash-space sequences before path resolution
- Use sentinel markers to preserve escaped spaces during shlex tokenization in `_detect_path_tokens()`
- Track token start indices in `_DetectedPath` for accurate span-based text replacement
- Rebuild cleaned prompts using token-span logic to maintain exact punctuation and spacing
- Update placeholder processor to use token spans for robust visual replacement with escaped paths
- Parse attachments before command detection to prevent leading file paths from being misinterpreted as commands
- Add test coverage for drag-and-drop escaped space handling
- Remove unused `_clean_binaries()` method from base agent
- Disable POSIX mode in shlex.split() on Windows to prevent backslash escaping
- Ensures Windows file paths with backslashes are correctly parsed as attachments
- Fixes issue where backslashes in paths were being incorrectly interpreted as escape characters
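The POSIX-mode difference behind this fix can be reproduced with the standard library alone. The following is a minimal sketch, not the project's actual helper: `split_prompt_tokens` and its `windows` override are hypothetical names introduced for illustration.

```python
import shlex
import sys
from typing import List, Optional


def split_prompt_tokens(text: str, windows: Optional[bool] = None) -> List[str]:
    """Tokenize prompt text, keeping backslashes literal on Windows.

    `windows` is a hypothetical override so the behavior can be exercised
    on any platform; by default the running platform decides.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    # In POSIX mode a backslash outside quotes escapes the next character,
    # which mangles paths such as C:\Users\me\cat.png.
    # In non-POSIX mode backslashes are left untouched.
    return shlex.split(text, posix=not windows)


if __name__ == "__main__":
    sample = r"describe C:\Users\me\cat.png"
    print(split_prompt_tokens(sample, windows=True))
    print(split_prompt_tokens(sample, windows=False))
```

One trade-off to keep in mind: in non-POSIX mode `shlex` keeps surrounding quote characters as part of the token, so a quoted path would need its quotes stripped separately.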
- Remove document file type support (.pdf, .txt, .md)
- Streamline media type detection to focus solely on images
- Eliminate document-specific fallback logic in MIME type detection
- Update function documentation to reflect image-only scope
- Add paginated display showing 5 sessions per page instead of fixed top-5 list
- Replace single prompt with interactive loop supporting page navigation
- Enable option 6 to cycle through pages or return to first page when at end
- Preserve existing selection methods (numeric choice and direct name entry)
- Improve user feedback with page-specific prompts and invalid selection warnings
- Maintain backward compatibility with original session restoration behavior
…tory mutation
- Remove URL detection from _parse_link to prevent URLs from being treated as attachments
- URLs in prompts now remain as plain text instead of being converted to ImageUrl or DocumentUrl
- Fix potential mutation bug in agent_manager by storing shallow copies of message histories
- Prevent shared list instances between agent history cache and active agents
- Update tests to reflect new behavior where URLs are left untouched in prompts
- Add persistent profile directory management to maintain browser state across runs
- Configure Camoufox to use stored cookies, localStorage, and history via storage_state
- Save browser context state on cleanup to preserve session data for future use
- Display profile directory path in startup info and state save confirmation
- Ensure profile directory is created if it doesn't exist at ~/.code_puppy/camoufox_profile
- Fix UsageLimits import to use public API instead of private module path
- Downgrade pydantic-ai from 1.0.6 to 1.0.5 for stability
- Refactor Camoufox browser initialization to use persistent context mode
- Remove automatic homepage navigation to prevent duplicate tabs
- Implement lazy page creation in get_current_page() method
- Add persistent storage state handling for cookies and localStorage
…th handling
- Add new claude-4.1-opus model with 200k context length to models.json
- Implement MAX_PATH_LENGTH limit (1024 chars) to prevent OS filesystem errors
- Skip real-time path detection for long text input (>500 chars) to avoid UI slowdown
- Improve error handling for filesystem operations with better OSError catching
- Add tests for very long token handling and long paragraph paste scenarios
- Refactor code formatting for better readability in several functions
- Updated configuration to include 'http2' option for enabling HTTP/2 protocol.
- Implemented functions to get and set the 'http2' configuration value.
- Modified client creation functions to utilize HTTP/2 if enabled in the configuration.
- Updated tests to verify the inclusion of 'http2' in configuration keys.
Co-authored-by: Mike Pfaffenberger <[email protected]>
Add support for the Synthetic provider (https://dev.synthetic.new), enabling access to high-quality open-source models through an OpenAI-compatible API. This integration provides developers with additional model options featuring generous context windows.
- Add four new Synthetic provider models: DeepSeek-V3.1-Terminus (128K), Kimi-K2-Instruct-0905 (256K), Qwen3-Coder-480B-A35B-Instruct (256K), and GLM-4.6 (200K)
- Configure custom OpenAI endpoint at api.synthetic.new/openai/v1/ for all Synthetic models
- Document SYN_API_KEY environment variable setup in README
- Add dedicated Synthetic Provider section explaining available models and their capabilities
… discovery, and alphabetization (mpfaffenberger#103)
* fix: require space after slash commands for proper triggering
- Updated model picker completion to only trigger when "/model " is followed by a space
- Modified pin command completion to require space after "/pin_model" trigger
- Prevents false positives when text contains command patterns as substrings
- Ensures commands are only recognized when properly formatted with space separator
* fix: improve /set command completion by requiring space after trigger
- Fixed completion trigger to require a space after "/set" before showing suggestions
- Simplified logic by removing unnecessary case handling for exact trigger matches
- Improved cursor position handling for more accurate text replacement
- Maintained special handling for 'model' and 'puppy_token' keys
- Reduced code complexity while preserving existing functionality
* feat: add slash command completion to command line interface
- Implement SlashCompleter class that provides autocomplete for slash commands
- Trigger completion when '/' is the first non-whitespace character
- Show all available commands and their aliases with descriptions
- Filter completions based on partial input after the slash
- Integrate new completer into the existing completion system
- Handle command loading failures gracefully to prevent crashes
* style: sort completions alphabetically for consistent user experience
- Sort config keys in SetCompleter to provide predictable completion order
- Sort commands by name in SlashCompleter for consistent command suggestions
- Sort command aliases alphabetically in SlashCompleter
- Improve readability and consistency of command completion interface
* feat: improve command completion sorting and display
- Refactor command completion to collect all completions before yielding
- Implement case-insensitive alphabetical sorting for primary commands and aliases
- Sort aliases by their alias name rather than their primary command name
- Maintain proper display formatting showing aliases with reference to primary commands
- Preserve existing behavior while improving user experience with consistent ordering
* feat: add agent command completion and standardize completer behavior
- Add new AgentCompleter for /agent command with agent name suggestions
- Standardize completion triggers to require space after commands (/load_context, /cd, /agent)
- Refactor LoadContextCompleter to use simplified trigger logic and improve session name extraction
- Update CDCompleter logic for consistent directory path handling and completion positioning
- Integrate AgentCompleter into the combined completion system for seamless agent selection
---------
Co-authored-by: cellwebb <[email protected]>
* feat: add shorthand "/m" trigger for model completion
- Added ModelNameCompleter with "/m" trigger alongside existing "/model" command
- Provides users with a shorter alternative for switching models in the CLI
- Maintains consistency with existing completion system architecture
* feat: add tab completion for MCP commands
- Introduce MCPCompleter class to provide intelligent tab completion for /mcp commands
- Support completion for both server-specific subcommands (start, stop, restart, status, logs, remove) and general subcommands (list, start-all, stop-all, test, add, install, search, help)
- Include dynamic server name completion when using server-specific subcommands
- Implement caching mechanism to reduce repeated server name lookups
- Integrate MCPCompleter into the main completion system alongside other command completers
This enhancement improves user experience when working with MCP (Model Context Protocol) commands by providing contextual suggestions and reducing the need to remember exact command syntax and server names.
* fix: improve MCP command completion logic for subcommands and server names
- Reorder completion logic to prioritize server name completion when appropriate
- Fix cursor position detection to properly handle cases with spaces after subcommands
- Improve handling of partial subcommand vs server name completion
- Ensure subcommand completions only appear when no space follows the subcommand
- Remove unused Optional import from typing module
The changes resolve issues where server name completions would not appear correctly when typing "/mcp start " or similar commands with trailing spaces, while maintaining proper subcommand completion behavior.
---------
Co-authored-by: cellwebb <[email protected]>
- Add space suggestion when user types partial commands like /load_context or /set
- Enhance directory completion to preserve user's original path prefixes (~/, ./)
- Refactor model command handling to use command handler for consistent feedback
- Remove prompt-level model processing to allow proper command execution flow
- Update tests to reflect new model handling approach
…ration
- Implement interactive_autosave_picker() with split-panel interface for browsing and loading autosave sessions
- Add interactive_diff_picker() with live preview for configuring diff colors and styles
- Replace text-based diff configuration commands with rich visual menu system
- Improve autosave loading flow with better session metadata display and preview
- Fix HTML escaping in arrow_select_async to prevent parsing errors
- Remove redundant diff style example emission from config setters
The new TUI interfaces provide a more intuitive and visually appealing way to manage autosave sessions and customize diff display settings, with live previews and keyboard navigation for better user experience.
- Add TTY detection to determine when to use TUI vs text-based autosave picker
- Fall back to original text-based picker for tests and non-interactive environments
- Support CODE_PUPPY_NO_TUI environment variable for explicit control
- Update UI labels to be more generic ("Last Message" instead of "Last User Message")
- Remove TUI-specific expectations from integration tests to work with fallback picker
- Ensure autosave loading works consistently across different terminal environments
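The picker selection described in this commit (TTY detection plus an environment override) could look roughly like the sketch below. `should_use_tui` is a hypothetical helper, and treating any non-empty `CODE_PUPPY_NO_TUI` value as "disable the TUI" is an assumption; the commit does not say which values are recognized.

```python
import os
import sys


def should_use_tui(env=None, stdin=None, stdout=None) -> bool:
    """Decide between the TUI picker and the text-based fallback.

    The parameters default to the real environment and streams; they are
    injectable only so the decision can be tested without a terminal.
    """
    env = os.environ if env is None else env
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout

    # Assumption: any non-empty value of CODE_PUPPY_NO_TUI disables the TUI.
    if env.get("CODE_PUPPY_NO_TUI"):
        return False

    # Tests and piped/non-interactive runs have no TTY on one or both streams.
    return bool(stdin.isatty() and stdout.isatty())


if __name__ == "__main__":
    print("TUI picker" if should_use_tui() else "text-based picker")
```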
- Implement interactive arrow-key selector for agent switching when no agent name is provided
- Add preview panel showing agent descriptions during selection
- Enhance arrow_select_async to support optional preview callbacks with boxed preview display
- Add robust error handling with fallback to text-based listing if interactive picker fails
- Include auto-save session rotation when switching agents
- Display current agent status and availability indicators in the picker
- Force UI redraws when navigating through options to update preview content
Introduce a comprehensive safety assessment system for shell commands executed in yolo_mode that prevents accidental execution of destructive commands.
Key changes:
- Add new shell_safety plugin with AI agent for risk assessment
- Implement configurable risk threshold system (none/low/medium/high/critical)
- Add safety callback that intercepts commands before execution
- Update command runner to integrate safety checks and handle blocking
- Make shell command execution async to support safety assessment workflows
- Add fail-safe behavior that blocks commands when assessment fails
The safety agent evaluates commands for destructive patterns like file system destruction, database operations, privilege escalation, and data exfiltration risks. Commands exceeding the configured risk threshold are automatically blocked with clear override instructions.
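The threshold comparison and fail-safe rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin's code: `should_block` is a hypothetical name, and "exceeding" is read as strictly greater than the configured level.

```python
from typing import Optional

# Ordered from least to most severe, as listed in the commit message.
RISK_LEVELS = ["none", "low", "medium", "high", "critical"]


def should_block(assessed: Optional[str], threshold: str) -> bool:
    """Return True when a command should be blocked.

    `assessed` is the risk level produced by the safety assessment, or
    None / an unknown string if the assessment failed.
    """
    if threshold not in RISK_LEVELS:
        raise ValueError(f"unknown threshold: {threshold!r}")
    # Fail-safe: a missing or unrecognized assessment blocks the command.
    if assessed not in RISK_LEVELS:
        return True
    # Assumption: only risk strictly above the threshold is blocked.
    return RISK_LEVELS.index(assessed) > RISK_LEVELS.index(threshold)


if __name__ == "__main__":
    print(should_block("high", "medium"))    # blocked
    print(should_block("medium", "medium"))  # allowed
    print(should_block(None, "critical"))    # blocked (assessment failed)
```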
- Extended sleep times from 0.5s and 1.0s to 5s each to allow proper initialization
- Reduced expect timeout from 60s to 10s for more efficient test execution
- These changes address timing issues that were causing test failures in session rotation scenarios
- Add get_user_approval_async function to replace synchronous version
- Update command_runner.py to use the new async approval function
- Maintain proper spinner pausing/resuming during user interaction
- Preserve all existing functionality including feedback collection and result display
- Ensure proper console state management to prevent display artifacts
…oved filtering (mpfaffenberger#108)
* chore: update .gitignore to exclude environment files and serena directory
- Add .env to prevent environment variable files from being tracked
- Add .serena/ directory to gitignore to exclude local serena configuration
* refactor: remove deprecated Cerebras-Qwen3-Coder-480b model
- Removed the Cerebras-Qwen3-Coder-480b model configuration from models.json
- This model appears to be deprecated or no longer supported
- Cleanup reduces configuration clutter and removes unused model reference
* feat: filter Claude models to keep only latest versions
Add intelligent model filtering to automatically keep only the most recent versions of Claude models (haiku, sonnet, opus) when configuring the plugin.
- Implement filter_latest_claude_models() function that parses model names using semantic versioning patterns
- Support both dashed (claude-haiku-3-5-20241022) and dotted (claude-haiku-3.5-20241022) version formats
- Compare models by major version, then minor version, then date to determine latest
- Integrate filtering into add_models_to_extra_config() to reduce configuration clutter
- Add logging to track filtering results for debugging
This ensures users only see the latest model variants while maintaining backward compatibility with existing API responses.
* feat: add /unpin command to reset agent models to default
- Implement new /unpin command that removes pinned models from agents
- Add support for both JSON agents and built-in Python agents
- Extend /pin_model command to handle '(unpin)' special case for convenience
- Add autocompletion support for the new unpin command
- Include (unpin) option in pin command completion suggestions
- Add command registration and test coverage for the unpin functionality
This change provides users with an easy way to reset agents to their default models, complementing the existing pin functionality and improving the overall model management workflow.
* feat: filter Claude Code OAuth models to show only latest versions
- Add load_claude_models_filtered() function to return only the latest haiku, sonnet, and opus models
- Update ModelFactory to use filtered loading for Claude Code OAuth models to prevent showing duplicate older versions
- Modify register_callbacks to use filtered models when handling custom commands
- Apply same filtering logic during loading that was previously only used during saving
- Improve user experience by reducing model selection noise and confusion
* feat: add GPT-5.1 and GPT-5.1-codex models to configuration
- Added new OpenAI model configurations for GPT-5.1 and GPT-5.1-codex variants
- Both models support context length of 272,000 tokens
- Extends available model options for users with access to newer GPT versions
* feat: add gpt-5.1-codex-mini-api model configuration
- Added new OpenAI model "gpt-5.1-codex-mini" with 272k context length
- Extended available model options for code generation capabilities
- Maintained consistent model configuration structure
* feat: update GLM-4.5 model to GLM-4.5-air variant
- Changed model key from "glm-4.5-coding" to "glm-4.5-air-coding"
- Updated API model key from "glm-4.5-api" to "glm-4.5-air-api"
- Modified model name references from "glm-4.5" to "glm-4.5-air"
- Maintains same model types (zai_coding and zai_api) for consistency
* feat: add Claude 4.5 Haiku model and update Anthropic model names
- Add new claude-4-5-haiku model with 200k context length
- Update Anthropic model names to simplified format:
  - claude-sonnet-4-0 (was claude-sonnet-4-20250514)
  - claude-sonnet-4-5 (was claude-sonnet-4-5-20250929)
  - claude-opus-4-1 (was claude-opus-4-1-20250805)
- Standardize naming convention for better maintainability
---------
Co-authored-by: cellwebb <[email protected]>
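The "latest version per family" comparison described in this change (major, then minor, then date, across dashed and dotted names) can be sketched as follows. This is not the project's `filter_latest_claude_models()`; `latest_per_family` and the regular expression are assumptions made for illustration, and names that do not match the pattern are simply dropped here.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed name shape: claude-<family>-<major>[-|.<minor>]-<YYYYMMDD>
_PATTERN = re.compile(
    r"^claude-(haiku|sonnet|opus)-(\d+)(?:[-.](\d{1,2}))?-(\d{8})$"
)


def latest_per_family(names: Iterable[str]) -> List[str]:
    """Keep only the newest model name for each Claude family."""
    best: Dict[str, Tuple[Tuple[int, int, int], str]] = {}
    for name in names:
        match = _PATTERN.match(name)
        if not match:
            continue  # not a recognizable versioned Claude name
        family, major, minor, date = match.groups()
        # A missing minor version (e.g. claude-sonnet-4-20250514) counts as 0.
        key = (int(major), int(minor or 0), int(date))
        if family not in best or key > best[family][0]:
            best[family] = (key, name)
    return sorted(name for _, name in best.values())


if __name__ == "__main__":
    print(latest_per_family([
        "claude-haiku-3-5-20241022",
        "claude-haiku-3.5-20240101",
        "claude-sonnet-4-20250514",
        "claude-sonnet-4-5-20250929",
        "claude-opus-4-1-20250805",
    ]))
```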
- Extended model factory to recognize and properly instantiate gpt-5.1-codex-api
- Ensures the new model uses OpenAIResponsesModel like gpt-5-codex-api
- Maintains consistency with existing model instantiation patterns
- Replace explicit model name checks with generic substring detection
- Improve maintainability by using "codex" in model_name instead of hardcoded names
- Reduce code duplication and make logic more resilient to future model additions
- Add comprehensive proxy configuration support for HTTP/HTTPS traffic in http_utils.py
- Introduce CODE_PUPPY_DISABLE_RETRY_TRANSPORT environment variable for testing scenarios
- Disable custom retry transport when proxy environment variables are detected to ensure compatibility
- Add comprehensive integration test with custom proxy server to monitor and validate network traffic
- Implement strict domain whitelist validation to ensure only authorized external domains are contacted
- Remove deprecated ENVIRONMENT_VARIABLES.md documentation file
- Enhance client creation logic to properly handle proxy settings and SSL verification based on environment
* feat: add comprehensive code execution sandboxing
Implements sandboxing for shell command execution inspired by Anthropic's
Claude Code approach. Provides dual-layer isolation: filesystem and network.
## Features
**Filesystem Isolation:**
- Linux: bubblewrap for namespace isolation
- macOS: sandbox-exec with custom profiles
- Restricts access to current working directory
- Blocks sensitive paths (~/.ssh, ~/.aws, ~/.gnupg)
- Configurable allowed read/write paths
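To make the Linux side concrete, here is one way a bubblewrap command line could be assembled. This is a sketch, not the contents of linux_isolator.py: `build_bwrap_argv` and its parameters are hypothetical, and the mount layout (read-only root, writable working directory, sensitive directories masked with an empty tmpfs) is an assumption about how the listed restrictions might be expressed.

```python
from typing import List, Sequence

DEFAULT_DENIED = ("~/.ssh", "~/.aws", "~/.gnupg")


def build_bwrap_argv(
    command: str,
    cwd: str,
    home: str,
    denied: Sequence[str] = DEFAULT_DENIED,
) -> List[str]:
    """Return an argv that runs `command` under bubblewrap.

    The caller is expected to pass only denied paths that exist: masking
    a missing directory on a read-only root would make bwrap fail.
    """
    argv = [
        "bwrap",
        "--die-with-parent",        # kill the sandbox if the parent exits
        "--unshare-pid",            # separate PID namespace
        "--ro-bind", "/", "/",      # whole filesystem visible, read-only
        "--dev", "/dev",
        "--proc", "/proc",
        "--tmpfs", "/tmp",
        "--bind", cwd, cwd,         # working directory stays writable
    ]
    for path in denied:
        expanded = path.replace("~", home, 1) if path.startswith("~") else path
        # An empty tmpfs hides the real directory contents.
        argv += ["--tmpfs", expanded]
    argv += ["--chdir", cwd, "/bin/sh", "-c", command]
    return argv


if __name__ == "__main__":
    print(" ".join(build_bwrap_argv("ls -la", "/home/u/project", "/home/u")))
```

Mount order matters in bubblewrap: later options are applied on top of earlier ones, which is why the read-only root comes first and the masks come last.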
**Network Isolation:**
- HTTP/HTTPS proxy with domain allowlisting
- Pre-approved safe domains (package registries, git hosts)
- Optional user approval for new domains
- Network traffic monitoring and logging
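The core of domain allowlisting is a host comparison that a proxy would run before forwarding a request. The sketch below assumes that an allowlist entry also covers its subdomains; `is_domain_allowed` is a hypothetical helper and the example domains are placeholders, since the pre-approved list is not spelled out here.

```python
from typing import Iterable


def is_domain_allowed(host: str, allowlist: Iterable[str]) -> bool:
    """Return True if `host` equals an allowlisted domain or is a subdomain."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower().strip(".")
        # Require a dot boundary so "evilgithub.com" does not match "github.com".
        if host == entry or host.endswith("." + entry):
            return True
    return False


if __name__ == "__main__":
    # Placeholder allowlist for illustration only.
    allowed = ["pypi.org", "github.com"]
    for candidate in ("api.github.com", "evilgithub.com", "example.org"):
        print(candidate, is_domain_allowed(candidate, allowed))
```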
**Security:**
- Opt-in by default (explicit user control)
- Fail-safe fallback to unsandboxed execution
- Prevents access to SSH keys, cloud credentials
- Limits blast radius of compromised dependencies
## Implementation
Core Components:
- code_puppy/sandbox/base.py - Base classes and interfaces
- code_puppy/sandbox/linux_isolator.py - Bubblewrap implementation
- code_puppy/sandbox/macos_isolator.py - sandbox-exec implementation
- code_puppy/sandbox/network_proxy.py - Proxy server
- code_puppy/sandbox/config.py - Configuration management
- code_puppy/sandbox/command_wrapper.py - Main wrapper
Integration:
- Integrated into command_runner.py subprocess execution
- Added /sandbox CLI commands for management
- Configuration stored in ~/.code_puppy/sandbox_config.json
## Commands
- /sandbox enable - Enable sandboxing
- /sandbox disable - Disable sandboxing
- /sandbox status - Show configuration
- /sandbox test - Test availability
- /sandbox allow-domain <domain> - Add domain to allowlist
- /sandbox allow-path <path> - Add filesystem path
## Testing
- 43 unit and integration tests (100% passing)
- Tests for Linux (bubblewrap) isolation
- Tests for macOS (sandbox-exec) isolation
- Tests for network proxy functionality
- Integration tests for complete sandboxing flow
- All code passes ruff style checks
## Documentation
- Comprehensive README section on sandboxing
- Usage examples and security benefits
- Platform-specific installation instructions
- Configuration and command reference
Closes: Add code execution sandboxing feature
* feat: add Claude Code-inspired sandbox enhancements
Implements advanced sandboxing features matching Anthropic's Claude Code
implementation for production-ready code execution isolation.
## New Features
**1. Broad Read Scope (matches Claude Code default)**
- Read access: Entire filesystem EXCEPT denied paths
- Write access: Current working directory + allowed paths only
- Configurable via read_scope: "broad" (default) or "restricted"
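The read and write rules above reduce to two path checks. The sketch below is an illustration of that policy, not the isolators' code: `can_read` and `can_write` are hypothetical helpers that assume absolute, already-normalized POSIX paths and do not resolve symlinks.

```python
from pathlib import PurePosixPath
from typing import Iterable


def _is_under(path: PurePosixPath, base: PurePosixPath) -> bool:
    """True if `path` is `base` itself or lies somewhere below it."""
    return path == base or base in path.parents


def can_read(path: str, cwd: str, denied: Iterable[str],
             read_scope: str = "broad", allowed: Iterable[str] = ()) -> bool:
    p = PurePosixPath(path)
    # Denied paths always win, in both scopes.
    if any(_is_under(p, PurePosixPath(d)) for d in denied):
        return False
    if read_scope == "broad":
        return True
    # "restricted": only the working directory and explicitly allowed paths.
    roots = [PurePosixPath(cwd)] + [PurePosixPath(a) for a in allowed]
    return any(_is_under(p, r) for r in roots)


def can_write(path: str, cwd: str, allowed: Iterable[str] = ()) -> bool:
    p = PurePosixPath(path)
    roots = [PurePosixPath(cwd)] + [PurePosixPath(a) for a in allowed]
    return any(_is_under(p, r) for r in roots)


if __name__ == "__main__":
    denied = ["/home/u/.ssh"]
    print(can_read("/etc/hosts", "/home/u/proj", denied))
    print(can_read("/home/u/.ssh/id_rsa", "/home/u/proj", denied))
    print(can_write("/etc/hosts", "/home/u/proj"))
```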
**2. Excluded Commands**
- Commands that always run unsandboxed (docker, watchman, podman, systemctl)
- Prevents sandbox incompatibility issues
- Configurable exclusion list
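An exclusion check of this kind can be as simple as comparing the program name of the command against the list. In this sketch `is_excluded` is a hypothetical helper, and looking only at the first token (ignoring pipelines, `sudo`, or leading environment assignments) is a simplifying assumption.

```python
import os
import shlex
from typing import Sequence

EXCLUDED_COMMANDS = ("docker", "watchman", "podman", "systemctl")


def is_excluded(command: str, excluded: Sequence[str] = EXCLUDED_COMMANDS) -> bool:
    """Return True if the command's program name is on the exclusion list."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: treat as not excluded so the command stays sandboxed.
        return False
    if not tokens:
        return False
    # Compare by basename so /usr/bin/docker matches "docker".
    return os.path.basename(tokens[0]) in excluded


if __name__ == "__main__":
    for cmd in ("docker ps", "/usr/bin/podman run alpine", "echo docker"):
        print(cmd, "->", "unsandboxed" if is_excluded(cmd) else "sandboxed")
```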
**3. Resource Limits (CPU/Memory)**
- Linux: systemd-run with MemoryMax and CPUQuota
- macOS: ulimit for memory limits
- Prevents runaway processes
- Configurable: max_memory_mb, max_cpu_percent
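On Linux, the two limits map onto `systemd-run` properties. The sketch below only builds the argument list; `wrap_with_limits` is a hypothetical helper, and running as a transient `--user --scope` unit is an assumption (it requires a user systemd session with the relevant cgroup controllers delegated).

```python
from typing import List, Optional, Sequence


def wrap_with_limits(
    argv: Sequence[str],
    max_memory_mb: Optional[int] = None,
    max_cpu_percent: Optional[int] = None,
) -> List[str]:
    """Prefix `argv` with a systemd-run invocation that applies the limits."""
    props: List[str] = []
    if max_memory_mb is not None:
        props += ["-p", f"MemoryMax={max_memory_mb}M"]
    if max_cpu_percent is not None:
        props += ["-p", f"CPUQuota={max_cpu_percent}%"]
    if not props:
        # Both settings are null in the configuration: run unchanged.
        return list(argv)
    return ["systemd-run", "--user", "--scope", "--quiet", *props, "--", *argv]


if __name__ == "__main__":
    print(wrap_with_limits(["python", "train.py"], max_memory_mb=512, max_cpu_percent=50))
```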
**4. Retry Handler (dangerouslyDisableSandbox)**
- Detects sandbox-related failures
- Prompts user to retry without sandboxing
- Configurable via allow_unsandboxed_commands
- Improves UX for legitimate failures
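Detecting a "sandbox-related failure" is necessarily heuristic. The sketch below shows one possible shape; the marker strings and the function names are assumptions for illustration and are not taken from retry_handler.py.

```python
# Assumed stderr fragments that commonly appear when a sandbox denies access.
SANDBOX_FAILURE_MARKERS = (
    "operation not permitted",
    "permission denied",
    "read-only file system",
)


def looks_like_sandbox_failure(exit_code: int, stderr: str) -> bool:
    """Heuristically decide whether a failed command was blocked by the sandbox."""
    if exit_code == 0:
        return False
    text = stderr.lower()
    return any(marker in text for marker in SANDBOX_FAILURE_MARKERS)


def should_offer_retry(exit_code: int, stderr: str,
                       allow_unsandboxed_commands: bool) -> bool:
    """Only prompt for an unsandboxed retry when the configuration permits it."""
    return allow_unsandboxed_commands and looks_like_sandbox_failure(exit_code, stderr)


if __name__ == "__main__":
    err = "touch: cannot touch '/etc/x': Read-only file system"
    print(should_offer_retry(1, err, allow_unsandboxed_commands=True))
```

Because the same messages also occur outside a sandbox, a prompt rather than an automatic retry keeps the user in control of the decision.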
**5. Advanced Proxy Configuration**
- Separate HTTP and SOCKS proxy ports
- Configurable: http_proxy_port (9050), socks_proxy_port (9051)
- Better network isolation control
## Implementation Details
**Updated Files:**
- code_puppy/sandbox/base.py - Added read_scope, resource limits to SandboxOptions
- code_puppy/sandbox/config.py - Extended with all new configuration options
- code_puppy/sandbox/linux_isolator.py - Broad read scope + systemd-run resource limits
- code_puppy/sandbox/macos_isolator.py - Broad read scope + ulimit resource limits
- code_puppy/sandbox/command_wrapper.py - Excluded commands check, 3-tuple return
- code_puppy/sandbox/retry_handler.py - NEW: Retry logic for failed commands
- code_puppy/tools/command_runner.py - Handle exclusions, show status messages
**Configuration Schema:**
```json
{
"read_scope": "broad", // "broad" or "restricted"
"excluded_commands": ["docker", "watchman", "podman", "systemctl"],
"allow_unsandboxed_commands": true,
"http_proxy_port": 9050,
"socks_proxy_port": 9051,
"max_memory_mb": null,
"max_cpu_percent": null,
"denied_read_paths": ["~/.ssh", "~/.aws", "~/.gnupg", ...]
}
```
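A loader for this schema might merge a stored file over defaults, as in the sketch below. `SandboxConfig` and `parse_config` are hypothetical names, not the API of config.py; the defaults mirror the values shown above, and `denied_read_paths` lists only the three entries visible in the schema (the rest are elided there). Note that the schema as printed is annotated with a comment and an ellipsis, so a real file would have to be plain JSON.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class SandboxConfig:
    read_scope: str = "broad"
    excluded_commands: List[str] = field(
        default_factory=lambda: ["docker", "watchman", "podman", "systemctl"]
    )
    allow_unsandboxed_commands: bool = True
    http_proxy_port: int = 9050
    socks_proxy_port: int = 9051
    max_memory_mb: Optional[int] = None
    max_cpu_percent: Optional[int] = None
    # Only the entries shown in the schema; the full default list is elided there.
    denied_read_paths: List[str] = field(
        default_factory=lambda: ["~/.ssh", "~/.aws", "~/.gnupg"]
    )


def parse_config(text: str) -> SandboxConfig:
    """Build a config from JSON text, falling back to defaults for missing keys."""
    raw = json.loads(text) if text.strip() else {}
    known = {f.name for f in fields(SandboxConfig)}
    # Unknown keys are ignored so older or newer files still load.
    config = SandboxConfig(**{k: v for k, v in raw.items() if k in known})
    if config.read_scope not in ("broad", "restricted"):
        raise ValueError(f"invalid read_scope: {config.read_scope!r}")
    return config


if __name__ == "__main__":
    print(parse_config('{"read_scope": "restricted", "max_memory_mb": 512}'))
```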
## Testing
- 43 unit tests passing (100%)
- Fixed tests for new 3-tuple return signature
- Updated tests for broad read scope default
- All ruff style checks passing
## Matches Claude Code Features
✅ Two-layer isolation (filesystem + network)
✅ Broad read scope by default
✅ Excluded commands for incompatible tools
✅ Resource limits (Linux with systemd, macOS with ulimit)
✅ Retry mechanism foundation (dangerouslyDisableSandbox)
✅ Configurable proxy ports
✅ Domain-based network filtering
## Benefits
- More flexible than initial implementation
- Better compatibility with real-world tools
- Resource protection prevents DoS
- Improved UX with retry mechanism
- Production-ready configuration
Refs: https://code.claude.com/docs/en/sandboxing
---------
Co-authored-by: Claude <[email protected]>
Add package.json to enable npm commands for the Python project. Includes scripts for building, testing, linting, and formatting using uv and ruff. Co-authored-by: Claude <[email protected]>
- Remove redundant test:cov script (coverage already configured in pyproject.toml) - Update clean script to use rimraf for cross-platform compatibility - Add .git suffix to repository URL for better tooling support - Add rimraf as devDependency Co-authored-by: Claude <[email protected]>
Hello! What an interesting contribution. I have a few questions - Would it be possible to refactor this to be a Typically with these kinds of contributions, I ask that folks implement in such a way that no behavior is changed from the default and that the code is nearly 100% isolated. The plugin hooks / callbacks can facilitate this. I will totally accept adding additional hooks in various parts of the codebase. But this principle keeps the codebase very clean and the different components isolated.
The other thing I will ask is that the feature is toggled off by default, so effectively there is no change in behavior whatsoever when I merge the P/R. Users may opt in by enabling the feature for example
Let me know your thoughts and if you think that is feasible.