
@jpsank commented Nov 12, 2025

Add PDF Parsing Support

Adds native PDF ingestion: users can drag and drop PDF files into the terminal, just as they can with images. The text is extracted automatically and included in the AI's context.

Changes

code_puppy/tools/pdf_parser.py (new)

  • Text and metadata extraction from PDFs using pypdf
  • Page-by-page parsing with error handling
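A minimal sketch of page-by-page extraction, assuming pypdf's `PdfReader` API (`reader.pages`, `page.extract_text()`, `reader.metadata`). The names `parse_pages`, `parse_pdf` and `ParsedPDF`, and the `--- Page N ---` separator, are hypothetical and not taken from `pdf_parser.py`. Keeping the loop separate from the file-opening code makes it testable without a real PDF.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Protocol


class _Page(Protocol):
    """Anything with an extract_text() method, e.g. a pypdf PageObject."""

    def extract_text(self) -> str: ...


@dataclass
class ParsedPDF:
    text: str
    page_count: int
    failed_pages: List[int] = field(default_factory=list)
    metadata: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_pages(pages: Iterable[_Page]) -> ParsedPDF:
    """Extract text page by page; one bad page does not abort the whole document."""
    page_list = list(pages)
    chunks: List[str] = []
    failed: List[int] = []
    for number, page in enumerate(page_list, start=1):
        try:
            text = page.extract_text() or ""
        except Exception:  # malformed content streams can raise during extraction
            failed.append(number)
            continue
        if text.strip():
            chunks.append(f"--- Page {number} ---\n{text.strip()}")
    return ParsedPDF(text="\n\n".join(chunks), page_count=len(page_list), failed_pages=failed)


def parse_pdf(path: str) -> ParsedPDF:
    """Open a PDF with pypdf and extract its text plus basic metadata."""
    from pypdf import PdfReader  # lazy import so parse_pages works without pypdf installed

    reader = PdfReader(path)
    result = parse_pages(reader.pages)
    info = reader.metadata  # None when the PDF has no document-information dictionary
    if info is not None:
        result.metadata = {"title": info.title, "author": info.author}
    return result
```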

code_puppy/command_line/attachments.py

  • Added .pdf to supported extensions
  • Auto-detect and parse PDFs, append text to prompts

code_puppy/main.py

  • Display PDF count in attachment messages

pyproject.toml

  • Added pypdf>=5.1.0

Testing

  • 20 tests added (9 unit, 11 integration)
  • 93% coverage on PDF parser, 77% on attachments
  • Tests cover multi-page docs, invalid files, spaces in filenames, mixed attachments

Usage

>>> Summarize research_paper.pdf
[dim]Attachments detected -> PDFs: 1[/dim]

Total Changes: 9 files changed, 791 insertions(+), 2 deletions(-)

mpfaffenberger and others added 30 commits October 9, 2025 09:56
…ger#33)

Add support for specifying which agent to use directly from the command line,
enabling better automation and non-interactive usage patterns.

## What's Added
- New --agent/-a command line argument
- Agent validation with helpful error messages
- Support for all existing modes (interactive, TUI, non-interactive)

## Why This Matters
- **Automation**: Perfect for CI/CD pipelines and scripts
- **Non-interactive mode**: No need to switch agents during runtime
- **Developer experience**: Faster workflow when you know which agent you need
- **Scripting**: Enables predictable agent behavior in automated environments

## Usage Examples

### Non-interactive mode with specific agent:
python -m code_puppy --prompt 'Create a hello world script' --agent code-puppy

### Interactive mode with pre-selected agent:
python -m code_puppy --interactive --agent agent-creator

### TUI mode with specific agent:
python -m code_puppy --tui --agent ld-expert

### Short form syntax:
python -m code_puppy -p 'hello world' -a code-puppy

## Error Handling
- Validates agent exists before startup
- Shows clear error messages for invalid agents
- Lists available agents when errors occur
- Proper exit codes for automation (exit 1 on error)

## Available Agents
- code-puppy: Main coding assistance agent
- agent-creator: Helps create new JSON agent configurations
- ld-expert: Living Design expert for Walmart design system

## Technical Implementation
- Added argparse argument parsing for --agent/-a
- Early validation in startup flow before agent initialization
- Integrated with existing agent_manager infrastructure
- Maintains backward compatibility (no breaking changes)

This feature significantly improves the developer experience for automation,
scripting, and situations where you know exactly which agent you need upfront.
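A sketch of the `--agent/-a` flow with argparse, assuming a fixed tuple of agent names. The PR validates against `agent_manager`, whose API is not shown here, so `AVAILABLE_AGENTS` and `parse_cli` are hypothetical; only the flags shown in the usage examples are declared. `sys.exit(1)` gives scripts the non-zero exit code described under Error Handling.

```python
import argparse
import sys
from typing import Optional, Sequence

# Hypothetical registry; the real code asks agent_manager for available agents.
AVAILABLE_AGENTS = ("code-puppy", "agent-creator", "ld-expert")


def parse_cli(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="code_puppy")
    parser.add_argument("-p", "--prompt", help="run a single prompt non-interactively")
    parser.add_argument("--interactive", action="store_true")
    parser.add_argument("--tui", action="store_true")
    parser.add_argument("-a", "--agent", help="agent to use for this session")
    args = parser.parse_args(argv)

    # Validate before any agent is initialized so automation fails fast.
    if args.agent is not None and args.agent not in AVAILABLE_AGENTS:
        print(f"Error: unknown agent '{args.agent}'", file=sys.stderr)
        print("Available agents: " + ", ".join(AVAILABLE_AGENTS), file=sys.stderr)
        sys.exit(1)
    return args
```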
- Add /reasoning command to dynamically set reasoning effort (low/medium/high) for GPT-5 models
- Implement get_openai_reasoning_effort() and set_openai_reasoning_effort() config functions with validation
- Replace hardcoded "medium" reasoning effort with configurable value from settings
- Update /status command to display current reasoning effort setting
- Auto-reload active agent when reasoning effort changes to apply new configuration
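A minimal sketch of the validated getter/setter pair. The function names come from the commit message; the in-memory `_CONFIG` dict, the `openai_reasoning_effort` key and the whitespace/case normalization are assumptions standing in for code_puppy's real config storage.

```python
from typing import Dict

# Hypothetical in-memory store standing in for the persisted config file.
_CONFIG: Dict[str, str] = {}

VALID_REASONING_EFFORTS = ("low", "medium", "high")
DEFAULT_REASONING_EFFORT = "medium"  # the value that used to be hardcoded


def get_openai_reasoning_effort() -> str:
    value = _CONFIG.get("openai_reasoning_effort", DEFAULT_REASONING_EFFORT)
    # Fall back to the default if the stored value was hand-edited into something invalid.
    return value if value in VALID_REASONING_EFFORTS else DEFAULT_REASONING_EFFORT


def set_openai_reasoning_effort(value: str) -> None:
    normalized = value.strip().lower()
    if normalized not in VALID_REASONING_EFFORTS:
        raise ValueError(
            f"Invalid reasoning effort {value!r}; expected one of "
            + ", ".join(VALID_REASONING_EFFORTS)
        )
    _CONFIG["openai_reasoning_effort"] = normalized
```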
- Replace shift key detection logic with search filter check
- Use prompt_toolkit's built-in is_searching filter to prevent enter from submitting during search
- Remove unnecessary comments about shift key detection complexity
- Improve reliability by leveraging framework's native search state detection
- Replace deprecated OpenAIModelSettings with OpenAIChatModelSettings
- Update GeminiModel imports to GoogleModel throughout codebase
- Update GoogleGLAProvider references to GoogleProvider
- Maintain backward compatibility with custom provider implementations
- Sync dependency lock file with latest pydantic-ai package changes
* feat: implement auto-save context functionality

- Modified command handler to support context auto-save
- Updated config to include auto-save settings
- Enhanced main module with context management

* test: add comprehensive tests for auto-save session functionality

- Tests for auto-save session configuration (enabled/disabled)
- Tests for max saved sessions configuration and validation
- Tests for auto-save session functionality and error handling
- Tests for cleanup old sessions functionality
- All tests pass with proper mocking
Consolidate all session save/load logic from command_handler and config into a
new session_storage module to eliminate duplication and improve maintainability.

- Create session_storage.py with unified save_session/load_session/cleanup_sessions APIs
- Move autosave directory from contexts/ to dedicated autosaves/ folder
- Replace inline pickle/json handling in command_handler with storage module calls
- Refactor auto_save_session_if_enabled to use new storage primitives
- Add restore_autosave_interactively for startup session recovery prompt
- Introduce SessionMetadata and SessionPaths dataclasses for type safety
- Update _cleanup_old_sessions to delegate to storage module cleanup logic
- Add comprehensive test coverage for all storage operations
- Update existing tests to handle new AUTOSAVE_DIR configuration
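One way the storage primitives could look, assuming a pickle file for the message history plus a small JSON metadata sidecar per session. `save_session`, `load_session`, `SessionMetadata` and `SessionPaths` are named in the commit, but their fields, signatures and the file-naming scheme below are hypothetical.

```python
import json
import pickle
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, List, Tuple


@dataclass
class SessionPaths:
    pickle_path: Path
    metadata_path: Path


@dataclass
class SessionMetadata:
    session_name: str
    timestamp: str
    message_count: int


def _paths(base_dir: Path, session_name: str) -> SessionPaths:
    return SessionPaths(
        pickle_path=base_dir / f"{session_name}.pkl",
        metadata_path=base_dir / f"{session_name}_meta.json",
    )


def save_session(history: List[Any], base_dir: Path, session_name: str) -> SessionMetadata:
    base_dir.mkdir(parents=True, exist_ok=True)
    paths = _paths(base_dir, session_name)
    with paths.pickle_path.open("wb") as fh:
        pickle.dump(history, fh)
    meta = SessionMetadata(
        session_name=session_name,
        timestamp=datetime.now(timezone.utc).isoformat(),
        message_count=len(history),
    )
    # The JSON sidecar lets a picker list sessions without unpickling them.
    paths.metadata_path.write_text(json.dumps(asdict(meta)), encoding="utf-8")
    return meta


def load_session(base_dir: Path, session_name: str) -> Tuple[List[Any], SessionMetadata]:
    paths = _paths(base_dir, session_name)
    # pickle is only safe for files this tool wrote itself.
    with paths.pickle_path.open("rb") as fh:
        history = pickle.load(fh)
    meta = SessionMetadata(**json.loads(paths.metadata_path.read_text(encoding="utf-8")))
    return history, meta
```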
- Introduce per-process autosave session ID that remains constant across saves
- Add /session command to view current autosave ID or rotate to new session
- Automatically rotate session ID when loading saved context to prevent overwrites
- Remove max_saved_sessions config and automatic cleanup of old sessions
- Users now control session lifecycle explicitly via /session new command
- Simplifies autosave behavior: each session accumulates updates until rotated
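A sketch of a per-process session ID that stays constant across autosaves until `/session new` rotates it. `AutosaveSession` and the timestamp-plus-random-suffix ID format are hypothetical.

```python
import uuid
from datetime import datetime


class AutosaveSession:
    """Hypothetical holder for the per-process autosave session ID."""

    def __init__(self) -> None:
        self._session_id = self._new_id()

    @staticmethod
    def _new_id() -> str:
        # Timestamp keeps IDs sortable; the random suffix avoids collisions within one second.
        return f"{datetime.now():%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:6]}"

    @property
    def session_id(self) -> str:
        # Every autosave in this process writes to this ID until it is rotated.
        return self._session_id

    def rotate(self) -> str:
        """Roughly what `/session new` does: start a fresh session and return its ID."""
        self._session_id = self._new_id()
        return self._session_id
```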
…nput improvements

Implement comprehensive autosave functionality with interactive session restoration:
- Add autosave picker modal for TUI that displays recent sessions with metadata (message counts, timestamps)
- Sessions stored in ~/.code_puppy/autosaves with stable session IDs that persist across prompts
- Auto-restore prompt on startup allows loading previous sessions in both CLI and TUI modes
- Loading autosave sets it as active target; manual context loads rotate session ID to prevent overwrites
- Add /session commands to view current ID and rotate to new session

Enhance multiline input handling across interfaces:
- Add multiline mode toggle with Alt+M or F2 in CLI (persistent until toggled off)
- Improve newline insertion with Ctrl+J (universal) and Ctrl+Enter keybindings
- Update TUI to use Shift+Enter for newlines (more intuitive than Alt+Enter)
- Add visual feedback for multiline mode status

Improve user experience and polish:
- Preload agent/model on TUI startup with loading indicator before first prompt
- Tighten system message whitespace to reduce visual clutter
- Silence "no MCP servers" message when none are configured
- Reuse existing agent instance to avoid redundant reloads
- Align command help text columns for better readability
- Upgrade prompt-toolkit to 3.0.52 for improved terminal compatibility
…clear

- Rotate autosave session ID when switching agents to prevent cross-agent context pollution
- Rotate autosave session ID when clearing conversation history to maintain clean state
- Add finalize_autosave_session helper that persists current snapshot before rotation
- Add refresh_config method to JSONAgent to reload configuration after external edits
- Skip rotation when switching to same agent or when agent doesn't exist
- Improve /model pin command to refresh active agent config immediately
- Add comprehensive test coverage for autosave rotation logic and edge cases
… and model switching

- Update get_model_context_length() to respect agent-specific model pins via get_model_name()
- Add graceful fallback to prevent status bar crashes if model config lookup fails
- Ensure immediate agent reload when switching models in both CLI and TUI interfaces
- Call refresh_config() for JSON agents before reload to pick up new model settings
- Wrap reload operations in try-except to maintain stability during model changes
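A sketch of the pin-aware lookup with a graceful fallback. Here the pinned model, global model and model table are passed in explicitly for illustration (the commit says the real code resolves the pin via `get_model_name()`); the `context_length` key and the 128,000-token default are assumptions.

```python
from typing import Any, Mapping, Optional

# Hypothetical fallback so the status bar can always render something.
DEFAULT_CONTEXT_LENGTH = 128_000


def get_model_context_length(
    model_configs: Mapping[str, Mapping[str, Any]],
    global_model: str,
    agent_pinned_model: Optional[str] = None,
) -> int:
    # An agent-specific pin takes precedence over the globally selected model.
    model_name = agent_pinned_model or global_model
    try:
        return int(model_configs[model_name]["context_length"])
    except (KeyError, TypeError, ValueError):
        # A missing or malformed entry must not crash the status bar.
        return DEFAULT_CONTEXT_LENGTH
```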
Add comprehensive attachment handling to enable users to drag files or paste URLs directly into prompts. The system automatically detects and processes images, PDFs, and other supported file types, passing them as binary content or URL references to the language model.

- Implement `attachments.py` parser with shell-like tokenization to extract file paths and URLs from raw prompt text
- Support local binary attachments (images: png/jpg/gif/webp, documents: pdf/txt/md) with MIME type detection
- Support remote URL attachments (http/https image and document links) using pydantic-ai's ImageUrl/DocumentUrl types
- Extend BaseAgent.run_with_mcp() to accept optional attachments and link_attachments parameters
- Add run_prompt_with_attachments() helper in main.py to parse, validate, and execute prompts with attachments
- Integrate attachment processing into both interactive mode and single-prompt execution flows
- Provide user-friendly warnings for unsupported file types, missing files, or permission errors
- Generate default prompt "Describe the attached files in detail" when user provides only attachments
- Add comprehensive test coverage for parsing logic, file type detection, and integration with agent execution
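A sketch of the shell-like tokenization step using the standard-library `shlex` and `mimetypes` modules. The extension list and default prompt come from the commit message; `parse_prompt_attachments` and its return shape are hypothetical. Falling back to `str.split()` keeps prompts with stray apostrophes (such as "what's") from raising.

```python
import mimetypes
import shlex
from pathlib import Path
from typing import List, Tuple

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf", ".txt", ".md"}
DEFAULT_PROMPT = "Describe the attached files in detail"


def parse_prompt_attachments(raw: str) -> Tuple[str, List[Tuple[Path, str]], List[str]]:
    """Split a raw prompt into (remaining text, local files with MIME types, URLs)."""
    try:
        # POSIX-mode shlex understands quotes and backslash-escaped spaces,
        # which is how many terminals paste dragged-in paths.
        tokens = shlex.split(raw)
    except ValueError:  # e.g. an unbalanced apostrophe in ordinary prose
        tokens = raw.split()

    text_parts: List[str] = []
    files: List[Tuple[Path, str]] = []
    urls: List[str] = []
    for token in tokens:
        if token.startswith(("http://", "https://")):
            urls.append(token)
            continue
        path = Path(token).expanduser()
        if path.suffix.lower() in SUPPORTED_EXTENSIONS and path.is_file():
            mime, _ = mimetypes.guess_type(path.name)
            files.append((path, mime or "application/octet-stream"))
        else:
            text_parts.append(token)

    prompt = " ".join(text_parts).strip() or DEFAULT_PROMPT
    return prompt, files, urls
```

Because POSIX-mode `shlex` treats backslashes as escapes, Windows-style paths would need separate handling in a real implementation.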
Previously, BinaryContent objects in messages were silently skipped during
string formatting, making it difficult to debug messages containing binary
data. Now BinaryContent is explicitly marked in the formatted output.
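A sketch of the explicit marker. It uses a local stand-in dataclass with the same `data` and `media_type` fields as pydantic-ai's `BinaryContent` so the example runs without pydantic-ai installed; the `[BinaryContent: ...]` marker text and `format_content` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass
class BinaryContent:
    """Stand-in with the same data/media_type fields as pydantic-ai's BinaryContent."""

    data: bytes
    media_type: str


def format_content(parts: Iterable[Union[str, BinaryContent]]) -> str:
    rendered = []
    for part in parts:
        if isinstance(part, BinaryContent):
            # Mark binary parts explicitly instead of silently skipping them.
            rendered.append(f"[BinaryContent: {part.media_type}, {len(part.data)} bytes]")
        else:
            rendered.append(str(part))
    return "\n".join(rendered)
```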
mpfaffenberger and others added 28 commits November 9, 2025 13:14
…oved controls

- Implement interactive arrow-key selector for file operations and shell commands with approve/reject/feedback options
- Add user feedback capture and propagation to allow users to guide rejected operations with specific instructions
- Introduce Ctrl+X keyboard shortcut for interrupting running shell commands without canceling the entire agent
- Standardize diff formatting and display to avoid redundant output during permission prompts
- Add configurable verbose mode for grep output with detailed file contents and line numbers
- Improve spinner behavior during user input prompts to prevent terminal interference
- Consolidate permission handling logic with thread-local storage for user feedback
- Update keyboard shortcuts and help text to reflect new Ctrl+X shell command cancellation
- Implement async interactive model picker UI with arrow-key selection
- Show current model indicator and visual feedback during selection
- Add fallback to original command-line interface if picker fails
- Improve model completion to only trigger at beginning of line and filter matches
- Add async version of arrow_select utility for use in async contexts
- Update help text to reference /model command instead of /m
- Enhance grep output formatting and code organization
- Remove round-robin integration tests that require external API keys
- Update test mocks to match new permission prompt API returning tuples
- Exclude TUI modules from coverage reporting to improve metrics
- Fix agent tools test assertions for updated file permission text
- Add terminal state reset functionality to handle corrupted terminal state after interrupts
- Suppress noisy error messages from Windows Ctrl+X listener that were common but harmless
- Add cross-platform terminal reset logic for both Windows (via console mode) and Unix (via termios)
- Ensure output streams are properly flushed after user input cancellation
- Make keyboard interrupt handling more robust by gracefully handling terminal state corruption

These changes specifically address Windows terminal stability issues while maintaining compatibility with other platforms, providing a better user experience when interrupting operations.
- Add reset_terminal_state() utility for Windows/Unix terminal state recovery
- Fix Ctrl-C during approvals breaking terminal input
- Improve spinner pause/resume around approval flows to prevent littering
- Better error handling in Windows Ctrl-X listener with graceful fallback
- Add explicit stream flushing after user input cancellations
- Ensure spinners fully pause before approval panel shows
- Add delays to let spinners stabilize before/after approval flows

Issues fixed:
- Terminal no longer gets stuck after Ctrl-C during approvals
- Spinner no longer litters text after approval flows complete
- One-character-on-far-right cursor artifact should be gone
- Windows PowerShell terminal state properly restored after interrupts

Note: Ctrl-X on Windows PowerShell has limitations (a Windows constraint).
      Users can use Ctrl+C as a reliable interrupt method.
…rtifacts

- Replace CTRL_BREAK_EVENT with taskkill /F /T on Windows for reliable process killing
- Add explicit line clearing (\r + \x1b[K) when spinner pauses/resumes
- Clear cursor artifacts before creating new Live display
- Should fix Ctrl-X not killing processes on Windows PowerShell
- Should fix 'one character on far right' spinner artifact issue
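A sketch of a cross-platform process-tree kill. The Windows branch uses the `taskkill /F /T` invocation named above; the POSIX branch is an assumption, relying on the command having been started with `start_new_session=True` so that its PID is also its process-group ID. `kill_process_tree` is a hypothetical name.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(proc: subprocess.Popen) -> None:
    """Forcefully stop a shell command and any children it spawned."""
    if proc.poll() is not None:
        return  # already exited
    if sys.platform == "win32":
        # /T kills the whole tree and /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(proc.pid)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    else:
        # Assumes start_new_session=True, so the PID doubles as the process-group ID.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already gone
```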
…ate clearing

- Changed Rich Live display from transient=True to transient=False
- transient=True was causing display state issues after pause/resume
- Added Rich console state clearing with ANSI escapes in approval flow
- Clear console.file with \r, \x1b[K, and \x1b[H before resuming spinners
- Should eliminate the 'one character on far right' artifact issue
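A sketch of clearing the spinner line around an approval prompt with `\r` followed by `\x1b[K` (return to column 0, then erase to end of line). The `Spinner` protocol and the `spinner_paused` context manager are hypothetical; the real code works on a Rich `Live` display.

```python
import contextlib
from typing import Iterator, Protocol, TextIO

CLEAR_LINE = "\r\x1b[K"  # carriage return, then erase from cursor to end of line


class Spinner(Protocol):
    """Hypothetical spinner interface with pause/resume."""

    def pause(self) -> None: ...

    def resume(self) -> None: ...


@contextlib.contextmanager
def spinner_paused(spinner: Spinner, stream: TextIO) -> Iterator[None]:
    spinner.pause()
    stream.write(CLEAR_LINE)  # wipe the last spinner frame before prompting
    stream.flush()
    try:
        yield
    finally:
        stream.write(CLEAR_LINE)  # wipe prompt residue before the spinner redraws
        stream.flush()
        spinner.resume()
```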
)

- Replace custom version badge with standard PyPI version badge
- Improves badge display and automatically syncs with PyPI releases
- Maintains consistent styling with other badges

Co-authored-by: cellwebb <[email protected]>
- Remove unnecessary sys imports that were already imported at module level
- Clean up import statements in arrow_select_async, arrow_select, and get_user_approval functions
- Improve code maintainability by reducing duplicate imports
- Implement PinCompleter class to provide intelligent tab completion for the /pin_model command
- Support completion for agent names (both built-in and JSON agents) as first argument
- Support completion for model names as second argument after agent is specified
- Handle various cursor positions and partial typing scenarios
- Integrate PinCompleter into the existing prompt_toolkit completion system
- Load available agents dynamically from agent_manager and json_agent modules
- Reuse existing model name loading functionality from model_picker_completion

Co-authored-by: cellwebb <[email protected]>
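The `/pin_model` completion rules can be sketched as a pure function over the text before the cursor (in prompt_toolkit, `document.text_before_cursor`). `pin_completions` and the sample agent and model names are illustrative; the real `PinCompleter` loads them from `agent_manager`, `json_agent` and `model_picker_completion`.

```python
from typing import List, Sequence

TRIGGER = "/pin_model "


def pin_completions(text: str, agents: Sequence[str], models: Sequence[str]) -> List[str]:
    """Suggest an agent for the first argument and a model for the second."""
    if not text.startswith(TRIGGER):
        return []
    args = text[len(TRIGGER):]
    words = args.split()
    ends_with_space = args.endswith(" ")
    # Still typing the first argument: complete agent names.
    if not words or (len(words) == 1 and not ends_with_space):
        prefix = words[0] if words else ""
        return sorted(a for a in agents if a.startswith(prefix))
    # Agent given, still typing the second argument: complete model names.
    if len(words) == 1 or (len(words) == 2 and not ends_with_space):
        prefix = words[1] if len(words) == 2 else ""
        return sorted(m for m in models if m.startswith(prefix))
    return []  # both arguments are complete
```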
- Eliminated pnpm-check command from lefthook pre-commit configuration
- This change streamlines the pre-commit process by removing an optional dependency check
- Tests are now handled exclusively in CI, as noted in the existing comment

Co-authored-by: cellwebb <[email protected]>
Implement automatic cache control injection to optimize Claude Code API usage by adding ephemeral cache control to message requests. This reduces API costs and improves response times for repeated similar requests.

- Create ClaudeCacheAsyncClient that intercepts /v1/messages requests and injects cache_control into the last message content block
- Add patch_anthropic_client_messages function to monkey-patch AsyncAnthropic SDK at the payload level before serialization
- Integrate the cache client into ModelFactory for Claude Code models with proper error handling
- Use dual-layer approach (httpx-level + SDK-level) to ensure cache control injection works reliably regardless of internal implementation changes
- Implement defensive error handling to prevent cache injection from breaking real API calls
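A sketch of the payload-level transform only, not the httpx client or the SDK monkey-patch. It adds the Anthropic Messages API's `cache_control: {"type": "ephemeral"}` marker to the last content block of the last message. `inject_cache_control` is a hypothetical name, and wrapping plain-string content in a text block is an assumption about how such messages are handled.

```python
import copy
from typing import Any, Dict

EPHEMERAL = {"type": "ephemeral"}


def inject_cache_control(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Messages API payload with cache_control on the last content block."""
    try:
        patched = copy.deepcopy(payload)
        last = patched["messages"][-1]
        content = last["content"]
        if isinstance(content, str):
            # Plain-string content has nowhere to carry cache_control, so wrap it in a text block.
            content = [{"type": "text", "text": content}]
            last["content"] = content
        content[-1].setdefault("cache_control", dict(EPHEMERAL))
        return patched
    except (KeyError, IndexError, TypeError, AttributeError):
        # Defensive: an unexpected shape must never break a real request.
        return payload
```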
… discovery, and alphabetization (mpfaffenberger#103)

* fix: require space after slash commands for proper triggering

- Updated model picker completion to only trigger when "/model " is followed by a space
- Modified pin command completion to require space after "/pin_model" trigger
- Prevents false positives when text contains command patterns as substrings
- Ensures commands are only recognized when properly formatted with space separator

* fix: improve /set command completion by requiring space after trigger

- Fixed completion trigger to require a space after "/set" before showing suggestions
- Simplified logic by removing unnecessary case handling for exact trigger matches
- Improved cursor position handling for more accurate text replacement
- Maintained special handling for 'model' and 'puppy_token' keys
- Reduced code complexity while preserving existing functionality

* feat: add slash command completion to command line interface

- Implement SlashCompleter class that provides autocomplete for slash commands
- Trigger completion when '/' is the first non-whitespace character
- Show all available commands and their aliases with descriptions
- Filter completions based on partial input after the slash
- Integrate new completer into the existing completion system
- Handle command loading failures gracefully to prevent crashes

* style: sort completions alphabetically for consistent user experience

- Sort config keys in SetCompleter to provide predictable completion order
- Sort commands by name in SlashCompleter for consistent command suggestions
- Sort command aliases alphabetically in SlashCompleter
- Improve readability and consistency of command completion interface

* feat: improve command completion sorting and display

- Refactor command completion to collect all completions before yielding
- Implement case-insensitive alphabetical sorting for primary commands and aliases
- Sort aliases by their alias name rather than their primary command name
- Maintain proper display formatting showing aliases with reference to primary commands
- Preserve existing behavior while improving user experience with consistent ordering

* feat: add agent command completion and standardize completer behavior

- Add new AgentCompleter for /agent command with agent name suggestions
- Standardize completion triggers to require space after commands (/load_context, /cd, /agent)
- Refactor LoadContextCompleter to use simplified trigger logic and improve session name extraction
- Update CDCompleter logic for consistent directory path handling and completion positioning
- Integrate AgentCompleter into the combined completion system for seamless agent selection

---------

Co-authored-by: cellwebb <[email protected]>
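A sketch of the slash-command completion and sorting rules: completion triggers only when `/` is the first non-whitespace character, and primary commands and aliases are each sorted case-insensitively by their own name. The `commands` mapping shape, the primaries-before-aliases ordering and the `(alias for /x)` display text are assumptions.

```python
from typing import Dict, List, Tuple


def slash_completions(typed: str, commands: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (completion, display) pairs; `commands` maps primary names to aliases."""
    stripped = typed.lstrip()
    if not stripped.startswith("/") or " " in stripped:
        return []
    prefix = stripped[1:].lower()
    primaries: List[Tuple[str, str]] = []
    aliases: List[Tuple[str, str]] = []
    for name, alias_list in commands.items():
        if name.lower().startswith(prefix):
            primaries.append((f"/{name}", f"/{name}"))
        for alias in alias_list:
            if alias.lower().startswith(prefix):
                aliases.append((f"/{alias}", f"/{alias} (alias for /{name})"))
    # Case-insensitive sort; aliases sort by their own name, not their primary's.
    primaries.sort(key=lambda pair: pair[0].lower())
    aliases.sort(key=lambda pair: pair[0].lower())
    return primaries + aliases
```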
* feat: add shorthand "/m" trigger for model completion

- Added ModelNameCompleter with "/m" trigger alongside existing "/model" command
- Provides users with a shorter alternative for switching models in the CLI
- Maintains consistency with existing completion system architecture

* feat: add tab completion for MCP commands

- Introduce MCPCompleter class to provide intelligent tab completion for /mcp commands
- Support completion for both server-specific subcommands (start, stop, restart, status, logs, remove) and general subcommands (list, start-all, stop-all, test, add, install, search, help)
- Include dynamic server name completion when using server-specific subcommands
- Implement caching mechanism to reduce repeated server name lookups
- Integrate MCPCompleter into the main completion system alongside other command completers

This enhancement improves user experience when working with MCP (Model Context Protocol) commands by providing contextual suggestions and reducing the need to remember exact command syntax and server names.

* fix: improve MCP command completion logic for subcommands and server names

- Reorder completion logic to prioritize server name completion when appropriate
- Fix cursor position detection to properly handle cases with spaces after subcommands
- Improve handling of partial subcommand vs server name completion
- Ensure subcommand completions only appear when no space follows the subcommand
- Remove unused Optional import from typing module

The changes resolve issues where server name completions would not appear correctly
when typing "/mcp start " or similar commands with trailing spaces, while maintaining
proper subcommand completion behavior.

---------

Co-authored-by: cellwebb <[email protected]>
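A sketch of the MCP completion decision plus a time-based cache for server names. The subcommand lists are copied from the commit message; `MCPCompletions`, the injectable `list_servers` callback and the 5-second TTL are hypothetical. Passing a `clock` makes the cache testable without sleeping.

```python
import time
from typing import Callable, List, Optional

SERVER_SUBCOMMANDS = ["logs", "remove", "restart", "start", "status", "stop"]
GENERAL_SUBCOMMANDS = ["add", "help", "install", "list", "search", "start-all", "stop-all", "test"]


class MCPCompletions:
    def __init__(
        self,
        list_servers: Callable[[], List[str]],
        ttl: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._list_servers = list_servers  # hypothetical server lookup
        self._ttl = ttl
        self._clock = clock
        self._cache: Optional[List[str]] = None
        self._cached_at = 0.0

    def _servers(self) -> List[str]:
        now = self._clock()
        if self._cache is None or now - self._cached_at > self._ttl:
            self._cache = sorted(self._list_servers())
            self._cached_at = now
        return self._cache

    def complete(self, text: str) -> List[str]:
        if not text.startswith("/mcp "):
            return []
        args = text[len("/mcp "):]
        words = args.split()
        trailing_space = args.endswith(" ")
        # Still typing the subcommand: only offer subcommands when no space follows it.
        if not words or (len(words) == 1 and not trailing_space):
            prefix = words[0] if words else ""
            return sorted(c for c in SERVER_SUBCOMMANDS + GENERAL_SUBCOMMANDS if c.startswith(prefix))
        # "/mcp start " or "/mcp start fi": complete a server name.
        if words[0] in SERVER_SUBCOMMANDS and (len(words) == 1 or (len(words) == 2 and not trailing_space)):
            prefix = words[1] if len(words) == 2 else ""
            return [s for s in self._servers() if s.startswith(prefix)]
        return []
```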
- Add space suggestion when user types partial commands like /load_context or /set
- Enhance directory completion to preserve user's original path prefixes (~/, ./)
- Refactor model command handling to use command handler for consistent feedback
- Remove prompt-level model processing to allow proper command execution flow
- Update tests to reflect new model handling approach
…ration

- Implement interactive_autosave_picker() with split-panel interface for browsing and loading autosave sessions
- Add interactive_diff_picker() with live preview for configuring diff colors and styles
- Replace text-based diff configuration commands with rich visual menu system
- Improve autosave loading flow with better session metadata display and preview
- Fix HTML escaping in arrow_select_async to prevent parsing errors
- Remove redundant diff style example emission from config setters

The new TUI interfaces provide a more intuitive and visually appealing way to manage autosave sessions and customize diff display settings, with live previews and keyboard navigation for better user experience.
@arnonuem (Contributor)

Might now be covered by #143.
