Transform the AI CLI's model configuration from static YAML-based config to dynamic runtime management with Redis persistence. This allows users to add, remove, and switch between different LLM models on the fly without restarting the application.
- Tinyollama (Local) - Lightweight fallback model
  - URL: `http://localhost:11434`
  - Model: `tinyllama`
  - Kept in `config.yaml` (static)
  - Has a disabled-features list
- General Model - Main conversational model
  - Currently: `llama3.1:8b` at `http://192.168.31.23:11434`
  - Used for regular chat interactions
  - Stored in `config.yaml` (will become dynamic)
- Coder Model - Specialized for code tasks
  - Currently: `qwen2.5-coder:7b` at the same URL
  - Used for the `/code` command and code editing
  - Stored in `config.yaml` (will become dynamic)

Affected code locations:

- `src/config/manager.py`: `get_ollama_model()`, `get_coder_model()`
- `src/config/llm_availability.py`: Availability checking and fallback logic
- `main.py:1411`: Coder model used for file editing operations
- `main.py:1101`: Code mode disabled check for tinyollama
- `src/ui/routes/chat.py`: UI model retrieval
- `src/ui/routes/commands.py`: Code command execution
Create `src/model_registry/manager.py`:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ModelConfig:
    """Configuration for a dynamically registered model."""
    model_id: str           # Unique identifier (auto-generated)
    model_type: str         # 'general' or 'coder'
    url: str                # Ollama service URL
    model_name: str         # Model name (e.g., 'llama3.1:8b')
    timeout: int            # Request timeout (default: 120)
    is_active: bool         # Whether this is the active model for its type
    added_at: datetime      # When the model was registered
    last_checked: datetime  # Last availability check
    is_available: bool      # Current availability status


class ModelRegistry:
    """Manages dynamic model registration with Redis persistence."""

    def add_model(self, model_type: str, url: str, model_name: str,
                  timeout: int = 120, set_active: bool = True) -> ModelConfig: ...
    def remove_model(self, model_id: str) -> bool: ...
    def list_models(self, model_type: str = None) -> List[ModelConfig]: ...
    def get_active_model(self, model_type: str) -> Optional[ModelConfig]: ...
    def set_active_model(self, model_id: str) -> bool: ...
    def update_availability(self, model_id: str, is_available: bool) -> bool: ...
    def get_model(self, model_id: str) -> Optional[ModelConfig]: ...
```

Redis Storage Schema:
- `models:{model_type}:active` - String: active model ID for the type
- `models:{model_id}` - Hash: model configuration
- `models:index:{model_type}` - Set: model IDs by type
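The key layout above can be exercised with a short sketch. `FakeRedis` is a minimal in-memory stand-in implementing only the calls the sketch needs (a real deployment would use a `redis-py` client with the same key names); `add_model` and `get_active_model` are simplified versions of the registry methods, not the final implementation.

```python
import uuid


class FakeRedis:
    """In-memory stand-in for a Redis client (strings, hashes, sets only)."""
    def __init__(self):
        self.strings, self.hashes, self.sets = {}, {}, {}
    def set(self, key, value): self.strings[key] = value
    def get(self, key): return self.strings.get(key)
    def hset(self, key, mapping): self.hashes.setdefault(key, {}).update(mapping)
    def hgetall(self, key): return self.hashes.get(key, {})
    def sadd(self, key, member): self.sets.setdefault(key, set()).add(member)
    def smembers(self, key): return self.sets.get(key, set())


def add_model(r, model_type, url, model_name, set_active=True):
    model_id = f"model_{uuid.uuid4().hex[:6]}"
    # models:{model_id} - Hash: model configuration
    r.hset(f"models:{model_id}", mapping={
        "model_type": model_type, "url": url, "model_name": model_name,
    })
    # models:index:{model_type} - Set: model IDs by type
    r.sadd(f"models:index:{model_type}", model_id)
    if set_active:
        # models:{model_type}:active - String: active model ID
        r.set(f"models:{model_type}:active", model_id)
    return model_id


def get_active_model(r, model_type):
    model_id = r.get(f"models:{model_type}:active")
    return r.hgetall(f"models:{model_id}") if model_id else None
```

Switching the active model is then a single `SET` on the `models:{model_type}:active` pointer, with no change to the per-model hashes.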
```bash
# Add models
/model general add <url> <model_name> [--timeout SECONDS]
/model coder add <url> <model_name> [--timeout SECONDS]

# List models
/model general list
/model coder list
/model list                    # Lists all models

# Switch active model
/model general use <model_id>
/model coder use <model_id>

# Remove models
/model general remove <model_id>
/model coder remove <model_id>

# Check availability
/model check [model_id]        # Check specific or all models
/model status                  # Show current active models and availability
```

Instead of exiting the CLI when models are unavailable, implement feature-level disabling:
No General Model Available:
- Disable the chat prompt (show a warning)
- Display: "⚠️ No general model available. Use '/model general add' to configure a model."
- Still allow command execution: `/model`, `/session`, `/help`, etc.
- Still allow code execution if the coder model is available
No Coder Model Available:
- Disable the `/code` command
- Disable code-related MCP tools
- Fall back to the general model for simple code editing
- Display a warning when code features are used
Only Tinyollama Available:
- Continue current behavior with the `disabled_features` list
- Show clear warnings about limitations
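The three degradation rules above boil down to a small availability-to-features mapping. This is a sketch under assumed names: the feature labels (`"chat"`, `"code"`, `"code-basic"`, `"limited"`) are illustrative, not identifiers from the codebase.

```python
def available_features(general_ok: bool, coder_ok: bool, tinyllama_ok: bool) -> set:
    """Map model availability to the set of enabled features."""
    features = {"commands"}                   # /model, /session, /help always work
    if general_ok:
        features.add("chat")
    elif tinyllama_ok:
        features.update({"chat", "limited"})  # tinyollama fallback, reduced feature set
    if coder_ok:
        features.add("code")
    elif general_ok:
        features.add("code-basic")            # general model handles simple code edits
    return features
```

The chat loop would consult this set before prompting and print the matching warning when a requested feature is absent.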
`src/config/manager.py`:
- Keep tinyollama methods (static config)
- Remove `get_ollama_model()` and `get_coder_model()`
- Add `get_fallback_model()` for tinyollama
- Add `get_default_timeout()` for backward compatibility
`src/config/llm_availability.py`:
- Rename to `src/model_registry/availability.py`
- Integrate with ModelRegistry
- `get_available_llm()` checks ModelRegistry first, falls back to tinyollama
- Add `get_available_model(model_type: str)` for type-specific retrieval
- Maintain current fallback logic
`main.py`:
- Initialize ModelRegistry alongside other managers
- Update the chat loop to check model availability before prompting
- Modify the `/code` command to check coder model availability
- Update edit operations (line 1411) to use `ModelRegistry.get_active_model('coder')`
- Add new `/model` command handlers
- Display model status in the banner or on startup
`src/ui/routes/chat.py`:
- Update model retrieval to use ModelRegistry
- Add endpoint: `GET /models` - list all models
- Add endpoint: `GET /models/<type>` - get the active model for a type
- Update the chat endpoint to handle a missing model gracefully
`src/ui/routes/commands.py`:
- Add a `/code` endpoint availability check
- Return a proper error if the coder model is not available
- Add new model management endpoints:
  - `POST /models/<type>` - add a model
  - `DELETE /models/<model_id>` - remove a model
  - `PUT /models/<model_id>/activate` - set active
  - `GET /models/status` - get the status of all models
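A framework-agnostic sketch of that endpoint surface is shown below; in the real app these handlers would be registered on whatever route layer `src/ui/routes/commands.py` already uses, and the handler name, status codes, and response shapes here are assumptions.

```python
def handle_model_route(registry, method, path, body=None):
    """Dispatch a model-management request to the registry (sketch)."""
    parts = path.strip("/").split("/")            # "/models/general" -> ["models", "general"]
    if not parts or parts[0] != "models":
        return 404, {"error": "not found"}
    if method == "POST" and len(parts) == 2:      # POST /models/<type>
        model = registry.add_model(parts[1], body["url"], body["model_name"])
        return 201, {"model_id": model["model_id"]}
    if method == "DELETE" and len(parts) == 2:    # DELETE /models/<model_id>
        return (200, {}) if registry.remove_model(parts[1]) else (404, {})
    if method == "PUT" and len(parts) == 3 and parts[2] == "activate":
        return (200, {}) if registry.set_active_model(parts[1]) else (404, {})
    if method == "GET" and parts[-1] == "status": # GET /models/status
        return 200, {"models": registry.list_models()}
    return 405, {"error": "unsupported"}
```

Keeping the dispatch logic separate from the web framework makes the graceful-degradation tests below straightforward to write against a stub registry.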
Redis Schema Setup:
Create migration script `migrations/add_model_registry.py`:
- Initialize model registry keys
- Migrate existing config.yaml models to Redis (one-time)
- Set current models as active
Migration Path:

- On first startup with the new code:
  - Check if models exist in Redis
  - If not, read from config.yaml and populate Redis
  - Mark migrated models as active
  - Keep config.yaml untouched (for rollback)
- Environment variable overrides:
  - `AI_CLI_SKIP_MODEL_MIGRATION=true` - don't auto-migrate
  - `AI_CLI_FORCE_CONFIG_YAML=true` - ignore Redis, use config.yaml
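The migration path above might be sketched as follows. `config` is the already-parsed config.yaml dict; the `'general'`/`'coder'` keys and the `url`/`model` field names are assumptions about its layout, and the registry interface matches the ModelRegistry methods defined earlier.

```python
import os


def migrate_models(config, registry):
    """One-time config.yaml -> Redis migration (sketch).

    Honors the AI_CLI_SKIP_MODEL_MIGRATION override, never runs twice
    (skips when Redis is already populated), and leaves config.yaml
    untouched so rollback remains possible.
    """
    if os.environ.get("AI_CLI_SKIP_MODEL_MIGRATION") == "true":
        return []
    if registry.list_models():            # Redis already populated: nothing to do
        return []
    migrated = []
    for model_type in ("general", "coder"):
        entry = config.get(model_type)
        if entry:
            migrated.append(registry.add_model(
                model_type, entry["url"], entry["model"], set_active=True))
    return migrated
```

Because the guard is "registry already populated", re-running the script after a successful migration is a no-op, which keeps startup idempotent.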
Startup Behavior:

```
AI CLI v1.0.0

✓ General Model: llama3.1:8b @ 192.168.31.23:11434 (reachable)
✓ Coder Model: qwen2.5-coder:7b @ 192.168.31.23:11434 (reachable)
ℹ Fallback: tinyllama @ localhost:11434 (reachable)

Type /help for available commands or start chatting!
```

When models unavailable:

```
AI CLI v1.0.0

⚠️ General Model: Not configured
✓ Coder Model: qwen2.5-coder:7b @ 192.168.31.23:11434 (reachable)
ℹ Fallback: tinyllama @ localhost:11434 (unreachable)

Limited mode: Use '/model general add' to enable chat features.
Code features available via '/code' command.
Type /help for available commands.
```

Interactive model addition:

```
> /model general add http://192.168.31.23:11434 llama3.1:8b

Checking availability... ✓
Model registered successfully!
  ID: model_abc123
  Type: general
  Model: llama3.1:8b
  URL: http://192.168.31.23:11434
  Status: Active

You can now start chatting!
```
Unit Tests:
- `tests/test_model_registry.py`: Model CRUD operations
- `tests/test_model_availability.py`: Availability checking with mocked Redis
- `tests/test_model_migration.py`: config.yaml-to-Redis migration
Integration Tests:
- `tests/test_model_commands.py`: CLI command parsing and execution
- `tests/test_ui_model_routes.py`: UI endpoints with real Redis
- `tests/test_graceful_degradation.py`: Feature disabling when models are unavailable
- Phase 1: Core Infrastructure
  - Create the ModelRegistry class with a Redis backend
  - Add model availability checking
  - Write unit tests
- Phase 2: CLI Integration
  - Add `/model` commands to main.py
  - Update the chat loop for graceful degradation
  - Modify the code command to check the coder model
  - Update edit operations
- Phase 3: Migration & Backward Compatibility
  - Create the migration script
  - Add auto-migration on startup
  - Test the config.yaml fallback
- Phase 4: UI Integration
  - Add UI endpoints for model management
  - Update existing endpoints to use ModelRegistry
  - Add UI components for model switching
- Phase 5: Polish & Documentation
  - Update help text
  - Add user documentation
  - Create a migration guide
  - Update CLAUDE.md
New files:

```
src/model_registry/
├── __init__.py
├── manager.py              # ModelRegistry class
└── availability.py         # Refactored LLMAvailabilityChecker
migrations/
└── add_model_registry.py
tests/
├── test_model_registry.py
├── test_model_availability.py
├── test_model_migration.py
├── test_model_commands.py
└── test_graceful_degradation.py
docs/
└── MODEL_MANAGEMENT.md     # User-facing documentation
```
Modified files:

```
src/config/manager.py           # Remove dynamic model methods
src/config/llm_availability.py  # Move to model_registry/availability.py
main.py                         # Add /model commands, update model usage
src/ui/routes/chat.py           # Update model retrieval
src/ui/routes/commands.py       # Add model management endpoints
src/session/manager.py          # Possibly store active models in session
config.yaml                     # Keep only tinyollama config
CLAUDE.md                       # Update development docs
```
None for end users - Migration is automatic and backward compatible.
For developers:
- `ConfigManager.get_ollama_model()` → `ModelRegistry.get_active_model('general')`
- `ConfigManager.get_coder_model()` → `ModelRegistry.get_active_model('coder')`
- Direct config.yaml model access is deprecated
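During the deprecation window the old call sites could be kept alive with thin shims that delegate to the registry. This is a hypothetical transition aid, not code from the repo; the injected `registry` attribute is an assumption about how ConfigManager would be wired up.

```python
import warnings


class ConfigManagerShims:
    """Transition shims keeping old ConfigManager call sites working."""

    def __init__(self, registry):
        self.registry = registry  # assumed to be a ModelRegistry instance

    def get_ollama_model(self):
        warnings.warn("get_ollama_model() is deprecated; use "
                      "ModelRegistry.get_active_model('general')",
                      DeprecationWarning, stacklevel=2)
        return self.registry.get_active_model("general")

    def get_coder_model(self):
        warnings.warn("get_coder_model() is deprecated; use "
                      "ModelRegistry.get_active_model('coder')",
                      DeprecationWarning, stacklevel=2)
        return self.registry.get_active_model("coder")
```

Emitting `DeprecationWarning` (rather than removing the methods outright) lets developers find remaining call sites via `python -W error::DeprecationWarning` before the shims are deleted.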
- Model Templates: Pre-configured model presets (e.g., "fast", "accurate", "coding")
- Auto-discovery: Detect available models from Ollama server
- Load balancing: Distribute requests across multiple models
- Model metrics: Track usage, response times, success rates
- Model groups: Group models by capability (vision, code, chat, etc.)
- Per-session models: Different models for different sessions
- ✅ Users can add/remove models without editing config.yaml
- ✅ CLI doesn't exit when models are unavailable
- ✅ Clear feedback about which features are available
- ✅ Backward compatible with existing config.yaml
- ✅ All tests pass
- ✅ UI can manage models dynamically
- ✅ Session persistence works with dynamic models
- ✅ Code command gracefully handles missing coder model