
analyze-fin: Complete Implementation Plan (Local-First, 5 Phases)

Philippine Personal Finance Tracker

Timeline: ~3 weeks (5 phases) to a fully functional MVP
Team: You + Claude Code
Location: /Users/agallentes/git/analyze-fin
Architecture: SQLite + Python scripts + Claude Skills
Cost: $0 (fully local, zero external dependencies)


Executive Summary

You're building a statement-based personal finance tracker for Philippine users who manage multiple accounts (BPI, GCash, Maya, Vybe).

Why this approach wins:

  • Zero API friction (banks don't expose personal data APIs anyway)
  • User owns all data (stays on your machine)
  • Faster MVP (3 weeks vs 5-6 weeks with cloud)
  • Zero cost ($0 vs $20-30/month for cloud)
  • Full privacy (data never leaves your computer)
  • Works offline (local SQLite)

What you'll build:

  • Parse PDF statements from 4 Philippine banks
  • Smart deduplication across multiple imports
  • Interactive merchant categorization (system learns)
  • HTML reports with interactive charts
  • Claude Code Skills for end-to-end workflows
  • Natural language queries about spending

Product Overview: What You're Building

The Problem

Filipino professionals use 2-4 financial accounts:

  • BPI: Salary account
  • GCash: Mobile payments
  • Maya: Savings (e-wallet)
  • Vybe: Credit (linked to BPI)

Challenge: Tracking spending across all accounts is manual and error-prone.

The Solution

analyze-fin automates the hard part (data entry) and provides insights:

  1. Download PDF statements from your banking apps (monthly)
  2. Upload to analyze-fin → Automatic extraction
  3. System categorizes merchants intelligently (learns from corrections)
  4. View spending insights via interactive HTML reports
  5. Ask questions about your spending (natural language)
  6. Export to CSV anytime for manual analysis

Core Features

Multi-Bank PDF Parsing (all 4 banks)

  • GCash: Extracts transaction tables (date + amount + ref#)
  • BPI: Handles password-protected PDFs
  • Maya: Supports different account formats
  • Vybe: Guides users to their BPI statement (Vybe offers no statement export)
  • Quality scoring (0-100) for each parse
  • Error detection (scanned PDFs, corrupted data)

Smart Deduplication

  • 3-layer matching: reference# → content hash → user review (sketched below)
  • Handles date overlaps (importing the same period twice)
  • Detects internal transfers (money moved between your own accounts)
  • Conservative matching to avoid false positives (uncertain pairs go to user review)
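A minimal sketch of layers 1 and 2, assuming illustrative function names (dedup_hash, find_duplicate) and hash fields; the real implementation lands in Phase 3:

```python
# Sketch only: layer 1 (reference#) and layer 2 (content hash).
# Function names and hashed fields are illustrative assumptions.
import hashlib
from datetime import date
from decimal import Decimal

def dedup_hash(txn_date: date, amount: Decimal, description: str) -> str:
    """Stable hash over the fields two imports of the same transaction share."""
    key = f"{txn_date.isoformat()}|{amount}|{description.strip().upper()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def find_duplicate(new_txn: dict, existing: list[dict]) -> dict | None:
    # Layer 1: exact bank reference number match.
    ref = new_txn.get("reference_number")
    if ref:
        for txn in existing:
            if txn.get("reference_number") == ref:
                return txn
    # Layer 2: content hash. Anything weaker is deferred to user review (layer 3).
    h = dedup_hash(new_txn["date"], new_txn["amount"], new_txn["raw_description"])
    return next((t for t in existing if t.get("dedup_hash") == h), None)
```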

Merchant Learning System

  • 150+ common Philippine merchants pre-mapped
  • Unknown merchants → You categorize once → System remembers
  • Saves mappings to local JSON file
  • Re-categorizes similar merchants automatically

Spending Insights

  • HTML reports with Plotly charts
  • Pie chart: spending by category
  • Line chart: daily spending trend
  • Bar chart: top merchants
  • Summary cards: total in/out, net, daily average
  • Anomaly detection (unusual spending)

Natural Language Queries

  • Ask: "How much food last week?"
  • Ask: "Top spending categories?"
  • Ask: "How much transport?"
  • Powered by Claude Code context (no external API)
  • Private (answers generated from local data)

Local Data Ownership

  • Everything stored locally: data/analyze-fin.db
  • Export anytime as CSV/JSON
  • Full transparency (inspect data with any SQLite tool)
  • Easy backup (copy entire folder)

Architecture: Local-First, Skills-Driven

Core Principles

  1. Local-First: All data stays on your machine (SQLite file)
  2. Skills-Driven: Claude Skills are primary interface, not side feature
  3. No External APIs: No Supabase, no Claude API calls, no cloud dependencies
  4. Interactive Learning: Categorize once, system learns forever
  5. Transparent: Inspect/export data anytime

Data Storage

Primary: SQLite Database

  • Location: data/analyze-fin.db
  • Why: Single file, no server, supports complex queries, perfect for local apps
  • Tables: accounts, statements, transactions, merchant_corrections

Secondary: JSON/CSV Exports

  • Location: data/exports/
  • Purpose: Backup, transparency, sharing, further analysis

Merchant Mappings

  • Location: data/merchant_mapping.json
  • Purpose: Learning system (grows as you use it)
  • Format: {"JOLLIBEE": {"normalized": "Jollibee", "category": "food"}}
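A minimal sketch of how the learning loop might read and grow this file; the normalize() rule (uppercase, strip trailing store numbers) and function names are assumptions for illustration:

```python
# Sketch only: reading and growing data/merchant_mapping.json.
import json
import re
from pathlib import Path

MAPPING_PATH = Path("data/merchant_mapping.json")

def normalize(raw_merchant: str) -> str:
    # "JOLLIBEE BGC #0412" -> "JOLLIBEE BGC"  (assumed normalization rule)
    return re.sub(r"\s*#?\d+$", "", raw_merchant.strip().upper())

def categorize(raw_merchant: str, mapping: dict) -> dict | None:
    return mapping.get(normalize(raw_merchant))  # None -> prompt the user

def learn(raw_merchant: str, normalized: str, category: str) -> None:
    mapping = json.loads(MAPPING_PATH.read_text()) if MAPPING_PATH.exists() else {}
    mapping[normalize(raw_merchant)] = {"normalized": normalized, "category": category}
    MAPPING_PATH.write_text(json.dumps(mapping, indent=2, ensure_ascii=False))
```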

Claude Skills: Your Interface

Instead of a web UI, you interact via Claude Code Skills:

Skill 1: parse-statements

You: "Parse my GCash statement from January"
→ Claude Code: Prompts for file path
→ Claude Code: Parses PDF with pdfplumber
→ Claude Code: Stores to SQLite
→ Result: "✅ Imported 28 transactions, quality: 95"

Skill 2: categorize-transactions

You: "Categorize my uncategorized merchants"
→ Claude Code: Finds unknown merchants
→ Claude Code: For each one, prompts: "Category?"
→ You: Select from list (Food, Shopping, Bills, etc.)
→ Claude Code: Saves to merchant_mapping.json
→ Claude Code: Updates SQLite
→ Result: Future similar merchants auto-categorized

Skill 3: generate-report

You: "Generate my January spending report"
→ Claude Code: Queries SQLite for date range
→ Claude Code: Creates HTML with Plotly charts
→ Claude Code: Creates Markdown summary
→ Claude Code: Opens HTML in browser
→ Result: "✅ Report: data/reports/2025-01-report.html"

Skill 4: query-spending

You: "How much did I spend on food last week?"
→ Claude Code: Loads SQLite data into context
→ Claude Code: Uses reasoning to answer
→ Result: "₱1,250 across 12 transactions"
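Under the hood this skill makes no API calls; its script only exports local data so Claude Code can reason over it in context. A minimal sketch, assuming a 90-day window and the transactions schema described later:

```python
# Sketch only: scripts/query_spending.py dumps local data as JSON;
# Claude Code reads the output as context and answers the question itself.
import json
import sqlite3

def dump_for_context(db_path: str = "data/analyze-fin.db", days: int = 90) -> str:
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        """SELECT date, amount, merchant_name, merchant_category
           FROM transactions
           WHERE status = 'Active' AND date >= date('now', ?)""",
        (f"-{days} days",),
    ).fetchall()
    conn.close()
    return json.dumps([dict(r) for r in rows], default=str, indent=2)

if __name__ == "__main__":
    print(dump_for_context())
```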

Skill 5: export-data

You: "Export all transactions to CSV"
→ Claude Code: Queries SQLite
→ Claude Code: Converts to CSV with Pandas
→ Claude Code: Saves to data/exports/
→ Result: "✅ Exported 156 transactions"

Skill 6: deduplicate

You: "Check for duplicate transactions"
→ Claude Code: Runs dedup logic
→ Claude Code: Flags suspicious pairs
→ Claude Code: Asks: "Merge these two?"
→ You: Confirm or skip
→ Claude Code: Updates SQLite

Directory Structure

analyze-fin/
│
├── 📄 Documentation Files
│   ├── README.md                 # Architecture overview
│   ├── QUICKSTART.md             # 5-minute setup
│   ├── PROJECT_PLAN.md           # This file
│   ├── DEVELOPMENT_GUIDE.md      # Implementation prompts
│   └── SKILLS_GUIDE.md           # Skills documentation
│
├── 🔧 Setup Files
│   ├── requirements.txt          # Python dependencies (minimal)
│   ├── .gitignore                # Git ignores
│   └── initialize.sh             # Setup automation
│
├── 📁 data/                      # All local data storage
│   ├── analyze-fin.db            # SQLite database (created by init_db.py)
│   ├── merchant_mapping.json     # Merchant mappings (grows as you use)
│   ├── category_rules.json       # Categorization rules (optional)
│   ├── sample_statements/        # Your PDF statements (you provide)
│   ├── reports/                  # Generated HTML/MD reports
│   │   ├── 2025-01-report.html
│   │   └── 2025-01-summary.md
│   └── exports/                  # CSV/JSON exports
│       ├── all_transactions.csv
│       └── spending_summary.json
│
├── 📁 backend/                   # Core Python logic
│   ├── __init__.py
│   ├── models.py                 # Pydantic models (Transaction, Account, etc.)
│   ├── database.py               # SQLite CRUD operations
│   ├── statement_parser.py       # PDF extraction (pdfplumber)
│   ├── deduplicator.py           # 3-layer dedup logic
│   ├── categorizer.py            # Merchant categorization
│   ├── report_generator.py       # HTML/Markdown generation
│   └── init_db.py                # Database initialization
│
├── 📁 scripts/                   # Utility scripts (called by skills)
│   ├── parse_statement.py        # Entry point for parse-statements skill
│   ├── categorize.py             # Entry point for categorize-transactions skill
│   ├── generate_report.py        # Entry point for generate-report skill
│   ├── query_spending.py         # Entry point for query-spending skill
│   ├── export.py                 # Entry point for export-data skill
│   └── deduplicate.py            # Entry point for deduplicate skill
│
├── 📁 skills/                    # Claude Skills (primary interface)
│   ├── parse-statements/
│   │   └── SKILL.md
│   ├── categorize-transactions/
│   │   └── SKILL.md
│   ├── generate-report/
│   │   └── SKILL.md
│   ├── query-spending/
│   │   └── SKILL.md
│   ├── export-data/
│   │   └── SKILL.md
│   └── deduplicate/
│       └── SKILL.md
│
├── 📁 templates/                 # HTML/Markdown templates
│   ├── spending_report.html      # Jinja2 template for reports
│   └── markdown_report.md.j2     # Markdown template
│
├── 📁 tests/                     # Unit tests
│   ├── test_parser.py            # Test PDF parsing
│   ├── test_deduplicator.py      # Test deduplication
│   ├── test_categorizer.py       # Test categorization
│   └── conftest.py               # Pytest fixtures
│
└── 📁 .git/                      # Git repository (initialized by initialize.sh)

Technology Stack (Minimal, Local-Only)

| Component | Technology | Why | Cost |
|---|---|---|---|
| Database | SQLite | Local file, no server, perfect for this use case | Free |
| PDF Parsing | pdfplumber | Best for Philippine bank table formats | Free |
| Data Validation | Pydantic | Type safety + clean error messages | Free |
| Data Processing | Pandas | Easy grouping, aggregation, filtering | Free |
| Reporting | Jinja2 | HTML template rendering | Free |
| Charts | Plotly | Interactive HTML charts, no backend needed | Free |
| Testing | pytest | Standard Python testing framework | Free |
| Interface | Claude Skills | Native to Claude Code workflows | Free |
Total Dependencies: 8 core packages in a minimal requirements.txt

No Cloud Services: No Supabase, no FastAPI, no Streamlit, no Railway, no external APIs


Phase Breakdown: 5 Phases, ~3 Weeks

PHASE 0: Foundation Files (Day 0 - 2 hours)

Goal: Create base project structure and initialize Python environment

Deliverables:

  1. requirements.txt - Minimal Python dependencies
  2. .gitignore - Standard Python ignores
  3. initialize.sh - Automated setup script
  4. Directory structure (created by initialize.sh)

Actions:

  • bash initialize.sh creates venv, installs dependencies, creates directories
  • Git repo initialized

Success Criteria:

  • source venv/bin/activate works
  • python -c "import pdfplumber" works
  • Directory structure exists

PHASE 1: Core Backend + SQLite (Days 1-3)

Goal: Build foundational Python modules and local SQLite database

Deliverables:

  1. backend/models.py - Pydantic models (sketched after this list)

    • Transaction: id, date, amount, merchant_name, category, account_id
    • Account: id, alias, bank_type, created_at
    • ParsedStatement: transactions[], quality_score, parsing_issues[]
    • MerchantCorrection: raw_merchant, corrected_merchant, category
  2. backend/init_db.py - Database initialization

    • Creates SQLite database: data/analyze-fin.db
    • Tables: accounts, statements, transactions, merchant_corrections
    • Indexes on: (account_id, date), (category), (dedup_hash)
    • Simple schema (single-user, no RLS)
  3. backend/database.py - SQLite CRUD operations

    • insert_account(), get_accounts()
    • insert_statement(), get_statements()
    • insert_transactions() (batch), get_transactions()
    • query_by_category(), query_by_date_range()
    • Error handling and logging
  4. data/merchant_mapping.json - Initial merchant database

    • 150+ common Philippine merchants pre-mapped
    • Format: {"JOLLIBEE": {"normalized": "Jollibee", "category": "food"}}
    • Includes: Jollibee, McDonald's, SM, Lazada, Shopee, Meralco, PLDT, SSS, etc.
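A minimal sketch of deliverable 1, using the field lists above; types and optionality are assumptions to refine during Phase 1:

```python
# Sketch only: field names follow the deliverable list above.
from datetime import datetime
from decimal import Decimal
from pydantic import BaseModel

class Transaction(BaseModel):
    id: int | None = None
    date: datetime
    amount: Decimal                 # positive = credit, negative = debit
    merchant_name: str
    category: str | None = None
    account_id: int

class ParsedStatement(BaseModel):
    transactions: list[Transaction]
    quality_score: int              # 0-100
    parsing_issues: list[str] = []
```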

Success Criteria:

  • python backend/init_db.py creates database
  • sqlite3 data/analyze-fin.db ".tables" shows all 4 tables
  • Can insert and query transactions
  • All imports work: from backend import models, database

Timeline: Days 1-3 (2-3 days)


PHASE 2: Statement Parser (Days 4-7)

Goal: Parse PDF statements from all 4 Philippine banks

Deliverables:

  1. backend/statement_parser.py - Main parser

    • parse(pdf_path, password=None) - Main entry point
    • _detect_bank(pdf_text) - Auto-detect GCash/BPI/Maya/Vybe
    • _parse_gcash(), _parse_bpi(), _parse_maya(), _parse_vybe()
    • _normalize_date() - Handle all date formats
    • _normalize_amount() - Strip ₱, handle parentheses (both helpers sketched after this list)
    • Quality scoring (0-100)
    • Error handling (password-protected, scanned, corrupted)
  2. Bank-Specific Handling:

    • GCash: Table format with columns Date|Desc|Ref#|Debit|Credit|Balance
    • BPI: Password-protected PDFs (password: SURNAME + last 4 digits of phone number)
    • Maya: Different formats (Savings vs Wallet)
    • Vybe: Return helpful error message
  3. Unit tests: tests/test_parser.py

    • Test each bank format
    • Test date normalization
    • Test amount parsing
    • Test error cases (scanned, empty, corrupt)
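A minimal sketch of the two normalization helpers named above; the accepted date formats are assumptions until real statements dictate the final list:

```python
# Sketch only: DATE_FORMATS is a placeholder list, not the banks' actual formats.
from datetime import datetime
from decimal import Decimal

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y", "%d %b %Y")

def _normalize_date(raw: str) -> datetime:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

def _normalize_amount(raw: str) -> Decimal:
    # "₱1,250.00" -> 1250.00; "(500.00)" -> -500.00 (parentheses mean debit)
    cleaned = raw.replace("\u20b1", "").replace("PHP", "").replace(",", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -Decimal(cleaned[1:-1])
    return Decimal(cleaned)
```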

Success Criteria:

  • Parse real GCash/BPI statements with >95% accuracy
  • Handle password-protected PDFs
  • Detect scanned PDFs and return low quality score
  • All unit tests passing
  • Test with sample statements (you provide)

Timeline: Days 4-7 (3-4 days)


PHASE 3: Deduplication + Categorization (Days 8-11)

Goal: Smart deduplication and interactive merchant learning

Deliverables:

  1. backend/deduplicator.py - 3-layer dedup

    • find_duplicates(new_txn) - Check reference#, content hash
    • merge_duplicates() - Mark as duplicate, keep primary
    • detect_internal_transfers() - Find mirror transactions (sketched after this list)
    • Confidence scoring
    • No false positives (conservative matching)
  2. backend/categorizer.py - Merchant categorization

    • Load merchant_mapping.json
    • normalize(raw_merchant) - Clean up name
    • categorize(merchant) - Return category
    • Interactive prompts for unknowns (via input())
    • Learn from user corrections
  3. scripts/categorize.py - Categorization workflow

    • Find uncategorized transactions
    • For each: prompt user, save mapping, update DB
    • Called by categorize-transactions skill
  4. Unit tests: tests/test_deduplicator.py, tests/test_categorizer.py

    • Test duplicate detection
    • Test merchant normalization
    • Test categorization accuracy
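A minimal sketch of detect_internal_transfers() from deliverable 1; the exact-amount rule and 1-day window are conservative assumptions:

```python
# Sketch only: an internal transfer shows up as a debit in one account and a
# matching credit in another of your accounts.
from datetime import timedelta

def detect_internal_transfers(transactions: list[dict]) -> list[tuple[dict, dict]]:
    pairs = []
    debits = [t for t in transactions if t["amount"] < 0]
    credits = [t for t in transactions if t["amount"] > 0]
    for d in debits:
        for c in credits:
            if (
                c["account_id"] != d["account_id"]
                and c["amount"] == -d["amount"]
                and abs(c["date"] - d["date"]) <= timedelta(days=1)
            ):
                pairs.append((d, c))  # ask the user to confirm, never auto-merge
    return pairs
```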

Success Criteria:

  • Import 3+ overlapping statements → zero false duplicates
  • Interactive categorization works
  • Merchant mappings saved to JSON
  • Learning persists (test re-importing same merchant)
  • All unit tests passing

Timeline: Days 8-11 (3-4 days)


PHASE 4: Report Generation (Days 12-14)

Goal: Generate interactive HTML and Markdown reports

Deliverables:

  1. backend/report_generator.py - Report logic

    • Query SQLite for date range
    • Calculate totals, category breakdown, top merchants
    • Generate Plotly charts (pie, line, bar; pipeline sketched after this list)
    • Render HTML and Markdown
  2. templates/spending_report.html - Jinja2 template

    • Bootstrap CSS (CDN)
    • Embedded Plotly charts
    • Summary cards (total in/out, net, avg daily)
    • Category breakdown table
    • Responsive design
  3. templates/markdown_report.md.j2 - Markdown template

    • Summary stats
    • Category table
    • Top merchants list
    • Date range
  4. scripts/generate_report.py - Report generation script

    • Take month parameter (e.g., "2025-01")
    • Query SQLite for date range
    • Generate both HTML and Markdown
    • Save to data/reports/
    • Called by generate-report skill
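A minimal sketch of the chart-plus-template pipeline: Plotly figures render to HTML fragments that a Jinja2 template embeds. Template variable names (pie_chart, total_out) are assumptions:

```python
# Sketch only: one chart shown; line and bar charts follow the same pattern.
import plotly.express as px
from jinja2 import Environment, FileSystemLoader

def render_report(df, month: str) -> str:
    # df: pandas DataFrame of the month's transactions
    spending = df[df["amount"] < 0].copy()
    spending["spent"] = -spending["amount"]
    by_category = spending.groupby("merchant_category", as_index=False)["spent"].sum()
    pie = px.pie(by_category, names="merchant_category", values="spent",
                 title=f"Spending by category, {month}")
    env = Environment(loader=FileSystemLoader("templates"))
    return env.get_template("spending_report.html").render(
        month=month,
        total_out=spending["spent"].sum(),
        pie_chart=pie.to_html(full_html=False, include_plotlyjs="cdn"),
    )
```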

Success Criteria:

  • Generate HTML report with interactive Plotly charts
  • Open in browser shows dashboard-like view
  • Markdown report is readable in text editor
  • All charts render correctly
  • Report filenames follow pattern: 2025-01-report.html, 2025-01-summary.md

Timeline: Days 12-14 (2-3 days)


PHASE 5: Claude Skills (Days 15-17) - MVP COMPLETE

Goal: Create all 6 Claude Skills as primary interface

Deliverables:

  1. skills/parse-statements/SKILL.md

    • Invokes scripts/parse_statement.py
    • Prompts for file path and password
    • Returns import results (transaction count, quality score)
  2. skills/categorize-transactions/SKILL.md

    • Invokes scripts/categorize.py
    • Interactive categorization workflow
    • Saves mappings to JSON
  3. skills/generate-report/SKILL.md

    • Invokes scripts/generate_report.py
    • Takes month/date range parameter
    • Creates HTML + Markdown reports
  4. skills/query-spending/SKILL.md

    • Invokes scripts/query_spending.py
    • Exports data to JSON
    • Claude Code loads into context, reasons about queries
    • Answers natural language questions
  5. skills/export-data/SKILL.md

    • Invokes scripts/export.py
    • Exports to CSV/JSON formats
    • Saves to data/exports/
  6. skills/deduplicate/SKILL.md

    • Invokes scripts/deduplicate.py
    • Interactive duplicate review
    • Prompts to merge or skip
  7. All supporting scripts: scripts/*.py

    • parse_statement.py - PDF parser entry point
    • categorize.py - Categorization entry point
    • generate_report.py - Report generation entry point
    • query_spending.py - Data export for NL queries
    • export.py - CSV/JSON export
    • deduplicate.py - Duplicate review
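A minimal sketch of one entry point (scripts/parse_statement.py); the CLI flags and the backend function signatures are assumptions to be settled in earlier phases:

```python
# Sketch only: the script behind the parse-statements skill.
import argparse
import json
from backend import database                 # assumed Phase 1 API
from backend.statement_parser import parse   # assumed Phase 2 API

def main() -> None:
    ap = argparse.ArgumentParser(description="Parse a bank statement PDF into SQLite")
    ap.add_argument("pdf_path")
    ap.add_argument("--password", default=None, help="for password-protected PDFs (BPI)")
    ap.add_argument("--account-id", type=int, required=True)
    args = ap.parse_args()

    result = parse(args.pdf_path, password=args.password)
    database.insert_transactions(args.account_id, result.transactions)
    # Machine-readable summary for Claude Code to relay to the user.
    print(json.dumps({"imported": len(result.transactions),
                      "quality": result.quality_score}))

if __name__ == "__main__":
    main()
```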

Success Criteria:

  • All 6 skills invocable from Claude Code
  • End-to-end workflow works:
    1. Parse statement → SQLite
    2. Categorize merchants → JSON + SQLite
    3. Generate report → HTML in browser
    4. Query spending → Claude Code answers
    5. Export data → CSV file
    6. Find duplicates → Manual confirmation
  • MVP COMPLETE

Timeline: Days 15-17 (2-3 days)


Database Schema

Tables

accounts

  • id (PRIMARY KEY)
  • account_alias (TEXT) - User-friendly name (e.g., "My GCash")
  • bank_type (TEXT) - GCash, BPI, MAYA, VYBE
  • account_number (TEXT) - Last 4 digits or full number
  • created_at (TIMESTAMP)

statements

  • id (PRIMARY KEY)
  • account_id (FK)
  • file_hash (TEXT, UNIQUE) - Prevents re-importing same file
  • statement_period_start (DATE)
  • statement_period_end (DATE)
  • total_transactions (INT)
  • quality_score (INT) - 0-100
  • imported_at (TIMESTAMP)
  • raw_json (TEXT) - Backup of original parse result

transactions

  • id (PRIMARY KEY)
  • account_id (FK)
  • statement_id (FK)
  • date (TIMESTAMP)
  • amount (NUMERIC) - Positive = credit, negative = debit
  • balance (NUMERIC)
  • raw_description (TEXT) - Original from bank
  • merchant_name (TEXT) - Normalized
  • merchant_category (TEXT) - Categorized
  • transaction_type (TEXT) - Debit, Credit, Transfer
  • reference_number (TEXT) - Bank ref# (for dedup)
  • status (TEXT) - Active, Deleted, Merged
  • dedup_hash (TEXT) - For content-based dedup
  • is_duplicate_of (FK) - Links to primary if duplicate
  • notes (TEXT) - User notes

merchant_corrections

  • id (PRIMARY KEY)
  • raw_merchant (TEXT) - Original from bank
  • normalized_merchant (TEXT) - After normalization
  • category (TEXT) - User-selected category
  • last_corrected (TIMESTAMP)

Indexes

  • transactions(account_id, date) - Fast date range queries
  • transactions(merchant_category) - Fast category queries
  • transactions(dedup_hash) - Fast duplicate checks
  • transactions(reference_number) - Reference-based dedup
  • statements(account_id) - Account statement queries
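A minimal sketch of how backend/init_db.py might realize this schema (two of the four tables shown; statements and merchant_corrections follow the same pattern):

```python
# Sketch only: assumes data/ already exists (created by initialize.sh).
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS accounts (
    id INTEGER PRIMARY KEY,
    account_alias TEXT,
    bank_type TEXT,
    account_number TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS transactions (
    id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(id),
    statement_id INTEGER,
    date TIMESTAMP,
    amount NUMERIC,
    balance NUMERIC,
    raw_description TEXT,
    merchant_name TEXT,
    merchant_category TEXT,
    transaction_type TEXT,
    reference_number TEXT,
    status TEXT DEFAULT 'Active',
    dedup_hash TEXT,
    is_duplicate_of INTEGER REFERENCES transactions(id),
    notes TEXT
);
CREATE INDEX IF NOT EXISTS idx_txn_account_date ON transactions(account_id, date);
CREATE INDEX IF NOT EXISTS idx_txn_category ON transactions(merchant_category);
CREATE INDEX IF NOT EXISTS idx_txn_dedup ON transactions(dedup_hash);
CREATE INDEX IF NOT EXISTS idx_txn_ref ON transactions(reference_number);
"""

if __name__ == "__main__":
    with sqlite3.connect("data/analyze-fin.db") as conn:
        conn.executescript(SCHEMA)
```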

User Workflows (Skills-Based)

Workflow 1: First Import

1. User: "Parse my GCash January statement"
   → parse-statements skill → imports 28 transactions

2. User: "Categorize my unknown merchants"
   → categorize-transactions skill → interactive prompts
   → user picks categories → saved to merchant_mapping.json

3. User: "Generate January report"
   → generate-report skill → creates HTML
   → opens in browser → "₱8,250 spent this month, food 30%"

Workflow 2: Multi-Account Import

1. User: "Parse my BPI statement too"
   → imports 45 transactions (different account)

2. User: "Check for duplicates"
   → deduplicate skill → finds transfer between GCash and BPI
   → "Link these as internal transfer?" → user confirms

3. User: "Show total spending"
   → query-spending skill → "₱13,500 across both accounts"

Workflow 3: Ad-Hoc Analysis

User: "How much food last week?"
→ query-spending skill → "₱1,250 across 12 transactions"

User: "What's my biggest spending category?"
→ query-spending skill → "Food at ₱8,500 (35% of total)"

User: "Export everything to CSV"
→ export-data skill → "Saved to data/exports/all_transactions.csv"
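For reference, a minimal sketch of what the export-data skill's script might run under the hood; the column selection is an assumption based on the schema above:

```python
# Sketch only: SQLite -> CSV via Pandas.
import sqlite3
import pandas as pd

def export_all(db_path: str = "data/analyze-fin.db",
               out_path: str = "data/exports/all_transactions.csv") -> int:
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query(
        "SELECT date, amount, merchant_name, merchant_category, raw_description "
        "FROM transactions WHERE status = 'Active' ORDER BY date",
        conn,
    )
    conn.close()
    df.to_csv(out_path, index=False)
    return len(df)  # e.g. "✅ Exported 156 transactions"
```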

What's Different from Original Plan

Removed (not needed locally):

  • ❌ Supabase ($25/mo) → SQLite ($0)
  • ❌ FastAPI backend → Python scripts ($0)
  • ❌ Streamlit (web UI) → HTML reports + Claude Skills ($0)
  • ❌ Anthropic Claude API ($5-10/mo) → Claude Code context ($0)
  • ❌ Railway deployment ($5-10/mo) → No deployment needed
  • ❌ Streamlit Cloud → No deployment needed
  • ❌ Multi-user RLS policies → Single user (simpler)

Kept (core value):

  • ✅ PDF statement parsing (all 4 banks)
  • ✅ 3-layer deduplication
  • ✅ Merchant learning system
  • ✅ Spending analysis + charts
  • ✅ Natural language queries
  • ✅ Export + backup options

Result:

  • ✅ ~40% faster to ship (3 weeks vs 5-6 weeks)
  • ✅ $0 running cost (vs $20-30/month)
  • ✅ Fully private (data never leaves your machine)
  • ✅ Simpler (local-only architecture: no deployment, no multi-user concerns)

Cost Analysis

| Component | Original Plan | New Plan |
|---|---|---|
| Database | Supabase ($25/mo) | SQLite ($0) |
| Backend API | FastAPI on Railway ($5-10/mo) | Python scripts ($0) |
| Frontend | Streamlit Cloud (free) | HTML + Skills ($0) |
| AI/NLP | Claude API ($5-10/mo) | Claude Code context ($0) |
| Deployment | Railway ops ($$) | Local (no ops) |
| Total | $20-30/mo | $0 |

Success Metrics (MVP Definition)

After Phase 5, you'll be able to:

✅ Parse GCash/BPI/Maya statements with >95% accuracy
✅ Store 500+ transactions in local SQLite
✅ Auto-categorize merchants (90%+ accuracy)
✅ Deduplicate across multiple imports (zero false positives)
✅ Generate interactive HTML reports with charts
✅ Answer natural language questions in <2 seconds
✅ Export transactions as CSV for Excel
✅ Merchant learning (categorize once, remember forever)


Timeline

Phase 0: Day 0 (2 hours)
Phase 1: Days 1-3 (2-3 days)
Phase 2: Days 4-7 (3-4 days)
Phase 3: Days 8-11 (3-4 days)
Phase 4: Days 12-14 (2-3 days)
Phase 5: Days 15-17 (2-3 days)

Total: ~3 weeks (17 days)


Next Steps

  1. Read QUICKSTART.md - 5 minute overview
  2. Run initialize.sh - Set up environment
  3. Gather sample statements - Get test data
  4. Open DEVELOPMENT_GUIDE.md - Follow Phase-by-phase prompts
  5. Ask Claude Code - Copy exact prompts, execute one phase at a time
  6. Commit to git - After each phase completion
  7. Test incrementally - Verify each phase before proceeding

The plan is solid. The architecture is proven. Time to build. 🚀