Summary
This issue tracks the implementation of the project's core Retrieval-Augmented Generation (RAG) capabilities. The architecture is designed to be branch-aware, efficient, and fully self-contained within a single project database file.
It uses a Git-like, snapshot-based model to handle different versions of the codebase (i.e., Git branches) without data duplication, and a content-addressable store for expensive-to-compute artifacts like ASTs and vector embeddings. The entire stack will be pure Go, adhering to the project's core principles.
1. Core Architecture: Single Database with Snapshots
- Database: A single modernc.org/sqlite database file at .meowg1k/index.db will be the sole source of truth. The database must be opened in WAL (Write-Ahead Logging) mode so that reader processes (query, generate) can run concurrently with the writer process (index).
- Content-Addressable Storage: All processed file data (ASTs, chunks, embeddings) will be stored once in a content_store table, keyed by the SHA256 hash of the file's content.
- Branch Snapshots: A snapshots table will represent the state of a Git branch at a specific point in time. Each snapshot is a manifest listing the files and their content hashes for that branch.
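To illustrate why content addressing avoids duplication across branches, here is a minimal sketch. The `contentStore` map is a hypothetical in-memory stand-in for the content_store table, and hex-encoding the digest is an assumption (the issue only specifies SHA256).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ContentHash returns the lowercase hex SHA-256 digest of a file's bytes.
// Hex encoding is an assumption; the issue only says "SHA256 hash".
func ContentHash(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// contentStore is a hypothetical in-memory stand-in for the content_store table.
type contentStore map[string][]byte

// Put stores content under its hash and reports whether a new entry was created.
func (s contentStore) Put(content []byte) (hash string, created bool) {
	hash = ContentHash(content)
	if _, ok := s[hash]; ok {
		return hash, false // already stored, nothing to reprocess
	}
	s[hash] = content
	return hash, true
}

func main() {
	store := contentStore{}
	// The same file seen on two branches hashes to the same key,
	// so it occupies a single entry.
	h1, created1 := store.Put([]byte("package main\n"))
	h2, created2 := store.Put([]byte("package main\n"))
	fmt.Println(h1 == h2, created1, created2, len(store)) // true true false 1
}
```

In the real design, the expensive artifacts (AST, chunks, vectors) would hang off the same key, so an unchanged file is never parsed or embedded twice.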
2. Database Schema
A _schema_meta table will be used for versioning the schema itself.
- content_store (Stores the heavy, processed data)
  - content_hash (PK, TEXT)
  - raw_content_blob (BLOB)
  - treesitter_ast_blob (BLOB)
  - chunks_blob (BLOB)
  - vectors_blob (BLOB)
- snapshots (Represents a state of a branch)
  - id (PK, INTEGER)
  - branch_name (TEXT)
  - created_at (TIMESTAMP)
- snapshot_files (Links a snapshot to its content)
  - snapshot_id (FK to snapshots.id)
  - filepath (TEXT)
  - content_hash (FK to content_store.content_hash)
- vector_indexes (Stores the built HNSW search indexes for a snapshot)
  - snapshot_id (FK to snapshots.id)
  - index_name (TEXT, e.g., "plain")
  - index_blob (BLOB)
  - last_updated (TIMESTAMP)
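One possible SQLite rendering of these tables is sketched below. The composite primary keys, the NOT NULL constraints, and the single `version` column in _schema_meta are assumptions not stated in the issue. `ApplySchema` expects a `*sql.DB` opened elsewhere, for example with `sql.Open("sqlite", ".meowg1k/index.db")` after importing the modernc.org/sqlite driver.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// schemaDDL is a sketch of the schema. Keys and constraints beyond those
// listed in the issue are assumptions.
const schemaDDL = `
CREATE TABLE IF NOT EXISTS _schema_meta (
    version INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS content_store (
    content_hash        TEXT PRIMARY KEY,
    raw_content_blob    BLOB,
    treesitter_ast_blob BLOB,
    chunks_blob         BLOB,
    vectors_blob        BLOB
);
CREATE TABLE IF NOT EXISTS snapshots (
    id          INTEGER PRIMARY KEY,
    branch_name TEXT NOT NULL,
    created_at  TIMESTAMP NOT NULL
);
CREATE TABLE IF NOT EXISTS snapshot_files (
    snapshot_id  INTEGER NOT NULL REFERENCES snapshots(id),
    filepath     TEXT NOT NULL,
    content_hash TEXT NOT NULL REFERENCES content_store(content_hash),
    PRIMARY KEY (snapshot_id, filepath)
);
CREATE TABLE IF NOT EXISTS vector_indexes (
    snapshot_id  INTEGER NOT NULL REFERENCES snapshots(id),
    index_name   TEXT NOT NULL,
    index_blob   BLOB,
    last_updated TIMESTAMP,
    PRIMARY KEY (snapshot_id, index_name)
);`

// Statements splits schemaDDL into individual, non-empty SQL statements.
func Statements() []string {
	var out []string
	for _, s := range strings.Split(schemaDDL, ";") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// ApplySchema switches the database to WAL mode and creates the tables.
func ApplySchema(db *sql.DB) error {
	if _, err := db.Exec(`PRAGMA journal_mode=WAL;`); err != nil {
		return fmt.Errorf("enable WAL: %w", err)
	}
	for _, stmt := range Statements() {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("apply schema: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(len(Statements()), "tables defined") // 5 tables defined
}
```

A real migration path would also read and bump the version stored in _schema_meta, which this sketch leaves out.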
3. Configuration (indexing section)
The config.yaml will define strategies for creating documents and the final search indexes, including per-index embedding profiles.
# .meowg1k/config.yaml
indexing:
  # Default profile used if an index doesn't specify its own.
  embeddingProfile: "default-embedding-model"
  document_strategies:
    - name: "go_files"
      parser: "treesitter"
      language: "go"
      queries: ["(function_declaration) @document"]
      rules:
        - match: "**/*.go"
  indexes:
    - name: "plain"
      from_documents: ["go_files"]
      chunk_strategy: "by_line"
    - name: "go_functions"
      embeddingProfile: "powerful-embedding-model"
      from_documents: ["go_files"]
      chunk_strategy: "document_as_chunk"
      filter: "document_type == 'function'"
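The per-index profile fallback described in the config comment could be modeled roughly as follows. The struct and method names are hypothetical, document_strategies is omitted for brevity, and the `yaml` tags assume a YAML decoder that honors them; nothing here is the project's actual config code.

```go
package main

import "fmt"

// IndexConfig mirrors one entry under indexing.indexes (hypothetical type).
type IndexConfig struct {
	Name             string   `yaml:"name"`
	EmbeddingProfile string   `yaml:"embeddingProfile"`
	FromDocuments    []string `yaml:"from_documents"`
	ChunkStrategy    string   `yaml:"chunk_strategy"`
	Filter           string   `yaml:"filter"`
}

// IndexingConfig mirrors the indexing section (hypothetical type).
type IndexingConfig struct {
	EmbeddingProfile string        `yaml:"embeddingProfile"`
	Indexes          []IndexConfig `yaml:"indexes"`
}

// ProfileFor returns the embedding profile for the named index, falling
// back to the section-level default when the index does not set its own.
func (c IndexingConfig) ProfileFor(index string) (string, error) {
	for _, idx := range c.Indexes {
		if idx.Name != index {
			continue
		}
		if idx.EmbeddingProfile != "" {
			return idx.EmbeddingProfile, nil
		}
		return c.EmbeddingProfile, nil
	}
	return "", fmt.Errorf("unknown index %q", index)
}

func main() {
	cfg := IndexingConfig{
		EmbeddingProfile: "default-embedding-model",
		Indexes: []IndexConfig{
			{Name: "plain", FromDocuments: []string{"go_files"}, ChunkStrategy: "by_line"},
			{Name: "go_functions", EmbeddingProfile: "powerful-embedding-model",
				FromDocuments: []string{"go_files"}, ChunkStrategy: "document_as_chunk"},
		},
	}
	for _, name := range []string{"plain", "go_functions"} {
		profile, _ := cfg.ProfileFor(name)
		fmt.Printf("%s -> %s\n", name, profile)
	}
}
```

With the example config, "plain" resolves to the default profile and "go_functions" to its own override.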
4. meow index Command (Snapshot Reconciliation Workflow)
This command will be the core of the indexing process.
- Detect Branch: Get the current Git branch name.
- Scan Filesystem: Scan all project files and calculate the content_hash for each.
- Update Content Store: For each file, if its hash is not in content_store, process it (parse AST, chunk, embed) and save the results.
- Create New Snapshot: Create a new entry in the snapshots table for the current branch. Populate snapshot_files with the current list of (filepath, content_hash) pairs.
- Build Vector Indexes: For each index defined in the config, query the data belonging to the new snapshot, build the HNSW index, and save it to the vector_indexes table, linked to the snapshot_id.
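The "Update Content Store" and "Create New Snapshot" steps can be sketched as a pure function over already-computed hashes. `Reconcile` and `SnapshotFile` are hypothetical names, and the maps stand in for the filesystem scan and a lookup against content_store; parsing, embedding, and HNSW building are out of scope here.

```go
package main

import (
	"fmt"
	"sort"
)

// SnapshotFile is one (filepath, content_hash) row destined for snapshot_files.
type SnapshotFile struct {
	Filepath    string
	ContentHash string
}

// Reconcile takes the scanned filepath -> content_hash map and the set of
// hashes already present in content_store. It returns the hashes that still
// need processing (each listed once) and the manifest for the new snapshot.
func Reconcile(scanned map[string]string, known map[string]bool) (toProcess []string, manifest []SnapshotFile) {
	queued := map[string]bool{}
	for path, hash := range scanned {
		manifest = append(manifest, SnapshotFile{Filepath: path, ContentHash: hash})
		if !known[hash] && !queued[hash] {
			queued[hash] = true
			toProcess = append(toProcess, hash)
		}
	}
	// Sort both results so the output does not depend on map iteration order.
	sort.Strings(toProcess)
	sort.Slice(manifest, func(i, j int) bool {
		return manifest[i].Filepath < manifest[j].Filepath
	})
	return toProcess, manifest
}

func main() {
	scanned := map[string]string{
		"main.go":       "h1",
		"auth/login.go": "h2",
		"auth/copy.go":  "h2", // identical content, same hash
	}
	known := map[string]bool{"h1": true} // main.go was indexed on another branch

	toProcess, manifest := Reconcile(scanned, known)
	fmt.Println(toProcess)     // [h2]
	fmt.Println(len(manifest)) // 3
}
```

Only the cache misses incur parsing and embedding cost, while the manifest always lists every file, so a new snapshot on a mostly unchanged branch is cheap to create.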
5. meow query Command
A standalone command for directly querying the indexes of the current branch's latest snapshot.
# Search the 'go_functions' index for functions related to authentication
meow query --index go_functions "user authentication logic"
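Resolving "the current branch's latest snapshot" might look like the sketch below. It works on an in-memory slice with hypothetical types; against the database the same selection would presumably be a query along the lines of `SELECT id FROM snapshots WHERE branch_name = ? ORDER BY created_at DESC, id DESC LIMIT 1`. Breaking timestamp ties by the higher id is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot mirrors a row of the snapshots table (hypothetical type).
type Snapshot struct {
	ID        int64
	Branch    string
	CreatedAt time.Time
}

// LatestSnapshot returns the most recent snapshot for the given branch.
// Ties on CreatedAt are broken by the higher ID (an assumption).
func LatestSnapshot(all []Snapshot, branch string) (Snapshot, bool) {
	var best Snapshot
	found := false
	for _, s := range all {
		if s.Branch != branch {
			continue
		}
		newer := s.CreatedAt.After(best.CreatedAt) ||
			(s.CreatedAt.Equal(best.CreatedAt) && s.ID > best.ID)
		if !found || newer {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	all := []Snapshot{
		{ID: 1, Branch: "main", CreatedAt: t0},
		{ID: 2, Branch: "feature/auth", CreatedAt: t0.Add(time.Hour)},
		{ID: 3, Branch: "main", CreatedAt: t0.Add(2 * time.Hour)},
	}
	s, ok := LatestSnapshot(all, "main")
	fmt.Println(s.ID, ok) // 3 true
}
```

The returned snapshot id would then select the matching rows in vector_indexes for the requested index name.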
6. Integration with meow generate
Tasks will use a rag key to specify which indexes to query. The command will automatically use the latest snapshot for the current branch.
generate:
  tasks:
    refactor-auth:
      rag:
        indexes: ["go_functions", "plain"]
      userPrompt: "Refactor the main authentication function..."
Acceptance Criteria
Phase 1: Foundation
- [ ] Integrate modernc.org/sqlite and a pure Go HNSW library (e.g., coder/hnsw).
- [ ] Implement the full SQLite schema with versioning and WAL mode enabled.
- [ ] Implement the indexing configuration structure with per-index embedding profiles.
- [ ] Implement the service for managing the content_store.
- [ ] Implement the Tree-sitter via WASM backend.
Phase 2: Indexing
- [ ] Implement the meow index command with the full snapshot-based reconciliation logic.
- [ ] The command correctly populates content_store on a cache miss.
- [ ] The command correctly creates new entries in snapshots and snapshot_files.
- [ ] The HNSW index builder queries data based on a snapshot_id and saves the result to vector_indexes.
Phase 3: Querying & Integration
- [ ] Implement the meow query command, making it branch-aware.
- [ ] Update meow generate to use the latest snapshot for the current branch to fetch RAG context.
Phase 4: Documentation
- [ ] Write comprehensive documentation for the new RAG system, explaining the architecture and commands.