LangGraph + LangSmith infrastructure for detecting, scoring, observing, and replaying emerging social and news narratives.
Narrative Alpha Agent (NAA) is a production-oriented TypeScript MVP for emerging narrative research. It uses LangGraph as the stateful orchestration layer, LangSmith-ready tracing for graph observability, deterministic local embeddings by default, SQLite for persisted long-term narrative memory, and a replay engine designed for historical backtesting without future data leakage.
NAA uses LangGraph and LangSmith as first-class infrastructure components:
- LangGraph orchestration: every narrative run is represented as a typed graph with named nodes, checkpointable state, and replay re-entry.
- LangSmith observability: graph invocations carry run names, tags, metadata, timestamps, document counts, and thread IDs for trace inspection.
- Provider-agnostic model access: OpenAI-compatible providers, Claude, DeepSeek, OpenRouter, Gemini, Cohere, Mistral, Groq, Together, xAI, Azure OpenAI, and local deterministic mode are all supported through clean ports.
- Production discipline: strict TypeScript, SQLite persistence, deterministic replay, CI quality gates, branch protection, issue templates, security policy, and documented operations.
LangSmith tracing is opt-in and configured through the environment variables listed in .env.example:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=lsv2_...
LANGSMITH_PROJECT=narrative-alpha-agent
LANGSMITH_TAGS=naa,langgraph,narrative-replay
timestamped documents
|
v
+------------------+
| IngestionNode | dedupe, visibility cutoff
+------------------+
|
v
+------------------+
| PreprocessingNode| text normalization
+------------------+
|
v
+------------------+
| ClusteringNode | embeddings + cosine similarity
+------------------+
|
v
+----------------------+
| NarrativeAgentNode | stateful lifecycle + SQLite memory
+----------------------+
|
v
+------------------+
| ScoringNode | Narrative Impact Probability
+------------------+
|
v
+------------------+
| ActionNode | console + notifier alerts
+------------------+
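The clustering stage in the diagram pairs embeddings with cosine similarity. The following is a minimal sketch of one way that could work, assuming a greedy single-pass assignment against running-mean centroids. The names EmbeddedDoc, SketchCluster, and clusterByThreshold are hypothetical and are not taken from the NAA source.

```typescript
// Hypothetical shapes for illustration; the real Document and Cluster types may differ.
type EmbeddedDoc = { id: string; vector: number[] };
type SketchCluster = { centroid: number[]; memberIds: string[] };

// Cosine similarity of two vectors; returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Greedy assignment (an assumption): each document joins the most similar
// cluster whose centroid similarity is at least `threshold`, else starts a new one.
function clusterByThreshold(docs: EmbeddedDoc[], threshold: number): SketchCluster[] {
  const clusters: SketchCluster[] = [];
  for (const doc of docs) {
    let best: SketchCluster | null = null;
    let bestScore = threshold;
    for (const cluster of clusters) {
      const score = cosineSimilarity(doc.vector, cluster.centroid);
      if (score >= bestScore) {
        best = cluster;
        bestScore = score;
      }
    }
    if (best === null) {
      clusters.push({ centroid: doc.vector.slice(), memberIds: [doc.id] });
    } else {
      // Update the centroid as a running mean over all members.
      const n = best.memberIds.length;
      best.centroid = best.centroid.map((c, i) => (c * n + (doc.vector[i] ?? 0)) / (n + 1));
      best.memberIds.push(doc.id);
    }
  }
  return clusters;
}
```

Because assignment is single-pass, the result depends on document order, which is one reason a deterministic input sort matters for replay.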
The code is organized around injectable services rather than prompt-heavy business logic:
- src/graph contains LangGraph state and graph construction.
- src/nodes contains the six required orchestration nodes.
- src/services contains clustering, lifecycle, scoring, embeddings, vector store, queue, and alerting.
- src/db contains SQLite and in-memory narrative repositories.
- src/backtest contains deterministic replay and demo fixtures.
- tests covers clustering, scoring, replay leakage, and lifecycle transitions.
createNarrativeGraph builds a StateGraph with a strongly typed global state:
type SystemState = {
  timestamp: number;
  documents: Document[];
  clusters: Cluster[];
  narratives: Narrative[];
  logs: string[];
};
The graph is compiled with a MemorySaver checkpointer, and every node returns partial state updates. The graph can be invoked repeatedly with the same runner, which is what replay uses for time-cursor re-entry.
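To illustrate what "every node returns partial state updates" means, here is a plain TypeScript sketch that does not use LangGraph itself. It assumes last-value-wins merging for every key except logs, which this sketch appends. In LangGraph the actual merge behaviour is set per state channel, so treat that rule as an assumption. SketchState, SketchNode, applyUpdate, and runNodes are hypothetical names.

```typescript
// Simplified stand-in for SystemState; element types are reduced to strings for brevity.
type SketchState = {
  timestamp: number;
  documents: string[];
  clusters: string[][];
  narratives: string[];
  logs: string[];
};

// A node reads the full state and returns only the keys it changed.
type SketchNode = (state: SketchState) => Partial<SketchState>;

// Assumed merge rule: overwrite changed keys, but append to logs.
function applyUpdate(state: SketchState, update: Partial<SketchState>): SketchState {
  return {
    ...state,
    ...update,
    logs: state.logs.concat(update.logs ?? []),
  };
}

// Runs nodes in order, folding each partial update into the state.
function runNodes(initial: SketchState, nodes: SketchNode[]): SketchState {
  return nodes.reduce((state, node) => applyUpdate(state, node(state)), initial);
}

// Example node: drops exact duplicate documents.
const dedupeNode: SketchNode = (state) => {
  const unique = Array.from(new Set(state.documents));
  return {
    documents: unique,
    logs: [`ingestion: kept ${unique.length} of ${state.documents.length} documents`],
  };
};

// Example node: trims and lowercases document text.
const normalizeNode: SketchNode = (state) => ({
  documents: state.documents.map((text) => text.trim().toLowerCase()),
  logs: ["preprocessing: normalized text"],
});
```

Neither example node touches timestamp, clusters, or narratives, so those keys pass through unchanged.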
NarrativeGraphRunner passes LangChain runnable config into each LangGraph invocation:
- runName
- tags
- run metadata for app name, replay timestamp, document count, project, and tracing status
- thread_id for checkpointing and replay-specific trace grouping
When LANGSMITH_TRACING=true, these graph runs and node spans are visible in LangSmith under the configured project. This makes replay checkpoints, live ingestion runs, and alert-producing executions inspectable instead of opaque.
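A config object carrying the fields listed above might be assembled roughly as follows. The top-level keys runName, tags, metadata, and configurable.thread_id follow LangChain's runnable config conventions. The buildRunConfig helper, the metadata key names, and the run name and thread ID formats are assumptions made for this sketch, not the project's actual code.

```typescript
// Structural subset of a LangChain runnable config, declared locally so the sketch has no dependencies.
type SketchRunConfig = {
  runName: string;
  tags: string[];
  metadata: Record<string, string | number | boolean>;
  configurable: { thread_id: string };
};

// Hypothetical helper: derives trace fields from the run mode and environment.
function buildRunConfig(opts: {
  mode: "live" | "replay";
  timestamp: number;
  documentCount: number;
  env: Record<string, string | undefined>;
}): SketchRunConfig {
  const project = opts.env["LANGSMITH_PROJECT"] ?? "narrative-alpha-agent";
  const tracingEnabled = opts.env["LANGSMITH_TRACING"] === "true";
  // LANGSMITH_TAGS is treated as a comma-separated list; empty entries are dropped.
  const envTags = (opts.env["LANGSMITH_TAGS"] ?? "")
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);

  return {
    runName: `naa-${opts.mode}-${opts.timestamp}`,
    tags: envTags.concat([opts.mode]),
    metadata: {
      app: "narrative-alpha-agent",
      replayTimestamp: opts.timestamp,
      documentCount: opts.documentCount,
      project,
      tracingEnabled,
    },
    // A stable thread_id lets checkpoints and traces for one run mode group together.
    configurable: { thread_id: `${opts.mode}-${project}` },
  };
}
```

An object of this shape would be passed as the second argument when invoking the compiled graph.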
NAA calculates Narrative Impact Probability (NIP) as a configurable weighted score:
NIP = w1 * velocity
+ w2 * sourceDiversity
+ w3 * crossPlatformPresence
+ w4 * sentimentShift
Defaults live in src/config/defaults.ts. Scoring also emits human-readable logs such as high velocity, cross-platform presence, source diversity, and sentiment shift.
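The weighted sum can be sketched in a few lines. The weights below are hypothetical placeholders chosen for illustration and are not the values in src/config/defaults.ts. Clamping each signal to the range 0 to 1 and the 0.7 explanation threshold are also assumptions of this sketch.

```typescript
type NipSignals = {
  velocity: number;
  sourceDiversity: number;
  crossPlatformPresence: number;
  sentimentShift: number;
};

// Weights share the same keys as the signals they multiply.
type NipWeights = NipSignals;

// Hypothetical weights for illustration only; they sum to 1 so NIP stays in [0, 1].
const exampleWeights: NipWeights = {
  velocity: 0.4,
  sourceDiversity: 0.2,
  crossPlatformPresence: 0.25,
  sentimentShift: 0.15,
};

// Assumption: every signal is normalized to [0, 1] before weighting.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function computeNip(signals: NipSignals, weights: NipWeights = exampleWeights): number {
  return (
    weights.velocity * clamp01(signals.velocity) +
    weights.sourceDiversity * clamp01(signals.sourceDiversity) +
    weights.crossPlatformPresence * clamp01(signals.crossPlatformPresence) +
    weights.sentimentShift * clamp01(signals.sentimentShift)
  );
}

// Returns a human-readable reason for each signal at or above the threshold.
function explainNip(signals: NipSignals, threshold: number = 0.7): string[] {
  const labels: Array<[keyof NipSignals, string]> = [
    ["velocity", "high velocity"],
    ["sourceDiversity", "source diversity"],
    ["crossPlatformPresence", "cross-platform presence"],
    ["sentimentShift", "sentiment shift"],
  ];
  return labels.filter(([key]) => signals[key] >= threshold).map(([, label]) => label);
}
```

With weights that sum to 1 and clamped inputs, the score stays between 0 and 1, which keeps it readable as a probability-like value.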
The replay engine advances through a timeline and only passes documents visible at the current timestamp:
for (const t of timeline) {
  const visibleData = data.filter((d) => d.timestamp <= t);
  await runGraph(visibleData);
}
ReplayEngine sorts input deterministically, asserts that no future documents are present in graph state, and carries prior narrative state forward between checkpoints. This keeps live ingestion and historical replay on the same execution path.
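A slightly fuller sketch of the same loop, covering the deterministic sort, the leakage assertion, and state carry-over, might look like this. It is a simplified stand-in rather than the actual ReplayEngine: TimedDoc, ReplayStep, assertNoFutureDocuments, and replay are hypothetical names, and the step callback takes the place of a graph invocation.

```typescript
// Hypothetical document shape; only the timestamp matters for the leakage guard.
type TimedDoc = { id: string; timestamp: number };

// Stand-in for a graph invocation: receives the visible documents, the time
// cursor, and the state carried over from the previous checkpoint.
type ReplayStep<S> = (visible: TimedDoc[], cursor: number, prior: S) => Promise<S>;

// Throws if any document is newer than the cursor.
function assertNoFutureDocuments(docs: TimedDoc[], cursor: number): void {
  const leaked = docs.filter((d) => d.timestamp > cursor);
  if (leaked.length > 0) {
    const ids = leaked.map((d) => d.id).join(", ");
    throw new Error(`future data leakage at t=${cursor}: ${ids}`);
  }
}

async function replay<S>(
  data: TimedDoc[],
  timeline: number[],
  initial: S,
  step: ReplayStep<S>,
): Promise<S> {
  // Sort by timestamp, then id, so documents with equal timestamps always replay in the same order.
  const sorted = data.slice().sort((a, b) => {
    if (a.timestamp !== b.timestamp) return a.timestamp - b.timestamp;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  const cursors = timeline.slice().sort((a, b) => a - b);

  let state = initial;
  for (const t of cursors) {
    const visible = sorted.filter((d) => d.timestamp <= t);
    // Defensive check: redundant with the filter above, but it catches
    // regressions if the selection logic ever changes.
    assertNoFutureDocuments(visible, t);
    state = await step(visible, t, state);
  }
  return state;
}
```

Passing the previous state into each step is what lets narrative lifecycle information accumulate across checkpoints instead of being rebuilt from scratch.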
pnpm install
pnpm run replay
pnpm run ingest
pnpm run dev
Docker:
docker build -t narrative-alpha-agent:local .
docker run --rm narrative-alpha-agent:local
Quality gates:
pnpm run format
pnpm run lint
pnpm run typecheck
pnpm run test
- Architecture
- Backtesting
- Operations
- Providers and Secrets
- Observability
- Notifications
- Docker
- Contributing
- Security
- Changelog
src/backtest/demoDataset.ts simulates:
- early formation of an AI-agent payments narrative
- acceleration across Twitter, Telegram, and news
- peak and cooling phases
- an unrelated market document to test cluster separation
pnpm run replay prints NIP over time, narrative states, and graph logs.
The MVP is intentionally local-first but replaceable:
- LLM access uses a multi-provider registry with local, OpenAI, Anthropic Claude, DeepSeek, OpenRouter, Google Gemini, Cohere, Mistral, Groq, Together, xAI, Azure OpenAI, and custom OpenAI-compatible support.
- Embeddings implement the EmbeddingProvider port.
- Vector search implements the VectorStore port.
- Alerts implement the Notifier port with console, Discord webhook, and Telegram stub examples.
- Narrative memory implements the NarrativeRepository port with SQLite and in-memory backends.
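As an illustration of the port pattern, a deterministic local embedding provider could be sketched with feature hashing. This is an assumption about how such a provider might work, not a description of NAA's default. The interface shape (a synchronous embed method), the 64-dimension default, and the use of FNV-1a hashing are all choices made for this sketch.

```typescript
// Hypothetical port shape; the real EmbeddingProvider interface may be async or batch-oriented.
interface EmbeddingProviderSketch {
  embed(text: string): number[];
}

// 32-bit FNV-1a hash, computed over UTF-16 code units rather than bytes.
function fnv1a(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Feature hashing: each token increments one bucket, then the vector is L2-normalized.
function createHashingEmbeddingProvider(dimensions: number = 64): EmbeddingProviderSketch {
  return {
    embed(text: string): number[] {
      const vector: number[] = [];
      for (let i = 0; i < dimensions; i++) vector.push(0);

      const tokens = text
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter((token) => token.length > 0);
      for (const token of tokens) {
        const bucket = fnv1a(token) % dimensions;
        vector[bucket] = (vector[bucket] ?? 0) + 1;
      }

      const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
      // Text with no tokens yields the zero vector.
      return norm === 0 ? vector : vector.map((x) => x / norm);
    },
  };
}
```

The same text always maps to the same vector with no network call, which suits tests and replay. Similarity here reflects shared tokens only, not meaning.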
- The default embedding provider is deterministic, which makes it useful for tests, but it does not provide production-quality semantic similarity.
- Clustering is threshold-based and does not yet perform split/merge maintenance.
- Source authority is not modeled beyond source diversity.
- The Telegram notifier is a stub.
- SQLite persistence stores the latest version of each narrative, not a full event-sourced history.
- Add source connectors for social APIs, RSS, and market data.
- Add event-sourced narrative snapshots for richer historical analysis.
- Add offline evaluation metrics for precision, recall, and lead time.
- Add Redis-backed queue and distributed workers.
- Add richer lifecycle models with decay, source authority, and engagement-adjusted velocity.
- Add real embedding providers and a durable vector store such as Qdrant, LanceDB, or pgvector.