A LangGraph-based system that processes books chapter-by-chapter to analyze and rank characters by their influence throughout the story.
This system tracks:
- Mentioned count: Total times a character's name or aliases appear in the text
- Appeared scenes: Number of scenes in which a character appears (binary per scene)
- Influence evidence: Structured evidence of causal, social, world, pacing, and narrative gravity impact
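The two counting metrics above can be illustrated with a minimal, standard-library sketch. It assumes a hypothetical `aliases` mapping of `character_id` to alias strings, matches whole words case-sensitively, and blanks out longer aliases first so "Jon Snow" is not also counted as "Jon". It also approximates "appears in a scene" as "is named at least once in that scene"; the actual nodes may rely on LLM judgment instead, so treat this as an illustration of the definitions, not the project's implementation.

```python
import re
from collections import Counter


def count_mentions(text: str, aliases: dict[str, list[str]]) -> Counter:
    """Count whole-word alias occurrences per character (case-sensitive).

    `aliases` is a hypothetical mapping of character_id -> list of names.
    Longer aliases are matched first and blanked out, so "Jon Snow" is
    counted once for its character and not again as "Jon".
    """
    counts: Counter = Counter()
    pairs = sorted(
        ((alias, cid) for cid, names in aliases.items() for alias in names),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    remaining = text
    for alias, cid in pairs:
        pattern = re.compile(rf"\b{re.escape(alias)}\b")
        counts[cid] += len(pattern.findall(remaining))
        # Overwrite matched spans with spaces so shorter aliases cannot re-match them.
        remaining = pattern.sub(lambda m: " " * len(m.group()), remaining)
    return counts


def count_appearances(scenes: list[str], aliases: dict[str, list[str]]) -> Counter:
    """Count scenes per character; each scene contributes at most 1 (binary per scene)."""
    appearances: Counter = Counter()
    for scene in scenes:
        present = {cid for cid, n in count_mentions(scene, aliases).items() if n > 0}
        appearances.update(present)
    return appearances
```

A character named three times in one scene adds 3 to its mention count but only 1 to its appearance count, which is exactly the distinction between the two metrics.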
After processing all chapters, the system synthesizes book-wide dossiers and produces a subjective influence ranking based on narrative impact, not just frequency.
This project started as a static analyzer that counted character references in Game of Thrones, with the goal of determining who has "the best story" and therefore deserves the Iron Throne. With the advent of capable LLMs and agentic workflows, I've reworked the approach to lean on the non-deterministic, subjective judgment of LLMs and apply "importance" metrics to characters in any book.
- Chapter-by-chapter processing with LangGraph state machine
- Automatic character detection and alias resolution
- Scene segmentation and appearance tracking
- Structured influence evidence extraction
- Book-wide synthesis into character dossiers
- Subjective influence ranking (not based on frequency)
- LangSmith integration for tracing and debugging
- Multiple input formats (JSON, directory of files, single file)
- Clone the repository
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables (copy `.env.example` to `.env` and fill in values):

```bash
cp .env.example .env
```

Environment variables:
- `OPENAI_API_KEY`: Your OpenAI API key (required for LLM nodes)
- `LANGSMITH_API_KEY`: Your LangSmith API key (optional, for tracing)
- `LANGSMITH_TRACING`: Set to `true` to enable tracing (default: `false`)
- `LANGSMITH_PROJECT`: LangSmith project name (default: `book-influence-dev`)
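As a sketch of how these variables and their documented defaults might be read at startup, the following uses only the standard library. The `Settings` class and `load_settings` function are hypothetical names, and the sketch assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); whether the project does exactly this is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the environment variables above."""

    openai_api_key: str
    langsmith_api_key: Optional[str]
    langsmith_tracing: bool
    langsmith_project: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The LLM nodes cannot run without an OpenAI key, so fail fast.
        raise RuntimeError("OPENAI_API_KEY is required for LLM nodes")
    return Settings(
        openai_api_key=api_key,
        langsmith_api_key=env.get("LANGSMITH_API_KEY"),  # optional
        # Only the string "true" (any case) turns tracing on.
        langsmith_tracing=env.get("LANGSMITH_TRACING", "false").strip().lower() == "true",
        langsmith_project=env.get("LANGSMITH_PROJECT", "book-influence-dev"),
    )
```

Accepting any mapping rather than reading `os.environ` directly lets tests pass a plain dict.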
Process a book from a JSON file:

```bash
python main.py --input chapters.json --output results.json
```

Process chapters from a directory:

```bash
python main.py --input-dir chapters/ --output results.json
```

Process a single file with chapter separators:

```bash
python main.py --input-file book.txt --output results.json
```

JSON Format (`--input chapters.json`):

```json
[
  {
    "chapter_id": "chapter_001",
    "text": "Chapter text here..."
  },
  {
    "chapter_id": "chapter_002",
    "text": "More chapter text..."
  }
]
```

Directory Format (`--input-dir chapters/`):
- Directory containing text files (default pattern: `*.txt`)
- Files are sorted by name and assigned sequential chapter IDs

Single File Format (`--input-file book.txt`):
- Single text file with chapters separated by `\n\n---\n\n` (a line containing only `---`, with a blank line before and after it)
- Chapters are assigned sequential IDs
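The three input formats can be normalized into the same list of `chapter_id`/`text` records. This is a minimal sketch, not the contents of `io/load_chapters.py`: the function names are hypothetical, and the `chapter_001`-style ID format for the directory and single-file modes is an assumption borrowed from the JSON example.

```python
import json
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]
SEPARATOR = "\n\n---\n\n"  # chapter separator for single-file input


def _chapter_id(index: int) -> str:
    # Assumed ID style, matching the JSON example ("chapter_001").
    return f"chapter_{index:03d}"


def load_json(path: PathLike) -> list[dict]:
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of chapters")
    for item in data:
        if not isinstance(item, dict) or "chapter_id" not in item or "text" not in item:
            raise ValueError(f"chapter entry needs 'chapter_id' and 'text': {item!r}")
    return data


def load_dir(path: PathLike, pattern: str = "*.txt") -> list[dict]:
    # Sort by file name so chapter order is deterministic.
    files = sorted(Path(path).glob(pattern), key=lambda p: p.name)
    return [
        {"chapter_id": _chapter_id(i), "text": f.read_text(encoding="utf-8")}
        for i, f in enumerate(files, start=1)
    ]


def load_file(path: PathLike) -> list[dict]:
    text = Path(path).read_text(encoding="utf-8")
    # Split on the separator and drop empty fragments (e.g. a trailing separator).
    parts = [part.strip() for part in text.split(SEPARATOR) if part.strip()]
    return [{"chapter_id": _chapter_id(i), "text": p} for i, p in enumerate(parts, start=1)]
```

If the directory sort is a plain lexicographic one, as in this sketch, file names should be zero-padded (`ch01.txt` rather than `ch1.txt`); otherwise `ch10.txt` sorts before `ch9.txt`.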
The output JSON contains an array of ranked characters:

```json
[
  {
    "rank": 1,
    "character_id": "char_001",
    "name": "Character Name",
    "aliases": ["Alias1", "Alias2"],
    "appeared_scenes": 42,
    "mentioned_count": 310,
    "influence_summary": "Summary of their influence...",
    "ranking_rationale": "Why this character ranks here..."
  }
]
```

Include metadata in output:

```bash
python main.py --input chapters.json --output results.json --include-metadata
```

Write full state for debugging:

```bash
python main.py --input chapters.json --output results.json --full-state debug_state.json
```

Specify run metadata for tracing:

```bash
python main.py --input chapters.json --output results.json --book-id "book_001" --run-id "run_001" --env prod
```

Logging Options:
By default, logs are written to the command line (stdout). You can also write logs to a file:

```bash
# Log to both console and file
python main.py --input chapters.json --output results.json --log-file run.log

# Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
python main.py --input chapters.json --output results.json --log-file run.log --log-level DEBUG
```

Logging Behavior:
- Without `--log-file`: Logs go only to the command line (stdout)
- With `--log-file`: Logs go to both the command line and the specified file
- Log levels control verbosity (default: `INFO`):
  - `DEBUG`: Detailed information including character IDs, counts, model configs
  - `INFO`: Progress updates, major milestones, summaries
  - `WARNING`: Ambiguous references, parsing issues
  - `ERROR`: Failures, missing data
All nodes log their execution steps, making it easy to track progress and debug issues.
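A minimal sketch of the console-plus-optional-file behavior described above, using only the standard `logging` module. The function name, format string, and the choice to configure the root logger are assumptions for illustration, not necessarily what `main.py` does.

```python
import logging
import sys
from typing import Optional

LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_logging(level: str = "INFO", log_file: Optional[str] = None) -> logging.Logger:
    """Send logs to stdout, and additionally to `log_file` when one is given."""
    name = level.upper()
    if name not in LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    root = logging.getLogger()
    root.setLevel(name)  # Logger.setLevel accepts level names as strings
    # Remove handlers from any earlier configuration so messages are not duplicated.
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()
    handlers: list[logging.Handler] = [logging.StreamHandler(sys.stdout)]
    if log_file:
        handlers.append(logging.FileHandler(log_file, encoding="utf-8"))
    formatter = logging.Formatter(LOG_FORMAT)
    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root
```

Each node module can then call `logging.getLogger(__name__)` and reach both handlers through the root logger.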
```mermaid
graph TD
    Start([Start]) --> Init[Init]
    Init --> LoadChapter[LoadChapter]
    LoadChapter --> SceneSegmenter[SceneSegmenter]
    SceneSegmenter --> EntityRosterUpdate[EntityRosterUpdate]
    EntityRosterUpdate --> MentionCounter[MentionCounter]
    MentionCounter --> AppearanceCounter[AppearanceCounter]
    AppearanceCounter --> InfluenceExtractor[InfluenceExtractor]
    InfluenceExtractor --> BookAggregator[BookAggregator]
    BookAggregator --> NextChapter[NextChapter]
    NextChapter -->|next_chapter| LoadChapter
    NextChapter -->|finalize| BookSynthesis[BookSynthesis]
    BookSynthesis --> Ranker[Ranker]
    Ranker --> End([End])
    style Start fill:#009919
    style End fill:#b52d93
    style NextChapter fill:#b5ae2d
    style Init fill:#003399
    style BookSynthesis fill:#812db5
    style Ranker fill:#812db5
```
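The loop in the diagram hinges on the `NextChapter` routing decision. The sketch below models it in plain Python, assuming hypothetical state fields `chapter_index` (advanced by the `NextChapter` node) and `num_chapters`; the route names match the edge labels in the diagram. `simulate` only replays node names to show the visiting order; it does not call LangGraph.

```python
from typing import Literal, TypedDict


class LoopState(TypedDict):
    chapter_index: int  # hypothetical: index of the next chapter to load
    num_chapters: int


CHAPTER_NODES = [
    "LoadChapter", "SceneSegmenter", "EntityRosterUpdate", "MentionCounter",
    "AppearanceCounter", "InfluenceExtractor", "BookAggregator",
]


def next_chapter(state: LoopState) -> LoopState:
    """Node: advance the cursor (returns a new state rather than mutating)."""
    return {**state, "chapter_index": state["chapter_index"] + 1}


def route_next_chapter(state: LoopState) -> Literal["next_chapter", "finalize"]:
    """Router: loop back while chapters remain, otherwise finalize."""
    if state["chapter_index"] < state["num_chapters"]:
        return "next_chapter"
    return "finalize"


def simulate(num_chapters: int) -> list[str]:
    """Replay the order in which nodes would run for a book of num_chapters."""
    state: LoopState = {"chapter_index": 0, "num_chapters": num_chapters}
    trace = ["Init"]
    while True:
        trace.extend(CHAPTER_NODES)
        state = next_chapter(state)
        trace.append("NextChapter")
        if route_next_chapter(state) == "finalize":
            break
    trace.extend(["BookSynthesis", "Ranker"])
    return trace
```

In LangGraph, a router like this is typically attached with `add_conditional_edges("NextChapter", route_next_chapter, {"next_chapter": "LoadChapter", "finalize": "BookSynthesis"})`. Because each chapter costs eight node executions, long books can exceed LangGraph's default `recursion_limit` of 25 steps, which can be raised through the `config` passed to `invoke`.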
- Init: Validates input and initializes state
- LoadChapter: Loads current chapter and resets scratch fields
- SceneSegmenter: Splits chapter into scenes
- EntityRosterUpdate: Detects characters and updates alias registry
- MentionCounter: Counts alias occurrences
- AppearanceCounter: Counts scenes per character
- InfluenceExtractor: Extracts structured influence evidence
- BookAggregator: Merges chapter results into book totals
- NextChapter: Conditional routing (next chapter or finalize)
- BookSynthesis: Synthesizes evidence into dossiers
- Ranker: Assigns subjective influence ranks
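The `BookAggregator` step can be sketched as a pure function that folds one chapter's results into the running book totals, in line with the project's "pure functions where possible" convention. The dictionary layout (`mentions`, `appearances`, `evidence`) is hypothetical; the real schema lives in `schemas/state.py`.

```python
from collections import Counter


def aggregate_chapter(book: dict, chapter: dict) -> dict:
    """Return new book totals with one chapter merged in; inputs are not mutated."""
    mentions = Counter(book["mentions"])
    mentions.update(chapter["mentions"])  # update() adds counts rather than replacing
    appearances = Counter(book["appearances"])
    appearances.update(chapter["appearances"])
    # Copy the evidence lists, then append this chapter's items tagged with its ID.
    evidence = {cid: list(items) for cid, items in book["evidence"].items()}
    for cid, items in chapter["evidence"].items():
        evidence.setdefault(cid, []).extend(
            {**item, "chapter_id": chapter["chapter_id"]} for item in items
        )
    return {"mentions": dict(mentions), "appearances": dict(appearances), "evidence": evidence}
```

Tagging each evidence item with its `chapter_id` at merge time keeps the later synthesis step able to cite where in the book an effect happened.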
The state maintains three layers:
- Book-level aggregates: Persist across chapters (mentions, appearances, influence)
- Per-chapter scratch: Reset each chapter (current chapter data)
- Indexing structures: Character canon and alias resolution
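One way to express these three layers is a single `TypedDict`, which LangGraph accepts as a state schema. The field names below are hypothetical (the actual contract is in `schemas/state.py`), as is the convention that alias keys are stored case-folded.

```python
from typing import Optional, TypedDict


class BookState(TypedDict, total=False):
    # Book-level aggregates: persist across chapters.
    mentioned_count: dict[str, int]            # character_id -> mentions
    appeared_scenes: dict[str, int]            # character_id -> scenes
    influence_evidence: dict[str, list[dict]]  # character_id -> evidence items
    # Per-chapter scratch: reset by LoadChapter.
    chapter_id: str
    chapter_text: str
    scenes: list[str]
    # Indexing structures: canon and alias resolution.
    characters: dict[str, dict]                # character_id -> canonical record
    alias_index: dict[str, str]                # case-folded alias -> character_id


def reset_scratch(state: BookState, chapter: dict) -> BookState:
    """Start a new chapter: overwrite scratch fields, keep aggregates and indexes."""
    return {**state, "chapter_id": chapter["chapter_id"],
            "chapter_text": chapter["text"], "scenes": []}


def resolve_alias(state: BookState, name: str) -> Optional[str]:
    """Map a surface name to its character_id, or None if it is unknown."""
    return state.get("alias_index", {}).get(name.strip().casefold())
```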
Influence ranking is subjective and based on:
- How characters affect other characters (social impact)
- How they affect the world/stakes/rules
- How they affect pacing (initiate/accelerate/resolve conflicts)
- Narrative gravity (scenes revolve around them)
- Causal responsibility for major events
Note: Mention and appearance counts are tracked but not the primary driver for ranking.
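The five influence dimensions lend themselves to a small typed record. The sketch below groups hypothetical `Evidence` items into per-character dossiers keyed by dimension, the kind of structure a synthesis or ranking prompt could consume. It deliberately computes no score: per the note above, the ranking itself remains the LLM's judgment. All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(str, Enum):
    CAUSAL = "causal"                        # responsibility for major events
    SOCIAL = "social"                        # effect on other characters
    WORLD = "world"                          # effect on the world, stakes, rules
    PACING = "pacing"                        # initiating/accelerating/resolving conflict
    NARRATIVE_GRAVITY = "narrative_gravity"  # scenes revolve around them


@dataclass(frozen=True)
class Evidence:
    character_id: str
    chapter_id: str
    dimension: Dimension
    description: str


def build_dossiers(items: list[Evidence]) -> dict[str, dict[str, list[str]]]:
    """Group evidence by character, then by dimension (every dimension present, possibly empty)."""
    dossiers: dict[str, dict[str, list[str]]] = {}
    for ev in items:
        dims = dossiers.setdefault(ev.character_id, {d.value: [] for d in Dimension})
        dims[ev.dimension.value].append(f"[{ev.chapter_id}] {ev.description}")
    return dossiers
```

Keeping empty dimensions in each dossier lets a prompt show the model where a character has no evidence, not just where it has some.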
```
.
├── nodes/                  # Graph node implementations
│   ├── init.py
│   ├── load_chapter.py
│   ├── scene_segmenter.py
│   ├── entity_roster_update.py
│   ├── mention_counter.py
│   ├── appearance_counter.py
│   ├── influence_extractor.py
│   ├── book_aggregator.py
│   ├── next_chapter.py
│   ├── book_synthesis.py
│   └── ranker.py
├── schemas/                # State and data models
│   └── state.py
├── utils/                  # Utility modules
│   ├── text.py
│   ├── aliases.py
│   └── json.py
├── io/                     # I/O modules
│   ├── load_chapters.py
│   └── write_results.py
├── observability/          # LangSmith integration
│   └── langsmith.py
├── graph.py                # Graph assembly
├── prompts.py              # LLM prompt templates
├── main.py                 # CLI entry point
├── requirements.txt
├── .env.example
└── README.md
```
When LANGSMITH_TRACING=true, all runs are traced with:
- Root run per full-book execution
- Per-node spans with tags and metadata
- LLM call tracking with prompt/model/temperature metadata
- Standard tags: `book:<book_id>`, `run:<run_id>`, `node:<node_name>`, `stage:<env>`
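A small helper can keep those tags consistent across nodes. This sketch builds them together with a metadata dict; the helper names and metadata keys are assumptions. LangChain runnables accept `run_name`, `tags`, and `metadata` in their config, and LangSmith records them on the traced run.

```python
from typing import Any


def standard_tags(book_id: str, run_id: str, node_name: str, env: str) -> list[str]:
    """Build the book/run/node/stage tags in the documented format."""
    return [f"book:{book_id}", f"run:{run_id}", f"node:{node_name}", f"stage:{env}"]


def node_config(book_id: str, run_id: str, node_name: str, env: str, **llm_meta: Any) -> dict:
    """Hypothetical per-node config; extra keyword args (model, temperature, ...) go to metadata."""
    return {
        "run_name": node_name,
        "tags": standard_tags(book_id, run_id, node_name, env),
        "metadata": {"book_id": book_id, "run_id": run_id, "node": node_name, "stage": env, **llm_meta},
    }
```

For example, `node_config("book_001", "run_001", "Ranker", "prod", temperature=0.2)` produces the tags `book:book_001`, `run:run_001`, `node:Ranker`, `stage:prod`, with the temperature recorded in the metadata.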
- Type hints everywhere
- TypedDict/dataclass/Pydantic models for contracts
- One file per node
- Pure functions where possible
- Comprehensive docstrings
Unit tests should cover:
- Text segmentation logic
- Alias matching
- Counting logic
- JSON parsing
Integration tests should verify:
- End-to-end graph execution
- Output schema correctness
- Metadata presence
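As an example of the JSON-parsing unit tests described above, the sketch below tests a hypothetical `parse_llm_json` helper that accepts either bare JSON or JSON wrapped in a Markdown code fence, a common shape for LLM replies. Whether `utils/json.py` exposes such a function is an assumption; the standard-library `unittest` pattern is the point.

```python
import json
import re
import unittest

TICKS = "`" * 3  # built at runtime to avoid a literal fence inside this example
_FENCED = re.compile(rf"^{TICKS}(?:json)?\s*(.*?)\s*{TICKS}$", re.DOTALL)


def parse_llm_json(raw: str):
    """Parse JSON from an LLM reply, stripping an optional Markdown code fence."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)  # raises json.JSONDecodeError on invalid input


class ParseLLMJsonTest(unittest.TestCase):
    def test_bare_object(self):
        self.assertEqual(parse_llm_json('{"rank": 1}'), {"rank": 1})

    def test_fenced_array(self):
        self.assertEqual(parse_llm_json(f"{TICKS}json\n[1, 2]\n{TICKS}"), [1, 2])

    def test_invalid_raises(self):
        with self.assertRaises(json.JSONDecodeError):
            parse_llm_json("not json")


if __name__ == "__main__":
    unittest.main()
```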
MIT