Lit Anal, short for Literary Analysis - A LangGraph Book Character Influence Analyzer

A LangGraph-based system that processes books chapter-by-chapter to analyze and rank characters by their influence throughout the story.

Overview

This system tracks:

Mentioned count: Total times a character's name or aliases appear in the text
Appeared scenes: Number of scenes in which a character appears (binary per scene)
Influence evidence: Structured evidence of causal, social, world, pacing, and narrative gravity impact

After processing all chapters, the system synthesizes book-wide dossiers and produces a subjective influence ranking based on narrative impact, not just frequency.

Inspiration

This initially started as a static analyzer meant to count character reference in Game of Thrones, with the goal of determining who has "the best story", and thus deserves the iron throne. With the advent of powerful LLMs and agentic workflows, I've modified the approach to take advantage of the non-deterministic, subjective nature of LLMs to further apply "importance" metrics to characters in any book.

Features

Chapter-by-chapter processing with LangGraph state machine
Automatic character detection and alias resolution
Scene segmentation and appearance tracking
Structured influence evidence extraction
Book-wide synthesis into character dossiers
Subjective influence ranking (not based on frequency)
LangSmith integration for tracing and debugging
Multiple input formats (JSON, directory of files, single file)

Installation

Clone the repository
Install dependencies:

pip install -r requirements.txt

Set up environment variables (copy .env.example to .env and fill in values):

cp .env.example .env

Required environment variables:

OPENAI_API_KEY: Your OpenAI API key (required for LLM nodes)
LANGSMITH_API_KEY: Your LangSmith API key (optional, for tracing)
LANGSMITH_TRACING: Set to true to enable tracing (default: false)
LANGSMITH_PROJECT: LangSmith project name (default: book-influence-dev)

Usage

Basic Usage

Process a book from a JSON file:

python main.py --input chapters.json --output results.json

Process chapters from a directory:

python main.py --input-dir chapters/ --output results.json

Process a single file with chapter separators:

python main.py --input-file book.txt --output results.json

Input Formats

JSON Format (--input chapters.json):

[
  {
    "chapter_id": "chapter_001",
    "text": "Chapter text here..."
  },
  {
    "chapter_id": "chapter_002",
    "text": "More chapter text..."
  }
]

Directory Format (--input-dir chapters/):

Directory containing text files (default pattern: *.txt)
Files are sorted by name and assigned sequential chapter IDs

Single File Format (--input-file book.txt):

Single text file with chapters separated by \n\n---\n\n
Chapters are assigned sequential IDs

Output Format

The output JSON contains an array of ranked characters:

[
  {
    "rank": 1,
    "character_id": "char_001",
    "name": "Character Name",
    "aliases": ["Alias1", "Alias2"],
    "appeared_scenes": 42,
    "mentioned_count": 310,
    "influence_summary": "Summary of their influence...",
    "ranking_rationale": "Why this character ranks here..."
  }
]

Advanced Options

Include metadata in output:

python main.py --input chapters.json --output results.json --include-metadata

Write full state for debugging:

python main.py --input chapters.json --output results.json --full-state debug_state.json

Specify run metadata for tracing:

python main.py --input chapters.json --output results.json --book-id "book_001" --run-id "run_001" --env prod

Logging Options:

By default, logs are output to the command line (stdout). You can also write logs to a file:

# Log to both console and file
python main.py --input chapters.json --output results.json --log-file run.log

# Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
python main.py --input chapters.json --output results.json --log-file run.log --log-level DEBUG

Logging Behavior:

Without --log-file: Logs only go to the command line (stdout)
With --log-file: Logs go to both the command line and the specified file
Log levels: Control verbosity (default: INFO)
- DEBUG: Detailed information including character IDs, counts, model configs
- INFO: Progress updates, major milestones, summaries
- WARNING: Ambiguous references, parsing issues
- ERROR: Failures, missing data

All nodes log their execution steps, making it easy to track progress and debug issues.

Architecture

Graph Flow

graph TD
    Start([Start]) --> Init[Init]
    Init --> LoadChapter[LoadChapter]
    LoadChapter --> SceneSegmenter[SceneSegmenter]
    SceneSegmenter --> EntityRosterUpdate[EntityRosterUpdate]
    EntityRosterUpdate --> MentionCounter[MentionCounter]
    MentionCounter --> AppearanceCounter[AppearanceCounter]
    AppearanceCounter --> InfluenceExtractor[InfluenceExtractor]
    InfluenceExtractor --> BookAggregator[BookAggregator]
    BookAggregator --> NextChapter[NextChapter]
    NextChapter -->|next_chapter| LoadChapter
    NextChapter -->|finalize| BookSynthesis[BookSynthesis]
    BookSynthesis --> Ranker[Ranker]
    Ranker --> End([End])
    
    style Start fill:#009919
    style End fill:#b52d93
    style NextChapter fill:#b5ae2d
    style Init fill:#003399
    style BookSynthesis fill:#812db5
    style Ranker fill:#812db5

Graph Nodes

Init: Validates input and initializes state
LoadChapter: Loads current chapter and resets scratch fields
SceneSegmenter: Splits chapter into scenes
EntityRosterUpdate: Detects characters and updates alias registry
MentionCounter: Counts alias occurrences
AppearanceCounter: Counts scenes per character
InfluenceExtractor: Extracts structured influence evidence
BookAggregator: Merges chapter results into book totals
NextChapter: Conditional routing (next chapter or finalize)
BookSynthesis: Synthesizes evidence into dossiers
Ranker: Assigns subjective influence ranks

State Structure

The state maintains three layers:

Book-level aggregates: Persist across chapters (mentions, appearances, influence)
Per-chapter scratch: Reset each chapter (current chapter data)
Indexing structures: Character canon and alias resolution

Influence Ranking

Influence ranking is subjective and based on:

How characters affect other characters (social impact)
How they affect the world/stakes/rules
How they affect pacing (initiate/accelerate/resolve conflicts)
Narrative gravity (scenes revolve around them)
Causal responsibility for major events

Note: Mention and appearance counts are tracked but not the primary driver for ranking.

Project Structure

.
├── nodes/              # Graph node implementations
│   ├── init.py
│   ├── load_chapter.py
│   ├── scene_segmenter.py
│   ├── entity_roster_update.py
│   ├── mention_counter.py
│   ├── appearance_counter.py
│   ├── influence_extractor.py
│   ├── book_aggregator.py
│   ├── next_chapter.py
│   ├── book_synthesis.py
│   └── ranker.py
├── schemas/            # State and data models
│   └── state.py
├── utils/              # Utility modules
│   ├── text.py
│   ├── aliases.py
│   └── json.py
├── io/                 # I/O modules
│   ├── load_chapters.py
│   └── write_results.py
├── observability/      # LangSmith integration
│   └── langsmith.py
├── graph.py            # Graph assembly
├── prompts.py          # LLM prompt templates
├── main.py             # CLI entry point
├── requirements.txt
├── .env.example
└── README.md

LangSmith Integration

When LANGSMITH_TRACING=true, all runs are traced with:

Root run per full-book execution
Per-node spans with tags and metadata
LLM call tracking with prompt/model/temperature metadata
Standard tags: book:<book_id>, run:<run_id>, node:<node_name>, stage:<env>

Development

Code Standards

Type hints everywhere
TypedDict/dataclass/Pydantic models for contracts
One file per node
Pure functions where possible
Comprehensive docstrings

Testing

Unit tests should cover:

Text segmentation logic
Alias matching
Counting logic
JSON parsing

Integration tests should verify:

End-to-end graph execution
Output schema correctness
Metadata presence

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lit Anal, short for Literary Analysis - A LangGraph Book Character Influence Analyzer

Overview

Inspiration

Features

Installation

Usage

Basic Usage

Input Formats

Output Format

Advanced Options

Architecture

Graph Flow

Graph Nodes

State Structure

Influence Ranking

Project Structure

LangSmith Integration

Development

Code Standards

Testing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
book_io		book_io
nodes		nodes
observability		observability
schemas		schemas
scripts		scripts
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
env.example		env.example
graph.py		graph.py
langgraph.json		langgraph.json
main.py		main.py
prompts.py		prompts.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Lit Anal, short for Literary Analysis - A LangGraph Book Character Influence Analyzer

Overview

Inspiration

Features

Installation

Usage

Basic Usage

Input Formats

Output Format

Advanced Options

Architecture

Graph Flow

Graph Nodes

State Structure

Influence Ranking

Project Structure

LangSmith Integration

Development

Code Standards

Testing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages