41 changes: 23 additions & 18 deletions README.md
@@ -6,8 +6,8 @@ A LangGraph-based system that processes books chapter-by-chapter to analyze and

This system tracks:
- **Mentioned count**: Total times a character's name or aliases appear in the text
- **Appeared scenes**: Number of scenes in which a character appears (binary per scene)
- **Influence evidence**: Structured evidence of causal, social, world, pacing, and narrative gravity impact
- **Chapter summaries**: Incremental summaries of each chapter for narrative context

After processing all chapters, the system synthesizes book-wide dossiers and produces a subjective influence ranking based on narrative impact, not just frequency.

@@ -17,9 +17,11 @@ This initially started as a static analyzer meant to count character reference i
## Features

- Chapter-by-chapter processing with LangGraph state machine
- Scene-based processing (chunks full scenes, never truncates)
- Automatic character detection and alias resolution
- Scene segmentation and appearance tracking
- Scene segmentation and intelligent chunking
- Structured influence evidence extraction
- Incremental chapter summarization for narrative context
- Book-wide synthesis into character dossiers
- Subjective influence ranking (not based on frequency)
- LangSmith integration for tracing and debugging
@@ -45,6 +47,7 @@ Required environment variables:
- `LANGSMITH_API_KEY`: Your LangSmith API key (optional, for tracing)
- `LANGSMITH_TRACING`: Set to `true` to enable tracing (default: `false`)
- `LANGSMITH_PROJECT`: LangSmith project name (default: `book-influence-dev`)
- `SCENE_CHUNK_MAX_CHARS`: Maximum characters per scene chunk (default: `5000`)
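
A minimal sketch of how a node might read `SCENE_CHUNK_MAX_CHARS`, assuming the value is parsed as an integer and falls back to the documented default of `5000` when it is unset or invalid. The helper name `get_scene_chunk_max_chars` and the fallback behavior are assumptions for illustration; they do not appear in this PR.

```python
import os
from typing import Mapping, Optional

DEFAULT_SCENE_CHUNK_MAX_CHARS = 5000  # default documented in the README


def get_scene_chunk_max_chars(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: read the chunk limit from the environment.

    Falls back to the default when the variable is missing, not an
    integer, or not positive (an assumption, not confirmed by the PR).
    """
    env = os.environ if env is None else env
    raw = env.get("SCENE_CHUNK_MAX_CHARS", "").strip()
    if not raw:
        return DEFAULT_SCENE_CHUNK_MAX_CHARS
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_SCENE_CHUNK_MAX_CHARS
    return value if value > 0 else DEFAULT_SCENE_CHUNK_MAX_CHARS
```

Passing a mapping instead of relying on `os.environ` keeps the helper easy to test.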

## Usage

@@ -103,7 +106,6 @@ The output JSON contains an array of ranked characters:
"character_id": "char_001",
"name": "Character Name",
"aliases": ["Alias1", "Alias2"],
"appeared_scenes": 42,
"mentioned_count": 310,
"influence_summary": "Summary of their influence...",
"ranking_rationale": "Why this character ranks here..."
@@ -160,12 +162,14 @@ graph TD
Start([Start]) --> Init[Init]
Init --> LoadChapter[LoadChapter]
LoadChapter --> SceneSegmenter[SceneSegmenter]
SceneSegmenter --> EntityRosterUpdate[EntityRosterUpdate]
SceneSegmenter --> SceneChunker[SceneChunker]
SceneChunker --> EntityRosterUpdate[EntityRosterUpdate]
EntityRosterUpdate --> MentionCounter[MentionCounter]
MentionCounter --> AppearanceCounter[AppearanceCounter]
AppearanceCounter --> InfluenceExtractor[InfluenceExtractor]
InfluenceExtractor --> BookAggregator[BookAggregator]
MentionCounter --> InfluenceExtractor[InfluenceExtractor]
InfluenceExtractor --> ChapterSummarizer[ChapterSummarizer]
ChapterSummarizer --> BookAggregator[BookAggregator]
BookAggregator --> NextChapter[NextChapter]
NextChapter -->|next_chunk| SceneChunker
NextChapter -->|next_chapter| LoadChapter
NextChapter -->|finalize| BookSynthesis[BookSynthesis]
BookSynthesis --> Ranker[Ranker]
@@ -184,20 +188,21 @@ graph TD
1. **Init**: Validates input and initializes state
2. **LoadChapter**: Loads current chapter and resets scratch fields
3. **SceneSegmenter**: Splits chapter into scenes
4. **EntityRosterUpdate**: Detects characters and updates alias registry
5. **MentionCounter**: Counts alias occurrences
6. **AppearanceCounter**: Counts scenes per character
4. **SceneChunker**: Selects a batch of full scenes that fits within the character limit
5. **EntityRosterUpdate**: Detects characters and updates alias registry
6. **MentionCounter**: Counts alias occurrences across the current scene chunk
7. **InfluenceExtractor**: Extracts structured influence evidence
8. **BookAggregator**: Merges chapter results into book totals
9. **NextChapter**: Conditional routing (next chapter or finalize)
10. **BookSynthesis**: Synthesizes evidence into dossiers
11. **Ranker**: Assigns subjective influence ranks
8. **ChapterSummarizer**: Incrementally summarizes the chapter as its scenes are processed
9. **BookAggregator**: Merges chapter results into book totals
10. **NextChapter**: Conditional routing (next chunk, next chapter, or finalize)
11. **BookSynthesis**: Synthesizes evidence into dossiers using chapter summaries
12. **Ranker**: Assigns subjective influence ranks
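
The SceneChunker step above can be sketched as a greedy selection of whole scenes. This is an illustration under stated assumptions, not the PR's implementation: the function name `select_scene_chunk`, its signature, and the choice to emit an oversized scene alone (rather than split or skip it) are all hypothetical.

```python
from typing import List, Tuple


def select_scene_chunk(
    scenes: List[str], start: int, max_chars: int = 5000
) -> Tuple[List[str], int]:
    """Hypothetical sketch: pick whole scenes from `start` up to `max_chars`.

    Returns the selected scenes and the index of the next unprocessed
    scene. A scene is never truncated: if the first scene alone exceeds
    the limit, it is returned by itself (assumed behavior).
    """
    chunk: List[str] = []
    total = 0
    index = start
    while index < len(scenes):
        size = len(scenes[index])
        # Stop before a scene that would push a non-empty chunk over the limit.
        if chunk and total + size > max_chars:
            break
        chunk.append(scenes[index])
        total += size
        index += 1
    return chunk, index
```

Calling the function repeatedly with the returned index walks through a chapter chunk by chunk until the index reaches `len(scenes)`.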

### State Structure

The state maintains three layers:
1. **Book-level aggregates**: Persist across chapters (mentions, appearances, influence)
2. **Per-chapter scratch**: Reset each chapter (current chapter data)
1. **Book-level aggregates**: Persist across chapters (mentions, influence, chapter summaries)
2. **Per-chapter scratch**: Reset each chapter (current chapter data, scene chunks)
3. **Indexing structures**: Character canon and alias resolution
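
The three layers can be sketched as a `TypedDict`. The key names below are taken from the code in this PR, but the value types, the class name `BookStateSketch`, and the helper `reset_chapter_scratch` are assumptions for illustration and may differ from the project's real `BookState`.

```python
from typing import Any, Dict, TypedDict


class BookStateSketch(TypedDict, total=False):
    # 1. Book-level aggregates: persist across chapters
    book_mentions: Dict[str, int]
    book_influence: Dict[str, Dict[str, Any]]
    chapter_summaries: Dict[str, str]  # value type is an assumption
    # 2. Per-chapter scratch: reset each chapter
    current_chapter_id: str  # id type is an assumption
    chapter_mentions_by_char: Dict[str, int]
    chapter_influence_evidence: Dict[str, Any]
    # 3. Indexing structures: character canon and alias resolution
    characters_by_id: Dict[str, Dict[str, Any]]
    alias_index: Dict[str, str]
    unresolved_aliases: Dict[str, Any]


def reset_chapter_scratch(state: BookStateSketch, chapter_id: str) -> BookStateSketch:
    """Hypothetical helper: start a new chapter with empty scratch fields.

    Book-level aggregates and indexing structures are carried over as-is.
    """
    return {
        **state,
        "current_chapter_id": chapter_id,
        "chapter_mentions_by_char": {},
        "chapter_influence_evidence": {},
    }
```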

### Influence Ranking
@@ -209,7 +214,7 @@ Influence ranking is **subjective** and based on:
- Narrative gravity (scenes revolve around them)
- Causal responsibility for major events

**Note**: Mention and appearance counts are tracked but not the primary driver for ranking.
**Note**: Mention counts are tracked but are not the primary driver of the ranking. The system processes each chapter in chunks of whole scenes, so the full text is covered and no scene is truncated.

## Project Structure

@@ -221,8 +226,8 @@ Influence ranking is **subjective** and based on:
│ ├── scene_segmenter.py
│ ├── entity_roster_update.py
│ ├── mention_counter.py
│ ├── appearance_counter.py
│ ├── influence_extractor.py
│ ├── chapter_summarizer.py
│ ├── book_aggregator.py
│ ├── next_chapter.py
│ ├── book_synthesis.py
7 changes: 7 additions & 0 deletions env.example
@@ -21,6 +21,13 @@ BOOK_SYNTHESIS_MODEL=
BOOK_SYNTHESIS_TEMPERATURE=0.2
RANKER_MODEL=
RANKER_TEMPERATURE=0.1
CHAPTER_SUMMARIZER_MODEL=
CHAPTER_SUMMARIZER_TEMPERATURE=0.3

# Scene Chunking Configuration
# Maximum characters per scene chunk (default: 5000)
# Only full scenes are included; scenes are never truncated
SCENE_CHUNK_MAX_CHARS=5000

# Environment
ENV=dev
27 changes: 16 additions & 11 deletions graph.py
@@ -1,9 +1,10 @@
"""LangGraph assembly: wires nodes together with edges and conditional routing.

Graph flow:
Init -> LoadChapter -> SceneSegmenter -> EntityRosterUpdate -> MentionCounter ->
AppearanceCounter -> InfluenceExtractor -> BookAggregator -> NextChapter ->
(if next_chapter: loop to LoadChapter, else: BookSynthesis -> Ranker -> Done)
Init -> LoadChapter -> SceneSegmenter -> SceneChunker -> EntityRosterUpdate ->
MentionCounter -> InfluenceExtractor -> ChapterSummarizer -> BookAggregator ->
NextChapter -> (if next_chunk: loop to SceneChunker, if next_chapter: loop to
LoadChapter, else: BookSynthesis -> Ranker -> Done)
"""

from langgraph.graph import StateGraph, END
@@ -13,10 +14,11 @@
from nodes.scene_segmenter import scene_segmenter_node
from nodes.entity_roster_update import entity_roster_update_node
from nodes.mention_counter import mention_counter_node
from nodes.appearance_counter import appearance_counter_node
from nodes.influence_extractor import influence_extractor_node
from nodes.book_aggregator import book_aggregator_node
from nodes.next_chapter import next_chapter_node, should_continue
from nodes.scene_segmenter import scene_chunker_node
from nodes.chapter_summarizer import chapter_summarizer_node
from nodes.book_synthesis import book_synthesis_node
from nodes.ranker import ranker_node

@@ -34,10 +36,11 @@ def create_graph() -> StateGraph:
workflow.add_node("init", init_node)
workflow.add_node("load_chapter", load_chapter_node)
workflow.add_node("scene_segmenter", scene_segmenter_node)
workflow.add_node("scene_chunker", scene_chunker_node)
workflow.add_node("entity_roster_update", entity_roster_update_node)
workflow.add_node("mention_counter", mention_counter_node)
workflow.add_node("appearance_counter", appearance_counter_node)
workflow.add_node("influence_extractor", influence_extractor_node)
workflow.add_node("chapter_summarizer", chapter_summarizer_node)
workflow.add_node("book_aggregator", book_aggregator_node)
workflow.add_node("next_chapter", next_chapter_node)
workflow.add_node("book_synthesis", book_synthesis_node)
@@ -49,20 +52,22 @@ def create_graph() -> StateGraph:
# Add edges
workflow.add_edge("init", "load_chapter")
workflow.add_edge("load_chapter", "scene_segmenter")
workflow.add_edge("scene_segmenter", "entity_roster_update")
workflow.add_edge("scene_segmenter", "scene_chunker")
workflow.add_edge("scene_chunker", "entity_roster_update")
workflow.add_edge("entity_roster_update", "mention_counter")
workflow.add_edge("mention_counter", "appearance_counter")
workflow.add_edge("appearance_counter", "influence_extractor")
workflow.add_edge("influence_extractor", "book_aggregator")
workflow.add_edge("mention_counter", "influence_extractor")
workflow.add_edge("influence_extractor", "chapter_summarizer")
workflow.add_edge("chapter_summarizer", "book_aggregator")
workflow.add_edge("book_aggregator", "next_chapter")

# Conditional routing from next_chapter
workflow.add_conditional_edges(
"next_chapter",
should_continue,
{
"next_chapter": "load_chapter",
"finalize": "book_synthesis",
"next_chunk": "scene_chunker", # More scenes in current chapter
"next_chapter": "load_chapter", # Move to next chapter
"finalize": "book_synthesis", # All chapters done
}
)
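
The three-way routing above depends on what `should_continue` returns. The PR does not show that function, so the following is only a sketch of one way it could decide. The state keys `chapter_scenes`, `next_scene_index`, `chapters`, and `current_chapter_index` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict


def should_continue_sketch(state: Dict[str, Any]) -> str:
    """Hypothetical router returning one of the three edge labels.

    All state keys read here are assumptions; the real node may track
    progress differently.
    """
    scenes = state.get("chapter_scenes", [])
    # Unprocessed scenes remain in this chapter: go back to the chunker.
    if state.get("next_scene_index", 0) < len(scenes):
        return "next_chunk"
    chapters = state.get("chapters", [])
    # The chapter is done but more chapters remain: load the next one.
    if state.get("current_chapter_index", 0) + 1 < len(chapters):
        return "next_chapter"
    # Nothing left to process: move on to book synthesis.
    return "finalize"
```

The returned strings match the keys of the mapping passed to `add_conditional_edges`, which is how LangGraph picks the next node.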

4 changes: 2 additions & 2 deletions main.py
@@ -126,8 +126,8 @@ def main():
'alias_index': {},
'unresolved_aliases': {},
'book_mentions': {},
'book_appearances': {},
'book_influence': {},
'chapter_summaries': {},
}

# Get graph app
@@ -180,7 +180,7 @@ def main():
logger.info("Top 5 characters by influence:")
for char in ranked[:5]:
logger.info(f" {char['rank']}. {char['name']} (ID: {char['character_id']})")
logger.info(f" Appeared in {char['appeared_scenes']} scenes, mentioned {char['mentioned_count']} times")
logger.info(f" Mentioned {char['mentioned_count']} times")


if __name__ == "__main__":
61 changes: 0 additions & 61 deletions nodes/appearance_counter.py

This file was deleted.

27 changes: 10 additions & 17 deletions nodes/book_aggregator.py
@@ -9,14 +9,16 @@
def book_aggregator_node(state: BookState) -> BookState:
"""Merge per-chapter results into book-level aggregates.

Aggregates mention counts and influence evidence from the current chapter
into book-level totals. Note: this node runs once per scene chunk, so it may
be called multiple times per chapter.

Reads:
- `chapter_mentions_by_char`: per-chapter mention counts
- `chapter_appearances_by_char`: per-chapter appearance counts
- `chapter_influence_evidence`: per-chapter influence evidence
- `chapter_mentions_by_char`: per-chapter mention counts (accumulated across chunks)
- `chapter_influence_evidence`: per-chapter influence evidence (from current chunk)

Writes:
- Increments `book_mentions`
- Increments `book_appearances`
- Increments `book_mentions` with chapter totals
- Appends to `book_influence[char_id].evidence`
- Updates `book_influence[char_id].feature_totals`

@@ -28,28 +30,20 @@ def book_aggregator_node(state: BookState) -> BookState:
"""
chapter_id = state['current_chapter_id']
chapter_mentions = state.get('chapter_mentions_by_char', {})
chapter_appearances = state.get('chapter_appearances_by_char', {})
chapter_evidence = state.get('chapter_influence_evidence', {})

logger.info(f"Aggregating chapter {chapter_id} results into book totals")

book_mentions = state.get('book_mentions', {}).copy()
book_appearances = state.get('book_appearances', {}).copy()
book_influence = state.get('book_influence', {}).copy()

# Aggregate mentions
# Aggregate mentions (chapter totals, accumulated across all chunks)
mentions_added = 0
for char_id, count in chapter_mentions.items():
book_mentions[char_id] = book_mentions.get(char_id, 0) + count
mentions_added += count

# Aggregate appearances
appearances_added = 0
for char_id, count in chapter_appearances.items():
book_appearances[char_id] = book_appearances.get(char_id, 0) + count
appearances_added += count

# Aggregate influence evidence
# Aggregate influence evidence (from current chunk)
evidence_added = 0
for char_id, evidence in chapter_evidence.items():
if char_id not in book_influence:
@@ -78,13 +72,12 @@ def book_aggregator_node(state: BookState) -> BookState:
totals['centered_scenes'] = totals.get('centered_scenes', 0) + len(signals.get('narrative_gravity', []))
book_influence[char_id]['feature_totals'] = totals

logger.info(f"Aggregated: {mentions_added} mentions, {appearances_added} appearances, {evidence_added} evidence entries")
logger.info(f"Aggregated: {mentions_added} mentions, {evidence_added} evidence entries")
logger.debug(f"Book totals: {len(book_mentions)} characters with mentions, {len(book_influence)} with influence evidence")

updated_state: BookState = {
**state,
'book_mentions': book_mentions,
'book_appearances': book_appearances,
'book_influence': book_influence,
}
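
The mention aggregation in this node is an additive merge of two count dictionaries. A minimal standalone sketch follows, assuming each call receives only counts that have not yet been added to the book totals. If the per-chapter counts were instead cumulative across chunks, repeated calls would add earlier chunks again. The function name `merge_counts` is hypothetical.

```python
from typing import Dict


def merge_counts(book_totals: Dict[str, int], new_counts: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: add `new_counts` into a copy of `book_totals`.

    Assumes `new_counts` holds only counts not yet merged. The input
    dictionaries are left unmodified.
    """
    merged = dict(book_totals)
    for char_id, count in new_counts.items():
        merged[char_id] = merged.get(char_id, 0) + count
    return merged
```

Returning a new dictionary mirrors the node's pattern of copying `book_mentions` before updating it.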

9 changes: 5 additions & 4 deletions nodes/book_synthesis.py
@@ -18,8 +18,8 @@ def book_synthesis_node(state: BookState) -> BookState:
Reads:
- `book_influence`: accumulated influence evidence
- `book_mentions`: total mentions per character
- `book_appearances`: total appearances per character
- `characters_by_id`: all character profiles
- `chapter_summaries`: chapter summaries for narrative context

Writes:
- `book_plot_summary`: overall plot summary
@@ -34,18 +34,19 @@ def book_synthesis_node(state: BookState) -> BookState:
"""
book_influence = state.get('book_influence', {})
book_mentions = state.get('book_mentions', {})
book_appearances = state.get('book_appearances', {})
characters_by_id = state.get('characters_by_id', {})
chapter_summaries = state.get('chapter_summaries', {})

logger.info("Synthesizing book-wide evidence into character dossiers")
logger.debug(f"Processing {len(characters_by_id)} characters with {len(book_influence)} influence records")
logger.debug(f"Using {len(chapter_summaries)} chapter summaries for context")

# Build prompt
prompt = get_book_synthesis_prompt(
book_influence,
book_mentions,
book_appearances,
characters_by_id
characters_by_id,
chapter_summaries
)

# Get model configuration from environment