gtspencer · gtspencer · Jan 31, 2026
diff --git a/README.md b/README.md
@@ -6,8 +6,8 @@ A LangGraph-based system that processes books chapter-by-chapter to analyze and
 
 This system tracks:
 - **Mentioned count**: Total times a character's name or aliases appear in the text
-- **Appeared scenes**: Number of scenes in which a character appears (binary per scene)
 - **Influence evidence**: Structured evidence of causal, social, world, pacing, and narrative gravity impact
+- **Chapter summaries**: Incremental summaries of each chapter for narrative context
 
 After processing all chapters, the system synthesizes book-wide dossiers and produces a subjective influence ranking based on narrative impact, not just frequency.
 
@@ -17,9 +17,11 @@ This initially started as a static analyzer meant to count character reference i
 ## Features
 
 - Chapter-by-chapter processing with LangGraph state machine
+- Scene-based processing: chapters are handled in chunks of whole scenes, so no scene is ever truncated
 - Automatic character detection and alias resolution
-- Scene segmentation and appearance tracking
+- Scene segmentation and size-limited scene chunking
 - Structured influence evidence extraction
+- Incremental chapter summarization for narrative context
 - Book-wide synthesis into character dossiers
 - Subjective influence ranking (not based on frequency)
 - LangSmith integration for tracing and debugging
@@ -45,6 +47,7 @@ Required environment variables:
 - `LANGSMITH_API_KEY`: Your LangSmith API key (optional, for tracing)
 - `LANGSMITH_TRACING`: Set to `true` to enable tracing (default: `false`)
 - `LANGSMITH_PROJECT`: LangSmith project name (default: `book-influence-dev`)
+- `SCENE_CHUNK_MAX_CHARS`: Maximum characters per scene chunk (default: `5000`)
 
 ## Usage
 
@@ -103,7 +106,6 @@ The output JSON contains an array of ranked characters:
     "character_id": "char_001",
     "name": "Character Name",
     "aliases": ["Alias1", "Alias2"],
-    "appeared_scenes": 42,
     "mentioned_count": 310,
     "influence_summary": "Summary of their influence...",
     "ranking_rationale": "Why this character ranks here..."
@@ -160,12 +162,14 @@ graph TD
     Start([Start]) --> Init[Init]
     Init --> LoadChapter[LoadChapter]
     LoadChapter --> SceneSegmenter[SceneSegmenter]
-    SceneSegmenter --> EntityRosterUpdate[EntityRosterUpdate]
+    SceneSegmenter --> SceneChunker[SceneChunker]
+    SceneChunker --> EntityRosterUpdate[EntityRosterUpdate]
     EntityRosterUpdate --> MentionCounter[MentionCounter]
-    MentionCounter --> AppearanceCounter[AppearanceCounter]
-    AppearanceCounter --> InfluenceExtractor[InfluenceExtractor]
-    InfluenceExtractor --> BookAggregator[BookAggregator]
+    MentionCounter --> InfluenceExtractor[InfluenceExtractor]
+    InfluenceExtractor --> ChapterSummarizer[ChapterSummarizer]
+    ChapterSummarizer --> BookAggregator[BookAggregator]
     BookAggregator --> NextChapter[NextChapter]
+    NextChapter -->|next_chunk| SceneChunker
     NextChapter -->|next_chapter| LoadChapter
     NextChapter -->|finalize| BookSynthesis[BookSynthesis]
     BookSynthesis --> Ranker[Ranker]
@@ -184,20 +188,21 @@ graph TD
 1. **Init**: Validates input and initializes state
 2. **LoadChapter**: Loads current chapter and resets scratch fields
 3. **SceneSegmenter**: Splits chapter into scenes
-4. **EntityRosterUpdate**: Detects characters and updates alias registry
-5. **MentionCounter**: Counts alias occurrences
-6. **AppearanceCounter**: Counts scenes per character
+4. **SceneChunker**: Selects the next batch of whole scenes that fits within the character limit (`SCENE_CHUNK_MAX_CHARS`)
+5. **EntityRosterUpdate**: Detects characters and updates alias registry
+6. **MentionCounter**: Counts alias occurrences in the current scene chunk
 7. **InfluenceExtractor**: Extracts structured influence evidence
-8. **BookAggregator**: Merges chapter results into book totals
-9. **NextChapter**: Conditional routing (next chapter or finalize)
-10. **BookSynthesis**: Synthesizes evidence into dossiers
-11. **Ranker**: Assigns subjective influence ranks
+8. **ChapterSummarizer**: Incrementally updates the chapter summary as each scene chunk is processed
+9. **BookAggregator**: Merges chapter results into book totals
+10. **NextChapter**: Conditional routing (next chunk, next chapter, or finalize)
+11. **BookSynthesis**: Synthesizes evidence into dossiers using chapter summaries
+12. **Ranker**: Assigns subjective influence ranks
 
 ### State Structure
 
 The state maintains three layers:
-1. **Book-level aggregates**: Persist across chapters (mentions, appearances, influence)
-2. **Per-chapter scratch**: Reset each chapter (current chapter data)
+1. **Book-level aggregates**: Persist across chapters (mentions, influence, chapter summaries)
+2. **Per-chapter scratch**: Reset each chapter (current chapter data, scene chunks)
 3. **Indexing structures**: Character canon and alias resolution
 
 ### Influence Ranking
@@ -209,7 +214,7 @@ Influence ranking is **subjective** and based on:
 - Narrative gravity (scenes revolve around them)
 - Causal responsibility for major events
 
-**Note**: Mention and appearance counts are tracked but not the primary driver for ranking.
+**Note**: Mention counts are tracked but are not the primary driver for ranking. The system processes each chapter in scene chunks so that every scene is analyzed in full, without truncation.
 
 ## Project Structure
 
@@ -221,8 +226,8 @@ Influence ranking is **subjective** and based on:
 │   ├── scene_segmenter.py
 │   ├── entity_roster_update.py
 │   ├── mention_counter.py
-│   ├── appearance_counter.py
 │   ├── influence_extractor.py
+│   ├── chapter_summarizer.py
 │   ├── book_aggregator.py
 │   ├── next_chapter.py
 │   ├── book_synthesis.py

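The SceneChunker step and the `SCENE_CHUNK_MAX_CHARS` setting describe a greedy batching of whole scenes. Below is a minimal sketch of that idea, not the project's implementation: it assumes scenes arrive as a list of strings and that progress is tracked with a cursor index into that list (`chunk_scenes` and the cursor are hypothetical names). It also assumes that a single scene longer than the limit is emitted as its own chunk, since the README states scenes are never truncated.

```python
from typing import List, Tuple


def chunk_scenes(scenes: List[str], start: int, max_chars: int = 5000) -> Tuple[List[str], int]:
    """Return the next batch of whole scenes starting at `start`, plus the new cursor.

    Hypothetical helper: scenes are never split, so a scene longer than
    `max_chars` is returned alone rather than truncated.
    """
    if start >= len(scenes):
        return [], start  # nothing left in this chapter

    batch: List[str] = []
    total = 0
    i = start
    while i < len(scenes):
        scene_len = len(scenes[i])
        # Stop before a scene that would overflow the limit, unless the batch is
        # still empty (an oversized scene then becomes a chunk of its own).
        if batch and total + scene_len > max_chars:
            break
        batch.append(scenes[i])
        total += scene_len
        i += 1
    return batch, i
```

Returning the cursor alongside the batch is one way to let the routing step tell "more scenes in this chapter" apart from "chapter finished": the chapter is done once the cursor reaches the end of the scene list.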
diff --git a/env.example b/env.example
@@ -21,6 +21,13 @@ BOOK_SYNTHESIS_MODEL=
 BOOK_SYNTHESIS_TEMPERATURE=0.2
 RANKER_MODEL=
 RANKER_TEMPERATURE=0.1
+CHAPTER_SUMMARIZER_MODEL=
+CHAPTER_SUMMARIZER_TEMPERATURE=0.3
+
+# Scene Chunking Configuration
+# Maximum characters per scene chunk (default: 5000)
+# Only whole scenes are included; scenes are never truncated
+SCENE_CHUNK_MAX_CHARS=5000
 
 # Environment
 ENV=dev

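env.example documents `SCENE_CHUNK_MAX_CHARS` with a default of 5000 and `CHAPTER_SUMMARIZER_TEMPERATURE` with 0.3. A minimal sketch of reading such values with the standard library follows; the helper names are hypothetical, and it assumes that an unset or blank value falls back to the documented default and that a non-positive chunk limit should be rejected. The project may load its configuration differently.

```python
import os


def load_scene_chunk_max_chars(default: int = 5000) -> int:
    """Read SCENE_CHUNK_MAX_CHARS from the environment (hypothetical helper)."""
    raw = os.environ.get("SCENE_CHUNK_MAX_CHARS", "").strip()
    if not raw:
        # Unset or blank (e.g. `SCENE_CHUNK_MAX_CHARS=`) uses the documented default.
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"SCENE_CHUNK_MAX_CHARS must be positive, got {value}")
    return value


def load_temperature(var: str, default: float) -> float:
    """Read a *_TEMPERATURE variable such as CHAPTER_SUMMARIZER_TEMPERATURE."""
    raw = os.environ.get(var, "").strip()
    return float(raw) if raw else default
```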
diff --git a/graph.py b/graph.py
@@ -1,9 +1,10 @@
 """LangGraph assembly: wires nodes together with edges and conditional routing.
 
 Graph flow:
-Init -> LoadChapter -> SceneSegmenter -> EntityRosterUpdate -> MentionCounter ->
-AppearanceCounter -> InfluenceExtractor -> BookAggregator -> NextChapter ->
-(if next_chapter: loop to LoadChapter, else: BookSynthesis -> Ranker -> Done)
+Init -> LoadChapter -> SceneSegmenter -> SceneChunker -> EntityRosterUpdate ->
+MentionCounter -> InfluenceExtractor -> ChapterSummarizer -> BookAggregator ->
+NextChapter -> (if next_chunk: loop to SceneChunker, if next_chapter: loop to
+LoadChapter, else: BookSynthesis -> Ranker -> Done)
 """
 
 from langgraph.graph import StateGraph, END
@@ -13,10 +14,11 @@
 from nodes.scene_segmenter import scene_segmenter_node
 from nodes.entity_roster_update import entity_roster_update_node
 from nodes.mention_counter import mention_counter_node
-from nodes.appearance_counter import appearance_counter_node
 from nodes.influence_extractor import influence_extractor_node
 from nodes.book_aggregator import book_aggregator_node
 from nodes.next_chapter import next_chapter_node, should_continue
+from nodes.scene_segmenter import scene_chunker_node
+from nodes.chapter_summarizer import chapter_summarizer_node
 from nodes.book_synthesis import book_synthesis_node
 from nodes.ranker import ranker_node
 
@@ -34,10 +36,11 @@ def create_graph() -> StateGraph:
     workflow.add_node("init", init_node)
     workflow.add_node("load_chapter", load_chapter_node)
     workflow.add_node("scene_segmenter", scene_segmenter_node)
+    workflow.add_node("scene_chunker", scene_chunker_node)
     workflow.add_node("entity_roster_update", entity_roster_update_node)
     workflow.add_node("mention_counter", mention_counter_node)
-    workflow.add_node("appearance_counter", appearance_counter_node)
     workflow.add_node("influence_extractor", influence_extractor_node)
+    workflow.add_node("chapter_summarizer", chapter_summarizer_node)
     workflow.add_node("book_aggregator", book_aggregator_node)
     workflow.add_node("next_chapter", next_chapter_node)
     workflow.add_node("book_synthesis", book_synthesis_node)
@@ -49,20 +52,22 @@ def create_graph() -> StateGraph:
     # Add edges
     workflow.add_edge("init", "load_chapter")
     workflow.add_edge("load_chapter", "scene_segmenter")
-    workflow.add_edge("scene_segmenter", "entity_roster_update")
+    workflow.add_edge("scene_segmenter", "scene_chunker")
+    workflow.add_edge("scene_chunker", "entity_roster_update")
     workflow.add_edge("entity_roster_update", "mention_counter")
-    workflow.add_edge("mention_counter", "appearance_counter")
-    workflow.add_edge("appearance_counter", "influence_extractor")
-    workflow.add_edge("influence_extractor", "book_aggregator")
+    workflow.add_edge("mention_counter", "influence_extractor")
+    workflow.add_edge("influence_extractor", "chapter_summarizer")
+    workflow.add_edge("chapter_summarizer", "book_aggregator")
     workflow.add_edge("book_aggregator", "next_chapter")
 
     # Conditional routing from next_chapter
     workflow.add_conditional_edges(
         "next_chapter",
         should_continue,
         {
-            "next_chapter": "load_chapter",
-            "finalize": "book_synthesis",
+            "next_chunk": "scene_chunker",  # More scenes in current chapter
+            "next_chapter": "load_chapter",  # Move to next chapter
+            "finalize": "book_synthesis",  # All chapters done
         }
     )
 

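The conditional edge out of `next_chapter` maps three return values (`next_chunk`, `next_chapter`, `finalize`) to nodes. The routing function itself is not shown in this diff, so here is a minimal sketch of logic that produces those values. It assumes hypothetical state fields (`chapters`, `current_chapter_index`, `scenes`, `scene_cursor`), and the project's `should_continue` may track progress differently.

```python
from typing import List, Literal, TypedDict


class RoutingState(TypedDict):
    # Hypothetical fields; the real BookState may store progress another way.
    chapters: List[str]         # chapter ids in reading order
    current_chapter_index: int  # index of the chapter just processed
    scenes: List[str]           # scenes of the current chapter
    scene_cursor: int           # index of the first scene not yet chunked


Route = Literal["next_chunk", "next_chapter", "finalize"]


def route_next(state: RoutingState) -> Route:
    """Pick the outgoing edge of NextChapter (hypothetical sketch)."""
    if state["scene_cursor"] < len(state["scenes"]):
        return "next_chunk"    # unprocessed scenes remain in this chapter
    if state["current_chapter_index"] + 1 < len(state["chapters"]):
        return "next_chapter"  # chapter finished, more chapters to load
    return "finalize"          # last chunk of the last chapter is done
```

Checking the chunk cursor before the chapter index matters: otherwise the graph could jump to LoadChapter while scenes of the current chapter are still unprocessed.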
diff --git a/main.py b/main.py
@@ -126,8 +126,8 @@ def main():
         'alias_index': {},
         'unresolved_aliases': {},
         'book_mentions': {},
-        'book_appearances': {},
         'book_influence': {},
+        'chapter_summaries': {},
     }
 
     # Get graph app
@@ -180,7 +180,7 @@ def main():
     logger.info("Top 5 characters by influence:")
     for char in ranked[:5]:
         logger.info(f"  {char['rank']}. {char['name']} (ID: {char['character_id']})")
-        logger.info(f"     Appeared in {char['appeared_scenes']} scenes, mentioned {char['mentioned_count']} times")
+        logger.info(f"     Mentioned {char['mentioned_count']} times")
 
 
 if __name__ == "__main__":

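main.py now seeds `chapter_summaries` as an empty dict, and the README describes ChapterSummarizer as updating a chapter's summary incrementally as scene chunks are processed. A minimal sketch of that fold, assuming summaries are keyed by chapter id and that the model call is injected as a `summarize(previous, chunk_text)` callable. That signature is hypothetical; the real node presumably calls the model configured by `CHAPTER_SUMMARIZER_MODEL`.

```python
from typing import Callable, Dict

# Hypothetical summarizer signature: (previous_summary, new_chunk_text) -> updated summary.
Summarize = Callable[[str, str], str]


def update_chapter_summary(
    chapter_summaries: Dict[str, str],
    chapter_id: str,
    chunk_text: str,
    summarize: Summarize,
) -> Dict[str, str]:
    """Fold one scene chunk into the running summary for `chapter_id`.

    Returns a new dict instead of mutating the input, in the same
    copy-then-update style that book_aggregator uses for its totals.
    """
    previous = chapter_summaries.get(chapter_id, "")  # "" for the first chunk
    updated = dict(chapter_summaries)
    updated[chapter_id] = summarize(previous, chunk_text)
    return updated
```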
diff --git a/nodes/appearance_counter.py b/nodes/appearance_counter.py
diff --git a/nodes/book_aggregator.py b/nodes/book_aggregator.py
@@ -9,14 +9,16 @@
 def book_aggregator_node(state: BookState) -> BookState:
     """Merge per-chapter results into book-level aggregates.
 
+    Aggregates mention counts and influence evidence from the current chapter
+    into book-level totals. Note: this node runs after every scene chunk, so it
+    may be called several times per chapter.
+
     Reads:
-    - `chapter_mentions_by_char`: per-chapter mention counts
-    - `chapter_appearances_by_char`: per-chapter appearance counts
-    - `chapter_influence_evidence`: per-chapter influence evidence
+    - `chapter_mentions_by_char`: per-chapter mention counts (accumulated across chunks)
+    - `chapter_influence_evidence`: per-chapter influence evidence (from current chunk)
 
     Writes:
-    - Increments `book_mentions`
-    - Increments `book_appearances`
+    - Increments `book_mentions` with chapter totals
     - Appends to `book_influence[char_id].evidence`
     - Updates `book_influence[char_id].feature_totals`
 
@@ -28,28 +30,20 @@ def book_aggregator_node(state: BookState) -> BookState:
     """
     chapter_id = state['current_chapter_id']
     chapter_mentions = state.get('chapter_mentions_by_char', {})
-    chapter_appearances = state.get('chapter_appearances_by_char', {})
     chapter_evidence = state.get('chapter_influence_evidence', {})
 
     logger.info(f"Aggregating chapter {chapter_id} results into book totals")
 
     book_mentions = state.get('book_mentions', {}).copy()
-    book_appearances = state.get('book_appearances', {}).copy()
     book_influence = state.get('book_influence', {}).copy()
 
-    # Aggregate mentions
+    # Aggregate mentions (chapter totals, accumulated across all chunks)
     mentions_added = 0
     for char_id, count in chapter_mentions.items():
         book_mentions[char_id] = book_mentions.get(char_id, 0) + count
         mentions_added += count
 
-    # Aggregate appearances
-    appearances_added = 0
-    for char_id, count in chapter_appearances.items():
-        book_appearances[char_id] = book_appearances.get(char_id, 0) + count
-        appearances_added += count
-
-    # Aggregate influence evidence
+    # Aggregate influence evidence (from current chunk)
     evidence_added = 0
     for char_id, evidence in chapter_evidence.items():
         if char_id not in book_influence:
@@ -78,13 +72,12 @@ def book_aggregator_node(state: BookState) -> BookState:
         totals['centered_scenes'] = totals.get('centered_scenes', 0) + len(signals.get('narrative_gravity', []))
         book_influence[char_id]['feature_totals'] = totals
 
-    logger.info(f"Aggregated: {mentions_added} mentions, {appearances_added} appearances, {evidence_added} evidence entries")
+    logger.info(f"Aggregated: {mentions_added} mentions, {evidence_added} evidence entries")
     logger.debug(f"Book totals: {len(book_mentions)} characters with mentions, {len(book_influence)} with influence evidence")
 
     updated_state: BookState = {
         **state,
         'book_mentions': book_mentions,
-        'book_appearances': book_appearances,
         'book_influence': book_influence,
     }
 

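The docstring above says this node may run once per scene chunk while `chapter_mentions_by_char` is accumulated across chunks. If both hold, adding the chapter totals on every call counts earlier chunks again: totals of 10 after chunk one and 15 after chunk two would add 25 to `book_mentions` instead of 15. One way to avoid that, sketched here under the assumption of a hypothetical per-chapter `already_merged` record that LoadChapter resets, is to add only the growth since the previous chunk. Merging mentions once, after a chapter's last chunk, would also work.

```python
from typing import Dict, Tuple


def merge_chapter_mentions(
    book_mentions: Dict[str, int],
    chapter_totals: Dict[str, int],
    already_merged: Dict[str, int],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Add only the growth in cumulative chapter totals since the last chunk.

    `chapter_totals` is cumulative for the current chapter; `already_merged`
    (hypothetical) records how much of it is already in `book_mentions` and
    should be reset to {} when a new chapter is loaded. Returns updated copies.
    """
    book = dict(book_mentions)
    merged = dict(already_merged)
    for char_id, total in chapter_totals.items():
        delta = total - merged.get(char_id, 0)
        if delta:
            book[char_id] = book.get(char_id, 0) + delta
        merged[char_id] = total
    return book, merged
```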
diff --git a/nodes/book_synthesis.py b/nodes/book_synthesis.py
@@ -18,8 +18,8 @@ def book_synthesis_node(state: BookState) -> BookState:
     Reads:
     - `book_influence`: accumulated influence evidence
     - `book_mentions`: total mentions per character
-    - `book_appearances`: total appearances per character
     - `characters_by_id`: all character profiles
+    - `chapter_summaries`: chapter summaries for narrative context
 
     Writes:
     - `book_plot_summary`: overall plot summary
@@ -34,18 +34,19 @@ def book_synthesis_node(state: BookState) -> BookState:
     """
     book_influence = state.get('book_influence', {})
     book_mentions = state.get('book_mentions', {})
-    book_appearances = state.get('book_appearances', {})
     characters_by_id = state.get('characters_by_id', {})
+    chapter_summaries = state.get('chapter_summaries', {})
 
     logger.info("Synthesizing book-wide evidence into character dossiers")
     logger.debug(f"Processing {len(characters_by_id)} characters with {len(book_influence)} influence records")
+    logger.debug(f"Using {len(chapter_summaries)} chapter summaries for context")
 
     # Build prompt
     prompt = get_book_synthesis_prompt(
         book_influence,
         book_mentions,
-        book_appearances,
-        characters_by_id
+        characters_by_id,
+        chapter_summaries
     )
 
     # Get model configuration from environment
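
`get_book_synthesis_prompt` now receives `chapter_summaries` as an extra argument. The prompt builder is not part of this diff, so the following is only a hypothetical helper for turning that dict into a prompt section. It assumes the dict maps chapter ids to summary strings and was filled in the order chapters were processed.

```python
from typing import Mapping


def format_chapter_summaries(chapter_summaries: Mapping[str, str]) -> str:
    """Render chapter summaries as a prompt section in reading order.

    Relies on dict insertion order (guaranteed since Python 3.7): chapters are
    summarized sequentially, so insertion order matches reading order, whereas
    sorting string ids would put "ch10" before "ch2".
    """
    lines = ["Chapter summaries (reading order):"]
    for chapter_id, summary in chapter_summaries.items():
        text = summary.strip()
        if text:  # skip chapters whose summary came back empty
            lines.append(f"- {chapter_id}: {text}")
    return "\n".join(lines)
```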