C64CityBuilder GraphRAG local-text visibility diagnosis #28

@sharpninja

Observed on March 11, 2026.

This issue captures diagnosis only. It intentionally excludes fixes, remediation, and implementation steps.

Workspace under test

  • E:\github\C64CityBuilder

Summary of symptoms

  • Local staged PRG TXT files were present under E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg.
  • Commodore_64_Programmers_Reference_Guide.txt contained exact phrases verified by local inspection, but queries for the exact filename and for exact snippets returned no local hits from either MCP context or GraphRAG.

Evidence

  • Local staged input confirmed: E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg\Commodore_64_Programmers_Reference_Guide.txt
  • Local staged input confirmed: E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg\Mapping_the_Commodore_64.txt
  • Local inspection confirmed exact phrases in Commodore_64_Programmers_Reference_Guide.txt, including: "Video Bank Selection"; "MCS6510 Microprocessor Instruction Set"; "KERNAL Power.Up Activities"; "The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been".
  • Exact MCP context and GraphRAG queries returned no hits for any of: "Commodore_64_Programmers_Reference_Guide.txt"; "The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been"; "Mapping_the_Commodore_64.txt"; "helpyouusethePEEKandPOKEinstructionstoextendyour".
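
The local inspection step can be reproduced with a short script. This is a minimal sketch, not the tooling used for the original observation: `find_phrases` is a hypothetical helper, and reading the files as UTF-8 with undecodable bytes replaced is an assumption about how the staged TXT files are encoded. The paths and phrases are the ones listed above.

```python
from pathlib import Path


def find_phrases(path, phrases):
    """Return {phrase: bool} for exact, case-sensitive substring matches."""
    # Assumption: staged text is UTF-8; stray bytes are replaced, not fatal.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {phrase: phrase in text for phrase in phrases}


if __name__ == "__main__":
    staged = Path(r"E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg")
    checks = {
        "Commodore_64_Programmers_Reference_Guide.txt": [
            "Video Bank Selection",
            "MCS6510 Microprocessor Instruction Set",
            "KERNAL Power.Up Activities",
            "The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been",
        ],
        "Mapping_the_Commodore_64.txt": [
            "helpyouusethePEEKandPOKEinstructionstoextendyour",
        ],
    }
    for name, phrases in checks.items():
        target = staged / name
        if not target.exists():
            print(f"{name}: file not found")
            continue
        for phrase, found in find_phrases(target, phrases).items():
            print(f"{name}: {'FOUND' if found else 'MISSING'}: {phrase}")
```

A phrase reported as FOUND here but returning no hits from MCP context or GraphRAG is the mismatch this issue describes.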

Observed server behavior

  • MCP GraphRAG status reported enabled=true, state=ready, isIndexed=true, backend=internal-fallback, lastIndexedDocumentCount=43, and lastIndexDurationMs approximately 1 to 4 ms.
  • Forcing POST /mcpserver/graphrag/index updated lastIndexedAtUtc but still reported 43 documents and an approximately 4 ms duration.
  • A query unrelated to the exact phrases above, Color RAM, returned an external C64-Wiki result instead of the local book content.
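
The reindex observation amounts to comparing two status snapshots and noting which fields moved. A minimal sketch of that comparison follows. `diff_status` is a hypothetical helper, the field names are taken from the status report above, and the timestamp values are placeholders rather than captured output. How the snapshots are fetched is deliberately left out, since the status route is not stated in this issue.

```python
def diff_status(before, after):
    """Return {field: (old, new)} for every field whose value changed."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


if __name__ == "__main__":
    # Illustrative snapshots: field names match the reported status,
    # timestamps are placeholders and not real captured values.
    before = {
        "enabled": True,
        "state": "ready",
        "isIndexed": True,
        "backend": "internal-fallback",
        "lastIndexedDocumentCount": 43,
        "lastIndexedAtUtc": "<timestamp before forced reindex>",
    }
    after = dict(before, lastIndexedAtUtc="<timestamp after forced reindex>")
    for field, (old, new) in diff_status(before, after).items():
        print(f"{field}: {old!r} -> {new!r}")
```

If only `lastIndexedAtUtc` appears in the diff, the forced reindex ran but did not change the reported corpus size.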

Why this appears to be an indexing or query visibility problem and not only a source-content problem

  • A locally staged TXT with verified clean exact phrases remained undiscoverable.
  • GraphRAG continued to report ready/indexed state before and after forced reindex.
  • The reported document count remained fixed at 43.
  • Index duration remained near-instant while local content stayed invisible.
  • Missing OCR dependencies can explain some PDF extraction limitations, but they do not by themselves explain why already-staged local TXT content was not query-visible.
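
The reasoning above reduces to one contradiction: status claims a ready index while exact queries for staged content return nothing local. A minimal sketch of that check, assuming the per-query local hit counts are collected by hand or by some other means. `visibility_contradictions` is a hypothetical helper, not part of the MCP server.

```python
def visibility_contradictions(status, local_hit_counts):
    """Return queries with zero local hits while status claims a ready index."""
    claims_ready = (
        status.get("enabled") is True
        and status.get("state") == "ready"
        and status.get("isIndexed") is True
    )
    if not claims_ready:
        # No contradiction: the server is not claiming the corpus is indexed.
        return []
    return sorted(q for q, hits in local_hit_counts.items() if hits == 0)


if __name__ == "__main__":
    # Status fields and zero-hit queries as reported in this issue.
    status = {"enabled": True, "state": "ready", "isIndexed": True}
    local_hit_counts = {
        "Commodore_64_Programmers_Reference_Guide.txt": 0,
        "The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been": 0,
        "Mapping_the_Commodore_64.txt": 0,
        "helpyouusethePEEKandPOKEinstructionstoextendyour": 0,
    }
    for query in visibility_contradictions(status, local_hit_counts):
        print(f"ready/indexed, but no local hit for: {query}")
```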

Separate source text quality note

  • Mapping_the_Commodore_64.txt also has source text quality issues, including noisy or gibberish text near the beginning and collapsed spacing/run-on text such as helpyouusethePEEKandPOKEinstructionstoextendyour.
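
Collapsed spacing of this kind can be spotted mechanically by looking for implausibly long alphabetic tokens. The following is a rough heuristic sketch, not a measure of overall text quality. `find_runon_tokens` and the 30-character threshold are assumptions chosen for illustration.

```python
import re


def find_runon_tokens(text, min_len=30):
    """Return purely alphabetic tokens that are at least min_len characters long."""
    # Assumption: ordinary English words are shorter than min_len letters,
    # so longer letter runs are likely words fused by lost spacing.
    return [tok for tok in re.findall(r"[A-Za-z]+", text) if len(tok) >= min_len]


if __name__ == "__main__":
    sample = "This book will helpyouusethePEEKandPOKEinstructionstoextendyour programs."
    for token in find_runon_tokens(sample):
        print(f"{len(token)} chars: {token}")
```

Because exact-snippet queries against run-on text only match if the index preserves the same collapsed form, such tokens are worth listing separately from the visibility problem.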

Impact on users and workspaces

  • Users and agents can receive false confidence from ready/indexed status while workspace-local staged material remains undiscoverable.
  • External sources may dominate answers even when local documents are present.

Open diagnostic questions

  • Are TXT files under mcp-data/graphrag/input/docs/prg included in the effective corpus for this workspace?
  • What corpus produced the fixed 43-document count?
  • Does the internal-fallback backend read the staged input tree or some other store?
  • Are exact-filename and exact-snippet queries executed against the same corpus that status reports as indexed?
  • Is workspace scoping or source-key registration excluding these local documents from visibility?
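
One way to approach the 43-document question is to count what is actually staged and compare it with the reported figure. This is a minimal sketch under the assumption that the staged input tree is the relevant corpus root, which is itself one of the open questions. `count_by_suffix` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root):
    """Count files under root (recursively), grouped by lowercased extension."""
    return Counter(p.suffix.lower() for p in Path(root).rglob("*") if p.is_file())


if __name__ == "__main__":
    root = Path(r"E:\github\C64CityBuilder\mcp-data\graphrag\input")
    reported = 43  # lastIndexedDocumentCount from the status report
    if root.is_dir():
        counts = count_by_suffix(root)
        for suffix, n in sorted(counts.items()):
            print(f"{suffix or '(no extension)'}: {n}")
        print(f"staged files: {sum(counts.values())}, reported documents: {reported}")
    else:
        print(f"input tree not found: {root}")
```

A staged total that differs from 43 would suggest the count comes from some other store. A matching total would point instead at the query path or workspace scoping.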

Client machine dependency note

  • On the client machine used for ingestion, two OCR dependencies were missing: tesseract and pdftoppm.
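
The dependency state can be re-checked on any client machine by looking the two executables up on PATH. A minimal sketch using the standard library; `missing_tools` is a hypothetical helper, and it only tests whether the executables are discoverable, not whether they work.

```python
import shutil


def missing_tools(names=("tesseract", "pdftoppm")):
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing OCR dependencies: " + ", ".join(missing))
    else:
        print("tesseract and pdftoppm are both on PATH")
```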

Metadata

Labels

question (Further information is requested)
