Open
Labels
question (Further information is requested)
Description
Observed on March 11, 2026.
This issue captures diagnosis only. It intentionally excludes fixes, remediation, and implementation steps.
Workspace under test
- E:\github\C64CityBuilder
Summary of symptoms
- Local staged PRG TXT files were present under E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg.
- Commodore_64_Programmers_Reference_Guide.txt contained verified exact phrases, but exact filename and exact snippet queries returned no local hits from MCP context or GraphRAG.
Evidence
- Local staged input confirmed: E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg\Commodore_64_Programmers_Reference_Guide.txt
- Local staged input confirmed: E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg\Mapping_the_Commodore_64.txt
- Local inspection confirmed exact phrases in Commodore_64_Programmers_Reference_Guide.txt including: Video Bank Selection; MCS6510 Microprocessor Instruction Set; KERNAL Power.Up Activities; The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been.
- Exact MCP context and GraphRAG queries returned no hits for any of the following: "Commodore_64_Programmers_Reference_Guide.txt"; "The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been"; "Mapping_the_Commodore_64.txt"; "helpyouusethePEEKandPOKEinstructionstoextendyour".
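The local inspection step above can be repeated with a short script. This is a minimal sketch, not the method actually used for this report: the helper name `find_phrases` is hypothetical, the file is assumed to be UTF-8, and matching is line-based, so a phrase that wraps across a line break would be missed.

```python
from pathlib import Path


def find_phrases(path, phrases, encoding="utf-8"):
    """Map each phrase to the 1-based line numbers where it appears verbatim."""
    hits = {phrase: [] for phrase in phrases}
    # errors="replace" keeps the scan going if the staged text contains bad bytes.
    with open(path, encoding=encoding, errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            for phrase in phrases:
                if phrase in line:
                    hits[phrase].append(line_number)
    return hits


if __name__ == "__main__":
    # Path and phrases are taken from the evidence list above.
    staged = Path(
        r"E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg"
        r"\Commodore_64_Programmers_Reference_Guide.txt"
    )
    phrases = ["Video Bank Selection", "MCS6510 Microprocessor Instruction Set"]
    if staged.is_file():
        for phrase, lines in find_phrases(staged, phrases).items():
            print(f"{phrase!r}: found on {len(lines)} line(s)")
    else:
        print(f"not found: {staged}")
```

A phrase that maps to a non-empty list here but returns no hits from MCP context or GraphRAG reproduces the mismatch described in this issue.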
Observed server behavior
- MCP GraphRAG status reported enabled=true, state=ready, isIndexed=true, backend=internal-fallback, lastIndexedDocumentCount=43, and a lastIndexDurationMs of approximately 1 to 4 ms.
- Forcing POST /mcpserver/graphrag/index updated lastIndexedAtUtc but still reported 43 documents and an approximately 4 ms duration.
- A separate topical query, "Color RAM", which was not one of the exact-phrase tests, returned an external C64-Wiki result instead of the local book content.
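The status observations above can be turned into a simple consistency check. This is only a sketch for comparing two status snapshots taken before and after a forced reindex: the field names are the ones reported above, but the function name `stale_index_signals` and the 50 ms threshold are assumptions chosen for illustration, and how the snapshots are fetched is deliberately left out because the status endpoint is not documented here.

```python
def stale_index_signals(before, after, min_plausible_ms=50):
    """Return human-readable reasons the reported index state may be misleading.

    `before` and `after` are dicts holding the status fields quoted in this
    issue. `min_plausible_ms` is an assumed threshold, not a server constant.
    """
    signals = []
    duration = after.get("lastIndexDurationMs")
    if after.get("isIndexed") and duration is not None and duration < min_plausible_ms:
        signals.append(f"reindex finished in {duration} ms, which is near-instant")
    count_before = before.get("lastIndexedDocumentCount")
    count_after = after.get("lastIndexedDocumentCount")
    if count_before is not None and count_before == count_after:
        signals.append(f"document count stayed at {count_after} across a forced reindex")
    if after.get("backend") == "internal-fallback":
        signals.append("backend is internal-fallback")
    return signals


if __name__ == "__main__":
    # Values mirror the observations above; durations are approximate.
    before = {"isIndexed": True, "backend": "internal-fallback",
              "lastIndexedDocumentCount": 43, "lastIndexDurationMs": 1}
    after = {"isIndexed": True, "backend": "internal-fallback",
             "lastIndexedDocumentCount": 43, "lastIndexDurationMs": 4}
    for signal in stale_index_signals(before, after):
        print("-", signal)
```

None of these signals proves a fault on its own; together they flag a ready/indexed status that deserves a closer look.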
Why this appears to be an indexing or query visibility problem and not only a source-content problem
- A locally staged TXT with verified clean exact phrases remained undiscoverable.
- GraphRAG continued to report ready/indexed state before and after forced reindex.
- The reported document count remained fixed at 43.
- Index duration remained near-instant while local content stayed invisible.
- Missing OCR dependencies can explain some PDF extraction limitations, but they do not by themselves explain why already-staged local TXT content was not query-visible.
Separate source text quality note
- Mapping_the_Commodore_64.txt also has source text quality issues, including noisy or gibberish text near the beginning and collapsed spacing/run-on text such as helpyouusethePEEKandPOKEinstructionstoextendyour.
Impact on users and workspaces
- Users and agents can receive false confidence from ready/indexed status while workspace-local staged material remains undiscoverable.
- External sources may dominate answers even when local documents are present.
Open diagnostic questions
- Are TXT files under mcp-data/graphrag/input/docs/prg included in the effective corpus for this workspace?
- What corpus produced the fixed 43-document count?
- Does the internal-fallback backend read the staged input tree or some other store?
- Are exact-filename and exact-snippet queries executed against the same corpus that status reports as indexed?
- Is workspace scoping or source-key registration excluding these local documents from visibility?
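One way to gather data for the corpus questions above is to count what is actually staged on disk and compare it with the reported figure of 43. The sketch below assumes the staged input tree is the directory named in this issue. The helper name `count_staged_files` is hypothetical, and a matching or non-matching count only narrows the question. It does not show which store the backend reads.

```python
from collections import Counter
from pathlib import Path


def count_staged_files(root):
    """Count regular files under `root`, grouped by lowercase file extension."""
    root = Path(root)
    if not root.is_dir():
        return Counter()
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    input_root = r"E:\github\C64CityBuilder\mcp-data\graphrag\input"
    counts = count_staged_files(input_root)
    print(f"total files: {sum(counts.values())}")
    for suffix, number in sorted(counts.items()):
        print(f"  {suffix or '(no extension)'}: {number}")
    print("reported lastIndexedDocumentCount: 43")
```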
Client machine dependency note
- On the client machine used for ingestion, both OCR dependencies were missing: tesseract and pdftoppm.
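The dependency observation can be re-checked on any client machine with the standard library. This is a minimal sketch that only tests whether the two executables are on PATH. It does not verify versions or language data, and the function name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(tools=("tesseract", "pdftoppm")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing OCR dependencies:", ", ".join(missing))
    else:
        print("tesseract and pdftoppm are both on PATH")
```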