
Add GraphRAG corpus visibility diagnostics for internal-fallback indexing/query mismatch #29

Merged
sharpninja merged 5 commits into develop from copilot/diagnose-local-text-visibility-issue
Mar 12, 2026

Contributor

Copilot AI commented Mar 11, 2026

  • Inspect failing CI runs and identify actionable pipeline errors on this PR branch.
  • Fix the MCP config validation script so the default config path matches the repository’s current appsettings format.
  • Fix the markdownlint failure in the touched documentation content.
  • Run targeted validation for the script and markdown lint/build paths.
  • Run code review/security checks, then reply to the PR comment with the addressing commit.
Original prompt

This section details the original issue you should resolve.

<issue_title>C64CityBuilder GraphRAG local-text visibility diagnosis</issue_title>
<issue_description>Observed on March 11, 2026.

This issue captures diagnosis only. It intentionally excludes fixes, remediation, and implementation steps.

Workspace under test

  • E:\github\C64CityBuilder

Summary of symptoms

  • Local staged PRG TXT files were present under E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg.
  • Commodore_64_Programmers_Reference_Guide.txt contained verified exact phrases, but exact filename and exact snippet queries returned no local hits from MCP context or GraphRAG.

Evidence

  • Local staged input confirmed: E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg\Commodore_64_Programmers_Reference_Guide.txt
  • Local staged input confirmed: E:\github\C64CityBuilder\mcp-data\graphrag\input\docs\prg\Mapping_the_Commodore_64.txt
  • Local inspection confirmed exact phrases in Commodore_64_Programmers_Reference_Guide.txt including: Video Bank Selection; MCS6510 Microprocessor Instruction Set; KERNAL Power.Up Activities; The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been.
  • Exact MCP context and GraphRAG queries returned no hits for Commodore_64_Programmers_Reference_Guide.txt, The COMMODORE 64 PROGRAMMER'S REFERENCE GUIDE has been, Mapping_the_Commodore_64.txt, and helpyouusethePEEKandPOKEinstructionstoextendyour.
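The local inspection described above can be reproduced with a plain substring check. The following is a minimal sketch, not part of the MCP server: the function names are hypothetical, and it assumes the staged TXT files are readable as UTF-8.

```python
from pathlib import Path


def find_missing_phrases(text: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not occur verbatim in the text."""
    return [phrase for phrase in phrases if phrase not in text]


def check_staged_file(path: Path, phrases: list[str]) -> list[str]:
    """Read a staged TXT file and report which phrases are absent."""
    # errors="replace" keeps the check running if the file has stray bytes.
    text = path.read_text(encoding="utf-8", errors="replace")
    return find_missing_phrases(text, phrases)
```

An empty result for a staged file means every phrase is present locally, so a query miss for the same phrase points at indexing or retrieval rather than at the file content.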

Observed server behavior

  • MCP GraphRAG status reported enabled=true, state=ready, isIndexed=true, backend=internal-fallback, lastIndexedDocumentCount=43, and lastIndexDurationMs approximately 1 to 4 ms.
  • Forcing POST /mcpserver/graphrag/index updated lastIndexedAtUtc but still reported 43 documents and an approximately 4 ms duration.
  • An unrelated query like Color RAM returned an external C64-Wiki result instead of the local book content.

Why this appears to be an indexing or query visibility problem and not only a source-content problem

  • A locally staged TXT with verified clean exact phrases remained undiscoverable.
  • GraphRAG continued to report ready/indexed state before and after forced reindex.
  • The reported document count remained fixed at 43.
  • Index duration remained near-instant while local content stayed invisible.
  • Missing OCR dependencies can explain some PDF extraction limitations, but they do not by themselves explain why already-staged local TXT content was not query-visible.
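The reasoning above amounts to comparing two status snapshots taken before and after a forced reindex. A minimal sketch of that comparison follows. The field names come from the status output quoted in this issue, while the function name and the 50 ms threshold are assumptions chosen for illustration, not values the server defines.

```python
def stale_index_signals(before: dict, after: dict,
                        min_plausible_ms: float = 50.0) -> list[str]:
    """Compare two GraphRAG status payloads taken around a forced reindex."""
    signals = []

    # The timestamp moving shows the reindex request was processed.
    timestamp_moved = before.get("lastIndexedAtUtc") != after.get("lastIndexedAtUtc")
    count_same = (before.get("lastIndexedDocumentCount")
                  == after.get("lastIndexedDocumentCount"))
    if timestamp_moved and count_same:
        signals.append("reindex ran but document count did not change")

    # A near-instant run suggests little or no content was actually read.
    duration = after.get("lastIndexDurationMs")
    if duration is not None and duration < min_plausible_ms:
        signals.append("index duration is implausibly short")

    return signals
```

Neither signal proves a fault on its own, since an unchanged corpus also yields an unchanged count. Together with known new local files, they match the pattern reported here.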

Separate source text quality note

  • Mapping_the_Commodore_64.txt also has source text quality issues, including noisy or gibberish text near the beginning and collapsed spacing/run-on text such as helpyouusethePEEKandPOKEinstructionstoextendyour.
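Collapsed spacing of this kind can be surfaced by looking for unusually long alphabetic tokens. This is a rough sketch: the function name is hypothetical and the 30-character threshold is an assumption, chosen only because ordinary English words rarely reach that length.

```python
import re


def find_run_on_tokens(text: str, min_length: int = 30) -> list[str]:
    """Return alphabetic tokens long enough to suggest lost word spacing."""
    return [token for token in re.findall(r"[A-Za-z]+", text)
            if len(token) >= min_length]
```

A high count of such tokens in a staged file would indicate extraction damage that could reduce matching quality even after the file becomes query-visible.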

Impact on users and workspaces

  • Users and agents can receive false confidence from ready/indexed status while workspace-local staged material remains undiscoverable.
  • External sources may dominate answers even when local documents are present.

Open diagnostic questions

  • Are TXT files under mcp-data/graphrag/input/docs/prg included in the effective corpus for this workspace?
  • What corpus produced the fixed 43-document count?
  • Does the internal-fallback backend read the staged input tree or some other store?
  • Are exact-filename and exact-snippet queries executed against the same corpus that status reports as indexed?
  • Is workspace scoping or source-key registration excluding these local documents from visibility?
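One way to approach the first two questions is to enumerate what is actually staged and compare the totals with the reported count of 43. The sketch below only walks the file system; it does not know how the internal-fallback backend selects its corpus, and both function names are hypothetical.

```python
from collections import Counter
from pathlib import Path, PurePosixPath


def list_staged_files(input_root: Path, suffix: str = ".txt") -> list[str]:
    """List files with the given suffix under input_root as relative POSIX paths."""
    return sorted(
        path.relative_to(input_root).as_posix()
        for path in input_root.rglob("*")
        if path.is_file() and path.suffix.lower() == suffix
    )


def count_by_directory(relative_paths: list[str]) -> Counter:
    """Count files per parent directory so totals can be compared with status."""
    return Counter(PurePosixPath(p).parent.as_posix() for p in relative_paths)
```

If the per-directory totals cannot be reconciled with the reported document count, that would suggest the status figure describes a different store than the staged input tree.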

Client machine dependency note

  • On the client machine used for ingestion, OCR dependencies were missing: tesseract missing and pdftoppm missing.</issue_description>
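The dependency check noted above can be reproduced with the standard library. This is a minimal sketch with a hypothetical function name: it only tests whether the two executables are on `PATH`, and the `which` parameter exists so the lookup can be substituted during testing.

```python
import shutil


def missing_tools(tools=("tesseract", "pdftoppm"), which=shutil.which) -> list[str]:
    """Return the named executables that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list means both tools resolve on the current machine. It says nothing about whether they are the versions the ingestion path expects.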

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Diagnose visibility issue for local text in GraphRAG" to "Add GraphRAG corpus visibility diagnostics for internal-fallback indexing/query mismatch" Mar 11, 2026
@sharpninja
Owner

@copilot Resolve pipeline errors

Contributor Author

Copilot AI commented Mar 11, 2026

@copilot Resolve pipeline errors

Resolved the current PR pipeline blockers in baeb997. The config validation script now auto-detects the YAML appsettings file used in this repo, and the USER-GUIDE markdownlint failure is fixed.
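For readers unfamiliar with this kind of fix, config auto-detection usually means trying a short list of candidate file names in priority order. The sketch below illustrates that idea only. It is not the script changed in baeb997, and the candidate names and their ordering are assumptions.

```python
from pathlib import Path

# Hypothetical candidates, YAML first; the real script may use other names.
CANDIDATE_NAMES = ("appsettings.yaml", "appsettings.yml", "appsettings.json")


def pick_config_name(existing_names, candidates=CANDIDATE_NAMES):
    """Return the first candidate present in existing_names, or None."""
    for name in candidates:
        if name in existing_names:
            return name
    return None


def detect_config_path(directory: Path, candidates=CANDIDATE_NAMES):
    """Resolve the default config path inside directory, or None if absent."""
    names = {path.name for path in directory.iterdir() if path.is_file()}
    chosen = pick_config_name(names, candidates)
    return directory / chosen if chosen else None
```

Keeping the selection logic separate from the directory listing makes the priority order easy to verify without touching the file system.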

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /home/REDACTED/work/McpServer/McpServer/tests/McpServer.Support.Mcp.Tests/bin/Release/net9.0/McpServer.Support.Mcp.Tests /home/REDACTED/work/McpServer/McpServer/tests/McpServer.Support.Mcp.Tests/bin/Release/net9.0/McpServer.Support.Mcp.Tests @@ /tmp/tmpDL7XJC.tmp (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@sharpninja sharpninja marked this pull request as ready for review March 12, 2026 04:44
@sharpninja sharpninja merged commit 5bbd9e9 into develop Mar 12, 2026
7 checks passed