Fix for page-level chunk integration, and reduce integration prompt redundancy #40
Merged
Conversation
Page-level chunks (__PAGE__) are synthetic entries in the RAG index used for semantic search by page name/title/frontmatter. They don't correspond to real blocks in the file structure, so they can't be used as integration targets for actions like add_under or replace. When the LLM selects a __PAGE__ chunk as a target, it means "this knowledge belongs on this page" without specifying a particular block. The correct interpretation is add_section (a new top-level section).

Changes:
- Detect targets ending with "::__PAGE__" after LLM ID translation
- Normalize the action to "add_section" regardless of the LLM's suggestion
- Clear target_block_id (add_section has no specific target)
- Add debug logging for normalization events

This eliminates the "Target block not found: page::__PAGE__" errors and lets the LLM suggest integration into pages that have no blocks yet (only page-level metadata exists in the RAG index).

Impact:
- Before: integration fails with a "target block not found" error
- After: page-level chunks are correctly interpreted as add_section
- Enables: adding knowledge to empty but relevant pages

Tests:
- Added test_plan_integration_for_block_normalizes_page_level_chunks
- Added test_plan_integration_for_block_preserves_regular_block_targets
- All existing llm_wrappers tests pass

Assisted-by: Claude Code
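A minimal sketch of the normalization described above. The helper name `normalize_page_level_target`, the plan dict shape, and the field names (`action`, `target_block_id`) are assumptions for illustration; only the `::__PAGE__` suffix check, the add_section rewrite, the cleared target, and the debug logging come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

PAGE_CHUNK_SUFFIX = "::__PAGE__"  # synthetic page-level chunk IDs end with this

def normalize_page_level_target(plan: dict) -> dict:
    """Rewrite integration plans that target a synthetic __PAGE__ chunk.

    Page-level chunks exist only in the RAG index, so actions such as
    add_under or replace cannot resolve them to a real block. Interpret
    them as "add a new top-level section to this page" instead.
    (Hypothetical helper; names assumed, not from the repository.)
    """
    target = plan.get("target_block_id") or ""
    if target.endswith(PAGE_CHUNK_SUFFIX):
        logger.debug(
            "Normalizing page-level target %r: action %r -> add_section",
            target,
            plan.get("action"),
        )
        plan["action"] = "add_section"   # override whatever the LLM suggested
        plan["target_block_id"] = None   # add_section has no specific target
    return plan
```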
Page-level chunks (__PAGE__) store frontmatter in their context to improve semantic search quality during embedding. However, when these chunks were formatted for LLM prompts, the frontmatter was duplicated:
- Once in the <properties> section (parsed from the page outline)
- Again in the <block> content (from the stored RAG chunk context)

This wasted ~50-200 tokens per page, depending on the property count.

Solution: reuse the existing _clean_context_for_llm() function from page_indexer.py to strip frontmatter from page-level chunks during prompt formatting. This reuses tested code instead of duplicating logic.

Changes:
- Import _clean_context_for_llm() in llm_helpers.py
- Detect page-level chunks (::__PAGE__) in format_chunks_for_llm()
- Apply _clean_context_for_llm() to page-level chunks only
- Leave regular blocks unchanged (already cleaned during indexing)
- Keep frontmatter in the <properties> section for LLM context

Impact:
- Before: "tags:: foo, bar" appears twice in the prompt (properties + block)
- After: "tags:: foo, bar" appears once (properties only)
- Token savings: ~50-200 per page with properties
- No impact on semantic search quality (frontmatter is still embedded)

Tests:
- Added test_llm_helpers.py with 3 integration tests
- Tests cover page-level chunk stripping and regular block preservation
- All existing llm_wrappers tests pass (no regressions)

Assisted-by: Claude Code
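A sketch of the prompt-formatting change, again hedged: format_chunks_for_llm(), _clean_context_for_llm(), and the ::__PAGE__ suffix are named in the commit message, but the chunk dict shape and the property-stripping stand-in below are illustrative, not the repository's actual implementations.

```python
import re

def _clean_context_for_llm(context: str) -> str:
    """Illustrative stand-in for page_indexer._clean_context_for_llm():
    drop leading 'key:: value' frontmatter lines from chunk context."""
    lines = context.splitlines()
    while lines and re.match(r"\S+::\s", lines[0]):
        lines.pop(0)
    return "\n".join(lines).lstrip("\n")

def format_chunks_for_llm(chunks: list[dict]) -> str:
    """Render RAG chunks as <block> entries for the prompt."""
    parts = []
    for chunk in chunks:
        context = chunk["context"]
        if chunk["id"].endswith("::__PAGE__"):
            # Page-level chunks keep frontmatter for embedding quality,
            # but it already appears in <properties>, so strip it here.
            context = _clean_context_for_llm(context)
        parts.append(f'<block id="{chunk["id"]}">\n{context}\n</block>')
    return "\n\n".join(parts)
```

Stripping only page-level chunks keeps regular blocks untouched, since those were already cleaned at indexing time.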
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff            @@
##            main      #40     +/-  ##
========================================
+ Coverage   84.63%   84.67%   +0.04%
========================================
  Files          48       48
  Lines        5128     5136       +8
========================================
+ Hits         4340     4349       +9
+ Misses        788      787       -1

☔ View full report in Codecov by Sentry.