Open
Labels: enhancement (New feature or request)
Description
Summary
To maximize the value of the upcoming Neo4j graph integration (and the existing PostgreSQL graph), we need to improve the quality of the data entering the graph and how we query it. This sub-issue focuses on two key enhancements:
- Refining Document Ingestion: Moving beyond simple `CO_OCCURS` relationships by identifying richer, semantic relationships between entities, and refining/deduplicating the entities themselves.
- Query-Side Entity Extraction: Applying Named Entity Recognition (NER) to the user's query at runtime to identify precise seed nodes for graph traversal, improving retrieval accuracy.
Proposed Solution
1. Refine Entity & Relationship Extraction (Ingestion)
- Semantic Relationships: Update the sidecar `/extract-entities` logic (or the LLM prompt powering it) to identify specific relationship types rather than just co-occurrence. Examples include `BELONGS_TO`, `DEPENDS_ON`, `REPORTS_TO`, `SIMILAR_TO`, etc.
- Entity Refinement & Resolution: Introduce a deduplication or canonicalization step during ingestion. For example, resolving "AWS" and "Amazon Web Services" to the same underlying entity, or merging entities with high embedding similarity before syncing to the database/Neo4j.
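The entity resolution step described above could look roughly like the following. This is a minimal sketch, not the project's implementation: the `ExtractedEntity` and `CanonicalEntity` shapes, the `KNOWN_ALIASES` table, and the `0.92` similarity threshold are all assumptions made for illustration, and the real `kg_entities` schema may differ.

```typescript
// Hypothetical shapes; the actual kg_entities schema is not shown in this issue.
interface ExtractedEntity {
  name: string;
  embedding: number[];
}

interface CanonicalEntity {
  canonicalName: string;
  aliases: string[];
  embedding: number[];
}

// Assumed alias table, keyed by lowercased surface form.
const KNOWN_ALIASES = new Map<string, string>([
  ["aws", "Amazon Web Services"],
]);

// Cosine similarity of two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const length = Math.min(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy single pass: an entity joins the first canonical entity it matches
// by resolved name or by embedding similarity; otherwise it starts a new one.
function canonicalizeEntities(
  entities: ExtractedEntity[],
  threshold = 0.92, // assumed value; would need tuning on real data
): CanonicalEntity[] {
  const result: CanonicalEntity[] = [];
  for (const entity of entities) {
    const surface = entity.name.trim();
    const resolved = KNOWN_ALIASES.get(surface.toLowerCase()) ?? surface;
    const match = result.find(
      (c) =>
        c.canonicalName.toLowerCase() === resolved.toLowerCase() ||
        cosineSimilarity(c.embedding, entity.embedding) >= threshold,
    );
    if (match) {
      const isNewAlias =
        surface !== match.canonicalName && match.aliases.indexOf(surface) === -1;
      if (isNewAlias) match.aliases.push(surface);
    } else {
      result.push({
        canonicalName: resolved,
        aliases: resolved === surface ? [] : [surface],
        embedding: entity.embedding,
      });
    }
  }
  return result;
}
```

The greedy pass is order-dependent: the first surface form seen for a similarity-only match becomes the canonical name. A production version would probably want a more deliberate choice, such as the most frequent form.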
2. Query-Side Entity Extraction (Retrieval)
- User Query NER: Before hitting the RAG ensemble, pass the user query through a lightweight extraction step (either a fast LLM call or a dedicated NER model in the sidecar) to identify key entities.
- Targeted Graph Traversal: Pass these extracted entities to the `GraphRetriever` and the proposed `Neo4jGraphRetriever`. Use these exact entity names/labels as the starting nodes for multi-hop graph traversals.
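To illustrate the flow from query text to seed nodes to multi-hop traversal, here is a small in-memory sketch. It swaps in a simple gazetteer lookup against known entity names for the LLM or NER call proposed above, so the extra `knownEntities` parameter, the `GraphEdge` shape, and `expandFromSeeds` are hypothetical. A real `extractQueryEntities` would likely be asynchronous and call the sidecar.

```typescript
// Hypothetical edge shape for an in-memory graph.
interface GraphEdge {
  source: string;
  target: string;
  type: string;
}

// Lowercase, collapse non-alphanumerics to spaces, and pad with spaces so
// that substring checks only match whole words. ASCII-only for brevity.
function normalizeText(text: string): string {
  return " " + text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim() + " ";
}

// Stand-in for NER: returns the known entity names that appear in the query.
function extractQueryEntities(query: string, knownEntities: string[]): string[] {
  const normalizedQuery = normalizeText(query);
  return knownEntities.filter((name) => normalizedQuery.includes(normalizeText(name)));
}

// Breadth-first expansion from the seed nodes, treating edges as undirected.
function expandFromSeeds(seeds: string[], edges: GraphEdge[], maxHops: number): string[] {
  const visited = new Set<string>(seeds);
  let frontier = new Set<string>(seeds);
  for (let hop = 0; hop < maxHops && frontier.size > 0; hop++) {
    const next = new Set<string>();
    const visit = (from: string, to: string): void => {
      if (frontier.has(from) && !visited.has(to)) {
        visited.add(to);
        next.add(to);
      }
    };
    for (const edge of edges) {
      visit(edge.source, edge.target);
      visit(edge.target, edge.source);
    }
    frontier = next;
  }
  return Array.from(visited);
}
```

Because the extractor returns names in the casing of the known entity list, the seeds can be matched against graph nodes directly without a second normalization step.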
Implementation Tasks
- Update the ingestion extraction prompt/logic in `src/lib/ingestion/entity-extraction.ts` (and the sidecar) to output semantic relationship types.
- Add entity canonicalization logic to merge synonymous entities before writing to `kg_entities`.
- Create an `extractQueryEntities(query: string)` utility function to process user queries.
- Update the RAG retrieval pipeline (`src/lib/tools/rag/retrievers/graph-retriever.ts` and the future Neo4j retriever) to accept extracted entities as search parameters.
- Update existing Cypher/SQL traversal queries to anchor their searches on the newly extracted query entities.
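For the last task, anchoring a Cypher traversal on extracted entities might be sketched as below. The `Entity` label, the `name` property, and the `buildSeededTraversalQuery` helper are assumptions, since the graph schema is not specified here. Seed names are passed as a query parameter, while the hop limit is validated and interpolated because variable-length bounds in a Cypher pattern generally cannot be supplied as parameters.

```typescript
// Hypothetical return shape: query text plus its parameters.
interface SeededQuery {
  text: string;
  params: { seedNames: string[] };
}

// Builds a traversal query that starts from the extracted query entities.
// The :Entity label and `name` property are assumed, not taken from the schema.
function buildSeededTraversalQuery(seedNames: string[], maxHops = 2): SeededQuery {
  // The hop limit is interpolated into the pattern, so restrict it to a
  // small integer range to keep the query safe and bounded.
  if (!Number.isInteger(maxHops) || maxHops < 1 || maxHops > 5) {
    throw new RangeError("maxHops must be an integer between 1 and 5");
  }
  const text = [
    "MATCH (seed:Entity)",
    "WHERE toLower(seed.name) IN $seedNames",
    `MATCH (seed)-[*1..${maxHops}]-(neighbor:Entity)`,
    "RETURN DISTINCT seed.name AS seed, neighbor.name AS neighbor",
  ].join("\n");
  return {
    text,
    params: { seedNames: seedNames.map((name) => name.toLowerCase()) },
  };
}
```

The resulting `text` and `params` would then be handed to whichever Neo4j client the retriever uses. The PostgreSQL path would need an equivalent change, for example a recursive CTE seeded from the same names.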
Acceptance Criteria
- When a document is ingested, relationships other than `CO_OCCURS` are successfully identified and stored in PostgreSQL/Neo4j.
- When a user submits a RAG question, entities are actively extracted from the query text.
- Graph retrieval uses the extracted query entities as starting points for graph traversal, resulting in more relevant document section retrieval.
- Entities representing the exact same concept are deduplicated/merged during the ingestion phase.
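One way to make the first criterion checkable is to validate extracted relationship types against an allow-list before storing them, falling back to `CO_OCCURS` for anything unrecognized. The sketch below assumes the allow-list is the set of example types named earlier in this issue. The helper names are hypothetical.

```typescript
// Assumed allow-list, taken from the example types in this issue.
const RELATIONSHIP_TYPES = [
  "BELONGS_TO",
  "DEPENDS_ON",
  "REPORTS_TO",
  "SIMILAR_TO",
  "CO_OCCURS",
] as const;

type RelationshipType = (typeof RELATIONSHIP_TYPES)[number];

// Maps free-form extractor output (e.g. "depends on") onto a known type,
// falling back to CO_OCCURS when the label is not recognized.
function normalizeRelationshipType(raw: string): RelationshipType {
  const candidate = raw.trim().toUpperCase().replace(/[\s-]+/g, "_");
  const match = RELATIONSHIP_TYPES.find((type) => type === candidate);
  return match ?? "CO_OCCURS";
}

// True if at least one relationship is richer than plain co-occurrence.
function hasSemanticRelationship(rawTypes: string[]): boolean {
  return rawTypes.some((raw) => normalizeRelationshipType(raw) !== "CO_OCCURS");
}
```

A check like `hasSemanticRelationship` could back an ingestion test for this criterion, though whether unknown labels should fall back or be rejected is a design decision left open here.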