Refine Entity/Relationship Extraction & Implement Query-Side Extraction #241

## Summary
To maximize the value of the upcoming Neo4j graph integration (and the existing PostgreSQL graph), we need to improve the quality of the data entering the graph and how we query it. This sub-issue focuses on two key enhancements:

* **Refining Document Ingestion:** Moving beyond simple `CO_OCCURS` relationships by identifying richer, semantic relationships between entities, and refining/deduplicating the entities themselves.
* **Query-Side Entity Extraction:** Applying Named Entity Recognition (NER) to the user's query at runtime to identify precise seed nodes for graph traversal, improving retrieval accuracy.

## Proposed Solution

### 1. Refine Entity & Relationship Extraction (Ingestion)
* **Semantic Relationships:** Update the sidecar `/extract-entities` logic (or the LLM prompt powering it) to identify specific relationship types rather than just co-occurrence, for example `BELONGS_TO`, `DEPENDS_ON`, `REPORTS_TO`, and `SIMILAR_TO`.
* **Entity Refinement & Resolution:** Introduce a deduplication or canonicalization step during ingestion. For example, resolving "AWS" and "Amazon Web Services" to the same underlying entity, or merging entities with high embedding similarity before syncing to the database/Neo4j.
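A minimal sketch of the two ingestion steps, assuming the sidecar returns each relationship with a free-form `type` label. `RawRelationship`, `ALIASES`, `canonicalizeEntity`, and `refineRelationships` are hypothetical names, not existing code in `entity-extraction.ts`. Resolution here uses only an alias table plus whitespace/case normalization; the embedding-similarity merge mentioned above is not shown. Falling back to `CO_OCCURS` for unrecognized labels is also an assumption.

```typescript
// Allowed relationship types (hypothetical set; CO_OCCURS kept as a fallback).
const RELATIONSHIP_TYPES = ["BELONGS_TO", "DEPENDS_ON", "REPORTS_TO", "SIMILAR_TO", "CO_OCCURS"] as const;
type RelationshipType = (typeof RELATIONSHIP_TYPES)[number];

interface RawRelationship {
  source: string;
  target: string;
  type: string; // free-form label as returned by the extractor (assumed shape)
}

interface Relationship {
  source: string;
  target: string;
  type: RelationshipType;
}

// Hypothetical alias table keyed by normalized surface form.
const ALIASES: Record<string, string> = {
  "amazon web services": "AWS",
  aws: "AWS",
};

// Lower-case and collapse whitespace so trivially different spellings share one key.
function normalizeKey(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Map a mention to its canonical name: alias hit, else the tidied original text.
function canonicalizeEntity(name: string): string {
  return ALIASES[normalizeKey(name)] ?? name.trim().replace(/\s+/g, " ");
}

// Coerce a label like "belongs to" or "depends-on" into the allowed set.
function toRelationshipType(label: string): RelationshipType {
  const upper = label.trim().toUpperCase().replace(/[\s-]+/g, "_");
  return (RELATIONSHIP_TYPES as readonly string[]).includes(upper)
    ? (upper as RelationshipType)
    : "CO_OCCURS";
}

// Canonicalize endpoints, drop self-loops created by merging, and dedupe edges.
function refineRelationships(raw: RawRelationship[]): Relationship[] {
  const seen = new Set<string>();
  const out: Relationship[] = [];
  for (const r of raw) {
    const source = canonicalizeEntity(r.source);
    const target = canonicalizeEntity(r.target);
    if (source === target) continue; // e.g. "AWS" SIMILAR_TO "Amazon Web Services"
    const type = toRelationshipType(r.type);
    const key = `${source}|${type}|${target}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push({ source, target, type });
  }
  return out;
}
```

Dropping self-loops matters once synonyms are merged: an extracted "AWS is similar to Amazon Web Services" edge would otherwise become an edge from a node to itself.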

### 2. Query-Side Entity Extraction (Retrieval)
* **User Query NER:** Before hitting the RAG ensemble, pass the user query through a lightweight extraction step (either a fast LLM call or a dedicated NER model in the sidecar) to identify key entities.
* **Targeted Graph Traversal:** Pass these extracted entities to the `GraphRetriever` and the proposed `Neo4jGraphRetriever`. Use these exact entity names/labels as the starting nodes for multi-hop graph traversals.
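As a cheap baseline before (or alongside) the fast LLM call or NER model, query entities can be found by dictionary (gazetteer) matching against names already in the graph. This is a swapped-in technique, not NER, and it only finds entities that already exist in `kg_entities`. `KnownEntity` and `createQueryEntityExtractor` are hypothetical; the closure it returns has the `extractQueryEntities(query: string)` signature from the task list below.

```typescript
// Hypothetical shape: canonical name plus aliases, e.g. loaded from kg_entities.
interface KnownEntity {
  canonical: string;
  aliases: string[];
}

// Escape regex metacharacters so names like "C++" or "Node.js" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function createQueryEntityExtractor(known: KnownEntity[]) {
  // Try longer surface forms first so "Amazon Web Services" wins over "Amazon".
  const patterns = known
    .flatMap((e) => [e.canonical, ...e.aliases].map((surface) => ({ surface, canonical: e.canonical })))
    .sort((a, b) => b.surface.length - a.surface.length)
    .map(({ surface, canonical }) => ({
      canonical,
      // Case-insensitive match bounded by non-letter/non-digit characters (works for "C++").
      re: new RegExp(`(^|[^\\p{L}\\p{N}])${escapeRegExp(surface)}(?=$|[^\\p{L}\\p{N}])`, "iu"),
    }));

  // Returns canonical names of entities mentioned in the query, usable as seed nodes.
  return function extractQueryEntities(query: string): string[] {
    const found = new Set<string>();
    let remaining = query;
    for (const { canonical, re } of patterns) {
      const m = re.exec(remaining);
      if (!m) continue;
      found.add(canonical);
      // Blank out the matched span so shorter aliases cannot match inside it.
      const start = m.index + m[1].length;
      const end = m.index + m[0].length;
      remaining = remaining.slice(0, start) + " ".repeat(end - start) + remaining.slice(end);
    }
    return Array.from(found);
  };
}
```

Because the output is already canonical, it can be passed to the graph retriever without a second resolution step; mentions of entities not yet in the graph are exactly the gap an LLM/NER path would cover.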

## Implementation Tasks
- [ ] Update the ingestion extraction prompt/logic in `src/lib/ingestion/entity-extraction.ts` (and the sidecar) to output semantic relationship types.
- [ ] Add entity canonicalization logic to merge synonymous entities before writing to `kg_entities`.
- [ ] Create an `extractQueryEntities(query: string)` utility function to process user queries.
- [ ] Update the RAG retrieval pipeline (`src/lib/tools/rag/retrievers/graph-retriever.ts` and the future Neo4j retriever) to accept extracted entities as search parameters.
- [ ] Update existing Cypher/SQL traversal queries to anchor their searches on the newly extracted query entities.
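One way the PostgreSQL side could anchor traversal on the extracted seeds is a parameterized recursive CTE. This is a sketch under assumptions: `kg_entities(id, name)` columns, the `kg_relationships(source_id, target_id, type)` table, `buildSeededTraversalQuery`, and the 4-hop cap are all hypothetical. The returned `{ text, values }` object follows the query-config shape node-postgres accepts.

```typescript
// Mirrors the { text, values } query-config shape accepted by node-postgres.
interface TraversalQuery {
  text: string;
  values: unknown[];
}

const DEFAULT_TRAVERSAL_TYPES = ["BELONGS_TO", "DEPENDS_ON", "REPORTS_TO", "SIMILAR_TO", "CO_OCCURS"];

// Walks up to maxHops edges in either direction from the seed entities,
// never revisiting a node already on the current path.
function buildSeededTraversalQuery(
  seeds: string[],
  maxHops = 2,
  relTypes: string[] = DEFAULT_TRAVERSAL_TYPES,
): TraversalQuery {
  if (seeds.length === 0) {
    throw new Error("at least one seed entity is required");
  }
  // Hypothetical cap of 4 hops to keep traversal cost bounded.
  const hops = Math.max(1, Math.min(4, Math.floor(maxHops)));
  const text = `
WITH RECURSIVE edges AS (
  SELECT source_id AS a, target_id AS b, type FROM kg_relationships
  UNION ALL
  SELECT target_id AS a, source_id AS b, type FROM kg_relationships
),
walk(entity_id, depth, path) AS (
  SELECT e.id, 0, ARRAY[e.id]
  FROM kg_entities e
  WHERE lower(e.name) = ANY($1::text[])
  UNION ALL
  SELECT ed.b, w.depth + 1, w.path || ed.b
  FROM walk w
  JOIN edges ed ON ed.a = w.entity_id
  WHERE w.depth < $2
    AND ed.type = ANY($3::text[])
    AND NOT ed.b = ANY(w.path)
)
SELECT entity_id, MIN(depth) AS hops
FROM walk
GROUP BY entity_id
ORDER BY hops`;
  // Seeds are lower-cased to match lower(e.name) and deduplicated.
  const lowered = Array.from(new Set(seeds.map((s) => s.trim().toLowerCase())));
  return { text, values: [lowered, hops, relTypes] };
}
```

With node-postgres this could be run as `client.query(buildSeededTraversalQuery(seeds))`, and the returned entity IDs then used to fetch linked document sections. In the Neo4j retriever the same anchoring would be a variable-length relationship pattern starting from the seed nodes.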

## Acceptance Criteria
- [ ] When a document is ingested, relationships other than `CO_OCCURS` are successfully identified and stored in PostgreSQL/Neo4j.
- [ ] When a user submits a RAG question, entities are extracted from the query text before graph retrieval runs.
- [ ] Graph retrieval uses the extracted query entities as starting points for graph traversal, resulting in more relevant document section retrieval.
- [ ] Entities representing the exact same concept are deduplicated/merged during the ingestion phase.
