
Questions after using this package #305

Closed
byt3bl33d3r opened this issue Mar 13, 2025 · 2 comments

Comments

@byt3bl33d3r

Hello,

First off, I really appreciate the fast turnaround on #300, cheers 🍻.

After using this package I have a few questions, mostly because it completely tossed out the window a lot of the assumptions I had about Neo4j. I was hoping the team here could provide some clarity on best practices.

  1. I thought you weren't supposed to store documents, or really any kind of bulk data, in Neo4j for performance reasons (and should use something like Postgres for that instead). However, KGPipeline stores the full contents of each document in the Chunk nodes. I also had no idea you could use Neo4j as a vector DB, and the embeddings are stored as well. Does this scale? How does it affect performance with large databases?

  2. KGPipeline stores the embeddings of each Chunk in the embeddings property, but from the examples it seems like you have to create a separate vector store to do vector searches. Is there a way to search using the embeddings from the Chunk.embeddings property directly, without creating a completely new vector store?

@stellasia
Contributor

Hi @byt3bl33d3r ,

  1. We store chunks and their content together with the document structure, using the NEXT_CHUNK relationship, so it's a good use case for a graph DB. Regarding storing text in Neo4j: we have not yet run heavy performance tests, but compared to the overhead of querying both Neo4j and a separate document store, I don't think it is unreasonable. Also note that Neo4j supports full-text indexes, so it's not uncommon to store text in the graph itself.
    Regarding your question on the vector index: it's a relatively new feature (introduced in 2023), we are still working on improving the scalability of the database and supporting increasingly large datasets, and we encourage users to try our vector search and indexing capabilities on their own use cases. In Aura we have recently released vector-optimized instances (which increase the memory allocated to the vector index). Our vector indexes also support HNSW quantization, which allows more vectors to be stored and served at reasonable latency.

  2. Embeddings are indeed stored as a node property, and to use them in RAG we need a vector index. The index does not copy the full vectors to another place; it indexes them in place for faster queries.
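As a concrete illustration, a vector index can be created directly over the existing Chunk nodes and then queried, with no separate store involved. This is a minimal Cypher sketch: the index name `chunk_embedding_index`, the `embedding` and `text` property names, the 1536 dimensions, and the `$queryEmbedding` parameter are all assumptions here, so adjust them to match your actual schema and embedding model.

```cypher
// Create a vector index over the property already written by the pipeline
// (index name, property names, and dimension are assumptions -- match your schema)
CREATE VECTOR INDEX chunk_embedding_index IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};

// Query it: the 5 Chunk nodes nearest to $queryEmbedding
CALL db.index.vector.queryNodes('chunk_embedding_index', 5, $queryEmbedding)
YIELD node, score
RETURN node.text AS chunk_text, score
ORDER BY score DESC;
```

The search runs against the same nodes the pipeline created; the index is just an auxiliary structure (HNSW) the database maintains over that property.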

Hope that helps!

@stellasia
Contributor

Closing this one since there are no follow-up questions. Feel free to reopen if something needs to be clarified.
