Hello,

First off, really appreciate the fast turnaround on #300. Cheers 🍻

After using this package I have a few questions, mostly because it threw a lot of my assumptions about Neo4j out the window, and I was hoping the team here could provide some clarity on best practices.

I thought you weren't supposed to store documents, or really any kind of bulky data, in Neo4j for performance reasons (and to use something like Postgres for that instead). However, `KGPipeline` does store the full contents of each document in the `Chunk` nodes. I also had no idea you could use Neo4j as a vector DB, and the embeddings are stored there too. Does this scale? How does this affect performance with large databases?

`KGPipeline` stores the embedding of each `Chunk` in the `embeddings` property, but from the examples it seems like you have to create a separate vector store to run vector searches. Is there a way to search using the embeddings already in the `Chunk.embeddings` property without creating a completely new vector store?
We store chunks and their content together with the document structure, using the `NEXT_CHUNK` relationship to preserve ordering, so it is a good use case for a graph DB. Regarding storing text in Neo4j: we have not yet run heavy performance tests, but compared to the overhead of querying both Neo4j and a separate document store, I don't think it is unreasonable. Also note that Neo4j supports full-text indexes, so it's not uncommon to store text in the graph itself.
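For illustration, here is a rough sketch of what that lexical structure looks like when queried with the Python driver. The relationship and property names (`FROM_DOCUMENT`, `c.text`, `c.index`, `d.path`) are the defaults I'd expect from the pipeline, so double-check them against your own configuration:

```python
from neo4j import GraphDatabase

# Hypothetical connection details -- replace with your own.
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    # Each Chunk keeps its text and is linked back to its Document;
    # NEXT_CHUNK preserves the reading order between consecutive chunks.
    records, _, _ = driver.execute_query(
        """
        MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
        OPTIONAL MATCH (c)-[:NEXT_CHUNK]->(next:Chunk)
        RETURN d.path AS document, c.index AS chunk_index,
               c.text AS text, next.index AS next_chunk_index
        ORDER BY document, chunk_index
        """
    )

    # A full-text index over the chunk text lets you run keyword search
    # in the graph itself, without a separate document store.
    driver.execute_query(
        "CREATE FULLTEXT INDEX chunk_text IF NOT EXISTS FOR (c:Chunk) ON EACH [c.text]"
    )
```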
Regarding your question on the vector index: it is a relatively new feature (introduced in 2023), we are still working on improving the scalability of the database and supporting increasingly large datasets, and we encourage users to try our vector search and indexing capabilities on their own use cases. In Aura we have recently released vector-optimized instances (which increase the memory allocated to the vector index). Our HNSW-based vector indexes also support quantization, which allows more vectors to be stored and served at reasonable latency.
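As a concrete (hedged) example, a vector index over the chunk embeddings can be created directly with Cypher. The index name, property name, dimensions, and similarity function below are placeholders to adapt to your embedding model and pipeline configuration:

```python
from neo4j import GraphDatabase

# Hypothetical connection details -- replace with your own.
with GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password")) as driver:
    # Dimensions must match the embedding model used at ingestion time
    # (1536 here is just an example); the property name should match
    # whatever the pipeline wrote, e.g. Chunk.embeddings as mentioned above.
    driver.execute_query(
        """
        CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
        FOR (c:Chunk) ON (c.embeddings)
        OPTIONS {indexConfig: {
            `vector.dimensions`: 1536,
            `vector.similarity_function`: 'cosine'
        }}
        """
    )
```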
Embeddings are indeed stored as a node property, and to use them for RAG we need a vector index. The index does not copy the full vectors somewhere else; it indexes them in place for faster queries.
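So, to the second question: you do not build a separate vector store. You create the index over the property that `KGPipeline` already populated and point a retriever at it. A minimal sketch with this package's `VectorRetriever`, assuming the index from the previous snippet and an OpenAI embedder (both are illustrative choices, not requirements):

```python
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorRetriever

# Hypothetical connection details -- replace with your own.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# The retriever runs an approximate nearest-neighbour query against the
# vector index; the vectors themselves stay on the Chunk nodes.
retriever = VectorRetriever(
    driver,
    index_name="chunk_embeddings",   # the index created over Chunk.embeddings
    embedder=OpenAIEmbeddings(),     # must match the model used to embed the chunks
    return_properties=["text"],
)

results = retriever.search(query_text="What does the report say about revenue?", top_k=5)
for item in results.items:
    print(item.content)

driver.close()
```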