Creating embedding with MongoDB text store when library contains CSV file fails #475
Is anyone else using this functionality?
Thank you for reporting this. Do you have this problem solely with ChromaDB, or also with other vector stores?
Hi @MacOS, I'm not sure whether it only happens with ChromaDB, but the exception mentioned above (raised for any non-string value) occurs specifically in the create_new_embedding function of the EmbeddingChromaDB class. I haven't yet had a chance to test with other vector stores.
I see, @BillJones-SectorFlow. So it could affect other vector stores too.
As a test, I switched to Postgres for both database types (text store and vector store) and no longer get this error, so I believe it is specific to ChromaDB (or at least it doesn't affect Postgres).
🤔 I think it has nothing to do with the vector store, but with the text store. Compare line 2429 at commit 2964747 with line 1917 at the same commit.
For some reason, block["text_search"] from the text collection is a list in the former case but a string in the latter, as you described above. In your case, I would try converting the list to a string with ''.join(block["text_search"]), so the full line would become text_search = ''.join(block["text_search"]).strip(). That said, I do not understand how this is possible. May I kindly ask you to post a self-contained example that reproduces the error for
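One caveat on the suggested fix: according to the original report, block["text_search"] can be a list of lists (one inner list per CSV row), and str.join() only accepts an iterable of strings, so ''.join() would raise a TypeError on nested lists. Below is a minimal sketch of a more defensive normalization, assuming the value is a string, a flat list of strings, or a list of row lists. The helper name normalize_text_search and the choice of separators are hypothetical, not part of llmware's API.

```python
from typing import Any


def normalize_text_search(value: Any) -> str:
    """Flatten a text_search value into one stripped string.

    Hypothetical helper: assumes value is a str, a list of str,
    or a list of lists (e.g. CSV rows). Anything else is str()-ed.
    """
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, (list, tuple)):
        parts = []
        for item in value:
            if isinstance(item, (list, tuple)):
                # Treat an inner list as one CSV row: join its cells with spaces
                parts.append(" ".join(str(cell) for cell in item))
            else:
                parts.append(str(item))
        # Join rows (or flat items) with newlines so row boundaries survive
        return "\n".join(parts).strip()
    return str(value).strip()
```

For example, normalize_text_search([["1", "Main St"], ["2", "Oak Ave"]]) returns "1 Main St\n2 Oak Ave". Using separators instead of '' keeps cell and row boundaries visible in the text that gets embedded; a plain ''.join() would glue adjacent values together even for a flat list.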
I did not test ChromaDB + Postgres, but I did switch from MongoDB to Postgres on both sides, and you are correct: after moving off MongoDB, I also see a string instead of a list coming from the text collection store. So I believe it might actually be an issue with MongoDB (either in how the value is saved or how it is retrieved; I'm unsure which). I've changed the title of the issue to reflect this.
Hi all! I think I may have found a bug related to creating embeddings of CSV files.
When attempting to create an embedding of a library (with ChromaDB as vector_db), where the library has a CSV file added, I'm getting the following exception:
I could certainly be doing something wrong, but on the line mentioned, block["text_search"] is a list of lists representing the rows of the CSV, and the code then calls strip() on that list as if it were a str, causing the error. Simply skipping the strip() based on the instance type doesn't work, because errors then surface elsewhere.
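A library-independent sketch of the failure mode described above, assuming text_search arrives as a list of row lists. Both function names and the sample row are hypothetical, made up for illustration: the first mirrors code that assumes a str, the second mirrors the "skip strip() based on instance type" workaround.

```python
def strip_like_embedding_code(text_search):
    # Mirrors the failing pattern: the value is assumed to be a str,
    # so a list raises AttributeError: 'list' object has no attribute 'strip'
    return text_search.strip()


def strip_if_str(text_search):
    # The isinstance workaround: avoids the AttributeError, but hands the
    # original list to whatever runs next, which may also expect a str
    return text_search.strip() if isinstance(text_search, str) else text_search


rows = [["1", "Main St"]]  # hypothetical CSV-derived value
try:
    strip_like_embedding_code(rows)
except AttributeError as err:
    print(f"strip() failed: {err}")

print(type(strip_if_str(rows)).__name__)  # still a list, not a str
```

This is why guarding only the strip() call moves the error rather than fixing it: the value needs to be converted to a string (or stored as one) before it reaches the embedding step.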
I'm able to reproduce with the latest main branch using the following sample code (with above setup included):
Setup/Boilerplate Minimized for Simplicity
Example addresses.csv file contents (though it happens with every CSV file I've tested):
Bug or user error? :)