-
Notifications
You must be signed in to change notification settings - Fork 5
feat: Add vector search to Python Backend #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice PR! 🎃
I am approving but this actually is a blocking change.
We need to update the requirements.txt to include the new module voyageai.
We have pip-tools installed, so its a quick fix.
- Double check your version of voageai via
pip show voyageai - Manually add voyageai in the requirements.in (I would put it under section 4). I would use pinning to make it more flexible.
- Run
pip-compile requirements.in- This automatically remakes the requirements.txt - Run
pip install -r requirements.txtto ensure everything still works and test an endpoint. It works on my side! - Commit the new requirements
In reference to your note about reordering the docstrings for the endpoint. I concur. It should be revised to match the order in the file.
| # Handle invalid ObjectId conversion | ||
| result["_id"] = str(result["_id"]) if result["_id"] else None | ||
|
|
||
| results = [VectorSearchResult(**doc) for doc in raw_results] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N: I would perhaps leave a comment here explaining what this line is doing. Something about converting the result in a VectorSearch object.
|
|
||
| async def execute_aggregation_on_collection(collection, pipeline: list) -> list: | ||
| """Helper function to execute aggregation pipeline on a specified collection and return results""" | ||
| print(f"Executing pipeline: {pipeline}") # Debug logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More a question/thought than anything else, should we introduce logging for the aggregation instead of the printing in the console?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i asked the same thing in the previous PR but i forgot to follow-up -- we shouldn't be printing to the console in the final version of the app. we can introduce logging now or as a fast follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting the changes Taylor pointed out in her comments. Overall this looks good, though, and everything worked for me once I got the voyageai package installed
|
|
||
| async def execute_aggregation_on_collection(collection, pipeline: list) -> list: | ||
| """Helper function to execute aggregation pipeline on a specified collection and return results""" | ||
| print(f"Executing pipeline: {pipeline}") # Debug logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i asked the same thing in the previous PR but i forgot to follow-up -- we shouldn't be printing to the console in the final version of the app. we can introduce logging now or as a fast follow
Adds vector search capabilities to Python backend
Completion of Python Backend
This pull request completes the backend for python.
What to Review
The new vector search endpoint to verify that it is working.
Testing Recommendations
python -m venv venvsource venv/bin/activatepip install pip-toolspip-sync requirements.txtfastapi run main.py --reloadLeftover question
Should we refactor the comment at the top listing all the endpoints to be in order of the order that the endpoints are implemented? Search + Vector Search are currently implemented earlier than the other endpoints but are listed at the end of the comment at the top listing all endpoints.