Skip to content

Conversation

@shuangela
Copy link
Collaborator

@shuangela shuangela commented Oct 30, 2025

Adds vector search capabilities to Python backend

Completion of Python Backend

This pull request completes the backend for python.

What to Review

The new vector search endpoint to verify that it is working.

Method Endpoint
GET /api/movies/vector-search

Testing Recommendations

  • Create a virtual environment. python -m venv venv
  • Activate the virtual environment source venv/bin/activate
  • Install pip-tools pip install pip-tools
  • Install dependencies pip-sync requirements.txt
  • Setup your .env. You will need a Voyage API key to run this, you can ask Angela or Taylor for the API key necessary.
  • Run the server using fastapi run main.py --reload
  • localhost/docs will give you an UI to test every endpoint.

Leftover question

Should we refactor the comment at the top listing all the endpoints to be in order of the order that the endpoints are implemented? Search + Vector Search are currently implemented earlier than the other endpoints but are listed at the end of the comment at the top listing all endpoints.

Copy link
Collaborator

@tmcneil-mdb tmcneil-mdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR! 🎃

I am approving but this actually is a blocking change.

We need to update the requirements.txt to include the new module voyageai.
We have pip-tools installed, so its a quick fix.

  • Double check your version of voageai via pip show voyageai
  • Manually add voyageai in the requirements.in (I would put it under section 4). I would use pinning to make it more flexible.
  • Run pip-compile requirements.in - This automatically remakes the requirements.txt
  • Run pip install -r requirements.txt to ensure everything still works and test an endpoint. It works on my side!
  • Commit the new requirements

In reference to your note about reordering the docstrings for the endpoint. I concur. It should be revised to match the order in the file.

# Handle invalid ObjectId conversion
result["_id"] = str(result["_id"]) if result["_id"] else None

results = [VectorSearchResult(**doc) for doc in raw_results]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N: I would perhaps leave a comment here explaining what this line is doing. Something about converting the result in a VectorSearch object.


async def execute_aggregation_on_collection(collection, pipeline: list) -> list:
"""Helper function to execute aggregation pipeline on a specified collection and return results"""
print(f"Executing pipeline: {pipeline}") # Debug logging
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a question/thought than anything else, should we introduce logging for the aggregation instead of the printing in the console?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i asked the same thing in the previous PR but i forgot to follow-up -- we shouldn't be printing to the console in the final version of the app. we can introduce logging now or as a fast follow

Copy link
Collaborator

@cbullinger cbullinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting the changes Taylor pointed out in her comments. Overall this looks good, though, and everything worked for me once I got the voyageai package installed


async def execute_aggregation_on_collection(collection, pipeline: list) -> list:
"""Helper function to execute aggregation pipeline on a specified collection and return results"""
print(f"Executing pipeline: {pipeline}") # Debug logging
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i asked the same thing in the previous PR but i forgot to follow-up -- we shouldn't be printing to the console in the final version of the app. we can introduce logging now or as a fast follow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants