Added first implementation #1

Open
wants to merge 1 commit into main
Conversation

@lferran (Collaborator) commented Aug 1, 2024

No description provided.

@cwaddingham left a comment

Looks good for the most part, though there are a few areas for potential improvement. Note that these are mainly stylistic and shouldn't block a merge, but I do think they will improve the developer experience of using the module.

  1. DataPlane.delete() on line 237 raises an error if the list of vectors is longer than 1000. I would change this to a batched delete instead, so that more than 1000 IDs are transparently handled in batches:
```python
async def _delete_batch(self, chunk: list[str]) -> None:
    # Implement the actual delete logic here
    pass

async def delete(self, ids: list[str]) -> None:
    """Delete vectors by their IDs, in batches of at most 1000."""
    for i in range(0, len(ids), 1000):
        # _delete_batch is a coroutine, so it must be awaited
        await self._delete_batch(ids[i:i + 1000])
```

You could also simplify the code by revising the delete_by_prefix() method to use the same _delete_batch() helper. That way both delete operations share one code path, and any later change to how deletes are performed only has to happen in one place.
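To make that concrete, here is a minimal sketch of the shared code path. `_list_ids` is a hypothetical helper (the PR's actual lookup API isn't shown here), and `_delete_batch` just records IDs so the example is self-contained:

```python
import asyncio


class DataPlane:
    def __init__(self) -> None:
        self.deleted: list[str] = []  # stand-in for real storage state

    async def _delete_batch(self, chunk: list[str]) -> None:
        # Placeholder for the actual delete call
        self.deleted.extend(chunk)

    async def delete(self, ids: list[str]) -> None:
        """Delete vectors by their IDs, in batches of at most 1000."""
        for i in range(0, len(ids), 1000):
            await self._delete_batch(ids[i:i + 1000])

    async def _list_ids(self, prefix: str) -> list[str]:
        # Hypothetical helper: resolve a prefix to concrete vector IDs
        return [i for i in ("a1", "a2", "b1") if i.startswith(prefix)]

    async def delete_by_prefix(self, prefix: str) -> None:
        """Resolve a prefix to IDs, then reuse the batched delete path."""
        ids = await self._list_ids(prefix)
        await self.delete(ids)
```

Both public methods now funnel through the same batching loop, so the 1000-item limit is enforced in exactly one place.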

  2. Similarly for the upsert methods. There are two of them, one handling a single call and one for batching. I would add a check in the single-call method so that if the total payload size exceeds the maximum, the request is routed to the batch method. Or, even better, combine the two, similar to what was suggested for deletions.
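One way to combine them, sketched here with hypothetical names (MAX_PAYLOAD_BYTES, _send_upsert, _estimate_size) since the PR's actual API isn't quoted above, is to accumulate vectors until the next one would exceed the payload limit, then flush:

```python
import asyncio

MAX_PAYLOAD_BYTES = 64  # hypothetical limit, kept tiny for illustration


def _estimate_size(vector: dict) -> int:
    # Rough estimate: 4 bytes per float value plus the ID length
    return 4 * len(vector["values"]) + len(vector["id"])


class DataPlane:
    def __init__(self) -> None:
        self.requests: list[int] = []  # vectors per request actually sent

    async def _send_upsert(self, vectors: list[dict]) -> None:
        # Placeholder for the actual network call
        self.requests.append(len(vectors))

    async def upsert(self, vectors: list[dict]) -> None:
        """Upsert vectors, flushing a batch whenever the payload limit is hit."""
        batch: list[dict] = []
        size = 0
        for v in vectors:
            vsize = _estimate_size(v)
            if batch and size + vsize > MAX_PAYLOAD_BYTES:
                await self._send_upsert(batch)
                batch, size = [], 0
            batch.append(v)
            size += vsize
        if batch:
            await self._send_upsert(batch)
```

With a single entry point, callers never have to choose between the two methods, and small upserts just produce one request.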

  3. I would also add logging using Python's logging module. For example, for logging deletions:

```python
import logging

logger = logging.getLogger(__name__)

async def delete(self, ids: list[str]) -> None:
    """Delete vectors by their IDs."""
    # Lazy %-formatting avoids building the string when INFO is disabled
    logger.info("Deleting %d vectors", len(ids))
    # rest of delete() method goes here
```
  4. In its current state, every method that uses a semaphore to restrict concurrency creates its own. For clarity, this should live at the class level, with self._semaphore accessible to any method that needs it.
