-
Notifications
You must be signed in to change notification settings - Fork 194
feat: adding the operation delete_all_documents
to the OpenSearchDocumentStore
#2321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
davidsbatista
wants to merge
9
commits into
main
Choose a base branch
from
feat/adding-delete-all-docs-to-OpenSearchDocumentStore
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
6800ea8
converting several methods to static
davidsbatista 42b7a23
adding delete_all_documents() for sync and async + tests
davidsbatista 3cf92a1
updating tests
davidsbatista b2063c9
Merge branch 'main' into feat/adding-delete-all-docs-to-OpenSearchDoc…
davidsbatista b888045
updating tests
davidsbatista ea1baa1
formatting
davidsbatista 3fd4223
adding try/catch to both sync and async
davidsbatista 4dae896
using correct self._async_client in sync version
davidsbatista 159c076
fixing async version failure
davidsbatista File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,7 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from unittest.mock import Mock, patch | ||
|
||
import pytest | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from unittest.mock import Mock, patch | ||
|
||
import pytest | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,9 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
import random | ||
import time | ||
from typing import List | ||
from unittest.mock import patch | ||
|
||
|
@@ -453,7 +455,7 @@ def test_embedding_retrieval_but_dont_return_embeddings_for_bm25_retrieval( | |
assert len(results) == 2 | ||
assert results[0].embedding is None | ||
|
||
def filter_documents_no_embedding_returned( | ||
def test_filter_documents_no_embedding_returned( | ||
self, document_store_embedding_dim_4_no_emb_returned: OpenSearchDocumentStore | ||
): | ||
docs = [ | ||
|
@@ -468,3 +470,56 @@ def filter_documents_no_embedding_returned( | |
assert results[0].embedding is None | ||
assert results[1].embedding is None | ||
assert results[2].embedding is None | ||
|
||
def test_delete_all_documents_index_recreation(self, document_store: OpenSearchDocumentStore): | ||
# populate the index with some documents | ||
docs = [Document(id="1", content="A first document"), Document(id="2", content="Second document")] | ||
document_store.write_documents(docs) | ||
|
||
# capture index structure before deletion | ||
assert document_store._client is not None | ||
index_info_before = document_store._client.indices.get(index=document_store._index) | ||
mappings_before = index_info_before[document_store._index]["mappings"] | ||
settings_before = index_info_before[document_store._index]["settings"] | ||
|
||
# delete all documents | ||
document_store.delete_all_documents(recreate_index=True) | ||
assert document_store.count_documents() == 0 | ||
|
||
# verify index structure is preserved | ||
index_info_after = document_store._client.indices.get(index=document_store._index) | ||
mappings_after = index_info_after[document_store._index]["mappings"] | ||
settings_after = index_info_after[document_store._index]["settings"] | ||
|
||
assert mappings_after == mappings_before, "delete_all_documents should preserve index mappings" | ||
|
||
settings_after["index"].pop("uuid", None) | ||
settings_after["index"].pop("creation_date", None) | ||
settings_before["index"].pop("uuid", None) | ||
settings_before["index"].pop("creation_date", None) | ||
assert settings_after == settings_before, "delete_all_documents should preserve index settings" | ||
|
||
new_doc = Document(id="4", content="New document after delete all") | ||
document_store.write_documents([new_doc]) | ||
assert document_store.count_documents() == 1 | ||
|
||
results = document_store.filter_documents() | ||
assert len(results) == 1 | ||
assert results[0].content == "New document after delete all" | ||
|
||
def test_delete_all_documents_no_index_recreation(self, document_store: OpenSearchDocumentStore): | ||
docs = [Document(id="1", content="A first document"), Document(id="2", content="Second document")] | ||
document_store.write_documents(docs) | ||
assert document_store.count_documents() == 2 | ||
|
||
document_store.delete_all_documents(recreate_index=False) | ||
time.sleep(2) # need to wait for the deletion to be reflected in count_documents | ||
assert document_store.count_documents() == 0 | ||
|
||
new_doc = Document(id="3", content="New document after delete all") | ||
document_store.write_documents([new_doc]) | ||
assert document_store.count_documents() == 1 | ||
|
||
results = document_store.filter_documents() | ||
assert len(results) == 1 | ||
assert results[0].content == "New document after delete all" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,8 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
import time | ||
from typing import List | ||
|
||
import pytest | ||
|
@@ -241,3 +246,59 @@ async def test_delete_documents(self, document_store: OpenSearchDocumentStore): | |
|
||
await document_store.delete_documents_async([doc.id]) | ||
assert await document_store.count_documents_async() == 0 | ||
|
||
@pytest.mark.asyncio | ||
async def test_delete_all_documents_index_recreation(self, document_store: OpenSearchDocumentStore): | ||
# populate the index with some documents | ||
docs = [Document(id="1", content="A first document"), Document(id="2", content="Second document")] | ||
await document_store.write_documents_async(docs) | ||
|
||
# capture index structure before deletion | ||
assert document_store._client is not None | ||
index_info_before = document_store._client.indices.get(index=document_store._index) | ||
mappings_before = index_info_before[document_store._index]["mappings"] | ||
settings_before = index_info_before[document_store._index]["settings"] | ||
|
||
# delete all documents | ||
await document_store.delete_all_documents_async(recreate_index=True) | ||
assert await document_store.count_documents_async() == 0 | ||
|
||
# verify index structure is preserved | ||
index_info_after = document_store._client.indices.get(index=document_store._index) | ||
mappings_after = index_info_after[document_store._index]["mappings"] | ||
settings_after = index_info_after[document_store._index]["settings"] | ||
|
||
assert mappings_after == mappings_before, "delete_all_documents should preserve index mappings" | ||
|
||
settings_after["index"].pop("uuid", None) | ||
settings_after["index"].pop("creation_date", None) | ||
settings_before["index"].pop("uuid", None) | ||
settings_before["index"].pop("creation_date", None) | ||
assert settings_after == settings_before, "delete_all_documents should preserve index settings" | ||
|
||
new_doc = Document(id="4", content="New document after delete all") | ||
await document_store.write_documents_async([new_doc]) | ||
assert await document_store.count_documents_async() == 1 | ||
|
||
results = await document_store.filter_documents_async() | ||
assert len(results) == 1 | ||
assert results[0].content == "New document after delete all" | ||
|
||
@pytest.mark.asyncio | ||
async def test_delete_all_documents_no_index_recreation(self, document_store: OpenSearchDocumentStore): | ||
docs = [Document(id="1", content="A first document"), Document(id="2", content="Second document")] | ||
await document_store.write_documents_async(docs) | ||
assert await document_store.count_documents_async() == 2 | ||
|
||
await document_store.delete_all_documents_async(recreate_index=False) | ||
# need to wait for the deletion to be reflected in count_documents | ||
time.sleep(2) | ||
assert await document_store.count_documents_async() == 0 | ||
|
||
new_doc = Document(id="3", content="New document after delete all") | ||
await document_store.write_documents_async([new_doc]) | ||
assert await document_store.count_documents_async() == 1 | ||
|
||
results = await document_store.filter_documents_async() | ||
assert len(results) == 1 | ||
assert results[0].content == "New document after delete all" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from unittest.mock import Mock, patch | ||
|
||
import pytest | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from typing import List | ||
|
||
import pytest | ||
|
4 changes: 4 additions & 0 deletions
4
integrations/opensearch/tests/test_open_search_hybrid_retriever.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,7 @@ | ||
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]> | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
from copy import deepcopy | ||
from typing import Any, Dict | ||
from unittest.mock import Mock | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's consistent with the other methods to have this assert here. Therefore, I agree it's good to have it here.
However, I believe the only case it covers is if self._client was set at some point and then for some reason is set to None again. We have the same assert in
_ensure_index_exists
, which get's called as part of_ensure_initialized
when it is run for the first time.