feat: add SearchAfterMixin for ES search_after capability #4536
base: master
partner=ESDSLQ('term', partner=partner.short_code),
identifiers=ESDSLQ('terms', **{'uuid': course_uuids}),
document=CourseDocument
).values_list('uuid', flat=True)
This uses values_list while course_run_ids uses a comprehension. We should make them consistent.
It's rather consistent now.
Is it? The code is still the same.
Ah, I misunderstood the earlier comment. Lemme use values_list for both.
@@ -4192,3 +4195,23 @@ def test_basic(self):
self.assertEqual(course_run.restricted_run, restricted_course_run)
self.assertEqual(restricted_course_run.restriction_type, 'custom-b2b-enterprise')
self.assertEqual(str(restricted_course_run), "course-v1:SC+BreadX+3T2015: <custom-b2b-enterprise>")


class TestSearchAfterMixin(ElasticsearchTestMixin, TestCase):
Shouldn't this test live in test_mixins? It seems a bit weird to have it in test_models.
Moving to a new file test_mixins.py
for _ in range(self.total_courses):
    CourseFactory()
you can use CourseFactory.create_batch(count) instead of doing this loop.
CourseFactory()

@patch("course_discovery.apps.course_metadata.models.registry.get_documents")
def test_fetch_all_courses(self, mock_get_documents):
Apart from courses, try indexing a combination of different products (CourseRun, Programs, etc.) to ensure variety, and then verify that everything works as expected.
These functionalities are well tested in course_discovery/apps/api/v2/tests/test_views/test_catalog_queries.py. We've already added a proxy model to verify that the search_after functionality works as expected.
Furthermore, it should ensure that the existing search functionality and search responses remain unaffected in the current version of the endpoint.

Decision
----------
A new version (v2) of the `search/all/` endpoint will be introduced to enhance functionality while ensuring that the existing v1 functionality remains unaffected.
We can tweak the decision section to highlight the addition of SearchAfterMixin and SearchAfterPagination instead of only mentioning that a new endpoint was added. That would better reflect the new capabilities. We can then build on that and show how the new endpoints were added.
search = search.extra(search_after=search_after) if search_after else search

results = search.execute()
Curious: shouldn't we add error handling here, in case any of the subsequent requests fails?
The error is already handled in this custom dispatch
exception = InvalidQuery(f'Failed to make Elasticsearch request. Got exception: {exc}')
{
"detail": "Failed to make Elasticsearch request. Got exception: RequestError(400, 'search_phase_execution_exception', 'Failed to parse query [(org:)]')"
}
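As a rough illustration of the reply above, the sketch below shows how a dispatch-level wrapper could turn a backend failure into the `detail` payload shown. It is not the project's code: `BackendError`, `dispatch`, and the 400 status code are hypothetical stand-ins, and only the message format is taken from the thread.

```python
class InvalidQuery(Exception):
    """Hypothetical stand-in for the project's InvalidQuery exception."""


class BackendError(Exception):
    """Hypothetical stand-in for an error raised by the Elasticsearch client."""


def dispatch(handler):
    """Run a view handler and map backend failures to an error payload.

    Returns a (status_code, body) tuple. The 400 status is an assumption.
    """
    try:
        return 200, handler()
    except BackendError as exc:
        # Same message format as the line quoted in the review thread.
        error = InvalidQuery(
            f"Failed to make Elasticsearch request. Got exception: {exc}"
        )
        return 400, {"detail": str(error)}
```

Because the wrapper sits around the whole handler, a failure in any of the repeated search_after requests would surface through the same path as a failure in the first one.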
PROD-4233

Adds a new SearchAfterMixin, to be used in place of PkSearchableMixin, that allows using search_after. Once this mixin is used, it bypasses the default search limit of 10k by making multiple calls to ES when an index holds more than 10k records.

Previously, we faced an issue with the search limit that resulted in fewer records being returned. We increased MAX_RESULT_WINDOW before, but a better way is to use the search_after capability for an optimal and flexible result.

This PR also adds a v2 CatalogQueryContainsViewSet that fixes the querying mechanism by filtering the items at the time the queries are executed on ES. In v1, our CatalogQueryContainsViewSet first searched all the records and then filtered them.

Testing Instructions:
Run update_index locally in Discovery shell.
Use the /api/v1/catalog/query_contains/ endpoint and add a sample query like this: http://localhost:18381/api/v2/catalog/query_contains/?course_uuids=2de67490-f748-4efd-8532-b445f7ecc6f9,f9f1e100-668a-4fd5-a966-a127de1f69de&query=org:edX

You can set ELASTICSEARCH_DSL_QUERYSET_PAGINATION to a specific value in order to test the behavior. The end result will be the same, but the value affects the number of times the search_after mechanism is called.
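The search_after loop described above can be sketched roughly as follows. This is a minimal illustration and not the mixin's implementation: `fake_search` is a hypothetical in-memory stand-in for a single ES request, documents are assumed to be sorted by a unique `id` field, and `page_size` plays the role of ELASTICSEARCH_DSL_QUERYSET_PAGINATION.

```python
def fake_search(index, size, search_after=None):
    """Hypothetical stand-in for one ES request.

    Returns up to `size` documents sorted by 'id', starting strictly
    after the sort value carried in `search_after`.
    """
    ordered = sorted(index, key=lambda doc: doc["id"])
    if search_after is not None:
        ordered = [doc for doc in ordered if doc["id"] > search_after[0]]
    return ordered[:size]


def fetch_all(index, page_size):
    """Collect every document by repeating the request with a cursor.

    Returns (documents, number_of_requests).
    """
    results = []
    search_after = None
    requests = 0
    while True:
        page = fake_search(index, page_size, search_after)
        requests += 1
        results.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means nothing is left to fetch.
            break
        # The sort value of the last hit becomes the next cursor.
        search_after = [page[-1]["id"]]
    return results, requests
```

With 25 documents, `fetch_all(docs, 10)` and `fetch_all(docs, 5)` return the same 25 documents; only the number of requests differs, which matches the note above that the pagination setting changes how many times the mechanism is called rather than the end result.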