[Feature Request] Improve the performance of numeric range queries #18334
Comments
Hey @kkewwei, here is a similar issue about extending approximation to other numeric types, #14406 (comment), which I'm looking into. I also agree it would be more efficient to limit the visit to `size` (default 10). I'm about to open an issue about boolean approximation: any boolean query with a `filter` (and `must_not`) clause produces a `ConstantScore` query, for example the following query:
It can be rewritten to … and can then be approximated.
Thanks
@prudhvigodithi Regarding the third point, I'm not sure I follow. For multi-condition queries within a `filter` clause, approximating document IDs (docIDs) for a range query seems challenging. For the above example, suppose a … I've also pondered this issue. Perhaps, instead of traversing all docIDs in the BKD tree upfront, we could represent them with an iterator and traverse them lazily, only as needed.
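The iterator idea in this comment can be sketched roughly as follows. This is a hypothetical, self-contained Java illustration: `LazyLeafDocIdIterator` is a made-up name, and plain `int[]` arrays stand in for BKD leaf blocks (it does not use Lucene's `PointValues` API). It shows that a consumer needing only a few docIDs causes only the leaves it actually reaches to be loaded.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch: iterate docIDs leaf by leaf instead of materializing
 * every matching docID upfront. Leaves are modeled as int arrays; this is
 * an illustration only, not Lucene or OpenSearch code.
 */
public class LazyLeafDocIdIterator implements Iterator<Integer> {
    private final Iterator<int[]> leaves; // stand-in for BKD leaf blocks
    private int[] current = new int[0];
    private int pos = 0;
    private int leavesLoaded = 0;

    public LazyLeafDocIdIterator(List<int[]> leaves) {
        this.leaves = leaves.iterator();
    }

    @Override
    public boolean hasNext() {
        // Load the next leaf only when the current one is exhausted (skips empty leaves).
        while (pos >= current.length && leaves.hasNext()) {
            current = leaves.next();
            pos = 0;
            leavesLoaded++;
        }
        return pos < current.length;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current[pos++];
    }

    /** Number of leaves touched so far; useful to observe the laziness. */
    public int leavesLoaded() {
        return leavesLoaded;
    }

    public static void main(String[] args) {
        List<int[]> leaves = new ArrayList<>();
        leaves.add(new int[] {1, 4, 7});
        leaves.add(new int[] {9, 12});
        leaves.add(new int[] {15, 20, 21});

        LazyLeafDocIdIterator it = new LazyLeafDocIdIterator(leaves);
        // A consumer that needs only 4 docIDs stops early; the third leaf is never loaded.
        int taken = 0;
        while (taken < 4 && it.hasNext()) {
            System.out.println(it.next());
            taken++;
        }
        System.out.println("leaves loaded: " + it.leavesLoaded()); // leaves loaded: 2
    }
}
```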
That's true, @kkewwei; we should come up with a few ideas on how to do a safe approximation for bool queries. We can brainstorm in this issue and later move that to a separate issue for bool query approximation.
When @harshavamsi and I discussed the idea of … I'm a bit reluctant to change that behavior, since it impacts the user experience.
To add to this, we discovered there's `shortcutTotalHitCount` logic for `MatchAllQuery` in a bug-fix PR, so we can just visit `size` docs and stop very early. But for a bool query, we still have to return 10k, because there's no shortcut to determine the total hits (see `OpenSearch/server/src/main/java/org/opensearch/search/query/TopDocsCollectorContext.java`, lines 712 to 763 in a6eb368).
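The trade-off described in this comment can be sketched as a simple visit budget. This is a hypothetical Java illustration, not OpenSearch code: `docsToVisit`, `collect`, and the `canShortcutTotalHits` flag are made-up names, and a flat array of values stands in for the BKD tree. It shows collection stopping after `size` docs when the total hit count is available by a shortcut, and otherwise continuing up to `trackTotalHitsUpTo`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide how many matching docs an approximated range
 * query must collect, then stop as soon as that budget is met.
 */
public final class BudgetedRangeCollector {
    private BudgetedRangeCollector() {}

    /**
     * If total hits can be obtained by a shortcut (as discussed for
     * MatchAllQuery), only the top `size` docs are needed. Otherwise keep
     * collecting so the hit count stays accurate up to `trackTotalHitsUpTo`.
     */
    static int docsToVisit(int size, int trackTotalHitsUpTo, boolean canShortcutTotalHits) {
        return canShortcutTotalHits ? size : Math.max(size, trackTotalHitsUpTo);
    }

    /**
     * values[doc] is the indexed value of each doc (a stand-in for the BKD
     * tree). Returns matching docIDs in doc order, stopping at `budget`.
     */
    static List<Integer> collect(long[] values, long lower, long upper, int budget) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < values.length && hits.size() < budget; doc++) {
            if (values[doc] >= lower && values[doc] <= upper) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] values = new long[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 1_000; // every doc falls inside [0, 999]
        }
        int size = 10;
        int trackTotalHitsUpTo = 10_000;

        int withShortcut = docsToVisit(size, trackTotalHitsUpTo, true);
        int withoutShortcut = docsToVisit(size, trackTotalHitsUpTo, false);

        System.out.println(collect(values, 0, 999, withShortcut).size());    // 10
        System.out.println(collect(values, 0, 999, withoutShortcut).size()); // 10000
    }
}
```

Using `Math.max` keeps at least `size` docs even if `trackTotalHitsUpTo` is configured lower than `size`.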
@kkewwei I'm not sure I follow the idea behind this point; could you explain a bit more?
@bowenlan-amzn Yes. When the number of matched terms exceeds 10,000, the
Yes, I found another bug in …
@kkewwei FYI, here is the PR where the bug was fixed to include the shortcut: https://github.com/opensearch-project/OpenSearch/pull/18189/files#diff-53173a30404a65ce7f35073d65258143cfa2d47947ff8e0817337ff3d37e01f3. As part of this, the …
Is your feature request related to a problem? Please describe
In #13788, we introduced `ApproximatePointRangeQuery` for the `long` and `date` types. However, several potential optimizations remain to be explored:

1. `ApproximatePointRangeQuery` is only used for the `long` type, not for `integer`, `double`, or `float`. We can extend `ApproximatePointRangeQuery` to those numeric types.
2. `ApproximatePointRangeQuery` can't be used when the standard range DSL is simply wrapped in an outer `bool` query.
3. Extend `ApproximatePointRangeQuery` to term queries. For low-cardinality fields, this could yield better optimization results.
4. In `ApproximatePointRangeQuery`, we visit at least `trackTotalHitsUpTo` (default 10k) docs in the BKD tree. It may be more efficient to limit the visit to `size` (default 10) to reduce unnecessary overhead in the non-scoring case.

Describe the solution you'd like
no
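For the first point, extending to other numeric types may be eased by how point values are encoded: Lucene indexes `int`, `long`, `float`, and `double` points as fixed-width bytes whose unsigned lexicographic order matches numeric order, so byte-level range logic need not know the original type. The sketch below is a standalone Java illustration of that sign-flip encoding (the same idea Lucene's `NumericUtils` uses, but not its code). `SortableBytes` and its methods are hypothetical names, and it assumes the approximation compares encoded bytes rather than decoded values.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: encode numbers into bytes whose unsigned
 * lexicographic order matches numeric order.
 */
public final class SortableBytes {
    private SortableBytes() {}

    static byte[] encodeInt(int v) {
        int x = v ^ 0x80000000; // flip the sign bit so negatives sort first
        return new byte[] {(byte) (x >>> 24), (byte) (x >>> 16), (byte) (x >>> 8), (byte) x};
    }

    static byte[] encodeFloat(float f) {
        int bits = Float.floatToIntBits(f);
        bits ^= (bits >> 31) & 0x7fffffff; // for negatives, invert magnitude bits
        return encodeInt(bits);
    }

    static byte[] encodeLong(long v) {
        long x = v ^ 0x8000000000000000L; // flip the sign bit
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) { // big-endian
            out[i] = (byte) x;
            x >>>= 8;
        }
        return out;
    }

    static byte[] encodeDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        bits ^= (bits >> 63) & 0x7fffffffffffffffL; // for negatives, invert magnitude bits
        return encodeLong(bits);
    }

    public static void main(String[] args) {
        double[] ds = {-2.5, -0.0, 0.0, 1.0, 3.75};
        for (int i = 0; i + 1 < ds.length; i++) {
            // Unsigned byte comparison agrees with numeric order (-0.0 sorts before 0.0).
            int cmp = Arrays.compareUnsigned(encodeDouble(ds[i]), encodeDouble(ds[i + 1]));
            System.out.println(ds[i] + " sorts before " + ds[i + 1] + ": " + (cmp < 0));
        }
    }
}
```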
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response