SOLR-15437: ReRanking/LTR does not work in combination with custom sort and SolrCloud #151
Conversation
The basic idea is to have two queues in the QueryComponent: first collect the results of the reranked docs, then place the other docs after them.
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
queue.insertWithOverflow(shardDoc);
if(reRankQueue != null && docCounter++ <= reRankDocsSize) {
  ShardDoc droppedShardDoc = reRankQueue.insertWithOverflow(shardDoc);
  // FIXME: Only works if the original request does not sort by score
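For readers skimming the diff fragment above, here is a self-contained sketch of the two-queue routing it implements. This is a sketch under assumptions: the collector class, its constructor and the drop-out handling are illustrative, and only ShardDoc, insertWithOverflow and the field names come from the diff; the queues are stand-ins with Lucene PriorityQueue semantics.

import org.apache.lucene.util.PriorityQueue;
import org.apache.solr.handler.component.ShardDoc;

// Illustrative only -- not the PR's QueryComponent code.
class TwoQueueCollectorSketch {
  private final PriorityQueue<ShardDoc> queue;       // ordered by the request's sort
  private final PriorityQueue<ShardDoc> reRankQueue; // ordered by reranked score
  private final int reRankDocsSize;
  private int docCounter = 0;

  TwoQueueCollectorSketch(PriorityQueue<ShardDoc> queue,
                          PriorityQueue<ShardDoc> reRankQueue,
                          int reRankDocsSize) {
    this.queue = queue;
    this.reRankQueue = reRankQueue;
    this.reRankDocsSize = reRankDocsSize;
  }

  void collect(ShardDoc shardDoc) {
    if (reRankQueue != null && docCounter++ <= reRankDocsSize) {
      // insertWithOverflow returns the evicted element once the queue is full
      ShardDoc droppedShardDoc = reRankQueue.insertWithOverflow(shardDoc);
      if (droppedShardDoc != null) {
        queue.insertWithOverflow(droppedShardDoc); // demote into the original-sort queue
      }
    } else {
      queue.insertWithOverflow(shardDoc);
    }
  }
}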
The current solution only works if the original sort did not sort by score.
This is because the score of the documents is overwritten during the reRanking.
The reRankDocs param, which specifies the number of docs that should be reRanked, is used per shard, but also has to be applied while combining the results.
Therefore, each shard response may contain documents which were reRanked on the shard but should not be reRanked in the combined result.
Example:
reRankDocs = 2
shard1: doc_1 (score 200, reRanked), doc_2 (score 100, reRanked, original score 40), doc_3 (score 30)
shard2: doc_4 (score 300, reRanked), doc_5 (score 50, reRanked, original score 25), doc_6 (score 20)
expected result:
doc_4 (score 300, reRanked), doc_1 (score 200, reRanked), doc_2, doc_3, doc_5, doc_6
actual result:
doc_4 (score 300, reRanked), doc_1 (score 200, reRanked), doc_2, doc_5, doc_3, doc_6
The problem is that we compare the score after reRanking (docs 2 & 5) with the score before reRanking (docs 3 & 6).
We have no access to the score before reRanking at this point, and I currently see no way to retrieve it again.
Depending on the reRanking algorithm used, these scores may differ greatly, which results in an incorrect ordering of the results starting at positions > reRankDocs.
<thinking out aloud>
... The reRankDocs-param which specifies the amount of docs that should be reRanked, is used per shard, but also has to be applied while combining the results. ...
Example: ... reRankDocs = 2 shard1: ... shard2: ...
Might it be helpful to include in the example the original score for doc_1 and doc_4 and to have a "pre-example" of what the logic and outcome is if the collection was single sharded and a "post-example" for (say) 4-sharded?
Can distributed, sharded re-ranking arrive at exactly the same outcome? Might the "but also has to be applied while combining the results" be up-for-debate e.g. a reRankDocs on shard level and a (possibly different) foobarReRankDocs on the federation level?
... We have no access to the score before reRanking at this point and I currently see no possibility to retrieve it again. ...
Conceptually, if a feature store had an OriginalScoreFeature and fl=[fv] feature values were requested, then in terms of "transport" the score before reRanking could be available in the federator. As a proof-of-concept that could be explored by making sure the store has the feature and the fl asks for the feature values. As a real implementation it could be a bit messy to capture, send and use the score that way. And actually, hmm, the score might be available via the fl through the "get fields" part of the distributed use of the query component, but if/since the id merging happens before "get fields", then no original score is available during merging.
</thinking out aloud>
The current solution only works if the original sort did not sort by score. ...
... We have no access to the score before reRanking at this point and I currently see no possibility to retrieve it again. ...
I had some more ideas around this; it's a semi-structured trail of thoughts, but I'm sharing it in that form in the hope that it might be useful and/or trigger further ideas and thoughts.
So as indicated by the location of the // FIXME: Only works if the original request does not sort by score comment, the problem happens when a document drops out of reRankQueue (where it's compared against other reranked documents via the reranked score) and joins queue (where it needs to be compared against other (reranked or unreranked) documents via the sort clause).
One way to avoid this is to ensure dropping out never happens but that is a change to the overall reranking logic and so for now here let's assume that is not the logic we want ...
OriginalScoreFeature via fl=[fv] providing an original score is too late timing wise since it happens during "get fields" after the "merge ids".
ShardParams.DISTRIB_SINGLE_PASS "distrib.singlePass" could be used to bring timing forward but that would have other performance etc. implications.
Implying "distrib.singlePass=true" and implying fl=[fv] and an OriginalScoreFeature that might not have been there in the model or feature store would also be messy both on shard and federator code level.
All that aside, this is a reranking problem generally, i.e. not an LTR reranking problem specifically, i.e. OriginalScoreFeature and fl=[fv] is too specific a solution.
So let's leave the merging logic here alone for a moment and go to the shard code and determine where exactly the "original_score" information is lost. If we find where it's lost then we can preserve it there and pass it along via the code path that the "score" information takes and eventually it would reach the merging logic here, at least that's the theory.
The score changes during rescoring. In the case of LTR rescoring, https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java#L121-L131 allocates reranked, which is then filled from firstPassResults, i.e. structurally this is where the information loss happens. For non-LTR rescoring something equivalent presumably happens elsewhere.
For thinking purposes let's assume that we could add an original_score field to ScoreDoc itself, i.e. at line 210 we could preserve the score:
scorer.getDocInfo().setOriginalDocScore(hit.score);
+ hit.original_score = hit.score;
hit.score = scorer.score();
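As a self-contained illustration of that capture point: the helper below and its parallel-array return are assumptions for illustration, not LTRRescorer's actual code.

import java.util.function.ToDoubleFunction;
import org.apache.lucene.search.ScoreDoc;

final class RescoreSketch {
  // Capture each first-pass score before the second pass overwrites it; the
  // returned array is exactly the information that is lost today and would
  // still need transporting from shard to federator.
  static float[] rescorePreservingOriginal(ScoreDoc[] firstPassHits,
                                           ToDoubleFunction<ScoreDoc> secondPass) {
    float[] originalScores = new float[firstPassHits.length];
    for (int i = 0; i < firstPassHits.length; i++) {
      originalScores[i] = firstPassHits[i].score;
      firstPassHits[i].score = (float) secondPass.applyAsDouble(firstPassHits[i]);
    }
    return originalScores;
  }
}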
Next let's explore how a distributed query's scores get from the shard to the federator.
(Shoutout to Hoss' "Lifecycle of a Solr Search Request" talk at this point e.g. http://people.apache.org/~hossman/rev2017/ slides.)
Somewhat jumping into the middle of the code QueryComponent L368 and L1512
QueryResult result = new QueryResult();
...
rb.setResult(result);
QueryResult contains DocListAndSet, which contains DocList and DocSet. DocSlice implements DocList, and one of the places (there might be others) where ScoreDoc turns into DocSlice is at https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1623
So for thinking purposes we have added an original_score field to ScoreDoc already, and DocSlice contains an optional scores list, so let's imagine adding an optional original_scores list alongside it: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/core/src/java/org/apache/solr/search/DocSlice.java#L39
public class DocSlice extends DocSetBase implements DocList {
...
final int[] docs; // a slice of documents (docs 0-100 of the query)
final float[] scores; // optional score list
+ final float[] original_scores; // optional original_score list
final long matches;
final TotalHits.Relation matchesRelation;
...
Now then though, how will that new original score list get "transported" from shard to federator?
Using an unusual(?) trick here: instead of navigating the code via DocSlice.scores we could navigate via DocSlice.matchesRelation on the assumption that it leads to the same area but more easily.
One of the areas is https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java#L189-L197 and there the res.getReturnFields() leads to https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/solr/core/src/java/org/apache/solr/search/SolrReturnFields.java#L51-L53
public class SolrReturnFields extends ReturnFields {
// Special Field Keys
public static final String SCORE = "score";
and the speculation that perhaps a SolrReturnFields.ORIGINAL_SCORE might be needed.
That's where the trail of thoughts led me.
Of course, it is too early to reach any conclusions here, but going intuitively and based on the code trails above it seems to me that getting access to the original_score is feasible but would be quite an undertaking. So before investing effort in that direction it could be helpful to first further consider, and firmly rule out, any overall reranking logic that requires only the already available information: e.g. for shards to use reRankDocs, and for the federator whilst combining results to not use reRankDocs but to combine all shards' reranked documents separately in a reRankQueue that has the same sizing as queue (documents dropping out of reRankQueue then need not be added to queue, since their dropping out means there are already sufficient documents in reRankQueue to serve the request).
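To make that alternative concrete, a hedged sketch of the federator-side merge under that logic. The queue types are stand-ins with Lucene PriorityQueue semantics, and the "first reRankDocs docs in each shard response are the reranked ones" assumption is discussed further below.

import org.apache.lucene.util.PriorityQueue;
import org.apache.solr.handler.component.ShardDoc;

// Illustrative only: reRankQueue is sized like queue, so docs overflowing out
// of it are simply discarded rather than demoted -- enough reranked docs
// remain in reRankQueue to serve the request.
class FederatorMergeSketch {
  void merge(Iterable<ShardDoc> shardDocs, int reRankDocsPerShard,
             PriorityQueue<ShardDoc> reRankQueue, PriorityQueue<ShardDoc> queue) {
    for (ShardDoc doc : shardDocs) {
      if (doc.orderInShard < reRankDocsPerShard) {
        reRankQueue.insertWithOverflow(doc); // compared via reranked score
      } else {
        queue.insertWithOverflow(doc);       // compared via the request's sort
      }
    }
    // final results: drain reRankQueue first, then fill any remainder from queue
  }
}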
I hope that is not too opinionated a way of sharing one perspective and that some bits of it are useful? This distributed reranking stuff is an intriguing problem! :-)
I really appreciate how detailed you share your thoughts!
This is extremely helpful to discover different ways to solve this problem.
First regarding the original score:
Personally, I like the idea of adding the value to the ScoreDoc, but that would entail a PR to Lucene, and we would have to wait for a newer version to use that change in Solr. Like you already mentioned, that sounds like quite an undertaking.
Moving on to the problem with sort by score:
e.g. for shards to use reRankDocs and for the federator whilst combining results to not use reRankDocs but to combine all shards' reranked documents separately in a reRankQueue that has the same sizing as queue (and then documents dropping out of reRankQueue need not be added to queue since them dropping out means there's already sufficient documents in reRankQueue to serve the request).
I struggle to understand your idea.
First I understood that you want to collect the returned documents before reranking in one queue (queue A) and the reranked documents in another queue (queue B). Then we could take all the documents in queue A first, remove them from queue B, and add the results from queue B after the results from queue A.
But the reranking happens on the shards and we are on the coordinator (in mergeIds), so we only have the already "mixed" (reranked and not reranked) results from the shards.
Then I thought, maybe you want the shards to return multiple result lists? One with the reranked documents and another one with the results before reranking?
Then we could first merge the results from the reranked lists (= take the top ones), remove these documents from the queues that contain the results before reranking and then merge these results to append them after the reranked docs.
I did not look into the code yet but that feels like quite a big task.
What do you think about deferring both problems (making the original score visible and fixing the reranking with sort by score) to another PR? So that we first make LTR with a cloud setup possible at all, and in a next step fix the problem with sort=score?
I know that the sort by score is the default, but I am also positive that there are many people with custom sorting that could already benefit from the first part (this PR without fix for score sort).
We could also aim to finish that by 8.9 maybe?
If I did not understand your last idea correctly, please let me know. And please be aware that I do not want to create pressure by suggesting the exclusion of the fix from this PR. :)
... but that would entail a PR to Lucene and we would have to wait for a newer version ...
Maybe, maybe not. In terms of thinking about the code it is perhaps easiest to imagine ScoreDoc having an additional original_score element, but in terms of writing the code it might not necessarily work out that way, e.g. instead imagine within Solr
public class ScoreDocWithOriginalScore extends org.apache.lucene.search.ScoreDoc {
  public float original_score;

  // sketch: constructor spelled out here for completeness
  public ScoreDocWithOriginalScore(int doc, float score, float original_score) {
    super(doc, score);
    this.original_score = original_score;
  }
}
or something similar and for that class to be used if and only if needed thus reducing impact of the changes on code paths that don't need the change. But still even then there'd be quite a bit of Solr code to cover to build confidence that the original score makes it to all the right places and isn't somehow lost somewhere.
... to collect the returned documents before reranking in one queue (queue A) and the reranked documents in another queue (queue B). ...
... But the reranking happens on the shards and we are on the coordinator (in mergeIds), so we only have the already "mixed" (reranked and not reranked) results from the shards. ...
Yes, the idea was to collect reranked and not-reranked documents in separate non-overlapping queues, and it assumed that within our "mixed" results we can already tell apart the reranked and not-reranked results, i.e. within each shard's response the first reRankDocs documents will be reranked and documents (if any) after that are not-reranked, i.e. the i < reRankDocsSize condition. I have not verified if that assumption is accurate, but here's a short example to explore the idea, noting that in the dual-sharded collection the client application chooses a halved reRankDocs on the understanding that the collection has two shards.
single sharded collection
shard1: a b c d e f g h i j x y z # without reranking
shard1: J I H G F E D C B A x y z # with reRankDocs=10 (assuming reranking reverses order of reranked docs and uppercases them)
overall results: J I H G F E D C B A x y z
dual sharded collection
shard1: a c e g i y # without reranking
shard2: b d f h j x z # without reranking
shard1: I G E C A y # with reRankDocs=5 (assuming reranking reverses order of reranked docs and uppercases them)
shard2: J H F D B x z # with reRankDocs=5 (assuming reranking reverses order of reranked docs and uppercases them)
reRankedQueue: J I H G F E D C B A
queue: x y z
overall results: J I H G F E D C B A x y z
... What do you think about excluding ... to another PR? ... people with custom sorting that could already benefit from the first part ...
Yes, that possibility had occurred to me too :-)
In principle I think a fix for half of the issue could be better than no fix:
- if it can be clearly documented/communicated what is fixed and what is not fixed
- if existing behaviour is maintained for the 'not (yet) fixed' scenario
- if we anticipate that the fix for the second half is not going to break/change things for anyone who benefitted from the first fix
In terms of the specifics here, the "fixed" condition would be that "sort" did not use "score" in any way, i.e. sort=popularity desc is "fixed" but both sort=score desc and sort=popularity desc, score desc are "not fixed" as yet. I wonder if QueryComponent and SortedHitQueueManager could accommodate that logic so that we would maintain existing behaviour for the not-yet-fixed use cases?
... If I did not understand your last idea correctly, please let me know. ...
Thanks for identifying which bits were and weren't clear! I've tried to clarify with the example and by making the key assumption explicit. Please let me know if that was useful or not. I think that approach would solve both the without-score and the with-score scenario.
... And please be aware that I do not want to create pressure by suggesting the exclusion of the fix from this PR. :)
I did not perceive any pressure here at all but appreciate your openness and clarity w.r.t. no pressure being intended, thank you. :)
Hi @cpoerschke please excuse the delay, I am very sorry!
I just found time to look at your commit.
The tests fail because they assume that the number of reRankDocs is applied to the whole result, whereas your change applies it to each shard.
So with e.g. 3 shards and reRankDocs=2 we expect the first 2 docs to be reranked, but instead 6 docs get reranked. Then the assertion on sort by id fails.
(Actually, the 6 is variable. I ran tests where 5 docs were reRanked in total, because shard 3 only had one doc.)
Because the client cannot (and should not have to) know how many shards the cluster has, we would have to convert the reRankDocs that were passed in the {!ltr...} query to a per-shard value. So basically <value that was passed in> / numShards.
However, I don't think that we can convert the global setting for reRankDocs into a per-shard setting, because we cannot know how the top reRankDocs are distributed across the shards.
So when the client wants 6 docs to be reRanked and we have 3 shards, we cannot tell each shard that it should reRank the first 2 documents.
This is because the top 6 reRanked docs may all be on one shard.
That was our reasoning behind the droppedShardDoc logic.
So referring to your examples (thank you for the visualization idea! :) ):
dual sharded collection with reRankDocs=6 (3 per shard)
shard1: a c e g i y # without reranking
shard2: b d f h j x z # without reranking
shard1: E C A g i y
shard2: F D B h j x z
reRankedQueue: E C A F D B # assuming that all scores are higher on shard 1
queue: g i y h j x z
overall results: E C A F D B g i y h j x z
But the top 6 docs by their reRanked score could be:
dual sharded collection with reRankDocs=6 (also applied on shard level)
shard1: a c e g i y # without reranking
shard2: b d f h j x z # without reranking
shard1: Y I G E C A
shard2: X J H F D B z
reRankedQueue: Y I G E C A # assuming that all scores are higher on shard 1
queue: b d f h j x z
overall results: Y I G E C A b d f h j x z
That's a bit sad, because if we could just convert the global param for the reRankDocs to a per-shard one, we would not have the problem with the missing originalScore for documents that were reRanked on a shard but not in the whole result. :/
Hi @cpoerschke please excuse the delay, I am very sorry!
Hi @tomglk, no worries at all and nothing to be sorry about, we all contribute here as and when time permits. :)
I just found time to look at your commit.
Thank you for taking a look at the commit!
The tests fail because they assume, that the number of reRankDocs is applied to the whole result. Your change applies it to each shard. ...
... Because the client cannot (and should not have to) know, how many shards the cluster has ...
... if the reRankScore for all docs on shard2 is higher than the score for each reRanked doc on shard1. ...
Yes, very well put, so this all shows then that (with my change applied) someone (Solr or the client) doing reRankDocs / numShards could give different results than reRankDocs for a single-sharded collection.
That's a bit sad, ...
Yes it is.
Hmm, so let's maybe then loop back to and continue with your idea of a multi-part solution? I.e. this pull request to fix use cases with a score-free sort, and future pull request(s) for use cases whose sort contains a score (and which therefore require access to the original score in the coordinator, where it is currently not available).
Hi @cpoerschke ,
thank you for the reassurance. :)
I just found the timing quite unlucky, because we were suggesting an inclusion of this PR in 8.9 just before the "break", and I have no idea when 8.9 may be released.
So I cannot estimate the time pressure we would be under if we decided to split the PRs and aim for the 8.9 release.
Do you have any information on this? Maybe just a broad timeframe like 2 weeks / 2 months?
Having said that,
maybe then loop back to and continue with your idea of a multi-part solution? I.e. this pull request to fix for use cases with a score-free sort and future pull request(s) for use cases whose sort contains a score (and which therefore requires access to the original score in the coordinator but where it is currently not available).
I think that this approach would make the most sense.
We were looking into different methods to gain access to the original score during the week but do not have a good solution yet.
Just your point
if existing behaviour is maintained for the 'not (yet) fixed' scenario
is something we would have to look at again. Currently both the existing behavior and the sort by score with our fix are broken, but I don't think that they are broken in the same way.
And this point
if it can be clearly documented/communicated what is fixed and what is not fixed
really is important. I do not know where we would communicate this best, but we definitely should make really clear what was not working, what is working after this PR, and what will need another PR to work.
This
fix for the second half is not going to break/change things for anyone who benefitted from the first fix
should be no problem. The first PR would only work for people with sort != score, and the second PR would only provide benefit for people with sort == score. Returning information for the original score also should not break anything for people benefiting from this PR, because that is just a bit more information that can, but does not have to, be used.
Regarding
future pull request(s)
I think that access to the original score and allowing the sort by score are very intertwined. So as soon as we solve access to the score, sorting by score is no longer a big problem. Therefore this could be one PR, in my opinion.
... I have no idea when 8.9 may be released. ... Do you have any information on this? Maybe just a broad timeframe like 2 weeks / 2 months? ...
As per https://lists.apache.org/thread.html/rc9b83cd6c1608088365a911b40fa98f0775a00b21ae1491725d63ebc%40%3Cdev.lucene.apache.org%3E the 8.9 release process is now underway and from the "Release Lucene/Solr 8.9.0 should we have it soon" mailing list thread subject i.e. https://lists.apache.org/thread.html/r6bde6e242a0801a63386dc3b7efb10d77038271cd9c69dc07f9c76a1%40%3Cdev.lucene.apache.org%3E my understanding is that this is for both Lucene and Solr.
I hope that's not a disappointment; perhaps I could have provided more clarity w.r.t. likely 8.9 timelines earlier here, sorry!
Perhaps "we are targeting 8.10" rather than "we just missed 8.9" could be another way to look at it?
W.r.t. a broad timeframe for 8.10 (assuming there will be an 8.10 before 9.0) -- I don't know, but historically (e.g. http://archive.apache.org/dist/lucene/solr/) there's usually a release every couple of months, though it depends (we have documentation on this somewhere which explains it better but I can't find it right now). Hope that helps.
"we are targetting 8.10" rather than "we just missed 8.9"
I like that mindset!
That also means that we are not under time pressure, and we can aim to solve the original score problem by then, to be able to solve all problems in one release.
So no worries! :)
This looks very interesting and the narrative description of the approach in the solution part of the pull request really helped me find a quick way into reading of the code changes, thank you!
…as used to keep the shortcut logic. Instead, make it configurable via constructor param.
…eues to SortedHitQueueManager to remove complexity from mergeIds( )
…fl, because we need it for sorting
if(rb.getRankQuery() != null && rankQuery instanceof AbstractReRankQuery){
... this pull request to fix for use cases with a score-free sort and future pull request(s) for use cases whose sort contains a score ...
I think something like this could work:
boolean hasScoreSortField = false;
for (SortField sortField : sortFields) {
  if (SortField.FIELD_SCORE.equals(sortField)) {
    hasScoreSortField = true;
    break;
  }
}
if(rb.getRankQuery() != null && rankQuery instanceof AbstractReRankQuery && hasScoreSortField == false){
What do you think? (And no time pressure or rush at all to consider this idea, I just wanted to post it here today to perhaps help us shift away from the "bit sad" conclusion of the "The current solution only works if ..." conversation above.)
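For reference, a standalone illustration of the detection logic in the suggestion above. The "popularity" field is a made-up example, and the assumption is that a sort on score parses to Lucene's SortField.FIELD_SCORE:

import org.apache.lucene.search.SortField;

public class ScoreSortCheck {
  public static void main(String[] args) {
    SortField scoreDesc = SortField.FIELD_SCORE; // what a sort on score yields
    SortField popularityDesc = new SortField("popularity", SortField.Type.LONG, true); // sort=popularity desc
    // true -> score is in the sort, fall back to the existing single-queue behaviour
    System.out.println(SortField.FIELD_SCORE.equals(scoreDesc));
    // false -> score-free sort, the dual-queue fix applies
    System.out.println(SortField.FIELD_SCORE.equals(popularityDesc));
  }
}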
Hi @cpoerschke ,
because your solution looked good to me, I added it in df6f30a. :)
I also added some comments about why we do that and linked to this PR.
Would you agree that this now finishes the scope of this PR?
I think this was the only ToDo left.
From my point of view the only thing left is to find a way of communicating what this PR solves and what is still broken. I do not know the right place for that, but maybe you have an idea?
Hi @tomglk,
Thanks for updating the branch so that the existing behaviour is maintained when the sort includes a score!
In terms of communicating the scope of the fix, I can see a few complementary possibilities:
- solr/CHANGES.txt entry for the JIRA issue: if this is very specific, e.g. "custom sort without a score" rather than only "custom sort" wording, then that could convey the scope and non-scope of the fix.
- Solr Ref Guide: https://solr.apache.org/guide/8_9/query-re-ranking.html appears to not yet mention distributed queries, so there's an opportunity there to document limitations or caveats, e.g. https://solr.apache.org/guide/8_9/result-grouping.html#distributed-result-grouping-caveats does something similar.
- code javadocs: I've pushed an exploratory commit to the branch -- feel free to revert or to adapt, e.g. the (Single/Dual) SortedHitQueueManager name choice is probably too generic -- and in it the javadocs for the DualSortedHitQueueManager.supports method could provide a natural place to communicate what is and what is not supported, and the usage of the method when deciding the DualSortedHitQueueManager vs. SingleSortedHitQueueManager instantiation helps communicate structurally when existing behaviour is maintained, since SingleSortedHitQueueManager is a simple wrapper around ShardFieldSortedHitQueue.
... I think this was the only ToDo left. ...
The test coverage in solr/core remains on the ToDo list I think, though it's easy to miss since it's rather hidden amongst all the discussion detail that led us to this point.
Hi @cpoerschke ,
thank you for the different pointers for the documentation. :)
You are right, the test coverage still is a ToDo.
I like the approach of splitting the SortedHitQueueManager.
But I do not think that the QueryComponent should know when to create which SortedHitQueueManager, so I moved the newSortedHitQueueManager() into the manager itself and changed the interface to an abstract class. (bbf887f)
What do you think about that change?
But I do not think that the QueryComponent should know when to create which SortedHitQueueManager. So I moved the newSortedHitQueueManager() into the manager itself and changed the interface to an abstract class. (bbf887f) What do you think about that change?
Makes sense. And thanks for adding wonderful javadocs for the new classes too!
Hi @cpoerschke , we have a very rough first draft of a possible solution and created a draft PR ( #171 ) which is based upon the code from this PR. Apart from that, we would continue with the scope of only fixing the sort now, starting with applying your suggestion.
// we always need the score for reRanking
rb.setFieldFlags(rb.getFieldFlags() | GET_SCORES); |
909a25d proposes to continue to have only one setFieldFlags call.
That looks good to me! :)
One thing that I stumbled upon while looking at your change is that we actually override the flags instead of just adding the score, because we start with this:
int flags = 0;
A comment why we do this instead of starting with
int flags = rb.getFieldFlags();
would be great, I think.
Great point. How about something along these lines?
int flags = 0; // we presume to be first to set this flag and hereby override any flags other components' prepare methods may have already set
That's exactly what I was looking for, thank you. :)
I appreciate all your ideas for improved comments! I included them in the commit where I added your comment above (dc2b84f).
…ager and change interface to abstract class
import org.apache.lucene.search.SortField;

/**
 * This class is used to manage the multiple SortedHitQueues that we need during mergeIds( ).
minor:
* This class is used to manage the multiple SortedHitQueues that we need during {@link QueryComponent#mergeIds(ResponseBuilder, ShardRequest)}.
if (sortField.getType().equals(SortField.Type.SCORE) && sortField.getField() == null) {
  // using two queues makes reRanking on Solr Cloud possible (@see https://github.com/apache/solr/pull/151 )
  // however, the fix does not work if the non-reRanked docs should be sorted by score ( @see https://github.com/apache/solr/pull/151#discussion_r640664451 )
  // to maintain the existing behavior for these cases, we disable the broken use of two queues and use the (also broken) status quo instead
minor:
// to maintain the existing behavior for these cases, we do not support the broken use of two queues and continue to use the (also broken) status quo instead
if(shardDoc.orderInShard < reRankDocsSize) {
  ShardDoc droppedShardDoc = reRankQueue.insertWithOverflow(shardDoc);
  // Only works if the original request does not sort by score
  // @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-15437")
@AwaitsFix is usually used for tests I think, and using it in non-test code (albeit in a comment) could be confusing. Suggestion:
// see https://issues.apache.org/jira/browse/SOLR-15437
* In all other cases only one queue is used.
*
* @param sortFields the fields by which the results should be sorted
* @param ss SortSpec of the responseBuilder
* @param ss SortSpec
or perhaps we can remove the ss parameter then and do rb.getSort() instead?
…d; improve comments
I ran into this problem today -- I'm glad others are working on it. Unless I've misattributed this issue, I think this problem is not related to a "custom sort" -- one only needs to have a re-ranking boost function that could reduce the score from what it was. QueryComponent.mergeIds doesn't know how to deal with scores that don't descend when you're sorting by score (as is the default). The solution above is one approach and I think it's basically fine. It assumes that …
We're also affected by this bug, and first of all we're wondering: is anyone still working on these PRs? Furthermore, while the solution proposed in PRs 151/171 also covers many corner cases, e.g. uneven distributions of re-ranked docs across the shards, we were thinking about deploying a temporary, simpler fix for our local installation, perhaps a variant of what @dsmiley mentioned in his last comment: to adjust the scores of all re-ranked documents locally in the shards to make them larger than the largest original score of the documents before re-ranking. There is still the possibility that the original scores differ per shard, but since we also accept that for normal ranking (except for distributed IDF via global statsCache), we figured we could ignore that. We would also accept that the scores of all re-ranked docs are always better than the highest non-reranked score, and we'd set reRankDocs to be our intended total number divided by the actual number of shards we're using. However (not being a Solr expert), is there something obvious I'm missing here?
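For what it's worth, a minimal sketch of that workaround, assuming non-negative scores and that the first reRankDocs hits in a shard's result are the reranked ones; the class and method names are made up, this is not a Solr API:

import org.apache.lucene.search.ScoreDoc;

final class RerankScoreLiftSketch {
  // Shift every reranked score above the best first-pass score so the merged
  // per-shard streams still descend by score; the relative order among the
  // reranked docs is preserved.
  static void liftRerankedScores(ScoreDoc[] hits, int reRankDocs, float maxOriginalScore) {
    int n = Math.min(reRankDocs, hits.length);
    for (int i = 0; i < n; i++) {
      hits[i].score += maxOriginalScore + 1.0f; // assumes non-negative reranked scores
    }
  }
}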
I had to stop due to a lack of time. However, I can try to manage collaborating with you to finally fix this.
Thanks a lot for the offer, @tomglk! I guess it will take us some time to get a full understanding of the issue(s) and the proposed solution. Perhaps we'll start by looking into the "simpler" fix proposed above and see if we can get that to run on our side -- that will probably already give us a better understanding of these issues. We'll definitely get back to you and are looking forward to discussions on this!
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!
This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
https://issues.apache.org/jira/browse/SOLR-15437
Description
We found out that a plain SolrCloud setup will return randomly sorted docs if you define a custom sort field or function. This problem is also mentioned on mailing lists (see the Jira issue) and has not really been fixed yet.
Solution
The basic idea is to fix this in the mergeIds method of the QueryComponent. A PriorityQueue was used to collect the results from all shards. We added another PriorityQueue next to the existing one. The new queue is used to collect the defined count of reRanked documents. After collecting all documents, the two queues have to be combined to fill the resultIds.
Be aware that the reRankDocs threshold is used per shard but also has to be applied across the whole result. To handle this, documents get removed from the reRankQueue and inserted into the normal queue if their score after reRanking does not place them under the top results defined by the reRankDocs.
The documents in the reRankQueue are sorted by the score (after reRanking). The original sort is applied to all documents in the original queue.
To be able to correctly sort the not-reRanked documents, we had to remove the shortcut of taking the orderInShard instead of comparing the sortValues from the ShardFieldSortedHitQueue. We extracted the functionality to the class ShardFieldSortedHitQueueWithSameShardCompareSkip, which is used for the reRankQueue or in cases without reRanking.
Tests
We've added two tests with custom sorting and reranking activated. One test reranks the whole result set and the other one reranks only a subset, to make sure that the custom sort is applied to docs that were not reranked.
Checklist
Please review the following and check all that apply:
- I have developed this patch against the main branch.
- I have run ./gradlew check.