-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix query condition regression introduced by #5417 #5461
Fix query condition regression introduced by #5417 #5461
Conversation
9fbc7a0
to
b8e615c
Compare
// no side effects. Internally we will modify `base::pos_` because | ||
// the comparators need that to get the right coordinate. | ||
// This is only a problem if this is called concurrently (and even then | ||
// it would have been a problem anyway). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean that it would have been a problem anyway? Before this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this comment is unclear. I wanted to declare it const
because it has no side effects if it is only called in a serializable way. But that requires the const_cast
because we have to temporarily modify state, hence the discussion of thread safety.
I have sympathies for the notion that because of the temporary state change this should not be declared const
. What I intended to highlight with this comment is that whether or not this is const
the thread safety issue exists, because the risk still exists that thread A saves and restores a temporary position used by thread B.
* @postcondition this method has no side-effects if it is never called | ||
* concurrently by multiple threads. If multiple threads call this method | ||
* concurrently then results may be undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for highlighting this. We need to be more aware that not all current reader steps are designed for concurrency and that they can silently cause undefined behavior.
/backport to release-2.27 (to fix the regression introduced over there) |
Started backporting to release-2.27: https://github.com/TileDB-Inc/TileDB/actions/runs/13393039545 |
@rroelke an error occurred while backporting to "release-2.27", please check the run log for details! Error: @rroelke is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your TileDB-Inc team membership visibility is set to Public on https://github.com/orgs/TileDB-Inc/people?query=rroelke |
/backport to release-2.27 (take 2) |
Started backporting to release-2.27: https://github.com/TileDB-Inc/TileDB/actions/runs/13394045510 |
@rroelke an error occurred while backporting to "release-2.27", please check the run log for details! Error: @rroelke is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your TileDB-Inc team membership visibility is set to Public on https://github.com/orgs/TileDB-Inc/people?query=rroelke |
/backport to release-2.27 (take 3) |
Started backporting to release-2.27: https://github.com/TileDB-Inc/TileDB/actions/runs/13394134690 |
#5417 made several changes to `SparseGlobalOrderReader::merge_result_cell_slabs`. One of these changes erroneously caused coordinates which do not pass the query condition to be emitted. This pull request restores the correct behavior. The crux of the change is ``` length = 1; ``` to ``` length = to_process.max_slab_length(1); ``` The affected line of code occurs during coordinate de-duplication. `merge_result_cell_slabs` puts coordinates from the leading tile of each fragment into a priority queue and pops them off. If the coordinate at the head of the queue matches what we just pulled off, then we continue popping until there is no longer a match. This de-duplication can cause a fragment to run out of loaded tiles, in which case it is necessary to end the loop to get more data so that we do not emit coordinates out of order. That's how we get to this `length = 1` code, which signals that we should emit a result slab of length 1. In contrast, `max_slab_length(1)` checks the query condition bitmap to see if the duplicated coordinate actually belongs in the query result. This was found by a customer data set which has many fragments which are essentially copies of the whole coordinate domain. This scenario de-duplicates many coordinates and is likely to hit the condition described. The new test `Sparse global order reader: fragment full copy 1d` simulates this set-up. A large fraction of the changes in this PR are the test scaffolding changes to support evaluating query conditions upon our `FxRun1D` and `FxRun2D` structures, so we can continue to check whether the expected and actual results match in the presence of a query condition. --- TYPE: BUG DESC: Fix query condition regression introduced by #5417 (cherry picked from commit 022af98)
#5417 made several changes to `SparseGlobalOrderReader::merge_result_cell_slabs`. One of these changes erroneously caused coordinates which do not pass the query condition to be emitted. This pull request restores the correct behavior. The crux of the change is ``` length = 1; ``` to ``` length = to_process.max_slab_length(1); ``` The affected line of code occurs during coordinate de-duplication. `merge_result_cell_slabs` puts coordinates from the leading tile of each fragment into a priority queue and pops them off. If the coordinate at the head of the queue matches what we just pulled off, then we continue popping until there is no longer a match. This de-duplication can cause a fragment to run out of loaded tiles, in which case it is necessary to end the loop to get more data so that we do not emit coordinates out of order. That's how we get to this `length = 1` code, which signals that we should emit a result slab of length 1. In contrast, `max_slab_length(1)` checks the query condition bitmap to see if the duplicated coordinate actually belongs in the query result. This was found by a customer data set which has many fragments which are essentially copies of the whole coordinate domain. This scenario de-duplicates many coordinates and is likely to hit the condition described. The new test `Sparse global order reader: fragment full copy 1d` simulates this set-up. A large fraction of the changes in this PR are the test scaffolding changes to support evaluating query conditions upon our `FxRun1D` and `FxRun2D` structures, so we can continue to check whether the expected and actual results match in the presence of a query condition. --- TYPE: BUG DESC: Fix query condition regression introduced by #5417 (cherry picked from commit 022af98)
#5417 made several changes to
SparseGlobalOrderReader::merge_result_cell_slabs
. One of these changes erroneously caused coordinates which do not pass the query condition to be emitted. This pull request restores the correct behavior.The crux of the change is
to
The affected line of code occurs during coordinate de-duplication.
merge_result_cell_slabs
puts coordinates from the leading tile of each fragment into a priority queue and pops them off. If the coordinate at the head of the queue matches what we just pulled off, then we continue popping until there is no longer a match.This de-duplication can cause a fragment to run out of loaded tiles, in which case it is necessary to end the loop to get more data so that we do not emit coordinates out of order. That's how we get to this
length = 1
code, which signals that we should emit a result slab of length 1. In contrast,max_slab_length(1)
checks the query condition bitmap to see if the duplicated coordinate actually belongs in the query result.This was found by a customer data set which has many fragments which are essentially copies of the whole coordinate domain. This scenario de-duplicates many coordinates and is likely to hit the condition described. The new test
Sparse global order reader: fragment full copy 1d
simulates this set-up.A large fraction of the changes in this PR are the test scaffolding changes to support evaluating query conditions upon our
FxRun1D
andFxRun2D
structures, so we can continue to check whether the expected and actual results match in the presence of a query condition.TYPE: BUG
DESC: Fix query condition regression introduced by #5417