@rfratto
Member

@rfratto rfratto commented Oct 16, 2025

Note

This PR may be easier to review commit-by-commit.

This PR introduces a new physical engine node, ScanSet, which represents a set of targets to scan. Currently, each target is a data object section. In the future, this may be expanded to support chunks as a target.
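As a rough illustration of the shape described above, the following Go sketch models a node that carries a set of targets, where each target is a data object section. All type, field, and function names here (`TargetKind`, `ScanTarget`, `Object`, `Section`, `NewSectionScanSet`) are assumptions made for this example, not the engine's actual definitions.

```go
package main

// TargetKind says what a scan target points at.
type TargetKind int

const (
	// TargetDataObjSection is the only kind the description mentions today.
	// A chunk kind could be added here later (assumption).
	TargetDataObjSection TargetKind = iota
)

// ScanTarget is a hypothetical descriptor for one thing to scan.
type ScanTarget struct {
	Kind    TargetKind
	Object  string // hypothetical: location of the data object
	Section int    // hypothetical: index of the section inside the object
}

// ScanSet is one plan node that carries every target to scan.
type ScanSet struct {
	Targets []ScanTarget
}

// NewSectionScanSet builds a ScanSet with one target per section index.
func NewSectionScanSet(object string, sections ...int) *ScanSet {
	set := &ScanSet{Targets: make([]ScanTarget, 0, len(sections))}
	for _, s := range sections {
		set.Targets = append(set.Targets, ScanTarget{
			Kind:    TargetDataObjSection,
			Object:  object,
			Section: s,
		})
	}
	return set
}
```

Keeping the kind on each target, rather than on the node, is one way a future chunk target could sit alongside section targets in the same set.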

ScanSet is emitted as part of processing the MAKE_TABLE operation from the logical plan. This comes with some other changes:

  • TopK is now emitted when processing the SORT from the logical plan, rather than as part of processing the MAKE_TABLE.
  • Merge and SortMerge are removed as they are no longer used. This also removes the need for buildNodeGroup and overlappingShardDescriptors.
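The mapping described in this list can be summarized in a small Go sketch. It only models the two logical operations named here; `LogicalKind` and `physicalFor` are hypothetical names for illustration and do not reflect the planner's real code.

```go
package main

// LogicalKind names a logical plan operation. Only the two operations
// mentioned in the description are modeled.
type LogicalKind string

const (
	MakeTable LogicalKind = "MAKE_TABLE"
	Sort      LogicalKind = "SORT"
)

// physicalFor returns the name of the physical node emitted for a logical
// operation after this change. ok is false for operations not modeled here.
func physicalFor(kind LogicalKind) (node string, ok bool) {
	switch kind {
	case MakeTable:
		return "ScanSet", true // previously this step could also inject a TopK
	case Sort:
		return "TopK", true // TopK now comes only from SORT
	default:
		return "", false
	}
}
```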

The workflow (#19511) will split the ScanSet node into many DataObjScan tasks, removing it from the resulting per-task physical plan. However, so that physical plans can be tested without the workflow, a ScanSet can still be executed. Executing a ScanSet returns the data from each target in sequence, similar to Merge.
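The "each target in sequence" behavior can be sketched as a reader that drains one target completely before moving to the next. This is a minimal sketch under assumptions: `Batch`, `Reader`, `sliceReader`, `sequentialReader`, and `readAll` are hypothetical names, and the real executor pipeline interfaces are likely different.

```go
package main

// Batch is a hypothetical stand-in for a batch of rows.
type Batch []string

// Reader yields batches from one target. ok is false once it is exhausted.
type Reader interface {
	Next() (batch Batch, ok bool)
}

// sliceReader serves pre-built batches; it stands in for a real section scan.
type sliceReader struct {
	batches []Batch
	pos     int
}

func (r *sliceReader) Next() (Batch, bool) {
	if r.pos >= len(r.batches) {
		return nil, false
	}
	b := r.batches[r.pos]
	r.pos++
	return b, true
}

// sequentialReader drains each target's reader in order before moving on.
type sequentialReader struct {
	readers []Reader
	cur     int
}

func (s *sequentialReader) Next() (Batch, bool) {
	for s.cur < len(s.readers) {
		if b, ok := s.readers[s.cur].Next(); ok {
			return b, true
		}
		s.cur++ // current target is exhausted; advance to the next one
	}
	return nil, false
}

// readAll collects every row the reader produces, in order.
func readAll(r Reader) []string {
	var rows []string
	for {
		b, ok := r.Next()
		if !ok {
			return rows
		}
		rows = append(rows, b...)
	}
}
```

Because nothing is reordered across targets, any required ordering has to be applied by a later node in the plan.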

ScanSet is a single node which represents a set of targets to scan in a
single operation.

As part of this change, a TopK can no longer be injected at the scan
level. Instead, processing a SORT from the logical plan directly results
in a TopK. buildNodeGroup and overlappingShardDescriptors are now unused
and have been removed.

The scheduler can use ScanSet to manually create smaller per-section
tasks.
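
One way to picture the per-section split is a function that turns one node with N targets into N single-target tasks. The sketch below is illustrative only; `Target`, `ScanTask`, and `splitScanSet` are hypothetical names, and the scheduler's actual task type is not shown in this PR description.

```go
package main

// Target is a hypothetical descriptor for one data object section.
type Target struct {
	Object  string
	Section int
}

// ScanSet holds all targets for one logical scan.
type ScanSet struct {
	Targets []Target
}

// ScanTask is a hypothetical unit of work that scans exactly one target.
type ScanTask struct {
	ID     int
	Target Target
}

// splitScanSet creates one task per target, preserving target order.
func splitScanSet(set ScanSet) []ScanTask {
	tasks := make([]ScanTask, 0, len(set.Targets))
	for i, t := range set.Targets {
		tasks = append(tasks, ScanTask{ID: i, Target: t})
	}
	return tasks
}
```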

Set MergePrefetchCount on the new engine to slightly speed up query times.

Signed-off-by: Robert Fratto <[email protected]>
Now that the physical planner emits TopK and ScanSet, Merge and
SortMerge are unused.

The executor pipeline for SortMerge is also removed. The executor Merge
pipeline remains, as the local implementation of executing ScanSet still
uses it.
@rfratto rfratto requested a review from a team as a code owner October 16, 2025 16:05
@rfratto
Member Author

rfratto commented Oct 16, 2025

Since this change moves TopK as close to the end of the pipeline as possible, the correctness tests once again pass as of this PR: https://github.com/grafana/loki/actions/runs/18567553549/job/52933101996

@spiridonov
Contributor

LGTM

@rfratto rfratto merged commit bd3f3da into main Oct 16, 2025
64 checks passed
@rfratto rfratto deleted the engine-scanset branch October 16, 2025 20:26