-
Notifications
You must be signed in to change notification settings - Fork 3.6k
[Refactor](block) pick some trim block pr #57737 #57860 #58124 #58209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
### What problem does this PR solve?
Remove unless time var in block:
```
int64_t _decompress_time_ns = 0;
int64_t _decompressed_bytes = 0;
mutable int64_t _compress_time_ns = 0;
```
This pull request refactors how columns are accessed by name in the `Block` class and related code, removing the internal name-to-index map and switching to linear search. It also removes temporary column handling and updates code throughout the backend to use the new column access pattern. Additionally, it improves error handling and updates some operator logic for schema scans. * Removed the internal `index_by_name` map from the `Block` class, replacing all column name lookups with linear search via the new `get_position_by_name` method. This affects methods like `get_by_name`, `try_get_by_name`, `has`, and related insert/erase logic. [[1]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L85-R87) [[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L164-L193) [[3]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L204-L227) [[4]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L241-L247) [[5]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L259-L288) [[6]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L310-R268) [[7]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L774) [[8]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L795-L803) [[9]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L825-L835) * Removed the ability to erase columns by name and the logic for maintaining the name-to-index map, simplifying column management. [[1]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L259-L288) [[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L241-L247) [[3]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L795-L803) * Removed the definition and usage of temporary block column prefixes and the method `erase_tmp_columns`, as well as related cleanup logic in scan operators. [[1]](diffhunk://#diff-d569a6c78638e9c539783057051e89bcb135cc254c763e3bc5e3a71e334df883L27-L28) [[2]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9L1332-L1338) [[3]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L795-L803) * Updated all code that previously accessed columns by name using the old map to use the new position-based access, including changes in partial update logic, push broker reader, and schema scan operator. [[1]](diffhunk://#diff-23fa0193d626ba712c4186c66bcd1809c7e55bfc04ea10f5a91c691ed3e04727L944-R947) [[2]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L314-R314) [[3]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L387-R388) [[4]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L423-L443) [[5]](diffhunk://#diff-aa97e616efca2638565e6a58aa9295252714265fcc1ed904d503425ef4930ce7L481-R484) [[6]](diffhunk://#diff-aa97e616efca2638565e6a58aa9295252714265fcc1ed904d503425ef4930ce7L494-R494) [[7]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262R182) [[8]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262L253-R257) * Added a slot-to-column index mapping (`_slot_offsets`) to `SchemaScanOperatorX` for more efficient column mapping during scans. [[1]](diffhunk://#diff-82ea988d6a5179d6698c2844ad8f06167ebdb362225b07b35e9a5ba0a7bb173aR22-R23) [[2]](diffhunk://#diff-82ea988d6a5179d6698c2844ad8f06167ebdb362225b07b35e9a5ba0a7bb173aR90-R92) [[3]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262R182) [[4]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262L253-R257) * Improved error handling for missing columns, notably in partial update logic, by returning internal errors with block structure dumps when columns are not found. --- This refactor simplifies the `Block` class, improves code consistency, and enhances error handling across the backend.
This pull request refactors how columns are accessed by name in several
vectorized execution components, replacing the inefficient O(N)
`get_by_name` method with a map-based lookup via a new
`get_name_to_pos_map` function. This change improves performance and
error handling when working with blocks of columns, especially in ORC
and Parquet readers. The updates also remove the old `get_by_name`
methods and update all relevant call sites to use the new approach.
* Added `get_name_to_pos_map` to the `Block` class, enabling efficient
mapping from column names to positions for fast lookup.
* Removed the O(N) `get_by_name` methods from the `Block` class,
enforcing use of position-based access via the map.
[[1]](diffhunk://#diff-5a2c9e19a27153df9fce7277a09325589cd009441c63da55761c286627417fb3L147-L151)
[[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L244-L261)
* Updated all column accesses in `OrcReader` (`vorc_reader.cpp`) to use
the name-to-position map, improving performance and adding error
handling for missing columns.
[[1]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1286-R1289)
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1314-R1324)
[[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1338-R1348)
[[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1993-R1996)
[[5]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2062-R2072)
[[6]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2088-R2094)
[[7]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2205-R2225)
[[8]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2239-R2247)
[[9]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2273-R2276)
[[10]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2310-R2334)
[[11]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2648-R2655)
* Updated `RowGroupReader` in `vparquet_group_reader.cpp` to use the
name-to-position map for column access, with improved error handling for
missing columns.
* Added checks for missing columns using the map's `contains` method
before accessing columns, returning internal errors if columns are not
found.
[[1]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L436-R441)
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1314-R1324)
[[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2062-R2072)
[[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2239-R2247)
[[5]](diffhunk://#diff-d09145594c823444cca71879eb6515950211b548d7cbef65f0caf5c1d88f296fR396-R421)
* Removed unused includes and updated struct initialization style for
clarity and consistency.
[[1]](diffhunk://#diff-d09145594c823444cca71879eb6515950211b548d7cbef65f0caf5c1d88f296fR28)
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2205-R2225)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [x] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [x] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [x] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [x] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
yiguolei
approved these changes
Nov 21, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
pick from #57737 #57860 #58124