[Refactor](block) pick some trim block pr #57737 #57860 #58124 #58209

BiteTheDDDDt · 2025-11-21T02:38:06Z

### What problem does this PR solve? Remove unless time var in block: ``` int64_t _decompress_time_ns = 0; int64_t _decompressed_bytes = 0; mutable int64_t _compress_time_ns = 0; ```

This pull request refactors how columns are accessed by name in the `Block` class and related code, removing the internal name-to-index map and switching to linear search. It also removes temporary column handling and updates code throughout the backend to use the new column access pattern. Additionally, it improves error handling and updates some operator logic for schema scans. * Removed the internal `index_by_name` map from the `Block` class, replacing all column name lookups with linear search via the new `get_position_by_name` method. This affects methods like `get_by_name`, `try_get_by_name`, `has`, and related insert/erase logic. [[1]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L85-R87) [[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L164-L193) [[3]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L204-L227) [[4]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L241-L247) [[5]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L259-L288) [[6]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L310-R268) [[7]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L774) [[8]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L795-L803) [[9]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L825-L835) * Removed the ability to erase columns by name and the logic for maintaining the name-to-index map, simplifying column management. [[1]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L259-L288) [[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L241-L247) [[3]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L795-L803) * Removed the definition and usage of temporary block column prefixes and the method `erase_tmp_columns`, as well as related cleanup logic in scan operators. [[1]](diffhunk://#diff-d569a6c78638e9c539783057051e89bcb135cc254c763e3bc5e3a71e334df883L27-L28) [[2]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9L1332-L1338) [[3]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L795-L803) * Updated all code that previously accessed columns by name using the old map to use the new position-based access, including changes in partial update logic, push broker reader, and schema scan operator. [[1]](diffhunk://#diff-23fa0193d626ba712c4186c66bcd1809c7e55bfc04ea10f5a91c691ed3e04727L944-R947) [[2]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L314-R314) [[3]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L387-R388) [[4]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L423-L443) [[5]](diffhunk://#diff-aa97e616efca2638565e6a58aa9295252714265fcc1ed904d503425ef4930ce7L481-R484) [[6]](diffhunk://#diff-aa97e616efca2638565e6a58aa9295252714265fcc1ed904d503425ef4930ce7L494-R494) [[7]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262R182) [[8]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262L253-R257) * Added a slot-to-column index mapping (`_slot_offsets`) to `SchemaScanOperatorX` for more efficient column mapping during scans. [[1]](diffhunk://#diff-82ea988d6a5179d6698c2844ad8f06167ebdb362225b07b35e9a5ba0a7bb173aR22-R23) [[2]](diffhunk://#diff-82ea988d6a5179d6698c2844ad8f06167ebdb362225b07b35e9a5ba0a7bb173aR90-R92) [[3]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262R182) [[4]](diffhunk://#diff-991f87a3acb91e58d14e3b30520de5dd07ecade8d4ecf885c02bf436873b4262L253-R257) * Improved error handling for missing columns, notably in partial update logic, by returning internal errors with block structure dumps when columns are not found. --- This refactor simplifies the `Block` class, improves code consistency, and enhances error handling across the backend.

This pull request refactors how columns are accessed by name in several vectorized execution components, replacing the inefficient O(N) `get_by_name` method with a map-based lookup via a new `get_name_to_pos_map` function. This change improves performance and error handling when working with blocks of columns, especially in ORC and Parquet readers. The updates also remove the old `get_by_name` methods and update all relevant call sites to use the new approach. * Added `get_name_to_pos_map` to the `Block` class, enabling efficient mapping from column names to positions for fast lookup. * Removed the O(N) `get_by_name` methods from the `Block` class, enforcing use of position-based access via the map. [[1]](diffhunk://#diff-5a2c9e19a27153df9fce7277a09325589cd009441c63da55761c286627417fb3L147-L151) [[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L244-L261) * Updated all column accesses in `OrcReader` (`vorc_reader.cpp`) to use the name-to-position map, improving performance and adding error handling for missing columns. [[1]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1286-R1289) [[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1314-R1324) [[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1338-R1348) [[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1993-R1996) [[5]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2062-R2072) [[6]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2088-R2094) [[7]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2205-R2225) [[8]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2239-R2247) [[9]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2273-R2276) [[10]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2310-R2334) [[11]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2648-R2655) * Updated `RowGroupReader` in `vparquet_group_reader.cpp` to use the name-to-position map for column access, with improved error handling for missing columns. * Added checks for missing columns using the map's `contains` method before accessing columns, returning internal errors if columns are not found. [[1]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L436-R441) [[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1314-R1324) [[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2062-R2072) [[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2239-R2247) [[5]](diffhunk://#diff-d09145594c823444cca71879eb6515950211b548d7cbef65f0caf5c1d88f296fR396-R421) * Removed unused includes and updated struct initialization style for clarity and consistency. [[1]](diffhunk://#diff-d09145594c823444cca71879eb6515950211b548d7cbef65f0caf5c1d88f296fR28) [[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2205-R2225) - Test  - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [x] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [x] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason  - Behavior changed: - [x] No. - [ ] Yes.  - Does this need documentation? - [x] No. - [ ] Yes.  - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label

Thearas · 2025-11-21T02:38:11Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

BiteTheDDDDt · 2025-11-21T02:38:16Z

run buildall

hello-stephen · 2025-11-21T06:58:03Z

BE UT Coverage Report

Increment line coverage 23.91% (66/276) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.78% (18148/34381)
Line Coverage	38.13% (165281/433518)
Region Coverage	33.15% (128383/387325)
Branch Coverage	33.94% (55350/163080)

HappenLee and others added 3 commits November 21, 2025 10:34

[Refactor](exec) remove unless time var in block (apache#57737)

c6f62c8

### What problem does this PR solve? Remove unless time var in block: ``` int64_t _decompress_time_ns = 0; int64_t _decompressed_bytes = 0; mutable int64_t _compress_time_ns = 0; ```

BiteTheDDDDt requested a review from yiguolei as a code owner November 21, 2025 02:38

yiguolei approved these changes Nov 21, 2025

View reviewed changes

yiguolei merged commit df2a963 into apache:branch-4.0 Nov 21, 2025
24 of 27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Refactor](block) pick some trim block pr #57737 #57860 #58124 #58209

[Refactor](block) pick some trim block pr #57737 #57860 #58124 #58209

Uh oh!

BiteTheDDDDt commented Nov 21, 2025

Uh oh!

Thearas commented Nov 21, 2025

Uh oh!

BiteTheDDDDt commented Nov 21, 2025

Uh oh!

hello-stephen commented Nov 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[Refactor](block) pick some trim block pr #57737 #57860 #58124 #58209

[Refactor](block) pick some trim block pr #57737 #57860 #58124 #58209

Uh oh!

Conversation

BiteTheDDDDt commented Nov 21, 2025

Uh oh!

Thearas commented Nov 21, 2025

Uh oh!

BiteTheDDDDt commented Nov 21, 2025

Uh oh!

hello-stephen commented Nov 21, 2025

BE UT Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants