@BiteTheDDDDt
Contributor

@BiteTheDDDDt BiteTheDDDDt commented Nov 18, 2025

What problem does this PR solve?

This pull request refactors how columns are accessed by name in several vectorized execution components, replacing the inefficient O(N) get_by_name method with a map-based lookup via a new get_name_to_pos_map function. This change improves performance and error handling when working with blocks of columns, especially in ORC and Parquet readers. The updates also remove the old get_by_name methods and update all relevant call sites to use the new approach.

Block column access improvements

  • Added get_name_to_pos_map to the Block class, enabling efficient mapping from column names to positions for fast lookup.
  • Removed the O(N) get_by_name methods from the Block class, enforcing use of position-based access via the map. [1] [2]
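As a rough illustration of the lookup pattern described above, here is a minimal, self-contained sketch. `SimpleBlock` and `NamedColumn` are hypothetical stand-ins, not the Doris `Block` implementation; only the name `get_name_to_pos_map` comes from this PR, and its real signature, return type, and duplicate-name handling may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column entry; a real engine type would also
// carry the column's data type.
struct NamedColumn {
    std::string name;
    std::vector<int> values;
};

// Hypothetical, simplified block: an ordered list of named columns.
class SimpleBlock {
public:
    void insert(NamedColumn column) { columns_.push_back(std::move(column)); }

    // Position-based access is O(1).
    const NamedColumn& get_by_position(std::size_t pos) const {
        assert(pos < columns_.size());
        return columns_[pos];
    }

    // Builds the name -> position map in one O(N) pass. Callers build it once
    // and reuse it, instead of scanning all columns on every lookup.
    // Assumption: on duplicate names the first position wins (emplace does
    // not overwrite an existing key).
    std::unordered_map<std::string, std::size_t> get_name_to_pos_map() const {
        std::unordered_map<std::string, std::size_t> name_to_pos;
        name_to_pos.reserve(columns_.size());
        for (std::size_t i = 0; i < columns_.size(); ++i) {
            name_to_pos.emplace(columns_[i].name, i);
        }
        return name_to_pos;
    }

private:
    std::vector<NamedColumn> columns_;
};
```

In this sketch a caller would build the map once per block and then resolve each name with `block.get_by_position(name_to_pos.at(name))`, so M lookups cost one O(N) pass plus M average-case constant-time hash lookups rather than M linear scans.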

ORC reader refactoring

  • Updated all column accesses in OrcReader (vorc_reader.cpp) to use the name-to-position map, improving performance and adding error handling for missing columns. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

Parquet reader refactoring

  • Updated RowGroupReader in vparquet_group_reader.cpp to use the name-to-position map for column access, with improved error handling for missing columns.

Error handling enhancements

  • Added checks for missing columns using the map's contains method before accessing columns, returning internal errors if columns are not found. [1] [2] [3] [4] [5]
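The check-before-access pattern described above could look roughly like the following sketch. `Status` here is a hypothetical minimal type and `find_column_position` is a hypothetical helper, neither taken from the Doris code base. The PR description mentions the map's `contains` method (available on `std::unordered_map` since C++20); this sketch uses `find` instead so that it also compiles under older standards and performs a single hash lookup.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical minimal status type standing in for the engine's own.
struct Status {
    bool ok;
    std::string message;
    static Status OK() { return {true, ""}; }
    static Status InternalError(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical helper: resolves a column name to its position, returning an
// internal error instead of accessing a column that is not in the block.
// On failure, *pos is left unchanged.
Status find_column_position(const std::unordered_map<std::string, std::size_t>& name_to_pos,
                            const std::string& name, std::size_t* pos) {
    auto it = name_to_pos.find(name);
    if (it == name_to_pos.end()) {
        return Status::InternalError("column not found in block: " + name);
    }
    *pos = it->second;
    return Status::OK();
}
```

Returning an error for a missing name lets the caller propagate the failure, whereas an unchecked `operator[]` on the map would silently insert the name with position 0.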

Minor code cleanup

  • Removed unused includes and updated struct initialization style for clarity and consistency. [1] [2]

Check List (For Author)

  • Test

    • [ ] Regression test
    • [ ] Unit Test
    • [ ] Manual test (add detailed scripts or steps below)
    • [x] No need to test or manual test. Explain why:
      • [ ] This is a refactor/code format and no logic has been changed.
      • [x] Previous test can cover this change.
      • [ ] No code files have been changed.
      • [ ] Other reason
  • Behavior changed:

    • [x] No.
    • [ ] Yes.
  • Does this need documentation?

    • [x] No.
    • [ ] Yes.

Check List (For Reviewer who merge this PR)

  • [ ] Confirm the release note
  • [ ] Confirm test cases
  • [ ] Confirm document
  • [ ] Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 15.03% (23/153) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.73% (18271/34648)
Line Coverage 38.10% (166023/435748)
Region Coverage 33.03% (129057/390711)
Branch Coverage 33.81% (55390/163813)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 21.57% (33/153) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.47% (24334/34048)
Line Coverage 57.92% (252762/436417)
Region Coverage 53.05% (210244/396287)
Branch Coverage 54.44% (89773/164892)

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 14.65% (23/157) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.73% (18271/34648)
Line Coverage 38.10% (166013/435755)
Region Coverage 33.04% (129109/390743)
Branch Coverage 33.81% (55382/163815)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 21.02% (33/157) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.42% (24316/34048)
Line Coverage 57.88% (252583/436424)
Region Coverage 53.28% (211166/396319)
Branch Coverage 54.59% (90014/164894)

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 16.07% (27/168) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.73% (18275/34655)
Line Coverage 38.10% (166074/435877)
Region Coverage 33.07% (129260/390839)
Branch Coverage 33.82% (55413/163849)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 22.02% (37/168) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.49% (24346/34055)
Line Coverage 57.96% (253009/436546)
Region Coverage 53.29% (211242/396415)
Branch Coverage 54.58% (90015/164928)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 22.02% (37/168) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.49% (24345/34055)
Line Coverage 57.96% (253002/436546)
Region Coverage 53.30% (211285/396415)
Branch Coverage 54.58% (90019/164928)

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved (Indicates a PR has been approved by one committer.) and reviewed labels Nov 20, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit eb5a8e2 into apache:master Nov 21, 2025
27 of 29 checks passed
BiteTheDDDDt added a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Nov 21, 2025
This pull request refactors how columns are accessed by name in several
vectorized execution components, replacing the inefficient O(N)
`get_by_name` method with a map-based lookup via a new
`get_name_to_pos_map` function. This change improves performance and
error handling when working with blocks of columns, especially in ORC
and Parquet readers. The updates also remove the old `get_by_name`
methods and update all relevant call sites to use the new approach.

* Added `get_name_to_pos_map` to the `Block` class, enabling efficient
mapping from column names to positions for fast lookup.
* Removed the O(N) `get_by_name` methods from the `Block` class,
enforcing use of position-based access via the map.
[[1]](diffhunk://#diff-5a2c9e19a27153df9fce7277a09325589cd009441c63da55761c286627417fb3L147-L151)
[[2]](diffhunk://#diff-76dba768c36c76cce660f7fe39514a98b509030a8976aabc9ac47d58bc923976L244-L261)

* Updated all column accesses in `OrcReader` (`vorc_reader.cpp`) to use
the name-to-position map, improving performance and adding error
handling for missing columns.
[[1]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1286-R1289)
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1314-R1324)
[[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1338-R1348)
[[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1993-R1996)
[[5]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2062-R2072)
[[6]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2088-R2094)
[[7]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2205-R2225)
[[8]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2239-R2247)
[[9]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2273-R2276)
[[10]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2310-R2334)
[[11]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2648-R2655)

* Updated `RowGroupReader` in `vparquet_group_reader.cpp` to use the
name-to-position map for column access, with improved error handling for
missing columns.

* Added checks for missing columns using the map's `contains` method
before accessing columns, returning internal errors if columns are not
found.
[[1]](diffhunk://#diff-2ed235dda16244dccd76626375b4512b6ade1724933269c40a2953c29dd95c61L436-R441)
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R1314-R1324)
[[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2062-R2072)
[[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2239-R2247)
[[5]](diffhunk://#diff-d09145594c823444cca71879eb6515950211b548d7cbef65f0caf5c1d88f296fR396-R421)

* Removed unused includes and updated struct initialization style for
clarity and consistency.
[[1]](diffhunk://#diff-d09145594c823444cca71879eb6515950211b548d7cbef65f0caf5c1d88f296fR28)
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R2205-R2225)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [x] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
yiguolei pushed a commit that referenced this pull request Nov 21, 2025

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.2-merged, reviewed

4 participants