Skip to content

Conversation

@Arkur745
Copy link

@Arkur745 Arkur745 commented Sep 7, 2025

Rationale for this change

StructBuilder class was missing UnsafeAppend() , UnsafeAppendNull(), UnsafeAppendNulls() methods that were standard to other builders.

What changes are included in this PR?

Added three new public methods to the StructBuilder class in cpp/src/arrow/array/builder_nested.h:

void UnsafeAppend()

Status UnsafeAppendNull()

Status UnsafeAppendNulls(int64_t length)

Refactored the existing Append() methods to use a centralized private UnsafeAppend(bool is_valid) function, improving maintainability and reducing code duplication, based on reviewer feedback.

Are these changes tested?

Yes. A new test suite, TestStructBuilderUnsafe, has been added to cpp/src/arrow/array/array_struct_test.cc. This suite contains three new unit tests, one for each of the new unsafe methods, which validate their correctness and handling of nulls. All new and existing tests are passing locally.

Are there any user-facing changes?

Yes, this PR introduces new public methods to the StructBuilder API. The changes are purely additive and do not alter or remove any existing functionality, so they are not breaking.

@github-actions
Copy link

github-actions bot commented Sep 7, 2025

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@kou kou changed the title feat(cpp): Add UnsafeAppend methods to StructBuilder GH-45722: [C++] Add UnsafeAppend methods to StructBuilder Sep 8, 2025
@kou
Copy link
Member

kou commented Sep 8, 2025

Could you use our PR template instead of removing it entirely?

Could you fix lint failures? You can enable GitHub Actions on your fork to check CI results on your fork.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This method is "unsafe" because it does not update the null bitmap
/// This method is "unsafe" because it does not update the null bitmap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void UnsafeAppend(){
void UnsafeAppend(bool is_valid = true) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
++length_;
UnsafeAppendToBitmap(is_valid);

Could you use UnsafeAppend(is_valid) in Append() instead of UnsafeAppendToBitmap(is_valid)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to append an empty value not null.

Ah, we may need UnsafeAppendEmptyValue() before we work on this. @pitrou What do you think about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
length_++;
UnsafeAppend(false);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently working on lint issues. I will commit the changes once these issues are resolved.

@Arkur745 Arkur745 force-pushed the ARROW-45722-unsafe-append-structbuilder branch 2 times, most recently from c629dca to a8b8faf Compare September 8, 2025 10:22
@Arkur745
Copy link
Author

Arkur745 commented Sep 8, 2025

Could you fix lint failures? You can enable GitHub Actions on your fork to check CI results on your fork.

@kou, I am having the entire project and all the tests compile successfully. The final build log confirms that arrow-array-test.exe is being created correctly.
However, when I run the test executable directly, the Google Test runner reports that it found 0 tests, even though the code contains the new TestStructBuilder test suite.

C:\dev\arrow\cpp\build\debug\Debug\arrow-array-test.exe --gtest_filter=*TestStructBuilder*

Running main() from C:\dev\arrow\cpp\build\_deps\googletest-src\googletest\src\gtest_main.cc

Note: Google Test filter = *TestStructBuilder*

[==========] Running 0 tests from 0 test suites.

[==========] 0 tests from 0 test suites ran. (2 ms total)
[  PASSED  ] 0 tests.

The core of the issue is that the build system doesn't seem to be automatically discovering the tests in array_struct_test.cc.
I found a workaround. By manually adding this line to cpp/src/arrow/array/CMakeLists.txt:
add_arrow_test(array_struct_test)

C:\dev\arrow\cpp\build> ctest -C Debug -R arrow-array-struct-test
Test project 
C:/dev/arrow/cpp/build
Start 17: arrow-array-struct-test
1/1 Test #17: arrow-array-struct-test ..........   Passed   2.98 sec

While my specific test passes, I've discovered that my change to the CMakeLists.txt file seems to break the main test suite. When I run a full ctest, the main arrow-array-test target (which I believe is supposed to contain all the array tests) now fails.

Here is the log from the full run:

99% tests passed, 1 tests failed out of 96
...
The following tests FAILED:
1 - arrow-array-test (Failed)

@kou
Copy link
Member

kou commented Sep 8, 2025

Does C:\dev\arrow\cpp\build\debug\Debug\arrow-array-test.exe (no --gtest_filter=*TestStructBuilder*) work?

@Arkur745
Copy link
Author

Arkur745 commented Sep 8, 2025

No, its not working. all it say is
0 tests from 0 test suites ran.

@Arkur745 Arkur745 force-pushed the ARROW-45722-unsafe-append-structbuilder branch 2 times, most recently from 1020ebc to 5bc5f08 Compare September 8, 2025 14:12
@kou
Copy link
Member

kou commented Sep 8, 2025

OK. #47434 may be related. Could you rebase on main?

@Arkur745 Arkur745 force-pushed the ARROW-45722-unsafe-append-structbuilder branch 7 times, most recently from f3873a6 to 9d3abe5 Compare September 10, 2025 11:01
@Arkur745
Copy link
Author

Please checkout my recent changes. I have resolved the lint errors and it’s passing all the workflows in my fork repo.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use our PR template instead of removing it entirely?

@kou kou requested a review from pitrou September 11, 2025 01:40
@Arkur745
Copy link
Author

I've updated the PR according to the template and committed the suggested changes.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pitrou I think that we should add UnsafeAppendEmptyValue() before this. What do you think about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "does not update the null bitmap"? Below it does.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, my approach was different. But that didn't work out. I will remove this comment in my next commit.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou is right, we should add UnsafeAppendEmptyValue. This can be done in this PR, or in a separate one.

@Arkur745
Copy link
Author

@pitrou I'll update this PR with updated comments and UnsafeAppendEmptyValue.

@Arkur745
Copy link
Author

@kou , @pitrou
Should I implement UnsafeAppendEmptyValue() and UnsafeAppendEmptyValues() in the base ArrayBuilder class and all the concrete builder types (Int32Builder, StringBuilder , etc.) as part of this PR?
This would involve:

  • Adding virtual method signatures to ArrayBuilder base class
  • Implementing them in all concrete builder classes
  • Then using them in my StructBuilder unsafe methods

Or implement a basic version for now and leave the full concrete implementations for a separate PR?
I think this should be in a separate PR with a separate issue.

@Arkur745 Arkur745 force-pushed the ARROW-45722-unsafe-append-structbuilder branch from f56cffb to bf73f86 Compare September 15, 2025 14:17
@pitrou
Copy link
Member

pitrou commented Sep 15, 2025

Or implement a basic version for now and leave the full concrete implementations for a separate PR? I think this should be in a separate PR with a separate issue.

I think UnsafeAppendEmptyValue() and UnsafeAppendEmptyValues() should be implemented entirely at once.
Either in this PR, or in a separate one.

Note you can recycle the existing implementations of AppendEmptyValue and AppendEmptyValues.
Then these methods can become something like:

Status ArrayBuilder::AppendEmptyValue() {
  RETURN_NOT_OK(Reserve(1));
  UnsafeAppendEmptyValue();
  return Status::OK();
}

Status ArrayBuilder::AppendEmptyValues(int64_t length) {
  RETURN_NOT_OK(Reserve(length));
  UnsafeAppendEmptyValues(length);
  return Status::OK();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants