Contributor

@ayushbansal07 ayushbansal07 commented Sep 21, 2025

Rationale for this change

Expose CSV writer option quoting_header for pyarrow. Addresses #47575

What changes are included in this PR?

Cython changes for parsing quoting_header option in a manner similar to quoting_style

Are these changes tested?

Yes, added a unit test under test_csv.py

Are there any user-facing changes?

Add QuotingStyle quoting_header option in WriteOptions for pyarrow
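
For illustration, the parsing step described above might look roughly like the following pure-Python sketch. It is not the Cython code from this PR: `QuotingStyle`, `unwrap_quoting_style`, and `WriteOptionsSketch` are hypothetical stand-ins, and the default of "needed" for both options is an assumption. The accepted strings ("needed", "all_valid", "none") are the ones pyarrow documents for quoting_style. In pyarrow itself the option would be passed as a keyword, e.g. `csv.WriteOptions(quoting_header="none")`.

```python
from enum import Enum


class QuotingStyle(Enum):
    """Hypothetical Python mirror of the C++ QuotingStyle enum."""
    NEEDED = "needed"
    ALL_VALID = "all_valid"
    NONE = "none"


def unwrap_quoting_style(value):
    """Convert a user-supplied string into a QuotingStyle member."""
    try:
        return QuotingStyle(value)
    except ValueError:
        raise ValueError(f"Invalid quoting style: {value!r}") from None


class WriteOptionsSketch:
    """Simplified, hypothetical options holder; not pyarrow's WriteOptions."""

    def __init__(self, *, quoting_style=None, quoting_header=None):
        # Assumed defaults: both options start out as "needed".
        self._quoting_style = QuotingStyle.NEEDED
        self._quoting_header = QuotingStyle.NEEDED
        # None means "keep the default", as with the keyword defaults in the diff.
        if quoting_style is not None:
            self.quoting_style = quoting_style
        if quoting_header is not None:
            self.quoting_header = quoting_header

    @property
    def quoting_style(self):
        return self._quoting_style.value

    @quoting_style.setter
    def quoting_style(self, value):
        self._quoting_style = unwrap_quoting_style(value)

    @property
    def quoting_header(self):
        return self._quoting_header.value

    @quoting_header.setter
    def quoting_header(self, value):
        # Parsed exactly like quoting_style, just stored in a separate field.
        self._quoting_header = unwrap_quoting_style(value)


options = WriteOptionsSketch(quoting_header="none")
```

Here `options.quoting_header` is "none" while `options.quoting_style` keeps its assumed default, and an unknown string raises `ValueError` for either option.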

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@ayushbansal07 ayushbansal07 changed the title from "add quoting_header option to pyarrow WriterOptions" to "GH-47575: [Python] add quoting_header option to pyarrow WriterOptions" Sep 21, 2025
@github-actions

⚠️ GitHub issue #47575 has been automatically assigned in GitHub to PR creator.

Member

@pitrou pitrou left a comment

Thank you @ayushbansal07. This looks mostly good to me, just two suggestions to improve the documentation.


 def __init__(self, *, include_header=None, batch_size=None,
-             delimiter=None, quoting_style=None):
+             delimiter=None, quoting_style=None, quoting_header=None):

You'll need to add a doc for the new argument in the class docstring above as well.

@property
def quoting_header(self):
"""
Same as quoting_style, but for header column names

I think we should add the note that is found in the C++ docs, as otherwise people may be surprised by the behavior.
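
The C++ note itself is not quoted in this thread. Assuming it is the remark that the "needed" and "all_valid" styles have the same effect on a header (every column name is quoted), the behavior can be approximated with Python's stdlib csv module. This is only an analogy, not pyarrow's writer: `write_csv_sketch` and the `_HEADER_QUOTING` mapping are hypothetical names introduced for illustration.

```python
import csv
import io

# Hypothetical mapping from the option strings to stdlib csv quoting constants.
# Assumption: for headers, "needed" and "all_valid" both quote every column name.
_HEADER_QUOTING = {
    "needed": csv.QUOTE_ALL,
    "all_valid": csv.QUOTE_ALL,
    "none": csv.QUOTE_NONE,  # column names are written without quotes
}


def write_csv_sketch(column_names, rows, quoting_header="needed"):
    """Write a header and rows, choosing the header quoting independently."""
    if quoting_header not in _HEADER_QUOTING:
        raise ValueError(f"Invalid quoting style: {quoting_header!r}")
    buf = io.StringIO()
    # The header gets its own writer so its quoting does not affect the data rows.
    header_writer = csv.writer(
        buf, quoting=_HEADER_QUOTING[quoting_header], lineterminator="\n"
    )
    header_writer.writerow(column_names)
    # Data rows: strings are quoted, numbers are not (stdlib QUOTE_NONNUMERIC).
    body_writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    body_writer.writerows(rows)
    return buf.getvalue()


rows = [[1, "a"], [2, "b"]]
default_output = write_csv_sketch(["id", "name"], rows)
# '"id","name"\n1,"a"\n2,"b"\n'
unquoted_header_output = write_csv_sketch(["id", "name"], rows, quoting_header="none")
# 'id,name\n1,"a"\n2,"b"\n'
```

The point of the sketch is that under this assumption only "none" changes the header line, which is the kind of surprise the docstring note is meant to head off.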

@ayushbansal07
Contributor Author

Thanks for the suggestions @pitrou. Have made the specified changes.

@pitrou
Member

pitrou commented Sep 22, 2025

@AlenkaF Do you want to give this a look?

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Sep 22, 2025
Member

@AlenkaF AlenkaF left a comment

Thank you for the contribution @ayushbansal07!
Looks good to me, I only have one minor comment.

@AlenkaF
Member

AlenkaF commented Sep 22, 2025

@github-actions crossbow submit -g python

@AlenkaF
Member

AlenkaF commented Sep 22, 2025

Will just wait for the CI and the extended builds and will merge if all looks ok.

@github-actions

Revision: ae57525

Submitted crossbow builds: ursacomputing/crossbow @ actions-2a38d354fa

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@AlenkaF
Member

AlenkaF commented Sep 23, 2025

The failures in the extended builds are not connected to the changes in this PR.
Note to myself: need to open issues for those and try to look into it.

Thanks again @ayushbansal07!

@AlenkaF AlenkaF merged commit 37c87db into apache:main Sep 23, 2025
14 checks passed
@AlenkaF AlenkaF removed the "awaiting committer review" label Sep 23, 2025
@ayushbansal07 ayushbansal07 deleted the feature/pyarrow-add-quoting_header branch September 23, 2025 04:42
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 37c87db.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 12 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Oct 15, 2025
…ptions (apache#47610)

* GitHub Issue: apache#47575

Authored-by: Ayush Bansal
Signed-off-by: AlenkaF