[native] Fail-fast for file formats unsupported by hive connector #25147

Open · wants to merge 1 commit into master
Conversation

@pramodsatya (Contributor) commented May 20, 2025

Description

Presto C++ only supports reading tables with the DWRF, ORC, and PARQUET file formats through the hive connector. Using the config native-execution-enabled, we can fail fast at the coordinator when attempting to read from tables with file formats unsupported by Presto C++.

Motivation and Context

Currently, attempting to read from a table with an unsupported file format in Presto C++ fails at the worker:

it != readerFactories().end() ReaderFactory is not registered for format text

These missing reader factories can be detected at the coordinator itself, instead of sending the splits to workers only to fail there.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 20, 2025
@pramodsatya pramodsatya marked this pull request as ready for review May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from a team as a code owner May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from jaystarshot May 20, 2025 15:20
@prestodb-ci prestodb-ci requested review from a team, sh-shamsan and pdabre12 and removed request for a team May 20, 2025 15:20
@pramodsatya pramodsatya requested review from tdcmeehan, aditi-pandit, a team and nishithakbhaskaran and removed request for sh-shamsan, pdabre12 and a team May 20, 2025 15:20
if (connectorSystemConfig.isNativeExecution()) {
    StorageFormat storageFormat = table.getStorage().getStorageFormat();
    Optional<HiveStorageFormat> hiveStorageFormat = getHiveStorageFormat(storageFormat);
    if (hiveStorageFormat.isPresent() && !(hiveStorageFormat.equals(Optional.of(DWRF))
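Since the diff snippet above is truncated mid-condition, here is a minimal, hypothetical sketch of the fail-fast idea the PR describes: reject a read at the coordinator when native execution is enabled and the table's storage format is not one of the three formats Presto C++ can read (DWRF, ORC, PARQUET, per the description). The class and method names below are illustrative, not the PR's actual code.

```java
import java.util.Set;

public class NativeFormatCheck
{
    // Per the PR description: formats Presto C++ can read via the hive connector.
    private static final Set<String> NATIVE_READABLE_FORMATS = Set.of("DWRF", "ORC", "PARQUET");

    // Returns true when the coordinator should reject the read up front
    // rather than ship splits to native workers that cannot read them.
    public static boolean shouldFailFast(boolean nativeExecutionEnabled, String storageFormat)
    {
        return nativeExecutionEnabled
                && !NATIVE_READABLE_FORMATS.contains(storageFormat.toUpperCase());
    }

    public static void main(String[] args)
    {
        System.out.println(shouldFailFast(true, "TEXTFILE"));
        System.out.println(shouldFailFast(true, "PARQUET"));
    }
}
```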
A reviewer (Contributor) commented on this code:
Let's add this in the Hive configs. By default, it is empty, which means whatever is available in Hive is fine. It can be a set of comma separated values.
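The reviewer's suggestion could be sketched roughly as follows: a hypothetical helper that parses a comma-separated Hive config value into a set of format names, where an empty value means no restriction (whatever is available in Hive is fine). The class and method names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class ReadableFormatsConfig
{
    // Hypothetical parser for a comma-separated config value such as
    // "DWRF,ORC,PARQUET". An empty result set means "no restriction".
    public static Set<String> parseFormats(String value)
    {
        if (value == null || value.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .map(String::toUpperCase)
                .collect(Collectors.toUnmodifiableSet());
    }
}
```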

@aditi-pandit (Contributor) commented:

@pramodsatya: Thanks for this code. Should we add a check for the file formats applicable on the writer side as well? Native execution only supports DWRF and Parquet writers.
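A writer-side check along the lines aditi-pandit suggests might look like this hypothetical sketch, assuming (per the comment) that native execution only supports DWRF and Parquet writers. Names are illustrative, not actual PR code.

```java
import java.util.Set;

public class NativeWriterCheck
{
    // Per the review comment: native execution only supports these writers.
    private static final Set<String> NATIVE_WRITABLE_FORMATS = Set.of("DWRF", "PARQUET");

    // Returns true when a write to this format should be rejected at the
    // coordinator under native execution.
    public static boolean shouldFailFastOnWrite(boolean nativeExecutionEnabled, String storageFormat)
    {
        return nativeExecutionEnabled
                && !NATIVE_WRITABLE_FORMATS.contains(storageFormat.toUpperCase());
    }
}
```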
