Fix missing file extension in Hive connector output files #25787

Open
chenjian2664 wants to merge 6 commits into master from con16914_unload_extension

Conversation

chenjian2664
Contributor

Description

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 14, 2025
@github-actions github-actions bot added the hudi, iceberg, delta-lake, and hive labels May 14, 2025
Member

@Praveen2112 Praveen2112 left a comment

Is there any testing based on the "$path" column? Does Hive also write its data files with the extension?

@@ -675,7 +675,7 @@ public static int getBucketFromFileName(String fileName)
     public static String getFileExtension(HiveCompressionCodec compression, StorageFormat format)
     {
         // text format files must have the correct extension when compressed
-        return compression.getHiveCompressionKind()
+        return format.getFileExtension() + compression.getHiveCompressionKind()
Contributor

Move the "." here.

@chenjian2664 chenjian2664 force-pushed the con16914_unload_extension branch 5 times, most recently from c7888b2 to 4b61f65 Compare May 14, 2025 15:22
@@ -240,7 +240,12 @@ public static Optional<Boolean> directoryExists(TrinoFileSystem fileSystem, Loca

     public static boolean isFileCreatedByQuery(String fileName, String queryId)
     {
-        return fileName.startsWith(queryId) || fileName.endsWith(queryId);
+        return fileName.startsWith(queryId)
Contributor Author

We use this logic to build the file name/path, and we expect file names to always start with or end with the queryId. However, HiveWriteFactory#getFileExtension may return an extension that is appended after the queryId, so endsWith(queryId) fails for those files. This looks like a pre-existing bug.
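
To make the concern concrete, here is a minimal sketch of a check that still recognizes a file once an extension follows the queryId. It is not the change in this PR (the hunk shown above is truncated). The class name QueryFileNames and the extension pattern are assumptions made for illustration only.

```java
// Hypothetical helper, not the PR's implementation.
final class QueryFileNames
{
    private QueryFileNames() {}

    // Returns true when the name starts with the queryId, ends with it,
    // or ends with the queryId followed only by extensions such as ".orc" or ".txt.gz".
    static boolean isFileCreatedByQuery(String fileName, String queryId)
    {
        if (fileName.startsWith(queryId) || fileName.endsWith(queryId)) {
            return true;
        }
        int index = fileName.lastIndexOf(queryId);
        if (index < 0) {
            return false;
        }
        // Whatever follows the queryId must look like one or more ".ext" parts (assumed pattern)
        String rest = fileName.substring(index + queryId.length());
        return rest.matches("(\\.[A-Za-z0-9]+)+");
    }
}
```

With a queryId such as 20250514_152200_00001_abcde, a name like 000000_0_20250514_152200_00001_abcde.orc fails the original startsWith/endsWith check but passes this one.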

@chenjian2664 chenjian2664 force-pushed the con16914_unload_extension branch from 5dcb866 to a217b4b Compare May 14, 2025 16:00
Labels
cla-signed, delta-lake, hive, hudi, iceberg
4 participants