Fix format, structure and compression method detection for DataLake #746

Open: wants to merge 2 commits into antalya

ianton-ru commented Apr 23, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed format detection for the iceberg table function (fixes #732).

Documentation entry for user-facing changes

With an empty Iceberg table, this query works:

select * from icebergS3('http://minio:9000/warehouse/data/', 'minio', 'minio123')

but this one fails:

select * from iceberg('http://minio:9000/warehouse/data/', 'minio', 'minio123')

Code: 715. DB::Exception: Received from localhost:9000. DB::Exception: The data format cannot be detected by the contents of the files, because there are no files with provided path in S3ObjectStorage or all files are empty. You can specify the format manually: The data format cannot be detected by the contents of the files. You can specify the format manually. (CANNOT_DETECT_FORMAT)

This happens because StorageIcebergConfiguration returns its own default-initialized fields instead of the fields of the specific configuration implementation (S3, Azure or HDFS). ClickHouse then tries to resolve the missing values and fails: for data lakes the format is currently always 'Parquet', but when it is not filled in properly, the code tries to detect it from the source files, which fails for an empty table.
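The failure mode can be reproduced in a small model. This is a minimal Python sketch, not ClickHouse code: Configuration, S3Configuration, IcebergConfiguration, resolve_format and CannotDetectFormat are hypothetical stand-ins, and using "auto" to mean "not set yet, detect later" is an assumption of the model. The wrapper holds the real S3 configuration, but callers read the wrapper's own default fields, so format resolution falls back to inspecting files and has nothing to inspect.

```python
"""Hypothetical, simplified model of the pre-fix behaviour (not ClickHouse code)."""


class CannotDetectFormat(Exception):
    """Stands in for ClickHouse's CANNOT_DETECT_FORMAT error."""


class Configuration:
    def __init__(self):
        # Public fields; "auto" means "not known yet, detect it later".
        self.format = "auto"
        self.structure = "auto"
        self.compression_method = "auto"


class S3Configuration(Configuration):
    def __init__(self):
        super().__init__()
        self.format = "Parquet"  # data lake files are Parquet


class IcebergConfiguration(Configuration):
    """Wrapper that picks S3/Azure/HDFS at runtime (only S3 is modelled)."""

    def __init__(self, impl):
        super().__init__()  # BUG: creates the wrapper's own "auto" fields
        self.impl = impl    # ...while the real values live here


def resolve_format(config, files):
    """Use the configured format, or fall back to sniffing the files."""
    if config.format != "auto":
        return config.format
    if not files:
        # Empty table: there is nothing to detect the format from.
        raise CannotDetectFormat("no files to detect the format from")
    return "Parquet"  # placeholder for real content-based detection


iceberg = IcebergConfiguration(S3Configuration())
print(iceberg.impl.format)  # Parquet
print(iceberg.format)       # auto
try:
    resolve_format(iceberg, files=[])
except CannotDetectFormat as exc:
    print("error:", exc)
```

In this model, passing a non-empty file list makes resolve_format fall back to detection instead of raising, which mirrors why the problem only surfaces for empty tables.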

Technical changes: the fields of StorageObjectStorage::Configuration are now private and accessed via getters and setters.
Logical changes: these getters and setters are overridden in StorageIcebergConfiguration to use the underlying implementation (S3, Azure or HDFS).
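The idea behind the change can be sketched in Python, with properties standing in for the C++ getters and setters. This is a hypothetical, simplified model, not the actual ClickHouse classes or method names: _stored and _forwarded are illustrative helpers. Once every read and write goes through an accessor, the wrapper can override the accessor and forward it to the configuration it wraps, so its own default values are never seen.

```python
"""Hypothetical sketch of the fix (simplified; not actual ClickHouse code)."""


def _stored(name):
    """Plain getter/setter pair backed by a private attribute `_<name>`."""
    attr = "_" + name
    return property(
        lambda self: getattr(self, attr),
        lambda self, value: setattr(self, attr, value),
    )


def _forwarded(name):
    """Getter/setter pair that forwards to the wrapped `self._impl`."""
    return property(
        lambda self: getattr(self._impl, name),
        lambda self, value: setattr(self._impl, name, value),
    )


class Configuration:
    format = _stored("format")
    structure = _stored("structure")
    compression_method = _stored("compression_method")

    def __init__(self):
        # "auto" means "not known yet, detect it later".
        self._format = "auto"
        self._structure = "auto"
        self._compression_method = "auto"


class S3Configuration(Configuration):
    def __init__(self):
        super().__init__()
        self.format = "Parquet"  # goes through the base-class setter


class IcebergConfiguration(Configuration):
    """Overrides every accessor to use the wrapped implementation."""

    format = _forwarded("format")
    structure = _forwarded("structure")
    compression_method = _forwarded("compression_method")

    def __init__(self, impl):
        super().__init__()  # its own private fields still exist but are unused
        self._impl = impl


iceberg = IcebergConfiguration(S3Configuration())
print(iceberg.format)  # Parquet: no detection needed, even for an empty table
iceberg.compression_method = "zstd"
print(iceberg._impl.compression_method)  # zstd
```

In this model, overriding the accessors (rather than copying values into the wrapper once at construction) keeps the wrapper consistent if the wrapped configuration changes later.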

ianton-ru force-pushed the feature/fix_configuration_format branch from eba1242 to 173d6a5 on April 23, 2025 09:48
ianton-ru changed the title from "Make fields in object storage configuration private" to "Fix format, structure and compression method detection for DataLake" on Apr 23, 2025

Successfully merging this pull request may close these issues.

Fail to read empty Iceberg table with iceberg table function