@pyckle (Contributor) commented Dec 4, 2025

Rationale for this change

ParquetMetadataConverter has grown far too large and needs to be broken up.
SchemaElement conversion is a good first candidate to refactor into a separate class because:

  • It is an actively developed part of the class: recent changes for the variant and geospatial types touched this code
  • It's not strongly coupled to other conversion logic
  • Moving it to parquet-column will reduce code duplication in Parquet readers that want to avoid Hadoop dependencies (full disclosure: I had to duplicate some of this code in my downstream Parquet library)

What changes are included in this PR?

All SchemaElement conversion logic has been moved to ParquetSchemaConverter in the parquet-column module.
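For context: in the Parquet footer, the schema is stored as a flat list of SchemaElement entries in depth-first order, where each group element records how many direct children follow it (num_children). Converting between that list and a MessageType therefore amounts to flattening and rebuilding a tree. The sketch below is illustrative only: SchemaNode, FlatElement and SchemaFlattening are hypothetical, heavily simplified stand-ins (no physical types, repetition, or logical types), not the real parquet-java classes or the code in this PR.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a MessageType/GroupType/PrimitiveType tree.
// children == null marks a leaf (primitive) node.
record SchemaNode(String name, List<SchemaNode> children) {
  boolean isGroup() {
    return children != null;
  }
}

// Hypothetical stand-in for a Thrift SchemaElement.
// numChildren == null for leaves, as num_children is only set on groups.
record FlatElement(String name, Integer numChildren) {}

final class SchemaFlattening {
  private SchemaFlattening() {}

  // Depth-first (pre-order) walk: each group is emitted before its children,
  // with numChildren recording how many direct children follow.
  static List<FlatElement> flatten(SchemaNode root) {
    List<FlatElement> out = new ArrayList<>();
    visit(root, out);
    return out;
  }

  private static void visit(SchemaNode node, List<FlatElement> out) {
    if (node.isGroup()) {
      out.add(new FlatElement(node.name(), node.children().size()));
      for (SchemaNode child : node.children()) {
        visit(child, out);
      }
    } else {
      out.add(new FlatElement(node.name(), null));
    }
  }

  // Inverse: consume elements in order, recursing numChildren times per group.
  static SchemaNode rebuild(List<FlatElement> elements) {
    Iterator<FlatElement> it = elements.iterator();
    SchemaNode root = read(it);
    if (it.hasNext()) {
      throw new IllegalArgumentException("trailing schema elements");
    }
    return root;
  }

  private static SchemaNode read(Iterator<FlatElement> it) {
    FlatElement e = it.next(); // NoSuchElementException if the list is truncated
    if (e.numChildren() == null) {
      return new SchemaNode(e.name(), null);
    }
    List<SchemaNode> children = new ArrayList<>();
    for (int i = 0; i < e.numChildren(); i++) {
      children.add(read(it));
    }
    return new SchemaNode(e.name(), children);
  }
}
```

A round trip such as rebuild(flatten(root)).equals(root) is a cheap invariant to check in tests.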
As further cleanup, boilerplate enum conversion logic has been moved into its own separate class, and the tests have been moved accordingly.
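One way to cut down enum-mapping boilerplate, assuming the two enums share constant names (as the Thrift FieldRepetitionType and parquet-column's Type.Repetition both use REQUIRED, OPTIONAL and REPEATED), is a single generic name-based helper. This is a hypothetical sketch, not necessarily the approach taken in this PR; ThriftRepetition, Repetition and EnumConversions are made-up names.

```java
// Hypothetical stand-ins for the Thrift enum and the parquet-column enum.
enum ThriftRepetition { REQUIRED, OPTIONAL, REPEATED }

enum Repetition { REQUIRED, OPTIONAL, REPEATED }

final class EnumConversions {
  private EnumConversions() {}

  // Maps a constant to the same-named constant of another enum type.
  // Throws IllegalArgumentException if the target has no such constant,
  // and NullPointerException if source is null.
  static <S extends Enum<S>, T extends Enum<T>> T convertByName(S source, Class<T> target) {
    return Enum.valueOf(target, source.name());
  }

  static Repetition fromThrift(ThriftRepetition r) {
    return convertByName(r, Repetition.class);
  }

  static ThriftRepetition toThrift(Repetition r) {
    return convertByName(r, ThriftRepetition.class);
  }
}
```

The trade-off: a name-based mapping only fails at runtime when the two enums diverge, whereas an explicit switch keeps every mapping visible and lets a renamed constant be mapped deliberately.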
The logic for deriving a LogicalTypeAnnotation from the deprecated ConvertedType enum has also been lightly deduplicated.
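To illustrate the kind of deduplication meant here: with one shared mapping from ConvertedType to a logical type, several call sites can reuse a single switch instead of each carrying its own. The sketch covers only a few constants and uses hypothetical simplified types (ConvertedType, LogicalType, the records implementing it, and ConvertedTypes are stand-ins, not the parquet-java classes). The mappings follow the Parquet format's backward-compatibility rules: DECIMAL takes precision and scale from the SchemaElement, and the legacy TIMESTAMP_MILLIS type corresponds to a UTC-adjusted millisecond timestamp.

```java
// Hypothetical subset of the deprecated ConvertedType enum.
enum ConvertedType { UTF8, DATE, DECIMAL, INT_8, UINT_8, TIMESTAMP_MILLIS }

// Hypothetical, simplified stand-ins for LogicalTypeAnnotation subclasses.
interface LogicalType {}
record StringType() implements LogicalType {}
record DateType() implements LogicalType {}
record DecimalType(int precision, int scale) implements LogicalType {}
record IntType(int bitWidth, boolean signed) implements LogicalType {}
record TimestampType(boolean adjustedToUtc, String unit) implements LogicalType {}

final class ConvertedTypes {
  private ConvertedTypes() {}

  // Single shared mapping; precision and scale are only consulted for DECIMAL.
  static LogicalType toLogicalType(ConvertedType type, Integer precision, Integer scale) {
    switch (type) {
      case UTF8:
        return new StringType();
      case DATE:
        return new DateType();
      case DECIMAL:
        if (precision == null || scale == null) {
          throw new IllegalArgumentException("DECIMAL requires precision and scale");
        }
        return new DecimalType(precision, scale);
      case INT_8:
        return new IntType(8, true);
      case UINT_8:
        return new IntType(8, false);
      case TIMESTAMP_MILLIS:
        // Legacy TIMESTAMP_* converted types are defined as UTC-adjusted.
        return new TimestampType(true, "MILLIS");
      default:
        throw new IllegalArgumentException("unsupported converted type: " + type);
    }
  }
}
```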

Are these changes tested?

Existing tests have been moved carefully so that they continue to verify that behavior is unchanged.

Are there any user-facing changes?

  • Conversion functions for SchemaElement to and from MessageType are now public.
  • Existing public functions whose logic was moved are kept as deprecated delegates to preserve backward compatibility.
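The deprecated-delegate pattern described in the last bullet can be sketched as follows. Class and method names are hypothetical placeholders, not the actual ParquetMetadataConverter or ParquetSchemaConverter signatures: the old public method keeps its signature, is marked @Deprecated with a Javadoc pointer to the new location, and simply forwards the call.

```java
// Hypothetical new home of the moved logic (stand-in for ParquetSchemaConverter).
final class NewSchemaConverter {
  private NewSchemaConverter() {}

  static String describe(String schemaName) {
    return "message " + schemaName;
  }
}

// Hypothetical stand-in for the old class that keeps its public API.
class OldMetadataConverter {
  /**
   * @deprecated moved to {@link NewSchemaConverter#describe(String)}; kept as a
   *     thin delegate so existing callers still compile and behave the same.
   */
  @Deprecated
  public String describe(String schemaName) {
    return NewSchemaConverter.describe(schemaName);
  }
}
```

Because the old method only forwards, there is a single implementation to maintain, and callers get a deprecation warning pointing them at the new API.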

Closes #1835
Further cleanup of this class is still needed, so closing this issue may be premature. I think the next candidate to refactor out is the ColumnChunk metadata conversion.

Development: successfully merging this pull request may close the issue "ParquetMetadataConverter.java is too long".
