Skip to content

HIVE-28956 : Use msdb.alterPartitions() API and implement batching for alter table add column cascade command #5814

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vikramahuja1001
Copy link
Contributor

@vikramahuja1001 vikramahuja1001 commented May 14, 2025

What changes were proposed in this pull request?

Currently Alter table add column cascade uses msdb.alterPartition API. This API only JDO based implementation. Instead of this use msdb.alterPartitions which supports both DirectSql and JDO Implementation and use batching as well.

Why are the changes needed?

DirectSql bein more optimized than JDO can improve performance for tables with very high number of partitions and very high number of columns.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested this on a 3 node cluster on a table with 5000+ partitions and 900+ columns.

Table finalOldt = oldt;
int partitionBatchSize = MetastoreConf.getIntVar(handler.getConf(),
MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX);
Batchable.runBatched(partitionBatchSize, parts, new Batchable<Partition, Void>() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR naming is confusing. Where is the direct sql here? This PR introduces batching for partition metadata update, or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previously used msdb.alterPartition was only using JDO. The newly added msdb.alterPartitions API is based on DirectSql.

So this PR is introducing Batching and also using direct sql based msdb.alterPartitions API.

Copy link
Member

@deniskuzZ deniskuzZ May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the direct sql here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ , PR naming is confusing, i will update the same from my end. Thanks for pointing it out in your earlier messages.

tableName, part.getValues(), oldCols, finalOldt, part, null, null);
assert (colStats.isEmpty());
oldPart.setParameters(part.getParameters());
oldParts.add(oldPart);
Copy link
Member

@deniskuzZ deniskuzZ May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you constructing oldParts here if it's only used when cascade is false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

@vikramahuja1001 vikramahuja1001 changed the title HIVE-28956 : Implement DirectSql for alter table add column cascade command. HIVE-28956 : Use msdb.alterPartitions() API and implement batching for alter table add column cascade command May 22, 2025
Copy link

@vikramahuja1001
Copy link
Contributor Author

@deniskuzZ , i have made the necessary changes. Could you please check the PR once again?

@vikramahuja1001
Copy link
Contributor Author

Cc @zhangbutao , @chinnaraolalam

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants