Insert into bucketed but unpartitioned Hive table #25139

Open
wants to merge 3 commits into base: master

anandamideShakyan
Contributor

@anandamideShakyan anandamideShakyan commented May 18, 2025

Description

Addresses #25104
Currently, Presto does not support INSERT INTO operations on Hive tables that are bucketed but unpartitioned. The limitation comes from a hard-coded check in HiveWriterFactory:

https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L480
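
For reviewers who want to reproduce the scenario, here is a minimal sketch of the kind of statements involved. The catalog, schema, table, and column names are hypothetical and chosen only for illustration; `bucketed_by` and `bucket_count` are the Hive connector table properties for bucketing, and the table is unpartitioned because no `partitioned_by` property is set.

```sql
-- Hypothetical catalog, schema, and table names, for illustration only.
-- Bucketed on n_regionkey; no partitioned_by property, so the table is unpartitioned.
CREATE TABLE hive.default.bucketed_nation_example (
    n_nationkey BIGINT,
    n_name VARCHAR,
    n_regionkey BIGINT
)
WITH (
    bucketed_by = ARRAY['n_regionkey'],
    bucket_count = 2
);

-- The kind of statement this PR aims to support.
-- Per the description above, it is currently rejected by the check in HiveWriterFactory.
INSERT INTO hive.default.bucketed_nation_example
VALUES (0, 'ALGERIA', 0), (1, 'ARGENTINA', 1);
```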

Motivation and Context

Supporting writes to bucketed unpartitioned Hive tables in Presto would improve compatibility and enhance Presto’s ability to handle modern Hive table layouts. It's a reasonable and useful feature for users who wish to leverage bucketing for performance optimizations even without partitioning.

Impact

This change would align Presto’s behavior with the broader SQL-on-Hadoop ecosystem and remove an artificial limitation that may block valid use cases, particularly in data warehousing environments where bucketing is used independently of partitioning.

Release Notes

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 18, 2025
@anandamideShakyan anandamideShakyan marked this pull request as ready for review May 18, 2025 16:29
@anandamideShakyan anandamideShakyan requested a review from a team as a code owner May 18, 2025 16:29
@prestodb-ci prestodb-ci requested review from a team, namya28 and pramodsatya and removed request for a team May 18, 2025 16:29
@aditi-pandit
Contributor

@anandamideShakyan : Thanks for this PR.

Have you tried this functionality with Prestissimo? You might also need facebookincubator/velox#13283 for it.

@anandamideShakyan
Contributor Author

@aditi-pandit Sure, I will add support in Prestissimo after facebookincubator/velox#13283 is merged.

@aditi-pandit
Contributor

@anandamideShakyan : There are failures in the product tests. PTAL.

2025-05-18 19:49:10 INFO: [78 of 435] com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables (Groups: )
2025-05-18 19:49:11 INFO: FAILURE     /    com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables (Groups: ) took 1.1 seconds
2025-05-18 19:49:11 SEVERE: Failure cause:
java.lang.IllegalArgumentException: No mutable table instance found for name TableHandle{name=bucket_nation}
	at io.prestodb.tempto.fulfillment.table.TablesState.get(TablesState.java:64)
	at io.prestodb.tempto.fulfillment.table.TablesState.get(TablesState.java:48)
	at com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables(TestHiveBucketedTables.java:173)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 698df8b to 793a4b5 Compare May 25, 2025 04:43
@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 793a4b5 to 4f7929a Compare May 25, 2025 07:24
Labels
from:IBM PR from IBM
3 participants