Skip to content

HIVE-27275: Provide docker image for HMS #5834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

deniskuzZ
Copy link
Member

@deniskuzZ deniskuzZ commented May 31, 2025

What changes were proposed in this pull request?

Lightweight Docker image for the Hive Metastore

Why are the changes needed?

Simplified ddeployment

Does this PR introduce any user-facing change?

No

How was this patch tested?

Locally

@deniskuzZ deniskuzZ marked this pull request as draft May 31, 2025 17:01
@deniskuzZ deniskuzZ force-pushed the HIVE-27275 branch 3 times, most recently from 38c537c to ae15116 Compare May 31, 2025 21:45
@deniskuzZ deniskuzZ marked this pull request as ready for review June 2, 2025 11:02
@deniskuzZ deniskuzZ changed the title [DRAFT ]Provide docker image for HMS Provide docker image for HMS Jun 2, 2025
@deniskuzZ deniskuzZ changed the title Provide docker image for HMS HIVE-27275: Provide docker image for HMS Jun 2, 2025
@deniskuzZ deniskuzZ requested a review from dengzhhu653 June 2, 2025 11:07
@@ -875,7 +875,7 @@ public enum ConfVars {
EVENT_DB_LISTENER_CLEAN_STARTUP_WAIT_INTERVAL("metastore.event.db.listener.clean.startup.wait.interval",
"hive.metastore.event.db.listener.clean.startup.wait.interval", 1, TimeUnit.DAYS,
"Wait interval post start of metastore after which the cleaner thread starts to work"),
EVENT_DB_NOTIFICATION_API_AUTH("metastore.metastore.event.db.notification.api.auth",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it's good to change the property name that existing in old releases

Copy link
Member Author

@deniskuzZ deniskuzZ Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No one used such a name, but a hive alias

docker run -d -p 9083:9083 --env DB_DRIVER=postgres \
--env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://postgres:5432/metastore_db -Djavax.jdo.option.ConnectionUserName=hive -Djavax.jdo.option.ConnectionPassword=password" \
--mount source=warehouse,target=/opt/hive/data/warehouse \
--mount type=bind,source=`mvn help:evaluate -Dexpression=settings.localRepository -q -DforceStdout`/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar,target=/opt/hive/lib/postgres.jar \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is very convenient if major JDBC drivers are legally included.

Copy link
Member Author

@deniskuzZ deniskuzZ Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should embed the drivers, that would affect the docker image size + won't allow users to pass a custom version of a driver
Instead they could place the drivers under the POSTGRES_LOCAL_PATH

Copy link
Member

@dengzhhu653 dengzhhu653 Jun 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point is we might have license issues if we include all the major drivers, though I'm not sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, I think we can create another ticket if we are truly sure we need it.

Probably, k8s users prefer to bake it into their images rather than having a tricky manifest. The default drivers can eliminate the extra work. They can still overwrite the driver when they want to use their own.
Regarding the license, the licenses of some drivers, such as PostgreSQL, are compatible if I remember correctly.

Copy link

sonarqubecloud bot commented Jun 5, 2025

</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-standalone-metastore-rest-catalog</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I use the command mvn clean install -DskipTests -Pdocker to build the image, an issue:

[ERROR] The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='org.apache.hive:hive-standalone-metastore-rest-catalog:4.1.0-SNAPSHOT'}' and 'Vertex{label='org.apache.hive:hive-standalone-metastore-server:4.1.0-SNAPSHOT'}' introduces to cycle in the graph org.apache.hive:hive-standalone-metastore-server:4.1.0-SNAPSHOT --> org.apache.hive:hive-standalone-metastore-rest-catalog:4.1.0-SNAPSHOT --> org.apache.hive:hive-standalone-metastore-server:4.1.0-SNAPSHOT -> [Help 1]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dengzhhu653, you should execute the above command from within meteastore-server; otherwise, you'll encounter a cyclic reference problem.
The problem is that instead of a code refactor and extraction of auth classes into common, metastore-server was added as a direct dependency for rest-catalog. I think that is out-of-scope of this PR, but we should address it a future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants