Site: Add docs for catalog federation #2761
Changes from all commits
bf3e0ed
c56d74a
b9e4114
e3a8fed
5c9401d
204c40b
9b191a9
2c975ed
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
title: Federation | ||
type: docs | ||
weight: 703 | ||
--- | ||
|
||
Guides for federating Polaris with existing metadata services. Expand this section to select a | ||
specific integration. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,125 @@ | ||
--- | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
title: Hive Metastore Federation | ||
type: docs | ||
weight: 705 | ||
--- | ||
|
||
Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external | ||
HMS remain the source of truth for table metadata while Polaris brokers access, policies, and | ||
multi-engine connectivity. | ||
|
||
## Build-time enablement | ||
|
||
The Hive factory is packaged as an optional extension and is not baked into default server builds. | ||
Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle | ||
property to include `HIVE` (and any other non-REST backends you need): | ||
|
||
```bash | ||
./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \ | ||
-DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true | ||
``` | ||
|
||
`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries | ||
built without it will reject Hive federation requests. | ||
|
||
## Runtime requirements | ||
|
||
- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris | ||
deployment. | ||
- **Configuration discovery:** Iceberg’s `HiveCatalog` loads Hadoop/Hive client settings from the | ||
classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via | ||
`HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer. | ||
- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses | ||
the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the | ||
service principal is logged in or holds a valid keytab/TGT before starting Polaris. | ||
- **Object storage role:** Configure `polaris.service-identity.<realm>.aws-iam.*` (or the default | ||
realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow | ||
STS access from the Polaris service identity and grant permissions to the table locations. | ||
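
The first two requirements above can be checked before Polaris starts. The following is a minimal sketch, not part of Polaris: the function names are hypothetical, and `hms_reachable` assumes a netcat build that supports `-z` and `-w`.

```shell
# Hypothetical preflight helpers; function names are illustrative, not part of Polaris.

# Split thrift://host:port into "host port"; return 1 for anything else.
parse_thrift_uri() {
  local uri="$1" rest host port
  case "$uri" in
    thrift://*:*) ;;
    *) echo "not a thrift://host:port URI: $uri" >&2; return 1 ;;
  esac
  rest=${uri#thrift://}
  host=${rest%:*}
  port=${rest##*:}
  case "$host" in
    ''|*[:/]*) echo "invalid host in: $uri" >&2; return 1 ;;
  esac
  case "$port" in
    ''|*[!0-9]*) echo "invalid port in: $uri" >&2; return 1 ;;
  esac
  printf '%s %s\n' "$host" "$port"
}

# Succeed when hive-site.xml is present in the given configuration directory.
has_hive_site() {
  [ -f "$1/hive-site.xml" ]
}

# Succeed when a TCP connection to host ($1) and port ($2) opens within 5 seconds.
# Assumption: the installed nc supports -z (scan only) and -w (timeout).
hms_reachable() {
  nc -z -w 5 "$1" "$2"
}
```

A typical use would be `has_hive_site "$HADOOP_CONF_DIR"` followed by `hms_reachable $(parse_thrift_uri thrift://hms.example.internal:9083)`, aborting startup if either fails.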
|
||
### Kerberos setup example | ||
|
||
If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris: | ||
|
||
```bash | ||
export KRB5_CONFIG=/etc/polaris/krb5.conf | ||
export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf # contains hive-site.xml with HMS principal | ||
export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf" | ||
kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/[email protected] | ||
``` | ||
|
||
- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and the | ||
client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`). | ||
- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and | ||
point to the same keytab shown above so the Polaris JVM can refresh credentials automatically. | ||
- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes | ||
the TGT at startup and for periodic renewal. | ||
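
The keytab and ticket conditions in the list above can be verified with a short script before launching the server. This is a sketch under stated assumptions: the function names are hypothetical, `stat -c '%a'` is the GNU form (BSD and macOS use `stat -f '%Lp'`), and `klist -s` is the MIT Kerberos silent check that exits non-zero when the cache is missing or expired.

```shell
# Hypothetical helpers; not part of Polaris.

# Succeed only when the octal mode grants nothing to group or other (for example 400 or 600).
keytab_mode_is_private() {
  case "$1" in
    [0-7]00) return 0 ;;
    *) return 1 ;;
  esac
}

# Check that the keytab is private and that a valid ticket is cached.
check_kerberos_ready() {
  local keytab="$1" mode
  mode=$(stat -c '%a' "$keytab") || return 1   # GNU stat; adjust on BSD/macOS
  keytab_mode_is_private "$mode" || {
    echo "keytab mode $mode is readable by group or other" >&2
    return 1
  }
  klist -s || {
    echo "no valid Kerberos ticket in the credential cache" >&2
    return 1
  }
}
```

Running `check_kerberos_ready /etc/polaris/keytabs/polaris.keytab` after `kinit` gives an early failure instead of an authentication error on the first federated request.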
|
||
## Creating a federated catalog | ||
|
||
Use the Management API (or the Python CLI) to create an external catalog whose connection type is | ||
`HIVE`. The following request registers a catalog that proxies to an HMS running on | ||
`thrift://hms.example.internal:9083`: | ||
|
||
```bash | ||
curl -X POST https://<polaris-host>/management/v1/catalogs \ | ||
-H "Authorization: Bearer $TOKEN" \ | ||
-H "Content-Type: application/json" \ | ||
-d '{ | ||
"type": "EXTERNAL", | ||
"name": "analytics_hms", | ||
"storageConfigInfo": { | ||
"storageType": "S3", | ||
"roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access", | ||
"region": "us-east-1" | ||
}, | ||
"properties": { "default-base-location": "s3://analytics-bucket/warehouse/" }, | ||
"connectionConfigInfo": { | ||
"connectionType": "HIVE", | ||
"uri": "thrift://hms.example.internal:9083", | ||
"warehouse": "s3://analytics-bucket/warehouse/", | ||
"authenticationParameters": { "authenticationType": "IMPLICIT" } | ||
} | ||
}' | ||
``` | ||
|
||
We should be able to use polaris cli to create the catalog now
Thanks @XJDKC! I can add the cli version as well
Most of the docs just reference the CLI, so if the CLI works I think we should prefer that
I have changed the REST example. The HMS cli support wasn't there. I can make the change after adding the support. https://github.com/polaris-catalog/polaris/blob/d449f59e9b54b414c0dd581e086d5ce436b05708/client/python/cli/command/catalogs.py#L271-L271
|
||
Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can | ||
obtain tokens that authorize against the federated metadata. | ||
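
As a sketch of that grant flow, the helper below chains the CLI subcommands used in the Polaris quickstart for internal catalogs. The role names in the usage example and the choice of `CATALOG_MANAGE_CONTENT` are assumptions for illustration, and the `run` wrapper is hypothetical; it only prints each command unless `DRY_RUN=0` is set.

```shell
# Print the command instead of executing it unless DRY_RUN=0 (hypothetical wrapper).
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Create a catalog role, grant it a privilege, and attach it to a principal role.
# Assumption: CATALOG_MANAGE_CONTENT is the privilege you want; pick a narrower one if needed.
grant_federated_access() {
  local catalog="$1" catalog_role="$2" principal_role="$3"
  run polaris catalog-roles create --catalog "$catalog" "$catalog_role"
  run polaris privileges catalog grant --catalog "$catalog" \
    --catalog-role "$catalog_role" CATALOG_MANAGE_CONTENT
  run polaris catalog-roles grant --catalog "$catalog" \
    --principal-role "$principal_role" "$catalog_role"
}
```

For example, `grant_federated_access analytics_hms hms_content_role analyst_role` prints the three commands for review; rerun with `DRY_RUN=0` to execute them.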
|
||
`default-base-location` is required; it tells Polaris and Iceberg where to place new metadata files. | ||
`allowedLocations` is optional—supply it only when you want to restrict writers to a specific set of | ||
prefixes. If your IAM trust policy requires an `externalId` or explicit `userArn`, include those | ||
optional fields in `storageConfigInfo`. Polaris persists them and supplies them when assuming the | ||
role cited by `roleArn` during metadata commits. | ||
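
To illustrate the optional fields, the sketch below emits a `storageConfigInfo` object that includes `allowedLocations` and, when supplied, `externalId`. The function name and the external ID value are hypothetical, and the helper does not JSON-escape its arguments, so it is only suitable for simple values.

```shell
# Emit a storageConfigInfo JSON object (hypothetical helper; values are not JSON-escaped).
# Arguments: role ARN, region, one allowed location, optional external ID.
storage_config_json() {
  local role_arn="$1" region="$2" allowed_location="$3" external_id="${4:-}"
  printf '{\n'
  printf '  "storageType": "S3",\n'
  printf '  "roleArn": "%s",\n' "$role_arn"
  printf '  "region": "%s",\n' "$region"
  if [ -n "$external_id" ]; then
    # Only needed when the IAM trust policy requires an external ID.
    printf '  "externalId": "%s",\n' "$external_id"
  fi
  printf '  "allowedLocations": ["%s"]\n' "$allowed_location"
  printf '}\n'
}
```

Calling `storage_config_json "arn:aws:iam::123456789012:role/polaris-warehouse-access" us-east-1 "s3://analytics-bucket/warehouse/" demo-external-id` produces an object that can replace the `storageConfigInfo` value in the request body above.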
|
||
## Limitations and operational notes | ||
|
||
- **Single identity:** Because only `IMPLICIT` authentication is permitted, Polaris cannot mix | ||
multiple Hive identities in a single deployment (`HiveFederatedCatalogFactory` rejects other auth | ||
types). Plan a deployment topology that aligns the Polaris process identity with the target HMS. | ||
- **Generic tables:** The Hive extension exposes Iceberg tables registered in HMS. Generic table | ||
federation is not implemented (`HiveFederatedCatalogFactory#createGenericCatalog` throws | ||
`UnsupportedOperationException`). | ||
- **Failover and routing:** Atlas-style catalog failover and multi-HMS routing are not yet handled; | ||
Polaris initializes one `HiveCatalog` per connection and relies on the underlying Iceberg client | ||
for retries. | ||
|
||
With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed | ||
there gain OAuth-protected, multi-engine access through the Polaris REST APIs. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
--- | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
title: Iceberg REST Federation | ||
type: docs | ||
weight: 704 | ||
--- | ||
|
||
Polaris can federate to an external Iceberg REST catalog (for example, another Polaris deployment, AWS Glue, or a custom Iceberg | ||
REST implementation), which lets a Polaris service access tables and views managed by the remote catalog. | ||
|
||
## Runtime requirements | ||
|
||
- **REST endpoint:** The remote service must implement the Iceberg REST Catalog specification. Configure | ||
firewalls so Polaris can reach the base URI you provide in the connection config. | ||
- **Authentication:** Polaris forwards requests using the credentials defined in | ||
`ConnectionConfigInfo.AuthenticationParameters`. OAuth2 client credentials, bearer tokens, and AWS | ||
SigV4 are supported; choose the scheme the remote service expects. | ||
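
Before registering the catalog, it can help to confirm from the Polaris host that the remote endpoint answers the Iceberg REST `GET /v1/config` call. This is a rough sketch: the function names are hypothetical, it assumes the base URI is the prefix to which Iceberg REST paths are appended, and it assumes the remote service accepts a bearer token.

```shell
# Build the Iceberg REST config URL from a base URI, tolerating one trailing slash.
config_url() {
  printf '%s/v1/config\n' "${1%/}"
}

# Probe the remote catalog; exits non-zero on connection failure or HTTP errors.
# Arguments: base URI, bearer token (assumption: the remote accepts bearer tokens).
probe_rest_catalog() {
  curl --silent --show-error --fail --max-time 10 \
    -H "Authorization: Bearer $2" \
    "$(config_url "$1")"
}
```

For instance, `probe_rest_catalog "https://rest-catalog.example.com/api/catalog" "$REMOTE_TOKEN"` (a hypothetical host and variable) should print the remote catalog's configuration when connectivity and credentials are in order.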
|
||
## Creating a federated REST catalog | ||
|
||
The snippet below registers an external catalog that forwards to a remote Polaris server using OAuth2 | ||
client credentials. `iceberg-remote-catalog-name` is optional; supply it when the remote server multiplexes | ||
multiple logical catalogs under one URI. | ||
|
||
```bash | ||
polaris catalogs create \ | ||
--type EXTERNAL \ | ||
--storage-type s3 \ | ||
--role-arn "arn:aws:iam::123456789012:role/polaris-warehouse-access" \ | ||
--default-base-location "s3://analytics-bucket/warehouse/" \ | ||
--catalog-connection-type iceberg-rest \ | ||
--iceberg-remote-catalog-name analytics \ | ||
--catalog-uri "https://remote-polaris.example.com/catalog/v1" \ | ||
--catalog-authentication-type OAUTH \ | ||
--catalog-token-uri "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \ | ||
--catalog-client-id "<remote-client-id>" \ | ||
--catalog-client-secret "<remote-client-secret>" \ | ||
--catalog-client-scopes "PRINCIPAL_ROLE:ALL" \ | ||
analytics_rest | ||
``` | ||
|
||
HonahX marked this conversation as resolved.
|
||
Refer to the [CLI documentation](../command-line-interface.md#catalogs) for details on alternative authentication types such as BEARER or SIGV4. | ||
|
||
Grant catalog roles to principal roles the same way you do for internal catalogs so compute engines | ||
receive tokens with access to the federated namespace. | ||
HonahX marked this conversation as resolved.
|
||
|
||
## Operational notes | ||
|
||
- **Connectivity checks:** Polaris contacts the remote service eagerly rather than lazily; catalog creation fails if | ||
the REST endpoint is unreachable or authentication is rejected. | ||
- **Feature parity:** Federation exposes whatever table/namespace operations the remote service | ||
implements. Unsupported features return the remote error directly to callers. | ||
- **Generic tables:** The REST federation path currently surfaces Iceberg tables only; generic table | ||
federation is not implemented. |
I think this document can also be used for Hadoop catalogs. Essentially, the mechanism for HMS and Hadoop is the same. We could call these non-REST catalogs.
CC @eric-maynard
Agreed, let me make that change
This page is about HMS so it's maybe okay to focus on HMS, but @poojanilangekar is absolutely correct that Polaris is meant to be able to federate to any HadoopCatalog implementation and we should make the docs clear about that (if not on this page then elsewhere). There are also catalogs which do use REST which Polaris could federate to (e.g. Unity Catalog).
I can add hadoop one as a followup