26 changes: 26 additions & 0 deletions site/content/in-dev/unreleased/federation/_index.md
@@ -0,0 +1,26 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Federation
type: docs
weight: 703
---

Guides for federating Polaris with existing metadata services. Expand this section to select a
specific integration.
125 changes: 125 additions & 0 deletions site/content/in-dev/unreleased/federation/hive-metastore-federation.md
@@ -0,0 +1,125 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Hive Metastore Federation
type: docs
weight: 705
---

Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external
Comment on lines +23 to +25
Contributor

I think this document can also be used for Hadoop catalogs. Essentially, the mechanism for HMS and Hadoop is the same. We could call these non-REST catalogs.

CC @eric-maynard

Contributor Author

Agreed, let me make that change

Contributor

This page is about HMS so it's maybe okay to focus on HMS, but @poojanilangekar is absolutely correct that Polaris is meant to be able to federate to any HadoopCatalog implementation and we should make the docs clear about that (if not on this page then elsewhere). There are also catalogs which do use REST which Polaris could federate to (e.g. Unity Catalog).

Contributor Author

I can add hadoop one as a followup

HMS remain the source of truth for table metadata while Polaris brokers access, policies, and
multi-engine connectivity.

## Build-time enablement

The Hive factory is packaged as an optional extension and is not baked into default server builds.
Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle
property to include `HIVE` (and any other non-REST backends you need):

```bash
./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
-DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true
```

`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries
built without it will reject Hive federation requests.

## Runtime requirements

- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris
deployment.
- **Configuration discovery:** Iceberg’s `HiveCatalog` loads Hadoop/Hive client settings from the
classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via
`HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer.
- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses
the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the
service principal is logged in or holds a valid keytab/TGT before starting Polaris.
- **Object storage role:** Configure `polaris.service-identity.<realm>.aws-iam.*` (or the default
realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow
STS access from the Polaris service identity and grant permissions to the table locations.
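The connectivity and configuration requirements above lend themselves to a quick preflight before Polaris starts. The sketch below is one way to do that in POSIX shell. `parse_thrift_uri` and `check_hms_conf` are hypothetical helper names, not part of Polaris, and the checks only validate the URI shape and the presence of `hive-site.xml`; they do not prove that the metastore is reachable.

```bash
# Hypothetical preflight helpers (not shipped with Polaris).

# Print "host port" for a thrift://host:port URI; return 1 for anything else.
parse_thrift_uri() {
  case "$1" in
    thrift://*:*) ;;
    *) echo "not a thrift://host:port URI: $1" >&2; return 1 ;;
  esac
  _rest=${1#thrift://}
  _host=${_rest%:*}
  _port=${_rest##*:}
  case "$_host" in
    ''|*[:/]*) echo "invalid host in URI: $1" >&2; return 1 ;;
  esac
  case "$_port" in
    ''|*[!0-9]*) echo "invalid port in URI: $1" >&2; return 1 ;;
  esac
  echo "$_host $_port"
}

# Succeed only if the given configuration directory contains hive-site.xml.
check_hms_conf() {
  [ -n "${1:-}" ] && [ -f "$1/hive-site.xml" ]
}
```

For example, `parse_thrift_uri "thrift://hms.example.internal:9083"` prints `hms.example.internal 9083`, and `check_hms_conf "${HIVE_CONF_DIR:-$HADOOP_CONF_DIR}"` could gate startup in a container entrypoint.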

### Kerberos setup example

If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:

```bash
export KRB5_CONFIG=/etc/polaris/krb5.conf
export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf # contains hive-site.xml with HMS principal
export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
kinit -kt /etc/polaris/keytabs/polaris.keytab "polaris/<host>@<REALM>"
```

- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and
client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`).
- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and
point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes
the TGT at startup and for periodic renewal.
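The last point, keeping the keytab readable only by the Polaris service user, is easy to verify from a startup script. The following is a minimal sketch, assuming a POSIX `ls` whose long listing begins with the ten-character mode string. `keytab_is_private` is a hypothetical helper name, and it inspects only the mode bits, not the file owner.

```bash
# Hypothetical helper: succeed only if the file exists and grants
# no permissions to group or others (for example mode 600 or 400).
keytab_is_private() {
  [ -f "$1" ] || return 1
  # Characters 5-10 of the `ls -l` mode string are the group and other bits.
  _bits=$(ls -ld -- "$1" | cut -c5-10)
  [ "$_bits" = "------" ]
}
```

A startup script might call `keytab_is_private /etc/polaris/keytabs/polaris.keytab || echo "keytab permissions are too open" >&2` before running `kinit`.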

## Creating a federated catalog

Use the Management API (or the Python CLI) to create an external catalog whose connection type is
`HIVE`. The following request registers a catalog that proxies to an HMS running on
`thrift://hms.example.internal:9083`:

```bash
curl -X POST https://<polaris-host>/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "EXTERNAL",
    "name": "analytics_hms",
    "storageConfigInfo": {
      "storageType": "S3",
      "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
      "region": "us-east-1"
    },
    "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
    "connectionConfigInfo": {
      "connectionType": "HIVE",
      "uri": "thrift://hms.example.internal:9083",
      "warehouse": "s3://analytics-bucket/warehouse/",
      "authenticationParameters": { "authenticationType": "IMPLICIT" }
    }
  }'
```

Member

We should be able to use polaris cli to create the catalog now

Contributor Author

Thanks @XJDKC! I can add the cli version as well

Contributor

Most of the docs just reference the CLI, so if the CLI works I think we should prefer that

Contributor Author

I have changed the REST example. The HMS cli support wasn't there. I can make the change after adding the support. https://github.com/polaris-catalog/polaris/blob/d449f59e9b54b414c0dd581e086d5ce436b05708/client/python/cli/command/catalogs.py#L271-L271

Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can
obtain tokens that authorize against the federated metadata.

`default-base-location` is required; it tells Polaris and Iceberg where to place new metadata files.
`allowedLocations` is optional—supply it only when you want to restrict writers to a specific set of
prefixes. If your IAM trust policy requires an `externalId` or explicit `userArn`, include those
optional fields in `storageConfigInfo`. Polaris persists them and supplies them when assuming the
role cited by `roleArn` during metadata commits.
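To make the optional fields concrete, the sketch below prints a `storageConfigInfo` object and adds `externalId` only when one is supplied. It is illustrative rather than authoritative: `storage_config_json` is a hypothetical helper, the argument order is arbitrary, and placing `allowedLocations` inside `storageConfigInfo` is an assumption based on the field list above. Values are interpolated verbatim, so they must not contain double quotes or backslashes.

```bash
# Hypothetical helper: print a storageConfigInfo JSON object.
# Arguments: role ARN, region, one allowed location, optional external ID.
storage_config_json() {
  _role_arn=$1
  _region=$2
  _allowed=$3
  _external_id=${4:-}
  printf '{\n'
  printf '  "storageType": "S3",\n'
  printf '  "roleArn": "%s",\n' "$_role_arn"
  # externalId is emitted only when the IAM trust policy requires one.
  if [ -n "$_external_id" ]; then
    printf '  "externalId": "%s",\n' "$_external_id"
  fi
  printf '  "allowedLocations": ["%s"],\n' "$_allowed"
  printf '  "region": "%s"\n' "$_region"
  printf '}\n'
}
```

The output can be pasted into the `storageConfigInfo` field of the catalog creation request; omitting the fourth argument yields the same shape as the example request, plus the `allowedLocations` restriction.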

## Limitations and operational notes

- **Single identity:** Because only `IMPLICIT` authentication is permitted, Polaris cannot mix
multiple Hive identities in a single deployment (`HiveFederatedCatalogFactory` rejects other auth
types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
- **Generic tables:** The Hive extension exposes Iceberg tables registered in HMS. Generic table
federation is not implemented (`HiveFederatedCatalogFactory#createGenericCatalog` throws
`UnsupportedOperationException`).
- **Failover and routing:** Atlas-style catalog failover and multi-HMS routing are not yet handled;
Polaris initializes one `HiveCatalog` per connection and relies on the underlying Iceberg client
for retries.

With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed
there gain OAuth-protected, multi-engine access through the Polaris REST APIs.
@@ -0,0 +1,71 @@
---
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
title: Iceberg REST Federation
type: docs
weight: 704
---

Polaris can federate to an external Iceberg REST catalog (for example, another Polaris deployment, AWS Glue, or a custom Iceberg
REST implementation), which lets a Polaris service access table and view entities managed by the remote catalog.

## Runtime requirements

- **REST endpoint:** The remote service must expose the Iceberg REST specification. Configure
firewalls so Polaris can reach the base URI you provide in the connection config.
- **Authentication:** Polaris forwards requests using the credentials defined in
`ConnectionConfigInfo.AuthenticationParameters`. OAuth2 client credentials, bearer tokens, and AWS
SigV4 are supported; choose the scheme the remote service expects.

## Creating a federated REST catalog

The snippet below registers an external catalog that forwards to a remote Polaris server using OAuth2
client credentials. `iceberg-remote-catalog-name` is optional; supply it when the remote server multiplexes
multiple logical catalogs under one URI.

```bash
polaris catalogs create \
--type EXTERNAL \
--storage-type s3 \
--role-arn "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
--default-base-location "s3://analytics-bucket/warehouse/" \
--catalog-connection-type iceberg-rest \
--iceberg-remote-catalog-name analytics \
--catalog-uri "https://remote-polaris.example.com/catalog/v1" \
--catalog-authentication-type OAUTH \
--catalog-token-uri "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \
--catalog-client-id "<remote-client-id>" \
--catalog-client-secret "<remote-client-secret>" \
--catalog-client-scopes "PRINCIPAL_ROLE:ALL" \
analytics_rest
```

Refer to the [CLI documentation](../command-line-interface.md#catalogs) for details on alternative authentication types such as BEARER or SIGV4.

Grant catalog roles to principal roles the same way you do for internal catalogs so compute engines
receive tokens with access to the federated namespace.

## Operational notes

- **Connectivity checks:** Polaris contacts the remote service when the catalog is created rather than
  deferring the check to first use, so catalog creation fails if the REST endpoint is unreachable or
  authentication is rejected.
- **Feature parity:** Federation exposes whatever table/namespace operations the remote service
implements. Unsupported features return the remote error directly to callers.
- **Generic tables:** The REST federation path currently surfaces Iceberg tables only; generic table
federation is not implemented.