
Conversation

HonahX
Contributor

@HonahX HonahX commented Oct 1, 2025

Overview

This PR extracts the credential vending entrypoint getAccessConfig from IcebergCatalog into a new, centralized AccessConfigProvider class, decoupling credential generation from catalog implementations.

The old SupportsCredentialVending interface is removed in this PR, following the review discussion below.

Motivation

When implementing credential vending for catalog federation, the current model proved difficult to extend. The existing approach models credential vending similarly to Iceberg's capability interfaces (SupportsNamespaces, ViewCatalog), using inheritance to determine whether a catalog supports certain functionality.

This inheritance-based approach works well for Iceberg-native interfaces like SupportsNamespaces because:

  • The interface is defined in the upstream Iceberg SDK
  • All catalog implementations naturally follow the upstream contract

However, for SupportsCredentialDelegation:

  • It's a Polaris-specific interface, not part of upstream Iceberg
  • External/federated catalogs won't naturally implement this Polaris-specific interface
  • We only perform this check in one place, in IcebergCatalogHandler
  • The "get subscoped credentials" operation is not a standard API defined in our catalog spec

The inheritance approach does not provide much value in this case. Additionally, as outlined in our federation design doc, we plan to make getAccessConfig a unified entry point for all storage access. Extracting this logic into a shared service prepares the codebase for that unification.

I considered introducing a wrapper class (e.g., CredentialVendingCatalogWrapper) to add credential vending capability to external catalogs. However, IcebergCatalogHandler already serves as an adapter layer between REST APIs and catalog implementations (both internal and federated). Adding another wrapping layer would over-complicate the architecture with minimal benefit, especially since the capability check happens in only one location.
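
To make the contrast concrete, here is a minimal sketch of the handler-side change, assuming simplified stand-in types; the real Polaris signatures and parameters differ.

    // Illustrative sketch only; all types below are simplified stand-ins.
    interface AccessConfig {}
    interface CallContext {}

    // Before: capability discovered via a Polaris-specific marker interface.
    interface SupportsCredentialDelegation {
      AccessConfig getAccessConfig(String tableLocation);
    }

    // After: a centralized service owns credential vending, so the handler
    // no longer needs an instanceof check on the catalog implementation.
    class AccessConfigProvider {
      AccessConfig getAccessConfig(CallContext ctx, String tableLocation) {
        // resolve the storage config for the location and vend sub-scoped credentials
        return new AccessConfig() {};
      }
    }

    class IcebergCatalogHandler {
      private final AccessConfigProvider accessConfigProvider;
      private final Object baseCatalog; // internal or federated catalog

      IcebergCatalogHandler(AccessConfigProvider provider, Object catalog) {
        this.accessConfigProvider = provider;
        this.baseCatalog = catalog;
      }

      AccessConfig vendCredentials(CallContext ctx, String tableLocation) {
        // Old inheritance-based check (removed in this PR):
        //   if (baseCatalog instanceof SupportsCredentialDelegation) { ... }
        // New: delegate to the centralized provider regardless of catalog type.
        return accessConfigProvider.getAccessConfig(ctx, tableLocation);
      }
    }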

* @return {@link AccessConfig} with scoped credentials and metadata; empty if no storage config
* found
*/
public AccessConfig getAccessConfig(
Contributor Author

In a follow-up, this aims to replace

    FileIOUtil.refreshAccessConfig(
        callContext,
        storageCredentialCache,
        credentialVendor,
        identifier,
        tableLocations,
        storageActions,
        storageInfo,
        Optional.empty()));

such that all storage access will start from this call.

Contributor

Sweet 👍

@HonahX HonahX changed the title [Draft] Extract IcebergCatalog.getAccessConfig to a separate class AccessConfigProvider Extract IcebergCatalog.getAccessConfig to a separate class AccessConfigProvider Oct 1, 2025
@HonahX HonahX marked this pull request as ready for review October 1, 2025 00:59
@HonahX HonahX marked this pull request as draft October 1, 2025 01:32
@HonahX HonahX marked this pull request as ready for review October 1, 2025 01:56
* primary entrypoint to get sub-scoped credentials for accessing table data.
*/
@ApplicationScoped
public class AccessConfigProvider {
Member

@XJDKC XJDKC Oct 1, 2025

nit: Should we make it clear that this is to provide storage access config? Considering we now need to access two types of external services: storage and catalog.

Contributor

That SGTM, but then we'll probably have to rename AccessConfig too for the sake of clarity.

Contributor Author

I like the idea too! Let's do this in a follow-up, since the rename will produce a large git diff : )

Contributor

Follow up SGTM

@dimas-b
Contributor

dimas-b commented Oct 1, 2025

Let's finish #2711 first, because it's a follow-up to some other changes already merged for 1.2.0 and I believe we need to address it before the release.

@HonahX
Contributor Author

HonahX commented Oct 1, 2025

@dimas-b Sounds good! I also do not want to introduce any new deprecation before the upcoming RC; I just opened this to gather feedback early : )

*/
@Override
@Deprecated
public AccessConfig getAccessConfig(
Contributor

This method has no callers in the Apache Polaris codebase. I think we should remove it right away to simplify code maintenance.

This would be consistent with the current code evolution policy: https://polaris.apache.org/releases/1.1.0/evolution/

Contributor

To be clear: any callers to this method from downstream projects can simply inline this method (or switch to AccessConfigProvider).

Contributor Author

Sounds good! I think the instanceof check also happens in only one place, in IcebergCatalogHandler. I will remove the method and the interface, but leave this conversation open in case anyone notices a downstream dependency on this.

@dimas-b
Contributor

dimas-b commented Oct 1, 2025

The PR LGTM overall 👍

* <p>This provider decouples credential vending from catalog implementations, and should be the
* primary entrypoint to get sub-scoped credentials for accessing table data.
*/
@ApplicationScoped
Contributor

@dimas-b dimas-b Oct 1, 2025

Since this class needs a CallContext, how about making it request-scoped and injecting the CallContext?

Contributor Author

Good question! When writing this "provider", I also debated whether to make it ApplicationScoped or RequestScoped. Making it RequestScoped, as you mentioned, would let us inject the CallContext and the PolarisMetaStoreManager, which makes the method implementation simpler. However, it also means usage of this bean would be limited to request-scoped beans. For example, DefaultFileIOFactory could not directly inject it.

IMHO, if a provider/factory class like this one does not depend on request state but only needs to pass the context through, it's better to keep it ApplicationScoped so its usage is not limited. WDYT?

Contributor

After poking at the code, I believe in this case the CallContext is needed only for AtomicOperationMetaStoreManager to get IntegrationPersistence from it.

However, subsequent calls under loadPolarisStorageIntegration do not use it. Only the entity and static type information is needed to get to PolarisStorageIntegration.

Would you be open to a deeper refactoring so as to avoid the dependency on CallContext in this case?

Contributor

To be clear: I'm proposing to take loadPolarisStorageIntegration() out of the "metastore" code and have a separate interface for it (with an application-scoped bean injected through CDI).
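
A rough sketch of that idea, with hypothetical interface and type names (this is not an agreed design, just an illustration of the proposed shape):

    import jakarta.enterprise.context.ApplicationScoped;

    // Simplified stand-ins for the real Polaris types.
    interface PolarisStorageIntegration {}
    interface PolarisBaseEntity {}

    // Hypothetical interface, separated from the metastore manager code.
    interface StorageIntegrationProvider {
      PolarisStorageIntegration loadPolarisStorageIntegration(PolarisBaseEntity entity);
    }

    // An application-scoped bean, injected via CDI, provides the implementation
    // using only the entity and its static type information (no CallContext).
    @ApplicationScoped
    class DefaultStorageIntegrationProvider implements StorageIntegrationProvider {
      @Override
      public PolarisStorageIntegration loadPolarisStorageIntegration(PolarisBaseEntity entity) {
        throw new UnsupportedOperationException("sketch only");
      }
    }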

Contributor

(re: my initial message in this thread: from my POV the awkwardness is in mixing application-scoped code with CallContext. Normally CallContext data should be request-scoped. Since in this case it does not seem to be required at all, I propose to stay at the application scope level, but remove CallContext from method parameters.)

Contributor

I'm fine with multiple PRs, but it looks like in this PR we're adding CallContext to method parameters that did not have it before, and we're planning extra work to remove those CallContext parameters soon (I hope we're on the same page here).

From the runtime POV, the server (once fully started) handles only requests. Each request has a CDI request context. Normally, all request-specific data is carried around in the request context. Beans can be produced in the request context to hold and expose a subset of that data for executing a particular call path.

Application-scoped beans hold state for the duration of the application (which may be empty, e.g. for factories).

In that regard, CallContext looks like a custom request context, but I believe CDI/Quarkus can be leveraged to handle it efficiently and reduce complexity at the code level - that is, not having CallContext in method parameters, but injecting specific request-scoped data (only) when required.

Contributor Author

@HonahX HonahX Oct 7, 2025

In general @ApplicationScoped does not have to be stateless 🙂 it is only restricted to one instance per application runtime.

Yeah, I actually mean more of "request state agnostic", sorry for the confusion.


Although we do not offer guarantees for interfaces other than our API spec, it may still be good to send a dev mailing list notification on interface changes like the metaStoreManager and persistence layer, so that downstream maintainers and vendors can exercise the guidance in the evolution policy:

Maintainers of downstream projects are encouraged to join Polaris mailing lists to monitor project changes, suggest improvements, and engage with the Polaris community in case of specific compatibility concerns.

I expect the refactoring of loadPolarisStorageIntegration() will be large, so it is better done in a separate PR anyway to make review easier. Given both points, how about making this a follow-up, both to control the size of this PR and to make it easier for other maintainers to evaluate any compatibility concerns?

The current PR IMHO still has value in that it separates the getAccessConfig logic out and unifies the call path, so that a subsequent small refactoring to update DefaultFileIOFactory could happen in parallel with the metaStoreManager refactoring. And we could always update AccessConfigProvider (changing its scope, renaming it) as needed. WDYT 🙂

Contributor

dev mailing list notification on interface changes

Yes, we can do that, but I think it's not a requirement for situations when the use case is clear, there are multiple reviewers on the PR and there is a path-forward for downstream projects.

We cannot avoid forcing some changes downstream. As long as downstream code has a way to adapt, any refactoring is possible on main. That's my take on https://polaris.apache.org/in-dev/unreleased/evolution/ 🙂

However, feel free to start dev discussion for this, if you prefer.

Contributor

If by dev discussion you mean taking loadPolarisStorageIntegration() out of the metastore, then yes, we should certainly discuss that on the dev ML, because that falls under "use case is NOT clear" IMHO :)

Cf.

PolarisStorageIntegration<T> createStorageIntegrationInCurrentTxn(

... but I do not think this PR needs a dev discussion (just my opinion).

Contributor Author

Yes, by dev list discussion I mean specifically loadPolarisStorageIntegration() and any related changes in the metaStoreManager and persistence layers : ). I think we are aligned here 🙌


* found
*/
public AccessConfig getAccessConfig(
@Nonnull CallContext callContext,
Contributor

Starting a new thread on call/request context since the other thread got side-tracked a bit. Here's a concrete proposal.

AccessConfigProvider is currently injected only into request-scoped beans (IcebergCatalogAdapter).

Let's make AccessConfigProvider request-scoped and inject CallContext into it (cf. ServiceProducers.polarisCallContext()).

This interface method will not have a CallContext parameter, but AccessConfigProvider will have it as a field.

When we call FileIOUtil.refreshAccessConfig() we'll use the injected CallContext from the field (CDI ensures that all related objects have the same request-scoped CallContext).

This will keep the Java interface lean (only carrying parameters specific to getting the access config). General parameters (context) are provided via CDI. WDYT?
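
A minimal sketch of this proposal, assuming Quarkus/CDI annotations and simplified stand-in types (the real CallContext is assumed to come from a request-scoped producer such as ServiceProducers.polarisCallContext()):

    import jakarta.enterprise.context.RequestScoped;
    import jakarta.inject.Inject;

    // Simplified stand-ins for illustration only.
    interface CallContext {}
    interface AccessConfig {}

    @RequestScoped
    class AccessConfigProvider {

      // Injected by CDI; every bean in the same request sees the same CallContext.
      @Inject CallContext callContext;

      // No CallContext parameter: the method only carries access-config-specific inputs.
      AccessConfig getAccessConfig(String tableLocation, java.util.Set<String> storageActions) {
        // use the injected callContext when delegating to FileIOUtil.refreshAccessConfig(...)
        return new AccessConfig() {};
      }
    }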

Contributor Author

Thanks for the suggestion! I see, so the key point here is to pass in the CallContext via CDI injection instead of as a method parameter. I think that makes sense; the only limitation today is that AccessConfigProvider will later need to be injected into DefaultFileIOFactory to replace

    FileIOUtil.refreshAccessConfig(
        callContext,
        storageCredentialCache,
        credentialVendor,
        identifier,
        tableLocations,
        storageActions,
        storageInfo,
        Optional.empty()));

DefaultFileIOFactory and the other FileIOFactory implementations are currently modelled as application-scoped. Indeed, this is another major reason I decided to keep AccessConfigProvider application-scoped in this first round of refactoring.

How about we separate AccessConfigProvider into two pieces:

  • AccessConfigProviderFactory - an application-scoped bean that supplies an AccessConfigProvider for a given callContext
  • AccessConfigProvider - a request-scoped bean, produced in ServiceProducers using the factory above

This could be a good intermediate state amid all the potential refactorings: request-scoped call paths could directly use AccessConfigProvider, while others like DefaultFileIOFactory could still use AccessConfigProviderFactory to obtain a provider. We could then update this mechanism as needed based on follow-up refactorings. WDYT?
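
A hedged sketch of this two-piece arrangement; the class names follow the suggestion above, everything else is an illustrative assumption (and the CallContext bean is assumed to be produced per request elsewhere, e.g. by ServiceProducers.polarisCallContext()):

    import jakarta.enterprise.context.ApplicationScoped;
    import jakarta.enterprise.context.RequestScoped;
    import jakarta.enterprise.inject.Produces;

    // Simplified stand-ins for illustration only.
    interface CallContext {}
    interface AccessConfig {}

    // Application-scoped: usable from application-scoped callers such as
    // DefaultFileIOFactory, which can pass in whatever CallContext they already hold.
    @ApplicationScoped
    class AccessConfigProviderFactory {
      AccessConfigProvider forContext(CallContext callContext) {
        return new AccessConfigProvider(callContext);
      }
    }

    class AccessConfigProvider {
      private CallContext callContext;

      AccessConfigProvider() {} // no-arg constructor so the request-scoped bean can be proxied

      AccessConfigProvider(CallContext callContext) {
        this.callContext = callContext;
      }

      AccessConfig getAccessConfig(String tableLocation) {
        // vend sub-scoped credentials using the captured callContext
        return new AccessConfig() {};
      }
    }

    // Request-scoped convenience producer, e.g. in ServiceProducers.
    class Producers {
      @Produces
      @RequestScoped
      AccessConfigProvider accessConfigProvider(
          AccessConfigProviderFactory factory, CallContext callContext) {
        return factory.forContext(callContext);
      }
    }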

Contributor

I think we'll have to take several steps to untangle this completely 😅

@XN137 did some work in that direction:
https://lists.apache.org/thread/ot19px6t0w808pp994rrc8kk7186dfxm
#2555, #2556

I think we can refactor DefaultFileIOFactory after these PRs 🤔 WDYT?

Contributor

What I'm trying to say is that I think if we do not have CallContext as a method argument in the new AccessConfigProvider now, then this class will not have to change (much) during later refactorings.

If you prefer we can merge this PR "as is", but in this case, I think it's unavoidable that AccessConfigProvider will have to be refactored again pretty soon.

Contributor

all in all, if the community is in general agreement on https://lists.apache.org/thread/ot19px6t0w808pp994rrc8kk7186dfxm, I think it does not matter too much how we proceed with this PR. WDYT?

Contributor Author

I think we'll have to take several steps to untangle this completely

Unfortunately yes....😅

Yeah, I agree: no matter what we do in this PR, we will need more refactoring in follow-ups. That's another reason I am waiting for the RC cut, to allow a whole month for further related changes so that the interface can be stabilized before it is released for the first time in 1.3. I will rebase and update the PR for a final pass.

HonahX added 4 commits October 7, 2025 18:57
1. Move getAccessConfig out to a credential vendor class
2. Deprecate getAccessConfig in IcebergCatalog
3. IcebergCatalogHander need a new factory
…ration-credential-vending-refactor

# Conflicts:
#	runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java
@HonahX HonahX force-pushed the honahx-catalog-federation-credential-vending-refactor branch from 5c805d1 to d610dd4 on October 8, 2025 04:27
Contributor

@dimas-b dimas-b left a comment

As discussed above, I think we can merge this now and refactor / improve later.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Oct 8, 2025
@HonahX
Contributor Author

HonahX commented Oct 8, 2025

@dimas-b @XJDKC Thanks for reviewing this! I will wait for the 1.2 branch cut before merging.

@HonahX HonahX merged commit 9957e24 into apache:main Oct 8, 2025
16 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Oct 8, 2025