-
Notifications
You must be signed in to change notification settings - Fork 314
Add loadEntities batch call and rename listFullEntities #2508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add loadEntities batch call and rename listFullEntities #2508
Conversation
...bc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcBasePersistenceImpl.java
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/entity/PolarisBaseEntity.java
Show resolved
Hide resolved
...-core/src/main/java/org/apache/polaris/core/persistence/AtomicOperationMetaStoreManager.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/persistence/PolarisMetaStoreManager.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/persistence/PolarisMetaStoreManager.java
Outdated
Show resolved
Hide resolved
1e05a56
to
05d7f49
Compare
* NULL if the entity has been dropped. | ||
*/ | ||
@Nonnull | ||
ResolvedEntitiesResult loadResolvedEntities( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existence of this method may cause confusion since the single-lookup methods have both ByName
and ById
variations, and an EntityNameLookupRecord
happens to have the name in it, and yet it looks like the actual key difference between the two methods is whether we have EntityType
per item.
I'm not sure if we already use EntityNameLookupRecord
as an input argument anywhere else, but generally since it was kind of structured as an output argument before it seems it becomes ambiguous as an input argument.
And at first glance the unittests seems to imply that the lookup would be "by name":
@ParameterizedTest
@ValueSource(strings = {"id", "name"})
....
if (loadType.equals("id")) {
// Create entity ID list with the updated entity
List<PolarisEntityId> entityIds = List.of(getPolarisEntityId(T6v2));
// Call batch load - this should detect the stale version and reload
results =
cache.getOrLoadResolvedEntities(this.callCtx, PolarisEntityType.TABLE_LIKE, entityIds);
} else {
results =
cache.getOrLoadResolvedEntities(this.callCtx, List.of(new EntityNameLookupRecord(T6v2)));
}
To match convention with the single-item lookups can we rename these methods to say loadResolvedEntitiesById
?
And if the difference is really just whether we pass in a per-entity EntityType, I think even parallel Lists (List<EntityId>, List<EntityType>
would be better than reusing EntityNameLookupRecord
just for its catalogId, entityId, entityType
.
Alternatively, cleanest would be just having one interface entirely with List<EntityIdAndType>
as the input argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing to consider is whether we actually want to allow different entity type lookups within a single batch. It may change the atomicity semantics for persistence implementations where different types are in different atomicity domains.
Are there any callsites that actually rely on using this form of the method instead of the one with a single entityType across the whole list of EntityIds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I can't think of a use case where we would mix EntityType
s in a single call. The two immediate use cases I have in mind are
- Batch loading the principal roles during the authentication step
- Support for loading TableMetadata from persistence rather than from cloud storage (this is in concert with Add properties from TableMetadata into Table entity internalProperties #2735 and other future PRs).
In both cases, only a single EntityType is loaded. I used the EntityNameLookupRecord
type largely because it is the return type for the listEntities
API, but I wanted to avoid fetching all entities in full in the case that many/most entities are already in cache. Personally, I don't like the pattern of using parallel list parameters for an API, so I would oppose the List<EntityId>, List<EntityType>
option. I am ok with a new EntityIdAndType
argument, but I'd also be ok with just supporting the one API that takes in the EntityType
and List<EntityId>
arguments and getting rid of the other option until a need arises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds good, I think it's good to have the more opinionated "all entities in the batch are the same EntityType" method signature for now, as it's easier to ensure different impls can fulfill the interface. Signature (EntityType, List<EntityId>)
looks good to me, and let's remove the overloaded method regarding EntityNameLookupRecord
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have any concerns with the current state of this PR, but I'd be interested in reviewing again after comments from @dennishuo are resolved :)
if (e == null) { | ||
return null; | ||
} else { | ||
// load the grant records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add toResolvedPolarisEntity()
as in AtomicOperationMetaStoreManager
?
6804058
to
2289355
Compare
* NULL if the entity has been dropped. | ||
*/ | ||
@Nonnull | ||
ResolvedEntitiesResult loadResolvedEntities( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name LGTM, but I guess it does not match the PR description anymore 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM 👍 but I'll defer approval to @dennishuo .
#2290 introduced a new
loadEntities
variant, which is really alistEntities
call that returns the completePolarisBaseEntity
rather than theEntityNameLookupRecord
. A batchloadEntities
call that functions similar to theloadEntity
, when given an id, returns the identified entity, is also useful, notably for cases when you don't want to list all entities of a particular type (e.g., loading a set of Principal Roles or refreshing specific entities for theEntityCache
).This introduces a new
loadResolvedEntities
API and renames the previousloadEntities
tolistFullEntities
to avoid ambiguity. The new API now mirrors theloadResolvedEntity...
APIs, accepting either a list ofPolarisEntityId
s orEntityNameLookupRecord
s. I usedEntityNameLookupRecord
because that is the return type for the originallistEntities
API, but also becausePolarisEntityCore
requires agrantVersion
, which may not be present, e.g., if the caller only has the results of alistEntities
call. I also wanted to mirror the existingloadEntity
API, which requires anPolarisEntityType
argument and thePolarisEntityId
doesn't contain a type field.I used the
ResolvedPolarisEntity
type and terminology in the API name in order to make the EntityCache API and the raw PolarisMetaStoreManager API the same. In part, this aims to start bringing the two APIs closer together so that the concept of the cache can one day be just an implementation detail, rather than part of the core business logic. The bulk load implementation in the cache mirrors the logic in the Resolver, in that it ensures that it always returns a snapshot consistent with the state of the persistence layer as it exists at a single point in time. This means that it validates that the entire batch of entities returned matches the entity versions and grant versions returned by a call to theloadEntitiesChangeTracking
API.