@ruai0511
Contributor

@ruai0511 ruai0511 commented Sep 25, 2025

Description

Previously, the auto-tagging label resolution logic only handled single-attribute evaluation. Because we are now adding more attributes to the feature (username, role, index_pattern), we need more comprehensive logic to find the best-suited rule and label.

Feature documentation: https://docs.opensearch.org/latest/tuning-your-cluster/availability-and-recovery/rule-based-autotagging/autotagging/

Main classes & functions introduced:

  1. Entry Point – evaluateLabel
public Optional<String> evaluateLabel(List<AttributeExtractor<String>> attributeExtractors)
  • Sorts extractors by priority.
  • Delegates resolution to FeatureValueResolver.resolve(...).
  • Returns the final label from FeatureValueResolutionResult.resolveLabel().
  2. Central class to evaluate candidate values – FeatureValueResolver
  • Iterates over each AttributeExtractor.
  • For each extractor, delegates to FeatureValueCollector.
  • Maintains a running intersection across extractors (AND logic between attributes).
  3. Extracting values for a single attribute – FeatureValueCollector
  • Each extractor may have subfields (e.g., "principal.username", "principal.role").
  • If multiple values are extracted, merges them according to the extractor’s logical operator (OR/AND).
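
The flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `LabelResolutionSketch`, `ExtractedAttribute`, `collect`, and `setOf` are hypothetical stand-ins, the "lower number is evaluated first" priority order is an assumption, and picking the first surviving candidate is a placeholder for the PR's actual tie-breaking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are NOT the PR's actual classes or signatures.
class LabelResolutionSketch {

    enum LogicalOperator { AND, OR }

    /** One attribute's extraction result: the candidate labels matched by each extracted value. */
    static final class ExtractedAttribute {
        final int priority;                     // assumption: lower number is evaluated first
        final LogicalOperator operator;         // how this attribute's values are combined
        final List<Set<String>> labelsPerValue; // candidate labels matched by each value

        ExtractedAttribute(int priority, LogicalOperator operator, List<Set<String>> labelsPerValue) {
            this.priority = priority;
            this.operator = operator;
            this.labelsPerValue = labelsPerValue;
        }
    }

    /** Collector step: union the per-value candidates for OR, intersect them for AND. */
    static Set<String> collect(ExtractedAttribute attribute) {
        Set<String> merged = null;
        for (Set<String> labels : attribute.labelsPerValue) {
            if (merged == null) {
                merged = new LinkedHashSet<String>(labels);
            } else if (attribute.operator == LogicalOperator.OR) {
                merged.addAll(labels);
            } else {
                merged.retainAll(labels);
            }
        }
        return merged == null ? new LinkedHashSet<String>() : merged;
    }

    /** Resolver step: visit attributes by priority, keeping a running intersection (AND between attributes). */
    static Optional<String> evaluateLabel(List<ExtractedAttribute> attributes) {
        List<ExtractedAttribute> sorted = new ArrayList<ExtractedAttribute>(attributes);
        sorted.sort(Comparator.comparingInt((ExtractedAttribute a) -> a.priority));
        Set<String> running = null;
        for (ExtractedAttribute attribute : sorted) {
            Set<String> candidates = collect(attribute);
            if (running == null) {
                running = candidates;
            } else {
                running.retainAll(candidates);
            }
            if (running.isEmpty()) {
                return Optional.empty(); // no candidate satisfies every attribute
            }
        }
        if (running == null) {
            return Optional.empty(); // no attributes were supplied
        }
        // Assumption: the first surviving candidate wins; the PR's real tie-breaking may differ.
        return running.stream().findFirst();
    }

    static Set<String> setOf(String... labels) {
        return new LinkedHashSet<String>(Arrays.asList(labels));
    }

    public static void main(String[] args) {
        List<ExtractedAttribute> attributes = Arrays.asList(
            new ExtractedAttribute(2, LogicalOperator.AND, Arrays.asList(setOf("analytics"))),
            new ExtractedAttribute(1, LogicalOperator.OR, Arrays.asList(setOf("analytics"), setOf("batch")))
        );
        System.out.println(evaluateLabel(attributes));
    }
}
```

In this sketch an empty running intersection short-circuits the loop, since no later attribute can bring a candidate back.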

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ruai0511 ruai0511 requested a review from a team as a code owner September 25, 2025 19:50
@github-actions
Contributor

❌ Gradle check result for a8b2860: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

@jainankitk jainankitk left a comment

Still reviewing the PR; a few initial comments:

Comment on lines +23 to +32
enum LogicalOperator {
    /**
     * Logical AND
     */
    AND,
    /**
     * Logical OR
     */
    OR
}
Contributor

Are we expecting anything other than AND/OR? If not, it might be better to have a method that returns a boolean value, say isConjunction().

Contributor

I don't think that would be ideal, since the boolean return value can be a little ambiguous in the code that follows the method call, e.g.:

boolean isAnd = isConjunction();
....

if (!isAnd) // ambiguous: this does not directly imply OR, unlike an explicit LogicalOperator.OR
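
For comparison, here is a minimal sketch of what a call site looks like when it branches on the enum directly. `OperatorMergeSketch` and `merge` are hypothetical names used only for illustration and are not part of this PR; only the `LogicalOperator` constants mirror the snippet under review.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not code from this PR.
class OperatorMergeSketch {

    enum LogicalOperator { AND, OR }

    /** Combine two candidate sets; every branch names the operator it handles. */
    static Set<String> merge(Set<String> left, Set<String> right, LogicalOperator operator) {
        Set<String> result = new LinkedHashSet<String>(left);
        switch (operator) {
            case AND:
                result.retainAll(right); // keep only values present on both sides
                return result;
            case OR:
                result.addAll(right);    // keep values present on either side
                return result;
            default:
                // A future third operator fails loudly instead of silently falling into an "else".
                throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
    }

    public static void main(String[] args) {
        Set<String> left = new LinkedHashSet<String>(Arrays.asList("a", "b"));
        Set<String> right = new LinkedHashSet<String>(Arrays.asList("b", "c"));
        System.out.println(merge(left, right, LogicalOperator.AND));
        System.out.println(merge(left, right, LogicalOperator.OR));
    }
}
```

With a boolean, the OR case would be whatever is left in the `else` branch; here it is spelled out.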

* This helps in tie-breaking: values appearing earlier in the list (i.e., more specific matches)
* are considered better matches when resolving the final label.
*/
private final Map<String, Integer> firstOccurrenceIndex = new HashMap<>();
Contributor

I am assuming this is for optimizing the lookup? Have we considered the latency impact without having this index?

Contributor Author

We haven’t run latency tests yet, but we expect this to make lookups faster. Without it, we would need to scan the list to find a value's earliest occurrence, which is linear in the list size for every lookup instead of a single map lookup.
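
To make the trade-off concrete, here is a minimal sketch of how such an index might be built and used for tie-breaking. Only the field name `firstOccurrenceIndex` comes from the snippet above; the class `FirstOccurrenceSketch` and the method `bestOf` are hypothetical and may not match the PR.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration; only the field name mirrors the PR snippet.
class FirstOccurrenceSketch {

    private final Map<String, Integer> firstOccurrenceIndex = new HashMap<String, Integer>();

    /** Record, in one pass, the earliest position at which each value appears. */
    FirstOccurrenceSketch(List<String> orderedValues) {
        for (int i = 0; i < orderedValues.size(); i++) {
            // putIfAbsent keeps the first index and ignores later duplicates.
            firstOccurrenceIndex.putIfAbsent(orderedValues.get(i), i);
        }
    }

    /** Tie-break: among the candidates, prefer the one that appeared earliest in the list. */
    Optional<String> bestOf(Collection<String> candidates) {
        String best = null;
        int bestIndex = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            Integer index = firstOccurrenceIndex.get(candidate); // map lookup, no list scan
            if (index != null && index < bestIndex) {
                best = candidate;
                bestIndex = index;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        FirstOccurrenceSketch sketch = new FirstOccurrenceSketch(Arrays.asList("a", "b", "a", "c"));
        System.out.println(sketch.bestOf(Arrays.asList("c", "b")));
    }
}
```

The index costs one extra pass and one map entry per distinct value, so whether it pays off depends on list sizes, which is what a latency test would show.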

Contributor

@jainankitk jainankitk left a comment

As discussed offline, when evaluating the principal attribute we should use exact match for username and role instead of prefix match. Admins can create role mappings for specific user patterns (which support regex, not just prefixes) instead of working with prefixes/patterns as part of the principal attribute in WLM. Hopefully, that should make the FeatureValueResolver logic simpler and easier to follow.

We can always add support for prefix-based principal values later if there is a strong ask for it, but I don't see that need for now.
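
As a rough illustration of why exact match is simpler, the sketch below contrasts the two lookups. All names here (`PrincipalMatchSketch`, `exactMatch`, `prefixMatch`, and the map layout) are hypothetical; the PR's actual storage for principal values may look quite different.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the map-based storage is an assumption, not the PR's data structure.
class PrincipalMatchSketch {

    /** Exact match: a single lookup yields the labels stored for this username or role. */
    static Set<String> exactMatch(Map<String, Set<String>> labelsByValue, String value) {
        return labelsByValue.getOrDefault(value, Collections.<String>emptySet());
    }

    /** Prefix match: several stored prefixes may match, so the results need merging and ranking. */
    static Set<String> prefixMatch(Map<String, Set<String>> labelsByPrefix, String value) {
        Set<String> matched = new LinkedHashSet<String>();
        for (Map.Entry<String, Set<String>> entry : labelsByPrefix.entrySet()) {
            if (value.startsWith(entry.getKey())) {
                matched.addAll(entry.getValue());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> rules = new HashMap<String, Set<String>>();
        rules.put("alice", new LinkedHashSet<String>(Arrays.asList("analytics")));
        rules.put("al", new LinkedHashSet<String>(Arrays.asList("batch")));
        System.out.println(exactMatch(rules, "alice"));
        System.out.println(prefixMatch(rules, "alice"));
    }
}
```

With exact match, each value maps to a single stored entry, so there is no need to rank overlapping prefixes by specificity.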
