Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource Isolation Feature Checkpoint 1: SQL Parsing through storing resource isolation group in Frontend and ComputeNode objects #8

Merged

Conversation

ctbrennan
Copy link

@ctbrennan ctbrennan commented Aug 30, 2024

What I'm doing:

  • Making changes to the MODIFY COMPUTE NODE and MODIFY FRONTEND parsing to acommodate resource isolation group labels.

  • The sink for the parsed resource isolation group labels are in the Frontend and ComputeNode classes

  • Rough flow for frontend:

    • DDLStmtExecutor.visitAlterSystemStatement -> systemHandler.process -> systemHandler.visitModifyFrontendClause -> NodeMgr.modifyFrontend -> Frontend state

Why I'm doing:

Trying to keep PRs from getting too big, so submitting this PR as its own piece.

This is part of the implementation of the Resource Isolation for Shared Data Mode feature

Follow up PRs to come which will

  • Use the now persisted resource isolation group labels to make frontends only select compute nodes from the same group.
  • Add the labels to CACHE SELECT statements, etc.
  • See other milestones

Fixes #issue https://jira.pinadmin.com/browse/RTA-6269

Testing

I tested in dev7 that assigning labels works fine.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@ctbrennan ctbrennan changed the title Cbrennan/resource isolation 3.3 Resource Isolation Feature Checkpoint 1: SQL Parsing through storing resource isolation group in Frontend and ComputeNode objects Aug 30, 2024
@ctbrennan ctbrennan marked this pull request as ready for review August 30, 2024 20:20
fe/fe-core/src/main/java/com/starrocks/server/NodeMgr.java Outdated Show resolved Hide resolved
computeNode.setResourceIsolationGroup("");
continue;
}
computeNode.setResourceIsolationGroup(getResourceIsolationGroupFromProperties(properties));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check and log the case when the resource isolation group modify statement was issued but it doesn't change the existing group? (in which case should we skip setting the same group again/updating the compute node state?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the logging could could be useful, added that. It should be a noop to set the same resource group id, and it seems more robust to bugs to do it anyway, so I'll keep the call to set the group and persist the action to the metadata record regardless

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with that, but for future reference, could you elaborate a bit on why it's more robust to bugs if you perform the update regardless?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. The first thing I'd say is that I don't think there's a downside to repeating an operation like this.
Possible upsides

  1. it could save us in a situation where the frontends fell out of sync: e.g. this frontend, that is, the one handling the MODIFY COMPUTE NODE statement, has knows that the frontend belongs to the given compute node, but the other frontends might have fallen out of sync. In that case, we'd want to persist the operation to the metadata log so the other frontends can pick up on it, and if we're going to persist it to the metadata log, I think it really makes sense to perform the operation
  2. It's future proofing. It could save us in a situation where we've later added some side effect to setResourceIsolationGroup that we also want executed. Of course, we could change the behavior of this function at the time that we change setResourceIsolationGroup to add said side effect, but that could be an easy thing to miss.

@ctbrennan ctbrennan force-pushed the cbrennan/resource-isolation-3.3 branch from d638ed2 to 2029aec Compare September 4, 2024 19:22
@@ -3854,7 +3855,17 @@ public ParseNode visitDropFrontendClause(StarRocksParser.DropFrontendClauseConte
public ParseNode visitModifyFrontendHostClause(StarRocksParser.ModifyFrontendHostClauseContext context) {
Copy link

@saahilbarai saahilbarai Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a question to better my understanding- Where does the call for the visitModifyFrontendHostClause method originate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visitModifyFrontendHostClause is called by ModifyFrontendHostClauseContext.accept. ModifyFrontendHostClauseContext is an automatically generated subclass of org.antlr.v4.runtime.ParseRuleContext, and therefore the interface between the antlr4 parser and the user-written StarRocks FE code. My understanding is that antlr4 is invoked to parse the input SQL statement into a syntax tree, and the <ParseTreeVisitor>.accept (in this case, AstBuilder) is where the user-written starrocks code begins.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. I think my confusion was from the transition from visitModifyFrontendHostClause --> visitModifyFrontendClause.

Copy link

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work. Looks good
Just 2 minor questions

// step 1 update the fe information stored in memory.
fe.setResourceIsolationGroup(getResourceIsolationGroupFromProperties(props));
// step 2 editLog
GlobalStateMgr.getCurrentState().getEditLog().logUpdateFrontend(fe);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it only add a log entry in edit log? Or other logics?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logUpdateFrontend only adds a line to the edit log indicating how we've updated the frontend and the serializable state of the frontend, that's it.

@@ -831,6 +831,7 @@ alterClause
| modifyBackendClause
| addComputeNodeClause
| dropComputeNodeClause
| modifyComputeNodeClause
| modifyBrokerClause
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add show groups or show isolation groups?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently my plan is to simply add group information to SHOW FRONTENDS and SHOW COMPUTE NODES but when I get to that stage, I'll consider whether it makes sense to add group information as a separate type of output.

@ctbrennan ctbrennan merged commit 54c58b2 into pinterest-integration-3.3 Sep 6, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants