-
Notifications
You must be signed in to change notification settings - Fork 593
HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes #9611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@octachoron would you like to take a look at it if you have time? It's related to what we discussed recently :) |
|
@dombizita, absolutely, thank you! I don't think my vote is enough to merge though. |
octachoron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @sreejasahithi for the patch. I added a few thoughts and questions inline. 🙂
...pache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
Outdated
Show resolved
Hide resolved
...pache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
Outdated
Show resolved
Hide resolved
.../apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java
Outdated
Show resolved
Hide resolved
...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
Show resolved
Hide resolved
octachoron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the changes look good to me. Do you think there is a good way to write tests for the feature? (I do not see straightforward precedent other than actual integration tests, but that does not mean there isn't a way. 🙂)
ashishkumar50
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sreejasahithi Thanks for working on this.
| } | ||
|
|
||
| private void executeForSpecificNodeInHA(ScmClient scmClient, String serviceId) throws IOException { | ||
| String scmAddress = getScmOption().getScm(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scmAddress is not mandatory option.
| } else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId != null) { | ||
| executeForSpecificNodeInHA(scmClient, serviceId); | ||
| } else { | ||
| executeForSingleNode(scmClient); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In normal or existing behaviour we need safemode status from leader node most of the time. When no scm address is passed, whether we are getting safe mode status from leader node or not? Because now follower also can accept safemode and can return the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ashishkumar50 for finding this bug, you are right now that we are allowing follower to also accept status command there can be a possibility where when we run safemode status command with no additional option it can return the status of the follower.
I have fixed this issue.
priyeshkaratha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sreejasahithi for working on this. I have one minor comment on handling audit logs.
| @Override | ||
| public Map<String, Pair<Boolean, String>> getSafeModeRuleStatusesForNode(String nodeId) throws IOException { | ||
| Map<String, Pair<Boolean, String>> result = getSafeModeRuleStatuses(); | ||
| AUDIT.logReadSuccess( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use logReadSuccess with auditMap with nodeId added and also add auditlog on failure since its called for nodeId
| target.getAddress().equals(nodeAddr.getAddress()); | ||
| } catch (Exception e) { | ||
| // If address resolution fails, no match | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : Log the exception here before returning false
What changes were proposed in this pull request?
This PR provides an option
--allto show the safemode status of each SCM node in the cluster.If verbose, It also provides the status of each safemode exit rule for each SCM node.
This PR also fixes the bug stated in HDDS-13832 where when
--scmoption is used in HA it always shows the status of the leader SCM and silently ignores the node specified via the option.What is the link to the Apache JIRA
HDDS-14108
How was this patch tested?
This patch was tested locally in a docker ozone-ha cluster:
When one of the SCM node is down :
Green CI : https://github.com/sreejasahithi/ozone/actions/runs/20842284515