-
Notifications
You must be signed in to change notification settings - Fork 567
Support AMD SEV-SNP on AWS #2424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hello @fangge1212! Some important instructions when contributing to openshift/api: |
1271931
to
a6478c1
Compare
82e877d
to
1df992a
Compare
// instanceType is the type of instance to create. Example: m4.xlarge | ||
InstanceType string `json:"instanceType"` | ||
// cpuOptions is the set of cpu options for the instance. | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this field is not specified by a user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unset, no CPU options are passed to the AWS platform and AWS default values are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know what the default values are currently on AWS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we don't have concrete defaults we can point to, it would be nice to include guidance on how an end-user could identify what the defaults for their configuration would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CpuOptions in AWS consists of three fields:
- amdSevSnp - default value is Disabled.
- coreCount - default value depends on the instance type.
- threadsPerCore - default value depends on the instance type.
Refer to: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cpu-options-supported-instances-values.html
In this PR, only amdSevSnp is exposed to users. I'm not entirely sure how best to describe this in the API documentation — should we include a link to the AWS documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the link to AWS website
bbef962
to
e48669f
Compare
/retest-required |
/retest |
f28b17a
to
520141d
Compare
machine/v1beta1/types_awsprovider.go
Outdated
// cpuOptions defines CPU-related settings for the instance, including the confidential computing policy. | ||
// If unset, no CPU options will be passed to the AWS platform and AWS default CPU options will be applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I know what the "AWS default CPU options" that will be applied are? Are these literally defaults AWS imposes on requests that don't specify these options, or are these defaulted elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment! To clarify: OpenShift does not set defaults for cpuOptions. If the field is unset, the RunInstances
request is sent without a CpuOptions
block, and AWS applies its own defaults for the chosen instance type.
I’ll update the field description to make that clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the description, please take a look again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the description again to make it more concise:
When omitted, this means no opinion and the AWS platform is left to choose a reasonable default.
520141d
to
d34306b
Compare
machine/v1beta1/types_awsprovider.go
Outdated
// cpuOptions defines CPU-related settings for the instance, including the confidential computing policy. | ||
// If unset, no cpuOptions will be included in the API request to AWS, and the instance will use the default CPU options | ||
// applied by AWS for the selected intance type. | ||
// +kubebuilder:validation:MinProperties=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pop this one on the struct rather than the field please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
machine/v1beta1/types_awsprovider.go
Outdated
// More details can be checked at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sev-snp.html | ||
// When omitted, this means no opinion and the AWS platform is left to choose a reasonable default, | ||
// which is subject to change without notice. The current default is Disabled. | ||
// +kubebuilder:validation:Enum=Disabled;AMDEncrytedVirtualizationNestedPaging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to put this here, vs on the type definition on L119?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The placement of this line differs between AWSNetworkInterfaceType
(within the struct) and MarketType
(on the type definition). I was unsure which pattern to follow, so I selected one approach arbitrarily.
89d92a1
to
f5ba092
Compare
/hold until upstream PRs have been merged |
The upstream PR is now unblocked, and reviews/merging can proceed |
f5ba092
to
a784870
Compare
/unhold |
machine/v1beta1/types_awsprovider.go
Outdated
type CPUOptions struct { | ||
// confidentialCompute specifies whether confidential computing should be enabled for the instance, | ||
// and, if so, which confidential computing technology to use. | ||
// Valid values are: Disabled, AMDEncryptedVirtualizationNestedPaging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We generally also mention that an allowed value is omitted
for optional enums.
In practice, omitting this field is for now an invalid configuration, but in the future if new fields are added omitting this field and specifying another one would be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated. Also updated the description for CPUOptions to make it clear:
+ // If provided, it must not be empty — at least one field must be set.
@fangge1212 Could you include a link to the associated PR that implements the validations outlined here in the validating webhook this API runs through? |
Previously, I added instance type validation for the specified confidential computing technology (currently only AMD SEV-SNP). However, the instance type list is hard-coded and needs to be updated manually, so I removed the validation. As a result, there is now no webhook validation for this API change. |
AFAIK none of the validations that exist on the markers in the API here are actually enforced unless done through the webhook. I would expect us to have changes in the webhook to reject invalid configurations. |
Ah, I didn't know this. I will add validation in webhook |
When I added
|
That looks like it is an upstream test. @JoelSpeed knows the nuances here more than I do, but my understanding is that machine configurations for OpenShift are embedded as raw JSON in the Machine API and as such they must be programmatically evaluated |
a784870
to
5534fd1
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Upstream, the types are real CRDs and so all of the validation markers apply. But, the This means that they have no validation, and the only validation we can do is implemented as a Go ValidatingWebhookConfiguration which lives in the machine-api-operator repository. This means the markers (while helpful and I appreciate being added even if they do nothing), will need to also be implemented manually in the webhook |
c4d560a
to
85a7d3f
Compare
This is the draft PR(https://github.com/openshift/machine-api-operator/pull/1420/files) that adds webhook validation for the new parameter cpuOptions.
Should we change |
Yes, this was a nuance of webhook based validation that I missed and is not something the linter is aware of. We can override the lint check if it fails on this. @JoelSpeed and I have already started discussing how we can improve this linting behavior for APIs that would need the same validation behaviors. |
After chaning CPUOptions to pointer type, linting fails:
Remove
How about I remove the marker "// +kubebuilder:validation:MinProperties=1" to make the linting pass? |
I don't think we want to remove that. We don't want to allow an empty struct ( |
I was wrong here, |
85a7d3f
to
4f111bd
Compare
When CPUOptions is a pointer type, I can't distinguish between the following configurations in webhook validation:
When CPUOptions is a struct type, I can't distinguish between the following configurations in webhook validation:
So there is no way to validate minProperties=1 for CPUOptions in webhook validation. Updates:
|
That seems like a reasonable error to override as well if you can't properly distinguish between the necessary states otherwise. |
AMD SEV-SNP is one of the confidential computing technologies. This commit adds support for AMD SEV-SNP on AWS, so users can utilize the confidential computing on the cluster nodes. Signed-off-by: Fangge Jin <[email protected]>
4f111bd
to
5f65881
Compare
@everettraven |
@fangge1212: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
AMD SEV-SNP is one of the confidential computing technologies. This commit adds support for AMD SEV-SNP on AWS, so users can utilize the confidential computing on the cluster nodes.
Upstream CAPA PR: kubernetes-sigs/cluster-api-provider-aws#5605