Azure Stack Support #5532

Open: patrickdillon wants to merge 10 commits into base: main

patrickdillon

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds infrastructure provisioning support on Azure Stack. Currently Azure Stack is completely unsupported, but with the changes in this PR I was able to fully provision an OpenShift cluster.

This PR adds a new field armEndpoint to the cluster spec, and extends azureEnvironment to accept a new value, HybridEnvironment, to indicate installation to Azure Stack:

armEndpoint: https://management.ashRegion.ashInstance.com
azureEnvironment: HybridEnvironment

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5201

Special notes for your reviewer:

This is a large PR, which I know is not preferred, but I have laid out the commits logically, each with a descriptive message, so it should be easy to review commit by commit. I would be happy to break it up into smaller PRs if that would help.

Furthermore, there were some significant challenges in this implementation. In particular, I could not get tag reconciliation using the tagging service to work: it returned an inscrutable 500 error. Therefore, 88fc6ea skips adding the tagging service for Azure Stack. Perhaps I should do the same for MachinePool?

I was pretty satisfied with how other challenges were addressed, but definitely happy to discuss them. Thanks!

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Adds support for installing to Azure Stack environments. Users can specify `azureEnvironment: HybridEnvironment` and `armEndpoint` in the cluster spec.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 31, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 31, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sbueringer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @patrickdillon!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-azure has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @patrickdillon. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 31, 2025
@k8s-ci-robot k8s-ci-robot requested review from Jont828 and nojnhuh March 31, 2025 18:46
@damdo
Member

damdo commented Apr 1, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 1, 2025
Adds AzureStack as a valid cloud environment. The value
"HybridEnvironment" is the value provided by the Azure autorest
package. It would be possible to have a different user-facing
value, such as AzureStackCloud, but internally within the code
it is necessary to check for the value "HybridEnvironment"
returned by autorest. This commit opts to use a single value,
rather than separate user-facing and internal values.

Adds the ARMEndpoint field for specifying the Azure Resource Manager
(ARM) endpoint for use with Azure Stack deployments. The endpoint is
used to configure the environment as well as manage resources.

Uses ARMEndpoint from Cluster scope to configure Azure Stack settings
for Azure Client and Authorizer, which will be used to configure
ARM options for the V2 SDK.

Sets ARM Client Options when using the Azure Stack environment. Extends
ARMClientOptions to accept an ARMEndpoint, which can be obtained from
the authorizer interface, the same source as the cloud environment.

Sets the APIVersion to a hybrid cloud profile to ensure compatibility
with hybrid environments.

Azure Stack Hub does not support private DNS zones, so skip them.

The Resource SKU API for availability sets may not be available in
an Azure Stack environment. The cache is used to determine the
fault domain count. For Azure Stack, we can default to 2. Future
work could potentially set this programmatically or expose the
fault domain count in the API.

The tag service using the V2 SDK is not available in Azure Stack.
Skip tag reconciliation in Azure Stack environments.

The standard 2020-06-01 API Version is not supported for disk
operations in Azure Stack, so change to the compatible 2018-06-01
profile.

Azure Stack returns a 400 error when trying to delete a VM with
the force flag and the error message suggests retrying without
the flag.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 5, 2025
@patrickdillon
Author

I don't think anybody has looked at this yet, so I went ahead and force-pushed to rebase, and fixed the unit test failure (due to a newly wrapped error).

@@ -48,6 +48,7 @@ type AzureClusterClassSpec struct {
// - GermanCloud: "AzureGermanCloud"
// - PublicCloud: "AzurePublicCloud"
// - USGovernmentCloud: "AzureUSGovernmentCloud"
// - StackCloud: "HybridEnvironment"
Contributor

Is Azure Stack officially supported? Autorest is a deprecated library. I don't see Azure Stack listed in the new Cloud library - https://github.com/Azure/azure-sdk-for-go/blob/d165f8083de58e12135bce2102205081eaade930/sdk/azcore/cloud/cloud.go#L10.

Contributor

For example, the GermanCloud isn't supported anymore.

Contributor

I know there is an issue too to try and get rid of the autorest library implementations in CAPZ - #2974

Author

> I know there is an issue too to try and get rid of the autorest library implementations in CAPZ

Right, I discussed this a little in the issue #5201. AFAIK there is no available alternative.

Author

No idea whether it is still supported, although I'm pretty sure it is still sold.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 7, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

@@ -48,6 +48,7 @@ type AzureClusterClassSpec struct {
// - GermanCloud: "AzureGermanCloud"
// - PublicCloud: "AzurePublicCloud"
// - USGovernmentCloud: "AzureUSGovernmentCloud"
// - StackCloud: "HybridEnvironment"
Author

On further testing, it seems "AzureStackCloud" may indeed be required by some components. I will iron this out.

@willie-yao
Contributor

/assign @willie-yao @jackfrancis

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 21.78218% with 79 lines in your changes missing coverage. Please review.

Project coverage is 52.81%. Comparing base (b18718c) to head (4605c9e).
Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| azure/defaults.go | 6.25% | 15 Missing ⚠️ |
| azure/services/virtualmachines/client.go | 0.00% | 10 Missing ⚠️ |
| azure/scope/machine.go | 0.00% | 9 Missing ⚠️ |
| azure/services/availabilitysets/spec.go | 72.72% | 4 Missing and 2 partials ⚠️ |
| azure/services/disks/client.go | 0.00% | 4 Missing ⚠️ |
| azure/errors.go | 0.00% | 3 Missing ⚠️ |
| azure/scope/cluster.go | 40.00% | 3 Missing ⚠️ |
| controllers/azuremachine_reconciler.go | 0.00% | 3 Missing ⚠️ |
| azure/scope/clients.go | 60.00% | 1 Missing and 1 partial ⚠️ |
| azure/services/identities/client.go | 0.00% | 2 Missing ⚠️ |

... and 19 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5532      +/-   ##
==========================================
- Coverage   52.86%   52.81%   -0.05%     
==========================================
  Files         272      272              
  Lines       29474    29520      +46     
==========================================
+ Hits        15582    15592      +10     
- Misses      13080    13113      +33     
- Partials      812      815       +3     

Contributor

@willie-yao willie-yao left a comment

Thanks for your work on this! I just had a few comments that are mostly nitpicky and addressing a need for unit testing. Also, I think adding some documentation relating to this feature would be great!

@@ -48,6 +48,7 @@ type AzureClusterClassSpec struct {
// - GermanCloud: "AzureGermanCloud"
// - PublicCloud: "AzurePublicCloud"
// - USGovernmentCloud: "AzureUSGovernmentCloud"
// - StackCloud: "HybridEnvironment"
Contributor

Should this be renamed to AzureStackCloud to keep it consistent with the other clouds, or is it required to be called "HybridEnvironment"?

@@ -186,6 +193,7 @@ type AzureManagedControlPlaneClassSpec struct {
// - PublicCloud: "AzurePublicCloud"
// - USGovernmentCloud: "AzureUSGovernmentCloud"
//
//
Contributor

Is it intended to add AzureStack to the comment here as well?

@@ -27,6 +27,7 @@ import (
"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v5"
"github.com/Azure/azure-sdk-for-go/sdk/tracing/azotel"
"github.com/Azure/go-autorest/autorest/azure"
Contributor

Suggested change
-"github.com/Azure/go-autorest/autorest/azure"
+azureautorest "github.com/Azure/go-autorest/autorest/azure"

@@ -109,6 +112,16 @@ const (
CustomHeaderPrefix = "infrastructure.cluster.x-k8s.io/custom-header-"
)

const (
// StackAPIVersion is the API version profile to set for ARM clients. See:
Contributor

Suggested change
-// StackAPIVersion is the API version profile to set for ARM clients. See:
+// StackAPIVersionProfile is the API version profile to set for ARM clients. See:

Comment on lines +383 to +397
case StackCloudName:
cloudEnv, err := azure.EnvironmentFromURL(armEndpoint)
if err != nil {
return nil, fmt.Errorf("unable to get Azure Stack cloud environment: %w", err)
}
opts.APIVersion = StackAPIVersionProfile
opts.Cloud = cloud.Configuration{
ActiveDirectoryAuthorityHost: cloudEnv.ActiveDirectoryEndpoint,
Services: map[cloud.ServiceName]cloud.ServiceConfiguration{
cloud.ResourceManager: {
Audience: cloudEnv.TokenAudience,
Endpoint: cloudEnv.ResourceManagerEndpoint,
},
},
}
Contributor

Can you add a unit test case for this in TestARMClientOptions?
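
One way to make the Azure Stack branch above unit-testable is to inject the environment lookup, so a test does not depend on a reachable ARM endpoint. The following is only a sketch under stated assumptions: `environment`, `cloudConfig`, `resolver`, and `stackCloudConfig` are hypothetical stand-ins for the autorest and azcore types used in the hunk, not the PR's code or the SDK's API.

```go
package main

import (
	"errors"
	"fmt"
)

// environment is a hypothetical stand-in holding only the three fields
// that the hunk above reads from the autorest environment.
type environment struct {
	ActiveDirectoryEndpoint string
	TokenAudience           string
	ResourceManagerEndpoint string
}

// cloudConfig is a flattened, hypothetical stand-in for the cloud
// configuration assembled in the hunk above.
type cloudConfig struct {
	AuthorityHost string
	Audience      string
	Endpoint      string
}

// resolver stands in for the endpoint-to-environment lookup. Injecting it
// lets a test supply a fake instead of contacting a real endpoint.
type resolver func(armEndpoint string) (environment, error)

// stackCloudConfig mirrors the field mapping shown in the hunk above.
func stackCloudConfig(armEndpoint string, resolve resolver) (cloudConfig, error) {
	env, err := resolve(armEndpoint)
	if err != nil {
		return cloudConfig{}, fmt.Errorf("unable to get Azure Stack cloud environment: %w", err)
	}
	return cloudConfig{
		AuthorityHost: env.ActiveDirectoryEndpoint,
		Audience:      env.TokenAudience,
		Endpoint:      env.ResourceManagerEndpoint,
	}, nil
}

func main() {
	// fake is a test double; the URLs are placeholders, not real endpoints.
	fake := func(endpoint string) (environment, error) {
		if endpoint == "" {
			return environment{}, errors.New("empty endpoint")
		}
		return environment{
			ActiveDirectoryEndpoint: "https://login.example.invalid/",
			TokenAudience:           "https://audience.example.invalid/",
			ResourceManagerEndpoint: endpoint,
		}, nil
	}

	cfg, err := stackCloudConfig("https://management.example.invalid", fake)
	fmt.Println(cfg.Endpoint, err)

	_, err = stackCloudConfig("", fake)
	fmt.Println(err)
}
```

A table-driven case in TestARMClientOptions could follow the same shape: one row with a working fake and one row whose fake returns an error.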

@@ -150,7 +150,8 @@ func (m *MachineScope) InitMachineCache(ctx context.Context) error {
}

m.cache.availabilitySetSKU, err = skuCache.Get(ctx, string(armcompute.AvailabilitySetSKUTypesAligned), resourceskus.AvailabilitySets)
-if err != nil {
+// Resource SKU API for availability sets may not be available in Azure Stack environments.
+if err != nil && !strings.EqualFold(m.CloudEnvironment(), "HybridEnvironment") {
Contributor

Suggested change
-if err != nil && !strings.EqualFold(m.CloudEnvironment(), "HybridEnvironment") {
+if err != nil && !strings.EqualFold(m.CloudEnvironment(), azure.StackCloudName) {

@@ -98,3 +91,27 @@ func (s *AvailabilitySetSpec) Parameters(_ context.Context, existing interface{}

return asParams, nil
}

func getFaultDomainCount(SKU *resourceskus.SKU, cloudEnvironment string) (*int32, error) {
Contributor

Suggested change
-func getFaultDomainCount(SKU *resourceskus.SKU, cloudEnvironment string) (*int32, error) {
+func getFaultDomainCount(sku *resourceskus.SKU, cloudEnvironment string) (*int32, error) {

Comment on lines +98 to +100
if strings.EqualFold(cloudEnvironment, azure.StackCloudName) {
return ptr.To(int32(2)), nil
}
Contributor

Can you add a unit test case for when cloud environment is Azure Stack to TestParameters in spec_test.go?
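
As an illustration of the behavior such a test case would cover, here is a simplified, self-contained sketch of the Azure Stack default. `faultDomainCount` is a hypothetical stand-in for `getFaultDomainCount`: it takes an already-extracted count instead of a `resourceskus.SKU`, and the non-Stack branch is an assumption about how a missing SKU would be handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stackCloudName mirrors the "HybridEnvironment" value discussed in this PR.
const stackCloudName = "HybridEnvironment"

// defaultStackFaultDomains is the fallback the commit message proposes for Azure Stack.
const defaultStackFaultDomains int32 = 2

// faultDomainCount returns the default for Azure Stack without consulting
// the SKU, and otherwise requires a count obtained from the SKU lookup.
// skuCount is nil when that lookup produced no value (assumed behavior).
func faultDomainCount(skuCount *int32, cloudEnvironment string) (int32, error) {
	if strings.EqualFold(cloudEnvironment, stackCloudName) {
		return defaultStackFaultDomains, nil
	}
	if skuCount == nil {
		return 0, errors.New("availability set SKU not available")
	}
	return *skuCount, nil
}

func main() {
	three := int32(3)
	// The comparison is case-insensitive, matching strings.EqualFold in the hunk above.
	for _, env := range []string{"AzurePublicCloud", "hybridenvironment"} {
		n, err := faultDomainCount(&three, env)
		fmt.Println(env, n, err)
	}
}
```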

}
if err != nil {
return nil, err
}
Contributor

This looks like a good improvement, but is it related to Azure Stack support? What problem is it trying to solve?

Author

Yes, Azure Stack returns a 400 error that says the force flag is not supported. Should I add a comment?
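
For readers following along, the retry-without-force behavior described here could be sketched as below. This is not the PR's implementation, which is only partially visible in the hunk; `statusError`, `deleteFunc`, and `deleteWithFallback` are hypothetical names, and a real client would inspect the SDK's own error type rather than this stand-in.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical error type carrying an HTTP status code.
type statusError struct {
	Code int
	Msg  string
}

func (e *statusError) Error() string {
	return fmt.Sprintf("status %d: %s", e.Code, e.Msg)
}

// deleteFunc stands in for a VM delete call; force mirrors the force-delete flag.
type deleteFunc func(force bool) error

// deleteWithFallback attempts a forced delete first and retries once
// without the flag if the backend rejects the request with a 400.
func deleteWithFallback(del deleteFunc) error {
	err := del(true)
	var se *statusError
	if errors.As(err, &se) && se.Code == http.StatusBadRequest {
		return del(false)
	}
	return err
}

func main() {
	// Simulated backend that rejects forced deletes, as reported for Azure Stack.
	del := func(force bool) error {
		if force {
			return &statusError{Code: http.StatusBadRequest, Msg: "force delete is not supported"}
		}
		return nil
	}
	fmt.Println(deleteWithFallback(del))
}
```

An alternative design would skip the force flag up front when the cloud environment is Azure Stack, which avoids the extra round trip at the cost of another environment check.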

Comment on lines +108 to +110
if !strings.EqualFold(machineScope.CloudEnvironment(), azure.StackCloudName) {
ams.services = append(ams.services, tagsSvc)
}
Contributor

Is this needed because Azure Stack doesn't support the tags service? If so, I think a code comment here would be helpful.
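
A small helper could keep the environment check and its explanatory comment in one place, which would also address the earlier suggestion to compare against a named constant. The sketch below is illustrative only: `isAzureStack` and `servicesFor` are hypothetical, and the service names are placeholders rather than the reconciler's actual service list.

```go
package main

import (
	"fmt"
	"strings"
)

// stackCloudName mirrors the "HybridEnvironment" value discussed in this PR.
const stackCloudName = "HybridEnvironment"

// isAzureStack centralizes the case-insensitive environment comparison.
func isAzureStack(cloudEnvironment string) bool {
	return strings.EqualFold(cloudEnvironment, stackCloudName)
}

// servicesFor returns placeholder service names in reconcile order.
func servicesFor(cloudEnvironment string) []string {
	services := []string{"virtualmachines", "disks"}
	// Per the PR description, tag reconciliation fails on Azure Stack,
	// so the tags service is only added for other environments.
	if !isAzureStack(cloudEnvironment) {
		services = append(services, "tags")
	}
	return services
}

func main() {
	fmt.Println(servicesFor("AzurePublicCloud"))
	fmt.Println(servicesFor("HybridEnvironment"))
}
```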

@patrickdillon
Author

@willie-yao thanks for the in-depth review and feedback.

I am just back today from vacation, & will incorporate the changes ASAP.

@nawazkh
Member

nawazkh commented May 7, 2025

Hey @patrickdillon, how is this PR coming along? How can we help you push this forward?

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Todo
7 participants