|
| 1 | +--- |
| 2 | +title: Troubleshoot the health probe mode for AKS cluster service load balancer |
| 3 | +description: Diagnoses and fixes common issues with the health probe mode feature. |
| 4 | +ms.date: 06/03/2024 |
| 5 | +ms.reviewer: niqi, cssakscic, v-weizhu |
| 6 | +ms.service: azure-kubernetes-service |
| 7 | +ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli, innovation-engine |
| 8 | +--- |
| 9 | + |
| 10 | +# Troubleshoot issues when enabling the AKS cluster service health probe mode |
| 11 | + |
| 12 | +The health probe mode feature lets you configure how Azure Load Balancer probes the health of the nodes in your Azure Kubernetes Service (AKS) cluster. You can choose between two modes: Shared and ServiceNodePort. The Shared mode uses a single health probe for all services that use the same load balancer and have `externalTrafficPolicy` set to `Cluster`. In contrast, the ServiceNodePort mode uses a separate health probe for each service. The Shared mode can reduce the number of health probes and improve the performance of the load balancer, but it requires additional components, such as the health-probe-proxy sidecar, to work properly. To enable this feature, see [How to enable the health probe mode feature using the Azure CLI](#how-to-enable-the-health-probe-mode-feature-using-the-azure-cli). |
| 13 | + |
| 14 | +This article describes common issues that occur when you use the health probe mode feature in an AKS cluster and helps you troubleshoot and resolve them. |
| 15 | + |
| 16 | +## Symptoms |
| 17 | + |
| 18 | +When you create or update an AKS cluster by using the Azure CLI and enable the health probe mode feature with the `--cluster-service-load-balancer-health-probe-mode Shared` flag, you might experience one or more of the following issues: |
| 19 | + |
| 20 | +- The load balancer doesn't distribute traffic to the nodes as expected. |
| 21 | + |
| 22 | +- The load balancer reports unhealthy nodes even if they're healthy. |
| 23 | + |
| 24 | +- The health-probe-proxy sidecar container crashes or doesn't start. |
| 25 | + |
| 26 | +- The cloud-node-manager pod crashes or doesn't start. |
| 27 | + |
| 28 | +When you enable the feature, the following operations occur in the background: |
| 29 | + |
| 30 | +1. The resource provider (RP) frontend checks whether the request is valid and updates the corresponding property in the LoadBalancerProfile. |
| 31 | + |
| 32 | +2. RP async calls the cloud provider config secret reconciler to update the cloud provider config secret based on the LoadBalancerProfile. |
| 33 | + |
| 34 | +3. Overlaymgr reconciles the cloud-node-manager chart to enable the health-probe-proxy sidecar. |
| 35 | + |
| 36 | +## Initial troubleshooting |
| 37 | + |
| 38 | +To troubleshoot these issues, follow these steps: |
| 39 | + |
| 40 | +0. First, connect to your AKS cluster using the Azure CLI: |
| 41 | + |
| 42 | + ```azurecli |
| 43 | + export RESOURCE_GROUP="aks-rg" |
| 44 | + export AKS_CLUSTER_NAME="aks-cluster" |
| 45 | + az aks get-credentials --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --overwrite-existing |
| 46 | + ``` |
| 47 | +
|
| 48 | +1. Next, check the RP frontend log to see if the health probe mode in the LoadBalancerProfile is properly configured. You can use the `az aks show` command to view the LoadBalancerProfile property of your cluster. |
| 49 | +
|
| 50 | + ```azurecli |
| 51 | + export RESOURCE_GROUP="aks-rg" |
| 52 | + export AKS_CLUSTER_NAME="aks-cluster" |
| 53 | + az aks show --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --query "networkProfile.loadBalancerProfile" |
| 54 | + ``` |
| 55 | + Results: |
| 56 | +
|
| 57 | + <!-- expected_similarity=0.3 --> |
| 58 | +
|
| 59 | + ```output |
| 60 | + { |
| 61 | + "clusterServiceLoadBalancerHealthProbeMode": "Shared", |
| 62 | + "managedOutboundIPs": null, |
| 63 | + "outboundIPs": null, |
| 64 | + "outboundIPPrefixes": null, |
| 65 | + "allocatedOutboundPorts": null, |
| 66 | + "effectiveOutboundIPs": [ |
| 67 | + { |
| 68 | + "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/MC_aks-rg_aks-cluster_eastus2/providers/Microsoft.Network/publicIPAddresses/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" |
| 69 | + } |
| 70 | + ], |
| 71 | + "idleTimeoutInMinutes": 30, |
| 72 | + "loadBalancerSku": "standard", |
| 73 | + "managedOutboundIPv6": null |
| 74 | + } |
| 75 | + ``` |
| 76 | +
|
| 77 | +2. Check the cloud provider configuration. In current AKS clusters, the cloud provider configuration is managed internally, and the `ccp` namespace isn't available in the cluster. Instead, check for cloud provider-related resources and verify that the cloud-node-manager pods are running properly: |
| 78 | +
|
| 79 | +
|
| 80 | + ```bash |
| 81 | + # Check for cloud provider related ConfigMaps in kube-system |
| 82 | + kubectl get configmap -n kube-system | grep -i azure |
| 83 | + |
| 84 | + # Check if cloud-node-manager pods are running (indicates cloud provider integration is working) |
| 85 | + kubectl get pods -n kube-system | grep cloud-node-manager |
| 86 | + |
| 87 | + # Check the azure-ip-masq-agent-config-reconciled ConfigMap if it exists |
| 88 | + kubectl get configmap azure-ip-masq-agent-config-reconciled -n kube-system -o yaml 2>/dev/null || echo "ConfigMap not found" |
| 89 | + ``` |
| 90 | + Results: |
| 91 | +
|
| 92 | + <!-- expected_similarity=0.3 --> |
| 93 | +
|
| 94 | + ```output |
| 95 | + configmap/azure-ip-masq-agent-config-reconciled 1 11h |
| 96 | + |
| 97 | + cloud-node-manager-rfb2w 2/2 Running 0 16m |
| 98 | + ``` |
| 99 | +
|
| 100 | +3. Check the cloud-node-manager daemonset (deployed by the chart or overlay) to see whether the health-probe-proxy sidecar container is enabled. You can use the `kubectl get ds` command to view the daemonset. |
| 101 | +
|
| 102 | + ```shell |
| 103 | + kubectl get ds -n kube-system cloud-node-manager -o yaml |
| 104 | + ``` |
| 105 | + Results: |
| 106 | +
|
| 107 | + <!-- expected_similarity=0.3 --> |
| 108 | +
|
| 109 | + ```output |
| 110 | + apiVersion: apps/v1 |
| 111 | + kind: DaemonSet |
| 112 | + metadata: |
| 113 | + name: cloud-node-manager |
| 114 | + namespace: kube-system |
| 115 | + ... |
| 116 | + spec: |
| 117 | + template: |
| 118 | + spec: |
| 119 | + containers: |
| 120 | + - name: cloud-node-manager |
| 121 | + image: mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager:xxxxxxxx |
| 122 | + - name: health-probe-proxy |
| 123 | + image: mcr.microsoft.com/oss/kubernetes/azure-health-probe-proxy:xxxxxxxx |
| 124 | + ... |
| 125 | + ``` |
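Reading the full daemonset YAML by eye is error-prone, so the check in step 3 can be narrowed to the container names only. The following is a minimal sketch, assuming bash and a working `kubectl` context. The function names `has_health_probe_proxy` and `check_sidecar` are hypothetical helpers, not part of AKS or kubectl.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the space-separated list of container
# names passed as $1 contains "health-probe-proxy".
has_health_probe_proxy() {
  local name
  for name in $1; do
    [ "$name" = "health-probe-proxy" ] && return 0
  done
  return 1
}

# Hypothetical wrapper: reads the container names from the live daemonset
# and reports whether the sidecar is present. Not called automatically.
check_sidecar() {
  local containers
  containers=$(kubectl get ds cloud-node-manager -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[*].name}')
  if has_health_probe_proxy "$containers"; then
    echo "health-probe-proxy sidecar is enabled"
  else
    echo "health-probe-proxy sidecar is missing"
  fi
}
```

After you source the snippet, run `check_sidecar` against the cluster. If the sidecar is missing although the mode is `Shared`, the causes in the following sections are the likely places to look.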
| 126 | +
|
| 127 | +## Cause 1: The health probe mode isn't Shared or ServiceNodePort |
| 128 | +
|
| 129 | +The health probe mode feature only works with these two modes. If you use any other mode, the feature won't work. |
| 130 | +
|
| 131 | +### Solution 1: Use the correct health probe mode |
| 132 | +
|
| 133 | +Make sure you use the Shared or ServiceNodePort mode when creating or updating your cluster. You can use the `--cluster-service-load-balancer-health-probe-mode` flag to specify the mode. |
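To catch a mistyped mode before the request reaches Azure, the value can be validated locally first. This is only a sketch, assuming bash and the `RESOURCE_GROUP` and `AKS_CLUSTER_NAME` variables used elsewhere in this article. The helper names `is_valid_probe_mode` and `set_probe_mode` are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only for the two supported mode values.
# The comparison is case-sensitive on purpose, matching the values shown
# in this article.
is_valid_probe_mode() {
  case "$1" in
    Shared|ServiceNodePort) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: validates the mode, then updates the cluster.
# Not called automatically.
set_probe_mode() {
  local mode="$1"
  if ! is_valid_probe_mode "$mode"; then
    echo "Unsupported health probe mode: '$mode'" >&2
    return 1
  fi
  az aks update --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" \
    --cluster-service-load-balancer-health-probe-mode "$mode"
}
```

For example, `set_probe_mode Shared` proceeds to the update, whereas `set_probe_mode shared` stops with an error before any Azure call is made.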
| 134 | +
|
| 135 | +## Cause 2: The toggle for the health probe mode feature is off |
| 136 | +
|
| 137 | +The health probe mode feature is controlled by a toggle that can be enabled or disabled by the AKS team. If the toggle is off, the feature won't work. |
| 138 | +
|
| 139 | +### Solution 2: Turn on the toggle |
| 140 | +
|
| 141 | +Contact the AKS team to check if the toggle for the health probe mode feature is on or off. If it's off, ask them to turn it on for your subscription. |
| 142 | +
|
| 143 | +## Cause 3: The load balancer SKU is Basic |
| 144 | +
|
| 145 | +The health probe mode feature only works with the Standard Load Balancer SKU. If you use the Basic Load Balancer SKU, the feature won't work. |
| 146 | +
|
| 147 | +### Solution 3: Use the Standard Load Balancer SKU |
| 148 | +
|
| 149 | +Make sure you use the Standard Load Balancer SKU when creating or updating your cluster. You can use the `--load-balancer-sku` flag to specify the SKU. |
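To confirm which SKU an existing cluster uses, you can query the cluster's network profile. The sketch below assumes bash and the `RESOURCE_GROUP` and `AKS_CLUSTER_NAME` variables from the earlier steps, and that the SKU is exposed at `networkProfile.loadBalancerSku` in the `az aks show` output. The helper names are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if $1 is "standard" in any letter case,
# because the value can appear as "standard" or "Standard".
is_standard_sku() {
  local sku
  sku=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  [ "$sku" = "standard" ]
}

# Hypothetical wrapper: reads the SKU from the cluster and reports it.
# Not called automatically.
check_lb_sku() {
  local sku
  sku=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" \
    --query "networkProfile.loadBalancerSku" --output tsv)
  if is_standard_sku "$sku"; then
    echo "Load balancer SKU is Standard"
  else
    echo "Load balancer SKU is '$sku'; the health probe mode feature requires Standard"
  fi
}
```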
| 150 | +
|
| 151 | +## Cause 4: The feature isn't registered |
| 152 | +
|
| 153 | +The health probe mode feature requires you to register the feature on your subscription. If the feature isn't registered, it won't work. |
| 154 | +
|
| 155 | +### Solution 4: Register the feature |
| 156 | +
|
| 157 | +Make sure you register the feature for your subscription before creating or updating your cluster. You can use the `az feature register` command to register the feature. |
| 158 | +
|
| 159 | +```azurecli |
| 160 | +export FEATURE_NAME="EnableSLBSharedHealthProbePreview" |
| 161 | +export PROVIDER_NAMESPACE="Microsoft.ContainerService" |
| 162 | +az feature register --name $FEATURE_NAME --namespace $PROVIDER_NAMESPACE |
| 163 | +``` |
| 164 | +Results: |
| 165 | + |
| 166 | +<!-- expected_similarity=0.3 --> |
| 167 | + |
| 168 | +```output |
| 169 | +{ |
| 170 | + "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/EnableSLBSharedHealthProbePreview", |
| 171 | + "name": "Microsoft.ContainerService/EnableSLBSharedHealthProbePreview", |
| 172 | + "properties": { |
| 173 | + "state": "Registering" |
| 174 | + }, |
| 175 | + "type": "Microsoft.Features/providers/features" |
| 176 | +} |
| 177 | +``` |
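Registration isn't immediate: the sample output above shows the `Registering` state. A possible follow-up, sketched here under the assumption that bash is used and that `FEATURE_NAME` and `PROVIDER_NAMESPACE` are still exported, is to check the state and refresh the resource provider registration once the state is `Registered`. The helper names are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when the state string is "Registered".
is_feature_registered() {
  [ "$1" = "Registered" ]
}

# Hypothetical wrapper: reads the current feature state and, if registration
# is complete, refreshes the resource provider registration so the change
# takes effect. Not called automatically.
refresh_if_registered() {
  local state
  state=$(az feature show --name "$FEATURE_NAME" --namespace "$PROVIDER_NAMESPACE" \
    --query "properties.state" --output tsv)
  if is_feature_registered "$state"; then
    az provider register --namespace "$PROVIDER_NAMESPACE"
  else
    echo "Feature state is '$state'; check again in a few minutes."
  fi
}
```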
| 178 | + |
| 179 | +## Cause 5: The Kubernetes version is earlier than v1.28.0 |
| 180 | + |
| 181 | +The health probe mode feature requires a minimum Kubernetes version of v1.28.0. If you use an older version, the feature won't work. |
| 182 | + |
| 183 | +### Solution 5: Upgrade the Kubernetes version |
| 184 | + |
| 185 | +Make sure you use Kubernetes v1.28.0 or a later version when creating or upgrading your cluster. You can use the `--kubernetes-version` flag of the `az aks create` or `az aks upgrade` command to specify the version. |
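To check whether an existing cluster meets the minimum version, the reported version can be compared with 1.28.0. This is a minimal sketch, assuming bash and GNU `sort` (for the `-V` version-sort option), plus the `RESOURCE_GROUP` and `AKS_CLUSTER_NAME` variables from the earlier steps. The helper names `version_ge` and `check_k8s_version` are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2. A leading "v" is stripped from both arguments. Relies on the
# version-sort option (-V) of GNU sort.
version_ge() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Hypothetical wrapper: reads the cluster version and compares it with the
# minimum stated in this article. Not called automatically.
check_k8s_version() {
  local version
  version=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" \
    --query "kubernetesVersion" --output tsv)
  if version_ge "$version" "1.28.0"; then
    echo "Kubernetes $version meets the minimum version"
  else
    echo "Kubernetes $version is earlier than v1.28.0; upgrade the cluster first"
  fi
}
```

The comparison sorts the two versions and checks that the required one comes first, which avoids the pitfalls of plain string comparison (for example, `1.9.0` versus `1.28.0`).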
| 186 | + |
| 187 | +## Known issues |
| 188 | + |
| 189 | +For Windows nodes, the kube-proxy component doesn't start until you create the first non-HostProcess container (HPC) pod on a node. This issue affects the health probe mode feature and causes the load balancer to report unhealthy nodes. It will be fixed in a future update. |
| 190 | + |
| 191 | +## How to enable the health probe mode feature using the Azure CLI |
| 192 | + |
| 193 | +To enable the health probe mode feature, run one of the following commands: |
| 194 | + |
| 195 | +Enable `ServiceNodePort` health probe mode (default) for a cluster: |
| 196 | + |
| 197 | +```shell |
| 198 | +export RESOURCE_GROUP="aks-rg" |
| 199 | +export AKS_CLUSTER_NAME="aks-cluster" |
| 200 | +az aks update --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --cluster-service-load-balancer-health-probe-mode ServiceNodePort |
| 201 | +``` |
| 202 | +Results: |
| 203 | + |
| 204 | +```output |
| 205 | +{ |
| 206 | + "name": "aks-cluster", |
| 207 | + "location": "eastus2", |
| 208 | + "resourceGroup": "aks-rg", |
| 209 | + "kubernetesVersion": "1.28.x", |
| 210 | + "provisioningState": "Succeeded", |
| 211 | + "loadBalancerProfile": { |
| 212 | + "clusterServiceLoadBalancerHealthProbeMode": "ServiceNodePort", |
| 213 | + ... |
| 214 | + }, |
| 215 | + ... |
| 216 | +} |
| 217 | +``` |
| 218 | + |
| 219 | +Enable `Shared` health probe mode for a cluster: |
| 220 | + |
| 221 | +```shell |
| 222 | +export RESOURCE_GROUP="MyAksResourceGroup" |
| 223 | +export AKS_CLUSTER_NAME="MyAksCluster" |
| 224 | +az aks update --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --cluster-service-load-balancer-health-probe-mode Shared |
| 225 | +``` |
| 226 | + |
| 227 | +Results: |
| 228 | + |
| 229 | +```output |
| 230 | +{ |
| 231 | + "name": "MyAksCluster", |
| 232 | + "location": "eastus2", |
| 233 | + "resourceGroup": "MyAksResourceGroup", |
| 234 | + "kubernetesVersion": "1.28.x", |
| 235 | + "provisioningState": "Succeeded", |
| 236 | + "loadBalancerProfile": { |
| 237 | + "clusterServiceLoadBalancerHealthProbeMode": "Shared", |
| 238 | + ... |
| 239 | + }, |
| 240 | + ... |
| 241 | +} |
| 242 | +``` |
| 243 | + |
| 244 | +[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)] |