Commit 66d0dd1

author
naman-msft
committed
added new batch of exec docs as of june 24, 2025 and tested them ready for PR
1 parent 9d11db6 commit 66d0dd1

37 files changed

+6359
-324
lines changed
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
---
2+
title: Troubleshoot the K8SAPIServerDNSLookupFailVMExtensionError error code (52)
3+
description: Learn how to troubleshoot the K8SAPIServerDNSLookupFailVMExtensionError error (52) when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster.
4+
ms.topic: article
5+
ms.date: 06/14/2024
6+
author: MicrosoftDocsExec
7+
ms.author: MicrosoftDocsExec
8+
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool), innovation-engine
9+
---
10+
11+
# Troubleshoot the K8SAPIServerDNSLookupFailVMExtensionError error code (52)
12+
13+
This article discusses how to identify and resolve the `K8SAPIServerDNSLookupFailVMExtensionError` error (also known as error code ERR_K8S_API_SERVER_DNS_LOOKUP_FAIL, error number 52) that occurs when you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.
14+
15+
## Prerequisites
16+
17+
- The [nslookup](/windows-server/administration/windows-commands/nslookup) DNS lookup tool for Windows nodes or the [dig](https://linuxize.com/post/how-to-use-dig-command-to-query-dns-in-linux/) tool for Linux nodes.
18+
19+
- [Azure CLI](/cli/azure/install-azure-cli), version 2.0.59 or a later version. If Azure CLI is already installed, you can find the version number by running `az --version`.
20+
21+
## Symptoms
22+
23+
When you try to start or create an AKS cluster, you receive the following error message:
24+
25+
> Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see <https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns> for more information.
26+
>
27+
> Details: Code="VMExtensionProvisioningError"
28+
>
29+
> Message="VM has reported a failure when processing extension 'vmssCSE'.
30+
>
31+
> Error message: "**Enable failed: failed to execute command: command terminated with exit status=52**\n[stdout]\n{
32+
>
33+
> "ExitCode": "52",
34+
>
35+
> "Output": "Fri Oct 15 10:06:00 UTC 2021,aks-nodepool1-36696444-vmss000000\\nConnection to mcr.microsoft.com 443 port [tcp/https]
36+
37+
## Cause
38+
39+
The cluster nodes can't resolve the cluster's fully qualified domain name (FQDN) in Azure DNS. To confirm this, run the following DNS lookup command on the failed cluster node and check whether the FQDN resolves to a valid address.
40+
41+
| Node OS | Command |
42+
| ------- | ------------------------- |
43+
| Linux | `dig <cluster-fqdn>` |
44+
| Windows | `nslookup <cluster-fqdn>` |
45+
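The Linux lookup in the preceding table can be wrapped in a small check that reports a clear pass or fail. The following is a minimal sketch, not part of any AKS tooling: the `check_fqdn` function name is hypothetical, and the two-second timeout and single retry are arbitrary assumptions.

```bash
# Sketch only: check_fqdn is a hypothetical helper name.
# Prints OK or FAIL depending on whether dig returns at least one answer record.
check_fqdn() {
  local fqdn="$1"
  local answers
  # +short prints only the answer data. Lines that start with ';' are dig
  # diagnostics (for example, timeouts), so they are filtered out.
  answers="$(dig +short +time=2 +tries=1 "$fqdn" 2>/dev/null | grep -v '^;')"
  if [ -n "$answers" ]; then
    echo "OK: $fqdn resolves"
  else
    echo "FAIL: $fqdn does not resolve"
    return 1
  fi
}

# Example usage (replace the placeholder with your cluster's FQDN):
# check_fqdn "<cluster-fqdn>"
```

A non-zero return value makes the function usable in scripts, for example to stop before retrying a deployment while DNS is still broken.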
46+
## Solution
47+
48+
On your DNS servers and firewall, make sure that nothing blocks resolution of your cluster's FQDN. If the `nslookup` or `dig` command still fails after you apply the necessary fixes, your custom DNS server might be incorrectly configured. For help configuring your custom DNS server, review the following articles:
49+
50+
- [Create a private AKS cluster](/azure/aks/private-clusters)
51+
- [Private Azure Kubernetes service with custom DNS server](https://github.com/Azure/terraform/tree/00d15e09c54f25fb6387330c36aa4366122c5aaa/quickstart/301-aks-private-cluster)
52+
- [What is IP address 168.63.129.16?](/azure/virtual-network/what-is-ip-address-168-63-129-16)
53+
54+
When you use a private cluster that has a custom DNS server, a DNS zone is created, and that zone must be linked to the virtual network. The link is made only after the cluster is created, so the initial creation of a private cluster that uses custom DNS fails. However, you can restore the creation process to a "success" state by reconciling the cluster. To do this, run the [az resource update](/cli/azure/resource#az-resource-update) command in Azure CLI, as follows:
55+
56+
Set the `RESOURCE_GROUP_NAME` and `CLUSTER_NAME` environment variables to the names of your resource group and AKS cluster, and then run the following command to reconcile the cluster.
57+
58+
```azurecli-interactive
59+
az resource update --resource-group $RESOURCE_GROUP_NAME \
60+
--name $CLUSTER_NAME \
61+
--namespace Microsoft.ContainerService \
62+
--resource-type ManagedClusters
63+
```
64+
65+
Results:
66+
67+
<!-- expected_similarity=0.3 -->
68+
69+
```output
70+
{
71+
"id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/myResourceGroupxxx/providers/Microsoft.ContainerService/ManagedClusters/myAksClusterxxx",
72+
"location": "eastus",
73+
"name": "myAksClusterxxx",
74+
"properties": {
75+
// ...other properties...
76+
},
77+
"resourceGroup": "myResourceGroupxxx",
78+
"type": "Microsoft.ContainerService/ManagedClusters"
79+
}
80+
```
81+
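After the reconcile command returns, it can help to confirm the cluster's provisioning state. The following is a minimal sketch that assumes `RESOURCE_GROUP_NAME` and `CLUSTER_NAME` are still set in your shell; the `show_cluster_state` function name is hypothetical.

```bash
# Sketch only: show_cluster_state is a hypothetical helper name.
# Assumes RESOURCE_GROUP_NAME and CLUSTER_NAME are already set in the shell.
show_cluster_state() {
  local state
  # provisioningState is a top-level property in the az aks show output.
  state="$(az aks show \
    --resource-group "$RESOURCE_GROUP_NAME" \
    --name "$CLUSTER_NAME" \
    --query provisioningState \
    --output tsv)"
  if [ "$state" = "Succeeded" ]; then
    echo "Cluster reconciled: provisioningState=$state"
  else
    echo "Cluster not yet healthy: provisioningState=${state:-unknown}"
    return 1
  fi
}

# Example usage:
# show_cluster_state
```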
82+
Also verify that your DNS server is configured correctly for your private cluster, as described earlier.
83+
84+
> [!NOTE]
85+
> Conditional forwarding doesn't support subdomains.
86+
87+
## More information
88+
89+
- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)
90+
91+
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
@@ -0,0 +1,244 @@
1+
---
2+
title: Troubleshoot the health probe mode for AKS cluster service load balancer
3+
description: Diagnoses and fixes common issues with the health probe mode feature.
4+
ms.date: 06/03/2024
5+
ms.reviewer: niqi, cssakscic, v-weizhu
6+
ms.service: azure-kubernetes-service
7+
ms.custom: sap:Node/node pool availability and performance, devx-track-azurecli, innovation-engine
8+
---
9+
10+
# Troubleshoot issues when enabling the AKS cluster service health probe mode
11+
12+
The health probe mode feature allows you to configure how Azure Load Balancer probes the health of the nodes in your Azure Kubernetes Service (AKS) cluster. You can choose between two modes: Shared and ServiceNodePort. The Shared mode uses a single health probe for all services that have `externalTrafficPolicy` set to `Cluster` and use the same load balancer. In contrast, the ServiceNodePort mode uses a separate health probe for each service. The Shared mode can reduce the number of health probes and improve the performance of the load balancer, but it requires some additional components to work properly. To enable this feature, see [How to enable the health probe mode feature using the Azure CLI](#how-to-enable-the-health-probe-mode-feature-using-the-azure-cli).
13+
14+
This article describes some common issues about using the health probe mode feature in an AKS cluster and helps you troubleshoot and resolve these issues.
15+
16+
## Symptoms
17+
18+
When you create or update an AKS cluster by using the Azure CLI and enable the health probe mode feature by using the `--cluster-service-load-balancer-health-probe-mode Shared` flag, one or more of the following issues might occur:
19+
20+
- The load balancer doesn't distribute traffic to the nodes as expected.
21+
22+
- The load balancer reports unhealthy nodes even if they're healthy.
23+
24+
- The health-probe-proxy sidecar container crashes or doesn't start.
25+
26+
- The cloud-node-manager pod crashes or doesn't start.
27+
28+
When you enable the feature, the following operations also happen in the background:
29+
30+
1. The resource provider (RP) frontend checks whether the request is valid and updates the corresponding property in the LoadBalancerProfile.
31+
32+
2. RP async calls the cloud provider config secret reconciler to update the cloud provider config secret based on the LoadBalancerProfile.
33+
34+
3. Overlaymgr reconciles the cloud-node-manager chart to enable the health-probe-proxy sidecar.
35+
36+
## Initial troubleshooting
37+
38+
To troubleshoot these issues, follow these steps:
39+
40+
0. First, connect to your AKS cluster using the Azure CLI:
41+
42+
```azurecli
43+
export RESOURCE_GROUP="aks-rg"
44+
export AKS_CLUSTER_NAME="aks-cluster"
45+
az aks get-credentials --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --overwrite-existing
46+
```
47+
48+
1. Next, check the RP frontend log to see if the health probe mode in the LoadBalancerProfile is properly configured. You can use the `az aks show` command to view the LoadBalancerProfile property of your cluster.
49+
50+
```azurecli
51+
export RESOURCE_GROUP="aks-rg"
52+
export AKS_CLUSTER_NAME="aks-cluster"
53+
az aks show --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --query "networkProfile.loadBalancerProfile"
54+
```
55+
Results:
56+
57+
<!-- expected_similarity=0.3 -->
58+
59+
```output
60+
{
61+
"clusterServiceLoadBalancerHealthProbeMode": "Shared",
62+
"managedOutboundIPs": null,
63+
"outboundIPs": null,
64+
"outboundIPPrefixes": null,
65+
"allocatedOutboundPorts": null,
66+
"effectiveOutboundIPs": [
67+
{
68+
"id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/MC_aks-rg_aks-cluster_eastus2/providers/Microsoft.Network/publicIPAddresses/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
69+
}
70+
],
71+
"idleTimeoutInMinutes": 30,
72+
"loadBalancerSku": "standard",
73+
"managedOutboundIPv6": null
74+
}
75+
```
76+
77+
2. Check the cloud provider configuration. In modern AKS clusters, the cloud provider configuration is managed internally and the `ccp` namespace doesn't exist. Instead, check for cloud-provider-related resources and verify that the cloud-node-manager pods are running:
78+
79+
80+
```bash
81+
# Check for cloud provider related ConfigMaps in kube-system
82+
kubectl get configmap -n kube-system | grep -i azure
83+
84+
# Check if cloud-node-manager pods are running (indicates cloud provider integration is working)
85+
kubectl get pods -n kube-system | grep cloud-node-manager
86+
87+
# Check the azure-ip-masq-agent-config if it exists
88+
kubectl get configmap azure-ip-masq-agent-config-reconciled -n kube-system -o yaml 2>/dev/null || echo "ConfigMap not found"
89+
```
90+
Results:
91+
92+
<!-- expected_similarity=0.3 -->
93+
94+
```output
95+
configmap/azure-ip-masq-agent-config-reconciled 1 11h
96+
97+
cloud-node-manager-rfb2w 2/2 Running 0 16m
98+
```
99+
100+
3. Check the chart or overlay daemonset cloud-node-manager to see if the health-probe-proxy sidecar container is enabled. You can use the `kubectl get ds` command to view the daemonset.
101+
102+
```shell
103+
kubectl get ds -n kube-system cloud-node-manager -o yaml
104+
```
105+
Results:
106+
107+
<!-- expected_similarity=0.3 -->
108+
109+
```output
110+
apiVersion: apps/v1
111+
kind: DaemonSet
112+
metadata:
113+
name: cloud-node-manager
114+
namespace: kube-system
115+
...
116+
spec:
117+
template:
118+
spec:
119+
containers:
120+
- name: cloud-node-manager
121+
image: mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager:xxxxxxxx
122+
- name: health-probe-proxy
123+
image: mcr.microsoft.com/oss/kubernetes/azure-health-probe-proxy:xxxxxxxx
124+
...
125+
```
126+
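Instead of reading the full DaemonSet YAML, you can query only the container names. The following is a minimal sketch; the `has_health_probe_proxy` function name is hypothetical, and the container name `health-probe-proxy` is taken from the sample output above.

```bash
# Sketch only: has_health_probe_proxy is a hypothetical helper name.
has_health_probe_proxy() {
  local names
  # jsonpath prints the container names of the pod template, separated by spaces.
  names="$(kubectl get ds cloud-node-manager -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[*].name}')"
  case " $names " in
    *" health-probe-proxy "*)
      echo "health-probe-proxy sidecar is present"
      ;;
    *)
      echo "health-probe-proxy sidecar is missing (containers: ${names:-none})"
      return 1
      ;;
  esac
}

# Example usage:
# has_health_probe_proxy
```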
127+
## Cause 1: The health probe mode isn't Shared or ServiceNodePort
128+
129+
The health probe mode feature only works with these two modes. If you use any other mode, the feature won't work.
130+
131+
### Solution 1: Use the correct health probe mode
132+
133+
Make sure you use the Shared or ServiceNodePort mode when creating or updating your cluster. You can use the `--cluster-service-load-balancer-health-probe-mode` flag to specify the mode.
134+
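To see which mode the cluster currently reports, you can query the property directly. The following is a minimal sketch that assumes `RESOURCE_GROUP` and `AKS_CLUSTER_NAME` are set as in the earlier steps; the `check_probe_mode` function name is hypothetical, and treating an empty value as the ServiceNodePort default is an assumption based on the default noted in this article.

```bash
# Sketch only: check_probe_mode is a hypothetical helper name.
# Assumes RESOURCE_GROUP and AKS_CLUSTER_NAME are already set in the shell.
check_probe_mode() {
  local mode
  mode="$(az aks show --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" \
    --query "networkProfile.loadBalancerProfile.clusterServiceLoadBalancerHealthProbeMode" \
    --output tsv)"
  case "$mode" in
    Shared|ServiceNodePort)
      echo "Health probe mode: $mode"
      ;;
    ""|None)
      # Assumption: an unset property means the default mode is in effect.
      echo "Health probe mode is not set; the default (ServiceNodePort) applies"
      ;;
    *)
      echo "Unexpected health probe mode: $mode"
      return 1
      ;;
  esac
}

# Example usage:
# check_probe_mode
```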
135+
## Cause 2: The toggle for the health probe mode feature is off
136+
137+
The health probe mode feature is controlled by a toggle that can be enabled or disabled by the AKS team. If the toggle is off, the feature won't work.
138+
139+
### Solution 2: Turn on the toggle
140+
141+
Contact the AKS team to check if the toggle for the health probe mode feature is on or off. If it's off, ask them to turn it on for your subscription.
142+
143+
## Cause 3: The load balancer SKU is Basic
144+
145+
The health probe mode feature only works with the Standard Load Balancer SKU. If you use the Basic Load Balancer SKU, the feature won't work.
146+
147+
### Solution 3: Use the Standard Load Balancer SKU
148+
149+
Make sure you use the Standard Load Balancer SKU when creating or updating your cluster. You can use the `--load-balancer-sku` flag to specify the SKU.
150+
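You can confirm the SKU of an existing cluster before you enable the feature. The following is a minimal sketch that assumes `RESOURCE_GROUP` and `AKS_CLUSTER_NAME` are set; the `check_lb_sku` function name is hypothetical, and the comparison is case-insensitive because the value can appear in lowercase, as in the sample output earlier in this article.

```bash
# Sketch only: check_lb_sku is a hypothetical helper name.
# Assumes RESOURCE_GROUP and AKS_CLUSTER_NAME are already set in the shell.
check_lb_sku() {
  local sku
  # Lowercase the value so that "Standard" and "standard" compare equal.
  sku="$(az aks show --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" \
    --query "networkProfile.loadBalancerSku" --output tsv | tr '[:upper:]' '[:lower:]')"
  if [ "$sku" = "standard" ]; then
    echo "Load balancer SKU is Standard"
  else
    echo "Load balancer SKU is '${sku:-unknown}'; the feature requires Standard"
    return 1
  fi
}

# Example usage:
# check_lb_sku
```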
151+
## Cause 4: The feature isn't registered
152+
153+
The health probe mode feature requires you to register the feature on your subscription. If the feature isn't registered, it won't work.
154+
155+
### Solution 4: Register the feature
156+
157+
Make sure you register the feature for your subscription before creating or updating your cluster. You can use the `az feature register` command to register the feature.
158+
159+
```azurecli
160+
export FEATURE_NAME="EnableSLBSharedHealthProbePreview"
161+
export PROVIDER_NAMESPACE="Microsoft.ContainerService"
162+
az feature register --name $FEATURE_NAME --namespace $PROVIDER_NAMESPACE
163+
```
164+
Results:
165+
166+
<!-- expected_similarity=0.3 -->
167+
168+
```output
169+
{
170+
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/EnableSLBSharedHealthProbePreview",
171+
  "name": "Microsoft.ContainerService/EnableSLBSharedHealthProbePreview",
172+
"properties": {
173+
"state": "Registering"
174+
},
175+
"type": "Microsoft.Features/providers/features"
176+
}
177+
```
178+
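Registration is asynchronous, so the state starts as `Registering`. The following is a minimal sketch that polls `az feature show` until the state is `Registered` and then refreshes the resource provider by running `az provider register`. The `wait_for_feature` function name is hypothetical, and the default retry count and delay are arbitrary assumptions.

```bash
# Sketch only: wait_for_feature is a hypothetical helper name.
# Arguments: feature name, provider namespace, max attempts, seconds between attempts.
wait_for_feature() {
  local name="$1"
  local namespace="$2"
  local attempts="${3:-30}"
  local delay="${4:-20}"
  local state=""
  local i=1
  while [ "$i" -le "$attempts" ]; do
    state="$(az feature show --name "$name" --namespace "$namespace" \
      --query properties.state --output tsv)"
    if [ "$state" = "Registered" ]; then
      echo "Feature $name is Registered"
      # Refresh the resource provider so that the registration propagates.
      az provider register --namespace "$namespace"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Feature $name is still '$state' after $attempts checks"
  return 1
}

# Example usage, with the variables from the preceding command:
# wait_for_feature "$FEATURE_NAME" "$PROVIDER_NAMESPACE"
```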
179+
## Cause 5: The Kubernetes version is earlier than v1.28.0
180+
181+
The health probe mode feature requires a minimum Kubernetes version of v1.28.0. If you use an older version, the feature won't work.
182+
183+
### Solution 5: Upgrade the Kubernetes version
184+
185+
Make sure you use Kubernetes v1.28.0 or a later version when creating or updating your cluster. You can use the `--kubernetes-version` flag to specify the version.
186+
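To compare a cluster's version against the v1.28.0 minimum in a script, a version-aware sort is enough. The following is a minimal sketch; the `version_at_least` function name is hypothetical, and it assumes a `sort` implementation that supports the `-V` option (for example, GNU coreutils).

```bash
# Sketch only: version_at_least is a hypothetical helper name.
# Returns 0 if the first version is greater than or equal to the second.
version_at_least() {
  local current="${1#v}"   # Strip an optional leading "v".
  local minimum="${2#v}"
  # sort -V orders version strings. If the minimum sorts first (or the two are
  # equal), the current version meets the requirement.
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example usage, assuming RESOURCE_GROUP and AKS_CLUSTER_NAME are set:
# current="$(az aks show --resource-group "$RESOURCE_GROUP" --name "$AKS_CLUSTER_NAME" \
#   --query kubernetesVersion --output tsv)"
# version_at_least "$current" "1.28.0" || echo "Upgrade required: cluster is on $current"
```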
187+
## Known issues
188+
189+
For Windows nodes, the kube-proxy component doesn't start until you create the first non-HostProcess container (non-HPC) pod on a node. This issue affects the health probe mode feature and causes the load balancer to report unhealthy nodes. It will be fixed in a future update.
190+
191+
## How to enable the health probe mode feature using the Azure CLI
192+
193+
To enable the health probe mode feature, run one of the following commands:
194+
195+
Enable `ServiceNodePort` health probe mode (default) for a cluster:
196+
197+
```shell
198+
export RESOURCE_GROUP="aks-rg"
199+
export AKS_CLUSTER_NAME="aks-cluster"
200+
az aks update --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --cluster-service-load-balancer-health-probe-mode ServiceNodePort
201+
```
202+
Results:
203+
204+
```output
205+
{
206+
"name": "aks-cluster",
207+
"location": "eastus2",
208+
"resourceGroup": "aks-rg",
209+
"kubernetesVersion": "1.28.x",
210+
"provisioningState": "Succeeded",
211+
"loadBalancerProfile": {
212+
"clusterServiceLoadBalancerHealthProbeMode": "ServiceNodePort",
213+
...
214+
},
215+
...
216+
}
217+
```
218+
219+
Enable `Shared` health probe mode for a cluster:
220+
221+
```shell
222+
export RESOURCE_GROUP="MyAksResourceGroup"
223+
export AKS_CLUSTER_NAME="MyAksCluster"
224+
az aks update --resource-group $RESOURCE_GROUP --name $AKS_CLUSTER_NAME --cluster-service-load-balancer-health-probe-mode Shared
225+
```
226+
227+
Results:
228+
229+
```output
230+
{
231+
"name": "MyAksCluster",
232+
"location": "eastus2",
233+
"resourceGroup": "MyAksResourceGroup",
234+
"kubernetesVersion": "1.28.x",
235+
"provisioningState": "Succeeded",
236+
"loadBalancerProfile": {
237+
"clusterServiceLoadBalancerHealthProbeMode": "Shared",
238+
...
239+
},
240+
...
241+
}
242+
```
243+
244+
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
