dns resolver not used #10396

Open
viswanathanb opened this issue Feb 18, 2025 · 1 comment
Comments

@viswanathanb

Bug Report

DNS servers from the machine config are not used when fetching images through a registry mirror.

Description

We have an air-gapped system, and the default DNS servers (1.1.1.1 and 8.8.8.8) are not accessible. Even with the right DNS servers set in the machine config, there is an issue with pulling images. kubelet can be restarted, after which it resolves the registry and fetches its container. However, etcd gets stuck and there is no way to restart it.

Logs

87.53.172.71: kern: notice: [2025-02-18T19:48:31.093635767Z]: NFS: Registering the id_resolver key type
87.53.172.71: kern: notice: [2025-02-18T19:48:31.094204767Z]: Key type id_resolver registered
87.53.172.71: kern: notice: [2025-02-18T19:48:31.209146767Z]: Key type dns_resolver registered
87.53.172.71: user: warning: [2025-02-18T19:48:31.494749767Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"], "searchDomains": []}
87.53.172.71: user: warning: [2025-02-18T19:48:31.502604767Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"], "searchDomains": []}
87.53.172.71: user: warning: [2025-02-18T19:48:31.578038767Z]: [talos] serviceudevd: Process Process(["/sbin/systemd-udevd" "--resolve-names=never"]) started with PID 1156
87.53.172.71: user: warning: [2025-02-18T19:48:32.321416767Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["10.96.0.10"], "searchDomains": []}
87.53.172.71: user: warning: [2025-02-18T19:48:33.804397767Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["192.66.185.7", "192.66.185.8"], "searchDomains": ["c001.svc.cluster.local"]}
87.53.172.71: user: warning: [2025-02-18T19:48:33.872621767Z]: [talos] updated dns server nameservers {"component": "dns-resolve-cache", "addrs": ["192.66.185.7:53", "192.66.185.8:53"]}
87.53.172.71: user: warning: [2025-02-18T19:48:33.877183767Z]: [talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["192.66.185.7", "192.66.185.8"], "searchDomains": []}
87.53.172.71: user: warning: [2025-02-18T19:48:53.812215767Z]: level=info msg=trying next host error=failed to do request: Head "https://p-boss-gitlab-01.eng.xxx.net:5050/v2/public-repositories/shared-services/public-container-image-registry/etcd-development/etcd/manifests/v3.5.17?ns=gcr.io": dial tcp: lookup p-boss-gitlab-01.eng.xxx.net on 8.8.8.8:53: read udp 87.53.172.71:60075->8.8.8.8:53: i/o timeout host=p-boss-gitlab-01.eng.xxx.net:5050 image=gcr.io/etcd-development/etcd:v3.5.17
87.53.172.71: user: warning: [2025-02-18T19:49:23.693356767Z]: failed to pull image "gcr.io/etcd-development/etcd:v3.5.17": failed to resolve reference "gcr.io/etcd-development/etcd:v3.5.17": gcr.io/etcd-development/etcd:v3.5.17: not found

NODE           SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
87.53.172.71   apid         Running   OK       24m40s ago    Health check successful
87.53.172.71   auditd       Running   OK       24m42s ago    Health check successful
87.53.172.71   containerd   Running   OK       24m42s ago    Health check successful
87.53.172.71   cri          Running   OK       24m39s ago    Health check successful
87.53.172.71   dashboard    Running   ?        24m41s ago    Process Process(["/sbin/dashboard"]) started with PID 2042
87.53.172.71   etcd         Failed    ?        23m51s ago    Failed to run pre stage: failed to pull image "gcr.io/etcd-development/etcd:v3.5.17": 1 error(s) occurred:
    failed to pull image "gcr.io/etcd-development/etcd:v3.5.17": failed to resolve reference "gcr.io/etcd-development/etcd:v3.5.17": gcr.io/etcd-development/etcd:v3.5.17: not found
87.53.172.71   kubelet      Running   OK       24m19s ago    Health check successful
87.53.172.71   machined     Running   OK       24m42s ago    Health check successful
87.53.172.71   syslogd      Running   OK       24m41s ago    Health check successful
87.53.172.71   trustd       Running   OK       24m21s ago    Health check successful
87.53.172.71   udevd        Running   OK       24m43s ago    Health check successful
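
To make the mismatch in the kernel log easier to spot, here is a rough Python sketch that tracks the most recent "setting resolvers" event and flags any later `lookup <host> on <ip>:53` error that went to a server outside that set. The function name `find_stale_lookups` and the `SAMPLE` text are illustrative only; the sample lines are trimmed copies of entries from the log above, and the regular expressions assume the message shapes seen there rather than any documented Talos log format.

```python
import json
import re

# Talos "setting resolvers" events end with a JSON payload (shape assumed from the log above).
SETTING_RE = re.compile(r"\[talos\] setting resolvers (\{.*\})\s*$")
# Go resolver errors look like "lookup <host> on <ip>:53".
LOOKUP_RE = re.compile(r"lookup (\S+) on (\d{1,3}(?:\.\d{1,3}){3}):53")


def find_stale_lookups(log_text):
    """Return (host, resolver_used, resolvers_configured) for each lookup
    sent to a server outside the most recently configured resolver set."""
    current = []
    stale = []
    for line in log_text.splitlines():
        match = SETTING_RE.search(line)
        if match:
            current = json.loads(match.group(1))["resolvers"]
            continue
        match = LOOKUP_RE.search(line)
        if match and current and match.group(2) not in current:
            stale.append((match.group(1), match.group(2), list(current)))
    return stale


# Trimmed copies of three lines from the log in this report.
SAMPLE = '''\
[talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["1.1.1.1", "8.8.8.8"], "searchDomains": []}
[talos] setting resolvers {"component": "controller-runtime", "controller": "network.ResolverSpecController", "resolvers": ["192.66.185.7", "192.66.185.8"], "searchDomains": []}
level=info msg=trying next host error=failed to do request: dial tcp: lookup p-boss-gitlab-01.eng.xxx.net on 8.8.8.8:53: read udp 87.53.172.71:60075->8.8.8.8:53: i/o timeout
'''

if __name__ == "__main__":
    for host, used, configured in find_stale_lookups(SAMPLE):
        print(f"{host}: looked up on {used}, but configured resolvers were {configured}")
```

On the sample it reports the single lookup of the mirror host on 8.8.8.8, logged after the resolvers had already been switched to 192.66.185.7 and 192.66.185.8.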

Environment

  • Talos version:
    Client:
    Tag: v1.9.2
    SHA: 09758b3
    Built:
    Go version: go1.23.4
    OS/Arch: darwin/amd64
    Server:
    NODE: 87.53.172.70
    Tag: v1.9.2
    SHA: 09758b3
    Built:
    Go version: go1.23.4
    OS/Arch: linux/amd64
    Enabled: RBAC

  • Kubernetes version:
    Client Version: v1.32.0
    Kustomize Version: v5.5.0
    Server Version: v1.32.0

  • Platform:
    Baremetal

@smira
Member

smira commented Feb 24, 2025

Talos should pick up DNS changes, but I think your issue is different: you probably have a registry mirror in the chain that returns a "not found" error, which stops any further attempt to fetch the image.
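
One way to check the mirror theory from a machine that can reach the registry is to send the same kind of manifest `HEAD` request that appears in the log and look at the status code. The Python sketch below is only an assumption about how to do that: `manifest_url` and `probe` are hypothetical helper names, the host is the redacted one from the report, and a private registry will most likely answer 401 until credentials are supplied, which this sketch does not handle.

```python
import urllib.error
import urllib.parse
import urllib.request

# Manifest media types commonly requested by container runtimes.
ACCEPT = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def manifest_url(endpoint, repository, reference, ns=None):
    """Build a /v2/<repository>/manifests/<reference> URL, optionally with ?ns=<upstream>."""
    url = f"{endpoint.rstrip('/')}/v2/{repository}/manifests/{reference}"
    if ns:
        url += "?" + urllib.parse.urlencode({"ns": ns})
    return url


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Send a HEAD request and return the HTTP status code,
    or an 'unreachable: ...' string for DNS or connection failures."""
    request = urllib.request.Request(url, method="HEAD", headers={"Accept": ACCEPT})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 401, 404 and similar responses arrive as HTTPError.
        return err.code
    except OSError as err:
        # URLError (including DNS failures) and socket timeouts are OSError subclasses.
        return f"unreachable: {err}"


if __name__ == "__main__":
    # Values copied from the failing request in the log; the host is redacted there.
    url = manifest_url(
        "https://p-boss-gitlab-01.eng.xxx.net:5050",
        "public-repositories/shared-services/public-container-image-registry/etcd-development/etcd",
        "v3.5.17",
        ns="gcr.io",
    )
    print(url, "->", probe(url))
```

A 404 here would be consistent with the mirror reporting "not found", while an "unreachable" result would point back at name resolution or connectivity.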
