
Still limitations for a dual stack Talos cluster #10525

Open
Heracles31 opened this issue Mar 13, 2025 · 2 comments

Comments

@Heracles31

Bug Report

Running a dual-stack cluster with Talos is still problematic, and the documentation remains thin.

Description

The last bug report I created turned into a good opportunity for many people to contribute on how to achieve dual stacking with Talos. Thanks to everyone who contributed to that one.

Still, there are more limitations, so I am creating this new issue in the hope that it will also help improve the documentation and push development toward a better dual-stack solution.

A - Talos's VIP is single stack

In the machine config, you can define a VIP that is shared by the control planes to avoid the need for an external load balancer. Unfortunately, that option is single stack: if you put both an IPv4 and an IPv6 address, the config fails to parse.

machine:
  network:
    interfaces:
    - interface: eth0
      vip:
        ip: 1.2.3.4,2001:0DB8:1234:56::30

Fails because the value does not parse as a valid IP address.

machine:
  network:
    interfaces:
    - interface: eth0
      vip:
        ip: 1.2.3.4
        ip: 2001:0DB8:1234:56::30

Fails because there are two IP lines.

machine:
  network:
    interfaces:
    - interface: eth0
      vip:
        ip: 2001:0DB8:1234:56::30

or

machine:
  network:
    interfaces:
    - interface: eth0
      vip:
        ip: 1.2.3.4

Either of these will work, but the VIP will be single stack.

Another point here is that the Talos docs recommend not using that VIP as the endpoint for talosctl. I therefore created a DNS name that resolves to each of the control planes' IPs. Unfortunately, should the first one fail, the command exits instead of trying the others. Since that mechanism fails to fall back even from one IPv4 address to another, it will of course also fail to fall back from IPv4 to IPv6. So here again, Talos is single stack only.
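
As a side note, and only as a sketch I have not validated: since talosctl accepts more than one endpoint, every control plane address can be listed explicitly instead of relying on a round-robin DNS name (the addresses below are simply my control planes' IPs). Whether talosctl then fails over cleanly between them is something I have not verified.

# persist all control plane endpoints in the talosconfig
talosctl config endpoint 172.24.136.161 172.24.136.162 172.24.136.163
# or pass a mixed v4/v6 list ad hoc for a single invocation
talosctl --endpoints 172.24.136.161,2001:0DB8:1234:c0::31 --nodes kube64-c1.localdomain version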

B - Dual stack IPv6 / IPv4 is not the same as dual stack IPv4 / IPv6

In this post, @bernardgut explained that the ordering of the IP ranges in the config is important: if the order does not match from one setting to another, the config will fail. It is also important to understand, explain and document that if IPv4 is listed first, the dual-stack cluster will be IPv4 / IPv6, and if IPv6 is listed first, the cluster will be IPv6 / IPv4.

Here, I am deploying Longhorn as my CSI. Longhorn does not support IPv6 and its frontend service must be IPv4. Its Helm chart does not specify an IP family, though, so the service ends up single stack, using the cluster's first IP family. For that reason, you will not reach Longhorn's UI if you are single stack v6 or dual stack v6 / v4. Again, nowhere have I seen any documentation about this kind of impact.
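
For what it is worth, a plain Kubernetes Service can be pinned to IPv4 regardless of the cluster's primary family. Whether Longhorn's Helm chart exposes these fields is another matter, so treat the following as a generic sketch (the name, selector and ports are illustrative only), not a Longhorn fix.

apiVersion: v1
kind: Service
metadata:
  name: longhorn-frontend-v4      # illustrative name only
  namespace: longhorn-system
spec:
  ipFamilyPolicy: SingleStack     # force a single IP family...
  ipFamilies:
    - IPv4                        # ...and make it IPv4 even on an IPv6-first cluster
  selector:
    app: longhorn-ui              # illustrative selector
  ports:
    - port: 80
      targetPort: 8000            # illustrative target port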

C - And some other minor points...

As for the service subnet, I was the one who suspected that /112 was the largest service subnet allowed, but /108 is in fact fine and I'm using it myself (presumably because /108 leaves 20 host bits, the same 2^20 service addresses that a /12 gives for IPv4). Thanks to @nazarewk for that one, and forget about my /112.

Because I ended up forced to run IPv4 / IPv6 to accommodate Longhorn, I chose to re-enable KubePrism. With the cluster being IPv4-first, I did not detect any problem with it. Still, I am unable to prove that it fully works as expected, or whether it just ends up surviving like Longhorn's frontend, saved by the fact that the cluster is IPv4-first.
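
For reference, this is roughly how KubePrism gets re-enabled in the machine config (7445 is the default port; adjust if yours differs):

machine:
  features:
    kubePrism:
      enabled: true # local load balancer for the Kubernetes API on each node
      port: 7445    # default KubePrism port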

So indeed, there are ways to run a dual-stack cluster with Talos, but many things remain obscure, uncertain, or even confirmed as non-functional.

Overall, I am using these files:

machine:
  install:
    extraKernelArgs:
      - net.ifnames=0
  sysctls:
    vm.nr_hugepages: "2048"
  time:
    disabled: false # Indicates if the time service is disabled for the machine.
    servers:
      - time.localdomain
    bootTimeout: 2m0s # Specifies the timeout when the node time is considered to be in sync unlocking the boot sequence.
  kubelet:
    nodeIP:
      validSubnets:
        - 172.24.136.128/26
        - 2001:0DB8:1234:c0::/64
cluster:
  apiServer:
    extraArgs:
      bind-address: "::"
  controllerManager:
    extraArgs:
      bind-address: "::1"
      node-cidr-mask-size-ipv6: "80"
  scheduler:
    extraArgs:
      bind-address: "::1"
  network:
    podSubnets:
      - 10.244.0.0/16
      - 2001:0DB8:1234:c1::/64
    serviceSubnets:
      - 10.96.0.0/12
      - 2001:0DB8:1234:c3::10:0/108
  etcd:
    advertisedSubnets:
      - 172.24.136.128/26
      - 2001:0DB8:1234:c0::/64
  proxy:
    extraArgs:
      ipvs-strict-arp: true

For the control planes, I add this:

machine:
  network:
    interfaces:
    - interface: eth0
      vip:
        ip: 2001:0DB8:1234:c0::30
  certSANs:
    - k64ctl.localdomain
    - kube64-ctl.localdomain
    - kube64-c1.localdomain
    - kube64-c2.localdomain
    - kube64-c3.localdomain
    - 172.24.136.161
    - 172.24.136.162
    - 172.24.136.163
    - 2001:0DB8:1234:c0::30
    - 2001:0DB8:1234:c0::31
    - 2001:0DB8:1234:c0::32
    - 2001:0DB8:1234:c0::33
    - ::1

Each control plane also has its own unique file:

machine:
  network:
    hostname: kube64-c1.localdomain
    interfaces:
      - interface: eth0
        addresses:
          - 172.24.136.161/26
          - 2001:0DB8:1234:c0::31/64
        routes:
          - network: 0.0.0.0/0
            gateway: 172.24.136.129
    nameservers:
      - 172.24.136.132
      - 172.24.136.135

just like each worker does:

machine:
  disks:
    - device: /dev/sdb
      partitions:
        - mountpoint: /var/mnt/sdb
  kubelet:
    extraMounts:
      - destination: /var/mnt/sdb
        type: bind
        source: /var/mnt/sdb
        options:
          - bind
          - rshared
          - rw
  network:
    hostname: kube64-w1.localdomain
    interfaces:
      - interface: eth0
        addresses:
          - 172.24.136.164/26
          - 2001:0DB8:1234:c0::41/64
        routes:
          - network: 0.0.0.0/0
            gateway: 172.24.136.129
    nameservers:
      - 172.24.136.132
      - 172.24.136.135
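
For completeness, these are the kinds of sanity checks I run to confirm dual stack actually took effect (only standard Node and Service fields are assumed here):

# each node should report both an IPv4 and an IPv6 InternalIP
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
# each node should have been assigned one pod CIDR per family
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDRs}{"\n"}{end}'
# the primary IP family of the default kubernetes Service
kubectl get svc kubernetes -n default -o jsonpath='{.spec.ipFamilies}{"\n"}'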

Logs

Environment

  • Talos version: 1.9.4
  • Kubernetes version: 1.32.2
  • Platform: VMs built from ISO running in Proxmox 8.3
@smira
Member

smira commented Mar 14, 2025

First of all, it's best to create pointed issues, or better, discussions for each point, rather than a mixed issue with a vague title.

Point A is valid (dual-stack VIP); we will probably solve it with the new machine config.

But I don't think it's a big deal, as I don't see much value in it: the VIP is used only to access the cluster from the outside (as a Kubernetes API client). So unless your environment is a mix of v4-only and v6-only clients, you can stick with either a v4 or a v6 VIP.

Points B and C are missing documentation (even though that's questionable, as it's documented on the Kubernetes side), and they have nothing to do with Talos itself, but more with Kubernetes. Feel free to open a PR and add that.

@Heracles31
Author

I agree that a single issue should be about a single point, and I also agree that this information can be discovered through other paths. But considering that even after the first issue nothing has improved in the official documentation about how to do IPv6, I thought this kind of ticket would be helpful. I also chose to put the entire config in a single post because, still to this day, the first issue ranks pretty high whenever you search for problems with Talos and IPv6.

Thanks for the confirmation and the details about the single-stack VIP. In any case, feel free to close this ticket if you do not think it can trigger a constructive discussion like the previous one did.
