Running a dual-stack cluster with Talos is still problematic, and the documentation remains sparse.
Description
The last bug report I created turned into a great opportunity for many people to contribute on how to achieve dual stacking with Talos. Thanks to everyone who contributed to that one.
Still, there are more limitations, so I am creating this new issue in the hope that it will also help improve the documentation and push development towards a better dual-stack story.
A-Talos's VIP is single stack
In the machine config, you can create a VIP that is shared by the control planes, avoiding the need for an external load balancer. Unfortunately, that option is single stack. If you try to supply both an IPv4 and an IPv6 address, parsing fails one way or another: either the combined value does not parse as a valid IP address, or there are two `ip` lines. Any variant that does parse leaves you single stack.
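For reference, this is the single-address shape the VIP takes today in the machine config (interface name and address are placeholders, purely illustrative):

```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        vip:
          ip: 172.24.136.190  # exactly one address; an IPv6 VIP cannot be added alongside it
```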
Another point here: the Talos docs recommend not using that VIP as the endpoint for talosctl. So I created a DNS name that resolves to each of the control planes' IPs. Unfortunately, should the first resolved address fail, the command exits without trying the others. Since that mechanism fails to fall back even from one IPv4 address to another, it naturally also fails to fall back from IPv4 to IPv6. So here again, Talos is effectively single stack.
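As a partial workaround, the talosctl client config can list every control plane explicitly instead of hiding them behind a single DNS name; as far as I can tell, the client then fans requests out across the listed endpoints. A sketch (addresses are placeholders, certificate fields elided):

```yaml
context: mycluster
contexts:
  mycluster:
    endpoints:
      - 172.24.136.130
      - 172.24.136.131
      - 2001:db8:1234:c0::10
    # ca / crt / key fields omitted
```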
B-Dual stack IPv6 / IPv4 is not the same as dual stack IPv4 / IPv6
In this post, @bernardgut explained that the ordering of the IP ranges in the config matters. If the order does not match from one setting to another, the config fails. It is also important to understand, explain and document that if IPv4 is listed first, the dual-stack cluster is IPv4 / IPv6; if IPv6 is listed first, the cluster becomes IPv6 / IPv4. In my case, I am deploying Longhorn as a CSI. Longhorn does not support IPv6, so its frontend service must be IPv4. Its Helm chart does not specify an IP family, however, so the service ends up single stack using the cluster's first IP family. As a result, you cannot reach Longhorn's UI on a single-stack v6 or dual-stack v6 / v4 cluster. I have seen no documentation anywhere about this kind of impact.
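Since the chart does not pin an IP family itself, the service can be pinned with standard Kubernetes dual-stack fields. A sketch of the relevant part of the Service spec (name, namespace, selector and ports are assumptions about what the chart creates, treat them as illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: longhorn-frontend      # assumed chart-created name
  namespace: longhorn-system
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4                     # force v4 regardless of the cluster's primary family
  selector:
    app: longhorn-ui           # assumed selector
  ports:
    - port: 80
      targetPort: 8000
```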
C-And some other minor points...
As for the service subnet, I was the one who suspected that /112 was the maximum, but /108 is in fact fine and I am using it myself. Thanks to @nazarewk for that one; forget about my /112.
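To make points B and C concrete, here is a sketch of the cluster network section with IPv4 listed first and a /108 IPv6 service subnet (all prefixes are illustrative, not from my actual cluster):

```yaml
cluster:
  network:
    podSubnets:
      - 10.244.0.0/16           # IPv4 first => the cluster is IPv4 / IPv6
      - 2001:db8:1234:d0::/64
    serviceSubnets:
      - 10.96.0.0/12            # family order must match podSubnets, or the config fails
      - 2001:db8:1234:e0::/108  # /108 works; /112 is not the maximum
```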
Because I ended up forced to run IPv4 / IPv6 to accommodate Longhorn, I chose to re-enable KubePrism. With the cluster being IPv4-first, I did not detect any problem with it. Still, I cannot prove whether it fully works as expected or merely survives like Longhorn's frontend does, saved by the fact that the cluster is IPv4 first.
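For reference, re-enabling KubePrism is just this in the machine config (7445 is the default port):

```yaml
machine:
  features:
    kubePrism:
      enabled: true
      port: 7445
```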
So yes, there are ways to run a dual-stack cluster with Talos, but many things remain obscure, uncertain, or even confirmed as non-functional.
Overall, I am using these files: a base shared by all machines (shown below), extra settings for control planes, and a unique file per control plane and per worker. The shared base:

```yaml
machine:
  install:
    extraKernelArgs:
      - net.ifnames=0
  sysctls:
    vm.nr_hugepages: "2048"
  time:
    disabled: false # Indicates if the time service is disabled for the machine.
    servers:
      - time.localdomain
    bootTimeout: 2m0s # Specifies the timeout when the node time is considered to be in sync unlocking the boot sequence.
  kubelet:
    nodeIP:
      validSubnets:
        - 172.24.136.128/26
        - 2001:0DB8:1234:c0::/64
```
First of all, it's best to create pointed issues, or better, discussions, for each point, rather than a mixed issue with a vague title.
Point A is valid (dual-stack VIP); we are probably going to solve it with a new machine config.
But I don't think it's a big deal, as I don't see much value in it: the VIP is used only to access the cluster from the outside (as a Kubernetes API client). So unless your environment is a mix of v4-only and v6-only clients, you can stick with either a v4 or a v6 VIP.
Points B and C are missing documentation (even though that's questionable, as it's documented on the Kubernetes side), and they have nothing to do with Talos itself, but rather with Kubernetes; feel free to open a PR and add that.
I agree that a single issue should be about a single point, and that the information can be discovered by other paths. But considering that even after the first issue nothing has improved in the official documentation about how to do IPv6, I thought this kind of ticket would be helpful. I also chose to put the entire config in a single post because, to this day, the first issue still ranks highly whenever you search for problems with Talos and IPv6.
Thanks for the confirmation and the details about the single-stack VIP. In any case, feel free to close this ticket if you do not think it can trigger a constructive discussion like the previous one did.