Volume not detached/migrated after node failure/shutdown #164
Comments
OK, seems to be k8s related and not CSI driver specific.
@mrkamel Have you found any workaround for that? Not having the VolumeAttachment removed when the pod enters the "Terminating" state (after node failure) unless I manually delete it imo completely defeats the purpose of self-healing. It's just manual healing then. Which means that a single-node setup would be far less likely to fail than a multi-node setup where no node may ever fail, because if one does, the application breaks until I fix it manually. I understand the concerns from a design perspective of forcefully detaching a volume from a node, but for certain use cases it's needed (or I'm missing something - then please let me know ;)).

I found some issue reports mentioning that this could probably be solved using TaintBasedEvictions, which I don't understand, since with a taint-based eviction set up, the pod will simply be moved to the Terminating state based on the taint of the node, resulting in the same problem. Btw, either this has changed with a more recent k8s version or it's part of the CSI driver's behavior: running k8s on a private OpenStack deployment (OVH), this behavior is different - but as said, it's an older k8s version.
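For reference, the manual healing described above would look roughly like this. This is only a sketch: the resource names are placeholders rather than values from this cluster, and the finalizer patch is only needed if the delete hangs.

```sh
# List attachments and find the one still bound to the powered-off node
kubectl get volumeattachments

# Remove the stale attachment so the volume can be attached on another node
# (the name is a placeholder, not taken from this issue)
kubectl delete volumeattachment csi-<hash>

# If the delete hangs on the attacher's finalizer, it may have to be patched away
kubectl patch volumeattachment csi-<hash> --type=merge -p '{"metadata":{"finalizers":null}}'

# The pod stuck in Terminating on the unreachable node may also need a force delete
kubectl delete pod echoserver-<hash> --grace-period=0 --force
```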
@dprandzioch Unfortunately not. I opened this during the evaluation of different solutions for a migration, and due to that behaviour of k8s I ended up using Docker Swarm with https://github.com/costela/docker-volume-hetzner for now, which works as desired for me. I would be interested in solutions as well though.
So if I understand this correctly, this is mostly a limitation of the way k8s handles things, right? With the upcoming CSI support for Docker Swarm, I am currently checking out what needs to be done once it is released. I was wondering if we should invest time in checking whether the CSI driver works, but if the behaviour is due to the provider, then this might be a reason to stick with costela/docker-volume-hetzner for the time being.
@s4ke Kubernetes explicitly handles this by forcefully detaching the volume from the server if the Kubernetes node is unreachable (configurable timeout). This is per se not allowed by the spec, but there is a discussion to include this behaviour in the CSI spec: container-storage-interface/spec#512
I've got a simple echoserver pod running on one of three nodes, with a 10GB Hetzner volume attached via the CSI driver.
When I shut down the node the pod is running on, the pod can't be migrated to another node as it gets stuck in ContainerCreating.
I'm using the latest driver with Kubernetes 1.19.
The log is telling me
The Hetzner Cloud panel still shows the volume attached to the shut-down node.
When I power on the shut-down node again and let it rejoin, the pod and volume can finally be migrated to the new node, but I thought this would not be a necessity.
My echoserver.yaml:
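A minimal sketch of such a manifest, assuming the hcloud-volumes storage class shipped with the csi-driver and a placeholder echoserver image (not the exact file from this issue), could look like this:

```yaml
# Sketch only: names, image and storage class are assumptions,
# not the exact manifest from this issue.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: echoserver-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes   # default class installed by the csi-driver
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
        - name: echoserver
          image: k8s.gcr.io/echoserver:1.10   # placeholder image
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: echoserver-data
```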
Am I missing some config I'm not aware of? Or is this the desired behavior?
Thanks in advance