
failed to get shared datastores in kubernetes cluster #3076

Open
MKITConsulting opened this issue Oct 10, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@MKITConsulting

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:

Attempting to provision a new PVC while deploying the HashiCorp Vault Helm chart produces the following error messages (the same chart deploys fine on another cluster):

2024-10-09T12:12:14.699271218Z W1009 12:12:14.699128       1 controller.go:934] Retrying syncing claim "c18c109b-9b36-4405-9834-5a5a09198776", failure 9 
2024-10-09T12:12:14.699280063Z E1009 12:12:14.699157       1 controller.go:957] error syncing claim "c18c109b-9b36-4405-9834-5a5a09198776": failed to provision volume with StorageClass "vsphere-csi-sc": rpc error: code = Internal desc = failed to get shared datastores in kubernetes cluster. Error: ServerFaultCode: The object 'vim.VirtualMachine:vm-567408' has already been deleted or has not been completely created
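To gather more context on an error like this, it can help to inspect the pending claim's events and the CSI controller logs directly. A sketch, with assumptions: the PVC name `data-vault-0` and namespace `vault` are placeholders for whatever the Vault chart actually creates, and the controller deployment is assumed to run in `kube-system` (where the Rancher-packaged driver below typically lives; adjust for your cluster).

```shell
# Inspect the pending PVC's events; "data-vault-0" / "vault" are placeholders,
# substitute the claim and namespace created by your Vault chart release.
kubectl describe pvc data-vault-0 -n vault

# Tail the vsphere-csi-controller logs; the namespace is an assumption
# (kube-system for the Rancher-packaged driver, vmware-system-csi upstream).
kubectl logs deployment/vsphere-csi-controller -n kube-system \
  -c vsphere-csi-controller --tail=100
```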

What you expected to happen:

PVC gets created

How to reproduce it (as minimally and precisely as possible):

Deploy the latest HashiCorp Vault Helm chart.

Anything else we need to know?:

Environment:

(taken from Helm release values):

csiController:
  csiResizer:
    enabled: false
  image:
    csiAttacher:
      repository: rancher/mirrored-sig-storage-csi-attacher
      tag: v4.2.0
    csiProvisioner:
      repository: rancher/mirrored-sig-storage-csi-provisioner
      tag: v3.4.0
    csiResizer:
      repository: rancher/mirrored-sig-storage-csi-resizer
      tag: v1.7.0
    livenessProbe:
      repository: rancher/mirrored-sig-storage-livenessprobe
      tag: v2.9.0
    repository: rancher/mirrored-cloud-provider-vsphere-csi-release-driver
    tag: v3.0.1
    vsphereSyncer:
      repository: rancher/mirrored-cloud-provider-vsphere-csi-release-syncer
      tag: v3.0.1

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 10, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2025
@Griznah

Griznah commented Jan 23, 2025

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2025
@Griznah

Griznah commented Jan 23, 2025

We had the same issue in our cluster:
vsphere-csi-controller-6f5b94f464-gtwf2 vsphere-csi-controller {"level":"error","time":"2025-01-23T07:34:40.127076949Z","caller":"vanilla/controller.go:2702","msg":"get block volumeIDToNodeUUIDMap failed with err = ServerFaultCode: The object 'vim.VirtualMachine:vm-448207' has already been deleted or has not been completely created ","TraceId":"f09a8baf-7f26-4829-b854-cdd6ff02c462","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).ListVolumes.func1\n\t/build/pkg/csi/service/vanilla/controller.go:2702\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).ListVolumes\n\t/build/pkg/csi/service/vanilla/controller.go:2740\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ListVolumes_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6670\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1372\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1783\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1016"}

Container images:
  gcr.io/gke-on-prem-release/csi-attacher:v4.7.0-gke.3
  gcr.io/gke-on-prem-release/vsphere-csi-driver:v3.3.1-gke.2
  gcr.io/gke-on-prem-release/vsphere-csi-syncer:v3.3.1-gke.2
  gcr.io/gke-on-prem-release/csi-provisioner:v5.1.0-gke.4
  gcr.io/gke-on-prem-release/csi-resizer:v1.12.0-gke.3
  gcr.io/gke-on-prem-release/csi-snapshotter:v8.1.0-gke.3

We are running Google Distributed Cloud (software only) for VMware on-prem.
Just prior to this issue occurring, we had rebooted our control plane nodes to increase provisioned memory.
I worked around the issue by restarting the vsphere-csi-controller pods:
kubectl rollout restart deployment vsphere-csi-controller -n kube-system
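A slightly fuller version of that workaround, waiting for the rollout to complete and then checking for claims still stuck in Pending (a sketch; the `kube-system` namespace matches the comment above, adjust if your controller runs elsewhere):

```shell
# Restart the controller pods so they re-establish their vCenter session
# and drop the stale VirtualMachine object reference.
kubectl rollout restart deployment vsphere-csi-controller -n kube-system

# Block until the new pods are up (or fail after the timeout).
kubectl rollout status deployment vsphere-csi-controller -n kube-system --timeout=120s

# Any previously stuck PVCs should bind once the provisioner retries;
# list whatever is still Pending across all namespaces.
kubectl get pvc -A | grep Pending
```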
