
failed to get shared datastores in kubernetes cluster #3076

Open
MKITConsulting opened this issue Oct 10, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@MKITConsulting

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:

Attempting to provision a new PVC while deploying the HashiCorp Vault Helm chart produces the following error messages (the same chart deploys fine on another cluster):

2024-10-09T12:12:14.699271218Z W1009 12:12:14.699128       1 controller.go:934] Retrying syncing claim "c18c109b-9b36-4405-9834-5a5a09198776", failure 9 
2024-10-09T12:12:14.699280063Z E1009 12:12:14.699157       1 controller.go:957] error syncing claim "c18c109b-9b36-4405-9834-5a5a09198776": failed to provision volume with StorageClass "vsphere-csi-sc": rpc error: code = Internal desc = failed to get shared datastores in kubernetes cluster. Error: ServerFaultCode: The object 'vim.VirtualMachine:vm-567408' has already been deleted or has not been completely created
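To gather more context on an error like this, it can help to inspect the pending claim's events and the CSI controller logs directly. A sketch, with assumptions: the PVC name `data-vault-0` and namespace `vault` are placeholders for whatever the Vault chart actually creates, and the controller deployment is assumed to run in `kube-system` (where the Rancher-packaged driver below typically lives; adjust for your cluster).

```shell
# Inspect the pending PVC's events; "data-vault-0" / "vault" are placeholders,
# substitute the claim and namespace created by your Vault chart release.
kubectl describe pvc data-vault-0 -n vault

# Tail the vsphere-csi-controller logs; the namespace is an assumption
# (kube-system for the Rancher-packaged driver, vmware-system-csi upstream).
kubectl logs deployment/vsphere-csi-controller -n kube-system \
  -c vsphere-csi-controller --tail=100
```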

What you expected to happen:

PVC gets created

How to reproduce it (as minimally and precisely as possible):

Deploy the latest HashiCorp Vault Helm chart.

Anything else we need to know?:

Environment:

(taken from Helm release values):

csiController:
  csiResizer:
    enabled: false
  image:
    csiAttacher:
      repository: rancher/mirrored-sig-storage-csi-attacher
      tag: v4.2.0
    csiProvisioner:
      repository: rancher/mirrored-sig-storage-csi-provisioner
      tag: v3.4.0
    csiResizer:
      repository: rancher/mirrored-sig-storage-csi-resizer
      tag: v1.7.0
    livenessProbe:
      repository: rancher/mirrored-sig-storage-livenessprobe
      tag: v2.9.0
    repository: rancher/mirrored-cloud-provider-vsphere-csi-release-driver
    tag: v3.0.1
    vsphereSyncer:
      repository: rancher/mirrored-cloud-provider-vsphere-csi-release-syncer
      tag: v3.0.1

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 10, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2025
@Griznah

Griznah commented Jan 23, 2025

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2025
@Griznah

Griznah commented Jan 23, 2025

We had the same issue in our cluster:
vsphere-csi-controller-6f5b94f464-gtwf2 vsphere-csi-controller {"level":"error","time":"2025-01-23T07:34:40.127076949Z","caller":"vanilla/controller.go:2702","msg":"get block volumeIDToNodeUUIDMap failed with err = ServerFaultCode: The object 'vim.VirtualMachine:vm-448207' has already been deleted or has not been completely created ","TraceId":"f09a8baf-7f26-4829-b854-cdd6ff02c462","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).ListVolumes.func1\n\t/build/pkg/csi/service/vanilla/controller.go:2702\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).ListVolumes\n\t/build/pkg/csi/service/vanilla/controller.go:2740\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ListVolumes_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6670\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1372\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1783\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1016"}

Container images:
  gcr.io/gke-on-prem-release/csi-attacher:v4.7.0-gke.3
  gcr.io/gke-on-prem-release/vsphere-csi-driver:v3.3.1-gke.2
  gcr.io/gke-on-prem-release/vsphere-csi-syncer:v3.3.1-gke.2
  gcr.io/gke-on-prem-release/csi-provisioner:v5.1.0-gke.4
  gcr.io/gke-on-prem-release/csi-resizer:v1.12.0-gke.3
  gcr.io/gke-on-prem-release/csi-snapshotter:v8.1.0-gke.3

We are running Google Distributed Cloud (software only) for VMware on-prem.
Just prior to this issue occurring, we had rebooted our control plane nodes to increase provisioned memory.
I worked around the issue by restarting the vsphere-csi-controller pods:
kubectl rollout restart deployment vsphere-csi-controller -n kube-system
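A slightly fuller version of that workaround, waiting for the rollout to complete and then checking for claims still stuck in Pending (a sketch; the `kube-system` namespace matches the comment above, adjust if your controller runs elsewhere):

```shell
# Restart the controller pods so they re-establish their vCenter session
# and drop the stale VirtualMachine object reference.
kubectl rollout restart deployment vsphere-csi-controller -n kube-system

# Block until the new pods are up (or fail after the timeout).
kubectl rollout status deployment vsphere-csi-controller -n kube-system --timeout=120s

# Any previously stuck PVCs should bind once the provisioner retries;
# list whatever is still Pending across all namespaces.
kubectl get pvc -A | grep Pending
```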
