RFE - Support for GPU Operator on ARM (Specifically Nvidia Jetson AGX Xavier) #230
Comments
@schmaustech I will get back to you on this.
@shivamerla Any movement or update on this?
@schmaustech Support for GPU Operator on ARM is currently targeted for Q1 2022.
@shivamerla Any update on this, please?
@shivamerla How's this going?
@jasonbarbee @David-VTUK While GPU Operator v1.10.x added support for the ARM platform, Jetson devices are not yet supported. That requires changes in k8s-device-plugin and container-toolkit, which are on the roadmap.
Any update here in 2024?
@shivamerla Any updates? Where is the roadmap located?
I have been able to deploy a development release of Red Hat OpenShift 4.9 running on RHCOS in a single-node scenario on my Nvidia Jetson AGX Xavier:
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-0.kni7.schmaustech.com Ready master,worker 43h v1.21.0-rc.0+ec0996b 192.168.0.47 <none> Red Hat Enterprise Linux CoreOS 49.84.202106272247-0 (Ootpa) 4.18.0-305.3.1.el8_4.aarch64 cri-o://1.21.0-88.rhaos4.8.gitfd485de.el8
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 123m
baremetal 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
cloud-credential 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
cluster-autoscaler 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
config-operator 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
console 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 122m
csi-snapshot-controller 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 27h
dns 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 125m
etcd 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
image-registry 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
ingress 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
insights 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-apiserver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-controller-manager 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-scheduler 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-storage-version-migrator 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
machine-api 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
machine-approver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
machine-config 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
marketplace 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
monitoring 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 122m
network 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
node-tuning 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
openshift-apiserver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 123m
openshift-controller-manager 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 20h
openshift-samples 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
operator-lifecycle-manager 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
operator-lifecycle-manager-catalog 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
operator-lifecycle-manager-packageserver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 123m
service-ca 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
storage 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
I would like to use the GPU Operator to access the GPU in the AGX Xavier, but I believe that is not possible right now. When I tried to deploy it, I got the following:
$ oc get all -n gpu-operator-resources
No resources found in gpu-operator-resources namespace.
$ oc get all | egrep 'node|gpu'
pod/gpu-operator-64df558567-r6zr8 0/1 CrashLoopBackOff 6 8m54s
deployment.apps/gpu-operator 0/1 1 0 8m54s
replicaset.apps/gpu-operator-64df558567 1 1 0 8m54s
$ oc logs gpu-operator-64df558567-r6zr8
standard_init_linux.go:219: exec user process caused: exec format error
Is this something planned in the future?
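The `exec format error` in the operator log is what the Linux kernel typically reports when asked to run a binary built for a different CPU architecture, which most likely means the operator image pulled here shipped an x86_64 entrypoint rather than an aarch64 one. As a minimal sketch for confirming that, the POSIX shell function below (`elf_arch` is a hypothetical helper, not part of the GPU Operator or `oc`) reads the `e_machine` field of an ELF header and prints the target architecture. It assumes a little-endian ELF file, which covers typical x86_64 and aarch64 builds.

```shell
#!/bin/sh
# elf_arch FILE: print the CPU architecture an ELF binary targets.
# Hypothetical helper; assumes a little-endian ELF file.
elf_arch() {
  _elf_path=$1
  # Bytes 0-3 must be the ELF magic 0x7f 'E' 'L' 'F' (decimal 127 69 76 70).
  # The od output is left unquoted on purpose so it splits into one field per byte.
  set -- $(od -An -tu1 -N4 "$_elf_path")
  [ "$*" = "127 69 76 70" ] || { echo "not-elf"; return 1; }
  # e_machine is a 16-bit field at offset 18; read it byte by byte so the
  # result does not depend on the host's endianness.
  set -- $(od -An -tu1 -j18 -N2 "$_elf_path")
  [ $# -eq 2 ] || { echo "truncated"; return 1; }
  _elf_machine=$(( $1 + $2 * 256 ))
  case $_elf_machine in
    62)  echo "x86_64" ;;   # EM_X86_64 (amd64)
    183) echo "aarch64" ;;  # EM_AARCH64 (arm64)
    *)   echo "other-$_elf_machine" ;;
  esac
}
```

To use it, copy the image's entrypoint binary to a local file (for example with `podman create` followed by `podman cp`) and compare the output of `elf_arch ./binary` with `uname -m` on the node. The node above runs an `aarch64` kernel, so an `x86_64` result would account for the crash loop.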