# Basic examples for a Linux desktop or workstation

* [Prerequisites](#prerequisites)
* [Examples with different DRA configurations](#examples-with-different-dra-configurations)
  * [1. A single pod accesses a GPU via ResourceClaimTemplate](#example-1-spsc-gpu-a-single-pod-accesses-a-gpu-via-resourceclaimtemplate)
  * [2. A single pod's multiple containers share a GPU via ResourceClaimTemplate](#example-2-spmc-shared-gpu-a-single-pods-multiple-containers-share-a-gpu-via-resourceclaimtemplate)
  * [3. Multiple pods share a GPU via ResourceClaim](#example-3-mpsc-shared-gpu-multiple-pods-share-a-gpu-via-resourceclaim)
  * [4. Multiple pods request dedicated GPU access](#example-4-mpsc-unshared-gpu-multiple-pods-request-dedicated-gpu-access)
  * [5. A single pod's multiple containers share a GPU via MPS](#example-5-spmc-mps-gpu-a-single-pods-multiple-containers-share-a-gpu-via-mps)
  * [6. Multiple pods share a GPU via MPS](#example-6-mpsc-mps-gpu-multiple-pods-share-a-gpu-via-mps)
  * [7. A single pod's multiple containers share a GPU via TimeSlicing](#example-7-spmc-timeslicing-gpu-a-single-pods-multiple-containers-share-a-gpu-via-timeslicing)
  * [8. Multiple pods share a GPU via TimeSlicing](#example-8-mpsc-timeslicing-gpu-multiple-pods-share-a-gpu-via-timeslicing)

## Prerequisites

You will need a Linux machine with an NVIDIA GPU (e.g., a GeForce card). Install the DRA driver and create a kind cluster by following the instructions in the [DRA driver setup](https://github.com/yuanchen8911/k8s-dra-driver?tab=readme-ov-file#demo).

#### Show the current GPU configuration of the machine
```console
nvidia-smi -L
```

```
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-84f293a6-d610-e3dc-c4d8-c5d94409764b)
```

#### Show the cluster is up
```console
kubectl cluster-info
kubectl get nodes
```

```
Kubernetes control plane is running at https://127.0.0.1:34883
CoreDNS is running at https://127.0.0.1:34883/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

NAME                                   STATUS   ROLES           AGE    VERSION
k8s-dra-driver-cluster-control-plane   Ready    control-plane   4d1h   v1.29.1
k8s-dra-driver-cluster-worker          Ready    <none>          4d1h   v1.29.1
```

#### Show the DRA driver is running
```console
kubectl get pod -n nvidia-dra-driver
```

```
NAME                                                READY   STATUS    RESTARTS   AGE
nvidia-k8s-dra-driver-controller-6d5869d478-rr488   1/1     Running   0          4d1h
nvidia-k8s-dra-driver-kubelet-plugin-qqq5b          1/1     Running   0          4d1h
```
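
You can also confirm that the driver has registered a ResourceClass for GPUs. With the upstream NVIDIA driver the class is typically named `gpu.nvidia.com`, though the exact name depends on the driver version you installed:
```console
kubectl get resourceclass
```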

## Examples with different DRA configurations

#### Example 1 (SPSC-GPU): a single pod accesses a GPU via ResourceClaimTemplate

```console
kubectl apply -f single-pod-single-container-gpu.yaml
sleep 2
kubectl get pods -n spsc-gpu-test
```

The pod will be running.
```
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          6s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```

```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1474787 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```
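
For reference, the manifest follows the standard DRA pattern: a ResourceClaimTemplate pointing at the NVIDIA resource class, plus a pod whose container consumes a claim generated from that template. A minimal sketch of the pattern is shown below; the template name, image, and resource class name are illustrative assumptions, and the actual `single-pod-single-container-gpu.yaml` in this directory is authoritative.
```yaml
apiVersion: resource.k8s.io/v1alpha2    # DRA API group in Kubernetes v1.29
kind: ResourceClaimTemplate
metadata:
  name: gpu-template                    # assumed name
  namespace: spsc-gpu-test
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # class registered by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: spsc-gpu-test
spec:
  containers:
  - name: ctr
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8   # assumed image
    resources:
      claims:
      - name: gpu                       # consume the claim declared below
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-template
```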

Delete the pod:
```console
kubectl delete -f single-pod-single-container-gpu.yaml
```

#### Example 2 (SPMC-Shared-GPU): a single pod's multiple containers share a GPU via ResourceClaimTemplate

```console
kubectl apply -f single-pod-multiple-containers-shared-gpu.yaml
sleep 2
kubectl get pods -n spmc-shared-gpu-test
```

The pod will be running.
```
NAME      READY   STATUS    RESTARTS      AGE
gpu-pod   2/2     Running   2 (55s ago)   2m13s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1514114 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 1514167 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```
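
The sharing in this example comes from the pod spec itself: both containers list the same claim under `resources.claims`, so they are attached to the same allocated GPU. A sketch of the relevant part of the pod spec (container names, claim name, and image are illustrative assumptions; see `single-pod-multiple-containers-shared-gpu.yaml` for the real definitions):
```yaml
spec:
  containers:
  - name: ctr-1
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8   # assumed image
    resources:
      claims:
      - name: shared-gpu                # same claim as ctr-2
  - name: ctr-2
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
    resources:
      claims:
      - name: shared-gpu                # same claim as ctr-1
  resourceClaims:
  - name: shared-gpu
    source:
      resourceClaimTemplateName: gpu-template
```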

Delete the pod:
```console
kubectl delete -f single-pod-multiple-containers-shared-gpu.yaml
```

#### Example 3 (MPSC-Shared-GPU): multiple pods share a GPU via ResourceClaim

```console
kubectl apply -f multiple-pods-single-container-shared-gpu.yaml
sleep 2
kubectl get pods -n mpsc-shared-gpu-test
```

Two pods will be running.
```
$ kubectl get pods -n mpsc-shared-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1551456 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 1551593 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```
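
Unlike the template-based examples above, this one creates a single standalone ResourceClaim and points both pods at it by name, so they end up on the same allocation. A sketch of the pattern (claim and container names are illustrative assumptions; see `multiple-pods-single-container-shared-gpu.yaml` for the real definitions):
```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: shared-gpu
  namespace: mpsc-shared-gpu-test
spec:
  resourceClassName: gpu.nvidia.com     # class registered by the NVIDIA DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-1
  namespace: mpsc-shared-gpu-test
spec:
  containers:
  - name: ctr
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8   # assumed image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: shared-gpu     # reference the claim by name instead of a template
```
`gpu-pod-2` is identical except for its name.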

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-shared-gpu.yaml
```

#### Example 4 (MPSC-Unshared-GPU): multiple pods request dedicated GPU access

```console
kubectl apply -f multiple-pods-single-container-unshared-gpu.yaml
sleep 2
kubectl get pods -n mpsc-unshared-gpu-test
```

One pod will be running and the other will be pending.
```
$ kubectl get pods -n mpsc-unshared-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   0/1     Pending   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1544488 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```
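
Each pod here requests exclusive access through its own claim, and the machine has only one GPU, so only one claim can be allocated and the second pod stays Pending until the GPU is freed. The scheduler's reasoning is visible in the pending pod's events:
```console
kubectl describe pod gpu-pod-2 -n mpsc-unshared-gpu-test
```
The Events section will indicate that the pod is waiting for its ResourceClaim to be allocated; the exact wording depends on the Kubernetes and driver versions.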

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-unshared-gpu.yaml
```

#### Example 5 (SPMC-MPS-GPU): a single pod's multiple containers share a GPU via MPS

```console
kubectl apply -f single-pod-multiple-containers-mps-gpu.yaml
sleep 2
kubectl get pods -n spmc-mps-gpu-test
```

The pod will be running.
```
$ kubectl get pods -n spmc-mps-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   2/2     Running   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1559554 M+C /cuda-samples/sample 790MiB |
| 0 N/A N/A 1559585 C nvidia-cuda-mps-server 28MiB |
| 0 N/A N/A 1559610 M+C /cuda-samples/sample 790MiB |
+---------------------------------------------------------------------------------------+
```
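
MPS is requested through the claim parameters that the ResourceClaimTemplate in `single-pod-multiple-containers-mps-gpu.yaml` references; both containers then run under a shared `nvidia-cuda-mps-server`, which is why their processes appear with type `M+C` above. The exact CRD group, kind, and field names depend on the driver version you installed, so treat the following only as an illustration of the idea and check the YAML shipped in this directory:
```yaml
# Illustrative sketch -- the CRD group, kind, and fields vary across k8s-dra-driver versions.
apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  name: mps-gpu-params
  namespace: spmc-mps-gpu-test
spec:
  sharing:
    strategy: MPS
```
The ResourceClaimTemplate would point at an object like this through its `parametersRef`, and every claim generated from the template inherits the MPS sharing strategy.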

Delete the pod:
```console
kubectl delete -f single-pod-multiple-containers-mps-gpu.yaml
```

#### Example 6 (MPSC-MPS-GPU): multiple pods share a GPU via MPS

```console
kubectl apply -f multiple-pods-single-container-mps-gpu.yaml
sleep 2
kubectl get pods -n mpsc-mps-gpu-test
```

Two pods will be running.
```
$ kubectl get pods -n mpsc-mps-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1568768 M+C /cuda-samples/sample 562MiB |
| 0 N/A N/A 1568771 M+C /cuda-samples/sample 562MiB |
| 0 N/A N/A 1568831 C nvidia-cuda-mps-server 28MiB |
+---------------------------------------------------------------------------------------+
```
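
To confirm that the processes from both pods are sharing one physical GPU under the MPS server, list the compute apps together with the GPU UUID and compare it against the UUID reported by `nvidia-smi -L` in the prerequisites:
```console
nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name,used_memory --format=csv
```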

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-mps-gpu.yaml
```

#### Example 7 (SPMC-TimeSlicing-GPU): a single pod's multiple containers share a GPU via TimeSlicing

```console
kubectl apply -f single-pod-multiple-containers-timeslicing-gpu.yaml
sleep 2
kubectl get pods -n spmc-timeslicing-gpu-test
```

The pod will be running.
```
$ kubectl get pods -n spmc-timeslicing-gpu-test
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following (two containers sharing the GPU):
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 306436 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 306442 C ./gpu_burn 21206MiB |
+---------------------------------------------------------------------------------------+
```
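
With time slicing there is no MPS server: the two clients appear with the plain `C` type and the GPU alternates between them. To see how the time-slicing request was recorded, dump the allocated claim and the parameters it references (the field names vary by driver version):
```console
kubectl get resourceclaims -n spmc-timeslicing-gpu-test -o yaml
```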

Delete the pod:
```console
kubectl delete -f single-pod-multiple-containers-timeslicing-gpu.yaml
```

#### Example 8 (MPSC-TimeSlicing-GPU): multiple pods share a GPU via TimeSlicing

```console
kubectl apply -f multiple-pods-single-container-timeslicing-gpu.yaml
sleep 2
kubectl get pods -n mpsc-timeslicing-gpu-test
```

Two pods will be running.
```
$ kubectl get pods -n mpsc-timeslicing-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following (two pods sharing the GPU):
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 306436 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 306442 C ./gpu_burn 21206MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-timeslicing-gpu.yaml
```