Commit 5a89f2e

Add basic examples for Linux workstations
Signed-off-by: Yuan Chen <[email protected]>

Squashed commits: Update README (repeated); Remove the timeslicing example; Add restartPolicy to examples; Update demo files.
1 parent bbaffa0 commit 5a89f2e

10 files changed: +937 -0 lines changed

demo/specs/quickstart/README.md

+2

@@ -1,3 +1,5 @@

You can run basic examples on a Linux desktop by following the instructions in the [desktop folder](desktop/README.md) as well.

#### Show current state of the cluster
```console
kubectl get pod -A

+336

@@ -0,0 +1,336 @@

# Basic examples for a Linux desktop or workstation

* [Prerequisites](#prerequisites)
* [Run examples](#run-examples)
* [1. SPSC-GPU: a single pod accesses a GPU via ResourceClaimTemplate](#example-1-spsc-gpu-a-single-pod-accesses-a-gpu-via-resourceclaimtemplate)
* [2. SPMC-Shared-GPU: a single pod's multiple containers share a GPU via ResourceClaimTemplate](#example-2-spmc-shared-gpu-a-single-pods-multiple-containers-share-a-gpu-via-resourceclaimtemplate)
* [3. MPSC-Shared-GPU: multiple pods share a GPU via ResourceClaim](#example-3-mpsc-shared-gpu-multiple-pods-share-a-gpu-via-resourceclaim)
* [4. MPSC-Unshared-GPU: multiple pods request dedicated GPU access](#example-4-mpsc-unshared-gpu-multiple-pods-request-dedicated-gpu-access)
* [5. SPMC-MPS-GPU: a single pod's multiple containers share a GPU via MPS](#example-5-spmc-mps-gpu-a-single-pods-multiple-containers-share-a-gpu-via-mps)
* [6. MPSC-MPS-GPU: multiple pods share a GPU via MPS](#example-6-mpsc-mps-gpu-multiple-pods-share-a-gpu-via-mps)
* [7. SPMC-TimeSlicing-GPU: a single pod's multiple containers share a GPU via TimeSlicing](#example-7-spmc-timeslicing-gpu-a-single-pods-multiple-containers-share-a-gpu-via-timeslicing)
* [8. MPSC-TimeSlicing-GPU: multiple pods share a GPU via TimeSlicing](#example-8-mpsc-timeslicing-gpu-multiple-pods-share-a-gpu-via-timeslicing)

## Prerequisites

You will need a Linux machine with an NVIDIA GPU, such as a GeForce card. Install the DRA driver and create a kind cluster by following the instructions in the [DRA driver setup](https://github.com/yuanchen8911/k8s-dra-driver?tab=readme-ov-file#demo).

#### Show the current GPU configuration of the machine
```console
nvidia-smi -L
```

```
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-84f293a6-d610-e3dc-c4d8-c5d94409764b)
```

#### Show that the cluster is up
```console
kubectl cluster-info
kubectl get nodes
```

```
Kubernetes control plane is running at https://127.0.0.1:34883
CoreDNS is running at https://127.0.0.1:34883/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

NAME                                   STATUS   ROLES           AGE    VERSION
k8s-dra-driver-cluster-control-plane   Ready    control-plane   4d1h   v1.29.1
k8s-dra-driver-cluster-worker          Ready    <none>          4d1h   v1.29.1
```

#### Show that the DRA driver is running
```console
kubectl get pod -n nvidia-dra-driver
```

```
NAME                                                READY   STATUS    RESTARTS   AGE
nvidia-k8s-dra-driver-controller-6d5869d478-rr488   1/1     Running   0          4d1h
nvidia-k8s-dra-driver-kubelet-plugin-qqq5b          1/1     Running   0          4d1h
```

## Run examples

#### Example 1 (SPSC-GPU): a single pod accesses a GPU via ResourceClaimTemplate

```console
kubectl apply -f single-pod-single-container-gpu.yaml
sleep 2
kubectl get pods -n spsc-gpu-test
```

The pod will be running.
```
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          6s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```

```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1474787 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pod:
```console
kubectl delete -f single-pod-single-container-gpu.yaml
```
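
The `single-pod-single-container-gpu.yaml` file is not reproduced in this README. As a rough sketch of the pattern it uses (the ResourceClass name `gpu.nvidia.com`, the image, and the object names below are assumptions; check the actual file in this folder), a pod requests a GPU through a per-pod claim generated from a ResourceClaimTemplate:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  namespace: spsc-gpu-test
  name: single-gpu
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # ResourceClass registered by the DRA driver (assumed name)
---
apiVersion: v1
kind: Pod
metadata:
  namespace: spsc-gpu-test
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
  - name: ctr
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1   # placeholder; use the image from the demo specs
    resources:
      claims:
      - name: gpu                               # refers to the pod-level claim entry below
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: single-gpu     # a fresh ResourceClaim is generated for this pod
```

Because the claim comes from a template, a dedicated ResourceClaim is created for this pod and cleaned up together with it.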

#### Example 2 (SPMC-Shared-GPU): a single pod's multiple containers share a GPU via ResourceClaimTemplate

```console
kubectl apply -f single-pod-multiple-containers-shared-gpu.yaml
sleep 2
kubectl get pods -n spmc-shared-gpu-test
```

The pod will be running.
```
NAME      READY   STATUS    RESTARTS      AGE
gpu-pod   2/2     Running   2 (55s ago)   2m13s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1514114 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 1514167 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pod:
```console
kubectl delete -f single-pod-multiple-containers-shared-gpu.yaml
```
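
For the multi-container variant, a minimal sketch (same assumptions as in Example 1) is a single pod whose containers all reference one pod-level claim:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  namespace: spmc-shared-gpu-test
  name: shared-gpu
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # assumed ResourceClass name
---
apiVersion: v1
kind: Pod
metadata:
  namespace: spmc-shared-gpu-test
  name: gpu-pod
spec:
  containers:
  - name: ctr0
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1   # placeholder image
    resources:
      claims:
      - name: shared-gpu
  - name: ctr1
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1   # placeholder image
    resources:
      claims:
      - name: shared-gpu                        # both containers reference the same pod-level claim
  resourceClaims:
  - name: shared-gpu
    source:
      resourceClaimTemplateName: shared-gpu
```

Both containers resolve to the same generated claim, so they land on the same GPU, which is why `nvidia-smi` shows two `/cuda-samples/sample` processes on GPU 0.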

#### Example 3 (MPSC-Shared-GPU): multiple pods share a GPU via ResourceClaim

```console
kubectl apply -f multiple-pods-single-container-shared-gpu.yaml
sleep 2
kubectl get pods -n mpsc-shared-gpu-test
```

Two pods will be running.
```
$ kubectl get pods -n mpsc-shared-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1551456 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 1551593 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-shared-gpu.yaml
```
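
To let several pods share one GPU, the manifest uses a standalone `ResourceClaim` instead of a template, and each pod references it by name. A minimal sketch, with the same caveats about names and the ResourceClass:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: mpsc-shared-gpu-test
  name: shared-gpu
spec:
  resourceClassName: gpu.nvidia.com       # assumed ResourceClass name
---
apiVersion: v1
kind: Pod
metadata:
  namespace: mpsc-shared-gpu-test
  name: gpu-pod-1
spec:
  containers:
  - name: ctr
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1   # placeholder image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: shared-gpu       # gpu-pod-2 points at the same claim by name
```

Because the claim is created directly rather than from a template, it is not tied to a single pod, and both pods are allocated the same GPU.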

#### Example 4 (MPSC-Unshared-GPU): multiple pods request dedicated GPU access

```console
kubectl apply -f multiple-pods-single-container-unshared-gpu.yaml
sleep 2
kubectl get pods -n mpsc-unshared-gpu-test
```

One pod will be running and the other will be pending.
```
$ kubectl get pods -n mpsc-unshared-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   0/1     Pending   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1544488 C /cuda-samples/sample 746MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-unshared-gpu.yaml
```
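
For dedicated access, each pod draws its own claim from a ResourceClaimTemplate, so every pod needs a whole GPU; on a single-GPU workstation the second pod therefore stays Pending until the first finishes. A sketch of one such pod (the template name `gpu-template` is hypothetical):

```yaml
# One of the two pods; the other is identical apart from its name.
# Each pod gets its own generated claim, so each needs its own GPU.
apiVersion: v1
kind: Pod
metadata:
  namespace: mpsc-unshared-gpu-test
  name: gpu-pod-1
spec:
  containers:
  - name: ctr
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1   # placeholder image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-template   # hypothetical per-example template name
```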

#### Example 5 (SPMC-MPS-GPU): a single pod's multiple containers share a GPU via MPS

```console
kubectl apply -f single-pod-multiple-containers-mps-gpu.yaml
sleep 2
kubectl get pods -n spmc-mps-gpu-test
```

The pod will be running.
```
$ kubectl get pods -n spmc-mps-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   2/2     Running   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1559554 M+C /cuda-samples/sample 790MiB |
| 0 N/A N/A 1559585 C nvidia-cuda-mps-server 28MiB |
| 0 N/A N/A 1559610 M+C /cuda-samples/sample 790MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pod:
```console
kubectl delete -f single-pod-multiple-containers-mps-gpu.yaml
```
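
MPS sharing is selected through the driver's claim-parameters object, which the ResourceClaimTemplate points at via `parametersRef`. The sketch below is an assumption: the parameters group, version, kind, and `sharing` fields reflect the driver's demo specs at the time of writing and may differ in your installed CRDs, so verify them against this repository:

```yaml
# Assumed claim-parameters CRD of the NVIDIA DRA driver; confirm apiVersion/kind locally.
apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: spmc-mps-gpu-test
  name: mps-gpu-params
spec:
  sharing:
    strategy: MPS                        # workloads run as clients of nvidia-cuda-mps-server
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  namespace: spmc-mps-gpu-test
  name: mps-gpu
spec:
  spec:
    resourceClassName: gpu.nvidia.com    # assumed ResourceClass name
    parametersRef:                       # ties the claim to the MPS sharing parameters above
      apiGroup: gpu.resource.nvidia.com
      kind: GpuClaimParameters
      name: mps-gpu-params
```

The pod side is the same as in Example 2: two containers referencing one pod-level claim. The `M+C` process type and the `nvidia-cuda-mps-server` entry in the `nvidia-smi` output confirm that MPS is active.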

#### Example 6 (MPSC-MPS-GPU): multiple pods share a GPU via MPS

```console
kubectl apply -f multiple-pods-single-container-mps-gpu.yaml
sleep 2
kubectl get pods -n mpsc-mps-gpu-test
```

Two pods will be running.
```
$ kubectl get pods -n mpsc-mps-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following:
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1568768 M+C /cuda-samples/sample 562MiB |
| 0 N/A N/A 1568771 M+C /cuda-samples/sample 562MiB |
| 0 N/A N/A 1568831 C nvidia-cuda-mps-server 28MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-mps-gpu.yaml
```
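
This example combines the shared `ResourceClaim` pattern from Example 3 with the MPS parameters from Example 5: both pods reference one pre-created claim whose `parametersRef` selects MPS. A minimal sketch of that claim, with the same caveats about the parameters CRD:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: mpsc-mps-gpu-test
  name: mps-shared-gpu
spec:
  resourceClassName: gpu.nvidia.com      # assumed ResourceClass name
  parametersRef:                         # assumed driver CRD, as in Example 5
    apiGroup: gpu.resource.nvidia.com
    kind: GpuClaimParameters
    name: mps-gpu-params
```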

#### Example 7 (SPMC-TimeSlicing-GPU): a single pod's multiple containers share a GPU via TimeSlicing

```console
kubectl apply -f single-pod-multiple-containers-timeslicing-gpu.yaml
sleep 2
kubectl get pods -n spmc-timeslicing-gpu-test
```

The pod will be running.
```
$ kubectl get pods -n spmc-timeslicing-gpu-test
NAME      READY   STATUS    RESTARTS   AGE
gpu-pod   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following (two containers sharing the GPU):
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 306436 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 306442 C ./gpu_burn 21206MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pod:
```console
kubectl delete -f single-pod-multiple-containers-timeslicing-gpu.yaml
```
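
Time slicing is configured the same way as MPS, only the sharing strategy (and its optional config) changes. The field names below are assumptions based on the driver's demo specs; verify them against the CRDs installed in your cluster:

```yaml
# Assumed claim-parameters CRD of the NVIDIA DRA driver; confirm apiVersion/kind locally.
apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: spmc-timeslicing-gpu-test
  name: timeslicing-gpu-params
spec:
  sharing:
    strategy: TimeSlicing
    timeSlicingConfig:
      interval: Default                  # assumed knob controlling the time-slice interval
```

A ResourceClaimTemplate in the same namespace references these parameters via `parametersRef`, and the pod's containers share the generated claim as in Example 2.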

#### Example 8 (MPSC-TimeSlicing-GPU): multiple pods share a GPU via TimeSlicing

```console
kubectl apply -f multiple-pods-single-container-timeslicing-gpu.yaml
sleep 2
kubectl get pods -n mpsc-timeslicing-gpu-test
```

Two pods will be running.
```
$ kubectl get pods -n mpsc-timeslicing-gpu-test
NAME        READY   STATUS    RESTARTS   AGE
gpu-pod-1   1/1     Running   0          11s
gpu-pod-2   1/1     Running   0          11s
```

Running `nvidia-smi` will show something like the following (two pods sharing the GPU):
```console
nvidia-smi
```
```
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 306436 C /cuda-samples/sample 746MiB |
| 0 N/A N/A 306442 C ./gpu_burn 21206MiB |
+---------------------------------------------------------------------------------------+
```

Delete the pods:
```console
kubectl delete -f multiple-pods-single-container-timeslicing-gpu.yaml
```
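
As in Example 6, the multi-pod variant replaces the template with a standalone `ResourceClaim` that both pods reference via `resourceClaimName`, while reusing time-slicing parameters like those in Example 7. A minimal sketch of the shared claim (names and the parameters CRD are assumptions):

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: mpsc-timeslicing-gpu-test
  name: timeslicing-shared-gpu
spec:
  resourceClassName: gpu.nvidia.com      # assumed ResourceClass name
  parametersRef:                         # assumed driver CRD, as in Example 7
    apiGroup: gpu.resource.nvidia.com
    kind: GpuClaimParameters
    name: timeslicing-gpu-params
```

Each pod then attaches to it with `source.resourceClaimName: timeslicing-shared-gpu` in its `resourceClaims` list.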
