Add a svg demo file showing the DRA use: install, configure and use #108

Merged
merged 1 commit into from
May 8, 2024
16 changes: 12 additions & 4 deletions README.md
@@ -35,7 +35,7 @@ First since we'll launch kind with GPU support, ensure that the following prereq
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
```

1. Show the current set of GPUs on the machine
1. Show the current set of GPUs on the machine:
```console
nvidia-smi -L
```
@@ -53,6 +53,15 @@ cd k8s-dra-driver
```

### Setting up the infrastructure

Here's a demo showing how to install and configure DRA, and run a pod in a `kind` cluster on a Linux workstation.

<p align="center">
<img width="800" src="./demo/specs/quickstart/basic-demo.svg">
</p>

Below are the detailed, step-by-step instructions.

Comment on lines +56 to +64
Collaborator


In general this change looks good now. My only worry is keeping this SVG in sync as we make changes to the README. Is there some way to add the code that generates it to the repo instead of just importing the final image?

Collaborator Author

@yuanchen8911 yuanchen8911 May 7, 2024


Yes, but I assume we won't change the basic flow and example very often.
The SVG file was generated from the demo script below using a tool I created: `recdemo.sh <input: demo script> <output: svg file>`.

We can add the script and demo files to a new folder and rebuild the SVG files (e.g., from the Makefile) when things change. That's how KWOK manages its examples and demos.

How about doing it in future PRs?

# Demonstrate the basic use of DRA in a kind cluster

# Prerequisites: follow the instructions in the README
# 1. Install kind
# 2. Install and configure the NVIDIA container toolkit

# Show the current set of GPUs on the machine
nvidia-smi -L

# Create a kind cluster to run the demo:
./demo/clusters/kind/create-cluster.sh

# Build the image for the example resource driver and the images for the kind cluster:
./demo/clusters/kind/build-dra-driver.sh

# Install the driver in the kind cluster
./demo/clusters/kind/install-dra-driver.sh

# Show two pods running in the nvidia-dra-driver namespace:
kubectl get pods -n nvidia-dra-driver

# Run the examples in the demo/specs/quickstart folder. The README in that directory shows the full script:
# For example, you can run the second test.
kubectl apply --filename=demo/specs/quickstart/gpu-test2.yaml
sleep 2

# Get the pod information
kubectl get pod -n gpu-test2

# Get the GPU resource information of the pod.
kubectl logs -n gpu-test2 pod --all-containers

# Delete the kind cluster and clean up the environment
./demo/clusters/kind/delete-cluster.sh
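
The fixed `sleep 2` in the script above may be too short on a slow machine. One possible alternative, sketched here under assumptions, is to poll until a command succeeds. `wait_until` is a hypothetical helper, not part of the repository or of `recdemo.sh`; the commented usage line simply reuses the `kubectl logs` command from the script.

```shell
# Hypothetical helper (not in the repo): retry a command until it succeeds.
# Usage: wait_until <max attempts> <delay in seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    # Run the command; stop polling as soon as it exits with status 0.
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # The command never succeeded within the allowed attempts.
  return 1
}

# Possible replacement for "sleep 2" in the demo script (assumption):
# wait_until 30 2 kubectl logs -n gpu-test2 pod --all-containers
```

Polling keeps the recording short when the pod starts quickly while still tolerating slower starts.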

Collaborator


OK. I'm fine merging it as-is for now.
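
For the follow-up discussed in this thread, a Makefile target or script could rebuild the SVG only when the demo script changes. The following is a minimal sketch, not the actual tooling: `svg_is_stale`, `update_demo_svg`, and the `RECDEMO` variable are hypothetical names, and the only detail taken from the thread is the `recdemo.sh <input> <output>` argument order.

```shell
# Hypothetical sketch: regenerate a demo SVG only when its script changed.
set -eu

# Assumed location of the recdemo.sh tool; it is not in the repository yet.
RECDEMO="${RECDEMO:-./recdemo.sh}"

# Succeeds when the SVG is missing or older than the demo script.
svg_is_stale() {
  local script="$1" svg="$2"
  [ ! -e "$svg" ] || [ "$script" -nt "$svg" ]
}

# Rebuild the SVG if needed, passing <input: demo script> <output: svg file>.
update_demo_svg() {
  local script="$1" svg="$2"
  if svg_is_stale "$script" "$svg"; then
    "$RECDEMO" "$script" "$svg"
  else
    echo "up to date: $svg"
  fi
}

# Example call (the demo script path is hypothetical):
# update_demo_svg demo/specs/quickstart/basic.demo demo/specs/quickstart/basic-demo.svg
```

Comparing modification times mirrors what a Makefile rule would do, so the same check could later move into a `make` target.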

First, create a `kind` cluster to run the demo:
```console
./demo/clusters/kind/create-cluster.sh
@@ -88,7 +97,7 @@ The `README` in that directory shows the full script of the demo you can walk th
cat demo/specs/quickstart/README.md
```

Deploy the example pods in the demo directory.
Deploy the example pods in the demo directory:
```console
kubectl apply --filename=demo/specs/quickstart/gpu-test{1,2,3}.yaml
```
@@ -130,11 +139,10 @@ GPU 0: A100-SXM4-40GB (UUID: GPU-4404041a-04cf-1ccf-9e70-f139a9b1e23c)

### Cleaning up the environment

Running
Remove the cluster created in the preceding steps:
```console
./demo/clusters/kind/delete-cluster.sh
```
will remove the cluster created in the preceding steps.

<!--
TODO: This README should be extended with additional content including:
1 change: 1 addition & 0 deletions demo/specs/quickstart/basic-demo.svg