
Commit 5f08cb2

Update main user guide in preparation for SHARK 3.1 release (#751)
Kept the changes as minor as possible, since the goal was not to create new content but rather to string together existing content. The main change, beyond adding links to the Llama documentation, is moving the SDXL quickstart into its own user guide and creating a quick organizational hierarchy for the Llama 3.1 docs. Ideally those will move into the Llama 3.1 user docs in the next release. I've hand-waved the documentation for getting the Llama 3.1 70B model working, given that it's an advanced topic requiring familiarity with both the Hugging Face CLI and llama.cpp.
1 parent d42cc29 commit 5f08cb2

File tree

3 files changed: +66 −59 lines


README.md

+4 −4

@@ -61,10 +61,10 @@ optimal parameter configurations to use during model compilation.
 
 ### Models
 
-Model name | Model recipes | Serving apps
----------- | ------------- | ------------
-SDXL | [`sharktank/sharktank/models/punet/`](https://github.com/nod-ai/shark-ai/tree/main/sharktank/sharktank/models/punet) | [`shortfin/python/shortfin_apps/sd/`](https://github.com/nod-ai/shark-ai/tree/main/shortfin/python/shortfin_apps/sd)
-llama | [`sharktank/sharktank/models/llama/`](https://github.com/nod-ai/shark-ai/tree/main/sharktank/sharktank/models/llama) | [`shortfin/python/shortfin_apps/llm/`](https://github.com/nod-ai/shark-ai/tree/main/shortfin/python/shortfin_apps/llm)
+Model name | Model recipes | Serving apps | Guide |
+---------- | ------------- | ------------ | ----- |
+SDXL | [`sharktank/sharktank/models/punet/`](https://github.com/nod-ai/shark-ai/tree/main/sharktank/sharktank/models/punet) | [`shortfin/python/shortfin_apps/sd/`](https://github.com/nod-ai/shark-ai/tree/main/shortfin/python/shortfin_apps/sd) | [shortfin/python/shortfin_apps/sd/README.md](shortfin/python/shortfin_apps/sd/README.md)
+llama | [`sharktank/sharktank/models/llama/`](https://github.com/nod-ai/shark-ai/tree/main/sharktank/sharktank/models/llama) | [`shortfin/python/shortfin_apps/llm/`](https://github.com/nod-ai/shark-ai/tree/main/shortfin/python/shortfin_apps/llm) | [docs/shortfin/llm/user/llama_serving.md](docs/shortfin/llm/user/llama_serving.md)
 
 ## SHARK Developers
 

docs/user_guide.md

+14 −53

@@ -2,6 +2,9 @@
 
 These instructions cover the usage of the latest stable release of SHARK. For a more bleeding edge release please install the [nightly releases](nightly_releases.md).
 
+> [!TIP]
+> Please note: while we prepare the next stable release, please use the [nightly releases](nightly_releases.md).
+
 ## Prerequisites
 
 Our current user guide requires that you have:
@@ -64,61 +67,19 @@ pip install shark-ai[apps]
 python -m shortfin_apps.sd.server --help
 ```
 
-## Quickstart
+## Getting started
 
-### Run the SDXL Server
+As part of our current release we support serving [SDXL](https://stablediffusionxl.com/) and [Llama 3.1](https://ai.meta.com/blog/meta-llama-3-1/) variants, as well as an initial release of `sharktank`, SHARK's model development toolkit, which is used to compile these models for high performance.
 
-Run the [SDXL Server](../shortfin/python/shortfin_apps/sd/README.md#Start-SDXL-Server)
+### SDXL
 
-### Run the SDXL Client
+To get started with SDXL, please follow the [SDXL User Guide](../shortfin/python/shortfin_apps/sd/README.md#Start-SDXL-Server).
 
-```
-python -m shortfin_apps.sd.simple_client --interactive
-```
 
-Congratulations!!! At this point you can play around with the server and client based on your usage.
-
-### Note: Server implementation scope
-
-The SDXL server's implementation does not account for extremely large client batches. Normally, for heavy workloads, services would be composed under a load balancer to ensure each service is fed with requests optimally. For most cases outside of large-scale deployments, the server's internal batching/load balancing is sufficient.
-
-### Update flags
-
-Please see --help for both the server and client for usage instructions. Here's a quick snapshot.
-
-#### Update server options:
-
-| Flags | options |
-|---|---|
-|--host HOST |
-|--port PORT | server port |
-|--root-path ROOT_PATH |
-|--timeout-keep-alive |
-|--device | local-task,hip,amdgpu | amdgpu only supported in this release
-|--target | gfx942,gfx1100 | gfx942 only supported in this release
-|--device_ids |
-|--tokenizers |
-|--model_config |
-| --workers_per_device |
-| --fibers_per_device |
-| --isolation | per_fiber, per_call, none |
-| --show_progress |
-| --trace_execution |
-| --amdgpu_async_allocations |
-| --splat |
-| --build_preference | compile,precompiled |
-| --compile_flags |
-| --flagfile FLAGFILE |
-| --artifacts_dir ARTIFACTS_DIR | Where to store cached artifacts from the Cloud |
-
-#### Update client with different options:
-
-| Flags |options|
-|---|---
-|--file |
-|--reps |
-|--save | Whether to save image generated by the server |
-|--outputdir| output directory to store images generated by SDXL |
-|--steps |
-|--interactive |
-|--port| port to interact with server |
+### Llama 3.1
+
+To get started with Llama 3.1, please follow the [Llama User Guide](shortfin/llm/user/llama_serving.md).
+
+* Once you've set up the Llama server in the guide above, we recommend that you use the [SGLang Frontend](https://sgl-project.github.io/frontend/frontend.html) by following the [Using `shortfin` with `sglang` guide](shortfin/llm/user/shortfin_with_sglang_frontend_language.md).
+* If you would like to deploy Llama on a Kubernetes cluster, we also provide a simple set of instructions and a deployment configuration to do so [here](shortfin/llm/user/llama_serving_on_kubernetes.md).
+* Finally, if you'd like to use the instructions above with a different variant of Llama 3.1, that is supported; however, you will need to generate a GGUF dataset for that variant. To do this, use [Hugging Face](https://huggingface.co/)'s [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/en/guides/cli) in combination with [llama.cpp](https://github.com/ggerganov/llama.cpp)'s `convert_hf_to_gguf.py` (see the sketch below). In future releases, we plan to streamline these instructions to make it easier for users to compile their own models from HuggingFace.
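For illustration, a minimal sketch of that GGUF conversion flow is shown below. The repository id, directory names, and output file are placeholders (this example assumes access to a gated Llama 3.1 checkpoint on Hugging Face); confirm the exact flags against the `huggingface-cli` and llama.cpp documentation.

```
# Log in and download the Hugging Face checkpoint for the desired Llama 3.1 variant
# (placeholder repo id; gated models require accepting the license on Hugging Face first).
huggingface-cli login
huggingface-cli download meta-llama/Llama-3.1-70B-Instruct \
  --local-dir ./llama-3.1-70b-instruct

# Convert the downloaded checkpoint to a GGUF file with llama.cpp's conversion script.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py ./llama-3.1-70b-instruct \
  --outfile llama-3.1-70b-instruct-f16.gguf --outtype f16
```

The resulting `.gguf` file can then stand in for the default dataset when following the Llama serving instructions above.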

shortfin/python/shortfin_apps/sd/README.md

+48 −2

@@ -22,9 +22,55 @@ python -m shortfin_apps.sd.server --device=amdgpu --device_ids=0 --build_prefere
 INFO - Application startup complete.
 INFO - Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
 ```
-## Run the SDXL Client
+### Run the SDXL Client
 
-- Run a CLI client in a separate shell:
 ```
 python -m shortfin_apps.sd.simple_client --interactive
 ```
+
+Congratulations!!! At this point you can play around with the server and client based on your usage.
+
+### Note: Server implementation scope
+
+The SDXL server's implementation does not account for extremely large client batches. Normally, for heavy workloads, services would be composed under a load balancer to ensure each service is fed with requests optimally. For most cases outside of large-scale deployments, the server's internal batching/load balancing is sufficient.
+
+### Server and client flags
+
+Please see `--help` for both the server and client for usage instructions. Here's a quick snapshot.
+
+#### Server options:
+
+| Flag | Options | Notes |
+|---|---|---|
+| --host HOST | | |
+| --port PORT | | Server port |
+| --root-path ROOT_PATH | | |
+| --timeout-keep-alive | | |
+| --device | local-task, hip, amdgpu | Only amdgpu is supported in this release |
+| --target | gfx942, gfx1100 | Only gfx942 is supported in this release |
+| --device_ids | | |
+| --tokenizers | | |
+| --model_config | | |
+| --workers_per_device | | |
+| --fibers_per_device | | |
+| --isolation | per_fiber, per_call, none | |
+| --show_progress | | |
+| --trace_execution | | |
+| --amdgpu_async_allocations | | |
+| --splat | | |
+| --build_preference | compile, precompiled | |
+| --compile_flags | | |
+| --flagfile FLAGFILE | | |
+| --artifacts_dir ARTIFACTS_DIR | | Where to store cached artifacts from the cloud |
+
+#### Client options:
+
+| Flag | Description |
+|---|---|
+| --file | |
+| --reps | |
+| --save | Whether to save images generated by the server |
+| --outputdir | Output directory for images generated by SDXL |
+| --steps | |
+| --interactive | |
+| --port | Port to interact with the server |
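To illustrate how these flags compose, here is one possible end-to-end invocation. The step count, repetition count, and output directory are placeholder values, and this sketch is not taken verbatim from the guide, so confirm the exact flag syntax with `--help` for both the server and the client.

```
# Start the server on an AMD GPU (gfx942 is the only supported target in this release).
python -m shortfin_apps.sd.server \
  --device=amdgpu --device_ids=0 --target=gfx942 --build_preference=precompiled

# In a separate shell, generate images and save them to a local directory
# (placeholder values; confirm flag syntax with `python -m shortfin_apps.sd.simple_client --help`).
python -m shortfin_apps.sd.simple_client \
  --steps 20 --reps 1 --save --outputdir ./sdxl_output --port 8000
```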
