Debug studio #1831

Closed
zolinthecow wants to merge 31 commits into main from debug-studio
Changes from all commits
31 commits
e9aa1f6
logging request works
zolinthecow Oct 28, 2024
f816836
change to use stream executor submit
zolinthecow Oct 29, 2024
25cd38d
add support for anthropic
zolinthecow Oct 29, 2024
86154f7
support litellm + edits
zolinthecow Oct 29, 2024
4c4b19d
support openai
zolinthecow Oct 29, 2024
04a950c
add for choices too
zolinthecow Oct 29, 2024
1daae8a
support vertexai
zolinthecow Oct 29, 2024
0bde1b1
add docs
zolinthecow Oct 29, 2024
8ed7d1c
add port arg
zolinthecow Oct 29, 2024
4754915
add unit tests
zolinthecow Oct 29, 2024
d816597
remove debug print
zolinthecow Oct 29, 2024
6fa7c56
remove another debug log
zolinthecow Oct 29, 2024
ccc9f57
accidently removed some lines in readme
zolinthecow Oct 29, 2024
b7dc1b5
remove another debug log
zolinthecow Oct 29, 2024
4b80787
bump enochian-studio version
zolinthecow Oct 29, 2024
480ba87
bump enochian-studio version for debugging CI runner
zolinthecow Oct 29, 2024
17601ea
bump version to hopefully make it pass
zolinthecow Oct 29, 2024
649bc94
even more logging
zolinthecow Oct 29, 2024
996af70
file lock on download_node so hopefully race condition goes away
zolinthecow Oct 29, 2024
f289e01
is it running twice?
zolinthecow Oct 29, 2024
3ce1499
bump again to clear old data if its bad
zolinthecow Oct 29, 2024
61da822
bump to test again
zolinthecow Oct 30, 2024
449b724
better logging
zolinthecow Oct 30, 2024
21e72df
should be fixed
zolinthecow Oct 30, 2024
cb74c81
always delete
zolinthecow Oct 30, 2024
324cfbe
set the PATH properly
zolinthecow Oct 30, 2024
7d82409
merge
zolinthecow Nov 1, 2024
e30a806
left in a merge conflict
zolinthecow Nov 1, 2024
052c437
Merge branch 'main' into debug-studio
zolinthecow Nov 2, 2024
776e727
Merge branch 'main' into debug-studio
zolinthecow Nov 2, 2024
18da680
Merge branch 'main' into debug-studio
zolinthecow Nov 3, 2024
29 changes: 29 additions & 0 deletions docs/frontend/frontend.md
@@ -233,6 +233,35 @@ def chat_example(s):
    s += sgl.assistant_end()
```

### Debug Studio

The frontend also provides a debug studio for inspecting exactly what is passed into the runtime endpoint's generation API.
To use it, first start the debug server:

```bash
python -m sglang.launch_debug_server
```

Reviewer suggestion (Contributor):

```diff
-python -m sglang.launch_debug_server
+python -m sglang.lang.launch_debug_server
```

It will start a debug server on port 56765. Then, add a debug region to an `sgl.function`:

```python
@sgl.function
def text_qa(s, question):
    s.begin_debug_region("TEXT_QA")
    s += "Q: " + question + "\n"
    s += "A:" + sgl.gen("answer", stop="\n")

state = text_qa.run(
    question="What is the capital of France?",
    temperature=0.1,
    stream=True
)
```

When you navigate to `http://localhost:56765` (if you're on a remote server, forward the port over SSH), you should see a web app showing the prompt and response.
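A typical SSH port-forwarding command looks something like the following (the user and host names below are placeholders):

```bash
# Forward the remote debug studio port 56765 to the same port on your local machine
ssh -L 56765:localhost:56765 user@remote-host
```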

<img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/debug_studio_example.png" alt="prompt_studio_demo" margin="10px">

### Tips and Implementation Details
- The `choices` argument in `sgl.gen` is implemented by computing the [token-length normalized log probabilities](https://blog.eleuther.ai/multiple-choice-normalization/) of all choices and selecting the one with the highest probability (see the sketch after this list).
- The `regex` argument in `sgl.gen` is implemented through autoregressive decoding with logit bias masking, according to the constraints set by the regex. It is compatible with `temperature=0` and `temperature != 0`.
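As an illustration of that token-length normalization (a minimal sketch with made-up per-token log probabilities, not the actual sglang internals):

```python
# Hypothetical per-token log probabilities for each choice; sglang derives
# these from the model's logits, the numbers here are for illustration only.
choice_token_logprobs = {
    "Paris": [-0.2, -0.1],                         # 2 tokens
    "The city of Lyon": [-0.5, -0.6, -0.4, -0.9],  # 4 tokens
}

def normalized_logprob(token_logprobs):
    # Mean log probability per token, so longer choices are not penalized
    # merely for containing more tokens.
    return sum(token_logprobs) / len(token_logprobs)

best = max(choice_token_logprobs, key=lambda c: normalized_logprob(choice_token_logprobs[c]))
print(best)  # "Paris" (-0.15 per token vs. -0.6 per token)
```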
8 changes: 5 additions & 3 deletions python/pyproject.toml
@@ -28,6 +28,8 @@ srt_hip = ["sglang[runtime_common]", "torch", "vllm==0.6.3.dev13"]
# need to follow https://docs.vllm.ai/en/latest/getting_started/xpu-installation.html to install vllm
srt_xpu = ["sglang[runtime_common]"]

studio = ["enochian-studio>=0.0.3.post10"]

openai = ["openai>=1.0", "tiktoken"]
anthropic = ["anthropic>=0.20.0"]
litellm = ["litellm>=1.0.0"]
@@ -39,9 +41,9 @@ test = [
"accelerate",
"peft",
]
all = ["sglang[srt]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
all_hip = ["sglang[srt_hip]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
all_xpu = ["sglang[srt_xpu]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]"]
all = ["sglang[srt]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]", "sglang[studio]"]
all_hip = ["sglang[srt_hip]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]", "sglang[studio]"]
all_xpu = ["sglang[srt_xpu]", "sglang[openai]", "sglang[anthropic]", "sglang[litellm]", "sglang[studio]"]
dev = ["sglang[all]", "sglang[test]"]
dev_hip = ["sglang[all_hip]", "sglang[test]"]
dev_xpu = ["sglang[all_xpu]", "sglang[test]"]
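For reference, installing with the new optional dependency group would look something like this (standard pip extras syntax; `studio` is the extra defined above):

```bash
# Pull in enochian-studio via the new "studio" extra
pip install "sglang[studio]"

# "all" now includes the studio extra as well
pip install "sglang[all]"
```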
1 change: 1 addition & 0 deletions python/sglang/README.md
@@ -9,4 +9,5 @@
- `bench_serving.py`: Benchmark online serving with dynamic requests.
- `global_config.py`: The global configs and constants.
- `launch_server.py`: The entry point for launching the local server.
- `launch_debug_server.py`: The entry point for launching the debug server + web app
Reviewer comment (Contributor): This is an experimental feature for frontend language only, so please move it under `python/sglang/lang`.

- `utils.py`: Common utilities.
53 changes: 53 additions & 0 deletions python/sglang/lang/backend/anthropic.py
@@ -1,3 +1,5 @@
import uuid
from datetime import datetime
from typing import List, Optional, Union

import numpy as np
@@ -42,6 +44,20 @@ def generate(
        else:
            system = ""

        debug_request_id = str(uuid.uuid4())
        s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "requestPrompt": str(
                        [{"role": "system", "content": system}] + messages
                    ),
                    "requestTimestamp": datetime.now().isoformat(),
                    "requestMetadata": sampling_params.to_anthropic_kwargs(),
                }
            ]
        )

        ret = self.client.messages.create(
            model=self.model_name,
            system=system,
@@ -50,6 +66,17 @@ )
        )
        comp = ret.content[0].text

        s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "responseContent": comp,
                    "responseTimestamp": datetime.now().isoformat(),
                    "responseMetadata": ret.to_json(),
                }
            ]
        )

        return comp, {}

    def generate_stream(
@@ -67,6 +94,20 @@ def generate_stream(
        else:
            system = ""

        debug_request_id = str(uuid.uuid4())
        debug_obj = s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "requestPrompt": str(
                        [{"role": "system", "content": system}] + messages
                    ),
                    "requestTimestamp": datetime.now().isoformat(),
                    "requestMetadata": sampling_params.to_anthropic_kwargs(),
                }
            ]
        )
Reviewer comment (Contributor) on lines +97 to +109: This is not efficient enough. When debug is turned off, you still run the code to construct the argument, which takes some time. Please minimize the overhead and do not construct any objects when debug is turned off.
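One possible way to address this, sketched under the assumption of a hypothetical `s.debug_enabled()` predicate (not an existing sglang API), is to guard the payload construction entirely:

```python
# Sketch only: build the debug payload only when debugging is active.
# `s.debug_enabled()` is a hypothetical check, not part of the current interpreter API.
debug_request_id = None
if s.debug_enabled():
    debug_request_id = str(uuid.uuid4())
    s.log_debug(
        [
            {
                "id": debug_request_id,
                "requestPrompt": str(
                    [{"role": "system", "content": system}] + messages
                ),
                "requestTimestamp": datetime.now().isoformat(),
                "requestMetadata": sampling_params.to_anthropic_kwargs(),
            }
        ]
    )
```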


        with self.client.messages.stream(
            model=self.model_name,
            system=system,
@@ -75,3 +116,15 @@ ) as stream:
        ) as stream:
            for text in stream.text_stream:
                yield text, {}
            final_message = stream.get_final_message()
            final_message_json = final_message.to_json()
            s.log_debug(
                [
                    {
                        "id": debug_request_id,
                        "responseContent": final_message.content[0].text,
                        "responseTimestamp": datetime.now().isoformat(),
                        "responseMetadata": final_message_json,
                    }
                ]
            )
57 changes: 57 additions & 0 deletions python/sglang/lang/backend/litellm.py
@@ -1,3 +1,5 @@
import uuid
from datetime import datetime
from typing import Mapping, Optional

from sglang.lang.backend.base_backend import BaseBackend
@@ -57,6 +59,21 @@ def generate(
        else:
            messages = [{"role": "user", "content": s.text_}]

        debug_request_id = str(uuid.uuid4())
        s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "requestPrompt": str(messages),
                    "requestTimestamp": datetime.now().isoformat(),
                    "requestMetadata": {
                        **self.client_params,
                        **sampling_params.to_litellm_kwargs(),
                    },
                }
            ]
        )

        ret = litellm.completion(
            model=self.model_name,
            messages=messages,
@@ -65,6 +82,17 @@ )
        )
        comp = ret.choices[0].message.content

        s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "responseContent": comp,
                    "responseTimestamp": datetime.now().isoformat(),
                    "responseMetadata": ret.to_json(),
                }
            ]
        )

        return comp, {}

    def generate_stream(
@@ -77,14 +105,43 @@ def generate_stream(
        else:
            messages = [{"role": "user", "content": s.text_}]

        debug_request_id = str(uuid.uuid4())
        s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "requestPrompt": str(messages),
                    "requestTimestamp": datetime.now().isoformat(),
                    "requestMetadata": {
                        **self.client_params,
                        **sampling_params.to_litellm_kwargs(),
                    },
                }
            ]
        )

        ret = litellm.completion(
            model=self.model_name,
            messages=messages,
            stream=True,
            **self.client_params,
            **sampling_params.to_litellm_kwargs(),
        )

        full_text = ""
        for chunk in ret:
            text = chunk.choices[0].delta.content
            if text is not None:
                full_text += text
                yield text, {}

        s.log_debug(
            [
                {
                    "id": debug_request_id,
                    "responseContent": full_text,
                    "responseTimestamp": datetime.now().isoformat(),
                    "responseMetadata": {},
                }
            ]
        )