
support prometheus metrics #1853

Merged
ByronHsu merged 14 commits into sgl-project:main on Nov 6, 2024

Conversation

Lzhang-hub
Contributor

Motivation

#1461 by @binarycrayon provides a good start.
Since the scheduling logic in the latest sglang code has been integrated into scheduler.py, I implemented a version of Prometheus metrics based on #1461 and the latest code.

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@ByronHsu
Collaborator

@merrymercy @Ying1123 I can help with reviewing and testing this PR.

@binarycrayon
Contributor

Thank you, I have been swamped by a game release and didn't have time to finish the work. Happy to test too.

One thing that would be useful for our autoscaling is an indication of whether sglang is lagging under a heavy load of requests, presumably derived from a combination of the number of requests waiting to be processed and the request processing speed.

@Ying1123 @merrymercy
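
As a rough illustration of that signal, here is a hedged Python sketch that derives a lag indicator from two of the metrics this PR exposes (sglang:num_requests_waiting and sglang:gen_throughput, taken from the sample /metrics output later in this thread). The helper names are hypothetical, and the ratio mixes units (requests vs. tokens/s), so treat it as a relative autoscaling signal, not an absolute drain time:

import urllib.request

def scrape_metric(base_url: str, metric: str) -> float:
    # Fetch the exposition text and return the first sample value for `metric`.
    body = urllib.request.urlopen(f"{base_url}/metrics").read().decode()
    for line in body.splitlines():
        if line.startswith(metric + "{") or line.startswith(metric + " "):
            return float(line.rsplit(" ", 1)[1])
    raise KeyError(f"metric not found: {metric}")

def lag_signal(base_url: str = "http://127.0.0.1:30000") -> float:
    # Queue depth divided by generation speed: grows when requests pile up
    # faster than sglang can process them.
    waiting = scrape_metric(base_url, "sglang:num_requests_waiting")
    speed = scrape_metric(base_url, "sglang:gen_throughput")
    return waiting if speed == 0 else waiting / speed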

@binarycrayon
Contributor

If no one has gotten to this yet, I can add an example configuration to your branch. Let me know. @Lzhang-hub @ByronHsu

@merrymercy
Contributor

@Lzhang-hub Can you fix the CI tests?

@Lzhang-hub
Contributor Author

If no one has gotten to this yet, I can add an example configuration to your branch. Let me know. @Lzhang-hub @ByronHsu

@binarycrayon Sure, that would be great!

@Lzhang-hub
Contributor Author

@Lzhang-hub Can you fix the CI tests?

OK

@ByronHsu
Collaborator

ByronHsu commented Nov 2, 2024

I will finish the review over the weekend

@ByronHsu ByronHsu self-assigned this Nov 3, 2024
Collaborator

@ByronHsu ByronHsu left a comment

LGTM overall. I did a local test and was able to query the metrics.
Left a few minor suggestions.

$ curl -X POST http://127.0.0.1:30000/metrics
# HELP sglang:max_total_num_tokens Maximum total number of tokens
# TYPE sglang:max_total_num_tokens gauge
sglang:max_total_num_tokens{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 570979.0
# HELP sglang:max_prefill_tokens Maximum prefill tokens
# TYPE sglang:max_prefill_tokens gauge
sglang:max_prefill_tokens{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 16384.0
# HELP sglang:max_running_requests Maximum running requests
# TYPE sglang:max_running_requests gauge
sglang:max_running_requests{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 2231.0
# HELP sglang:context_len Context length
# TYPE sglang:context_len gauge
sglang:context_len{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 131072.0
# HELP sglang:prompt_tokens_total Number of prefill tokens processed.
# TYPE sglang:prompt_tokens_total counter
sglang:prompt_tokens_total{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 230794.0
# HELP sglang:generation_tokens_total Number of generation tokens processed.
# TYPE sglang:generation_tokens_total counter
sglang:generation_tokens_total{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 191001.0
# HELP sglang:num_requests_running Number of requests currently running on GPU
# TYPE sglang:num_requests_running gauge
sglang:num_requests_running{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:num_requests_waiting Number of requests waiting to be processed.
# TYPE sglang:num_requests_waiting gauge
sglang:num_requests_waiting{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:gen_throughput Gen token throughput (token/s)
# TYPE sglang:gen_throughput gauge
sglang:gen_throughput{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:token_usage Total token usage
# TYPE sglang:token_usage gauge
sglang:token_usage{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:new_seq Number of new sequences
# TYPE sglang:new_seq gauge
sglang:new_seq{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:new_token Number of new token
# TYPE sglang:new_token gauge
sglang:new_token{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:cached_token Number of cached token
# TYPE sglang:cached_token gauge
sglang:cached_token{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:cache_hit_rate Cache hit rate
# TYPE sglang:cache_hit_rate gauge
sglang:cache_hit_rate{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 4.37
# HELP sglang:queue_req Number of queued requests
# TYPE sglang:queue_req gauge
sglang:queue_req{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
# HELP sglang:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE sglang:time_to_first_token_seconds histogram
sglang:time_to_first_token_seconds_sum{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 235.673574924469
sglang:time_to_first_token_seconds_bucket{le="0.001",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:time_to_first_token_seconds_bucket{le="0.005",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:time_to_first_token_seconds_bucket{le="0.01",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:time_to_first_token_seconds_bucket{le="0.02",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 5.0
sglang:time_to_first_token_seconds_bucket{le="0.04",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 5.0
sglang:time_to_first_token_seconds_bucket{le="0.06",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 5.0
sglang:time_to_first_token_seconds_bucket{le="0.08",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 5.0
sglang:time_to_first_token_seconds_bucket{le="0.1",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 16.0
sglang:time_to_first_token_seconds_bucket{le="0.25",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="0.5",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="0.75",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="1.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="2.5",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="5.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="7.5",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="10.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="15.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="20.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="25.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="30.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_bucket{le="+Inf",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
sglang:time_to_first_token_seconds_count{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1029.0
# HELP sglang:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE sglang:time_per_output_token_seconds histogram
sglang:time_per_output_token_seconds_sum{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 5988.631323337555
sglang:time_per_output_token_seconds_bucket{le="0.005",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:time_per_output_token_seconds_bucket{le="0.01",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 3649.0
sglang:time_per_output_token_seconds_bucket{le="0.015",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 22930.0
sglang:time_per_output_token_seconds_bucket{le="0.02",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 31116.0
sglang:time_per_output_token_seconds_bucket{le="0.025",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 46169.0
sglang:time_per_output_token_seconds_bucket{le="0.03",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 76405.0
sglang:time_per_output_token_seconds_bucket{le="0.04",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 157311.0
sglang:time_per_output_token_seconds_bucket{le="0.05",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 171231.0
sglang:time_per_output_token_seconds_bucket{le="0.075",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.1",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.15",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.2",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.3",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.4",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.5",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="0.75",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="1.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="2.5",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_bucket{le="+Inf",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
sglang:time_per_output_token_seconds_count{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 189972.0
# HELP sglang:request_prompt_tokens Number of prefill tokens processed
# TYPE sglang:request_prompt_tokens histogram
sglang:request_prompt_tokens_sum{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 224472.0
sglang:request_prompt_tokens_bucket{le="1.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:request_prompt_tokens_bucket{le="2.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:request_prompt_tokens_bucket{le="5.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 9.0
sglang:request_prompt_tokens_bucket{le="10.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 89.0
sglang:request_prompt_tokens_bucket{le="20.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 244.0
sglang:request_prompt_tokens_bucket{le="50.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 392.0
sglang:request_prompt_tokens_bucket{le="100.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 481.0
sglang:request_prompt_tokens_bucket{le="200.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 570.0
sglang:request_prompt_tokens_bucket{le="500.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 846.0
sglang:request_prompt_tokens_bucket{le="1000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1001.0
sglang:request_prompt_tokens_bucket{le="2000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="5000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="10000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="20000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="50000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="100000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="200000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="500000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_bucket{le="+Inf",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_prompt_tokens_count{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
# HELP sglang:request_generation_tokens Number of generation tokens processed.
# TYPE sglang:request_generation_tokens histogram
sglang:request_generation_tokens_sum{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 190975.0
sglang:request_generation_tokens_bucket{le="1.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:request_generation_tokens_bucket{le="2.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 0.0
sglang:request_generation_tokens_bucket{le="5.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 24.0
sglang:request_generation_tokens_bucket{le="10.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 100.0
sglang:request_generation_tokens_bucket{le="20.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 212.0
sglang:request_generation_tokens_bucket{le="50.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 392.0
sglang:request_generation_tokens_bucket{le="100.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 472.0
sglang:request_generation_tokens_bucket{le="200.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 614.0
sglang:request_generation_tokens_bucket{le="500.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 914.0
sglang:request_generation_tokens_bucket{le="1000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1000.0
sglang:request_generation_tokens_bucket{le="2000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="5000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="10000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="20000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="50000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="100000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="200000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="500000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_bucket{le="+Inf",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:request_generation_tokens_count{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
# HELP sglang:e2e_request_latency_seconds Histogram of End-to-end request latency in seconds
# TYPE sglang:e2e_request_latency_seconds histogram
sglang:e2e_request_latency_seconds_sum{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 12786.840049982071
sglang:e2e_request_latency_seconds_bucket{le="1.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 2.0
sglang:e2e_request_latency_seconds_bucket{le="2.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 3.0
sglang:e2e_request_latency_seconds_bucket{le="5.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 3.0
sglang:e2e_request_latency_seconds_bucket{le="10.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 438.0
sglang:e2e_request_latency_seconds_bucket{le="20.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 870.0
sglang:e2e_request_latency_seconds_bucket{le="50.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="100.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="200.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="500.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="1000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="2000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="5000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="10000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="20000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="50000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="100000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="200000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="500000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_bucket{le="+Inf",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:e2e_request_latency_seconds_count{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
# HELP sglang:waiting_request_latency_seconds Histogram of request waiting time in seconds
# TYPE sglang:waiting_request_latency_seconds histogram
sglang:waiting_request_latency_seconds_sum{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 2593.996586561203
sglang:waiting_request_latency_seconds_bucket{le="1.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 347.0
sglang:waiting_request_latency_seconds_bucket{le="2.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 422.0
sglang:waiting_request_latency_seconds_bucket{le="5.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 866.0
sglang:waiting_request_latency_seconds_bucket{le="10.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="20.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="50.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="100.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="200.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="500.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="1000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="2000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="5000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="10000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="20000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="50000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="100000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="200000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="500000.0",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_bucket{le="+Inf",name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
sglang:waiting_request_latency_seconds_count{name="/shared/public/elr-models/meta-llama/Llama-3.2-3B/5cc0ffe09ee49f7be6ca7c794ee6bd7245e84e60/"} 1003.0
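
For readers unfamiliar with the exposition format above: metrics like these can be declared with the prometheus_client library roughly as follows. This is an illustrative sketch, not the PR's actual definitions, although the histogram bucket boundaries are copied from the output above:

from prometheus_client import Counter, Gauge, Histogram, generate_latest

labelnames = ["name"]  # the served model path, as in the output above

num_requests_waiting = Gauge(
    "sglang:num_requests_waiting",
    "Number of requests waiting to be processed.",
    labelnames=labelnames,
)
prompt_tokens_total = Counter(
    "sglang:prompt_tokens_total",
    "Number of prefill tokens processed.",
    labelnames=labelnames,
)
time_to_first_token_seconds = Histogram(
    "sglang:time_to_first_token_seconds",
    "Histogram of time to first token in seconds.",
    labelnames=labelnames,
    # Bucket boundaries copied from the sample output above.
    buckets=[0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5,
             0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 20.0, 25.0, 30.0],
)

# Example updates with placeholder values:
model = "meta-llama/Llama-3.2-3B"
num_requests_waiting.labels(name=model).set(0)
prompt_tokens_total.labels(name=model).inc(42)
time_to_first_token_seconds.labels(name=model).observe(0.21)
print(generate_latest().decode())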

@@ -414,6 +415,12 @@ def add_cli_args(parser: argparse.ArgumentParser):
action="store_true",
help="Show time cost of custom marks.",
)
parser.add_argument(
"--disable-log-stats",
Collaborator

can we verify this does not affect performance by running the benchmark script? compare the throughput before and after, so we can safely turn this on by default without a perf degradation

Collaborator

I consistently see a ~3% perf decrease when log stats are enabled. I think this is due to the overhead from get_stats. Should we disable this by default? cc @merrymercy

Benchmark script (one A100 80GB)

$ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B --disable-log-stats
$ python -m sglang.bench_serving --port 30000 --dataset-path /home/jobuser/resources/ShareGPT_V3_unfiltered_cleaned_split.json

  1. Disabled (with --disable-log-stats):
============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1000      
Benchmark duration (s):                  54.76     
Total input tokens:                      224442    
Total generated tokens:                  190594    
Total generated tokens (retokenized):    190562    
Request throughput (req/s):              18.26     
Input token throughput (tok/s):          4098.61   
Output token throughput (tok/s):         3480.50   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   25589.94  
Median E2E Latency (ms):                 23799.11  
---------------Time to First Token----------------
Mean TTFT (ms):                          8014.98   
Median TTFT (ms):                        8649.67   
P99 TTFT (ms):                           15176.21  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          325.73    
Median TPOT (ms):                        122.45    
P99 TPOT (ms):                           2547.66   
---------------Inter-token Latency----------------
Mean ITL (ms):                           95.88     
Median ITL (ms):                         53.29     
P99 ITL (ms):                            440.54    
==================================================


============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1000      
Benchmark duration (s):                  54.73     
Total input tokens:                      224442    
Total generated tokens:                  190594    
Total generated tokens (retokenized):    190562    
Request throughput (req/s):              18.27     
Input token throughput (tok/s):          4100.94   
Output token throughput (tok/s):         3482.48   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   25669.04  
Median E2E Latency (ms):                 23922.68  
---------------Time to First Token----------------
Mean TTFT (ms):                          8066.42   
Median TTFT (ms):                        8721.35   
P99 TTFT (ms):                           15256.58  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          327.92    
Median TPOT (ms):                        123.95    
P99 TPOT (ms):                           2657.98   
---------------Inter-token Latency----------------
Mean ITL (ms):                           95.76     
Median ITL (ms):                         53.57     
P99 ITL (ms):                            394.33    
==================================================
  2. Enabled (without --disable-log-stats):
============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1000      
Benchmark duration (s):                  56.23     
Total input tokens:                      224442    
Total generated tokens:                  190594    
Total generated tokens (retokenized):    190562    
Request throughput (req/s):              17.78     
Input token throughput (tok/s):          3991.43   
Output token throughput (tok/s):         3389.48   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   26241.26  
Median E2E Latency (ms):                 24398.49  
---------------Time to First Token----------------
Mean TTFT (ms):                          8123.16   
Median TTFT (ms):                        8761.40   
P99 TTFT (ms):                           15238.44  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          330.88    
Median TPOT (ms):                        126.88    
P99 TPOT (ms):                           2539.43   
---------------Inter-token Latency----------------
Mean ITL (ms):                           99.19     
Median ITL (ms):                         56.29     
P99 ITL (ms):                            447.78    
==================================================

============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1000      
Benchmark duration (s):                  56.43     
Total input tokens:                      224442    
Total generated tokens:                  190594    
Total generated tokens (retokenized):    190557    
Request throughput (req/s):              17.72     
Input token throughput (tok/s):          3977.55   
Output token throughput (tok/s):         3377.69   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   26421.26  
Median E2E Latency (ms):                 24639.09  
---------------Time to First Token----------------
Mean TTFT (ms):                          9308.69   
Median TTFT (ms):                        10000.19  
P99 TTFT (ms):                           15446.68  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          323.10    
Median TPOT (ms):                        114.34    
P99 TPOT (ms):                           2543.62   
---------------Inter-token Latency----------------
Mean ITL (ms):                           93.43     
Median ITL (ms):                         56.43     
P99 ITL (ms):                            455.82    
==================================================

Contributor Author

I also noticed the perf decrease while fixing the CI tests. Maybe we should add an enable-metrics argument to control whether it is turned on?

Collaborator

sounds good, and the default is off
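
A minimal sketch of the flag under discussion, placed next to the existing --disable-log-stats argument; the exact argument name and help text in the merged PR may differ:

parser.add_argument(
    "--enable-metrics",
    action="store_true",
    help="Enable Prometheus metrics (off by default to avoid the ~3% "
    "overhead measured in the benchmarks above).",
)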

@@ -513,7 +530,9 @@ def print_decode_stats(self):
)
throughput = self.num_generated_tokens / (time.time() - self.last_stats_tic)
self.num_generated_tokens = 0
self.last_stats_tic = time.time()
# self.last_stats_tic = time.time()
Collaborator

why comment this out?

Contributor Author

Because last_stats_tic will be used in get_stats for the TPOT computation; if it is updated when printing the decode log, it will affect the calculation of TPOT.

I think last_stats_tic should be updated after each round of iteration. Is there any bug in updating it at line 360?

Contributor Author

Oh, commenting this out seems to affect the decode gen throughput (token/s) log; I will check it.

Contributor Author

@ByronHsu Because the original decode log is not printed on every iteration; it only runs on the condition self.forward_ct_decode % self.server_args.decode_log_interval == 0.
There are two solutions to this problem:

  1. Print the throughput on every iteration.
  2. Add a new variable to record the time of the last throughput log.

Which do you think is better?

Collaborator

let's take the 2nd approach; the first one would introduce too much overhead
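
A sketch of that second approach: keep last_stats_tic for get_stats()/TPOT and track the decode-log window with a separate field. The name last_decode_stats_tic is hypothetical, not taken from the PR:

import time

# Fragment of the scheduler; assumes self.last_decode_stats_tic is
# initialized alongside self.last_stats_tic.
def print_decode_stats(self):
    now = time.time()
    # Throughput over the decode-log window only; last_stats_tic stays
    # untouched so get_stats() can still compute TPOT correctly.
    throughput = self.num_generated_tokens / (now - self.last_decode_stats_tic)
    self.num_generated_tokens = 0
    self.last_decode_stats_tic = now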

python/sglang/srt/server.py (review thread marked outdated and resolved)
@binarycrayon
Contributor

binarycrayon commented Nov 3, 2024

Added a Grafana dashboard example and a production metrics documentation page in a PR to this branch: Lzhang-hub#1

[screenshots: Grafana dashboard examples]

============ Serving Benchmark Result ============
Backend:                                 sglang    
Traffic request rate:                    inf       
Successful requests:                     1000      
Benchmark duration (s):                  89.35     
Total input tokens:                      237804    
Total generated tokens:                  200655    
Total generated tokens (retokenized):    172693    
Request throughput (req/s):              11.19     
Input token throughput (tok/s):          2661.60   
Output token throughput (tok/s):         2245.81   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   37429.80  
Median E2E Latency (ms):                 36665.81  
---------------Time to First Token----------------
Mean TTFT (ms):                          15205.34  
Median TTFT (ms):                        10752.00  
P99 TTFT (ms):                           46274.60  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          182.10    
Median TPOT (ms):                        132.63    
P99 TPOT (ms):                           1037.29   
---------------Inter-token Latency----------------
Mean ITL (ms):                           112.13    
Median ITL (ms):                         87.95     
P99 ITL (ms):                            438.84    
==================================================

@ByronHsu
Copy link
Collaborator

ByronHsu commented Nov 4, 2024

@binarycrayon they look great. Let's open another PR for the docs and use this one for the code change.

@binarycrayon
Copy link
Contributor

@binarycrayon they look great. Let's open another PR for the docs and use this one for the code change.

Then I'll wait for this one to be merged first.

@ByronHsu
Copy link
Collaborator

ByronHsu commented Nov 4, 2024

Looks good now. Can you fix the failing test? You can also run the tests locally: https://github.com/sgl-project/sglang/tree/main/test

@Lzhang-hub
Copy link
Contributor Author

Looks good now. Can you fix the failing test? You can also run the tests locally: https://github.com/sgl-project/sglang/tree/main/test

Sorry, due to network restrictions, running the complete test suite would require changing all the HF model paths in the tests, which seems a bit troublesome.

I ran one of the tests locally, python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache, and it works normally.
[screenshot: local test output]

Bug fix: I changed the Prometheus label model name to served_model_name, because ModelConfig on the main branch has changed.

@ByronHsu
Collaborator

ByronHsu commented Nov 4, 2024

@Lzhang-hub I am also on a cluster which does not have network access. Usually I manually replace the model path with a local model path and only run one or two tests, not all of them. The code looks good now. Let's wait for the CI.

)
else:
prometheus_multiproc_dir = tempfile.TemporaryDirectory()
os.environ["PROMETHEUS_MULTIPROC_DIR"] = prometheus_multiproc_dir.name
Collaborator

One test case is failing because of this:

  1. Start engine 1 => sets the env var to tmp_dir_1.
  2. Terminate engine 1.
  3. Start engine 2 => the env var already exists, but tmp_dir_1 has already been cleaned up => error.

We can get around this by not setting the Prometheus env var when launching the engine. The Prometheus setup should only happen in launch_server, not in launch_engine; maybe have a separate _set_prometheus_env function and use it in launch_server only (see the sketch below).
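
A hedged sketch of that suggestion (the helper name _set_prometheus_env comes from the comment above; the merged implementation may differ):

import os
import tempfile

# Kept at module scope so the TemporaryDirectory outlives the call and is
# only cleaned up when the server process exits.
_prometheus_multiproc_dir = None

def _set_prometheus_env():
    # Called from launch_server only, never from launch_engine, so a stale
    # PROMETHEUS_MULTIPROC_DIR left over from a terminated engine can no
    # longer break a later launch.
    global _prometheus_multiproc_dir
    if "PROMETHEUS_MULTIPROC_DIR" not in os.environ:
        _prometheus_multiproc_dir = tempfile.TemporaryDirectory()
        os.environ["PROMETHEUS_MULTIPROC_DIR"] = _prometheus_multiproc_dir.name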

@@ -459,6 +459,7 @@ def launch_server(
add_api_key_middleware(app, server_args.api_key)

# add prometheus middleware
_set_prometheus_env()
add_prometheus_middleware(app)
Collaborator

should we only enable this when enable_metrics?

Contributor Author

yes
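
For instance, along these lines in launch_server (assuming server_args.enable_metrics is the flag added earlier in this PR):

# Only pay the metrics overhead when explicitly requested.
if server_args.enable_metrics:
    _set_prometheus_env()
    add_prometheus_middleware(app)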

@ByronHsu ByronHsu merged commit a146d99 into sgl-project:main Nov 6, 2024
13 checks passed
@zhyncs
Member

zhyncs commented Nov 6, 2024

Nice to have this feature! When we run benchmarks, we can use this monitoring to observe how the metrics change over a period of time.

@Ying1123
Member

Ying1123 commented Nov 6, 2024

Coooool to see it has finally been merged!
