Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 51 additions & 3 deletions pkg/kthena-router/backend/sglang/metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,10 @@ limitations under the License.
package sglang

import (
"encoding/json"
"fmt"
"io"
"net/http"

dto "github.com/prometheus/client_model/go"
corev1 "k8s.io/api/core/v1"
Expand All @@ -29,6 +32,7 @@ import (
var (
GPUCacheUsage = "sglang:token_usage"
RequestWaitingNum = "sglang:num_queue_reqs"
RequestRunningNum = "sglang:num_running_reqs"
TPOT = "sglang:time_per_output_token_seconds"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TTFT = "sglang:time_to_first_token_seconds"
)
Expand All @@ -37,6 +41,7 @@ var (
CounterAndGaugeMetrics = []string{
GPUCacheUsage,
RequestWaitingNum,
RequestRunningNum,
}

HistogramMetrics = []string{
Expand All @@ -47,11 +52,20 @@ var (
mapOfMetricsName = map[string]string{
GPUCacheUsage: utils.GPUCacheUsage,
RequestWaitingNum: utils.RequestWaitingNum,
RequestRunningNum: utils.RequestRunningNum,
TPOT: utils.TPOT,
TTFT: utils.TTFT,
}
)

type Model struct {
ID string `json:"id"`
}

type ModelList struct {
Data []Model `json:"data"`
}
Comment on lines +61 to +67
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These Model and ModelList structs are duplicates of the ones defined in pkg/kthena-router/backend/vllm/models.go. To improve maintainability and avoid code duplication, these structs should be moved to a shared package, for example, a new pkg/kthena-router/backend/types.go file, and reused in both vllm and sglang packages.


type sglangEngine struct {
// The address of sglang's query metrics is http://{model server}:MetricPort/metrics
// Default is 30000
Expand Down Expand Up @@ -83,7 +97,15 @@ func (engine *sglangEngine) GetCountMetricsInfo(allMetrics map[string]*dto.Metri
continue
}
for _, metric := range metricInfo.Metric {
metricValue := metric.GetCounter().GetValue()
var metricValue float64
switch metricInfo.GetType() {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason for adding this switch statement is i'm not actually sure what kind of metricsType i am going to get like gauge or counter from sglang engine, so i ended up using switch case for both condtional's but would love to what exactly kind of data we are going to get? @YaoZengzeng @LiZhenCheng9527!

case dto.MetricType_GAUGE:
metricValue = metric.GetGauge().GetValue()
case dto.MetricType_COUNTER:
metricValue = metric.GetCounter().GetValue()
default:
continue
}
wantMetrics[mapOfMetricsName[metricName]] = metricValue
}
}
Expand Down Expand Up @@ -115,7 +137,33 @@ func (engine *sglangEngine) GetHistogramPodMetrics(allMetrics map[string]*dto.Me
return wantMetrics, histogramMetrics
}

// TODO: Methods to get Models from sglang
// GetPodModels retrieves the list of models from a pod running the sglang engine.
func (engine *sglangEngine) GetPodModels(pod *corev1.Pod) ([]string, error) {
return nil, nil
url := fmt.Sprintf("http://%s:%d/v1/models", pod.Status.PodIP, engine.MetricPort)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we abstract a common helper function from vllm backend,

resp, err := http.Get(url)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The call to http.Get does not have a timeout. This could cause the request to hang indefinitely if the server is unresponsive. It's recommended to use an http.Client with a configured timeout to prevent this. Note that this change will require importing the time package.

Suggested change
resp, err := http.Get(url)
resp, err := (&http.Client{Timeout: 5 * time.Second}).Get(url)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

if err != nil {
return nil, err
}
Comment on lines +142 to +146
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("failed to get models from pod %s/%s: HTTP %d", pod.GetNamespace(), pod.GetName(), resp.StatusCode)
}

body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, err
}

var modelList ModelList
err = json.Unmarshal(body, &modelList)
if err != nil {
return nil, err
}

models := make([]string, 0, len(modelList.Data))
for _, model := range modelList.Data {
models = append(models, model.ID)
}
return models, nil
}
Comment on lines 141 to 169
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function GetPodModels is identical to the implementation for vllm in pkg/kthena-router/backend/vllm/models.go. To avoid code duplication, this logic should be extracted into a shared helper function. This function could be placed in a common backend package and called by both sglang and vllm implementations. Additionally, for consistency with the vllm backend, consider moving model-related logic into a new pkg/kthena-router/backend/sglang/models.go file.

Loading