Serve on vanilla sglang server with mid-layer #293

Open
SoheylM opened this issue Nov 20, 2024 · 1 comment
SoheylM commented Nov 20, 2024

Hi,

Is there a way to create a mid-layer to serve functionary on a vanilla sglang server?

The unit I work in runs an sglang server where I can easily serve any Hugging Face-hosted LLM. But I cannot install the functionary library on the server side, so I cannot run the server_sglang.py script there.

I attempted to create a middleware "server" on the client side without much success.

Thanks in advance for any tips.
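One possible direction (not an official functionary feature, just a sketch): do the function-calling-specific work entirely on the client side and send plain text-completion requests to the untouched sglang server. The prompt markers (`<|tools|>`, `<|user|>`, etc.), the `build_prompt` helper, the server URL, and the endpoint path below are all illustrative assumptions; the real template is defined by the functionary version and tokenizer actually deployed.

```python
# Hypothetical client-side "mid-layer" sketch: render a functionary-style
# prompt locally, then forward a plain completion request to a vanilla
# sglang server. All special tokens and the URL below are assumptions.
import json
import urllib.request

# Assumed address of the vanilla sglang server (OpenAI-compatible mode).
VANILLA_SGLANG_URL = "http://localhost:30000/v1/completions"


def build_prompt(messages, tools):
    """Render chat messages plus tool schemas into one text prompt.

    This format is illustrative only; functionary's actual chat template
    (which the real server_sglang.py applies) differs per model version.
    """
    parts = ["<|tools|>\n" + json.dumps(tools)]
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}")
    parts.append("<|assistant|>\n")  # leave the assistant turn open
    return "\n".join(parts)


def chat_completion(messages, tools, model="meetkai/functionary-medium-v3.1"):
    """Forward the rendered prompt to the vanilla server and return raw text.

    The caller is responsible for parsing any tool-call syntax out of the
    returned text, since the server knows nothing about function calling.
    """
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(messages, tools),
        "max_tokens": 512,
    }).encode()
    req = urllib.request.Request(
        VANILLA_SGLANG_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]


if __name__ == "__main__":
    tools = [{"type": "function", "function": {
        "name": "get_weather",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}}}}}]
    print(chat_completion(
        [{"role": "user", "content": "Weather in Paris?"}], tools))
```

The missing piece, and the hard part, is making `build_prompt` byte-for-byte identical to the template the model was trained with; any divergence tends to break tool-call formatting, which may be why ad-hoc middleware attempts fail.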

@mckbrchill
The same for vLLM would be nice.
I've found that server_vllm.py exposes only the /chat/completions endpoint.

Also, it seems that 3.1 medium generates a bunch of backslashes (\\\) when prompted with a long input sequence. Is that caused by the function-calling fine-tune or by the FP8 quantization? I haven't tried the non-quantized 3.1 medium yet.
