diff --git a/docs/deploy_guidance.md b/docs/deploy_guidance.md
index eceeb72..244e244 100644
--- a/docs/deploy_guidance.md
+++ b/docs/deploy_guidance.md
@@ -86,6 +86,7 @@ vllm serve /path/to/step3-fp8 \
     --reasoning-parser step3 \
     --enable-auto-tool-choice \
     --tool-call-parser step3 \
+    --gpu-memory-utilization 0.85 \
     --max-num-batched-tokens 4096 \
     --trust-remote-code \
 ```
@@ -223,4 +224,4 @@ print("Chat response:", chat_response)
 
 ```
 
-Note: In our image preprocessing pipeline, we implement a multi-patch mechanism to handle large images. If the input image exceeds 728x728 pixels, the system will automatically apply image cropping logic to get patches of the image.
\ No newline at end of file
+Note: In our image preprocessing pipeline, we implement a multi-patch mechanism to handle large images. If the input image exceeds 728x728 pixels, the system will automatically apply image cropping logic to get patches of the image.
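
Review note on the multi-patch sentence: the diff does not show the cropping logic itself, so the following is only a minimal sketch of what "get patches of the image" could look like. It assumes a plain non-overlapping grid, assumes that "exceeds 728x728" means either side is larger than 728 pixels, and reuses 728 as the tile size. The names `PATCH_SIZE` and `patch_boxes` are hypothetical and do not come from the Step3 code base.

```python
from typing import List, Tuple

# Threshold quoted in the note. Reusing it as the tile size is an assumption.
PATCH_SIZE = 728

# A crop box as (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


def patch_boxes(width: int, height: int, patch: int = PATCH_SIZE) -> List[Box]:
    """Return crop boxes that tile an image (hypothetical grid tiling).

    This illustrates the idea in the note only; it is not the actual
    Step3 preprocessing algorithm.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Images within the threshold are kept whole, so no patches are produced.
    if width <= patch and height <= patch:
        return []

    boxes: List[Box] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Tiles on the right and bottom edges are clipped to the image bounds.
            boxes.append(
                (left, top, min(left + patch, width), min(top + patch, height))
            )
    return boxes


if __name__ == "__main__":
    for box in patch_boxes(1000, 1000):
        print(box)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the boxes could be passed to it directly if Pillow is used for preprocessing.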