```bash
pip install vllm==0.13.0
```

For video inference, install the video module:

```bash
pip install vllm[video]
```

### 1.2 Install vllm-plugin-FL
```bash
git clone https://github.com/flagos-ai/vllm-plugin-FL
cd vllm-plugin-FL
pip install --no-build-isolation .
# or editable install
pip install --no-build-isolation -e .
```

#### 1.2.2 Install FlagGems
```bash
# Optionally append a PyPI mirror, e.g.: -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flag-gems==4.2.1rc0
```

#### 1.2.3 [Optional] Install FlagCX
```bash
git clone https://github.com/flagos-ai/FlagCX.git
cd FlagCX
git checkout v0.7.0
git submodule update --init --recursive
make USE_NVIDIA=1  # NVIDIA GPU platform
export FLAGCX_PATH="$PWD"
cd plugin/torch/
python setup.py develop --adaptor [xxx]
```

Note: `[xxx]` should be selected according to the current platform, e.g., `nvidia`, `ascend`, etc.
If multiple vLLM plugins are installed in the current environment, you can explicitly select vllm-plugin-FL by setting `VLLM_PLUGINS='fl'`.
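Concretely, the selection is made through an environment variable set before launching the server (a minimal sketch; the variable name and value come from the note above):

```bash
# Pin vLLM to load only the 'fl' plugin (vllm-plugin-FL) when several
# vLLM plugins are installed in the same environment.
export VLLM_PLUGINS='fl'
echo "VLLM_PLUGINS=$VLLM_PLUGINS"
```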
```bash
vllm serve <model_path> --dtype auto --max-model-len 2048 --api-key token-abc123 --gpu_memory_utilization 0.9 --trust-remote-code --max-num-batched-tokens 2048
```

Parameter description:

- `<model_path>`: the local path to your MiniCPM-V4.5 model
- `--api-key`: sets the API access key
- `--max-model-len`: sets the maximum model (context) length
- `--gpu_memory_utilization`: target GPU memory utilization rate
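Once running, the server exposes an OpenAI-compatible HTTP API. The sketch below builds a chat-completion request against it, assuming the default base URL `http://localhost:8000/v1` and the `--api-key` value from the command above; the model name and prompt are placeholders, not values from this guide:

```python
# Sketch of a request to the OpenAI-compatible /v1/chat/completions endpoint
# that `vllm serve` exposes. The Authorization header must carry the value
# passed via --api-key (token-abc123 in the command above).
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat_request(base_url: str, api_key: str, payload: dict) -> dict:
    """POST the payload to a running `vllm serve` instance."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build the payload; sending it requires the server from the command above.
payload = build_chat_request("<model_path>", "Describe this image briefly.")
print(json.dumps(payload, indent=2))
```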