How to use an auth token to use Llama 2 in vLLM? And a question about presence_penalty #717
I can't find anything in the docs or in the repo (https://github.com/vllm-project/vllm/tree/main) about using a Hugging Face token to download a gated model like Llama 2 (e.g. Llama-2-13b-chat-hf). On SageMaker I get this error:

Repo model meta-llama/Llama-2-13b-chat-hf is gated. You must be authenticated to access it.
[INFO ] PyProcess - Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-13b-hf/resolve/main/config.json.

What should I do?

from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-13b-chat-hf"
# top_k must be an integer number of candidate tokens (a float like 0.4 is rejected)
sampling_params = SamplingParams(temperature=0.1, top_p=0.75, top_k=40, presence_penalty=1.17)
llm = LLM(model=model_name, tensor_parallel_size=4)

By the way, is presence_penalty the "repetition_penalty" known from other models, or does frequency_penalty in SamplingParams play that role?

Replies: 3 comments
- I found a claim in #539 that frequency_penalty is the same as repetition_penalty, though I'm not sure it's true.
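For reference, vLLM's presence_penalty and frequency_penalty are both additive adjustments to token logits, while Hugging Face's repetition_penalty is multiplicative, so neither is an exact match. A minimal sketch of the parameters in question, reusing the values from the question above:

from vllm import SamplingParams

# presence_penalty (OpenAI-style): a flat additive penalty applied to any
# token that has already appeared in the output, regardless of count.
# frequency_penalty (OpenAI-style): an additive penalty that scales with
# how many times the token has appeared so far.
# Hugging Face's repetition_penalty instead multiplies/divides the logit,
# so neither vLLM parameter reproduces it exactly.
params = SamplingParams(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    presence_penalty=1.17,
    frequency_penalty=0.0,
)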
- For anyone who needs a solution:
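A minimal sketch of the usual approach, assuming a Hugging Face access token (generated at https://huggingface.co/settings/tokens) for an account that has been granted access to the gated repo; the "hf_..." value is a placeholder:

import os
from huggingface_hub import login

# Make the token visible to huggingface_hub, which vLLM uses for downloads.
os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_..."

# Optionally persist the token in the local Hugging Face cache as well.
login(token=os.environ["HUGGING_FACE_HUB_TOKEN"])

from vllm import LLM

# With the token in place, vLLM can pull the gated weights as usual.
llm = LLM(model="meta-llama/Llama-2-13b-chat-hf", tensor_parallel_size=4)

On SageMaker, the usual equivalent is to pass HUGGING_FACE_HUB_TOKEN as an environment variable in the container or endpoint configuration rather than hard-coding it.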
- Also found some docs on integrating it with vLLM: https://docs.mistral.ai/deployment/self-deployment/vllm/
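Guides like that one generally come down to starting vLLM's OpenAI-compatible server in an authenticated environment; a rough sketch using the model from the question above (the token value is a placeholder):

HUGGING_FACE_HUB_TOKEN=hf_... python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-13b-chat-hf \
    --tensor-parallel-size 4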