Add OLMo model #1676

Merged: 3 commits into sgl-project:main from the olmo branch on Oct 16, 2024

Conversation

@janimo (Contributor) commented on Oct 15, 2024:

Support for OLMo models

Note: only the newer checkpoints with the -hf suffix are supported, not the older ones built before OLMo was integrated into transformers; those are deprecated and require an extra package.

https://huggingface.co/collections/allenai/olmo-suite-65aeaae8fe5b6b2122b46778
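For reference, a quick way to try one of the supported -hf checkpoints once this lands (the model name below is just an example from the linked collection; the port is arbitrary):

```
python3 -m sglang.launch_server --model-path allenai/OLMo-1B-hf --port 30000
```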

@merrymercy (Contributor) commented on Oct 16, 2024:

Thanks for the contribution. Can you add it here:

```python
ALL_OTHER_MODELS = [
    ModelCase("Qwen/Qwen2-1.5B"),
    ModelCase("Qwen/Qwen2.5-14B-Instruct"),
    ModelCase("HuggingFaceTB/SmolLM-135M-Instruct"),
]
```

and run the test, as described in the docs below?
### Add the model to the test suite
To make sure the new model is well maintained in the future, it is better to add it to the test suite.
You can add it to the `ALL_OTHER_MODELS` list in the [test_generation_models.py](https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_generation_models.py) and run the following command to test it.
For example, if the model is Qwen/Qwen2-1.5B
```
ONLY_RUN=Qwen/Qwen2-1.5B python3 -m unittest test_generation_models.TestGenerationModels.test_others
```
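For context, the requested change would look roughly like the following; the OLMo checkpoint name below is illustrative, and any supported -hf model from the collection works:

```python
ALL_OTHER_MODELS = [
    ModelCase("Qwen/Qwen2-1.5B"),
    ModelCase("Qwen/Qwen2.5-14B-Instruct"),
    ModelCase("HuggingFaceTB/SmolLM-135M-Instruct"),
    # New entry for this PR (example checkpoint name)
    ModelCase("allenai/OLMo-1B-hf"),
]
```

and then run, for example:

```
ONLY_RUN=allenai/OLMo-1B-hf python3 -m unittest test_generation_models.TestGenerationModels.test_others
```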

@janimo (Contributor, Author) commented on Oct 16, 2024:

I had to bump the decode error tolerance for the test to pass. The resulting tokens are all the same.
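A minimal sketch of such a tolerance bump on the test entry, assuming `ModelCase` accepts a per-model `decode_tolerance` field (the field name, value, and checkpoint name here are assumptions, not taken from this PR):

```python
# Illustrative only: loosen the decode tolerance for the OLMo entry,
# since the compared logits differ slightly while the greedy tokens match.
ModelCase("allenai/OLMo-1B-hf", decode_tolerance=1e-1)
```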

@merrymercy merged commit a5114b6 into sgl-project:main on Oct 16, 2024 (6 of 10 checks passed).
@janimo deleted the olmo branch on Oct 16, 2024 at 07:16.
Review comment on the new model's import block:

```python
from torch import nn
from transformers import OlmoConfig
from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.linear import (
```
@zhyncs (Member) commented:
@janimo Could you submit another PR to replace this with SGLang's linear?

@janimo (Contributor, Author) replied:
@zhyncs done #1696
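For reference, the follow-up essentially swaps the import source; a minimal sketch, assuming SGLang exposes equivalent parallel linear layers under `sglang.srt.layers.linear` (the module path and the specific class names below are assumptions, not taken from this PR):

```python
# Before (vllm-provided layers):
# from vllm.model_executor.layers.linear import (
#     MergedColumnParallelLinear,
#     QKVParallelLinear,
#     RowParallelLinear,
# )

# After (SGLang's own layers; assumed module path and class names):
from sglang.srt.layers.linear import (
    MergedColumnParallelLinear,
    QKVParallelLinear,
    RowParallelLinear,
)
```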
