Error while compiling the Llama example #950
I uninstalled torch and reinstalled it from the requirements file pytorch-rocm-requirements.txt. I could run the llama example, but I got the same error when running the sharded model.
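For reference, a minimal sketch of that reinstall sequence, assuming the requirements file sits in the current directory (the file name is taken from the comment above; the ROCm index pinned inside it is whatever the repo specifies):

```bash
# Remove the existing torch wheel, then install the ROCm build
# pinned by the repository's requirements file.
pip uninstall -y torch
pip install -r pytorch-rocm-requirements.txt
```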
Could you please post the output from `pip freeze`?
Also, could you post the exact commands that you used to export and compile, for both the sharded and unsharded llama?
(3.11.venv) root@66986f7f479e:/data# pip freeze
The commands are exactly the ones from the GitHub page.
Can you confirm what versions you have installed? We recommend uninstalling the old packages and reinstalling.
It looks like you're using an old version of the compiler packages. The `torch.aten.outer` lowering was not implemented yet as of that version. Just ran the install in a fresh venv and got the correct versions:

```
aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
dataclasses-json==0.6.7
datasets==3.2.0
dill==0.3.8
einops==0.8.1
fastapi==0.115.8
filelock==3.17.0
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.14.0
h11==0.14.0
huggingface-hub==0.28.1
idna==3.10
iree-base-compiler==3.2.0
iree-base-runtime==3.2.0
iree-turbine==3.2.0
Jinja2==3.1.5
MarkupSafe==3.0.2
marshmallow==3.26.1
ml_dtypes==0.5.1
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
mypy-extensions==1.0.0
numpy==2.2.2
packaging==24.2
pandas==2.2.3
pillow==11.1.0
propcache==0.2.1
pyarrow==19.0.0
pydantic==2.10.6
pydantic_core==2.27.2
python-dateutil==2.9.0.post0
pytz==2025.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sentencepiece==0.2.0
shark-ai==3.2.0
sharktank==3.2.0
shortfin==3.2.0
six==1.17.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.3
tokenizers==0.21.0
tqdm==4.67.1
transformers==4.48.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
xxhash==3.5.0
yarl==1.18.3
```
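A minimal sketch of that fresh-environment install, assuming Python 3.11 and the 3.2.0 release line shown in the freeze above:

```bash
# Create and activate a clean Python 3.11 virtual environment.
python3.11 -m venv 3.11.venv
source 3.11.venv/bin/activate

# Installing the umbrella package should pull in matching
# iree-base-compiler, iree-base-runtime, sharktank, and shortfin.
pip install shark-ai==3.2.0
```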
Okay, let me upgrade it.
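A sketch of that upgrade, assuming the packages were installed from PyPI (the target version is the one shown in the freeze above):

```bash
# Upgrade to the release that carries the torch.aten.outer lowering.
pip install --upgrade shark-ai==3.2.0
```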
Did this resolve your issue?
Upgrading shark-ai to 3.2.0 fixed the iree-compiler issue with the sharded model, but when I checked the output of the query it was complete gibberish, worse than the output mentioned in iree-org/iree#19948. I am still not able to run the mistral_7b_q8_0_gguf model, though.
Sounds good. We have identified the IREE commit that caused the regression, but we are still working on nailing down how the underlying streams are misbehaving and producing the incorrect sharded model output, so that is still ongoing. Unless I'm mistaken, we currently only support Llama models for the LLM server.
Another report of this: https://discord.com/channels/689900678990135345/1245423631626932264/1349738088816971819 Maybe related to torch 2.6.0+? Can anyone else repro? We can also add a lowering for `torch.aten.outer`.
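A quick sketch for checking whether the torch 2.6.0+ theory applies to a given environment (run inside the active venv):

```bash
# Print the installed torch version to test the 2.6.0+ theory.
python -c "import torch; print(torch.__version__)"

# List the IREE / shark / torch packages in the active venv.
pip list | grep -Ei 'iree|shark|torch'
```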
Taking a look now.
Created a fresh environment, and was successfully able to export, compile, start the server, and service a request by copying and pasting the instructions in the llama_serving.md guide.
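For anyone following along, a hypothetical request against the running server; the port, endpoint, and payload shape here are assumptions loosely modeled on the guide linked in the issue body below, not confirmed from it:

```bash
# Endpoint, port, and JSON fields are assumptions; check the
# guide for the exact request shape.
curl http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Name the capital of the United States."}'
```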
Will try again with torch 2.6.0.
Hello, I am trying to run the llama example documented at https://github.com/nod-ai/shark-ai/blob/main/docs/shortfin/llm/user/llama_serving.md. I was able to export the model, but while compiling it I got the following error.
```bash
iree-compile $MLIR_PATH \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o $VMFB_PATH
```
```
/data/export/model.mlir:1020:12: error: failed to legalize operation 'torch.aten.outer'
    %347 = torch.aten.outer %338, %346 : !torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32> -> !torch.vtensor<[131072,128],f32>
           ^
/data/export/model.mlir:1020:12: note: see current operation: %1006 = "torch.aten.outer"(%995, %1005) : (!torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32>) -> !torch.vtensor<[131072,128],f32>
/data/export/model.mlir:18411:12: error: failed to legalize operation 'torch.aten.outer'
    %381 = torch.aten.outer %372, %380 : !torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32> -> !torch.vtensor<[131072,128],f32>
           ^
/data/export/model.mlir:18411:12: note: see current operation: %1019 = "torch.aten.outer"(%1008, %1018) : (!torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32>) -> !torch.vtensor<[131072,128],f32>
```
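A sketch of how to confirm the failing op and the installed compiler version before retrying ($MLIR_PATH is the same variable used in the compile command above):

```bash
# Count occurrences of the op that fails to legalize.
grep -c 'torch.aten.outer' $MLIR_PATH

# Confirm which compiler build is on the PATH; older releases
# predate the torch.aten.outer lowering.
iree-compile --version
pip show iree-base-compiler
```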