
Error during compiling the LLama example #950

Open · umechand-amd opened this issue Feb 11, 2025 · 14 comments

@umechand-amd

Hello, I am trying to run the llama example from https://github.com/nod-ai/shark-ai/blob/main/docs/shortfin/llm/user/llama_serving.md
I was able to export the model, but compiling it failed with the following error.

iree-compile $MLIR_PATH \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o $VMFB_PATH
/data/export/model.mlir:1020:12: error: failed to legalize operation 'torch.aten.outer'
%347 = torch.aten.outer %338, %346 : !torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32> -> !torch.vtensor<[131072,128],f32>
^
/data/export/model.mlir:1020:12: note: see current operation: %1006 = "torch.aten.outer"(%995, %1005) : (!torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32>) -> !torch.vtensor<[131072,128],f32>
/data/export/model.mlir:18411:12: error: failed to legalize operation 'torch.aten.outer'
%381 = torch.aten.outer %372, %380 : !torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32> -> !torch.vtensor<[131072,128],f32>
^
/data/export/model.mlir:18411:12: note: see current operation: %1019 = "torch.aten.outer"(%1008, %1018) : (!torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32>) -> !torch.vtensor<[131072,128],f32>

@umechand-amd (Author)

I uninstalled torch and reinstalled it from the requirements file pytorch-rocm-requirements.txt. After that I could run the llama example, but I got the same error when running the sharded model.
I also got the same error when I tried to run the Mistral 8B model.
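
Roughly, the reinstall looked like the following sketch (the exact package list and the location of pytorch-rocm-requirements.txt in the shark-ai checkout are assumptions):

# Drop the existing torch wheels before switching to ROCm builds (package list is a guess).
pip uninstall -y torch torchvision torchaudio pytorch-triton-rocm
# Reinstall torch from the ROCm requirements file shipped with shark-ai (path assumed).
pip install -r pytorch-rocm-requirements.txt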

@stbaione (Contributor)

Could you please post the output of pip freeze? It would help to know which software versions you are using.

@stbaione (Contributor)

Also, could you post the exact commands you used to export and compile, for both the sharded and unsharded Llama?

@umechand-amd (Author)

(3.11.venv) root@66986f7f479e:/data# pip freeze
aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
dataclasses-json==0.6.7
datasets==3.0.1
dill==0.3.8
fastapi==0.115.8
filelock==3.13.1
frozenlist==1.5.0
fsspec==2024.6.1
gguf==0.14.0
h11==0.14.0
huggingface-hub==0.22.2
idna==3.10
iree-base-compiler==3.1.0
iree-base-runtime==3.1.0
iree-compiler==20241104.1068
iree-turbine==3.1.0
Jinja2==3.1.4
MarkupSafe==2.1.5
marshmallow==3.26.1
ml_dtypes==0.5.1
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.3
numpy==2.1.2
packaging==24.2
pandas==2.2.3
pillow==11.0.0
propcache==0.2.1
pyarrow==19.0.0
pydantic==2.10.6
pydantic_core==2.27.2
python-dateutil==2.9.0.post0
pytorch-triton-rocm==3.2.0
pytz==2025.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sentencepiece==0.2.0
shark-ai==3.1.0
sharktank==3.1.0
shortfin==3.1.0
six==1.17.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.1
tokenizers==0.19.1
torch==2.6.0+rocm6.2.4
torchaudio==2.6.0+rocm6.2.4
torchvision==0.21.0+rocm6.2.4
tqdm==4.67.1
transformers==4.40.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
xxhash==3.5.0
yarl==1.18.3
(3.11.venv) root@66986f7f479e:/data#


@ScottTodd (Member)

Can you confirm what iree-compile --version says? I see you have the old iree-compiler package installed along with the newer iree-base-compiler package:

iree-base-compiler==3.1.0
iree-base-runtime==3.1.0
iree-compiler==20241104.1068

We recommend uninstalling iree-compiler first, or better yet creating a fresh venv. https://iree.dev/reference/bindings/python/
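
A minimal sketch of that cleanup (the venv name is just an example):

# Confirm which compiler build is actually on PATH.
iree-compile --version
# Either remove the deprecated package and keep iree-base-compiler/iree-base-runtime...
pip uninstall -y iree-compiler
# ...or start from a clean venv and install only the current packages.
python -m venv iree.venv && source iree.venv/bin/activate
pip install iree-base-compiler iree-base-runtime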

@stbaione (Contributor) commented Feb 12, 2025

It looks like you're using an old version of shark-ai. The latest release is v3.2.0, which added sharding support (to both shark-ai and iree-base-compiler/iree-base-runtime); sharding was not yet implemented as of v3.1.0.

I just ran the install in a fresh venv and got the correct versions:

aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
dataclasses-json==0.6.7
datasets==3.2.0
dill==0.3.8
einops==0.8.1
fastapi==0.115.8
filelock==3.17.0
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.14.0
h11==0.14.0
huggingface-hub==0.28.1
idna==3.10
iree-base-compiler==3.2.0
iree-base-runtime==3.2.0
iree-turbine==3.2.0
Jinja2==3.1.5
MarkupSafe==3.0.2
marshmallow==3.26.1
ml_dtypes==0.5.1
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
mypy-extensions==1.0.0
numpy==2.2.2
packaging==24.2
pandas==2.2.3
pillow==11.1.0
propcache==0.2.1
pyarrow==19.0.0
pydantic==2.10.6
pydantic_core==2.27.2
python-dateutil==2.9.0.post0
pytz==2025.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sentencepiece==0.2.0
shark-ai==3.2.0
sharktank==3.2.0
shortfin==3.2.0
six==1.17.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.3
tokenizers==0.21.0
tqdm==4.67.1
transformers==4.48.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
xxhash==3.5.0
yarl==1.18.3

@umechand-amd (Author)

Okay, let me upgrade it.
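
Presumably something like this, pinning the versions listed in the fresh-venv pip freeze above (the exact set of packages to upgrade is an assumption):

pip install --upgrade shark-ai==3.2.0 sharktank==3.2.0 shortfin==3.2.0 \
    iree-base-compiler==3.2.0 iree-base-runtime==3.2.0 iree-turbine==3.2.0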

@stbaione (Contributor)

Did this resolve your issue?

@umechand-amd (Author) commented Feb 17, 2025

Upgrading shark-ai to 3.2.0 fixed the iree-compiler issue with the sharded model, but when I checked the output of the query it was complete gibberish, worse than the output mentioned in iree-org/iree#19948.

I am still not able to run the mistral_7b_q8_0_gguf model, though.

@stbaione (Contributor)

> Upgrading shark-ai to 3.2.0 fixed the iree-compiler issue with the sharded model, but when I checked the output of the query it was complete gibberish, worse than the output mentioned in iree-org/iree#19948.
>
> I am still not able to run the mistral_7b_q8_0_gguf model, though.

Sounds good. We have identified the IREE commit that caused the regression, but we're still working on nailing down how the underlying streams misbehave and produce the incorrect sharded-model output, so that is still ongoing.

Unless I'm mistaken, we currently only support Llama models for the LLM server.

@ScottTodd (Member)

Another report of this: https://discord.com/channels/689900678990135345/1245423631626932264/1349738088816971819

Maybe related to torch 2.6.0+? Can anyone else repro? We can also add a lowering for torch.aten.outer if that is all that is needed.

@stbaione (Contributor)

Taking a look now.

@stbaione (Contributor)

Created a fresh environment and was able to successfully export, compile, start the server, and serve a request by copying and pasting the instructions from the llama_serving doc.

From pip freeze:

iree-base-compiler==3.2.0
iree-base-runtime==3.2.0
iree-turbine==3.2.0
shark-ai==3.2.0
sharktank==3.2.0
shortfin==3.2.0
torch==2.5.1+cpu

Will try again with torch >= 2.6.0.
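
Likely via the PyTorch CPU wheel index, something like the following (index URL is an assumption):

pip install --upgrade "torch>=2.6.0" --index-url https://download.pytorch.org/whl/cpu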
