
Error during compiling the LLama example #950

Open · umechand-amd opened this issue Feb 11, 2025 · 14 comments

@umechand-amd

Hello, I am trying to run the llama example from https://github.com/nod-ai/shark-ai/blob/main/docs/shortfin/llm/user/llama_serving.md
I was able to export the model, but compiling it failed with the following error.

iree-compile $MLIR_PATH \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o $VMFB_PATH
/data/export/model.mlir:1020:12: error: failed to legalize operation 'torch.aten.outer'
%347 = torch.aten.outer %338, %346 : !torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32> -> !torch.vtensor<[131072,128],f32>
^
/data/export/model.mlir:1020:12: note: see current operation: %1006 = "torch.aten.outer"(%995, %1005) : (!torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32>) -> !torch.vtensor<[131072,128],f32>
/data/export/model.mlir:18411:12: error: failed to legalize operation 'torch.aten.outer'
%381 = torch.aten.outer %372, %380 : !torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32> -> !torch.vtensor<[131072,128],f32>
^
/data/export/model.mlir:18411:12: note: see current operation: %1019 = "torch.aten.outer"(%1008, %1018) : (!torch.vtensor<[131072],si64>, !torch.vtensor<[128],f32>) -> !torch.vtensor<[131072,128],f32>

@umechand-amd (Author)

I uninstalled torch and reinstalled it from the requirements file pytorch-rocm-requirements.txt. After that I could run the llama example, but I got the same error when running the sharded model.
I also got the same error when I tried to run the Mistral 8B model.
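
Roughly, the reinstall looked like the following sketch (the exact package list and the location of pytorch-rocm-requirements.txt in the shark-ai checkout are assumptions):

# Drop the existing torch wheels before switching to ROCm builds (package list is a guess).
pip uninstall -y torch torchvision torchaudio pytorch-triton-rocm
# Reinstall torch from the ROCm requirements file shipped with shark-ai (path assumed).
pip install -r pytorch-rocm-requirements.txt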

@stbaione (Contributor)

Could you please post the output of pip freeze? It would help to know which software versions you are using.

@stbaione (Contributor)

Also, could you post the exact commands you used to export and compile, for both the sharded and unsharded Llama?

@umechand-amd (Author)

(3.11.venv) root@66986f7f479e:/data# pip freeze
aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
dataclasses-json==0.6.7
datasets==3.0.1
dill==0.3.8
fastapi==0.115.8
filelock==3.13.1
frozenlist==1.5.0
fsspec==2024.6.1
gguf==0.14.0
h11==0.14.0
huggingface-hub==0.22.2
idna==3.10
iree-base-compiler==3.1.0
iree-base-runtime==3.1.0
iree-compiler==20241104.1068
iree-turbine==3.1.0
Jinja2==3.1.4
MarkupSafe==2.1.5
marshmallow==3.26.1
ml_dtypes==0.5.1
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.3
numpy==2.1.2
packaging==24.2
pandas==2.2.3
pillow==11.0.0
propcache==0.2.1
pyarrow==19.0.0
pydantic==2.10.6
pydantic_core==2.27.2
python-dateutil==2.9.0.post0
pytorch-triton-rocm==3.2.0
pytz==2025.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sentencepiece==0.2.0
shark-ai==3.1.0
sharktank==3.1.0
shortfin==3.1.0
six==1.17.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.1
tokenizers==0.19.1
torch==2.6.0+rocm6.2.4
torchaudio==2.6.0+rocm6.2.4
torchvision==0.21.0+rocm6.2.4
tqdm==4.67.1
transformers==4.40.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
xxhash==3.5.0
yarl==1.18.3
(3.11.venv) root@66986f7f479e:/data#


@ScottTodd (Member)

Can you confirm what iree-compile --version says? I see you have the old iree-compiler package installed along with the newer iree-base-compiler package:

iree-base-compiler==3.1.0
iree-base-runtime==3.1.0
iree-compiler==20241104.1068

We recommend uninstalling iree-compiler first, or better yet creating a fresh venv. https://iree.dev/reference/bindings/python/
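
A minimal sketch of that cleanup (the venv name is just an example):

# Confirm which compiler build is actually on PATH.
iree-compile --version
# Either remove the deprecated package and keep iree-base-compiler/iree-base-runtime...
pip uninstall -y iree-compiler
# ...or start from a clean venv and install only the current packages.
python -m venv iree.venv && source iree.venv/bin/activate
pip install iree-base-compiler iree-base-runtime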

@stbaione (Contributor) commented Feb 12, 2025

It looks like you're using an old version of shark-ai. The latest release is v3.2.0, which added sharding support (to both shark-ai and iree-base-compiler/iree-base-runtime); sharding was not yet implemented as of v3.1.0.

I just ran the install in a fresh venv and got the correct versions:

aiohappyeyeballs==2.4.6
aiohttp==3.11.12
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
attrs==25.1.0
certifi==2025.1.31
charset-normalizer==3.4.1
click==8.1.8
dataclasses-json==0.6.7
datasets==3.2.0
dill==0.3.8
einops==0.8.1
fastapi==0.115.8
filelock==3.17.0
frozenlist==1.5.0
fsspec==2024.9.0
gguf==0.14.0
h11==0.14.0
huggingface-hub==0.28.1
idna==3.10
iree-base-compiler==3.2.0
iree-base-runtime==3.2.0
iree-turbine==3.2.0
Jinja2==3.1.5
MarkupSafe==3.0.2
marshmallow==3.26.1
ml_dtypes==0.5.1
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
mypy-extensions==1.0.0
numpy==2.2.2
packaging==24.2
pandas==2.2.3
pillow==11.1.0
propcache==0.2.1
pyarrow==19.0.0
pydantic==2.10.6
pydantic_core==2.27.2
python-dateutil==2.9.0.post0
pytz==2025.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sentencepiece==0.2.0
shark-ai==3.2.0
sharktank==3.2.0
shortfin==3.2.0
six==1.17.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.3
tokenizers==0.21.0
tqdm==4.67.1
transformers==4.48.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
xxhash==3.5.0
yarl==1.18.3

@umechand-amd (Author)

Okay, let me upgrade it.
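
Presumably something like this, pinning the versions listed in the fresh-venv pip freeze above (the exact set of packages to upgrade is an assumption):

pip install --upgrade shark-ai==3.2.0 sharktank==3.2.0 shortfin==3.2.0 \
    iree-base-compiler==3.2.0 iree-base-runtime==3.2.0 iree-turbine==3.2.0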

@stbaione (Contributor)

Did this resolve your issue?

@umechand-amd (Author) commented Feb 17, 2025

Upgrading shark-ai to 3.2.0 fixed the iree-compiler issue with the sharded model, but when I checked the output of the query it was complete gibberish, worse than the output mentioned in iree-org/iree#19948.

I am still not able to run the mistral_7b_q8_0_gguf model, though.

@stbaione (Contributor)

> Upgrading shark-ai to 3.2.0 fixed the iree-compiler issue with the sharded model, but when I checked the output of the query it was complete gibberish, worse than the output mentioned in iree-org/iree#19948.
>
> I am still not able to run the mistral_7b_q8_0_gguf model, though.

Sounds good. We have identified the IREE commit that caused the regression, but we're still working on nailing down how the underlying streams misbehave and produce the incorrect sharded-model output, so that is still ongoing.

Unless I'm mistaken, we currently only support Llama models for the LLM server.

@ScottTodd (Member)

Another report of this: https://discord.com/channels/689900678990135345/1245423631626932264/1349738088816971819

Maybe related to torch 2.6.0+? Can anyone else repro? We can also add a lowering for torch.aten.outer if that is all that is needed.

@stbaione (Contributor)

Taking a look now.

@stbaione (Contributor)

Created a fresh environment and was able to successfully export, compile, start the server, and serve a request by copying and pasting the instructions from the llama_serving doc.

From pip freeze:

iree-base-compiler==3.2.0
iree-base-runtime==3.2.0
iree-turbine==3.2.0
shark-ai==3.2.0
sharktank==3.2.0
shortfin==3.2.0
torch==2.5.1+cpu

Will try again with torch >= 2.6.0.
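
Likely via the PyTorch CPU wheel index, something like the following (index URL is an assumption):

pip install --upgrade "torch>=2.6.0" --index-url https://download.pytorch.org/whl/cpu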
