
[HuggingFace] Compilation of vision models fails for batch_size > 1 #1080

JingyaHuang opened this issue Jan 6, 2025 · 7 comments

@JingyaHuang

Hi team! We also observed some issues when compiling vision models (Swin and Donut) with batch_size > 1 after bumping to Neuron SDK 2.21.0:

***** Compiling tiny-random-SwinModel *****
Using Neuron: --optlevel 2
..root = neuronxcc/starfish/penguin/targets/codegen/BirCodeGenLoop.py
root = neuronxcc/starfish/penguin/targets/codegen
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

[TEN404] (_divide.1171) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[4],+,0}[2],+,4}[16],+,0}[2] - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
***** Compiling swin-tiny-patch4-window7-224 *****
Using Neuron: --optlevel 2
....root = neuronxcc/starfish/penguin/targets/transforms/SplitAPUnionSets.py
root = neuronxcc/starfish/penguin/targets/transforms
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

[TEN404] Internal tensorizer error: SplitAPUnionSets:Unsupported batch-norm-training op: tensor_op_name: _batch-norm-training.1761 | hlo_id: 1761 | . Add --internal-hlo2tensorizer-options=--expand-batch-norm-training  to compiler args to workaround the problem. - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

To reproduce:

  • tiny
import requests
from PIL import Image
from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

# Create the feature extractor and model
model_id = "hf-internal-testing/tiny-random-SwinModel"
feature_extractor = AutoImageProcessor.from_pretrained(model_id)
model = NeuronModelForImageClassification.from_pretrained(model_id, export=True, batch_size=2)
save_directory = "tiny_swin_neuron"
model.save_pretrained(save_directory)

(The compilation is successful with batch_size=1 though)

  • regular
import requests
from PIL import Image
from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

# Create the feature extractor and model
model_id = "microsoft/swin-tiny-patch4-window7-224"
feature_extractor = AutoImageProcessor.from_pretrained(model_id)
model = NeuronModelForImageClassification.from_pretrained(model_id, export=True, batch_size=2)
save_directory = "swin_neuron"
model.save_pretrained(save_directory)
@karthickgopalswamy
Contributor

Thanks for filing the issue. We will take a look and get back to you soon.

@karthickgopalswamy karthickgopalswamy self-assigned this Jan 8, 2025
@karthickgopalswamy karthickgopalswamy added the bug (Something isn't working) and compiler labels Jan 8, 2025
@0x6b64
Contributor

0x6b64 commented Jan 9, 2025

The model can be compiled successfully (for BS > 1) by including the --model-type=transformer flag in the compilation command. Unfortunately, the optimum library doesn't have a clean way for the user to specify Neuron compiler parameters (specifically for the Neuron trace flow; in the Neuron lazy-tensor flow one can set NEURON_CC_FLAGS, but AFAICT those get overwritten by whatever comes from the compiler_args parameter in the trace flow). It is, however, possible to patch a function in the neuron.convert module that prepares the compiler args prior to triggering compilation.

Sample code is attached below and can be used to successfully compile the model.

import optimum.exporters.neuron.convert

def patch_compiler_flags(config, compiler_args):
    # Force the transformer model type for every traced model.
    compiler_args.append("--model-type=transformer")
    return compiler_args

# Monkey-patch the hook that prepares compiler args before compilation is triggered.
optimum.exporters.neuron.convert.add_stable_diffusion_compiler_args = patch_compiler_flags

from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

# Create the feature extractor and model
model_id = "hf-internal-testing/tiny-random-SwinModel"
feature_extractor = AutoImageProcessor.from_pretrained(model_id)
model = NeuronModelForImageClassification.from_pretrained(model_id, export=True, batch_size=2)
save_directory = "tiny_swin_neuron"
model.save_pretrained(save_directory)

This was tested on artifacts from the latest Neuron release container: public.ecr.aws/neuron/pytorch-inference-neuronx:2.5.1-neuronx-py310-sdk2.21.0-ubuntu22.04

@dacorvo

dacorvo commented Jan 10, 2025

@0x6b64 thank you for looking into this issue.

The reason optimum-neuron modifies the compiler args when tracing SD/SDXL models is that these flags were previously recommended for compiling those models.
Can you confirm that, as of SDK 2.21, the unet model must be compiled using the --model-type=transformer flag (previously --model-type=unet-inference)?
Also, do these models support the --enable-fast-loading-neuron-binaries flag, or should we drop it?

@jimburtoft
Contributor

Just to +1: I have another issue with Swin models where, for Neuron to compile inputs larger than 32x32, a special compiler arg needs to be passed in. In my case I use the following (a full call is sketched just below):

compiler_args="""--internal-hlo2tensorizer-options='--expand-batch-norm-training'"""

(and apparently the triple quotes are important).

An "append to compiler_args" option would be nice.

@jimburtoft
Contributor

jimburtoft commented Jan 10, 2025

@0x6b64 I think there are two issues, and --model-type=transformer only fixes one. The 224 model uses a higher resolution and seems to need the flag above.

224 is the model I used on the internal ticket. I tested the code there (which doesn't use optimum-neuron) with --model-type=transformer and replicated the OTHER error message.

>>> model_neuron = torch_neuronx.trace(model, example_inputs = img_lq,
...     compiler_args="--model-type=transformer")
..root = neuronxcc/starfish/penguin/targets/codegen/BirCodeGenLoop.py
root = neuronxcc/starfish/penguin/targets/codegen
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

[TEN404] (_divide.2158) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[7],+,28}[7],+,7}[4],+,0}[2] - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 589, in trace
    neff_filename, metaneff, flattener, packer, weights = _trace(
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 654, in _trace
    neff_artifacts = generate_neff(
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 506, in generate_neff
    neff_filename = hlo_compile(
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 396, in hlo_compile
    raise RuntimeError(f"neuronx-cc failed with {status}")
RuntimeError: neuronx-cc failed with 70
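
For what it's worth, an untested sketch combining both workarounds in a single compiler_args string, assuming the flags can simply be space-separated; whether this clears the "Too many strides" error above is still an open question:

model_neuron = torch_neuronx.trace(
    model,
    example_inputs=img_lq,
    # Assumption: both flags are accepted together; --model-type=transformer addresses
    # the batch_size > 1 failure and the tensorizer option the batch-norm-training error.
    compiler_args="""--model-type=transformer --internal-hlo2tensorizer-options='--expand-batch-norm-training'""",
)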

@JingyaHuang
Author

JingyaHuang commented Jan 14, 2025

@0x6b64 In Optimum-Neuron, we only allow users to configure the arguments that directly impact model performance.
Certainly, there are flags (e.g. --model-type) that need to be passed specifically to ensure successful compilation, and we hoped these could be pre-configured within Optimum. If these flags change frequently and affect the stability of Optimum-Neuron, we should allow users to modify them manually.
cc @dacorvo
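
Until such an option exists, one possible stopgap is a small variant of the monkey-patch shown earlier in this thread. This is a hypothetical sketch, not an Optimum-Neuron API: the name `extra_flags` is illustrative, and it reuses the `add_stable_diffusion_compiler_args` hook from the earlier workaround.

import optimum.exporters.neuron.convert

# Extra flags the user wants to inject; the list contents are purely illustrative.
extra_flags = ["--model-type=transformer"]

def patch_compiler_flags(config, compiler_args):
    # Replaces the stock hook, mirroring the workaround earlier in this thread,
    # and appends the user-supplied flags to the compiler args.
    compiler_args.extend(extra_flags)
    return compiler_args

optimum.exporters.neuron.convert.add_stable_diffusion_compiler_args = patch_compiler_flags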

@0x6b64
Contributor

0x6b64 commented Jan 15, 2025

Thanks @dacorvo @JingyaHuang for all of the context!

To answer your questions:

  1. Can you confirm that, as of SDK 2.21, the unet model must be compiled using the --model-type=transformer flag (previously --model-type=unet-inference)?

Without revealing too many internals, the short answer is that it shouldn't matter. But you're right, a UNet-based architecture should use the unet-inference model type (which I believe will also work here).

  2. Do these models support the --enable-fast-loading-neuron-binaries flag, or should we drop it?

We should continue to use --enable-fast-loading-neuron-binaries. I didn't add it to the aforementioned example, but it does exactly what its name suggests. :)
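
Concretely, a sketch of the earlier patch extended to keep that flag as well; same caveats as before, this simply overrides the add_stable_diffusion_compiler_args hook and appends the flags:

import optimum.exporters.neuron.convert

def patch_compiler_flags(config, compiler_args):
    # Work around the batch_size > 1 failure and keep fast loading of the
    # compiled binaries, per the recommendation above.
    compiler_args.append("--model-type=transformer")
    compiler_args.append("--enable-fast-loading-neuron-binaries")
    return compiler_args

optimum.exporters.neuron.convert.add_stable_diffusion_compiler_args = patch_compiler_flags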
