
[HuggingFace] Compilation of vision models fails for batch_size > 1 #1080

JingyaHuang opened this issue Jan 6, 2025 · 7 comments

@JingyaHuang

Hi team! We also observed some issues when compiling vision models (Swin and Donut) with batch_size > 1 after bumping to Neuron SDK 2.21.0:

***** Compiling tiny-random-SwinModel *****
Using Neuron: --optlevel 2
..root = neuronxcc/starfish/penguin/targets/codegen/BirCodeGenLoop.py
root = neuronxcc/starfish/penguin/targets/codegen
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

[TEN404] (_divide.1171) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[4],+,0}[2],+,4}[16],+,0}[2] - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
***** Compiling swin-tiny-patch4-window7-224 *****
Using Neuron: --optlevel 2
....root = neuronxcc/starfish/penguin/targets/transforms/SplitAPUnionSets.py
root = neuronxcc/starfish/penguin/targets/transforms
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

[TEN404] Internal tensorizer error: SplitAPUnionSets:Unsupported batch-norm-training op: tensor_op_name: _batch-norm-training.1761 | hlo_id: 1761 | . Add --internal-hlo2tensorizer-options=--expand-batch-norm-training  to compiler args to workaround the problem. - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

To reproduce:

  • tiny
import requests
from PIL import Image
from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

# Create the feature extractor and model
model_id = "hf-internal-testing/tiny-random-SwinModel"
feature_extractor = AutoImageProcessor.from_pretrained(model_id)
model = NeuronModelForImageClassification.from_pretrained(model_id, export=True, batch_size=2)
save_directory = "tiny_swin_neuron"
model.save_pretrained(save_directory)

(The compilation is successful with batch_size=1 though)

  • regular
import requests
from PIL import Image
from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

# Create the feature extractor and model
model_id = "microsoft/swin-tiny-patch4-window7-224"
feature_extractor = AutoImageProcessor.from_pretrained(model_id)
model = NeuronModelForImageClassification.from_pretrained(model_id, export=True, batch_size=2)
save_directory = "swin_neuron"
model.save_pretrained(save_directory)
@karthickgopalswamy
Contributor

Thanks for filing the issue. We will take a look and get back to you soon.

@karthickgopalswamy karthickgopalswamy self-assigned this Jan 8, 2025
@karthickgopalswamy karthickgopalswamy added the bug (Something isn't working) and compiler labels Jan 8, 2025
@0x6b64
Contributor

0x6b64 commented Jan 9, 2025

The model can be compiled successfully (for BS > 1) by including the --model-type=transformer flag in the compilation command. Unfortunately, the optimum library doesn't have a clean way for the user to specify Neuron compiler parameters (specifically for the Neuron trace flow; in the Neuron lazy-tensor flow one can set NEURON_CC_FLAGS, but AFAICT those get overwritten by whatever comes from the compiler_args parameter in the trace flow). It is, however, possible to patch a function in the neuron.convert module that prepares the compiler args prior to triggering compilation.

Sample code is attached below and can be used to successfully compile the model.

import optimum.exporters.neuron.convert

def patch_compiler_flags(config, compiler_args):
    # Force the transformer model type for every traced model.
    compiler_args.append("--model-type=transformer")
    return compiler_args

# Monkey-patch the hook that prepares compiler args before compilation is triggered.
optimum.exporters.neuron.convert.add_stable_diffusion_compiler_args = patch_compiler_flags

from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

# Create the feature extractor and model
model_id = "hf-internal-testing/tiny-random-SwinModel"
feature_extractor = AutoImageProcessor.from_pretrained(model_id)
model = NeuronModelForImageClassification.from_pretrained(model_id, export=True, batch_size=2)
save_directory = "tiny_swin_neuron"
model.save_pretrained(save_directory)

This was tested on artifacts from the latest Neuron release container: public.ecr.aws/neuron/pytorch-inference-neuronx:2.5.1-neuronx-py310-sdk2.21.0-ubuntu22.04

@dacorvo

dacorvo commented Jan 10, 2025

@0x6b64 thank you for looking into this issue.

The reason optimum-neuron modifies the compiler args when tracing SD/SDXL models is that these flags were previously recommended for compiling those models.
Can you confirm that, as of SDK 2.21, the unet model must be compiled using the --model-type=transformer flag (previously --model-type=unet-inference)?
Also, do these models support the --enable-fast-loading-neuron-binaries flag, or should we drop it?

@jimburtoft
Contributor

Just to +1: I have another issue with Swin models where, for Neuron to compile inputs larger than 32x32, a special compiler arg needs to be passed in. In my case I use the following (a full call is sketched just below):

compiler_args="""--internal-hlo2tensorizer-options='--expand-batch-norm-training'"""

(and apparently the triple quotes are important).

An "append to compiler_args" option would be nice.

@jimburtoft
Contributor

jimburtoft commented Jan 10, 2025

@0x6b64 I think there are two issues, and --model-type=transformer only fixes one. The 224 model uses a higher resolution and seems to need the flag above.

224 is the model I used on the internal ticket. I tested the code there (which doesn't use optimum-neuron) with --model-type=transformer and replicated the OTHER error message.

>>> model_neuron = torch_neuronx.trace(model, example_inputs = img_lq,
...     compiler_args="--model-type=transformer")
..root = neuronxcc/starfish/penguin/targets/codegen/BirCodeGenLoop.py
root = neuronxcc/starfish/penguin/targets/codegen
root = neuronxcc/starfish/penguin/targets
root = neuronxcc/starfish/penguin
root = neuronxcc/starfish

[TEN404] (_divide.2158) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[7],+,28}[7],+,7}[4],+,0}[2] - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 589, in trace
    neff_filename, metaneff, flattener, packer, weights = _trace(
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 654, in _trace
    neff_artifacts = generate_neff(
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 506, in generate_neff
    neff_filename = hlo_compile(
  File "/opt/aws_neuronx_venv_pytorch_2_5_transformers/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 396, in hlo_compile
    raise RuntimeError(f"neuronx-cc failed with {status}")
RuntimeError: neuronx-cc failed with 70
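
For what it's worth, an untested sketch combining both workarounds in a single compiler_args string, assuming the flags can simply be space-separated; whether this clears the "Too many strides" error above is still an open question:

model_neuron = torch_neuronx.trace(
    model,
    example_inputs=img_lq,
    # Assumption: both flags are accepted together; --model-type=transformer addresses
    # the batch_size > 1 failure and the tensorizer option the batch-norm-training error.
    compiler_args="""--model-type=transformer --internal-hlo2tensorizer-options='--expand-batch-norm-training'""",
)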

@JingyaHuang
Author

JingyaHuang commented Jan 14, 2025

@0x6b64 In Optimum-Neuron, we only allow users to configure the arguments that directly impact model performance.
Certainly, there are flags (e.g. --model-type) that need to be passed specifically to ensure successful compilation, and we hoped these could be pre-configured within Optimum. If these flags change frequently and affect the stability of Optimum-Neuron, we should allow users to modify them manually.
cc @dacorvo
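
Until such an option exists, one possible stopgap is a small variant of the monkey-patch shown earlier in this thread. This is a hypothetical sketch, not an Optimum-Neuron API: the name `extra_flags` is illustrative, and it reuses the `add_stable_diffusion_compiler_args` hook from the earlier workaround.

import optimum.exporters.neuron.convert

# Extra flags the user wants to inject; the list contents are purely illustrative.
extra_flags = ["--model-type=transformer"]

def patch_compiler_flags(config, compiler_args):
    # Replaces the stock hook, mirroring the workaround earlier in this thread,
    # and appends the user-supplied flags to the compiler args.
    compiler_args.extend(extra_flags)
    return compiler_args

optimum.exporters.neuron.convert.add_stable_diffusion_compiler_args = patch_compiler_flags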

@0x6b64
Contributor

0x6b64 commented Jan 15, 2025

Thanks @dacorvo @JingyaHuang for all of the context!

To answer your questions:

  1. Can you confirm that, as of SDK 2.21, the unet model must be compiled using the --model-type=transformer flag (previously --model-type=unet-inference)?

Without revealing too many internals, the short answer is that it shouldn't matter. But you're right, a UNet-based architecture should use the unet-inference model type (which I believe will also work here).

  2. Do these models support the --enable-fast-loading-neuron-binaries flag, or should we drop it?

We should continue to use --enable-fast-loading-neuron-binaries. I didn't add it to the aforementioned example, but it does exactly what its name suggests. :)
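
Concretely, a sketch of the earlier patch extended to keep that flag as well; same caveats as before, this simply overrides the add_stable_diffusion_compiler_args hook and appends the flags:

import optimum.exporters.neuron.convert

def patch_compiler_flags(config, compiler_args):
    # Work around the batch_size > 1 failure and keep fast loading of the
    # compiled binaries, per the recommendation above.
    compiler_args.append("--model-type=transformer")
    compiler_args.append("--enable-fast-loading-neuron-binaries")
    return compiler_args

optimum.exporters.neuron.convert.add_stable_diffusion_compiler_args = patch_compiler_flags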
