Closed as duplicate of #23133
Description
Describe the issue
Optimizing the model and running it with the CUDA execution provider produces outputs that differ from the original (unoptimized) model. Running the same comparison with the CPU execution provider produces consistent results.
- Actual Behavior:
AssertionError:
Not equal to tolerance rtol=0.001, atol=0.001
Mismatched elements: 2 / 138 (1.45%)
Max absolute difference: 32
Max relative difference: inf
x: array([[32, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],...
y: array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],...
- Expected Behavior:
The optimized model should produce the same results as the original model for all outputs, within the specified tolerance (rtol=1e-3, atol=1e-3).
To reproduce
- Download the model
- Run the following script:
import numpy as np
import onnxruntime as ort
from onnxruntime.transformers import optimizer

model_path = "9256.onnx"
optimized_model_path = "./opt.onnx"

input_data = {
    "v8_0": np.array([[[[0.5576], [0.4236]]]], dtype=np.float16),
    "v7_0": np.array([[[[0.1953]]], [[[0.94]]], [[[0.807]]]], dtype=np.float16),
}

# Run the original model with all graph optimizations disabled.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
original_session = ort.InferenceSession(model_path, sess_options, providers=["CUDAExecutionProvider"])
original_output_names = [output.name for output in original_session.get_outputs()]
original_result = original_session.run(original_output_names, input_data)

# Sanity check: two runs of the original model agree with each other.
original_result2 = original_session.run(original_output_names, input_data)
for r1, r2 in zip(original_result, original_result2):
    np.testing.assert_allclose(r1, r2, rtol=1e-3, atol=1e-3)

# Optimize the model offline, save it, and run the optimized copy.
optimized_model = optimizer.optimize_model(model_path, opt_level=99)
optimized_model.save_model_to_file(optimized_model_path)
optimized_session = ort.InferenceSession(optimized_model_path, providers=["CUDAExecutionProvider"])
optimized_output_names = [output.name for output in optimized_session.get_outputs()]
optimized_result = optimized_session.run(optimized_output_names, input_data)

# Compare original vs. optimized outputs -- this assertion fails on CUDA.
for r1, r2 in zip(original_result, optimized_result):
    np.testing.assert_allclose(r1, r2, rtol=1e-3, atol=1e-3)
Note:
- providers=["CUDAExecutionProvider"] -> inconsistent outputs
- providers=["CPUExecutionProvider"] -> outputs match
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response