From 54b62146ffa82b9bb168d3350671fc0123963fdb Mon Sep 17 00:00:00 2001 From: Jaswanth Gannamaneni Date: Mon, 15 Sep 2025 21:53:02 -0700 Subject: [PATCH 1/4] Draft for Updation of OpenVINO ExecutionProvider documentation --- docs/build/eps.md | 54 +- .../OpenVINO-ExecutionProvider.md | 682 +++++++++++++++--- 2 files changed, 597 insertions(+), 139 deletions(-) diff --git a/docs/build/eps.md b/docs/build/eps.md index 3edacac1b37dc..59cd1ace6e29c 100644 --- a/docs/build/eps.md +++ b/docs/build/eps.md @@ -46,7 +46,7 @@ The onnxruntime code will look for the provider shared libraries in the same loc {: .no_toc } * Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) - * The CUDA execution provider for ONNX Runtime is built and tested with CUDA 11.8, 12.2 and cuDNN 8.9. Check [here](../execution-providers/CUDA-ExecutionProvider.md#requirements) for more version information. + * The CUDA execution provider for ONNX Runtime is built and tested with CUDA 12.x and cuDNN 9. Check [here](../execution-providers/CUDA-ExecutionProvider.md#requirements) for more version information. * The path to the CUDA installation must be provided via the CUDA_HOME environment variable, or the `--cuda_home` parameter. The installation directory should contain `bin`, `include` and `lib` sub-directories. * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found. * The path to the cuDNN installation must be provided via the CUDNN_HOME environment variable, or `--cudnn_home` parameter. In Windows, the installation directory should contain `bin`, `include` and `lib` sub-directories. @@ -110,7 +110,7 @@ See more information on the TensorRT Execution Provider [here](../execution-prov * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and cuDNN, and setup environment variables. * Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html) - * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 10.8. + * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 10.9. * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter. * ONNX Runtime uses [TensorRT built-in parser](https://developer.nvidia.com/tensorrt/download) from `tensorrt_home` by default. * To use open-sourced [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/tree/main) parser instead, add `--use_tensorrt_oss_parser` parameter in build commands below. @@ -123,14 +123,15 @@ See more information on the TensorRT Execution Provider [here](../execution-prov * i.e It's version-matched if assigning `tensorrt_home` with path to TensorRT-10.9 built-in binaries and onnx-tensorrt [10.9-GA branch](https://github.com/onnx/onnx-tensorrt/tree/release/10.9-GA) specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). -### **[Note to ORT 1.21.0 open-sourced parser users]** +### **[Note to ORT 1.21/1.22 open-sourced parser users]** -* ORT 1.21.0 links against onnx-tensorrt 10.8-GA, which requires upcoming onnx 1.18. - * Here's a temporarily fix to preview on onnx-tensorrt 10.8-GA (or newer) when building ORT 1.21.0: +* ORT 1.21/1.22 link against onnx-tensorrt 10.8-GA/10.9-GA, which requires newly released onnx 1.18. 
+ * Here's a temporarily fix to preview on onnx-tensorrt 10.8-GA/10.9-GA when building ORT 1.21/1.22: * Replace the [onnx line in cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/deps.txt#L38) - with `onnx;https://github.com/onnx/onnx/archive/f22a2ad78c9b8f3bd2bb402bfce2b0079570ecb6.zip;324a781c31e30306e30baff0ed7fe347b10f8e3c` + with `onnx;https://github.com/onnx/onnx/archive/e709452ef2bbc1d113faf678c24e6d3467696e83.zip;c0b9f6c29029e13dea46b7419f3813f4c2ca7db8` * Download [this](https://github.com/microsoft/onnxruntime/blob/7b2733a526c12b5ef4475edd47fd9997ebc2b2c6/cmake/patches/onnx/onnx.patch) as raw file and save file to [cmake/patches/onnx/onnx.patch](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/patches/onnx/onnx.patch) (do not copy/paste from browser, as it might alter line break type) - * Build ORT 1.21.0 with trt-related flags above (including `--use_tensorrt_oss_parser`) + * Build ORT with trt-related flags above (including `--use_tensorrt_oss_parser`) + * The [onnx 1.18](https://github.com/onnx/onnx/releases/tag/v1.18.0) is supported by latest ORT main branch. Please checkout main branch and build ORT-TRT with `--use_tensorrt_oss_parser` to enable OSS parser with full onnx 1.18 support. ### Build Instructions {: .no_toc } @@ -234,6 +235,21 @@ These instructions are for the latest [JetPack SDK](https://developer.nvidia.com * For a portion of Jetson devices like the Xavier series, higher power mode involves more cores (up to 6) to compute but it consumes more resource when building ONNX Runtime. Set `--parallel 1` in the build command if OOM happens and system is hanging. +## TensorRT-RTX + +See more information on the NV TensorRT RTX Execution Provider [here](../execution-providers/TensorRTRTX-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and setup environment variables. + * Intall TensorRT for RTX from nvidia.com (TODO: add link when available) + +### Build Instructions +{: .no_toc } +`build.bat --config Release --parallel 32 --build_dir _build --build_shared_lib --use_nv_tensorrt_rtx --tensorrt_home "C:\dev\TensorRT-RTX-1.1.0.3" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9" --cmake_generator "Visual Studio 17 2022" --use_vcpkg` +Replace the --tensorrt_home and --cuda_home with correct paths to CUDA and TensorRT-RTX installations. + ## oneDNN See more information on oneDNN (formerly DNNL) [here](../execution-providers/oneDNN-ExecutionProvider.md). @@ -276,15 +292,15 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p {: .no_toc } 1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.3** for the appropriate OS and target hardware: - * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE). - * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE) + * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE). 
+ * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE) - Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions. + Follow [documentation](https://docs.openvino.ai/2025/index.html) for detailed instructions. - *2024.5 is the current recommended OpenVINO™ version. [OpenVINO™ 2024.5](https://docs.openvino.ai/2024/index.html) is minimal OpenVINO™ version requirement.* + *2025.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2025.0](https://docs.openvino.ai/2025/index.html) is minimal OpenVINO™ version requirement.* 2. Configure the target hardware with specific follow on instructions: - * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#linux) + * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#linux) 3. Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: @@ -296,7 +312,7 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p ``` $ source /setupvars.sh ``` - **Note:** If you are using a dockerfile to use OpenVINO™ Execution Provider, sourcing OpenVINO™ won't be possible within the dockerfile. You would have to explicitly set the LD_LIBRARY_PATH to point to OpenVINO™ libraries location. Refer our [dockerfile](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.openvino). + ### Build Instructions {: .no_toc } @@ -319,7 +335,7 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p * `--use_openvino` builds the OpenVINO™ Execution Provider in ONNX Runtime. * ``: Specifies the default hardware target for building OpenVINO™ Execution Provider. This can be overriden dynamically at runtime with another option (refer to [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md#summary-of-options) for more details on dynamic device selection). Below are the options for different Intel target devices. -Refer to [Intel GPU device naming convention](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#device-naming-convention) for specifying the correct hardware target in cases where both integrated and discrete GPU's co-exist. +Refer to [Intel GPU device naming convention](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#device-naming-convention) for specifying the correct hardware target in cases where both integrated and discrete GPU's co-exist. | Hardware Option | Target Device | | --------------- | ------------------------| @@ -351,8 +367,8 @@ Example's: HETERO:GPU,CPU or AUTO:GPU,CPU or MULTI:GPU,CPU * To enable this feature during build time. 
Use `--use_openvino ` `_NO_PARTITION` ``` -Usage: --use_openvino CPU_FP32_NO_PARTITION or --use_openvino GPU_FP32_NO_PARTITION or - --use_openvino GPU_FP16_NO_PARTITION +Usage: --use_openvino CPU_NO_PARTITION or --use_openvino GPU_NO_PARTITION or + --use_openvino GPU_NO_PARTITION ``` For more information on OpenVINO™ Execution Provider's ONNX Layer support, Topology support, and Intel hardware enabled, please refer to the document [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md) @@ -595,7 +611,7 @@ e.g. ### Linux {: .no_toc } -Currently Linux support is only enabled for AMD Adapable SoCs. Please refer to the guidance [here](../execution-providers/Vitis-AI-ExecutionProvider.md#amd-adaptable-soc-installation) for SoC targets. +Currently Linux support is only enabled for AMD Adapable SoCs. Please refer to the guidance [here](../execution-providers/Vitis-AI-ExecutionProvider.md#installation-for-amd-adaptable-socs) for SoC targets. --- @@ -624,7 +640,7 @@ Dockerfile instructions are available [here](https://github.com/microsoft/onnxru #### Build Phython Wheel -`./build.sh --config Release --build --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm` +`./build.sh --config Release --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm` Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. @@ -653,7 +669,7 @@ Dockerfile instructions are available [here](https://github.com/microsoft/onnxru #### Build Phython Wheel -`./build.sh --config Release --build --build_wheel --parallel --use_rocm --rocm_home /opt/rocm` +`./build.sh --config Release --build_wheel --parallel --use_rocm --rocm_home /opt/rocm` Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 34dca5aba4858..24230df06e353 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -19,22 +19,20 @@ Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution ## Install -Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release. -* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.6 Release](https://github.com/intel/onnxruntime/releases) +Pre-built packages are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release. +* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.8 Release](https://github.com/intel/onnxruntime/releases) * Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/) -* Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20) ## Requirements -ONNX Runtime OpenVINO™ Execution Provider is compatible with three lastest releases of OpenVINO™. + +ONNX Runtime OpenVINO™ Execution Provider is compatible with three latest releases of OpenVINO™. 
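
To confirm which versions are active in your environment, a quick check such as the sketch below can be run before consulting the compatibility table that follows (module and attribute names assume the `onnxruntime-openvino` wheel and the `openvino` Python package are installed):

```python
# Print the ONNX Runtime and OpenVINO versions, and confirm the EP is visible.
import onnxruntime as ort
import openvino as ov

print("ONNX Runtime version:", ort.__version__)
print("OpenVINO version:", ov.get_version())
print("OpenVINOExecutionProvider available:",
      "OpenVINOExecutionProvider" in ort.get_available_providers())
```
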
|ONNX Runtime|OpenVINO™|Notes| |---|---|---| +|1.23.0|2025.3|[Details - Placeholder]()| +|1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)| |1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)| -|1.20.0|2024.4|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.5)| -|1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)| -|1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)| -|1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)| ## Build @@ -42,61 +40,514 @@ For build instructions, please see the [BUILD page](../build/eps.md#openvino). ## Usage -**Set OpenVINO™ Environment for Python** +**Python Package Installation** -Please download onnxruntime-openvino python packages from PyPi.org: +For Python users, install the onnxruntime-openvino package: ``` pip install onnxruntime-openvino ``` +**Set OpenVINO™ Environment Variables** + +To use OpenVINO™ Execution Provider with any programming language (Python, C++, C#), you must set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. + * **Windows** +``` +C:\ \setupvars.bat +``` +* **Linux** +``` +$ source /setupvars.sh +``` +**Note for Linux Python Users:** OpenVINO™ Execution Provider installed from PyPi.org comes with prebuilt OpenVINO™ libs and supports flag CXX11_ABI=0. So there is no need to install OpenVINO™ separately. However, if you need to enable CX11_ABI=1 flag, build ONNX Runtime python wheel packages from source. For build instructions, see the [BUILD page](../build/eps.md#openvino). - To enable OpenVINO™ Execution Provider with ONNX Runtime on Windows it is must to set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. - Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: - ``` - C:\ \setupvars.bat - ``` +**Set OpenVINO™ Environment for C#** -* **Linux** +To use csharp api for openvino execution provider create a custom nuget package. Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install prerequisites for nuget creation. Once prerequisites are installed follow the instructions to [build openvino execution provider](../build/eps.md#openvino) and add an extra flag `--build_nuget` to create nuget packages. Two nuget packages will be created Microsoft.ML.OnnxRuntime.Managed and Intel.ML.OnnxRuntime.Openvino. - OpenVINO™ Execution Provider with Onnx Runtime on Linux, installed from PyPi.org comes with prebuilt OpenVINO™ libs and supports flag CXX11_ABI=0. So there is no need to install OpenVINO™ separately. +## Table of Contents +- [Configuration Options](#configuration-options) +- [Features ](#features ) +- [Examples](#examples) +- [Detailed Descriptions](#detailed-descriptions) - But if there is need to enable CX11_ABI=1 flag of OpenVINO, build Onnx Runtime python wheel packages from source. For build instructions, please see the [BUILD page](../build/eps.md#openvino). 
- OpenVINO™ Execution Provider wheels on Linux built from source will not have prebuilt OpenVINO™ libs so we must set the OpenVINO™ Environment Variable using the full installer package of OpenVINO™: - ``` - $ source /setupvars.sh - ``` +## Configuration Options -**Set OpenVINO™ Environment for C++** -For Running C++/C# ORT Samples with the OpenVINO™ Execution Provider it is must to set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. -Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: - * For Windows run: - ``` - C:\ \setupvars.bat - ``` - * For Linux run: - ``` - $ source /setupvars.sh - ``` - **Note:** If you are using a dockerfile to use OpenVINO™ Execution Provider, sourcing OpenVINO™ won't be possible within the dockerfile. You would have to explicitly set the LD_LIBRARY_PATH to point to OpenVINO™ libraries location. Refer our [dockerfile](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.openvino). +Runtime parameters you set when initializing the OpenVINO Execution Provider to control how inference runs. +### Configuration Table -**Set OpenVINO™ Environment for C#** +| **Key** | **Type** | **Allowable Values** | **Value Type** | **Description** | **Example** | +|---------|----------|---------------------|----------------|-----------------|-------------| +| **device_type** | string | CPU, NPU, GPU, GPU.0, GPU.1, HETERO, MULTI, AUTO | string | [Choose which hardware device to use](#device_type-config-description) | [Examples](#device_type-config-examples) | +| **precision** | string | FP32, FP16, ACCURACY | string | [Set inference precision level](#precision-config-description) | [Examples](#precision-config-examples) | +| **num_of_threads** | string | Any positive integer > 0 | size_t | [Control number of inference threads](#num_of_threads-config-description) | [Examples](#num_of_threads-config-examples) | +| **num_streams** | string | Any positive integer > 0 | size_t | [Set parallel execution streams](#num_streams-config-description) | [Examples](#num_streams-config-examples) | +| **cache_dir** | string | Valid filesystem path | string | [Enable model caching by setting cache directory](#cache_dir-config-description) | [Examples](#cache_dir-config-examples) | +| **load_config** | string | JSON file path | string | [Load custom OpenVINO properties from JSON](#load_config-config-description) | [Examples](#load_config-config-examples) | +| **enable_qdq_optimizer** | string | True/False | boolean | [Enable QDQ optimization for NPU](#enable_qdq_optimizer-config-description) | [Examples](#enable_qdq_optimizer-config-examples) | +| **disable_dynamic_shapes** | string | True/False | boolean | [Convert dynamic models to static shapes](#disable_dynamic_shapes-config-description) | [Examples](#disable_dynamic_shapes-config-examples) | +| **model_priority** | string | LOW, MEDIUM, HIGH, DEFAULT | string | [Configure model resource allocation priority](#model_priority-config-description) | [Examples](#model_priority-config-examples) | +| **reshape_input** | string | input_name[shape_bounds] | string | [Set dynamic shape bounds for NPU models](#reshape_input-config-description) | [Examples](#reshape_input-config-examples) | +| **layout** | string | input_name[layout_format] | string | [Specify input/output tensor layout format](#layout-config-description) | [Examples](#layout-config-examples) | + +Valid Hetero or Multi or Auto Device combinations: `HETERO:,...` +The `device` 
can be any of these devices from this list ['CPU','GPU', 'NPU'] + +A minimum of two DEVICE_TYPE'S should be specified for a valid HETERO, MULTI, or AUTO Device Build. + +Example: HETERO:GPU,CPU AUTO:GPU,CPU MULTI:GPU,CPU + +Deprecated device_type option : CPU_FP32, GPU_FP32, GPU_FP16, NPU_FP16 are no more supported. They will be deprecated in the future release. Kindly upgrade to latest device_type and precision option. + +--- + +## Features + + +Built-in capabilities that the OpenVINO EP provides automatically or can be enabled through configuration. + + +### Features Table + +| **Feature** | **Supported Devices** | **Description** | **How to Enable** | **Example** | +|-------------|----------------------|-----------------|-------------------|-------------| +| **Auto Device Selection** | CPU, GPU, NPU | [Automatically selects optimal device for your model](#auto-device-execution-for-openvino-execution-provider) | Set device_type to AUTO | Examples | +| **Model Caching** | CPU, GPU, NPU | [Saves compiled models for faster subsequent loading](#model-caching) | Set cache_dir option | Examples | +| **Multi-Threading** | All devices | [Thread-safe inference with configurable thread count](#multi-threading-for-openvino-execution-provider) | Automatic/configure with num_of_threads | Examples | +| **Multi-Stream Execution** | All devices | [Parallel inference streams for higher throughput](#multi-streams-for-openvino-execution-provider) | Configure with num_streams | Examples | +| **Heterogeneous Execution** | CPU + GPU/NPU | [Split model execution across multiple devices](#heterogeneous-execution-for-openvino-execution-provider) | Set device_type to HETERO | Examples | +| **Multi-Device Execution** | CPU, GPU, NPU | [Run same model on multiple devices in parallel](#multi-device-execution-for-openvino-execution-provider) | Set device_type to MULTI | Examples | +| **INT8 Quantized Models** | CPU, GPU, NPU | [Support for quantized models with better performance](#support-for-int8-quantized-models) | Automatic for quantized models | Examples | +| **External Weights Support** | All devices | [Load models with weights stored in external files](#support-for-weights-saved-in-external-files) | Automatic detection | [Example](#support-for-weights-saved-in-external-files) | +| **Dynamic Shape Management** | All devices | [Handle models with variable input dimensions](#dynamic-shape-management) | Automatic/use reshape_input for NPU | Examples | +| **Tensor Layout Control** | All devices | [Explicit control over tensor memory layout](#tensor-layout-control) | Set layout option | Examples | +| **QDQ Optimization** | NPU | [Optimize quantized models for NPU performance](#enable-qdq-optimizations-passes) | Set enable_qdq_optimizer | Examples | +| **EP-Weight Sharing** | All devices | [Share weights across multiple inference sessions](#openvino-execution-provider-supports-ep-weight-sharing-across-sessions) | Session configuration | Examples | + + +## Examples + +### Configuration Examples + + +#### device_type Config Examples +```python +Single device +"device_type": "GPU" +"device_type": "NPU" +"device_type": "CPU" +Specific GPU +"device_type": "GPU.1" + +Multi-device configurations +"device_type": "HETERO:GPU,CPU" +"device_type": "MULTI:GPU,CPU" +"device_type": "AUTO:GPU,NPU,CPU" + +Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'AUTO:GPU,NPU,CPU'}] + ) + +Command line 
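# The line below is a command-line (not Python) example: onnxruntime_perf_test
# selects the EP with -e and passes provider options to -i as space-separated
# "key|value" pairs; the same arguments work with the Linux binary.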
+onnxruntime_perf_test.exe -e openvino -i "device_type|GPU" model.onnx +``` + +#### precision Config Examples +```python +"precision": "FP32" +"precision": "FP16" +"precision": "ACCURACY" + + +# Python API + +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'precision': 'FP16'}] +) + + +``` +#### num_of_threads Config Examples +```python +"num_of_threads": "4" +"num_of_threads": "8" +"num_of_threads": "16" + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'CPU', 'num_of_threads': '8'}] +) + +``` + +#### num_streams Config Examples +```python +"num_streams": "1" +"num_streams": "4" +"num_streams": "8 + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'num_streams': '4'}] +) + +``` + +#### cache_dir Config Examples +```python +# Windows +"cache_dir": "C:\\intel\\openvino_cache" + +# Linux +"cache_dir": "/tmp/ov_cache" + +# Relative path +"cache_dir": "./model_cache" + +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] +) + +``` + +#### load_config Config Examples +```python +# JSON file path +"load_config": "config.json" +"load_config": "/path/to/openvino_config.json" +"load_config": "C:\\configs\\gpu_config.json" + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'load_config': 'custom_config.json'}] +) + +# Example JSON content: +{ + "GPU": { + "PERFORMANCE_HINT": "THROUGHPUT", + "EXECUTION_MODE_HINT": "ACCURACY", + "CACHE_DIR": "C:\\gpu_cache" + }, + "NPU": { + "LOG_LEVEL": "LOG_DEBUG" + } +} +# Command line usage +onnxruntime_perf_test.exe -e openvino -i "device_type|NPU load_config|config.json" model.onnx + +``` + +#### enable_qdq_optimizer Config Examples + +```python +"enable_qdq_optimizer": "True" # Enable QDQ optimization for NPU +"enable_qdq_optimizer": "False" # Disable QDQ optimization + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'NPU', 'enable_qdq_optimizer': 'True'}] +) +``` +#### disable_dynamic_shapes Config Examples +```python +"disable_dynamic_shapes": "True" # Convert dynamic to static shapes +"disable_dynamic_shapes": "False" # Keep original dynamic shapes + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'disable_dynamic_shapes': 'True'}] +) + +``` + +#### model_priority Config Examples +```python +"model_priority": "HIGH" # Highest resource priority +"model_priority": "MEDIUM" # Medium resource priority +"model_priority": "LOW" # Lowest resource priority +"model_priority": "DEFAULT" # System default priority + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'model_priority': 'HIGH'}] +) +``` + +#### reshape_input Config Examples +```python +# Command line usage (NPU only) 
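# Shape syntax (illustrative): input_name[d1,d2,...] where each dimension is either
# a fixed value (e.g. 224) or a "min..max" range giving the lower/upper bounds the
# NPU should reserve for that axis; see the option-string and Python examples below.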
+"reshape_input": "data[1,3,60,80..120]" # Dynamic height: 80-120 +"reshape_input": "input[1,3,224,224]" # Fixed shape +"reshape_input": "seq[1,10..50,768]" # Dynamic sequence: 10-50 + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'NPU', 'reshape_input': 'data[1,3,60,80..120]'}] +) +# Command line +onnxruntime_perf_test.exe -e openvino -i "device_type|NPU reshape_input|data[1,3,60,80..120]" model.onnx + +``` + +#### layout Config Examples +```python +# Command line usage +"layout": "data_0[NCHW],prob_1[NC]" # Multiple inputs/outputs +"layout": "input[NHWC]" # Single input +"layout": "data[N?HW]" # Unknown channel dimension + +# Python API +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'NPU', 'layout': 'data_0[NCHW],output[NC]'}] +) + +# Command line +onnxruntime_perf_test.exe -e openvino -i "device_type|NPU layout|data_0[NCHW],prob_1[NC]" model.onnx +``` + +### Feature Examples + +#### auto-device Feature Examples + +```python +# Basic AUTO usage +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'AUTO'}] +) + +# AUTO with device priority +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'AUTO:GPU,NPU,CPU'}] +) + +# Command line +onnxruntime_perf_test.exe -e openvino -i "device_type|AUTO:GPU,CPU" model.onnx + +``` + +#### model-caching Feature Examples +```python +# Enable caching +import onnxruntime as ort +session = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] +) + +# First run: compiles and caches model +# Subsequent runs: loads from cache (much faster) +session1 = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] +) # Slow first time +session2 = ort.InferenceSession( + "model.onnx", + providers=['OpenVINOExecutionProvider'], + provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] +) # Fast second time +``` + +#### multi-threading Feature Examples +```python + +``` + +#### multi-stream-feature Examples +```python + +``` +### Detailed Descriptions +#### Configuration Descriptions +#### device_type Config Description +Specifies which hardware device to run inference on. This is the primary configuration that determines execution target. +Available Options: + +CPU: Intel CPU execution using OpenVINO CPU plugin +NPU: Neural Processing Unit for AI-optimized inference +GPU: Intel GPU acceleration (integrated or discrete) +GPU.0, GPU.1: Specific GPU device selection in multi-GPU systems +AUTO: Automatic device selection based on model characteristics +HETERO: Heterogeneous execution across multiple devices +MULTI: Multi-device parallel execution + +Default Behavior: If not specified, uses the default hardware specified during build time. +#### precision Config Description +Controls the numerical precision used during inference, affecting both performance and accuracy. +Device Support: + +CPU: FP32 +GPU: FP32, FP16, ACCURACY +NPU: FP16 + +ACCURACY Mode: Maintains original model precision without any conversion, ensuring maximum accuracy at potential performance cost. 
+Performance Considerations: FP16 generally provides 2x better performance on GPU/NPU with minimal accuracy loss. + +#### num_of_threads Config Description +Override the default number of inference threads for CPU-based execution. +Default: 8 threads if not specified + +#### num_streams Config Description +Controls the number of parallel inference streams for throughput optimization. +Default: 1 stream (latency-optimized) +Use Cases: + +Single stream (1): Minimize latency for real-time applications +Multiple streams (2-8): Maximize throughput for batch processing +Optimal count: Usually matches number of CPU cores or GPU execution units + +Performance Impact: More streams can improve throughput but may increase memory usage. +#### cache_dir Config Description +Specifies directory path for caching compiled models to improve subsequent load times. +Benefits: Dramatically faster model loading after first compilation + Reduces initialization overhead significantly E + specially beneficial for complex models and frequent restarts +Requirements: Directory must be writable by the application +Sufficient disk space for cached models (can be substantial) +Path must be accessible at runtime + +Supported Devices: CPU, NPU, GPU +#### context Config Description + +Provides OpenCL context for GPU acceleration when OpenVINO EP is built with OpenCL support. +Usage: Pass cl_context address as void pointer converted to string +Availability: Only when compiled with OpenCL flags enabled +Purpose: Integration with existing OpenCL workflows and shared memory management + +#### load_config Config Description +Enables loading custom OpenVINO properties from JSON configuration file during runtime. +JSON Format: + +```python +{ + "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} +} +``` +Validation Rules: + +Invalid property keys: Ignored with warning logged +Invalid property values: Causes exception during execution +Immutable properties: Skipped with warning logged + +Common Properties: + +PERFORMANCE_HINT: "THROUGHPUT", "LATENCY" +EXECUTION_MODE_HINT: "ACCURACY", "PERFORMANCE" +LOG_LEVEL: "LOG_DEBUG", "LOG_INFO", "LOG_WARNING" +CACHE_DIR: Custom cache directory path +INFERENCE_PRECISION_HINT: "f32", "f16" + +Device-Specific Properties: For setting appropriate `"PROPERTY"`, refer to OpenVINO config options for [CPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/cpu-device.html#supported-properties), [GPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#supported-properties), [NPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#supported-features-and-properties) and [AUTO](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html#using-auto). + +#### enable_qdq_optimizer Config Description +Enables Quantize-Dequantize (QDQ) optimization specifically for NPU devices. +Target: NPU devices only +Purpose: Optimizes ORT quantized models by keeping QDQ operations only for supported ops +Benefits: + +Better performance/accuracy with ORT optimizations disabled +NPU-specific quantization optimizations +Reduced computational overhead for quantized models + +#### disable_dynamic_shapes Config Description +Controls whether dynamic input shapes are converted to static shapes at runtime. 
+Options: + +True: Convert dynamic models to static shapes before execution +False: Maintain dynamic shape capabilities (default for most devices) + +Use Cases: +\ +Models with dynamic batch size or sequence length +Devices that perform significantly better with static shapes +Memory optimization scenarios -To use csharp api for openvino execution provider create a custom nuget package. Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install prerequisites for nuget creation. Once prerequisites are installed follow the instructions to [build openvino execution provider](../build/eps.md#openvino) and add an extra flag `--build_nuget` to create nuget packages. Two nuget packages will be created Microsoft.ML.OnnxRuntime.Managed and Microsoft.ML.OnnxRuntime.Openvino. +#### model_priority Config Description +Configures resource allocation priority when multiple models run simultaneously. +Priority Levels: -## Features +HIGH: Maximum resource allocation and priority +MEDIUM: Balanced resource sharing with other models +LOW: Minimal resource allocation, yields to higher priority models +DEFAULT: System-determined priority based on device capabilities -### OpenCL queue throttling for GPU devices +Use Cases: Multi-model deployments, resource-constrained environments. +#### reshape_input Config Description +Allows setting dynamic shape bounds specifically for NPU devices to optimize memory allocation and performance. +Format: input_name[lower_bound..upper_bound] or input_name[fixed_shape] +Device Support: NPU only (other devices handle dynamic shapes automatically) +Purpose: NPU requires shape bounds for optimal memory management with dynamic models +Examples: -Enables [OpenCL queue throttling](https://docs.openvino.ai/2024/api/c_cpp_api/group__ov__runtime__ocl__gpu__prop__cpp__api.html) for GPU devices. Reduces CPU utilization when using GPUs with OpenVINO EP. +data[1,3,224,224..448]: Height can vary from 224 to 448 +sequence[1,10..100,768]: Sequence length from 10 to 100 +batch[1..8,3,224,224]: Batch size from 1 to 8 -### Model caching -OpenVINO™ supports [model caching](https://docs.openvino.ai/2024/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html). +#### layout Config Description +Provides explicit control over tensor layout format for inputs and outputs, enabling performance optimizations. +Standard Layout Characters: + +N: Batch dimension +C: Channel dimension +H: Height dimension +W: Width dimension +D: Depth dimension +T: Time/sequence dimension +?: Unknown/placeholder dimension + +Format: input_name[LAYOUT],output_name[LAYOUT] + + +### Feature Descriptions + +#### Model Caching +OpenVINO™ supports [model caching](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html). Model caching feature is supported on CPU, NPU, GPU along with kernel caching on iGPU, dGPU. @@ -106,9 +557,9 @@ Kernel Caching on iGPU and dGPU: This feature also allows user to save kernel caching as cl_cache files for models with dynamic input shapes. These cl_cache files can be loaded directly onto the iGPU/dGPU hardware device target and inferencing can be performed. -#### Enabling Model Caching via Runtime options using c++/python API's. +#### Enabling Model Caching via Runtime options using C++/python API's. 
-This flow can be enabled by setting the runtime config option 'cache_dir' specifying the path to dump and load the blobs (CPU, NPU, iGPU, dGPU) or cl_cache(iGPU, dGPU) while using the c++/python API'S. +This flow can be enabled by setting the runtime config option 'cache_dir' specifying the path to dump and load the blobs (CPU, NPU, iGPU, dGPU) or cl_cache(iGPU, dGPU) while using the C++/python API'S. Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. @@ -120,7 +571,7 @@ Int8 models are supported on CPU, GPU and NPU. OpenVINO™ Execution Provider now supports ONNX models that store weights in external files. It is especially useful for models larger than 2GB because of protobuf limitations. -See the [OpenVINO™ ONNX Support documentation](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-onnx.html). +See the [OpenVINO™ ONNX Support documentation](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-onnx.html). Converting and Saving an ONNX Model to External Data: Use the ONNX API's.[documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md#converting-and-saving-an-onnx-model-to-external-data). @@ -140,57 +591,34 @@ Note: 3. Install the latest ONNX Python package using pip to run these ONNX Python API's successfully. -### Support for IO Buffer Optimization -To enable IO Buffer Optimization we have to set OPENCL_LIBS, OPENCL_INCS environment variables before build. For IO Buffer Optimization, the model must be fully supported on OpenVINO™ and we must provide in the remote context cl_context void pointer as C++ Configuration Option. We can provide cl::Buffer address as Input using GPU Memory Allocator for input and output. -Example: -```bash -//Set up a remote context -cl::Context _context; -..... -// Set the context through openvino options -std::unordered_map ov_options; -ov_options[context] = std::to_string((unsigned long long)(void *) _context.get()); -..... -//Define the Memory area -Ort::MemoryInfo info_gpu("OpenVINO_GPU", OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemTypeDefault); -//Create a shared buffer , fill in with data -cl::Buffer shared_buffer(_context, CL_MEM_READ_WRITE, imgSize, NULL, &err); -.... -//Cast it to void*, and wrap it as device pointer for Ort::Value -void *shared_buffer_void = static_cast(&shared_buffer); -Ort::Value inputTensors = Ort::Value::CreateTensor( - info_gpu, shared_buffer_void, imgSize, inputDims.data(), - inputDims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT); -``` - -### Multi-threading for OpenVINO™ Execution Provider +### Multi-threading for OpenVINO Execution Provider OpenVINO™ Execution Provider for ONNX Runtime enables thread-safe deep learning inference -### Multi streams for OpenVINO™ Execution Provider +### Multi streams for OpenVINO Execution Provider OpenVINO™ Execution Provider for ONNX Runtime allows multiple stream execution for difference performance requirements part of API 2.0 -### Auto-Device Execution for OpenVINO EP +### Auto-Device Execution for OpenVINO Execution Provider -Use `AUTO:,..` as the device name to delegate selection of an actual accelerator to OpenVINO™. Auto-device internally recognizes and selects devices from CPU, integrated GPU, discrete Intel GPUs (when available) and NPU (when available) depending on the device capabilities and the characteristic of CNN models, for example, precisions. Then Auto-device assigns inference requests to the selected device. 
+Use `AUTO:,..` as the device name to delegate selection of an actual accelerator to OpenVINO™. Auto-device internally recognizes and selects devices from CPU, integrated GPU, discrete Intel GPUs (when available) and NPU (when available) depending on the device capabilities and the characteristic of ONNX models, for example, precisions. Then Auto-device assigns inference requests to the selected device. From the application point of view, this is just another device that handles all accelerators in full system. For more information on Auto-Device plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Auto Device Plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#automatic-device-selection). +[Intel OpenVINO™ Auto Device Plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#automatic-device-selection). -### Heterogeneous Execution for OpenVINO™ Execution Provider +### Heterogeneous Execution for OpenVINO Execution Provider The heterogeneous execution enables computing for inference on one network on several devices. Purposes to execute networks in heterogeneous mode: * To utilize accelerator's power and calculate the heaviest parts of the network on the accelerator and execute unsupported layers on fallback devices like the CPU to utilize all available hardware more efficiently during one inference. For more information on Heterogeneous plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Heterogeneous Plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html). +[Intel OpenVINO™ Heterogeneous Plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html). -### Multi-Device Execution for OpenVINO EP +### Multi-Device Execution for OpenVINO Execution Provider Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. Potential gains are as follows: @@ -198,7 +626,7 @@ Multi-Device plugin automatically assigns inference requests to available comput * More consistent performance, since the devices can now share the inference burden (so that if one device is becoming too busy, another device can take more of the load) For more information on Multi-Device plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#multi-stream-execution). +[Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#multi-stream-execution). ### Export OpenVINO Compiled Blob Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It complies with the ORT session config keys @@ -230,22 +658,69 @@ Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/in Optimizes ORT quantized models for the NPU device to only keep QDQs for supported ops and optimize for performance and accuracy.Generally this feature will give better performance/accuracy with ORT Optimizations disabled. 
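
As a minimal sketch (Python API; the model path is illustrative), enabling the NPU QDQ optimizer while turning off ONNX Runtime's own graph optimizations might look like this:

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Let the OpenVINO backend drive optimization of the QDQ graph.
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession(
    "model_qdq.onnx",  # hypothetical QDQ-quantized model
    sess_options=sess_options,
    providers=['OpenVINOExecutionProvider'],
    provider_options=[{'device_type': 'NPU', 'enable_qdq_optimizer': 'True'}]
)
```
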
Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. -### Loading Custom JSON OV Config During Runtime -This feature is developed to facilitate loading of OVEP parameters from a single JSON configuration file. -The JSON input schema must be of format - +### Loading Custom JSON OpenVINO Config During Runtime +The `load_config` feature is developed to facilitate loading of OpenVINO EP parameters using a JSON input schema, which mandatorily follows below format - ``` { "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} } ``` -where "DEVICE_KEY" can be CPU, NPU or GPU , "PROPERTY" must be a valid entity defined in OV from its properties.hpp sections and "PROPERTY_VALUE" must be passed in as a string. If we pass any other type like int/bool we encounter errors from ORT like below - +where "DEVICE_KEY" can be CPU, NPU, GPU or AUTO , "PROPERTY" must be a valid entity defined in [OpenVINO™ supported properties](https://github.com/openvinotoolkit/openvino/blob/releases/2025/3/src/inference/include/openvino/runtime/properties.hpp) & "PROPERTY_VALUE" must be a valid corresponding supported property value passed in as a string. -Exception during initialization: [json.exception.type_error.302] type must be string, but is a number. +If a property is set using an invalid key (i.e., a key that is not recognized as part of the `OpenVINO™ supported properties`), it will be ignored & a warning will be logged against the same. However, if a valid property key is used but assigned an invalid value (e.g., a non-integer where an integer is expected), the OpenVINO™ framework will result in an exception during execution. + +The valid properties are of two types viz. Mutable (Read/Write) & Immutable (Read only) these are also governed while setting the same. If an Immutable property is being set, we skip setting the same with a similar warning. + +For setting appropriate `"PROPERTY"`, refer to OpenVINO config options for [CPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/cpu-device.html#supported-properties), [GPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#supported-properties), [NPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#supported-features-and-properties) and [AUTO](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html#using-auto). + +Example: + +The usage of this functionality using onnxruntime_perf_test application is as below – + +``` +onnxruntime_perf_test.exe -e openvino -m times -r 1 -i "device_type|NPU load_config|test_config.json" model.onnx +``` +#### Dynamic Shape Management +Comprehensive handling of models with variable input dimensions across all supported devices. +Device-Specific Handling: + +NPU: Requires shape bounds via reshape_input for optimal memory management +CPU/GPU: Automatic dynamic shape handling with runtime optimization +All Devices: Option to convert dynamic to static shapes when beneficial + +OpenVINO™ Shape Management: +The reshape method updates input shapes and propagates them through all intermediate layers to outputs. This enables runtime shape modification for different input sizes. 
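
For reference, a hedged sketch of the underlying OpenVINO reshape call (performed on your behalf by the execution provider; the model path and input name are illustrative):

```python
import openvino as ov

core = ov.Core()
ov_model = core.read_model("model.onnx")        # illustrative model path
ov_model.reshape({"data": [1, 3, 224, 224]})    # pin the "data" input to a static shape
compiled_model = core.compile_model(ov_model, "CPU")
```
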
+Shape Changing Approaches: + +Single input models: Pass new shape directly to reshape method +Multiple inputs: Specify shapes by port, index, or tensor name +Batch modification: Use set_batch method with appropriate layout specification + +Performance Considerations: + +Static shapes avoid memory and runtime overheads +Dynamic shapes provide flexibility at performance cost +Shape bounds optimization (NPU) balances flexibility and performance + +Important: Always set static shapes when input dimensions won't change between inferences for optimal performance. + +#### Tensor Layout Control +Enables explicit specification of tensor memory layout for performance optimization currenty supported on CPU. +Layout specification helps OpenVINO optimize memory access patterns and tensor operations based on actual data organization. +Layout Specification Benefits: + +Optimized memory access: Improved cache utilization and memory throughput +Better tensor operations: Device-specific operation optimization +Reduced memory copies: Direct operation on optimally laid out data +Hardware-specific optimization: Leverages device-preferred memory layouts + +Common Layout Patterns: + +NCHW: Batch, Channel, Height, Width +NHWC: Batch, Height, Width, Channel +NC: Batch, Channel +NTD: Batch, Time, Dimension -While one can set the int/bool values like this "NPU_TILES": "2" which is valid. -If someone passes incorrect keys, it will be skipped with a warning while incorrect values assigned to a valid key will result in an exception arising from OV framework. - -The valid properties are of 2 types viz. MUTABLE (R/W) & IMMUTABLE (R ONLY) these are also governed while setting the same. If an IMMUTABLE property is being set, we skip setting the same with a similar warning. ### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences. @@ -255,11 +730,7 @@ With EP-Weight Sharing, prefill and KV cache models can now reuse the same set o These changes enable weight sharing between two models using the session context option: ep.share_ep_contexts. Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option. -### OVEP supports CreateSessionFromArray API -The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports creating sessions from memory using the CreateSessionFromArray API. This allows loading models directly from memory buffers instead of file paths. The CreateSessionFromArray loads the model in memory then creates a session from the in-memory byte array. - -Note: -Use the -l argument when running the inference with perf_test using CreateSessionFromArray API. 
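
A minimal sketch of the weight-sharing setup described above, assuming two hypothetical prefill/KV-cache model files and using the `ep.share_ep_contexts` session config entry (see the Session Options link above for the authoritative key definitions):

```python
import onnxruntime as ort

def make_shared_session(model_path):
    sess_options = ort.SessionOptions()
    # Opt this session in to sharing the EP context/weights with other sessions.
    sess_options.add_session_config_entry("ep.share_ep_contexts", "1")
    return ort.InferenceSession(
        model_path,
        sess_options=sess_options,
        providers=['OpenVINOExecutionProvider'],
        provider_options=[{'device_type': 'GPU'}]
    )

prefill_session = make_shared_session("llm_prefill.onnx")    # hypothetical file names
kv_cache_session = make_shared_session("llm_kv_cache.onnx")
```
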
+ ## Configuration Options @@ -319,35 +790,6 @@ OpenVINO™ backend performs hardware, dependent as well as independent optimiza SessionOptions::SetGraphOptimizationLevel(ORT_DISABLE_ALL); ``` -## Summary of options - -The following table lists all the available configuration options for API 2.0 and the Key-Value pairs to set them: - -| **Key** | **Key type** | **Allowable Values** | **Value type** | **Description** | -| --- | --- | --- | --- | --- | -| device_type | string | CPU, NPU, GPU, GPU.0, GPU.1 based on the available GPUs, NPU, Any valid Hetero combination, Any valid Multi or Auto devices combination | string | Overrides the accelerator hardware type with these values at runtime. If this option is not explicitly set, default hardware specified during build is used. | -| precision | string | FP32, FP16, ACCURACY based on the device_type chosen | string | Supported precisions for HW {CPU:FP32, GPU:[FP32, FP16, ACCURACY], NPU:FP16}. Default precision for HW for optimized performance {CPU:FP32, GPU:FP16, NPU:FP16}. To execute model with the default input precision, select ACCURACY precision type. | -| num_of_threads | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default value of number of threads with this value at runtime. If this option is not explicitly set, default value of 8 during build time will be used for inference. | -| num_streams | string | Any unsigned positive number other than 0 | size_t | Overrides the accelerator default streams with this value at runtime. If this option is not explicitly set, default value of 1, performance for latency is used during build time will be used for inference. | -| cache_dir | string | Any valid string path on the hardware target | string | Explicitly specify the path to save and load the blobs enabling model caching feature.| -| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context i.e the cl_context address as a void pointer.| -| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). | -| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. | -| load_config | string | Any custom JSON path | string | This option enables a feature for loading custom JSON OV config during runtime which sets OV parameters. | - - -Valid Hetero or Multi or Auto Device combinations: -`HETERO:,...` -The `device` can be any of these devices from this list ['CPU','GPU', 'NPU'] - -A minimum of two DEVICE_TYPE'S should be specified for a valid HETERO, MULTI, or AUTO Device Build. - -Example: -HETERO:GPU,CPU AUTO:GPU,CPU MULTI:GPU,CPU - -Deprecated device_type option : -CPU_FP32, GPU_FP32, GPU_FP16, NPU_FP16 are no more supported. They will be deprecated in the future release. Kindly upgrade to latest device_type and precision option. 
- ## Support Coverage **ONNX Layers supported using OpenVINO** @@ -628,4 +1070,4 @@ In order to showcase what you can do with the OpenVINO™ Execution Provider for [Docker Containers](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-docker-container.html) ### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime python wheel packages -[Python Pip Wheel Packages](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-for-onnx-runtime.html) +[Python Pip Wheel Packages](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-for-onnx-runtime.html) \ No newline at end of file From 2432030f8ef978ac791e3ddc7d8d68ecf381b7a9 Mon Sep 17 00:00:00 2001 From: Jaswanth Gannamaneni Date: Thu, 18 Sep 2025 00:18:45 -0700 Subject: [PATCH 2/4] update documentation --- docs/build/eps.md | 1704 ++++++++--------- .../OpenVINO-ExecutionProvider.md | 1643 ++++++---------- 2 files changed, 1423 insertions(+), 1924 deletions(-) diff --git a/docs/build/eps.md b/docs/build/eps.md index 59cd1ace6e29c..39196774d15be 100644 --- a/docs/build/eps.md +++ b/docs/build/eps.md @@ -1,852 +1,852 @@ ---- -title: Build with different EPs -parent: Build ONNX Runtime -description: Learm how to build ONNX Runtime from source for different execution providers -nav_order: 3 -redirect_from: /docs/how-to/build/eps ---- - -# Build ONNX Runtime with Execution Providers -{: .no_toc } - -## Contents -{: .no_toc } - -* TOC placeholder -{:toc} - -## Execution Provider Shared Libraries - -The oneDNN, TensorRT, OpenVINO™, CANN, and QNN providers are built as shared libraries vs being statically linked into the main onnxruntime. This enables them to be loaded only when needed, and if the dependent libraries of the provider are not installed onnxruntime will still run fine, it just will not be able to use that provider. For non shared library providers, all dependencies of the provider must exist to load onnxruntime. - -### Built files -{: .no_toc } - -On Windows, shared provider libraries will be named 'onnxruntime_providers_\*.dll' (for example onnxruntime_providers_openvino.dll). -On Unix, they will be named 'libonnxruntime_providers_\*.so' -On Mac, they will be named 'libonnxruntime_providers_\*.dylib'. - -There is also a shared library that shared providers depend on called onnxruntime_providers_shared (with the same naming convension applied as above). - -Note: It is not recommended to put these libraries in a system location or added to a library search path (like LD_LIBRARY_PATH on Unix). If multiple versions of onnxruntime are installed on the system this can make them find the wrong libraries and lead to undefined behavior. - -### Loading the shared providers -{: .no_toc } - -Shared provider libraries are loaded by the onnxruntime code (do not load or depend on them in your client code). The API for registering shared or non shared providers is identical, the difference is that shared ones will be loaded at runtime when the provider is added to the session options (through a call like OrtSessionOptionsAppendExecutionProvider_OpenVINO or SessionOptionsAppendExecutionProvider_OpenVINO in the C API). -If a shared provider library cannot be loaded (if the file doesn't exist, or its dependencies don't exist or not in the path) then an error will be returned. 
- -The onnxruntime code will look for the provider shared libraries in the same location as the onnxruntime shared library is (or the executable statically linked to the static library version). - ---- - -## CUDA - -### Prerequisites -{: .no_toc } - -* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) - * The CUDA execution provider for ONNX Runtime is built and tested with CUDA 12.x and cuDNN 9. Check [here](../execution-providers/CUDA-ExecutionProvider.md#requirements) for more version information. - * The path to the CUDA installation must be provided via the CUDA_HOME environment variable, or the `--cuda_home` parameter. The installation directory should contain `bin`, `include` and `lib` sub-directories. - * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found. - * The path to the cuDNN installation must be provided via the CUDNN_HOME environment variable, or `--cudnn_home` parameter. In Windows, the installation directory should contain `bin`, `include` and `lib` sub-directories. - * cuDNN 8.* requires ZLib. Follow the [cuDNN 8.9 installation guide](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html) to install zlib in Linux or Windows. - * In Windows, the path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_8.dll is found. - -### Build Instructions -{: .no_toc } - -#### Windows -``` -.\build.bat --use_cuda --cudnn_home --cuda_home -``` - -#### Linux -``` -./build.sh --use_cuda --cudnn_home --cuda_home -``` - -A Dockerfile is available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#cuda). - -### Build Options - -To specify GPU architectures (see [Compute Capability](https://developer.nvidia.com/cuda-gpus)), you can append parameters like `--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80;86;89`. - -With `--cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON`, the CUDA EP can be compiled with additional NHWC ops. This option is not enabled by default due to the small amount of supported NHWC operators. - -Another very helpful CMake build option is to build with NVTX support (`--cmake_extra_defines onnxruntime_ENABLE_NVTX_PROFILE=ON`) that will enable much easier profiling using [Nsight Systems](https://developer.nvidia.com/nsight-systems) and correlates CUDA kernels with their actual ONNX operator. - -`--enable_cuda_line_info` or `--cmake_extra_defines onnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=ON` will enable [NVCC generation of line-number information for device code](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#generate-line-info-lineinfo). It might be helpful when you run [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) tools on CUDA kernels. - -If your Windows machine has multiple versions of CUDA installed and you want to use an older version of CUDA, you need append parameters like `--cuda_version `. - -When your build machine has many CPU cores and less than 64 GB memory, there is chance of out of memory error like `nvcc error : 'cicc' died due to signal 9`. The solution is to limit number of parallel NVCC threads with parameters like `--parallel 4 --nvcc_threads 1`. 
- -### Notes on older versions of ONNX Runtime, CUDA and Visual Studio -{: .no_toc } - -* Depending on compatibility between the CUDA, cuDNN, and Visual Studio versions you are using, you may need to explicitly install an earlier version of the MSVC toolset. -* For older version of ONNX Runtime and CUDA, and Visual Studio: - * CUDA 10.0 is [known to work](https://devblogs.microsoft.com/cppblog/cuda-10-is-now-available-with-support-for-the-latest-visual-studio-2017-versions/) with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions - * CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4) - * To install the 14.11 MSVC toolset, see [this page](https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017). - * To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options: - 1. Setup the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat, prior to running the build script. e.g. if you have VS2017 Enterprise, an x64 build would use the following command `"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11` For convenience, .\build.amd64.1411.bat will do this and can be used in the same way as .\build.bat. e.g. ` .\build.amd64.1411.bat --use_cuda` - - 2. Alternatively, if you have CMake 3.13 or later you can specify the toolset version via the `--msvc_toolset` build script parameter. e.g. `.\build.bat --msvc_toolset 14.11` - -* If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder. -e.g. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\VC\VCTargets\BuildCustomizations\. -If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory. - ---- - -## TensorRT - -See more information on the TensorRT Execution Provider [here](../execution-providers/TensorRT-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and cuDNN, and setup environment variables. - * Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html) - * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 10.9. - * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter. - * ONNX Runtime uses [TensorRT built-in parser](https://developer.nvidia.com/tensorrt/download) from `tensorrt_home` by default. - * To use open-sourced [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/tree/main) parser instead, add `--use_tensorrt_oss_parser` parameter in build commands below. - * The default version of open-sourced onnx-tensorrt parser is specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). 
- * To specify a different version of onnx-tensorrt parser: - * Select the commit of [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/commits) that you preferred; - * Run `sha1sum` command with downloaded onnx-tensorrt zip file to acquire the SHA1 hash - * Update [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) with updated onnx-tensorrt commit and hash info. - * Please make sure TensorRT built-in parser/open-sourced onnx-tensorrt specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) are **version-matched**, if enabling `--use_tensorrt_oss_parser`. - * i.e It's version-matched if assigning `tensorrt_home` with path to TensorRT-10.9 built-in binaries and onnx-tensorrt [10.9-GA branch](https://github.com/onnx/onnx-tensorrt/tree/release/10.9-GA) specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). - - -### **[Note to ORT 1.21/1.22 open-sourced parser users]** - -* ORT 1.21/1.22 link against onnx-tensorrt 10.8-GA/10.9-GA, which requires newly released onnx 1.18. - * Here's a temporarily fix to preview on onnx-tensorrt 10.8-GA/10.9-GA when building ORT 1.21/1.22: - * Replace the [onnx line in cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/deps.txt#L38) - with `onnx;https://github.com/onnx/onnx/archive/e709452ef2bbc1d113faf678c24e6d3467696e83.zip;c0b9f6c29029e13dea46b7419f3813f4c2ca7db8` - * Download [this](https://github.com/microsoft/onnxruntime/blob/7b2733a526c12b5ef4475edd47fd9997ebc2b2c6/cmake/patches/onnx/onnx.patch) as raw file and save file to [cmake/patches/onnx/onnx.patch](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/patches/onnx/onnx.patch) (do not copy/paste from browser, as it might alter line break type) - * Build ORT with trt-related flags above (including `--use_tensorrt_oss_parser`) - * The [onnx 1.18](https://github.com/onnx/onnx/releases/tag/v1.18.0) is supported by latest ORT main branch. Please checkout main branch and build ORT-TRT with `--use_tensorrt_oss_parser` to enable OSS parser with full onnx 1.18 support. 
- -### Build Instructions -{: .no_toc } - -#### Windows -```bash -# to build with tensorrt built-in parser -.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --cmake_generator "Visual Studio 17 2022" - -# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt -.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --use_tensorrt_oss_parser --cmake_generator "Visual Studio 17 2022" -``` - -#### Linux - -```bash -# to build with tensorrt built-in parser -./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home - -# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt -./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --use_tensorrt_oss_parser --tensorrt_home --skip_submodule_sync -``` - -Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#tensorrt) - -**Note** Building with `--use_tensorrt_oss_parser` with TensorRT 8.X requires additional flag --cmake_extra_defines onnxruntime_USE_FULL_PROTOBUF=ON - ---- - -## NVIDIA Jetson TX1/TX2/Nano/Xavier/Orin - -### Build Instructions -{: .no_toc } - -These instructions are for the latest [JetPack SDK](https://developer.nvidia.com/embedded/jetpack). - -1. Clone the ONNX Runtime repo on the Jetson host - - ```bash - git clone --recursive https://github.com/microsoft/onnxruntime - ``` - -2. Specify the CUDA compiler, or add its location to the PATH. - - 1. JetPack 5.x users can upgrade to the latest CUDA release without updating the JetPack version or Jetson Linux BSP (Board Support Package). - - 1. For JetPack 5.x users, CUDA>=11.8 and GCC>9.4 are required to be installed on and after ONNX Runtime 1.17. - - 2. Check [this official blog](https://developer.nvidia.com/blog/simplifying-cuda-upgrades-for-nvidia-jetson-users/) for CUDA upgrade instruction (CUDA 12.2 has been verified on JetPack 5.1.2 on Jetson Xavier NX). - - 1. If there's no `libnvcudla.so` under `/usr/local/cuda-12.2/compat`: `sudo apt-get install -y cuda-compat-12-2` and add `export LD_LIBRARY_PATH="/usr/local/cuda-12.2/lib64:/usr/local/cuda-12.2/compat:$LD_LIBRARY_PATH"` to `~/.bashrc`. - - 3. Check [here](https://developer.nvidia.com/cuda-gpus#collapse5) for compute capability datasheet. - - 2. CMake can't automatically find the correct `nvcc` if it's not in the `PATH`. `nvcc` can be added to `PATH` via: - - ```bash - export PATH="/usr/local/cuda/bin:${PATH}" - ``` - - or: - - ```bash - export CUDACXX="/usr/local/cuda/bin/nvcc" - ``` - - 3. Update TensorRT libraries - - 1. Jetpack 5.x supports up to TensorRT 8.5. Jetpack 6.x are equipped with TensorRT 8.6-10.3. - - 2. Jetpack 6.x users can download latest TensorRT 10 TAR package for **jetpack** on [TensorRT SDK website](https://developer.nvidia.com/tensorrt/download/10x). - - 3. Check [here](../execution-providers/TensorRT-ExecutionProvider.md#requirements) for TensorRT/CUDA support matrix among all ONNX Runtime versions. - -3. 
Install the ONNX Runtime build dependencies on the Jetpack host: - - ```bash - sudo apt install -y --no-install-recommends \ - build-essential software-properties-common libopenblas-dev \ - libpython3.10-dev python3-pip python3-dev python3-setuptools python3-wheel - ``` - -4. Cmake is needed to build ONNX Runtime. Please check the minimum required CMake version [here](https://github.com/microsoft/onnxruntime/blob/main/cmake/CMakeLists.txt#L6). Download from https://cmake.org/download/ and add cmake executable to `PATH` to use it. - -5. Build the ONNX Runtime Python wheel: - - 1. Build `onnxruntime-gpu` wheel with CUDA and TensorRT support (update paths to CUDA/CUDNN/TensorRT libraries if necessary): - - ```bash - ./build.sh --config Release --update --build --parallel --build_wheel \ - --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \ - --tensorrt_home /usr/lib/aarch64-linux-gnu - ``` - -​ Notes: - -* By default, `onnxruntime-gpu` wheel file will be captured under `path_to/onnxruntime/build/Linux/Release/dist/` (build path can be customized by adding `--build_dir` followed by a customized path to the build command above). - -* Append `--skip_tests --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' 'onnxruntime_BUILD_UNIT_TESTS=OFF' 'onnxruntime_USE_FLASH_ATTENTION=OFF' -'onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION=OFF'` to the build command to opt out optional features and reduce build time. - -* For a portion of Jetson devices like the Xavier series, higher power mode involves more cores (up to 6) to compute but it consumes more resource when building ONNX Runtime. Set `--parallel 1` in the build command if OOM happens and system is hanging. - -## TensorRT-RTX - -See more information on the NV TensorRT RTX Execution Provider [here](../execution-providers/TensorRTRTX-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and setup environment variables. - * Intall TensorRT for RTX from nvidia.com (TODO: add link when available) - -### Build Instructions -{: .no_toc } -`build.bat --config Release --parallel 32 --build_dir _build --build_shared_lib --use_nv_tensorrt_rtx --tensorrt_home "C:\dev\TensorRT-RTX-1.1.0.3" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9" --cmake_generator "Visual Studio 17 2022" --use_vcpkg` -Replace the --tensorrt_home and --cuda_home with correct paths to CUDA and TensorRT-RTX installations. - -## oneDNN - -See more information on oneDNN (formerly DNNL) [here](../execution-providers/oneDNN-ExecutionProvider.md). - -### Build Instructions -{: .no_toc } - - -The DNNL execution provider can be built for Intel CPU or GPU. To build for Intel GPU, install [Intel SDK for OpenCL Applications](https://software.intel.com/content/www/us/en/develop/tools/opencl-sdk.html) or build OpenCL from [Khronos OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK). Pass in the OpenCL SDK path as dnnl_opencl_root to the build command. Install the latest GPU driver - [Windows graphics driver](https://downloadcenter.intel.com/product/80939/Graphics), [Linux graphics compute runtime and OpenCL driver](https://github.com/intel/compute-runtime/releases). 
- -For CPU -#### Windows -`.\build.bat --use_dnnl` - -#### Linux -`./build.sh --use_dnnl` - -For GPU -#### Windows - -`.\build.bat --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "c:\program files (x86)\intelswtools\sw_dev_tools\opencl\sdk"` -#### Linux - -`./build.sh --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "/opt/intel/sw_dev_tools/opencl-sdk"` - -#### Build Phython Wheel - - -OneDNN EP build supports building Python wheel for both Windows and linux using flag --build_wheel - -`.\build.bat --config RelWithDebInfo --parallel --build_shared_lib --cmake_generator "Visual Studio 16 2019" --build_wheel --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk"` - ---- - -## OpenVINO - -See more information on the OpenVINO™ Execution Provider [here](../execution-providers/OpenVINO-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -1. Install the OpenVINO™ offline/online installer from Intel® Distribution of OpenVINO™TM Toolkit **Release 2024.3** for the appropriate OS and target hardware: - * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE). - * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE) - - Follow [documentation](https://docs.openvino.ai/2025/index.html) for detailed instructions. - - *2025.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2025.0](https://docs.openvino.ai/2025/index.html) is minimal OpenVINO™ version requirement.* - -2. Configure the target hardware with specific follow on instructions: - * To configure Intel® Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#linux) - - -3. Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step: - * For Windows: - ``` - C:\\setupvars.bat - ``` - * For Linux: - ``` - $ source /setupvars.sh - ``` - - -### Build Instructions -{: .no_toc } - -#### Windows - -``` -.\build.bat --config RelWithDebInfo --use_openvino --build_shared_lib --build_wheel -``` - -*Note: The default Windows CMake Generator is Visual Studio 2019, but you can also use the newer Visual Studio 2022 by passing `--cmake_generator "Visual Studio 17 2022"` to `.\build.bat`* - -#### Linux - -```bash -./build.sh --config RelWithDebInfo --use_openvino --build_shared_lib --build_wheel -``` - -* `--build_wheel` Creates python wheel file in dist/ folder. Enable it when building from source. -* `--use_openvino` builds the OpenVINO™ Execution Provider in ONNX Runtime. -* ``: Specifies the default hardware target for building OpenVINO™ Execution Provider. This can be overriden dynamically at runtime with another option (refer to [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md#summary-of-options) for more details on dynamic device selection). Below are the options for different Intel target devices. 
- -Refer to [Intel GPU device naming convention](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#device-naming-convention) for specifying the correct hardware target in cases where both integrated and discrete GPU's co-exist. - -| Hardware Option | Target Device | -| --------------- | ------------------------| -| CPU | Intel® CPUs | -| GPU | Intel® Integrated Graphics | -| GPU.0 | Intel® Integrated Graphics | -| GPU.1 | Intel® Discrete Graphics | -| NPU | Intel® Neural Processor Unit | -| HETERO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | All Intel® silicons mentioned above | -| MULTI:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | All Intel® silicons mentioned above | -| AUTO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | All Intel® silicons mentioned above | - -Specifying Hardware Target for HETERO or Multi or AUTO device Build: - -HETERO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... -The DEVICE_TYPE can be any of these devices from this list ['CPU','GPU', 'NPU'] - -A minimum of two device's should be specified for a valid HETERO or MULTI or AUTO device build. - -``` -Example's: HETERO:GPU,CPU or AUTO:GPU,CPU or MULTI:GPU,CPU -``` - -#### Disable subgraph partition Feature -* Builds the OpenVINO™ Execution Provider in ONNX Runtime with sub graph partitioning disabled. - -* With this option enabled. Fully supported models run on OpenVINO Execution Provider else they completely fall back to default CPU EP. - -* To enable this feature during build time. Use `--use_openvino ` `_NO_PARTITION` - -``` -Usage: --use_openvino CPU_NO_PARTITION or --use_openvino GPU_NO_PARTITION or - --use_openvino GPU_NO_PARTITION -``` - -For more information on OpenVINO™ Execution Provider's ONNX Layer support, Topology support, and Intel hardware enabled, please refer to the document [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md) - ---- - -## QNN -See more information on the QNN execution provider [here](../execution-providers/QNN-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } -* Install the Qualcomm AI Engine Direct SDK (Qualcomm Neural Network SDK) [Linux/Android/Windows](https://qpm.qualcomm.com/main/tools/details/qualcomm_ai_engine_direct) - -* Install [cmake-3.28](https://cmake.org/download/) or higher. - -* Install Python 3.10 or higher. - * [Python 3.12 for Windows Arm64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-arm64.exe) - * [Python 3.12 for Windows x86-64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-amd64.exe) - * Note: Windows on Arm supports a x86-64 Python environment via emulation. Ensure that the Arm64 Python environment is actived for a native Arm64 ONNX Runtime build. - -* Checkout the source tree: - - ```bash - git clone --recursive https://github.com/Microsoft/onnxruntime.git - cd onnxruntime - ``` - -* Install ONNX Runtime Python dependencies. - ```bash - pip install -r requirements.txt - ``` - -### Build Options -{: .no_toc } - -* `--use_qnn [QNN_LIBRARY_KIND]`: Builds the QNN Execution provider. `QNN_LIBRARY_KIND` is optional and specifies whether to build the QNN Execution Provider as a shared library (default) or static library. - * `--use_qnn` or `--use_qnn shared_lib`: Builds the QNN Execution Provider as a shared library. - * `--use_qnn static_lib`: Builds QNN Execution Provider as a static library linked into ONNX Runtime. This is required for Android builds. -* `--qnn_home QNN_SDK_PATH`: The path to the Qualcomm AI Engine Direct SDK. 
- * Example on Windows: `--qnn_home 'C:\Qualcomm\AIStack\QAIRT\2.31.0.250130'` - * Example on Linux: `--qnn_home /opt/qcom/aistack/qairt/2.31.0.250130` -* `--build_wheel`: Enables Python bindings and builds Python wheel. -* `--arm64`: Cross-compile for Arm64. -* `--arm64ec`: Cross-compile for Arm64EC. Arm64EC code runs with native performance and is interoperable with x64 code running under emulation within the same process on a Windows on Arm device. Refer to the [Arm64EC Overview](https://learn.microsoft.com/en-us/windows/arm/arm64ec). - -Run `python tools/ci_build/build.py --help` for a description of all available build options. - -### Build Instructions -{: .no_toc } - -#### Windows (native x86-64 or native Arm64) -``` -.\build.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --skip_tests --build_dir build\Windows -``` - -Notes: -* Not all Qualcomm backends (e.g., HTP) are supported for model execution on a native x86-64 build. Refer to the [Qualcomm SDK backend documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/backend.html) for more information. -* Even if a Qualcomm backend does not support execution on x86-64, the QNN Execution provider may be able to [generate compiled models](../execution-providers/QNN-ExecutionProvider.md#qnn-context-binary-cache-feature) for the Qualcomm backend. - -#### Windows (Arm64 cross-compile target) -``` -.\build.bat --arm64 --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows -``` - -#### Windows (Arm64EC cross-compile target) -``` -.\build.bat --arm64ec --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows -``` - -#### Windows (Arm64X cross-compile target) -Use the `build_arm64x.bat` script to build Arm64X binaries. Arm64X binaries bundle both Arm64 and Arm64EC code, making Arm64X compatible with both Arm64 and Arm64EC processes on a Windows on Arm device. Refer to the [Arm64X PE files overview](https://learn.microsoft.com/en-us/windows/arm/arm64x-pe). - -``` -.\build_arm64x.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --cmake_generator "Visual Studio 17 2022" --config Release --parallel -``` -Notes: -* Do not specify a `--build_dir` option because `build_arm64x.bat` sets specific build directories. -* The above command places Arm64X binaries in the `.\build\arm64ec-x\Release\Release\` directory. 
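If `--build_wheel` was passed to any of the builds above, a quick way to sanity-check the result is to install the generated wheel and list the available providers. The wheel path below is an assumption; it depends on `--build_dir`, `--config`, and the CMake generator used.

```bash
# Paths are placeholders; adjust to your build directory and configuration.
pip install build/Windows/Release/Release/dist/onnxruntime*-*.whl
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# 'QNNExecutionProvider' should appear in the printed list.
```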
- -#### Linux (x86_64) -``` -./build.sh --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --config Release --parallel --skip_tests --build_dir build/Linux -``` - -#### Android (cross-compile): - -Please reference [Build OnnxRuntime For Android](android.md) -``` -# on Windows -.\build.bat --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build\Android - -# on Linux -./build.sh --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build/Android -``` - ---- - -## DirectML -See more information on the DirectML execution provider [here](../execution-providers/DirectML-ExecutionProvider.md). -### Windows -{: .no_toc } - -``` -.\build.bat --use_dml -``` -### Notes -{: .no_toc } - -The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows. - ---- - -## Arm Compute Library -See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md). - -### Build Instructions -{: .no_toc } - -You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary). -See [here](inferencing.md#arm) for information on building for Arm®-based devices. - -Add the following options to `build.sh` to enable the ACL Execution Provider: - -``` ---use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build -``` - -## Arm NN - -See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - -* Supported backend: i.MX8QM Armv8 CPUs -* Supported BSP: i.MX8QM BSP - * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh` -* Set up the build environment - -```bash -source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux -alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake" -``` - -* See [here](inferencing.md#arm) for information on building for Arm-based devices - -### Build Instructions -{: .no_toc } - - -```bash -./build.sh --use_armnn -``` - -The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation build with --armnn_relu flag - -```bash -./build.sh --use_armnn --armnn_relu -``` - -The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation build with --armnn_bn flag - -```bash -./build.sh --use_armnn --armnn_bn -``` - -To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively. -The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively. 
- -```bash -./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build -``` - ---- - -## RKNPU -See more information on the RKNPU Execution Provider [here](../execution-providers/community-maintained/RKNPU-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - - -* Supported platform: RK1808 Linux -* See [here](inferencing.md#arm) for information on building for Arm-based devices -* Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake: - -``` -set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++) -set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc) -``` - -### Build Instructions -{: .no_toc } - -#### Linux - -1. Download [rknpu_ddk](https://github.com/airockchip/rknpu_ddk.git) to any directory. - -2. Build ONNX Runtime library and test: - - ```bash - ./build.sh --arm --use_rknpu --parallel --build_shared_lib --build_dir build_arm --config MinSizeRel --cmake_extra_defines RKNPU_DDK_PATH= CMAKE_TOOLCHAIN_FILE= ONNX_CUSTOM_PROTOC_EXECUTABLE= - ``` - -3. Deploy ONNX runtime and librknpu_ddk.so on the RK1808 board: - - ```bash - libonnxruntime.so.1.2.0 - onnxruntime_test_all - rknpu_ddk/lib64/librknpu_ddk.so - ``` - ---- - -## AMD Vitis AI -See more information on the Vitis AI Execution Provider [here](../execution-providers/Vitis-AI-ExecutionProvider.md). - -### Windows -{: .no_toc } - -From the Visual Studio Developer Command Prompt or Developer PowerShell, execute the following command: - -``` -.\build.bat --use_vitisai --build_shared_lib --parallel --config Release -``` - -If you wish to leverage the Python APIs, please include the `--build_wheel` flag: - -``` -.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --build_wheel -``` - -You can override also override the installation location by specifying CMAKE_INSTALL_PREFIX via the cmake_extra_defines parameter. -e.g. - -``` -.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --cmake_extra_defines CMAKE_INSTALL_PREFIX=D:\onnxruntime -``` -### Linux -{: .no_toc } - -Currently Linux support is only enabled for AMD Adapable SoCs. Please refer to the guidance [here](../execution-providers/Vitis-AI-ExecutionProvider.md#installation-for-amd-adaptable-socs) for SoC targets. - ---- - -## AMD MIGraphX - -See more information on the MIGraphX Execution Provider [here](../execution-providers/MIGraphX-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/) - * The MIGraphX execution provider for ONNX Runtime is built and tested with ROCm6.3.1 -* Install [MIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX) - * The path to MIGraphX installation must be provided via the `--migraphx_home parameter`. - -### Build Instructions -{: .no_toc } - -#### Linux - -```bash -./build.sh --config --parallel --use_migraphx --migraphx_home -``` - -Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#migraphx). - -#### Build Phython Wheel - -`./build.sh --config Release --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm` - -Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. - ---- - -## AMD ROCm - -See more information on the ROCm Execution Provider [here](../execution-providers/ROCm-ExecutionProvider.md). 
- -### Prerequisites -{: .no_toc } - -* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/) - * The ROCm execution provider for ONNX Runtime is built and tested with ROCm6.3.1 - -### Build Instructions -{: .no_toc } - -#### Linux - -```bash -./build.sh --config --parallel --use_rocm --rocm_home -``` - -Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm). - -#### Build Phython Wheel - -`./build.sh --config Release --build_wheel --parallel --use_rocm --rocm_home /opt/rocm` - -Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. - ---- - -## NNAPI - -Usage of NNAPI on Android platforms is via the NNAPI Execution Provider (EP). - -See the [NNAPI Execution Provider](../execution-providers/NNAPI-ExecutionProvider.md) documentation for more details. - -The pre-built ONNX Runtime Mobile package for Android includes the NNAPI EP. - -If performing a custom build of ONNX Runtime, support for the NNAPI EP or CoreML EP must be enabled when building. - -### Create a minimal build with NNAPI EP support - -Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the NNAPI EP. -* Add `--use_nnapi` to include the NNAPI EP in the build - -#### Example build commands with the NNAPI EP enabled - -Windows example: - -```dos -.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config -``` - -Linux example: - -```bash -./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` -``` - -## CoreML - -Usage of CoreML on iOS and macOS platforms is via the CoreML EP. - -See the [CoreML Execution Provider](../execution-providers/CoreML-ExecutionProvider.md) documentation for more details. - -The pre-built ONNX Runtime Mobile package for iOS includes the CoreML EP. - -### Create a minimal build with CoreML EP support - -Please see [the instructions](./ios.md) for setting up the iOS environment required to build. The iOS/macOS build must be performed on a mac machine. - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the CoreML EP. -* Add `--use_coreml` to include the CoreML EP in the build - -## XNNPACK - -Usage of XNNPACK on Android/iOS/Windows/Linux platforms is via the XNNPACK EP. - -See the [XNNPACK Execution Provider](../execution-providers/Xnnpack-ExecutionProvider.md) documentation for more details. 
- -The pre-built ONNX Runtime package([`onnxruntime-android`](https://mvnrepository.com/artifact/com.microsoft.onnxruntime/onnxruntime-android)) for Android includes the XNNPACK EP. - -The pre-built ONNX Runtime Mobile package for iOS, `onnxruntime-c` and `onnxruntime-objc` in [CocoaPods](https://cocoapods.org/), includes the XNNPACK EP. (Package `onnxruntime-objc` with XNNPACK will be available since 1.14.) - - -If performing a custom build of ONNX Runtime, support for the XNNPACK EP must be enabled when building. - -### Build for Android -#### Create a minimal build with XNNPACK EP support - -Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. -* Add `--use_xnnpack` to include the XNNPACK EP in the build - -##### Example build commands with the XNNPACK EP enabled - -Windows example: - -```bash -.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config -``` - -Linux example: - -```bash -./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` -``` -If you don't mind MINIMAL build, you can use the following command to build XNNPACK EP for Android: -Linux example: -```bash -./build.sh --cmake_generator "Ninja" --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --use_xnnpack -``` -### Build for iOS (available since 1.14) -A Mac machine is required to build package for iOS. Please follow this [guide](./ios.md) to set up environment firstly. -#### Create a minimal build with XNNPACK EP support - -Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: - -* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. -* Add `--use_xnnpack` to include the XNNPACK EP in the build - -```dos -./build.sh --config --use_xcode \ - --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deploy_target --use_xnnpack --minimal_build extended --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config -``` - -### Build for Windows -```dos -.\build.bat --config --use_xnnpack -``` -### Build for Linux -```bash -./build.sh --config --use_xnnpack -``` - ---- - -## CANN - -See more information on the CANN Execution Provider [here](../execution-providers/community-maintained/CANN-ExecutionProvider.md). - -### Prerequisites -{: .no_toc } - -1. 
Install the CANN Toolkit for the appropriate OS and target hardware by following [documentation](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/51RC1alphaX/softwareinstall/instg/atlasdeploy_03_0017.html) for detailed instructions, please. - -2. Initialize the CANN environment by running the script as shown below. - - ```bash - # Default path, change it if needed. - source /usr/local/Ascend/ascend-toolkit/set_env.sh - ``` - -### Build Instructions -{: .no_toc } - -#### Linux - -```bash -./build.sh --config --build_shared_lib --parallel --use_cann -``` - -### Notes -{: .no_toc } - -* The CANN execution provider supports building for both x64 and aarch64 architectures. -* CANN excution provider now is only supported on Linux. - -## Azure - -See the [Azure Execution Provider](../execution-providers/Azure-ExecutionProvider.md) documentation for more details. - -### Prerequisites - -For Linux, before building, please: - -* install openssl dev package into the system, which is openssl-dev for redhat and libssl-dev for ubuntu. -* if have multiple openssl dev versions installed in the system, please set environment variable "OPENSSL_ROOT_DIR" to the desired version, for example: - -```base -set OPENSSL_ROOT_DIR=/usr/local/ssl3.x/ -``` - -### Build Instructions - -#### Windows - -```dos -build.bat --config --build_shared_lib --build_wheel --use_azure -``` - -#### Linux - -```bash -./build.sh --config --build_shared_lib --build_wheel --use_azure -``` +--- +title: Build with different EPs +parent: Build ONNX Runtime +description: Learm how to build ONNX Runtime from source for different execution providers +nav_order: 3 +redirect_from: /docs/how-to/build/eps +--- + +# Build ONNX Runtime with Execution Providers +{: .no_toc } + +## Contents +{: .no_toc } + +* TOC placeholder +{:toc} + +## Execution Provider Shared Libraries + +The oneDNN, TensorRT, OpenVINO™, CANN, and QNN providers are built as shared libraries vs being statically linked into the main onnxruntime. This enables them to be loaded only when needed, and if the dependent libraries of the provider are not installed onnxruntime will still run fine, it just will not be able to use that provider. For non shared library providers, all dependencies of the provider must exist to load onnxruntime. + +### Built files +{: .no_toc } + +On Windows, shared provider libraries will be named 'onnxruntime_providers_\*.dll' (for example onnxruntime_providers_openvino.dll). +On Unix, they will be named 'libonnxruntime_providers_\*.so' +On Mac, they will be named 'libonnxruntime_providers_\*.dylib'. + +There is also a shared library that shared providers depend on called onnxruntime_providers_shared (with the same naming convension applied as above). + +Note: It is not recommended to put these libraries in a system location or added to a library search path (like LD_LIBRARY_PATH on Unix). If multiple versions of onnxruntime are installed on the system this can make them find the wrong libraries and lead to undefined behavior. + +### Loading the shared providers +{: .no_toc } + +Shared provider libraries are loaded by the onnxruntime code (do not load or depend on them in your client code). The API for registering shared or non shared providers is identical, the difference is that shared ones will be loaded at runtime when the provider is added to the session options (through a call like OrtSessionOptionsAppendExecutionProvider_OpenVINO or SessionOptionsAppendExecutionProvider_OpenVINO in the C API). 
+If a shared provider library cannot be loaded (if the file doesn't exist, or its dependencies don't exist or not in the path) then an error will be returned. + +The onnxruntime code will look for the provider shared libraries in the same location as the onnxruntime shared library is (or the executable statically linked to the static library version). + +--- + +## CUDA + +### Prerequisites +{: .no_toc } + +* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn) + * The CUDA execution provider for ONNX Runtime is built and tested with CUDA 12.x and cuDNN 9. Check [here](../execution-providers/CUDA-ExecutionProvider.md#requirements) for more version information. + * The path to the CUDA installation must be provided via the CUDA_HOME environment variable, or the `--cuda_home` parameter. The installation directory should contain `bin`, `include` and `lib` sub-directories. + * The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found. + * The path to the cuDNN installation must be provided via the CUDNN_HOME environment variable, or `--cudnn_home` parameter. In Windows, the installation directory should contain `bin`, `include` and `lib` sub-directories. + * cuDNN 8.* requires ZLib. Follow the [cuDNN 8.9 installation guide](https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-890/install-guide/index.html) to install zlib in Linux or Windows. + * In Windows, the path to the cuDNN bin directory must be added to the PATH environment variable so that cudnn64_8.dll is found. + +### Build Instructions +{: .no_toc } + +#### Windows +``` +.\build.bat --use_cuda --cudnn_home --cuda_home +``` + +#### Linux +``` +./build.sh --use_cuda --cudnn_home --cuda_home +``` + +A Dockerfile is available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#cuda). + +### Build Options + +To specify GPU architectures (see [Compute Capability](https://developer.nvidia.com/cuda-gpus)), you can append parameters like `--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=80;86;89`. + +With `--cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON`, the CUDA EP can be compiled with additional NHWC ops. This option is not enabled by default due to the small amount of supported NHWC operators. + +Another very helpful CMake build option is to build with NVTX support (`--cmake_extra_defines onnxruntime_ENABLE_NVTX_PROFILE=ON`) that will enable much easier profiling using [Nsight Systems](https://developer.nvidia.com/nsight-systems) and correlates CUDA kernels with their actual ONNX operator. + +`--enable_cuda_line_info` or `--cmake_extra_defines onnxruntime_ENABLE_CUDA_LINE_NUMBER_INFO=ON` will enable [NVCC generation of line-number information for device code](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#generate-line-info-lineinfo). It might be helpful when you run [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) tools on CUDA kernels. + +If your Windows machine has multiple versions of CUDA installed and you want to use an older version of CUDA, you need append parameters like `--cuda_version `. + +When your build machine has many CPU cores and less than 64 GB memory, there is chance of out of memory error like `nvcc error : 'cicc' died due to signal 9`. The solution is to limit number of parallel NVCC threads with parameters like `--parallel 4 --nvcc_threads 1`. 
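As an illustration, the options above can be combined into a single Linux build command; the paths, architecture list, and thread counts below are placeholders to adapt to your machine.

```bash
# Sketch only: combines the CUDA-related options discussed above.
./build.sh --config Release --parallel 4 --nvcc_threads 1 \
  --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu \
  --enable_cuda_line_info \
  --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=80;86;89' \
      'onnxruntime_USE_CUDA_NHWC_OPS=ON' 'onnxruntime_ENABLE_NVTX_PROFILE=ON'
```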
+ +### Notes on older versions of ONNX Runtime, CUDA and Visual Studio +{: .no_toc } + +* Depending on compatibility between the CUDA, cuDNN, and Visual Studio versions you are using, you may need to explicitly install an earlier version of the MSVC toolset. +* For older version of ONNX Runtime and CUDA, and Visual Studio: + * CUDA 10.0 is [known to work](https://devblogs.microsoft.com/cppblog/cuda-10-is-now-available-with-support-for-the-latest-visual-studio-2017-versions/) with toolsets from 14.11 up to 14.16 (Visual Studio 2017 15.9), and should continue to work with future Visual Studio versions + * CUDA 9.2 is known to work with the 14.11 MSVC toolset (Visual Studio 15.3 and 15.4) + * To install the 14.11 MSVC toolset, see [this page](https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017). + * To use the 14.11 toolset with a later version of Visual Studio 2017 you have two options: + 1. Setup the Visual Studio environment variables to point to the 14.11 toolset by running vcvarsall.bat, prior to running the build script. e.g. if you have VS2017 Enterprise, an x64 build would use the following command `"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" amd64 -vcvars_ver=14.11` For convenience, .\build.amd64.1411.bat will do this and can be used in the same way as .\build.bat. e.g. ` .\build.amd64.1411.bat --use_cuda` + + 2. Alternatively, if you have CMake 3.13 or later you can specify the toolset version via the `--msvc_toolset` build script parameter. e.g. `.\build.bat --msvc_toolset 14.11` + +* If you have multiple versions of CUDA installed on a Windows machine and are building with Visual Studio, CMake will use the build files for the highest version of CUDA it finds in the BuildCustomization folder. +e.g. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\VC\VCTargets\BuildCustomizations\. +If you want to build with an earlier version, you must temporarily remove the 'CUDA x.y.*' files for later versions from this directory. + +--- + +## TensorRT + +See more information on the TensorRT Execution Provider [here](../execution-providers/TensorRT-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and cuDNN, and setup environment variables. + * Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html) + * The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 10.9. + * The path to TensorRT installation must be provided via the `--tensorrt_home` parameter. + * ONNX Runtime uses [TensorRT built-in parser](https://developer.nvidia.com/tensorrt/download) from `tensorrt_home` by default. + * To use open-sourced [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/tree/main) parser instead, add `--use_tensorrt_oss_parser` parameter in build commands below. + * The default version of open-sourced onnx-tensorrt parser is specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). 
+ * To specify a different version of onnx-tensorrt parser: + * Select the commit of [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt/commits) that you preferred; + * Run `sha1sum` command with downloaded onnx-tensorrt zip file to acquire the SHA1 hash + * Update [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) with updated onnx-tensorrt commit and hash info. + * Please make sure TensorRT built-in parser/open-sourced onnx-tensorrt specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt) are **version-matched**, if enabling `--use_tensorrt_oss_parser`. + * i.e It's version-matched if assigning `tensorrt_home` with path to TensorRT-10.9 built-in binaries and onnx-tensorrt [10.9-GA branch](https://github.com/onnx/onnx-tensorrt/tree/release/10.9-GA) specified in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt). + + +### **[Note to ORT 1.21/1.22 open-sourced parser users]** + +* ORT 1.21/1.22 link against onnx-tensorrt 10.8-GA/10.9-GA, which requires newly released onnx 1.18. + * Here's a temporarily fix to preview on onnx-tensorrt 10.8-GA/10.9-GA when building ORT 1.21/1.22: + * Replace the [onnx line in cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/deps.txt#L38) + with `onnx;https://github.com/onnx/onnx/archive/e709452ef2bbc1d113faf678c24e6d3467696e83.zip;c0b9f6c29029e13dea46b7419f3813f4c2ca7db8` + * Download [this](https://github.com/microsoft/onnxruntime/blob/7b2733a526c12b5ef4475edd47fd9997ebc2b2c6/cmake/patches/onnx/onnx.patch) as raw file and save file to [cmake/patches/onnx/onnx.patch](https://github.com/microsoft/onnxruntime/blob/rel-1.21.0/cmake/patches/onnx/onnx.patch) (do not copy/paste from browser, as it might alter line break type) + * Build ORT with trt-related flags above (including `--use_tensorrt_oss_parser`) + * The [onnx 1.18](https://github.com/onnx/onnx/releases/tag/v1.18.0) is supported by latest ORT main branch. Please checkout main branch and build ORT-TRT with `--use_tensorrt_oss_parser` to enable OSS parser with full onnx 1.18 support. 
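For reference, the parser-version update described in the prerequisites above might look like the following sketch. The commit id, hash, and key name are placeholders; copy the exact key name from the existing onnx-tensorrt line in `cmake/deps.txt`.

```bash
# Download the preferred onnx-tensorrt commit and compute its SHA1.
wget -O onnx_tensorrt.zip https://github.com/onnx/onnx-tensorrt/archive/<commit>.zip
sha1sum onnx_tensorrt.zip
# Then edit the onnx-tensorrt entry in cmake/deps.txt to follow the
# name;URL;SHA1 pattern used by the other entries, e.g.:
#   <key name>;https://github.com/onnx/onnx-tensorrt/archive/<commit>.zip;<sha1 printed above>
```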
+ +### Build Instructions +{: .no_toc } + +#### Windows +```bash +# to build with tensorrt built-in parser +.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --cmake_generator "Visual Studio 17 2022" + +# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt +.\build.bat --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home --use_tensorrt_oss_parser --cmake_generator "Visual Studio 17 2022" +``` + +#### Linux + +```bash +# to build with tensorrt built-in parser +./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --tensorrt_home + +# to build with specific version of open-sourced onnx-tensorrt parser configured in cmake/deps.txt +./build.sh --config Release --parallel --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' --cudnn_home --cuda_home --use_tensorrt --use_tensorrt_oss_parser --tensorrt_home --skip_submodule_sync +``` + +Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#tensorrt) + +**Note** Building with `--use_tensorrt_oss_parser` with TensorRT 8.X requires additional flag --cmake_extra_defines onnxruntime_USE_FULL_PROTOBUF=ON + +--- + +## NVIDIA Jetson TX1/TX2/Nano/Xavier/Orin + +### Build Instructions +{: .no_toc } + +These instructions are for the latest [JetPack SDK](https://developer.nvidia.com/embedded/jetpack). + +1. Clone the ONNX Runtime repo on the Jetson host + + ```bash + git clone --recursive https://github.com/microsoft/onnxruntime + ``` + +2. Specify the CUDA compiler, or add its location to the PATH. + + 1. JetPack 5.x users can upgrade to the latest CUDA release without updating the JetPack version or Jetson Linux BSP (Board Support Package). + + 1. For JetPack 5.x users, CUDA>=11.8 and GCC>9.4 are required to be installed on and after ONNX Runtime 1.17. + + 2. Check [this official blog](https://developer.nvidia.com/blog/simplifying-cuda-upgrades-for-nvidia-jetson-users/) for CUDA upgrade instruction (CUDA 12.2 has been verified on JetPack 5.1.2 on Jetson Xavier NX). + + 1. If there's no `libnvcudla.so` under `/usr/local/cuda-12.2/compat`: `sudo apt-get install -y cuda-compat-12-2` and add `export LD_LIBRARY_PATH="/usr/local/cuda-12.2/lib64:/usr/local/cuda-12.2/compat:$LD_LIBRARY_PATH"` to `~/.bashrc`. + + 3. Check [here](https://developer.nvidia.com/cuda-gpus#collapse5) for compute capability datasheet. + + 2. CMake can't automatically find the correct `nvcc` if it's not in the `PATH`. `nvcc` can be added to `PATH` via: + + ```bash + export PATH="/usr/local/cuda/bin:${PATH}" + ``` + + or: + + ```bash + export CUDACXX="/usr/local/cuda/bin/nvcc" + ``` + + 3. Update TensorRT libraries + + 1. Jetpack 5.x supports up to TensorRT 8.5. Jetpack 6.x are equipped with TensorRT 8.6-10.3. + + 2. Jetpack 6.x users can download latest TensorRT 10 TAR package for **jetpack** on [TensorRT SDK website](https://developer.nvidia.com/tensorrt/download/10x). + + 3. Check [here](../execution-providers/TensorRT-ExecutionProvider.md#requirements) for TensorRT/CUDA support matrix among all ONNX Runtime versions. + +3. 
Install the ONNX Runtime build dependencies on the Jetpack host: + + ```bash + sudo apt install -y --no-install-recommends \ + build-essential software-properties-common libopenblas-dev \ + libpython3.10-dev python3-pip python3-dev python3-setuptools python3-wheel + ``` + +4. Cmake is needed to build ONNX Runtime. Please check the minimum required CMake version [here](https://github.com/microsoft/onnxruntime/blob/main/cmake/CMakeLists.txt#L6). Download from https://cmake.org/download/ and add cmake executable to `PATH` to use it. + +5. Build the ONNX Runtime Python wheel: + + 1. Build `onnxruntime-gpu` wheel with CUDA and TensorRT support (update paths to CUDA/CUDNN/TensorRT libraries if necessary): + + ```bash + ./build.sh --config Release --update --build --parallel --build_wheel \ + --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \ + --tensorrt_home /usr/lib/aarch64-linux-gnu + ``` + +​ Notes: + +* By default, `onnxruntime-gpu` wheel file will be captured under `path_to/onnxruntime/build/Linux/Release/dist/` (build path can be customized by adding `--build_dir` followed by a customized path to the build command above). + +* Append `--skip_tests --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=native' 'onnxruntime_BUILD_UNIT_TESTS=OFF' 'onnxruntime_USE_FLASH_ATTENTION=OFF' +'onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION=OFF'` to the build command to opt out optional features and reduce build time. + +* For a portion of Jetson devices like the Xavier series, higher power mode involves more cores (up to 6) to compute but it consumes more resource when building ONNX Runtime. Set `--parallel 1` in the build command if OOM happens and system is hanging. + +## TensorRT-RTX + +See more information on the NV TensorRT RTX Execution Provider [here](../execution-providers/TensorRTRTX-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + * Follow [instructions for CUDA execution provider](#cuda) to install CUDA and setup environment variables. + * Intall TensorRT for RTX from nvidia.com (TODO: add link when available) + +### Build Instructions +{: .no_toc } +`build.bat --config Release --parallel 32 --build_dir _build --build_shared_lib --use_nv_tensorrt_rtx --tensorrt_home "C:\dev\TensorRT-RTX-1.1.0.3" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9" --cmake_generator "Visual Studio 17 2022" --use_vcpkg` +Replace the --tensorrt_home and --cuda_home with correct paths to CUDA and TensorRT-RTX installations. + +## oneDNN + +See more information on oneDNN (formerly DNNL) [here](../execution-providers/oneDNN-ExecutionProvider.md). + +### Build Instructions +{: .no_toc } + + +The DNNL execution provider can be built for Intel CPU or GPU. To build for Intel GPU, install [Intel SDK for OpenCL Applications](https://software.intel.com/content/www/us/en/develop/tools/opencl-sdk.html) or build OpenCL from [Khronos OpenCL SDK](https://github.com/KhronosGroup/OpenCL-SDK). Pass in the OpenCL SDK path as dnnl_opencl_root to the build command. Install the latest GPU driver - [Windows graphics driver](https://downloadcenter.intel.com/product/80939/Graphics), [Linux graphics compute runtime and OpenCL driver](https://github.com/intel/compute-runtime/releases). 
+
+For CPU
+#### Windows
+`.\build.bat --use_dnnl`
+
+#### Linux
+`./build.sh --use_dnnl`
+
+For GPU
+#### Windows
+
+`.\build.bat --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "c:\program files (x86)\intelswtools\sw_dev_tools\opencl\sdk"`
+#### Linux
+
+`./build.sh --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "/opt/intel/sw_dev_tools/opencl-sdk"`
+
+#### Build Python Wheel
+
+The oneDNN EP build supports building a Python wheel for both Windows and Linux using the `--build_wheel` flag:
+
+`.\build.bat --config RelWithDebInfo --parallel --build_shared_lib --cmake_generator "Visual Studio 16 2019" --build_wheel --use_dnnl --dnnl_gpu_runtime ocl --dnnl_opencl_root "C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk"`
+
+---
+
+## OpenVINO
+
+See more information on the OpenVINO™ Execution Provider [here](../execution-providers/OpenVINO-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+1. Install the OpenVINO™ offline/online installer from the Intel® Distribution of OpenVINO™ Toolkit **Release 2025.3** for the appropriate OS and target hardware:
+   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE)
+   * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2025_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+
+   Follow the [documentation](https://docs.openvino.ai/2025/index.html) for detailed instructions.
+
+   *2025.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2025.0](https://docs.openvino.ai/2025/index.html) is the minimum required OpenVINO™ version.*
+
+2. Configure the target hardware by following the device-specific instructions:
+   * To configure Intel® Processor Graphics (GPU), please follow these instructions: [Windows](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2025/get-started/install-openvino/configurations/configurations-intel-gpu.html#linux)
+
+3. Initialize the OpenVINO™ environment by running the setupvars script as shown below. This is a required step:
+   * For Windows:
+   ```
+   C:\<openvino_install_directory>\setupvars.bat
+   ```
+   * For Linux:
+   ```
+   $ source <openvino_install_directory>/setupvars.sh
+   ```
+
+### Build Instructions
+{: .no_toc }
+
+#### Windows
+
+```
+.\build.bat --config RelWithDebInfo --use_openvino <hardware_option> --build_shared_lib --build_wheel
+```
+
+*Note: The default Windows CMake generator is Visual Studio 2019, but you can also use the newer Visual Studio 2022 by passing `--cmake_generator "Visual Studio 17 2022"` to `.\build.bat`.*
+
+#### Linux
+
+```bash
+./build.sh --config RelWithDebInfo --use_openvino <hardware_option> --build_shared_lib --build_wheel
+```
+
+* `--build_wheel`: Creates a Python wheel file in the dist/ folder. Enable it when building from source.
+* `--use_openvino`: Builds the OpenVINO™ Execution Provider in ONNX Runtime.
+* `<hardware_option>`: Specifies the default hardware target for building the OpenVINO™ Execution Provider. This can be overridden dynamically at runtime with another option (refer to [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md#summary-of-options) for more details on dynamic device selection). Below are the options for different Intel target devices.
+
+Refer to [Intel GPU device naming convention](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#device-naming-convention) for specifying the correct hardware target in cases where both integrated and discrete GPUs co-exist.
+
+| Hardware Option | Target Device |
+| --------------- | ------------------------|
+| CPU | Intel® CPUs |
+| GPU | Intel® Integrated Graphics |
+| GPU.0 | Intel® Integrated Graphics |
+| GPU.1 | Intel® Discrete Graphics |
+| NPU | Intel® Neural Processing Unit |
+| HETERO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | Any combination of the Intel® devices listed above |
+| MULTI:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | Any combination of the Intel® devices listed above |
+| AUTO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3... | Any combination of the Intel® devices listed above |
+
+Specifying the hardware target for a HETERO, MULTI, or AUTO device build:
+
+HETERO:DEVICE_TYPE_1,DEVICE_TYPE_2,DEVICE_TYPE_3...
+where DEVICE_TYPE can be any device from the list ['CPU', 'GPU', 'NPU'].
+
+A minimum of two devices must be specified for a valid HETERO, MULTI, or AUTO device build.
+
+```
+Examples: HETERO:GPU,CPU or AUTO:GPU,CPU or MULTI:GPU,CPU
+```
+
+#### Disable subgraph partition Feature
+* Builds the OpenVINO™ Execution Provider in ONNX Runtime with subgraph partitioning disabled.
+
+* With this option enabled, fully supported models run on the OpenVINO™ Execution Provider; otherwise they fall back entirely to the default CPU EP.
+
+* To enable this feature at build time, append `_NO_PARTITION` to the hardware option passed to `--use_openvino`.
+
+```
+Usage: --use_openvino CPU_NO_PARTITION or --use_openvino GPU_NO_PARTITION
+```
+
+For more information on the OpenVINO™ Execution Provider's ONNX layer support, topology support, and enabled Intel hardware, please refer to the document [OpenVINO™-ExecutionProvider](../execution-providers/OpenVINO-ExecutionProvider.md)
+
+---
+
+## QNN
+See more information on the QNN execution provider [here](../execution-providers/QNN-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+* Install the Qualcomm AI Engine Direct SDK (Qualcomm Neural Network SDK) [Linux/Android/Windows](https://qpm.qualcomm.com/main/tools/details/qualcomm_ai_engine_direct)
+
+* Install [cmake-3.28](https://cmake.org/download/) or higher.
+
+* Install Python 3.10 or higher.
+  * [Python 3.12 for Windows Arm64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-arm64.exe)
+  * [Python 3.12 for Windows x86-64](https://www.python.org/ftp/python/3.12.9/python-3.12.9-amd64.exe)
+  * Note: Windows on Arm supports an x86-64 Python environment via emulation. Ensure that the Arm64 Python environment is activated for a native Arm64 ONNX Runtime build.
+
+* Check out the source tree:
+
+  ```bash
+  git clone --recursive https://github.com/Microsoft/onnxruntime.git
+  cd onnxruntime
+  ```
+
+* Install the ONNX Runtime Python dependencies:
+  ```bash
+  pip install -r requirements.txt
+  ```
+
+### Build Options
+{: .no_toc }
+
+* `--use_qnn [QNN_LIBRARY_KIND]`: Builds the QNN Execution Provider. `QNN_LIBRARY_KIND` is optional and specifies whether to build the QNN Execution Provider as a shared library (default) or static library.
+  * `--use_qnn` or `--use_qnn shared_lib`: Builds the QNN Execution Provider as a shared library.
+  * `--use_qnn static_lib`: Builds the QNN Execution Provider as a static library linked into ONNX Runtime. This is required for Android builds.
+* `--qnn_home QNN_SDK_PATH`: The path to the Qualcomm AI Engine Direct SDK.
+ * Example on Windows: `--qnn_home 'C:\Qualcomm\AIStack\QAIRT\2.31.0.250130'` + * Example on Linux: `--qnn_home /opt/qcom/aistack/qairt/2.31.0.250130` +* `--build_wheel`: Enables Python bindings and builds Python wheel. +* `--arm64`: Cross-compile for Arm64. +* `--arm64ec`: Cross-compile for Arm64EC. Arm64EC code runs with native performance and is interoperable with x64 code running under emulation within the same process on a Windows on Arm device. Refer to the [Arm64EC Overview](https://learn.microsoft.com/en-us/windows/arm/arm64ec). + +Run `python tools/ci_build/build.py --help` for a description of all available build options. + +### Build Instructions +{: .no_toc } + +#### Windows (native x86-64 or native Arm64) +``` +.\build.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --skip_tests --build_dir build\Windows +``` + +Notes: +* Not all Qualcomm backends (e.g., HTP) are supported for model execution on a native x86-64 build. Refer to the [Qualcomm SDK backend documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/backend.html) for more information. +* Even if a Qualcomm backend does not support execution on x86-64, the QNN Execution provider may be able to [generate compiled models](../execution-providers/QNN-ExecutionProvider.md#qnn-context-binary-cache-feature) for the Qualcomm backend. + +#### Windows (Arm64 cross-compile target) +``` +.\build.bat --arm64 --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows +``` + +#### Windows (Arm64EC cross-compile target) +``` +.\build.bat --arm64ec --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --cmake_generator "Visual Studio 17 2022" --config Release --parallel --build_dir build\Windows +``` + +#### Windows (Arm64X cross-compile target) +Use the `build_arm64x.bat` script to build Arm64X binaries. Arm64X binaries bundle both Arm64 and Arm64EC code, making Arm64X compatible with both Arm64 and Arm64EC processes on a Windows on Arm device. Refer to the [Arm64X PE files overview](https://learn.microsoft.com/en-us/windows/arm/arm64x-pe). + +``` +.\build_arm64x.bat --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --cmake_generator "Visual Studio 17 2022" --config Release --parallel +``` +Notes: +* Do not specify a `--build_dir` option because `build_arm64x.bat` sets specific build directories. +* The above command places Arm64X binaries in the `.\build\arm64ec-x\Release\Release\` directory. 
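+
+After installing the wheel produced by one of the builds above, a quick way to sanity-check the result is to query the available providers from Python. The snippet below is a minimal sketch rather than part of the official instructions; the model path and the `backend_path` value are placeholders that depend on your model and QNN SDK layout:
+
+```python
+import onnxruntime as ort
+
+# "QNNExecutionProvider" should be listed if the wheel was built with --use_qnn.
+print(ort.get_available_providers())
+
+# Create a session targeting a QNN backend (adjust the paths for your setup).
+session = ort.InferenceSession(
+    "model.onnx",  # placeholder model
+    providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
+)
+print(session.get_providers())
+```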
+ +#### Linux (x86_64) +``` +./build.sh --use_qnn --qnn_home [QNN_SDK_PATH] --build_shared_lib --build_wheel --config Release --parallel --skip_tests --build_dir build/Linux +``` + +#### Android (cross-compile): + +Please reference [Build OnnxRuntime For Android](android.md) +``` +# on Windows +.\build.bat --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build\Android + +# on Linux +./build.sh --build_shared_lib --android --config Release --parallel --use_qnn static_lib --qnn_home [QNN_SDK_PATH] --android_sdk_path [android_SDK path] --android_ndk_path [android_NDK path] --android_abi arm64-v8a --android_api [api-version] --cmake_generator Ninja --build_dir build/Android +``` + +--- + +## DirectML +See more information on the DirectML execution provider [here](../execution-providers/DirectML-ExecutionProvider.md). +### Windows +{: .no_toc } + +``` +.\build.bat --use_dml +``` +### Notes +{: .no_toc } + +The DirectML execution provider supports building for both x64 and x86 architectures. DirectML is only supported on Windows. + +--- + +## Arm Compute Library +See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md). + +### Build Instructions +{: .no_toc } + +You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary). +See [here](inferencing.md#arm) for information on building for Arm®-based devices. + +Add the following options to `build.sh` to enable the ACL Execution Provider: + +``` +--use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build +``` + +## Arm NN + +See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + + +* Supported backend: i.MX8QM Armv8 CPUs +* Supported BSP: i.MX8QM BSP + * Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh` +* Set up the build environment + +```bash +source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux +alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake" +``` + +* See [here](inferencing.md#arm) for information on building for Arm-based devices + +### Build Instructions +{: .no_toc } + + +```bash +./build.sh --use_armnn +``` + +The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation build with --armnn_relu flag + +```bash +./build.sh --use_armnn --armnn_relu +``` + +The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation build with --armnn_bn flag + +```bash +./build.sh --use_armnn --armnn_bn +``` + +To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively. +The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively. 
+
+```bash
+./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build
+```
+
+---
+
+## RKNPU
+See more information on the RKNPU Execution Provider [here](../execution-providers/community-maintained/RKNPU-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+* Supported platform: RK1808 Linux
+* See [here](inferencing.md#arm) for information on building for Arm-based devices
+* Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake:
+
+```
+set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
+set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
+```
+
+### Build Instructions
+{: .no_toc }
+
+#### Linux
+
+1. Download [rknpu_ddk](https://github.com/airockchip/rknpu_ddk.git) to any directory.
+
+2. Build the ONNX Runtime library and tests:
+
+   ```bash
+   ./build.sh --arm --use_rknpu --parallel --build_shared_lib --build_dir build_arm --config MinSizeRel --cmake_extra_defines RKNPU_DDK_PATH=<path_to_rknpu_ddk> CMAKE_TOOLCHAIN_FILE=<path_to_tool.cmake> ONNX_CUSTOM_PROTOC_EXECUTABLE=<path_to_protoc_binary>
+   ```
+
+3. Deploy ONNX Runtime and librknpu_ddk.so on the RK1808 board:
+
+   ```bash
+   libonnxruntime.so.1.2.0
+   onnxruntime_test_all
+   rknpu_ddk/lib64/librknpu_ddk.so
+   ```
+
+---
+
+## AMD Vitis AI
+See more information on the Vitis AI Execution Provider [here](../execution-providers/Vitis-AI-ExecutionProvider.md).
+
+### Windows
+{: .no_toc }
+
+From the Visual Studio Developer Command Prompt or Developer PowerShell, execute the following command:
+
+```
+.\build.bat --use_vitisai --build_shared_lib --parallel --config Release
+```
+
+If you wish to leverage the Python APIs, please include the `--build_wheel` flag:
+
+```
+.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --build_wheel
+```
+
+You can also override the installation location by specifying CMAKE_INSTALL_PREFIX via the cmake_extra_defines parameter, e.g.
+
+```
+.\build.bat --use_vitisai --build_shared_lib --parallel --config Release --cmake_extra_defines CMAKE_INSTALL_PREFIX=D:\onnxruntime
+```
+### Linux
+{: .no_toc }
+
+Currently, Linux support is only enabled for AMD Adaptable SoCs. Please refer to the guidance [here](../execution-providers/Vitis-AI-ExecutionProvider.md#installation-for-amd-adaptable-socs) for SoC targets.
+
+---
+
+## AMD MIGraphX
+
+See more information on the MIGraphX Execution Provider [here](../execution-providers/MIGraphX-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/)
+  * The MIGraphX execution provider for ONNX Runtime is built and tested with ROCm 6.3.1
+* Install [MIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
+  * The path to the MIGraphX installation must be provided via the `--migraphx_home` parameter.
+
+### Build Instructions
+{: .no_toc }
+
+#### Linux
+
+```bash
+./build.sh --config <Release|Debug|RelWithDebInfo> --parallel --use_migraphx --migraphx_home <path_to_migraphx_home>
+```
+
+Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/blob/main/dockerfiles#migraphx).
+
+#### Build Python Wheel
+
+`./build.sh --config Release --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm`
+
+The Python wheels (*.whl) can then be found under ```./build/Linux/Release/dist```.
+
+---
+
+## AMD ROCm
+
+See more information on the ROCm Execution Provider [here](../execution-providers/ROCm-ExecutionProvider.md).
+ +### Prerequisites +{: .no_toc } + +* Install [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/) + * The ROCm execution provider for ONNX Runtime is built and tested with ROCm6.3.1 + +### Build Instructions +{: .no_toc } + +#### Linux + +```bash +./build.sh --config --parallel --use_rocm --rocm_home +``` + +Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm). + +#### Build Phython Wheel + +`./build.sh --config Release --build_wheel --parallel --use_rocm --rocm_home /opt/rocm` + +Then the python wheels(*.whl) could be found at ```./build/Linux/Release/dist```. + +--- + +## NNAPI + +Usage of NNAPI on Android platforms is via the NNAPI Execution Provider (EP). + +See the [NNAPI Execution Provider](../execution-providers/NNAPI-ExecutionProvider.md) documentation for more details. + +The pre-built ONNX Runtime Mobile package for Android includes the NNAPI EP. + +If performing a custom build of ONNX Runtime, support for the NNAPI EP or CoreML EP must be enabled when building. + +### Create a minimal build with NNAPI EP support + +Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. + +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the NNAPI EP. +* Add `--use_nnapi` to include the NNAPI EP in the build + +#### Example build commands with the NNAPI EP enabled + +Windows example: + +```dos +.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config +``` + +Linux example: + +```bash +./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_nnapi --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` +``` + +## CoreML + +Usage of CoreML on iOS and macOS platforms is via the CoreML EP. + +See the [CoreML Execution Provider](../execution-providers/CoreML-ExecutionProvider.md) documentation for more details. + +The pre-built ONNX Runtime Mobile package for iOS includes the CoreML EP. + +### Create a minimal build with CoreML EP support + +Please see [the instructions](./ios.md) for setting up the iOS environment required to build. The iOS/macOS build must be performed on a mac machine. + +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the CoreML EP. +* Add `--use_coreml` to include the CoreML EP in the build + +## XNNPACK + +Usage of XNNPACK on Android/iOS/Windows/Linux platforms is via the XNNPACK EP. + +See the [XNNPACK Execution Provider](../execution-providers/Xnnpack-ExecutionProvider.md) documentation for more details. 
+ +The pre-built ONNX Runtime package([`onnxruntime-android`](https://mvnrepository.com/artifact/com.microsoft.onnxruntime/onnxruntime-android)) for Android includes the XNNPACK EP. + +The pre-built ONNX Runtime Mobile package for iOS, `onnxruntime-c` and `onnxruntime-objc` in [CocoaPods](https://cocoapods.org/), includes the XNNPACK EP. (Package `onnxruntime-objc` with XNNPACK will be available since 1.14.) + + +If performing a custom build of ONNX Runtime, support for the XNNPACK EP must be enabled when building. + +### Build for Android +#### Create a minimal build with XNNPACK EP support + +Please see [the instructions](./android.md) for setting up the Android environment required to build. The Android build can be cross-compiled on Windows or Linux. + +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. +* Add `--use_xnnpack` to include the XNNPACK EP in the build + +##### Example build commands with the XNNPACK EP enabled + +Windows example: + +```bash +.\build.bat --config MinSizeRel --android --android_sdk_path D:\Android --android_ndk_path D:\Android\ndk\21.1.6352462\ --android_abi arm64-v8a --android_api 29 --cmake_generator Ninja --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config +``` + +Linux example: + +```bash +./build.sh --config MinSizeRel --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --minimal_build extended --use_xnnpack --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config ` +``` +If you don't mind MINIMAL build, you can use the following command to build XNNPACK EP for Android: +Linux example: +```bash +./build.sh --cmake_generator "Ninja" --android --android_sdk_path /Android --android_ndk_path /Android/ndk/21.1.6352462/ --android_abi arm64-v8a --android_api 29 --use_xnnpack +``` +### Build for iOS (available since 1.14) +A Mac machine is required to build package for iOS. Please follow this [guide](./ios.md) to set up environment firstly. +#### Create a minimal build with XNNPACK EP support + +Once you have all the necessary components setup, follow the instructions to [create the custom build](./custom.md), with the following changes: + +* Replace `--minimal_build` with `--minimal_build extended` to enable support for execution providers that dynamically create kernels at runtime, which is required by the XNNPACK EP. +* Add `--use_xnnpack` to include the XNNPACK EP in the build + +```dos +./build.sh --config --use_xcode \ + --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deploy_target --use_xnnpack --minimal_build extended --disable_ml_ops --disable_exceptions --build_shared_lib --skip_tests --include_ops_by_config +``` + +### Build for Windows +```dos +.\build.bat --config --use_xnnpack +``` +### Build for Linux +```bash +./build.sh --config --use_xnnpack +``` + +--- + +## CANN + +See more information on the CANN Execution Provider [here](../execution-providers/community-maintained/CANN-ExecutionProvider.md). + +### Prerequisites +{: .no_toc } + +1. 
Install the CANN Toolkit for the appropriate OS and target hardware by following [documentation](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/51RC1alphaX/softwareinstall/instg/atlasdeploy_03_0017.html) for detailed instructions, please. + +2. Initialize the CANN environment by running the script as shown below. + + ```bash + # Default path, change it if needed. + source /usr/local/Ascend/ascend-toolkit/set_env.sh + ``` + +### Build Instructions +{: .no_toc } + +#### Linux + +```bash +./build.sh --config --build_shared_lib --parallel --use_cann +``` + +### Notes +{: .no_toc } + +* The CANN execution provider supports building for both x64 and aarch64 architectures. +* CANN excution provider now is only supported on Linux. + +## Azure + +See the [Azure Execution Provider](../execution-providers/Azure-ExecutionProvider.md) documentation for more details. + +### Prerequisites + +For Linux, before building, please: + +* install openssl dev package into the system, which is openssl-dev for redhat and libssl-dev for ubuntu. +* if have multiple openssl dev versions installed in the system, please set environment variable "OPENSSL_ROOT_DIR" to the desired version, for example: + +```base +set OPENSSL_ROOT_DIR=/usr/local/ssl3.x/ +``` + +### Build Instructions + +#### Windows + +```dos +build.bat --config --build_shared_lib --build_wheel --use_azure +``` + +#### Linux + +```bash +./build.sh --config --build_shared_lib --build_wheel --use_azure +``` \ No newline at end of file diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 24230df06e353..c26eaa8be306e 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -1,1073 +1,572 @@ ---- -title: Intel - OpenVINO™ -description: Instructions to execute OpenVINO™ Execution Provider for ONNX Runtime. -parent: Execution Providers -nav_order: 3 -redirect_from: /docs/reference/execution-providers/OpenVINO-ExecutionProvider ---- - -# OpenVINO™ Execution Provider -{: .no_toc } - -Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution Provider. Please refer to [this](https://software.intel.com/en-us/openvino-toolkit/hardware) page for details on the Intel hardware supported. - -## Contents -{: .no_toc } - -* TOC placeholder -{:toc} - -## Install - -Pre-built packages are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release. -* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.8 Release](https://github.com/intel/onnxruntime/releases) -* Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/) - -## Requirements - - -ONNX Runtime OpenVINO™ Execution Provider is compatible with three latest releases of OpenVINO™. - -|ONNX Runtime|OpenVINO™|Notes| -|---|---|---| -|1.23.0|2025.3|[Details - Placeholder]()| -|1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)| -|1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)| - -## Build - -For build instructions, please see the [BUILD page](../build/eps.md#openvino). 
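+
+After building with `--use_openvino` (or installing a prebuilt `onnxruntime-openvino` wheel), a quick way to confirm that the provider is registered is shown below. This is a minimal sketch rather than part of the official instructions; the model path is a placeholder:
+
+```python
+import onnxruntime as ort
+
+# "OpenVINOExecutionProvider" should appear in this list.
+print(ort.get_available_providers())
+
+# Create a session that explicitly requests the OpenVINO EP (CPU device here).
+session = ort.InferenceSession(
+    "model.onnx",  # placeholder model
+    providers=["OpenVINOExecutionProvider"],
+    provider_options=[{"device_type": "CPU"}],
+)
+print(session.get_providers())
+```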
- -## Usage - -**Python Package Installation** - -For Python users, install the onnxruntime-openvino package: -``` -pip install onnxruntime-openvino -``` - -**Set OpenVINO™ Environment Variables** - -To use OpenVINO™ Execution Provider with any programming language (Python, C++, C#), you must set up the OpenVINO™ Environment Variables using the full installer package of OpenVINO™. - -* **Windows** -``` -C:\ \setupvars.bat -``` -* **Linux** -``` -$ source /setupvars.sh -``` -**Note for Linux Python Users:** OpenVINO™ Execution Provider installed from PyPi.org comes with prebuilt OpenVINO™ libs and supports flag CXX11_ABI=0. So there is no need to install OpenVINO™ separately. However, if you need to enable CX11_ABI=1 flag, build ONNX Runtime python wheel packages from source. For build instructions, see the [BUILD page](../build/eps.md#openvino). - - -**Set OpenVINO™ Environment for C#** - -To use csharp api for openvino execution provider create a custom nuget package. Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install prerequisites for nuget creation. Once prerequisites are installed follow the instructions to [build openvino execution provider](../build/eps.md#openvino) and add an extra flag `--build_nuget` to create nuget packages. Two nuget packages will be created Microsoft.ML.OnnxRuntime.Managed and Intel.ML.OnnxRuntime.Openvino. - -## Table of Contents -- [Configuration Options](#configuration-options) -- [Features ](#features ) -- [Examples](#examples) -- [Detailed Descriptions](#detailed-descriptions) - - -## Configuration Options - - -Runtime parameters you set when initializing the OpenVINO Execution Provider to control how inference runs. - -### Configuration Table - -| **Key** | **Type** | **Allowable Values** | **Value Type** | **Description** | **Example** | -|---------|----------|---------------------|----------------|-----------------|-------------| -| **device_type** | string | CPU, NPU, GPU, GPU.0, GPU.1, HETERO, MULTI, AUTO | string | [Choose which hardware device to use](#device_type-config-description) | [Examples](#device_type-config-examples) | -| **precision** | string | FP32, FP16, ACCURACY | string | [Set inference precision level](#precision-config-description) | [Examples](#precision-config-examples) | -| **num_of_threads** | string | Any positive integer > 0 | size_t | [Control number of inference threads](#num_of_threads-config-description) | [Examples](#num_of_threads-config-examples) | -| **num_streams** | string | Any positive integer > 0 | size_t | [Set parallel execution streams](#num_streams-config-description) | [Examples](#num_streams-config-examples) | -| **cache_dir** | string | Valid filesystem path | string | [Enable model caching by setting cache directory](#cache_dir-config-description) | [Examples](#cache_dir-config-examples) | -| **load_config** | string | JSON file path | string | [Load custom OpenVINO properties from JSON](#load_config-config-description) | [Examples](#load_config-config-examples) | -| **enable_qdq_optimizer** | string | True/False | boolean | [Enable QDQ optimization for NPU](#enable_qdq_optimizer-config-description) | [Examples](#enable_qdq_optimizer-config-examples) | -| **disable_dynamic_shapes** | string | True/False | boolean | [Convert dynamic models to static shapes](#disable_dynamic_shapes-config-description) | [Examples](#disable_dynamic_shapes-config-examples) | -| **model_priority** | string | LOW, MEDIUM, HIGH, DEFAULT | string | [Configure model resource allocation 
priority](#model_priority-config-description) | [Examples](#model_priority-config-examples) | -| **reshape_input** | string | input_name[shape_bounds] | string | [Set dynamic shape bounds for NPU models](#reshape_input-config-description) | [Examples](#reshape_input-config-examples) | -| **layout** | string | input_name[layout_format] | string | [Specify input/output tensor layout format](#layout-config-description) | [Examples](#layout-config-examples) | - -Valid Hetero or Multi or Auto Device combinations: `HETERO:,...` -The `device` can be any of these devices from this list ['CPU','GPU', 'NPU'] - -A minimum of two DEVICE_TYPE'S should be specified for a valid HETERO, MULTI, or AUTO Device Build. - -Example: HETERO:GPU,CPU AUTO:GPU,CPU MULTI:GPU,CPU - -Deprecated device_type option : CPU_FP32, GPU_FP32, GPU_FP16, NPU_FP16 are no more supported. They will be deprecated in the future release. Kindly upgrade to latest device_type and precision option. - ---- - -## Features - - -Built-in capabilities that the OpenVINO EP provides automatically or can be enabled through configuration. - - -### Features Table - -| **Feature** | **Supported Devices** | **Description** | **How to Enable** | **Example** | -|-------------|----------------------|-----------------|-------------------|-------------| -| **Auto Device Selection** | CPU, GPU, NPU | [Automatically selects optimal device for your model](#auto-device-execution-for-openvino-execution-provider) | Set device_type to AUTO | Examples | -| **Model Caching** | CPU, GPU, NPU | [Saves compiled models for faster subsequent loading](#model-caching) | Set cache_dir option | Examples | -| **Multi-Threading** | All devices | [Thread-safe inference with configurable thread count](#multi-threading-for-openvino-execution-provider) | Automatic/configure with num_of_threads | Examples | -| **Multi-Stream Execution** | All devices | [Parallel inference streams for higher throughput](#multi-streams-for-openvino-execution-provider) | Configure with num_streams | Examples | -| **Heterogeneous Execution** | CPU + GPU/NPU | [Split model execution across multiple devices](#heterogeneous-execution-for-openvino-execution-provider) | Set device_type to HETERO | Examples | -| **Multi-Device Execution** | CPU, GPU, NPU | [Run same model on multiple devices in parallel](#multi-device-execution-for-openvino-execution-provider) | Set device_type to MULTI | Examples | -| **INT8 Quantized Models** | CPU, GPU, NPU | [Support for quantized models with better performance](#support-for-int8-quantized-models) | Automatic for quantized models | Examples | -| **External Weights Support** | All devices | [Load models with weights stored in external files](#support-for-weights-saved-in-external-files) | Automatic detection | [Example](#support-for-weights-saved-in-external-files) | -| **Dynamic Shape Management** | All devices | [Handle models with variable input dimensions](#dynamic-shape-management) | Automatic/use reshape_input for NPU | Examples | -| **Tensor Layout Control** | All devices | [Explicit control over tensor memory layout](#tensor-layout-control) | Set layout option | Examples | -| **QDQ Optimization** | NPU | [Optimize quantized models for NPU performance](#enable-qdq-optimizations-passes) | Set enable_qdq_optimizer | Examples | -| **EP-Weight Sharing** | All devices | [Share weights across multiple inference sessions](#openvino-execution-provider-supports-ep-weight-sharing-across-sessions) | Session configuration | Examples | - - -## Examples - -### Configuration 
Examples - - -#### device_type Config Examples -```python -Single device -"device_type": "GPU" -"device_type": "NPU" -"device_type": "CPU" -Specific GPU -"device_type": "GPU.1" - -Multi-device configurations -"device_type": "HETERO:GPU,CPU" -"device_type": "MULTI:GPU,CPU" -"device_type": "AUTO:GPU,NPU,CPU" - -Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'AUTO:GPU,NPU,CPU'}] - ) - -Command line -onnxruntime_perf_test.exe -e openvino -i "device_type|GPU" model.onnx -``` - -#### precision Config Examples -```python -"precision": "FP32" -"precision": "FP16" -"precision": "ACCURACY" - - -# Python API - -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'precision': 'FP16'}] -) - - -``` -#### num_of_threads Config Examples -```python -"num_of_threads": "4" -"num_of_threads": "8" -"num_of_threads": "16" - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'CPU', 'num_of_threads': '8'}] -) - -``` - -#### num_streams Config Examples -```python -"num_streams": "1" -"num_streams": "4" -"num_streams": "8 - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'num_streams': '4'}] -) - -``` - -#### cache_dir Config Examples -```python -# Windows -"cache_dir": "C:\\intel\\openvino_cache" - -# Linux -"cache_dir": "/tmp/ov_cache" - -# Relative path -"cache_dir": "./model_cache" - -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] -) - -``` - -#### load_config Config Examples -```python -# JSON file path -"load_config": "config.json" -"load_config": "/path/to/openvino_config.json" -"load_config": "C:\\configs\\gpu_config.json" - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'load_config': 'custom_config.json'}] -) - -# Example JSON content: -{ - "GPU": { - "PERFORMANCE_HINT": "THROUGHPUT", - "EXECUTION_MODE_HINT": "ACCURACY", - "CACHE_DIR": "C:\\gpu_cache" - }, - "NPU": { - "LOG_LEVEL": "LOG_DEBUG" - } -} -# Command line usage -onnxruntime_perf_test.exe -e openvino -i "device_type|NPU load_config|config.json" model.onnx - -``` - -#### enable_qdq_optimizer Config Examples - -```python -"enable_qdq_optimizer": "True" # Enable QDQ optimization for NPU -"enable_qdq_optimizer": "False" # Disable QDQ optimization - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'NPU', 'enable_qdq_optimizer': 'True'}] -) -``` -#### disable_dynamic_shapes Config Examples -```python -"disable_dynamic_shapes": "True" # Convert dynamic to static shapes -"disable_dynamic_shapes": "False" # Keep original dynamic shapes - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'disable_dynamic_shapes': 'True'}] -) - -``` - -#### model_priority Config Examples -```python 
-"model_priority": "HIGH" # Highest resource priority -"model_priority": "MEDIUM" # Medium resource priority -"model_priority": "LOW" # Lowest resource priority -"model_priority": "DEFAULT" # System default priority - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'model_priority': 'HIGH'}] -) -``` - -#### reshape_input Config Examples -```python -# Command line usage (NPU only) -"reshape_input": "data[1,3,60,80..120]" # Dynamic height: 80-120 -"reshape_input": "input[1,3,224,224]" # Fixed shape -"reshape_input": "seq[1,10..50,768]" # Dynamic sequence: 10-50 - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'NPU', 'reshape_input': 'data[1,3,60,80..120]'}] -) -# Command line -onnxruntime_perf_test.exe -e openvino -i "device_type|NPU reshape_input|data[1,3,60,80..120]" model.onnx - -``` - -#### layout Config Examples -```python -# Command line usage -"layout": "data_0[NCHW],prob_1[NC]" # Multiple inputs/outputs -"layout": "input[NHWC]" # Single input -"layout": "data[N?HW]" # Unknown channel dimension - -# Python API -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'NPU', 'layout': 'data_0[NCHW],output[NC]'}] -) - -# Command line -onnxruntime_perf_test.exe -e openvino -i "device_type|NPU layout|data_0[NCHW],prob_1[NC]" model.onnx -``` - -### Feature Examples - -#### auto-device Feature Examples - -```python -# Basic AUTO usage -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'AUTO'}] -) - -# AUTO with device priority -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'AUTO:GPU,NPU,CPU'}] -) - -# Command line -onnxruntime_perf_test.exe -e openvino -i "device_type|AUTO:GPU,CPU" model.onnx - -``` - -#### model-caching Feature Examples -```python -# Enable caching -import onnxruntime as ort -session = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] -) - -# First run: compiles and caches model -# Subsequent runs: loads from cache (much faster) -session1 = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] -) # Slow first time -session2 = ort.InferenceSession( - "model.onnx", - providers=['OpenVINOExecutionProvider'], - provider_options=[{'device_type': 'GPU', 'cache_dir': './ov_cache'}] -) # Fast second time -``` - -#### multi-threading Feature Examples -```python - -``` - -#### multi-stream-feature Examples -```python - -``` -### Detailed Descriptions -#### Configuration Descriptions -#### device_type Config Description -Specifies which hardware device to run inference on. This is the primary configuration that determines execution target. 
-Available Options: - -CPU: Intel CPU execution using OpenVINO CPU plugin -NPU: Neural Processing Unit for AI-optimized inference -GPU: Intel GPU acceleration (integrated or discrete) -GPU.0, GPU.1: Specific GPU device selection in multi-GPU systems -AUTO: Automatic device selection based on model characteristics -HETERO: Heterogeneous execution across multiple devices -MULTI: Multi-device parallel execution - -Default Behavior: If not specified, uses the default hardware specified during build time. -#### precision Config Description -Controls the numerical precision used during inference, affecting both performance and accuracy. -Device Support: - -CPU: FP32 -GPU: FP32, FP16, ACCURACY -NPU: FP16 - -ACCURACY Mode: Maintains original model precision without any conversion, ensuring maximum accuracy at potential performance cost. -Performance Considerations: FP16 generally provides 2x better performance on GPU/NPU with minimal accuracy loss. - -#### num_of_threads Config Description -Override the default number of inference threads for CPU-based execution. -Default: 8 threads if not specified - -#### num_streams Config Description -Controls the number of parallel inference streams for throughput optimization. -Default: 1 stream (latency-optimized) -Use Cases: - -Single stream (1): Minimize latency for real-time applications -Multiple streams (2-8): Maximize throughput for batch processing -Optimal count: Usually matches number of CPU cores or GPU execution units - -Performance Impact: More streams can improve throughput but may increase memory usage. -#### cache_dir Config Description -Specifies directory path for caching compiled models to improve subsequent load times. -Benefits: Dramatically faster model loading after first compilation - Reduces initialization overhead significantly E - specially beneficial for complex models and frequent restarts -Requirements: Directory must be writable by the application -Sufficient disk space for cached models (can be substantial) -Path must be accessible at runtime - -Supported Devices: CPU, NPU, GPU -#### context Config Description - -Provides OpenCL context for GPU acceleration when OpenVINO EP is built with OpenCL support. -Usage: Pass cl_context address as void pointer converted to string -Availability: Only when compiled with OpenCL flags enabled -Purpose: Integration with existing OpenCL workflows and shared memory management - -#### load_config Config Description -Enables loading custom OpenVINO properties from JSON configuration file during runtime. 
-JSON Format: - -```python -{ - "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} -} -``` -Validation Rules: - -Invalid property keys: Ignored with warning logged -Invalid property values: Causes exception during execution -Immutable properties: Skipped with warning logged - -Common Properties: - -PERFORMANCE_HINT: "THROUGHPUT", "LATENCY" -EXECUTION_MODE_HINT: "ACCURACY", "PERFORMANCE" -LOG_LEVEL: "LOG_DEBUG", "LOG_INFO", "LOG_WARNING" -CACHE_DIR: Custom cache directory path -INFERENCE_PRECISION_HINT: "f32", "f16" - -Device-Specific Properties: For setting appropriate `"PROPERTY"`, refer to OpenVINO config options for [CPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/cpu-device.html#supported-properties), [GPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#supported-properties), [NPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#supported-features-and-properties) and [AUTO](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html#using-auto). - -#### enable_qdq_optimizer Config Description -Enables Quantize-Dequantize (QDQ) optimization specifically for NPU devices. -Target: NPU devices only -Purpose: Optimizes ORT quantized models by keeping QDQ operations only for supported ops -Benefits: - -Better performance/accuracy with ORT optimizations disabled -NPU-specific quantization optimizations -Reduced computational overhead for quantized models - -#### disable_dynamic_shapes Config Description -Controls whether dynamic input shapes are converted to static shapes at runtime. -Options: - -True: Convert dynamic models to static shapes before execution -False: Maintain dynamic shape capabilities (default for most devices) - -Use Cases: -\ -Models with dynamic batch size or sequence length -Devices that perform significantly better with static shapes -Memory optimization scenarios - -#### model_priority Config Description -Configures resource allocation priority when multiple models run simultaneously. -Priority Levels: - -HIGH: Maximum resource allocation and priority -MEDIUM: Balanced resource sharing with other models -LOW: Minimal resource allocation, yields to higher priority models -DEFAULT: System-determined priority based on device capabilities - -Use Cases: Multi-model deployments, resource-constrained environments. -#### reshape_input Config Description -Allows setting dynamic shape bounds specifically for NPU devices to optimize memory allocation and performance. -Format: input_name[lower_bound..upper_bound] or input_name[fixed_shape] -Device Support: NPU only (other devices handle dynamic shapes automatically) -Purpose: NPU requires shape bounds for optimal memory management with dynamic models -Examples: - -data[1,3,224,224..448]: Height can vary from 224 to 448 -sequence[1,10..100,768]: Sequence length from 10 to 100 -batch[1..8,3,224,224]: Batch size from 1 to 8 - - -#### layout Config Description -Provides explicit control over tensor layout format for inputs and outputs, enabling performance optimizations. 
-Standard Layout Characters: - -N: Batch dimension -C: Channel dimension -H: Height dimension -W: Width dimension -D: Depth dimension -T: Time/sequence dimension -?: Unknown/placeholder dimension - -Format: input_name[LAYOUT],output_name[LAYOUT] - - -### Feature Descriptions - -#### Model Caching -OpenVINO™ supports [model caching](https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/optimizing-latency/model-caching-overview.html). - -Model caching feature is supported on CPU, NPU, GPU along with kernel caching on iGPU, dGPU. - -This feature enables users to save and load the blob file directly on to the hardware device target and perform inference with improved Inference Latency. - -Kernel Caching on iGPU and dGPU: - -This feature also allows user to save kernel caching as cl_cache files for models with dynamic input shapes. These cl_cache files can be loaded directly onto the iGPU/dGPU hardware device target and inferencing can be performed. - -#### Enabling Model Caching via Runtime options using C++/python API's. - -This flow can be enabled by setting the runtime config option 'cache_dir' specifying the path to dump and load the blobs (CPU, NPU, iGPU, dGPU) or cl_cache(iGPU, dGPU) while using the C++/python API'S. - -Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. - -### Support for INT8 Quantized models - -Int8 models are supported on CPU, GPU and NPU. - -### Support for Weights saved in external files - -OpenVINO™ Execution Provider now supports ONNX models that store weights in external files. It is especially useful for models larger than 2GB because of protobuf limitations. - -See the [OpenVINO™ ONNX Support documentation](https://docs.openvino.ai/2025/openvino-workflow/model-preparation/convert-model-onnx.html). - -Converting and Saving an ONNX Model to External Data: -Use the ONNX API's.[documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md#converting-and-saving-an-onnx-model-to-external-data). - -Example: - -```python -import onnx -onnx_model = onnx.load("model.onnx") # Your model in memory as ModelProto -onnx.save_model(onnx_model, 'saved_model.onnx', save_as_external_data=True, all_tensors_to_one_file=True, location='data/weights_data', size_threshold=1024, convert_attribute=False) -``` - -Note: -1. In the above script, model.onnx is loaded and then gets saved into a file called 'saved_model.onnx' which won't have the weights but this new onnx model now will have the relative path to where the weights file is located. The weights file 'weights_data' will now contain the weights of the model and the weights from the original model gets saved at /data/weights_data. - -2. Now, you can use this 'saved_model.onnx' file to infer using your sample. But remember, the weights file location can't be changed. The weights have to be present at /data/weights_data - -3. Install the latest ONNX Python package using pip to run these ONNX Python API's successfully. - - - -### Multi-threading for OpenVINO Execution Provider - -OpenVINO™ Execution Provider for ONNX Runtime enables thread-safe deep learning inference - -### Multi streams for OpenVINO Execution Provider -OpenVINO™ Execution Provider for ONNX Runtime allows multiple stream execution for difference performance requirements part of API 2.0 - -### Auto-Device Execution for OpenVINO Execution Provider - -Use `AUTO:,..` as the device name to delegate selection of an actual accelerator to OpenVINO™. 
Auto-device internally recognizes and selects devices from CPU, integrated GPU, discrete Intel GPUs (when available) and NPU (when available) depending on the device capabilities and the characteristic of ONNX models, for example, precisions. Then Auto-device assigns inference requests to the selected device. - -From the application point of view, this is just another device that handles all accelerators in full system. - -For more information on Auto-Device plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Auto Device Plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#automatic-device-selection). - -### Heterogeneous Execution for OpenVINO Execution Provider - -The heterogeneous execution enables computing for inference on one network on several devices. Purposes to execute networks in heterogeneous mode: - -* To utilize accelerator's power and calculate the heaviest parts of the network on the accelerator and execute unsupported layers on fallback devices like the CPU to utilize all available hardware more efficiently during one inference. - -For more information on Heterogeneous plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Heterogeneous Plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.html). - -### Multi-Device Execution for OpenVINO Execution Provider - -Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. Potential gains are as follows: - -* Improved throughput that multiple devices can deliver (compared to single-device execution) -* More consistent performance, since the devices can now share the inference burden (so that if one device is becoming too busy, another device can take more of the load) - -For more information on Multi-Device plugin of OpenVINO™, please refer to the -[Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#multi-stream-execution). - -### Export OpenVINO Compiled Blob -Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It complies with the ORT session config keys -``` - Ort::SessionOptions session_options; - - // Enable EP context feature to dump the partitioned graph which includes the EP context into Onnx file. - // "0": disable. (default) - // "1": enable. - - session_options.AddConfigEntry(kOrtSessionOptionEpContextEnable, "1"); - - // Flag to specify whether to dump the EP context into single Onnx model or pass bin path. - // "0": dump the EP context into separate file, keep the file name in the Onnx model. - // "1": dump the EP context into the Onnx model. (default). - - session_options.AddConfigEntry(kOrtSessionOptionEpContextEmbedMode, "1"); - - // Specify the file path for the Onnx model which has EP context. 
- // Defaults to /original_file_name_ctx.onnx if not specified - - session_options.AddConfigEntry(kOrtSessionOptionEpContextFilePath, ".\ov_compiled_epctx.onnx"); - - sess = onnxruntime.InferenceSession(, session_options) -``` -Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h) for more information about session options. - -### Enable QDQ Optimizations Passes -Optimizes ORT quantized models for the NPU device to only keep QDQs for supported ops and optimize for performance and accuracy.Generally this feature will give better performance/accuracy with ORT Optimizations disabled. -Refer to [Configuration Options](#configuration-options) for more information about using these runtime options. - -### Loading Custom JSON OpenVINO Config During Runtime -The `load_config` feature is developed to facilitate loading of OpenVINO EP parameters using a JSON input schema, which mandatorily follows below format - -``` -{ - "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} -} -``` -where "DEVICE_KEY" can be CPU, NPU, GPU or AUTO , "PROPERTY" must be a valid entity defined in [OpenVINO™ supported properties](https://github.com/openvinotoolkit/openvino/blob/releases/2025/3/src/inference/include/openvino/runtime/properties.hpp) & "PROPERTY_VALUE" must be a valid corresponding supported property value passed in as a string. - -If a property is set using an invalid key (i.e., a key that is not recognized as part of the `OpenVINO™ supported properties`), it will be ignored & a warning will be logged against the same. However, if a valid property key is used but assigned an invalid value (e.g., a non-integer where an integer is expected), the OpenVINO™ framework will result in an exception during execution. - -The valid properties are of two types viz. Mutable (Read/Write) & Immutable (Read only) these are also governed while setting the same. If an Immutable property is being set, we skip setting the same with a similar warning. - -For setting appropriate `"PROPERTY"`, refer to OpenVINO config options for [CPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/cpu-device.html#supported-properties), [GPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#supported-properties), [NPU](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/npu-device.html#supported-features-and-properties) and [AUTO](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/auto-device-selection.html#using-auto). - -Example: - -The usage of this functionality using onnxruntime_perf_test application is as below – - -``` -onnxruntime_perf_test.exe -e openvino -m times -r 1 -i "device_type|NPU load_config|test_config.json" model.onnx -``` -#### Dynamic Shape Management -Comprehensive handling of models with variable input dimensions across all supported devices. -Device-Specific Handling: - -NPU: Requires shape bounds via reshape_input for optimal memory management -CPU/GPU: Automatic dynamic shape handling with runtime optimization -All Devices: Option to convert dynamic to static shapes when beneficial - -OpenVINO™ Shape Management: -The reshape method updates input shapes and propagates them through all intermediate layers to outputs. This enables runtime shape modification for different input sizes. 
-Shape Changing Approaches: - -Single input models: Pass new shape directly to reshape method -Multiple inputs: Specify shapes by port, index, or tensor name -Batch modification: Use set_batch method with appropriate layout specification - -Performance Considerations: - -Static shapes avoid memory and runtime overheads -Dynamic shapes provide flexibility at performance cost -Shape bounds optimization (NPU) balances flexibility and performance - -Important: Always set static shapes when input dimensions won't change between inferences for optimal performance. - -#### Tensor Layout Control -Enables explicit specification of tensor memory layout for performance optimization currenty supported on CPU. -Layout specification helps OpenVINO optimize memory access patterns and tensor operations based on actual data organization. -Layout Specification Benefits: - -Optimized memory access: Improved cache utilization and memory throughput -Better tensor operations: Device-specific operation optimization -Reduced memory copies: Direct operation on optimally laid out data -Hardware-specific optimization: Leverages device-preferred memory layouts - -Common Layout Patterns: - -NCHW: Batch, Channel, Height, Width -NHWC: Batch, Height, Width, Channel -NC: Batch, Channel -NTD: Batch, Time, Dimension - - -### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions -The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences. - -With EP-Weight Sharing, prefill and KV cache models can now reuse the same set of weights, minimizing redundancy and optimizing inference. Additionally, this ensures that EP Context nodes are still created even when the model undergoes subgraph partitioning. - -These changes enable weight sharing between two models using the session context option: ep.share_ep_contexts. -Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option. - - - -## Configuration Options - -OpenVINO™ Execution Provider can be configured with certain options at runtime that control the behavior of the EP. 
These options can be set as key-value pairs as below:- - -### Python API -Key-Value pairs for config options can be set using InferenceSession API as follow:- - -``` -session = onnxruntime.InferenceSession(, providers=['OpenVINOExecutionProvider'], provider_options=[{Key1 : Value1, Key2 : Value2, ...}]) -``` -*Note that the releases from (ORT 1.10) will require explicitly setting the providers parameter if you want to use execution providers other than the default CPU provider (as opposed to the current behavior of providers getting set/registered by default based on the build flags) when instantiating InferenceSession.* - -### C/C++ API 2.0 -The session configuration options are passed to SessionOptionsAppendExecutionProvider API as shown in an example below for GPU device type: - -``` -std::unordered_map options; -options[device_type] = "GPU"; -options[precision] = "FP32"; -options[num_of_threads] = "8"; -options[num_streams] = "8"; -options[cache_dir] = ""; -options[context] = "0x123456ff"; -options[enable_qdq_optimizer] = "True"; -options[load_config] = "config_path.json"; -session_options.AppendExecutionProvider_OpenVINO_V2(options); -``` - -### C/C++ Legacy API -Note: This API is no longer officially supported. Users are requested to move to V2 API. - -The session configuration options are passed to SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in an example below for GPU device type: - -``` -OrtOpenVINOProviderOptions options; -options.device_type = "GPU_FP32"; -options.num_of_threads = 8; -options.cache_dir = ""; -options.context = 0x123456ff; -options.enable_opencl_throttling = false; -SessionOptions.AppendExecutionProvider_OpenVINO(session_options, &options); -``` - -### Onnxruntime Graph level Optimization -OpenVINO™ backend performs hardware, dependent as well as independent optimizations on the graph to infer it on the target hardware with best possible performance. In most cases it has been observed that passing the ONNX input graph as it is without explicit optimizations would lead to best possible optimizations at kernel level by OpenVINO™. For this reason, it is advised to turn off high level optimizations performed by ONNX Runtime for OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:- - -* #### Python API - ``` - options = onnxruntime.SessionOptions() - options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL - sess = onnxruntime.InferenceSession(, options) - ``` - -* #### C/C++ API - ``` - SessionOptions::SetGraphOptimizationLevel(ORT_DISABLE_ALL); - ``` - -## Support Coverage - -**ONNX Layers supported using OpenVINO** - -The table below shows the ONNX layers supported and validated using OpenVINO™ Execution Provider.The below table also lists the Intel hardware support for each of the layers. CPU refers to Intel® -Atom, Core, and Xeon processors. GPU refers to the Intel Integrated Graphics. Intel Discrete Graphics. For NPU if an op is not supported we fallback to CPU. 
- -| **ONNX Layers** | **CPU** | **GPU** | -| --- | --- | --- | -| Abs | Yes | Yes | -| Acos | Yes | Yes | -| Acosh | Yes | Yes | -| Add | Yes | Yes | -| And | Yes | Yes | -| ArgMax | Yes | Yes | -| ArgMin | Yes | Yes | -| Asin | Yes | Yes | -| Asinh | Yes | Yes | -| Atan | Yes | Yes | -| Atanh | Yes | Yes | -| AveragePool | Yes | Yes | -| BatchNormalization | Yes | Yes | -| BitShift | Yes | No | -| Ceil | Yes | Yes | -| Celu | Yes | Yes | -| Cast | Yes | Yes | -| Clip | Yes | Yes | -| Concat | Yes | Yes | -| Constant | Yes | Yes | -| ConstantOfShape | Yes | Yes | -| Conv | Yes | Yes | -| ConvInteger | Yes | Yes | -| ConvTranspose | Yes | Yes | -| Cos | Yes | Yes | -| Cosh | Yes | Yes | -| CumSum | Yes | Yes | -| DepthToSpace | Yes | Yes | -| DequantizeLinear | Yes | Yes | -| Div | Yes | Yes | -| Dropout | Yes | Yes | -| Einsum | Yes | Yes | -| Elu | Yes | Yes | -| Equal | Yes | Yes | -| Erf | Yes | Yes | -| Exp | Yes | Yes | -| Expand | Yes | Yes | -| EyeLike | Yes | No | -| Flatten | Yes | Yes | -| Floor | Yes | Yes | -| Gather | Yes | Yes | -| GatherElements | No | No | -| GatherND | Yes | Yes | -| Gemm | Yes | Yes | -| GlobalAveragePool | Yes | Yes | -| GlobalLpPool | Yes | Yes | -| GlobalMaxPool | Yes | Yes | -| Greater | Yes | Yes | -| GreaterOrEqual | Yes | Yes | -| GridSample | Yes | No | -| HardMax | Yes | Yes | -| HardSigmoid | Yes | Yes | -| Identity | Yes | Yes | -| If | Yes | Yes | -| ImageScaler | Yes | Yes | -| InstanceNormalization | Yes | Yes | -| LeakyRelu | Yes | Yes | -| Less | Yes | Yes | -| LessOrEqual | Yes | Yes | -| Log | Yes | Yes | -| LogSoftMax | Yes | Yes | -| Loop | Yes | Yes | -| LRN | Yes | Yes | -| LSTM | Yes | Yes | -| MatMul | Yes | Yes | -| MatMulInteger | Yes | No | -| Max | Yes | Yes | -| MaxPool | Yes | Yes | -| Mean | Yes | Yes | -| MeanVarianceNormalization | Yes | Yes | -| Min | Yes | Yes | -| Mod | Yes | Yes | -| Mul | Yes | Yes | -| Neg | Yes | Yes | -| NonMaxSuppression | Yes | Yes | -| NonZero | Yes | No | -| Not | Yes | Yes | -| OneHot | Yes | Yes | -| Or | Yes | Yes | -| Pad | Yes | Yes | -| Pow | Yes | Yes | -| PRelu | Yes | Yes | -| QuantizeLinear | Yes | Yes | -| QLinearMatMul | Yes | No | -| Range | Yes | Yes | -| Reciprocal | Yes | Yes | -| ReduceL1 | Yes | Yes | -| ReduceL2 | Yes | Yes | -| ReduceLogSum | Yes | Yes | -| ReduceLogSumExp | Yes | Yes | -| ReduceMax | Yes | Yes | -| ReduceMean | Yes | Yes | -| ReduceMin | Yes | Yes | -| ReduceProd | Yes | Yes | -| ReduceSum | Yes | Yes | -| ReduceSumSquare | Yes | Yes | -| Relu | Yes | Yes | -| Reshape | Yes | Yes | -| Resize | Yes | Yes | -| ReverseSequence | Yes | Yes | -| RoiAlign | Yes | Yes | -| Round | Yes | Yes | -| Scatter | Yes | Yes | -| ScatterElements | Yes | Yes | -| ScatterND | Yes | Yes | -| Selu | Yes | Yes | -| Shape | Yes | Yes | -| Shrink | Yes | Yes | -| Sigmoid | Yes | Yes | -| Sign | Yes | Yes | -| Sin | Yes | Yes | -| Sinh | Yes | No | -| SinFloat | No | No | -| Size | Yes | Yes | -| Slice | Yes | Yes | -| Softmax | Yes | Yes | -| Softplus | Yes | Yes | -| Softsign | Yes | Yes | -| SpaceToDepth | Yes | Yes | -| Split | Yes | Yes | -| Sqrt | Yes | Yes | -| Squeeze | Yes | Yes | -| Sub | Yes | Yes | -| Sum | Yes | Yes | -| Softsign | Yes | No | -| Tan | Yes | Yes | -| Tanh | Yes | Yes | -| ThresholdedRelu | Yes | Yes | -| Tile | Yes | Yes | -| TopK | Yes | Yes | -| Transpose | Yes | Yes | -| Unsqueeze | Yes | Yes | -| Upsample | Yes | Yes | -| Where | Yes | Yes | -| Xor | Yes | Yes | - - -### Topology Support - -Below topologies from ONNX open model zoo are fully 
supported on OpenVINO™ Execution Provider and many more are supported through sub-graph partitioning. -For NPU if model is not supported we fallback to CPU. - -### Image Classification Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| bvlc_alexnet | Yes | Yes | -| bvlc_googlenet | Yes | Yes | -| bvlc_reference_caffenet | Yes | Yes | -| bvlc_reference_rcnn_ilsvrc13 | Yes | Yes | -| emotion ferplus | Yes | Yes | -| densenet121 | Yes | Yes | -| inception_v1 | Yes | Yes | -| inception_v2 | Yes | Yes | -| mobilenetv2 | Yes | Yes | -| resnet18v2 | Yes | Yes | -| resnet34v2 | Yes | Yes | -| resnet101v2 | Yes | Yes | -| resnet152v2 | Yes | Yes | -| resnet50 | Yes | Yes | -| resnet50v2 | Yes | Yes | -| shufflenet | Yes | Yes | -| squeezenet1.1 | Yes | Yes | -| vgg19 | Yes | Yes | -| zfnet512 | Yes | Yes | -| mxnet_arcface | Yes | Yes | - - -### Image Recognition Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| mnist | Yes | Yes | - -### Object Detection Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| tiny_yolov2 | Yes | Yes | -| yolov3 | Yes | Yes | -| tiny_yolov3 | Yes | Yes | -| mask_rcnn | Yes | No | -| faster_rcnn | Yes | No | -| yolov4 | Yes | Yes | -| yolov5 | Yes | Yes | -| yolov7 | Yes | Yes | -| tiny_yolov7 | Yes | Yes | - -### Image Manipulation Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| mosaic | Yes | Yes | -| candy | Yes | Yes | -| cgan | Yes | Yes | -| rain_princess | Yes | Yes | -| pointilism | Yes | Yes | -| udnie | Yes | Yes | - -### Natural Language Processing Networks - -| **MODEL NAME** | **CPU** | **GPU** | -| --- | --- | --- | -| bert-squad | Yes | Yes | -| bert-base-cased | Yes | Yes | -| bert-base-chinese | Yes | Yes | -| bert-base-japanese-char | Yes | Yes | -| bert-base-multilingual-cased | Yes | Yes | -| bert-base-uncased | Yes | Yes | -| distilbert-base-cased | Yes | Yes | -| distilbert-base-multilingual-cased | Yes | Yes | -| distilbert-base-uncased | Yes | Yes | -| distilbert-base-uncased-finetuned-sst-2-english | Yes | Yes | -| gpt2 | Yes | Yes | -| roberta-base | Yes | Yes | -| roberta-base-squad2 | Yes | Yes | -| t5-base | Yes | Yes | -| twitter-roberta-base-sentiment | Yes | Yes | -| xlm-roberta-base | Yes | Yes | - -### Models Supported on NPU - -| **MODEL NAME** | **NPU** | -| --- | --- | -| yolov3 | Yes | -| microsoft_resnet-50 | Yes | -| realesrgan-x4 | Yes | -| timm_inception_v4.tf_in1k | Yes | -| squeezenet1.0-qdq | Yes | -| vgg16 | Yes | -| caffenet-qdq | Yes | -| zfnet512 | Yes | -| shufflenet-v2 | Yes | -| zfnet512-qdq | Yes | -| googlenet | Yes | -| googlenet-qdq | Yes | -| caffenet | Yes | -| bvlcalexnet-qdq | Yes | -| vgg16-qdq | Yes | -| mnist | Yes | -| ResNet101-DUC | Yes | -| shufflenet-v2-qdq | Yes | -| bvlcalexnet | Yes | -| squeezenet1.0 | Yes | - -**Note:** We have added support for INT8 models, quantized with Neural Network Compression Framework (NNCF). To know more about NNCF refer [here](https://github.com/openvinotoolkit/nncf). - -## OpenVINO™ Execution Provider Samples Tutorials - -In order to showcase what you can do with the OpenVINO™ Execution Provider for ONNX Runtime, we have created a few samples that shows how you can get that performance boost you’re looking for with just one additional line of code. 
-
-### Python API
-[Object detection with tinyYOLOv2 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/tiny_yolo_v2_object_detection)
-
-[Object detection with YOLOv4 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/yolov4_object_detection)
-
-### C/C++ API
-[Image classification with Squeezenet in CPP](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx/OpenVINO_EP)
-
-### Csharp API
-[Object detection with YOLOv3 in C#](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_sharp/OpenVINO_EP/yolov3_object_detection)
-
-## Blogs/Tutorials
-
-### Overview of OpenVINO Execution Provider for ONNX Runtime
-[OpenVINO Execution Provider](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/faster-inferencing-with-one-line-of-code.html)
-
-### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime Docker Containers
-[Docker Containers](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-docker-container.html)
-
-### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime python wheel packages
+---
+title: Intel - OpenVINO™
+description: Instructions to execute OpenVINO™ Execution Provider for ONNX Runtime.
+parent: Execution Providers
+nav_order: 3
+redirect_from: /docs/reference/execution-providers/OpenVINO-ExecutionProvider
+---
+
+# OpenVINO™ Execution Provider
+{: .no_toc }
+
+Accelerate ONNX models on Intel CPUs, GPUs and NPUs with the Intel OpenVINO™ Execution Provider. Please refer to [this](https://software.intel.com/en-us/openvino-toolkit/hardware) page for details on the Intel hardware supported.
+
+## Contents
+{: .no_toc }
+
+* TOC placeholder
+{:toc}
+
+## Install
+
+Pre-built packages of the OpenVINO™ Execution Provider for ONNX Runtime are published by Intel for each release.
+* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.8 Release](https://github.com/intel/onnxruntime/releases)
+* Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/)
+
+## Requirements
+
+The ONNX Runtime OpenVINO™ Execution Provider is compatible with the three latest releases of OpenVINO™.
+
+|ONNX Runtime|OpenVINO™|Notes|
+|---|---|---|
+|1.23.0|2025.3|[Details - Placeholder]()|
+|1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)|
+|1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)|
+
+## Build
+
+For build instructions, please see the [BUILD page](../build/eps.md#openvino).
+
+## Usage
+
+**Python Package Installation**
+
+For Python users, install the onnxruntime-openvino package:
+```
+pip install onnxruntime-openvino
+```
+
+**Set OpenVINO™ Environment Variables**
+
+To use the OpenVINO™ Execution Provider from any programming language (Python, C++, C#), you must set up the OpenVINO™ environment variables using the full installer package of OpenVINO™.
+
+* **Windows**
+```
+C:\<openvino_install_directory>\setupvars.bat
+```
+* **Linux**
+```
+$ source <openvino_install_directory>/setupvars.sh
+```
+**Note for Linux Python Users:** The OpenVINO™ Execution Provider installed from PyPI comes with prebuilt OpenVINO™ libraries and supports the flag CXX11_ABI=0, so there is no need to install OpenVINO™ separately. However, if you need to enable the CXX11_ABI=1 flag, build the ONNX Runtime Python wheel packages from source. For build instructions, see the [BUILD page](../build/eps.md#openvino).
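+
+As a quick sanity check after installation, a minimal sketch like the following (the model path `model.onnx` is a placeholder) confirms that the wheel exposes the OpenVINO™ Execution Provider and that a session actually uses it:
+
+```python
+import onnxruntime as ort
+
+# The OpenVINO EP is registered by the onnxruntime-openvino wheel.
+assert 'OpenVINOExecutionProvider' in ort.get_available_providers()
+
+# Create a session on the OpenVINO EP; the device defaults to CPU unless configured.
+session = ort.InferenceSession(
+    "model.onnx",  # placeholder model path
+    providers=['OpenVINOExecutionProvider'],
+    provider_options=[{'device_type': 'CPU'}]
+)
+
+# Providers actually assigned to this session (OpenVINO first if it was applied).
+print(session.get_providers())
+```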
+
+
+**Set OpenVINO™ Environment for C#**
+
+To use the C# API with the OpenVINO™ Execution Provider, create a custom NuGet package. Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install the prerequisites for NuGet creation. Once the prerequisites are installed, follow the instructions to [build the OpenVINO™ Execution Provider](../build/eps.md#openvino) and add the extra flag `--build_nuget` to create the NuGet packages. Two NuGet packages will be created: Microsoft.ML.OnnxRuntime.Managed and Intel.ML.OnnxRuntime.Openvino.
+
+# OpenVINO Execution Provider Configuration
+
+## Table of Contents
+- [Configuration Options](#configuration-options)
+- [Configuration Descriptions](#configuration-descriptions)
+- [Examples](#examples)
+
+## Configuration Options
+
+Runtime parameters that you set when initializing the OpenVINO™ Execution Provider to control how inference runs.
+
+**Click on any configuration key below to jump to its detailed description.**
+
+| **Key** | **Type** | **Allowable Values** | **Value Type** | **Description** |
+|---------|----------|---------------------|----------------|-----------------|
+| [**device_type**](#device_type) | string | CPU, NPU, GPU, GPU.0, GPU.1, HETERO, MULTI, AUTO | string | Choose which hardware device to use for inference |
+| [**precision**](#precision) | string | FP32, FP16, ACCURACY | string | Set inference precision level |
+| [**num_of_threads**](#num_of_threads--num_streams) | string | Any positive integer | size_t | Control number of inference threads |
+| [**num_streams**](#num_of_threads--num_streams) | string | Any positive integer | size_t | Set parallel execution streams for throughput |
+| [**cache_dir**](#cache_dir) | string | Valid filesystem path | string | Enable model caching by setting cache directory |
+| [**load_config**](#load_config) | string | JSON file path | string | Load custom OpenVINO properties from JSON |
+| [**enable_qdq_optimizer**](#enable_qdq_optimizer) | string | True/False | boolean | Enable QDQ optimization for NPU |
+| [**disable_dynamic_shapes**](#disable_dynamic_shapes--reshape_input) | string | True/False | boolean | Convert dynamic models to static shapes |
+| [**model_priority**](#model_priority) | string | LOW, MEDIUM, HIGH, DEFAULT | string | Configure model resource allocation priority |
+| [**reshape_input**](#disable_dynamic_shapes--reshape_input) | string | input_name[shape_bounds] | string | Set dynamic shape bounds for NPU models |
+| [**layout**](#layout) | string | input_name[layout_format] | string | Specify input/output tensor layout format |
+
+Refer to [Examples](#examples) for usage.
+
+---
+
+## Configuration Descriptions
+
+### device_type
+Specifies the target hardware device for inference execution. Supports single devices (CPU, NPU, GPU, GPU.0, GPU.1) and multi-device configurations.
+
+**Valid Device Combinations:**
+- `HETERO:<device_1>,<device_2>,...` - Split execution across devices
+- `MULTI:<device_1>,<device_2>,...` - Parallel execution on devices
+- `AUTO:<device_1>,<device_2>,...` - Automatic device selection
+
+A minimum of two devices is required. Examples: `HETERO:GPU,CPU`, `AUTO:GPU,NPU,CPU`, `MULTI:GPU,CPU`
+
+**Note:** The deprecated options `CPU_FP32`, `GPU_FP32`, `GPU_FP16`, `NPU_FP16` are no longer supported. Use `device_type` and `precision` separately.
+
+**Auto Device Selection:** Use `AUTO` to automatically select the optimal device based on model characteristics. AUTO internally recognizes the CPU, integrated GPU, discrete Intel GPUs, and NPU, then assigns inference requests to the best-suited device.
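+
+For illustration, a minimal sketch of selecting a `device_type` at session creation (the model path and the exact device names are assumptions — use devices that are actually present on your system):
+
+```python
+import onnxruntime as ort
+
+def make_session(device_type: str) -> ort.InferenceSession:
+    """Create a session pinned to the given OpenVINO device_type string."""
+    return ort.InferenceSession(
+        "model.onnx",  # placeholder model path
+        providers=['OpenVINOExecutionProvider'],
+        provider_options=[{'device_type': device_type}]
+    )
+
+cpu_session  = make_session('CPU')           # a single device
+igpu_session = make_session('GPU.0')         # a specific GPU selected by index
+auto_session = make_session('AUTO:GPU,CPU')  # let AUTO pick the best available device
+```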
+ +**Heterogeneous Execution:** Use `HETERO` to split network execution across multiple devices, utilizing accelerator power for heavy operations while falling back to CPU for unsupported layers. + +**Multi-Device Execution:** Use `MULTI` to run the same model on multiple devices in parallel, improving throughput and performance consistency through load distribution. + +### precision +Controls numerical precision during inference, balancing performance and accuracy. + +**Device Support:** +- CPU: FP32 +- GPU: FP32, FP16, ACCURACY +- NPU: FP16 + +**ACCURACY Mode:** Maintains original model precision without conversion, ensuring maximum accuracy. FP16 generally provides 2x better performance on GPU/NPU with minimal accuracy loss. + +### num_of_threads & num_streams +**Multi-Threading:** Controls inference thread count for CPU execution (default: 8). OpenVINO EP provides thread-safe inference across all devices. + +**Multi-Stream Execution:** Manages parallel inference streams for throughput optimization (default: 1 for latency). Multiple streams improve throughput for batch processing while single stream minimizes latency for real-time applications. + +### cache_dir +Enables model caching to dramatically reduce subsequent load times. Supports CPU, NPU, GPU with kernel caching on iGPU/dGPU. + +**Benefits:** Saves compiled models and cl_cache files for dynamic shapes, eliminating recompilation overhead. Especially beneficial for complex models and frequent application restarts. + +### load_config +Loads custom OpenVINO properties from JSON configuration file during runtime. + +**JSON Format:** +```json +{ + "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"} +} +``` +Validation: Invalid property keys are ignored with warnings. Invalid values cause execution exceptions. Immutable properties are skipped. +Common Properties: PERFORMANCE_HINT, EXECUTION_MODE_HINT, LOG_LEVEL, CACHE_DIR, INFERENCE_PRECISION_HINT. + +### enable_qdq_optimizer +NPU-specific optimization for Quantize-Dequantize operations. Optimizes ORT quantized models by keeping QDQ operations only for supported ops, providing better performance and accuracy. + +### disable_dynamic_shapes & reshape_input +**Dynamic Shape Management** : Handles models with variable input dimensions. Option to convert dynamic to static shapes when beneficial for performance. +**NPU Shape Bounds** : Use reshape_input to set dynamic shape bounds specifically for NPU devices (format: input_name[lower..upper] or input_name[fixed_shape]). +Required for optimal NPU memory management. + +### model_priority +Configures resource allocation priority for multi-model deployments: + +**HIGH**: Maximum resource allocation +**MEDIUM**: Balanced resource sharing +**LOW**: Minimal allocation, yields to higher priority +**DEFAULT**: System-determined priority + +### layout +***Tensor Layout Control:***: Provides explicit control over tensor memory layout for performance optimization. Helps OpenVINO optimize memory access patterns and tensor operations. + +***Layout Characters:***: N (Batch), C (Channel), H (Height), W (Width), D (Depth), T (Time), ? 
(Unknown)
+
+***Format:*** input_name[LAYOUT],output_name[LAYOUT]
+
+## Examples
+
+### Example 1
+
+```python
+import onnxruntime as ort
+
+# Multi-device with caching and threading optimization
+session = ort.InferenceSession(
+    "model.onnx",
+    providers=['OpenVINOExecutionProvider'],
+    provider_options=[{
+        'device_type': 'AUTO:GPU,NPU,CPU',
+        'precision': 'FP16',
+        'num_of_threads': '8',
+        'num_streams': '4',
+        'cache_dir': './ov_cache'
+    }]
+)
+
+# Command line equivalent
+# onnxruntime_perf_test.exe -e openvino -i "device_type|AUTO:GPU,NPU,CPU precision|FP16 num_of_threads|8 num_streams|4 cache_dir|./ov_cache" model.onnx
+```
+
+### Example 2
+```python
+import onnxruntime as ort
+
+# NPU-optimized with custom config and shape management
+session = ort.InferenceSession(
+    "model.onnx",
+    providers=['OpenVINOExecutionProvider'],
+    provider_options=[{
+        'device_type': 'HETERO:NPU,CPU',
+        'load_config': 'custom_config.json',
+        'enable_qdq_optimizer': 'True',
+        'disable_dynamic_shapes': 'True',
+        'model_priority': 'HIGH',
+        'reshape_input': 'data[1,3,224,224..448]',
+        'layout': 'data[NCHW],output[NC]'
+    }]
+)
+
+# Example custom_config.json
+{
+  "NPU": {
+    "LOG_LEVEL": "LOG_DEBUG",
+    "PERFORMANCE_HINT": "THROUGHPUT"
+  },
+  "CPU": {
+    "EXECUTION_MODE_HINT": "ACCURACY"
+  }
+}
+
+# Command line equivalent
+# onnxruntime_perf_test.exe -e openvino -i "device_type|HETERO:NPU,CPU load_config|custom_config.json enable_qdq_optimizer|True disable_dynamic_shapes|True model_priority|HIGH reshape_input|data[1,3,224,224..448] layout|data[NCHW],output[NC]" model.onnx
+
+```
+
+## Configuration Options
+
+OpenVINO™ Execution Provider can be configured with certain options at runtime that control the behavior of the EP. These options can be set as key-value pairs as below:-
+
+### Python API
+Key-Value pairs for config options can be set using InferenceSession API as follow:-
+
+```
+session = onnxruntime.InferenceSession(<path_of_the_model_file>, providers=['OpenVINOExecutionProvider'], provider_options=[{Key1 : Value1, Key2 : Value2, ...}])
+```
+*Note: from ORT 1.10 onwards, the providers parameter must be set explicitly when instantiating InferenceSession if you want to use an execution provider other than the default CPU provider (previously, providers were set/registered by default based on the build flags).*
+
+### C/C++ API 2.0
+The session configuration options are passed to the AppendExecutionProvider_OpenVINO_V2 API as shown in the example below for the GPU device type:
+
+```
+std::unordered_map<std::string, std::string> options;
+options["device_type"] = "GPU";
+options["precision"] = "FP32";
+options["num_of_threads"] = "8";
+options["num_streams"] = "8";
+options["cache_dir"] = "";
+options["context"] = "0x123456ff";
+options["enable_qdq_optimizer"] = "True";
+options["load_config"] = "config_path.json";
+session_options.AppendExecutionProvider_OpenVINO_V2(options);
+```
+
+### C/C++ Legacy API
+Note: This API is no longer officially supported. Users are requested to move to the V2 API.
+ +The session configuration options are passed to SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in an example below for GPU device type: + +``` +OrtOpenVINOProviderOptions options; +options.device_type = "GPU_FP32"; +options.num_of_threads = 8; +options.cache_dir = ""; +options.context = 0x123456ff; +options.enable_opencl_throttling = false; +SessionOptions.AppendExecutionProvider_OpenVINO(session_options, &options); +``` + +### Onnxruntime Graph level Optimization +OpenVINO™ backend performs hardware, dependent as well as independent optimizations on the graph to infer it on the target hardware with best possible performance. In most cases it has been observed that passing the ONNX input graph as it is without explicit optimizations would lead to best possible optimizations at kernel level by OpenVINO™. For this reason, it is advised to turn off high level optimizations performed by ONNX Runtime for OpenVINO™ Execution Provider. This can be done using SessionOptions() as shown below:- + +* #### Python API + ``` + options = onnxruntime.SessionOptions() + options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL + sess = onnxruntime.InferenceSession(, options) + ``` + +* #### C/C++ API + ``` + SessionOptions::SetGraphOptimizationLevel(ORT_DISABLE_ALL); + ``` + +## Support Coverage + +**ONNX Layers supported using OpenVINO** + +The table below shows the ONNX layers supported and validated using OpenVINO™ Execution Provider.The below table also lists the Intel hardware support for each of the layers. CPU refers to Intel® +Atom, Core, and Xeon processors. GPU refers to the Intel Integrated Graphics. Intel Discrete Graphics. For NPU if an op is not supported we fallback to CPU. + +| **ONNX Layers** | **CPU** | **GPU** | +| --- | --- | --- | +| Abs | Yes | Yes | +| Acos | Yes | Yes | +| Acosh | Yes | Yes | +| Add | Yes | Yes | +| And | Yes | Yes | +| ArgMax | Yes | Yes | +| ArgMin | Yes | Yes | +| Asin | Yes | Yes | +| Asinh | Yes | Yes | +| Atan | Yes | Yes | +| Atanh | Yes | Yes | +| AveragePool | Yes | Yes | +| BatchNormalization | Yes | Yes | +| BitShift | Yes | No | +| Ceil | Yes | Yes | +| Celu | Yes | Yes | +| Cast | Yes | Yes | +| Clip | Yes | Yes | +| Concat | Yes | Yes | +| Constant | Yes | Yes | +| ConstantOfShape | Yes | Yes | +| Conv | Yes | Yes | +| ConvInteger | Yes | Yes | +| ConvTranspose | Yes | Yes | +| Cos | Yes | Yes | +| Cosh | Yes | Yes | +| CumSum | Yes | Yes | +| DepthToSpace | Yes | Yes | +| DequantizeLinear | Yes | Yes | +| Div | Yes | Yes | +| Dropout | Yes | Yes | +| Einsum | Yes | Yes | +| Elu | Yes | Yes | +| Equal | Yes | Yes | +| Erf | Yes | Yes | +| Exp | Yes | Yes | +| Expand | Yes | Yes | +| EyeLike | Yes | No | +| Flatten | Yes | Yes | +| Floor | Yes | Yes | +| Gather | Yes | Yes | +| GatherElements | No | No | +| GatherND | Yes | Yes | +| Gemm | Yes | Yes | +| GlobalAveragePool | Yes | Yes | +| GlobalLpPool | Yes | Yes | +| GlobalMaxPool | Yes | Yes | +| Greater | Yes | Yes | +| GreaterOrEqual | Yes | Yes | +| GridSample | Yes | No | +| HardMax | Yes | Yes | +| HardSigmoid | Yes | Yes | +| Identity | Yes | Yes | +| If | Yes | Yes | +| ImageScaler | Yes | Yes | +| InstanceNormalization | Yes | Yes | +| LeakyRelu | Yes | Yes | +| Less | Yes | Yes | +| LessOrEqual | Yes | Yes | +| Log | Yes | Yes | +| LogSoftMax | Yes | Yes | +| Loop | Yes | Yes | +| LRN | Yes | Yes | +| LSTM | Yes | Yes | +| MatMul | Yes | Yes | +| MatMulInteger | Yes | No | +| Max | Yes | Yes | +| MaxPool | Yes | Yes | +| Mean | Yes | Yes | +| 
MeanVarianceNormalization | Yes | Yes | +| Min | Yes | Yes | +| Mod | Yes | Yes | +| Mul | Yes | Yes | +| Neg | Yes | Yes | +| NonMaxSuppression | Yes | Yes | +| NonZero | Yes | No | +| Not | Yes | Yes | +| OneHot | Yes | Yes | +| Or | Yes | Yes | +| Pad | Yes | Yes | +| Pow | Yes | Yes | +| PRelu | Yes | Yes | +| QuantizeLinear | Yes | Yes | +| QLinearMatMul | Yes | No | +| Range | Yes | Yes | +| Reciprocal | Yes | Yes | +| ReduceL1 | Yes | Yes | +| ReduceL2 | Yes | Yes | +| ReduceLogSum | Yes | Yes | +| ReduceLogSumExp | Yes | Yes | +| ReduceMax | Yes | Yes | +| ReduceMean | Yes | Yes | +| ReduceMin | Yes | Yes | +| ReduceProd | Yes | Yes | +| ReduceSum | Yes | Yes | +| ReduceSumSquare | Yes | Yes | +| Relu | Yes | Yes | +| Reshape | Yes | Yes | +| Resize | Yes | Yes | +| ReverseSequence | Yes | Yes | +| RoiAlign | Yes | Yes | +| Round | Yes | Yes | +| Scatter | Yes | Yes | +| ScatterElements | Yes | Yes | +| ScatterND | Yes | Yes | +| Selu | Yes | Yes | +| Shape | Yes | Yes | +| Shrink | Yes | Yes | +| Sigmoid | Yes | Yes | +| Sign | Yes | Yes | +| Sin | Yes | Yes | +| Sinh | Yes | No | +| SinFloat | No | No | +| Size | Yes | Yes | +| Slice | Yes | Yes | +| Softmax | Yes | Yes | +| Softplus | Yes | Yes | +| Softsign | Yes | Yes | +| SpaceToDepth | Yes | Yes | +| Split | Yes | Yes | +| Sqrt | Yes | Yes | +| Squeeze | Yes | Yes | +| Sub | Yes | Yes | +| Sum | Yes | Yes | +| Softsign | Yes | No | +| Tan | Yes | Yes | +| Tanh | Yes | Yes | +| ThresholdedRelu | Yes | Yes | +| Tile | Yes | Yes | +| TopK | Yes | Yes | +| Transpose | Yes | Yes | +| Unsqueeze | Yes | Yes | +| Upsample | Yes | Yes | +| Where | Yes | Yes | +| Xor | Yes | Yes | + + +### Topology Support + +Below topologies from ONNX open model zoo are fully supported on OpenVINO™ Execution Provider and many more are supported through sub-graph partitioning. +For NPU if model is not supported we fallback to CPU. 
+ +### Image Classification Networks + +| **MODEL NAME** | **CPU** | **GPU** | +| --- | --- | --- | +| bvlc_alexnet | Yes | Yes | +| bvlc_googlenet | Yes | Yes | +| bvlc_reference_caffenet | Yes | Yes | +| bvlc_reference_rcnn_ilsvrc13 | Yes | Yes | +| emotion ferplus | Yes | Yes | +| densenet121 | Yes | Yes | +| inception_v1 | Yes | Yes | +| inception_v2 | Yes | Yes | +| mobilenetv2 | Yes | Yes | +| resnet18v2 | Yes | Yes | +| resnet34v2 | Yes | Yes | +| resnet101v2 | Yes | Yes | +| resnet152v2 | Yes | Yes | +| resnet50 | Yes | Yes | +| resnet50v2 | Yes | Yes | +| shufflenet | Yes | Yes | +| squeezenet1.1 | Yes | Yes | +| vgg19 | Yes | Yes | +| zfnet512 | Yes | Yes | +| mxnet_arcface | Yes | Yes | + + +### Image Recognition Networks + +| **MODEL NAME** | **CPU** | **GPU** | +| --- | --- | --- | +| mnist | Yes | Yes | + +### Object Detection Networks + +| **MODEL NAME** | **CPU** | **GPU** | +| --- | --- | --- | +| tiny_yolov2 | Yes | Yes | +| yolov3 | Yes | Yes | +| tiny_yolov3 | Yes | Yes | +| mask_rcnn | Yes | No | +| faster_rcnn | Yes | No | +| yolov4 | Yes | Yes | +| yolov5 | Yes | Yes | +| yolov7 | Yes | Yes | +| tiny_yolov7 | Yes | Yes | + +### Image Manipulation Networks + +| **MODEL NAME** | **CPU** | **GPU** | +| --- | --- | --- | +| mosaic | Yes | Yes | +| candy | Yes | Yes | +| cgan | Yes | Yes | +| rain_princess | Yes | Yes | +| pointilism | Yes | Yes | +| udnie | Yes | Yes | + +### Natural Language Processing Networks + +| **MODEL NAME** | **CPU** | **GPU** | +| --- | --- | --- | +| bert-squad | Yes | Yes | +| bert-base-cased | Yes | Yes | +| bert-base-chinese | Yes | Yes | +| bert-base-japanese-char | Yes | Yes | +| bert-base-multilingual-cased | Yes | Yes | +| bert-base-uncased | Yes | Yes | +| distilbert-base-cased | Yes | Yes | +| distilbert-base-multilingual-cased | Yes | Yes | +| distilbert-base-uncased | Yes | Yes | +| distilbert-base-uncased-finetuned-sst-2-english | Yes | Yes | +| gpt2 | Yes | Yes | +| roberta-base | Yes | Yes | +| roberta-base-squad2 | Yes | Yes | +| t5-base | Yes | Yes | +| twitter-roberta-base-sentiment | Yes | Yes | +| xlm-roberta-base | Yes | Yes | + +### Models Supported on NPU + +| **MODEL NAME** | **NPU** | +| --- | --- | +| yolov3 | Yes | +| microsoft_resnet-50 | Yes | +| realesrgan-x4 | Yes | +| timm_inception_v4.tf_in1k | Yes | +| squeezenet1.0-qdq | Yes | +| vgg16 | Yes | +| caffenet-qdq | Yes | +| zfnet512 | Yes | +| shufflenet-v2 | Yes | +| zfnet512-qdq | Yes | +| googlenet | Yes | +| googlenet-qdq | Yes | +| caffenet | Yes | +| bvlcalexnet-qdq | Yes | +| vgg16-qdq | Yes | +| mnist | Yes | +| ResNet101-DUC | Yes | +| shufflenet-v2-qdq | Yes | +| bvlcalexnet | Yes | +| squeezenet1.0 | Yes | + +**Note:** We have added support for INT8 models, quantized with Neural Network Compression Framework (NNCF). To know more about NNCF refer [here](https://github.com/openvinotoolkit/nncf). + +## OpenVINO™ Execution Provider Samples Tutorials + +In order to showcase what you can do with the OpenVINO™ Execution Provider for ONNX Runtime, we have created a few samples that shows how you can get that performance boost you’re looking for with just one additional line of code. 
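+
+That one additional line is simply the providers argument passed when the session is created — a minimal sketch (the model path is a placeholder):
+
+```python
+import onnxruntime as ort
+
+# Default CPU execution provider:
+# session = ort.InferenceSession("model.onnx")
+
+# Same model routed through the OpenVINO™ Execution Provider:
+session = ort.InferenceSession("model.onnx", providers=['OpenVINOExecutionProvider'])
+```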
+ +### Python API +[Object detection with tinyYOLOv2 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/tiny_yolo_v2_object_detection) + +[Object detection with YOLOv4 in Python](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/yolov4_object_detection) + +### C/C++ API +[Image classification with Squeezenet in CPP](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx/OpenVINO_EP) + +### Csharp API +[Object detection with YOLOv3 in C#](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_sharp/OpenVINO_EP/yolov3_object_detection) + +## Blogs/Tutorials + +### Overview of OpenVINO Execution Provider for ONNX Runtime +[OpenVINO Execution Provider](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/faster-inferencing-with-one-line-of-code.html) + +### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime Docker Containers +[Docker Containers](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-docker-container.html) + +### Tutorial on how to use OpenVINO™ Execution Provider for ONNX Runtime python wheel packages [Python Pip Wheel Packages](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-for-onnx-runtime.html) \ No newline at end of file From 088afd407f7086c26a8dad80fbf8b4521bcde19b Mon Sep 17 00:00:00 2001 From: Jaswanth Gannamaneni Date: Thu, 18 Sep 2025 01:33:05 -0700 Subject: [PATCH 3/4] update --- docs/execution-providers/OpenVINO-ExecutionProvider.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index c26eaa8be306e..47088bc3ebda0 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -135,7 +135,8 @@ Controls numerical precision during inference, balancing performance and accurac ### cache_dir Enables model caching to dramatically reduce subsequent load times. Supports CPU, NPU, GPU with kernel caching on iGPU/dGPU. -**Benefits:** Saves compiled models and cl_cache files for dynamic shapes, eliminating recompilation overhead. Especially beneficial for complex models and frequent application restarts. +**Benefits:** Saves compiled models and cl_cache files for dynamic shapes, eliminating recompilation overhead. Especially beneficial for complex models +and frequent application restarts. ### load_config Loads custom OpenVINO properties from JSON configuration file during runtime. @@ -154,6 +155,7 @@ NPU-specific optimization for Quantize-Dequantize operations. Optimizes ORT quan ### disable_dynamic_shapes & reshape_input **Dynamic Shape Management** : Handles models with variable input dimensions. Option to convert dynamic to static shapes when beneficial for performance. + **NPU Shape Bounds** : Use reshape_input to set dynamic shape bounds specifically for NPU devices (format: input_name[lower..upper] or input_name[fixed_shape]). Required for optimal NPU memory management. @@ -161,11 +163,15 @@ Required for optimal NPU memory management. 
Configures resource allocation priority for multi-model deployments: **HIGH**: Maximum resource allocation + **MEDIUM**: Balanced resource sharing + **LOW**: Minimal allocation, yields to higher priority + **DEFAULT**: System-determined priority ### layout + ***Tensor Layout Control:***: Provides explicit control over tensor memory layout for performance optimization. Helps OpenVINO optimize memory access patterns and tensor operations. ***Layout Characters:***: N (Batch), C (Channel), H (Height), W (Width), D (Depth), T (Time), ? (Unknown) From 83fbfced8c28d924a846ff79148ecbdb418e25d0 Mon Sep 17 00:00:00 2001 From: Jaswanth Gannamaneni Date: Thu, 18 Sep 2025 01:45:52 -0700 Subject: [PATCH 4/4] update --- docs/execution-providers/OpenVINO-ExecutionProvider.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 47088bc3ebda0..c69b5d3d55d59 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -237,9 +237,6 @@ session = ort.InferenceSession( ``` -## Configuration Options - -OpenVINO™ Execution Provider can be configured with certain options at runtime that control the behavior of the EP. These options can be set as key-value pairs as below:- ### Python API Key-Value pairs for config options can be set using InferenceSession API as follow:-