
Commit 02e3ac2 (1 parent: 48ef755)

Update GPU document: runtime_skippable

File tree

2 files changed: +37 -1 lines changed


src/plugins/intel_gpu/README.md

Lines changed: 2 additions & 1 deletion
@@ -33,7 +33,7 @@ GPU Plugin contains the following components:
 * [Run benchmark from device_mem](./docs/use_device_mem.md)

 ## Documentation on dynamic-shape
-This contents explain the internal implementation of dynamic shape support in the GPU Plugin. For general usage of dynamic shape and limitations of the GPU plugin, please refer to this link: [GPU Device — OpenVINO™ documentation - Version(2023.1)](https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_supported_plugins_GPU.html#dynamic-shapes).
+This contents explain the internal implementation of dynamic shape support in the GPU Plugin. For general usage of dynamic shape and limitations of the GPU plugin, please refer to this link: [GPU Device — OpenVINO™ documentation - Version(2025)](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html#dynamic-shapes).

 * [Overall flow for dynamic shape execution](./docs/dynamic_shape/overall_flow.md)
 * Implementation details
@@ -44,6 +44,7 @@ This contents explain the internal implementation of dynamic shape support in th
 <!-- * weight compression (TBD)) -->
 * Optimization features
 * [Memory preallocation](./docs/dynamic_shape/memory_preallocation.md)
+* [Runtime operation skip](./docs/dynamic_shape/runtime_skip.md)
 <!-- * Fake alignment of shape (TBD)
 * Shape-of subgraph on CPU (TBD)
 * Runtime buffer fusing (TBD)
src/plugins/intel_gpu/docs/dynamic_shape/runtime_skip.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# Runtime operation skip

## Description
When working with dynamic shapes, compilation-time optimization faces inherent limitations because shape information remains undefined until runtime. This creates a two-phase optimization opportunity: while certain operations cannot be optimized during the initial compilation phase due to unknown shapes, they become prime candidates for runtime optimization once concrete shape information materializes during inference execution.

Consider a 4D permute operation with the transformation order [0, 2, 1, 3]. During compilation, the input shape is dynamic, [-1, -1, -1, -1], so no shape-based optimization is applicable. However, there may be a second chance to optimize this operation at runtime. Suppose the actual input shape resolves to [128, 1, 32, 64]. With this concrete information, we can recognize a critical insight: since dimension 1 has size 1, swapping dimensions 1 and 2 (as specified by the permute order [0, 2, 1, 3]) results in no actual data movement. The operation becomes essentially a metadata-only transformation, a simple reshape that requires no memory copying or data rearrangement.

This example demonstrates how runtime optimization can turn potentially expensive operations into ones that are skipped entirely, highlighting the value of deferred optimization strategies in dynamic computation graphs.

## Basic flow of runtime operation skip
1. **Relevant flags**

   First, we need to set two flags on the `program_node` of such an operation, i.e., one to which we do not apply shape-based optimization during compilation but for which we try runtime optimization once the shape is known.
   - Static flags (set during the `mark_runtime_skippable_nodes` pass at compilation time)
     - `program_node::optimized`
       - Indicates that this node is eligible to be optimized out, either at compilation time or at runtime.
       - This flag is set to true for all optimization schemes, not only runtime skippability.
     - `program_node::runtime_skippable`
       - Indicates that this node can be optimized out at runtime based on the shape.
   - Dynamic flag (set at runtime)
     - `primitive_inst::_can_be_optimized`
       - Indicates that this `primitive_inst` is actually optimized out at a given execution.

   If `program_node::optimized` is true and `program_node::runtime_skippable` is false, the node is *always* optimized out (i.e., compile-time optimization).
   If both flags are set to true, the node may or may not be optimized out at runtime, depending on the runtime shapes.
   If `program_node::optimized` is false and `program_node::runtime_skippable` is true, the combination is invalid.

   `program_node::optimized` is used for more conservative optimization checking; many graph optimization passes rely on this flag to make safe optimization decisions.

   However, some optimization passes, such as [memory_dependency_pass](https://github.com/openvinotoolkit/openvino/blob/aa6d3811e6dea93cb818ff483bf6c3ca849d4034/src/plugins/intel_gpu/src/graph/include/pass_manager.h#L313), apply different decisions to compile-time optimized nodes and runtime optimized nodes.

2. **Runtime optimization decision**
   - Once the shape is updated in `primitive_inst::prepare_primitive()`, `do_runtime_skip_*node_type*` for each type of operation decides whether to skip the node at that execution or not.

3. **Caveats**
   - Once `primitive_inst::_can_be_optimized` is set to true, the runtime only updates the node's metadata, such as shape or padding information, and skips the actual execution.
   - The runtime also needs to update the `primitive_inst`'s output memory with its input memory. This is done by `update_output_memory()`, called from `primitive_inst::on_execute()`.
   - If you are adding a new type of skippable operation, please make sure that the primitive also has an `update_output_memory()` function implemented.
