-
Notifications
You must be signed in to change notification settings - Fork 731
Remove no-op clones in xnnpack #15884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15884
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (2 Unrelated Failures)As of commit cafe8a3 with merge base aff5086 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87405074. |
This PR needs a
|
Summary: Pull Request resolved: pytorch#15884 Differential Revision: D87405074
a3db1eb to
aa47742
Compare
Summary: Pull Request resolved: pytorch#15884 Differential Revision: D87405074
aa47742 to
7e370c0
Compare
|
@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D87405074. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for removing no-op clone operations in the XNNPACK backend to improve performance, as clone operations that don't modify memory format or dim order add unnecessary overhead in functional graphs.
- Added
RemoveCloneOpsTransformpass to the XNNPACK pass pipeline with support for preserving input-output copies - Implemented clone/copy operator support in XNNPACK delegate for dim-order preserving clones
- Updated test models to include operations after clone to prevent direct input-to-output forwarding
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| backends/transforms/remove_clone_ops.py | Added preserve_input_output_copies parameter and _is_input_output_copy method to selectively preserve clones that directly connect graph inputs to outputs |
| backends/xnnpack/_passes/init.py | Added RemoveCloneOpsTransform to the XNNPACK pass pipeline |
| backends/xnnpack/_passes/TARGETS | Added dependency on remove_clone_ops transform |
| backends/xnnpack/partition/config/generic_node_configs.py | Added CloneDimOrderConfig to partition dim-order preserving clones |
| backends/xnnpack/partition/config/init.py | Exported CloneDimOrderConfig |
| backends/xnnpack/operators/op_clone.py | Implemented CloneVisitor to serialize clone operations as XNN copy nodes |
| backends/xnnpack/operators/init.py | Added import for op_clone module |
| backends/xnnpack/serialization/xnnpack_graph_schema.py | Added XNNCopy node type to schema |
| backends/xnnpack/serialization/schema.fbs | Added XNNCopy to flatbuffer schema union |
| backends/xnnpack/serialization/runtime_schema.fbs | Added XNNCopy to runtime flatbuffer schema union |
| backends/xnnpack/runtime/XNNCompiler.cpp | Implemented defineCopyNode function to define XNN copy operations in the subgraph |
| backends/xnnpack/test/ops/test_clone.py | Added comprehensive tests for clone operation partitioning with various memory formats and tensor dimensions |
| exir/tests/test_memory_format_ops_pass_utils.py | Updated test models to include operations after clone to prevent direct forwarding |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary: Pull Request resolved: pytorch#15884 Differential Revision: D87405074 Pulled By: GregoryComer
7e370c0 to
ebe2d2e
Compare
Summary: Pull Request resolved: pytorch#15884 Differential Revision: D87405074 Pulled By: GregoryComer
ebe2d2e to
cafe8a3
Compare
ATen clone ops can end up in the graph from a few sources. Since the graph is functional, we don't actually need these and they are slow. This PR runs the no-op clone removal pass for XNNPACK.
In addition to this, I ran into an issue where XNNPACK delegate doesn't currently handle inputs being directly forwarded to partition outputs. There has to be at least one operator. To solve this, I updated the removal pass to leave these clone ops in and added copy support in the XNN delegate to direct copy to the output.
In the long-run, I want to remove these no-ops higher up as part of to_edge, but this requires alignment and changes with a few more backends. See #15838. But resolving for XNNPACK will mitigate the issue for CPU models, at least.
Differential Revision: D87405074