Conversation

GregoryComer (Member) commented Nov 19, 2025

ATen clone ops can end up in the graph from a few sources. Since the graph is functional, we don't actually need them, and they are slow. This PR runs the no-op clone removal pass for XNNPACK.

In addition, I ran into an issue where the XNNPACK delegate doesn't currently handle inputs being forwarded directly to partition outputs; there has to be at least one operator in the partition. To solve this, I updated the removal pass to leave these clone ops in place and added copy support in the XNNPACK delegate to copy the input directly to the output.

In the long run, I want to remove these no-ops higher up as part of to_edge, but that requires alignment and changes across a few more backends; see #15838. Resolving this for XNNPACK will at least mitigate the issue for CPU models.
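The core removal idea can be sketched as follows. This is a hypothetical, simplified model for illustration only — the real ExecuTorch pass operates on a torch.fx GraphModule, and all names here are made up. In a functional graph, a clone that preserves memory format and dim order has no observable effect, so every user of the clone can be rewired to read the clone's input instead:

```python
# Simplified sketch of no-op clone removal (illustrative names only; the
# real pass works on torch.fx nodes, not this toy Node class).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op: str                                    # "input", "clone", "add", "output", ...
    args: list = field(default_factory=list)   # upstream Node references

def remove_noop_clones(nodes):
    """Rewire users of each no-op clone to the clone's input, then drop it."""
    for clone in [n for n in nodes if n.op == "clone"]:
        src = clone.args[0]
        for n in nodes:          # redirect every use of the clone to its source
            n.args = [src if a is clone else a for a in n.args]
    return [n for n in nodes if n.op != "clone"]

# Tiny example: x -> clone -> add(clone, clone) -> output
x = Node("x", "input")
c = Node("c", "clone", [x])
a = Node("a", "add", [c, c])
out = Node("out", "output", [a])

g = remove_noop_clones([x, c, a, out])
assert [n.name for n in g] == ["x", "a", "out"]
assert a.args[0] is x and a.args[1] is x   # users now read the input directly
```

Because the graph is functional, the rewrite is purely structural — no aliasing or mutation semantics need to be preserved when the clone is dropped.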

Differential Revision: D87405074

meta-cla bot added the CLA Signed label Nov 19, 2025
pytorch-bot bot commented Nov 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15884

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit cafe8a3 with merge base aff5086:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync bot commented Nov 19, 2025

@GregoryComer has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87405074.

github-actions bot commented

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of your important work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example:
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Nov 19, 2025
Summary: Pull Request resolved: pytorch#15884

Differential Revision: D87405074
meta-codesync bot commented Nov 19, 2025

@GregoryComer has imported this pull request. If you are a Meta employee, you can view this in D87405074.

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for removing no-op clone operations in the XNNPACK backend to improve performance, as clone operations that don't modify memory format or dim order add unnecessary overhead in functional graphs.

  • Added RemoveCloneOpsTransform pass to the XNNPACK pass pipeline with support for preserving input-output copies
  • Implemented clone/copy operator support in XNNPACK delegate for dim-order preserving clones
  • Updated test models to include operations after clone to prevent direct input-to-output forwarding

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

  • `backends/transforms/remove_clone_ops.py` — Added a `preserve_input_output_copies` parameter and `_is_input_output_copy` method to selectively preserve clones that directly connect graph inputs to outputs
  • `backends/xnnpack/_passes/__init__.py` — Added `RemoveCloneOpsTransform` to the XNNPACK pass pipeline
  • `backends/xnnpack/_passes/TARGETS` — Added a dependency on the `remove_clone_ops` transform
  • `backends/xnnpack/partition/config/generic_node_configs.py` — Added `CloneDimOrderConfig` to partition dim-order-preserving clones
  • `backends/xnnpack/partition/config/__init__.py` — Exported `CloneDimOrderConfig`
  • `backends/xnnpack/operators/op_clone.py` — Implemented `CloneVisitor` to serialize clone operations as XNN copy nodes
  • `backends/xnnpack/operators/__init__.py` — Added an import for the `op_clone` module
  • `backends/xnnpack/serialization/xnnpack_graph_schema.py` — Added the `XNNCopy` node type to the schema
  • `backends/xnnpack/serialization/schema.fbs` — Added `XNNCopy` to the flatbuffer schema union
  • `backends/xnnpack/serialization/runtime_schema.fbs` — Added `XNNCopy` to the runtime flatbuffer schema union
  • `backends/xnnpack/runtime/XNNCompiler.cpp` — Implemented `defineCopyNode` to define XNN copy operations in the subgraph
  • `backends/xnnpack/test/ops/test_clone.py` — Added tests for clone operation partitioning across memory formats and tensor dimensions
  • `exir/tests/test_memory_format_ops_pass_utils.py` — Updated test models to include operations after clone to prevent direct input-to-output forwarding
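The preservation rule described for `remove_clone_ops.py` can be sketched in a simplified model (hypothetical names; the real `_is_input_output_copy` operates on torch.fx nodes): a clone is kept only when it reads a graph input and feeds only graph outputs, so the XNNPACK partition still contains at least one operator, which the delegate can then lower to an XNN copy node:

```python
# Simplified sketch of preserve_input_output_copies (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op: str                                    # "input", "clone", "output", ...
    args: list = field(default_factory=list)   # upstream Node references

def is_input_output_copy(clone, nodes):
    # A clone that reads a graph input and feeds only graph outputs.
    if clone.args[0].op != "input":
        return False
    users = [n for n in nodes if clone in n.args]
    return bool(users) and all(n.op == "output" for n in users)

def remove_noop_clones(nodes, preserve_input_output_copies=False):
    kept = []
    for clone in [n for n in nodes if n.op == "clone"]:
        if preserve_input_output_copies and is_input_output_copy(clone, nodes):
            kept.append(clone)   # delegate will lower this to an XNN copy node
            continue
        src = clone.args[0]
        for n in nodes:          # rewire users to the clone's input
            n.args = [src if a is clone else a for a in n.args]
    return [n for n in nodes if n.op != "clone" or n in kept]

# Input forwarded straight to output through a clone: keep the clone.
x = Node("x", "input")
c = Node("c", "clone", [x])
out = Node("out", "output", [c])

g = remove_noop_clones([x, c, out], preserve_input_output_copies=True)
assert c in g   # preserved, so the partition has at least one operator
```

Without the flag, the clone would be removed and the partition would forward an input directly to an output, which the delegate cannot currently express.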


GregoryComer added a commit to GregoryComer/executorch that referenced this pull request Nov 20, 2025
Summary: Pull Request resolved: pytorch#15884

Differential Revision: D87405074

Pulled By: GregoryComer
@GregoryComer GregoryComer merged commit b3b7a98 into pytorch:main Nov 20, 2025
230 of 235 checks passed

Labels

CLA Signed, fb-exported, meta-exported

3 participants