Commit 736ffe1

[ExecuTorch][WebGPU] Add clone op (aten.clone.default)
Pull Request resolved: #20463

`aten.clone.default` is a pure flat copy on the buffer-only WebGPU backend, identical to `view_copy`: `clone_impl` reuses the existing `add_flat_copy` helper (`output[i] = input[i]`) and registers a handler under `aten.clone.default`. No new shader, generated WGSL header, or CMake source; it shares the `view_copy` flat-copy compute pipeline.

Required for end-to-end Llama 3.2 1B (4-bit, KV cache): the exported model serializes 2 `aten.clone.default` ops into its runtime operator chain (the RoPE-frequency clones reused across all 16 transformer layers), so without a handler the partition graph-breaks at those nodes.

Mirrors the Vulkan delegate, which registers the same op and routes a buffer clone to a flat view-copy.

ghstack-source-id: 397534700
@exported-using-ghexport
@diff-train-skip-merge

Differential Revision: [D109477717](https://our.internmc.facebook.com/intern/diff/D109477717/)
1 parent 799a40c commit 736ffe1

1 file changed

Lines changed: 7 additions & 0 deletions

backends/webgpu/runtime/ops/view_copy/ViewCopy.cpp

@@ -53,10 +53,17 @@ void view_copy_impl(WebGPUGraph& graph, const std::vector<int>& args) {
   add_flat_copy(graph, args.at(0), args.at(args.size() - 1));
 }
 
+// clone = flat copy; survives Vulkan RemoveRedundantOpsTransform in Llama 1B.
+void clone_impl(WebGPUGraph& graph, const std::vector<int>& args) {
+  // args: [self, memory_format?, out]; out = last value-id.
+  add_flat_copy(graph, args.at(0), args.at(args.size() - 1));
+}
+
 } // namespace
 
 WEBGPU_REGISTER_OPERATORS {
   WEBGPU_REGISTER_OP(aten.view_copy.default, view_copy_impl);
+  WEBGPU_REGISTER_OP(aten.clone.default, clone_impl);
 }
 
 } // namespace executorch::backends::webgpu
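
The dispatch pattern in this change can be modeled on the CPU as a rough sketch. Everything below is hypothetical and for illustration only: `ToyGraph`, `toy_flat_copy`, `toy_clone`, `toy_view_copy`, and the string-keyed map are stand-ins invented here, not the ExecuTorch `WebGPUGraph`, `add_flat_copy`, or `WEBGPU_REGISTER_OP` implementations. It assumes only what the commit message states: clone and view_copy both reduce to `output[i] = input[i]`, the input is the first value-id, and the output is the last.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a buffer-only graph: a value-id indexes a flat buffer.
struct ToyGraph {
  std::vector<std::vector<float>> buffers;
};

using OpHandler = std::function<void(ToyGraph&, const std::vector<int>&)>;

// Flat copy: output[i] = input[i]. Element counts must already match.
void toy_flat_copy(ToyGraph& graph, int in_id, int out_id) {
  const std::vector<float>& in = graph.buffers.at(in_id);
  std::vector<float>& out = graph.buffers.at(out_id);
  if (in.size() != out.size()) {
    throw std::invalid_argument("flat copy requires equal element counts");
  }
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = in[i];
  }
}

// Both handlers take the first value-id as input and the last as output,
// ignoring anything in between (e.g. the optional memory_format argument).
void toy_view_copy(ToyGraph& graph, const std::vector<int>& args) {
  toy_flat_copy(graph, args.at(0), args.at(args.size() - 1));
}

void toy_clone(ToyGraph& graph, const std::vector<int>& args) {
  toy_flat_copy(graph, args.at(0), args.at(args.size() - 1));
}

// Hypothetical registry keyed by operator name; two names share one copy path.
std::map<std::string, OpHandler> make_toy_registry() {
  std::map<std::string, OpHandler> registry;
  registry["aten.view_copy.default"] = toy_view_copy;
  registry["aten.clone.default"] = toy_clone;
  return registry;
}

// Runs a clone with args [self=0, memory_format=1, out=2] and returns the output.
std::vector<float> demo_clone() {
  ToyGraph graph;
  graph.buffers.push_back(std::vector<float>{1.0f, 2.0f, 3.0f});  // self
  graph.buffers.push_back(std::vector<float>{});                  // memory_format slot, unused
  graph.buffers.push_back(std::vector<float>{0.0f, 0.0f, 0.0f});  // out
  std::map<std::string, OpHandler> registry = make_toy_registry();
  registry.at("aten.clone.default")(graph, {0, 1, 2});
  return graph.buffers.at(2);
}
```

In this toy model, looking up an operator name with no registered handler throws from `registry.at`, which loosely parallels the graph break the commit message describes when `aten.clone.default` has no handler.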
