Some operations (e.g., SCALE) accept in-place tensors: the output tensor aliases one of the input tensors.
In this case, transformations such as FP16 -> BF16 conversion or flattening must be applied to the input and the output in tandem. The current implementation applies them separately (some transformations to the input, some to the output), so additional logic is needed to keep the aliased tensors consistent.
Additional metadata may be needed in the internal graph representation.
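
As a rough illustration of what such metadata could look like, the sketch below records aliasing on each tensor and applies a transformation to every member of an alias group at once. It is not the project's actual graph representation: the `Tensor` class, the `alias_of` field, and the helpers `alias_group`, `convert_dtype`, and `flatten` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tensor:
    """Hypothetical graph tensor; `alias_of` is the assumed extra metadata."""
    name: str
    dtype: str
    shape: Tuple[int, ...]
    alias_of: Optional["Tensor"] = None  # set when this tensor shares storage

    def root(self) -> "Tensor":
        # Follow the alias chain back to the tensor that owns the storage.
        t = self
        while t.alias_of is not None:
            t = t.alias_of
        return t


def alias_group(tensors: List[Tensor], t: Tensor) -> List[Tensor]:
    # All tensors in the graph that share storage with `t` (including `t`).
    r = t.root()
    return [x for x in tensors if x.root() is r]


def convert_dtype(tensors: List[Tensor], t: Tensor, new_dtype: str) -> None:
    # Apply the dtype change to the whole alias group, not just `t`.
    for x in alias_group(tensors, t):
        x.dtype = new_dtype


def flatten(tensors: List[Tensor], t: Tensor) -> None:
    # Flatten every aliased tensor to 1-D so input and output stay in sync.
    for x in alias_group(tensors, t):
        n = 1
        for d in x.shape:
            n *= d
        x.shape = (n,)


# Example: `y` is the in-place output of a SCALE-like op on `x`; `z` is unrelated.
x = Tensor("x", "fp16", (2, 3))
y = Tensor("y", "fp16", (2, 3), alias_of=x)
z = Tensor("z", "fp16", (2, 3))
graph = [x, y, z]

convert_dtype(graph, y, "bf16")  # requested on the output, applied to both
flatten(graph, x)                # requested on the input, applied to both
```

Here a transformation requested on either side of the in-place operation reaches both the input and the output, while the unrelated tensor `z` is left untouched. A real implementation would also have to decide how to handle transformations that are only valid for one side of the alias.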