
Conversation


@avtc avtc commented Nov 1, 2025

@Qubitium
This is a simplified fix for #2116. It may not be optimal, but it works.

The suggested solution is too hard for me to implement properly:

If a full layer has no modules to quantize, a simple forward() is enough, and the output is captured to be used as the next layer's input. So one forward pass (an entire-layer forward without needing to deal with subset loops and micro forward loops, just the full layer, usually XXXDecodeLayer.forward()) is sufficient. So output = current_layer.forward() is enough, or sometimes just calling the layer as a callable, layer(), which is the same as layer.forward().

Assume layer 2 has no modules to quantize. At the beginning of the loop for layer 2, we have layer_output from the completed forward_replay() of layer 1. Then pass this to layer 2 (as a whole) as layer_input and store the output, then immediately loop to layer 3 without any further subset work, which is only necessary when we need to quantize part of a layer.

And it looks like this is also correct for the case where one or more subsets of a layer are excluded from quantization.
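
For reference, a minimal sketch of the fast path described above, as I understand it. The names (`modules_to_quantize`, `quantize_subsets`, `layer_inputs`) are placeholders for illustration only, not the actual GPTQModel internals:

```python
import torch

def process_layers(layers, layer_inputs, modules_to_quantize, quantize_subsets):
    """Walk decoder layers; skip all subset work for layers with nothing to quantize.

    `layer_inputs` is a list of captured calibration inputs for the current layer;
    `modules_to_quantize(idx)` and `quantize_subsets(...)` are hypothetical helpers
    standing in for the real quantization loop.
    """
    for layer_idx, layer in enumerate(layers):
        if not modules_to_quantize(layer_idx):
            # No modules to quantize: one plain forward over the whole layer is enough.
            # Its output becomes the next layer's input; no subset or micro-forward loops.
            with torch.inference_mode():
                # Decoder layers typically return a tuple; [0] is assumed to be the
                # hidden states passed on to the next layer.
                layer_inputs = [layer(inp)[0] for inp in layer_inputs]
            continue

        # Otherwise run the usual subset capture -> quantize -> forward_replay flow,
        # which also produces the inputs for the next layer.
        layer_inputs = quantize_subsets(layer, layer_idx, layer_inputs)
    return layer_inputs
```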

@Qubitium Qubitium self-requested a review November 2, 2025 20:05
@Qubitium Qubitium merged commit 1618322 into ModelCloud:main Nov 2, 2025
1 check passed
