This scaling calculus results in a number of consequences, among them the fact that the geometric mean of the fan-in and fan-out, rather than the fan-in, fan-out, or arithmetic mean, should be used for the initialization of the variance of weights in a neural network.
Initialization using the geometric-mean of the fan-in and fan-out ensures a constant layer scaling factor throughout the network, aiding optimization.
The use of geometric initialization results in an equally weighted diagonal, in contrast to the other initializations considered.
Using geometric average allows us to find a better compromise between forward and backward passes and significantly improve training stability and final accuracy.
Our split-aware initialization adopts geometric average instead of arithmetic average to make a better balance between forward and backward.
carlosgmartin changed the title from "Add mode='fan_geom_avg' to nn.initializers.variance_scaling" to "Add mode='fan_geo_avg' to nn.initializers.variance_scaling" on Jan 8, 2025
Feature request: Add `'fan_geo_avg'` as an option for the `mode` argument of `nn.initializers.variance_scaling`, in order to use the geometric mean rather than the arithmetic mean of `fan_in` and `fan_out` for the denominator.

Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks:
SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems:
I can submit a PR for this.
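
For concreteness, here is a rough sketch of how the requested mode would change the denominator. It is plain Python rather than the actual JAX implementation: the helper names `variance_scaling_denominator` and `weight_variance` are hypothetical, and the only thing assumed about the existing modes is that the weight variance is `scale` divided by `fan_in`, `fan_out`, or their arithmetic mean.

```python
import math


def variance_scaling_denominator(fan_in: int, fan_out: int, mode: str) -> float:
    """Return the denominator n in variance = scale / n (illustrative helper)."""
    if mode == "fan_in":
        return float(fan_in)
    if mode == "fan_out":
        return float(fan_out)
    if mode == "fan_avg":
        # Arithmetic mean of fan_in and fan_out.
        return (fan_in + fan_out) / 2
    if mode == "fan_geo_avg":
        # Proposed mode: geometric mean of fan_in and fan_out.
        return math.sqrt(fan_in * fan_out)
    raise ValueError(f"invalid mode for variance scaling: {mode!r}")


def weight_variance(scale: float, fan_in: int, fan_out: int, mode: str) -> float:
    """Target variance of the initial weights for the given mode."""
    return scale / variance_scaling_denominator(fan_in, fan_out, mode)


if __name__ == "__main__":
    # Print the denominator and variance for one non-square layer shape.
    for mode in ("fan_in", "fan_out", "fan_avg", "fan_geo_avg"):
        denom = variance_scaling_denominator(256, 1024, mode)
        var = weight_variance(1.0, 256, 1024, mode)
        print(mode, denom, var)
```

For a layer with `fan_in=256` and `fan_out=1024`, the arithmetic mean gives a denominator of 640 while the geometric mean gives 512. The two coincide only when `fan_in == fan_out`.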