LeCun Initialization (LeCun et al. 2002)
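$$w \sim \mathcal{U}\left(-\sqrt{\frac{3}{n}},\ \sqrt{\frac{3}{n}}\right), \qquad \mathrm{Var}(w) = \frac{1}{n}$$
where $n$ is the number of inputs to a neuron (“fan-in”).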
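As a minimal sketch of the rule above, the following NumPy snippet samples a weight matrix for one fully connected layer. The function name `lecun_uniform`, the `(fan_in, fan_out)` layout, and the layer sizes are assumptions made for illustration; they are not taken from the original paper or from any particular library.

```python
import numpy as np

def lecun_uniform(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit),
    with limit = sqrt(3 / fan_in), so that Var(w) = 1 / fan_in."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Hypothetical layer: 256 inputs, 128 outputs.
W = lecun_uniform(256, 128, rng=np.random.default_rng(0))
print(W.shape)        # (256, 128)
print(W.var() * 256)  # should be close to 1, since Var(w) is about 1/256
```

The bound depends only on the fan-in, so layers with the same number of inputs get the same weight scale regardless of how many outputs they have.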
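Consider a neuron receiving $n$ inputs $x_1, \dots, x_n$ (“fan-in”). Each input has zero mean and variance 1, and each weight $w_i$ is sampled independently and uniformly from $\left[-\sqrt{3/n},\ \sqrt{3/n}\right]$. The neuron computes:
$$y = \sum_{i=1}^{n} w_i x_i$$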
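For each term $w_i x_i$, the weight and the input are independent with zero mean, and a uniform distribution on $[-a, a]$ has variance $a^2/3$:
$$\mathrm{Var}(w_i x_i) = \mathrm{Var}(w_i)\,\mathrm{Var}(x_i) = \frac{1}{3}\left(\sqrt{\frac{3}{n}}\right)^{2} \cdot 1 = \frac{1}{n}$$
Summing $n$ such terms, which are uncorrelated because the weights are independent with zero mean:
$$\mathrm{Var}\left(\sum_{i=1}^{n} w_i x_i\right) = \sum_{i=1}^{n} \mathrm{Var}(w_i x_i) = n \cdot \frac{1}{n} = 1$$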
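With this scaling, a neuron whose inputs have unit variance produces an output with unit variance, so the signal scale is preserved from layer to layer. This helps prevent activations and gradients from vanishing or exploding as depth grows.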
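The variance argument can be checked empirically. The sketch below pushes standard-normal inputs through a stack of purely linear layers (no activation function, which is a simplifying assumption) and records the output variance after each layer. The width, depth, batch size, and the helper name `forward_variances` are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256       # fan-in (and width) of every layer, chosen arbitrarily
depth = 10    # number of stacked linear layers
batch = 2048  # number of input samples

def forward_variances(limit):
    """Propagate unit-variance inputs through `depth` linear layers whose
    weights are drawn from U(-limit, limit); return the variance per layer."""
    x = rng.standard_normal((batch, n))  # zero mean, variance 1
    variances = []
    for _ in range(depth):
        W = rng.uniform(-limit, limit, size=(n, n))
        x = x @ W
        variances.append(float(x.var()))
    return variances

# Var(w) = 1/n: the variance should stay near 1 at every layer.
lecun = forward_variances(np.sqrt(3.0 / n))
# Var(w) = 2/n: the variance should roughly double at every layer.
too_large = forward_variances(np.sqrt(6.0 / n))

print([round(v, 2) for v in lecun])
print([round(v, 1) for v in too_large])
```

The second run is included only for contrast: with twice the weight variance, each layer multiplies the signal variance by about 2, which illustrates how quickly a mis-scaled initialization can blow up in a deep stack.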