LeCun Initialization (LeCun et al. 2002)

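LeCun initialization draws each weight $w$ from a zero-mean distribution with variance

$$\mathrm{Var}(w) = \frac{1}{n}$$

where $n$ is the number of inputs to a neuron (“fan-in”).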
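As a minimal sketch, assuming NumPy and the uniform variant with bound $\sqrt{3/n}$, weights with variance $1/n$ could be drawn as follows. The function name `lecun_uniform` and its parameters are illustrative and do not come from the original text.

```python
import numpy as np

def lecun_uniform(fan_in, fan_out, rng=None):
    """Draw a (fan_in, fan_out) weight matrix with zero mean and variance 1/fan_in."""
    rng = np.random.default_rng() if rng is None else rng
    # A uniform distribution on [-a, a] has variance a**2 / 3,
    # so a = sqrt(3 / fan_in) gives variance 1 / fan_in.
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

A call such as `lecun_uniform(256, 512)` returns a matrix whose empirical variance should be close to `1 / 256`.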

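Consider a neuron receiving $n$ inputs (“fan-in”). Each input $x_i$ has zero mean and variance 1, and each weight $w_i$ is sampled independently and uniformly from $\left[-\sqrt{3/n},\ \sqrt{3/n}\right]$, which gives it zero mean and variance $1/n$. The neuron computes:

$$y = \sum_{i=1}^{n} w_i x_i$$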

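For each term $w_i x_i$, with $w_i$ and $x_i$ independent and zero-mean:

$$\mathrm{Var}(w_i x_i) = \mathrm{Var}(w_i)\,\mathrm{Var}(x_i) = \frac{1}{n} \cdot 1 = \frac{1}{n}$$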

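Summing $n$ such uncorrelated terms:

$$\mathrm{Var}(y) = \sum_{i=1}^{n} \mathrm{Var}(w_i x_i) = n \cdot \frac{1}{n} = 1$$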

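This keeps the pre-activation variance at roughly 1 from layer to layer (exactly so only while the activation function is close to linear), which helps keep signals, and in turn gradients, from vanishing or exploding.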
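The variance argument can be checked numerically. The following is a minimal sketch, assuming NumPy, purely linear layers with no activation function, and the uniform bound $\sqrt{3/n}$. The values of `n`, `depth`, and `batch` are arbitrary illustrative choices, not taken from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512       # fan-in (and width) of every layer, chosen for illustration
depth = 10    # number of stacked linear layers, chosen for illustration
batch = 2000  # number of input samples

# Inputs with zero mean and unit variance.
x = rng.standard_normal((batch, n))

variances = []
for _ in range(depth):
    # Uniform weights on [-sqrt(3/n), sqrt(3/n)] have variance 1/n.
    limit = np.sqrt(3.0 / n)
    W = rng.uniform(-limit, limit, size=(n, n))
    x = x @ W  # linear layer: each output is a sum of n weighted inputs
    variances.append(float(x.var()))

print([round(v, 2) for v in variances])
```

With these settings the printed variances should stay close to 1 at every layer. Scaling the bound up or down makes them grow or shrink geometrically with depth, which is the behavior the initialization is meant to avoid.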

References

Yann LeCun