year: 2022
paper: https://transformer-circuits.pub/2022/toy_model/index.html
website:
code:
connections: mixed selectivity, mechanistic interpretability, anthropic
Read
https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=EuO4CLwSIzX7AEZA1ZOsnwwF
- Superposition lets a model represent far more features than it has neurons by packing them into almost-orthogonal directions; the number of such directions grows exponentially with dimension
- Occurs when features are sparse (rarely active at the same time), so the interference between overlapping features is tolerable
- As sparsity increases, models shift from monosemantic (one feature per neuron) to polysemantic (many features per neuron) representations via a phase transition
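The sparsity/interference trade-off above can be sketched with the simplest case of superposition: an antipodal pair, where two non-negative features share one hidden dimension with opposite signs. This is a hand-constructed illustration in the spirit of the paper's ReLU output model (x' = ReLU(WᵀWx + b)), not its trained weights:

```python
import numpy as np

# Two features, one hidden dimension: an antipodal pair.
# W has shape (n_features=2, n_hidden=1); the features get
# opposite-sign directions in the shared hidden dimension.
W = np.array([[1.0], [-1.0]])

def toy_model(x):
    """Reconstruct x through the bottleneck: ReLU(W @ W.T @ x)."""
    h = W.T @ x                    # compress 2 features into 1 dim
    return np.maximum(W @ h, 0.0)  # ReLU readout clips interference

# Sparse input (only one feature active): recovered exactly,
# because the ReLU zeroes out the negative "ghost" of the other feature.
print(toy_model(np.array([0.7, 0.0])))  # -> [0.7, 0.0]

# Dense input (both features active): they interfere,
# and the readout is corrupted.
print(toy_model(np.array([0.7, 0.5])))  # -> [0.2, 0.0]
```

This is why sparsity matters: when features rarely co-occur, the exact-recovery case dominates and the occasional interference is a price worth paying for the extra capacity.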