Toy Models of Superposition
Summary
In this episode of The Turing Talks, we explore superposition in neural networks: the phenomenon in which a network represents more features than it has dimensions. The research uses toy models (small ReLU networks trained on sparse inputs) to show how superposition lets a network simulate a larger one, and how it gives rise to polysemantic neurons that respond to several unrelated features. The study contrasts uniform superposition, where features of equal importance and sparsity arrange themselves into geometric structures such as triangles and pentagons, with non-uniform superposition, where features differ in importance or sparsity. The episode also connects superposition to learning dynamics, adversarial vulnerability, and AI safety, and discusses possible remedies: building models that avoid superposition, or finding an overcomplete basis that describes the features.
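To make "more features than dimensions" concrete, here is a minimal sketch. It assumes a toy model of the form h = Wx, x̂ = ReLU(WᵀWx + b), with five features packed into two dimensions. The weights are hand-set to the vertices of a regular pentagon rather than learned, and the bias value of -0.35 is an illustrative assumption. The sketch shows why sparsity matters: a single active feature is recovered cleanly, while two non-adjacent active features interfere with each other.

```python
import numpy as np

# Hypothetical hand-set weights (not trained): 5 features squeezed into 2 dimensions.
n_features, n_hidden = 5, 2
angles = 2 * np.pi * np.arange(n_features) / n_features
# Each column of W is one feature's direction: the vertices of a regular pentagon.
W = np.stack([np.cos(angles), np.sin(angles)])  # shape (n_hidden, n_features)

# Assumed bias: slightly more negative than cos(72 deg) ~= 0.309, so the small
# positive interference from neighbouring features is clipped away by the ReLU.
b = np.full(n_features, -0.35)

def forward(x):
    """Assumed ReLU-output toy model: h = W x, x_hat = ReLU(W^T h + b)."""
    h = W @ x                              # project 5 features down to 2 dimensions
    return np.maximum(W.T @ h + b, 0.0)    # reconstruct all 5 features

# Sparse case: one active feature survives the ReLU; the others are zeroed.
x_single = np.zeros(n_features)
x_single[0] = 1.0
print(forward(x_single).round(3))

# Denser case: two non-adjacent active features interfere, so both are lost
# and a feature that was never present (index 1) fires instead.
x_pair = np.zeros(n_features)
x_pair[[0, 2]] = 1.0
print(forward(x_pair).round(3))
```

In this sketch, the pentagon arrangement works only because inputs are assumed to be sparse. Once several features are active at the same time, their overlapping directions produce errors. This is the trade-off that sparsity governs in the toy models.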