Toy Models of Superposition

Summary

In this episode of The Turing Talks, we explore the concept of superposition in neural networks, where a model represents more features than it has dimensions. The research uses toy models (small ReLU networks trained on sparse inputs) to investigate how superposition lets a network simulate a larger one, producing polysemantic neurons that respond to multiple unrelated features. The study examines uniform superposition, in which equally important features arrange themselves into regular geometric configurations such as triangles and pentagons, and non-uniform superposition, in which features vary in importance or sparsity. The episode also connects superposition to learning dynamics, adversarial vulnerability, and AI safety, and discusses two proposed remedies: building models that do not use superposition, or finding an overcomplete basis that describes the features of models that do.
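For readers who want something concrete, here is a minimal sketch of the kind of toy model the episode describes: features are compressed into fewer hidden dimensions and reconstructed through a ReLU, following the commonly described setup x' = ReLU(W^T W x + b) with an importance-weighted reconstruction loss on sparse synthetic inputs. The dimensions, sparsity level, and training details below are illustrative assumptions, not the episode's exact configuration.

```python
import torch

# Toy model: n_features sparse inputs are squeezed into n_hidden dimensions
# and reconstructed through a ReLU with tied weights.
# Hyperparameters here are illustrative choices, not the paper's exact runs.
n_features, n_hidden = 5, 2
sparsity = 0.9                                 # probability a feature is zero
importance = 0.7 ** torch.arange(n_features)   # geometrically decaying importance

W = torch.randn(n_features, n_hidden, requires_grad=True)
b = torch.zeros(n_features, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(10_000):
    # Sparse synthetic data: each feature is independently active.
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)

    h = x @ W                         # project into the low-dimensional space
    x_hat = torch.relu(h @ W.T + b)   # reconstruct with tied weights and ReLU

    # Importance-weighted reconstruction loss.
    loss = (importance * (x - x_hat) ** 2).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each row of W is one feature's learned direction in the hidden space.
# At high sparsity, more than n_hidden rows end up nonzero, arranged in
# regular shapes like the triangles and pentagons mentioned above.
print(W.detach())
```

With a setup like this, low sparsity tends to leave only the most important features with nonzero directions, while high sparsity pushes the remaining features into superposed, geometrically arranged directions; that shift between regimes is the transition the summary refers to.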
