Episodes

Our catalogue of exciting episodes is at your fingertips

From simple introductions to deep dives in the field of AI. We've got something for everyone.

What is AI alignment

9 min

In this episode of The Turing Talks, we explore the potential risks and benefits of artificial intelligence (AI), with a focus on the development of superintelligent AI. We examine the implications of AI systems potentially exceeding human intelligence and the challenge of keeping these systems aligned with human values. Our discussion covers significant risks, including AI malfunctions, discrimination, social isolation, privacy breaches, and disinformation. We also address complex issues like worker exploitation, bioterrorism, authoritarianism, and the loss of control over advanced technologies. Our guests emphasize the need for AI safety research to be prioritized alongside development, ensuring a beneficial future for humanity. Join us for an engaging exploration of the AI landscape and the balance between innovation and responsibility.

How does ChatGPT work

11 min

In this episode of The Turing Talks, we delve into the inner workings of ChatGPT, a large language model created by OpenAI using Generative Pre-trained Transformer (GPT) technology. We unpack its development, covering the creation of a vast dataset of text and code, which was instrumental in training its neural network. You’ll learn about key training phases: pre-training for language understanding and Reinforcement Learning from Human Feedback (RLHF) for refining responses. We also explore ChatGPT's transformer architecture, which leverages self-attention mechanisms to process language efficiently. Finally, we discuss its strengths, limitations, and the ongoing research aimed at enhancing accuracy, fairness, and safety to responsibly harness this advanced AI technology.

Scalable Oversight

17 min

In this episode of The Turing Talks, we tackle the critical challenge of aligning powerful artificial intelligence (AI) systems with human values. We discuss innovative approaches from three key sources. First, “AI Safety via Debate” presents a method where AI engages in debates with humans to ensure accurate and relevant information delivery. Next, “Supervising Strong Learners by Amplifying Weak Experts” explores how training AI to break down complex tasks into simpler subtasks can enhance understanding. Finally, “Weak-to-Strong Generalization” investigates the potential of using “weak” AI models to supervise and unlock the capabilities of “strong” AI models. Together, these sources highlight the need for robust techniques to keep increasingly powerful AI systems aligned with our values. Join us for a thought-provoking discussion on the future of AI alignment and its implications for society.

Reinforcement learning from human (or AI) feedback

8 min

In this episode of The Turing Talks, we dive into the intriguing world of Reinforcement Learning from Human Feedback (RLHF) and its role in training Large Language Models (LLMs). We break down the three key steps of RLHF: collecting human feedback, training a reward model, and optimizing AI systems for maximum rewards. Discover the advantages of RLHF in communicating complex goals without manual reward design, reducing the risk of reward hacking. We also address the challenges of obtaining quality feedback, reward model misspecification, and biases in policy optimization. Additionally, we introduce Constitutional AI (CAI) as a novel approach to improve RLHF, utilizing a set of human-written principles to enhance AI behavior. Join us for a comprehensive overview of RLHF, its limitations, and how CAI can lead to safer and more transparent AI development.

Seasons

AI Safety Fundamentals

Papers

All Episodes

Toy Models of Superposition

13 min

In this episode of The Turing Talks, we explore the concept of superposition in neural networks, where more features are represented than there are dimensions. The research uses toy models—small ReLU networks with sparse inputs—to investigate how superposition enables networks to simulate larger ones, producing polysemantic neurons that respond to multiple unrelated features. The study examines uniform superposition, tied to geometric shapes like triangles and pentagons, and non-uniform superposition, where features vary in importance or sparsity. Connections between superposition, learning dynamics, adversarial vulnerability, and AI safety are discussed, with proposed solutions including developing models without superposition or finding overcomplete bases to describe features.

Many-shot Jailbreaking

9 min

This episode of The Turing Talks explores Many-Shot Jailbreaking (MSJ), a technique exploiting expanded context windows in large language models (LLMs) to prompt harmful behaviors. The study highlights the increasing effectiveness of MSJ with more examples, showing vulnerabilities across LLMs like GPT-4 and Llama 2, while standard defenses proved insufficient. Researchers stress the need for innovative mitigations, noting some promise in prompt-based defenses like Cautionary Warning Defense (CWD) to reduce attack success rates.

Tutorial on Active Inference

14 min

In this episode of The Turing Talks, we explore Active Inference, a neuroscience-backed model explaining how the brain works by minimizing surprise and maintaining homeostasis. The brain uses a generative model to predict the likelihood of observations, inferring hidden causes. To achieve the most accurate model, the brain maximizes model evidence—or, equivalently, minimizes surprise. A core concept, Free Energy, represents surprise and can be decomposed into complexity and accuracy. Through active inference, the brain selects actions that minimize free energy, constantly planning to achieve desired outcomes. We also discuss Epistemic Value, which gauges how much can be learned from observations, and how action selection aligns expected and real outcomes. The episode concludes with a look at limitations of active inference, but highlights its promise in fields like computational psychiatry.

The Road to Superintelligence

14 min

In this episode of The Turing Talks, we dive into the Law of Accelerating Returns and its implications for artificial intelligence (AI) development. This law suggests that technological progress is exponential, leading many to underestimate the speed of future advancements. We discuss the different types of AI: Artificial Narrow Intelligence (ANI), which specializes in specific tasks; Artificial General Intelligence (AGI), aimed at human-level intelligence; and the potential for Artificial Superintelligence (ASI), which could exceed human intelligence and capabilities. The episode also explores AGI's potential to rapidly self-improve and the risks and benefits ASI might bring to humanity.

Conciousness and AI

17 min

In this episode of The Turing Talks, we dive into the complex question of AI consciousness and the ethical challenges it presents. With rapid advances in large language models like LaMDA and ChatGPT, experts debate whether AI could ever achieve true consciousness. We examine the neuroscience of consciousness, including insights from researchers like Liad Mudrik, and explore competing theories that shape this debate. Proposed approaches, such as consciousness tests and the "Excluded Middle Policy," seek to manage these ethical concerns. Finally, we discuss the impact of perceived consciousness, highlighting potential risks and moral considerations for both AI and society.

Machine Theory of Mind

11 min

In this episode of The Turing Talks, we introduce the innovative Theory of Mind neural network (ToMnet), which utilizes meta-learning to model agents by analyzing their behavior. The ToMnet is designed with three key modules: a character net that processes past actions, a mental state net for current behavioral analysis, and a prediction net for forecasting future actions. We discuss various experiments demonstrating how the ToMnet approximates optimal inference, infers goals, and recognizes agents' false beliefs. This framework not only advances multi-agent AI systems but also holds potential for improving machine-human interactions and fostering interpretable AI.

On the Measure of Intelligence

15 min

In this episode of The Turing Talks, we explore a groundbreaking approach to understanding intelligence in artificial intelligence (AI) systems. We discuss how intelligence differs from mere skill, emphasizing the need for an anthropocentric perspective that evaluates AI against human-like general intelligence. The conversation includes a new framework for assessing intelligence based on Algorithmic Information Theory (AIT), focusing on the efficiency of skill acquisition and the challenge of generalization. We also introduce the Abstraction and Reasoning Corpus (ARC) as a benchmark for measuring AI’s capacity to handle novel tasks, while addressing the limitations of current evaluation methods and the need for further research.

How does ChatGPT work

11 min

Contributing to AI safety

10 min

In this episode of The Turing Talks, we explain how you can actively contribute to the field of AI alignment, ensuring that AI systems align with human values and goals. We explore various career opportunities in research, engineering, and policy, while emphasizing the importance of mastering both machine learning and alignment concepts. You'll also discover project ideas on assessing AI risks, improving alignment techniques, and advancing AI safety. Plus, we share tips on interdisciplinary collaboration and advice for navigating the AI alignment community.

Technical governance approaches

17 min

In this episode of The Turing Talks, we dive into emerging strategies for AI governance, including the regulation of compute power—a measurable way to oversee AI development. We also discuss the potential of watermarking AI-generated content to combat disinformation, though its effectiveness is debated due to easy circumvention. Additionally, the conversation covers the critical need for more robust AI model evaluations and the push for a “Science of Evals” to better assess AI risks and capabilities. Tune in for insights on steering AI toward safe and responsible use.

Mechanistic Interpretability

17 min

In this episode of The Turing Talks, we explore the rising field of mechanistic interpretability, which seeks to unravel how neural networks internally process decisions. As AI systems enter high-stakes fields like healthcare and finance, understanding their inner workings is crucial for detecting biases and ensuring alignment with human goals. We discuss key challenges like polysemanticity—where neurons represent multiple features—and how researchers are addressing this with techniques like sparse autoencoders. Tune in for insights into the future of AI transparency and safety.

Robustness, unlearning and control

11 min

In this episode of The Turing Talks, we dive into the latest research on removing hazardous knowledge from large language models (LLMs). We explore "The WMDP Benchmark," a new tool designed to assess LLMs' expertise in critical areas like biosecurity, cybersecurity, and chemical security. The episode also introduces RMU, a novel unlearning technique that strips harmful information without affecting overall model capabilities. We then turn to "Deep Forgetting & Unlearning," which tackles the challenge of eliminating unwanted knowledge embedded in LLMs’ internal workings. Join us for insights into safer, more responsibly scoped AI.

Scalable Oversight

17 min

Reinforcement learning from human (or AI) feedback

8 min

What is AI alignment

9 min

AI and the Years Ahead

14 min

In this episode of The Turing Talks, we dive into the explosive growth of training compute in artificial intelligence, with a special focus on deep learning. Our discussion unfolds across three insightful sources. First, we spotlight groundbreaking advancements in AI domains like image generation, game playing, and language processing, all fueled by enhanced computing power and vast datasets. Next, we explore the economic drivers behind AI, revealing how companies are striving to develop systems that could replace human labor and unlock immense economic value. Finally, we take a historical perspective on computing trends in machine learning, charting three distinct eras of compute growth. Join us as we unpack these critical insights, emphasizing the need for responsible governance in the face of powerful AI technologies. Tune in for an engaging exploration of AI’s past, present, and future!

Episodes

Our catalogue of exciting episodes is at your fingertips

From simple introductions to deep dives in the field of AI. We've got something for everyone.

Popular

What is AI alignment

How does ChatGPT work

Scalable Oversight

Reinforcement learning from human (or AI) feedback

Seasons

AI Safety Fundamentals

Papers

All Episodes

Toy Models of Superposition

Many-shot Jailbreaking

Tutorial on Active Inference

The Road to Superintelligence

Conciousness and AI

Machine Theory of Mind

On the Measure of Intelligence

How does ChatGPT work

Contributing to AI safety

Technical governance approaches

Mechanistic Interpretability

Robustness, unlearning and control

Scalable Oversight

Reinforcement learning from human (or AI) feedback

What is AI alignment

AI and the Years Ahead