Robustness, unlearning and control

Summary

In this episode of The Turing Talks, we dive into recent research on removing hazardous knowledge from large language models (LLMs). We explore "The WMDP Benchmark," a benchmark designed to measure LLMs' hazardous knowledge in biosecurity, cybersecurity, and chemical security. The episode also introduces RMU (Representation Misdirection for Unlearning), an unlearning technique that aims to remove harmful knowledge while largely preserving general model capabilities. We then turn to "Deep Forgetting & Unlearning," which tackles the challenge of eliminating unwanted knowledge embedded in LLMs' internal workings. Join us for insights into safer, more responsibly scoped AI.
