And so comes the idea of red teaming, of finding the vulnerabilities before it’s released, instead of just publishing everything by default. And also comes the culture of setting a buffer, like if it hits this capability, even though we know it’s probably not empowering this kind of agentic or some extinction-scale risks, but we don’t actually know what the adversaries’ capabilities are. What if this is just the one missing puzzle that they lack?

Keyboard shortcuts

j previous speech k next speech