Yes, of course I think it’s qualitatively different. Everybody gets to be a shapeshifter now. Because we have 20 national languages, it used to take someone really fluent in one language or one culture to scam people in that culture. But now that restriction is gone: anyone can scam anyone in any culture without understanding that culture in the first place. And the latest frontier models are really, really good at bridging those cultural differences, and they’re very tunable, very alignable. Right. The latest Claude 3 Opus was partly aligned by the alignment assembly effort that I mentioned. People co-created, through online deliberation, a set of constitutional documents. So instead of just the researchers selling whatever their universal values are, the target community gets to co-create a document to align the model to. But the problem is that with this reward model, if you just flip the sign, it becomes equally capable, but very good at polarization, at isolating people, at bullying and things like that, because that’s what the constitutional document tells it not to do. So as it’s now becoming easy to align frontier models, it’s becoming equally easy to align them for abuse.
