A lot of it probably comes from what you were just describing: the data is the wide Internet, and it’s totally unsupervised. It does seem to be the case that the human feedback post-processing, RLHF, does a pretty good job of preventing the really obvious sorts of abuses.