So, one video can look at like 10 different captions, or 10,000 or 10 million different captions, depending on how many people are viewing that video, and each generated caption would fit that person’s reward model to maximize effect on any axis, really. So, this is the kind of interesting research that they publish publicly, because then it removes a kind of centralizing compute cap on how many people that they can meaningfully tune to effectively affect.

Keyboard shortcuts

j previous speech k next speech