To me, I think one way to free ourselves from this value alignment issue is simply to have, as a norm, multiple interpretations. Just as you can have many human assistants, each aligned to you and accountable for explaining any decision that is not in your best interest, you can have those different assistants compare notes. If one of them consistently does things that are not value-aligned with you, at least you have the other assistants to warn you about it.
