This is a couple hundred thousand sentences. This is, in itself, a mix of some other data sets. We went to the person who is curating the data sets, Sam Brown. He’s at NYU now. We said, "Can we use your data set in Common Voice? The license doesn’t work with us because we promised our users CC0. Is this CC4, I think?"
