Do you work with the raw interaction data, in terms of voice and video, or only from the written part?