That’s fine. If you don’t need speaker identification yet, I’m sure just importing the captions from YouTube would work. People who want a higher-quality transcript can run the audio through Whisper and upload the SRT file to YouTube themselves. That burden falls on data preparation, not on your system, so you don’t have to pay to run Whisper.

