

Technically it supports fewer languages than Whisper: 40 vs. 99.
The main problem isn’t “bother”, it’s training data. You need hundreds of thousands of hours of audio with high-quality transcripts to train models like these, and that just doesn’t exist for, like, Zulu or whatever.
Alright, I’m waiting on the YouTube playlist.