This is great, but I hope that in the future there will be a speech-to-text model focused on low-resource languages, probably by balancing the dataset in a way similar to No Language Left Behind (NLLB), released by Meta. NLLB is a translation model that works really well even with low-resource languages, and it would be really cool to have something similar for speech transcription.