I tried using this for a technical talk[1], and it got the number of speakers wrong. That is somewhat surprising to me, as I would have thought diarization tech would just work by now.<p>[1]<a href="https://www.youtube.com/watch?v=5lFxURxbyEc&list=PLiayR7yJx8-aCfBlccBjF1t-UO86fZJVu&index=2">https://www.youtube.com/watch?v=5lFxURxbyEc&list=PLiayR7yJx8...</a>
Woah! I've been facing the same problems with pyannote+whisper for diarization+transcription, and, coincidentally, was just experimenting with combining NeMo and whisper. Do you happen to have a repo for this? It would be invaluable.<p>Edit: Never mind, found the link: <a href="https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831S0ADwz5_wIS-r" rel="nofollow">https://colab.research.google.com/drive/1X5XTiob6irFq8NJM831...</a>