Hi HN,<p>I put together a comparison of transcription performance between Whisper [0] and Whisper.cpp [1], using quantitative metrics such as relative speed.<p>The code is in a Colab notebook [2], and a blog post [3] contains a how-to guide and visualizations.<p>In the future, I'd love to add word error rate (WER) evaluation and visualizations based on ground-truth data.<p>Bonus: normally you would log these results to Weights & Biases from Python, but you can also log runs of a C++ binary or the CLI by invoking it through Python's `subprocess` module.<p>I'd love to know what you think of this comparison and which features or attributes you would like to see in a more sophisticated one.<p>Thanks!<p>[0]: <a href="https://news.ycombinator.com/item?id=32927360" rel="nofollow">https://news.ycombinator.com/item?id=32927360</a>
[1]: <a href="https://news.ycombinator.com/item?id=33877893" rel="nofollow">https://news.ycombinator.com/item?id=33877893</a>
[2]: <a href="https://colab.research.google.com/drive/1mXZUdIbvdNVOFRJaIhW-b8LfB17spOm1" rel="nofollow">https://colab.research.google.com/drive/1mXZUdIbvdNVOFRJaIhW...</a>
[3]: <a href="https://wandb.ai/hans-ramsl/gradient-dissent-transcription/reports/How-to-Track-and-Compare-Audio-Transcriptions-with-Whisper-and-Weights-Biases--VmlldzozNDc5OTg2" rel="nofollow">https://wandb.ai/hans-ramsl/gradient-dissent-transcription/r...</a>
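<p>To make the `subprocess` bonus concrete, here is a minimal sketch of the idea, not the exact code from the Colab. The helper name `run_and_time`, the `audio_seconds` parameter, and the metric keys are hypothetical, and "relative speed" is assumed here to mean seconds of audio divided by wall-clock seconds. The logger is passed in as a plain callable, so the sketch runs without a Weights & Biases account.

```python
import subprocess
import sys
import time


def run_and_time(cmd, audio_seconds, log_fn=print):
    """Run a command-line transcriber, time it, and hand the metrics to log_fn.

    Hypothetical helper: `cmd` is the argument list of any CLI that prints
    its transcript to stdout, `audio_seconds` is the length of the input audio.
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError if the CLI exits with a non-zero code.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start

    metrics = {
        "transcript": result.stdout.strip(),
        "wall_seconds": elapsed,
        # Assumed definition: values above 1.0 mean faster than real time.
        "relative_speed": audio_seconds / elapsed,
    }
    log_fn(metrics)
    return metrics


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # argument list for your own transcription binary.
    demo_cmd = [sys.executable, "-c", "print('hello world')"]
    run_and_time(demo_cmd, audio_seconds=11.0)
```

<p>To send the metrics to Weights & Biases instead of printing them, you could call `wandb.init(...)` once and pass `log_fn=wandb.log`. Keeping the logger as a parameter is just one design choice; it keeps the timing logic testable on its own.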