I find these results questionable unless Whisper was very poorly optimized in the way it was run on the 4090.<p>I have a 3090 and an M1 Max 32GB, and although I haven't tried Whisper, the inference difference between the two on Llama and Stable Diffusion is staggering, especially with Stable Diffusion, where SDXL takes about 9 seconds on the 3090 and about 1 minute 10 seconds on the M1 Max.
I think this is using the OpenAI Whisper repo? If they want a real comparison, they should be comparing MLX to faster-whisper or insanely-fast-whisper on the 4090. Faster-whisper runs sequentially; insanely-fast-whisper batches the audio in 30-second intervals.<p>We use Whisper in production and these are our findings: we use faster-whisper because we find the quality is better when you include the previous segment's text as context. Just for comparison, we find that faster-whisper is generally 4-5x faster than OpenAI/whisper, and insanely-fast-whisper can be another 3-4x faster than faster-whisper.
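For reference, a minimal faster-whisper sketch of the setup described above; the model size, file name, and options here are illustrative, not our production config:

    # pip install faster-whisper
    from faster_whisper import WhisperModel

    # Illustrative settings; pick a model size / compute type that fits your GPU.
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    # condition_on_previous_text feeds each prior segment back in as context,
    # which is the quality benefit mentioned above.
    segments, info = model.transcribe(
        "meeting.mp3",
        condition_on_previous_text=True,
    )

    for segment in segments:
        print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")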
Key to this article is understanding that it's leveraging the newly released Apple MLX framework, and that the code uses these Apple-specific optimizations.<p><a href="https://news.ycombinator.com/item?id=38539153">https://news.ycombinator.com/item?id=38539153</a>
How does this compare to insanely-fast-whisper though? <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">https://github.com/Vaibhavs10/insanely-fast-whisper</a><p>I think that not using optimizations allows this to be a 1:1 comparison, but if those optimizations are not ported to MLX, then it would still be better to use a 4090.<p>Having looked at MLX recently, I think it's definitely going to get traction on Macs - and on iOS when Swift bindings are released <a href="https://github.com/ml-explore/mlx/issues/15">https://github.com/ml-explore/mlx/issues/15</a> (although there may be a C++20 compilation issue blocking that right now).
Does this translate to other models, or was Whisper cherry-picked due to its serial nature and integer math? Looking at <a href="https://github.com/ml-explore/mlx-examples/tree/main/stable_diffusion">https://github.com/ml-explore/mlx-examples/tree/main/stable_...</a> seems to hint that this is the case:<p>>At the time of writing this comparison convolutions are still some of the least optimized operations in MLX.<p>I think the main thing at play is that you can have 64+ GB of very fast RAM directly coupled to the CPU/GPU, and the benefits of that from a latency/co-accessibility point of view.<p>These numbers are certainly impressive when you look at the power envelopes of these systems.<p>Worth noting that the cost of an M3 Max system with the minimum RAM config is ~2x the price of a 4090...
It's easy to run Whisper on my Mac M1, but it's not using MLX out of the box.<p>I spent an hour or two trying to figure out what I needed to install and configure to make it use MLX. I was getting cryptic Python errors, Torch errors... and gave up on it.<p>I rented a VM with a GPU and had Whisper running on it within a few minutes.
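For anyone else trying: a minimal sketch of what the MLX path is supposed to look like, assuming the Whisper example from ml-explore/mlx-examples is installed as the mlx_whisper package; the Hugging Face repo name below is an assumption, swap in whichever converted checkpoint you actually use:

    # pip install mlx-whisper   (the Whisper example from ml-explore/mlx-examples)
    import mlx_whisper

    # path_or_hf_repo points at a converted MLX checkpoint on Hugging Face;
    # the exact repo name here is illustrative, not verified.
    result = mlx_whisper.transcribe(
        "audio.mp3",
        path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
    )
    print(result["text"])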
There will be a lot of debate about which is the absolute best choice for X task, but what I love about this is the level of performance at such a low power consumption.
Use this Whisper derivative repo instead - one hour of audio gets transcribed within a minute or less on most GPUs - <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">https://github.com/Vaibhavs10/insanely-fast-whisper</a>
I feel like this is particularly interesting in light of their Vision Pro. Being able to run models in a power efficient manner may not mean much to everyone on a laptop, but it's a huge benefit for an already power hungry headset.
I'll take this opportunity to ask for help: what's a good open-source transcription and diarization app or workflow?<p>I looked at <a href="https://github.com/thomasmol/cog-whisper-diarization">https://github.com/thomasmol/cog-whisper-diarization</a> and <a href="https://about.transcribee.net/" rel="nofollow noreferrer">https://about.transcribee.net/</a> (from the people behind Audapolis), but neither works that well -- crashes, etc.<p>Thank you!
4090 -> 82 TFLOPS<p>M3 Max GPU -> 10 TFLOPS<p>That makes it roughly 8x slower than the 4090 on raw compute.<p>But yeah, you can claim that a bike accelerates faster than a Ferrari because it reaches 1 km/h sooner...
I wonder how AMD's XDNA accelerator will fare.<p>They just shipped 1.0 of the Ryzen AI Software and SDK. It alleges ONNX, PyTorch, and TensorFlow support. <a href="https://www.anandtech.com/show/21178/amd-widens-availability-of-ryzen-ai-software-for-developers-xdna-2-coming-with-strix-point-in-2024" rel="nofollow noreferrer">https://www.anandtech.com/show/21178/amd-widens-availability...</a><p>Interestingly, the upcoming XDNA2 is supposedly going to boost generative performance a lot? "3x". I'd kind of assumed these sorts of devices would mainly be helping with inference. (I don't really know what characterizes the different workloads, just a naive grasp.)
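If it plugs in like other ONNX Runtime execution providers, usage would look roughly like the sketch below. The provider name and the idea that the NPU is exposed this way are my assumptions from skimming the Ryzen AI docs, not something I've run:

    # Sketch: running an ONNX model through ONNX Runtime with a Vitis AI
    # execution provider (assumed to be how the Ryzen AI SDK exposes the NPU).
    import numpy as np
    import onnxruntime as ort

    # Dummy input; the shape and input name depend on the exported model.
    input_array = np.zeros((1, 3, 224, 224), dtype=np.float32)

    session = ort.InferenceSession(
        "model.onnx",
        providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    )
    outputs = session.run(None, {session.get_inputs()[0].name: input_array})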
Looking at the comments, perhaps the article could be more aptly titled. The author does stress that these benchmarks, maybe better called test runs, are not of any scientific accuracy or worth, but are simply meant to demonstrate what is being tested. I think it's interesting, though, that Apple and 4090s are even compared in any way, since the devices are so vastly different. I'd expect the 4090 to be more powerful, but Apple-optimized code runs really quickly on Apple silicon despite this seemingly obvious fact, and that, I think, is interesting. You don't need a 4090 to do things if you use the right libraries. Is that what I can take from it?
There's a better parallel/batching approach that works on the 30-second chunks, resulting in a 40x speedup. From HF at <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">https://github.com/Vaibhavs10/insanely-fast-whisper</a><p>This is again not native PyTorch, so there's still room for better RTFx numbers.
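For context, that 30-second chunking plus batching is essentially the Hugging Face Transformers ASR pipeline; a rough sketch, with the model name, batch size, and file name being illustrative:

    # pip install transformers torch
    import torch
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",
        torch_dtype=torch.float16,
        device="cuda:0",
    )

    # chunk_length_s splits the audio into 30 s windows; batch_size runs them
    # through the GPU in parallel instead of sequentially.
    result = asr(
        "podcast.mp3",
        chunk_length_s=30,
        batch_size=24,
        return_timestamps=True,
    )
    print(result["text"])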
Anyone have overall benchmarks or qualified speculation on how an optimized implementation for a 4070 compares against the M series -- especially the M3 Max?<p>I'm trying to decide between the two. I figure the M3 Max would crush the 4070?
The shocking thing about these M series comparisons is never "the M series is as fast as the GIANT NVIDIA THING!"; it's always "Man, the M series is 70% as fast with like 1/4 the power."
About Whisper: does anyone know of a project (on GitHub) for using the model in real time? I'm studying a new language, and it seems like a good opportunity to use it for checking my pronunciation against the written word.
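I'm not aware of a polished one; the naive approach is just to capture short microphone chunks and transcribe each one with the openai-whisper package. A rough sketch, where the chunk length and model size are arbitrary and sounddevice is my assumption for mic capture:

    # pip install openai-whisper sounddevice numpy
    import numpy as np
    import sounddevice as sd
    import whisper

    model = whisper.load_model("base")
    SAMPLE_RATE = 16000   # Whisper expects 16 kHz mono audio
    CHUNK_SECONDS = 5

    while True:
        # Record a short chunk from the default microphone, then transcribe it.
        audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()
        result = model.transcribe(audio.flatten(), fp16=False)
        print(result["text"])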
TL;DR<p>If you compare whisper
on a Mac with a Mac-optimized build
Vs
on a PC with a non-optimized NVIDIA build
The results are close!
If an optimized NVIDIA build is compared, it's not even remotely close.<p>Pfft<p>I'll be picking up a Mac, but I'm well aware it's not close to NVIDIA at all. It's just the best portable setup I can find that I can run completely offline.<p>Do people really need to make these disingenuous comparisons to validate their purchase?<p>If a Mac fits your overall use case better, get a Mac. If a PC with NVIDIA is the better choice, get that. Why all these articles saying "look, my choice wasn't that dumb"??
Hmm… this is a dumb question, but the cookie pop up appears to be in German on this site. Does anyone know which button to press to say “maximally anti-tracking?”