I find these results questionable unless Whisper was very poorly optimized in the way it was run on the 4090.<p>I have a 3090 and an M1 Max 32GB, and although I haven't tried Whisper, the inference difference between the two on Llama and Stable Diffusion is staggering, especially with Stable Diffusion, where SDXL takes about 9 seconds on the 3090 and about 1 minute 10 seconds on the M1 Max.
I think this is using the OpenAI Whisper repo? If they want a real comparison, they should be comparing MLX to faster-whisper or insanely-fast-whisper on the 4090. Faster-whisper runs sequentially; insanely-fast-whisper batches the audio in 30-second intervals.<p>We use Whisper in production and these are our findings: we use faster-whisper because we find the quality is better when you include the previous segment's text as context. Just for comparison, we find that faster-whisper is generally 4-5x faster than OpenAI/whisper, and insanely-fast-whisper can be another 3-4x faster than faster-whisper.
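For reference, a minimal faster-whisper sketch of the setup described above; the model size, file name, and options here are illustrative, not our production config:

    # pip install faster-whisper
    from faster_whisper import WhisperModel

    # Illustrative settings; pick a model size / compute type that fits your GPU.
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    # condition_on_previous_text feeds each prior segment back in as context,
    # which is the quality benefit mentioned above.
    segments, info = model.transcribe(
        "meeting.mp3",
        condition_on_previous_text=True,
    )

    for segment in segments:
        print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")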
Key to this article is understanding that it's leveraging the newly released Apple MLX framework, and that the code uses these Apple-specific optimizations.<p><a href="https://news.ycombinator.com/item?id=38539153">https://news.ycombinator.com/item?id=38539153</a>
How does this compare to insanely-fast-whisper though? <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">https://github.com/Vaibhavs10/insanely-fast-whisper</a><p>I think that not using optimizations allows this to be a 1:1 comparison, but if those optimizations are not ported to MLX, then it would still be better to use a 4090.<p>Having looked at MLX recently, I think it's definitely going to get traction on Macs - and on iOS when Swift bindings are released <a href="https://github.com/ml-explore/mlx/issues/15">https://github.com/ml-explore/mlx/issues/15</a> (although there may be a C++20 compilation issue blocking that right now).
Does this translate to other models, or was Whisper cherry-picked due to its serial nature and integer math? Looking at <a href="https://github.com/ml-explore/mlx-examples/tree/main/stable_diffusion">https://github.com/ml-explore/mlx-examples/tree/main/stable_...</a> seems to hint that this is the case:<p>>At the time of writing this comparison convolutions are still some of the least optimized operations in MLX.<p>I think the main thing at play is that you can have 64+ GB of very fast RAM directly coupled to the CPU/GPU, and the benefits of that from a latency/co-accessibility point of view.<p>These numbers are certainly impressive when you look at the power envelopes of these systems.<p>Worth noting that the cost of an M3 Max system with the minimum RAM config is ~2x the price of a 4090...
It's easy to run Whisper on my Mac M1, but it's not using MLX out of the box.<p>I spent an hour or two trying to figure out what I needed to install and configure to make it use MLX. I was getting cryptic Python errors, Torch errors... and gave up on it.<p>I rented a VM with a GPU and had Whisper running on it within a few minutes.
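For anyone else trying: a minimal sketch of what the MLX path is supposed to look like, assuming the Whisper example from ml-explore/mlx-examples is installed as the mlx_whisper package; the Hugging Face repo name below is an assumption, swap in whichever converted checkpoint you actually use:

    # pip install mlx-whisper   (the Whisper example from ml-explore/mlx-examples)
    import mlx_whisper

    # path_or_hf_repo points at a converted MLX checkpoint on Hugging Face;
    # the exact repo name here is illustrative, not verified.
    result = mlx_whisper.transcribe(
        "audio.mp3",
        path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
    )
    print(result["text"])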
There will be a lot of debate about which is the absolute best choice for X task, but what I love about this is the level of performance at such a low power consumption.
Use this Whisper derivative repo instead - one hour of audio gets transcribed within a minute or less on most GPUs - <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">https://github.com/Vaibhavs10/insanely-fast-whisper</a>
I feel like this is particularly interesting in light of their Vision Pro. Being able to run models in a power efficient manner may not mean much to everyone on a laptop, but it's a huge benefit for an already power hungry headset.
I'll take this opportunity to ask for help: what's a good open-source transcription and diarization app or workflow?<p>I looked at <a href="https://github.com/thomasmol/cog-whisper-diarization">https://github.com/thomasmol/cog-whisper-diarization</a> and <a href="https://about.transcribee.net/" rel="nofollow noreferrer">https://about.transcribee.net/</a> (from the people behind Audapolis), but neither works that well -- crashes, etc.<p>Thank you!
4090 -> 82 TFLOPS<p>M3 Max GPU -> 10 TFLOPS<p>That makes it roughly 8x slower than the 4090 on raw compute.<p>But yeah, you can claim that a bike accelerates faster than a Ferrari because it reaches 1 km/h sooner...
I wonder how AMD's XDNA accelerator will fare.<p>They just shipped 1.0 of the Ryzen AI Software and SDK. It alleges ONNX, PyTorch, and TensorFlow support. <a href="https://www.anandtech.com/show/21178/amd-widens-availability-of-ryzen-ai-software-for-developers-xdna-2-coming-with-strix-point-in-2024" rel="nofollow noreferrer">https://www.anandtech.com/show/21178/amd-widens-availability...</a><p>Interestingly, the upcoming XDNA2 is supposedly going to boost generative performance a lot? "3x". I'd kind of assumed these sorts of devices would mainly be helping with inference. (I don't really know what characterizes the different workloads, just a naive grasp.)
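If it plugs in like other ONNX Runtime execution providers, usage would look roughly like the sketch below. The provider name and the idea that the NPU is exposed this way are my assumptions from skimming the Ryzen AI docs, not something I've run:

    # Sketch: running an ONNX model through ONNX Runtime with a Vitis AI
    # execution provider (assumed to be how the Ryzen AI SDK exposes the NPU).
    import numpy as np
    import onnxruntime as ort

    # Dummy input; the shape and input name depend on the exported model.
    input_array = np.zeros((1, 3, 224, 224), dtype=np.float32)

    session = ort.InferenceSession(
        "model.onnx",
        providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
    )
    outputs = session.run(None, {session.get_inputs()[0].name: input_array})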
Looking at the comments, perhaps the article could be more aptly titled. The author does stress that these benchmarks, maybe better called test runs, are not of any scientific accuracy or worth, but are simply meant to demonstrate what is being tested. I think it's interesting, though, that Apple and 4090s are even compared in any way, since the devices are so vastly different. I'd expect the 4090 to be more powerful, but Apple-optimized code runs really quickly on Apple silicon despite this seemingly obvious fact, and that, I think, is interesting. You don't need a 4090 to do things if you use the right libraries. Is that what I can take from it?
There's a better parallel/batching approach that works on the 30-second chunks, resulting in a 40x speedup. From HF at <a href="https://github.com/Vaibhavs10/insanely-fast-whisper">https://github.com/Vaibhavs10/insanely-fast-whisper</a><p>This is again not native PyTorch, so there's still room for better RTFx numbers.
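For context, that 30-second chunking plus batching is essentially the Hugging Face Transformers ASR pipeline; a rough sketch, with the model name, batch size, and file name being illustrative:

    # pip install transformers torch
    import torch
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",
        torch_dtype=torch.float16,
        device="cuda:0",
    )

    # chunk_length_s splits the audio into 30 s windows; batch_size runs them
    # through the GPU in parallel instead of sequentially.
    result = asr(
        "podcast.mp3",
        chunk_length_s=30,
        batch_size=24,
        return_timestamps=True,
    )
    print(result["text"])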
Anyone have overall benchmarks or qualified speculation on how an optimized implementation for a 4070 compares against the M series -- especially the M3 Max?<p>I'm trying to decide between the two. I figure the M3 Max would crush the 4070?
The shocking thing about these M series comparisons is never "the M series is as fast as the GIANT NVIDIA THING!"; it's always "Man, the M series is 70% as fast with like 1/4 the power."
About Whisper: does anyone know of a project (on GitHub) for using the model in real time? I'm studying a new language, and it seems like a good opportunity to use it for checking my pronunciation against the written word.
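I'm not aware of a polished one; the naive approach is just to capture short microphone chunks and transcribe each one with the openai-whisper package. A rough sketch, where the chunk length and model size are arbitrary and sounddevice is my assumption for mic capture:

    # pip install openai-whisper sounddevice numpy
    import numpy as np
    import sounddevice as sd
    import whisper

    model = whisper.load_model("base")
    SAMPLE_RATE = 16000   # Whisper expects 16 kHz mono audio
    CHUNK_SECONDS = 5

    while True:
        # Record a short chunk from the default microphone, then transcribe it.
        audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()
        result = model.transcribe(audio.flatten(), fp16=False)
        print(result["text"])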
TL;DR<p>If you compare whisper
on a Mac with a Mac-optimized build
Vs
on a PC with a non-optimized NVIDIA build
The results are close!
If an optimized NVIDIA build is compared, it's not even remotely close.<p>Pfft<p>I'll be picking up a Mac, but I'm well aware it's not close to NVIDIA at all. It's just the best portable setup I can find that I can run completely offline.<p>Do people really need to make these disingenuous comparisons to validate their purchase?<p>If a Mac fits your overall use case better, get a Mac. If a PC with NVIDIA is the better choice, get that. Why all these articles saying "look, my choice wasn't that dumb"??
Hmm… this is a dumb question, but the cookie pop up appears to be in German on this site. Does anyone know which button to press to say “maximally anti-tracking?”