Since it's tangentially relevant: if you have an M1 Mac, I've created some boilerplate for working with the latest TensorFlow with GPU acceleration as well: https://github.com/alexfromapex/tensorexperiments . I'm thinking of adding a branch for PyTorch now.
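For anyone trying it out, a minimal sanity check (just a sketch, assuming tensorflow-macos and the tensorflow-metal plugin are both installed) looks roughly like this:

    import tensorflow as tf

    # With the tensorflow-metal plugin installed, the M1 GPU should
    # appear as a physical device of type "GPU".
    print(tf.config.list_physical_devices("GPU"))

    # A tiny matmul to confirm ops actually dispatch to the GPU.
    with tf.device("/GPU:0"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        print(tf.reduce_sum(a @ b))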
This is really cool for a number of reasons:

1.) Apple Silicon *currently* can't compete with Nvidia GPUs in terms of raw compute power, but they're already way ahead on energy efficiency. Training a small deep learning model on battery power on a laptop could actually be a thing now.

Edit: I've been informed that for matrix math, Apple Silicon isn't actually ahead in efficiency.

2.) Apple Silicon probably *will* compete directly with Nvidia GPUs on raw compute power in future generations of products like the Mac Studio and Mac Pro, which is very exciting. Competition in this space is incredibly good for consumers.

3.) At $4800, an M1 Ultra Mac Studio appears to be far and away the cheapest machine you can buy with 128GB of GPU memory. With proper PyTorch support, we'll actually be able to use this memory for training big models or using big batch sizes. For the kind of DL work I do, where dataloading is much more of a bottleneck than raw compute power, the Mac Studio is now looking *very* enticing.
Nice results! But why are people still reporting benchmark results on VGG? Does anybody actually use this network anymore?

Better candidates would be MobileNets, EfficientNets, NFNets, vision transformers, or almost anything else that's come out in the 8 years since VGG was published (great work it was at the time!).
The installation command generated on https://pytorch.org/get-started/locally/ didn't install the latest version for me. What did it was:

    pip3 install --pre torch==1.12.0.dev20220518 --extra-index-url https://download.pytorch.org/whl/nightly/cpu
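Once that's in, a quick way to confirm the nightly actually has the new backend (torch.backends.mps landed in these 1.12 nightlies):

    import torch

    print(torch.__version__)                  # should report a 1.12 dev/nightly build
    print(torch.backends.mps.is_built())      # True if compiled with MPS support
    print(torch.backends.mps.is_available())  # True if the MPS device is usable here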
This is very interesting, since the M1 Ultra Mac Studio supports 128GB of unified memory: slowly training a large, memory-heavy model on a single device could be viable, as could running inference on a very large model.
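As a rough sketch of the inference case (resnet152 is just an illustrative stand-in here, not a claim about what fills 128GB):

    import torch
    import torchvision.models as models

    device = torch.device("mps")

    # Any large model would do; the point is that the big batch below
    # lives in unified memory shared between the CPU and GPU.
    model = models.resnet152(pretrained=True).to(device).eval()

    with torch.no_grad():
        x = torch.randn(64, 3, 224, 224, device=device)
        out = model(x)
    print(out.shape)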
There was a report from a few months ago comparing the M1 Pro with several Nvidia GPUs: https://wandb.ai/tcapelle/apple_m1_pro/reports/Deep-Learning-on-the-M1-Pro-with-Apple-Silicon---VmlldzoxMjQ0NjY3

I'm curious how the benchmarks change with this new release!
Anyone actually got this to run on an M1 Mac?

    $ conda install pytorch torchvision torchaudio -c pytorch-nightly
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Collecting package metadata (repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.

    PackagesNotFoundError: The following packages are not available from current channels:
      - torchaudio
And the pip install variant installs an old version of torchaudio that is broken:

    OSError: dlopen(/opt/homebrew/Caskroom/miniforge/base/envs/test123/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so, 0x0006): Symbol not found: __ZN2at14RecordFunctionC1ENS_11RecordScopeEb
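One workaround that may help (untested beyond my own setup): install all three packages from the same nightly index, so the torchaudio build matches the torch build:

    pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu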
Small code example in the PyTorch docs:

https://pytorch.org/docs/master/notes/mps.html
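Roughly, the pattern from that page (paraphrased from memory; see the link for the canonical version):

    import torch

    if torch.backends.mps.is_available():
        mps_device = torch.device("mps")
        x = torch.ones(5, device=mps_device)  # tensor created directly on the GPU
        y = x * 2
        # Existing models can be moved over the same way:
        # model.to(mps_device)
        print(y)
    else:
        print("MPS device not found.")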
It’s surprising to see PyTorch developers working on things like that when common operations like group convolutions are still completely unoptimized on Nvidia GPUs, despite many requests.
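For anyone who wants to see the gap for themselves, here's a rough timing sketch (the tensor sizes and group count are arbitrary assumptions, not from any official benchmark):

    import time
    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    x = torch.randn(32, 256, 56, 56, device=device)

    def bench(conv, iters=50):
        # Warm up, then time; synchronize because CUDA launches are async.
        for _ in range(5):
            conv(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(iters):
            conv(x)
        torch.cuda.synchronize()
        return (time.time() - t0) / iters

    dense = nn.Conv2d(256, 256, 3, padding=1).to(device)
    grouped = nn.Conv2d(256, 256, 3, padding=1, groups=32).to(device)

    # The grouped conv does ~1/32 the FLOPs of the dense one,
    # but in practice it's usually nowhere near 32x faster.
    print(f"dense:   {bench(dense) * 1e3:.2f} ms")
    print(f"grouped: {bench(grouped) * 1e3:.2f} ms")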
I started collecting benchmarks of the M1 Max on PyTorch here: https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks
Yess! This is important for me, because I don't have any $$$ to rent GPUs for personal projects. Now we just need M1 support for JAX.

Since there are no hard benchmarks against other GPUs here, here's a Geekbench comparison against an RTX 3080 Mobile laptop I have [1]. Looks like it's about 2x slower. (The RTX laptop absolutely rips for gaming; I love it.)

[1] https://browser.geekbench.com/v5/compute/compare/4140651?baseline=4529092
> Accelerated GPU training is enabled using Apple’s Metal Performance Shaders (MPS) as a backend for PyTorch.

What do shaders have to do with it? Deep learning is a mature field now; it shouldn't need to borrow compute architecture from the gaming/entertainment field. Anyone else find this disconcerting?