Since it's tangentially relevant: if you have an M1 Mac, I've created some boilerplate for working with the latest TensorFlow with GPU acceleration as well: https://github.com/alexfromapex/tensorexperiments . I'm thinking of adding a branch for PyTorch now.
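For anyone trying it out, a minimal sanity check (just a sketch, assuming tensorflow-macos and the tensorflow-metal plugin are both installed) looks roughly like this:

    import tensorflow as tf

    # With the tensorflow-metal plugin installed, the M1 GPU should
    # appear as a physical device of type "GPU".
    print(tf.config.list_physical_devices("GPU"))

    # A tiny matmul to confirm ops actually dispatch to the GPU.
    with tf.device("/GPU:0"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        print(tf.reduce_sum(a @ b))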
This is really cool for a number of reasons:

1.) Apple Silicon *currently* can't compete with Nvidia GPUs in terms of raw compute power, but they're already way ahead on energy efficiency. Training a small deep learning model on battery power on a laptop could actually be a thing now.

Edit: I've been informed that for matrix math, Apple Silicon isn't actually ahead in efficiency.

2.) Apple Silicon probably *will* compete directly with Nvidia GPUs on raw compute power in future generations of products like the Mac Studio and Mac Pro, which is very exciting. Competition in this space is incredibly good for consumers.

3.) At $4800, an M1 Ultra Mac Studio appears to be far and away the cheapest machine you can buy with 128GB of GPU memory. With proper PyTorch support, we'll actually be able to use this memory for training big models or using big batch sizes. For the kind of DL work I do, where dataloading is much more of a bottleneck than raw compute power, the Mac Studio is now looking *very* enticing.
Nice results! But why are people still reporting benchmark results on VGG? Does anybody actually use this network anymore?

Better candidates would be MobileNets, EfficientNets, NFNets, vision transformers, or almost anything else that's come out in the 8 years since VGG was published (great work it was at the time!).
The installation command generated on https://pytorch.org/get-started/locally/ didn't install the latest version for me. What did it was:

    pip3 install --pre torch==1.12.0.dev20220518 --extra-index-url https://download.pytorch.org/whl/nightly/cpu
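Once that's in, a quick way to confirm the nightly actually has the new backend (torch.backends.mps landed in these 1.12 nightlies):

    import torch

    print(torch.__version__)                  # should report a 1.12 dev/nightly build
    print(torch.backends.mps.is_built())      # True if compiled with MPS support
    print(torch.backends.mps.is_available())  # True if the MPS device is usable here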
This is very interesting, since the M1 Ultra Mac Studio supports 128GB of unified memory: slowly training a large, memory-heavy model on a single device could be viable, as could running inference on a very large model.
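As a rough sketch of the inference case (resnet152 is just an illustrative stand-in here, not a claim about what fills 128GB):

    import torch
    import torchvision.models as models

    device = torch.device("mps")

    # Any large model would do; the point is that the big batch below
    # lives in unified memory shared between the CPU and GPU.
    model = models.resnet152(pretrained=True).to(device).eval()

    with torch.no_grad():
        x = torch.randn(64, 3, 224, 224, device=device)
        out = model(x)
    print(out.shape)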
There was a report from a few months ago comparing the M1 Pro with several Nvidia GPUs: https://wandb.ai/tcapelle/apple_m1_pro/reports/Deep-Learning-on-the-M1-Pro-with-Apple-Silicon---VmlldzoxMjQ0NjY3

I'm curious how the benchmarks change with this new release!
Anyone actually got this to run on an M1 Mac?

    $ conda install pytorch torchvision torchaudio -c pytorch-nightly
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Collecting package metadata (repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.

    PackagesNotFoundError: The following packages are not available from current channels:
      - torchaudio
And the pip install variant installs an old version of torchaudio that is broken:

    OSError: dlopen(/opt/homebrew/Caskroom/miniforge/base/envs/test123/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so, 0x0006): Symbol not found: __ZN2at14RecordFunctionC1ENS_11RecordScopeEb
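One workaround that may help (untested beyond my own setup): install all three packages from the same nightly index, so the torchaudio build matches the torch build:

    pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu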
Small code example in the PyTorch docs:

https://pytorch.org/docs/master/notes/mps.html
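Roughly, the pattern from that page (paraphrased from memory; see the link for the canonical version):

    import torch

    if torch.backends.mps.is_available():
        mps_device = torch.device("mps")
        x = torch.ones(5, device=mps_device)  # tensor created directly on the GPU
        y = x * 2
        # Existing models can be moved over the same way:
        # model.to(mps_device)
        print(y)
    else:
        print("MPS device not found.")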
It’s surprising to see PyTorch developers working on things like that when common operations like group convolutions are still completely unoptimized on Nvidia GPUs, despite many requests.
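For anyone who wants to see the gap for themselves, here's a rough timing sketch (the tensor sizes and group count are arbitrary assumptions, not from any official benchmark):

    import time
    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    x = torch.randn(32, 256, 56, 56, device=device)

    def bench(conv, iters=50):
        # Warm up, then time; synchronize because CUDA launches are async.
        for _ in range(5):
            conv(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(iters):
            conv(x)
        torch.cuda.synchronize()
        return (time.time() - t0) / iters

    dense = nn.Conv2d(256, 256, 3, padding=1).to(device)
    grouped = nn.Conv2d(256, 256, 3, padding=1, groups=32).to(device)

    # The grouped conv does ~1/32 the FLOPs of the dense one,
    # but in practice it's usually nowhere near 32x faster.
    print(f"dense:   {bench(dense) * 1e3:.2f} ms")
    print(f"grouped: {bench(grouped) * 1e3:.2f} ms")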
I started collecting benchmarks of the M1 Max on PyTorch here: https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks
Yess! This is important for me, because I don't have any $$$ to rent GPUs for personal projects. Now we just need M1 support for JAX.

Since there are no hard benchmarks against other GPUs here, here's a Geekbench comparison against an RTX 3080 Mobile laptop I have [1]. Looks like it's about 2x slower. (The RTX laptop absolutely rips for gaming; I love it.)

[1] https://browser.geekbench.com/v5/compute/compare/4140651?baseline=4529092
> Accelerated GPU training is enabled using Apple’s Metal Performance Shaders (MPS) as a backend for PyTorch.

What do shaders have to do with it? Deep learning is a mature field now; it shouldn't need to borrow compute architecture from the gaming/entertainment field. Anyone else find this disconcerting?