Ask HN: Resources for general purpose GPU development on Apple's M* chips?

149 points by thinking_banana, 5 months ago
While Apple's M* chips seem to have incredible unified memory access, the available learning resources seem quite restricted and often convoluted. Has anyone been able to get past this barrier? I have some familiarity with general-purpose software development in CUDA and C++. I want to figure out how to use Apple's developer resources for general-purpose programming.

13 comments

aleinin, 5 months ago
If you're looking for a high-level introduction to GPU development on Apple silicon, I would recommend learning Metal. It's Apple's GPU acceleration language, similar to CUDA for Nvidia hardware. I ported GPU-Puzzles [1], a collection of CUDA exercises designed to teach GPU programming fundamentals, to Metal [2]. I think it's a very accessible introduction to Metal and writing GPU kernels.

[1] https://github.com/srush/GPU-Puzzles

[2] https://github.com/abeleinin/Metal-Puzzles
morphle, 5 months ago
You can help with the reverse engineering of Apple Silicon being done by a dozen people worldwide; that is how we find out the GPU and NPU instructions [1-4]. There are over 43 trillion floating-point operations per second to unlock, at 8 terabits per second of 'unified' memory bandwidth and 270 gigabits per second of networking (less on the smaller chips)...

[1] https://github.com/AsahiLinux/gpu

[2] https://github.com/dougallj/applegpu

[3] https://github.com/antgroup-skyward/ANETools/tree/main/ANEDisassembler

[4] https://github.com/hollance/neural-engine

You can use high-level APIs like MLX, Metal or CoreML to compute other things on the GPU and NPU.

Shadama [5] is an example programming language that translates (with Ometa) matrix calculations into WebGPU or WebGL APIs (I forget which). You can do exactly the same with the MLX, Metal or CoreML APIs and only pay around 3% overhead going through the translation stages.

[5] https://github.com/yoshikiohshima/Shadama

I estimate it will cost around $22K at my hourly rate to completely reverse engineer the latest A16 and M4 CPU (ARMv9), GPU and NPU instruction sets. I think I am halfway through the reverse engineering; debugging is the hardest part. However, you would not be able to sell software built on it in the App Store, as Apple forbids undocumented APIs and bare-metal instructions.
barkingcat, 5 months ago
There is no general-purpose GPU development on the Apple M series.

There is Metal development. You want to learn Apple M-series GPU and GPGPU development? Learn Metal!

https://developer.apple.com/metal/
rgovostes, 5 months ago
It's hard to answer without knowing exactly what your aim is, your experience level with CUDA, how easily the concepts you know will map to Metal, and what you find "restricted and convoluted" about the documentation.

<Insert your favorite LLM> helped me write some simple Metal-accelerated code by scaffolding the compute pipeline, which took most of the nuisance out of learning the API and let me focus on writing the kernel code.

Here's the code, if it's helpful at all: https://github.com/rgov/thps-crack
billti, 5 months ago
If you know CUDA, then I assume you already know a bit about GPUs and the major concepts. There are just minor differences and different terminology for things like “warps”, etc.

With that base, I’ve found their docs decent enough, especially coupled with the Metal Shading Language PDF they provide (https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf), and quite a few code samples you can download from the docs site (e.g. https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu).

I’d note a lot of their stuff was still written in Objective-C, which I’m not that familiar with. But most of that is boilerplate, and the rest is largely C/C++ based (including the Metal Shading Language).

I just ported some CPU/SIMD number crunching (complex matrices) to Metal, and the speedup has been staggering. What used to take days now takes minutes. It is the hottest my M3 MacBook has ever been, though! (See https://x.com/billticehurst/status/1871375773413876089 :-)
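
To illustrate the terminology point in the comment above, here is a minimal sketch (not taken from the comment, and not actual Metal code) that emulates on the CPU, in plain C++, how a one-dimensional compute grid assigns an index to each thread. The variable names mirror Metal Shading Language attribute names; `add_kernel` and `dispatch_add` are hypothetical helpers made up for this example, and the loops merely stand in for the GPU scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rough CUDA -> Metal vocabulary:
//   block          -> threadgroup
//   warp           -> SIMD-group
//   shared memory  -> threadgroup memory
//   blockIdx.x     -> threadgroup_position_in_grid
//   blockDim.x     -> threads_per_threadgroup
//   threadIdx.x    -> thread_position_in_threadgroup
//   global index   -> thread_position_in_grid

// Hypothetical "kernel": each emulated thread adds one pair of elements.
void add_kernel(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& out, std::size_t thread_position_in_grid) {
    // Bounds guard: the grid below is rounded up to whole threadgroups,
    // so some threads fall past the end of the data and must do nothing.
    if (thread_position_in_grid >= out.size()) return;
    out[thread_position_in_grid] =
        a[thread_position_in_grid] + b[thread_position_in_grid];
}

// Hypothetical dispatcher: nested loops play the role of the hardware
// launching every thread of every threadgroup.
std::vector<float> dispatch_add(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t threads_per_threadgroup) {
    assert(a.size() == b.size());
    assert(threads_per_threadgroup > 0);
    std::vector<float> out(a.size(), 0.0f);

    // Round up so that every element is covered by some thread.
    const std::size_t threadgroups =
        (a.size() + threads_per_threadgroup - 1) / threads_per_threadgroup;

    for (std::size_t tg = 0; tg < threadgroups; ++tg) {
        for (std::size_t t = 0; t < threads_per_threadgroup; ++t) {
            // Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x in CUDA.
            const std::size_t gid = tg * threads_per_threadgroup + t;
            add_kernel(a, b, out, gid);
        }
    }
    return out;
}
```

Calling `dispatch_add` on two five-element vectors with a threadgroup size of 4 launches two emulated threadgroups (eight threads), and the guard makes the last three threads no-ops. In real Metal code the index would instead arrive as a kernel argument marked `[[thread_position_in_grid]]`.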
mkagenius, 5 months ago
Check out MLX [1]. It's a bit like PyTorch/TensorFlow, with the added benefit of targeting Apple Silicon.

[1] https://ml-explore.github.io/mlx/build/html/index.html
thetwentyone, 5 months ago
I’ve had a good time dabbling with Metal.jl: https://github.com/JuliaGPU/Metal.jl
dylanowen, 5 months ago
People have already mentioned Metal, but if you want cross-platform, https://github.com/gfx-rs/wgpu has a Vulkan-like API and cross-compiles to all the various GPU frameworks. I believe it uses https://github.com/KhronosGroup/MoltenVK to run on Macs. You can also see the Metal shader transpilation results for debugging.
feznyng, 5 months ago
Besides the official docs, you can check out llama.cpp as an example that uses Metal for accelerated inference on Apple silicon.
desideratum, 5 months ago
I'd recommend checking out the CUDA mode Discord server! They also have a channel for Metal: https://discord.gg/ZqckTYcv
rowanG077, 5 months ago
If you are open to running Linux, you can use standard OpenCL and Vulkan.
TriangleEdge, 5 months ago
Why not OpenCL or OpenGL? You won't be constrained by the flavor of GPU.
amelius, 5 months ago
Apple is known to actively discourage general-purpose computing. Better to try a different vendor.