I made these a couple of years ago as a teaching exercise for <a href="https://minitorch.github.io/" rel="nofollow">https://minitorch.github.io/</a>. At the time, resources for doing anything on GPUs were pretty sparse and the NVIDIA docs were quite challenging.<p>These days there are great resources for going deep on this topic. The CUDA-mode org is particularly great, both their video series and PMPP reading groups.
I recently ported this to Metal for Apple Silicon computers. If you're interested in learning GPU programming on an M series Mac, I think this is a very accessible option. Thanks to Sasha for making this!<p><a href="https://github.com/abeleinin/Metal-Puzzles">https://github.com/abeleinin/Metal-Puzzles</a>
I think this course is also relevant for some deeper context.<p><a href="https://gfxcourses.stanford.edu/cs149/fall23/lecture/dataparallel/" rel="nofollow">https://gfxcourses.stanford.edu/cs149/fall23/lecture/datapar...</a>
When working on GPU code there are really two parts to it, I feel. One is “how do I even write code for the GPU”, which this tutorial seems to cover. The second is “how do I write <i>good</i> code for the GPU”, which seems like it would need another resource or an expansion of this one.
I loved the tensor puzzles you made. I spent the morning revisiting and liking all the videos you've made on YouTube. Hope for many more in the future!
Either puzzle 4 has a bug in it or I'm losing my mind. (Possible solution below, so don't read if you want to go in fresh)<p><pre><code> # FILL ME IN (roughly 2 lines)
if local_i < size and local_j < size:
    out[local_i][local_j] = a[local_i][local_j] + 10
</code></pre>
Results in a failed assertion:<p><pre><code> AssertionError: Wrong number of indices
</code></pre>
But the test cell beneath it will still pass?
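One possible explanation for the assertion, sketched under assumptions: if the puzzle's array wrapper expects a single tuple index (`out[i, j]`) and checks how many indices it received, then chained indexing (`out[i][j]`) hands it one index at a time and trips that check. `Table2D` below is a hypothetical stand-in written for illustration, not the library's actual class, and it does not explain why the test cell would still pass.

```python
class Table2D:
    """Hypothetical stand-in for a 2D array wrapper that validates index arity."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        self.data = [[0] * cols for _ in range(rows)]

    def _check(self, index):
        # A bare int is treated as a 1-tuple, so table[i] on a 2D table fails.
        if isinstance(index, int):
            index = (index,)
        assert len(index) == len(self.shape), "Wrong number of indices"
        return index

    def __getitem__(self, index):
        i, j = self._check(index)
        return self.data[i][j]

    def __setitem__(self, index, value):
        i, j = self._check(index)
        self.data[i][j] = value


a = Table2D(2, 2)
out = Table2D(2, 2)

# Tuple indexing passes both indices in one call and succeeds.
out[0, 1] = a[0, 1] + 10

# Chained indexing calls __getitem__ with a single index first and fails.
try:
    out[0][1] = a[0][1] + 10
    chained_error = None
except AssertionError as err:
    chained_error = str(err)
```

If the real wrapper behaves like this sketch, writing the line as `out[local_i, local_j] = a[local_i, local_j] + 10` would avoid the assertion.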
So I'm used to working with lists and maps, which doesn't really track well with tackling problems on thousands of cores.<p>Is the usual strategy to worry less about repeating calculations and just use brute force to tackle the problem?<p>Is there a good resource to read about how to tackle problems in an extremely parallel way?
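One common pattern, offered as a rough sketch rather than a full answer: instead of looping over a list with a running accumulator, assign one thread per element and combine values pairwise over about log2(n) rounds. The work is restructured so that independent pieces can run at once. The code below only simulates this on the CPU in plain Python; `parallel_sum` and its round structure are illustrative names, not taken from the puzzles.

```python
def parallel_sum(values):
    """Simulate a tree reduction: each round, slot i absorbs the slot `stride` away."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # On a GPU, every iteration of this inner loop would run concurrently,
        # one thread per index, with a barrier between rounds.
        # Within a round, no slot that is read is also written, so order is irrelevant.
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                data[i] += data[i + stride]
        stride *= 2
    return data[0] if data else 0


total = parallel_sum([1, 2, 3, 4, 5])
```

The sequential version needs n - 1 dependent additions; here the additions inside each round are independent, which is what lets many cores help.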
seems like an opportune moment to plug the bitcoin puzzles, namely BTC32 / 1000 BTC Challenge[1]<p>pools are in dire need of cuda developers<p>[1]<a href="https://bitcointalk.org/index.php?topic=1306983.0" rel="nofollow">https://bitcointalk.org/index.php?topic=1306983.0</a>