I'm currently taking a free CUDA programming course on Udacity, co-taught by someone from NVIDIA Research and a professor from UC Davis. If you're looking for something that starts from the basics and is really easy to follow, I highly recommend it.<p><a href="https://www.udacity.com/course/intro-to-parallel-programming--cs344" rel="nofollow">https://www.udacity.com/course/intro-to-parallel-programming...</a>
It's very cool to see that the class is taught by a group of juniors and seniors (I checked the top two: the first is a senior and the second a junior), with a faculty member listed only as a supervisor.<p>I'm really interested in how the class turns out, and would love to hear how the students in it feel about this arrangement.<p>I can see the benefits. It gives the student instructors/TAs an opportunity to grow while giving the class a peer-learning atmosphere. Plus, the students learn from peers who have up-to-date working knowledge of CUDA fresh in their heads, and the arrangement frees a faculty member (or two) from preparing the course so they can focus on their own faculty/research work (prepping and teaching a class, especially an interesting and engaging one, is a really draining experience for faculty as well).<p>The only downside I can see is managing the class well enough that class time is used efficiently. But I believe that should be covered by the supervising faculty member.
For someone who already knows a thing or two about CUDA and parallel programming, the best reference is Paulius Micikevicius’ presentations. If the terminology in them means something to you, these 100+ slides explain more about the hardware and programming model than any other documentation you’ll find.<p><a href="http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf" rel="nofollow">http://on-demand.gputechconf.com/gtc/2013/presentations/S346...</a><p>If you want to really master CUDA, Nvidia GPUs and the various programming-model tradeoffs, the best thing to do is write a GEMM kernel and a sort kernel from scratch. To take it even further, write two of each: one optimized for large GEMMs/sorts, and one optimized for batches of small GEMMs (or large GEMMs where `k` or another dimension is tiny, e.g. <16 or <32) / batches of small sorts. Specialization for different problem configurations is often the name of the game.<p>For GEMM, you can work through the simple GEMM example in the CUDA documentation, then look at the Volkov GEMM from 2008, then the MAGMA GEMM, then the Junjie Lai / INRIA GEMM, and finally the Scott Gray / Nervana SASS implementation, in increasing order of complexity and state-of-the-art-ness.
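For the first rung of that ladder, here is a minimal sketch of a shared-memory tiled SGEMM, in the spirit of the introductory example (it is not copied from the CUDA docs). Assumptions: row-major `float` matrices, a hypothetical tile size `TILE = 16`, and hypothetical names `tiledGemm` and `gemm`; error handling is minimal. Each 16x16 thread block stages one tile of A and one tile of B in shared memory per step along `k`, so every value loaded from global memory is reused 16 times from shared memory.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tile edge length: 16x16 = 256 threads per block.
constexpr int TILE = 16;

// C = A * B for row-major matrices: A is M x K, B is K x N, C is M x N.
// Each thread computes one element of C.
__global__ void tiledGemm(const float* A, const float* B, float* C,
                          int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C for this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C for this thread
    float acc = 0.0f;

    // Walk along the k dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Cooperative loads; out-of-range elements are zero-padded so
        // M, N, K need not be multiples of TILE.
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // whole tile loaded before anyone reads it

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // all reads done before the next tile overwrites it
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// Host wrapper: copy inputs to the device, launch, copy the result back.
// Allocation/copy errors are not checked here; the launch status is returned.
cudaError_t gemm(const std::vector<float>& A, const std::vector<float>& B,
                 std::vector<float>& C, int M, int N, int K) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, sizeof(float) * M * K);
    cudaMalloc(&dB, sizeof(float) * K * N);
    cudaMalloc(&dC, sizeof(float) * M * N);
    cudaMemcpy(dA, A.data(), sizeof(float) * M * K, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), sizeof(float) * K * N, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    tiledGemm<<<grid, block>>>(dA, dB, dC, M, N, K);
    cudaError_t err = cudaGetLastError();           // launch config errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors

    C.resize(static_cast<size_t>(M) * N);
    cudaMemcpy(C.data(), dC, sizeof(float) * M * N, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

The kernels further down that list go well beyond this, for example by having each thread compute several outputs held in registers instead of just one.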
I took this class last year. Although it was nice to see undergraduates teaching the class, the lack of teaching experience really showed: the student instructors were pretty rough around the edges in their examples and explanations. About two-thirds of the classes ended early (which is at least better than lots of wasted time). This somewhat fits Caltech's unofficial policy of "figuring out the finer details on your own".<p>That said, I found the practical nature of the class a refreshing change from the heavily theoretical foundation of my other CS coursework.
The lecture slides are very good.<p>For anybody following along, there are two other books, Wrox's Professional CUDA C Programming and CUDA for Engineers, which ease entry for those who aren't versed in HPC (PDE solvers, BLAS/LAPACK, Fourier transforms, etc.). The Storti/Yurtoglu book is the best intro I've seen to the topic; the Wrox book covers a lot of the material in Wilt's CUDA Handbook, less exhaustively but more up to date (Kepler vs. Fermi).<p>________________________<p>There's other course material online too, e.g. from UIUC and Oxford (especially good, IMO):<p><a href="http://people.maths.ox.ac.uk/gilesm/cuda/" rel="nofollow">http://people.maths.ox.ac.uk/gilesm/cuda/</a><p><a href="http://cseweb.ucsd.edu/classes/fa15/cse260-a/lectures.html" rel="nofollow">http://cseweb.ucsd.edu/classes/fa15/cse260-a/lectures.html</a><p><a href="https://www.coursera.org/course/hetero" rel="nofollow">https://www.coursera.org/course/hetero</a>