This is certainly interesting and informative for somebody who already kinda knows how the code->GPU->screen pipeline works and wants to know how it's made fast, but I'm still missing some of the "pre"/"post" background on how and when the request to the shading system is made, and what happens after the GPU is done.<p>I just started poking at doing computer graphics "from scratch" (working my way up, starting from ray tracers). I know that modern graphics systems are huge and that ultimately your C++ library or whatever is just gonna flip some bits on graphics-card registers/buses or something, but I was really hoping for some background on "here's a bit of code that asks the shader to add depth to a triangle" and then how that gets mapped back to the screen.
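<p>As far as I can tell, the CPU-side "ask" in plain OpenGL looks roughly like the sketch below (my own example, not the article's code; GLFW and glad are assumed for window/context setup, and the shaders are the bare minimum). The draw call is the moment the request goes over to the GPU, the vertex and fragment shaders are what run there, and the buffer swap is when the finished image reaches the screen:

    // triangle.cpp -- minimal sketch, assuming GLFW + glad are available.
    // Build e.g.: g++ triangle.cpp glad.c -lglfw -ldl
    #include <glad/glad.h>
    #include <GLFW/glfw3.h>

    // Runs once per vertex: just passes the position through.
    static const char* vsrc =
        "#version 330 core\n"
        "layout(location = 0) in vec3 pos;\n"
        "void main() { gl_Position = vec4(pos, 1.0); }\n";

    // Runs once per covered pixel: paints it a flat color.
    static const char* fsrc =
        "#version 330 core\n"
        "out vec4 color;\n"
        "void main() { color = vec4(1.0, 0.5, 0.2, 1.0); }\n";

    int main() {
        glfwInit();
        glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
        glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
        glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
        GLFWwindow* win = glfwCreateWindow(640, 480, "triangle", nullptr, nullptr);
        glfwMakeContextCurrent(win);
        gladLoadGLLoader((GLADloadproc)glfwGetProcAddress);

        // Compile the two shader stages and link them into a program.
        GLuint vs = glCreateShader(GL_VERTEX_SHADER);
        glShaderSource(vs, 1, &vsrc, nullptr);
        glCompileShader(vs);
        GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(fs, 1, &fsrc, nullptr);
        glCompileShader(fs);
        GLuint prog = glCreateProgram();
        glAttachShader(prog, vs);
        glAttachShader(prog, fs);
        glLinkProgram(prog);

        // Upload one triangle's vertices into a GPU-side buffer.
        float verts[] = { -0.5f, -0.5f, 0.0f,
                           0.5f, -0.5f, 0.0f,
                           0.0f,  0.5f, 0.0f };
        GLuint vao, vbo;
        glGenVertexArrays(1, &vao);
        glGenBuffers(1, &vbo);
        glBindVertexArray(vao);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
        glEnableVertexAttribArray(0);

        while (!glfwWindowShouldClose(win)) {
            glClear(GL_COLOR_BUFFER_BIT);
            glUseProgram(prog);               // "use these shaders"
            glBindVertexArray(vao);           // "with this vertex data"
            glDrawArrays(GL_TRIANGLES, 0, 3); // the actual request to the GPU
            glfwSwapBuffers(win);             // finished frame is handed to the display
            glfwPollEvents();
        }
        glfwTerminate();
        return 0;
    }

A compute shader version would swap glDrawArrays for glDispatchCompute and skip the rasterizer entirely, which is roughly where the article picks up.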
Not sure "Introduction to compute shaders" is the right title here. I was expecting to see come code, but it's more about the history of its design.
There's some great information about how GPUs are organized here. It's interesting that GPUs and CPUs aren't <i>that</i> different, even though maximizing performance requires different considerations. The SIMD units are just wider on the GPU, and you have to treat memory differently (the register files are bigger, main memory has higher latency but better bandwidth, etc.).<p>It's also funny that Nvidia lists the total number of SIMD lanes as "CUDA cores", meaning that a single compute unit with 64 SIMD lanes counts as 64 CUDA cores. That's cheating, if you ask me :P
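<p>If you want to poke at those numbers yourself, a short sketch against the CUDA runtime API makes the split visible (my own example, not from the article; the lanes-per-SM constant is a per-architecture assumption, since the runtime doesn't report it):

    // device_query.cpp -- minimal sketch; build with: nvcc device_query.cpp
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, 0);  // query device 0

        // "Compute units" here are streaming multiprocessors (SMs).
        std::printf("%s\n", p.name);
        std::printf("SMs (compute units):     %d\n", p.multiProcessorCount);
        std::printf("warp size (SIMT width):  %d\n", p.warpSize);
        std::printf("32-bit registers per SM: %d\n", p.regsPerMultiprocessor);
        std::printf("shared memory per SM:    %zu bytes\n", p.sharedMemPerMultiprocessor);

        // The marketed "CUDA core" count is SMs * FP32 lanes per SM. The
        // lanes-per-SM figure depends on the architecture (commonly 64 or 128)
        // and is not exposed by the runtime, so it's hard-coded here as an
        // assumption for illustration.
        const int kFp32LanesPerSM = 128;
        std::printf("\"CUDA cores\" (marketing): %d\n",
                    p.multiProcessorCount * kFp32LanesPerSM);
        return 0;
    }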