This is a great write-up. I think Rust could be a very nice language for shader-like applications, which are already quite functional and don't involve a lot of shared, mutable state across threads.<p>In HPC, we're very much interested in GPU compute programming rather than shader programming. In CUDA codes, your kernels are typically transforming input buffers directly into output buffers. This should immediately raise red flags for a Rust developer - you've got shared, mutable state across threads!<p>Consider this simple CUDA-ish Rust code, with threads independently executing over 0..input.len() (ignore the out-of-bounds bugs at i = 0 and i = input.len() - 1):<p><pre><code> fn stencil(i: usize, input: &[f32], out: &mut [f32]) {
out[i] = (input[i - 1] + input[i] + input[i + 1]) / 3.0;
}
</code></pre>
(The `i` is a stand-in for the index you'd compute from thread/block IDs, but the input and output arrays are pretty close to the style CUDA promotes.)<p>It's obvious to me, the programmer, that I don't have any aliasing issue - each thread only mutates a single index in the output array. However, Rust is not smart enough to see this. If Rust allowed the kernel to be defined as is, you could easily write multi-threaded code with shared mutable access to individual memory locations, violating Rust's memory model. OK, so you force the kernel to look more like this instead:<p><pre><code> // `input` is the three-element slice covering [i-1, i+1]
fn stencil(input: &[f32], out: &mut f32) {
*out = (input[0] + input[1] + input[2]) / 3.0;
}
</code></pre>
And you'd enforce Rust's invariants at the kernel launch site, computing the valid slices at some higher level in the library in some "unsafe" code (I've put a rough sketch of what I mean at the bottom of this comment). But this only solves the simple case where one array maps to another and the index relationship is obvious, so it's easily provable that there are no aliasing issues. Start layering in things like unique indirect indexing, or non-unique indexing with atomic reductions, and it becomes difficult to phrase your correct program in safe(!) Rust in a way the borrow checker will accept, at least without building an abstraction for each of your parallel patterns. Having to build a pile of bespoke abstractions may not scale to the kinds of developers writing big scientific codes.<p>Anyway, I'm curious whether the folks at Embark have spent any time thinking about the issue of shared, mutable state in GPU programming with Rust. It seems like a deal breaker from where I stand.
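<p>To make the launch-site idea concrete, here's a rough CPU-side sketch of what I mean, using the second `stencil` signature above and std's scoped threads, with one "thread" per element just to mimic the GPU model (`launch_stencil` is a name I made up, not any real API):<p><pre><code> fn launch_stencil(input: &[f32], out: &mut [f32]) {
     assert!(input.len() == out.len() && input.len() >= 3);
     std::thread::scope(|s| {
         // Hand each "thread" exactly one &mut f32 (so no two threads can alias)
         // plus a shared three-element window of the input. Skipping the boundary
         // elements sidesteps the out-of-bounds cases mentioned above.
         for (i, out_elem) in out.iter_mut().enumerate().skip(1).take(input.len() - 2) {
             let window = &input[i - 1..=i + 1];
             s.spawn(move || stencil(window, out_elem));
         }
     });
 }
 </code></pre>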
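<p>And for the "non-unique indexing but with atomic reductions" case, the usual workaround is to make the output a slice of atomics, which already shows the kind of bespoke plumbing I'm talking about (illustrative sketch only; std has no atomic f32, hence the bit-casting CAS loop):<p><pre><code> use std::sync::atomic::{AtomicU32, Ordering};

 // Scatter-add with possibly repeated indices: the output has to become a slice
 // of atomics, because safe Rust can't prove the writes never collide.
 fn scatter_add(bins: &[AtomicU32], idx: &[usize], vals: &[f32]) {
     std::thread::scope(|s| {
         for (&i, &v) in idx.iter().zip(vals) {
             // Again, one "thread" per element to mimic the GPU model.
             s.spawn(move || {
                 let _ = bins[i].fetch_update(Ordering::Relaxed, Ordering::Relaxed, |bits| {
                     Some((f32::from_bits(bits) + v).to_bits())
                 });
             });
         }
     });
 }
 </code></pre>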