A bit long-winded, but I enjoy seeing more interest in real-time audio on non-traditional compute platforms.

I'm under the impression that more recent versions of CUDA can support "streaming" (persistent) kernels, where you launch the kernel once and then continually feed it data and read back its output. Presumably this would slightly reduce the latency (or latency variability) associated with kernel launches and increase throughput, since the kernel can remain resident during the time it would previously have spent waiting between the end of one task and the start of the next.
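A rough sketch of what I mean, with the usual caveat that I'm hand-waving the details: the kernel below is launched once and then spins on flags in mapped pinned ("zero-copy") host memory, so the host can hand it successive audio blocks without relaunching anything. The buffer size, the trivial gain stage, and the flag-polling protocol are all illustrative assumptions on my part; real code would want proper atomics (e.g. cuda::atomic) and care around watchdog timeouts rather than bare volatile spins.

    // Sketch: a persistent ("streaming") kernel fed via mapped pinned host memory.
    // All names, sizes, and the DSP stage are placeholders, not anyone's real API.
    #include <cstdio>
    #include <cuda_runtime.h>

    constexpr int kFrames = 256;   // one audio block (made-up size)
    constexpr int kBlocks = 4;     // how many blocks to push through

    __global__ void persistent_dsp(volatile int *ctrl, float *buf)
    {
        // ctrl[0]: input ready, ctrl[1]: output ready, ctrl[2]: shut down
        for (;;) {
            while (ctrl[0] == 0 && ctrl[2] == 0) { /* spin until the host signals */ }
            if (ctrl[2]) return;

            int i = threadIdx.x;
            if (i < kFrames) buf[i] *= 0.5f;          // placeholder DSP: -6 dB gain
            __syncthreads();

            if (i == 0) {
                ctrl[0] = 0;
                __threadfence_system();               // publish results to the host
                ctrl[1] = 1;
            }
            __syncthreads();
        }
    }

    int main()
    {
        cudaSetDeviceFlags(cudaDeviceMapHost);

        volatile int *ctrl; float *buf;
        cudaHostAlloc((void **)&ctrl, 3 * sizeof(int), cudaHostAllocMapped);
        cudaHostAlloc((void **)&buf, kFrames * sizeof(float), cudaHostAllocMapped);
        ctrl[0] = ctrl[1] = ctrl[2] = 0;

        int *d_ctrl; float *d_buf;
        cudaHostGetDevicePointer((void **)&d_ctrl, (void *)ctrl, 0);
        cudaHostGetDevicePointer((void **)&d_buf, (void *)buf, 0);

        // Launched exactly once; stays resident for the whole run.
        persistent_dsp<<<1, kFrames>>>((volatile int *)d_ctrl, d_buf);

        for (int b = 0; b < kBlocks; ++b) {
            for (int i = 0; i < kFrames; ++i) buf[i] = 1.0f;   // "capture" a block
            ctrl[1] = 0;
            ctrl[0] = 1;                                       // hand it to the GPU
            while (ctrl[1] == 0) { /* wait for this block's output */ }
            printf("block %d processed, buf[0] = %f\n", b, buf[0]);
        }
        ctrl[2] = 1;                                           // ask the kernel to exit
        cudaDeviceSynchronize();
        return 0;
    }

The point being that the per-block cost becomes a flag flip over PCIe rather than a kernel launch plus scheduling, which is where I'd expect the latency-variability win to come from.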
I suspect that if VR / AR continues to take off, the need for good-quality generated binaural sound will make it more and more attractive to move a greater share of sound processing onto the GPU.