I took a GPU programming course in college and even did a year-long thesis implementing an RK4 integrator to solve a particular differential equation.

In my thesis work, the main issue I encountered was that RK4 is a vector operation, but GPUs are matrix processors. The bottleneck in the application was memory bandwidth, not the GPU itself. We ended up with a 16x speedup with respect to a single-core CPU implementation of the same problem.

The article claims a speedup of 35-60x, but I see they also compared the GPU to a single-core CPU implementation. That is not a fair comparison. To be fair, they'd need to use the full capabilities of a CPU (think performance per socket, not performance per core). I think Intel makes 18-core CPUs now; with a properly implemented multi-threaded RK4 (not very difficult to write), I'd expect the speedup to be closer to 2-12x instead of 35-60x.
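To illustrate why multi-threading RK4 is "not very difficult": when you're integrating a large batch of independent states, each element steps on its own, so the loop parallelizes with a single pragma. A minimal sketch, assuming a toy right-hand side dy/dt = -y and element-wise independence (the function names and problem shape here are hypothetical, not from the article):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Hypothetical right-hand side: one independent ODE per state
    // element, dy/dt = -y (chosen only so the sketch is self-contained).
    static double f(double t, double y) {
        (void)t;
        return -y;
    }

    // One classical RK4 step over a large state vector. Each element
    // is independent, so the loop splits trivially across CPU cores.
    void rk4_step(std::vector<double>& y, double t, double h) {
        #pragma omp parallel for
        for (long i = 0; i < (long)y.size(); ++i) {
            double k1 = f(t,           y[i]);
            double k2 = f(t + 0.5 * h, y[i] + 0.5 * h * k1);
            double k3 = f(t + 0.5 * h, y[i] + 0.5 * h * k2);
            double k4 = f(t + h,       y[i] + h * k3);
            y[i] += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
        }
    }

    int main() {
        std::vector<double> y(1 << 20, 1.0);  // 1M independent states
        double t = 0.0, h = 1e-3;
        for (int step = 0; step < 1000; ++step, t += h)
            rk4_step(y, t, h);
        std::printf("y[0] = %f (exact: %f)\n", y[0], std::exp(-t));
        return 0;
    }

Build with g++ -O2 -fopenmp. Of course, on a many-core part a loop like this ends up limited by memory bandwidth rather than core count, which is exactly why I'd expect it to close much of the gap with the GPU.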
It seems like a nice and modern stack to work with. Anything on adoption in the industry?

It's a niche market, with firms very, very reluctant to switch software and very, very long development cycles. Some googling led me to Edlund, which, judging from its 2014 annual report, appears to be a mainly Danish firm [1].

[1] https://www.edlund.dk/sites/default/files/Downloads/annual-report_2014.pdf
Aon Benfield Securities has a Python + GPU approach to actuarial modeling. If you want to compare the two approaches, there's a video and a talk up on the DSLs for Finance homepage: dslfin.org

For those generally interested in financial domain-specific languages, the site also has a comprehensive listing of financial DSLs.