For other people interested in high-level GPU programming, might I suggest Theano?<p>Theano (<a href="http://deeplearning.net/software/theano/" rel="nofollow">http://deeplearning.net/software/theano/</a>) is a compiler for mathematical expressions in Python that targets both the CPU and the GPU. It combines the convenience of NumPy with the speed of optimized native machine code. For gradient-based machine learning algorithms (such as training an MLP or a convolutional net), the benchmarks in the SciPy 2010 paper show Theano running <i>1.6x to 7.5x</i> faster than competing implementations (including ones written in C/C++, NumPy/SciPy, and MATLAB) when compiled for the CPU, and <i>6.5x to 44x</i> faster when compiled for the GPU. You can read more about it here: <a href="http://www.iro.umontreal.ca/~lisa/pointeurs/theano_scipy2010.pdf" rel="nofollow">http://www.iro.umontreal.ca/~lisa/pointeurs/theano_scipy2010.pdf</a>