Misleading headline. The 150x was just an anecdote about one particularly pessimal piece of customer code. The only serious number they actually use for comparison against a CPU comes from the FFT kernel, where they claim an 8-10x speedup (and that will be diluted by however much of the runtime is spent outside the FFT).
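To put rough numbers on that dilution, here is a minimal sketch using Amdahl's law. The FFT fractions (50%, 80%, 95% of runtime) are hypothetical values I picked for illustration, not figures from the article; only the 8-10x kernel speedup is the claimed number. The function name `overall_speedup` is likewise just mine.

```python
def overall_speedup(fraction_accelerated: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the runtime is accelerated.

    fraction_accelerated: share of the original runtime spent in the accelerated
        kernel (here, the FFT), between 0 and 1.
    kernel_speedup: speedup of that kernel alone (e.g. 8.0 or 10.0).
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The untouched part keeps its original cost; the kernel part shrinks.
    remaining_time = (1.0 - fraction_accelerated) + fraction_accelerated / kernel_speedup
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Hypothetical FFT shares of total runtime (assumptions, not from the article).
    for fraction in (0.5, 0.8, 0.95):
        # The claimed kernel speedup range.
        for kernel in (8.0, 10.0):
            total = overall_speedup(fraction, kernel)
            print(f"FFT share {fraction:.0%}, kernel {kernel:.0f}x -> overall {total:.2f}x")
```

So if, say, only half the runtime were in the FFT, even the 10x kernel figure would work out to roughly 1.8x end to end. You only approach the headline kernel number when almost everything is FFT.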