Always great to hear about parallelization becoming more widely adopted. The only thing I don't get is the part about the GPU - it takes a *lot* of computation to tie up an 8800GT for a full minute. Also, it shouldn't take the 1-3 seconds he described to send a 45-second .wav file over a PCI Express 2.0 x16 bus (~3-4 GB/s of bandwidth, IIRC); see the back-of-envelope check below.

I'm not sure what's causing those runtimes, but the fact that it scaled across 8 cores that well suggests the workload almost qualifies as embarrassingly parallel, which a GPU really should be great for. This makes me really wonder about the maturity of Apple's / nVidia's OpenCL implementations.

EDIT: I just ran a few of the OpenCL SDK demos and can confirm that they are 1-2 orders of magnitude slower than the same demos running in CUDA. The bandwidth for copying memory to / from the device should still be high, though.
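To put rough numbers on the transfer-time point, here's a back-of-envelope sketch. It assumes CD-quality audio (16-bit, 44.1 kHz, stereo); the two bandwidth figures are just the ones quoted above and measured below, not anything authoritative:

    /* Back-of-envelope: how long should it take to push a 45 s .wav across the bus?
     * Assumes CD-quality audio (16-bit, 44.1 kHz, stereo); bandwidth figures are
     * the ~3 GB/s bus estimate above and the 1600.9 MB/s host-to-device rate below. */
    #include <stdio.h>

    int main(void) {
        double seconds   = 45.0;
        double wav_bytes = seconds * 44100.0 * 2 /* bytes/sample */ * 2 /* channels */;

        double bus_gbs   = 3.0;      /* conservative PCIe estimate from above */
        double meas_mbs  = 1600.9;   /* measured host-to-device rate from below */

        printf("wav size:             %.1f MB\n", wav_bytes / 1e6);
        printf("at %.1f GB/s (bus):   %.2f ms\n", bus_gbs, wav_bytes / (bus_gbs * 1e9) * 1e3);
        printf("at %.1f MB/s (meas): %.2f ms\n", meas_mbs, wav_bytes / (meas_mbs * 1e6) * 1e3);
        return 0;
    }

Either way it comes out to a few milliseconds, not seconds, so the copy over the bus itself shouldn't be anywhere near the 1-3 seconds described.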
My OpenCL Bandwidth Test results:

    ~/NVIDIA_GPU_Computing_SDK/OpenCL/bin/linux/release$ ./oclBandwidthTest
    ./oclBandwidthTest Starting...

    Running on...
    Device GeForce 8400M GT

    Quick Mode

    Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
      Transfer Size (Bytes)   Bandwidth(MB/s)
      33554432                1600.9

    Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
      Transfer Size (Bytes)   Bandwidth(MB/s)
      33554432                1235.1

    Device to Device Bandwidth, 1 Device(s)
      Transfer Size (Bytes)   Bandwidth(MB/s)
      33554432                6069.7

    TEST PASSED

    Press <Enter> to Quit...
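For anyone curious what that test is roughly doing, here's a minimal sketch of the same kind of host-to-device timing (pageable host memory, blocking writes). It's an illustration under those assumptions, not the SDK's actual code, and error checking is mostly omitted:

    /* Minimal host-to-device bandwidth timing sketch, similar in spirit to the
     * "Paged memory, direct access" case above. Assumes an OpenCL 1.x platform
     * with a GPU device. Build (Linux): gcc bw.c -lOpenCL -o bw */
    #ifdef __APPLE__
    #include <OpenCL/opencl.h>
    #else
    #include <CL/cl.h>
    #endif
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>

    static double now_sec(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec * 1e-6;
    }

    int main(void) {
        const size_t size = 32 * 1024 * 1024;   /* 32 MiB, same size as the SDK test */
        const int iters = 10;
        cl_int err;

        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

        void *host = malloc(size);               /* ordinary pageable memory */
        memset(host, 0, size);
        cl_mem dev = clCreateBuffer(ctx, CL_MEM_READ_WRITE, size, NULL, &err);

        /* One warm-up copy, then time `iters` blocking host-to-device writes. */
        clEnqueueWriteBuffer(q, dev, CL_TRUE, 0, size, host, 0, NULL, NULL);
        double t0 = now_sec();
        for (int i = 0; i < iters; i++)
            clEnqueueWriteBuffer(q, dev, CL_TRUE, 0, size, host, 0, NULL, NULL);
        clFinish(q);
        double elapsed = now_sec() - t0;

        double mb = (double)size * iters / (1024.0 * 1024.0);
        printf("Host to Device: %.1f MB/s\n", mb / elapsed);

        clReleaseMemObject(dev);
        clReleaseCommandQueue(q);
        clReleaseContext(ctx);
        free(host);
        return 0;
    }

Using pinned host memory (CL_MEM_ALLOC_HOST_PTR plus a mapped pointer) instead of plain malloc'd memory is what the SDK test's other modes measure, and it usually reports noticeably higher numbers than the paged case shown here.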