Writing performant numerical code for CPUs requires using vector instruction sets like AVX. With these instructions it is up to the compiler or programmer to vectorize the sequential program and handle things like scatter/gather memory access or masks for control divergence. GPUs, on the other hand, offer a cleaner multithreaded programming model, with the mapping to vectorized form done at runtime in hardware. And as GPUs gain more sophisticated control flow, this mapping will become even more transparent performance-wise. Yet GPUs remain competitive with CPUs in terms of price and energy consumption.

Given this, can we expect CPU instruction sets to move away from explicit vectors and expose a GPU-like interface for parallel numerical work, with the advantage that this GPU-like unit would sit on the same chip and share the same cache with the more serial-oriented parts of the CPU?
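For concreteness, here is a minimal sketch of the explicit masking the question refers to, assuming a simple branchy loop (the function and names are illustrative, not from the post): with AVX you compute both sides of the branch for 8 floats at a time and blend per lane, whereas a GPU kernel would just write the branch as-is and let the hardware mask lanes.

```cpp
#include <immintrin.h>

// Hand-vectorized version of:
//   out[i] = (a[i] > 0) ? a[i] * 2 : a[i] + 1;
// Both branch outcomes are computed; a lane mask selects the result.
void scale_or_shift(const float *a, float *out, int n) {
    const __m256 zero = _mm256_setzero_ps();
    const __m256 two  = _mm256_set1_ps(2.0f);
    const __m256 one  = _mm256_set1_ps(1.0f);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v    = _mm256_loadu_ps(a + i);
        __m256 mask = _mm256_cmp_ps(v, zero, _CMP_GT_OQ);   // lane-wise a[i] > 0
        __m256 taken    = _mm256_mul_ps(v, two);            // "then" branch
        __m256 nottaken = _mm256_add_ps(v, one);            // "else" branch
        // No real branch: the mask picks one result per lane.
        _mm256_storeu_ps(out + i, _mm256_blendv_ps(nottaken, taken, mask));
    }
    for (; i < n; ++i)                                      // scalar tail
        out[i] = (a[i] > 0.0f) ? a[i] * 2.0f : a[i] + 1.0f;
}
```

On a GPU the same computation is an ordinary per-thread `if`; the divergence handling above is exactly what the hardware scheduler does for you at runtime.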
There's nothing special about GPU hardware that magically handles vectorization for you. A fragment shader, for instance, is a small program that runs in parallel for every pixel (fragment) being rendered. That turns out to be a great interface for doing graphics, but the hardware doesn't really care about it. It's simply up to the driver to compute where the fragment boundaries are and to set up the corresponding inputs to the shader cores.

This actually gives you less control over the hardware, in exchange for an easier programming model. The problem is that it's a domain-specific solution; if you're not rasterizing polygons then it doesn't do you much good; you'll have to break the problem up into parallelizable domains yourself, or find a library that does it for you. (That's all a graphics driver is, in the end. It's just a library that helps you access a complicated piece of hardware without having to know too many details about it.)
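A rough sketch of that "driver is just a library" point, under the simplifying assumption that a fragment shader is nothing more than a per-pixel function (all names here are hypothetical):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A "fragment shader" is conceptually just a function from pixel
// coordinates to a color.
using Shader = std::function<uint32_t(int x, int y)>;

// What the driver does, conceptually: work out which pixels the
// primitive covers, then run the shader once per covered pixel.
void rasterize_rect(int x0, int y0, int x1, int y1,
                    const Shader &shader,
                    std::vector<uint32_t> &framebuffer, int width) {
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            framebuffer[y * width + x] = shader(x, y);
}
```

A real driver performs the same decomposition, just feeding batches of these per-pixel invocations to wide SIMD units instead of a scalar loop; the hardware itself has no notion of "fragment" at all.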