Is there an example that:<p><pre><code> on CPU, Instruction A is faster than Instruction B
However,
on GPU, Instruction B is faster than Instruction A
(Instructions are assumed to be math operations)</code></pre>
There is a constant arms race between GPUs and CPUs to be faster. What is faster today on one might be slower tomorrow.<p>GPUs are very good at doing lots of floating-point math. Historically, CPUs have been better at branching, multiple-instruction issue, out-of-order execution, and integer math, and they pile on deep cache hierarchies. CPUs have SIMD units too, so they are no slouch at bulk floating-point calculation either.<p>Memory I/O is now one of the largest bottlenecks for both GPUs and CPUs, because memory bus speeds are much slower than either processor, so it will often be your dominant factor. Since most of your data comes from main RAM, the CPU lives closer to the data and relies on aggressive cache hierarchies (L1, L2, L3) to hide that latency, giving the CPU an advantage when data is not local to the GPU.<p>I don't know if it still holds true, but NxM matrix math used to be faster on CPUs for very large values of N and M: thanks to cache locality, the CPU had an easier time keeping the matrix values that needed to be reused in cache. GPUs, on the other hand, tend to be really good at 4x4 matrices, since that is what graphics primarily uses.
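To make the cache-locality point concrete, here is a minimal C sketch (my own illustration, not something from the answer above): two loop orders for the same row-major matrix product. The i-j-k order strides down a column of B in the inner loop, touching a new cache line almost every iteration for large N, while the i-k-j order reads and writes rows sequentially, so consecutive iterations hit the same cache lines.

```c
#include <string.h>

#define N 64  /* small here; the cache effect grows with N */

/* Naive i-j-k order: inner loop walks a COLUMN of b, so each
   iteration jumps N*sizeof(double) bytes through memory. */
static void matmul_ijk(const double *a, const double *b, double *c) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
}

/* i-k-j order: inner loop walks ROWS of b and c, so successive
   iterations reuse the cache lines just fetched. Same result,
   same flop count, friendlier memory access pattern. */
static void matmul_ikj(const double *a, const double *b, double *c) {
    memset(c, 0, N * N * sizeof(double));
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double aik = a[i * N + k];
            for (int j = 0; j < N; j++)
                c[i * N + j] += aik * b[k * N + j];
        }
}
```

Both variants accumulate each c[i][j] in increasing-k order, so they produce bit-identical results; only the traversal of memory differs, which is exactly the kind of thing CPU caches reward.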
I don't understand the question, as I don't know what "instruction" means in a portable way. AMD/Intel chips have instructions like LZCNT and CRC32 that don't exist as instructions in the R700-Family Instruction Set Architecture (nor in other GPUs?).<p>Even if two functions do mostly the same thing (e.g., multiply two floats), doesn't the Intel architecture have more complete support for the optional alternate exception handling of IEEE 754? If so, then they aren't really identical.<p>So, which instructions do you think are equivalent enough for your comparison?<p>Performance is driven by economics. Find where the economics of GPUs and CPUs differ, and you'll likely find where the performance inversion is.
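As an aside on the LZCNT point: in portable code you usually reach such an instruction through a compiler builtin rather than naming it directly. A small sketch, assuming GCC or Clang (where `__builtin_clz` exists); on x86 with the right flags this can compile down to a single LZCNT, while on targets without it the compiler emits a fallback sequence:

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value. __builtin_clz is
   undefined for an input of 0, so handle that case explicitly
   (the hardware LZCNT instruction itself returns 32 there). */
static int leading_zeros(uint32_t x) {
    return x ? __builtin_clz(x) : 32;
}
```

This is part of why "instruction" isn't portable: the same one-line function may be one instruction on one chip and a dozen on another.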
GPUs are parallel, which means they are better when you perform the same operation across many data elements at once. GPUs are also better optimized for floating-point arithmetic, but not always for integer arithmetic.<p>This makes sense, since GPUs are optimized for graphics workloads, which are mostly floating-point operations.
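The "same operation across many elements" pattern can be sketched in plain C (my own illustration). Every iteration below is independent, which is what makes it GPU-friendly: on a CPU this is one loop (possibly auto-vectorized with SIMD), while on a GPU each iteration would typically map to its own thread.

```c
/* SAXPY: y[i] = a * x[i] + y[i] for every i. Each element is
   computed independently of the others, so the loop body could
   just as well run as thousands of GPU threads in parallel. */
static void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```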
Disk I/O. The GPU would have to ask the CPU to do much of the work. I expect that in principle a storage controller can DMA directly into the GPU's VRAM, but I don't know that anyone actually does that. Maybe for texture maps.