The crux here is the transfer of data into the very limited RAM of the GPU. If the data is already conveniently placed on the GPU, it is indeed thousands of times faster. But that is mostly a situation created artificially for benchmarks. Once the transfer is taken into account, CPU-based systems usually win.

Database query processing (SQL) usually needs very few compute cycles per byte of data (e.g. filters, hash table builds, ...), which means the GPU cannot really take advantage of its compute power.

For ML the situation is very different: lots of compute on few bytes, which is why we see GPUs and TPUs seriously being used there.
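To make the point concrete, here is a minimal CUDA sketch that times the host-to-device copy against a simple filter kernel over 1 GiB of int32 values. The kernel name (`filter_count`), the data size, and the predicate (`value > threshold`) are all illustrative assumptions, not anything from a real database engine; on typical PCIe-attached GPUs the copy tends to dominate for this kind of low-compute-per-byte workload.

```cuda
// Sketch: compare PCIe transfer time vs. a cheap filter kernel (count rows
// where value > threshold) over 1 GiB of int32 data. Sizes are illustrative.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void filter_count(const int *data, size_t n, int threshold,
                             unsigned long long *matches) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold)
        atomicAdd(matches, 1ULL);  // simple, not the fastest reduction
}

int main() {
    const size_t n = 256ull << 20;      // 256M int32 values = 1 GiB
    const size_t bytes = n * sizeof(int);

    int *h_data = (int *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) h_data[i] = rand();

    int *d_data; unsigned long long *d_matches;
    cudaMalloc(&d_data, bytes);
    cudaMalloc(&d_matches, sizeof(unsigned long long));
    cudaMemset(d_matches, 0, sizeof(unsigned long long));

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // the transfer
    cudaEventRecord(t1);
    filter_count<<<(unsigned)((n + 255) / 256), 256>>>(d_data, n,
                                                       RAND_MAX / 2, d_matches);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float copy_ms, kernel_ms;
    cudaEventElapsedTime(&copy_ms, t0, t1);
    cudaEventElapsedTime(&kernel_ms, t1, t2);
    printf("copy: %.1f ms, filter kernel: %.1f ms\n", copy_ms, kernel_ms);

    cudaFree(d_data); cudaFree(d_matches); free(h_data);
    return 0;
}
```

The exact ratio depends on the GPU, the bus (PCIe vs. NVLink), and whether pinned memory is used, but the shape of the result is the point: the copy is priced per byte, while a filter or hash-table probe spends almost no arithmetic on each of those bytes.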