> <i>Never rely on benchmark tests unless the benchmarking code is known to be open source and compiled without using any Intel tools.</i><p>A serious question: how many common benchmark packages are compiled with ICC or use Intel MKL? I hope the number is small; otherwise all the benchmarks published by mainstream PC reviewers are potentially biased. If there's a serious ICC-in-benchmark problem, then only Phoronix's Linux benchmarks are trustworthy: the majority of benchmarks on Phoronix use free and open source compilers and test suites, with known versions, build parameters and optimization levels. Thanks to Michael Larabel for his service to the community.
Money quote: "... on an AMD computer then you may set the environment variable MKL_DEBUG_CPU_TYPE=5."<p>When run on an AMD CPU, any program built with Intel's tools and linked against MKL should have this environment variable set. I don't think there is any downside to leaving it on all the time, unless you are measuring how badly Intel has tried to cripple your AMD performance.
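For anyone who wants to script this rather than export the variable by hand, here is a minimal sketch in Python. It assumes the variable has to be in the process environment before MKL is loaded, so it sets it for a child process instead of changing it after the fact. The helper names `env_with_mkl_override` and `run_with_override` are made up for this example, and because `MKL_DEBUG_CPU_TYPE` is a debugging setting, whether a given MKL version honors it is also an assumption.

```python
import os
import subprocess
import sys


def env_with_mkl_override(base_env=None, cpu_type="5"):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE set.

    The input mapping is not modified. The value "5" is the one given in
    the quoted advice; this helper itself is hypothetical.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DEBUG_CPU_TYPE"] = cpu_type
    return env


def run_with_override(cmd):
    """Run cmd (a list of arguments) with the override in its environment."""
    return subprocess.run(
        cmd,
        env=env_with_mkl_override(),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for a real MKL-linked program: the child only reports
    # the value of the variable it was started with.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('MKL_DEBUG_CPU_TYPE'))",
    ]
    print(run_with_override(child).stdout.strip())
```

In practice you would replace the stand-in child command with your own MKL-linked binary. Setting the variable on the child rather than in the current process avoids any question of whether the library was already initialized.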
I work on CFD software. We're well aware of this at work, but the reality is that all our big corporate clients use Intel hardware. We already tell people to set those environment variables in our documentation.<p>> Avoid the Intel compiler. There are other compilers with similar or better performance.<p>This is not really true IMO, but even leaving that aside, the Intel compiler has the enormous advantage of being available cross-platform. We can use it on both Linux and Windows, and it provides MPI on both. We upgrade fairly regularly, and that means less work for us.<p>My own tests found that PGI compiler performance was worse than Intel's for C++, and PGI now appears to have been discontinued on Windows anyway, with NVIDIA's new HPC compiler suite replacing it. GNU can run everywhere, but performance is around 2.5x worse on Linux for our application use case because it doesn't perform many of the optimisations that Intel does. We use MSVC on Windows just because everyone can have a license, and performance is much worse.<p>The other thing is that MKL is pretty stable and gets updated. If I use an open source BLAS/LAPACK implementation, sure, it works, and it may even give better performance! But it's not guaranteed to get updates beyond a couple of years, and plenty of implementations are also only partial. We pay Intel a lot of money for the lack of hassle, basically.
The mythology surrounding the Intel tools and libraries really ought to die. It's bizarre seeing people decide they must use MKL rather than the linear algebra libraries that AMD has been working hard to optimize for its hardware (and possibly other hardware incidentally). Similarly for compiler code generation.<p>Free BLAS implementations are pretty much on a par with MKL, at least for large-dimension level 3 in BLIS's case, even on Haswell. For small matrices MKL only became fast after libxsmm showed the way. (I don't know about libxsmm on current AMD hardware, but it's free software you can work on if necessary, like AMD has done with BLIS.) OpenBLAS and BLIS are infinitely better performing than MKL in general because they can run on CPU architectures where MKL doesn't run at all (and BLIS's plain C gets about 75% of the hand-written DGEMM kernel's performance).<p>The differences between the implementations are comparable with the noise in typical HPC jobs, even if performance were entirely dominated by, say, DGEMM (and getting close to peak floating-point intensity is atypical). On the other hand, you can see a factor of several difference in MPI performance in some cases.