4 dies per package is a pretty interesting way of doing things - probably helps yields immensely, but I can't imagine it does anything good for intra-processor latency. 142 ns to ping a thread on a different CCX within a die isn't too horrible, but I really want to know what sort of penalty you'll have from going to a different die within a package.
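For anyone curious how numbers like that get produced: a crude two-thread ping-pong along the lines below is the usual approach. This is only a sketch, assuming Linux and g++; the core IDs are placeholders you'd have to map onto the actual CCX/die layout (something like lscpu -e shows the topology). Pinning the two threads to cores on different dies versus the same CCX is exactly how you'd probe that cross-die penalty.

    // pingpong.cpp -- rough core-to-core round-trip latency sketch.
    // Build: g++ -O2 -pthread pingpong.cpp -o pingpong
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <pthread.h>
    #include <sched.h>

    static void pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
    }

    int main() {
        constexpr int iters  = 1000000;
        constexpr int core_a = 0, core_b = 8;   // placeholders: pick cores on different CCXs/dies
        std::atomic<long> turn{0};              // cache line bounced between the two cores

        std::thread partner([&] {
            pin_to_core(core_b);
            for (long i = 0; i < iters; ++i) {
                while (turn.load(std::memory_order_acquire) != 2 * i + 1) { /* spin */ }
                turn.store(2 * i + 2, std::memory_order_release);
            }
        });

        pin_to_core(core_a);
        auto start = std::chrono::steady_clock::now();
        for (long i = 0; i < iters; ++i) {
            turn.store(2 * i + 1, std::memory_order_release);
            while (turn.load(std::memory_order_acquire) != 2 * i + 2) { /* spin */ }
        }
        auto stop = std::chrono::steady_clock::now();
        partner.join();

        double ns = std::chrono::duration<double, std::nano>(stop - start).count();
        std::printf("round trip: %.1f ns, one-way estimate: %.1f ns\n",
                    ns / iters, ns / iters / 2.0);
    }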
For anyone looking for info about the socket:

* Epyc uses socket SP3: https://en.wikipedia.org/wiki/Socket_SP3

* Threadripper uses socket TR4: https://en.wikipedia.org/wiki/Socket_TR4

* Sockets SP3 and TR4 have the same number of pins (4094) and the same cooler bracket mount (see https://www.overclock3d.net/news/cases_cooling/noctua_showcase_epyc_threadripper_ready_tr4_sp3_ready_cpu_coolers/1 )

* However, they are still two separate sockets, so you shouldn't expect to be able to use Epyc on TR4 or Threadripper on SP3.
I would really love to see a benchmark around running VMs and containers on something like this. Our dev/test system is all Docker containers, so that's what we would care about.

I guess it would be hard, as there are too many ways to scale out what you run: how many VMs, how many containers, what are you running in them? It would be an interesting benchmark matrix to sort through.

It would be interesting just to see how many containers you could start, each running lighttpd and serving a static web page. Maybe half with the static page and half with an application that builds the page? Who knows... too many variables.

I think we will just buy a system when we can and try our workload on it. Oh, well.
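For what it's worth, even a dumb spin-up test gets you a first data point before the full matrix exists. A rough sketch along these lines (assuming a Linux host with the docker CLI; the lighttpd image name and port range are placeholders, so substitute whatever image/config you actually run):

    // container_spinup.cpp -- crude "how many can I start and serve from" sketch.
    // Build: g++ -O2 container_spinup.cpp -o container_spinup
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <string>

    int main(int argc, char** argv) {
        const int count     = (argc > 1) ? std::atoi(argv[1]) : 50;  // containers to start
        const int base_port = 8000;

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < count; ++i) {
            // Placeholder image; maps the container's assumed port 80 to a unique host port.
            std::string cmd = "docker run -d --rm --name bench" + std::to_string(i) +
                              " -p " + std::to_string(base_port + i) +
                              ":80 rtsp/lighttpd >/dev/null";
            if (std::system(cmd.c_str()) != 0) {
                std::fprintf(stderr, "start %d failed\n", i);
                return 1;
            }
        }
        auto started = std::chrono::steady_clock::now();

        // Hit every container once (with retries) so we know each is actually serving.
        for (int i = 0; i < count; ++i) {
            std::string cmd = "curl -sf --retry 10 --retry-connrefused -o /dev/null http://localhost:" +
                              std::to_string(base_port + i) + "/";
            std::system(cmd.c_str());
        }
        auto done = std::chrono::steady_clock::now();

        using sec = std::chrono::duration<double>;
        std::printf("started %d containers in %.2fs, first request to each done after %.2fs\n",
                    count, sec(started - start).count(), sec(done - start).count());
    }

It only measures cold start plus the first request, so it says nothing about steady-state throughput, but it's the kind of thing that's easy to scale up until the box falls over.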
Baidu and Microsoft will be customers:

https://www.bloomberg.com/news/articles/2017-06-20/amd-server-chip-revival-effort-enlists-some-big-friends
Those TDPs look pretty high. What are vendors willing to put into 1U-high, 0.5U-wide style servers with two sockets these days? Last I looked, I seem to recall it was up to around 145 W.
I didn't see any info on the CPU cache architecture, which governs performance for many applications now.

Anybody have any info on things like L1/L2/L3 size, type, latencies, etc.?
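Once hardware is in hand, the sizes and sharing are at least easy to pull straight from the kernel; latencies you still have to measure with a pointer-chasing benchmark or wait for the deep-dive reviews. A minimal Linux-only sysfs dump, as a sketch:

    // cacheinfo.cpp -- print the cache hierarchy the kernel reports for cpu0.
    // Build: g++ -O2 cacheinfo.cpp -o cacheinfo
    #include <fstream>
    #include <iostream>
    #include <string>

    static std::string read_line(const std::string& path) {
        std::ifstream f(path);
        std::string s;
        std::getline(f, s);
        return s;
    }

    int main() {
        const std::string base = "/sys/devices/system/cpu/cpu0/cache/index";
        for (int i = 0; ; ++i) {
            std::string dir = base + std::to_string(i) + "/";
            std::ifstream probe(dir + "level");
            if (!probe) break;   // no more cache levels listed for this cpu
            std::cout << "L" << read_line(dir + "level")
                      << " " << read_line(dir + "type")
                      << ": " << read_line(dir + "size")
                      << ", line " << read_line(dir + "coherency_line_size") << "B"
                      << ", shared by cpus " << read_line(dir + "shared_cpu_list")
                      << "\n";
        }
    }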
While I'm excited to see AMD's offering, as a scientific-HPC user I can't help but wonder how much market share AMD will be able to gain without more information on supporting software - specifically good compilers + math libraries (cf. Intel compilers + MKL).

Strangely, I've not seen much on HN, or elsewhere, make mention of AMD's software support. Is this because it doesn't exist, or because compilers are less "sexy" than shiny new hardware?
> In this case, an EPYC 7281 in single socket mode is listed as having +63% performance (in SPECint) over a dual socket E5-2609v4 system.

So, a quad-CPU (four dies in one package) is faster than a dual-CPU? Not surprising.