It's hard to give meaningful advice with this little information: how much time the CPUs spend waiting on memory, how many cache misses are happening, how many of each core's execution units are busy at any given time, etc.<p>HPE has single-image machines that can take up to 16 4th-gen Xeons, which gives an upper limit of 960 cores. IBM has POWER10 boxes that go up to 240 cores, but those are POWER10 cores that can run, IIRC, up to 8 threads per core (increasing cache misses, but reducing unused execution units).
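<p>For what it's worth, the arithmetic behind those numbers, as a rough sketch. The 60 cores per socket is my assumption (the top-bin 4th-gen Xeon part); it is what makes 16 sockets come out to 960. The 8 threads per core is the IIRC figure above, not something I've re-checked.

```python
# Back-of-the-envelope core/thread counts for the two machines mentioned above.
# Assumption: 60 cores per socket (top-bin 4th-gen Xeon); 16 * 60 gives the 960 quoted.
XEON_SOCKETS = 16
XEON_CORES_PER_SOCKET = 60

# From the comment: up to 240 POWER10 cores, up to 8 threads per core (IIRC).
POWER10_CORES = 240
POWER10_THREADS_PER_CORE = 8


def total_cores(sockets: int, cores_per_socket: int) -> int:
    """Physical cores in a single-image machine."""
    return sockets * cores_per_socket


def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Hardware threads the OS would see with SMT fully enabled."""
    return cores * threads_per_core


xeon_cores = total_cores(XEON_SOCKETS, XEON_CORES_PER_SOCKET)
power10_threads = hardware_threads(POWER10_CORES, POWER10_THREADS_PER_CORE)

print(f"HPE box: {xeon_cores} cores")
print(f"POWER10 box: {POWER10_CORES} cores, {power10_threads} hardware threads")
```

Whether the extra threads actually help depends on the things listed at the top: a workload that is already stalled on memory may get little out of more threads sharing the same caches.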