I am a software engineer, so I am pretty knowledgeable about computers in general, but this specific question continues to bother me. Why does a processor from 2015 at 2.5GHz run slower than a processor from 2022 at 2.5GHz? What should I look at specifically? Is the difference reported somewhere?<p>More generally: how can I tell when I need to replace my processor with a new one (without having to manually compare the new and old one...)?<p>I think looking at the GHz and number of cores is not enough anymore.
Modern CPUs can look ahead in the instruction stream and run 4 or more instructions simultaneously. This isn't easy: you have to respect data dependencies, where one instruction depends on the output of a previous one. When something depends on a load instruction that missed in the cache, the CPU can keep going farther ahead and do other work while waiting for the load to complete.<p>This "speculative out-of-order execution" requires a huge number of transistors to consider the various combinations of future instructions it might be able to execute every clock cycle, and burns extra power doing so. So although most of the basic ideas were known by the late 90s, adding more transistors in every generation lets the core do more and more in parallel.<p>Also, faster and larger caches cause fewer stalls.<p>Also, modern cores are better at predicting branches, so they can start executing instructions past a branch before knowing which way the branch will go. If the core guessed wrong about the branch, it has to undo the results of some instructions, which adds a lot of complexity to track each side effect that might need to be canceled.<p>Also, SIMD parallelism has gotten much better. Some modern cores can do 8 floating-point operations per cycle using AVX2 or Neon. While older SIMD systems had very limited instruction sets, you can do a lot with modern ones. x86 SIMD instructions can process 32 bytes at a time. With a great deal of cleverness, you can do some byte-stream operations in less than one cycle per byte. See <a href="https://arxiv.org/abs/2010.03090" rel="nofollow">https://arxiv.org/abs/2010.03090</a><p>GPUs generally do 32 parallel floating-point operations per core per cycle, with hundreds of cores.<p>Also, main memory is gradually getting slightly faster and wider.<p>Lastly, more cores are good.
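The data-dependency point can be sketched in Python (a toy illustration only: CPython's interpreter overhead swamps the hardware effect, but this is the pattern that out-of-order cores and vectorizing compilers exploit in native code):

```python
def sum_serial(xs):
    # One accumulator: every add depends on the result of the
    # previous add, forming a chain the CPU cannot reorder around.
    total = 0
    for x in xs:
        total += x
    return total

def sum_ilp(xs):
    # Four independent accumulators: in compiled code, an
    # out-of-order core can run these four adds in parallel,
    # since none of them depends on another.
    # (Assumes len(xs) is divisible by 4, for brevity.)
    a = b = c = d = 0
    it = iter(xs)
    for w, x, y, z in zip(it, it, it, it):
        a += w; b += x; c += y; d += z
    return a + b + c + d

data = list(range(1000))
```

Both functions compute the same sum; only the shape of the dependency graph differs.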
Back when the most cores you could get was 4, it was barely worth writing parallel software, because all the locking slowed things down almost as much as the 4 cores sped things up. But high-end Xeons can have 40+ cores, which makes it worth the hassle of writing parallel code. And GPUs have 1000s of cores, so it's worth a lot of complication to make use of them.
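That tradeoff fits a toy model: Amdahl's law plus a flat synchronization tax (the 0.95 parallel fraction and 0.05 lock overhead below are made-up numbers purely for illustration):

```python
def speedup(cores, parallel_frac=0.95, sync_overhead=0.05):
    # Amdahl's law: the serial fraction doesn't shrink as cores
    # are added, and locking adds a flat tax on the whole run.
    serial_frac = 1.0 - parallel_frac
    return 1.0 / (serial_frac + parallel_frac / cores + sync_overhead)

# With these invented constants, 4 cores buys only ~3x
# (barely worth the locking hassle), while 40 cores buys ~8x.
```

The exact numbers are fiction, but the shape is real: the fixed costs of parallelism are easier to justify when the core count is large.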
In a word, IPC: instructions per clock cycle. Even in a simple, non-pipelined, strictly in-order processor, some operations take more cycles to complete than others: division almost always takes much longer than addition, although some processors simply don't divide or multiply, so everything can run at equal speed.<p>Beyond that, caches and memory interfaces are important too; instructions don't run without data. While early computers often had synchronous RAM as fast as the CPU clock, that's not possible anymore: data that's not in registers takes time to load and store. Superscalar designs, out-of-order execution, etc. mask some of that, but not all of it.
>I think looking at the GHz and number of cores is not enough anymore.<p>It never was. Remember the whole era of Pentium 4 and Centrino, or PowerPC vs x86.<p>>I am a software engineer so I am pretty knowledgeable in the topic of computers in general, but this specific question continues to bother me.<p>I think this pretty much sums up the modern-day software engineer in the industry: a lack of knowledge of hardware. Everything is so abstracted that most people think of it as someone else's problem. Until now, when Moore's law is finally dead.
Actual performance is based on two things: clock speed, and how much useful work the CPU design can do per clock cycle. Improvements in the former have mostly stopped, so now most of the improvements happen in various aspects of the latter.<p>The best comparison is, and has always been, benchmark results.
Like other comments said, there are a lot of factors that go into determining how much work gets done per clock cycle. One of the more interesting angles here is the performance of the instructions themselves - Agner Fog has a great document[0] comparing the performance of common x86 instructions across multiple CPU generations. It makes it really easy to see how great the early Ryzen chips were despite their low clock speeds.<p>[0] <a href="https://www.agner.org/optimize/instruction_tables.pdf" rel="nofollow">https://www.agner.org/optimize/instruction_tables.pdf</a>
Because a CPU is more complex than its clock alone, you also have to look at the other parameters, like the RAM clock (memory is handled by a controller on the CPU these days), the instruction set, the various connection buses, etc… obviously assuming we are comparing the same x86 architecture, and not ARM vs x86 or some other RISC vs CISC matchup.<p>Using only the clock is like comparing two cars using only horsepower, without the weight, aerodynamics, frame, etc…
Newer CPUs have more transistors, meaning they can do more things, and the more things you can do at once, the faster you can be. "More things" might mean more complex instructions, more cache memory, or more things attempted at one time.<p>Now, when to replace... I suppose that depends entirely on when you feel it is too slow. At that point, evaluate whether a newer CPU would help.
Mostly because of instruction-level parallelism. While the number of cycles per second has not increased much since 2015, the number of instructions per cycle has more than doubled due to more efficient architectures and duplicated circuits. So a core from 2015 could perhaps run 1.5 instructions per cycle, but one from 2022 could run 8 instructions per cycle and thus would be more than four times faster (at least on some tasks).<p>You cannot tell when you need to replace your old processor with a new one without manually comparing them. Performance is much too workload-specific these days.
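Using the illustrative IPC figures above, the back-of-the-envelope arithmetic works out like this:

```python
freq_ghz = 2.5              # same clock on both chips
ipc_2015 = 1.5              # instructions per cycle, 2015-era core
ipc_2022 = 8.0              # 2022-era core (illustrative figure)

throughput_2015 = freq_ghz * ipc_2015   # 3.75 billion instructions/s
throughput_2022 = freq_ghz * ipc_2022   # 20.0 billion instructions/s
speedup = throughput_2022 / throughput_2015
print(round(speedup, 2))    # 5.33x despite identical GHz
```

Clock × IPC is the headline number; real workloads land somewhere below it depending on how much parallelism they actually expose.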
The keyword to research here is ILP (Instruction Level Parallelism) - <a href="https://en.wikipedia.org/wiki/Instruction-level_parallelism" rel="nofollow">https://en.wikipedia.org/wiki/Instruction-level_parallelism</a><p>There are a whole bunch of factors involved (as pointed out in the above link and by others in this thread) but the basic idea is to parallelize instructions using both Processor Micro-architectural and Compiler techniques i.e. how to get more done in a single clock cycle aka IPC (Instructions Per Clock cycle).
Besides what others have mentioned, cache sizes and speeds help a lot, as do I/O speeds in general (mostly in that some machines from 2015 might still have an HDD, not too many these days).
Look up a computer architecture book or course materials that discuss the evolution of CPU designs and how we got from multiple clock cycles per instruction to several instructions per cycle and several data items per instruction: pipelining, superscalar, vector processors, out-of-order execution, VLIW, simultaneous multithreading, SIMD, symmetric multiprocessing, branch prediction, memory caching, etc.
throughput<p>newer CPUs have higher levels of parallelism, therefore having higher throughput, even at the same frequency<p>the parallelism can be achieved via vector instructions, out of order execution, along with other changes, like better or more caching<p>system performance as a whole doesn't just depend on the CPU though, a beefy CPU with shitty RAM or an HDD might be worse than a mid CPU with high-speed RAM and an SSD (even a SATA one)
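The vector-instruction point can be given a rough feel in Python: a SIMD compare processes a whole block of bytes per instruction instead of one byte per loop iteration (here a `bytes.count` over a 32-byte slice stands in, loosely, for an AVX2 compare-and-popcount; this is an analogy, not how the hardware is programmed):

```python
NEWLINE = 0x0A

def count_newlines_scalar(data: bytes) -> int:
    # One byte per iteration -- the shape of a naive scalar loop.
    n = 0
    for b in data:
        if b == NEWLINE:
            n += 1
    return n

def count_newlines_chunked(data: bytes, width: int = 32) -> int:
    # 32 bytes per step -- the shape of a SIMD inner loop.
    n = 0
    for i in range(0, len(data), width):
        n += data[i:i + width].count(NEWLINE)
    return n
```

Both return the same count; the second just does far fewer "iterations" per byte of input, which is the whole trick behind the sub-cycle-per-byte parsers mentioned upthread.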
Clock frequency and cores were never really enough. When I benchmarked hardware for <ABigBank> years ago, the OS made quite a bit of difference as well. Eg WinNT was maybe ~2x slower for I/O for a given MHz of CPU than SunOS or Linux, IIRC.
All of these have been around for a long time, but just like well-maintained software, 7 years worth of incremental improvements add up:<p><a href="https://en.wikipedia.org/wiki/Superscalar_processor" rel="nofollow">https://en.wikipedia.org/wiki/Superscalar_processor</a><p><a href="https://en.wikipedia.org/wiki/Pipeline_(computing)" rel="nofollow">https://en.wikipedia.org/wiki/Pipeline_(computing)</a><p><a href="https://en.wikipedia.org/wiki/Single_instruction,_multiple_data" rel="nofollow">https://en.wikipedia.org/wiki/Single_instruction,_multiple_d...</a><p><a href="https://en.wikipedia.org/wiki/Branch_predictor" rel="nofollow">https://en.wikipedia.org/wiki/Branch_predictor</a><p><a href="https://en.wikipedia.org/wiki/Speculative_execution" rel="nofollow">https://en.wikipedia.org/wiki/Speculative_execution</a><p><a href="https://en.wikipedia.org/wiki/Multi-core_processor" rel="nofollow">https://en.wikipedia.org/wiki/Multi-core_processor</a><p>Since RAM (slow) and/or cache access is involved in nearly every step--which becomes increasingly complicated when trying to preserve cache coherency across multiple cores--improvements in the next two are <i>a big deal</i>:<p><a href="https://en.wikipedia.org/wiki/Memory_management_unit" rel="nofollow">https://en.wikipedia.org/wiki/Memory_management_unit</a><p><a href="https://en.wikipedia.org/wiki/Cache_hierarchy" rel="nofollow">https://en.wikipedia.org/wiki/Cache_hierarchy</a><p>Executive summary: Work smarter, not harder, & the ultimate measure of performance is performance (<i>which may sound stupid, but it's in the textbook, because it's true!</i>).<p>I would also add that more cores aren't necessarily better. The utility depends upon the nature of the task & how memory-hungry it is. If the task is inescapably sequential, it doesn't really matter how many cores are on the die.
Same story with parallelizable tasks that pound RAM: at any given time, one core is hogging the memory bus, and the rest are waiting their turn. They may take turns, but at any point in time, it's essentially single-core performance.<p>The place where multi-core really shines is when you have a highly parallelizable task, where each thread grinds really hard over a smallish data set that fits comfortably in the core's cache. In that case, you can definitely max out all cores. Though from what I see in the wild, that is a rare case.<p>A lot of the industry is really just gaming benchmarks at this point, which are, for the most part, bullshit. I think Apple will remain in very good shape on this front, if only because of their customers (i.e. normies instead of gamers; people who haven't fallen completely into the quantity cult). They will complain when it stops feeling fast (<i>the only measure...</i>), instead of taking it as a challenge & wasting their lives on overclocked water-cooling bullshit.
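The memory-bus bottleneck can be captured in a crude model (every number below is invented for illustration): cores scale linearly only until their combined bandwidth demand saturates the shared bus.

```python
def effective_speedup(cores, gbps_per_core, bus_gbps):
    # Ideal linear scaling, capped by how many cores' worth of
    # traffic the shared memory bus can actually carry.
    return min(cores, bus_gbps / gbps_per_core)

# A cache-friendly workload (low bandwidth demand per core) scales
# to all 16 cores; a RAM-pounding one stalls at the bus limit.
cache_friendly = effective_speedup(16, gbps_per_core=1, bus_gbps=50)
ram_pounding = effective_speedup(16, gbps_per_core=10, bus_gbps=50)
```

A real memory hierarchy is far messier (shared caches, prefetchers, multiple channels), but the cap is the point: past saturation, extra cores just wait their turn.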