Advanced Performance Extensions (APX)

152 points by gautamcgoel, almost 2 years ago

19 comments

jcranmer, almost 2 years ago

High-level overview of what's changing here:

* A new REX-like prefix that extends the number of addressable GPRs to 32 from 16. This only supports instructions that have one-byte opcodes or the 0f prefix, so recent GPR instructions like ADCX or BLSR aren't supported with this format, except.
* The EVEX prefix (used for AVX-512) is also extended to be usable for GPR instructions instead of just vector instructions. This allows three-address instructions to be defined.
* The EVEX prefix for GPR also has a dedicated bit for "do you want to set flags as a result of this instruction."
* New instructions that push/pop 2 GPRs at once
* New instructions that let you conditionally set flags (basically you can do OR/AND in the hardware flags, this sounds useful for compilers).
* New instructions for predicated loads.
* New 64-bit absolute jump instruction
* Also, implementation of the predicated stuff in AVX-512, but for 256-bit vectors. With this note:

> A “converged” version of Intel AVX10 with maximum vector lengths of 256 bits and 32-bit opmask registers will be supported across all Intel processors, while 512-bit vector registers and 64-bit opmasks will continue to be supported on some P-core processors.
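The first bullet can be pictured as a register number growing from 3 to 4 to 5 bits as prefix bits are added. What follows is only a rough sketch: the function name `gpr_index` and its parameters are invented for illustration, and it does not model where the extra bits actually sit inside the new prefix.

```python
def gpr_index(modrm_bits: int, rex_bit: int = 0, ext_bit: int = 0) -> int:
    """Combine the register-number bits into one GPR index.

    modrm_bits: the 3-bit register field carried by the instruction itself.
    rex_bit:    the 4th bit contributed by a REX prefix (x86-64, 16 GPRs).
    ext_bit:    the 5th bit contributed by the new REX-like prefix (32 GPRs).
    """
    if not 0 <= modrm_bits <= 7:
        raise ValueError("modrm_bits must fit in 3 bits")
    if rex_bit not in (0, 1) or ext_bit not in (0, 1):
        raise ValueError("prefix bits must be 0 or 1")
    return (ext_bit << 4) | (rex_bit << 3) | modrm_bits


print(gpr_index(0b111))                        # highest of the 8 original registers
print(gpr_index(0b111, rex_bit=1))             # highest of the 16 x86-64 registers
print(gpr_index(0b111, rex_bit=1, ext_bit=1))  # highest of the 32 APX registers
```

Each extra prefix bit doubles the number of registers an instruction can name, which is why going from 16 to 32 needs exactly one more bit per register operand.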

pavlov, almost 2 years ago

Interesting. More registers and separate destination on instructions is about 40 years overdue, but better late than never.

I realized I’ve completely lost track of Intel’s architecture extensions reading this:

“They do not change the size and layout of the XSAVE area as they take up the space left behind by the deprecated Intel® MPX registers.”

Apparently MPX was Memory Protection Extensions and the design was so flawed, it was removed entirely soon after introduction.

dfox, almost 2 years ago

Apparently Intel's marketing had forgotten that they already had a product called iAPX (standing for “Advanced Performance arCHitecture”) and it did not go exactly well :)

soulbadguy, almost 2 years ago

> extends the number of addressable GPRs to 32 from 16

I have always been curious as to why the number of GPRs was limited for so long on x86, given that the instruction set is already variable-length and CPUs typically have a very large number of internal registers that could be cheaply addressed.

Having looked at the pain of developing a good register allocator in LLVM, and how critical memory access can be in hot/tight loops, I would have loved to have even more registers, something closer to 64 or 128, and let the CPU manage the spilling internally.

mike_hearn, almost 2 years ago

So... when will it ship? No mention of physical products anywhere.

I wonder why these long delays are still necessary. In the old days, yes, as there were so many parties to coordinate, but nowadays, in theory, Intel could release hardware, the new ISA, and compiler/OS patches and binaries on the same day.

serhack_, almost 2 years ago

> Intel® APX doubles the number of general-purpose registers (GPRs) from 16 to 32.

FullyFunctional, almost 2 years ago

> "legacy integer instructions now can also use EVEX to encode a dedicated destination register operand – turning them into three-operand instructions"

x86 ISA is growing more RISC-like. Definitely saving on stack spilling is a Good Thing™
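The three-operand point can be illustrated with a toy register machine. This is a sketch only: the op names `mov`, `add2` and `add3` and the `run` helper are made up here and do not correspond to real encodings. It computes `c = a + b` while keeping `a` and `b` intact, once with a destructive two-operand add and once with a separate destination.

```python
def run(program, regs):
    """Execute a list of (op, dst, *srcs) tuples on a dict of registers."""
    regs = dict(regs)  # work on a copy so the caller's values are untouched
    for op, dst, *srcs in program:
        if op == "mov":      # dst = src
            regs[dst] = regs[srcs[0]]
        elif op == "add2":   # two-operand add: dst = dst + src (dst is overwritten)
            regs[dst] = regs[dst] + regs[srcs[0]]
        elif op == "add3":   # three-operand add: dst = src1 + src2
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs


start = {"a": 5, "b": 7, "c": 0}

# Destructive form: copy a into c first, then add b into c.
two_operand = [("mov", "c", "a"), ("add2", "c", "b")]
# Separate destination: one instruction, no copy needed.
three_operand = [("add3", "c", "a", "b")]

assert run(two_operand, start) == run(three_operand, start)
print(len(two_operand), len(three_operand))
```

Both programs leave the same register state, but the two-operand version needs an extra copy whenever a source value must stay live.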

jamesy0ung, almost 2 years ago

I thought it was going to be related to the iAPX (Intel Advanced Performance Architecture)

https://en.wikipedia.org/wiki/IAPX

KerrAvon, almost 2 years ago

So it sounds like (among other things) they're adding 3-address integer instructions to an instruction encoding only used for vector instructions today.

I was not familiar with the AVX vector instructions at this level of detail.

https://en.wikipedia.org/wiki/EVEX_prefix

FullyFunctional, almost 2 years ago

The "wall of text" TL;DR:

- +16 registers (thus 32) and optionally separate destination (looking very RISC like now)
- PUSH2/POP2 with full forwarding
- Much expanded predication, including predicated loads and stores

This is pretty interesting. Especially the latter can make a big difference for highly unpredictable memory intensive code, like compression.
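As a rough model of what a predicated load buys, the sketch below compares a loop with a data-dependent branch against one where the load itself is conditional. The helper names are invented for illustration, and the behaviour when the predicate is false (memory is not touched at all) is a modeling assumption rather than a statement about the actual instructions.

```python
def predicated_load(memory, addr, pred, old_value):
    """Return memory[addr] if pred is true, otherwise keep old_value.

    Modeling assumption: when pred is false the memory is not accessed,
    so an out-of-range addr is harmless in that case.
    """
    return memory[addr] if pred else old_value


def branchy_sum(memory, addrs, preds):
    total = 0
    for addr, pred in zip(addrs, preds):
        if pred:  # data-dependent branch, hard to predict on random data
            total += memory[addr]
    return total


def predicated_sum(memory, addrs, preds):
    total = 0
    for addr, pred in zip(addrs, preds):
        # In the modeled instruction stream there is no branch on pred:
        # the load is conditional and contributes 0 when pred is false.
        total += predicated_load(memory, addr, pred, 0)
    return total


memory = [10, 20, 30]
addrs = [0, 99, 2]           # 99 is out of range but its predicate is false
preds = [True, False, True]
print(branchy_sum(memory, addrs, preds), predicated_sum(memory, addrs, preds))
```

Python still evaluates a condition internally, so this only models the semantics; the hardware benefit being discussed is that the unpredictable branch disappears from the instruction stream.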

ksec, almost 2 years ago

It is clear this has been at least 2 - 3 years in the making. It doesn't seem any of the 2024 products on Intel's roadmap will have APX (this could be wrong), so I assume the earliest would be 2025.

The question is when AMD will adopt it. Zen 5 is done and Zen 6 may be too late for these changes. Zen 6 is already looking at 2026. If they wait until Zen 7, it will be at least 2028.

Intel is still 35% behind Apple in terms of Perf / Clock on Geekbench.

brucethemoose2, almost 2 years ago

> Intel® APX demonstrates the advantage of the variable-length instruction encodings of x86 – new features enhancing the entire instruction set can be defined with only incremental changes to the instruction-decode hardware. This flexibility has allowed Intel® architecture to adapt and flourish over four decades of rapid advances in computing – and it enables the innovations that will keep it thriving into the future.

johnklos, almost 2 years ago

"Intel® APX demonstrates the advantage of the variable-length instruction encodings of x86 – new features enhancing the entire instruction set can be defined with only incremental changes to the instruction-decode hardware."

In other words, their initial sloppiness was a Feature™: our initial mess was so bad that changes like these don't make it any worse!

voidbert, almost 2 years ago

I don't know much about CPUs, but isn't this going to increase decoder complexity and binary size? The reduction in memory accesses seems great, but can anybody tell me if there are better ways of achieving this? x86 gets more complex each day.

dmitrygr, almost 2 years ago

So, in about 30 years, when the majority of CPUs have this, we can use it. Assuming Intel does not gate this to just Xeon for no reason whatsoever, like they did with AVX-512?

ribit, almost 2 years ago

So this basically copies ARM’s AArch64, but with a really awkward instruction encoding? Interesting move, Intel.

soulbadguy, almost 2 years ago

Couldn't find the info anywhere, but any ETA on this? Or at least, what is the first arch supporting this? Redwood Cove or something later?

user20230724, almost 2 years ago

Wow, perfect timing

muricula, almost 2 years ago

It seems like most of these new instructions and registers correspond to the original ARMv8 base ISA. I'm going to go out on a limb here and suppose that's not an accident. Does anyone know why Intel thinks x86 needs them?

Is the goal here to increase the decode bandwidth of Intel CPUs?

Is the goal to reduce demands on load-store units by increasing the number of registers?

Are they hoping to make it easier to port or JIT ARMv8 asm to Intel CPUs?
