Intel AVX10: The Successor to AVX-512

97 points by gautamcgoel, almost 2 years ago

11 comments

crest, almost 2 years ago
Hasn't AMD proven multiple times that a double pumped packed-SIMD implementation works well enough? Just the permute operations need a full width data path to get reasonable latencies. Intel already overplayed their hand with AVX-512 when they still had a stronger position. Let's hope they fail to hold back the field with their misguided attempts to increase their margin no matter the cost (even to their own bottom line).
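To make the double-pumping argument concrete, here is a toy Python model. It is only a sketch under stated assumptions: a 512-bit vector is modelled as 16 lanes of 32-bit integers, and the function names (`add_256`, `add_512_double_pumped`, `permute_512`, `crosses_halves`) are hypothetical, not a description of how any real AMD or Intel core is built. It shows why a lane-wise add splits cleanly into two 256-bit passes, while a permute may need to read from both halves at once.

```python
# Toy model (assumption): a 512-bit vector is 16 lanes of 32-bit integers.
LANES_512 = 16
HALF = LANES_512 // 2  # 8 lanes = 256 bits
MASK32 = 0xFFFFFFFF


def add_256(a, b):
    """One pass through a hypothetical 256-bit adder: 8 lanes, wrapping at 32 bits."""
    assert len(a) == len(b) == HALF
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def add_512_double_pumped(a, b):
    """A 512-bit add issued as two independent 256-bit passes."""
    assert len(a) == len(b) == LANES_512
    low = add_256(a[:HALF], b[:HALF])    # pass 1: lanes 0..7
    high = add_256(a[HALF:], b[HALF:])   # pass 2: lanes 8..15
    return low + high


def permute_512(src, idx):
    """Full-width permute: any output lane may read any of the 16 source lanes."""
    assert len(src) == len(idx) == LANES_512
    return [src[i % LANES_512] for i in idx]


def crosses_halves(idx):
    """True if some output lane reads a source lane from the other 256-bit half."""
    return any(
        (out < HALF) != ((i % LANES_512) < HALF)
        for out, i in enumerate(idx)
    )
```

In this model the add never looks across the 256-bit boundary, so running it in two passes costs throughput but needs no extra wiring. A lane-reversing permute does cross the boundary (`crosses_halves` returns `True` for it), which is the case where a narrower data path would need extra passes or shuffling.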
canucker2016, almost 2 years ago
From https://cdrdv2.intel.com/v1/dl/getContent/784343 ("The Converged Vector ISA: Intel Advanced Vector Extensions 10" Technical Paper PDF):

"Intel AVX10 Version 1 will be introduced for early software enablement and supports the subset of all the Intel AVX-512 instruction set available as of future Intel Xeon processors with P-cores, codenamed Granite Rapids, that is forward compatible to Intel AVX10. This version will not include the new 256-bit vector instructions supporting embedded rounding or any of the new instructions and will serve as the transition base version from Intel AVX-512 to Intel AVX10.

Intel AVX10 Version 2 will include the 256-bit instruction forms supporting embedded rounding as well as a suite of new Intel AVX10 instructions covering new AI data types and conversions, data movement optimizations, and standards support. All new instructions will be supported at 128-, 256-, and 512-bit vector lengths with limited variances. All Intel AVX10 versions will implement the new versioning enumeration scheme."

And who knows when AMD will have time to update Zen ? architecture with these new instructions.
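The "versioning enumeration scheme" mentioned in the quote can be sketched as a small decoder. The quoted text does not give a bit layout; the one used here (CPUID leaf 0x24, sub-leaf 0, EBX: bits 7:0 hold the AVX10 version, bits 16, 17 and 18 flag 128-, 256- and 512-bit support) is an assumption recalled from Intel's AVX10 documentation and should be checked against the current specification. The register values in the usage note are made up for illustration, not readings from real hardware, and `Avx10Info` and `decode_avx10_leaf` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Avx10Info:
    version: int    # AVX10 version number (1, 2, ...)
    has_128: bool   # 128-bit vector length supported
    has_256: bool   # 256-bit vector length supported
    has_512: bool   # 512-bit vector length supported

    @property
    def max_vector_bits(self) -> int:
        """Widest vector length reported, or 0 if none is flagged."""
        if self.has_512:
            return 512
        if self.has_256:
            return 256
        if self.has_128:
            return 128
        return 0


def decode_avx10_leaf(ebx: int) -> Avx10Info:
    """Decode an EBX value under the ASSUMED layout described in the lead-in."""
    return Avx10Info(
        version=ebx & 0xFF,              # bits 7:0 (assumed)
        has_128=bool((ebx >> 16) & 1),   # bit 16 (assumed)
        has_256=bool((ebx >> 17) & 1),   # bit 17 (assumed)
        has_512=bool((ebx >> 18) & 1),   # bit 18 (assumed)
    )
```

Under that assumed layout, a made-up value such as `0x00070001` would decode as version 1 with all three vector lengths, and `0x00030002` as version 2 capped at 256 bits. The point of the scheme is that software checks one version number plus a maximum vector length instead of a long list of separate AVX-512 feature flags.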
Am4TIfIsER0ppos, almost 2 years ago
Slow down dammit! I've barely started writing avx512 since they became worth it on ice lake.

> being able to work for both P and E cores

Oh yes I forgot they were gimping their own processors.

> the converged version has a maximum vector length of 256-bits [on] the E cores while P cores will have optional 512-bit vector use

Maybe they shouldn't have made xmm and ymm "extensions" to the base set to begin with.
colejohnson66, almost 2 years ago
That’s a massive extension. 32 GPRs! And they’re finally reusing an encoding made reserved in long mode (D5 - AAD in legacy modes).

Guess Intel’s feeling the pressure from Zen 4 supporting AVX-512.
codedokode, almost 2 years ago
From the name I thought that they extended registers to 1024 bits, but it looks like instead they made 512-bit width support optional.
aperture147, almost 2 years ago
As an average developer who works on high-level interfaces, I don't really see the benefit of AVX-512. I've heard that some math software (like BLAS) MAY gain some benefit from AVX instructions, but I've never used it personally. Can you guys please explain?
jauntywundrkind, almost 2 years ago
This seemed really cool. I'm used to a lot of new instructions & boosts, but Intel adding new conditional load/store is a smart interesting coupling that could help increase execution unit efficiency in a significant way.

> As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates performance of such workloads. Branch predictor improvements can mitigate this to a limited extent only as data-dependent branches are fundamentally hard to predict.

> To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions).

> Intel® APX adds conditional forms of load, store, and compare/test instructions, and it also adds an option for the compiler to suppress the status flags writes of common instructions.

https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

I didn't understand everything about the "caller-saved volatile" new general purpose register interface & legacy compatibility. But some potentially really interesting optimizations where load/store being dual register capable, and being capable of staying on the AVX unit & not having to go further out to "memory" (caches?):

> Generally, more register state will need to be managed at function boundaries. In order to reduce the associated overhead, we are adding PUSH2/POP2 instructions that transfer two register values within a single memory operation. The processor tracks these new instructions internally and fast-forwards register data between matching PUSH2 and POP2 instructions without going through memory.

Neat stuff. Very superficially reminds me of Semantic Streaming Registers on the very novel standalone-ish FPU on PULP's RISC-V based Occamy many-core chip. In that the unit is acting in a more standalone fashion. https://www.youtube.com/watch?v=kMhdq7A3d3I#t=10m https://pulp-platform.org/docs/BeniniSC11-22.pdf
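The if-conversion idea from the Intel quote above can be shown with a small Python sketch. This only models the transformation: Python itself still branches internally, the function names are hypothetical, and the sketch covers the existing CMOV/SETcc-style select, not the new APX conditional load/store forms. The first function has a data-dependent branch in the loop; the second computes the same result with a 0/-1 mask so that every iteration runs the same straight-line operations.

```python
def sum_above_branchy(values, threshold):
    """Sum the elements greater than threshold, using a data-dependent branch."""
    total = 0
    for v in values:
        if v > threshold:   # outcome depends on the data, so it is hard to predict
            total += v
    return total


def sum_above_if_converted(values, threshold):
    """Same result with the branch replaced by a mask-based select."""
    total = 0
    for v in values:
        mask = -int(v > threshold)   # -1 (all ones) if the condition holds, else 0
        total += v & mask            # adds v when selected, adds 0 otherwise
    return total
```

Both functions return the same sum for any list of integers. The second form does a little more arithmetic per element but has no branch to mispredict, which is the trade-off the quoted passage describes compilers making when they if-convert.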
dathinab, almost 2 years ago
Is there someone here who understands the nitty-gritty details of AVX-512/AVX10 and could tell me what is included that the current latest-gen AMD processors do not support?

Because the only thing I can pick out is the 256-bit AVX-512, which AFAIK recent AMD processors do support (including 512-bit support), both on their normal cores and their new compact cores.

But I don't know much about AVX_ so I'm 100% sure I missed a bunch of stuff and/or limitations with current AMD cores (besides it being double pumped).
RcouF1uZ4gsC, almost 2 years ago
It seems Intel is borrowing Microsoft’s XBox naming scheme. At least they didn’t name the successor to AVX-512 AVX-One.
shmerl, almost 2 years ago
Why did it take around 10 years for AMD to implement AVX-512 and will they need to wait as long for this too? Doesn't seem to be patent related (patents are 20 years and AVX-512 was introduced in 2013?).
phkahler, almost 2 years ago
Can we get great RISC-V cores from Apple or AMD please with that vector ISA so we can shut down this whole notion of ISA as a product differentiator?