Towards fearless SIMD, 7 years later

177 points, by raphlinus, about 1 month ago

6 comments

ashvardanian, about 1 month ago
I've said it before and I'll say it again: Rust feels like a Python developer's idea of a high-performance computing language. It's a great language for many kinds of applications, just not when you need to squeeze out every bit of performance from advanced hardware.

Even before getting into SIMD, try using Rust for concurrent, succinct, or external-memory data structures. It quickly becomes clear where the friction is.

Cargo is fantastic: clean, ergonomic, and a joy compared to many toolchains. But it's much easier to keep things simple when you don't have to support dozens of AVX-512 variants, AMX, SME, different CUDA generations, ROCm, or any of the other modern hardware capabilities.

Standardising SIMD in the standard library, in Rust or C++, has always been a questionable idea. Most of these APIs cater to operations that compilers already auto-vectorize reasonably well, and they barely touch the recent capabilities of SIMD. Just consider how hard it is to build any meaningful abstraction over the predicate/register models across AVX-512, SVE, and RVV.

RVV aside, this should illustrate the point: https://www.modular.com/blog/understanding-simd-infinite-complexity-of-trivial-problems
dzaima, about 1 month ago
Seems rustc nightly does successfully vectorize the first sigmoid example: https://rust.godbolt.org/z/e1WYexqWY

Also there's progress on making safe intrinsics safe: https://github.com/rust-lang/stdarch/pull/1714
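For readers who want to try this locally, here is a minimal sketch of the kind of scalar loop in question. It assumes the sigmoid has the form x / sqrt(1 + x²); the code behind the Godbolt link is not reproduced here, and the function name and signature are hypothetical. Whether the loop actually vectorizes depends on the compiler version, optimization level, and target features.

```rust
/// Applies x / sqrt(1 + x^2) element-wise (assumed form of the sigmoid).
/// Written as a plain scalar loop so the compiler is free to auto-vectorize it.
pub fn sigmoid(xs: &[f32], out: &mut [f32]) {
    // Equal lengths let the zipped iteration cover every element.
    assert_eq!(xs.len(), out.len());
    for (o, &x) in out.iter_mut().zip(xs) {
        *o = x / (1.0 + x * x).sqrt();
    }
}
```

Compiling with optimizations enabled and inspecting the generated assembly is the way to check whether vector instructions were emitted for a given target.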
the__alchemist, about 1 month ago
Very interesting! I posted a vector and quaternion lib here a few weeks ago, and got great feedback on the state of SIMD on these things. I have since gone on a deep dive and implemented wrapper types in a similar way to this library, using macros to keep repetition down. It is split into three main sections:

- Floating point primitives. Like this lib. Basically, copied `core::simd`'s API. Will delete this part once core::simd is stable. `f32x8`, `f64x4` types etc, with standard operator overloads, and utility methods like `splat`, `to_array` etc.
- Vec and Quaternion analogs. Same idea, similar API. Vec3x8, Quaternionx8 etc.
- Code to convert slices of floating point values, or non-SIMD vectors and quaternions, to SIMD ones, including (partial) handling of accessing valid lanes in the last chunk.

I've incorporated these `x8` types into a WIP molecular dynamics application; relatively painless after setting up the infra. Would love to try `Vec3x16` etc, but 512-bit types aren't stable yet. But from Github activity on Rust, it sounds like this is right around the corner!

Of note, as others pointed out in the thread I mentioned, the other vector etc libs use the AoS approach, where a single f32x4 value etc is used to represent a Vec3 etc, while with this SoA approach a `Vec3x8` is for performing operations on 8 Vec3s at once.

The article had interesting and surprising points on AVX-512 (needed for f32x16, Vec3x16 etc). Not sure what the implications of exposing this in a public library are, i.e. it might be a trap if the user isn't on one of the AMD Zen CPUs mentioned.

From a few examples, I seem to get a 2-4x speedup from using the x8 intrinsics over scalar (non-SIMD) operations.
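The SoA layout and the last-chunk handling described above can be sketched roughly as follows. This is an illustration in stable Rust using plain arrays, not the commenter's library: `Vec3`, `Vec3x8`, `pack`, and the zero-padding policy are all hypothetical stand-ins, and a real implementation would likely hold SIMD register types rather than `[f32; 8]`.

```rust
/// Number of lanes per SoA block (matches the `x8` naming in the comment).
const LANES: usize = 8;

/// A plain scalar 3D vector (one AoS element).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Eight `Vec3`s stored component-wise (SoA). Hypothetical stand-in type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3x8 {
    pub x: [f32; LANES],
    pub y: [f32; LANES],
    pub z: [f32; LANES],
}

impl Vec3x8 {
    /// Dot product of eight vector pairs at once; one result per lane.
    pub fn dot(self, rhs: Self) -> [f32; LANES] {
        let mut out = [0.0; LANES];
        for i in 0..LANES {
            out[i] = self.x[i] * rhs.x[i] + self.y[i] * rhs.y[i] + self.z[i] * rhs.z[i];
        }
        out
    }
}

/// Packs a slice of `Vec3` into SoA blocks. Returns the blocks together with
/// the number of valid lanes in the last block; padding lanes are zeroed.
pub fn pack(vs: &[Vec3]) -> (Vec<Vec3x8>, usize) {
    let mut blocks = Vec::with_capacity((vs.len() + LANES - 1) / LANES);
    for chunk in vs.chunks(LANES) {
        let mut b = Vec3x8 { x: [0.0; LANES], y: [0.0; LANES], z: [0.0; LANES] };
        for (i, v) in chunk.iter().enumerate() {
            b.x[i] = v.x;
            b.y[i] = v.y;
            b.z[i] = v.z;
        }
        blocks.push(b);
    }
    // A full final block has LANES valid lanes; an empty input has none.
    let valid_in_last = match vs.len() % LANES {
        0 if !vs.is_empty() => LANES,
        r => r,
    };
    (blocks, valid_in_last)
}
```

Callers that reduce across lanes would use the returned valid-lane count to ignore the padding in the final block; zero padding happens to be harmless for a dot product but is not safe for every operation (a division, for example).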
DeathArrow, about 1 month ago
Using SIMD in C#:

https://xoofx.github.io/blog/2023/07/09/10x-performance-with-simd-in-csharp-dotnet/

https://learn.microsoft.com/en-us/dotnet/standard/simd
isusmelj, about 1 month ago
I've been playing around with SIMD since uni lectures about 10 years ago. Back then I started with OpenMP, then moved to x86 intrinsics with AVX. Lately I've been exploring portable SIMD for a side project where I'm (re)writing a Numpy-like library in Rust, mostly sticking to the standard library. Portable SIMD has been super helpful so far.

I'm on an M-series MacBook now but still want to target x86 as well, and without portable SIMD that would've been a headache.

If anyone's curious, the project is here: https://github.com/IgorSusmelj/rustynum. It's just an exercise for learning Rust, but I'm having a lot of fun with it.
IshKebab, about 1 month ago
A problem for RISC-V is going to be that there's currently no way for user code to detect the presence of RVV. I have no idea how you can do multiversioning with that limitation.
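For context on what multiversioning looks like when run-time detection is available, here is a minimal sketch in Rust using x86-64 and AVX2 as the example. It does not show RISC-V detection; the function names are hypothetical, and the "AVX2 version" simply recompiles the scalar body with the feature enabled rather than using hand-written intrinsics.

```rust
/// Portable fallback; also the body reused by the feature-specific version.
#[inline(always)]
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Same body, compiled with AVX2 enabled so the optimizer may use it.
/// Calling this on a CPU without AVX2 is undefined behavior, hence `unsafe`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    sum_scalar(xs)
}

/// Picks an implementation at run time; returns the sum and the chosen path.
pub fn sum(xs: &[f32]) -> (f32, &'static str) {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return (unsafe { sum_avx2(xs) }, "avx2");
        }
    }
    (sum_scalar(xs), "scalar")
}
```

The whole scheme hinges on the detection step: without a reliable way to query the feature from user code, the dispatcher has nothing to branch on and must fall back to the portable path or to a compile-time choice.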