Very interesting! I posted a vector and quaternion lib here a few weeks ago and got great feedback on the state of SIMD for this sort of thing. I've since gone on a deep dive and implemented wrapper types in a similar way to this library, using macros to keep repetition down. It's split into three main sections:<p><pre><code> - Floating-point primitives, like this lib. Basically a copy of `core::simd`'s API; I'll delete this part once core::simd is stable. `f32x8`, `f64x4` types etc., with standard operator overloads and utility methods like `splat`, `to_array`, etc.
- Vec and Quaternion analogs. Same idea, similar API. Vec3x8, Quaternionx8 etc.
 - Code to convert slices of floating-point values, or non-SIMD vectors and quaternions, to SIMD ones, including (partial) handling of accessing only the valid lanes in the last chunk.
</code></pre>
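As a rough sketch of that first layer (the names `splat` and `to_array` come from the description above; everything else here, including backing the type with a plain array and relying on autovectorization rather than intrinsics, is my own assumption, not the library's actual implementation):<p><pre><code>```rust
use std::ops::Add;

/// Hypothetical 8-lane f32 wrapper, mirroring the shape of `core::simd`'s API.
/// A real implementation would back this with intrinsics or `std::simd`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8([f32; 8]);

impl F32x8 {
    /// Broadcast one value to all 8 lanes.
    pub fn splat(v: f32) -> Self {
        Self([v; 8])
    }

    pub fn from_array(a: [f32; 8]) -> Self {
        Self(a)
    }

    pub fn to_array(self) -> [f32; 8] {
        self.0
    }
}

impl Add for F32x8 {
    type Output = Self;
    // Lane-wise add; a fixed-size loop like this typically compiles
    // down to a single vector add on AVX-capable targets.
    fn add(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = self.0[i] + rhs.0[i];
        }
        Self(out)
    }
}

fn main() {
    let a = F32x8::splat(1.0);
    let b = F32x8::from_array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]);
    println!("{:?}", (a + b).to_array());
}
```</code></pre>A macro can then stamp out `F64x4` and friends from the same body, which is presumably what the macros mentioned above are for.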
I've incorporated these `x8` types into a WIP molecular dynamics application; it was relatively painless after setting up the infrastructure. I'd love to try `Vec3x16` etc., but 512-bit types aren't stable yet. From GitHub activity on Rust, though, it sounds like that's right around the corner!<p>Of note, as others pointed out in the thread I mentioned, the other vector etc. libs use the AoS approach, where a single `f32x4` value represents one Vec3. With this SoA approach, a `Vec3x8` is for performing operations on 8 Vec3s at once.<p>The article had interesting and surprising points on AVX-512 (needed for `f32x16`, `Vec3x16`, etc.). I'm not sure what the implications of exposing this in a public library are; it might be a trap if the user isn't on one of the AMD Zen CPUs mentioned.<p>From a few examples, I see a 2-4x speedup from the x8 types over scalar (non-SIMD) operations.
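To make the SoA idea concrete (the type and field names below are my guesses for illustration, not necessarily what the library uses): a `Vec3x8` holds eight x components, eight y, eight z, so one dot-product call processes eight vectors, and packing a slice of scalar Vec3s has to track how many lanes of the final chunk are real data. A minimal sketch:<p><pre><code>```rust
/// Hypothetical scalar Vec3.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Vec3 { pub x: f32, pub y: f32, pub z: f32 }

/// SoA layout: one Vec3x8 is eight Vec3s, one array per component.
#[derive(Clone, Copy, Debug)]
pub struct Vec3x8 {
    pub x: [f32; 8],
    pub y: [f32; 8],
    pub z: [f32; 8],
}

impl Vec3x8 {
    /// Eight dot products at once; fixed-size lane loops like this
    /// are straightforward for the compiler to vectorize.
    pub fn dot(self, rhs: Self) -> [f32; 8] {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = self.x[i] * rhs.x[i]
                + self.y[i] * rhs.y[i]
                + self.z[i] * rhs.z[i];
        }
        out
    }
}

/// Pack a slice of Vec3s into chunks of 8, zero-padding the last chunk.
/// Returns the chunks plus the number of valid lanes in the final chunk,
/// so callers can ignore the padded lanes when reading results back.
pub fn pack(vecs: &[Vec3]) -> (Vec<Vec3x8>, usize) {
    let mut chunks = Vec::new();
    for chunk in vecs.chunks(8) {
        let mut c = Vec3x8 { x: [0.0; 8], y: [0.0; 8], z: [0.0; 8] };
        for (i, v) in chunk.iter().enumerate() {
            c.x[i] = v.x;
            c.y[i] = v.y;
            c.z[i] = v.z;
        }
        chunks.push(c);
    }
    let valid_last = if vecs.is_empty() { 0 } else { (vecs.len() - 1) % 8 + 1 };
    (chunks, valid_last)
}

fn main() {
    let vs: Vec<Vec3> = (0..10).map(|i| Vec3 { x: i as f32, y: 0.0, z: 0.0 }).collect();
    let (chunks, valid) = pack(&vs);
    // 10 vectors -> 2 chunks, with 2 valid lanes in the last one.
    println!("{} chunks, {} valid lanes in last", chunks.len(), valid);
}
```</code></pre>The contrast with AoS is that here each multiply touches the same component of eight different vectors, so nothing is wasted on a Vec3's "missing" fourth lane.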