The megahertz-scaling "Free Lunch" was declared dead 15 years ago [http://www.gotw.ca/publications/concurrency-ddj.htm] and it's only been getting deader. People are finally, grudgingly, accepting that they must go parallel unless they want to see software performance stagnate permanently. For most people here, the issue has been obvious since before they learned to program, but they are still putting off learning how to deal with it. The first, obvious answer is threading. But in my experience, SIMD is a bigger bang for the buck, for two reasons: 1) no synchronization problems, and 2) better cache utilization. It's not just that SIMD forces you to work in large, contiguous blocks. Fun fact: when you aren't using SIMD, you are only using a fraction of your L1 cache bandwidth! (There's a scalar-vs-SSE sketch at the end of this comment that makes that concrete.)

A big challenge is that SIMD intrinsic-function APIs are weird, with inscrutable function names and sometimes difficult semantics. What helped me greatly was going through the effort of writing #define wrappers that gave each function in SSE1-3 a name that made sense to me (also sketched below). I don't expect many people to put in that effort, and unfortunately I don't have go-to recommendations for pre-existing libraries. The best I can do is:

https://github.com/VcDevel/Vc is working on being standardized into C++. It's great for processing medium-to-large arrays (sketch below).

https://ispc.github.io/ is great for writing large, complicated SIMD features.

https://github.com/microsoft/DirectXMath is not actually tied to DirectX. It has a huge library of small-vector linear algebra (3D graphics math) functions (sketch below). It used to be pretty tied to MS's compiler, but I believe they've been cleaning it up to be cross-compiler lately.
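
To see how much bandwidth scalar code leaves on the table, here's a minimal sketch (plain SSE1; the function names are mine, and it assumes n is a multiple of 4 and the pointers are 16-byte aligned):

    #include <cstddef>
    #include <xmmintrin.h>  // SSE1

    // Scalar: one 4-byte float moves per load/store.
    void scale_scalar(float* dst, const float* src, size_t n, float k) {
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i] * k;
    }

    // SSE: one 16-byte vector moves per load/store, four lanes at a time.
    void scale_sse(float* dst, const float* src, size_t n, float k) {
        const __m128 vk = _mm_set1_ps(k);  // broadcast k to all four lanes
        for (size_t i = 0; i < n; i += 4) {
            __m128 v = _mm_load_ps(src + i);           // aligned 128-bit load
            _mm_store_ps(dst + i, _mm_mul_ps(v, vk));  // multiply, aligned store
        }
    }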
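
The wrapper trick is nothing fancier than this; the F4/f4_* names are ones I invented for illustration, and you'd pick whatever reads best to you:

    #include <xmmintrin.h>  // SSE1

    // Readable aliases over SSE1 intrinsics (hypothetical names).
    #define F4             __m128
    #define f4_splat(x)    _mm_set1_ps(x)
    #define f4_load(p)     _mm_load_ps(p)
    #define f4_store(p, v) _mm_store_ps((p), (v))
    #define f4_add(a, b)   _mm_add_ps((a), (b))
    #define f4_mul(a, b)   _mm_mul_ps((a), (b))
    #define f4_sqrt(a)     _mm_sqrt_ps(a)

    // The loop body above then reads as:
    //   F4 v = f4_mul(f4_load(src + i), f4_splat(k));
    //   f4_store(dst + i, v);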
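
For scale, here's roughly what the same loop looks like with Vc (a sketch assuming Vc 1.x's float_v and its aligned load/store interface):

    #include <cstddef>
    #include <Vc/Vc>

    // Processes Vc::float_v::size() floats per iteration; assumes n is a
    // multiple of the vector width and the pointers are suitably aligned.
    void scale_vc(float* dst, const float* src, size_t n, float k) {
        using V = Vc::float_v;
        const V vk = k;  // broadcast the scalar across all lanes
        for (size_t i = 0; i < n; i += V::size()) {
            V v(src + i, Vc::Aligned);             // aligned load
            (v * vk).store(dst + i, Vc::Aligned);  // aligned store
        }
    }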
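
And a taste of DirectXMath's small-vector style (XMVectorSet, XMVector3Dot, and XMVectorGetX are the real API; the Dot3 helper is just mine):

    #include <DirectXMath.h>
    using namespace DirectX;

    // XMVector3Dot returns the dot product replicated in every lane;
    // pull out one lane to get a plain float.
    inline float Dot3(FXMVECTOR a, FXMVECTOR b) {
        return XMVectorGetX(XMVector3Dot(a, b));
    }

    // XMVECTOR a = XMVectorSet(1.0f, 2.0f, 3.0f, 0.0f);
    // XMVECTOR b = XMVectorSet(4.0f, 5.0f, 6.0f, 0.0f);
    // float d = Dot3(a, b);  // 32.0f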