TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

The trouble with SPIR-V, 2022 edition

47 points by Jhsto almost 3 years ago

3 comments

my123 almost 3 years ago
> Modern CUDA uses explicit programmer-managed masks, which is powerful and takes advantage of their hardware specifics. But mis-using the mask can cause a deadlock, as divergent threads could simply never participate in a subgroup operation that expects them to, leaving the other threads to block forever. I can see why this solution leaves to be desired, as it just offloads the problem and the risk of misuse to the user.

Note that from Volta onwards (2017; in consumer GPUs since Turing in 2018), Independent Thread Scheduling is present, with a separate instruction pointer per SIMT thread.

This allows atomics across different lanes of the same warp, and so provides the guarantees assumed by the C++ memory model. Quite a few modern CUDA apps are starting to rely on that, and so will not work on Pascal or earlier, never mind other GPU vendors.

Cooperative Groups are very flexible in CUDA too.

https://docs.nvidia.com/cuda/volta-tuning-guide/index.html#sm-independent-thread-scheduling

As a result, control flow is handled very differently on post-Volta GPUs compared to pre-Volta ones, with pre-Volta being more akin to what AMD still does today.
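To make the quoted mask hazard concrete, here is a toy model in Rust (not CUDA, and not how any real GPU is implemented). It assumes a masked subgroup operation can only complete once every lane named in its mask has arrived, loosely following the `mask` argument of CUDA's `*_sync` warp intrinsics. The function names `masked_op_completes` and `missing_lanes` are made up for this sketch.

```rust
/// Toy model, not real CUDA: a masked subgroup operation completes only when
/// every lane named in `mask` has arrived at it. `arrived` has one bit per lane.
fn masked_op_completes(mask: u32, arrived: u32) -> bool {
    mask & !arrived == 0
}

/// Lanes that the mask expects but that never reached the operation.
fn missing_lanes(mask: u32, arrived: u32) -> Vec<u32> {
    let waiting_for = mask & !arrived;
    (0..32u32)
        .filter(|&lane| ((waiting_for >> lane) & 1) == 1)
        .collect()
}

fn main() {
    // The warp diverged: only even lanes took the branch containing the operation.
    let even_lanes: u32 = 0x5555_5555;

    // Misuse: the mask claims all 32 lanes participate, so the operation
    // waits on odd lanes that will never show up.
    let full_mask: u32 = 0xFFFF_FFFF;
    println!(
        "full mask completes: {}, waiting on {} lanes",
        masked_op_completes(full_mask, even_lanes),
        missing_lanes(full_mask, even_lanes).len()
    );

    // Correct use: the mask names exactly the lanes that took the branch.
    println!(
        "matching mask completes: {}",
        masked_op_completes(even_lanes, even_lanes)
    );
}
```

The point of the sketch is only that the correctness of the mask is the programmer's responsibility: nothing in the model stops a caller from naming lanes that have diverged elsewhere.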
yuri91 almost 3 years ago
The article mentions WebAssembly as having the same issue as SPIR-V with structured control flow, but in Wasm it is actually quite a bit better, because you are allowed to break/continue from an arbitrarily nested block.

This allows you to convert any reducible CFG without losing runtime performance, and you only pay a price for irreducible ones (which are somewhat rare).

Shameless plug: I wrote an article about solving the structured control flow problem in WebAssembly: https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2
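As a rough illustration of what "break/continue from an arbitrarily nested block" buys you, here is a sketch in Rust rather than Wasm. Rust's labeled blocks and labeled loops are used as a stand-in: `break 'label` plays the role of a Wasm `br` targeting a `block` (jump to its end), and `continue 'label` plays the role of a `br` targeting a `loop` (jump to its start). The functions `classify` and `rows_without_negatives` are invented for the example.

```rust
/// Forward jumps out of nested blocks, similar in shape to Wasm `block` + `br`.
fn classify(x: i32) -> &'static str {
    'outer: {
        'inner: {
            if x < 0 {
                break 'outer; // skip past two enclosing blocks at once
            }
            if x == 0 {
                break 'inner; // skip past only the innermost block
            }
            return "positive";
        }
        return "zero";
    }
    "negative"
}

/// A backward jump to an outer loop header, similar to `br` on a Wasm `loop`.
fn rows_without_negatives(grid: &[Vec<i32>]) -> usize {
    let mut count = 0;
    'rows: for row in grid {
        for &v in row {
            if v < 0 {
                continue 'rows; // abandon this row from inside the inner loop
            }
        }
        count += 1;
    }
    count
}

fn main() {
    for x in [-5, 0, 7] {
        println!("{x} is {}", classify(x));
    }
    let grid = vec![vec![1, 2], vec![3, -1], vec![]];
    println!("clean rows: {}", rows_without_negatives(&grid));
}
```

Neither function needs an extra flag variable or duplicated code to express its jumps, which is the property that lets reducible control flow be structured without a runtime cost.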
atq2119 almost 3 years ago
This is a very good introduction to the inherent difficulty that comes from trying to do SIMT execution / whole program vectorization while at the same time giving programmers the power of certain optimization tricks that punch through the SIMT abstraction and expose the underlying vector architecture (via subgroup/wave operations).

The title is somewhat misleading as this trouble isn't specific to SPIR-V. It is inherent to the field, and DXIL has the same problem. (Arguably it's worse there because Microsoft tends to be quite bad at properly specifying semantics of DXIL and DirectX more generally.)