A GPU followup to this article.

While sequentially consistent semantics are efficient to implement on CPUs, that seems to be much less true on GPUs. Thus, Vulkan eliminates sequential consistency entirely and provides only acquire/release semantics [1].

It is extremely difficult to reason about programs using these advanced memory semantics. For example, there is a discussion about whether a spinlock implemented in terms of acquire and release can be reordered in a way that introduces deadlock (see the reddit discussion linked from [2]). I was curious enough about this that I tried to model it in CDSChecker, but did not get definitive results (the deadlock checker in that tool is enabled for mutexes provided by the API, but not for mutexes built out of primitives). I'll also note that AcqRel semantics are not provided by the Rust version of compare_exchange_weak (perhaps a nit on TFA's assertion that Rust adopts the C++ memory model wholesale), so if acquire on locking the spinlock is not adequate, it would likely need to go all the way to SeqCst.

Thus, I find myself quite unsure whether this kind of spinlock would work on Vulkan or would be prone to deadlock. It's also possible it could be fixed by putting a release barrier before the lock loop.

We have some serious experts on HN, so hopefully someone who knows the answer can enlighten us - mixed in, of course, with all the confidently wrong assertions that inevitably pop up in discussions about memory model semantics.

[1]: https://www.khronos.org/blog/comparing-the-vulkan-spir-v-memory-model-to-cs

[2]: https://rigtorp.se/spinlock/
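For concreteness, here is a minimal sketch of the kind of acquire/release spinlock under discussion. The class name and the choice of Java's VarHandle API are mine (the code in [2] is C++); the point is only to show where the acquire and release sit: an acquire CAS to take the lock, a release store to drop it.

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    // A minimal acquire/release spinlock sketch (illustration only):
    // lock() uses an acquire CAS, unlock() a release store.
    final class SpinLock {
        private int locked; // 0 = free, 1 = held
        private static final VarHandle LOCKED;

        static {
            try {
                LOCKED = MethodHandles.lookup()
                        .findVarHandle(SpinLock.class, "locked", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        void lock() {
            // Acquire on success: everything after lock() sees the writes made
            // before the matching unlock() by the previous holder. The weak CAS
            // may fail spuriously, which the outer loop absorbs.
            while (!LOCKED.weakCompareAndSetAcquire(this, 0, 1)) {
                // Spin on loads only while the lock is held ("test and test-and-set").
                while ((int) LOCKED.getAcquire(this) != 0) {
                    Thread.onSpinWait();
                }
            }
        }

        void unlock() {
            // Release: writes made inside the critical section become visible
            // to the next thread that acquires the lock.
            LOCKED.setRelease(this, 0);
        }
    }

Whether this acquire/release pairing alone can be reordered into a deadlock on a weaker model like Vulkan's is exactly the open question above.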
The prior article in this series from ~a week ago is 'Hardware Memory Models', at https://research.swtch.com/hwmm, with some HN discussion here: https://news.ycombinator.com/item?id=27684703

Another somewhat recently posted (but years-old) page with different but related content is 'Memory Models that Underlie Programming Languages': http://canonical.org/~kragen/memory-models/

A few previous HN discussions of that one:

https://news.ycombinator.com/item?id=17099608

https://news.ycombinator.com/item?id=27455509

https://news.ycombinator.com/item?id=13293290
"Java and JavaScript have avoided introducing weak (acquire/release) synchronizing atomics, which seem tailored for x86."<p>This is not true for Java; see<p><a href="http://gee.cs.oswego.edu/dl/html/j9mm.html" rel="nofollow">http://gee.cs.oswego.edu/dl/html/j9mm.html</a><p><a href="https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/lang/invoke/VarHandle.html" rel="nofollow">https://docs.oracle.com/en/java/javase/16/docs/api/java.base...</a>
A lot of the complexity comes from the lack of expressivity in languages for relating variables (or data structure fields) semantically to each other. If there were a way to tell the compiler "these variables are always accessed in tandem", the compiler could be smart about ordering and memory fences.

The idea of extending programming languages and type systems in that direction is not new: people who use distributed computing for their computations already have to think about this, and could teach a few things to people who use shared-memory multiprocessors.

Here's an idea for ISA primitives that could help a language group variables together: bind/propagate operators on (combinations of) address ranges. https://pure.uva.nl/ws/files/1813114/109501_19.pdf
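Absent such primitives, the closest approximation today is to make the grouping explicit in the data: bundle the fields that travel together into one immutable value and publish them through a single reference. A sketch of that workaround (the names are mine, and this is orthogonal to the bind/propagate proposal in the linked paper):

    import java.util.concurrent.atomic.AtomicReference;

    // Approximating "these variables are always accessed in tandem" with
    // existing tools: bundle the related fields into an immutable snapshot
    // and publish/read the whole bundle through one atomic reference, so
    // there is a single ordering point for the entire group.
    final class Bounds {
        record Range(int lo, int hi) {}   // the fields that must be seen together

        private final AtomicReference<Range> range =
                new AtomicReference<>(new Range(0, 0));

        void update(int lo, int hi) {
            range.set(new Range(lo, hi)); // one publication of both fields
        }

        Range read() {
            return range.get();           // always a consistent (lo, hi) pair
        }
    }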
Fascinating article. I've been doing research in this area, and I wonder whether there has been any exploration of JinjaThreads, which operates on Jinja (a Java-like language) and provides a formal DRF guarantee proof (coincidentally using Isabelle/HOL).

You can read more about it here if you're interested: https://www.isa-afp.org/entries/JinjaThreads.html
I'm wondering: is the fact that a CS PhD finds resources like this as amusing as they are educational/pedagogical gold telling us something about the Academia, the Culture, or the Self?

AKA: why can't I stumble upon such stuff more often? Thanks OP!
"If thread 2 copies done into a register before thread 1 executes, it may keep using that register for the entire loop, never noticing that thread 1 later modifies done."

Alternative solution: forget all the "atomic" semantics and simply avoid "optimizing" global variables. Access to any global variable should always go directly to memory. Sure, this will be less than optimal in some cases, but such is the price of using globals; their use should be discouraged anyway.

In other words, make "atomic" the sensible and logical default for globals. Assignment is an "atomic" operation; just don't circumvent it by keeping a local copy as an "optimization".
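For what it's worth, this is roughly the behavior Java's volatile already gives you, and it shows both the hazard and the cost. A small illustration (names mine): with a plain field the flag can be hoisted into a register so the waiter spins forever, while a volatile flag is re-read from memory on every iteration - essentially the "always go to memory" default being proposed, paid for on every access.

    // With a plain boolean field the JIT may hoist the read of `done` out of
    // the loop, so the waiter can spin forever. Declaring it volatile forces
    // a fresh read from memory on every iteration.
    final class DoneFlag {
        // Remove `volatile` here and the waiter below may never terminate.
        static volatile boolean done = false;

        public static void main(String[] args) throws InterruptedException {
            Thread waiter = new Thread(() -> {
                while (!done) {          // volatile load: re-read from memory each time
                    Thread.onSpinWait();
                }
                System.out.println("saw done");
            });
            waiter.start();

            Thread.sleep(100);           // give the waiter time to start spinning
            done = true;                 // volatile write: guaranteed to become visible
            waiter.join();
        }
    }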
These "memory models" are too complex for languages intended for dilettante developers. It was a disaster in Java/C#. Not even more than a handful of programmers in existence know in depth how it works, as in, can they understand any given trivial program in their language. At best they only know some vague stuff like that locking prevents any non visibility issues. It goes far deeper than that though (which is also the fault of complex language designs like Java and C#).<p>The common programmer does not understand that you've just transformed their program - for which they were taught merely that multiple threads needs synchronization - into a new game, which has an entire separate specification, where every shared variable obeys a set of abstruse rules revolving around the happens-before relationship. Locks, mutexes, atomic variables are all one thing. Fences are a completely different thing. At least in the way most people intuit programs to work.<p>Go tries to appeal to programmers as consumers (that is, when given a choice between cleaner design and pleasing the user who just wants to "get stuff done", they choose the latter), but yet also adds in traditional complexities like this. Yes, there is performance trade off to having shared memory behave intuitively, but that's much better than bugs that 99% of your CHOSEN userbase do not know how to avoid.
Also remember that Go has lots of weird edge cases, like how sharing a slice across threads can lead to memory corruption (in the C/assembly sense, not merely within that array) despite the rest of the language being memory-safe. Multiply that by the "memory model".
In 100 years, the main languages used will still be C on the client (with a C++ compiler) and Java on the server.

Go has no VM, but it has a GC.
WASM has a VM but no GC.

Everything has been tried, and Java still kicks everything's ass to the moon on the server.

Fragmentation is bad; let's stop using bad languages and focus on the products we build instead.

"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler, https://martinfowler.com/bliki/OOPSLA2005.html

"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs, https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html
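To illustrate what the Travis Downs quote means by an atomic-free read path, here is a small sketch (mine): a reader hammering ConcurrentHashMap.get() while a writer updates the map, with no locks and no read-modify-write operations on the read side; the GC is what makes it safe to traverse nodes a concurrent writer may be swapping out.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // ConcurrentHashMap readers call get() with no locks and no read-side CAS;
    // garbage collection handles reclamation of nodes a writer replaces.
    public class ReadPathDemo {
        public static void main(String[] args) throws InterruptedException {
            Map<String, Integer> counts = new ConcurrentHashMap<>();

            Thread writer = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    counts.put("key" + (i % 100), i);           // concurrent updates
                }
            });

            Thread reader = new Thread(() -> {
                long seen = 0;
                for (int i = 0; i < 1_000_000; i++) {
                    Integer v = counts.get("key" + (i % 100));  // lock-free read path
                    if (v != null) seen += v;
                }
                System.out.println("reader finished, checksum " + seen);
            });

            writer.start();
            reader.start();
            writer.join();
            reader.join();
        }
    }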