I am a fan of the thread-per-core model, but I do not share their views on sharding data per core<p>While it increases data locality, I have seen a few pieces of software following this sharding model (notably scylla) that perform really badly once the load is not evenly distributed across all shards<p>When that happens it can be a huge waste of resources and can yield lower performance (depending on the type of load)<p>Imho, unless you are absolutely sure about the type of load, leave sharding to dividing data between servers, or have some mechanism that can shift to sharing the load between threads if the imbalance becomes too great
I'll go against the grain here and say that async/await, whether implemented with one thread per core like here or with stackless coroutines, is not the solution.<p>Async/await will make complexity explode because of the colored function problem [1].<p>The solution to expensive context switches is cheap context switches, plain and simple. User-mode lightweight threads like Go's, or Java's upcoming ones with Loom [2], have proven that this is possible.<p>Yes, it does mean that it can only happen in a language that controls its stack (so that you can slice it off and pop a continuation onto it). I sincerely believe this is Rust's ballpark; hell, they even started the project with that idea in mind.<p>[1] <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/" rel="nofollow">https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...</a><p>[2] <a href="https://jdk.java.net/loom/" rel="nofollow">https://jdk.java.net/loom/</a>
Finally, a Seastar clone for Rust! Really impressed by some of the work coming out of Datadog. Curious whether one of the popular Rust web frameworks will choose it as an I/O backend; that could be a great way to push latencies down even further.
Scipio has seen some action on HN before in "C++ vs Rust: an async Thread-per-Core story": <a href="https://news.ycombinator.com/item?id=24444347" rel="nofollow">https://news.ycombinator.com/item?id=24444347</a>
Thread-per-core has been around for a while! It's how a lot of modern multicore languages work. You've probably heard of "green" or "lightweight" threads before... it's all the same idea. Typically, you'd want to use a scheduler (probably work-stealing <a href="https://en.wikipedia.org/wiki/Work_stealing" rel="nofollow">https://en.wikipedia.org/wiki/Work_stealing</a>) under the hood to dynamically assign tasks to processors, which is much more effective at load-balancing than "sharding".<p>All of these languages/libraries use a dynamic scheduler for load-balancing:<p>* Rayon (Rust) [<a href="https://github.com/rayon-rs/rayon" rel="nofollow">https://github.com/rayon-rs/rayon</a>]<p>* Goroutines (Go) [<a href="https://golangbyexample.com/goroutines-golang/" rel="nofollow">https://golangbyexample.com/goroutines-golang/</a>]<p>* OpenMP [<a href="https://www.openmp.org/" rel="nofollow">https://www.openmp.org/</a>]<p>* Task Parallel Library (.NET) [<a href="https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/task-parallel-library-tpl" rel="nofollow">https://docs.microsoft.com/en-us/dotnet/standard/parallel-pr...</a>]<p>* Threading Building Blocks (C++) [<a href="https://software.intel.com/content/www/us/en/develop/tools/threading-building-blocks.html" rel="nofollow">https://software.intel.com/content/www/us/en/develop/tools/t...</a>]<p>* Cilk (C/C++) [<a href="http://cilk.mit.edu/" rel="nofollow">http://cilk.mit.edu/</a>]<p>* Java Fork-Join and Parallel Streams [<a href="https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html" rel="nofollow">https://docs.oracle.com/javase/tutorial/collections/streams/...</a>]<p>* ParlayLib (C++) [<a href="https://github.com/cmuparlay/parlaylib" rel="nofollow">https://github.com/cmuparlay/parlaylib</a>]
So this sounds a lot like an application of Virding's First Rule of Programming.<p><i>Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.</i><p><a href="http://rvirding.blogspot.com/2008/01/virdings-first-rule-of-programming.html" rel="nofollow">http://rvirding.blogspot.com/2008/01/virdings-first-rule-of-...</a>
This only makes sense if the local I/O is really fast; in many cases you contact external databases, where context-switch costs are negligible compared to the I/O time spent on a query.