I think this is a prime example of why micro-benchmarks are hard to get right and easy to get wrong.<p>In particular, there are two concerning things here:<p>Rust and Go have pretty different memory and performance characteristics: one has a GC and a heavy runtime, and the other does not! This looks like a "Go vs. Rust" comparison, not a "goroutines vs. threads" one. To be fair, you want both sides written in the same language.<p>The other point is that the examples are really trivial and don't exercise the stack. Would a thread's stack shrink back after use, or grow too fast? What about a goroutine's stack?
The benchmark does not fit the title.<p>In the benchmark as originally done, threads took 3x the memory of goroutines. That benchmark also missed the 40 MB of data hidden in the kernel, which brings a thread to over 4x the memory of a goroutine. Threads were also 50% slower.<p>The memory and performance overhead of threads may not matter in specific use cases where you're doing real work. But a memory overhead of more than 4x does make threads significantly larger than goroutines.