Wow, it's amazing how easily fusion can be done with just a few linearly typed primitives.<p>The cool part is that if you have a for loop that uses vecbuilders, and the loop body contains vecbuilders as well, the code can still be fused quite easily. This is very hard to do with other data structures.<p>The paper is much better than the site and very easy to understand: <a href="https://cs.stanford.edu/~matei/papers/2017/cidr_weld.pdf" rel="nofollow">https://cs.stanford.edu/~matei/papers/2017/cidr_weld.pdf</a>
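To make the nested-builder point concrete, here is a rough sketch in plain Python rather than Weld's IR. Everything in it is an assumption for illustration: the function names `unfused` and `fused` are hypothetical, and ordinary lists stand in for vecbuilders. It only shows what fusion buys you: the intermediate per-row vectors disappear and the data is walked once.

```python
# Hypothetical illustration only; this is not Weld's IR or API.

def unfused(rows):
    # Pass 1: an inner builder materializes one intermediate list per row.
    squared = [[x * x for x in row] for row in rows]
    # Pass 2: an outer builder walks those intermediates again to sum them.
    return [sum(row) for row in squared]

def fused(rows):
    out = []  # stands in for the outer vecbuilder
    for row in rows:
        acc = 0  # the inner builder collapses into a scalar accumulator
        for x in row:
            acc += x * x
        out.append(acc)
    return out

rows = [[1, 2, 3], [4, 5]]
assert unfused(rows) == fused(rows) == [14, 41]
```

Both versions return the same result; the fused one never allocates the `squared` intermediate, which is the kind of rewrite the builders are meant to make mechanical.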
I didn't quite get it from an initial skim of this page.<p>But here's the key bit:<p>> [existing solutions slow] ... due to extensive data movement across the functions. Weld’s take on solving this problem is to lazily build up a computation for the entire workflow, optimizing and evaluating it only when a result is needed.<p>The "Background" [1] section of the tutorial has more info too.<p>I've done a lot of optimization work on CPU/GPU computations, and I've definitely noticed that minimizing trips across the bus and overlapping computation with bus transfers is critical to saturating the processors and memory. But would we see the orders-of-magnitude improvements shown on this page for CPU-only work? I can't quite tell what was measured in the graphs.<p>[1] <a href="https://github.com/weld-project/weld/blob/master/docs/tutorial.md#weld-background" rel="nofollow">https://github.com/weld-project/weld/blob/master/docs/tutori...</a>
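For anyone else who skimmed past it, the "lazily build up a computation" idea from the quote can be sketched in a few lines of Python. This is a toy under stated assumptions, not Weld's API: the class `LazyVec` and its `map` and `evaluate` methods are hypothetical names, and the real system compiles an optimized program where this one just interprets a list of functions.

```python
class LazyVec:
    """Hypothetical lazy vector: records element-wise ops, runs them in one pass."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)

    def map(self, fn):
        # No work happens here; the op is only appended to the recorded pipeline.
        return LazyVec(self.data, self.ops + (fn,))

    def evaluate(self):
        # A single pass over the data applies every recorded op per element,
        # so no intermediate vector is written between the map steps.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out

v = LazyVec([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
result = v.evaluate()
assert result == [20, 30, 40]
```

An eager version would write a full intermediate vector after each `map`; deferring the work until `evaluate` is what lets the steps run together without that data movement.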