yes! non-standard data wrangling, even if just for fun, is a great way to gain a better understanding of your workload and hardware.<p>tl;dr: [de]serialization is your bottleneck, and after that it’s general data processing. both waste an insane number of cpu cycles. network and disk, when accessed linearly, are effectively free by comparison.<p>i remember first looking into this when ec2 i3 came out, and i've only looked into it more since. lambda for burst cpu capacity when you can’t wait 30s for ec2 spot is interesting too.<p><a href="https://nathants.com/posts/performant-batch-processing-with-bsv-s4-and-presto" rel="nofollow noreferrer">https://nathants.com/posts/performant-batch-processing-with-...</a>
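<p>if you want to see the [de]serialization cost for yourself, here is a rough sketch of a micro-benchmark. it is not the method from the linked post. the row layout (`id`, `value`), the function names, and the 16-byte fixed-width binary format are all made up for illustration, and it reads from an in-memory buffer rather than a real disk or socket, so it only compares parsing cost against a plain linear scan.

```python
import io
import json
import struct
import time


def make_rows(n):
    # deterministic synthetic rows (hypothetical schema): (id, value)
    return [(i, i * 2) for i in range(n)]


def encode_json(rows):
    # one json object per line, a common text interchange format
    return "".join(json.dumps({"id": a, "value": b}) + "\n" for a, b in rows).encode()


def encode_binary(rows):
    # two little-endian unsigned 64-bit ints per row, 16 bytes each
    return b"".join(struct.pack("<QQ", a, b) for a, b in rows)


def sum_json(buf):
    # parse every line as json, then do the trivial "processing" step
    total = 0
    for line in io.BytesIO(buf):
        total += json.loads(line)["value"]
    return total


def sum_binary(buf):
    # decode fixed-width records without any text parsing
    total = 0
    for _, value in struct.iter_unpack("<QQ", buf):
        total += value
    return total


def scan_only(buf, chunk=1 << 16):
    # linear read in 64 KiB chunks with no decoding at all
    seen = 0
    stream = io.BytesIO(buf)
    while True:
        block = stream.read(chunk)
        if not block:
            break
        seen += len(block)
    return seen


def timed(fn, arg):
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


def main(n=200_000):
    rows = make_rows(n)
    json_buf = encode_json(rows)
    bin_buf = encode_binary(rows)
    cases = [
        ("scan only", scan_only, json_buf),
        ("json sum", sum_json, json_buf),
        ("binary sum", sum_binary, bin_buf),
    ]
    for name, fn, buf in cases:
        result, secs = timed(fn, buf)
        print(f"{name:12s} {secs:.4f}s result={result}")


if __name__ == "__main__":
    main()
```

numbers will vary by machine and python version, so treat the output as a way to compare the three cases on your own hardware, not as a general result.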