It's not clear <i>how much</i> of these improvements comes from reductions in cloud pricing rather than from algorithms and design decisions. They've documented things like switching to Netty-based network transport, avoiding GC pauses, and general improvements to Spark, but it would be interesting if the team went back and reran the benchmark on the same infrastructure as their 2014 benchmark, for a code-vs-code comparison that separates engineering improvements from economies of scale.
Meanwhile, Google was sorting petabytes in under a minute on their clusters 6+ years ago. We've still got a long way to go in OSS land to compete with the big boys.
A price record, not a performance one.<p>Also, seeing how expensive it is to sort 100TB ($144), you have to wonder whether it would be better to do it on your own hardware.
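To make that cloud-vs-own-hardware question concrete, here's a rough break-even sketch. Only the $144 per-sort figure comes from the record; the cluster purchase price and per-run operating cost below are made-up placeholders, so treat it as a template for the comparison rather than a real answer.

```python
# Back-of-envelope: how many 100TB sorts before owning a cluster beats
# renting the cloud? Only the $144 cloud figure comes from the record;
# the hardware numbers below are hypothetical placeholders.

CLOUD_COST_PER_SORT = 144.0   # USD per 100TB sort (from the record)
CLUSTER_CAPEX = 500_000.0     # USD to buy a comparable cluster (assumed)
OWN_COST_PER_SORT = 20.0      # USD power/ops per run on owned gear (assumed)

# Owning pays off once the capex is amortized over the per-run savings.
savings_per_sort = CLOUD_COST_PER_SORT - OWN_COST_PER_SORT
breakeven_sorts = CLUSTER_CAPEX / savings_per_sort

print(f"Break-even after ~{breakeven_sorts:.0f} sorts "
      f"(saving ${savings_per_sort:.0f} per run)")
# With these placeholder numbers: ~4000 sorts before buying wins,
# ignoring depreciation, utilization between runs, and admin overhead.
```

The point of the sketch is just that the answer hinges on how often you actually run workloads of that size, not on the headline price alone.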