Funny this comes up again so soon after I needed it! I recently did a proof-of-concept related to bioinformatics (gene assembly, etc.), and one quirk of that space is that they work with <i>enormous</i> text files. Think tens of gigabytes as a "normal" size. Just compressing these and copying them around is a pain.<p>One trick I discovered is that a tool like pigz can both accelerate the compression step and overlap it with the copy to cloud storage, since compression and upload run in parallel on either side of the pipe! E.g.:<p><pre><code> pigz -c input.fastq | azcopy copy --from-to PipeBlob "https://myaccountname.blob.core.windows.net/inputs/input.fastq.gz?..."
</code></pre>
A similar pipeline works with s3cmd, with the same benefit of overlapping the compression and the copy (a rough sketch is at the end of this comment).<p>However, if your tools support zstd, it's more efficient to use that instead. Try "zstd -T0", or the "pzstd" tool for even higher throughput, with some minor caveats (also sketched below).<p>PS: In case anyone here is working on the above tools, I have a small request! What would be awesome is to <i>automatically</i> tune the compression level to match the available output bandwidth. With '-c' (streaming to stdout) this is easy: increase the compression level by one notch whenever the output buffer is full, and reduce it by one whenever the output buffer is empty. That would automatically tune the pipeline for maximum total throughput given the available CPU performance and network bandwidth.
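<p>For reference, the S3 variant mentioned above might look roughly like this (the bucket and key are made up, and it assumes an s3cmd version that accepts '-' to read from stdin):<p><pre><code> pigz -c input.fastq | s3cmd put - s3://mybucket/inputs/input.fastq.gz
</code></pre>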
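And a sketch of the zstd variant, reusing the azcopy destination from the example above (note the .zst extension; the SAS token is elided as before):<p><pre><code> zstd -T0 -c input.fastq | azcopy copy --from-to PipeBlob "https://myaccountname.blob.core.windows.net/inputs/input.fastq.zst?..."
</code></pre>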