I've just spent the last week investigating cloud compute options for a lab that needs to run bioinformatics / genomics algorithms.<p>First off, the pricing in the article is so disingenuous as to be outright deception.<p>Here is the Spot price for the Azure HB120rs_v2, a popular HPC size with 120 AMD EPYC cores and 456 GB of RAM: <a href="https://azureprice.net/vm/Standard_HB120rs_v2?tier=spot&currency=USD" rel="nofollow">https://azureprice.net/vm/Standard_HB120rs_v2?tier=spot&curr...</a><p>That's less than $300/month for 2.5x the compute capacity the author is referencing! His estimate is $200/month for an on-prem server with just 48 cores. Scaled down to that level, the equivalent in cloud spot pricing would be $120/month.<p>And that's assuming the on-prem server is 100% utilised and the cloud compute is not auto-scaled. If those assumptions are lifted, the cloud is <i>much</i> cheaper.<p>The cloud also makes sense in several other ways:<p>- Once the data is in cloud storage like S3 or Azure Storage Accounts, sharing it with government departments, universities, or other research institutes is trivial. Just send them a SAS URL and they can probably download it at 1 GB/s without killing the Internet link at the source.<p>- Many of these processes have 10 GB inputs that produce about 1 TB of output due to all the intermediate and temporary files. These are often kept for later analysis, but they're of low value and go cold very quickly. Tiered storage in the cloud is very easy to set up and dirt cheap compared to on-prem network-attached storage. These blobs can be moved to "Cold" storage within a few days, and then to "Archive" within a month or two at most.<p>- The algorithms improve over time, at which point it would be oh-so-nice to be able to re-run them over the old multi-petabyte data sets. But on-prem, this is an extravagance, and needs a lot of justification.
In the cloud, you can just spin up a large pool of Spot instances with a low price cap and let it chunk through the old data when it can. Unlike on-prem, this can read the old data back in <i>much</i> faster, easily at 30-100 Gbps in my tests. Good luck building a disk array that can stream 100 Gbps <i>and also</i> deliver good performance for high-priority workloads!<p>- The hardware is evolving much more rapidly than typical enterprise purchase cycles. We have a customer that is about to buy one (1) NVIDIA A100 GPU to use for bioinformatics. In a matter of months, it'll be superseded by the NVIDIA "Hopper" H100 series, which is 7x faster for the same genomics codes. In the cloud, both AWS and Azure will soon have instances with four H100 cards in them. That'll be 28 times faster than the one A100 card, making the on-prem purchase obsolete years before the warranty runs out. A couple of years later, when the successor to the H100 is available in the cloud, these guys will <i>still</i> be using the A100!<p>- The cloud provides lots of peripheral services that are a PITA to set up, secure, and manage locally. For example, EKS and AKS are managed Kubernetes clusters that can be used to efficiently bin-pack HPC compute jobs and restart jobs on Spot instances if they're deallocated. Similarly, Azure CycleCloud provides managed Slurm clusters with auto-scaling and spot pricing. For Docker workloads there are managed container registries, and both single-instance and scalable "container apps" that work quite well for one-off batch jobs, Jupyter notebooks, and the like.<p>- In the cloud, it's easy to temporarily spin up a <i>true</i> HPC cluster with 200 Gbps InfiniBand and a matching high-performance storage cache. It's like a tiny supercomputer, rented by the hour. On-prem, just buying a single InfiniBand switch will set you back more than $30K, and that's just the chassis. No cables, SFPs, or host adapters. A full setup is north of $100K.
Good luck buying "cheap" storage that can keep up with that network!<p>Etc, etc...
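To make the headline price comparison concrete, here's the per-core arithmetic as a minimal sketch. (The ~$300/month figure is the spot price from the azureprice.net link above; spot prices fluctuate by region and hour, so treat it as illustrative, not guaranteed.)

```python
# Per-core cost comparison, using the figures cited above.
# Assumption: ~$300/month spot price for HB120rs_v2 (varies by region/hour).

spot_monthly, spot_cores = 300.0, 120      # Azure HB120rs_v2 spot instance
onprem_monthly, onprem_cores = 200.0, 48   # the article's on-prem estimate

spot_per_core = spot_monthly / spot_cores        # $2.50/core/month
onprem_per_core = onprem_monthly / onprem_cores  # ~$4.17/core/month

# Renting the same 48 cores at the spot per-core rate:
cloud_equiv = spot_per_core * onprem_cores
print(f"spot:    ${spot_per_core:.2f}/core/month")
print(f"on-prem: ${onprem_per_core:.2f}/core/month")
print(f"48 cores in the cloud: ${cloud_equiv:.0f}/month")
```

And that's before applying a utilisation factor: at, say, 50% utilisation the effective on-prem cost per useful core-month doubles, while auto-scaled cloud capacity only bills for what actually runs.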