Previous discussions:<p>- Ahrefs saved $400m in 3 years by not going to the cloud (2023): <a href="https://news.ycombinator.com/item?id=35094407">https://news.ycombinator.com/item?id=35094407</a> (163 comments)<p>- Ahrefs Saved US$400M in 3 Years by Not Going to the Cloud (2023): <a href="https://news.ycombinator.com/item?id=35108813">https://news.ycombinator.com/item?id=35108813</a> (44 comments)<p>---<p>Similar sentiment:<p>- X celebrates 60% savings from cloud exit: <a href="https://news.ycombinator.com/item?id=38041181">https://news.ycombinator.com/item?id=38041181</a> (18 comments)<p>- Leaving the Cloud: <a href="https://news.ycombinator.com/item?id=33301078">https://news.ycombinator.com/item?id=33301078</a> (195 comments)<p>- We stand to save $7M over five years from our cloud exit: <a href="https://news.ycombinator.com/item?id=34878140">https://news.ycombinator.com/item?id=34878140</a> (18 comments)<p>- Our cloud exit has already yielded $1M/year in savings: <a href="https://news.ycombinator.com/item?id=37530011">https://news.ycombinator.com/item?id=37530011</a> (3 comments)
They crawl all the time, so their instances can go down without a problem: there are still hundreds doing the same task. They consume waaaay too much traffic for the cloud to make sense financially.<p>A hybrid approach is best in cases like this. Use the cloud for client-facing interfaces and rent dedicated servers for the spiders.<p>edit: even better, build your own data center instead of renting.
Yes, but this is truly an exceptional case. Their workloads are basically scraping (crawling) at a massive scale. As with Google, it makes more sense to have cheap throw-away hardware for this use case.<p>There are no permission issues or ACLs.<p>There’s no need to autoscale and the traffic is very predictable.<p>There is no serious need to orchestrate deployments. I imagine it’s mostly just workers reading URLs from a queue and crawling a page, so it’s very easy to deploy new servers.<p>This is just an edge case that happens to be especially well suited to self-hosting.
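As a rough illustration of the "workers reading URLs from a queue" shape guessed at above, here is a minimal sketch. It is not Ahrefs' actual design: `fetch`, `worker`, and `crawl` are hypothetical names, and `fetch` is a stub that makes no network calls.

```python
import queue
import threading


def fetch(url):
    # Hypothetical stand-in for an HTTP fetch; returns fake page content.
    return f"<html>content of {url}</html>"


def worker(url_queue, results, lock):
    # Each worker pulls URLs until it sees the None sentinel.
    while True:
        url = url_queue.get()
        if url is None:
            url_queue.task_done()
            break
        page = fetch(url)
        with lock:
            results[url] = len(page)  # record page size as a placeholder result
        url_queue.task_done()


def crawl(urls, num_workers=4):
    url_queue = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(url_queue, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for url in urls:
        url_queue.put(url)
    for _ in threads:
        url_queue.put(None)  # one sentinel per worker
    url_queue.join()
    for t in threads:
        t.join()
    return results
```

Since the workers share nothing but the queue, adding capacity in this sketch just means starting more of them, which is the "very easy to deploy new servers" point.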
This is a weird read. The analysis makes the classic mistake of assuming a lift-and-shift calculation. Of course that's going to be more expensive. You save money by re-architecting and using more managed services.<p>Which makes me scratch my head at the concluding statement:<p>> A cloud is convenient and locked in.<p><i>Everything</i> is a lock-in. But in the case they've described, which is just shifting from VMs to EC2 instances, it is the exact same thing. There is no lock-in from their perspective; the phrase is just being used as a boogeyman.
Really misleading numbers in a lot of ways. Notably, they dismiss the risk of inflation, ignore longer-term maintenance, choose really poor cloud analogs for their architecture, and ignore cost-saving options like spot instances and pre-pay discounts.<p>All that said, yes, cloud is often more expensive for simple applications with stable 24/7 workloads that don’t evolve over time. Do the research and choose the right infrastructure platform for your business.
The new mantra with the cloud champions in my company is that cloud was never meant to save money. It’s a premium experience that’s about saving time.<p>This did not sound right, so I dug into the emails our leadership sent us three to six years ago pushing our “cloud transformation”. And yup, saving money was a part of it.<p>It’s only over the last year or so, once it became obvious we didn’t save any money and in fact spent a lot more, that it’s become about functionality and quality and not about cost.<p>The cloud may beat on-prem on functionality. For example, global colocation is much easier with the cloud. But don’t f’ing gaslight me and tell me the cloud providers hadn’t been selling cost as a benefit, and even the primary benefit, for the first 10 years or so at least.
120TB of storage costs about 3k USD per month on S3 in Singapore and can sustain a much higher aggregate read/write speed than their existing setup.<p>Like many have said, a lift and shift is never great, and assuming you need 120TB of EBS per instance and then being surprised that it costs a lot says a lot about the accuracy of such estimates.<p>Nothing was mentioned about utilisation. Like basically everything else, services follow a utilisation trend over a given time period, yet the analysis assumes 100% used capacity at all times.<p>Moving to S3 and being able to scale down to 50% capacity at non-peak hours seems to nearly equal the cost, aside from the human+time cost savings. Using spot instances would save even more.<p>Lock-in also takes many forms. If you’re locked in to an infrastructure that only supports a certain type of system with big bulky servers and big bulky disks, then you’re going to build that kind of system. You can’t take advantage of something like Lambda for specific parts of your scraping pipeline, or SQS or S3. These are useful things to have at your disposal when designing systems.
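To make the storage and utilisation arithmetic above concrete, here is a back-of-the-envelope sketch. The per-GB rate is only what the 3k USD for 120TB figure implies (assuming decimal terabytes), not a quoted price list entry, and the peak/off-peak split passed to `blended_monthly_cost` is a made-up input for illustration.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def implied_rate_per_gb(monthly_usd, capacity_tb):
    """Per-GB monthly rate implied by a total monthly bill and a capacity."""
    return monthly_usd / (capacity_tb * GB_PER_TB)


def blended_monthly_cost(full_cost_usd, peak_fraction, off_peak_scale):
    """Monthly cost when capacity runs at 100% for peak_fraction of the
    hours and is scaled down to off_peak_scale for the remaining hours."""
    return full_cost_usd * (peak_fraction + (1 - peak_fraction) * off_peak_scale)


# Rate implied by the comment's own numbers: 3000 USD for 120 TB.
rate = implied_rate_per_gb(3000, 120)

# Hypothetical: a 1000 USD/month fleet that is at peak half the time
# and scaled to 50% capacity for the other half.
blended = blended_monthly_cost(1000, peak_fraction=0.5, off_peak_scale=0.5)
```

With those hypothetical inputs the blended bill comes to three quarters of the always-on figure, which is the kind of gap a 100%-utilisation assumption hides.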
i don't buy it.<p>first of all, ahrefs discounts the "people" cost, but that's a huge cost to ignore!<p>the biggest advantage that AWS and the like confer is being able to reduce interactions with literally every piece of infra you consume from them down to APIs.<p>having physical hardware means you need a team who knows how to rack/provision/configure/update hardware *along with* administrating operating systems and everything that comes with it *along with* the automation needed to hold everything together.<p>finding people who had all of those skillsets was super challenging before The Cloud appeared, and is especially hard now since everyone who would have those skillsets prefers to work with cloudy things (because everything's an API).<p>second of all, they made the classic mistake of doing a one-to-one comparison of running their business on EC2. ofc that's going to cost a ton! you're basically just renting VMs from them at a huge premium. that can be done anywhere else (Hetzner is popular) for much cheaper.<p>that's not why you move to the cloud.<p>when AWS or Azure says they help companies save money, they usually mean taking an app that runs really well on-premise on a fixed set of compute that's a whole process to scale and making it run even better on smaller, but more distributed, compute that should be less expensive due to economies of scale.<p>Do web crawlers like these _need_ to run entirely on huge EC2 instances that run hot all of the time? Could they take advantage of more fractional compute from things like EC2 spot autoscaling groups or "serverless" compute? Ahrefs uses local NVMe storage for everything, which is definitely cheaper than EBS. Could they use data archival pipelines to compact and move less-used data onto slow networked storage?
Could they benefit from using more aggressive caching for sites that don't change very often?<p>finally, for every company like Ahrefs who runs lots of compute hot 24x7, there are at least 20,000 companies who spend big money operating datacenters for apps that don't justify the cost. they _could_ save significant amounts of money by moving to the cloud AND re-architecting their apps to spend compute more efficiently.
How much do servers cost nowadays? According to the article, the cost would be $61,500 USD per server for specifications including 2TB RAM, 100Gbps, 16x 15TB drives, and 64 core CPU (assuming "We use high core-count CPUs").<p>Is this accurate? Could you provide me with a tip on how to acquire a server with those specifications for $60,000 USD?
Not directly related, but how much energy is being wasted by thousands of companies scraping the internet continuously and storing roughly the same information as everyone else, and then storing that in their own datacenters? I understand the commercial reasons for it, but this all seems very inefficient.
if you give the heart of your business (data) to an alien company you make TWO colossal mistakes:<p>1) you transfer the core of your business to somebody else
2) you can be blackmailed on both service and cost<p>outsourcing in general is a deadly management fashion.
The article honestly reads as if written by a very smart sysadmin with zero cloud experience.<p>1:1 lift and shift is always obscenely more expensive. In this case, if the author had been in charge of the migration, then yes, the services would have cost them dearly to operate in the cloud.<p>I'm sure if I was personally put in charge of moving some aspect of IT into an unfamiliar mode of operation, my inexperience there would make my approach insanely expensive as well.<p>That says nothing about the target, except that having undertrained and inexperienced staff in charge of its design and implementation is probably foolish from a financial perspective.<p>There are obviously thousands on thousands of scenarios where moving to commodity cloud is an absolute slam dunk in aspects that are important to the subject business.<p>Unfortunately we really get no insight into what the workload truly is in the article's comparison. There's no mention of solution aspects like app architecture, security, HA/DR, SLA, RTO/RPO, security or backups [1]. We only get what is plainly a tunnel-vision view of a comparison.<p>It's almost like the author doesn't make solutions for a living.<p>Maybe the author actually realizes their blind spot, and is secretly utilizing Cunningham's law to crowd-source a relatively free solution from the professionals and amateurs in the internet comments sections.<p>The good architects don't work for free. There's a reason why Troy Hunt's web services cost him vanishingly little to operate, and it's certainly not by running IaaS VMs 24x7x365.<p>[1] I mentioned security twice as part of an ongoing effort to make up for all the times CyberSec/Infosec teams have been forgotten in the planning process. =P
> Also, we pay for IP Transit and dark fiber between the data center and our point of presence.<p>IP noob here: can someone explain what this means?
I just realised that (ignoring other weird price calculation aspects) they compare the processing power 1:1 for AWS... Does that mean they have no second site / failover? They keep referring to just one datacentre, but an "x% chance of a thermal event wiping out the company for (colocation+hardware lead time)" is not something you can sweep under the rug.
>by not going to the cloud<p>...in the worst way imaginable<p>Doing a direct lift and shift with 1:1 replacement of instances is, intentionally, prohibitively expensive, so you stop and <i>think</i>.