
AWS S3: Sometimes you should press the $100k button

408 points by korostelevm · about 3 years ago

31 comments

lenkite · about 3 years ago
*sigh*. My team is facing all these issues. Drowning in data. Crazy S3 bill spikes. And not just S3 - Azure, GCP, Alibaba, etc., since we are a multi-cloud product.

Earlier, we couldn't even figure out lifecycle policies to expire objects, since naturally every PM had a different opinion on the data lifecycle. So it was old-fashioned cleanup jobs, scheduled and triggered when a byzantine set of conditions was met. Sometimes they were never met - cue bill spike.

Thankfully, all the new data privacy & protection regulations are a *life-saver*. Now we can blindly delete all associated data when a customer off-boards, a trial expires, or the data is no longer used for its original purpose. Just tell the intransigent PMs that we are strictly following government regulations.
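For context, the lifecycle policy itself is mechanically simple once someone actually signs off on a retention period; a minimal sketch with boto3, assuming a hypothetical bucket, prefix, and an unvetoed 90-day retention:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; expire objects 90 days after creation,
# and clean up old versions and abandoned multipart uploads as well.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-events-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-events",
                "Filter": {"Prefix": "app/events/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```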
liveoneggs · about 3 years ago
I have caused billing spikes like this before those little warnings were invented, and it was always a dark day. They are really a life saver.

Lifecycle rules are also welcome. Writing them yourself was always a pain and tended to be expensive, with list operations eating up the API-call portion of the bill.

----

Once I supported an app that dumped small objects into S3, and I begged the dev team to store the small objects in Oracle as BLOBs, to be concatenated into normal-sized S3 objects after a reasonable timeout in which no new small objects would plausibly be created. They refused (of course), and the bills for managing a bucket with millions and millions of tiny objects were just what you'd expect.

I then went for a compromise, asking if we could stitch the small objects together after a period of time so they would be eligible for things like Infrequent Access or Glacier, but, alas, "dev time is expensive you know", so the N-figure S3 bills continue as far as I know.
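The stitching compromise is not much code either. A rough sketch of a compaction pass (bucket name and key layout are made up, and a real job would need error handling plus a cutoff so it never compacts objects still being written):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-events-bucket"  # hypothetical

def compact_prefix(prefix: str, dest_key: str) -> None:
    """Concatenate every small object under `prefix` into one object,
    then delete the originals in batches of 1000 (the DeleteObjects cap)."""
    keys, chunks = [], []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
            chunks.append(s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read())
    s3.put_object(Bucket=BUCKET, Key=dest_key, Body=b"".join(chunks))
    for i in range(0, len(keys), 1000):
        s3.delete_objects(
            Bucket=BUCKET,
            Delete={"Objects": [{"Key": k} for k in keys[i : i + 1000]], "Quiet": True},
        )
```

Once stitched, the combined objects also clear the minimum-billable-size bar for the cheaper storage classes mentioned further down the thread.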
asim · about 3 years ago
The AWS horror stories never cease to amaze me. It's like we're banging our heads against the wall expecting a different outcome each time. What's more frustrating, the AWS zealots are quite happy to tell you how you're doing it wrong; it's the user's fault for misusing the service. The reality is, AWS was built for a specific purpose and demographic of user. Its complexity and scale now make it unusable for newer devs. I'd argue we need a completely new experience for the next generation.
rizkeyz · about 3 years ago
I did the back-of-the-envelope math once. You can get a petabyte of storage today for $60K/year if you buy the hardware (retail disks, server, energy). It actually fits into the corner of a room. What do you get for $60K in AWS S3? Maybe a PB for 3 months (without egress).

If you replace all your hardware every year, the cloud is 4x more expensive. If you manage to run your ghetto-cloud for 5 years, you are 20x cheaper than Amazon.

To store one TB per person on this planet in 2022 would take a mere $500M. That's short change for a slightly bigger company these days.

I guess by 2030 we should be able to record everything a human says, sees, hears and speaks over an entire life, for every human on this planet.

And by 2040 we should have machines learning all about human life, expression and intelligence, slowly making sense of all of this.
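The 4x claim is easy to sanity-check. A quick back-of-the-envelope in Python, using an approximate S3 Standard rate from around that time rather than current list prices:

```python
# Rough numbers, not current list prices.
pb_in_gb = 1_000_000          # 1 PB, decimal
s3_usd_per_gb_month = 0.021   # approximate S3 Standard rate at volume
diy_usd_per_year = 60_000     # hardware + energy, per the comment

s3_usd_per_year = pb_in_gb * s3_usd_per_gb_month * 12
print(f"S3: ${s3_usd_per_year:,.0f}/yr")                    # ~ $252,000/yr
print(f"ratio: {s3_usd_per_year / diy_usd_per_year:.1f}x")  # ~ 4.2x, matching the comment
```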
jwalton · about 3 years ago
Your website renders as a big empty blue page in Firefox unless I disable tracking protection (and in my case, since I have NoScript, I have to enable JavaScript for "website-files.com", a domain that sounds totally legit).
cj · about 3 years ago
Off topic: for people with a "million billion" objects, does the S3 console just completely freeze up for you? I have some large buckets that I'm unable to even interact with via the GUI. I've always wondered if my account is in some weird state or if performance is that bad for everyone. (This is a bucket with maybe 500 million objects, under a hundred terabytes.)
lloesche · about 3 years ago
I had a similar issue at my last job. Whenever a user created a PR on our open source project, artifacts of 1GB in size, consisting of hundreds of small files, would be created and uploaded to a bucket. There was just no process that would ever delete anything. This went on for 7 years and resulted in a multi-petabyte bucket.

I wrote some tooling to help me with the cleanup. It's available on GitHub: https://github.com/someengineering/resoto/tree/main/plugins/aws/resoto_plugin_aws/cmd/ - consisting of two scripts, s3.py and delete.py.

It's not exactly meant for end users, but if you know your way around Python/S3 it might help. I built it for a one-off purge of old data. s3.py takes an `--aws-s3-collect` arg to create the index. It lists one or more buckets and can store the result in a SQLite file. In my case the directory listing of the bucket took almost a week to complete and resulted in an 80GB SQLite file.

I also added a very simple CLI interface (calling it a virtual filesystem would be a stretch) that lets you load the SQLite file and browse the bucket contents, summarize "directory" sizes, order by last modification date, etc. It's what starts when calling s3.py without the collect arg.

Then there is delete.py, which I used to delete objects from the bucket, including all versions (our horrible bucket was versioned, which made it extra painful). On a versioned bucket it has to run twice, once to delete the file and once to delete the then-created version, if I remember correctly - it's been a year since I built this.

Maybe it's useful for someone.
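The inventory-first approach described here, listing everything into SQLite before deciding what to delete, fits in a few lines. An illustrative sketch (not the linked resoto scripts themselves; the bucket name is hypothetical):

```python
import sqlite3
import boto3

s3 = boto3.client("s3")
db = sqlite3.connect("inventory.db")
db.execute("CREATE TABLE IF NOT EXISTS objects (key TEXT, size INT, mtime TEXT)")

# Page through the bucket and record every key so it can be
# browsed, summarized, and filtered offline.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-bucket"):  # hypothetical bucket
    db.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [(o["Key"], o["Size"], o["LastModified"].isoformat())
         for o in page.get("Contents", [])],
    )
    db.commit()
```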
ebingdom · about 3 years ago
I'm confused about prefixes and sharding:

> The files are stored on a physical drive somewhere and indexed someplace else by the entire string app/events/ - called the prefix. The / character is really just a rendered delimiter. You can actually specify whatever you want to be the delimiter for list/scan APIs.

> Anyway, under the hood, these prefixes are used to shard and partition data in S3 buckets across whatever wires and metal boxes in physical data centers. This is important because prefix design impacts performance in large scale high volume read and write applications.

If the delimiter is not set at bucket creation time, but rather can be specified whenever you do a list query, how can the prefix be used to influence where objects are physically stored? Doesn't the prefix depend on what delimiter you use? How can the sharding logic know what the prefix is if it doesn't know the delimiter in advance?

For example, if I have a path like `app/events/login-123123.json`, how does S3 know the prefix is `app/events/` without knowing that I'm going to use `/` as the delimiter?
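For what it's worth, the delimiter really is only a list-time grouping knob; the same bucket can be listed with different delimiters from one request to the next. A quick boto3 illustration of that part, using the comment's example key (bucket name hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Nothing is fixed at bucket creation; the delimiter is chosen per request
# and only affects how the listing is grouped.
resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="app/events/", Delimiter="/")
for cp in resp.get("CommonPrefixes", []):
    print("group:", cp["Prefix"])    # deeper "folders", per the chosen delimiter
for obj in resp.get("Contents", []):
    print("object:", obj["Key"])     # e.g. app/events/login-123123.json
```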
zmmmmm · about 3 years ago
The rationale for using cloud is so often that it saves you from complexity. It really undermines the whole proposition when you find out that the complexity it shields you from is only skin deep, and in fact you still need a "PhD in AWS" anyway.

But as a bonus, now you face huge risks and liabilities from single button pushes, and none of the skills you learned are transferable outside of AWS, so you'll have to learn them again for gcloud, again for Azure, again for Oracle...
pontifier · about 3 years ago
DON'T PRESS THAT BUTTON.

The egress and early-deletion fees on those "cheaper options" killed a company that I had to step in and save.
Tehchops · about 3 years ago
We've got data in S3 buckets not nearly at that scale, and managing them, god forbid trying a mass delete, is absolute tedium.
pattycake23 · about 3 years ago
Here's an article about Shopify running into the S3 prefix rate limit too many times, and tackling it: https://shopify.engineering/future-proofing-our-cloud-storage-usage
wackget · about 3 years ago
As a web developer who has never used anything except locally-hosted databases, can someone explain what kind of system actually produces billions or trillions of files which each need to be individually stored in a low-latency environment?

And couldn't that data be stored in an actual database?
wodenokoto · about 3 years ago
I've never been in this situation, but I do wish you could query files with more advanced filters on these blob storage services.

- But why SageMaker?

- Why do some orgs choose to put almost everything in one bucket?
charcircuit · about 3 years ago
Can someone explain what happened in the end? From my understanding nothing happened (they deprioritized the story for fixing it) and they are still blowing through the cloud budget.
vdm · about 3 years ago
DeleteObjects takes 1000 keys per call.

Lifecycle rules can filter by min/max object size (since Nov 2021).
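Spelled out as a helper, a minimal sketch (the function name is made up; the 1000-key cap is the real constraint):

```python
import boto3

s3 = boto3.client("s3")

def delete_keys(bucket: str, keys: list[str]) -> None:
    """Delete keys in chunks of 1000, the DeleteObjects per-call maximum."""
    for i in range(0, len(keys), 1000):
        s3.delete_objects(
            Bucket=bucket,
            Delete={
                "Objects": [{"Key": k} for k in keys[i : i + 1000]],
                "Quiet": True,  # only report failures in the response
            },
        )
```

The size filters appear in lifecycle rules as `ObjectSizeGreaterThan` / `ObjectSizeLessThan` on the rule's `Filter`.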
Mave83 · about 3 years ago
Just avoid the cloud. You can get Ceph storage with the performance of Amazon S3 at the price point of Amazon S3 Glacier, deployed in any datacenter worldwide if you want. There are companies that help you do this.

Feel free to ask if you need help.
valar_m · about 3 years ago
Though it doesn't address the problem in TFA, I recommend setting up billing alerts in AWS. They wouldn't solve this issue, but the team would at least have known about it sooner.
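Setting one up is a single API call. A sketch with boto3, where the account ID, budget name, limit, and email address are all placeholders:

```python
import boto3

budgets = boto3.client("budgets")

# Hypothetical alert: email when monthly spend is forecast to exceed the cap.
budgets.create_budget(
    AccountId="123456789012",  # placeholder account id
    Budget={
        "BudgetName": "monthly-cap",
        "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "FORECASTED",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 100.0,
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ops@example.com"}],
    }],
)
```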
0x002A · about 3 years ago
Each time a developer does something on a cloud platform, the platform can start to profit for two reasons: vendor lock-in and costs that accrue over the long term regardless of the unit cost.

Anything limitless or frictionless has a higher hidden cost attached.
StratusBen · about 3 years ago
On this topic, it's always surprising to me how few people even seem to know about the different storage classes on S3, or even Intelligent-Tiering (which I know carries a cost, but lets AWS manage some of this on your behalf, which can be helpful for certain use cases and teams).

We did an analysis of S3 storage levels by profiling 25,000 random S3 buckets a while back for a comparison of Amazon S3 and R2*, and nearly 70% of storage in S3 was StandardStorage, which just seems crazy high to me.

* https://www.vantage.sh/blog/the-opportunity-for-cloudflare-r2
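Opting into Intelligent-Tiering is itself just a lifecycle transition. A minimal sketch with boto3 (bucket name hypothetical; note the per-object monitoring fee the comment mentions):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: move everything to Intelligent-Tiering immediately so
# AWS demotes cold objects on its own.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "to-intelligent-tiering",
            "Filter": {},  # empty filter = applies to all objects
            "Status": "Enabled",
            "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
        }]
    },
)
```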
kondro · about 3 years ago
The minimum billable object size in the cheaper storage classes is 128KiB.

Given the article quotes $100k to run an inventory (and $100k/month in standard storage), it's likely most of your objects are smaller than 128KiB and so probably wouldn't benefit from cheaper storage options (although it's possible this is right on the cusp of the 128KiB limit and could go either way).

Honestly, if you have a $1.2m/year storage bill in S3, this would be the time to contact your account manager and work out what could be done to improve it. You probably shouldn't be paying list price anyway if just the S3 component of your bill is $1.2m/year.
dekhn · about 3 years ago
I had to chuckle at this article because it reminded me of some of the things I've had to do to clean up data.

One time I had to write a special mapreduce with a multiple-step map that converted my (deeply nested) directory tree into roughly equally sized partitions (a serial directory listing would have taken too long, and the tree was too unbalanced to partition in one step), then a second mapreduce to map-delete all the files and reduce the errors down to a report file for later cleanup. This meant we could delete a few hundred terabytes across millions of files in 24 hours, which was a victory.
cyanic · about 3 years ago
We solved the problem of deleting old files early in our development process, as we wanted to avoid situations like this one.

While developing GitFront, we were using S3 to store individual files from git repositories as single objects. Each of our users could have multiple repositories with thousands of files, and they needed to be able to delete them.

To solve the issue, we implemented a system for storing multiple files inside a single object, plus a proxy which allows accessing individual files transparently. Deleting a whole repository is now just a single request to S3.
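A proxy like that presumably leans on ranged GETs. A minimal sketch of reading one packed file back, assuming the per-file offsets come from an index stored elsewhere (an illustrative guess, not GitFront's actual implementation; all names hypothetical):

```python
import boto3

s3 = boto3.client("s3")

def read_packed(bucket: str, pack_key: str, offset: int, length: int) -> bytes:
    """Fetch one file out of a packed object via a ranged GET.
    (offset, length) for each file is assumed to live in a separate index."""
    resp = s3.get_object(
        Bucket=bucket,
        Key=pack_key,
        Range=f"bytes={offset}-{offset + length - 1}",
    )
    return resp["Body"].read()
```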
jopsen · about 3 years ago
One of the biggest pains is that cloud services rarely mention what they don't do.

I think it's really sad, because when I don't see docs clearly stating the limits, I assume the worst and avoid the service.
gfd · about 3 years ago
Does anyone have recommendations on how to compress the data (gzip or parquet)?
zitterbewegung · about 3 years ago
I was at a presentation where HERE Technologies told us that they went from being in the top ten (or top five) S3 users (by data stored) to getting off that list. This was seen as a big deal, obviously.
solatic · about 3 years ago
TL;DR: Object stores are not databases. Don't treat them like one.
hughrr · about 3 years ago
For every $100k bill, there are a hundred of us with 14TB that costs SFA to roll with.
harshaw · about 3 years ago
AWS Budgets (among other external services) is a tool for cost containment.
gnutrino · about 3 years ago
Lol, this post hits close to home.
gtirloni · about 3 years ago
A "TL;DR" that is not.