You can use lifecycle policies to delete it for free, but it's best to confirm that with support. I'm not saying this is a great solution (maybe it's intentionally hidden), but at least there is a way.<p><a href="https://stackoverflow.com/questions/59170391/s3-lifecycle-expiration-do-object-expiry-deletes-cost-money-for-sia-objects" rel="nofollow">https://stackoverflow.com/questions/59170391/s3-lifecycle-ex...</a>
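For reference, the lifecycle configuration for expiring everything in a bucket is short. This is a sketch: the rule ID and the one-day windows are my choices, and if the bucket is versioned you also need the noncurrent-version rule or the "deleted" objects stick around as noncurrent versions.

```json
{
  "Rules": [
    {
      "ID": "expire-everything",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 1 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 1 }
    }
  ]
}
```

Apply it with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://expire.json` and wait a day or so; as the linked question discusses, expiration itself appears to be free, but confirm with support before relying on that.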
Not only is it a problem that deleting a bucket costs money; if you have a big bucket with many deeply nested files, it can also take a really long time to clean up using the AWS command line.<p>I ran into this with a bucket full of EMR log files a few years ago and had to resort to some pretty crazy command-line hackery, plus an EC2 machine with lots of cores, to get through it. This is a write-up I did in case anyone else runs into this issue.<p><a href="https://gist.github.com/michael-erasmus/6a5acddcb56548874ffe780e19b7701d" rel="nofollow">https://gist.github.com/michael-erasmus/6a5acddcb56548874ffe...</a>
Per-object costs can be tricky with S3 -- it's easy to mentally round costs less than 1/10th of a penny to zero, and then look up a few years later and realize you have hundreds of millions of things and can't afford to do anything with them.<p>When this bit us on a project I made a tool to solve our particular problem, which tars files, writes csv indexes, and can fetch individual files from the tars if need be.[1] Running on millions of files was janky enough that I also ended up scripting an orchestrator to repeatedly attempt each step of the pipeline.[2] Not tested on data other than ours but could be a useful starting point.<p>[1] <a href="https://github.com/harvard-lil/s3mothball" rel="nofollow">https://github.com/harvard-lil/s3mothball</a>
[2] <a href="https://github.com/harvard-lil/mothball_pipeline" rel="nofollow">https://github.com/harvard-lil/mothball_pipeline</a>
And deleting your AWS account will keep billing you [1] if you don’t delete all resources first.<p>AWS is designed to extract dollars from big enterprise contracts.<p>Also interesting from the article, this poor soul on StackOverflow was trying to figure out how to delete a bucket that would cost him $20,000 [2]. Can’t delete, can’t close.<p>[1] <a href="https://www.reddit.com/r/aws/comments/j5nh4w/ive_deleted_my_account_but_amazon_keeps_billing/" rel="nofollow">https://www.reddit.com/r/aws/comments/j5nh4w/ive_deleted_my_...</a><p>[2] <a href="https://stackoverflow.com/questions/54255990/cheapest-way-to-delete-2-billion-objects-from-s3-ia" rel="nofollow">https://stackoverflow.com/questions/54255990/cheapest-way-to...</a>
Pricing of AWS services makes me uneasy in general. Take S3 as an example: you go to the pricing page and you have several tabs with dozens of entries, which makes it difficult to calculate exactly how much you will pay. I might be simple-minded, but I prefer clearly defined plans with predetermined limits - you know exactly what it costs each month and what you get, and if you need more, you just switch to a higher plan, with no risk of nasty (and often expensive) surprises like those mentioned in the article.
Yup. And uploading / downloading large objects from S3 incurs tons of requests, because the S3 client does parallel chunking along with a small number of other control requests. That client works on the same premise as an SFTP client.<p>It’s amazing how often it retries.<p>Example from the Go SDK: <a href="https://github.com/aws/aws-sdk-go/blob/main/service/s3/s3manager/download.go#L303" rel="nofollow">https://github.com/aws/aws-sdk-go/blob/main/service/s3/s3man...</a>.
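To put rough numbers on the chunking, here's a sketch of the request-count math. The 5 MiB part size is, I believe, the default in the SDK download managers, and ~$0.0004 per 1,000 GETs is the Standard-tier price as of writing; both are assumptions worth checking against your SDK config and the pricing page.

```python
# Rough request-count math for a chunked S3 download (a sketch).
PART_SIZE = 5 * 1024 * 1024      # 5 MiB per ranged GET (assumed SDK default)
GET_PRICE = 0.0004 / 1000        # assumed ~$0.0004 per 1,000 GET requests

def requests_for(object_bytes: int) -> int:
    # One ranged GET per part, rounded up; retries only add to this.
    return -(-object_bytes // PART_SIZE)

size = 50 * 1024**3              # a 50 GiB object
gets = requests_for(size)
print(f"{gets} GET requests, ~${gets * GET_PRICE:.4f}")
```

So a single 50 GiB download is on the order of ten thousand GETs before any retries, which is why request charges show up even on "just a few objects" workloads.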
The first thing everyone who tries using cloud services should learn: <i>everything</i> costs money. Even the service that tells you how much it costs: <a href="https://aws.amazon.com/aws-cost-management/pricing/" rel="nofollow">https://aws.amazon.com/aws-cost-management/pricing/</a>
This post finally got my ass in gear to cancel an account that I thought I had closed but was still charging me a few dollars a month.<p>I spun up an AWS instance to practice, and once I was done I thought I had closed everything down.<p>Turns out I had just stopped my micro instances, not terminated them. I also hadn't released my IP address, and there was still a snapshot of the tiny db I had created floating around. The documentation was a little confusing, so after I went through it I spent half an hour chatting with a support rep to make sure everything was completely shut down. After next month my last bill should go through and I should be free and clear; unfortunately, I can't just pay it all off now.<p>This was mostly my fault for letting it go on for so long, but I hate how you can still be charged if you don't follow some very specific steps. If an account is closed, it should absolutely terminate all services still running on that account and then send you the final bill.
How much of this is a problem in practice?<p>I think in practice, S3 data is often indexed using other DBs, e.g. DynamoDB, Postgres, MySQL, etc. Can't this index be used to enumerate all S3 URLs? I am of course simplifying this a lot.
Stories like this make me extremely hesitant to try AWS. I was about to try S3 for a static site I was working on this weekend, but I think I'm gonna stick with Netlify or DigitalOcean instead after reading this.
> .5¢ per 1000 items LISTed seems insanely expensive considering how cheaply you can transfer terabytes of data with S3.<p>Correction: I misread - .5¢ per 1,000,000 items LISTed<p><pre><code> .5¢ per 1000 LIST operations
LIST operations max out at 1000 items
</code></pre>
Still a little pricey, but way less so than I'd imagined.<p>Do they make a lot of money off of charging for basic operations? It seems like you could make the whole pricing structure a lot more friendly by only charging for bandwidth use. I guess when you're as dominant as S3, you don't need to care about friendly pricing structures.<p>Charging for basic operations like that is weird, it's akin to a service charging people per number of clicks on a website.
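Putting the corrected figures together (a sketch; $0.005 per 1,000 LIST requests and the 1,000-item LIST cap are the numbers from the correction above):

```python
# Cost to enumerate every object in a bucket via LIST (a sketch).
LIST_PRICE_PER_1000 = 0.005   # $ per 1,000 LIST requests
ITEMS_PER_LIST = 1000         # each LIST returns at most 1,000 keys

def cost_to_enumerate(num_objects: int) -> float:
    lists_needed = -(-num_objects // ITEMS_PER_LIST)  # ceiling division
    return lists_needed * LIST_PRICE_PER_1000 / 1000

print(cost_to_enumerate(1_000_000))     # a million objects
print(cost_to_enumerate(100_000_000))   # a hundred million objects
```

So enumerating a million objects is about half a cent, and even a hundred million is around fifty cents - annoying, but not the bucket-sized bills elsewhere in the thread.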
> In 2021, anyone who comes across this question may benefit to know that AWS console now provides an empty button.<p>Source: <a href="https://stackoverflow.com/a/67834172" rel="nofollow">https://stackoverflow.com/a/67834172</a>
> you can also get an export of all objects in a bucket using S3 Inventory and run the output through AWS Batch in order to delete those objects<p>"S3 Batch Operations" sends S3 requests based on a CSV file, which can but does not have to come from S3 Inventory. But S3 Batch Operations supports only a subset of APIs, and that subset does not include DeleteObject(s). [0]<p>An AWS Batch job could run a container which sends DeleteObjects requests, but only when triggered by a job queue, which seems redundant here.<p>If I can't use an expiration lifecycle policy because I need a selection of objects not matching a prefix or object tags, I would run something with `s5cmd rm` [1]. Alternatively, roll your own Go program which parses the CSV and sends many DeleteObjects requests in parallel goroutines.<p>0. <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-operations.html" rel="nofollow">https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-...</a><p>1. <a href="https://github.com/peak/s5cmd#delete-multiple-s3-objects" rel="nofollow">https://github.com/peak/s5cmd#delete-multiple-s3-objects</a>
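The roll-your-own approach is only a few lines in any SDK language. Here's a hedged Python sketch rather than Go: it assumes the object key is the second column of a default S3 Inventory CSV (where keys are URL-encoded), and batches keys 1,000 at a time, which is DeleteObjects' per-request limit.

```python
import csv
from itertools import islice
from urllib.parse import unquote_plus

def key_batches(csv_path, batch_size=1000):
    """Yield lists of up to 1,000 decoded keys from an S3 Inventory CSV.

    Assumes the default inventory layout: bucket in column 0, the
    URL-encoded object key in column 1.
    """
    with open(csv_path, newline="") as f:
        keys = (unquote_plus(row[1]) for row in csv.reader(f))
        while batch := list(islice(keys, batch_size)):
            yield batch

def delete_all(bucket, csv_path):
    # Deferred import so the batching logic above is testable offline.
    import boto3
    s3 = boto3.client("s3")
    for batch in key_batches(csv_path):
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
```

For real parallelism you'd fan the batches out to a thread pool, but even serially this is one request per 1,000 objects, which is roughly what `s5cmd rm` does for you with far less ceremony.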
They have an example of some person almost paying $20k on transition fees. In my early days of AWS, I racked up $90k on S3 transition fees. Thankfully, AWS forgave it.
Would the S3 inventory help here? That would allow you to get the list of all files (albeit on a delay similar to the lifecycle rule approach), which you could process offline to generate the DELETEs.<p><a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html" rel="nofollow">https://docs.aws.amazon.com/AmazonS3/latest/userguide/storag...</a>
Ok, pretty obvious, but if you don't know what you are storing inside your bucket, how are you accessing your objects in the first place?<p>If your use case is storing random things you don't know the path of, maybe it's the wrong product to use.
> AWS is "eventually consistent" within most services, and S3 is no exception<p>Nowadays, it (¿almost?) is. <a href="https://aws.amazon.com/s3/consistency/" rel="nofollow">https://aws.amazon.com/s3/consistency/</a>:<p><i>“After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object”</i><p>I think that says that deletes are immediately visible, too, but they phrase it weirdly, as, after a delete, there is no latest version of the object.<p>Also, I don’t think buckets are objects in this sense, so the caveat in the article stands.
><i>The wait is often hours until AWS released a bucket name (since bucket names are globally unique, not just within your account).</i><p>I think last time I did this, the wait time was pretty much exactly 60 minutes.
Anyone have suggestions for S3 alternatives for storing many files sized 50-500 MB each? They are mostly long audio files, and there is an external index as well.
It's silly that they won't just let you delete the whole bucket, but this is actually pretty cheap, though.<p>Based on some quick maths, deleting a million files would only cost you like $5.<p>P.S. Again, it's silly they do this, and I'm probably greatly underestimating how these costs can add up for mid to large orgs.
> Deleting a bucket won't let you re-create that bucket immediately.<p>This is partially incorrect. I can recreate it immediately in the same account, but in a different account I need to wait ~1 hour.
Most things on AWS cost money and AWS makes pricing incredibly complex and opaque...where the monthly bill is usually the first way people find out about these things. While it is likely no consolation, S3 is by far one of the most complex AWS products pricing-wise with different object storage types each with their own rates, request costs with different rates for GET/PUT/POST/etc which this post mentions, and transit/egress fees.<p>I work on <a href="https://www.vantage.sh/" rel="nofollow">https://www.vantage.sh/</a> which helps teams get visibility on their cloud costs which may be helpful to folks here as well on this topic.