> When we moved S3 to a strong consistency model, the customer reception was stronger than any of us expected.

This feels like one of those Apple-like stories about inventing and discovering an amazing, brand new feature that delighted customers, but not mentioning the motivating factor of the competing products that already had it. A more honest sentence might have been "After years of customers complaining that the other major cloud storage providers had strong consistency models, customers were relieved when we finally joined the party."
S3 is up there as one of my favorite tech products ever. Over the years I've used it for all sorts of things, but most recently I've been using it to roll my own DB backup system.

One of the things that shocks me about the system is the level of object durability. A few years ago I was taking an AWS certification course and learned that their eleven-nines durability number means that if you store 10 million objects, you can expect to lose one about once every 10,000 years. Since then, anytime I talk about S3's durability I bring up that example, and it always seems to convey the point to a layperson.

And its "simplicity" is truly elegant. When I first started using S3 I thought of it as a dumb storage location, but as I learned more I realized that it had some wild features that they all managed to hide properly, so when you start it looks like a simple system, but you can gradually get deeper and deeper until your S3 bucket is doing some pretty sophisticated stuff.

Last thing I'll say is: you know your API is good when "S3-compatible API" is a selling point of your competitors.
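The back-of-the-envelope arithmetic behind that figure, assuming the commonly advertised 99.999999999% (eleven nines) design durability is read as an annual per-object survival probability:

```python
# Sketch of the arithmetic behind "one object per 10,000 years",
# assuming eleven nines is an annual per-object durability figure.
annual_loss_prob = 1 - 0.99999999999   # ~1e-11 chance of losing a given object per year
objects_stored = 10_000_000            # AWS's own example: 10 million objects

expected_losses_per_year = annual_loss_prob * objects_stored  # ~1e-4 objects/year
years_per_lost_object = 1 / expected_losses_per_year          # ~10,000 years

print(f"Expect to lose one object roughly every {years_per_lost_object:,.0f} years")
```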
> I’ve seen other examples where customers guess at new APIs they hope that S3 will launch, and have scripts that run in the background probing them for years! When we launch new features that introduce new REST verbs, we typically have a dashboard to report the call frequency of requests to it, and it’s often the case that the team is surprised that the dashboard starts posting traffic as soon as it’s up, even before the feature launches, and they discover that it’s exactly these customer probes, guessing at a new feature.

This surprises me; has anyone done something similar and benefitted from it? It's the sort of thing where I feel like you'd maybe get a result 1% of the time, if that, and then only years later, when everyone has moved on from the problem they were facing at the time...
It's funny—S3 started as a "simple" storage service, and now it's handling entire table abstractions. Reminds me how SQL was declared dead every few years, yet here we are, building ever more complex data solutions on top of supposedly simple foundations.
The true hero in AWS is its authentication and accounting infrastructure.

Most people don't even think about it. But authenticating trillions of operations a day is why AWS works. And the accounting and billing. Anyone who does authentication knows how hard it is. At AWS's scale it's, well, the pinnacle of distributed systems.
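To make that concrete: every S3 request carries a Signature Version 4 HMAC that the service has to verify. A minimal boto3 sketch that surfaces the normally invisible signature by presigning a URL (bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Every call the SDK makes is signed with SigV4: the request is canonicalized,
# HMAC-SHA256'd with a derived key, and verified server-side on every operation.
# Presigning makes that normally-invisible signature explicit in the URL:
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "report.pdf"},  # hypothetical names
    ExpiresIn=3600,  # signature (and thus the URL) is valid for one hour
)
print(url)  # query string carries X-Amz-Signature, X-Amz-Credential, X-Amz-Expires, ...
```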
A little history: when we were getting ready to do an API at DigitalOcean I got asked "uhm... how should it feel?" I thought about that for about 11 seconds and said "if all our APIs feel as good as S3, it should be fine" - it's a good API.
S3 is the simplest CRUD app you could create.

It's essentially just the four functions of CRUD applied to a file (see the sketch below).

Most problems in tech are not that simple.

Note: not knocking the service, just pointing out not all things are so inherently basic (and valuable at the same time).
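The mapping is almost literal. A minimal boto3 sketch (bucket and key names are hypothetical); note that "update" in S3 is really a whole-object overwrite, since objects are immutable:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "notes.txt"  # hypothetical names

# Create
s3.put_object(Bucket=bucket, Key=key, Body=b"first draft")

# Read
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# Update (really a full overwrite -- objects are immutable)
s3.put_object(Bucket=bucket, Key=key, Body=b"second draft")

# Delete
s3.delete_object(Bucket=bucket, Key=key)
```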
I have a feeling that economies of scale have a point of diminishing returns. At what point does it become more costly and complicated to store your data on S3 versus just maintaining a server with RAID disks somewhere?

S3 is an engineering marvel, but it's an insanely complicated backend architecture just to store some files.
"I think one thing that we’ve learned from the work on Tables is that it’s these _properties of storage_ that really define S3 much more than the object API itself."<p>Between the above philosophy, S3 Tables, and Express One Zone (SSD perf), it makes me really curious about what other storage modalities S3 moves towards supporting going forward.
It's great that they added Iceberg support, I guess, but it's a shame that they also removed S3 Select. S3 Select wasn't perfect. For instance, the performance was nowhere near as good as using DuckDB to scan a Parquet file, since DuckDB is smart and S3 Select does a full table scan.

But S3 Select was way cheaper than the new Iceberg support. So if you only need to read a single Parquet snapshot, with no need to do updates, then this change is not welcome.

Great article though, and I was pleased to see this at the end:

> We’ve invested in a collaboration with DuckDB to accelerate Iceberg support in Duck,
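For anyone replacing an S3 Select workload, a sketch of the DuckDB approach, which pushes filters and column projection down into range reads of the Parquet file rather than scanning the whole object (bucket path and column names are hypothetical):

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # enables s3:// URLs
con.execute("SET s3_region = 'us-east-1';")  # credentials come from env/config

# DuckDB fetches only the Parquet footer, then the row groups and columns
# the query actually needs -- unlike S3 Select's full scan of the object.
rows = con.sql("""
    SELECT customer_id, SUM(amount)
    FROM read_parquet('s3://my-bucket/sales/snapshot.parquet')
    WHERE sale_date >= DATE '2024-01-01'
    GROUP BY customer_id
""").fetchall()
```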
For those interested in S3 Tables, which are referenced in this blog post, we literally just published an overview of what they are and their cost considerations that people might find interesting: https://www.vantage.sh/blog/amazon-s3-tables
Lakehouse is an architecture defined to overcome the limitations associated with an immutable object store. In my eyes it already introduces unnecessary complexity: at what point do I just transition to a proper database, except at scales a database cannot accommodate? When is a tiered data architecture with stream-materialisation snapshots actually simpler to reason about and more economical? And so on.

I would hope that S3 could introduce a change in the operation of said fundamental building block (the immutable object), rather than just slapping an existing downstream abstraction on top. That's not what I call design for simplicity. As an external observer, I would think that's internal Amazon moat management with some co-branding strategy.
Lots of comments here talking about how great S3 is.

Anyone willing to give the CliffsNotes version of what's good about it?

I've been running various sites and apps for a decade, but have never touched S3 because the bandwidth costs are one, sometimes two, orders of magnitude more expensive than other static hosting solutions.
I really enjoy using S3 to serve arbitrary blobs. It perfectly solves the problem space for my use cases.

I avoid getting tangled in an authentication mess by simply naming my files using type-4 (random) GUIDs and dumping them in public buckets. The file name is effectively the authentication token, and expiration policies are used to deal with the edges.

This has been useful for problems like emailing customers gigantic reports and transferring build artifacts between systems. Having a stable URL that "just works" everywhere easily pays for the S3 bill in terms of time & frustration saved.
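A minimal sketch of that pattern, assuming a bucket already configured with a public-read policy and a lifecycle expiration rule (all names here are hypothetical):

```python
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "my-public-artifacts"  # hypothetical: public-read policy + lifecycle expiry set on the bucket

def publish_blob(data: bytes, content_type: str = "application/octet-stream") -> str:
    # A version-4 UUID carries 122 random bits, so the key is effectively
    # unguessable; knowing the URL *is* the authorization.
    key = str(uuid.uuid4())
    s3.put_object(Bucket=BUCKET, Key=key, Body=data, ContentType=content_type)
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"

# e.g. email the returned URL in a "your report is ready" message;
# the bucket's lifecycle rule quietly expires the object later.
url = publish_blob(b"%PDF-1.7 ...", content_type="application/pdf")
```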
I wish they'd comment on how their service keeps leaking massive amounts of data that isn't supposed to be public.

Didn't Microsoft blame their customers for not running updates and using AV? Gosh, I guess MS did a fine job. Customers were to blame.

Just like those S3 buckets. Customers ignored the warnings...
S3 was one of the first offerings coming out of AWS, right? It's pretty legendary and a great concept to begin with. You can tell by how much sense it makes, and then by trying to wrap your head around the web dev world pre-S3.
If only metadata could be queried without processing a CSV output file first (imagine even storing thumbnails in there!), copied objects had actual events rather than something you have to dig CloudTrail for, and you could get the last update time from a bucket to make caching easier.
Would have been good if they mentioned they meant Amazon S3. It took me a while to figure out what this was about.

Initially I thought this was about S3 standby mode.
Clicked on the article thinking it was about S3 Graphics, the company that made the graphics chip in my first PC. Now I see it's some Amazon cloud storage thing.