Ah, so it's not only me using AWS primitives to hackily implement all sorts of synchronization primitives.<p>My other favorite pattern is implementing a pool of workers by querying EC2 instances with a certain tag in a stopped state and starting them.
Starting the instance can succeed only once - that means I managed to snatch the machine. If it fails, I try again, grabbing another one.<p>This is one of those things that I never advertised out of professional shame, but it works, it's bulletproof and dead simple, and it doesn't require additional infra to work.
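A minimal sketch of the claim-by-starting pattern described above, assuming a boto3-style EC2 client. One detail makes it work: `StartInstances` reports each instance's `PreviousState`, and only one caller can observe `stopped` for a given instance, so that caller wins the race. The tag key/value names here are made up for illustration.

```python
def acquire_worker(ec2, tag_key="pool", tag_value="worker"):
    """Try to atomically claim one stopped worker instance.

    `ec2` is a boto3 EC2 client (or anything with the same call shape).
    Returns the claimed instance id, or None if every candidate was taken.
    """
    resp = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["stopped"]},
        ]
    )
    candidates = [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]
    for instance_id in candidates:
        try:
            start = ec2.start_instances(InstanceIds=[instance_id])
        except Exception:
            continue  # instance vanished or is mid-transition; try the next one
        # Only one caller sees PreviousState == "stopped": that caller
        # owns the machine. Everyone else saw "pending"/"running" and moves on.
        prev = start["StartingInstances"][0]["PreviousState"]["Name"]
        if prev == "stopped":
            return instance_id
    return None
```

The loop naturally degrades to "no worker available" when the pool is exhausted, which the caller can treat as backpressure.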
It's also possible to enforce the use of conditional writes: <a href="https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-enforcement-conditional-write-operations-general-purpose-buckets/" rel="nofollow">https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...</a><p>My biggest wishlist item for S3 is the ability to enforce that an object is named with a name that matches its hash. (With a modern hash considered secure, not MD5 or SHA1, though it isn't supported for those either.) That would make it much easier to build content-addressable storage.
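S3 can't verify the name-matches-hash property server-side (the wishlist item above), but a client can at least pick the name honestly and combine it with a put-if-absent so concurrent uploaders can't clobber the key. A sketch, assuming a boto3-style S3 client where `IfNoneMatch="*"` maps to the `If-None-Match: *` header of the conditional-write feature:

```python
import hashlib

def put_content_addressed(s3, bucket, data: bytes) -> str:
    """Store `data` under its SHA-256 hex digest, refusing to overwrite.

    Because the key is derived from the content, a successful write means
    the key now holds exactly this data; a 412 means an identical object
    (same hash) is already there, which is also fine for CAS-style storage.
    """
    key = hashlib.sha256(data).hexdigest()
    s3.put_object(Bucket=bucket, Key=key, Body=data, IfNoneMatch="*")
    return key
```

Note this is only honest on the write path; nothing stops a different client from writing garbage under a hash-shaped name, which is exactly why server-side enforcement would be valuable.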
To avoid any dependencies other than object storage, we've been making use of this in our database (turbopuffer.com) for consensus and concurrency control since day one. Been waiting for this since the day we launched on Google Cloud Storage ~1 year ago. Our bet that S3 would get it in a reasonable time-frame worked out!<p><a href="https://turbopuffer.com/blog/turbopuffer" rel="nofollow">https://turbopuffer.com/blog/turbopuffer</a>
Be still my beating heart. I have lived to see this day.<p>Genuinely, we've wanted this for ages, and we got halfway there with strong consistency.
Noting that Azure Blob storage supports e-tag / optimistic controls as well (via If-Match conditions)[1], how does this differ? Or is it the same feature?<p>[1]: <a href="https://learn.microsoft.com/en-us/azure/storage/blobs/concurrency-manage" rel="nofollow">https://learn.microsoft.com/en-us/azure/storage/blobs/concur...</a>
This combined with the read-after-write consistency guarantee is a perfect building block (pun intended) for incremental append only storage atop an object store. It solves the biggest problem with coordinating multiple writers to a WAL.
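The multi-writer WAL coordination mentioned above can be sketched with put-if-absent alone: each record claims a numbered slot, and losers advance to the next slot and retry. This is a hedged sketch assuming a boto3-style client (`IfNoneMatch="*"` for the conditional put); the `wal/` key scheme and retry bound are illustrative, not from the original comment.

```python
def append_wal_record(s3, bucket, seq: int, record: bytes, max_tries: int = 16) -> int:
    """Append `record` to an object-store WAL shared by uncoordinated writers.

    Each record lives at wal/<zero-padded seq>. A writer claims the next
    free slot with a put-if-absent; if another writer got there first the
    put fails the precondition (HTTP 412) and we retry at the next slot.
    Returns the sequence number actually written.
    """
    for attempt in range(max_tries):
        key = f"wal/{seq + attempt:020d}"
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=record, IfNoneMatch="*")
            return seq + attempt
        except Exception:
            continue  # precondition failed: slot already taken, advance
    raise RuntimeError("could not claim a WAL slot; too many concurrent writers")
```

Read-after-write consistency is what makes the recovery side safe: a reader that lists `wal/` is guaranteed to see every slot that was successfully claimed.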
If the default ETag algorithm for non-encrypted, non-multipart uploads in AWS is a plain MD5 hash, is this subject to failure for object data with MD5 collisions?<p>I'm thinking of a situation in which an application assumes that different (possibly adversarial) user-provided data will always generate a different ETag.
I can't wait to see what abomination Cory Quinn can come up with now given this new primitive! (see previous work abusing Route53 as a database: <a href="https://www.lastweekinaws.com/blog/route-53-amazons-premier-database/" rel="nofollow">https://www.lastweekinaws.com/blog/route-53-amazons-premier-...</a>)
Ironically, with this and Lambda you could make a serverless SQLite by mapping pages to objects, using HTTP range reads to read the db and Lambda to translate queries into writes to the appropriate pages via CAS. Prior to this, it would have required a server to handle concurrent writers, making the whole thing a nonstarter for "serverless".<p>Too bad performance would be terrible without a caching layer (EBS).
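The read half of that idea is just the `Range` header on `GetObject`. A minimal sketch, assuming a boto3-style client and the page-per-fixed-size-slice layout the comment describes (page size is an assumption; SQLite's default is 4096 bytes):

```python
def read_page(s3, bucket: str, key: str, page_no: int, page_size: int = 4096) -> bytes:
    """Fetch one fixed-size page of a database file via an HTTP range read.

    Page `n` occupies bytes [n * page_size, (n + 1) * page_size), so we ask
    S3 for exactly that inclusive byte range instead of the whole object.
    """
    start = page_no * page_size
    resp = s3.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={start}-{start + page_size - 1}",
    )
    return resp["Body"].read()
```

Writes would go the other way: rewrite the affected page objects, then CAS a small manifest/root object so readers flip atomically between versions.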
s3fs's <a href="https://github.com/fsspec/s3fs/pull/917">https://github.com/fsspec/s3fs/pull/917</a> was in response to the IfNoneMatch feature from the summer. How would people imagine this new feature being surfaced in a filesystem abstraction?
I had no idea people rely on S3 beyond dumb storage. It almost feels like people are trying to build out a distributed OLAP database in the reverse direction.
MinIO, an open-source implementation of Amazon S3, has had this for almost two years (relevant post: <a href="https://blog.min.io/leading-the-way-minios-conditional-write-feature-for-modern-data-workloads/" rel="nofollow">https://blog.min.io/leading-the-way-minios-conditional-write...</a>). Strangely, Amazon is only catching up now.
Does this mean that, in theory, we'll be able to manage multiple concurrent writes/updates to S3 without having to use new solutions like Regatta[1], which launched recently?<p><a href="https://news.ycombinator.com/item?id=42174204">https://news.ycombinator.com/item?id=42174204</a>
So...are we closer to getting to use S3 as a...you guessed it...a database? With CAS, we are probably able to get a basic level of atomicity, and S3 itself is pretty durable, now we have to deal with consistency and isolation...although S3 branded itself as "eventually consistent"...
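The "basic level of atomicity" hinted at above is the classic optimistic read-modify-write loop: read an object and its ETag, transform it, and write back only if the ETag still matches. A sketch under the assumption of a boto3-style client where `IfMatch=<etag>` maps to the `If-Match` header (nothing here is turbopuffer's or anyone else's actual implementation):

```python
def cas_update(s3, bucket: str, key: str, update_fn, max_tries: int = 8) -> bytes:
    """Optimistic read-modify-write of a single S3 object.

    Re-reads and retries whenever the conditional put fails (HTTP 412),
    i.e. whenever a concurrent writer changed the object between our
    read and our write. Returns the body that was successfully written.
    """
    for _ in range(max_tries):
        got = s3.get_object(Bucket=bucket, Key=key)
        etag = got["ETag"]
        new_body = update_fn(got["Body"].read())
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=new_body, IfMatch=etag)
            return new_body
        except Exception:
            continue  # precondition failed: a concurrent writer won, retry
    raise RuntimeError("CAS update did not converge")
```

Note that S3 has offered strong read-after-write consistency since late 2020, so the "eventually consistent" caveat no longer applies; the remaining gap versus a real database is multi-object transactions and isolation, which this single-key CAS doesn't give you.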
I implemented that extension in R2 at launch IIRC. Thanks for catching up & helping move distributed storage applications a meaningful step forward. Intended sincerely. I'm sure adding this was non-trivial for a complex legacy codebase like that.
Now if only you had more control over the ETag, so you could use a sha256 of the total file (even for multi-part uploads), or a version counter, or a global counter from an external system, or a logical hash of the content as opposed to a hash of the bytes.