Ah, so it's not only me using AWS primitives to hackily implement all sorts of synchronization primitives.<p>My other favorite pattern is implementing a pool of workers by querying EC2 instances with a certain tag in a stopped state and starting them.
Starting the instance can succeed only once - that means I managed to snatch the machine. If it fails, I try again, grabbing another one.<p>This is one of those things that I never advertised out of professional shame, but it works, it's bulletproof and dead simple, and it doesn't require additional infra to work.
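A minimal sketch of the claim-by-starting pattern described above, assuming a boto3-style EC2 client. One detail makes it work: `StartInstances` reports each instance's `PreviousState`, and only one caller can observe `stopped` for a given instance, so that caller wins the race. The tag key/value names here are made up for illustration.

```python
def acquire_worker(ec2, tag_key="pool", tag_value="worker"):
    """Try to atomically claim one stopped worker instance.

    `ec2` is a boto3 EC2 client (or anything with the same call shape).
    Returns the claimed instance id, or None if every candidate was taken.
    """
    resp = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["stopped"]},
        ]
    )
    candidates = [
        inst["InstanceId"]
        for reservation in resp["Reservations"]
        for inst in reservation["Instances"]
    ]
    for instance_id in candidates:
        try:
            start = ec2.start_instances(InstanceIds=[instance_id])
        except Exception:
            continue  # instance vanished or is mid-transition; try the next one
        # Only one caller sees PreviousState == "stopped": that caller
        # owns the machine. Everyone else saw "pending"/"running" and moves on.
        prev = start["StartingInstances"][0]["PreviousState"]["Name"]
        if prev == "stopped":
            return instance_id
    return None
```

The loop naturally degrades to "no worker available" when the pool is exhausted, which the caller can treat as backpressure.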
It's also possible to enforce the use of conditional writes: <a href="https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-enforcement-conditional-write-operations-general-purpose-buckets/" rel="nofollow">https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...</a><p>My biggest wishlist item for S3 is the ability to enforce that an object is named with a name that matches its hash. (With a modern hash considered secure, not MD5 or SHA1, though it isn't supported for those either.) That would make it much easier to build content-addressable storage.
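S3 can't verify the name-matches-hash property server-side (the wishlist item above), but a client can at least pick the name honestly and combine it with a put-if-absent so concurrent uploaders can't clobber the key. A sketch, assuming a boto3-style S3 client where `IfNoneMatch="*"` maps to the `If-None-Match: *` header of the conditional-write feature:

```python
import hashlib

def put_content_addressed(s3, bucket, data: bytes) -> str:
    """Store `data` under its SHA-256 hex digest, refusing to overwrite.

    Because the key is derived from the content, a successful write means
    the key now holds exactly this data; a 412 means an identical object
    (same hash) is already there, which is also fine for CAS-style storage.
    """
    key = hashlib.sha256(data).hexdigest()
    s3.put_object(Bucket=bucket, Key=key, Body=data, IfNoneMatch="*")
    return key
```

Note this is only honest on the write path; nothing stops a different client from writing garbage under a hash-shaped name, which is exactly why server-side enforcement would be valuable.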
To avoid any dependencies other than object storage, we've been making use of this in our database (turbopuffer.com) for consensus and concurrency control since day one. Been waiting for this since the day we launched on Google Cloud Storage ~1 year ago. Our bet that S3 would get it in a reasonable time-frame worked out!<p><a href="https://turbopuffer.com/blog/turbopuffer" rel="nofollow">https://turbopuffer.com/blog/turbopuffer</a>
Be still my beating heart. I have lived to see this day.<p>Genuinely, we've wanted this for ages, and we got halfway there with strong consistency.
Noting that Azure Blob storage supports e-tag / optimistic controls as well (via If-Match conditions)[1], how does this differ? Or is it the same feature?<p>[1]: <a href="https://learn.microsoft.com/en-us/azure/storage/blobs/concurrency-manage" rel="nofollow">https://learn.microsoft.com/en-us/azure/storage/blobs/concur...</a>
This combined with the read-after-write consistency guarantee is a perfect building block (pun intended) for incremental append only storage atop an object store. It solves the biggest problem with coordinating multiple writers to a WAL.
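The multi-writer WAL coordination mentioned above can be sketched with put-if-absent alone: each record claims a numbered slot, and losers advance to the next slot and retry. This is a hedged sketch assuming a boto3-style client (`IfNoneMatch="*"` for the conditional put); the `wal/` key scheme and retry bound are illustrative, not from the original comment.

```python
def append_wal_record(s3, bucket, seq: int, record: bytes, max_tries: int = 16) -> int:
    """Append `record` to an object-store WAL shared by uncoordinated writers.

    Each record lives at wal/<zero-padded seq>. A writer claims the next
    free slot with a put-if-absent; if another writer got there first the
    put fails the precondition (HTTP 412) and we retry at the next slot.
    Returns the sequence number actually written.
    """
    for attempt in range(max_tries):
        key = f"wal/{seq + attempt:020d}"
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=record, IfNoneMatch="*")
            return seq + attempt
        except Exception:
            continue  # precondition failed: slot already taken, advance
    raise RuntimeError("could not claim a WAL slot; too many concurrent writers")
```

Read-after-write consistency is what makes the recovery side safe: a reader that lists `wal/` is guaranteed to see every slot that was successfully claimed.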
If the default ETag algorithm for non-encrypted, non-multipart uploads in AWS is a plain MD5 hash, is this subject to failure for object data with MD5 collisions?<p>I'm thinking of a situation in which an application assumes that different (possibly adversarial) user-provided data will always generate a different ETag.
I can't wait to see what abomination Cory Quinn can come up with now given this new primitive! (see previous work abusing Route53 as a database: <a href="https://www.lastweekinaws.com/blog/route-53-amazons-premier-database/" rel="nofollow">https://www.lastweekinaws.com/blog/route-53-amazons-premier-...</a>)
Ironically, with this and Lambda you could make a serverless SQLite by mapping pages to objects, using HTTP range reads to read the db and Lambda to translate queries into writes to the appropriate pages via CAS. Prior to this, it would have required a server to handle concurrent writers, making the whole thing a nonstarter for "serverless".<p>Too bad performance would be terrible without a caching layer (EBS).
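The read half of that idea is just the `Range` header on `GetObject`. A minimal sketch, assuming a boto3-style client and the page-per-fixed-size-slice layout the comment describes (page size is an assumption; SQLite's default is 4096 bytes):

```python
def read_page(s3, bucket: str, key: str, page_no: int, page_size: int = 4096) -> bytes:
    """Fetch one fixed-size page of a database file via an HTTP range read.

    Page `n` occupies bytes [n * page_size, (n + 1) * page_size), so we ask
    S3 for exactly that inclusive byte range instead of the whole object.
    """
    start = page_no * page_size
    resp = s3.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={start}-{start + page_size - 1}",
    )
    return resp["Body"].read()
```

Writes would go the other way: rewrite the affected page objects, then CAS a small manifest/root object so readers flip atomically between versions.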
s3fs's <a href="https://github.com/fsspec/s3fs/pull/917">https://github.com/fsspec/s3fs/pull/917</a> was in response to the IfNoneMatch feature from the summer. How would people imagine this new feature being surfaced in a filesystem abstraction?
I had no idea people rely on S3 beyond dumb storage. It almost feels like people are trying to build out a distributed OLAP database in the reverse direction.
MinIO, an open-source implementation of Amazon S3, has had this for almost two years (relevant post: <a href="https://blog.min.io/leading-the-way-minios-conditional-write-feature-for-modern-data-workloads/" rel="nofollow">https://blog.min.io/leading-the-way-minios-conditional-write...</a>). Strangely, Amazon is only catching up now.
Does this mean that, in theory, we'll be able to manage multiple concurrent writes/updates to S3 without having to use new solutions like Regatta[1], which launched recently?<p><a href="https://news.ycombinator.com/item?id=42174204">https://news.ycombinator.com/item?id=42174204</a>
So...are we closer to getting to use S3 as a...you guessed it...a database? With CAS, we are probably able to get a basic level of atomicity, and S3 itself is pretty durable, now we have to deal with consistency and isolation...although S3 branded itself as "eventually consistent"...
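The "basic level of atomicity" hinted at above is the classic optimistic read-modify-write loop: read an object and its ETag, transform it, and write back only if the ETag still matches. A sketch under the assumption of a boto3-style client where `IfMatch=<etag>` maps to the `If-Match` header (nothing here is turbopuffer's or anyone else's actual implementation):

```python
def cas_update(s3, bucket: str, key: str, update_fn, max_tries: int = 8) -> bytes:
    """Optimistic read-modify-write of a single S3 object.

    Re-reads and retries whenever the conditional put fails (HTTP 412),
    i.e. whenever a concurrent writer changed the object between our
    read and our write. Returns the body that was successfully written.
    """
    for _ in range(max_tries):
        got = s3.get_object(Bucket=bucket, Key=key)
        etag = got["ETag"]
        new_body = update_fn(got["Body"].read())
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=new_body, IfMatch=etag)
            return new_body
        except Exception:
            continue  # precondition failed: a concurrent writer won, retry
    raise RuntimeError("CAS update did not converge")
```

Note that S3 has offered strong read-after-write consistency since late 2020, so the "eventually consistent" caveat no longer applies; the remaining gap versus a real database is multi-object transactions and isolation, which this single-key CAS doesn't give you.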
I implemented that extension in R2 at launch IIRC. Thanks for catching up & helping move distributed storage applications a meaningful step forward. Intended sincerely. I'm sure adding this was non-trivial for a complex legacy codebase like that.
Now if only you had more control over the ETag, so you could use a sha256 of the total file (even for multi-part uploads), or a version counter, or a global counter from an external system, or a logical hash of the content as opposed to a hash of the bytes.