This article contains some misunderstandings about the S3 API.

> The interface to upload data into Amazon S3 is actually a bit simpler than Backblaze B2’s API. But it comes at a literal cost. It requires Amazon to have a massive and expensive choke point in their network: load balancers. When a customer tries to upload to S3, she is given a single upload URL to use. For instance, http://s3.amazonaws.com/<bucketname>. This is great for the customer as she can just start pushing data to the URL. But that requires Amazon to be able to take that data and then, in a second step behind the scenes, find available storage space and then push that data to that available location. The second step creates a choke point as it requires having high bandwidth load balancers. That, in turn, carries a significant customer implication; load balancers cost significant money.

In fact, S3's REST API requires callers to follow HTTP redirects, and the PUT documentation expressly mentions the HTTP "Expect: 100-continue" mechanism precisely so that the S3 endpoint you reach with your initial PUT does not have to handle the HTTP request body at all.

https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
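To make that concrete, here's a minimal sketch in Go of the redirect-plus-100-continue flow from the client's side. The bucket and key are made up, and a real request would also need SigV4 signing, which I've omitted; the point is that the standard library already does everything the flow requires: it follows a 307 by replaying the PUT against the Location header, and ExpectContinueTimeout makes it wait for the server's go-ahead before transmitting the body.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	payload := []byte("object contents")

	// ExpectContinueTimeout makes the transport hold back the request
	// body until the server answers "100 Continue" (or the timeout
	// elapses), so the first endpoint never has to receive the payload.
	client := &http.Client{
		Transport: &http.Transport{ExpectContinueTimeout: 2 * time.Second},
	}

	// Placeholder bucket/key; an unsigned request like this would get
	// 403 from real S3, so treat it as an illustration of the protocol.
	req, err := http.NewRequest(http.MethodPut,
		"https://s3.amazonaws.com/example-bucket/example-key",
		bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	// Ask the server to vet the headers before we send the body.
	req.Header.Set("Expect", "100-continue")

	// If the server answers 307 Temporary Redirect, the client re-issues
	// the PUT against the Location header automatically; because the
	// request was built from a bytes.Reader, GetBody lets it replay the
	// payload on the second attempt.
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```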
> The Dispatching Server (the API server answering the b2_get_upload_url call) tells the Client “there is space over on “Vault-8329.” This next step is our magic. Armed with the knowledge of the open vault, the Client ends its connection with the Dispatching Server and creates a brand new request DIRECTLY to Vault-8329 (calling b2_upload_file or b2_upload_part). No load balancers involved!

Again, this could be done directly with HTTP, exactly as sketched above: PUT to the first server, receive a redirect, PUT to Vault-8329, receive "100 Continue", transmit the file. There's no need for a separate API call to get the "real" upload URL.

> 3) Expensive, time consuming data copy needs (and “eventual consistency”). Amazon S3 requires the copying of massive amounts of data from one part of their network (the upload server) to wherever the data’s ultimate resting place will be. This is at the root of one of the biggest frustrations when dealing with S3: Amazon’s “eventual consistency.”

Wait, I thought they were load balancers? Why would a load balancer need to copy any data once the upload is done?

As for eventual consistency, there is truth to this complaint -- but much less truth than in the distant past. Every S3 region except us-standard has offered read-after-write consistency for new objects since launch, and as of August 2015, us-standard does too:

https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/

If your PUT returns 200 OK, a subsequent GET will return the object, assuming you're using unique keys (see the sketch at the end of this comment). This prevents the 2015-and-earlier problem where you'd create a new S3 object and enqueue a job to process it, only for the job to get 404 Not Found while retrieving the new object.

There are other cases where S3's eventual consistency can be an issue, but none of them have been dealbreakers for my applications. Having said that: S3's consistency model is weaker than the model B2 provides, so this is not an argument against providing an S3-compatible interface.
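To make the read-after-write guarantee concrete, here is a minimal sketch using the AWS SDK for Go (v1). The bucket and key names are placeholders, and it assumes credentials are configured in the environment; the point is that once PutObject returns without error for a brand-new key, the GetObject that follows is guaranteed to find it.

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"), // us-standard
	}))
	svc := s3.New(sess)

	bucket := aws.String("example-bucket")          // placeholder
	key := aws.String("jobs/unique-job-id-0001")    // a never-before-used key

	// PUT the new object; a 200 OK here means...
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: bucket,
		Key:    key,
		Body:   bytes.NewReader([]byte("job payload")),
	})
	if err != nil {
		log.Fatal(err)
	}

	// ...a subsequent GET of that same new key will succeed, so a worker
	// dequeued right after the PUT will not see 404 Not Found.
	out, err := svc.GetObject(&s3.GetObjectInput{Bucket: bucket, Key: key})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()
	body, _ := ioutil.ReadAll(out.Body)
	fmt.Printf("read back %d bytes\n", len(body))
}
```

Note that the guarantee only covers new keys: overwriting or deleting an existing key was still eventually consistent in this era, which is the "other cases" caveat above.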