This article contains some misunderstandings about the S3 API.

> The interface to upload data into Amazon S3 is actually a bit simpler than Backblaze B2’s API. But it comes at a literal cost. It requires Amazon to have a massive and expensive choke point in their network: load balancers. When a customer tries to upload to S3, she is given a single upload URL to use. For instance, http://s3.amazonaws.com/<bucketname>. This is great for the customer as she can just start pushing data to the URL. But that requires Amazon to be able to take that data and then, in a second step behind the scenes, find available storage space and then push that data to that available location. The second step creates a choke point as it requires having high bandwidth load balancers. That, in turn, carries a significant customer implication; load balancers cost significant money.

In fact, S3's REST API requires callers to follow HTTP redirects, and the PUT documentation expressly mentions the HTTP "Expect: 100-continue" mechanism precisely so that the S3 endpoint you reach with your initial PUT does not have to handle the HTTP request body at all.

https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
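To make that concrete, here's a minimal sketch in Go of the redirect-plus-100-continue flow from the client's side. The bucket and key are made up, and a real request would also need SigV4 signing, which I've omitted; the point is that the standard library already does everything the flow requires: it follows a 307 by replaying the PUT against the Location header, and ExpectContinueTimeout makes it wait for the server's go-ahead before transmitting the body.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	payload := []byte("object contents")

	// ExpectContinueTimeout makes the transport hold back the request
	// body until the server answers "100 Continue" (or the timeout
	// elapses), so the first endpoint never has to receive the payload.
	client := &http.Client{
		Transport: &http.Transport{ExpectContinueTimeout: 2 * time.Second},
	}

	// Placeholder bucket/key; an unsigned request like this would get
	// 403 from real S3, so treat it as an illustration of the protocol.
	req, err := http.NewRequest(http.MethodPut,
		"https://s3.amazonaws.com/example-bucket/example-key",
		bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	// Ask the server to vet the headers before we send the body.
	req.Header.Set("Expect", "100-continue")

	// If the server answers 307 Temporary Redirect, the client re-issues
	// the PUT against the Location header automatically; because the
	// request was built from a bytes.Reader, GetBody lets it replay the
	// payload on the second attempt.
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```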
> The Dispatching Server (the API server answering the b2_get_upload_url call) tells the Client “there is space over on “Vault-8329.” This next step is our magic. Armed with the knowledge of the open vault, the Client ends its connection with the Dispatching Server and creates a brand new request DIRECTLY to Vault-8329 (calling b2_upload_file or b2_upload_part). No load balancers involved!

Again, this could be done directly with HTTP, exactly as sketched above: PUT to the first server, receive a redirect, PUT to Vault-8329, receive "100 Continue", transmit the file. There's no need for a separate API call to get the "real" upload URL.

> 3) Expensive, time consuming data copy needs (and “eventual consistency”). Amazon S3 requires the copying of massive amounts of data from one part of their network (the upload server) to wherever the data’s ultimate resting place will be. This is at the root of one of the biggest frustrations when dealing with S3: Amazon’s “eventual consistency.”

Wait, I thought they were load balancers? Why would a load balancer need to copy any data once the upload is done?

As for eventual consistency, there is truth to this complaint -- but much less truth than in the distant past. Every S3 region except us-standard has offered read-after-write consistency for new objects since launch, and as of August 2015, us-standard does too:

https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/

If your PUT returns 200 OK, a subsequent GET will return the object, assuming you're using unique keys (see the sketch at the end of this comment). This prevents the 2015-and-earlier problem where you'd create a new S3 object and enqueue a job to process it, only for the job to get 404 Not Found while retrieving the new object.

There are other cases where S3's eventual consistency can be an issue, but none of them have been dealbreakers for my applications. Having said that: S3's consistency model is weaker than the model B2 provides, so this is not an argument against providing an S3-compatible interface.
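To make the read-after-write guarantee concrete, here is a minimal sketch using the AWS SDK for Go (v1). The bucket and key names are placeholders, and it assumes credentials are configured in the environment; the point is that once PutObject returns without error for a brand-new key, the GetObject that follows is guaranteed to find it.

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"), // us-standard
	}))
	svc := s3.New(sess)

	bucket := aws.String("example-bucket")          // placeholder
	key := aws.String("jobs/unique-job-id-0001")    // a never-before-used key

	// PUT the new object; a 200 OK here means...
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: bucket,
		Key:    key,
		Body:   bytes.NewReader([]byte("job payload")),
	})
	if err != nil {
		log.Fatal(err)
	}

	// ...a subsequent GET of that same new key will succeed, so a worker
	// dequeued right after the PUT will not see 404 Not Found.
	out, err := svc.GetObject(&s3.GetObjectInput{Bucket: bucket, Key: key})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()
	body, _ := ioutil.ReadAll(out.Body)
	fmt.Printf("read back %d bytes\n", len(body))
}
```

Note that the guarantee only covers new keys: overwriting or deleting an existing key was still eventually consistent in this era, which is the "other cases" caveat above.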