
Design Thinking: B2 APIs and the Hidden Costs of S3 Compatibility

67 points by Manozco, almost 7 years ago

10 comments

elFarto · almost 7 years ago

I wonder if you could get the same functionality of AWS, with the same implementation of B2, by having a single URL to POST files to, that simply sent a redirect to the correct location (apparently there's the 307 HTTP status code for exactly this). E.g.:

    => POST https://upload.backblaze.com/bucket/file
    <= 307 redirect to https://pod-000-1007-13.backblaze.com/b2api/v1/b2_upload_file/...
    => POST https://pod-000-1007-13.backblaze.com/b2api/v1/b2_upload_file/...
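The dispatcher side of this idea is easy to sketch. The pod hostnames and the hash-based placement below are hypothetical, purely for illustration — Backblaze's real vault selection is dynamic, not a static hash:

```python
# Sketch of the front end in the proposed scheme: map an incoming upload
# path to a 307 redirect pointing at a storage pod, without ever touching
# the request body. Pod names and placement logic are made up.
import hashlib

PODS = [
    "pod-000-1007-13.backblaze.com",
    "pod-000-1042-07.backblaze.com",
    "pod-000-1103-02.backblaze.com",
]

def redirect_for(bucket: str, filename: str) -> tuple[int, str]:
    """Return the (status, Location) pair the upload front end would send."""
    digest = hashlib.sha256(f"{bucket}/{filename}".encode()).digest()
    pod = PODS[digest[0] % len(PODS)]
    return 307, f"https://{pod}/b2api/v1/b2_upload_file/{bucket}/{filename}"

status, location = redirect_for("bucket", "file")
print(status, location)
```

Since 307 (unlike 302) requires the client to repeat the POST with the same body, the front end here only ever handles headers.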
jlmorton · almost 7 years ago

I'm really surprised B2 doesn't seem to charge for upload API requests. I have a project which uploads several billion small objects to Amazon S3. The vast, vast majority are written, stored with a 15-month TTL, and never touched again. Some small number are downloaded each month.

To illustrate this, here's a recent S3 bill:

    $0.005 per 1,000 PUT, COPY, POST, or LIST requests    289,727,754 Requests    $1,448.64
    $0.004 per 10,000 GET and all other requests               62,305 Requests        $0.02
    $0.023 per GB - first 50 TB/month of storage used      18,990.009 GB-Mo         $436.77

As you can see, most of our spend on S3 is from the PUT requests, not the storage or download. There are probably things we could do to reduce the number of PUT requests. We don't really care that much, because the total cost is not that large, but there is at least some incentive to reduce the number of PUT calls.

But if it were free? I would never change this system. Does Backblaze really want this sort of traffic profile?
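The arithmetic behind that bill checks out, and it makes the point starkly: request charges, not storage, dominate this workload. (Prices are the circa-2018 rates quoted in the comment, not current ones.)

```python
# Reproduce the line items from the S3 bill quoted above.
put_requests = 289_727_754
get_requests = 62_305
storage_gb_mo = 18_990.009

put_cost = put_requests / 1_000 * 0.005      # $0.005 per 1,000 PUT/COPY/POST/LIST
get_cost = get_requests / 10_000 * 0.004     # $0.004 per 10,000 GET and other
storage_cost = storage_gb_mo * 0.023         # $0.023 per GB-month, first 50 TB tier

print(f"PUT:     ${put_cost:,.2f}")   # dominates the bill
print(f"GET:     ${get_cost:,.2f}")
print(f"Storage: ${storage_cost:,.2f}")
```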
hemancuso · almost 7 years ago

It's a bit unclear to me what is so expensive about the load-balancing nodes. Care to explain why it's substantially more than a few round-robined smart reverse proxies moving data to the correct storage node? With the S3/Dynamo design, the back-end destination is largely known from the hash ring.

Also: Wasabi has fantastic pricing and full S3 compatibility.
bcheung · almost 7 years ago

Anyone know why Amazon didn't adopt existing standards like SCP/SFTP/WebDAV? I've always found the S3 APIs to be difficult to work with, especially for authorization and large uploads.
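Part of what makes S3 authorization feel heavyweight is the Signature Version 4 key-derivation chain, which SCP/SFTP-style protocols avoid entirely. A minimal sketch of just that derivation step, with placeholder credentials (the full protocol additionally requires building a canonical request and a string-to-sign, then a final HMAC over that string):

```python
# Sketch of AWS Signature Version 4 key derivation: four chained
# HMAC-SHA256 operations producing a per-day, per-region, per-service
# signing key. The secret key below is a placeholder, not real.
import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key from a secret access key."""
    k_date = _hmac(("AWS4" + secret).encode(), date)      # keyed by date
    k_region = _hmac(k_date, region)                      # then region
    k_service = _hmac(k_region, service)                  # then service
    return _hmac(k_service, "aws4_request")               # terminator string

key = signing_key("EXAMPLE_SECRET_KEY", "20180801", "us-east-1", "s3")
print(key.hex())
```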
willglynn · almost 7 years ago

This article contains some misunderstandings about the S3 API.

> The interface to upload data into Amazon S3 is actually a bit simpler than Backblaze B2's API. But it comes at a literal cost. It requires Amazon to have a massive and expensive choke point in their network: load balancers. When a customer tries to upload to S3, she is given a single upload URL to use. For instance, http://s3.amazonaws.com/<bucketname>. This is great for the customer as she can just start pushing data to the URL. But that requires Amazon to be able to take that data and then, in a second step behind the scenes, find available storage space and then push that data to that available location. The second step creates a choke point as it requires having high bandwidth load balancers. That, in turn, carries a significant customer implication; load balancers cost significant money.

In fact, S3's REST API requires callers to follow HTTP redirects, and the PUT documentation expressly mentions the HTTP "Expect: 100-continue" mechanism precisely so that the S3 endpoint you reach in your initial PUT request does not have to handle the HTTP request body.

https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html

> The Dispatching Server (the API server answering the b2_get_upload_url call) tells the Client "there is space over on Vault-8329." This next step is our magic. Armed with the knowledge of the open vault, the Client ends its connection with the Dispatching Server and creates a brand new request DIRECTLY to Vault-8329 (calling b2_upload_file or b2_upload_part). No load balancers involved!

Again, this could be done directly with HTTP. PUT to the first server, receive a redirect, PUT to vault-8329, receive "100 Continue", transmit file. There's no need to have a separate API call to get the "real" upload URL.

> 3) Expensive, time consuming data copy needs (and "eventual consistency"). Amazon S3 requires the copying of massive amounts of data from one part of their network (the upload server) to wherever the data's ultimate resting place will be. This is at the root of one of the biggest frustrations when dealing with S3: Amazon's "eventual consistency."

Wait, I thought they were load balancers? Why does the load balancer need to copy any data once it's done uploading?

As for eventual consistency, there is truth to this complaint -- but much less truth than in the distant past. Every S3 region except us-standard has had read-after-write consistency for new objects since launch, and as of August 2015, us-standard does too:

https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/

If your PUT returns 200 OK, a subsequent GET will return the object, assuming you're using unique keys. This prevents the 2015-and-earlier problem where you'd create a new S3 object and enqueue a job to process it, then the job would get 404 Not Found while retrieving the new object.

There are other cases where S3's eventual consistency can be an issue, but none of them have been dealbreakers for my applications. Having said that: S3's consistency model is weaker than the model B2 provides, so this is not an argument against providing an S3-compatible interface.
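That redirect-plus-100-continue flow can be simulated with a toy byte counter to show why the front end needn't be a high-bandwidth choke point. The node names and byte counts below are illustrative, not measurements of S3:

```python
# Toy simulation of the two-step PUT: because the client sends
# "Expect: 100-continue", the front end can answer 307 after reading only
# headers; the body is streamed only once a storage node says "100 Continue".

def upload(body_size: int) -> dict:
    """Return bytes handled by each node for one redirected PUT."""
    bytes_seen = {"front_end": 0, "storage_node": 0}
    header_size = 200  # rough size of a PUT request's headers, illustrative

    # Step 1: client PUTs headers to the front end and waits; the front
    # end replies 307 before any body bytes are transmitted.
    bytes_seen["front_end"] += header_size

    # Step 2: client repeats the PUT against the redirect target, which
    # replies "100 Continue" and then receives the full body.
    bytes_seen["storage_node"] += header_size + body_size
    return bytes_seen

seen = upload(body_size=100 * 1024 * 1024)  # a 100 MB object
print(seen)
```

The asymmetry is the whole argument: the dispatcher's traffic is header-sized regardless of object size.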
deepsun · almost 7 years ago

That "get_upload_url()" trick they invented has been in App Engine's BlobStore since 2008, although Google deprecated it in favor of GCS.
deedubaya · almost 7 years ago

That's all fine and good; I don't care if you're S3-compatible or not.

I do care if I have to write my own API client for your storage backend, or if you have examples to go off of. Backblaze doesn't seem to offer either for non-C++/Swift languages. Complete non-starter.

The perhaps obvious win of being S3-compatible is that you open the door to thousands of existing S3 clients already implemented in many different technologies, for free. And you get the developers who use them as customers.
metalrain · almost 7 years ago

It's great that the cost of elasticity is not hidden. I'm glad that there are alternatives.
misterbowfinger · almost 7 years ago

Honestly surprised that AWS, GCP, or Azure haven't acquired Backblaze by now. Seems like an obvious move.
vpribish · almost 7 years ago

"Design Thinking" >>cringe<<