Petabytes on a budget: How to build cheap cloud storage (2010)

142 points, posted by Oculus over 10 years ago

19 comments

KaiserPro, over 10 years ago
I look after about 15PB of tier 1 storage, and I'd recommend not doing it the Backblaze way.

It's grand that it's worked out for them, but there are a few big drawbacks that Backblaze have software'd their way around.

Nowadays it's cheaper to use an Engenio-based array like the MD3260 (also sold/made by NetApp/LSI).

First, you can hot swap the disks. Second, you don't need to engineer your own storage manager. Thirdly, you get much, much better performance (2 gigabytes a second sustained, which is enough to saturate two 10 gig NICs). Fourthly, you can get 4-hour 24/7 response. Finally, the air flow is a bit suspect.

We use a 1U two-socket server with SAS to serve the data.

If you're brave, you can skip the RAID controller, use the JBOD enclosure instead, and run ZFS over the top. However, ZFS fragments like a bitch, so watch out if you're running at 75% plus.

2close4comfort, over 10 years ago
https://www.backblaze.com/blog/backblaze-storage-pod-4/ Here is the latest version. I have loved being able to follow the iterations of the storage pod. They are very thoughtful about HW choices but leave it open, which I think is the best part!

jpalomaki, over 10 years ago
I've always been very suspicious about backup vendors offering unlimited space for a fixed price. These storage pod posts by Backblaze were the primary reason why I decided to give their service a try. Knowing the technology behind the system made it much more credible for me.

andyidsinga, over 10 years ago
> A Backblaze Storage Pod is a Building Block

> But the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post).

I'm curious about their software that works outside the nodes too. I've been working on storage clusters over the past 9 months using the Ceph ( http://ceph.com/ ) open source storage software. It's pretty amazing -- and I suspect it could be deployed to a set of Backblaze pods too.

It seems to me that for a production environment where you wanted to maintain availability, you would need to build at least 3 of those pods for any deployment, enabling replication across pods/storage nodes.

jdub, over 10 years ago
But in 2014, 1PB in S3 – with 11 nines of data durability – costs ~USD$30,000.

bkruse, over 10 years ago
This has always interested me. I need to do decently big storage for genomic data. It doesn't have to be fast, but it needs to be able to survive one data center blowing up. If I have 3 data centers, need to store 2-3 petabytes, and need storage to survive in the case of a data center failure, the solutions really narrow down when you have to get under the $200/TB range.

Playing with Swift now, but it has really opened my eyes to how much more difficult 2-3 petabytes of storage is (disk failures, number of disks in your infrastructure, the time to redeploy a datacenter on a 1 Gbps connection). All the little problems become much bigger!
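
To give a rough sense of the redeploy time mentioned here, a back-of-the-envelope sketch follows. It assumes decimal petabytes, a link that is fully utilized the whole time, and that the entire 2-3 PB has to be copied; `transfer_days` and its `efficiency` parameter are hypothetical names introduced for illustration, and a real transfer would likely be slower.

```python
def transfer_days(petabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `petabytes` over a link of `link_gbps` gigabits per second.

    `efficiency` is the fraction of the nominal link speed actually achieved
    (1.0 is an optimistic assumption).
    """
    bits = petabytes * 1e15 * 8                      # decimal PB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 86400                           # seconds -> days


for pb in (2, 3):
    print(f"{pb} PB over 1 Gbps: about {transfer_days(pb, 1):.0f} days")
```

Under these assumptions, 2 PB takes roughly 185 days and 3 PB roughly 278 days, which is why a single 1 Gbps link makes a full re-copy impractical.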

harel, over 10 years ago
Tech aside, I'm quite curious about the economics of storage here. By the price tags and 'Debian 4' I'm guessing this is an older post. But still, $7867 per 67TB and $5 per month means they need 131 users paying for one year to recoup the cost of one pod, assuming those 131 do not generate over 67TB worth of storage in that period of time. I've not factored in data centre costs, salaries, etc. Just a pod. I'm guessing they have enough users, as they have been around for many years now, but still, $5 seems a bit on the cheap side to me (not that I'm complaining).
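
The break-even arithmetic in this comment can be checked with a short sketch. It uses only the figures quoted above ($7,867 per pod, 67 TB, $5 per month); the function names are made up for illustration, and like the comment it ignores data centre costs, salaries, and drive replacement. Strictly, 131 subscribers bring in $7,860 over a year, which is $7 short of the pod price, so rounding up gives 132.

```python
import math

POD_COST_USD = 7867       # hardware cost of one pod, as quoted in the comment
POD_CAPACITY_TB = 67      # raw capacity of one pod, as quoted in the comment
PRICE_PER_MONTH_USD = 5   # subscription price, as quoted in the comment


def subscribers_to_recoup(months: int) -> int:
    """Smallest whole number of subscribers whose fees cover one pod."""
    revenue_per_subscriber = PRICE_PER_MONTH_USD * months
    return math.ceil(POD_COST_USD / revenue_per_subscriber)


def average_quota_gb(subscribers: int) -> float:
    """Average space per subscriber before the pod is full (decimal GB)."""
    return POD_CAPACITY_TB * 1000 / subscribers


n = subscribers_to_recoup(12)
print(f"{n} subscribers, about {average_quota_gb(n):.0f} GB each")
```

In other words, the pod pays for itself within a year only if those subscribers average no more than roughly half a terabyte each.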

corv, over 10 years ago
I wonder how they can guarantee data integrity.

Are they checksumming on a higher level, and is that cheaper than using ZFS with necessarily more expensive hardware?
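
As an illustration of what checksumming "on a higher level" could mean, here is a minimal sketch that records a SHA-256 digest when a blob is stored and re-verifies it on read. This is an assumption about the general technique, not a description of Backblaze's software; `digest_of` and `is_intact` are hypothetical names.

```python
import hashlib


def digest_of(blob: bytes) -> str:
    """Return the SHA-256 hex digest recorded when the blob is first stored."""
    return hashlib.sha256(blob).hexdigest()


def is_intact(blob: bytes, recorded_digest: str) -> bool:
    """Recompute the digest on read and compare it with the recorded one."""
    return digest_of(blob) == recorded_digest


original = b"example file contents"
recorded = digest_of(original)

# Simulate silent corruption by flipping one bit in the first byte.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(is_intact(original, recorded))   # True
print(is_intact(corrupted, recorded))  # False
```

A check like this detects corruption regardless of the filesystem underneath, but it does not repair anything; recovery would still depend on having a good copy elsewhere.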

immortalx, over 10 years ago
I decided to try it. You can only select entire hard drives and work around this by excluding folders. That's odd, but what I don't understand is why you cannot exclude your c:/ (or Main Drive). Why should anyone be forced to back up something?

Seems to me like the design is backwards and doesn't make any sense.

ciupicri, over 10 years ago
If you submit an old article even if newer versions of it exist, at least mention the year in parentheses.

hendzen, over 10 years ago
Honest question, why JFS vs say, ext4?

fredsted, over 10 years ago
> In the future, we will dedicate an entire blog post to vibration.

In the meantime, does anyone have a link?

aliakhtar, over 10 years ago
Do they only do backups or also cloud storage? If they have an API for uploading / deleting / viewing files, I'd use them over S3 given how much lower their costs are. But I can't find any info on that on their website.

mschuster91, over 10 years ago
I'd love it if either Backblaze or a 3rd party made a business of selling these pods!

Edit: just spotted it, their boot drive is PATA?! Why is this, given that PATA drives are slower and more expensive than SATA ones?

ksec, over 10 years ago
Need to add the year to the title. This post is old.

tkinom, over 10 years ago
Great article!

Would love to see more write-ups on the software selection process, tradeoffs, failure recovery process/methods, and benchmark data.

sidcool, over 10 years ago
Quite detailed post. Loved reading it.

iflyun, over 10 years ago
Why such an old and expensive CPU?

codeonfire, over 10 years ago
What do you do when a drive goes bad? Do you move ~50TB of data, pull the entire pod out of the rack, and then try to determine which of 45 drives is bad?