I worked on the design of Dropbox's exabyte-scale storage system, and from that experience I can say that these numbers are all extremely optimistic, even with their "you can do it cheaper if you only target 95% uptime" caveat. Networking is much more expensive, labor is much more expensive, space is much more expensive, depreciation is faster than they say, etc etc. I don't think the authors have ever done any actual hardware provisioning before.

I didn't read all their math but I expect their final result to be off by a factor of 2-5x. Hard drives are a surprisingly low percentage of the cost of a storage system.
I work in telecom/datacenter infrastructure and this is fanciful. The whole way they take the wattage load of one machine and then hand wave away all of the rest of the costs of either building and running a datacenter, or paying ongoing monthly colocation costs... Is just scary. I truly don't mean to offend anyone but this looks like a bunch of enthusiastic dilettantes.

Generators?

UPS?

Cooling costs?

Square footage costs for the real estate itself?

Security and staffing?

At the scale they intend to accomplish they will need at minimum several hundred kilowatts of datacenter space. Even assuming somewhere with a very low kWh cost of electricity, that much space for bare metal things isn't cheap. Go price a lot of square footage and 300kW of equipment load in Quincy, WA or anywhere else comparable, the monthly recurring dollar figure will be quite high.

And all of that is before you even start to look into network costs to build a serious IP network and interconnect with transits and peers.
In 2018, I spent about six weeks running a series of tests to measure Sia's real world costs. At that time, storage cost ~$4.50/TB on Sia to back up large real world files (backups of DVDs and Blu-Rays).[0] Community members have re-run my tests every few months, most recently in October 2019, when the cost was measured at $1.31/TB, though it's worth noting that recent tests use synthetic data optimized to minimize Sia's cost.[1] It's also unclear how much the market value of Sia's utility token affects these costs, as the price of Siacoin has fallen by ~80% since I conducted my original set of tests.

The calculations in today's blog post account for the labor cost of assembling hardware, but leave out other major labor costs:

1. You need an SRE to keep the servers online. Sia pushes out updates every few months, and the network penalizes you if you don't upgrade to the latest version. In addition, to optimize costs, you need to adjust your node's pricing in response to changes in the market.

2. You need a compliance officer to handle takedown requests. Since Sia allows anyone to upload data to your server without proving their identity, there's nothing stopping anyone from uploading illegal data to the network. If Sia reached the point where people are building $4k hosting rigs, then it's safe to assume clients would also be using Sia to store illegal data. When law enforcement identifies illegal data, they would send takedown notices to all hosts who are storing copies of it, and those hosts would need someone available to process those takedowns quickly.

[0] https://blog.spaceduck.io/load-test-wrapup/

[1] https://siastats.info/benchmarking
I'm going through Sia's website now. It seems this article is meant to bolster the claim on their website which states "When the Sia network is fully optimized, pricing will fall somewhere around $2/TB/month." [1]

Call me skeptical but it seems that they aren't committing to building out this infrastructure themselves or providing a specific amount of storage at this pricing. They seem to be outlining a potential infrastructure that some enterprising individual (or corporation) could use to provide storage at that price to "renters" within their marketplace.

I guess I'll just wait until someone puts their money where their mouth is. Given that this is a marketplace, the fact that a theoretical setup could be built to provide some service doesn't necessarily guarantee it will be built.

1. https://support.sia.tech/article/thvymhf1ff-about-renting
> That means about 2 hours of labor per rig. We'll call that $50

Does that seem low to anyone else? I don't really have any background in the area, but $25/hr cost *to the company* would be less than $20/hr pay for the skilled labor. Other countries are different of course, but in the US I could make that much flipping burgers in the right area.
There is way too much hand-waving and assuming going on in this article. It is a load of BS that does not take into account real-world inefficiencies. e.g. sometimes buying in bulk is more expensive than buying at retail, esp when you need consistent supply. Sure, you may need only an hour of sysadmin time a day, but what sysadmin will let you employ them an hour a day? The buildout did not list a CPU. The assumptions about uptime are over-amortized; an outage given the resources they quote may average out to 95% uptime, but their latency for getting systems back up is going to be absolutely terrible and I'd be surprised if outages were shorter than a day or two on average. They aren't factoring in cooling. They aren't factoring in the drastically reduced lifetime of drives in their ridiculously cramped and under-ventilated cubbies. They are completely ignoring diagnostic time, presuming they can only quote actual repair times, which is an absolute joke given the lack of smart hardware and enterprise DC management. They think they can average out throughput over the number of drives without taking into account per-channel limitations. They are not taking into account the extra time to build and dismantle systems in their hacked-together IKEA shelves. They are underestimating the costs of electricity at commercial rates. I could go on and on, but suffice to say that I would never, ever use their network for any purpose without another backup (which they don't figure into their costs, of course ;). I thought B2 was risky; this is taking it to an entirely different level.
I feel like Backblaze has already done most of this and has it in production [1], whereas this is just a back-of-the-napkin calculation.

[1] https://www.backblaze.com/b2/storage-pod.html
One interesting point of reference is that Backblaze currently charges $5 / TB / Mo. Assuming they haven't changed their profit margin of 50% from 2017 (https://www.backblaze.com/blog/cost-of-cloud-storage/), then this would imply that they have a direct cost of roughly $2.50 / TB / Mo.
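In code form, that back-of-the-envelope is just (both numbers taken from the post above, nothing else assumed):

```python
# Implied direct cost if Backblaze's ~50% gross margin from 2017 still holds.
price_per_tb_month = 5.00
gross_margin = 0.50                       # from their 2017 cost-of-cloud-storage post
direct_cost = price_per_tb_month * (1 - gross_margin)
print(direct_cost)                        # ~$2.50 / TB / month
```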
Top of Hacker News and there's nothing clickable above the fold that takes me to the Sia website.

Content marketers and technical marketers - don't miss the opportunity on Medium and other platforms to at the VERY LEAST link to your homepage in the first section.

In fact, what is at the top of this awesome piece of content marketing is a "Sign Up" button for Medium . . .
I've been using Sia for about three months to back up some personal files. Nothing crazy, but it seems to work well.

I'm looking forward to seeing this project mature as well as having some more layers built on top of it moving forward. I really wish the client offered synchronization or access across multiple devices. For now you have to try third party layers on top of Sia to accomplish this.
Really smart people make this mistake a lot, so I'm wondering what Sia is doing to decorrelate failure rates. If hedge fund quants can turn mortgage tranches into a machine for massive correlated economic losses, can blockchain quants turn storage tranches into a machine for massive correlated storage losses?

Or if one of the major hyperscalers or datacenter operators decides to start selling storage to Sia, it seems likely that their control plane across datacenters could result in correlated failures. A networking outage for their AS could result in multiple datacenters appearing offline concurrently, for example.
This analysis entirely omits the cost of a sysadmin to manage the storage servers. Even if Sia is assumed to do almost everything, and even if we only want 95% uptime, you still need someone to deal with software updates, hard drive monitoring, etc etc.

The profit of $570/year/box is not enough to pay a part-time sysadmin and still have any useful profit.
> If we assume that the 30 hosts go offline independently

I wonder how reasonable this assumption really is. For regular CPU-bound crypto-mining we see that it tends to centralize geographically in zones where electricity, workforce and real-estate space to build a datacenter are cheap.

Assuming that Sia ends up following a similar distribution, it wouldn't be surprising if several of these hosts ended up sharing a single point of failure.

Beyond that, if only copying stuff around three times to provide tolerance is enough to lower the costs to $2/TB/Mo, why aren't centralized commercial offerings already offering something like that? Just pool three datacenters with 95+% uptime around the world and you should get the same numbers without the overhead of the decentralized solution, no? Surely the overhead of accounting for hosts going offline and redistributing the chunks alone must be very non-trivial. With a centralized, trusted solution it would be much simpler to deal with.

Or is the real catch that Sia has very high latency?
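A rough sketch of how much work the independence assumption is doing (30 hosts and 95% uptime come from the quoted line; the 10-of-30 threshold and the shared-facility numbers are my own hypothetical choices, not Sia's actual parameters):

```python
# Availability of a k-of-n erasure-coded file when each host is independently
# online with probability p, versus the same file when many of those hosts sit
# behind one shared point of failure.
from math import comb

def availability_independent(n, k, p):
    """P(at least k of n hosts online), hosts assumed independent."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k, p = 30, 10, 0.95                 # hypothetical: need any 10 of 30 pieces
print(availability_independent(n, k, p))   # effectively 1.0 under independence

# Now suppose 25 of the 30 hosts share one facility that is dark 1% of the
# time; when it fires, the remaining 5 hosts can't reach the 10-piece threshold.
shared_down = 0.01
print((1 - shared_down) * availability_independent(n, k, p) + shared_down * 0.0)
# ~0.99: the correlated failure alone caps you at two nines, no matter how
# good the per-host uptime looks.
```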
Wait, how are they connecting 32 drives to that motherboard? They seem to be implying they are splitting each SATA plug 4 ways, which as far as I know is impossible.

The adapter they're linking to is SFF-8087 to 4x SATA, not SATA to 4x SATA (which shouldn't exist). That motherboard doesn't have SFF-8087, it has 8 SATA3 connections.

Unless I've missed something big, SFF-8087 can not be plugged into SATA3.
I don't think it is correct to say that the only options are "host failures are truly independent" or "world war three".<p>The hosts are not ever going to be fully independent. There will be hundreds, if not thousands, of hosts co-located in the same location -- likely of the cheapest grade, without any extras like fire alarms or halon extinguishers or redundant power feeds.
A single fire (flood, broken power station) has a chance of taking out thousands of hosts simultaneously.

And there is the management system as well -- AWS has thousands of engineers working on security. Will there be one at this super-cheap farm? What are the chances there will be farms with default passwords and password-less VNC connections? And since machines are likely to be cloned, any compromise affects thousands of hosts.

... and all of those things are made worse by the fact that if you store hundreds of thousands of files, your failure probability rises significantly. If a data center burns down, at least a few of your files may be unlucky enough to be lost.
at a minimum the facility will need some power conditioning and/or insurance. you don't want a brief power surge to eat all of your capital, and lockup fees, in one go.

> For a 32 HDD system, you expect about 5 drives to fail per year. This takes time to repair and you will need on-site staff (just not 24/7). To account for these costs, we will budget $50 per year per rig.

will you not also lose 6TB (times utilization) of your lockup every time a drive dies?

> 8x 4 way SATA data splitters

you've linked to SAS breakout cables. they don't plug into SATA ports, they plug into SFF-8087 SAS ports.

they cannot plug into the motherboard you've listed. nor have I ever seen one listed for retail sale that has 8 SFF-8087 ports.

the cheapest way to get 8 SFF-8087 ports is with some SAS expander card, and a SAS HBA. even scraping off eBay that's another $50 per host, and two more components to fail.

there are also actual SATA expanders out there, but they last about 3 months before catastrophic failure in my experience.
Big deal. I charge $5 per TB per month and I'm not even trying to be cheap.<p>The economies of scale should make this much less expensive. Colocating your own machine in a real datacenter and hosting your own data shouldn't still be cheaper than practically all of "the cloud" offerings, but it is. What does that tell you about "the cloud"? It's marketing bullshit.<p>Sure, it's fine for occasional use, but anyone using the hell out of "the cloud" can easily save money by using anything else.
From their site: [1]

"Sia is a variant on the Bitcoin protocol that enables decentralized file storage via cryptographic contracts"

[1] https://sia.tech/sia.pdf
I don't know anything about the subject, so no idea if these claims are realistic. But whatever, either they deliver or they don't.

My (or their, actually) problem is I don't really get what they are offering right now. There is an impressive landing page with big numbers and pretty pictures which explains pretty much nothing. The project seems to have been in production for at least 3 years, there are some apps, but I don't actually see if I can use it to backup/store some data and how much it costs right now. I mean, they say "1TB of files on Sia costs about $1-2 per month" right there on the main page, but it cannot be true, right? It's just what they promise in the hypothetical future, not the current price-tag?

The only technical question I'm interested in here is why they actually need a blockchain? This is always suspicious and I don't remember if I saw *any* startup at all that actually needs it for things other than hype. It is basically their internal money system to enable actual money exchange between storage providers and their customers, right? So, just a billing system akin to what telecom and ISP companies have? Is it cheaper to implement it on a blockchain than by conventional means? How so?
On a related topic, I've had a ton of problems finding a cloud storage system that will reliably handle files around 100-200gb. Does anyone have a recommendation for a service that can handle that file size with ease?
So no CPU (or APU, so you don't need a GPU), no RAM, and those breakout cables are actually for SAS, but no SAS card listed in the total. This does not inspire confidence in the project at all.
Interesting article, but "black swan situations like world war three" may be an underestimation. Software bugs are more likely and sometimes fatal.

I wonder why transfer prices are not included? As you explain, every transfer is paid for, so does that mean one has to pay for 10 uploads of every single object? And as equipment ages and peers go out of business, who pays for the data rebalancing transfers?
It's probably feasible to reach these levels of cost. I certainly still keep NAS in two locations because even places like Hetzner don't sell lower powered machines with lots of disk space. But the build they specify doesn't have a CPU or RAM and it's using a SAS cable to connect to a SATA motherboard. Depending on the requirements of the platform they may be able to get away with non-ECC RAM, a simple APU to not need a graphics card, and a few cheap SATA PCIe cards to get enough connections. It will probably add ~$500 or ~10% to the build though. I don't know if the other costs have similar issues.
I was expecting an ad based on the title, but it ended up being an interesting analysis of just how much storage ends up costing them with their focused hardware setup.
Honestly, I'm a bit confused on who the targeted audience is for this article. I've been running as a host for Sia for months now. My rig is a Raspberry Pi and a 10TB external HDD I had lying around.
I worked for a p2p startup 15 years ago. We were exploring ideas and products in this space. We came close to partnering with a company doing distributed cloud storage. Their idea was to allow people to rent storage space in personal computers.<p>We decided to scrap the plan to do p2p storage, ended up using cloud storage. This p2p storage idea is a tough one. People are not willing to make a few dimes renting out their hard drive or CPU. The economic unit is too small to work. But good luck trying this idea. I wouldn't be surprised if someone tries again in 20 years. :)
The IO operations amplification for 64 of 96 is pretty brutal, and particularly unfavorable in a world where capacity-per-IOPS keeps trending up. I wonder how they'll deal with that.
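To put rough numbers on that amplification (the drive IOPS figure and fan-out framing are my own assumptions, not from the article):

```python
# Back-of-the-envelope: why a 64-of-96 scheme is hard on spinning disks as
# drives keep getting bigger.
def random_iops_per_tb(drive_tb, drive_iops=100):
    # A 7200 RPM drive does roughly ~100 random IOPS regardless of capacity,
    # so IOPS per stored TB falls as capacity grows.
    return drive_iops / drive_tb

for tb in (4, 8, 16):
    print(f"{tb} TB drive: ~{random_iops_per_tb(tb):.1f} random IOPS per TB")

# With 64-of-96 erasure coding, one logical read fans out into 64 small reads
# (one seek each) on 64 different drives, and one write touches 96 drives,
# versus 1 seek for a single-copy layout.
read_fanout, write_fanout = 64, 96
print(f"read amplification: {read_fanout}x seeks, write amplification: {write_fanout}x")
```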
The most important thing not addressed here is demand. Last I checked (granted, this was a while ago) it simply wasn't there, meaning if you built this rig you might only be able to rent out a small part of it.

If this has changed I would be interested in hearing about it.

One other thing I am not understanding is how this makes financial sense, even if the demand is there. If I am buying a rig for 4500 bucks to get 200TB, making "$570 a year in profit" is nowhere near exciting enough. Practically any other use pays more. Renting a dedicated server for a game, web hosting, hell even GPU mining makes more.

(A single 1080ti can do about $1 a day in gross revenue on grin/eth/etc, and can be had used for ~400 bucks. Or you can get a p102, which is the mining-card version with no display output, for 250 bucks. Payback including power costs etc. is well below the 10 year threshold of Siacoin.)

Now where it might be interesting (IF there is demand) is just adding hard drives to an existing infrastructure already in place. So if you are a GPU miner and have 1000 rigs already in place, just adding a single 4TB hard drive to each machine might not be too bad. They go for about $50 each used and, according to this, will pay back $8 a month with minimal extra costs.
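To make the payback comparison concrete, using the figures above (the GPU power cost of roughly $0.30/day is my own assumption; everything else is from this comment and the article):

```python
# Rough payback periods, gross and illustrative only.
def payback_months(capex, monthly_net):
    return capex / monthly_net

sia_rig      = payback_months(4500, 570 / 12)          # $4500 rig earning $570/year
used_hdd_4tb = payback_months(50, 8)                    # $50 used 4TB drive earning $8/month
gpu_1080ti   = payback_months(400, (1.00 - 0.30) * 30)  # $1/day gross minus assumed power

print(f"Sia rig: ~{sia_rig:.0f} months")                # ~95 months (about 8 years)
print(f"add-on 4TB HDD: ~{used_hdd_4tb:.1f} months")    # ~6 months
print(f"used 1080ti: ~{gpu_1080ti:.0f} months")         # ~19 months
```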
The website seemed okay and even useful until:

"Both renters and hosts use Siacoin, a unique cryptocurrency built on the Sia blockchain. Renters use Siacoin to buy storage capacity from hosts, while hosts deposit Siacoin into each file contract as collateral."
See also Tardigrade: https://tardigrade.io/

It is $10/mo/TB, but has different uptime, speed and security characteristics.
We at croit.io operate Ceph-based storage, including everything from datacenter, power, switches, labor, and licenses, at a price point of €3/TB.

No consumer hardware.
There is one big problem that I've not seen anyone else point out with systems like this. I know because I did the calculation early on with Peergos and came to the conclusion that it doesn't work.

The problem comes when you want to store multiple files. If the corresponding erasure code fragments from different files are not stored on the same server, then you don't have correlated failures. Contrast this with a typical RAID scheme where a failed drive means the nth erasure fragment of every file is gone: correlated failures. If the failures across different files are not correlated, which is the case if you're storing each new block on a random node, then you are basically guaranteed to lose data once you have enough files. Depending on your scheme, this can happen with as little as 1 TiB of data for a user. It is similar to the birthday paradox.

For erasure codes to work for a filesystem you need to have correlated failures.
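A minimal sketch of that compounding effect, with made-up parameters (not Peergos's or Sia's real numbers): a per-block loss probability that looks negligible becomes a near-certainty once you have enough independently placed blocks.

```python
# Each block is 10-of-15 erasure coded across a random set of hosts, and 10%
# of hosts disappear over some period.
from math import comb

def block_loss_prob(n, k, host_loss):
    """P(a block is unrecoverable) = P(more than n-k of its n hosts are lost)."""
    return sum(comb(n, i) * host_loss**i * (1 - host_loss)**(n - i)
               for i in range(n - k + 1, n + 1))

n, k, host_loss = 15, 10, 0.10
p_block = block_loss_prob(n, k, host_loss)       # ~0.2% per block: looks fine

blocks = (1 * 2**40) // (4 * 2**20)              # 1 TiB stored as 4 MiB blocks = 262,144 blocks
p_any_loss = 1 - (1 - p_block) ** blocks         # but across independent blocks...
print(f"per-block loss: {p_block:.4%}, P(lose at least one block): {p_any_loss:.6f}")
# With fragments placed independently per block, losing *something* is near
# certain; with RAID-style placement the same host failures hit the same
# fragment index of every block, so losses are correlated instead of spread
# across every user.
```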
This tech seems very cool, but with only 200TB stored I worry that it is destined to not pay for its overheads. No big project can survive on a revenue of $20/month!

When will the project grow some mobile apps like Dropbox or Google Drive that you can just put a credit card number into, pay a few bucks and know your data is safe?
This is pretty incredible both from a product perspective, as well as the potential to push the whole industry towards a race to the bottom. Equilibrium here pushes storage, processing, and availability towards distributed nodes, unless high availability is required for some unique business case
Offtopic question: I went through their website, and I don't know much about blockchain. I have gone through the documentation, and almost everything they are doing seems possible without it as well. Cryptographic proofs for storing data, smart contracts et al; not sure how it is different from regular encrypted deduplication with standard per-hour / per-minute billing. Also, Siacoin for payment; not sure if it is the most optimal way.
I think I am missing something, would be glad if someone can point me in the right direction.
You lost me on the homepage (sia.tech): there are only 895 hosts storing a total of 206TB right now. That, coupled with the shady infrastructure.

You aren't touching my enterprise data, not even the cold storage logs.
For people that use Plex... Do you think that it's a good place? If you have several TB, is it cheaper to use your own PC with (let's say) 2x10TB HDD + 2x10TB HDD backup, or is it better to go online today? When I check the prices, I feel that it's always more expensive.

For my backup, it's not synced in real time but I do a manual backup every 3 months. I can lose some data but I feel ok with that.
The website sia.tech required a Google captcha challenge for me to even load, clicking through from the article.<p>So.. that turned me off in an instant.
Where are these datacenters? I live in Ohio in the area of two of the points on the home page at https://sia.tech/. One appears to be a private residence or a farm. The other dot is literally on a golf course fairway, is a private residence, or a power substation.
I wish there was an easy solution that would allow me to plug in an S3 bucket or a virtual drive or whatever and mount it as a partition in my cheap 20GB VPS.

The price for an additional "drive", like 5 bucks per month for 50GB or something, is insane. Especially when comparing with Dropbox or OneDrive pricing (or even physical drives sold over the counter).
I expected this title to be about https://www.scaleway.com/en/c14-cold-storage/ which already offers exactly that, cloud storage for $2 per TB per month.
Does any consumer grade motherboard have IPMI* support? When I tried to optimize my server costs one issue I ran into was that colocation providers require IPMI capability, which seems only available in server-grade motherboards.

* IPMI is for remote hardware management
> It also turns out that 32 HDDs only consume 200w, so the 750w PSU we picked is more than sufficient.

Yep. Stopped reading right there. HDDs use ~15 watts each when they spin up. I experienced this and I never allocate less than 20 watts per hard disk.
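A quick power-budget check with assumed per-drive figures (the idle and base-system numbers are my own rough estimates, only the per-drive spin-up figure is from the comment above):

```python
# Spin-up draw is what sizes the PSU, not steady-state draw.
drives = 32
idle_w, spinup_w = 6, 20           # rough per-drive numbers for 3.5" HDDs
base_system_w = 60                 # assumed motherboard + CPU + fans

steady_state = drives * idle_w + base_system_w     # ~250 W, near the article's "200w" figure
worst_case   = drives * spinup_w + base_system_w   # ~700 W if all drives spin up together
print(steady_state, worst_case)
# A 750 W PSU leaves almost no headroom at spin-up unless the controller
# supports staggered spin-up, which cheap SATA setups often don't.
```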
In addition to what other people mentioned, there is also a huge cost in managing all the metadata that you'd get from billions of files. This gets even worse if you're using a crazy 64-32 encoding.
one thing that bothers me about this system, beyond the hand-waving math, is: what would motivate me to give up control over whether my data is available? If I have no SLA and no ability to convince a bunch of down hosts to come back online with my data, why store it that way at all?
Great writeup from David as always. Can't wait to see more!

If anyone hasn't seen the work being done on the skynet platform, I highly recommend taking a look. Amazing stuff.