Storing 25 petabytes of Megaupload data costs us $9,000 a day

313 points by ttt_ about 13 years ago

31 comments

johngalt about 13 years ago
I've dealt with e-discovery sets. No one really has answers to what to do when you have a litigation hold on data. Legislation commonly requires "retention of anything related to X case", but how do you know what's relevant and what isn't? When you are a third party the ambiguity increases. So you end up with an <i>everything and kitchen sink</i> data dump. Even with <i>everything</i>, the data is commonly useless without context. You have files without access logs, logs referencing local namespaces, etc...<p>With a 25 petabyte discovery, I'm not surprised that everyone's scratching their heads on what to do next. This isn't just an MPAA/Megaupload problem. Even a smaller dataset like a 10-20TB discovery has numerous problems. Hosting/indexing/classifying/reviewing millions of documents is an open issue for the legal field. What do you do when there are multiple parties who all need to see "everything"? If everyone does their own thing, how do you reference materials in a consistent manner across the interested parties? If you all agree to host the data in a neutral place, who pays for it? What if the technology of that host benefits one party at the expense of another?<p>For years the legal field has had a "print it all out and have a team of paralegals go over it" viewpoint. Clients don't pay for computers, but they do pay for paralegal hours. Only recently has that become untenable. Discovery sizes are growing exponentially. It's common to have a new discovery set come in larger than every previous set combined, and the legal industry doesn't really know what to do about it.
shrike about 13 years ago
The federal government does have a process for this sort of thing: if they seize an alleged drug dealer's house and that house has a mortgage, the United States Marshals Service will pay the mortgage. If they seize cars, furniture, or other assets, the government is responsible for the storage of those items until the case has been resolved. [1]<p>I would guess that MegaUpload's lawyers will claim that the data on those servers is critical to their defense and must be maintained. That is probably an accurate claim: DotCom will want to present evidence of compliance with DMCA notices, counter the claim that a "majority" of the content was under copyright, etc. The best case for DotCom would probably be that his lawyers argue for retaining the data and the judge lets Carpathia destroy it anyway. That would give DotCom reasonable grounds for appeal.<p>[1] <a href="http://en.wikipedia.org/wiki/Asset_forfeiture" rel="nofollow">http://en.wikipedia.org/wiki/Asset_forfeiture</a>
anon808 about 13 years ago
It sucks, but that's the price of doing business. They chose their customer, and now are (unfortunately) tied to consequences. Same thing happens to building owners who have a crime committed by a tenant, the leased space becomes a crime scene until the police/govt are done with their investigation.
tripzilch about 13 years ago
So, 25 petabytes ... 25 million gigabytes. Anyone care to guess how much of this data is illegitimate? And how much of <i>that</i> is under MPAA's copyright?<p>Back-of-the-envelope calculation: I just did a search for "1080" on some unnamed site, and it appears a Blu-ray rip of a movie encodes to roughly 10GB. So that would be <i>2.5 million</i> movies in 1080p quality. I don't think we've made that many, have we? Especially if you consider that movies that came out before the "high-definition era" are encoded to about a 10th of that size (700MB-2GB roughly, afaik).<p>Maybe I'm missing something obvious.<p>Not counting TV series, for instance (are they also intellectual property represented by the MPAA? I'm not in the USA so I never really dug into that).<p>Movies duplicated in other quality formats are usually a 10th of the size of a 1080p Blu-ray rip or less, so as an upper limit I could add a factor of 1.5 for those duplicates.<p>But then, the "long tail" of movie rips is 700-800MB and has no duplicates.<p>Unless ... is the MPAA also representing porn? Because then all bets are off, and I can easily accept that these 25 petabytes consist mostly of MPAA-protected intellectual property.<p>But otherwise, what percentage of these 25 petabytes would you estimate actually represents illegitimate data owned/represented by the MPAA? 2%? 10%?<p>Is that fair to the owners of the other 90% of the data? Even if it's probably mostly porn? (I'm fairly sure most of the data has to be porn.)<p>I'm just wondering. Also because it's interesting to speculate what could be in these 25 petabytes. If you have a better guess I'd love to hear it :)
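The movie count above can be reproduced in a few lines. This is a sketch that assumes decimal units (1 PB = 1,000,000 GB) and takes the roughly 10GB-per-1080p-rip figure from the comment; `rips_that_fit` is a hypothetical helper, and the result only restates the estimate, not what the servers actually hold.

```python
# Back-of-the-envelope check of the comment above.
# Assumptions: decimal units (1 PB = 1,000,000 GB) and the commenter's ~10 GB per 1080p rip.
TOTAL_PB = 25
GB_PER_PB = 1_000_000
GB_PER_1080P_RIP = 10

def rips_that_fit(total_pb: float, gb_per_rip: float) -> float:
    """How many rips of a given size fit into total_pb petabytes."""
    return total_pb * GB_PER_PB / gb_per_rip

print(f"{rips_that_fit(TOTAL_PB, GB_PER_1080P_RIP):,.0f}")  # 2,500,000
# Pre-HD rips at the low end of the commenter's range (~700 MB) would fit many more copies.
print(f"{rips_that_fit(TOTAL_PB, 0.7):,.0f}")
```

Changing `gb_per_rip` shows how sensitive the estimate is to the assumed encode size.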
orbitingpluto about 13 years ago
In civil or criminal asset forfeiture, the state can conceivably confiscate property if it was used for, or enabled, a crime. In some jurisdictions it doesn't even matter if the owner of the property and the criminal have really nothing to do with each other (e.g., your stolen SUV was used to rob a liquor store).<p>Also, the government could probably have seized everything anyway as evidence. The problem with that is that setting up that much rack space and network infrastructure isn't cheap.<p>That's Carpathia's basis for compensation: they are providing a service to the government. Seems like a no-brainer.
VikingCoder about 13 years ago
Help me out with the math here:<p>1 terabyte costs them $128.41 per year, right?<p>Amazon S3 would cost them roughly $444 per year if they were using Reduced Redundancy Storage.<p>The cheapest HD that I see on pcpartpicker (in terms of price/GB) is the Western Digital Caviar Green 2.5 TB (5400 RPM) for $135.43, which is $0.054/GB. That's $54.17 per TB.<p>If you want a single backup, that's $108.34 per TB. Two backups (3 copies of each file) is $162.51 per TB.<p>So, if I'm doing this right, triple redundancy on the cheapest consumer hardware (by price/GB) pays for itself as long as their HDs last at least 15 months on average. And I'm not even counting their power, network, cooling, or puny humans to maintain it all. That means their HDs, if they were made out of the cheapest parts I could find, would have to last significantly longer than 15 months on average.<p>They're actually doing really well on price, if you ask me.<p>Or am I missing something obvious, or doing the math horribly wrong?
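The arithmetic in this comment can be checked with a short script. It is a sketch under stated assumptions: the $128.41/TB/year figure appears to come from $9,000/day × 365.25 days ÷ 25,600 TB (reading 25 PB as 25 × 1024 TB), which is our reconstruction rather than something the commenter states; the disk price is the one quoted above, and `breakeven_months` is a hypothetical helper.

```python
DAILY_COST_USD = 9_000
TOTAL_TB = 25 * 1024      # assumption: 25 PB counted in binary units
DAYS_PER_YEAR = 365.25    # assumption: averaged year length

# Annual storage fee per TB at the quoted daily rate.
cost_per_tb_year = DAILY_COST_USD * DAYS_PER_YEAR / TOTAL_TB

# Consumer-disk price quoted in the comment: $135.43 for a 2.5 TB drive.
disk_cost_per_tb = 135.43 / 2.5

def breakeven_months(copies: int) -> float:
    """Months of storage fees that would pay for `copies` full sets of raw disks."""
    return copies * disk_cost_per_tb / cost_per_tb_year * 12

print(f"${cost_per_tb_year:.2f} per TB per year")            # $128.41 per TB per year
print(f"${disk_cost_per_tb:.2f} per TB of raw disk")          # $54.17 per TB of raw disk
print(f"{breakeven_months(3):.1f} months for three copies")   # 15.2 months for three copies
```

Like the comment, this ignores power, networking, cooling, labor and enclosures, so the real break-even point would be later.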
bshep about 13 years ago
It's the storage disks the government needs, not the rest of the server hardware. If they can't come up with an agreement, then shut down all the servers, take out the disks, catalog them, and put them in a warehouse somewhere. They are then free to reuse the rest of the servers for something else.<p>That would satisfy the needs of the government if they need access to the data, preserve it in case people are allowed to download it in the future, and prevent the MPAA from complaining that it was given back to Megaupload.<p>I'm sure the cost of storage would not be minimal, but they could still use the rest of the hardware and not have to keep the servers powered up.<p>Possible problems:<p>- Maybe the servers can't be shut down and brought back up without certain passwords or encryption keys<p>- Labor cost of shutting down and cataloging all those disks (if done progressively, it would probably work)<p>- Others?
nextparadigms about 13 years ago
This is why the US Government shouldn't have seized the site first and asked questions later. They should've brought a case against them, let them keep hosting the data, and, if they were found guilty, <i>then</i> taken it down.
brownbat about 13 years ago
The urgency is because Carpathia's lease has run out; they can't stay at the $9k/day facility.<p>Carpathia has to pay $65k to move the servers, then $37k per month to keep them in a climate-controlled facility while powered down. Lost profits are still a relevant consideration. This is a doozy of a damages calculation. What's the depreciation on assets that are rendered obsolete by (something like) Moore's law?<p>I'd say Carpathia deletes the data and then supports the petitioners (those with lost data) in a Takings Clause case against the government. Carpathia claims indemnity against claims by pointing at MegaUpload and the Feds, but probably gets joined in a bunch of messy lawsuits. Real roll of the dice.
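Using only the figures in this comment, one can ask how quickly the one-time $65k move beats staying at $9k/day. This is a rough sketch: it assumes a flat 30-day month, ignores the lost profits and depreciation the comment raises, and `cumulative_cost` is a hypothetical helper.

```python
MOVE_COST_USD = 65_000          # one-time cost to move the servers (from the comment)
STORED_PER_MONTH_USD = 37_000   # powered-down storage per month (from the comment)
CURRENT_PER_DAY_USD = 9_000     # current facility cost per day
DAYS_PER_MONTH = 30             # assumption: flat 30-day month

def cumulative_cost(days: int, move: bool) -> float:
    """Total spend after `days` days, either staying put or moving once on day 0."""
    if move:
        return MOVE_COST_USD + STORED_PER_MONTH_USD * days / DAYS_PER_MONTH
    return CURRENT_PER_DAY_USD * days

# First whole day on which moving has become cheaper than staying.
breakeven_day = next(
    d for d in range(1, 366) if cumulative_cost(d, True) < cumulative_cost(d, False)
)
print(breakeven_day)  # 9
```

On these numbers the move pays for itself within about nine days, which is why the open question is who bears the ongoing cost rather than whether moving makes sense.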
ericd about 13 years ago
Why are options that would destroy any chance at Megaupload conducting business in the future even on the table before a trial is finished? I suppose a large amount of damage is already done, but it would be a gross injustice to kill their business before anything started. The government should pay to keep this up until they've conducted their trial. If they don't, and they lose somehow, I hope they get hit with a massive countersuit.
moonboots about 13 years ago
For reference, this amount of data would require 190 backblaze storage pods ($7,384 for 135TB) totaling $1.4 million.
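The pod count can be reproduced by rounding up. This sketch uses the pod price and capacity from the comment and assumes no redundancy across pods; the 190 figure appears to correspond to reading 25 PB as 25 × 1024 TB, while a decimal 25,000 TB gives 186 pods. `pods_needed` is a hypothetical helper.

```python
import math

POD_TB = 135          # usable capacity per Backblaze pod (from the comment)
POD_COST_USD = 7_384  # price per pod (from the comment)

def pods_needed(total_tb: float) -> int:
    """Whole pods required to hold total_tb terabytes, with no redundancy."""
    return math.ceil(total_tb / POD_TB)

for total_tb in (25_000, 25 * 1024):  # decimal vs binary reading of "25 PB"
    n = pods_needed(total_tb)
    print(total_tb, n, f"${n * POD_COST_USD:,}")
```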
adrianpike about 13 years ago
Can someone more familiar with this stuff explain why Carpathia's still paying for "power and connectivity"?<p>I would have assumed that the FBI would have actually seized the servers, or at the very least pulled the network cables out.
DanBC about 13 years ago
I'm really confused by this. Is Megaupload (or any megaupload employee) facing a criminal trial? How can any "evidence trial" (or whatever they call it) be maintained if a law-enforcement agency doesn't have the drives?<p>Have any hashes been taken of the drives?
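For context on the hashing question: a common way to show that evidence has not changed is to record a cryptographic digest of a drive image at seizure and recompute it later. The sketch below assumes the drive has already been imaged to a file and uses SHA-256 as an illustrative choice; it says nothing about what investigators actually did in this case, and `sha256_of_image` is a hypothetical helper.

```python
import hashlib
import tempfile

CHUNK_SIZE = 4 * 1024 * 1024  # read 4 MiB at a time so a huge image never sits in memory

def sha256_of_image(path: str) -> str:
    """Stream a disk image file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # Demo on a small temporary file standing in for a drive image.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example sector data" * 1000)
    print(sha256_of_image(tmp.name))
```

If the digest recorded at seizure matches one computed before trial, the image is byte-for-byte unchanged; the hash proves integrity only, not who had custody in between.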
jlawer about 13 years ago
The costs of moving that amount of data are crazy. I am surprised the government hasn't seized the hardware and chucked it in a warehouse.<p>I did some back-of-the-envelope calculations... and it's absolutely crazy. Tape would require over 17,000 Ultrium tapes. Now, you could dedupe... but the hardware to process and dedupe that much data... not really an option. Not to mention the time to write that many tapes...<p>Something like Thumpers (48-disk Sun x86 boxes) would be expensive; last time I looked they were around, say, $30k for a large order. That's 160TB usable, assuming 4TB disks and each Thumper split into 4 RAID 6 arrays... so 160 Thumpers... $4.8 million.<p>Even Backblaze pods would likely be well over a million...<p>This doesn't even cover hosting costs, transfer and such. Not to mention that, to be usable in court, there are going to have to be processes in place to document the compliance and validity of the copy...<p>All in all, not a great place for Carpathia to be in.
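The tape and Thumper figures can be rechecked as below. This is a sketch with labeled assumptions: the comment does not name an LTO generation, so LTO-5's 1.5 TB native capacity is assumed, and 25 PB is read as 25 × 1024 TB, which is what makes the "over 17,000 tapes" figure work. The RAID 6 layout (4 arrays of 12 disks, 2 parity disks each) follows the comment; `thumper_usable_tb` is a hypothetical helper.

```python
import math

TOTAL_TB = 25 * 1024         # assumption: 25 PB in binary units
LTO5_NATIVE_TB = 1.5         # assumption: LTO-5 native (uncompressed) capacity
DISKS_PER_THUMPER = 48
DISK_TB = 4
RAID6_ARRAYS = 4             # 4 x 12-disk RAID 6 arrays per box
THUMPER_PRICE_USD = 30_000   # commenter's rough bulk price

def thumper_usable_tb() -> int:
    """Usable TB per box: RAID 6 gives up 2 disks per array to parity."""
    disks_per_array = DISKS_PER_THUMPER // RAID6_ARRAYS
    data_disks = RAID6_ARRAYS * (disks_per_array - 2)
    return data_disks * DISK_TB

tapes = math.ceil(TOTAL_TB / LTO5_NATIVE_TB)
thumpers = math.ceil(TOTAL_TB / thumper_usable_tb())
print(tapes, thumper_usable_tb(), thumpers, f"${thumpers * THUMPER_PRICE_USD:,}")
```

Deduplication or tape compression would lower the tape count, but, as the comment notes, processing that much data is its own problem.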
genu1 about 13 years ago
This post really hurts my soul.<p>Can Carpathia sue the Federal Government for NOT seizing assets? It's the data, not the hardware. Data is transferable. If they want it, they should take it.<p>Can Carpathia sue? This kind of injustice just makes me boil.
Zikes about 13 years ago
Pardon my ignorance, and this is a serious question, but why can't they just turn them off? I realize it doesn't address all the costs, but surely it could reduce them significantly.
mmaunder about 13 years ago
This gives an idea of the economic activity generated by services like megaupload and what is being removed from the economy by killing the company. Roughly $3.2 million in hosting fees, and could be more if that's just the cost price. Also salaries, over $1 million in hardware, and the various other suppliers. One wonders about the GDP of the recording and movie industries relative to the businesses they're going after.
jneal about 13 years ago
What's the big deal? Just delete the data. Customer pays for storage. Company stores. Customer stops paying for storage. Company deletes.<p>Sure, a bunch of people will be pissed off, but it's not the company's fault; they shouldn't have to bear this burden. I can't see how they could be sued by users for this; they didn't enter into any kind of agreement with the users, only with the customer.
jakejake about 13 years ago
I can definitely understand the lost potential revenue of having unused servers. But I wonder why they are saying that cost includes power and connectivity for the servers? Seems like they would be powered down. I would actually have assumed the servers to be confiscated and taken off-premise by the FBI.
katane about 13 years ago
There are legal obligations for the government to reimburse telco companies if they are asked to spy on their customers on the government's behalf. Also, obviously, if you want to use data as evidence in a trial, it needs to be stored safely by the police and sealed off to ensure that its integrity is preserved.<p>So either the government needs to pay up, store the drives themselves, or dismiss these thousands of hard drives from the witness bench.<p>Also, I can't see how the EFF's claim has any legal merit. There's no obligation for a site to enable you to access data you sent them.
guan about 13 years ago
Megaupload had a lot of assets that were frozen. I don’t know about the legalities, but it would be reasonable to use the frozen funds to pay for this.
firefoxman1 about 13 years ago
I know any legitimate hosting company would never do this, but it would be amazing if they just "happened" to have very loose security on the servers that hold Megaupload's data, and some hacker were to... "gain unauthorized access" and wipe all the data.<p>They wouldn't be held responsible for a break-in, would they?
av500 about 13 years ago
Are there seriously people that used Megaupload as their sole and only place to store their data? What if there was a fire in the server room? Or some MU intern typed rm -rf /?
nextstep about 13 years ago
Making Megaupload pay this $9,000/day seems unfair, too. The US government has cut off all of Megaupload's revenue streams, so they would be forcing Megaupload to keep paying for a service it can no longer make money from.<p>Regardless, why is the cost so high if the servers are down? Does this $9,000/day reflect the loss that Carpathia suffers from not reallocating this storage to other customers? It would seem to me that, given Megaupload's current state, it would be sufficient to leave the servers powered down and unplugged until the legal dispute is resolved... surely the cost of leaving the servers idle is not $9,000 a day. I don't really know, though...
neilparikh about 13 years ago
Wait, why do they need to keep the power and connectivity on if the servers are not being actively accessed? That would save a bunch of money if they were kept off.
joering2 about 13 years ago
<i>"and argues that if that data needs to be preserved, someone else—the government, Megaupload, or an interested party such as the MPAA or EFF—should bear the costs of preserving the data"</i><p>Fucking exactly!! Have the fucking MPAA pick up the tab.<p>EDIT: it's going to be amazing (and will take years for sure) to see whether this bites the MPAA in the ass if the judge rules that yes, they do have to pay. Would looove to see that. This should actually be a rule of thumb: if the MPAA believes someone is infringing, a lawsuit is entirely fine, but you guys (MPAA) will pay to keep the lights on in the meantime.
rdl about 13 years ago
I wonder if this is an opportunity for a startup, and/or an insurance product sold to SaaS end users, hosting facilities, or developers.
nwmcsween about 13 years ago
I don't feel one bit of sympathy for Carpathia. They most likely had all the warning signs on their door (DMCA notices, legal notices and more), but they willingly provided service to a company with garbage morals.
dos1 about 13 years ago
In my mind, the MPAA is certainly the best choice to pay these costs. They're the ones with the problem, they should be the ones to bear the burden. Especially considering Megaupload offered to take the data and they explicitly forbade it. If the MPAA didn't like the solutions offered, but can't come up with something better, then I think Carpathia should get to do what it wants.<p>Edit: The opinion above has NO legal basis whatsoever. As many have pointed out, it's not even legally possible. I made this comment solely from a "In a perfect world..." standpoint.
ecaron about 13 years ago
At that price, it would only take them 150 days to stop losing money if they started building some Backblaze servers (<a href="http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/" rel="nofollow">http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v...</a>). 25,000TB / 135TB * $7,384 = $1,367,407 minimum cost of commercial hardware to store that much.<p>"historically and mind-bogglingly large amount of data" - you could say that again.
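The "150 days" figure follows from dividing the hardware estimate by the daily fee. This sketch uses the comment's inputs (25,000 TB, fractional pod counts, $9,000/day) and, like the comment, ignores power, space, labor and redundancy, so it is a lower bound on the real cost.

```python
POD_TB = 135           # usable capacity per Backblaze pod (from the comment)
POD_COST_USD = 7_384   # price per pod (from the comment)
TOTAL_TB = 25_000      # the comment's decimal reading of 25 PB
DAILY_FEE_USD = 9_000

hardware_cost = TOTAL_TB / POD_TB * POD_COST_USD  # fractional pods, as in the comment
payback_days = hardware_cost / DAILY_FEE_USD
print(f"${hardware_cost:,.0f} -> {payback_days:.0f} days")  # $1,367,407 -> 152 days
```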
jamespo about 13 years ago
Surely it wouldn't take too long to contact both the legitimate users of Megaupload and ask them to download their totally legitimate files?