Wow! I worked on EOSDIS from '93 to '96. We estimated 16 petabytes; at the time it would have been one of the world's largest databases. We changed horses midstream, moving our user interfaces from X Windows/Motif to the WWW, and built a very early Oracle DB accessible via the WWW. There was no cloud then, except on missions studying atmospheric water vapor.
When this was originally designed there were to be several (6-7) DAACs - Distributed Active Archive Centers (<a href="https://earthdata.nasa.gov/eosdis/daacs" rel="nofollow">https://earthdata.nasa.gov/eosdis/daacs</a>) to store data near where it was needed or captured. Now they have 12 and are storing on AWS. Amazon didn't exist when this was originally built.
This article seems short-sighted.<p>1. Using the AWS cost calculator is pointless; naturally, an entity the size of NASA would get heavily discounted rates.
2. As data volume grows, the complexity of working with that data grows too. NASA appears to be embracing cloud computing by adopting a paradigm where scientists push computation to where the data rests rather than downloading data [1], [2], [3], thereby paying egress on only the higher-order data products.
3. The report notes that NASA has tooling to rate-limit and throttle access to data. This, in itself, proves that NASA didn't "[forget] about eye-watering cloudy egress costs before lift-off".<p>People may scream about vendor lock-in, which is a fair complaint, but acting like NASA just didn't think about egress is misleading.<p>NASA is ultimately a science institution; I think diverting effort away from infrastructure management and towards studying data is likely a wise decision.<p>[1: <a href="https://www.hec.nasa.gov/news/features/2018/cloud_computing_services.html" rel="nofollow">https://www.hec.nasa.gov/news/features/2018/cloud_computing_...</a>]
[2: <a href="https://link.springer.com/article/10.1007/s10712-019-09541-z" rel="nofollow">https://link.springer.com/article/10.1007/s10712-019-09541-z</a>]
[3: <a href="https://ui.adsabs.harvard.edu/abs/2017AGUFMIN21F..02P/abstract" rel="nofollow">https://ui.adsabs.harvard.edu/abs/2017AGUFMIN21F..02P/abstra...</a>]
> “However, when end users download data from Earthdata Cloud, the agency, not the user, will be charged every time data is egressed.”<p>Not necessarily; it depends on how the users access the data. If users access the data through their own AWS accounts, NASA could leverage S3's "Requester Pays" feature [1] to let the user pay for downloading the data.<p>1: <a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html" rel="nofollow">https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPay...</a>
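A minimal sketch of what a Requester Pays download could look like from the user's side, assuming the boto3 package and valid AWS credentials. The bucket and key names passed in are hypothetical; `RequestPayer="requester"` is the parameter that tells S3 the caller accepts the transfer charges.

```python
def requester_pays_get_kwargs(bucket: str, key: str) -> dict:
    """Build GetObject arguments that bill the caller, not the bucket owner."""
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}


def download_as_requester(bucket: str, key: str, dest_path: str) -> None:
    """Download one object from a Requester Pays bucket to a local file."""
    # Imported lazily so the helper above works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    response = s3.get_object(**requester_pays_get_kwargs(bucket, key))
    with open(dest_path, "wb") as out:
        # Stream the body in chunks instead of loading it all into memory.
        for chunk in response["Body"].iter_chunks():
            out.write(chunk)
```

The bucket owner would have to enable the feature first (in boto3, `put_bucket_request_payment` with `{"Payer": "Requester"}`), and anonymous access stops working once it is on, so this only helps for users who already have AWS accounts.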
I'm not saying this won't be a financial cluster - it likely will cost many times more than planned - but the headline here is just a flat-out lie.<p>TFA says:<p>"a March audit report [PDF] from NASA's Inspector General noticed EOSDIS hadn’t properly modeled what data egress charges would do to its cloudy plan."<p>'Hadn't properly modeled' is very different from 'forgot about'. And if you actually read the linked report, it says things like:<p>"ESDIS officials said they plan to educate end users on accessing data stored in the cloud, including providing tools to enable them to process the data in the cloud to avoid egress charges."
and
"To mitigate the challenges associated with potential high egress costs when end-users access data, ESDIS plans to monitor such access and “throttle” back access to the data"<p>Neither of those statements would be <i>in the audit</i> if the entire topic had been a surprise.
<p><pre><code> YOU ARE NOT AFRAID?
'Not yet. But, er...which way to the egress, please?'
There was a pause. Then Death said, in a puzzled voice: ISN'T THAT A FEMALE EAGLE?
</code></pre>
I've been reading A Hat Full of Sky to my daughter these days, and there's a running joke that "supposedly intelligent people" don't know the meaning of the word "egress", mixing it up with things like egret, ogress or eagles.<p>(See also the inspiration for the joke: <a href="https://unrealfacts.com/pt-barnum-would-trick-people-with-a-this-way-to-egress-sign/" rel="nofollow">https://unrealfacts.com/pt-barnum-would-trick-people-with-a-...</a> )
It's The Register, people. Don't take it seriously. It's practically The Onion of the IT industry, especially the comments sections.<p>I've written two articles for them and the comments are a joke. They're all anti-Cloud, anti-progressive. Try selling them Kubernetes as a solution to their problems: they'll think you've come to steal their children. I know, I've tried.<p>In short: this never happened. NASA didn't forget anything. It does, however, make for a great eye-catching headline!<p>Sorry to be bitter about this, but publications like The Register serve little purpose these days. It caters to a specific kind of IT personality that can't let go of their physical tin and thinks public Cloud has no place or use at all. Again, I know, I've tried convincing these people of such things.
Unless my numbers are <i>way</i> off, I got around $15.5 million per year using Backblaze's calculator: <a href="https://www.backblaze.com/b2/cloud-storage-pricing.html" rel="nofollow">https://www.backblaze.com/b2/cloud-storage-pricing.html</a><p>Numbers used:<p><pre><code> Initial upload: 258998272 GB (1024*1024*247)
Monthly upload: 100 GB (default)
Monthly delete: 5 GB (default)
Monthly download: 1048576 GB (1 PB)
Period of Time: 12 months (default)</code></pre>
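For anyone who wants to check the figure without the web calculator, here is a rough sketch of the same arithmetic. The two rates are assumptions (B2's list prices around the time of this thread, as far as I recall), and the tiny default upload/delete volumes are ignored.

```python
# Assumed B2 list prices; check the pricing page for current values.
STORAGE_USD_PER_GB_MONTH = 0.005
DOWNLOAD_USD_PER_GB = 0.01


def b2_yearly_estimate(stored_gb, monthly_download_gb, months=12):
    """Return (storage_cost, download_cost) in USD over the given period."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH * months
    download = monthly_download_gb * DOWNLOAD_USD_PER_GB * months
    return storage, download


stored_gb = 1024 * 1024 * 247   # 247 PB expressed in GB, as in the numbers above
monthly_download_gb = 1024 * 1024  # 1 PB per month

storage, download = b2_yearly_estimate(stored_gb, monthly_download_gb)
print(f"storage:  ${storage:,.0f} per year")
print(f"download: ${download:,.0f} per year")
```

Under those assumed rates the storage term alone lands at roughly 15.5 million USD per year, and 1 PB of monthly download adds comparatively little.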
I assume access to the data follows a heavily skewed Pareto distribution.<p>Given that, it's maybe still cheaper to build their own serving / caching layer in front of the cloud storage to save egress costs than it would have been to construct the whole storage solution themselves.
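To get a feel for how much a cache in front could absorb, here is a small sketch. It swaps in a Zipf-style popularity model as a discrete stand-in for the Pareto assumption, and it assumes equal-sized objects and a cache that always holds the most popular ones; both are simplifications, and the `skew` parameter is a guess, not a measured value.

```python
def cache_hit_fraction(num_objects: int, cached_objects: int, skew: float = 1.0) -> float:
    """Fraction of requests served from cache when the top-ranked objects are cached.

    Popularity of the object at rank r is taken to be proportional to 1 / r**skew.
    """
    weights = [1.0 / (rank ** skew) for rank in range(1, num_objects + 1)]
    return sum(weights[:cached_objects]) / sum(weights)


# Caching the most popular 10% of 1,000 objects under the assumed skew of 1.0.
print(round(cache_hit_fraction(1000, 100), 2))
```

With a skew of 1.0, caching the top 10% of objects covers about 69% of requests in this model; with no skew at all it would cover only 10%, so the whole argument hinges on how skewed real access actually is.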
This surely was entirely known to AWS, where they were rubbing their hands at the fact that every user of this data has to process it using EC2 on site.<p>This is Cloud lock-in using data location.
I wonder if this includes Direct Connect, or if they can use it? [1]<p>Cloud data transfers are too expensive; personally, I assume that it costs more to measure and bill for bandwidth than the usage itself...<p>1: <a href="https://aws.amazon.com/directconnect/" rel="nofollow">https://aws.amazon.com/directconnect/</a>
Cue the cloud apologists that “it’s better to use the cloud than to build and manage your own infra”.<p>This is why you build and run your own storage, similar to Backblaze (who is almost entirely bootstrapped except for one reasonable round of investment).
> You don't need to be a rocket scientist to learn about and understand data egress costs. Which left The Register wondering how an agency capable of sending stuff into orbit or making marvelously long-lived Mars rovers could also make such a dumb mistake.<p>I used to work very closely with this department at NASA. Without saying too much, the short answer is "tenured government employees more concerned about job security than the success of the project" is how an agency could make such dumb mistakes.
What's the opposite of AWS Snowmobile[0]?<p>[0] - <a href="https://aws.amazon.com/snowmobile/" rel="nofollow">https://aws.amazon.com/snowmobile/</a>
Using AWS for this type of use case is dumb for an org as large as NASA, if cost savings is a goal. It's cheaper to just land capacity at a datacenter.
This article is misleading. The entire point is to not move data out of the cloud. Instead bring your computing (analysis, visualization) to the data and pay for compute cycles on AWS. If your workflows are short/bursty, you will come out ahead. Moreover, you will be able to do big data-style computations that you cannot do in a local computing environment. This is bad journalism, IMO.
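A back-of-the-envelope sketch of that trade-off, for illustration only. Every number in the usage example is made up, and the function and parameter names are hypothetical; plug in real egress and compute rates before drawing any conclusions.

```python
def download_everything_cost(raw_gb, egress_usd_per_gb):
    """Cost of pulling the raw data out of the cloud to analyze it locally."""
    return raw_gb * egress_usd_per_gb


def compute_in_cloud_cost(result_gb, egress_usd_per_gb, compute_hours, compute_usd_per_hour):
    """Cost of analyzing next to the data and downloading only the results."""
    return result_gb * egress_usd_per_gb + compute_hours * compute_usd_per_hour


# Illustrative inputs: 10 TB of raw granules reduced to a 50 GB product.
local = download_everything_cost(raw_gb=10_000, egress_usd_per_gb=0.09)
cloud = compute_in_cloud_cost(result_gb=50, egress_usd_per_gb=0.09,
                              compute_hours=20, compute_usd_per_hour=0.50)
print(f"download raw data: ${local:,.2f}")
print(f"compute in cloud:  ${cloud:,.2f}")
```

The gap shrinks if the job runs for a long time or the derived product is nearly as large as the input, which is why this works best for short, bursty workflows.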
If you are facing similar problems, you should know that traffic from B2 via Cloudflare is free. I am not 100% sure CF would be happy if NASA picked the CF free tier, but their quote would probably be orders of magnitude lower than Amazon's.
Can't they just use the current DAACs as a caching layer? Seems like the least ugly way out of this mess.<p>Also - can't they use torrent tech? I wouldn't mind helping out a bit on space & data
I wonder why they wouldn't use Wasabi:<p><a href="https://wasabi.com/cloud-storage-pricing/" rel="nofollow">https://wasabi.com/cloud-storage-pricing/</a><p>Looks like egress is free.<p>Maybe because it's comparably untested? Does anyone here have any experience with it?
This is exactly why the costs are set up that way. The first time I saw AWS pricing I chuckled and thought "roach motel." Data goes in but it doesn't come out. It's one of many soft lock-in mechanisms cloud hosts use.
Just build your own storage and save an incredible amount.<p>You might think it's hard, but it's not. croit.io provides all you need to deploy a scalable cluster, even across multiple geographic regions.<p>The price for a 1 PB cluster, including everything from rack to hardware to license to labor, is below 3€/TB/month, or about the Amazon Glacier price tag but with S3-IA-style access.
1 terabyte of hard disk costs ~50 USD.<p>247 petabytes ≈ 247,000 terabytes, so roughly 12 million USD in raw disks.<p>Network cards, bandwidth, electricity cost: I can't guess.<p>A couple of good engineers (hardware and software ones), which they definitely have.<p>Maybe they could have built their own cloud for under ~10-15 million USD, and that wouldn't be a recurring cost.<p>Maybe they missed the article about Bank of America saving ~2 billion USD by building their own cloud.
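The disk arithmetic spelled out as a quick sketch, using the ~50 USD per terabyte figure from the comment. The `copies` parameter is an added assumption to show how replication scales the number; servers, power, networking, and drive replacement are not modeled at all.

```python
USD_PER_TB = 50     # figure quoted in the comment
TB_PER_PB = 1000    # decimal units, matching 247 PB ~ 247,000 TB


def raw_disk_cost_usd(petabytes, usd_per_tb=USD_PER_TB, copies=1):
    """Purchase price of bare drives only, for the given number of full copies."""
    return petabytes * TB_PER_PB * usd_per_tb * copies


print(f"single copy:  ${raw_disk_cost_usd(247):,}")
print(f"three copies: ${raw_disk_cost_usd(247, copies=3):,}")
```

A single copy comes to 12,350,000 USD in drives, so a 10-15 million USD total only holds if redundancy and everything around the disks are close to free.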