Google BigQuery has pure separation of storage and compute, which allows the Public Datasets [0] to exist in a ready-to-query, highly optimized format. Run SQL immediately!<p>BigQuery has a perpetual free query tier of 1 terabyte per month (a $5 value). In addition, you can get $300 in Google Cloud credits for two months to do more work [1].<p>(I work on Google Cloud and used to work on BigQuery.)<p>Edit: spaces, how do they work?!?!<p>[0] <a href="https://cloud.google.com/bigquery/public-data/" rel="nofollow">https://cloud.google.com/bigquery/public-data/</a><p>[1] <a href="https://cloud.google.com/free-trial/" rel="nofollow">https://cloud.google.com/free-trial/</a>
For me, the killer dataset would be the Google Scholar data. That would just blow the whole scientometric space wide open. It would also be a nice introduction to BigQuery for researchers (calculate your own h-index).
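The h-index part is simple enough to sketch once you've pulled per-paper citation counts out of a query; the data below is hypothetical, but the computation is the standard definition:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Hypothetical per-paper citation counts pulled from a scholar dataset.
print(h_index([25, 8, 5, 4, 3, 1, 0]))  # 4 (four papers with >= 4 citations)
```

In a real setup you'd aggregate citation counts per paper in SQL and only run this last reduction client-side.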
I'd highly recommend that anyone who hasn't played with BigQuery yet take a few moments to give it a shot. I was highly skeptical at first when my old company moved to it, but now I've come to love it as a platform.<p>What I love most:
[0] Exceptionally fast queries against large datasets.
[1] Very cost-effective (although, as others have called out, it can be accidentally misused, resulting in a big bill).
[2] Non-data-engineers can set up, use, and manage it with minimal difficulty.
[3] Gets you away from high-priced solutions like Vertica or Teradata.
[4] None of the management headaches of Redshift.<p>Downsides:
[0] Quotas can get annoying to work with.
[1] Not a ton of wrappers in a diverse set of languages.
[2] Not a ton of support with desktop SQL clients.
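On the cost point above: on-demand BigQuery bills per byte scanned ($5/TB here, with the first terabyte each month free, per the pricing mentioned elsewhere in the thread), so a rough back-of-envelope estimator is easy to write. This is a sketch only; it ignores real billing details like the per-query minimum and flat-rate plans:

```python
TB = 1024 ** 4  # bytes per terabyte

def estimate_monthly_cost(bytes_scanned, free_tb=1, usd_per_tb=5.0):
    """Rough on-demand cost: first `free_tb` TB per month free,
    the remainder billed at `usd_per_tb` per TB scanned."""
    billable = max(0, bytes_scanned - free_tb * TB)
    return billable / TB * usd_per_tb

print(estimate_monthly_cost(3 * TB))          # 2 TB billable -> 10.0
print(estimate_monthly_cost(TB // 2))         # inside the free tier -> 0.0
```

The "big bill" failure mode is exactly this: a `SELECT *` over a multi-terabyte table scans every column, and you pay for all of it whether or not you needed it.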
The complexity of the Stack Overflow Data Explorer was the only reason I never played around with SO data.<p>I am looking through the tables now and there is certainly a lot of cool stuff that can be done! :)<p>Although, there appear to be a few tables with garbage data and only a few rows, like posts_privilage_wiki and posts_wiki_placeholder.
Hey, I figured it out!<p><a href="https://bigquery.cloud.google.com/savedquery/809799891616:6bda226894244e1f82ed214cfd1c2af3" rel="nofollow">https://bigquery.cloud.google.com/savedquery/809799891616:6b...</a><p>There's a learning curve. Getting in without giving up your credit card isn't exactly intuitive. And Leslie isn't the most popular name in the US -- it's a gender-neutral name, so there are lots of rows.
I really hope GCP continues to do this with other syndicated public data feeds like American FactFinder, FRED, etc. They are missing a substantial audience that would see this as a killer app and a compelling reason to move to GCP. The success of Enigma.io suggests there is a legit (although easily reproduced) business case here with minimal effort. Anyone who has ever worked to prep and integrate this type of data could now spend that time doing science.
I have two Google accounts, and BigQuery has a problem if I want to use the non-default one. That inhibited my curiosity a bit.<p>I had an idea to query the usage of string-handling functions in C code bases, so I could do something like a manual linting pass around them.
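That kind of "manual linting" is also easy to prototype locally before committing to a big scan. A crude sketch that counts calls to the classic unsafe C string functions -- it's a regex heuristic, so it will also match occurrences inside comments and strings, which a real pass over a code dataset would need to filter:

```python
import re
from collections import Counter

UNSAFE = ("strcpy", "strcat", "sprintf", "gets")
CALL_RE = re.compile(r"\b(" + "|".join(UNSAFE) + r")\s*\(")

def count_unsafe_calls(c_source):
    """Count calls to classic unsafe C string functions.
    Regex-only: no preprocessing, so comments/strings are counted too."""
    return Counter(m.group(1) for m in CALL_RE.finditer(c_source))

sample = '''
    char buf[16];
    strcpy(buf, name);                      /* flagged */
    snprintf(buf, sizeof buf, "%s", name);  /* not flagged */
    strcat(buf, suffix);                    /* flagged */
'''
print(count_unsafe_calls(sample))  # Counter({'strcpy': 1, 'strcat': 1})
```

The same pattern, expressed as a `REGEXP_EXTRACT_ALL` over a source-code table in BigQuery, would give you the aggregate picture across whole code bases at once.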
Very cool. I love the idea of public data sets like these.<p>I wonder if AWS can swing something similar with Athena. They already have "requester pays" buckets for S3, so it would be in line with that to have a similar offering for Athena connected to S3 resources.
Did StackOverflow agree to offer the data, or did Google just take it because they could?
Stackoverflow has their own query tool. Why would they give their data to Google?