Making open source data more available

254 points by taylorwc, almost 9 years ago

13 comments

minimaxir, almost 9 years ago
The GitHub Archive dataset was updated as well. Example BigQuery query to get the top repositories from 2015-2016 YTD, by the number of stars given during that time:

    SELECT repo.id, repo.name, COUNT(*) AS num_stars
    FROM TABLE_DATE_RANGE([githubarchive:day.],
      TIMESTAMP('2015-01-01'),
      TIMESTAMP('2016-12-31'))
    WHERE type = "WatchEvent"
    GROUP BY repo.id, repo.name
    ORDER BY num_stars DESC
    LIMIT 1000

Which results in this output: https://docs.google.com/spreadsheets/d/16yDS2wDdDOTxjVsjGvWmpHVsOIU65wLEjXFHDtDeKU4/edit?usp=sharing

Since the query only hits 3 columns, it only uses 15.4GB of data (out of a 1TB allowance).

More information on the GitHub Archive changes: https://medium.com/@hoffa/github-archive-fully-updated-notice-some-breaking-changes-64e7e7cd0967
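
For readers without BigQuery access, the aggregation that the query above performs can be sketched locally. This is a minimal illustration in Python, not the BigQuery API: the function name `top_starred` and the sample event dicts are hypothetical, and the field names only mirror the columns the query references (`type`, `repo.id`, `repo.name`).

```python
from collections import Counter


def top_starred(events, limit=1000):
    """Count WatchEvent rows per repository and return the most-starred first.

    Mirrors the query's WHERE / GROUP BY / ORDER BY ... DESC / LIMIT steps
    on an in-memory list of event dicts (a hypothetical stand-in for the
    githubarchive tables).
    """
    stars = Counter(
        (event["repo"]["id"], event["repo"]["name"])
        for event in events
        if event["type"] == "WatchEvent"  # WHERE type = "WatchEvent"
    )
    # most_common sorts by count descending and applies the LIMIT.
    return [
        {"id": repo_id, "name": name, "num_stars": count}
        for (repo_id, name), count in stars.most_common(limit)
    ]


# Hypothetical sample data, made up for illustration.
sample_events = [
    {"type": "WatchEvent", "repo": {"id": 1, "name": "octo/alpha"}},
    {"type": "PushEvent", "repo": {"id": 1, "name": "octo/alpha"}},
    {"type": "WatchEvent", "repo": {"id": 2, "name": "octo/beta"}},
    {"type": "WatchEvent", "repo": {"id": 2, "name": "octo/beta"}},
]

for row in top_starred(sample_events, limit=10):
    print(row["name"], row["num_stars"])
```

Only `WatchEvent` rows contribute to the count, which is why the `PushEvent` on the first repository is ignored.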
cdibona, almost 9 years ago
Please note, if you are wondering where your project is: we only archived open source projects to BigQuery. So add an open source license, for reals?
fhoffa, almost 9 years ago
I'm compiling all links and tips I can find at:

- https://medium.com/@hoffa/github-on-bigquery-analyze-all-the-code-b3576fd2b150

The Changelog also invited us to record a podcast with Arfon Smith (GitHub), Will Curran (Google), and me (Google): https://changelog.com/209/

Happy to answer any questions!
chickenbane, almost 9 years ago
Here's the announcement from Google:

http://google-opensource.blogspot.com/2016/06/github-on-bigquery-analyze-all-code.html
seltzered_, almost 9 years ago
FWIW, the author (Arfon Smith) had a recent Microsoft Research talk on GitHub and open collaboration for the scientific community: https://www.youtube.com/watch?v=7XOuJFwy270
sqs, almost 9 years ago
This is super cool. If you want to benefit from this info in your workflow now, we have analyzed some of this same data at Sourcegraph, and you can see (e.g.) all the repos that call http.NewRequest in Go (https://sourcegraph.com/github.com/golang/go/-/info/GoPackage/net/http/-/NewRequest?refs=all) or lots of usages of Joda-Time DateTime in Java (https://sourcegraph.com/github.com/JodaOrg/joda-time/-/info/JavaArtifact/joda-time/joda-time/-/org/joda/time/DateTime:type?refs=all). You can search for functions/types/etc. on Sourcegraph to cross-reference them globally. We're working on an API to provide this information to anyone else who wants it; email me at sqs@sourcegraph.com if you are interested in using it.
kozikow, almost 9 years ago
Top Emacs packages required in GitHub repos: https://kozikow.wordpress.com/2016/06/29/top-emacs-packages-used-in-github-repos/
hut8, almost 9 years ago
I made an interface to some of this which can be used for finding a single user's contributions over time: https://githubcontributions.io
kozikow, almost 9 years ago
sample_contents only lists the contents of a 10% sample of all files. Scanning the full data set may be hard for people new to BigQuery. I managed to query the full data set in https://kozikow.wordpress.com/2016/06/29/top-emacs-packages-used-in-github-repos/ . "Resources exceeded during query execution" errors are especially hard to debug, as they can mean many different things caused BigQuery to run out of memory.

Some BigQuery tricks to make it work:

- TOP/COUNT is faster and more memory efficient than GROUP BY/ORDER.
- Filtering data prior to a join in a sub-query reduces memory usage.
- Regexps and globs are expensive. Use LEFT/RIGHT as a faster version.
- Avoid reading all files, to stay under the 1TB freebie scan limit. Only access file contents after filtering some paths.
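
The last three tricks share one idea: narrow the row set with cheap operations before touching the expensive column. A minimal Python sketch of that idea on in-memory data follows. The `files` and `contents` structures, the `.el` suffix, and the function name `contents_for_suffix` are assumptions made for illustration and do not reflect the actual dataset schema.

```python
def contents_for_suffix(files, contents, suffix):
    """Return {path: content} for files whose path ends with `suffix`.

    `files` is a list of (file_id, path) pairs and `contents` maps
    file_id -> text; both are hypothetical stand-ins for a small metadata
    table and a much larger contents table.
    """
    # Step 1: cheap filter on paths only. A plain suffix comparison plays
    # the role of a RIGHT(path, n) check instead of a regexp match.
    wanted = [(file_id, path) for file_id, path in files if path.endswith(suffix)]

    # Step 2: look up contents only for the surviving ids, the analogue of
    # filtering in a sub-query before the join.
    return {path: contents[file_id] for file_id, path in wanted if file_id in contents}


# Hypothetical sample data, made up for illustration.
files = [(1, "init.el"), (2, "README.md"), (3, "lisp/config.el")]
contents = {1: "(require 'cl-lib)", 2: "# Readme", 3: "(require 'dash)"}

print(contents_for_suffix(files, contents, ".el"))
```

In BigQuery the same ordering matters because the scan cost is driven by the columns and rows actually read, so the contents column should be reached as late as possible.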
dikaiosune, almost 9 years ago
I'm curious how frequently this will be updated. It'd be nice to set up weekly/monthly queries which show updated information.
sergames, almost 9 years ago
I created an open source web service which serves data in JSON format: http://getjson.info

Hope you will find it useful.
AznHisoka, almost 9 years ago
Is there a way to get all Ask HN posts from the HN dataset?
fiatjaf, almost 9 years ago
https://scholar.google.com/scholar?hl=en&q=github&btnG=&as_sdt=1%2C32&as_sdtp=

People take their time "studying" everything these days, don't they?

I can't imagine how that would be if the State wasn't paying them to do that.