How much attention does a Hacker News frontpage post drive to a GitHub project?<p>For this visualization I combined 2 datasets: GitHub Archive (http://www.githubarchive.org/) and Hacker News (https://news.ycombinator.com/item?id=10440502), both living in BigQuery (https://cloud.google.com/bigquery/what-is-bigquery, https://reddit.com/r/bigquery).<p>The visualizations were built with Google Cloud Datalab (https://cloud.google.com/datalab/, Jupyter/IPython notebooks on the cloud).<p>With one SQL query you can extract the daily number of stars a project gets, and with another one the GitHub urls that were submitted to the Hacker News - or combine both queries in one:<p><pre><code> SELECT repo_name, created_at date, COUNT(*) c, GROUP_CONCAT_UNQUOTED(UNIQUE(hndate+':'+STRING(hnscore))) hndates, SUM(UNIQUE(hnscore)) hnscore, SUM(c) OVER(PARTITION BY repo_name) monthstars
FROM (
SELECT repo_name, actor_login, DATE(MAX(created_at)) created_at, date hndate, score hnscore
FROM [githubarchive:month.201509] a
JOIN (
SELECT REGEXP_EXTRACT(url, r'github.com/([a-zA-Z0-9\-\.]+.[a-zA-Z0-9\-\.]*)') mention, DATE(time_ts) date, score
FROM [fh-bigquery:hackernews.stories]
WHERE REGEXP_MATCH(url, r'github.com/[a-zA-Z0-9\-\.]+')
AND score>10
AND YEAR(time_ts)=2015 AND MONTH(time_ts)=9
HAVING NOT (mention CONTAINS '.com/search?' OR mention CONTAINS '.com/blog/')
) b
ON a.repo_name=b.mention
WHERE type="WatchEvent"
GROUP BY 1,2, hndate, hnscore
)
GROUP BY 1,2
HAVING hnscore>300
ORDER BY 1,2,4
LIMIT 1000
</code></pre>
The visualization: https://i.imgur.com/B5awmAL.png<p>(correlation is no causation, but there is indeed correlation between both)<p>--@felipehoffa