Since I've been taking a data-driven look at GitHub Archive data myself, I have a few comments:<p>1) A blog post is not a Show HN topic: <a href="https://news.ycombinator.com/showhn.html" rel="nofollow">https://news.ycombinator.com/showhn.html</a><p>2) If you're plotting a density scatterplot in ggplot2, you <i>must</i> use a reduced alpha; otherwise overlapping points are indistinguishable from a single point and the plot gives no indication of density. (e.g. the conclusion of "Almost 90% of our repositories have less than 20,000 stars and 20 languages." is not apparent from the chart)<p>3) Why did you use a "repository index" for the second chart? Why did you sort it in descending order? Why are you using a scatter plot instead of a histogram?
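<p>To illustrate point 2, here is a rough sketch of why alpha matters. It assumes standard "over" alpha compositing of same-colored points, and the function name and the 0.05 value are just picked for illustration (in ggplot2 the equivalent knob is the alpha argument to geom_point). With alpha = 1, one point and a hundred stacked points render identically; with a small alpha, the opacity grows with the number of overlapping points, so dense regions look darker.

```python
def stacked_opacity(alpha: float, n_points: int) -> float:
    """Opacity where n_points same-colored points overlap.

    Assumes plain "over" compositing: each point lets a fraction
    (1 - alpha) of what is beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** n_points


if __name__ == "__main__":
    # Compare fully opaque points with an illustrative alpha of 0.05.
    for n in (1, 5, 20, 100):
        opaque = stacked_opacity(1.0, n)
        faint = stacked_opacity(0.05, n)
        print(f"{n:>3} overlapping points: alpha=1.0 -> {opaque:.3f}, alpha=0.05 -> {faint:.3f}")
```

The opaque column stays constant no matter how many points overlap, which is exactly why the density claim can't be read off the original chart.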