<p><pre><code> article.published_at = Time.zone.parse(elm.css(".byline time").first.to_h["datetime"])
# timezone seems inconsistent, no big deal because we only care about date anyway
</code></pre>
I've written a TechCrunch scraper myself, and it turns out the reason the time zone looks inconsistent is that TechCrunch outputs the time in 12-hour AM/PM format...but forgets to include the AM/PM. So the hours just loop around from 0 to 12.
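If only the date matters, one way to sidestep the ambiguous hour is to throw the time of day away before parsing. This is just a rough sketch: the helper name `published_on` is made up, and it assumes the `datetime` attribute starts with a `YYYY-MM-DD` date, which I have not re-checked against the live markup.

```ruby
require "date"

# Hypothetical helper (not from the original scraper): keep only the calendar
# date from a datetime attribute whose hour may be ambiguous.
# Assumption: the attribute value begins with YYYY-MM-DD.
def published_on(datetime_attr)
  # Grab the leading date portion and ignore everything after it.
  date_part = datetime_attr.to_s[/\A\d{4}-\d{2}-\d{2}/]
  return nil unless date_part

  Date.strptime(date_part, "%Y-%m-%d")
rescue ArgumentError
  # An impossible date such as 2014-13-40 ends up here.
  nil
end
```

Note this still trusts the day as written; an article published near midnight could land on the neighbouring day, since the missing AM/PM cannot be recovered from the string alone.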
I stopped visiting TechCrunch almost 5 years ago when it changed from highlighting new and upcoming startups to simply covering news of the already established and huge tech companies such as Twitter, Facebook etc.<p>I haven't been back, but do catch an article or two when it's linked from HN. Does anyone here still frequent TC enough to comment on the current editorial direction?
This is pretty good - until you published it. Now all you need to ask is how much incentive TC's owners have to show a certain graph. (That is just one example of the gaming this can create.) After all, they can publish whatever headlines they want, so they can track and manipulate this graph directly.<p>I think it would have been better to keep this pretty powerful heuristic for yourself :)
I remember doing a very basic analysis (via Google search) of TC headlines back in 2012, but I was curious about their preference for funded companies vs bootstrapped, in terms of coverage. Predictably, they were mostly covering funding rounds:<p><a href="http://blog.itrendcorporation.com/2012/07/23/no-coverage-for-self-funded-companies/" rel="nofollow">http://blog.itrendcorporation.com/2012/07/23/no-coverage-for...</a><p>Your data is very interesting. Any chance you could also "group by" company and list companies which are more frequently (repeatedly) covered by TC? I have a theory about that, would be interesting to test.
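To make the "group by" request concrete, something along these lines would do it. It is only a sketch: it assumes each article has already been reduced to a hash with a `:company` key (pulling the company name out of a headline is the hard part and is not shown), and the method names and the `min_articles` threshold are invented for illustration.

```ruby
# Hypothetical input: one hash per article, with the company name already
# extracted into :company.
def coverage_counts(articles)
  counts = Hash.new(0)
  articles.each { |article| counts[article[:company]] += 1 }
  # Most-covered companies first; ties broken alphabetically for stable output.
  counts.sort_by { |company, count| [-count, company] }
end

# Companies covered at least min_articles times (the threshold is an assumption).
def repeatedly_covered(articles, min_articles: 2)
  coverage_counts(articles).select { |_company, count| count >= min_articles }
end
```

Both methods return an array of `[company, count]` pairs, so the result can be printed as a ranked list directly.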
Maybe the equation is that TechCrunch articles about startup funding get more "help" from those startups on sites like HN/Reddit, thus they get much more exposure, thus more views.<p>This is an amazing analysis. Thank you. I really enjoyed the aggregated "x for y" headlines.
This is really awesome. It really says something about the quality of the product being churned out at TC.<p>I imagine that most of the funding articles are short. It would be interesting to see how much of the total words were about fundraising (i.e. not just headlines, but the entire article).
Fundraising is very envy-inducing. Envy tends to be good for advertisers (look at women's print magazines, if you have any doubt). I'd be interested to see how this impacts their advertising business metrics.