In our spare time, we're researching this dataset in detail. Here are some questions that we're interested in. We'd love to hear other ideas and to have folks dig into the data. I think this dataset may be of interest to hackers, researchers and marketers.<p>1. Do the trajectories (e.g. rank vs. time) of all popular posts have the same shape? They look ~logarithmic.<p>2. Are there identifiable clusters when you look in 4d space for rank vs points vs comments?<p>3. How does the impact of a post depend quantitatively on its cohort? I.e., what's a good model to normalize performance based on what else was happening that day?<p>4. What fraction of posts have comment threads that are "hijacked" by the first comment? Is there a quantitative way to find this, perhaps by looking at (2) above?<p>5. What are more detailed metrics to collapse the "performance" of a post into a single number?<p>6. How does performance on HN compare to reddit, etc.?<p>7. How is the HN community different from other communities, if at all?<p>8. Given the time-dependent data, can we create a good estimator for the number of active HN users per day? Or can we at least create a relative ranking of the number of unique users between different days?
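For question 1, one rough way to test the "~logarithmic" impression is to fit rank = a + b*ln(t) by least squares and look at R^2 per post. This is only a sketch: `fit_log` is a made-up helper name, the sample numbers are synthetic rather than taken from the dataset, and time is assumed to be measured in positive units since submission.

```python
import math

def fit_log(times, values):
    """Least-squares fit of values = a + b * ln(t); returns (a, b, r_squared).

    Assumes every t is > 0 and that the values are not all identical.
    """
    xs = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic trajectory (hypothetical, not real HN data): exactly logarithmic.
sample_times = [1, 2, 4, 8, 16]
sample_ranks = [3 + 2 * math.log(t) for t in sample_times]
a, b, r2 = fit_log(sample_times, sample_ranks)
```

Comparing the fitted b and R^2 across posts would give a first answer to whether the shapes really are the same; a consistently low R^2 would suggest the log model is the wrong one.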
That's pretty sweet! Shameless plug: I built something just for your personal points a while ago using the Algolia API. It's not that sophisticated or detailed, but it's good enough for my personal usage.<p><a href="https://hn.notmyhostna.me/" rel="nofollow">https://hn.notmyhostna.me/</a>
Some things that might be nice to see are:<p>i) how many different people post URLs from a particular domain?<p>ii) how many different domains does a particular person post?<p>There's also iii but I'm not sure how to word it. It's something like "given a particular domain, what's the average[1] number of different domains posted by people who've posted this domain at some point?"
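A minimal sketch of how (i), (ii) and (iii) could be computed, assuming the data can be reduced to (username, url) pairs. The sample rows, the helper names and the choice of the plain mean for "average" are all assumptions for illustration, not anything from the actual dataset.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical sample of (username, url) pairs; not real HN data.
posts = [
    ("alice", "https://example.com/a"),
    ("bob", "https://example.com/b"),
    ("alice", "https://blog.example.org/x"),
    ("carol", "https://example.com/c"),
    ("bob", "https://news.example.net/y"),
]

def domain_of(url):
    """Return the lowercased host part of a URL."""
    return urlparse(url).netloc.lower()

def build_indexes(posts):
    """Map each domain to its set of posters and each poster to their set of domains."""
    users_by_domain = defaultdict(set)
    domains_by_user = defaultdict(set)
    for user, url in posts:
        domain = domain_of(url)
        users_by_domain[domain].add(user)
        domains_by_user[user].add(domain)
    return users_by_domain, domains_by_user

def mean_domains_of_posters(domain, users_by_domain, domains_by_user):
    """(iii): mean number of distinct domains posted by people who posted `domain`."""
    posters = users_by_domain.get(domain, set())
    if not posters:
        return 0.0
    return sum(len(domains_by_user[u]) for u in posters) / len(posters)

users_by_domain, domains_by_user = build_indexes(posts)
distinct_posters = len(users_by_domain["example.com"])   # (i)
distinct_domains = len(domains_by_user["alice"])         # (ii)
avg_spread = mean_domains_of_posters("example.com", users_by_domain, domains_by_user)  # (iii)
```

Using the raw host means subdomains count as separate domains; collapsing them to a registered domain would need a public-suffix list, which is left out here.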
Very cool. I've been thinking of building a distributed analytics service like this -- can you talk about the architecture of something like this? If I wanted to build a custom chart, would I have to go get my own data?<p>It would be awesome to have a service that provides hosted data and allows anyone to make charts, apply arbitrary transformations, add extra data, and then add the result to the main dashboard.
I'll probably be down-voted to hell, but what's with all the redirects that make "back"ing out of the site to HN so painful? I counted three redirects before I landed on the final page, and I had to long-click my back button to avoid the obnoxious redirect trap. It looks like your tracking (I assume) might drive people away.