I posted this as a blog entry and on Quora, but I would also like to hear any thoughts from the HN community.<p>We are in a data splurge. Everyone is interested in data and in developing data-driven products. We collect tons of data about our users.<p>But how does one decide what data is worth collecting? And how do you strike a balance between collecting events that just add noise and collecting the ones that will likely give us the crucial insight?
I would argue that that depends on your architecture.
If you gather more data than your data storage can handle and it starts slowing down your core product, then you should stop. But if that is not a factor, you could collect as much as possible and later decide whether it was (or still is) worth collecting, UNLESS it starts raising privacy concerns in a user-driven project.
- Well-chosen statistics can extract the needles from the haystack, so collecting more may be better. This observation holds so often that some people prefer the phrase "extracting needles from needles".<p>- Get familiar with the new Consumer Privacy Bill of Rights.
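One example of a "well-chosen statistic", offered only as an illustration and not as something anyone in this thread proposed, is the mutual information between a candidate event and an outcome you care about (say, retention). The sketch below assumes you already have, per user, a discrete value for each logged event and for the outcome; the event names and data are hypothetical. Events scoring near zero tell you little about that particular outcome and are candidates for dropping.

```python
import math
from collections import Counter


def mutual_information(events, outcomes):
    """Mutual information (in bits) between two paired discrete sequences.

    events[i] and outcomes[i] describe the same user, e.g. whether the user
    triggered a logged event and whether they were retained (hypothetical).
    """
    if len(events) != len(outcomes):
        raise ValueError("events and outcomes must be the same length")
    n = len(events)
    joint = Counter(zip(events, outcomes))  # counts of (event, outcome) pairs
    event_counts = Counter(events)
    outcome_counts = Counter(outcomes)
    mi = 0.0
    for (e, o), count in joint.items():
        p_joint = count / n
        p_e = event_counts[e] / n
        p_o = outcome_counts[o] / n
        mi += p_joint * math.log2(p_joint / (p_e * p_o))
    return mi


def rank_events(event_table, outcomes):
    """Rank candidate events, most informative about the outcome first."""
    scores = {name: mutual_information(values, outcomes)
              for name, values in event_table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: 1 = user was retained / triggered the event.
    retained = [1, 1, 0, 0, 1, 0, 1, 0]
    candidate_events = {
        "invited_friend": [1, 1, 0, 0, 1, 0, 1, 0],  # mirrors retention
        "page_scrolled": [1, 0, 1, 0, 1, 0, 1, 0],   # partially related
        "saw_banner": [1] * 8,                       # constant, no information
    }
    for name, score in rank_events(candidate_events, retained):
        print(f"{name}: {score:.3f} bits")
```

Mutual information only captures how much one event tells you about one outcome, so a low score is a hint rather than proof that an event is worthless: it may matter in combination with other events or for a different question later.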