Neat project. A few concerns:<p>1) Ideally you would be able to measure the change in every metric, not just the ones you whitelist for a specific experiment. What if adding one feature changes how people interact with a completely different feature? You would want to know about that.<p>2) Showing a change without any sort of hypothesis testing is begging for people to draw unfounded conclusions from the results. Instead of a vague note that more than 100 sessions are necessary to get significance, you need real confidence intervals at the very least.
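<p>To make point 2 concrete, here is a rough sketch of the kind of interval I mean: a normal-approximation (Wald) 95% confidence interval for the difference in conversion rate between two variants. The function name `diff_ci` and the counts are made up for illustration and are not from the project. The Wald approximation is itself crude at small sample sizes, so treat this as a floor, not a recommendation.

```python
import math

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate 95% Wald CI for the difference in conversion rate (B - A).

    conv_* are conversion counts, n_* are session counts (hypothetical inputs).
    z=1.96 corresponds to a two-sided 95% interval under a normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference of two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Made-up example: 100 sessions per variant, 10% vs 15% conversion
low, high = diff_ci(10, 100, 15, 100)
print(f"difference: 5.0 points, 95% CI: [{low:.3f}, {high:.3f}]")
```

With these made-up numbers the interval spans zero, so a "50% lift" on 100 sessions per variant is still compatible with no effect at all. That is exactly the conclusion a bare percentage change hides.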