I found it got off to a bit of a slow start, but was a fun, rewarding, and very real-world read.<p>In particular, I appreciated admissions like this:<p>> Writing our own genetic algorithm in C# was a bad idea. It took us weeks to implement, test, and optimize. Not to mention all the time spent waiting for results. There was a better solution available all along (the optim function in R). Because we didn’t do proper research, we overlooked it and lost time. Sigh.<p>... which far too many eng blogs omit or gloss over.
R is a very well supported language on Windows (since MS bought Revolution Analytics), integrates tightly with SQL Server and Azure, and R Server solves a pain point that a lot of real-world users have. And of course it's open source.
Question to spark discussion and to fill potential gaps in my knowledge, not to criticize the article, as I very much appreciate the transparency and the build-up from the simple naive initial approach to the final approach used in production:<p>Is anyone else bothered by the claim, supported by a bootstrap, that there "is a 100% chance that the new version is better than the current one"? Maybe I've just never come across such a use of the bootstrap in my encounters with statistics. I know it as a tool for resampling from a dataset to estimate properties of your estimator (mean, variance, what have you) when all you have is that dataset and no clue about the actual distribution. When I saw the bootstrap paired with that probabilistic claim, I expected the author to calculate a bootstrapped (100-x)% confidence interval for both the current and the new weights: if the intervals didn't overlap, you could claim with (100-x)% certainty that one is better than the other. Instead, the author creates a new statistic that is a function of both datasets: Z_i = 1 if the new weights beat the current ones on iteration i (on a random subset of the data), else 0, and across all N=10000 iterations Z_i = 1. The probabilistic claim that new is better than current rests on the fact that no variation was seen in Z_i (I'm also kind of skeptical that, across so many iterations with random subsets, the new weights won every single time). I think the most you can say is that you simulated subsets of the data and 100% of the time new > current; the claim as stated implies inference that isn't there.<p>Maybe I should just ask one of my past stats profs. Open to someone enlightening me.
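As I read it, the Z_i procedure amounts to something like the sketch below. This is my reconstruction, not the author's code: the scores are synthetic, and `bootstrap_win_rate`, the subset fraction, and the "mean score" comparison are all assumptions on my part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-item quality scores for the current and the new
# weights, evaluated on the same dataset (synthetic for illustration:
# the new weights are better on average by a small margin).
current_scores = rng.normal(loc=0.50, scale=0.10, size=5000)
new_scores = current_scores + rng.normal(loc=0.05, scale=0.10, size=5000)

def bootstrap_win_rate(current, new, n_iter=10000, subset_frac=0.5):
    """Fraction of bootstrap iterations in which `new` beats `current`.

    Each iteration draws a random subset of item indices (with
    replacement) and sets Z_i = 1 if the new weights' mean score
    exceeds the current weights' mean score on that subset.
    """
    n = len(current)
    k = int(n * subset_frac)
    wins = 0
    for _ in range(n_iter):
        idx = rng.choice(n, size=k, replace=True)
        if new[idx].mean() > current[idx].mean():
            wins += 1
    return wins / n_iter

rate = bootstrap_win_rate(current_scores, new_scores)
# Even a rate of 1.0 only says "new > current on every resample we
# happened to draw" -- it is not P(new is better) = 100%.
```

The alternative I had in mind would instead compute a percentile interval for each model's mean score over the bootstrap replicates and check whether the intervals overlap, which at least attaches an explicit confidence level to the comparison rather than reporting a raw win count as a probability.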
Great read. I loved the honest balance between engineering features and how much they ultimately ended up mattering to users. Oftentimes, the features we select can be quite arbitrary -- it's good to do gut checks by running real-time validation of results as often as possible. Fortunately, at Stack, you've got the userbase to do just that :)
> Genetic algorithm running on 56-core machine<p>Wow, just... wow. I wonder why they didn't utilize (multiple) GPUs instead? I would guess it would be far more efficient in all aspects. Especially now that there is TensorFlow & co.