So we had this idea for a new feature for our product. The only way to ship it quickly was to implement some kind of machine learning algo that would give us the result we wanted. Voila! It seemed simple.

Now our company doesn't have a machine learning expert or a data science genius. Hiring one would take time, and bringing someone in on contract would be very expensive (our CEO wasn't ready to shell out that kind of money). So the task fell on me. They asked me to go through the multitude of machine learning MOOCs out there and get a working prototype ready in 2 weeks.

I had already done Andrew Ng's course back when it first came out, but my memory had faded for lack of practice.

So I ran through the course again, and went over a couple of online ML books too.

Then I started thinking about the problem at hand. Unfortunately, it turned out to be a chicken-and-egg problem: for the feature to work well, we needed a large amount of training data to train our models.
But without the feature actually deployed, we had no way to collect that training data.

So we ultimately fell back to a simple algo that makes its decisions based on a few hard-coded rules. Things have been working fine so far.
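For anyone curious what "a few hard-coded rules" tends to look like in practice, here's a minimal sketch. The parent doesn't say what the product does, so the feature, field names, and thresholds below are all hypothetical:

    # Hypothetical rule-based fallback: score an incoming item with a few
    # hand-tuned rules instead of a trained model. All field names and
    # thresholds are made up for illustration.

    def flag_for_review(item: dict) -> bool:
        """Return True if the item should be flagged, based on hard-coded rules."""
        score = 0

        if item.get("amount", 0) > 1000:           # unusually large value
            score += 2
        if item.get("account_age_days", 365) < 7:  # brand-new account
            score += 2
        if item.get("country") not in {"US", "CA", "GB"}:
            score += 1

        return score >= 3

    # Example usage
    print(flag_for_review({"amount": 5000, "account_age_days": 3, "country": "US"}))  # True

The nice thing about this kind of fallback is that the "training data" problem disappears, and every decision the feature makes is explainable when a customer asks why.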
Machine learning is much more nuanced than people seem to understand. You can't just throw data at a net and expect results; this field requires a heavy degree of intuition, and engineers must be prepared for nets to pick up on patterns not obvious to humans, which can lead to unintuitive results.

Neural nets are basically black-box heuristics with unpredictable edge cases. Much like human reasoning, I'd warrant!
That sounds awfully close to DMAIC.

https://en.wikipedia.org/wiki/DMAIC

Nothing wrong with that though...
So we do the loop 50 times and we now have an algorithm that works (97%!) on the test set. We are happy! We run it in production and everything looks good (probably 92%-ish). Everyone is happy! We all get promoted or get new jobs. Then, one day, someone actually looks at what it's doing... and lo. It. does. not. work (~51%). Everyone is sad. Apart from us! Yay!

Seriously - an optimisation loop on a test set? Seriously?
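For contrast, the discipline people usually recommend is to run the tuning loop against a validation split and touch the held-out test set exactly once. A minimal sketch with scikit-learn; the synthetic data and the hyperparameter grid are arbitrary, only the shape of the workflow matters:

    # Sketch: tune on a validation split, evaluate the held-out test set once.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

    # Split off the test set first and don't look at it during tuning.
    X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

    best_c, best_val = None, -1.0
    for c in [0.01, 0.1, 1.0, 10.0]:  # the optimisation loop runs on validation data only
        model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
        val_acc = accuracy_score(y_val, model.predict(X_val))
        if val_acc > best_val:
            best_c, best_val = c, val_acc

    # One final, honest number from data that never influenced any decision.
    final_model = LogisticRegression(C=best_c, max_iter=1000).fit(X_dev, y_dev)
    print("test accuracy:", accuracy_score(y_test, final_model.predict(X_test)))

Run the loop against the test set instead and you're effectively fitting your hyperparameters to it, which is exactly the 97% → 51% story above.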
The point about hacking away at the code needs to be couched heavily. It's too easy to conclude you've got negative or positive results when what you really have is a silly little bug. The lack of focus on implementation skills in data science (or even "real" science) is frightful. The one takeaway anyone trained in software engineering could share is that if you aren't very sure if it is working as intended, it's very likely not. Code review is very applicable here when making major pivots, even if unit or other testing is decidedly too time-consuming for the train-test-improve loop.

Edit: typo "of" to "if". Somewhat serendipitous if you think about it.
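One cheap habit along these lines is a couple of sanity checks that run before any real training: confirm the model can memorise a tiny slice of the data, and that no feature column is trivially the label. If either fails, you're probably debugging a bug, not a model. A sketch; the helper names and thresholds are invented for illustration:

    # Hypothetical smoke tests for an ML pipeline; helper names are made up.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def test_can_overfit_tiny_batch(X, y, n=32):
        """A healthy pipeline should nail a tiny memorisable sample.
        If it can't, suspect shuffled labels or broken features before blaming the model."""
        Xs, ys = X[:n], y[:n]
        model = LogisticRegression(C=1e6, max_iter=5000).fit(Xs, ys)
        assert accuracy_score(ys, model.predict(Xs)) > 0.95

    def test_no_label_leakage(X, y):
        """No single feature column should be identical to the label."""
        for j in range(X.shape[1]):
            assert not np.array_equal(X[:, j], y)

Tests like these take seconds to run, so they fit inside the train-test-improve loop even when a full unit-test suite doesn't.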