
Why is machine learning ‘hard’?

327 points by Dawny33 over 8 years ago

24 comments

mattlondon over 8 years ago
I think the problem is that we don't really understand ML properly yet.

Take picking hyperparameters: time and time again I've asked experts/trainers/colleagues "How do I know what type of model to use? How many layers? How many nodes per layer? Dropout or not?" etc., and the answer is always along the lines of "just try a load of stuff and pick the one that works best".

To me, that feels weird and worrying. It's as if we don't yet understand ML well enough to say definitively, for a given data set, what sort of model we'll need.

This can lead us down the debugging black hole TFA talks about, since we appear to have zero clue about why we chose something, so debugging might ultimately just be "oops - we chose 3 layers of 10, 15, and 11 nodes, instead of 3 layers of 10, 15, and *12* nodes! D'oh! Let's start training again!"

It really grates on me to think about this, considering how much maths, proofs, and algorithms get thrown at you when being taught ML, only to be told that when it comes to actually doing something it's all down to "intuition" (guessing).

And yeah, as others have said - data :-)
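The "just try a load of stuff and pick the one that works best" answer the commenter keeps getting is essentially random search. A minimal sketch - the search space, function names, and the stand-in scoring function are all illustrative, not from any real training pipeline:

```python
import random

# Hypothetical search space: exactly the choices the commenter lists,
# for which no principled selection rule is known.
search_space = {
    "layers": [2, 3, 4],
    "nodes_per_layer": [10, 15, 32, 64],
    "dropout": [0.0, 0.2, 0.5],
}

def evaluate(config):
    # Stand-in for "train the model, return validation accuracy".
    # In real life this is the expensive step that makes the search slow.
    random.seed(str(sorted(config.items())))
    return random.random()

def random_search(n_trials=20, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {k: rng.choice(v) for k, v in search_space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search()
```

The uncomfortable point stands: nothing in this loop explains *why* the winning configuration works, which is exactly the debugging black hole described above.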
Comment #12939963 not loaded
Comment #12939515 not loaded
Comment #12938836 not loaded
Comment #12939199 not loaded
Comment #12939885 not loaded
Comment #12939698 not loaded
Comment #12939965 not loaded
Comment #12939943 not loaded
blt over 8 years ago
I think to some extent, this kind of difficulty occurs with any numerical programming. With typical software engineering the bugs are usually in logic. All but the hardest problems can be diagnosed with a visual debugger. With numerical code, though, you usually can't look at the state of a few discrete-valued variables and a stack trace to figure out what went wrong. "Why is this matrix singular?" can take days to answer. You spend a lot of time staring at the code, comparing it to the math on paper, trying to visualize high-dimensional intermediate data, etc. Continuous math can be a lot harder to reason about than discrete.
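A concrete instance of the "Why is this matrix singular?" question, sketched with NumPy (assumed available): the variable view of a debugger shows nothing wrong, but the condition number is the numerical smoking gun.

```python
import numpy as np

# The second row is (almost) exactly twice the first, so A is
# numerically singular even though every entry looks innocent.
A = np.array([[1.0, 2.0],
              [2.0, 4.0 + 1e-12]])

cond = np.linalg.cond(A)
# Rule of thumb: roughly log10(cond) decimal digits of accuracy are
# lost in a solve, so a condition number near 1e13 leaves almost
# nothing of float64's ~16 digits.
print(f"condition number: {cond:.3e}")
```

`np.linalg.solve(A, b)` would still return *an* answer here, which is why this class of bug hides so well.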
Comment #12938321 not loaded
Comment #12938798 not loaded
Comment #12937337 not loaded
eva1984 over 8 years ago
So there is a greedy strategy for approaching a problem with ML:

1. Start with a vanilla, proven model, to establish a baseline you can fall back on. For example, in deep learning, start with fully-connected nets, then a vanilla CNN, adding BNs and ReLUs, then residual connections, etc.

2. Don't spend too much time tuning hyperparameters, especially in deep learning: once you change your algorithm, a.k.a. the network structure, everything changes.

3. Add complexity as you go. Once you have established a solid baseline, you can start adding fancier ideas to your stack, and you will find that the fancy ideas are improvements over already-working ones, and not that hard to add.

4. One important reminder: as you keep changing your algorithm, over time those changes might not be happy with each other. So reduction is also very important. Rethink your approach from time to time and take away stuff that doesn't fit anymore.

5. Look At Your Data. Garbage in, garbage out - it cannot be more true. Really. Look at your data, or a sample of it, and see whether you yourself, as the most intelligent being yet, can make sense of it. If you cannot, you probably need to improve its quality.

Anyway, ML is a very complex field and developing like crazy, but I don't feel the methodology for tackling it is any different from any other complex problem. It is an iterative process, starting from simple, proven solutions and building to something greater, piece by piece. Watch and think, then improve.
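Step 1's "establish a baseline you can fall back on" can be made concrete without any deep-learning stack. A toy sketch (the data and numbers are invented for illustration) comparing a predict-the-mean baseline against a one-feature linear fit:

```python
import statistics

def fit_mean_baseline(ys):
    """Simplest possible regressor: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Closed-form ordinary least squares for a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Invented data, roughly y = 2x: the linear model should comfortably
# beat the mean baseline, which is exactly the signal we want to see
# before adding anything fancier.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 2.2, 3.9, 6.1, 8.0, 10.2]
baseline_err = mse(fit_mean_baseline(ys), xs, ys)
linear_err = mse(fit_linear(xs, ys), xs, ys)
```

If a fancy addition can't beat numbers like these, the comment's point 4 applies: take it back out.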
jupp0r over 8 years ago
There are also lots of architectural problems with machine learning components to consider, beautifully summarized in "Machine Learning: The High Interest Credit Card of Technical Debt" by Sculley et al. http://research.google.com/pubs/pub43146.html
BickNowstrom over 8 years ago
I actually think machine learning is relatively easy. There are a lot of resources, the community is very open, state-of-the-art tools are available, and all it takes to get incrementally better is trying out more stuff on different data sets.

I worked in SEO before, which had far more elements of "black magic". Perhaps SEO helps with the transition to ML, because you are basically reverse engineering a model (Google's search engine) / crafting input to get a higher-ranked output. It's feature engineering, experimentation, and debugging all-in-one.

And front-end development of the old days... debugging old JavaScript or IE6 render bugs makes ML debugging pale in comparison. You had to make a broken model work, without being able to repair it.

As for the long debugging cycles in ML: John Langford coined "sub-linear debugging" - output enough intermediate information to quickly know if you introduced a major bug or hit upon a significant improvement [1]. Machine learning competitions are not so much won by skill as by the teams iterating faster and more efficiently: those who try more (failed) experiments hit upon more successful experiments. No neural net researcher should let all nets finish training before drawing conclusions/estimates about the learning process.

Sure, the ML field is relatively new, and computer programming has a longer history of proper debugging and testing. It *is* difficult to monitor feedback-looped models running in production, yet no more difficult than control theory ;). And proper practices are being developed as we speak [2].

The author will probably write a randomization script to avoid malordered samples automatically in the future.

[1] http://www.machinedlearnings.com/2013/06/productivity-is-about-not-waiting.html

[2] http://research.google.com/pubs/pub43146.html
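Such a randomization script can be a few lines. A sketch (the function name and interface are made up) of per-epoch shuffling into mini-batches:

```python
import random

def shuffled_batches(samples, batch_size, seed=0):
    """Yield mini-batches drawn from a shuffled ordering of the data.

    SGD assumes each batch looks like a random draw from the dataset;
    feeding samples in their stored (often sorted or grouped) order
    violates that, producing exactly the "malordered samples" problem.
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

batches = list(shuffled_batches(list(range(10)), batch_size=3, seed=1))
```

Calling it once per epoch with a fresh seed gives a different order each pass over the data.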
aflam over 8 years ago
The long and convoluted debugging cycle for machine learning really hurts my faith in ML models. This issue - with some practical advice - was the center of this interview (disclaimer: author): https://shapescience.xyz/blog/interview-data-science-methodology/

I'm convinced we lack decent tools for ML debugging: what could they be?
Matthias247 over 8 years ago
I think once you need to go deep enough into any topic, they all get hard.

Debugging and testing are also hard in anything related to realtime or concurrency, e.g. OS development, embedded firmware, network stacks, etc. For these things you often also need to know about maths, physics, statistics, electronics, and hardware and software architecture.

Game engine development is also hard, because you should know about most of this stuff to really find the most efficient solutions.
du_bing over 8 years ago
That's right - machine learning requires knowledge of so many fields that when a problem occurs, the developer has to run a great many checks to find it and optimize.
Comment #12937151 not loaded
partykid92 over 8 years ago
One big dimension here, the "implementation error", can easily be debugged. Gradients can be checked numerically. The model can be checked by looking at the optimality conditions (not just whether the loss function goes down). This shouldn't be an issue for anyone from a traditional coding background.
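A minimal numerical gradient check of the kind the comment describes: a central-difference approximation compared against a hand-derived gradient. The loss function here is an arbitrary illustration, not from the article.

```python
import math

def numerical_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

# Toy loss: L(w) = w0^2 + sin(w1), with analytic gradient [2*w0, cos(w1)].
def loss(w):
    return w[0] ** 2 + math.sin(w[1])

def analytic_grad(w):
    return [2 * w[0], math.cos(w[1])]

w = [0.3, -1.2]
num = numerical_grad(loss, w)
ana = analytic_grad(w)
# Relative error far below ~1e-4 is the usual "gradient is correct" signal;
# a large mismatch in one coordinate points straight at the buggy term.
errors = [abs(n - a) / max(abs(n) + abs(a), 1e-12)
          for n, a in zip(num, ana)]
```

In practice the same check is run against the backward pass of whatever model code you actually wrote.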
peatfreak over 8 years ago
Why would you expect it to be easy?
Comment #12937192 not loaded
Dzugaru over 8 years ago
Even if you develop an "intuition" for known tasks (like classification), there are so many problems that haven't been tackled yet, where no one has any "intuition" at all. Common sense very often doesn't work there (in high-dimensional spaces ;)).

For example, I only recently stumbled upon the "Explaining and Harnessing Adversarial Examples" article - and it completely changed my perception of my current work in computer vision.
js8 over 8 years ago
I think it is hard because of https://en.wikipedia.org/wiki/No_free_lunch_theorem

It follows that there is no single "good" algorithm, and you need to have and exploit domain knowledge in order to succeed.
norswap over 8 years ago
> Machine learning often boils down to the art of developing an intuition for where something went wrong (or could work better) when there are many dimensions of things that could go wrong (or work better).

I'm not a practitioner, but I always thought this was the main challenge. Uses of ML are rarely "right" or "wrong" per se; they rely on intuition to get a model that "works" in a practical sense.

There is no royal road to machine learning: you can't decide you are going to make an algorithm that detects bad comments (as determined by human consensus) and then just go make an implementation that you can reason out to be correct, the way you could prove a graph algorithm correct. Trial and error and hard-to-transcribe intuition are baked into the process.

(I'd love to get some insider insight on this comment!)
bitL over 8 years ago
I'll be a little sarcastic: given that we use sub-optimal/locally-optimal algorithms everywhere in ML due to time complexity, why would you expect nice/predictable results? It's more like a miracle if you find something that works; otherwise you hit the usual hard problems from optimization and end up in a "catch-as-you-can" situation where even Monte Carlo randomness is a good guess. And far too many people assume ML is just applied statistics, keeping their minds in that frame and missing out on the large-data capabilities of ML, where statistics is irrelevant and you can directly ask and find answers to many fundamental questions in your dataset.
ramblenode over 8 years ago
The diagrams are pretty misleading. One can craft a space with any number of arbitrary dimensions, but that's kind of meaningless until the space is populated with data. Certainly the likelihood of a bug is not uniformly distributed across the space, and certainly the density of bugs within a space varies greatly depending on the problem. I imagine the average kernel developer's 2-D space is both very dense and has greater spread than the 4-D space of many ML engineers.
jorgemf over 8 years ago
In software you can trace the program and detect which instruction is not doing what it is supposed to do. In machine learning there is no program to trace; it is not a set of instructions with a purpose - the whole thing either works or it doesn't. To discover what could be failing, you need deep knowledge of a lot of stuff (maths, statistics, CS) to figure out what is wrong. And sometimes the answer is that the problem doesn't have a solution.
Comment #12938339 not loaded
strictfp over 8 years ago
People don't use it simply because it's not what they signed up for. It's not obvious that a software engineer will enjoy being a data scientist. I for one think it's tedious, and I don't enjoy spending time and effort collecting the necessary data to solve my problems the ML way.
DrNuke over 8 years ago
Generally speaking, being hot in the media does not help: walking the walk is way harder than talking the talk.
godmodus over 8 years ago
Because of preprocessing and the need to choose the functions that will do the approximation - the process itself is semi-automatic, not fully automatic. An ANN's inner nodes are specific functions that need parameter tweaking (after choosing the right ones, that is); vector machines come in different kinds for different data, etc.

And those two things are very domain-specific, so you need to do a lot of homework first, and debugging later.
edblarney over 8 years ago
I've worked in computational linguistics:

1) It's 'hard' because you need a lot of 'training data' in order to train models, and that data is hard to get.

2) 'AI'-type interfaces represent a whole new kind of UI challenge. For 'predictive typing', for example, you can optimize an algorithm so that it does better for 90% of the US population, but then it gets 'worse' for the remaining 10%. So it's a paradox. This can have weird effects.

For example, if you have an app in the app store, you may leave the settings so that it's 'broadly optimal'. You get OK stars.

If you then make it 'better' for those 90%, you might get a little boost in ratings, but you get 1- and 0-star ratings from the 10% for whom it's a sub-par experience. This can destroy your product.

Anyhow - often 'there is no right answer' in AI, and setting expectations can be extremely difficult.

And all of that has nothing to do with CS.
Comment #12937643 not loaded
Comment #12937693 not loaded
Comment #12940002 not loaded
Comment #12937883 not loaded
Comment #12938640 not loaded
tmptmp over 8 years ago
Beautifully written and insightful.

>> After much trial and error I eventually learned that this is often the case of a training set that has not been correctly randomized and is a problem when you are using stochastic gradient algorithms that process the data in small batches.

Take the single term "stochastic gradient algorithms" from the above sentence: it spans three key areas - statistics, calculus, and CS.

These three things are complex even when studied in isolation. For ML, you must be able to juggle these three fireballs effectively. No surprise it's much, much more difficult than many other software engineering problems.
Comment #12937227 not loaded
Comment #12938011 not loaded
atomical over 8 years ago
I&#x27;m getting a 404.
Comment #12937450 not loaded
Comment #12937687 not loaded
dschiptsov over 8 years ago
Because most of the models are flawed or wrong.
fucking_idiot over 8 years ago
Mostly because the implementation is really tough - mostly lots of matrices and calculus. I recommend using sklearn.