Setting aside the trivial case (for any model there exists a distribution and a dataset on which it performs poorly), just a general thought. Intro ML teaches us that if we want to "learn" a hypothesis class reasonably well from a finite sample, the class shouldn't be too complex; otherwise we lose precision and/or any generalization guarantees. This suggests that for any DL algorithm A trained on a sample S, there exists a data transformation g such that some simpler algorithm B trained on g(S) achieves comparable (or even lower) risk. In the extreme, g could be the representation learned by A itself, with B a linear probe on top, so such a g always exists. The question is not whether linear models are good or bad, but how complex/expensive the transformation g is.
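
A minimal sketch of the point, assuming scikit-learn; the dataset (concentric circles) and the feature map (random Fourier features via RBFSampler) are illustrative choices, not anything from the original comment. The same linear model B is near chance on raw S but does well on g(S):

    # Linear model on raw data S vs. on transformed data g(S).
    from sklearn.datasets import make_circles
    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_circles(n_samples=1000, noise=0.1, factor=0.4, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # B(S): a linear model on raw features -- roughly chance-level here.
    linear = LogisticRegression().fit(X_tr, y_tr)
    print("linear on S:    %.2f" % linear.score(X_te, y_te))

    # B(g(S)): the same linear model after a cheap nonlinear feature map g.
    g = RBFSampler(gamma=2.0, n_components=200, random_state=0)
    linear_g = LogisticRegression(max_iter=1000).fit(g.fit_transform(X_tr), y_tr)
    print("linear on g(S): %.2f" % linear_g.score(g.transform(X_te), y_te))

Here g is cheap (a fixed random projection); the interesting cases are where the needed g is as expensive to find as training the deep model itself.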
Thanks for posting this.

I work in data science and I haven't seen anybody claim that PCA, an unsupervised technique, is a good classification (supervised) technique. I think that's the framing the author is pursuing.

That said, some candidates might mention clustering because it's easy to understand and you can attach an action to each group (e.g., this group scores high but they're also good customers, so they get special treatment).
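
To make the clustering-for-action idea concrete, here's a hedged sketch assuming scikit-learn; the customer features (annual spend, support tickets) and the segment count are made up for illustration:

    # Cluster customers into segments, then profile each segment to pick an action.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Fake customer table: columns are [annual_spend, support_tickets].
    customers = np.column_stack([
        rng.gamma(shape=2.0, scale=500.0, size=300),  # spend ($)
        rng.poisson(lam=3.0, size=300),               # tickets
    ])

    X = StandardScaler().fit_transform(customers)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Inspect each segment's profile; the business decides the treatment per group.
    for k in range(3):
        seg = customers[labels == k]
        print(f"segment {k}: n={len(seg)}, "
              f"avg spend=${seg[:, 0].mean():.0f}, avg tickets={seg[:, 1].mean():.1f}")

The appeal in an interview setting is exactly this loop: unsupervised grouping is easy to explain, and each group maps to a concrete treatment.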