Micro-models: purposefully overfit models that are good at one specific thing

72 points by ulrikhansen54 almost 4 years ago

10 comments

ALittleLight almost 4 years ago
This doesn't really seem like "overfitting" as I understand the concept. This is more like training a model to do a specific task rather than a more general task. Overfitting would be if your model started to memorize the training data, which doesn't seem to be what they are talking about and doesn't seem like it would be very useful.
ruinar50 almost 4 years ago
Putting aside the detailed discussions on what exactly "overfitting" is for the moment, I'm interested to hear more about the utility of micro-models in actual value-delivery pipelines.

Does it matter if it's technically overfitting or not if everyone understands what their "one specific thing" is and how to "stitch" them together to get accurate results over some real-world problem space? (Conversely, people have to recognize the limitations.) Also, as a word, "micro-model" gives us neutral vocabulary for a model that doesn't solve the whole problem space but does work for some of it, as opposed to "overfit model" or "incomplete model", which cast negative connotations on a concept that is potentially useful when properly applied. (Though an eventual consensus on vocabulary is likely necessary as the space matures...)

Later parts of the article introduced kick-off, iteration, and prototyping time as concrete benefits. I'd be interested to see a follow-up addressing how micro-models fit into a general problem-solving pipeline. What's next in terms of speeding up the assembly-line process? Where do they fit into data-oriented programming on the whole?
brainwipe almost 4 years ago
I'm not sure this is overfitting so much as a very narrow training set. It's still generalising against inputs it hasn't seen. If it were really overfitted then it wouldn't work for any unseen frames and it would be learning the "noise". It's not learning noise, else you'd get lots of false positives, such as dark areas in the frame that look a bit like Batman but aren't. The main reason you want to generalise is noise rejection (no mention of this in the article). I think the S/N ratio in a video is exceptionally high, as the dataset is directly repeatable, so the source of truth is exceptionally accurate.

That being said, narrow training sets are a great idea and this application looks great.
robojoker almost 4 years ago
This is an approach that I have used when doing attribution. Given error signals of a larger system, I couldn't get great performance attributing the errors to a particular broken component in the system. However, when I broke that component down into its set of particular issues and built a classifier per issue, I was able to get great performance. With the lightweight models we used, it was straightforward to automate most of the training / validation of these component-issue-specific models and decommission them when the issue no longer existed (a fix was put in).
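
A minimal sketch of that per-issue classifier setup, assuming scikit-learn-style binary classifiers; the issue names, feature matrix, and helper functions are hypothetical, not anything from the comment:

```python
# One lightweight binary classifier per known issue. Issue names,
# features, and helper names here are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

ISSUES = ["disk_full", "timeout", "bad_config"]

def train_issue_models(X, issue_sets):
    """X: error-signal features; issue_sets: per-sample sets of observed issues."""
    models = {}
    for issue in ISSUES:
        y = np.array([issue in s for s in issue_sets])
        models[issue] = LogisticRegression(max_iter=1000).fit(X, y)
    return models

def attribute(models, x):
    """Attribute a new error signal to every issue whose classifier fires."""
    return [issue for issue, m in models.items()
            if m.predict_proba(x.reshape(1, -1))[0, 1] > 0.5]

def decommission(models, issue):
    """Retire a micro-model once its underlying issue has been fixed."""
    models.pop(issue, None)
```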
l-lousy almost 4 years ago
Interesting article; this also seems like a form of knowledge distillation. There have been a lot of examples of people distilling an ensemble into a single model, so maybe you could try that here directly by cutting out the middleman (match the micro-models' outputs directly instead of labeling data).

Anyway, I've been trying to think of how this could be used for text data, specifically NER, which generally requires a lot more semantic understanding of the input. Sadly, it seems like there might not be much room for the 'micro' part of the micro-models.
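
A rough sketch of that direct output-matching (ensemble into a single student), assuming PyTorch; `micro_models`, `student`, and the temperature are placeholders, not details from the article:

```python
# Distill an ensemble of micro-models into one student by matching the
# ensemble's averaged, temperature-softened outputs (soft targets).
import torch
import torch.nn.functional as F

def distill_step(student, micro_models, x, optimizer, T=2.0):
    with torch.no_grad():
        # Averaged softened probabilities from the ensemble act as the target.
        teacher_probs = torch.stack(
            [F.softmax(m(x) / T, dim=-1) for m in micro_models]
        ).mean(dim=0)
    student_log_probs = F.log_softmax(student(x) / T, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```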
jogundas almost 4 years ago
A nice example of overfitting!

However, it is hard to imagine an actual application of the process. If I understand it correctly, the author suggests using a set of micro-models for annotating a dataset, which is then used to train another model. The latter model can actually detect Batman in a general environment, i.e., can generalize. However, enriching a training dataset by adding adjacent frames depicting Batman from the same movie will likely have limited usefulness when training an actual Batman-detection (non-micro!) model. Or do I get the final application wrong?
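
For concreteness, a minimal sketch of the pipeline as described above: micro-models pseudo-label frames, and the resulting labels train a general detector. `m.predict` (returning a Batman probability) and the confidence threshold are illustrative assumptions:

```python
# Micro-models pseudo-label frames; confident labels feed a general model.
# `m.predict` and the 0.9 threshold are assumptions, not the article's API.
def pseudo_label(frames, micro_models, threshold=0.9):
    labeled = []
    for frame in frames:
        scores = [m.predict(frame) for m in micro_models]
        if max(scores) >= threshold:
            labeled.append((frame, 1))       # confidently Batman
        elif max(scores) <= 1 - threshold:
            labeled.append((frame, 0))       # confidently not Batman
        # Ambiguous frames are dropped rather than risk mislabeling.
    return labeled

# general_detector.fit(*zip(*pseudo_label(all_frames, per_movie_models)))
```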
tomrod almost 4 years ago
Neat concept. Suggestion to the author: show the out-of-sample fit statistics and how the interpolation-versus-extrapolation regions are determined.
underaxon almost 4 years ago
I may be wrong, but I think this is what kernel methods (e.g., SVMs) do, right? So this looks like a (deep) SVM where the kernels are small NNs.
klysm almost 4 years ago
I think the important piece missing from the headline is that these micro-models are combined in an ensemble-like fashion. Because of that, I wouldn't really call it overfitting per se; it's more of a very restricted space to care about.
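
A tiny sketch of that ensemble-style combination, where each micro-model owns one narrow slice and the final call goes to whichever specialist is most confident; `models` and `score` are illustrative placeholders:

```python
# Combine micro-models ensemble-style: each covers a narrow slice, and
# the most confident specialist decides. `score` is a placeholder API.
def ensemble_predict(x, models, threshold=0.5):
    scores = {name: m.score(x) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return scores[best] >= threshold, best  # (fires?, which specialist)
```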
abz10 almost 4 years ago
Nothing new was discovered here, and the key terminology is used incorrectly.

To be fair, most of the industry are amateurs, but most people don't write Medium posts and then continue to argue their ignorance on HN.