
Ask HN: How do you manage to keep improving your ML models' accuracy?

3 points by gerenuk, over 7 years ago
Hello everyone,

What do you do to improve training-data labels to get a better success rate? Are you using some framework, or fixing them manually after some time?

We have been working on topic modeling, and managing everything manually is quite hectic, so I'm looking for better solutions.

Thanks
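One common semi-automated alternative to reviewing every label by hand is uncertainty sampling: ask the model which documents it is least confident about, and send only those for manual relabeling. A minimal sketch, assuming you already have per-document class probabilities from your topic model (the document ids and probabilities below are invented for illustration):

```python
# Hypothetical sketch: rank documents for manual label review by model
# uncertainty, so annotators spend time where the model is least sure.

def least_confident(probs_by_doc, k):
    """Return the k doc ids whose top predicted class probability is lowest."""
    scored = [(max(probs), doc_id) for doc_id, probs in probs_by_doc.items()]
    scored.sort()  # lowest top-probability (most uncertain) first
    return [doc_id for _, doc_id in scored[:k]]

# Made-up predicted probabilities over three topics.
probs = {
    "doc1": [0.95, 0.03, 0.02],  # confident -> low review priority
    "doc2": [0.40, 0.35, 0.25],  # uncertain -> review first
    "doc3": [0.70, 0.20, 0.10],
}
print(least_confident(probs, 2))  # ['doc2', 'doc3']
```

The same ranking idea works with any model that emits probabilities; the only design choice here is the uncertainty score (top probability, margin, or entropy).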

1 comment

PaulHoule, over 7 years ago
From an engineering standpoint, the question is understanding where the bottlenecks are.

For instance, your feature set might limit your accuracy. Let's say you are interested (or uninterested) in posts about the Go programming language on HN, and you are classifying based on the title. "Golang" predicts "Go Language" accurately, but "Go" does not. No matter how much you train, you will reach a limit unless you have beyond-bag-of-words features like "Go Development", "Go Implementation", ...

Many NLP projects fail because people decide up front to throw away critical information that they can never get back. Going beyond BoW is not trivial, however, because if you vastly increase the number of features, most will be poorly sampled and you won't learn from them.

Past feature engineering, there are very interesting questions in active learning that are not covered well in the academic literature, largely because active-learning experiments are not reproducible in a Kaggle-like competition. There is also the human factor: you can destroy people psychologically by making them split hairs that don't matter. Realistically you can get 2,000 judgements a day out of a person if that is all they do; 200 is more likely from an expert who does other things.

Click on my profile link and send me an email and I can share what I know.
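The title-ambiguity bottleneck described above can be sketched concretely. A toy illustration, assuming unigram versus bigram token features; the example titles and the bigram list are invented for this sketch, not drawn from any real dataset:

```python
# Toy illustration of why unigram bag-of-words hits an accuracy ceiling:
# the unigram "go" matches both the language and the verb, while bigrams
# such as "go language" disambiguate. The phrase list is hypothetical.

def ngrams(title, n):
    """Lowercase the title and return its set of word n-grams."""
    toks = title.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

# Invented bigram features signaling the Go programming language.
GO_BIGRAMS = {"go language", "go implementation", "go development", "golang release"}

def mentions_go_lang(title):
    # The unigram "go" alone would also match "Go to the moon";
    # requiring a known bigram avoids that false positive.
    return bool(ngrams(title, 2) & GO_BIGRAMS)

print(mentions_go_lang("New Go Language features"))    # True
print(mentions_go_lang("Go to the moon on a budget"))  # False
```

A real classifier would learn such phrase weights from data rather than hard-code them, but the point stands: if the feature extractor discards word order up front, no amount of training recovers it.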