Classifying aviation-related posts on Hacker News with SLMs

10 points by sethkim, about 1 month ago

1 comment

minimaxir, about 1 month ago

A few misc notes:

1. The better way to get all Hacker News data, instead of blasting the API, is to download it from the official BigQuery dataset, which can do the task in a single query: https://news.ycombinator.com/item?id=40644563

2. For labeling the posts, instead of label-then-explanation, it may be better to do explanation-then-label, to give the model a chance to reason through the edge cases.

3. Following up from #2, when prompt engineering the system prompt, it would likely be better to give a list of multiple valid examples and invalid examples (as noted after the fact) to guide reasoning.

4. Since the target label is a binary objective, it may be more practical/faster/cheaper to create a normal logistic regression model (e.g. tf-idf/BoW) from a large representative sample, then use that to predict the rest of the labels.

The more advanced way to do #4 would be to encode the posts as text embeddings first, then use them as the input for a small MLP model... which I may or may not have a project in the pipeline based around that approach.
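To make point 4 concrete, here is a minimal sketch of a tf-idf plus logistic regression baseline, written in plain Python so it runs without extra packages. It is only an illustration under stated assumptions: the eight training titles and their labels are made up for the example (they are not from the original post's dataset), the function names are hypothetical, and in practice a library such as scikit-learn, trained on a large LLM-labeled sample, would be the more natural choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised tf-idf vector; tokens unseen in training are dropped."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(vec, weights, bias):
    """Probability of the positive class for one sparse vector."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))


def train(docs, labels, epochs=200, lr=0.5):
    """Fit logistic regression with simple per-example gradient descent."""
    idf = fit_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = score(vec, weights, bias) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return idf, weights, bias


def predict_proba(doc, model):
    idf, weights, bias = model
    return score(tfidf(doc, idf), weights, bias)


# Made-up toy sample: 1 = aviation-related, 0 = not. A real run would use
# a large representative sample labeled by the LLM instead.
train_docs = [
    "Boeing 737 MAX returns to service after FAA review",
    "Airbus tests autonomous landing on A350 aircraft",
    "Pilot shortage forces airline to cut flights",
    "FAA investigates runway incursion at airport",
    "Show HN: A Rust web framework for fast APIs",
    "Postgres query planner internals explained",
    "Why I switched from Vim to Emacs",
    "New JavaScript bundler is faster than webpack",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train(train_docs, labels)
for title in ["FAA grounds Boeing aircraft after pilot report",
              "A faster Postgres driver written in Rust"]:
    print(f"{predict_proba(title, model):.2f}  {title}")
```

The idea is that the expensive model labels only the sample, and the cheap classifier predicts the rest; posts whose predicted probability sits near 0.5 could still be sent back to the LLM for a second look.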