
科技回声

A technology news platform built with Next.js, providing global tech news and discussion.

© 2025 科技回声. All rights reserved.

Ask HN: AI labeling and TL;DR for HN posts?

2 points | by A_No_Name_Mouse | 12 months ago
HN is a great source of news, but not every subject is of equal interest to everybody. Ideally I would like a personalized feed of submissions that may interest me, with a short summary of each linked article.

I currently use the RSS feed of new submissions to automatically filter posts on keywords in the title (e.g. "space" but not "SpaceX"). This has the added benefit that niche articles with 0 upvotes still get my attention. An AI should be able to categorize each article fairly easily, so you could filter on category/label instead of literal keywords. This could also address a request often seen on HN ("is there an HN focused on subject xyz"), because focus could then be achieved with a filter over HN as the single source. A generated TL;DR would be a bonus.

Knowing the number of smart, AI-focused people around here, someone has probably already thought about or even implemented this. So what do you think? Ideally this would be a publicly available feature instead of everyone analyzing and filtering articles themselves.

Edit: changed "posts" to "submissions" to make clear this is about submitted articles, not comments.
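The keyword filtering described above can be sketched with Python's standard library. The feed content below is invented for illustration (a real run would fetch hnrss.org over HTTP); the function name `filter_titles` is also an assumption, not part of any existing tool:

```python
import xml.etree.ElementTree as ET

# Minimal sample of an hnrss-style RSS feed (invented items; real data
# would come from e.g. https://hnrss.org/newest via an HTTP fetch).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>New space telescope images released</title>
        <link>https://example.com/telescope</link></item>
  <item><title>SpaceX launches another batch of satellites</title>
        <link>https://example.com/spacex</link></item>
  <item><title>Why Rust is gaining ground</title>
        <link>https://example.com/rust</link></item>
</channel></rss>"""

def filter_titles(feed_xml, include, exclude):
    """Keep items whose title matches an include keyword but no exclude keyword."""
    root = ET.fromstring(feed_xml)
    kept = []
    for item in root.iter("item"):
        title = item.findtext("title", "")
        low = title.lower()
        if any(k in low for k in include) and not any(k in low for k in exclude):
            kept.append(title)
    return kept

print(filter_titles(SAMPLE_FEED, include=["space"], exclude=["spacex"]))
# -> ['New space telescope images released']
```

Note that "SpaceX" is excluded even though its title contains "space", because the exclude list is checked after the include list; this mirrors the "space but not SpaceX" rule from the post.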

2 comments

Leftium | 12 months ago
Hacker News with tags:

- https://histre.com/hn (discussion: https://hw.leftium.com/#/item/35904988)

I use the Kagi summarizer to get a TL;DR for articles. Kagi provides two levels of summary:

- https://blog.kagi.com/universal-summarizer
PaulHoule | 12 months ago
My YOShInOn RSS reader works pretty well on HN comments. It ingests about 110 feeds, including https://hnrss.org/bestcomments; earlier I had tried https://hnrss.org/newcomments, but the volume was overwhelming compared to the set of feeds I had at the time.

I treat recommendation as a classification problem: I run documents through a model from SBERT, then do clustering, classification, and so on with tools from scikit-learn. The system currently trains on my last 120 days' worth of judgements and takes about 3 minutes to train, evaluate, and calibrate a model.

k-means clustering works great for lumping articles into big categories: sports articles wind up together, articles about computer programming, others about the Ukraine war, etc. These categories aren't labeled, but the system works by clustering the data and showing me the highest-scoring articles. I like the results a lot.

99% of the posts that I make to HN were selected by the system and then selected by me, twice.

You can ask ChatGPT to do topic classification; if you are lucky and suggestible, you'll probably be impressed with the results initially, but when the honeymoon is over you will see it isn't as accurate as you'd like. It's also slow and expensive.

I've thought about developing a topic classifier using the same methods I use for recommendation; the main challenge is getting a training set. My take is that it takes 2,000-8,000 labeled examples to make a good classifier for one category, so supporting 20 categories would need 40,000-160,000 labeled documents.
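The embed-then-cluster pipeline described in this comment can be sketched as follows. To keep the sketch self-contained, a hashed bag-of-words vector stands in for a real SBERT embedding, and a minimal Lloyd's k-means is written inline; in practice you would use sentence-transformers for the embeddings and scikit-learn's KMeans for the clustering. The titles are invented examples:

```python
import hashlib

DIM = 64  # toy dimensionality; real SBERT models emit several hundred dims

def embed(text):
    """Stand-in for an SBERT sentence embedding: normalized hashed bag-of-words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def kmeans(vectors, k, iters=20):
    """Minimal Lloyd's algorithm; scikit-learn's KMeans replaces this in practice.
    Centroids are seeded deterministically from evenly spaced inputs so the
    sketch is reproducible (real k-means uses k-means++ random init)."""
    step = max(1, len(vectors) // k)
    centroids = [list(v) for v in vectors[::step]][:k]

    def nearest(v):
        return min(range(k),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(v) for v in vectors]

titles = [
    "local team wins the championship game",
    "the team loses a playoff game at home",
    "debugging python code in production",
    "python code for parsing json files",
]
# titles that share vocabulary land in the same cluster
labels = kmeans([embed(t) for t in titles], k=2)
```

The unlabeled clusters this produces match the comment's description: the system never names "sports" or "programming", it just groups similar articles and lets a human look at the highest-scoring members of each group.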
Labeling 1,000 documents a day takes about as much time and energy as a serious videogame habit. I have at times labeled 4,000 images a day, but I found it has effects on my visual system, including hallucinations (e.g. go label photos of people and then ride the bus, and you'll find yourself automatically classifying people as having "short hair" or "medium hair" or whatever).

There are some ways to cheat. https://tildes.net/ has a pretty good classification system and I've been tempted to crawl the site; some newspapers also have good classification systems. (YOShInOn has avoided using these because I want it to learn to read text.) My k-means clusters correspond more or less to topics, so a little editing of the results would also be a fast way to build a training set.

Another question is what inputs to use: just the title, or more of the article? For an "Ask HN" the title might be all you want. Titles are easy to pull from the HN API, but crawling the actual articles would be a lot more work and mean collecting vastly more data. There's a real limit to how well you'll do with titles alone, because some titles are ambiguous.
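Once a labeled training set exists (however it was built), a topic classifier over embeddings can be as simple as nearest-centroid: average each category's example embeddings and assign new titles to the closest average. A minimal sketch, again using a hashed bag-of-words stand-in for a real sentence embedding; the titles, labels, and function names are all invented for illustration:

```python
import hashlib

DIM = 64

def embed(text):
    # Stand-in for a real sentence embedding (e.g. an SBERT model).
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]

def train_centroids(labeled):
    """Average the embeddings of each category's labeled examples."""
    sums, counts = {}, {}
    for text, label in labeled:
        v = embed(text)
        if label not in sums:
            sums[label] = [0.0] * DIM
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], v)]
        counts[label] += 1
    return {label: [x / counts[label] for x in s] for label, s in sums.items()}

def classify(text, centroids):
    """Pick the category whose centroid has the largest dot product with the text."""
    v = embed(text)
    return max(centroids,
               key=lambda lab: sum(a * b for a, b in zip(v, centroids[lab])))

training = [
    ("team wins the championship game", "sports"),
    ("star player injured before the game", "sports"),
    ("new python release improves speed", "programming"),
    ("debugging python code in production", "programming"),
]
centroids = train_centroids(training)
prediction = classify("the team lost the game last night", centroids)
```

With only titles as input, as discussed above, this kind of classifier works when the title shares vocabulary (or, with real embeddings, meaning) with the training examples, and fails on ambiguous titles, which is exactly the limitation the comment points out.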