My YOShInOn RSS reader works pretty well on HN comments. It ingests about 110 feeds, including <a href="https://hnrss.org/bestcomments" rel="nofollow">https://hnrss.org/bestcomments</a>; earlier I had tried <a href="https://hnrss.org/newcomments" rel="nofollow">https://hnrss.org/newcomments</a>, but the volume was overwhelming compared to the set of feeds I had at the time.<p>I treat recommendation as a classification problem: I run documents through an SBERT model and then do clustering, classification, and so on with tools from scikit-learn. The system currently trains on my last 120 days' worth of judgements and takes about 3 minutes to train, evaluate, and calibrate a model.<p>k-means clustering works great for lumping articles into big categories: sports articles wind up together, as do articles about computer programming, the Ukraine war, etc. These categories aren't labeled, but the system works by clustering the data and showing me the highest-scoring articles. I like the results a lot.<p>99% of the posts that I make to HN were selected first by the system and then twice by me.<p>You can ask ChatGPT to do topic classification; if you are lucky and suggestible you'll probably be impressed with the results at first, but when the honeymoon is over you'll see it isn't as accurate as you'd like. It's also slow and expensive.<p>I've thought about building a topic classifier with the same methods I use for recommendation; the main challenge is getting a training set. My take is that it takes 2,000-8,000 labeled examples to make a good classifier for one category, so if you want to support 20 categories you'll need 40,000-160,000 labeled documents. Labeling 1,000 documents a day takes about as much time and energy as a serious videogame habit. I have at times labeled 4,000 images a day, but I found it has effects on my visual system, including hallucinations. (e.g. 
go label photos of people and then ride the bus and you'll find yourself automatically classifying people by whether they have "short hair" or "medium hair" or whatever.)<p>There are some ways to cheat. <a href="https://tildes.net/" rel="nofollow">https://tildes.net/</a> has a pretty good classification system and I've been tempted to crawl the site; some newspapers also have good classification systems. (YOShInOn has avoided using these because I want it to learn to read text.) My k-means clusters correspond more or less to topics, so a little hand-editing of the cluster results would also be a fast way to build a training set.<p>Another question is what inputs to use: just the title, or more of the article? In the case of an "Ask HN" the title might be all you want. Titles are easy to pull out of the HN API, but crawling the actual articles would be a lot more work and mean collecting vastly more data. And there's a real limit to how well you can do with titles alone, because some titles are ambiguous.
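For the curious, the embed-then-cluster-and-classify approach described above can be sketched roughly like this with scikit-learn. Random vectors stand in for real SBERT embeddings (in practice you'd call SentenceTransformer(...).encode(texts) from the sentence-transformers package), and every name and size here is illustrative, not taken from YOShInOn:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in embeddings: 200 documents x 384 dims (a MiniLM-sized vector);
# a real system would get these from an SBERT model, not random noise.
X = rng.normal(size=(200, 384))

# Stand-in thumbs-up/down judgements accumulated by the reader.
y = rng.integers(0, 2, size=200)

# Unsupervised step: lump documents into broad, unlabeled topic buckets.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)

# Supervised step: score each document by predicted interest.
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

# Surface the highest-scoring documents, with their cluster assignments.
top = np.argsort(scores)[::-1][:10]
for idx in top:
    print(f"doc {idx}: cluster {km.labels_[idx]}, score {scores[idx]:.2f}")
```

In a real setup the calibration step mentioned above could use something like scikit-learn's CalibratedClassifierCV so the scores behave like probabilities, but that's a separate refinement.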