I tried to train Bayesian (and other) classifiers to reliably pick the same stories to read as I would. Despite looking at a variety of features (title, poster, domain, the text of the article, the text of the comments), I found their accuracy was never really better than 60%.<p>Then I tried rating the same set of articles myself several times. I only agreed with my own earlier ratings around 60% of the time too.<p>Figures.
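For anyone who wants to repeat the experiment, here is a rough sketch of the two pieces described above: a tiny naive Bayes classifier over title words, and a check of how often two rating passes agree. This is not the code the commenter used; the names `TitleNaiveBayes`, `tokenize` and `agreement` are made up for illustration, and it only looks at titles, not poster, domain, or comment text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9.]+", text.lower())


class TitleNaiveBayes:
    """Hypothetical two-class naive Bayes over title tokens (read / skip)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, title, would_read):
        tokens = tokenize(title)
        self.word_counts[would_read].update(tokens)
        self.doc_counts[would_read] += 1
        self.vocab.update(tokens)

    def log_score(self, title, label):
        # Smoothed log prior for the class.
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        # Laplace-smoothed log likelihood of each token given the class.
        total_words = sum(self.word_counts[label].values())
        denom = total_words + len(self.vocab) + 1
        for tok in tokenize(title):
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, title):
        """Return True if the title scores higher as 'would read'."""
        return self.log_score(title, True) > self.log_score(title, False)


def agreement(first_pass, second_pass):
    """Fraction of items that received the same label in both passes."""
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return matches / len(first_pass)
```

Running `agreement` on two of your own rating passes gives the ceiling to compare the classifier against: if you only match yourself 60% of the time, a classifier scoring 60% against one of those passes is not obviously doing worse than you are.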
Very cool! I've been hacking around with modifying HN's interface via JS a lot recently - this will be a welcome tool in my experiments.<p>One comment: The up/down votes are really "strong" visually. Perhaps make them smaller and/or lighter in color?
Maybe it's easier to just classify the things you wouldn't want to read and hide those as less interesting. Because of the variety of topics, training something to figure out what you like seems much more restrictive on the flow.<p>E.g. if you rarely read things with ".js" (there's a stupid amount of JS library posts here), it's easier to say "this is uninteresting to me" than to classify everything else as interesting so that the algorithm has to infer that you find JS libraries uninteresting.<p>Then again, I'm pretty interested in node, just not necessarily in JS libraries for APIs, so it's a tough problem indeed.
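A minimal sketch of that hide-only approach, assuming you have some way to mark a story as dismissed. Everything here is hypothetical: the `DislikeFilter` class, the `min_dismissals` threshold, the stopword list, and the `always_keep` override (a crude answer to the node-vs-JS-libraries problem) are illustrative choices, not anything from an existing tool.

```python
import re
from collections import Counter

# Assumed stopword list so common words never trigger hiding.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and", "is", "with"}


def features(title):
    """Tokens from the title, plus a dotted suffix such as '.js' when present."""
    found = set()
    for tok in re.findall(r"[a-z0-9.]+", title.lower()):
        tok = tok.strip(".")
        if not tok or tok in STOPWORDS:
            continue
        found.add(tok)
        if "." in tok:
            found.add("." + tok.rsplit(".", 1)[1])
    return found


class DislikeFilter:
    """Hide stories that share features with stories you dismissed before."""

    def __init__(self, min_dismissals=2, always_keep=()):
        self.dismissed = Counter()
        self.min_dismissals = min_dismissals
        self.always_keep = set(always_keep)

    def dismiss(self, title):
        # Count each feature once per dismissed story.
        self.dismissed.update(features(title))

    def is_hidden(self, title):
        lowered = title.lower()
        # Explicit interests win over learned dislikes.
        if any(term in lowered for term in self.always_keep):
            return False
        return any(self.dismissed[f] >= self.min_dismissals for f in features(title))
```

Nothing is hidden until a feature has been dismissed `min_dismissals` times, so the default is to show everything and only the things you keep rejecting drop out.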
As an alternative, just trust the HN home page algorithm.<p>Stories seem to move up to a relevant max rank position, stay there and then move back down. Big stories stay in the top 5 for 20+ hours.<p>Here's what I do: If I only have time to look at 5 stories per day, I visit once per day, at whatever time, and look at the first 5 stories. If I have time to look at 20, I look at the first 20.<p>Set yourself a timeout, start reading at the top, stop when the time is up, repeat after 12 or 24 hours. Works very well for me: I get the best stories and feel pretty well informed.
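The timeout routine is simple enough to write down. A rough sketch, assuming the front page is already available as a ranked list; `read_until_timeout`, its `read` callback and the injectable `clock` are hypothetical names used only for illustration.

```python
import time


def read_until_timeout(ranked_stories, budget_seconds, read, clock=time.monotonic):
    """Read stories from the top of the ranking until the time budget runs out.

    `read` is called once per story; `clock` is injectable so the loop
    can be exercised without actually waiting.
    """
    deadline = clock() + budget_seconds
    finished = []
    for story in ranked_stories:
        # Check the clock before starting each story, not in the middle of one.
        if clock() >= deadline:
            break
        read(story)
        finished.append(story)
    return finished
```

The budget is only checked between stories, so a story that is already open gets finished even if it runs over.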