Would love to get people's feedback - I built this for myself over the weekend as a way to accelerate and limit my reading of HN (inspired by the NLP course from Stanford).<p>The NLP is pretty basic: the summary length is a fixed ratio of the original article's length, so you do get some longer listings.<p>Big thanks to Wayne Larsen of hckrnews.com for providing me with some insight on tracking top stories and letting me use his ranking data. Also, I recommend <a href="http://www.hackernewsletter.com/" rel="nofollow">http://www.hackernewsletter.com/</a> for a human-curated version.
Here is a simple recipe for doing something similar that works decently well as a starting point:<p>--------------------------------------------------------<p>Count how many times each word appears in the document, using a dictionary or map structure.<p>Also make sure you track the total word count.<p>document |> splitBySpace |> if dictionary has word then count+1 else 1; totalWords++<p>Then split the document into sentences.<p>Okay, now for each sentence:<p>==========================================<p>score = 0<p>split the sentence by spaces and<p>for each word: score += -(dictionary[word]/sum) * log(dictionary[word]/sum)<p>sentenceScores.Add(sentence, score)<p>==========================================<p>So now each sentence has a score. You can sort by score (losing the original order), or, if you want to keep only a fraction (0 - 1) of the document based on score:<p>find the best score and keep each sentence where limit < sentenceScore / bestScore.<p>As I said, this is only a starting point and is susceptible to lists of random words (guess why); there are many ways to make it better. Here is a portion of code I dug up from a while ago:<p><pre><code> // Helpers not shown here (curryfst, mapAdd, mapGet, filterStop,
 // flip, log2, splitstr, splitSentenceRegEx) come from a small utility library.
 // Sum all the values in a map.
 let inline sumMap m = m |> Map.fold (curryfst (+)) 0.
 let inline internal countsAndSum n doc =
     // Word counts for the whole document, plus the total word count.
     let counts = splitstr [|" "|] doc |> filterStop n |> Array.fold mapAdd Map.empty
     counts, sumMap counts
 // Entropy contribution of one word: -p * log2 p, where p is its relative frequency.
 let ent m sum k =
     let p = (mapGet m k 0.) / sum
     if p = 0. then 0. else -p * log2 p
 // Score each sentence by summing the entropy of its words.
 let eScore doc =
     let counts, sum = countsAndSum 0 doc
     splitSentenceRegEx doc |> Array.map (fun str -> str, splitstr [|" "|] str |> Array.fold (flip ((+) << (ent counts sum))) 0.)</code></pre>
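For completeness, here is a minimal self-contained sketch of the same recipe in plain F#, using ordinary .NET string and regex calls in place of the utility helpers above (the function names, the sentence-splitting regex and the 0 - 1 limit parameter are just illustrative):<p><pre><code> open System
 open System.Text.RegularExpressions

 let log2 x = Math.Log(x, 2.0)

 // Word counts for the whole document, plus the total word count.
 let countsAndSum (doc: string) =
     let words = doc.Split([|' '; '\n'; '\t'|], StringSplitOptions.RemoveEmptyEntries)
     let counts =
         words |> Array.fold (fun m w ->
             Map.add w (1.0 + defaultArg (Map.tryFind w m) 0.0) m) Map.empty
     counts, float words.Length

 // Entropy contribution of one word: -p * log2 p.
 let ent counts sum word =
     let p = defaultArg (Map.tryFind word counts) 0.0 / sum
     if p = 0.0 then 0.0 else -p * log2 p

 // Score each sentence, then keep the ones whose score is within the
 // given fraction of the best sentence's score, preserving order.
 let summarize limit (doc: string) =
     let counts, sum = countsAndSum doc
     let score (s: string) =
         s.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
         |> Array.sumBy (ent counts sum)
     let scored = Regex.Split(doc, @"(?<=[.!?])\s+") |> Array.map (fun s -> s, score s)
     let best = scored |> Array.map snd |> Array.max
     scored |> Array.filter (fun (_, sc) -> sc / best > limit) |> Array.map fst</code></pre>
Calling summarize 0.6 articleText keeps roughly the most information-dense sentences, in their original order.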
I personally don't want algorithmically summarized content; I want content manually summarized by knowledgeable HN users. It's half the reason I click into the comments 99% of the time before clicking into the linked article: I want interesting insight along with a good summary of the main points being communicated. There's just no way automatically generated summaries can compete with that.
Just got my first newsletter. Looking good for an initial release. Some feedback:<p>Would love to get an index of headlines at the top of the email, with anchors to the actual stories below.<p>Would love to see shorter summaries, and maybe some of the top comments for each story (summarized, if possible).
Bear in mind that comments here are self-selecting for people who like HN's comments section ;-) But I know plenty of people, and speak to people on Twitter, who <i>deliberately</i> avoid these comments pages due to a perception (fair or not) of "drama" and whatnot. For those folks, an email like this could be just the ticket. For me though, I'm staying here ;-)
Thanks for sharing this, I'm curious to see how well it works out over time. It'd be nice to be able to choose the compression level.<p>Quality feels at least as good as an open source summarizer I played around with a while back; good work!
Great execution, but I'm uncertain of the idea.
My personal perspective: I read Wikipedia for information; I read HN for critical insight. It's not always present, but the signal/noise ratio is higher than on other websites. I don't want a summary of information; I want critical thought.
Why only 20 stories? I usually scan the first three pages once a day, a snapshot of the top 90 articles. Only about 10% are relevant, so I'd rather have more summaries to sift through to find the ~10 relevant articles for the day.
I actually thought algorithmic summaries would be worse than useless but they seem surprisingly good. Here's the one from Caine's Arcade:<p>"9 year old Caine sets up an arcade in his father’s used car parts store in East L.A., using only cardboard boxes his dad had lying around and a ton of ingenuity. Watch his dreams come true when this filmmaker sets up a flash mob to come and play. Just watching this may make you a better person. $82,000 has already been raised for Caine’s scholarship fund! little behind on the bandwagon, but...film just had me in tears."
I plan on adding this to my <a href="http://newspaper23.com" rel="nofollow">http://newspaper23.com</a> site. It's just way on the back burner.<p>Ideally I think you would do it client-side, so readers could adjust the shrinkage to the amount of time they have to peruse. I was also thinking about a scenario where you could browse at, say, 100 words per story and then dive deep into anything that interests you. A more interactive approach. You might want to consider this.<p>But I really like the idea. Would love to hear how the project goes!
I got my first email, here's some feedback.<p>You should make sure that the summaries don't scale linearly with the size of the content: just because an article is 10x as long doesn't mean I want a summary that is 10x longer. Maybe scale logarithmically?<p>I didn't find any of the summaries to be high quality, or any better than what I could get from briefly skimming HN myself.<p>I've unsubscribed.
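To illustrate what I mean by logarithmic scaling, here is a hypothetical cap (the constant 40 is an arbitrary knob):<p><pre><code> // Target summary length in words: grows with the log of the
 // article length rather than linearly.
 let targetSummaryWords articleWords =
     int (40.0 * log (float articleWords + 1.0))</code></pre>
With this, a 500-word article caps out around 250 summary words while a 5,000-word article caps out around 340, so a 10x longer article gets a summary only about 1.4x longer.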
I am taking an alternative approach to making sense of HN stories for Chinese readers. As a regular HN reader, I manually summarize the topics of the top stories and translate them into Chinese. The motivation is to lower the barrier to sharing startup/tech news. Link - <a href="http://geektell.com/" rel="nofollow">http://geektell.com/</a>
Really like it!<p>One small suggestion...could you make the "76 comments" text under each title a clickable link through to the HN comments?<p>One other option (maybe as a user preference): include some noteworthy excerpts from the HN comments in the email as well?
How about giving writers the respect they deserve and not algorithmically rewriting their work? Has our attention span really gotten so short that we cannot read articles of substance any longer?