Looking at the HN demo, I'm impressed. There are definitely relevant tags being generated. Unfortunately, there are also some noisy tags that clutter the results. Taking one example, the post "DevOps? Join us in the fight against the Big Telcos" was given the tags "phone tools sendhub we're news experience customers comfortable"; I would say that "we're" is unarguably noise. In another example, "Questions for Donald Knuth" with tags "computer programming don i've knuth taocp algorithms i'm", I would call out "i've" and "i'm".<p>There are other words in both examples that I personally would not use as tags, but I can't really say they would be universally unhelpful. I think a vast improvement could be made just by having a dictionary blacklist filled with things like these. Even from this tiny sample, contractions seem to be the biggest offender.
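<p>To make the blacklist idea concrete, here is a minimal sketch in Python. I have no idea how the demo is actually implemented, so everything here is an assumption: the name `filter_tags`, the `BLACKLIST` contents (just the three words called out above), and the extra apostrophe check, which is my own addition for catching contractions that are not in the dictionary yet.

```python
import re

# Hypothetical blacklist seeded only with the noisy tags noted above;
# a real one would be far larger.
BLACKLIST = {"we're", "i've", "i'm"}

# Assumed heuristic: a single word with an apostrophe (straight or curly)
# in the middle is treated as a contraction.
CONTRACTION = re.compile(r"^\w+['’]\w+$")


def filter_tags(tags, blacklist=BLACKLIST):
    """Return the tags that are neither blacklisted nor contraction-shaped."""
    kept = []
    for tag in tags:
        lowered = tag.lower()
        if lowered in blacklist:
            continue  # explicit dictionary hit
        if CONTRACTION.match(lowered):
            continue  # looks like a contraction, e.g. "they'll"
        kept.append(tag)
    return kept


telco_tags = "phone tools sendhub we're news experience customers comfortable".split()
knuth_tags = "computer programming don i've knuth taocp algorithms i'm".split()

print(filter_tags(telco_tags))
print(filter_tags(knuth_tags))
```

On the two examples above this drops only "we're", "i've" and "i'm" and leaves everything else (including "don") alone, which matches what I'd call unarguable noise. The apostrophe check would also throw away names like "o'reilly", so it may be too aggressive for real use.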