Auto-Generating Clickbait with Recurrent Neural Networks

251 points by lars, over 9 years ago

24 comments

thenomad, over 9 years ago
If I could feed this an article and have it generate headlines based on the text of that article (and they were any good), there is a solid chance I would pay real money for that service.

Headlines are an absolute pain, and as the article says, they're decidedly unoriginal most of the time. I can't see an obvious reason that an AI would be much worse at creating them than a human.
blisterpeanuts, over 9 years ago
I like the notion of swamping the Internet with fake click-bait headlines, to dilute the attractiveness of this (to me, odious) form.

Give me sincere, honest news and discussion, or else shut up.

Unfortunately, someone out there must really have a craving for "weird old tricks" and "shocking conclusions".

It's a sort of race-to-the-bottom, lowest-common-denominator effect.

Maybe someone will write a browser extension that filters out obvious click-bait headlines. Now *that* would be clever!
rndn, over 9 years ago
Could this RNN model perhaps be used to filter click-bait headlines from HN automatically? Perhaps one could perform some sort of backward beam search to figure out how likely it is that a particular headline would have been produced by it. If there are words in a headline that the model doesn't know, one could perhaps just let it replace them with words that it does know.
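One way to read the idea above is to score a headline by its average per-word log-probability under a language model trained on clickbait, and to flag headlines that score high. The sketch below swaps the RNN for an add-one-smoothed bigram model so that it runs with the standard library alone. The training headlines, the `<unk>` token, and the function names are illustrative assumptions, not anything taken from the article. Unknown words are simply mapped to `<unk>`, a cruder variant of the substitution suggested above.

```python
import math
from collections import Counter

UNK, START, END = "<unk>", "<s>", "</s>"


def tokenize(headline):
    return headline.lower().split()


def train_bigram(headlines):
    """Count bigrams and their left contexts over a (hypothetical) clickbait corpus."""
    contexts, bigrams = Counter(), Counter()
    vocab = {UNK, START, END}
    for h in headlines:
        tokens = [START] + tokenize(h) + [END]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab


def avg_log_prob(headline, model):
    """Average log-probability per transition; higher means more 'in-style'."""
    contexts, bigrams, vocab = model
    # Words the model has never seen are replaced by the unknown token.
    tokens = [t if t in vocab else UNK for t in tokenize(headline)]
    tokens = [START] + tokens + [END]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting probability zero.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
        total += math.log(p)
    return total / (len(tokens) - 1)


# Made-up training data, for illustration only.
model = train_bigram([
    "you won't believe what happens next",
    "10 weird tricks you won't believe",
    "what happens next will shock you",
])
print(avg_log_prob("you won't believe what happens next", model))
print(avg_log_prob("parliament passes annual budget bill", model))
```

A filter would compare the score against a threshold tuned on labelled headlines. With an RNN, the same score comes from summing the log-probabilities the network assigns to each next word, so no search is needed to evaluate a given headline.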
oneJob, over 9 years ago
Now if we can just teach AI to get sidetracked reading all this content we'd also prevent Judgement Day.

SkyNet: (speaking to self?) "Unleash hell on humans. Launch all missiles."

SkyNet: (responding to self?) "Not now, not now. Let me finish this article on John Stamos's belly button."
ChuckMcM, over 9 years ago
https://xkcd.com/1283/

I really find RNNs to be pretty cool. When they are combined with a natural human tendency to see patterns they are hilarious. So perhaps we need to update our million monkeys hypothesis to a million RNNs with typewriters coming up with all the works of Shakespeare.
clickok, over 9 years ago
Nice! I've wanted to do something like this for a while, too, but haven't had the time yet.

What's interesting to me, from a research point of view, is the degree of nuance the network uncovers for the clickbait. We all know that <person> is going to be doing <intriguing action>, but for each person these actions are slightly different. The sentence completions for "Barack Obama Says..." are mainly politics-related, while those for "Kim Kardashian Says..." involve Kim commenting on herself.

So it might not really understand what it's saying, but it captures the fact that those two people will tend to produce different headlines.

Neat idea: what if we tried the same thing with headlines from the New York Times (or maybe a basket of newspapers)? We would likely find that the Clickbait RNN's vision of Obama is a lot different from the Newspaper RNN's Obama. Teasing apart the differences would likely give you a lot more insight into how the two readerships view the president than any number of polls would.
mikkom, over 9 years ago
What surprises me most is that the headlines don't seem to be much better than your average Markov chain output.
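For readers who want to make that comparison themselves, a first-order word-level Markov chain generator fits in a few lines. This is a minimal sketch: the three training headlines are made up, and `build_chain` and `generate` are hypothetical names. Each next word is drawn from the words that followed the current word in the training data, so the chain has no memory beyond one word.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def build_chain(headlines):
    """Map each word to the list of words that followed it in the corpus."""
    chain = defaultdict(list)
    for h in headlines:
        words = [START] + h.split() + [END]
        for prev, cur in zip(words, words[1:]):
            chain[prev].append(cur)
    return chain


def generate(chain, rng, max_words=20):
    """Walk the chain from the start marker until the end marker or the cap."""
    word, out = START, []
    while len(out) < max_words:
        # Duplicates in the successor list make frequent transitions likelier.
        word = rng.choice(chain[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)


# Made-up training data, for illustration only.
corpus = [
    "You Won't Believe What Happens Next",
    "What Happens When You Try This Weird Trick",
    "This Weird Trick Will Shock You",
]
chain = build_chain(corpus)
print(generate(chain, random.Random(0)))
```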
VLM, over 9 years ago
This was an enjoyable article. There is an obvious extension, which is to mturk the results and feed the mturk data back into the net. Just give the turkers 5 headlines and ask them which they would click first, repeat a hundred times per a thousand turkers or whatever.

Years ago I considered applying for DoD grant money to implement something reminiscent of all this for military propaganda. That went approximately nowhere, not even past the first steps. Someone else should try this (insert obvious famous news network joke here, although I was serious about the proposal). To save time, I'll point out I never got beyond the earliest steps because there is a vaguely infinite pool of clickbaitable English speakers on the turk, but the pool of bilingual Arabic (or whatever) speakers with good taste in pro-USA propaganda is extremely small, so the tech side was easy to scale but the mandatory human side simply couldn't scale enough to make the output realistically anything but a joke.
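The "show five, pick one" feedback loop can be reduced to a per-headline pick rate: how often a headline was chosen divided by how often it was shown. The sketch below only covers that aggregation step, under the assumption that each task is recorded as a pair of (headlines shown, headline chosen). It makes no Mechanical Turk API calls, and the single-letter headline labels and function names are placeholders.

```python
from collections import Counter


def pick_rates(tasks):
    """tasks: iterable of (shown_headlines, chosen_headline) pairs."""
    shown, chosen = Counter(), Counter()
    for headlines, pick in tasks:
        shown.update(headlines)
        chosen[pick] += 1
    # Fraction of appearances in which each headline was the one clicked.
    return {h: chosen[h] / shown[h] for h in shown}


def ranked(tasks):
    """Headlines ordered from most to least often picked."""
    rates = pick_rates(tasks)
    return sorted(rates, key=lambda h: rates[h], reverse=True)


# Hypothetical results from three workers who saw the same five headlines.
tasks = [
    (["A", "B", "C", "D", "E"], "B"),
    (["A", "B", "C", "D", "E"], "B"),
    (["A", "B", "C", "D", "E"], "D"),
]
print(pick_rates(tasks))
print(ranked(tasks)[:2])
```

How the rates would be fed back into training is left open here; one option is to keep only the top-ranked generated headlines as additional training examples.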
rlu, over 9 years ago
> The training converges after a few days of number crunching on a GTX980 GPU. Let’s take a look at the results.

Stupid question: why is the GPU important here? I would have thought this was more of a CPU task?

(Then again, as I typed this I remembered that bitcoin mining is supposed to be GPU-intensive, so I'm guessing the "why" for that is the same as for this.)
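The short answer is that neural network training is dominated by dense matrix arithmetic, and each output element of a matrix-vector product can be computed independently of the others. A GPU has far more arithmetic units than a CPU and can work on many of those elements at once. The pure-Python sketch below shows one step of a vanilla RNN, a simplification chosen for brevity with arbitrary weights; the point is only that the iterations of the outer loop do not depend on one another.

```python
import math


def rnn_step(W_xh, W_hh, x, h_prev):
    """One vanilla RNN step: h = tanh(W_xh @ x + W_hh @ h_prev)."""
    h = []
    for i in range(len(W_hh)):
        # Hidden unit i reads only row i of each matrix, so every iteration
        # of this loop is independent and could run at the same time.
        pre = sum(w * v for w, v in zip(W_xh[i], x))
        pre += sum(w * v for w, v in zip(W_hh[i], h_prev))
        h.append(math.tanh(pre))
    return h


# Tiny arbitrary example: 2 inputs, 2 hidden units.
W_xh = [[1.0, 0.0], [0.0, 1.0]]
W_hh = [[0.0, 0.0], [0.0, 0.0]]
print(rnn_step(W_xh, W_hh, x=[0.5, -0.5], h_prev=[0.0, 0.0]))
```

In a real network the matrices have hundreds or thousands of rows and the step is applied to a whole batch of sequences, which multiplies the amount of independent work available.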
imaginenore, over 9 years ago
Getting this error:

    Error: 500 Internal Server Error
    Sorry, the requested URL 'http://clickotron.com/' caused an error:
    Internal Server Error
    Exception:
    IOError(24, 'Too many open files')
    Traceback:
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 862, in _handle
        return route.call(**args)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 1732, in wrapper
        rv = callback(*a, **ka)
      File "server.py", line 69, in index
        return template('index', left_articles=left_articles, right_articles=right_articles)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3595, in template
        return TEMPLATES[tplid].render(kwargs)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3399, in render
        self.execute(stdout, env)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3386, in execute
        eval(self.co, env)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 189, in __get__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3344, in co
        return compile(self.code, self.filename or '<string>', 'exec')
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 189, in __get__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
      File "/usr/local/lib/python2.7/dist-packages/bottle.py", line 3350, in code
        with open(self.filename, 'rb') as f:
    IOError: [Errno 24] Too many open files: '/home/ubuntu/clickotron/views/index.tpl'
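Errno 24 is `EMFILE`: the server process has hit its per-process limit on open file descriptors, which on a web server includes sockets as well as files. The traceback does not show what was holding the descriptors, so the cause on clickotron.com is unknown. As a sketch, the limit itself can be inspected on Unix with Python's standard `resource` module; `fd_limits` is a hypothetical helper name.

```python
import errno
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
# Symbolic name for error number 24, as reported in the traceback.
print("errno 24 is", errno.errorcode[24])
print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is the one that triggers the error; it can be raised up to the hard limit with `resource.setrlimit` or the shell's `ulimit -n`.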
juddlyon, over 9 years ago
I can't stop laughing at these. Check out the Click-o-tron site: http://clickotron.com/
flashman, over 9 years ago
I used a simpler technique (character-level language modelling) to come up with an Australian real estate listing generator: http://electronsoup.net/realtybot

This is pre-generated, not live, for performance reasons. There are a few hundred thousand items though, so the effect is similar.

The data source is several tens of thousands of real estate listings that I scraped and parsed.
OhHeyItsE, over 9 years ago
This is simply brilliant.

(Ranking algorithm baked into a stored procedure notwithstanding. [ducks])
neikos, over 9 years ago
I am not sure how much credit I would give to the idea that the neural network 'gets' anything, as it is written in the article.

> Yet, the network knows that the Romney Camp criticizing the president is a plausible headline.

I am pretty certain that the network does not know any of this; its output just happens to be understood by us as making sense.
andrewtbham, over 9 years ago
tl;dr: guy uses an LSTM RNN to create a link-bait site.

Hopes crowdsourcing will filter out the nonsense.

http://clickotron.com/
chipgap98, over 9 years ago
"Tips From Two And A Half Men : Getting Real" is great. Some of the generated titles are incredible.
billconan, over 9 years ago
I can't understand the first two-layer RNN, which according to the author optimized the word vectors.

It says:

> During training, we can follow the gradient down into these word vectors and fine-tune the vector representations specifically for the task of generating clickbait, thus further improving the generalization accuracy of the complete model.

How do you follow the gradient down into these word vectors?

If word vectors are the input of the network, don't we only train the weights of the network? How come the input vectors get optimized during the process?
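The usual answer is that the word vectors are rows of a lookup table, and that table is itself a trainable parameter. The true input is the word's index; looking up a row is equivalent to multiplying a one-hot vector by the table, so backpropagation yields a gradient for the row that was used, just as it does for any other weight. The toy below illustrates this with a linear scorer and a squared-error loss rather than the article's network; `train_step`, the table `E`, and all numbers are assumptions made for the example.

```python
def train_step(E, w, token, target, lr=0.1):
    """One SGD step on loss = 0.5 * (w . E[token] - target) ** 2.

    Returns updated copies of the embedding table and the weights, plus the
    loss measured before the update.
    """
    e = E[token]
    err = sum(wi * ei for wi, ei in zip(w, e)) - target
    grad_w = [err * ei for ei in e]  # dL/dw, the ordinary weight gradient
    grad_e = [err * wi for wi in w]  # dL/dE[token], the word-vector gradient
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_E = [row[:] for row in E]
    # Only the row that was looked up receives a gradient.
    new_E[token] = [ei - lr * g for ei, g in zip(e, grad_e)]
    return new_E, new_w, 0.5 * err ** 2


E = [[0.1, 0.2], [0.3, -0.1]]  # two words, two-dimensional vectors
w = [0.5, -0.4]
E1, w1, loss_before = train_step(E, w, token=0, target=1.0)
_, _, loss_after = train_step(E1, w1, token=0, target=1.0)
print(E[0], "->", E1[0])
print(loss_before, "->", loss_after)
```

The vector for word 0 moves while the vector for word 1 stays put, and the loss on that example drops. Starting the table from pretrained vectors and continuing to update it this way is what "fine-tuning" the word vectors means.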
alkonaut, over 9 years ago
Missed opportunity for HN headline.

This program generates random clickbait headlines. You won't believe what happens next. You'll love #7.
indiv0, over 9 years ago
Reminds me of Headline Smasher [0].

Some pretty fun ones there, but it doesn't use RNNs. It just merges existing headlines.

[0]: http://www.headlinesmasher.com/best/all
kidgorgeous, over 9 years ago
Great tutorial. Been looking to do something like this for a while. Bookmarked!
smpetrey, over 9 years ago
I think this one is my favorite:

Life Is About — Or Still Didn’t Know Me
CephalopodMD, over 9 years ago
Your main site is down. Bottle can't handle serving files scalably or something? Point is, it broke.
hilti, over 9 years ago
Interesting blog post, but the site is down. How much traffic do you get from HN?
joshdance, over 9 years ago
500 Internal Server Error on the site where you could upvote 'em.