TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


Show HN: Neural network that impersonates writers

48 points by jacob_plaster, over 9 years ago

9 comments

nl, over 9 years ago
I haven't looked at the code, but glancing at the results leaves me thinking it might need more work.

The output seems to me around the level a Markov chain might produce. Karpathy's RNN code produces much, much better results[1].

I wonder if manually extracting features and training the RNN on that is a mistake? RNNs tend to work well on text because they encode an understanding of the parse tree themselves.

[1] https://github.com/karpathy/char-rnn
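nl's baseline is easy to make concrete: a word-level Markov chain generator takes only a few lines. The sketch below (the corpus, the order, and the function names are all invented for illustration) shows why such output has local word-pair coherence but no longer-range structure:

```python
import random
from collections import defaultdict

def build_markov(words, order=2):
    """Map each `order`-word context to the words observed after it."""
    chains = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        chains[context].append(words[i + order])
    return chains

def generate(chains, length=20, seed=0):
    """Walk the chain from a random starting context."""
    rng = random.Random(seed)
    context = rng.choice(list(chains))
    out = list(context)
    for _ in range(length):
        followers = chains.get(tuple(out[-len(context):]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran".split()
chains = build_markov(corpus, order=2)
print(generate(chains, length=8))
```

Because each word is drawn only from what followed the previous `order` words in the corpus, adjacent words look plausible while whole sentences drift into nonsense, which is roughly the failure mode visible in the linked results.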
strong_ai, over 9 years ago
This doesn't look like a neural net to me. From NeuralNetwork.py:

    from sklearn.neighbors import KNeighborsClassifier

    # Create a separate neural network for each identifier
    for index in range(0, len(NaturalLanguageObject._Identifiers)):
        nn = KNeighborsClassifier()
        self._Networks.append(nn)
[Comment #10169907 not loaded]
[Comment #10169596 not loaded]
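For readers following strong_ai's point: KNeighborsClassifier is an instance-based k-nearest-neighbours model, not a neural network; it memorizes the training set and votes among nearby samples rather than learning weights. A sketch of the contrast (scikit-learn's MLPClassifier only arrived later, in version 0.18, and the iris data here is just a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier  # added in scikit-learn 0.18

X, y = load_iris(return_X_y=True)

# k-NN "training" just stores X; prediction votes among the k nearest
# stored samples, so there are no learned weights at all.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# An actual (shallow) neural network: one hidden layer whose weights
# are fitted by backpropagation.
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(X, y)

print(knn.score(X, y), mlp.score(X, y))
```

At the time of the thread, sklearn exposed no neural-network classifier, which is what makes the variable name `nn` in the quoted snippet misleading.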
bearzoo, over 9 years ago
I am afraid this author has no idea what he is doing, and is loosely throwing around terms he does not understand. What the hell was his normalization procedure? This is dangerous for readers who don't know much about the subject and will come away confused.
Turing_Machine, over 9 years ago
I ran a Markov chain text generator on Finnegans Wake once. It came out looking much the same. :-)
frisco, over 9 years ago
Fun hack. If anything, it highlights how compelling deep learning and RNNs are: no messing with NLP, no messing with building other features or adding up classifiers, etc. The manual feature engineering means it might work better on a smaller dataset, but even then probably not.

For comparison with Andrej Karpathy's RNN code (http://karpathy.github.io/2015/05/21/rnn-effectiveness/), training on the "HarryPotter(xxlarge).txt" (76K) file using the default hyperparameters and a batch size of 25 gets me:

    > But Atfa the loom proset! No contarin — mibll,'s just pucking to live
    > note left them hard and fitther, clooked of course little happered to
    > trige on the fistpened. Their knew Harry mear from the shind-beas
    > eveided, at Uncle Vernon's thepped to spept were pelled and beadn
    > Harry, distine dy use. Harry had in a amalout, into the fish sfary door.

The difference here is tokenizing on words vs. letters: the RNN code is trying to learn the structure of English from completely zero, whereas the code here gets to work with well-formed words from the beginning. But otherwise, the results in the linked post are about as silly semantically:

    > Input: "Harry don't look"
    > Output: "Harry don't look , incredibly that a year for been parents in .
    > followers , Harry , and Potter was been curse . Harry was up a year ,
    > Harry was been curse "

EDIT: Updated the RNN output text. Was sampling from a checkpoint file for a different input corpus. Got confused by the long similar-looking filenames. Doesn't change the overall point though.
[Comment #10169923 not loaded]
[Comment #10169145 not loaded]
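frisco's word-vs-letter distinction accounts for most of the surface difference in output quality, and in code it is just the choice of tokenizer. A minimal illustration (variable names invented):

```python
text = "Harry don't look"

# Character-level tokens: the model must learn spelling, spacing,
# and word boundaries entirely on its own.
char_tokens = list(text)

# Word-level tokens: every unit is already a well-formed word, so
# generated text is spelled correctly even when it is nonsense.
word_tokens = text.split()

print(char_tokens[:6])
print(word_tokens)
```

The word-level model's guaranteed spelling explains why its samples look superficially cleaner while being just as incoherent semantically.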
caf, over 9 years ago
Have you considered the copyright on the Harry Potter training data?
[Comment #10169036 not loaded]
achompas, over 9 years ago
> I decided to use scikit's machine learning libraries. [...] The writer I create uses multiple SVM engines. One large neural network for the sentence structuring and multiple small networks for the algorithm which selects words from a vocabulary.

This person has no idea what they're talking about. sklearn has no neural network code whatsoever.

EDIT: this feels like a testament to sklearn's greatness, honestly.
w_t_payne, over 9 years ago
I'd be interested to know if this could be turned into a tool that lets you know how well your writing (or coding) matches the "house style". (Mostly for technical documentation, requirements specs, etc.)

I'd be even more interested if it could be turned into a Sublime Text plugin that highlights words / phrases that deviate most strongly from the house style.
[Comment #10170390 not loaded]
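w_t_payne's tool could be prototyped without any neural machinery at all: score each word of a document by its surprisal under a unigram model of the house corpus and highlight the highest scorers. A hypothetical sketch (the corpora, `style_scores`, and the add-alpha smoothing choice are all inventions for illustration):

```python
import math
from collections import Counter

def style_scores(house_corpus, document, alpha=1.0):
    """Surprisal of each document word under an add-alpha smoothed
    unigram model of the house corpus; higher = more off-style."""
    counts = Counter(house_corpus)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 bucket for unseen words

    def surprise(word):
        return -math.log((counts[word] + alpha) / (total + alpha * vocab))

    return {word: surprise(word) for word in set(document)}

house = "the system shall log every request and the system shall retry".split()
doc = "the system kinda retries stuff".split()

scores = style_scores(house, doc)
# The two words deviating most strongly from the house vocabulary.
flagged = sorted(scores, key=scores.get, reverse=True)[:2]
print(flagged)
```

A real editor plugin would want at least bigram context and per-sentence aggregation, but even this crude score separates off-register words from house vocabulary.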
scorpwarp23, over 9 years ago
This is brilliant! I tried it out. Waiting for a larger data set! +1