I haven't looked at the code, but glancing at the results leaves me thinking it might need more work.

The output seems to me to be around the level a Markov chain might produce. Karpathy's RNN code produces much, much better results[1].

I wonder if manually extracting features and training the RNN on that is a mistake? RNNs tend to work well on text because they encode an understanding of the parse tree themselves.

[1] https://github.com/karpathy/char-rnn
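For reference, the kind of Markov-chain baseline I have in mind is only a handful of lines of Python. This is a generic word-level sketch, not anything from the linked repo:

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        """Map each `order`-word prefix to the words that follow it in the corpus."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, order=2, length=50):
        """Walk the chain from a random prefix, sampling uniformly among observed successors."""
        out = list(random.choice(list(chain.keys())))
        for _ in range(length):
            successors = chain.get(tuple(out[-order:]))
            if not successors:
                break
            out.append(random.choice(successors))
        return " ".join(out)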
this doesn't look like a neural net to me. from NeuralNetwork.py:

    from sklearn.neighbors import KNeighborsClassifier

    # Create a sperate neural network for each identifier
    for index in range(0, len(NaturalLanguageObject._Identifiers)):
        nn = KNeighborsClassifier()
        self._Networks.append(nn)
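For anyone unfamiliar with sklearn: KNeighborsClassifier is a k-nearest-neighbours model, not any kind of neural network. Roughly how it's normally used (toy data, nothing from the repo):

    from sklearn.neighbors import KNeighborsClassifier

    # Toy data: two features per sample, binary labels.
    X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y_train = [0, 0, 1, 1]

    # k-NN just memorises the training set and classifies new points by
    # majority vote among the k closest training samples: no weights,
    # no layers, no backprop anywhere.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    print(knn.predict([[0.9, 0.2]]))  # -> [1]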
I am afraid this author has no idea what he is doing - and is loosely throwing around terms he does not understand. What the hell was his normalization procedure? This is dangerous for readers who don't know the field well and will come away confused.
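For what it's worth, "normalization" in this setting usually means something like scaling features to zero mean and unit variance before training. A standard sklearn sketch of what that normally looks like (my guess at what was meant, not what the post actually does):

    from sklearn.preprocessing import StandardScaler

    X_train = [[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]]
    X_test = [[2.5, 250.0]]

    # Fit the scaler on the training data only, then apply the same
    # transform to any later data, so each feature ends up with zero
    # mean and unit variance relative to the training set.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)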
Fun hack. If anything, it highlights how compelling deep learning and RNNs are: no messing with NLP pipelines, no hand-building features or bolting together classifiers, etc. The manual feature engineering means it might work better on a smaller dataset, but even then probably not.

For comparison, training Andrej Karpathy's RNN code (http://karpathy.github.io/2015/05/21/rnn-effectiveness/) on the "HarryPotter(xxlarge).txt" (76K) file with the default hyperparameters and a batch size of 25 gets me:

    > But Atfa the loom proset! No contarin — mibll,’s just pucking to live
    > note left them hard and fitther, clooked of course little happered to
    > trige on the fistpened. Their knew Harry mear from the shind-beas
    > eveided, at Uncle Vernon’s thepped to spept were pelled and beadn
    > Harry, distine dy use. Harry had in a amalout, into the fish sfary door.
The difference here is tokenizing on words vs. letters: the RNN is trying to learn the structure of English completely from scratch, whereas the code here gets to work with well-formed words from the beginning (there's a small tokenization sketch after the sample below). But otherwise, the results in the linked post are about as silly semantically:

    > Input: "Harry don't look"
> Output: "Harry don't look , incredibly that a year for been parents in .
> followers , Harry , and Potter was been curse . Harry was up a year ,
> Harry was been curse "
</code></pre>
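To make the word-vs-letter point concrete, here's the tokenization sketch mentioned above (generic Python, not code from either project):

    text = "Harry looked at the door."

    # Character-level: the model has to learn spelling, spacing and
    # punctuation before it can even produce real words.
    char_tokens = list(text)
    # ['H', 'a', 'r', 'r', 'y', ' ', 'l', 'o', 'o', 'k', 'e', 'd', ...]

    # Word-level: every token is already a well-formed word, so the model
    # only has to learn which words go together.
    word_tokens = text.replace(".", " .").split()
    # ['Harry', 'looked', 'at', 'the', 'door', '.']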
EDIT: Updated the RNN output text. Was sampling from a checkpoint file for a different input corpus. Got confused by the long similar-looking filenames. Doesn't change the overall point though.
> I decided to use scikit's machine learning libraries. [...] The writer I create uses multiple SVM engines. One large neural network for the sentence structuring and multiple small networks for the algorithm which selects words from a vocabulary.

This person has no idea what they're talking about. sklearn has no neural network code whatsoever.

EDIT: this feels like a testament to sklearn's greatness, honestly.
I'd be interested to know if this could be turned into a tool that lets you know how well your writing (or coding) matches the "house style". (Mostly for technical documentation, requirements specs, etc.)

I'd be even more interested if it could be turned into a Sublime Text plugin that highlights words/phrases that deviate most strongly from the house style.
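A crude way to prototype the highlighting part would be to score each word by how surprising it is relative to a reference corpus of in-style documents. Rough unigram sketch below; "house_corpus.txt" is a hypothetical file of house-style text:

    import math
    import re
    from collections import Counter

    def word_freqs(text):
        """Relative frequency of each lowercased word token."""
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}, total

    def surprisal_scores(draft, house_text):
        """Score each draft word by -log probability under the house corpus,
        with a small floor for unseen words; higher = more off-style."""
        freqs, total = word_freqs(house_text)
        floor = 1.0 / (total + 1)
        return [(w, -math.log(freqs.get(w, floor)))
                for w in re.findall(r"[a-z']+", draft.lower())]

    house = open("house_corpus.txt").read()   # hypothetical reference corpus
    draft = "The widget shall leverage synergistic paradigms."
    for word, score in sorted(surprisal_scores(draft, house),
                              key=lambda ws: ws[1], reverse=True)[:5]:
        print(f"{score:6.2f}  {word}")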