As a side note: if you want an "industrial-strength" applied version, at the cost of losing the small/elegant implementation, there's been considerable work in recent years on building robust Markov-language-modeling packages. I mention it here mostly because I only recently realized myself that there's a lot to gain on top of the vanilla versions, which are fairly straightforward to DIY.<p>One active area of research is smoothing the models, so they capture regularities without just memorizing the input text. The vanilla approach fixes the order of the Markov chain (e.g. 2-grams or 3-grams), where you choose n either ad hoc or (ideally) to balance underfitting (missing context you could use) against overfitting (memorizing verbatim strings of text due to insufficient data). But the right n can also vary with context, since different words show up with different frequencies in the training data and differ in how variable the words that follow them are. So there are a number of approaches that choose n adaptively and/or smooth the counts ("Katz's backoff model", "Kneser-Ney smoothing", etc.).<p>From a more engineering-oriented point of view, there's also considerable work on pruning the models and representing them efficiently, if you want to work with something the size of, say, the Google Books ngram dataset.<p>A few free-software packages that implement a good portion of the recent work:<p><a href="http://kheafield.com/code/kenlm/" rel="nofollow">http://kheafield.com/code/kenlm/</a><p><a href="http://code.google.com/p/berkeleylm/" rel="nofollow">http://code.google.com/p/berkeleylm/</a>
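To make the backoff idea concrete, here's a rough Go sketch. It's my own toy code, not how KenLM or BerkeleyLM actually work, and it uses a crude "take the longest context with enough data" rule rather than proper Katz or Kneser-Ney smoothing; the Model/Train/Candidates names are made up for illustration.

```go
// Toy sketch of backoff: keep counts for every context length up to some
// maximum, and at generation time fall back to a shorter context whenever
// the longer one is too sparse. Not Katz or Kneser-Ney proper.
package main

import (
	"fmt"
	"strings"
)

type Model struct {
	counts map[string]map[string]int // context -> next word -> count
}

func NewModel() *Model {
	return &Model{counts: make(map[string]map[string]int)}
}

// Train records, for each word, the counts under every context length
// from 0 (unigram) up to maxOrder-1 preceding words.
func (m *Model) Train(text string, maxOrder int) {
	words := strings.Fields(text)
	for i, w := range words {
		for n := 0; n < maxOrder && n <= i; n++ {
			ctx := strings.Join(words[i-n:i], " ") // "" is the unigram context
			if m.counts[ctx] == nil {
				m.counts[ctx] = make(map[string]int)
			}
			m.counts[ctx][w]++
		}
	}
}

// Candidates backs off to shorter and shorter contexts until one of them
// has at least minCount observations (or we hit the unigram counts).
func (m *Model) Candidates(context []string, minCount int) map[string]int {
	for n := len(context); n >= 0; n-- {
		ctx := strings.Join(context[len(context)-n:], " ")
		next := m.counts[ctx]
		total := 0
		for _, c := range next {
			total += c
		}
		if total >= minCount || n == 0 {
			return next
		}
	}
	return nil
}

func main() {
	m := NewModel()
	m.Train("the cat sat on the mat the cat ate the rat", 3)
	fmt.Println(m.Candidates([]string{"the", "cat"}, 2))
}
```

The point is just that the order isn't fixed up front: the context length actually used at generation time depends on how much data you have for that particular context.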
Various kinds of Markov models are pretty interesting and have many useful applications that are not too complicated; it's a pity this particular example is being done to death, since it doesn't give much insight into the topic. Shannon's classic paper on information theory is very readable and shows the usefulness of the concept:<p><a href="http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf" rel="nofollow">http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pd...</a>
I simply do not understand HN's fascination with this type of post: "something that could have been implemented in any other Turing-complete language, NOW IN GO".
Here's a small Node.js state machine / Markov chain tool package that tries to make what's going on under the hood a little more obvious, so as to allow for tinkering. It contains an example for name generation.<p><a href="https://github.com/exratione/state-engines" rel="nofollow">https://github.com/exratione/state-engines</a>
I implemented this in PHP yesterday, if anyone is interested:<p><a href="https://github.com/gburtini/Learning-Library-for-PHP/blob/master/lib/unsupervised/markovchain.php" rel="nofollow">https://github.com/gburtini/Learning-Library-for-PHP/blob/ma...</a><p>Edit: the thread above me is talking about bigram implementations; mine supports n-grams by setting the variable $degree to anything you like when you train. By default, it's 1-gram.
Huh. 85 lines, ignoring the explanation at the top. I wonder if this could be golfed to be smaller; my Markov generator for Time Cube is less than half the height (in Python): <a href="https://github.com/MostAwesomeDude/airbrush/blob/master/tipsum.py" rel="nofollow">https://github.com/MostAwesomeDude/airbrush/blob/master/tips...</a>
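For comparison, a bare-bones word-level generator also fits in a couple dozen lines of Go. This is a throwaway sketch with a hard-coded corpus and single-word context, unrelated to either of the linked implementations:

```go
// Minimal word-level Markov generator: bigram context only, no error
// handling, hard-coded training text. Just to show the core of the idea.
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

func main() {
	text := "the quick brown fox jumps over the lazy dog the quick red fox"
	words := strings.Fields(text)

	// Map each word to the list of words observed to follow it.
	next := map[string][]string{}
	for i := 0; i+1 < len(words); i++ {
		next[words[i]] = append(next[words[i]], words[i+1])
	}

	// Walk the chain, picking a random successor at each step.
	w := words[0]
	out := []string{w}
	for i := 0; i < 20; i++ {
		succ := next[w]
		if len(succ) == 0 {
			break
		}
		w = succ[rand.Intn(len(succ))]
		out = append(out, w)
	}
	fmt.Println(strings.Join(out, " "))
}
```

In practice most of the extra length in fuller versions tends to go into tokenization, configurable prefix length, and I/O rather than the chain itself.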