
Markov chain algorithm in Go

78 points by jgrant27 about 12 years ago

8 comments

mjn about 12 years ago

As a side note: if you want an "industrial-strength" applied version, at the cost of losing the small, elegant implementation, there has been considerable work in recent years on building robust Markov language-modeling packages. I mention it here mostly because I didn't realize until recently that there was much to gain on top of the vanilla versions, which are fairly straightforward to DIY.

One active area of research is smoothing the models, so they capture regularities without just memorizing the input text. The vanilla approach chooses a fixed order of Markov chain (e.g. 2-grams or 3-grams), where you pick n either ad hoc or (ideally) to balance underfitting (missing context you could use) against overfitting (memorizing verbatim text strings due to insufficient data). But the right n to choose can vary with context as well, since different words show up more often in the training data and are more or less variable in what kinds of words follow them. So there are a number of approaches that choose n adaptively and/or smooth the counts ("Katz's backoff model", "Kneser-Ney smoothing", etc.).

From a more engineering point of view, there is also considerable work on pruning the models and representing them efficiently, if you want to work with things the size of, say, the Google Books ngram dataset.

A few free-software packages that implement a good portion of the recent work:

http://kheafield.com/code/kenlm/

http://code.google.com/p/berkeleylm/
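To make the backoff idea concrete, here is a toy Go sketch. It is far simpler than Katz's model or Kneser-Ney (it is closer to the "stupid backoff" heuristic: fall back to a shorter context and discount by a constant factor); all type and function names here are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// model stores next-word counts for every context of length 0..maxN-1.
// The map key is the context's words joined with spaces ("" = unigram counts).
type model struct {
	counts map[string]map[string]int
	maxN   int
}

func train(text string, maxN int) *model {
	words := strings.Fields(text)
	m := &model{counts: map[string]map[string]int{}, maxN: maxN}
	for n := 0; n < maxN; n++ { // n = context length
		for i := n; i < len(words); i++ {
			ctx := strings.Join(words[i-n:i], " ")
			if m.counts[ctx] == nil {
				m.counts[ctx] = map[string]int{}
			}
			m.counts[ctx][words[i]]++
		}
	}
	return m
}

// score estimates P(next | ctx), backing off to shorter contexts when the
// longer one never predicted `next`, discounting by 0.4 per backoff step.
func (m *model) score(ctx []string, next string) float64 {
	weight := 1.0
	for start := 0; start <= len(ctx); start++ {
		key := strings.Join(ctx[start:], " ")
		if c, ok := m.counts[key]; ok && c[next] > 0 {
			total := 0
			for _, v := range c {
				total += v
			}
			return weight * float64(c[next]) / float64(total)
		}
		weight *= 0.4 // the "stupid backoff" discount
	}
	return 0
}

func main() {
	m := train("the cat sat on the mat the cat ran", 3)
	fmt.Println(m.score([]string{"the", "cat"}, "sat")) // context seen in training
	fmt.Println(m.score([]string{"mat", "cat"}, "sat")) // unseen; backs off to "cat"
}
```

The point of the adaptive/smoothed methods the comment names is exactly to replace that fixed 0.4 discount with something principled.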
stiff about 12 years ago

Various kinds of Markov models are pretty interesting and have many applications that are not too complicated; it's a pity this particular example is being done to death, since it doesn't give much insight into the topic. Shannon's classic paper on information theory is very readable and shows the usefulness of the concept:

http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
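The quantity at the heart of that paper is easy to compute. As a hedged sketch (function name is mine), here is the zeroth-order entropy of a string's character distribution in Go, the baseline that Shannon's order-n approximations improve on:

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns H = -sum p*log2(p) over a string's character
// frequencies, in bits per character.
func entropy(s string) float64 {
	counts := map[rune]int{}
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Println(entropy("aaaa")) // one symbol: zero bits per character
	fmt.Println(entropy("abab")) // two equiprobable symbols: one bit each
}
```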
johansch about 12 years ago

I simply do not understand HN's fascination with this type of post: "something that could have been implemented in any other Turing-complete language, NOW IN GO".
exratione about 12 years ago

Here's a small Node.js state machine / Markov chain tool package that tries to make what's going on under the hood a little more obvious, so as to allow for tinkering. It contains an example for name generation.

https://github.com/exratione/state-engines
joshrotenberg about 12 years ago
Meta, but codewalk looks pretty cool.
gburt about 12 years ago

I implemented this in PHP yesterday, if anyone is interested:

https://github.com/gburtini/Learning-Library-for-PHP/blob/master/lib/unsupervised/markovchain.php

Edit: the thread above me is talking about bigram implementations; mine supports n-grams by setting the variable $degree at training time to anything you like. By default, it's 1-gram.
MostAwesomeDude about 12 years ago

Huh. 85 lines, ignoring the explanation at the top. I wonder if this could be golfed smaller; my Markov generator for Time Cube is less than half the height (in Python):

https://github.com/MostAwesomeDude/airbrush/blob/master/tipsum.py
pjmlp about 12 years ago

Oh man, single-letter variable names. :(