Another practical example of Markov chaining is <a href="http://www.x11r5.com/" rel="nofollow">http://www.x11r5.com/</a>, a bot variously trained on IRC, Twitter and Identica content.<p>There's even a weekly podcast generated from news headlines: <a href="http://www.x11r5.com/radio/" rel="nofollow">http://www.x11r5.com/radio/</a>
I accidentally reinvented this algorithm many years ago as a C++ project. Here is an example of passages that it created:<p>sections of ice fell through the. invectives in which he had been wondering how. roman scales was in readiness. occasional murmur of pain that continued to torment. desavanchers dicksen dickson dochard du chaillu duncan durand was a. waging a war of extermination against. lively about it no snap or bite. chairs cane sofas carved wood pillars rose pillars. skirting an acclivity covered with woods and dotted with trees of very deep water. scratching as though he'd tear his nails out and sharp bite out. jerked by a sudden stoppage of the sled dogs barked. mentioned a cemetery on the south the four brilliants of the sky great and. ranks plunging into the flames would extinguish them beneath their mass and the rest were seen in numerous flocks hovering about the borders of some beautiful river until it fell. fridays well i hope that we shall overcome. emphatically an' i make free the. profitable the captains of the sea and consequently the balloon remained.<p>You can see more info about it or download my source code at:<p><a href="http://experimentgarden.blogspot.com/2009/11/software-tool-for-low-order.html" rel="nofollow">http://experimentgarden.blogspot.com/2009/11/software-tool-f...</a>
Markov chains are always good fun to play with. A few months ago I worked on a class project which generated Markov models from Final Fantasy SNES tracks (<a href="https://github.com/IsaacLewis/MidiMarkov" rel="nofollow">https://github.com/IsaacLewis/MidiMarkov</a>). I should blog about it at some point.<p>I hadn't seen Shannon's algorithm before, though. It looks a bit more memory efficient than the approach I used.
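For anyone wondering why it would be lighter on memory: a minimal sketch, assuming "Shannon's algorithm" here means the scan-the-source approach (jump to a random spot in the text, read forward until the current context shows up again, and emit the character that follows it). No lookup table is built; the only state is the source text itself. The function name `shannon_generate` and its parameters are my own, not from the article.

```python
import random

def shannon_generate(text, order=3, length=80, seed=None):
    """Generate text by rescanning the source instead of building a table."""
    if order < 1:
        raise ValueError("order must be at least 1")
    if len(text) <= order:
        return text
    rng = random.Random(seed)
    # Start from a random run of `order` characters taken from the source.
    start = rng.randrange(len(text) - order)
    out = text[start:start + order]
    while len(out) < length:
        context = out[-order:]
        # Jump to a random position and scan forward for the context.
        hit = text.find(context, rng.randrange(len(text)))
        if hit == -1 or hit + order >= len(text):
            # Nothing usable after the jump point: wrap around to the start.
            hit = text.find(context)
        if hit == -1 or hit + order >= len(text):
            break  # the context only occurs at the very end of the source
        out += text[hit + order]
    return out

sample = "the quick brown fox jumps over the lazy dog and the quick cat"
print(shannon_generate(sample, order=2, length=40, seed=0))
```

The trade-off is time for space: every output character costs a scan of the source. The sampling is also slightly biased, since an occurrence that follows a long gap is more likely to be the first one found after a random jump.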
I remember reading about that in "The Information", which describes how Claude Shannon did it.<p>Now I'll have to tweak my spam detection even more. Joking aside, and somebody correct me if I'm wrong, spam probably runs on a simple wordlist-type algorithm.<p>What, then, is the usefulness (I define usefulness extremely widely) of such a generator?
Bentley's collection of Programming Pearls radically changed my outlook (when I read it as a novice programmer, I considered programming itself to be an end, not a means to an end). I still read and re-read selections to this day, and I always seem to learn something new each time.<p>Would anyone care to share books of similar quality or importance?
Is it possible to generate text by a sort of reverse-LDA, where you have topics (per-sentence or per-paragraph, ideally) and estimate the probability of a word to appear in a given topic?<p>You could then use these topics to generate more realistic-looking text as this ostensibly wouldn't have the wild jumps from one topic to another that naive Markov chains have.<p>Has anyone done anything like this, or should I give it a shot?
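A rough sketch of the sampling half of that idea, under loud assumptions: the topic/word weights in `TOPICS` are hand-written placeholders (a real system would estimate them from a corpus, e.g. with LDA), and `sample_sentence` is a hypothetical helper. It picks one topic per sentence and draws words from that topic's distribution only.

```python
import random

# Hypothetical hand-written topic/word weights; a real model would learn these.
TOPICS = {
    "sailing": {"ship": 4, "sea": 3, "captain": 2, "wind": 1},
    "arctic": {"ice": 4, "sled": 2, "dogs": 2, "snow": 2},
}

def sample_sentence(topics, n_words=6, rng=None):
    """Pick one topic, then draw n_words from that topic's word weights."""
    rng = rng or random.Random()
    topic = rng.choice(sorted(topics))
    words, weights = zip(*sorted(topics[topic].items()))
    return topic, rng.choices(words, weights=weights, k=n_words)

topic, words = sample_sentence(TOPICS, n_words=6, rng=random.Random(0))
print(topic, " ".join(words))
```

On its own this is a bag of words, so word order is meaningless. To get readable text you would presumably still need something like a per-topic Markov chain on top of it.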
"Dive into Python" contains chapter on XML processing[1] that uses grammar defined in XML to generate random English passages. A very funny and informative reading.<p>[1] <a href="http://diveintopython.org/xml_processing/" rel="nofollow">http://diveintopython.org/xml_processing/</a>
Markov chains are really handy. You can also use hidden Markov models for speech recognition. <a href="http://en.wikipedia.org/wiki/Speech_recognition" rel="nofollow">http://en.wikipedia.org/wiki/Speech_recognition</a>
Shameless Plug:
This is very similar to what I use in <a href="http://wordum.net/" rel="nofollow">http://wordum.net/</a> but instead of letters, I use whole words to generate the text.
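For readers who haven't seen the word-level variant, here is a minimal sketch. It is not the code behind the site above; `build_chain` and `generate` are illustrative names. Each state is a tuple of `order` consecutive words, and the next word is drawn from the words that followed that tuple in the source.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Walk the chain from a random state, emitting at most `length` words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran", order=1)
print(generate(chain, length=8, seed=1))
```

Keeping duplicates in the follower lists means frequent continuations are picked proportionally more often, with no explicit probability table.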
I did this exercise on my blog at <a href="http://programmingpraxis.com/2009/02/27/mark-v-shaney/" rel="nofollow">http://programmingpraxis.com/2009/02/27/mark-v-shaney/</a>.
<a href="http://www.jwz.org/dadadodo/" rel="nofollow">http://www.jwz.org/dadadodo/</a>
DadaDodo is a fairly old implementation that may be of interest.