I used to write Markov-based chat bots. Something I thought I observed, but tried and failed to show mathematically, is the possibility of well-connected neighborhoods of the graph (cliques?) leading to some outputs being more likely than their a priori likelihood in the input.<p>For example, once a simple Markov bot learns the input phrase "had had" it will also start to generate output phrases a human would assign 0% probability to, like "had had had" and "had had had had". This in itself isn't a violation of the principles behind the model (it would have to look at more words to distinguish these).<p>The question is whether more complicated loops among related words can create "thickets" where output generation can get "tangled up" and produce improbable output at a slightly higher rate than the formal analysis says a Markov model of that order should for the given input frequencies. An example of such a thicket would be something like "have to have had had to have had".<p>Essentially, I'm hypothesizing that the weighted transition probabilities don't tell the whole story, because the high-level structure of the graph has an additional effect. A weaker hypothesis is that the state space of Markov models contains such pathological examples, but that these states are not reachable by normal learning.<p>Unfortunately I lack the mathematical chops to formalize these ideas myself, and I don't personally know anyone who could help me explore them.
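A minimal sketch of the self-loop behavior described above, assuming an order-1 word model (the training sentence here is invented for illustration):

```python
from collections import defaultdict

def train(tokens):
    """Build an order-1 transition table: word -> list of observed successors."""
    table = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        table[a].append(b)
    return table

corpus = "he had had a sandwich".split()
table = train(corpus)

# Once "had had" appears in the training data, the chain contains the
# self-loop had -> had, so "had had had ..." of any length has nonzero
# probability, even though a human would assign it ~0%.
p_extend = table["had"].count("had") / len(table["had"])
print(p_extend)  # 0.5: each "had" has a 50% chance of being followed by another
```

Whether graph structure can push such runs above what this per-transition probability predicts is exactly the "thicket" question; the sketch only shows that the runs are reachable at all.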
I've done something similar without first learning about Markov chains. One of my more interesting experiments was creating messed-up laws: I fed it the Constitution and Alice in Wonderland, and it made the most surreal laws. The great thing about Markov chains is that they don't need to know anything about language. You could make one to create images, another to create names for cities. I made one to create 'pronounceable passwords': it took the top 1000 words of a dictionary and then spat out strings of any length which could potentially be words. Of course, the pronounceability of a word like Shestatond is debatable.
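A character-level sketch of the "pronounceable passwords" idea, assuming it worked via fixed-order letter transitions (the tiny word list and the `order=2` choice are illustrative, not the original's):

```python
import random
from collections import defaultdict

def build(words, order=2):
    """Character-level transition table. '^' pads the start of each
    word; '$' marks the end, so generated lengths vary naturally."""
    table = defaultdict(list)
    for w in words:
        padded = "^" * order + w + "$"
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table

def generate(table, order=2, rng=random):
    """Walk the chain from the start state until an end marker is drawn."""
    state = "^" * order
    out = []
    while True:
        ch = rng.choice(table[state])
        if ch == "$":
            return "".join(out)
        out.append(ch)
        state = state[1:] + ch

words = ["shore", "state", "standing", "pond", "bond"]
table = build(words)
print(generate(table))  # word-like strings such as "stond" or "share"
```

Every letter pair in the output was seen in some training word, which is roughly what makes the results feel pronounceable without the program knowing anything about phonetics.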
I published a program to do basically this on Usenet in 1987.<p>Someone created a fixed version on github at<p><a href="https://github.com/cheako/markov3" rel="nofollow">https://github.com/cheako/markov3</a>
This is a neat introduction to the subject.<p>If you want to see more of what they can do, and you have had any exposure to Chomsky, you might also appreciate <a href="https://rubberducky.org/cgi-bin/chomsky.pl" rel="nofollow">https://rubberducky.org/cgi-bin/chomsky.pl</a>.
I wrote 'Bloviate' to mess around with Markov chains. From Goldilocks and the Three Bears it produces gems such as:<p>“Someone’s been sitting my porridge,” said the bedroom.<p>You can download it here:
<a href="https://successfulsoftware.net/2019/04/02/bloviate/" rel="nofollow">https://successfulsoftware.net/2019/04/02/bloviate/</a>
8 years ago I implemented something like this for your tweets at <a href="https://thatcan.be/my/next/tweet/" rel="nofollow">https://thatcan.be/my/next/tweet/</a><p>It still causes a spike of traffic every now and then from Twitter.
For the Finns reading this, there's a Twitter bot "ylekov" [1] that combines headlines from different Finnish news outlets using Markov chains. Sometimes they come out pretty funny.<p>> Suora lähetys virkaanastujaisista – ainakin kaksi tonnia kokaiinia ("Live broadcast from the inauguration – at least two tonnes of cocaine")<p>[1]: <a href="https://twitter.com/ylekov_uutiset" rel="nofollow">https://twitter.com/ylekov_uutiset</a>
Yup, those are Markov chains. Not to sound snotty or mean, but... so what? Why should we be interested?<p>Looks kinda like you followed a tutorial and did a write-up.