I did something similar recently for a game I'm working on[1]. I didn't know it was called a Markov chain, but one thing I found is that if you take the <i>two</i> previous letters to generate the next one, the results are a little less random and seem a little more natural.<p>The more letters you take to generate the next one, the closer to the original source data you get, but with a big enough corpus of source data, you can still make random names using three or four letters.<p>[1] <a href="http://www.war-worlds.com/blog/2012/07/generating-names" rel="nofollow">http://www.war-worlds.com/blog/2012/07/generating-names</a>
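For anyone curious what the "two previous letters" idea looks like in code, here is a rough Python sketch. It is not the code from the linked post: `build_model`, `generate`, the `^`/`$` padding markers, and the tiny name list are all made up for illustration, and it assumes `order` is at least 1.

```python
import random
from collections import defaultdict

def build_model(names, order=2):
    """Map each `order`-letter context to the list of letters seen after it."""
    model = defaultdict(list)
    for name in names:
        # "^" pads the start so the first letters have a context; "$" marks the end.
        padded = "^" * order + name.lower() + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, max_len=12, rng=random):
    """Walk the chain from the start context until the end marker or max_len."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])  # duplicates in the list act as weights
        if nxt == "$":
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out).capitalize()

# Hypothetical training data; a real corpus would be much larger.
names = ["anna", "annette", "hannah", "joanna"]
model = build_model(names, order=2)
print(generate(model, order=2))
```

Raising `order` to 3 or 4 makes the output track the source names more closely, which is why a larger corpus is needed before it stops simply reproducing them.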
I did something like this a few weeks ago using quotes from a loudly spoken team member.<p>Previously my workmates and I had started jotting down his sayings, and before we knew it we had 2,000 or so entries in a database of the stuff he had said. I ran it through a Markov chain to see what sort of nonsense it would produce. My favorite thing to come out of it so far is the following:<p>"While I am changing my underwear people should check my email. Its an old Greek saying mate."
I had a similar idea for generating test data for relational databases.
It seems that for test data the boundary conditions and exceptional cases are sort of easy but it is the common stuff that is harder to fake.
My idea is to create test data for numeric columns by estimating statistical parameters and using those in conjunction with a random number generator to make 100-500 rows of fake data.
But for the first and last names (and other textual columns) I've been thinking about modeling the data using Markov processes to be able to come up with fake names and addresses that are somewhat close to the real data.
I think that once you have a good statistical model you could export that and outsource testing more easily without compromising confidential information. If things like average salary were considered confidential then that could be skewed as a kind of obfuscation step.
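A minimal sketch of the numeric-column part, assuming each column is roughly normally distributed (a big assumption; real salary data usually is not). The function names, the `shift` parameter, and the sample salaries are all hypothetical.

```python
import random
import statistics

def fit_column(values):
    """Estimate the parameters of a normal model for one numeric column."""
    return statistics.mean(values), statistics.stdev(values)

def fake_column(params, n_rows, rng=random, shift=0.0):
    """Draw n_rows synthetic values; `shift` skews the mean as an obfuscation step."""
    mean, stdev = params
    return [round(rng.gauss(mean + shift, stdev), 2) for _ in range(n_rows)]

# Made-up sample standing in for the confidential column.
real_salaries = [41000, 52000, 47500, 61000, 38000, 55500]
params = fit_column(real_salaries)

# Only `params` (optionally skewed) would need to leave the building.
fake_salaries = fake_column(params, n_rows=200, rng=random.Random(42), shift=5000)
```

The point of splitting it into `fit_column` and `fake_column` is that the fitted parameters are the exportable model: whoever does the testing never sees the real rows.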
I built one of these quite a while ago to help a friend pick his baby's name.<p><a href="https://github.com/harperreed/Baby-Chains" rel="nofollow">https://github.com/harperreed/Baby-Chains</a><p>The names it generates are hilarious.<p>Markov chains are wonderful things. A good Markov chain bot will really spice up a company IRC channel.
Cool! I'll have to remember this when we can't pick a name for our next baby. A little hit or miss, but for a geeky name it is better than Little Bobby Tables...
Ahh, Markov chains. I pulled all my favourite quotes into a text file and used them to generate new quotes. Some nice ones:<p>What you do speaks so loudly that I drink this beer.<p>Premature optimisation is the immemorial refuge of the most troubled mind.<p>Collective judgement of new ideas is so often wrong that it picks up confidence as it appears.<p>How many seconds are there in a cellar on a rainy day?<p>The music business is a higher revelation than philosophy.<p>Art is a sign of intelligence.<p>My choice early in life is to have dinner.
Slightly off-topic: Does anyone know of tree(?) graphs of Markov chains for certain bodies of words? E.g. the probability of each following character for a certain book, showing only the more probable choices. That might look pretty cool.
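I don't know of a ready-made graph, but the data behind one is easy to compute. A rough sketch, where `transition_probs` and its `top` cutoff are names I made up: it counts which character follows which and keeps only the most probable followers.

```python
from collections import Counter, defaultdict

def transition_probs(text, top=3):
    """For each character, return its `top` most likely successors with probabilities."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    table = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        table[cur] = [(ch, n / total) for ch, n in followers.most_common(top)]
    return table

# Toy input; in practice `text` would be the full contents of a book.
for cur, followers in transition_probs("abababac").items():
    print(cur, followers)
```

Each (character, follower, probability) triple is effectively a weighted edge, so the table could be handed to whatever graph-drawing tool you prefer.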
Just so you know, I'm looking at this from my Android 4.1 stock browser and the entire page is blinking on and off randomly like some kind of joke. I can't scroll down because it seems as if it's constantly reloading itself.
Here are a few "gems" from the twitter feed:<p><pre><code> C would make an awesome boy's name
Ieahaholijayson would make an awesome boy's name.
Thinking about a boy's name? how about Chosowex?
</code></pre>
Pretty damn horrible.