科技回声

4 comments

lutusp, over 11 years ago

1. Create a list of common words, the more the better. Say we start from a common spelling dictionary of 80,000 words.

2. Create an order-5 permutation of the list (a result in which the order of the words matters). For an 80,000-word dictionary, that's about 3.27 x 10^24 test sentences, each of five words.

3. Scan the resulting sentence set using a heuristic able to distinguish valid sentences from invalid ones. Say one optimized validation test takes one millisecond. In that case, the scan would require 1.04 x 10^14 years, 7536 times the age of the universe.

Meaning this is a much easier question to ask than to answer.

EDIT: Another approach is to make an arbitrary assumption about the structure of a five-word sentence, like pronoun-noun-noun-verb-noun: "The calico cat ate breakfast." It is crude and limited, and many apparently valid results will be meaningless, but it makes the estimate easier.

A random word has some probability of being a pronoun, a noun, or a verb: pp, pn, and pv respectively. The probability of producing a valid sentence (pvs) using the described template is therefore:

pvs = pp * pn * pn * pv * pn

Just generate the probability values by scanning a dictionary and identifying the word types (easier said than done), evaluate the equation above, and you have your answer.
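The arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch: the 80,000-word dictionary and the one-millisecond test time come from the comment, while the age of the universe (taken here as roughly 13.8 billion years), the function names, and the part-of-speech probabilities in the last line are assumptions made for illustration.

```python
import math

DICTIONARY_SIZE = 80_000           # from the comment
SENTENCE_LENGTH = 5                # from the comment
SECONDS_PER_TEST = 1e-3            # one millisecond per validation, from the comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10    # assumption: roughly 13.8 billion years


def brute_force_estimate(n_words=DICTIONARY_SIZE, k=SENTENCE_LENGTH):
    """Return (ordered k-word arrangements, years needed to test them all)."""
    arrangements = math.perm(n_words, k)  # order matters, no word repeated
    years = arrangements * SECONDS_PER_TEST / SECONDS_PER_YEAR
    return arrangements, years


def template_probability(pp, pn, pv):
    """pvs = pp * pn * pn * pv * pn, the five-slot template from the comment."""
    return pp * pn * pn * pv * pn


if __name__ == "__main__":
    arrangements, years = brute_force_estimate()
    print(f"{arrangements:.3e} candidate sentences")
    print(f"{years:.3e} years, about {years / AGE_OF_UNIVERSE_YEARS:.0f} "
          "times the age of the universe")
    # Hypothetical probabilities, not measured from any dictionary.
    print(template_probability(pp=0.01, pn=0.5, pv=0.15))
```

The results should land close to the figures quoted in the comment. The exact multiple of the universe's age shifts a little with the values assumed for the length of a year and the age of the universe.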


thejteam, over 11 years ago

It depends completely on whether you are looking for "valid" in a structural sense or a semantic sense. If structural, lutusp's second approach looks very doable. There are a lot of cases to consider, and it would most likely mean pulling out an old grammar book that actually teaches these things, but it could be approached systematically. It would also mean making some assumptions about verb forms, i.e. I would only want to choose the verb at random and not its form.

If "valid" is meant in a semantic sense, I wouldn't have a real clue where to start. The structural requirements give you an upper bound and a quick initial test. Perhaps you could start with a small set of words and observe how the results scale upwards?
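One way to try the "start small" idea is to tag a tiny vocabulary by hand, enumerate every ordered five-word selection, and count how many fit a structural template. The sketch below assumes a hypothetical nine-word lexicon and a single determiner-adjective-noun-verb-noun template. Both are illustrative choices, not anything proposed in the thread, and a structural match says nothing about whether the sentence means anything.

```python
from itertools import permutations

# Hypothetical hand-tagged lexicon; a real one would come from a tagged dictionary.
LEXICON = {
    "the": "det", "a": "det",
    "calico": "adj", "hungry": "adj",
    "cat": "noun", "dog": "noun", "breakfast": "noun",
    "ate": "verb", "chased": "verb",
}

# Assumed template, modeled on "The calico cat ate breakfast."
TEMPLATE = ("det", "adj", "noun", "verb", "noun")


def structural_fraction(lexicon, template):
    """Return (matching, total) over all ordered selections of distinct words."""
    matching = total = 0
    for words in permutations(lexicon, len(template)):
        total += 1
        # Compare the part-of-speech sequence of this selection to the template.
        if tuple(lexicon[w] for w in words) == template:
            matching += 1
    return matching, total


matching, total = structural_fraction(LEXICON, TEMPLATE)
print(f"{matching} of {total} selections fit the template ({matching / total:.4%})")
```

For this lexicon, 48 of the 15,120 ordered selections fit the template. Growing the lexicon and adding templates would show how that structural upper bound changes with vocabulary size.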

brudgers, over 11 years ago

Randomly chosen from what? Or, to put it another way, does every word have the same probability of selection, or is each word's probability weighted by frequency?

If "crwth" is as likely as "is", then the likelihood of a meaningful sentence is lower than if actual word frequencies are taken into account.

http://en.wikipedia.org/wiki/Crwth
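The difference between the two sampling schemes can be sketched with Python's `random.choices`, which accepts per-item weights. The word list and frequency counts below are made up for illustration and are not taken from any corpus. Note that `random.choices` samples with replacement, so a word may repeat.

```python
import random

# Hypothetical relative frequencies; real values would come from a corpus.
WORD_FREQUENCIES = {"is": 10_000, "the": 22_000, "cat": 300, "ate": 150, "crwth": 1}


def sample_uniform(freqs, k=5, rng=random):
    """Every word is equally likely, so "crwth" is as probable as "is"."""
    return rng.choices(list(freqs), k=k)


def sample_weighted(freqs, k=5, rng=random):
    """Each word is drawn in proportion to its (assumed) frequency."""
    return rng.choices(list(freqs), weights=list(freqs.values()), k=k)


rng = random.Random(0)  # fixed seed so runs are repeatable
print(" ".join(sample_uniform(WORD_FREQUENCIES, rng=rng)))
print(" ".join(sample_weighted(WORD_FREQUENCIES, rng=rng)))
```

With the weighted sampler, rare words such as "crwth" show up far less often, which is the effect the comment describes.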

chrismorgan, over 11 years ago

A simple way to try it, in Python:

    >>> import random
    >>> words = open('/usr/share/dict/words').read().split()
    >>> words = [w for w in words if w.islower()]  # a list, since random.sample needs a sequence
    >>> ' '.join(random.sample(words, 5))



Words that make valid sentences
