Words that make valid sentences

5 points by makeshifthoop, more than 11 years ago
Given 5 randomly chosen words, what is the likelihood that they form a valid sentence? The list of available words does not include proper nouns, and singular and plural forms are treated as different words. It's an interesting thought experiment, and I wanted to see what other people come up with.

4 comments

lutusp, more than 11 years ago
1. Create a list of common words, the more the better. Let's say we use a common spelling dictionary of 80,000 words.

2. Create an order-5 permutation of the list (a result in which the order of the words matters). For an 80,000 word dictionary, that's roughly 3.27 x 10^24 test sentences, each of five words.

3. Scan the resulting sentence set, using a heuristic able to distinguish valid sentences from invalid ones. Let's say that one optimized validation test requires one millisecond. In that case, the scan would require about 1.04 x 10^14 years, roughly 7,500 times the age of the universe.

Meaning this is a much easier question to ask than to answer.

EDIT: Another approach is to make an arbitrary assumption about the structure of a five-word sentence, like determiner-adjective-noun-verb-noun: "The calico cat ate breakfast." Crude and limited, and many apparently valid results will be meaningless, but it makes the estimate easier.

A random word has some probability of being a determiner, adjective, noun or verb: pd, pa, pn and pv. The probability of producing a valid sentence (pvs) using the described template is therefore:

pvs = pd * pa * pn * pv * pn

Just generate the probability values by scanning a dictionary and identifying the word types (easier said than done), evaluate the above equation, and you have your answer.
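Both estimates above are simple enough to check numerically. The following is a minimal sketch, not a definitive calculation: the function names, the 13.8-billion-year figure for the age of the universe, and the example tag probabilities are all assumptions made for illustration, and any five-slot template can be passed in.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumed round figure, not from the thread


def ordered_selections(vocab_size, length=5):
    """Number of ordered selections of `length` distinct words."""
    return math.perm(vocab_size, length)


def scan_years(candidates, seconds_per_test=1e-3):
    """Years needed to test every candidate at a fixed cost per test."""
    return candidates * seconds_per_test / SECONDS_PER_YEAR


def template_probability(template, tag_probability):
    """Product of per-slot probabilities, assuming independent draws."""
    result = 1.0
    for tag in template:
        result *= tag_probability[tag]
    return result


if __name__ == "__main__":
    n = ordered_selections(80_000)
    years = scan_years(n)
    print(f"candidates: {n:.3e}")
    print(f"years to scan: {years:.3e}")
    print(f"multiples of the age of the universe: {years / AGE_OF_UNIVERSE_YEARS:.0f}")

    # Hypothetical tag probabilities; real values would come from a tagged dictionary.
    probabilities = {"det": 0.01, "adj": 0.2, "noun": 0.5, "verb": 0.15}
    template = ("det", "adj", "noun", "verb", "noun")
    print(f"template probability: {template_probability(template, probabilities):.2e}")
```

The product form treats each slot as an independent draw, which is only approximately true when words are sampled without replacement from a large dictionary.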
thejteam, more than 11 years ago
It depends completely on whether you are looking for "valid" in a structural sense or a semantic sense. If a structural sense, lutusp's second approach looks very doable. There are a lot of cases to consider, and it would most likely mean pulling out an old grammar book that actually teaches these things, but it could be approached systematically. It would also mean making some assumptions about verb forms, i.e. I would only want to choose the verb at random and not its form.

If "valid" in a semantic sense, I wouldn't have a real clue where to start. The structural requirements give you an upper bound and a quick initial test. Perhaps you could start with a small set of words and observe how the results scale upwards?
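The "start with a small set of words" idea can be tried by brute force on a toy vocabulary. This is only a sketch under stated assumptions: the lexicon, its part-of-speech tags, and the single template are hypothetical, and a real structural test would need many templates and a proper tagger.

```python
from itertools import permutations

# Hypothetical toy lexicon mapping each word to one part-of-speech tag.
LEXICON = {
    "the": "det", "a": "det",
    "red": "adj", "small": "adj",
    "dog": "noun", "bird": "noun", "apple": "noun",
    "ate": "verb", "saw": "verb",
}

# One assumed sentence structure; others exist and are ignored here.
TEMPLATE = ("det", "adj", "noun", "verb", "noun")


def fits_template(words, lexicon, template):
    """True if the tags of `words`, in order, equal the template."""
    return tuple(lexicon[w] for w in words) == template


def count_template_matches(lexicon, template):
    """Enumerate every ordered selection of distinct words and count matches."""
    hits = 0
    total = 0
    for candidate in permutations(lexicon, len(template)):
        total += 1
        if fits_template(candidate, lexicon, template):
            hits += 1
    return hits, total


if __name__ == "__main__":
    hits, total = count_template_matches(LEXICON, TEMPLATE)
    print(f"{hits} of {total} orderings fit the template ({hits / total:.4%})")
```

Growing the lexicon a few words at a time and re-running the count shows how the structurally valid fraction changes with vocabulary size, although full enumeration stops being practical very quickly.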
brudgers, more than 11 years ago
Randomly chosen from what? Or, to put it another way, does every word have the same probability of selection, or is each word's probability weighted by its frequency?

If "crwth" is as likely as "is", then the likelihood of a meaningful sentence is lower than if actual word frequencies are taken into account.

http://en.wikipedia.org/wiki/Crwth
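The difference between the two selection schemes can be made concrete. In this sketch the frequency counts are invented for illustration and the function names are hypothetical; the weighted draw uses `random.choices`, which samples with replacement, while the uniform draw uses `random.sample`, which does not.

```python
import random

# Hypothetical usage counts; real values would come from a corpus.
FREQUENCIES = {"the": 20000, "is": 10000, "cat": 300, "crwth": 1}


def sample_uniform(frequencies, k, rng):
    """Every word equally likely, drawn without replacement."""
    return rng.sample(list(frequencies), k)


def sample_weighted(frequencies, k, rng):
    """Words drawn in proportion to their counts, with replacement."""
    words = list(frequencies)
    weights = [frequencies[w] for w in words]
    return rng.choices(words, weights=weights, k=k)


def selection_probability(word, frequencies, weighted):
    """Chance that a single draw returns `word` under either scheme."""
    if weighted:
        return frequencies[word] / sum(frequencies.values())
    return 1 / len(frequencies)


if __name__ == "__main__":
    rng = random.Random()
    print("uniform: ", " ".join(sample_uniform(FREQUENCIES, 3, rng)))
    print("weighted:", " ".join(sample_weighted(FREQUENCIES, 3, rng)))
    for word in ("is", "crwth"):
        uniform_p = selection_probability(word, FREQUENCIES, weighted=False)
        weighted_p = selection_probability(word, FREQUENCIES, weighted=True)
        print(f"{word}: uniform {uniform_p:.4f}, weighted {weighted_p:.6f}")
```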
chrismorgan, more than 11 years ago
A simple way to try it, in Python:

```python
>>> import random
>>> # Typical word-list path on Unix-like systems
>>> words = open('/usr/share/dict/words').read().split()
>>> # Keep all-lowercase entries, which drops proper nouns
>>> words = [w for w in words if w.islower()]
>>> # random.sample needs a sequence, hence the list (filter() is lazy in Python 3)
>>> ' '.join(random.sample(words, 5))
```
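Repeating that draw many times and counting how often the result passes some validity check would give a Monte Carlo estimate of the likelihood in question. The sketch below assumes a caller-supplied `is_valid` predicate, which is a hypothetical placeholder: writing a real one is the hard part of the problem and is not attempted here.

```python
import random


def estimate_valid_fraction(words, is_valid, trials=10_000, length=5, seed=None):
    """Estimate the share of random `length`-word draws accepted by `is_valid`.

    `words` must be a sequence; each trial samples distinct words from it.
    `is_valid` takes a list of words and returns True or False.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        candidate = rng.sample(words, length)
        if is_valid(candidate):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Toy vocabulary and a stand-in check, both for illustration only.
    vocabulary = ["the", "a", "red", "small", "dog", "bird", "apple", "ate", "saw"]

    def starts_with_determiner(candidate):
        return candidate[0] in ("the", "a")

    print(estimate_valid_fraction(vocabulary, starts_with_determiner, trials=5_000))
```

The estimate is only as good as the predicate, and for rare events the number of trials needed to see even one hit can be very large.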