TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

The Central Limit Theorem Makes Random Testing Hard

56 pointsby sharkbotalmost 13 years ago

7 comments

imurrayalmost 13 years ago
Warning: slight pedantry ahead.<p>Specifically what this article is noticing is the <i>law of large numbers</i> [1], that sums or averages are "close" to their mean. The central limit theorem says that sums <i>converge in distribution</i> towards being normally distributed. But convergence in distribution isn't a strong statement about tail behavior, and so shouldn't be used in the sort of argument in the post.<p>[1] <a href="http://en.wikipedia.org/wiki/Law_of_large_numbers" rel="nofollow">http://en.wikipedia.org/wiki/Law_of_large_numbers</a>
streptomycinalmost 13 years ago
If the goal is to randomly probe the parameter space of a model, the author is correct that independent Gaussian random numbers are a poor choice (although it's not really related to the central limit theorem). However, this isn't a novel revelation. There are other/better ways to do it, for instance <a href="http://en.wikipedia.org/wiki/Latin_hypercube_sampling" rel="nofollow">http://en.wikipedia.org/wiki/Latin_hypercube_sampling</a>
评论 #4134144 未加载
Darmanialmost 13 years ago
While it is well known that uninformed random test generation often does a poor job at probing the parameter space, this has nothing to do with the CLT. Even his first example with the bounded queue, one where the CLT does make predictions, he is incorrect to apply it. That's because what we care about is not the difference between the number of enqueue and dequeue operations (which has expected value 0), but the magnitude of the difference (which has expected value sqrt(N)). And what we care about is not the magnitude of the difference after completing the tests, but the greatest magnitude at any point along the random walk. This is governed by a more advanced theorem called the Law of the Iterated Logarithm, which tells us that the probability of finding the bug is quite a bit higher than the author realizes.
lliiffeealmost 13 years ago
The central limit theorem only applies when random variables are combined <i>linearly</i>. I see no reason to suspect that this is the case with random inputs for testing. (Most functions are nonlinear!)
DaniFongalmost 13 years ago
Why the central limit theorem is usually off:<p><a href="http://daniellefong.com/2008/01/28/outliers-why-the-central-limit-theorem-is-typically-off/" rel="nofollow">http://daniellefong.com/2008/01/28/outliers-why-the-central-...</a>
评论 #4133688 未加载
bcbrownalmost 13 years ago
I think one approach would be to use RNGs in a slightly different way. Instead of a 'random walk' of various API calls, define a possible outcome category, and choose a random number to be the quantity of that outcome.<p>So for the bounded queue problem, set the outcome category to be "elements in queue". Say you get 3, 12, and 7 as your first three random numbers. Then the first three test cases would be "get the load to 3 elements", "get the capacity to 12 elements", "get the capacity to 7 elements".
评论 #4134723 未加载
zerostar07almost 13 years ago
I can't understand the relation to CLT. When using an RNG you basically sample from the same distribution (thus have only 1 random variable) plus the CLT comes into play when you add (sum, average) different random variables. In the case of the queue, it has nothing to do with the number of cases missed.
评论 #4134654 未加载