Other fun I've had with Benford's Law.<p>1. Spotting odd things in MPs' expenses: <a href="http://blog.jgc.org/2009/06/its-probably-worth-testing-mps.html" rel="nofollow">http://blog.jgc.org/2009/06/its-probably-worth-testing-mps.h...</a><p>2. Spotting odd things in BBC executives' expenses: <a href="http://blog.jgc.org/2009/06/running-numbers-on-bbc-executives.html" rel="nofollow">http://blog.jgc.org/2009/06/running-numbers-on-bbc-executive...</a><p>3. The Iranian election: <a href="http://blog.jgc.org/2009/06/benfords-law-and-iranian-election.html" rel="nofollow">http://blog.jgc.org/2009/06/benfords-law-and-iranian-electio...</a><p>4. New Age mumbo jumbo: <a href="http://www.jgc.org/blog/2008/02/any-sufficiently-simple-explanation-is.html" rel="nofollow">http://www.jgc.org/blog/2008/02/any-sufficiently-simple-expl...</a>
I like the history section of the wikipedia article:<p><blockquote>The discovery of this fact goes back to 1881, when the American astronomer Simon Newcomb noticed that in logarithm books, the earlier pages (which contained numbers that started with 1) were much more worn than the other pages.</blockquote><p>Can you imagine the sense of observation and curiosity that would make someone look at a book of numbers and say, "I wonder why these pages are more worn than those ones."
There's a nice discussion of this from Terry Tao (outrageously smart mathematician; has a Fields medal) at <a href="http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/" rel="nofollow">http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-...</a> which contains, e.g., the following nice observation: if X follows Benford's law and Y is any positive random variable independent of X, then XY also follows Benford's law. (Tao goes a bit further than this and thereby sheds some light on why many things approximately obey Benford's law.)<p>[EDITED to add: Discussed before on HN: <a href="http://news.ycombinator.com/item?id=687241" rel="nofollow">http://news.ycombinator.com/item?id=687241</a>. There have been quite a number of other discussions of Benford's law on HN, too.]
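Tao's product observation is easy to check numerically. This is a minimal sketch under illustrative assumptions. X is generated as 10**U with U uniform on [0, 1), which makes it exactly Benford-distributed. Y is uniform on [1, 5), chosen only as an example of an arbitrary positive variable independent of X. The helper names are hypothetical:

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive number."""
    # Scientific notation puts that digit first, e.g. f"{320000.0:e}" == "3.200000e+05".
    return int(f"{x:e}"[0])


def benford_expected() -> list[float]:
    # Benford's law: P(first digit = d) = log10(1 + 1/d), d = 1..9.
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def digit_shares(values: list[float]) -> list[float]:
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


rng = random.Random(42)
N = 100_000
xs = [10 ** rng.random() for _ in range(N)]   # X: Benford-distributed by construction
ys = [rng.uniform(1, 5) for _ in range(N)]    # Y: not Benford, independent of X
products = [x * y for x, y in zip(xs, ys)]

print("d  Benford  Y alone  X*Y")
for d, exp, y_share, xy_share in zip(
    range(1, 10), benford_expected(), digit_shares(ys), digit_shares(products)
):
    print(f"{d}  {exp:.3f}    {y_share:.3f}    {xy_share:.3f}")
```

Y by itself starts with 1 through 4 about a quarter of the time each. The X*Y column should still come out close to the Benford column, because log10(XY) mod 1 stays uniform when an independent shift log10(Y) is added to a uniform log10(X) mod 1.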
This is interesting (from Wikipedia article on Benford's Law):<p>In the United States, evidence based on Benford's law is legally admissible in criminal cases at the federal, state, and local levels.
Benford's law only seems strange until you realise that natural phenomena tend to follow logarithmic patterns, while our commonly used system of counting and measuring is not logarithmic.<p>It's still a bit of a brain f--- when you first encounter it. I found it easier to <i>get</i> using plotting tools than by aggregating lists of numbers and measurements.
Seems to me that when you have a group of somethings that are constantly increasing in size, it would be natural for the number 1 to come up as the first digit more often: to get to 2, you need to pass through 1 first, and to get to 9, you need to pass through 1, 2, 3, 4, 5, 6, 7 and 8 first. Therefore, you should get the distribution predicted by Benford's law. The way to test this theory would be to run the numbers on values that are constantly decreasing. I'd expect the distribution to reverse itself.<p>If it proves true, then you could use it to test whether a group of things is increasing or decreasing.
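The proposed test is easy to simulate. This minimal sketch assumes that "constantly increasing" means growth by a fixed 1% per step and that "constantly decreasing" means a fixed 1% decline per step. Both rates and the series lengths are illustrative choices, not anything from the comment:

```python
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive number."""
    return int(f"{x:e}"[0])


def digit_shares(series: list[float]) -> list[float]:
    counts = Counter(leading_digit(v) for v in series)
    return [counts[d] / len(series) for d in range(1, 10)]


STEPS = 10_000
# Assumption: "constantly increasing" = +1% per step; "constantly decreasing" = -1% per step.
growing = [1.0 * 1.01 ** t for t in range(STEPS)]
shrinking = [1e9 * 0.99 ** t for t in range(STEPS)]

for name, series in (("growing", growing), ("shrinking", shrinking)):
    shares = digit_shares(series)
    print(name, " ".join(f"{d}:{s:.3f}" for d, s in zip(range(1, 10), shares)))
```

The outcome depends on what "decreasing" means. A fixed-percentage decline like the one assumed here spends roughly as many steps falling from 2 to 1 as a fixed-percentage rise spends climbing from 1 to 2. A decline by a fixed amount per step would be a different experiment.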
My favorite explanation of it is that if there is a distribution to the numbers, then that distribution should hold no matter what base you're working in (for natural things, after all, there's nothing special about base 10). Benford's law can be shown to be a) a law that satisfies this base-independent property, and b) the only law that does so.
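For reference, the base-b form of the law gives leading digit d the probability log_b(1 + 1/d). The small sketch below tabulates that formula for a few bases. It only shows that the formula is a valid distribution in every base; it is not the uniqueness argument itself:

```python
import math


def benford(base: int = 10) -> dict[int, float]:
    """Probability of each possible leading digit d = 1 .. base-1 in the given base."""
    # P(d) = log_b(1 + 1/d) = log_b((d + 1) / d)
    return {d: math.log((d + 1) / d, base) for d in range(1, base)}


for b in (2, 8, 10, 16):
    probs = benford(b)
    # The terms telescope to log_b(base) = 1, so each row is a proper distribution.
    print(b, f"P(1) = {probs[1]:.3f}", f"sum = {sum(probs.values()):.6f}")
```

In base 2 every number starts with 1, and the formula agrees: log_2(2) = 1.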
Searching reveals <i>lots</i> of previous discussion of Benford's law here, so I won't give all the links. Of course, it's an interesting observation, so it's worth advertising every so often.<p>Here are some hacker-newsers testing files in their home directories:
<a href="http://news.ycombinator.com/item?id=1076534" rel="nofollow">http://news.ycombinator.com/item?id=1076534</a>
As far as I can tell, "Most common iPhone passcodes" doesn't belong on this list, and I'm perplexed why it seems to follow the law. An iPhone numeric password (which I'm assuming it's referring to) is simply a 4-digit string, so all first digits should be equally probable, unless there's some psychological issue at work. Or are they discarding leading zeros for the purpose of this chart? I guess they must be (0 doesn't appear on the chart), but that's a weird thing to do to a password.
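The reasoning about uniform passcodes can be checked directly. This minimal sketch assumes every 4-digit code is equally likely and tallies the first digit under both readings the comment considers:

```python
from collections import Counter

# Assumption: every 4-digit passcode "0000" .. "9999" is equally likely.
codes = [f"{n:04d}" for n in range(10_000)]

# Reading 1: the first digit as typed, so 0 counts as a digit.
as_typed = Counter(code[0] for code in codes)

# Reading 2: the code treated as a number, i.e. leading zeros dropped ("0000" skipped).
as_number = Counter(str(int(code))[0] for code in codes if int(code) > 0)

for d in "0123456789":
    print(d, as_typed[d], as_number[d])
```

Either way the digits come out flat: 1,000 codes each as typed, and 1,111 each (about 11.1%) with leading zeros dropped. A Benford-like skew in real passcode data would therefore have to come from how people choose codes rather than from the format.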
Imagine you threw a single stone into the desert and asked your friend to go find it. It would be hard. Now imagine you threw 2 stones into the desert and asked your friend to go find them. It is twice as hard to find both stones as it is to find 1 stone. Imagine you threw 3 stones. It is 3 times as hard to find all 3 stones as it is to find 1 stone.<p>Now imagine that numbers are built out of stones. To "build" a 1, you only need 1 stone. But to "build" a 2, you need 2 stones. Thus, if you wanted to write a 3, you would have to go into the desert and find 3 stones. It's 3x as hard, and so you'd expect people to "build" 1/3 as many 3's as 1's, 1/5 as many 5's as 1's, and so on, just as you'd expect there to be a lot more single-story buildings than skyscrapers. It's easier to build a single-story building.<p>Thus, the distribution is exactly what you'd expect. While it doesn't actually take stones to build numbers, we don't write the number 3 unless we have 3 of something. Unless you are lying. Which is why this is a great method of detecting fraud.<p>UPDATE: What do I mean when I say "3 times as hard"?<p>Imagine the desert is a rectangle of 10 squares, kind of like a mancala board or a ladder on the ground. You start by stepping in square 1, and to get to square 10 you have to step through each square.<p>If there is only 1 rock, what are the odds that you'll have to walk all 10 steps to find it? This is the same thing as asking what the odds are that this rock is in square 10. The answer is 1/10, or 10%.<p>Now, if there are 3 rocks, what are the odds that you'll have to step into all 10 squares? Well, what are the odds that there's a rock in the last square? 27.1% (1 - 0.9^3, if each rock lands independently), or approximately 3x as hard. Interestingly, it's not exactly 3x as hard; it's 2.71x as hard. That makes the data in the OP seem even more logical, since you'd expect about 32.0% 1's given 11.8% 3's, and the actual 32.62% is not far off.
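The walk-the-squares probabilities in the update can be computed exactly. The comment leaves open how the rocks are placed, so this minimal sketch covers two assumptions. In one, rocks are dropped independently and may share a square. In the other, rocks occupy different squares and every set of squares is equally likely. The function names are illustrative:

```python
from fractions import Fraction
from math import comb


def p_independent(rocks: int, squares: int = 10) -> Fraction:
    """P(some rock is in the last square), each rock placed uniformly and
    independently, so two rocks may share a square."""
    return 1 - Fraction(squares - 1, squares) ** rocks


def p_distinct(rocks: int, squares: int = 10) -> Fraction:
    """Same question, but the rocks occupy different squares, every set of
    squares equally likely."""
    # Chance that none of the occupied squares is the last one.
    return 1 - Fraction(comb(squares - 1, rocks), comb(squares, rocks))


for k in (1, 3):
    ind, dis = p_independent(k), p_distinct(k)
    ratio = float(ind / p_independent(1))
    print(k, float(ind), float(dis), f"ratio vs 1 rock: {ratio:.2f}")
```

With independent placement, three rocks give 271/1000 = 27.1%, which is 2.71 times the single-rock 10%. With distinct squares the probability is exactly 3/10.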
Why is this not common sense?<p>For the numbers 1-19, more than half of them start with 1.
For the numbers 1-199, more than half of them start with 1.<p>Change the examples to 1-299, 1-399, etc., and you'll get percentages of all digits matching Benford's law.
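The counting in these two comments is easy to reproduce. This short sketch tallies the first digits of 1..N for the cutoffs mentioned; the function name is illustrative:

```python
from collections import Counter


def first_digit_shares(n: int) -> dict[int, float]:
    """Share of the integers 1..n whose first decimal digit is d, for d = 1..9."""
    counts = Counter(int(str(k)[0]) for k in range(1, n + 1))
    return {d: counts[d] / n for d in range(1, 10)}


for n in (19, 199, 299, 399):
    shares = first_digit_shares(n)
    print(n, " ".join(f"{d}:{shares[d]:.1%}" for d in range(1, 10)))
```

The share starting with 1 is 11/19 for N = 19 and 111/199 for N = 199, both above half, as stated. The other rows show how the shares shift as the cutoff moves.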
Cool stuff. However, something mostly off-topic that I genuinely wonder about: it seems everybody registers a .com just to make an HN post. What's the point of this? Why not post the same data on your blog?
Benford's law makes a lot of sense if you consider that many of the numbers are derived from counting up from 0. The scale of these things is exponentially distributed, and therefore the leading digits are more likely to be 1 than 9. This is related to social media: once your userbase gets big enough, it starts growing or shrinking proportionally to its size, i.e. exponentially. This is also somewhat related to the value of a social network... Metcalfe's law (value proportional to n^2) seems to be too optimistic. The value is probably more like n log n.
"If a set of values were truly random, each leading digit would appear about 11% of the time"<p>This kind of mathematically unsophisticated reasoning is exactly why Benford's law is so surprising to people. If you think of what it means for a value to be "truly random", the result is not surprising at all.
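The point about "truly random" can be made concrete, because the answer depends on which distribution you mean. This minimal sketch compares two illustrative choices over the same range [1, 10^6). The first is uniform on a linear scale, which gives roughly the 11% the quote expects. The second is uniform on a logarithmic scale, which gives Benford's law:

```python
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive number."""
    return int(f"{x:e}"[0])


def share_of_ones(values: list[float]) -> float:
    return Counter(leading_digit(v) for v in values)[1] / len(values)


rng = random.Random(0)
N = 100_000
# Reading 1: uniform on a linear scale over [1, 10**6).
linear = [rng.uniform(1, 10**6) for _ in range(N)]
# Reading 2: uniform on a logarithmic scale over the same range (log-uniform).
log_uniform = [10 ** rng.uniform(0, 6) for _ in range(N)]

print(f"linear:      {share_of_ones(linear):.3f}")       # close to 1/9
print(f"log-uniform: {share_of_ones(log_uniform):.3f}")  # close to log10(2)
```

Neither choice is more "truly random" than the other. They are different assumptions about scale, and data spanning many orders of magnitude tends to look more like the second.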
Perhaps you should let some of the open data citizen groups know about this so they can add more data. Also, if you haven't already, take a look at CKAN[1] for datasets to add.<p>[1] <a href="http://ckan.net/" rel="nofollow">http://ckan.net/</a>
'Presenting Benford's law' would be a more fitting title.
Nicely presented, and an intriguing law for sure, but I can't help thinking "and?" At this point it lacks a more user-friendly way to submit datasets.
This one is interesting:
<a href="http://testingbenfordslaw.com/most-common-iphone-passcodes" rel="nofollow">http://testingbenfordslaw.com/most-common-iphone-passcodes</a><p>I wonder what influence the 'spatial' properties of a number-pad password have on this data. For example, "5" gets a nice little spike... and "5" is the center key on the 10-key iPhone number pad. The "1" is still the winner by far, but I wonder how many of those are the easy-to-remember "1234".
It seems to me there are a lot of interesting psychological elements to this, but it's also a simple reflection of relatively constant growth rates. If city populations grow 3% every year, they will spend a lot more years in the 1 millions than in the 9 millions, etc.<p>The chart looks like this: <a href="https://url.odesk.com/a7och" rel="nofollow">https://url.odesk.com/a7och</a>
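The 3% example can be made quantitative. This minimal sketch treats growth as smooth, P(t) = P0 · 1.03^t, rather than compounding once a year. The function name is illustrative:

```python
import math


def years_per_leading_digit(rate: float = 0.03) -> dict[int, float]:
    """Years a steadily growing population spends with each leading digit while
    it grows through one decade (e.g. from 1 million to 10 million)."""
    # Treating growth as smooth, P(t) = P0 * (1 + rate) ** t, so going from
    # d to d + 1 (in units of the decade) takes ln((d + 1) / d) / ln(1 + rate) years.
    per_year = math.log(1 + rate)
    return {d: math.log((d + 1) / d) / per_year for d in range(1, 10)}


years = years_per_leading_digit(0.03)
total = sum(years.values())
for d, y in years.items():
    print(f"{d} million: {y:5.1f} years ({y / total:.1%})")
```

Under these assumptions a city spends about 23 years in the 1 millions and under 4 in the 9 millions. Dividing each figure by the decade total gives log10(1 + 1/d), Benford's proportions, whatever the growth rate.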
>Imagine a large dataset, say something like a list of every country and its population.<p>How is that a large dataset? There aren't that many countries.