This actually hit my previous company in a software context.<p>We would number our hotfixes sequentially. Many were items demanded by a single client, so they would get deployed as hotfixes only to that customer's site and just rolled into the main trunk for the next quarterly release for everyone else. Clients would always be notified about hotfixes going onto their live sites.<p>One savvy client noticed the hotfix numbering sequence. Naturally, quite a number of extremely awkward discussions ensued, as they would regularly ask why our software needed so many hotfixes (tens per week) and why they weren't entitled to all of them right away.<p>Solution: a new policy to randomly generate hotfix numbers. Which of course led to the next problem: the sequence was no longer obvious from the names, so dependent hotfixes would sometimes get deployed in the wrong order. Why can't anything be easy...
There is some practical relevance to software development here. One shouldn't expose sequential IDs (a.k.a. serial numbers) to the public for anything non-public.<p>I see this Hacker News post has a numerical ID in the URL, for example; I can estimate the size of Hacker News given enough of these numbers... More directly, I can modify that numerical ID to crawl Hacker News.<p>Many sites do this; it's generally better to generate a 'slug' (random, hashed, or derived from a natural key) to use as the key instead. For example, Amazon generates a unique, non-sequential, 10-character alphanumeric string for each item in their catalog.
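A minimal sketch of the random-slug idea, in Python. The alphabet, the default length of 10, and the names `new_slug` and `assign_slug` are my own assumptions for illustration, not how Amazon or any other site actually does it. The in-memory `set` stands in for whatever uniqueness check a real database would provide.

```python
import secrets
import string

# Hypothetical alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def new_slug(length=10):
    """Return a random public identifier that reveals nothing about creation order."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def assign_slug(existing, length=10):
    """Draw slugs until one is unused, record it in `existing`, and return it."""
    while True:
        slug = new_slug(length)
        if slug not in existing:
            existing.add(slug)
            return slug
```

The sequential primary key can stay internal; only the slug goes into URLs, so nobody can count or enumerate records from what they see.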
It's astounding how accurate they were using only statistical methods:<p>> Analysis of wheels from two tanks (48 wheels each, 96 wheels total) yielded <i>an estimate of 270 produced in February 1944</i>, substantially more than had previously been suspected.<p>> German records after the war showed production for the month of February 1944 was <i>276</i>.
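For anyone curious how an estimate like this can come out of a handful of serial numbers: the quote doesn't say which formula the analysts used, so this is only a sketch of the standard textbook estimator, assuming serials run 1..N and the observed ones are a uniform sample without replacement. The function name and the sample serials are made up for illustration.

```python
def estimate_total(serials):
    """Estimate N from observed serial numbers: m + m/k - 1.

    m is the largest serial seen and k is the sample size. The idea is that
    the gap above the maximum is, on average, about the same size as the
    average gap between the observed serials.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


# Made-up sample: four observed serials, largest is 60.
print(estimate_total([19, 40, 42, 60]))  # 60 + 60/4 - 1 = 74.0
```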
Huh, I once visited a military base where people on the trip wanted to be photographed with a tank. The soldiers said it was OK, as long as somebody obscured the tank's serial number by standing in front of it. I wonder if their training in this respect was inspired by this history!<p>(But if so, why not print the serial numbers inside the tank, not outside? Or maybe encrypt or HMAC them?)
Don't remember where I read it at least 12 years ago, but someone talked about an April Fools prank where they released three pigs in their high school, with numbers 1, 2, and 4 written on them. Allegedly the administrators spent weeks looking for number 3.
My favorite explanation of this (posed instead as the Locomotive problem) is in Allen Downey's "Think Bayes," p. 22.<p>It's online too, and worth reading!<p><a href="http://www.greenteapress.com/thinkbayes/" rel="nofollow">http://www.greenteapress.com/thinkbayes/</a>
This is why the whole secret agent "#3" thing in movies like Bourne Legacy, James Bond, etc. is so ridiculous.<p>That's a worse code name than just using the person's real name, as it hints at how many people are in the secret organization.
I remember my theoretical stats teacher showing us this problem. It's used all the time in ecology. His example used it to estimate the number of alligators in Louisiana swamps. They tag the alligators, release them, and then, from the share of tagged animals among those they recapture over subsequent years, they can estimate how many alligators exist in the wild!
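The comment doesn't name the estimator, so as a hedged illustration here is the simplest mark-recapture formula I know of (usually called Lincoln-Petersen), assuming a closed population and that tagged animals are as likely to be caught as untagged ones. The function name and the numbers are invented for the example.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate population size as M * C / R.

    M: animals tagged and released in the first visit
    C: animals caught in the second visit
    R: how many of those C were already tagged
    The tagged fraction of the second catch (R / C) is assumed to match
    the tagged fraction of the whole population (M / N).
    """
    if recaptured == 0:
        raise ValueError("no tagged animals recaptured; estimate is undefined")
    return marked_first * caught_second / recaptured


# Invented numbers: 100 tagged, then 80 caught of which 20 carry tags.
print(lincoln_petersen(100, 80, 20))  # 100 * 80 / 20 = 400.0
```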
So here's an idea. Conventional intelligence was off by quite a bit, spurring the Allies to overproduce tanks (which was possible due to the absurd American industrial capacity), which then allowed the Allies to cleanly overwhelm the order-of-magnitude-fewer tanks they actually came in kinetic contact with.
I first read about this work a few years ago, but had I encountered it before college, I think I might have majored in statistics. Such powerful results; they feel like magic.
I encountered a slightly different problem trying to find the size of the union of a bunch of sets. We ended up just storing the smallest k int64 hashes of the items in each set, and computing 2^64 / ((largest hash - smallest hash) / (k - 1)) over the k smallest hashes across all the sets as an estimate of the size of the union.
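A rough sketch of that k-smallest-hashes approach, under my own assumptions: BLAKE2b truncated to 8 bytes as the 64-bit hash, string items, and the helper names `hash64`, `sketch`, `merge`, and `estimate_size` are all hypothetical. The fallback to an exact count when fewer than k hashes are retained is also my addition, not something the comment describes.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64


def hash64(item):
    """Map a string to a uniformly distributed 64-bit unsigned integer."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(items, k):
    """Keep the k smallest distinct hashes of one set, in ascending order."""
    return heapq.nsmallest(k, {hash64(x) for x in items})


def merge(sketches, k):
    """Sketch of the union: the k smallest distinct hashes across all sketches."""
    return heapq.nsmallest(k, set().union(*sketches))


def estimate_size(sk, k):
    """Hash space divided by the average gap between the retained hashes."""
    if len(sk) < k:
        # Fewer than k distinct hashes means every item was retained.
        return float(len(sk))
    avg_gap = (sk[-1] - sk[0]) / (k - 1)
    return HASH_SPACE / avg_gap
```

Merging works because any hash among the k smallest of the union must also be among the k smallest of whichever set it came from, so nothing needed is ever thrown away. The relative error shrinks roughly like 1/sqrt(k).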
I think the most important information is the table:
Month | Statistical estimate | Intelligence estimate | German records
June 1940 | 169 | 1,000 | 122
June 1941 | 244 | 1,550 | 271
August 1942 | 327 | 1,550 | 342<p>Intelligence estimates... so off the mark.