The German Tank Problem

422 points by eadan almost 6 years ago

25 comments

dooglius almost 6 years ago

This is only the toy version of the actual problems the Allies solved, which were more nuanced and involved reasoning about the tank manufacturing pipeline. The write-up [0] doesn't go into the math but makes for an interesting read.

[0] https://sci-hub.tw/10.2307/2280189

tzury almost 6 years ago

More about the frequentist and Bayesian analyses can be found here: https://en.wikipedia.org/wiki/German_tank_problem

Matter of fact...

    According to conventional Allied intelligence estimates, the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the formula below to the serial numbers of captured tanks, the number was calculated to be 246 a month. After the war, captured German production figures from the ministry of Albert Speer showed the actual number to be 245.

jackfoxy almost 6 years ago

How ironic that the nation that led the world at the frontiers of mathematics in the 19th century completely missed the boat on the applied math of signals intelligence in WWII. I'm referring to the tank serial numbers and the lack of care with Enigma codes, except by the Kriegsmarine, but even they eventually lost a code book to the Allies, which they apparently considered an impossibility.

laGrenouille almost 6 years ago

Interesting article, though I think it incorrectly leaves the reader thinking that there is some interesting information hidden in the average spacing of the numbers. In fact, all you need to know is the maximum observation and the number of observations. Once you simplify, the average spacing goes away.

If M is the maximum serial number and N is the total number of observations, then using the formula in the post:

    M + (avg. spacing) = M + (M - N) / N = M + M / N - 1 = (N + 1) / N * M - 1

To me that gives a clearer picture of what the unbiased estimator is doing: inflate the maximum value by a factor that tends to one as the sample size grows (and then subtract one).
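
A minimal Python sketch of the point above. The function names are illustrative (they are not from the article) and the four serial numbers are a made-up sample. It computes the estimate once from the individual gaps and once from only the maximum and the count; the two agree because the gaps always sum to M - N.

```python
from fractions import Fraction


def estimate_from_gaps(serials):
    """Maximum plus the average gap between consecutive observed serials."""
    ordered = sorted(serials)
    # Gap before the first serial, then the gaps between neighbours.
    gaps = [ordered[0] - 1] + [b - a - 1 for a, b in zip(ordered, ordered[1:])]
    return ordered[-1] + Fraction(sum(gaps), len(gaps))


def estimate_from_max(serials):
    """Same estimate using only the maximum M and the sample size N."""
    m, n = max(serials), len(serials)
    return Fraction(n + 1, n) * m - 1


sample = [19, 40, 42, 60]  # hypothetical captured serial numbers
print(estimate_from_gaps(sample))  # 74
print(estimate_from_max(sample))   # 74
```

Exact fractions are used so that the equality holds without floating-point rounding.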

mhh__ almost 6 years ago

For anyone else interested in WW2 reverse engineering, design, etc.: https://www.youtube.com/watch?v=GJCF-Ufapu8 "The Secret War" is a huge documentary covering British efforts to counter German electronic warfare and V-weapons.

spectramax almost 6 years ago

Why didn't they use randomized and scrambled serial numbers? Sort of like what Amazon does with its order numbers. I know it can still be cracked, but serially numbering military equipment is not very smart. I was setting up a Shopify store the other day and it doesn't allow a lookup table to be used for order numbers. I don't want competitors to know how many X items I've sold. Same thing with Squarespace and Square e-commerce stores. It blows my mind that a multi-billion dollar ecom giant has not implemented this despite forum posts and requests from users.
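
One way to hide a sequence without a lookup table is a reversible scrambling of the internal counter. The Python sketch below is a toy under stated assumptions, not what Amazon or Shopify does: the constants and function names are made up for illustration, and an affine map like this can be recovered from a handful of known pairs, so it only defeats casual counting.

```python
MODULUS = 2 ** 32
MULTIPLIER = 2654435761   # hypothetical odd constant, so it is invertible mod 2**32
XOR_MASK = 0x5BD1E995     # hypothetical mask, kept below 2**32
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)


def scramble(order_id):
    """Map a sequential id in [0, 2**32) to a scattered public id."""
    return ((order_id * MULTIPLIER) % MODULUS) ^ XOR_MASK


def unscramble(public_id):
    """Recover the sequential id from the public id."""
    return ((public_id ^ XOR_MASK) * INVERSE) % MODULUS


for order_id in (1, 2, 3):
    public_id = scramble(order_id)
    print(order_id, public_id, unscramble(public_id))
```

Because the map is a bijection on the 32-bit range, no two orders collide and the store can still recover the internal counter.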

srean almost 6 years ago

The job interview version: if you are being interviewed for a position by engineers who have their (serially allocated) employee ids on their badges, estimate the number of employees from those ids, assuming all engineers are equally likely to be on the panel of 8.

wcoenen almost 6 years ago

See also the Doomsday argument: https://en.wikipedia.org/wiki/Doomsday_argument

Nomentatus almost 6 years ago

This all seems to assume the tank serial numbers would be captured at one moment in time ("captured 15 of these tanks uniformly at random"). But in fact the tank shells dribble in over time, which biases the gap: the gaps at the highest numbers are going to be greater, because earlier tanks have had many more chances to be destroyed or captured. So using the average gap is clearly not going to give the best estimate. If you restrict yourself to tanks from the latest large battle, that will cancel out the dribble effect, though.
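
The effect described above can be explored with a small simulation. This is a toy model under assumptions of my own, not anything from the article: each tank is captured independently, and in the skewed case the capture probability falls linearly with the serial number (a hypothetical stand-in for "older tanks have been exposed longer"). The function names and rates are illustrative.

```python
import random


def average_estimate(capture_probability, total=1000, trials=1000, seed=0):
    """Average of M + M/N - 1 over many simulated capture histories."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        captured = [s for s in range(1, total + 1)
                    if rng.random() < capture_probability(s, total)]
        if not captured:
            continue  # nothing captured in this run, so no estimate
        m, n = max(captured), len(captured)
        estimates.append(m + m / n - 1)
    return sum(estimates) / len(estimates)


def uniform(serial, total):
    # Every tank has the same chance of being captured.
    return 0.0165


def older_more_exposed(serial, total):
    # Hypothetical: capture chance falls linearly from about 3% to 0.3%.
    return 0.03 * (1 - 0.9 * serial / total)


print("uniform capture:    ", average_estimate(uniform))
print("older tanks exposed:", average_estimate(older_more_exposed))
```

In this toy setup the skewed capture model pulls the average estimate noticeably below the true total of 1000, while the uniform model stays close to it.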

nevir almost 6 years ago

FWIW, this is part of why Amazon's product identifiers (ASINs) are obfuscated the way they are.

kevingrahl almost 6 years ago

Since no one else has commented on it yet, I just wanted to say that I like the simple layout OP is using.

Not much clutter and straight to the point. It loads fast and it's under 630KB.

It could certainly be improved, but it's nice not having to load >25MB just to read an article.

d-- almost 6 years ago

This is also a good (applied, with simple code) example of the use of probabilistic programming. I can't get myself to read full books, but somehow this simple example gave me some intuition and additional pointers to follow.

dang almost 6 years ago

Related from 2016: https://news.ycombinator.com/item?id=13095178

2015: https://news.ycombinator.com/item?id=10517882

2009: https://news.ycombinator.com/item?id=670065

squeakynick almost 6 years ago

It depends on how you intend to 'score' the estimate.

Are you looking for the answer that is the 'most likely', or one that has the 'lowest mean squared error', or maybe one that is 'unbiased' (zero mean error)?

http://datagenetics.com/blog/march22014/index.html
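
To make the scoring question concrete, here is a hedged Monte Carlo sketch in Python. It compares two candidates under two of the criteria: the sample maximum M (the maximum-likelihood answer, which can never overshoot) and the estimator M + M/N - 1. The population size, sample size, trial count, and function name are arbitrary choices for illustration.

```python
import random


def compare_estimators(total=1000, sample_size=15, trials=20000, seed=0):
    """Return {name: (mean error, mean squared error)} for two estimators."""
    rng = random.Random(seed)
    population = range(1, total + 1)
    errors = {"max": [], "unbiased": []}
    for _ in range(trials):
        # Draw serial numbers uniformly at random without replacement.
        m = max(rng.sample(population, sample_size))
        errors["max"].append(m - total)
        errors["unbiased"].append(m + m / sample_size - 1 - total)
    summary = {}
    for name, errs in errors.items():
        bias = sum(errs) / len(errs)
        mse = sum(e * e for e in errs) / len(errs)
        summary[name] = (bias, mse)
    return summary


for name, (bias, mse) in compare_estimators().items():
    print(f"{name:>8}: mean error {bias:8.2f}, mean squared error {mse:10.1f}")
```

Reading the two columns separately shows how the ranking of estimators depends on which score you care about.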

slyu almost 6 years ago

I recommend Think Bayes by Allen Downey if you want to study more. It's a free book available online: http://www.greenteapress.com/thinkbayes/thinkbayes.pdf

RickJWagner almost 6 years ago

"the Germans, being Germans, had numbered their parts in the order they rolled off the production line"

Probably in today's world this is racist or nationalist or something. But (as someone of German descent) I have to admit it's funny.

ngneer almost 6 years ago

I remember studying this problem in the context of anonymity a few years back, defining immeasurability as the property whereby an adversary cannot distinguish between different node counts, for example. The tank problem is related to mark-recapture techniques for animal population size estimation. Shameless plug: http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2011/MSC/MSC-2011-06.pdf

debbiedowner almost 6 years ago

Nice write-up.

One slightly funny thing, though: note how num_tanks ~ Unif(max(captured), 2000) was defined, so you already have p[ parameter | data ]. Isn't this already a posterior?

I do get, however, that if you had the r.v.s num_tanks ~ Unif(M, 2000) and observed | num_tanks ~ Unif(1, num_tanks), with M some constant, you could find a posterior distribution num_tanks | vector<observed> by first finding the joint via E[ 1[num_tanks < t] P[observed | num_tanks] ]
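
A small grid computation may help with this question. The Python sketch below assumes the model in the comment, observed | num_tanks ~ Unif(1, num_tanks) with independent draws, so the likelihood of k observations is num_tanks^(-k) when num_tanks is at least the largest observation and zero otherwise. The captured serials and the function name are hypothetical. It compares a prior that starts at 1 with one that starts at max(captured).

```python
def posterior(captured, lower, upper=2000):
    """Grid posterior over num_tanks under a uniform prior on [lower, upper]."""
    k, m = len(captured), max(captured)
    # Likelihood: n**-k if n could have produced the largest serial, else 0.
    weights = {n: (n ** -k if n >= m else 0.0) for n in range(lower, upper + 1)}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}


captured = [10, 256, 202, 97]  # hypothetical observed serial numbers
wide = posterior(captured, lower=1)
narrow = posterior(captured, lower=max(captured))

for name, post in (("prior from 1", wide), ("prior from max", narrow)):
    mean = sum(n * p for n, p in post.items())
    print(f"{name:>14}: posterior mean {mean:.1f}")
```

Under these assumptions the likelihood already rules out every value below the largest observation, so starting the prior at max(captured) does not change the resulting posterior.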

joker3 almost 6 years ago

Given the praise for Bayesian methods here, I'm surprised the author didn't discuss the Bayesian solution. See http://isaacslavitt.com/2015/12/19/german-tank-problem-with-pymc-and-pystan/ for a similar exposition.

coldcode almost 6 years ago

More impressive than using modern tools is that people in WW2 figured this out and modeled it on paper using slide rules.

ngcc_hk almost 6 years ago

Very interesting. Especially the three links, and in particular https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

salty_biscuits almost 6 years ago

Why call it probabilistic programming? It is Bayesian inference with MCMC (or am I missing something?).

matchagaucho almost 6 years ago

Moral of the story: Don't auto-increment serial numbers :-)

jaimex2 almost 6 years ago

Was there only one tank factory?

mruts almost 6 years ago

Seeing that it's a uniform distribution, let's start out by assuming our sample mean (the average serial number we find) has the same distribution as the true mean (the actual number of tanks in existence). If this is true, then:

    2 x mean

should be an unbiased estimator of the true mean. But because we are probably undersampling the extremes, we could use the Bessel correction:

    1/(n-1) x summation_{i=1}^n(sample_i)

I would guess this comes out to a better estimate than what the article says.

Bessel's correction might be a bit of overkill, since it's intended to work with normal distributions. But I still suspect it comes out to a better estimate than what the blog post says.
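
The guess above can be checked exactly on a small population by enumerating every possible sample. This Python sketch makes two assumptions worth flagging: it uses 2 x mean - 1 rather than 2 x mean (the mean of the serials 1..T is (T + 1) / 2, so the -1 is needed for an unbiased estimate of T), and it leaves out Bessel's correction, which is normally applied to sample variances rather than to means. The function names and the population of 20 tanks are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def exact_mean_and_variance(estimator, total=20, sample_size=4):
    """Exact mean and variance of an estimator over all equally likely samples."""
    values = [estimator(s)
              for s in combinations(range(1, total + 1), sample_size)]
    mean = sum(values, Fraction(0)) / len(values)
    variance = sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)
    return mean, variance


def twice_mean(sample):
    # Estimator based on the sample mean: 2 * mean - 1.
    return Fraction(2 * sum(sample), len(sample)) - 1


def max_plus_gap(sample):
    # Estimator from the article: M + M/N - 1.
    m, k = max(sample), len(sample)
    return m + Fraction(m, k) - 1


for estimator in (twice_mean, max_plus_gap):
    mean, variance = exact_mean_and_variance(estimator)
    print(estimator.__name__, "mean:", mean, "variance:", variance)
```

Both estimators average to the true total in this enumeration, but the one based on the maximum has the smaller variance, so the mean-based version is not the better estimate here.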
