TechEcho
The German Tank Problem

422 points by eadan almost 6 years ago

25 comments

dooglius almost 6 years ago
This is only the toy version of the actual problems solved by the Allies, which were more nuanced and involved reasoning about the tank manufacturing pipeline. The write-up [0] doesn't go into the math but makes for an interesting read.<p>[0] <a href="https://sci-hub.tw/10.2307/2280189" rel="nofollow">https://sci-hub.tw/10.2307/2280189</a>
tzury almost 6 years ago
More about frequentist and Bayesian analysis can be found here:<p><a href="https://en.wikipedia.org/wiki/German_tank_problem" rel="nofollow">https://en.wikipedia.org/wiki/German_tank_problem</a><p>As a matter of fact...<p><pre><code> According to conventional Allied intelligence estimates, the Germans were producing around 1,400 tanks a month between June 1940 and September 1942. Applying the formula below to the serial numbers of captured tanks, the number was calculated to be 246 a month. After the war, captured German production figures from the ministry of Albert Speer showed the actual number to be 245.</code></pre>
jackfoxy almost 6 years ago
How ironic that the nation that led the world at the frontiers of mathematics in the 19th century completely missed the boat on the applied math of signals intelligence in WWII. I'm referring to the tank serial numbers and the lack of care with Enigma codes, except by the Kriegsmarine, but even they eventually lost a code book to the Allies, which they apparently considered an impossibility.
laGrenouille almost 6 years ago
Interesting article, though I think it incorrectly leaves the reader thinking that there is some interesting information hidden in the average spacing of the numbers. In fact, all you need to know is the maximum observation and the number of observations. Once you simplify, the average spacing goes away.<p>If M is the maximum serial number and N is the total number of observations, using the formula in the post:<p><pre><code> M + (avg. spacing) = M + M / N - 1 = (N + 1) / N * M - 1 </code></pre> To me that gives a clearer picture of what the unbiased estimator is doing: inflate the maximum value by a factor that tends toward one as the sample size grows (then subtract one).
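The two forms can be checked against each other with exact arithmetic. A minimal sketch, assuming serial numbers start at 1 and are observed without replacement; the function names and the sample serials are illustrative and do not come from the article:

```python
from fractions import Fraction

def estimate_via_gap(serials):
    """Maximum plus the average gap between observed serial numbers."""
    m, k = max(serials), len(serials)
    # k observations leave m - k unobserved serials at or below the maximum,
    # so the average gap is (m - k) / k = m / k - 1.
    return m + Fraction(m - k, k)

def estimate_closed_form(serials):
    """Equivalent form that needs only the maximum and the sample size."""
    m, k = max(serials), len(serials)
    return Fraction(k + 1, k) * m - 1

sample = [19, 40, 42, 60]  # hypothetical captured serial numbers
print(estimate_via_gap(sample), estimate_closed_form(sample))  # prints: 74 74
```

Using `Fraction` avoids floating-point rounding, so the two expressions compare as exactly equal rather than approximately equal.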
mhh__ almost 6 years ago
For anyone else interested in WW2 reverse engineering, design, etc., <a href="https://www.youtube.com/watch?v=GJCF-Ufapu8" rel="nofollow">https://www.youtube.com/watch?v=GJCF-Ufapu8</a> "The Secret War" is a huge documentary covering British efforts to counter German electronic warfare and V-weapons.
spectramax almost 6 years ago
Why didn't they use randomized and scrambled serial numbers? Sort of like what Amazon does with its order numbers. I know it can still be cracked, but serially numbering military equipment is not very smart. I was setting up a Shopify store the other day and it doesn't allow a lookup table to be used for order numbers. I don't want competitors to know how many of item X I've sold. Same thing with Squarespace and Square e-commerce stores. It blows my mind that a multi-billion dollar e-commerce giant has not implemented this despite forum posts and requests from users.
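One way to picture "scrambled" order numbers without a lookup table is a reversible mapping from the internal counter to a public ID. The sketch below is only an illustration under stated assumptions: the names `to_public` and `to_counter` and all constants are hypothetical, it is not how Amazon or Shopify generate IDs, and a linear map like this is not cryptographically secure (someone holding a few IDs with known counters could recover the constants; a keyed block cipher would be the sturdier choice).

```python
M = 2 ** 32            # size of the ID space (assumed)
A = 2654435761         # any odd multiplier is invertible modulo 2**32
A_INV = pow(A, -1, M)  # modular inverse, needs Python 3.8+
MASK = 0x5BD1E995      # arbitrary secret constant (hypothetical)

def to_public(counter: int) -> int:
    """Map an internal sequential counter to a scattered public ID."""
    if not 0 <= counter < M:
        raise ValueError("counter out of range")
    return ((counter * A) % M) ^ MASK

def to_counter(public_id: int) -> int:
    """Invert to_public to recover the internal counter."""
    return ((public_id ^ MASK) * A_INV) % M

# Consecutive counters no longer produce consecutive public IDs.
for n in (1, 2, 3):
    print(n, to_public(n))
```

Because the mapping is a bijection on the ID space, no two counters collide, and the store can keep auto-incrementing internally while exposing only the scattered values.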
srean almost 6 years ago
The job interview version: if you are being interviewed for a position by engineers who have their (serially allocated) employee IDs on their badges, estimate the number of employees from those IDs, assuming all engineers are equally likely to be on the panel of 8.
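A possible answer to the interview version, as a sketch: a point estimate from the largest badge number plus a one-sided upper bound. It assumes IDs start at 1, nobody has left the company (no gaps), and the panel is a uniform sample without replacement; the badge numbers and function names are made up for illustration.

```python
from math import comb

def point_estimate(ids):
    """Inflate the largest ID by (k + 1) / k and subtract one."""
    m, k = max(ids), len(ids)
    return m * (k + 1) / k - 1

def upper_bound(ids, alpha=0.05):
    """Largest headcount N for which a maximum this small still has
    probability at least alpha."""
    m, k = max(ids), len(ids)
    n = m
    # P(max <= m | N) = C(m, k) / C(N, k), which shrinks as N grows.
    while comb(m, k) / comb(n + 1, k) >= alpha:
        n += 1
    return n

badges = [112, 245, 388, 403, 517, 640, 702, 815]  # hypothetical panel of 8
print(point_estimate(badges), upper_bound(badges))
```

With only 8 observations the upper bound sits well above the point estimate, which is a reminder of how wide the uncertainty is for small panels.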
wcoenen almost 6 years ago
See also the Doomsday argument.<p><a href="https://en.wikipedia.org/wiki/Doomsday_argument" rel="nofollow">https://en.wikipedia.org/wiki/Doomsday_argument</a>
Nomentatus almost 6 years ago
This all seems to assume the tank serial numbers would be captured at one moment in time ("captured 15 of these tanks uniformly at random."). But in fact the tank shells dribble in over time, which biases the gap: the gaps at the highest numbers are going to be greater. Earlier tanks have had many more chances to be destroyed or captured. So using the average gap is clearly not going to give the best estimate. If you restrict yourself to tanks from the latest large battle, that will cancel out the dribble effect, though.
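The size of this effect can be explored with a small simulation. The sketch below assumes, purely for illustration, that a tank's chance of being captured is proportional to its age (older, lower serials have been in the field longer); that weighting, the population size and the function names are assumptions, not something taken from the article.

```python
import heapq
import random

def estimate(serials):
    """Maximum-based estimate: m * (k + 1) / k - 1."""
    m, k = max(serials), len(serials)
    return m * (k + 1) / k - 1

def capture(n_tanks, k, rng, age_weighted):
    """Draw k distinct serials, optionally favouring older (lower) serials."""
    if not age_weighted:
        return rng.sample(range(1, n_tanks + 1), k)
    # Efraimidis-Spirakis weighted sampling without replacement:
    # weight = assumed time in the field = n_tanks - serial + 1.
    keyed = ((rng.random() ** (1.0 / (n_tanks - s + 1)), s)
             for s in range(1, n_tanks + 1))
    return [s for _, s in heapq.nlargest(k, keyed)]

def mean_estimate(n_tanks, k, trials, age_weighted, seed=0):
    """Average the estimate over many simulated capture rounds."""
    rng = random.Random(seed)
    total = sum(estimate(capture(n_tanks, k, rng, age_weighted))
                for _ in range(trials))
    return total / trials

print("uniform capture:     ", mean_estimate(1000, 15, 500, age_weighted=False))
print("age-weighted capture:", mean_estimate(1000, 15, 500, age_weighted=True))
```

Under this assumed weighting the age-weighted average should land clearly below the true total of 1000, consistent with the point that the newest serials are under-represented.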
nevir almost 6 years ago
FWIW, this is part of why Amazon's product identifiers (ASINs) are obfuscated the way they are.
kevingrahl almost 6 years ago
Since no one else has commented on this yet, I just wanted to say that I like the simple layout OP is using.<p>Not much clutter and straight to the point. Loads fast and it’s under 630KB.<p>Could certainly be improved, but it’s nice not having to load more than 25MB just to read an article.
d-- almost 6 years ago
This is also a good (applied, with simple code) example of the use of probabilistic programming. I can't bring myself to read full books, but somehow this simple example gave me some intuition and additional pointers to follow.
dang almost 6 years ago
Related from 2016: <a href="https://news.ycombinator.com/item?id=13095178" rel="nofollow">https://news.ycombinator.com/item?id=13095178</a><p>2015: <a href="https://news.ycombinator.com/item?id=10517882" rel="nofollow">https://news.ycombinator.com/item?id=10517882</a><p>2009: <a href="https://news.ycombinator.com/item?id=670065" rel="nofollow">https://news.ycombinator.com/item?id=670065</a>
squeakynick almost 6 years ago
It depends on how you intend to 'score' the estimate.<p>Are you looking for the answer that is the 'most likely', or one that has the 'lowest squared error', or maybe one that is 'unbiased' (zero mean error)?<p><a href="http://datagenetics.com/blog/march22014/index.html" rel="nofollow">http://datagenetics.com/blog/march22014/index.html</a>
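How much the choice of score matters can be seen by simulating two of these estimators. A minimal sketch, assuming serials 1..n_tanks are captured uniformly without replacement; the population size, sample size and names are illustrative. The sample maximum is the most likely value, since any smaller total is impossible and any larger one makes the observed sample less probable.

```python
import random

def simulate(n_tanks=500, k=10, trials=5000, seed=1):
    """Return {estimator name: (bias, mean squared error)}."""
    rng = random.Random(seed)
    estimators = {
        "max (most likely)": lambda m: m,
        "unbiased": lambda m: m * (k + 1) / k - 1,
    }
    errors = {name: [] for name in estimators}
    for _ in range(trials):
        m = max(rng.sample(range(1, n_tanks + 1), k))
        for name, f in estimators.items():
            errors[name].append(f(m) - n_tanks)
    return {name: (sum(e) / trials, sum(x * x for x in e) / trials)
            for name, e in errors.items()}

for name, (bias, mse) in simulate().items():
    print(f"{name:>18}: bias={bias:8.2f}  mse={mse:10.1f}")
```

In this setup the most likely value always errs on the low side, while the inflated maximum has bias near zero and, for these parameters, a smaller mean squared error as well.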
slyu almost 6 years ago
I recommend Think Bayes by Allen Downey if you want to study more. It's a free book available online. <a href="http://www.greenteapress.com/thinkbayes/thinkbayes.pdf" rel="nofollow">http://www.greenteapress.com/thinkbayes/thinkbayes.pdf</a>
RickJWagner almost 6 years ago
"the Germans, being Germans, had numbered their parts in the order they rolled off the production line"<p>Probably in today's world this is racist or nationalist or something. But (as someone of German descent) I have to admit it's funny.
ngneer almost 6 years ago
I remember studying this problem in the context of anonymity a few years back, defining immeasurability as the property whereby an adversary cannot distinguish between different node counts, for example. The tank problem is related to mark-recapture techniques for estimating animal population sizes. Shameless plug:<p><a href="http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2011/MSC/MSC-2011-06.pdf" rel="nofollow">http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2...</a>
debbiedowner almost 6 years ago
Nice write-up.<p>One slightly funny thing, though: note how num_tanks ~ Unif(max(captured),2000) was defined, so you already have p[ parameter | data ]. Isn't this already a posterior?<p>I do get, however, that if you had the r.v.s num_tanks ~ Unif(M,2000), observed | num_tanks ~ Unif(1,num_tanks), with M some constant, you could find a posterior distribution num_tanks | vector&lt;observed&gt; by first finding the joint via E[ 1[num_tanks &lt; t]P[observed | num_tanks] ]
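One way to look at the question: if each observation is Unif(1, num_tanks), every value of num_tanks below max(captured) has zero likelihood, so a uniform prior that starts at 1 and one that starts at max(captured) should lead to the same posterior. A grid-based sketch under that assumption; the observed serials and the function name are hypothetical, and 2000 is the upper limit quoted in the comment:

```python
def posterior(observed, lower, upper=2000):
    """Posterior over num_tanks for a uniform prior on lower..upper."""
    m, k = max(observed), len(observed)
    weights = {}
    for n in range(lower, upper + 1):
        # Each observation ~ Unif(1, n): likelihood is n**-k if n >= m, else 0.
        weights[n] = float(n) ** -k if n >= m else 0.0
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items() if w > 0.0}

observed = [10, 256, 202, 97]  # hypothetical captured serials
from_one = posterior(observed, lower=1)
from_max = posterior(observed, lower=max(observed))

# Both priors give the same posterior support and probabilities.
print(from_one.keys() == from_max.keys())
print("posterior mean:", sum(n * p for n, p in from_one.items()))
```

Seen this way, putting max(captured) into the prior is a shortcut that skips values the likelihood would rule out anyway, rather than a second use of the data.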
joker3 almost 6 years ago
Given the praise for Bayesian methods here, I'm surprised the author didn't discuss the Bayesian solution. See <a href="http://isaacslavitt.com/2015/12/19/german-tank-problem-with-pymc-and-pystan/" rel="nofollow">http://isaacslavitt.com/2015/12/19/german-tank-problem-with-...</a> for a similar exposition.
coldcode almost 6 years ago
More impressive than using modern tools is that people in WW2 figured this out and modeled it on paper using slide rules.
ngcc_hk almost 6 years ago
Very interesting, especially the three links, and in particular:<p><a href="https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers" rel="nofollow">https://github.com/CamDavidsonPilon/Probabilistic-Programmin...</a>
salty_biscuits almost 6 years ago
Why call it probabilistic programming? It is Bayesian inference with MCMC (or am I missing something?).
matchagaucho almost 6 years ago
Moral of the story: Don't auto-increment serial numbers :-)
jaimex2 almost 6 years ago
Was there only one tank factory?
mruts almost 6 years ago
Seeing that it's a uniform distribution, let's start out by assuming our sample mean (the average serial number we find) has the same distribution as the true mean (the actual number of tanks in existence). If this is true, then:<p>2 x mean<p>should be an unbiased estimator of the true mean. But because we are probably undersampling the extremes, we could use the Bessel correction:<p>1/(n-1) x summation_{i=1}^n(sample_i)<p>I would guess this comes out to a better estimate than what the article says.<p>Bessel's correction might be a bit of overkill, since it's intended to work with normal distributions. But I still suspect it comes out to a better estimate than what the blog post says.
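Whether doubling the sample mean beats the maximum-based estimator can be checked by simulation. A minimal sketch, assuming serials 1..n_tanks are captured uniformly without replacement; the population size, sample size and function name are illustrative. The doubled mean is written as 2 * mean - 1 so that it is unbiased when serials start at 1.

```python
import random
import statistics

def compare(n_tanks=500, k=10, trials=5000, seed=2):
    """Return {estimator name: (average estimate, standard deviation)}."""
    rng = random.Random(seed)
    from_mean, from_max = [], []
    for _ in range(trials):
        s = rng.sample(range(1, n_tanks + 1), k)
        from_mean.append(2 * statistics.mean(s) - 1)  # doubles the sample mean
        from_max.append(max(s) * (k + 1) / k - 1)     # inflates the sample maximum
    return {
        "2 * mean - 1": (statistics.mean(from_mean), statistics.pstdev(from_mean)),
        "max * (k+1)/k - 1": (statistics.mean(from_max), statistics.pstdev(from_max)),
    }

for name, (avg, spread) in compare().items():
    print(f"{name:>18}: average={avg:7.1f}  std dev={spread:6.1f}")
```

Both estimators should centre near the true total of 500, but the spread of the mean-based one is noticeably larger in this setup, which is the usual argument for preferring the maximum-based form.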