Generalizing the German Tank Problem
The German Tank Problem dates back to World War II, when the Allies used a statistical approach to estimate the number of enemy tanks produced or in the field from the serial numbers observed after battles. Assuming that the tanks are labeled consecutively starting from 1, if we observe k tanks out of a total of N tanks, with the largest observed serial number being m, then the best estimate for N is m(1 + 1/k) - 1. We explore many generalizations. We first look at the discrete and continuous one-dimensional cases. We explore different estimators, such as ones based on the Lth largest tank, and, motivated by portfolio theory, study a weighted average; however, the original formula performs best. We then generalize the problem to two dimensions, with pairs instead of points, studying the discrete and continuous square and circle variants. Complications arise from curvature and from the fact that not every number is representable as a sum of two squares. We often concentrate on the large N limit. For the discrete and continuous square, we test various statistics and find that the largest observed component does best; the scaling factor in both cases is (2k+1)/(2k). The discrete circle case is especially involved because we have to use approximation formulas for the number of lattice points inside the circle. Interestingly, the scaling factors differ between the cases. Lastly, we generalize the problem to L-dimensional squares and circles. The discrete and continuous squares prove similar to the two-dimensional square problem. However, for the L-dimensional circle, we have to use formulas for the volume of the L-ball and approximate the number of lattice points inside it. The formulas for the discrete circle are particularly interesting, as they have no L dependence.
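As a small illustration of the one-dimensional formula m(1 + 1/k) - 1, the following Python sketch computes the estimate from a list of observed serial numbers and checks by simulation that its average is close to the true N. The function names, the sample serial numbers, and the simulation parameters are illustrative assumptions and are not taken from the paper; sampling is assumed to be uniform without replacement from 1..N.

```python
import random


def estimate_total(serials):
    """Estimate N from observed serial numbers using m * (1 + 1/k) - 1."""
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)  # largest observed serial number
    return m * (1 + 1 / k) - 1


def simulate_mean_estimate(n_total, k, trials, seed=0):
    """Average the estimate over repeated samples of k tanks.

    Hypothetical helper: each trial draws k distinct serial numbers
    uniformly from 1..n_total (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, n_total + 1), k)
        total += estimate_total(sample)
    return total / trials


if __name__ == "__main__":
    # Made-up observation: k = 4 tanks, maximum m = 60.
    print(estimate_total([19, 40, 42, 60]))  # 74.0
    # The average over many trials should be close to the true N = 1000.
    print(simulate_mean_estimate(1000, 5, 20000))
```

With the made-up observation above, k = 4 and m = 60, so the estimate is 60(1 + 1/4) - 1 = 74. The simulated average should land near the true N, although any single estimate can be far off when k is small.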