Rylan Schaeffer

Kernel Papers

Coupon Collecting Problem

Suppose items have \(N\) unique possible type (e.g. coupons), and every time we obtain an item, its type is uniformly random.

Question 1: What is the expected number of items we need to collect in order to obtain at least 1 item of each type?

Define \(X_n\) as the number of items by which we obtain the \(n\)-th type.

First, note that \(\mathbb{E}[X_1] = 1\). Second, note that \(\mathbb{E}[X_2 - X_1] = \frac{1}{1 - 1/N}\), because the probability that we obtain a new type is 1 minus the probability we obtain the type we already have, and thus the expected number of required items is 1 over that probability. Third, note that \(\mathbb{E}[X_n - X_{n-1}] = \frac{1}{1 - (n-1)/N)\), for the same reason as above.

We want to know:

\[\begin{align*} \mathbb{E}[X_N] &= \mathbb{E}[X_N - X_{N-1} + X_{N-1} ... + X_2 - X_1 + X_1]\\ &= 1 + \sum_{n=2}^N \mathbb{E}[X_n - X_{n-1}\\ &= 1 + \sum_{n=1}^{N-1} \frac{1}{1 - n/N)\\ &\approx 1 + N \sum_{n=1}^{N-1} \frac{1}{n}\\ &= \Theta (N \log N) \end{align*}\]

Question 2: What is the probability we haven’t seen all \(N\) types after \(2 N \log N\) boxes?

Per Markov’s Inequality:

\[\mathbb{P}[X \geq 2 N \log N] \leq \frac{\mathbb{E}[X]}{2 N \log N} = \frac{N \log N}{2 N \log N} = \frac{1}{2}\]

Recall that for any geometric random variable \(Y\), \(\mathbb{V}[Y] = \frac{1-p}{p^2}\). Thus, the variance of \(X\) is:

\[\mathbb{V}[X] = \sum_{n=0}^{N-1} \mathbb{V}[X_n] = \sum \frac{1 - \frac{N - n}{N}}{(\frac{N-1}{N})^2} < N^2 \sum_{j=1}^N \frac{1}{j^2} < N^2 \frac{\pi^2}{6}\]

Per Chebyshev’s Inequality:

\[\mathbb{P}[X \geq 2 N \log N] \leq \mathbb{V}[|X - \mathbb{E}[X]]| \geq N \log N + O(N)] \leq \frac{\mathbb{V}[X]}{N^2 \log^2 N} = O(\frac{1}{\log^2 N})\]