Rylan Schaeffer

Logo
Resume
Research
Learning
Blog
Teaching
Jokes
Kernel Papers


Coupon Collecting Problem

Suppose items have \(N\) unique possible type (e.g. coupons), and every time we obtain an item, its type is uniformly random.

Question 1: What is the expected number of items we need to collect in order to obtain at least 1 item of each type?

Define \(X_n\) as the number of items by which we obtain the \(n\)-th type.

First, note that \(\mathbb{E}[X_1] = 1\). Second, note that \(\mathbb{E}[X_2 - X_1] = \frac{1}{1 - 1/N}\), because the probability that we obtain a new type is 1 minus the probability we obtain the type we already have, and thus the expected number of required items is 1 over that probability. Third, note that \(\mathbb{E}[X_n - X_{n-1}] = \frac{1}{1 - (n-1)/N)\), for the same reason as above.

We want to know:

\[\begin{align*} \mathbb{E}[X_N] &= \mathbb{E}[X_N - X_{N-1} + X_{N-1} ... + X_2 - X_1 + X_1]\\ &= 1 + \sum_{n=2}^N \mathbb{E}[X_n - X_{n-1}\\ &= 1 + \sum_{n=1}^{N-1} \frac{1}{1 - n/N)\\ &\approx 1 + N \sum_{n=1}^{N-1} \frac{1}{n}\\ &= \Theta (N \log N) \end{align*}\]

Question 2: What is the probability we haven’t seen all \(N\) types after \(2 N \log N\) boxes?

Per Markov’s Inequality:

\[\mathbb{P}[X \geq 2 N \log N] \leq \frac{\mathbb{E}[X]}{2 N \log N} = \frac{N \log N}{2 N \log N} = \frac{1}{2}\]

Recall that for any geometric random variable \(Y\), \(\mathbb{V}[Y] = \frac{1-p}{p^2}\). Thus, the variance of \(X\) is:

\[\mathbb{V}[X] = \sum_{n=0}^{N-1} \mathbb{V}[X_n] = \sum \frac{1 - \frac{N - n}{N}}{(\frac{N-1}{N})^2} < N^2 \sum_{j=1}^N \frac{1}{j^2} < N^2 \frac{\pi^2}{6}\]

Per Chebyshev’s Inequality:

\[\mathbb{P}[X \geq 2 N \log N] \leq \mathbb{V}[|X - \mathbb{E}[X]]| \geq N \log N + O(N)] \leq \frac{\mathbb{V}[X]}{N^2 \log^2 N} = O(\frac{1}{\log^2 N})\]