“All we ever see of stars are their old photographs.”
Parent: Stochastic Processes
The Stick-Breaking Process (SBP) is a stochastic process where each sample path is an infinite sequence of random variables π1,π2... such that each variable pii∈(0,1) and sum of the sequence ∑∞k=1π1 equals 1 with probability 1. The way the variables is generated is by sampling an i.i.d. sequence of vk∼Beta(α,β) and then defining
πK=vKK∏k=1(1−vk)This is the source of the name “stick breaking”: a stick with mass 1 is sequentially broken into smaller and smaller pieces.
If the stick-breaking weights vk Beta(α,β) with α=1, then the infinite random vector π=(π1,π2,...) is said to be distributed according to the so_called Griffiths-Engen-McCloskey distribution GEM(β).
The SBP is intimately related to the Dirichlet process. Let G∼DP(α,G0) be a random distribution. The distribution is defined as:
G=K∑k=1πkδθkwhere π1,π2,... is an infinite sequence of probability masses summing to 1 and θ1,θ2,... is an infinite sequence of elements sampled from the base distribution G0. From this perspective, one can see that an easy way to sample a random distribution from a DP is through the three-step stick-breaking construction:
This constructed G is distributed according to DP(α,G0), hence the name stick-breaking construction.
The SBP is closely related to the CRP. The difference is that the CRP integrates out the base measure G, meaning that the CRP cares only for how many elements have the same value, regardless of what that value (i.e. the location) is.