by Nott, Zhang, Yau and Jasra (Journal of Computational and Graphical Statistics 2014)
Previously, Wang and Dunson (2011) proposed a fast inference algorithm for Dirichlet process mixture models, fancifully called Sequential Updating and Greedy Search (SUGS), that could be applied in an online setting. The authors didn’t actually intend for their algorithm to be run online: to remove the dependence on observation order, they recommend averaging over random orderings of the observations. SUGS makes inference easy by taking the MAP of each latent cluster assignment’s posterior, deterministically placing the observation in a single cluster.
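As a concrete illustration, here is a minimal sketch of the greedy MAP step in Python; the weights and scores below are hypothetical, and we assume the log CRP prior weights and the log predictive likelihoods for each candidate cluster (including one empty cluster) have already been computed:

```python
import numpy as np

def sugs_map_assignment(log_prior, log_predictive):
    """Greedy SUGS step: score each candidate cluster by prior times
    predictive, then hard-assign the observation to the argmax."""
    scores = log_prior + log_predictive  # log posterior, up to a constant
    return int(np.argmax(scores))        # MAP cluster (may open a new one)

# Two occupied clusters with counts (3, 1) and alpha = 1, plus one empty slot.
log_prior = np.log(np.array([3.0, 1.0, 1.0]) / 5.0)
log_predictive = np.array([-2.1, -0.4, -1.5])  # illustrative values
k_star = sugs_map_assignment(log_prior, log_predictive)  # -> 1
```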
Suppose at time \(t\), we have an approximate posterior for the previous time step
\[p(z_{1:t-1}, \theta_{1:T} \,|\, x_{1:t-1}) \approx \prod_{t' < t} q_{t-1}(z_{t'}) \prod_{k=1}^{T} q_{t-1}(\theta_k)\]
where the \(\theta_k\) are the cluster parameters and \(T\) is the truncation level on the number of clusters. We can then update the previous time step’s variational posterior by assuming the current step has a similar form:
\[\prod_{t' \leq t} q_{t}(z_{t'}) \prod_{k=1}^{T} q_{t}(\theta_k)\]
For the new terms, \(q_t(z_t)\) and the updated \(q_t(\theta_k)\), first initialize \(q_t(z_t)\) as
\[q_t(z_t = k) \propto \underbrace{q_{tk}}_{\text{Var. Latent Prior}} \int \underbrace{q_{t-1}(\theta_k)}_{\text{Var. Param Prior}} \underbrace{p(x_t \,|\, \theta_k)}_{\text{Obs. Likelihood}} \, d\theta_k\]
where \(q_{tk}\) is the prior probability that observation \(t\) joins cluster \(k\), defined using the truncated CRP (with truncation level \(T\)):
$$q_{tk} = \frac{1}{\alpha + t - 1} \begin{cases} \sum_{t'< t} q_{t-1}(z_{t'} = k) + \alpha/T & \text{if } k \leq (t-1) \wedge T\\
\alpha \left(1 - \frac{(t-1) \wedge T}{T}\right) & \text{if } k = ((t-1) \wedge T) + 1 \end{cases}$$
Here \(a \wedge b\) denotes \(\min(a, b)\): the first case covers clusters that may already be occupied (their expected counts plus a share \(\alpha/T\) of the concentration mass), and the second case places the remaining prior mass on a single fresh cluster, so the weights sum to one by construction.
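To make the update concrete, here is a minimal sketch in Python, assuming a one-dimensional Gaussian likelihood with known variance \(\sigma^2\) and a conjugate normal variational posterior \(q_{t-1}(\theta_k) = \mathcal{N}(\mu_k, \tau_k^2)\), so the integral above has the closed form \(\mathcal{N}(x_t \,|\, \mu_k, \sigma^2 + \tau_k^2)\); all names here are illustrative:

```python
import numpy as np

def truncated_crp_prior(soft_counts, t, alpha, T):
    """q_{tk}: prior weight that observation t joins cluster k, where
    soft_counts[k] = sum over t' < t of q_{t-1}(z_{t'} = k)."""
    m = min(t - 1, T)                    # slots that can already be occupied
    w = np.zeros(T)
    w[:m] = soft_counts[:m] + alpha / T  # previously used clusters
    if m < T:
        w[m] = alpha * (1.0 - m / T)     # one fresh cluster gets the rest
    return w / (alpha + t - 1)

def assignment_posterior(x_t, soft_counts, mu, tau2, sigma2, t, alpha, T):
    """q_t(z_t = k) proportional to q_{tk} * N(x_t | mu[k], sigma2 + tau2[k])."""
    prior = truncated_crp_prior(soft_counts, t, alpha, T)
    var = sigma2 + tau2                  # Gaussian predictive variance
    log_pred = -0.5 * (np.log(2 * np.pi * var) + (x_t - mu) ** 2 / var)
    with np.errstate(divide="ignore"):   # log(0) -> -inf for unreachable slots
        logq = np.log(prior) + log_pred
    logq -= logq.max()                   # stabilize before exponentiating
    q = np.exp(logq)
    return q / q.sum()

# Example: third observation (t = 3) with truncation T = 4 and alpha = 1.
soft_counts = np.array([1.3, 0.7, 0.0, 0.0])  # from q_2(z_1) and q_2(z_2)
mu = np.array([0.0, 3.0, 0.0, 0.0])           # unused slots keep the prior
tau2 = np.array([0.5, 0.5, 2.0, 2.0])
q3 = assignment_posterior(0.2, soft_counts, mu, tau2,
                          sigma2=1.0, t=3, alpha=1.0, T=4)
```

Unlike SUGS, the full distribution \(q_t(z_t)\) is kept rather than collapsed to its argmax, so uncertainty about the assignment propagates into the parameter updates.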