15 April 2021
Incremental Learning of Nonparametric Bayesian Mixture Models
by Gomes, Welling, Perona (CVPR 2008)
Research Questions
- How to perform online/streaming inference in Dirichlet process mixture model?
Conceptual Notes
- Challenge: In theory, all partitions are valid hypotheses (although
some partitions are more likely than others). Why prevents each clumps from
containing exactly one point?
- Answer: Unanswered. One relevant answer is they hard assign each data point
to one cluster (equation 10) similarly to the Local MAP approximation:
\[\argmax_k q(z_s = k)\]
- Challenge: How do we extract clump constraints without explicitly computing
all hypothesized partitions?
- Answer: Hypothesized partitions have redundancy in that they likely differ from each
other only for a subset of points. “Our approach is to partition the clustering problem”
into a series of independent subproblems […] This forms a tree of possible groupings
of data and the bottom level of this tree defines our clump constraints.” I don’t
understand this. For instance, in the below example, the tree seems to presume that
all hypotheses can be split between the left 7 points and the right 5 points; what
if no such cut exists?
- Challenge: Relatedly, if not all partitions are valid hypotheses, then what how
do we ensure unions of clumps yield a valid hypothesis? For instance,
suppose we have clumps {1, 2}, {3}, {4} that arise from hypothesis
{1, 2, 3}, {4} and hypothesis {1, 2, 4} {3}. But hypothesis {1, 2, 3, 4} is
not a valid hypothesis.
- Challenge: The space of hypotheses, constructed from clump unions,
is still combinatorically large. Do we actually get a benefit?
Algorithm Specifics
The algorithm is called the Memory Bound Variational Dirichlet Process (MB-VDP).
- Draw data in batches (called epochs) of size E data points
- Compute best estimated mixture model using Blei and Jordan (2006)
truncated variational Bayes model
- Use a modified version from Kurihara 2007 that starts with one component
and splits so long as ELBO / free energy increases
- Perform variational optimization subject to clump constraints (how this constraint
is imposed in the optimization is not clear)
- Compression phase: compute clump constraints in top-down manner
- Data points in same clump are used to update clump sufficient statistics
- Unless too few points are in a clump to compute sufficient statistics (e.g. covariance).
In this case, we save the individual data points (termed singlets)
- Stop when too much time / space (number of clump constraints) used
tags: mixture-models - dirichlet-process - bayesian-nonparametrics - clustering - streaming - online