Two concurrently released works, Ambrogioni (arXiv 2023) and Hoover et al. (arXiv 2023), showed that associative memory models and diffusion models can be linked.
The modern continuous Hopfield network has the energy function
\[E_{\beta}(x) := - \beta^{-1} \log \Big( \sum_n \exp \big( \beta \, x \cdot y_n \big) \Big) + \frac{1}{2} \lVert x \rVert_2^2,\]where \(\beta\) is an inverse temperature parameter, the \(y_n\) are the stored patterns, and additive constants in \(x\) are omitted.
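As a concrete reference, here is a minimal NumPy sketch of this energy; the stacked pattern matrix `Y` and the helper name `hopfield_energy` are our own illustrative choices:

```python
import numpy as np
from scipy.special import logsumexp

def hopfield_energy(x, Y, beta):
    """Modern continuous Hopfield energy.

    x    : (d,) query state
    Y    : (N, d) stored patterns y_n, one per row
    beta : inverse temperature
    """
    # -beta^{-1} log sum_n exp(beta * x . y_n) + ||x||^2 / 2
    return -logsumexp(beta * (Y @ x)) / beta + 0.5 * np.dot(x, x)
```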
We imagine we have a target distribution \(\phi(y)\) that is available only through a training dataset:
\[D = \{y_1, \dots, y_N\}.\]We inject noise that turns training samples into random noise, and we learn to invert this process so that random noise is turned back into new samples. In one diffusion model (the variance-exploding equation), Ambrogioni writes this noising process in reversed time, with noiseless data corresponding to the final time \(T\):
\[x(t - dt) = x(t) + \sigma \sqrt{dt} \, \delta(t),\]where \(\sigma\) is the standard deviation of the noise and \(\delta(t)\) is a standard normal random variable. If we initialize the state with the target distribution \(\phi(y)\), then the inverse (generative) equation is:
\[x(t + dt) = x(t) + \sigma^2 \nabla_x \log p_t(x(t)) \, dt + \sigma \sqrt{dt} \, \delta(t),\]where \(p_t(x)\) is the marginal distribution of the noise-injection process at time \(t\). For the variance-exploding equation, the marginal can be computed analytically:
\[p_t(x) = \mathbb{E}_{y \sim \phi(y)} \Big[ \frac{1}{(2 \pi (T - t) \sigma^2)^{d/2}} \exp\Big( -\frac{\lVert x - y \rVert_2^2}{2(T-t)\sigma^2} \Big) \Big],\]where \(d\) is the dimension of \(x\).
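For a finite, known dataset this marginal is a Gaussian mixture whose score is available in closed form, so the generative equation can be simulated directly. A minimal sketch (the 2-D memories, step size, and schedule below are our own toy choices):

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)

# Toy "dataset": three 2-D memories standing in for samples of phi(y).
Y = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
sigma, T, dt = 1.0, 1.0, 1e-3

def score(x, t):
    # Exact score of the Gaussian-mixture marginal p_t(x).
    var = (T - t) * sigma**2
    w = softmax(-np.sum((x - Y) ** 2, axis=1) / (2 * var))
    return (w @ Y - x) / var

# Generative pass: start from broad noise at t = 0 and integrate to t = T.
x, t = sigma * np.sqrt(T) * rng.standard_normal(2), 0.0
while t < T - dt:
    x = x + sigma**2 * score(x, t) * dt + sigma * np.sqrt(dt) * rng.standard_normal(2)
    t += dt
print(x)  # ends near one of the memories
```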
For real data, we may not know the score function \(\nabla_x \log p_t(x(t))\), but we can estimate it with a neural network \(s(x, t; \theta)\) trained on the denoising loss
\[\mathcal{L}(\theta) = \frac{1}{2} \mathbb{E}_{y \sim D, \, t} \Big[ \mathbb{E}_{x(t)|y} \Big[ \lVert \delta(t) - s(x(t), t; \theta) \rVert_2^2 \Big] \Big],\]where \(\delta(t) = x(t) - y\) is the total noise added to pattern \(y\) up to time \(t\).
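A minimal Monte Carlo sketch of this loss; the placeholder network `s` below just predicts zero and stands in for a real parameterized model:

```python
import numpy as np

rng = np.random.default_rng(0)

Y = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # training dataset D
sigma, T = 1.0, 1.0

def s(x, t):
    # Placeholder for the neural network s(x, t; theta).
    return np.zeros_like(x)

def dsm_loss(batch_size=1024):
    y = Y[rng.integers(len(Y), size=batch_size)]                   # y ~ D
    t = rng.uniform(0.0, T, size=(batch_size, 1))                  # random times
    delta = np.sqrt(T - t) * sigma * rng.standard_normal(y.shape)  # total noise, variance (T-t) sigma^2
    x_t = y + delta                                                # noisy state x(t)
    return 0.5 * np.mean(np.sum((delta - s(x_t, t)) ** 2, axis=1))

print(dsm_loss())
```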
The score can then be recovered from the trained network; with this definition of \(\delta(t)\),
\[\nabla_x \log p_t(x) \approx - \frac{s(x(t), t; \theta)}{(T-t)\sigma^2}.\]Define the diffusion model energy function as:
\[E(x, t) := -\sigma^2 \log p_t(x) = - \sigma^2 \log \mathbb{E}_{y \sim \phi(y)} \Big[ \exp\Big( -\frac{\lVert x - y \rVert_2^2}{2(T-t)\sigma^2} \Big) \Big] + c,\]where the constant \(c\) absorbs the Gaussian normalization. We replace the distribution \(\phi(y)\) with a set of memories \(\{y_1, \dots, y_N\}\) and define the energy function:
\[E(x, t) = - \sigma^2 \log \Big[ \sum_n \exp\Big( -\frac{\lVert x - y_n \rVert_2^2}{2(T-t)\sigma^2} \Big) \Big].\]Assuming that the patterns are normalized to unit length and removing constant additive terms, we obtain:
\[\frac{1}{\sigma^2} E(x, t) = - \log \Big[ \sum_n \exp \Big( \frac{x \cdot y_n}{(T-t)\sigma^2} \Big) \Big] + \frac{\lVert x \rVert_2^2}{2(T-t)\sigma^2}.\]Now, define the time-dependent inverse temperature \(\beta(t)^{-1} := \sigma^2(T-t)\) and multiply both sides by \(\beta(t)^{-1}\):
\[\frac{\beta(t)^{-1}}{\sigma^2} E(x, t) = - \beta(t)^{-1} \log \Big( \sum_n \exp \big( \beta(t) \, x \cdot y_n \big) \Big) + \frac{1}{2} \lVert x \rVert_2^2,\]which is exactly the energy of the continuous modern Hopfield network for fixed \(t\).
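The algebra can be checked numerically; a small sketch comparing energy differences (which removes the dropped additive constant):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

N, d = 5, 8
Y = rng.standard_normal((N, d))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # unit-norm patterns
sigma, T, t = 1.0, 1.0, 0.9
beta = 1.0 / (sigma**2 * (T - t))               # beta(t)^{-1} = sigma^2 (T - t)

def diffusion_energy(x):
    sq = np.sum((x - Y) ** 2, axis=1)
    return -sigma**2 * logsumexp(-sq / (2 * (T - t) * sigma**2))

def hopfield_energy(x):
    return -logsumexp(beta * (Y @ x)) / beta + 0.5 * np.dot(x, x)

# The rescaled diffusion energy matches the Hopfield energy up to an
# x-independent constant, so energy *differences* agree exactly.
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
lhs = (beta**-1 / sigma**2) * (diffusion_energy(x1) - diffusion_energy(x2))
rhs = hopfield_energy(x1) - hopfield_energy(x2)
print(np.isclose(lhs, rhs))  # True
```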
Key differences:

- In the diffusion model, the inverse temperature \(\beta(t)\) is time-dependent and diverges as \(t \to T\), whereas the Hopfield network uses a fixed \(\beta\).
- The generative diffusion dynamics are stochastic, whereas the Hopfield network retrieves memories by deterministic gradient descent on its energy.

Ambrogioni points out that these two effects cancel: the divergence of \(\beta(t)\) suppresses the stochastic fluctuations. In experiments, the stochastic and deterministic dynamics show no meaningful difference for large \(\beta\).
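A toy numerical illustration of this cancellation (our own construction: Langevin-style updates with noise amplitude \(\sqrt{2/\beta}\), compared against deterministic gradient descent on the Hopfield energy):

```python
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)

N, d = 8, 16
Y = rng.standard_normal((N, d))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # unit-norm memories

def grad_E(x, beta):
    # Gradient of E_beta(x) = -beta^{-1} log sum_n exp(beta x.y_n) + ||x||^2 / 2
    return x - softmax(beta * (Y @ x)) @ Y

def retrieve(x0, beta, eta=0.1, steps=2000, noise=0.0):
    x = x0.copy()
    for _ in range(steps):
        x = x - eta * grad_E(x, beta) + noise * np.sqrt(eta) * rng.standard_normal(d)
    return x

x0 = Y[0] + 0.3 * rng.standard_normal(d)        # corrupted memory
for beta in [5.0, 50.0, 500.0]:
    gap = np.linalg.norm(retrieve(x0, beta) - retrieve(x0, beta, noise=np.sqrt(2 / beta)))
    print(beta, gap)  # the gap shrinks as beta grows
```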
Ambrogioni (arXiv 2023) also introduces a more general case that works for any differentiable scalar potential function \(v(x)\). We define the diffusion model's dynamics as:
\[x(t - dt) = x(t) + \nabla_x v(x)dt + \sigma(t) \sqrt{dt} \delta(t)\]This excludes non-conservative dynamics and state-dependent noise models. The generative dynamics are:
\[x(t + dt) = x(t) + \big( \sigma(t)^2 \nabla_x \log p_t(x) - \nabla_x v(x) \big) \, dt + \sigma(t) \sqrt{dt} \, \delta(t),\]and the marginal distribution \(p_t(x)\) is expressed through the solution kernel \(k\) of the noising process:
\[p_t(x) = \mathbb{E}_{y \sim \phi(y)} \big[ k(x(t), t; y, T) \big].\]For this general case, Ambrogioni notes that the kernel has no analytical solution. Define the log of the solution kernel:
\[\psi(x, t; y, T) := \log k(x, t; y, T).\]Then the diffusion model’s energy function is:
\[E(x, t) = - \sigma(t)^2 \log \Big( \sum_{n=1}^N \exp\big( \psi(x, t; y_n, T) \big) \Big) + v(x),\]so that the generative drift above is exactly \(-\nabla_x E(x, t)\).
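A sketch of this general energy with \(\psi\) and \(v\) passed in as callables; as a sanity check, plugging in the Gaussian log-kernel of the variance-exploding process with \(v \equiv 0\) recovers the earlier memory energy (the names and toy inputs are our own):

```python
import numpy as np
from scipy.special import logsumexp

def general_energy(x, t, memories, log_kernel, v, sigma_t):
    # E(x, t) = -sigma(t)^2 log sum_n exp(psi(x, t; y_n, T)) + v(x)
    psi = np.array([log_kernel(x, t, y) for y in memories])
    return -sigma_t**2 * logsumexp(psi) + v(x)

# Sanity check: variance-exploding case (v = 0, Gaussian solution kernel).
sigma, T = 1.0, 1.0
ve_log_kernel = lambda x, t, y: -np.sum((x - y) ** 2) / (2 * (T - t) * sigma**2)
memories = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(general_energy(np.zeros(2), 0.5, memories, ve_log_kernel, lambda x: 0.0, sigma))
```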