Resume
Research
Learning
Blog
Teaching
Jokes
Kernel Papers

# Nystrom Approximation

The kernel (a.k.a. Gram) matrix grows quadratically with the number of data $$N$$. Using it requires space $$O(N^2)$$ and time $$O(N^3)$$. The Nystrom approximation provides a way to create a rank-$$k$$ approximation of the kernel matrix with space $$O(kN)$$ and time $$O(k^2N)$$.

The Nystrom approximation says that for an $$n \times n$$ kernel matrix $$K$$, we can approximate it with low error in the following manner: first, randomly shuffle columns and rewrite $$K = \begin{bmatrix}W & K_{12}^T\\K_{12} & K_22 \end{bmatrix}$$
where $$W$$ is $$m \times m$$.

## Properties

• (Kumar, Mohri and Talwalkar 2012) Let $$\tilde{K}_k$$ denote the rank-k Nystrom approximation with $$m$$ columns that are sampled uniformly at random without replacement from $$K$$. Then
$\lvert\lvert K - \Tilde{K}_k \lvert \lvert_F \leq \lvert\lvert K - K_k \lvert \lvert_F + \epsilon$

where $$\epsilon = \mathcal{O}(m^{-1/4}) \lvert \lvert K \lvert_F$$.