The kernel (a.k.a. Gram) matrix grows quadratically with the number of data \(N\). Using it requires space \(O(N^2)\) and time \(O(N^3)\). The Nystrom approximation provides a way to create a rank-\(k\) approximation of the kernel matrix with space \(O(kN)\) and time \(O(k^2N)\).
The Nystrom approximation says that for an \(n \times n\) kernel matrix \(K\),
we can approximate it with low error in the following manner: first,
randomly shuffle columns and rewrite \(K = \begin{bmatrix}W & K_{12}^T\\K_{12} & K_22 \end{bmatrix}\)
where \(W\) is \(m \times m\).
where \(\epsilon = \mathcal{O}(m^{-1/4}) \lvert \lvert K \lvert_F\).