$\DeclareMathOperator*{\argmax}{argmax}$ $\DeclareMathOperator{\defeq}{\stackrel{def}{=}}$


Error Analysis

Error analysis concerns identifying and quantifying the sources of error in scientific computing and numerical analysis. Some examples include truncation (i.e. terminating a process early), discretization (i.e. approximating continuous values on a discrete grid), and rounding (i.e. losing precision due to computers' floating point limitations). For a concrete example involving all three sources of error, consider estimating the derivative of some function $f(x)$ using finite differences:

$ \begin{align} f'(x) \defeq \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} \approx \hat{f'}(x, h) \defeq \frac{f(x+h) - f(x)}{h} \end{align} $

This introduces the first source of error: discretization. Irrespective of how accurately we can represent numbers, we'll need to choose a specific non-zero value for $h$. The discretization error is therefore $|f'(x) - \hat{f'}(x, h)|$. The next source of error, truncation, arises when we expand $f(x+h)$ using a Taylor series expansion. Because this infinite series' terms decline in magnitude (thanks to the $n!$ in the denominator), we might decide to approximate $f(x+h)$ by truncating the series after the $N$th term:

$ \begin{align} f(x+h) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x)}{n!} h^n \approx \hat{f}(x + h, N) \defeq \sum_{n=0}^{N} \frac{f^{(n)}(x)}{n!} h^n \end{align} $

The truncation error is therefore $|f(x+h) - \hat{f}(x+h, N)|$. The final source of error, rounding, arises due to limitations in how precisely computers can represent the necessary values $h$, $f(x)$, $\hat{f}(x + h, N)$, and $\hat{f'}(x, h)$.
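To make the tradeoff between these errors concrete, here is a minimal sketch, assuming $f(x) = e^x$ (so the exact derivative is known) and a handful of arbitrary step sizes: the total error first shrinks as $h$ decreases, then grows again once rounding error dominates.

```python
import numpy as np

def forward_difference(f, x, h):
    """Forward-difference estimate of f'(x) with step size h."""
    return (f(x + h) - f(x)) / h

# Illustrative example: f(x) = exp(x), whose exact derivative at x is exp(x).
x = 1.0
exact = np.exp(x)

for h in [1e-1, 1e-4, 1e-8, 1e-12]:
    estimate = forward_difference(np.exp, x, h)
    # The total error mixes discretization error (shrinks with h)
    # and rounding error (grows as h approaches machine epsilon).
    print(f"h = {h:.0e}, |error| = {abs(estimate - exact):.2e}")
```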

Condition Number

One property of a function we might be interested in is how sensitive the output is to small input perturbations. There are two common ways to formalize this: the absolute condition number and the relative condition number. The relative condition number $\kappa$ captures whether small relative input perturbations are inflated or attenuated after passing through the function.

$ \begin{align} \kappa \defeq \lim_{\epsilon \rightarrow 0} \sup_{\lVert \Delta x \rVert \leq \epsilon} \frac{\lVert \Delta f(x) \rVert / \lVert f(x) \rVert}{\lVert \Delta x \rVert / \lVert x \rVert} \end{align} $

where $\Delta f(x) \defeq f(x + \Delta x) - f(x)$.

We say that a function $f(x)$ with $\kappa \gg 1$ is "ill-conditioned", since small relative input perturbations are greatly amplified; conversely, a function with modest $\kappa$ (near or below 1) is "well-conditioned".

For a concrete example, consider calculating the relative condition number for a generic linear system: $f(x) = A x$, where $A$ is some $n \times n$ matrix and $x \in \mathbb{R}^n$. The condition number is therefore:

$ \begin{align*} \kappa \defeq \frac{\lVert \Delta f(x) \rVert / \lVert f(x) \rVert}{\lVert \Delta x \rVert / \lVert x \rVert} = \frac{\lVert A (x + \Delta x) - A x \rVert / \lVert A x \rVert}{\lVert \Delta x \rVert / \lVert x \rVert} = \frac{\lVert A \Delta x \rVert \lVert x \rVert}{\lVert \Delta x \rVert \lVert A x \rVert} \end{align*} $
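As a sanity check, here is a small NumPy sketch, assuming an arbitrary random matrix and perturbation direction; it compares the point-wise ratio above against `np.linalg.cond(A)` $= \lVert A \rVert \lVert A^{-1} \rVert$, which is the worst case of that ratio over all $x$ and $\Delta x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary test matrix, input, and small perturbation, chosen for illustration.
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
dx = 1e-8 * rng.standard_normal(4)

# Point-wise relative amplification: (||A dx|| / ||A x||) / (||dx|| / ||x||)
pointwise = (np.linalg.norm(A @ dx) / np.linalg.norm(A @ x)) / (
    np.linalg.norm(dx) / np.linalg.norm(x)
)

# Worst case over all x and dx: kappa(A) = ||A|| * ||A^{-1}||
worst_case = np.linalg.cond(A)

print(f"point-wise ratio:  {pointwise:.3f}")
print(f"np.linalg.cond(A): {worst_case:.3f}")  # always >= the point-wise ratio
```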

Rate of Convergence

For numerical methods requiring multiple steps, one question is how quickly the sequence of outputs converges to the correct answer. In practice, the rate of convergence is critical for saving real-world time. Convergence might be measured with respect to the number of iterations of an algorithm, the number of terms retained in a truncated series, or the fineness of a discretization (e.g. the step size $h$).
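One standard way to make "how quickly" precise (stated here for reference) is the order of convergence: a sequence of iterates $x_k \rightarrow x^*$ converges with order $q \geq 1$ and rate $\mu$ if

$ \begin{align} \lim_{k \rightarrow \infty} \frac{\lvert x_{k+1} - x^* \rvert}{\lvert x_k - x^* \rvert^q} = \mu, \qquad 0 < \mu < \infty \end{align} $

where $q = 1$ (with $\mu < 1$) is called linear convergence and $q = 2$ is called quadratic convergence.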

Data Fitting

Two common approaches to fitting a function to a finite sample of data are interpolation, where a function is chosen to pass exactly through the sampled points, and optimization, where a function is chosen to minimize (or maximize) some criterion, e.g. the squared error on the sample.

Vandermonde Interpolation

Vandermonde interpolation fits a function to a set of 2D points $\{(x_i, y_i)\}_{i=1}^{n}$ by using a monomial basis (i.e. increasing powers $x^0, x^1, x^2, \ldots$) and then finding coefficients of a degree-$(n-1)$ polynomial passing through all points. To find the "right" coefficients, Vandermonde interpolation solves the linear system $V b = y$:

$ \begin{align} \begin{bmatrix} 1 & x_1 & x_1^2 &... & x_1^{n-1}\\ \vdots & & \vdots & & \vdots \\ 1 & x_n & x_n^2 & ... & x_n^{n-1}\\ \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{bmatrix} &= \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \end{align} $
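A minimal sketch of this in NumPy, assuming an arbitrary set of sample points chosen only for illustration:

```python
import numpy as np

# Arbitrary sample points (x_i, y_i) chosen for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])
n = len(x)

# Vandermonde matrix: V[i, k] = x_i ** k, for k = 0, ..., n-1.
V = np.vander(x, N=n, increasing=True)

# Solve V b = y for the polynomial coefficients b_0, ..., b_{n-1}.
b = np.linalg.solve(V, y)

# The degree-(n-1) polynomial passes through every sample point.
P = lambda t: sum(b[k] * t**k for k in range(n))
print(np.allclose([P(xi) for xi in x], y))  # True
```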

We can show that if the $n$ values $x_1, \ldots, x_n$ are distinct, then $Vb = y$ has a unique solution for $b$. We'll first show that distinct points guarantee at least one solution, and then show that they guarantee only that solution. One way to show that $Vb = y$ has a particular solution is to show that $V$ is invertible, which can be accomplished by showing that its determinant is non-zero. This is made easy by the fact that the Vandermonde matrix $V$ has a nice formula for its determinant: $\det(V) = \prod_{1 \leq i < j \leq n} (x_j - x_i)$. Because the $x_i$ are distinct, each factor $x_j - x_i$ with $i \neq j$ is non-zero, and since a product of non-zero numbers can never be zero, the determinant is non-zero and thus $V$ is invertible.

But what about the null solution? We know that the general solution to a system of linear equations is the sum of a particular solution of $Vb_p = y$ and a null solution of $Vb_n = 0$, since $Vb_p + Vb_n = y + 0 = y$ and thus $V(b_p + b_n) = y$. It turns out that the only null solution is $b_n = 0$. One way to see this is to note that $Vb_n = 0$ gives us $n$ equations telling us that the $n$ distinct values $x_1, \ldots, x_n$ are all roots of the polynomial $b_0 + b_1 x + ... + b_{n-1} x^{n-1}$. But this polynomial has degree at most $n-1$, and a non-zero polynomial of degree at most $n-1$ can have at most $n-1$ roots (a consequence of the fundamental theorem of algebra). So the only way for this polynomial to have $n$ distinct roots is if it is the zero polynomial, i.e. $b_0 = ... = b_{n-1} = 0$. In conclusion, the null solution is $b_n = 0$, leaving us with only one solution, the particular solution, which defines exactly one polynomial.

One well-known problem with Vandermonde interpolation is that the monomial basis makes the columns of $V$ increasingly close to linearly dependent as $n$ grows, so $V$ becomes ill-conditioned and the computed coefficients become numerically unreliable. Another problem is that solving with the dense $V$ matrix takes $O(n^3)$ time.
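A small sketch, assuming equally spaced points on $[0, 1]$, shows how quickly the conditioning of $V$ deteriorates:

```python
import numpy as np

# Condition number of the Vandermonde matrix for equally spaced points on [0, 1].
for n in [5, 10, 15, 20]:
    x = np.linspace(0.0, 1.0, n)
    V = np.vander(x, increasing=True)
    print(f"n = {n:2d}, cond(V) = {np.linalg.cond(V):.2e}")
# The condition number grows rapidly with n, so the computed
# coefficients become dominated by rounding error.
```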

Waring (Lagrange) Interpolation

To ameliorate the problems of Vandermonde interpolation, Waring, Euler, and later Lagrange discovered an improved method that is today called Lagrange interpolation. The approach starts by observing that interpolation is easy if I have, for each $x_i$, a polynomial that is 1 at $x_i$ and 0 at all other $x_j$; if I had such a polynomial, then I could guarantee that it passes exactly through $(x_i, y_i)$ without affecting any other point, just by multiplying the polynomial by $y_i$. Consequently, the overall approach is to construct one indicator polynomial $L_i(x)$ per data point, multiply each polynomial by $y_i$, and then sum all the polynomials.

The first question is: how can I construct a polynomial $L_i(x)$ that is 1 at $x_i$ and 0 at all other $x_{j \neq i}$?

$ \begin{align} L_i(x_j) = \begin{cases} 1 & x_j = x_i\\ 0 & x_j \neq x_i \end{cases} \end{align} $

I can immediately ensure that $L_i(x)$ is zero at every $x_j \neq x_i$ by starting with a polynomial that has roots at each $x_j$ with $j \neq i$: $(x-x_1) \cdots (x-x_{i-1})(x-x_{i+1}) \cdots (x-x_n)$. To ensure that this polynomial equals 1 at $x_i$, I need to find a constant $v$ such that $1 = v \, (x_i-x_1) \cdots (x_i-x_{i-1})(x_i-x_{i+1}) \cdots (x_i-x_n)$. That's a pretty easy equation to solve! This means that for each point $(x_i, y_i)$, we have a polynomial that is 1 at $x_i$ and 0 at all other $x_j$.

$ \begin{align} L_i(x) &= \frac{(x-x_1) \cdots (x-x_{i-1})(x-x_{i+1}) \cdots (x-x_n)}{(x_i-x_1) \cdots (x_i-x_{i-1})(x_i-x_{i+1}) \cdots (x_i-x_n)}\\ &= \prod_{\substack{j=1\\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j} \end{align} $

To create our final polynomial $P(x)$, we multiply each $L_i(x)$ by the corresponding $y_i$ and sum the results. The final formula isn't particularly intuitive on its own, but we now understand that each constituent polynomial is 1 at exactly one $x_i$, by construction, and we just scale that 1 up to $y_i$.

$ \begin{align} P(x) &= \sum_{i=1}^{n} y_i L_i(x) \\ &= \sum_{i=1}^{n} y_i \prod_{\substack{j=1\\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j} \end{align} $
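A direct translation of this formula into Python, as a minimal sketch reusing the same arbitrary sample points as the Vandermonde sketch above:

```python
import numpy as np

def lagrange_interpolate(x_pts, y_pts, t):
    """Evaluate the Lagrange interpolating polynomial P(t) through (x_i, y_i)."""
    total = 0.0
    n = len(x_pts)
    for i in range(n):
        # Basis polynomial L_i(t): 1 at x_i, 0 at every other x_j.
        L_i = 1.0
        for j in range(n):
            if j != i:
                L_i *= (t - x_pts[j]) / (x_pts[i] - x_pts[j])
        total += y_pts[i] * L_i
    return total

# Arbitrary sample points chosen for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])

# P(x_i) = y_i at every sample point, with no linear system to solve.
print(np.allclose([lagrange_interpolate(x, y, xi) for xi in x], y))  # True
```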