# Newton’s Method

## Idea

Suppose we wish to minimize a scalar-valued function $f(x)$
where $x \in \mathbb{R}^N$. At each iteration, we want to identify the
perturbation $\delta$ to $x$ that minimizes $f(x + \delta)$. We phrase this
question as an optimization problem:

\[\argmin_{\delta} f(x + \delta)\]
If we perform a second-order Taylor series expansion about the current point,
we see that:

\[f(x + \delta) = f(x) + \nabla_x f(x)^T \delta + \frac{1}{2} \delta^T
H(x) \delta + O(\|\delta\|^3)\]
where $H(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$
is the Hessian. To find the value of $\delta$ that minimizes this quadratic
approximation of $f(x + \delta)$, we differentiate with respect to $\delta$ and
set the result equal to 0:

\[0 = \nabla_{\delta} f(x + \delta) \approx \nabla_x f(x) + H(x) \delta\]
Solving for $\delta$ yields:

\[\delta = - H(x)^{-1} \nabla_x f(x)\]
Thus, Newton’s Method takes the following step:

\[x_{k+1} = x_k - H(x_k)^{-1} \nabla_x f(x_k)\]
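
As a concrete illustration, here is a minimal NumPy sketch of this update (the
names `newtons_method`, `grad`, and `hess` are placeholders, not from any
particular library). Rather than forming $H(x_k)^{-1}$ explicitly, it solves the
linear system $H(x_k)\,\delta = -\nabla_x f(x_k)$ at each step.

```python
import numpy as np

def newtons_method(grad, hess, x0, num_steps=20, tol=1e-10):
    """Iterate x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k) until the gradient is tiny."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve H(x) delta = -g rather than inverting the Hessian.
        delta = np.linalg.solve(hess(x), -g)
        x = x + delta
    return x

# Example: a convex quadratic f(x) = 1/2 x^T A x - b^T x, with gradient A x - b
# and constant Hessian A.  The quadratic model is exact here, so Newton's Method
# reaches the minimizer A^{-1} b in a single step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = newtons_method(grad=lambda x: A @ x - b, hess=lambda x: A, x0=np.zeros(2))
print(x_star)  # matches np.linalg.solve(A, b)
```
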
## Properties

### Attraction to Saddle Points

One interesting property of Newton’s Method is that it is attracted to saddle
points, an observation made by Dauphin et al. (2014). To see why, let’s explore
how Newton’s Method behaves around its fixed points, which are the critical
points of $f$ where $\nabla_x f(x^*) = 0$. Starting with a second-order Taylor
series expansion around such a point $x^*$:

\[f(x^* + \delta) \approx f(x^*) + \nabla_x f(x^*)^T \delta + \frac{1}{2} \delta^T
H(x^*) \delta = f(x^*) + \frac{1}{2} \delta^T H(x^*) \delta\]
Since the gradient vanishes at $x^*$, only the quadratic term survives, so the
local behavior of $f$ is governed entirely by the curvature. To see how Newton’s
Method responds to that curvature, we write the update in the eigenbasis
of the Hessian. Let $\lambda_i, e_i$ be the $i$-th eigenvalue, eigenvector pair of
the Hessian $H \defeq \nabla_x^2 f(x)$:

\[\delta = -H^{-1} \nabla_x f(x) = -\sum_i \frac{1}{\lambda_i}
\left(e_i^T \nabla_x f(x)\right) e_i\]
Each component of the gradient is rescaled by $1/\lambda_i$. Along a direction of
negative curvature ($\lambda_i < 0$), this rescaling flips the sign of the step:
gradient descent would move away from $x^*$ along $e_i$ to decrease $f$, but
Newton’s Method steps back toward $x^*$. Since a saddle point has eigenvalues of
both signs, the Newton update is pulled toward it from every surrounding
direction.
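
To make the attraction concrete, here is a small numerical sketch (the test
function $f(x, y) = x^2 - y^2$ and the helper names below are illustrative, not
taken from Dauphin et al.): starting near the saddle at the origin, a single
Newton step lands exactly on the saddle, while a gradient descent step moves
away from it along the negative-curvature direction.

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a saddle point at the origin:
# curvature +2 along x (lambda_1 = 2) and -2 along y (lambda_2 = -2).
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def hess(p):
    return np.array([[2.0, 0.0], [0.0, -2.0]])

p = np.array([1.0, 1.0])

# Newton step: delta = -H^{-1} grad f(p).  Dividing the y-component of the
# gradient by lambda_2 = -2 flips its sign, pulling the iterate toward the saddle.
newton_step = -np.linalg.solve(hess(p), grad(p))
print(p + newton_step)   # [0. 0.] -- jumps straight onto the saddle point

# A gradient descent step with a small learning rate instead moves away from
# the saddle along the negative-curvature (y) direction.
gd_step = -0.1 * grad(p)
print(p + gd_step)       # [0.8 1.2] -- |y| grows, escaping the saddle
```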

### Convergence Analysis

TODO