One of the earliest and simplest models of a nonlinear neural network is a binary classifier called the Perceptron, the non-linear equivalent of the single linear neuron:
$\begin{align*} y = \sign(w^T x) \end{align*}$
An amazing property of the perceptron was that for a given classification problem, if the