Least Squares Regression

For an overdetermined system $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ with $m > n$, least squares finds the solution $\hat{x}$ that minimizes the squared error:

$$\hat{x} = \arg\min_{x} \|Ax - b\|_2^2$$

One issue with this approach is its sensitivity to outliers (i.e. when the noise distribution is not Gaussian), which can significantly skew the solution; this motivates robust regression techniques.
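A minimal sketch of the basic fit using NumPy; the matrix sizes and synthetic data below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Build a small overdetermined system A x ≈ b with synthetic noisy data.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))                  # m = 100 samples, n = 3 features
x_true = np.array([2.0, -1.0, 0.5])
b = A @ x_true + 0.1 * rng.normal(size=100)    # noisy targets

# np.linalg.lstsq minimizes ||A x - b||_2^2
x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                                   # close to x_true
```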

Solution via SVD

Using the SVD $A = U \Sigma V^T$, the solution is given by:

$$\hat{x} = A^{+} b = V \Sigma^{+} U^T b$$

where $A^{+} = V \Sigma^{+} U^T$ is the (Moore–Penrose) pseudoinverse.
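A short sketch of the SVD route on an assumed toy system (the data is made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# x_hat = V Sigma^+ U^T b  (divide by the singular values elementwise)
x_hat = Vt.T @ ((U.T @ b) / s)

# Same result as applying the pseudoinverse directly
assert np.allclose(x_hat, np.linalg.pinv(A) @ b)
print(x_hat)
```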

Geometric Interpretation

The solution $\hat{x}$ gives us coefficients that define a hyperplane which:
- Minimizes the sum of squared vertical distances to the data points
- Projects $b$ onto the column space of $A$
- Makes the residual $r = b - A\hat{x}$ orthogonal to the column space of $A$
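A quick numerical check of the orthogonality property, on assumed toy data:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_hat          # residual
print(A.T @ r)             # ≈ [0, 0]: residual is orthogonal to the columns of A
```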

Normal Equations

The normal equations for a least squares problem are:

$$A^T A x = A^T b$$

where $A$ is the data matrix, $b$ is the target vector, and $x$ is the solution vector we seek.

The name “normal” comes from the fact that $A^T (b - Ax) = 0$ implies the residual $r = b - Ax$ is orthogonal (or normal) to the column space of $A$.
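A sketch of solving the normal equations directly; this is fine for small, well-conditioned problems (SVD or QR is preferred otherwise), and the toy data is assumed:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# Solve A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)
```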

Simple Linear Regression

For fitting a line $y = ax$ through the origin to points $(x_i, y_i)$, $i = 1, \dots, m$, we have:

$$A = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$$

Applying the normal equations to this simplified case:

$$a = \frac{A^T b}{A^T A} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$

where the numerator $A^T b = \sum_i x_i y_i$ is the dot product of the inputs and targets (the sum of coordinate-wise products), and the denominator $A^T A = \sum_i x_i^2$ normalizes by the squared magnitude of the inputs.
To add an offset, we can append a column of ones to $A$ and solve for both the slope and the intercept.
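A sketch of both variants (through the origin, then with an intercept) on made-up data points:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Line through the origin: a = sum(x_i * y_i) / sum(x_i^2)
a = (x @ y) / (x @ x)
print(a)                                   # slope of the best-fit line through the origin

# With an offset: append a column of ones and solve [x 1] [a, c]^T ≈ y
X = np.column_stack([x, np.ones_like(x)])
a_c = np.linalg.lstsq(X, y, rcond=None)[0]
print(a_c)                                 # slope and intercept
```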