Least Squares Regression

For an overdetermined system $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ with $m > n$, least squares finds the solution $\hat{x}$ that minimizes the squared error:

$$\hat{x} = \arg\min_{x} \|Ax - b\|_2^2$$

One issue with this approach is its sensitivity to outliers (i.e. when the noise distribution is not Gaussian), which can significantly skew the solution; this motivates robust regression techniques.
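A minimal sketch of the basic fit using NumPy; the matrix sizes and synthetic data below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Build a small overdetermined system A x ≈ b with synthetic noisy data.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))                  # m = 100 samples, n = 3 features
x_true = np.array([2.0, -1.0, 0.5])
b = A @ x_true + 0.1 * rng.normal(size=100)    # noisy targets

# np.linalg.lstsq minimizes ||A x - b||_2^2
x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                                   # close to x_true
```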

Solution via SVD

Using the SVD $A = U \Sigma V^T$, the solution is given by:

$$\hat{x} = A^{+} b = V \Sigma^{+} U^T b$$

where $A^{+} = V \Sigma^{+} U^T$ is the (Moore–Penrose) pseudoinverse.
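A short sketch of the SVD route on an assumed toy system (the data is made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# x_hat = V Sigma^+ U^T b  (divide by the singular values elementwise)
x_hat = Vt.T @ ((U.T @ b) / s)

# Same result as applying the pseudoinverse directly
assert np.allclose(x_hat, np.linalg.pinv(A) @ b)
print(x_hat)
```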

Geometric Interpretation

The solution $\hat{x}$ gives us coefficients that define a hyperplane which:
- Minimizes the sum of squared vertical distances to the data points
- Projects $b$ onto the column space of $A$
- Makes the residual $r = b - A\hat{x}$ orthogonal to the column space of $A$
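A quick numerical check of the orthogonality property, on assumed toy data:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_hat          # residual
print(A.T @ r)             # ≈ [0, 0]: residual is orthogonal to the columns of A
```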

Normal Equations

The normal equations for a least squares problem are:

$$A^T A x = A^T b$$

where $A$ is the data matrix, $b$ is the target vector, and $x$ is the solution vector we seek.

The name “normal” comes from the fact that $A^T (b - Ax) = 0$ implies the residual $r = b - Ax$ is orthogonal (or normal) to the column space of $A$.
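A sketch of solving the normal equations directly; this is fine for small, well-conditioned problems (SVD or QR is preferred otherwise), and the toy data is assumed:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# Solve A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)
```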

Simple Linear Regression

For fitting a line $y = ax$ through the origin to points $(x_i, y_i)$, $i = 1, \dots, m$, we have:

$$A = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$$

Applying the normal equations to this simplified case:

$$a = \frac{A^T b}{A^T A} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$

where the numerator $A^T b = \sum_i x_i y_i$ is the dot product of the inputs and targets (the sum of coordinate-wise products), and the denominator $A^T A = \sum_i x_i^2$ normalizes by the squared magnitude of the inputs.
To add an offset, we can append a column of ones to $A$ and solve for both the slope and the intercept.
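A sketch of both variants (through the origin, then with an intercept) on made-up data points:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Line through the origin: a = sum(x_i * y_i) / sum(x_i^2)
a = (x @ y) / (x @ x)
print(a)                                   # slope of the best-fit line through the origin

# With an offset: append a column of ones and solve [x 1] [a, c]^T ≈ y
X = np.column_stack([x, np.ones_like(x)])
a_c = np.linalg.lstsq(X, y, rcond=None)[0]
print(a_c)                                 # slope and intercept
```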