Linear Regression: MSE vs MAE Comparison

This interactive demo helps you understand how linear regression behaves under different loss functions. Adjust the sliders to modify the regression line and observe how both MSE (Mean Squared Error) and MAE (Mean Absolute Error) respond in real time.

Interactive Elements in This Demo

Interactive Regression Model

Line Parameters: sliders for the slope w₁ (initially 1.0) and intercept w₀ (initially 0.0), with a live readout of your line ŷ = w₀ + w₁x.
Data Controls: sliders for the number of data points (initially 15) and the noise level (initially 1.0).
Example Datasets: Linear Trend, With Outliers, and Clustered Data.
Best-Fit Readouts: the MSE and MAE best-fit lines and their current loss values, updated as you adjust the controls.

Loss Functions Comparison

MSE Loss: Quadratic, smooth curve that heavily penalizes large errors
MAE Loss: V-shaped, linear penalty that grows in direct proportion to the error magnitude
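
To make the two shapes concrete, here is a minimal sketch in Python with NumPy (our choice of language here, not the demo's implementation; the variable names are illustrative) that tabulates both per-residual penalties:

```python
import numpy as np

# Per-residual penalties: MSE squares the error, MAE takes its absolute value.
errors = np.linspace(-3.0, 3.0, 7)
squared = errors ** 2      # parabola: large errors dominate the total loss
absolute = np.abs(errors)  # V-shape: penalty grows linearly with the error

for e, s, a in zip(errors, squared, absolute):
    print(f"error={e:+.1f}  squared={s:4.1f}  absolute={a:3.1f}")
```

An error of 3 contributes 9 to the squared penalty but only 3 to the absolute one, which is why MSE is pulled much harder toward outliers.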

Geometric Interpretation of Residuals

Parameter Space Comparison

Connection to Parameter Space

As seen in Demo 1, every point in parameter space (w₀, w₁) corresponds to a line in feature space. The optimal parameters are found at the lowest point of the loss surface.

Each point on the parameter space surfaces represents a possible model with specific slope and intercept values. The height of the surface shows the loss value for that model. The MSE surface is smooth and bowl-shaped with a unique minimum, while the MAE surface has sharper edges and can have multiple optimal solutions along a line.
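
The surfaces can be approximated by brute force: evaluate each loss over a grid of (w₀, w₁) pairs. Here is a rough sketch, assuming NumPy and a small synthetic dataset (none of these names come from the demo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 15)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)  # synthetic data

# Grid of candidate intercepts (w0) and slopes (w1).
w0_grid, w1_grid = np.meshgrid(np.linspace(-5, 10, 100),
                               np.linspace(-2, 5, 100))

# Residuals for every (w0, w1) pair at once: shape (100, 100, 15).
residuals = y - (w0_grid[..., None] + w1_grid[..., None] * x)

mse_surface = np.mean(residuals ** 2, axis=-1)     # smooth bowl
mae_surface = np.mean(np.abs(residuals), axis=-1)  # angular surface

# The grid minima approximate the optimal parameters for each loss.
i, j = np.unravel_index(np.argmin(mse_surface), mse_surface.shape)
print(f"MSE grid minimum near w0={w0_grid[i, j]:.2f}, w1={w1_grid[i, j]:.2f}")
i, j = np.unravel_index(np.argmin(mae_surface), mae_surface.shape)
print(f"MAE grid minimum near w0={w0_grid[i, j]:.2f}, w1={w1_grid[i, j]:.2f}")
```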

Summary and Explanations

Linear Regression Model

Linear regression finds a linear relationship between variables: \( \hat{y} = w_0 + w_1 x \), where \(w_0\) is the intercept and \(w_1\) is the slope.

Loss Functions

Mean Squared Error (MSE)

\[ \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

MSE squares the errors, giving more weight to large errors. It produces a smooth, bowl-shaped loss surface with a unique global minimum.
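
In code, the formula is a one-liner. A minimal sketch, assuming NumPy arrays of targets `y` and predictions `y_hat` (the function name `mse` is ours, not from any particular library):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average of the squared residuals."""
    return np.mean((y - y_hat) ** 2)

# Example: residuals of -0.5 and -0.5 give an MSE of 0.25.
print(mse(np.array([1.0, 2.0]), np.array([1.5, 2.5])))
```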

The analytical solution for minimizing MSE is:

\[ w_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} \]
\[ w_0 = \bar{y} - w_1\bar{x} \]
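
This closed form translates directly into code. A minimal sketch, assuming equal-length NumPy arrays `x` and `y` (`ols_fit` is an illustrative name):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form MSE minimizer: w1 = Cov(x, y) / Var(x), w0 = y_bar - w1 * x_bar."""
    x_bar, y_bar = x.mean(), y.mean()
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    w0 = y_bar - w1 * x_bar
    return w0, w1
```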

Mean Absolute Error (MAE)

\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i| \]

MAE takes the absolute value of each error, so every unit of error contributes equally regardless of how large the residual is. It produces a more angular loss surface and is more robust to outliers.
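
The matching sketch for MAE, under the same assumptions (again, `mae` is an illustrative name):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: the average magnitude of the residuals."""
    return np.mean(np.abs(y - y_hat))

# Example: residuals of -0.5 and -0.5 give an MAE of 0.5.
print(mae(np.array([1.0, 2.0]), np.array([1.5, 2.5])))
```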

Minimizing MAE has no simple closed-form solution and is typically solved numerically (for example, by linear programming). A closely related median-based approach, the Theil–Sen estimator, uses:

\[ w_1 = \operatorname{median}\left\{ \frac{y_j - y_i}{x_j - x_i} : i < j \right\} \]
\[ w_0 = \operatorname{median}\left\{ y_i - w_1 x_i \right\} \]
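
A direct O(n²) sketch of the Theil–Sen approach described above (`theil_sen_fit` is an illustrative name; in practice, scipy.stats.theilslopes or scikit-learn's TheilSenRegressor handles this more carefully):

```python
import numpy as np

def theil_sen_fit(x, y):
    """Slope = median of all pairwise slopes; intercept = median residual."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)
              if x[i] != x[j]]
    w1 = np.median(slopes)
    w0 = np.median(y - w1 * x)
    return w0, w1
```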

Visual Interpretations

Key Differences

| Aspect | MSE | MAE |
| --- | --- | --- |
| Sensitivity to Outliers | High (squares errors) | Low (linear penalty) |
| Loss Surface | Smooth, differentiable everywhere | Angular, not differentiable at zero error |
| Computational Complexity | Simple closed-form solution | No closed form; iterative or median-based methods |
| Optimal Solution | Mean-centered | Median-centered |
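
The table's first row is easy to check empirically. The following self-contained sketch (our own synthetic data, not taken from the demo) corrupts a single point and compares a closed-form MSE fit against a median-of-slopes fit; the mean-based line shifts noticeably while the median-based line barely moves:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 15)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=x.size)

y_out = y.copy()
y_out[-1] += 30.0  # corrupt one point with a large outlier

for label, data in [("clean", y), ("with outlier", y_out)]:
    w1_mse, w0_mse = np.polyfit(x, data, 1)        # closed-form MSE fit
    slopes = [(data[j] - data[i]) / (x[j] - x[i])  # all pairwise slopes
              for i in range(len(x)) for j in range(i + 1, len(x))]
    w1_med = np.median(slopes)
    w0_med = np.median(data - w1_med * x)
    print(f"{label:13s} MSE fit: {w0_mse:5.2f} + {w1_mse:4.2f}x | "
          f"median fit: {w0_med:5.2f} + {w1_med:4.2f}x")
```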