Bias–Variance & Generalization Explorer

Inspired by Raschka, STAT 479: Model Evaluation 1 – Overfitting and Underfitting
This app links three ideas from the notes:
\[ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Noise (irreducible)}} \]
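The decomposition can be checked numerically by refitting the same model class on many independently drawn training sets and measuring the three terms at a single query point. This is a minimal sketch, assuming an illustrative sine true function, Gaussian noise, and a cubic polynomial model (all hypothetical choices, not fixed by the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed true regression function (illustrative choice).
    return np.sin(2 * np.pi * x)

sigma = 0.3                      # noise standard deviation
x_train = np.linspace(0, 1, 20)  # fixed training inputs
x0 = 0.5                         # query point at which we decompose the error
degree = 3                       # polynomial degree d

# Fit the same model class on many independently drawn training sets
# and record each fit's prediction at x0.
preds = []
for _ in range(2000):
    y_train = f(x_train) + rng.normal(0, sigma, x_train.size)
    coefs = np.polyfit(x_train, y_train, degree)
    preds.append(np.polyval(coefs, x0))
preds = np.array(preds)

bias_sq = (f(x0) - preds.mean()) ** 2        # (f(x0) - E[f_hat(x0)])^2
variance = preds.var()                       # E[(f_hat(x0) - E[f_hat(x0)])^2]
noise = sigma ** 2                           # irreducible error
expected_err = bias_sq + variance + noise

# Sanity check: the sum should match the empirical expected squared error
# against fresh noisy observations y0 = f(x0) + eps.
y0 = f(x0) + rng.normal(0, sigma, preds.size)
empirical_err = np.mean((y0 - preds) ** 2)
```

With a symmetric model class and enough replications, `empirical_err` lands close to `bias_sq + variance + noise`, which is exactly the identity above evaluated at one point.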
Underfitting (High Bias) vs. Overfitting (High Variance)
Model capacity controls the size of the hypothesis space: too little capacity underfits (high bias), too much capacity overfits (high variance).
Visualization Mode (Left Plot)
Single fit: One training set, one model.
Many fits: Several models, each trained on an independently drawn training set. The spread of the fits illustrates variance; the deviation of their average from \(f\) illustrates bias.

1. Data, True Function, and Fitted Models

We assume a true regression function \(f(x)\). Each training set is drawn as \(y = f(x) + \epsilon\) with noise \(\epsilon\). We fit a polynomial of degree \(d\) and compare the fitted model(s) to the true function.
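The data-generating and fitting steps above can be sketched in a few lines. This is a minimal illustration, assuming a sine true function and NumPy's polynomial least-squares fit (`np.polyfit`); the degrees shown are arbitrary examples of low, moderate, and high capacity:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed true regression function (illustrative choice).
    return np.sin(2 * np.pi * x)

sigma = 0.3
x = np.sort(rng.uniform(0, 1, 30))
y = f(x) + rng.normal(0, sigma, x.size)   # one training set: y = f(x) + eps

# Fit polynomials of increasing degree d to the same training set.
train_mse = {}
for d in (1, 3, 9):
    coefs = np.polyfit(x, y, d)           # least-squares fit of degree d
    y_hat = np.polyval(coefs, x)
    train_mse[d] = np.mean((y - y_hat) ** 2)
    print(f"degree {d}: train MSE = {train_mse[d]:.3f}")
```

Because the polynomial bases are nested, the training MSE can only decrease as \(d\) grows; on its own it says nothing about generalization.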
Single Fit: One Training Set
[Plot: true function, training points, and model fit(s)]
Training vs. Test Error (Conceptual)
[Plot: train error and test (generalization) error vs. model capacity]
Left: underfitting (high bias); right: overfitting (high variance).
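The U-shaped test-error curve can be reproduced concretely by sweeping the polynomial degree and scoring each fit on a held-out set. A minimal sketch, again assuming a hypothetical sine true function and Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Assumed true regression function (illustrative choice).
    return np.sin(2 * np.pi * x)

sigma = 0.3
x_tr = np.sort(rng.uniform(0, 1, 30))
y_tr = f(x_tr) + rng.normal(0, sigma, x_tr.size)
x_te = np.sort(rng.uniform(0, 1, 200))          # held-out test set
y_te = f(x_te) + rng.normal(0, sigma, x_te.size)

train_err, test_err = {}, {}
for d in range(1, 10):
    coefs = np.polyfit(x_tr, y_tr, d)
    train_err[d] = np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)
    test_err[d] = np.mean((y_te - np.polyval(coefs, x_te)) ** 2)

# Training error keeps falling with capacity; test error bottoms out
# at an intermediate degree and rises again as the model overfits.
best = min(test_err, key=test_err.get)
```

Plotting `train_err` and `test_err` against `d` recovers the conceptual picture: the low-degree end is the underfitting regime, the high-degree end the overfitting regime.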
Bias–Variance Decomposition (Conceptual)
[Plot: bias², variance, and total error (including noise) vs. model capacity]