Statistical Learning (Section 2 of 6)

Linear Regression

Linear Regression forms the basis for models like the Capital Asset Pricing Model (CAPM), factor models, and many trading strategies.

I. Simple and Multiple Linear Regression

Model Formulation

The core assumption is a linear relationship between a dependent variable $Y$ and one or more independent variables $X_i$.

$$Y = \beta_0 + \sum_{i=1}^p \beta_i X_i + \epsilon$$
  • $Y$: Dependent variable (e.g., stock return)
  • $X_i$: Independent variables/predictors (e.g., market return, factors)
  • $\beta_0$: Intercept
  • $\beta_i$: Regression coefficients (slopes)
  • $\epsilon$: Error term (residual), representing unmodeled variation

Ordinary Least Squares (OLS) Estimation

OLS finds the coefficients $\hat{\beta}$ that minimize the Residual Sum of Squares (RSS): $\mathrm{RSS} = \sum_{i=1}^m (y_i - \hat{y}_i)^2$.

Matrix Form (Multiple Regression): Given the data matrix $\mathbf{X}$ (including a column of ones for the intercept) and the response vector $\mathbf{y}$, the OLS estimator is:

$$\hat{\beta} = (\mathbf{X}^\intercal \mathbf{X})^{-1} \mathbf{X}^\intercal \mathbf{y}$$

The variance-covariance matrix of the estimated coefficients is:

$$\mathrm{Var}(\hat{\beta}) = (\mathbf{X}^\intercal \mathbf{X})^{-1} \sigma^2$$

where $\sigma^2$ is the variance of the error term, estimated by $\hat{\sigma}^2 = \frac{1}{m-p-1} \sum_{i=1}^m (y_i - \hat{y}_i)^2$.
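The closed-form estimator above can be sketched directly in NumPy. This is a minimal illustration on synthetic data (the coefficients, sample size, and noise level are arbitrary choices, not from the text); `np.linalg.solve` is used instead of an explicit inverse for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 1.0 + 2.0*x1 - 0.5*x2 + noise (hypothetical values)
m, p = 200, 2
beta_true = np.array([1.0, 2.0, -0.5])          # [intercept, b1, b2]
X = np.column_stack([np.ones(m), rng.normal(size=(m, p))])  # column of ones for the intercept
y = X @ beta_true + rng.normal(scale=0.1, size=m)

# OLS: beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Error-variance estimate: sigma^2_hat = RSS / (m - p - 1)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (m - p - 1)

# Var(beta_hat) = (X'X)^{-1} sigma^2; its diagonal gives the squared standard errors
cov_beta = np.linalg.inv(X.T @ X) * sigma2_hat
se_beta = np.sqrt(np.diag(cov_beta))
```

With the small noise level chosen here, `beta_hat` recovers `beta_true` closely; in noisier financial data the standard errors in `se_beta` become the key output.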

II. The Gauss-Markov Theorem and OLS Assumptions

The OLS estimator $\hat{\beta}$ is the Best Linear Unbiased Estimator (BLUE) if the following assumptions (the Gauss-Markov assumptions) hold.

| Assumption | Description | Financial Implication (Violation) |
|---|---|---|
| 1. Linearity | The model is linear in the parameters $\beta$. | Model misspecification (e.g., ignoring non-linear relationships). |
| 2. Strict Exogeneity | $\mathbb{E}[\epsilon_i \mid \mathbf{X}] = 0$; the error term is uncorrelated with the predictors. | Endogeneity: a crucial violation in finance (e.g., simultaneity, omitted-variable bias). Leads to biased and inconsistent estimators. |
| 3. No Multicollinearity | $\mathbf{X}^\intercal \mathbf{X}$ is invertible (i.e., no perfect linear relationship between predictors). | Inflated standard errors and unstable coefficient estimates. |
| 4. Homoscedasticity | $\mathrm{Var}(\epsilon_i \mid \mathbf{X}) = \sigma^2$; the error variance is constant across all observations. | Heteroscedasticity: common in finance (e.g., high-return periods often have high volatility). OLS is unbiased, but standard errors are incorrect, leading to invalid inference. |
| 5. No Autocorrelation | $\mathrm{Cov}(\epsilon_i, \epsilon_j \mid \mathbf{X}) = 0$ for $i \ne j$; errors are uncorrelated across observations. | Autocorrelation: common in time-series data (e.g., momentum strategies). OLS is unbiased, but standard errors are incorrect. |
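The multicollinearity assumption can be illustrated numerically: as two predictors become nearly collinear, $\mathbf{X}^\intercal \mathbf{X}$ becomes ill-conditioned and the diagonal of $(\mathbf{X}^\intercal \mathbf{X})^{-1}$, which drives $\mathrm{Var}(\hat{\beta})$, blows up. A small sketch on synthetic data (the 0.01 perturbation scale is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 500

x1 = rng.normal(size=m)
x2_indep = rng.normal(size=m)                  # independent second predictor
x2_collin = x1 + 0.01 * rng.normal(size=m)     # nearly identical to x1

X_good = np.column_stack([np.ones(m), x1, x2_indep])
X_bad = np.column_stack([np.ones(m), x1, x2_collin])

# Condition number of X'X explodes as predictors approach collinearity
cond_good = np.linalg.cond(X_good.T @ X_good)
cond_bad = np.linalg.cond(X_bad.T @ X_bad)

# Diagonal of (X'X)^{-1} scales Var(beta_hat); compare the slope entries
var_good = np.diag(np.linalg.inv(X_good.T @ X_good))
var_bad = np.diag(np.linalg.inv(X_bad.T @ X_bad))
```

Here `var_bad[1]` exceeds `var_good[1]` by several orders of magnitude: the data barely distinguishes the two collinear predictors, so their individual coefficients are estimated very imprecisely.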

Note: The OLS estimator is BLUE under assumptions 1-5. If we add the assumption that $\epsilon \sim N(0, \sigma^2)$, the OLS estimator is also the Maximum Likelihood Estimator (MLE).

III. Model Assessment and Inference

| Term | Formula | Intuition and Relevance |
|---|---|---|
| $R^2$ (Coefficient of Determination) | $1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$ | Proportion of the variance in $Y$ that is predictable from $X$. In finance, a low $R^2$ is common and expected. |
| Adjusted $R^2$ | $1 - \frac{\mathrm{RSS}/(m-p-1)}{\mathrm{TSS}/(m-1)}$ | Penalizes the inclusion of irrelevant predictors; a better measure for comparing models with different numbers of predictors ($p$). |
| Standard Error (SE) of $\hat{\beta}_i$ | $\sqrt{\mathrm{Var}(\hat{\beta}_i)}$ | Used to construct confidence intervals and perform hypothesis tests on individual coefficients. |
| $t$-statistic | $t = \frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)}$ | Used to test the null hypothesis $H_0: \beta_i = 0$. Follows a $t$-distribution with $m-p-1$ degrees of freedom. |
| $F$-statistic | $F = \frac{(\mathrm{TSS}-\mathrm{RSS})/p}{\mathrm{RSS}/(m-p-1)}$ | Used to test the overall significance of the model, $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$. |
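These quantities are all cheap to compute from one fitted regression. A minimal sketch on synthetic data, where the second slope is truly zero so its $t$-statistic should be small while the $F$-statistic for the whole model is large (the coefficient values and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 100, 2

X = np.column_stack([np.ones(m), rng.normal(size=(m, p))])
y = X @ np.array([0.5, 1.5, 0.0]) + rng.normal(size=m)  # beta_2 is truly zero

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

RSS = resid @ resid
TSS = ((y - y.mean()) ** 2).sum()

r2 = 1 - RSS / TSS
adj_r2 = 1 - (RSS / (m - p - 1)) / (TSS / (m - 1))

# Standard errors and t-statistics for each coefficient
sigma2_hat = RSS / (m - p - 1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se

# F-statistic for H0: beta_1 = beta_2 = 0
F = ((TSS - RSS) / p) / (RSS / (m - p - 1))
```

As expected, the $t$-statistic on the irrelevant predictor is far smaller than the one on the true factor, and the adjusted $R^2$ sits slightly below $R^2$ because it charges for the extra parameter.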

IV. Dealing with Violations and Model Selection

Robust Standard Errors

When heteroscedasticity or autocorrelation (or both) is present, the usual OLS standard errors are biased. Heteroscedasticity-Consistent (HC) standard errors (e.g., White's estimator) and Heteroscedasticity-and-Autocorrelation-Consistent (HAC) estimators (e.g., Newey-West) correct the standard errors, allowing valid statistical inference even when the error variance is not constant or the errors are serially correlated.
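White's HC0 estimator replaces $\sigma^2 (\mathbf{X}^\intercal \mathbf{X})^{-1}$ with the sandwich form $(\mathbf{X}^\intercal \mathbf{X})^{-1} \mathbf{X}^\intercal \,\mathrm{diag}(e_i^2)\, \mathbf{X} (\mathbf{X}^\intercal \mathbf{X})^{-1}$. A minimal NumPy sketch on synthetic heteroscedastic data (in practice one would use a library such as statsmodels; the error-variance function here is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1000

x = rng.normal(size=m)
X = np.column_stack([np.ones(m), x])
# Heteroscedastic errors: standard deviation grows with |x|
eps = rng.normal(size=m) * (0.5 + np.abs(x))
y = 1.0 + 2.0 * x + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical (homoscedastic) standard errors
sigma2_hat = resid @ resid / (m - 2)
se_classic = np.sqrt(np.diag(sigma2_hat * XtX_inv))

# White's HC0 sandwich: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv
se_hc0 = np.sqrt(np.diag(cov_hc0))
```

Because the error variance is largest exactly where the regressor has the most leverage, the robust slope standard error comes out larger than the classical one; using the classical value here would overstate significance.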

Regularization Methods (Shrinkage)

These methods address the issue of Multicollinearity and Overfitting by adding a penalty term to the OLS objective function, shrinking the coefficients towards zero. This reduces the variance of the coefficient estimates at the cost of introducing a small bias (Bias-Variance Tradeoff).

| Method | Penalty Term | Objective Function | Effect |
|---|---|---|---|
| Ridge Regression | $\lambda \sum_{j=1}^p \beta_j^2$ (L2 norm) | $\mathrm{RSS} + \lambda \sum_{j=1}^p \beta_j^2$ | Shrinks all coefficients toward zero; effective for multicollinearity. |
| Lasso Regression | $\lambda \sum_{j=1}^p \lvert \beta_j \rvert$ (L1 norm) | $\mathrm{RSS} + \lambda \sum_{j=1}^p \lvert \beta_j \rvert$ | Shrinks some coefficients exactly to zero; performs feature selection and works well for sparse models. |
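Ridge has a closed form analogous to OLS: $\hat{\beta}_{\text{ridge}} = (\mathbf{X}^\intercal \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\intercal \mathbf{y}$ (lasso has no closed form and needs an iterative solver). A minimal sketch on synthetic collinear data, with the intercept omitted for simplicity and $\lambda = 10$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 200, 5

X = rng.normal(size=(m, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=m)   # induce multicollinearity
y = X @ np.array([1.0, 1.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=m)

def ridge(X, y, lam):
    """Closed-form ridge: beta = (X'X + lam*I)^{-1} X'y (no intercept in this sketch)."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

beta_ols = ridge(X, y, 0.0)      # lam = 0 recovers plain OLS
beta_ridge = ridge(X, y, 10.0)   # lam > 0 shrinks the coefficients
```

The ridge coefficient vector always has a smaller norm than the OLS one, and on the collinear pair the penalty tends to split the shared signal between the two coefficients rather than letting them take large offsetting values.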

Bias-Variance Tradeoff

The expected prediction error (EPE) of a model $\hat{f}(x)$ can be decomposed:

$$\mathbb{E}\left[\left(Y - \hat{f}(x)\right)^2\right] = \text{Irreducible Error} + \text{Bias}^2\left[\hat{f}(x)\right] + \mathrm{Var}\left[\hat{f}(x)\right]$$
  • Bias: Error from approximating a real-world function $f$ with a simpler model $\hat{f}$.
  • Variance: Error from the model being too sensitive to the training data.
  • Tradeoff: More complex models (e.g., high-degree polynomials) have low bias but high variance (overfitting). Simpler models (e.g., OLS) have high bias but low variance (underfitting). Regularization methods aim to find the optimal balance.
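The decomposition can be verified by Monte Carlo on the simplest possible example: estimating a mean $\mu$ with the shrunk estimator $\hat{\theta} = c\,\bar{x}$, a stand-in for a regularized coefficient (all numbers here are arbitrary choices). Its bias is $(c-1)\mu$ and its variance is $c^2\sigma^2/n$, and for the values below the biased estimator beats the unbiased sample mean on MSE:

```python
import numpy as np

rng = np.random.default_rng(5)

mu, sigma, n = 0.5, 1.0, 10
c = 0.7                      # shrinkage factor: accepts bias to cut variance
trials = 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
est = c * samples.mean(axis=1)

# Analytic decomposition for theta_hat = c * xbar
bias = (c - 1) * mu                  # systematic error from shrinking
var = c**2 * sigma**2 / n            # sampling variability of the estimator
mse_analytic = bias**2 + var

# Empirical MSE over many resampled datasets
mse_empirical = ((est - mu) ** 2).mean()

mse_unbiased = sigma**2 / n          # MSE of the plain sample mean (c = 1)
```

The empirical MSE matches `bias**2 + var` to Monte Carlo accuracy and is below `mse_unbiased`: trading a little bias for a larger variance reduction lowers total error, which is exactly the mechanism behind ridge and lasso.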
