Linear Regression
Linear Regression forms the basis for models like the Capital Asset Pricing Model (CAPM), factor models, and many trading strategies.
I. Simple and Multiple Linear Regression
Model Formulation
The core assumption is a linear relationship between a dependent variable $Y$ and one or more independent variables $X_1, \dots, X_p$:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$

- $Y$: Dependent variable (e.g., stock return)
- $X_1, \dots, X_p$: Independent variables/predictors (e.g., market return, factors)
- $\beta_0$: Intercept
- $\beta_1, \dots, \beta_p$: Regression coefficients (slopes)
- $\varepsilon$: Error term (residual), representing unmodeled variation
Ordinary Least Squares (OLS) Estimation
OLS finds the coefficients that minimize the Residual Sum of Squares (RSS):

$$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \right)^2$$

Matrix Form (Multiple Regression): Given the $n \times (p+1)$ data matrix $X$ (including a column of ones for the intercept) and the response vector $y$, the OLS estimator is:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$

The variance-covariance matrix of the estimated coefficients is:

$$\text{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$$

where $\sigma^2$ is the variance of the error term, estimated by $\hat{\sigma}^2 = \frac{\text{RSS}}{n - p - 1}$.
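The estimator and its covariance matrix can be computed directly in NumPy. A minimal sketch on simulated data with known coefficients (the data-generating process and all parameter values below are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a two-predictor linear model with known coefficients (illustrative values)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # column of ones = intercept
true_beta = np.array([0.5, 1.2, -0.8])
y = X @ true_beta + rng.normal(scale=0.3, size=n)

# OLS estimator beta_hat = (X'X)^{-1} X'y; np.linalg.solve avoids forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance estimate: RSS / (n - p - 1); note X.shape[1] = p + 1
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])

# Variance-covariance matrix and standard errors of the coefficients
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se_beta = np.sqrt(np.diag(cov_beta))
```

With 500 observations the estimates land close to the true coefficients; in practice `np.linalg.lstsq` or a regression library is preferable to explicit normal equations for ill-conditioned data.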
II. The Gauss-Markov Theorem and OLS Assumptions
The OLS estimator is the Best Linear Unbiased Estimator (BLUE) if the following assumptions (the Gauss-Markov assumptions) hold.
| Assumption | Description | Financial Implication (Violation) |
|---|---|---|
| 1. Linearity | The model is linear in the parameters $\beta$. | Model misspecification (e.g., ignoring non-linear relationships). |
| 2. Strict Exogeneity | $E[\varepsilon \mid X] = 0$: the error term is uncorrelated with the predictors. | Endogeneity: crucial violation in finance (e.g., simultaneity, omitted-variable bias). Leads to biased and inconsistent estimators. |
| 3. No Multicollinearity | $X^\top X$ is invertible (i.e., no perfect linear relationship between predictors). | Inflated standard errors and unstable coefficient estimates. |
| 4. Homoscedasticity | $\text{Var}(\varepsilon_i) = \sigma^2$ for all $i$: the error variance is constant across observations. | Heteroscedasticity: common in finance (e.g., high-return periods often have high volatility). OLS remains unbiased, but standard errors are incorrect, leading to invalid inference. |
| 5. No Autocorrelation | $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$: errors are uncorrelated across observations. | Autocorrelation: common in time series data (e.g., momentum strategies). OLS remains unbiased, but standard errors are incorrect. |
Note: The OLS estimator is BLUE under assumptions 1-5. If we add the assumption that $\varepsilon \sim N(0, \sigma^2 I)$, the OLS estimator is also the Maximum Likelihood Estimator (MLE).
III. Model Assessment and Inference
| Term | Formula | Intuition and Relevance |
|---|---|---|
| $R^2$ (Coefficient of Determination) | $R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$ | Proportion of the variance in $Y$ that is predictable from $X$. In finance, a low $R^2$ is common and expected. |
| Adjusted $R^2$ | $\bar{R}^2 = 1 - \frac{\text{RSS}/(n-p-1)}{\text{TSS}/(n-1)}$ | Penalizes the inclusion of irrelevant predictors; a better measure for comparing models with different numbers of predictors ($p$). |
| Standard Error (SE) of $\hat{\beta}_j$ | $\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(X^\top X)^{-1}\right]_{jj}}$ | Used to construct confidence intervals and perform hypothesis tests on individual coefficients. |
| $t$-statistic | $t_j = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}$ | Used to test the null hypothesis $H_0: \beta_j = 0$. Follows a $t$-distribution with $n - p - 1$ degrees of freedom. |
| $F$-statistic | $F = \frac{(\text{TSS} - \text{RSS})/p}{\text{RSS}/(n-p-1)}$ | Used to test the overall significance of the model, $H_0: \beta_1 = \dots = \beta_p = 0$. |
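These statistics follow directly from the residuals. A short NumPy sketch (simulated data; the coefficient values, including one deliberately irrelevant predictor, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 2

# One informative predictor (beta = 0.9) and one irrelevant predictor (beta = 0)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([0.1, 0.9, 0.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# R^2 = 1 - RSS/TSS and its adjusted version
rss = resid @ resid
tss = ((y - y.mean()) ** 2).sum()
r2 = 1 - rss / tss
adj_r2 = 1 - (rss / (n - p - 1)) / (tss / (n - 1))

# t-statistics: each coefficient divided by its standard error
sigma2_hat = rss / (n - p - 1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se
```

The informative predictor yields a large $|t|$, while the irrelevant one stays near zero, which is exactly what the $t$-test is screening for.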
IV. Dealing with Violations and Model Selection
Robust Standard Errors
When heteroscedasticity or autocorrelation (or both) are present, the OLS standard errors are biased. Heteroscedasticity-Consistent (HC) standard errors (e.g., White's estimator) and heteroscedasticity-and-autocorrelation-consistent (HAC) standard errors (e.g., Newey-West) correct the standard errors, allowing valid statistical inference even when the error variance is not constant. Note that the coefficient estimates themselves are unchanged; only the standard errors are corrected.
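White's HC0 "sandwich" estimator can be computed by hand. A minimal sketch on simulated heteroscedastic data (the volatility-scaling rule for the errors is an illustrative assumption; in practice one would use a library routine such as statsmodels' robust covariance options):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Heteroscedastic data: error volatility grows with |x|, as in stressed markets
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n) * (0.2 + 0.8 * np.abs(x))
y = 1.0 + 2.0 * x + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
resid = y - X @ beta_hat

# Classical OLS standard errors (assume constant error variance)
sigma2 = resid @ resid / (n - 2)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# White's HC0 sandwich: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
meat = X.T @ (X * resid[:, None] ** 2)
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

With this data-generating process the robust slope standard error is substantially larger than the classical one, showing how naive OLS inference would overstate significance.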
Regularization Methods (Shrinkage)
These methods address the issue of Multicollinearity and Overfitting by adding a penalty term to the OLS objective function, shrinking the coefficients towards zero. This reduces the variance of the coefficient estimates at the cost of introducing a small bias (Bias-Variance Tradeoff).
| Method | Penalty Term | Objective Function | Effect |
|---|---|---|---|
| Ridge Regression | $\lambda \sum_{j=1}^{p} \beta_j^2$ (L2 norm) | $\text{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2$ | Shrinks all coefficients toward zero; effective for multicollinearity. |
| Lasso Regression | $\lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$ (L1 norm) | $\text{RSS} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$ | Shrinks some coefficients exactly to zero; performs feature selection and works well for sparse models. |
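Ridge has a closed form, $\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$, which makes the shrinkage easy to demonstrate (lasso has no closed form and is typically fit by coordinate descent, e.g., scikit-learn's `Lasso`). A sketch on deliberately collinear simulated data (the data-generating process and $\lambda$ value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Two nearly collinear predictors -- a textbook case where OLS estimates are unstable
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.5, size=n)

# Center the data so the intercept can be dropped from the penalized problem
X = X - X.mean(axis=0)
y = y - y.mean()

def ridge(X, y, lam):
    """Closed-form ridge estimator: (X'X + lambda*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, 0.0)     # lambda = 0 recovers plain OLS
beta_ridge = ridge(X, y, 10.0)  # shrinkage stabilizes the collinear pair
```

The individual OLS coefficients swing wildly between the two collinear columns, while their sum is well identified; ridge trades a small bias for a much smaller coefficient norm.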
Bias-Variance Tradeoff
The expected prediction error (EPE) of a model $\hat{f}$ at a point $x_0$ can be decomposed:

$$E\left[\left(Y - \hat{f}(x_0)\right)^2\right] = \text{Bias}\left(\hat{f}(x_0)\right)^2 + \text{Var}\left(\hat{f}(x_0)\right) + \sigma^2$$

where $\sigma^2$ is the irreducible error.
- Bias: Error from approximating the true function $f$ with a simpler model $\hat{f}$.
- Variance: Error from the model being too sensitive to the particular training data.
- Tradeoff: More complex models (e.g., high-degree polynomials) have low bias but high variance (overfitting). Simpler models (e.g., OLS) have high bias but low variance (underfitting). Regularization methods aim to find the optimal balance.
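The tradeoff can be made concrete with a small Monte Carlo experiment: refit models of different complexity on many fresh training sets and measure the bias and variance of their predictions at a fixed point. A sketch (the sine target, noise level, and polynomial degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def true_f(x):
    return np.sin(2 * np.pi * x)

def predictions_at(degree, x0, n_train=30, n_sims=200):
    """Refit a polynomial of the given degree on fresh training sets; predict at x0."""
    preds = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(scale=0.3, size=n_train)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    return preds

x0 = 0.25  # evaluation point; true_f(x0) = 1.0
preds_simple = predictions_at(degree=1, x0=x0)   # underfits: high bias, low variance
preds_complex = predictions_at(degree=9, x0=x0)  # overfits: low bias, high variance

bias_simple = abs(preds_simple.mean() - true_f(x0))
bias_complex = abs(preds_complex.mean() - true_f(x0))
var_simple = preds_simple.var()
var_complex = preds_complex.var()
```

The linear fit is stable across training sets but systematically misses the curvature; the degree-9 fit tracks the target on average but its predictions fluctuate far more from one training set to the next.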