Consider the linear regression setting in which you are given a training set $\mathcal{D} := \{ (x_1, y_1), \ldots, (x_N, y_N) \}$ consisting of $N$ input-output pairs, where $y_i$ and $y_j$ are conditionally independent given their inputs $x_i, x_j$. Let $\mathcal{X} := \{x_1, \ldots, x_N\}$ and $\mathcal{Y} := \{ y_1, \ldots, y_N\}$. Our goal is to find the parameters $\theta^*$ for the linear regression model.

One approach for finding these parameters is maximum likelihood estimation (MLE), in which we maximize the predictive distribution of the data given the parameters. We obtain the MLE parameters as:

$$\theta_{MLE} = \arg\max_{\theta} \; p(\mathcal{Y} \mid \mathcal{X}, \theta)$$

To find the parameters $\theta_{MLE}$ we typically perform gradient descent. However, a closed-form solution also exists. Derive the closed-form solution for $\theta_{MLE}$.
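As a numerical sanity check for whatever closed form you derive, the sketch below runs gradient descent on the negative log-likelihood and compares the result with NumPy's least-squares solver. It rests on assumptions that are not stated in the problem: a Gaussian likelihood $p(y_i \mid x_i, \theta) = \mathcal{N}(y_i \mid x_i^\top \theta, \sigma^2)$ with known $\sigma$, and synthetic data. All names (`X`, `y`, `theta_true`, `sigma`, `nll_grad`) are hypothetical.

```python
import numpy as np

# Synthetic data (assumption): y = X @ theta_true + Gaussian noise.
rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
theta_true = np.array([1.5, -2.0, 0.5])
sigma = 0.1
y = X @ theta_true + sigma * rng.normal(size=N)


def nll_grad(theta):
    """Gradient of the Gaussian negative log-likelihood
    (1 / (2 sigma^2)) * ||y - X theta||^2 with respect to theta."""
    return -(X.T @ (y - X @ theta)) / sigma**2


# Gradient descent with step 1/L, where L is the largest eigenvalue
# of the Hessian X^T X / sigma^2 (spectral norm of X, squared).
step = sigma**2 / np.linalg.norm(X, 2) ** 2
theta_gd = np.zeros(D)
for _ in range(500):
    theta_gd = theta_gd - step * nll_grad(theta_gd)

# Reference: NumPy's least-squares solver on the same data.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
```

Under the Gaussian assumption the two estimates should agree to numerical precision, so a derived closed-form expression can be checked against either of them.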

**Hint**: Instead of maximizing the likelihood directly, think about how we can use the log transformation to simplify this derivation.
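
One way to start from the hint, shown as a sketch rather than the full solution: because the logarithm is strictly increasing, it does not change the maximizer, and the conditional independence of the $y_i$ given their inputs turns the likelihood into a product, which the log converts into a sum. The specific form of $p(y_i \mid x_i, \theta)$ is left open here, since the problem does not state it.

$$
\theta_{MLE}
= \arg\max_{\theta} \; \log p(\mathcal{Y} \mid \mathcal{X}, \theta)
= \arg\max_{\theta} \; \log \prod_{i=1}^{N} p(y_i \mid x_i, \theta)
= \arg\min_{\theta} \; -\sum_{i=1}^{N} \log p(y_i \mid x_i, \theta)
$$

From here, substituting the model's likelihood, setting the gradient with respect to $\theta$ to zero, and solving for $\theta$ gives the closed form.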