
Neural Networks

Neural Networks (NNs) and Deep Learning (DL) represent a powerful class of non-linear models capable of learning complex patterns and representations directly from data. While historically less prevalent in finance due to their "black-box" nature and data requirements, they are increasingly used for tasks where non-linearity and high-dimensional data are key.

I. Core Architecture and Mechanics

The Neuron and the Network

A neural network is a composition of simple, interconnected units called neurons or nodes, organized in layers.

  • Feedforward Pass: The output of a network is computed by sequentially applying a linear transformation followed by a non-linear activation function f(\cdot) at each layer:
    \mathbf{h}^{(l)} = f^{(l)}(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)})
    where \mathbf{h}^{(l)} is the output of layer l, \mathbf{W}^{(l)} are the weights, and \mathbf{b}^{(l)} are the biases.
  • Universal Approximation Theorem: A feedforward network with a single hidden layer, sufficiently many hidden units, and a non-linear activation function can approximate any continuous function on a compact domain to arbitrary accuracy. This is the theoretical basis for the expressive power of neural networks.
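The feedforward pass above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the 3-4-1 layer sizes and the random weights are hypothetical, chosen only to show the layer-by-layer composition h^(l) = f(W h^(l-1) + b).

```python
import numpy as np

def forward(x, layers):
    """Apply h = f(W h + b) for each (W, b, f) triple in order."""
    h = x
    for W, b, f in layers:
        h = f(W @ h + b)
    return h

relu = lambda z: np.maximum(0.0, z)
identity = lambda z: z

rng = np.random.default_rng(0)
# Hypothetical 3-4-1 network; weights are random for illustration only.
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4), relu),      # hidden layer
    (rng.normal(size=(1, 4)), np.zeros(1), identity),  # output layer
]
y = forward(np.array([1.0, -0.5, 2.0]), layers)
print(y.shape)  # (1,)
```

Each tuple bundles a layer's weights, biases, and activation, so adding depth is just appending to the list.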

Activation Functions

Activation functions introduce the essential non-linearity that allows NNs to model complex relationships.

| Function | Formula | Range | Use Case |
| --- | --- | --- | --- |
| Sigmoid | \sigma(z) = \frac{1}{1 + e^{-z}} | (0, 1) | Output layer for binary classification (probability). Suffers from vanishing gradients. |
| ReLU (Rectified Linear Unit) | \text{ReLU}(z) = \max(0, z) | [0, \infty) | Most common choice for hidden layers. Mitigates the vanishing gradient problem. |
| Softmax | \frac{e^{z_i}}{\sum_j e^{z_j}} | (0, 1) | Output layer for multi-class classification (probabilities sum to 1). |
| Tanh (Hyperbolic Tangent) | \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} | (-1, 1) | Hidden layers. Zero-centered, which is often preferred over Sigmoid. |
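The four activations in the table translate directly into NumPy. One detail worth showing: a numerically stable softmax subtracts the maximum logit before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

def softmax(z):
    # Subtracting the max is for numerical stability; the output is identical.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # each value lies in (0, 1)
print(softmax(z))  # values sum to 1
```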

II. Training the Network

Loss Function and Optimization

Training involves minimizing a Loss Function (or Cost Function) L(\mathbf{y}, \hat{\mathbf{y}}) that measures the discrepancy between the network's prediction \hat{\mathbf{y}} and the true value \mathbf{y}.

  • Regression: Mean Squared Error (MSE).
  • Classification: Cross-Entropy Loss (or Log Loss).
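Both losses are a few lines each; this sketch assumes one-hot labels and predicted class probabilities for the cross-entropy case, with a small epsilon to guard against log(0).

```python
import numpy as np

def mse(y, y_hat):
    """Mean Squared Error for regression targets."""
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, p, eps=1e-12):
    """Cross-entropy for one-hot labels y and predicted probabilities p."""
    return -np.sum(y * np.log(p + eps))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 2.0])))  # 0.125
```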

Backpropagation and Gradient Descent

The network's parameters (\mathbf{W} and \mathbf{b}) are updated iteratively using an optimization algorithm, typically a variant of Stochastic Gradient Descent (SGD).

  • Gradient Descent: Updates parameters in the direction opposite to the gradient of the loss function.
  • Backpropagation: An efficient algorithm for computing the gradient of the loss function with respect to every weight in the network. It uses the chain rule of calculus to propagate the error signal backward from the output layer to the input layer.
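The two bullets above can be made concrete on a tiny network. This is a minimal sketch, not a general implementation: a hypothetical 2-2-1 network with a tanh hidden layer, squared-error loss, and a single (x, y) pair. The chain rule propagates the error signal backward through the tanh (whose derivative is 1 - h^2), and one gradient-descent step moves the weights against the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-2-1 network; sizes and values are illustrative only.
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
x, y = np.array([0.5, -1.0]), np.array([1.0])

def loss(W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)
    y_hat = W2 @ h + b2
    return 0.5 * np.sum((y_hat - y) ** 2)

# Forward pass, then backpropagation via the chain rule.
h = np.tanh(W1 @ x + b1)
y_hat = W2 @ h + b2
delta2 = y_hat - y                       # dL/dy_hat
dW2 = np.outer(delta2, h)                # dL/dW2
delta1 = (W2.T @ delta2) * (1 - h ** 2)  # error pushed back through tanh
dW1 = np.outer(delta1, x)                # dL/dW1

# One gradient-descent step: parameters move opposite to the gradient.
lr = 0.01
W1_new = W1 - lr * dW1
W2_new = W2 - lr * dW2
```

A standard sanity check is to compare each analytic gradient against a finite-difference estimate of the loss, which catches most backpropagation bugs.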

Regularization and Overfitting

Due to the massive number of parameters, NNs are highly susceptible to overfitting.

  • Dropout: A regularization technique where randomly selected neurons are temporarily ignored during training. This prevents co-adaptation of neurons and forces the network to learn more robust features.
  • Early Stopping: Halting the training process when the performance on a separate validation set begins to degrade, even if the loss on the training set is still decreasing.
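Early stopping reduces to tracking the best validation loss seen so far and halting after a fixed number of epochs without improvement (the "patience"). The sketch below assumes two hypothetical callbacks, `step` (runs one training epoch) and `val_loss` (returns the current validation loss); both names are illustrative.

```python
import numpy as np

def train_with_early_stopping(step, val_loss, patience=5, max_epochs=100):
    """Halt when the validation loss has not improved for `patience` epochs."""
    best, best_epoch = np.inf, 0
    for epoch in range(max_epochs):
        step()                    # one training epoch (hypothetical callback)
        v = val_loss()            # current validation loss
        if v < best:
            best, best_epoch = v, epoch
        elif epoch - best_epoch >= patience:
            break                 # validation performance stopped improving
    return best
```

Note that training loss may still be falling when the loop exits; that gap between training and validation curves is exactly the overfitting the technique guards against.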

III. Specialized Architectures for Finance

The choice of architecture depends heavily on the structure of the financial data.

| Architecture | Data Type | Financial Application | Rationale |
| --- | --- | --- | --- |
| Feedforward Neural Networks (FNN) | Tabular data (cross-sectional features). | Credit scoring, bond rating prediction, factor selection. | Simple and effective for non-linear feature combinations. |
| Recurrent Neural Networks (RNN) / LSTM / GRU | Sequential data (time series). | High-frequency trading, volatility forecasting, long-term price prediction. | Designed to handle sequential dependencies and memory effects in time series. |
| Convolutional Neural Networks (CNN) | Image-like data (e.g., heatmaps of order book data, spectrograms of audio data). | Analyzing market microstructure patterns, processing satellite imagery for economic indicators. | Excellent at extracting local spatial features. |
| Autoencoders | High-dimensional data. | Dimensionality reduction, anomaly detection (e.g., identifying fraudulent transactions or market dislocations). | Learns a compressed representation of the input data. |
