Gradient Descent Motivation
Machine Learning has two phases: Learning and Prediction.
Learning: The learning algorithm creates a hypothesis function.
Prediction: The hypothesis function's job is to predict outputs.
The cost function measures the error between predicted and actual outputs. We can tune the learned parameters to minimize this error, and gradient descent is a popular way of doing so.
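For example, a common cost function for regression is the mean squared error. A minimal sketch (mse_cost is an illustrative name, not something defined later in these notes):

import numpy as np

def mse_cost(predictions: np.ndarray, targets: np.ndarray) -> float:
    # Mean squared error between predicted and actual outputs
    return float(np.mean((predictions - targets) ** 2))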
Gradient Descent Algorithm
Goal: Find a local minimum of the cost function.
Start with any values for the learned parameters.
You can pick random values or initialize them all to 0.
Example:
parameters = np.zeros(dimensions + 1)
Update parameters to go downhill on the cost surface.
parameters += learning_rate * (y_i - predict(x_i)) * x_i
Repeat until you reach a local optimum.
If the cost function stops decreasing significantly (and the learning rate α is small), you are typically at a local minimum.
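A rough sketch of that stopping rule, assuming the cost after each iteration is recorded in a list (cost_history and tolerance are illustrative names, not part of the example below):

def has_converged(cost_history: list[float], tolerance: float = 1e-6) -> bool:
    # True once the cost stops decreasing by more than the assumed tolerance
    return len(cost_history) >= 2 and cost_history[-2] - cost_history[-1] < tolerance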
Gradient Descent Example
A linear regression model with a gradient-descent-based learning algorithm:
"""Machine Learning Models"""
import numpy as np


class LinearRegression:
    """Linear Regression"""

    def __init__(self, dimensions: int = 1):
        """init

        Parameters
        ----------
        dimensions
            number of dimensions for linear regression, by default 1
        """
        # Initialize parameters
        self.parameters = np.zeros(dimensions + 1)

    def learn(self, x: np.ndarray, y: np.ndarray, learning_rate: float, max_iter: int):
        """learn maps an input-output training dataset to a hypothesis function

        Parameters
        ----------
        x
            input data
        y
            output data
        learning_rate
            learning rate
        max_iter
            maximum number of iterations, currently equal to total iterations
        """
        # Gradient descent: update parameters once per training example (stochastic)
        for _ in range(max_iter):
            for x_raw, y_i in zip(x, y):
                x_i = self.reconstruct_input_vec(x_raw)
                self.parameters += learning_rate * (y_i - self.predict(x_i)) * x_i

    def reconstruct_input_vec(self, x: np.ndarray | float):
        """Prepend a dummy input, x_0, always defined as 1

        Parameters
        ----------
        x
            input

        Returns
        -------
        (x_0, ..., x_n), x_0 = 1
        """
        return np.concatenate(([1], np.atleast_1d(x)))

    def predict(self, x: np.ndarray):
        """Predict using the linear regression hypothesis function

        Parameters
        ----------
        x
            input

        Returns
        -------
        prediction
        """
        return np.dot(self.parameters, x)
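A quick usage sketch (the data values below are made up for illustration; for data generated from y = 2x + 1 and a small learning rate, the parameters should approach roughly [1, 2]):

model = LinearRegression(dimensions=1)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # data from y = 2x + 1
model.learn(x, y, learning_rate=0.01, max_iter=1000)
print(model.parameters)  # expected to be close to [1.0, 2.0]
print(model.predict(model.reconstruct_input_vec(4.0)))  # expected near 9.0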
Batch Vs Stochastic
Batch gradient descent scans through the entire training set before making a single parameter update, whereas stochastic gradient descent updates the parameters after each input-output pair. Stochastic gradient descent is preferred over batch gradient descent when the dataset is large, since each update is cheap and progress begins immediately. A sketch of the two update rules is shown below.
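A minimal sketch of the difference for the same linear model (the function names and the x_mat layout are assumptions for illustration; the stochastic version is the update that learn() above already performs):

import numpy as np

# Batch: one parameter update per full pass over the training set.
def batch_update(parameters: np.ndarray, x_mat: np.ndarray, y: np.ndarray, learning_rate: float) -> np.ndarray:
    # x_mat rows are reconstructed input vectors (x_0 = 1)
    errors = y - x_mat @ parameters       # error for every training example
    gradient_step = x_mat.T @ errors      # sum of per-example updates
    return parameters + learning_rate * gradient_step

# Stochastic: one parameter update per training example.
def stochastic_update(parameters: np.ndarray, x_i: np.ndarray, y_i: float, learning_rate: float) -> np.ndarray:
    return parameters + learning_rate * (y_i - parameters @ x_i) * x_i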