MNIST Example:
Starting with an example:
After downloading the MNIST training and test data sets from the official site, I read the .gz files and convert them into NN-friendly numpy arrays, using the functions get_numpy_matrices_from_MNIST_set and get_numpy_array_from_MNIST_labels.
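A minimal sketch of this step (the file names and the flattening/scaling of the images are assumptions, the original preprocessing is not shown here):
training_images = get_numpy_matrices_from_MNIST_set("MNIST/train-images-idx3-ubyte.gz")
training_labels = get_numpy_array_from_MNIST_labels("MNIST/train-labels-idx1-ubyte.gz")
test_images = get_numpy_matrices_from_MNIST_set("MNIST/t10k-images-idx3-ubyte.gz")
test_labels = get_numpy_array_from_MNIST_labels("MNIST/t10k-labels-idx1-ubyte.gz")
# the NN works on 2-D matrices, so each 28x28 image becomes a 784-element row
training_images = training_images.reshape(training_images.shape[0], -1) / 255
test_images = test_images.reshape(test_images.shape[0], -1) / 255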
Then, using my custom class General_Backpropagation_NN, I wrote a function create_NN that creates an empty NN where only the cost function and the activation function I want to use are defined. For this example I opted for the cost function cross_entropy_loss and the activation function ReLU.
How to create a new Cost Function
How to create a new Activation Function
I had to create an empty NN first because the MNIST dataset provides two separate sets, one for training and one for testing.
Then, using one-hot encoding, I created two dictionaries: one that converts the labels into their one-hot classes and another that does the opposite.
I then classified the labels with the function classify_label, according to the one-hot dictionary just created.
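A minimal sketch of these two steps (variable names are assumptions, building on the arrays loaded above):
value_to_class, class_to_value = create_classification_dict(training_labels)
training_set = (training_images, classify_label(training_labels, value_to_class))
test_set = (test_images, classify_label(test_labels, value_to_class))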
Lastly, I create the NN and manually set the training set, validation set, and test set:
NN = create_NN(
cost_function = mp.Machine_Learning.Cost_Functions.cross_entropy_loss,
activation_function = mp.Machine_Learning.Activation_Functions.ReLU
)
m = NN.number_of_examples = training_set[0].shape[0]
ceil = math.ceil
NN.validation_set = tuple((training_set[0][ceil(m*0.8):], training_set[1][ceil(m*0.8):]))
NN.training_set = tuple((training_set[0][:ceil(m*0.8)], training_set[1][:ceil(m*0.8)]))
NN.test_set = test_set
For context, we start with:
STARTING ACCURACY ON TEST SET: 7.6499999999999995%
And after training for 100 epochs with a learning rate of 0.001 using this function:
NN.train_for_n_epochs(
n_epochs = 100,
learning_rate = 0.001,
stop_execution = False,
show_plot = True
)
We end with an accuracy of:
ACCURACY ON TEST SET: 11.3%
I tried increasing n_epochs to … and the accuracy increased to around …
General_Backpropagation_NN
A class to create a simple backpropagation neural network, given labeled_inputs and labeled_outputs. By default the NN uses the cost function logistic_loss and the activation function sigmoidal, but they can be changed.
How to create a new Cost Function
How to create a new Activation Function
It is also possible to define a stopping criterion based on the errors on the training and validation sets during training.
The package already provides some useful functions that can be used when creating the NN:
- Activation Functions:
- Cost_Functions:
- Stopping Criteria
The code for the class is divided into:
- Class Initialization: [[#init|init]]
- Methods:
Activation Functions:
How to create a new Activation Function
All functions receive the same inputs:
- a: dictionary with a[0]: inputs, a[1]: first layer outputs, a[2]: second layer outputs, …
- w: dictionary with w[1]: weights of the first layer, w[2]: weights of the second layer, …
- b: dictionary with b[1]: biases of the first layer, b[2]: biases of the second layer, …
- i: index of the layer
The output of the function "formula" corresponds to the output of layer i.
The output of the function "derivate" corresponds to the derivative sigma'(z) of the activation at layer i.
NOTE: A new Activation Function file is expected to contain two functions, one called formula and the other called derivate.
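For instance, a hypothetical leaky ReLU file (not part of the package, the 0.01 leak factor is an arbitrary choice) would follow the same pattern as the built-in examples below:
# a = sigma(z)
# define here sigma
def formula(a, w, b, i):
    ones_vector = numpy.ones((1, a[i-1].shape[0]))
    z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
    z[z <= 0] *= 0.01
    return z
# a = sigma(z)
# define here sigma'(z)
def derivate(a, w, b, i):
    ones_vector = numpy.ones((1, a[i-1].shape[0]))
    z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
    dz = numpy.ones(z.shape)
    dz[z <= 0] = 0.01
    return dz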
hyperbolical
# a = sigma(z)
# define here sigma
def formula(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
return numpy.tanh(z)
def sech(x): return 1/numpy.cosh(x)
# a = sigma(z)
# define here sigma'(z)
def derivate(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
return sech(z)**2
ReLU
# a = sigma(z)
# define here sigma
def formula(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
z[z <= 0] = 0
return z
# a = sigma(z)
# define here sigma'(z)
def derivate(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
dz = numpy.ones(z.shape)
dz[z <= 0] = 0
return dz
sigmoidal
# a = sigma(z)
# define here sigma
def formula(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
return ML.sigmoid(z)
# a = sigma(z)
# define here sigma'(z)
def derivate(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
return ML.sigmoid(z) * (1 - ML.sigmoid(z))
sinusoidal
# a = sigma(z)
# define here sigma
def formula(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
return numpy.sin(z)
# a = sigma(z)
# define here sigma'(z)
def derivate(a, w, b, i):
ones_vector = numpy.ones((1, a[i-1].shape[0]))
z = a[i-1] @ w[i].T + (b[i] @ ones_vector).T
return numpy.cos(z)
Cost Functions:
How to create a new Cost Function
As for the activation functions, all cost functions receive the same inputs:
- y: numpy.ndarray with the real outputs
- y_hat: numpy.ndarray with the evaluated outputs given by the NN
Define in a function how the cost is computed; below are some examples:
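As an illustration of the expected signature, here is a sketch of a hypothetical huber_loss (not included in the package; delta is an extra keyword argument with a default, so the NN can still call it with only y and y_hat). The package's own examples follow:
def huber_loss(y, y_hat, delta = 1.0):
    # quadratic near the target, linear far from it
    error = numpy.absolute(y - y_hat)
    loss = numpy.zeros(y.shape)
    small = error <= delta
    loss[small] = 0.5 * error[small]**2
    loss[~small] = delta * (error[~small] - 0.5 * delta)
    return loss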
cross_entropy_loss
def cross_entropy_loss(y, y_hat):
loss = numpy.zeros(y.shape)
assert numpy.all((y == 0) | (y == 1))
assert numpy.all(y_hat >= 0)
loss[y==1] = - myNP.log(numpy.absolute(y_hat[y==1]))
return loss
logistic_loss
def logistic_loss(y, y_hat):
loss = numpy.zeros(y.shape)
assert numpy.all((y == 0) | (y == 1))
assert numpy.all(y_hat >= 0)
loss[y==0] = - myNP.log(numpy.absolute(1 - y_hat[y==0]))
loss[y==1] = - myNP.log(numpy.absolute(y_hat[y==1]))
return loss
simple_loss
def simple_loss(y, y_hat):
return numpy.absolute(y - y_hat)
min_square_loss
def min_square_loss(y, y_hat):
return (y - y_hat)**2
Stopping Criteria
How to create a new Stopping Criteria
The inputs for a stopping criterion function are:
- J_training_history: tuple with all the costs on the training set as the training of the NN progresses
- J_validation_history: tuple with all the costs on the validation set
The function is expected to return a bool:
True if the criterion is met and the training of the NN needs to stop, False otherwise.
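For example, a hypothetical early-stopping criterion (not included in the package) that stops when the validation error has been strictly increasing over the last 50 recorded costs could look like this; the built-in criteria follow:
def validation_error_increasing_for_50_iterations(J_training_history, J_validation_history):
    if len(J_validation_history) < 51: return False
    last = numpy.array(J_validation_history[-51:])
    # criterion is met only if every step of the last 50 increased the validation error
    return bool(numpy.all(numpy.diff(last) > 0))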
no_stopping_criteria
def no_stopping_criteria(J_training_history, J_validation_history):
return False
same_after_1000_iterations
def same_after_1000_iterations(J_training_history, J_validation_history):
if len(J_training_history) < 1000: return False
J_validation_history = numpy.array(J_validation_history[-1000:])
J_validation_history -= J_validation_history[0]
if numpy.all(numpy.absolute(J_validation_history) <= 0.01): return True
else: return False
same_after_1000_iterations_variation
The only difference from the stopping criterion above is an extra check on the last costs: if the last training-set cost is greater than the last validation-set cost, the criterion is not met and training continues.
def same_after_1000_iterations_variation(J_training_history, J_validation_history):
if len(J_training_history) < 1000: return False
if J_training_history[-1] > J_validation_history[-1]: return False
J_validation_history = numpy.array(J_validation_history[-1000:])
J_validation_history -= J_validation_history[0]
if numpy.all(numpy.absolute(J_validation_history) <= 0.01): return True
else: return False
Functions used:
__init__
Arguments:
- [Required] labeled_inputs: numpy.array
- [Required] labeled_outputs: numpy.array
- (Optional) hidden_layers_sizes: tuple
- (Optional) cost_function: function
- (Optional) activation_function: function
- (Optional) activation_function_formula: function
- (Optional) activation_function_derivate: function
- (Optional) THETA: dict
- (Optional) range_for_THETA_init: tuple
Jobs of __init__:
- Assert that the types of all the arguments correspond.
- Assign default values if some are not passed.
- Create a Random THETA Matrix.
NOTE: THETA is not exactly a matrix, it's a dictionary of matrices where each item corresponds to the matrix containing the weights and the biases of a layer.
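For example, for a hypothetical network with 3 inputs, one hidden layer of 4 units and 2 outputs, the shapes produced by create_random_THETA would be (the extra column holds the bias of each unit):
# THETA[1].shape == (4, 3 + 1)   # hidden layer: one row per unit, 3 weights + 1 bias
# THETA[2].shape == (2, 4 + 1)   # output layer: one row per unit, 4 weights + 1 bias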
def __init__(self,
labeled_inputs, labeled_outputs, hidden_layers_sizes = tuple(),
*,
cost_function = logistic_loss,
activation_function = sigmoidal,
activation_function_formula = None,
activation_function_derivate = None,
THETA = None,
range_for_THETA_init = (-10, +10),
):
assert type(labeled_inputs) is numpy.ndarray
assert type(labeled_outputs) is numpy.ndarray
assert type(hidden_layers_sizes) is tuple
assert type(range_for_THETA_init) is tuple
assert labeled_inputs.shape[0] == labeled_outputs.shape[0]
m = labeled_inputs.shape[0]
self.training_set = (labeled_inputs[:round(m*0.6)], labeled_outputs[:round(m*0.6)])
self.validation_set = (
labeled_inputs[round(m*0.6):round(m*0.8)],
labeled_outputs[round(m*0.6):round(m*0.8)]
)
self.test_set = (labeled_inputs[round(m*0.8):], labeled_outputs[round(m*0.8):])
self.number_of_examples = m
self.cost_function = cost_function
if activation_function_formula is None:
self.activation_function_formula = activation_function.formula
else: self.activation_function_formula = activation_function_formula
if activation_function_derivate is None:
self.activation_function_derivate = activation_function.derivate
else: self.activation_function_derivate = activation_function_derivate
if THETA is None:
self.THETA = self.create_random_THETA(
input_shape = labeled_inputs.shape[1],
hidden_layers_sizes = hidden_layers_sizes,
output_shape = labeled_outputs.shape[1],
range_for_THETA_init = range_for_THETA_init,
)
else: self.THETA = THETA
create_random_THETA
Given the input_shape, output_shape and hidden_layers_sizes, as well as the (Optional) parameter range_for_THETA_init, it returns the THETA dictionary.
def create_random_THETA(
NN,
input_shape : int,
hidden_layers_sizes : tuple,
output_shape : int,
range_for_THETA_init = (-10,10)
):
THETA = dict()
if len(hidden_layers_sizes) >= 1:
number_of_cols = input_shape + 1
for i in range(len(hidden_layers_sizes)):
number_of_rows = hidden_layers_sizes[i]
THETA[i+1] = myNP.create_random_matrix_of_size(
number_of_rows, number_of_cols,
range_of_values = range_for_THETA_init,
)
number_of_cols = hidden_layers_sizes[i] + 1
number_of_rows = output_shape
THETA[i+2] = myNP.create_random_matrix_of_size(
number_of_rows, number_of_cols,
range_of_values = range_for_THETA_init,
)
else:
number_of_rows = output_shape
number_of_cols = input_shape + 1
THETA[1] = myNP.create_random_matrix_of_size(
number_of_rows, number_of_cols,
range_of_values = range_for_THETA_init,
)
return THETA
feedforward_propagation
Basic formula of feedforward_propagation: given some inputs, it returns a dictionary with the outputs of each layer (a).
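In formula form (my reading of the code below, writing X for the inputs with one example per row, W_i and b_i for the weights and biases of layer i, and sigma for the activation function):
$$ a^{(0)} = X, \qquad z^{(i)} = a^{(i-1)} W_i^{\top} + \mathbf{1}\, b_i^{\top}, \qquad a^{(i)} = \sigma\!\left(z^{(i)}\right) $$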
def feedforward_propagation(NN, inputs):
a = dict()
z = dict()
w, b = ML.separate_weights_and_biases(NN.THETA)
ndim_ = max(w[1].ndim, inputs.ndim)
inputs = myNP.normalize_matrix_dimensions(inputs, ndim_)
if inputs.shape[-1] != w[1].shape[1]: inputs = inputs.T
a[0] = inputs
for i in w.keys():
w[i] = myNP.normalize_matrix_dimensions(w[i], ndim_)
b[i] = myNP.normalize_matrix_dimensions(b[i], ndim_)
a[i] = NN.activation_function_formula(a, w, b, i)
return a
backpropagation
Basic formula of backpropagation for updating the matrix THETA; the full implementation and explanation can be found here and here.
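A sketch of my reading of the updates implemented below (⊙ is the element-wise product, η the learning rate, 1 a column of ones; the derivative of the cost with respect to the last layer's output is approximated with a finite difference of step h):
$$ \frac{\partial J}{\partial a^{(L)}} \approx \frac{J\!\left(y,\, a^{(L)}+h\right) - J\!\left(y,\, a^{(L)}\right)}{h}, \qquad \frac{\partial J}{\partial z^{(L)}} = \frac{\partial J}{\partial a^{(L)}} \odot \sigma'\!\left(z^{(L)}\right), \qquad \frac{\partial J}{\partial z^{(i)}} = \left(\frac{\partial J}{\partial z^{(i+1)}}\, W_{i+1}\right) \odot \sigma'\!\left(z^{(i)}\right) $$
$$ \frac{\partial J}{\partial W_i} = \left(\frac{\partial J}{\partial z^{(i)}}\right)^{\!\top} a^{(i-1)}, \qquad \frac{\partial J}{\partial b_i} = \left(\frac{\partial J}{\partial z^{(i)}}\right)^{\!\top} \mathbf{1}, \qquad \Theta_i \leftarrow \Theta_i - \eta \begin{bmatrix}\frac{\partial J}{\partial b_i} & \frac{\partial J}{\partial W_i}\end{bmatrix} $$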
def backpropagation(NN, learning_rate, expected_outputs, a):
THETA = copy.deepcopy(NN.THETA)
L = last_layer = max(a.keys())
h = numpy.ones(a[L].shape) * max(0.001 * numpy.mean(a[L]), 10**-30)
y = expected_outputs
dJ_da = (NN.cost_function(y, a[L] + h) - NN.cost_function(y, a[L])) / h
dz_dw = dz_da = dJ_dw = 0
dJ_dz = dict()
w, b = separate_weights_and_biases(THETA)
for i in range(L, 0, -1):
da_dz = NN.activation_function_derivate(a, w, b, i)
dz_dw = a[i-1]
if i == L: dJ_dz[i] = dJ_da * da_dz
else:
dz_da = w[i+1]
dJ_dz[i] = dJ_dz[i+1] @ dz_da * da_dz
dJ_dw = dJ_dz[i].T @ dz_dw
dJ_db = dJ_dz[i].T @ numpy.ones((dJ_dz[i].shape[0],1))
THETA[i] -= learning_rate * numpy.concatenate((dJ_db.T, dJ_dw.T)).T
return THETA
calculate_error
Uses the cost_function declared by the user, if any, else the default one, to calculate the loss of each evaluated output (y_hat) with respect to the real output (y).
def calculate_error(NN, y = None, y_hat = None, cost_function = None):
if y is None and y_hat is None:
y = NN.training_set[1]
y_hat = NN.evaluate(NN.training_set[0])
if cost_function is None: cost_function = NN.cost_function
return numpy.mean(cost_function(y, y_hat))
train_for_one_epoch
Arguments:
- (Optional) learning_rate: float
- (Optional) number_of_inputs_per_batch: int
To train the NN with a stochastic (mini-batch) approach, pass the argument number_of_inputs_per_batch: the inputs are split into batches, each containing at most number_of_inputs_per_batch examples.
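For example (the batch size of 32 is an arbitrary choice):
NN.train_for_one_epoch(learning_rate = 0.1, number_of_inputs_per_batch = 32)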
def train_for_one_epoch(NN, learning_rate = 0.5, number_of_inputs_per_batch = None):
if number_of_inputs_per_batch is None: number_of_inputs_per_batch = len(NN.training_set[0])
i = 0
while (i+1)*number_of_inputs_per_batch <= len(NN.training_set[0]):
range_of_batch = range(i*number_of_inputs_per_batch, (i+1)*number_of_inputs_per_batch)
NN.THETA = NN.backpropagation(
learning_rate = learning_rate,
expected_outputs = NN.training_set[1][range_of_batch],
a = NN.feedforward_propagation(NN.training_set[0][range_of_batch])
)
i += 1
if i*number_of_inputs_per_batch < len(NN.training_set[0]):
range_of_batch = range(i*number_of_inputs_per_batch, len(NN.training_set[0]))
NN.THETA = NN.backpropagation(
learning_rate = learning_rate,
expected_outputs = NN.training_set[1][range_of_batch],
a = NN.feedforward_propagation(NN.training_set[0][range_of_batch])
)
train_for_n_epochs
Arguments:
- [Required] n_epochs: int
- (Optional) learning_rate: float
- (Optional) number_of_inputs_per_batch: int
- (Optional) stopping_criteria: function
- (Optional) show_plot: bool
- (Optional) plot_N_times: int - Determines how many times the function plot(...) is called, since plotting takes time.
- (Optional) stop_execution: bool - If True, the execution is stopped until the plot window is closed.
def no_stopping_criteria(J_training_history, J_validation_history):
return False
def train_for_n_epochs(
NN,
n_epochs,
learning_rate = 0.5,
number_of_inputs_per_batch = None,
*,
stopping_criteria = no_stopping_criteria,
show_plot = True,
plot_N_times = 100,
stop_execution = True,
):
Plot = new_plot4()
Plot.legend[0] = "TRAINING_ERROR"
Plot.legend[1] = "VALIDATION_ERROR"
if show_plot == True: Plot.show()
buffer_error_trn = tuple()
buffer_error_val = tuple()
J_training_history = tuple()
J_validation_history = tuple()
y_training = NN.training_set[1]
y_validation = NN.validation_set[1]
y_hat_training = NN.evaluate(NN.training_set[0])
y_hat_validation = NN.evaluate(NN.validation_set[0])
min_validation_error = NN.calculate_error(y = y_validation, y_hat = y_hat_validation)
best_THETA = copy.deepcopy(NN.THETA)
for i in range(n_epochs+1):
NN.train_for_one_epoch(learning_rate, number_of_inputs_per_batch)
y_hat_training = NN.evaluate(NN.training_set[0])
y_hat_validation = NN.evaluate(NN.validation_set[0])
training_error = NN.calculate_error(y = y_training, y_hat = y_hat_training)
validation_error = NN.calculate_error(y = y_validation, y_hat = y_hat_validation)
if validation_error < min_validation_error: min_validation_error = validation_error; best_THETA = copy.deepcopy(NN.THETA)
if show_plot == True and (n_epochs <= plot_N_times or (i % round(n_epochs/plot_N_times) == 0)):
Plot.add_data(buffer_error_trn, buffer_error_val)
buffer_error_trn = (training_error,)
buffer_error_val = (validation_error,)
else:
buffer_error_trn += (training_error,)
buffer_error_val += (validation_error,)
J_training_history += (training_error,)
J_validation_history += (validation_error,)
if stopping_criteria(J_training_history, J_validation_history) == True: break
if show_plot == True and stop_execution == True: Plot.end()
NN.THETA = copy.deepcopy(best_THETA)
return best_THETA
evaluate
Returns the last layer output of the NN.
def evaluate(NN, inputs):
a = NN.feedforward_propagation(inputs)
return a[max(a.keys())]
get_numpy_matrices_from_MNIST_set
def get_numpy_matrices_from_MNIST_set(file_path):
with gzip.open(file_path, 'r') as f:
# first 4 bytes is a magic number
magic_number = int.from_bytes(f.read(4), 'big')
# second 4 bytes is the number of images
image_count = int.from_bytes(f.read(4), 'big')
# third 4 bytes is the row count
row_count = int.from_bytes(f.read(4), 'big')
# fourth 4 bytes is the column count
column_count = int.from_bytes(f.read(4), 'big')
# rest is the image pixel data, each pixel is stored as an unsigned byte
# pixel values are 0 to 255
image_data = f.read()
images = numpy.frombuffer(image_data, dtype=numpy.uint8)\
.reshape((image_count, row_count, column_count))
return images
get_numpy_array_from_MNIST_labels
def get_numpy_array_from_MNIST_labels(file_path):
with gzip.open(f"{FOLDER_NAME}/{TRAINING_LABELS_NAME}", 'r') as f:
# first 4 bytes is a magic number
magic_number = int.from_bytes(f.read(4), 'big')
# second 4 bytes is the number of labels
label_count = int.from_bytes(f.read(4), 'big')
# rest is the label data, each label is stored as unsigned byte
# label values are 0 to 9
label_data = f.read()
labels = numpy.frombuffer(label_data, dtype=numpy.uint8)
return labels
create_NN
def create_NN(cost_function, activation_function):
inputs = numpy.array(((0,0),(0,0)))
outputs = numpy.array(((0,0),(0,0)))
return mp.Classes.General_Backpropagation_NN(
inputs, outputs,
cost_function = cost_function,
activation_function = activation_function,
)
Dictionaries for one-hot encoding
def create_classification_dict(training_labels):
training_labels = convert_numpy_array_to_tuple(training_labels)
value_to_class = dict()
class_to_value = dict()
index = 0
for value in training_labels:
if value not in value_to_class.keys():
index += 1
value_to_class[value] = index
class_to_value[index] = value
return value_to_class, class_to_value
classify_label
def classify_label(labels_array, value_to_class_dictionary):
assert type(labels_array) is numpy.ndarray
labels_tuple = convert_numpy_array_to_tuple(labels_array)
class_array = numpy.zeros((labels_array.shape[0], max(value_to_class_dictionary.values()) + 1))
for i in range(labels_array.shape[0]):
value_ = labels_tuple[i]
class_ = value_to_class_dictionary[value_]
class_array[i][class_] = 1
return class_array
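To map the NN output back to the original labels, one can combine evaluate with the class_to_value dictionary; a hypothetical helper (not part of the package) could look like this:
def declassify_outputs(outputs_array, class_to_value_dictionary):
    # column 0 is unused because classes start at 1 in create_classification_dict
    predicted_classes = numpy.argmax(outputs_array[:, 1:], axis = 1) + 1
    return numpy.array([class_to_value_dictionary[int(c)] for c in predicted_classes])

# e.g. predicted_digits = declassify_outputs(NN.evaluate(test_set[0]), class_to_value)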