Definition of an ANN (Artificial Neural Network)

An ANN is completely specified once we define its:

1. Architecture: An ANN is a directed or undirected graph in which each node (or vertex) is called a neuron and each arc (or edge) is called a synaptic connection. Its nodes are divided into three subsets: the input units, the output units and the hidden units. ⇒ For each node we can define an activation function. ⇒ For each arc we can define a weight. An architecture is completely defined by its: Vertices, Edges, Input Units, Output Units, Hidden Units, Weights and Activation Functions.
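As a rough illustration, the seven ingredients listed above can be collected in a single data structure. This is only a sketch: the class name `Architecture`, the method `is_consistent`, the node names and the weight values are all hypothetical, and treating the three subsets as a partition of the vertices is an assumption made here for the consistency check.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass
class Architecture:
    """Hypothetical container for the seven ingredients of an architecture."""
    vertices: Set[str]
    edges: Set[Tuple[str, str]]                       # arcs as (source, target)
    input_units: Set[str]
    output_units: Set[str]
    hidden_units: Set[str]
    weights: Dict[Tuple[str, str], float]             # one weight per arc
    activations: Dict[str, Callable[[float], float]]  # one function per node

    def is_consistent(self) -> bool:
        """Check that the three subsets cover the vertices without overlap
        (an assumption of this sketch), that arcs join known vertices, and
        that every arc has a weight and every node an activation function."""
        units = self.input_units | self.output_units | self.hidden_units
        sizes = (len(self.input_units) + len(self.output_units)
                 + len(self.hidden_units))
        return (units == self.vertices
                and sizes == len(units)
                and all(u in self.vertices and v in self.vertices
                        for u, v in self.edges)
                and set(self.weights) == self.edges
                and set(self.activations) == self.vertices)


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def identity(a: float) -> float:
    return a


# Toy network: two input units, one hidden unit, one output unit.
net = Architecture(
    vertices={"x1", "x2", "h1", "y"},
    edges={("x1", "h1"), ("x2", "h1"), ("h1", "y")},
    input_units={"x1", "x2"},
    output_units={"y"},
    hidden_units={"h1"},
    weights={("x1", "h1"): 0.5, ("x2", "h1"): -0.3, ("h1", "y"): 1.2},
    activations={"x1": identity, "x2": identity, "h1": sigmoid, "y": identity},
)
```

Calling `net.is_consistent()` is a quick way to catch a missing weight or activation function before the network is used.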

1.1. Typical Activation Functions:
⇒ Step function or TLU (Threshold Logic Unit): $f(a) = 1(a), f(a) \in \{0, 1\}$, where $1(a)$ is the unit step (0 for negative $a$, 1 for positive $a$)
⇒ Linear function: $f(a) = a, f(a) \in \mathbb{R}$
⇒ Sigmoid function: $f(a) = \frac{1}{1 + e^{-a}}, f(a) \in (0, 1)$
⇒ Hyperbolic tangent sigmoid: $f(a) = \tanh(a), f(a) \in (-1, 1)$
⇒ Gaussian: $f(a) = N(a; μ, σ^{2}), f(a) \in (0, \frac{1}{σ \sqrt{2π}}]$
⇒ ReLU (Rectified Linear Unit): $f(a) = \max(0, a), f(a) \in [0, +\infty)$
⇒ Leaky ReLU: $f(a) = \max(λ a, a)$, where $λ$ is a small positive slope (typically $0 < λ \ll 1$, e.g. $λ = 0.01$)
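The activation functions above are short enough to write out directly. The sketch below uses only the standard library; the threshold at $a = 0$ for the step function, the default $μ = 0$, $σ = 1$ for the Gaussian and the default $λ = 0.01$ for Leaky ReLU are assumptions chosen for illustration.

```python
import math


def step(a: float) -> float:
    # Unit step; returning 1 at exactly a == 0 is a convention assumed here.
    return 1.0 if a >= 0 else 0.0


def linear(a: float) -> float:
    return a


def sigmoid(a: float) -> float:
    # Plain textbook form; not hardened against overflow for very negative a.
    return 1.0 / (1.0 + math.exp(-a))


def tanh(a: float) -> float:
    return math.tanh(a)


def gaussian(a: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    # Normal density N(a; mu, sigma^2); its peak is 1 / (sigma * sqrt(2 * pi)).
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def relu(a: float) -> float:
    return max(0.0, a)


def leaky_relu(a: float, lam: float = 0.01) -> float:
    # For 0 < lam < 1 this returns a when a >= 0 and lam * a when a < 0.
    return max(lam * a, a)
```

Evaluating each function at a few points, for example `sigmoid(0.0)` or `leaky_relu(-2.0)`, is an easy way to confirm the ranges listed above.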

2. Dynamics: How the signal propagates. The dynamics of an ANN describe how the input data travels from the input units to the output units (how it “propagates”). To specify them, we need to define how the weights interact with the signals coming from the previous units and how the activation function then processes the result. ~Ex.: Let’s take $d$ input values $(x_{1}, x_{2}, \dots, x_{d})$. These values are not transformed by the input units (although this choice can be changed); they then pass through a first hidden layer, where each node $j$ computes something like:

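$$a_{j} = \sum_{i = 1}^{d} w_{ij} o_{i}, \qquad z_{j} = σ(a_{j})$$

Where:
$o_{i}$: output of unit $i$ of the previous layer, for $i$ from $1$ to $d$ (in this case the input, so $o_{i} = x_{i}$)
$w_{ij}$: weight of the connection from unit $i$ to node $j$
$z_{j}$: output of node $j$ belonging to the new layer (in this case the hidden layer)
$σ$: sigmoid function (our chosen activation function)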


This process is then repeated until the signal reaches the output layer.
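The layer-by-layer propagation described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `layer_forward` and `forward`, the layout `W[j][i]` (weight from unit $i$ to node $j$) and the toy weight values are all assumptions, and no bias term is used because the formula above has none.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(o, W, f):
    """Compute z_j = f(sum_i W[j][i] * o[i]) for every node j of a layer.

    o: outputs of the previous layer.
    W: one row of weights per node of the new layer.
    f: activation function of the new layer.
    """
    return [f(sum(w * o_i for w, o_i in zip(row, o))) for row in W]


def forward(x, layers):
    """Propagate the input x through a list of (W, f) layers."""
    o = list(x)  # input units pass the data through unchanged
    for W, f in layers:
        o = layer_forward(o, W, f)
    return o


# Toy network: 2 inputs, 2 sigmoid hidden nodes, 1 linear output node.
x = [1.0, 2.0]
hidden_W = [[0.5, -0.25],   # weights into hidden node 1
            [0.0, 0.0]]     # weights into hidden node 2
output_W = [[1.0, 1.0]]     # weights into the output node
y = forward(x, [(hidden_W, sigmoid), (output_W, lambda a: a)])
```

With these weights both hidden nodes receive $a_{j} = 0$, so each outputs $σ(0) = 0.5$ and the linear output node sums them.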

The dynamics also define the clock (time trigger) of the ANN, which depends on the family of networks it belongs to.

NOTE: The ANN topology specifies the hardware architecture, the values of the weights are the software, and the ANN dynamics (the living machine) represent the running process.

3. Learning: The ANN learns from the examples contained in the training set $τ = \{\underline{z_{1}}, \underline{z_{2}}, \dots, \underline{z_{n}}\}$, which is a continuous-valued data sample drawn from an underlying multivariate pdf. Main learning setups:
⇒ Supervised Learning: $τ = \{(\underline{x_{1}}, \underline{y_{1}}), \dots, (\underline{x_{n}}, \underline{y_{n}})\}$
⇒ Unsupervised Learning: $τ = \{\underline{x_{1}}, \dots, \underline{x_{n}}\}$
⇒ Semi-Supervised Learning: $τ = \{(\underline{x_{1}}, \underline{y_{1}}), \dots, (\underline{x_{n}}, \underline{y_{n}}), \underline{x_{n + 1}}, \dots, \underline{x_{n + m}}\}$. The first $n$ training examples are supervised: each is a pair of an input $x_{i}$ and its output $y_{i}$. The remaining $m$ examples are unsupervised: we know the input $x_{j}$ but not its output.
⇒ Reinforcement Learning: $τ = \{\underline{x_{1}}, \underline{x_{2}}, \dots, (\underline{x_{t}}, \underline{y_{t}}), \underline{x_{t + 1}}, \dots\}$, where a reinforcement signal $y_{t}$, either a penalty or a reward, is given every now and then.
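The four setups differ only in which examples carry a target. A small sketch of the corresponding training sets, with made-up toy values and `None` standing for "no target given" (both are assumptions of this illustration, as is the helper `labelled_fraction`):

```python
from typing import List, Optional, Tuple

# One example is a pair (x, y); y is None when no target or signal is given.
Sample = Tuple[List[float], Optional[float]]

# Supervised: every input comes with its output.
supervised: List[Sample] = [([0.0, 1.0], 1.0), ([1.0, 1.0], 0.0)]

# Unsupervised: inputs only.
unsupervised: List[Sample] = [([0.0, 1.0], None), ([1.0, 1.0], None)]

# Semi-supervised: n labelled examples followed by m unlabelled ones.
semi_supervised: List[Sample] = supervised + [([0.5, 0.2], None)]

# Reinforcement: a reward (> 0) or penalty (< 0) arrives only now and then.
reinforcement: List[Sample] = [
    ([0.0, 1.0], None),
    ([1.0, 0.0], None),
    ([1.0, 1.0], -1.0),   # penalty given at this step
    ([0.2, 0.3], None),
]


def labelled_fraction(tau: List[Sample]) -> float:
    """Fraction of examples in tau that carry a target or reinforcement signal."""
    return sum(y is not None for _, y in tau) / len(tau)
```

Comparing `labelled_fraction` across the four lists makes the difference between the setups concrete: it is 1 for the supervised set, 0 for the unsupervised one, and somewhere in between for the other two.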

3.1. Generalization: Learning is a process of progressive modification of the connection weights, aimed at inferring the (universal) law underlying the data. Far from being a mere memorization of the data, the laws learned this way are expected to generalize to new, previously unseen data; this is called generalization capability.
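To make "progressive modification of the weights" and "generalization" concrete, the sketch below trains a single sigmoid neuron and then measures its accuracy on data it never saw during training. Everything specific here is an assumption chosen for illustration and is not prescribed by these notes: the toy law ($y = 1$ when $x_{1} > x_{2}$), the update rule (stochastic gradient descent on the cross-entropy loss), the learning rate `eta`, the number of epochs and the sample sizes.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict(w, x):
    # Output of a single sigmoid neuron with weights w (no bias term).
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))


def make_data(n, rng):
    # Toy underlying law: the target is 1 when x1 > x2, else 0.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        data.append((x, 1.0 if x[0] > x[1] else 0.0))
    return data


def accuracy(w, data):
    # Fraction of examples whose thresholded output matches the target.
    hits = sum((predict(w, x) > 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
train = make_data(50, rng)    # the training set
test = make_data(200, rng)    # new, previously unseen data

w = [0.0, 0.0]                # initial connection weights
eta = 0.5                     # learning rate (assumed value)
for epoch in range(100):
    for x, y in train:
        err = predict(w, x) - y   # gradient of the cross-entropy loss w.r.t. a
        w = [w_i - eta * err * x_i for w_i, x_i in zip(w, x)]

train_acc = accuracy(w, train)
test_acc = accuracy(w, test)
```

If the neuron had merely memorized the 50 training points, `test_acc` would stay near chance level; a high value on the unseen set is what generalization capability means in practice.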
