Neural Network Theory
Table of Contents
Part 1: The Motivation and History of Neural Networks
Part 2: Components of Artificial Neural Networks
Part 3: Particular Types of Neural Network Architectures
Part 4: Fundamentals on Learning and Training Samples
Part 5: Applications of Neural Network Theory and Open Problems
Part 6: Homework
Part 7: Bibliography
Part 1: The Motivation and History of Neural Networks
Motivation
Biologically inspired
The organization of the brain is considered when constructing network configurations and algorithms.
The brain
A human neuron has four elements:
Dendrites – receive signals from other cells
Synapses – where information is stored at the contact points between neurons
Axons – output signals transmitted
Cell body – produces all necessary chemicals for the neuron to function properly
Association to Neural Networks
Artificial neurons have
Input channels
Cell body
Output channel
Synapses, which are simulated with weights
Main Characteristics adapted from Biology
Self-organization and learning capability
Generalization capability
Fault tolerance
The 100-step rule
Experiments showed that a human can recognize the picture of a familiar object or person in about 0.1 seconds. Given a neuron switching time of roughly $10^{-3}$ seconds, this corresponds to only about 100 discrete time steps of parallel processing.
A computer following the von Neumann architecture, by contrast, can do practically nothing in 100 assembler steps.
Word to the wise
We must be careful when comparing the nervous system with a complicated contemporary device. In ancient times the brain was compared to a pneumatic machine, in the Renaissance to a clockwork, and in the early 1900s to a telephone network.
History of Neural Network Theory
1943 - Warren McCulloch and Walter Pitts introduced models of neurological networks
1947 - Pitts and McCulloch indicated a practical field of application for neural networks
1949 - Karl Lashley defended his thesis that brain information storage is realized as a distributed system.
History Continued
1960 - Bernard Widrow and Marcian Hoff introduced the first fast and precise adaptive learning system, which became the first widely commercially used neural network. Hoff later joined Intel Corporation.
1961 - Karl Steinbuch introduced technical realizations of associative memory, which can be seen as predecessors of today's neural associative memories.
1969 - Marvin Minsky and Seymour Papert published a precise analysis of the perceptron, showing that the perceptron model was incapable of representing many important problems, and so deduced that the field would be a research "dead end".
History Part 3
1973 - Christoph von der Malsburg used a neuron model that was non-linear and biologically more motivated.
1974 - Paul Werbos developed, in his Harvard dissertation, a learning procedure called backpropagation of error.
1982 - Teuvo Kohonen described the self-organizing feature maps, also known as Kohonen maps.
1985 - John Hopfield published an article describing a way of finding acceptable solutions for the Travelling Salesman Problem using Hopfield nets.
Simple Example of a neural network
Assume we have a small robot. This robot has n distance sensors from which it extracts input data; each sensor provides a real numeric value at any time. The robot should "sense" when it is about to crash, so it drives until one of its sensors indicates that it is going to collide with an object.
A neural network lets the robot "learn when to stop". We treat the network as a "black box": we do not know its structure, we only regard its behavior in practice. We show the robot situations in which to drive on and situations in which to stop; these are called training samples, and they are taught to the neural network by a learning procedure (an algorithm or a mathematical formula). From these samples the neural network in the robot generalizes and learns when to stop.
Part 2: Components of Artificial Neural Networks
Flynn’s Taxonomy of Computer Design

                       Single instruction stream   Multiple instruction stream   Single program   Multiple program
Single data stream     SISD                        MISD                          -                -
Multiple data stream   SIMD                        MIMD                          SPMD             MPMD

Neural computers are a particular case of the MSIMD (multiple SIMD) architecture.
Simplest case: the algorithmic kernel is an operation of multiplying a large-dimensionality matrix or vector by a vector, as sketched below.
The number of operation cycles in the problem-solving process is determined by the physical entity and complexity of the problem.
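A minimal sketch of that algorithmic kernel (the use of NumPy and the sizes are our assumptions, not from the slides):

```python
import numpy as np

# The kernel in the simplest case: multiply a large-dimensionality matrix by a vector.
rng = np.random.default_rng(seed=0)
W = rng.normal(size=(1024, 1024))   # large-dimensionality matrix
x = rng.normal(size=1024)           # input vector
y = W @ x                           # one highly parallelizable operation
print(y.shape)                      # (1024,)
```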
Neural “Clustering”
A “cluster” is a synchronously functioning group of single-bit processors that has a special organization that is close to the implementation of the main part of the algorithm
This provides solutions to two additional problems:
1) to minimize or eliminate the information interchange between nodes of the neural computer in the process of problem solving
2) to solve weakly formalized problems (e.g. learning for optimal pattern recognition, self-learning clusterization, etc.)
DEFINITIONS
Neurons
Neuron – a nonlinear, parameterized, bounded function $y$:
$y = f(x_1, x_2, \dots, x_n;\ w_1, w_2, \dots, w_p)$
where $\{x_i\}$ are the variables and $\{w_j\}$ are the parameters (or weights) of the function. In the simplest case the values are restricted to $\{0, 1\}$.
The variables of the neuron are often called its inputs, and its value $y$ is its output.
The function $f$ can be parameterized in any appropriate fashion.
The most frequently used potential $v$ is a weighted sum of the inputs with an additional constant term called the "bias":
$v = w_0 + \sum_{i=1}^{n} w_i x_i$
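A minimal sketch of such a neuron in Python (the function name and the logistic choice of $f$ are illustrative assumptions):

```python
import math

def neuron(x, w, w0, f=lambda v: 1.0 / (1.0 + math.exp(-v))):
    """Compute y = f(v) with the potential v = w0 + sum_i w_i * x_i."""
    v = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return f(v)

# Two inputs, logistic activation:
print(neuron(x=[0.5, -1.0], w=[2.0, 1.0], w0=0.1))
```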
Neural Networks
Neural network – a sorted triple (N, V, w):
N is the set of neurons,
V is the set $\{(i, j) \mid i, j \in N\}$ whose elements are called connections between neuron $i$ and neuron $j$,
the function $w: V \rightarrow \mathbb{R}$ defines the weights, where $w_{i,j}$ is the weight of the connection between neuron $i$ and neuron $j$. A literal sketch follows.
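For illustration, the triple (N, V, w) can be written down directly (the values here are made up):

```python
# N: set of neurons; V: set of connections (i, j); w: V -> R, the weight function.
N = {1, 2, 3}
V = {(1, 3), (2, 3)}
w = {(1, 3): 0.8, (2, 3): -0.4}  # w[(i, j)] is the weight between neuron i and j
```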
The Propagation function
Looking at a neuron $j$, we will usually find many neurons with a connection to $j$. For a neuron $j$, the propagation function receives the outputs $o_{i_1}, \dots, o_{i_n}$ of the other neurons $i_1, \dots, i_n$ which are connected to $j$ and transforms them, in consideration of the connecting weights $w_{i,j}$, into the network input $net_j$ that can be further processed by the activation function.
The network input is the result of the propagation function; most commonly it is the weighted sum $net_j = \sum_i o_i \cdot w_{i,j}$, as sketched below.
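Continuing the (N, V, w) sketch above, a weighted-sum propagation function could look as follows (an assumption consistent with the definition, not the only possible choice):

```python
def net_input(j, outputs, w):
    """net_j: combine the outputs o_i of neurons connected to j, weighted by w_ij."""
    return sum(outputs[i] * weight for (i, target), weight in w.items() if target == j)

# With w = {(1, 3): 0.8, (2, 3): -0.4} and outputs o_1 = 1.0, o_2 = 0.5:
print(net_input(3, {1: 1.0, 2: 0.5}, {(1, 3): 0.8, (2, 3): -0.4}))  # 0.6
```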
Threshold function
Neurons get activated when the network input exceeds their threshold value.
Definition: Let $j$ be a neuron. The threshold value $\Theta_j$ is uniquely assigned to $j$ and marks the position of the maximum gradient of the activation function (basically a switching value).
Activation Function
Definition: Let $j$ be a neuron. The activation function is defined as
$a_j(t) = f_{\mathrm{act}}\left(net_j(t),\ a_j(t-1),\ \Theta_j\right)$
It transforms the network input and the previous activation state into a new activation state, with the threshold value $\Theta_j$ playing an important role.
Further properties of the activation function
It is advisable that $f$, the activation function, be a sigmoid function.
Alternatively, the parameters may be assigned to the neuron nonlinearity itself, i.e. they belong to the very definition of the activation function; such is the case when the function $f$ is a radial basis function (RBF) or a wavelet. For instance, the output of a Gaussian RBF is given by
$y = \exp\left(-\dfrac{\lVert x - c \rVert^2}{2\sigma^2}\right)$
where $c$ is the position of the center of the Gaussian and $\sigma$ is its standard deviation.
The main difference between the two categories of neurons is that RBFs and wavelets are local nonlinearities which vanish asymptotically in all directions of input space, whereas neurons that have a potential and a sigmoid nonlinearity have an infinite range of influence along the direction defined by $v = 0$.
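A minimal sketch of the Gaussian RBF output under the formula above (names and sample points are illustrative):

```python
import math

def gaussian_rbf(x, center, sigma):
    """Local nonlinearity: y = exp(-||x - c||^2 / (2 sigma^2)); vanishes far from c."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(gaussian_rbf([0.0, 0.0], [0.0, 0.0], 1.0))  # 1.0 at the center
print(gaussian_rbf([5.0, 5.0], [0.0, 0.0], 1.0))  # ~1.4e-11: asymptotically zero
```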
Optimal Control Theory
Zermelo’s problem and the handout
Example problem
Part 3: Particular Types of Neural Network Architectures
Transfer from logical basis to threshold basis
In the case of neural computers, the logical basis of the computer system in the simplest case is the threshold basis. This basis corresponds maximally to the logical basis of the major solved problems. The neural computer is a maximally parallelized system for a given algorithmic kernel implementation.
The number of operation cycles in the problem-solving process (the number of adjustment cycles for the optimization of the secondary functional) in the neural computer is determined by the physical entity and the complexity of the problem.
Fermi or Logistic Equation and tanh(x)
Fermi or logistic function:
$f(x) = \dfrac{1}{1 + e^{-x}}$
which maps into the range of values (0, 1).
Hyperbolic tangent:
$\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
which maps into (-1, 1). Both are sketched numerically below.
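A small sketch that evaluates both functions from their definitions (the sample points are arbitrary):

```python
import math

def fermi(x):
    """Fermi/logistic function; values lie in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    """tanh from its definition; values lie in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for v in (-5.0, 0.0, 5.0):
    print(v, round(fermi(v), 4), round(hyperbolic_tangent(v), 4))
```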
Neural network with direct connections (diagram)
Neural networks with cross connections (diagram)
Neural networks with ordered backward connections (diagram)
Neural networks with amorphous backward connections (diagram)
Multilayer neural networks with sequential connections (diagram)
Multilayer neural network (diagram)
FeedForward networks
A feedforward neural network is a nonlinear function of its inputs which is the composition of the functions of its neurons.
A feedforward network with $n$ inputs, $N_h$ hidden neurons and $N_o$ output neurons computes $N_o$ nonlinear functions of its $n$ input variables as compositions of the $N_h$ functions computed by the hidden neurons.
Feedforward networks are static: if the input is constant, so is the output.
Feedforward multilayer networks with sigmoid nonlinearities are often termed multilayer perceptrons, or MLPs. A minimal sketch follows.
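A minimal sketch of such a composition with one hidden layer (NumPy, the tanh nonlinearity, and all names are our assumptions):

```python
import numpy as np

def feedforward(x, W_h, b_h, W_o, b_o):
    """Compose N_o output functions from the N_h functions of the hidden neurons."""
    h = np.tanh(W_h @ x + b_h)   # N_h hidden neurons (sigmoid nonlinearity)
    return W_o @ h + b_o         # N_o outputs (linear here)

n, N_h, N_o = 3, 5, 2
rng = np.random.default_rng(seed=0)
y = feedforward(rng.normal(size=n),
                rng.normal(size=(N_h, n)), rng.normal(size=N_h),
                rng.normal(size=(N_o, N_h)), rng.normal(size=N_o))
print(y.shape)  # (2,); the map is static: a constant input always gives the same output
```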
FeedForward network diagram
Completely linked networks (Clique)
Completely linked networks permit connections between all neurons, except for direct recurrences; furthermore, the connections must be symmetric. As a result, every neuron can become an input neuron. (Such a topology is a clique.)
Directed Terms
If the function to be computed by the feedforward neural network is thought to have a significant linear component, it may be useful to add linear terms (called directed terms) to the above structure, as sketched below.
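A sketch of the previous feedforward network with directed terms added (the split into a linear part $a \cdot x$ plus the nonlinear part is the point; all names are illustrative):

```python
import numpy as np

def network_with_directed_terms(x, a, W_h, b_h, w_o):
    """Output = directed (linear) terms a . x + nonlinear feedforward part."""
    h = np.tanh(W_h @ x + b_h)   # hidden layer as before
    return a @ x + w_o @ h       # linear component carried by the directed terms
```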
Recurrent Networks
General form: neural networks that include cycles. Since the output of a neuron cannot be a function of itself at the same instant of time, we must explicitly take time into account; the output can, however, be a function of its own past values. These are considered discrete-time systems.
Each connection of a recurrent neural network is assigned a delay value (possibly equal to zero), in addition to being assigned a weight as in feedforward networks.
Canonical Form of recurrent networks
Governed by recurrent discrete-time equations, the general mathematical description of a linear system is given by the state equations
$x(k+1) = A\,x(k) + B\,u(k)$
$y(k) = C\,x(k) + D\,u(k)$
where $x(k)$ is the state vector at time $k$, $u(k)$ is the input vector at time $k$, $y(k)$ is the output vector at time $k$, and $A$, $B$, $C$, $D$ are matrices.
Property: any recurrent neural network, however complex, can be cast into a canonical form, made of a feedforward neural network some outputs of which (the state outputs) are fed back to the inputs through unit delays. A simulation sketch follows.
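A sketch of iterating those state equations (the particular matrices are arbitrary stand-ins):

```python
import numpy as np

def simulate(A, B, C, D, u_seq, x0):
    """x(k+1) = A x(k) + B u(k); y(k) = C x(k) + D u(k)."""
    x, ys = x0, []
    for u in u_seq:
        ys.append(C @ x + D @ u)   # output at time k
        x = A @ x + B @ u          # state fed back through a unit delay
    return np.array(ys)

A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.eye(2)
C = np.array([[1.0, 0.0]]);             D = np.zeros((1, 2))
print(simulate(A, B, C, D, [np.ones(2)] * 5, np.zeros(2)).ravel())
```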
Canonical form of recurrent network diagram
The order of activation in neural networks
Synchronous activation - all neurons change their values synchronously, i.e. they simultaneously calculate the network inputs, activations and outputs, and pass them on. Closest to biology, most generic, and usable with networks of arbitrary topology.
Random order - a neuron $i$ is randomly chosen and its $net_i$, $a_i$ and $o_i$ are updated. For $n$ neurons, a cycle is the $n$-fold execution of this step. Not always useful. (A sketch of the first two schemes follows this list.)
Random permutation - each neuron is chosen exactly once, but in random order, during one cycle. Rarely used, because it is generally useless and time-consuming.
Topological order of activation - the neurons are updated during one cycle according to a fixed order determined by the network topology.
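A sketch contrasting the first two activation orders (the `update` callback is a hypothetical stand-in for computing $net_i$, $a_i$ and $o_i$):

```python
import random

def synchronous_cycle(state, update):
    """All neurons compute from the same old state and switch simultaneously."""
    return {i: update(i, state) for i in state}

def random_order_cycle(state, update):
    """n random draws; a new value is already visible to later updates in the cycle."""
    for _ in range(len(state)):
        i = random.choice(list(state))
        state[i] = update(i, state)
    return state
```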
When to use Neural Networks
The fundamental property of neural networks with supervised training is the parsimonious approximation property, i.e. their ability to approximate any sufficiently regular function with arbitrary accuracy. Therefore, neural networks may be advantageous in any application that requires finding, in a machine-learning framework, a nonlinear relation between numerical data.
To do so, make sure that:
1) a nonlinear model is necessary;
2) a neural network is preferable to, for instance, a polynomial approximation - typically when the number of variables is large (greater than or equal to 3).
Part 4: Fundamentals on Learning and Training Samples
Theoretically, a neural network could learn by
Developing new connections
Deleting existing connections
Changing connecting weights
Changing the threshold values of neurons
Varying one or more of the three neuron functions (activation, propagation, output)
Developing new neurons
Deleting neurons
The change of connecting weights is the most common learning procedure.
Different types of training
Unsupervised learning - the training set consists only of input patterns; the network tries, by itself, to detect similarities and to generate pattern classes.
Reinforcement learning - the training set consists of input patterns; after completion of a sequence, a value is returned to the network indicating whether the result was right or wrong and, possibly, how right or wrong it was.
Supervised learning - the training set consists of input patterns together with their correct results, so that the network can receive a precise error vector.
Supervised learning steps
Enter input pattern
Forward propagation of the input by the network, generation of the output
Comparing the output with the desired output to provide the error vector
Corrections of the network are calculated based on the error vector
Corrections are applied (see the sketch after this list)
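A minimal sketch of one pass through these steps for a single linear neuron; the delta-rule correction is our assumption, since the slides fix no particular rule:

```python
import numpy as np

def supervised_step(w, x, target, lr=0.1):
    output = w @ x                # forward propagation of the input
    error = target - output      # compare with the desired output (error vector)
    correction = lr * error * x  # correction calculated from the error
    return w + correction        # correction applied

w = np.zeros(2)
for x, t in [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), -1.0)] * 50:
    w = supervised_step(w, x, t)
print(w)  # converges toward [1.0, -1.0]
```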
Error vector
Determined usually by the root mean square error (RMSE).
Minimizing it does not always guarantee the global minimum; training may only find a local minimum.
To calculate the RMSE (see the sketch below):
1) take the error of each data point and square it
2) sum the squared error terms
3) divide by the number of data values
4) take the square root of that value
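A direct transcription of those four steps (the sample values are made up and are not the homework data):

```python
import math

def rmse(data, estimates):
    squared = [(d - e) ** 2 for d, e in zip(data, estimates)]  # steps 1 and 2
    return math.sqrt(sum(squared) / len(squared))              # steps 3 and 4

print(rmse([2.0, 0.0, 1.0], [1.5, 0.5, 1.0]))  # ~0.408
```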
Part 5: Applications of Neural Network Theory and Open Problems
Open problems
Identifying whether a neural network will converge in finite time
Training neural networks to reach global rather than merely local minima
Neural modularity
Applications of Neural Network Theory
Traveling Salesman Problem
Image compression
Character recognition
Optimal control problems
Part 6: Homework
1) Show that the following derivatives can be expressed in terms of the respective functions themselves:
Fermi function: $f'(x) = f(x)\left(1 - f(x)\right)$
Hyperbolic tangent: $\tanh'(x) = 1 - \tanh^2(x)$
Optimal control problem
2)
Find the RMSE of the data set below.

Sample   Data   Estimation
1        1      -1
2        4      7
3        3      1
4        0      -2
5        9      7
6        11     11
7        12     13
8        6      7
9        8      8
10       20     17
11       11     9
12       11     13
13       2      2
14       0      0
15       4      5
16       9      9
17       5      5
18       13     14
19       15     17
20       1      0
Part 7: Bibliography
Works Cited
Dreyfus, G. Neural Networks: Methodology and Applications. Berlin: Springer, 2005. Print.
Galushkin, A. I. Neural Networks Theory. Berlin: Springer, 2007. Print.
Kriesel, David. A Brief Introduction to Neural Networks. Manuscript, n.d. Web. 28 Mar. 2016.
Lenhart, Suzanne, and John T. Workman. Optimal Control Applied to Biological Models. Boca Raton: Chapman & Hall/CRC, 2007. Print.
Ripley, Brian D. Pattern Recognition and Neural Networks. Cambridge: Cambridge UP, 1996. Print.
Rojas, Raúl. Neural Networks: A Systematic Introduction. Berlin: Springer-Verlag, 1996. Print.
Wasserman, Philip D. Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold, 1989. Print.
https://www.researchgate.net/post/What_are_the_most_important_open_problems_in_the_field_of_artificial_neural_networks_for_the_next_ten_years_and_why