Slide1
Introduction to Neural Networks (cont.)
Dr David Wong
(With thanks to Dr Gari Clifford, Georgia Institute of Technology)
Slide2 The Multi-Layer Perceptron
A single-layer perceptron can only deal with linearly separable data.
An MLP is composed of many connected neurons, in three general layers: input (i), hidden (j) and output (k).
Signals are presented to each input 'neuron' or node.
Each signal is multiplied by a learned weighting factor (specific to each connection between each layer), and the result is passed through a global activation function.
This is repeated in the output layer to map the hidden node values to the output. (A minimal forward-pass sketch follows.)
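As an illustration (not from the original slides), here is a minimal NumPy sketch of a forward pass through an MLP with the 6-2-1 shape used in the figures below; the logistic activation and the random weights are assumptions for illustration only.

```python
import numpy as np

def logistic(a):
    """Logistic (sigmoid) activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W_hidden, W_output):
    """Forward pass: input -> hidden -> output, logistic units throughout."""
    hidden = logistic(W_hidden @ x)   # hidden node values
    y = logistic(W_output @ hidden)   # output node value(s)
    return y

# Hypothetical 6-input, 2-hidden, 1-output network (shapes chosen to match the figures)
rng = np.random.default_rng(0)
x = rng.random(6)               # six input signals
W_hidden = rng.random((2, 6))   # one learned weight per input-to-hidden connection
W_output = rng.random((1, 2))   # one learned weight per hidden-to-output connection
print(mlp_forward(x, W_hidden, W_output))
```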
Slide3 Calculating Weights in an MLP
We cannot take the 'simple' approach, as we need to take into account the 'knock-on' effect of multiple layers:
[Figure: a two-layer MLP with logistic units – six inputs x_{1,1} … x_{6,1}, two hidden nodes x_{1,2} and x_{2,2}, and one output Y]
Slide4 Weight update as gradient descent
In the simple example, we updated weights in one layer using gradient descent: w ← w − η ∂E/∂w. More generally, we want a formula to update the weights in all layers.
We still wish to minimise the error, e.g. the squared error E = ½(t − Y)² for target output t (the form used in the worked example linked below).
[Figure: the same 6-2-1 MLP as above]
Slide5 Weight update as gradient descent
If we look at the output layer, we see that this is just a single-layer perceptron – we already know how to adjust the corresponding weights (w_7 and w_8).
We now want to adjust the weights at the next layer back, so we need to calculate ∂E/∂w_i, where i is a number between 1 and 6.
[Figure: the same MLP, with weights w_1 … w_6 labelled on the input-to-hidden connections and w_7, w_8 on the hidden-to-output connections]
Worked example with numbers here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Slide6 Weight update as gradient descent
[Figure: the same weight-labelled MLP]
h_1 = w_1 x_{1,1} + w_2 x_{2,1} + w_3 x_{3,1}  (weighted sum into the first hidden node)
h_2 = w_7 x_{1,2} + w_8 x_{2,2}  (weighted sum into the output node)
By the chain rule: ∂E/∂w_1 = (∂E/∂x_{1,2}) × (∂x_{1,2}/∂h_1) × (∂h_1/∂w_1).
The last factor, ∂h_1/∂w_1, is just x_{1,1}; the first factor we've already worked out for the single layer.
Worked example with numbers here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Slide7 Weight update as gradient descent
The remaining factor, ∂E/∂x_{1,2}, runs through the output node's weighted sum h_2 = w_7 x_{1,2} + w_8 x_{2,2}, so ∂h_2/∂x_{1,2} is just w_7 – and ∂E/∂h_2 is what we've already worked out for the single layer.
The most important thing to note: we have used the answer from the output layer to help us work out the weights for the next layer back.
Hence: backpropagation. (A numeric sketch of one full update follows.)
Worked example with numbers here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
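Here is a minimal NumPy sketch of one backpropagation step for this 6-2-1 network, in the spirit of the Mazur worked example linked above. The input values, initial weights, target, and learning rate are all illustrative assumptions, as is the wiring (hidden node 1 sees inputs 1–3, hidden node 2 sees inputs 4–6, matching the h_1 equation on the slide).

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative values (not from the slides)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])     # inputs x_{1,1} .. x_{6,1}
w = np.array([0.15, 0.2, 0.25, 0.3, 0.35, 0.4])  # w1 .. w6
w7, w8 = 0.45, 0.5
t = 1.0      # target output (assumption)
eta = 0.5    # learning rate (assumption)

# Forward pass
h1 = w[0]*x[0] + w[1]*x[1] + w[2]*x[2]           # into hidden node 1
hb = w[3]*x[3] + w[4]*x[4] + w[5]*x[5]           # into hidden node 2
x12, x22 = logistic(h1), logistic(hb)            # hidden outputs x_{1,2}, x_{2,2}
h2 = w7*x12 + w8*x22                             # into the output node
y = logistic(h2)
E = 0.5 * (t - y)**2

# Output layer: same as the single-layer perceptron update
dE_dh2 = -(t - y) * y * (1 - y)                  # dE/dy * dy/dh2
dE_dw7 = dE_dh2 * x12
dE_dw8 = dE_dh2 * x22

# One layer back: reuse the output-layer result (backpropagation)
dE_dx12 = dE_dh2 * w7                            # dE/dx_{1,2}
dE_dw1 = dE_dx12 * x12 * (1 - x12) * x[0]        # chain rule down to w1

# Gradient-descent updates
w7 -= eta * dE_dw7
w[0] -= eta * dE_dw1
print(E, dE_dw7, dE_dw1)
```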
Slide8 MLP example – MNIST
We will use an MLP to predict digits in the MNIST dataset.
An MLP can achieve approximately 98% accuracy.
http://scienceai.github.io/neocortex/mnist_mlp/
(A training sketch follows.)
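As a hedged sketch (not the slides' own code), here is one way to train a one-hidden-layer MLP on MNIST with scikit-learn; the hidden-layer size and iteration count are illustrative choices, and the exact accuracy will vary.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Fetch the 70,000-image MNIST dataset (downloads on first use)
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=0)

# One hidden layer of 128 units (sizes are illustrative)
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=20, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```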
Slide9 Multilayer Perceptrons for clinical data
Neural network prediction of relapse in breast cancer patients (Tarassenko et al., 1996).
Goal: predict relapse within 3 years.
Features: age, tumour size, no. of nodes, log ER, log EGFR.
Data: 350 patients.
Architecture: 5-N-1 (1 < N < 7), i.e. 1 hidden layer.
Results: 72% classification accuracy.
https://link.springer.com/content/pdf/10.1007/BF01413746.pdf
[Figure: patient data, with relapse cases marked]
Slide10 How many hidden layers?
In theory, a neural network needs only one hidden layer to approximate any continuous function (Cybenko, 1989).
In practice, networks with more layers do better.
Example: https://playground.tensorflow.org
Slide11 Deep Learning
[Figure: proportion of machine learning papers that contain the term 'neural networks']
Popular in the 90s; a dip in the 2000s.
Problem: backpropagation is tricky in highly connected neural nets.
A Fast Learning Algorithm for Deep Belief Nets, Hinton, 2006: http://www.cs.toronto.edu/~fritz/absps/ncfast.pdf
Slide12 Deep Learning and ImageNet
The re-emergence of neural networks.
ImageNet – a large visual database for visual object recognition.
Classify 150K images into 1,000 categories (e.g. Egyptian cat, gazelle, wok, photocopier); 5 guesses allowed per picture.
In 2012, AlexNet, a 'deep' neural network, won by a huge margin (15.3% top-5 error, against 26.2% for the runner-up).
The current best is around 3% error.
Slide13 Deep Learning vs 'shallow' learning
Deep learning uses the same building blocks as normal neural networks – but many more layers!
[Figure: stacked layers of sizes 3×1, 4×1 and 2×1]
Slide14 Why Deep Learning
If any classification function can be learned with 1 hidden layer, why do we need deep learning?
Slide15 Why Deep Learning
If any classification function can be learned with 1 hidden layer, why do we need deep learning?
1. No need to create features. E.g. in your assignment, the images get summarised as 30 pertinent numbers; in deep learning, we simply present the whole image (the array of pixel values) to the neural network.
2. It works better. Cybenko showed that 1 hidden layer was sufficient, but did not show (i) how many units are required or (ii) whether such a network can be trained.
3. It (potentially) simulates vision in a more human-like way. Early layers correspond to primitive features (e.g. straight lines); late layers correspond to higher-level features (e.g. things that look like eyes).
Slide16 AlexNet
Uses ReLU (Rectified Linear Unit) activations rather than logistic units.
Heuristic dropout to selectively ignore neurons.
Overlapping max pooling.
Trained on Graphics Processing Units (GPUs).
Slide17 Convolutional Neural Networks (simplified version of AlexNet)
ReLU vs sigmoid:
ReLU encourages sparsity – sigmoids tend towards, but never quite reach, zero.
For high values of a = Wx + b, the sigmoid gradient diminishes towards zero (the so-called vanishing gradient); ReLU has a constant gradient (see the sketch below).
N.B. it is possible for too many units to go to zero, prohibiting learning.
Quicker to compute: max(0, a).
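To make the gradient contrast concrete, here is a small NumPy sketch (an illustration, not from the slides; the example pre-activation values are arbitrary):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)            # shrinks towards zero for large |a|

def relu(a):
    return np.maximum(0.0, a)       # max(0, a): cheap to compute

def relu_grad(a):
    return (a > 0).astype(float)    # constant gradient of 1 wherever active

a = np.array([0.0, 2.0, 10.0])      # pre-activations a = Wx + b (illustrative)
print("sigmoid':", sigmoid_grad(a)) # [0.25, 0.105, 0.0000454] -> vanishing gradient
print("relu'   :", relu_grad(a))    # [0., 1., 1.] -> constant where the unit is active
```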
Slide18 Convolutional Neural Networks
Max-pooling: a method of downsampling.
Takes the max value in a local neighbourhood.
Overlapping max pooling means that the neighbourhoods overlap (e.g. the area highlighted in red on the slide).
The effect is to 'blur' the image while keeping the pertinent structure – this makes computation faster.
Each successive layer looks at a 'bigger' picture.
[Figure: overlapping max-pooling, 3×3 output: 20 30 30 / 70 70 37 / 112 100 37]
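Here is a minimal NumPy sketch of overlapping max-pooling. The 4×4 input grid is an assumption (it is not in the slide text), chosen because a 2×2 window with stride 1 over it reproduces the 3×3 output shown in the figure above.

```python
import numpy as np

def max_pool(img, size=2, stride=1):
    """Max-pool img with a size x size window; stride < size => overlapping."""
    h, w = img.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = img[r:r+size, c:c+size].max()  # max in the neighbourhood
    return out

# Assumed input grid (illustrative): reproduces the slide's 3x3 output
img = np.array([[ 12,  20, 30,  0],
                [  8,  12,  2,  0],
                [ 34,  70, 37,  4],
                [112, 100, 25, 12]])
print(max_pool(img, size=2, stride=1))
# [[ 20  30  30]
#  [ 70  70  37]
#  [112 100  37]]
```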
Slide19 Convolutional Neural Networks
Convolution. Formally: (f ∗ g)(t) = ∫ f(τ) g(t − τ) dτ.
In 2D, this is broadly equivalent to applying an image filter.
In practice, we treat the convolution mask as another set of parameters to be learned, and bundle it into the back-propagation: the NN learns a 'good' mask. (A sketch of 2D convolution as a filter follows.)
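Here is a minimal NumPy sketch of 2D convolution used as an image filter; the vertical-edge-detection kernel is an illustrative choice (in a CNN, the mask would be learned instead).

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2D convolution: flip the kernel, then slide and sum."""
    kh, kw = kernel.shape
    k = kernel[::-1, ::-1]                    # flip, per the formal definition
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+kh, j:j+kw] * k).sum()
    return out

# A vertical-edge detector (illustrative mask; a CNN learns its own)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
img = np.zeros((5, 6))
img[:, 3:] = 1.0                              # dark left half, bright right half
print(conv2d(img, kernel))                    # strong response at the edge
```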
Slide20 Convolution
Slide21 CNN architecture
1.) Try multiple convolution masks to generate features (initialise these randomly).
2.) Use max pooling to 'shrink' the image.
3.) Repeat – this has the effect of creating hierarchical features.
4.) The candidate features are now put into a 'normal' feed-forward network.
5.) Softmax is used for multi-class classification.
(A sketch of this recipe follows.)
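As a hedged sketch of this five-step recipe (not the slides' own code), here is a small CNN in Keras; the filter counts, kernel sizes, and the MNIST-like 28×28 input shape are illustrative assumptions.

```python
import tensorflow as tf

# Conv -> pool, repeated, then a 'normal' feed-forward head with softmax
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                      # e.g. MNIST-sized images
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # 1.) learned convolution masks
    tf.keras.layers.MaxPooling2D((2, 2)),                   # 2.) shrink the image
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),  # 3.) repeat -> hierarchical features
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),                              # 4.) into a feed-forward network
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),        # 5.) softmax over 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```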
Slide22 Convolutional Neural Networks
Softmax: an extension of the logistic model.
In a standard logistic unit: compute the logit (ax + b), apply the logistic function, threshold, and class as 1 or 0.
For softmax: compute a logit for each class; softmax then gives the relative probability of each class, softmax(z)_i = exp(z_i) / Σ_j exp(z_j). Used for multi-class classification.
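A minimal NumPy softmax sketch; the max-subtraction trick for numerical stability is a standard implementation detail rather than something the slides discuss.

```python
import numpy as np

def softmax(logits):
    """Relative probability of each class: exp(z_i) / sum_j exp(z_j)."""
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # one logit per class (illustrative values)
p = softmax(logits)
print(p, p.sum())                   # probabilities summing to 1; argmax -> predicted class
```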
Slide23
Slide24 Example – identifying hands for Parkinson's diagnosis
Finger tapping test for diagnosing slowness of movement (bradykinesia) in Parkinson's patients.
Get patients to tap their fingers as wide and as fast as possible.
A CNN is used to separate the hand from the background.
Slide25 Example – identifying hands for Parkinson's diagnosis
Then convert each frame of the video into a single number (representing how fast the hand is moving).
Generate features -> reduce dimensions -> classify.
[Figure: normal (blue) vs abnormal (red) patients]
Slide26 Generative Adversarial Networks
Basically, two coupled neural networks:
Generator – initially generates a random signal (or image), which feeds into the
Discriminator – a pre-trained network (e.g. one that recognises cats).
The generator's weights are updated based on how well or poorly it fools the discriminator. (A sketch of the loop follows.)
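Here is a heavily simplified PyTorch sketch of the coupled training loop. Note it follows the common formulation in which the discriminator is trained jointly, rather than fully pre-trained as described on the slide; the network sizes, optimisers, and the stand-in 'real' data are all illustrative assumptions.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))               # generator
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_data = torch.randn(64, 8) * 0.5 + 2.0  # stand-in for real samples (illustrative)

for step in range(1000):
    # Train discriminator: label real samples 1, generated samples 0
    fake = G(torch.randn(64, 16)).detach()
    d_loss = bce(D(real_data), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train generator: update its weights by how well/poorly it fools the discriminator
    fake = G(torch.randn(64, 16))
    g_loss = bce(D(fake), torch.ones(64, 1))  # generator wants D to answer 'real'
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```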
Slide27 Example – https://thispersondoesnotexist.com/
Neural network generator -> produces new samples (faces) from scratch.
Neural network discriminator -> classifies each face as 'real' or 'fake'.
If you want to see whether you can do better than the GAN: http://www.whichfaceisreal.com/