Presentation Transcript

Slide1

Introduction to Neural Networks (cont.)

Dr David Wong

(With thanks to Dr Gari Clifford, G.I.T)

Slide2

The Multi-Layer Perceptron

A single-layer perceptron can only deal with linearly separable data

Composed of many connected neurons

Three general layers: input (i), hidden (j) and output (k).
Signals are presented to each input 'neuron' or node.
Each signal is multiplied by a learned weighting factor (specific to each connection between the layers), and the weighted sum is passed through an activation function.
This is repeated in the output layer to map the hidden node values to the output.
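A minimal NumPy sketch of this forward pass, assuming a 6-input, 2-hidden-unit, 1-output network with logistic activations (the layer sizes, weight names and random data are illustrative, chosen to match the diagram on the following slides):

```python
import numpy as np

def logistic(a):
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.normal(size=6)            # input signals x1,1 .. x6,1
W1 = rng.normal(size=(2, 6))      # learned weights from input layer to 2 hidden nodes
W2 = rng.normal(size=(1, 2))      # learned weights from hidden layer to output node

hidden = logistic(W1 @ x)         # hidden node values x1,2 and x2,2
y = logistic(W2 @ hidden)         # network output Y
print(hidden, y)
```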

Slide3

Calculating Weights in an MLP

We cannot take the 'simple' approach, as we need to take into account the 'knock-on' effect of multiple layers:

[Figure: network diagram – inputs x1,1 … x6,1 feed two logistic hidden units (outputs x1,2 and x2,2), which feed a single logistic output unit Y.]

Slide4

Weight update as gradient descent

In the simple example, we updated weights in one layer using gradient descent. More generally, we want a formula to update weights in all layers.

We still wish to minimise the error:
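A standard choice for this error, and the one used in the worked example linked on the following slides, is the squared difference between the target and the network output:

$$E \;=\; \tfrac{1}{2}\,\big(y_{\text{target}} - y\big)^{2}$$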

 

[Figure: the same network diagram – inputs x1,1 … x6,1, logistic hidden units with outputs x1,2 and x2,2, logistic output Y.]

Slide5

Weight update as gradient descent

If we look at the output layer, we see that this is just a single-layer perceptron – we already know how to adjust the corresponding weights (w7 and w8).
We now want to adjust the weights in the next layer back, so we need to calculate ∂E/∂wi, where i is a number between 1 and 6.

 

[Figure: the same network diagram, now with the connection weights labelled w1 … w8 (w7 and w8 on the hidden-to-output connections).]

Worked example with numbers here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/


Slide6

Weight update as gradient descent

 

[Figure: the same labelled network diagram as on the previous slide.]

Worked example with numbers here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/


h2 = w7·x1,2 + w8·x2,2

h1 = w1·x1,1 + w2·x2,1 + w3·x3,1

We've already worked out the output-layer part of this derivative for the single layer; the remaining factors come from the chain rule through the hidden node.
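For a first-layer weight such as w1, the chain-rule decomposition (following the structure of the linked worked example) is:

$$\frac{\partial E}{\partial w_1} \;=\; \frac{\partial E}{\partial x_{1,2}} \cdot \frac{\partial x_{1,2}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}$$

The first factor reuses the quantity already computed at the output layer, the middle factor is the derivative of the logistic function, and the last factor is simply the input x1,1.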

 

Slide7

Weight update as gradient descent

 

Worked example with numbers here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

h2 = w7·x1,2 + w8·x2,2

We've already worked this out for the single layer – the remaining factor is just the corresponding hidden-node value.

 

The most important thing to note is that we have used the answer from the output layer to help us work out the weights for the next layer back.

Hence: backpropagation
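A minimal NumPy sketch of this procedure for a network of the shape drawn above (6 inputs, 2 logistic hidden units, 1 logistic output, squared error; the learning rate and random data are illustrative):

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.normal(size=6)          # inputs x1,1 .. x6,1
t = 1.0                         # target output
W1 = rng.normal(size=(2, 6))    # input -> hidden weights
W2 = rng.normal(size=(1, 2))    # hidden -> output weights (w7, w8)
lr = 0.5                        # learning rate

for step in range(20):
    # forward pass
    hidden = logistic(W1 @ x)            # x1,2 and x2,2
    y = logistic(W2 @ hidden)            # output Y
    error = 0.5 * (t - y) ** 2

    # backward pass: output layer first (the 'single layer' rule) ...
    delta_out = (y - t) * y * (1 - y)    # dE/dh2 at the output node
    grad_W2 = np.outer(delta_out, hidden)

    # ... then reuse that answer to get the hidden-layer gradients (backpropagation)
    delta_hidden = (W2.T @ delta_out) * hidden * (1 - hidden)
    grad_W1 = np.outer(delta_hidden, x)

    # gradient-descent weight updates
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print(error.item())
```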

Slide8

MLP example - MNIST

We will use an MLP to predict digits in the MNIST dataset.

An MLP can achieve approximately 98% accuracy.

http://scienceai.github.io/neocortex/mnist_mlp/
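A minimal scikit-learn sketch of the same idea (a single hidden layer trained on MNIST; the library, hidden-layer size and iteration count are illustrative choices, not the demo's own code):

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# load the 70,000 28x28 MNIST digits as flat 784-pixel vectors
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                                        # scale pixels to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=0)

# one hidden layer of 128 units (scikit-learn's default ReLU activation)
mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=20, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))   # typically around 0.97-0.98
```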

Slide9

Multilayer Perceptrons for clinical data

Neural network prediction of relapse in breast cancer patients. Tarassenko et al., 1996.
Goal: to predict relapse within 3 years.
Features: age, tumour size, no. of nodes, log ER, log EGFR.
Data: 350 patients.
Architecture: 5-N-1 (1 < N < 7), i.e. 1 hidden layer.
Results: 72% classification accuracy.
https://link.springer.com/content/pdf/10.1007/BF01413746.pdf

[Figure: plot of the patient data; marked points = relapse cases.]

Slide10

How many hidden layers?

In theory, a neural network with only one hidden layer can approximate any continuous function (Cybenko, 1989).
In practice, networks with more layers do better.
Example: https://playground.tensorflow.org

Slide11

Deep Learning

[Figure: proportion of machine learning papers containing the term 'neural networks' over time.]

Popular in the 90s.
Dip in the 2000s. Problem: backpropagation is tricky in highly connected neural nets.
Re-emergence: A Fast Learning Algorithm for Deep Belief Nets, Hinton, 2006. http://www.cs.toronto.edu/~fritz/absps/ncfast.pdf

Slide12

Deep Learning and ImageNet

The re-emergence of neural networks

ImageNet - a large visual database for visual object recognition

Classify 150K images into 1,000 categories (e.g. Egyptian cat, gazelle, wok, photocopier).
5 guesses allowed per picture.
In 2012, AlexNet, a 'deep' neural network, won by a huge margin (12% error).
Current best is around 3% error.

Slide13

Deep Learning vs ‘shallow’ learning

Deep learning uses the same building blocks as normal Neural Networks

But many more layers!

[Figure: a small network annotated with vector dimensions 3x1, 4x1 and 2x1.]

Slide14

Why Deep Learning

If any classification function can be learned with 1 hidden layer, why do we need deep learning?

Slide15

Why Deep Learning

If any classification function can be learned with 1 hidden layer, why do we need deep learning?

No need to create features.
E.g. in your assignment, the images get summarised as 30 pertinent numbers. In deep learning, we simply present the whole image (or the array of pixel values) to the neural network.
It works better – Cybenko showed that 1 hidden layer was sufficient, but did not show (i) how many units are required or (ii) whether such a network can be trained.
(Potentially) simulates vision in a more human-like way: early layers correspond to primitive features (e.g. straight lines), late layers correspond to higher-level features (e.g. things that look like eyes).

Slide16

AlexNet

Uses ReLU (Rectified Linear Units) rather than logistic units.
Heuristic dropout to selectively ignore neurons.
Overlapping max pooling.
Graphics Processing Units.

Slide17

Convolutional Neural Networks (simplified version of AlexNet)

ReLU vs Sigmoid:
ReLU encourages sparsity – sigmoids tend towards, but never quite reach, zero.
For high values of a = Wx + b the sigmoid gradient diminishes towards zero (the so-called vanishing gradient); the ReLU gradient stays constant for positive inputs.
N.B. it is possible for too many units to go to zero, prohibiting learning.
Quicker to compute: max(0, a).
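A small NumPy sketch of the point about gradients (the input values are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)

def relu(a):
    return np.maximum(0.0, a)          # cheap: just a max with zero

def relu_grad(a):
    return (a > 0).astype(float)       # 1 for positive inputs, 0 otherwise

a = np.array([-10.0, -1.0, 0.5, 5.0, 10.0])
print(sigmoid_grad(a))   # shrinks towards 0 for large |a| (vanishing gradient)
print(relu_grad(a))      # stays at 1 wherever the unit is active
```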

Slide18

Convolutional Neural Networks

Max-pooling: a method of downsampling.
Takes the max value in a local neighbourhood.
Overlapping max pooling means that the neighbourhoods overlap (e.g. the selected area in red).
The effect is to 'blur' the image while keeping the pertinent structure – this makes computation faster.
Each successive layer looks at a 'bigger' picture.

[Figure: worked example of overlapping 3x3 max-pooling on a small grid of pixel values (values shown: 20, 30, 30, 70, 70, 37, 112, 100, 37).]
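A minimal NumPy sketch of max-pooling (the 4x4 input values and the window/stride choices are illustrative, not the grid from the slide):

```python
import numpy as np

def max_pool(image, window=3, stride=1):
    """Slide a window over the image and keep the max of each neighbourhood.
    stride < window gives overlapping pooling; stride == window gives the usual
    non-overlapping pooling."""
    rows = (image.shape[0] - window) // stride + 1
    cols = (image.shape[1] - window) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = image[i * stride:i * stride + window,
                          j * stride:j * stride + window]
            out[i, j] = patch.max()
    return out

image = np.array([[ 1,  5,  2,  0],
                  [ 7,  3,  9,  4],
                  [ 2,  8,  6,  1],
                  [ 0,  4,  3, 11]])
print(max_pool(image, window=3, stride=1))   # overlapping 3x3 pooling -> 2x2 output
```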

Slide19

Convolutional Neural Networks

Convolution

Formally: (f ∗ g)(t) = ∫ f(τ) g(t − τ) dτ (or the corresponding sum in the discrete case).
In 2D it is broadly equivalent to applying an image filter.
In practice, treat the convolution mask as another set of parameters to be learned, and bundle it into the back-propagation – the network learns a 'good' mask.
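A minimal sketch of the 2D case using SciPy (the random image and the 3x3 edge-style mask are illustrative; in a CNN the mask values would be learned):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((8, 8))                        # a small 'image'

# a 3x3 mask; in a CNN these numbers would be learned by backpropagation
mask = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)

filtered = convolve2d(image, mask, mode="valid")  # 2D convolution = image filtering
print(filtered.shape)                             # (6, 6): the mask shrinks the image
```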

Slide20

Convolution

Slide21

CNN architecture

1.) Try multiple convolution masks to generate features (initialise these randomly).
2.) Use max pooling to 'shrink' the image.
3.) Repeat – this has the effect of creating hierarchical features.
4.) Candidate features are now put into a 'normal' feed-forward network.
5.) Softmax is used for multi-class classification.
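A minimal Keras sketch of this recipe (the layer sizes, filter counts and 28x28x1 input shape are illustrative choices, not the architecture from the slides):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                        # e.g. a greyscale image
    layers.Conv2D(16, kernel_size=3, activation="relu"),   # 1) learnable convolution masks
    layers.MaxPooling2D(pool_size=2),                      # 2) max pooling shrinks the image
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # 3) repeat -> hierarchical features
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                   # 4) 'normal' feed-forward layer
    layers.Dense(10, activation="softmax"),                # 5) softmax over the classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```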

Slide22

Convolutional Neural Networks

Softmax

Extension of logistic model

In a standard logistic unit: compute the logit (ax + b), apply the logistic function, threshold, and class as 1 or 0.
For softmax: compute the logit of each class; softmax then gives the relative probability of each class.
Used for multi-class classification.
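A minimal NumPy sketch (the logits are made-up numbers for three classes):

```python
import numpy as np

def softmax(logits):
    """Relative probability of each class: exp(z_i) / sum_j exp(z_j).
    Subtracting the max first is a standard trick for numerical stability."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])     # one logit per class
print(softmax(logits))                 # roughly [0.66, 0.24, 0.10]
print(softmax(logits).sum())           # probabilities sum to 1
```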

Slide23

Slide24

Example – identifying hands for Parkinson's diagnosis

Finger tapping test for diagnosing slow movement (bradykinesia) in Parkinson's patients.
Get patients to tap their fingers as wide and as fast as possible.
A CNN is used to separate the hand from the background.

Slide25

Then convert each frame of the video into a single number (representing how fast the hand is moving)

Generate features -> reduce dimensions -> classify

Figure shows normal (blue) vs abnormal (red) patients

Example – identifying hands for Parkinson's diagnosis

Slide26

Generative Adversarial Networks

Basically, two coupled neural networks.
Generator – initially generates a random signal (or image). This feeds into the:
Discriminator – a pre-trained network (e.g. one that recognises cats).
The generator's weights are updated based on how well or poorly it fools the discriminator.
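A minimal PyTorch sketch of this coupling (the network sizes, the random stand-in 'real' data and the training settings are all illustrative; in a real GAN the discriminator sees a genuine dataset such as faces, and here the two networks are trained alternately, which is the usual GAN recipe):

```python
import torch
import torch.nn as nn

latent_dim = 32
# generator: random noise -> a 64-dimensional 'signal'
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.Tanh())
# discriminator: signal -> probability that it is 'real'
D = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_data = torch.randn(1000, 64)        # stand-in for a real dataset

for step in range(200):
    real = real_data[torch.randint(0, 1000, (64,))]
    fake = G(torch.randn(64, latent_dim))

    # 1) update the discriminator: real samples -> 1, generated samples -> 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # 2) update the generator: move its weights so that D labels its fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```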

Slide27

Example - https://thispersondoesnotexist.com/

Neural Network generator -> produces new samples from scratch

Neural Network discriminator -> classifies face as ‘real’ or ‘fake’

If you want to see if you can do better than the GAN:

http://www.whichfaceisreal.com/