Slide 1: Tips for Training Deep Network
Slide 2: Outline
Training Strategy: Batch Normalization
Activation Function: SELU
Network Structure: Highway Network
Slide 3: Batch Normalization
Slide 4: Feature Scaling
For each dimension $i$, compute over the $R$ training examples the mean $m_i = \frac{1}{R}\sum_{r=1}^{R} x_i^r$ and the standard deviation $\sigma_i = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(x_i^r - m_i\right)^2}$, then normalize each feature: $\tilde{x}_i^r = \frac{x_i^r - m_i}{\sigma_i}$.
After scaling, the means of all dimensions are 0 and the variances are all 1.
In general, gradient descent converges much faster with feature scaling than without it.
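A minimal numpy sketch of the scaling described above; the function name and shapes are illustrative, not from the slides:

```python
import numpy as np

def feature_scale(X):
    """Standardize each feature dimension to mean 0 and variance 1.

    X: array of shape (R, D) -- R training examples, D feature dimensions.
    """
    m = X.mean(axis=0)                 # per-dimension mean m_i
    sigma = X.std(axis=0)              # per-dimension standard deviation sigma_i
    return (X - m) / (sigma + 1e-8)    # small epsilon guards against sigma_i = 0

# Example: features with wildly different scales.
X = np.random.rand(100, 3) * np.array([1.0, 50.0, 1000.0])
X_scaled = feature_scale(X)
print(X_scaled.mean(axis=0))   # ~0 in every dimension
print(X_scaled.var(axis=0))    # ~1 in every dimension
```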
Slide 5: How about Hidden Layers?
The output of Layer 1 is the input of the next layer, so feature scaling could be applied to the hidden layers' outputs as well.
Difficulty: the statistics of each hidden layer's output change during training (internal covariate shift). A smaller learning rate can be helpful, but the training would be slower.
Solution: batch normalization.
Slide 6: Batch
A batch of examples $x^1, x^2, x^3$ is processed in parallel. Stacking them as the columns of a matrix, $\begin{bmatrix} z^1 & z^2 & z^3 \end{bmatrix} = W \begin{bmatrix} x^1 & x^2 & x^3 \end{bmatrix}$, computes the whole batch with a single matrix multiplication; the sigmoid is then applied elementwise, $a^i = \sigma(z^i)$.
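A small sketch of this batching trick, with illustrative shapes (4 neurons, 3 input dimensions, batch of 3):

```python
import numpy as np

W = np.random.randn(4, 3)                    # layer weights
x1, x2, x3 = (np.random.randn(3) for _ in range(3))

# Per-example: z^i = W x^i
z_separate = [W @ x for x in (x1, x2, x3)]

# Batched: stack examples as columns; one matrix product does all of them.
X = np.stack([x1, x2, x3], axis=1)           # shape (3, batch)
Z = W @ X                                    # shape (4, batch)
A = 1.0 / (1.0 + np.exp(-Z))                 # sigmoid, elementwise

assert np.allclose(Z[:, 0], z_separate[0])   # same result, computed in parallel
```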
Slide 7: Batch Normalization
Over a batch of pre-activations $z^1, \dots, z^N$, compute $\mu = \frac{1}{N}\sum_{i=1}^{N} z^i$ and $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(z^i - \mu\right)^2}$ (elementwise). Note that $\mu$ and $\sigma$ depend on the $z^i$.
Note: batch normalization cannot be applied to a small batch, because the batch statistics would be unreliable estimates.
Slide 8: Batch Normalization
Normalize each pre-activation, $\tilde{z}^i = \frac{z^i - \mu}{\sigma}$, before the sigmoid.
How do we do backpropagation? Since $\mu$ and $\sigma$ depend on the $z^i$, the gradients must also flow through $\mu$ and $\sigma$ rather than treating them as constants.
Slide 9: Batch Normalization
A learnable transform restores flexibility after normalization: $\hat{z}^i = \gamma \odot \tilde{z}^i + \beta$. Here $\mu$ and $\sigma$ are from the batch, while $\gamma$ and $\beta$ are network parameters learned by gradient descent.
At the testing stage we do not have a batch.
Ideal solution: compute $\mu$ and $\sigma$ using the whole training dataset.
Practical solution: compute the moving averages of $\mu$ and $\sigma$ over the batches during training, and use them at testing time.
[Figure: accuracy v.s. number of updates.]
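A minimal sketch of the train/test behavior described above, assuming the common momentum-based moving average (the class name and hyperparameter values are illustrative):

```python
import numpy as np

class BatchNorm:
    """Minimal batch normalization over a batch of pre-activations z.

    gamma, beta are learned network parameters; mu, sigma come from the
    batch during training and from moving averages at the testing stage.
    """
    def __init__(self, dim, momentum=0.99, eps=1e-5):
        self.gamma = np.ones(dim)       # learned scale
        self.beta = np.zeros(dim)       # learned shift
        self.mu_avg = np.zeros(dim)     # moving average of batch means
        self.var_avg = np.ones(dim)     # moving average of batch variances
        self.momentum, self.eps = momentum, eps

    def __call__(self, z, training=True):
        if training:
            mu, var = z.mean(axis=0), z.var(axis=0)   # statistics of this batch
            self.mu_avg = self.momentum * self.mu_avg + (1 - self.momentum) * mu
            self.var_avg = self.momentum * self.var_avg + (1 - self.momentum) * var
        else:
            mu, var = self.mu_avg, self.var_avg       # no batch at testing stage
        z_tilde = (z - mu) / np.sqrt(var + self.eps)  # normalize
        return self.gamma * z_tilde + self.beta       # learnable transform
```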
Batch Normalization - Benefits
BN reduces training time and makes very deep networks trainable.
Because there is less internal covariate shift, we can use larger learning rates.
There is less of an exploding/vanishing gradient problem; BN is especially effective for sigmoid, tanh, etc.
Learning is less affected by initialization.
BN reduces the demand for regularization.
To learn more ……
Batch Renormalization
Layer Normalization
Instance Normalization
Weight Normalization
Spectral Normalization
Slide 14: Activation Function: SELU
Slide 15: Rectified Linear Unit (ReLU)
$a = \max(0, z)$
Reasons:
1. Fast to compute
2. Biological reason
3. Equivalent to an infinite number of sigmoids with different biases
4. Alleviates the vanishing gradient problem
ReLU - variants
Leaky ReLU: $a = \max(0.01z,\ z)$
Parametric ReLU: $a = \max(\alpha z,\ z)$, where $\alpha$ is also learned by gradient descent
Slide 17: ReLU - variants
Exponential Linear Unit (ELU): $a = z$ for $z > 0$; $a = \alpha\left(e^z - 1\right)$ for $z \le 0$
Scaled ELU (SELU): $a = \lambda z$ for $z > 0$; $a = \lambda\alpha\left(e^z - 1\right)$ for $z \le 0$, with $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$
https://github.com/bioinf-jku/SNNs
SELU
Positive and negative values: the whole ReLU family has this property except the original ReLU.
Saturation region: ELU also has this property.
Slope larger than 1: only SELU has this property.
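The whole family in a few lines of numpy, using the constants from the SELU paper (function names are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # Negative side saturates at -alpha instead of being cut to 0.
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def selu(z, alpha=1.6732632423543772, lam=1.0507009873554805):
    # lam > 1 gives the "slope larger than 1"; the pair (alpha, lam) is
    # chosen so that mean 0 / variance 1 is a fixed point of the layer.
    return lam * np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```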
Slide 19: SELU
Consider a neuron $z = \sum_{k=1}^{K} w_k a_k$, where the inputs $a_k$ are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$ (they do not have to be Gaussian). Assume $\mu = 0$ and $\sigma^2 = 1$. Then $E[z] = \mu \sum_k w_k = 0$.
Slide 20: SELU
With the same assumptions ($\mu = 0$, $\sigma^2 = 1$), if the weights satisfy $\sum_k w_k^2 = 1$, then $\mathrm{Var}[z] = \sigma^2 \sum_k w_k^2 = 1$. Assuming $z$ is Gaussian (by the central limit theorem), passing it through SELU keeps the output mean at 0 and the variance at 1, which is the target of self-normalization.
Demo
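The lecture's demo is not reproduced here; as a stand-in, this small numpy sketch checks the self-normalizing claim: with weights drawn so that $\sum_k w_k^2 \approx 1$, activations keep mean ~0 and variance ~1 even after 50 layers (the layer count and width are arbitrary choices):

```python
import numpy as np

def selu(z, alpha=1.6732632423543772, lam=1.0507009873554805):
    return lam * np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

np.random.seed(0)
K = 512                                       # neurons per layer
a = np.random.randn(10000, K)                 # i.i.d. inputs, mean 0, variance 1

for layer in range(1, 51):
    W = np.random.randn(K, K) / np.sqrt(K)    # E[w] = 0, sum_k w_k^2 ~= 1
    a = selu(a @ W)                           # SELU keeps the statistics stable
    if layer % 10 == 0:
        print(f"layer {layer}: mean={a.mean():+.3f}, var={a.var():.3f}")
```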
Slide 22: Source of the joke: https://zhuanlan.zhihu.com/p/27336839 (the paper's proof runs to 93 pages).
SELU is actually more general.
Slide 23: The latest activation function: Self-Normalizing Neural Networks (SELU)
[Experimental results on MNIST and CIFAR-10.]
Slide 24: Demo
Slide 25: Highway Network & Grid LSTM
Slide 26: Feedforward v.s. Recurrent
Feedforward: $t$ is the layer index. $x \to f_1 \to a^1 \to f_2 \to a^2 \to f_3 \to a^3 \to f_4 \to y$; each layer applies a different function, $a^t = f_t(a^{t-1})$.
Recurrent: $t$ is the time step. $h^t = f(h^{t-1}, x^t)$; the same $f$ with the same parameters is applied at every step.
Idea: apply the gated structure of recurrent networks in a feedforward network.
Feedforward v.s. Recurrent:
1. A feedforward network does not have an input at each step.
2. A feedforward network has different parameters for each layer.
Slide 27: GRU → Highway Network
GRU: from $h^{t-1}$ and $x^t$, compute a reset gate $r$ and an update gate $z$; form the candidate $h'$ from $x^t$ and the reset-gated $h^{t-1}$; output $h^t = z \odot h^{t-1} + (1 - z) \odot h'$ and $y^t$.
To turn the GRU into a highway network:
No input $x^t$ at each step: $a^{t-1}$ is the output of the $(t-1)$-th layer and $a^t$ is the output of the $t$-th layer.
No output $y^t$ at each step.
No reset gate.
Slide 28: Highway Network v.s. Residual Network
Highway Network: a gate controller decides how much of the input layer to copy to the output layer and how much to transform.
Training Very Deep Networks: https://arxiv.org/pdf/1507.06228v2.pdf
Residual Network: the transformed input is added (+) to a copy of the input.
Deep Residual Learning for Image Recognition: http://arxiv.org/abs/1512.03385
The Highway Network automatically determines the layers needed! (See the sketch below.)
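A minimal sketch of one highway layer following the GRU-style update on the slides ($a^t = z \odot a^{t-1} + (1-z) \odot h'$); all names and shapes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(a_prev, W, b, Wg, bg):
    """One highway layer: an update gate decides how much input to copy.

    a_prev: output of the (t-1)-th layer (no x_t input at each step).
    W, b:   parameters of the nonlinear transform h'.
    Wg, bg: parameters of the gate controller.
    """
    h = np.tanh(W @ a_prev + b)            # candidate transform h'
    z = sigmoid(Wg @ a_prev + bg)          # update gate (no reset gate)
    return z * a_prev + (1.0 - z) * h      # gated mix of copy and transform

# Illustrative stack; a real highway network has different parameters
# per layer (here one set is reused for brevity).
dim = 8
a = np.random.randn(dim)
params = (np.random.randn(dim, dim), np.zeros(dim),
          np.random.randn(dim, dim), np.zeros(dim))
for t in range(4):
    a = highway_layer(a, *params)
```

When the gate $z$ saturates at 1 for a unit, the layer simply copies its input, which is how the network can effectively skip layers it does not need.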
Slide 30: Highway Network
Slide 31: Grid LSTM
LSTM: takes input $x$ and the previous memory $c$ and hidden state $h$, and outputs $y$ along with the updated $c'$ and $h'$.
Grid LSTM: memory for both time and depth. Along the time dimension it carries $(c, h)$ to $(c', h')$; along the depth dimension it carries $(a, b)$ to $(a', b')$.
Slide 32: Grid LSTM
[Figure: Grid LSTM blocks unrolled in two dimensions. Along time, $(c^{t-1}, h^{t-1})$ flows into the block at step $t$, which emits $(c^t, h^t)$ to step $t+1$; along depth, $(a^{l-1}, b^{l-1})$ flows into layer $l$, which emits $(a^l, b^l)$ to layer $l+1$.]
Slide 33: Grid LSTM
Inside the block, the hidden states of the two dimensions, $h$ and $b$, drive the gates $z$ (with $\tanh$), $z^i$, $z^f$, and $z^o$. The time memory is updated as $c' = z^f \odot c + z^i \odot z$ and $h' = z^o \odot \tanh(c')$; the depth memory $(a, b)$ is updated to $(a', b')$ in the same way.
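A minimal sketch of this block, assuming (as the diagram suggests) that the gates are driven by the concatenation of the two hidden states $h$ and $b$, and that $a$ plays the role of the cell along depth just as $c$ does along time; all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_transform(H, c, P):
    """Standard LSTM update driven by the shared hidden vector H."""
    Wz, Wi, Wf, Wo = P
    z  = np.tanh(Wz @ H)              # block input
    zi = sigmoid(Wi @ H)              # input gate
    zf = sigmoid(Wf @ H)              # forget gate
    zo = sigmoid(Wo @ H)              # output gate
    c_new = zf * c + zi * z           # c' = zf . c + zi . z
    h_new = zo * np.tanh(c_new)       # h' = zo . tanh(c')
    return h_new, c_new

def grid_lstm_block(h, c, b, a, P_time, P_depth):
    """2D Grid LSTM: one LSTM update per dimension, sharing H = [h; b]."""
    H = np.concatenate([h, b])
    h_new, c_new = lstm_transform(H, c, P_time)    # time dimension: (h, c)
    b_new, a_new = lstm_transform(H, a, P_depth)   # depth dimension: (b, a)
    return h_new, c_new, b_new, a_new

# Illustrative shapes: each weight matrix maps the concatenated hidden
# vector (2 * dim) back to dim.
dim = 16
h, c, b, a = (np.zeros(dim) for _ in range(4))
P_time  = tuple(np.random.randn(dim, 2 * dim) for _ in range(4))
P_depth = tuple(np.random.randn(dim, 2 * dim) for _ in range(4))
h, c, b, a = grid_lstm_block(h, c, b, a, P_time, P_depth)
```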
Slide 34: 3D Grid LSTM
[Figure: a 3D Grid LSTM block with memory pairs $(c, h)$, $(a, b)$, and $(e, f)$ along its three dimensions, producing $(c', h')$, $(a', b')$, and $(e', f')$.]
Slide 35: 3D Grid LSTM
Images are composed of pixels, e.g., 3 × 3 images.