A neural network approach to visual tracking - PowerPoint Presentation

Presentation Transcript

Slide 1

A neural network approach to visual tracking

Zhe Zhang, Kin Hong Wong*, Zhiliang Zeng, Lei Zhu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Contact: *khwong@cse.cuhk.edu.hk

ANN approach to visual tracking, MVA17 v.7g

Slide 2

Contents

Introduction
Motivation
What is FCN?
Procedures and Implementation
Result
Conclusion

Slide 3

Introduction

FCN-tracker is a modified FCN (Fully Convolutional Networks) model for object tracking. The tracking problem is to locate a moving target.

[Figure: the target is located in successive frames, Time=1, Time=2, Time=3]

Slide 4

Contents

Introduction
Motivation
What is FCN?
Procedures and Implementation
Result
Conclusion

Slide 5

Motivation

Many tracking algorithms rely solely on simple hand-crafted features.
A CNN (Convolutional Neural Network) is good at learning efficient features from a large quantity of data.
FCN is based on CNN and provides end-to-end training, which helps to construct a simple pipeline.

Slide 6

Contents

Introduction
Motivation
What is FCN?
Procedures and Implementation
Result
Conclusion

Slide 7

What is FCN (Fully Convolutional Networks)?

Fully Convolutional Networks [1] for Semantic Segmentation (the FCN net).

[1] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

[Figure: the FCN output has the same size as the input image, with the dog and cat regions labelled]

Slide 8

In FCN [1]


Output: 3 labels: background, cat, dog

Slide 9

Why do we use FCN for a tracker?

Semantic segmentation and visual tracking have two aspects in common:
The goal of segmentation is to distinguish different objects from the background, while visual tracking aims at discriminating the target object from other objects and the background.
In addition, both tasks produce pixel-level output.

Slide 10

Contents

Introduction
Motivation
What is FCN?
Procedures
Result
Conclusion

Slide 11

Procedures

Our window is s × s = 128 × 128.
For the first frame:
Use the first ground-truth data, the width w, height h and center (cx, cy) of the target window, provided by the dataset.
Find E based on w, h, s and the scaling factor (set to 4 in our program).
The label f(i,j) for the FCN at pixel (i,j) is generated by a Gaussian function with a user-selected σ.
After FCN training (100 iterations), the network can predict the response (heat) map.
From the second frame onward during tracking:
Use the generated label and response map to re-train the model (once for each subsequent frame).
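The training schedule above (100 iterations on the first frame, one re-training iteration per subsequent frame) can be sketched as follows. This is a minimal illustration, not the authors' implementation: FcnModel is a hypothetical stand-in for the real network, and the Gaussian re-labelling uses σ = 4 as described later.

```python
import numpy as np

class FcnModel:
    """Hypothetical stand-in for the FCN tracker network."""
    def __init__(self):
        self.iterations_trained = 0

    def train(self, frame, label, iterations):
        # Placeholder for back-propagation against the Gaussian label.
        self.iterations_trained += iterations

    def predict(self, frame):
        # Placeholder: a real network would output a heat map here.
        return np.zeros(frame.shape[:2])

def track(frames, first_label, model, sigma=4.0):
    """100 training iterations on frame 1, then 1 iteration per frame."""
    positions = []
    label = first_label
    for t, frame in enumerate(frames):
        model.train(frame, label, iterations=100 if t == 0 else 1)
        response = model.predict(frame)              # response (heat) map
        cy, cx = np.unravel_index(response.argmax(), response.shape)
        positions.append((cx, cy))
        # Re-generate a Gaussian-shaped label around the predicted
        # center for the next re-training step.
        rows, cols = np.ogrid[:frame.shape[0], :frame.shape[1]]
        label = np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2)
                       / (2 * sigma ** 2))
    return positions
```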

Slide 12

Procedure – Network Structure

Our network structure is nearly the same as the original FCN-8s, except that one more convolutional layer is appended after the last feature-map layer.
The newly added convolutional layer is trained to transform the segmentation response map into a response map with the target center highlighted.

[Figure: network diagram – one more convolution layer is added after the last feature-map layer]

Slide 13

FCN Tracker – Crop Result

Here, s = 128, scaling factor = 4, w = 81, h = 81.

[Figure: a hand-picked window of size w × h around the target; the system uses an E × E window for training]

Slide 14

Procedure: FCN Tracker – Crop Function

We crop each image in the sequence to a square region, where E is the edge length of the square region, s is the expected input size (128 here), and w and h denote the width and height of the object in the first frame. The scaling factor is set to 4 in our program.
We do this because we are only concerned with the object: instead of using the whole image, we use only the square region around the object center. The scaling factor is there because we want to cover some context information around the object.
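A minimal sketch of this crop, assuming a single-channel image. The exact formula relating E to w, h and the scaling factor is not reproduced here, so E = scale · max(w, h) is purely an illustrative assumption, and the nearest-neighbour resize is likewise our simplification.

```python
import numpy as np

def crop_square(image, cx, cy, w, h, s=128, scale=4):
    """Crop an E x E square around the object center (cx, cy) and
    resize it to s x s.  E = scale * max(w, h) is an assumed formula."""
    E = int(scale * max(w, h))
    half = E // 2
    # Pad by half the crop size so the square never leaves the image.
    padded = np.pad(image, half, mode="edge")
    square = padded[cy:cy + E, cx:cx + E]   # centered on (cx, cy)
    # Nearest-neighbour resize of the E x E crop to s x s.
    idx = (np.arange(s) * E) // s
    return square[np.ix_(idx, idx)]
```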

Slide 15

FCN Tracker – Predict Output

After training the model, the network is able to predict the target center: the target center is taken as the maximum position of the confidence map.
Since we assume that the object movement is smooth, we impose higher weights near the center and lower weights in the surrounding region. To do this we use a Hann window:
C = R × P (an element-wise product)
Here C denotes the confidence map, R is the response map output by the network, and P is the prior distribution, which is a Hann window in our program.
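A minimal numpy sketch of this weighting; building the 2-D prior as the outer product of two 1-D Hann windows is our choice of construction and may differ in detail from the authors'.

```python
import numpy as np

def predict_center(response):
    """Weight the response map by a 2-D Hann-window prior and take the
    maximum of the resulting confidence map as the target center."""
    n_rows, n_cols = response.shape
    prior = np.outer(np.hanning(n_rows), np.hanning(n_cols))   # P
    confidence = response * prior                              # C = R * P
    cy, cx = np.unravel_index(confidence.argmax(), confidence.shape)
    return cx, cy
```

The prior suppresses strong responses far from the window center, encoding the smooth-motion assumption.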

Slide 16

FCN Tracker – Predict Output

[Figure: the cropped input data is fed to the FCN, which predicts the response map; the prior distribution P (a Hann window) is applied to the response map to give the confidence map C]

Slide 17

FCN Tracker – Generate Label

After tracking a frame, we still need to refine our model to improve the accuracy, which means we need a label. We do not provide a prior label; instead we assume the label follows a Gaussian-shaped distribution:
f(i,j) = exp( -((i - cx)² + (j - cy)²) / (2σ²) )
Here f(i,j) denotes the value in the label indexed by i and j, and cx and cy denote the coordinates of the target center. σ is a predefined parameter; we set it to 4 in our program.
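This label can be generated with a few lines of numpy; the isotropic form below is the standard Gaussian the slide describes, with σ = 4.

```python
import numpy as np

def gaussian_label(height, width, cx, cy, sigma=4.0):
    """Gaussian-shaped label peaked at the target center (cx, cy):
    f(i, j) = exp(-((i - cx)^2 + (j - cy)^2) / (2 * sigma^2))."""
    rows, cols = np.mgrid[0:height, 0:width]   # pixel coordinates
    return np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2)
                  / (2 * sigma ** 2))
```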

Slide 18

The tracker in action

Frame 1: apply the FCN for 100 iterations to generate the response (heat) map, then use the response map to predict the next target position.
Frame 2: apply the FCN for 1 iteration to generate the updated response map, then use the updated response map to predict the next target position.
Frame 3: apply the FCN for 1 iteration to generate the updated response map, and so on.

Slide 19

FCN Tracker – Result

Demo video: https://youtu.be/WcQLmTi07Oc

Slide 20

FCN Tracker – Result

Results compared with other trackers using the OTB database.

[Figure: precision plot – the percentage of frames whose center-location error falls within the error threshold; success plot – the percentage of frames whose overlap falls within the overlap threshold]

Slide 21

Conclusion

Our FCN can track the target and update the model online efficiently.
The whole pipeline is straightforward and simple.
Experimental results on the OTB benchmark show that our tracker is competitive with state-of-the-art trackers.

Slide 22

Thank you

Q&A