A neural network approach to visual tracking
Zhe Zhang, Kin Hong Wong*, Zhiliang Zeng, Lei Zhu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Contact: *khwong@cse.cuhk.edu.hk
ANN approach to visual tracking, MVA17 v.7g
Contents
Introduction
Motivation
What is FCN?
Procedures and Implementation
Result
Conclusion
Introduction
FCN-tracker is a modified FCN (Fully Convolutional Networks) model for object tracking. The tracking problem is to locate a moving target.
[Figure: the target is located in each frame: Time=1, Time=2, Time=3]
Motivation
Many tracking algorithms rely solely on simple hand-crafted features.
A CNN (convolutional neural network) is good at learning effective features from a large quantity of data.
FCN is based on the CNN and provides end-to-end training, which helps to construct a simple pipeline.
What is FCN (Fully Convolutional Networks)?
Fully Convolutional Networks [1] for Semantic Segmentation (FCN-Net).
[1] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[Figure: FCN produces a pixel-level segmentation ("dog", "cat") of the same size as the input image]
In FCN [1], the output assigns one of 3 labels to each pixel: background, cat, dog.
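The pixel-level labelling above can be sketched as a per-pixel argmax over class score channels (a minimal sketch; the three-channel score map and the `segment` helper are illustrative, not the FCN's actual API):

```python
import numpy as np

LABELS = ["background", "cat", "dog"]

def segment(score_map):
    """Per-pixel argmax over class channels.
    score_map: (H, W, 3) array of class scores, one channel per label.
    Returns an (H, W) array of label indices into LABELS."""
    return np.argmax(score_map, axis=-1)
```

A pixel whose "cat" channel scores highest is labelled 1, i.e. "cat".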
Why do we use FCN for a tracker?
Semantic segmentation and visual tracking have two aspects in common:
The goal of segmentation is to distinguish different objects from the background, while visual tracking aims at discriminating the target object from other objects and the background.
In addition, both tasks produce pixel-level output.
Procedures
Our crop window is s × s = 128 × 128.
For the first frame:
Use the first ground-truth data: width w, height h and center (cx, cy) of the target window, provided by the dataset.
Find the crop edge length E based on w, h, s and a scaling factor (set to 4 in our program).
The label f(i, j) for the FCN at pixel (i, j) is generated by a Gaussian function with a user-selected width parameter.
After FCN training (100 iterations), the network can predict the response (heat) map.
From the second frame onward during tracking:
Use the generated label and response map to re-train the model (once for each subsequent frame).
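The procedure above can be sketched as a loop (a hedged sketch: `model.train`/`model.predict` and `make_label` are assumed interfaces standing in for the FCN and the Gaussian label generator, not the paper's actual code):

```python
import numpy as np

def track(frames, model, make_label, init_center, n_first=100, n_update=1):
    """Per-frame tracking loop.
    frames:     iterable of cropped s x s patches
    model:      object with train(patch, label, iterations) and
                predict(patch) -> response map  (assumed interface)
    make_label: center -> Gaussian-shaped label map
    Frame 1 trains for 100 iterations; every later frame re-trains once."""
    centers, center = [], init_center
    for t, patch in enumerate(frames):
        label = make_label(center)                     # label at last center
        model.train(patch, label, n_first if t == 0 else n_update)
        response = model.predict(patch)                # response (heat) map
        center = np.unravel_index(np.argmax(response), response.shape)
        centers.append(center)
    return centers
```

The peak of each response map becomes the predicted target position for the next frame.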
Procedure - Network Structure
Our network structure is nearly the same as the original FCN-8s, except that one more convolutional layer is appended after the last feature-map layer.
The newly added convolutional layer is trained to transform the segmentation response map into a response map with the target center highlighted.
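The appended layer can be sketched as follows (assumption: a 1×1 convolution is shown for simplicity, since a 1×1 kernel reduces to a per-pixel linear map across channels; the slide only says "one more convolutional layer"):

```python
import numpy as np

def conv1x1(feature_map, weights, bias):
    """1x1 convolution: a per-pixel linear map across channels.
    feature_map: (H, W, C_in); weights: (C_in, C_out); bias: (C_out,)
    Returns (H, W, C_out)."""
    return feature_map @ weights + bias

# The added layer maps the multi-channel segmentation response to a
# single-channel map with the target centre highlighted (C_out = 1).
```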
[Figure: FCN-8s network diagram; one more convolution layer added after the last feature map]
FCN Tracker - Crop Result
Here s = 128, the scaling factor is 4, w = 81, h = 81.
We hand-pick a window of size w × h; the system uses an E × E window for training.
[Figure: the hand-picked w × h window inside the E × E training window]
Procedure: FCN Tracker - Crop function
We use the following relation to crop the image sequence: E is the edge length of the square region, s is the expected input size (128 here), and w and h denote the width and height of the object in the first frame. The scaling factor is set to 4 in our program.
The reason we do this is that we are only concerned with the object: instead of using the whole image, we use only the square region around the object center. The scaling factor is used because we want to cover some context information around the object.
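A sketch of the crop (the exact formula for E did not survive extraction; E = scale × max(w, h) is an assumption used here for illustration, with scale = 4):

```python
import numpy as np

def crop_square(image, cx, cy, w, h, s=128, scale=4):
    """Crop a square region of edge E around the target centre and
    resize it to s x s.  E = scale * max(w, h) is an assumption."""
    E = int(scale * max(w, h))              # edge length of the square crop
    half = E // 2
    H, W = image.shape[:2]
    # clamp the crop window to the image boundary
    y0, y1 = max(0, cy - half), min(H, cy + half)
    x0, x1 = max(0, cx - half), min(W, cx + half)
    patch = image[y0:y1, x0:x1]
    # nearest-neighbour resize to s x s (keeps the sketch dependency-free)
    rows = np.linspace(0, patch.shape[0] - 1, s).astype(int)
    cols = np.linspace(0, patch.shape[1] - 1, s).astype(int)
    return patch[np.ix_(rows, cols)]
```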
FCN Tracker - Predict output
After training the model, the network is able to predict the target center: the target center is taken as the position of the maximum of the confidence map.
Since we assume that the object movement is smooth, we impose higher weights near the center and lower weights in the surrounding region. To do this we use a Hann window. Here C denotes the confidence map, the response map is the output of the network, and P is the prior distribution, which is a Hann window in our program.
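The prediction step can be sketched as follows (C = R ⊙ P, an element-wise product of the response map R and the Hann-window prior P, is assumed from the description above):

```python
import numpy as np

def predict_center(response):
    """Weight the response map by a 2-D Hann-window prior and take
    the argmax of the resulting confidence map as the target centre."""
    h, w = response.shape
    P = np.outer(np.hanning(h), np.hanning(w))   # 2-D Hann prior
    C = response * P                             # confidence map
    return np.unravel_index(np.argmax(C), C.shape)
```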
FCN Tracker - Predict output
[Figure: input cropped data → the FCN predicts the response map → the response map is multiplied by the prior P (a Hann window) → confidence map C]
FCN Tracker - Generate Label
After tracking a frame we still need to refine our model to improve accuracy, which means we need a label. No ground-truth label is provided, so we assume the label should be a Gaussian-shaped distribution.
Here f(i, j) denotes the value in the label indexed by i and j, and (cx, cy) denotes the coordinates of the target center. The width parameter is predefined; we set it to 4 in our program.
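The label generation can be sketched as follows (the exponent's normalisation, −d²/(2σ²), is an assumed standard 2-D Gaussian; the slide only states that the width parameter is set to 4):

```python
import numpy as np

def gaussian_label(size=128, cx=64, cy=64, sigma=4.0):
    """Gaussian-shaped label f(i, j) centred at (cx, cy)."""
    i, j = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    d2 = (i - cx) ** 2 + (j - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))
```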
The tracker in action
Frame 1: apply the FCN for 100 iterations to generate the response (heat) map; use the response map to predict the next target position.
Frame 2: apply the FCN for 1 iteration to generate the updated response (heat) map; use the updated response map to predict the next target position.
Frame 3: apply the FCN for 1 more iteration to generate the next updated response (heat) map, and so on.
[Figure: per-frame pipeline showing the response map for Frame 1 and updated response maps for Frames 2 and 3]
FCN Tracker - Result
Demo video: https://youtu.be/WcQLmTi07Oc
FCN Tracker - Result
Results compared with other trackers on the OTB database.
[Figure: precision plot - % of frames within the location-error threshold; success plot - % of frames within the overlap threshold]
Conclusion
Our FCN can track the target and update the model online efficiently.
The whole pipeline is straightforward and simple.
Experimental results on the OTB benchmark show that our tracker is competitive with state-of-the-art trackers.
Thank you
Q&A