Slide1
Learning Drivers through Imitation using Supervised Methods
By Luigi Cardamone, Daniele Loiacono and Pier Luca Lanzi
Slide2
The outline
Introduction
Related work
TORCS
Imitation learning
What sensors?
What actions?
What learning method?
What data?
Experimental results
Discussion, conclusions and future work
Slide3
Introduction
What is imitation learning?
Supervised learning
Neuroevolution
Two main methods
Direct methods
Indirect methods
Slide4
Introduction
Direct methods are well known to be very ineffective.
Our methods develop drivers with only 15% lower performance than the best bot in TORCS.
The trick is in “human-like” high-level action prediction
Slide5
The outline
Introduction
Related work
TORCS
Imitation learning
What sensors?
What actions?
What learning method?
What data?
Experimental results
Discussion, conclusions and future work
Slide6
Related work
Imitation learning in computer games
Rule-based NPC for Quake III via a two-step process
Quake II NPC via reinforcement learning, fuzzy clustering and Bayesian motion modeling.
Neural networks with backpropagation for Legion II and Motocross The Force.
Drivatar training for Forza Motorsport
Slide7
The outline
Introduction
Related work
TORCS
Imitation learning
What sensors?
What actions?
What learning method?
What data?
Experimental results
Discussion, conclusions and future work
Slide8
What input sensors?
The rangefinder sensor
The lookahead sensor
Slide9
What actions?
4 low-level effectors in TORCS
Wheel
Gas pedal
Brake pedal
Gear change
2 high-level actions in this work
Speed
Trajectory
Slide10
What learning methods?
K-nearest neighbor
Training:
Doesn’t need any training
How was it applied?
Directly during the TORCS race
At each tick, the logged data is searched to find the k most similar instances.
The k most similar instances are selected and averaged
Slide11
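The k-nearest-neighbor lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name the distance metric or the data layout, so Euclidean distance and the `(sensor_vector, action_vector)` pair format are assumptions, and `knn_predict` is a hypothetical name.

```python
import math


def knn_predict(query, logged_data, k=20):
    """Average the actions of the k logged instances closest to `query`.

    logged_data is a list of (sensor_vector, action_vector) pairs.
    Euclidean distance is an assumption; the slides do not name the metric.
    """
    if not logged_data:
        raise ValueError("logged_data must not be empty")
    # Search the whole log for the k most similar sensor readings.
    nearest = sorted(logged_data, key=lambda pair: math.dist(query, pair[0]))[:k]
    n_actions = len(nearest[0][1])
    # Average each action component (e.g. speed, trajectory) over the neighbors.
    return [
        sum(action[i] for _, action in nearest) / len(nearest)
        for i in range(n_actions)
    ]


# Tiny hypothetical log: two sensor values, two actions (speed, trajectory).
log = [
    ([0.0, 0.0], [10.0, 0.0]),
    ([1.0, 0.0], [20.0, 0.5]),
    ([10.0, 10.0], [100.0, -1.0]),
]
prediction = knn_predict([0.1, 0.0], log, k=2)
```

Because the full log is searched at every tick, the cost of a prediction grows with the number of logged examples.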
What learning methods?
Neural networks
Training
NeuroEvolution of Augmenting Topologies (NEAT), which evolves both the weights and the topology of a neural network
How was it applied?
2 networks: one for speed prediction and one for target position prediction
Rangefinder networks with 19 angle inputs + 1 bias input
Lookahead networks with 8 segment inputs + 1 bias input
The fitness was defined as the prediction error
Slide12
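A fitness based on prediction error could look like the sketch below. The slides only say that the fitness was the prediction error, so the mean-squared form, the `prediction_error` name, and the way the bias input is appended are assumptions for illustration. The network is any callable that maps an input list to a single number, and no particular NEAT library is implied.

```python
def prediction_error(network, examples):
    """Mean squared error of `network` over (inputs, target) examples.

    Lower is better. The squared-error form is an assumption; the slides
    only state that fitness was defined as the prediction error.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    total = 0.0
    for inputs, target in examples:
        # Append the constant bias input (e.g. 19 rangefinder angles + 1 bias).
        output = network(list(inputs) + [1.0])
        total += (output - target) ** 2
    return total / len(examples)


def toy_network(x):
    """Hypothetical stand-in for an evolved network: sums its inputs."""
    return sum(x)


examples = [([1.0, 2.0], 4.0), ([0.0, 0.0], 0.0)]
error = prediction_error(toy_network, examples)
```

With two networks, one such error would be computed for the speed predictor and one for the target position predictor.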
What data?
Inferno bot on 3 tracks for 3 laps each
A simple fast track
A difficult track with many fast turns
A difficult track with many slow sharp turns
Only the data from the second lap was recorded
3 data sets with 1982, 3899 and 3619 examples
Additional all-in set with 9500 examples
Slide13
The outline
Introduction
Related work
TORCS
Imitation learning
What sensors?
What actions?
What learning method?
What data?
Experimental results
Discussion, conclusions and future work
Slide14
Experimental results
Overall, we obtained 16 models
2 learning algorithms
3 + 1 datasets
2 types of sensors
The k-nearest neighbor algorithm was applied with k = 20
NEAT was applied with 100 individuals for 100 generations
All the experiments were conducted using TORCS 1.3.1
Slide15
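The count of 16 models follows from combining every learning algorithm with every dataset and sensor type. The short sketch below enumerates the combinations and also checks that the all-in dataset size matches the sum of the three per-track sets. The dataset labels are placeholders, since the slides do not name the tracks.

```python
from itertools import product

ALGORITHMS = ["k-nearest neighbor", "NEAT"]
# Placeholder labels; the slides do not give the track names.
DATASETS = ["track 1", "track 2", "track 3", "all-in"]
SENSORS = ["rangefinder", "lookahead"]

# Per-track example counts reported on the data slide.
TRACK_EXAMPLES = [1982, 3899, 3619]


def model_configurations():
    """Return every (algorithm, dataset, sensor) combination."""
    return list(product(ALGORITHMS, DATASETS, SENSORS))


configurations = model_configurations()
all_in_size = sum(TRACK_EXAMPLES)
```

Here `len(configurations)` is 2 × 4 × 2 = 16 and `all_in_size` is 9500, matching the figures on the slides.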
Experimental Results - Evaluation
Each model was evaluated by using it to drive a car on each track for 10,000 game ticks.
The tracks
3 tracks used for training
2 unseen tracks
A simple fast track
A track with many fast and difficult turns
The driver was also equipped with a standard recovery policy.
Slide16
Experimental results - Results
Inferno was better than its imitations
Lookahead sensors are better than rangefinders
K-nearest neighbor is better than NEAT
One of the models had only 15% lower performance than the Inferno bot.
Slide17
The summary of the results
Slide18
Experimental results - Execution time
Direct methods result in low computational cost
Our approach needs 30 times less CPU time to obtain reasonable results
Slide19
Increasing the lookahead
How much lookahead is useful?
A second series of tests with 8 and 16 lookahead values showed overfitting
Slide20
The outline
Introduction
Related work
TORCS
Imitation learning
What sensors?
What actions?
What learning method?
What data?
Experimental results
Discussion, conclusions and future work
Slide21
Discussion
Good drivers
Close to the target bot
Run off the track in difficult turns as a result of prediction error or low reactivity in steering
Bad drivers
Many discontinuities in the predicted trajectories
These cause the car to move quickly from one side of the track to the other
Slide22
Perceptual aliasing
Two different places can be perceived as the same
Usually happens on long straight parts of the road
Can be solved via special treatment of straight parts, full throttle or a bigger lookahead
Slide23
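Perceptual aliasing can be illustrated with a toy model. In the sketch below, the track is a hypothetical list of per-segment curvature values and the sensor simply reports the next few segments. This is a simplification for illustration, not the sensor used in the work. Two positions on a long straight give identical readings with a short lookahead, while a longer lookahead tells them apart.

```python
def lookahead(track, position, n_segments):
    """Return the curvature of the next n_segments segments (toy sensor model)."""
    return [track[(position + i) % len(track)] for i in range(n_segments)]


def aliased(track, pos_a, pos_b, n_segments):
    """True if the two positions produce identical sensor readings."""
    return lookahead(track, pos_a, n_segments) == lookahead(track, pos_b, n_segments)


# Hypothetical circular track: 0.0 is a straight segment, nonzero is a turn.
TRACK = [0.0] * 12 + [0.8, 0.8] + [0.0] * 4 + [-0.5]

short_view_aliased = aliased(TRACK, 0, 2, n_segments=4)   # both see only straight
long_view_aliased = aliased(TRACK, 0, 2, n_segments=12)   # the turn becomes visible
```

The lookahead sizes 4 and 12 are arbitrary choices for this toy track. When readings are identical, a model that maps sensors to actions has to produce the same action at both positions, even if one of them is closer to a turn.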
Summary
Supervised learning to imitate a driver
High-level aspects of driving (speed and trajectory) rather than low-level effectors
Novel lookahead sensor
Good results with k-nearest neighbor
Inferno bot is still better due to perceptual aliasing and slow steering during abrupt turns
Slide24
Future work
Exploit structural symmetry on the track
Increase the robustness to noise
Reduce computational cost
Improve steering reaction to abrupt turns