Slide 1
Learning Agile and Dynamic Motor Skills for Legged Robots
Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter
Presented by Steven Mazzola
UNI: slm2242
Slide 2
Why Legged Robots?
A good alternative to wheeled robots on rough terrain and in otherwise complicated environments
Can perform actions similar to those of humans and other animals
Longer legs improve the robot's ability to step over obstacles and to climb
Slide 3
Boston Dynamics PETMAN (hydraulic actuator)
Boston Dynamics BigDog (hydraulic actuator)
MIT Cheetah (electric actuator)
Boston Dynamics SpotMini (electric actuator)
Slide 4
What's wrong with current systems?
Hydraulic actuator robots (PETMAN, BigDog)
Advantage: run on conventional fuel and deliver high energy output for their size
Disadvantages:
Noisy and produce smoke, so they cannot be used indoors without special accommodations
Heavy and bulky, which limits the robot to a large size
Electric actuator robots (Cheetah, SpotMini)
Cheetah: optimized mostly for speed and lacks general-purpose capabilities
SpotMini: inner workings are kept secret
Slide 5
ANYmal Robot
Bio-inspired, dog-sized quadruped
Weight: 32 kg
4 legs, each 55 cm long with 3 degrees of freedom
High leg-length-to-footprint ratio
12 series-elastic actuators (SEAs), each consisting of:
Electric motor
High-gear-ratio transmission
Elastic element
Rotary encoders measuring spring deflection and output position
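Because the elastic element sits in series with the transmission, an SEA can estimate joint torque from the measured spring deflection using Hooke's law. A minimal sketch, assuming a linear torsional spring; the stiffness value and the sign convention (deflection = transmission side minus output side) are hypothetical, not ANYmal's actual parameters.

```python
def sea_torque(spring_deflection, stiffness):
    """Estimate joint torque (N*m) from spring deflection (rad) via Hooke's law.

    Assumes a linear torsional spring; `stiffness` (N*m/rad) is a hypothetical value.
    """
    return stiffness * spring_deflection


def motor_side_position(output_position, spring_deflection):
    """Recover the transmission-side angle (rad) from the two encoder readings.

    Assumed sign convention: deflection = transmission side - output side.
    """
    return output_position + spring_deflection


# Hypothetical reading: 0.01 rad of deflection on a 200 N*m/rad spring.
tau = sea_torque(0.01, 200.0)
```

The torque estimate is only as good as the spring model, which is one reason the real actuator dynamics are hard to capture analytically.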
Slide 6
Methodology
Slide 7
1. Modeling
ANYmal is composed of rigid links, with ball bearings at the joints.
Would a fully ideal model representation be sufficient?
What aspects would be difficult to model?
Inertial properties
Actuator dynamics
Slide 8
1. Modeling
Slide 9
2. Training
Actuators have non-linear dynamics and complex internal states
Self-supervised learning can be used to determine the action-to-torque relationship through an actuator network
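The network's inputs are a history of joint position errors and joint velocities
The current state and the two past states, spaced 0.01 s apart (t, t - 0.01 s, t - 0.02 s)
The interval is sparse enough to prevent overfitting but dense enough to capture high-frequency dynamics
The interval was tuned using validation error
Training data are collected by generating foot trajectories, computing the corresponding joint position targets with inverse kinematics, and recording the real robot's response; the network is trained on the error between its predicted torques and the measured data.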
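The input history can be sketched as a small per-joint buffer. This is an illustration, not the authors' code: the raw sampling period (`sample_dt = 0.0025` s), the class name `ActuatorHistory`, and the feature ordering are hypothetical; only the 0.01 s spacing and the three-state history come from the slide.

```python
from collections import deque


class ActuatorHistory:
    """Hypothetical helper: recent (position error, velocity) samples for one joint."""

    def __init__(self, sample_dt=0.0025, history_dt=0.01, n_states=3):
        # Raw samples between two history entries (0.01 s / 0.0025 s = 4).
        self.stride = round(history_dt / sample_dt)
        self.n_states = n_states
        # Just enough raw samples to reach back (n_states - 1) * history_dt seconds.
        self.buffer = deque(maxlen=self.stride * (n_states - 1) + 1)

    def push(self, position_target, position, velocity):
        """Record one raw sample: position error and joint velocity."""
        self.buffer.append((position_target - position, velocity))

    def features(self):
        """Return [e(t), e(t-0.01), e(t-0.02), v(t), v(t-0.01), v(t-0.02)]."""
        if len(self.buffer) < self.buffer.maxlen:
            raise ValueError("not enough samples recorded yet")
        samples = [self.buffer[-1 - k * self.stride] for k in range(self.n_states)]
        errors = [e for e, _ in samples]
        velocities = [v for _, v in samples]
        return errors + velocities
```

Skipping the intermediate raw samples is what makes the history "sparse": the network sees three snapshots spanning 0.02 s rather than every sample in that window.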
Slide 10
2. Training
Multi-layer perceptron (MLP) actuator network
3 hidden layers of 32 units each
Softsign activation function
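A pure-Python sketch of the actuator network's forward pass with the slide's architecture (three hidden layers of 32 softsign units). The six inputs assume the position-error and velocity history from the previous slide; the single linear torque output, the initialization, and the names are assumptions. The weights here are random; in the method they would be fit to measured torques (a squared-error loss is a typical choice, assumed here rather than taken from the slides).

```python
import random


def softsign(x):
    """Softsign activation: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def init_layer(n_in, n_out, rng):
    """Small random weights; a trained network would load learned values instead."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine layer: W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class ActuatorNet:
    """Hypothetical MLP: 6 history features -> 3 x 32 softsign units -> 1 torque."""

    def __init__(self, n_inputs=6, hidden=(32, 32, 32), seed=0):
        rng = random.Random(seed)
        sizes = [n_inputs, *hidden, 1]
        self.layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

    def __call__(self, features):
        x = list(features)
        for layer in self.layers[:-1]:
            x = [softsign(v) for v in dense(x, layer)]
        # Linear output layer: a single predicted joint torque.
        return dense(x, self.layers[-1])[0]


net = ActuatorNet()
torque = net([0.05, 0.04, 0.02, 1.0, 0.8, 0.5])
```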
Slide 11
3. Learning
Observation o(t): state measurement of the robot
Includes nine measurements: base orientation, base height, linear velocity, angular velocity, joint position, joint velocity, joint state history, previous action, and command
Locomotion training uses all nine, while recovery training omits base height
Action a(t): position command to the actuators
Sampled from a stochastic policy
Tracked by a PD controller with fixed gains
Kp: set to a value that keeps the relative ranges of position and torque similar
Kd: set high to reduce oscillation
Reward r(t): term that promotes the desired behavior
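A minimal sketch of the action pathway, assuming a diagonal Gaussian policy (a common form for stochastic policies trained with TRPO, not confirmed by the slides) and a PD law with zero target velocity. All gains, means, and noise levels below are hypothetical.

```python
import random


def sample_action(mean, std, rng):
    """Draw joint position targets from a diagonal Gaussian policy (assumed form)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def pd_torque(q_target, q, q_dot, kp, kd):
    """Fixed-gain PD law: tau = Kp * (q_target - q) - Kd * q_dot.

    Assumes a zero target velocity; a large kd damps oscillation around the target.
    """
    return [kp * (qt - qi) - kd * vi for qt, qi, vi in zip(q_target, q, q_dot)]


rng = random.Random(0)
mean = [0.0, 0.6, -1.2]  # hypothetical policy output for one leg (rad)
std = [0.1, 0.1, 0.1]    # hypothetical exploration noise (rad)
q_target = sample_action(mean, std, rng)
tau = pd_torque(q_target, q=[0.0, 0.5, -1.0], q_dot=[0.0, 0.2, -0.1], kp=50.0, kd=1.0)
```

Keeping Kp and Kd fixed means the policy only has to learn where each joint should go; the PD law turns that target into a torque.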
Slide 12
3. Learning
Multi-layer perceptron (MLP) policy network
2 hidden layers of 256 and 128 units
Tanh activation function
Trust Region Policy Optimization (TRPO)
Custom implementation processes 250,000 state transitions in 4 hours
Training stops if performance does not improve for 300 iterations
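The policy network can be sketched in the same style, with two tanh hidden layers of 256 and 128 units. The observation size (60), the linear output layer producing the action mean, and all names are hypothetical; the action size of 12 assumes one position target per actuator. `EarlyStopper` illustrates the 300-iteration stopping rule; TRPO itself is not implemented here.

```python
import math
import random


def make_mlp(sizes, rng):
    """Random weights for a fully connected net; `sizes` lists the layer widths."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers


def policy_mean(obs, layers):
    """Forward pass: tanh on hidden layers, linear output (the action mean)."""
    x = obs
    for i, (w, b) in enumerate(layers):
        x = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


class EarlyStopper:
    """Signal a stop after `patience` iterations without a new best score."""

    def __init__(self, patience=300):
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0

    def update(self, score):
        if score > self.best:
            self.best, self.stale = score, 0
        else:
            self.stale += 1
        return self.stale >= self.patience


OBS_DIM, ACT_DIM = 60, 12  # hypothetical observation size; 12 = one target per actuator
policy = make_mlp([OBS_DIM, 256, 128, ACT_DIM], random.Random(0))
action_mean = policy_mean([0.1] * OBS_DIM, policy)
```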
Slide 13
3. Learning
Joint torque and joint velocity penalties must be balanced:
Penalties too low: unnatural motions
Penalties too high: the policy is biased toward standing still and movement is limited
Curriculum:
Learn the primary objective first, then refine the motion for the other criteria
A curriculum factor in the range [0, 1] scales all non-primary cost terms
For locomotion, the base velocity error cost is not scaled
For recovery, the base orientation cost is not scaled
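A minimal sketch of the curriculum-weighted cost. The slide only states that a factor in [0, 1] is applied to the non-primary terms; the update rule `kc <- kc ** kd`, the starting value 0.3, the rate 0.997, and the example cost values are assumptions for illustration.

```python
def total_cost(primary_cost, other_costs, kc):
    """The primary term is unscaled; every other term is multiplied by kc in [0, 1]."""
    return primary_cost + kc * sum(other_costs.values())


def advance_curriculum(kc, kd=0.997):
    """Assumed schedule kc <- kc ** kd: for 0 < kc, kd < 1 this moves kc toward 1."""
    return kc ** kd


kc = 0.3  # hypothetical start: torque and velocity penalties begin weak
costs = {"joint_torque": 2.0, "joint_velocity": 3.0}  # hypothetical per-step values
history = []
for iteration in range(3):
    history.append(total_cost(primary_cost=1.0, other_costs=costs, kc=kc))
    kc = advance_curriculum(kc)
```

Early on, the small factor lets the policy focus on the primary objective without being pushed toward standing still; as the factor grows, the torque and velocity penalties gradually shape a more natural motion.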
Slide 14
Locomotion
Recovery
Slide 15
3. Learning
High-speed locomotion training:
Randomly sampled forward, lateral, and turning velocities are sent as commands
Each trajectory is executed for 6 seconds
The simulation is terminated early if the robot violates ground-contact or other limits
Training time: 4 hours
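The episode structure for high-speed locomotion training can be sketched as follows. The command ranges, the 0.01 s control period, and the `step` hook standing in for the simulator are hypothetical; the 6-second duration and early termination come from the slide.

```python
import random


def sample_command(rng, max_speed=1.0, max_yaw_rate=1.0):
    """Hypothetical uniform ranges: forward/lateral in m/s, turning in rad/s."""
    return {
        "forward": rng.uniform(-max_speed, max_speed),
        "lateral": rng.uniform(-max_speed, max_speed),
        "turning": rng.uniform(-max_yaw_rate, max_yaw_rate),
    }


def run_episode(step, rng, duration=6.0, dt=0.01):
    """Run one episode of `duration` seconds at control period `dt`.

    `step(command)` is a hypothetical simulator hook that advances one control
    step and returns True if the robot violated a termination condition.
    Returns the number of steps actually executed.
    """
    command = sample_command(rng)
    n_steps = round(duration / dt)
    for k in range(n_steps):
        if step(command):
            return k + 1  # terminated early
    return n_steps
```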
Recovery training:
The robot's collision bodies are given random sizes and positions
The robot is dropped from 1 meter in a random configuration
The simulation runs for 1.2 seconds, and the resulting state is used as the initial state for learning
Training time: 11 hours
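Generating recovery start states can be sketched as below. `toy_drop` is only a placeholder for a real rigid-body simulator (it treats the base as a point that free-falls and stops at the ground); the randomization ranges and field names are hypothetical, while the 1 m drop height and 1.2 s settling time come from the slide.

```python
import math
import random


def random_configuration(rng, n_joints=12):
    """Hypothetical random start: 1 m drop, random attitude, joints, collision bodies."""
    return {
        "height": 1.0,
        "vertical_velocity": 0.0,
        "base_roll_pitch_yaw": [rng.uniform(-math.pi, math.pi) for _ in range(3)],
        "joint_angles": [rng.uniform(-math.pi, math.pi) for _ in range(n_joints)],
        "collision_body_scale": rng.uniform(0.8, 1.2),     # hypothetical size range
        "collision_body_offset": rng.uniform(-0.02, 0.02),  # hypothetical offset (m)
    }


def toy_drop(state, duration=1.2, dt=0.001, g=9.81):
    """Placeholder for the physics simulator: the base free-falls and stops on the ground."""
    s = dict(state)
    for _ in range(round(duration / dt)):
        s["vertical_velocity"] -= g * dt
        s["height"] += s["vertical_velocity"] * dt
        if s["height"] <= 0.0:  # inelastic ground contact
            s["height"], s["vertical_velocity"] = 0.0, 0.0
    return s


def make_initial_states(n, rng):
    """Drop n random configurations; the settled states seed recovery training."""
    return [toy_drop(random_configuration(rng)) for _ in range(n)]


initial_states = make_initial_states(5, random.Random(0))
```

Letting the simulator settle the robot first means training starts from physically plausible fallen poses rather than from configurations floating in mid-air.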
Slide 16
4. Deployment
The parameters trained in simulation are ported to the robot's onboard PC
Position commands are converted to torque commands to control the real robot
The complexity of the actuators complicates this transfer
Slide 17
Video
https://www.youtube.com/watch?v=aTDkYFZFWug&feature=youtu.be
Slide 18
Results
Locomotion policy:
A: Discovered gait for 1.0 m/s forward velocity
B: Comparison of base velocity tracking accuracy for different gaits
C, D, E: Comparison between controllers for different gaits
Slide 19
Results
High-speed policy:
A: Forward velocity
B: Joint velocities
C: Joint torques
D: Gait pattern
Slide 20
Results
Actuation validation:
A: Validation set
B, C: 0.75 m/s forward velocity
D, E: 1.6 m/s forward velocity
Slide 21
So what was achieved?
ANYmal acquired locomotion skills learned purely in a simulated training environment on an ordinary computer.
In locomotion tests, the robot exceeded the previous speed record for ANYmal by 25%.
The recovery rate was 100% after tuning the joint velocity constraints, even from complex initial configurations.
The simulation and learning framework created in this research can, in principle, be applied to any rigid-body system.
Slide 22
Additional References
ANYmal: https://www.anybotics.com/anymal-legged-robot/
Boston Dynamics: https://www.bostondynamics.com
MIT: http://biomimetics.mit.edu/
Thank you!