Learning Agile and Dynamic Motor Skills for Legged Robots


Uploaded On 2022-08-04


Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Presented by Steven Mazzola (UNI: slm2242)



Presentation Transcript

Slide1

Learning Agile and Dynamic Motor Skills for Legged Robots

Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter

Presented by Steven Mazzola

UNI: slm2242

Slide2

Why Legged Robots?

Good alternative to wheeled robots for rough terrain or otherwise complicated environments

Can perform similar actions to humans or other animals

Leg length increases obstacle avoidance and climbing ability

Slide3

Boston Dynamics PETMAN (hydraulic actuator)

Boston Dynamics BigDog (hydraulic actuator)

MIT Cheetah (electric actuator)

Boston Dynamics SpotMini (electric actuator)

Slide4

What's wrong with current systems?

Hydraulic actuator robots (PETMAN, BigDog)

Advantage: uses conventional fuel, high energy output for size

Disadvantages:

Noisy and produces smoke; cannot be used indoors without special accommodations

Heavy and large; limits robot to large size

Electric actuator robots (Cheetah, SpotMini)

Cheetah: optimized mostly for speed, lacks general application capabilities

SpotMini: inner workings are kept secret

Slide5

ANYmal Robot

Bio-inspired dog-sized quadruped

Weight: 32 kg

4 legs, each 55 cm long with 3 degrees of freedom

High leg-length to footprint ratio

12 Series-Elastic Actuators (SEAs)

[Figure: series-elastic actuator, labeled with electric motor, high gear ratio transmission, elastic element, and rotary encoders measuring spring deflection and output position]

Slide6

Methodology

Slide7

1. Modeling

ANYmal is composed of rigid links and ball bearings at the joints.

Would a fully ideal model representation be sufficient?

What aspects would be difficult to model?

Inertial properties

Actuator dynamics

Slide8

1. Modeling

Slide9

2. Training

Actuators have non-linear dynamics and complex internal states

Self-supervised learning can be used to determine the action-to-torque relationship through an actuator network

Network uses history of position errors and velocity

Current state and past two states at 0.01s interval

Interval is sparse to prevent overfitting but dense enough to capture high frequency dynamics

Tuned with validation error

Trained by generating foot trajectories, computing expected joint positions with inverse kinematics, and collecting error between predicted and real data.

Slide10

2. Training

Multi-layer perceptron (MLP) actuator network

3 hidden layers of 32 units each

Softsign activation function
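A minimal sketch of the actuator network's forward pass, assuming a six-value input (position error and joint velocity at the current step and the two previous steps, 0.01 s apart) and a single torque output. The function names, the random weight initialization, and the input ordering are illustrative assumptions, not details given in the slides; a trained network would load learned weights instead.

```python
import numpy as np

def softsign(x):
    # Softsign activation: x / (1 + |x|)
    return x / (1.0 + np.abs(x))

def init_actuator_net(rng, n_in=6, hidden=(32, 32, 32), n_out=1):
    # Random placeholder weights (assumption); shapes follow the slide:
    # three hidden layers of 32 units each.
    sizes = (n_in, *hidden, n_out)
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def predict_torque(params, pos_err_hist, vel_hist):
    # pos_err_hist, vel_hist: values at t, t - 0.01 s, t - 0.02 s
    x = np.concatenate([pos_err_hist, vel_hist])
    for W, b in params[:-1]:
        x = softsign(x @ W + b)
    W, b = params[-1]
    return (x @ W + b)[0]  # linear output layer: estimated joint torque

rng = np.random.default_rng(0)
params = init_actuator_net(rng)
torque = predict_torque(params,
                        np.array([0.05, 0.04, 0.03]),
                        np.array([0.5, 0.4, 0.3]))
```

With random weights the returned torque is meaningless; the sketch only shows the data flow from the state history to a torque estimate.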

Slide11

3. Learning

Observation o(t): state measurement of robot

Includes nine measurements: base orientation, base height, linear velocity, angular velocity, joint position, joint velocity, joint state history, previous action, command

Locomotion training uses all nine, while recovery training omits base height

Action a(t): position command to actuator

Selected according to stochastic policy

Uses a fixed PD controller

Kp set at a value which keeps relative ranges of position and torque similar

Kd set at a high value to reduce oscillation

Reward r(t): factor to promote desired behavior
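The fixed PD controller described on this slide can be sketched as a standard proportional-derivative law that turns a joint position command into a torque. The gain values below are placeholders chosen for illustration only, since the slides do not state them, and the function name is hypothetical.

```python
def pd_torque(q_target, q, q_dot, kp=50.0, kd=0.1):
    # Proportional term pulls the joint toward the commanded position;
    # derivative term damps joint velocity to reduce oscillation.
    # kp and kd are illustrative values, not taken from the slides.
    return kp * (q_target - q) - kd * q_dot

# Joint at 0.5 rad moving at 2.0 rad/s, commanded to 1.0 rad
tau = pd_torque(q_target=1.0, q=0.5, q_dot=2.0)
```

Because the gains are fixed, the policy shapes the torque only through the position target it outputs.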

Slide12

3. Learning

Multi-layer perceptron (MLP) policy network

2 hidden layers of 256 and 128 units

Tanh activation function

Trust Region Policy Optimization (TRPO)

Custom implementation allows for 250,000 state transitions in 4 hours

Stop if no performance improvement in 300 iterations
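The stopping rule (no performance improvement in 300 iterations) can be sketched as a patience check over the history of per-iteration returns. How "performance" is measured is not specified in the slides, so the use of one scalar return per iteration and the function name are assumptions.

```python
def should_stop(returns, patience=300):
    """Return True once the best return seen has not improved
    during the most recent `patience` iterations."""
    if len(returns) <= patience:
        return False  # not enough history yet
    best_before = max(returns[:-patience])
    best_recent = max(returns[-patience:])
    return best_recent <= best_before

# Small patience for illustration
history = [5.0, 1.0, 1.0]
stop = should_stop(history, patience=2)
```

In a training loop, the check would run after each TRPO iteration with the newest return appended to `returns`.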

Slide13

3. Learning

Need to find a balance for joint torque and velocity penalties

Low penalty: unnatural motions

High penalty: standing bias, limited movement

Curriculum:

Learn primary objective first, then refine movement for other criteria

Curriculum factor in range [0,1] multiplies all non-primary terms

For locomotion, base velocity error cost is unaffected

For recovery, base orientation cost is unaffected
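One way to read the curriculum on this slide, assuming the factor scales the non-primary cost terms and grows toward 1 as training proceeds. The update rule `k_c ** k_d`, the starting value 0.3, and the rate 0.997 are illustrative assumptions; the slides give only the [0,1] range.

```python
def total_cost(primary_cost, secondary_costs, k_c):
    # The primary term (e.g. base velocity error for locomotion) is
    # never scaled; every other term is weighted by the factor k_c.
    return primary_cost + k_c * sum(secondary_costs)

def advance_curriculum(k_c, k_d=0.997):
    # For k_c and k_d in (0, 1), k_c ** k_d is larger than k_c,
    # so the factor rises monotonically toward 1.
    return k_c ** k_d

k_c = 0.3  # hypothetical starting value
for _ in range(5000):
    k_c = advance_curriculum(k_c)
```

Early in training the secondary penalties are weak, so the policy first learns the primary objective; as `k_c` approaches 1, the torque and velocity penalties take full effect and refine the motion.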

Slide14

Locomotion

Recovery

Slide15

3. Learning

High-speed locomotion training:

Send randomly sampled forward, lateral, and turning velocities as commands

Trajectory executed for 6 seconds

Simulation terminated if in violation of ground or other limits

Training time: 4 hours

Recovery training:

Robot collision bodies are given random sizes and positions

Dropped from 1 meter in random configurations

Simulation runs for 1.2 seconds and result is set as initial position for learning

Training time: 11 hours
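The command sampling used for locomotion training can be sketched as drawing a forward, lateral, and turning velocity at the start of each 6-second episode. The velocity ranges and the uniform distribution below are placeholders, since the slides do not give them.

```python
import random

EPISODE_SECONDS = 6.0  # trajectory length stated on the slide

def sample_command(rng, max_forward=1.0, max_lateral=0.4, max_turn=1.2):
    # Ranges (m/s, m/s, rad/s) are hypothetical values for illustration.
    return (rng.uniform(-max_forward, max_forward),
            rng.uniform(-max_lateral, max_lateral),
            rng.uniform(-max_turn, max_turn))

rng = random.Random(0)
forward, lateral, turn = sample_command(rng)
```

Each sampled triple would be held as the command input of the observation for the duration of one episode.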

Slide16

4. Deployment

Trained parameters set by simulation are now ported to onboard PC

Position commands converted to torque commands to control real robot

Complexity of actuators complicates this transfer

Slide17

Video

https://www.youtube.com/watch?v=aTDkYFZFWug&feature=youtu.be

Slide18

Results

Locomotion policy:

A: Discovered gait for 1.0 m/s forward velocity

B: Comparison of base velocity tracking accuracy for different gaits

C, D, E: Comparison between controllers for different gaits

Slide19

Results

High speed policy:

A: Forward velocity

B: Joint velocities

C: Joint torques

D: Gait pattern

Slide20

Results

Actuation validation:

A: Validation set

B, C: 0.75 m/s forward velocity

D, E: 1.6 m/s forward velocity

Slide21

So what was achieved?

ANYmal gained locomotion skills derived purely from a simulated training environment on an ordinary computer.

Locomotion tests outperformed previous speed record on the ANYmal by 25%

Recovery rate was 100% after tuning joint velocity constraints, even in complex initial configurations

The simulation and learning framework created in this research can be roughly applied to any rigid body system

Slide22

Additional References

ANYmal: https://www.anybotics.com/anymal-legged-robot/

Boston Dynamics: https://www.bostondynamics.com

MIT: http://biomimetics.mit.edu/

Thank you!