MIT 6.421:
Robotic Manipulation
Fall 2023, Lecture 18
Follow live at https://slides.com/d/0Oepxvs/live
(or later at https://slides.com/russtedrake/fall23-lec18)
policy needs to know
state of the robot × state of the environment
http://www.ai.mit.edu/projects/leglab/robots/robots.html
System
State-space
Auto-regressive (e.g. ARMAX)
input
output
state
noise/disturbances
parameters
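As a quick reference (my notation, chosen to match the labels above: input \(u\), output \(y\), state \(x\), disturbance \(w\), parameters \(\theta\); the exact form on the slide may differ), the two model families can be written as:

\[
\begin{aligned}
\text{State-space:} \quad & x[n+1] = f_\theta\big(x[n], u[n], w[n]\big), \qquad y[n] = g_\theta\big(x[n], u[n]\big) \\
\text{Auto-regressive (ARX):} \quad & y[n+1] = f_\theta\big(y[n], \ldots, y[n-H],\; u[n], \ldots, u[n-H]\big) + w[n]
\end{aligned}
\]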
2017
Levine*, Finn*, Darrell, Abbeel, JMLR 2016
[Architecture diagram: a perception network (often pre-trained) produces a learned state representation; the policy network maps this representation, other robot sensors, and the \(x\) history to actions]
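A minimal sketch of this kind of visuomotor architecture (all module shapes, sizes, and names below are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    """Sketch: perception network (often pre-trained) -> learned state representation,
    concatenated with other robot sensors, mapped to actions by a policy network."""
    def __init__(self, feature_dim=64, sensor_dim=7, action_dim=7):
        super().__init__()
        # Perception network; a toy CNN stands in for a pre-trained image backbone.
        self.perception = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim),
        )
        # Policy network: learned state representation + other robot sensors -> actions.
        self.policy = nn.Sequential(
            nn.Linear(feature_dim + sensor_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image, robot_sensors):
        z = self.perception(image)  # learned state representation
        return self.policy(torch.cat([z, robot_sensors], dim=-1))  # actions
```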
from hand-coded policies in sim
and teleop on the real robot
Standard "behavior-cloning" objective + data augmentation
"push box"
"flip box"
Policy is a small LSTM network (~100 LSTM units)
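A minimal sketch of the standard behavior-cloning objective with a small recurrent policy (sizes, names, and the choice of mean-squared-error loss are my assumptions, not details from the slide):

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Small recurrent policy: observation sequence -> action sequence."""
    def __init__(self, obs_dim=32, hidden_dim=100, action_dim=7):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs_seq):            # (batch, time, obs_dim)
        h, _ = self.lstm(obs_seq)
        return self.head(h)                # (batch, time, action_dim)

def bc_loss(policy, obs_seq, demo_actions):
    """Behavior-cloning objective: regress the demonstrated actions
    (data augmentation would be applied to obs_seq before this call)."""
    return nn.functional.mse_loss(policy(obs_seq), demo_actions)
```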
"And then … BC methods started to get good. Really good. So good that our best manipulation system today mostly uses BC, with a sprinkle of Q learning on top to perform high-level action selection. Today, less than 20% of our research investments is on RL, and the research runway for BC-based methods feels more robust."
Image source: Ho et al. 2020
Denoiser can be conditioned on additional inputs, \(u\): \(p_\theta(x_{t-1} | x_t, u) \)
Denoising approximates the projection onto the data manifold;
the denoiser approximates the gradient of the distance to the manifold
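A sketch of one reverse-diffusion step in the style of Ho et al. 2020, with the noise predictor additionally conditioned on \(u\); `eps_model` and the noise schedule `betas` are placeholders for whatever network and schedule are actually used:

```python
import torch

def ddpm_denoise_step(eps_model, x_t, t, u, betas):
    """One step of p_theta(x_{t-1} | x_t, u), following the DDPM update of
    Ho et al. 2020; eps_model predicts the noise, conditioned on (x_t, t, u)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    beta_t, alpha_t, alpha_bar_t = betas[t], alphas[t], alpha_bars[t]

    eps = eps_model(x_t, t, u)  # predicted noise, conditioned on u
    mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)
    if t == 0:
        return mean             # no noise added at the final step
    z = torch.randn_like(x_t)
    return mean + torch.sqrt(beta_t) * z   # using the sigma_t^2 = beta_t choice
```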
[Diagram: Control Policy (as a dynamical system), mapping input to output]
"Diffusion Policy" is an auto-regressive (ARX) model with forecasting
\(H\) is the length of the history,
\(P\) is the length of the prediction
Conditional denoiser produces the forecast, conditioned on the history
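A sketch of this inference loop (the function names and signature are mine; `denoise_step` could be, e.g., the conditional denoising step sketched above with the noise model bound in): condition on an observation history of length \(H\) and iteratively denoise a forecast of \(P\) future actions.

```python
import torch

def diffusion_policy_forecast(denoise_step, obs_history, pred_len, action_dim,
                              num_diffusion_steps, betas):
    """Sketch of Diffusion Policy inference: starting from noise, iteratively
    denoise a P-step action forecast, conditioned on the H-step history."""
    actions = torch.randn(pred_len, action_dim)          # pure noise to start
    for t in reversed(range(num_diffusion_steps)):
        actions = denoise_step(actions, t, obs_history, betas)
    return actions  # execute a prefix of the forecast, then re-plan (receding horizon)
```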
Image backbone: ResNet-18 (pretrained on ImageNet)
Total: 110M-150M Parameters
Training Time: 3-6 GPU Days ($150-$300)
e.g. to deal with "multi-modal demonstrations"
Andy Zeng's MIT CSL Seminar, April 4, 2022
Andy's slides.com presentation
with TRI's Soft Bubble Gripper
Open source: