Russ Tedrake
VP, Robotics Research
DARPA Robotics Challenge, 2015
Robots are dancing and doing parkour.
Now computer vision is really starting to work...
can they load the dishwasher?
for robotics; in a few slides
Released in 2009
Example: Text completion
No extra "labeling" of the data required!
But it's trained on the entire internet...
And it's a really big network
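A minimal sketch of why text completion needs no extra labels: every position in raw text already supplies its own target, namely the next word. The whitespace tokenizer, the `context_size` parameter, and the function name `next_token_pairs` are illustrative assumptions, not how any particular model is trained.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw text into (context, next_token) training pairs.

    Assumption: tokens are whitespace-separated words. Real language
    models use subword tokenizers and much longer contexts.
    """
    tokens = text.split()
    pairs = []
    for i in range(1, len(tokens)):
        # The context is the (up to) context_size tokens before position i.
        context = tokens[max(0, i - context_size):i]
        # The "label" is simply the token that actually comes next.
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("robots can load the dishwasher")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Each pair comes straight from the text itself, so any text on the web becomes training data without a human annotator.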
Humans have also put lots of captioned images on the web
...
"A painting of a professor giving a talk at a robotics competition kickoff"
Input:
Output:
"a painting of a handsome MIT professor giving a talk about robotics and generative AI at brimmer and may school in newton, ma"
Input:
Output:
Is DALL-E just next-pixel prediction?
Our engineering design process
Open source:
large language models
visually-conditioned language models
large behavior models
\(\sim\) VLA (vision-language-action)
\(\sim\) EFM (embodied foundation model)
Why actions (for dexterous manipulation) could be different:
should we expect similar generalization / scaling laws?
Success in (single-task) behavior cloning suggests that these are not blockers
Big data
Big transfer
Small data
No transfer
robot teleop
(the "transfer learning bet")
Open-X
simulation rollouts
novel devices
Cumulative Number of Skills Collected Over Time
+ Amazing university partners
http://manipulation.mit.edu
http://underactuated.mit.edu