Roboticist at MIT and TRI
Russ Tedrake
VP, Robotics Research
DARPA Robotics Competition, 2015
Robots are dancing and doing parkour.
Now computer vision is really starting to work...
can they load the dishwasher?
for robotics; in a few slides
Released in 2009
Example: Text completion
No extra "labeling" of the data required!
But it's trained on the entire internet...
And it's a really big network
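The "no extra labeling" point can be made concrete with a toy example. The sketch below is not a large language model: it assumes a tiny made-up corpus and a simple word-bigram counter (the names `make_training_pairs`, `fit_bigram_model`, and `complete` are hypothetical), purely to show that the raw text already supplies every training target as "the next word."

```python
from collections import Counter, defaultdict


def make_training_pairs(text):
    """Turn raw text into (current word, next word) pairs.

    No human labels are needed: the target for each word is simply
    the word that follows it in the text.
    """
    words = text.lower().split()
    return list(zip(words[:-1], words[1:]))


def fit_bigram_model(pairs):
    """Count how often each next word follows each current word."""
    counts = defaultdict(Counter)
    for current, following in pairs:
        counts[current][following] += 1
    return counts


def complete(model, prompt, max_new_words=3):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the last word was never seen with a successor
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Toy corpus (an assumption for illustration, not real training data).
corpus = (
    "robots can load the dishwasher . "
    "robots can fold the laundry . "
    "robots can load the dishwasher ."
)
pairs = make_training_pairs(corpus)
model = fit_bigram_model(pairs)
completion = complete(model, "robots can", max_new_words=3)
print(completion)
```

A real model replaces the bigram counts with a very large neural network and the toy corpus with web-scale text, but the training signal has the same shape: predict what comes next.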
Humans have also put lots of captioned images on the web
...
Input: "A painting of a professor giving a talk at a robotics competition kickoff"
Output: [generated image]
Input: "a painting of a handsome MIT professor giving a talk about robotics and generative AI at brimmer and may school in newton, ma"
Output: [generated image]
Is Dall-E just next pixel prediction?
Our engineering design process
Open source:
large language models
visually-conditioned language models
large behavior models
\(\sim\) VLA (vision-language-action)
\(\sim\) EFM (embodied foundation model)
Why actions (for dexterous manipulation) could be different:
Should we expect similar generalization / scaling laws?
Success in (single-task) behavior cloning suggests that these are not blockers
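For readers new to the term, behavior cloning treats control as supervised learning: record (observation, action) pairs from a demonstrator, then fit a policy that reproduces the actions. The sketch below is a deliberately minimal illustration under stated assumptions: a scalar observation, a hypothetical proportional "expert" standing in for a teleoperator, and a one-parameter linear policy fit by least squares. Policies for dexterous manipulation are far richer (for example, neural networks over camera images), so this shows only the skeleton of the idea.

```python
def expert_policy(position_error):
    """Hypothetical demonstrator: push back proportionally to the error."""
    return -2.0 * position_error


def collect_demonstrations(observations):
    """Record (observation, action) pairs, as teleoperation logging would."""
    return [(obs, expert_policy(obs)) for obs in observations]


def fit_linear_policy(demos):
    """Least-squares fit of action = gain * observation (no bias term)."""
    numerator = sum(obs * act for obs, act in demos)
    denominator = sum(obs * obs for obs, _ in demos)
    return numerator / denominator


# Assumed demonstration set: five scalar observations.
demos = collect_demonstrations([-1.0, -0.5, 0.25, 0.5, 1.0])
gain = fit_linear_policy(demos)


def cloned_policy(observation):
    """The learned policy: imitates the demonstrator on new observations."""
    return gain * observation


# Query the cloned policy on an observation that was not in the demos.
print(gain, cloned_policy(0.8))
```

The cloned policy never sees a reward or a task description; it only regresses onto demonstrated actions, which is why the amount and diversity of demonstration data matter so much.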
Big data
Big transfer
Small data
No transfer
robot teleop
(the "transfer learning bet")
Open-X
simulation rollouts
novel devices
Cumulative Number of Skills Collected Over Time
+ Amazing university partners
http://manipulation.mit.edu
http://underactuated.mit.edu
By russtedrake