Large Behavior Models

(Foundation models for dexterous manipulation)

Russ Tedrake

MIT, EECS/CSAIL

russt@mit.edu

DARPA Robotics Challenge, 2015

LLMs \(\Rightarrow\) VLMs \(\Rightarrow\) LBMs

large language models

visually-conditioned language models

large behavior models

\(\sim\) VLA (vision-language-action)

\(\sim\) EFM (embodied foundation model)

vision encoder

language encoder

action decoder

robot joint encoder
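The component labels above can be read as a simple data flow: each input modality is encoded, the embeddings are fused, and a decoder emits a continuous action. The sketch below is only an illustration of that reading. The slides name the components but not how they are wired, so the class name `ToyBehaviorModel`, the random linear maps standing in for learned encoders, the concatenation step, and all dimensions are assumptions.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a learned network (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class ToyBehaviorModel:
    """Hypothetical wiring: three encoders feeding one action decoder."""

    def __init__(self, image_dim, text_dim, joint_dim, embed_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.vision_encoder = make_linear(image_dim, embed_dim, rng)
        self.language_encoder = make_linear(text_dim, embed_dim, rng)
        self.joint_encoder = make_linear(joint_dim, embed_dim, rng)
        # The decoder sees the concatenation of the three embeddings.
        self.action_decoder = make_linear(3 * embed_dim, action_dim, rng)

    def predict_action(self, image_features, text_features, joint_positions):
        fused = (
            apply_linear(self.vision_encoder, image_features)
            + apply_linear(self.language_encoder, text_features)
            + apply_linear(self.joint_encoder, joint_positions)
        )  # list concatenation, length 3 * embed_dim
        return apply_linear(self.action_decoder, fused)  # continuous action vector


model = ToyBehaviorModel(image_dim=4, text_dim=3, joint_dim=7, embed_dim=5, action_dim=7)
action = model.predict_action([0.1] * 4, [0.2] * 3, [0.0] * 7)
```

Here `action` is a list of 7 floats, one per (hypothetical) joint command; a real model would replace each linear map with a trained network.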

Q: Is predicting actions fundamentally different?

Why actions (for dexterous manipulation) could be different:

- Actions are continuous (language tokens are discrete)
- Have to obey physics and deal with stochasticity
- Feedback / stability
- ...

Should we expect similar generalization / scaling laws?
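To make the continuous-versus-discrete point concrete, here is a minimal sketch of one way a continuous action could be forced into a token vocabulary: uniform binning. This is an illustration, not something the slides prescribe; the function names, the range `[-1, 1]`, and the 256-bin vocabulary are assumptions.

```python
def action_to_token(value, low, high, num_bins):
    """Map a continuous value in [low, high] to an integer bin index."""
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    # The top edge of the range falls into the last bin.
    return min(int(fraction * num_bins), num_bins - 1)


def token_to_action(token, low, high, num_bins):
    """Map a bin index back to the center of its bin."""
    width = (high - low) / num_bins
    return low + (token + 0.5) * width


low, high, num_bins = -1.0, 1.0, 256
command = 0.1234
token = action_to_token(command, low, high, num_bins)
recovered = token_to_action(token, low, high, num_bins)
quantization_error = abs(recovered - command)
```

The round trip loses up to half a bin width of precision, `(high - low) / (2 * num_bins)`, which is one reason predicting actions may not behave like predicting language tokens.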

Robotics: Science and Systems, 2023

Diffusion Policy
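As a rough illustration of the idea behind Diffusion Policy, the sketch below generates a short action sequence by starting from Gaussian noise and denoising it step by step, conditioned on an observation. It uses a standard DDPM-style reverse update. Everything specific here is an assumption for the sake of a runnable toy: the linear noise schedule, the function names, and especially `toy_noise_predictor`, which is a closed-form stand-in for the learned network and simply assumes every demonstration commands the goal read from the observation.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns per-step betas and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(noisy_action, step, goal, alpha_bars):
    """Hypothetical stand-in for the trained network: exact noise estimate
    when all demonstrated actions equal the observed goal."""
    scale = math.sqrt(alpha_bars[step])
    sigma = math.sqrt(1.0 - alpha_bars[step])
    return [(a - scale * g) / sigma for a, g in zip(noisy_action, goal)]


def sample_actions(goal, horizon=4, num_steps=50, seed=0):
    """Denoise `horizon` actions from pure noise, conditioned on `goal`."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    actions = [[rng.gauss(0.0, 1.0) for _ in goal] for _ in range(horizon)]
    for step in reversed(range(num_steps)):
        beta = betas[step]
        sigma = math.sqrt(1.0 - alpha_bars[step])
        for k, action in enumerate(actions):
            eps_hat = toy_noise_predictor(action, step, goal, alpha_bars)
            # Posterior mean of the less-noisy action given the noise estimate.
            mean = [(a - beta / sigma * e) / math.sqrt(1.0 - beta)
                    for a, e in zip(action, eps_hat)]
            if step > 0:  # no fresh noise on the final step
                mean = [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]
            actions[k] = mean
    return actions


goal = [0.3, -0.2]  # hypothetical target read from the observation
plan = sample_actions(goal)
```

In this toy setting the denoised actions land on the goal; with a trained network in place of `toy_noise_predictor`, the same loop would sample from whatever action distribution the demonstrations define.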

\(\Rightarrow\) Many new startups (some low-cost, some humanoids)

\(\Rightarrow\) Major new investments by tech giants

The opportunity

Common-sense for physical intelligence
- New levels of dexterity (manipulating cloth, liquids, etc.)
- Programmed via imprecise natural language and/or a few demonstrations
- "Common-sense robustness"

GPT might make mistakes, but it always produces beautiful prose...

Q: Is predicting actions fundamentally different?

One problem: we don't (yet) have internet-scale robot data

The Robot Data Diet

[Figure: labels "Big data / Big transfer" and "Small data / No transfer"; the "transfer learning bet"; data sources: robot teleop, Open-X, simulation rollouts, novel devices]

simulation for manipulation

drake.mit.edu

NVIDIA selected Drake and MuJoCo (for potential inclusion in Omniverse)

(Establishing faith in)

Studying the (new) fundamentals requires scale

Entirely new basic research questions (both theoretical and experimental)
Robotics is becoming "big science"
MIT (and academia more generally) has an essential role to play
- need access to compute
- need access to / strategies for scaling data
- strong partnerships with industry

Online classes (videos + lecture notes + code)

http://manipulation.mit.edu


Roboticist at MIT and TRI

people.csail.mit.edu/russt
