Different views on robustness

Robust Control to Foundation Models

Russ Tedrake

November 6, 2023

"Dexterous Manipulation" Team

(founded in 2016)

Distributions over manipulation scenarios

  • Parameterized procedural mugs, even vegetables!
  • Parameterized environments (lighting conditions, etc.), ...

Finding subtle bugs w/ Monte-Carlo testing
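
A Monte-Carlo test over a scenario distribution could be sketched as follows. Everything here is illustrative: the `Scenario` fields, the sampling ranges, and the `run_trial` stand-in (which fakes a rare failure region instead of running a real simulator) are assumptions, not the team's actual test harness.

```python
import random
from dataclasses import dataclass

@dataclass
class Scenario:
    mug_radius: float       # metres; hypothetical procedural-mug parameter
    mug_height: float       # metres
    light_intensity: float  # arbitrary units; hypothetical environment parameter

def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario from a made-up distribution over parameters."""
    return Scenario(
        mug_radius=rng.uniform(0.03, 0.06),
        mug_height=rng.uniform(0.07, 0.12),
        light_intensity=rng.uniform(0.2, 1.0),
    )

def run_trial(s: Scenario) -> bool:
    """Stand-in for a full simulated rollout; returns True on success.
    The failure region is invented to mimic a subtle bug:
    wide mugs in dim lighting."""
    return not (s.mug_radius > 0.058 and s.light_intensity < 0.25)

def monte_carlo_test(num_trials: int, seed: int = 0):
    """Sample scenarios, run each one, and collect the failing scenarios."""
    rng = random.Random(seed)
    failures = []
    for _ in range(num_trials):
        s = sample_scenario(rng)
        if not run_trial(s):
            failures.append(s)
    return failures

failures = monte_carlo_test(20000)
print(f"{len(failures)} failures in 20000 trials")
```

Keeping the failing scenarios (not just a count) is the useful part: each one is a reproducible test case.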

For the next challenge:

Good control when we don't have useful models?

  • Rules out:
    • (Multibody) Simulation
    • Simulation-based reinforcement learning (RL)
    • State estimation / model-based control
  • My top choices:
    • Learn a dynamics model
    • Behavior cloning (imitation learning)

I was forced to reflect on my core beliefs...

  • The value of using RGB (at control rates) as a sensor is undeniable.  I must not ignore this going forward.
     
  • I don't love imitation learning (decision making \(\gg\) mimicry), but it's an awfully clever way to explore the space of policy representations
    • Don't need a model
    • Don't need an explicit state representation
      • (Not even to specify the objective!)

We've been exploring, and found something good in...

From Thursday...

Check out the TRI demo on Wednesday afternoon

Denoising diffusion models (generative AI)

Image source: Ho et al. 2020 

Denoiser can be conditioned on additional inputs, \(u\): \(p_\theta(x_{t-1} | x_t, u) \)
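
A minimal sketch of sampling from a conditional denoiser, using the noise-prediction form of the reverse step from Ho et al. 2020 with a linear \(\beta\) schedule and \(\sigma_t^2 = \beta_t\). The noise predictor `eps_model` is a hypothetical stand-in: instead of a trained network it uses the exact answer for a toy case where the data given \(u\) is the single point \(x_0 = u\).

```python
import math
import random

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule with alpha_t = 1 - beta_t and cumulative products."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def reverse_step(x_t, t, u, eps_model, schedule, rng):
    """Draw one sample from p_theta(x_{t-1} | x_t, u)."""
    betas, alphas, alpha_bars = schedule
    eps = eps_model(x_t, t, u)  # predicted noise, conditioned on u
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(x_t, eps)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def sample(u, eps_model, schedule, seed=0):
    """Start from Gaussian noise and denoise down to t = 0."""
    betas, _, _ = schedule
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(u))]  # toy: dim(x) == dim(u)
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, u, eps_model, schedule, rng)
    return x

def make_toy_eps_model(alpha_bars):
    """Exact noise predictor when the data given u is the single point x_0 = u."""
    def eps_model(x_t, t, u):
        a = alpha_bars[t]
        return [(x - math.sqrt(a) * ui) / math.sqrt(1.0 - a) for x, ui in zip(x_t, u)]
    return eps_model

schedule = make_schedule(T=100)
u = [0.3, -1.2]
x0 = sample(u, make_toy_eps_model(schedule[2]), schedule)
print(x0)
```

In a policy setting, `x` would be an action sequence and `u` the observation features; the toy above only shows the mechanics of the conditioned reverse process.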

Image backbone: ResNet-18 (pretrained on ImageNet)
Total: 110M-150M Parameters
Training Time: 3-6 GPU Days ($150-$300)

Why (Denoising) Diffusion Models?

  • High capacity + great performance
  • Small number of demonstrations (typically ~50)
  • Multi-modal (non-expert) demonstrations
  • Training stability and consistency
    • no hyper-parameter tuning
  • Generates high-dimensional continuous outputs
    • vs categorical distributions (e.g. RT-1, RT-2)
    • Action-chunking transformers (ACT)
  • Solid mathematical foundations (score functions)
  • Reduces nicely to the simple cases (e.g. LQG / Youla)

A deterministic interpretation (manifold hypothesis)

Denoising approximates the projection onto the data manifold, approximating the gradient of the distance to the manifold
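
One way to make this statement concrete, under assumptions added here (the data lies on a closed set \(\mathcal{M}\), and the nearest point to \(x\) is unique):

\[
d(x) = \min_{y \in \mathcal{M}} \|x - y\|, \qquad \pi(x) = \arg\min_{y \in \mathcal{M}} \|x - y\|
\]

\[
\nabla_x \tfrac{1}{2} d(x)^2 = x - \pi(x) \quad \Longrightarrow \quad \pi(x) = x - \nabla_x \tfrac{1}{2} d(x)^2
\]

In this idealized picture, a denoiser that predicts the offset \(x - \pi(x)\) is estimating the gradient of half the squared distance, and subtracting that estimate lands on the manifold. For example, if \(\mathcal{M}\) is the unit circle and \(x \neq 0\), then \(\pi(x) = x / \|x\|\) and \(d(x) = \big| \|x\| - 1 \big|\).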

Dynamic output feedback

input: \(\ldots, u_{-1}, u_0, u_1, \ldots\)

Control Policy (as a dynamical system)

output: \(\ldots, y_{-1}, y_0, y_1, \ldots\)

"Diffusion Policy" is an auto-regressive (ARX) model with forecasting

\[ \begin{aligned} [y_{n+1}, \ldots, y_{n+P}] = f_\theta(&u_n, \ldots, u_{n-H}, \\ &y_n, \ldots, y_{n-H}) \end{aligned} \]

\(H\) is the length of the history,

\(P\) is the length of the prediction

Conditional denoiser produces the forecast, conditional on the history
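
The ARX-with-forecasting structure can be sketched as a receding-horizon loop. This is a sketch under stated assumptions: `toy_f_theta` is a deterministic placeholder for the conditional denoiser, the history is zero-padded at start-up, and the execution horizon `E` (how many forecast outputs are used before re-forecasting) is a parameter introduced here, not one defined on the slide.

```python
from collections import deque

def run_policy(f_theta, get_input, H, P, E, num_steps):
    """Roll out a policy of the form
    [y_{n+1}, ..., y_{n+P}] = f_theta(u_n..u_{n-H}, y_n..y_{n-H}).

    Histories hold the H+1 most recent values, newest first.
    Only the first E <= P forecast outputs are used before re-forecasting.
    """
    assert 1 <= E <= P
    u_hist = deque([0.0] * (H + 1), maxlen=H + 1)  # zero-padded at start-up
    y_hist = deque([0.0] * (H + 1), maxlen=H + 1)
    outputs, plan = [], []
    for n in range(num_steps):
        u_hist.appendleft(get_input(n))            # record u_n
        if not plan:                               # forecast only when the plan runs out
            forecast = f_theta(list(u_hist), list(y_hist))
            assert len(forecast) == P
            plan = list(forecast[:E])
        y_next = plan.pop(0)                       # this is y_{n+1}
        y_hist.appendleft(y_next)
        outputs.append(y_next)
    return outputs

def toy_f_theta(u_hist, y_hist, P=3):
    """Placeholder forecaster: a ramp starting from the newest input plus output."""
    base = u_hist[0] + y_hist[0]
    return [base + k for k in range(1, P + 1)]

outputs = run_policy(toy_f_theta, get_input=lambda n: 10 * n,
                     H=1, P=3, E=2, num_steps=4)
print(outputs)
```

Choosing `E < P` trades responsiveness against consistency: a small `E` reacts to new inputs sooner, while a larger `E` commits to one forecast for longer.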

Learns a distribution (score function) over actions

e.g. to deal with "multi-modal demonstrations"

Enabling technologies

Haptic Teleop Interface

Excellent system identification / robot control

Visuotactile sensing

with TRI's Soft Bubble Gripper

Open source:

https://punyo.tech/

Scaling Up

  • I've discussed training one skill
  • Wanted: few shot generalization to new skills
    • multitask, language-conditioned policies
    • connects beautifully to internet-scale data

 

  • Big Questions:
    • How do we feed the data flywheel?
    • What are the scaling laws?

 

  • I don't see any immediate ceiling

Discussion

I do think there is something deep happening here...

  • Manipulation should be easy (from a controls perspective)
  • probably low dimensional?? (manifold hypothesis)
  • memorization can go a long way

 

 

Q: How should we think about deployability?

Aside: I still very much believe in and work on model-based control!

Let's remember some lessons from control...

Robust control

  • Nominal model + uncertainty model, e.g. \(x_{n+1} = f(x_n, u_n, w_n), \; w_n \in W\)

  • Domain randomization is choosing \(W\)

  • Optimize worst-case performance
    • But not because robust control folks are uber conservative
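
To illustrate how the choice of \(W\) changes the design, here is a toy comparison of a nominal design against a worst-case design. The scalar dynamics `f`, the set `W`, the linear policy \(u_n = -k x_n\), and the quadratic cost are all assumptions made for this sketch; for brevity each rollout also holds \(w\) constant rather than letting \(w_n\) vary in time.

```python
def f(x, u, w):
    """Hypothetical scalar dynamics x_{n+1} = f(x_n, u_n, w_n)."""
    return (1.0 + w) * x + u

def cost(k, w, x0=1.0, horizon=20):
    """Sum of x_n^2 under the policy u_n = -k * x_n, with w held fixed."""
    x, total = x0, 0.0
    for _ in range(horizon):
        total += x * x
        x = f(x, -k * x, w)
    return total

W = [0.0, 0.4, 0.8]                  # uncertainty set (an assumption of this sketch)
gains = [i / 10 for i in range(21)]  # candidate feedback gains 0.0, 0.1, ..., 2.0

def worst_case(k):
    """Worst cost over the uncertainty set for a given gain."""
    return max(cost(k, w) for w in W)

# Nominal design: optimize for w = 0 only.
k_nominal = min(gains, key=lambda k: cost(k, 0.0))
# Robust design: optimize the worst case over W.
k_robust = min(gains, key=worst_case)

print("nominal gain:", k_nominal, "worst-case cost:", worst_case(k_nominal))
print("robust gain: ", k_robust, "worst-case cost:", worst_case(k_robust))
```

The robust gain gives up a little nominal performance in exchange for a much better worst case, and a different `W` would yield a different gain, which is the sense in which choosing \(W\) is itself a design decision.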

Verification(?) in an open world...

Found surprisingly rare and diverse failures of the full comma.ai openpilot in the CARLA simulator.

The risk-based verification framework

  • Key ideas:
    • Failure probability vs "falsification"
    • Rare-event methods
  • Uncertainty should excite all of the ways that the system can fail
  • When the distribution is too big, false positives are a pain
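
As an illustration of why rare-event methods matter for estimating a failure probability, the sketch below compares naive Monte Carlo with importance sampling on a toy problem. The setup is an assumption made for illustration (a scalar disturbance \(w \sim \mathcal{N}(0, 1)\), with "failure" meaning \(w > 4\)), and importance sampling is just one rare-event method; the slides do not say which methods the framework uses.

```python
import math
import random

def failure(w, threshold=4.0):
    """Toy failure condition on a scalar disturbance."""
    return w > threshold

def naive_mc(n, rng):
    """Fraction of failures when sampling w ~ N(0, 1) directly."""
    return sum(failure(rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def importance_sampling(n, rng, shift=4.0):
    """Sample w ~ N(shift, 1) so failures are common, then reweight each
    failure by the likelihood ratio N(w; 0, 1) / N(w; shift, 1)."""
    total = 0.0
    for _ in range(n):
        w = rng.gauss(shift, 1.0)
        if failure(w):
            total += math.exp(-shift * w + 0.5 * shift * shift)
    return total / n

rng = random.Random(0)
n = 20000
p_naive = naive_mc(n, rng)
p_is = importance_sampling(n, rng)
p_exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed form for this toy case

print(f"naive MC:            {p_naive:.3e}")
print(f"importance sampling: {p_is:.3e}")
print(f"exact:               {p_exact:.3e}")
```

With the same sample budget, the naive estimate sees at most a handful of failures, while the reweighted estimate is built from thousands of them.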

Essential point:

  • In both cases, I'm not trying to certify performance in the real world.

 

  • Being robust to my uncertainty spec, W, makes me more robust even to the unknown unknowns.

 

  • Foundation models take this to the next level.

The "foundation models for control" mentality

  • Grow dataset on real data until everything is "in distribution"

 

  • Essential point: "common-sense" robustness

 

  • (Non-catastrophic) failures are good, as long as we also show recovery.

 

  • We'll continue to capture/fix failures even after the system is deployed

So, how do we get to "deployable"?

  • We have not yet seen what "robustness" looks like when we get to the GPT moment for robotics.

 

  • I would bet it's deployable.

Online classes (videos + lecture notes + code)

http://manipulation.mit.edu

http://underactuated.mit.edu

Deployable@CoRL2023

By russtedrake

Princeton Robotics Seminar