
Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81



link |
00:00:00.000
The following is a conversation with Anca Dragan, a professor at Berkeley working on human robot interaction.
link |
00:00:08.120
She develops algorithms that look beyond the robot's function in isolation and generate robot behavior that accounts for interaction and coordination with human beings.
link |
00:00:18.080
She also consults at Waymo, the autonomous vehicle company, but in this conversation, she is 100% wearing her Berkeley hat.
link |
00:00:27.120
She is one of the most brilliant and fun roboticists in the world to talk with.
link |
00:00:32.480
I had a tough and crazy day leading up to this conversation, so I was a bit tired, even more so than usual.
link |
00:00:41.400
But almost immediately as she walked in, her energy, passion, and excitement for human robot interaction was contagious.
link |
00:00:48.880
So I had a lot of fun and really enjoyed this conversation.
link |
00:00:52.840
This is the Artificial Intelligence Podcast.
link |
00:00:55.520
If you enjoy it, subscribe on YouTube, review it with 5 stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter.
link |
00:01:03.680
At Lex Fridman, spelled F R I D M A N.
link |
00:01:08.120
As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation.
link |
00:01:14.800
I hope that works for you and doesn't hurt the listening experience.
link |
00:01:20.400
This show is presented by Cash App, the number one finance app in the App Store.
link |
00:01:25.480
When you get it, use code LEX Podcast.
link |
00:01:29.280
Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1.
link |
00:01:36.800
Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel.
link |
00:01:48.160
So big props to the Cash App engineers for solving a hard problem that in the end provides an easy interface that takes a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier.
link |
00:02:05.840
So again, if you get Cash App from the App Store or Google Play and use the code LEX Podcast, you get $10 and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.
link |
00:02:22.280
And now, here's my conversation with Anca Dragan.
link |
00:02:26.800
When did you first fall in love with robotics?
link |
00:02:29.880
I think it was a very gradual process and it was somewhat accidental, actually, because I first started getting into programming when I was a kid and then into math and then into computer science was the thing I was going to do.
link |
00:02:47.880
And then in college, I got into AI and then I applied to the robotics institute at Carnegie Mellon and I was coming from this little school in Germany that nobody had heard of.
link |
00:02:59.000
But I had spent an exchange semester at Carnegie Mellon, so I had letters from Carnegie Mellon.
link |
00:03:04.040
So that was the only place, MIT said no, Berkeley said no, Stanford said no.
link |
00:03:09.200
That was the only place I got into.
link |
00:03:11.120
So I went there to the robotics institute and I thought that robotics is a really cool way to actually apply the stuff that I knew and love like optimization.
link |
00:03:21.640
So that's how I got into robotics.
link |
00:03:23.160
I have a better story how I got into cars, which is I used to do mostly manipulation in my PhD, but now I do kind of a bit of everything application wise, including cars.
link |
00:03:36.240
And I got into cars because I was here in Berkeley, while I was a PhD student still, for RSS 2014. Pieter Abbeel organized it.
link |
00:03:48.200
And he arranged for, it was Google at the time, to give us rides in self driving cars.
link |
00:03:54.240
And I was in a robot and it was just making decision after decision, the right call.
link |
00:04:01.640
And it was so amazing.
link |
00:04:03.360
So it was a whole different experience, right?
link |
00:04:05.520
Just I mean manipulation is so hard.
link |
00:04:07.000
You can't do anything.
link |
00:04:07.840
And there it was.
link |
00:04:08.640
Was it the most magical robot you've ever met?
link |
00:04:11.160
So like for me to meet Google self driving car for the first time was like a transformative moment.
link |
00:04:18.400
Like I had two moments like that, that and Spot Mini.
link |
00:04:21.280
I don't know if you've met Spot Mini from Boston Dynamics.
link |
00:04:24.080
I felt like I fell in love or something, because I thought I know how Spot Mini works, right?
link |
00:04:30.800
It's just I mean, there's nothing truly special.
link |
00:04:33.960
It's great engineering work, but the anthropomorphization that went on in my brain, it came to life.
link |
00:04:41.400
Like it had a little arm and it looked at me.
link |
00:04:45.840
He she looked at me.
link |
00:04:47.080
You know, I don't know.
link |
00:04:47.640
There's a magical connection there and it made me realize.
link |
00:04:50.560
Wow, robots can be so much more than things that manipulate objects.
link |
00:04:54.200
They can be things that have a human connection.
link |
00:04:56.880
For you, was the self driving car the moment?
link |
00:05:00.400
Like was there a robot that truly sort of inspired you?
link |
00:05:04.720
That was I remember that experience very viscerally riding in that car and being just wowed.
link |
00:05:12.240
They gave us a sticker that said I rode in a self driving car
link |
00:05:17.480
and it had this cute little Firefly on it, the logo or something.
link |
00:05:21.600
Oh, that was like the smaller one, like the Firefly.
link |
00:05:23.600
Yeah, the really cute one.
link |
00:05:24.840
Yeah, and and I put it on my laptop and I had that for years until I finally changed my laptop out.
link |
00:05:31.960
And, you know, what about if we walk back, you mentioned optimization.
link |
00:05:36.240
Like what beautiful ideas inspired you in math, computer science early on?
link |
00:05:42.560
Like why get into this field? Math seems like a cold and boring field to some.
link |
00:05:47.360
Like what was exciting to you about it?
link |
00:05:48.960
The thing is, I liked math from very early on, from fifth grade is when I got into the Math Olympiad and all of that.
link |
00:05:57.280
Oh, you competed too.
link |
00:05:58.480
Yeah, this in Romania is like our national sport.
link |
00:06:01.080
You got to understand.
link |
00:06:02.760
So I got into that fairly early, and it was a little maybe too much just theory. I didn't
link |
00:06:11.920
really have a goal other than understanding, which was cool.
link |
00:06:17.520
I always liked learning and understanding, but there was no, OK, what am I applying this understanding to?
link |
00:06:22.120
And so I think that's how I got into more heavily into computer science, because it was it was kind of math meets something you can do tangibly in the world.
link |
00:06:31.200
Do you remember like the first program you've written?
link |
00:06:34.360
OK, the first program I've written, I kind of do. It was in QBasic, in fourth grade.
link |
00:06:42.480
Wow.
link |
00:06:43.160
And it was drawing like a circle. Yeah, I don't know how to do that anymore, but in fourth grade, that's the first thing that they taught me.
link |
00:06:54.160
You could take a special, I wouldn't say it was an extracurricular, in a sense it was an extracurricular.
link |
00:06:58.960
So you could sign up for, you know, dance or music or programming.
link |
00:07:03.200
And I did the programming thing and my mom was like, what? Why? Did you compete in programming? Like these days in Romania, probably that's like a big thing.
link |
00:07:12.960
There's programming competitions.
link |
00:07:15.360
Did that touch you at all?
link |
00:07:17.000
I did a little bit of the computer science Olympiad, but not as seriously as I did the math Olympiad.
link |
00:07:24.680
So it's programming.
link |
00:07:25.720
Yeah, it's basically here's a hard math problem.
link |
00:07:27.680
Solve it with a computer is kind of the deal.
link |
00:07:29.400
Yeah, it's more like algorithm.
link |
00:07:30.640
Exactly. It's always algorithmic.
link |
00:07:32.560
So again, you kind of mentioned the Google self driving car, but outside of that, oh, what's like who or what is your favorite
link |
00:07:42.920
robot, real or fictional that like captivated your imagination throughout?
link |
00:07:48.240
I mean, I guess you kind of alluded to the Google self driving car.
link |
00:07:51.320
The firefly was a magical moment, but is there something else?
link |
00:07:54.760
It wasn't the Firefly there.
link |
00:07:55.840
It was, I think, the Lexus, by the way.
link |
00:07:57.840
This was back then.
link |
00:07:59.520
But yeah, so good question.
link |
00:08:02.480
I might.
link |
00:08:03.360
Okay.
link |
00:08:04.040
My favorite fictional robot is WALL-E.
link |
00:08:07.560
And I love how amazingly expressive it is.
link |
00:08:13.560
I personally think a little bit about expressive motion, the kinds of things you were saying, where
link |
00:08:16.840
you can do this and it's a head and it's a manipulator.
link |
00:08:19.280
And what does it all mean?
link |
00:08:21.280
I like to think about that stuff.
link |
00:08:22.480
I love Pixar.
link |
00:08:23.480
I love animation.
link |
00:08:24.480
I love WALL-E. It has two big eyes, I think.
link |
00:08:26.520
Or no, yeah, it has these, these cameras and they move.
link |
00:08:33.160
So yeah, that's it's a, you know, it goes and then it goes.
link |
00:08:36.680
And then it's super cute.
link |
00:08:38.520
It's yeah, it's, you know, the way it moves is just so expressive.
link |
00:08:41.400
The timing of that motion, what it's doing with its arms and what it's doing with those lenses, is amazing.
link |
00:08:48.240
And so I've, I've really liked that from the start.
link |
00:08:53.320
And then on top of that, sometimes I shared this.
link |
00:08:56.400
It's a personal story I share with people or when I teach about AI or whatnot.
link |
00:09:00.240
My husband proposed to me by building a WALL-E, and he actuated it.
link |
00:09:09.600
So it had seven degrees of freedom, including the lens thing.
link |
00:09:12.840
And it kind of came in and it had the, he made it have like a, you know, the belly box opening thing.
link |
00:09:21.600
So it just did that.
link |
00:09:23.120
And then it spewed out this box made out of Legos that opens slowly and then, bam.
link |
00:09:29.280
Yeah, yeah, it was quite something. It set a bar.
link |
00:09:34.240
That could be like the most impressive thing I've ever heard.
link |
00:09:37.520
Okay, so that was a special connection to WALL-E.
link |
00:09:40.160
Long story short, I like WALL-E because I like animation and I like robots.
link |
00:09:43.600
And I like, you know, the fact that this was, we still have this robot to this day.
link |
00:09:49.760
How hard is that problem,
link |
00:09:50.800
do you think, the expressivity of robots?
link |
00:09:54.160
Like the, with the Boston Dynamics, I never talked to those folks about this particular element.
link |
00:10:00.240
I've talked to them a lot, but it seems to be like almost an accidental side effect for them
link |
00:10:06.320
that they weren't, I don't know if they're faking it.
link |
00:10:08.560
They weren't trying to, okay.
link |
00:10:11.600
They do say that the gripper on it was not intended to be a face.
link |
00:10:17.680
I don't know if that's an honest statement, but I think they're legitimate.
link |
00:10:21.600
Probably yes. So do we automatically just anthropomorphize anything we can see about a robot?
link |
00:10:29.120
So like the question is, how hard is it to create a Wally type robot that connects so
link |
00:10:34.240
deeply with us humans? What do you think?
link |
00:10:36.800
It's really hard, right? So it depends on what setting.
link |
00:10:39.840
So if you want to do it in this very particular narrow setting where it does only one thing
link |
00:10:47.040
and it's expressive, then you can get an animator, you know, can have Pixar on call,
link |
00:10:51.120
come in, design some trajectories.
link |
00:10:53.280
Anki had a robot called Cozmo where they put in some of these animations.
link |
00:10:58.320
That part is easy, right? The hard part is doing it not via these kind of handcrafted
link |
00:11:05.280
behaviors, but doing it generally autonomously.
link |
00:11:09.600
Like I want robots, and just to clarify, I used to work a lot on this,
link |
00:11:14.480
I don't work on that quite as much these days, but the notion of having robots
link |
00:11:20.880
that, you know, when they pick something up and put it in a place, they can do that with
link |
00:11:25.680
various forms of style or you can say, well, this robot is, you know, succeeding at this task
link |
00:11:31.200
and it's confident versus it's hesitant versus, you know, maybe it's happy or it's,
link |
00:11:35.280
you know, disappointed about something, some failure that it had.
link |
00:11:38.320
Or I think that when robots move, they can communicate so much about internal states
link |
00:11:46.640
or perceived internal states that they have. And I think that's really useful in an element
link |
00:11:53.840
that we'll want in the future because I was reading this article about how kids are,
link |
00:12:04.160
kids are being rude to Alexa because they can be rude to it and it doesn't really get angry,
link |
00:12:11.120
right? It doesn't reply in any way. It just says the same thing.
link |
00:12:14.080
So I think, at least for the correct development of children, it's good
link |
00:12:19.840
for these things to kind of react differently. I also think, you know,
link |
00:12:23.520
you walk in your home and you have a personal robot and if you're really pissed, presumably
link |
00:12:27.520
the robot should kind of behave slightly differently than when you're super happy and excited.
link |
00:12:32.400
But it's really hard because it's, I don't know, you know, the way I would think about it and the
link |
00:12:38.720
way I thought about it when it came to expressing goals or intentions for robots, it's, well,
link |
00:12:45.200
what's really happening is that instead of doing robotics where you have your state and you have
link |
00:12:51.760
your action space and you have your reward function that you're trying to optimize,
link |
00:12:57.760
now you kind of have to expand the notion of state to include this human internal state.
link |
00:13:02.640
What is the person actually perceiving? What do they think about the robot, something or
link |
00:13:09.280
rather, and then you have to optimize in that system. And so that means you have to understand
link |
00:13:13.920
how your motion, your actions end up sort of influencing the observer's kind of perception
link |
00:13:19.760
of you. And it's very hard to write math about that.
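To make that concrete, here is a minimal, purely illustrative sketch of what expanding the state to include the human's internal state can look like. The observer model, names, and numbers below are assumptions made up for illustration, not the formulation from her work.

```python
# Illustrative sketch only: an "augmented state" that pairs the robot's physical
# state with the human observer's internal state (here, a perceived-confidence level).
from dataclasses import dataclass

@dataclass
class AugmentedState:
    robot_pos: float               # physical state the robot controls directly
    perceived_confidence: float    # observer's internal state, 0..1 (assumed model)

def step(state: AugmentedState, action: float) -> AugmentedState:
    """The robot's action changes the physical state AND the observer's perception."""
    new_pos = state.robot_pos + action
    # Assumed observer model: decisive motion raises perceived confidence.
    decisiveness = min(abs(action), 1.0)
    new_conf = 0.9 * state.perceived_confidence + 0.1 * decisiveness
    return AugmentedState(new_pos, new_conf)

def reward(state: AugmentedState, goal: float) -> float:
    # Task reward plus a term for the impression we want the observer to form.
    return -abs(state.robot_pos - goal) + state.perceived_confidence

s = AugmentedState(robot_pos=0.0, perceived_confidence=0.5)
for a in [1.0, 1.0, 0.5]:
    s = step(s, a)
    print(s, "reward:", round(reward(s, goal=2.5), 3))
```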
link |
00:13:26.960
Right. So when you start to think about incorporating the human into the state model, and I apologize for the philosophical question, but
link |
00:13:34.080
how complicated are human beings, do you think? Like, can they be reduced to a kind of almost
link |
00:13:42.160
like an object that moves and maybe has some basic intents? Or is there something, do we have to model
link |
00:13:48.320
things like mood and general aggressiveness and time, I mean, all these kinds of human qualities
link |
00:13:54.800
or like game theoretic qualities? Like, what's your sense? How complicated is,
link |
00:14:00.080
how hard is the problem of human robot interaction? Yeah. Should we talk about what the problem of
link |
00:14:05.920
human robot interaction is? Yeah, this is, what is human robot interaction? And then talk about
link |
00:14:11.440
how that, yeah. So, and by the way, I'm going to talk about this very particular view of human
link |
00:14:17.200
robot interaction, right, which is not so much on the social side or on the side of how you have
link |
00:14:23.360
a good conversation with the robot, what should the robot's appearance be? It turns out that
link |
00:14:27.360
if you make robots taller versus shorter, this has an effect on how people act with them. So,
link |
00:14:32.240
I'm not, I'm not talking about that. But I'm talking about this very kind of narrow thing,
link |
00:14:36.080
which is you take, if you want to take a task that a robot can do in isolation in a lab out
link |
00:14:44.320
there in the world, but in isolation. And now you're asking, what does it mean for the robot
link |
00:14:49.600
to be able to do this task for, presumably, what its actual end goal is, which is to help some
link |
00:14:55.120
person? That ends up changing the problem in two ways. The first way it changes the problem is that
link |
00:15:04.560
the robot is no longer the single agent acting. That you have humans who also take actions in
link |
00:15:10.880
that same space, you know, cars navigating around people, robots around an office, navigating around
link |
00:15:15.840
the people in that office. If I send the robot over there to the cafeteria to get me a coffee,
link |
00:15:22.240
then there's probably other people reaching for stuff in the same space. And so now you have
link |
00:15:26.640
your robot and you're in charge of the actions that the robot is taking. Then you have these people
link |
00:15:31.520
who are also making decisions and taking actions in that same space. And even if, you know,
link |
00:15:37.200
the robot knows what it's, what it should do and all of that, just coexisting with these people,
link |
00:15:42.320
right, kind of getting the actions to gel well, to mesh well together. That's sort of the kind of
link |
00:15:48.160
problem number one. And then there's problem number two, which is, goes back to this notion of,
link |
00:15:56.480
if I'm a programmer, I can specify some objective for the robot to go off and optimize and specify
link |
00:16:02.560
the task. But if I put the robot in your home, presumably, you might have your own opinions
link |
00:16:10.800
about, well, okay, I want my house clean, but how do I want it cleaned? And how should the robot,
link |
00:16:14.880
how close to me it should come and all of that. And so I think those are the two differences
link |
00:16:19.520
that you have. You're acting around people and you, what you should be optimizing for should
link |
00:16:26.080
satisfy the preferences of that end user, not of your programmer who programmed you.
link |
00:16:30.720
Yeah. And the preferences thing is tricky. So figuring out those preferences, be able to
link |
00:16:36.000
interactively adjust, to understand what the human is doing. So it really boils down to
link |
00:16:41.120
understanding humans in order to interact with them and in order to please them.
link |
00:16:45.840
Right. So why is this hard? Yeah. Why is understanding humans hard? So I think
link |
00:16:55.040
there's two tasks about understanding humans that in my mind are very, very similar, but not
link |
00:17:00.080
everyone agrees. So there's the task of being able to just anticipate what people will do.
link |
00:17:05.520
We all know that cars need to do this, right? We all know that, well, if I navigate around some
link |
00:17:09.520
people, the robot has to get some notion of, okay, where, where is this person going to be?
link |
00:17:15.280
So that's kind of the prediction side. And then there's what you are saying,
link |
00:17:19.040
satisfying the preferences, right? So adapting to the person's preferences, knowing what to
link |
00:17:23.280
optimize for, which is more this inference side, this, what is, what does this person want?
link |
00:17:28.000
What is their intent? What are their preferences? And to me, those kind of go together because I
link |
00:17:35.040
think that, at the very least, if you can look at human behavior
link |
00:17:42.160
and understand what it is that they want, then that's sort of the key enabler to being able
link |
00:17:47.280
to anticipate what they'll do in the future. Because I think that, you know, we're not arbitrary,
link |
00:17:52.800
we make these decisions that we make, we act in the way we do, because we're trying to achieve
link |
00:17:57.200
certain things. And so I think that's the relationship between them. Now, how complicated do these
link |
00:18:03.200
models need to be in order to be able to understand what people want? So we've gotten a long way in
link |
00:18:13.440
robotics with something called inverse reinforcement learning, which is the notion of
link |
00:18:17.920
someone acting, demonstrating how they want the thing done.
link |
00:18:20.960
What is inverse reinforcement learning? You briefly said it.
link |
00:18:24.400
Right. So it's the problem of taking human behavior and inferring a reward function from it,
link |
00:18:33.120
figuring out what it is that that behavior is optimal with respect to.
link |
00:18:37.280
And it's a great way to think about learning human preferences in the sense of, you know,
link |
00:18:41.520
you have a car and the person can drive it. And then you can say, well, okay, I can actually
link |
00:18:47.520
learn what the person is optimizing for. I can learn their driving style, or you can, you can
link |
00:18:54.560
have people demonstrate how they want the house clean. And then you can say, okay, this is,
link |
00:18:59.280
this is, I'm getting the tradeoffs that they're, that they're making, I'm getting the preferences
link |
00:19:04.000
that they want out of this. And so we've been successful in robotics somewhat with this.
link |
00:19:10.240
And it's based on a model of human behavior that is remarkably simple, which is
link |
00:19:16.560
that human behavior is optimal with respect to whatever it is that people want, right?
link |
00:19:21.840
So you make that assumption and now you can kind of invert through it. That's why it's called inverse,
link |
00:19:25.760
well, really inverse optimal control, but also inverse reinforcement learning.
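As a toy illustration of that idea, assume the demonstrator acted optimally and search for reward parameters under which the demonstration is indeed optimal. The features, actions, and candidate weights below are made up for illustration.

```python
# A toy sketch of inverse reinforcement learning / inverse optimal control on a
# one-step "driving style" choice. Assume the demonstrator acted optimally and
# keep the reward parameters that make the demonstration optimal.
import numpy as np

# Each action is described by made-up features: [progress, comfort]
actions = {
    "cut_in_fast":   np.array([1.0, 0.2]),
    "wait_politely": np.array([0.3, 1.0]),
}

def best_action(weights):
    return max(actions, key=lambda a: float(weights @ actions[a]))

demonstration = "wait_politely"   # what the human actually did

# Candidate reward weights (tradeoff between progress and comfort)
candidates = [np.array([w, 1.0 - w]) for w in np.linspace(0, 1, 11)]

consistent = [w for w in candidates if best_action(w) == demonstration]
print("Reward weights consistent with the demonstration:")
for w in consistent:
    print("  progress weight %.1f, comfort weight %.1f" % (w[0], w[1]))
```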
link |
00:19:30.480
So this is based on utility maximization in economics, right? So back in the 40s,
link |
00:19:38.240
von Neumann and Morgenstern were like, okay, people are making choices by maximizing utility. Go.
link |
00:19:44.800
And then in the late 50s, we had Luce and Shepard come in and say, people are a little
link |
00:19:54.480
bit noisy and approximate in that process. So they might choose something kind of
link |
00:20:00.720
stochastically with probability proportional to how much utility something has. There's a bit
link |
00:20:07.680
of noise in there. This has translated into robotics and something that we call Boltzmann
link |
00:20:13.600
rationality. So it's kind of an evolution of inverse reinforcement learning that
link |
00:20:17.760
accounts for, for human noise. And we've had some success with that too, for these tasks where it
link |
00:20:24.080
turns out people act noisily enough that you can't just do the vanilla version. Ah, you
link |
00:20:30.800
can account for noise and still infer what they seem to want based on this.
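Here is a minimal sketch of that noisy-rational, Boltzmann observation model, with Bayesian inference over made-up reward parameters; the features, choices, and numbers are illustrative assumptions, not from any real system.

```python
# Boltzmann rationality: the probability of a choice is proportional to the
# exponentiated utility of that choice. Invert it with Bayes' rule to infer
# reward parameters from noisy human choices (all values are illustrative).
import numpy as np

actions = ["cut_in_fast", "wait_politely"]
features = {"cut_in_fast": np.array([1.0, 0.2]),
            "wait_politely": np.array([0.3, 1.0])}

def boltzmann_likelihood(action, weights, beta=3.0):
    utils = np.array([weights @ features[a] for a in actions])
    probs = np.exp(beta * utils) / np.sum(np.exp(beta * utils))
    return probs[actions.index(action)]

# Candidate reward parameters and a uniform prior over them
candidates = [np.array([w, 1.0 - w]) for w in np.linspace(0, 1, 11)]
posterior = np.ones(len(candidates)) / len(candidates)

# Observed, somewhat noisy human choices
observed = ["wait_politely", "wait_politely", "cut_in_fast", "wait_politely"]

for choice in observed:
    likelihoods = np.array([boltzmann_likelihood(choice, w) for w in candidates])
    posterior = posterior * likelihoods
    posterior = posterior / posterior.sum()

best = candidates[int(np.argmax(posterior))]
print("Most probable tradeoff: progress %.1f, comfort %.1f" % (best[0], best[1]))
```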
link |
00:20:37.760
But now we're hitting tasks where that's not enough. And what are some examples?
link |
00:20:44.320
So imagine you're trying to control some robot that's, that's fairly complicated. You're trying
link |
00:20:48.800
to control a robot arm, because maybe you're a patient with a motor impairment, and you have
link |
00:20:53.520
this wheelchair mounted arm, and you're trying to control it around. Or one task that we've looked
link |
00:20:58.880
at with Sergey and our students is Lunar Lander. So, I don't know if you know
link |
00:21:04.880
this Atari game called Lunar Lander. It's really hard. People really suck at landing the
link |
00:21:10.320
thing. Mostly they just crash it left and right. Okay, so this is the kind of task. Imagine you're
link |
00:21:15.360
trying to provide some assistance to a person operating such, such a robot, where you want the
link |
00:21:21.360
kind of autonomy that can figure out what it is that you're trying to do and help you do it.
link |
00:21:26.560
It's really hard to do that for, say, lunar lander, because people are all over the place.
link |
00:21:33.520
And so they seem much more noisy than really rational. That's an example of a task where
link |
00:21:38.640
these models are kind of failing us. And it's not surprising because, so we, you know, we talked
link |
00:21:44.880
about the 40s, utility; late 50s, sort of noisy; then the 70s came and behavioral
link |
00:21:52.080
economics started being a thing where people are like, no, no, no, no, no, people are not
link |
00:21:57.440
rational. People are messy and emotional and irrational and have all sorts of heuristics
link |
00:22:05.760
that might be domain specific. And they're just, they're just a mess. So, so what do, so what does
link |
00:22:11.520
my robot do to understand what you want? That's why it's complicated.
link |
00:22:17.920
It's, you know, for the most part, we get away with pretty simple models until we don't. And then
link |
00:22:23.440
the question is, what do you do then? And I have days when I wanted to, you know, pack my bags and
link |
00:22:30.880
go home and switch jobs because it's just, it feels really daunting to make sense of human behavior
link |
00:22:36.800
enough that you can reliably understand what people want, especially as, you know, robot
link |
00:22:41.600
capabilities will continue to get developed. You'll get these systems that are more and more
link |
00:22:46.640
capable of all sorts of things. And then you really want to make sure that you're telling them the
link |
00:22:50.080
right thing to do. What is that thing? Well, read it in human behavior.
link |
00:22:55.920
So if I just sat here quietly and tried to understand something about you by listening to you
link |
00:23:01.280
talk, it would be harder than if I got to say something and ask you and interact and control.
link |
00:23:09.680
Can you, can the robot help its understanding of the human by
link |
00:23:13.600
influencing the behavior by actually acting? Yeah, absolutely. So one of the things that's
link |
00:23:21.920
been exciting to me lately is this notion that when you try to think of the robotics problem as,
link |
00:23:31.840
okay, I have a robot and it needs to optimize for whatever it is that a person wants it to
link |
00:23:36.400
optimize as opposed to maybe what a programmer said. That problem we think of as a human robot
link |
00:23:44.560
collaboration problem in which both agents get to act in which the robot knows less than the human
link |
00:23:52.160
because the human actually has access to, you know, at least implicitly to what it is that they want.
link |
00:23:57.040
They can't write it down, but they can, they can talk about it. They can give all sorts of signals,
link |
00:24:02.160
they can demonstrate. But the robot doesn't need to sit there and passively observe human
link |
00:24:08.080
behavior and try to make sense of it. The robot can act too. And so there's these information
link |
00:24:13.600
gathering actions that the robot can take to solicit responses that are actually informative.
link |
00:24:20.960
So for instance, this is not for the purpose of assisting people, but with kind of back to
link |
00:24:24.960
coordinating with people in cars and all of that. One thing that Dorsa did was,
link |
00:24:31.760
so we were looking at cars being able to navigate around people and you might not know exactly the
link |
00:24:39.520
driving style of a particular individual that's next to you, but you want to change lanes in front
link |
00:24:44.640
of them. Navigating around other humans inside cars? Yeah, good clarification question. So
link |
00:24:52.800
you have an autonomous car and it's trying to navigate the road around human driven vehicles.
link |
00:24:58.160
Similar things, ideas apply to pedestrians as well, but let's just take human driven vehicles.
link |
00:25:03.040
So now you're trying to change a lane. Well, you could be trying to infer this driving style of
link |
00:25:09.760
this person next to you. You'd like to know if they're in particular, if they're sort of aggressive
link |
00:25:14.240
or defensive, if they're going to let you kind of go in or if they're going to not. And it's very
link |
00:25:21.280
and it's very difficult to just, if you think that if you want to hedge your bets and say,
link |
00:25:28.160
maybe they're actually pretty aggressive, I shouldn't try this. You kind of end up driving
link |
00:25:32.320
next to them and driving next to them, right? And then you don't know because you're not actually
link |
00:25:38.160
getting the observations that you need, because the way someone drives when they're next to you
link |
00:25:42.880
when they just need to go straight is kind of the same regardless of whether they're aggressive or defensive.
link |
00:25:47.280
And so you need to enable the robot to reason about how it might actually be able to gather
link |
00:25:55.040
information by changing the actions that it's taking. And then the robot comes up with these
link |
00:25:59.280
cool things where it kind of nudges towards you and then sees if you're going to slow down or not.
link |
00:26:05.040
Then if you slow down, it sort of updates its model of you and says, oh, okay,
link |
00:26:09.440
you're more on the defensive side, so now I can actually... Like, that's a fascinating dance.
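Here is a toy sketch of that active information-gathering idea: compare candidate actions by how much they are expected to reduce uncertainty about the other driver's style. The response probabilities below are invented for illustration and are not the model used in that work.

```python
# Pick the action whose outcome is expected to leave the least uncertainty
# (entropy) about the other driver's style. All probabilities are made up.
import math

styles = ["aggressive", "defensive"]
belief = {"aggressive": 0.5, "defensive": 0.5}

# Assumed P(other driver slows down | my action, their style)
slow_down_prob = {
    ("stay_in_lane", "aggressive"): 0.10, ("stay_in_lane", "defensive"): 0.15,
    ("nudge_in",     "aggressive"): 0.10, ("nudge_in",     "defensive"): 0.90,
}

def entropy(b):
    return -sum(p * math.log(p) for p in b.values() if p > 0)

def posterior(b, action, observed_slow):
    post = {}
    for s in styles:
        p_obs = slow_down_prob[(action, s)] if observed_slow else 1 - slow_down_prob[(action, s)]
        post[s] = b[s] * p_obs
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

def expected_entropy(b, action):
    total = 0.0
    for observed_slow in (True, False):
        p_obs = sum(b[s] * (slow_down_prob[(action, s)] if observed_slow
                            else 1 - slow_down_prob[(action, s)]) for s in styles)
        total += p_obs * entropy(posterior(b, action, observed_slow))
    return total

for a in ("stay_in_lane", "nudge_in"):
    print(a, "-> expected remaining uncertainty:", round(expected_entropy(belief, a), 3))
# Nudging is more informative: the two styles respond very differently to it.
```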
link |
00:26:14.160
That's so cool that you could use your own actions to gather information. That feels like a
link |
00:26:21.280
totally open, exciting new world of robotics. I mean, how many people are even thinking about
link |
00:26:26.560
that kind of thing? A handful of us? It's rare because it's actually leveraging the human. I mean,
link |
00:26:33.440
most roboticists, and I've talked to a lot of colleagues and so on, are, being honest, kind of
link |
00:26:40.800
afraid of humans. Because they're messy and complicated, right? I understand. Going back
link |
00:26:48.320
to what we were talking about earlier, right now, we're kind of in this dilemma of, okay,
link |
00:26:52.320
there are tasks that we can just assume people are approximately rational for and we can figure
link |
00:26:56.160
out what they want. We can figure out their goals. We can figure out their driving styles,
link |
00:26:59.040
whatever. Cool. There are these tasks that we can't. So what do we do, right? Do we pack our bags
link |
00:27:04.720
and go home? I've had a little bit of hope recently. And I'm kind of doubting myself,
link |
00:27:13.520
because what do I know that 50 years of behavioral economics hasn't figured out?
link |
00:27:19.360
But maybe it's not really in contradiction with the way that field is headed. But basically,
link |
00:27:24.400
one thing that we've been thinking about is instead of kind of giving up and saying people
link |
00:27:30.320
are too crazy and irrational for us to make sense of them, maybe we can give them a bit the benefit
link |
00:27:38.000
of the doubt. And maybe we can think of them as actually being relatively rational, but just under
link |
00:27:44.640
different assumptions about the world, about how the world works, about, you know, they don't have,
link |
00:27:52.000
When we think about rationality, the implicit assumption is that they're rational under all
link |
00:27:56.880
the same assumptions and constraints as the robot, right? This is the state of the world,
link |
00:28:01.600
that's what they know. This is the transition function, that's what they know. This is the
link |
00:28:05.360
horizon, that's what they know. But maybe the kind of this difference, the way, the reason they can
link |
00:28:12.160
seem a little messy and hectic, especially to robots, is that perhaps they just make different
link |
00:28:19.280
assumptions or have different beliefs. I mean, that's another fascinating idea that
link |
00:28:24.880
this, our kind of anecdotal desire to say that humans are irrational, perhaps grounded in behavioral
link |
00:28:32.240
economics, is that we just don't understand the constraints and the rewards under which they
link |
00:28:37.680
operate. And so our goal shouldn't be to throw our hands up and say they're irrational, is to say,
link |
00:28:43.520
let's try to understand what are the constraints. What it is that they must be assuming that makes
link |
00:28:48.880
this behavior make sense. Good life lesson, right? Good life lesson. That's true. It's just outside
link |
00:28:54.800
of robotics. That's just good to, that's communicating with humans. That's just a good,
link |
00:28:59.360
assume that you just don't know. Sort of empathy, right? It's, maybe there's something you're
link |
00:29:05.440
missing and you, and it's, you know, it especially happens to robots because they're kind of dumb
link |
00:29:09.200
and they don't know things. And oftentimes people are sort of supra rational in that they actually
link |
00:29:13.360
know a lot of things that robots don't. Sometimes like with the lunar lander, the robot, you know,
link |
00:29:19.040
knows much more. So it turns out that if you try to say, look, maybe people are operating this thing,
link |
00:29:26.800
but assuming a much more simplified physics model, because they don't get the complexity of this kind
link |
00:29:32.880
of craft or the robot arm with seven degrees of freedom with these inertias and whatever.
link |
00:29:37.760
So maybe they have this intuitive physics model, which is not, you know, this notion of intuitive
link |
00:29:43.120
physics is something that has been studied actually in cognitive science, like Josh Tenenbaum and
link |
00:29:47.280
Tom Griffiths' work on this stuff. And what we found is that you can actually try to figure out what
link |
00:29:56.560
physics model kind of best explains human actions. And then you can use that to sort of correct
link |
00:30:05.200
what it is that they're commanding the craft to do. So they might be sending the craft somewhere,
link |
00:30:11.200
but instead of executing that action, you can sort of take a step back and say,
link |
00:30:15.200
according to their intuitive, if the world worked according to their intuitive physics model,
link |
00:30:21.520
where do they think that the craft is going? Where are they trying to send it to?
link |
00:30:25.840
And then you can use the real physics, right, the inverse of that to actually figure out what
link |
00:30:30.240
you should do so that you do that instead of where they were actually sending you in the real world.
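A toy 1D sketch of that correction: interpret the human's command under an assumed simplified intuitive-physics model, recover the intended outcome, then invert the true dynamics to achieve it. The specific physics models and numbers here are illustrative assumptions, not the actual system from that work.

```python
# Interpret the human's thrust under an assumed no-gravity "intuitive physics"
# model, then compute the thrust that achieves the same intended outcome under
# the true dynamics. All models and numbers are made up for illustration.
TRUE_GRAVITY = -1.5      # real downward acceleration
INTUITIVE_GRAVITY = 0.0  # assume the human plans as if there were no gravity
DT = 1.0

def predict_next_velocity(v, thrust, gravity):
    return v + (thrust + gravity) * DT

def assist(v_current, human_thrust):
    # 1) Under the human's assumed internal model, what velocity are they aiming for?
    intended_v = predict_next_velocity(v_current, human_thrust, INTUITIVE_GRAVITY)
    # 2) Invert the true dynamics: what thrust actually achieves that velocity?
    corrected_thrust = (intended_v - v_current) / DT - TRUE_GRAVITY
    return corrected_thrust

v = -2.0                 # falling
human_thrust = 2.0       # human tries to stop the descent
print("human command:", human_thrust)
print("corrected command:", assist(v, human_thrust))
print("velocity if we executed the raw command:",
      predict_next_velocity(v, human_thrust, TRUE_GRAVITY))
print("velocity with the corrected command:",
      predict_next_velocity(v, assist(v, human_thrust), TRUE_GRAVITY))
```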
link |
00:30:34.640
And I kid you not, it worked. People land the damn thing in between the two flags and all that.
link |
00:30:42.400
So it's not conclusive in any way, but I'd say it's evidence that, yeah, maybe we're kind of
link |
00:30:48.160
underestimating humans in some ways when we're giving up and saying, yeah, they're just crazy
link |
00:30:52.400
noisy. So then you explicitly try to model the kind of worldview that they have?
link |
00:30:57.920
That they have. That's right. That's right. There's not too, I mean, there's things in behavioral
link |
00:31:02.960
economics, too, that, for instance, have touched upon the planning horizon. So there's this idea
link |
00:31:07.920
that there's bounded rationality, essentially, and the idea that, well, maybe people work under their
link |
00:31:12.160
computational constraints. And I think kind of our view recently has been, take the Bellman update
link |
00:31:18.960
in AI and just break it in all sorts of ways by saying, state? No, no, no, the person doesn't
link |
00:31:23.680
get to see the real state. Maybe they're estimating somehow. Transition function? No, no, no, no, no.
link |
00:31:28.800
Even the actual reward evaluation, maybe they're still learning about what it is that they want.
link |
00:31:35.200
Like, when you watch Netflix and you have all the things and then you have to pick something,
link |
00:31:41.680
imagine that the AI system interpreted that choice as this is the thing you prefer to see.
link |
00:31:48.800
How are you going to know? You're still trying to figure out what you like, what you don't like,
link |
00:31:52.000
et cetera. So I think it's important to also account for that. So it's not irrationality,
link |
00:31:56.640
because they're doing the right thing under the things that they know.
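One way to sketch "breaking the Bellman update" is to model the human as planning, noisily, under their own believed dynamics rather than the true ones. The tiny chain world, the slip probabilities, and the temperature below are made up for illustration, not her actual formulation.

```python
# Illustrative only: a Boltzmann-rational ("soft") value iteration where the human
# plans under a wrong belief about the dynamics of a tiny chain world.
import numpy as np

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]
GAMMA = 0.9

def transition(s, a, slip):
    """With probability `slip`, the action has no effect (made-up dynamics)."""
    intended = min(max(s + a, 0), N_STATES - 1)
    probs = {s: slip}
    probs[intended] = probs.get(intended, 0.0) + (1.0 - slip)
    return probs

def soft_value_iteration(slip, beta=5.0, iters=100):
    """Boltzmann-rational policy under a given (possibly wrong) slip belief."""
    V = np.zeros(N_STATES)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(iters):
        for s in range(N_STATES):
            for i, a in enumerate(ACTIONS):
                r = 1.0 if s == GOAL else 0.0
                Q[s, i] = r + GAMMA * sum(p * V[s2] for s2, p in transition(s, a, slip).items())
        V = (1.0 / beta) * np.log(np.exp(beta * Q).sum(axis=1))  # soft Bellman backup
    policy = np.exp(beta * Q)
    return policy / policy.sum(axis=1, keepdims=True)

true_slip, believed_slip = 0.4, 0.0
human_policy = soft_value_iteration(believed_slip)  # the human plans with wrong dynamics
print("P(move right) per state, under the human's believed dynamics:")
print(np.round(human_policy[:, 1], 2))
# A robot predicting this human should roll the policy out under the TRUE
# dynamics (slip = true_slip), not the believed ones.
```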
link |
00:31:59.920
Yeah, that's brilliant. You mentioned recommender systems. What kind of, and we were talking about
link |
00:32:05.440
human robot interaction, what kind of problem spaces are you thinking about? So is it robots,
link |
00:32:13.760
like wheeled robots with autonomous vehicles? Is it object manipulation? Like, when you think
link |
00:32:19.280
about human robot interaction in your mind, and maybe, I'm sure you can speak for the entire
link |
00:32:24.800
community of human robot interaction. But like, what are the problems of interest here?
link |
00:32:33.120
You know, I kind of think of open domain dialogue as human robot interaction,
link |
00:32:40.800
and that happens not in the physical space, but it could just happen in the virtual space.
link |
00:32:46.320
So where's the boundaries of this field for you when you're thinking about the things we've
link |
00:32:50.800
been talking about? Yeah, so I tried to find kind of underlying, I don't know what to even call
link |
00:33:02.000
them. I try to work on, you know, I might call what I do working on the foundations
link |
00:33:07.520
of algorithmic human robot interaction and trying to make contributions there. And it's important
link |
00:33:13.920
to me that whatever we do is actually somewhat domain agnostic when it comes to is it about
link |
00:33:20.000
you know, autonomous cars, or is it about quadrotors, or whatever it is, the same underlying
link |
00:33:29.840
principles apply? Of course, when you're trying to get a particular domain to work,
link |
00:33:32.800
you usually have to do some extra work to adapt that to that particular domain. But these things
link |
00:33:37.040
that we were talking about around, well, you know, how do you model humans? It turns out that a lot
link |
00:33:43.360
of systems could benefit from a better understanding of how human behavior relates
link |
00:33:49.440
to what people want and need to predict human behavior, physical robots of all sorts and beyond
link |
00:33:56.080
that. And so I used to do manipulation, I used to be, you know, picking up stuff and then I was
link |
00:34:00.880
picking up stuff with people around. And now it's sort of very broad when it comes to the application
link |
00:34:07.280
level. But in a sense, very focused on, okay, how does the problem need to change? How do the
link |
00:34:14.400
algorithms need to change when we're not doing a robot by itself, you know, emptying the dishwasher,
link |
00:34:21.280
but we're stepping outside of that. A thought that popped into my head just now: on the game
link |
00:34:27.120
theoretic side of things, you said this really interesting idea of using actions to gain more
link |
00:34:31.600
information. But if we think a sort of game theory, the humans that are interacting with you,
link |
00:34:42.560
with you, the robot, I'm taking the identity of the robot. Yeah, they also have a world model of you.
link |
00:34:55.520
And you can manipulate that. I mean, if we look at autonomous vehicles, people have a certain
link |
00:35:00.800
viewpoint. You said with the kids, people see Alexa as in a certain way. Is there some value
link |
00:35:08.480
in trying to also optimize how people see you as a robot? Or is that a little too far
link |
00:35:18.000
away from the specifics of what we can solve right now?
link |
00:35:23.520
Both, right? So it's really interesting. And we've seen a little bit of progress on this problem,
link |
00:35:30.880
on pieces of this problem. So you can, again, it kind of comes down to how complicated
link |
00:35:36.880
is the human model need to be. But in one piece of work that we were looking at, we just said,
link |
00:35:44.000
okay, there's these parameters that are internal to the robot and what the robot is about to do,
link |
00:35:51.520
or maybe what objective, what driving style the robot has or something like that. And what we're
link |
00:35:56.560
going to do is we're going to set up a system where part of the state is the person's belief
link |
00:36:00.320
over those parameters. And now when the robot acts, the person gets new evidence about
link |
00:36:08.240
this robot internal state. And so they're updating their mental model of the robot, right? So if they
link |
00:36:14.160
see a car that sort of cuts someone off, they're like, oh, that's an aggressive car. They know more,
link |
00:36:19.040
right? If they see sort of a robot head towards a particular door, they're like, oh, the robot's
link |
00:36:25.040
trying to get to that door. So this thing that we have to do with humans to try to understand their
link |
00:36:29.440
goals and intentions, humans are inevitably going to do that to robots. And then that raises this
link |
00:36:35.760
interesting question that you asked, which is, can we do something about that? This is going to
link |
00:36:39.200
happen inevitably, but we can sort of be more confusing or less confusing to people. And it
link |
00:36:44.320
turns out you can optimize for being more informative and less confusing. If you have an
link |
00:36:50.240
understanding of how your actions are being interpreted by the human, how they're using
link |
00:36:54.320
these actions to update their belief. And honestly, all we did is just Bayes' rule. Basically, okay,
link |
00:37:01.440
the person has a belief, they see an action, they make some assumptions about how the robot
link |
00:37:05.280
generates its actions, presumably as being rational, because robots are rational,
link |
00:37:08.880
it's reasonable to assume that about them. And then they incorporate that new piece of evidence,
link |
00:37:17.040
in a Bayesian sense into their belief, and they obtain a posterior. And now the robot
link |
00:37:21.840
is trying to figure out what actions to take, such that it steers the person's belief to put
link |
00:37:26.320
as much probability mass as possible on the correct parameters.
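A minimal sketch of that formalization: the observer updates a belief over the robot's internal parameter with Bayes' rule, assuming a noisily rational robot, and the robot picks the action that puts the most posterior mass on its true parameter. The parameters, actions, and utilities are illustrative assumptions.

```python
# The robot chooses the action that best communicates its true internal
# parameter, given an assumed Boltzmann-rational observer model (all made up).
import numpy as np

params = ["aggressive", "defensive"]
actions = ["cut_in", "leave_gap"]

# Assumed utility of each action for a robot with each internal parameter
utility = {("aggressive", "cut_in"): 1.0, ("aggressive", "leave_gap"): 0.2,
           ("defensive", "cut_in"): 0.1, ("defensive", "leave_gap"): 1.0}

def observer_likelihood(action, param, beta=2.0):
    """Observer's model: the robot chooses actions Boltzmann-rationally."""
    utils = np.array([utility[(param, a)] for a in actions])
    probs = np.exp(beta * utils) / np.exp(beta * utils).sum()
    return probs[actions.index(action)]

def observer_posterior(prior, action):
    post = np.array([prior[i] * observer_likelihood(action, p) for i, p in enumerate(params)])
    return post / post.sum()

true_param = "defensive"
prior = np.array([0.5, 0.5])

# Robot picks the action that maximizes the observer's posterior on its true parameter
best = max(actions, key=lambda a: observer_posterior(prior, a)[params.index(true_param)])
print("most communicative action:", best)
for a in actions:
    print(a, "-> observer belief:", dict(zip(params, np.round(observer_posterior(prior, a), 2))))
```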
link |
00:37:31.040
So that's kind of a mathematical formalization of that. But my worry, and I don't know if you
link |
00:37:37.600
want to go there with me, but I talk about this quite a bit. The kids talking to Alexa
link |
00:37:46.160
disrespectfully worries me. I worry in general about human nature. Like I said, I grew up in
link |
00:37:53.120
Soviet Union, World War II, I'm a Jew too, so with the Holocaust and everything. I just worry
link |
00:37:58.880
about how we humans sometimes treat the other, the group that we call the other, whatever it is,
link |
00:38:04.880
throughout human history, the group that's the other has changed faces. But it seems like the robot
link |
00:38:11.200
will be the next other. And one thing is, it feels to me that robots don't
link |
00:38:20.160
get no respect. They get shoved around. And at a shallow level,
link |
00:38:27.040
for a better experience, it seems that robots need to talk back a little bit. Like my intuition
link |
00:38:33.280
says, I mean, most companies, from sort of Roomba to autonomous vehicle companies, might not be so happy
link |
00:38:39.600
with the idea that a robot has a little bit of an attitude. But I feel, it feels to me that
link |
00:38:45.520
that's necessary to create a compelling experience. Like we humans don't seem to respect anything that
link |
00:38:50.640
doesn't give us some attitude. Or like a mix of mystery and attitude and anger and that threatens
link |
00:39:01.280
us subtly, maybe passive aggressively. I don't know. It seems like we humans yet need that.
link |
00:39:07.040
Do you have thoughts on this?
link |
00:39:11.040
I'll give you two thoughts on it. One is, it's, we respond to, you know, someone being assertive,
link |
00:39:21.120
but we also respond to someone being vulnerable. So I think robots, my first thought is that
link |
00:39:28.160
robots get shoved around and bullied a lot, because they're sort of, you know, tempting and
link |
00:39:32.960
they're sort of showing off or they appear to be showing off. And so I think going back to these
link |
00:39:38.480
things we were talking about in the beginning of making robots a little more, a little more
link |
00:39:43.200
expressive, a little bit more like, eh, that wasn't cool to do. And now I'm bummed, right?
link |
00:39:49.920
I think that that can actually help because people can't help but anthropomorphize and
link |
00:39:53.520
respond to that, even though the emotion being communicated is not in any way a real thing.
link |
00:39:58.720
And people know that it's not a real thing because they know it's just a machine.
link |
00:40:01.920
We're still, you know, we watch, there's this famous psychology experiment with little triangles
link |
00:40:08.880
and kind of dots on a screen and a triangle is chasing the square and you get really angry
link |
00:40:14.560
at the darn triangle because why is it not leaving the square alone? So that's, yeah, we can't help.
link |
00:40:19.920
So that was the first thought. The vulnerability, that's really interesting. I think of like being
link |
00:40:27.360
a, pushing back, being assertive as the only mechanism of getting, of forming a connection,
link |
00:40:34.880
of getting respect, but perhaps vulnerability. Perhaps there's other mechanism that are less
link |
00:40:39.440
threatening. Yeah. Well, I see, well, a little bit, yes. But then this other thing that we can
link |
00:40:45.840
think about is, it goes back to what you were saying, that interaction is really game theoretic.
link |
00:40:50.240
Right? So the moment you're taking actions in a space, the humans are taking actions in that
link |
00:40:54.080
same space, but you have your own objective, which is, you know, you're a car, you need to get your
link |
00:40:59.040
passenger to the destination. And then the human nearby has their own objective, which somewhat
link |
00:41:04.160
overlaps with you, but not entirely. You're not interested in getting into an accident with
link |
00:41:09.200
each other, but you have different destinations and you want to get home faster and they want to
link |
00:41:13.280
get home faster. And that's a general sum game at that point. And so that's, I think that's what,
link |
00:41:20.080
treating it as such is kind of a way we can step outside of this kind of mode where you
link |
00:41:30.240
try to anticipate what people do and you don't realize you have any influence over it, while
link |
00:41:35.600
still protecting yourself because you're understanding that people also understand that they can
link |
00:41:40.960
influence you. And it's just kind of back and forth is this negotiation, which is really,
link |
00:41:46.640
really talking about different equilibria of a game. The very basic way to solve coordination
link |
00:41:53.120
is to just make predictions about what people will do and then stay out of their way.
link |
00:41:57.680
And that's hard for the reasons we talked about, which is how you have to understand people's
link |
00:42:02.000
intentions implicitly, explicitly, who knows, but somehow you have to get enough of an understanding
link |
00:42:07.040
of that to be able to anticipate what happens next. And so that's challenging. But then it's
link |
00:42:12.480
further challenged by the fact that people change what they do based on what you do,
link |
00:42:17.440
because they don't plan in isolation either, right? So when you see cars trying to merge
link |
00:42:23.200
on a highway and not succeeding, one of the reasons this can be is because
link |
00:42:30.400
they look at traffic that keeps coming, they predict what these people are planning on doing,
link |
00:42:35.760
which is to just keep going. And then they stay out of the way because there's not,
link |
00:42:39.680
there's no feasible plan, right? Any plan would actually intersect with one of these
link |
00:42:45.680
other people. So that's bad. So you get stuck there. So now kind of, if, if you start thinking
link |
00:42:52.640
about it as no, no, no, actually, these people change what they do, depending on what the car
link |
00:42:59.360
does, like if the car actually tries to kind of inch itself forward, they might actually slow down
link |
00:43:05.680
and let the car in. And now you can take advantage of that. Well, that, you know, that's kind of the
link |
00:43:12.000
next level. We call this the underactuated system idea, where it's like an underactuated system in
link |
00:43:17.680
robotics, but it's kind of, you influence these other degrees of freedom,
link |
00:43:23.120
but you don't get to decide what they do. I've somewhere seen you mention this,
link |
00:43:28.400
the human element in this picture as underactuated. So, you know, you understand underactuated
link |
00:43:34.560
robotics is, you know, that you can't fully control the system. You can't go in arbitrary
link |
00:43:42.640
directions in the configuration space under your control. Yeah, it's a very simple way of
link |
00:43:47.920
underactuation where basically there's literally these degrees of freedom that you can control
link |
00:43:51.840
and these degrees of freedom that you can't, but you influence them. And I think that's the
link |
00:43:55.120
important part is that they don't do whatever, regardless of what you do, that what you do
link |
00:44:00.560
influences what they end up doing.
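A toy sketch of that underactuated framing: the robot directly controls only its own degrees of freedom, while the human's respond through an assumed simple model; the merge scenario, response model, and numbers are made up for illustration.

```python
# Illustrative only: the robot controls its own acceleration; the human's speed is
# an uncontrolled degree of freedom that responds (per an assumed model) to the gap.
def human_response(human_speed, gap_to_robot):
    """Assumed model: the human brakes a bit if the robot is close in front."""
    if gap_to_robot < 5.0:
        return max(human_speed - 1.0, 0.0)
    return human_speed

def rollout(robot_accels, horizon=5):
    robot_pos, robot_speed = 0.0, 2.0
    human_pos, human_speed = -10.0, 4.0     # faster car approaching from behind
    for t in range(horizon):
        a = robot_accels[t] if t < len(robot_accels) else 0.0
        robot_speed += a
        robot_pos += robot_speed
        # The human's degree of freedom is not controlled, only influenced:
        human_speed = human_response(human_speed, robot_pos - human_pos)
        human_pos += human_speed
    # Robot progress, penalized if the human was forced to brake.
    return robot_pos - 2.0 * max(0.0, 4.0 - human_speed)

plans = {"inch_forward": [0.5, 0.5, 0.0, 0.0, 0.0],
         "hold_back":    [0.0, 0.0, 0.0, 0.0, 0.0]}
for name, plan in plans.items():
    # Each candidate plan is scored under the coupled robot-human dynamics.
    print(name, "-> value:", round(rollout(plan), 2))
```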
link |
00:44:05.360
I just also like the poetry of calling human robot interaction an underactuated robotics problem. And you also mentioned sort of nudging. It seems
link |
00:44:12.160
that, I don't know, I think about this a lot in the case of pedestrians. I have
link |
00:44:17.040
collected hundreds of hours of videos. I like to just watch pedestrians and it seems that
link |
00:44:22.720
it's a funny hobby. Yeah, it's weird because I learn a lot. I learn a lot about myself,
link |
00:44:28.480
about our human behavior from watching pedestrians, watching people in their environment. Basically,
link |
00:44:36.400
crossing the street is like you're putting your life on the line. You know, I don't know, tens of
link |
00:44:42.320
millions of times in America every day, people are just playing this weird game of chicken
link |
00:44:48.720
when they cross the street, especially when there's some ambiguity about the right of way.
link |
00:44:53.360
That has to do either with the rules of the road or with the general personality of the intersection
link |
00:44:59.760
based on the time of day and so on. And this nudging idea, I don't, you know, it seems that
link |
00:45:06.160
people don't even nudge. They just aggressively make a decision. There's a runner
link |
00:45:11.600
that gave me this advice. I sometimes run in the street and, you know, not in the street,
link |
00:45:18.080
on the sidewalk. And he said that if you don't make eye contact with people when you're running,
link |
00:45:23.120
they will all move out of your way. It's called civil inattention. Civil inattention. That's
link |
00:45:28.560
the thing. Oh, wow. I need to look this up, but it works. What is that? My sense was if you communicate
link |
00:45:35.440
like confidence in your actions that you're unlikely to deviate from the action that you're
link |
00:45:42.240
following, that's a really powerful signal to others that they need to plan around your actions,
link |
00:45:47.040
as opposed to nudging where you're sort of hesitantly, then the hesitation might communicate
link |
00:45:53.200
that you're now, you're still in the dance and the game that they can influence with their own
link |
00:45:57.920
actions. I've recently had a conversation with Jim Keller, who's a sort of this
link |
00:46:05.920
legendary chip architect, but he also led the autopilot team for a while. And his intuition
link |
00:46:14.160
is that driving is fundamentally still like a ballistics problem. Like you can ignore the human
link |
00:46:20.800
element, that it's just about not hitting things. And you can kind of learn the right dynamics required
link |
00:46:27.040
to do the merger and all those kinds of things. And then my sense is, and I don't know if I can
link |
00:46:32.080
provide sort of definitive proof of this, but my sense is it's like an order of magnitude or more
link |
00:46:38.400
difficult when humans are involved. Like it's not simply an object collision avoidance problem.
link |
00:46:46.720
What's, where does your intuition, of course, nobody knows the right answer here, but where does
link |
00:46:51.360
your intuition fall on the difficulty, fundamental difficulty of the driving problem when humans
link |
00:46:57.040
are involved? Yeah. Good question. I have many opinions on this. Imagine downtown San Francisco.
link |
00:47:06.320
Yeah. Yeah. It's crazy, busy, everything. Okay, now take all the humans out. No pedestrians,
link |
00:47:13.840
no human driven vehicles, no cyclists, no people on little electric scooters zipping around, nothing.
link |
00:47:20.960
I think we're done. I think driving at that point is done. We're done. There's nothing really that's
link |
00:47:26.800
still needs to be solved about that. Well, let's pause there. I think I agree with you.
link |
00:47:32.800
And I think a lot of people that hear this will agree with that. But we need to sort of
link |
00:47:39.920
internalize that idea. So what's the problem there? Because we might not quite yet be done with that,
link |
00:47:45.120
because a lot of people kind of focus on the perception problem. A lot of people kind of
link |
00:47:51.440
map autonomous driving into how close are we to solving being able to detect all the, you know,
link |
00:47:57.840
the drivable area, the objects in the scene. Do you see that as a, how hard is that problem?
link |
00:48:07.280
So your intuition there behind your statement was we might have not solved it yet, but we're
link |
00:48:11.840
close to solving basically the perception problem. I think the perception problem, I mean, and by the
link |
00:48:18.000
way, a bunch of years ago, this would not have been true. And a lot of issues in the space
link |
00:48:24.000
were coming from the fact that, oh, we don't really know what's where.
link |
00:48:29.360
But I think it's fairly safe to say that at this point, although you could always improve on things
link |
00:48:35.760
and all of that, you can drive through downtown San Francisco if there are no people around.
link |
00:48:40.240
There's no really perception issues standing in your way there. I think perception is hard. But
link |
00:48:46.240
yeah, we've made a lot of progress on the perception side, and I don't mean to undermine the difficulty
link |
00:48:50.480
of the problem. I think everything about robotics is really difficult, of course. I think that,
link |
00:48:54.720
you know, the, the, the planning problem, the control problem, all very difficult. But I think
link |
00:48:59.760
what makes it really hard, kind of, yeah, it might be, I mean, you know, and I picked downtown
link |
00:49:06.000
San Francisco. Adapting to, well, now it's snowing, now it's no longer snowing, now it's
link |
00:49:12.960
slippery in this way, now there's the dynamics part. I could imagine that being still somewhat
link |
00:49:21.920
challenging. But no, the thing that I think worries us, and our intuition is not good there, is
link |
00:49:28.160
the perception problem at the edge cases. Sort of downtown San Francisco, the nice thing,
link |
00:49:35.200
it's not actually, it may not be a good example, because you know what you're
link |
00:49:40.880
getting. Well, there's like crazy construction zones and all that. Yeah, but the thing is,
link |
00:49:44.320
you're traveling at slow speeds, so like it doesn't feel dangerous. To me, what feels dangerous is
link |
00:49:49.200
highway speeds, when everything is, to us humans, super clear. Yeah, I'm assuming LiDAR here, by
link |
00:49:56.240
the way. I think it's kind of irresponsible to not use LiDAR. That's just my personal opinion.
link |
00:50:03.440
I mean, depending on your use case, but I think like, you know, if you, if you have the opportunity
link |
00:50:07.280
to use LiDAR, then a lot, in a lot of cases, you might not. Good, your intuition makes more sense
link |
00:50:13.360
now. So you don't take vision. I just really just don't know enough to say, well, vision alone,
link |
00:50:20.000
what, you know, what's, like, there's a lot of, how many cameras do they have? How are you
link |
00:50:24.800
using them? I don't know. There's all sorts of details. I imagine there's stuff
link |
00:50:29.200
that's really hard to actually see, you know, how do you deal with, with exactly what you were
link |
00:50:34.320
saying, stuff that people would see that, that, that you don't. I think I have more, my intuition
link |
00:50:39.760
comes from systems that can actually use LiDAR as well. Yeah. And until we know for sure, it
link |
00:50:46.000
makes sense to be using LiDAR. That's kind of the safety focus. But then, on the other hand,
link |
00:50:52.000
I also sympathize with the Elon Musk statement that LiDAR is a crutch. It's
link |
00:50:58.720
a fun notion to think that the things that work today are a crutch for the invention of the
link |
00:51:06.240
things that will work tomorrow, right? Like it's kind of true in the sense that if,
link |
00:51:13.760
you know, we want to stick to the comfort of the things that work. You see this in academic and research settings all
link |
00:51:18.160
the time: the things that work force you to not explore outside, not think outside the box. I
link |
00:51:24.080
think that happens all the time. The problem is in the safety critical systems. You kind of want
link |
00:51:29.440
to stick with the things that work. So it's an interesting and difficult trade
link |
00:51:34.640
off in the case of real world, safety critical robotic systems. But
link |
00:51:41.760
so your intuition is just to clarify how, I mean, how hard is this human element?
link |
00:51:49.760
Like, how hard is driving when this human element is involved? Are we
link |
00:51:57.440
years, decades away from solving it? But perhaps actually the year isn't the thing I'm asking;
link |
00:52:03.680
it doesn't matter what the timeline is, but do you think we're, uh, how many breakthroughs
link |
00:52:09.040
are we away from in solving the human robot interaction problem to get this
link |
00:52:14.720
right? I think, in a sense, it really depends. We were talking about how, well,
link |
00:52:24.080
look, it's really hard because predicting what people do is hard. And on top of that,
link |
00:52:28.560
playing the game is hard. But I think we sort of have the fundamental, some of the fundamental
link |
00:52:36.880
understanding for that. And then you already see that these systems are being deployed in the real
link |
00:52:43.360
world, you know, even driverless, because there are now, I think, a few companies that don't have a
link |
00:52:53.600
driver in the car in some small areas. I got a chance to, I went to Phoenix and I, I shot a video
link |
00:53:01.200
with Waymo. I need to get that video out. People have been giving me flak, but there's incredible
link |
00:53:07.760
engineering work being done there. And it's one of those other seminal moments for me in my life
link |
00:53:11.760
to be able to, it sounds silly, but to be able to drive without a, sorry, without
link |
00:53:17.920
a driver in the seat. I mean, it was incredible robotics. I was driven by a robot without being
link |
00:53:26.720
able to take over, without being able to take the steering wheel. That's a magical, that's a
link |
00:53:32.560
magical moment. So in that regard, in those domains, at least for like Waymo, they're, they're,
link |
00:53:37.440
they're solving that human element. I mean, they're going, I mean, it felt fast,
link |
00:53:43.440
because you're like freaking out at first. That was, this is my first experience, but it's going
link |
00:53:47.760
like the speed limit, right? 30, 40, whatever it is. And there's humans and it deals with them
link |
00:53:53.280
quite well. It detects them and, and it negotiates the intersections, the left turns and all that.
link |
00:53:58.080
So at least in those domains, it's solving them. The open question for me is like,
link |
00:54:02.560
like, how quickly can we expand? You know, that's the, you know, outside of the weather conditions,
link |
00:54:10.000
all of those kinds of things, how quickly can we expand to like cities like San Francisco?
link |
00:54:14.480
Yeah. And I wouldn't say that it's just, you know, now it's just pure engineering and it's
link |
00:54:19.520
probably the, I mean, and by the way, I'm speaking kind of very generally here as hypothesizing,
link |
00:54:26.240
but I think that, that there are successes and yet no one is everywhere out there. So that seems to
link |
00:54:35.600
suggest that things can be expanded and can be scaled. And we know how to do a lot of things,
link |
00:54:41.520
but there's still probably, you know, new algorithms or modified algorithms that, that
link |
00:54:47.680
you still need to put in there as you, as you learn more and more about new challenges that
link |
00:54:53.760
you get faced with. How much of this problem do you think can be learned through end to end?
link |
00:54:58.800
There's the success of machine learning and reinforcement learning. How much of it can be
link |
00:55:03.680
learned from sort of data from scratch? And how much, because most of the success of autonomous
link |
00:55:09.040
vehicle systems has a lot of heuristics and rule based stuff on top, like human expertise
link |
00:55:16.880
injected, forced into the system to make it work. What's your, what's your sense? How much,
link |
00:55:22.160
what's the, what will be the role of learning in the near term? I think, I, I think on the one hand
link |
00:55:32.160
that learning is inevitable here, right? I think on the other hand that when people characterize
link |
00:55:39.600
the problem as it's a bunch of rules that some people wrote down versus it's an end to end
link |
00:55:46.640
deep RL system or imitation learning, then maybe there's kind of something missing from that. Maybe
link |
00:55:54.640
there's more. So for instance, I think a very, very useful tool in this sort of problem,
link |
00:56:04.240
both in how to generate the car's behavior and robots in general, and how to model human beings
link |
00:56:11.680
is actually planning, search optimization, right? So robotics is a sequential decision
link |
00:56:16.720
making problem. And when, when a robot can figure out on its own how to achieve its goal without
link |
00:56:28.160
hitting stuff and all that stuff, right? All the good stuff for motion planning 101,
link |
00:56:32.960
I think of that as very much AI, not this is some rule or some, there's nothing rule based
link |
00:56:39.360
about that, right? It's just, you're searching through a space, or you're optimizing
link |
00:56:43.040
through a space, and figuring out what seems to be the right thing to do. And I think it's hard to
link |
00:56:48.960
just do that because you need to learn models of the world. And I think it's hard to just do the
link |
00:56:54.880
learning part where you don't, you know, you don't bother with any of that, because then you're
link |
00:56:59.840
saying, well, I could do imitation, but then when I go off distribution, I'm really screwed,
link |
00:57:04.480
or you can say, I can do reinforcement learning, which adds a lot of robustness,
link |
00:57:09.680
but then you have to do either reinforcement learning in the real world, which sounds a
link |
00:57:13.840
little challenging, all that trial and error, you know, or you have to do reinforcement learning
link |
00:57:19.520
in simulation. And then that means, well, guess what, you need to model things, at least to
link |
00:57:26.240
model people, model the world enough that, you know, whatever policy you get out of that is
link |
00:57:31.520
like actually fine to roll out in the world and do some additional learning there. So
link |
00:57:37.360
Do you think simulation, by the way, just a quick tangent has a role in the human robot
link |
00:57:43.280
interaction space? Like, is it useful? It seems like humans, everything we've been talking about
link |
00:57:48.320
are difficult to model and simulate. Do you think simulation has a role in this space?
link |
00:57:53.520
I do. I think so, because you can take models and train with them ahead of time, for instance,
link |
00:58:04.080
you can. But the models, sorry to interrupt, the models are sort of human constructed or learned?
link |
00:58:10.400
I think they have to be a combination, because if you get some human data, and then you say,
link |
00:58:20.400
this is going to be my model of the person, whether for simulation and training or for
link |
00:58:24.880
just deployment time, and that's what I'm planning with as my model of how people work.
link |
00:58:29.040
Regardless, if you take some data, and you don't assume anything else, and you just say, okay,
link |
00:58:37.040
this is some data that I've collected, let me fit a policy of how people work based on that.
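A minimal sketch of that "fit a policy of how people work" step, plus a crude out-of-distribution check; the nearest-neighbor model, the synthetic data, and the threshold are illustrative assumptions rather than anything specified in the conversation:

```python
# Sketch: fit a human policy ("behavior cloning") from logged (state, action) pairs,
# then flag query states that fall outside the training distribution.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, NearestNeighbors

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))            # logged human states (positions, velocities, ...)
actions = states @ rng.normal(size=(4, 2))     # logged human actions (stand-in for real data)

# "Fit a policy of how people work": a supervised map from state to action.
human_model = KNeighborsRegressor(n_neighbors=10).fit(states, actions)

# Crude in-distribution check: distance to the nearest logged states.
density = NearestNeighbors(n_neighbors=10).fit(states)

def predict_human_action(state, ood_threshold=2.0):
    dists, _ = density.kneighbors(state.reshape(1, -1))
    in_distribution = float(dists.mean()) < ood_threshold
    return human_model.predict(state.reshape(1, -1))[0], in_distribution

action, trusted = predict_human_action(rng.normal(size=4))
# A robot "best response" should lean on this prediction only when `trusted` is True;
# off-distribution, the fitted model says little about what the person will actually do.
```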
link |
00:58:42.400
What tends to happen is, you collected some data in some distribution,
link |
00:58:46.000
and then now your robot sort of computes a best response to that. It's like, what should I do
link |
00:58:54.320
if this is how people work, and it easily goes off distribution, where that model that you've built
link |
00:59:00.240
of the human completely sucks, because out of distribution, you have no idea. If you think of
link |
00:59:06.080
all the possible policies, and then you take only the ones that are consistent with the human data
link |
00:59:11.680
that you've observed, that still leaves a lot of things that could happen outside of that distribution
link |
00:59:17.440
where you're confident and you know what's going on. By the way, I've gotten used to this terminology
link |
00:59:23.600
of out of distribution, but it's such a machine learning terminology, because it kind of assumes,
link |
00:59:30.720
so distribution is referring to the data that you've seen. The set of states that you encountered.
link |
00:59:37.840
They've encountered so far at training time, but it kind of also implies that there's a nice
link |
00:59:44.480
statistical model that represents that data. Out of distribution, it raises to me philosophical
link |
00:59:53.840
questions of how we humans reason out of distribution, reason about things that
link |
01:00:00.800
we haven't seen before. What we're talking about here is how do we reason about what other people
link |
01:00:08.560
do in situations where we haven't seen them? Somehow we just magically navigate that. I can
link |
01:00:15.760
anticipate what will happen in situations that are even novel in many ways. I have a pretty good
link |
01:00:22.320
intuition for it. I don't always get it right, but I might be a little uncertain and so on. I think
link |
01:00:26.720
it's that if you just rely on data, there are just too many possibilities, too many policies
link |
01:00:36.720
out there that fit the data. By the way, it's not just state, it's really history of state,
link |
01:00:40.480
because to really be able to anticipate what the person will do, it depends on what they've
link |
01:00:44.240
been doing so far, because that's the information you need to at least implicitly say, this is the
link |
01:00:50.000
kind of person that this is, this is probably what they're trying to do. You're trying to map
link |
01:00:54.080
history of states to actions. There's many mappings. History meaning the last few seconds
link |
01:00:59.760
or the last few minutes or the last few months? Who knows? Who knows how much you need? In terms
link |
01:01:04.960
of if your state is really like the positions of everything or whatnot and velocities,
link |
01:01:09.520
who knows how much you need? Then there's so many mappings. Now you're talking about how do
link |
01:01:16.720
you regularize that space? What priors do you impose, or what's the inductive bias? These are all
link |
01:01:22.640
very related things to think about. Basically, what are the assumptions that we should be making
link |
01:01:29.840
such that these models actually generalize outside of the data that we've seen?
link |
01:01:35.600
Now you're talking about, well, I don't know, what can you assume? Maybe you can assume that
link |
01:01:39.520
people actually have intentions and that's what drives their actions. Maybe that's the right
link |
01:01:45.440
thing to do when you haven't seen data very nearby that tells you otherwise. I don't know,
link |
01:01:51.520
it's a very open question. Do you think one of the dreams of artificial intelligence was to
link |
01:01:57.760
solve common sense reasoning? Whatever the heck that means. Do you think something like common
link |
01:02:04.240
sense reasoning has to be solved in part to be able to solve this dance of human robot interaction,
link |
01:02:10.560
the driving space, or human robot interaction in general? Do you have to be able to reason about
link |
01:02:16.240
these kinds of common sense concepts of physics, of, you know, all the things we've been talking
link |
01:02:27.200
about humans, I don't even know how to express them with words, but the basics of human behavior,
link |
01:02:33.360
of fear of death. So like, to me, it's really important to encode in some kind of sense,
link |
01:02:40.080
maybe not, maybe it's implicit, but it feels that it's important to explicitly encode the fear of
link |
01:02:45.120
death, that people don't want to die. It seems silly, but the game of chicken
link |
01:02:56.720
that goes on with a pedestrian crossing the street is playing with the idea of mortality.
link |
01:03:02.800
Like, we really don't want to die. It's not just like a negative reward. I don't know. It just feels
link |
01:03:08.640
like all these human concepts have to be encoded. Do you share that sense, or is it a lot simpler
link |
01:03:14.160
than I'm making it out to be? I think it might be simpler. And I'm a person who likes the
link |
01:03:18.080
complicated. I think it might be simpler than that. Because it turns out, for instance, if you
link |
01:03:26.240
say model people in the very, I'll call it traditional, I don't know if it's fair to look
link |
01:03:32.000
at it as the traditional way, but you know, treating people as, okay, they're rational somehow,
link |
01:03:37.680
the utilitarian perspective. Well, in that, once you say that, you automatically capture that they
link |
01:03:47.840
have an incentive to keep on being. You know, Stuart likes to say, you can't fetch the coffee
link |
01:03:54.960
if you're dead. Stuart Russell, by the way. That's a good line. So when you're sort of
link |
01:04:04.320
treating agents as having these objectives, these incentives, humans or artificial,
link |
01:04:12.640
you're kind of implicitly modeling that they'd like to stick around so that they can accomplish
link |
01:04:18.480
those goals. So I think, I think in a sense, maybe that's what draws me so much to the
link |
01:04:24.240
rationality framework, even though it's so broken, we've been able to, it's been such a useful
link |
01:04:29.840
perspective. And like we were talking about earlier, what's the alternative? I give up and go home,
link |
01:04:33.760
or you know, I just use complete black boxes, but then I don't know what to assume out of
link |
01:04:37.280
distribution, to come back to this. It's just, it's been a very fruitful way to think about the
link |
01:04:43.360
problem in a more positive way, right? It's just, people aren't just crazy, maybe they make
link |
01:04:49.600
more sense than we think. But I think we also have to somehow be ready for it to be wrong,
link |
01:04:57.120
be able to detect when these assumptions aren't holding, all of that stuff.
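A minimal sketch of the rationality framework being described: model the person as noisily optimizing some objective, and treat their actions as evidence about which objective it is. The candidate goals, the quadratic utility, and the rationality parameter beta are illustrative assumptions:

```python
import numpy as np

goals = np.array([-2.0, 0.0, 3.0])    # hypotheses about what the person is trying to do
beta = 2.0                             # how noisily-optimal we assume the person is
actions = np.linspace(-1.0, 1.0, 21)   # discretized action set

def q_value(state, action, goal):
    # Utility of taking `action` in `state` if the person's objective is `goal` (toy quadratic).
    return -(state + action - goal) ** 2

def action_probs(state, goal):
    # Boltzmann-rational model: P(action | state, goal) is proportional to exp(beta * Q).
    logits = beta * q_value(state, actions, goal)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def posterior_over_goals(state, observed_action):
    prior = np.full(len(goals), 1.0 / len(goals))
    idx = int(np.argmin(np.abs(actions - observed_action)))
    likelihood = np.array([action_probs(state, g)[idx] for g in goals])
    post = prior * likelihood
    return post / post.sum()

print(posterior_over_goals(state=0.0, observed_action=0.9))  # belief shifts toward the goal at +3
```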
link |
01:05:01.760
Let me ask sort of another small side of this, that we've been talking about the pure autonomous
link |
01:05:08.560
driving problem. But there's also relatively successful systems already deployed out there
link |
01:05:15.280
in what you may call like level two autonomy or semi autonomous vehicles, whether that's
link |
01:05:21.200
Tesla Autopilot. I work quite a bit with the Cadillac Super Cruise system, which has a driver facing
link |
01:05:29.920
camera that detects your state, there's a bunch of basically lane centering systems.
link |
01:05:35.440
What's your sense about this kind of way of dealing with the human robot interaction problem
link |
01:05:43.040
by having a really dumb robot and relying on the human to help the robot out to keep them both
link |
01:05:51.040
alive? Is that from the research perspective, how difficult is that problem? And from a practical
link |
01:06:00.240
deployment perspective, is that a fruitful way to approach this human robot interaction problem?
link |
01:06:07.920
I think what we have to be careful about there is to not, it seems like some of these systems,
link |
01:06:16.080
not all are making this underlying assumption that if, so I'm a driver and I'm now really
link |
01:06:24.800
not driving but supervising and my job is to intervene, right? And so we have to be careful
link |
01:06:30.240
with this assumption that when I'm, if I'm supervising, I will be just as safe as when I'm
link |
01:06:40.880
driving, like that I will, you know, if I, if I wouldn't get into some kind of accident, if I'm
link |
01:06:47.280
driving, I will be able to avoid that accident when I'm supervising too. And I think I'm concerned
link |
01:06:53.840
about this assumption from a few perspectives. So from a technical perspective, it's that when
link |
01:06:59.280
you let something kind of take control and do its thing, and it depends on what that thing is,
link |
01:07:03.680
obviously, and how much is taking control and how, what things are you trusting it to do.
link |
01:07:07.760
But if you let it do its thing and take control, it will go to what we might call
link |
01:07:14.000
off policy states, from the person's perspective. So states that the person wouldn't actually find
link |
01:07:18.640
themselves in if they were the ones driving. And the assumption that the person functions
link |
01:07:23.920
just as well there as they function in the states that they would normally encounter
link |
01:07:27.920
is a little questionable. Now, another part is the kind of the human factor side of this,
link |
01:07:35.120
which is that I don't know about you, but I think I definitely feel like I'm experiencing things
link |
01:07:42.000
very differently when I'm actively engaged in the task versus when I'm a passive observer.
link |
01:07:46.960
Even if I try to stay engaged, right, it's very different than when I'm actually
link |
01:07:51.040
actively making decisions. And you see this in life in general, like you see students who are
link |
01:07:57.200
actively trying to come up with the answer, learn to think better than when they're passively told
link |
01:08:02.000
the answer. I think that's somewhat related. And I think people have studied this in human
link |
01:08:06.160
factors for airplanes. And I think it's actually fairly established that these two are not the
link |
01:08:11.040
same. So I, on that point, because I've gotten a huge amount of heat on this and I stand by it.
link |
01:08:17.040
Okay. Because I know the human factors community well. And the work here is really strong. And
link |
01:08:24.000
there's many decades of work showing exactly what you're saying. Nevertheless, I've been
link |
01:08:29.440
continuously surprised that many of the predictions of that work have been wrong in what I've seen.
link |
01:08:35.280
So what we have to do, I still agree with everything you said, but we have to be a little bit more
link |
01:08:44.080
open minded. So, I'll tell you, there's a few surprising things about
link |
01:08:50.240
supervision. Like, everything you said, to the word, is actually exactly correct.
link |
01:08:54.160
But what you didn't say is that these systems are unsafe. You said you can't assume a
link |
01:09:00.480
bunch of things, but we don't know if these systems are fundamentally unsafe. That's still
link |
01:09:06.560
unknown. There's a lot of interesting things. Like I'm surprised by the fact, not the fact,
link |
01:09:15.120
that what seems to be anecdotally from, well, from large data collection that we've done,
link |
01:09:20.400
but also from just talking to a lot of people, when in the supervisory role of semi autonomous
link |
01:09:26.640
systems that are sufficiently dumb, at least, which is, that might be a key element,
link |
01:09:33.440
is that the systems have to be dumb. The people are actually more energized as observers. So they
link |
01:09:38.960
actually better, they're better at observing the situation. So there might be cases in systems,
link |
01:09:46.480
if you get the interaction right, where you as a supervisor will do a better job with the system
link |
01:09:53.040
together. I agree. I think that is actually really possible. I guess mainly I'm pointing out that if
link |
01:09:58.160
you do it naively, you're implicitly assuming something, and that assumption might actually really
link |
01:10:03.600
be wrong. But I do think that if you explicitly think about what the agent should do such that
link |
01:10:10.960
the person still stays engaged, so that you essentially empower the person,
link |
01:10:16.880
and that's really the goal, right? You still have a driver, so you want to empower
link |
01:10:21.760
them to be so much better than they would be by themselves. And that's different. It's a very
link |
01:10:28.720
different mindset than: I want them to basically not drive, but be ready to sort of take over.
link |
01:10:38.960
So one of the interesting things we've been talking about is the rewards
link |
01:10:44.720
that seem to be fundamental to the way robots behave. So broadly speaking,
link |
01:10:52.240
we've been talking about utility functions, but comment on how do we approach the design
link |
01:10:58.320
of reward functions? Like how do we come up with good reward functions?
link |
01:11:01.600
Mm hmm. Well, really good question because the answer is we don't. This was, you know,
link |
01:11:12.320
I used to think, I used to think about how, well, it's actually really hard to specify
link |
01:11:18.320
rewards for interaction because it's really supposed to be what the people want. And then
link |
01:11:24.240
you really, you know, we talked about how you have to customize what you want to do to the end
link |
01:11:29.680
user. But I kind of realized that even if you take the interactive component away,
link |
01:11:39.040
it's still really hard to design reward functions. So what do I mean by that? I mean,
link |
01:11:44.720
if we assume this sort of AI paradigm in which there's an agent and its job is to optimize some
link |
01:11:51.200
objective, some reward, utility, loss, cost, whatever you call it
link |
01:11:59.920
depending on the situation or whatever it is. If you write it out, and then you deploy the agent,
link |
01:12:06.640
you'd want to make sure that whatever you specified incentivizes the behavior you want
link |
01:12:13.440
from the agent in any situation that the agent will be faced with, right? So I do motion planning
link |
01:12:20.320
on my robot arm. I specify some cost function, like, you know, this is how far away you should
link |
01:12:27.280
try to stay, this is how much it matters to stay away from people, and this is how much it matters to be
link |
01:12:30.720
able to be efficient and blah, blah, blah, right? I need to make sure that whatever I specify those
link |
01:12:36.560
constraints or tradeoffs or whatever they are, that when the robot goes and solves that problem
link |
01:12:43.120
in every new situation, that behavior is the behavior that I want to see. And what I've been
link |
01:12:48.480
finding is that we have no idea how to do that. Basically, what I can do is I can sample, I can
link |
01:12:56.640
think of some situations that I think are representative of what the robot will face.
link |
01:13:02.080
And I can tune and add and tune some reward function until the optimal behavior is what I
link |
01:13:10.960
want on those situations, which, first of all, is super frustrating because, you know, through the
link |
01:13:17.760
miracle of AI, we don't have to specify rules for behavior anymore, right? Like
link |
01:13:23.040
we were saying before, the robot comes up with the right thing to do. You plug in the situation,
link |
01:13:28.320
it optimizes; bring in another situation, it optimizes. But you still have to spend a lot of time on
link |
01:13:34.800
actually defining what it is that that criterion should be. Make sure you didn't forget about 50
link |
01:13:40.320
bazillion things that are important and how they all should be combining together to tell the robot
link |
01:13:45.280
what's good and what's bad and how good and how bad. And so I think this is a lesson that,
link |
01:13:54.720
I don't know, kind of, I guess I closed my eyes to it for a while, because I've been, you know,
link |
01:14:00.080
tuning cost functions for 10 years now. But it really strikes me that, yeah, we've moved the
link |
01:14:08.240
tuning and the, like, designing of features or whatever from the behavior side into the reward
link |
01:14:19.200
side. And yes, I agree that there's way less of it, but it still seems really hard to anticipate
link |
01:14:24.800
any possible situation and make sure you specify a reward function that when optimized will work
link |
01:14:31.920
well in every possible situation. So you're kind of referring to unintended consequences or just,
link |
01:14:39.040
in general, any kind of suboptimal behavior that emerges outside of the things you said,
link |
01:14:44.640
out of distribution. Suboptimal behavior that is, you know, actually optimal. I mean, this,
link |
01:14:49.920
I guess, the idea of unintended consequences, you know, it's optimal with respect to what you
link |
01:14:52.880
specified, but it's not what you want. And there's a difference between those.
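A minimal sketch of the kind of hand-specified trajectory cost being described, with tunable weights trading off effort, clearance from a person, and reaching a goal; the features and weights are illustrative assumptions, and the point is that whatever they incentivize in unanticipated situations is exactly what the optimizer will produce:

```python
import numpy as np

def trajectory_cost(traj, person_pos, goal,
                    w_effort=1.0, w_clearance=5.0, w_goal=10.0):
    effort = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))           # path length
    clearance = np.sum(np.exp(-np.linalg.norm(traj - person_pos, axis=1)))   # penalty for being near the person
    goal_error = np.linalg.norm(traj[-1] - goal)                             # end close to the goal
    return w_effort * effort + w_clearance * clearance + w_goal * goal_error

# The designer's job collapses into picking these weights so that the *optimum* is acceptable
# in every situation: too little weight on clearance and the optimal path skims past the person;
# too much and the arm detours absurdly or never reaches the goal.
traj = np.linspace([0.0, 0.0], [1.0, 1.0], 20)
print(trajectory_cost(traj, person_pos=np.array([0.5, 0.4]), goal=np.array([1.0, 1.0])))
```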
link |
01:14:57.360
But that's not fundamentally a robotics problem, right? That's a human problem.
link |
01:15:01.120
So like, that's the thing, right? So there's this thing called Goodhart's Law,
link |
01:15:05.120
which is you set a metric for an organization. And the moment it becomes a target that people
link |
01:15:11.360
actually optimize for, it's no longer a good metric. Well, what's it called? That's a quote.
link |
01:15:15.600
Goodhart's Law. Goodhart's Law. So the moment you specify a metric, it stops doing its job.
link |
01:15:21.520
Yeah, it stops doing its job. So there's, yeah, there's such a thing as optimizing for
link |
01:15:26.160
things and, and, you know, failing to, to think ahead of time of all the possible
link |
01:15:33.200
things that might be important. And so that's, so that's interesting because
link |
01:15:39.760
historically I work a lot on reward learning from the perspective of customizing to the end user,
link |
01:15:43.840
but it really seems like it's not just the interaction with the end user that's a problem
link |
01:15:49.920
of the human and the robot collaborating so that the robot can do what the human wants,
link |
01:15:54.720
right? This kind of back and forth, the robot probing, the person being informative, all of
link |
01:15:58.800
that stuff might be actually just as applicable to this kind of maybe new form of human robot
link |
01:16:06.800
interaction, which is the interaction between the robot and the expert programmer, roboticist,
link |
01:16:13.280
designer in charge of actually specifying what the heck one should do and specifying the task
link |
01:16:19.280
for the robot. Fascinating. That's so cool, like collaborating on the reward. Right, collaborating
link |
01:16:24.560
on the reward design. And so what, what does it mean, right? What does it, when we think about
link |
01:16:29.200
the problem not as, someone specifies it and your job is to optimize it, and we start thinking about
link |
01:16:36.160
you're in this interaction and this collaboration. And the first thing that comes up is when the
link |
01:16:42.720
person specifies a reward, it's not, you know, gospel, it's not like the letter of the law.
link |
01:16:48.560
It's not the definition of the reward function you should be optimizing because they're doing
link |
01:16:54.080
their best, but they're not some magic perfect oracle. And the sooner we start understanding
link |
01:16:58.320
that, I think the sooner we'll get to more robots that function better in different situations.
link |
01:17:06.320
And then you have to kind of say, okay, well, it's almost like robots are
link |
01:17:10.080
over-relying, they're putting too much weight on the reward specified, taking it as the definition.
link |
01:17:18.320
And maybe leaving a lot of other information on the table, like what are other things we could do
link |
01:17:23.120
to actually communicate to the robot about what we want them to do besides attempting to specify
link |
01:17:28.800
a reward function. Yeah, you have this awesome, again, I love the poetry of leaked information.
link |
01:17:34.480
So you mentioned humans leak information about what they want, you know, leak reward signal
link |
01:17:42.560
for the, for the robot. So how do we detect these leaks? What is that? Yeah, what are these leaks?
link |
01:17:49.680
But I just, I don't know, I recently saw it, read it, I don't know
link |
01:17:54.240
where, from you. And that's gonna stick with me for a while, for some reason, because it's not
link |
01:17:59.520
explicitly expressed, it kind of leaks indirectly from our behavior. Yeah, absolutely. So I think
link |
01:18:09.200
maybe some surprising bits, right? So we were talking before about a robot arm that needs to
link |
01:18:15.040
move around people, carry stuff, put stuff away, all of that. And now imagine that, you know, the
link |
01:18:25.040
robot has some initial objective that the programmer gave it, so they can do all these things functionally,
link |
01:18:30.560
it's capable of doing that. And now I noticed that it's doing something and maybe it's coming
link |
01:18:37.520
too close to me, right? And maybe I'm the designer, maybe I'm the end user and this robot is now in
link |
01:18:42.240
my home. And I push it away. So I push it away because, you know, it's a reaction to what the robot is
link |
01:18:51.040
currently doing. And this is what we call physical human robot interaction. And now there's a lot
link |
01:18:56.480
of, there's a lot of interesting work on how robots should respond to physical human robot
link |
01:19:00.720
interaction, what should the robot do if such an event occurs? And there's sort of different
link |
01:19:04.000
schools of thought. Well, you know, you can sort of treat it the control theoretic way and say
link |
01:19:08.080
this is a disturbance that you must reject. You can sort of treat it more kind of heuristically
link |
01:19:15.680
and say I'm going to go into some like gravity compensation mode so that I'm easily maneuverable
link |
01:19:19.280
around, I'm going to go in the direction that the person pushed me. And to us,
link |
01:19:25.680
part of the realization has been that that is a signal that communicates about the reward, because if
link |
01:19:30.880
my robot was moving in an optimal way, and I intervened, that means that I disagree with
link |
01:19:37.840
its notion of optimality, whatever it thinks is optimal is not actually optimal. And sort of
link |
01:19:44.000
optimization problems aside, that means that the cost function, the reward function is, is
link |
01:19:50.720
incorrect, or at least it's not what I want it to be. How difficult is that signal to interpret
link |
01:19:58.240
and make actionable? Because this connects to our autonomous vehicle discussion,
link |
01:20:02.000
whether in the semi autonomous vehicle or autonomous vehicle, when a safety driver disengages
link |
01:20:07.440
the car, like they could have disengaged it for a million reasons. Yeah. Yeah. So that's true.
link |
01:20:14.960
Again, it comes back to a, can you, can you structure a little bit your assumptions about
link |
01:20:20.640
how human behavior relates to what they want? And, you know, one thing that we've
link |
01:20:25.760
done is literally just treated this external torque that they applied as, you know, when you
link |
01:20:32.320
take that and you add it to the torque the robot was already applying, that overall action
link |
01:20:38.000
is probably relatively optimal with respect to whatever it is that the person wants. And then
link |
01:20:41.920
that gives you information about what it is that they want. So you can learn that people want you
link |
01:20:45.600
to stay further away from them. Now, you're right that there might be many things that explain just
link |
01:20:50.400
that one signal and that you might need much more data than that for, for, for the person to be able
link |
01:20:54.880
to shape your reward function over time. You can also do this info gathering stuff that we were
link |
01:21:01.120
talking about. Not that we've done that in that context just to clarify, but it's definitely
link |
01:21:04.480
something we thought about where you can have the robot start acting in a way, like if there are a
link |
01:21:11.120
bunch of different explanations, right? It moves in a way where it sees if you correct it in some
link |
01:21:16.480
other way or not, and then kind of actually plans its motion so that it can disambiguate and collect
link |
01:21:22.160
information about what you want. Anyway, so that's one way that's kind of sort of leaked information,
link |
01:21:27.280
maybe even more subtle leaked information is if I just press the E stop, right? I just, I'm doing
link |
01:21:33.280
it out of panic because the robot is about to do something bad. There's again information there,
link |
01:21:38.080
right? Okay, the robot should definitely stop, but it should also figure out that whatever it was
link |
01:21:43.120
about to do was not good. And in fact, it was so not good that stopping and remaining stop for a
link |
01:21:48.400
while was better, a better trajectory for it than whatever it is that it was about to do. And that
link |
01:21:52.960
again is information about what are my preferences? What do I want? Speaking of E stops, what are your
link |
01:22:01.920
expert opinions on the three laws of robotics from Isaac Asimov that don't harm humans, obey
link |
01:22:09.680
orders, protect yourself? I mean, it's such a silly notion, but I speak to so many people these
link |
01:22:14.880
days, just regular folks, just, I don't know, my parents and so on about robotics, and they kind
link |
01:22:20.080
of operate in that space of, you know, imagining our future with robots and thinking what are the
link |
01:22:26.960
ethical, how do we get that dance, right? I know the three laws might be a silly notion, but do you
link |
01:22:34.640
think about like what universal reward functions that might be that we should enforce on the robots
link |
01:22:42.960
of the future? Or is that a little too far out? Or is the mechanism that you just described,
link |
01:22:51.440
there shouldn't be three laws that should be constantly adjusting kind of thing?
link |
01:22:55.040
I think it should constantly be adjusting kind of thing. You know, the issue with the laws is,
link |
01:23:00.800
I don't even, you know, they are words and I have to write math, I have to translate them into
link |
01:23:05.680
math. What does it mean to harm? What does harm mean? Obey what, right? Because we just talked
link |
01:23:13.520
about how you try to say what you want, but you don't always get it right and you want these machines
link |
01:23:21.120
to do what you want, not necessarily exactly what you're literally saying. So you don't want them to take
link |
01:23:25.840
you literally, you want them to take what you say and interpret it in context. And that's what we do
link |
01:23:32.080
with the specified rewards. We don't take them literally anymore from the designer. We, not we
link |
01:23:37.440
as a community, we as, you know, some members of my group, we, and some of our collaborators like
link |
01:23:45.760
Pieter Abbeel and Stuart Russell, we sort of said, okay, the designer specified this thing,
link |
01:23:53.200
but I'm going to interpret it not as this is the universal reward function that I shall always
link |
01:23:57.760
optimize always and forever, but as this is good evidence about what the person wants.
link |
01:24:05.280
And I should interpret that evidence in the context of these situations that it was specified for,
link |
01:24:10.800
because ultimately, that's what the designer thought about. That's what they had in mind.
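A minimal sketch of treating the specified reward as evidence rather than ground truth, in the spirit of the inverse reward design idea being described here; the candidate true rewards, the training environments, and the toy observation model are illustrative assumptions:

```python
import numpy as np

# Hypotheses about the TRUE reward weights over two features (e.g., efficiency, clearance).
candidate_true_w = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])]
specified_w = np.array([1.0, 0.0])                              # what the designer wrote down
training_envs = [np.array([0.3, 0.1]), np.array([0.5, 0.2])]    # feature scales of the training situations

def feature_counts(w, env):
    # Toy stand-in for "the features picked up by behavior optimized under w in this environment".
    return env * w / (np.linalg.norm(w) + 1e-9)

def posterior_true_weights(beta=5.0):
    # Observation model: the designer likely chose weights whose optimized behavior
    # scores well under the true weights in the training environments.
    logits = np.array([
        beta * sum(float(w_true @ feature_counts(specified_w, env)) for env in training_envs)
        for w_true in candidate_true_w
    ])
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Candidates that agree with the specified reward on the training situations stay tied,
# which is the point: the specification is evidence only in the context it was designed for.
print(posterior_true_weights())
```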
link |
01:24:14.160
And really, them specifying reward function that works for me in all these situations is really
link |
01:24:20.160
kind of telling me that whatever behavior it incentivizes must be good behavior with respect to
link |
01:24:24.560
the thing that I should actually be optimizing for. And so now the robot kind of has uncertainty
link |
01:24:30.240
about what it is that it should be optimizing, what its reward function is. And then there's all these
link |
01:24:35.280
additional signals we've been finding that it can kind of continually learn from and adapt
link |
01:24:40.000
its understanding of what people want. Every time the person corrects it, maybe they demonstrate,
link |
01:24:44.640
maybe they stop, hopefully not. One really, really crazy one is the environment itself,
link |
01:24:54.720
like our world. It's not, you know, you observe our world and the state of it. And it's not that
link |
01:25:02.320
you're seeing behavior and you're saying, oh, people are making decisions that are rational,
link |
01:25:05.760
blah, blah, blah. But our world is something that we've been acting in, according to our
link |
01:25:13.040
preferences. So I have this example where like the robot walks into my home and my shoes are laid
link |
01:25:17.920
down on the floor kind of in a line, right? It took effort to do that. So even though the robot
link |
01:25:24.480
doesn't see me doing this, you know, actually aligning the shoes, it should still be able
link |
01:25:30.640
to figure out that I want the shoes aligned. Because there's no way for them to have magically,
link |
01:25:35.680
you know, been instantiated themselves in that way. Someone must have actually taken the time
link |
01:25:42.960
to do that. So it must be important. So the environment actually tells the environment
link |
01:25:46.880
leaks information, leaks information. I mean, the environment is the way it is because humans
link |
01:25:51.520
somehow manipulated it. So you have to kind of reverse engineer the narrative that happened
link |
01:25:56.320
to create the environment as it is. And that leaks the preference information. Yeah.
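A minimal sketch of reading preferences off the observed state, assuming the state is the result of someone having acted according to their preferences; the features and probabilities are illustrative assumptions:

```python
# States that take effort to reach or maintain (shoes aligned) are unlikely if nobody cares
# about them, so observing them shifts belief toward "the person cares".
observed = {"shoes_aligned": True, "desk_tidy": False}

P_TRUE_IF_CARES = 0.9   # people tend to maintain what they care about
P_TRUE_IF_NOT = 0.1     # effortful configurations rarely arise by accident

def p_cares(feature_is_true, prior=0.5):
    like_cares = P_TRUE_IF_CARES if feature_is_true else 1 - P_TRUE_IF_CARES
    like_not = P_TRUE_IF_NOT if feature_is_true else 1 - P_TRUE_IF_NOT
    num = prior * like_cares
    return num / (num + (1 - prior) * like_not)

for feature, value in observed.items():
    print(f"{feature}: P(person cares) = {p_cares(value):.2f}")
# The observed state is evidence, not proof, about what the person wants.
```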
link |
01:26:00.960
You have to be careful, right? Because people don't have the bandwidth to do everything. So
link |
01:26:06.720
just because, you know, my house is messy doesn't mean that I want it to be messy, right? But that
link |
01:26:10.960
just means, you know, I didn't put the effort into that. I put the effort into something else.
link |
01:26:16.080
So the robot should figure out, well, something else was more important, but it doesn't mean that,
link |
01:26:19.760
you know, the house being messy is what I want. So it's a little subtle. But yeah, we really think of it.
link |
01:26:24.160
The state itself is kind of like a choice that people implicitly made about how they want their
link |
01:26:30.560
world. What book or books, technical or fiction or philosophical had, when you like look back
link |
01:26:37.280
your life, had a big impact, maybe was a turning point, was inspiring in some way.
link |
01:26:42.560
Maybe we're talking about some silly book that nobody in their right mind would want to read. Or maybe
link |
01:26:48.960
it's a book that you would recommend to others to read. Or maybe those could be two different
link |
01:26:53.840
recommendations of books that could be useful for people on their journey.
link |
01:27:01.600
When I was in, it's kind of a personal story, when I was in 12th grade, I got my hands on a
link |
01:27:09.200
PDF copy in Romania of Russell and Norvig's AI: A Modern Approach. I didn't know anything about AI at that
link |
01:27:16.960
point. I was, you know, I had watched the movie, The Matrix was my exposure. And so I started going
link |
01:27:27.840
through this thing. And you know, you were asking in the beginning what, you know, it's math
link |
01:27:35.200
and it's algorithms, what's interesting? But it was so captivating, this notion that you could just
link |
01:27:40.400
have a goal and figure out your way through a kind of a messy, complicated situation.
link |
01:27:47.600
So what sequence of decisions you should make to autonomously achieve that goal.
link |
01:27:52.960
That was so cool. I'm, you know, I'm biased, but that's a cool book to look at.
link |
01:28:00.000
Yeah, you can take, you know, the process of intelligence and
link |
01:28:05.520
mechanize it. I had the same experience. I was really interested in psychiatry and trying to
link |
01:28:09.920
understand human behavior. And then AI: A Modern Approach is like, wait, you can just reduce it
link |
01:28:15.760
all to write math about human behavior, right? Yeah. So that's, and I think that stuck with me
link |
01:28:21.360
because, you know, a lot of what I do, a lot of what we do in my lab is write math about human
link |
01:28:28.240
behavior, combine it with data and learning, put it all together, give it to robots to plan with
link |
01:28:32.800
and, you know, hope that instead of writing rules for the robots, writing heuristics,
link |
01:28:38.160
designing behavior, they can actually autonomously come up with the right thing to do around people.
link |
01:28:43.840
That's kind of our, you know, that's our signature move. We wrote some math and then
link |
01:28:47.840
instead of kind of hand crafting this and that, and the robot figures stuff out, and
link |
01:28:52.240
isn't that cool. And I think that is the same enthusiasm that I got from the robot figured
link |
01:28:57.280
out how to reach that goal in that graph. Isn't that cool? So I apologize for the romanticized
link |
01:29:05.200
questions and the silly ones. If a doctor gave you five years to live, sort of emphasizing
link |
01:29:13.600
the finiteness of our existence, what would you try to accomplish? It's like my biggest nightmare,
link |
01:29:21.840
by the way. I really like living. So I'm actually, I really don't like the idea of being told that
link |
01:29:29.440
I'm going to die. Sorry to linger on that for a second. I mean, do you meditate or ponder on
link |
01:29:35.280
your mortality, or our human mortality? The fact that this thing ends seems to be a fundamental
link |
01:29:40.640
feature. Do you think of it as a feature or a bug too? You said you don't like the idea of dying,
link |
01:29:47.440
but if I were to give you a choice of living forever, like you're not allowed to die.
link |
01:29:52.320
Now I'll say that I want to live forever, but I watch this show. It's very silly. It's called
link |
01:29:56.720
The Good Place. And they reflect a lot on this. And the moral of the story is that you have to make
link |
01:30:02.480
the afterlife be finite too, because otherwise people just kind of, it's like WALL-E. It's like
link |
01:30:08.240
whatever. So I think the finiteness helps. But yeah, it's just, I'm not a religious person.
link |
01:30:19.040
I don't think that there's something after. And so I think it just ends and you stop existing.
link |
01:30:25.440
And I really like existing. It's such a great privilege to exist that, yeah, it's just,
link |
01:30:33.840
I think that's the scary part. I still think that we like existing so much because it ends.
link |
01:30:40.320
And that's so sad. It's so sad to me every time. I find almost everything about this life beautiful.
link |
01:30:46.480
Like the silliest, most mundane things are just beautiful. And I think I'm cognizant of the fact
link |
01:30:51.680
that I find it beautiful because it ends. And it's so, I don't know. I don't know how to feel
link |
01:30:58.640
about that. I also feel like there's a lesson in there for robotics and AI that is not like,
link |
01:31:08.560
the finiteness of things seems to be a fundamental nature of human existence. I think
link |
01:31:14.080
some people sort of accuse me of just being Russian and melancholic and romantic or something.
link |
01:31:19.920
But that seems to be a fundamental nature of our existence that should be incorporated
link |
01:31:27.360
in our reward functions. But anyway, if you were speaking of reward functions,
link |
01:31:34.560
if you only had five years, what would you try to accomplish?
link |
01:31:38.320
This is the thing. I'm thinking about this question and have a pretty joyous moment
link |
01:31:45.120
because I don't know that I would change much. I'm trying to make some contribution
link |
01:31:53.600
to how we understand human AI interaction. I don't think I would change that.
link |
01:32:00.400
Maybe I'll take more trips to the Caribbean or something. But I try to do that already
link |
01:32:07.280
from time to time. So yeah, I mean, I try to do the things that bring me joy and thinking about
link |
01:32:14.640
these things brings me joy. It's the Marie Kondo thing: don't do stuff that doesn't spark joy.
link |
01:32:19.920
For the most part, I do things that spark joy. Maybe I'll do less service in the department
link |
01:32:24.480
or something. I'm not dealing with admissions anymore. But no, I think I have amazing colleagues
link |
01:32:35.680
and amazing students and amazing family and friends and spending time and some balance
link |
01:32:41.280
with all of them is what I do and that's what I'm doing already. So I don't know that I would
link |
01:32:46.000
really change anything. So in the spirit of positivity, what small act of kindness,
link |
01:32:52.640
if one pops to mind, were you once shown that you will never forget?
link |
01:32:58.480
When I was in high school, my friends, my classmates got some tutoring. We were gearing up
link |
01:33:09.840
for our baccalaureate exam, and they got some tutoring on, well, some on math, some on whatever.
link |
01:33:15.840
I was comfortable enough with some of those subjects, but physics was something that I
link |
01:33:21.280
hadn't focused on in a while. And so they were all working with this one teacher. And I started
link |
01:33:30.320
working with that teacher. Her name is Nicole Bicanu. And she was the one who kind of opened
link |
01:33:38.080
up this whole world for me because she sort of told me that I should take the SATs and apply to
link |
01:33:45.600
go to college abroad and, you know, do better on my English and all of that. And when it came to,
link |
01:33:54.240
well, financially, I couldn't, my parents couldn't really afford to do all these things. She started
link |
01:33:59.760
tutoring me on physics for free. And on top of that, sitting down with me to kind of train me for
link |
01:34:05.360
SATs and all that jazz that she had experience with. Wow. And obviously, that has taken you to
link |
01:34:14.640
be here today, also to being one of the world experts in robotics. It's funny, those little... Yeah,
link |
01:34:21.040
people do it the small or large... For no reason, really, just out of karma... Wanting to support
link |
01:34:28.800
someone, yeah. Yeah. So we talked a ton about reward functions. Let me talk about the most
link |
01:34:36.240
ridiculous big question. What is the meaning of life? What's the reward function under which we
link |
01:34:41.920
humans operate? Like, what maybe to your life, maybe broader to human life in general, what do you
link |
01:34:48.320
think? What gives life fulfillment, purpose, happiness, meaning?
link |
01:34:55.120
You can't even ask that question with a straight face. That's so ridiculous. I can't, I can't.
link |
01:35:01.760
Okay. So, you know... You're going to try to answer it anyway, are you sure?
link |
01:35:09.440
So, I was in a planetarium once. Yes. And, you know, they show you the thing and then they zoom out
link |
01:35:17.440
and zoom out and this whole, like, you're a speck of dust kind of thing. I think I was conceptualizing
link |
01:35:21.920
that we're kind of, you know, what are humans? We're just on this little planet, whatever. We
link |
01:35:26.800
don't matter much in the grand scheme of things. And then my mind got really blown because they
link |
01:35:32.640
talked about this multiverse theory where they kind of zoomed out and were like, this is our
link |
01:35:37.680
universe. And then, like, there's a bazillion other ones, and they just popped in and out of
link |
01:35:41.920
existence. So, like, our whole thing, that we can't even fathom how big it is, was like a blip
link |
01:35:47.200
that went in and out. And at that point I was like, okay, like, I'm done. This is not, there is no
link |
01:35:52.720
meaning. And clearly what we should be doing is try to impact whatever local thing we can impact.
link |
01:35:59.760
Our communities leave a little bit behind there. Our friends, our family, our local communities,
link |
01:36:05.200
and just try to be there for other humans. Because just everything beyond that seems
link |
01:36:12.480
ridiculous. I mean, are you like, how do you make sense of these multiverses? Like, are you inspired
link |
01:36:19.120
by the immensity of it? That do you, I mean, you, is there,
link |
01:36:27.040
like, is it amazing to you? Or is it almost paralyzing in the mystery of it?
link |
01:36:32.960
It's frustrating. I'm frustrated by my inability to comprehend. It just feels very frustrating.
link |
01:36:43.840
It's like, there's some stuff that, you know, space, time, blah, blah, blah,
link |
01:36:47.440
that we should really be understanding. And I definitely don't understand it. But,
link |
01:36:51.040
you know, the, the, the amazing physicists of the world have a much better understanding than me,
link |
01:36:56.560
but it's just an epsilon in the grand scheme of things. So it's very frustrating. It's just,
link |
01:37:01.040
it sort of feels like our brains don't have some fundamental capacity. Yeah. Well,
link |
01:37:06.400
yet or ever, I don't know, but. Well, this, one of the dreams of artificial intelligence is to
link |
01:37:11.040
create systems that will aid us, expand our cognitive capacity in order to understand, to build
link |
01:37:17.360
the theory of everything of physics and understand what the heck these multiverses are.
link |
01:37:25.360
So I think there's no better way to end it than talking about the meaning of life and the fundamental
link |
01:37:30.560
nature of the universe and multiverse. So Anca, it's a huge honor. One of my favorite conversations
link |
01:37:37.840
I've had. I really, really appreciate your time. Thank you for talking today. Thank you for coming.
link |
01:37:43.200
Come back again. Thanks for listening to this conversation with Anca Dragan. And thank you
link |
01:37:48.800
to our presenting sponsor, Cash App. Please consider supporting the podcast by downloading
link |
01:37:53.440
Cash App and using code Lex podcast. If you enjoy this podcast, subscribe on YouTube,
link |
01:37:59.360
review it with five stars on Apple podcast, support on Patreon or simply connect with me
link |
01:38:04.320
on Twitter at Lex Fridman. And now let me leave you with some words from Isaac Asimov.
link |
01:38:12.400
Your assumptions are your windows on the world. Scrub them off every once in a while,
link |
01:38:17.840
or the light won't come in. Thank you for listening and hope to see you next time.