Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81
The following is a conversation with Anca Dragan, a professor at Berkeley working on human-robot interaction, algorithms that look beyond the robot's function in isolation and generate robot behavior that accounts for interaction and coordination with human beings. She also consults at Waymo, the autonomous vehicle company, but in this conversation she is 100% wearing her Berkeley hat. She is one of the most brilliant and fun roboticists in the world to talk with. I had a tough and crazy day leading up to this conversation, so I was a bit tired, even more so than usual, but almost immediately as she walked in, her energy, passion, and excitement for human-robot interaction was contagious. So I had a lot of fun and really enjoyed this conversation.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N.
As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for solving a hard problem that in the end provides an easy interface that takes a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

And now, here's my conversation with Anca Dragan.
When did you first fall in love with robotics?

I think it was a very gradual process, and it was somewhat accidental, actually, because I first started getting into programming when I was a kid, and then into math, and then I decided computer science was the thing I was going to do. Then in college I got into AI, and then I applied to the Robotics Institute at Carnegie Mellon. I was coming from this little school in Germany that nobody had heard of, but I had spent an exchange semester at Carnegie Mellon, so I had letters from Carnegie Mellon. So that was the only... you know, MIT said no, Berkeley said no, Stanford said no. That was the only place I got into, so I went there, to the Robotics Institute, and I thought that robotics is a really cool way to actually apply the stuff that I knew and loved, like optimization. So that's how I got into robotics.
I have a better story of how I got into cars, which is: I used to do mostly manipulation in my PhD, but now I do kind of a bit of everything application-wise, including cars. And I got into cars because I was here in Berkeley, while I was still a PhD student, for RSS 2014. Pieter Abbeel organized it, and he arranged for, it was Google at the time, to give us rides in self-driving cars. And I was in the robot, and it was just making decision after decision, the right call, and it was so amazing. It was a whole different experience, right? Just, I mean, manipulation is so hard you can't do anything.
Was it the most magical robot you've ever met? So like, for me, meeting a Google self-driving car for the first time was a transformative moment. I had two moments like that: that, and Spot Mini. I don't know if you've met Spot Mini from Boston Dynamics. I felt like I fell in love or something, because I know how Spot Mini works, right? It's just, I mean, there's nothing truly special, it's great engineering work, but the anthropomorphism that went on in my brain, that it came to life, like it had a little arm and it looked at me, he, she looked at me, I don't know, there's a magical connection there. And it made me realize, wow, robots can be so much more than things that manipulate objects. They can be things that have a human connection. Was the self-driving car that moment, like, was there a robot that truly sort of inspired you?

I remember that experience very viscerally, riding in that car and being just wowed. They gave us a sticker that said, I rode in a self-driving car, and it had this cute little firefly on it, or logo or something like that.

Oh, that was like the smaller one, like the Firefly.

Yeah, the really cute one, yeah. And I put it on my laptop, and I had that for years, until I finally changed my laptop out, and, you know.
What about if we walk back? You mentioned optimization. Like, what beautiful ideas inspired you in math, computer science early on? Like, why get into this field? It seems like a cold and boring field of math. Like, what was exciting to you about it?

The thing is, I liked math from very early on, from fifth grade, which is when I got into the math olympiad.

Oh, you competed too?

Yeah, in Romania it's like our national sport, you gotta understand. So I got into that fairly early, and it was maybe a little too much just theory, with no kind of... I didn't really have a goal. Other than understanding, which was cool, I always liked learning and understanding, but there was no, okay, what am I applying this understanding to? And so I think that's how I got more heavily into computer science, because it was kind of math meets something you can do tangibly in the world.
Do you remember, like, the first program you've written?

Okay, the first program I've written, I kind of do. It was in QBasic in fourth grade, and it was drawing like a circle. Yeah, I don't know how to do that anymore, but in fourth grade, that's the first thing that they taught me. You could take a special, I wouldn't say it was an extracurricular, well, it is in a sense an extracurricular, so you could sign up for dance or music or programming. And I did the programming thing, and my mom was like, what, why?

Did you compete in programming? Like, these days in Romania, probably, that's like a big thing, there's a programming competition. Did that touch you at all?

I did a little bit of the computer science olympiad, but not as seriously as I did the math olympiad.

So it was programming. Yeah, it's basically: here's a hard math problem, solve it with a computer, is kind of the deal.

Yeah, it's more like algorithms.

Exactly, it's always algorithmic.
So again, you kind of mentioned the Google self-driving car, but outside of that, who or what is your favorite robot, real or fictional, that captivated your imagination throughout? I mean, I guess you kind of alluded to the Google self-driving car, the Firefly was a magical moment, but is there something else?

It wasn't the Firefly there, I think it was the Lexus, by the way. This was back then. But yeah, good question. Okay, my favorite fictional robot is WALL-E, and I love how amazingly expressive it is. I personally think a little bit about expressive motion, kinds of things you're saying: what can you do with this, and it's a head and it's a manipulator, and what does it all mean? I like to think about that stuff. I love Pixar, I love animation.

WALL-E has two big eyes, I think, or no?

Yeah, it has these cameras, and they move. So yeah, it goes and then it's super cute. The way it moves is just so expressive, the timing of that motion, what it's doing with its arms and what it's doing with these lenses is amazing. And so I've really liked that from the start. And then on top of that, sometimes I share this, it's a personal story I share with people when I teach about AI or whatnot: my husband proposed to me by building a WALL-E, and he actuated it. So it's seven degrees of freedom, including the lens thing. And it kind of came in, and he made it have like the belly box opening thing. So it just did that, and then it spewed out this box made out of Legos that opened slowly, and then, bam, yeah.

Yeah, it was quite, it set a bar.

That could be, like, the most impressive thing I've ever heard.

That was my special connection to WALL-E, long story short. I like WALL-E because I like animation and I like robots, and I like the fact that we still have this robot to this day.
How hard is that problem, do you think, of the expressivity of robots? Like with Boston Dynamics, I never talked to those folks about this particular element. I've talked to them a lot, but it seems to be almost an accidental side effect for them, that they weren't, I don't know if they're faking it, they weren't trying to, okay.

They do say that the gripper, it was not intended to be a face.

I don't know if that's an honest statement, but I think they're legitimate.

Probably, yes.

And so do we automatically just anthropomorphize anything we can see about a robot? So like the question is, how hard is it to create a WALL-E type robot that connects so deeply with us humans? What do you think?

It's really hard, right? So it depends on what setting. If you wanna do it in this very particular narrow setting, where it does only one thing and it's expressive, then you can get an animator, you know, you can have Pixar on call come in, design some trajectories. Anki had a robot called Cozmo where they put in some of these animations. That part is easy, right? The hard part is doing it not via these kind of handcrafted behaviors, but doing it generally, autonomously. Like, I want robots, and just to clarify, I used to work a lot on this, I don't work on it quite as much these days, but the notion of having robots that, you know, when they pick something up and put it in a place, they can do that with various forms of style. Or you can say, well, this robot is, you know, succeeding at this task and is confident, versus it's hesitant, versus, you know, maybe it's happy, or it's, you know, disappointed about something, some failure that it had.
I think that when robots move, they can communicate so much about internal states, or perceived internal states, that they have. And I think that's really useful, and an element that we'll want in the future, because I was reading this article about how kids are being rude to Alexa, because they can be rude to it and it doesn't really get angry, right? It doesn't reply in any way, it just says the same thing. So I think, at least for the correct development of children, it's important that these things kind of react differently. I also think, you know, you walk in your home and you have a personal robot, and if you're really pissed, presumably the robot should behave slightly differently than when you're super happy and excited.

But it's really hard, because, I don't know, the way I would think about it, and the way I thought about it when it came to expressing goals or intentions for robots, is: well, what's really happening is that instead of doing robotics where you have your state, and you have your action space, and you have your reward function that you're trying to optimize, now you kind of have to expand the notion of state to include this human internal state. What is the person actually perceiving? What do they think about the robot, something or other? And then you have to optimize in that system. And so that means that you have to understand how your motion, your actions, end up influencing the observer's perception of you. And it's very hard to write math about that.
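One hedged way to write down what she is describing, purely as a sketch: take a standard robotics objective and augment the state with the observer's belief, so that the reward can depend on what the observer comes to believe. The notation below (the belief $b_t$, the observation model $P(a^R_t \mid s_t, \theta)$) is mine, not from the conversation.

```latex
% A minimal formalization sketch (my notation, not from the conversation).
% Standard robotics: maximize reward over physical state s_t and robot action a_t:
%   max E[ sum_t R(s_t, a_t) ].
% Expanded problem: the state also includes the human observer's internal
% state, e.g. a belief b_t over some robot parameter theta (goal, style, emotion):
\[
\tilde{s}_t = (s_t,\, b_t), \qquad
b_{t+1}(\theta) \;\propto\; b_t(\theta)\, P\big(a^{R}_t \mid s_t, \theta\big),
\]
\[
\max_{a^{R}_{0:T}} \; \mathbb{E}\Big[ \sum_{t} R\big(s_t,\, b_t,\, a^{R}_t\big) \Big],
\]
% so the reward can now depend on what the observer believes, which is
% exactly why the robot's motion "communicates."
```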
Right. So when you start to think about incorporating the human into the state model, and I apologize for the philosophical question, but how complicated are human beings, do you think? Like, can they be reduced to a kind of, almost like an object that moves and maybe has some basic intents? Or do we have to model things like mood and general aggressiveness, and... I mean, all these kinds of human qualities, or like game-theoretic qualities? What's your sense? How complicated is... how hard is the problem of human-robot interaction?

Yeah, should we talk about what the problem of human-robot interaction is?

Yeah, what is human-robot interaction?

And then talk about how that, yeah.
So, and by the way, I'm going to talk about a very particular view of human-robot interaction, right? Which is not so much on the social side, or on the side of how do you have a good conversation with the robot, what should the robot's appearance be. It turns out that if you make robots taller versus shorter, this has an effect on how people act with them. So I'm not talking about that. I'm talking about this very kind of narrow thing, which is: you take a task that a robot can do in isolation, in a lab, or out there in the world but in isolation, and now you're asking what it means for the robot to be able to do this task for, presumably, what its actual end goal is, which is to help some person. That ends up changing the problem in two ways.

The first way it changes the problem is that the robot is no longer the single agent acting; you have humans who also take actions in that same space. Cars navigating around people, robots in an office navigating around the people in that office. If I send the robot over to the cafeteria to get me a coffee, then there are probably other people reaching for stuff in the same space. And so now you have your robot, and you're in charge of the actions that the robot is taking. Then you have these people who are also making decisions and taking actions in that same space. And even if, you know, the robot knows what it should do and all of that, just coexisting with these people, right, kind of getting the actions to gel well, to mesh well together, that's sort of problem number one.

And then there's problem number two, which goes back to this notion of: if I'm a programmer, I can specify some objective for the robot to go off and optimize, and specify the task. But if I put the robot in your home, presumably you might have your own opinions about, well, okay, I want my house clean, but how do I want it cleaned? And how should the robot move, how close to me should it come? And so I think those are the two differences: you're acting around people, and what you should be optimizing for should satisfy the preferences of that end user, not of the programmer who programmed you.
Yeah, and the preferences thing is tricky. So figuring out those preferences, being able to interactively adjust, to understand what the human is doing. So really it boils down to understanding humans, in order to interact with them and in order to please them. So why is this hard? Why is understanding humans hard?

So I think there are two tasks about understanding humans that in my mind are very, very similar, but not everyone agrees. There's the task of being able to just anticipate what people will do. We all know that cars need to do this, right? We all know that, well, if I navigate around some people, the robot has to get some notion of, okay, where is this person going to be? So that's the prediction side. And then there's what you were saying, satisfying the preferences, right? Adapting to the person's preferences, knowing what to optimize for, which is more this inference side: what does this person want? What is their intent? What are their preferences? And to me, those kind of go together, because I think that, at the very least, if you can look at human behavior and understand what it is that they want, then that's sort of the key enabler to being able to anticipate what they'll do in the future. Because I think that we're not arbitrary. We make the decisions that we make, we act in the way we do, because we're trying to achieve certain things. And so I think that's the relationship between them.
Now, how complicated do these models need to be in order to be able to understand what people want? We've gotten a long way in robotics with something called inverse reinforcement learning, which is the notion of: someone acts, demonstrates how they want the thing done.

What is inverse reinforcement learning? You just briefly said it.

Right. So it's the problem of: take human behavior and infer a reward function from it. So figure out what it is that that behavior is optimal with respect to. And it's a great way to think about learning human preferences, in the sense of: you have a car and the person can drive it, and then you can say, well, okay, I can actually learn what the person is optimizing for, I can learn their driving style. Or you can have people demonstrate how they want the house cleaned, and then you can say, okay, I'm getting the trade-offs that they're making, I'm getting the preferences that they want out of this.

And so we've been somewhat successful in robotics with this. And it's based on a very simple model of human behavior, a remarkably simple one, which is that human behavior is optimal with respect to whatever it is that people want, right? So you make that assumption, and now you can kind of invert through it. That's why it's called inverse, well, inverse optimal control really, but also inverse reinforcement learning. This is based on utility maximization in economics. Back in the forties, von Neumann and Morgenstern were like, okay, people are making choices by maximizing utility, go. And then in the late fifties, we had Luce and Shepard come in and say, people are a little bit noisy and approximate in that process. So they might choose something kind of stochastically, with probability proportional to how much utility something has. So there's a bit of noise in there. This has translated into robotics as something that we call Boltzmann rationality. So it's kind of an evolution of inverse reinforcement learning that accounts for human noise. And we've had some success with that too, for these tasks where it turns out people act noisily enough that you can't just do the vanilla version. You can account for noise and still infer what they seem to want based on this.
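As a rough sketch of what Boltzmann-rational inference can look like in code, here is a toy setting with a handful of candidate reward parameters and discrete actions; all the names, features, and numbers are invented for illustration, not taken from any particular system:

```python
import numpy as np

# Toy Boltzmann-rational inverse reward inference (illustrative sketch).
# Candidate reward parameters theta: how much the person values speed vs. safety.
thetas = np.array([[1.0, 0.0],   # cares only about speed
                   [0.5, 0.5],   # balanced
                   [0.0, 1.0]])  # cares only about safety

# Features of three available actions: [speed_gain, safety_gain].
action_features = np.array([[1.0, 0.1],
                            [0.5, 0.5],
                            [0.1, 1.0]])

beta = 3.0                          # rationality coefficient: higher means less noisy
prior = np.full(len(thetas), 1 / 3) # uniform prior over theta

def likelihood(action_idx, theta):
    """P(action | theta): Boltzmann choice, proportional to exp(beta * utility)."""
    utilities = action_features @ theta
    p = np.exp(beta * utilities)
    return (p / p.sum())[action_idx]

def update(belief, action_idx):
    """Bayes rule over candidate reward parameters given one observed choice."""
    post = np.array([belief[i] * likelihood(action_idx, t)
                     for i, t in enumerate(thetas)])
    return post / post.sum()

# Observe the person repeatedly picking the fast-but-risky action.
belief = prior
for observed in [0, 0, 1, 0]:
    belief = update(belief, observed)
print(belief)  # probability mass concentrates on the speed-loving theta
```

The point of the noise model is visible in the loop: one "balanced" choice among mostly fast choices shifts the posterior a little but does not break the inference, which is what vanilla (noise-free) inverse optimal control cannot tolerate.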
Now we're hitting tasks where that's not enough.

What are examples of such tasks?

So imagine you're trying to control some robot that's fairly complicated. You're trying to control a robot arm, because maybe you're a patient with a motor impairment and you have this wheelchair-mounted arm and you're trying to control it around. Or one task that we've looked at with Sergey, and our students did, is a lunar lander. I don't know if you know this Atari game, it's called Lunar Lander. People really suck at landing the thing; mostly they just crash it left and right. Okay, so this is the kind of task where we imagine you're trying to provide some assistance to a person operating such a robot, where you want the autonomy to kick in, figure out what it is that you're trying to do, and help you do it. It's really hard to do that for, say, Lunar Lander, because people are all over the place, and so they seem much more noisy than rational. That's an example of a task where these models are kind of failing us. And it's not surprising, because we're talking about the forties for utility, the late fifties for the noisy version. Then the seventies came, and behavioral economics started being a thing, where people were like, no, no, no, people are not rational. People are messy and emotional and irrational, and have all sorts of heuristics that might be domain specific, and they're just a mess. So what does my robot do to understand people then?
That's why it's complicated. You know, for the most part we get away with pretty simple models, until we don't. And then the question is, what do you do then? And I've had days when I wanted to, you know, pack my bags and go home and switch jobs, because it just feels really daunting to make sense of human behavior enough that you can reliably understand what people want. Especially as, you know, robot capabilities continue to get developed, you'll get these systems that are more and more capable of all sorts of things, and then you really want to make sure that you're telling them the right thing to do. What is that thing? Well, read it in human behavior.

So if I just sat here quietly and tried to understand something about you by listening to you talk, it would be harder than if I got to say something, and ask you, and interact, and control.
Can the robot help its understanding of the human by influencing the behavior, by actually acting?

So one of the things that's been exciting to me lately is this notion that when you try to think of the robotics problem as, okay, I have a robot and it needs to optimize for whatever it is that a person wants it to optimize, as opposed to maybe what a programmer said, that problem we think of as a human-robot collaboration problem, in which both agents get to act, and in which the robot knows less than the human, because the human actually has access, at least implicitly, to what it is that they want. They can't write it down, but they can talk about it, they can give all sorts of signals, they can demonstrate. But the robot doesn't need to sit there and passively observe human behavior and try to make sense of it. The robot can act too. And so there are these information-gathering actions that the robot can take to solicit responses that are actually informative.

So for instance, and this is not for the purpose of assisting people, but back to coordinating with people in cars and all of that.
One thing that Dorsa did was, so we were looking at cars being able to navigate around people, and you might not know exactly the driving style of a particular individual that's next to you, but you want to change lanes in front of them.

Navigating around other humans inside cars?

Yeah, good clarification question. So you have an autonomous car, and it's trying to navigate the road around human-driven vehicles. Similar ideas apply to pedestrians as well, but let's just take human-driven vehicles. So now you're trying to change lanes. Well, you could be trying to infer the driving style of this person next to you. You'd like to know, in particular, if they're sort of aggressive or defensive, if they're going to let you go in or not. And it's very difficult to just... if you want to hedge your bets and say, ah, maybe they're actually pretty aggressive, I shouldn't try this, you kind of end up driving next to them, and driving next to them, right? And then you don't know, because you're not actually getting the observations that you need. The way someone drives when they're next to you and they just need to go straight is kind of the same regardless of whether they're aggressive or defensive. And so you need to enable the robot to reason about how it might actually be able to gather information by changing the actions that it's taking. And then the robot comes up with these cool things, where it kind of nudges towards you and then sees if you're going to slow down or not. And if you slow down, it sort of updates its model of you and says, oh, okay, you're more on the defensive side, so now I can actually go in.
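A minimal sketch of that active information-gathering idea, assuming a two-hypothesis driver model (aggressive vs. defensive) and two robot actions; everything here, including the response probabilities, is made up for illustration:

```python
import numpy as np

# Active information gathering about a neighboring driver (illustrative sketch).
STYLES = ["aggressive", "defensive"]
belief = np.array([0.5, 0.5])  # prior over the driver's style

# P(driver slows down | robot action, driver style): assumed numbers.
# Driving straight reveals almost nothing; nudging toward the lane is diagnostic.
P_SLOW = {
    "drive_straight": np.array([0.10, 0.15]),
    "nudge_in":       np.array([0.10, 0.90]),
}

def posterior(belief, action, slowed):
    """Bayes update on the driver's style after observing their response."""
    like = P_SLOW[action] if slowed else 1.0 - P_SLOW[action]
    post = belief * like
    return post / post.sum()

def expected_info_gain(belief, action):
    """Expected reduction in entropy of the belief from taking this action."""
    def H(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()
    p_slow = (belief * P_SLOW[action]).sum()
    return H(belief) - (p_slow * H(posterior(belief, action, True)) +
                        (1 - p_slow) * H(posterior(belief, action, False)))

for a in P_SLOW:
    print(a, expected_info_gain(belief, a))
# "nudge_in" wins: nudging provokes a response that separates the hypotheses,
# which is why the robot inches toward the lane before committing.
```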
That's a fascinating dance. That's so cool, that you could use your own actions to gather information. That feels like a totally open, exciting new world of robotics. I mean, how many people are even thinking about that kind of thing?

A handful of us, I'd say.

It's rare, because it's actually leveraging humans. I mean, most roboticists, and I've talked to a lot of colleagues and so on, are kind of, being honest, kind of afraid of humans.

Because they're messy and complicated, right? Going back to what we were talking about earlier, right now we're kind of in this dilemma of: okay, there are tasks for which we can just assume people are approximately rational, and we can figure out what they want, we can figure out their goals, we can figure out their driving styles, whatever. And there are these tasks for which we can't. So what do we do, right? Do we pack our bags and go home? On this one I've had a little bit of hope recently, and I'm kind of doubting myself, because what do I know that, you know, fifty years of behavioral economics hasn't figured out? But maybe it's not really in contradiction with the way that field is headed.
Basically, one thing that we've been thinking about is, instead of kind of giving up and saying people are too crazy and irrational for us to make sense of them, maybe we can give them a bit of the benefit of the doubt. Maybe we can think of them as actually being relatively rational, but just under different assumptions about the world, about how the world works. When we think about rationality, the implicit assumption is, oh, they're rational under all the same assumptions and constraints as the robot, right? This is the state of the world, that's what they know. This is the transition function, that's what they know. This is the horizon, that's what they know. But maybe the difference, the reason they can seem a little messy and hectic, especially to robots, is that perhaps they just make different assumptions or have different beliefs.
Yeah, I mean, that's another fascinating idea: that our kind of anecdotal desire to say that humans are irrational, perhaps grounded in behavioral economics, is that we just don't understand the constraints and the rewards under which they operate. And so our goal shouldn't be to throw our hands up and say they're irrational; it's to say, let's try to understand what the constraints are, what it is that they must be assuming that makes this behavior make sense.

Good life lesson, right?

That's true, even outside of robotics; that's just good for communicating with humans. That's just a good assumption, sort of empathy, right?

Maybe there's something you're missing. And it especially happens to robots, because they're kind of dumb and they don't know things, and oftentimes people are sort of supra-rational, in that they actually know a lot of things that robots don't. Sometimes, like with the lunar lander, the robot, you know, knows much more. So it turns out that if you try to say, look, maybe people are operating this thing assuming a much more simplified physics model, because they don't get the complexity of this kind of craft, or the robot arm with seven degrees of freedom with these inertias and whatever. So maybe they have this intuitive physics model, and, you know, this notion of intuitive physics is something that's studied in cognitive science, there's Josh Tenenbaum and Tom Griffiths' work on this stuff.
And what we found is that you can actually try to figure out what physics model best explains human actions, and then you can use that to correct what it is that they're commanding the craft to do. So they might, you know, be sending the craft somewhere, but instead of executing that action, you can take a step back and say: if the world worked according to their intuitive physics model, where do they think the craft is going, where are they trying to send it? And then you can use the real physics, right, the inverse of that, to figure out what you should do so that you do that, instead of where they were actually sending you in the real world. And I kid you not, it works. People land the damn thing, you know, in between the two flags and all that. So it's not conclusive in any way, but I'd say it's evidence that, yeah, maybe we're kind of underestimating humans in some ways when we're giving up and saying, yeah, they're just crazy noisy.
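A toy sketch of that correction loop, assuming the human plans with an overly simple first-order model while the craft really has inertia (a second-order system); the models and gains here are invented for illustration:

```python
import numpy as np

# Intuitive-physics correction (illustrative sketch).
# Real dynamics: a 1D double integrator. State = [position, velocity].
def real_step(state, u, dt=0.1):
    pos, vel = state
    return np.array([pos + vel * dt, vel + u * dt])

# Human's assumed (intuitive) dynamics: the command moves the craft directly,
# with no inertia, i.e. a first-order model.
def intuitive_step(pos, u, dt=0.1):
    return pos + u * dt

def assist(state, human_u, dt=0.1, gain=4.0):
    """Infer where the human *thinks* the craft will go under their simple
    model, then pick the real control that tracks that intended position."""
    intended_pos = intuitive_step(state[0], human_u, dt)
    pos, vel = state
    # Simple PD tracking controller for the real dynamics (assumed gains).
    return gain * (intended_pos - pos) - 2.0 * np.sqrt(gain) * vel

# The human keeps commanding "go right" as if velocity didn't accumulate.
state = np.array([0.0, 0.0])
for _ in range(50):
    u = assist(state, human_u=1.0)
    state = real_step(state, u)
print(state)  # position creeps right; a naive pass-through of the human's
              # command would instead build up runaway velocity
```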
So then you try to explicitly model the kind of worldview that they have.

That they have, that's right. And it's not too... I mean, there are things in behavioral economics too that, for instance, have touched upon the planning horizon. So there's this idea that there's bounded rationality, essentially, and the idea that, well, maybe we work under computational constraints. And I think our view recently has been: take the Bellman update in AI, and just break it in all sorts of ways. State? No, no, no, the person doesn't get to see the real state; maybe they're estimating it somehow. Transition function? No, no, no. Even the actual reward evaluation: maybe they're still learning about what it is that they want. Like, you know, when you watch Netflix and you have all the things and then you have to pick something, imagine that the AI system interpreted that choice as: this is the thing you prefer to see. How is it going to know? You're still trying to figure out what you like, what you don't like, et cetera. So I think it's important to also account for that. So it's not irrationality, because they're doing the right thing under the things that they know.
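To make the "break the Bellman update" idea concrete, here is one hedged way to write it; the tilde quantities (the human's believed state, transitions, reward, and discount) are my notation for the substitutions she lists:

```latex
% Standard Bellman update, which implicitly assumes the person knows the
% true state s, transition T, and reward R:
\[
Q(s,a) = R(s,a) + \gamma \sum_{s'} T(s' \mid s, a)\, \max_{a'} Q(s', a').
\]
% "Breaking" it, as described: the person plans with an estimated state
% \tilde{s}, a believed transition model \tilde{T}, a reward \tilde{R} they
% are still learning, and possibly a shorter horizon (smaller effective gamma):
\[
\tilde{Q}(\tilde{s},a) = \tilde{R}(\tilde{s},a)
  + \tilde{\gamma} \sum_{\tilde{s}'} \tilde{T}(\tilde{s}' \mid \tilde{s}, a)\,
    \max_{a'} \tilde{Q}(\tilde{s}', a'),
\]
% and the robot's job becomes inferring these tilde quantities from behavior.
```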
Yeah, that's brilliant. You mentioned recommender systems. And we were talking about human-robot interaction, so what kind of problem spaces are you thinking about? Is it robots, like wheeled robots, autonomous vehicles? Is it object manipulation? Like, when you think about human-robot interaction in your mind, and maybe, I'm sure you can speak for the entire community of human-robot interaction, what are the problems of interest here? And, you know, I kind of think of open-domain dialogue as human-robot interaction too, and that happens not in the physical space, it could just happen in the virtual space. So where are the boundaries of this field for you, when you're thinking about the things we've been talking about?

Yeah, so I try to find kind of the underlying... I don't know what to even call them. I might call what I do working on the foundations of algorithmic human-robot interaction, and trying to make contributions there. And it's important to me that whatever we do is actually somewhat domain agnostic, when it comes to: is it about, you know, autonomous cars, or is it about quadrotors, or is it about... do the same underlying principles apply? Of course, when you're trying to get a particular domain to work, you usually have to do some extra work to adapt to that particular domain. But these things that we were talking about, around, well, how do you model humans? It turns out that a lot of systems could benefit from a better understanding of how human behavior relates to what people want, and from being able to predict human behavior: physical robots of all sorts, and beyond that. And so I used to do manipulation, I used to be, you know, picking up stuff, and then I was picking up stuff with people around. And now it's very broad when it comes to the application level, but in a sense very focused on: okay, how does the problem need to change, how do the algorithms need to change, when we're not doing a robot by itself, you know, emptying the dishwasher, but we're stepping outside of that?
A thought that popped into my head just now: on the game-theoretic side, I think you said this really interesting idea of using actions to gain more information. But if we think of sort of game theory, the humans that are interacting with you... with you, the robot...

Wow, I'm taking on the identity of the robot.

Yeah, I do that all the time. They also have a world model of you, and you can manipulate that. I mean, if we look at autonomous vehicles, people have a certain viewpoint; you said with the kids, people see Alexa in a certain way. Is there some value in trying to also optimize how people see you, as a robot? Or is that a little too far away from the specifics of what we can solve right now?
So, well, both, right? It's really interesting. And we've seen a little bit of progress on this problem, on pieces of this problem. Again, it kind of comes down to how complicated the human model needs to be. But in one piece of work that we were looking at, we just said, okay, there are these parameters that are internal to the robot: what the robot is about to do, or maybe what objective, what driving style the robot has, or something like that. And what we're going to do is set up a system where part of the state is the person's belief over those parameters. And now, when the robot acts, the person gets new evidence about this robot internal state, and so they're updating their mental model of the robot. If they see a car that sort of cuts someone off, they're like, oh, that's an aggressive car. If they see a robot head towards a particular door, they're like, oh yeah, the robot's trying to get to that door. So this thing that we have to do with humans, to try and understand their goals and intentions, humans are inevitably going to do that to robots.

And then that raises this interesting question that you asked, which is: can we do something about that? This is going to happen inevitably, but we can be more confusing or less confusing to people. And it turns out you can optimize for being more informative and less confusing, if you have an understanding of how your actions are being interpreted by the human, and how they're using these actions to update their belief. And honestly, all we did is just Bayes' rule. Basically, okay, the person has a belief; they see an action; they make some assumptions about how the robot generates its actions, presumably as being rational, because robots are rational and it's reasonable to assume that about them; and then they incorporate that new piece of evidence, in the Bayesian sense, into their belief and obtain a posterior. And now the robot is trying to figure out what actions to take such that it steers the person's belief to put as much probability mass as possible on the correct parameters. So that's kind of a mathematical formalization of that.
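A minimal sketch of that belief-steering loop, assuming a toy observer who Bayes-updates over two candidate robot goals while modeling the robot as noisily rational; the goals, grid, and rationality coefficient are all invented for illustration:

```python
import numpy as np

# Steering an observer's belief about the robot's goal (illustrative sketch).
GOALS = {"door_A": np.array([0.0, 5.0]),
         "door_B": np.array([4.0, 5.0])}
TRUE_GOAL = "door_A"
ACTIONS = [np.array(a) for a in [(0, 1), (1, 1), (-1, 1), (1, 0), (-1, 0)]]

def action_likelihood(pos, action, goal, beta=2.0):
    """Observer's model: a rational robot mostly moves toward its goal,
    P(action | goal) proportional to exp(-beta * distance left after acting)."""
    scores = np.array([np.exp(-beta * np.linalg.norm(pos + a - goal))
                       for a in ACTIONS])
    idx = next(i for i, a in enumerate(ACTIONS) if np.array_equal(a, action))
    return scores[idx] / scores.sum()

def observer_update(belief, pos, action):
    """Bayes rule: the observer updates their belief over the robot's goal."""
    post = np.array([belief[i] * action_likelihood(pos, action, g)
                     for i, g in enumerate(GOALS.values())])
    return post / post.sum()

# The robot picks, at each step, the action that puts the most posterior
# mass on its true goal: the most informative, least confusing motion.
pos, belief = np.array([2.0, 0.0]), np.array([0.5, 0.5])
true_idx = list(GOALS).index(TRUE_GOAL)
for _ in range(4):
    best = max(ACTIONS, key=lambda a: observer_update(belief, pos, a)[true_idx])
    belief = observer_update(belief, pos, best)
    pos = pos + best
    print(pos, belief)  # belief shifts toward door_A as the robot
                        # exaggerates its motion toward the left door
```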
But my worry, and I don't know if you want to go there with me, but I talk about this quite a bit: the kids talking to Alexa disrespectfully worries me. I worry in general about human nature. Like I said, I grew up in the Soviet Union, World War II, I'm a Jew too, so with the Holocaust and everything. I just worry about how we humans sometimes treat the other, the group that we call the other, whatever it is. Through human history, the group that's the other has changed faces, but it seems like the robot will be the other. And one thing is, it feels to me that robots don't get no respect. They get shoved around. And, one, at the shallow level, for a better experience, it seems that robots need to talk back a little bit. My intuition says, I mean, most companies, from sort of Roomba to autonomous vehicle companies, might not be so happy with the idea that a robot has a little bit of an attitude. But it feels to me that that's necessary to create a compelling experience. Like, we humans don't seem to respect anything that doesn't give us some attitude, or like a mix of mystery and attitude and anger, that threatens us subtly, maybe passive-aggressively. It seems like we humans, yeah, need that. Do you have thoughts on this?
All right, I'll give you two thoughts on this. One is: we respond to, you know, someone being assertive, but we also respond to someone being vulnerable. So my first thought is that robots get shoved around and bullied a lot because they're sort of, you know, tempting, and they're sort of showing off, or they appear to be showing off. And so I think, going back to these things we were talking about in the beginning, making robots a little more expressive, a little bit more like, eh, that wasn't cool to do, and now I'm bummed, right? I think that can actually help, because people can't help but anthropomorphize and respond to that.

Even though the emotion being communicated is not in any way a real thing, and people know that it's not a real thing, because they know it's just a machine.

We're still interpreting. You know, there's this famous psychology experiment with little triangles and kind of dots on a screen, where a triangle is chasing the square, and you get really angry at the darn triangle, because why is it not leaving the square alone? So yeah, we can't help it. So that was the first thought.

The vulnerability, that's really interesting. I think of pushing back, being assertive, as the only mechanism of forming a connection, of getting respect, but perhaps vulnerability, perhaps there are other mechanisms that are less threatening.
Well, I think, well, a little bit, yes. But then this other thing that we can think about, and it goes back to what you were saying, is that interaction is really game-theoretic, right? The moment you're taking actions in a space, and humans are taking actions in that same space, you have your own objective, which is, you know, you're a car, you need to get your passenger to the destination. And then the human nearby has their own objective, which somewhat overlaps with yours, but not entirely. You're not interested in getting into an accident with each other, but you have different destinations, and you want to get home faster, and they want to get home faster. And that's a general-sum game at that point. And I think treating it as such is kind of a way we can step outside of this mode where you try to anticipate what people do and you don't realize you have any influence over it, while still protecting yourself, because you're understanding that people also understand that they can influence you. And this kind of back and forth is this negotiation, which is really talking about different equilibria.
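For concreteness, here is a hedged way to write the general-sum setup she describes, with notation that is mine: each agent maximizes its own reward over the joint behavior, sharing the collision term but not the destination terms.

```latex
% General-sum interaction sketch (my notation). Robot R and human H choose
% actions u^R, u^H in the same space; each optimizes its own reward, and the
% human's choice depends on what the robot does:
\[
u^{R*} = \arg\max_{u^R} \; R_R\big(x,\, u^R,\, u^{H*}(u^R)\big), \qquad
u^{H*}(u^R) = \arg\max_{u^H} \; R_H\big(x,\, u^H,\, u^R\big),
\]
% with partially overlapping objectives, e.g.
\[
R_i = -\underbrace{c_{\text{collision}}(u^R, u^H)}_{\text{shared term}}
      + \underbrace{\text{progress}_i(u^i)}_{\text{agent-specific term}},
\qquad i \in \{R, H\}.
\]
% The coupling u^{H*}(u^R) is the "they react to what you do" part;
% different equilibria of this game correspond to different negotiations.
```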
The very basic way to solve coordination is to just make predictions about what people will do, and then stay out of their way. And that's hard for the reasons we talked about: you have to understand people's intentions, implicitly, explicitly, who knows, but somehow you have to get enough of an understanding to be able to anticipate what happens next. So that's challenging. But then it's further challenged by the fact that people change what they do based on what you do, because they don't plan in isolation either, right? So when you see cars trying to merge on a highway and not succeeding, one of the reasons this can happen is because they look at traffic that keeps coming, they predict what these people are planning on doing, which is to just keep going, and then they stay out of the way, because there's no feasible plan, right? Any plan would actually intersect with one of these other people. So that's bad, so you get stuck there. Now, if you start thinking about it as, no, no, no, actually these people change what they do depending on what the car does. Like, if the car actually tries to inch itself forward, they might actually slow down and let the car in. And taking advantage of that, well, that's kind of the next level. We call this the underactuated system idea; it's kind of underactuated system robotics, where you influence these other degrees of freedom, but you don't get to decide what they do.

I've somewhere seen you mention the human element in this picture as underactuated. So, you understand, underactuated robotics means you can't fully control the system, you can't go in arbitrary directions in the configuration space.

Under your control.

Yeah, it's a very simple form of underactuation, where basically there are literally these degrees of freedom that you can control, and these degrees of freedom that you can't, but you influence them. And I think that's the important part: they don't do whatever regardless of what you do; what you do influences what they end up doing.
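One hedged way to write that underactuation analogy (notation mine, a sketch): the joint state stacks robot and human degrees of freedom, but the control input enters only the robot's block, while the human block is influenced only through its response model.

```latex
% Underactuated view of interaction (my notation, a sketch).
% Joint state stacks robot DOFs x^R and human DOFs x^H; the robot's input
% u^R directly drives only its own block:
\[
\begin{bmatrix} x^{R}_{t+1} \\ x^{H}_{t+1} \end{bmatrix}
=
\begin{bmatrix}
f_R\big(x^{R}_t,\, u^{R}_t\big) \\[2pt]
f_H\big(x^{H}_t,\, x^{R}_t,\, u^{R}_t\big)
\end{bmatrix}.
\]
% There is no u^H available to the robot: the human block evolves by its own
% policy f_H, which the robot can only influence through its actions,
% exactly "degrees of freedom you don't control but do influence."
```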
I just also like the poetry of calling human-robot interaction an underactuated robotics problem. And you also mentioned sort of nudging. It seems that, I don't know, I think about this a lot in the case of pedestrians. I've collected hundreds of hours of videos. I like to just watch pedestrians. And it seems that...

It's a funny hobby.

Cause I learn a lot. I learn a lot about myself, about our human behavior, from watching pedestrians, watching people in their environment. Basically, crossing the street is like putting your life on the line. I don't know, tens of millions of times in America every day, people are just playing this weird game of chicken when they cross the street, especially when there's some ambiguity about the right of way, that has to do either with the rules of the road or with the general personality of the intersection, based on the time of day and so on. And this nudging idea... it seems that people don't even nudge, they just aggressively make a decision. Somebody, there's a runner that gave me this advice. I sometimes run in the street, not in the street, on the sidewalk, and he said that if you don't make eye contact with people when you're running, they will all move out of your way.

It's called civil inattention.

Civil inattention, that's a thing. Oh wow, I need to look this up, but it works. My sense was, if you communicate, like, confidence in your actions, that you're unlikely to deviate from the action that you're following, that's a really powerful signal to others that they need to plan around your actions. As opposed to nudging, where you're sort of hesitant, and the hesitation might communicate that you're still in the dance, in the game, that they can influence with their own actions.
I've recently had a conversation with Jim Keller, who's this sort of legendary chip architect, but he also led the Autopilot team for a while. And his intuition is that driving is fundamentally still, like, a ballistics problem. Like, you can ignore the human element, that it's just not hitting things, and you can kind of learn the right dynamics required to do the merger and all those kinds of things. And then my sense is, and I don't know if I can provide sort of definitive proof of this, but my sense is it's, like, an order of magnitude more difficult when humans are involved. Like, it's not simply an object collision avoidance problem. Where does your intuition... of course, nobody knows the right answer here, but where does your intuition fall on the difficulty, the fundamental difficulty of the driving problem when humans are involved?
Yeah, good question. I have many opinions on this. Imagine downtown San Francisco.

Yeah, it's crazy, busy, everything.

Okay, now take all the humans out. No pedestrians, no human-driven vehicles, no cyclists, no people on little electric scooters zipping around, nothing. I think we're done. I think driving at that point is done. There's nothing really that still needs to be solved about that.

Well, let's pause there. I think I agree with you, and I think a lot of people who hear this will agree with that, but we need to sort of internalize that idea. So what's the problem there? Cause we might not quite yet be done with that. Cause a lot of people kind of focus on the perception problem. A lot of people kind of map autonomous driving onto how close we are to solving perception, being able to detect, you know, the drivable area, the objects in the scene. Do you see it that way? How hard is that problem? So your intuition there, behind your statement, was: we might not have solved it yet, but we're close to solving basically the perception problem.

I think the perception problem, I mean, and by the way, a bunch of years ago this would not have been true, and a lot of issues in the space were coming from the fact that, oh, we don't really, you know, we don't know what's where. But I think it's fairly safe to say that at this point, although you could always improve on things and all of that, you can drive through downtown San Francisco if there are no people around. There are no real perception issues standing in your way there. I think perception is hard, but, yeah, we've made a lot of progress on perception, and I don't mean to undermine the difficulty of the problem.
I think everything about robotics is really difficult, of course. I think that, you know, the planning problem, the control problem, are all very difficult. But I think what makes it really kind of, yeah...

It might be, I mean, you know, and I picked downtown San Francisco, it's adapting to, well, now it's snowing, now it's no longer snowing, now it's slippery in this way... the dynamics part, I could imagine being still somewhat challenging, but...

No, the thing that I think worries us, and where our intuition is not good, is the perception problem at the edge cases. Sort of downtown San Francisco, the nice thing, it's not actually, it may not be a good example, because...

Because you know what you're getting from...

Well, there's like crazy construction zones and all of that.

Yeah, but the thing is, you're traveling at slow speeds, so it doesn't feel dangerous. To me, what feels dangerous is highway speeds, when everything is, to us humans, super clear.

Yeah, and I'm assuming LiDAR here, by the way. I think it's kind of irresponsible to not use LiDAR. That's just my personal opinion. That's, I mean, depending on your use case, but I think, like, you know, if you have the opportunity to use LiDAR... in a lot of cases, you might not.

Good, your intuition makes more sense now. So you don't think vision alone...

I really just don't know enough to say. Vision alone, what, you know, what's... there's a lot of... how many cameras do you have? How are you using them? I don't know, there are details, there are all sorts of details. I imagine there's stuff that's really hard to actually see, you know, how do you deal with glare, exactly what you were saying, stuff that people would see that you don't. More of my intuition comes from systems that can actually use LiDAR as well.

Yeah, and until we know for sure, it makes sense to be using LiDAR. That's kind of the safety focus. But then I also sympathize with the Elon Musk statement that LiDAR is a crutch. It's a fun notion to think that the things that work today are a crutch for the invention of the things that will work tomorrow, right? It's kind of true in the sense that, you know, if we want to stick to the comfort zone, you see this in academic and research settings all the time, the things that work force you to not explore outside, think outside the box. I mean, that happens all the time. The problem is, in safety-critical systems, you kind of want to stick with the things that work. So it's an interesting and difficult trade-off in the case of real-world, safety-critical robotic systems. But so, your intuition is, just to clarify: how hard is this human element? Like, how hard is driving when this human element is involved? Are we years, decades away from solving it? But perhaps the year isn't actually the thing I'm asking; it doesn't matter what the timeline is. How many breakthroughs are we away from in solving the human-robot interaction problem, to get this right?
link |
I think it, in a sense, it really depends.
link |
I think that, you know, we were talking about how,
link |
well, look, it's really hard
link |
because anticipate what people do is hard.
link |
And on top of that, playing the game is hard.
link |
But I think we sort of have the fundamental,
link |
some of the fundamental understanding for that.
link |
And then you already see that these systems
link |
are being deployed in the real world,
link |
you know, even driverless.
link |
Like there's, I think now a few companies
link |
that don't have a driver in the car in some small areas.
link |
I got a chance to, I went to Phoenix and I,
link |
I shot a video with Waymo and I needed to get
link |
People have been giving me slack,
link |
but there's incredible engineering work being done there.
link |
And it's one of those other seminal moments
link |
for me in my life to be able to, it sounds silly,
link |
but to be able to ride without a driver, sorry,
link |
without a driver in the seat.
link |
I mean, that was incredible robotics.
link |
I was driven by a robot without being able to take over,
link |
without being able to take the steering wheel.
link |
That's a magical, that's a magical moment.
link |
So in that regard, in those domains,
link |
at least for like Waymo, they're solving that human element.
link |
I mean, they're going, I mean, it felt fast
link |
because you're like freaking out at first.
link |
That was, this is my first experience,
link |
but it's going like the speed limit, right?
link |
30, 40, whatever it is.
link |
And there's humans and it deals with them quite well.
link |
It detects them, it negotiates the intersections,
link |
the left turns and all of that.
link |
So at least in those domains, it's solving them.
link |
The open question for me is like, how quickly can we expand?
link |
You know, that's the, you know,
link |
outside of the weather conditions,
link |
all of those kinds of things,
link |
how quickly can we expand to like cities like San Francisco?
link |
Yeah, and I wouldn't say that it's just, you know,
link |
now it's just pure engineering and it's probably the,
link |
I mean, and by the way,
link |
I'm speaking kind of generally here, hypothesizing,
link |
but I think that there are successes
link |
and yet no one is everywhere out there.
link |
So that seems to suggest that things can be expanded
link |
and can be scaled and we know how to do a lot of things,
link |
but there's still probably, you know,
link |
new algorithms or modified algorithms
link |
that you still need to put in there
link |
as you learn more and more about new challenges
link |
that you get faced with.
link |
How much of this problem do you think can be learned
link |
through end to end?
link |
Is it the success of machine learning
link |
and reinforcement learning?
link |
How much of it can be learned from sort of data
link |
from scratch and how much,
link |
whereas most of the successful autonomous vehicle systems
link |
have a lot of heuristics and rule based stuff on top,
link |
like human expertise injected, forced into the system?
link |
What's your sense?
link |
How much, what will be the role of learning
link |
in the near term and long term?
link |
I think on the one hand that learning is inevitable here,
link |
I think on the other hand that when people characterize
link |
the problem as it's a bunch of rules
link |
that some people wrote down,
link |
versus it's an end to end RL system or imitation learning,
link |
then maybe there's kind of something missing
link |
from that dichotomy.
link |
So for instance, I think a very, very useful tool
link |
in this sort of problem,
link |
both in how to generate the car's behavior
link |
and robots in general and how to model human beings
link |
is actually planning, search optimization, right?
link |
So robotics is a sequential decision making problem.
link |
And when a robot can figure out on its own
link |
how to achieve its goal without hitting stuff
link |
and all that stuff, right?
link |
All the good stuff for motion planning 101,
link |
I think of that as very much AI,
link |
not this is some rule or something.
link |
There's nothing rule based around that, right?
link |
It's just that you're searching through a space,
link |
or optimizing through a space,
link |
and figuring out what seems to be the right thing to do.
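To make that concrete, here is a minimal sketch of planning-as-search in the spirit of what she describes: a toy A* planner that finds its own collision-free path by optimizing a cost, not by following hand-written behavior rules. The grid, obstacle set, and costs are invented purely for illustration, not any particular robot's planner.

```python
import heapq

def astar(start, goal, obstacles, size=10):
    """Search over grid states; the only 'rule' is the cost being optimized."""
    def h(p):  # Manhattan distance: an admissible heuristic to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f-value, cost so far, node, path)
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in obstacles:
                heapq.heappush(frontier, (cost + 1 + h(nxt), cost + 1, nxt, path + [nxt]))
    return None  # no collision-free path exists

print(astar((0, 0), (9, 9), obstacles={(4, 4), (4, 5), (5, 4)}))
```

The planner routes around the blocked cells on its own; nothing in the code says how to behave, only what counts as costly.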
link |
And I think it's hard to just do that
link |
because you need to learn models of the world.
link |
And I think it's hard to just do the learning part
link |
where you don't bother with any of that,
link |
because then you're saying, well, I could do imitation,
link |
but then when I go off distribution, I'm really screwed.
link |
Or you can say, I can do reinforcement learning,
link |
which adds a lot of robustness,
link |
but then you have to do either reinforcement learning
link |
in the real world, which sounds a little challenging
link |
or that trial and error, you know,
link |
or you have to do reinforcement learning in simulation.
link |
And then that means, well, guess what?
link |
You need to model things, at least to model people,
link |
model the world enough that whatever policy you get of that
link |
is actually fine to roll out in the world
link |
and do some additional learning there.
link |
So. Do you think simulation, by the way, just a quick tangent
link |
has a role in the human robot interaction space?
link |
Like, is it useful?
link |
It seems like humans, everything we've been talking about
link |
are difficult to model and simulate.
link |
Do you think simulation has a role in this space?
link |
I think so because you can take models
link |
and train with them ahead of time, for instance.
link |
But the models, sorry to interrupt,
link |
the models are sort of human constructed or learned?
link |
I think they have to be a combination
link |
because if you get some human data and then you say,
link |
this is how, this is gonna be my model of the person.
link |
What, for simulation and training
link |
or for just deployment time?
link |
And that's what I'm planning with
link |
as my model of how people work.
link |
Regardless, if you take some data
link |
and you don't assume anything else and you just say,
link |
okay, this is some data that I've collected.
link |
Let me fit a policy to how people work based on that.
link |
What tends to happen is you collected some data
link |
in some distribution, and then now your robot
link |
sort of computes a best response to that, right?
link |
It's sort of like, what should I do
link |
if this is how people work?
link |
And it easily goes off of distribution
link |
where that model that you've built of the human
link |
completely sucks because out of distribution,
link |
you have no idea, right?
link |
If you think of all the possible policies
link |
and then you take only the ones that are consistent
link |
with the human data that you've observed,
link |
that still leaves a lot of room; a lot of things could happen
link |
outside of that distribution where you're confident
link |
that you know what's going on.
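A toy sketch of that failure mode, under invented numbers: fit a "policy" to observed human data, then query it far outside the data. The nearest-neighbor policy and the distance threshold here are illustrative assumptions, not anyone's actual system.

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are observed human states (1-D) and the actions they took.
states = rng.uniform(0.0, 1.0, size=200)
actions = np.sin(2 * np.pi * states)  # some consistent human behavior

def nearest_neighbor_policy(query):
    """Predict the action of the closest observed human state."""
    i = np.argmin(np.abs(states - query))
    return actions[i], abs(states[i] - query)

for q in [0.5, 3.0]:  # in-distribution vs. far outside the data
    a, dist = nearest_neighbor_policy(q)
    trust = "ok" if dist < 0.1 else "OFF-DISTRIBUTION: the model answers, but it's a guess"
    print(f"state={q}: predicted action={a:.2f} ({trust})")
```

In the second query the model still returns a confident-looking number, which is exactly the problem she is pointing at.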
link |
By the way, that's, I mean, I've gotten used
link |
to this terminology of out of distribution,
link |
but it's such a machine learning terminology
link |
because it kind of assumes,
link |
so distribution is referring to the data
link |
The set of states that you've encountered
link |
so far, at training time.
link |
At training time, yeah.
link |
But it kind of also implies that there's a nice
link |
like statistical model that represents that data.
link |
So out of distribution feels like, I don't know,
link |
it raises to me philosophical questions
link |
of how we humans reason out of distribution,
link |
reason about things that are completely,
link |
we haven't seen before.
link |
And so, and what we're talking about here is
link |
how do we reason about what other people do
link |
in situations where we haven't seen them?
link |
And somehow we just magically navigate that.
link |
I can anticipate what will happen in situations
link |
that are even novel in many ways.
link |
And I have a pretty good intuition for,
link |
I don't always get it right, but you know,
link |
and I might be a little uncertain and so on.
link |
But I think it's this that if you just rely on data,
link |
you know, there's just too many possibilities,
link |
there's too many policies out there that fit the data.
link |
And by the way, it's not just state,
link |
it's really kind of history of state,
link |
cause to really be able to anticipate
link |
what the person will do,
link |
it kind of depends on what they've been doing so far,
link |
cause that's the information you need to kind of,
link |
at least implicitly sort of say,
link |
oh, this is the kind of person that this is,
link |
this is probably what they're trying to do.
link |
So anyway, it's like you're trying to map history of states
link |
to actions, there's many mappings.
link |
And history meaning like the last few seconds
link |
or the last few minutes or the last few months.
link |
Who knows, who knows how much you need, right?
link |
In terms of if your state is really like the positions
link |
of everything or whatnot and velocities,
link |
who knows how much you need.
link |
And then there's so many mappings.
link |
And so now you're talking about
link |
how do you regularize that space?
link |
What priors do you impose or what's the inductive bias?
link |
So, you know, these are all very related things
link |
to think about.
link |
Basically, what are assumptions that we should be making
link |
such that these models actually generalize
link |
outside of the data that we've seen?
link |
And now you're talking about, well, I don't know,
link |
what can you assume?
link |
Maybe you can assume that people like actually
link |
have intentions and that's what drives their actions.
link |
Maybe that's, you know, the right thing to do
link |
when you haven't seen data very nearby
link |
that tells you otherwise.
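One hedged way to read that assumption in code: a Boltzmann, noisily-rational observer that infers which hypothetical goal a person is heading toward from a single observed step, instead of fitting their actions directly. The goals, positions, and rationality coefficient are all made up for illustration.

```python
import numpy as np

goals = {"left door": np.array([0.0, 5.0]), "right door": np.array([10.0, 5.0])}
beta = 2.0  # rationality: higher = treat the person as closer to optimal

def goal_posterior(pos, step):
    """P(goal | observed step), assuming steps roughly reduce distance-to-goal."""
    scores = {}
    for name, g in goals.items():
        # Progress: how much this step shrank the distance to the candidate goal.
        progress = np.linalg.norm(g - pos) - np.linalg.norm(g - (pos + step))
        scores[name] = np.exp(beta * progress)  # Boltzmann weighting of progress
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

# A person in the middle steps up and to the left: evidence for the left door.
print(goal_posterior(np.array([5.0, 0.0]), np.array([-0.5, 1.0])))
```

Because the prediction routes through an intention, it still says something sensible in states the data never covered, which is the point of imposing that prior.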
link |
I don't know, it's a very open question.
link |
Do you think sort of that one of the dreams
link |
of artificial intelligence was to solve
link |
common sense reasoning, whatever the heck that means.
link |
Do you think something like common sense reasoning
link |
has to be solved in part to be able to solve this dance
link |
of human robot interaction, the driving space
link |
or human robot interaction in general?
link |
Do you have to be able to reason about these kinds
link |
of common sense concepts of physics,
link |
of, you know, all the things we've been talking about
link |
humans, I don't even know how to express them with words,
link |
but the basics of human behavior, a fear of death.
link |
So like, to me, it's really important to encode
link |
in some kind of sense, maybe not, maybe it's implicit,
link |
but it feels that it's important to explicitly encode
link |
the fear of death, that people don't wanna die.
link |
Because it seems silly, but like the game of chicken
link |
that happens with the pedestrian crossing the street
link |
is playing with the idea of mortality.
link |
Like we really don't wanna die.
link |
It's not just like a negative reward.
link |
I don't know, it just feels like all these human concepts
link |
have to be encoded.
link |
Do you share that sense or is this a lot simpler
link |
than I'm making it out to be?
link |
I think it might be simpler.
link |
And I'm the person who likes to complicate things.
link |
I think it might be simpler than that.
link |
Because it turns out, for instance,
link |
if you say model people in the very,
link |
I'll call it traditional, I don't know if it's fair
link |
to look at it as a traditional way,
link |
but, you know, modeling people as,
link |
okay, they're rational somehow,
link |
the utilitarian perspective.
link |
Well, in that, once you say that,
link |
you automatically capture that they have an incentive to stay alive.
link |
You know, Stuart likes to say,
link |
you can't fetch the coffee if you're dead.
link |
Stuart Russell, by the way.
link |
That's a good line.
link |
So when you're sort of treating agents
link |
as having these objectives, these incentives,
link |
humans or artificial, you're kind of implicitly modeling
link |
that they'd like to stick around
link |
so that they can accomplish those goals.
link |
So I think in a sense,
link |
maybe that's what draws me so much
link |
to the rationality framework,
link |
even though it's so broken,
link |
it's been such a useful perspective.
link |
And like we were talking about earlier,
link |
what's the alternative?
link |
I give up and go home or, you know,
link |
I just use complete black boxes,
link |
but then I don't know what to assume out of distribution
link |
to come back to this.
link |
It's just, it's been a very fruitful way
link |
to think about the problem
link |
in a more positive way, right?
link |
People aren't just crazy.
link |
Maybe they make more sense than we think.
link |
But I think we also have to somehow be ready for it
link |
to be wrong, be able to detect
link |
when these assumptions aren't holding,
link |
be all of that stuff.
link |
Let me ask sort of another small side of this
link |
that we've been talking about
link |
the pure autonomous driving problem,
link |
but there's also relatively successful systems
link |
already deployed out there in what you may call
link |
like level two autonomy or semi autonomous vehicles,
link |
whether that's Tesla Autopilot,
link |
I've worked quite a bit with the Cadillac Super Cruise system,
link |
which has a driver facing camera that detects your state.
link |
There's a bunch of basically lane centering systems.
link |
What's your sense about this kind of way of dealing
link |
with the human robot interaction problem
link |
by having a really dumb robot
link |
and relying on the human to help the robot out
link |
to keep them both alive?
link |
Is that from the research perspective,
link |
how difficult is that problem?
link |
And from a practical deployment perspective,
link |
is that a fruitful way to approach
link |
this human robot interaction problem?
link |
I think what we have to be careful about there
link |
is to not, it seems like some of these systems,
link |
not all are making this underlying assumption
link |
that if, so I'm a driver and I'm now really not driving,
link |
but supervising and my job is to intervene, right?
link |
And so we have to be careful with this assumption
link |
that when I'm, if I'm supervising,
link |
I will be just as safe as when I'm driving.
link |
That I will, if I wouldn't get into some kind of accident,
link |
if I'm driving, I will be able to avoid that accident
link |
when I'm supervising too.
link |
And I think I'm concerned about this assumption
link |
from a few perspectives.
link |
So from a technical perspective,
link |
it's that when you let something kind of take control
link |
and do its thing, and it depends on what that thing is,
link |
obviously, and how much it's taking control
link |
and what things you're trusting it to do.
link |
But if you let it do its thing and take control,
link |
it will go to what we might call off policy states,
link |
from the person's perspective.
link |
So states that the person wouldn't actually
link |
find themselves in if they were the ones driving.
link |
And the assumption that the person functions
link |
just as well there as they function in the states
link |
that they would normally encounter
link |
is a little questionable.
link |
Now, another part is the kind of the human factor side
link |
of this, which is that I don't know about you,
link |
but I think I definitely feel like I'm experiencing things
link |
very differently when I'm actively engaged in the task
link |
versus when I'm a passive observer.
link |
Like even if I try to stay engaged, right?
link |
It's very different than when I'm actually
link |
actively making decisions.
link |
And you see this in life in general.
link |
Like you see students who are actively trying
link |
to come up with the answer, learn this thing better
link |
than when they're passively told the answer.
link |
I think that's somewhat related.
link |
And I think people have studied this in human factors
link |
And I think it's actually fairly established
link |
that these two are not the same.
link |
On that point, because I've gotten a huge amount
link |
of heat on this and I stand by it.
link |
Because I know the human factors community well
link |
and the work here is really strong.
link |
And there's many decades of work showing exactly
link |
what you're saying.
link |
Nevertheless, I've been continuously surprised
link |
that many of the predictions of that work have been wrong
link |
in what I've seen.
link |
So what we have to do,
link |
I still agree with everything you said,
link |
but we have to be a little bit more open minded.
link |
So I'll tell you, there's a few surprising things
link |
about supervision. Like everything you said, to the word,
link |
is actually exactly correct.
link |
But what you didn't say
link |
is that these systems are unsafe.
link |
You said you can't assume a bunch of things,
link |
but we don't know if these systems are fundamentally unsafe.
link |
That's still unknown.
link |
There's a lot of interesting things,
link |
like I'm surprised by what seems,
link |
anecdotally, to be the case,
link |
from large data collection that we've done,
link |
but also from just talking to a lot of people,
link |
when in the supervisory role of semi autonomous systems
link |
that are sufficiently dumb, at least,
link |
and that might be the key element,
link |
that the systems have to be dumb.
link |
The people are actually more energized as observers.
link |
So they're actually better,
link |
they're better at observing the situation.
link |
So there might be cases in systems,
link |
if you get the interaction right,
link |
where you, as a supervisor,
link |
will do a better job with the system together.
link |
I agree, I think that is actually really possible.
link |
I guess mainly I'm pointing out that if you do it naively,
link |
you're implicitly assuming something,
link |
that assumption might actually really be wrong.
link |
But I do think that if you explicitly think about
link |
what the agent should do
link |
so that the person still stays engaged,
link |
so that you essentially empower the person
link |
to do more than they could,
link |
that's really the goal, right?
link |
Is you still have a driver,
link |
so you wanna empower them to be so much better
link |
than they would be by themselves.
link |
And that's different, it's a very different mindset
link |
than I want them to basically not drive, right?
link |
And, but be ready to sort of take over.
link |
So one of the interesting things we've been talking about
link |
is the rewards, that they seem to be fundamental to
link |
the way robots behave.
link |
So broadly speaking,
link |
we've been talking about utility functions and so on,
link |
but could you comment on how do we approach
link |
the design of reward functions?
link |
Like, how do we come up with good reward functions?
link |
Well, really good question,
link |
because the answer is we don't.
link |
This was, you know, I used to think,
link |
I used to think about how,
link |
well, it's actually really hard to specify rewards
link |
for interaction because it's really supposed to be
link |
what the people want, and then you really, you know,
link |
we talked about how you have to customize
link |
what you wanna do to the end user.
link |
But I kind of realized that even if you take
link |
the interactive component away,
link |
it's still really hard to design reward functions.
link |
So what do I mean by that?
link |
I mean, if we assume this sort of AI paradigm
link |
in which there's an agent and its job is to optimize
link |
some objectives, some reward, utility, loss, whatever, cost,
link |
if you write it out, maybe it's a set,
link |
depending on the situation or whatever it is,
link |
if you write that out and then you deploy the agent,
link |
you'd wanna make sure that whatever you specified
link |
incentivizes the behavior you want from the agent
link |
in any situation that the agent will be faced with, right?
link |
So I do motion planning on my robot arm,
link |
I specify some cost function like, you know,
link |
this is how far away you should try to stay,
link |
this is how much it matters to stay away from people,
link |
and this is how much it matters to be able to be efficient
link |
and blah, blah, blah, right?
link |
I need to make sure that whatever I specified,
link |
those constraints or trade offs or whatever they are,
link |
that when the robot goes and solves that problem
link |
in every new situation,
link |
that behavior is the behavior that I wanna see.
link |
And what I've been finding is
link |
that we have no idea how to do that.
link |
Basically, what I can do is I can sample,
link |
I can think of some situations
link |
that I think are representative of what the robot will face,
link |
and I can tune and add and tune some reward function
link |
until the optimal behavior is what I want
link |
on those situations,
link |
which first of all is super frustrating
link |
because, you know, through the miracle of AI,
link |
we've taken, we don't have to specify rules
link |
for behavior anymore, right?
link |
Like we were saying before,
link |
the robot comes up with the right thing to do,
link |
you plug in the situation
link |
and it optimizes right in that situation,
link |
but you still have to spend a lot of time
link |
on actually defining what it is
link |
that that criteria should be,
link |
making sure you didn't forget
link |
about 50 bazillion things that are important
link |
and how they all should be combining together
link |
to tell the robot what's good and what's bad
link |
and how good and how bad.
link |
And so I think this is a lesson that I don't know,
link |
kind of, I guess I closed my eyes to it for a while
link |
cause I've been, you know,
link |
tuning cost functions for 10 years now,
link |
but it really strikes me that,
link |
yeah, we've moved the tuning
link |
and the like designing of features or whatever
link |
from the behavior side into the reward side.
link |
And yes, I agree that there's way less of it,
link |
but it still seems really hard
link |
to anticipate any possible situation
link |
and make sure you specify a reward function
link |
that when optimized will work well
link |
in every possible situation.
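A toy version of what can go wrong, with invented numbers: a proxy reward whose weights look right on the situations you happened to test, because a feature (here, a hypothetical "puddle") never showed up there, and which then incentivizes the wrong behavior the first time it does.

```python
import numpy as np

# Two candidate paths per scene, each described by features:
# [path_length, closeness_to_person, goes_through_puddle].
# The designer tuned weights on puddle-free scenes, so the proxy
# reward simply carries no weight for the puddle feature.
proxy_w = np.array([-1.0, -2.0, 0.0])    # tuned: short paths, stay away from people
true_w  = np.array([-1.0, -2.0, -10.0])  # what the designer actually wants

def pick(paths, w):
    """Return the path that maximizes the given reward weights."""
    return max(paths, key=lambda f: float(np.dot(w, f)))

test_scene = [np.array([5.0, 0.1, 0.0]), np.array([7.0, 0.0, 0.0])]
new_scene  = [np.array([5.0, 0.1, 1.0]), np.array([7.0, 0.0, 0.0])]  # short path now has a puddle

print("test scene choice:", pick(test_scene, proxy_w))  # matches the true preference
print("new scene, proxy :", pick(new_scene, proxy_w))   # takes the puddle path
print("new scene, true  :", pick(new_scene, true_w))    # would have detoured
```

The behavior in the new scene is optimal with respect to the proxy and still wrong, which is exactly the gap she describes between what you specified and what you wanted.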
link |
So you're kind of referring to unintended consequences
link |
or just in general, any kind of suboptimal behavior
link |
that emerges outside of the things you said,
link |
out of distribution.
link |
Suboptimal behavior that is, you know, actually optimal.
link |
I mean, this, I guess the idea of unintended consequences,
link |
you know, it's optimal with respect to what you specified,
link |
but it's not what you want.
link |
And there's a difference between those.
link |
But that's not fundamentally a robotics problem, right?
link |
That's a human problem.
link |
So like. That's the thing, right?
link |
So there's this thing called Goodhart's law,
link |
which is you set a metric for an organization
link |
and the moment it becomes a target
link |
that people actually optimize for,
link |
it's no longer a good metric.
link |
So the moment you specify a metric,
link |
it stops doing its job.
link |
Yeah, it stops doing its job.
link |
So there's, yeah, there's such a thing
link |
as optimizing for things and, you know,
link |
failing to think ahead of time
link |
of all the possible things that might be important.
link |
And so that's, so that's interesting
link |
because historically we've worked a lot on reward learning
link |
from the perspective of customizing to the end user,
link |
but it really seems like it's not just the interaction
link |
with the end user that's a problem of the human
link |
and the robot collaborating
link |
so that the robot can do what the human wants, right?
link |
This kind of back and forth, the robot probing,
link |
the person being informative, all of that stuff
link |
might be actually just as applicable
link |
to this kind of maybe new form of human robot interaction,
link |
which is the interaction between the robot
link |
and the expert programmer, roboticist designer
link |
in charge of actually specifying
link |
what the heck the robot should do,
link |
specifying the task for the robot.
link |
That's fascinating.
link |
That's so cool, like collaborating on the reward design.
link |
Right, collaborating on the reward design.
link |
And so what does it mean, right?
link |
What does it mean when we think about the problem
link |
not as, someone specifies the reward and your job is to optimize it,
link |
but as, you're in this interaction
link |
and this collaboration?
link |
And the first thing that comes up is
link |
when the person specifies a reward, it's not, you know,
link |
gospel, it's not like the letter of the law.
link |
It's not the definition of the reward function
link |
you should be optimizing,
link |
because they're doing their best,
link |
but they're not some magic perfect oracle.
link |
And the sooner we start understanding that,
link |
I think the sooner we'll get to more robust robots
link |
that function better in different situations.
link |
And then you have kind of say, okay, well,
link |
it's almost like robots are over learning,
link |
putting too much weight on the specified reward
link |
by definition, and maybe leaving a lot of other information
link |
on the table, like what are other things we could do
link |
to actually communicate to the robot
link |
about what we want them to do besides attempting
link |
to specify a reward function.
link |
Yeah, you have this awesome,
link |
and again, I love the poetry of it, of leaked information.
link |
So you mentioned humans leak information
link |
about what they want, you know,
link |
leak reward signal for the robot.
link |
So how do we detect these leaks?
link |
Yeah, what are these leaks?
link |
Where did I, I don't know,
link |
I just recently saw it, read it,
link |
I don't know where, from you,
link |
and it's gonna stick with me for a while for some reason,
link |
because it's not explicitly expressed.
link |
It kind of leaks indirectly from our behavior.
link |
From what we do, yeah, absolutely.
link |
So I think maybe some surprising bits, right?
link |
So we were talking before about, I'm a robot arm,
link |
it needs to move around people, carry stuff,
link |
put stuff away, all of that.
link |
And now imagine that, you know,
link |
the robot has some initial objective
link |
that the programmer gave it
link |
so they can do all these things functionally.
link |
It's capable of doing that.
link |
And now I noticed that it's doing something
link |
and maybe it's coming too close to me, right?
link |
And maybe I'm the designer,
link |
maybe I'm the end user and this robot is now in my home.
link |
And I push it away.
link |
So I push away because, you know,
link |
it's a reaction to what the robot is currently doing.
link |
And this is what we call physical human robot interaction.
link |
And now there's a lot of interesting work
link |
on how the heck do you respond to physical human
link |
robot interaction?
link |
What should the robot do if such an event occurs?
link |
And there's sort of different schools of thought.
link |
Well, you know, you can sort of treat it
link |
the control theoretic way and say,
link |
this is a disturbance that you must reject.
link |
You can sort of treat it more kind of heuristically
link |
and say, I'm gonna go into some like gravity compensation
link |
mode so that I'm easily maneuverable around.
link |
I'm gonna go in the direction that the person pushed me.
link |
And for us, part of the realization has been
link |
that that is a signal that communicates about the reward.
link |
Because if my robot was moving in an optimal way
link |
and I intervened, that means that I disagree
link |
with its notion of optimality, right?
link |
Whatever it thinks is optimal is not actually optimal.
link |
And sort of optimization problems aside,
link |
that means that the cost function,
link |
the reward function is incorrect,
link |
or at least is not what I want it to be.
link |
How difficult is that signal to interpret
link |
and make actionable?
link |
So like, cause this connects
link |
to our autonomous vehicle discussion
link |
where, in the semi autonomous vehicle
link |
or autonomous vehicle when a safety driver
link |
disengages the car, like,
link |
but they could have disengaged it for a million reasons.
link |
Yeah, so that's true.
link |
Again, it comes back to, can you structure a little bit
link |
your assumptions about how human behavior
link |
relates to what they want?
link |
And you can, one thing that we've done is
link |
literally just treated this external torque
link |
that they applied as follows: when you take that
link |
and you add it to the torque
link |
the robot was already applying,
link |
that overall action is probably relatively optimal
link |
with respect to whatever it is that the person wants.
link |
And then that gives you information
link |
about what it is that they want.
link |
So you can learn that people want you
link |
to stay further away from them.
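A rough sketch in the spirit of that idea, with made-up features and numbers: treat the corrected motion as better than the planned one, and nudge the reward weights toward whatever the correction improved. The feature extractor, step size, and trajectories below are illustrative assumptions, not her lab's actual formulation.

```python
import numpy as np

# Reward features of a trajectory: [efficiency, distance_kept_from_person].
w = np.array([1.0, 0.2])  # robot's current guess: cares mostly about efficiency

def features(traj):
    """Hypothetical feature extractor for a trajectory (stand-in values here)."""
    return traj

planned   = np.array([0.9, 0.3])  # efficient, but comes close to the person
corrected = np.array([0.7, 0.8])  # the push made it slower but gave more clearance

alpha = 0.5  # learning rate for the online update
w = w + alpha * (features(corrected) - features(planned))
print("updated weights:", w)  # the weight on keeping distance goes up
```

One push doesn't pin the reward down, as noted above, but each correction shifts the weights a little in the direction the person cared about.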
link |
Now you're right that there might be many things
link |
that explain just that one signal
link |
and that you might need much more data than that
link |
for the person to be able to shape
link |
your reward function over time.
link |
You can also do this info gathering stuff
link |
that we were talking about.
link |
Not that we've done that in that context,
link |
just to clarify, but it's definitely something
link |
we thought about where you can have the robot
link |
start acting in a way, like if there's
link |
a bunch of different explanations, right?
link |
It moves in a way where it sees if you correct it
link |
in some other way or not,
link |
and then kind of actually plans its motion
link |
so that it can disambiguate
link |
and collect information about what you want.
link |
Anyway, so that's one way,
link |
that's kind of sort of leaked information,
link |
maybe even more subtle leaked information
link |
is if I just press the E stop, right?
link |
I just, I'm doing it out of panic
link |
because the robot is about to do something bad.
link |
There's again, information there, right?
link |
Okay, the robot should definitely stop,
link |
but it should also figure out
link |
that whatever it was about to do was not good.
link |
And in fact, it was so not good
link |
that stopping and remaining stopped for a while
link |
was a better trajectory for it
link |
than whatever it is that it was about to do.
link |
And that again is information about
link |
what are my preferences, what do I want?
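Here is one hedged way to cash that out, with invented numbers: read the E-stop as a pairwise preference, stopping beat the planned trajectory, and do a Bayesian update over candidate reward functions using a Boltzmann preference likelihood. The hypotheses and features are purely illustrative.

```python
import numpy as np

# Candidate reward hypotheses: weights on [task_progress, human_comfort].
candidates = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.1, 0.9])]
prior = np.array([1 / 3, 1 / 3, 1 / 3])

planned = np.array([0.9, -0.8])  # good progress, very uncomfortable for the human
stopped = np.array([0.0, 0.0])   # no progress, no discomfort

# Likelihood of pressing stop under each hypothesis: a Boltzmann preference,
# i.e. stopping is likelier when it scores better than the planned motion.
lik = np.array([1.0 / (1.0 + np.exp(np.dot(w, planned) - np.dot(w, stopped)))
                for w in candidates])
posterior = prior * lik / np.sum(prior * lik)
print(posterior)  # mass shifts toward rewards that value human comfort
```

The robot still stops either way; the extra step is that the stop event also reshapes its belief about what it should have been optimizing.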
link |
Speaking of E stops, what are your expert opinions
link |
on the three laws of robotics from Isaac Asimov:
link |
don't harm humans, obey orders, protect yourself?
link |
I mean, it's such a silly notion,
link |
but I speak to so many people these days,
link |
just regular folks, just, I don't know,
link |
my parents and so on about robotics.
link |
And they kind of operate in that space of,
link |
you know, imagining our future with robots
link |
and thinking what are the ethical,
link |
how do we get that dance right?
link |
I know the three laws might be a silly notion,
link |
but do you think about like
link |
what universal reward functions that might be
link |
that we should enforce on the robots of the future?
link |
Or is that a little too far out?
link |
Or is it the mechanism that you just described,
link |
that it shouldn't be three laws,
link |
it should be a constantly adjusting kind of thing?
link |
I think it should be a constantly adjusting kind of thing.
link |
You know, the issue with the laws is,
link |
I don't even, you know, they're words
link |
and I have to write math
link |
and have to translate them into math.
link |
What does it mean to harm?
link |
What does harm mean?
link |
It's not math.
link |
Cause we just talked about how
link |
you try to say what you want,
link |
but you don't always get it right.
link |
And you want these machines to do what you want,
link |
not necessarily exactly what you literally,
link |
so you don't want them to take you literally.
link |
You wanna take what you say and interpret it in context.
link |
And that's what we do with the specified rewards.
link |
We don't take them literally anymore from the designer.
link |
We, not we as a community, we as, you know,
link |
some members of my group, we,
link |
and some of our collaborators like Pieter Abbeel
link |
and Stuart Russell, we sort of say,
link |
okay, the designer specified this thing,
link |
but I'm gonna interpret it not as,
link |
this is the universal reward function
link |
that I shall always optimize always and forever,
link |
but as this is good evidence about what the person wants.
link |
And I should interpret that evidence
link |
in the context of these situations that it was specified for.
link |
Cause ultimately that's what the designer thought about.
link |
That's what they had in mind.
link |
And really them specifying reward function
link |
that works for me in all these situations
link |
is really kind of telling me that whatever behavior
link |
that incentivizes must be good behavior
link |
with respect to the thing
link |
that I should actually be optimizing for.
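A small sketch of that "reward as evidence" idea, in the spirit of the inverse reward design work she alludes to: keep a set of reward hypotheses consistent with the specified proxy on the training situations, and score new plans by their worst case over those hypotheses. The feature convention, the hypotheses, and the min-over-hypotheses rule are illustrative assumptions.

```python
import numpy as np

# Designer's proxy weights over [path_length, closeness_to_person, puddle].
proxy_w = np.array([-1.0, -2.0, 0.0])
# Hypotheses consistent with the proxy on puddle-free training scenes:
# the puddle weight could be anything, since it never mattered there.
hypotheses = [np.array([-1.0, -2.0, p]) for p in (0.0, -5.0, -10.0)]

def cautious_value(path_features):
    """Score a path by its worst case over reward hypotheses, not the proxy."""
    return min(float(np.dot(w, path_features)) for w in hypotheses)

puddle_path = np.array([5.0, 0.1, 1.0])  # short, but gambles on an unseen feature
detour_path = np.array([7.0, 0.0, 0.0])  # longer, but safe under every hypothesis
best = max([puddle_path, detour_path], key=cautious_value)
print("cautious choice:", best)  # detours rather than bet on the unknown feature
```

Treating the specification as evidence rather than gospel is what lets the robot hedge like this in situations the designer never thought about.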
link |
And so now the robot kind of has uncertainty
link |
about what it is that it should be,
link |
what its reward function is.
link |
And then there's all these additional signals
link |
that we've been finding that it can kind of continually
link |
learn from and adapt its understanding of what people want.
link |
Every time the person corrects it, maybe they demonstrate,
link |
maybe they stop, hopefully not, right?
link |
One really, really crazy one is the environment itself.
link |
Like our world, you know,
link |
you observe our world and the state of it.
link |
And it's not that you're seeing behavior
link |
and you're saying, oh, people are making decisions
link |
that are rational, blah, blah, blah.
link |
But our world is something that we've been acting on
link |
according to our preferences.
link |
So I have this example where like,
link |
the robot walks into my home and my shoes are laid down
link |
on the floor kind of in a line, right?
link |
It took effort to do that.
link |
So even though the robot doesn't see me doing this,
link |
you know, actually aligning the shoes,
link |
it should still be able to figure out
link |
that I want the shoes aligned
link |
because there's no way for them to have magically,
link |
you know, instantiated themselves in that way.
link |
Someone must have actually taken the time to do that.
link |
So it must be important.
link |
So the environment actually tells, the environment is.
link |
Leaks information.
link |
It leaks information.
link |
I mean, the environment is the way it is
link |
because humans somehow manipulated it.
link |
So you have to kind of reverse engineer the narrative
link |
that happened to create the environment as it is
link |
and that leaks the preference information.
link |
Yeah, and you have to be careful, right?
link |
Because people don't have the bandwidth to do everything.
link |
So just because, you know, my house is messy
link |
doesn't mean that I want it to be messy, right?
link |
But that just, you know, I didn't put the effort into that.
link |
I put the effort into something else.
link |
So the robot should figure out,
link |
well, that something else was more important,
link |
but it doesn't mean that, you know,
link |
the house being messy is not.
link |
So it's a little subtle, but yeah, we really think of it.
link |
The state itself is kind of like a choice
link |
that people implicitly made about how they want their world.
link |
What book or books, technical or fiction or philosophical,
link |
when you look back, had a big impact on your life,
link |
maybe it was a turning point, it was inspiring in some way.
link |
Maybe we're talking about some silly book
link |
that nobody in their right mind would want to read.
link |
Or maybe it's a book that you would recommend
link |
to others to read.
link |
Or maybe those could be two different recommendations
link |
of books that could be useful for people on their journey.
link |
When I was in, it's kind of a personal story.
link |
When I was in 12th grade,
link |
I got my hands on a PDF copy in Romania
link |
of Russell and Norvig's AI: A Modern Approach.
link |
I didn't know anything about AI at that point.
link |
I was, you know, I had watched the movie,
link |
The Matrix was my exposure.
link |
And so I started going through this thing
link |
and, you know, you were asking in the beginning,
link |
what are, you know, it's math and it's algorithms,
link |
what's interesting.
link |
It was so captivating.
link |
This notion that you could just have a goal
link |
and figure out your way through
link |
kind of a messy, complicated situation.
link |
So what sequence of decisions you should make
link |
autonomously, to achieve that goal.
link |
I'm, you know, I'm biased, but that's a cool book to look at.
link |
You can convert, you know, the goal of intelligence,
link |
the process of intelligence and mechanize it.
link |
I had the same experience.
link |
I was really interested in psychiatry
link |
and trying to understand human behavior.
link |
And then AI modern approach is like, wait,
link |
you can just reduce it all to.
link |
You can write math about human behavior, right?
link |
So that's, and I think that stuck with me
link |
because, you know, a lot of what I do, a lot of what we do
link |
in my lab is write math about human behavior,
link |
combine it with data and learning, put it all together,
link |
give it to robots to plan with, and, you know,
link |
hope that instead of writing rules for the robots,
link |
writing heuristics, designing behavior,
link |
they can actually autonomously come up with the right thing
link |
to do around people.
link |
That's kind of our, you know, that's our signature move.
link |
We wrote some math, and then instead of hand crafting
link |
this and that, the robot figures stuff out,
link |
and isn't that cool?
link |
And I think that is the same enthusiasm that I got from
link |
the robot figured out how to reach that goal in that graph.
link |
So apologize for the romanticized questions,
link |
but, and the silly ones,
link |
if a doctor gave you five years to live,
link |
sort of emphasizing the finiteness of our existence,
link |
what would you try to accomplish?
link |
It's like my biggest nightmare, by the way.
link |
I really like living.
link |
So I'm actually, I really don't like the idea of being told
link |
that I'm going to die.
link |
Sorry to linger on that for a second.
link |
Do you, I mean, do you meditate or ponder on your mortality
link |
or human, the fact that this thing ends,
link |
it seems to be a fundamental feature.
link |
Do you think of it as a feature or a bug too?
link |
Is it, you said you don't like the idea of dying,
link |
but if I were to give you a choice of living forever,
link |
like you're not allowed to die.
link |
Now I'll say that I want to live forever,
link |
but I watched this show.
link |
It's called The Good Place and they reflect a lot on this.
link |
And, you know,
link |
the moral of the story is that you have to make the afterlife finite too.
link |
Cause otherwise people just kind of, it's like WALL-E.
link |
It's like, ah, whatever.
link |
So I think the finiteness helps.
link |
But yeah, it's just, you know,
link |
I'm not a religious person.
link |
I don't think that there's something after.
link |
And so I think it just ends and you stop existing.
link |
And I really like existing.
link |
It's just, it's such a great privilege to exist that,
link |
that yeah, it's just, I think that's the scary part.
link |
I still think that we like existing so much because it ends.
link |
And that's so sad.
link |
Like it's so sad to me every time.
link |
Like I find almost everything about this life beautiful.
link |
Like the silliest, most mundane things are just beautiful.
link |
And I think I'm cognizant of the fact that I find it beautiful
link |
because it ends.
link |
And it's so, I don't know.
link |
I don't know how to feel about that.
link |
I also feel like there's a lesson in there for robotics
link |
and AI, that the finiteness of things seems
link |
to be a fundamental nature of human existence.
link |
I think some people sort of accuse me of just being Russian
link |
and melancholic and romantic or something,
link |
but that seems to be a fundamental nature of our existence
link |
that should be incorporated in our reward functions.
link |
But anyway, if you were speaking of reward functions,
link |
if you only had five years, what would you try to accomplish?
link |
This is the thing.
link |
I'm thinking about this question and having a pretty joyous moment,
link |
because I don't know that I would change much.
link |
I'm trying to make some contributions to how we understand
link |
human AI interaction.
link |
I don't think I would change that.
link |
Maybe I'll take more trips to the Caribbean or something,
link |
but I tried some of that already from time to time.
link |
So, yeah, I try to do the things that bring me joy,
link |
and thinking about these things brings me joy. It's the Marie Kondo thing:
link |
Don't do stuff that doesn't spark joy.
link |
For the most part, I do things that spark joy.
link |
Maybe I'll do less service in the department or something.
link |
I'm not dealing with admissions anymore.
link |
But no, I think I have amazing colleagues and amazing students
link |
and amazing family and friends and spending time in some balance
link |
with all of them is what I do and that's what I'm doing already.
link |
So, I don't know that I would really change anything.
link |
So, on the spirit of positiveness, what small act of kindness,
link |
if one pops to mind, were you once shown that you will never forget?
link |
When I was in high school, my friends, my classmates did some tutoring.
link |
We were gearing up for our baccalaureate exam
link |
and they did some tutoring on, well, some on math, some on whatever.
link |
I was comfortable enough with some of those subjects,
link |
but physics was something that I hadn't focused on in a while.
link |
And so, they were all working with this one teacher
link |
and I started working with that teacher.
link |
Her name is Nicole Beccano.
link |
And she was the one who kind of opened up this whole world for me
link |
because she sort of told me that I should take the SATs
link |
and apply to go to college abroad and do better on my English and all of that.
link |
And when it came to, well, financially I couldn't,
link |
my parents couldn't really afford to do all these things,
link |
she started tutoring me on physics for free
link |
and on top of that sitting down with me to kind of train me for SATs
link |
and all that jazz that she had experience with.
link |
Wow. And obviously that has taken you to be here today,
link |
sort of one of the world experts in robotics.
link |
It's funny those little... For no reason really.
link |
Just out of karma.
link |
Wanting to support someone, yeah.
link |
Yeah. So, we talked a ton about reward functions.
link |
Let me talk about the most ridiculous big question.
link |
What is the meaning of life?
link |
What's the reward function under which we humans operate?
link |
Like what, maybe to your life, maybe broader to human life in general,
link |
what do you think...
link |
What gives life fulfillment, purpose, happiness, meaning?
link |
You can't even ask that question with a straight face.
link |
That's how ridiculous this is.
link |
Okay. So, you know...
link |
You're going to try to answer it anyway, aren't you?
link |
So, I was in a planetarium once.
link |
And, you know, they show you the thing and then they zoom out and zoom out
link |
and this whole, like, you're a speck of dust kind of thing.
link |
I think I was conceptualizing that we're kind of, you know, what are humans?
link |
We're just on this little planet, whatever.
link |
We don't matter much in the grand scheme of things.
link |
And then my mind got really blown because they talked about this multiverse theory
link |
where they kind of zoomed out and were like, this is our universe.
link |
And then, like, there's a bazillion other ones and they just pop in and out of existence.
link |
So, like, our whole thing that we can't even fathom how big it is was like a blip that went in and out.
link |
And at that point, I was like, okay, like, I'm done.
link |
This is not, there is no meaning.
link |
And clearly what we should be doing is try to impact whatever local thing we can impact,
link |
our communities, leave a little bit behind there, our friends, our family, our local communities,
link |
and just try to be there for other humans because I just, everything beyond that seems ridiculous.
link |
I mean, are you, like, how do you make sense of these multiverses?
link |
Like, are you inspired by the immensity of it?
link |
Do you, I mean, is there, like, is it amazing to you or is it almost paralyzing in the mystery of it?
link |
I'm frustrated by my inability to comprehend.
link |
It just feels very frustrating.
link |
It's like there's some stuff, you know, space, time, blah, blah, blah, that we should really be understanding.
link |
And I definitely don't understand it.
link |
But, you know, the amazing physicists of the world have a much better understanding than me.
link |
But it still seems epsilon in the grand scheme of things.
link |
So, it's very frustrating.
link |
It just, it sort of feels like our brains don't have some fundamental capacity yet, well, yet or ever.
link |
Well, that's one of the dreams of artificial intelligence is to create systems that will aid,
link |
expand our cognitive capacity in order to understand, build the theory of everything with the physics
link |
and understand what the heck these multiverses are.
link |
So, I think there's no better way to end it than talking about the meaning of life and the fundamental nature of the universe and the multiverses.
link |
And the multiverse.
link |
So, Anca, it is a huge honor.
link |
One of my favorite conversations I've had.
link |
I really, really appreciate your time.
link |
Thank you for talking today.
link |
Thank you for coming.
link |
Thanks for listening to this conversation with Anca Dragan.
link |
And thank you to our presenting sponsor, Cash App.
link |
Please consider supporting the podcast by downloading Cash App and using code LexPodcast.
link |
If you enjoy this podcast, subscribe on YouTube, review it with 5 stars on Apple Podcast,
link |
support it on Patreon, or simply connect with me on Twitter at LexFridman.
link |
And now, let me leave you with some words from Isaac Asimov.
link |
Your assumptions are your windows in the world.
link |
Scrub them off every once in a while or the light won't come in.
link |
Thank you for listening and hope to see you next time.