
Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81



link |
00:00:00.000
The following is a conversation with Anca Dragan,
link |
00:00:03.880
a professor at Berkeley working on human robot interaction,
link |
00:00:08.160
algorithms that look beyond the robot's function
link |
00:00:10.760
in isolation and generate robot behavior
link |
00:00:13.920
that accounts for interaction
link |
00:00:15.960
and coordination with human beings.
link |
00:00:18.080
She also consults at Waymo, the autonomous vehicle company,
link |
00:00:22.360
but in this conversation,
link |
00:00:23.560
she is 100% wearing her Berkeley hat.
link |
00:00:27.120
She is one of the most brilliant and fun roboticists
link |
00:00:30.600
in the world to talk with.
link |
00:00:32.480
I had a tough and crazy day leading up to this conversation,
link |
00:00:36.320
so I was a bit tired, even more so than usual,
link |
00:00:41.440
but almost immediately as she walked in,
link |
00:00:44.160
her energy, passion, and excitement
link |
00:00:46.320
for human robot interaction was contagious.
link |
00:00:48.880
So I had a lot of fun and really enjoyed this conversation.
link |
00:00:52.840
This is the Artificial Intelligence Podcast.
link |
00:00:55.560
If you enjoy it, subscribe on YouTube,
link |
00:00:57.880
review it with five stars on Apple Podcast,
link |
00:01:00.320
support it on Patreon,
link |
00:01:01.680
or simply connect with me on Twitter at Lex Fridman,
link |
00:01:05.160
spelled F R I D M A N.
link |
00:01:08.160
As usual, I'll do one or two minutes of ads now
link |
00:01:11.000
and never any ads in the middle
link |
00:01:12.560
that can break the flow of the conversation.
link |
00:01:14.800
I hope that works for you
link |
00:01:16.240
and doesn't hurt the listening experience.
link |
00:01:20.440
This show is presented by Cash App,
link |
00:01:22.720
the number one finance app in the App Store.
link |
00:01:25.520
When you get it, use code LEXPODCAST.
link |
00:01:29.320
Cash App lets you send money to friends,
link |
00:01:31.360
buy Bitcoin, and invest in the stock market
link |
00:01:33.880
with as little as one dollar.
link |
00:01:36.840
Since Cash App does fractional share trading,
link |
00:01:39.200
let me mention that the order execution algorithm
link |
00:01:41.700
that works behind the scenes
link |
00:01:43.360
to create the abstraction of fractional orders
link |
00:01:45.960
is an algorithmic marvel.
link |
00:01:48.180
So big props to the Cash App engineers
link |
00:01:50.500
for solving a hard problem that in the end
link |
00:01:53.240
provides an easy interface that takes a step up
link |
00:01:56.120
to the next layer of abstraction over the stock market,
link |
00:01:59.320
making trading more accessible for new investors
link |
00:02:02.060
and diversification much easier.
link |
00:02:05.860
So again, if you get Cash App from the App Store
link |
00:02:08.240
or Google Play and use the code LEXPODCAST,
link |
00:02:11.880
you get $10 and Cash App will also donate $10 to FIRST,
link |
00:02:15.920
an organization that is helping to advance robotics
link |
00:02:18.520
and STEM education for young people around the world.
link |
00:02:22.280
And now, here's my conversation with Anca Dragan.
link |
00:02:26.800
When did you first fall in love with robotics?
link |
00:02:29.880
I think it was a very gradual process
link |
00:02:34.200
and it was somewhat accidental actually
link |
00:02:37.040
because I first started getting into programming
link |
00:02:41.160
when I was a kid and then into math
link |
00:02:43.200
and then I decided computer science
link |
00:02:46.280
was the thing I was gonna do
link |
00:02:47.840
and then in college I got into AI
link |
00:02:50.160
and then I applied to the Robotics Institute
link |
00:02:52.480
at Carnegie Mellon and I was coming from this little school
link |
00:02:56.080
in Germany that nobody had heard of
link |
00:02:59.000
but I had spent an exchange semester at Carnegie Mellon
link |
00:03:01.800
so I had letters from Carnegie Mellon.
link |
00:03:04.040
So that was the only, you know, MIT said no,
link |
00:03:06.880
Berkeley said no, Stanford said no.
link |
00:03:09.200
That was the only place I got into
link |
00:03:11.100
so I went there to the Robotics Institute
link |
00:03:13.200
and I thought that robotics is a really cool way
link |
00:03:16.240
to actually apply the stuff that I knew and loved
link |
00:03:20.000
to like optimization so that's how I got into robotics.
link |
00:03:23.240
I have a better story how I got into cars
link |
00:03:25.800
which is I used to do mostly manipulation in my PhD
link |
00:03:31.600
but now I do kind of a bit of everything application wise
link |
00:03:34.800
including cars and I got into cars
link |
00:03:38.960
because I was here in Berkeley
link |
00:03:42.180
while I was a PhD student still for RSS 2014,
link |
00:03:46.400
Pieter Abbeel organized it and he arranged for,
link |
00:03:50.380
it was Google at the time to give us rides
link |
00:03:52.840
in self driving cars and I was in a robot
link |
00:03:56.400
and it was just making decision after decision,
link |
00:04:00.660
the right call and it was so amazing.
link |
00:04:03.400
So it was a whole different experience, right?
link |
00:04:05.560
Just I mean manipulation is so hard you can't do anything
link |
00:04:07.880
and there it was.
link |
00:04:08.720
Was it the most magical robot you've ever met?
link |
00:04:11.200
So like for me to meet a Google self driving car
link |
00:04:14.940
for the first time was like a transformative moment.
link |
00:04:18.480
Like I had two moments like that,
link |
00:04:19.960
that and Spot Mini, I don't know if you met Spot Mini
link |
00:04:22.480
from Boston Dynamics.
link |
00:04:24.160
I felt like I fell in love or something
link |
00:04:27.200
like it, cause I know how a Spot Mini works, right?
link |
00:04:30.840
It's just, I mean there's nothing truly special,
link |
00:04:34.000
it's great engineering work but the anthropomorphism
link |
00:04:38.440
that went on into my brain that came to life
link |
00:04:41.440
like it had a little arm and it looked at me,
link |
00:04:45.880
he, she looked at me, I don't know,
link |
00:04:47.640
there's a magical connection there
link |
00:04:48.960
and it made me realize, wow, robots can be so much more
link |
00:04:52.480
than things that manipulate objects.
link |
00:04:54.240
They can be things that have a human connection.
link |
00:04:56.920
Do you have, was the self driving car the moment like,
link |
00:05:01.100
was there a robot that truly sort of inspired you?
link |
00:05:04.680
That was, I remember that experience very viscerally,
link |
00:05:08.240
riding in that car and being just wowed.
link |
00:05:11.600
I had the, they gave us a sticker that said,
link |
00:05:16.040
I rode in a self driving car
link |
00:05:17.520
and it had this cute little firefly on and,
link |
00:05:20.880
a logo or something like that.
link |
00:05:21.720
Oh, that was like the smaller one, like the firefly.
link |
00:05:23.680
Yeah, the really cute one, yeah.
link |
00:05:25.640
And I put it on my laptop and I had that for years
link |
00:05:30.140
until I finally changed my laptop out and you know.
link |
00:05:33.120
What about if we walk back, you mentioned optimization,
link |
00:05:36.320
like what beautiful ideas inspired you in math,
link |
00:05:40.760
computer science early on?
link |
00:05:42.680
Like why get into this field?
link |
00:05:44.560
It seems like a cold and boring field of math.
link |
00:05:47.460
Like what was exciting to you about it?
link |
00:05:49.080
The thing is I liked math from very early on,
link |
00:05:52.460
from fifth grade is when I got into the math Olympiad
link |
00:05:56.720
and all of that.
link |
00:05:57.540
Oh, you competed too?
link |
00:05:58.600
Yeah, in Romania it's like our national sport,
link |
00:06:01.440
you gotta understand.
link |
00:06:02.840
So I got into that fairly early
link |
00:06:05.800
and it was a little, maybe too just theory
link |
00:06:10.240
with no kind of, I didn't kind of have a,
link |
00:06:13.000
didn't really have a goal.
link |
00:06:15.040
And other than understanding, which was cool,
link |
00:06:17.600
I always liked learning and understanding,
link |
00:06:19.360
but there was no, okay,
link |
00:06:20.240
what am I applying this understanding to?
link |
00:06:22.280
And so I think that's how I got into,
link |
00:06:23.880
more heavily into computer science
link |
00:06:25.400
because it was kind of math meets something
link |
00:06:29.280
you can do tangibly in the world.
link |
00:06:31.360
Do you remember like the first program you've written?
link |
00:06:34.520
Okay, the first program I've written with,
link |
00:06:37.360
I kind of do, it was in QBasic in fourth grade.
link |
00:06:42.600
Wow.
link |
00:06:43.440
And it was drawing like a circle.
link |
00:06:46.680
Graphics.
link |
00:06:47.520
Yeah, that was, I don't know how to do that anymore,
link |
00:06:51.720
but in fourth grade,
link |
00:06:52.880
that's the first thing that they taught me.
link |
00:06:54.200
I was like, you could take a special,
link |
00:06:56.320
I wouldn't say it was an extracurricular,
link |
00:06:57.600
it's in a sense an extracurricular,
link |
00:06:59.040
so you could sign up for dance or music or programming.
link |
00:07:03.340
And I did the programming thing
link |
00:07:04.700
and my mom was like, what, why?
link |
00:07:07.840
Did you compete in programming?
link |
00:07:08.880
Like these days, Romania probably,
link |
00:07:12.040
that's like a big thing.
link |
00:07:12.980
There's a programming competition.
link |
00:07:15.400
Was that, did that touch you at all?
link |
00:07:17.120
I did a little bit of the computer science Olympiad,
link |
00:07:21.360
but not as seriously as I did the math Olympiad.
link |
00:07:24.720
So it was programming.
link |
00:07:25.760
Yeah, it's basically,
link |
00:07:26.720
here's a hard math problem,
link |
00:07:27.720
solve it with a computer is kind of the deal.
link |
00:07:29.480
Yeah, it's more like algorithm.
link |
00:07:30.720
Exactly, it's always algorithmic.
link |
00:07:32.640
So again, you kind of mentioned the Google self driving car,
link |
00:07:36.720
but outside of that,
link |
00:07:39.920
what's like who or what is your favorite robot,
link |
00:07:44.000
real or fictional that like captivated
link |
00:07:46.520
your imagination throughout?
link |
00:07:48.360
I mean, I guess you kind of alluded
link |
00:07:49.900
to the Google self drive,
link |
00:07:51.440
the Firefly was a magical moment,
link |
00:07:53.620
but is there something else?
link |
00:07:54.880
It wasn't the Firefly there,
link |
00:07:56.220
I think it was the Lexus by the way.
link |
00:07:58.000
This was back then.
link |
00:07:59.660
But yeah, so good question.
link |
00:08:02.720
Okay, my favorite fictional robot is WALL-E.
link |
00:08:08.800
And I love how amazingly expressive it is.
link |
00:08:15.000
I personally think a little bit
link |
00:08:16.040
about expressive motion, kind of the things you were saying, with,
link |
00:08:18.400
you can do this and it's a head and it's the manipulator
link |
00:08:20.800
and what does it all mean?
link |
00:08:22.840
I like to think about that stuff.
link |
00:08:24.040
I love Pixar, I love animation.
link |
00:08:26.160
WALL-E has two big eyes, I think, or no?
link |
00:08:28.680
Yeah, it has these cameras and they move.
link |
00:08:34.600
So yeah, it goes and then it's super cute.
link |
00:08:38.860
Yeah, the way it moves is just so expressive,
link |
00:08:41.480
the timing of that motion,
link |
00:08:43.280
what it's doing with its arms
link |
00:08:44.760
and what it's doing with these lenses is amazing.
link |
00:08:48.280
And so I've really liked that from the start.
link |
00:08:53.360
And then on top of that, sometimes I share this,
link |
00:08:56.440
it's a personal story I share with people
link |
00:08:58.120
or when I teach about AI or whatnot.
link |
00:09:01.160
My husband proposed to me by building a WALL-E
link |
00:09:07.040
and he actuated it.
link |
00:09:09.700
So it's seven degrees of freedom, including the lens thing.
link |
00:09:13.520
And it kind of came in and it had the,
link |
00:09:17.960
he made it have like the belly box opening thing.
link |
00:09:21.880
So it just did that.
link |
00:09:23.520
And then it spewed out this box made out of Legos
link |
00:09:27.600
that opened slowly and then bam, yeah.
link |
00:09:31.200
Yeah, it was quite, it set a bar.
link |
00:09:34.360
That could be like the most impressive thing I've ever heard.
link |
00:09:37.620
Okay.
link |
00:09:39.080
That was a special connection to WALL-E, long story short.
link |
00:09:40.980
I like WALL-E because I like animation and I like robots
link |
00:09:43.760
and I like the fact that this was,
link |
00:09:46.920
we still have this robot to this day.
link |
00:09:49.880
How hard is that problem,
link |
00:09:50.920
do you think of the expressivity of robots?
link |
00:09:54.260
Like with the Boston Dynamics, I never talked to those folks
link |
00:09:59.000
about this particular element.
link |
00:10:00.360
I've talked to them a lot,
link |
00:10:02.120
but it seems to be like almost an accidental side effect
link |
00:10:05.320
for them that they weren't,
link |
00:10:07.480
I don't know if they're faking it.
link |
00:10:08.720
They weren't trying to, okay.
link |
00:10:11.740
They do say that the gripper,
link |
00:10:14.240
it was not intended to be a face.
link |
00:10:17.920
I don't know if that's an honest statement,
link |
00:10:20.400
but I think they're legitimate.
link |
00:10:21.720
Probably yes. And so do we automatically just
link |
00:10:25.720
anthropomorphize anything we can see about a robot?
link |
00:10:29.320
So like the question is,
link |
00:10:30.720
how hard is it to create a WALL-E type robot
link |
00:10:33.680
that connects so deeply with us humans?
link |
00:10:35.360
What do you think?
link |
00:10:36.760
It's really hard, right?
link |
00:10:37.880
So it depends on what setting.
link |
00:10:39.980
So if you wanna do it in this very particular narrow setting
link |
00:10:45.760
where it does only one thing and it's expressive,
link |
00:10:48.200
then you can get an animator, you know,
link |
00:10:50.120
you can have Pixar on call come in,
link |
00:10:52.100
design some trajectories.
link |
00:10:53.520
There was a, Anki had a robot called Cozmo
link |
00:10:56.040
where they put in some of these animations.
link |
00:10:58.360
That part is easy, right?
link |
00:11:00.520
The hard part is doing it not via these
link |
00:11:04.320
kind of handcrafted behaviors,
link |
00:11:06.480
but doing it generally autonomously.
link |
00:11:09.820
Like I want robots, I don't work on,
link |
00:11:12.040
just to clarify, I don't, I used to work a lot on this.
link |
00:11:14.680
I don't work on that quite as much these days,
link |
00:11:17.360
but the notion of having robots that, you know,
link |
00:11:21.720
when they pick something up and put it in a place,
link |
00:11:24.320
they can do that with various forms of style,
link |
00:11:28.160
or you can say, well, this robot is, you know,
link |
00:11:30.200
succeeding at this task and is confident
link |
00:11:32.000
versus it's hesitant versus, you know,
link |
00:11:34.080
maybe it's happy or it's, you know,
link |
00:11:35.920
disappointed about something, some failure that it had.
link |
00:11:38.800
I think that when robots move,
link |
00:11:42.880
they can communicate so much about internal states
link |
00:11:46.840
or perceived internal states that they have.
link |
00:11:49.800
And I think that's really useful
link |
00:11:53.320
and an element that we'll want in the future
link |
00:11:55.520
because I was reading this article
link |
00:11:58.080
about how kids are,
link |
00:12:04.120
kids are being rude to Alexa
link |
00:12:07.360
because they can be rude to it
link |
00:12:09.680
and it doesn't really get angry, right?
link |
00:12:11.560
It doesn't reply in any way, it just says the same thing.
link |
00:12:15.200
So I think there's, at least for that,
link |
00:12:17.560
for the correct development of children,
link |
00:12:20.040
it's important that these things,
link |
00:12:21.480
you know, kind of react differently.
link |
00:12:22.920
I also think, you know, you walk in your home
link |
00:12:24.600
and you have a personal robot and if you're really pissed,
link |
00:12:27.160
presumably the robot should kind of behave
link |
00:12:28.880
slightly differently than when you're super happy
link |
00:12:31.320
and excited, but it's really hard because it's,
link |
00:12:36.020
I don't know, you know, the way I would think about it
link |
00:12:38.720
and the way I thought about it when it came to
link |
00:12:40.840
expressing goals or intentions for robots,
link |
00:12:44.080
it's, well, what's really happening is that
link |
00:12:47.440
instead of doing robotics where you have your state
link |
00:12:51.520
and you have your action space and you have
link |
00:12:55.600
the reward function that you're trying to optimize,
link |
00:12:57.840
now you kind of have to expand the notion of state
link |
00:13:00.560
to include this human internal state.
link |
00:13:02.780
What is the person actually perceiving?
link |
00:13:05.920
What do they think about the robot's
link |
00:13:08.600
something or other?
link |
00:13:10.160
and then you have to optimize in that system.
link |
00:13:12.760
And so that means that you have to understand
link |
00:13:14.120
how your motion, your actions end up sort of influencing
link |
00:13:17.960
the observer's kind of perception of you.
link |
00:13:20.980
And it's very hard to write math about that.
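One way to sketch the formulation she's describing, in rough notation of my own rather than anything from the conversation: augment the physical state $s_t$ with the human's internal state $\theta_{H,t}$ (what they perceive and believe about the robot), model how the robot's actions $a^R_t$ influence that internal state, and optimize in the joint system:

$$
\tilde{s}_t = (s_t,\ \theta_{H,t}), \qquad
\theta_{H,t+1} \sim P\big(\theta_{H,t+1} \mid \theta_{H,t},\ a^R_t\big), \qquad
\pi_R^\star = \arg\max_{\pi_R}\ \mathbb{E}\Big[\textstyle\sum_t r\big(s_t,\ \theta_{H,t},\ a^R_t\big)\Big].
$$

The hard part she points to is the middle term: writing down how the robot's motion actually changes the observer's perception.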
link |
00:13:25.040
Right, so when you start to think about
link |
00:13:27.140
incorporating the human into the state model,
link |
00:13:31.560
apologize for the philosophical question,
link |
00:13:33.680
but how complicated are human beings, do you think?
link |
00:13:36.440
Like, can they be reduced to a kind of
link |
00:13:40.740
almost like an object that moves
link |
00:13:43.740
and maybe has some basic intents?
link |
00:13:46.160
Or is there something, do we have to model things like mood
link |
00:13:50.060
and general aggressiveness and time?
link |
00:13:52.780
I mean, all these kinds of human qualities
link |
00:13:54.980
or like game theoretic qualities, like what's your sense?
link |
00:13:58.780
How complicated is...
link |
00:14:00.140
How hard is the problem of human robot interaction?
link |
00:14:03.340
Yeah, should we talk about
link |
00:14:05.260
what the problem of human robot interaction is?
link |
00:14:07.780
Yeah, what is human robot interaction?
link |
00:14:10.860
And then talk about how that, yeah.
link |
00:14:12.300
So, and by the way, I'm gonna talk about
link |
00:14:15.020
this very particular view of human robot interaction, right?
link |
00:14:19.060
Which is not so much on the social side
link |
00:14:21.620
or on the side of how do you have a good conversation
link |
00:14:24.540
with the robot, what should the robot's appearance be?
link |
00:14:26.780
It turns out that if you make robots taller versus shorter,
link |
00:14:29.220
this has an effect on how people act with them.
link |
00:14:31.900
So I'm not talking about that.
link |
00:14:34.660
But I'm talking about this very kind of narrow thing,
link |
00:14:36.260
which is you take, if you wanna take a task
link |
00:14:39.900
that a robot can do in isolation,
link |
00:14:42.860
in a lab or out there in the world, but in isolation,
link |
00:14:46.580
and now you're asking what does it mean for the robot
link |
00:14:49.740
to be able to do this task for,
link |
00:14:52.580
presumably what its actual end goal is,
link |
00:14:54.300
which is to help some person.
link |
00:14:56.740
That ends up changing the problem in two ways.
link |
00:15:02.940
The first way it changes the problem is that
link |
00:15:04.700
the robot is no longer the single agent acting.
link |
00:15:08.580
That you have humans who also take actions
link |
00:15:10.980
in that same space.
link |
00:15:12.140
Cars navigating around people, robots around an office,
link |
00:15:15.300
navigating around the people in that office.
link |
00:15:18.580
If I send the robot over there in the cafeteria
link |
00:15:20.900
to get me a coffee, then there's probably other people
link |
00:15:23.580
reaching for stuff in the same space.
link |
00:15:25.340
And so now you have your robot and you're in charge
link |
00:15:28.580
of the actions that the robot is taking.
link |
00:15:30.580
Then you have these people who are also making decisions
link |
00:15:33.500
and taking actions in that same space.
link |
00:15:36.260
And even if, you know, the robot knows what it should do
link |
00:15:39.140
and all of that, just coexisting with these people, right?
link |
00:15:42.740
Kind of getting the actions to gel well,
link |
00:15:45.340
to mesh well together.
link |
00:15:47.100
That's sort of the kind of problem number one.
link |
00:15:50.500
And then there's problem number two,
link |
00:15:51.660
which is, goes back to this notion of if I'm a programmer,
link |
00:15:58.220
I can specify some objective for the robot
link |
00:16:00.900
to go off and optimize and specify the task.
link |
00:16:03.820
But if I put the robot in your home,
link |
00:16:07.340
presumably you might have your own opinions about,
link |
00:16:11.420
well, okay, I want my house clean,
link |
00:16:12.860
but how do I want it cleaned?
link |
00:16:14.060
And how should the robot move, how close to me it should come
link |
00:16:16.340
and all of that.
link |
00:16:17.340
And so I think those are the two differences that you have.
link |
00:16:20.380
You're acting around people and what you should be
link |
00:16:24.940
optimizing for should satisfy the preferences
link |
00:16:27.500
of that end user, not of your programmer who programmed you.
link |
00:16:30.860
Yeah, and the preferences thing is tricky.
link |
00:16:33.780
So figuring out those preferences,
link |
00:16:35.700
be able to interactively adjust
link |
00:16:38.340
to understand what the human is doing.
link |
00:16:39.860
So really it boils down to understand the humans
link |
00:16:42.260
in order to interact with them and in order to please them.
link |
00:16:45.860
Right.
link |
00:16:47.100
So why is this hard?
link |
00:16:48.420
Yeah, why is understanding humans hard?
link |
00:16:51.100
So I think there's two tasks about understanding humans
link |
00:16:57.980
that in my mind are very, very similar,
link |
00:16:59.940
but not everyone agrees.
link |
00:17:00.980
So there's the task of being able to just anticipate
link |
00:17:04.460
what people will do.
link |
00:17:05.740
We all know that cars need to do this, right?
link |
00:17:07.620
We all know that, well, if I navigate around some people,
link |
00:17:10.580
the robot has to get some notion of,
link |
00:17:12.580
okay, where is this person gonna be?
link |
00:17:15.500
So that's kind of the prediction side.
link |
00:17:17.340
And then there's what you were saying,
link |
00:17:19.260
satisfying the preferences, right?
link |
00:17:21.060
So adapting to the person's preferences,
link |
00:17:22.820
knowing what to optimize for,
link |
00:17:24.500
which is more this inference side,
link |
00:17:25.900
this what does this person want?
link |
00:17:28.820
What is their intent? What are their preferences?
link |
00:17:31.580
And to me, those kind of go together
link |
00:17:35.100
because I think that at the very least,
link |
00:17:39.700
if you can understand, if you can look at human behavior
link |
00:17:42.980
and understand what it is that they want,
link |
00:17:45.500
then that's sort of the key enabler
link |
00:17:47.380
to being able to anticipate what they'll do in the future.
link |
00:17:50.660
Because I think that we're not arbitrary.
link |
00:17:53.580
We make these decisions that we make,
link |
00:17:55.380
we act in the way we do
link |
00:17:56.940
because we're trying to achieve certain things.
link |
00:17:59.340
And so I think that's the relationship between them.
link |
00:18:01.540
Now, how complicated do these models need to be
link |
00:18:05.540
in order to be able to understand what people want?
link |
00:18:10.140
So we've gotten a long way in robotics
link |
00:18:15.180
with something called inverse reinforcement learning,
link |
00:18:17.540
which is the notion of if someone acts,
link |
00:18:19.500
demonstrates how they want the thing done.
link |
00:18:22.100
What is inverse reinforcement learning?
link |
00:18:24.220
You just briefly said it.
link |
00:18:25.220
Right, so it's the problem of take human behavior
link |
00:18:30.220
and infer reward function from this.
link |
00:18:33.260
So figure out what it is
link |
00:18:34.500
that that behavior is optimal with respect to.
link |
00:18:37.420
And it's a great way to think
link |
00:18:38.700
about learning human preferences
link |
00:18:40.260
in the sense of you have a car and the person can drive it
link |
00:18:45.300
and then you can say, well, okay,
link |
00:18:46.900
I can actually learn what the person is optimizing for.
link |
00:18:51.940
I can learn their driving style,
link |
00:18:53.460
or you can have people demonstrate
link |
00:18:55.620
how they want the house clean.
link |
00:18:57.300
And then you can say, okay, this is,
link |
00:18:59.820
I'm getting the trade offs that they're making.
link |
00:19:02.980
I'm getting the preferences that they want out of this.
link |
00:19:06.140
And so we've been successful in robotics somewhat with this.
link |
00:19:10.300
And it's based on a very simple model of human behavior.
link |
00:19:15.020
It was remarkably simple,
link |
00:19:16.340
which is that human behavior is optimal
link |
00:19:18.660
with respect to whatever it is that people want, right?
link |
00:19:22.020
So you make that assumption
link |
00:19:23.100
and now you can kind of inverse through.
link |
00:19:24.380
That's why it's called inverse,
link |
00:19:25.900
well, really optimal control,
link |
00:19:27.220
but also inverse reinforcement learning.
link |
00:19:30.540
So this is based on utility maximization in economics.
link |
00:19:36.460
Back in the forties, von Neumann and Morgenstern
link |
00:19:39.500
were like, okay, people are making choices
link |
00:19:43.020
by maximizing utility, go.
link |
00:19:45.740
And then in the late fifties,
link |
00:19:48.380
we had Luce and Shepard come in and say,
link |
00:19:52.460
people are a little bit noisy and approximate in that process.
link |
00:19:57.860
So they might choose something kind of stochastically
link |
00:20:01.580
with probability proportional to
link |
00:20:03.940
how much utility something has.
link |
00:20:07.060
So there's a bit of noise in there.
link |
00:20:09.620
This has translated into robotics
link |
00:20:11.740
and something that we call Boltzmann rationality.
link |
00:20:14.180
So it's a kind of an evolution
link |
00:20:15.700
of inverse reinforcement learning
link |
00:20:16.780
that accounts for human noise.
link |
00:20:19.620
And we've had some success with that too,
link |
00:20:21.980
for these tasks where it turns out
link |
00:20:23.860
people act noisily enough that you can't just do vanilla,
link |
00:20:28.340
the vanilla version.
link |
00:20:29.900
You can account for noise
link |
00:20:31.020
and still infer what they seem to want based on this.
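A minimal sketch of the Boltzmann-rational inference just described, as a toy Python example; the two features (path length, obstacle clearance), the candidate-trajectory setup, and the numbers are made up for illustration, not taken from any particular system:

```python
import numpy as np

# Toy inverse reinforcement learning with a Boltzmann-rational human model:
# the human picks among a few candidate trajectories, and we infer a weight
# vector theta over hand-designed features from their (noisy) choices.

def features(traj):
    # hypothetical 2-D feature vector: path length and obstacle clearance
    return np.array([traj["length"], traj["clearance"]])

def boltzmann_probs(trajs, theta, beta=1.0):
    # P(traj) proportional to exp(beta * theta . phi(traj)):
    # higher-utility options are chosen more often, but noisily.
    utils = np.array([theta @ features(t) for t in trajs])
    expu = np.exp(beta * (utils - utils.max()))
    return expu / expu.sum()

def irl_log_likelihood(theta, demos, beta=1.0):
    # demos: list of (candidate_trajectories, index_chosen_by_human)
    ll = 0.0
    for trajs, chosen in demos:
        ll += np.log(boltzmann_probs(trajs, theta, beta)[chosen])
    return ll

def infer_theta(demos, candidates):
    # brute-force search over a small grid of candidate reward weights
    return max(candidates, key=lambda th: irl_log_likelihood(th, demos))

# The human chose the longer trajectory with more clearance.
demos = [([{"length": 3.0, "clearance": 1.0},
           {"length": 5.0, "clearance": 2.0}], 1)]
grid = [np.array([wl, wc]) for wl in (-1.0, 0.0) for wc in (0.0, 1.0)]
print(infer_theta(demos, grid))  # picks weights that favor clearance over length
```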
link |
00:20:36.460
Then now we're hitting tasks where that's not enough.
link |
00:20:39.940
And because...
link |
00:20:41.260
What are examples of such tasks?
link |
00:20:43.620
So imagine you're trying to control some robot,
link |
00:20:45.900
that's fairly complicated.
link |
00:20:47.820
You're trying to control a robot arm
link |
00:20:49.220
because maybe you're a patient with a motor impairment
link |
00:20:52.580
and you have this wheelchair mounted arm
link |
00:20:53.860
and you're trying to control it around.
link |
00:20:56.260
Or one task that we've looked at with Sergey is,
link |
00:21:00.700
and our students did, is a lunar lander.
link |
00:21:02.860
So I don't know if you know this Atari game,
link |
00:21:05.060
it's called Lunar Lander.
link |
00:21:06.820
It's really hard.
link |
00:21:07.660
People really suck at landing the thing.
link |
00:21:09.740
Mostly they just crash it left and right.
link |
00:21:11.860
Okay, so this is the kind of task we imagine
link |
00:21:14.300
you're trying to provide some assistance
link |
00:21:16.980
to a person operating such a robot
link |
00:21:20.180
where you want the kind of the autonomy to kick in,
link |
00:21:21.980
figure out what it is that you're trying to do
link |
00:21:23.460
and help you do it.
link |
00:21:25.900
It's really hard to do that for, say, Lunar Lander
link |
00:21:30.700
because people are all over the place.
link |
00:21:32.940
And so they seem more than just noisy, really irrational.
link |
00:21:36.700
That's an example of a task
link |
00:21:37.900
where these models are kind of failing us.
link |
00:21:41.220
And it's not surprising because
link |
00:21:43.500
we're talking about the 40s, utility, late 50s,
link |
00:21:47.020
sort of noisy.
link |
00:21:48.900
Then the 70s came and behavioral economics
link |
00:21:52.340
started being a thing where people were like,
link |
00:21:54.620
no, no, no, no, no, people are not rational.
link |
00:21:58.140
People are messy and emotional and irrational
link |
00:22:03.300
and have all sorts of heuristics
link |
00:22:05.340
that might be domain specific.
link |
00:22:06.980
And they're just a mess.
link |
00:22:08.580
The mess.
link |
00:22:09.420
So what does my robot do to understand
link |
00:22:13.180
what you want?
link |
00:22:14.740
And it's a very, it's very, that's why it's complicated.
link |
00:22:18.020
It's, you know, for the most part,
link |
00:22:19.580
we get away with pretty simple models until we don't.
link |
00:22:23.300
And then the question is, what do you do then?
link |
00:22:26.580
And I had days when I wanted to, you know,
link |
00:22:30.180
pack my bags and go home and switch jobs
link |
00:22:32.780
because it's just, it feels really daunting
link |
00:22:35.020
to make sense of human behavior enough
link |
00:22:37.300
that you can reliably understand what people want,
link |
00:22:40.540
especially as, you know,
link |
00:22:41.380
robot capabilities will continue to get developed.
link |
00:22:44.940
You'll get these systems that are more and more capable
link |
00:22:47.180
of all sorts of things.
link |
00:22:48.060
And then you really want to make sure
link |
00:22:49.140
that you're telling them the right thing to do.
link |
00:22:51.500
What is that thing?
link |
00:22:52.620
Well, read it in human behavior.
link |
00:22:56.100
So if I just sat here quietly
link |
00:22:58.460
and tried to understand something about you
link |
00:23:00.380
by listening to you talk,
link |
00:23:02.140
it would be harder than if I got to say something
link |
00:23:06.140
and ask you and interact and control.
link |
00:23:08.780
Can you, can the robot help its understanding of the human
link |
00:23:13.140
by influencing the behavior by actually acting?
link |
00:23:18.540
Yeah, absolutely.
link |
00:23:19.780
So one of the things that's been exciting to me lately
link |
00:23:23.660
is this notion that when you try to,
link |
00:23:28.780
that when you try to think of the robotics problem as,
link |
00:23:31.940
okay, I have a robot and it needs to optimize
link |
00:23:34.500
for whatever it is that a person wants it to optimize
link |
00:23:37.540
as opposed to maybe what a programmer said.
link |
00:23:40.700
That problem we think of as a human robot
link |
00:23:44.700
collaboration problem in which both agents get to act
link |
00:23:49.140
in which the robot knows less than the human
link |
00:23:52.300
because the human actually has access to,
link |
00:23:54.660
you know, at least implicitly to what it is that they want.
link |
00:23:57.220
They can't write it down, but they can talk about it.
link |
00:24:00.660
They can give all sorts of signals.
link |
00:24:02.300
They can demonstrate and,
link |
00:24:04.460
but the robot doesn't need to sit there
link |
00:24:06.540
and passively observe human behavior
link |
00:24:08.780
and try to make sense of it.
link |
00:24:10.100
The robot can act too.
link |
00:24:11.900
And so there's these information gathering actions
link |
00:24:15.380
that the robot can take to sort of solicit responses
link |
00:24:19.020
that are actually informative.
link |
00:24:21.060
So for instance, this is not for the purpose
link |
00:24:22.980
of assisting people, but kind of back to coordinating
link |
00:24:25.580
with people in cars and all of that.
link |
00:24:27.420
One thing that Dorsa did was,
link |
00:24:31.860
so we were looking at cars being able to navigate
link |
00:24:34.260
around people and you might not know exactly
link |
00:24:39.500
the driving style of a particular individual
link |
00:24:41.860
that's next to you,
link |
00:24:43.020
but you wanna change lanes in front of them.
link |
00:24:45.260
Navigating around other humans inside cars.
link |
00:24:48.780
Yeah, good, good clarification question.
link |
00:24:50.940
So you have an autonomous car and it's trying to navigate
link |
00:24:55.860
the road around human driven vehicles.
link |
00:24:58.980
Similar ideas apply to pedestrians as well,
link |
00:25:01.620
but let's just take human driven vehicles.
link |
00:25:03.900
So now you're trying to change a lane.
link |
00:25:06.220
Well, you could be trying to infer the driving style
link |
00:25:10.460
of this person next to you.
link |
00:25:12.180
You'd like to know if they're in particular,
link |
00:25:13.780
if they're sort of aggressive or defensive,
link |
00:25:15.940
if they're gonna let you kind of go in
link |
00:25:18.020
or if they're gonna not.
link |
00:25:20.300
And it's very difficult to just,
link |
00:25:25.900
if you think that if you wanna hedge your bets
link |
00:25:27.940
and say, ah, maybe they're actually pretty aggressive,
link |
00:25:30.340
I shouldn't try this.
link |
00:25:31.580
You kind of end up driving next to them
link |
00:25:33.420
and driving next to them, right?
link |
00:25:34.860
And then you don't know
link |
00:25:36.460
because you're not actually getting the observations
link |
00:25:39.380
that you need. The way
link |
00:25:40.220
someone drives when they're next to you
link |
00:25:42.620
and they just need to go straight.
link |
00:25:44.420
It's kind of the same
link |
00:25:45.260
regardless if they're aggressive or defensive.
link |
00:25:47.460
And so you need to enable the robot
link |
00:25:51.020
to reason about how it might actually be able
link |
00:25:54.220
to gather information by changing the actions
link |
00:25:57.020
that it's taking.
link |
00:25:58.140
And then the robot comes up with these cool things
link |
00:25:59.940
where it kind of nudges towards you
link |
00:26:02.580
and then sees if you're gonna slow down or not.
link |
00:26:05.260
Then if you slow down,
link |
00:26:06.260
it sort of updates its model of you
link |
00:26:07.940
and says, oh, okay, you're more on the defensive side.
link |
00:26:11.340
So now I can actually like.
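A toy sketch of that kind of information-gathering action selection, with made-up numbers; this illustrates the idea rather than the actual model from that work. The robot holds a belief over the neighboring driver's style and scores each candidate action by how much it is expected to shrink the uncertainty in that belief:

```python
import numpy as np

STYLES = ["aggressive", "defensive"]

# Assumed observation model: probability the other driver slows down,
# given the robot's action and the driver's hidden style.
P_SLOW = {
    ("stay",  "aggressive"): 0.1, ("stay",  "defensive"): 0.15,
    ("nudge", "aggressive"): 0.2, ("nudge", "defensive"): 0.8,
}

def entropy(belief):
    p = np.array([belief[s] for s in STYLES])
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def posterior(belief, action, slowed):
    # Bayes update of the style belief after observing the driver's response.
    post = {}
    for s in STYLES:
        like = P_SLOW[(action, s)] if slowed else 1 - P_SLOW[(action, s)]
        post[s] = like * belief[s]
    z = sum(post.values())
    return {s: post[s] / z for s in STYLES}

def expected_info_gain(belief, action):
    # Expected entropy reduction over the two possible responses.
    gain = 0.0
    for slowed in (True, False):
        p_obs = sum(
            (P_SLOW[(action, s)] if slowed else 1 - P_SLOW[(action, s)]) * belief[s]
            for s in STYLES
        )
        gain += p_obs * (entropy(belief) - entropy(posterior(belief, action, slowed)))
    return gain

belief = {"aggressive": 0.5, "defensive": 0.5}
best = max(["stay", "nudge"], key=lambda a: expected_info_gain(belief, a))
print(best)  # "nudge": it separates the two styles far better than staying put
```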
link |
00:26:12.740
That's a fascinating dance.
link |
00:26:14.340
That's so cool that you could use your own actions
link |
00:26:18.100
to gather information.
link |
00:26:19.380
That feels like a totally open,
link |
00:26:22.380
exciting new world of robotics.
link |
00:26:24.380
I mean, how many people are even thinking
link |
00:26:26.100
about that kind of thing?
link |
00:26:28.660
A handful of us, I'd say.
link |
00:26:30.260
It's rare because it's actually leveraging humans.
link |
00:26:33.380
I mean, most roboticists,
link |
00:26:34.620
I've talked to a lot of colleagues and so on,
link |
00:26:38.220
are kind of, being honest, kind of afraid of humans.
link |
00:26:42.980
Because they're messy and complicated, right?
link |
00:26:45.460
I understand.
link |
00:26:47.900
Going back to what we were talking about earlier,
link |
00:26:49.820
right now we're kind of in this dilemma of, okay,
link |
00:26:52.500
there are tasks that we can just assume
link |
00:26:54.020
people are approximately rational for
link |
00:26:55.700
and we can figure out what they want.
link |
00:26:57.140
We can figure out their goals.
link |
00:26:57.980
We can figure out their driving styles, whatever.
link |
00:26:59.740
Cool.
link |
00:27:00.580
There are these tasks that we can't.
link |
00:27:02.860
So what do we do, right?
link |
00:27:03.980
Do we pack our bags and go home?
link |
00:27:06.060
And this one, I've had a little bit of hope recently.
link |
00:27:12.340
And I'm kind of doubting myself
link |
00:27:13.740
because what do I know that, you know,
link |
00:27:15.500
50 years of behavioral economics hasn't figured out.
link |
00:27:19.620
But maybe it's not really in contradiction
link |
00:27:21.500
with the way that field is headed.
link |
00:27:23.940
But basically one thing that we've been thinking about is,
link |
00:27:27.980
instead of kind of giving up and saying
link |
00:27:30.180
people are too crazy and irrational
link |
00:27:32.020
for us to make sense of them,
link |
00:27:34.460
maybe we can give them a bit the benefit of the doubt.
link |
00:27:39.380
And maybe we can think of them
link |
00:27:41.420
as actually being relatively rational,
link |
00:27:43.980
but just under different assumptions about the world,
link |
00:27:48.980
about how the world works, about, you know,
link |
00:27:51.580
they don't have, when we think about rationality,
link |
00:27:54.100
implicit assumption is, oh, they're rational,
link |
00:27:56.500
under all the same assumptions and constraints
link |
00:27:58.580
as the robot, right?
link |
00:27:59.940
What, if this is the state of the world,
link |
00:28:01.820
that's what they know.
link |
00:28:02.740
This is the transition function, that's what they know.
link |
00:28:05.140
This is the horizon, that's what they know.
link |
00:28:07.380
But maybe the kind of this difference,
link |
00:28:11.060
the way, the reason they can seem a little messy
link |
00:28:13.820
and hectic, especially to robots,
link |
00:28:16.500
is that perhaps they just make different assumptions
link |
00:28:20.060
or have different beliefs.
link |
00:28:21.660
Yeah, I mean, that's another fascinating idea
link |
00:28:24.820
that this, our kind of anecdotal desire
link |
00:28:29.060
to say that humans are irrational,
link |
00:28:31.060
perhaps grounded in behavioral economics,
link |
00:28:33.300
is that we just don't understand the constraints
link |
00:28:36.420
and the rewards under which they operate.
link |
00:28:38.300
And so our goal shouldn't be to throw our hands up
link |
00:28:40.980
and say they're irrational,
link |
00:28:42.420
it's to say, let's try to understand
link |
00:28:44.940
what are the constraints.
link |
00:28:46.420
What it is that they must be assuming
link |
00:28:48.420
that makes this behavior make sense.
link |
00:28:51.140
Good life lesson, right?
link |
00:28:52.620
Good life lesson.
link |
00:28:53.460
That's true, it's just outside of robotics.
link |
00:28:55.580
That's just good to, that's communicating with humans.
link |
00:28:58.500
That's just a good assume
link |
00:29:00.780
that you just don't, sort of empathy, right?
link |
00:29:03.340
It's a...
link |
00:29:04.420
This is maybe there's something you're missing
link |
00:29:06.020
and it's, you know, it especially happens to robots
link |
00:29:08.580
cause they're kind of dumb and they don't know things.
link |
00:29:10.220
And oftentimes people are sort of supra rational
link |
00:29:12.740
and that they actually know a lot of things
link |
00:29:14.300
that robots don't.
link |
00:29:15.420
Sometimes like with the lunar lander,
link |
00:29:17.860
the robot, you know, knows much more.
link |
00:29:20.540
So it turns out that if you try to say,
link |
00:29:23.980
look, maybe people are operating this thing
link |
00:29:26.940
but assuming a much more simplified physics model
link |
00:29:31.100
cause they don't get the complexity of this kind of craft
link |
00:29:33.900
or the robot arm with seven degrees of freedom
link |
00:29:36.100
with these inertias and whatever.
link |
00:29:38.420
So maybe they have this intuitive physics model
link |
00:29:41.580
which is not, you know, this notion of intuitive physics
link |
00:29:44.260
is something that's studied actually in cognitive science,
link |
00:29:46.620
like Josh Tenenbaum and Tom Griffiths' work on this stuff.
link |
00:29:49.900
And what we found is that you can actually try
link |
00:29:54.700
to figure out what physics model
link |
00:29:58.420
kind of best explains human actions.
link |
00:30:01.380
And then you can use that to sort of correct what it is
link |
00:30:06.460
that they're commanding the craft to do.
link |
00:30:08.820
So they might, you know, be sending the craft somewhere
link |
00:30:11.420
but instead of executing that action,
link |
00:30:13.340
you can sort of take a step back and say,
link |
00:30:15.260
according to their intuitive,
link |
00:30:16.900
if the world worked according to their intuitive physics
link |
00:30:20.100
model, where do they think that the craft is going?
link |
00:30:23.620
Where are they trying to send it to?
link |
00:30:26.020
And then you can use the real physics, right?
link |
00:30:28.620
The inverse of that to actually figure out
link |
00:30:30.220
what you should do so that you do that
link |
00:30:31.540
instead of where they were actually sending you
link |
00:30:33.380
in the real world.
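A rough sketch of that correction step in Python, with an assumed one-dimensional toy dynamics rather than the real lunar lander: interpret the person's command under their simplified internal physics to recover the intended outcome, then pick the command that achieves that outcome under the true physics:

```python
import numpy as np

def internal_model(state, u):
    # What the person seems to assume: thrust moves the craft directly,
    # ignoring momentum.  state = (position, velocity).
    pos, vel = state
    return pos + u, 0.0

def true_model(state, u):
    # The real dynamics: thrust changes velocity, and momentum carries over.
    pos, vel = state
    new_vel = vel + u
    return pos + new_vel, new_vel

def assist(state, human_u, candidate_us):
    # 1. Under the person's intuitive physics, where do they think they'll end up?
    intended_pos, _ = internal_model(state, human_u)
    # 2. Under the real physics, which command gets closest to that intention?
    def landing_error(u):
        pos, _ = true_model(state, u)
        return abs(pos - intended_pos)
    return min(candidate_us, key=landing_error)

state = (0.0, 2.0)                       # drifting right at 2 units per step
u = assist(state, human_u=0.5, candidate_us=np.linspace(-3, 3, 61))
print(u)  # roughly -1.5: cancels the drift that the person's model ignores
```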
link |
00:30:34.820
And I kid you not, it works, people land the damn thing
link |
00:30:38.300
and you know, in between the two flags and all that.
link |
00:30:42.460
So it's not conclusive in any way
link |
00:30:45.180
but I'd say it's evidence that yeah,
link |
00:30:47.300
maybe we're kind of underestimating humans in some ways
link |
00:30:50.420
when we're giving up and saying,
link |
00:30:51.620
yeah, they're just crazy noisy.
link |
00:30:53.220
So then you try to explicitly try to model
link |
00:30:56.300
the kind of worldview that they have.
link |
00:30:58.140
That they have, that's right.
link |
00:30:59.620
That's right.
link |
00:31:00.460
And it's not too, I mean,
link |
00:31:02.260
there's things in behavior economics too
link |
00:31:03.620
that for instance have touched upon the planning horizon.
link |
00:31:06.940
So there's this idea that there's bounded rationality
link |
00:31:09.900
essentially and the idea that, well,
link |
00:31:11.380
maybe we work under computational constraints.
link |
00:31:13.660
And I think kind of our view recently has been
link |
00:31:17.020
take the Bellman update in AI
link |
00:31:19.740
and just break it in all sorts of ways by saying state,
link |
00:31:22.580
no, no, no, the person doesn't get to see the real state.
link |
00:31:25.020
Maybe they're estimating somehow.
link |
00:31:26.540
Transition function, no, no, no, no, no.
link |
00:31:28.860
Even the actual reward evaluation,
link |
00:31:31.580
maybe they're still learning
link |
00:31:32.740
about what it is that they want.
link |
00:31:34.860
Like, you know, when you watch Netflix
link |
00:31:37.740
and you know, you have all the things
link |
00:31:39.420
and then you have to pick something,
link |
00:31:41.700
imagine that, you know, the AI system interpreted
link |
00:31:46.180
that choice as this is the thing you prefer to see.
link |
00:31:48.860
Like, how are you going to know?
link |
00:31:49.700
You're still trying to figure out what you like,
link |
00:31:51.340
what you don't like, et cetera.
link |
00:31:52.620
So I think it's important to also account for that.
link |
00:31:55.540
So it's not irrationality,
link |
00:31:56.780
because they're doing the right thing
link |
00:31:58.100
under the things that they know.
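Written out as a sketch, in my notation, the contrast is between the textbook Bellman update, which assumes the person knows the true state, transition function, and reward, and a version where each of those is replaced by the person's own estimate, which they may still be refining:

$$
V(s) = \max_a \Big[r(s,a) + \gamma \sum_{s'} T(s' \mid s,a)\, V(s')\Big]
\;\;\longrightarrow\;\;
\hat V(\hat s) = \max_a \Big[\hat r(\hat s,a) + \gamma \sum_{\hat s'} \hat T(\hat s' \mid \hat s,a)\, \hat V(\hat s')\Big],
$$

with $\hat s$ the person's estimate of the state, $\hat T$ their internal model of how the world works, and $\hat r$ their current guess at what they actually want.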
link |
00:31:59.980
Yeah, that's brilliant.
link |
00:32:01.300
You mentioned recommender systems.
link |
00:32:03.260
What kind of, and we were talking
link |
00:32:05.340
about human robot interaction,
link |
00:32:07.140
what kind of problem spaces are you thinking about?
link |
00:32:10.820
So is it robots, like wheeled robots
link |
00:32:14.900
with autonomous vehicles?
link |
00:32:16.020
Is it object manipulation?
link |
00:32:18.580
Like when you think
link |
00:32:19.460
about human robot interaction in your mind,
link |
00:32:21.940
and maybe I'm sure you can speak
link |
00:32:24.460
for the entire community of human robot interaction.
link |
00:32:27.820
But like, what are the problems of interest here?
link |
00:32:30.540
And does it, you know, I kind of think
link |
00:32:34.500
of open domain dialogue as human robot interaction,
link |
00:32:40.860
and that happens not in the physical space,
link |
00:32:43.060
but it could just happen in the virtual space.
link |
00:32:46.380
So where's the boundaries of this field for you
link |
00:32:49.580
when you're thinking about the things
link |
00:32:50.780
we've been talking about?
link |
00:32:51.860
Yeah, so I try to find kind of underlying,
link |
00:33:00.740
I don't know what to even call them.
link |
00:33:02.500
I try to work on, you know, I might call what I do,
link |
00:33:05.060
the kind of working on the foundations
link |
00:33:07.620
of algorithmic human robot interaction
link |
00:33:09.580
and trying to make contributions there.
link |
00:33:12.780
And it's important to me that whatever we do
link |
00:33:15.940
is actually somewhat domain agnostic when it comes to,
link |
00:33:19.340
is it about, you know, autonomous cars
link |
00:33:23.980
or is it about quadrotors or is it about,
link |
00:33:27.780
is this sort of the same underlying principles apply?
link |
00:33:30.780
Of course, when you're trying to get
link |
00:33:31.660
a particular domain to work,
link |
00:33:32.900
you usually have to do some extra work
link |
00:33:34.260
to adapt that to that particular domain.
link |
00:33:36.580
But these things that we were talking about around,
link |
00:33:40.020
well, you know, how do you model humans?
link |
00:33:42.420
It turns out that a lot of systems
link |
00:33:44.260
could benefit from a better understanding
link |
00:33:47.260
of how human behavior relates to what people want
link |
00:33:50.940
and need to predict human behavior,
link |
00:33:53.540
physical robots of all sorts and beyond that.
link |
00:33:56.420
And so I used to do manipulation.
link |
00:33:58.540
I used to be, you know, picking up stuff
link |
00:34:00.620
and then I was picking up stuff with people around.
link |
00:34:03.340
And now it's sort of very broad
link |
00:34:05.940
when it comes to the application level,
link |
00:34:07.820
but in a sense, very focused on, okay,
link |
00:34:11.140
how does the problem need to change?
link |
00:34:14.060
How do the algorithms need to change
link |
00:34:15.860
when we're not doing a robot by itself?
link |
00:34:19.980
You know, emptying the dishwasher,
link |
00:34:21.380
but we're stepping outside of that.
link |
00:34:23.780
A thought that popped into my head just now.
link |
00:34:26.820
On the game theoretic side,
link |
00:34:27.860
I think you said this really interesting idea
link |
00:34:29.900
of using actions to gain more information.
link |
00:34:33.300
But if we think of sort of game theory,
link |
00:34:39.780
the humans that are interacting with you,
link |
00:34:43.420
with you, the robot?
link |
00:34:44.540
Wow, I'm taking on the identity of the robot.
link |
00:34:46.420
Yeah, I do that all the time.
link |
00:34:47.460
Yeah, is they also have a world model of you
link |
00:34:55.540
and you can manipulate that.
link |
00:34:57.420
I mean, if we look at autonomous vehicles,
link |
00:34:59.340
people have a certain viewpoint.
link |
00:35:01.420
You said with the kids, people see Alexa in a certain way.
link |
00:35:07.260
Is there some value in trying to also optimize
link |
00:35:10.860
how people see you as a robot?
link |
00:35:15.100
Or is that a little too far away from the specifics
link |
00:35:20.140
of what we can solve right now?
link |
00:35:21.620
So, well, both, right?
link |
00:35:24.340
So it's really interesting.
link |
00:35:26.300
And we've seen a little bit of progress on this problem,
link |
00:35:30.940
on pieces of this problem.
link |
00:35:32.340
So you can, again, it kind of comes down
link |
00:35:36.220
to how complicated does the human model need to be?
link |
00:35:38.260
But in one piece of work that we were looking at,
link |
00:35:42.300
we just said, okay, there's these parameters
link |
00:35:46.180
that are internal to the robot
link |
00:35:47.900
and what the robot is about to do,
link |
00:35:51.620
or maybe what objective,
link |
00:35:52.700
what driving style the robot has or something like that.
link |
00:35:55.260
And what we're gonna do is we're gonna set up a system
link |
00:35:58.180
where part of the state is the person's belief
link |
00:36:00.300
over those parameters.
link |
00:36:02.300
And now when the robot acts,
link |
00:36:05.180
that the person gets new evidence
link |
00:36:07.580
about this robot internal state.
link |
00:36:10.700
And so they're updating their mental model of the robot.
link |
00:36:13.700
So if they see a car that sort of cuts someone off,
link |
00:36:16.940
they're like, oh, that's an aggressive car.
link |
00:36:18.340
They know more.
link |
00:36:20.700
If they see sort of a robot head towards a particular door,
link |
00:36:24.100
they're like, oh yeah, the robot's trying to get
link |
00:36:25.500
to that door.
link |
00:36:26.340
So this thing that we have to do with humans
link |
00:36:27.980
to try and understand their goals and intentions,
link |
00:36:31.060
humans are inevitably gonna do that to robots.
link |
00:36:34.460
And then that raises this interesting question
link |
00:36:36.500
that you asked, which is, can we do something about that?
link |
00:36:38.860
This is gonna happen inevitably,
link |
00:36:40.220
but we can sort of be more confusing
link |
00:36:42.060
or less confusing to people.
link |
00:36:44.100
And it turns out you can optimize
link |
00:36:45.580
for being more informative and less confusing
link |
00:36:48.980
if you have an understanding of how your actions
link |
00:36:51.820
are being interpreted by the human,
link |
00:36:53.540
and how they're using these actions to update their belief.
link |
00:36:56.740
And honestly, all we did is just Bayes rule.
link |
00:36:59.700
Basically, okay, the person has a belief,
link |
00:37:02.980
they see an action, they make some assumptions
link |
00:37:04.820
about how the robot generates its actions,
link |
00:37:06.420
presumably as being rational,
link |
00:37:07.740
because robots are rational.
link |
00:37:09.180
It's reasonable to assume that about them.
link |
00:37:11.340
And then they incorporate that new piece of evidence
link |
00:37:17.300
in the Bayesian sense in their belief,
link |
00:37:19.380
and they obtain a posterior.
link |
00:37:20.700
And now the robot is trying to figure out
link |
00:37:23.020
what actions to take such that it steers
link |
00:37:25.180
the person's belief to put as much probability mass
link |
00:37:27.420
as possible on the correct parameters.
link |
00:37:31.260
So that's kind of a mathematical formalization of that.
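A minimal sketch of that Bayes-rule formalization, with hypothetical goals, actions, and numbers: the observer assumes the robot is noisily rational, updates a belief over which door the robot is heading to from each observed action, and the robot picks the action that puts the most posterior mass on its true goal:

```python
import numpy as np

GOALS = ["door_A", "door_B"]
ACTIONS = ["veer_left", "straight", "veer_right"]

# Assumed utility of each action for each goal (door A is to the left).
UTILITY = {
    "door_A": {"veer_left": 1.0, "straight": 0.5, "veer_right": 0.0},
    "door_B": {"veer_left": 0.0, "straight": 0.5, "veer_right": 1.0},
}

def action_likelihood(action, goal, beta=3.0):
    # Observer's model of the robot: noisily rational (softmax over utility).
    utils = np.array([UTILITY[goal][a] for a in ACTIONS])
    probs = np.exp(beta * utils) / np.exp(beta * utils).sum()
    return probs[ACTIONS.index(action)]

def observer_posterior(prior, action):
    # Bayes rule: prior belief over goals times likelihood of the observed action.
    post = {g: action_likelihood(action, g) * prior[g] for g in GOALS}
    z = sum(post.values())
    return {g: post[g] / z for g in GOALS}

prior = {"door_A": 0.5, "door_B": 0.5}
true_goal = "door_A"

# Pick the action that steers the observer's belief toward the true goal.
best = max(ACTIONS, key=lambda a: observer_posterior(prior, a)[true_goal])
print(best)  # "veer_left": exaggerating toward door A makes the goal clearest
```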
link |
00:37:33.940
But my worry, and I don't know if you wanna go there
link |
00:37:38.300
with me, but I talk about this quite a bit.
link |
00:37:44.140
The kids talking to Alexa disrespectfully worries me.
link |
00:37:49.500
I worry in general about human nature.
link |
00:37:52.260
Like I said, I grew up in Soviet Union, World War II,
link |
00:37:54.820
I'm a Jew too, so with the Holocaust and everything.
link |
00:37:58.180
I just worry about how we humans sometimes treat the other,
link |
00:38:02.540
the group that we call the other, whatever it is.
link |
00:38:05.100
Through human history, the group that's the other
link |
00:38:07.300
has been changing faces.
link |
00:38:09.580
But it seems like the robot will be the other, the other,
link |
00:38:13.900
the next other.
link |
00:38:15.700
And one thing is it feels to me
link |
00:38:19.420
that robots don't get no respect.
link |
00:38:22.220
They get shoved around.
link |
00:38:23.420
Shoved around, and is there, one, at the shallow level,
link |
00:38:27.180
for a better experience, it seems that robots
link |
00:38:29.740
need to talk back a little bit.
link |
00:38:31.540
Like my intuition says, I mean, most companies
link |
00:38:35.460
from sort of Roomba, autonomous vehicle companies
link |
00:38:38.420
might not be so happy with the idea that a robot
link |
00:38:41.500
has a little bit of an attitude.
link |
00:38:43.660
But I feel, it feels to me that that's necessary
link |
00:38:46.760
to create a compelling experience.
link |
00:38:48.300
Like we humans don't seem to respect anything
link |
00:38:50.640
that doesn't give us some attitude.
link |
00:38:52.980
That, or like a mix of mystery and attitude and anger
link |
00:38:58.940
and that threatens us subtly, maybe passive aggressively.
link |
00:39:03.940
I don't know.
link |
00:39:04.780
It seems like we humans, yeah, need that.
link |
00:39:08.200
Do you, what are your, is there something,
link |
00:39:10.100
you have thoughts on this?
link |
00:39:11.900
All right, I'll give you two thoughts on this.
link |
00:39:13.100
Okay, sure.
link |
00:39:13.940
One is, one is, it's, we respond to, you know,
link |
00:39:18.940
someone being assertive, but we also respond
link |
00:39:24.220
to someone being vulnerable.
link |
00:39:26.020
So I think robots, my first thought is that
link |
00:39:28.220
robots get shoved around and bullied a lot
link |
00:39:31.460
because they're sort of, you know, tempting
link |
00:39:32.860
and they're sort of showing off
link |
00:39:34.100
or they appear to be showing off.
link |
00:39:35.700
And so I think going back to these things
link |
00:39:38.700
we were talking about in the beginning
link |
00:39:39.940
of making robots a little more, a little more expressive,
link |
00:39:43.940
a little bit more like, eh, that wasn't cool to do.
link |
00:39:46.880
And now I'm bummed, right?
link |
00:39:49.900
I think that that can actually help
link |
00:39:51.500
because people can't help but anthropomorphize
link |
00:39:53.420
and respond to that.
link |
00:39:54.260
Even that though, the emotion being communicated
link |
00:39:56.860
is not in any way a real thing.
link |
00:39:58.740
And people know that it's not a real thing
link |
00:40:00.220
because they know it's just a machine.
link |
00:40:01.860
We're still interpreting, you know, we watch,
link |
00:40:04.500
there's this famous psychology experiment
link |
00:40:07.100
with little triangles and kind of dots on a screen
link |
00:40:11.020
and a triangle is chasing the square
link |
00:40:12.860
and you get really angry at the darn triangle
link |
00:40:15.860
because why is it not leaving the square alone?
link |
00:40:18.500
So that's, yeah, we can't help.
link |
00:40:20.100
So that was the first thought.
link |
00:40:21.460
The vulnerability, that's really interesting that,
link |
00:40:25.500
I think of like being, pushing back, being assertive
link |
00:40:31.620
as the only mechanism of getting,
link |
00:40:33.680
of forming a connection, of getting respect,
link |
00:40:36.300
but perhaps vulnerability,
link |
00:40:37.920
perhaps there's other mechanisms that are less threatening.
link |
00:40:40.100
Yeah.
link |
00:40:40.940
Is there?
link |
00:40:41.760
Well, I think, well, a little bit, yes,
link |
00:40:43.980
but then this other thing that we can think about is,
link |
00:40:47.220
it goes back to what you were saying,
link |
00:40:48.380
that interaction is really game theoretic, right?
link |
00:40:50.640
So the moment you're taking actions in a space,
link |
00:40:52.780
the humans are taking actions in that same space,
link |
00:40:55.380
but you have your own objective, which is, you know,
link |
00:40:58.060
you're a car, you need to get your passenger
link |
00:40:59.640
to the destination.
link |
00:41:00.900
And then the human nearby has their own objective,
link |
00:41:03.740
which somewhat overlaps with you, but not entirely.
link |
00:41:07.060
You're not interested in getting into an accident
link |
00:41:09.180
with each other, but you have different destinations
link |
00:41:11.580
and you wanna get home faster
link |
00:41:13.000
and they wanna get home faster.
link |
00:41:14.620
And that's a general sum game at that point.
link |
00:41:17.580
And so that's, I think that's what,
link |
00:41:22.220
treating it as such is kind of a way we can step outside
link |
00:41:25.620
of this kind of mode that,
link |
00:41:29.580
where you try to anticipate what people do
link |
00:41:32.180
and you don't realize you have any influence over it
link |
00:41:35.260
while still protecting yourself
link |
00:41:37.180
because you're understanding that people also understand
link |
00:41:40.540
that they can influence you.
link |
00:41:42.660
And it's just kind of back and forth is this negotiation,
link |
00:41:45.540
which is really talking about different equilibria
link |
00:41:49.160
of a game.
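To make the general sum framing concrete, here is a minimal sketch in Python; the rewards, numbers, and outcome names are made up for illustration. Both agents share a strong penalty for a collision, but each is rewarded only for its own progress, so the interaction is neither purely cooperative nor zero sum.

```python
# A made-up general-sum merge: both agents hate collisions (shared interest),
# but each is rewarded only for its own progress (diverging interests).
def robot_reward(robot_progress, human_progress, collision):
    return robot_progress - 100.0 * collision

def human_reward(robot_progress, human_progress, collision):
    return human_progress - 100.0 * collision

# Joint outcomes of a merge attempt: (robot_progress, human_progress, collision)
outcomes = {
    "robot yields": (0.2, 1.0, 0),
    "human yields": (1.0, 0.2, 0),
    "both push":    (0.6, 0.6, 1),   # neither backs off, so they collide
}

for name, (rp, hp, c) in outcomes.items():
    print(name, robot_reward(rp, hp, c), human_reward(rp, hp, c))
```

Each player prefers the outcome where the other one yields, and both hate the collision, so there is no single outcome that is best for everyone, which is what makes this a negotiation over equilibria rather than a fixed prediction problem.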
link |
00:41:50.500
The very basic way to solve coordination
link |
00:41:53.140
is to just make predictions about what people will do
link |
00:41:55.860
and then stay out of their way.
link |
00:41:57.780
And that's hard for the reasons we talked about,
link |
00:41:59.860
which is how you have to understand people's intentions
link |
00:42:02.820
implicitly, explicitly, who knows,
link |
00:42:05.320
but somehow you have to get enough of an understanding
link |
00:42:07.140
of that to be able to anticipate what happens next.
link |
00:42:10.900
And so that's challenging.
link |
00:42:11.980
But then it's further challenged by the fact
link |
00:42:13.900
that people change what they do based on what you do
link |
00:42:17.620
because they don't plan in isolation either, right?
link |
00:42:21.240
So when you see cars trying to merge on a highway
link |
00:42:25.020
and not succeeding, one of the reasons this can be
link |
00:42:27.940
is because they look at traffic that keeps coming,
link |
00:42:33.180
they predict what these people are planning on doing,
link |
00:42:35.940
which is to just keep going,
link |
00:42:37.720
and then they stay out of the way
link |
00:42:39.260
because there's no feasible plan, right?
link |
00:42:42.260
Any plan would actually intersect
link |
00:42:44.640
with one of these other people.
link |
00:42:46.780
So that's bad, so you get stuck there.
link |
00:42:49.380
So now kind of if you start thinking about it as no, no, no,
link |
00:42:53.820
actually these people change what they do
link |
00:42:58.220
depending on what the car does.
link |
00:42:59.900
Like if the car actually tries to kind of inch itself forward,
link |
00:43:03.700
they might actually slow down and let the car in.
link |
00:43:07.220
And now taking advantage of that,
link |
00:43:10.620
well, that's kind of the next level.
link |
00:43:13.600
We call this like this underactuated system idea
link |
00:43:16.260
where it's kind of underactuated system robotics,
link |
00:43:18.700
but it's kind of, you're influencing
link |
00:43:22.100
these other degrees of freedom,
link |
00:43:23.300
but you don't get to decide what they do.
link |
00:43:25.740
I've somewhere seen you mention it,
link |
00:43:28.480
the human element in this picture as underactuated.
link |
00:43:32.020
So you understand underactuated robotics
link |
00:43:35.220
is that you can't fully control the system.
link |
00:43:41.340
You can't go in arbitrary directions
link |
00:43:43.420
in the configuration space.
link |
00:43:44.860
Under your control.
link |
00:43:46.360
Yeah, it's a very simple way of underactuation
link |
00:43:48.860
where basically there's literally these degrees of freedom
link |
00:43:51.060
that you can control,
link |
00:43:52.020
and these degrees of freedom that you can't,
link |
00:43:53.500
but you influence them.
link |
00:43:54.340
And I think that's the important part
link |
00:43:55.900
is that they don't do whatever, regardless of what you do,
link |
00:43:59.460
that what you do influences what they end up doing.
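A toy sketch of that underactuated framing, with an assumed human response model (the function and numbers are illustrative, not anyone's actual system): the robot chooses its own control directly, while the human's control is only influenced through how they respond to it.

```python
# Toy "underactuated" interaction: the robot picks its own control u_r directly,
# but the human's control u_h comes from an assumed response model that depends
# on what the robot does. The robot influences u_h; it never chooses it.
def human_response(robot_speed):
    # Assumed toy model: the more the robot inches forward, the more the human slows.
    return max(0.0, 1.0 - 0.5 * robot_speed)

def joint_step(robot_pos, human_pos, u_r):
    u_h = human_response(u_r)          # influenced, not controlled
    return robot_pos + u_r, human_pos + u_h

pos_r, pos_h = 0.0, 0.0
for u_r in [0.0, 0.4, 0.8]:            # the robot tries inching forward more and more
    pos_r, pos_h = joint_step(pos_r, pos_h, u_r)
    print(f"robot action {u_r:.1f} -> robot at {pos_r:.1f}, human at {pos_h:.1f}")
```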
link |
00:44:02.300
I just also like the poetry of calling human robot
link |
00:44:05.460
interaction an underactuated robotics problem.
link |
00:44:09.420
And you also mentioned sort of nudging.
link |
00:44:11.900
It seems that they're, I don't know.
link |
00:44:14.260
I think about this a lot in the case of pedestrians
link |
00:44:16.620
I've collected hundreds of hours of videos.
link |
00:44:18.720
I like to just watch pedestrians.
link |
00:44:21.100
And it seems that.
link |
00:44:22.860
It's a funny hobby.
link |
00:44:24.300
Yeah, it's weird.
link |
00:44:25.740
Cause I learn a lot.
link |
00:44:27.220
I learned a lot about myself,
link |
00:44:28.620
about our human behavior, from watching pedestrians,
link |
00:44:32.940
watching people in their environment.
link |
00:44:35.280
Basically crossing the street
link |
00:44:37.900
is like you're putting your life on the line.
link |
00:44:41.660
I don't know, tens of millions of times in America every day,
link |
00:44:44.540
people are just like playing this weird game of chicken
link |
00:44:48.940
when they cross the street,
link |
00:44:49.980
especially when there's some ambiguity
link |
00:44:51.940
about the right of way.
link |
00:44:54.340
That has to do either with the rules of the road
link |
00:44:56.660
or with the general personality of the intersection
link |
00:44:59.860
based on the time of day and so on.
link |
00:45:02.340
And this nudging idea,
link |
00:45:05.660
it seems that people don't even nudge.
link |
00:45:07.340
They just aggressively take, make a decision.
link |
00:45:10.340
Somebody, there's a runner that gave me this advice.
link |
00:45:14.080
I sometimes run in the street,
link |
00:45:17.740
not in the street, on the sidewalk.
link |
00:45:18.860
And he said that if you don't make eye contact with people
link |
00:45:22.260
when you're running, they will all move out of your way.
link |
00:45:25.700
It's called civil inattention.
link |
00:45:27.500
Civil inattention, that's a thing.
link |
00:45:29.220
Oh wow, I need to look this up, but it works.
link |
00:45:32.020
What is that?
link |
00:45:32.860
My sense was if you communicate like confidence
link |
00:45:37.860
in your actions that you're unlikely to deviate
link |
00:45:41.260
from the action that you're following,
link |
00:45:43.100
that's a really powerful signal to others
link |
00:45:44.940
that they need to plan around your actions.
link |
00:45:47.180
As opposed to nudging where you're sort of hesitantly,
link |
00:45:50.380
then the hesitation might communicate
link |
00:45:53.300
that you're still in the dance and the game
link |
00:45:56.340
that they can influence with their own actions.
link |
00:45:59.460
I've recently had a conversation with Jim Keller,
link |
00:46:03.220
who's a sort of this legendary chip architect,
link |
00:46:08.260
but he also led the autopilot team for a while.
link |
00:46:12.260
And his intuition that driving is fundamentally
link |
00:46:16.820
still like a ballistics problem.
link |
00:46:18.860
Like you can ignore the human element
link |
00:46:22.220
that is just not hitting things.
link |
00:46:24.040
And you can kind of learn the right dynamics
link |
00:46:26.580
required to do the merger and all those kinds of things.
link |
00:46:29.700
And then my sense is, and I don't know if I can provide
link |
00:46:32.660
sort of definitive proof of this,
link |
00:46:34.980
but my sense is like an order of magnitude
link |
00:46:38.060
or more difficult when humans are involved.
link |
00:46:41.540
Like it's not simply object collision avoidance problem.
link |
00:46:48.100
Where does your intuition,
link |
00:46:49.260
of course, nobody knows the right answer here,
link |
00:46:51.020
but where does your intuition fall on the difficulty,
link |
00:46:54.380
fundamental difficulty of the driving problem
link |
00:46:57.060
when humans are involved?
link |
00:46:58.780
Yeah, good question.
link |
00:47:00.360
I have many opinions on this.
link |
00:47:03.260
Imagine downtown San Francisco.
link |
00:47:07.260
Yeah, it's crazy, busy, everything.
link |
00:47:10.740
Okay, now take all the humans out.
link |
00:47:12.800
No pedestrians, no human driven vehicles,
link |
00:47:15.660
no cyclists, no people on little electric scooters
link |
00:47:18.700
zipping around, nothing.
link |
00:47:19.960
I think we're done.
link |
00:47:21.960
I think driving at that point is done.
link |
00:47:23.800
We're done.
link |
00:47:25.000
There's nothing really that still needs
link |
00:47:27.720
to be solved about that.
link |
00:47:28.880
Well, let's pause there.
link |
00:47:30.600
I think I agree with you and I think a lot of people
link |
00:47:34.240
that will hear will agree with that,
link |
00:47:37.400
but we need to sort of internalize that idea.
link |
00:47:41.640
So what's the problem there?
link |
00:47:42.920
Cause we might not quite yet be done with that.
link |
00:47:45.280
Cause a lot of people kind of focus
link |
00:47:46.860
on the perception problem.
link |
00:47:48.200
A lot of people kind of map autonomous driving
link |
00:47:52.840
into how close are we to solving,
link |
00:47:55.720
being able to detect all the, you know,
link |
00:47:57.920
the drivable area, the objects in the scene.
link |
00:48:02.600
Do you see that as a, how hard is that problem?
link |
00:48:07.440
So your intuition there behind your statement
link |
00:48:09.640
was we might have not solved it yet,
link |
00:48:11.520
but we're close to solving basically the perception problem.
link |
00:48:14.520
I think the perception problem, I mean,
link |
00:48:17.120
and by the way, a bunch of years ago,
link |
00:48:19.360
this would not have been true.
link |
00:48:21.520
And a lot of issues in the space were coming
link |
00:48:24.600
from the fact that, oh, we don't really, you know,
link |
00:48:27.040
we don't know what's where.
link |
00:48:29.360
But I think it's fairly safe to say that at this point,
link |
00:48:33.760
although you could always improve on things
link |
00:48:35.840
and all of that, you can drive through downtown San Francisco
link |
00:48:38.880
if there are no people around.
link |
00:48:40.400
There's no really perception issues
link |
00:48:42.520
standing in your way there.
link |
00:48:44.920
I think perception is hard, but yeah, it's, we've made
link |
00:48:47.400
a lot of progress on the perception,
link |
00:48:49.160
so I hate to undermine the difficulty of the problem.
link |
00:48:50.920
I think everything about robotics is really difficult,
link |
00:48:53.480
of course, I think that, you know, the planning problem,
link |
00:48:57.160
the control problem, all very difficult,
link |
00:48:59.480
but I think what's, what makes it really kind of, yeah.
link |
00:49:03.520
It might be, I mean, you know,
link |
00:49:05.440
and I picked downtown San Francisco,
link |
00:49:07.000
it's adapting to, well, now it's snowing,
link |
00:49:11.560
now it's no longer snowing, now it's slippery in this way,
link |
00:49:14.080
now it's the dynamics part could,
link |
00:49:16.600
I could imagine being still somewhat challenging, but.
link |
00:49:24.080
No, the thing that I think worries us,
link |
00:49:26.000
and our intuition's not good there,
link |
00:49:27.680
is the perception problem at the edge cases.
link |
00:49:31.560
Sort of downtown San Francisco, the nice thing,
link |
00:49:35.320
it's not actually, it may not be a good example because.
link |
00:49:39.760
Because you know what you're getting from,
link |
00:49:41.360
well, there's like crazy construction zones
link |
00:49:43.200
and all of that. Yeah, but the thing is,
link |
00:49:44.480
you're traveling at slow speeds,
link |
00:49:46.200
so like it doesn't feel dangerous.
link |
00:49:47.840
To me, what feels dangerous is highway speeds,
link |
00:49:51.040
when everything is, to us humans, super clear.
link |
00:49:54.600
Yeah, I'm assuming LiDAR here, by the way.
link |
00:49:57.120
I think it's kind of irresponsible to not use LiDAR.
link |
00:49:59.760
That's just my personal opinion.
link |
00:50:02.440
That's, I mean, depending on your use case,
link |
00:50:04.600
but I think like, you know, if you have the opportunity
link |
00:50:07.480
to use LiDAR, in a lot of cases, you might not.
link |
00:50:11.000
Good, your intuition makes more sense now.
link |
00:50:13.640
So you don't think vision.
link |
00:50:15.200
I really just don't know enough to say,
link |
00:50:18.040
well, vision alone, what, you know, what's like,
link |
00:50:21.440
there's a lot of, how many cameras do you have?
link |
00:50:24.160
Is it, how are you using them?
link |
00:50:25.680
I don't know. There's details.
link |
00:50:26.680
There's all, there's all sorts of details.
link |
00:50:28.400
I imagine there's stuff that's really hard
link |
00:50:30.120
to actually see, you know, how do you deal with glare,
link |
00:50:33.800
exactly what you were saying,
link |
00:50:34.640
stuff that people would see that you don't.
link |
00:50:37.680
I think I have, more of my intuition comes from systems
link |
00:50:40.640
that can actually use LiDAR as well.
link |
00:50:44.240
Yeah, and until we know for sure,
link |
00:50:45.800
it makes sense to be using LiDAR.
link |
00:50:48.000
That's kind of the safety focus.
link |
00:50:50.040
But then the sort of the,
link |
00:50:52.240
I also sympathize with the Elon Musk statement
link |
00:50:55.880
of LiDAR is a crutch.
link |
00:50:57.880
It's a fun notion to think that the things that work today
link |
00:51:04.600
is a crutch for the invention of the things
link |
00:51:08.040
that will work tomorrow, right?
link |
00:51:09.960
Like it, it's kind of true in the sense that if,
link |
00:51:15.520
you know, we want to stick to the comfort zone,
link |
00:51:17.320
you see this in academic and research settings
link |
00:51:19.440
all the time, the things that work force you
link |
00:51:22.360
to not explore outside, think outside the box.
link |
00:51:25.400
I mean, that happens all the time.
link |
00:51:26.840
The problem is in the safety critical systems,
link |
00:51:29.080
you kind of want to stick with the things that work.
link |
00:51:32.120
So it's an interesting and difficult trade off
link |
00:51:34.920
in the case of real world sort of safety critical
link |
00:51:38.400
robotic systems, but so your intuition is,
link |
00:51:44.960
just to clarify, how, I mean,
link |
00:51:48.080
how hard is this human element for,
link |
00:51:51.320
like how hard is driving
link |
00:51:52.760
when this human element is involved?
link |
00:51:55.120
Are we years, decades away from solving it?
link |
00:52:00.040
But perhaps actually the year isn't the thing I'm asking.
link |
00:52:03.880
It doesn't matter what the timeline is,
link |
00:52:05.480
but do you think we're, how many breakthroughs
link |
00:52:09.240
are we away from in solving
link |
00:52:12.320
the human robotic interaction problem
link |
00:52:13.640
to get this, to get this right?
link |
00:52:15.640
I think it, in a sense, it really depends.
link |
00:52:20.520
I think that, you know, we were talking about how,
link |
00:52:24.040
well, look, it's really hard
link |
00:52:25.160
because anticipate what people do is hard.
link |
00:52:27.080
And on top of that, playing the game is hard.
link |
00:52:30.360
But I think we sort of have the fundamental,
link |
00:52:35.960
some of the fundamental understanding for that.
link |
00:52:38.680
And then you already see that these systems
link |
00:52:41.080
are being deployed in the real world,
link |
00:52:45.000
you know, even driverless.
link |
00:52:47.720
Like there's, I think now a few companies
link |
00:52:50.840
that don't have a driver in the car in some small areas.
link |
00:52:55.840
I got a chance to, I went to Phoenix and I,
link |
00:52:59.640
I shot a video with Waymo and I needed to get
link |
00:53:03.560
that video out.
link |
00:53:04.640
People have been giving me slack,
link |
00:53:06.640
but there's incredible engineering work being done there.
link |
00:53:09.280
And it's one of those other seminal moments
link |
00:53:11.160
for me in my life to be able to, it sounds silly,
link |
00:53:13.920
but to be able to drive without a ride, sorry,
link |
00:53:17.640
without a driver in the seat.
link |
00:53:19.360
I mean, that was an incredible robotics.
link |
00:53:22.360
I was driven by a robot without being able to take over,
link |
00:53:27.840
without being able to take the steering wheel.
link |
00:53:31.200
That's a magical, that's a magical moment.
link |
00:53:33.520
So in that regard, in those domains,
link |
00:53:35.560
at least for like Waymo, they're solving that human,
link |
00:53:39.960
there's, I mean, they're going, I mean, it felt fast
link |
00:53:43.520
because you're like freaking out at first.
link |
00:53:45.600
That was, this is my first experience,
link |
00:53:47.440
but it's going like the speed limit, right?
link |
00:53:49.080
30, 40, whatever it is.
link |
00:53:51.200
And there's humans and it deals with them quite well.
link |
00:53:53.840
It detects them, it negotiates the intersections,
link |
00:53:57.000
the left turns and all of that.
link |
00:53:58.240
So at least in those domains, it's solving them.
link |
00:54:01.240
The open question for me is like, how quickly can we expand?
link |
00:54:06.000
You know, that's the, you know,
link |
00:54:08.760
outside of the weather conditions,
link |
00:54:10.080
all of those kinds of things,
link |
00:54:11.040
how quickly can we expand to like cities like San Francisco?
link |
00:54:14.560
Yeah, and I wouldn't say that it's just, you know,
link |
00:54:17.120
now it's just pure engineering and it's probably the,
link |
00:54:20.280
I mean, and by the way,
link |
00:54:22.080
I'm speaking kind of very generally here as hypothesizing,
link |
00:54:26.360
but I think that there are successes
link |
00:54:31.260
and yet no one is everywhere out there.
link |
00:54:34.400
So that seems to suggest that things can be expanded
link |
00:54:38.880
and can be scaled and we know how to do a lot of things,
link |
00:54:41.680
but there's still probably, you know,
link |
00:54:44.080
new algorithms or modified algorithms
link |
00:54:46.760
that you still need to put in there
link |
00:54:49.240
as you learn more and more about new challenges
link |
00:54:53.440
that you get faced with.
link |
00:54:55.760
How much of this problem do you think can be learned
link |
00:54:58.280
through end to end?
link |
00:54:59.120
Is it the success of machine learning
link |
00:55:00.680
and reinforcement learning?
link |
00:55:02.760
How much of it can be learned from sort of data
link |
00:55:05.280
from scratch and how much,
link |
00:55:07.040
which most of the success of autonomous vehicle systems
link |
00:55:10.540
have a lot of heuristics and rule based stuff on top,
link |
00:55:14.400
like human expertise injected forced into the system
link |
00:55:19.320
to make it work.
link |
00:55:20.840
What's your sense?
link |
00:55:22.000
How much, what will be the role of learning
link |
00:55:26.120
in the near term and long term?
link |
00:55:28.160
I think on the one hand that learning is inevitable here,
link |
00:55:36.000
right?
link |
00:55:37.400
I think on the other hand that when people characterize
link |
00:55:39.720
the problem as it's a bunch of rules
link |
00:55:42.080
that some people wrote down,
link |
00:55:44.400
versus it's an end to end RL system or imitation learning,
link |
00:55:49.640
then maybe there's kind of something missing
link |
00:55:53.480
from maybe that's more.
link |
00:55:57.080
So for instance, I think a very, very useful tool
link |
00:56:02.840
in this sort of problem,
link |
00:56:04.360
both in how to generate the car's behavior
link |
00:56:07.360
and robots in general and how to model human beings
link |
00:56:11.720
is actually planning, search optimization, right?
link |
00:56:15.000
So robotics is the sequential decision making problem.
link |
00:56:18.280
And when a robot can figure out on its own
link |
00:56:26.360
how to achieve its goal without hitting stuff
link |
00:56:28.960
and all that stuff, right?
link |
00:56:30.040
All the good stuff for motion planning 101,
link |
00:56:33.080
I think of that as very much AI,
link |
00:56:36.280
not this is some rule or something.
link |
00:56:38.120
There's nothing rule based around that, right?
link |
00:56:40.360
It's just you're searching through a space
link |
00:56:42.000
and figuring out, or you're optimizing through a space,
link |
00:56:43.720
and figure out what seems to be the right thing to do.
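As a concrete reminder of what that search looks like, here is a minimal motion planning sketch: a uniform cost search over a tiny grid that finds a goal reaching path around obstacles. The grid size, obstacles, and unit costs are made up.

```python
# Minimal "motion planning 101": uniform-cost search over a tiny grid for a
# path to the goal that avoids obstacles.
import heapq

def plan(start, goal, obstacles, size=5):
    frontier = [(0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        x, y = node
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in obstacles:
                heapq.heappush(frontier, (cost + 1, nxt, path + [nxt]))
    return None  # no collision-free path exists

print(plan(start=(0, 0), goal=(4, 4), obstacles={(2, 2), (2, 3), (3, 2)}))
```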
link |
00:56:47.320
And I think it's hard to just do that
link |
00:56:49.880
because you need to learn models of the world.
link |
00:56:52.520
And I think it's hard to just do the learning part
link |
00:56:55.720
where you don't bother with any of that,
link |
00:56:58.800
because then you're saying, well, I could do imitation,
link |
00:57:01.720
but then when I go off distribution, I'm really screwed.
link |
00:57:04.640
Or you can say, I can do reinforcement learning,
link |
00:57:08.320
which adds a lot of robustness,
link |
00:57:09.840
but then you have to do either reinforcement learning
link |
00:57:12.640
in the real world, which sounds a little challenging
link |
00:57:15.320
or that trial and error, you know,
link |
00:57:18.400
or you have to do reinforcement learning in simulation.
link |
00:57:21.080
And then that means, well, guess what?
link |
00:57:23.080
You need to model things, at least to model people,
link |
00:57:27.280
model the world enough that whatever policy you get of that
link |
00:57:31.560
is actually fine to roll out in the world
link |
00:57:34.920
and do some additional learning there.
link |
00:57:36.480
So. Do you think simulation, by the way, just a quick tangent
link |
00:57:40.920
has a role in the human robot interaction space?
link |
00:57:44.280
Like, is it useful?
link |
00:57:46.320
It seems like humans, everything we've been talking about
link |
00:57:48.480
are difficult to model and simulate.
link |
00:57:51.400
Do you think simulation has a role in this space?
link |
00:57:53.640
I do.
link |
00:57:54.480
I think so because you can take models
link |
00:57:58.840
and train with them ahead of time, for instance.
link |
00:58:04.040
You can.
link |
00:58:06.080
But the models, sorry to interrupt,
link |
00:58:07.640
the models are sort of human constructed or learned?
link |
00:58:10.480
I think they have to be a combination
link |
00:58:14.880
because if you get some human data and then you say,
link |
00:58:20.520
this is how, this is gonna be my model of the person.
link |
00:58:22.960
What are they for, simulation and training
link |
00:58:24.440
or for just deployment time?
link |
00:58:25.800
And that's what I'm planning with
link |
00:58:27.200
as my model of how people work.
link |
00:58:29.120
Regardless, if you take some data
link |
00:58:33.440
and you don't assume anything else and you just say,
link |
00:58:35.280
okay, this is some data that I've collected.
link |
00:58:39.200
Let me fit a policy to how people work based on that.
link |
00:58:42.600
What tends to happen is you collected some data
link |
00:58:45.120
and some distribution, and then now your robot
link |
00:58:50.400
sort of computes a best response to that, right?
link |
00:58:52.960
It's sort of like, what should I do
link |
00:58:54.480
if this is how people work?
link |
00:58:56.280
And easily goes off of distribution
link |
00:58:58.600
where that model that you've built of the human
link |
00:59:01.040
completely sucks because out of distribution,
link |
00:59:03.480
you have no idea, right?
link |
00:59:05.120
If you think of all the possible policies
link |
00:59:07.880
and then you take only the ones that are consistent
link |
00:59:10.960
with the human data that you've observed,
link |
00:59:13.040
that still leaves a lot of, a lot of things could happen
link |
00:59:15.880
outside of that distribution where you're confident
link |
00:59:18.680
that you know what's going on.
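A small illustration of that failure mode, with entirely made up data: fit a simple model of how much a driver slows down as a function of the gap, from observations in one regime, then query it far outside that regime. The fit looks fine in distribution and produces nonsense out of it.

```python
# Made-up data: how much a driver slows down (m/s) for a merging car, observed
# only for gaps between 5 and 20 meters.
import numpy as np

gaps = np.array([5.0, 8.0, 10.0, 12.0, 15.0, 20.0])
slowdown = np.array([3.0, 2.2, 1.8, 1.5, 1.0, 0.5])

model = np.poly1d(np.polyfit(gaps, slowdown, deg=1))  # fit a line to the data

print("in distribution, 10 m gap :", model(10.0))   # near the data, plausible
print("out of distribution, 60 m :", model(60.0))   # negative slowdown: nonsense,
                                                    # yet a best-response planner
                                                    # would happily plan against it
```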
link |
00:59:19.840
By the way, that's, I mean, I've gotten used
link |
00:59:22.640
to this terminology of out of distribution,
link |
00:59:25.360
but it's such a machine learning terminology
link |
00:59:29.000
because it kind of assumes,
link |
00:59:30.800
so distribution is referring to the data
link |
00:59:36.040
that you've seen.
link |
00:59:36.880
The set of states that you encounter
link |
00:59:38.040
at training time. That you've encountered so far
link |
00:59:39.400
at training time. Yeah.
link |
00:59:40.720
But it kind of also implies that there's a nice
link |
00:59:43.960
like statistical model that represents that data.
link |
00:59:47.440
So out of distribution feels like, I don't know,
link |
00:59:50.120
it raises to me philosophical questions
link |
00:59:54.400
of how we humans reason out of distribution,
link |
00:59:58.640
reason about things that are completely,
link |
01:00:01.600
we haven't seen before.
link |
01:00:03.240
And so, and what we're talking about here is
link |
01:00:05.760
how do we reason about what other people do
link |
01:00:09.160
in situations where we haven't seen them?
link |
01:00:11.480
And somehow we just magically navigate that.
link |
01:00:14.880
I can anticipate what will happen in situations
link |
01:00:18.000
that are even novel in many ways.
link |
01:00:21.640
And I have a pretty good intuition for,
link |
01:00:22.960
I don't always get it right, but you know,
link |
01:00:24.520
and I might be a little uncertain and so on.
link |
01:00:26.520
But I think it's this that if you just rely on data,
link |
01:00:33.240
you know, there's just too many possibilities,
link |
01:00:36.000
there's too many policies out there that fit the data.
link |
01:00:37.960
And by the way, it's not just state,
link |
01:00:39.320
it's really kind of history of state,
link |
01:00:40.640
cause to really be able to anticipate
link |
01:00:41.840
what the person will do,
link |
01:00:43.080
it kind of depends on what they've been doing so far,
link |
01:00:45.200
cause that's the information you need to kind of,
link |
01:00:47.840
at least implicitly sort of say,
link |
01:00:49.560
oh, this is the kind of person that this is,
link |
01:00:51.320
this is probably what they're trying to do.
link |
01:00:53.080
So anyway, it's like you're trying to map history of states
link |
01:00:55.200
to actions, there's many mappings.
link |
01:00:56.640
And history meaning like the last few seconds
link |
01:00:59.840
or the last few minutes or the last few months.
link |
01:01:02.520
Who knows, who knows how much you need, right?
link |
01:01:04.680
In terms of if your state is really like the positions
link |
01:01:07.280
of everything or whatnot and velocities,
link |
01:01:09.680
who knows how much you need.
link |
01:01:10.520
And then there's so many mappings.
link |
01:01:14.680
And so now you're talking about
link |
01:01:16.560
how do you regularize that space?
link |
01:01:17.960
What priors do you impose or what's the inductive bias?
link |
01:01:21.440
So, you know, there's all very related things
link |
01:01:23.600
to think about it.
link |
01:01:25.800
Basically, what are assumptions that we should be making
link |
01:01:29.800
such that these models actually generalize
link |
01:01:32.600
outside of the data that we've seen?
link |
01:01:35.560
And now you're talking about, well, I don't know,
link |
01:01:37.800
what can you assume?
link |
01:01:38.640
Maybe you can assume that people like actually
link |
01:01:40.840
have intentions and that's what drives their actions.
link |
01:01:43.800
Maybe that's, you know, the right thing to do
link |
01:01:46.560
when you haven't seen data very nearby
link |
01:01:49.600
that tells you otherwise.
link |
01:01:51.000
I don't know, it's a very open question.
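One common way to formalize the assumption that intentions drive actions is a noisily rational, or Boltzmann, observation model, roughly P(action given goal) proportional to exp(beta * Q(state, action, goal)). Here is a toy sketch of inferring a goal from one observed action; the goals, Q function, and beta are all assumed for illustration.

```python
# Toy goal inference under a noisily rational human model:
# P(action | goal) is proportional to exp(beta * Q(state, action, goal)).
# The per-goal normalizer over actions is dropped to keep the sketch short.
import numpy as np

goals = {"door": np.array([5.0, 0.0]), "desk": np.array([0.0, 5.0])}
state = np.array([2.0, 2.0])
beta = 2.0  # how close to perfectly rational we assume the person is

def q_value(state, action, goal_pos):
    # Higher when the action moves the person closer to the goal.
    return -np.linalg.norm((state + action) - goal_pos)

def goal_posterior(state, action):
    scores = {g: np.exp(beta * q_value(state, action, pos)) for g, pos in goals.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}

observed_action = np.array([1.0, -0.2])   # a step roughly toward the door
print(goal_posterior(state, observed_action))  # puts most probability on "door"
```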
link |
01:01:53.360
Do you think sort of that one of the dreams
link |
01:01:55.600
of artificial intelligence was to solve
link |
01:01:58.200
common sense reasoning, whatever the heck that means.
link |
01:02:02.640
Do you think something like common sense reasoning
link |
01:02:04.960
has to be solved in part to be able to solve this dance
link |
01:02:09.040
of human robot interaction, the driving space
link |
01:02:12.280
or human robot interaction in general?
link |
01:02:14.960
Do you have to be able to reason about these kinds
link |
01:02:16.880
of common sense concepts of physics,
link |
01:02:21.880
of, you know, all the things we've been talking about
link |
01:02:27.640
humans, I don't even know how to express them with words,
link |
01:02:30.640
but the basics of human behavior, a fear of death.
link |
01:02:34.680
So like, to me, it's really important to encode
link |
01:02:38.080
in some kind of sense, maybe not, maybe it's implicit,
link |
01:02:41.920
but it feels that it's important to explicitly encode
link |
01:02:44.760
the fear of death, that people don't wanna die.
link |
01:02:48.200
Because it seems silly, but like the game of chicken
link |
01:02:56.880
that involves with the pedestrian crossing the street
link |
01:02:59.800
is playing with the idea of mortality.
link |
01:03:03.000
Like we really don't wanna die.
link |
01:03:04.240
It's not just like a negative reward.
link |
01:03:07.000
I don't know, it just feels like all these human concepts
link |
01:03:10.040
have to be encoded.
link |
01:03:11.760
Do you share that sense or is this a lot simpler
link |
01:03:14.320
than I'm making out to be?
link |
01:03:15.840
I think it might be simpler.
link |
01:03:17.080
And I'm the person who likes to complicate things.
link |
01:03:18.840
I think it might be simpler than that.
link |
01:03:21.120
Because it turns out, for instance,
link |
01:03:24.200
if you say model people in the very,
link |
01:03:29.560
I'll call it traditional, I don't know if it's fair
link |
01:03:31.720
to look at it as a traditional way,
link |
01:03:33.040
but you know, calling people as,
link |
01:03:35.360
okay, they're rational somehow,
link |
01:03:37.880
the utilitarian perspective.
link |
01:03:40.080
Well, in that, once you say that,
link |
01:03:45.080
you automatically capture that they have an incentive
link |
01:03:48.960
to keep on being.
link |
01:03:50.960
You know, Stuart likes to say,
link |
01:03:53.720
you can't fetch the coffee if you're dead.
link |
01:03:56.960
Stuart Russell, by the way.
link |
01:03:59.960
That's a good line.
link |
01:04:01.320
So when you're sort of treating agents
link |
01:04:05.600
as having these objectives, these incentives,
link |
01:04:10.240
humans or artificial, you're kind of implicitly modeling
link |
01:04:14.880
that they'd like to stick around
link |
01:04:16.960
so that they can accomplish those goals.
link |
01:04:20.160
So I think in a sense,
link |
01:04:22.760
maybe that's what draws me so much
link |
01:04:24.200
to the rationality framework,
link |
01:04:25.520
even though it's so broken,
link |
01:04:26.800
we've been able to, it's been such a useful perspective.
link |
01:04:30.680
And like we were talking about earlier,
link |
01:04:32.200
what's the alternative?
link |
01:04:33.040
I give up and go home or, you know,
link |
01:04:34.360
I just use complete black boxes,
link |
01:04:36.040
but then I don't know what to assume out of distribution
link |
01:04:37.960
and that comes back to this.
link |
01:04:40.040
It's just, it's been a very fruitful way
link |
01:04:42.600
to think about the problem
link |
01:04:43.960
in a very more positive way, right?
link |
01:04:47.240
People aren't just crazy.
link |
01:04:49.080
Maybe they make more sense than we think.
link |
01:04:51.440
But I think we also have to somehow be ready for it
link |
01:04:55.640
to be wrong, be able to detect
link |
01:04:58.200
when these assumptions aren't holding,
link |
01:05:00.440
be all of that stuff.
link |
01:05:02.880
Let me ask sort of another small side of this
link |
01:05:06.640
that we've been talking about
link |
01:05:07.800
the pure autonomous driving problem,
link |
01:05:09.920
but there's also relatively successful systems
link |
01:05:13.720
already deployed out there in what you may call
link |
01:05:17.360
like level two autonomy or semi autonomous vehicles,
link |
01:05:20.680
whether that's Tesla Autopilot,
link |
01:05:23.400
I work quite a bit with the Cadillac Super Cruise system,
link |
01:05:27.480
which has a driver facing camera that detects your state.
link |
01:05:31.320
There's a bunch of basically lane centering systems.
link |
01:05:35.400
What's your sense about this kind of way of dealing
link |
01:05:41.160
with the human robot interaction problem
link |
01:05:43.160
by having a really dumb robot
link |
01:05:46.400
and relying on the human to help the robot out
link |
01:05:50.280
to keep them both alive?
link |
01:05:53.000
Is that from the research perspective,
link |
01:05:57.400
how difficult is that problem?
link |
01:05:59.280
And from a practical deployment perspective,
link |
01:06:02.240
is that a fruitful way to approach
link |
01:06:05.960
this human robot interaction problem?
link |
01:06:08.080
I think what we have to be careful about there
link |
01:06:12.120
is to not, it seems like some of these systems,
link |
01:06:16.240
not all are making this underlying assumption
link |
01:06:19.880
that if, so I'm a driver and I'm now really not driving,
link |
01:06:25.560
but supervising and my job is to intervene, right?
link |
01:06:28.920
And so we have to be careful with this assumption
link |
01:06:31.280
that when I'm, if I'm supervising,
link |
01:06:36.640
I will be just as safe as when I'm driving.
link |
01:06:41.640
That I will, if I wouldn't get into some kind of accident,
link |
01:06:46.840
if I'm driving, I will be able to avoid that accident
link |
01:06:50.880
when I'm supervising too.
link |
01:06:52.240
And I think I'm concerned about this assumption
link |
01:06:55.120
from a few perspectives.
link |
01:06:56.840
So from a technical perspective,
link |
01:06:58.440
it's that when you let something kind of take control
link |
01:07:01.400
and do its thing, and it depends on what that thing is,
link |
01:07:03.800
obviously, and how much it's taking control
link |
01:07:05.480
and how, what things are you trusting it to do.
link |
01:07:07.920
But if you let it do its thing and take control,
link |
01:07:11.880
it will go to what we might call off policy
link |
01:07:15.080
from the person's perspective state.
link |
01:07:16.800
So states that the person wouldn't actually
link |
01:07:18.440
find themselves in if they were the ones driving.
link |
01:07:22.000
And the assumption that the person functions
link |
01:07:24.120
just as well there as they function in the states
link |
01:07:26.280
that they would normally encounter
link |
01:07:28.080
is a little questionable.
link |
01:07:30.040
Now, another part is the kind of the human factor side
link |
01:07:34.400
of this, which is that I don't know about you,
link |
01:07:38.320
but I think I definitely feel like I'm experiencing things
link |
01:07:42.120
very differently when I'm actively engaged in the task
link |
01:07:45.320
versus when I'm a passive observer.
link |
01:07:47.000
Like even if I try to stay engaged, right?
link |
01:07:49.400
It's very different than when I'm actually
link |
01:07:51.120
actively making decisions.
link |
01:07:53.560
And you see this in life in general.
link |
01:07:55.480
Like you see students who are actively trying
link |
01:07:58.360
to come up with the answer, learn this thing better
link |
01:08:00.920
than when they're passively told the answer.
link |
01:08:03.000
I think that's somewhat related.
link |
01:08:04.360
And I think people have studied this in human factors
link |
01:08:06.680
for airplanes.
link |
01:08:07.600
And I think it's actually fairly established
link |
01:08:10.200
that these two are not the same.
link |
01:08:12.160
So.
link |
01:08:13.000
On that point, because I've gotten a huge amount
link |
01:08:14.960
of heat on this and I stand by it.
link |
01:08:17.120
Okay.
link |
01:08:18.960
Because I know the human factors community well
link |
01:08:22.000
and the work here is really strong.
link |
01:08:24.040
And there's many decades of work showing exactly
link |
01:08:27.040
what you're saying.
link |
01:08:28.280
Nevertheless, I've been continuously surprised
link |
01:08:30.920
that much of the predictions of that work has been wrong
link |
01:08:33.800
in what I've seen.
link |
01:08:35.360
So what we have to do,
link |
01:08:37.880
I still agree with everything you said,
link |
01:08:40.320
but we have to be a little bit more open minded.
link |
01:08:45.640
So the, I'll tell you, there's a few surprising things
link |
01:08:49.480
about supervision. Like, everything you said, to the word,
link |
01:08:52.960
is actually exactly correct.
link |
01:08:54.840
But it doesn't say, what you didn't say
link |
01:08:57.880
is that these systems are,
link |
01:09:00.160
you said you can't assume a bunch of things,
link |
01:09:02.480
but we don't know if these systems are fundamentally unsafe.
link |
01:09:06.680
That's still unknown.
link |
01:09:08.800
There's a lot of interesting things,
link |
01:09:11.040
like I'm surprised by the fact, not the fact,
link |
01:09:15.880
that what seems to be anecdotally from,
link |
01:09:18.840
well, from large data collection that we've done,
link |
01:09:21.160
but also from just talking to a lot of people,
link |
01:09:23.960
when in the supervisory role of semi autonomous systems
link |
01:09:27.120
that are sufficiently dumb, at least,
link |
01:09:29.480
which is, that might be the key element,
link |
01:09:33.560
is the systems have to be dumb.
link |
01:09:35.200
The people are actually more energized as observers.
link |
01:09:38.680
So they're actually better,
link |
01:09:40.600
they're better at observing the situation.
link |
01:09:43.400
So there might be cases in systems,
link |
01:09:46.520
if you get the interaction right,
link |
01:09:48.320
where you, as a supervisor,
link |
01:09:50.880
will do a better job with the system together.
link |
01:09:53.600
I agree, I think that is actually really possible.
link |
01:09:56.760
I guess mainly I'm pointing out that if you do it naively,
link |
01:10:00.080
you're implicitly assuming something,
link |
01:10:02.160
that assumption might actually really be wrong.
link |
01:10:04.480
But I do think that if you explicitly think about
link |
01:10:09.120
what the agent should do
link |
01:10:10.720
so that the person still stays engaged.
link |
01:10:13.480
What the, so that you essentially empower the person
link |
01:10:16.400
to do more than they could,
link |
01:10:17.560
that's really the goal, right?
link |
01:10:19.080
Is you still have a driver,
link |
01:10:20.280
so you wanna empower them to be so much better
link |
01:10:25.320
than they would be by themselves.
link |
01:10:27.040
And that's different, it's a very different mindset
link |
01:10:29.760
than I want them to basically not drive, right?
link |
01:10:33.160
And, but be ready to sort of take over.
link |
01:10:40.320
So one of the interesting things we've been talking about
link |
01:10:42.360
is the rewards, that they seem to be fundamental too,
link |
01:10:47.000
the way robots behaves.
link |
01:10:49.200
So broadly speaking,
link |
01:10:52.440
we've been talking about utility functions and so on,
link |
01:10:54.320
but could you comment on how do we approach
link |
01:10:56.960
the design of reward functions?
link |
01:10:59.640
Like, how do we come up with good reward functions?
link |
01:11:02.600
Well, really good question,
link |
01:11:05.160
because the answer is we don't.
link |
01:11:10.880
This was, you know, I used to think,
link |
01:11:13.560
I used to think about how,
link |
01:11:16.480
well, it's actually really hard to specify rewards
link |
01:11:18.920
for interaction because it's really supposed to be
link |
01:11:22.960
what the people want, and then you really, you know,
link |
01:11:25.040
we talked about how you have to customize
link |
01:11:26.600
what you wanna do to the end user.
link |
01:11:30.720
But I kind of realized that even if you take
link |
01:11:36.080
the interactive component away,
link |
01:11:39.200
it's still really hard to design reward functions.
link |
01:11:42.680
So what do I mean by that?
link |
01:11:43.800
I mean, if we assume this sort of AI paradigm
link |
01:11:47.360
in which there's an agent and his job is to optimize
link |
01:11:51.080
some objectives, some reward, utility, loss, whatever, cost,
link |
01:11:58.280
if you write it out, maybe it's a set,
link |
01:12:00.280
depending on the situation or whatever it is,
link |
01:12:03.680
if you write that out and then you deploy the agent,
link |
01:12:06.960
you'd wanna make sure that whatever you specified
link |
01:12:10.240
incentivizes the behavior you want from the agent
link |
01:12:14.840
in any situation that the agent will be faced with, right?
link |
01:12:18.640
So I do motion planning on my robot arm,
link |
01:12:22.080
I specify some cost function like, you know,
link |
01:12:25.920
this is how far away you should try to stay,
link |
01:12:28.080
so much it matters to stay away from people,
link |
01:12:29.560
and this is how much it matters to be able to be efficient
link |
01:12:31.800
and blah, blah, blah, right?
link |
01:12:33.920
I need to make sure that whatever I specified,
link |
01:12:36.560
those constraints or trade offs or whatever they are,
link |
01:12:40.160
that when the robot goes and solves that problem
link |
01:12:43.360
in every new situation,
link |
01:12:45.120
that behavior is the behavior that I wanna see.
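For concreteness, this is roughly what such a hand specified cost looks like: a weighted sum of features whose weights get tuned until the optimized behavior looks right on the situations the designer thought to try. The features and weights below are made up.

```python
# A hand-specified trajectory cost: a weighted sum of features, with weights the
# designer tunes by hand.
import numpy as np

weights = {"clearance": 5.0, "path_length": 1.0, "jerk": 0.5}  # hand-tuned trade-offs

def trajectory_cost(traj, people):
    traj = np.asarray(traj, dtype=float)
    steps = np.diff(traj, axis=0)
    # How much the trajectory intrudes within 1 m of any person.
    dists = np.array([[np.linalg.norm(q - p) for p in people] for q in traj])
    clearance = np.sum(np.maximum(0.0, 1.0 - dists))
    # How long the path is (inefficiency).
    path_length = np.sum(np.linalg.norm(steps, axis=1))
    # How jerky the motion is.
    jerk = np.sum(np.linalg.norm(np.diff(steps, axis=0), axis=1)) if len(steps) > 1 else 0.0
    features = {"clearance": clearance, "path_length": path_length, "jerk": jerk}
    return sum(weights[k] * features[k] for k in weights)

print(trajectory_cost([(0, 0), (1, 0.5), (2, 1.0), (3, 1.0)], [np.array([1.5, 0.6])]))
```

The optimizer then does whatever these numbers incentivize, in every new situation, including all the ones the designer never tried.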
link |
01:12:47.920
And what I've been finding is
link |
01:12:50.160
that we have no idea how to do that.
link |
01:12:52.320
Basically, what I can do is I can sample,
link |
01:12:56.520
I can think of some situations
link |
01:12:58.160
that I think are representative of what the robot will face,
link |
01:13:02.240
and I can tune and add and tune some reward function
link |
01:13:08.320
until the optimal behavior is what I want
link |
01:13:11.560
on those situations,
link |
01:13:13.280
which first of all is super frustrating
link |
01:13:15.800
because, you know, through the miracle of AI,
link |
01:13:19.040
we've taken, we don't have to specify rules
link |
01:13:21.360
for behavior anymore, right?
link |
01:13:22.880
Like we were saying before,
link |
01:13:24.520
the robot comes up with the right thing to do,
link |
01:13:27.000
you plug in this situation,
link |
01:13:28.520
it optimizes right in that situation, it optimizes,
link |
01:13:31.640
but you have to spend still a lot of time
link |
01:13:34.680
on actually defining what it is
link |
01:13:37.200
that that criteria should be,
link |
01:13:39.000
making sure you didn't forget
link |
01:13:40.040
about 50 bazillion things that are important
link |
01:13:42.400
and how they all should be combining together
link |
01:13:44.640
to tell the robot what's good and what's bad
link |
01:13:46.800
and how good and how bad.
link |
01:13:48.840
And so I think this is a lesson that I don't know,
link |
01:13:55.360
kind of, I guess I close my eyes to it for a while
link |
01:13:59.120
cause I've been, you know,
link |
01:14:00.240
tuning cost functions for 10 years now,
link |
01:14:03.640
but it really strikes me that,
link |
01:14:07.120
yeah, we've moved the tuning
link |
01:14:09.600
and the like designing of features or whatever
link |
01:14:13.240
from the behavior side into the reward side.
link |
01:14:19.720
And yes, I agree that there's way less of it,
link |
01:14:22.040
but it still seems really hard
link |
01:14:24.000
to anticipate any possible situation
link |
01:14:26.960
and make sure you specify a reward function
link |
01:14:30.240
that when optimized will work well
link |
01:14:32.800
in every possible situation.
link |
01:14:35.160
So you're kind of referring to unintended consequences
link |
01:14:38.600
or just in general, any kind of suboptimal behavior
link |
01:14:42.120
that emerges outside of the things you said,
link |
01:14:44.840
out of distribution.
link |
01:14:46.520
Suboptimal behavior that is, you know, actually optimal.
link |
01:14:49.720
I mean, this, I guess the idea of unintended consequences,
link |
01:14:51.640
you know, it's optimal with respect to what you specified,
link |
01:14:53.720
but it's not what you want.
link |
01:14:55.480
And there's a difference between those.
link |
01:14:57.560
But that's not fundamentally a robotics problem, right?
link |
01:14:59.880
That's a human problem.
link |
01:15:01.320
So like. That's the thing, right?
link |
01:15:03.440
So there's this thing called Goodhart's law,
link |
01:15:05.280
which is you set a metric for an organization
link |
01:15:07.920
and the moment it becomes a target
link |
01:15:10.880
that people actually optimize for,
link |
01:15:13.040
it's no longer a good metric.
link |
01:15:15.000
What's it called?
link |
01:15:15.840
Goodhart's law.
link |
01:15:16.680
Goodhart's law.
link |
01:15:17.520
So the moment you specify a metric,
link |
01:15:20.120
it stops doing its job.
link |
01:15:21.600
Yeah, it stops doing its job.
link |
01:15:24.000
So there's, yeah, there's such a thing
link |
01:15:25.120
as optimizing for things and, you know,
link |
01:15:27.400
failing to think ahead of time
link |
01:15:32.200
of all the possible things that might be important.
link |
01:15:35.600
And so that's, so that's interesting
link |
01:15:38.080
because this area works a lot on reward learning
link |
01:15:41.560
from the perspective of customizing to the end user,
link |
01:15:44.000
but it really seems like it's not just the interaction
link |
01:15:48.040
with the end user that's a problem of the human
link |
01:15:50.880
and the robot collaborating
link |
01:15:52.320
so that the robot can do what the human wants, right?
link |
01:15:55.160
This kind of back and forth, the robot probing,
link |
01:15:57.280
the person being informative, all of that stuff
link |
01:16:00.200
might be actually just as applicable
link |
01:16:04.400
to this kind of maybe new form of human robot interaction,
link |
01:16:07.440
which is the interaction between the robot
link |
01:16:10.760
and the expert programmer, roboticist designer
link |
01:16:14.280
in charge of actually specifying
link |
01:16:16.240
what the heck the robot should do,
link |
01:16:18.360
specifying the task for the robot.
link |
01:16:20.200
That's fascinating.
link |
01:16:21.040
That's so cool, like collaborating on the reward design.
link |
01:16:23.800
Right, collaborating on the reward design.
link |
01:16:26.200
And so what does it mean, right?
link |
01:16:28.080
What does it, when we think about the problem,
link |
01:16:29.840
not as, someone specifies it and all of your job is to optimize it,
link |
01:16:34.400
and we start thinking about you're in this interaction
link |
01:16:37.600
and this collaboration.
link |
01:16:39.280
And the first thing that comes up is
link |
01:16:42.440
when the person specifies a reward, it's not, you know,
link |
01:16:46.360
gospel, it's not like the letter of the law.
link |
01:16:48.720
It's not the definition of the reward function
link |
01:16:52.080
you should be optimizing,
link |
01:16:53.320
because they're doing their best,
link |
01:16:54.840
but they're not some magic perfect oracle.
link |
01:16:57.120
And the sooner we start understanding that,
link |
01:16:58.720
I think the sooner we'll get to more robust robots
link |
01:17:02.360
that function better in different situations.
link |
01:17:06.400
And then you have kind of say, okay, well,
link |
01:17:08.480
it's almost like robots are over learning,
link |
01:17:12.680
over putting too much weight on the reward specified
link |
01:17:16.760
by definition, and maybe leaving a lot of other information
link |
01:17:21.120
on the table, like what are other things we could do
link |
01:17:23.280
to actually communicate to the robot
link |
01:17:25.480
about what we want them to do besides attempting
link |
01:17:28.280
to specify a reward function.
link |
01:17:29.600
Yeah, you have this awesome,
link |
01:17:31.760
and again, I love the poetry of it, of leaked information.
link |
01:17:34.760
So you mentioned humans leak information
link |
01:17:38.680
about what they want, you know,
link |
01:17:40.880
leak reward signal for the robot.
link |
01:17:44.960
So how do we detect these leaks?
link |
01:17:47.680
What is that?
link |
01:17:48.520
Yeah, what are these leaks?
link |
01:17:49.960
Whether it just, I don't know,
link |
01:17:51.840
those were just recently saw it, read it,
link |
01:17:54.040
I don't know where from you,
link |
01:17:55.200
and it's gonna stick with me for a while for some reason,
link |
01:17:58.640
because it's not explicitly expressed.
link |
01:18:00.920
It kind of leaks indirectly from our behavior.
link |
01:18:04.520
From what we do, yeah, absolutely.
link |
01:18:06.160
So I think maybe some surprising bits, right?
link |
01:18:11.320
So we were talking before about, I'm a robot arm,
link |
01:18:14.760
it needs to move around people, carry stuff,
link |
01:18:18.200
put stuff away, all of that.
link |
01:18:20.520
And now imagine that, you know,
link |
01:18:25.080
the robot has some initial objective
link |
01:18:27.160
that the programmer gave it
link |
01:18:28.960
so they can do all these things functionally.
link |
01:18:30.680
It's capable of doing that.
link |
01:18:32.240
And now I noticed that it's doing something
link |
01:18:35.800
and maybe it's coming too close to me, right?
link |
01:18:39.480
And maybe I'm the designer,
link |
01:18:40.520
maybe I'm the end user and this robot is now in my home.
link |
01:18:43.840
And I push it away.
link |
01:18:47.800
So I push away because, you know,
link |
01:18:49.320
it's a reaction to what the robot is currently doing.
link |
01:18:52.360
And this is what we call physical human robot interaction.
link |
01:18:55.800
And now there's a lot of interesting work
link |
01:18:58.440
on how the heck do you respond to physical human
link |
01:19:00.640
robot interaction?
link |
01:19:01.480
What should the robot do if such an event occurs?
link |
01:19:03.520
And there's sort of different schools of thought.
link |
01:19:05.000
Well, you know, you can sort of treat it
link |
01:19:07.040
the control theoretic way and say,
link |
01:19:08.280
this is a disturbance that you must reject.
link |
01:19:11.160
You can sort of treat it more kind of heuristically
link |
01:19:15.880
and say, I'm gonna go into some like gravity compensation
link |
01:19:18.040
mode so that I'm easily maneuverable around.
link |
01:19:19.800
I'm gonna go in the direction that the person pushed me.
link |
01:19:22.280
And for us, part of the realization has been
link |
01:19:27.280
that that is signal that communicates about the reward.
link |
01:19:30.480
Because if my robot was moving in an optimal way
link |
01:19:34.560
and I intervened, that means that I disagree
link |
01:19:37.760
with its notion of optimality, right?
link |
01:19:40.240
Whatever it thinks is optimal is not actually optimal.
link |
01:19:43.560
And sort of optimization problems aside,
link |
01:19:45.960
that means that the cost function,
link |
01:19:47.400
the reward function is incorrect,
link |
01:19:51.400
or at least is not what I want it to be.
link |
01:19:53.560
How difficult is that signal to interpret
link |
01:19:58.440
and make actionable?
link |
01:19:59.400
So like, cause this connects
link |
01:20:00.800
to our autonomous vehicle discussion
link |
01:20:02.120
where they're in the semi autonomous vehicle
link |
01:20:03.960
or autonomous vehicle when a safety driver
link |
01:20:06.480
disengages the car, like,
link |
01:20:08.480
but they could have disengaged it for a million reasons.
link |
01:20:11.840
Yeah, so that's true.
link |
01:20:15.080
Again, it comes back to, can you structure a little bit
link |
01:20:19.840
your assumptions about how human behavior
link |
01:20:22.040
relates to what they want?
link |
01:20:24.240
And you can, one thing that we've done is
link |
01:20:26.320
literally just treated this external torque
link |
01:20:29.480
that they applied as, when you take that
link |
01:20:32.960
and you add it with the torque
link |
01:20:34.800
the robot was already applying,
link |
01:20:36.600
that overall action is probably relatively optimal
link |
01:20:39.680
with respect to whatever it is that the person wants.
link |
01:20:41.800
And then that gives you information
link |
01:20:43.040
about what it is that they want.
link |
01:20:44.320
So you can learn that people want you
link |
01:20:45.680
to stay further away from them.
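A hedged sketch of that idea, not a specific published method: treat the corrected motion, the plan plus the person's push, as closer to what the person wants than the original plan, and move the reward weights toward whatever the correction improved. The features, trajectories, and learning rate below are illustrative only.

```python
# Update reward weights from a single physical correction by comparing features
# of the planned motion with features of the corrected motion.
import numpy as np

def features(traj, person):
    traj = np.asarray(traj, dtype=float)
    avg_dist = np.mean([np.linalg.norm(q - person) for q in traj])
    path_length = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    return np.array([avg_dist, -path_length])  # more distance good, more length bad

person = np.array([1.0, 0.5])
planned   = [(0, 0), (1, 0.4), (2, 0.6)]   # robot's plan passes close to the person
corrected = [(0, 0), (1, 1.2), (2, 1.4)]   # the push deforms it further away

w = np.array([1.0, 1.0])   # current reward weights
alpha = 0.5                # step size for the update
w = w + alpha * (features(corrected, person) - features(planned, person))
print("updated weights:", w)   # the weight on distance-to-person goes up
```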
link |
01:20:47.600
Now you're right that there might be many things
link |
01:20:49.760
that explain just that one signal
link |
01:20:51.360
and that you might need much more data than that
link |
01:20:53.360
for the person to be able to shape
link |
01:20:55.480
your reward function over time.
link |
01:20:58.640
You can also do this info gathering stuff
link |
01:21:00.880
that we were talking about.
link |
01:21:01.760
Not that we've done that in that context,
link |
01:21:03.280
just to clarify, but it's definitely something
link |
01:21:04.800
we thought about where you can have the robot
link |
01:21:09.080
start acting in a way, like if there's
link |
01:21:11.040
a bunch of different explanations, right?
link |
01:21:13.400
It moves in a way where it sees if you correct it
link |
01:21:16.360
in some other way or not,
link |
01:21:17.600
and then kind of actually plans its motion
link |
01:21:19.920
so that it can disambiguate
link |
01:21:21.760
and collect information about what you want.
link |
01:21:24.880
Anyway, so that's one way,
link |
01:21:26.000
that's kind of sort of leaked information,
link |
01:21:27.440
maybe even more subtle leaked information
link |
01:21:29.280
is if I just press the E stop, right?
link |
01:21:32.760
I just, I'm doing it out of panic
link |
01:21:34.040
because the robot is about to do something bad.
link |
01:21:36.280
There's again, information there, right?
link |
01:21:38.480
Okay, the robot should definitely stop,
link |
01:21:40.800
but it should also figure out
link |
01:21:42.560
that whatever it was about to do was not good.
link |
01:21:45.240
And in fact, it was so not good
link |
01:21:46.720
that stopping and remaining stopped for a while
link |
01:21:48.920
was a better trajectory for it
link |
01:21:51.080
than whatever it is that it was about to do.
link |
01:21:52.760
And that again is information about
link |
01:21:54.800
what are my preferences, what do I want?
link |
01:21:57.560
Speaking of E stops, what are your expert opinions
link |
01:22:03.600
on the three laws of robotics from Isaac Asimov
link |
01:22:08.160
that don't harm humans, obey orders, protect yourself?
link |
01:22:11.280
I mean, it's such a silly notion,
link |
01:22:13.320
but I speak to so many people these days,
link |
01:22:15.400
just regular folks, just, I don't know,
link |
01:22:17.040
my parents and so on about robotics.
link |
01:22:19.360
And they kind of operate in that space of,
link |
01:22:23.440
you know, imagining our future with robots
link |
01:22:25.800
and thinking what are the ethical,
link |
01:22:28.440
how do we get that dance right?
link |
01:22:31.520
I know the three laws might be a silly notion,
link |
01:22:34.040
but do you think about like
link |
01:22:35.560
what universal reward functions there might be
link |
01:22:39.000
that we should enforce on the robots of the future?
link |
01:22:44.000
Or is that a little too far out and it doesn't,
link |
01:22:48.160
or is the mechanism that you just described,
link |
01:22:51.240
it shouldn't be three laws,
link |
01:22:52.680
it should be constantly adjusting kind of thing.
link |
01:22:55.160
I think it should constantly be adjusting kind of thing.
link |
01:22:57.840
You know, the issue with the laws is,
link |
01:23:01.000
I don't even, you know, they're words
link |
01:23:02.600
and I have to write math
link |
01:23:04.600
and have to translate them into math.
link |
01:23:06.240
What does it mean to?
link |
01:23:07.280
What does harm mean?
link |
01:23:08.200
What is, it's not math.
link |
01:23:11.920
Obey what, right?
link |
01:23:12.880
Cause we just talked about how
link |
01:23:14.720
you try to say what you want,
link |
01:23:17.040
but you don't always get it right.
link |
01:23:19.880
And you want these machines to do what you want,
link |
01:23:22.520
not necessarily exactly what you literally,
link |
01:23:24.560
so you don't want them to take you literally.
link |
01:23:26.600
You wanna take what you say and interpret it in context.
link |
01:23:31.600
And that's what we do with the specified rewards.
link |
01:23:33.520
We don't take them literally anymore from the designer.
link |
01:23:36.720
We, not we as a community, we as, you know,
link |
01:23:39.680
some members of my group, we,
link |
01:23:44.160
and some of our collaborators like Pieter Abbeel
link |
01:23:46.360
and Stuart Russell, we sort of say,
link |
01:23:50.160
okay, the designer specified this thing,
link |
01:23:53.320
but I'm gonna interpret it not as,
link |
01:23:55.640
this is the universal reward function
link |
01:23:57.160
that I shall always optimize always and forever,
link |
01:23:59.520
but as this is good evidence about what the person wants.
link |
01:24:05.440
And I should interpret that evidence
link |
01:24:07.400
in the context of these situations that it was specified for.
link |
01:24:11.000
Cause ultimately that's what the designer thought about.
link |
01:24:12.840
That's what they had in mind.
link |
01:24:14.280
And really, them specifying a reward function
link |
01:24:16.800
that works for me in all these situations
link |
01:24:18.960
is really kind of telling me that whatever behavior
link |
01:24:22.120
that incentivizes must be good behavior
link |
01:24:24.040
with respect to the thing
link |
01:24:25.960
that I should actually be optimizing for.
link |
01:24:28.120
And so now the robot kind of has uncertainty
link |
01:24:30.320
about what it is that it should be,
link |
01:24:32.320
what its reward function is.
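A minimal sketch in the spirit of that idea (inverse reward design, from the line of work with Pieter Abbeel and Stuart Russell mentioned above): the hand-specified reward is treated as evidence about the true reward, interpreted in the context of the training environment it was written for. The candidate hypotheses, the `feature_expectations` helper, and the simplified likelihood (which drops the normalizer over alternative proxy rewards) are all illustrative assumptions rather than the actual formulation.

```python
import numpy as np

def proxy_likelihood(proxy_theta, true_theta, feature_expectations, beta=2.0):
    """Score how plausible the specified (proxy) reward is, given a
    hypothesized true reward and the training environment.

    `feature_expectations(proxy_theta)` is a hypothetical helper returning the
    expected feature counts of the behavior the proxy incentivizes in the
    training environment. A designer is assumed to pick proxies whose induced
    behavior scores well under the true reward there. (A full treatment would
    also normalize over alternative proxies; that is omitted for brevity.)
    """
    mu_proxy = feature_expectations(proxy_theta)
    return float(np.exp(beta * float(true_theta @ mu_proxy)))

def posterior_over_true_rewards(proxy_theta, candidate_thetas, prior, feature_expectations):
    """Treat the designer's reward as evidence, not as the final word."""
    weights = np.array([
        prior[i] * proxy_likelihood(proxy_theta, theta, feature_expectations)
        for i, theta in enumerate(candidate_thetas)
    ])
    return weights / weights.sum()
```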
link |
01:24:34.320
And then there's all these additional signals
link |
01:24:36.320
that we've been finding that it can kind of continually
link |
01:24:39.160
learn from and adapt its understanding of what people want.
link |
01:24:41.800
Every time the person corrects it, maybe they demonstrate,
link |
01:24:44.880
maybe they stop, hopefully not, right?
link |
01:24:48.440
One really, really crazy one is the environment itself.
link |
01:24:54.920
Like our world, you don't, it's not, you know,
link |
01:24:58.960
you observe our world and the state of it.
link |
01:25:01.600
And it's not that you're seeing behavior
link |
01:25:03.600
and you're saying, oh, people are making decisions
link |
01:25:05.280
that are rational, blah, blah, blah.
link |
01:25:07.160
It's, but our world is something that we've been acting with
link |
01:25:12.240
according to our preferences.
link |
01:25:14.240
So I have this example where like,
link |
01:25:15.680
the robot walks into my home and my shoes are laid down
link |
01:25:18.880
on the floor kind of in a line, right?
link |
01:25:21.120
It took effort to do that.
link |
01:25:23.320
So even though the robot doesn't see me doing this,
link |
01:25:27.480
you know, actually aligning the shoes,
link |
01:25:29.920
it should still be able to figure out
link |
01:25:31.560
that I want the shoes aligned
link |
01:25:33.240
because there's no way for them to have magically,
link |
01:25:35.920
you know, instantiated themselves in that way.
link |
01:25:39.040
Someone must have actually taken the time to do that.
link |
01:25:43.720
So it must be important.
link |
01:25:44.680
So the environment actually tells, the environment is.
link |
01:25:46.920
Leaks information.
link |
01:25:48.040
It leaks information.
link |
01:25:48.880
I mean, the environment is the way it is
link |
01:25:50.680
because humans somehow manipulated it.
link |
01:25:52.880
So you have to kind of reverse engineer the narrative
link |
01:25:55.760
that happened to create the environment as it is
link |
01:25:57.800
and that leaks the preference information.
link |
01:26:00.640
Yeah, and you have to be careful, right?
link |
01:26:03.160
Because people don't have the bandwidth to do everything.
link |
01:26:06.720
So just because, you know, my house is messy
link |
01:26:08.120
doesn't mean that I want it to be messy, right?
link |
01:26:10.840
But that just, you know, I didn't put the effort into that.
link |
01:26:14.440
I put the effort into something else.
link |
01:26:16.280
So the robot should figure out,
link |
01:26:17.440
well, that something else was more important,
link |
01:26:19.200
but it doesn't mean that, you know,
link |
01:26:20.400
the house being messy is not.
link |
01:26:21.640
So it's a little subtle, but yeah, we really think of it.
link |
01:26:24.560
The state itself is kind of like a choice
link |
01:26:26.800
that people implicitly made about how they want their world.
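A minimal sketch of that last point, preferences leaking through the state of the world itself: each reward hypothesis is scored by how likely a (noisily rational) person optimizing it would have left the world in the state the robot observes, so states that took effort to arrange, like aligned shoes, pull the posterior toward rewards that value them. The `simulate_human` rollout model and the Monte Carlo estimate are illustrative placeholders, not the actual method.

```python
import numpy as np

def state_likelihood(observed_state, theta, simulate_human, n_rollouts=100, horizon=50):
    """Estimate P(observed state | reward weights theta).

    `simulate_human(theta, horizon)` is a hypothetical placeholder that rolls
    out a noisily rational person optimizing theta for `horizon` steps and
    returns the resulting world state.
    """
    hits = sum(
        np.array_equal(simulate_human(theta, horizon), observed_state)
        for _ in range(n_rollouts)
    )
    # Smoothed Monte Carlo estimate so no hypothesis gets exactly zero mass.
    return (hits + 1) / (n_rollouts + 2)

def infer_reward_from_world_state(observed_state, candidate_thetas, prior, simulate_human):
    """Posterior over reward hypotheses given only the current state of the world."""
    posterior = np.array([
        prior[i] * state_likelihood(observed_state, theta, simulate_human)
        for i, theta in enumerate(candidate_thetas)
    ])
    return posterior / posterior.sum()
```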
link |
01:26:31.800
What book or books, technical or fiction or philosophical,
link |
01:26:34.920
when you like look back, you know, life had a big impact,
link |
01:26:39.560
maybe it was a turning point, it was inspiring in some way.
link |
01:26:42.600
Maybe we're talking about some silly book
link |
01:26:45.600
that nobody in their right mind would want to read.
link |
01:26:48.520
Or maybe it's a book that you would recommend
link |
01:26:51.560
to others to read.
link |
01:26:52.480
Or maybe those could be two different recommendations
link |
01:26:56.120
of books that could be useful for people on their journey.
link |
01:27:00.520
When I was in, it's kind of a personal story.
link |
01:27:03.520
When I was in 12th grade,
link |
01:27:05.520
I got my hands on a PDF copy in Romania
link |
01:27:10.520
of Russell and Norvig's Artificial Intelligence: A Modern Approach.
link |
01:27:14.520
I didn't know anything about AI at that point.
link |
01:27:16.520
I was, you know, I had watched the movie,
link |
01:27:19.520
The Matrix was my exposure.
link |
01:27:22.520
And so I started going through this thing
link |
01:27:28.520
and, you know, you were asking in the beginning,
link |
01:27:31.520
what are, you know, it's math and it's algorithms,
link |
01:27:35.520
what's interesting.
link |
01:27:36.520
It was so captivating.
link |
01:27:38.520
This notion that you could just have a goal
link |
01:27:41.520
and figure out your way through
link |
01:27:44.520
kind of a messy, complicated situation.
link |
01:27:47.520
So what sequence of decisions you should make
link |
01:27:50.520
autonomously to achieve that goal.
link |
01:27:53.520
That was so cool.
link |
01:27:55.520
I'm, you know, I'm biased, but that's a cool book to look at.
link |
01:28:00.520
You can convert, you know, the goal of intelligence,
link |
01:28:03.520
the process of intelligence and mechanize it.
link |
01:28:06.520
I had the same experience.
link |
01:28:07.520
I was really interested in psychiatry
link |
01:28:09.520
and trying to understand human behavior.
link |
01:28:11.520
And then AI modern approach is like, wait,
link |
01:28:14.520
you can just reduce it all to.
link |
01:28:15.520
You can write math about human behavior, right?
link |
01:28:18.520
Yeah.
link |
01:28:19.520
So that's, and I think that stuck with me
link |
01:28:21.520
because, you know, a lot of what I do, a lot of what we do
link |
01:28:25.520
in my lab is write math about human behavior,
link |
01:28:28.520
combine it with data and learning, put it all together,
link |
01:28:31.520
give it to robots to plan with, and, you know,
link |
01:28:33.520
hope that instead of writing rules for the robots,
link |
01:28:37.520
writing heuristics, designing behavior,
link |
01:28:39.520
they can actually autonomously come up with the right thing
link |
01:28:42.520
to do around people.
link |
01:28:43.520
That's kind of our, you know, that's our signature move.
link |
01:28:46.520
We wrote some math and then instead of kind of hand crafting
link |
01:28:49.520
this and that and that and the robot figuring stuff out
link |
01:28:52.520
and isn't that cool.
link |
01:28:53.520
And I think that is the same enthusiasm that I got from
link |
01:28:56.520
the robot figured out how to reach that goal in that graph.
link |
01:28:59.520
Isn't that cool?
link |
01:29:02.520
So apologize for the romanticized questions,
link |
01:29:05.520
but, and the silly ones,
link |
01:29:07.520
if a doctor gave you five years to live,
link |
01:29:11.520
sort of emphasizing the finiteness of our existence,
link |
01:29:15.520
what would you try to accomplish?
link |
01:29:20.520
It's like my biggest nightmare, by the way.
link |
01:29:22.520
I really like living.
link |
01:29:24.520
So I'm actually, I really don't like the idea of being told
link |
01:29:28.520
that I'm going to die.
link |
01:29:30.520
Sorry to linger on that for a second.
link |
01:29:32.520
Do you, I mean, do you meditate or ponder on your mortality
link |
01:29:36.520
or human, the fact that this thing ends,
link |
01:29:38.520
it seems to be a fundamental feature.
link |
01:29:41.520
Do you think of it as a feature or a bug too?
link |
01:29:44.520
Is it, you said you don't like the idea of dying,
link |
01:29:47.520
but if I were to give you a choice of living forever,
link |
01:29:50.520
like you're not allowed to die.
link |
01:29:52.520
Now I'll say that I want to live forever,
link |
01:29:54.520
but I watched this show.
link |
01:29:55.520
It's very silly.
link |
01:29:56.520
It's called The Good Place and they reflect a lot on this.
link |
01:29:59.520
And you know, the,
link |
01:30:00.520
the moral of the story is that you have to make the afterlife
link |
01:30:03.520
be finite too.
link |
01:30:05.520
Cause otherwise people just kind of, it's like WALL-E.
link |
01:30:08.520
It's like, ah, whatever.
link |
01:30:10.520
So, so I think the finiteness helps, but,
link |
01:30:13.520
but yeah, it's just, you know, I don't, I don't,
link |
01:30:16.520
I'm not a religious person.
link |
01:30:18.520
I don't think that there's something after.
link |
01:30:21.520
And so I think it just ends and you stop existing.
link |
01:30:25.520
And I really like existing.
link |
01:30:26.520
It's just, it's such a great privilege to exist that,
link |
01:30:31.520
that yeah, it's just, I think that's the scary part.
link |
01:30:35.520
I still think that we like existing so much because it ends.
link |
01:30:40.520
And that's so sad.
link |
01:30:41.520
Like it's so sad to me every time.
link |
01:30:43.520
Like I find almost everything about this life beautiful.
link |
01:30:46.520
Like the silliest, most mundane things are just beautiful.
link |
01:30:49.520
And I think I'm cognizant of the fact that I find it beautiful
link |
01:30:52.520
because it ends like it.
link |
01:30:55.520
And it's so, I don't know.
link |
01:30:57.520
I don't know how to feel about that.
link |
01:30:59.520
I also feel like there's a lesson in there for robotics
link |
01:31:03.520
and AI that is not like the finiteness of things seems
link |
01:31:10.520
to be a fundamental nature of human existence.
link |
01:31:13.520
I think some people sort of accuse me of just being Russian
link |
01:31:16.520
and melancholic and romantic or something,
link |
01:31:19.520
but that seems to be a fundamental nature of our existence
link |
01:31:24.520
that should be incorporated in our reward functions.
link |
01:31:28.520
But anyway, if you were speaking of reward functions,
link |
01:31:34.520
if you only had five years, what would you try to accomplish?
link |
01:31:38.520
This is the thing.
link |
01:31:41.520
I'm thinking about this question and have a pretty joyous moment
link |
01:31:45.520
because I don't know that I would change much.
link |
01:31:49.520
I'm trying to make some contributions to how we understand
link |
01:31:55.520
human AI interaction.
link |
01:31:57.520
I don't think I would change that.
link |
01:32:00.520
Maybe I'll take more trips to the Caribbean or something,
link |
01:32:04.520
but I tried some of that already from time to time.
link |
01:32:08.520
So, yeah, I try to do the things that bring me joy
link |
01:32:13.520
and thinking about these things brings me joy. It's the Marie Kondo thing:
link |
01:32:17.520
Don't do stuff that doesn't spark joy.
link |
01:32:19.520
For the most part, I do things that spark joy.
link |
01:32:22.520
Maybe I'll do less service in the department or something.
link |
01:32:25.520
I'm not dealing with admissions anymore.
link |
01:32:30.520
But no, I think I have amazing colleagues and amazing students
link |
01:32:36.520
and amazing family and friends and spending time in some balance
link |
01:32:40.520
with all of them is what I do and that's what I'm doing already.
link |
01:32:44.520
So, I don't know that I would really change anything.
link |
01:32:47.520
So, on the spirit of positiveness, what small act of kindness,
link |
01:32:52.520
if one pops to mind, were you once shown that you will never forget?
link |
01:32:57.520
When I was in high school, my friends, my classmates did some tutoring.
link |
01:33:08.520
We were gearing up for our baccalaureate exam
link |
01:33:11.520
and they did some tutoring on, well, some on math, some on whatever.
link |
01:33:15.520
I was comfortable enough with some of those subjects,
link |
01:33:19.520
but physics was something that I hadn't focused on in a while.
link |
01:33:22.520
And so, they were all working with this one teacher
link |
01:33:28.520
and I started working with that teacher.
link |
01:33:31.520
Her name is Nicole Beccano.
link |
01:33:33.520
And she was the one who kind of opened up this whole world for me
link |
01:33:39.520
because she sort of told me that I should take the SATs
link |
01:33:44.520
and apply to go to college abroad and do better on my English and all of that.
link |
01:33:51.520
And when it came to, well, financially I couldn't,
link |
01:33:55.520
my parents couldn't really afford to do all these things,
link |
01:33:58.520
she started tutoring me on physics for free
link |
01:34:01.520
and on top of that sitting down with me to kind of train me for SATs
link |
01:34:06.520
and all that jazz that she had experience with.
link |
01:34:09.520
Wow. And obviously that has taken you to be here today,
link |
01:34:15.520
sort of one of the world experts in robotics.
link |
01:34:17.520
It's funny those little... For no reason really.
link |
01:34:24.520
Just out of karma.
link |
01:34:27.520
Wanting to support someone, yeah.
link |
01:34:29.520
Yeah. So, we talked a ton about reward functions.
link |
01:34:33.520
Let me talk about the most ridiculous big question.
link |
01:34:37.520
What is the meaning of life?
link |
01:34:39.520
What's the reward function under which we humans operate?
link |
01:34:42.520
Like what, maybe to your life, maybe broader to human life in general,
link |
01:34:47.520
what do you think...
link |
01:34:51.520
What gives life fulfillment, purpose, happiness, meaning?
link |
01:34:57.520
You can't even ask that question with a straight face.
link |
01:34:59.520
That's how ridiculous this is.
link |
01:35:00.520
I can't, I can't.
link |
01:35:01.520
Okay. So, you know...
link |
01:35:05.520
You're going to try to answer it anyway, aren't you?
link |
01:35:09.520
So, I was in a planetarium once.
link |
01:35:13.520
Yes.
link |
01:35:14.520
And, you know, they show you the thing and then they zoom out and zoom out
link |
01:35:18.520
and this whole, like, you're a speck of dust kind of thing.
link |
01:35:20.520
I think I was conceptualizing that we're kind of, you know, what are humans?
link |
01:35:23.520
We're just on this little planet, whatever.
link |
01:35:26.520
We don't matter much in the grand scheme of things.
link |
01:35:29.520
And then my mind got really blown because they talked about this multiverse theory
link |
01:35:35.520
where they kind of zoomed out and were like, this is our universe.
link |
01:35:38.520
And then, like, there's a bazillion other ones and they just pop in and out of existence.
link |
01:35:42.520
So, like, our whole thing that we can't even fathom how big it is was like a blip that went in and out.
link |
01:35:48.520
And at that point, I was like, okay, like, I'm done.
link |
01:35:51.520
This is not, there is no meaning.
link |
01:35:54.520
And clearly what we should be doing is try to impact whatever local thing we can impact,
link |
01:35:59.520
our communities, leave a little bit behind there, our friends, our family, our local communities,
link |
01:36:05.520
and just try to be there for other humans because I just, everything beyond that seems ridiculous.
link |
01:36:13.520
I mean, are you, like, how do you make sense of these multiverses?
link |
01:36:16.520
Like, are you inspired by the immensity of it?
link |
01:36:21.520
Do you, I mean, is there, like, is it amazing to you or is it almost paralyzing in the mystery of it?
link |
01:36:34.520
It's frustrating.
link |
01:36:35.520
I'm frustrated by my inability to comprehend.
link |
01:36:41.520
It just feels very frustrating.
link |
01:36:43.520
It's like there's some stuff that, you know, we should time, blah, blah, blah, that we should really be understanding.
link |
01:36:48.520
And I definitely don't understand it.
link |
01:36:50.520
But, you know, the amazing physicists of the world have a much better understanding than me.
link |
01:36:56.520
But it still seems epsilon in the grand scheme of things.
link |
01:36:58.520
So, it's very frustrating.
link |
01:37:00.520
It just, it sort of feels like our brains don't have some fundamental capacity yet, well, yet or ever.
link |
01:37:06.520
I don't know.
link |
01:37:07.520
Well, that's one of the dreams of artificial intelligence is to create systems that will aid,
link |
01:37:12.520
expand our cognitive capacity in order to understand, build the theory of everything with the physics
link |
01:37:19.520
and understand what the heck these multiverses are.
link |
01:37:24.520
So, I think there's no better way to end it than talking about the meaning of life and the fundamental nature of the universe and the multiverses.
link |
01:37:32.520
And the multiverse.
link |
01:37:33.520
So, Anca, it is a huge honor.
link |
01:37:35.520
One of my favorite conversations I've had.
link |
01:37:38.520
I really, really appreciate your time.
link |
01:37:40.520
Thank you for talking today.
link |
01:37:41.520
Thank you for coming.
link |
01:37:42.520
Come back again.
link |
01:37:44.520
Thanks for listening to this conversation with Anca Dragan.
link |
01:37:47.520
And thank you to our presenting sponsor, Cash App.
link |
01:37:50.520
Please consider supporting the podcast by downloading Cash App and using code LexPodcast.
link |
01:37:56.520
If you enjoy this podcast, subscribe on YouTube, review it with 5 stars on Apple Podcast,
link |
01:38:01.520
support it on Patreon, or simply connect with me on Twitter at LexFridman.
link |
01:38:07.520
And now, let me leave you with some words from Isaac Asimov.
link |
01:38:12.520
Your assumptions are your windows on the world.
link |
01:38:15.520
Scrub them off every once in a while or the light won't come in.
link |
01:38:20.520
Thank you for listening and hope to see you next time.