
Leslie Kaelbling: Reinforcement Learning, Planning, and Robotics | Lex Fridman Podcast #15



link |
00:00:00.000
The following is a conversation with Leslie Kaelbling. She is a roboticist and professor at
link |
00:00:05.440
MIT. She is recognized for her work in reinforcement learning, planning, robot navigation, and several
link |
00:00:12.080
other topics in AI. She won the IJCAI Computers and Thought Award and was the editor in chief
link |
00:00:18.560
of the prestigious Journal of Machine Learning Research. This conversation is part of the
link |
00:00:24.320
artificial intelligence podcast at MIT and beyond. If you enjoy it, subscribe on YouTube,
link |
00:00:30.400
iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F R I D. And now,
link |
00:00:37.760
here's my conversation with Leslie Kaelbling. What made me get excited about AI, I can say
link |
00:00:45.360
that, is I read Gödel, Escher, Bach when I was in high school. That was pretty formative for me
link |
00:00:49.920
because it exposed the interestingness of primitives and combination and how you can
link |
00:00:59.280
make complex things out of simple parts and ideas of AI and what kinds of programs might
link |
00:01:06.000
generate intelligent behavior. So you first fell in love with AI reasoning logic versus robots?
link |
00:01:12.800
Yeah, the robots came because my first job, so I finished an undergraduate degree in philosophy
link |
00:01:18.240
at Stanford and was about to finish a master's in computer science, and I got hired at SRI
link |
00:01:25.440
in their AI lab and they were building a robot. It was kind of a follow-on to Shakey,
link |
00:01:30.960
but all the Shakey people were not there anymore. And so my job was to try to get this robot to
link |
00:01:35.840
do stuff and that's really kind of what got me interested in robots. So maybe taking a small
link |
00:01:41.200
step back to your bachelor's at Stanford in philosophy, then a master's and PhD in computer science,
link |
00:01:46.160
but the bachelor's in philosophy. So what was that journey like? What elements of philosophy
link |
00:01:52.320
do you think you bring to your work in computer science?
link |
00:01:55.200
So it's surprisingly relevant. So part of the reason that I didn't do a computer science
link |
00:02:00.080
undergraduate degree was that there wasn't one at Stanford at the time, but there's a part of
link |
00:02:04.560
philosophy, and in fact Stanford has a special sub-major in something now called Symbolic Systems,
link |
00:02:09.200
which is logic, model theory, formal semantics of natural language. And so that's actually
link |
00:02:15.520
a perfect preparation for work in AI and computer science.
link |
00:02:19.680
That's kind of interesting. So if you were interested in artificial intelligence,
link |
00:02:26.000
what kind of majors were people even thinking about taking? Was it neuroscience? So besides
link |
00:02:32.560
philosophy, what were you supposed to do if you were fascinated by the idea of creating
link |
00:02:37.120
intelligence? There weren't enough people who did that for that even to be a conversation.
link |
00:02:41.440
I mean, I think probably philosophy. I mean, it's interesting in my graduating class of
link |
00:02:49.920
undergraduate philosophers, probably maybe slightly less than half went on in computer
link |
00:02:57.120
science, slightly less than half went on in law, and like one or two went on in philosophy.
link |
00:03:03.360
So it was a common kind of connection. Do you think AI researchers have a role,
link |
00:03:07.920
to be part-time philosophers, or should they stick to the solid science and engineering
link |
00:03:12.480
without sort of taking the philosophizing tangents? I mean, you work with robots,
link |
00:03:17.200
you think about what it takes to create intelligent beings. Aren't you the perfect person to think
link |
00:03:22.960
about the big picture philosophy at all? The parts of philosophy that are closest to AI,
link |
00:03:27.440
I think, or at least the closest to AI that I think about are stuff like
link |
00:03:30.400
belief and knowledge and denotation and that kind of stuff. It's quite formal, and it's
link |
00:03:38.400
like just one step away from the kinds of computer science work that we do kind of routinely.
link |
00:03:45.680
I think that there are important questions still about what you can do with a machine and what
link |
00:03:53.040
you can't and so on. Although at least my personal view is that I'm completely a materialist,
link |
00:03:57.680
and I don't think that there's any reason why we can't make a robot be
link |
00:04:02.800
behaviorally indistinguishable from a human. And the question of whether it's
link |
00:04:08.480
distinguishable internally, whether it's a zombie or not in philosophy terms, I actually don't,
link |
00:04:14.720
I don't know, and I don't know if I care too much about that.
link |
00:04:16.960
Right, but there are philosophical notions there, mathematical and philosophical,
link |
00:04:22.080
because we don't know so much about how difficult that is. How difficult is the perception problem?
link |
00:04:27.520
How difficult is the planning problem? How difficult is it to operate in this world successfully?
link |
00:04:32.640
Because our robots are not currently as successful as human beings in many tasks.
link |
00:04:37.920
The question about the gap between current robots and human beings borders a little bit
link |
00:04:44.320
on philosophy. The expanse of knowledge that's required to operate in this world and the ability
link |
00:04:52.400
to form common sense knowledge, the ability to reason about uncertainty, much of the work
link |
00:04:57.280
you've been doing, there's open questions there that, I don't know, require a certain
link |
00:05:06.320
big picture view. To me, that doesn't seem like a philosophical gap at all.
link |
00:05:10.640
To me, there is a big technical gap. There's a huge technical gap,
link |
00:05:15.040
but I don't see any reason why it's more than a technical gap.
link |
00:05:19.360
Perfect. When you mentioned AI, you mentioned SRI, and maybe can you describe to me when you
link |
00:05:28.400
first fell in love with robotics, with robots, or were inspired. So you mentioned Flakey, or Shakey and Flakey,
link |
00:05:38.400
and what was the robot that first captured your imagination of what's possible?
link |
00:05:42.720
Right. The first robot I worked with was Flakey. Shakey was a robot that the SRI people had built,
link |
00:05:47.920
but by the time, I think when I arrived, it was sitting in a corner of somebody's office
link |
00:05:53.360
dripping hydraulic fluid into a pan, but it's iconic. Really, everybody should read the Shakey
link |
00:06:00.640
tech report because it has so many good ideas in it. They invented A* search and symbolic
link |
00:06:07.840
planning and learning macro operators. They had low level kind of configuration space planning for
link |
00:06:15.520
the robot. They had vision. That's the basic ideas of a ton of things.
link |
00:06:20.160
Can you take a step back? So Shakey was a mobile robot, but it could push objects,
link |
00:06:27.920
and so it would move things around. With which actuator?
link |
00:06:31.680
With itself, with its base. They had painted the baseboards black,
link |
00:06:40.080
so it used vision to localize itself in a map. It detected objects. It could detect objects that
link |
00:06:48.320
were surprising to it. It would plan and replan based on what it saw. It reasoned about whether
link |
00:06:54.800
to look and take pictures. It really had the basics of so many of the things that we think about now.
link |
00:07:03.280
How did it represent the space around it?
link |
00:07:05.360
It had representations at a bunch of different levels of abstraction,
link |
00:07:09.680
so it had, I think, a kind of an occupancy grid of some sort at the lowest level.
link |
00:07:14.880
At the high level, it was abstract, symbolic kind of rooms and connectivity.
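A minimal sketch of the two-level representation described here, a low-level occupancy grid plus a high-level graph of rooms and connectivity. This is purely illustrative Python with a made-up map; it is not Shakey's actual code.

```python
# Illustrative two-level map: a low-level occupancy grid plus a high-level
# symbolic graph of rooms and doorways. The map itself is invented.
import numpy as np

# Low level: 0 = free cell, 1 = occupied cell.
occupancy_grid = np.zeros((20, 20), dtype=int)
occupancy_grid[10, :] = 1          # a wall across the middle...
occupancy_grid[10, 9] = 0          # ...with a doorway in it

# High level: rooms as symbols, edges as connectivity through doorways.
rooms = {"room_a": {(r, c) for r in range(10) for c in range(20)},
         "room_b": {(r, c) for r in range(11, 20) for c in range(20)}}
connectivity = {("room_a", "room_b"): "door_1"}

def room_of(cell):
    """Abstraction map: which symbolic room contains this grid cell?"""
    for name, cells in rooms.items():
        if cell in cells:
            return name
    return None

print(room_of((3, 4)))   # -> room_a
print(connectivity)      # a high-level planner can work on this graph alone
```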
link |
00:07:20.000
So where does Flakey come in?
link |
00:07:22.160
Yeah, okay. I showed up at SRI and we were building a brand new robot. As I said, none of the people
link |
00:07:29.600
from the previous project were there or involved anymore, so we were starting from scratch.
link |
00:07:34.240
My advisor was Stan Rosenschein. He ended up being my thesis advisor. He was motivated by this idea
link |
00:07:43.920
of situated computation or situated automata. The idea was that the tools of logical reasoning were
link |
00:07:52.400
important, but possibly only for the engineers or designers to use in the analysis of a system,
link |
00:08:01.200
but not necessarily to be manipulated in the head of the system itself.
link |
00:08:06.400
So I might use logic to prove a theorem about the behavior of my robot,
link |
00:08:10.480
even if the robot's not using logic in its head to prove theorems. So that was kind of the
link |
00:08:14.400
distinction. And so the idea was to kind of use those principles to make a robot do stuff.
link |
00:08:22.800
But a lot of the basic things we had to kind of learn for ourselves, because I had zero
link |
00:08:28.960
background in robotics. I didn't know anything about control. I didn't know anything about
link |
00:08:32.160
sensors. So we reinvented a lot of wheels on the way to getting that robot to do stuff.
link |
00:08:36.640
Do you think that was an advantage or hindrance?
link |
00:08:39.120
Oh, no. I'm big in favor of wheel reinvention, actually. I mean, I think you learned a lot
link |
00:08:45.600
by doing it. It's important though to eventually have the pointers so that you can see what's
link |
00:08:51.920
really going on. But I think you can appreciate much better the good solutions once you've
link |
00:08:58.080
messed around a little bit on your own and found a bad one.
link |
00:09:00.400
Yeah, I think you mentioned reinventing reinforcement learning and referring to
link |
00:09:04.880
rewards as pleasures, a pleasure, which I think is a nice name for it.
link |
00:09:12.800
It's more fun, almost. Do you think you could tell the history of AI, machine learning,
link |
00:09:18.960
reinforcement learning, how you think about it from the 50s to now?
link |
00:09:23.520
One thing is that it oscillates. So things become fashionable and then they go out and
link |
00:09:29.440
then something else becomes cool and then it goes out and so on. So there's some interesting
link |
00:09:34.480
sociological process that actually drives a lot of what's going on. Early days was cybernetics and
link |
00:09:41.600
control and the idea of homeostasis. People made these robots that could,
link |
00:09:48.320
I don't know, try to plug into the wall when they needed power and then come loose and roll
link |
00:09:54.400
around and do stuff. And then I think over time, they thought, well, that was inspiring, but people
link |
00:10:00.960
said, no, no, no, we want to get maybe closer to what feels like real intelligence or human
link |
00:10:04.880
intelligence. And then maybe the expert systems people tried to do that, but maybe a little
link |
00:10:15.040
too superficially. So we get this surface understanding of what intelligence is like,
link |
00:10:21.760
because I understand how a steel mill works and I can try to explain it to you and you can write
link |
00:10:25.840
it down in logic and then we can make a computer infer that. And then that didn't work out.
link |
00:10:32.400
But what's interesting, I think, is when a thing starts to not be working very well,
link |
00:10:38.720
it's not only do we change methods, we change problems. So it's not like we have better ways
link |
00:10:44.480
of doing the problem of the expert systems people are trying to do. We have no ways of
link |
00:10:48.160
trying to do that problem. Oh, yeah, no, I think maybe a few. But we kind of give up on that problem
link |
00:10:56.800
and we switch to a different problem. And we work that for a while and we make progress.
link |
00:11:01.520
As a broad community. As a community. And there's a lot of people who would argue,
link |
00:11:04.960
you don't give up on the problem. It's just the decrease in the number of people working on it.
link |
00:11:09.760
You almost kind of like put it on the shelf. So we'll come back to this 20 years later.
link |
00:11:13.920
Yeah, I think that's right. Or you might decide that it's malformed. Like you might say,
link |
00:11:21.600
it's wrong to just try to make something that does superficial symbolic reasoning behave like a
link |
00:11:26.800
doctor. You can't do that until you've had the sensory motor experience of being a doctor or
link |
00:11:34.000
something. So there's arguments that say that that problem was not well formed. Or it could be
link |
00:11:38.560
that it is well formed, but we just weren't approaching it well. So you mentioned that your
link |
00:11:44.160
favorite part of logic and symbolic systems is that they give short names for large sets.
link |
00:11:49.840
So there is some use to this, the use of symbolic reasoning. So looking at expert systems
link |
00:11:56.960
and symbolic computing, what do you think are the roadblocks that were hit in the 80s and 90s?
link |
00:12:02.640
Okay, so right. So the fact that I'm not a fan of expert systems doesn't mean that I'm not a fan
link |
00:12:08.320
of some kind of symbolic reasoning. So let's see roadblocks. Well, the main roadblock, I think,
link |
00:12:16.640
was the idea that humans could articulate their knowledge effectively into some kind of
link |
00:12:25.040
logical statements. So it's not just the cost, the effort, but really just the capability of
link |
00:12:30.560
doing it. Right. Because we're all experts in vision, but totally don't have introspective access
link |
00:12:37.120
into how we do that. Right. And it's true that, I mean, I think the idea was, well, of course,
link |
00:12:45.440
even people then would know, of course, I wouldn't ask you to please write down the rules that you
link |
00:12:49.040
use for recognizing a water bottle. That's crazy. And everyone understood that. But we might ask
link |
00:12:54.000
you to please write down the rules you use for deciding, I don't know, what tie to put on or
link |
00:13:00.800
or how to set up a microphone or something like that. But even those things, I think people maybe,
link |
00:13:08.880
I think what they found, I'm not sure about this, but I think what they found was that the
link |
00:13:12.720
so called experts could give explanations that sort of post hoc explanations for how and why
link |
00:13:19.120
they did things, but they weren't necessarily very good. And then they depended on maybe some
link |
00:13:27.680
kinds of perceptual things, which again, they couldn't really define very well. So I think,
link |
00:13:33.280
I think fundamentally, I think that the underlying problem with that was the assumption that people
link |
00:13:38.800
could articulate how and why they make their decisions. Right. So it's almost encoding the
link |
00:13:45.280
knowledge, converting it from the expert into something that a machine can understand and reason with.
link |
00:13:51.440
No, no, no, not even just encoding, but getting it out of you. Not not not writing it. I mean,
link |
00:13:58.880
yes, hard also to write it down for the computer. But I don't think that people can
link |
00:14:04.240
produce it. You can tell me a story about why you do stuff. But I'm not so sure that's the why.
link |
00:14:11.440
Great. So there are still on the hierarchical planning side,
link |
00:14:16.960
places where symbolic reasoning is very useful. So as you've talked about, so
link |
00:14:27.840
where's the gap? Yeah, okay, good. So saying that humans can't provide a
link |
00:14:34.400
description of their reasoning processes. That's okay, fine. But that doesn't mean that it's not
link |
00:14:40.560
good to do reasoning of various styles inside a computer. Those are just two orthogonal points.
link |
00:14:44.880
So then the question is, what kind of reasoning should you do inside a computer?
link |
00:14:50.560
Right. And the answer is, I think you need to do all different kinds of reasoning inside
link |
00:14:55.680
a computer, depending on what kinds of problems you face. I guess the question is, what kind of
link |
00:15:01.680
things can you encode symbolically so you can reason about them? I think the idea about, and
link |
00:15:12.880
even symbolic, I don't even like that terminology because I don't know what it means technically
link |
00:15:18.080
and formally. I do believe in abstractions. So abstractions are critical, right? You cannot
link |
00:15:24.240
reason at completely fine grain about everything in your life, right? You can't make a plan at the
link |
00:15:30.240
level of images and torques for getting a PhD. So you have to reduce the size of the state space
link |
00:15:37.680
and you have to reduce the horizon if you're going to reason about getting a PhD or even buying
link |
00:15:43.040
the ingredients to make dinner. And so how can you reduce the spaces and the horizon of the
link |
00:15:50.080
reasoning you have to do? And the answer is abstraction, spatial abstraction, temporal
link |
00:15:53.200
abstraction. I think abstraction along the lines of goals is also interesting, like you might
link |
00:15:58.800
or well, abstraction and decomposition. Goals is maybe more of a decomposition thing.
link |
00:16:03.840
So I think that's where these kinds of, if you want to call it symbolic or discrete
link |
00:16:08.880
models come in. You talk about a room of your house instead of your pose. You talk about
link |
00:16:16.800
doing something during the afternoon instead of at 2:54. And you do that because it makes
link |
00:16:22.560
your reasoning problem easier and also because you don't have enough information
link |
00:16:30.000
to reason in high fidelity about the pose of your elbow at 2:35 this afternoon anyway.
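A toy sketch of the spatial and temporal abstraction being described: many fine-grained states (a continuous pose, a precise clock time) collapse to one coarse symbolic state (a room, a part of the day). The room boundaries and time buckets here are invented for illustration.

```python
from datetime import datetime

def abstract_state(x, y, timestamp):
    """Collapse a continuous pose and a precise time into a coarse symbolic state.
    The room boundary and time buckets are made up for illustration."""
    room = "kitchen" if x < 5.0 else "office"                 # spatial abstraction
    hour = timestamp.hour
    part_of_day = ("morning" if hour < 12 else
                   "afternoon" if hour < 18 else "evening")   # temporal abstraction
    return (room, part_of_day)

# Many fine-grained states map to the same abstract state, shrinking the
# space and horizon a high-level planner has to reason over.
print(abstract_state(2.3, 7.9, datetime(2019, 4, 1, 14, 54)))  # ('kitchen', 'afternoon')
print(abstract_state(2.4, 8.1, datetime(2019, 4, 1, 14, 35)))  # ('kitchen', 'afternoon')
```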
link |
00:16:37.120
Right. When you're trying to get a PhD.
link |
00:16:39.440
Right. Or when you're doing anything really.
link |
00:16:41.600
Yeah, okay. Except for at that moment. At that moment,
link |
00:16:44.400
you do have to reason about the pose of your elbow, maybe. But then maybe you do that in some
link |
00:16:48.160
continuous joint space kind of model. And so again, my biggest point about all of this is that
link |
00:16:55.680
there should be, the dogma is not the thing, right? It shouldn't be that I am in favor
link |
00:17:01.440
of or against symbolic reasoning and you're in favor of or against neural networks. It should be that just
link |
00:17:07.600
computer science tells us what the right answer to all these questions is if we were smart enough
link |
00:17:12.240
to figure it out. Yeah. When you try to actually solve the problem with computers, the right answer
link |
00:17:16.960
comes out. You mentioned abstractions. I mean, neural networks form abstractions or rather,
link |
00:17:22.880
there's automated ways to form abstractions and there's expert driven ways to form abstractions
link |
00:17:30.320
and expert human driven ways. And humans just seem to be way better at forming abstractions
link |
00:17:35.920
currently, on certain problems. So when you're referring to 2:45 versus the afternoon,
link |
00:17:44.960
how do we construct that taxonomy? Is there any room for automated construction of such
link |
00:17:49.920
abstractions? Oh, I think eventually, yeah. I mean, I think when we get to be better
link |
00:17:56.160
machine learning engineers, we'll build algorithms that build awesome abstractions.
link |
00:18:02.240
That are useful in this kind of way that you're describing. Yeah. So let's then step from
link |
00:18:07.840
the abstraction discussion and let's talk about POMDPs,
link |
00:18:14.400
Partially Observable Markov Decision Processes. So uncertainty. So first, what are Markov Decision
link |
00:18:21.440
Processes? Maybe how much of our world can be modeled as
link |
00:18:27.520
MDPs? When you wake up in the morning and you're making breakfast, how do you think
link |
00:18:32.080
of yourself as an MDP? So how do you think about MDPs and how they relate to our world?
link |
00:18:38.080
Well, so there's a stance question, right? So a stance is a position that I take with
link |
00:18:43.040
respect to a problem. So I as a researcher or a person who designed systems can decide to make
link |
00:18:52.160
a model of the world around me in some terms. So I take this messy world and I say, I'm going to
link |
00:18:58.960
treat it as if it were a problem of this formal kind, and then I can apply solution concepts
link |
00:19:04.640
or algorithms or whatever to solve that formal thing, right? So of course, the world is not
link |
00:19:09.120
anything. It's not an MDP or a POMDP. I don't know what it is, but I can model aspects of it
link |
00:19:14.080
in some way or some other way. And when I model some aspect of it in a certain way, that gives me
link |
00:19:19.280
some set of algorithms I can use. You can model the world in all kinds of ways. Some
link |
00:19:26.400
are more accepting of uncertainty, more easily modeling uncertainty of the world. Some really
link |
00:19:32.880
force the world to be deterministic. And so certainly MDPs model the uncertainty of the world.
link |
00:19:40.720
Yes, they model some uncertainty. They don't model present-state uncertainty, but they model uncertainty
link |
00:19:47.200
in the way the future will unfold. Right. So what are Markov decision processes?
link |
00:19:53.840
So Markov decision process is a model. It's a kind of a model that you can make that says,
link |
00:19:57.680
I know completely the current state of my system. And what it means to be a state is that I have
link |
00:20:05.600
all the information right now that will let me make predictions about the future as well as I
link |
00:20:10.720
can. So that remembering anything about my history wouldn't make my predictions any better.
link |
00:20:18.720
But then it also says that then I can take some actions that might change the state of the world
link |
00:20:23.680
and that I don't have a deterministic model of those changes. I have a probabilistic model
link |
00:20:28.800
of how the world might change. It's a useful model for some kinds of systems. I mean, it's
link |
00:20:35.600
certainly not a good model for most problems. I think because for most problems, you don't
link |
00:20:43.280
actually know the state. For most problems, it's partially observed. So that's now a different
link |
00:20:49.680
problem class. So okay, that's where POMDPs, the partially observable Markov decision
link |
00:20:56.480
processes, step in. So how do they address the fact that you have incomplete
link |
00:21:03.600
information about most of the world around you? Right. So now the idea is we still kind of postulate
link |
00:21:09.360
that there exists a state. We think that there is some information about the world out there
link |
00:21:14.640
such that if we knew that we could make good predictions, but we don't know the state.
link |
00:21:18.800
And so then we have to think about how, but we do get observations. Maybe I get images or I hear
link |
00:21:23.840
things or I feel things and those might be local or noisy. And so therefore they don't tell me
link |
00:21:29.520
everything about what's going on. And then I have to reason about given the history of actions
link |
00:21:35.440
I've taken and observations I've gotten, what do I think is going on in the world? And then
link |
00:21:40.000
given my own kind of uncertainty about what's going on in the world, I can decide what actions to
link |
00:21:43.920
take. And so how difficult is this problem of planning under uncertainty, in your view and your
link |
00:21:51.120
long experience of modeling the world, trying to deal with this uncertainty in
link |
00:21:57.840
especially in real world systems. Optimal planning for even discrete POMDPs can be
link |
00:22:04.240
undecidable depending on how you set it up. And so lots of people say I don't use POMDPs
link |
00:22:12.000
because they are intractable. And I think that that's a kind of a very funny thing to say because
link |
00:22:18.880
the problem you have to solve is the problem you have to solve. So if the problem you have to
link |
00:22:23.120
solve is intractable, that's what makes us AI people, right? So we solve, we understand that
link |
00:22:28.160
the problem we're solving is wildly intractable that we will never be able to solve it optimally,
link |
00:22:34.320
at least I don't. Yeah, right. So later we can come back to an idea about bounded optimality
link |
00:22:41.360
and something. But anyway, we can't come up with optimal solutions to these problems.
link |
00:22:45.520
So we have to make approximations. Approximations in modeling, approximations in solution algorithms,
link |
00:22:51.200
and so on. And so I don't have a problem with saying, yeah, my problem actually is a POMDP in
link |
00:22:58.160
continuous space with continuous observations. And it's so computationally complex. I can't
link |
00:23:02.880
even think about its, you know, big O, whatever. But that doesn't prevent me, it helps me,
link |
00:23:10.320
gives me some clarity to think about it that way. And to then take steps to make approximation
link |
00:23:17.360
after approximation to get down to something that's like computable in some reasonable time.
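A minimal sketch of the MDP/POMDP distinction discussed here: in an MDP the agent knows the state; in a POMDP it maintains a belief, a probability distribution over states, updated from the action taken and the noisy observation received. The two-state example and all probabilities are invented.

```python
import numpy as np

# Two hidden states, invented transition and observation models.
# T[a][s, s'] : probability of moving from s to s' under action a.
T = {"stay": np.array([[0.9, 0.1],
                       [0.1, 0.9]]),
     "move": np.array([[0.2, 0.8],
                       [0.8, 0.2]])}
# O[s, o] : probability of seeing observation o while in state s (noisy sensor).
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(belief, action, observation):
    """One step of a discrete Bayes filter: push the belief through the
    transition model, weight by the observation likelihood, then normalize."""
    predicted = belief @ T[action]
    weighted = predicted * O[:, observation]
    return weighted / weighted.sum()

b = np.array([0.5, 0.5])          # start maximally uncertain about the state
b = belief_update(b, "move", 1)   # act, then see observation 1
print(b)                          # belief shifts toward the state that explains it
```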
link |
00:23:22.080
When you think about optimality, you know, the community broadly has shifted on that, I think,
link |
00:23:27.920
a little bit in how much they value the idea of optimality of chasing an optimal solution.
link |
00:23:35.600
How have your views on chasing an optimal solution changed over the years of working with robots?
link |
00:23:42.240
That's interesting. I think we have a little bit of a methodological crisis, actually,
link |
00:23:49.920
from the theoretical side. I mean, I do think that theory is important and that right now we're not
link |
00:23:54.000
doing much of it. So there's lots of empirical hacking around and training this and doing that
link |
00:24:00.640
and reporting numbers. But is it good? Is it bad? We don't know. It's very hard to say things.
link |
00:24:08.240
And if you look at like computer science theory, so people talked for a while,
link |
00:24:15.920
everyone was about solving problems optimally or completely. And then there were interesting
link |
00:24:21.280
relaxations. So people look at, oh, can I, are there regret bounds? Or can I do some kind of,
link |
00:24:27.520
you know, approximation? Can I prove something that I can approximately solve this problem or
link |
00:24:33.280
that I get closer to the solution as I spend more time and so on? What's interesting, I think,
link |
00:24:38.160
is that we don't have good approximate solution concepts for very difficult problems. Right?
link |
00:24:47.680
I like to, you know, I like to say that I'm interested in doing a very bad job of very big
link |
00:24:52.640
problems. Right. So very bad job, very big problems. I like to do that. But I wish I could say
link |
00:25:02.960
something. I wish I had a, I don't know, some kind of a formal solution concept
link |
00:25:10.320
that I could use to say, oh, this algorithm actually, it gives me something. Like, I know
link |
00:25:16.640
what I'm going to get. I can do something other than just run it and get out. So that notion
link |
00:25:21.760
is still somehow deeply compelling to you. The notion that you can say, you can drop a
link |
00:25:28.640
thing on the table that says, you can expect this, this algorithm will give me some good results.
link |
00:25:33.440
I hope there's, I hope science will, I mean, there's engineering and there's science,
link |
00:25:38.960
I think that they're not exactly the same. And I think right now we're making huge engineering
link |
00:25:45.600
like leaps and bounds. So the engineering is running away ahead of the science, which is cool.
link |
00:25:49.840
And often how it goes, right? So we're making things and nobody knows how and why they work,
link |
00:25:54.800
roughly. But we need to turn that into science. There's some form. It's, yeah,
link |
00:26:03.200
there's some room for formalizing. We need to know what the principles are. Why does this work?
link |
00:26:07.200
Why does that not work? I mean, for while people build bridges by trying, but now we can often
link |
00:26:12.480
predict whether it's going to work or not without building it. Can we do that for learning systems
link |
00:26:17.520
or for robots? See, your hope is from a materialistic perspective that intelligence,
link |
00:26:23.600
artificial intelligence systems, robots are kind of just fancier bridges.
link |
00:26:29.200
Belief space. What's the difference between belief space and state space? So we mentioned
link |
00:26:33.600
MDPs, POMDPs, you're reasoning about, you sense the world, there's a state. What's this belief
link |
00:26:42.000
space idea? Yeah. Okay, that sounds good. It sounds good. So belief space, that is, instead of
link |
00:26:49.040
thinking about what's the state of the world and trying to control that as a robot, I think about
link |
00:26:55.760
what is the space of beliefs that I could have about the world? What's, if I think of a belief
link |
00:27:01.120
as a probability distribution of the ways the world could be, a belief state is a distribution,
link |
00:27:06.640
and then my control problem, if I'm reasoning about how to move through a world I'm uncertain about,
link |
00:27:14.160
my control problem is actually the problem of controlling my beliefs. So I think about taking
link |
00:27:18.880
actions, not just what effect they'll have on the world outside, but what effect they'll have on my
link |
00:27:23.120
own understanding of the world outside. And so that might compel me to ask a question or look
link |
00:27:29.920
somewhere to gather information, which may not really change the world state, but it changes
link |
00:27:35.280
my own belief about the world. That's a powerful way to empower the agent to reason about the
link |
00:27:43.440
world, to explore the world. What kind of problems does it allow you to solve to
link |
00:27:49.040
consider belief space versus just state space? Well, any problem that requires deliberate
link |
00:27:54.560
information gathering. So if in some problems, like chess, there's no uncertainty, or maybe
link |
00:28:02.800
there's uncertainty about the opponent. There's no uncertainty about the state.
link |
00:28:08.400
And some problems, there's uncertainty, but you gather information as you go. You might say,
link |
00:28:14.000
oh, I'm driving my autonomous car down the road, and it doesn't know perfectly where it is, but
link |
00:28:18.240
the LiDARs are all going all the time. So I don't have to think about whether to gather information.
link |
00:28:24.160
But if you're a human driving down the road, you sometimes look over your shoulder to see what's
link |
00:28:28.800
going on behind you in the lane. And you have to decide whether you should do that now. And you
link |
00:28:36.320
have to trade off the fact that you're not seeing in front of you, and you're looking behind you,
link |
00:28:40.400
and how valuable is that information, and so on. And so to make choices about information
link |
00:28:45.440
gathering, you have to reason in belief space. Also to just take into account your own uncertainty
link |
00:28:56.080
before trying to do things. So you might say, if I understand where I'm standing relative to the
link |
00:29:03.280
door jamb, pretty accurately, then it's okay for me to go through the door. But if I'm really not
link |
00:29:08.880
sure where the door is, then it might be better to not do that right now. The degree of your
link |
00:29:14.240
uncertainty about the world is actually part of the thing you're trying to optimize in forming the
link |
00:29:18.800
plan, right? So this idea of a long horizon of planning for a PhD or just even how to get out
link |
00:29:26.560
of the house or how to make breakfast. You showed this presentation of the WTF, where's the fork,
link |
00:29:33.360
of a robot looking at a sink. Can you describe how we plan in this world with this idea of hierarchical
link |
00:29:42.000
planning we've mentioned? Yeah, how can a robot hope to plan about something with such a long
link |
00:29:52.000
horizon where the goal is quite far away? People since probably reasoning began have thought about
link |
00:29:58.400
hierarchical reasoning, the temporal hierarchy in particular. Well, there's spatial hierarchy,
link |
00:30:02.560
but let's talk about temporal hierarchy. So you might say, oh, I have this long
link |
00:30:06.240
execution I have to do, but I can divide it into some segments abstractly, right? So maybe
link |
00:30:14.400
have to get out of the house, I have to get in the car, I have to drive, and so on. And so
link |
00:30:20.800
you can plan if you can build abstractions. So this we started out by talking about abstractions,
link |
00:30:25.920
and we're back to that now. If you can build abstractions in your state space,
link |
00:30:30.080
and abstractions, sort of temporal abstractions, then you can make plans at a high level. And you
link |
00:30:37.760
can say, I'm going to go to town, and then I'll have to get gas, and I can go here, and I can do
link |
00:30:42.320
this other thing. And you can reason about the dependencies and constraints among these actions,
link |
00:30:47.920
again, without thinking about the complete details. What we do in our hierarchical planning work is
link |
00:30:55.600
then say, all right, I make a plan at a high level of abstraction. I have to have some
link |
00:31:00.960
reason to think that it's feasible without working it out in complete detail. And that's
link |
00:31:06.640
actually the interesting step. I always like to talk about walking through an airport, like
link |
00:31:12.160
you can plan to go to New York and arrive at the airport, and then find yourself in an office
link |
00:31:16.720
building later. You can't even tell me in advance what your plan is for walking through the airport,
link |
00:31:21.520
partly because you're too lazy to think about it maybe, but partly also because you just don't
link |
00:31:26.320
have the information. You don't know what gate you're landing in or what people are going to be
link |
00:31:30.960
in front of you or anything. So there's no point in planning in detail. But you have to have,
link |
00:31:38.000
you have to make a leap of faith that you can figure it out once you get there. And it's really
link |
00:31:43.760
interesting to me how you arrive at that. How do you, so you have learned over your lifetime to be
link |
00:31:52.000
able to make some kinds of predictions about how hard it is to achieve some kinds of sub goals.
link |
00:31:57.440
And that's critical. Like you would never plan to fly somewhere if you couldn't,
link |
00:32:02.000
didn't have a model of how hard it was to do some of the intermediate steps.
link |
00:32:05.200
So one of the things we're thinking about now is how do you do this kind of very aggressive
link |
00:32:09.440
generalization to situations that you haven't been in and so on to predict how long will it
link |
00:32:16.400
take to walk through the Kuala Lumpur airport? Like you could give me an estimate and it wouldn't
link |
00:32:20.400
be crazy. And you have to have an estimate of that in order to make plans that involve
link |
00:32:26.800
walking through the Kuala Lumpur airport, even if you don't need to know it in detail.
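A toy sketch of the kind of abstract plan with learned difficulty estimates she describes: high-level steps carry rough cost estimates, feasibility is checked at that level, and a step like walking through the airport is only refined when it is actually reached. All steps and numbers are made up.

```python
# Abstract plan: each step carries only an estimated cost, learned from
# experience; the details are filled in lazily at execution time.
abstract_plan = [
    ("pack_bag",             {"est_minutes": 20}),
    ("get_to_airport",       {"est_minutes": 45}),
    ("walk_through_airport", {"est_minutes": 30}),   # no gate info yet, just an estimate
    ("fly_to_destination",   {"est_minutes": 300}),
]

def plan_is_plausible(plan, deadline_minutes):
    """High-level feasibility check using only the abstract estimates."""
    return sum(info["est_minutes"] for _, info in plan) <= deadline_minutes

def refine(step):
    """Placeholder for the detailed planning done only once the step is reached,
    with the information available at that point (gate, crowds, and so on)."""
    return [f"{step}: detailed sub-plan computed at execution time"]

if plan_is_plausible(abstract_plan, deadline_minutes=480):
    print(refine("walk_through_airport"))
```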
link |
00:32:31.040
So I'm really interested in these kinds of abstract models and how do we acquire them.
link |
00:32:35.520
But once we have them, we can use them to do hierarchical reasoning, which I think is very
link |
00:32:39.760
important. Yeah, there's this notion of goal regression and preimage backchaining.
link |
00:32:46.400
This idea of starting at the goal and just forming these big clouds of states. I mean,
link |
00:32:54.560
it's almost like saying, to the airport, you know, once you show up to the airport,
link |
00:33:01.840
you're like a few steps away from the goal. So thinking of it this way is kind of interesting.
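A minimal sketch of the backward, goal-regression style of search being discussed: start from the states satisfying the goal and repeatedly compute the preimage, the set of states from which one action reaches the current cloud, until the start state is covered. The tiny domain is invented.

```python
# Invented toy domain: states are locations, actions are reversible hops.
edges = {"home": ["street"], "street": ["home", "station"],
         "station": ["street", "airport"], "airport": ["station", "gate"],
         "gate": ["airport"]}

def preimage(states):
    """All states from which one action reaches the given set of states."""
    return {s for s, nbrs in edges.items() if any(n in states for n in nbrs)}

def backward_search(goal, start):
    """Grow 'clouds' of states backward from the goal until the start is covered."""
    cloud, frontier, depth = {goal}, {goal}, 0
    while start not in cloud:
        frontier = preimage(frontier) - cloud
        if not frontier:
            return None                 # goal unreachable from the start
        cloud |= frontier
        depth += 1
    return depth                        # number of abstract steps from start to goal

print(backward_search(goal="gate", start="home"))  # -> 4
```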
link |
00:33:08.560
I don't know if you have further comments on that of starting at the goal. Yeah, I mean,
link |
00:33:15.600
it's interesting that Herb Simon back in the early days of AI talked a lot about
link |
00:33:22.400
means ends reasoning and reasoning back from the goal. There's a kind of an intuition that people
link |
00:33:26.960
have that the state space is big, the number of actions you could take is really big.
link |
00:33:35.760
So if you say, here I sit and I want to search forward from where I am, what are all the things
link |
00:33:39.440
I could do? That's just overwhelming. If you say, if you can reason at this other level and say,
link |
00:33:45.040
here's what I'm hoping to achieve, what can I do to make that true that somehow the
link |
00:33:49.520
branching is smaller? Now, what's interesting is that like in the AI planning community,
link |
00:33:54.000
that hasn't worked out in the class of problems that they look at and the methods that they tend
link |
00:33:59.120
to use, it hasn't turned out that it's better to go backward. It's still kind of my intuition
link |
00:34:04.400
that it is, but I can't prove that to you right now. Right. I share your intuition, at least for us
link |
00:34:10.720
mere humans. Speaking of which, maybe now we can take a little step into that
link |
00:34:19.920
philosophy circle. How hard would it be, when you think about human life, you give those examples
link |
00:34:27.280
often, how hard do you think it is to formulate human life as a planning problem or aspects of
link |
00:34:32.400
human life? So when you look at robots, you're often trying to think about object manipulation,
link |
00:34:38.640
tasks about moving a thing. When you take a slight step outside the room, let the robot
link |
00:34:46.240
leave and go get lunch, or maybe try to pursue more fuzzy goals. How hard do you think is that
link |
00:34:54.480
problem? Or, to maybe put it another way, try to formulate human life as a planning
link |
00:35:00.720
problem. Well, that would be a mistake. I mean, it's not all a planning problem, right? I think
link |
00:35:05.680
it's really, really important that we understand that you have to put together pieces and parts
link |
00:35:11.920
that have different styles of reasoning and representation and learning. I think it seems
link |
00:35:18.640
probably clear to anybody that it can't all be this or all be that. Brains aren't all like this
link |
00:35:25.680
or all like that, right? They have different pieces and parts and substructure and so on.
link |
00:35:30.160
So I don't think that there's any good reason to think that there's going to be like one true
link |
00:35:34.400
algorithmic thing that's going to do the whole job. Just a bunch of pieces together,
link |
00:35:39.600
designed to solve a bunch of specific problems. Or maybe styles of problems. I mean,
link |
00:35:48.160
there's probably some reasoning that needs to go on in image space. I think, again,
link |
00:35:55.840
there's this model base versus model free idea, right? So in reinforcement learning,
link |
00:35:59.440
people talk about, oh, what should I learn? I could learn a policy, just straight up a way of behaving.
link |
00:36:06.000
I could learn, it's popular, a value function. That's some kind of weird intermediate ground.
link |
00:36:13.360
Or I could learn a transition model, which tells me something about the dynamics of the world.
link |
00:36:18.320
If I take a, imagine that I learn a transition model and I couple it with a planner and I
link |
00:36:22.560
draw a box around that, I have a policy again. It's just stored a different way, right?
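A toy sketch of the point about the boxed model-plus-planner: wrapping a (learned) transition model with a one-step lookahead yields something with the same interface as an explicitly stored policy, trading computation time against storage. Everything here is a made-up example.

```python
# Toy deterministic world: the state is an integer, the goal is to reach 10.
def transition_model(state, action):          # stand-in for a learned dynamics model
    return state + (1 if action == "inc" else -1)

def reward(state):
    return -abs(10 - state)

# Option A: an explicit policy stored as a table (more space, instant lookup).
explicit_policy = {s: ("inc" if s < 10 else "dec") for s in range(-20, 21)}

# Option B: the model plus a one-step-lookahead planner (less storage, more compute).
def planner_policy(state):
    return max(["inc", "dec"], key=lambda a: reward(transition_model(state, a)))

# Same interface, same behavior on this toy problem:
print(explicit_policy[3], planner_policy(3))    # inc inc
print(explicit_policy[15], planner_policy(15))  # dec dec
```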
link |
00:36:30.800
But it's just as much of a policy as the other policy. It's just I've made, I think,
link |
00:36:34.560
the way I see it is it's a time space trade off in computation, right? A more overt policy
link |
00:36:41.920
representation. Maybe it takes more space, but maybe I can compute quickly what action I should
link |
00:36:47.680
take. On the other hand, maybe a very compact model of the world dynamics plus a planner
link |
00:36:53.680
lets me compute what action to take, just more slowly. There's no, I mean, I don't think,
link |
00:36:58.240
there's no argument to be had. It's just like a question of what form of computation is best
link |
00:37:04.240
for us. For the various sub problems. Right. So, and so like learning to do algebra manipulations
link |
00:37:12.720
for some reason is, I mean, that's probably going to want naturally a sort of a different
link |
00:37:17.280
representation than riding a unicycle. The time constraints on the unicycle are serious.
link |
00:37:22.640
The space is maybe smaller. I don't know. But then the more human side of
link |
00:37:28.640
falling in love, having a relationship, that might be another style. I have no idea how to model
link |
00:37:36.240
that. Yeah, let's first solve the algebra and the object manipulation. What do you think
link |
00:37:44.160
is harder, perception or planning? Perception. That's, well, understanding, that's why.
link |
00:37:51.920
So what do you think is so hard about perception, about understanding the world around you?
link |
00:37:55.440
Well, I mean, I think the big question is representational. Hugely, the question is
link |
00:38:03.520
representation. So perception has made great strides lately, right? And we can classify images and we
link |
00:38:12.560
can play certain kinds of games and predict how to steer the car and all this sort of stuff.
link |
00:38:17.760
I don't think we have a very good idea of what perception should deliver, right? So if you
link |
00:38:28.160
believe in modularity, okay, there's a very strong view which says
link |
00:38:34.640
we shouldn't build in any modularity, we should make a giant gigantic neural network,
link |
00:38:40.400
train it end to end to do the thing. And that's the best way forward.
link |
00:38:44.000
And it's hard to argue with that except on a sample complexity basis, right? So you might say,
link |
00:38:51.280
oh, well, if I want to do end to end reinforcement learning on this gigantic neural network,
link |
00:38:55.120
it's going to take a lot of data and a lot of like broken robots and stuff. So
link |
00:39:02.640
then the only answer is to say, okay, we have to build something in build in some structure
link |
00:39:10.000
or some bias, we know from theory of machine learning, the only way to cut down the sample
link |
00:39:14.080
complexity is to somehow cut down the hypothesis space, and you can do that by
link |
00:39:20.320
building in bias. There's all kinds of reason to think that nature built bias into humans.
link |
00:39:27.520
Convolution is a bias, right? It's a very strong bias and it's a very critical bias.
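A small illustration of the sample-complexity point: convolution's locality and weight sharing mean far fewer free parameters than a fully connected layer over the same image, i.e., a much smaller hypothesis space. The layer sizes are arbitrary.

```python
# Parameter counts for one layer over a 32x32 grayscale image: a fully
# connected layer versus a small convolution. The convolution's locality and
# weight sharing are a built-in bias that shrinks the hypothesis space.
image_pixels = 32 * 32
hidden_units = 32 * 32          # same-sized output, for a fair comparison

fully_connected_params = image_pixels * hidden_units + hidden_units  # weights + biases

kernel_size = 3 * 3
out_channels = 1
conv_params = kernel_size * out_channels + out_channels              # shared weights + bias

print(f"fully connected: {fully_connected_params:,} parameters")     # 1,049,600
print(f"3x3 convolution: {conv_params:,} parameters")                # 10
```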
link |
00:39:32.800
So my view is that we should look for more things that are like convolution, but that address other
link |
00:39:39.520
aspects of reasoning, right? So convolution helps us a lot with a certain kind of spatial
link |
00:39:43.440
reasoning that's quite close to the imaging. I think there's other ideas like that,
link |
00:39:52.400
maybe some amount of forward search, maybe some notions of abstraction, maybe the notion that
link |
00:39:58.240
objects exist, actually, I think that's pretty important. And a lot of people won't give you
link |
00:40:02.560
that to start with, right? So almost like a convolution in the
link |
00:40:08.640
semantic object space, or some kind of ideas in there. That's right.
link |
00:40:13.600
And people are, like, graph convolutions are an idea that's related to
link |
00:40:17.680
relational representations. And so, I've come far afield from perception,
link |
00:40:26.240
but I think the thing that's going to make perception take the next step is
link |
00:40:33.200
actually understanding better what it should produce, right? So what are we going to do with
link |
00:40:38.000
the output of it, right? It's fine when what we're going to do with the output is steer,
link |
00:40:41.920
it's less clear when we're just trying to make one integrated intelligent agent,
link |
00:40:48.880
what should the output of perception be? We have no idea. And how should that hook up to the other
link |
00:40:53.520
stuff? We don't know. So I think the pressing question is, what kinds of structure can we
link |
00:41:00.240
build in that are like the moral equivalent of convolution that will make a really awesome
link |
00:41:05.520
superstructure that then learning can kind of progress on efficiently?
link |
00:41:10.240
I agree. A very compelling description of where we actually stand with perception. So
link |
00:41:15.280
you're teaching a course on embodied intelligence. What do you think it takes to
link |
00:41:19.120
build a robot with human level intelligence? I don't know. If we knew, we would do it.
link |
00:41:27.680
If you were to, I mean, okay, so do you think a robot needs to have a self awareness,
link |
00:41:36.000
consciousness, fear of mortality? Or is it, is it simpler than that? Or is consciousness a simple
link |
00:41:44.160
thing? Do you think about these notions? I don't think much about consciousness. Even most philosophers
link |
00:41:51.680
who care about it will give you that you could have robots that are zombies, right, that behave
link |
00:41:56.560
like humans but are not conscious. And I, at this moment, would be happy enough for that. So I'm not
link |
00:42:01.360
really worried one way or the other. So on the technical side, you're not thinking of the use of self
link |
00:42:06.240
awareness? Well, but I, okay, but then what does self awareness mean? I mean, that you need to have
link |
00:42:13.760
some part of the system that can observe other parts of the system and tell whether they're
link |
00:42:18.800
working well or not. That seems critical. So does that count as, I mean, does that count as
link |
00:42:24.560
self awareness or not? Well, it depends on whether you think that there's somebody at home who can
link |
00:42:30.560
articulate whether they're self aware. But clearly, if I have like, you know, some piece of code
link |
00:42:35.600
that's counting how many times this procedure gets executed, that's a kind of self awareness,
link |
00:42:41.120
right? So there's a big spectrum. It's clear you have to have some of it.
link |
00:42:44.560
Right. You know, we're quite far away on many dimensions, but what is the direction of research
link |
00:42:49.600
that's most compelling to you for, you know, trying to achieve human level intelligence
link |
00:42:54.720
in our robots? Well, to me, I guess the thing that seems most compelling to me at the moment is this
link |
00:43:00.880
question of what to build in and what to learn. I think we're, we don't, we're missing a bunch of
link |
00:43:10.320
ideas. And, you know, people, don't you dare ask me how many years it's going
link |
00:43:17.120
to be until that happens, because I won't even participate in the conversation. Because I think
link |
00:43:22.320
we're missing ideas and I don't know how long it's going to take to find them. So I won't ask you
link |
00:43:26.240
how many years, but maybe I'll ask you when you will be sufficiently impressed that we've
link |
00:43:34.160
achieved it. So what's a good test of intelligence? Do you like the Turing test and natural language
link |
00:43:41.280
in the robotic space? Is there something where you would sit back and think, oh, that's pretty
link |
00:43:47.520
impressive as a test, as a benchmark. Do you think about these kinds of problems?
link |
00:43:52.800
No, I resist. I mean, I think all the time that we spend arguing about those kinds of things could
link |
00:43:58.480
be better spent just making the robots work better. So you don't value competition? So I mean,
link |
00:44:04.800
there's the nature of benchmarks and data sets, or Turing test challenges, where
link |
00:44:11.280
everybody kind of gets together and tries to build a better robot because they want to outcompete
link |
00:44:15.440
each other, like the DARPA challenge with the autonomous vehicles. Do you see the value of that?
link |
00:44:23.600
Or can it get in the way? I think it can get in the way. I mean, some people, many people find it
link |
00:44:27.440
motivating. And so that's good. I find it anti motivating personally. But I think you get an
link |
00:44:34.880
interesting cycle where for a contest, a bunch of smart people get super motivated and they hack
link |
00:44:41.200
their brains out. And much of what gets done is just hacks, but sometimes really cool ideas emerge.
link |
00:44:47.120
And then that gives us something to chew on after that. So it's not a thing for me, but I don't
link |
00:44:53.360
I don't regret that other people do it. Yeah, it's like you said, with everything else that
link |
00:44:58.480
makes us good. So jumping topics a little bit, you started the Journal of Machine Learning Research
link |
00:45:04.560
and served as its editor in chief. How did the publication come about?
link |
00:45:11.680
And what do you think about the current publishing model space in machine learning
link |
00:45:17.040
artificial intelligence? Okay, good. So it came about because there was a journal called Machine
link |
00:45:23.040
Learning, which still exists, which was owned by Kluwer. And I was on the editorial
link |
00:45:29.840
board and we used to have these meetings annually where we would complain to Kluwer that
link |
00:45:33.520
it was too expensive for the libraries and that people couldn't publish. And we would really
link |
00:45:37.280
like to have some kind of relief on those fronts. And they would always sympathize,
link |
00:45:41.920
but not do anything. So we just decided to make a new journal. And there was the Journal of AI
link |
00:45:49.120
Research, which was on the same model, which had been in existence for maybe five years or so,
link |
00:45:54.880
and it was going on pretty well. So we just made a new journal. It wasn't I mean,
link |
00:46:03.600
I don't know, I guess it was work, but it wasn't that hard. So basically the editorial board,
link |
00:46:07.600
probably 75% of the editorial board of Machine Learning resigned. And we founded the new journal.
link |
00:46:17.440
But it was sort of it was more open. Yeah, right. So it's completely open. It's open access.
link |
00:46:25.280
Actually, I had a postdoc, George Konidaris, who wanted to call these journals free for all.
link |
00:46:33.520
Because, I mean, it both has no page charges and has no
link |
00:46:40.080
access restrictions. And so lots of people, I mean,
link |
00:46:45.520
there were people who were mad about the existence of this journal, who thought it was a fraud or
link |
00:46:50.240
something, it would be impossible, they said, to run a journal like this with basically,
link |
00:46:55.200
I mean, for a long time, I didn't even have a bank account. I paid for the
link |
00:46:59.840
lawyer to incorporate and for the IP address. And it just cost a couple hundred dollars a year
link |
00:47:06.640
to run. It's a little bit more now, but not that much more. But that's because I think computer
link |
00:47:12.880
scientists are competent and autonomous in a way that many scientists in other fields aren't.
link |
00:47:19.920
I mean, at doing these kinds of things. We already typeset our own papers,
link |
00:47:23.920
we all have students and people who can hack a website together in the afternoon.
link |
00:47:28.000
So the infrastructure for us was like, not a problem, but for other people in other fields,
link |
00:47:32.960
it's a harder thing to do. Yeah. And this kind of open access journal is nevertheless,
link |
00:47:38.960
one of the most prestigious journals. So prestige can be achieved
link |
00:47:45.840
without any of the paper. Paper is not required for prestige, it turns out. Yeah.
link |
00:47:50.640
So on the review process side, actually, a long time ago, I don't remember when, I reviewed a paper
link |
00:47:56.960
where you were also a reviewer and I remember reading your review and being influenced by it.
link |
00:48:01.360
It was really well written. It influenced how I write future reviews. You disagreed with me,
link |
00:48:06.480
actually, and it made my review much better. But nevertheless, the review process
link |
00:48:16.880
has its flaws. And what do you think works well? How can it be improved?
link |
00:48:23.600
So actually, when I started JMLR, I wanted to do something completely different.
link |
00:48:28.720
And I didn't because it felt like we needed a traditional journal of record and so we just
link |
00:48:34.800
made JMLR be almost like a normal journal, except for the open access parts of it, basically.
link |
00:48:43.200
Increasingly, of course, publication is not even a sensible word. You can publish something by
link |
00:48:47.600
putting it on arXiv, so I can publish everything tomorrow. So making stuff public is
link |
00:48:54.400
there's no barrier. We still need curation and evaluation. I don't have time to read all of
link |
00:49:04.800
arXiv. And you could argue that kind of social thumbs-upping of articles suffices, right? You
link |
00:49:20.480
might say, oh, heck with this, we don't need journals at all. We'll put everything on arXiv
link |
00:49:25.440
and people will upvote and downvote the articles and then your CV will say, oh, man, he got a lot
link |
00:49:30.400
of upvotes. So that's good. But I think there's still value in careful reading and commentary of
link |
00:49:44.000
things. And it's hard to tell when people are upvoting and downvoting or arguing about your
link |
00:49:48.480
paper on Twitter and Reddit, whether they know what they're talking about. So then I have the
link |
00:49:55.440
second order problem of trying to decide whose opinions I should value and such. So I don't
link |
00:50:01.360
know. If I had infinite time, which I don't, and I'm not going to do this because I really want to
link |
00:50:06.240
make robots work, but if I felt inclined to do something more in a publication direction,
link |
00:50:12.880
I would do this other thing, which I thought about doing the first time, which is to get
link |
00:50:16.160
together some set of people whose opinions I value and who are pretty articulate. And I guess we
link |
00:50:22.480
would be public, although we could be private, I'm not sure. And we would review papers. We wouldn't
link |
00:50:27.520
publish them and you wouldn't submit them. We would just find papers and we would write reviews
link |
00:50:32.720
and we would make those reviews public. And maybe if you, you know, so we're Leslie's friends who
link |
00:50:39.120
review papers and maybe eventually if we, our opinion was sufficiently valued, like the opinion
link |
00:50:45.200
of JMLR is valued, then you'd say on your CV that Leslie's friends gave my paper a five star rating
link |
00:50:50.800
and that would be just as good as saying I got it accepted into this journal. So I think we
link |
00:50:58.800
should have good public commentary and organize it in some way, but I don't really know how to
link |
00:51:04.800
do it. It's interesting times. The way you describe it actually is really interesting. I mean,
link |
00:51:09.120
we do it for movies, IMDB.com. There's experts, critics come in, they write reviews, but there's
link |
00:51:15.040
also regular non-critic humans who write reviews, and they're separated. I like OpenReview.
link |
00:51:22.240
The ICLR process, I think, is interesting. It's a step in the right direction, but it's still
link |
00:51:31.600
not as compelling as reviewing movies or video games. I mean, it sometimes almost, it might be
link |
00:51:39.840
silly, at least from my perspective to say, but it boils down to the user interface, how fun and
link |
00:51:44.400
easy it is to actually perform the reviews, how efficient, how much you as a reviewer get
link |
00:51:51.200
street cred for being a good reviewer. Those human elements come into play.
link |
00:51:57.200
No, it's a big investment to do a good review of a paper and the flood of papers is out of control.
link |
00:52:05.280
There aren't 3,000 new... I don't know how many new movies there are in a year,
link |
00:52:08.960
but it's probably going to be fewer than the number of machine learning papers there are in a year now.
link |
00:52:19.840
Right, so I'm like an old person, so of course I'm going to say,
link |
00:52:23.520
things are moving too fast, I'm a stick in the mud. So I can say that, but my particular flavor
link |
00:52:30.240
of that is, I think the horizon for researchers has gotten very short, that students want to
link |
00:52:38.240
publish a lot of papers and it's exciting and there's value in that and you get patted on the
link |
00:52:46.000
head for it and so on. And some of that is fine, but I'm worried that we're driving out people who
link |
00:52:58.320
would spend two years thinking about something. Back in my day, when we worked on our theses,
link |
00:53:05.280
we did not publish papers, you did your thesis for years, you picked a hard problem and then you
link |
00:53:10.560
worked and chewed on it and did stuff and wasted time, for a long time. And when it was roughly
link |
00:53:16.320
done, you would write papers. And so, I don't know; I don't think that
link |
00:53:22.800
everybody has to work in that mode, but I think there's some problems that are hard enough
link |
00:53:27.680
that it's important to have a longer research horizon and I'm worried that
link |
00:53:31.680
we don't incentivize that at all at this point, in this current structure. So, what are
link |
00:53:41.440
your hopes and fears about the future of AI, continuing on this theme? AI has
link |
00:53:47.280
gone through a few winters, ups and downs. Do you see another winter of AI coming?
link |
00:53:53.440
Or are you more hopeful about making robots work, as you said? I think the cycles are inevitable,
link |
00:54:03.040
but I think each time we get higher, right? I mean, it's like climbing some kind of
link |
00:54:10.080
landscape with a noisy optimizer. So it's clear that the deep learning stuff has
link |
00:54:19.600
made deep and important improvements. And so the high watermark is now higher. There's no question.
link |
00:54:25.760
But of course, I think people are overselling and eventually investors, I guess, and other people
link |
00:54:34.400
look around and say, well, you're not quite delivering on this grand claim and that wild
link |
00:54:40.640
hypothesis. It's probably going to crash somewhat, and then it's okay. I mean,
link |
00:54:47.680
it's okay. But I can't imagine that there's some awesome monotonic improvement
link |
00:54:54.000
from here to human-level AI. So, you know, I have to ask this question, and I can probably anticipate
link |
00:55:01.760
the answers. But do you have a worry, short term or long term, about the existential
link |
00:55:09.120
threats of AI, and maybe, short term, the less existential concern of robots taking away jobs?
link |
00:55:20.480
Well, actually, let me talk a little bit about utility. Actually, I had an interesting conversation
link |
00:55:28.000
with some military ethicists who wanted to talk to me about autonomous weapons.
link |
00:55:32.480
And they were interesting, smart, well-educated guys who didn't know too much about AI or
link |
00:55:39.360
machine learning. And the first question they asked me was, has your robot ever done something you
link |
00:55:43.600
didn't expect? And I, like, burst out laughing, because anybody who's ever done something with a robot,
link |
00:55:49.120
right, knows that they don't do much. And what I realized was that their model of how we program
link |
00:55:54.720
a robot was completely wrong. Their model of how we program a robot was
link |
00:55:59.440
like Lego Mindstorms: oh, go forward a meter, turn left, take a picture,
link |
00:56:05.600
do this, do that. And so if you have that model of programming, then it's true, it's kind of weird
link |
00:56:11.120
that your robot would do something that you didn't anticipate. But the fact is, and actually,
link |
00:56:16.240
so now this is my new educational mission: if I have to talk to non-experts, I try to teach them
link |
00:56:22.720
the idea that we operate at least one, or maybe many, levels of abstraction
link |
00:56:28.080
above that. And we say, oh, here's a hypothesis class: maybe it's a space of plans, or maybe it's a
link |
00:56:33.280
space of classifiers, or whatever. But there's some set of answers and an objective function. And
link |
00:56:38.400
then we work on some optimization method that tries to optimize a solution in that class.
link |
00:56:46.080
And we don't know what solution is going to come out. Right. So I think it's important to
link |
00:56:50.560
communicate that. So I mean, of course, probably people who listen to this, they know that lesson.
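To make that level of abstraction concrete, here is a minimal, hypothetical Python sketch (not code from the conversation; the dataset, hypothesis class, and names are invented for illustration). We write down a hypothesis class of threshold classifiers, an objective function, and a crude optimizer, and the particular solution that comes out is found by the search rather than authored line by line.

```python
# Hypothetical sketch: programming one level of abstraction up.
# We specify a hypothesis class, an objective, and an optimizer --
# not the answer itself.

# Hypothesis class: threshold classifiers h_t(x) = 1 if x >= t else 0.
def make_classifier(t):
    return lambda x: 1 if x >= t else 0

# Objective function: accuracy on a small toy dataset (made up here).
data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

def objective(classifier):
    return sum(classifier(x) == y for x, y in data) / len(data)

# Optimizer: a crude search over candidate thresholds in the class.
candidates = [i / 100 for i in range(101)]
best_t = max(candidates, key=lambda t: objective(make_classifier(t)))

# The engineer wrote the class, the objective, and the search,
# but not the solution; best_t is whatever the search found.
print(f"selected threshold: {best_t:.2f}, "
      f"accuracy: {objective(make_classifier(best_t)):.2f}")
```

If your mental model is step-by-step Lego Mindstorms programming, the output is a surprise; if your model is class plus objective plus optimizer, surprise is the expected outcome.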
link |
00:56:55.520
But I think it's really critical to communicate that lesson. And then lots of people are now
link |
00:56:59.600
talking about, you know, the value alignment problem. So you want to be sure, as robots or
link |
00:57:06.480
software systems get more competent, that their objectives are aligned with your objectives,
link |
00:57:11.280
or that our objectives are compatible in some way, or we have a good way of mediating when they have
link |
00:57:17.680
different objectives. And so I think it is important to start thinking in terms, like,
link |
00:57:22.240
you don't have to be freaked out by the robot apocalypse to accept that it's important to think
link |
00:57:28.480
about objective functions and value alignment. And, really, everyone who's done
link |
00:57:33.760
optimization knows that you have to be careful what you wish for: sometimes you get
link |
00:57:38.160
the optimal solution, and you realize, man, that objective was wrong. So pragmatically,
link |
00:57:45.280
in the shortest term, it seems to me that those are really interesting and critical questions.
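As a small illustration of the "careful what you wish for" point, here is a hypothetical toy example (the environment, rewards, and policy names are invented for this sketch): we intend for an agent to reach a goal, but the shaped reward we write also pays a bonus for merely being near the goal, and the literal optimum of that objective is to loiter next to the goal forever.

```python
# Hypothetical toy example of a misspecified objective: the optimizer
# prefers the literal optimum of the reward we wrote, not what we meant.

HORIZON = 20
GOAL = 10

def total_reward(policy):
    pos, reward = 0, 0.0
    for _ in range(HORIZON):
        pos = policy(pos)
        if pos == GOAL:               # reaching the goal ends the episode
            reward += 5.0
            break
        if abs(pos - GOAL) <= 1:      # shaping bonus for being "near" the goal
            reward += 1.0
    return reward

go_to_goal = lambda pos: min(pos + 1, GOAL)   # the behavior we intended
loiter_near_goal = lambda pos: GOAL - 1       # exploits the shaping bonus

policies = {"go_to_goal": go_to_goal, "loiter_near_goal": loiter_near_goal}
for name, policy in policies.items():
    print(name, total_reward(policy))         # 6.0 vs. 20.0

# The stated objective, not our intent, decides what comes out on top.
print("optimizer prefers:", max(policies, key=lambda n: total_reward(policies[n])))
```

The fix here is not a smarter optimizer but a better objective, which is exactly the shift from engineering algorithms to engineering objective functions discussed next.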
link |
00:57:51.360
And the idea that we're going to go from being people who engineer algorithms to being people
link |
00:57:55.680
who engineer objective functions, I think that's definitely going to happen. And that's
link |
00:58:00.800
going to change our thinking and methodology and stuff.
link |
00:58:03.360
You started at Stanford in philosophy, went to computer science,
link |
00:58:07.520
and maybe you will go back to philosophy. Well, I mean, they're mixed together because,
link |
00:58:13.840
as we all know as machine learning people, right? When you design, in fact, this is the
link |
00:58:18.240
lecture I gave in class today, when you design an objective function, you have to wear both hats.
link |
00:58:23.360
There's the hat that says, what do I want? And there's the hat that says, but I know what my
link |
00:58:28.320
optimizer can do to some degree. And I have to take that into account. So it's, it's always a
link |
00:58:34.240
trade off. And we have to kind of be mindful of that. The part about taking people's jobs,
link |
00:58:40.480
I understand that that's important, but I don't understand sociology or economics or people
link |
00:58:47.360
very well. So I don't know how to think about that. So that's, yeah, so there might be a
link |
00:58:51.840
sociological aspect there, the economic aspect that's very difficult to think about. Okay.
link |
00:58:56.640
I mean, I think other people should be thinking about it, but I'm just, that's not my strength.
link |
00:59:00.000
So what do you think is the most exciting area of research in the short term,
link |
00:59:04.320
for the community and for yourself? Well, so, I mean, there's this story I've been
link |
00:59:08.560
telling about how to engineer intelligent robots. So that's what we want to do. We all kind of want
link |
00:59:16.480
to do this, or, well, I mean, some set of us want to do this. And the question is, what's the most effective
link |
00:59:20.960
strategy? And we've tried, and there's a bunch of different things you could do at the extremes,
link |
00:59:25.840
right? One super extreme is we do introspection and we write a program. Okay, that has not worked
link |
00:59:32.000
out very well. Another extreme is we take a giant bunch of neural goo and we try to train it up to
link |
00:59:37.360
do something. I don't think that's going to work either. So the question is, what's the middle
link |
00:59:43.040
ground? And again, this isn't a theological question or anything like that. It's just,
link |
00:59:49.840
like, what's the best way to make this work out? And I think it's clear, to me,
link |
00:59:57.040
that it's a combination of learning and not learning.
link |
01:00:02.400
And what should that combination be? And what's the stuff we build in? So to me,
link |
01:00:05.920
that's the most compelling question. And when you say engineer robots, you mean
link |
01:00:10.080
engineering systems that work in the real world. That's the emphasis.
link |
01:00:17.600
Last question: which robot, or robots, is your favorite from science fiction?
link |
01:00:24.480
So you can go with Star Wars and R2-D2, or you can go with something more modern, maybe HAL.
link |
01:00:32.960
No, sir, I don't think I have a favorite robot from science fiction.
link |
01:00:37.040
This is back to the point that you like to make robots work in the real world here, not in science fiction.
link |
01:00:45.520
I mean, I love the process. And I care more about the process.
link |
01:00:50.000
The engineering process.
link |
01:00:51.600
Yeah. I mean, I do research because it's fun, not because I care about what we produce.
link |
01:00:57.520
Well, that's a beautiful note, actually. Leslie, thank you so much for talking today.
link |
01:01:01.920
Sure, it's been fun.