Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108

The following is a conversation with Sergey Levine, a professor at Berkeley and a world class researcher in deep learning, reinforcement learning, robotics, and computer vision, including the development of algorithms for end-to-end training of neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, and, in general, deep RL algorithms.

Quick summary of the ads. Two sponsors, Cash App and ExpressVPN. Please consider supporting the podcast by downloading Cash App and using code LexPodcast, and signing up at expressvpn.com slash LexPod. Click the links, buy the stuff, it's the best way to support this podcast and, in general, the journey I'm on.

If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcast, follow on Spotify, support it on Patreon, or connect with me on Twitter at lexfridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for taking a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

This show is also sponsored by ExpressVPN. Get it at expressvpn.com slash LexPod to support this podcast and to get an extra three months free on a one year package. I've been using ExpressVPN for many years. I love it. I think ExpressVPN is the best VPN out there. They told me to say it, but it happens to be true in my humble opinion. It doesn't log your data, it's crazy fast, and it's easy to use, literally just one big power on button. Again, it's probably obvious to you, but I should say it again, it's really important that they don't log your data. It works on Linux and every other operating system, but Linux of course is the best operating system. Shout out to my favorite flavor, Ubuntu MATE 20.04. Once again, get it at expressvpn.com slash LexPod to support this podcast and to get an extra three months free on a one year package.

And now here's my conversation with Sergey Levine.
What's the difference between a state of the art human, such as you and I... well, I don't know if we qualify as state of the art humans... but a state of the art human and a state of the art robot?

That's a very interesting question. Robot capability is, I think, a very tricky thing to understand, because there are some things that are difficult that we wouldn't think are difficult, and some things that are easy that we wouldn't think are easy. And there's also a really big gap between the capabilities of robots in terms of hardware, their physical capability, and the capabilities of robots in terms of what they can do autonomously. There is a little video that I think robotics researchers really like to show, especially robot learning researchers like myself, from 2004 from Stanford, which demonstrates a prototype robot called the PR1. The PR1 was a robot that was designed as a home assistance robot, and there's this beautiful video showing the PR1 tidying up a living room, putting away toys, and at the end, bringing a beer to the person sitting on the couch, which looks really amazing. And then the punchline is that this robot is entirely controlled by a person. So, in some ways, the gap between a state of the art human and a state of the art robot, if the robot has a human brain, is actually not that large. Now, obviously, human bodies are sophisticated and very robust and resilient in many ways. But on the whole, if we're willing to spend a bit of money and do a bit of engineering, we can kind of close the hardware gap, almost. But the intelligence gap, that one is very wide.

And when you say hardware, you're referring to the physical, sort of the actuators, the actual body of the robot, as opposed to the hardware on which the cognition, the hardware of the nervous system, runs.

Yes, exactly. I'm referring to the body rather than the mind.

So that means that the work is kind of cut out for us. While we can still make the body better, we kind of know that the big bottleneck right now is really the mind.

And how big is that gap? How big is the difference, in your sense, in the ability to learn, ability to reason, ability to perceive the world between humans and our best robots?

The gap is very large, and the gap becomes larger the more unexpected events can happen in the world. So essentially, the spectrum along which you can measure the size of that gap is the spectrum of how open the world is. If you control everything in the world very tightly, if you put the robot in a factory and you tell it where everything is and you rigidly program its motion, then it can do things, one might even say, in a superhuman way. It can move faster, it's stronger, it can lift up a car and things like that. But as soon as anything starts to vary in the environment, now it'll trip up. And if many, many things vary, like they would in your kitchen, for example, then things are pretty much wide open.
Now, again, we're going to stick a bit on the philosophical questions, but how much of the cognitive abilities on the human side, in your sense, is nature versus nurture? So how much of it is a product of evolution, and how much of it is something we learn sort of from scratch from the day we're born?

I'm going to read into your question as asking about the implications of this for AI, because I'm not a biologist, so I can't really speak authoritatively about it.

So if it's all about learning, then there's more hope for AI.

Yeah. So the way that I look at this is that, you know, well, first, of course, biology is very messy. If you ask the question, how does a person do something, or how does a person's mind do something, you can come up with a bunch of hypotheses, and oftentimes you can find support for many different, often conflicting hypotheses. One way that we can approach the question of what the implications of this are for AI is we can think about what's sufficient. So, you know, maybe a person is, from birth, very, very good at some things, like, for example, recognizing faces. There's a very strong evolutionary pressure to do that. If you can recognize your mother's face, then you're more likely to survive, and therefore people are good at this. But we can also ask, what's the minimum sufficient thing? And one of the ways that we can study the minimal sufficient thing is we could, for example, see what people do in unusual situations. If you present them with things that evolution couldn't have prepared them for... and our daily lives actually do this to us all the time, we didn't evolve to deal with, you know, automobiles and space flight and whatever... there are all these situations that we can find ourselves in, and we do very well there. Like, I can give you a joystick to control a robotic arm, which you've never used before, and you might be pretty bad for the first couple of seconds. But if I tell you, like, your life depends on using this robotic arm to open this door, you'll probably manage it. Even though you've never seen this device before, you've never used a joystick to control a robot arm, you'll kind of muddle through it. And that's not your evolved natural ability, that's your flexibility, your adaptability. And that's exactly where our current robotic systems really kind of fall flat.
But I wonder how much of what we think of as common sense, almost like general pre-trained models, is underneath all of that. So that ability to adapt to a joystick requires you to have a kind of... you know, I'm human, so it's hard for me to introspect all the knowledge I have about the world. But it seems like there might be an iceberg underneath of the amount of knowledge we actually bring to the table. That's kind of the open question.

I think there's absolutely an iceberg of knowledge that we bring to the table, but I think it's very likely that iceberg of knowledge is actually built up over our lifetimes, because we have a lot of prior experience to draw on. And it kind of makes sense that the right way for us to optimize our efficiency, our evolutionary fitness, and so on, is to utilize all that experience to build up the best iceberg we can get. And while that sounds an awful lot like what machine learning actually does, I think that for modern machine learning, it's actually a really big challenge to take this unstructured mass of experience and distill out something that looks like a common sense understanding of the world. And perhaps part of that is not because something about machine learning itself is broken or hard, but because we've been a little too rigid in subscribing to a very supervised, very rigid notion of learning, you know, kind of the input-output, Xs go to Ys sort of model. And maybe what we really need to do is to view the world more as a mass of experience that is not necessarily providing any rigid supervision, but is providing many, many instances of things that could be. And then you take that and you distill it into some sort of common sense understanding.
I see. Well, you're painting an optimistic, beautiful picture, especially from the robotics perspective, because that means we just need to invest in and build better learning algorithms, figure out how we can get access to more and more data for those learning algorithms to extract signal from, and then accumulate that iceberg of knowledge. It's a beautiful picture. It's a hopeful one.

I think it's potentially a little bit more than just that. And this is where we perhaps reach the limits of our current understanding. But one thing that I think the research community hasn't really resolved in a satisfactory way is how much it matters where that experience comes from. Like, do you just download everything on the internet and cram it into essentially the 21st century analog of the giant language model and then see what happens? Or does it actually matter whether your machine physically experiences the world, in the sense that it actually attempts things, observes the outcomes of its actions, and kind of augments its experience that way?

That it chooses which parts of the world it gets to interact with and observe and learn from.

Right. It may be that the world is so complex that simply obtaining a large mass of sort of IID samples of the world is a very difficult way to go. But if you are actually interacting with the world, essentially performing this sort of hard negative mining by attempting what you think might work, observing the sometimes happy and sometimes sad outcomes of that, and augmenting your understanding using that experience, and you're just doing this continually for many years, maybe that sort of data in some sense is actually much more favorable to obtaining a common sense understanding. One reason we might think that this is true is that what we associate with common sense, or lack of common sense, is often characterized by the ability to reason about kind of counterfactual questions. Like, if I were to... here, this bottle of water is sitting on the table, everything is fine. If I were to knock it over, which I'm not going to do, but if I were to do that, what would happen? And I know that nothing good would happen from that. But if I have a bad understanding of the world, I might think that that's a good way for me to gain more utility. If I actually go about daily life doing the things that my current understanding of the world suggests will give me high utility, in some ways I'll get exactly the right supervision to tell me not to do those bad things and to keep doing the good things.

So there's a spectrum between IID, a random walk through the space of data, and what we humans do. I don't even know if we do it optimally, but there might be something beyond. So, on this open question that you raised, where do you think intelligent systems that would be able to deal with this world fall? Can we do pretty well by reading all of Wikipedia, randomly sampling it, like language models do? Or do we have to be exceptionally selective and intelligent about which aspects of the world we try?
So, I think this is first an open scientific problem, and I don't have a clear answer, but I can speculate a little bit. And what I would speculate is that you don't need to be super, super careful. I think it's less about being careful to avoid the useless stuff, and more about making sure that you hit on the really important stuff. So perhaps it's okay if you spend part of your day just guided by your curiosity, visiting interesting regions of your state space. But it's important for you to, every once in a while, make sure that you really try out the solutions that your current model of the world suggests might be effective, and observe whether those solutions are working as you expect or not. And perhaps some of that is really essential to have a perpetual improvement loop. This perpetual improvement loop is really the key that's going to potentially distinguish the best current methods from the best methods of tomorrow, in a sense.

How important do you think is exploration, or total out-of-the-box exploration, in this space, to jump to totally different domains? So, you mentioned there's an optimization problem: you explore the specifics of a particular strategy, whatever the thing is you're trying to solve. How important is it to explore totally outside of the strategies that have been working for you so far? What's your intuition there?
Yeah, I think it's a very problem dependent kind of question. And I think that question actually gets at one of the big differences between sort of the classic formulation of a reinforcement learning problem and some of the more open ended reformulations of that problem that have been explored in recent years. So, classically, reinforcement learning is framed as a problem of maximizing utility, like any kind of rational AI agent, and then anything you do is in service to maximizing that utility. But a very interesting alternative way to look at these problems, and I'm not necessarily saying this is the best way to look at it, is as something where you first get to explore the world however you please, and then afterwards you will be tasked with doing something. And that might suggest a somewhat different solution. So, if you don't know what you're going to be tasked with doing, and you just want to prepare yourself optimally for whatever your uncertain future holds, maybe then you will choose to attain some sort of coverage, build up sort of an arsenal of cognitive tools, if you will, such that later on, when someone tells you, now your job is to fetch the coffee for me, you will be well prepared to undertake that task.

And you see that as the modern formulation of the reinforcement learning problem, as kind of the more multitask, general intelligence kind of formulation?

I think that's one possible vision of where things might be headed. I don't think that's by any means the mainstream or standard way of doing things, and it's not like... if I had to...
But I like it. It's a beautiful vision. So maybe let's actually take a step back. What is the goal of robotics? What's the general problem of robotics we're trying to solve? You actually kind of painted two pictures here, one of sort of the narrow, one of the general. What in your view is the big problem of robotics? Again, a ridiculously philosophical question.

I think that maybe there are two ways I can answer this question. One is there's a very pragmatic problem, which is, what would sort of maximize the usefulness of robots? And there the answer might be something like a system that can perform whatever task a human user sets for it, within the physical constraints, of course. If you ask it to teleport to another planet, it probably can't do that. But if you ask it to do something that's within its physical capability, then potentially, with a little bit of additional training or a little bit of additional trial and error, it ought to be able to figure it out, in much the same way as a human teleoperator ought to figure out how to drive the robot to do that. That's kind of the very pragmatic view of what it would take to solve the robotics problem, if you will. But I think that there is a second answer, and that answer is a lot closer to why I want to work on robotics, which is that it's less about what it would take to do a really good job in the world of robotics, and more the other way around: what robotics can bring to the table to help us understand artificial intelligence.

So your dream, fundamentally, is to understand intelligence?

Yes. I think that's the dream for many people who actually work in this space. I think that there's something very pragmatic and very useful about studying robotics. But I do think that for a lot of people that go into this field, the thing that they actually draw inspiration from is the potential for robots to help us learn about intelligence and about ourselves.
So that's fascinating, that robotics is basically the space by which you can get closer to understanding the fundamentals of artificial intelligence. So what is it about robotics that's different from some of the other approaches? If we look at some of the early breakthroughs in deep learning, in the computer vision space and natural language processing, there are really nice, clean benchmarks that a lot of people competed on, and thereby came up with a lot of brilliant ideas. What's the fundamental difference between computer vision as purely defined by ImageNet and kind of the bigger robotics problem?

So there are a couple of things. One is that with robotics, you kind of have to take away many of the crutches. You have to deal with both the particular problems of perception, control, and so on, but you also have to deal with the integration of those things. And classically, we've always thought of the integration as kind of a separate problem. So a classic kind of modular engineering approach is that we solve the individual sub-problems, then wire them together, and then the whole thing works. And one of the things that we've been seeing over the last couple of decades is that maybe studying the thing as a whole might lead to very different solutions than if we were to study the parts and wire them together. So the integrative nature of robotics research helps us see, you know, different perspectives on the problem. Another part of the answer is that robotics casts a certain paradox into very clear relief. This is sometimes referred to as Moravec's paradox: the idea that in artificial intelligence, things that are very hard for people can be very easy for machines, and vice versa, things that are very easy for people can be very hard for machines. So, you know, integral and differential calculus is pretty difficult for people to learn, but if you program a computer to do it, it can derive derivatives and integrals for you all day long without any trouble. Whereas some things, like drinking from a cup of water, are very easy for a person to do and very hard for a robot to deal with. And sometimes when we see such blatant discrepancies, that gives us a really strong hint that we're missing something important. So if we really try to zero in on those discrepancies, we might find that little bit that we're missing. And it's not that we need to make machines better or worse at math and better at drinking water, but just that by studying those discrepancies, we might find some new insight.
So that could be in any space, it doesn't have to be robotics. But you're saying, I mean, it's kind of interesting that robotics seems to have a lot of those discrepancies. So the Hans Moravec paradox is probably referring to the space of physical interaction, like you said: object manipulation, walking, all the kind of stuff we do in the physical world. How do you make sense of it, if you were to try to disentangle Moravec's paradox? Like, why is there such a gap in our intuition about it? Why do you think manipulating objects is so hard, from everything you've learned from applying reinforcement learning in this space?

Yeah, I think that one reason is maybe that for many of the other problems that we've studied in AI and computer science and so on, the notion of input, output, and supervision is much, much cleaner. So computer vision, for example, deals with very complex inputs, but it's comparatively a bit easier, at least up to some level of abstraction, to cast it as a very tightly supervised problem. It's comparatively much, much harder to cast robotic manipulation as a very tightly supervised problem. You can do it, it just doesn't seem to work all that well. So you could say that, well, maybe we get a labeled dataset where we know exactly which motor commands to send, and then we train on that. But for various reasons, that's not actually such a great solution. And it also doesn't seem to be even remotely similar to how people and animals learn to do things, because we're not told by our parents, here's how you fire your muscles in order to walk. We do get some guidance, but the really low level detailed stuff we figure out mostly on our own.

And that's what you mean by tightly supervised, that every single little sub-action gets a supervised signal of whether it's a good one or not.

Right. So while in computer vision, you could sort of imagine, up to a level of abstraction, that maybe somebody told you this is a car and this is a cat and this is a dog, in motor control, it's very clear that that was not the case.
If we look at sort of the subspaces of robotics... again, as you said, robotics integrates all of them together, and we get to see how this beautiful mess interplays. But there's nevertheless still perception, the computer vision problem broadly speaking, understanding the environment. Then there's also, and maybe you can correct me on this kind of categorization of the space, there's prediction, trying to anticipate what things are going to do into the future in order for you to be able to act in that world. And then there's also this game theoretic aspect of how your actions will change the behavior of others. In this kind of space, and this is bigger than reinforcement learning, this is just broadly looking at the problem of robotics, what's the hardest problem here? Or is what you said true, that when you start to look at all of them together, that's a whole other thing? You can't even say which one individually is harder, because you should only be looking at them all together?

I think when you look at them all together, some things actually become easier. And I think that's actually pretty important.
Back in 2014, we had some work, basically our first work on end-to-end reinforcement learning for robotic manipulation skills from vision, which at the time was something that seemed a little inflammatory and controversial in the robotics world. But other than the inflammatory and controversial part of it, the point that we were actually trying to make in that work is that for the particular case of combining perception and control, you could actually do better if you treat them together than if you try to separate them. And the way that we tried to demonstrate this is we picked a fairly simple motor control task, where a robot had to insert a little red trapezoid into a trapezoidal hole. And we had our separated solution, which involved first detecting the hole using a pose detector and then actuating the arm to put it in, and then our end-to-end solution, which just mapped pixels to torques. And one of the things we observed is that if you use the end-to-end solution, essentially the pressure on the perception part of the model is actually lower. It doesn't have to figure out exactly where the thing is in 3D space. It just needs to figure out where it is, distributing the errors in such a way that the horizontal difference matters more than the vertical difference, because vertically it just pushes it down all the way until it can't go any further. There, perceptual errors are a lot less harmful, whereas perpendicular to the direction of motion, perceptual errors are much more harmful. So the point is that if you combine these two things, you can trade off errors between the components optimally to best accomplish the task. And the components can actually be weaker while still leading to better overall performance.
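To make the contrast concrete, here is a minimal sketch of the two pipelines being compared, not the actual system from that work; every name here (pose_detector, EndToEndPolicy, and so on) is a hypothetical placeholder:

```python
# --- Modular pipeline: perception and control are separate stages ---
def modular_controller(image, pose_detector, inverse_kinematics):
    """Hypothetical separated solution: estimate the hole pose in 3D,
    then compute torques to move the peg to that pose. Any perception
    error propagates directly into the motion target."""
    hole_pose = pose_detector(image)          # full 3D pose estimate
    torques = inverse_kinematics(hole_pose)   # act on that estimate
    return torques

# --- End-to-end pipeline: one network maps pixels to torques ---
class EndToEndPolicy:
    """Hypothetical stand-in for a visuomotor policy network. Because
    it is trained on overall task success, it can learn to tolerate
    perception errors along the insertion axis (it just presses down)
    while staying accurate perpendicular to it."""
    def __init__(self, network):
        self.network = network  # a trained pixels-to-torques mapping

    def act(self, image):
        return self.network(image)  # torques, directly from pixels
```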
It's a profound idea. I mean, in the space of pegs and things like that, it's quite simple, it almost is tempting to overlook. But that seems to be, at least intuitively, an idea that should generalize to basically all aspects of perception and control: that one strengthens the other.

Yeah. And people who have studied perceptual heuristics in humans and animals find things like that all the time. So one very well known example is something called the gaze heuristic, which is a little trick that you can use to intercept a flying object. If you want to catch a ball, for instance, you could try to localize it in 3D space, estimate its velocity, estimate the effect of wind resistance, and solve a complex system of differential equations in your head. Or you can adjust your running speed so that the object stays in the same position in your field of view. So if it dips a little bit, you speed up; if it rises a little bit, you slow down. And if you follow this simple rule, you'll actually arrive at exactly the place where the object lands, and you'll catch it. Humans use it when they play baseball. Human pilots use it when they fly airplanes, to figure out if they're about to collide with somebody. Frogs use it to catch insects, and so on and so on. So this is something that actually happens in nature, and I'm sure this is just one instance that scientists were able to identify because it's so prevalent, but there are probably many others.
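The gaze heuristic described above can be written as a one-line control rule. This is a toy sketch of the idea as stated, with an arbitrary gain constant, not a model drawn from any particular paper:

```python
def gaze_heuristic_step(speed, elevation_angle, prev_elevation_angle,
                        gain=0.5):
    """One control step of the gaze heuristic.

    elevation_angle: apparent height of the ball in the catcher's field
    of view (e.g., in radians). If the ball appears to dip, the catcher
    is falling behind, so speed up; if it rises, slow down. No 3D
    localization or physics model is needed.
    """
    drift = elevation_angle - prev_elevation_angle
    # Dip (negative drift) -> increase speed; rise -> decrease speed.
    return max(0.0, speed - gain * drift)
```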
Do you have, just so we can zoom in as we talk about robotics, a canonical problem, sort of a simple, clean, beautiful representative problem in robotics that you think about when you're thinking about some of these problems? We talked about robotic manipulation. To me, that seems intuitively, at least, the space the robotics community has converged towards as the canonical problem. If you agree, then maybe you can zoom in on some particular aspect of that problem that you just like. Like, if we solve that problem perfectly, it'll unlock a major step towards human level intelligence.

I don't think I have a really great answer to that. And I think partly the reason I don't have a great answer has to do with the fact that the difficulty is really in the flexibility and adaptability, rather than in doing a particular thing really, really well. So it's hard to just say, like, oh, if you can shuffle a deck of cards as fast as a Vegas casino dealer, then you'll be very proficient. It's really the ability to quickly figure out how to do some arbitrary new thing well enough to move on to the next arbitrary thing.

But in terms of that source of newness and uncertainty, have you found problems in which it's easy to generate newness, new types of newness?
Yeah. So a few years ago, if you'd asked me this question, around like 2016 maybe, I would have probably said that robotic grasping is a really great example of that, because it's a task with great real world utility. Like, you will get a lot of money if you can do it well.

What is robotic grasping?

Picking up any object.

With a robotic hand.

Exactly. So you will get a lot of money if you do it well, because lots of people want to run warehouses with robots. And it's highly non-trivial, because very different objects will require very different grasping strategies. But actually, since then, people have gotten really good at building systems to solve this problem, to the point where I'm not actually sure how much more progress we can make with that as, like, the main guiding thing. But it's kind of interesting to see the kinds of methods that have actually worked well in that space, because robotic grasping classically used to be regarded very much as almost a geometry problem. People who have studied the history of computer vision will find this very familiar: in the same way that in the early days of computer vision, people thought of it very much as an inverse graphics thing, in robotic grasping, people thought of it as an inverse physics problem, essentially. You look at what's in front of you, figure out the shapes, then use your best estimate of the laws of physics to figure out where to put your fingers, and you pick up the thing. And it turns out that what works really well for robotic grasping, instantiated in many different recent works, including our own, but also ones from many other labs, is to use learning methods with some combination of either exhaustive simulation or actual real world trial and error. And it turns out that those things actually work really well, and then you don't have to worry about solving geometry problems or physics problems.

So, just by the way, in grasping, what are the difficulties that have been worked on? One is, like, the materials of things, maybe occlusions on the perception side. Why is it so difficult? Why is picking stuff up such a difficult problem?

Yeah, it's a difficult problem because the number of things that you might have to deal with, or the variety of things that you have to deal with, is extremely large. And oftentimes, things that work for one class of objects won't work for another class of objects. So if you get really good at picking up boxes, and now you have to pick up plastic bags, you just need to employ a very different strategy. And there are many properties of objects that are more than just their geometry. It has to do with the bits that are easier to pick up, the bits that are hard to pick up, the bits that are more flexible, the bits that will cause the thing to pivot and bend and drop out of your hand versus the bits that result in a nice secure grasp, things that are flexible, things that, if you pick them up the wrong way, will fall upside down and the contents will spill out. So there are all these little details that come up.

But the task can still kind of be characterized as one task, like there's a very clear notion of you did it or you didn't do it.
So in terms of spilling things, there creeps in this notion that starts to sound and feel like common sense reasoning. Do you think solving the general problem of robotics requires common sense reasoning, requires general intelligence, this kind of human level capability of, you know, like you said, being robust and dealing with uncertainty, but also being able to sort of reason and assimilate different pieces of knowledge that you have? What are your thoughts on the needs of common sense reasoning in the space of the general robotics problem?

So I'm going to slightly dodge that question and say that maybe it's actually the other way around: studying robotics can help us understand how to put common sense into our AI systems. One way to think about common sense, and why our current systems might lack it, is that common sense is an emergent property of actually having to interact with a particular world, a particular universe, and get things done in that universe. So you might think that, for instance, an image captioning system kind of deals with our world: it looks at pictures of the world and it types out English sentences. But you can easily construct situations where image captioning systems do things that defy common sense, like give it a picture of a person wearing a fur coat, and it'll say it's a teddy bear. What I think is really happening in those settings is that the system doesn't actually live in our world. It lives in its own world that consists of pixels and English sentences, and doesn't actually consist of having to put on a fur coat in the winter so you don't get cold. So perhaps the reason for the disconnect is that the systems we have now simply inhabit a different universe. And if we build AI systems that are forced to deal with all of the messiness and complexity of our universe, maybe they will have to acquire common sense to essentially maximize their utility, whereas the systems we're building now don't have to do that; they can take some shortcut.
That's fascinating. You have, a couple of times already, sort of reframed the role of robotics in this whole thing. And for some reason, I don't know if my way of thinking is common, but I thought we need to understand and solve intelligence in order to solve robotics. And you're kind of framing it as, no, robotics is one of the best ways to just study artificial intelligence. Robotics is, like, the right space in which you get to explore some of the fundamental learning mechanisms, the fundamental sort of multimodal, multitask aggregation of knowledge mechanisms that are required for general intelligence. That's a really interesting way to think about it. But let me ask about learning. Can the general robotics problem, the epitome of the robotics problem, be solved purely through learning, perhaps end-to-end learning, sort of learning from scratch, as opposed to injecting human expertise and rules and heuristics and so on?

I think that in terms of the spirit of the question, I would say yes. I mean, in some ways it may be an overly sharp dichotomy, because when we build algorithms, at some point a person does something. A person turned on the computer, a person implemented TensorFlow. But yeah, in terms of the point that you're getting at, I do think the answer is yes. I think that we can solve many problems that have previously required meticulous manual engineering through automated optimization techniques. And actually, one thing I will say on this topic is, I don't think this is actually a very radical or very new idea. People have been thinking about automated optimization techniques as a way to do control for a very, very long time, and in some ways, what's changed is really more the name. So today we would say that, oh, my robot does machine learning, it does reinforcement learning. Maybe in the 1960s, you'd say, oh, my robot is doing optimal control. And maybe the difference between typing out a system of differential equations and doing feedback linearization versus training a neural net, maybe it's not such a large difference. It's just pushing the optimization deeper and deeper into the thing.

Well, it is interesting that you think of it that way. But especially with deep learning, the accumulation of experiences in data form, into deep representations, starts to feel like knowledge, as opposed to optimal control. So it feels like there's an accumulation of knowledge through the learning process.

Yes. Yeah. So I think that is a good point: one big difference between learning based systems and classic optimal control systems is that learning based systems, in principle, should get better and better the more they do something. And I do think that that's actually a very, very powerful difference.
So, looking back at the world of expert systems, symbolic AI and so on, of using logic to accumulate expertise, human expertise, human encoded expertise: do you think that will have a role at some point? Deep learning, machine learning, reinforcement learning have shown incredible results and breakthroughs, and just inspired thousands, maybe millions of researchers. But there's this less popular now, but it used to be popular, idea of symbolic AI. Do you think that will have a role?

I think in some ways the descendants of symbolic AI actually already have a role. So, this is the highly biased history from my perspective. You could say that, well, initially we thought that rational decision making involves logical manipulation. You have some model of the world expressed in terms of logic, you have some query, like, what action do I take in order for X to be true, and then you manipulate your logical symbolic representation to get an answer. What that turned into somewhere in the 1990s is, well, instead of building kind of predicates and statements that have true or false values, we'll build probabilistic systems where things have probabilities associated with them, probabilities of being true and false. And that turned into Bayes nets, and that provided sort of a boost to what were really still essentially logical inference systems, just probabilistic logical inference systems. And then people said, well, let's actually learn the individual probabilities inside these models. And then people said, well, let's not even specify the nodes in the models, let's just put a big neural net in there. But in many ways, I see these as actually kind of descendants of the same idea. It's essentially instantiating rational decision making by means of some inference process, and learning by means of an optimization process. So in a sense, I would say yes, it has a place, and in many ways it already holds that place. It's already in there; it just looks slightly different than it did before. But there are some things that we can think about that make this a little bit more obvious. Like, if I train a big neural net model to predict what will happen in response to my robot's actions, and then I run probabilistic inference, meaning I invert that model to figure out the actions that lead to some plausible outcome, to me, that seems like a kind of logic. You have a model of the world, it just happens to be expressed by a neural net, and you are doing some inference procedure, some sort of manipulation on that model, to figure out, you know, the answer to a query that you have.
It's the interpretability, it's the explainability, though, that seems to be lacking more so, because the nice thing about sort of expert systems is you can follow the reasoning of the system, which to us mere humans is somehow compelling. I don't know what to make of this fact that there's a human desire for intelligent systems to be able to convey to us, in a poetic way, why they made the decisions they did, like, tell a convincing story. And perhaps that's a silly human thing. Like, we shouldn't expect that of intelligent systems. Like, we should be super happy that there are intelligent systems out there. But if I were to sort of psychoanalyze the researchers at the time, I would say expert systems connected to that part, that desire of AI researchers for systems to be explainable. Maybe on that topic, do you have a hope that sort of inference systems, learning based systems, will be as explainable as the dream was with expert systems, for example?

I think it's a very complicated question, because in some ways the question of explainability is very closely tied to the question of performance. Like, why do you want your system to explain itself? Well, it's so that when it screws up, you can kind of figure out why it did it. But in some ways, that's a much bigger problem, actually. Your system might screw up, and then it might screw up in how it explains itself, or you might have some bug somewhere so that it's not actually doing what it was supposed to do. So maybe a good way to view that problem is really as a bigger problem of verification and validation, of which explainability is sort of one component.

I see. I just see it differently. I see explainability... you put it beautifully, and I think you actually summarized the field of explainability. But to me, there's another aspect of explainability, which is storytelling, that has nothing to do with errors, or rather, it uses errors as elements of its story, as opposed to a fundamental need to be explainable when errors occur. It's just that for other intelligent systems to be in our world, we seem to want to tell each other stories. That's true in the political world, that's true in the academic world. And, you know, neural networks are less capable of doing that. Or perhaps they're equally capable of storytelling. Maybe it doesn't matter what the fundamentals of the system are; you just need to be a good storyteller.
Maybe one specific story I can tell you about in that space is actually about some work that was done by my former collaborator, who's now a professor at MIT, named Jacob Andreas. Jacob actually works in natural language processing, but he had this idea to do a little bit of work in reinforcement learning, on how natural language can basically structure the internals of policies trained with RL. And one of the things he did is he set up a model that attempts to perform some task that's defined by a reward function, but the model reads in a natural language instruction. This is a pretty common thing to do in instruction following: you tell it, like, go to the red house, and then it's supposed to go to the red house. But then one of the things that Jacob did is he treated that sentence not as a command from a person, but as a representation of the internal kind of state of the mind of this policy, essentially, so that when it was faced with a new task, what it would do is basically try to think of possible language descriptions, attempt to do them, and see if they led to the right outcome. So it would kind of think out loud, like, you know: I'm faced with this new task, what am I going to do? Let me go to the red house. Oh, that didn't work. Let me go to the blue room, or something. Let me go to the green plant. And once it got some reward, it would say, oh, go to the green plant, that's what's working, I'm going to go to the green plant. And then you could look at the string that it came up with, and that was a description of how it thought it should solve the problem. So you could basically incorporate language as internal state, and you can start getting some handle on these kinds of things.
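A rough sketch of the idea as described here, with hypothetical function names and environment interface, not Jacob Andreas's actual code: the agent searches over candidate language descriptions of its own strategy, executes an instruction-conditioned policy under each one, and keeps the description that earns reward; the winning string then doubles as a readable account of how it solved the task.

```python
def solve_with_language_state(env, policy, candidate_instructions,
                              episodes_per_candidate=5):
    """Search over natural language strings used as the policy's
    internal state. Returns the best instruction (a human-readable
    'story' of the strategy) along with its average reward.

    Assumes policy(instruction, observation) -> action is an
    instruction-conditioned policy trained beforehand, and that
    env.step(action) returns (observation, reward, done).
    """
    best_instruction, best_reward = None, float("-inf")
    for instruction in candidate_instructions:  # e.g. "go to the red house"
        total = 0.0
        for _ in range(episodes_per_candidate):
            obs, done = env.reset(), False
            while not done:
                obs, reward, done = env.step(policy(instruction, obs))
                total += reward
        avg = total / episodes_per_candidate
        if avg > best_reward:
            best_instruction, best_reward = instruction, avg
    return best_instruction, best_reward
```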
And then what I was kind of trying to get to is that also, if you add to the reward function the convincingness of that story... so I have another reward signal of, like, people who review that story, how much they like it. So, you know, initially that could be a hyperparameter, sort of hard coded heuristic type of thing, but it's an interesting notion of the convincingness of the story becoming part of the reward function, the objective function, of the explainability. In the world of sort of Twitter and fake news, that might be a scary notion: that the nature of truth may not be as important as how convincing you are in telling the story around the facts. Well, let me ask the basic question. You're one of the world class researchers in reinforcement learning, deep reinforcement learning, certainly in the robotics space.
What is reinforcement learning?

I think that what reinforcement learning refers to today is really just the modern incarnation of learning based control. Classically, reinforcement learning has a much more narrow definition, which is that it's literally learning from reinforcement: the thing does something and then it gets a reward or punishment. But really, I think the way the term is used today, it's used more broadly to mean learning based control: some kind of system that's supposed to be controlling something, and it uses data to get better.

And what does control mean? So action is the fundamental element there?

It means making rational decisions. And rational decisions are decisions that maximize a measure of utility.

And sequentially, so you make decisions time and time and time again. Now, it's easier to see that kind of idea in the space of maybe games and the space of robotics. Do you see it bigger than that? Is it applicable... like, where are the limits of the applicability of reinforcement learning?

Yeah, so rational decision making is essentially the encapsulation of the AI problem viewed through a particular lens. Any problem that we would want a machine to do, an intelligent machine, can likely be represented as a decision making problem. Classifying images is a decision making problem, although not a sequential one typically. Controlling a chemical plant is a decision making problem. Deciding what videos to recommend on YouTube is a decision making problem. And one of the really appealing things about reinforcement learning is, if it does encapsulate the range of all these decision making problems, perhaps working on reinforcement learning is, you know, one of the ways to reach a very broad swath of AI problems.
But what do you see as the fundamental difference between reinforcement learning and maybe supervised machine learning?

So reinforcement learning can be viewed as a generalization of supervised machine learning. You can certainly cast supervised learning as a reinforcement learning problem: you can just say your loss function is the negative of your reward. But you have stronger assumptions: you have the assumption that someone actually told you what the correct answer was, that your data was IID, and so on. So you could view reinforcement learning as essentially relaxing some of those assumptions. Now, that's not always a very productive way to look at it, because if you actually have a supervised learning problem, you'll probably solve it much more effectively by using supervised learning methods, because it's easier. But you can view reinforcement learning as a generalization of that.
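As a minimal sketch of that reduction (my own illustration; the model interface is hypothetical): a supervised example becomes a one-step decision problem whose "action" is the prediction and whose reward is the negative loss. The extra structure that RL relaxes is exactly what's visible here: a known correct answer and IID examples.

```python
import numpy as np

def supervised_step_as_rl(model, x, y_true, loss_fn):
    """Cast one supervised example as a bandit-style RL interaction:
    the 'state' is the input x, the 'action' is the model's prediction,
    and the reward is the negative of the supervised loss. Unlike
    general RL, the correct answer y_true is given and there is no
    sequential structure."""
    action = model.predict(x)          # the 'action' is the prediction
    reward = -loss_fn(action, y_true)  # reward = negative loss
    return reward

# Example reward signal with a squared-error loss:
squared_error = lambda a, y: float(np.sum((np.asarray(a) - np.asarray(y)) ** 2))
```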
No, for sure. But they're fundamentally different. That's a mathematical statement, and that's absolutely correct. But it seems that reinforcement learning, the kind of tools we're bringing to the table today, are different. So maybe down the line, everything will be a reinforcement learning problem; just like you said, image classification should be mapped to a reinforcement learning problem. But today, the tools and ideas, the way we think about them, are different. Sort of supervised learning has been used very effectively to solve basic, narrow AI problems. Reinforcement learning kind of represents the dream of AI. It's very much in the research space now, sort of captivating the imagination of people about what we can do with intelligent systems, but it hasn't yet had as wide an impact as the supervised learning approaches. So my question comes in a more practical sense: what do you see as the gap between the more general reinforcement learning and the very specific... yes, still a decision making problem, but with one step in the sequence... supervised learning?
So from a practical standpoint, I think that one thing that is potentially a little tough now, and this is a gap that I think we might see closing over the next couple of years, is the ability of reinforcement learning algorithms to effectively utilize large amounts of prior data. One of the reasons why it's a bit difficult today to use reinforcement learning for all the things that we might want to use it for is that in most of the settings where we want to do rational decision making, it's a little bit tough to just deploy some policy that does crazy stuff and learns purely through trial and error. It's much easier to collect a lot of data, a lot of logs of some other policy that you've got, and then, if you can get a good policy out of that, you deploy it and let it kind of fine tune a little bit. But algorithmically, it's quite difficult to do that. So I think that once we figure out how to get reinforcement learning to bootstrap effectively from large datasets, then we'll see very, very rapid growth in applications of these technologies. This is what's referred to as off-policy reinforcement learning, or offline RL, or batch RL, and I think we're seeing a lot of research right now that brings us closer and closer to that.

Can you maybe paint a picture of the different methods? As you said, off policy... what's value based reinforcement learning, what's policy based, what's model based, what's off policy versus on policy? What are the different categories of reinforcement learning?
Yeah. So one way we can think about reinforcement learning is that, in some very fundamental way, it's about learning models that can answer kind of what-if questions. What would happen if I take this action that I hadn't taken before? And you do that, of course, from experience, from data. And oftentimes you do it in a loop: you build a model that answers these what-if questions, use it to figure out the best action you can take, then go and try taking that action and see if the outcome agrees with what you predicted. The different kinds of techniques basically refer to different ways of doing this. Model based methods answer the question of what state you would get, basically what would happen to the world if you were to take a certain action. Value based methods answer the question of what value you would get, meaning what utility. But in a sense, they're not really all that different, because they're both really just answering these what-if questions. Now, unfortunately for us, with current machine learning methods, answering what-if questions can be really hard, because they are really questions about things that didn't happen. If you wanted to answer what-if questions about things that did happen, you wouldn't need to learn a model; you would just repeat the thing that worked before. And that's really a big part of why RL is a little bit tough. So if you have a purely on-policy kind of online process, then you ask these what-if questions, you make some mistakes, you go and try doing those mistaken things, and then you observe kind of the counterexamples that will teach you not to do those things again. If you have a bunch of off-policy data, and you just want to synthesize the best policy you can out of that data, then you really have to deal with the challenges of making these counterfactual predictions.
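To make the distinction concrete, here is a minimal sketch with illustrative names only: both flavors expose the same "what if I take this action in this state" query, a model-based method answering it with a predicted next state and a value-based method answering it with a predicted utility.

```python
# Two ways to answer "what if I take action a in state s?"

class DynamicsModel:
    """Model-based view: predicts the next state of the world."""
    def __init__(self, f):
        self.f = f  # learned transition function f(s, a) -> s'

    def what_if(self, state, action):
        return self.f(state, action)  # predicted next state

class QFunction:
    """Value-based view: predicts the utility (expected return)."""
    def __init__(self, q):
        self.q = q  # learned action-value function q(s, a) -> value

    def what_if(self, state, action):
        return self.q(state, action)  # predicted utility of the action

def best_action(what_if_model, state, candidate_actions, score):
    """Either kind of model can drive decision making: pick the action
    whose predicted outcome scores highest (for a Q-function, score can
    just be the identity)."""
    return max(candidate_actions,
               key=lambda a: score(what_if_model.what_if(state, a)))
```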
First of all, what's a policy?

A policy is a model, or some kind of function, that maps from observations of the world to actions. In reinforcement learning, we often refer to the current configuration of the world as the state. We say the state kind of encompasses everything you need to fully define where the world is at at the moment. And depending on how we formulate the problem, we might say you either get to see the state, or you get to see an observation, which is some snapshot or piece of the state.

So the policy just includes everything in it in order to be able to act in this world.
And so what does off policy mean?
link |
Yeah, so the terms on policy and off policy refer to how you get your data.
link |
So if you get your data from somebody else who was doing some other stuff, maybe you get your data
link |
from some manually programmed system that was just running in the world before,
link |
that's referred to as off policy data. But if you got the data by actually acting in the world based
link |
on what your current policy thinks is good, we call that on policy data. And obviously,
link |
on policy data is more useful to you because if your current policy makes some bad decisions,
link |
you will actually see that those decisions are bad. Off policy data, however, might be much easier
link |
to obtain because maybe that's all the log data that you have from before.
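A minimal sketch of that data distinction, with a toy one-step environment; `legacy_controller` is a hypothetical stand-in for the manually programmed system, and everything here is illustrative.

```python
import random

def legacy_controller(obs):
    # A stand-in for a manually programmed system that was running before.
    return 0

def current_policy(obs):
    # The policy we are actually trying to improve.
    return random.choice([0, 1])

def step(obs, act):
    # A toy environment: returns the next observation and a reward.
    return obs + act, float(act)

obs = 0
# Off-policy data: transitions generated by somebody else's behavior (old logs).
a = legacy_controller(obs)
off_policy_data = [(obs, a, *step(obs, a))]

# On-policy data: transitions generated by the current policy's own decisions.
a = current_policy(obs)
on_policy_data = [(obs, a, *step(obs, a))]
```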
link |
So we talked about offline, talked about autonomous vehicles, so you can envision
link |
off policy kind of approaches in robotic spaces where there's already a ton of robots out there,
link |
but they don't get the luxury of being able to explore based on a reinforcement learning framework.
link |
So how do we make, again, open question, but how do we make off policy methods work?
link |
Yeah, so this is something that has been kind of a big open problem for a while. And in the last
link |
few years, people have made a little bit of progress on that. It's
link |
not by any means solved yet, but I can tell you some of the things that, for example, we've done to
link |
try to address some of the challenges. It turns out that one really big challenge with off policy
link |
reinforcement learning is that you can't really trust your models to give accurate predictions
link |
for any possible action. So if in my data set I never saw somebody steering
link |
the car off the road onto the sidewalk, my value function or my model is probably not going to
link |
predict the right thing if I ask what would happen if I were to steer the car off the road onto the
link |
sidewalk. So one of the important things you have to do to get off policy RL to work is you have
link |
to be able to figure out whether a given action will result in a trustworthy prediction or not.
link |
And you can use kind of distribution estimation methods, kind of density estimation methods
link |
to try to figure that out. So you could figure out that, well, this action, my model is telling me
link |
that it's great, but it looks totally different from any action I've taken before. So my model is
link |
probably not correct. And you can incorporate regularization terms into your learning objective
link |
that will essentially tell you not to ask those questions that your model is unable to answer.
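A minimal sketch of that idea, not Levine's actual algorithm: fit a crude density model to the actions in the offline dataset, then regularize the learning objective so it distrusts value estimates for actions that look unlike anything in the data. The Gaussian fit, the stand-in Q-estimate, and the penalty weight are all illustrative assumptions.

```python
import numpy as np

dataset_actions = np.random.default_rng(1).normal(0.0, 0.5, size=(1000, 1))

# Density estimation over the dataset's actions; here just a Gaussian fit,
# in practice a much richer learned model.
mu, sigma = dataset_actions.mean(), dataset_actions.std()

def log_density(a):
    return -0.5 * ((a - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def q_estimate(a):
    # Stand-in for a learned value prediction that naively claims
    # extreme, never-seen actions are great.
    return 10.0 * a

def regularized_objective(a, weight=5.0):
    # Trust the value estimate only where the data supports it: subtract a
    # penalty proportional to how out-of-distribution the action looks.
    return q_estimate(a) - weight * max(0.0, -log_density(a))

for a in (0.2, 3.0):  # an in-distribution action and an out-of-distribution one
    print(f"a={a}: naive={q_estimate(a):.1f}, regularized={regularized_objective(a):.1f}")
```

Running this, the in-distribution action keeps roughly its naive value, while the out-of-distribution action is heavily penalized, which is the "don't ask questions your model can't answer" behavior described above.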
link |
What would lead to breakthroughs in this space, do you think? Like what's needed? Is this a data set
link |
question? Do we need to collect big benchmark data sets that allow us to explore the space?
link |
Is it new kinds of methodologies? Like what's your sense? Or maybe coming together in the space
link |
of robotics and defining the right problem to be working on? I think for off policy reinforcement
link |
learning in particular, it's very much an algorithms question right now. And this is something that
link |
I think is great because an algorithms question is one that just takes some very smart people to
link |
get together and think about it really hard. Whereas if it was like a data problem or hardware
link |
problem, that would take some serious engineering. So that's why I'm pretty excited about that
link |
problem because I think that we're in a position where we can make some real progress on it just
link |
by coming up with the right algorithms. In terms of which algorithms those could be, the problems at
link |
their core are very related to problems in things like causal inference because what you're really
link |
dealing with is situations where you have a model, a statistical model that's trying to make predictions
link |
about things that it hadn't seen before. And if it's a model that's generalizing properly,
link |
that'll make good predictions. If it's a model that picks up on various correlations, it will
link |
not generalize properly. And then you have an arsenal of tools you can use: you could, for example,
link |
figure out what are the regions where it's trustworthy, or on the other hand, you could try
link |
to make it generalize better somehow or some combination of the two. Is there a room for
link |
mixing where most of it, like 90, 95% is off policy, you already have the data set,
link |
and then you get to send the robot out to do a little exploration? What's that role of
link |
mixing them together? Yeah, absolutely. I think that this is something that you actually
link |
described very well at the beginning of our discussion when you talked about the iceberg.
link |
This is the iceberg. The 99% of your prior experience, that's your iceberg. You'd use that
link |
for off policy reinforcement learning. And then of course, if you've never opened that particular
link |
kind of door with that particular lock before, then you have to go out and fiddle with it a little
link |
bit. And that's that additional 1% to help you figure out a new task. And I think that's actually
link |
like a pretty good recipe going forward. Is this to you the most exciting space of reinforcement
link |
learning now? Or, maybe taking a step back, not just now, but what's to you the
link |
most beautiful idea? I apologize for the romanticized question, but what's the most beautiful idea or concept in
link |
reinforcement learning? In general, I actually think that one of the things that is a very beautiful
link |
idea in reinforcement learning is just the idea that you can obtain a near optimal control or
link |
near optimal policy without actually having a complete model of the world. It's something that
link |
feels perhaps kind of obvious if you just hear the term reinforcement learning or you think about
link |
trial and error learning. But from a controls perspective, it's a very weird thing because
link |
classically, we think about engineered systems and controlling engineered systems as the problem
link |
of writing down some equations and then figuring out, given these equations, basically like solve
link |
for x, figure out the thing that maximizes its performance. And the theory of reinforcement
link |
learning actually gives us a mathematically principled framework to think, to reason about
link |
optimizing some quantity when you don't actually know the equations that govern that system.
link |
And to me, that actually seems kind of very elegant, not something that
link |
becomes immediately obvious, at least in the mathematical sense.
link |
Does it make sense to you that it works at all?
link |
Well, I think it makes sense when you take some time to think about it, but it is a little
link |
surprising. Well, then taking a step into the deeper representations, which is also very
link |
surprising, sort of the richness of the state space, the space of environments that
link |
this kind of approach can operate in. Can you maybe say what is deep reinforcement learning?
link |
Well, deep reinforcement learning simply refers to taking reinforcement learning algorithms and
link |
combining them with high capacity neural net representations, which might at first seem like
link |
a pretty arbitrary thing, just take these two components and stick them together. But the
link |
reason that it's something that has become so important in recent years is that reinforcement
link |
learning, it kind of faces an exacerbated version of a problem that has faced many other machine
link |
learning techniques. So if we go back to the early 2000s or the late 90s, we'll see a lot
link |
of research on machine learning methods that have some very appealing mathematical properties,
link |
like they reduce to convex optimization problems, for instance. But they require very
link |
special inputs. They require a representation of the input that is clean in some way, like for
link |
example, clean in the sense that the classes in your multi class classification problems
link |
separate linearly. So they have some kind of good representation, and we call this a feature
link |
representation. And for a long time, people were very worried about features in the world of supervised
link |
learning, because somebody had to actually build those features, so you couldn't just take an image
link |
and plug it into your logistic regression or your SVM or something, someone had to take that image
link |
and process it using some handwritten code. And then neural nets came along and they could
link |
actually learn the features. And suddenly, we could apply learning directly to the raw inputs,
link |
which was great for images, but it was even more great for all the other fields where people hadn't
link |
come up with good features yet. And one of those fields was actually reinforcement learning,
link |
because in reinforcement learning, the notion of features, if you don't use neural nets and you
link |
have to design your own features, is very opaque. It's very hard to imagine, let's say I'm playing
link |
chess or Go. What is a feature with which I can represent the value function for Go or even the
link |
optimal policy for Go linearly? I don't even know how to start thinking about it. And people
link |
tried all sorts of things. They would write down what an expert chess player looks for, whether the
link |
knight is in the middle of the board or not. So that's a feature: knight in middle of board.
link |
And they would write these long lists of kind of arbitrary made up stuff. And that was
link |
really kind of getting us nowhere. And chess is a little more accessible than
link |
the robotics problem. Absolutely. Right. There are at least experts in the different
link |
features for chess. But still, the neural network there, to me, I mean, you put it
link |
eloquently and almost made it seem like a natural step to add neural networks. But the fact that
link |
neural networks are able to discover features in the control problem is very interesting. It's
link |
hopeful. I'm not sure what to think about it, but it feels hopeful that the control problem has
link |
features to be learned. I guess my question is, is it surprising to you how far the deep side
link |
of deep reinforcement learning has been able to go, what space of problems it has been able to tackle,
link |
especially in games with AlphaStar and AlphaZero, and just the representation power
link |
there, and in the robotics space? And what is your sense of the limits of this representation power
link |
in the control context? In regard to the limits here, I think that one thing
link |
that makes it a little hard to fully answer this question is that in settings where we would
link |
like to push these things to the limit, we encounter other bottlenecks. So the reason
link |
that I can't get my robot to learn how to, I don't know, do the dishes in the kitchen is
link |
not because its neural net is not big enough. It's because when you try to actually do trial
link |
and error learning, reinforcement learning directly in the real world, where you have the
link |
potential to gather these large, very, you know, highly varied and complex data sets,
link |
you start running into other problems. One problem you run into very quickly will at
link |
first sound like a very pragmatic problem, but it actually turns out to be a pretty deep scientific
link |
problem. Take the robot, put it in your kitchen, have it try to learn to do the dishes with trial and
link |
error: it'll break all your dishes, and then you'll have no more dishes to clean. Now you might think
link |
this is a very practical issue, but there's something to this, which is that if you have a
link |
person trying to do this, you know, a person will have some degree of common sense, they'll
link |
break one dish, they'll be a little more careful with the next one. And if they break all of them,
link |
they're going to go and get more or something like that. So there's all sorts of scaffolding
link |
that comes very naturally to us for our learning process. Like, you know, if I have to
link |
learn something through trial and error, I have the common sense to know that I have to, you know,
link |
try multiple times. If I screw something up, I ask for help or I reset things or something like that.
link |
And all of that is kind of outside of the classic reinforcement learning problem formulation.
link |
There are other things that can also be categorized as kind of scaffolding,
link |
but are very important, like, for example, where you get your reward function. If I want to
link |
learn how to pour a cup of water, well, how do I know if I've done it correctly? Now that probably
link |
requires an entire computer vision system to be built just to determine that. And that seems a
link |
little bit inelegant. So there are all sorts of things like this that start to come up when we
link |
think through what we really need to get reinforcement learning to happen at scale in the real world.
link |
And many of these things actually suggest a little bit of a shortcoming in the problem
link |
formulation and a few deeper questions that we have to resolve. That's really interesting. I
link |
talked to David Silver about AlphaZero. And it seems like, again, we
link |
haven't hit the limit at all in the context when there are no broken dishes. So in the case of Go,
link |
it's really about just scaling compute. So again, the bottleneck is the amount of
link |
money you're willing to invest in compute, and then maybe the scaffolding around
link |
how difficult it is to scale compute. But there's no limit. And it's interesting.
link |
Now we move to the real world, and there's the broken dishes, and the reward
link |
function like you mentioned. That's really nice. So how do we push forward there? There's
link |
this kind of sample efficiency question that people bring up, you know, not
link |
having to break 100,000 dishes. Is this an algorithm question? Is this a data selection
link |
question? What do you think? How do we not break too many dishes?
link |
Yeah. Well, one way we can think about that is that maybe we need to be better at
link |
reusing our data, building that iceberg. So perhaps it's too much to hope that
link |
you can have a machine that in isolation, in a vacuum, without anything else, can just master
link |
complex tasks in minutes, the way that people do. But perhaps it also doesn't have to,
link |
perhaps what it really needs to do is have an existence, a lifetime where it does many things
link |
and the previous things that it has done, prepare it to do new things more efficiently.
link |
And, you know, the study of these kinds of questions typically falls under categories
link |
like multitask learning or meta learning. But they all fundamentally deal with the same
link |
general theme, which is use experience for doing other things to learn to do new things
link |
efficiently and quickly. So what do you think about if you just look at the one particular
link |
case study of Tesla Autopilot, which is quickly approaching a million vehicles on the
link |
road, where some percentage of the time, 30, 40% of the time, it's driving using the computer vision
link |
multitask HydraNet, right? That's what they call it, HydraNet.
link |
The other percent is human controlled. From the human side, how can we use that data? What's
link |
your sense? What's the signal? Do you have ideas in this autonomous vehicle space,
link |
when people can lose their lives? You know, it's a safety critical environment. So how do we use
link |
that data? So I think that actually the kind of problems that come up when we want systems that
link |
are reliable and that can kind of understand the limits of their capabilities, they're actually
link |
very similar to the kind of problems that come up when we're doing off policy reinforcement
link |
learning. So as I mentioned before, in off policy reinforcement learning, the big problem is you
link |
need to know when you can trust the predictions of your model, because if you're trying to evaluate
link |
some pattern of behavior for which your model doesn't give you an accurate prediction, then you
link |
shouldn't use that to modify your policy. It's actually very similar to the problem that we're
link |
faced when we actually then deploy that thing. And we want to decide whether we trust it in the
link |
moment or not. So perhaps we just need to do a better job of figuring out that part. And that's
link |
a very deep research question, of course. But it's also a question that a lot of people are
link |
working on. So I'm pretty optimistic that we can make some progress on that over the next few years.
link |
What's the role of simulation in reinforcement learning, in deep
link |
reinforcement learning? Like how essential is it? It's been essential for the breakthroughs so far,
link |
for some interesting breakthroughs. Do you think it's a crutch that we rely on? I mean,
link |
again, it's the connection to our off policy discussion. But do you think we can ever get rid
link |
of simulation? Or do you think simulation will actually take over? Will we create more and more
link |
realistic simulations that will allow us to solve actual real world problems, like transferring the models
link |
we learn in simulation to real world problems? I think that simulation is a very pragmatic tool
link |
that we can use to get a lot of useful stuff to work right now. But I think that in the long run,
link |
we will need to build machines that can learn from real data, because that's the only way that we'll
link |
get them to improve perpetually. Because if we can't have our machines learn from real data,
link |
if they have to rely on simulated data, eventually the simulator becomes the bottleneck.
link |
In fact, this is a general thing. If your machine has any bottleneck that is built by humans,
link |
and that doesn't improve from data, it will eventually be the thing that holds it back.
link |
And if you're entirely reliant on your simulator, that'll be the bottleneck. If you're entirely
link |
reliant on a manually designed controller, that's going to be the bottleneck. So simulation is very
link |
useful. It's very pragmatic. But it's not a substitute for being able to utilize real experience.
link |
And this is, by the way, this is something that I think is quite relevant now, especially in the
link |
context of some of the things we've discussed, because some of these kind of scaffolding issues
link |
that I mentioned, things like the broken dishes and the unknown reward functions, like these are
link |
not problems that you would ever stumble on when working in a purely simulated kind of environment.
link |
But they become very apparent when we try to actually run these things in the real world.
link |
To throw a brief wrench into our discussion, let me ask: do you think we're living in a simulation?
link |
Oh, I have no idea.
link |
Do you think that's a useful thing to even think about the fundamental physics nature of reality?
link |
Or another perspective? The reason I think the simulation hypothesis is interesting is
link |
to think about how difficult is it to create sort of a virtual reality game type situation
link |
that will be sufficiently convincing to us humans or sufficiently enjoyable that we wouldn't want
link |
to leave. That's actually a practical engineering challenge. And I personally really enjoy virtual
link |
reality, but it's quite far away. But I kind of think about, what would it take for me to want
link |
to spend more time in virtual reality versus the real world? And that's sort of a nice,
link |
clean question. Because at the point where I want to live in a virtual reality,
link |
that means we're just a few years away from a majority of the population living in virtual
link |
reality. And that's how we create the simulation, right? You don't need to actually simulate
link |
quantum gravity and just every aspect of the universe. And that's a really,
link |
that's an interesting question for reinforcement learning too, is if we want to make sufficiently
link |
realistic simulations that blur the difference between sort of the real world and
link |
the simulation, then some of the things we've been talking about, kind of the problems,
link |
go away, if we can create actually interesting, rich simulations. It's an interesting question.
link |
And it actually, I think your question casts your previous question in a very interesting light,
link |
because in some ways, asking whether we can, well, the more practical version is like,
link |
can we build simulators that are good enough to train essentially AI systems that will work
link |
in the world? And it's kind of interesting to think about this, about what this implies. If true,
link |
it kind of implies that it's easier to create the universe than it is to create a brain.
link |
And put this way, that seems kind of weird.
link |
The aspect of the simulation most interesting to me is the simulation of other humans.
link |
That seems to be a complexity that makes the robotics problem harder. Now,
link |
I don't know if every robotics person agrees with that notion. Just as a quick aside,
link |
what are your thoughts about when the human enters the picture of the robotics problem? How
link |
does that change the reinforcement learning problem, the learning problem in general?
link |
Yeah, I think that's a kind of a complex question. And I guess my hope for a while had been that
link |
if we build these robotic learning systems that are multitask, that utilize lots of prior data,
link |
and that learn from their own experience, the bit where they have to interact with people
link |
will be perhaps handled in much the same way as all the other bits. So if they have prior
link |
experience of interacting with people and they can learn from their own experience of interacting
link |
with people for this new task, maybe that'll be enough. Now, of course, if it's not enough,
link |
there are many other things we can do. And there's quite a bit of research in that area.
link |
But I think it's worth a shot to see whether the multi agent interaction, the ability to understand
link |
that other beings in the world have their own goals and intentions and thoughts and so on,
link |
whether that kind of understanding can emerge automatically from simply learning to do things
link |
and maximize utility. That information arises from the data. You've said something
link |
about gravity, sort of that you don't need to explicitly inject anything into the system,
link |
it can be learned from the data. And gravity is an example of something that could be learned
link |
from data, sort of like the physics of the world. What are the limits of what we can learn from
link |
data? So a very simple, clean way to ask that is, do you really think we can learn gravity
link |
from just data, the idea, the laws of gravity? So something that I think is a common kind of
link |
pitfall when thinking about prior knowledge and learning is to assume that just because we know
link |
something, it's better to tell the machine about it rather than have it figure it out
link |
and so on. In many cases, things that are important, that affect many of the events
link |
that the machine will experience are actually pretty easy to learn. If every time you drop
link |
something, it falls down, you might get Newton's version, not Einstein's version,
link |
but it'll be pretty good and it will probably be sufficient for you to act rationally in the world
link |
because you see the phenomenon all the time. So things that are readily apparent from the data,
link |
we might not need to specify those by hand. It might actually be easier to let the machine
link |
figure them out.
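As a toy illustration of that point, here is a sketch of recovering Newton's version of gravity purely from drop data by least squares; the data below is made up and noisy, and no law of gravity is given to the learner.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up observations of dropping things: time fallen t and distance fallen d,
# with measurement noise. Under Newton, d = 0.5 * g * t**2.
t = rng.uniform(0.1, 2.0, size=100)
d = 0.5 * 9.81 * t**2 + rng.normal(0.0, 0.05, size=100)

# Least-squares fit of g from d = g * x with x = 0.5 * t**2; the constant g
# is recovered from the data alone.
x = 0.5 * t**2
g_hat = (x @ d) / (x @ x)
print(f"recovered g = {g_hat:.2f} m/s^2")  # close to 9.81: Newton's version
```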
link |
It just feels like there might be a space of many local minima in terms of theories of this world that we would discover and get stuck on.
link |
Newtonian mechanics is not necessarily easy to come by.
link |
Yeah, and well, in fact, in some fields of science, human civilization
link |
fell into plenty of these local optima. For example, if you think about how people
link |
tried to figure out biology and medicine, for the longest time the kind of rules, the kind of
link |
principles that serve us very well in our day to day lives actually serve us very poorly
link |
in understanding medicine and biology. We had very superstitious and weird ideas about how
link |
the body worked until the advent of the modern scientific method. So that does seem to be
link |
a failing of this approach, but it's also a failing of human intelligence arguably.
link |
Yeah, maybe a small aside, but the idea of self play is fascinating in reinforcement learning,
link |
sort of these competitive, creating a competitive context in which agents can play against each
link |
other at sort of the same skill level, thereby increasing each other's skill level.
link |
It seems to be this kind of self improving mechanism is exceptionally powerful in the
link |
context where it could be applied. First of all, is that beautiful to you that this mechanism
link |
works as well as it does? And also, can it be generalized to other contexts, like the robotic space
link |
or anything that's applicable to the real world?
link |
I think that it's a very interesting idea, but I suspect that the bottleneck to actually
link |
generalizing it to the robotic setting is actually going to be the same as
link |
the bottleneck for everything else, that we need to be able to build machines that can get better
link |
and better through natural interaction with the world. And once we can do that, then they can go
link |
out and play with each other, they can play with people, they can play with the natural environment.
link |
But before we get there, we've got all these other problems we have to get out of the way.
link |
So there's no shortcut around that. You have to interact with the natural environment that...
link |
Well, because in a self play setting, you still need a mediating mechanism. So the reason that
link |
self play works for a board game is because the rules of that board game
link |
mediate the interaction between the agents. So the kind of intelligent behavior that will
link |
emerge depends very heavily on the nature of that mediating mechanism.
link |
So on the side of reward functions, coming up with good reward functions seems to
link |
be the thing that we associate with general... like human beings seem to value the idea of
link |
developing our own reward functions, of arriving at meaning and so on. And yet for reinforcement
link |
learning, we often specify that as a given. What's your sense of how we develop good reward
link |
functions? Yeah, I think that's a very complicated and very deep question. And you're completely
link |
right that classically in reinforcement learning, this question has been treated as a non-issue,
link |
that you treat the reward as this external thing that comes from some other bit of your biology
link |
and you don't worry about it. And I do think that that's actually a little bit of a mistake that
link |
we should worry about it. And we can approach it in a few different ways. We can approach it,
link |
for instance, by thinking of reward as a communication medium. We can say, well,
link |
how does a person communicate to a robot what its objective is? You can approach it also as
link |
sort of more of an intrinsic motivation medium. You could say, can we write down
link |
kind of a general objective that leads to good capability? Like, for example, can you write
link |
down some objectives such that even in the absence of any other task, if you maximize that objective,
link |
you'll sort of learn useful things. This is something that has sometimes been called unsupervised
link |
reinforcement learning, which I think is a really fascinating area of research, especially today.
link |
We've done a bit of work on that recently. One of the things we've studied is whether
link |
we can have some notion of unsupervised reinforcement learning by means of
link |
information theoretic quantities, like, for instance, minimizing a Bayesian measure of surprise. This
link |
is an idea that was pioneered actually in the computational neuroscience community by folks
link |
like Carl Friston. And we've done some work recently that shows that you can actually learn
link |
pretty interesting skills by essentially behaving in a way that allows you to make accurate predictions
link |
about the world. It seems a little circular. Do the things that will lead to you getting the right
link |
answer for prediction. But by doing this, you can sort of discover stable niches in the world.
link |
You can discover that if you're playing Tetris, then correctly clearing the rows will let you
link |
play Tetris for longer and keep the board nice and clean, which sort of satisfies some desire
link |
for order in the world. And as a result, get some degree of leverage over your domain. So we're exploring that pretty actively.
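A schematic sketch of that intrinsic-reward idea, under loose assumptions and not the actual method from the work mentioned: the agent keeps a running density model of the states it visits and is rewarded for landing in states its own model predicts well.

```python
import numpy as np

class StateDensityModel:
    """Running diagonal-Gaussian estimate of visited states; a crude
    stand-in for a learned density model."""
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 0

    def update(self, s):
        self.count += 1
        delta = s - self.mean
        self.mean += delta / self.count
        self.var += (delta * (s - self.mean) - self.var) / self.count

    def log_prob(self, s):
        var = np.maximum(self.var, 1e-3)  # floor keeps the sketch numerically stable
        return float(np.sum(-0.5 * (s - self.mean) ** 2 / var
                            - 0.5 * np.log(2 * np.pi * var)))

model = StateDensityModel(dim=2)

def intrinsic_reward(state):
    # Minimizing surprise = maximizing the log-probability the agent's own
    # model assigns to the states it actually reaches.
    r = model.log_prob(state)
    model.update(state)
    return r

for s in [np.zeros(2), np.zeros(2), np.array([5.0, -5.0])]:
    print(intrinsic_reward(s))  # familiar states score high, surprises score low
```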
link |
Is there a role for a human notion of curiosity
link |
in itself being the reward sort of discovering new things about the world?
link |
So one of the things that I'm pretty interested in is actually whether
link |
discovering new things can actually be an emergent property of some other objective
link |
that quantifies capability. So new things for the sake of new things might not by itself
link |
be the right answer, but perhaps we can figure out an objective for which discovering new things
link |
is actually the natural consequence. That's something we're working on right now,
link |
but I don't have a clear answer for you there yet. That's still a work in progress.
link |
You mean just as a curious observation to see sort of creative patterns of curiosity
link |
on the way to optimize for a particular task?
link |
On the way to optimize for a particular measure of capability.
link |
Is there ways to understand or anticipate unexpected, unintended consequences of
link |
particular reward functions? Sort of anticipate the kind of strategies that might be developed
link |
and try to avoid highly detrimental strategies?
link |
Yeah. So classically, this is something that has been pretty hard in reinforcement learning
link |
because it's difficult for a designer to have good intuition about what a learning algorithm
link |
will come up with when they give it some objective. There are ways to mitigate that.
link |
One way to mitigate it is to actually define an objective that says, don't do weird stuff.
link |
You can actually quantify it and say just don't enter situations that have low probability
link |
under the distribution of states you've seen before.
link |
It turns out that that's actually one very good way to do off policy reinforcement
link |
learning. So we can do some things like that.
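A hedged sketch of quantifying "don't do weird stuff": keep a kernel density estimate over previously visited states and subtract a penalty from the reward whenever the agent enters a state that is improbable under that estimate. The bandwidth, weight, and data here are illustrative choices, not a published recipe.

```python
import numpy as np

seen_states = np.random.default_rng(2).normal(0.0, 1.0, size=(500, 2))

def log_density(state, bandwidth=0.3):
    # Gaussian kernel density estimate over the states seen so far
    # (unnormalized; only relative values matter here).
    sq_dists = np.sum((seen_states - state) ** 2, axis=1)
    kernels = np.exp(-0.5 * sq_dists / bandwidth**2)
    return np.log(kernels.mean() + 1e-12)

def shaped_reward(task_reward, state, weight=1.0):
    # Familiar states are left alone; improbable ones are discouraged.
    baseline = log_density(np.zeros(2))
    return task_reward + weight * min(0.0, log_density(state) - baseline)

print(shaped_reward(1.0, np.array([0.1, -0.2])))  # near the data: barely penalized
print(shaped_reward(1.0, np.array([6.0, 6.0])))   # far from the data: heavily penalized
```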
link |
If we slowly venture in speaking about reward functions into greater and greater levels of
link |
intelligence, there's, I mean, Stuart Russell thinks about this, the alignment of AI systems
link |
with us humans. So how do we ensure that AGI systems align with us humans?
link |
It's kind of a reward function question of specifying the behavior of AI systems
link |
such that their success aligns with the broader intended interests of human beings.
link |
Do you have thoughts on this? Do you have concerns of where reinforcement learning fits into this?
link |
Or are you really focused on the current moment of us being quite far away and trying to solve
link |
the robotics problem? I don't have a great answer to this. And I do think that this is a problem
link |
that's important to figure out. For my part, I'm actually a bit more concerned about the other
link |
side of this equation that maybe rather than unintended consequences for objectives that
link |
are specified too well, I'm actually more worried right now about unintended consequences for
link |
objectives that are not optimized well enough, which might become a very pressing problem
link |
when we, for instance, try to use these techniques for safety critical systems like
link |
cars and aircraft and so on. I think at some point we'll face the issue of objectives being
link |
optimized too well, but right now I think we're more likely to face the issue of them not being
link |
optimized well enough. But you don't think unintended consequences can arise even when
link |
you're far from optimality, sort of like on the path to it? Oh, no, I think unintended
link |
consequences can absolutely arise. It's just I think right now the bottleneck for improving
link |
reliability, safety and things like that is more with systems that need to work better,
link |
that need to optimize their objective better. Do you have thoughts, concerns about existential
link |
threats of human level intelligence? If we put on our hat of looking in 10, 20, 100, 500 years from
link |
now, do you have concerns about existential threats of AI systems? I think there are absolutely
link |
existential threats for AI systems just like there are for any powerful technology.
link |
But I think that these kinds of problems can take many forms and some of those forms will
link |
come down to people with nefarious intent. Some of them will come down to AI systems that have
link |
some fatal flaws and some of them will of course come down to AI systems that are too capable in
link |
some way. But among this set of potential concerns, I would actually be much more concerned about the
link |
first two right now than I am about the others, and principally the one with nefarious humans, because through all
link |
of human history it's actually the nefarious humans that have been the problem, not the nefarious
link |
machines. And I think that right now the best that I can do to make
link |
sure things go well is to build the best technology I can and also hopefully promote
link |
responsible use of that technology. Do you think RL systems have something to teach us humans?
link |
You said nefarious humans getting us in trouble. I mean machine learning systems have in some ways
link |
revealed to us the ethical flaws in our data. In that same kind of way, can reinforcement learning
link |
teach us about ourselves? Has it taught something? What have you learned about yourself from trying
link |
to build robots and reinforcement learning systems? I'm not sure what I've learned about myself but
link |
maybe part of the answer to your question might become a little bit more apparent once we see
link |
more widespread deployment of reinforcement learning for decision making support in domains
link |
like healthcare, education, social media, etc. And I think we will see some interesting stuff
link |
emerge there. We will see for instance what kind of behaviors these systems come up with
link |
in situations where there is interaction with humans and where they have possibility of
link |
influencing human behavior. I think we're not quite there yet but maybe in the next two years
link |
we'll see some interesting stuff come out in that area. I hope outside the research space because
link |
the exciting space where this could be observed is sort of large companies that deal with large
link |
data and I hope there's some transparency. One of the things that's unclear when I look at social
link |
networks and just online is why an algorithm did something or whether even an algorithm was involved
link |
and that'd be interesting from a research perspective, just to observe the results of algorithms,
link |
to open up that data, or to at least be sufficiently transparent about the behavior of these AI systems
link |
in the real world. What's your sense? I don't know if you looked at the blog post The Bitter Lesson
link |
by Rich Sutton, where he looks at sort of the big lesson of research in AI and reinforcement learning:
link |
is that simple methods, general methods that leverage computation seem to work well. So basically
link |
don't try to do any kind of fancy algorithms just wait for computation to get fast. Do you share
link |
this kind of intuition? I think the high level idea makes a lot of sense. I'm not sure that my
link |
takeaway would be that we don't need to work on algorithms. I think that my takeaway would be that
link |
we should work on general algorithms and actually I think that this idea of needing to better automate
link |
the acquisition of experience in the real world actually follows pretty naturally from Rich
link |
Sutton's conclusion. So if the claim is that automated general methods plus data leads to good
link |
results, then it makes sense that we should build general methods and we should build the kind of
link |
methods that we can deploy and get them to go out there and collect their experience autonomously.
link |
One place where I think the current state of things falls a little bit short
link |
of that is actually the going out there and collecting the data autonomously, which is easy to
link |
do in a simulated board game but very hard to do in the real world. Yeah, it keeps coming back to
link |
this one problem, right? So your mind is focused there now in this real world. It just seems scary
link |
the step of collecting the data and it seems unclear to me how we can do it effectively.
link |
Well, you know, 7 billion people in the world, each of them had to do that at some point in
link |
their lives. And we should leverage the experience that they've all had. We should be able to try
link |
to collect that kind of data. Okay, big questions. Maybe stepping back through your life, what book or
link |
books, technical or fiction or philosophical, had a big impact on the way you saw the world
link |
and the way you thought about the world, your life in general? And maybe what books,
link |
if it's different, would you recommend people consider reading on their own intellectual
link |
journey? It could be within reinforcement learning, but it could be very much bigger.
link |
I don't know if this is like a scientifically, like, particularly meaningful answer, but
link |
like, the honest answer is that I actually found a lot of the work by Isaac Asimov to be very
link |
inspiring when I was younger. I don't know if that has anything to do with AI necessarily.
link |
You don't think it had a ripple effect in your life?
link |
Maybe it did. But yeah, I think that a vision of a future where, well, first of all,
link |
artificial intelligence systems, artificial robotic systems,
link |
have, you know, kind of a big place, a big role in society,
link |
and where we try to imagine the sort of the limiting case of technological advancement
link |
and how that might play out in our future history. But yeah, I think that that was
link |
in some way influential. I don't really know how, but I would recommend it. I mean,
link |
if nothing else, you'd be well entertained. When did you first, yourself, like fall in love with
link |
the idea of artificial intelligence, get captivated by this field?
link |
So my honest answer here is actually that I only really started to think about it as
link |
something that I might want to do pretty late, actually in graduate school.
link |
And a big part of that was that until, you know, somewhere around 2009, 2010,
link |
it just wasn't really high on my priority list because I didn't think that it was something
link |
where we're going to see very substantial advances in my lifetime. And, you know, maybe
link |
in terms of my career, the time when I really decided I wanted to work on this was when I
link |
actually took a seminar course that was taught by Professor Andrew Ng. And, you know, at that
link |
point, I of course had a decent understanding of the technical things involved.
link |
But one of the things that really resonated with me was when he said in the opening lecture,
link |
something to the effect of: well, he used to have graduate students come to him and talk about
link |
how they wanted to work on AI, and he would kind of chuckle and give them some math problem to deal
link |
with. But now he was actually thinking that this is an area where we might see substantial
link |
advances in our lifetime. And that kind of got me thinking because, you know, in some abstract
link |
sense, yeah, like you can kind of imagine that. But in a very real sense, when someone who had
link |
been working on that kind of stuff their whole career suddenly says that, yeah,
link |
that had some effect on me. Yeah, this might be a special moment in the history of the field.
link |
That this is where we might see some interesting breakthroughs. So in the space of
link |
advice, somebody who's interested in getting started in machine learning or reinforcement
link |
learning, what advice would you give to maybe an undergraduate student or maybe even younger,
link |
what are the first steps to take? And further on, what are the steps to take on that journey?
link |
So something that I think is important to do is to not be afraid to spend time imagining
link |
the kind of outcome that you might like to see. So one outcome might be a successful career,
link |
a large paycheck or something, or state of the art results on some benchmark.
link |
But hopefully that's not the thing that's like the main driving force for somebody.
link |
But I think that if someone who's a student considering a career in AI takes a
link |
little while, sits down and thinks: what do I really want to see? What do I want to see a machine
link |
do? What do I want to see a robot do? And what do I want to
link |
see a natural language system do? Just imagine it almost like a commercial
link |
for a future product, or something that you'd like to see in the world,
link |
and then actually sit down and think about the steps that are necessary to get there.
link |
And hopefully that thing is not a better number on ImageNet classification. It's
link |
probably an actual thing that we can't do today that would be really awesome, whether it's
link |
a robot butler or a, you know, a really awesome healthcare decision making support system,
link |
whatever it is that you find inspiring. And I think that thinking about that and then
link |
backtracking from there and imagining the steps needed to get there will actually
link |
lead to much better research. It'll lead to rethinking the assumptions. It'll lead to
link |
working on the bottlenecks that other people aren't working on.
link |
And then, naturally, to turn to you: we've talked about reward functions, and you just gave
link |
advice on looking forward to what kind of change you would like to make
link |
in the world. So, ridiculous, big question: what do you think is the meaning
link |
of life? What is the meaning of your life? What gives you fulfillment, purpose, happiness, and
link |
meaning? That's a very big question. What's the reward function under which you're operating?
link |
Yeah, I think one thing that does give, you know, if not meaning at least satisfaction is
link |
some degree of confidence that I'm working on a problem that really matters. I feel like it's
link |
less important to me to actually solve a problem, but it's quite nice to spend my
link |
time on things that I believe really matter. And I try pretty hard to look for that.
link |
I don't know if it's easy to answer this, but if you're successful, what does that look like?
link |
What's the big dream? Of course, success is built on top of success and you keep going forever,
link |
but what is the dream? Yeah, so one very concrete thing or maybe as concrete as it's going to get
link |
here is to see machines that actually get better and better the longer they exist in the world.
link |
On the surface, one might even think that that's something that we
link |
have today, but I think we really don't. I think that there is unending complexity in the universe,
link |
and to date, all of the machines that we've been able to build don't sort of improve up to the limit
link |
of that complexity. They hit a wall somewhere. Maybe they hit a wall because they're in a simulator
link |
that is only a very limited, very pale imitation of the real world, or they hit a wall
link |
because they rely on a labeled dataset. But they never hit the wall of running out of stuff
link |
to see. So, you know, I'd like to build a machine that can go as far as possible.
link |
And that runs up against the ceiling of the complexity of the universe. Yes.
link |
Well, I don't think there's a better way to end it, Sergei. Thank you so much. It's a huge honor.
link |
I can't wait to see the amazing work that you publish, and in the education space, in terms
link |
of reinforcement learning. Thank you for inspiring the world. Thank you for the great research you
link |
do. Thank you. Thanks for listening to this conversation with Sergei Levine and thank you
link |
to our sponsors, Cash App and ExpressVPN. Please consider supporting this podcast by
link |
downloading Cash App and using code LexPodcast and signing up at expressvpn.com
link |
slash lexpod. Click all the links, buy all the stuff. It's the best way to support this podcast
link |
and the journey I'm on. If you enjoy this thing, subscribe on YouTube, review it with
link |
five stars on Apple Podcast, support it on Patreon, or connect with me on Twitter at Lex Freedman
link |
spelled somehow if you can figure out how without using the letter E, just F R I D M A N.
link |
And now let me leave you with some words from Salvador Dali. Intelligence without ambition
link |
is a bird without wings. Thank you for listening and hope to see you next time.