David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86
The following is a conversation with David Silver, who leads the Reinforcement Learning Research Group at DeepMind, was the lead researcher on AlphaGo and AlphaZero, co-led the AlphaStar and MuZero efforts, and has done a lot of important work in reinforcement learning.

I believe AlphaZero is one of the most important accomplishments in the history of artificial intelligence, and David is one of the key humans who brought AlphaZero to life, together with a lot of other great researchers at DeepMind. He's humble, kind, and brilliant. We were both jet-lagged, but didn't care and made it happen. It was a pleasure and truly an honor to talk with David.

This conversation was recorded before the outbreak of the pandemic. For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way. Stay strong. We're in this together. We'll beat this thing.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

Quick summary of the ads: Masterclass and Cash App. Please consider supporting the podcast by signing up to Masterclass at masterclass.com/lex, and downloading Cash App and using code LexPodcast.
This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market.

Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency, in the context of the history of money, is fascinating. I recommend The Ascent of Money as a great book on this history. Debits and credits on ledgers started around 30,000 years ago. The US dollar was created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it is still aiming to, and just might, redefine the nature of money.

So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.
This show is sponsored by Masterclass. Sign up at masterclass.com/lex to get a discount and to support this podcast. In fact, for a limited time, if you sign up for an All Access Pass for a year, you get another All Access Pass to share with a friend. Buy one, get one free.

When I first heard about Masterclass, I thought it was too good to be true. For $180 a year, you get an All Access Pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration, Neil deGrasse Tyson on scientific thinking and communication, Will Wright, the creator of SimCity and The Sims, on game design, Jane Goodall on conservation, Carlos Santana on guitar (his song Europa could be the most beautiful guitar song ever written), Garry Kasparov on chess, Daniel Negreanu on poker, and many, many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money.

For me, the key is to not be overwhelmed by the abundance of choice. Pick three courses you want to complete and watch each of them all the way through. It's not that long, but it's an experience that will stick with you for a long time. It's easily worth the money. You can watch it on basically any device. Once again, sign up at masterclass.com/lex to get a discount and to support this podcast.

And now, here's my conversation with David Silver.
What was the first program you ever wrote, and in what programming language, do you remember?

I remember very clearly, yeah. My parents brought home this BBC Model B microcomputer. It was just this fascinating thing to me. I was about seven years old and couldn't resist just playing around with it. So I think the first program ever was writing my name out in different colors and getting it to loop and repeat that. And there was something magical about that, which just led to more and more.

How did you think about computers back then? The magical aspect of it, that you can write a program and there's this thing that you just gave birth to that's able to create visual elements and live on its own? Or did you not think of it in those romantic notions? Was it more like, oh, that's cool, I can solve some puzzles?

It was always more than solving puzzles. It was something where there were these limitless possibilities once you have a computer in front of you. You can do anything with it. I used to play with Lego with the same feeling. You can make anything you want out of Lego, but even more so with a computer. You're not constrained by the amount of kit you've got. And so I was fascinated by it, and started pulling out the user guide and the advanced user guide and then learning. So I started in BASIC and then later 6502 assembly. My father also became interested in this machine and gave up his career to go back to school and study for a master's degree in artificial intelligence, funnily enough, at Essex University. So I was exposed to those things at an early age. He showed me how to program in Prolog and do things like querying your family tree. And those are some of my earliest memories of trying to figure things out on a computer.
Those were the early steps in computer science and programming, but when did you first fall in love with artificial intelligence, or with the ideas, the dreams of AI?

I think it was really when I went to study at university. So I was an undergrad at Cambridge, studying computer science. And I really started to question, you know, what really are the goals? Where do we want to go with computer science? And it seemed to me that the only step of major significance to take was to try and recreate something akin to human intelligence. If we could do that, it would be a major leap forward. And I certainly wasn't the first to have that idea, but it, you know, nestled within me somewhere and became like a bug. I really wanted to crack that problem.

So you had a notion that this is something that human beings can do, that it is possible to create an intelligent machine?

Well, I mean, unless you believe in something metaphysical, then what are our brains doing? Well, at some level, they're information processing systems, which are able to take whatever information is in there, transform it through some form of program, and produce some kind of output which enables that human being to do all the amazing things that they can do in this incredible world.

So then do you remember the first time you wrote a program that, because you also had an interest in games, do you remember the first time you wrote a program that beat you in a game? That, or beat you at anything, sort of achieved super-David-Silver-level performance?
So I used to work in the games industry. For five years, I programmed games for my first job. It was an amazing opportunity to get involved in a startup company. And so I was involved in building AI at that time. And so for sure, there was a sense of building handcrafted what people used to call AI in the games industry, which I think is not really what we might think of as AI in its fuller sense, but something which is able to take actions in a way which makes things interesting and challenging for the human player. And at that time, I was able to build these handcrafted agents, which in certain limited cases could do things better than me, but mostly in these kind of twitch-like scenarios where they were able to do things faster, or because they had some pattern which they were able to exploit repeatedly.

I think if we're talking about real AI, the first experience for me came after that, when I realized that this path I was on wasn't taking me towards, it wasn't dealing with, that bug which I still had inside me to really understand intelligence and try and solve it. Everything people were doing in games was short-term fixes rather than long-term vision. And so I went back to study for my PhD, which was, funnily enough, trying to apply reinforcement learning to the game of Go. And I built my first Go program using reinforcement learning: a system which would, by trial and error, play against itself, and was able to learn which patterns were actually helpful to predict whether it was going to win or lose the game, and then choose the moves that led to the combination of patterns that would mean that you're more likely to win. And that system, that system beat me.
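The loop he describes, self-play plus learning which patterns predict winning, can be sketched roughly as follows. This is an illustrative toy, not a reconstruction of his program: random outcomes stand in for the result of a finished game, and single board points stand in for patterns.

```python
import random

def self_play_episode(weights, lr=0.01, board_points=81):
    """One toy self-play game: greedily pick points by learned weight,
    observe a (simulated) game outcome, and reinforce the patterns used."""
    used = []
    for _ in range(20):
        legal = [p for p in range(board_points) if p not in used]
        # Choose the move whose "pattern" currently predicts winning best.
        used.append(max(legal, key=lambda p: weights[p]))
    won = random.random() < 0.5               # stand-in for the game result
    prediction = sum(weights[p] for p in used)
    error = (1.0 if won else 0.0) - prediction
    for p in used:                            # trial and error: nudge each
        weights[p] += lr * error              # used pattern toward the result
    return weights

weights = [0.0] * 81
for _ in range(100):
    weights = self_play_episode(weights)
```

Over many episodes the weights drift toward patterns correlated with winning; his real system learned from actual game outcomes rather than coin flips.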
And how did that make you feel?

It made me feel good.

I mean, was it a mix of a sort of excitement, and was there a tinge of something like almost a fearful awe? You know, like in 2001: A Space Odyssey, kind of realizing that you've created something that's achieved human-level intelligence in this one particular little task. And in that case, I suppose, neural networks weren't involved?

There were no neural networks in those days. This was pre-deep-learning revolution, but it was a principled self-learning system based on a lot of the principles which people still use in deep reinforcement learning. I think I found it immensely satisfying that a system which was able to learn from first principles for itself was able to reach the point where it was understanding this domain better than I could, and was able to outwit me. I don't think it was a sense of awe. It was a sense of satisfaction that something I felt should work had worked.
So to me, and I don't know how else to put it, but to me, AlphaGo and AlphaGo Zero mastering the game of Go is the most profound and inspiring moment in the history of artificial intelligence. And you're one of the key people behind this achievement. I'm Russian, so I really felt the first sort of seminal achievement when Deep Blue beat Garry Kasparov in 1997. As far as I know, the AI community at that point largely saw the game of Go as unbeatable by AI using the state-of-the-art brute-force search methods. Even if you consider, at least the way I saw it, arbitrary exponential scaling of compute, Go would still not be solvable, hence why it was thought to be impossible. So given that the game of Go was thought impossible to master, when was the dream for you? You just mentioned your PhD thesis of building a system that plays Go. When was the dream for you that you could actually build a computer program that achieves world-class play, not necessarily beats the world champion, but achieves that kind of level?

First of all, thank you. Those were very kind words. And funnily enough, I just came from a panel where I was actually in a conversation with Garry Kasparov and Murray Campbell, one of the authors of Deep Blue. It was their first meeting together since the match, and that just occurred yesterday, so I'm literally fresh from that experience. These are amazing moments when they happen. But where did it all start?
Well, for me, it started when I became fascinated by the game of Go. I've grown up playing games; I've always had a fascination with board games. I played chess as a kid, I played Scrabble as a kid. When I was at university, I discovered the game of Go, and to me, it just blew all of those other games out of the water. It was just so deep and profound in its complexity, with endless levels to it. What I discovered was that I could devote endless hours to this game, and I knew in my heart of hearts that no matter how many hours I would devote to it, I would never become a grandmaster. Or there was another path, and the other path was to try and understand how you could get some other intelligence to play this game better than I would be able to. And so even in those days, I had this idea: what if it was possible to build a program that could crack this? And as I started to explore the domain, I discovered that this was really the domain where people felt deeply that if progress could be made in Go, it would really mean a giant leap forward for AI.
It was the challenge where all other approaches had failed. This was coming out of the era you mentioned, which was in some sense the golden era for the classical methods of AI, like heuristic search. In the '90s, they all fell one after another: not just chess with Deep Blue, but checkers, backgammon, Othello. There were numerous cases where systems built on top of heuristic search methods, with these high-performance systems, had been able to defeat the human world champion. And yet in that same time period, there was a million-dollar prize available for the game of Go, for the first system to beat a human professional player. And at the end of that time period, in the year 2000, when the prize expired, the strongest Go program in the world was defeated by a nine-year-old child, and that nine-year-old child was giving nine free moves to the computer at the start of the game to try and even things up. And a computer Go expert beat that same strongest program with 29 handicap stones, 29 free moves. So that's what the state of affairs was when I became interested in this problem, in around 2003, when I started working on computer Go. There was nothing. There was just very, very little in the way of progress towards meaningful performance at anything approaching human level.
And it wasn't through lack of effort; people had tried many, many things. And so there was a strong sense that something different would be required for Go than had been needed for all of these other domains where AI had been successful. And maybe the single clearest example is that Go, unlike those other domains, had this kind of intuitive property, that a Go player would look at a position and say, hey, here's this mess of black and white stones, but from this mess, oh, I can predict that this part of the board has become my territory, this part of the board has become your territory, and I've got this overall sense that I'm going to win and that this is about the right move to play. And that intuitive sense of judgment, of being able to evaluate what's going on in a position, was pivotal to humans being able to play this game, and something that people had no idea how to put into computers. So this question of how to evaluate a position, how to come up with these intuitive judgments, was the key reason why Go was so hard, in addition to its enormous search space, and the reason why methods which had succeeded so well elsewhere failed in Go. And so people really felt deep down that in order to crack Go, we would need to get something akin to human intuition. And if we got something akin to human intuition, we'd be able to solve many, many more problems besides. So for me, that was the moment where it's like, okay, this is not just about playing Go. This is about something profound. And it was back to that bug which had been itching me all those years. This was the opportunity to do something meaningful and transformative, and I guess a dream was born.
That's a really interesting way to put it. So almost this realization that you need to formulate Go as a kind of prediction problem, versus a search problem, was the intuition. I mean, maybe that's the wrong, crude term, but to give it the ability to kind of intuit things about the positional structure of the board. Now, okay, but what about the learning part of it? Did you have a sense that learning has to be part of the system? Again, something that, as far as I think, except with TD-Gammon in the '90s with RL a little bit, hasn't really been part of those state-of-the-art game-playing systems.
So I strongly felt that learning would be necessary, and that's why my PhD topic back then was trying to apply reinforcement learning to the game of Go. And not just learning of any type. I felt that the only way to really have a system that could progress beyond human levels of performance wouldn't just be to mimic how humans do it, but for it to understand for itself. And how else can a machine hope to understand what's going on, except through learning? If you're not learning, what else are you doing? Well, you're putting all the knowledge into the system, and that just feels like something which decades of AI have told us is maybe not a dead end, but certainly has a ceiling on its capabilities. It's known as the knowledge acquisition bottleneck: the more you try to put into something, the more brittle the system becomes. And so you just have to have learning. You have to have learning. That's the only way you're going to be able to get a system which has sufficient knowledge in it, millions and millions of pieces of knowledge, billions, trillions, of a form that it can actually apply for itself, and understand how those billions and trillions of pieces of knowledge can be leveraged in a way which will actually lead it towards its goal without conflict or other issues.
I mean, if I put myself back in that time, I just wouldn't think like that without a good demonstration of RL. I would think more in terms of symbolic AI, like not learning, but sort of a simulation of a knowledge base, like a growing knowledge base, but it would still be sort of pattern-based, like basically having little rules that you kind of assemble together into a larger system.

Well, in a sense, that was the state of the art back then. So if you look at the Go programs which had been competing for this prize I mentioned, they were an assembly of different specialized systems, some of which used huge amounts of human knowledge to describe how you should play the opening, all the different patterns that were required to play well in the game of Go, endgame theory, combinatorial game theory, combined with more principled search-based methods which were trying to solve particular subparts of the game, like life and death, connecting groups together, all these amazing subproblems that just emerge in the game of Go. There were different pieces all put together into this collage, which together would try and play against a human. And although not all of the pieces were handcrafted, the overall effect was nevertheless still brittle, and it was hard to make all these pieces work well together. And so really, what I was pressing for, and the main innovation of the approach I took, was to go back to first principles and say, well, let's back off all that and try and find a principled approach where the system can learn for itself, just from the outcome. Like, you know, learn for itself: if you try something, did that help or did it not help? And only through that procedure can you arrive at knowledge which is verified. The system has to verify it for itself, not relying on any other third party to say this is right or wrong. And so that principle was already very important in those days, but unfortunately we were missing some important pieces back then.
So before we dive into maybe discussing the beauty of reinforcement learning, let's take a step back. We kind of skipped over it a bit: the rules of the game of Go, the elements of it, perhaps contrasting with chess, that sort of you really enjoyed as a human being and also that make it really difficult as an AI, machine learning problem.

So the game of Go has remarkably simple rules. In fact, so simple that people have speculated that if we were to meet alien life at some point, we wouldn't be able to communicate with them, but we would be able to play a game of Go with them; they'd probably have discovered the same ruleset. So the game is played on a 19-by-19 grid, and you play on the intersections of the grid, and the players take turns. The aim of the game is very simple: it's to surround as much territory as you can, as many of these intersections, with your stones, and to surround more than your opponent does. And the only nuance to the game is that if you fully surround your opponent's pieces, then you get to capture them and remove them from the board, and it counts as your own territory.
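The capture rule just described can be made concrete with a short sketch. This is an illustrative implementation with made-up names, not code from any Go library: a chain of stones is captured when it has no liberties, that is, no empty points adjacent to any stone in the chain.

```python
# Board is a dict {(row, col): 'B' or 'W'}; any absent point is empty.

def chain_and_liberties(board, start, size=19):
    """Flood-fill the chain of same-colored stones containing `start`,
    collecting its liberties (empty points adjacent to the chain)."""
    color = board[start]
    chain, liberties, frontier = {start}, set(), [start]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue                      # off the board
            if (nr, nc) not in board:
                liberties.add((nr, nc))       # empty neighbor: a liberty
            elif board[(nr, nc)] == color and (nr, nc) not in chain:
                chain.add((nr, nc))           # same color: extend the chain
                frontier.append((nr, nc))
    return chain, liberties

def captured(board, point):
    """A fully surrounded chain (zero liberties) is removed from the board."""
    _, liberties = chain_and_liberties(board, point)
    return len(liberties) == 0
```

For example, a single white stone with black stones on all four adjacent intersections has no liberties and is captured.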
Now, from those very simple rules, immense complexity arises. There are kind of profound strategies in how to surround territory, how to trade off between making solid territory yourself now compared to building up influence that will help you acquire territory later in the game, how to connect groups together, how to keep your own groups alive, which patterns of stones are most useful compared to others. There's just immense knowledge: the game was discovered thousands of years ago, and human Go players have built up this immense knowledge base over the years. It's studied very deeply and played by something like 50 million players across the world, mostly in China, Japan, and Korea, where it's an important part of the culture, so much so that it was considered one of the four ancient arts required of Chinese scholars. There's a deep history there.
And there's an interesting parallel: chess holds the same place in Russian culture as Go does in Chinese culture; chess in Russia is also considered one of the arts. So if we contrast Go with chess, there are interesting qualities about Go, and maybe you can correct me if I'm wrong, but the evaluation of a particular static board is not as reliable. In chess, you can kind of assign points to the different units, and that's a pretty good measure of who's winning and who's losing. It's not so clear how to do that in Go.
Yeah, so in the game of Go, you find yourself in a situation where both players have played the same number of stones. Captures at strong levels of play actually happen very rarely, which means that at any moment in the game you've got the same number of white stones and black stones, and the only thing which differentiates how well you're doing is this intuitive sense of where the territories are ultimately going to form on this board. And if you look at the complexity of a real Go position, it's mind-boggling, that question of what will happen 300 moves from now, when you see just a scattering of 20 white and black stones intermingled. And so that challenge is the reason why position evaluation is so hard in Go compared to other games. In addition to that, it has an enormous search space: there are around 10^170 positions in the game of Go. That's an astronomical number, and that search space is so great that traditional heuristic search methods, which were so successful in things like Deep Blue and chess programs, just kind of fall over in Go.
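To put those numbers in perspective, here is a rough back-of-the-envelope comparison. The branching factors and game lengths below are commonly cited approximations, not figures from this conversation; a game tree of branching factor b and depth d has on the order of b^d lines of play.

```python
import math

def log10_tree_size(branching_factor, game_length):
    # log10 of b^d, the rough size of the full game tree.
    return game_length * math.log10(branching_factor)

chess = log10_tree_size(35, 80)    # roughly 10^123 lines of play
go = log10_tree_size(250, 150)     # roughly 10^360 lines of play
```

Even granting exponential growth in compute, the gap between roughly 10^123 and 10^360 hints at why brute-force search alone was never expected to crack Go.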
So at which point did reinforcement learning enter your life, your research life, your way of thinking? We just talked about learning, but reinforcement learning is a very particular kind of learning, one that's both philosophically sort of profound, but also one that's pretty difficult to get to work, if we look back at the early days. So when did that enter your life, and how did that work progress?

So I had just finished working in the games industry at this startup company, and I took a year out to discover for myself exactly which path I wanted to take. I knew I wanted to study intelligence, but I wasn't sure what that meant at that stage. I really didn't feel I had the tools to decide on exactly which path I wanted to follow. So during that year, I read a lot, and one of the things I read was Sutton and Barto, the sort of seminal introductory textbook on reinforcement learning. And when I read that textbook, I just had this resonating feeling that this is what I understood intelligence to be, and this was the path that I felt would be necessary to go down to make progress in AI. So I got in touch with Rich Sutton and asked him if he would be interested in supervising me on a PhD thesis in computer Go, and he basically said that if he was still alive, he'd be happy to. Unfortunately, he'd been struggling with very serious cancer for some years, and he really wasn't confident at that stage that he'd even be around to see the end of it. But fortunately, that part of the story worked out very happily, and I found myself out there in Alberta. They've got a great games group out there, with a history of fantastic work in board games, as well as Rich Sutton, the father of RL, so it was the natural place for me to go, in some sense, to study this question. And the more I looked into it, the more strongly I felt that this wasn't just the path to progress in computer Go, but really this was the thing I'd been looking for. This was really an opportunity to frame what intelligence means, like what are the goals of AI, in a single, clear problem definition, such that if we're able to solve that clear, single problem definition, in some sense we've cracked the problem of AI.
So to you, reinforcement learning ideas, at least sort of echoes of them, would be at the core of intelligence. Is it the core of intelligence? And if we ever create a human-level intelligence system, would it be at the core of that kind of system?

Let me say it this way: I think it's helpful to separate out the problem from the solution. So the problem of intelligence, I would say, can be formalized as the reinforcement learning problem, and that formalization is enough to capture most, if not all, of the things that we mean by intelligence: they can all be brought within this framework, which gives us a way to access them in a meaningful way that allows us as scientists to understand intelligence, and us as computer scientists to build it. And so in that sense, I feel that it gives us a path, maybe not the only path, but a path. And so do I think that any system in the future that solved AI would have to have RL within it? Well, I think if you ask that, you're asking about the solution methods. I would say that if we have such a thing, it would be a solution to the RL problem. Now, what particular methods would have been used to get there? Well, we should keep an open mind about the best approaches to actually solve any problem. And the things we have right now for reinforcement learning, I believe they've got a lot of legs, but maybe we're missing some things. Maybe there are going to be better ideas. Let's remain modest: we're in the early days of this field, and there are many amazing discoveries ahead of us.

The specifics, especially of the different kinds of RL approaches currently, there could be other things that fall into the very large umbrella of RL. But if it's okay, can we take a step back and ask the basic question: what is, to you, reinforcement learning?
So reinforcement learning is the study, and the science, and the problem of intelligence in the form of an agent that interacts with an environment. So the problem you're trying to solve is represented by some environment, like the world in which that agent is situated. And the goal of RL is clear: the agent gets to take actions, those actions have some effect on the environment, and the environment gives back an observation to the agent saying, you know, this is what you see or sense. And one special thing the environment gives back is called the reward signal: how well the agent is doing in the environment. And the reinforcement learning problem is simply to take actions over time so as to maximize that reward signal.
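The loop just described, where the agent acts, the environment returns an observation and a reward, and the agent tries to maximize cumulative reward, can be sketched in a few lines. The environment here is a made-up toy for illustration, not a real benchmark or any library's API.

```python
import random

class ToyEnv:
    """Toy environment: a hidden coin each step; reward 1 for matching it."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        observation = coin                      # what the agent gets to see
        return observation, reward

def run(policy, env, steps=100):
    observation, total = 0, 0.0
    for _ in range(steps):
        action = policy(observation)            # agent takes an action
        observation, reward = env.step(action)  # environment responds
        total += reward                         # the signal to maximize
    return total
```

For instance, `run(lambda obs: obs, ToyEnv())` plays the strategy "repeat the last coin seen"; since each flip is independent here, it scores around 50 out of 100.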
link |
So a couple of basic questions, what types of RL approaches are there?
link |
So I don't know if there's a nice brief inwards way to paint a picture of sort of value based,
link |
model based, policy based reinforcement learning.
link |
So now if we think about, okay, so there's this ambitious problem definition of RL.
link |
It's really, you know, it's truly ambitious.
link |
It's trying to capture and encircle all of the things in which an agent interacts with
link |
an environment and say, well, how can we formalize and understand what it means to crack that?
link |
Now let's think about the solution method.
link |
Well, how do you solve a really hard problem like that?
link |
Well, one approach you can take is to decompose that very hard problem into pieces that work
link |
together to solve that hard problem.
link |
And so you can kind of look at the decomposition that's inside the agent's head, if you like,
link |
and ask, well, what form does that decomposition take?
link |
And some of the most common pieces that people use when they're kind of putting this system,
link |
the solution method together, some of the most common pieces that people use are whether
link |
or not that solution has a value function.
link |
That means is it trying to predict, explicitly trying to predict how much reward it will get
link |
Does it have a representation of a policy?
link |
That means something which is deciding how to pick actions.
link |
Is that decision making process explicitly represented?
link |
And is there a model in the system?
link |
Is there something which is explicitly trying to predict what will happen in the environment?
link |
And so those three pieces are, to me, some of the most common building blocks.
link |
And I understand the different choices in RL as choices of whether or not to use those
link |
building blocks when you're trying to decompose the solution.
link |
Should I have a value function represented?
link |
Should I have a policy represented?
link |
Should I have a model represented?
link |
And there are combinations of those pieces and, of course, other things that you could
link |
add into the picture as well.
link |
But those three fundamental choices give rise to some of the branches of RL with which
link |
we are very familiar.
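As an illustrative sketch (all names here are hypothetical, not any real agent's API), those three optional building blocks could be laid out like this:

```python
import random

class Agent:
    """The three common building blocks described above; a real RL agent
    may explicitly represent any subset of them."""

    def value(self, state):
        # Value function: explicitly predicts how much reward
        # we expect to get from this state onwards.
        return 0.0

    def policy(self, state):
        # Policy: the explicit decision-making process that picks actions.
        return random.choice(["left", "right"])

    def model(self, state, action):
        # Model: explicitly predicts what will happen in the environment,
        # i.e. the next state and the immediate reward.
        next_state, reward = state, 0.0
        return next_state, reward
```

Value-based, policy-based, and model-based RL differ in which of these methods is actually learned and which are left implicit or omitted.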
link |
And so those, as you mentioned, there is a choice of what's specified or modeled explicitly.
link |
And the idea is that all of these are somehow implicitly learned within the system.
link |
So it's almost a choice of how you approach a problem.
link |
Do you see those as fundamental differences or are these almost like small specifics,
link |
like the details of how you solve the problem, but they're not fundamentally different from each other?
link |
I think the fundamental idea is maybe at the higher level.
link |
The fundamental idea is the first step of the decomposition is really to say, well,
link |
how are we really going to solve any kind of problem where you're trying to figure out
link |
how to take actions and just from this stream of observations, you know, you've got some
link |
agent situated in its sensory motor stream and getting all these observations in, getting
link |
to take these actions.
link |
And what should it do?
link |
How can you even broach that problem?
link |
You know, maybe the complexity of the world is so great that you can't even imagine how
link |
to build a system that would understand how to deal with that.
link |
And so the first step of this decomposition is to say, well, you have to learn.
link |
The system has to learn for itself.
link |
And so note that the reinforcement learning problem doesn't actually stipulate that you
link |
Like, you could maximize your reward without learning; you just wouldn't do a
link |
very good job of it.
link |
So learning is required because it's the only way to achieve good performance in any sufficiently
link |
large and complex environment.
link |
So that's the first step.
link |
And so that step gives commonality to all of the other pieces, because now you might
link |
ask, well, what should you be learning?
link |
What does learning even mean?
link |
You know, in this sense, you know, learning might mean, well, you're trying to update
link |
the parameters of some system, which is then the thing that actually picks the actions.
link |
And those parameters could be representing anything, they could be parametrizing a value
link |
function or a model or a policy.
link |
And so in that sense, there's a lot of commonality in that whatever is being represented there
link |
is the thing which is being learned, and it's being learned with the ultimate goal of maximizing the reward.
link |
But the way in which you decompose the problem is really what gives the semantics to the system.
link |
You're trying to learn something to predict well, like a value function or a model, or
link |
you're learning something to perform well, like a policy.
link |
And the form of that objective is kind of giving the semantics to the system.
link |
And so it really is, at the next level down, a fundamental choice.
link |
And we have to make those fundamental choices as system designers or enable our algorithms
link |
to be able to learn how to make those choices for themselves.
link |
So then the next step you mentioned, the very first thing you have to deal with is can you
link |
even take in this huge stream of observations and do anything with it?
link |
So the natural next basic question is, what is deep reinforcement learning,
link |
and what is this idea of using neural networks to deal with this huge incoming stream?
link |
So amongst all the approaches for reinforcement learning, deep reinforcement learning is
link |
one family of solution methods that tries to utilize powerful representations that are
link |
offered by neural networks to represent any of these different components of the solution,
link |
Like whether it's the value function or the model or the policy, the idea of deep learning
link |
is to say, well, here's a powerful toolkit that's so powerful that it's universal in
link |
the sense that it can represent any function and it can learn any function.
link |
And so if we can leverage that universality, that means that whatever we need to represent
link |
for our policy or for our value function for a model, deep learning can do it.
link |
So that deep learning is one approach that offers us a toolkit that has no ceiling to
link |
its performance, that as we start to put more resources into the system, more memory and
link |
more computation and more data, more experience, more interactions with the environment, that
link |
these are systems that can just get better and better and better at doing whatever the
link |
job is we've asked them to do.
link |
Whatever we've asked that function to represent, it can learn a function that does a better
link |
and better job of representing that knowledge, whether that knowledge be estimating how well
link |
you're going to do in the world, the value function, whether it's going to be choosing
link |
what to do in the world, the policy, or whether it's understanding the world itself, what's
link |
going to happen next, the model.
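A tiny sketch of that idea, here using a two-layer numpy network as a value-function approximator; the architecture, sizes, and the output-layer-only update are simplifications of my own, not DeepMind's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network: state features in, scalar value estimate out.
W1, b1 = rng.normal(size=(4, 16)) * 0.5, np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)) * 0.5, np.zeros(1)

def value_net(state):
    """Predicted return from this state (the 'value function' role)."""
    h = np.tanh(state @ W1 + b1)   # learned hidden representation
    return (h @ W2 + b2).item()

def train_step(state, target, lr=0.01):
    """One gradient step pushing the prediction toward a target return.
    For brevity, only the output layer is updated here."""
    global W2, b2
    h = np.tanh(state @ W1 + b1)
    err = (h @ W2 + b2).item() - target
    # Gradient of 0.5 * err**2 with respect to W2 and b2.
    W2 -= lr * err * h[:, None]
    b2 -= lr * err
    return err
```

The same network shape could just as well parametrize a policy (outputting action probabilities) or a model (outputting a predicted next state); only the training target changes.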
link |
Nevertheless, the fact that neural networks are able to learn incredibly complex representations
link |
that allow you to do the policy, the model or the value function is at least to my mind
link |
exceptionally beautiful and surprising.
link |
Was it surprising to you?
link |
Can you still believe it works as well as it does?
link |
Do you have good intuition about why it works at all and works as well as it does?
link |
Let me take that question in two parts.
link |
I think it's not surprising to me that the idea of reinforcement learning works, because
link |
in some sense, I feel it's the only thing which can ultimately work.
link |
I feel we have to address it, and success must be possible, because we have examples of it working in humans.
link |
It must at some level be possible to acquire experience and use that experience
link |
to do better in a way which is meaningful to environments of the complexity that humans can deal with.
link |
Am I surprised that our current systems can do as well as they can do?
link |
I think one of the big surprises for me and a lot of the community is really the fact
link |
that deep learning can continue to perform so well despite the facts that these neural
link |
networks that they're representing have these incredibly nonlinear kind of bumpy surfaces
link |
which to our kind of low dimensional intuitions make it feel like surely you're just going
link |
to get stuck and learning will get stuck because you won't be able to make any further progress.
link |
Yet, the big surprise is that learning continues and these what appear to be local optima turn
link |
out not to be because in high dimensions when we make really big neural nets, there's always
link |
a way out and there's a way to go even lower and then you're still not in a local optima
link |
because there's some other pathway that will take you out and take you lower still.
link |
No matter where you are, learning can proceed and do better and better and better, without ever getting stuck.
link |
That is a surprising and beautiful property of neural nets which I find elegant and beautiful
link |
and somewhat shocking that it turns out to be the case.
link |
As you said, which I really like, to our low dimensional intuitions, that's surprising.
link |
We're very tuned to working within a three dimensional environment and so to start to
link |
visualize what a billion dimensional neural network surface that you're trying to optimize
link |
over, what that even looks like is very hard for us and so I think that really if you try
link |
to account for essentially the AI winter where people gave up on neural networks, I think
link |
it's really down to that lack of ability to generalize from low dimensions to high dimensions
link |
because back then we were in the low dimensional case, people could only build neural nets
link |
with 50 nodes in them or something and to imagine that it might be possible to build
link |
a billion dimensional neural net and it might have a completely different qualitatively
link |
different property was very hard to anticipate and I think even now we're starting to build
link |
the theory to support that and it's incomplete at the moment but all of the theory seems
link |
to be pointing in the direction that indeed this is an approach which truly is universal
link |
both in its representational capacity which was known but also in its learning ability
link |
which is surprising.
link |
It makes one wonder what else we're missing due to our low dimensional intuitions that
link |
will seem obvious once it's discovered.
link |
I often wonder when we one day do have AIs which are superhuman in their abilities to
link |
understand the world, what will they think of the algorithms that we developed back now?
link |
Will they look back at these days and feel that
link |
these algorithms were naive first steps or will they still be the fundamental ideas which
link |
are used even in 100, 1,000, 10,000 years?
link |
They'll watch back this conversation with a smile, and maybe a little bit of a laugh.
link |
My sense is I think just like when we used to think that the sun revolved around the
link |
earth, they'll see our reinforcement learning systems of today as too complicated, and that the answer
link |
was simple all along.
link |
There's something just like you said in the game of Go, I mean I love the systems of like
link |
cellular automata that there's simple rules from which incredible complexity emerges.
link |
So it feels like there might be some very simple approaches.
link |
Just like Rich Sutton says, simple methods that leverage computation seem, over time, to prove to be
link |
the most effective.
link |
I 100% agree I think that if we try to anticipate what will generalize well into the future
link |
I think it's likely to be the case that it's the simple clear ideas which will have the
link |
longest legs and which will carry us further into the future.
link |
And nevertheless we're in a situation where we need to make things work today and sometimes
link |
that requires putting together more complex systems where we don't have the full answers
link |
yet as to what those minimal ingredients might be.
link |
So speaking of which, if we could take a step back to Go, what was MoGo and what was the
link |
key idea behind the system?
link |
So back during my PhD on computer Go around about that time there was a major new development
link |
which actually happened in the context of computer Go, and it was really a revolution
link |
in the way that heuristic search was done and the idea was essentially that a position
link |
could be evaluated or a state in general could be evaluated not by humans saying whether
link |
that position is good or not or even humans providing rules as to how you might evaluate
link |
it but instead by allowing the system to randomly play out the game until the end multiple times
link |
and taking the average of those outcomes as the prediction of what will happen.
link |
So for example if you're in the game of Go the intuition is that you take a position
link |
and you get the system to kind of play random moves against itself all the way to the end
link |
of the game and you see who wins and if black ends up winning more of those random games
link |
than white, well, you say, hey, this is a position that favors black, and if white ends up winning
link |
more of those random games than black then it favors white.
link |
So that idea was known as Monte Carlo search and a particular form of Monte Carlo search
link |
that became very effective and was developed in computer Go first by Rémi Coulom in 2006
link |
and then taken further by others was something called Monte Carlo tree search which basically
link |
takes that same idea and uses that insight so that every node of a search tree is
link |
evaluated by the average of the random playouts from that node onwards.
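The rollout evaluation being described can be sketched generically; the toy race-to-10 game below is purely illustrative, standing in for the far larger game of Go:

```python
import random

def rollout_value(state, legal_moves, play, winner, n_rollouts=200):
    """Monte Carlo evaluation: play random moves to the end of the game
    many times and return the fraction of playouts that black wins."""
    wins = 0
    for _ in range(n_rollouts):
        s = state
        while winner(s) is None:                        # game not over yet
            s = play(s, random.choice(legal_moves(s)))  # random move
        if winner(s) == "black":
            wins += 1
    return wins / n_rollouts

# Toy game: state = (count, player to move); players alternately add 1 or 2,
# and whoever reaches 10 first wins.
def legal_moves(s):
    return [1, 2]

def play(s, m):
    count, player = s
    return (count + m, "white" if player == "black" else "black")

def winner(s):
    count, player = s
    if count >= 10:
        # The player who just moved (not `player`) reached 10 and wins.
        return "white" if player == "black" else "black"
    return None
```

Monte Carlo tree search then applies this same estimate at every node of the search tree, refining each node's value as more playouts pass through it.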
link |
And this idea was very powerful and suddenly led to huge leaps forward in the strength
link |
of computer Go playing programs and among those the strongest of the Go playing programs
link |
in those days was a program called MoGo, which was the first program to actually reach human
link |
master level on small boards, nine by nine boards.
link |
And so this was a program by someone called Sylvain Gelly, who's a good colleague of mine;
link |
I worked with him a little bit in those days as part of my PhD thesis, and MoGo was a
link |
first step towards the latest successes we saw in computer Go but it was still missing
link |
a key ingredient. MoGo was evaluating purely by random rollouts against itself, and in a
link |
way it's truly remarkable that random play should give you anything at all like why in
link |
this perfectly deterministic game that's very precise and involves these very exact sequences
link |
why is it that randomization is helpful? And so the intuition is that randomization
link |
captures something about the nature of the search tree from a position that you're understanding
link |
the nature of the search tree from that node onwards by using randomization and this was
link |
a very powerful idea.
link |
And I've seen this in other spaces when I talk to Richard Karp and so on, randomized
link |
algorithms somehow magically are able to do exceptionally well and simplifying the problem
link |
somehow makes you wonder about the fundamental nature of randomness in our universe.
link |
It seems to be a useful thing but so from that moment, can you maybe tell the origin
link |
story in the journey of AlphaGo?
link |
Yeah, so programs based on Monte Carlo tree search were a first revolution in the sense
link |
that they led to suddenly programs that could play the game to any reasonable level but
link |
they plateaued, it seemed that no matter how much effort people put into these techniques
link |
they couldn't exceed the level of amateur dan-level Go players.
link |
So strong players, but not anywhere near the level of professionals, never mind the world champion.
link |
And so that brings us to the birth of AlphaGo which happened in the context of a startup
link |
company known as DeepMind where a project was born and the project was really a scientific
link |
investigation where myself and Aja Huang and an intern, Chris Maddison, were exploring
link |
a scientific question and that scientific question was really, is there another fundamentally
link |
different approach to this key question of go, the key challenge of how can you build
link |
that intuition in?
link |
How can you just have a system that could look at a position and understand what move
link |
to play or how well you're doing in that position, who's going to win?
link |
And so the deep learning revolution had just begun, the systems like ImageNet had suddenly
link |
been won by deep learning techniques back in 2012 and following that it was natural
link |
to ask, well, if deep learning is able to scale up so effectively with images to understand
link |
them enough to classify them, well, why not go, why not take the black and white stones
link |
of the go board and build a system which can understand for itself what that means in terms
link |
of what move to pick or who's going to win the game, black or white?
link |
And so that was our scientific question which we were probing and trying to understand and
link |
as we started to look at it, we discovered that we could build a system, so in fact our
link |
very first paper on AlphaGo was actually a pure deep learning system which was trying
link |
to answer this question and we showed that actually a pure deep learning system with
link |
no search at all was actually able to reach human dan level, master level, at the full
link |
game of go, 19 by 19 boards.
link |
And so without any search at all, suddenly we had systems which were playing at the level
link |
of the best Monte Carlo tree search systems, the ones with randomized rollouts.
link |
So first of all, sorry to interrupt, but that's kind of a groundbreaking notion that's like
link |
basically a definitive step away from a couple of decades of essentially search dominating computer Go.
link |
How did that make you feel? Was it surprising from a scientific perspective, in general?
link |
I found this to be profoundly surprising.
link |
In fact, it was so surprising that we had a bet back then and like many good projects,
link |
bets are quite motivating and the bet was whether it was possible for a system based
link |
purely on deep learning, no search at all to beat a Dan level human player.
link |
And so we had someone who joined our team, who was a Dan level player, he came in and
link |
we had this first match against him.
link |
And which side of the bet were you on, by the way, the losing or the winning side?
link |
I tend to be an optimist with the power of deep learning and reinforcement learning.
link |
So the system won and we were able to beat this human Dan level player.
link |
And for me, that was the moment where it was like, okay, something special is afoot
link |
We have a system which without search is able to already just look at this position and
link |
understand things as well as a strong human player.
link |
And from that point onwards, I really felt that reaching the top levels of human play,
link |
you know, professional level, world champion level, I felt it was actually an inevitability
link |
and if it was an inevitable outcome, I was rather keen that it would be us that achieved it.
link |
This was something where, you know, I had lots of conversations back then with Demis
link |
Hassabis, the head of DeepMind, who was extremely excited.
link |
And we made the decision to scale up the project, brought more people on board.
link |
And so AlphaGo became something where we had a clear goal, which was to try and crack this
link |
outstanding challenge of AI to see if we could beat the world's best players.
link |
And this led within the space of not so many months to playing against the European champion
link |
Fan Hui in a match which became, you know, memorable in history as the first time a go
link |
program had ever beaten a professional player.
link |
And at that time, we had to make a judgment as to when and whether we should go
link |
and challenge the world champion.
link |
And this was a difficult decision to make again.
link |
We were basing our predictions on our own progress and had to estimate based on the
link |
rapidity of our own progress when we thought we would exceed the level of the human world
link |
champion and we tried to make an estimate and set up a match and that became the AlphaGo
link |
versus Lee Sedol match in 2016.
link |
And we should say, spoiler alert, that AlphaGo was able to defeat Lee Sedol.
link |
Maybe we could take even a broader view, AlphaGo involves both learning from expert games
link |
and, as far as I remember, a self-play component, where it learns by playing against itself.
link |
But in your sense, what was the role of learning from expert games there?
link |
And in terms of your self evaluation, whether you can take on the world champion, what was
link |
the thing that you're trying to do more of, sort of train more on expert games?
link |
Or was there another... I'm asking so many poorly phrased questions, but did you have
link |
a hope or dream that self play would be the key component at that moment yet?
link |
So in the early days of AlphaGo, we used human data to explore the science of what deep learning could do.
link |
And so when we had our first paper that showed that it was possible to predict the winner
link |
of the game, that it was possible to suggest moves, that was done using human data.
link |
Oh, solely human data.
link |
And so the reason that we did it that way was at that time we were exploring separately
link |
the deep learning aspect from the reinforcement learning aspect.
link |
That was the part which was new and unknown to me at that time: how far could that go?
link |
Once we had that, it then became natural to try and use that same representation and
link |
see if we could learn for ourselves using that same representation.
link |
And so right from the beginning, actually, our goal had been to build a system using self play.
link |
And to us, the human data right from the beginning was an expedient step to help us for pragmatic
link |
reasons to go faster towards the goals of the project than we might be able to starting
link |
solely from self play.
link |
And so in those days, we were very aware that we were choosing to use human data and that
link |
might not be the long term holy grail of AI, but that it was something which was extremely useful.
link |
It helped us to understand the system.
link |
It helped us to build deep learning representations which were clear and simple and easy to use.
link |
And so really, I would say it served a purpose not just as part of the algorithm, but something
link |
which I continue to use in our research today, which is trying to break down a very hard
link |
challenge into pieces which are easier to understand for us as researchers and develop.
link |
So if you use a component based on human data, it can help you to understand the system such
link |
that then you can build the more principled version later that does it for itself.
link |
So as I said, the AlphaGo victory, and I don't think I'm romanticizing this notion:
link |
I think it's one of the greatest moments in the history of AI.
link |
So were you cognizant of this magnitude of the accomplishment of the time?
link |
I mean, are you cognizant of it even now?
link |
Because to me, I feel like it's something that we mentioned, what the AGI systems of
link |
the future will look back.
link |
I think they'll look back at the AlphaGo victory as like, holy crap, they figured it out.
link |
This is where it started.
link |
Well thank you again.
link |
I mean, it's funny, because I'd been working on computer
link |
go for a long time.
link |
So I'd been working at the time of the AlphaGo match on computer go for more than a decade.
link |
And throughout that decade, I'd had this dream of what would it be like to, what would it
link |
be like really to actually be able to build a system that could play against the world
link |
And I imagined that that would be an interesting moment that maybe, you know, some people might
link |
care about that and that this might be, you know, a nice achievement.
link |
But I think when I arrived in Seoul and discovered the legions of journalists that were following
link |
us around and the hundred million people that were watching the match online live, I realized
link |
that I'd been off in my estimation of how significant this moment was by several orders of magnitude.
link |
And so there was definitely an adjustment process to realize that this was something
link |
which the world really cared about and which was a watershed moment.
link |
And I think there was that moment of realization, which was also a little bit scary because,
link |
you know, if you go into something thinking it's going to be maybe of interest and then
link |
discover that a hundred million people are watching, it suddenly makes you worry about
link |
whether some of the decisions you've made were really the best ones or the wisest or
link |
were going to lead to the best outcome.
link |
And we knew for sure that there were still imperfections in AlphaGo, which were going
link |
to be exposed to the whole world watching.
link |
And so, yeah, it was, I think, a great experience.
link |
And I feel privileged to have been part of it, privileged to have led that amazing team.
link |
I feel privileged to have been in a moment of history, like you say.
link |
But also lucky that, you know, in a sense, I was insulated from that knowledge. I think
link |
it would have been harder to focus on the research if the full kind of reality of what
link |
was going to come to pass had been known to me and the team.
link |
I think it was, you know, we were in our bubble and we were working on research and we were
link |
trying to answer the scientific questions.
link |
And then bam, you know, the public sees it.
link |
And I think it was better that way in retrospect.
link |
Were you confident that, I guess, what were the chances that you could get the win?
link |
So just like you said, I'm a little bit more familiar with another accomplishment than
link |
we may not even get a chance to talk about.
link |
I talked to Oriol Vinyals about AlphaStar, which is another incredible accomplishment.
link |
But there, you know, when AlphaStar was beating the best StarCraft players, there was already a track
link |
record from AlphaGo.
link |
This was really the first time you got to see reinforcement learning face the best
link |
human in the world.
link |
So what was your confidence like?
link |
What were the odds?
link |
Well, we actually...
link |
Funnily enough, there was.
link |
So just before the match, we weren't betting on anything concrete, but we all held out our hands.
link |
Everyone in the team held out a hand at the beginning of the match.
link |
And the number of fingers that they had out on that hand was supposed to represent how
link |
many games they thought we would win against Lee Sedol.
link |
And there was an amazing spread in the team's predictions.
link |
But I have to say, I predicted four one.
link |
And the reason was based purely on data.
link |
So I'm a scientist first and foremost.
link |
And one of the things which we had established was that AlphaGo in around one in five games
link |
would develop something which we called a delusion, which was a kind of hole in its
link |
knowledge where it wasn't able to fully understand everything about the position.
link |
And that hole in its knowledge would persist for tens of moves throughout the game.
link |
And we knew two things.
link |
We knew that if there were no delusions that AlphaGo seemed to be playing at a level that
link |
was far beyond any human capabilities.
link |
But we also knew that if there were delusions, the opposite was true.
link |
And in fact, that's what came to pass.
link |
We saw all of those outcomes, and Lee Sedol in one of the games played a really beautiful
link |
sequence that AlphaGo just hadn't predicted.
link |
And after that, it led it into this situation where it was unable to really understand the
link |
position fully and found itself in one of these delusions.
link |
So indeed, four one was the outcome.
link |
And can you maybe speak to it a little bit more?
link |
What were the five games?
link |
What happened, is there interesting things that come to memory in terms of the play of
link |
the human machine?
link |
So I remember all of these games vividly, of course.
link |
Moments like these don't come too often in the lifetime of a scientist.
link |
And the first game was magical because it was the first time that a computer program
link |
had defeated a world champion in this grand challenge of go.
link |
And there was a moment where AlphaGo invaded Lee Sedol's territory towards the end of the game.
link |
And that's quite an audacious thing to do.
link |
It's like saying, hey, you thought this was going to be your territory in the game, but
link |
I'm going to stick a stone right in the middle of it and prove to you that I can break it
link |
And Lee Sedol's face just dropped.
link |
He wasn't expecting a computer to do something that audacious.
link |
The second game became famous for a move known as Move 37.
link |
This was a move that was played by AlphaGo that broke all of the conventions of go.
link |
The Go players were so shocked by this, they thought that maybe the operator had made a mistake.
link |
They thought there was something crazy going on, and it just broke every rule that go players
link |
are taught from a very young age.
link |
They're just taught, you know, this kind of move called a shoulder hit.
link |
You can only play it on the third line or the fourth line, and AlphaGo played it on the fifth line.
link |
And it turned out to be a brilliant move and made this beautiful pattern in the middle
link |
of the board that ended up winning the game.
link |
And so this really was a clear instance where we could say computers exhibited creativity,
link |
that this was really a move that was something humans hadn't known about, hadn't anticipated.
link |
And computers discovered this idea.
link |
They were the ones to say, actually, here's a new idea, something new, not in the domains
link |
of human knowledge of the game.
link |
And now the humans think this is a reasonable thing to do, and it's part of Go knowledge now.
link |
The third game, something special happens when you play against a human world champion,
link |
which again, I hadn't anticipated before going there, which is, you know, these players
link |
Lee Sedol was a true champion, an 18-time world champion, and had this amazing ability to
link |
probe AlphaGo for weaknesses of any kind.
link |
And in the third game, he was losing, and we felt we were sailing comfortably to victory,
link |
but he managed to, from nothing, stir up this fight and build what's called a double ko,
link |
these kind of repetitive positions.
link |
And he knew that historically, no computer go program had ever been able to deal correctly
link |
with double ko positions, and he managed to summon one out of nothing.
link |
And so for us, you know, this was a real challenge, like would AlphaGo be able to deal with this,
link |
or would it just kind of crumble in the face of this situation?
link |
And fortunately, it dealt with it perfectly.
link |
The fourth game was amazing in that Lee Sedol appeared to be losing this game, AlphaGo thought
link |
it was winning, and then Lee Sedol did something which I think only a true world champion can
link |
do, which is he found a brilliant sequence in the middle of the game, a brilliant sequence
link |
that led him to really just transform the position. He found just a piece of brilliance.
link |
And after that, AlphaGo, its evaluation just tumbled, it thought it was winning this game,
link |
and all of a sudden it tumbled and said, oh, now I've got no chance, and it starts to behave
link |
rather oddly at that point.
link |
In the final game, for some reason, having seen AlphaGo suffer from delusions in the
link |
previous game, we as a team were convinced that it was suffering
link |
from another delusion; we were convinced that it was misevaluating the position and
link |
that something was going terribly wrong.
link |
And it was only in the last few moves of the game that we realized that actually, although
link |
it had been predicting it was going to win all the way through, it really was.
link |
And so somehow, you know, it just taught us yet again that you have to have faith in your
link |
systems when they exceed your own level of ability and your own judgment, you have to
link |
trust in them to know better than you, the designer, once you've bestowed in them the
link |
ability to judge better than you can, then trust the system to do so.
link |
So just like in the case of Deep Blue beating Garry Kasparov, which I think was the first
link |
time he'd ever actually lost to anybody, I mean, there's a similar situation with
link |
Lee Sedol. It's a tragic loss for humans, but a beautiful one.
link |
I think from the tragedy, sort of, over time, emerges a kind
link |
of inspiring story. But Lee Sedol recently announced his retirement, I don't know if we can
link |
look too deeply into it, but he did say that even if I become number one, there's an entity
link |
that cannot be defeated.
link |
So what do you think about these words?
link |
What do you think about his retirement from the game of Go?
link |
Well, let me take you back first of all to the first part of your comment about Garry Kasparov,
link |
who was actually at the panel yesterday.
link |
He specifically said that when he first lost to Deep Blue, he viewed it as a failure.
link |
He viewed that this had been a failure of his, but later on in his career, he said he'd
link |
come to realize that actually it was a success, it was a success for everyone, because this
link |
marked a transformational moment for AI, and so even for Garry Kasparov, he came to realize
link |
that that moment was pivotal and actually meant something much more than his personal
link |
loss in that moment.
link |
Lee Sedol, I think, was much more cognizant of that even at the time.
link |
So in his closing remarks to the match, he really felt very strongly that what had happened
link |
in the AlphaGo match was not only meaningful for AI, but for humans as well, and he felt
link |
as a go player that it had opened his horizons and meant that he could start exploring new ideas.
link |
It brought his joy back for the game of go because it had broken all of the conventions
link |
and barriers and meant that suddenly anything was possible again.
link |
And so I was sad to hear that he'd retired, but he's been a great world champion over
link |
many, many years, and I think he'll be remembered for that ever more.
link |
He'll be remembered as the last person to beat AlphaGo.
link |
I mean, after that, we increased the power of the system, and the next version of AlphaGo
link |
beats the other strong human players 60 games to nil.
link |
So what a great moment for him and something to be remembered for.
link |
It's interesting that you spent time at AAAI on a panel with Garry Kasparov.
link |
But I mean, I'm just curious to learn about the conversations you've had with
link |
Garry, because he's also now written a book about artificial intelligence.
link |
He's thinking about AI.
link |
He has kind of a view of it, and he talks about AlphaGo a lot.
link |
What's your sense, arguably, I'm not just being Russian, but I think Garry is the greatest
link |
chess player of all time.
link |
Probably one of the greatest game players of all time.
link |
And you were sort of at the center of creating a system that beat one of the greatest players ever.
link |
So what is that conversation like?
link |
Is there anything, any interesting digs, any bets, any funny things, any profound things?
link |
So Garry Kasparov has an incredible respect for what we did with AlphaGo, and it's an
link |
amazing tribute coming from him, of all people, that he really appreciates and respects what we did.
link |
And I think he feels that the progress which has happened in computer chess, which later
link |
after AlphaGo, we built the AlphaZero system, which defeated the world's strongest chess programs.
link |
And to Garry Kasparov, that moment in computer chess was more profound than Deep Blue.
link |
And the reason he believes it mattered more was because it was done with learning and
link |
a system which was able to discover for itself new principles, new ideas, which were able
link |
to play the game in a way which he hadn't known about, nor had anyone.
link |
And in fact, one of the things I discovered at this panel was that the current world champion
link |
Magnus Carlsen apparently recently commented on his improvement in performance, and he
link |
attributes it to AlphaZero, that he's been studying the games of AlphaZero, he's changed
link |
his style to play more like AlphaZero.
link |
And it's led to him actually increasing his rating to a new peak.
link |
Yeah, I guess to me, just like to Garry, the inspiring thing is that, and just like you
link |
said with reinforcement learning, reinforcement learning and deep learning, machine learning
link |
feels like what intelligence is.
link |
And you could attribute it to sort of a bitter viewpoint from Garry's perspective, from us
link |
humans perspective, saying that pure search that IBM Deep Blue was doing is not really
link |
intelligence, but somehow it didn't feel like it.
link |
And so that's the magical thing.
link |
I'm not sure what it is about learning that feels like intelligence, but it does.
link |
So I think we should not demean the achievements of what was done in previous eras of AI.
link |
I think that Deep Blue was an amazing achievement in itself, and that heuristic search of the
link |
kind that was used by Deep Blue had some powerful ideas that were in there.
link |
But it also missed some things.
link |
So the fact that the evaluation function, the way that the chess position was understood,
link |
was created by humans and not by the machine is a limitation, which means that there's
link |
a ceiling on how well it can do.
link |
But maybe more importantly, it means that the same idea cannot be applied in other domains
link |
where we don't have access to the kind of human grandmasters and that ability to kind
link |
of encode exactly their knowledge into an evaluation function.
link |
And the reality is that the story of AI is that most domains turn out to be of the second
link |
type where knowledge is messy, it's hard to extract from experts or it isn't even available.
link |
And so we need to solve problems in a different way.
link |
And I think AlphaGo is a step towards solving things in a way which puts learning as a first
link |
class citizen and says, systems need to understand for themselves how to understand the world,
link |
how to judge the value of any action that they might take within that world in any state
link |
they might find themselves in.
link |
And in order to do that, we make progress towards AI.
link |
So one of the nice things about this, about taking a learning approach to the game of
link |
go or game playing is that the things you learn, the things you figure out are actually
link |
going to be applicable to other problems that are real world problems.
link |
That's ultimately, I mean, there's two really interesting things about AlphaGo.
link |
One is the science of it, just the science of learning, the science of intelligence.
link |
And then the other is, while you're actually learning to figuring out how to build systems
link |
that would be potentially applicable in other applications, medical, autonomous vehicles,
link |
And it's just open the door to all kinds of applications.
link |
So the next incredible step, really the profound step is probably AlphaGo Zero.
link |
I mean, it's arguable, I kind of see them all as part of the same thing, but really, and perhaps
link |
you were already thinking that AlphaGo Zero is the natural, it was always going to be
link |
the next step, but it's removing the reliance on human expert games for pre training.
link |
So how big of an intellectual leap was this, that self play could achieve super human level
link |
performance on its own?
link |
And maybe could you also say what is self play, we kind of mentioned it a few times.
link |
So let me start with self play.
link |
So the idea of self play is something which is really about systems learning for themselves,
link |
but in the situation where there's more than one agent.
link |
And so if you're in a game, and the game is played between two players, then self play
link |
is really about understanding that game just by playing games against yourself rather than
link |
against any actual real opponent.
link |
And so it's a way to kind of discover strategies without having to actually need to go out and
link |
play against any particular human player, for example.
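The self play loop David describes can be sketched concretely. This is a minimal toy illustration of my own (not DeepMind's code, and AlphaGo's actual networks and search are far richer): two copies of the same value table play a tiny game of Nim (remove 1 or 2 stones; whoever takes the last stone wins) against each other, and a simple temporal difference update learns from each move, without any human games.

```python
import random

# Toy self-play sketch (my own illustration): both "players" share one
# value table, and every move generates a learning update for the state
# the mover just left.

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def self_play_train(episodes=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # value[s] = estimated probability that the player TO MOVE wins from s
    value = {0: 0.0}  # with 0 stones left, the player to move has already lost

    def v(s):
        return value.get(s, 0.5)  # neutral initial estimate

    for _ in range(episodes):
        stones = rng.randint(1, 10)
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                m = rng.choice(moves)          # occasional exploration
            else:
                # greedy: leave the opponent the worst-looking position
                m = min(moves, key=lambda mv: v(stones - mv))
            nxt = stones - m
            # TD update: my winning chance is one minus the opponent's
            value[stones] = v(stones) + 0.1 * ((1.0 - v(nxt)) - v(stones))
            stones = nxt
    return value
```

With enough episodes the learned values recover the game's known structure: positions that are multiples of 3 are losing for the player to move, everything else is winning.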
link |
The main idea of AlphaZero was really to try and step back from any of the knowledge
link |
that we put into the system and ask the question, is it possible to come up with a single elegant
link |
principle by which a system can learn for itself all of the knowledge which it requires
link |
to play a game such as Go.
link |
Importantly by taking knowledge out, you not only make the system less brittle in the sense
link |
that perhaps the knowledge you were putting in was just getting in the way and maybe stopping
link |
the system learning for itself, but also you make it more general.
link |
The more knowledge you put in, the harder it is for a system to actually be placed,
link |
taken out of the system in which it's kind of been designed, and placed in some other
link |
system that maybe would need a completely different knowledge base to understand and act in.
link |
And so the real goal here is to strip out all of the knowledge that we put in to the
link |
point that we can just plug it into something totally different.
link |
And that to me is really the promise of AI is that we can have systems such as that,
link |
which no matter what the goal is, no matter what goal we set to the system, we can come
link |
up with, we have an algorithm which can be placed into that world, into that environment
link |
and can succeed in achieving that goal.
link |
And then that to me is almost the essence of intelligence if we can achieve that.
link |
And so AlphaZero is a step towards that, and it's a step that was taken in the context
link |
of two player perfect information games like Go and chess.
link |
We also applied it to Japanese chess.
link |
So just to clarify, the first step was AlphaGo Zero.
link |
The first step was to try and take all of the knowledge out of AlphaGo in such a way
link |
that it could play in a fully self discovered way, purely from self play.
link |
And to me, the motivation for that was always that we could then plug it into other domains,
link |
but we saved that until later.
link |
Well, in fact, I mean, just for fun, I could tell you exactly the moment where the idea
link |
for AlphaZero occurred to me, because I think there's maybe a lesson there for researchers
link |
who are kind of too deeply embedded in their research and working 24-7, trying to come
link |
up with the next idea, which is, it actually occurred to me on my honeymoon, and I was at
link |
my most fully relaxed state, really enjoying myself, and just like that, the algorithm
link |
for AlphaZero just appeared, in its full form, and this was actually before we played
link |
against Lee Sedol, but we just didn't, I think we were so busy trying to make sure we could
link |
beat the world champion, that it was only later that we had the opportunity to step
link |
back and start examining that sort of deeper scientific question of whether this could work.
link |
So nevertheless, self play is probably one of the most profound ideas that represents,
link |
to me at least, artificial intelligence.
link |
But the fact that you could use that kind of mechanism to, again, beat world class players,
link |
that's very surprising.
link |
To me, it feels like you would have to train on a large number of expert games.
link |
So was it surprising to you?
link |
What was the intuition?
link |
Can you sort of think, not necessarily at that time, even now, what's your intuition?
link |
Why this thing works so well?
link |
Why is it able to learn from scratch?
link |
Well, let me first say why we tried it.
link |
So we tried it both because I feel that it was the deeper scientific question to be asking,
link |
to make progress towards AI.
link |
And also because in general, in my research, I don't like to do research on questions for
link |
which we already know the likely outcome.
link |
I don't see much value in running an experiment where you're 95% confident that you will succeed.
link |
And so we could have tried maybe to take AlphaGo and do something which we knew for sure it could do.
link |
But much more interesting to me was to try it on the things which we weren't sure about.
link |
And one of the big questions on our minds back then was, could you really do this with self play alone?
link |
How far could that go?
link |
Would it be as strong?
link |
And honestly, we weren't sure.
link |
It was 50-50, I think.
link |
If you'd asked me, I wasn't confident that it could reach the same level as these systems,
link |
but it felt like the right question to ask.
link |
And even if it had not achieved the same level, I felt that that was an important direction
link |
And so then, lo and behold, it actually ended up outperforming the previous version of AlphaGo
link |
and indeed was able to beat it by 100 games to zero.
link |
So what's the intuition as to why?
link |
I think the intuition to me is clear that whenever you have errors in a system, as we
link |
did in AlphaGo, AlphaGo suffered from these delusions.
link |
Occasionally, it would misunderstand what was going on in a position and misevaluate it.
link |
How can you remove all of these errors?
link |
Errors arise from many sources.
link |
For us, they were arising both from starting from the human data, but also from the nature
link |
of the search and the nature of the algorithm itself.
link |
But the only way to address them in any complex system is to give the system the ability to
link |
correct its own errors.
link |
It must be able to correct them.
link |
It must be able to learn for itself when it's doing something wrong and correct for it.
link |
And so it seemed to me that the way to correct delusions was indeed to have more iterations
link |
of reinforcement learning, that no matter where you start, you should be able to correct
link |
those errors until it gets to play that out and understand, oh, well, I thought that I
link |
was going to win in this situation, but then I ended up losing.
link |
That suggests that I was misevaluating something, there's a hole in my knowledge and now the
link |
system can correct for itself and understand how to do better.
link |
Now if you take that same idea and trace it back all the way to the beginning, it should
link |
be able to take you from no knowledge, from completely random starting point, all the
link |
way to the highest levels of knowledge that you can achieve in a domain.
link |
And the principle is the same, that if you give, if you bestow a system with the ability
link |
to correct its own errors, then it can take you from random to something slightly better
link |
than random because it sees the stupid things that the random is doing and it can correct them.
link |
And then it can take you from that slightly better system and understand, well, what's wrong with it.
link |
And it takes you on to the next level and the next level and this progress can go on indefinitely.
link |
And indeed, what would have happened if we'd carried on training AlphaGo Zero for longer?
link |
We saw no sign of it slowing down its improvements, or at least it was certainly carrying on to improve.
link |
And presumably, if you had the computational resources, this could lead to better and better
link |
systems that discover more and more.
link |
So your intuition is fundamentally there's not a ceiling to this process.
link |
One of the surprising things, just like you said, is the process of patching errors.
link |
It intuitively makes sense that reinforcement learning should be part of that process.
link |
But what is surprising is in the process of patching your own lack of knowledge, you don't
link |
open up other patches.
link |
It's sort of like there's a monotonic decrease of your weaknesses.
link |
Well, let me back this up.
link |
I think science always should make falsifiable hypotheses.
link |
So let me back up this claim with a falsifiable hypothesis, which is that if someone was to,
link |
in the future, take AlphaZero as an algorithm and run it with greater computational
link |
resources than we had available today, then I would predict that they would be able to
link |
beat the previous system 100 games to zero.
link |
And that if they were then to do the same thing a couple of years later, that that would
link |
beat that previous system 100 games to zero, and that that process would continue indefinitely
link |
throughout at least my human lifetime.
link |
Probably the game of go would set the ceiling.
link |
I mean, the game of go would set the ceiling, but the game of go has 10 to the 170 states
link |
So the ceiling is unreachable by any computational device that can be built out of the 10 to
link |
the 80 atoms in the universe.
link |
You asked a really good question, which is, do you not open up other errors when you correct
link |
your previous ones?
link |
And the answer is yes, you do.
link |
And so it's a remarkable fact about this class of two player games, and also true of single
link |
agent games that essentially progress will always lead you to, if you have sufficient
link |
representational resources, like imagine you could represent every state in a big
link |
table of the game, then we know for sure that a progress of self improvement will lead all
link |
the way in the single agent case to the optimal possible behavior, and in the two player case
link |
to the minimax optimal behavior that is the best way that I can play knowing that you're
link |
playing perfectly against me.
link |
And so for those cases, we know that even if you do open up some new error that in some
link |
sense you've made progress, you're progressing towards the best that can be done.
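David's point about sufficient representational resources can be made concrete with a toy example of my own (not from the conversation): in a game small enough that every state fits in a table, such as this miniature Nim (remove 1 or 2 stones; taking the last stone wins), backward induction reaches exactly the minimax optimal evaluation, the fixed point that self improvement converges towards.

```python
# Tabular illustration (my own toy example): with every state held in a
# table, backward induction computes the exact minimax-optimal result
# for a miniature Nim game (remove 1 or 2 stones; taking the last wins).

def minimax_values(max_stones=30):
    win = {0: False}  # the player to move with 0 stones has already lost
    for s in range(1, max_stones + 1):
        # s is winning iff some legal move leaves the opponent a losing state
        win[s] = any(not win[s - m] for m in (1, 2) if m <= s)
    return win
```

The computed table matches the known closed form for this game: the player to move wins exactly when the number of stones is not a multiple of 3.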
link |
So AlphaGo was initially trained on expert games with some self play. AlphaGo Zero removed
link |
the need to be trained on expert games.
link |
And then another incredible step for me because I just love chess is to generalize that further
link |
to AlphaZero, being able to play the game of Go, beating AlphaGo Zero and AlphaGo,
link |
and then also being able to play the game of chess and others.
link |
So what was that step like?
link |
What's the interesting aspects there that required to make that happen?
link |
I think the remarkable observation which we saw with AlphaZero was that actually without
link |
modifying the algorithm at all, it was able to play and crack some of AI's greatest previous challenges.
link |
In particular, we dropped it into the game of chess.
link |
And unlike the previous systems like Deep Blue, which had been worked on for years and years,
link |
we were able to beat the world's strongest computer chess program convincingly using
link |
a system that was fully discovered on its own, from scratch, with its own principles.
link |
And in fact, one of the nice things that we found was that in fact, we also achieved the
link |
same result in Japanese chess, a variant of chess where you get to capture pieces and then
link |
place them back down on your own side as an extra piece.
link |
So a much more complicated variant of chess.
link |
And we also beat the world's strongest programs and reached superhuman performance in that game as well.
link |
And the very first time that we'd ever run the system on that particular game was
link |
the version that we published in the paper on AlphaZero.
link |
It just worked out of the box, literally no touching it, we didn't have to do anything
link |
and there it was, superhuman performance, no tweaking, no twiddling.
link |
And so I think there's something beautiful about that principle that you can take an
link |
algorithm and without twiddling anything, it just works.
link |
Now to go beyond AlphaZero, what's required?
link |
AlphaZero is just a step.
link |
And there's a long way to go beyond that to really crack the deep problems of AI.
link |
But one of the important steps is to acknowledge that the world is a really messy place.
link |
It's this rich, complex, beautiful, but messy environment that we live in, and no one gives you the rules.
link |
Like no one knows the rules of the world, at least maybe we understand that it operates
link |
according to Newtonian or quantum mechanics at the micro level or according to relativity
link |
at the macro level, but that's not a model that's useful for us as people to operate in.
link |
Somehow the agent needs to understand the world for itself in a way where no one tells
link |
it the rules of the game and yet it can still figure out what to do in that world, deal
link |
with this stream of observations coming in, rich sensory input coming in, actions going
link |
out in a way that allows it to reason in the way that AlphaZero can reason, in the way
link |
that these go and chess playing programs can reason, but in a way that allows it to take
link |
actions in that messy world to achieve its goals.
link |
And so this led us to the most recent step in the story of AlphaGo, which was a system
link |
called MuZero, and MuZero is a system which learns for itself even when the rules are unknown.
link |
It actually can be dropped into a system with messy perceptual inputs.
link |
We actually tried it in some Atari games, the canonical domains of Atari that have
link |
been used for reinforcement learning, and this system learned to build a model of these
link |
Atari games that was sufficiently rich and useful enough for it to be able to plan successfully.
link |
And in fact, that system not only went on to beat the state of the art in Atari, but
link |
the same system without modification was able to reach the same level of superhuman performance
link |
in Go, Chess, and Shogi that we'd seen in AlphaZero, showing that even without the
link |
rules, the system can learn for itself just by trial and error, just by playing this game
link |
of Go, and no one tells you what the rules are, but you just get to the end and someone
link |
says, you know, win or loss, or you play this game of chess and someone says win or loss,
link |
or you play a game of Breakout in Atari and someone just tells you, you know, your score at the end.
link |
And the system for itself figures out essentially the rules of the system, the dynamics of the
link |
world, how the world works, and not in any explicit way, but just implicitly enough understanding
link |
for it to be able to plan in that system in order to achieve its goals.
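The MuZero idea, reduced to a cartoon of my own invention (a memorized transition table stands in for MuZero's learned neural model, and a tiny toy environment stands in for Atari): the agent is never told the rules, it just records the transitions it experiences by trial and error, and then plans entirely inside that learned model.

```python
import random

# Toy sketch of the MuZero idea (hugely simplified, my own illustration):
# the agent never sees the environment's rules; it remembers experienced
# transitions and then plans by searching that learned model.

class LineWorld:
    """Hidden rules: positions 0..6, start at 3, action is -1 or +1;
    reaching 6 gives reward 1, reaching 0 gives reward 0; both end it."""
    def reset(self):
        self.pos = 3
        return self.pos
    def step(self, action):
        self.pos += action
        done = self.pos in (0, 6)
        return self.pos, (1.0 if self.pos == 6 else 0.0), done

def learn_model(env, episodes=200, seed=0):
    rng = random.Random(seed)
    model = {}  # (state, action) -> (next_state, reward, done)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = rng.choice([-1, 1])            # pure trial and error
            s2, r, done = env.step(a)
            model[(s, a)] = (s2, r, done)      # remember what happened
            s = s2
    return model

def plan(model, s, depth=8):
    """Best achievable reward from s, searching only the learned model."""
    if depth == 0:
        return 0.0
    best = 0.0
    for a in (-1, 1):
        if (s, a) in model:
            s2, r, done = model[(s, a)]
            best = max(best, r if done else r + plan(model, s2, depth - 1))
    return best
```

Planning inside the learned model alone is enough for the agent to discover that walking right from the start reaches the reward, without ever being handed the environment's rules.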
link |
And that's the fundamental process that you have to go through when you're facing in
link |
any uncertain kind of environment, like you would in the real world, which is figuring out the
link |
sort of the rules, the basic rules of the game.
link |
So that allows it to be applicable to basically any domain that could be digitized in the
link |
way that it needs to in order to be consumable, sort of in order for the reinforcement learning
link |
framework to be able to sense the environment, to be able to act in the environment, and so on.
link |
The full reinforcement learning problem needs to deal with worlds that are unknown and complex
link |
and the agent needs to learn for itself how to deal with that.
link |
And so MuZero is a step, a further step in that direction.
link |
One of the things that inspired the general public, and just in conversations I have with
link |
my parents, or something with my mom, who just loves what was done, is kind of at least
link |
a notion that there was some display of creativity, some new strategies, new behaviors that were discovered.
link |
That again has echoes of intelligence.
link |
So is there something that stands out?
link |
Do you see it the same way that there's creativity and there's some behaviors, patterns that
link |
you saw that AlphaZero was able to display that are truly creative?
link |
So let me start by saying that I think we should ask what creativity really means.
link |
So to me, creativity means discovering something which wasn't known before, something unexpected,
link |
something outside of our norms.
link |
And so in that sense, the process of reinforcement learning or the self play approach that was
link |
used by AlphaZero is the essence of creativity.
link |
It's really saying at every stage, you're playing according to your current norms and
link |
you try something.
link |
And if it works out, you say, hey, here's something great, I'm going to start using that.
link |
And then that process, it's like a micro discovery that happens millions and millions of times
link |
over the course of the algorithm's life, where it just discovers some new idea, oh, this
link |
pattern, this pattern's working really well for me, I'm going to start using that.
link |
Oh, now, oh, here's this other thing I can do, I can start to connect these stones together
link |
in this way, or I can start to sacrifice stones or give up on pieces or play shoulder hits
link |
on the fifth line or whatever it is.
link |
The system's discovering things like this for itself continually, repeatedly, all the time.
link |
And so it should come as no surprise to us, then, that if you leave these systems going,
link |
that they discover things that are not known to humans, that relative to human norms are considered creative.
link |
And we've seen this several times, in fact, in AlphaGo Zero, we saw this beautiful timeline
link |
of discovery where what we saw was that there are these opening patterns that humans play
link |
called joseki, these are like the patterns that humans learn to play in the corners and
link |
they've been developed and refined over literally thousands of years in the game of Go.
link |
And what we saw was in the course of the training AlphaGo Zero, over the course of the 40 days
link |
that we trained this system, it starts to discover exactly these patterns that humans play.
link |
And over time, we found that all of the joseki that humans played were discovered by the
link |
system through this process of self play and this sort of essential notion of creativity.
link |
But what was really interesting was that over time, it then starts to discard some of these
link |
in favor of its own joseki that humans didn't know about.
link |
And it starts to say, oh, well, you thought that the knight's move pincer joseki was a great idea.
link |
But here's something different you can do there, which makes some new variation that
link |
the humans didn't know about.
link |
And actually now the human Go players study the joseki that AlphaGo played and they become
link |
the new norms that are used in today's top level Go competitions.
link |
That never gets old.
link |
And just the first part, to me, maybe just makes me feel good as a human being, that a self
link |
play mechanism that knows nothing about us humans discovers patterns that we humans do.
link |
It's like an affirmation that we're doing okay as humans.
link |
In this domain and other domains, we figured it out. It's like the Churchill quote about democracy.
link |
It's the, you know, it sucks, but it's the best one we've tried.
link |
So in general, taking a step outside of Go, and you have like a million accomplishments
link |
that I have no time to talk about with AlphaStar and so on and the current work.
link |
But in general, this self play mechanism that you've inspired the world with by beating
link |
the world champion Go player.
link |
Do you see that being applied in other domains? Do you have sort of dreams and hopes
link |
that it's applied in both simulated environments and the constrained environments of games?
link |
I mean, AlphaStar really demonstrates that you can remove a lot of the constraints, but
link |
nevertheless, it's an individual simulated environment.
link |
Do you have a hope or dream that it starts being applied in the robotics environment
link |
and maybe even in domains that are safety critical and so on, and have, you know,
link |
a real impact in the real world like autonomous vehicles, for example, which seems like a very
link |
far out dream at this point.
link |
So I absolutely do hope and imagine that we will get to the point where ideas just like
link |
these are used in all kinds of different domains.
link |
In fact, one of the most satisfying things as a researcher is when you start to see other
link |
people use your algorithms in unexpected ways.
link |
So in the last couple of years, there have been, you know, a couple of Nature papers
link |
where different teams unbeknownst to us took AlphaZero and applied exactly those same algorithms
link |
and ideas to real world problems of huge meaning to society.
link |
So one of them was the problem of chemical synthesis and they were able to beat the state
link |
of the art in finding pathways of how to actually synthesize chemicals, retrosynthesis.
link |
And the second paper actually just came out a couple of weeks ago in Nature, and showed that
link |
in quantum computation, you know, one of the big questions is how to understand the
link |
nature of the function in quantum computation and a system based on AlphaZero beat the state
link |
of the art by quite some distance there again.
link |
So these are just examples.
link |
And I think, you know, the lesson which we've seen elsewhere in machine learning time and
link |
time again is that if you make something general, it will be used in all kinds of ways.
link |
You know, you provide a really powerful tool to society and those tools can be used in all kinds of ways.
link |
And so I think we're just at the beginning and for sure, I hope that we see all kinds of applications.
link |
So the other side of the question of the reinforcement learning framework is, you know, you usually want
link |
to specify a reward function and an objective function.
link |
What do you think about sort of ideas of intrinsic rewards, for when we're not really sure about the reward,
link |
you know, if we take human beings as an existence proof that we don't seem to
link |
be operating according to a single reward.
link |
Do you think that there's interesting ideas for when you don't know how to truly specify
link |
the reward, you know, that there's some flexibility for discovering it intrinsically or so on
link |
in the context of reinforcement learning?
link |
So I think, you know, when we think about intelligence, it's really important to be
link |
clear about the problem of intelligence.
link |
And I think it's clearest to understand that problem in terms of some ultimate goal that
link |
we want the system to try and solve for.
link |
And after all, if we don't understand the ultimate purpose of the system, do we really
link |
even have a clearly defined problem that we're solving at all?
link |
Now within that, as with your example for humans, the system may choose to create its
link |
own motivations and sub goals that help the system to achieve its ultimate goal.
link |
And that may indeed be a hugely important mechanism to achieve those ultimate goals.
link |
But there is still some ultimate goal against which, I think, the system needs to be measured and evaluated.
link |
And even for humans, I mean, humans, we're incredibly flexible.
link |
We feel that we can, you know, any goal that we're given, we feel we can master to some degree.
link |
But if we think of those goals really, you know, like the goal of being able to pick
link |
up an object or the goal of being able to communicate or influence people to do things
link |
in a particular way, or whatever those goals are, really, they're sub goals that we choose.
link |
You know, we choose to pick up the object.
link |
We choose to communicate.
link |
We choose to influence someone else.
link |
And we choose those because we think it will lead us to something, you know, later on.
link |
We think that that's helpful to us to achieve some ultimate goal.
link |
Now I don't want to speculate whether or not humans as a system necessarily have a singular
link |
overall goal of survival or whatever it is.
link |
But I think the principle for understanding and implementing intelligence has to be
link |
that if we're trying to understand intelligence or implement our own, there has to be a well defined goal.
link |
Otherwise, if it's not, I think it's like an admission of defeat. For there to be
link |
hope of understanding or implementing intelligence, we have to know what we're doing.
link |
We have to know what we're asking the system to do.
link |
Otherwise, if you don't have a clearly defined purpose, you're not going to get a clearly defined answer.
link |
The ridiculous big question that has to naturally follow, and I have to pin you down on this,
link |
is that one of the big silly, or big real, questions before humans is whether the meaning
link |
of life is us trying to figure out our own reward function.
link |
And you just kind of mentioned that if you want to build intelligence systems and you
link |
know what you're doing, you should be at least cognizant to some degree of what the reward function is.
link |
So the natural question is, what do you think is the reward function of human life, the
link |
meaning of life for us humans, the meaning of our existence?
link |
I think I'd be speculating beyond my own expertise, but just for fun, let me do that and say
link |
I think that there are many levels at which you can understand a system and you can understand
link |
something as optimizing for a goal at many levels.
link |
And so, let's start with the universe: does the universe have a goal?
link |
It feels like it's just at one level, just following certain mechanical laws of physics
link |
and that that's led to the development of the universe.
link |
But at another level, you can view it through the second law of thermodynamics,
link |
which says that entropy is increasing over time, forever.
link |
And now there's a view that's been developed by certain people at MIT that you can think
link |
of this as almost like a goal of the universe, that the purpose of the universe is to maximize entropy.
link |
So there are multiple levels at which you can understand a system.
link |
The next level down, you might say, well, if the goal is to maximize entropy, well,
link |
how can that be done by a particular system?
link |
And maybe evolution is something that the universe discovered in order to kind of dissipate
link |
energy as efficiently as possible.
link |
And by the way, I'm borrowing some of these metaphors from the physicist Max Tegmark.
link |
But if you can think of evolution as a mechanism for dispersing energy, then, you
link |
might say, evolution acquires a goal of its own: if evolution disperses energy by reproducing
link |
as efficiently as possible, what is evolution then?
link |
Well, it's now got its own goal within that, which is to reproduce as effectively as possible.
link |
And now, how is reproduction made as effective as possible?
link |
Well, you need entities within that that can survive and reproduce as effectively as possible.
link |
And so it's natural that in order to achieve that high level goal, those individual organisms
link |
discover brains, intelligences, which enable them to support the goals of evolution.
link |
And those brains, what do they do?
link |
Well, perhaps the early brains, maybe they were controlling things at some direct level.
link |
Maybe they were the equivalent of preprogrammed systems, which were directly controlling what
link |
was going on and setting certain things in order to achieve these particular goals.
link |
But that led to another level of discovery, which was learning systems, parts of the brain
link |
which were able to learn for themselves and learn how to program themselves to achieve
link |
any goal, and presumably there are parts of the brain which set goals for other parts of
link |
the system, and that provides this very flexible notion of intelligence that we as humans presumably
link |
have, which is the reason we feel that we can achieve any goal.
link |
So, it's a very long-winded answer to say that, you know, I think there are many perspectives
link |
and many levels at which intelligence can be understood.
link |
And at each of those levels, you can take multiple perspectives, you know, you can view
link |
the system as something which is optimizing for a goal, which is understanding it at a
link |
level by which we can maybe implement it and understand it as AI researchers or computer
link |
scientists, or you can understand it at the level of the mechanistic thing which is going
link |
on, that there are these atoms bouncing around in the brain and they lead to the outcome
link |
of that system, and that view is not in contradiction with the fact that it's also a decision making
link |
system that's optimizing for some goal and purpose.
link |
I've never heard the description of the meaning of life structured so beautifully in layers,
link |
but you did miss one layer, which is the next step, which you're responsible for, which
link |
is creating the artificial intelligence layer on top of that.
link |
And I can't wait to see, well, I may not be around, but I can't wait to see what the
link |
next layer beyond that will be.
link |
Well, let's just take that argument and pursue it to its natural conclusion.
link |
So the next level indeed is for how can our learning brain achieve its goals most effectively?
link |
Well, maybe it does so by us as learning beings, building a system which is able to solve for
link |
those goals more effectively than we can.
link |
And so when we build a system to play the game of go, when I said that I wanted to build
link |
a system that can play go better than I can, I've enabled myself to achieve that goal of
link |
playing go better than I could by directly playing it and learning it myself.
link |
And so now a new layer has been created, which is systems which are able to achieve goals more effectively than we can ourselves.
link |
And ultimately, there may be layers beyond that, where they set subgoals for parts of their
link |
own system in order to achieve those and so forth.
link |
So the story of intelligence I think is a multi layered one and a multi perspective one.
link |
We live in an incredible universe.
link |
David, thank you so much first of all for dreaming of using learning to solve go and
link |
building intelligence systems and for actually making it happen and for inspiring millions
link |
of people in the process.
link |
It's truly an honor.
link |
Thank you so much for talking today.
link |
Thanks for listening to this conversation with David Silver.
link |
And thank you to our sponsors, Masterclass and Cash App.
link |
Please consider supporting the podcast by signing up to Masterclass at masterclass.com slash
link |
Lex and downloading Cash App and using code Lex Podcast.
link |
If you enjoy this podcast, subscribe on YouTube, review it with five stars on Apple Podcasts,
link |
support it on Patreon, or simply connect with me on Twitter at Lex Fridman.
link |
And now, let me leave you with some words from David Silver.
link |
My personal belief is that we've seen something of a turning point where we're starting to
link |
understand that many abilities like intuition and creativity that we've previously thought
link |
were in the domain only of the human mind are actually accessible to machine intelligence as well.
link |
And I think that's a really exciting moment in history.
link |
Thank you for listening and hope to see you next time.