Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence-to-sequence learning, audio generation, image captioning, neural machine translation, and, of course, reinforcement learning. He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft. This conversation is part of the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F R I D. And now, here's my conversation with Oriol Vinyals.
You spearheaded the DeepMind team behind AlphaStar that recently beat a top professional player at StarCraft. So you have an incredible wealth of work in deep learning and a bunch of fields, but let's talk about StarCraft first. Let's go back to the very beginning, even before AlphaStar, before DeepMind, before deep learning. What came first for you: a love for programming or a love for video games?
I think for me, it definitely came first, the drive to play video games. I really liked computers. I didn't really code much, but what I would do is just mess with the computer, break it and fix it. That was the level of skill, I guess, that I gained in my very early days, when I was 10 or 11. And then I really got into video games, especially StarCraft, actually the first version. I spent most of my time just playing pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not a very big scene, like what's nowadays called esports.
Right, of course, in the '90s. So how'd you get into StarCraft? What was your favorite race? How did you develop your skill? What was your strategy? All that kind of thing.
So as a player, I tended to try to play not many games, so as not to disclose the strategies that I developed. And I liked to play random, actually, not in competitions, but just because... I think in StarCraft there are three main races, and I found it very useful to play with all of them. So I would choose random many times, even sometimes in tournaments, to gain skill with the three races, because it's not only how you play against someone; if you understand a race because you play it, you also understand what's annoying about it, and then when you're on the other side, what to do to annoy that person, to try to gain advantages here and there and so on. So I actually played random, although I must say, in terms of favorite race, I really liked Zerg. I was probably best at Zerg, and that's probably what I tended to use towards the end of my career before starting university.
So let's step back a little bit. Could you try to describe StarCraft to people that may never have played video games, especially the massively online variety like StarCraft?
So StarCraft is a real-time strategy game, and the way to think about StarCraft, perhaps if you understand a bit of chess, is that there's a board, which is called the map, where people play against each other. There are obviously many ways you can play, but the most interesting one is the one-versus-one setup, where you just play against someone else, or even the built-in AI: Blizzard put in a system that can play the game reasonably well if you don't know how to play. And then on this board you have pieces, again like in chess, but these pieces are not there initially, like they are in chess. You actually need to gather resources and decide which pieces to build. So in a way, you're starting almost with no pieces. You start gathering resources; in StarCraft there are minerals and gas that you can gather, and then you must decide how much you want to focus, for instance, on gathering more resources, or on starting to build units or pieces. And then once you have enough pieces, or maybe a good attack composition, you go and attack the other side of the map. Now, the other main difference from chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. It's what we call partially observable. So as a result, you must not only decide the trade-off between economy and building your own units, but you also must decide whether you want to scout to gather information, and by scouting you might be giving away information that you'd rather hide from the enemy. So there's a lot of complex decision making. Also, unlike chess, this is not a turn-based game. You play basically all the time, continuously, and thus skill in terms of speed and accuracy of clicking is also very important. People that train for this really play the game at an amazing skill level. I've seen it many times, and if you can witness this live, it's really, really impressive. So in a way it's a kind of chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources, to basically get some money to build other buildings, pieces, technology, and so on.
From the perspective of the human player, the difference between that and chess, or a turn-based strategy game like Heroes of Might and Magic, is that there's an anxiety, because you have to make these decisions really quickly. And if you are not actually aware of which decisions work, everything you describe is quite stressful and difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, if they're fully aware of what they have to do, but at the amateur level there's this anxiety: oh crap, I'm being attacked; oh crap, I have to build up resources; oh, I probably have to expand. The real-time strategy aspect is really stressful, and computationally, I'm sure, difficult; we'll get into it. But for me, Battle.net... StarCraft was released in '98, 20 years ago, which is hard to believe, and Blizzard's Battle.net came out with Diablo in '96, and to me, it might be a narrow perspective, but it changed online gaming, and perhaps society, forever. I may have way too narrow a viewpoint, but from your perspective, can you talk about the history of gaming over the past 20 years? How transformational, how important is this line of games?
Right, so I was kind of an active gamer whilst the internet and online gaming were developing. The way it came about for me: I played other strategy games. I played a bit of Command & Conquer, and then I played Warcraft II, which is from Blizzard, but at the time I didn't understand what Blizzard was or anything. Warcraft II was just a game, which was actually very similar to StarCraft in many ways. It's also a real-time strategy game where there are orcs and humans, so there are only two races. But it was offline, right? So I remember a friend of mine came to school and said, oh, there's this new cool game called StarCraft, and I just said, oh, this sounds like just a copy of Warcraft II, until I installed it. At the time, I was in Spain, where I'm from, so we didn't have very good internet, right? So for us, StarCraft became first kind of an offline experience, where you start to play these missions: you play against some sort of scripted things to develop the story of the characters in the game. Then later on, I started playing against the built-in AI, and I thought it was impossible to defeat it. Eventually you defeat one, and then you can actually play against seven built-in AIs at once, which also felt impossible, but actually it's not that hard to beat seven built-in AIs at once. So once we achieved that, we also discovered that we could play over LAN; as I said, the internet wasn't that great, but we could play basically against each other if we were in the same place, because you could just connect machines with cables, right? So we started playing in LAN mode against a group of friends, and it was really much more entertaining than playing against the AIs. And later on, as the internet was developing and becoming a bit faster and more reliable, that's when I started experiencing Battle.net, which is this amazing universe, not only because you can play the game against anyone in the world, but because you can also get to know more people. You just get exposed to this vast variety; it's a bit like when chats came about, right? There was a chat system; you could play against people, but you could also chat with people, not only about StarCraft but about anything. And that became a way of life for about two years. And obviously then it kind of exploded for me, and I started to play more seriously, going to tournaments and so on and so forth.
Do you have a sense, on a societal, sociological level, of this whole part of society that many of us are not aware of? And it's a huge part of society: gamers. I mean, every time I come across it on YouTube or streaming sites, a huge number of people play games religiously. Do you have a sense of those folks, especially now that you've returned to that realm a little bit on the AI side?
Yeah, so in fact, even after StarCraft, I actually played World of Warcraft, which is maybe the main sort of online world and presence where you get to interact with lots of people. So I played that for a little bit. To me, it was a bit less stressful than StarCraft, because winning was kind of a given; you're just put in this world and you can always complete missions. But I think it was actually the social aspect, of especially StarCraft first and then games like World of Warcraft, that shaped me in very interesting ways, because what you get to experience is just people you wouldn't usually interact with, right? Even nowadays, I still have many Facebook friends from the era when I played online, and their ways of thinking, even politically, are different. We don't interact in the real world; we were connected basically by fiber. And in that way I actually got to understand a bit better that we live in a diverse world. And these were connections that were made because I happened to go into a virtual city as a priest, and I met this warrior, and we became friends, and then we started playing together, right? So I think it's transformative, and more and more people are aware of it. I mean, it's becoming quite mainstream. But back in the day, as you were saying, even in 2005 it was still a very strange thing to do, especially in Europe. There were exceptions, like Korea, for instance; it was amazing how everything happened so early there, in terms of cybercafes. If you go to Seoul, it's a city where, back in the day, you could be a celebrity by playing StarCraft, and this was 1999, 2000, right? It's not like it's recent. So yeah, it's quite interesting to look back, and I think it's changing society, the same way, of course, that technology and social networks and so on are also transforming things.
And a quick tangent, let me ask: you're also one of the most productive people in your particular chosen passion and path in life, and yet you also appreciate and enjoy video games. Do you think it's possible to enjoy video games in moderation?
Someone told me, when I was playing video games, that you could choose two out of three: having a girlfriend, playing video games, or studying. And I think for the most part it was relatively true. These things do take time. With a game like StarCraft, if you take it pretty seriously and you want to study it, you will obviously dedicate more time to it. And I definitely took gaming, and obviously studying, very seriously; I love learning, science, et cetera. So when I started university as an undergrad, I stepped away from StarCraft. I actually fully stopped playing. World of Warcraft was a bit more casual; you could just connect online, and it was fun, but as I said, it was not as much of a time investment as StarCraft had been for me.
Okay, so let's get into AlphaStar. You're behind the team; DeepMind has been working on StarCraft and released a bunch of cool open source agents and so on in the past few years, but AlphaStar really is the moment where, for the first time, you beat a world-class player. So what are the parameters of the challenge, in the way that AlphaStar took it on, and how did you and David and the rest of the DeepMind team come to believe that you could even beat the best in the world?
I think it all started back in 2015... actually, I'm lying, I think it was 2014, when DeepMind was acquired by Google, and I at the time was at Google Brain, which was, and still is, in California. We had this summit where the two groups, Google Brain and Google DeepMind, got together and gave a series of talks. And given that they were doing deep reinforcement learning for games, I decided to bring up part of my past, something I had developed at Berkeley: this thing we called the Berkeley Overmind, which is really just a StarCraft I bot. So I talked about that, and I remember Demis just came to me and said, well, maybe not now, it's perhaps a bit too early, but you should just come to DeepMind and do this again with deep reinforcement learning. And at the time it sounded very science fiction, for several reasons. But then in 2016, when I actually moved to London and joined DeepMind, transferring from Brain, it became apparent that all these things were coming together: the AlphaGo moment, Blizzard reaching out to us to say, hey, do you want the next challenge, and me being full time at DeepMind. And then I went to Irvine, in California, to the Blizzard headquarters, to chat with them and try to explain how it would all work before doing anything. And the approach has always been about the learning perspective, right? At Berkeley, we did a lot of rule-based conditioning: if you have more than three units, then go attack; if the opponent has more units than me, retreat; and so on and so forth. And of course, the point of deep reinforcement learning, deep learning, machine learning in general, is that all of this should be learned behavior. So that was kind of the DNA of the project since its inception in 2016, when we didn't even have an environment to work with. And that's how it all started, really.
So if you go back to that conversation with Demis, or even in your own head, how far away did you think you were? Because we're talking about Atari games, we're talking about Go, which, if you're honest about it, are really far away from StarCraft. Well, now that you've beaten it, maybe you could say it's close, but it seems like StarCraft is way harder than Go, philosophically and mathematically speaking. So how far away did you think you were? Did you think that in 2018 or 2019 you could be doing as well as you have?
Yeah. When I thought about it, okay, I'm going to dedicate a lot of my time and focus to this, and obviously I do a lot of different research in deep learning, so before spending time on it I really had to believe something good would come out of it. So really I thought, well, this sounds impossible, and it probably is impossible to do the full thing: the full game where you play one versus one and it's only a neural network playing and so on. I just didn't even think it was possible. But on the other hand, I could see some stepping stones towards that goal. Clearly you could define sub-problems in StarCraft, dissect it a bit, and say, okay, here is one part of the game, here is another part. And also, and this was really critical to me, there was the fact that we could access human replays. Blizzard was very kind, and in fact they open sourced this for the whole community. You can just go and download replays; it's not every single StarCraft game ever played, but it's a lot of them. Every day you can query the dataset and say, well, give me all the games that were played today. And given my experience with language and sequences and supervised learning, I thought, well, that's definitely going to be very helpful, and something quite unique, because never before had we had such a large dataset of replays of people playing a video game this complex, at this scale. So that to me was a precious resource. And as soon as I knew that Blizzard was able to give this to the community, I started to feel positive about something non-trivial happening. But I also thought the full thing, really no rules, not a single line of code that says, well, if you see this unit, build a detector; not having any of these specializations seemed really, really, really difficult to me.
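As a rough illustration of the kind of replay query Vinyals describes ("give me all the games that were played today"), here is a minimal sketch in Python. The metadata fields and the in-memory dataset are hypothetical stand-ins for illustration, not Blizzard's actual replay API:

```python
from datetime import date

# Hypothetical replay metadata; a real dataset would come from
# Blizzard's published replay packs, not an in-memory list.
replays = [
    {"id": "a1", "played_on": date(2019, 4, 28), "matchup": "ZvT"},
    {"id": "b2", "played_on": date(2019, 4, 29), "matchup": "PvP"},
    {"id": "c3", "played_on": date(2019, 4, 29), "matchup": "TvZ"},
]

def games_played_on(day, dataset):
    """Return all replays recorded on the given day."""
    return [r for r in dataset if r["played_on"] == day]

print([r["id"] for r in games_played_on(date(2019, 4, 29), replays)])
# ['b2', 'c3']
```

The point is only the shape of the workflow: a large, continuously growing corpus you can filter by date (or matchup, or player rating) to build a supervised training set.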
I do also like that Blizzard was teasing, or even trolling you, almost pulling you into this really difficult challenge. Did they have any awareness? What's the interest from the perspective of Blizzard, aside from just curiosity?
Yeah, I think Blizzard has really understood, and really brought forward, this competitiveness of esports in games. StarCraft sparked something that had almost never been seen before, especially, as I was saying, back in Korea. So they probably thought, well, this is such a pure one-versus-one setup that it would be great to see whether something that can play Atari or Go, and later on chess, could even tackle this kind of complex real-time strategy game, right? So they wanted to see, first, obviously, whether it was possible, whether the game they created was in a way solvable. And on the other hand, I think they're also a pretty modern company that innovates a lot. So for them, starting to understand AI is about how to bring AI into games: not games for AI, but AI for games, right? I mean, both directions can work. We have obviously used games for AI, to drive AI progress, but Blizzard, and many other companies, might actually be able to do the opposite. So I think that is also something they can get out of this, and we have definitely brainstormed a lot about it.
But one of the interesting things to me about StarCraft and Diablo and these games that Blizzard has created is the task of balancing classes, for example: making the game fair from the starting point, and then letting skill determine the outcome. Can you first comment? There are three races: Zerg, Protoss, and Terran. I don't know if I've ever said that out loud. Is that how you pronounce it, Terran? I don't think I've ever interacted with anybody about StarCraft in person. That's funny. So they seem to be pretty balanced. I wonder if the AI, the work that you're doing with AlphaStar, would help balance them even further. Is that something you think about? Is that something Blizzard is thinking about?
Right. So balancing when you add a new unit or a new spell type is obviously possible, given that you can always train, or pre-train, at scale some agent that might start using it in unintended ways. But actually, if you look at how StarCraft has co-evolved with its players, I think it's very cool, the strategies that people came up with. We've seen it over and over in StarCraft: Blizzard comes up with maybe a new unit, and then some players get creative and do something unintended, something that Blizzard's designers simply didn't test or think about. And after that becomes mainstream in the community, Blizzard patches the game, and they maybe weaken that strategy, or make it more interesting but a bit more balanced. This continual dialogue between players and Blizzard is what has defined most of their games, actually: StarCraft, but also World of Warcraft, where there are several classes, and it would not be good if everyone played absolutely the same class or race, right? So I think they do care about balancing, of course, and they do a fair amount of testing, but it's also beautiful to see how players get creative anyway. And whether AI can be more creative at this point, I don't think so, right? Sometimes something amazing just happens. I remember, back in the day, the dropships that could drop Reavers. No one had thought that you could drop this unit, which has what's called splash damage, and basically eliminate all the enemy's workers at once; no one thought you could bring them in really early in the game and do that kind of damage. And then things change in the game. So I think it's quite an amazing exploration process from both sides, players and Blizzard alike.
Well, it's almost like reinforcement learning exploration, but the scale of humans that play Blizzard games is almost the scale of a large DeepMind RL experiment. I mean, if you look at the numbers, I don't know how many games, but probably hundreds of thousands of games a month.
Yeah, it's almost the same as running RL agents.
What aspect of the problem of StarCraft do you think is the hardest? Is it, like you said, the imperfect information? Is it the fact that you have to do long-term planning? Is it the real-time aspect, so you have to do stuff really quickly? Is it the large action space, so you can do so many possible things? Or is it that, in the game-theoretic sense, you don't know the Nash equilibrium; at least you don't know what the optimal strategy is, because there are way too many options? Is there something that stands out as the hardest, the most annoying thing?
So when we looked at the problem and started to define its parameters, the observations and the actions, it became very apparent that the very first barrier one would hit in StarCraft is the action space being so large, combined with not being able to search the way you can in chess or Go, even though those search spaces are vast too. The main problem we identified was that of exploration. Think about StarCraft without any sort of human knowledge or human prior, and recall how deep reinforcement learning algorithms work: essentially by issuing random actions and hoping that they will sometimes get some wins, so they can learn. In the action space of StarCraft, almost anything you can do in the early game is bad, because any action involves taking workers which are mining minerals for free. That's something the game does automatically, sending them to mine, and random actions would immediately take them out of mining and send them around. So just think how hard it's going to be to learn these concepts, and even more so something like expanding. There are buildings you can place at other locations on the map to gather more resources, but the location of the building matters, and you have to select a worker, send it walking to that location, build the building, wait for the building to finish, and then put extra workers there so they start mining. It feels impossible that random clicking would ever produce that desirable state, the state you could then hope to learn from because it may eventually yield an extra win. So for me, the hardest problem a priori was exploration, due to the action space and the fact that there aren't really turns; or rather, there are very many turns, because the game ticks 22 times per second. That's how they discretize time; you always have to discretize time, since there's no such thing as truly continuous real time, but it's really a lot of time steps at which things can go wrong. You mentioned many good ones, though. I think partial observability, and the fact that there is no single perfect strategy because of partial observability, are very interesting problems, and we're starting to see more of them now that we've solved some of the earlier ones. But the core problem to me was exploration, and solving it has been basically the focus, and how we saw the first breakthroughs.
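To make the exploration argument concrete, here is a back-of-the-envelope sketch. The numbers are illustrative assumptions, not AlphaStar's actual action space: even with a drastically simplified game of 100 candidate actions per tick, a uniform random policy essentially never stumbles on one specific multi-step plan like building an expansion:

```python
# Probability that uniform random actions reproduce one specific
# k-step sequence (select a worker, walk it out, place the building, ...).
# Numbers are illustrative; StarCraft's real action space is far larger.

def p_random_sequence(num_actions: int, k: int) -> float:
    """Chance of emitting one exact k-step action sequence uniformly at random."""
    return (1.0 / num_actions) ** k

# Only 100 candidate actions per tick and a 5-step plan:
p = p_random_sequence(num_actions=100, k=5)
print(p)  # ~1e-10: effectively never discovered by chance
```

And the game ticks 22 times per second, so the number of opportunities to take a useless action dwarfs the number of chances to complete a useful plan.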
So exploration in a multi-level, hierarchical way: at 22 ticks a second, exploration has a very different meaning than it does for decisions like whether to gather resources early or wait. So how do you solve the long-term part? Let's talk about the internals of AlphaStar. First of all, how do you represent the state of the game as an input? How do you then do the long-term sequence modeling? How do you build a policy? What's the architecture like?
So AlphaStar obviously has several components, but everything passes through what we call the policy, which is a neural network, and that's kind of the beauty of it. I could just give you a neural network and some weights, and if you fed it the right observations and interpreted the actions the same way we do, you would have the agent playing the game. There's absolutely nothing else needed other than those trained weights. Now, the first step is observing the game, and we've experimented with a few alternatives. The one we currently use mixes spatial, image-like inputs processed from the game, both a zoomed-out version of the map and a zoomed-in version of the camera, or the screen as we call it, together with the list of units the agent sees, given more as a set of objects that it can operate on. That set view is not strictly required; we have versions of the agent that play well without it, and it's a bit unlike how humans perceive the game, but it certainly helps a lot, because a very natural way to encode the game is to look at all the units, which have properties like health, position, unit type, and whether each is my unit or the enemy's. That list, or set, of units that you see all the time is kind of the summary of the state of the game.
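A minimal sketch of this kind of per-unit encoding, each unit flattened into a fixed-length feature vector; the exact fields and their order are assumptions for illustration, not AlphaStar's actual encoder:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Unit:
    unit_type: int   # integer id of the unit's type
    health: float
    x: float         # map position
    y: float
    is_mine: bool    # ownership: my unit vs. the enemy's

def encode_unit(u: Unit) -> List[float]:
    """Flatten one unit's properties into a feature vector."""
    return [float(u.unit_type), u.health, u.x, u.y,
            1.0 if u.is_mine else 0.0]

def encode_state(units: List[Unit]) -> List[List[float]]:
    """The observation is then a set (here, a list) of such vectors."""
    return [encode_unit(u) for u in units]

obs = encode_state([Unit(45, 45.0, 10.0, 20.0, True),
                    Unit(105, 35.0, 90.0, 80.0, False)])
print(len(obs), len(obs[0]))  # 2 units, 5 features each
```

A real system would embed the categorical type id and normalize the continuous features, but the shape of the observation, a variable-size set of per-unit vectors, is the point.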
But that's pretty close to the way humans see the game. Why do you say it's not? Are you saying the exactness of it is not similar to humans?
The exactness of it is perhaps not the problem. The issue, if you look at how humans actually play the game, is that they play with a mouse, a keyboard, and a screen, and they don't see a structured object with all the units; what they see is what's on the screen, right?
link |
So you remember that there's a certain interface,
link |
there's a plot that you showed with the camera based version
link |
where you do exactly that, right, you move around
link |
and that seems to converge to similar performance.
link |
Yeah, I think that's what we're kind of experimenting
link |
with what's necessary or not, but using the set.
link |
So actually if you look at research in computer vision
link |
where it makes a lot of sense to treat images
link |
as two dimensional arrays,
link |
there's actually a very nice paper from Facebook.
link |
I think, I forgot who the authors are,
link |
but I think it's part of Kaiming He's group.
link |
And what they do is they take an image,
link |
which is this two dimensional signal
link |
and they actually take pixel by pixel
link |
and scramble the image as if it was just a list of pixels.
link |
Crucially, they encode the position of the pixels
link |
with the XY coordinates.
link |
And this is just kind of a new architecture
link |
which we incidentally also use in StarCraft,
link |
called the transformer,
link |
which is a very popular paper from last year,
link |
which yielded very nice results in machine translation.
link |
And if you actually believe in this kind of,
link |
oh, it's actually a set of pixels
link |
as long as you encode XY, it's okay.
link |
Then you could argue that the list of units
link |
that we see is precisely that
link |
because we have each unit as a kind of pixel, if you will,
link |
and then their XY coordinates.
link |
So in that perspective, without knowing it,
link |
we use the same architecture
link |
that was shown to work very well
link |
on Pascal and ImageNet and so on.
link |
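The set-of-units encoding Oriol describes can be sketched in a few lines. This is purely illustrative: the feature layout, normalization constants, and the single attention pooling step are assumptions for the sketch, not AlphaStar's actual architecture. The point it demonstrates is that attention over unit tokens (each carrying its own XY coordinates) is permutation-equivariant, which is why the units can be treated as a set.

```python
import math

# Hypothetical unit encoding: each unit becomes a token whose features
# include its properties plus its (x, y) map coordinates, mirroring the
# "set of pixels with XY encoding" idea. Names and layout are illustrative.
def encode_unit(health, unit_type, is_friendly, x, y):
    return [health / 100.0, float(unit_type), 1.0 if is_friendly else -1.0,
            x / 128.0, y / 128.0]  # coords normalized by an assumed map size

def attention_pool(tokens):
    """Minimal self-attention-style pooling over the set of unit tokens.
    Each token attends to every other via dot-product scores; the order
    of the list does not matter, which is the point of a set encoding."""
    dim = len(tokens[0])
    pooled = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in tokens]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # softmax weights
        z = sum(w)
        pooled.append([sum(w[j] * tokens[j][d] for j in range(len(tokens))) / z
                       for d in range(dim)])
    return pooled

units = [encode_unit(100, 1, True, 10, 12),
         encode_unit(40, 2, False, 90, 70),
         encode_unit(80, 1, True, 11, 13)]
out = attention_pool(units)
# Permuting the input units permutes the output rows identically:
out_perm = attention_pool([units[2], units[0], units[1]])
```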
So the interesting thing here is putting it in that way,
link |
it starts to move it towards
link |
the way you usually work with language.
link |
So what, and especially with your expertise
link |
and work in language, it seems like there's echoes
link |
of a lot of the way you would work with natural language
link |
in the way you've approached AlphaStar.
link |
Right, does that help
link |
with the long term sequence modeling there somehow?
link |
Exactly, so now that we understand what an observation
link |
for a given time step is, we need to move on to say,
link |
well, there's gonna be a sequence of such observations
link |
and an agent will need to act given all that it's seen,
link |
not only the current time step, but all that it's seen.
link |
Why? Because there is partial observability.
link |
We must remember whether we saw a worker
link |
going somewhere, for instance, right?
link |
Because then there might be an expansion
link |
on the top right of the map.
link |
So given that, what you must then think about
link |
is there is the problem of, given all the observations,
link |
you have to predict the next action.
link |
And not only given all the observations,
link |
but given all the observations
link |
and given all the actions you've taken,
link |
predict the next action.
link |
And that sounds exactly like machine translation,
link |
where, and that's exactly how kind of I saw the problem,
link |
especially when you are given supervised data
link |
or replaced from humans,
link |
because the problem is exactly the same.
link |
You're translating essentially a prefix
link |
of observations and actions
link |
onto what's gonna happen next,
link |
which is exactly how you would train a model to translate
link |
or to generate language as well, right?
link |
You have a certain prefix.
link |
You must remember everything that comes in the past,
link |
because otherwise,
link |
you might start having incoherent text.
link |
And the same architectures,
link |
we're using LSTMs and transformers
link |
to operate across time
link |
to kind of integrate all that's happened in the past.
link |
Those architectures that work so well
link |
in translation or language modeling
link |
are exactly the same as what the agent is using
link |
to issue actions in the game.
link |
And the way we train it, moreover,
link |
for imitation, which is step one of Alpha Star:
link |
take all the human experience and try to imitate it,
link |
much like you try to imitate translators
link |
that translated many pairs of sentences
link |
from French to English say,
link |
that sort of principle applies exactly the same.
link |
It's almost the same code,
link |
except that instead of words,
link |
you have slightly more complicated objects,
link |
which are the observations,
link |
and the actions are also a bit more complicated.
link |
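The translation framing Oriol describes, mapping a prefix of interleaved observations and actions onto the next action, can be sketched minimally. The policy below is a trivial placeholder (scout first, then attack); a real agent would run an LSTM or transformer over the same kind of history.

```python
# Hypothetical interface: history is a list of (observation, action) pairs;
# the last pair has action=None because it hasn't been chosen yet.
def next_action(history):
    past_actions = [a for _, a in history if a is not None]
    return "scout" if not past_actions else "attack"  # placeholder policy

def rollout(observations):
    history, actions = [], []
    for obs in observations:
        history.append((obs, None))   # current observation, action pending
        act = next_action(history)    # condition on the whole prefix so far
        history[-1] = (obs, act)      # commit the chosen action into history
        actions.append(act)
    return actions

acts = rollout(["frame_0", "frame_1", "frame_2"])
```

The same loop structure underlies autoregressive text generation: condition on the prefix, emit one token (here, one action), append it, and repeat.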
Is there a self play component then too?
link |
So once you run out of imitation?
link |
Right, so indeed you can bootstrap from human replays,
link |
but then the agents you get are actually not as good
link |
as the humans you imitated, right?
link |
So how do we imitate?
link |
Well, we take humans from 3000 MMR and higher.
link |
3000 MMR is just a metric of human skill.
link |
And 3000 MMR might be like the 50th percentile, right?
link |
So it's just average human.
link |
So maybe a quick pause.
link |
MMR is a ranking scale,
link |
the matchmaking rating for players.
link |
So it's 3000, I remember there's like a master
link |
and a grandmaster, what's 3000?
link |
So 3000 is pretty bad.
link |
I think it's kind of gold level.
link |
It just sounds really good relative to chess, I think.
link |
Oh yeah, yeah, no, the ratings,
link |
the best in the world are at 7000 MMR.
link |
So 3000, it's a bit like Elo indeed, right?
link |
So 3500 just allows us to not filter a lot of the data.
link |
So we like to have a lot of data in deep learning
link |
as you probably know.
link |
So we take these kind of 3500 and above,
link |
but then we do a very interesting trick,
link |
which is we tell the neural network
link |
what level they are imitating.
link |
So we say, this replay you're gonna try to imitate,
link |
predicting the next action for all the actions
link |
that you're gonna see, is a 4000 MMR replay.
link |
This one is a 6000 MMR replay.
link |
And what's cool about this is then we take this policy
link |
that is being trained from human
link |
and then we can ask it to play like a 3000 MMR player
link |
by setting a bit saying, well, okay,
link |
play like a 3000 MMR player or play like a 6000 MMR player.
link |
And you actually see how the policy behaves differently.
link |
Its economy gets worse if it plays like a gold level player.
link |
It does less actions per minute,
link |
which is the number of clicks or number of actions
link |
that you will issue in a whole minute.
link |
And it's very interesting to see
link |
that it kind of imitates the skill level quite well.
link |
But if we ask it to play like a 6000 MMR player,
link |
we tested of course these policies to see how well they do.
link |
They actually beat all the built in AIs
link |
that Blizzard put in the game,
link |
but they're nowhere near 6000 MMR players, right?
link |
They might be maybe around gold level, platinum perhaps.
link |
So there's still a lot of work to be done for the policy
link |
to truly understand what it means to win.
link |
So far we only ask them, okay, here is the screen
link |
and that's what's happened on the game until this point.
link |
What would the next action be if we ask a pro to now say,
link |
oh, you're gonna click here or here or there?
link |
And the point is experiencing wins and losses
link |
is very important to then start to refine.
link |
Otherwise the policy can get loose,
link |
can just go off policy as we call it.
link |
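The MMR conditioning trick described above can be sketched as follows. The dictionary here is a toy stand-in for the neural network, and the observations, actions, and numbers are all illustrative; the real policy generalizes across observations rather than memorizing them. The key idea survives the simplification: the skill rating is an extra input during imitation, so at test time the same policy can be dialed to a target skill level.

```python
# Hypothetical replays: (observation, expert action, replay's MMR rating).
TRAINING_REPLAYS = [
    ("obs_a", "expand", 6000),   # a strong player expands here
    ("obs_a", "idle", 3000),     # a gold-level player does nothing
]

def train(replays):
    policy = {}
    for obs, action, mmr in replays:
        policy[(obs, mmr)] = action   # condition on (observation, MMR)
    return policy

def act(policy, obs, target_mmr):
    # At inference we choose which skill level the policy should imitate.
    return policy.get((obs, target_mmr), "noop")

policy = train(TRAINING_REPLAYS)
strong = act(policy, "obs_a", 6000)   # "play like a 6000 MMR player"
weak = act(policy, "obs_a", 3000)     # "play like a 3000 MMR player"
```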
That's so interesting that you can at least hope eventually
link |
to be able to control a policy
link |
approximately to be at some MMR level.
link |
That's so interesting, especially given
link |
that you have ground truth for a lot of these cases.
link |
Can I ask you a personal question?
link |
Well, I haven't played Starcraft 2, so I am unranked,
link |
which is the kind of lowest league.
link |
So I used to play Starcraft 1.
link |
The first one and...
link |
But you haven't seriously played Starcraft 2?
link |
No, not Starcraft 2.
link |
So the best player we have at DeepMind is about 5,000 MMR,
link |
which is high masters.
link |
It's not at the Grand Master level.
link |
Grand Master level would be the top 200 players
link |
in a certain region, like Europe or America or Asia.
link |
But for me, it would be hard to say.
link |
I am very bad at the game.
link |
I actually played Alpha Star a bit too late and it beat me.
link |
I remember the whole team was, oh, Oriol, you should play.
link |
And I was, oh, it looks like it's not so good yet.
link |
And then I remember I kind of got busy and waited an extra week
link |
and I played and it really beat me very badly.
link |
How did that feel?
link |
Isn't that an amazing feeling?
link |
That's amazing, yeah.
link |
I mean, obviously, I tried my best and I tried to also impress
link |
because I actually played the first game,
link |
so I'm still pretty good at micro management.
link |
The problem is I just don't understand Starcraft 2.
link |
I understand Starcraft.
link |
And when I played Starcraft,
link |
I probably played consistently for a couple of years,
link |
so I was decent, but at the time,
link |
we didn't have this kind of MMR system as well established.
link |
So it would be hard to know what it was back then.
link |
So what's the difference in interface between Alpha Star
link |
and a human player in Starcraft?
link |
Is there any significant differences
link |
between the way they both see the game?
link |
I would say the way they see the game,
link |
there's a few things that are just very hard to simulate.
link |
The main one, perhaps,
link |
which is obvious in hindsight,
link |
is what's called cloaked units,
link |
which are invisible units.
link |
So in Starcraft, you can make some units
link |
that you need a particular kind of unit to detect.
link |
So these units are invisible.
link |
If you cannot detect them, you cannot target them.
link |
So they would just destroy your buildings
link |
or kill your workers.
link |
But despite the fact you cannot target the unit,
link |
there's a shimmer that as a human you observe.
link |
I mean, you need to train a little bit.
link |
You need to pay attention,
link |
but you would see this kind of space time distortion
link |
and you wouldn't know, okay, there are, yeah.
link |
Yeah, there's like a wave thing.
link |
Yeah, it's called shimmer.
link |
Space time distortion, I like it.
link |
That's really like the blizzard term is shimmer.
link |
And so this shimmer, professional players actually
link |
can see it immediately.
link |
They understand it very well,
link |
but it's still something that requires
link |
certain amount of attention
link |
and it's kind of a bit annoying to deal with.
link |
Whereas for Alpha Star, in terms of vision,
link |
it's very hard for us to simulate sort of,
link |
oh, are you looking at this pixel in the screen and so on?
link |
So the only thing we can do is say,
link |
there is a unit that's invisible over there.
link |
So Alpha Star would know that immediately.
link |
Obviously it still obeys the rules.
link |
You cannot attack the unit.
link |
You must have a detector and so on,
link |
but it's kind of one of the main things
link |
where it just doesn't feel like there's a very proper way to simulate it.
link |
I mean, you could imagine alternatives.
link |
Maybe you don't know exactly what it is,
link |
or sometimes you see it, sometimes you don't.
link |
But it's just really, really complicated to get it
link |
so that everyone would agree,
link |
oh, that's the best way to simulate this, right?
link |
You know, it seems like a perception problem.
link |
It is a perception problem.
link |
So the only problem is, if people ask,
link |
oh, what's the difference from how humans perceive the game?
link |
I would say they wouldn't be able to tell a shimmer
link |
immediately as it appears on the screen,
link |
whereas Alpha Star, in principle,
link |
sees it very sharply, right?
link |
It sees that the bit turned from zero to one,
link |
meaning there's now a unit there,
link |
although you don't know the unit,
link |
or you know that you cannot attack it and so on.
link |
So from a vision standpoint,
link |
that probably is the one that is kind of the most obvious one.
link |
Then there are things humans cannot do perfectly,
link |
even professionals, which is they might miss a detail
link |
or they might have not seen a unit.
link |
And obviously, as a computer,
link |
if there's a corner of the screen that turns green
link |
because a unit enters the field of view,
link |
that can go into the memory of the agent, the LSTM,
link |
and persists there for a while,
link |
and for however long is relevant, right?
link |
And in terms of action, it seems like the rate of action
link |
from Alpha Star is comparable,
link |
if not slower than professional players,
link |
but it's more precise is what I heard.
link |
So that's really probably the one
link |
that is causing us more issues for a couple of reasons, right?
link |
The first one is StarCraft has been an AI environment
link |
for quite a few years.
link |
In fact, I was participating in the very first competition
link |
And there's really not been a very clear set
link |
of rules for what the actions per minute,
link |
the rate of actions that you can issue, should be.
link |
And as a result, these agents or bots that people build
link |
in a kind of almost very cool way,
link |
they do like 20,000, 40,000 actions per minute.
link |
Now, to put this in perspective,
link |
a very good professional human might do 300
link |
to 800 actions per minute, they might not be as precise.
link |
That's why the range is a bit tricky to identify exactly.
link |
I mean, 300 actions per minute precisely
link |
is probably realistic, 800 is probably not,
link |
but you see humans doing a lot of actions
link |
because they warm up and they kind of select things
link |
and spam and so on, just so that when they need,
link |
they have the accuracy.
link |
So we came into this by not having kind of a standard way
link |
to say, well, how do we measure whether an agent
link |
is at human level or not?
link |
On the other hand, we had a huge advantage,
link |
which is because we do imitation learning,
link |
agents turned out to act like humans
link |
in terms of rate of actions, even precision
link |
and imprecision of actions.
link |
In the supervised policy, you could see all these.
link |
You could see how agents like to spam click to move here.
link |
If you played, especially Diablo, you would know what I mean.
link |
I mean, you just like spam, oh, move here, move here, move here.
link |
You're doing literally like maybe five actions in two seconds,
link |
but these actions are not very meaningful.
link |
One would have sufficed.
link |
So on the one hand, we start from this imitation policy
link |
that is in the ballpark of the actions per minute of humans
link |
because it's actually statistically trying to imitate humans.
link |
So we see this very nicely in the curves
link |
that we showed in the blog post.
link |
Like there's these actions per minute
link |
and the distribution looks very human like.
link |
But then of course, as self play kicks in,
link |
and that's the part we haven't talked too much yet,
link |
but of course the agent must play against itself to improve,
link |
then there's almost no guarantees
link |
that these actions will not become more precise
link |
or even the rate of actions is gonna increase over time.
link |
So what we did, and this is probably kind of the first attempt
link |
that we thought was reasonable,
link |
is we looked at the distribution of actions for humans
link |
for certain windows of time.
link |
And just to give a perspective,
link |
because I guess I mentioned that some of these agents
link |
that are programmatic, let's call them,
link |
they do 40,000 actions per minute.
link |
Professionals, as I said, do 300 to 800.
link |
So what we did is we looked at the distribution
link |
over professional gamers
link |
and we took reasonably high actions per minute,
link |
but we kind of identify certain cutoffs
link |
after which even if the agent wanted to act,
link |
these actions would be dropped.
link |
But the problem is this cutoff is probably set a bit too high
link |
and what ends up happening is that, even though in the games,
link |
when we ask the professionals and the gamers,
link |
by and large they feel like it's playing human like,
link |
There are some agents that developed
link |
maybe slightly too high APMs,
link |
which is actions per minute,
link |
combined with the precision,
link |
which made people sort of start discussing
link |
a very interesting issue,
link |
which is, should we have limited
link |
this, or should we just let it loose
link |
and see what cool things it can come up with, right?
link |
So this is in itself an extremely interesting question,
link |
but the same way that modeling the shimmer
link |
would be so difficult,
link |
modeling absolutely all the details about muscles
link |
and precision and tiredness of humans
link |
would be quite difficult, right?
link |
So we're really kind of innovating here in this sense
link |
of, okay, what could be maybe the next iteration
link |
of putting more rules
link |
that make the agents more human like
link |
in terms of restrictions.
link |
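The windowed action cutoff described above can be sketched as a simple rate limiter: the agent may request actions freely, but requests beyond a per-window budget are dropped. The window length and budget below are illustrative numbers for the sketch, not the actual limits used for Alpha Star.

```python
from collections import deque

class ActionLimiter:
    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window = window_seconds
        self.times = deque()  # timestamps of recently allowed actions

    def try_act(self, t):
        """Return True if an action issued at time t is allowed."""
        while self.times and t - self.times[0] >= self.window:
            self.times.popleft()           # forget actions outside the window
        if len(self.times) < self.max_actions:
            self.times.append(t)
            return True
        return False                        # over budget: the action is dropped

limiter = ActionLimiter(max_actions=3, window_seconds=5.0)
allowed = [limiter.try_act(t) for t in [0.0, 1.0, 2.0, 3.0, 6.0]]
# The fourth request (t=3.0) exceeds 3 actions in 5 seconds and is dropped;
# by t=6.0 old actions have expired from the window, so acting is allowed again.
```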
So yeah, putting constraints that.
link |
More constraints, yeah.
link |
That's really interesting, that's really innovative.
link |
So one of the constraints you put on yourself,
link |
or at least focused on, is the Protoss race,
link |
as far as I understand.
link |
Can you tell me about the different races
link |
and how they, so Protoss, Terran and Zerg,
link |
how do they compare?
link |
How do they interact?
link |
Why did you choose Protoss?
link |
And the dynamics of the game
link |
seen from a strategic perspective.
link |
So Protoss, so in Starcraft, there are three races.
link |
Indeed, in the demonstration,
link |
we saw only the Protoss race.
link |
So maybe let's start with that one.
link |
Protoss is kind of the most technologically advanced race.
link |
It has units that are expensive, but powerful, right?
link |
So in general, you wanna kind of conserve your units,
link |
and then you wanna utilize
link |
these tactical advantages of very fancy spells
link |
and so on, so forth.
link |
And at the same time, they're kind of,
link |
people say like they're a bit easier to play perhaps, right?
link |
But that I actually didn't know.
link |
I mean, I just talked to now a lot to the players
link |
that we work with, TLO and Mana.
link |
And they said, oh yeah, Protoss is actually,
link |
people think is actually one of the easiest races.
link |
So perhaps it's the easiest, but that doesn't mean much,
link |
because obviously professional players
link |
excel at all three races.
link |
And there's never like a race that dominates
link |
for a very long time anyway.
link |
So if you look at the top, I don't know, 100 in the world,
link |
is there one race that dominates that list?
link |
It would be hard to know because it depends on the regions.
link |
I think it's pretty equal in terms of distribution.
link |
And Blizzard wants it to be equal, right?
link |
They wouldn't want one race like Protoss
link |
to not be representative in the top place.
link |
So definitely they try to keep it balanced, right?
link |
So then maybe the opposite race of Protoss is Zerg.
link |
Zerg is a race where you just kind of expand
link |
and take over as many resources as you can.
link |
And they have a very high capacity
link |
to regenerate their units.
link |
So if you have an army, it's not that valuable,
link |
in the sense that losing the whole army is not a big deal as Zerg,
link |
because you can then rebuild it.
link |
And given that you generally accumulate
link |
a huge bank of resources, Zergs typically play
link |
by applying a lot of pressure,
link |
maybe losing their whole army,
link |
but then rebuilding it quickly.
link |
So although of course every race,
link |
I mean, there's never, I mean, they're pretty diverse.
link |
I mean, there are some units in Zerg
link |
that are technologically advanced
link |
and they do some very interesting spells.
link |
And there's some units in Protoss that are less valuable
link |
and you could lose a lot of them and rebuild them
link |
and it wouldn't be a big deal.
link |
All right, so maybe I'm missing out.
link |
Maybe I'm gonna say some dumb stuff,
link |
but just summary of strategy.
link |
So first there's collection of a lot of resources.
link |
That's one option.
link |
The other one is expanding, so building other bases.
link |
Then the other is obviously building units
link |
and attacking with those units.
link |
And then I don't know what else there is.
link |
Maybe there's the different timing of attacks.
link |
Like do attack early, attack late.
link |
What are the different strategies that emerged
link |
that you've learned about?
link |
I've read that a bunch of people are super happy
link |
that you guys have apparently,
link |
that Alpha Star apparently has discovered
link |
that it's really good to, what is it, saturate.
link |
Oh yeah, the mineral line.
link |
Yeah, the mineral line.
link |
And that's for greedy amateur players like myself.
link |
That's always been a good strategy.
link |
You just build up a lot of money
link |
and it just feels good to just accumulate and accumulate.
link |
So thank you for discovering that and validating all of us.
link |
But is there other strategies that you discovered
link |
interesting and unique to this game?
link |
Yeah, so if you look at the kind of,
link |
not being a Starcraft 2 player,
link |
but of course Starcraft and Starcraft 2
link |
and real time strategy games in general are very similar.
link |
I would classify perhaps the openings of the game.
link |
They're very important.
link |
And generally I would say there's two kinds of openings.
link |
One that's a standard opening,
link |
that's generally how players find sort of a balance
link |
between risk and economy
link |
and building some units early on
link |
so that they could defend,
link |
but they're not too exposed basically,
link |
but also expanding quite quickly.
link |
So this would be kind of a standard opening.
link |
And within a standard opening,
link |
then what you do choose generally
link |
is what technology are you aiming towards?
link |
So there's a bit of rock, paper, scissors
link |
of you could go for spaceships
link |
or you could go for invisible units
link |
or you could go for, I don't know,
link |
like massive units that are strong against certain kinds of units
link |
but weak against others.
link |
So standard openings themselves have some choices
link |
like rock, paper, scissors style.
link |
Of course, if you scout and you're good at guessing
link |
what the opponent is doing,
link |
then you can play at an advantage
link |
because if you know you're gonna play rock,
link |
I mean, I'm gonna play paper obviously.
link |
So you can imagine that normal standard games
link |
in Starcraft look like a continuous rock, paper,
link |
scissors game where you guess what the distribution
link |
of rock, paper, and scissors is from the enemy
link |
and reacting accordingly to try to beat it
link |
or put the paper out before he kind of changes
link |
his mind from rock to scissors
link |
and then you would be in a weak position.
link |
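The continuous rock, paper, scissors framing can be made concrete with a toy best-response calculation: estimate the opponent's strategy distribution from what you've scouted, then pick the reply with the highest expected payoff. The counts and the win/loss payoffs below are illustrative assumptions.

```python
# Payoffs from my perspective: 1 for a winning matchup, -1 for a losing
# one, 0 (the dict default) for a mirror matchup.
PAYOFF = {
    ("rock", "scissors"): 1, ("scissors", "paper"): 1, ("paper", "rock"): 1,
    ("scissors", "rock"): -1, ("paper", "scissors"): -1, ("rock", "paper"): -1,
}

def best_response(opponent_counts):
    """Estimate the opponent's mixed strategy from observed counts and
    return the move maximizing expected payoff against it."""
    total = sum(opponent_counts.values())
    probs = {s: c / total for s, c in opponent_counts.items()}
    def expected(my_move):
        return sum(p * PAYOFF.get((my_move, s), 0) for s, p in probs.items())
    return max(("rock", "paper", "scissors"), key=expected)

# Scouting suggests the opponent mostly opens "rock" (say, heavy aggression):
choice = best_response({"rock": 7, "paper": 2, "scissors": 1})
```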
So sorry to pause on that.
link |
I didn't realize this element
link |
because I know it's true with poker.
link |
I looked at Leprata's.
link |
You're also estimating, trying to guess the distribution,
link |
trying to better and better estimate the distribution
link |
of what the opponent is likely to be doing.
link |
Yeah, I mean, as a player,
link |
you definitely wanna have a belief state
link |
over what's up on the other side of the map
link |
and when your belief state becomes inaccurate
link |
when you start having serious doubts
link |
whether he's gonna play something that you must know,
link |
that's when you scout.
link |
You wanna then gather information, right?
link |
Is improving the accuracy of the belief
link |
or improving the belief state part of the loss
link |
that you're trying to optimize?
link |
Or is it just a side effect?
link |
It's implicit, but implicit.
link |
You could explicitly model it
link |
and it would be quite good at probably predicting
link |
what's on the other side of the map,
link |
but so far it's all implicit.
link |
There's no additional reward for predicting the enemy.
link |
So there's these standard openings
link |
and then there's what people call cheese,
link |
which is very interesting
link |
and Alpha Star sometimes really likes this kind of cheese.
link |
These cheeses, what they are is kind of an all in strategy.
link |
You're gonna do something sneaky.
link |
You're gonna hide your own buildings
link |
close to the enemy base
link |
or you're gonna go for hiding your technological buildings
link |
so that you do invisible units
link |
and the enemy just cannot react to detect them
link |
and thus loses the game.
link |
And there's quite a few of these cheeses
link |
and variants of them.
link |
And there it's where actually the belief state
link |
becomes even more important
link |
because if I scout your base
link |
and I see no buildings at all,
link |
any human player knows something's up.
link |
They might know, well,
link |
you're hiding something close to my base.
link |
Should I suddenly build a lot of units to defend?
link |
Should I actually block my ramp with workers
link |
so that you cannot come and destroy my base?
link |
So all of this is happening
link |
and defending against cheeses is extremely important.
link |
And in the Alpha Star League,
link |
many agents actually develop some cheesy strategies.
link |
And in the games we saw against TLO and Mana,
link |
two out of the 10 agents
link |
were actually doing these kind of strategies
link |
which are cheesy strategies.
link |
And then there's a variant of cheesy strategy
link |
which is called all in.
link |
So an all in strategy is not perhaps as drastic
link |
as oh, I'm gonna build cannons on your base
link |
and then bring all my workers
link |
and try to just disrupt your base and game over
link |
or GG as we say in StarCraft.
link |
There's these kind of very cool things
link |
that you can align precisely at a certain time mark.
link |
So for instance, you can generate
link |
exactly a 10 unit composition that is perfect.
link |
Like five of this type, five of that other type,
link |
and align the upgrades so that at four and a half minutes,
link |
let's say you have these 10 units
link |
and the upgrade just finished.
link |
And at that point, that army is really scary.
link |
And unless the enemy really knows what's going on,
link |
if you push, you might then have an advantage
link |
because maybe the enemy is doing something more standard,
link |
it expanded too much, it developed too much economy
link |
and it traded that off badly against having defenses
link |
and the enemy will lose.
link |
But it's called all in because if you don't win,
link |
then you're gonna lose.
link |
So you see players that do these kind of strategies,
link |
if they don't succeed, game is not over.
link |
I mean, they still have a base
link |
and they still gathering minerals,
link |
but they will just GG out of the game
link |
because they know, well, game is over.
link |
I gambled and I failed.
link |
So if we start entering the game
link |
theoretic aspects of the game, it's really rich
link |
and that's why it also makes it quite entertaining to watch.
link |
Even if I don't play, I still enjoy watching the game.
link |
But the agents are trying to do this mostly implicitly,
link |
but one element where we improved on self play is
link |
creating the Alpha Star League.
link |
And the Alpha Star League is not pure self play.
link |
It's trying to create different personalities of agents
link |
so that some of them will become cheesy agents.
link |
Some of them might become very economical, very greedy,
link |
like getting all the resources,
link |
but then maybe early on they're gonna be weak,
link |
but later on they're gonna be very strong.
link |
And by creating this personality of agents,
link |
which sometimes it just happens naturally
link |
that you can see kind of an evolution of agents
link |
that given the previous generation,
link |
they train against all of them
link |
and then they generate kind of the perfect counter
link |
to that distribution.
link |
But these agents, you must have them in the population
link |
because if you don't have them,
link |
you're not covered against these things, right?
link |
It's kind of, you wanna create all sorts of opponents
link |
that you will find in the wild.
link |
So you can be exposed to these cheeses,
link |
early aggression, later aggression, more expansions,
link |
dropping units in your base from the side, all these things.
link |
And pure self play is getting a bit stuck
link |
at finding some subset of these, but not all of these.
link |
So the Alpha Star League is a way to kind of
link |
do an ensemble of agents
link |
that they're all playing in a league
link |
much like people play on Battle.net, right?
link |
They play, you play against someone
link |
who does a new cool strategy and you immediately,
link |
oh my God, I wanna try it, I wanna play again.
link |
And this to me was another critical part
link |
of the problem which was,
link |
can we create a Battle.net for agents?
link |
And that's kind of what the Alpha Star League really is.
link |
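The league idea, a Battle.net for agents rather than pure self-play, can be sketched as a population loop: each new generation trains against a sample of the whole pool, so specialized personalities (cheesy, greedy, and so on) stay around as opponents. Everything here, the names, the seed, and the stand-in "training" step that merely records opponents, is an illustrative placeholder, not DeepMind's implementation.

```python
import random

random.seed(0)  # deterministic sketch

def train_new_agent(league):
    """Stand-in for RL training: record which past agents the newcomer
    was exposed to while training, instead of actually learning."""
    opponents = random.sample(league, k=min(3, len(league)))
    return {"name": f"agent_{len(league)}",
            "trained_vs": [a["name"] for a in opponents]}

league = [{"name": "agent_0", "trained_vs": []},   # e.g. a cheesy rush style
          {"name": "agent_1", "trained_vs": []}]   # e.g. a greedy economic style
for _ in range(3):
    league.append(train_new_agent(league))          # new generation joins the pool

names = [a["name"] for a in league]
```

The contrast with pure self-play is that opponents are drawn from the whole population, not just the latest agent, so strategies that would otherwise disappear keep exerting selection pressure.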
That's fascinating.
link |
And where they stick to their different strategies.
link |
Yeah, wow, that's really, really interesting.
link |
But that said, you were fortunate enough,
link |
or just skilled enough, to win 5-0.
link |
And so how hard is it to win?
link |
I mean, that's not the goal.
link |
I guess, I don't know what the goal is.
link |
The goal should be to win a majority, not 5-0,
link |
but how hard is it in general to win all matchups?
link |
So that's a very interesting question
link |
because once you see Alpha Star
link |
and superficially you think, well, okay,
link |
it won, if you sum all the games like 10 to one, right?
link |
It lost the game that it played with the camera interface.
link |
You might think, well, that's done, right?
link |
It's super human at the game.
link |
And that's not really a claim we can make, actually.
link |
The claim is we beat a professional gamer
link |
for the first time.
link |
Starcraft has really been a thing
link |
that has been going on for a few years,
link |
but a moment like this had not occurred before yet.
link |
But are these agents impossible to beat?
link |
Absolutely not, right?
link |
So that's kind of what the difference is:
link |
the agents play at grandmaster level.
link |
They definitely understand the game enough
link |
to play extremely well, but are they unbeatable?
link |
Do they play perfect?
link |
No, and actually in Starcraft,
link |
because of these sneaky strategies,
link |
it's always possible that you might take a huge risk sometimes,
link |
but you might get wins, right, out of this.
link |
So I think as a domain, it still has a lot of opportunities,
link |
not only because of course we wanna learn with less experience,
link |
we would like to, I mean, if I learn to play Protoss,
link |
I can play Terran and learn it much quicker
link |
than Alpha Star can, right?
link |
So there are obvious interesting research challenges as well.
link |
But even as the raw performance goes,
link |
really the claim here can be we are at pro level
link |
or at high grandmaster level,
link |
but obviously the players also did not know what to expect,
link |
right, their prior distribution was a bit off
link |
because they played this kind of new alien brain
link |
as they like to say it, right?
link |
And that's what makes it exciting for them,
link |
but also I think if you look at the games closely,
link |
you see there were weaknesses in some points,
link |
maybe Alpha Star did not scout
link |
or if invisible units had been coming against it
link |
at certain points, it wouldn't have known
link |
and it would have been bad.
link |
So there's still quite a lot of work to do,
link |
but it's really a very exciting moment for us
link |
to be seeing, wow, a single neural net on a GPU
link |
is actually playing against these guys who are amazing.
link |
I mean, you have to see them play live.
link |
They're really, really amazing players.
link |
Yeah, I'm sure there must be a guy in Poland somewhere
link |
right now training his butt off
link |
to make sure that this never happens again with Alpha Star.
link |
So that's really exciting in terms of Alpha Star
link |
having some holes to exploit, which is great.
link |
And then you build on top of each other
link |
and it feels like StarCraft, unlike Go, even if you win,
link |
it's still not over,
link |
there's so many different dimensions
link |
in which you can explore.
link |
So that's really, really interesting.
link |
Do you think there's a ceiling to Alpha Star?
link |
You've said that it hasn't reached, this is a big,
link |
let me actually just pause for a second.
link |
How did it feel to come here to this point,
link |
to beat a top professional player?
link |
Like that night, I mean, you know,
link |
Olympic athletes have their gold medal, right?
link |
This is your gold medal in a sense.
link |
Sure, you're cited a lot,
link |
you've published a lot of prestigious papers, whatever,
link |
but this is like a win.
link |
I mean, it was, for me, it was unbelievable
link |
because first the win itself, I mean, it was so exciting.
link |
I mean, so looking back to those last days of 2018,
link |
really, that's when the games were played,
link |
I'm sure I'll look back at that moment and say,
link |
oh my God, I wanna be in a project like that.
link |
It's like, I already feel the nostalgia of like,
link |
yeah, that was huge in terms of the energy
link |
and the team effort that went into it.
link |
And so in that sense, as soon as it happened,
link |
I already knew it was kind of,
link |
I was losing it a little bit.
link |
So it is almost like sad that it happened and oh my God,
link |
like, but on the other hand, it also verifies the approach.
link |
But to me also, there's so many challenges
link |
and interesting aspects of intelligence
link |
that even though we can train a neural network
link |
to play at the level of the best humans,
link |
there's still so many challenges.
link |
So for me, it's also like,
link |
well, this is really an amazing achievement,
link |
but I already was also thinking about next steps.
link |
I mean, as I said, these agents play Protoss,
link |
they play Protoss versus Protoss,
link |
but they should be able to play a different race
link |
much quicker, right?
link |
So that would be an amazing achievement.
link |
Some people call this meta reinforcement learning,
link |
meta learning and so on, right?
link |
So there's so many possibilities after that moment,
link |
but the moment itself, it really felt great.
link |
It's, we had this bet.
link |
So I'm kind of a pessimist in general.
link |
So I kind of send an email to the team and I said,
link |
okay, let's, against TLO first, right?
link |
Like what's going to be the result?
link |
And I really thought we would lose like five zero, right?
link |
We had some calibration made
link |
against the 5,000 MMR player.
link |
TLO was much stronger than that player.
link |
Even if he played Protoss, which is his off race,
link |
but yeah, it was not imagining we would win.
link |
So for me, that was just kind of a test run or something.
link |
And then it really happened, and we were really surprised.
link |
And unbelievably, we went to this,
link |
to this bar to celebrate.
link |
And Dave tells me, well, why don't we invite someone
link |
who is a thousand MMR stronger at Protoss?
link |
Like an actual Protoss player,
link |
and that turned out to be MaNa, right?
link |
And, you know, we had some drinks and I said, sure, why not?
link |
But then I thought, well,
link |
that's really going to be impossible to beat.
link |
I mean, because he's so much ahead.
link |
A thousand MMR is really like 99% probability
link |
that MaNa would beat TLO at Protoss versus Protoss, right?
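As a rough illustration of that number: if MMR is treated like an Elo rating (an assumption on my part, since Blizzard's exact MMR formula is not public), a 1000-point gap maps to roughly a 99.7% expected win rate:

```python
# Hypothetical sketch: treating MMR like an Elo rating, where a
# 400-point gap corresponds to 10:1 odds for the stronger player.
def win_probability(mmr_diff):
    """Expected win rate for the higher-rated player, Elo-style."""
    return 1.0 / (1.0 + 10.0 ** (-mmr_diff / 400.0))

print(round(win_probability(1000), 3))  # a 1000-MMR gap -> ~0.997
```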
link |
And to me, the second game was much more important,
link |
even though a lot of uncertainty kind of disappeared
link |
after we kind of beat TLO.
link |
I mean, he is a professional player.
link |
So that was kind of, oh, but that's really
link |
a very nice achievement.
link |
But MaNa really was at the top.
link |
And you could see he played much better,
link |
but our agents got much better too.
link |
And then after the first game, I said,
link |
if we take a single game, at least we can say we won a game.
link |
I mean, even if we don't beat the series,
link |
for me, that was a huge relief.
link |
And I mean, I remember hugging Demis.
link |
And I mean, it was really like this moment,
link |
for me, will resonate forever as a researcher.
link |
And I mean, as a person, and yeah,
link |
it's a really like great accomplishment.
link |
And it was great also to be there with the team in the room.
link |
I don't know if you saw like this.
link |
So it was really like.
link |
I mean, from my perspective,
link |
the other interesting thing is just like watching Kasparov,
link |
now watching mana was also interesting
link |
because he is kind of at a loss of words.
link |
I mean, whenever you lose, I've done a lot of sports.
link |
You sometimes say excuses, you look for reasons.
link |
And he couldn't really come up with reasons.
link |
I mean, so with the off race for Protoss,
link |
you could say, well, it felt awkward, it wasn't,
link |
but here it was just beaten.
link |
And it was beautiful to look at a human being
link |
being superseded by an AI system.
link |
I mean, it's a beautiful moment for researchers.
link |
Yeah, for sure it was.
link |
I mean, probably the highlight of my career so far
link |
because of its uniqueness and coolness.
link |
And I don't know, I mean, it's obviously, as you said,
link |
you can look at paper citations and so on.
link |
But this really is like a testament
link |
of the whole machine learning approach
link |
and using games to advance technology.
link |
I mean, it really was, everything came together
link |
at that moment, that's really the summary.
link |
Also, on the other side, it's a popularization of AI too
link |
because it's just like traveling to the moon and so on.
link |
I mean, this is where a very large community of people
link |
that don't really know AI, they get to really interact with it.
link |
Which is very important.
link |
I mean, we must, you know, writing papers helps our peers,
link |
researchers to understand what we're doing.
link |
But I think AI is becoming mature enough
link |
that we must sort of try to explain what it is.
link |
And perhaps through games is an obvious way
link |
because these games always had built in AI.
link |
So maybe everyone has experienced an AI playing a video game
link |
even if they don't know.
link |
Because there's always some scripted element
link |
and some people might even call that AI already, right?
link |
So what are other applications
link |
of the approaches underlying Alpha Star that you see happening?
link |
There's a lot of echoes of, you said, transformer
link |
of language modeling and so on.
link |
Have you already started thinking where the breakthroughs
link |
in Alpha Star get expanded to other applications?
link |
Right, so I thought about a few things
link |
for like kind of next months, next years.
link |
The main thing I'm thinking about actually is
link |
what's next as a kind of a grand challenge
link |
because for me, like we've seen Atari
link |
and then there's like the sort of three-dimensional worlds
link |
that we've seen also like pretty good performance
link |
from this capture the flag agents
link |
that also some people at DeepMind and elsewhere are working on.
link |
We've also seen some amazing results on like,
link |
for instance, Dota 2, which is also a very complicated game.
link |
So for me, like the main thing I'm thinking about
link |
is what's next in terms of challenge.
link |
So as a researcher, I see sort of two tensions
link |
between research and then applications or areas
link |
or domains where you apply them.
link |
So on the one hand,
link |
because the application, StarCraft, is very hard,
link |
we developed some techniques, some new research
link |
that now we could look at elsewhere,
link |
like are there other applications where we can apply this?
link |
And the obvious ones, absolutely,
link |
you can think of feeding back to sort of the community
link |
we took from, which was mostly sequence modeling
link |
or natural language processing.
link |
So we developed and extended things from the transformer
link |
and we use pointer networks.
link |
We combine LSTM and transformers in interesting ways.
link |
So that's perhaps the kind of lowest hanging fruit
link |
of feeding back to now a different field of machine learning
link |
that's not playing video games.
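A minimal sketch of the pointer-network idea mentioned here (a toy illustration, not DeepMind's implementation): attention scores over the input positions are used directly as the output distribution, so the model can "point" at a variable number of inputs, such as units to select:

```python
import math

# Toy sketch of the pointer-network idea: the "output vocabulary" is the
# set of input positions themselves, so the same model handles inputs of
# any length, e.g. selecting one of a variable number of units.
def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pointer_step(query, input_vectors):
    """Score each input position against a decoder query and
    return a distribution over positions (the 'pointer')."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in input_vectors]
    return softmax(scores)

# Three input "units", each a 2-d vector; the query attends most to unit 1.
units = [[1.0, 0.0], [0.9, 0.9], [0.0, 1.0]]
query = [1.0, 1.0]
dist = pointer_step(query, units)
print(dist.index(max(dist)))  # points at position 1
```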
link |
Let me go old school and jump to Mr. Alan Turing.
link |
So the Turing test is a natural language test,
link |
a conversational test.
link |
What's your thought of it as a test for intelligence?
link |
Do you think it is a grand challenge
link |
that's worthy of undertaking?
link |
Maybe if it is, would you reformulate it
link |
or phrase it somehow differently?
link |
Right, so I really love the Turing test
link |
because I also like sequences and language understanding.
link |
And in fact, some of the early work
link |
we did in machine translation,
link |
we tried to apply to kind of a neural chat bot,
link |
which obviously would never pass the Turing test
link |
because it was very limited.
link |
But it is a very fascinating idea
link |
that you could really have an AI
link |
that would be indistinguishable from humans
link |
in terms of asking or conversing with it, right?
link |
So I think the test itself seems very nice
link |
and it's kind of well defined actually,
link |
like passing it or not.
link |
I think there's quite a few rules
link |
that feel like pretty simple and you could really have,
link |
I mean, I think they have these competitions every year.
link |
Yeah, so the Loebner Prize,
link |
but I don't know if you've seen the kind of bots
link |
that emerge from that competition.
link |
They're not quite what you would expect,
link |
so it feels like that there's weaknesses
link |
with the way Turing formulated it.
link |
It needs to be that the definition
link |
of a genuine, rich, fulfilling human conversation
link |
needs to be something else.
link |
Like the Alexa Prize,
link |
which I'm not as well familiar with,
link |
has tried to define that more.
link |
I think by saying you have to continue
link |
keeping a conversation for 30 minutes,
link |
something like that.
link |
So basically forcing the agent not to just fool,
link |
but to have an engaging conversation kind of thing,
link |
is that, I mean, have you thought
link |
about this problem richly?
link |
And if you have in general, how far away are we from,
link |
you worked a lot on language understanding,
link |
language generation, but the full dialogue,
link |
the conversation, just sitting at the bar,
link |
having a couple of beers for an hour,
link |
that kind of conversation.
link |
Have you thought about it?
link |
Yeah, so I think you touched here on the critical point,
link |
which is feasibility, right?
link |
So there's a great sort of essay by Hamming,
link |
which describes sort of grand challenges of physics.
link |
And he argues that, well, okay, for instance,
link |
teleportation or time travel are great grand challenges
link |
of physics, but there's no attack.
link |
We really don't know or cannot kind of make any progress.
link |
So that's why most physicists and so on,
link |
they don't work on these in their PhDs
link |
and as part of their careers.
link |
So I see the Turing test, the full Turing test,
link |
as still a bit too early.
link |
I think, especially with the current trend
link |
of deep learning language models,
link |
we've seen some amazing examples,
link |
I think GPT-2 being the most recent one,
link |
which is very impressive,
link |
but to fully solve it, to pass or fool a human
link |
into thinking that there's a human on the other side,
link |
I think we're quite far.
link |
So as a result, I don't see myself
link |
and I probably would not recommend people doing a PhD
link |
on solving the Turing test,
link |
because it just feels it's kind of too early
link |
or too hard of a problem.
link |
Yeah, but that said, you said the exact same thing
link |
about StarCraft a few years ago.
link |
So did Demis. So I predict...
link |
You'll probably also be the person
link |
who passes the Turing test in three years.
link |
I mean, I think the, yeah, so...
link |
So we have this on record, this is nice.
link |
I mean, it's true that progress sometimes
link |
is a bit unpredictable.
link |
Even six months ago,
link |
I would not have predicted the level
link |
that we see these agents deliver.
link |
At grandmaster level, but I have worked on language enough.
link |
And basically my concern is not that a breakthrough
link |
couldn't happen that would bring us to solving
link |
or passing the Turing test,
link |
it's that I just think the statistical approach to it,
link |
like this, is not gonna cut it.
link |
So we need a breakthrough,
link |
which is great for the community.
link |
But given that, I think there's quite a bit more uncertainty.
link |
Whereas for StarCraft,
link |
I knew what the steps would be to kind of get us there.
link |
I think it was clear that using the imitation learning part
link |
and then using this league of agents
link |
were gonna be key and it turned out that this was the case
link |
and a little more was needed, but not much more.
link |
For Turing test, I just don't know what the plan
link |
or execution plan would look like.
link |
So that's why I myself working on it
link |
as a grand challenge is hard,
link |
but there are quite a few sub challenges
link |
that are related that you could say,
link |
well, I mean, what if you create a great assistant,
link |
like Google already has like the Google Assistant.
link |
So can we make it better
link |
and can we make it fully neural and so on?
link |
That I start to believe maybe we're reaching a point
link |
where we should attempt these challenges.
link |
I like this conversation so much
link |
because it echoes very much the StarCraft conversation.
link |
It's exactly how you approach StarCraft.
link |
Let's break it down into small pieces and solve those
link |
and you end up solving the whole game.
link |
Great, but that said, you're behind some
link |
of the sort of biggest pieces of work in deep learning
link |
in the last several years.
link |
So you mentioned some limits.
link |
What do you think are the current limits of deep learning
link |
and how do we overcome those limits?
link |
So if I had to actually use a single word
link |
to define the main challenge in deep learning,
link |
it's a challenge that probably has been the challenge
link |
for many years and is that of generalization.
link |
So what that means is that all that we're doing
link |
is fitting functions to data.
link |
And when the data we see is not from the same distribution
link |
or even if there are times
link |
when it is very close to the distribution,
link |
but because of the way we train it with limited samples,
link |
we then get to this stage where we just don't
link |
see as much generalization as we would like.
link |
And I think adversarial examples are a clear example of this
link |
but if you study the machine learning literature,
link |
the reason why SVMs became very popular
link |
was because they had
link |
some guarantees about generalization,
link |
which is unseen data or out of distribution
link |
or even within distribution
link |
where you take an image and add a bit of noise,
link |
these models fail.
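The out-of-distribution failure he describes can be sketched with a toy example (mine, not from the conversation): a function fitted on a narrow training range looks fine in-distribution and falls apart far outside it:

```python
import math

# Toy illustration: fit a straight line to sin(x) on a narrow training
# range, then query it far outside that range.
xs = [i / 10 for i in range(6)]          # training inputs in [0, 0.5]
ys = [math.sin(x) for x in xs]           # sin is almost linear here

# Closed-form least-squares fit of y = a*x + b.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

in_dist_err = abs((a * 0.25 + b) - math.sin(0.25))   # inside training range
out_dist_err = abs((a * 3.0 + b) - math.sin(3.0))    # far outside it
print(in_dist_err < 0.01, out_dist_err > 1.0)
```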
link |
So I think really I don't see a lot of progress
link |
on generalization in the strong generalization sense
link |
I think with our neural networks,
link |
you can always find designed examples
link |
that will make their outputs arbitrarily wrong,
link |
which is not good because we humans would never be fooled
link |
by these kind of images or manipulation of the image.
link |
And if you look at the mathematics,
link |
you kind of understand this is a bunch of matrices
link |
multiplied together, so there's probably numerical
link |
instability such that you can just find corner cases.
link |
So I think that's really the underlying topic
link |
many times we see, even at the grand stage
link |
of the Turing test: generalization.
link |
I mean, if you think about passing the Turing test,
link |
should it be in English or should it be in any language?
link |
I mean, as a human, if you ask something
link |
in a different language, you actually will go
link |
and do some research and try to translate it
link |
and so on, should the Turing test include that, right?
link |
And it's really a difficult problem
link |
and very fascinating and very mysterious actually.
link |
But do you think it's, if you were to try to solve it,
link |
can you not grow the size of data intelligently
link |
in such a way that the distribution of your training set
link |
does include the entirety of the testing set?
link |
Is that one path?
link |
The other path is totally a new methodology.
link |
That's not statistical.
link |
So a path that has worked well
link |
and it worked well in StarCraft and in machine translation
link |
and in language is scaling up the data and the model.
link |
And that's kind of been maybe the only single formula
link |
that still delivers today in deep learning, right?
link |
It's that scale, data scale and model scale
link |
really do more and more of the things that we thought,
link |
oh, there's no way it can generalize to these
link |
or there's no way it can generalize to that.
link |
But I don't think fundamentally it will be solved with this.
link |
And for instance, I'm really liking some style
link |
or approach that would not only have neural networks
link |
but it would have programs or some discrete decision making
link |
because there is where I feel there's a bit more,
link |
like, I mean, the best example
link |
for understanding this is,
link |
I also worked a bit on, oh, like we can learn an algorithm
link |
with a neural network, right?
link |
So you give it many examples
link |
and it's gonna sort the input numbers
link |
or something like that.
link |
But really, strong generalization is you give me some numbers
link |
or you ask me to create an algorithm that sorts numbers
link |
and instead of creating a neural net which will be fragile
link |
because it's gonna go out of range at some point,
link |
you're gonna give it numbers that are too large,
link |
too small and whatnot, you just,
link |
if you just create a piece of code that sorts the numbers,
link |
then you can prove that that will generalize
link |
to absolutely all the possible inputs you could give.
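The contrast he draws can be sketched like this (a toy illustration; the "learned" sorter here just memorizes, standing in for a model that fails outside its training range, while a sorting program generalizes by construction):

```python
# A program that sorts generalizes to *all* inputs by construction...
def insertion_sort(xs):
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

# ...whereas a "learned" sorter trained only on small examples has no
# guarantee out of range. Here we fake one that memorizes training pairs.
train = {(3, 1, 2): (1, 2, 3), (2, 1): (1, 2)}
def learned_sort(xs):
    return train.get(tuple(xs))  # returns None on anything unseen

print(insertion_sort([10**9, -5, 7]))   # works for any magnitude
print(learned_sort([10**9, -5, 7]))     # None: out of distribution
```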
link |
So I think the problem comes
link |
with some exciting prospects.
link |
I mean, scale is a bit more boring, but it really works.
link |
And then maybe programs and discrete abstractions
link |
are a bit less developed,
link |
but clearly I think they're quite exciting
link |
in terms of future for the field.
link |
Do you draw any insight or wisdom from the 80s
link |
and expert systems and symbolic systems, symbolic computing?
link |
Do you ever go back to those,
link |
the reasoning, that kind of logic?
link |
Do you think that might make a comeback?
link |
You'll have to dust off those books?
link |
Yeah, I actually love adding more inductive biases.
link |
To me, the problem really is what are you trying to solve?
link |
If what you're trying to solve is so important
link |
that you'll try to solve it no matter what,
link |
then absolutely use rules, use domain knowledge
link |
and then use a bit of the magic of machine learning
link |
to empower or to make it the best system
link |
that will detect cancer or detect weather patterns, right?
link |
Or in terms of StarCraft, it also was a very big challenge.
link |
So I was definitely happy
link |
that if we had to cut a corner here and there,
link |
it could have been interesting to do.
link |
And in fact, in StarCraft,
link |
we start thinking about expert systems
link |
because it's a very, you can define,
link |
I mean, people actually build StarCraft bots
link |
by thinking about those principles like state machines
link |
and rule based and then you could think
link |
of combining a bit of a rule based system,
link |
but that has also neural networks incorporated
link |
to make it generalize a bit better.
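A minimal sketch of the kind of rule-based, state-machine bot mentioned here (all names and thresholds are made up for illustration), with the idea that any single rule could be swapped for a learned model:

```python
# Illustrative rule-based policy over a tiny, made-up game state.
def scripted_policy(state):
    """Hand-written rules of the kind scripted StarCraft bots use."""
    if state["workers"] < 12:
        return "build_worker"
    if state["supply_used"] >= state["supply_cap"]:
        return "build_supply"
    if state["army"] < 20:
        return "build_army"
    return "attack"

# A hybrid system could replace any one rule with a learned predictor,
# e.g. letting a neural net decide *when* to attack.
s = {"workers": 12, "supply_used": 20, "supply_cap": 28, "army": 25}
print(scripted_policy(s))  # attack
```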
link |
So absolutely, I mean, we should definitely go back
link |
to those ideas and anything that makes the problem simpler.
link |
As long as your problem is important, that's okay.
link |
And that's research driving a very important problem.
link |
And on the other hand,
link |
if you wanna really focus on the limits
link |
of reinforcement learning, then of course,
link |
you must try not to look at imitation data
link |
or to look for some rules of the domain
link |
that would help a lot or even feature engineering, right?
link |
So this is a tension that depending on what you do,
link |
I think both ways are definitely fine.
link |
And I would never rule out one or the other,
link |
as long as what you're doing
link |
is important and needs to be solved, right?
link |
All right, so there's a bunch of different ideas
link |
that you've developed that I really enjoy.
link |
But one is image captioning,
link |
translating from image to text.
link |
Just another beautiful idea, I think,
link |
that resonates throughout your work, actually.
link |
So the underlying nature of reality being language always.
link |
So what's the connection between images and text?
link |
Or rather, the visual world and the world of language?
link |
Right, so I think a piece of research that's been central
link |
to, I would say, even extending into StarCraft
link |
is this idea of sequence to sequence learning,
link |
which what we really meant by that
link |
is that you can now really input anything
link |
to a neural network as the input X
link |
and then the neural network will learn a function F
link |
that will take X as an input and produce any output Y.
link |
And these X and Ys don't need to be like static
link |
or like fixed feature vectors
link |
or anything like that.
link |
It could be really sequences
link |
and, beyond that, general data structures, right?
link |
So that paradigm was tested in a very interesting way
link |
when we moved from translating French to English
link |
to translating an image to its caption.
link |
But the beauty of it is that really,
link |
and that's actually how it happened.
link |
I changed a line of code in this thing
link |
that was doing machine translation,
link |
ran it, and I came the next day and saw it producing
link |
captions that seemed like, oh my God,
link |
this is really, really working.
link |
And the principle is the same, right?
link |
So I think I don't see text, vision, speech, waveforms
link |
as something different, as long as you basically learn
link |
a function that will vectorize these,
link |
and then after we vectorize, we can use transformers,
link |
LSTMs, whatever the flavor of the month of the model is.
link |
And then as long as we have enough supervised data,
link |
really this formula will work and will keep working,
link |
I believe to some extent.
link |
Modulo these generalization issues that I mentioned before.
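The "vectorize anything, then apply one sequence model" view can be sketched as follows (a toy illustration; the shared encoder here is just a mean-pool stand-in for an LSTM or transformer):

```python
# Modality-specific "vectorizers" produce sequences of vectors, and a
# single shared model consumes any of them.
def vectorize_text(s):
    return [[float(ord(c))] for c in s]      # one vector per character

def vectorize_image(pixels):
    return [[float(p)] for p in pixels]      # one vector per pixel

def shared_encoder(seq):
    """Stand-in for an LSTM/transformer: any vector sequence -> one vector."""
    dim = len(seq[0])
    return [sum(v[d] for v in seq) / len(seq) for d in range(dim)]

# The same function handles both modalities.
print(shared_encoder(vectorize_text("ab")))
print(shared_encoder(vectorize_image([0, 255])))
```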
link |
So, but the task there is to vectorize
link |
sort of form a representation that's meaningful,
link |
and your intuition now, having worked with all this media,
link |
is that once you are able to form that representation,
link |
you could basically take anything, any sequence.
link |
Is there, going back to StarCraft,
link |
are there limits on the length?
link |
So we didn't really touch on the long term aspect.
link |
How did you overcome the really long-term aspect of the game?
link |
Are there some tricks, or...?
link |
So the main trick, so StarCraft,
link |
if you look at absolutely every frame,
link |
you might think it's quite a long game.
link |
So we would have to multiply 22 frames per second
link |
times 60 seconds per minute, times maybe
link |
at least 10 minutes per game on average.
link |
So there are quite a few frames,
link |
but the trick really was to,
link |
only observe, in fact, which might be seen as a limitation,
link |
but it is also a computational advantage.
link |
Only observe when you act.
link |
And then what the neural network decides
link |
is what is the gap gonna be until the next action?
link |
And if you look at most StarCraft games
link |
that we have in the data set that Blizzard provided,
link |
it turns out that most games are actually only,
link |
I mean, it is still a long sequence,
link |
but it's maybe like 1,000 to 1,500 actions,
link |
which if you start looking at LSTMs,
link |
large LSTMs, transformers,
link |
it's not that difficult,
link |
especially if you have supervised learning.
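The back-of-the-envelope arithmetic behind this trick: StarCraft II runs at roughly 22 frames per second, so observing every frame of a ten-minute game gives roughly an order of magnitude more steps than observing only when the agent acts:

```python
# The frame count sketched in the discussion, versus acting ~1,500 times.
frames_per_second = 22
seconds_per_minute = 60
minutes_per_game = 10

frames = frames_per_second * seconds_per_minute * minutes_per_game
actions = 1500  # upper end of actions per game mentioned above

print(frames)              # 13200 decision points if you observe every frame
print(frames // actions)   # roughly 8x fewer steps by observing only on actions
```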
link |
If you had to do it with reinforcement learning,
link |
the credit assignment problem,
link |
what is it that in this game that made you win?
link |
That would be really difficult.
link |
But thankfully, because of imitation learning,
link |
we didn't kind of have to deal with this directly.
link |
Although if we had to, we tried it,
link |
and what happened is you just take all your workers
link |
and attack with them.
link |
And that sort of is kind of obvious in retrospect,
link |
because you start trying random actions.
link |
One of the actions will be a worker
link |
that goes to the enemy base,
link |
and because it's self play,
link |
it's not gonna know how to defend,
link |
because it basically doesn't know almost anything.
link |
And eventually what you develop is this,
link |
take all workers and attack,
link |
because the credit assignment issue in RL
link |
is really, really hard.
link |
I do believe we could do better,
link |
and that's maybe a research challenge for the future.
link |
But yeah, even in StarCraft,
link |
the sequences are maybe 1,000,
link |
which I believe is within the realm
link |
of what transformers can do.
link |
Yeah, I guess the difference between StarCraft and Go
link |
is in Go and chess,
link |
stuff starts happening right away.
link |
Yeah, it's pretty easy to self play,
link |
not easy, but to self play is possible
link |
to develop reasonable strategies quickly
link |
as opposed to StarCraft.
link |
In Go, there's only 400 actions,
link |
but one action is what people would call
link |
the God action that would be,
link |
if you had expanded the whole search tree,
link |
that's the best action if you did minimax
link |
or whatever algorithm you would do
link |
if you had the computational capacity.
link |
With 400 actions, you couldn't even click
link |
on the pixels around a unit, right?
link |
So I think the problem there
link |
is in terms of action space size
link |
and that search is impossible.
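The "God action" he mentions is what exhaustive minimax would return if you could expand the whole search tree; here is a minimal sketch on a toy two-ply game (infeasible for StarCraft-sized action spaces, which is the point):

```python
# Toy exhaustive minimax over a tiny game tree. Leaves are payoffs for
# the maximizing player; inner nodes are lists of children.
def minimax(node, maximizing):
    if isinstance(node, (int, float)):   # leaf: payoff for the maximizer
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Root has two actions; each leads to opponent replies with leaf payoffs.
tree = [[3, 5], [2, 9]]   # action 0 guarantees 3; action 1 guarantees only 2
best = max(range(len(tree)), key=lambda a: minimax(tree[a], False))
print(best)  # action 0 is the optimal ("God") action here
```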
link |
So there's quite a few challenges indeed
link |
that make this kind of a step up
link |
in terms of machine learning.
link |
For humans, maybe playing StarCraft
link |
seems more intuitive
link |
because it looks real,
link |
the graphics and everything moves smoothly,
link |
whereas I don't know how to...
link |
Go is a game that I would really need to study.
link |
It feels quite complicated,
link |
but for machines, maybe it's the reverse, yes.
link |
Which shows you the gap, actually,
link |
between deep learning and however the heck our brains work.
link |
So you developed a lot of really interesting ideas.
link |
It's interesting to just ask,
link |
what's your process of developing new ideas?
link |
Do you like brainstorming with others?
link |
Do you like thinking alone?
link |
Ian Goodfellow said he came up with GANs
link |
after a few beers.
link |
He thinks beers are essential
link |
for coming up with new ideas.
link |
We had beers to decide to play another game
link |
of StarCraft after a week,
link |
so it's really similar to that story.
link |
Actually, I explained this
link |
in a DeepMind retreat,
link |
and I said this is the same as the GANs story.
link |
I mean, we were at a bar and we decided,
link |
we had a week, and that's what happened.
link |
I feel like we're giving the wrong message
link |
to young undergrads.
link |
But in general, do you like brainstorming?
link |
Do you like thinking alone, working stuff out?
link |
So I think throughout the years
link |
also things changed, right?
link |
So initially, I was
link |
very fortunate to be
link |
with great minds like
link |
Jeff Dean, Ilya Sutskever.
link |
I was really fortunate to join Brain
link |
at a very good time.
link |
At that point, ideas,
link |
I was just kind of brainstorming with my colleagues
link |
and learned a lot,
link |
and keep learning is actually
link |
something you should never stop doing, right?
link |
So learning implies
link |
reading papers and also discussing ideas
link |
with others. It's very hard
link |
at some point to not communicate
link |
whether that be reading a paper from someone
link |
or actually discussing, right?
link |
that communication aspect
link |
needs to be there, whether it's written or verbal.
link |
I'm also trying to be a bit more strategic
link |
about what research to do.
link |
So I was describing
link |
a little bit this sort of tension between
link |
research for the sake of research,
link |
and then you have, on the other hand,
link |
applications that can drive the research, right?
link |
the formula that has worked best for me is
link |
just find a hard problem
link |
see how research fits into it,
link |
how it doesn't fit into it,
link |
and then you must innovate.
link |
So I think machine translation
link |
drove sequence to sequence.
link |
Learning algorithms
link |
for combinatorial problems
link |
led to pointer networks.
link |
Starcraft led to really scaling up
link |
imitation learning and the Alpha Star League.
link |
So that's been a formula
link |
that I personally like,
link |
but the other one is also valid,
link |
and I see it succeed a lot of the times
link |
where you just want to investigate
link |
as a kind of a research topic,
link |
and then you must then start to think,
link |
well, what are the tests?
link |
How are you going to test these ideas?
link |
You need kind of a minimal
link |
environment to try things.
link |
You need to read a lot of papers and so on,
link |
and that's also very fun to do,
link |
and something I've also done quite a few times,
link |
both at Brain, at DeepMind,
link |
and obviously as a PhD.
link |
the ideas and discussions,
link |
I think it's important also
link |
because you start sort of aligning
link |
not only your own goals, but
link |
other people's goals
link |
toward the next breakthrough, so
link |
you must really kind of understand
link |
this feasibility also
link |
as we were discussing before, right?
link |
Whether this domain is ready
link |
to be tackled or not, and you don't want
link |
to be too early, you obviously don't want
link |
to be too late, so it's really interesting
link |
this strategic component of research,
link |
which I think as a grad student
link |
I just had no idea,
link |
I just read papers and discussed
link |
ideas, and I think this has been maybe
link |
the major change, and I recommend
link |
fast forward to success, what it looks like,
link |
and try to backtrack, other than just
link |
kind of looking out, this looks cool,
link |
this looks cool, and then you do a bit of
link |
random work, which sometimes you stumble upon
link |
some interesting things, but
link |
in general it's also good to plan a bit.
link |
Especially like your approach of taking
link |
on really hard problems, stepping right in
link |
and then being super skeptical about
link |
being able to solve the problem.
link |
balance of both, right? There's a healthy
link |
sort of skepticism
link |
that's good to balance, which
link |
is why it's good to have a team of people
link |
that balance that.
link |
You don't do that on your own, you have both
link |
mentors that have seen
link |
or you obviously want to chat and
link |
discuss whether it's the right time.
link |
came in 2014 and he said
link |
maybe in a bit we'll do StarCraft and
link |
and I'm just following his lead, which
link |
is great because he's brilliant, right?
link |
So, these things are
link |
important that you want to
link |
be surrounded by people
link |
who are diverse, they
link |
have their knowledge. There's also
link |
I've learned a lot from people
link |
who have an idea that I might not think is good
link |
but if I give them the space to try it
link |
I've been proven wrong many, many times
link |
as well. So, that's great.
link |
Your colleagues are more important than yourself
link |
Now, let's real quick
link |
talk about another impossible problem.
link |
What do you think it takes to build a system
link |
that's human level intelligence?
link |
We talked a little bit about the Turing test, StarCraft
link |
all these have echoes of general intelligence
link |
but if you think about
link |
just something that you would sit back
link |
and say, wow, this is
link |
really something that resembles
link |
human level intelligence, what do you think it takes
link |
AGI oftentimes is maybe not well defined,
link |
so what I'm trying to
link |
then come up with for myself is
link |
what would be a result
link |
you would start to believe that
link |
you would have agents or neural nets
link |
that no longer sort of overfit
link |
to a single task, right?
link |
but that have the skill of learning, so to speak,
link |
and that actually is a field that I
link |
am fascinated by which is
link |
the learning to learn or meta learning
link |
which is about no longer
link |
learning about a single domain
link |
so you can think about the learning algorithm
link |
itself is general, right?
link |
So the same formula we applied for
link |
Alpha Star or StarCraft
link |
we can now apply to kind of almost any
link |
video game or you could apply to
link |
many other problems and domains
link |
so the algorithm is what's kind of generalizing,
link |
but the neural network, the weights
link |
those weights are useless even
link |
to play another race, right? I train
link |
a network to play very well at Protoss versus Protoss,
link |
I need to throw away those weights
link |
now Terran vs Terran
link |
I would need to retrain
link |
a network from scratch with the same algorithm
link |
that's beautiful but the network
link |
itself will not be useful
link |
so I think when I, if I see
link |
an agent that can absorb or start
link |
solving new problems
link |
without the need to kind of restart
link |
the process I think that
link |
to me would be a nice way to define
link |
again, I don't know
link |
the grandiose, like, AGI, I mean
link |
the Turing test we might solve before AGI
link |
I mean, I don't know, I think concretely
link |
I would like to see clearly
link |
that meta learning happen
link |
an architecture or a network
link |
that as it sees a new problem
link |
or new data it solves it
link |
kind of a benchmark it should
link |
solve it at the same speed that we do solve
link |
new problems when I define
link |
a new object and you have to recognize it
link |
when you start playing a new game
link |
you played all the Atari games but now you play a new Atari game
link |
well, you're going to be
link |
pretty quickly pretty good at the game
link |
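The learning-to-learn idea above, one algorithm whose learned initialization transfers to new tasks instead of being thrown away, can be sketched with a toy Reptile-style loop. This is a minimal illustration, not anything from AlphaStar: the task family (lines sharing a bias of 3), step sizes, and iteration counts are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # A toy task family: y = a*x + 3, with the slope `a` varying per task.
    # The shared bias of 3 is the structure meta-learning can pick up.
    a = rng.uniform(-2.0, 2.0)
    x = rng.uniform(-1.0, 1.0, size=50)
    return x, a * x + 3.0

def adapt(w, x, y, steps=5, lr=0.1):
    # A few gradient steps of MSE on one task (the "inner loop").
    w = w.copy()
    for _ in range(steps):
        err = w[0] * x + w[1] - y
        w[0] -= lr * np.mean(2 * err * x)
        w[1] -= lr * np.mean(2 * err)
    return w

def loss(w, x, y):
    return float(np.mean((w[0] * x + w[1] - y) ** 2))

# Reptile-style outer loop: nudge a shared init toward each task's
# adapted weights, so the init encodes what all tasks have in common
# instead of overfitting to any single task.
meta_w = np.zeros(2)
for _ in range(200):
    x, y = sample_task()
    adapted = adapt(meta_w, x, y, steps=10)
    meta_w += 0.5 * (adapted - meta_w)

# On a brand-new task, a few steps from the meta-learned init should
# beat the same number of steps from scratch.
x, y = sample_task()
scratch_loss = loss(adapt(np.zeros(2), x, y), x, y)
meta_loss = loss(adapt(meta_w, x, y), x, y)
print(meta_loss, scratch_loss)
```

The same outer loop never sees the evaluation task; only the speed of adaptation is shared, which is the benchmark property described above.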
what's the domain and what's the exact benchmark
link |
it's a bit difficult, I think as a community
link |
we might need to do some work to define it
link |
but I think this first step
link |
I could see it happen relatively soon
link |
but then the whole
link |
what AGI means and so on
link |
I am a bit more confused about
link |
what I think people mean different things
link |
there's an emotional psychological level
link |
like even the Turing test, passing the Turing test
link |
is something that we just pass judgment
link |
on as human beings what it means to be
link |
like what level, what does it mean
link |
but I like the generalization
link |
and maybe as a community we converge towards
link |
a group of domains
link |
that are sufficiently far away
link |
that would be really damn impressive
link |
if we're able to generalize
link |
so perhaps not as close as Protoss and Zerg
link |
but like Wikipedia
link |
that would be a good step
link |
and then a really good step
link |
but then from StarCraft to Wikipedia
link |
that kind of thing
link |
and that feels also quite hard and far
link |
as long as you put the benchmark out
link |
as we discovered for instance with ImageNet
link |
then tremendous progress can be had
link |
so I think maybe there's a lack of such a benchmark,
link |
but I'm sure we'll find one and the community
link |
will then work towards that
link |
and then beyond what AGI might mean
link |
I really am hopeful to see
link |
basically machine learning
link |
or AI just scaling up
link |
helping people that might not have the resources
link |
to hire an assistant
link |
they might not even know what the weather is like
link |
but so I think there's
link |
in terms of the impact
link |
the positive impact of AI
link |
I think that's maybe what we should also focus on.
link |
the research community building AGI
link |
that's a real nice goal
link |
and I think the way that DeepMind puts it
link |
is to solve intelligence and then use it to solve everything else,
link |
so I think we should parallelize.
link |
yeah we shouldn't forget
link |
of all the positive things that are actually
link |
coming out of AI already and are going
link |
and then let me ask
link |
relative to popular perception
link |
do you have any worry about the existential
link |
threat of artificial intelligence
link |
in the near or far future
link |
that some people have
link |
I think in the near future
link |
I'm skeptical so I hope
link |
but I appreciate efforts
link |
and even like a whole research field on
link |
AI safety emerging and in conferences
link |
and so on I think that's great
link |
having the benefits outweigh the potential dangers,
link |
I am hopeful for that
link |
remain vigilant to kind of monitor
link |
and assess whether the tradeoffs
link |
are there and we have
link |
also lead time to prevent
link |
or to redirect our efforts
link |
but I'm quite optimistic
link |
about the technology
link |
and definitely more fearful
link |
of other threats in terms of
link |
at this point but obviously
link |
that's the one I kind of have more
link |
power on so clearly
link |
start thinking more and more about this
link |
it's grown in me actually to
link |
start reading more about AI safety
link |
which is a field that so far I have not
link |
really contributed to but maybe
link |
there's something to be done there as well
link |
I think it's really important
link |
I talk about this with a few folks
link |
but it's important to ask you
link |
and shove it in your head because you're at the
link |
leading edge of actually
link |
what people are excited about in AI
link |
I mean the work with AlphaStar
link |
at the very cutting edge of the kind
link |
of thing that people are afraid of
link |
and so you speaking
link |
that we're actually quite far away
link |
to the kind of thing that people might be
link |
afraid of but it's still
link |
worthwhile to think about
link |
and it's also good that you're
link |
that you're not as worried
link |
and you're also open to
link |
I mean there's two aspects
link |
I mean me not being worried but obviously
link |
for things that could
link |
go wrong, misuse of the technologies
link |
as with any technologies
link |
there's always tradeoffs
link |
and as a society we've kind of
link |
solved this to some extent
link |
in the past so I'm hoping that
link |
by having the researchers
link |
and the whole community
link |
brainstorm and come up with
link |
interesting solutions to the new things
link |
that will happen in the future
link |
that we can still also push the research
link |
to the avenue that
link |
I think is kind of the greatest avenue
link |
understand intelligence, right? How are we doing
link |
what we're doing and
link |
obviously from a scientific standpoint
link |
that is kind of the drive
link |
my personal drive of
link |
all the time that I spend doing
link |
what I'm doing really.
link |
Where do you see the deep learning as a field heading
link |
where do you think the next big
link |
breakthrough might be?
link |
So I think deep learning
link |
I discussed a little of this before
link |
deep learning has to be
link |
combined with some form of discretization
link |
I think that's kind of as a research
link |
in itself is an interesting topic
link |
to expand and start doing more research
link |
as kind of what will deep learning
link |
enable to do in the future
link |
I don't think that's going to be what's going to happen
link |
this year, but also this idea of
link |
not throwing away all the weights,
link |
this idea of learning to learn
link |
not having to restart their weights
link |
and you can have an agent
link |
that is kind of solving
link |
or classifying images on ImageNet
link |
but also generating speech
link |
if you ask it to generate some speech
link |
and it should really be kind of
link |
might not be a neural network it might be a neural network
link |
with an optimization algorithm
link |
attached to it but I think this idea
link |
of generalization to new tasks
link |
is something that we first
link |
must define good benchmarks but then
link |
I think that's going to be exciting
link |
and I'm not sure how close we are
link |
but I think there's
link |
if you have a very limited domain
link |
I think we can start doing some progress
link |
much like how we made a lot of progress
link |
in computer vision we should start thinking
link |
I really like a talk that
link |
Léon Bottou gave at ICML
link |
a few years ago which is
link |
this train test paradigm should be broken
link |
thinking about a training test
link |
sorry a training set and a test set
link |
and these are closed
link |
things that are untouchable
link |
I think we should go beyond these and
link |
in meta learning we call these the meta training set
link |
and the meta test set which is
link |
really thinking about
link |
if I know about ImageNet
link |
why would that network
link |
not work on MNIST which is a much
link |
simpler problem but right now it really doesn't
link |
but it just feels wrong right so I think
link |
that's kind of where, on the application
link |
or the benchmark side, we probably
link |
will see quite a few
link |
more interest and progress and hopefully
link |
people defining new
link |
and exciting challenges really
link |
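One concrete way to break the closed train/test paradigm discussed above is to hold out whole tasks rather than individual samples, the meta-training set versus meta-test set split. Here is a small sketch in the style of few-shot benchmarks; the class names, dataset, and episode sizes are all illustrative assumptions.

```python
import random

# A labeled dataset: class name -> list of examples (strings stand in
# for images or other data).
dataset = {
    f"class_{i}": [f"class_{i}_example_{j}" for j in range(20)]
    for i in range(10)
}

# Hold out whole classes, not samples: tasks built from meta-test
# classes are never seen during meta-training.
classes = sorted(dataset)
meta_train_classes, meta_test_classes = classes[:7], classes[7:]

def sample_episode(class_pool, n_way=3, k_shot=5, n_query=5, rng=None):
    """Build one few-shot task: an N-way classification problem with a
    small support set to adapt on and a query set to evaluate on."""
    rng = rng or random.Random(0)
    task_classes = rng.sample(class_pool, n_way)
    support, query = [], []
    for label, cls in enumerate(task_classes):
        examples = rng.sample(dataset[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

rng = random.Random(42)
# Meta-training consumes episodes from the meta-train classes...
train_support, train_query = sample_episode(meta_train_classes, rng=rng)
# ...while meta-test episodes measure adaptation to genuinely new tasks.
test_support, test_query = sample_episode(meta_test_classes, rng=rng)
```

Under this split, "generalization" means doing well on the query set of a meta-test episode after adapting only on its support set, which is exactly the ImageNet-to-MNIST kind of transfer the passage above asks for.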
do you have any hope or
link |
interest in knowledge graphs
link |
within this context so it's kind of
link |
constructing graphs
link |
going back to graphs
link |
well neural networks are graphs but I mean
link |
a different kind of knowledge graph
link |
sort of like semantic graphs
link |
where there's concepts
link |
the idea of graphs
link |
is so I've been quite interested
link |
in sequences first and then more
link |
interesting or different data structures
link |
I've studied graph neural networks
link |
in the last three years or so
link |
found these models just very interesting from
link |
like deep learning
link |
standpoint but then
link |
how what do we want
link |
why do we want these models and why would we
link |
use them what's the application
link |
what's kind of the killer application of graphs
link |
Maybe you could extract a knowledge graph
link |
from Wikipedia automatically
link |
that would be interesting because
link |
then these graphs have
link |
this very interesting structure
link |
that also is a bit more compatible with
link |
this idea of programs and
link |
deep learning kind of working together
link |
jumping between neighborhoods and so on.
link |
you could imagine defining some primitives
link |
to go around graphs right so
link |
I really like the idea of a knowledge graph.
link |
we started or you know
link |
as part of the research we did for StarCraft
link |
I thought, wouldn't it be cool to give the network
link |
all these buildings that depend on each other
link |
and units that have
link |
prerequisites to be built, and so
link |
this is information
link |
that the network can learn and extract
link |
but it would have been great to see
link |
really StarCraft as a giant graph
link |
that even also as the game evolves
link |
you kind of start taking branches
link |
and so on and we tried
link |
to do a little bit of research on this
link |
nothing too relevant
link |
but I really like the idea
link |
and it has elements that are
link |
which is something you've also worked with in terms of visualizing
link |
your networks as elements of
link |
having human interpretable
link |
being able to generate knowledge
link |
representations that are human interpretable
link |
that maybe human experts can then tweak
link |
or at least understand
link |
so there's a lot of interesting
link |
aspect there and for me personally I'm just a huge fan of
link |
Wikipedia and it's a shame
link |
that our neural networks
link |
aren't taking advantage of all the structured
link |
knowledge that's on the web.
link |
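The tech-tree idea above, buildings that depend on each other and units with prerequisites, is naturally a dependency graph. A minimal sketch, using a simplified and illustrative subset of Protoss structures rather than the full game rules:

```python
from collections import deque

# A rough, simplified slice of Protoss build prerequisites
# (illustrative only, not the exact game rules): each structure
# lists what must exist before it can be built.
tech_tree = {
    "Nexus": [],
    "Pylon": ["Nexus"],
    "Gateway": ["Pylon"],
    "Cybernetics Core": ["Gateway"],
    "Twilight Council": ["Cybernetics Core"],
    "Stargate": ["Cybernetics Core"],
    "Templar Archives": ["Twilight Council"],
}

def build_order(graph):
    """Kahn's algorithm: a topological order in which every structure
    appears only after all of its prerequisites."""
    indegree = {node: len(prereqs) for node, prereqs in graph.items()}
    dependents = {node: [] for node in graph}
    for node, prereqs in graph.items():
        for p in prereqs:
            dependents[p].append(node)
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in dependents[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(graph):
        raise ValueError("cyclic prerequisites")
    return order

order = build_order(tech_tree)
print(order)
```

A network could in principle learn these constraints from play, but making the graph explicit is what lets a human expert read, verify, or tweak it, which is the interpretability point made above.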
What's next for you?
link |
What's next for DeepMind?
link |
What are you excited about?
link |
the obvious next steps
link |
apply AlphaStar to
link |
other races. I mean, that sort of
link |
shows that the algorithm is general, that we didn't
link |
by mistake build something in the architecture
link |
that happens to work for Protoss
link |
but not for other races right so
link |
as verification I think
link |
that's an obvious next step that we are working on
link |
then I would like to see
link |
so, agents and players have
link |
different skill sets that allow them to be
link |
very good. I think we've seen
link |
AlphaStar understanding
link |
very well when to take battles and when not to,
link |
also very good at micromanagement
link |
and moving the units around and so on
link |
and also very good at producing
link |
nonstop, and trading off economy
link |
with building units
link |
but perhaps I haven't seen as much as I would like of
link |
this poker idea
link |
that you mentioned right.
link |
I'm not sure StarCraft or AlphaStar
link |
rather has developed a very
link |
deep understanding of
link |
what the opponent is doing
link |
and reacting to that and sort of
link |
trick the player into doing something else, or that
link |
you know so this kind of reasoning
link |
I would like to see more so I think
link |
purely from a research standpoint
link |
there's perhaps also quite a few
link |
things to be done there
link |
in the domain of StarCraft. Yeah in the
link |
domain of games I've seen some
link |
interesting work in sort of
link |
even in auctions, manipulating
link |
other players sort of forming a belief
link |
state and just messing with
link |
people. Yeah it's called theory of mind
link |
theory of mind and StarCraft
link |
is kind of they're really
link |
made for each other so
link |
that would be very exciting to see
link |
those techniques applied to StarCraft
link |
or perhaps StarCraft driving
link |
new techniques as I said
link |
this is always the tension between the two.
link |
Wow. Oriol, thank you so much for talking.
link |
awesome it was great to be here thanks