Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that, he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence-to-sequence learning, audio generation, image captioning, neural machine translation, and, of course, reinforcement learning. He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman. And now, here's my conversation with Oriol Vinyals.
You spearheaded the DeepMind team behind AlphaStar that recently beat a top professional player at StarCraft. So you have an incredible wealth of work in deep learning and a bunch of fields, but let's talk about StarCraft first. Let's go back to the very beginning, even before AlphaStar, before DeepMind, before deep learning. What came first for you, a love for programming or a love for video games?
I think for me, the drive to play video games definitely came first. I really liked computers. I didn't really code much, but what I would do is just mess with the computer, break it and fix it. That was the level of skill, I guess, that I gained in my very early days, when I was 10 or 11. And then I really got into video games, especially StarCraft, actually, the first version. I spent most of my time just playing pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not a very big scene compared to what's nowadays called esports.
Right, of course, in the '90s. So how'd you get into StarCraft? What was your favorite race? How did you develop your skill? What was your strategy? All that kind of thing.
So as a player, I tended to not play many games, so as not to disclose the strategies that I had developed. And I liked to play random, actually, not in competitions, but just to... In StarCraft there are three main races, and I found it very useful to play with all of them. So I would choose random many times, even sometimes in tournaments, to gain skill with the three races. Because it's not only how you play against someone: if you understand a race because you've played it, you also understand what's annoying, so when you're on the other side, you know what to do to annoy that person, to try to gain advantages here and there, and so on. So I actually played random, although I must say, in terms of favorite race, I really liked Zerg. I was probably best at Zerg, and that's probably what I tended to use towards the end of my career, before starting university.
So let's step back a little bit. Could you try to describe StarCraft to people who may never have played video games, especially the massively online variety like StarCraft?
So StarCraft is a real-time strategy game. And the way to think about StarCraft, perhaps if you understand a bit of chess, is that there's a board, which is called the map, where people play against each other. There are obviously many ways you can play, but the most interesting one is the one-versus-one setup, where you just play against someone else, or even the built-in AI: Blizzard put in a system that can play the game reasonably well if you don't know how to play. And then on this board you have, again, pieces like in chess, but these pieces are not there initially like they are in chess. You actually need to gather resources and decide which pieces to build. So in a way, you're starting almost with no pieces. You start gathering resources; in StarCraft, there are minerals and gas that you can gather. And then you must decide how much you want to focus, for instance, on gathering more resources, or on starting to build units, or pieces. And then once you have enough pieces, or a good attack composition, you go and attack the other side of the map. Now, the other main difference from chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. It's what we call partially observable. So as a result, you must not only trade off economy versus building your own units; you must also decide whether you want to scout to gather information. But by scouting, you might be giving away information that you might want to hide from the enemy. So there's a lot of complex decision making. Also, unlike chess, this is not a turn-based game. You play basically all the time, continuously, and thus skill in terms of speed and accuracy of clicking is also very important. And people who train for this really play the game at an amazing skill level. I've seen it many times, and if you can witness it live, it's really, really impressive. So in a way, it's kind of a chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources to basically get some money to build other buildings, pieces, technology, and so on.
From the perspective of a human player, the difference between that and chess, or a turn-based strategy game like Heroes of Might and Magic, is that there's an anxiety, because you have to make these decisions really quickly. And if you're not actually aware of what decisions work, it's a very stressful balance. Everything you describe is actually quite stressful and difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, if they're fully aware of what they have to do, but at the amateur level there's this anxiety: oh crap, I'm being attacked; oh crap, I have to build up resources; oh, I probably have to expand. All of this, the real-time strategy aspect, is really stressful, and computationally, I'm sure, difficult. We'll get into it. But for me, Battle.net... So StarCraft was released in '98, 20 years ago, which is hard to believe. And Blizzard's Battle.net came out with Diablo in '96. And to me, and it might be a narrow perspective, it changed online gaming and perhaps society forever. That may be way too narrow a viewpoint, but from your perspective, can you talk about the history of gaming over the past 20 years? How transformational, how important, is this line of games?
Right, so I was an active gamer while all of this was developing, the internet and online gaming. The way it came about for me was that I played other strategy games. I played a bit of Command & Conquer, and then I played Warcraft II, which is from Blizzard. But at the time, I didn't know or understand what Blizzard was or anything. Warcraft II was just a game, which was actually very similar to StarCraft in many ways. It's also a real-time strategy game where there are orcs and humans, so there are only two races. But it was offline. And it was offline, right? So I remember a friend of mine came to school saying, oh, there's this new cool game called StarCraft. And I just said, oh, this sounds like just a copy of Warcraft II, until I installed it. And at the time, I'm from Spain, so we didn't have very good internet, right? So for us, StarCraft first became kind of an offline experience, where you start playing these missions, right? You play against some sort of scripted things to develop the story of the characters in the game. And then later on, I started playing against the built-in AI, and I thought it was impossible to defeat it. Then eventually you defeat one, and you can actually play against seven built-in AIs at the same time, which also felt impossible. But actually, it's not that hard to beat seven built-in AIs at once. So once we achieved that, we also discovered that we could play... As I said, the internet wasn't that great, but we could play over LAN, right? Basically against each other, if we were in the same place, because you could just connect machines with cables. So we started playing in LAN mode as a group of friends, and it was really much more entertaining than playing against AIs. And later on, as the internet started to develop and become a bit faster and more reliable, that's when I started experiencing Battle.net, which is this amazing universe, not only because you can play the game against anyone in the world, but also because you get to know more people. You just get exposed to this vast variety of... it's a bit like when chats came about, right? There was a chat system. You could play against people, but you could also chat with people, not only about StarCraft, but about anything. And that became a way of life for a couple of years. And obviously then it exploded for me, in that I started to play more seriously, going to tournaments, and so on and so forth.
Do you have a sense, on a societal, sociological level, of this whole part of society that many of us are not aware of, and it's a huge part of society, which is gamers? I mean, every time I come across it on YouTube or streaming sites... there's a huge number of people who play games religiously. Do you have a sense of those folks, especially now that you've returned to that realm a little bit on the AI side?
Yeah, so in fact, even after StarCraft, I actually played World of Warcraft, which is maybe the main sort of online world, or presence, where you get to interact with lots of people. So I played that for a little bit. To me, it was a bit less stressful than StarCraft, because winning was kind of a given. You're just put in this world, and you can always complete missions. But I think it was actually the social aspect, of StarCraft first and then games like World of Warcraft, that really shaped me in very interesting ways, because what you get to experience is just people you wouldn't usually interact with, right? Even nowadays, I still have many Facebook friends from the era when I played online, and their ways of thinking are different, even politically. We don't interact in the real world, but we were connected, basically by fiber. And that way I actually got to understand a bit better that we live in a diverse world. And these were connections that were made because, you know, I happened to go into a city, a virtual city, as a priest, and I met this warrior, and we became friends, and then we started playing together, right? So I think it's transformative, and more and more people are aware of it. I mean, it's becoming quite mainstream, but back in the day, as you were saying, in 2000, 2005, it was still a very strange thing to do, especially in Europe. I think there were exceptions, like Korea, for instance. It was amazing that everything happened so early there, in terms of cybercafes. If you go to Seoul, it's a city where, back in the day, you could be a celebrity by playing StarCraft, but this was like '99, 2000, right? It's not like it's recent. So yeah, it's quite interesting to look back, and yeah, I think it's changing society, the same way, of course, that technology and social networks and so on are also transforming things.
And on a quick tangent, let me ask: you're also one of the most productive people in your particular chosen passion and path in life, and yet you also appreciate and enjoy video games. Do you think it's possible to enjoy video games in moderation?
Someone told me that you could choose two out of three. When I was playing video games, you could choose having a girlfriend, playing video games, or studying. And I think, for the most part, it was relatively true. These things do take time. Games like StarCraft, if you take the game pretty seriously and you want to study it, then you obviously will dedicate more time to it. And I definitely took gaming, and obviously studying, very seriously. I love learning, science, et cetera. So for me, especially when I started my university undergrad, I kind of stepped away from StarCraft. I actually fully stopped playing. And then World of Warcraft was a bit more casual. You could just connect online. And I mean, it was fun, but as I said, it was not as much of a time investment as StarCraft was for me.
Okay, so let's get into AlphaStar. You're behind the team; DeepMind has been working on StarCraft and released a bunch of cool open-source agents and so on over the past few years. But AlphaStar really is the moment where, for the first time, you beat a world-class player. So what are the parameters of the challenge, in the way that AlphaStar took it on, and how did you, David, and the rest of the DeepMind team get into it, considering you could even beat the best in the world?
I think it all started back in 2015. Actually, I'm lying, I think it was 2014, when DeepMind was acquired by Google. And I at the time was at Google Brain, which was in California, and is still in California. We had this summit where the two groups, Google Brain and Google DeepMind, got together, and we gave a series of talks. And given that they were doing deep reinforcement learning for games, I decided to bring up part of my past, which I had developed at Berkeley: this thing we called the Berkeley Overmind, which is really just a StarCraft I bot, right? So I talked about that, and I remember Demis came to me and said, well, maybe not now, it's perhaps a bit too early, but you should just come to DeepMind and do this again with deep reinforcement learning, right? And at the time it sounded very science fiction, for several reasons. But then in 2016, when I actually moved to London and joined DeepMind, transferring from Brain, it became apparent that, because of the AlphaGo moment, and Blizzard reaching out to us to ask, do you want the next challenge?, and also me being full time at DeepMind, all of this came together. And then I went to Irvine, in California, to the Blizzard headquarters, just to chat with them and try to explain how it would all work before doing anything. And the approach has always been about the learning perspective, right? At Berkeley, we did a lot of rule-based conditioning: if you have more than three units, then go attack; if the other player has more units than me, retreat; and so on and so forth. And of course, the point of deep reinforcement learning, deep learning, machine learning in general, is that all of this should be learned behavior. So that was kind of the DNA of the project since its inception in 2016, when we didn't even have an environment to work with yet. And so that's how it all started, really.
So if you go back to that conversation with Demis, or even in your own head, how far away did you think you were? Because we're talking about Atari games, we're talking about Go, which are, if you're honest about it, really far away from StarCraft. Well, now that you've beaten it, maybe you could say it's close, but StarCraft seems way harder than Go, philosophically and mathematically speaking. So how far away did you think you were? Did you think that in 2018, 2019, you could be doing as well as you have?
Yeah, when I thought about, okay, I'm going to dedicate a lot of my time and focus on this, and obviously I do a lot of different research, so to spend time on it, I really had to think there was going to be something good coming out of it. So really, I thought, well, this sounds impossible. And it probably is impossible to do the full thing, the full game where you play one versus one, and it's only a neural network playing, and so on. It really felt like... I just didn't even think it was possible. But on the other hand, I could see some stepping stones towards that goal. Clearly you could define sub-problems in StarCraft, dissect it a bit, and say, okay, here's one part of the game, here's another part. And also, and this was really critical to me, there was the fact that we could access human replays, right? Blizzard was very kind, and in fact they open-sourced these for the whole community. It's not every single StarCraft game ever played, but it's a lot of them, and you can just go and download them. And every day you can query the data set and say, well, give me all the games that were played today. And given my experience with language and sequences and supervised learning, I thought, well, that's definitely going to be very helpful, and something quite unique, because never before had we had such a large data set of replays, of people playing at this scale, for such a complex video game, right? So that to me was a precious resource, and as soon as I knew that Blizzard was able to give this to the community, I started to feel positive about something non-trivial happening. But I also thought the full thing, with really no rules, no single line of code that says, well, if you see this unit, build a detector, not having any of these specializations, seemed really, really, really difficult to me.
I do also like that Blizzard was teasing, or even trolling you, sort of pulling you into this really difficult challenge. Do they have any awareness? What's the interest from the perspective of Blizzard, beyond just curiosity?
Yeah, I think Blizzard has really understood and really brought forward this competitiveness of esports in games. StarCraft really sparked something that had almost never been seen before, especially, as I was saying, back in Korea. So they probably thought, well, this is such a pure one-versus-one setup that it would be great to see whether something that can play Atari or Go, and then later on chess, could even tackle this kind of complex real-time strategy game, right? So first, they wanted to see whether it was possible, whether the game they created was in a way solvable. And on the other hand, I think they're also a pretty modern company that innovates a lot. So for them, starting to understand AI, and how to bring AI into games, is not games for AI, but AI for games, right? I mean, both ways, I think, can work. We at DeepMind obviously use games for AI, to drive AI progress, but Blizzard, and many other companies, might actually start to understand and do the opposite. So I think that's also something they can get out of this, and we have definitely brainstormed a lot about this, right?
But one of the interesting things to me about StarCraft and Diablo, and these games that Blizzard has created, is the task of balancing classes, for example: making the game fair from the starting point and then letting skill determine the outcome. Can you first comment: there are three races, Zerg, Protoss, and Terran. I don't know if I've ever said that out loud. Is that how you pronounce it? Yeah, I don't think I've ever interacted in person with anybody about StarCraft; that's funny. So they seem to be pretty balanced. I wonder if the AI, the work that you're doing with AlphaStar, would help balance them even further. Is that something you think about? Is that something Blizzard is thinking about?
Right, so balancing when you add a new unit or a new spell type is obviously possible, given that you can always train, or pre-train at scale, some agent that might start using it in unintended ways. But actually, if you look at how StarCraft has co-evolved with its players, I think it's very cool, the many things and strategies that people came up with, right? We've seen it over and over in StarCraft: Blizzard comes up with maybe a new unit, and then some players get creative and do something unintended, something the Blizzard designers simply didn't test or think about. And then, after that becomes mainstream in the community, Blizzard patches the game, and they maybe weaken that strategy, or make it actually more interesting but a bit more balanced. So this kind of continual dialogue between players and Blizzard is what has defined most of their games, actually: in StarCraft, but also in World of Warcraft, they would do that. There are several classes, and it would not be good if everyone played absolutely the same class, and so on, right? So I think they do care about balancing, of course, and they do a fair amount of testing, but it's also beautiful to see how players get creative anyway. And whether AI can be more creative at this point, I don't think so, right? I mean, sometimes something so amazing happens. I remember, back in the day, there were these drop ships that could drop Reavers, and it was never anticipated that you could drop this unit, which has what's called splash damage, and basically eliminate all the enemy's workers at once. No one thought that you could actually bring them in really early in the game, do that kind of damage, and then change the game. But I don't know, I think it's quite an amazing exploration process from both sides, players and Blizzard alike.
Well, it's almost like reinforcement learning exploration, but the number of humans who play Blizzard games is almost on the scale of a large-scale DeepMind RL experiment. I mean, if you look at the numbers, I don't know how many games, but it's probably hundreds of thousands of games a month. So it's almost the same as running RL agents.
What aspect of the problem of StarCraft do you think is the hardest? Is it, like you said, the imperfect information? Is it the fact that you have to do long-term planning? Is it the real-time aspect, that you have to do stuff really quickly? Is it the fact that there's a large action space, so you can do so many possible things? Or is it that, in the game-theoretic sense, there is no Nash equilibrium, or at least you don't know what the optimal strategy is, because there are way too many options? Is there something that stands out as just the hardest, the most annoying thing?
So when we looked at the problem and started to define its parameters, right, what are the observations, what are the actions, it became very apparent that the very first barrier one would hit in StarCraft comes from the action space being so large, and from not being able to search like you could in chess or Go, even though their search spaces are also vast. The main problem we identified was that of exploration, right? Think about StarCraft without any sort of human knowledge or human prior, and recall how deep reinforcement learning algorithms work, which is essentially by issuing random actions and hoping they will sometimes get some wins, so they can learn. If you think of the action space in StarCraft, almost anything you can do in the early game is bad, because any action involves taking workers, which are mining minerals for free. That's something the game does automatically: it sends them to mine. And you would immediately just take them out of mining and send them around. So just think how it's going to be possible to get to understand these concepts, and even harder ones, like expanding, right? There are these buildings you can place at other locations on the map to gather more resources, but the location of the building is important, and you have to select a worker, send it walking to that location, build the building, wait for the building to be built, and then put extra workers there so they start mining. It feels impossible to produce that desirable state just by clicking randomly, the state you could then hope to learn from, because eventually it may yield an extra win, right? So for me it was the exploration problem, due to the action space and the fact that there aren't really turns; or rather, there are so many turns, because the game essentially ticks 22 times per second. I mean, that's how they discretize time; obviously you always have to discretize time, there's no such thing as truly real time. But it's really a lot of time steps at which things could go wrong. And that definitely felt, a priori, like the hardest part. You mentioned many good ones. I think partial observability, and the fact that there is no perfect strategy because of the partial observability, are very interesting problems, and we're starting to see them more and more now, as we solve the previous ones. But the core problem to me was exploration, and solving it has basically been the focus, and how we saw the first breakthroughs.
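As a back-of-the-envelope illustration of the exploration problem described above, consider a toy model (not the actual AlphaStar setup) where a uniform-random policy must emit one specific multi-step action sequence, like the select-worker, walk, build, wait, re-assign sequence of an expansion. The numbers here, 100 available actions per decision and a 5-step plan, are made-up assumptions for illustration; the real game is far more extreme.

```python
# Toy model of why random exploration fails in a huge action space.
# Hypothetical numbers: each decision point offers `n_actions` choices,
# and a useful plan like "expand" needs `plan_length` specific choices
# made in the right order.

def random_plan_success_prob(n_actions: int, plan_length: int) -> float:
    """Chance that a uniform-random policy emits one exact action sequence."""
    return (1.0 / n_actions) ** plan_length

def expected_episodes_until_success(n_actions: int, plan_length: int) -> float:
    """Mean number of independent attempts before the sequence appears once
    (geometric distribution)."""
    return 1.0 / random_plan_success_prob(n_actions, plan_length)

p = random_plan_success_prob(n_actions=100, plan_length=5)
print(p)                                        # ~1e-10: essentially never
print(expected_episodes_until_success(100, 5))  # ~1e10 attempts on average
```

At 22 game steps per second, even a short game contains thousands of decision points, so the reward signal from random play is vanishingly sparse; this is the motivation, discussed earlier in the conversation, for bootstrapping from human replays rather than from purely random behavior.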
So exploration in a multi-hierarchical way: at 22 times a second, exploration has a very different meaning than it does at the level of, should I gather resources early, or should I wait, and so on. So how do you solve the long-term part? Let's talk about the internals of AlphaStar. First of all, how do you represent the state of the game as an input? How do you then do the long-term sequence modeling? How do you build a policy? What's the architecture like?
So AlphaStar obviously has several components, but everything passes through what we call the policy, which is a neural network. And that's kind of the beauty of it: I could just hand you a neural network now, and if you fed it the right observations, and understood the actions the same way we do, you would have the agent playing the game. There's absolutely nothing else needed other than those trained weights. Now, the first step is observing the game, and we've experimented with a few alternatives. The one we currently use mixes both spatial inputs, sort of images that you process from the game, that is, a zoomed-out version of the map and also a zoomed-in version of the camera, or the screen, as we call it, and also the list of units the agent sees, more as a set of objects that it can operate on. Using that list is not strictly necessary; we have versions of the agent that play well without this set view, which is a bit unlike how humans perceive the game. But it certainly helps a lot, because a very natural way to encode the game is to just look at all the units there are. They have properties like health, position, unit type, and whether it's my unit or the enemy's. And that is kind of the summary of the state of the game: that list, or set, of units that you see all the time.
But that's pretty close to the way humans see the game. Why do you say it's not? Are you saying the exactness of it is not similar to humans?
The exactness of it is perhaps not the problem. I guess the issue, if you look at how humans actually play the game, is that they play with a mouse, a keyboard, and a screen, and they don't see a structured object with all the units. What they see is what's on the screen, right?
Sorry to interrupt, but remember, there's a plot you showed with the camera-based agent, where you do exactly that, right? You move around, and that seems to converge to similar performance.
Yeah, I think that's what I,
link |
we're kind of experimenting with what's necessary or not,
link |
but using the set.
link |
So, actually, if you look at research in computer vision,
link |
where it makes a lot of sense to treat images
link |
as two dimensional arrays,
link |
there's actually a very nice paper from Facebook.
link |
I think, I forgot who the authors are,
link |
but I think it's part of Kaiming He's group.
link |
And what they do is they take an image,
link |
which is this two dimensional signal,
link |
and they actually take pixel by pixel
link |
and scramble the image as if it was just a list of pixels.
link |
Crucially, they encode the position of the pixels
link |
with the X, Y coordinates.
link |
And this is just kind of a new architecture,
link |
which we incidentally also use in StarCraft
link |
called the Transformer,
link |
which is a very popular paper from last year,
link |
which yielded very nice results in machine translation.
link |
And if you actually believe in this kind of,
link |
oh, it's actually a set of pixels,
link |
as long as you encode X, Y, it's okay,
link |
then you could argue that the list of units that we see
link |
is precisely that,
link |
because we have each unit as a kind of pixel, if you will,
link |
and then their X, Y coordinates.
link |
So in that perspective, we, without knowing it,
link |
we use the same architecture that was shown
link |
to work very well on Pascal and ImageNet and so on.
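The unit-list encoding he describes lines up with plain self-attention over a set of feature vectors, where x and y are just two more features per unit. A minimal sketch (the feature layout, dimensions, and random weights are illustrative, not AlphaStar's actual network):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(units, d_k=8, seed=0):
    """One self-attention layer over a *set* of unit feature vectors.

    Each row of `units` is one unit: [health, x, y, unit_type, is_mine].
    Because x and y are ordinary features, no 2-D grid is needed:
    the set of units is treated like a bag of (pixel, x, y) tokens.
    """
    rng = np.random.default_rng(seed)
    d = units.shape[1]
    W_q, W_k, W_v = (rng.standard_normal((d, d_k)) * 0.1 for _ in range(3))
    Q, K, V = units @ W_q, units @ W_k, units @ W_v
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # (n_units, n_units) attention
    return weights @ V                          # permutation-equivariant output

# Toy state: three units, features [health, x, y, unit_type, is_mine]
state = np.array([[100.,  5.,  9., 1., 1.],
                  [ 60., 40., 22., 2., 0.],
                  [ 80., 41., 20., 2., 0.]])
print(self_attention(state).shape)  # (3, 8)
```

Because the layer has no positional grid, permuting the order of the units just permutes the output rows, which is exactly why a set encoding works here.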
link |
So the interesting thing here is putting it in that way
link |
it starts to move it towards
link |
the way you usually work with language.
link |
So what, and especially with your expertise
link |
and work in language,
link |
it seems like there's echoes of a lot of
link |
the way you would work with natural language
link |
in the way you've approached AlphaStar.
link |
What's, does that help
link |
with the longterm sequence modeling there somehow?
link |
Exactly, so now that we understand
link |
what an observation for a given time step is,
link |
we need to move on to say,
link |
well, there's going to be a sequence of such observations
link |
and an agent will need to, given all that it's seen,
link |
not only the current time step, but all that it's seen, why?
link |
Because there is partial observability.
link |
We must remember whether we saw a worker going somewhere,
link |
for instance, right?
link |
Because then there might be an expansion
link |
on the top right of the map.
link |
So given that, what you must then think about is
link |
there is the problem of given all the observations,
link |
you have to predict the next action.
link |
And not only given all the observations,
link |
but given all the observations
link |
and given all the actions you've taken,
link |
predict the next action.
link |
And that sounds exactly like machine translation where,
link |
and that's exactly how kind of I saw the problem,
link |
especially when you are given supervised data
link |
or replays from humans,
link |
because the problem is exactly the same.
link |
You're translating essentially a prefix of observations
link |
and actions onto what's going to happen next,
link |
which is exactly how you would train a model to translate
link |
or to generate language as well, right?
link |
Do you have a certain prefix?
link |
You must remember everything that comes in the past
link |
because otherwise you might start having incoherent text.
link |
And the same architectures, we're using LSTMs
link |
and transformers to operate across time
link |
to kind of integrate all that's happened in the past.
link |
Those architectures that work so well in translation
link |
or language modeling are exactly the same
link |
as what the agent is using to issue actions in the game.
link |
And the way we train it, moreover, for imitation,
link |
which is step one of AlphaStar is,
link |
take all the human experience and try to imitate it,
link |
much like you try to imitate translators
link |
that translated many pairs of sentences
link |
from French to English say,
link |
that sort of principle applies exactly the same.
link |
It's almost the same code, except that instead of words,
link |
you have a slightly more complicated objects,
link |
which are the observations and the actions
link |
are also a bit more complicated than a word.
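The translation analogy can be made concrete with a toy recurrent policy: condition on the prefix of observations and previously issued actions, and score the next action, exactly as a language model scores the next word. A minimal sketch with toy dimensions and random weights (nothing here is from the real system):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def imitation_nll(observations, actions, params, n_actions):
    """Average negative log-likelihood of a replay under a toy policy.

    This is the machine-translation setup: fold the prefix of
    observations AND previously issued actions into a recurrent
    memory, then predict the next action (teacher forcing).
    """
    W_h, W_x, W_out = params
    h = np.zeros(W_h.shape[0])
    prev_a = 0                                  # dummy "start" action, like <BOS>
    nll = 0.0
    for obs, a in zip(observations, actions):
        x = np.concatenate([obs, np.eye(n_actions)[prev_a]])
        h = np.tanh(W_h @ h + W_x @ x)          # memory across time steps
        p = softmax(W_out @ h)                  # distribution over next actions
        nll -= np.log(p[a])
        prev_a = a                              # teacher forcing on the replay
    return nll / len(actions)

rng = np.random.default_rng(0)
d_obs, d_h, n_actions, T = 4, 16, 5, 6
params = (rng.standard_normal((d_h, d_h)) * 0.1,
          rng.standard_normal((d_h, d_obs + n_actions)) * 0.1,
          rng.standard_normal((n_actions, d_h)) * 0.1)
obs_seq = rng.standard_normal((T, d_obs))
act_seq = rng.integers(0, n_actions, T)
print(imitation_nll(obs_seq, act_seq, params, n_actions))
```

Minimizing this loss over human replays is the imitation step; the only difference from translation is that the "words" are richer observation and action objects.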
link |
Is there a self play component then too?
link |
So once you run out of imitation?
link |
Right, so indeed you can bootstrap from human replays,
link |
but then the agents you get are actually not as good
link |
as the humans you imitated, right?
link |
So how do we imitate?
link |
Well, we take humans from 3000 MMR and higher.
link |
3000 MMR is just a metric of human skill
link |
and 3000 MMR might be like 50th percentile, right?
link |
So it's just average human.
link |
So maybe quick pause, MMR is a ranking scale,
link |
the matchmaking rating for players.
link |
So it's 3000, I remember there's like a master
link |
and a grand master, what's 3000?
link |
So 3000 is pretty bad.
link |
I think it's kind of gold level.
link |
It just sounds really good relative to chess, I think.
link |
Oh yeah, yeah, no, the ratings,
link |
the best in the world are at 7,000 MMR.
link |
So 3000, it's a bit like Elo indeed, right?
link |
So 3,500 just allows us to not filter a lot of the data.
link |
So we like to have a lot of data in deep learning
link |
as you probably know.
link |
So we take these kind of 3,500 and above,
link |
but then we do a very interesting trick,
link |
which is we tell the neural network
link |
what level they are imitating.
link |
So we say, this replay you're gonna try to imitate
link |
to predict the next action for all the actions
link |
that you're gonna see is a 4,000 MMR replay.
link |
This one is a 6,000 MMR replay.
link |
And what's cool about this is then we take this policy
link |
that is being trained from human,
link |
and then we can ask it to play like a 3000 MMR player
link |
by setting that rating, saying, well, okay,
link |
play like a 3000 MMR player
link |
or play like a 6,000 MMR player.
link |
And you actually see how the policy behaves differently.
link |
It gets worse economy if you play like a gold level player,
link |
it does less actions per minute,
link |
which is the number of clicks or number of actions
link |
that you will issue in a whole minute.
link |
And it's very interesting to see
link |
that it kind of imitates the skill level quite well.
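The trick he describes, telling the network the MMR of the replay it is imitating, amounts to conditioning the policy on a skill feature that you can then set freely at play time. A tiny sketch (the rating range and normalization are illustrative assumptions):

```python
import numpy as np

def condition_on_mmr(obs, mmr, mmr_min=3000.0, mmr_max=7000.0):
    """Append a normalized skill rating to the observation vector.

    At training time `mmr` is the rating of the replay being imitated;
    at play time you simply set it (e.g. 6000) to ask the policy to act
    like a player of that strength. The bounds here are illustrative.
    """
    z = (mmr - mmr_min) / (mmr_max - mmr_min)   # scale rating to [0, 1]
    return np.concatenate([obs, [z]])

x = condition_on_mmr(np.array([0.2, 0.5]), 5000.0)
print(x)  # last entry is the normalized rating, 0.5
```

The conditioned policy then learns statistically different behaviors (economy, actions per minute) for different rating values, which is what makes the "play like a 6,000 MMR player" knob possible.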
link |
But if we ask it to play like a 6,000 MMR player,
link |
we tested, of course, these policies to see how well they do.
link |
They actually beat all the built in AIs
link |
that Blizzard put in the game,
link |
but they're nowhere near 6,000 MMR players, right?
link |
They might be maybe around gold level, platinum, perhaps.
link |
So there's still a lot of work to be done for the policy
link |
to truly understand what it means to win.
link |
So far, we only asked them, okay, here is the screen.
link |
And that's what's happened on the game until this point.
link |
What would the next action be if we ask a pro to now say,
link |
oh, you're gonna click here or here or there.
link |
And the point is experiencing wins and losses
link |
is very important to then start to refine.
link |
Otherwise the policy can get loose,
link |
can just go off policy as we call it.
link |
That's so interesting that you can at least hope eventually
link |
to be able to control a policy
link |
approximately to be at some MMR level.
link |
That's so interesting, especially given that you have
link |
ground truth for a lot of these cases.
link |
Can I ask you a personal question?
link |
Well, I haven't played StarCraft II, so I am unranked,
link |
which is the kind of lowest league.
link |
So I used to play StarCraft, the first one.
link |
But you haven't seriously played StarCraft II.
link |
So the best player we have at DeepMind is about 5,000 MMR,
link |
which is high masters.
link |
It's not at grand master level.
link |
Grand master level will be the top 200 players
link |
in a certain region like Europe or America or Asia.
link |
But for me, it would be hard to say.
link |
I am very bad at the game.
link |
I actually played AlphaStar a bit too late and it beat me.
link |
I remember the whole team was, oh, Oriol, you should play.
link |
And I was, oh, it looks like it's not so good yet.
link |
And then I remember I kind of got busy
link |
and waited an extra week and I played
link |
and it really beat me very badly.
link |
Was that, I mean, how did that feel?
link |
Isn't that an amazing feeling?
link |
That's amazing, yeah.
link |
I mean, obviously I tried my best
link |
and I tried to also impress my,
link |
because I actually played the first game.
link |
So I'm still pretty good at micromanagement.
link |
The problem is I just don't understand StarCraft II.
link |
I understand StarCraft.
link |
And when I played StarCraft,
link |
I probably was consistently like for a couple of years,
link |
So I was decent, but at the time we didn't have
link |
this kind of MMR system as well established.
link |
So it would be hard to know what it was back then.
link |
So what's the difference in interface
link |
between AlphaStar and StarCraft
link |
and a human player in StarCraft?
link |
Is there any significant differences
link |
between the way they both see the game?
link |
I would say the way they see the game,
link |
there's a few things that are just very hard to simulate.
link |
The main one perhaps, which is obvious in hindsight
link |
is what's called cloaked units, which are invisible units.
link |
So in StarCraft, you can make some units invisible,
link |
and to detect them you need to have a particular kind of unit.
link |
So these units are invisible.
link |
If you cannot detect them, you cannot target them.
link |
So they would just destroy your buildings
link |
or kill your workers.
link |
But despite the fact you cannot target the unit,
link |
there's a shimmer that as a human you observe.
link |
I mean, you need to train a little bit,
link |
you need to pay attention,
link |
but you would see this kind of space time distortion
link |
and you would know, okay, there are, yeah.
link |
Yeah, there's like a wave thing.
link |
Yeah, it's called shimmer.
link |
Space time distortion, I like it.
link |
That's really like, the Blizzard term is shimmer.
link |
And so this shimmer, professional players
link |
actually can see immediately.
link |
They understand it very well,
link |
but it's still something that requires
link |
certain amount of attention
link |
and it's kind of a bit annoying to deal with.
link |
Whereas for AlphaStar, in terms of vision,
link |
it's very hard for us to simulate sort of,
link |
oh, are you looking at this pixel in the screen and so on?
link |
So the only thing we can do is,
link |
there is a unit that's invisible over there.
link |
So AlphaStar would know that immediately.
link |
Obviously still obeys the rules.
link |
You cannot attack the unit.
link |
You must have a detector and so on,
link |
but it's kind of one of the main things
link |
that it just doesn't feel there's a very proper way.
link |
I mean, you could imagine, oh, you don't have hypers.
link |
Maybe you don't know exactly where it is,
link |
or sometimes you see it, sometimes you don't,
link |
but it's just really, really complicated to get it
link |
so that everyone would agree,
link |
oh, that's the best way to simulate this, right?
link |
It seems like a perception problem.
link |
It is a perception problem.
link |
So the only problem is people, you ask,
link |
oh, what's the difference between
link |
how humans perceive the game?
link |
I would say they wouldn't be able to tell a shimmer
link |
immediately as it appears on the screen,
link |
whereas AlphaStar in principle sees it very sharply, right?
link |
It sees that the bit turned from zero to one,
link |
meaning there's now a unit there,
link |
although you don't know the unit,
link |
or you know that you cannot attack it and so on.
link |
So that from a vision standpoint,
link |
that probably is the one that is kind of the most obvious one.
link |
Then there are things humans cannot do perfectly,
link |
even professionals, which is they might miss a detail,
link |
or they might have not seen a unit.
link |
And obviously as a computer,
link |
if there's a corner of the screen that turns green
link |
because a unit enters the field of view,
link |
that can go into the memory of the agent, the LSTM,
link |
and persist there for a while,
link |
and for however long is relevant, right?
link |
And in terms of action,
link |
it seems like the rate of action from AlphaStar
link |
is comparable to, if not slower than, professional players,
link |
but it's more precise, is what I read.
link |
So that's really probably the one that is causing us
link |
more issues for a couple of reasons, right?
link |
The first one is StarCraft has been an AI environment
link |
for quite a few years.
link |
In fact, I mean, I was participating
link |
in the very first competition back in 2010.
link |
And there's really not been a kind of a very clear set
link |
of rules for what the actions per minute,
link |
the rate of actions that you can issue, should be.
link |
And as a result, these agents or bots that people build
link |
in a kind of almost very cool way,
link |
they do like 20,000, 40,000 actions per minute.
link |
Now, to put this in perspective,
link |
a very good professional human
link |
might do 300 to 800 actions per minute.
link |
They might not be as precise.
link |
That's why the range is a bit tricky to identify exactly.
link |
I mean, 300 actions per minute precisely
link |
is probably realistic.
link |
800 is probably not, but you see humans doing a lot of actions
link |
because they warm up and they kind of select things
link |
and spam and so on just so that when they need,
link |
they have the accuracy.
link |
So we came into this by not having kind of a standard way
link |
to say, well, how do we measure whether an agent is
link |
at human level or not?
link |
On the other hand, we had a huge advantage,
link |
which is because we do imitation learning,
link |
agents turned out to act like humans
link |
in terms of rate of actions, even
link |
precisions and imprecisions of actions
link |
in the supervised policy.
link |
You could see all these.
link |
You could see how agents like to spam click, to move here.
link |
If you played especially Diablo, you would know what I mean.
link |
I mean, you just like spam, oh, move here, move here,
link |
You're doing literally like maybe five actions
link |
in two seconds, but these actions are not necessary.
link |
One would have sufficed.
link |
So on the one hand, we start from this imitation policy
link |
that is at the ballpark of the actions per minutes of humans
link |
because it's actually statistically
link |
trying to imitate humans.
link |
So we see these very nicely in the curves
link |
that we showed in the blog post.
link |
There's these actions per minute,
link |
and the distribution looks very human like.
link |
But then, of course, as self play kicks in,
link |
and that's the part we haven't talked too much yet,
link |
but of course, the agent must play against itself to improve,
link |
then there's almost no guarantees
link |
that these actions will not become more precise
link |
or even the rate of actions is going to increase over time.
link |
So what we did, and this is probably
link |
the first attempt that we thought was reasonable,
link |
is we looked at the distribution of actions
link |
for humans for certain windows of time.
link |
And just to give a perspective, because I guess I mentioned
link |
that some of these agents that are programmatic
link |
do 40,000 actions per minute.
link |
Professionals, as I said, do 300 to 800.
link |
So what we looked is we look at the distribution
link |
over professional gamers, and we took reasonably high actions
link |
per minute, but we kind of identify certain cutoffs
link |
after which, even if the agent wanted to act,
link |
these actions would be dropped.
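The cutoff mechanism he describes, dropping actions that exceed a budget within a window of time, can be sketched as a sliding-window limiter. The specific numbers below are invented for illustration, not the actual thresholds used:

```python
from collections import deque

class ApmLimiter:
    """Drop actions past a per-window cutoff (illustrative numbers).

    The agent may *want* to act every step; actions beyond
    `max_actions` within the trailing `window` of game time are
    simply discarded, keeping the rate in a human-plausible range.
    """
    def __init__(self, max_actions=22, window=5.0):
        self.max_actions = max_actions
        self.window = window
        self.times = deque()          # timestamps of recently allowed actions

    def allow(self, t):
        # Evict timestamps that have fallen out of the trailing window.
        while self.times and t - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(t)
            return True
        return False                  # over budget: the action is dropped

lim = ApmLimiter(max_actions=3, window=1.0)
print([lim.allow(t) for t in [0.0, 0.1, 0.2, 0.3, 1.2]])
# [True, True, True, False, True]
```

The fourth action is dropped because three were already issued within the last second; by 1.2 the window has cleared and actions are allowed again, which mirrors the burst-then-recover pattern of human play.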
link |
But the problem is this cutoff is probably set a bit too high.
link |
And what ends up happening, even though the games,
link |
and when we ask the professionals and the gamers,
link |
by and large, they feel like it's playing humanlike,
link |
there are some agents that developed maybe slightly
link |
too high APMs, which is actions per minute,
link |
combined with the precision, which
link |
made people start discussing a very interesting issue, which
link |
is, should we have limited these?
link |
Should we just let it loose and see what cool things
link |
it can come up with?
link |
So this is in itself an extremely interesting
link |
question, but the same way that modeling the shimmer
link |
would be so difficult, modeling absolutely all the details
link |
about muscles and precision and tiredness of humans
link |
would be quite difficult.
link |
So we're really here kind of innovating
link |
in this sense of, OK, what could be maybe
link |
the next iteration of putting more rules that
link |
makes the agents more humanlike in terms of restrictions?
link |
Yeah, putting constraints that.
link |
More constraints, yeah.
link |
That's really interesting.
link |
That's really innovative.
link |
So one of the constraints you put on yourself,
link |
or at least focused on, is the Protoss race,
link |
as far as I understand.
link |
Can you tell me about the different races
link |
and how they, so Protoss, Terran, and Zerg,
link |
how do they compare?
link |
How do they interact?
link |
Why did you choose Protoss?
link |
Yeah, in the dynamics of the game seen
link |
from a strategic perspective.
link |
So Protoss, so in StarCraft there are three races.
link |
Indeed, in the demonstration, we saw only the Protoss race.
link |
So maybe let's start with that one.
link |
Protoss is kind of the most technologically advanced race.
link |
It has units that are expensive but powerful.
link |
So in general, you want to kind of conserve your units
link |
And then you want to utilize these tactical advantages
link |
of very fancy spells and so on and so forth.
link |
And at the same time, they're kind of,
link |
people say they're a bit easier to play perhaps.
link |
But that I actually didn't know.
link |
I mean, I just talked now a lot to the players
link |
that we work with, TLO and Mana, and they said, oh yeah,
link |
Protoss is actually, people think,
link |
is actually one of the easiest races.
link |
So perhaps the easiest, though that doesn't
link |
mean much, because obviously professional players
link |
excel at all three races.
link |
And there's never a race that dominates
link |
for a very long time anyway.
link |
So if you look at the top, I don't know, 100 in the world,
link |
is there one race that dominates that list?
link |
It would be hard to know because it depends on the regions.
link |
I think it's pretty equal in terms of distribution.
link |
And Blizzard wants it to be equal.
link |
They wouldn't want one race like Protoss
link |
to not be representative in the top place.
link |
So definitely, they try to keep it balanced.
link |
So then maybe the opposite race of Protoss is Zerg.
link |
Zerg is a race where you just kind of expand and take over
link |
as many resources as you can, and they
link |
have a very high capacity to regenerate their units.
link |
So if you have an army, it's not that valuable;
link |
losing the whole army is not a big deal as Zerg
link |
because you can then rebuild it.
link |
And given that you generally accumulate
link |
a huge bank of resources, Zergs typically
link |
play by applying a lot of pressure,
link |
maybe losing their whole army, but then rebuilding it
link |
So although, of course, every race, I mean, there's never,
link |
I mean, they're pretty diverse.
link |
I mean, there are some units in Zerg that
link |
are technologically advanced, and they do
link |
some very interesting spells.
link |
And there's some units in Protoss that are less valuable,
link |
and you could lose a lot of them and rebuild them,
link |
and it wouldn't be a big deal.
link |
All right, so maybe I'm missing out.
link |
Maybe I'm going to say some dumb stuff, but summary
link |
So first, there's collection of a lot of resources.
link |
That's one option.
link |
The other one is expanding, so building other bases.
link |
Then the other is obviously building units
link |
and attacking with those units.
link |
And then I don't know what else there is.
link |
Maybe there's the different timing of attacks,
link |
like do I attack early, attack late?
link |
What are the different strategies that emerged
link |
that you've learned about?
link |
I've read that a bunch of people are super happy
link |
that you guys have apparently, that Alpha Star apparently
link |
has discovered that it's really good to,
link |
what is it, saturate?
link |
Oh yeah, the mineral line.
link |
Yeah, the mineral line.
link |
And that's for greedy amateur players like myself.
link |
That's always been a good strategy.
link |
You just build up a lot of money,
link |
and it just feels good to just accumulate and accumulate.
link |
So thank you for discovering that and validating all of us.
link |
But is there other strategies that you discovered
link |
that are interesting, unique to this game?
link |
Yeah, so if you look at the kind of,
link |
not being a StarCraft II player,
link |
but of course StarCraft and StarCraft II
link |
and real time strategy games in general are very similar.
link |
I would classify perhaps the openings of the game.
link |
They're very important.
link |
And generally I would say there's two kinds of openings.
link |
One that's a standard opening.
link |
That's generally how players find sort of a balance
link |
between risk and economy and building some units early on
link |
so that they could defend,
link |
but they're not too exposed basically,
link |
but also expanding quite quickly.
link |
So this would be kind of a standard opening.
link |
And within a standard opening,
link |
then what you do choose generally is
link |
what technology are you aiming towards?
link |
So there's a bit of rock, paper, scissors
link |
of you could go for spaceships
link |
or you could go for invisible units
link |
or you could go for, I don't know,
link |
like massive units that attack against certain kinds
link |
of units, but they're weak against others.
link |
So standard openings themselves have some choices
link |
like rock, paper, scissors style.
link |
Of course, if you scout and you're good
link |
at guessing what the opponent is doing,
link |
then you can play as an advantage
link |
because if you know you're gonna play rock,
link |
I mean, I'm gonna play paper obviously.
link |
So you can imagine that normal standard games
link |
in StarCraft looks like a continuous rock, paper,
link |
scissors game where you guess what the distribution
link |
of rock, paper, and scissors is from the enemy
link |
and reacting accordingly to try to beat it
link |
or put the paper out before he kind of changes his mind
link |
from rock to scissors,
link |
and then you would be in a weak position.
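The continuous rock-paper-scissors he describes is, at its simplest, estimating the opponent's mixed strategy from what you have observed and countering it. A toy sketch (moves and payoff structure are the textbook game, not StarCraft specifics):

```python
import numpy as np

# rock(0) is beaten by paper(1), paper by scissors(2), scissors by rock.
BEATS = {0: 1, 1: 2, 2: 0}   # BEATS[x] is the move that defeats x

def best_response(observed_moves, n_moves=3):
    """Estimate the opponent's strategy distribution from scouted games
    and counter its most likely move -- the continuous rock-paper-
    scissors the conversation describes, in its simplest form."""
    counts = np.bincount(observed_moves, minlength=n_moves)
    dist = counts / counts.sum()          # empirical mixed strategy
    return BEATS[int(np.argmax(dist))]

print(best_response([0, 0, 2, 0, 1]))  # opponent mostly plays rock -> play paper (1)
```

Scouting, in this framing, is just paying a cost to sharpen the estimated distribution before committing to a counter.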
link |
So, sorry to pause on that.
link |
I didn't realize this element
link |
because I know it's true with poker.
link |
I know I looked at Libratus.
link |
So you're also estimating trying to guess the distribution,
link |
trying to better and better estimate the distribution
link |
of what the opponent is likely to be doing.
link |
Yeah, I mean, as a player,
link |
you definitely wanna have a belief state
link |
over what's up on the other side of the map.
link |
And when your belief state becomes inaccurate,
link |
when you start having serious doubts,
link |
whether he's gonna play something that you must know,
link |
that's when you scout.
link |
You wanna then gather information, right?
link |
Is improving the accuracy of the belief
link |
or improving the belief state part of the loss
link |
that you're trying to optimize?
link |
Or is it just a side effect?
link |
It's implicit, but you could explicitly model it,
link |
and it would be quite good at probably predicting
link |
what's on the other side of the map.
link |
But so far, it's all implicit.
link |
There's no additional reward for predicting the enemy.
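An explicit version of that belief state could look like an auxiliary head on the agent's memory, trained with its own prediction loss alongside the policy. This is purely hypothetical, a sketch of what "explicitly model it" might mean, not anything AlphaStar actually does:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def belief_aux_loss(h, W_belief, enemy_present):
    """Hypothetical auxiliary loss: from the agent's memory h, predict
    a binary map of which (unobserved) map regions contain enemy units,
    scored with binary cross-entropy. `W_belief` is an invented head."""
    p = sigmoid(W_belief @ h)                   # per-region probabilities
    y = enemy_present.astype(float)
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
h = rng.standard_normal(16)                     # stand-in for an LSTM state
W = rng.standard_normal((8, 16)) * 0.1          # 8 hypothetical map regions
loss = belief_aux_loss(h, W, rng.integers(0, 2, 8))
print(loss)
```

Adding such a term to the training objective would reward the memory for tracking the other side of the map, rather than leaving that pressure implicit in the win/loss signal.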
link |
So there's these standard openings,
link |
and then there's what people call cheese,
link |
which is very interesting.
link |
And AlphaStar sometimes really likes this kind of cheese.
link |
These cheeses, what they are is kind of an all in strategy.
link |
You're gonna do something sneaky.
link |
You're gonna hide your own buildings
link |
close to the enemy base,
link |
or you're gonna go for hiding your technological buildings
link |
so that you do invisible units
link |
and the enemy just cannot react to detect it
link |
and thus lose the game.
link |
And there's quite a few of these cheeses
link |
and variants of them.
link |
And there it's where actually the belief state
link |
becomes even more important.
link |
Because if I scout your base and I see no buildings at all,
link |
any human player knows something's up.
link |
They might know, well,
link |
you're hiding something close to my base.
link |
Should I build suddenly a lot of units to defend?
link |
Should I actually block my ramp with workers
link |
so that you cannot come and destroy my base?
link |
So all this is happening
link |
and defending against cheeses is extremely important.
link |
And in the AlphaStar League,
link |
many agents actually develop some cheesy strategies.
link |
And in the games we saw against TLO and Mana,
link |
two out of the 10 agents
link |
were actually doing these kind of strategies
link |
which are cheesy strategies.
link |
And then there's a variant of cheesy strategy
link |
which is called all in.
link |
So an all in strategy is not perhaps as drastic as,
link |
oh, I'm gonna build cannons on your base
link |
and then bring all my workers
link |
and try to just disrupt your base and game over,
link |
or GG as we say in StarCraft.
link |
There's these kind of very cool things
link |
that you can align precisely at a certain time mark:
link |
you can generate exactly a 10 unit composition
link |
that is perfect, like five of this type,
link |
five of this other type,
link |
and align the upgrade
link |
so that at four minutes and a half, let's say,
link |
you have these 10 units and the upgrade just finished.
link |
And at that point, that army is really scary.
link |
And unless the enemy really knows what's going on,
link |
if you push, you might then have an advantage
link |
because maybe the enemy is doing something more standard,
link |
it expanded too much, it developed too much economy,
link |
and it traded that off badly against having defenses,
link |
and the enemy will lose.
link |
But it's called all in because if you don't win,
link |
then you're gonna lose.
link |
So you see players that do these kinds of strategies,
link |
if they don't succeed, game is not over.
link |
I mean, they still have a base
link |
and they still gathering minerals,
link |
but they will just GG out of the game
link |
because they know, well, game is over.
link |
I gambled and I failed.
link |
So if we start entering the game theoretic aspects
link |
of the game, it's really rich and it's really,
link |
that's why it also makes it quite entertaining to watch.
link |
Even if I don't play, I still enjoy watching the game.
link |
But the agents are trying to do this mostly implicitly.
link |
But one element that we improved in self play
link |
is creating the Alpha Star League.
link |
And the Alpha Star League is not pure self play.
link |
It's trying to create different personalities of agents
link |
so that some of them will become cheesy agents.
link |
Some of them might become very economical, very greedy,
link |
like getting all the resources,
link |
but then being maybe early on, they're gonna be weak,
link |
but later on, they're gonna be very strong.
link |
And by creating this personality of agents,
link |
which sometimes it just happens naturally
link |
that you can see kind of an evolution of agents
link |
that given the previous generation,
link |
they train against all of them
link |
and then they generate kind of the perfect counter
link |
to that distribution.
link |
But these agents, you must have them in the populations
link |
because if you don't have them,
link |
you're not covered against these things.
link |
You wanna create all sorts of the opponents
link |
that you will find in the wild.
link |
So you can be exposed to these cheeses, early aggression,
link |
later aggression, more expansions,
link |
dropping units in your base from the side, all these things.
link |
And pure self play is getting a bit stuck
link |
at finding some subset of these, but not all of these.
link |
So the Alpha Star League is a way
link |
to kind of do an ensemble of agents
link |
that they're all playing in a league,
link |
much like people play on Battle.net, right?
link |
They play, you play against someone
link |
who does a new cool strategy and you immediately,
link |
oh my God, I wanna try it, I wanna play again.
link |
And this to me was another critical part of the problem,
link |
which was, can we create a Battle.net for agents?
link |
And that's kind of what the Alpha Star League really is.
link |
That's fascinating.
link |
And where they stick to their different strategies.
link |
Yeah, wow, that's really, really interesting.
link |
But that said, you were fortunate enough
link |
or just skilled enough to win five, zero.
link |
And so how hard is it to win?
link |
I mean, that's not the goal.
link |
I guess, I don't know what the goal is.
link |
The goal should be to win majority, not five, zero,
link |
but how hard is it in general to win all matchups?
link |
So that's a very interesting question
link |
because once you see Alpha Star and superficially
link |
you think, well, okay, it won.
link |
If you sum all the games, it's like 10 to 1, right?
link |
It lost the game that it played with the camera interface.
link |
You might think, well, that's done, right?
link |
It's superhuman at the game.
link |
And that's not really the claim we really can make actually.
link |
The claim is we beat a professional gamer
link |
for the first time.
link |
StarCraft has really been a thing
link |
that has been going on for a few years,
link |
but a moment like this had not occurred before.
link |
But are these agents impossible to beat?
link |
Absolutely not, right?
link |
So that's a bit what's kind of the difference is
link |
the agents play at grandmaster level.
link |
They definitely understand the game enough
link |
to play extremely well, but are they unbeatable?
link |
Do they play perfect?
link |
No, and actually in StarCraft,
link |
because of these sneaky strategies,
link |
it's always possible that you might take a huge risk
link |
sometimes, but you might get wins, right?
link |
So I think that as a domain,
link |
it still has a lot of opportunities,
link |
not only because of course we wanna learn
link |
with less experience, we would like to,
link |
I mean, if I learned to play Protoss,
link |
I can play Terran and learn it much quicker
link |
than Alpha Star can, right?
link |
So there are obvious interesting research challenges
link |
as well, but even as the raw performance goes,
link |
really the claim here can be we are at pro level
link |
or at high grandmaster level,
link |
but obviously the players also did not know what to expect.
link |
Their prior distribution was a bit off
link |
because they played this kind of new like alien brain
link |
as they like to say it, right?
link |
And that's what makes it exciting for them.
link |
But also I think if you look at the games closely,
link |
you see there were weaknesses in some points,
link |
maybe Alpha Star did not scout,
link |
or if invisible units had gone against it
link |
at certain points, it wouldn't have known
link |
and it would have been bad.
link |
So there's still quite a lot of work to do,
link |
but it's really a very exciting moment for us
link |
to be seeing, wow, a single neural net on a GPU
link |
is actually playing against these guys
link |
I mean, you have to see them play live.
link |
They're really, really amazing players.
link |
Yeah, I'm sure there must be a guy in Poland
link |
somewhere right now training his butt off
link |
to make sure that this never happens again with Alpha Star.
link |
So that's really exciting in terms of Alpha Star
link |
having some holes to exploit, which is great.
link |
And then we build on top of each other
link |
and it feels like with StarCraft, unlike Go,
link |
even if you win, it's still not done;
link |
there are so many different dimensions
link |
in which you can explore.
link |
So that's really, really interesting.
link |
Do you think there's a ceiling to Alpha Star?
link |
You've said that it hasn't reached,
link |
you know, this is a big,
link |
wait, let me actually just pause for a second.
link |
How did it feel to come here to this point,
link |
to beat a top professional player?
link |
Like that night, I mean, you know,
link |
Olympic athletes have their gold medal, right?
link |
This is your gold medal in a sense.
link |
Sure, you're cited a lot,
link |
you've published a lot of prestigious papers, whatever,
link |
but this is like a win.
link |
I mean, it was, for me, it was unbelievable
link |
because first the win itself,
link |
I mean, it was so exciting.
link |
I mean, so looking back to those last days of 2018 really,
link |
that's when the games were played.
link |
I'm sure when I look back at that moment, I'll say,
link |
oh my God, I want to be in a project like that.
link |
It's like, I already feel the nostalgia of like,
link |
yeah, that was huge in terms of the energy
link |
and the team effort that went into it.
link |
And so in that sense, as soon as it happened,
link |
I already knew it was kind of,
link |
I was losing it a little bit.
link |
So it is almost like sad that it happened and oh my God,
link |
but on the other hand, it also verifies the approach.
link |
But to me also, there's so many challenges
link |
and interesting aspects of intelligence
link |
that even though we can train a neural network
link |
to play at the level of the best humans,
link |
there's still so many challenges.
link |
So for me, it's also like, well,
link |
this is really an amazing achievement,
link |
but I already was also thinking about next steps.
link |
I mean, as I said, these agents play Protoss versus Protoss,
link |
but they should be able to play a different race
link |
much quicker, right?
link |
So that would be an amazing achievement.
link |
Some people call this meta reinforcement learning,
link |
meta learning and so on, right?
link |
So there's so many possibilities after that moment,
link |
but the moment itself, it really felt great.
link |
We had this bet, so I'm kind of a pessimist in general.
link |
So I kind of send an email to the team.
link |
I said, okay, we play against TLO first, right?
link |
Like what's gonna be the result?
link |
And I really thought we would lose like five zero, right?
link |
We had some calibration made against the 5,000 MMR player.
link |
TLO was much stronger than that player,
link |
even if he played Protoss, which is his off race.
link |
But yeah, I was not imagining we would win.
link |
So for me, that was just kind of a test run or something.
link |
And then when it happened, he was really surprised.
link |
And unbelievably, we went to this bar to celebrate
link |
and Dave tells me, well, why don't we invite someone
link |
who is a thousand MMR stronger in Protoss,
link |
like an actual Protoss player,
link |
and that turned out to be Mana, right?
link |
And we had some drinks and I said, sure, why not?
link |
But then I thought, well,
link |
that's really gonna be impossible to beat.
link |
I mean, because he was so much ahead;
link |
a thousand MMR is really like 99% probability
link |
that Mana would beat TLO as Protoss versus Protoss, right?
link |
And to me, the second game was much more important,
link |
even though a lot of uncertainty kind of disappeared
link |
after we kind of beat TLO.
link |
I mean, he is a professional player.
link |
So that was kind of, oh,
link |
but that's really a very nice achievement.
link |
But Mana really was at the top
link |
and you could see he played much better,
link |
but our agents got much better too.
link |
So it's like, ah, and then after the first game,
link |
I said, if we take a single game,
link |
at least we can say we won a game.
link |
I mean, even if we don't beat the series,
link |
for me, that was a huge relief.
link |
And I mean, I remember hugging Demis.
link |
And I mean, it was really like,
link |
this moment for me will resonate forever as a researcher.
link |
And I mean, as a person,
link |
and yeah, it's a really like great accomplishment.
link |
And it was great also to be there with the team in the room.
link |
I don't know if you saw it.
link |
It was really something.
link |
I mean, from my perspective,
link |
the other interesting thing is just like watching Kasparov,
link |
watching Mana was also interesting
link |
because he was kind of at a loss for words.
link |
I mean, whenever you lose, I've done a lot of sports.
link |
You sometimes say excuses, you look for reasons.
link |
And he couldn't really come up with reasons.
link |
I mean, with TLO playing his off race for Protoss,
link |
you could say, well, it felt awkward, it wasn't his race,
link |
but here he was just beaten.
link |
And it was beautiful to look at a human being
link |
being superseded by an AI system.
link |
I mean, it's a beautiful moment for researchers, so.
link |
Yeah, for sure it was.
link |
I mean, probably the highlight of my career so far
link |
because of its uniqueness and coolness.
link |
And I don't know, I mean, it's obviously, as you said,
link |
you can look at papers, citations and so on,
link |
but this really is like a testament
link |
of the whole machine learning approach
link |
and using games to advance technology.
link |
I mean, it really was,
link |
everything came together at that moment.
link |
That's really the summary.
link |
Also on the other side, it's a popularization of AI too,
link |
because it's just like traveling to the moon and so on.
link |
I mean, this is where a very large community of people
link |
that don't really know AI,
link |
they get to really interact with it.
link |
Which is very important.
link |
I mean, we must, you know,
link |
writing papers helps our peers, researchers,
link |
to understand what we're doing.
link |
But I think AI is becoming mature enough
link |
that we must sort of try to explain what it is.
link |
And perhaps through games is an obvious way
link |
because these games always had built in AI.
link |
So maybe everyone has experienced an AI playing a video game,
link |
even if they don't know,
link |
because there's always some scripted element
link |
and some people might even call that AI already, right?
link |
So what are other applications
link |
of the approaches underlying AlphaStar
link |
that you see happening?
link |
There are a lot of echoes of, as you said,
link |
transformers, of language modeling and so on.
link |
Have you already started thinking
link |
where the breakthroughs in AlphaStar
link |
get expanded to other applications?
link |
Right, so I thought about a few things
link |
for like kind of next month, next years.
link |
The main thing I'm thinking about actually is what's next
link |
as a kind of a grand challenge.
link |
Because for me, like we've seen Atari
link |
and then there's like the sort of three-dimensional worlds
link |
that we've seen also like pretty good performance
link |
from these capture the flag agents
link |
that some people at DeepMind and elsewhere have developed.
link |
We've also seen some amazing results on like,
link |
for instance, Dota 2, which is also a very complicated game.
link |
So for me, like the main thing I'm thinking about
link |
is what's next in terms of challenge.
link |
So as a researcher, I see sort of two tensions
link |
between research and then applications or areas
link |
or domains where you apply them.
link |
So on the one hand,
link |
because the application, StarCraft, is very hard,
link |
we developed some techniques, some new research
link |
that now we could look at elsewhere.
link |
Like are there other applications where we can apply these?
link |
And the obvious ones, absolutely.
link |
You can think of feeding back to sort of the community
link |
we took from, which was mostly sequence modeling
link |
or natural language processing.
link |
So we've developed and extended things from the transformer
link |
and we use pointer networks.
link |
We combine LSTM and transformers in interesting ways.
link |
So that's perhaps the kind of lowest hanging fruit
link |
of feeding back to now a different field
link |
of machine learning that's not playing video games.
link |
Let me go old school and jump to Mr. Alan Turing.
link |
So the Turing test is a natural language test,
link |
a conversational test.
link |
What's your thought of it as a test for intelligence?
link |
Do you think it is a grand challenge
link |
that's worthy of undertaking?
link |
Maybe if it is, would you reformulate it or phrase it
link |
somehow differently?
link |
Right, so I really love the Turing test
link |
because I also like sequences and language understanding.
link |
And in fact, some of the early work
link |
we did in machine translation, we
link |
tried to apply it to kind of a neural chatbot, which obviously
link |
would never pass the Turing test because it was very limited.
link |
But it is a very fascinating idea
link |
that you could really have an AI that
link |
would be indistinguishable from humans in terms of asking
link |
or conversing with it.
link |
So I think the test itself seems very nice.
link |
And it's kind of well defined, actually,
link |
like whether you pass it or not.
link |
I think there's quite a few rules
link |
that feel pretty simple.
link |
And I think they have these competitions every year.
link |
Yes, there's the Loebner Prize.
link |
But I don't know if you've seen the kind of bots
link |
that emerge from that competition.
link |
They're not quite what you would expect.
link |
So it feels like there are weaknesses in the way the Turing test is posed.
link |
The definition
link |
of a genuine, rich, fulfilling human conversation
link |
needs to be something else.
link |
Like the Alexa Prize, which I'm not as well familiar with,
link |
has tried to define that more, I think,
link |
by saying you have to continue keeping
link |
a conversation for 30 minutes, something like that.
link |
So basically forcing the agent not to just fool,
link |
but to have an engaging conversation kind of thing.
link |
Have you thought about this problem richly?
link |
And if you have in general, how far away are we from?
link |
You worked a lot on language understanding,
link |
language generation, but the full dialogue,
link |
the conversation, just sitting at the bar
link |
having a couple of beers for an hour,
link |
that kind of conversation.
link |
Have you thought about it?
link |
Yeah, so I think you touched here
link |
on the critical point, which is feasibility.
link |
So there's a great essay by Hamming,
link |
which describes sort of grand challenges of physics.
link |
And he argues that, well, OK, for instance,
link |
teleportation or time travel are great grand challenges
link |
of physics, but there's no line of attack.
link |
We really don't know how to make any progress.
link |
So that's why most physicists and so on,
link |
they don't work on these in their PhDs
link |
and as part of their careers.
link |
So I see the Turing test, the full Turing test,
link |
as still a bit too early.
link |
Like I think we're, especially with the current trend
link |
of deep learning language models,
link |
we've seen some amazing examples.
link |
I think GPT-2 being the most recent one, which
link |
is very impressive.
link |
But to fully solve it, to pass or fool a human
link |
into thinking that there's a human on the other side,
link |
I think we're quite far.
link |
So as a result, I don't see myself
link |
and I probably would not recommend people doing a PhD
link |
on solving the Turing test because it just
link |
feels it's kind of too early or too hard of a problem.
link |
Yeah, but that said, you said the exact same thing
link |
about StarCraft about a few years ago.
link |
So you'll probably also be the person who passes
link |
the Turing test in three years.
link |
I mean, I think that, yeah.
link |
So we have this on record.
link |
I mean, it's true that progress sometimes
link |
is a bit unpredictable.
link |
I really would not have predicted it.
link |
Even six months ago, I would not have predicted the level
link |
that we see these agents deliver at grandmaster level.
link |
But I have worked on language enough.
link |
And basically, my concern is not that a breakthrough
link |
couldn't happen that would bring us to solving
link |
or passing the Turing test; it's that I just
link |
think the statistical approach to it is not going to cut it.
link |
So we need a breakthrough, which is great for the community.
link |
But given that, I think there's quite more uncertainty.
link |
Whereas for StarCraft, I knew what the steps would
link |
be to get us there.
link |
I think it was clear that using the imitation learning part
link |
and then using this league for the agents
link |
was going to be key.
link |
And it turned out that this was the case.
link |
And a little more was needed, but not much more.
link |
For Turing test, I just don't know
link |
what the plan or execution plan would look like.
link |
So that's why I'm not myself working on it as a grand challenge.
link |
But there are quite a few sub challenges
link |
that are related that you could say,
link |
well, I mean, what if you create a great assistant
link |
like Google already has, like the Google Assistant.
link |
So can we make it better?
link |
And can we make it fully neural and so on?
link |
That I start to believe maybe we're
link |
reaching a point where we should attempt these challenges.
link |
I like this conversation so much because it echoes very much
link |
the StarCraft conversation.
link |
It's exactly how you approach StarCraft.
link |
Let's break it down into small pieces and solve those.
link |
And you end up solving the whole game.
link |
But that said, you're behind some
link |
of the biggest pieces of work in deep learning
link |
in the last several years.
link |
So you mentioned some limits.
link |
What do you think of the current limits of deep learning?
link |
And how do we overcome those limits?
link |
So if I had to actually use a single word
link |
to define the main challenge in deep learning,
link |
it's a challenge that probably has
link |
been the challenge for many years.
link |
And it's that of generalization.
link |
So what that means is that all that we're doing
link |
is fitting functions to data.
link |
And when the data we see is not from the same distribution,
link |
or even if sometimes it
link |
is very close to the distribution, but because
link |
of the way we train it with limited samples,
link |
we then get to this stage where we just
link |
don't generalize as much as we could.
link |
And I think adversarial examples are a clear example of this.
link |
But if you study the machine learning literature,
link |
the reason why SVMs became very popular
link |
was because they
link |
had some guarantees about generalization. But on
link |
unseen data, out of distribution,
link |
or even within distribution, where you take an image and add
link |
a bit of noise, these models fail.
link |
So I think, really, I don't see a lot of progress
link |
on generalization in the strong generalization
link |
sense of the word.
link |
I think our neural networks, you can always
link |
find designed examples that will make their outputs arbitrary,
link |
which is not good because we humans would never
link |
be fooled by these kind of images
link |
or manipulation of the image.
link |
And if you look at the mathematics,
link |
you kind of understand this is a bunch of matrices
link |
multiplied together.
link |
There's probably numerical instability
link |
such that you can just find corner cases.
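The corner-case fragility described here can be sketched in a few lines of numpy. This is my own toy illustration (a bare linear score, not a real neural network): in high dimensions, a perturbation that is tiny in every coordinate can still flip the output, which is the basic mechanism behind adversarial examples.

```python
import numpy as np

# Toy sketch (my illustration, not from the conversation): a linear score
# w.x can be flipped by a perturbation that is tiny in every coordinate,
# because the tiny per-coordinate changes add up across dimensions.
rng = np.random.default_rng(0)
d = 1000
w = rng.normal(size=d)                  # fixed "trained" weights
x = rng.normal(size=d)                  # a typical input
x += (5.0 - w @ x) / (w @ w) * w        # nudge x so the model scores it exactly +5

eps = 0.1                               # max change per coordinate (vs |x_i| ~ 1)
x_adv = x - eps * np.sign(w)            # step each coordinate slightly against w

print(round(w @ x, 6))   # 5.0 -- confidently positive
print(w @ x_adv < 0)     # True -- prediction flipped by a tiny perturbation
```

The point of the sketch is the dimension count: a 0.1 change per coordinate is negligible locally, but summed over a thousand weights it moves the score far more than the original margin.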
link |
So I think that's really the underlying topic many times:
link |
we see it even at the grand stage of the Turing test.
link |
Generalization: if you start passing the Turing test,
link |
should it be in English or should it be in any language?
link |
As a human, if you ask something in a different language,
link |
you actually will go and do some research
link |
and try to translate it and so on.
link |
Should the Turing test include that?
link |
And it's really a difficult problem
link |
and very fascinating and very mysterious, actually.
link |
But do you think if you were to try to solve it,
link |
can you not grow the size of data intelligently
link |
in such a way that the distribution of your training
link |
set does include the entirety of the testing set?
link |
The other path is a totally new methodology.
link |
It's not statistical.
link |
So a path that has worked well, and it worked well
link |
in StarCraft and in machine translation and in languages,
link |
scaling up the data and the model.
link |
And that's kind of been maybe the only single formula that
link |
still delivers today in deep learning, right?
link |
It's that data scale and model scale really
link |
do more and more of the things that we thought,
link |
oh, there's no way it can generalize to these,
link |
or there's no way it can generalize to that.
link |
But I don't think fundamentally it will be solved with this.
link |
And for instance, I'm really liking some style or approach
link |
that would not only have neural networks,
link |
but it would have programs or some discrete decision making,
link |
because that is where I feel there's a bit more promise.
link |
I mean, the best example, I think, for understanding this
link |
is I also worked a bit on, oh, we
link |
can learn an algorithm with a neural network, right?
link |
So you give it many examples, and it's
link |
going to sort the input numbers or something like that.
link |
But really strong generalization is you give me some numbers
link |
or you ask me to create an algorithm that sorts numbers.
link |
And instead of creating a neural net, which will be fragile
link |
because it's going to go out of range at some point,
link |
you're going to give it numbers that are too large, too small,
link |
and whatnot, if you just create a piece of code that
link |
sorts the numbers, then you can prove
link |
that that will generalize to absolutely all the possible
link |
input you could give.
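The contrast he draws here can be sketched directly (a toy illustration of mine, nothing from AlphaStar): a "learned" sorter that only memorizes its training examples breaks out of distribution, while a sorting program provably handles any input.

```python
# A toy illustration of the contrast described above (not from AlphaStar):
# a "model" that memorizes input-output pairs from limited training data
# fails outside that data, while an actual sorting program generalizes to
# inputs of any magnitude.

train = [(3, 1, 2), (2, 3, 1), (1, 2, 3)]
lookup = {x: tuple(sorted(x)) for x in train}   # "training" by memorization

def learned_sort(xs):
    # Only knows the exact inputs it was "trained" on.
    return lookup[tuple(xs)]

def program_sort(xs):
    # A piece of code: correct for any list of numbers.
    return tuple(sorted(xs))

print(learned_sort([3, 1, 2]))        # (1, 2, 3): in distribution, works
print(program_sort([10**9, -5, 7]))   # (-5, 7, 1000000000): out of range, still works
try:
    learned_sort([10**9, -5, 7])
except KeyError:
    print("learned sorter fails out of distribution")
```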
link |
So I think the problem comes with some exciting prospects.
link |
I mean, scale is a bit more boring, but it really works.
link |
And then maybe programs and discrete abstractions
link |
are a bit less developed.
link |
But clearly, I think they're quite exciting in terms
link |
of future for the field.
link |
Do you draw any insight wisdom from the 80s and expert
link |
systems and symbolic systems, symbolic computing?
link |
Do you ever go back to those reasoning, that kind of logic?
link |
Do you think that might make a comeback?
link |
You'll have to dust off those books?
link |
Yeah, I actually love adding more inductive biases.
link |
To me, the problem really is, what are you trying to solve?
link |
If what you're trying to solve is so important that you'll try
link |
to solve it no matter what, then absolutely use rules,
link |
use domain knowledge, and then use
link |
a bit of the magic of machine learning
link |
to empower it, to make the system the best system that
link |
will detect cancer or detect weather patterns, right?
link |
Or in terms of StarCraft, it also was a very big challenge.
link |
So I definitely would have been happy, if we
link |
had to cut a corner here and there,
link |
to do that.
link |
And in fact, in StarCraft, we started
link |
thinking about expert systems because it's a domain where,
link |
you know, you can define rules.
link |
I mean, people actually build StarCraft bots by thinking
link |
about those principles, like state machines and rule based.
link |
And then you could think of combining
link |
a bit of a rule based system, but that has also
link |
neural networks incorporated to make it generalize a bit better.
link |
So absolutely, I mean, we should definitely
link |
go back to those ideas.
link |
And anything that makes the problem simpler,
link |
as long as your problem is important, that's OK.
link |
And that's research driving a very important problem.
link |
And on the other hand, if you want to really focus
link |
on the limits of reinforcement learning,
link |
then of course, you must try not to look at imitation data
link |
or to look for some rules of the domain that would help a lot
link |
or even feature engineering, right?
link |
So this is a tension that depending on what you do,
link |
I think both ways are definitely fine.
link |
And I would never not do one or the other
link |
as long as what you're doing is important
link |
and needs to be solved, right?
link |
Right, so there's a bunch of different ideas
link |
that you developed that I really enjoy.
link |
But one is translating from image captioning,
link |
translating from image to text, just another beautiful idea,
link |
I think, that resonates throughout your work, actually.
link |
So the underlying nature of reality
link |
being language always, somehow.
link |
So what's the connection between images and text,
link |
or rather the visual world and the world
link |
of language in your view?
link |
Right, so I think a piece of research that's been central
link |
to, I would say, even extending into StarCraft
link |
is this idea of sequence to sequence learning,
link |
which what we really meant by that
link |
is that you can now really input anything
link |
to a neural network as the input x.
link |
And then the neural network will learn a function f
link |
that will take x as an input and produce any output y.
link |
And these x and y's don't need to be static or features,
link |
like fixed vectors or anything like that.
link |
They could really be sequences and, beyond that, other data structures.
link |
So that paradigm was tested in a very interesting way
link |
when we moved from translating French to English
link |
to translating an image to its caption.
link |
But the beauty of it is that, really,
link |
and that's actually how it happened.
link |
I changed a line of code in this thing that
link |
was doing machine translation.
link |
And I came the next day, and I saw
link |
how it was producing captions that seemed like, oh my god,
link |
this is really, really working.
link |
And the principle is the same.
link |
So I think I don't see text, vision, speech, waveforms
link |
as something different as long as you basically
link |
learn a function that will vectorize these inputs.
link |
And then after we vectorize it, we
link |
can then use transformers, LSTMs, whatever
link |
the flavor of the month of the model is.
link |
And then as long as we have enough supervised data,
link |
really, this formula will work and will keep working,
link |
I believe, to some extent.
link |
Modulo these generalization issues that I mentioned before.
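The interface being described can be sketched with toy stand-in functions (illustrative placeholders of mine, not real encoders or decoders): because sequence-to-sequence learning is just y = f(x), swapping the input encoder turns a translator into a captioner while the rest of the pipeline stays the same, which is the "one changed line" mentioned above.

```python
# Toy stand-ins for the sequence-to-sequence interface (illustrative
# placeholders, not real models): everything is y = decode(encode(x)),
# so swapping the encoder changes the input modality without touching
# the rest of the pipeline.

def encode_text(tokens):
    # Pretend text encoder: summarize a token sequence as a small vector.
    return [len(tokens), sum(len(t) for t in tokens)]

def encode_image(pixels):
    # Pretend image encoder producing a vector of the same shape.
    return [len(pixels), sum(pixels)]

def decode(vector):
    # Pretend decoder: turn the vector back into an output sequence.
    return [f"token_{v}" for v in vector]

def seq2seq(encoder, x):
    return decode(encoder(x))

# Same decoder, different encoder -- the "one changed line":
print(seq2seq(encode_text, ["hello", "world"]))  # ['token_2', 'token_10']
print(seq2seq(encode_image, [3, 1, 2]))          # ['token_3', 'token_6']
```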
link |
But the task there is to vectorize,
link |
so to form a representation that's meaningful.
link |
And your intuition now, having worked with all this media,
link |
is that once you are able to form that representation,
link |
you could basically take anything, any sequence.
link |
Going back to StarCraft, are there
link |
limits on the length? We didn't really
link |
touch on the long-term aspect.
link |
How did you overcome the whole really long term
link |
aspect of things here?
link |
Is there some tricks?
link |
So the main trick, so StarCraft, if you
link |
look at absolutely every frame, you
link |
might think it's quite a long game.
link |
So we would have to multiply 22 frames per second times 60 seconds per minute
link |
times maybe at least 10 minutes per game on average.
link |
So there are quite a few frames.
link |
But the trick really was to only observe, in fact,
link |
which might be seen as a limitation,
link |
but it is also a computational advantage.
link |
Only observe when you act.
link |
And then what the neural network decides
link |
is what is the gap going to be until the next action.
link |
And if you look at most StarCraft games
link |
that we have in the data set that Blizzard provided,
link |
it turns out that most games are actually only,
link |
I mean, it is still a long sequence,
link |
but it's maybe like 1,000 to 1,500 actions,
link |
which if you start looking at LSTMs, large LSTMs,
link |
transformers, it's not that difficult, especially
link |
if you have supervised learning.
link |
If you had to do it with reinforcement learning,
link |
the credit assignment problem, what
link |
is it in this game that made you win?
link |
That would be really difficult.
link |
But thankfully, because of imitation learning,
link |
we didn't have to deal with these directly.
link |
Although if we had to, we tried it.
link |
And what happened is you just take all your workers
link |
and attack with them.
link |
And that is kind of obvious in retrospect
link |
because you start trying random actions.
link |
One of the actions will be a worker
link |
that goes to the enemy base.
link |
And because it's self play, it's not
link |
going to know how to defend because it basically
link |
doesn't know almost anything.
link |
And eventually, what you develop is this take all workers
link |
and attack, because the credit assignment issue in RL
link |
is really, really hard.
link |
I do believe we could do better.
link |
And that's maybe a research challenge for the future.
link |
But yeah, even in StarCraft, the sequences
link |
are maybe 1,000, which I believe is
link |
within the realm of what transformers can do.
link |
Yeah, I guess the difference between StarCraft and Go
link |
is in Go and Chess, stuff starts happening right away.
link |
So there's not, yeah, it's pretty easy to self play.
link |
Not easy, but to self play, it's possible to develop
link |
reasonable strategies quickly as opposed to StarCraft.
link |
I mean, in Go, there's only 400 actions.
link |
But one action is what people would call the God action.
link |
That would be if you had expanded the whole search
link |
tree, that's the best action if you did minimax
link |
or whatever algorithm you would do if you
link |
had the computational capacity.
link |
But in StarCraft, 400 is minuscule.
link |
Like in 400, you couldn't even click
link |
on the pixels around a unit.
link |
So I think the problem there is in terms of action space size.
link |
And that makes search impossible.
link |
So there's quite a few challenges indeed
link |
that make this kind of a step up in terms of machine learning.
link |
For humans, maybe playing StarCraft
link |
seems more intuitive because it looks real.
link |
I mean, the graphics and everything moves smoothly,
link |
whereas I don't know how to.
link |
I mean, Go is a game that I would really need to study.
link |
It feels quite complicated.
link |
But for machines, kind of maybe it's the reverse, yes.
link |
Which shows you the gap actually between deep learning
link |
and however the heck our brains work.
link |
So you developed a lot of really interesting ideas.
link |
It's interesting to just ask, what's
link |
your process of developing new ideas?
link |
Do you like brainstorming with others?
link |
Do you like thinking alone?
link |
Do you like, what was it, Ian Goodfellow said
link |
he came up with GANs after a few beers.
link |
He thinks beers are essential for coming up with new ideas.
link |
We had beers when we decided to play another game of StarCraft.
link |
So it's really similar to that story.
link |
Actually, I explained this in a DeepMind retreat.
link |
And I said, this is the same as the GAN story.
link |
I mean, we were in a bar.
link |
And we decided, let's play again next week.
link |
And that's what happened.
link |
I feel like we're giving the wrong message
link |
to young undergrads.
link |
But in general, do you like brainstorming?
link |
Do you like thinking alone, working stuff out?
link |
So I think throughout the years, also, things changed.
link |
So initially, I was very fortunate to be
link |
with great minds like Geoff Hinton, Jeff Dean,
link |
I was really fortunate to join Brain at a very good time.
link |
So at that point, ideas, I was just
link |
brainstorming with my colleagues and learned a lot.
link |
And keep learning is actually something
link |
you should never stop doing.
link |
So learning implies reading papers and also
link |
discussing ideas with others.
link |
It's very hard at some point to not communicate,
link |
whether that's reading a paper from someone
link |
or actually discussing ideas.
link |
So definitely, that communication aspect
link |
needs to be there, whether it's written or oral.
link |
Nowadays, I'm also trying to be a bit more strategic
link |
about what research to do.
link |
So I was describing a little bit this tension
link |
between research for the sake of research,
link |
and then you have, on the other hand,
link |
applications that can drive the research.
link |
And honestly, the formula that has worked best for me
link |
is just find a hard problem and then
link |
try to see how research fits into it,
link |
how it doesn't fit into it, and then you must innovate.
link |
So I think machine translation drove sequence to sequence.
link |
Then maybe learning algorithms that had to deal with
link |
combinatorial problems led to pointer networks.
link |
StarCraft led to really scaling up imitation learning
link |
and the AlphaStar League.
link |
So that's been a formula that I personally like.
link |
But the other one is also valid.
link |
And I've seen it succeed a lot of the times
link |
where you just want to investigate model based
link |
RL as a research topic.
link |
And then you must start to think, well,
link |
what are the tests?
link |
How are you going to test these ideas?
link |
You need a minimal environment to try things.
link |
You need to read a lot of papers and so on.
link |
And that's also very fun to do and something
link |
I've also done quite a few times,
link |
both at Brain, at DeepMind, and obviously as a PhD.
link |
So I think besides the ideas and discussions,
link |
I think it's important also because you start
link |
sort of guiding not only your own goals,
link |
but other people's goals to the next breakthrough.
link |
So you must really kind of understand this feasibility
link |
also, as we were discussing before,
link |
whether this domain is ready to be tackled or not.
link |
And you don't want to be too early.
link |
You obviously don't want to be too late.
link |
So it's really interesting, this strategic component
link |
of research, which I think as a grad student,
link |
I just had no idea.
link |
I just read papers and discussed ideas.
link |
And I think this has been maybe the major change.
link |
And I recommend people kind of feed forward
link |
to what success looks like and try to backtrack,
link |
rather than just kind of looking, oh, this looks cool.
link |
And then you do a bit of random work,
link |
which sometimes you stumble upon some interesting things.
link |
But in general, it's also good to plan a bit.
link |
Especially like your approach of taking a really hard problem,
link |
stepping right in, and then being
link |
super skeptical about being able to solve the problem.
link |
I mean, there's a balance of both, right?
link |
There's a silly optimism and a critical sort of skepticism
link |
that's good to balance, which is why
link |
it's good to have a team of people that balance that.
link |
You don't do that on your own.
link |
You have mentors that have seen more,
link |
or you obviously want to chat and discuss
link |
whether it's the right time.
link |
I mean, Demis came in 2014.
link |
And he said, maybe in a bit we'll do StarCraft.
link |
And maybe he knew.
link |
And I'm just following his lead, which is great,
link |
because he's brilliant, right?
link |
So these things are obviously quite important,
link |
that you want to be surrounded by people who are diverse.
link |
They have their knowledge.
link |
It's also important to, I mean,
link |
I've learned a lot from people who actually have an idea
link |
that I might not think is good.
link |
But if I give them the space to try it,
link |
I've been proven wrong many, many times as well.
link |
I think your colleagues are more important than yourself.
link |
Now let's real quick talk about another impossible problem.
link |
What do you think it takes to build a system that's
link |
human level intelligence?
link |
We talked a little bit about the Turing test, StarCraft.
link |
All of these have echoes of general intelligence.
link |
But if you think about just something
link |
that you would sit back and say, wow,
link |
this is really something that resembles
link |
human level intelligence.
link |
What do you think it takes to build that?
link |
So I find that AGI oftentimes is maybe not very well defined.
link |
So what I'm trying to come up with for myself
link |
is what a result would look like that would make you start
link |
to believe that you have agents or neural nets that
link |
no longer overfit to a single task,
link |
but actually learn the skill of learning, so to speak.
link |
And that actually is a field that I
link |
am fascinated by, which is the learning to learn,
link |
or meta learning, which is about no longer learning
link |
about a single domain.
link |
So you can think about the learning algorithm
link |
itself as general.
link |
So the same formula we applied for AlphaStar or StarCraft,
link |
we can now apply to almost any video game,
link |
or you could apply to many other problems and domains.
link |
But the algorithm is what's generalizing.
link |
But the neural network, those weights
link |
are useless even to play another race.
link |
I train a network to play very well at Protoss versus Protoss.
link |
I need to throw away those weights.
link |
If I want to play now Terran versus Terran,
link |
I would need to retrain a network from scratch
link |
with the same algorithm.
link |
But the network itself will not be useful.
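The point above, that the training recipe generalizes while the learned weights do not, can be sketched as follows. This is a hypothetical, illustrative stand-in, not the AlphaStar code; only the matchup names are real StarCraft pairings.

```python
import random

# Illustrative stand-in for a generic training loop (not the AlphaStar code):
# the *algorithm* is shared across matchups, but each matchup starts from
# freshly initialized weights and nothing is transferred between them.
def train(matchup, steps=1000):
    rng = random.Random(matchup)  # deterministic per-matchup initialization
    weights = [rng.random() for _ in range(4)]  # fresh weights, from scratch
    for _ in range(steps):
        pass  # ...self-play games and gradient updates would go here...
    return weights

# Same formula applied to two different matchups; neither reuses the other's weights.
pvp_weights = train("Protoss_vs_Protoss")
tvt_weights = train("Terran_vs_Terran")
```

The two weight vectors come out unrelated, which mirrors the point that only the algorithm, not the trained network, is what generalizes.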
link |
So I think if I see an approach that
link |
can absorb or start solving new problems without the need
link |
to kind of restart the process, I
link |
think that, to me, would be a nice way
link |
to define some form of AGI.
link |
Again, I don't know about the grandiose, like, AGI.
link |
I mean, should Turing tests be solved before AGI?
link |
I mean, I don't know.
link |
I think concretely, I would like to see clearly
link |
that meta learning happen, meaning
link |
that there is an architecture or a network that
link |
as it sees a new problem or new data, it solves it.
link |
And to make it kind of a benchmark,
link |
it should solve it at the same speed
link |
that we do solve new problems.
link |
When I define you a new object and you
link |
have to recognize it, when you start playing a new game,
link |
you played all the Atari games.
link |
But now you play a new Atari game.
link |
Well, you're going to be pretty good at it pretty quickly.
link |
So perhaps what the domain is
link |
and what the exact benchmark is, is a bit difficult.
link |
I think as a community, we might need
link |
to do some work to define it.
link |
But I think this first step, I could
link |
see it happen relatively soon.
link |
But then the whole question of what AGI means and so on,
link |
I am a bit more confused about, because
link |
I think people mean different things.
link |
There's an emotional, psychological level,
link |
like even the Turing test, passing the Turing test
link |
is something that we just pass judgment on as human beings,
link |
what it means for a system to be an AGI system.
link |
What level, what does it mean?
link |
But I like the generalization.
link |
And maybe as a community, we converge
link |
towards a group of domains that are sufficiently far away.
link |
That would be really damn impressive
link |
if it was able to generalize.
link |
So perhaps not as close as Protoss and Zerg,
link |
but like Wikipedia.
link |
That would be a step.
link |
Yeah, that would be a good step and then a really good step.
link |
But then like from StarCraft to Wikipedia and back.
link |
Yeah, that kind of thing.
link |
And that feels also quite hard and far.
link |
But I think as long as you put the benchmark out,
link |
as we discovered, for instance, with ImageNet,
link |
then tremendous progress can be had.
link |
So I think maybe there's a lack of benchmark,
link |
but I'm sure we'll find one and the community will then
link |
work towards that.
link |
And then beyond what AGI might mean or would imply,
link |
I really am hopeful to see basically machine learning
link |
or AI just scaling up and helping people
link |
that might not have the resources to hire an assistant
link |
or that they might not even know what the weather is like.
link |
So I think in terms of the positive impact of AI,
link |
I think that's maybe what we should also not lose focus on.
link |
The research community building AGI,
link |
I mean, that's a real nice goal.
link |
But I think the way that DeepMind puts it is,
link |
solve intelligence, and then use it to solve everything else.
link |
So I think we should parallelize.
link |
Yeah, we shouldn't forget about all the positive things
link |
that are actually coming out of AI already
link |
and are going to be coming out.
link |
But on that note, let me ask relative
link |
to popular perception, do you have
link |
any worry about the existential threat
link |
of artificial intelligence in the near or far future
link |
that some people have?
link |
I think in the near future, I'm skeptical.
link |
So I hope I'm not wrong.
link |
But I'm not concerned, and I appreciate the efforts,
link |
ongoing efforts, and even a whole research
link |
field on AI safety emerging in conferences and so on.
link |
I think that's great.
link |
In the long term, I really hope we just
link |
can simply have the benefits outweigh
link |
the potential dangers.
link |
I am hopeful for that.
link |
But also, we must remain vigilant to monitor and assess
link |
whether the tradeoffs are there and we also have enough lead
link |
time to prevent or to redirect our efforts if need be.
link |
But I'm quite optimistic about the technology
link |
and definitely more fearful of other threats
link |
at the planetary level at this point.
link |
But obviously, that's the one I have more power on.
link |
So clearly, I do start thinking more and more about this.
link |
And it's grown on me actually to start reading more
link |
about AI safety, which is a field that so far I have not
link |
really contributed to.
link |
But maybe there's something to be done there as well.
link |
I think it's really important.
link |
I talk about this with a few folks.
link |
But it's important to ask you and shove it in your head
link |
because you're at the leading edge of actually what
link |
people are excited about in AI.
link |
The work with AlphaStar, it's arguably
link |
at the very cutting edge of the kind of thing
link |
that people are afraid of.
link |
And so you speaking to that fact and that we're actually
link |
quite far away from the kind of thing
link |
that people might be afraid of.
link |
But it's still worthwhile to think about.
link |
And it's also good that you're not as worried
link |
and you're also open to thinking about it.
link |
There's two aspects.
link |
I mean, me not being worried.
link |
But obviously, we should prepare for things
link |
that could go wrong, misuse of the technology,
link |
as with any technology.
link |
So I think there's always trade offs.
link |
And as a society, we've kind of solved this to some extent.
link |
So I'm hoping that by having the researchers
link |
and the whole community brainstorm and come up
link |
with interesting solutions to the new things that
link |
will happen in the future, that we can still also push
link |
the research to the avenue that I think
link |
is kind of the greatest avenue, which is
link |
to understand intelligence.
link |
How are we doing what we're doing?
link |
And obviously, from a scientific standpoint,
link |
that is kind of my personal drive of all the time
link |
that I spend doing what I'm doing, really.
link |
Where do you see the deep learning as a field heading?
link |
Where do you think the next big breakthrough might be?
link |
So I think deep learning, I discussed a little bit of this before.
link |
Deep learning has to be combined with some form
link |
of discretization, program synthesis.
link |
I think that's kind of as a research in itself
link |
is an interesting topic to expand and start
link |
doing more research.
link |
And then as for what deep learning
link |
will enable us to do in the future,
link |
I don't think that's going to happen this year.
link |
But there's also this idea of starting not to throw away all the weights,
link |
this idea of learning to learn,
link |
and really having these agents not having
link |
to restart their weights.
link |
And you can have an agent that is kind of solving or classifying
link |
images on ImageNet, but also generating speech
link |
if you ask it to generate some speech.
link |
And it should really be kind of almost the same network,
link |
but it might not be a neural network.
link |
It might be a neural network with an optimization
link |
algorithm attached to it.
link |
But I think this idea of generalization to new tasks
link |
is something that we first must define good benchmarks.
link |
But then I think that's going to be exciting.
link |
And I'm not sure how close we are.
link |
But I think if you have a very limited domain,
link |
I think we can start doing some progress.
link |
And much like how we made a lot of progress in computer vision,
link |
we should start thinking.
link |
I really like a talk that Léon Bottou gave at ICML
link |
a few years ago, which is that this train test paradigm should maybe be rethought.
link |
We should stop thinking about a training set and a test set.
link |
And these are closed things that are untouchable.
link |
I think we should go beyond these.
link |
And in meta learning, we call these the meta training
link |
set and the meta test set, which is really thinking about,
link |
if I know about ImageNet, why would that network not
link |
work on MNIST, which is a much simpler problem?
link |
But right now, it really doesn't.
link |
But it just feels wrong.
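A minimal sketch of the meta-train/meta-test split he describes, under the assumption that a "task" is just a named set of examples; the function and names below are hypothetical, for illustration only, not an existing benchmark:

```python
import random

def make_episodes(task_pool, n_meta_train, n_shots, seed=0):
    """Split a pool of tasks into disjoint meta-train and meta-test tasks,
    then draw one few-shot episode (support/query) per task."""
    rng = random.Random(seed)
    tasks = list(task_pool)
    rng.shuffle(tasks)
    meta_train, meta_test = tasks[:n_meta_train], tasks[n_meta_train:]

    def episode(task_name, examples):
        rng.shuffle(examples)
        return {"task": task_name,
                "support": examples[:n_shots],   # used to adapt to the task
                "query": examples[n_shots:]}     # used to evaluate the adaptation

    train_eps = [episode(t, list(task_pool[t])) for t in meta_train]
    test_eps = [episode(t, list(task_pool[t])) for t in meta_test]
    return train_eps, test_eps

# Tiny illustrative pool: each "task" is a new concept with a few examples.
pool = {"circles": [1, 2, 3, 4], "squares": [5, 6, 7, 8], "stars": [9, 10, 11, 12]}
train_eps, test_eps = make_episodes(pool, n_meta_train=2, n_shots=2)
```

The split happens at the level of whole tasks rather than individual examples: a learner tuned on the meta-train episodes is judged by how quickly it adapts to tasks it has never seen, which is the sense in which an ImageNet-trained system "should" also handle MNIST.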
link |
So I think, on the application
link |
or the benchmark side, we probably
link |
will see quite a bit more interest and progress,
link |
and hopefully people defining new and exciting challenges.
link |
Do you have any hope or interest in knowledge graphs
link |
within this context?
link |
So this kind of constructing graph.
link |
So going back to graphs.
link |
Well, neural networks and graphs.
link |
But I mean, a different kind of knowledge graph,
link |
sort of like semantic graphs or those concepts.
link |
So I think the idea of graphs is,
link |
so I've been quite interested in sequences first and then
link |
more interesting or different data structures like graphs.
link |
And I've studied graph neural networks in the last three years.
link |
I found these models just very interesting
link |
from a deep learning standpoint.
link |
But then why do we want these models
link |
and why would we use them?
link |
What's the application?
link |
What's kind of the killer application of graphs?
link |
And perhaps if we could extract a knowledge graph
link |
from Wikipedia automatically, that
link |
would be interesting because then these graphs have
link |
this very interesting structure that also is a bit more
link |
compatible with this idea of programs and deep learning
link |
kind of working together, jumping between neighborhoods.
link |
You could imagine defining some primitives
link |
to go around graphs, right?
link |
So I think I really like the idea of a knowledge graph.
link |
And in fact, when we started or as part of the research
link |
we did for StarCraft, I thought, wouldn't it
link |
be cool to give the graph of all these buildings that
link |
depend on each other, and units that have prerequisites
link |
to be built.
link |
And so this is information that the network
link |
can learn and extract.
link |
But it would have been great to see
link |
or to think of StarCraft really as a giant graph, where even
link |
as the game evolves, you start taking branches.
link |
And we did a bit of research on these,
link |
nothing too relevant, but I really like the idea.
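The tech-tree idea he mentions can be sketched as a small dependency graph. The buildings, units, and edges below are a simplified, illustrative fragment, not the exact StarCraft II tech tree:

```python
# Hypothetical, simplified Protoss tech-tree fragment: each node maps to its
# prerequisites. Edges are illustrative, not the full game's dependency rules.
TECH_TREE = {
    "Nexus": [],
    "Pylon": ["Nexus"],
    "Gateway": ["Pylon"],
    "CyberneticsCore": ["Gateway"],
    "Stalker": ["Gateway", "CyberneticsCore"],
}

def build_order(target, tree):
    """Return `target` and its prerequisites in a buildable order
    (depth-first postorder: every node appears after its prerequisites)."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for prereq in tree[node]:
            visit(prereq)
        order.append(node)

    visit(target)
    return order
```

A network could in principle learn these dependencies from play, but making the graph explicit gives exactly the kind of structure that graph neural networks, or simple traversal primitives like the one above, can operate on directly.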
link |
And it has elements that are something
link |
you also worked with in terms of visualizing your networks.
link |
It has elements of having human interpretable,
link |
being able to generate knowledge representations that
link |
are human interpretable that maybe human experts can then
link |
tweak or at least understand.
link |
So there's a lot of interesting aspect there.
link |
And for me personally, I'm just a huge fan of Wikipedia.
link |
And it's a shame that our neural networks aren't
link |
taking advantage of all the structured knowledge that's there.
link |
What's next for you?
link |
What's next for DeepMind?
link |
What are you excited about for AlphaStar?
link |
Yeah, so I think the obvious next steps
link |
would be to apply AlphaStar to other races.
link |
I mean, that sort of shows that the algorithm works
link |
because we wouldn't want to have created by mistake something
link |
in the architecture that happens to work for Protoss
link |
but not for other races.
link |
So as verification, I think that's an obvious next step
link |
that we are working on.
link |
And then I would like to see so agents and players can
link |
specialize in different skill sets that
link |
allow them to be very good.
link |
I think we've seen AlphaStar understanding very well
link |
when to take battles and when not to.
link |
Also very good at micromanagement
link |
and moving the units around and so on.
link |
And also very good at producing nonstop and trading off
link |
economy with building units.
link |
But I have not perhaps seen as much
link |
as I would like this idea of the poker idea
link |
that you mentioned, right?
link |
I'm not sure StarCraft or AlphaStar
link |
rather has developed a very deep understanding of what
link |
the opponent is doing and reacting to that
link |
and sort of trying to trick the player into doing something else.
link |
So this kind of reasoning, I would like to see more.
link |
So I think purely from a research standpoint,
link |
there's perhaps also quite a few things
link |
to be done there in the domain of StarCraft.
link |
Yeah, in the domain of games, I've
link |
seen some interesting work in even auctions,
link |
manipulating other players, sort of forming a belief state
link |
and just messing with people.
link |
Yeah, it's called theory of mind, I guess.
link |
Theory of mind, yeah.
link |
So it's fascinating.
link |
Theory of mind and StarCraft, it's kind of like they're
link |
really made for each other.
link |
So that would be very exciting to see those techniques apply
link |
to StarCraft or perhaps StarCraft
link |
driving new techniques, right?
link |
As I said, this is always the tension between the two.
link |
Well, Oriol, thank you so much for talking today.
link |
It was great to be here.