
Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20



link |
00:00:00.000
The following is a conversation with Oriol Vinyals.
link |
00:00:03.280
He's a senior research scientist at Google DeepMind,
link |
00:00:05.880
and before that, he was at Google Brain and Berkeley.
link |
00:00:09.120
His research has been cited over 39,000 times.
link |
00:00:13.280
He's truly one of the most brilliant and impactful minds
link |
00:00:16.520
in the field of deep learning.
link |
00:00:18.160
He's behind some of the biggest papers and ideas in AI,
link |
00:00:20.960
including sequence to sequence learning,
link |
00:00:23.080
audio generation, image captioning,
link |
00:00:25.480
neural machine translation,
link |
00:00:27.000
and, of course, reinforcement learning.
link |
00:00:29.600
He's a lead researcher of the AlphaStar project,
link |
00:00:32.800
creating an agent that defeated a top professional
link |
00:00:35.760
at the game of StarCraft.
link |
00:00:38.080
This conversation is part
link |
00:00:39.800
of the Artificial Intelligence podcast.
link |
00:00:41.800
If you enjoy it, subscribe on YouTube, iTunes,
link |
00:00:44.920
or simply connect with me on Twitter at Lex Fridman,
link |
00:00:48.800
spelled F R I D.
link |
00:00:51.240
And now, here's my conversation with Oriol Vinyals.
link |
00:00:55.440
You spearheaded the DeepMind team behind AlphaStar
link |
00:00:59.600
that recently beat a top professional player at StarCraft.
link |
00:01:04.040
So you have an incredible wealth of work
link |
00:01:07.720
in deep learning and a bunch of fields,
link |
00:01:09.480
but let's talk about StarCraft first.
link |
00:01:11.840
Let's go back to the very beginning,
link |
00:01:13.760
even before AlphaStar, before DeepMind,
link |
00:01:16.680
before deep learning first.
link |
00:01:18.840
What came first for you,
link |
00:01:21.280
a love for programming or a love for video games?
link |
00:01:24.960
I think for me, what definitely came first was
link |
00:01:28.560
the drive to play video games.
link |
00:01:31.960
I really liked computers.
link |
00:01:35.280
I didn't really code much, but what I would do is
link |
00:01:38.840
I would just mess with the computer, break it and fix it.
link |
00:01:42.080
That was the level of skill, I guess,
link |
00:01:43.800
that I gained in my very early days,
link |
00:01:46.400
I mean, when I was 10 or 11.
link |
00:01:48.520
And then I really got into video games,
link |
00:01:50.960
especially StarCraft, actually, the first version.
link |
00:01:53.680
I spent most of my time
link |
00:01:55.240
just playing kind of pseudo professionally,
link |
00:01:57.080
as professionally as you could play back in 98 in Europe,
link |
00:02:01.040
which was not really a mainstream scene
link |
00:02:03.080
like what's nowadays called esports.
link |
00:02:05.840
Right, of course, in the 90s.
link |
00:02:07.400
So how'd you get into StarCraft?
link |
00:02:09.920
What was your favorite race?
link |
00:02:11.680
How did you develop your skill?
link |
00:02:15.080
What was your strategy?
link |
00:02:16.880
All that kind of thing.
link |
00:02:18.040
So as a player, I tended not to play many games,
link |
00:02:21.520
so as not to disclose the strategies
link |
00:02:23.720
that I had developed.
link |
00:02:25.400
And I liked to play random, actually,
link |
00:02:27.560
not in competitions, but just to...
link |
00:02:30.040
I think in StarCraft, there's three main races
link |
00:02:33.400
and I found it very useful to play with all of them.
link |
00:02:36.280
And so I would choose random many times,
link |
00:02:38.360
even sometimes in tournaments,
link |
00:02:40.200
to gain skill on the three races
link |
00:02:42.360
because it's not only about how you play against someone,
link |
00:02:45.440
but also if you understand the race because you played,
link |
00:02:48.760
you also understand what's annoying,
link |
00:02:51.040
then when you're on the other side,
link |
00:02:52.480
what to do to annoy that person,
link |
00:02:54.160
to try to gain advantages here and there and so on.
link |
00:02:57.280
So I actually played random,
link |
00:02:59.080
although I must say in terms of favorite race,
link |
00:03:02.000
I really liked Zerg.
link |
00:03:03.640
I was probably best at Zerg
link |
00:03:05.480
and that's probably what I tended to use
link |
00:03:08.320
towards the end of my career before starting university.
link |
00:03:11.400
So let's step back a little bit.
link |
00:03:13.280
Could you try to describe StarCraft
link |
00:03:15.600
to people that may never have played video games,
link |
00:03:18.880
especially the massively online variety like StarCraft?
link |
00:03:22.280
So StarCraft is a real time strategy game.
link |
00:03:25.880
And the way to think about StarCraft,
link |
00:03:27.760
perhaps if you understand chess a bit,
link |
00:03:30.920
is that there's a board, which is called the map,
link |
00:03:34.200
where people play against each other.
link |
00:03:39.120
There's obviously many ways you can play,
link |
00:03:40.960
but the most interesting one is the one versus one setup
link |
00:03:44.600
where you just play against someone else
link |
00:03:47.360
or even the built in AI, right?
link |
00:03:49.280
Blizzard put in a system that can play the game
link |
00:03:51.600
reasonably well if you don't know how to play.
link |
00:03:54.480
And then in this board, you have again,
link |
00:03:57.080
pieces like in chess,
link |
00:03:58.680
but these pieces are not there initially
link |
00:04:01.400
like they are in chess.
link |
00:04:02.360
You actually need to decide to gather resources
link |
00:04:05.800
to decide which pieces to build.
link |
00:04:07.920
So in a way you're starting almost with no pieces.
link |
00:04:10.760
You start gathering resources in StarCraft.
link |
00:04:13.400
There's minerals and gas that you can gather.
link |
00:04:16.200
And then you must decide how much do you wanna focus
link |
00:04:19.440
for instance, on gathering more resources
link |
00:04:21.480
or starting to build units or pieces.
link |
00:04:24.360
And then once you have enough pieces
link |
00:04:27.200
or maybe a good attack composition,
link |
00:04:32.120
then you go and attack the other side of the map.
link |
00:04:35.480
And now the other main difference with chess
link |
00:04:37.800
is that you don't see the other side of the map.
link |
00:04:39.920
So you're not seeing the moves of the enemy.
link |
00:04:43.360
It's what we call partially observable.
link |
00:04:45.440
So as a result, you must not only decide
link |
00:04:48.680
trading off economy versus building your own units,
link |
00:04:52.320
but you also must decide whether you wanna scout
link |
00:04:54.960
to gather information, but also by scouting,
link |
00:04:57.840
you might be giving away some information
link |
00:04:59.520
that you might be hiding from the enemy.
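Partial observability, the fog of war described here, can be illustrated with a toy visibility filter. This is a minimal Python sketch under stated assumptions, not how StarCraft or AlphaStar actually computes what a player sees: units are hypothetical (x, y) points, and every friendly unit is assumed to share a single circular sight radius.

```python
import math


def visible_enemies(my_units, enemy_units, sight_range):
    """Toy fog-of-war filter (hypothetical model, not the game's real rules).

    Returns only the enemy positions that lie within sight_range of at
    least one of our own units; everything else stays hidden.
    """
    return [
        enemy
        for enemy in enemy_units
        if any(math.dist(unit, enemy) <= sight_range for unit in my_units)
    ]


# One scout at the origin sees the nearby enemy but not the distant base.
print(visible_enemies([(0.0, 0.0)], [(1.0, 1.0), (10.0, 10.0)], 3.0))
```

In this model, sending a scout amounts to adding a unit to `my_units` near the enemy, which is exactly the information-gathering trade-off described above.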
link |
00:05:01.960
So there's a lot of complex decision making
link |
00:05:04.960
all in real time.
link |
00:05:06.000
Also, unlike chess, this is not a turn-based game.
link |
00:05:10.120
You play basically all the time continuously
link |
00:05:13.680
and thus some skill in terms of speed
link |
00:05:16.280
and accuracy of clicking is also very important.
link |
00:05:18.920
And people that train for this really play this game
link |
00:05:21.480
at an amazing skill level.
link |
00:05:23.560
I've seen this many times,
link |
00:05:25.800
and if you can witness this live,
link |
00:05:27.360
it's really, really impressive.
link |
00:05:29.480
So in a way, it's kind of a chess
link |
00:05:31.400
where you don't see the other side of the board,
link |
00:05:33.400
you're building your own pieces
link |
00:05:35.200
and you also need to gather resources
link |
00:05:37.200
to basically get some money to build other buildings,
link |
00:05:40.680
pieces, technology and so on.
link |
00:05:42.840
From the perspective of a human player,
link |
00:05:45.120
the difference between that and chess
link |
00:05:47.200
or maybe that and a game like turn based strategy
link |
00:05:50.760
like Heroes of Might and Magic is that there's an anxiety
link |
00:05:55.160
because you have to make these decisions really quickly.
link |
00:05:58.760
And if you are not actually aware of what decisions work,
link |
00:06:04.360
it's a very stressful balance.
link |
00:06:06.480
Everything you describe is actually quite stressful,
link |
00:06:08.880
difficult to balance for an amateur human player.
link |
00:06:11.680
I don't know if it gets easier at the professional level,
link |
00:06:14.120
like if they're fully aware of what they have to do,
link |
00:06:16.440
but at the amateur level, there's this anxiety.
link |
00:06:19.240
Oh crap, I'm being attacked.
link |
00:06:20.440
Oh crap, I have to build up resources.
link |
00:06:22.760
Oh, I have to probably expand.
link |
00:06:24.320
And all of this,
link |
00:06:26.120
the real-time strategy aspect, is really stressful
link |
00:06:29.440
and, I'm sure, computationally difficult.
link |
00:06:31.320
We'll get into it.
link |
00:06:32.240
But for me, Battle.net,
link |
00:06:35.960
so StarCraft was released in 98, 20 years ago,
link |
00:06:42.600
which is hard to believe.
link |
00:06:44.640
And Blizzard's Battle.net came out with Diablo in 96.
link |
00:06:50.160
And to me, it might be a narrow perspective,
link |
00:06:52.560
but it changed online gaming and perhaps society forever.
link |
00:06:56.800
Yeah.
link |
00:06:57.640
But that may be way too narrow a viewpoint,
link |
00:07:00.280
but from your perspective,
link |
00:07:02.200
can you talk about the history of gaming
link |
00:07:05.040
over the past 20 years?
link |
00:07:06.440
How transformational,
link |
00:07:09.120
how important is this line of games?
link |
00:07:12.200
Right, so I think I kind of was an active gamer
link |
00:07:16.400
whilst this was developing, the internet, online gaming.
link |
00:07:20.040
So for me, the way it came was I played other games,
link |
00:07:24.800
strategy related, I played a bit of Command and Conquer,
link |
00:07:27.880
and then I played Warcraft II, which is from Blizzard.
link |
00:07:31.320
But at the time, I didn't know,
link |
00:07:32.520
I didn't understand what Blizzard was or anything.
link |
00:07:35.520
Warcraft II was just a game,
link |
00:07:36.800
which was actually very similar to StarCraft in many ways.
link |
00:07:39.760
It's also a real-time strategy game
link |
00:07:41.960
where there's orcs and humans, so there's only two races.
link |
00:07:44.880
But it was offline.
link |
00:07:46.000
And it was offline, right?
link |
00:07:47.480
So I remember a friend of mine came to school,
link |
00:07:51.120
and said, oh, there's this new cool game called StarCraft.
link |
00:07:53.480
And I just said, oh, this sounds like
link |
00:07:54.920
just a copy of Warcraft II, until I kind of installed it.
link |
00:07:59.240
And I'm from Spain, so at the time
link |
00:08:01.520
we didn't have very good internet, right?
link |
00:08:04.160
So for us,
link |
00:08:05.720
StarCraft first became kind of an offline experience
link |
00:08:09.080
where you kind of start to play these missions, right?
link |
00:08:12.480
You play against some sort of scripted things
link |
00:08:15.280
to develop the story of the characters in the game.
link |
00:08:18.520
And then later on, I started playing against the built-in AI,
link |
00:08:23.040
and I thought it was impossible to defeat it.
link |
00:08:25.680
Then eventually you defeat one
link |
00:08:27.000
and you can actually play against seven built in AIs
link |
00:08:29.240
at the same time, which also felt impossible.
link |
00:08:32.240
But actually, it's not that hard to beat
link |
00:08:34.840
seven built in AIs at once.
link |
00:08:36.520
So once we achieved that, we also discovered that
link |
00:08:40.120
we could play, as I said, the internet wasn't that great,
link |
00:08:43.400
but we could play over LAN, right?
link |
00:08:45.480
Like basically against each other
link |
00:08:47.600
if we were in the same place
link |
00:08:49.480
because you could just connect machines with like cables,
link |
00:08:51.880
right?
link |
00:08:53.200
So we started playing in LAN mode
link |
00:08:55.480
and as a group of friends,
link |
00:08:58.080
and it was really, really much more entertaining
link |
00:09:00.520
than playing against AIs.
link |
00:09:02.280
And later on, as internet was starting to develop
link |
00:09:05.120
and being a bit faster and more reliable,
link |
00:09:07.400
that's when I started experiencing Battle.net,
link |
00:09:09.720
which is this amazing universe,
link |
00:09:11.560
not only because of the fact
link |
00:09:13.720
that you can play the game against anyone in the world,
link |
00:09:16.440
but you can also get to know more people.
link |
00:09:20.200
You just get exposed to this vast variety of people;
link |
00:09:23.080
it was kind of when chats came about, right?
link |
00:09:25.320
There was a chat system.
link |
00:09:27.320
You could play against people,
link |
00:09:29.040
but you could also chat with people,
link |
00:09:30.720
not only about StarCraft, but about anything.
link |
00:09:32.480
And that became a way of life for kind of two years.
link |
00:09:36.640
And obviously then it became like kind of,
link |
00:09:38.880
it exploded for me, in that I started to play more seriously,
link |
00:09:42.240
going to tournaments and so on and so forth.
link |
00:09:44.680
Do you have a sense on a societal, sociological level,
link |
00:09:49.840
what's this whole part of society
link |
00:09:52.240
that many of us are not aware of
link |
00:09:53.800
and it's a huge part of society, which is gamers.
link |
00:09:56.840
I mean, every time I come across that in YouTube
link |
00:10:00.920
or streaming sites, I mean,
link |
00:10:03.160
there's a huge number of people who play games religiously.
link |
00:10:07.600
Do you have a sense of those folks,
link |
00:10:08.880
especially now that you've returned to that realm
link |
00:10:10.840
a little bit on the AI side?
link |
00:10:12.600
Yeah, so in fact, even after StarCraft,
link |
00:10:15.880
I actually played World of Warcraft,
link |
00:10:17.600
which is maybe the main sort of online world
link |
00:10:21.360
where you get to interact
link |
00:10:23.880
with lots of people.
link |
00:10:24.720
So I played that for a little bit.
link |
00:10:26.320
To me, it was a bit less stressful than StarCraft
link |
00:10:29.000
because winning was kind of a given.
link |
00:10:30.840
You're just put in this world
link |
00:10:32.320
and you can always complete missions.
link |
00:10:34.960
But I think it was actually the social aspect
link |
00:10:38.040
of especially StarCraft first
link |
00:10:40.400
and then games like World of Warcraft
link |
00:10:43.360
really shaped me in very interesting ways
link |
00:10:46.880
because what you get to experience
link |
00:10:48.480
is just people you wouldn't usually interact with, right?
link |
00:10:51.600
So even nowadays, I still have many Facebook friends
link |
00:10:54.920
from the era when I played online
link |
00:10:56.880
and their ways of thinking, even politically, vary.
link |
00:11:00.040
We don't live in the same place,
link |
00:11:01.560
we don't interact in the real world,
link |
00:11:03.640
but we were connected by basically fiber.
link |
00:11:06.680
And that way I actually get to understand a bit better
link |
00:11:10.760
that we live in a diverse world.
link |
00:11:12.760
And these were just connections that were made by,
link |
00:11:15.560
because, you know, I happened to go into a city,
link |
00:11:18.040
a virtual city, as a priest, and I met this warrior
link |
00:11:22.400
and we became friends
link |
00:11:23.600
and then we started playing together, right?
link |
00:11:25.640
So I think it's transformative
link |
00:11:28.720
and more and more people are aware of it.
link |
00:11:31.240
I mean, it's becoming quite mainstream,
link |
00:11:33.440
but back in the day, as you were saying, in 2000, 2005,
link |
00:11:37.560
it was still a very strange thing to do,
link |
00:11:42.040
especially in Europe.
link |
00:11:44.200
I think there were exceptions like Korea, for instance,
link |
00:11:47.080
it was amazing that everything happened so early
link |
00:11:50.560
in terms of cybercafes. If you go to Seoul,
link |
00:11:54.400
it's a city where, back in the day,
link |
00:11:57.040
kind of,
link |
00:11:58.360
you could be a celebrity by playing StarCraft,
link |
00:12:00.600
but this was like 99, 2000, right?
link |
00:12:03.000
It's not like recently.
link |
00:12:04.120
So yeah, it's quite interesting to look back
link |
00:12:08.520
and yeah, I think it's changing society.
link |
00:12:10.920
The same way, of course, like technology
link |
00:12:13.080
and social networks and so on are also transforming things.
link |
00:12:16.880
And a quick tangent, let me ask,
link |
00:12:18.440
you're also one of the most productive people
link |
00:12:20.960
in your particular chosen passion and path in life.
link |
00:12:26.400
And yet you also appreciate and enjoy video games.
link |
00:12:29.440
Do you think it's possible
link |
00:12:32.680
to enjoy video games in moderation?
link |
00:12:35.760
Someone told me that you could choose two out of three.
link |
00:12:39.880
When I was playing video games,
link |
00:12:41.120
you could choose having a girlfriend,
link |
00:12:43.680
playing video games or studying.
link |
00:12:46.200
And I think for the most part, it was relatively true.
link |
00:12:50.520
These things do take time.
link |
00:12:52.320
Games like StarCraft,
link |
00:12:53.320
if you take the game pretty seriously
link |
00:12:55.360
and you wanna study it,
link |
00:12:56.480
then you obviously will dedicate more time to it.
link |
00:12:59.040
And I definitely took gaming
link |
00:13:01.160
and obviously studying very seriously.
link |
00:13:03.640
I love learning science and so on.
link |
00:13:08.680
So to me, especially when I started university undergrad,
link |
00:13:13.080
I kind of stepped away from StarCraft.
link |
00:13:14.880
I actually fully stopped playing.
link |
00:13:16.800
And then World of Warcraft was a bit more casual.
link |
00:13:19.000
You could just connect online.
link |
00:13:20.400
And I mean, it was fun.
link |
00:13:22.880
But as I said, that was not as much time investment
link |
00:13:26.800
as it was for me in StarCraft.
link |
00:13:29.440
Okay, so let's get into AlphaStar.
link |
00:13:31.600
You're behind the team.
link |
00:13:35.160
So DeepMind has been working on StarCraft
link |
00:13:37.200
and released a bunch of cool open source agents
link |
00:13:39.360
and so on over the past few years.
link |
00:13:41.280
But AlphaStar really is the moment
link |
00:13:43.160
where for the first time you beat a world-class player.
link |
00:13:49.120
So what are the parameters of the challenge
link |
00:13:51.560
in the way that AlphaStar took it on
link |
00:13:53.440
and how did you and David
link |
00:13:55.240
and the rest of the DeepMind team get into it?
link |
00:13:58.240
Did you consider that you could even beat the best in the world,
link |
00:14:00.920
or top players?
link |
00:14:02.440
I think it all started back in 2015.
link |
00:14:08.040
Actually, I'm lying.
link |
00:14:08.880
I think it was 2014 when DeepMind was acquired by Google.
link |
00:14:14.000
And I at the time was at Google Brain,
link |
00:14:15.680
which was in California, is still in California.
link |
00:14:18.880
We had this summit where we got together, the two groups.
link |
00:14:21.800
So Google Brain and Google DeepMind got together
link |
00:14:24.360
and we gave a series of talks.
link |
00:14:26.320
And given that they were doing
link |
00:14:28.600
deep reinforcement learning for games,
link |
00:14:30.560
I decided to bring up part of my past,
link |
00:14:33.600
which I had developed at Berkeley,
link |
00:14:35.080
like this thing which we call Berkeley OverMind,
link |
00:14:37.400
which is really just a StarCraft 1 bot, right?
link |
00:14:40.160
So I talked about that.
link |
00:14:42.120
And I remember Demis just came to me and said,
link |
00:14:44.280
well, maybe not now, it's perhaps a bit too early,
link |
00:14:47.120
but you should just come to DeepMind
link |
00:14:48.920
and do this again with deep reinforcement learning, right?
link |
00:14:53.720
And at the time it sounded very science fiction
link |
00:14:56.600
for several reasons.
link |
00:14:58.760
But then in 2016, when I actually moved to London
link |
00:15:01.520
and joined DeepMind transferring from Brain,
link |
00:15:04.760
it became apparent that because of the AlphaGo moment
link |
00:15:08.200
and kind of Blizzard reaching out to us to say,
link |
00:15:11.280
wait, like, do you want the next challenge?
link |
00:15:13.000
And also me being full time at DeepMind,
link |
00:15:15.080
so all of these things came together.
link |
00:15:17.440
And then I went to Irvine in California,
link |
00:15:20.960
to the Blizzard headquarters to just chat with them
link |
00:15:23.800
and try to explain how it would all work
link |
00:15:26.320
before you do anything.
link |
00:15:27.800
And the approach has always been
link |
00:15:30.680
about the learning perspective, right?
link |
00:15:33.640
So at Berkeley, we did a lot of rule-based conditioning,
link |
00:15:39.160
like if you have more than three units, then go attack.
link |
00:15:42.520
And if the other has more units than me,
link |
00:15:44.200
I retreat and so on and so forth.
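The kind of hand-written rule described here can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical `scripted_action` function that sees only two unit counts; it is not the Berkeley Overmind's actual code, and the threshold of three simply mirrors the example given in conversation.

```python
def scripted_action(my_units: int, enemy_units: int) -> str:
    """Hypothetical hand-coded policy in the spirit of the rules described.

    Illustrative only: real scripted bots track far more state than two
    unit counts, and these thresholds are not taken from any real bot.
    """
    # Rule 1: if the other side has more units than we do, retreat.
    if enemy_units > my_units:
        return "retreat"
    # Rule 2: once we have more than three units, go attack.
    if my_units > 3:
        return "attack"
    # Otherwise, keep building up.
    return "build"


print(scripted_action(5, 2))  # attack
print(scripted_action(5, 7))  # retreat
print(scripted_action(2, 1))  # build
```

Checking the retreat rule before the attack rule is an arbitrary choice in this sketch. Every such threshold and ordering has to be chosen by a programmer, which is the contrast with the learning approach, where this behavior is meant to be learned instead.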
link |
00:15:46.360
And of course, the point of deep reinforcement learning,
link |
00:15:48.840
deep learning, machine learning in general
link |
00:15:50.480
is that all of this should be learned behavior.
link |
00:15:53.440
So that kind of was the DNA of the project
link |
00:15:56.960
since its inception in 2016,
link |
00:15:59.480
where we just didn't even have an environment to work with.
link |
00:16:02.880
And so that's how it all started really.
link |
00:16:05.840
So if you go back to that conversation with Demis
link |
00:16:08.600
or even in your own head, how far away did you,
link |
00:16:12.200
because we're talking about Atari games,
link |
00:16:14.480
we're talking about Go, which is kind of,
link |
00:16:16.680
if you're honest about it, really far away from StarCraft.
link |
00:16:20.120
In, well, now that you've beaten it,
link |
00:16:22.160
maybe you could say it's close,
link |
00:16:23.280
but it's much, it seems like StarCraft
link |
00:16:25.880
is way harder than Go philosophically
link |
00:16:29.120
and mathematically speaking.
link |
00:16:30.880
So how far away did you think you were?
link |
00:16:34.240
Did you think that in 2018 and 2019
link |
00:16:36.560
you could be doing as well as you have?
link |
00:16:37.960
Yeah, when I kind of thought about,
link |
00:16:40.120
okay, I'm gonna dedicate a lot of my time
link |
00:16:43.000
and focus on this.
link |
00:16:44.080
And obviously I do a lot of different research
link |
00:16:47.320
in deep learning.
link |
00:16:48.160
So spending time on it, I mean,
link |
00:16:50.000
I really had to kind of think
link |
00:16:51.480
there's gonna be something good happening out of this.
link |
00:16:55.120
So really I thought, well, this sounds impossible.
link |
00:16:58.400
And it probably is impossible to do the full thing,
link |
00:17:01.000
like the full game where you play one versus one
link |
00:17:06.080
and it's only a neural network playing and so on.
link |
00:17:09.120
So it really felt like,
link |
00:17:10.360
I just didn't even think it was possible.
link |
00:17:13.360
But on the other hand,
link |
00:17:14.200
I could see some stepping stones towards that goal.
link |
00:17:18.440
Clearly you could define subproblems in StarCraft
link |
00:17:21.000
and sort of dissect it a bit and say,
link |
00:17:22.760
okay, here is a part of the game, here's another part.
link |
00:17:26.080
And also obviously the fact,
link |
00:17:29.240
so this was really also critical to me,
link |
00:17:31.120
the fact that we could access human replays, right?
link |
00:17:34.240
So Blizzard was very kind.
link |
00:17:35.560
And in fact, they open-sourced these for the whole community
link |
00:17:38.400
where you can just go
link |
00:17:39.800
and it's not every single StarCraft game ever played,
link |
00:17:42.880
but it's a lot of them you can just go and download.
link |
00:17:45.720
And every day they will,
link |
00:17:47.000
you can just query a data set and say,
link |
00:17:48.800
well, give me all the games that were played today.
link |
00:17:51.520
And given my kind of experience with language
link |
00:17:55.640
and sequences and supervised learning,
link |
00:17:57.760
I thought, well, that's definitely gonna be very helpful
link |
00:18:00.600
and something quite unique now,
link |
00:18:02.240
because never before had we had such a large data set of replays,
link |
00:18:08.040
of people playing the game at this scale
link |
00:18:10.840
of such a complex video game, right?
link |
00:18:12.400
So that to me was a precious resource.
link |
00:18:15.480
And as soon as I knew that Blizzard
link |
00:18:17.240
was able to kind of give this to the community,
link |
00:18:20.800
I started to feel positive
link |
00:18:22.080
about something non trivial happening.
link |
00:18:24.120
But I also thought the full thing, like really no rules,
link |
00:18:28.240
no single line of code that tries to say,
link |
00:18:31.120
well, I mean, if you see this unit, build a detector,
link |
00:18:33.200
all these, not having any of these specializations
link |
00:18:36.560
seemed really, really, really difficult to me.
link |
00:18:38.960
Intuitively.
link |
00:18:39.800
I do also like that Blizzard was teasing
link |
00:18:42.520
or even trolling you,
link |
00:18:45.360
sort of almost, yeah, pulling you in
link |
00:18:48.440
into this really difficult challenge.
link |
00:18:50.160
Do they have any awareness?
link |
00:18:51.680
What's the interest from the perspective of Blizzard,
link |
00:18:55.600
except just curiosity?
link |
00:18:57.240
Yeah, I think Blizzard has really understood
link |
00:18:59.360
and really brought forward this competitiveness
link |
00:19:03.200
of esports in games.
link |
00:19:04.720
The StarCraft really kind of sparked a lot of,
link |
00:19:07.800
like something that almost was never seen,
link |
00:19:10.680
especially as I was saying, back in Korea.
link |
00:19:13.920
So they just probably thought,
link |
00:19:16.200
well, this is such a pure one versus one setup
link |
00:19:18.840
that it would be great to see
link |
00:19:21.120
if something that can play Atari or Go
link |
00:19:24.840
and then later on chess could even tackle
link |
00:19:27.920
this kind of complex real-time strategy game, right?
link |
00:19:30.600
So for them, they wanted to see first,
link |
00:19:33.320
obviously whether it was possible,
link |
00:19:36.440
if the game they created was in a way solvable
link |
00:19:39.760
to some extent.
link |
00:19:40.840
And I think on the other hand,
link |
00:19:42.160
they also are a pretty modern company that innovates a lot.
link |
00:19:45.760
So for them, it's about starting to understand
link |
00:19:48.520
how to bring AI into games:
link |
00:19:50.240
not games for AI, but AI for games, right?
link |
00:19:54.320
I mean, both ways I think can work.
link |
00:19:56.120
And we obviously at DeepMind use games for AI, right?
link |
00:20:00.040
To drive AI progress,
link |
00:20:01.240
but Blizzard, and many other companies,
link |
00:20:03.680
might actually start to understand
link |
00:20:06.040
and do the opposite.
link |
00:20:06.880
So I think that is also something
link |
00:20:08.600
they can get out of this.
link |
00:20:09.760
And we have definitely brainstormed a lot
link |
00:20:12.400
about this, right?
link |
00:20:13.680
But one of the interesting things to me
link |
00:20:15.120
about StarCraft and Diablo
link |
00:20:17.560
and these games that Blizzard has created
link |
00:20:19.360
is the task of balancing classes, for example.
link |
00:20:23.520
Sort of making the game fair from the starting point
link |
00:20:27.440
and then let skill determine the outcome.
link |
00:20:30.920
Is there, I mean, can you first comment,
link |
00:20:33.560
there's three races, Zerg, Protoss and Terran.
link |
00:20:36.760
I don't know if I've ever said that out loud.
link |
00:20:38.920
Is that how you pronounce it?
link |
00:20:40.040
Terran?
link |
00:20:40.880
Yeah, Terran.
link |
00:20:41.720
Yeah.
link |
00:20:44.120
Yeah, I don't think I've ever in person interacted
link |
00:20:46.480
with anybody about StarCraft, that's funny.
link |
00:20:49.600
So they seem to be pretty balanced.
link |
00:20:51.800
I wonder if the AI, the work that you're doing
link |
00:20:56.280
with AlphaStar would help balance them even further.
link |
00:20:59.200
Is that something you think about?
link |
00:21:00.560
Is that something that Blizzard is thinking about?
link |
00:21:03.360
Right, so balancing when you add a new unit
link |
00:21:06.440
or a new spell type is obviously possible
link |
00:21:09.160
given that you can always train or pretrain at scale
link |
00:21:13.240
some agent that might start using that in unintended ways.
link |
00:21:16.720
But I think actually, if you understand
link |
00:21:19.200
how StarCraft has kind of co-evolved with players,
link |
00:21:22.240
in a way, I think it's actually very cool
link |
00:21:24.360
the ways that many of the things and strategies
link |
00:21:27.440
that people came up with, right?
link |
00:21:28.720
So I think we've seen it over and over in StarCraft
link |
00:21:32.320
that Blizzard comes up with maybe a new unit
link |
00:21:35.000
and then some players get creative
link |
00:21:37.280
and do something kind of unintentional
link |
00:21:39.120
or something that Blizzard designers
link |
00:21:40.920
simply didn't test or think about.
link |
00:21:43.600
And then after that becomes kind of mainstream
link |
00:21:46.240
in the community, Blizzard patches the game
link |
00:21:48.280
and then they kind of maybe weaken that strategy
link |
00:21:51.920
or make it actually more interesting
link |
00:21:53.920
but a bit more balanced.
link |
00:21:55.440
So this kind of continual dialogue between players
link |
00:21:57.760
and Blizzard is what has defined them
link |
00:22:01.720
in most games, in StarCraft
link |
00:22:04.040
but also in World of Warcraft.
link |
00:22:06.440
There are several classes, and it would not be good
link |
00:22:09.280
that everyone plays absolutely the same race and so on, right?
link |
00:22:13.240
So I think they do care about balancing of course
link |
00:22:17.280
and they do a fair amount of testing
link |
00:22:19.640
but it's also beautiful to also see
link |
00:22:22.120
how players get creative anyway.
link |
00:22:24.480
And I mean, whether AI can be more creative at this point,
link |
00:22:27.440
I don't think so, right?
link |
00:22:28.680
I mean, it's just sometimes something so amazing happens.
link |
00:22:31.560
Like I remember back in the days,
link |
00:22:33.680
like, you had these dropships that could drop Reavers
link |
00:22:36.920
and that was actually not thought about
link |
00:22:39.600
that you could drop this unit
link |
00:22:41.280
that has this what's called splash damage
link |
00:22:43.240
that would basically eliminate
link |
00:22:45.640
all the enemy's workers at once.
link |
00:22:47.840
No one thought that you could actually put them
link |
00:22:50.120
in really early game, do that kind of damage
link |
00:22:53.080
and then things change in the game.
link |
00:22:55.440
But I don't know, I think it's quite an amazing
link |
00:22:58.040
exploration process from both sides,
link |
00:23:00.320
players and Blizzard alike.
link |
00:23:01.880
Well, it's almost like a reinforcement learning exploration
link |
00:23:05.040
but the scale of humans that play Blizzard games
link |
00:23:11.240
is almost on the scale of a large scale
link |
00:23:13.720
DeepMind RL experiment.
link |
00:23:15.360
I mean, if you look at the numbers,
link |
00:23:17.640
I mean, you're talking about, I don't know how many games
link |
00:23:19.560
but hundreds of thousands of games probably a month.
link |
00:23:22.080
Yeah.
link |
00:23:22.920
I mean, so it's almost the same as running RL agents.
link |
00:23:28.800
What aspect of the problem of StarCraft
link |
00:23:31.240
do you think is the hardest?
link |
00:23:32.160
Is it the, like you said, the imperfect information?
link |
00:23:35.400
Is it the fact that they have to do long-term planning?
link |
00:23:38.160
Is it the real-time aspect?
link |
00:23:40.320
We have to do stuff really quickly.
link |
00:23:42.240
Is it the large action space,
link |
00:23:44.760
so you can do so many possible things?
link |
00:23:47.640
Or is it, you know, in the game theoretic sense
link |
00:23:51.120
there is no Nash equilibrium
link |
00:23:52.400
or at least you don't know what the optimal strategy is
link |
00:23:54.280
because there's way too many options.
link |
00:23:56.520
Right.
link |
00:23:57.360
Is there something that stands out as just like the hardest
link |
00:23:59.520
the most annoying thing?
link |
00:24:01.000
So when we sort of looked at the problem
link |
00:24:04.200
and start to define like the parameters of it, right?
link |
00:24:07.640
What are the observations?
link |
00:24:08.800
What are the actions?
link |
00:24:10.520
It became very apparent that, you know,
link |
00:24:13.880
the very first barrier that one would hit in StarCraft
link |
00:24:17.160
would be because of the action space being so large
link |
00:24:20.720
and us not being able to search like you could in chess
link |
00:24:24.880
or Go, even though the search space there is vast.
link |
00:24:28.640
The main problem that we identified
link |
00:24:30.600
was that of exploration, right?
link |
00:24:32.440
So without any sort of human knowledge or human prior,
link |
00:24:36.720
if you think about StarCraft
link |
00:24:38.040
and you know how deep reinforcement learning algorithms
link |
00:24:40.880
work, which is essentially by issuing random actions
link |
00:24:45.400
and hoping that they will get some wins sometimes
link |
00:24:47.840
so they could learn.
link |
00:24:49.240
So if you think of the action space in StarCraft
link |
00:24:52.840
almost anything you can do in the early game is bad
link |
00:24:55.920
because any action involves taking workers
link |
00:24:58.760
which are mining minerals for free.
link |
00:25:01.360
That's something that the game does automatically
link |
00:25:03.560
sends them to mine.
link |
00:25:04.920
And you would immediately just take them out of mining
link |
00:25:07.760
and send them around.
link |
00:25:09.080
So just thinking how is it gonna be possible
link |
00:25:13.640
to get to understand these concepts
link |
00:25:16.920
but even more like expanding, right?
link |
00:25:19.280
There's these buildings you can place
link |
00:25:21.080
in other locations in the map to gather more resources
link |
00:25:24.160
but the location of the building is important
link |
00:25:26.840
and you have to select a worker,
link |
00:25:28.880
send it walking to that location, build the building,
link |
00:25:32.680
wait for the building to be built
link |
00:25:34.120
and then put extra workers there so they start mining.
link |
00:25:37.800
That feels like impossible if you just randomly click
link |
00:25:41.720
to produce that state, desirable state
link |
00:25:44.480
that then you could hope to learn from
link |
00:25:46.960
because eventually that may yield an extra win, right?
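To make the exploration argument concrete, here is a minimal sketch under toy assumptions: each step the agent picks one of `num_actions` discrete actions uniformly at random, and a multi-step plan (such as the expansion described above: select a worker, walk it over, build, wait, send workers) only pays off if one fixed sequence is hit exactly. The function names and the uniform action model are illustrative assumptions, far simpler than StarCraft's real action space.

```python
import random


def prob_random_sequence(num_actions: int, seq_len: int) -> float:
    """Chance that seq_len uniformly random picks from num_actions choices
    exactly match one fixed target sequence."""
    return (1.0 / num_actions) ** seq_len


def simulate(num_actions: int, seq_len: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (target = action 0 at every step)."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.randrange(num_actions) == 0 for _ in range(seq_len))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Toy sanity check: the estimate should land close to 1/27.
    print(simulate(3, 3, 100_000), prob_random_sequence(3, 3))
    # A hypothetical 5-step expansion plan under growing action-space sizes.
    for n in (10, 100, 1000):
        print(n, prob_random_sequence(n, 5))
```

The probability shrinks exponentially with the length of the plan and polynomially with the number of choices per step. That is why purely random clicking almost never stumbles onto something like a working expansion.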
link |
00:25:49.800
So for me, the exploration problem
link |
00:25:51.760
and due to the action space
link |
00:25:53.760
and the fact that there's not really turns,
link |
00:25:56.080
there's so many turns because the game essentially
link |
00:25:59.120
ticks 22 times per second.
link |
00:26:02.040
I mean, that's how they could discretize sort of time.
link |
00:26:05.520
Obviously you always have to discretize time
link |
00:26:07.280
but there's no such thing as real time
link |
00:26:09.600
but it's really a lot of time steps
link |
00:26:12.520
of things that could go wrong.
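A quick back-of-the-envelope sketch shows why so many time steps make the problem hard. It uses the 22 ticks per second quoted above and a hypothetical 15-minute game, and the discount factor `gamma = 0.99` is an illustrative assumption, not a value taken from AlphaStar.

```python
STEPS_PER_SECOND = 22  # game ticks per second, as quoted in the conversation


def num_steps(minutes: float, steps_per_second: float = STEPS_PER_SECOND) -> int:
    """Number of game ticks in a game lasting `minutes`."""
    return round(minutes * 60 * steps_per_second)


def discounted_weight(gamma: float, steps: int) -> float:
    """Weight that a reward arriving `steps` ticks in the future gets under discount gamma."""
    return gamma ** steps


if __name__ == "__main__":
    steps = num_steps(15)  # a hypothetical 15-minute game
    print(steps)  # 19800
    # With an illustrative gamma of 0.99, a win at the very end is
    # essentially invisible from the first tick.
    print(discounted_weight(0.99, steps))
```

Even a short game spans tens of thousands of ticks. Any signal that only arrives at the end (win or loss) must be credited across all of them, which compounds the exploration problem.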
link |
00:26:14.240
And that definitely felt a priori like the hardest.
link |
00:26:17.920
You mentioned many good ones.
link |
00:26:19.320
I think partial observability
link |
00:26:21.120
and the fact that there is no perfect strategy
link |
00:26:23.440
because of the partial observability.
link |
00:26:25.520
Those are very interesting problems.
link |
00:26:26.840
We're starting to see them more and more now
link |
00:26:28.520
in terms of as we solve the previous ones
link |
00:26:31.040
but the core problem to me was exploration
link |
00:26:34.240
and solving it has been basically kind of the focus
link |
00:26:37.720
and how we saw the first breakthroughs.
link |
00:26:39.760
So exploration in a multi-hierarchical way.
link |
00:26:43.680
So like 22 times a second exploration
link |
00:26:46.560
has a very different meaning than it does
link |
00:26:48.600
in terms of should I gather resources early
link |
00:26:51.440
or should I wait or so on.
link |
00:26:53.200
So how do you solve the long-term?
link |
00:26:56.200
Let's talk about the internals of AlphaStar.
link |
00:26:58.080
So first of all, how do you represent the state
link |
00:27:02.480
of the game as an input?
link |
00:27:05.440
How do you then do the long-term sequence modeling?
link |
00:27:08.800
How do you build a policy?
link |
00:27:10.760
What's the architecture like?
link |
00:27:12.560
So AlphaStar has obviously several components
link |
00:27:16.840
but everything passes through what we call the policy
link |
00:27:20.880
which is a neural network.
link |
00:27:22.280
And that's kind of the beauty of it.
link |
00:27:24.280
There is, I could just now give you a neural network
link |
00:27:27.160
and some weights.
link |
00:27:28.520
And if you fed the right observations
link |
00:27:30.440
and you understood the actions the same way we do
link |
00:27:32.560
you would have basically the agent playing the game.
link |
00:27:35.120
There's absolutely nothing else needed
link |
00:27:37.240
other than those weights that were trained.
link |
00:27:40.320
Now, the first step is observing the game
link |
00:27:43.360
and we've experimented with a few alternatives.
link |
00:27:46.640
The one that we currently use mixes both spatial
link |
00:27:50.280
sort of images that you would process from the game
link |
00:27:53.800
that is the zoomed out version of the map
link |
00:27:56.400
and also a zoomed in version of the camera
link |
00:27:58.960
or the screen as we call it.
link |
00:28:00.880
But also we give to the agent the list of units
link |
00:28:04.840
that it sees, more as a set of objects
link |
00:28:09.000
that it can operate on.
link |
00:28:11.040
That is not necessarily required to use it.
link |
00:28:14.760
And we have versions of the agent that play well
link |
00:28:16.840
without this set vision that is a bit not like
link |
00:28:19.760
how humans perceive the game.
link |
00:28:21.640
But it certainly helps a lot
link |
00:28:23.600
because it's a very natural way to encode the game
link |
00:28:26.520
is by just looking at all the units that there are.
link |
00:28:29.360
They have properties like health, position, type of unit
link |
00:28:33.920
whether it's my unit or the enemy's.
link |
00:28:36.160
And that sort of is kind of the summary
link |
00:28:40.760
of the state of the game,
link |
00:28:43.040
that list of units or set of units
link |
00:28:45.480
that you see all the time.
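The observation described above (zoomed-out map, zoomed-in screen, plus a set of units with health, position, type, and owner) can be sketched as a small data structure. The field names, the single-channel spatial layers, and the three-entry unit vocabulary are hypothetical, chosen for illustration. This is not AlphaStar's or PySC2's actual observation format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical unit vocabulary; the real game has many more unit types.
UNIT_TYPES = ["Probe", "Zealot", "Stalker"]


@dataclass
class Unit:
    unit_type: str
    owner: str  # "self" or "enemy"
    health: float
    x: float
    y: float


@dataclass
class Observation:
    minimap: List[List[float]]  # zoomed-out view of the whole map (one channel here)
    screen: List[List[float]]   # zoomed-in camera view
    units: List[Unit] = field(default_factory=list)  # treated as an unordered set


def unit_features(unit: Unit) -> List[float]:
    """Encode one unit as a flat vector: one-hot type, ownership, health, position."""
    one_hot = [1.0 if unit.unit_type == t else 0.0 for t in UNIT_TYPES]
    is_mine = 1.0 if unit.owner == "self" else 0.0
    return one_hot + [is_mine, float(unit.health), float(unit.x), float(unit.y)]


def encode_units(obs: Observation) -> List[List[float]]:
    """One feature vector per visible unit; the order of the list carries no meaning."""
    return [unit_features(u) for u in obs.units]


if __name__ == "__main__":
    obs = Observation(
        minimap=[[0.0]],
        screen=[[0.0]],
        units=[Unit("Stalker", "enemy", 80, 10, 20), Unit("Probe", "self", 20, 1, 2)],
    )
    for vec in encode_units(obs):
        print(vec)
```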
link |
00:28:47.360
But that's pretty close to the way humans see the game.
link |
00:28:49.560
Why do you say it's not, isn't that,
link |
00:28:51.520
you're saying the exactness of it is not similar to humans?
link |
00:28:55.040
The exactness of it is perhaps not the problem.
link |
00:28:57.200
I guess maybe the problem if you look at it
link |
00:28:59.800
from how actually humans play the game
link |
00:29:02.320
is that they play with a mouse and a keyboard and a screen
link |
00:29:05.720
and they don't see sort of a structured object
link |
00:29:08.720
with all the units.
link |
00:29:09.560
What they see is what they see on the screen, right?
link |
00:29:12.200
So.
link |
00:29:13.040
Remember that there's a, sorry to interrupt,
link |
00:29:14.360
there's a plot that you showed with camera-based
link |
00:29:16.960
where you do exactly that, right?
link |
00:29:18.600
You move around and that seems to converge
link |
00:29:21.080
to similar performance.
link |
00:29:22.240
Yeah, I think that's what I,
link |
00:29:23.520
we're kind of experimenting with what's necessary or not,
link |
00:29:26.320
but using the set.
link |
00:29:28.720
So, actually, if you look at research in computer vision,
link |
00:29:32.360
where it makes a lot of sense to treat images
link |
00:29:35.960
as two dimensional arrays,
link |
00:29:38.160
there's actually a very nice paper from Facebook.
link |
00:29:40.360
I think, I forgot who the authors are,
link |
00:29:42.720
but I think it's part of Kaiming's group.
link |
00:29:46.360
And what they do is they take an image,
link |
00:29:49.520
which is this two dimensional signal,
link |
00:29:51.920
and they actually take pixel by pixel
link |
00:29:54.280
and scramble the image as if it was just a list of pixels.
link |
00:29:59.120
Crucially, they encode the position of the pixels
link |
00:30:01.760
with the X, Y coordinates.
link |
00:30:03.680
And this is just kind of a new architecture,
link |
00:30:06.120
which we incidentally also use in StarCraft
link |
00:30:08.480
called the Transformer,
link |
00:30:09.800
which is a very popular paper from last year,
link |
00:30:11.960
which yielded very nice results in machine translation.
link |
00:30:15.560
And if you actually believe in this kind of,
link |
00:30:18.000
oh, it's actually a set of pixels,
link |
00:30:20.280
as long as you encode X, Y, it's okay,
link |
00:30:22.520
then you could argue that the list of units that we see
link |
00:30:26.080
is precisely that,
link |
00:30:26.920
because we have each unit as a kind of pixel, if you will,
link |
00:30:31.440
and then their X, Y coordinates.
link |
00:30:33.200
So in that perspective, we, without knowing it,
link |
00:30:36.360
we use the same architecture that was shown
link |
00:30:38.680
to work very well on Pascal and ImageNet and so on.
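The "image as a set of pixels" idea can be sketched in plain Python. The sketch assumes a parameter-free dot-product self-attention with no learned query, key, or value projections, so it is much simpler than a real Transformer layer. Each pixel becomes an `(x, y, value)` token, and the spatial layout survives only through those stored coordinates. The check in the main block shows that shuffling the tokens just shuffles the attention output the same way. A list of units, each carrying its own x and y, fits the same mold.

```python
import math
import random


def image_to_tokens(image):
    """Turn an H x W grid into a list of (x, y, value) tokens.

    The list order is arbitrary: position survives only because x and y
    are stored inside each token.
    """
    return [
        (float(x), float(y), float(v))
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    ]


def self_attention(tokens):
    """Parameter-free scaled dot-product self-attention (Q = K = V = tokens)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(
            tuple(sum(w * v[i] for w, v in zip(weights, tokens)) / total for i in range(d))
        )
    return out


if __name__ == "__main__":
    tokens = image_to_tokens([[0, 1], [2, 3]])
    perm = list(range(len(tokens)))
    random.Random(0).shuffle(perm)
    original = self_attention(tokens)
    shuffled = self_attention([tokens[i] for i in perm])
    # Permutation equivariance: output j of the shuffled run matches output perm[j].
    print(all(
        all(abs(p - q) < 1e-9 for p, q in zip(shuffled[j], original[i]))
        for j, i in enumerate(perm)
    ))
```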
link |
00:30:41.360
So the interesting thing here is putting it in that way
link |
00:30:45.400
it starts to move it towards
link |
00:30:46.880
the way you usually work with language.
link |
00:30:49.400
So what, and especially with your expertise
link |
00:30:52.680
and work in language,
link |
00:30:55.440
it seems like there's echoes of a lot of
link |
00:30:58.920
the way you would work with natural language
link |
00:31:00.640
in the way you've approached AlphaStar.
link |
00:31:02.320
Right.
link |
00:31:03.160
What's, does that help
link |
00:31:05.000
with the longterm sequence modeling there somehow?
link |
00:31:08.120
Exactly, so now that we understand
link |
00:31:10.160
what an observation for a given time step is,
link |
00:31:13.520
we need to move on to say,
link |
00:31:14.600
well, there's going to be a sequence of such observations
link |
00:31:17.680
and an agent will need to, given all that it's seen,
link |
00:31:21.040
not only the current time step, but all that it's seen, why?
link |
00:31:24.040
Because there is partial observability.
link |
00:31:25.880
We must remember whether we saw a worker going somewhere,
link |
00:31:29.000
for instance, right?
link |
00:31:30.040
Because then there might be an expansion
link |
00:31:31.680
on the top right of the map.
link |
00:31:33.560
So given that, what you must then think about is
link |
00:31:37.920
there is the problem of given all the observations,
link |
00:31:40.320
you have to predict the next action.
link |
00:31:42.560
And not only given all the observations,
link |
00:31:44.440
but given all the observations
link |
00:31:45.880
and given all the actions you've taken,
link |
00:31:47.840
predict the next action.
link |
00:31:49.280
And that sounds exactly like machine translation where,
link |
00:31:53.520
and that's exactly how kind of I saw the problem,
link |
00:31:57.080
especially when you are given supervised data
link |
00:31:59.920
or replays from humans,
link |
00:32:01.680
because the problem is exactly the same.
link |
00:32:03.520
You're translating essentially a prefix of observations
link |
00:32:07.600
and actions onto what's going to happen next,
link |
00:32:10.080
which is exactly how you would train a model to translate
link |
00:32:12.920
or to generate language as well, right?
link |
00:32:14.680
You have a certain prefix.
link |
00:32:16.560
You must remember everything that comes in the past
link |
00:32:18.920
because otherwise you might start having noncoherent text.
link |
00:32:22.560
And the same architectures we're using LSTMs
link |
00:32:26.480
and transformers to operate on across time
link |
00:32:29.680
to kind of integrate all that's happened in the past.
link |
00:32:33.000
Those architectures that work so well in translation
link |
00:32:35.640
or language modeling are exactly the same
link |
00:32:38.320
as what the agent is using to issue actions in the game.
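The translation analogy comes down to how training examples are laid out. This minimal sketch, with hypothetical names, pairs every step's history (all observations so far plus all earlier actions) with the action that came next, exactly as a language model pairs a prefix with the next word. A real LSTM or Transformer would consume the history incrementally rather than materializing every prefix as shown here.

```python
from typing import List, Sequence, Tuple, TypeVar

O = TypeVar("O")
A = TypeVar("A")


def next_action_examples(
    observations: Sequence[O], actions: Sequence[A]
) -> List[Tuple[Tuple[List[O], List[A]], A]]:
    """Pair every step's history with the action that followed it.

    Step t sees observations o_0..o_t and past actions a_0..a_{t-1}
    and must predict a_t, just as a language model predicts the next
    word from the prefix before it.
    """
    if len(observations) != len(actions):
        raise ValueError("expected one action per observation")
    examples = []
    for t in range(len(actions)):
        history = (list(observations[: t + 1]), list(actions[:t]))
        examples.append((history, actions[t]))
    return examples


if __name__ == "__main__":
    for history, target in next_action_examples(["o0", "o1", "o2"], ["a0", "a1", "a2"]):
        print(history, "->", target)
```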
link |
00:32:42.280
And the way we train it, moreover, for imitation,
link |
00:32:44.680
which is step one of AlphaStar is,
link |
00:32:47.040
take all the human experience and try to imitate it,
link |
00:32:49.800
much like you try to imitate translators
link |
00:32:52.840
that translated many pairs of sentences
link |
00:32:55.280
from French to English say,
link |
00:32:57.200
that sort of principle applies exactly the same.
link |
00:33:00.120
It's almost the same code, except that instead of words,
link |
00:33:04.440
you have slightly more complicated objects,
link |
00:33:06.600
which are the observations and the actions
link |
00:33:08.840
are also a bit more complicated than a word.
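The imitation step can be sketched as the same objective used for translation: maximize the likelihood of the action the human actually took, i.e. minimize the mean negative log-likelihood (cross-entropy). This sketch assumes each action is a single discrete token scored by one vector of logits. As noted above, real StarCraft actions are more structured than a word, so this is a deliberate simplification.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def imitation_loss(batch_logits: Sequence[Sequence[float]], human_actions: Sequence[int]) -> float:
    """Mean negative log-likelihood of the human's action under the policy."""
    total = 0.0
    for logits, action in zip(batch_logits, human_actions):
        total -= math.log(softmax(logits)[action])
    return total / len(human_actions)


if __name__ == "__main__":
    # A uniform policy over 4 actions pays log(4) per step; a confident,
    # correct policy pays almost nothing.
    print(imitation_loss([[0.0, 0.0, 0.0, 0.0]], [2]))
    print(imitation_loss([[10.0, 0.0, 0.0]], [0]))
```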
link |
00:33:11.720
Is there a self play component then too?
link |
00:33:13.920
So once you run out of imitation?
link |
00:33:16.480
Right, so indeed you can bootstrap from human replays,
link |
00:33:22.240
but then the agents you get are actually not as good
link |
00:33:25.960
as the humans you imitated, right?
link |
00:33:28.160
So how do we imitate?
link |
00:33:30.440
Well, we take humans from 3000 MMR and higher.
link |
00:33:34.280
3000 MMR is just a metric of human skill
link |
00:33:37.960
and 3000 MMR might be like the 50th percentile, right?
link |
00:33:41.880
So it's just average human.
link |
00:33:43.760
What's that?
link |
00:33:44.600
So maybe quick pause, MMR is a ranking scale,
link |
00:33:47.760
the matchmaking rating for players.
link |
00:33:50.320
So it's 3000, I remember there's like a master
link |
00:33:52.320
and a grand master, what's 3000?
link |
00:33:54.120
So 3000 is pretty bad.
link |
00:33:56.720
I think it's kind of gold level.
link |
00:33:58.440
It just sounds really good relative to chess, I think.
link |
00:34:00.680
Oh yeah, yeah, no, the ratings,
link |
00:34:02.440
the best in the world are at 7,000 MMR.
link |
00:34:05.320
So 3000, it's a bit like Elo indeed, right?
link |
00:34:07.840
So 3,500 just allows us to not filter a lot of the data.
link |
00:34:13.200
So we like to have a lot of data in deep learning
link |
00:34:15.680
as you probably know.
link |
00:34:17.320
So we take these kind of 3,500 and above,
link |
00:34:20.640
but then we do a very interesting trick,
link |
00:34:22.680
which is we tell the neural network
link |
00:34:25.000
what level they are imitating.
link |
00:34:27.560
So we say, this replay you're gonna try to imitate
link |
00:34:30.800
to predict the next action for all the actions
link |
00:34:33.040
that you're gonna see is a 4,000 MMR replay.
link |
00:34:36.120
This one is a 6,000 MMR replay.
link |
00:34:38.840
And what's cool about this is then we take this policy
link |
00:34:42.520
that is being trained from human,
link |
00:34:44.320
and then we can ask it to play like a 3000 MMR player
link |
00:34:47.440
by setting a bit, saying, well, okay,
link |
00:34:49.600
play like a 3000 MMR player
link |
00:34:51.280
or play like a 6,000 MMR player.
link |
00:34:53.720
And you actually see how the policy behaves differently.
link |
00:34:57.320
It gets a worse economy if it plays like a gold level player,
link |
00:35:01.520
it does fewer actions per minute,
link |
00:35:03.000
which is the number of clicks or number of actions
link |
00:35:05.360
that you will issue in a whole minute.
link |
00:35:07.800
And it's very interesting to see
link |
00:35:09.240
that it kind of imitates the skill level quite well.
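The filtering and skill-conditioning trick described above can be sketched in a few lines. The 3,500 threshold comes from the conversation. The `Replay` type, the function names, and the choice to append a normalized MMR scalar to the policy's input are hypothetical, since the exact conditioning mechanism isn't described here. During training each replay's own MMR is fed in. At play time you pick the value, for example 6,000, to ask for "play like a 6,000 MMR player".

```python
from dataclasses import dataclass, field
from typing import List

MIN_MMR = 3500  # replay filter threshold mentioned in the conversation
MMR_SCALE = 7000.0  # hypothetical normaliser (roughly the top of the ladder)


@dataclass
class Replay:
    mmr: int
    steps: list = field(default_factory=list)  # (observation, action) pairs, elided


def filter_replays(replays: List[Replay], min_mmr: int = MIN_MMR) -> List[Replay]:
    """Keep replays at or above the skill threshold."""
    return [r for r in replays if r.mmr >= min_mmr]


def conditioned_input(obs_features: List[float], mmr: float) -> List[float]:
    """Append the skill level the policy should imitate to its input features."""
    return list(obs_features) + [mmr / MMR_SCALE]


if __name__ == "__main__":
    data = filter_replays([Replay(3000), Replay(4000), Replay(6000)])
    print([r.mmr for r in data])  # [4000, 6000]
    # At play time, choose the level to imitate.
    print(conditioned_input([0.5], 6000))
```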
link |
00:35:12.360
But if we ask it to play like a 6,000 MMR player,
link |
00:35:15.480
we tested, of course, these policies to see how well they do.
link |
00:35:18.640
They actually beat all the built in AIs
link |
00:35:20.600
that Blizzard put in the game,
link |
00:35:22.440
but they're nowhere near 6,000 MMR players, right?
link |
00:35:25.000
They might be maybe around gold level, platinum, perhaps.
link |
00:35:29.280
So there's still a lot of work to be done for the policy
link |
00:35:32.240
to truly understand what it means to win.
link |
00:35:35.000
So far, we only asked them, okay, here is the screen.
link |
00:35:38.240
And that's what's happened on the game until this point.
link |
00:35:41.680
What would the next action be if we ask a pro to now say,
link |
00:35:46.160
oh, you're gonna click here or here or there.
link |
00:35:49.160
And the point is experiencing wins and losses
link |
00:35:53.720
is very important to then start to refine.
link |
00:35:56.400
Otherwise the policy can get loose,
link |
00:35:58.400
can just go off policy as we call it.
link |
00:36:00.520
That's so interesting that you can at least hope eventually
link |
00:36:03.480
to be able to control a policy
link |
00:36:06.840
approximately to be at some MMR level.
link |
00:36:10.000
That's so interesting, especially given that you have
link |
00:36:12.720
ground truth for a lot of these cases.
link |
00:36:15.080
Can I ask you a personal question?
link |
00:36:17.560
What's your MMR?
link |
00:36:19.240
Well, I haven't played StarCraft II, so I am unranked,
link |
00:36:23.680
which is the kind of lowest league.
link |
00:36:26.200
So I used to play StarCraft, the first one.
link |
00:36:29.600
But you haven't seriously played StarCraft II.
link |
00:36:32.680
So the best player we have at DeepMind is about 5,000 MMR,
link |
00:36:37.760
which is high masters.
link |
00:36:39.640
It's not at grand master level.
link |
00:36:42.120
Grand master level will be the top 200 players
link |
00:36:44.680
in a certain region like Europe or America or Asia.
link |
00:36:49.160
But for me, it would be hard to say.
link |
00:36:51.640
I am very bad at the game.
link |
00:36:53.760
I actually played AlphaStar a bit too late and it beat me.
link |
00:36:56.680
I remember the whole team was, oh, Oriol, you should play.
link |
00:36:59.760
And I was, oh, it looks like it's not so good yet.
link |
00:37:02.240
And then I remember I kind of got busy
link |
00:37:04.960
and waited an extra week and I played
link |
00:37:07.320
and it really beat me very badly.
link |
00:37:09.760
Was that, I mean, how did that feel?
link |
00:37:11.560
Isn't that an amazing feeling?
link |
00:37:12.720
That's amazing, yeah.
link |
00:37:13.640
I mean, obviously I tried my best
link |
00:37:16.560
and I tried to also impress my,
link |
00:37:18.120
because I actually played the first game.
link |
00:37:19.840
So I'm still pretty good at micromanagement.
link |
00:37:23.160
The problem is I just don't understand StarCraft II.
link |
00:37:25.320
I understand StarCraft.
link |
00:37:27.000
And when I played StarCraft,
link |
00:37:28.560
I probably was consistently like for a couple of years,
link |
00:37:32.760
top 32 in Europe.
link |
00:37:34.720
So I was decent, but at the time we didn't have
link |
00:37:37.280
this kind of MMR system as well established.
link |
00:37:40.400
So it would be hard to know what it was back then.
link |
00:37:43.240
So what's the difference in interface
link |
00:37:44.720
between AlphaStar in StarCraft
link |
00:37:47.800
and a human player in StarCraft?
link |
00:37:49.720
Are there any significant differences
link |
00:37:52.120
between the way they both see the game?
link |
00:37:54.200
I would say the way they see the game,
link |
00:37:56.080
there's a few things that are just very hard to simulate.
link |
00:38:01.080
The main one perhaps, which is obvious in hindsight
link |
00:38:05.240
is what's called cloaked units, which are invisible units.
link |
00:38:10.600
So in StarCraft, you can make some units
link |
00:38:13.280
that require a particular kind of unit
link |
00:38:16.800
to detect them.
link |
00:38:18.080
So these units are invisible.
link |
00:38:20.600
If you cannot detect them, you cannot target them.
link |
00:38:22.760
So they would just destroy your buildings
link |
00:38:25.800
or kill your workers.
link |
00:38:27.760
But despite the fact you cannot target the unit,
link |
00:38:31.680
there's a shimmer that as a human you observe.
link |
00:38:34.640
I mean, you need to train a little bit,
link |
00:38:35.960
you need to pay attention,
link |
00:38:37.480
but you would see this kind of space time distortion
link |
00:38:41.920
and you would know, okay, there are, yeah.
link |
00:38:44.880
Yeah, there's like a wave thing.
link |
00:38:46.080
Yeah, it's called shimmer.
link |
00:38:47.720
Space time distortion, I like it.
link |
00:38:49.200
That's really like, the Blizzard term is shimmer.
link |
00:38:51.960
Shimmer, okay.
link |
00:38:52.800
And so this shimmer, professional players
link |
00:38:55.600
actually can see it immediately.
link |
00:38:57.160
They understand it very well,
link |
00:38:59.520
but it's still something that requires
link |
00:39:01.440
certain amount of attention
link |
00:39:02.720
and it's kind of a bit annoying to deal with.
link |
00:39:05.680
Whereas for AlphaStar, in terms of vision,
link |
00:39:08.640
it's very hard for us to simulate sort of,
link |
00:39:11.120
oh, are you looking at this pixel in the screen and so on?
link |
00:39:14.200
So the only thing we can do is,
link |
00:39:17.520
there is a unit that's invisible over there.
link |
00:39:19.720
So AlphaStar would know that immediately.
link |
00:39:22.520
It obviously still obeys the rules.
link |
00:39:24.040
You cannot attack the unit.
link |
00:39:25.200
You must have a detector and so on,
link |
00:39:27.440
but it's kind of one of the main things
link |
00:39:29.360
where it just doesn't feel like there's a very proper way.
link |
00:39:32.720
I mean, you could imagine, oh, you don't have hypers.
link |
00:39:35.520
Maybe you don't know exactly where it is,
link |
00:39:37.000
or sometimes you see it, sometimes you don't,
link |
00:39:39.280
but it's just really, really complicated to get it
link |
00:39:43.040
so that everyone would agree,
link |
00:39:44.320
oh, that's the best way to simulate this, right?
link |
00:39:47.680
It seems like a perception problem.
link |
00:39:49.320
It is a perception problem.
link |
00:39:50.640
So the only problem is people, you ask,
link |
00:39:54.280
oh, what's the difference between
link |
00:39:55.320
how humans perceive the game?
link |
00:39:56.760
I would say they wouldn't be able to tell a shimmer
link |
00:39:59.960
immediately as it appears on the screen,
link |
00:40:02.240
whereas AlphaStar in principle sees it very sharply, right?
link |
00:40:05.640
It sees that the bit turned from zero to one,
link |
00:40:08.680
meaning there's now a unit there,
link |
00:40:10.480
although you don't know the unit,
link |
00:40:11.960
or you know that you cannot attack it and so on.
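The "bit turned from zero to one" point can be illustrated with a tiny sketch. It assumes a hypothetical binary feature layer that marks cloaked units. Comparing two consecutive frames finds every newly set cell at once, with none of the attention a human needs to spot a shimmer. The layer and function names are illustrative, not part of any real API.

```python
from typing import List, Tuple

Grid = List[List[int]]


def newly_set_cells(prev_layer: Grid, curr_layer: Grid) -> List[Tuple[int, int]]:
    """Return (x, y) cells whose bit flipped from 0 to 1 between two frames."""
    return [
        (x, y)
        for y, (prev_row, curr_row) in enumerate(zip(prev_layer, curr_layer))
        for x, (p, c) in enumerate(zip(prev_row, curr_row))
        if p == 0 and c == 1
    ]


if __name__ == "__main__":
    before = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    after = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # a cloaked unit appears at x=2, y=1
    print(newly_set_cells(before, after))
```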
link |
00:40:15.840
So that from a vision standpoint,
link |
00:40:18.080
that probably is the one that is kind of the most obvious one.
link |
00:40:22.960
Then there are things humans cannot do perfectly,
link |
00:40:25.160
even professionals, which is they might miss a detail,
link |
00:40:28.080
or they might have not seen a unit.
link |
00:40:30.600
And obviously as a computer,
link |
00:40:32.240
if there's a corner of the screen that turns green
link |
00:40:35.000
because a unit enters the field of view,
link |
00:40:37.680
that can go into the memory of the agent, the LSTM,
link |
00:40:41.040
and persist there for a while,
link |
00:40:42.480
and for however long is relevant, right?
link |
00:40:45.680
And in terms of action,
link |
00:40:47.680
it seems like the rate of action from AlphaStar
link |
00:40:50.720
is comparable to, if not slower than, professional players,
link |
00:40:54.280
but it's more precise is what I read.
link |
00:40:57.120
So that's really probably the one that is causing us
link |
00:41:01.840
more issues for a couple of reasons, right?
link |
00:41:05.000
The first one is StarCraft has been an AI environment
link |
00:41:08.400
for quite a few years.
link |
00:41:09.960
In fact, I mean, I was participating
link |
00:41:12.760
in the very first competition back in 2010.
link |
00:41:15.880
And there's really not been a kind of a very clear set
link |
00:41:19.880
of rules on what the actions per minute,
link |
00:41:22.320
the rate of actions that you can issue, should be.
link |
00:41:24.720
And as a result, these agents or bots that people build
link |
00:41:29.280
in a kind of almost very cool way,
link |
00:41:31.080
they do like 20,000, 40,000 actions per minute.
link |
00:41:35.400
Now, to put this in perspective,
link |
00:41:37.200
a very good professional human
link |
00:41:39.520
might do 300 to 800 actions per minute.
link |
00:41:44.080
They might not be as precise.
link |
00:41:45.480
That's why the range is a bit tricky to identify exactly.
link |
00:41:49.040
I mean, 300 actions per minute precisely
link |
00:41:51.560
is probably realistic.
link |
00:41:53.400
800 is probably not, but you see humans doing a lot of actions
link |
00:41:56.960
because they warm up and they kind of select things
link |
00:41:59.480
and spam and so on just so that when they need,
link |
00:42:02.240
they have the accuracy.
link |
00:42:04.320
So we came into this by not having kind of a standard way
link |
00:42:09.680
to say, well, how do we measure whether an agent is
link |
00:42:13.240
at human level or not?
link |
00:42:15.760
On the other hand, we had a huge advantage,
link |
00:42:18.320
which is because we do imitation learning,
link |
00:42:21.360
agents turned out to act like humans
link |
00:42:24.480
in terms of rate of actions, even
link |
00:42:26.240
precisions and imprecisions of actions
link |
00:42:28.720
in the supervised policy.
link |
00:42:30.160
You could see all these.
link |
00:42:31.160
You could see how agents like to spam click, to move here.
link |
00:42:34.600
If you've played Diablo especially, you would know what I mean.
link |
00:42:37.280
I mean, you just like spam, oh, move here, move here,
link |
00:42:39.720
move here.
link |
00:42:40.320
You're doing literally like maybe five actions
link |
00:42:43.280
in two seconds, but these actions are not
link |
00:42:45.640
very meaningful.
link |
00:42:46.840
One would have sufficed.
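The arithmetic behind spam clicking is easy to check: five clicks in two seconds already counts as 150 actions per minute, even though one click would have done the job. This sketch computes raw APM and a deduplicated rate that collapses consecutive identical commands. The deduplication rule is a hypothetical heuristic for illustration, not how AlphaStar or Blizzard define effective actions.

```python
from typing import Hashable, List


def apm(num_actions: int, seconds: float) -> float:
    """Actions per minute for `num_actions` issued over `seconds`."""
    return num_actions * 60.0 / seconds


def collapse_spam(actions: List[Hashable]) -> List[Hashable]:
    """Drop consecutive duplicate commands, e.g. repeated 'move here' clicks."""
    collapsed: List[Hashable] = []
    for action in actions:
        if not collapsed or collapsed[-1] != action:
            collapsed.append(action)
    return collapsed


if __name__ == "__main__":
    clicks = [("move", 10, 12)] * 5  # five identical move commands in two seconds
    print(apm(len(clicks), 2.0))  # 150.0
    print(apm(len(collapse_spam(clicks)), 2.0))  # 30.0
```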
link |
00:42:48.720
So on the one hand, we start from this imitation policy
link |
00:42:52.080
that is in the ballpark of the actions per minute of humans
link |
00:42:55.600
because it's actually statistically
link |
00:42:57.280
trying to imitate humans.
link |
00:42:58.920
So we see this very nicely in the curves
link |
00:43:01.040
that we showed in the blog post.
link |
00:43:02.480
There's these actions per minute,
link |
00:43:04.480
and the distribution looks very human like.
link |
00:43:07.640
But then, of course, as self play kicks in,
link |
00:43:10.920
and that's the part we haven't talked about too much yet,
link |
00:43:13.240
but of course, the agent must play against itself to improve,
link |
00:43:17.160
then there are almost no guarantees
link |
00:43:19.600
that these actions will not become more precise
link |
00:43:22.400
or even that the rate of actions won't increase over time.
link |
00:43:26.120
So what we did, and this is probably
link |
00:43:29.080
the first attempt that we thought was reasonable,
link |
00:43:31.200
is we looked at the distribution of actions
link |
00:43:33.120
for humans for certain windows of time.
link |
00:43:36.360
And just to give a perspective, because I guess I mentioned
link |
00:43:39.280
that some of these agents that are programmatic,
link |
00:43:41.640
let's call them.
link |
00:43:42.320
They do 40,000 actions per minute.
link |
00:43:44.560
Professionals, as I said, do 300 to 800.
link |
00:43:47.320
So what we did is we looked at the distribution
link |
00:43:49.400
over professional gamers, and we took reasonably high actions
link |
00:43:53.680
per minute, but we kind of identified certain cutoffs
link |
00:43:57.400
after which, even if the agent wanted to act,
link |
00:44:00.520
these actions would be dropped.
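The cutoff scheme described above can be sketched as a sliding-window rate limiter. The sketch first picks a cap from a high percentile of human APM samples, then drops any action that would exceed that cap within the window. The APM samples, the 90th percentile, the 5-second window, and all names are hypothetical. The conversation does not give AlphaStar's actual windows or thresholds.

```python
import math
from collections import deque


def percentile(values, q):
    """Nearest-rank percentile (0 < q <= 100) of a list of numbers."""
    ordered = sorted(values)
    k = max(0, math.ceil(q / 100 * len(ordered)) - 1)
    return ordered[k]


class ActionRateLimiter:
    """Accept at most `max_actions` actions in any sliding window of `window_s` seconds.

    Actions requested beyond the cap are dropped, even if the agent wanted them.
    """

    def __init__(self, max_actions: int, window_s: float):
        self.max_actions = max_actions
        self.window_s = window_s
        self.accepted = deque()  # timestamps of accepted actions, oldest first

    def try_act(self, t: float) -> bool:
        # Forget accepted actions that have fallen out of the window.
        while self.accepted and t - self.accepted[0] >= self.window_s:
            self.accepted.popleft()
        if len(self.accepted) < self.max_actions:
            self.accepted.append(t)
            return True
        return False  # over the cap: the action is dropped


if __name__ == "__main__":
    # Hypothetical per-game APM samples from professional players.
    human_apm = [180, 240, 300, 320, 410, 520, 610]
    cutoff_apm = percentile(human_apm, 90)
    window_s = 5.0
    limiter = ActionRateLimiter(int(cutoff_apm * window_s / 60), window_s)
    print(cutoff_apm, limiter.max_actions)
```

A sliding window caps bursts as well as the average rate. A single per-game APM limit would still allow very fast, precise bursts during fights, which is exactly the behavior the discussion that follows is about.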
link |
00:44:02.920
But the problem is this cutoff is probably set a bit too high.
link |
00:44:07.040
And what ends up happening, even though the games,
link |
00:44:10.040
and when we ask the professionals and the gamers,
link |
00:44:12.040
by and large, they feel like it's playing humanlike,
link |
00:44:15.840
there are some agents that developed maybe slightly
link |
00:44:20.640
too high APMs, which is actions per minute,
link |
00:44:24.200
combined with the precision, which
link |
00:44:27.000
made people start discussing a very interesting issue, which
link |
00:44:30.520
is, should we have limited these?
link |
00:44:32.440
Should we just let it loose and see what cool things
link |
00:44:35.880
it can come up with?
link |
00:44:37.040
Right?
link |
00:44:37.520
Interesting.
link |
00:44:38.200
So this is in itself an extremely interesting
link |
00:44:41.520
question, but the same way that modeling the shimmer
link |
00:44:44.000
would be so difficult, modeling absolutely all the details
link |
00:44:47.720
about muscles and precision and tiredness of humans
link |
00:44:51.680
would be quite difficult.
link |
00:44:52.960
So we're really here kind of innovating
link |
00:44:56.280
in this sense of, OK, what could be maybe
link |
00:44:58.960
the next iteration of putting more rules that
link |
00:45:02.040
makes the agents more humanlike in terms of restrictions?
link |
00:45:06.360
Yeah, putting constraints that.
link |
00:45:08.120
More constraints, yeah.
link |
00:45:09.240
That's really interesting.
link |
00:45:10.200
That's really innovative.
link |
00:45:11.200
So one of the constraints you put on yourself,
link |
00:45:15.360
or at least focused on, is the Protoss race,
link |
00:45:18.040
as far as I understand.
link |
00:45:19.920
Can you tell me about the different races
link |
00:45:21.920
and how they, so Protoss, Terran, and Zerg,
link |
00:45:26.000
how do they compare?
link |
00:45:27.080
How do they interact?
link |
00:45:28.160
Why did you choose Protoss?
link |
00:45:30.360
Yeah, in the dynamics of the game seen
link |
00:45:34.000
from a strategic perspective.
link |
00:45:35.680
So Protoss, so in StarCraft there are three races.
link |
00:45:39.680
Indeed, in the demonstration, we saw only the Protoss race.
link |
00:45:43.880
So maybe let's start with that one.
link |
00:45:45.560
Protoss is kind of the most technologically advanced race.
link |
00:45:49.440
It has units that are expensive but powerful.
link |
00:45:53.800
So in general, you want to kind of conserve your units
link |
00:45:57.880
as you go attack.
link |
00:45:59.520
And then you want to utilize these tactical advantages
link |
00:46:03.280
of very fancy spells and so on and so forth.
link |
00:46:07.320
And at the same time, they're kind of,
link |
00:46:11.480
people say they're a bit easier to play perhaps.
link |
00:46:15.280
But that I actually didn't know.
link |
00:46:17.160
I mean, I just talked now a lot to the players
link |
00:46:20.160
that we work with, TLO and Mana, and they said, oh yeah,
link |
00:46:23.360
Protoss is actually, people think,
link |
00:46:24.720
is actually one of the easiest races.
link |
00:46:26.360
So perhaps it's the easiest, but that doesn't
link |
00:46:28.840
mean much; obviously, professional players
link |
00:46:32.680
excel at all three races.
link |
00:46:34.120
And there's never a race that dominates
link |
00:46:37.560
for a very long time anyway.
link |
00:46:38.800
So if you look at the top, I don't know, 100 in the world,
link |
00:46:41.680
is there one race that dominates that list?
link |
00:46:44.280
It would be hard to know because it depends on the regions.
link |
00:46:46.840
I think it's pretty equal in terms of distribution.
link |
00:46:50.600
And Blizzard wants it to be equal.
link |
00:46:53.360
They wouldn't want one race like Protoss
link |
00:46:56.280
to not be represented at the top.
link |
00:46:59.880
So definitely, they try to keep it balanced.
link |
00:47:03.800
So then maybe the opposite race of Protoss is Zerg.
link |
00:47:07.280
Zerg is a race where you just kind of expand and take over
link |
00:47:11.680
as many resources as you can, and they
link |
00:47:14.360
have a very high capacity to regenerate their units.
link |
00:47:17.760
So if you have an army, it's not that valuable in terms
link |
00:47:20.920
of losing it; losing the whole army is not a big deal as Zerg
link |
00:47:23.920
because you can then rebuild it.
link |
00:47:25.840
And given that you generally accumulate
link |
00:47:28.320
a huge bank of resources, Zergs typically
link |
00:47:31.800
play by applying a lot of pressure,
link |
00:47:34.200
maybe losing their whole army, but then rebuilding it
link |
00:47:37.040
quickly.
link |
00:47:37.920
So although, of course, every race, I mean, there's never,
link |
00:47:42.560
I mean, they're pretty diverse.
link |
00:47:43.960
I mean, there are some units in Zerg that
link |
00:47:45.320
are technologically advanced, and they do
link |
00:47:47.160
some very interesting spells.
link |
00:47:48.760
And there are some units in Protoss that are less valuable,
link |
00:47:51.360
and you could lose a lot of them and rebuild them,
link |
00:47:53.480
and it wouldn't be a big deal.
link |
00:47:55.080
All right, so maybe I'm missing out.
link |
00:47:57.840
Maybe I'm going to say some dumb stuff, but summary
link |
00:48:01.680
of strategy.
link |
00:48:02.520
So first, there's collection of a lot of resources.
link |
00:48:05.720
That's one option.
link |
00:48:06.560
The other one is expanding, so building other bases.
link |
00:48:11.920
Then the other is obviously building units
link |
00:48:15.640
and attacking with those units.
link |
00:48:17.160
And then I don't know what else there is.
link |
00:48:20.640
Maybe there's the different timing of attacks,
link |
00:48:24.080
like do I attack early, attack late?
link |
00:48:26.000
What are the different strategies that emerged
link |
00:48:28.000
that you've learned about?
link |
00:48:29.120
I've read that a bunch of people are super happy
link |
00:48:31.360
that you guys have apparently, that AlphaStar apparently
link |
00:48:34.440
has discovered that it's really good to,
link |
00:48:36.400
what is it, saturate?
link |
00:48:38.040
Oh yeah, the mineral line.
link |
00:48:39.600
Yeah, the mineral line.
link |
00:48:41.400
Yeah, yeah.
link |
00:48:42.240
And that's for greedy amateur players like myself.
link |
00:48:45.640
That's always been a good strategy.
link |
00:48:47.520
You just build up a lot of money,
link |
00:48:49.040
and it just feels good to just accumulate and accumulate.
link |
00:48:53.320
So thank you for discovering that and validating all of us.
link |
00:48:56.720
But are there other strategies that you discovered
link |
00:48:59.240
that are interesting, unique to this game?
link |
00:49:01.840
Yeah, so if you look at the kind of,
link |
00:49:05.280
not being a StarCraft II player,
link |
00:49:06.480
but of course StarCraft and StarCraft II
link |
00:49:08.080
and real-time strategy games in general are very similar.
link |
00:49:12.120
I would classify perhaps the openings of the game.
link |
00:49:17.560
They're very important.
link |
00:49:18.760
And generally I would say there's two kinds of openings.
link |
00:49:21.760
One that's a standard opening.
link |
00:49:23.400
That's generally how players find sort of a balance
link |
00:49:28.400
between risk and economy and building some units early on
link |
00:49:32.880
so that they could defend,
link |
00:49:34.080
but they're not too exposed basically,
link |
00:49:36.280
but also expanding quite quickly.
link |
00:49:38.920
So this would be kind of a standard opening.
link |
00:49:41.520
And within a standard opening,
link |
00:49:43.120
then what you do choose generally is
link |
00:49:45.320
what technology are you aiming towards?
link |
00:49:47.840
So there's a bit of rock, paper, scissors
link |
00:49:49.760
of you could go for spaceships
link |
00:49:52.400
or you could go for invisible units
link |
00:49:54.560
or you could go for, I don't know,
link |
00:49:55.920
like massive units that attack against certain kinds
link |
00:49:58.760
of units, but they're weak against others.
link |
00:50:01.080
So standard openings themselves have some choices
link |
00:50:05.200
like rock, paper, scissors style.
link |
00:50:06.960
Of course, if you scout and you're good
link |
00:50:08.480
at guessing what the opponent is doing,
link |
00:50:10.520
then you can play at an advantage
link |
00:50:12.240
because if you know you're gonna play rock,
link |
00:50:13.920
I mean, I'm gonna play paper obviously.
link |
00:50:15.920
So you can imagine that normal standard games
link |
00:50:18.600
in StarCraft look like a continuous rock, paper,
link |
00:50:22.400
scissors game where you guess what the distribution
link |
00:50:26.080
of rock, paper, and scissors is from the enemy
link |
00:50:29.400
and react accordingly to try to beat it
link |
00:50:32.840
or put the paper out before he kind of changes his mind
link |
00:50:36.880
from rock to scissors,
link |
00:50:38.360
and then you would be in a weak position.
link |
00:50:39.960
So, sorry to pause on that.
link |
00:50:41.640
I didn't realize this element
link |
00:50:42.800
because I know it's true with poker.
link |
00:50:44.360
I know I looked at Libratus.
link |
00:50:48.320
So you're also estimating trying to guess the distribution,
link |
00:50:51.720
trying to better and better estimate the distribution
link |
00:50:53.680
of what the opponent is likely to be doing.
link |
00:50:55.560
Yeah, I mean, as a player,
link |
00:50:56.960
you definitely wanna have a belief state
link |
00:50:59.360
over what's up on the other side of the map.
link |
00:51:02.520
And when your belief state becomes inaccurate,
link |
00:51:05.080
when you start having serious doubts,
link |
00:51:07.560
whether he's gonna play something that you must know,
link |
00:51:10.800
that's when you scout.
link |
00:51:11.920
You wanna then gather information, right?
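The belief-state idea above can be made concrete with a small sketch. Vinyals says AlphaStar only models this implicitly, so the explicit count-based (Dirichlet-style) belief below is purely an illustration, not the agent's mechanism: rock, paper, and scissors stand in for opening archetypes, and the `BeliefState` class, the uniform prior, and the 0.6 scouting threshold are all hypothetical choices.

```python
# Hypothetical toy model: rock/paper/scissors stand in for opening archetypes.
OPENINGS = ("rock", "paper", "scissors")
# COUNTER[x] is the choice that beats x.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if COUNTER[theirs] == mine else -1


class BeliefState:
    """Count-based belief over the opponent's opening (illustrative only)."""

    def __init__(self, prior=1.0):
        # Uniform pseudo-counts: before scouting, every opening is equally likely.
        self.counts = {o: prior for o in OPENINGS}

    def observe(self, opening):
        """Update the belief with one scouted observation."""
        self.counts[opening] += 1

    def distribution(self):
        total = sum(self.counts.values())
        return {o: c / total for o, c in self.counts.items()}

    def should_scout(self, confidence=0.6):
        """Scout when no single opening is believed likely enough (arbitrary threshold)."""
        return max(self.distribution().values()) < confidence

    def best_response(self):
        """Pick the choice with the highest expected payoff under the current belief."""
        belief = self.distribution()
        return max(
            OPENINGS,
            key=lambda mine: sum(p * payoff(mine, theirs) for theirs, p in belief.items()),
        )


belief = BeliefState()
print(belief.should_scout())   # True: the belief is still uniform
for _ in range(3):
    belief.observe("rock")     # scouting keeps showing rock-like builds
print(belief.best_response())  # paper
print(belief.should_scout())   # False: rock now has probability 4/6
```

The pseudo-count prior keeps the belief from collapsing after a single scouting trip; a larger prior would make the sketch slower to change its mind.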
link |
00:51:14.040
Is improving the accuracy of the belief
link |
00:51:15.960
or improving the belief state part of the loss
link |
00:51:19.360
that you're trying to optimize?
link |
00:51:20.560
Or is it just a side effect?
link |
00:51:22.360
It's implicit, but you could explicitly model it,
link |
00:51:25.440
and it would be quite good at probably predicting
link |
00:51:27.880
what's on the other side of the map.
link |
00:51:30.000
But so far, it's all implicit.
link |
00:51:32.520
There's no additional reward for predicting the enemy.
link |
00:51:36.320
So there's these standard openings,
link |
00:51:38.400
and then there's what people call cheese,
link |
00:51:41.240
which is very interesting.
link |
00:51:42.400
And AlphaStar sometimes really likes this kind of cheese.
link |
00:51:46.360
These cheeses, what they are is kind of an all-in strategy.
link |
00:51:50.440
You're gonna do something sneaky.
link |
00:51:53.240
You're gonna hide your own buildings
link |
00:51:56.680
close to the enemy base,
link |
00:51:58.200
or you're gonna go for hiding your technological buildings
link |
00:52:01.600
so that you make invisible units
link |
00:52:03.040
and the enemy just cannot react to detect them
link |
00:52:06.040
and thus loses the game.
link |
00:52:07.960
And there's quite a few of these cheeses
link |
00:52:10.000
and variants of them.
link |
00:52:11.760
And that's actually where the belief state
link |
00:52:14.480
becomes even more important.
link |
00:52:16.360
Because if I scout your base and I see no buildings at all,
link |
00:52:20.200
any human player knows something's up.
link |
00:52:22.480
They might know, well,
link |
00:52:23.320
you're hiding something close to my base.
link |
00:52:25.640
Should I build suddenly a lot of units to defend?
link |
00:52:28.400
Should I actually block my ramp with workers
link |
00:52:31.000
so that you cannot come and destroy my base?
link |
00:52:33.520
So all of this is happening
link |
00:52:35.680
and defending against cheeses is extremely important.
link |
00:52:39.440
And in the AlphaStar League,
link |
00:52:40.800
many agents actually develop some cheesy strategies.
link |
00:52:45.080
And in the games we saw against TLO and Mana,
link |
00:52:48.040
two out of the 10 agents
link |
00:52:49.240
were actually doing these kind of strategies
link |
00:52:51.760
which are cheesy strategies.
link |
00:52:53.640
And then there's a variant of cheesy strategy
link |
00:52:55.600
which is called all-in.
link |
00:52:57.360
So an all-in strategy is perhaps not as drastic as,
link |
00:53:00.440
oh, I'm gonna build cannons on your base
link |
00:53:02.520
and then bring all my workers
link |
00:53:03.840
and try to just disrupt your base and game over,
link |
00:53:06.800
or GG as we say in StarCraft.
link |
00:53:09.800
There's these kind of very cool things
link |
00:53:11.960
that you can align precisely at a certain time mark.
link |
00:53:14.720
So for instance,
link |
00:53:15.680
you can generate an exact 10-unit composition
link |
00:53:19.520
that is perfect, like five of this type,
link |
00:53:21.440
five of this other type,
link |
00:53:22.920
and align the upgrade
link |
00:53:24.360
so that at four minutes and a half, let's say,
link |
00:53:27.240
you have these 10 units and the upgrade just finished.
link |
00:53:30.600
And at that point, that army is really scary.
link |
00:53:33.960
And unless the enemy really knows what's going on,
link |
00:53:36.440
if you push, you might then have an advantage
link |
00:53:40.240
because maybe the enemy is doing something more standard,
link |
00:53:42.440
it expanded too much, it developed too much economy,
link |
00:53:45.760
and it traded off badly against having defenses,
link |
00:53:49.720
and the enemy will lose.
link |
00:53:51.120
But it's called all-in because if you don't win,
link |
00:53:53.640
then you're gonna lose.
link |
00:53:55.040
So you see players that do these kinds of strategies,
link |
00:53:57.960
if they don't succeed, game is not over.
link |
00:54:00.000
I mean, they still have a base
link |
00:54:01.200
and they're still gathering minerals,
link |
00:54:02.840
but they will just GG out of the game
link |
00:54:04.760
because they know, well, game is over.
link |
00:54:06.760
I gambled and I failed.
link |
00:54:08.840
So if we start entering the game theoretic aspects
link |
00:54:12.480
of the game, it's really rich and it's really,
link |
00:54:15.200
that's why it also makes it quite entertaining to watch.
link |
00:54:17.960
Even if I don't play, I still enjoy watching the game.
link |
00:54:21.760
But the agents are trying to do this mostly implicitly.
link |
00:54:26.880
But one element that we improved in self play
link |
00:54:29.120
is creating the AlphaStar League.
link |
00:54:31.400
And the AlphaStar League is not pure self play.
link |
00:54:34.640
It's trying to create different personalities of agents
link |
00:54:37.960
so that some of them will become cheesy agents.
link |
00:54:41.560
Some of them might become very economical, very greedy,
link |
00:54:44.440
like getting all the resources,
link |
00:54:46.240
but then being maybe early on, they're gonna be weak,
link |
00:54:48.840
but later on, they're gonna be very strong.
link |
00:54:51.080
And by creating these personalities of agents,
link |
00:54:53.480
which sometimes just happens naturally
link |
00:54:55.440
that you can see kind of an evolution of agents
link |
00:54:58.280
that given the previous generation,
link |
00:55:00.840
they train against all of them
link |
00:55:02.000
and then they generate kind of the perfect counter
link |
00:55:04.400
to that distribution.
link |
00:55:05.800
But these agents, you must have them in the populations
link |
00:55:09.320
because if you don't have them,
link |
00:55:11.320
you're not covered against these things.
link |
00:55:13.440
You wanna create all sorts of opponents
link |
00:55:17.120
that you will find in the wild.
link |
00:55:18.680
So you can be exposed to these cheeses, early aggression,
link |
00:55:23.120
later aggression, more expansions,
link |
00:55:25.760
dropping units in your base from the side, all these things.
link |
00:55:29.600
And pure self play is getting a bit stuck
link |
00:55:32.800
at finding some subset of these, but not all of these.
link |
00:55:36.240
So the AlphaStar League is a way
link |
00:55:38.400
to kind of do an ensemble of agents
link |
00:55:41.600
that they're all playing in a league,
link |
00:55:43.520
much like people play on Battle.net, right?
link |
00:55:45.560
They play, you play against someone
link |
00:55:47.480
who does a new cool strategy and you immediately,
link |
00:55:50.280
oh my God, I wanna try it, I wanna play again.
link |
00:55:53.080
And this to me was another critical part of the problem,
link |
00:55:57.600
which was, can we create a Battle.net for agents?
link |
00:56:01.280
And that's kind of what the AlphaStar League really is.
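A toy way to see why a league can help where pure self-play gets stuck is to reuse the rock-paper-scissors analogy from earlier. This is a sketch under heavy simplifying assumptions, not DeepMind's league algorithm: each "agent" is a single pure strategy, and training is replaced by an exact best response. In the hypothetical `self_play`, each new agent counters only the latest one, so the population cycles forever; in the hypothetical `league`, each new agent counters the mixture of every agent so far (essentially fictitious play), and the population's mixture drifts toward the uniform, hard-to-exploit strategy.

```python
from collections import Counter

# Strategies: 0 = rock, 1 = paper, 2 = scissors (toy stand-ins for play styles).
ROCK, PAPER, SCISSORS = 0, 1, 2
# PAYOFF[i][j] is the payoff to a player using i against an opponent using j.
PAYOFF = [
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
]


def expected_payoffs(mixture):
    """Expected payoff of each pure strategy against a mixed opponent."""
    return [sum(PAYOFF[i][j] * mixture[j] for j in range(3)) for i in range(3)]


def best_response(mixture):
    """Pure best response to an opponent mixture (ties go to the lowest index)."""
    payoffs = expected_payoffs(mixture)
    return payoffs.index(max(payoffs))


def mixture_of(population):
    """Empirical frequency of each strategy in a population of agents."""
    counts = Counter(population)
    return [counts[s] / len(population) for s in range(3)]


def exploitability(mixture):
    """Best payoff an opponent can get against this mixture (0 at the uniform mix)."""
    return max(expected_payoffs(mixture))


def self_play(generations, start=ROCK):
    """Each new agent is trained only against the latest agent."""
    population = [start]
    for _ in range(generations):
        population.append(best_response(mixture_of([population[-1]])))
    return population


def league(generations, start=ROCK):
    """Each new agent is trained against the whole population so far."""
    population = [start]
    for _ in range(generations):
        population.append(best_response(mixture_of(population)))
    return population


print(self_play(6))  # [0, 1, 2, 0, 1, 2, 0]: rock, paper, scissors, rock, ...
print(exploitability(mixture_of(self_play(6)[-1:])))  # 1.0: the latest agent is fully exploitable
print(round(exploitability(mixture_of(league(1000))), 3))  # small, and shrinking with more generations
```

The point of the comparison is the design choice the conversation describes: keeping old "personalities" in the opponent pool forces each new agent to cover all of them, instead of only beating whatever the previous generation happened to do.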
link |
00:56:03.560
That's fascinating.
link |
00:56:04.400
And where they stick to their different strategies.
link |
00:56:06.920
Yeah, wow, that's really, really interesting.
link |
00:56:09.880
But that said, you were fortunate enough
link |
00:56:13.240
or just skilled enough to win five-zero.
link |
00:56:17.320
And so how hard is it to win?
link |
00:56:19.280
I mean, that's not the goal.
link |
00:56:20.320
I guess, I don't know what the goal is.
link |
00:56:21.880
The goal should be to win the majority, not five-zero,
link |
00:56:25.400
but how hard is it in general to win all matchups
link |
00:56:29.360
in a 1v1?
link |
00:56:31.080
So that's a very interesting question
link |
00:56:33.600
because once you see AlphaStar and superficially
link |
00:56:38.680
you think, well, okay, it won.
link |
00:56:40.520
Let's, if you sum all the games like 10 to one, right?
link |
00:56:42.960
It lost the game that it played with the camera interface.
link |
00:56:46.280
You might think, well, that's done, right?
link |
00:56:48.760
It's superhuman at the game.
link |
00:56:50.800
And that's not really a claim we can make, actually.
link |
00:56:55.960
The claim is we beat a professional gamer
link |
00:56:58.800
for the first time.
link |
00:57:00.080
StarCraft has really been a thing
link |
00:57:02.440
that has been going on for a few years,
link |
00:57:04.080
but a moment like this had not occurred before.
link |
00:57:09.480
But are these agents impossible to beat?
link |
00:57:12.360
Absolutely not, right?
link |
00:57:13.400
So that's a bit what's kind of the difference is
link |
00:57:17.320
the agents play at grandmaster level.
link |
00:57:19.520
They definitely understand the game enough
link |
00:57:21.480
to play extremely well, but are they unbeatable?
link |
00:57:24.920
Do they play perfectly?
link |
00:57:26.600
No, and actually in StarCraft,
link |
00:57:29.280
because of these sneaky strategies,
link |
00:57:32.160
it's always possible that you might take a huge risk
link |
00:57:34.920
sometimes, but you might get wins, right?
link |
00:57:36.880
Out of this.
link |
00:57:38.160
So I think that as a domain,
link |
00:57:41.560
it still has a lot of opportunities,
link |
00:57:43.320
not only because of course we wanna learn
link |
00:57:45.840
with less experience, we would like to,
link |
00:57:47.720
I mean, if I learned to play Protoss,
link |
00:57:49.680
I can play Terran and learn it much quicker
link |
00:57:52.520
than AlphaStar can, right?
link |
00:57:53.640
So there are obvious interesting research challenges
link |
00:57:56.720
as well, but even as far as the raw performance goes,
link |
00:58:02.320
really the claim here can be we are at pro level
link |
00:58:05.200
or at high grandmaster level,
link |
00:58:08.320
but obviously the players also did not know what to expect,
link |
00:58:13.440
right?
link |
00:58:14.280
Their prior distribution was a bit off
link |
00:58:15.960
because they played this kind of new like alien brain
link |
00:58:19.400
as they like to say it, right?
link |
00:58:21.000
And that's what makes it exciting for them.
link |
00:58:24.120
But also I think if you look at the games closely,
link |
00:58:27.160
you see there were weaknesses in some points,
link |
00:58:30.680
maybe AlphaStar did not scout,
link |
00:58:32.520
or if it had invisible units going against
link |
00:58:35.240
at certain points, it wouldn't have known
link |
00:58:37.400
and it would have been bad.
link |
00:58:38.800
So there's still quite a lot of work to do,
link |
00:58:42.160
but it's really a very exciting moment for us
link |
00:58:44.680
to be seeing, wow, a single neural net on a GPU
link |
00:58:48.400
is actually playing against these guys
link |
00:58:50.240
who are amazing.
link |
00:58:51.280
I mean, you have to see them play live.
link |
00:58:52.920
They're really, really amazing players.
link |
00:58:55.040
Yeah, I'm sure there must be a guy in Poland
link |
00:58:59.320
somewhere right now training his butt off
link |
00:59:02.000
to make sure that this never happens again with AlphaStar.
link |
00:59:05.920
So that's really exciting in terms of AlphaStar
link |
00:59:09.080
having some holes to exploit, which is great.
link |
00:59:11.520
And then we build on top of each other
link |
00:59:13.720
and it feels like StarCraft, unlike Go,
link |
00:59:16.360
even if you win, it's still not,
link |
00:59:20.640
there's so many different dimensions
link |
00:59:23.120
in which you can explore.
link |
00:59:24.200
So that's really, really interesting.
link |
00:59:25.560
Do you think there's a ceiling to AlphaStar?
link |
00:59:28.520
You've said that it hasn't reached,
link |
00:59:31.360
you know, this is a big,
link |
00:59:32.840
wait, let me actually just pause for a second.
link |
00:59:35.520
How did it feel to come here to this point,
link |
00:59:40.200
to beat a top professional player?
link |
00:59:42.240
Like that night, I mean, you know,
link |
00:59:44.600
Olympic athletes have their gold medal, right?
link |
00:59:47.120
This is your gold medal in a sense.
link |
00:59:48.840
Sure, you're cited a lot,
link |
00:59:50.400
you've published a lot of prestigious papers, whatever,
link |
00:59:53.120
but this is like a win.
link |
00:59:55.280
How did it feel?
link |
00:59:56.480
I mean, it was, for me, it was unbelievable
link |
00:59:59.440
because first the win itself,
link |
01:00:03.920
I mean, it was so exciting.
link |
01:00:05.080
I mean, so looking back to those last days of 2018 really,
link |
01:00:11.040
that's when the games were played.
link |
01:00:13.120
I'm sure when I look back at that moment, I'll say,
link |
01:00:15.240
oh my God, I want to be in a project like that.
link |
01:00:18.000
It's like, I already feel the nostalgia of like,
link |
01:00:21.120
yeah, that was huge in terms of the energy
link |
01:00:24.240
and the team effort that went into it.
link |
01:00:26.360
And so in that sense, as soon as it happened,
link |
01:00:29.240
I already knew it was kind of,
link |
01:00:31.280
I was losing it a little bit.
link |
01:00:33.000
So it is almost like sad that it happened and oh my God,
link |
01:00:36.320
but on the other hand, it also verifies the approach.
link |
01:00:41.320
But to me also, there's so many challenges
link |
01:00:43.800
and interesting aspects of intelligence
link |
01:00:46.080
that even though we can train a neural network
link |
01:00:49.840
to play at the level of the best humans,
link |
01:00:52.680
there's still so many challenges.
link |
01:00:54.200
So for me, it's also like, well,
link |
01:00:55.680
this is really an amazing achievement,
link |
01:00:57.440
but I already was also thinking about next steps.
link |
01:00:59.920
I mean, as I said, these agents play Protoss versus Protoss,
link |
01:01:04.080
but they should be able to play a different race
link |
01:01:07.200
much quicker, right?
link |
01:01:08.120
So that would be an amazing achievement.
link |
01:01:10.640
Some people call this meta reinforcement learning,
link |
01:01:13.360
meta learning and so on, right?
link |
01:01:15.200
So there's so many possibilities after that moment,
link |
01:01:18.960
but the moment itself, it really felt great.
link |
01:01:23.600
We had this bet, so I'm kind of a pessimist in general.
link |
01:01:27.760
So I kind of sent an email to the team.
link |
01:01:29.920
I said, okay, against TLO first, right?
link |
01:01:33.680
Like what's gonna be the result?
link |
01:01:35.120
And I really thought we would lose like five-zero, right?
link |
01:01:38.680
We had some calibration made against the 5,000 MMR player.
link |
01:01:44.080
TLO was much stronger than that player,
link |
01:01:47.360
even if he played Protoss, which is his off race.
link |
01:01:51.040
But yeah, I was not imagining we would win.
link |
01:01:53.120
So for me, that was just kind of a test run or something.
link |
01:01:55.600
And then it really kind of, he was really surprised.
link |
01:01:59.000
And unbelievably, we went to this bar to celebrate
link |
01:02:04.560
and Dave tells me, well, why don't we invite someone
link |
01:02:08.360
who is a thousand MMR stronger in Protoss,
link |
01:02:10.960
like an actual Protoss player,
link |
01:02:12.520
and that turned out to be Mana, right?
link |
01:02:16.160
And we had some drinks and I said, sure, why not?
link |
01:02:19.360
But then I thought, well,
link |
01:02:20.200
that's really gonna be impossible to beat.
link |
01:02:22.040
I mean, even because it's so much ahead,
link |
01:02:24.560
a thousand MMR is really like 99% probability
link |
01:02:28.400
that Mana would beat TLO as Protoss versus Protoss, right?
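The "a thousand MMR is like 99%" figure can be sanity-checked with a short calculation. The conversation does not give StarCraft II's exact MMR formula, so this sketch assumes a standard Elo-style logistic with a 400-point scale; the function `elo_win_probability` and the ratings passed to it are hypothetical, and in this model only the rating gap matters.

```python
def elo_win_probability(rating_a, rating_b, scale=400.0):
    """Probability that player A beats player B.

    Assumption: an Elo-style logistic with a 400-point scale,
    not a documented StarCraft II formula.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical ratings: only the 1,000-point gap matters here.
print(f"{elo_win_probability(1000, 0):.3f}")  # 0.997
print(elo_win_probability(0, 0))              # 0.5: equal ratings are a coin flip
```

Under this assumption, a 1,000-point gap gives roughly a 99.7% win probability, in line with the "like 99%" estimate above.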
link |
01:02:33.040
So we did that.
link |
01:02:34.200
And to me, the second game was much more important,
link |
01:02:38.960
even though a lot of uncertainty kind of disappeared
link |
01:02:42.080
after we kind of beat TLO.
link |
01:02:43.640
I mean, he is a professional player.
link |
01:02:45.640
So that was kind of, oh,
link |
01:02:46.840
but that's really a very nice achievement.
link |
01:02:49.720
But Mana really was at the top
link |
01:02:51.760
and you could see he played much better,
link |
01:02:53.840
but our agents got much better too.
link |
01:02:55.360
So it's like, ah, and then after the first game,
link |
01:02:59.480
I said, if we take a single game,
link |
01:03:00.880
at least we can say we beat a game.
link |
01:03:02.720
I mean, even if we don't beat the series,
link |
01:03:04.320
for me, that was a huge relief.
link |
01:03:06.920
And I mean, I remember hugging Demis.
link |
01:03:09.200
And I mean, it was really like,
link |
01:03:10.840
this moment for me will resonate forever as a researcher.
link |
01:03:14.160
And I mean, as a person,
link |
01:03:15.360
and yeah, it's a really like great accomplishment.
link |
01:03:18.240
And it was great also to be there with the team in the room.
link |
01:03:21.360
I don't know if you saw like this.
link |
01:03:23.040
So it was really like.
link |
01:03:24.720
I mean, from my perspective,
link |
01:03:25.960
the other interesting thing is just like watching Kasparov,
link |
01:03:29.840
watching Mana was also interesting
link |
01:03:33.720
because he didn't, he was kind of at a loss for words.
link |
01:03:36.120
I mean, whenever you lose, I've done a lot of sports.
link |
01:03:38.600
You sometimes say excuses, you look for reasons.
link |
01:03:43.520
And he couldn't really come up with reasons.
link |
01:03:46.240
I mean, so with the off race for Protoss,
link |
01:03:50.000
you could say, well, it felt awkward, it wasn't,
link |
01:03:52.280
but here he was just beaten.
link |
01:03:55.160
And it was beautiful to look at a human being
link |
01:03:57.920
being superseded by an AI system.
link |
01:04:00.240
I mean, it's a beautiful moment for researchers, so.
link |
01:04:04.400
Yeah, for sure it was.
link |
01:04:05.960
I mean, probably the highlight of my career so far
link |
01:04:09.920
because of its uniqueness and coolness.
link |
01:04:11.760
And I don't know, I mean, it's obviously, as you said,
link |
01:04:14.240
you can look at papers, citations and so on,
link |
01:04:16.200
but this really is like a testament
link |
01:04:19.240
of the whole machine learning approach
link |
01:04:22.400
and using games to advance technology.
link |
01:04:24.640
I mean, it really was,
link |
01:04:26.840
everything came together at that moment.
link |
01:04:28.840
That's really the summary.
link |
01:04:29.840
Also on the other side, it's a popularization of AI too,
link |
01:04:34.040
because it's just like traveling to the moon and so on.
link |
01:04:38.200
I mean, this is where a very large community of people
link |
01:04:41.000
that don't really know AI,
link |
01:04:43.120
they get to really interact with it.
link |
01:04:45.200
Which is very important.
link |
01:04:46.040
I mean, we must, you know,
link |
01:04:48.640
writing papers helps our peers, researchers,
link |
01:04:51.400
to understand what we're doing.
link |
01:04:52.520
But I think AI is becoming mature enough
link |
01:04:55.880
that we must sort of try to explain what it is.
link |
01:04:59.000
And perhaps through games is an obvious way
link |
01:05:01.440
because these games always had built in AI.
link |
01:05:03.640
So maybe everyone has experienced an AI playing a video game,
link |
01:05:07.680
even if they don't know,
link |
01:05:08.520
because there's always some scripted element
link |
01:05:10.240
and some people might even call that AI already, right?
link |
01:05:13.920
So what are other applications
link |
01:05:16.320
of the approaches underlying AlphaStar
link |
01:05:19.080
that you see happening?
link |
01:05:20.280
There's a lot of echoes of, you said,
link |
01:05:22.360
transformer of language modeling and so on.
link |
01:05:25.440
Have you already started thinking
link |
01:05:27.120
where the breakthroughs in AlphaStar
link |
01:05:30.400
get expanded to other applications?
link |
01:05:32.280
Right, so I thought about a few things
link |
01:05:34.640
for like kind of the next months, next years.
link |
01:05:38.440
The main thing I'm thinking about actually is what's next
link |
01:05:41.480
as a kind of a grand challenge.
link |
01:05:43.160
Because for me, like we've seen Atari
link |
01:05:47.120
and then there's like the sort of three-dimensional worlds
link |
01:05:50.280
that we've seen also like pretty good performance
link |
01:05:52.520
from these capture the flag agents
link |
01:05:54.120
that also some people at DeepMind and elsewhere
link |
01:05:56.440
are working on.
link |
01:05:57.600
We've also seen some amazing results on like,
link |
01:05:59.600
for instance, Dota 2, which is also a very complicated game.
link |
01:06:03.280
So for me, like the main thing I'm thinking about
link |
01:06:05.960
is what's next in terms of challenge.
link |
01:06:07.960
So as a researcher, I see sort of two tensions
link |
01:06:12.960
between research and then applications or areas
link |
01:06:16.760
or domains where you apply them.
link |
01:06:18.480
So on the one hand, we've done,
link |
01:06:20.480
thanks to StarCraft being a very hard application.
link |
01:06:23.320
We developed some techniques, some new research
link |
01:06:25.600
that now we could look at elsewhere.
link |
01:06:27.480
Like are there other applications where we can apply these?
link |
01:06:30.520
And the obvious ones, absolutely.
link |
01:06:32.880
You can think of feeding back to sort of the community
link |
01:06:37.440
we took from, which was mostly sequence modeling
link |
01:06:40.240
or natural language processing.
link |
01:06:41.680
So we've developed and extended things from the transformer
link |
01:06:46.120
and we use pointer networks.
link |
01:06:48.120
We combine LSTM and transformers in interesting ways.
link |
01:06:51.280
So that's perhaps the kind of lowest hanging fruit
link |
01:06:54.200
of feeding back to now a different field
link |
01:06:57.600
of machine learning that's not playing video games.
link |
01:07:00.880
Let me go old school and jump to Mr. Alan Turing.
link |
01:07:05.680
So the Turing test is a natural language test,
link |
01:07:09.920
a conversational test.
link |
01:07:11.560
What's your thought of it as a test for intelligence?
link |
01:07:15.760
Do you think it is a grand challenge
link |
01:07:17.320
that's worthy of undertaking?
link |
01:07:18.920
Maybe if it is, would you reformulate it or phrase it
link |
01:07:22.440
somehow differently?
link |
01:07:23.640
Right, so I really love the Turing test
link |
01:07:25.600
because I also like sequences and language understanding.
link |
01:07:29.480
And in fact, some of the early work
link |
01:07:32.120
we did in machine translation, we
link |
01:07:33.640
tried to apply to kind of a neural chatbot, which obviously
link |
01:07:38.680
would never pass the Turing test because it was very limited.
link |
01:07:42.200
But it is a very fascinating idea
link |
01:07:45.160
that you could really have an AI that
link |
01:07:49.760
would be indistinguishable from humans in terms of asking
link |
01:07:53.840
or conversing with it.
link |
01:07:56.000
So I think the test itself seems very nice.
link |
01:08:00.680
And it's kind of well defined, actually,
link |
01:08:02.560
like the passing it or not.
link |
01:08:04.840
I think there's quite a few rules
link |
01:08:06.520
that feel pretty simple.
link |
01:08:09.080
And I think they have these competitions every year.
link |
01:08:14.680
Yes, there's the Loebner Prize.
link |
01:08:15.920
But I don't know if you've seen the kind of bots
link |
01:08:22.240
that emerge from that competition.
link |
01:08:24.120
They're not quite what you would...
link |
01:08:27.960
So it feels like that there's weaknesses with the way Turing
link |
01:08:30.640
formulated it.
link |
01:08:31.400
It needs to be that the definition
link |
01:08:34.960
of a genuine, rich, fulfilling human conversation,
link |
01:08:39.880
it needs to be something else.
link |
01:08:41.640
Like the Alexa Prize, which I'm not as familiar with,
link |
01:08:44.880
has tried to define that more, I think,
link |
01:08:46.440
by saying you have to continue keeping
link |
01:08:48.560
a conversation for 30 minutes, something like that.
link |
01:08:52.200
So basically forcing the agent not to just fool,
link |
01:08:55.480
but to have an engaging conversation kind of thing.
link |
01:09:02.320
Have you thought about this problem richly?
link |
01:09:06.520
And if you have, in general, how far away are we from it?
link |
01:09:10.720
You worked a lot on language understanding,
link |
01:09:14.160
language generation, but the full dialogue,
link |
01:09:16.640
the conversation, just sitting at the bar
link |
01:09:19.880
having a couple of beers for an hour,
link |
01:09:21.680
that kind of conversation.
link |
01:09:22.920
Have you thought about it?
link |
01:09:23.640
Yeah, so I think you touched here
link |
01:09:25.160
on the critical point, which is feasibility.
link |
01:09:28.960
So there's a great essay by Hamming,
link |
01:09:32.840
which describes sort of grand challenges of physics.
link |
01:09:37.280
And he argues that, well, OK, for instance,
link |
01:09:41.040
teleportation or time travel are great grand challenges
link |
01:09:44.680
of physics, but there's no attack.
link |
01:09:46.600
We really don't know or cannot kind of make any progress.
link |
01:09:50.360
So that's why most physicists and so on,
link |
01:09:53.320
they don't work on these in their PhDs
link |
01:09:55.320
and as part of their careers.
link |
01:09:57.840
So I see the Turing test, the full Turing test,
link |
01:10:00.880
as still a bit too early.
link |
01:10:02.720
Like I think we're, especially with the current trend
link |
01:10:06.680
of deep learning language models,
link |
01:10:10.040
we've seen some amazing examples.
link |
01:10:11.600
I think GPT-2 being the most recent one, which
link |
01:10:14.360
is very impressive.
link |
01:10:15.760
But to fully solve it, passing or fooling a human
link |
01:10:21.000
into thinking that there's a human on the other side,
link |
01:10:23.440
I think we're quite far.
link |
01:10:24.880
So as a result, I don't see myself
link |
01:10:27.240
and I probably would not recommend people doing a PhD
link |
01:10:30.480
on solving the Turing test because it just
link |
01:10:32.360
feels it's kind of too early or too hard of a problem.
link |
01:10:35.400
Yeah, but that said, you said the exact same thing
link |
01:10:37.800
about StarCraft a few years ago.
link |
01:10:40.560
Indeed.
link |
01:10:41.040
To Demis.
link |
01:10:41.560
So you'll probably also be the person who passes
link |
01:10:46.200
the Turing test in three years.
link |
01:10:48.120
I mean, I think that, yeah.
link |
01:10:50.920
So we have this on record.
link |
01:10:52.040
This is nice.
link |
01:10:52.640
It's true.
link |
01:10:53.520
I mean, it's true that progress sometimes
link |
01:10:56.560
is a bit unpredictable.
link |
01:10:57.800
I really wouldn't have...
link |
01:10:59.200
Even six months ago, I would not have predicted the level
link |
01:11:02.440
that we see these agents deliver, at grandmaster
link |
01:11:06.160
level.
link |
01:11:07.800
But I have worked on language enough.
link |
01:11:10.040
And basically, my concern is not that something could happen,
link |
01:11:13.600
a breakthrough could happen that would bring us to solving
link |
01:11:16.400
or passing the Turing test; it's that I just
link |
01:11:19.160
think the statistical approach to it is not going to cut it.
link |
01:11:24.120
So we need a breakthrough, which is great for the community.
link |
01:11:28.240
But given that, I think there's much more uncertainty.
link |
01:11:31.800
Whereas for StarCraft, I knew what the steps would
link |
01:11:36.120
be to get us there.
link |
01:11:38.120
I think it was clear that using the imitation learning part
link |
01:11:41.560
and then using this Battle.net for agents
link |
01:11:44.320
were going to be key.
link |
01:11:45.440
And it turned out that this was the case.
link |
01:11:48.280
And a little more was needed, but not much more.
link |
01:11:51.560
For Turing test, I just don't know
link |
01:11:53.640
what the plan or execution plan would look like.
link |
01:11:56.080
So that's why working on it myself as a grand challenge
link |
01:12:00.680
is hard.
link |
01:12:01.480
But there are quite a few sub challenges
link |
01:12:03.880
that are related that you could say,
link |
01:12:05.600
well, I mean, what if you create a great assistant
link |
01:12:09.040
like Google already has, like the Google Assistant.
link |
01:12:11.400
So can we make it better?
link |
01:12:13.120
And can we make it fully neural and so on?
link |
01:12:15.400
There, I start to believe maybe we're
link |
01:12:17.440
reaching a point where we should attempt these challenges.
link |
01:12:20.640
I like this conversation so much because it echoes very much
link |
01:12:23.520
the StarCraft conversation.
link |
01:12:24.840
It's exactly how you approach StarCraft.
link |
01:12:26.880
Let's break it down into small pieces and solve those.
link |
01:12:29.600
And you end up solving the whole game.
link |
01:12:31.320
Great.
link |
01:12:31.920
But that said, you're behind some
link |
01:12:34.120
of the biggest pieces of work in deep learning
link |
01:12:37.960
in the last several years.
link |
01:12:40.360
So you mentioned some limits.
link |
01:12:42.280
What do you think of the current limits of deep learning?
link |
01:12:44.880
And how do we overcome those limits?
link |
01:12:47.160
So if I had to actually use a single word
link |
01:12:50.160
to define the main challenge in deep learning,
link |
01:12:53.240
it's a challenge that probably has
link |
01:12:55.120
been the challenge for many years.
link |
01:12:56.960
And it's that of generalization.
link |
01:12:59.720
So what that means is that all that we're doing
link |
01:13:04.560
is fitting functions to data.
link |
01:13:06.720
And when the data we see is not from the same distribution,
link |
01:13:12.160
or even sometimes when it
link |
01:13:14.520
is very close to the distribution, because
link |
01:13:17.320
of the way we train it with limited samples,
link |
01:13:20.240
we then get to this stage where we just
link |
01:13:23.560
don't see generalization as much as we can generalize.
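The "fitting functions to data" point can be made concrete with a toy example that is not from the conversation: a straight line fitted by least squares to samples of y = x² drawn only from [0, 1] looks acceptable on that range and is badly wrong at x = 10. The target function, the training range, and the probe point are arbitrary choices for this sketch, and a linear model is of course far simpler than a neural network.

```python
# Toy illustration: ordinary least squares fits a straight line to samples
# of y = x**2 taken only from [0, 1]; we then probe it far outside that range.


def fit_line(xs, ys):
    """Closed-form ordinary least squares for the model y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


def true_fn(x):
    return x ** 2


train_xs = [i / 100 for i in range(101)]  # the "training distribution": [0, 1]
w, b = fit_line(train_xs, [true_fn(x) for x in train_xs])

# Worst-case error on the training range versus error at one far-away input.
in_dist_error = max(abs((w * x + b) - true_fn(x)) for x in train_xs)
out_of_dist_error = abs((w * 10.0 + b) - true_fn(10.0))

print(f"fitted line: y = {w:.3f} * x + {b:.3f}")
print(f"max error on [0, 1]: {in_dist_error:.3f}")
print(f"error at x = 10:     {out_of_dist_error:.3f}")
```

With this grid the fitted slope is 1 and the intercept about -0.165, so the line stays within roughly 0.17 of the curve on [0, 1] yet misses the true value of 100 at x = 10 by about 90.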
link |
01:13:27.800
And I think adversarial examples are a clear example of this.
link |
01:13:31.240
But if you study the machine learning literature,
link |
01:13:34.640
and the reason why SVMs became very popular
link |
01:13:38.280
was because they were dealing with this, and they
link |
01:13:40.040
had some guarantees about generalization, which
link |
01:13:42.640
is about unseen data, or out of distribution,
link |
01:13:45.600
or even within distribution, where you take an image, add
link |
01:13:48.280
a bit of noise, and these models fail.
link |
01:13:51.360
So I think, really, I don't see a lot of progress
link |
01:13:56.680
on generalization in the strong generalization
link |
01:14:00.520
sense of the word.
link |
01:14:01.800
I think with our neural networks, you can always
link |
01:14:05.960
find designed examples that will make their outputs arbitrary,
link |
01:14:11.040
which is not good because we humans would never
link |
01:14:15.600
be fooled by these kinds of images
link |
01:14:17.880
or manipulation of the image.
link |
01:14:19.880
And if you look at the mathematics,
link |
01:14:21.760
you kind of understand this is a bunch of matrices
link |
01:14:23.960
multiplied together.
link |
01:14:26.160
There are probably numerical instabilities
link |
01:14:28.040
where you can just find corner cases.
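A minimal sketch of why "a bunch of matrices multiplied together" invites corner cases, assuming the simplest possible stand-in: a single linear layer with random weights rather than a trained network. Nudging every input coordinate by a small amount in the direction given by the sign of the gradient (the idea behind the fast gradient sign method) shifts the output by an amount that grows with the number of input dimensions. The dimension, step size, and starting margin are illustrative choices.

```python
import random

# A single linear layer with random (untrained) weights stands in for a network:
# score(x) = w . x, and the predicted class is "positive" when score > 0.
rng = random.Random(0)
DIM = 1000
w = [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(v):
    return (v > 0) - (v < 0)


# Build a "clean" input with ordinary-looking coordinates (roughly N(0, 1))
# whose score is a comfortable +5, by removing part of a random vector
# along the direction of w.
norm_sq = sum(wi * wi for wi in w)
z = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
shift = (score(z) - 5.0) / norm_sq
x_clean = [zi - shift * wi for zi, wi in zip(z, w)]

# Fast-gradient-sign-style step: for a linear model the gradient of the score
# with respect to the input is just w, so moving every coordinate by
# -EPS * sign(w_i) lowers the score by EPS * sum(|w_i|), which grows with DIM.
EPS = 0.05
x_adv = [xi - EPS * sign(wi) for xi, wi in zip(x_clean, w)]

print(f"clean score:       {score(x_clean):.2f}")
print(f"adversarial score: {score(x_adv):.2f}")
print(f"largest change to any coordinate: "
      f"{max(abs(a - c) for a, c in zip(x_adv, x_clean)):.3f}")
```

For standard normal weights the expected |w_i| is about 0.8, so with 1,000 inputs a change of 0.05 per coordinate moves the score by roughly 40, far more than the +5 margin.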
link |
01:14:30.920
So I think that's really the underlying topic we see many times,
link |
01:14:35.240
even at the grand stage of the Turing test.
link |
01:14:40.120
Generalization: if you start passing the Turing test,
link |
01:14:44.520
should it be in English or should it be in any language?
link |
01:14:48.840
As a human, if someone asks you something in a different language,
link |
01:14:53.200
you actually will go and do some research
link |
01:14:54.920
and try to translate it and so on.
link |
01:14:57.720
Should the Turing test include that?
link |
01:15:01.000
And it's really a difficult problem
link |
01:15:02.920
and very fascinating and very mysterious, actually.
link |
01:15:05.480
Yeah, absolutely.
link |
01:15:06.280
But do you think if you were to try to solve it,
link |
01:15:10.760
can you not grow the size of data intelligently
link |
01:15:14.240
in such a way that the distribution of your training
link |
01:15:17.080
set does include the entirety of the testing set?
link |
01:15:20.880
Is that one path?
link |
01:15:21.760
The other path is a totally new methodology.
link |
01:15:23.840
It's not statistical.
link |
01:15:24.960
So a path that has worked well, and it worked well
link |
01:15:27.920
in StarCraft and in machine translation and in languages,
link |
01:15:30.720
is scaling up the data and the model.
link |
01:15:32.800
And that's kind of been maybe the only single formula that
link |
01:15:38.480
still delivers today in deep learning, right?
link |
01:15:40.480
It's that data scale and model scale really
link |
01:15:44.960
do more and more of the things that we thought,
link |
01:15:47.040
oh, there's no way it can generalize to these,
link |
01:15:49.240
or there's no way it can generalize to that.
link |
01:15:51.320
But I don't think fundamentally it will be solved with this.
link |
01:15:54.760
And for instance, I'm really liking some style or approach
link |
01:15:59.600
that would not only have neural networks,
link |
01:16:02.120
but it would have programs or some discrete decision making,
link |
01:16:06.360
because that is where I feel there's a bit more.
link |
01:16:10.320
I mean, the best example, I think, for understanding this
link |
01:16:13.520
is I also worked a bit on, oh, we
link |
01:16:16.640
can learn an algorithm with a neural network, right?
link |
01:16:18.820
So you give it many examples, and it's
link |
01:16:20.560
going to sort the input numbers or something like that.
link |
01:16:24.360
But really strong generalization is you give me some numbers
link |
01:16:29.520
or you ask me to create an algorithm that sorts numbers.
link |
01:16:32.320
And instead of creating a neural net, which will be fragile
link |
01:16:34.760
because it's going to go out of range at some point,
link |
01:16:37.960
you're going to give it numbers that are too large, too small,
link |
01:16:40.840
and whatnot, if you just create a piece of code that
link |
01:16:45.600
sorts the numbers, then you can prove
link |
01:16:47.240
that that will generalize to absolutely all the possible
link |
01:16:50.600
input you could give.
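A small analogy for the sorting example, not code from the conversation: `fixed_range_sort` is a counting sort whose table is sized for the values it was "built for", standing in for a sorter that breaks when handed numbers that are too large or too small, while `insertion_sort` is an ordinary program whose correctness does not depend on the magnitude of its inputs. Both function names and the range of 100 are illustrative choices.

```python
def fixed_range_sort(nums, max_value=100):
    """Counting sort with a table sized for the values 'seen in training'.

    Stand-in for a fragile learned sorter: it only works on inputs inside
    the range it was built for and fails on anything outside it.
    """
    counts = [0] * max_value
    for n in nums:
        if not 0 <= n < max_value:
            raise ValueError(f"value {n} outside the supported range [0, {max_value})")
        counts[n] += 1
    return [value for value, c in enumerate(counts) for _ in range(c)]


def insertion_sort(nums):
    """A plain program: correct for any list of mutually comparable items."""
    result = list(nums)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]  # shift larger items one slot right
            j -= 1
        result[j + 1] = key
    return result


in_range = [42, 7, 99, 7, 0]
out_of_range = [10**9, -3, 7]

print(fixed_range_sort(in_range))    # inside the range it was built for
print(insertion_sort(out_of_range))  # the program handles any magnitude
try:
    fixed_range_sort(out_of_range)
except ValueError as err:
    print("fixed-range sorter failed:", err)
```

The insertion sort's correctness follows from a loop invariant (the prefix before index `i` is always sorted), which is the kind of argument that lets you claim it works for every possible input.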
link |
01:16:51.960
So I think the problem comes with some exciting prospects.
link |
01:16:55.920
I mean, scale is a bit more boring, but it really works.
link |
01:16:59.460
And then maybe programs and discrete abstractions
link |
01:17:02.840
are a bit less developed.
link |
01:17:04.840
But clearly, I think they're quite exciting in terms
link |
01:17:07.840
of future for the field.
link |
01:17:09.960
Do you draw any insight or wisdom from the '80s and expert
link |
01:17:14.040
systems and symbolic systems, symbolic computing?
link |
01:17:16.920
Do you ever go back to that reasoning, that kind of logic?
link |
01:17:20.760
Do you think that might make a comeback?
link |
01:17:23.200
You'll have to dust off those books?
link |
01:17:24.920
Yeah, I actually love adding more inductive biases.
link |
01:17:31.320
To me, the problem really is, what are you trying to solve?
link |
01:17:34.280
If what you're trying to solve is so important that you
link |
01:17:37.440
have to solve it no matter what, then absolutely use rules,
link |
01:17:42.480
use domain knowledge, and then use
link |
01:17:45.240
a bit of the magic of machine learning
link |
01:17:46.920
to make the system the best system that
link |
01:17:50.640
will detect cancer or detect weather patterns, right?
link |
01:17:56.040
Or in terms of StarCraft, it also was a very big challenge.
link |
01:17:59.240
So I was definitely happy that if we
link |
01:18:01.920
had to cut a corner here and there,
link |
01:18:04.560
it could have been interesting to do.
link |
01:18:07.040
And in fact, in StarCraft, we started
link |
01:18:09.240
thinking about expert systems because it's very,
link |
01:18:11.640
you know, you can define it.
link |
01:18:12.800
I mean, people actually build StarCraft bots by thinking
link |
01:18:15.560
about those principles, like state machines and rule-based systems.
link |
01:18:20.240
And then you could think of combining
link |
01:18:22.240
a bit of a rule-based system, but one that also has
link |
01:18:25.560
neural networks incorporated to make it generalize a bit
link |
01:18:28.600
better.
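A toy sketch of the hybrid idea described here: a hand-written state machine decides most transitions, and a "learned" scoring function makes the one judgment call that is hard to write as a rule. Everything in it is hypothetical (the states, the worker threshold of 40, the 0.7 and 0.3 cut-offs), and `learned_attack_score` is a fixed logistic function standing in for a trained network; it is not how AlphaStar or any particular bot works.

```python
import math
from enum import Enum, auto


class State(Enum):
    EXPAND_ECONOMY = auto()
    BUILD_ARMY = auto()
    ATTACK = auto()


def learned_attack_score(army_supply, enemy_supply_estimate):
    """Hypothetical stand-in for a small trained network.

    A real hybrid bot might learn to estimate how likely an attack is to
    succeed; here it is a fixed logistic function of the supply difference
    so that the example runs on its own.
    """
    return 1.0 / (1.0 + math.exp(-(army_supply - enemy_supply_estimate) / 10.0))


def next_state(state, workers, army_supply, enemy_supply_estimate):
    # Hand-written rule: grow the economy until a worker target is reached.
    if state is State.EXPAND_ECONOMY:
        return State.BUILD_ARMY if workers >= 40 else State.EXPAND_ECONOMY
    score = learned_attack_score(army_supply, enemy_supply_estimate)
    # The "learned" component makes the judgment call that is hard to write as a rule.
    if state is State.BUILD_ARMY:
        return State.ATTACK if score > 0.7 else State.BUILD_ARMY
    # In ATTACK: fall back to building an army if the estimate turns against us.
    return State.BUILD_ARMY if score < 0.3 else State.ATTACK


state = State.EXPAND_ECONOMY
for workers, army, enemy in [(30, 0, 10), (44, 10, 20), (50, 60, 40), (50, 20, 40)]:
    state = next_state(state, workers, army, enemy)
    print(workers, army, enemy, "->", state.name)
```

Swapping the logistic stand-in for a real model would leave the rule-based skeleton untouched, which is the appeal of the combination described in the conversation.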
link |
01:18:29.080
So absolutely, I mean, we should definitely
link |
01:18:31.480
go back to those ideas.
link |
01:18:32.840
And anything that makes the problem simpler,
link |
01:18:35.440
as long as your problem is important, that's OK.
link |
01:18:37.960
And that's research driven by a very important problem.
link |
01:18:41.080
And on the other hand, if you want to really focus
link |
01:18:44.520
on the limits of reinforcement learning,
link |
01:18:46.560
then of course, you must try not to look at imitation data
link |
01:18:50.720
or to look for some rules of the domain that would help a lot
link |
01:18:55.200
or even feature engineering, right?
link |
01:18:56.960
So this is a tension that depending on what you do,
link |
01:19:00.720
I think both ways are definitely fine.
link |
01:19:03.280
And I would never not do one or the other
link |
01:19:06.760
as long as what you're doing is important
link |
01:19:08.840
and needs to be solved, right?
link |
01:19:10.000
Right, so there's a bunch of different ideas
link |
01:19:13.440
that you developed that I really enjoy.
link |
01:19:16.840
But one is image captioning,
link |
01:19:22.160
translating from image to text, just another beautiful idea,
link |
01:19:27.480
I think, that resonates throughout your work, actually.
link |
01:19:33.160
So the underlying nature of reality
link |
01:19:35.080
being language always, somehow.
link |
01:19:38.760
So what's the connection between images and text,
link |
01:19:42.480
or rather the visual world and the world
link |
01:19:44.880
of language in your view?
link |
01:19:46.480
Right, so I think a piece of research that's been central
link |
01:19:51.440
to, I would say, even extending into StarCraft,
link |
01:19:54.320
is this idea of sequence to sequence learning,
link |
01:19:57.600
and what we really meant by that
link |
01:19:59.800
is that you can now really input anything
link |
01:20:03.440
to a neural network as the input x.
link |
01:20:06.040
And then the neural network will learn a function f
link |
01:20:09.520
that will take x as an input and produce any output y.
link |
01:20:12.720
And these x's and y's don't need to be static features,
link |
01:20:19.200
like fixed vectors or anything like that.
link |
01:20:22.200
They could really be sequences, and now even more general data structures.
link |
01:20:26.520
So that paradigm was tested in a very interesting way
link |
01:20:31.560
when we moved from translating French to English
link |
01:20:35.720
to translating an image to its caption.
link |
01:20:37.920
But the beauty of it is that, really,
link |
01:20:40.720
and that's actually how it happened.
link |
01:20:43.000
I changed a line of code in this thing that
link |
01:20:45.440
was doing machine translation.
link |
01:20:47.480
And I came the next day, and I saw
link |
01:20:50.080
how it was producing captions that seemed like, oh my god,
link |
01:20:54.160
this is really, really working.
link |
01:20:55.960
And the principle is the same.
link |
01:20:57.640
So I think I don't see text, vision, speech, waveforms
link |
01:21:04.080
as something different as long as you basically
link |
01:21:09.000
learn a function that will vectorize them.
link |
01:21:14.760
And then after we vectorize it, we
link |
01:21:16.640
can then use transformers, LSTMs, whatever
link |
01:21:20.040
the flavor of the month of the model is.
link |
01:21:22.680
And then as long as we have enough supervised data,
link |
01:21:25.720
really, this formula will work and will keep working,
link |
01:21:30.040
I believe, to some extent.
link |
01:21:31.800
Modulo these generalization issues that I mentioned before.
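The "vectorize first, then reuse the same sequence model" recipe can be sketched in a few dozen lines. Assumptions: every name here (`text_encoder`, `image_encoder`, `decode`, the 8-word vocabulary, the 4×4 "image") is hypothetical, the weights are random rather than trained, and a plain recurrent decoder stands in for the LSTMs or transformers mentioned in the conversation. With random weights the output tokens are meaningless; the point is only the plumbing. The two calls at the bottom differ only in which encoder is passed, loosely mirroring the one-line change described in the conversation.

```python
import math
import random

# Toy encoder-decoder plumbing with random, untrained weights.
rng = random.Random(0)
HIDDEN = 8
VOCAB = ["<eos>", "a", "cat", "on", "the", "mat", "le", "chat"]


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Modality-specific encoders: each turns its input into a HIDDEN-sized vector.
WORD_EMB = {word: [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for word in VOCAB}
IMG_PROJ = rand_matrix(HIDDEN, 16)


def text_encoder(tokens):
    """Average of word embeddings (unknown words are skipped)."""
    vecs = [WORD_EMB[t] for t in tokens if t in WORD_EMB]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def image_encoder(pixels):
    """Project a flattened 4x4 'image' (16 floats) into HIDDEN dimensions."""
    return [math.tanh(v) for v in matvec(IMG_PROJ, pixels)]


# Shared decoder: a tiny recurrent network with greedy decoding.
W_HH = rand_matrix(HIDDEN, HIDDEN)
W_XH = rand_matrix(HIDDEN, len(VOCAB))
W_HY = rand_matrix(len(VOCAB), HIDDEN)


def decode(context, max_len=6):
    h = list(context)
    prev = [0.0] * len(VOCAB)  # all-zero input at the first step
    out = []
    for _ in range(max_len):
        h = [math.tanh(a + b) for a, b in zip(matvec(W_HH, h), matvec(W_XH, prev))]
        logits = matvec(W_HY, h)
        best = max(range(len(VOCAB)), key=logits.__getitem__)
        if VOCAB[best] == "<eos>":
            break
        out.append(VOCAB[best])
        prev = [1.0 if i == best else 0.0 for i in range(len(VOCAB))]  # feed back the choice
    return out


def seq2seq(encoder, x):
    return decode(encoder(x))


translation = seq2seq(text_encoder, ["le", "chat"])  # sequence in, sequence out
caption = seq2seq(image_encoder, [0.1] * 16)         # only the encoder differs
print("text  ->", translation)
print("image ->", caption)
```

Training would replace the random matrices with learned ones, but the shape of the program, one encoder per modality feeding a shared decoder, stays the same.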
link |
01:21:35.200
But the task there is to vectorize,
link |
01:21:36.760
so to form a representation that's meaningful.
link |
01:21:39.840
And your intuition now, having worked with all this media,
link |
01:21:42.720
is that once you are able to form that representation,
link |
01:21:46.520
you could basically take anything, any sequence.
link |
01:21:51.280
Going back to StarCraft, are there
link |
01:21:52.880
limits on the length? We didn't really
link |
01:21:56.760
touch on the long-term aspect.
link |
01:21:59.520
How did you overcome the whole really long-term
link |
01:22:02.480
aspect of things here?
link |
01:22:03.800
Are there some tricks?
link |
01:22:05.200
So the main trick, so StarCraft, if you
link |
01:22:08.680
look at absolutely every frame, you
link |
01:22:10.880
might think it's quite a long game.
link |
01:22:12.600
So we would have to multiply 22 times 60 seconds per minute
link |
01:22:18.400
times maybe at least 10 minutes per game on average.
link |
01:22:21.840
So there are quite a few frames.
link |
01:22:25.640
But the trick really was to only observe, in fact,
link |
01:22:30.200
which might be seen as a limitation,
link |
01:22:32.280
but it is also a computational advantage.
link |
01:22:35.160
Only observe when you act.
link |
01:22:37.720
And then what the neural network decides
link |
01:22:40.000
is what the gap is going to be until the next action.
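A minimal sketch of "only observe when you act", assuming a hypothetical policy that returns both an action and how many frames to wait before acting again. The action names, the 1 to 24 frame delay range, and the episode length are illustrative choices, not AlphaStar's actual interface; only the 22 frames per second figure comes from the conversation.

```python
import random

FRAMES_PER_SECOND = 22  # rough figure quoted in the conversation


def toy_policy(observation, rng):
    """Hypothetical policy: returns (action, frames_to_wait_before_next_action)."""
    action = rng.choice(["move", "attack", "build", "no_op"])
    delay = rng.randint(1, 24)  # the policy also chooses when it acts next
    return action, delay


def run_episode(total_frames, seed=0):
    rng = random.Random(seed)
    frame = 0
    actions = []
    while frame < total_frames:
        observation = {"frame": frame}  # stand-in for a real game observation
        action, delay = toy_policy(observation, rng)
        actions.append(action)
        frame += delay  # frames in between are skipped and never observed
    return actions


total_frames = FRAMES_PER_SECOND * 60 * 10
actions = run_episode(total_frames)
print(f"{len(actions)} observe-and-act steps for {total_frames} game frames")
```

Because the delays average 12.5 frames here, the agent makes about a thousand observe-and-act steps in a 10-minute game instead of processing all 13,200 frames.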
link |
01:22:44.680
And if you look at most StarCraft games
link |
01:22:48.080
that we have in the data set that Blizzard provided,
link |
01:22:51.920
it turns out that most games are actually only,
link |
01:22:56.000
I mean, it is still a long sequence,
link |
01:22:58.000
but it's maybe like 1,000 to 1,500 actions,
link |
01:23:02.160
which if you start looking at LSTMs, large LSTMs,
link |
01:23:07.200
transformers, it's not that difficult, especially
link |
01:23:12.320
if you have supervised learning.
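The numbers in this answer can be checked with a few lines of arithmetic. The sketch below additionally assumes full (quadratic) self-attention, which the conversation does not specify, to show why shrinking the sequence from every frame to every action matters.

```python
FRAMES_PER_SECOND = 22             # rough figure quoted in the conversation
GAME_MINUTES = 10                  # "at least 10 minutes per game on average"
ACTIONS_PER_GAME = (1_000, 1_500)  # typical game length in actions, per the conversation

frames = FRAMES_PER_SECOND * 60 * GAME_MINUTES

# Full self-attention scores every position against every other position,
# so the number of scores per attention head grows with the square of the length.
for length in (frames, *ACTIONS_PER_GAME):
    print(f"sequence length {length:>6,}: {length * length:>13,} attention scores per head")
```

That is 13,200 frames versus 1,000 to 1,500 actions, or roughly 77 to 174 times fewer pairwise scores per attention head.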
link |
01:23:14.320
If you had to do it with reinforcement learning,
link |
01:23:16.240
the credit assignment problem, what
link |
01:23:18.080
is it in this game that made you win?
link |
01:23:19.800
That would be really difficult.
link |
01:23:21.640
But thankfully, because of imitation learning,
link |
01:23:24.640
we didn't have to deal with these directly.
link |
01:23:27.400
Although if we had to, we tried it.
link |
01:23:29.600
And what happened is you just take all your workers
link |
01:23:31.840
and attack with them.
link |
01:23:33.280
And that is kind of obvious in retrospect
link |
01:23:36.080
because you start trying random actions.
link |
01:23:38.120
One of the actions will be a worker
link |
01:23:40.280
that goes to the enemy base.
link |
01:23:41.440
And because it's self-play, it's not
link |
01:23:43.560
going to know how to defend because it basically
link |
01:23:45.680
doesn't know almost anything.
link |
01:23:47.000
And eventually, what you develop is this take all workers
link |
01:23:50.320
and attack, because the credit assignment issue in RL
link |
01:23:54.680
is really, really hard.
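To make the credit assignment problem concrete, here is a toy calculation, not anything from AlphaStar: with a single reward at the end of a 1,000-step game and an assumed discount factor of 0.99, the learning signal reaching the first action is tiny, and every action in a won game receives the same kind of signal whether or not it actually helped.

```python
GAMMA = 0.99            # a common discount factor; an assumption, not AlphaStar's setting
EPISODE_LENGTH = 1_000  # roughly the number of actions per game quoted in the conversation


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Sparse reward: nothing until the last step, then +1 for a win.
rewards = [0.0] * (EPISODE_LENGTH - 1) + [1.0]
returns = discounted_returns(rewards, GAMMA)

print(f"return credited to the last action:  {returns[-1]:.3f}")
print(f"return credited to the first action: {returns[0]:.2e}")
```

The first action's return is 0.99 raised to the 999th power, about 4.4 × 10⁻⁵, which is why random exploration tends to latch onto whatever crude strategy happens to win early, such as the worker rush described here.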
link |
01:23:55.840
I do believe we could do better.
link |
01:23:57.600
And that's maybe a research challenge for the future.
link |
01:24:01.520
But yeah, even in StarCraft, the sequences
link |
01:24:04.160
are maybe 1,000, which I believe is
link |
01:24:07.640
within the realm of what transformers can do.
link |
01:24:10.360
Yeah, I guess the difference between StarCraft and Go
link |
01:24:12.800
is that in Go and chess, stuff starts happening right away.
link |
01:24:18.160
So there's not, yeah, it's pretty easy to self-play.
link |
01:24:22.240
Not easy, but with self-play, it's possible to develop
link |
01:24:24.560
reasonable strategies quickly as opposed to StarCraft.
link |
01:24:27.240
I mean, in Go, there are only around 400 actions.
link |
01:24:30.600
But one action is what people would call the God action.
link |
01:24:34.200
That would be if you had expanded the whole search
link |
01:24:38.480
tree, that's the best action if you did minimax
link |
01:24:40.840
or whatever algorithm you would do if you
link |
01:24:42.800
had the computational capacity.
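The "God action" is the move that exhaustive minimax would choose if the whole tree could be expanded. The sketch below runs minimax on a tiny hand-made tree; the tree and its values are illustrative, and for Go or StarCraft the real tree is far too large to enumerate, which is the point being made.

```python
def minimax(node, maximizing=True):
    """Exhaustive minimax over a game tree written as nested lists.

    Leaves are numbers giving the outcome for the maximizing player; inner
    nodes are lists of child positions. Returns (value, index of best child),
    where the index is None at a leaf.
    """
    if not isinstance(node, list):
        return node, None
    best_value, best_move = None, None
    for move, child in enumerate(node):
        value, _ = minimax(child, not maximizing)
        better = best_value is None or (value > best_value if maximizing else value < best_value)
        if better:
            best_value, best_move = value, move
    return best_value, best_move


# A two-ply toy game: we pick a branch, then the opponent picks a leaf.
tree = [
    [3, 12, 8],
    [2, 4, 6],
    [14, 5, 2],
]
value, best_move = minimax(tree)
print(f"best first move: {best_move}, guaranteed value: {value}")
```

Branch 0 wins even though branch 2 contains the single best leaf (14), because the opponent would answer branch 2 with the 2; the cost of this guarantee is visiting every node.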
link |
01:24:44.960
But in StarCraft, 400 is minuscule.
link |
01:24:48.720
Like in 400, you couldn't even click
link |
01:24:51.960
on the pixels around a unit.
link |
01:24:53.840
So I think the problem there, in terms of action space size,
link |
01:24:58.880
is way harder.
link |
01:25:01.640
And that search is impossible.
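A back-of-the-envelope comparison of action space sizes. The Go count is exact (361 intersections plus pass, close to the "around 400" quoted). The StarCraft side uses made-up, deliberately small numbers for action types, units, and screen resolution, and is not AlphaStar's actual action space.

```python
GO_BOARD = 19
go_actions = GO_BOARD * GO_BOARD + 1  # every intersection, plus "pass"

# Hypothetical, heavily simplified StarCraft-like action: choose an action type,
# choose which of your units it applies to, and click a point on a screen grid.
ACTION_TYPES = 10  # assumption
UNITS_OWNED = 20   # assumption
SCREEN = 128       # assumption: a 128 x 128 grid of clickable points

click_targets = SCREEN * SCREEN
unit_subsets = 2 ** UNITS_OWNED
starcraft_like_actions = ACTION_TYPES * unit_subsets * click_targets

print(f"Go actions per move:        {go_actions:,}")
print(f"click targets alone:        {click_targets:,}")
print(f"toy StarCraft-like actions: {starcraft_like_actions:,}")
```

Even with these small assumed numbers, the click targets alone outnumber Go's moves about 45 to 1, and the toy product comes to roughly 1.7 × 10¹¹ possible actions per step.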
link |
01:25:03.960
So there's quite a few challenges indeed
link |
01:25:06.000
that make this kind of a step up in terms of machine learning.
link |
01:25:10.640
For humans, maybe playing StarCraft
link |
01:25:13.560
seems more intuitive because it looks real.
link |
01:25:16.000
I mean, the graphics and everything moves smoothly,
link |
01:25:18.840
whereas I don't know how to.
link |
01:25:20.240
I mean, Go is a game that I would really need to study.
link |
01:25:22.680
It feels quite complicated.
link |
01:25:23.920
But for machines, kind of maybe it's the reverse, yes.
link |
01:25:27.040
Which shows you the gap actually between deep learning
link |
01:25:30.240
and however the heck our brains work.
link |
01:25:34.040
So you developed a lot of really interesting ideas.
link |
01:25:36.080
It's interesting to just ask, what's
link |
01:25:38.480
your process of developing new ideas?
link |
01:25:41.200
Do you like brainstorming with others?
link |
01:25:42.960
Do you like thinking alone?
link |
01:25:44.560
Do you like, what was it, Ian Goodfellow said
link |
01:25:49.200
he came up with GANs after a few beers.
link |
01:25:52.840
He thinks beers are essential for coming up with new ideas.
link |
01:25:55.880
We had beers to decide to play another game of StarCraft
link |
01:25:59.160
after a week.
link |
01:25:59.720
So it's really similar to that story.
link |
01:26:02.760
Actually, I explained this in a DeepMind retreat.
link |
01:26:05.880
And I said, this is the same as the GAN story.
link |
01:26:08.000
I mean, we were in a bar.
link |
01:26:09.080
And we decided, let's play a game next week.
link |
01:26:10.920
And that's what happened.
link |
01:26:11.880
I feel like we're giving the wrong message
link |
01:26:13.600
to young undergrads.
link |
01:26:15.120
Yeah, I know.
link |
01:26:15.760
But in general, do you like brainstorming?
link |
01:26:18.320
Do you like thinking alone, working stuff out?
link |
01:26:20.280
So I think throughout the years, also, things changed.
link |
01:26:23.960
So initially, I was very fortunate to be
link |
01:26:29.320
with great minds like Geoff Hinton, Jeff Dean,
link |
01:26:33.120
Ilya Sutskever.
link |
01:26:34.040
I was really fortunate to join Brain at a very good time.
link |
01:26:37.800
So at that point, for ideas, I was just
link |
01:26:41.000
brainstorming with my colleagues and learned a lot.
link |
01:26:44.040
And continuing to learn is actually something
link |
01:26:46.400
you should never stop doing.
link |
01:26:48.160
So learning implies reading papers and also
link |
01:26:51.520
discussing ideas with others.
link |
01:26:53.200
It's very hard at some point to not communicate,
link |
01:26:56.680
whether that be reading a paper from someone
link |
01:26:59.160
or actually discussing.
link |
01:27:00.520
So definitely, that communication aspect
link |
01:27:04.680
needs to be there, whether it's written or oral.
link |
01:27:08.520
Nowadays, I'm also trying to be a bit more strategic
link |
01:27:12.840
about what research to do.
link |
01:27:15.160
So I was describing a little bit this tension
link |
01:27:18.480
between research for the sake of research,
link |
01:27:21.440
and then you have, on the other hand,
link |
01:27:23.000
applications that can drive the research.
link |
01:27:25.520
And honestly, the formula that has worked best for me
link |
01:27:28.560
is just find a hard problem and then
link |
01:27:32.240
try to see how research fits into it,
link |
01:27:34.600
how it doesn't fit into it, and then you must innovate.
link |
01:27:37.880
So I think machine translation drove sequence to sequence.
link |
01:27:43.040
Then maybe learning algorithms,
link |
01:27:47.360
combinatorial algorithms, led to pointer networks.
link |
01:27:50.560
StarCraft led to really scaling up imitation learning
link |
01:27:53.840
and the AlphaStar League.
link |
01:27:55.520
So that's been a formula that I personally like.
link |
01:27:58.360
But the other one is also valid.
link |
01:28:00.000
And I've seen it succeed a lot of the times
link |
01:28:02.760
where you just want to investigate model-based
link |
01:28:05.680
RL as a research topic.
link |
01:28:08.160
And then you must then start to think, well,
link |
01:28:11.360
how are the tests?
link |
01:28:12.160
How are you going to test these ideas?
link |
01:28:14.240
You need a minimal environment to try things.
link |
01:28:17.920
You need to read a lot of papers and so on.
link |
01:28:19.720
And that's also very fun to do and something
link |
01:28:21.520
I've also done quite a few times,
link |
01:28:24.000
both at Brain, at DeepMind, and obviously as a PhD.
link |
01:28:28.920
So I think besides the ideas and discussions,
link |
01:28:32.880
I think it's important also because you start
link |
01:28:35.920
sort of guiding not only your own goals,
link |
01:28:40.200
but other people's goals to the next breakthrough.
link |
01:28:44.000
So you must really kind of understand this feasibility
link |
01:28:48.400
also, as we were discussing before,
link |
01:28:50.400
whether this domain is ready to be tackled or not.
link |
01:28:53.960
And you don't want to be too early.
link |
01:28:55.480
You obviously don't want to be too late.
link |
01:28:57.080
So it's really interesting, this strategic component
link |
01:29:00.520
of research, which I think as a grad student,
link |
01:29:03.200
I just had no idea.
link |
01:29:05.120
I just read papers and discussed ideas.
link |
01:29:07.400
And I think this has been maybe the major change.
link |
01:29:09.760
And I recommend people kind of feed forward
link |
01:29:13.520
to what success looks like and try to backtrack,
link |
01:29:16.040
rather than just kind of looking, oh, this looks cool.
link |
01:29:18.680
This looks cool.
link |
01:29:19.320
And then you do a bit of random work,
link |
01:29:21.080
which sometimes you stumble upon some interesting things.
link |
01:29:23.880
But in general, it's also good to plan a bit.
link |
01:29:27.440
Yeah, I like it.
link |
01:29:28.960
I especially like your approach of taking a really hard problem,
link |
01:29:31.960
stepping right in, and then being
link |
01:29:33.680
super skeptical about being able to solve the problem.
link |
01:29:37.840
I mean, there's a balance of both, right?
link |
01:29:40.120
There's a silly optimism and a critical sort of skepticism
link |
01:29:46.920
that's good to balance, which is why
link |
01:29:49.000
it's good to have a team of people that balance that.
link |
01:29:52.400
You don't do that on your own.
link |
01:29:53.880
You have both mentors that have seen,
link |
01:29:56.440
or you obviously want to chat and discuss
link |
01:29:59.680
whether it's the right time.
link |
01:30:00.920
I mean, Demis came in 2014.
link |
01:30:03.960
And he said, maybe in a bit we'll do StarCraft.
link |
01:30:06.600
And maybe he knew.
link |
01:30:08.240
And I'm just following his lead, which is great,
link |
01:30:11.200
because he's brilliant, right?
link |
01:30:12.600
So these things are obviously quite important,
link |
01:30:17.280
that you want to be surrounded by people who are diverse.
link |
01:30:22.280
They have their knowledge.
link |
01:30:23.960
It's also important to, I mean,
link |
01:30:26.640
I've learned a lot from people who actually have an idea
link |
01:30:30.960
that I might not think is good.
link |
01:30:32.440
But if I give them the space to try it,
link |
01:30:34.880
I've been proven wrong many, many times as well.
link |
01:30:37.080
So that's great.
link |
01:30:38.200
I think your colleagues are more important than yourself,
link |
01:30:42.760
I think.
link |
01:30:43.480
Sure.
link |
01:30:44.480
Now let's real quick talk about another impossible problem,
link |
01:30:48.600
AGI.
link |
01:30:49.560
Right.
link |
01:30:50.280
What do you think it takes to build a system that's
link |
01:30:52.680
human-level intelligence?
link |
01:30:54.080
We talked a little bit about the Turing test, StarCraft.
link |
01:30:56.400
All of these have echoes of general intelligence.
link |
01:30:58.960
But if you think about just something
link |
01:31:01.400
that you would sit back and say, wow,
link |
01:31:03.400
this is really something that resembles
link |
01:31:06.720
human-level intelligence.
link |
01:31:07.800
What do you think it takes to build that?
link |
01:31:09.520
So I find that AGI oftentimes is maybe not very well defined.
link |
01:31:17.160
So what I'm trying to then come up with for myself
link |
01:31:20.520
is what a result would look like that would make you start
link |
01:31:25.040
to believe that you would have agents or neural nets that
link |
01:31:28.640
no longer overfit to a single task,
link |
01:31:31.800
but actually learn the skill of learning, so to speak.
link |
01:31:37.840
And that actually is a field that I
link |
01:31:40.480
am fascinated by, which is the learning to learn,
link |
01:31:43.560
or meta learning, which is about no longer learning
link |
01:31:47.120
about a single domain.
link |
01:31:48.680
So you can think of the learning algorithm
link |
01:31:51.040
itself as general.
link |
01:31:52.680
So the same formula we applied for AlphaStar or StarCraft,
link |
01:31:56.800
we can now apply to almost any video game,
link |
01:31:59.440
or you could apply to many other problems and domains.
link |
01:32:03.520
But the algorithm is what's generalizing.
link |
01:32:07.040
But the neural network, those weights
link |
01:32:09.640
are useless even to play another race.
link |
01:32:12.120
If I train a network to play very well at Protoss versus Protoss,
link |
01:32:15.400
I need to throw away those weights
link |
01:32:17.680
if I want to play Terran versus Terran now.
link |
01:32:20.640
I would need to retrain a network from scratch
link |
01:32:23.720
with the same algorithm.
link |
01:32:24.800
That's beautiful.
link |
01:32:26.000
But the network itself will not be useful.
link |
01:32:28.640
So I think if I see an approach that
link |
01:32:32.840
can absorb or start solving new problems without the need
link |
01:32:38.040
to kind of restart the process, I
link |
01:32:40.280
think that, to me, would be a nice way
link |
01:32:42.600
to define some form of AGI.
link |
01:32:45.600
Again, I don't know about the grandiose, like, AGI.
link |
01:32:48.120
I mean, should the Turing test be solved before AGI?
link |
01:32:50.600
I mean, I don't know.
link |
01:32:51.720
I think concretely, I would like to see clearly
link |
01:32:54.760
that meta learning happen, meaning
link |
01:32:57.560
that there is an architecture or a network that
link |
01:33:01.320
as it sees a new problem or new data, solves it.
link |
01:33:04.920
And to make it kind of a benchmark,
link |
01:33:08.240
it should solve it at the same speed
link |
01:33:09.800
that we do solve new problems.
link |
01:33:11.400
When I show you a new object and you
link |
01:33:13.520
have to recognize it, or when you start playing a new game:
link |
01:33:16.240
you've played all the Atari games,
link |
01:33:17.480
but now you play a new Atari game.
link |
01:33:19.360
Well, you're going to get pretty good pretty quickly
link |
01:33:22.000
at the game.
link |
01:33:22.560
So what exactly the domain is
link |
01:33:25.840
and what the exact benchmark is, that's a bit difficult.
link |
01:33:28.120
I think as a community, we might need
link |
01:33:29.760
to do some work to define it.
link |
01:33:32.600
But I think this first step, I could
link |
01:33:34.720
see it happen relatively soon.
link |
01:33:36.840
But then the whole question of what AGI means and so on,
link |
01:33:40.600
I am a bit more confused about, because
link |
01:33:43.120
I think people mean different things.
link |
01:33:44.800
There's an emotional, psychological level
link |
01:33:48.680
that, like even the Turing test, passing the Turing test
link |
01:33:53.080
is something that we just pass judgment on as human beings:
link |
01:33:55.840
what it means to be a dog or an AGI system.
link |
01:34:03.560
Yeah.
link |
01:34:04.080
At what level? What does it mean?
link |
01:34:07.520
But I like the generalization.
link |
01:34:08.960
And maybe as a community, we converge
link |
01:34:10.680
towards a group of domains that are sufficiently far apart
link |
01:34:14.960
that it would be really damn impressive
link |
01:34:16.560
if it were able to generalize.
link |
01:34:18.280
So perhaps not as close as Protoss and Zerg,
link |
01:34:21.360
but like Wikipedia.
link |
01:34:22.720
That would be a step.
link |
01:34:23.600
Yeah, that would be a good step and then a really good step.
link |
01:34:26.400
But then like from StarCraft to Wikipedia and back.
link |
01:34:30.800
Yeah, that kind of thing.
link |
01:34:31.880
And that also feels quite hard and far off.
link |
01:34:34.200
But I think as long as you put the benchmark out,
link |
01:34:38.200
as we discovered, for instance, with ImageNet,
link |
01:34:41.080
then tremendous progress can be had.
link |
01:34:43.120
So I think maybe there's a lack of benchmarks,
link |
01:34:46.360
but I'm sure we'll find one and the community will then
link |
01:34:49.520
work towards that.
link |
01:34:52.360
And then beyond what AGI might mean or would imply,
link |
01:34:56.920
I really am hopeful to see basically machine learning
link |
01:35:01.040
or AI just scaling up and helping people
link |
01:35:05.280
that might not have the resources to hire an assistant
link |
01:35:08.640
or who might not even know what the weather is like.
link |
01:35:13.800
So I think in terms of the positive impact of AI,
link |
01:35:18.000
I think that's maybe something we should also not lose focus on.
link |
01:35:22.440
The research community building AGI,
link |
01:35:23.960
I mean, that's a really nice goal.
link |
01:35:25.520
But I think the way that DeepMind puts it is: solve intelligence,
link |
01:35:28.480
and then use it to solve everything else.
link |
01:35:30.760
So I think we should parallelize.
link |
01:35:33.440
Yeah, we shouldn't forget about all the positive things
link |
01:35:36.160
that are actually coming out of AI already
link |
01:35:38.000
and are going to be coming out.
link |
01:35:40.600
Right.
link |
01:35:41.600
But on that note, let me ask relative
link |
01:35:45.400
to popular perception, do you have
link |
01:35:47.760
any worry about the existential threat
link |
01:35:49.640
of artificial intelligence in the near or far future
link |
01:35:53.200
that some people have?
link |
01:35:55.080
I think in the near future, I'm skeptical.
link |
01:35:58.080
So I hope I'm not wrong.
link |
01:35:59.280
I'm not concerned, but I appreciate the
link |
01:36:04.720
ongoing efforts, and even a whole research
link |
01:36:07.760
field on AI safety emerging, at conferences and so on.
link |
01:36:10.720
I think that's great.
link |
01:36:12.560
In the long term, I really hope
link |
01:36:16.200
the benefits can simply outweigh
link |
01:36:19.120
the potential dangers.
link |
01:36:20.600
I am hopeful for that.
link |
01:36:23.400
But also, we must remain vigilant to monitor and assess
link |
01:36:27.400
whether the tradeoffs are there and whether we also have enough lead
link |
01:36:32.640
time to prevent or to redirect our efforts if need be.
link |
01:36:37.720
But I'm quite optimistic about the technology
link |
01:36:41.440
and definitely more fearful of other threats
link |
01:36:45.000
at the planetary level at this point.
link |
01:36:48.600
But obviously, this is the one I have more power over.
link |
01:36:52.520
So clearly, I am starting to think more and more about this.
link |
01:36:56.280
And it's grown in me actually to start reading more
link |
01:37:00.840
about AI safety, which is a field that so far I have not
link |
01:37:04.120
really contributed to.
link |
01:37:05.360
But maybe there's something to be done there as well.
link |
01:37:07.720
I think it's really important.
link |
01:37:09.280
I talk about this with a few folks.
link |
01:37:11.440
But it's important to ask you and shove it in your head
link |
01:37:14.800
because you're at the leading edge of actually what
link |
01:37:18.040
people are excited about in AI.
link |
01:37:19.880
The work with AlphaStar, it's arguably
link |
01:37:22.800
at the very cutting edge of the kind of thing
link |
01:37:25.400
that people are afraid of.
link |
01:37:27.160
And so you're speaking to the fact that we're actually
link |
01:37:31.640
quite far away from the kind of thing
link |
01:37:33.520
that people might be afraid of.
link |
01:37:35.160
But it's still worthwhile to think about.
link |
01:37:38.320
And it's also good that you're not as worried
link |
01:37:43.480
and you're also open to thinking about it.
link |
01:37:45.720
There are two aspects.
link |
01:37:46.560
I mean, me not being worried.
link |
01:37:47.720
But obviously, we should prepare for things
link |
01:37:53.880
that could go wrong, misuse of the technology,
link |
01:37:58.360
as with any technology.
link |
01:37:58.360
So I think there are always tradeoffs.
link |
01:38:02.400
And as a society, we've kind of solved this to some extent
link |
01:38:06.800
in the past.
link |
01:38:07.360
So I'm hoping that by having the researchers
link |
01:38:10.720
and the whole community brainstorm and come up
link |
01:38:14.120
with interesting solutions to the new things that
link |
01:38:16.960
will happen in the future, we can still also push
link |
01:38:20.320
the research to the avenue that I think
link |
01:38:23.000
is kind of the greatest avenue, which is
link |
01:38:25.800
to understand intelligence.
link |
01:38:27.760
How are we doing what we're doing?
link |
01:38:29.920
And obviously, from a scientific standpoint,
link |
01:38:32.560
that is kind of the personal drive behind all the time
link |
01:38:37.000
that I spend doing what I'm doing, really.
link |
01:38:40.000
Where do you see deep learning as a field heading?
link |
01:38:42.960
Where do you think the next big breakthrough might be?
link |
01:38:46.720
So I think deep learning, I discussed a little of this
link |
01:38:49.880
before.
link |
01:38:50.720
Deep learning has to be combined with some form
link |
01:38:54.000
of discretization, program synthesis.
link |
01:38:56.680
I think that, as a research topic in itself,
link |
01:38:59.240
is an interesting one to expand and to start
link |
01:39:02.000
doing more research on.
link |
01:39:04.080
And then as for what deep learning
link |
01:39:07.080
will enable us to do in the future,
link |
01:39:08.560
I don't think that's going to happen this year.
link |
01:39:11.480
But also this idea of starting not to throw away all the weights,
link |
01:39:16.480
this idea of learning to learn,
link |
01:39:18.840
and really having these agents not have
link |
01:39:23.400
to restart their weights.
link |
01:39:24.960
And you can have an agent that is kind of solving or classifying
link |
01:39:29.760
images on ImageNet, but also generating speech
link |
01:39:32.760
if you ask it to generate some speech.
link |
01:39:34.680
And it should really be kind of almost the same network,
link |
01:39:39.760
but it might not be a neural network.
link |
01:39:41.760
It might be a neural network with an optimization
link |
01:39:44.240
algorithm attached to it.
link |
01:39:45.600
But I think this idea of generalization to new tasks
link |
01:39:49.280
is something for which we first must define good benchmarks.
link |
01:39:52.160
But then I think that's going to be exciting.
link |
01:39:54.680
And I'm not sure how close we are.
link |
01:39:56.480
But I think if you have a very limited domain,
link |
01:40:00.880
I think we can start making some progress.
link |
01:40:02.800
And much like how we made a lot of progress in computer vision,
link |
01:40:07.200
we should start thinking.
link |
01:40:09.040
I really like a talk that Léon Bottou gave at ICML
link |
01:40:12.720
a few years ago, which said that this train/test paradigm should
link |
01:40:16.800
be broken.
link |
01:40:17.920
We should stop thinking about a training set and a test set.
link |
01:40:23.160
as closed things that are untouchable.
link |
01:40:26.640
I think we should go beyond this.
link |
01:40:28.200
And in meta learning, we call these the meta training
link |
01:40:30.840
set and the meta test set, which is really thinking about,
link |
01:40:35.320
if I know about ImageNet, why would that network not
link |
01:40:39.040
work on MNIST, which is a much simpler problem?
link |
01:40:41.320
But right now, it really doesn't.
link |
01:40:44.560
But it just feels wrong.
link |
01:40:46.200
So I think on the application
link |
01:40:50.960
or the benchmark side, we will probably
link |
01:40:52.960
see quite a bit more interest and progress
link |
01:40:56.520
and hopefully people defining new and exciting challenges
link |
01:41:00.240
really.
link |
01:41:00.960
Do you have any hope or interest in knowledge graphs
link |
01:41:04.160
within this context?
link |
01:41:05.280
So, constructing this kind of graph.
link |
01:41:08.160
So going back to graphs.
link |
01:41:10.480
Well, neural networks and graphs.
link |
01:41:12.120
But I mean, a different kind of knowledge graph,
link |
01:41:14.840
sort of like semantic graphs of concepts.
link |
01:41:18.000
Yeah.
link |
01:41:18.800
So I think the idea of graphs is,
link |
01:41:23.520
so I was first quite interested in sequences, and then
link |
01:41:26.680
in more interesting or different data structures like graphs.
link |
01:41:29.720
And I've studied graph neural networks over the last three
link |
01:41:33.960
years or so.
link |
01:41:34.520
I found these models just very interesting
link |
01:41:37.680
from a deep learning standpoint.
link |
01:41:42.160
But then why do we want these models
link |
01:41:45.840
and why would we use them?
link |
01:41:47.280
What's the application?
link |
01:41:48.640
What's kind of the killer application of graphs?
link |
01:41:51.320
And perhaps if we could extract a knowledge graph
link |
01:41:58.520
from Wikipedia automatically, that
link |
01:42:01.680
would be interesting because then these graphs have
link |
01:42:04.680
this very interesting structure that also is a bit more
link |
01:42:07.920
compatible with this idea of programs and deep learning
link |
01:42:11.560
kind of working together, jumping neighborhoods
link |
01:42:14.360
and so on.
link |
01:42:14.840
You could imagine defining some primitives
link |
01:42:17.080
to go around graphs, right?
link |
01:42:18.800
So I think I really like the idea of a knowledge graph.
link |
01:42:23.720
And in fact, when we started or as part of the research
link |
01:42:29.640
we did for StarCraft, I thought, wouldn't it
link |
01:42:31.960
be cool to give it the graph of all these buildings that
link |
01:42:38.000
depend on each other, and units whose prerequisites
link |
01:42:41.440
are the buildings that produce them.
link |
01:42:42.440
And so this is information that the network
link |
01:42:45.680
can learn and extract.
link |
01:42:46.920
But it would have been great to see
link |
01:42:50.120
or to think of StarCraft really as a giant graph where,
link |
01:42:53.880
as the game evolves, you start taking branches
link |
01:42:57.040
and so on.
link |
01:42:57.960
And we did a bit of research on this,
link |
01:42:59.920
nothing too relevant, but I really like the idea.
link |
01:43:04.080
And it has elements that are something
link |
01:43:06.360
you also worked with in terms of visualizing your networks.
link |
01:43:08.840
It has elements of human interpretability,
link |
01:43:13.280
being able to generate knowledge representations that
link |
01:43:15.840
are human interpretable, that maybe human experts can then
link |
01:43:18.640
tweak or at least understand.
link |
01:43:20.960
So there are a lot of interesting aspects there.
link |
01:43:22.880
And for me personally, I'm just a huge fan of Wikipedia.
link |
01:43:25.600
And it's a shame that our neural networks aren't
link |
01:43:29.360
taking advantage of all the structured knowledge that's
link |
01:43:31.600
on the web.
link |
01:43:32.400
What's next for you?
link |
01:43:34.920
What's next for DeepMind?
link |
01:43:36.400
What are you excited about for AlphaStar?
link |
01:43:39.680
Yeah, so I think the obvious next steps
link |
01:43:43.560
would be to apply AlphaStar to other races.
link |
01:43:48.040
I mean, that would sort of show that the algorithm works
link |
01:43:51.640
because we wouldn't want to have created by mistake something
link |
01:43:56.120
in the architecture that happens to work for Protoss
link |
01:43:58.840
but not for other races.
link |
01:44:00.120
So as verification, I think that's an obvious next step
link |
01:44:03.480
that we are working on.
link |
01:44:05.640
And then I would like to see how agents and players can
link |
01:44:11.440
specialize in different skill sets that
link |
01:44:13.920
allow them to be very good.
link |
01:44:15.920
I think we've seen AlphaStar understanding very well
link |
01:44:19.480
when to take battles and when not to.
link |
01:44:22.360
Also very good at micromanagement
link |
01:44:24.880
and moving the units around and so on.
link |
01:44:27.520
And also very good at producing nonstop and trading off
link |
01:44:30.400
economy with building units.
link |
01:44:33.360
But I have not perhaps seen as much
link |
01:44:36.520
as I would like of this idea, the poker idea
link |
01:44:39.000
that you mentioned, right?
link |
01:44:40.360
I'm not sure StarCraft, or rather AlphaStar,
link |
01:44:42.600
has developed a very deep understanding of what
link |
01:44:47.160
the opponent is doing and reacting to that
link |
01:44:50.120
and sort of trying to trick the player into doing something else
link |
01:44:54.080
or that.
link |
01:44:55.440
So this kind of reasoning, I would like to see more.
link |
01:44:58.320
So I think purely from a research standpoint,
link |
01:45:01.600
there's perhaps also quite a few things
link |
01:45:03.920
to be done there in the domain of StarCraft.
link |
01:45:06.000
Yeah, in the domain of games, I've
link |
01:45:08.320
seen some interesting work in even auctions,
link |
01:45:11.960
manipulating other players, sort of forming a belief state
link |
01:45:15.160
and just messing with people.
link |
01:45:17.160
Yeah, it's called theory of mind, I guess.
link |
01:45:18.800
Theory of mind, yeah.
link |
01:45:20.080
So it's fascinating.
link |
01:45:21.440
Theory of mind and StarCraft, they're kind of
link |
01:45:24.400
really made for each other.
link |
01:45:26.080
So it would be very exciting to see those techniques applied
link |
01:45:29.840
to StarCraft or perhaps StarCraft
link |
01:45:32.040
driving new techniques, right?
link |
01:45:33.280
As I said, this is always the tension between the two.
link |
01:45:36.600
Well, Oriol, thank you so much for talking today.
link |
01:45:38.800
Awesome.
link |
01:45:39.320
It was great to be here.
link |
01:45:40.280
Thanks.