
Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20



link |
00:00:00.000
The following is a conversation with Oriol Vinyals.
link |
00:00:03.320
He's a senior research scientist at Google DeepMind,
link |
00:00:05.920
and before that, he was at Google Brain and Berkeley.
link |
00:00:09.120
His research has been cited over 39,000 times.
link |
00:00:13.320
He's truly one of the most brilliant and impactful minds
link |
00:00:16.520
in the field of deep learning.
link |
00:00:18.160
He's behind some of the biggest papers and ideas in AI,
link |
00:00:20.960
including sequence to sequence learning,
link |
00:00:23.080
audio generation, image captioning,
link |
00:00:25.480
neural machine translation,
link |
00:00:27.000
and, of course, reinforcement learning.
link |
00:00:29.640
He's a lead researcher of the AlphaStar project,
link |
00:00:32.800
creating an agent that defeated a top professional
link |
00:00:35.760
at the game of StarCraft.
link |
00:00:38.040
This conversation is part
link |
00:00:39.800
of the artificial intelligence podcast.
link |
00:00:41.800
If you enjoy it, subscribe on YouTube, iTunes,
link |
00:00:44.920
or simply connect with me on Twitter,
link |
00:00:47.000
at Lex Fridman, spelled F R I D.
link |
00:00:51.200
And now, here's my conversation with Oriol Vinyals.
link |
00:00:56.520
You spearheaded the DeepMind team
link |
00:00:58.480
behind AlphaStar that recently beat
link |
00:01:00.640
a top professional player at StarCraft.
link |
00:01:04.000
So you have an incredible wealth of work
link |
00:01:07.680
in deep learning and a bunch of fields,
link |
00:01:09.440
but let's talk about StarCraft first.
link |
00:01:11.840
Let's go back to the very beginning,
link |
00:01:13.720
even before AlphaStar, before DeepMind,
link |
00:01:16.680
before Deep Learning, first,
link |
00:01:18.800
what came first for you?
link |
00:01:21.240
A love for programming or a love for video games?
link |
00:01:24.960
I think for me, it definitely came first,
link |
00:01:28.560
the drive to play video games.
link |
00:01:31.960
I really liked computers.
link |
00:01:35.280
I didn't really code much,
link |
00:01:37.800
but what I would do is I would just mess with the computer,
link |
00:01:40.680
break it and fix it.
link |
00:01:42.080
That was the level of skills, I guess,
link |
00:01:43.800
that I gained in my very early days,
link |
00:01:46.400
I mean, when I was 10 or 11.
link |
00:01:48.520
And then I really got into video games,
link |
00:01:51.000
especially StarCraft, actually, the first version.
link |
00:01:53.680
I spent most of my time just playing,
link |
00:01:55.800
kind of, pseudo professionally,
link |
00:01:57.080
as professionally as you could play back in 98 in Europe,
link |
00:02:01.040
which was not a very main scene,
link |
00:02:03.080
like what's called nowadays esports.
link |
00:02:05.840
Right, of course, in the 90s.
link |
00:02:07.400
So how'd you get into StarCraft?
link |
00:02:09.920
What was your favorite race?
link |
00:02:11.680
How did you develop your skill?
link |
00:02:15.080
What was your strategy?
link |
00:02:16.880
All that kind of thing.
link |
00:02:18.000
So as a player, I tended to try to play not many games,
link |
00:02:21.520
not to disclose the strategies that I developed.
link |
00:02:25.400
And I like to play random, actually,
link |
00:02:27.560
not in competitions, but just to...
link |
00:02:30.040
I think in StarCraft, there's three main races,
link |
00:02:33.400
and I found it very useful to play with all of them.
link |
00:02:36.560
So I would choose random many times,
link |
00:02:38.360
even sometimes in tournaments,
link |
00:02:40.200
to gain skill on the three races,
link |
00:02:42.360
because it's not how you play against someone,
link |
00:02:45.440
but also if you understand the race because you play it,
link |
00:02:48.760
you also understand what's annoying,
link |
00:02:51.120
then when you're on the other side,
link |
00:02:52.560
what to do to annoy that person,
link |
00:02:54.240
to try to gain advantages here and there and so on.
link |
00:02:57.360
So I actually played random,
link |
00:02:59.160
although I must say in terms of favorite race,
link |
00:03:02.080
I really like Zerg.
link |
00:03:03.720
I was probably best at Zerg,
link |
00:03:05.560
and that's probably what I tended to use
link |
00:03:08.400
towards the end of my career before starting university.
link |
00:03:11.480
So let's step back a little bit.
link |
00:03:13.360
Could you try to describe StarCraft to people
link |
00:03:16.080
that may never have played video games,
link |
00:03:18.960
especially the massively online variety like StarCraft?
link |
00:03:22.320
So StarCraft is a real time strategy game,
link |
00:03:25.920
and the way to think about StarCraft,
link |
00:03:27.800
perhaps if you understand a bit of chess,
link |
00:03:30.960
is that there's a board,
link |
00:03:32.920
which is called map,
link |
00:03:34.240
or the map where people play against each other.
link |
00:03:39.120
There's obviously many ways you can play,
link |
00:03:41.000
but the most interesting one is the one versus one setup,
link |
00:03:44.640
where you just play against someone else,
link |
00:03:47.400
or even the built in AI, right?
link |
00:03:48.880
Blizzard put a system that can play the game reasonably well,
link |
00:03:52.600
if you don't know how to play.
link |
00:03:54.480
And then in this board,
link |
00:03:56.040
you have, again, pieces like in chess,
link |
00:03:58.680
but these pieces are not there initially,
link |
00:04:01.400
like they are in chess.
link |
00:04:02.360
You actually need to decide to gather resources,
link |
00:04:05.800
to decide which pieces to build.
link |
00:04:07.920
So in a way, you're starting almost with no pieces.
link |
00:04:10.760
You start gathering resources in StarCraft.
link |
00:04:13.400
There's minerals and gas that you can gather,
link |
00:04:16.200
and then you must decide how much do you wanna focus,
link |
00:04:19.440
for instance, on gathering more resources,
link |
00:04:21.480
or starting to build units or pieces.
link |
00:04:24.360
And then once you have enough pieces,
link |
00:04:27.200
or maybe a good attack composition,
link |
00:04:32.120
then you go and attack the other side of the map.
link |
00:04:35.480
And now the other main difference with chess
link |
00:04:37.800
is that you don't see the other side of the map.
link |
00:04:39.920
So you're not seeing the moves of the enemy.
link |
00:04:43.360
It's what we call partially observable.
link |
00:04:45.440
So as a result, you must not only decide trading off economy
link |
00:04:50.160
versus building your own units,
link |
00:04:52.320
but you also must decide whether you wanna scout
link |
00:04:54.960
to gather information,
link |
00:04:56.840
but also by scouting you might be giving away some information
link |
00:04:59.520
that you might be hiding from the enemy.
link |
00:05:01.960
So there's a lot of complex decision making
link |
00:05:04.960
all in real time.
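The partial observability described above can be illustrated with a minimal sketch. It assumes a simplified model that is not from the conversation: units are 2D points, and an enemy is visible only if it lies within a fixed sight range of at least one of your own units. The function name `visible_enemies` and the `sight_range` value are hypothetical.

```python
from math import hypot


def visible_enemies(own_units, enemy_units, sight_range=7.0):
    """Return the enemy positions within sight_range of at least one own unit.

    Positions are (x, y) tuples. This is an illustrative fog-of-war model,
    not the game's actual visibility rules.
    """
    return [
        enemy
        for enemy in enemy_units
        if any(
            hypot(enemy[0] - unit[0], enemy[1] - unit[1]) <= sight_range
            for unit in own_units
        )
    ]


# Without a scout, only the nearby enemy is observed.
base_only = [(0.0, 0.0)]
enemies = [(3.0, 4.0), (30.0, 30.0)]
seen_from_base = visible_enemies(base_only, enemies)

# Sending a scout across the map reveals the distant enemy as well.
with_scout = base_only + [(28.0, 30.0)]
seen_with_scout = visible_enemies(with_scout, enemies)
```

In this toy model, scouting is the only way to enlarge the observed part of the state, which is the trade-off the speaker describes.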
link |
00:05:06.000
There's also unlike chess, this is not a turn based game.
link |
00:05:10.120
You play basically all the time continuously
link |
00:05:13.680
and thus some skill in terms of speed and accuracy
link |
00:05:16.920
of clicking is also very important.
link |
00:05:18.960
And people that train for this,
link |
00:05:20.520
really play this game at an amazing skill level.
link |
00:05:23.600
I've seen many times these,
link |
00:05:25.840
and if you can witness this live,
link |
00:05:27.400
it's really, really impressive.
link |
00:05:29.520
So in a way it's kind of a chess,
link |
00:05:31.440
where you don't see the other side of the board,
link |
00:05:33.440
you're building your own pieces,
link |
00:05:35.240
and you also need to gather resources
link |
00:05:37.240
to basically get some money to build other buildings,
link |
00:05:40.720
pieces, technology, and so on.
link |
00:05:42.920
From the perspective of the human player,
link |
00:05:45.160
the difference between that and chess,
link |
00:05:47.240
or maybe that and a game like turn based strategy,
link |
00:05:50.800
like Heroes of Might and Magic,
link |
00:05:52.840
is that there's an anxiety,
link |
00:05:55.200
because you have to make these decisions really quickly.
link |
00:05:58.800
And if you are not actually aware of what decisions work,
link |
00:06:04.400
it's a very stressful balance that you have to,
link |
00:06:06.520
everything you describe is actually quite stressful,
link |
00:06:08.920
difficult to balance for an amateur human player.
link |
00:06:11.760
I don't know if it gets easier at the professional level,
link |
00:06:14.200
like if they're fully aware of what they have to do,
link |
00:06:16.520
but at the amateur level,
link |
00:06:18.360
there's this anxiety, oh crap, I'm being attacked,
link |
00:06:20.520
oh crap, I have to build up resources,
link |
00:06:22.840
oh, I have to probably expand,
link |
00:06:24.360
and all these, the real time strategy aspect
link |
00:06:28.600
is really stressful and computationally,
link |
00:06:30.360
I'm sure, difficult, we'll get into it.
link |
00:06:32.320
But for me, Battle.net,
link |
00:06:36.040
so StarCraft was released in 98, 20 years ago,
link |
00:06:42.640
which is hard to believe,
link |
00:06:44.680
and Blizzard Battle.net with Diablo in 96 came out,
link |
00:06:50.200
and to me, it might be a narrow perspective,
link |
00:06:52.600
but it changed online gaming,
link |
00:06:54.520
and perhaps society forever,
link |
00:06:57.280
but I may have way too narrow a viewpoint,
link |
00:07:00.320
but from your perspective,
link |
00:07:03.280
can you talk about the history of gaming
link |
00:07:05.600
over the past 20 years?
link |
00:07:07.000
Is this, how transformational,
link |
00:07:09.640
how important is this line of games?
link |
00:07:12.760
Right, so I think I kind of was an active gamer
link |
00:07:16.920
whilst this was developing the internet and online gaming,
link |
00:07:20.560
so for me, the way it came was I played
link |
00:07:24.040
other games strategy related,
link |
00:07:26.400
I played a bit of Command and Conquer,
link |
00:07:28.400
and then I played Warcraft 2, which is from Blizzard,
link |
00:07:31.840
but at the time, I didn't know,
link |
00:07:33.040
I didn't understand about what Blizzard was or anything.
link |
00:07:36.000
Warcraft 2 was just a game,
link |
00:07:37.320
which was actually very similar to StarCraft in many ways.
link |
00:07:40.240
It's also a real time strategy game
link |
00:07:42.480
where there's orcs and humans, so there's only two races.
link |
00:07:45.400
But it was offline, right?
link |
00:07:47.960
So I remember a friend of mine came to school,
link |
00:07:51.600
say, oh, there's this new cool game called StarCraft,
link |
00:07:53.960
and I just said, oh, this sounds like
link |
00:07:55.400
just a copy of Warcraft 2, until I kind of installed it,
link |
00:07:59.720
and at the time, I am from Spain,
link |
00:08:01.960
so we didn't have like very good internet, right?
link |
00:08:04.640
So there was, for us,
link |
00:08:06.160
StarCraft became first kind of an offline experience
link |
00:08:09.520
where you kind of start to play these missions, right?
link |
00:08:12.920
You play against some sort of scripted things
link |
00:08:15.440
to develop the story of the characters in the game,
link |
00:08:18.920
and then later on, I start playing against the built in AI,
link |
00:08:23.480
and I thought it was impossible to defeat it.
link |
00:08:26.080
Then eventually, you defeat one,
link |
00:08:27.400
and you can actually play against seven built in AIs
link |
00:08:29.680
at the same time,
link |
00:08:31.040
which also felt impossible,
link |
00:08:32.680
but actually, it's not that hard to beat
link |
00:08:35.280
seven built in AIs at once.
link |
00:08:36.960
So once we achieved that,
link |
00:08:39.160
also we discovered that we could play,
link |
00:08:42.120
as I said, internet wasn't that great,
link |
00:08:43.840
but we could play with the LAN, right?
link |
00:08:45.920
Like basically against each other
link |
00:08:48.040
if we were in the same place,
link |
00:08:49.920
because you could just connect machines with like cables, right?
link |
00:08:53.640
So we started playing in LAN mode,
link |
00:08:55.960
and against, you know, as a group of friends,
link |
00:08:58.080
and it was really, really like much more entertaining
link |
00:09:00.520
than playing against the AIs.
link |
00:09:02.280
And later on, as internet was starting to develop
link |
00:09:05.120
and being a bit faster and more reliable,
link |
00:09:07.400
then it's when I started experiencing Battle.net,
link |
00:09:09.720
which is this amazing universe,
link |
00:09:11.560
not only because of the fact
link |
00:09:13.720
that you can play the game against anyone in the world,
link |
00:09:16.440
but you can also get to know more people.
link |
00:09:20.200
You just get exposed to now like this vast variety of,
link |
00:09:23.080
it's kind of a bit when the chats came about, right?
link |
00:09:25.320
There was a chat system,
link |
00:09:27.320
you could play against people,
link |
00:09:29.040
but you could also chat with people,
link |
00:09:30.760
not only about StarCraft, but about anything.
link |
00:09:32.520
And that became a way of life for kind of two years.
link |
00:09:36.680
And obviously then it became like kind of,
link |
00:09:38.920
it exploded in me that I started to play more seriously,
link |
00:09:42.280
going to tournaments and so on and so forth.
link |
00:09:44.720
Do you have a sense on a societal sociological level
link |
00:09:49.880
what's this whole part of society
link |
00:09:52.280
that many of us are not aware of?
link |
00:09:53.840
And it's a huge part of society, which is gamers.
link |
00:09:56.920
I mean, every time I come across that in YouTube
link |
00:10:01.000
or streaming sites,
link |
00:10:03.000
I mean, this is a huge number of people play games religiously.
link |
00:10:07.640
Do you have a sense of those folks,
link |
00:10:08.920
especially now that you've returned to that realm
link |
00:10:10.880
a little bit on the AI side?
link |
00:10:12.600
Yeah, so in fact, even after StarCraft,
link |
00:10:15.880
I actually played World of Warcraft,
link |
00:10:17.640
which is maybe the main sort of online world and presence
link |
00:10:22.320
that you get to interact with lots of people.
link |
00:10:24.640
So I played that for a little bit.
link |
00:10:26.400
To me, it was a bit less stressful than StarCraft
link |
00:10:29.080
because winning was kind of a given.
link |
00:10:30.920
You're just put in this world
link |
00:10:32.400
and you can always complete missions.
link |
00:10:35.040
But I think it was actually the social aspect
link |
00:10:38.120
of especially StarCraft first
link |
00:10:40.480
and then games like World of Warcraft
link |
00:10:43.440
really shaped me in very interesting ways
link |
00:10:47.000
because what you get to experience
link |
00:10:48.560
is just people you wouldn't usually interact with, right?
link |
00:10:51.680
So even nowadays, I still have many Facebook friends
link |
00:10:55.000
from the era when I played online
link |
00:10:56.960
and their ways of thinking is even political.
link |
00:11:00.120
They just don't, we don't live in it.
link |
00:11:01.680
Like we don't interact in the real world,
link |
00:11:03.720
but we were connected by basically fiber.
link |
00:11:06.800
And that way I actually get to understand a bit better
link |
00:11:10.840
that we live in a diverse world.
link |
00:11:12.840
And these were just connections that were made by,
link |
00:11:15.640
because I happened to go in a city,
link |
00:11:18.120
in a virtual city as a priest
link |
00:11:20.720
and I met this warrior and we became friends.
link |
00:11:23.680
And then we started like playing together, right?
link |
00:11:25.720
So I think it's transformative
link |
00:11:28.800
and more and more people are aware of it.
link |
00:11:31.320
I mean, it's becoming quite mainstream.
link |
00:11:33.520
But back in the day, as you were saying,
link |
00:11:35.360
in 2005 even, it was still a very strange thing to do
link |
00:11:42.120
especially in Europe, I think there were exceptions
link |
00:11:45.920
like Korea for instance, it was amazing
link |
00:11:47.960
like that everything happened so early
link |
00:11:50.600
in terms of cybercafes.
link |
00:11:52.240
Like it's, if you go to Seoul, it's a city
link |
00:11:55.120
that back in the day, StarCraft was kind of,
link |
00:11:58.400
you could be a celebrity by playing StarCraft
link |
00:12:00.640
but this was like 99, 2000, right?
link |
00:12:03.040
It's not like recently.
link |
00:12:04.160
So yeah, it's quite interesting to look back
link |
00:12:08.560
and yeah, I think it's changing society.
link |
00:12:10.760
The same way of course, like technology
link |
00:12:13.120
and social networks and so on are also transforming things.
link |
00:12:16.920
And a quick tangent, let me ask,
link |
00:12:18.480
you're also one of the most productive people
link |
00:12:21.000
in your particular chosen passion and path in life.
link |
00:12:26.440
And yet you also appreciate and enjoy video games.
link |
00:12:29.480
Do you think it's possible to enjoy video games in moderation?
link |
00:12:35.800
Someone told me that you could choose two out of three.
link |
00:12:39.920
When I was playing video games,
link |
00:12:41.160
you could choose having a girlfriend,
link |
00:12:43.680
playing video games or studying.
link |
00:12:46.240
And I think for the most part it was relatively true.
link |
00:12:50.560
These things do take time.
link |
00:12:52.360
Games like StarCraft, if you take the game pretty seriously
link |
00:12:55.400
and you wanna study it,
link |
00:12:56.520
then you obviously will dedicate more time to it.
link |
00:12:59.080
And I definitely took gaming
link |
00:13:01.200
and obviously studying very seriously.
link |
00:13:03.680
I love learning science and et cetera.
link |
00:13:08.720
So to me, especially when I started university undergrad,
link |
00:13:13.120
I kind of stepped off StarCraft.
link |
00:13:14.920
I actually fully stopped playing.
link |
00:13:16.800
And then World of Warcraft was a bit more casual.
link |
00:13:19.040
You could just connect online and I mean, it was fun.
link |
00:13:22.920
But as I said, that was not as much time investment
link |
00:13:26.840
as it was for me in StarCraft.
link |
00:13:29.480
Okay, so let's get into AlphaStar.
link |
00:13:31.640
What are the, you're behind the team.
link |
00:13:35.200
So DeepMind has been working on StarCraft
link |
00:13:37.240
and released a bunch of cool open source agents
link |
00:13:39.440
and so on in the past few years.
link |
00:13:41.320
But AlphaStar really is the moment
link |
00:13:43.240
where for the first time you beat a world class player.
link |
00:13:49.160
So what are the parameters of the challenge
link |
00:13:51.600
in the way that AlphaStar took it on
link |
00:13:53.480
and how did you and David and the rest of the DeepMind team
link |
00:13:57.440
get into it?
link |
00:13:58.280
Consider that you can even beat the best in the world
link |
00:14:00.960
or top players.
link |
00:14:02.480
I think it all started in, back in 2015, actually I'm lying.
link |
00:14:07.480
I think it was 2014 when DeepMind was acquired by Google
link |
00:14:12.760
and I at the time was at Google Brain,
link |
00:14:14.320
which is it was in California, it's still in California.
link |
00:14:17.600
We had this summit where we got together the two groups.
link |
00:14:20.600
So Google Brain and Google DeepMind got together
link |
00:14:23.400
and we gave a series of talks.
link |
00:14:25.200
And given that they were doing deep reinforcement learning
link |
00:14:28.520
for games, I decided to bring up part of my past
link |
00:14:32.200
which I had developed at Berkeley like this thing
link |
00:14:35.000
which we call Berkeley Overmind
link |
00:14:36.440
which is really just a StarCraft one bot.
link |
00:14:39.560
So I talked about that
link |
00:14:41.640
and I remember Demis just came to me and said,
link |
00:14:43.800
well, maybe not now, it's perhaps a bit too early
link |
00:14:46.640
but you should just come to DeepMind and do this again
link |
00:14:51.080
with deep reinforcement learning.
link |
00:14:53.240
And at the time it sounded very science fiction
link |
00:14:56.120
for several reasons.
link |
00:14:58.280
But then in 2016, when I actually moved to London
link |
00:15:01.000
and joined DeepMind transferring from Brain,
link |
00:15:04.280
it became apparent that because of the AlphaGo moment
link |
00:15:07.840
and kind of Blizzard reaching out to us to say,
link |
00:15:11.280
wait, like, do you want the next challenge
link |
00:15:13.000
and also me being full time at DeepMind?
link |
00:15:15.080
So sort of kind of all these came together.
link |
00:15:17.440
And then I went to Irvine in California
link |
00:15:21.000
to the Blizzard headquarters to just chat with them
link |
00:15:23.800
and try to explain how would it all work
link |
00:15:26.320
before you do anything.
link |
00:15:27.800
And the approach has always been about
link |
00:15:32.120
the learning perspective, right?
link |
00:15:33.680
So in Berkeley, we did a lot of rule based conditioning
link |
00:15:39.200
and if you have more than three units, then go attack
link |
00:15:42.560
and if the other has more units than me, I retreat
link |
00:15:45.040
and so on and so forth.
link |
00:15:46.400
And of course, the point of deep reinforcement learning,
link |
00:15:48.840
deep learning, machine learning in general
link |
00:15:50.520
is that all these should be learned behavior.
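The rule based conditioning described above can be sketched as a hand-written policy. This is only an illustration of the two example rules quoted in the conversation, not the actual Berkeley Overmind code; the names `ArmyState` and `scripted_action`, the rule ordering, and the fallback "hold" action are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ArmyState:
    """Hypothetical summary of what a scripted bot knows at one moment."""
    my_units: int
    enemy_units_seen: int


def scripted_action(state: ArmyState, attack_threshold: int = 3) -> str:
    """Pick an action from fixed, hand-written rules.

    Assumed ordering: the retreat rule is checked before the attack rule.
    """
    # Rule: if the other side has more units than me, I retreat.
    if state.enemy_units_seen > state.my_units:
        return "retreat"
    # Rule: if I have more than three units, go attack.
    if state.my_units > attack_threshold:
        return "attack"
    # Assumed default when no rule fires.
    return "hold"
```

Every threshold and branch here is chosen by the programmer. In the learning approach the speaker contrasts it with, the mapping from state to action would instead be learned from data and experience.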
link |
00:15:53.480
So that kind of was the DNA of the project
link |
00:15:57.000
since its inception in 2016
link |
00:15:59.520
where we just didn't even have an environment to work with.
link |
00:16:02.920
And so that's how it all started really.
link |
00:16:05.840
So if you go back to that conversation with Demis
link |
00:16:08.600
or even in your own head, how far away did you,
link |
00:16:12.200
because we're talking about Atari games,
link |
00:16:14.480
we're talking about Go, which is kind of,
link |
00:16:16.720
if you're honest about it, really far away from StarCraft.
link |
00:16:21.120
Well, now that you've beaten it,
link |
00:16:22.160
maybe you could say it's close,
link |
00:16:23.280
but it seems like StarCraft is way harder than Go,
link |
00:16:28.040
philosophically and mathematically speaking.
link |
00:16:30.840
So how far away did you think you were?
link |
00:16:35.040
Did you think in 2019 and 18 you could be doing
link |
00:16:38.000
as well as you have?
link |
00:16:38.840
Yeah, when I kind of thought about,
link |
00:16:40.880
okay, I'm gonna dedicate a lot of my time and focus on this.
link |
00:16:44.880
And obviously I do a lot of different research
link |
00:16:48.080
in deep learning, so spending time on it.
link |
00:16:50.400
I mean, I really had to kind of think
link |
00:16:52.240
there's gonna be something good happening out of this.
link |
00:16:55.880
So really I thought, well, this sounds impossible
link |
00:16:59.120
and it probably is impossible to do the full thing,
link |
00:17:01.600
like the full game where you play one versus one
link |
00:17:06.720
and it's only a neural network playing and so on.
link |
00:17:09.760
So it really felt like,
link |
00:17:10.960
I just didn't even think it was possible.
link |
00:17:14.000
But on the other hand,
link |
00:17:14.840
I could see some stepping stones towards that goal.
link |
00:17:19.080
Clearly you could define sub problems in StarCraft
link |
00:17:21.600
and sort of dissect it a bit and say,
link |
00:17:23.360
okay, here is a part of the game, here is another part.
link |
00:17:26.720
And also obviously the fact,
link |
00:17:29.400
so this was really also critical to me,
link |
00:17:31.280
the fact that we could access human replays.
link |
00:17:34.400
So Blizzard was very kind
link |
00:17:35.720
and in fact they open sourced this for the whole community
link |
00:17:38.560
where you can just go
link |
00:17:39.960
and it's not every single StarCraft game ever played,
link |
00:17:43.040
but it's a lot of them.
link |
00:17:44.200
You can just go and download
link |
00:17:45.880
and every day they will,
link |
00:17:47.160
you can just query a data set and say,
link |
00:17:48.960
well, give me all the games that were played today.
link |
00:17:51.680
And given my kind of experience with language
link |
00:17:55.800
and sequences and supervised learning,
link |
00:17:57.920
I thought, well, that's definitely gonna be very helpful
link |
00:18:00.760
and something quite unique now
link |
00:18:02.400
because never before had we had such a large data set
link |
00:18:06.720
of replays of people playing the game at this scale
link |
00:18:11.000
of such a complex video game, right?
link |
00:18:12.560
So that to me was a precious resource.
link |
00:18:15.640
And as soon as I knew that Blizzard was able
link |
00:18:18.000
to kind of give this to the community,
link |
00:18:20.960
I started to feel positive
link |
00:18:22.280
about something non trivial happening.
link |
00:18:24.280
But I also thought the full thing,
link |
00:18:27.120
like really no rules, no single line of code
link |
00:18:30.400
that tries to say, well, I mean,
link |
00:18:31.680
if you see this unit build a detector,
link |
00:18:33.320
all these, not having any of these specializations
link |
00:18:36.680
seemed really, really, really difficult to me.
link |
00:18:39.160
Intuitively.
link |
00:18:40.000
I do also like that Blizzard was teasing
link |
00:18:42.680
or even trolling you,
link |
00:18:45.480
sort of almost pulling you in
link |
00:18:48.560
into this really difficult challenge.
link |
00:18:50.280
Do they have any awareness?
link |
00:18:51.840
What's the interest from the perspective of Blizzard
link |
00:18:55.640
except just curiosity?
link |
00:18:57.280
Yeah, I think Blizzard has really understood
link |
00:18:59.400
and really brought forward this competitiveness
link |
00:19:03.240
of esports in games.
link |
00:19:04.800
StarCraft really kind of sparked a lot of,
link |
00:19:07.840
like something that almost was never seen,
link |
00:19:10.720
especially as I was saying, back in Korea.
link |
00:19:13.960
So they just probably thought, well,
link |
00:19:16.480
this is such a pure one versus one setup
link |
00:19:18.880
that it would be great to see
link |
00:19:21.160
if something that can play Atari or Go
link |
00:19:24.840
and then later on chess could even tackle
link |
00:19:27.920
this kind of complex real time strategy game, right?
link |
00:19:30.600
So for them, they wanted to see first, obviously,
link |
00:19:33.880
whether it was possible,
link |
00:19:36.440
if the game they created was in a way solvable,
link |
00:19:39.760
to some extent.
link |
00:19:40.840
And I think on the other hand,
link |
00:19:42.200
they also are a pretty modern company that innovates a lot.
link |
00:19:45.760
So just starting to understand AI for them
link |
00:19:48.520
to how to bring AI into games,
link |
00:19:50.240
is not AI for games, but games for AI, right?
link |
00:19:54.320
I mean, both ways, I think, can work.
link |
00:19:56.120
And we obviously at DeepMind use games for AI, right?
link |
00:20:00.040
To drive AI progress,
link |
00:20:01.280
but Blizzard might actually be able to do,
link |
00:20:03.680
and many other companies,
link |
00:20:04.760
to start to understand and do the opposite.
link |
00:20:06.800
So I think that is also something they can get out of this.
link |
00:20:09.800
And they definitely,
link |
00:20:11.320
we have brainstormed a lot about this, right?
link |
00:20:13.720
But one of the interesting things to me about StarCraft
link |
00:20:16.080
and Diablo and these games that Blizzard has created
link |
00:20:19.400
is the task of balancing classes, for example,
link |
00:20:23.560
sort of making the game fair from the starting point,
link |
00:20:27.480
and then let skill determine the outcome.
link |
00:20:30.960
Is there, I mean, can you first comment?
link |
00:20:33.600
There's three races, Zerg, Protoss, and Terran.
link |
00:20:36.760
I don't know if I've ever said that out loud.
link |
00:20:38.960
Is that how you pronounce it, Terran?
link |
00:20:40.600
Yeah, Terran.
link |
00:20:41.600
Yeah.
link |
00:20:42.440
Yeah, I don't think I've ever,
link |
00:20:45.200
in person, interacted with anybody about StarCraft.
link |
00:20:47.680
That's funny. So they seem to be pretty balanced.
link |
00:20:51.760
I wonder if the AI, the work that you're doing
link |
00:20:56.240
with AlphaStar would help balance them even further.
link |
00:20:59.160
Is that something you think about?
link |
00:21:00.520
Is that something that Blizzard is thinking about?
link |
00:21:03.280
Right, so balancing when you add a new unit
link |
00:21:06.400
or a new spell type is obviously possible,
link |
00:21:09.120
given that you can always train or pre train at scale,
link |
00:21:13.160
some agent that might start using that in unintended ways.
link |
00:21:16.680
But I think actually, if you understand
link |
00:21:19.120
how StarCraft has kind of co evolved with players,
link |
00:21:22.200
in a way, I think it's actually very cool,
link |
00:21:24.280
the ways that many of the things and strategies
link |
00:21:27.400
that people came up with, right?
link |
00:21:28.680
So I think it's, we've seen it over and over in StarCraft
link |
00:21:32.280
that Blizzard comes up with maybe a new unit,
link |
00:21:34.920
and then some players get creative
link |
00:21:37.240
and do something kind of unintentional
link |
00:21:39.080
or something that Blizzard designers
link |
00:21:40.840
just simply didn't test or think about.
link |
00:21:43.560
And then after that becomes kind of mainstream in the community,
link |
00:21:46.960
Blizzard patches the game,
link |
00:21:48.240
and then they kind of maybe weaken that strategy
link |
00:21:51.880
or make it actually more interesting,
link |
00:21:53.880
but a bit more balanced.
link |
00:21:55.400
So this kind of continual talk between players and Blizzard
link |
00:21:58.280
is kind of what has defined them actually,
link |
00:22:01.680
in actually most games, like in StarCraft,
link |
00:22:04.040
but also in World of Warcraft,
link |
00:22:05.760
they would do that, there are several classes
link |
00:22:07.440
and it would be not good that everyone plays
link |
00:22:10.800
absolutely the same race and so on, right?
link |
00:22:13.200
So I think they do care about balancing, of course,
link |
00:22:17.280
and they do a fair amount of testing,
link |
00:22:19.640
but it's also beautiful to also see
link |
00:22:22.160
how players get creative anyways.
link |
00:22:24.520
And I mean, whether AI can be more creative at this point,
link |
00:22:27.440
I don't think so, right?
link |
00:22:28.680
I mean, it's just sometimes something so amazing happens,
link |
00:22:31.600
like I remember back in the days,
link |
00:22:33.720
like you have these dropships that could drop the Reavers,
link |
00:22:36.920
and that was actually not thought about,
link |
00:22:39.600
that you could drop this unit
link |
00:22:41.280
that has this what's called splash damage
link |
00:22:43.200
that would basically eliminate all the enemy's workers at once.
link |
00:22:47.800
No one thought that you could actually put them
link |
00:22:50.080
in really early game, do that kind of damage,
link |
00:22:53.040
and then things change in the game,
link |
00:22:55.400
but I don't know, I think it's quite an amazing
link |
00:22:58.000
exploration process from both sides,
link |
00:23:00.280
players and Blizzard alike.
link |
00:23:01.840
Well, it's almost like a reinforcement learning exploration,
link |
00:23:05.000
but the scale of humans that play Blizzard games
link |
00:23:10.000
is almost on the scale of a large scale,
link |
00:23:13.680
DeepMind RL experiment.
link |
00:23:15.360
I mean, if you look at the numbers,
link |
00:23:17.200
that's, I mean, you're talking about,
link |
00:23:18.720
I don't know how many games,
link |
00:23:19.560
but hundreds of thousands of games, probably a month.
link |
00:23:22.080
Yeah, I mean, so you could,
link |
00:23:23.880
it's almost the same as running RL agents.
link |
00:23:28.800
What aspect of the problem of StarCraft,
link |
00:23:31.240
do you think is the hardest?
link |
00:23:32.160
Is it the, like you said, the imperfect information?
link |
00:23:35.400
Is it the fact that you have to do long-term planning?
link |
00:23:38.160
Is it the real time aspect?
link |
00:23:40.280
So you have to do stuff really quickly?
link |
00:23:42.240
Is it the large action space,
link |
00:23:44.760
so you can do so many possible things?
link |
00:23:47.640
Or is it, you know, in the game theoretic sense,
link |
00:23:51.120
there is no Nash equilibrium.
link |
00:23:52.440
At least you don't know what the optimal strategy is,
link |
00:23:54.240
because there's way too many options.
link |
00:23:56.520
Right.
link |
00:23:57.360
Is there something that stands out
link |
00:23:58.360
as just like the hardest, the most annoying thing?
link |
00:24:01.000
So when we sort of looked at the problem
link |
00:24:04.200
and started to define the parameters of it, right?
link |
00:24:07.640
What are the observations?
link |
00:24:08.800
What are the actions?
link |
00:24:10.520
It became very apparent that, you know,
link |
00:24:13.920
the very first barrier that one would hit in StarCraft
link |
00:24:17.160
would be because of the action space being so large
link |
00:24:20.720
and us not being able to search like you could in chess
link |
00:24:24.880
or Go, even though the search space there is vast.
link |
00:24:28.640
The main problem that we identified
link |
00:24:30.600
was that of exploration, right?
link |
00:24:32.440
So without any sort of human knowledge or human prior,
link |
00:24:36.720
if you think about StarCraft
link |
00:24:38.040
and you know how deep reinforcement learning algorithms
link |
00:24:41.440
work, which is essentially by issuing random actions
link |
00:24:45.360
and hoping that they will get some wins sometimes
link |
00:24:47.800
so they could learn.
link |
00:24:49.200
So if you think of the action space in StarCraft,
link |
00:24:52.800
almost anything you can do in the early game is bad
link |
00:24:55.880
because any action involves taking workers
link |
00:24:58.720
which are mining minerals for free.
link |
00:25:01.360
That's something that the game does automatically,
link |
00:25:03.560
sends them to mine
link |
00:25:04.920
and you would immediately just take them out of mining
link |
00:25:07.720
and send them around.
link |
00:25:09.080
So just thinking how is it gonna be possible
link |
00:25:13.640
to get to understand these concepts
link |
00:25:16.880
but even more like expanding, right?
link |
00:25:19.280
There's these buildings you can place
link |
00:25:21.080
in other locations in the map to gather more resources
link |
00:25:24.160
but the location of the building is important
link |
00:25:26.840
and you have to select a worker,
link |
00:25:28.920
send it walking to that location, build the building,
link |
00:25:32.680
wait for the building to be built
link |
00:25:34.160
and then put extra workers there so they start mining.
link |
00:25:37.840
That just, that feels like impossible
link |
00:25:40.200
if you just randomly click to produce that state,
link |
00:25:43.680
desirable state that then you could hope to learn from
link |
00:25:47.000
because eventually that may lead to an extra win, right?
link |
00:25:49.880
So for me, the exploration problem
link |
00:25:51.840
due to the action space
link |
00:25:53.840
and the fact that there's not really turns,
link |
00:25:56.160
there's so many turns
link |
00:25:57.000
because the game essentially ticks at 22 times per second.
link |
00:26:01.440
If you, I mean, that's how they discretize sort of time.
link |
00:26:05.560
Obviously, you always have to discretize time
link |
00:26:07.320
where there's no such thing as real time
link |
00:26:09.640
but it's really a lot of time steps
link |
00:26:12.560
of things that could go wrong
link |
00:26:14.280
and that definitely felt a priori like the hardest.
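To get a feel for how many time steps that is, here is a rough back-of-the-envelope sketch. Only the 22 ticks per second figure comes from the conversation; the game length, the number of available actions, and the sequence length used below are made-up illustrative values, not AlphaStar's.

```python
TICKS_PER_SECOND = 22  # figure quoted in the conversation


def decision_points(game_minutes: float, ticks_per_second: int = TICKS_PER_SECOND) -> int:
    """Number of game ticks, i.e. potential decision points, in one game."""
    return round(game_minutes * 60 * ticks_per_second)


def random_sequence_probability(num_actions: int, sequence_length: int) -> float:
    """Chance that uniform random clicking emits one specific action sequence.

    Assumes every step picks independently and uniformly among num_actions
    choices, which is a deliberately crude model of random exploration.
    """
    return (1.0 / num_actions) ** sequence_length


# Hypothetical numbers: a 10-minute game, 100 possible actions per step,
# and a 5-step sequence (select worker, move, build, wait, send workers).
ticks_in_game = decision_points(10)
chance_of_expanding = random_sequence_probability(100, 5)
```

Even with these small, assumed numbers, the chance of stumbling on a useful multi-step sequence by random clicking is tiny compared with the number of ticks in a game, which is the exploration problem described here.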
link |
00:26:17.960
You mentioned many good ones,
link |
00:26:19.360
I think partial observability,
link |
00:26:21.360
the fact that there is no perfect strategy
link |
00:26:23.440
because of the partial observability,
link |
00:26:25.560
those are very interesting problems.
link |
00:26:26.880
We start seeing more and more now in terms of
link |
00:26:29.400
as we solved the previous ones,
link |
00:26:31.080
but the core problem to me was exploration
link |
00:26:34.320
and solving it has been basically kind of the focus
link |
00:26:37.800
and how we saw the first breakthroughs.
link |
00:26:39.840
So exploration in a multi hierarchical way.
link |
00:26:43.720
So like 22 times a second exploration
link |
00:26:46.640
has a very different meaning than it does
link |
00:26:48.680
in terms of should I gather resources early
link |
00:26:51.520
or should I wait or so on.
link |
00:26:53.200
So how do you solve the long term?
link |
00:26:56.240
Let's talk about the internals of AlphaStar.
link |
00:26:58.120
So first of all, how do you represent the state
link |
00:27:02.520
of the game as an input?
link |
00:27:05.160
How do you then do the long term sequence modeling?
link |
00:27:08.840
How do you build a policy?
link |
00:27:10.480
What's the architecture like?
link |
00:27:12.600
So AlphaStar has obviously several components
link |
00:27:16.880
but everything passes through what we call the policy
link |
00:27:20.920
which is a neural network
link |
00:27:22.320
and that's kind of the beauty of it.
link |
00:27:24.320
There is, I could just now give you a neural network
link |
00:27:27.200
and some weights and if you fed the right observations
link |
00:27:30.480
and you understood the actions the same way we do
link |
00:27:32.600
you would have basically the agent playing the game.
link |
00:27:35.160
There's absolutely nothing else needed
link |
00:27:37.280
other than those weights that were trained.
link |
00:27:40.360
Now, the first step is observing the game
link |
00:27:43.400
and we've experimented with a few alternatives.
link |
00:27:46.680
The one that we currently use
link |
00:27:48.800
mixes both spatial sort of images
link |
00:27:51.440
that you would process from the game
link |
00:27:53.840
that is the zoomed out version of the map
link |
00:27:56.440
and also a zoomed in version of the camera
link |
00:27:58.960
or the screen as we call it.
link |
00:28:00.880
But also we give to the agent the list of units
link |
00:28:04.840
that it sees, more as a set of objects
link |
00:28:09.000
that it can operate on.
link |
00:28:11.040
That is not necessarily required to use it
link |
00:28:14.760
and we have versions of the agent that play well
link |
00:28:16.840
without this set vision that is a bit
link |
00:28:19.080
not like how humans perceive the game
link |
00:28:21.680
but it certainly helps a lot
link |
00:28:23.640
because it's a very natural way
link |
00:28:25.040
to encode the game, just looking at all the units,
link |
00:28:28.480
which have properties like health, position,
link |
00:28:32.960
type of unit, whether it's my unit or the enemy's
link |
00:28:36.200
and that sort of is kind of the summary
link |
00:28:40.800
of the state of the game,
link |
00:28:42.880
that list of units or set of units
link |
00:28:45.520
that you see all the time.
link |
00:28:47.400
But that's pretty close to the way humans see the game.
link |
00:28:49.600
Why do you say it's not,
link |
00:28:51.440
you're saying the exactness of it
link |
00:28:53.240
is not similar to humans?
link |
00:28:55.080
The exactness of it is perhaps not the problem.
link |
00:28:57.200
I guess maybe the problem if you look at it
link |
00:28:59.840
from how actually humans play the game
link |
00:29:02.320
is that they play with a mouse and a keyboard and a screen
link |
00:29:05.760
and they don't see sort of a structured object
link |
00:29:08.760
with all the units,
link |
00:29:09.600
what they see is what they see on the screen, right?
link |
00:29:12.240
So you remember that there's a certain interrupt,
link |
00:29:14.400
there's a plot that you showed with a camera-based interface
link |
00:29:17.000
where you do exactly that, right, you move around
link |
00:29:19.680
and that seems to converge to similar performance.
link |
00:29:22.280
Yeah, I think that's what we're kind of experimenting
link |
00:29:24.760
with what's necessary or not, but using the set.
link |
00:29:28.720
So actually if you look at research in computer vision
link |
00:29:32.360
where it makes a lot of sense to treat images
link |
00:29:36.000
as two dimensional arrays,
link |
00:29:38.160
there's actually a very nice paper from Facebook.
link |
00:29:40.360
I think, I forgot who the authors are,
link |
00:29:42.720
but I think it's part of Kaiming He's group.
link |
00:29:46.360
And what they do is they take an image,
link |
00:29:49.520
which is this two dimensional signal
link |
00:29:51.960
and they actually take pixel by pixel
link |
00:29:54.320
and scramble the image as if it was just a list of pixels.
link |
00:29:59.160
Crucially, they encode the position of the pixels
link |
00:30:01.800
with the XY coordinates.
link |
00:30:03.720
And this is just kind of a new architecture
link |
00:30:06.160
which we incidentally also use in StarCraft
link |
00:30:08.520
called the transformer,
link |
00:30:09.880
which is a very popular paper from last year,
link |
00:30:12.000
which yielded very nice results in machine translation.
link |
00:30:15.600
And if you actually believe in this kind of,
link |
00:30:18.040
oh, it's actually a set of pixels
link |
00:30:20.320
as long as you encode XY, it's okay.
link |
00:30:22.520
Then you could argue that the list of units
link |
00:30:25.560
that we see is precisely that
link |
00:30:26.960
because we have each unit as a kind of pixel, if you will,
link |
00:30:31.480
and then their XY coordinates.
link |
00:30:33.240
So in that perspective, without knowing it,
link |
00:30:36.360
we use the same architecture
link |
00:30:37.680
that was shown to work very well
link |
00:30:39.680
on Pascal and ImageNet and so on.
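The idea of treating the unit list as a set of "pixels" that carry their own coordinates can be sketched roughly as follows. The `Unit` fields mirror the properties mentioned in the conversation (health, position, type, owner), but the class, the feature layout, and the sum pooling are assumptions made for illustration; the agent described here uses a transformer over the set, not a plain sum.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    unit_type: int  # hypothetical integer id for the kind of unit
    health: float
    x: float        # map coordinates are stored on the unit itself,
    y: float        # so the order of the list carries no information
    is_mine: bool


def encode_unit(unit: Unit) -> List[float]:
    """Turn one unit into a flat feature vector, position included."""
    return [float(unit.unit_type), unit.health, unit.x, unit.y,
            1.0 if unit.is_mine else 0.0]


def pool_units(units: List[Unit]) -> List[float]:
    """Sum the per-unit vectors; a sum does not depend on list order."""
    vectors = [encode_unit(u) for u in units]
    if not vectors:
        return [0.0] * 5
    return [sum(column) for column in zip(*vectors)]


units = [
    Unit(unit_type=1, health=100.0, x=3.0, y=4.0, is_mine=True),
    Unit(unit_type=2, health=40.0, x=10.0, y=2.0, is_mine=False),
    Unit(unit_type=1, health=80.0, x=5.0, y=5.0, is_mine=True),
]
summary = pool_units(units)
shuffled_summary = pool_units(list(reversed(units)))
```

Because each unit carries its own x and y, scrambling the list leaves the pooled summary unchanged, which is the property the scrambled-pixels argument relies on.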
link |
00:30:41.400
So the interesting thing here is putting it in that way,
link |
00:30:45.440
it starts to move it towards
link |
00:30:47.000
the way you usually work with language.
link |
00:30:49.480
So what, and especially with your expertise
link |
00:30:52.760
and work in language, it seems like there's echoes
link |
00:30:57.000
of a lot of the way you would work with natural language
link |
00:31:00.720
in the way you've approached AlphaStar.
link |
00:31:02.440
Right, does that help
link |
00:31:05.080
with the long-term sequence modeling there somehow?
link |
00:31:08.200
Exactly, so now that we understand what an observation
link |
00:31:11.200
for a given time step is, we need to move on to say,
link |
00:31:14.680
well, there's gonna be a sequence of such observations
link |
00:31:17.760
and an agent will need to, given all that it's seen,
link |
00:31:21.120
not only the current time step, but all that it's seen,
link |
00:31:23.720
why, because there is partial observability.
link |
00:31:25.920
We must remember whether we saw a worker
link |
00:31:28.400
going somewhere, for instance, right?
link |
00:31:30.120
Because then there might be an expansion
link |
00:31:31.720
on the top right of the map.
link |
00:31:33.640
So given that, what you must then think about
link |
00:31:37.840
is there is the problem of, given all the observations,
link |
00:31:40.400
you have to predict the next action.
link |
00:31:42.640
And not only given all the observations,
link |
00:31:44.520
but given all the observations
link |
00:31:45.920
and given all the actions you've taken,
link |
00:31:47.920
predict the next action.
link |
00:31:49.360
And that sounds exactly like machine translation,
link |
00:31:52.480
where, and that's exactly how kind of I saw the problem,
link |
00:31:57.160
especially when you are given supervised data
link |
00:31:59.960
or replays from humans,
link |
00:32:01.760
because the problem is exactly the same.
link |
00:32:03.600
You're translating essentially a prefix
link |
00:32:06.680
of observations and actions
link |
00:32:08.240
onto what's gonna happen next,
link |
00:32:10.160
which is exactly how you would train a model to translate
link |
00:32:13.000
or to generate language as well, right?
link |
00:32:14.760
You have a certain prefix.
link |
00:32:16.640
You must remember everything that comes in the past,
link |
00:32:19.000
because otherwise,
link |
00:32:20.080
you might start having non coherent text.
link |
00:32:22.640
And the same architectures,
link |
00:32:25.120
we're using LSTMs and transformers
link |
00:32:27.760
to operate on across time
link |
00:32:29.760
to kind of integrate all that's happened in the past.
link |
00:32:33.080
Those architectures that work so well
link |
00:32:35.000
in translation or language modeling
link |
00:32:36.880
are exactly the same as what the agent is using
link |
00:32:40.640
to issue actions in the game.
link |
00:32:42.360
And the way we train it, moreover,
link |
00:32:43.880
for imitation, which is step one of AlphaStar, is to
link |
00:32:47.120
take all the human experience and try to imitate it,
link |
00:32:49.880
much like you try to imitate translators
link |
00:32:52.920
that translated many pairs of sentences
link |
00:32:55.360
from French to English say,
link |
00:32:57.280
that sort of principle applies exactly the same.
link |
00:33:00.200
It's almost the same code,
link |
00:33:02.760
except that instead of words,
link |
00:33:04.520
you have slightly more complicated objects,
link |
00:33:06.680
which are the observations
link |
00:33:08.280
and the actions are also a bit more complicated
link |
00:33:10.240
than a word.
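The "prefix to next action" framing can be sketched as a way of slicing a replay into supervised examples. The replay format below (a list of observation and action pairs, with strings as stand-ins) and the function name are hypothetical; this is a minimal illustration of the data layout, not the actual training code.

```python
from typing import Any, List, Tuple

Step = Tuple[Any, Any]                      # (observation, action) at one time step
Example = Tuple[List[Step], Any, Any]       # (history, current observation, target action)


def next_action_examples(replay: List[Step]) -> List[Example]:
    """Slice one replay into next-action prediction examples.

    For time step t the model sees every earlier (observation, action)
    pair plus the current observation, and must predict the action the
    human took at t. This mirrors next-word prediction on a text prefix.
    """
    examples: List[Example] = []
    for t, (observation, action) in enumerate(replay):
        history = replay[:t]
        examples.append((history, observation, action))
    return examples


# Toy replay with strings standing in for real observations and actions.
replay = [("obs0", "select_worker"), ("obs1", "move"), ("obs2", "build")]
examples = next_action_examples(replay)
```

Each example's target plays the role of the next word in translation or language modeling, with the history playing the role of the sentence prefix.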
link |
00:33:11.760
Is there a self play component then too?
link |
00:33:13.920
So once you run out of imitation?
link |
00:33:16.480
Right, so indeed you can bootstrap from human replays,
link |
00:33:22.240
but then the agents you get are actually not as good
link |
00:33:25.960
as the humans you imitated, right?
link |
00:33:28.160
So how do we imitate?
link |
00:33:30.440
Well, we take humans from 3000 MMR and higher.
link |
00:33:34.240
3000 MMR is just a metric of human skill.
link |
00:33:37.960
And 3000 MMR might be like the 50th percentile, right?
link |
00:33:41.880
So it's just average human.
link |
00:33:43.760
What's that?
link |
00:33:44.600
So maybe a quick pause.
link |
00:33:45.440
MMR is a ranking scale,
link |
00:33:47.760
the matchmaking rating for players.
link |
00:33:50.320
So it's 3000, I remember there's like a master
link |
00:33:52.320
and a grandmaster, what's 3000?
link |
00:33:54.120
So 3000 is pretty bad.
link |
00:33:56.720
I think it's kind of gold level.
link |
00:33:58.440
It just sounds really good relative to chess, I think.
link |
00:34:00.680
Oh yeah, yeah, no, the ratings,
link |
00:34:02.440
the best in the world are at 7000 MMR.
link |
00:34:04.480
7000.
link |
00:34:05.320
So 3000, it's a bit like Elo indeed, right?
link |
00:34:07.840
So 3500 just allows us to not filter a lot of the data.
link |
00:34:13.200
So we like to have a lot of data in deep learning
link |
00:34:15.680
as you probably know.
link |
00:34:17.320
So we take these kind of 3500 and above,
link |
00:34:20.640
but then we do a very interesting trick,
link |
00:34:22.720
which is we tell the neural network
link |
00:34:25.000
what level they are imitating.
link |
00:34:27.560
So we say this replay you're gonna try to imitate
link |
00:34:30.800
to predict the next action for all the actions
link |
00:34:33.040
that you're gonna see is a 4000 MMR replay.
link |
00:34:36.120
This one is a 6000 MMR replay.
link |
00:34:38.800
And what's cool about this is then we take this policy
link |
00:34:42.480
that is being trained from humans
link |
00:34:44.280
and then we can ask it to play like a 3000 MMR player
link |
00:34:47.400
by setting a bit saying, well, okay,
link |
00:34:49.600
play like a 3000 MMR player or play like a 6000 MMR player.
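One simple way to picture this conditioning trick is to attach the replay's skill level to every observation as an extra input feature. The encoding below (a single scalar normalized by 7000, the top rating mentioned in this conversation) is an assumption for illustration; the conversation only says that the network is told the level, not how it is encoded.

```python
from typing import List


def condition_on_mmr(observation: List[float], mmr: float,
                     max_mmr: float = 7000.0) -> List[float]:
    """Append a normalized skill-level feature to an observation vector.

    During imitation, mmr would be the rating of the replay being imitated.
    At play time, the same slot can be set by hand to request a skill level.
    """
    level = min(max(mmr / max_mmr, 0.0), 1.0)  # clamp into [0, 1]
    return observation + [level]


# Hypothetical two-feature observation, conditioned on two skill levels.
observation = [0.5, 1.0]
as_mid_level = condition_on_mmr(observation, 3500.0)
as_top_level = condition_on_mmr(observation, 7000.0)
```

The same observation then maps to different inputs depending on the requested level, which is what lets a single policy be asked to play at different skill levels.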
link |
00:34:53.680
And you actually see how the policy behaves differently.
link |
00:34:57.320
It gets worse economy if you play like a gold level player.
link |
00:35:01.520
It does fewer actions per minute,
link |
00:35:03.000
which is the number of clicks or number of actions
link |
00:35:05.320
that you will issue in a whole minute.
link |
00:35:07.760
And it's very interesting to see
link |
00:35:09.200
that it kind of imitates the skill level quite well.
link |
00:35:12.360
But if we ask it to play like a 6000 MMR player,
link |
00:35:15.480
we tested of course these policies to see how well they do.
link |
00:35:18.600
They actually beat all the built in AIs
link |
00:35:20.600
that Blizzard put in the game,
link |
00:35:22.400
but they're nowhere near 6000 MMR players, right?
link |
00:35:24.960
They might be maybe around gold level, platinum perhaps.
link |
00:35:29.240
So there's still a lot of work to be done for the policy
link |
00:35:32.200
to truly understand what it means to win.
link |
00:35:34.960
So far we only ask them, okay, here is the screen
link |
00:35:38.160
and that's what's happened on the game until this point.
link |
00:35:41.600
What would the next action be if we ask a pro to now say,
link |
00:35:46.080
oh, you're gonna click here or here or there?
link |
00:35:49.120
And the point is experiencing wins and losses
link |
00:35:53.680
is very important to then start to refine.
link |
00:35:56.320
Otherwise the policy can get loose,
link |
00:35:58.360
can just go off policy as we call it.
link |
00:36:00.440
That's so interesting that you can at least hope eventually
link |
00:36:03.400
to be able to control a policy
link |
00:36:06.760
approximately to be at some MMR level.
link |
00:36:09.960
That's so interesting, especially given
link |
00:36:12.240
that you have ground truth for a lot of these cases.
link |
00:36:15.000
Can I ask you a personal question?
link |
00:36:17.520
What's your MMR?
link |
00:36:19.200
Well, I haven't played StarCraft 2, so I am unranked,
link |
00:36:23.600
which is the kind of lowest league.
link |
00:36:25.360
Okay.
link |
00:36:26.200
So I used to play StarCraft 1.
link |
00:36:28.360
The first one and...
link |
00:36:29.560
But you haven't seriously played StarCraft 2?
link |
00:36:31.280
No, not StarCraft 2.
link |
00:36:32.640
So the best player we have at DeepMind is about 5,000 MMR,
link |
00:36:37.720
which is high masters.
link |
00:36:39.560
It's not at the Grand Master level.
link |
00:36:42.040
Grand Master level would be the top 200 players
link |
00:36:44.640
in a certain region, like Europe or America or Asia.
link |
00:36:49.120
But for me, it would be hard to say.
link |
00:36:51.560
I am very bad at the game.
link |
00:36:53.680
I actually played AlphaStar a bit too late and it beat me.
link |
00:36:56.600
I remember the whole team was, oh, Oriol, you should play.
link |
00:36:59.720
And I was, oh, it looks like it's not so good yet.
link |
00:37:02.160
And then I remember I kind of got busy and waited an extra week
link |
00:37:06.600
and I played and it really beat me very badly.
link |
00:37:09.640
How did that feel?
link |
00:37:11.160
Isn't that an amazing feeling?
link |
00:37:12.600
That's amazing, yeah.
link |
00:37:13.560
I mean, obviously, I tried my best and I tried to also impress
link |
00:37:17.920
because I actually played the first game,
link |
00:37:19.720
so I'm still pretty good at micromanagement.
link |
00:37:23.040
The problem is I just don't understand StarCraft 2.
link |
00:37:25.200
I understand StarCraft.
link |
00:37:27.200
And when I played StarCraft,
link |
00:37:28.440
I probably was consistently like for a couple of years,
link |
00:37:32.640
top 32 in Europe.
link |
00:37:34.600
So I was decent, but at the time,
link |
00:37:36.440
we didn't have this kind of MMR system as well established.
link |
00:37:40.280
So it would be hard to know what it was back then.
link |
00:37:43.120
So what's the difference in interface between AlphaStar
link |
00:37:45.760
and StarCraft and a human player in StarCraft?
link |
00:37:49.600
Are there any significant differences
link |
00:37:52.000
between the way they both see the game?
link |
00:37:54.080
I would say the way they see the game,
link |
00:37:55.960
there's a few things that are just very hard to simulate.
link |
00:38:01.000
The main one, perhaps,
link |
00:38:02.640
which is obvious in hindsight,
link |
00:38:05.200
is what's called cloaked units,
link |
00:38:08.440
which are invisible units.
link |
00:38:10.560
So in StarCraft, you can make some units
link |
00:38:13.240
that you need to have a particular kind of unit to detect it.
link |
00:38:18.040
So these units are invisible.
link |
00:38:20.560
If you cannot detect them, you cannot target them.
link |
00:38:22.720
So they would just destroy your buildings
link |
00:38:25.720
or kill your workers.
link |
00:38:27.720
But despite the fact you cannot target the unit,
link |
00:38:31.640
there's a shimmer that as a human you observe.
link |
00:38:34.600
I mean, you need to train a little bit.
link |
00:38:35.920
You need to pay attention,
link |
00:38:37.400
but you would see this kind of space time distortion
link |
00:38:41.840
and you would know, okay, there are, yeah.
link |
00:38:44.800
Yeah, there's like a wave thing.
link |
00:38:46.040
Yeah, it's called shimmer.
link |
00:38:47.840
Space time distortion, I like it.
link |
00:38:49.120
That's really like the Blizzard term is shimmer.
link |
00:38:52.440
And so this shimmer, professional players actually
link |
00:38:56.040
can see it immediately.
link |
00:38:57.160
They understand it very well,
link |
00:38:59.480
but it's still something that requires
link |
00:39:01.400
a certain amount of attention
link |
00:39:02.720
and it's kind of a bit annoying to deal with.
link |
00:39:05.640
Whereas for AlphaStar, in terms of vision,
link |
00:39:08.640
it's very hard for us to simulate sort of,
link |
00:39:11.120
oh, are you looking at this pixel in the screen and so on?
link |
00:39:14.160
So the only thing we can do is say,
link |
00:39:17.520
there is a unit that's invisible over there.
link |
00:39:19.720
So AlphaStar would know that immediately.
link |
00:39:22.520
Obviously it still obeys the rules.
link |
00:39:24.040
You cannot attack the unit.
link |
00:39:25.200
You must have a detector and so on,
link |
00:39:27.400
but it's kind of one of the main things
link |
00:39:29.320
that it just doesn't feel there's a very proper way.
link |
00:39:32.680
I mean, you could imagine, oh, you don't have hypers.
link |
00:39:35.480
Maybe you don't know exactly what it is,
link |
00:39:36.960
or sometimes you see it, sometimes you don't.
link |
00:39:39.240
But it's just really, really complicated to get it
link |
00:39:43.040
so that everyone would agree,
link |
00:39:44.280
oh, that's the best way to simulate this, right?
link |
00:39:47.120
You know, it seems like a perception problem.
link |
00:39:49.280
It is a perception problem.
link |
00:39:50.600
So the only problem is people, you ask,
link |
00:39:54.240
oh, what's the difference between how humans perceive the game?
link |
00:39:56.760
I would say they wouldn't be able to tell a shimmer
link |
00:39:59.960
immediately as it appears on the screen,
link |
00:40:02.240
whereas AlphaStar, in principle,
link |
00:40:04.320
sees it very sharply, right?
link |
00:40:05.640
It sees that the bit turned from zero to one,
link |
00:40:08.680
meaning there's now a unit there,
link |
00:40:10.520
although you don't know the unit,
link |
00:40:11.960
or you know that you cannot attack it and so on.
link |
00:40:15.840
So from a vision standpoint,
link |
00:40:18.080
that probably is the one that is kind of the most obvious one.
link |
00:40:23.000
Then there are things humans cannot do perfectly,
link |
00:40:25.200
even professionals, which is they might miss a detail
link |
00:40:28.120
or they might have not seen a unit.
link |
00:40:30.640
And obviously, as a computer,
link |
00:40:32.320
if there's a corner of the screen that turns green
link |
00:40:35.040
because a unit enters the field of view,
link |
00:40:37.720
that can go into the memory of the agent, the LSTM,
link |
00:40:41.120
and persist there for a while,
link |
00:40:42.560
and for however long is relevant, right?
link |
00:40:45.720
And in terms of action, it seems like the rate of action
link |
00:40:49.920
from AlphaStar is comparable to,
link |
00:40:51.640
if not slower than professional players,
link |
00:40:54.280
but it's more precise is what I heard.
link |
00:40:57.160
So that's really probably the one
link |
00:40:59.760
that is causing us more issues for a couple of reasons, right?
link |
00:41:05.040
The first one is StarCraft has been an AI environment
link |
00:41:08.440
for quite a few years.
link |
00:41:09.960
In fact, I was participating in the very first competition
link |
00:41:14.000
back in 2010.
link |
00:41:15.920
And there's really not been a kind of a very clear set
link |
00:41:19.920
of rules, how the actions per minute,
link |
00:41:22.320
the rate of actions that you can issue is.
link |
00:41:24.720
And as a result, these agents or bots that people build
link |
00:41:29.280
in a kind of almost very cool way,
link |
00:41:31.080
they do like 20,000, 40,000 actions per minute.
link |
00:41:35.400
Now, to put this in perspective,
link |
00:41:37.200
a very good professional human might do 300
link |
00:41:41.640
to 800 actions per minute, they might not be as precise.
link |
00:41:45.480
That's why the range is a bit tricky to identify exactly.
link |
00:41:49.040
I mean, 300 actions per minute precisely
link |
00:41:51.640
is probably realistic, 800 is probably not,
link |
00:41:54.600
but you see humans doing a lot of actions
link |
00:41:57.000
because they warm up and they kind of select things
link |
00:41:59.480
and spam and so on, just so that when they need,
link |
00:42:02.240
they have the accuracy.
link |
00:42:04.240
So we came into this by not having kind of a standard way
link |
00:42:09.240
to say, well, how do we measure whether an agent
link |
00:42:12.480
is at human level or not?
link |
00:42:15.400
On the other hand, we had a huge advantage,
link |
00:42:17.760
which is because we do imitation learning,
link |
00:42:20.960
agents turned out to act like humans
link |
00:42:24.040
in terms of rate of actions, even precision
link |
00:42:26.520
and imprecision of actions.
link |
00:42:28.320
In the supervised policy, you could see all these.
link |
00:42:30.600
You could see how agents like to spam click to move here.
link |
00:42:34.280
If you played, especially Diablo, you would know what I mean.
link |
00:42:36.440
I mean, you just like spam, oh, move here, move here, move here.
link |
00:42:39.840
You're doing literally like maybe five actions in two seconds,
link |
00:42:43.680
but these actions are not very meaningful.
link |
00:42:46.360
One would have sufficed.
link |
00:42:48.280
So on the one hand, we start from this imitation policy
link |
00:42:51.640
that is in the ballpark of the actions per minute of humans
link |
00:42:55.160
because it's actually statistically trying to imitate humans.
link |
00:42:58.440
So we see this very nicely in the curves
link |
00:43:00.560
that we showed in the blog post.
link |
00:43:01.840
Like there's these actions per minute
link |
00:43:04.120
and the distribution looks very human like.
link |
00:43:07.240
But then of course, as self play kicks in,
link |
00:43:10.560
and that's the part we haven't talked about too much yet,
link |
00:43:12.800
but of course the agent must play against itself to improve,
link |
00:43:16.760
then there's almost no guarantees
link |
00:43:19.200
that these actions will not become more precise
link |
00:43:22.000
or that the rate of actions is not gonna increase over time.
link |
00:43:25.600
So what we did, and this is probably kind of the first attempt
link |
00:43:29.440
that we thought was reasonable,
link |
00:43:30.760
is we looked at the distribution of actions for humans
link |
00:43:33.800
for certain windows of time.
link |
00:43:36.440
And just to give a perspective,
link |
00:43:37.800
because I guess I mentioned that some of these agents
link |
00:43:40.760
that are programmatic, let's call them,
link |
00:43:42.360
they do 40,000 actions per minute.
link |
00:43:44.640
Professionals, as I said, do 300 to 800.
link |
00:43:47.400
So what we did is we looked at the distribution
link |
00:43:49.440
over professional gamers
link |
00:43:51.000
and we took reasonably high actions per minute,
link |
00:43:54.160
but we kind of identified certain cutoffs
link |
00:43:57.520
after which even if the agent wanted to act,
link |
00:44:00.600
these actions would be dropped.
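A cutoff of this kind can be pictured as a sliding-window rate limiter: the agent may request any number of actions, but requests beyond the cap inside a time window are simply discarded. The class name, the window length, and the cap below are hypothetical; the actual cutoffs used for AlphaStar are not stated in this conversation.

```python
from collections import deque
from typing import Deque


class ActionRateLimiter:
    """Drop actions that exceed max_actions within a sliding time window."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._accepted: Deque[float] = deque()  # timestamps of accepted actions

    def allow(self, timestamp: float) -> bool:
        """Return True if the action at this time is executed, False if dropped."""
        # Forget accepted actions that have slid out of the window.
        while self._accepted and timestamp - self._accepted[0] >= self.window_seconds:
            self._accepted.popleft()
        if len(self._accepted) < self.max_actions:
            self._accepted.append(timestamp)
            return True
        return False  # over the cap: the action is dropped, not queued


# Hypothetical cap: at most 2 actions in any 1-second window.
limiter = ActionRateLimiter(max_actions=2, window_seconds=1.0)
request_times = [0.0, 0.1, 0.2, 1.0, 1.05, 1.2]
outcomes = [limiter.allow(t) for t in request_times]
```

Dropped actions are discarded rather than delayed, matching the description that actions beyond the cutoff would be dropped even if the agent wanted to act.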
link |
00:44:02.000
But the problem is this cutoff is probably set a bit too high
link |
00:44:06.440
and what ends up happening, even though the games,
link |
00:44:09.360
and when we ask the professionals and the gamers,
link |
00:44:11.400
by and large, they feel like it's playing human like.
link |
00:44:15.040
There are some agents that developed
link |
00:44:18.440
maybe slightly too high APMs,
link |
00:44:22.040
which is actions per minute,
link |
00:44:23.520
combined with the precision,
link |
00:44:25.680
which made people sort of start discussing
link |
00:44:28.720
a very interesting issue,
link |
00:44:29.680
which is should we have limited
link |
00:44:31.200
this, or should we just let it loose
link |
00:44:34.000
and see what cool things it can come up with, right?
link |
00:44:37.520
Interesting.
link |
00:44:38.360
So this is in itself an extremely interesting question,
link |
00:44:41.960
but the same way that modeling the shimmer
link |
00:44:43.920
would be so difficult,
link |
00:44:45.440
modeling absolutely all the details about muscles
link |
00:44:48.800
and precision and tiredness of humans
link |
00:44:51.600
would be quite difficult, right?
link |
00:44:52.880
So we're really here in kind of innovating in this sense
link |
00:44:56.760
of, okay, what could be maybe the next iteration
link |
00:45:00.040
of putting more rules
link |
00:45:01.720
that makes the agents more human like
link |
00:45:05.040
in terms of restrictions.
link |
00:45:06.240
So yeah, putting constraints that.
link |
00:45:08.040
More constraints, yeah.
link |
00:45:09.240
That's really interesting, that's really innovative.
link |
00:45:11.040
So one of the constraints you put on yourself
link |
00:45:15.400
or at least focused on is the Protoss race,
link |
00:45:18.000
as far as I understand.
link |
00:45:19.880
Can you tell me about the different races
link |
00:45:21.880
and how they, so Protoss, Terran and Zerg,
link |
00:45:26.000
how do they compare?
link |
00:45:27.040
How do they interact?
link |
00:45:28.120
Why did you choose Protoss?
link |
00:45:29.960
In the dynamics of the game
link |
00:45:33.600
seen from a strategic perspective.
link |
00:45:35.680
So Protoss, so in StarCraft, there are three races.
link |
00:45:39.680
Indeed, in the demonstration,
link |
00:45:41.320
we saw only the Protoss race.
link |
00:45:43.880
So maybe let's start with that one.
link |
00:45:45.560
Protoss is kind of the most technologically advanced race.
link |
00:45:49.440
It has units that are expensive, but powerful, right?
link |
00:45:53.800
So in general, you wanna kind of conserve your units
link |
00:45:57.840
as you go attack.
link |
00:45:58.640
So you wanna, and then you wanna utilize
link |
00:46:01.840
these tactical advantages of very fancy spells
link |
00:46:05.120
and so on, so forth.
link |
00:46:07.240
And at the same time, they're kind of,
link |
00:46:11.280
people say like they're a bit easier to play perhaps, right?
link |
00:46:15.040
But that I actually didn't know.
link |
00:46:17.160
I mean, I've just talked a lot now to the players
link |
00:46:20.120
that we work with, TLO and MaNa.
link |
00:46:22.520
And they said, oh yeah, Protoss is actually,
link |
00:46:24.120
people think is actually one of the easiest races.
link |
00:46:26.320
So perhaps the easier, that doesn't mean
link |
00:46:29.360
that it's obviously professional players
link |
00:46:32.760
excel at the three races.
link |
00:46:34.080
And there's never like a race that dominates
link |
00:46:37.600
for a very long time anyway.
link |
00:46:38.800
So if you look at the top, I don't know, 100 in the world,
link |
00:46:41.760
is there one race that dominates that list?
link |
00:46:44.360
It would be hard to know because it depends on the regions.
link |
00:46:46.840
I think it's pretty equal in terms of distribution.
link |
00:46:50.600
And Blizzard wants it to be equal, right?
link |
00:46:52.840
They wouldn't want one race like Protoss
link |
00:46:56.320
to not be represented in the top places.
link |
00:46:59.960
So definitely, like, they try to keep it balanced, right?
link |
00:47:03.880
So then maybe the opposite race of Protoss is Zerg.
link |
00:47:07.320
Zerg is a race where you just kind of expand
link |
00:47:10.600
and take over as many resources as you can.
link |
00:47:13.840
And they have a very high capacity
link |
00:47:15.720
to regenerate their units.
link |
00:47:17.680
So if you have an army, it's not that valuable
link |
00:47:20.520
in terms of losing the whole army is not a big deal as Zerg
link |
00:47:23.960
because you can then rebuild it.
link |
00:47:25.960
And given that you generally accumulate
link |
00:47:28.320
a huge bank of resources, Zergs typically play
link |
00:47:32.000
by applying a lot of pressure,
link |
00:47:34.240
maybe losing their whole army,
link |
00:47:36.160
but then rebuilding it quickly.
link |
00:47:37.880
So although of course every race,
link |
00:47:40.480
I mean, there's never, I mean, they're pretty diverse.
link |
00:47:43.960
I mean, there are some units in Zerg
link |
00:47:45.160
that are technologically advanced
link |
00:47:46.600
and they do some very interesting spells.
link |
00:47:48.880
And there's some units in Protoss that are less valuable
link |
00:47:51.360
and you could lose a lot of them and rebuild them
link |
00:47:53.360
and it wouldn't be a big deal.
link |
00:47:55.160
All right, so maybe I'm missing out.
link |
00:47:57.840
Maybe I'm gonna say some dumb stuff,
link |
00:47:59.280
but just summary of strategy.
link |
00:48:02.480
So first there's collection of a lot of resources.
link |
00:48:05.720
That's one option.
link |
00:48:06.560
The other one is expanding, so building other bases.
link |
00:48:11.920
Then the other is obviously building units
link |
00:48:15.640
and attacking with those units.
link |
00:48:17.200
And then I don't know what else there is.
link |
00:48:20.640
Maybe there's the different timing of attacks.
link |
00:48:24.080
Like, do you attack early, attack late?
link |
00:48:26.000
What are the different strategies that emerged
link |
00:48:28.000
that you've learned about?
link |
00:48:29.120
I've read that a bunch of people are super happy
link |
00:48:31.360
that you guys have apparently,
link |
00:48:33.000
that AlphaStar apparently has discovered
link |
00:48:35.000
that it's really good to, what is it, saturate.
link |
00:48:38.000
Oh yeah, the mineral line.
link |
00:48:39.600
Yeah, the mineral line.
link |
00:48:41.360
Yeah, yeah.
link |
00:48:42.200
And that's for greedy amateur players like myself.
link |
00:48:45.600
That's always been a good strategy.
link |
00:48:47.480
You just build up a lot of money
link |
00:48:49.000
and it just feels good to just accumulate and accumulate.
link |
00:48:53.280
So thank you for discovering that and validating all of us.
link |
00:48:56.720
But are there other strategies that you discovered
link |
00:48:59.200
interesting and unique to this game?
link |
00:49:01.880
Yeah, so if you look at the kind of,
link |
00:49:05.080
not being a StarCraft 2 player,
link |
00:49:06.480
but of course StarCraft and StarCraft 2
link |
00:49:08.120
and real time strategy games in general are very similar.
link |
00:49:11.120
I would classify perhaps the openings of the game.
link |
00:49:17.560
They're very important.
link |
00:49:18.760
And generally I would say there's two kinds of openings.
link |
00:49:21.760
One that's a standard opening,
link |
00:49:23.400
that's generally how players find sort of a balance
link |
00:49:28.840
between risk and economy
link |
00:49:31.520
and building some units early on
link |
00:49:33.400
so that they could defend,
link |
00:49:34.600
but they're not too exposed basically,
link |
00:49:36.800
but also expanding quite quickly.
link |
00:49:39.480
So this would be kind of a standard opening.
link |
00:49:42.040
And within a standard opening,
link |
00:49:43.680
then what you do choose generally
link |
00:49:45.480
is what technology are you aiming towards?
link |
00:49:48.400
So there's a bit of rock, paper, scissors
link |
00:49:50.280
of you could go for spaceships
link |
00:49:52.920
or you could go for invisible units
link |
00:49:55.080
or you could go for, I don't know,
link |
00:49:56.400
like massive units that attack against certain kinds of units
link |
00:50:00.080
but they're weak against others.
link |
00:50:01.640
So standard openings themselves have some choices
link |
00:50:05.760
like rock, paper, scissors style.
link |
00:50:07.480
Of course, if you scout and you're good at guessing
link |
00:50:09.640
what the opponent is doing,
link |
00:50:11.080
then you can play at an advantage
link |
00:50:12.800
because if I know you're gonna play rock,
link |
00:50:14.480
I mean, I'm gonna play paper obviously.
link |
00:50:16.480
So you can imagine that normal standard games
link |
00:50:19.120
in StarCraft look like a continuous rock, paper,
link |
00:50:22.920
scissors game where you guess what the distribution
link |
00:50:26.600
of rock, paper, and scissors is from the enemy
link |
00:50:29.920
and reacting accordingly to try to beat it
link |
00:50:33.360
or put the paper out before he kind of changes
link |
00:50:37.000
his mind from rock to scissors
link |
00:50:38.880
and then you would be in a weak position.
link |
00:50:40.480
So sorry to pause on that.
link |
00:50:42.120
I didn't realize this element
link |
00:50:43.320
because I know it's true with poker.
link |
00:50:44.880
I looked at Libratus.
link |
00:50:47.640
You're also estimating, trying to guess the distribution,
link |
00:50:52.200
trying to better and better estimate the distribution
link |
00:50:54.160
what the opponent is likely to be doing.
link |
00:50:56.040
Yeah, I mean, as a player,
link |
00:50:57.440
you definitely wanna have a belief state
link |
00:50:59.840
over what's up on the other side of the map
link |
00:51:03.000
and when your belief state becomes inaccurate
link |
00:51:05.600
when you start having serious doubts
link |
00:51:08.040
whether he's gonna play something that you must know,
link |
00:51:11.320
that's when you scout.
link |
00:51:12.440
You wanna then gather information, right?
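The scouting rule described here can be sketched as a small belief tracker. This is only an illustration under stated assumptions: the strategy labels, the add-one prior, and the 1.2 bit entropy threshold are all made up for the example, and nothing in it is taken from AlphaStar itself.

```python
import math
from collections import Counter

# Hypothetical labels for what the opponent might be building.
STRATEGIES = ("air", "cloaked", "ground")


def belief(observations, prior=1.0):
    """Turn scouted observations into a probability for each strategy.

    `prior` is an assumed add-one style pseudo-count, so that strategies
    never seen so far still keep a non-zero probability.
    """
    counts = Counter(observations)
    total = len(observations) + prior * len(STRATEGIES)
    return {s: (counts[s] + prior) / total for s in STRATEGIES}


def entropy(dist):
    """Shannon entropy in bits; higher means the belief is less certain."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def should_scout(dist, threshold=1.2):
    """Scout when the belief is too uncertain (threshold is an assumption)."""
    return entropy(dist) > threshold


if __name__ == "__main__":
    print(should_scout(belief([])))             # nothing seen yet
    print(should_scout(belief(["air"] * 8)))    # eight consistent sightings
```

With no observations the belief is uniform and the rule says to scout; after several consistent sightings the entropy drops below the threshold and it does not.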
link |
00:51:14.560
Is improving the accuracy of the belief
link |
00:51:16.440
or improving the belief state part of the loss
link |
00:51:19.880
that you're trying to optimize?
link |
00:51:21.040
Or is it just a side effect?
link |
00:51:22.720
It's implicit, but implicit.
link |
00:51:24.040
You could explicitly model it
link |
00:51:25.840
and it would be quite good at probably predicting
link |
00:51:28.280
what's on the other side of the map,
link |
00:51:30.360
but so far it's all implicit.
link |
00:51:32.880
There's no additional reward for predicting the enemy.
link |
00:51:36.680
So there's these standard openings
link |
00:51:38.800
and then there's what people call cheese,
link |
00:51:41.640
which is very interesting
link |
00:51:42.800
and AlphaStar sometimes really likes this kind of cheese.
link |
00:51:46.760
These cheeses, what they are is kind of an all in strategy.
link |
00:51:51.120
You're gonna do something sneaky.
link |
00:51:53.240
You're gonna hide your own buildings
link |
00:51:56.680
close to the enemy base
link |
00:51:58.200
or you're gonna go for hiding your technological buildings
link |
00:52:01.600
so that you do invisible units
link |
00:52:03.040
and the enemy just cannot react to detect it
link |
00:52:06.040
and thus lose the game.
link |
00:52:08.000
And there's quite a few of these cheeses
link |
00:52:10.000
and variants of them.
link |
00:52:11.800
And there it's where actually the belief state
link |
00:52:14.480
becomes even more important
link |
00:52:16.360
because if I scout your base
link |
00:52:18.520
and I see no buildings at all,
link |
00:52:20.200
any human player knows something's up.
link |
00:52:22.480
They might know, well,
link |
00:52:23.320
you're hiding something close to my base.
link |
00:52:25.640
Should I suddenly build a lot of units to defend?
link |
00:52:28.200
Should I actually block my ramp with workers
link |
00:52:31.000
so that you cannot come and destroy my base?
link |
00:52:33.520
So all this is happening
link |
00:52:35.680
and defending against cheeses is extremely important.
link |
00:52:39.440
And in the AlphaStar League,
link |
00:52:40.760
many agents actually develop some cheesy strategies.
link |
00:52:45.080
And in the games we saw against TLO and MaNa,
link |
00:52:48.000
two out of the 10 agents
link |
00:52:49.240
were actually doing these kind of strategies
link |
00:52:51.760
which are cheesy strategies.
link |
00:52:53.640
And then there's a variant of cheesy strategy
link |
00:52:55.600
which is called all in.
link |
00:52:57.360
So an all in strategy is not perhaps as drastic
link |
00:53:00.280
as oh, I'm gonna build cannons on your base
link |
00:53:02.560
and then bring all my workers
link |
00:53:03.880
and try to just disrupt your base and game over
link |
00:53:06.840
or GG as we say in StarCraft.
link |
00:53:09.840
There's these kind of very cool things
link |
00:53:12.000
that you can align precisely at a certain time mark.
link |
00:53:14.760
So for instance, you can generate
link |
00:53:17.400
exactly a 10 unit composition that is perfect.
link |
00:53:20.280
Like five of this type, five of this other type
link |
00:53:22.960
and align the upgrade so that at four minutes and a half,
link |
00:53:26.240
let's say you have these 10 units
link |
00:53:28.680
and the upgrade just finished.
link |
00:53:30.640
And at that point, that army is really scary.
link |
00:53:34.000
And unless the enemy really knows what's going on,
link |
00:53:36.440
if you push, you might then have an advantage
link |
00:53:40.280
because maybe the enemy is doing something more standard,
link |
00:53:42.480
it expanded too much, it developed too much economy
link |
00:53:45.800
and it traded off badly against having defenses
link |
00:53:49.760
and the enemy will lose.
link |
00:53:51.120
But it's called all in because if you don't win,
link |
00:53:53.680
then you're gonna lose.
link |
00:53:55.080
So you see players that do these kind of strategies,
link |
00:53:57.960
if they don't succeed, game is not over.
link |
00:54:00.000
I mean, they still have a base
link |
00:54:01.240
and they're still gathering minerals,
link |
00:54:02.880
but they will just GG out of the game
link |
00:54:04.800
because they know, well, game is over.
link |
00:54:06.800
I gambled and I failed.
link |
00:54:08.880
So if we start entering the game
link |
00:54:11.600
theoretic aspects of the game, it's really rich
link |
00:54:14.520
and that's why it also makes it quite entertaining to watch.
link |
00:54:18.000
Even if I don't play, I still enjoy watching the game.
link |
00:54:21.800
But the agents are trying to do this mostly implicitly,
link |
00:54:26.800
but one element that we improved in self play is
link |
00:54:29.320
creating the AlphaStar League.
link |
00:54:31.320
And the AlphaStar League is not pure self play.
link |
00:54:34.600
It's trying to create different personalities of agents
link |
00:54:37.880
so that some of them will become cheesy agents.
link |
00:54:41.480
Some of them might become very economical, very greedy,
link |
00:54:44.360
like getting all the resources,
link |
00:54:46.160
but then maybe early on they're gonna be weak,
link |
00:54:48.760
but later on they're gonna be very strong.
link |
00:54:51.040
And by creating this personality of agents,
link |
00:54:53.400
which sometimes it just happens naturally
link |
00:54:55.400
that you can see kind of an evolution of agents
link |
00:54:58.240
that given the previous generation,
link |
00:55:00.760
they train against all of them
link |
00:55:01.920
and then they generate kind of the perfect counter
link |
00:55:04.320
to that distribution.
link |
00:55:05.760
But these agents, you must have them in the populations
link |
00:55:09.280
because if you don't have them,
link |
00:55:11.280
you're not covered against these things, right?
link |
00:55:13.040
It's kind of, you wanna create all sorts of the opponents
link |
00:55:17.080
that you will find in the wild.
link |
00:55:18.640
So you can be exposed to these cheeses,
link |
00:55:21.800
early aggression, later aggression, more expansions,
link |
00:55:25.720
dropping units in your base from the side, all these things.
link |
00:55:29.560
And pure self play is getting a bit stuck
link |
00:55:32.760
at finding some subset of these, but not all of these.
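The contrast between pure self play and a league can be sketched with rock, paper, scissors, the same analogy used earlier in the conversation. This is a toy under loud assumptions: agents are single fixed moves, "training" is just picking a best response, and none of the names below come from AlphaStar. Pure self play counters only the latest agent, while the league version counters the whole population so far.

```python
MOVES = ["rock", "paper", "scissors"]
# BEATEN_BY[x] is the move that beats x.
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def payoff(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 on a draw."""
    if a == b:
        return 0
    return 1 if BEATEN_BY[b] == a else -1


def best_response(opponents):
    """Move with the highest total payoff against a list of opponents.

    Ties go to the earliest entry in MOVES.
    """
    return max(MOVES, key=lambda m: sum(payoff(m, o) for o in opponents))


def pure_self_play(steps):
    """Each new agent counters only the most recent agent."""
    agents = ["rock"]
    for _ in range(steps):
        agents.append(best_response(agents[-1:]))
    return agents


def league_play(steps):
    """Each new agent counters the mixture of all agents so far."""
    agents = ["rock"]
    for _ in range(steps):
        agents.append(best_response(agents))
    return agents


if __name__ == "__main__":
    print(pure_self_play(6))
    print(league_play(6))
```

In this toy, pure self play just chases its own tail through rock, paper, scissors, whereas each league agent has to do well against everything that came before it.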
link |
00:55:36.200
So the AlphaStar League is a way to kind of
link |
00:55:39.480
do an ensemble of agents
link |
00:55:41.560
that they're all playing in a league
link |
00:55:43.480
much like people play on Battle.net, right?
link |
00:55:45.520
They play, you play against someone
link |
00:55:47.440
who does a new cool strategy and you immediately,
link |
00:55:50.240
oh my God, I wanna try it, I wanna play again.
link |
00:55:53.040
And this to me was another critical part
link |
00:55:55.960
of the problem which was,
link |
00:55:58.520
can we create a Battle.net for agents?
link |
00:56:01.240
Yeah.
link |
00:56:02.080
And that's kind of what the AlphaStar League really.
link |
00:56:03.400
That's fascinating.
link |
00:56:04.240
And where they stick to their different strategies.
link |
00:56:06.920
Yeah, wow, that's really, really interesting.
link |
00:56:09.960
But that said, you were fortunate enough
link |
00:56:13.240
or just skilled enough to win 5-0.
link |
00:56:17.320
And so how hard is it to win?
link |
00:56:19.280
I mean, that's not the goal.
link |
00:56:20.320
I guess, I don't know what the goal is.
link |
00:56:21.920
The goal should be to win majority, not 5-0,
link |
00:56:25.400
but how hard is it in general to win all matchups?
link |
00:56:29.360
In 1v1.
link |
00:56:31.080
So that's a very interesting question
link |
00:56:33.600
because once you see AlphaStar
link |
00:56:37.240
and superficially you think, well, okay,
link |
00:56:39.520
it won, if you sum all the games like 10 to one, right?
link |
00:56:42.960
It lost the game that it played with the camera interface.
link |
00:56:46.280
You might think, well, that's done, right?
link |
00:56:48.480
It's super human at the game.
link |
00:56:50.840
And that's not really the claim we really can make, actually.
link |
00:56:56.000
The claim is we beat a professional gamer
link |
00:56:58.840
for the first time.
link |
00:57:00.120
StarCraft has really been a thing
link |
00:57:02.480
that has been going on for a few years,
link |
00:57:04.120
but a moment like this had not occurred before yet.
link |
00:57:09.520
But are these agents impossible to beat?
link |
00:57:12.400
Absolutely not, right?
link |
00:57:13.440
So that's a bit what's kind of the difference is
link |
00:57:17.360
the agents play at grandmaster level.
link |
00:57:19.560
They definitely understand the game enough
link |
00:57:21.520
to play extremely well, but are they unbeatable?
link |
00:57:24.960
Do they play perfect?
link |
00:57:27.920
No, and actually in StarCraft,
link |
00:57:30.320
because of these sneaky strategies,
link |
00:57:33.240
it's always possible that you might take a huge risk sometimes,
link |
00:57:36.680
but you might get wins, right, out of this.
link |
00:57:39.200
So I think as a domain, it still has a lot of opportunities,
link |
00:57:44.200
not only because of course we wanna learn with less experience,
link |
00:57:47.760
we would like to, I mean, if I learn to play Protoss,
link |
00:57:50.480
I can play Terran and learn it much quicker
link |
00:57:53.280
than AlphaStar can, right?
link |
00:57:54.480
So there are obvious interesting research challenges as well.
link |
00:57:58.440
But even as the raw performance goes,
link |
00:58:03.080
really the claim here can be we are at pro level
link |
00:58:05.960
or at high grandmaster level,
link |
00:58:09.080
but obviously the players also did not know what to expect,
link |
00:58:14.360
right, their prior distribution was a bit off
link |
00:58:17.000
because they played this kind of new alien brain
link |
00:58:20.400
as they like to say it, right?
link |
00:58:22.080
And that's what makes it exciting for them,
link |
00:58:25.080
but also I think if you look at the games closely,
link |
00:58:28.040
you see there were weaknesses in some points,
link |
00:58:31.520
maybe AlphaStar did not scout
link |
00:58:33.320
or if it had got invisible units going against it
link |
00:58:36.080
at certain points, it wouldn't have known
link |
00:58:38.200
and it would have been bad.
link |
00:58:39.600
So there's still quite a lot of work to do,
link |
00:58:42.920
but it's really a very exciting moment for us
link |
00:58:45.440
to be seeing, wow, a single neural net on a GPU
link |
00:58:49.120
is actually playing against these guys who are amazing.
link |
00:58:52.080
I mean, you have to see them play live.
link |
00:58:53.760
They're really, really amazing players.
link |
00:58:55.800
Yeah, I'm sure there must be a guy in Poland somewhere
link |
00:59:00.440
right now training his butt off
link |
00:59:02.680
to make sure that this never happens again with AlphaStar.
link |
00:59:06.600
So that's really exciting in terms of AlphaStar
link |
00:59:09.720
having some holes to exploit, which is great.
link |
00:59:12.200
And then you build on top of each other
link |
00:59:14.320
and it feels like StarCraft, unlike Go, even if you win,
link |
00:59:18.920
it's still not, it's still not,
link |
00:59:21.640
there's so many different dimensions
link |
00:59:23.120
in which you can explore.
link |
00:59:24.200
So that's really, really interesting.
link |
00:59:25.560
Do you think there's a ceiling to AlphaStar?
link |
00:59:28.520
You've said that it hasn't reached, this is a big,
link |
00:59:33.960
let me actually just pause for a second.
link |
00:59:35.520
How did it feel to come here to this point,
link |
00:59:40.200
to beat a top professional player?
link |
00:59:42.240
Like that night, I mean, you know,
link |
00:59:44.600
Olympic athletes have their gold medal, right?
link |
00:59:47.160
This is your gold medal in a sense.
link |
00:59:48.840
Sure, you're cited a lot,
link |
00:59:50.400
you've published a lot of prestigious papers, whatever,
link |
00:59:53.120
but this is like a win.
link |
00:59:55.280
How did it feel?
link |
00:59:56.480
I mean, it was, for me, it was unbelievable
link |
00:59:59.440
because first the win itself, I mean, it was so exciting.
link |
01:00:04.440
I mean, so looking back to those last days of 2018,
link |
01:00:09.840
really, that's when the games were played,
link |
01:00:12.040
I'm sure I'll look back at that moment and say,
link |
01:00:14.560
oh my God, I wanna be in a project like that.
link |
01:00:17.280
It's like, I already feel the nostalgia of like,
link |
01:00:20.440
yeah, that was huge in terms of the energy
link |
01:00:23.560
and the team effort that went into it.
link |
01:00:25.720
And so in that sense, as soon as it happened,
link |
01:00:28.520
I already knew it was kind of,
link |
01:00:30.640
I was losing it a little bit.
link |
01:00:32.320
So it is almost like sad that it happened and oh my God,
link |
01:00:35.440
like, but on the other hand, it also verifies the approach.
link |
01:00:40.600
But to me also, there's so many challenges
link |
01:00:43.160
and interesting aspects of intelligence
link |
01:00:45.400
that even though we can train a neural network
link |
01:00:49.240
to play at the level of the best humans,
link |
01:00:52.040
there's still so many challenges.
link |
01:00:53.600
So for me, it's also like,
link |
01:00:54.800
well, this is really an amazing achievement,
link |
01:00:56.800
but I already was also thinking about next steps.
link |
01:00:59.280
I mean, as I said, these agents play Protoss,
link |
01:01:01.720
they play Protoss versus Protoss,
link |
01:01:04.080
but they should be able to play a different race
link |
01:01:07.240
much quicker, right?
link |
01:01:08.160
So that would be an amazing achievement.
link |
01:01:10.640
Some people call this meta reinforcement learning,
link |
01:01:13.360
meta learning and so on, right?
link |
01:01:15.200
So there's so many possibilities after that moment,
link |
01:01:18.960
but the moment itself, it really felt great.
link |
01:01:22.240
It's, we had this bet.
link |
01:01:24.520
So I'm kind of a pessimist in general.
link |
01:01:27.720
So I kind of sent an email to the team and I said,
link |
01:01:30.120
okay, let's, against TLO first, right?
link |
01:01:33.680
Like what's going to be the result?
link |
01:01:35.120
And I really thought we would lose like five zero, right?
link |
01:01:38.680
I, I, we had some calibration made
link |
01:01:41.480
against the 5,000 MMR player.
link |
01:01:44.080
TLO was much stronger than that player.
link |
01:01:47.360
Even if he played Protoss, which is his off race,
link |
01:01:51.040
but yeah, I was not imagining we would win.
link |
01:01:53.120
So for me, that was just kind of a test run or something.
link |
01:01:55.600
And then it really kind of, he was really surprised.
link |
01:01:59.000
And unbelievably, we went to this,
link |
01:02:02.360
to this bar to celebrate.
link |
01:02:04.560
And Dave tells me, well, why don't we invite someone
link |
01:02:08.360
who is a thousand MMR stronger in Protoss?
link |
01:02:11.000
Like an actual Protoss player, like,
link |
01:02:13.040
like that it turned out being MaNa, right?
link |
01:02:16.200
And, you know, we had some drinks and I said, sure, why not?
link |
01:02:19.400
But then I thought, well,
link |
01:02:20.240
that's really going to be impossible to beat.
link |
01:02:22.080
I mean, even because it's so much ahead.
link |
01:02:24.600
A thousand MMR is really like 99% probability
link |
01:02:28.440
that MaNa would beat TLO as Protoss versus Protoss, right?
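The "99%" figure can be sanity checked with a short calculation. This sketch assumes that MMR behaves like a standard Elo rating with the usual 400 point logistic scale; that is an assumption made here for illustration, not something stated in the conversation or confirmed about StarCraft's ladder.

```python
def expected_win_probability(rating_gap, scale=400.0):
    """Elo style expected score for the higher rated player.

    `rating_gap` is the stronger player's rating minus the weaker one's.
    The 400 point scale is the conventional Elo constant, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # A 1,000 point gap under the assumed Elo model.
    print(f"{expected_win_probability(1000):.4f}")
```

Under that assumption a 1,000 point gap gives the stronger player an expected score of roughly 0.997, which is in line with the rough "99%" quoted here.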
link |
01:02:33.080
So we did that.
link |
01:02:34.240
And to me, the second game was much more important,
link |
01:02:39.000
even though a lot of uncertainty kind of disappeared
link |
01:02:42.120
after we kind of beat TLO.
link |
01:02:43.680
I mean, he is a professional player.
link |
01:02:45.680
So that was kind of, oh, but that's really
link |
01:02:48.000
a very nice achievement.
link |
01:02:49.760
But MaNa really was at the top.
link |
01:02:51.800
And you could see he played much better,
link |
01:02:53.880
but our agents got much better too.
link |
01:02:55.400
So it's like, ah.
link |
01:02:57.440
And then after the first game, I said,
link |
01:02:59.800
if we take a single game, at least we can say we beat a game.
link |
01:03:02.760
I mean, even if we don't beat the series,
link |
01:03:04.320
for me, that was a huge relief.
link |
01:03:06.960
And I mean, I remember hugging Demis.
link |
01:03:09.200
And I mean, it was really like this moment,
link |
01:03:11.600
for me, will resonate forever as a researcher.
link |
01:03:14.200
And I mean, as a person, and yeah,
link |
01:03:15.880
it's a really like great accomplishment.
link |
01:03:18.280
And it was great also to be there with the team in the room.
link |
01:03:21.360
I don't know if you saw like this.
link |
01:03:23.080
So it was really like.
link |
01:03:24.760
I mean, from my perspective,
link |
01:03:26.000
the other interesting thing is just like watching Kasparov,
link |
01:03:29.720
now watching MaNa was also interesting
link |
01:03:33.760
because he is kind of at a loss for words.
link |
01:03:36.160
I mean, whenever you lose, I've done a lot of sports.
link |
01:03:38.400
You sometimes say excuses, you look for reasons.
link |
01:03:43.560
And he couldn't really come up with reasons.
link |
01:03:46.280
I mean, so with the off race for Protoss,
link |
01:03:50.040
you could say, well, it felt awkward, it wasn't,
link |
01:03:52.320
but here it was just beaten.
link |
01:03:55.200
And it was beautiful to look at a human being
link |
01:03:57.960
being superseded by an AI system.
link |
01:04:00.320
I mean, it's a beautiful moment for researchers.
link |
01:04:04.480
Yeah, for sure it was.
link |
01:04:05.920
I mean, probably the highlight of my career so far
link |
01:04:09.960
because of its uniqueness and coolness.
link |
01:04:11.760
And I don't know, I mean, it's obviously, as you said,
link |
01:04:14.280
you can look at paper citations and so on.
link |
01:04:16.240
But this really is like a testament
link |
01:04:19.280
of the whole machine learning approach
link |
01:04:22.400
and using games to advance technology.
link |
01:04:24.640
I mean, it really was, everything came together
link |
01:04:27.880
at that moment, that's really the summary.
link |
01:04:29.840
Also, on the other side, it's a popularization of AI too
link |
01:04:34.040
because it's just like traveling to the moon and so on.
link |
01:04:38.200
I mean, this is where a very large community of people
link |
01:04:41.000
that don't really know AI, they get to really interact with it.
link |
01:04:45.080
Which is very important.
link |
01:04:46.000
I mean, we must, you know, writing papers helps our peers,
link |
01:04:50.800
researchers to understand what we're doing.
link |
01:04:52.520
But I think AI is becoming mature enough
link |
01:04:55.880
that we must sort of try to explain what it is.
link |
01:04:59.000
And perhaps through games is an obvious way
link |
01:05:01.440
because these games always had built in AI.
link |
01:05:03.640
So it may be everyone experienced an AI playing a video game
link |
01:05:07.680
even if they don't know.
link |
01:05:08.520
Because there's always some scripted element
link |
01:05:10.240
and some people might even call that AI already, right?
link |
01:05:13.880
So what are other applications
link |
01:05:16.320
of the approaches underlying AlphaStar that you see happening?
link |
01:05:20.280
There's a lot of echoes of, you said, transformer
link |
01:05:23.120
of language modeling and so on.
link |
01:05:25.680
Have you already started thinking where the breakthroughs
link |
01:05:29.480
in AlphaStar get expanded to other applications?
link |
01:05:32.280
Right, so I thought about a few things
link |
01:05:34.640
for like kind of next months, next years.
link |
01:05:38.440
The main thing I'm thinking about actually is
link |
01:05:40.520
what's next as a kind of a grand challenge
link |
01:05:43.160
because for me, like we've seen Atari
link |
01:05:47.120
and then there's like the sort of three dimensional worlds
link |
01:05:50.280
that we've seen also like pretty good performance
link |
01:05:52.520
from these capture the flag agents
link |
01:05:54.160
that also some people at DeepMind and elsewhere are working on.
link |
01:05:57.600
We've also seen some amazing results on like,
link |
01:05:59.600
for instance, Dota 2, which is also a very complicated game.
link |
01:06:03.280
So for me, like the main thing I'm thinking about
link |
01:06:05.960
is what's next in terms of challenge.
link |
01:06:07.960
So as a researcher, I see sort of two tensions
link |
01:06:12.960
between research and then applications or areas
link |
01:06:16.760
or domains where you apply them.
link |
01:06:18.480
So on the one hand, we've done,
link |
01:06:20.480
thanks to the application of StarCraft is very hard.
link |
01:06:23.320
We developed some techniques, some new research
link |
01:06:25.600
that now we could look at elsewhere,
link |
01:06:27.480
like are there other applications where we can apply this?
link |
01:06:30.520
And the obvious ones, absolutely,
link |
01:06:32.880
you can think of feeding back to sort of the community
link |
01:06:37.480
we took from, which was mostly sequence modeling
link |
01:06:40.240
or natural language processing.
link |
01:06:41.680
So we've developed and extended things from the transformer
link |
01:06:46.120
and we use pointer networks.
link |
01:06:48.120
We combine LSTM and transformers in interesting ways.
link |
01:06:51.280
So that's perhaps the kind of lowest hanging fruit
link |
01:06:54.200
of feeding back to now a different field of machine learning
link |
01:06:58.840
that's not playing video games.
link |
01:07:00.880
Let me go old school and jump to Mr. Alan Turing.
link |
01:07:05.680
So the Turing test is a natural language test,
link |
01:07:09.880
a conversational test.
link |
01:07:11.560
What's your thought of it as a test for intelligence?
link |
01:07:15.760
Do you think it is a grand challenge
link |
01:07:17.360
that's worthy of undertaking?
link |
01:07:18.920
Maybe if it is, would you reformulate it
link |
01:07:21.960
or phrase it somehow differently?
link |
01:07:23.720
Right, so I really love the Turing test
link |
01:07:25.640
because I also like sequences and language understanding.
link |
01:07:29.600
And in fact, some of the early work
link |
01:07:32.160
we did in machine translation,
link |
01:07:33.520
we tried to apply to kind of a neural chat bot,
link |
01:07:37.320
which obviously would never pass the Turing test
link |
01:07:40.200
because it was very limited.
link |
01:07:42.320
But it is a very fascinating idea
link |
01:07:45.200
that you could really have an AI
link |
01:07:49.440
that would be indistinguishable from humans
link |
01:07:51.800
in terms of asking or conversing with it, right?
link |
01:07:56.040
So I think the test itself seems very nice
link |
01:08:00.720
and it's kind of well defined actually,
link |
01:08:02.600
like the passing it or not.
link |
01:08:05.000
I think there's quite a few rules
link |
01:08:06.560
that feel like pretty simple and you could really have,
link |
01:08:12.520
I mean, I think they have these competitions every year.
link |
01:08:14.800
Yeah, so the Leibniz Prize,
link |
01:08:15.920
but I don't know if you've seen the kind of bots
link |
01:08:22.240
that emerge from that competition.
link |
01:08:24.160
They're not quite as what you would,
link |
01:08:28.000
so it feels like that there's weaknesses
link |
01:08:29.920
with the way Turing formulated it.
link |
01:08:31.400
It needs to be that the definition
link |
01:08:35.000
of a genuine, rich, fulfilling human conversation
link |
01:08:40.000
it needs to be something else.
link |
01:08:41.640
Like the Alexa Prize,
link |
01:08:43.000
which I'm not as well familiar with,
link |
01:08:44.880
has tried to define that more.
link |
01:08:46.200
I think by saying you have to continue
link |
01:08:48.240
keeping a conversation for 30 minutes,
link |
01:08:50.680
something like that.
link |
01:08:52.240
So basically forcing the agent not to just fool,
link |
01:08:55.520
but to have an engaging conversation kind of thing,
link |
01:08:58.000
is that, I mean, have you thought
link |
01:09:03.720
about this problem richly?
link |
01:09:06.400
And if you have in general, how far away are we from,
link |
01:09:10.680
you worked a lot on language understanding,
link |
01:09:14.160
language generation, but the full dialogue,
link |
01:09:16.640
the conversation, just sitting at the bar,
link |
01:09:19.920
having a couple of beers for an hour,
link |
01:09:21.760
that kind of conversation.
link |
01:09:22.960
Have you thought about it?
link |
01:09:23.800
Yeah, so I think you touched here on the critical point,
link |
01:09:26.440
which is feasibility, right?
link |
01:09:28.640
So there's a great sort of essay by Hamming,
link |
01:09:32.880
which describes sort of grand challenges of physics.
link |
01:09:37.400
And he argues that, well, okay, for instance,
link |
01:09:41.080
teleportation or time travel are great grand challenges
link |
01:09:44.720
of physics, but there's no attacks.
link |
01:09:46.600
We really don't know or cannot kind of make any progress.
link |
01:09:50.360
So that's why most physicists and so on,
link |
01:09:53.360
they don't work on these in their PhDs
link |
01:09:55.360
and as part of their careers.
link |
01:09:57.920
So I see the Turing test as, in the full Turing test,
link |
01:10:01.000
as a bit still too early.
link |
01:10:02.760
Like I am, I think we're, especially with the current trend
link |
01:10:06.760
of deep learning language models,
link |
01:10:10.080
we've seen some amazing examples,
link |
01:10:11.640
I think GPT-2 being the most recent one,
link |
01:10:14.160
which is very impressive,
link |
01:10:15.840
but to understand to fully solve passing or fooling a human
link |
01:10:21.080
to think that there's a human on the other side,
link |
01:10:23.480
I think we're quite far.
link |
01:10:24.960
So as a result, I don't see myself
link |
01:10:27.360
and I probably would not recommend people doing a PhD
link |
01:10:30.520
on solving the Turing test,
link |
01:10:31.680
because it just feels it's kind of too early
link |
01:10:34.120
or too hard of a problem.
link |
01:10:35.520
Yeah, but that said, you said the exact same thing
link |
01:10:37.840
about StarCraft about a few years ago.
link |
01:10:40.480
So to demo, so I pre...
link |
01:10:42.600
Yes.
link |
01:10:43.920
You'll probably also be the person
link |
01:10:45.600
who passes the Turing test in three years.
link |
01:10:48.240
I mean, I think the, yeah, so...
link |
01:10:51.040
So we have this on record, this is nice.
link |
01:10:52.720
It's true.
link |
01:10:53.560
I mean, it's true that progress sometimes
link |
01:10:56.600
is a bit unpredictable.
link |
01:10:57.840
I really would not have, even six months ago,
link |
01:11:00.840
I would not have predicted the level
link |
01:11:02.520
that we see that these agents can deliver.
link |
01:11:05.480
At grandmaster level, but I have worked on language enough.
link |
01:11:10.120
And basically my concern is not that something could happen,
link |
01:11:13.640
a breakthrough could happen that would bring us to solving
link |
01:11:16.440
or passing the Turing test,
link |
01:11:18.440
is that I just think the statistical approach to it,
link |
01:11:21.680
like this is not gonna cut it.
link |
01:11:24.160
So we need a breakthrough,
link |
01:11:25.960
which is great for the community.
link |
01:11:28.320
But given that, I think there's quite a bit more uncertainty.
link |
01:11:31.840
Whereas for StarCraft,
link |
01:11:34.280
I knew what the steps would be to kind of get us there.
link |
01:11:38.160
I think it was clear that using the imitation learning part
link |
01:11:41.640
and then using these battle network agents
link |
01:11:44.360
were gonna be key and it turned out that this was the case
link |
01:11:48.320
and a little more was needed, but not much more.
link |
01:11:51.640
For Turing test, I just don't know what the plan
link |
01:11:54.360
or execution plan would look like.
link |
01:11:56.000
So that's why I myself working on it
link |
01:11:59.160
as a grand challenge is hard,
link |
01:12:01.520
but there are quite a few sub challenges
link |
01:12:03.920
that are related that you could say,
link |
01:12:05.480
well, I mean, what if you create a great assistant,
link |
01:12:09.080
like Google already has like the Google Assistant.
link |
01:12:11.400
So can we make it better
link |
01:12:13.120
and can we make it fully neural and so on?
link |
01:12:15.440
That I start to believe maybe we're reaching a point
link |
01:12:18.200
where we should attempt these challenges.
link |
01:12:20.760
I like this conversation so much
link |
01:12:22.480
because it echoes very much the StarCraft conversation.
link |
01:12:24.920
It's exactly how you approach StarCraft.
link |
01:12:26.920
Let's break it down into small pieces and solve those
link |
01:12:29.680
and you end up solving the whole game.
link |
01:12:31.400
Great, but that said, you're behind some
link |
01:12:34.120
of the sort of biggest pieces of work in deep learning
link |
01:12:37.960
in the last several years.
link |
01:12:40.280
So you mentioned some limits.
link |
01:12:42.320
What do you think are the current limits of deep learning
link |
01:12:44.960
and how do we overcome those limits?
link |
01:12:47.080
So if I had to actually use a single word
link |
01:12:50.160
to define the main challenge in deep learning,
link |
01:12:53.200
it's a challenge that probably has been the challenge
link |
01:12:55.720
for many years and is that of generalization.
link |
01:12:59.760
So what that means is that all that we're doing
link |
01:13:04.520
is fitting functions to data.
link |
01:13:06.800
And when the data we see is not from the same distribution
link |
01:13:12.160
or even if there are some times
link |
01:13:14.080
that it is very close to distribution
link |
01:13:16.800
but because of the way we train it with limited samples,
link |
01:13:20.240
we then get to this stage where we just don't see
link |
01:13:23.880
generalization as much as we can generalize.
link |
01:13:27.760
And I think adversarial examples are a clear example of this
link |
01:13:31.240
but if you study the machine learning literature
link |
01:13:34.640
and the reason why SVMs became very popular
link |
01:13:38.320
were because they were dealing
link |
01:13:39.720
and they had some guarantees about generalization
link |
01:13:42.400
which is unseen data or out of distribution
link |
01:13:45.600
or even within distribution
link |
01:13:47.000
where you take an image adding a bit of noise,
link |
01:13:49.760
these models fail.
link |
01:13:51.280
So I think really I don't see a lot of progress
link |
01:13:56.280
on generalization in the strong generalization sense
link |
01:14:00.800
of the word.
link |
01:14:01.880
I think our neural networks,
link |
01:14:05.280
you can always find design examples
link |
01:14:08.000
that will make their outputs arbitrary
link |
01:14:11.000
which is not good because we humans would never be fooled
link |
01:14:16.000
by these kind of images or manipulation of the image.
link |
01:14:19.920
And if you look at the mathematics,
link |
01:14:21.720
you kind of understand this is a bunch of matrices
link |
01:14:23.960
multiplied together, there's probably numerics
link |
01:14:27.320
and instability that you can just find corner cases.
link |
01:14:30.880
So I think that's really the underlying topic
link |
01:14:34.560
many times we see when even at the grand stage
link |
01:14:38.760
of like Turing test generalization,
link |
01:14:40.840
I mean, if you start, I mean, passing the Turing test,
link |
01:14:44.560
should it be in English or should it be in any language?
link |
01:14:47.920
I mean, as a human, if you ask something
link |
01:14:52.320
in a different language, you actually will go
link |
01:14:54.120
and do some research and try to translate it
link |
01:14:56.280
and so on, should the Turing test include that, right?
link |
01:15:01.080
And it's really a difficult problem
link |
01:15:02.920
and very fascinating and very mysterious actually.
link |
01:15:05.360
Yeah, absolutely.
link |
01:15:06.320
But do you think it's, if you were to try to solve it,
link |
01:15:10.520
can you not grow the size of data intelligently
link |
01:15:14.280
in such a way that the distribution of your training set
link |
01:15:17.400
does include the entirety of the testing set?
link |
01:15:20.360
I think is that one path?
link |
01:15:21.800
The other path is totally a new methodology.
link |
01:15:23.880
That's not statistical.
link |
01:15:25.000
So a path that has worked well
link |
01:15:27.080
and it worked well in StarCraft and in machine translation
link |
01:15:29.880
and in language is scaling up the data and the model.
link |
01:15:32.800
And that's kind of been maybe the only single formula
link |
01:15:37.400
that still delivers today in deep learning, right?
link |
01:15:40.440
It's that scale, data scale and model scale
link |
01:15:44.080
really do more and more of the things that we thought,
link |
01:15:47.080
oh, there's no way it can generalize to these
link |
01:15:49.240
or there's no way it can generalize to that.
link |
01:15:51.360
But I don't think fundamentally it will be solved with this.
link |
01:15:54.840
And for instance, I'm really liking some style
link |
01:15:58.960
or approach that would not only have neural networks
link |
01:16:02.120
but it would have programs or some discrete decision making
link |
01:16:06.400
because there is where I feel there's a bit more,
link |
01:16:09.760
like, I mean, the example of the best example,
link |
01:16:12.200
I think for understanding this is,
link |
01:16:14.680
I also worked a bit on, oh, like we can learn an algorithm
link |
01:16:17.640
with a neural network, right?
link |
01:16:18.840
So you give it many examples
link |
01:16:20.160
and it's gonna sort the input numbers
link |
01:16:22.880
or something like that.
link |
01:16:24.440
But really, strong generalization is you give me some numbers
link |
01:16:29.520
or you ask me to create an algorithm that sorts numbers
link |
01:16:32.360
and instead of creating a neural net which will be fragile
link |
01:16:34.760
because it's gonna go out of range at some point,
link |
01:16:38.000
you're gonna give it numbers that are too large,
link |
01:16:40.400
too small and whatnot, you just,
link |
01:16:42.680
if you just create a piece of code that sorts the numbers,
link |
01:16:46.400
then you can prove that that will generalize
link |
01:16:48.760
to absolutely all the possible inputs you could give.
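[Editor's sketch] The contrast drawn above, between a learned sorter that breaks outside its training range and a piece of code whose correctness does not depend on the size of the inputs, can be illustrated with a few lines of Python. This is only an illustration under stated assumptions: the function name `insertion_sort` and the choice of algorithm are ours, not something described in the conversation.

```python
def insertion_sort(values):
    """Return a new list with the items of `values` in ascending order."""
    result = []
    for v in values:
        # Walk left from the end until we reach the slot where v belongs.
        i = len(result)
        while i > 0 and result[i - 1] > v:
            i -= 1
        result.insert(i, v)
    return result


# Values far outside any plausible "training range" are handled exactly
# like small ones, because the logic only relies on comparisons.
print(insertion_sort([3, -10**30, 7, 10**30, 0]))
```

The loop maintains a sorted prefix at every step, which is the kind of invariant one can reason about for all inputs, rather than checking on a sample of them.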
link |
01:16:52.040
So I think that's, the problem comes
link |
01:16:53.920
with some exciting prospects.
link |
01:16:56.000
I mean, scale is a bit more boring, but it really works.
link |
01:16:59.560
And then maybe programs and discrete abstractions
link |
01:17:02.960
are a bit less developed,
link |
01:17:04.920
but clearly I think they're quite exciting
link |
01:17:07.520
in terms of future for the field.
link |
01:17:10.000
Do you draw any insight or wisdom from the 80s
link |
01:17:13.560
and expert systems and symbolic systems, symbolic computing?
link |
01:17:17.000
Do you ever go back to those,
link |
01:17:18.920
the reasoning, that kind of logic?
link |
01:17:20.800
Do you think that might make a comeback?
link |
01:17:23.200
You'll have to dust off those books?
link |
01:17:25.000
Yeah, I actually love adding more inductive biases.
link |
01:17:31.360
To me, the problem really is what are you trying to solve?
link |
01:17:34.360
If what you're trying to solve is so important
link |
01:17:36.560
that you try to solve it no matter what,
link |
01:17:39.240
then absolutely use rules, use domain knowledge
link |
01:17:44.280
and then use a bit of the magic of machine learning
link |
01:17:46.960
to empower or to make the system as the best system
link |
01:17:50.160
that will detect cancer or detect weather patterns, right?
link |
01:17:56.080
Or in terms of StarCraft, it also was a very big challenge.
link |
01:17:59.160
So I was definitely happy
link |
01:18:01.320
that if we had to cut a corner here and there,
link |
01:18:04.560
it could have been interesting to do.
link |
01:18:06.920
And in fact, in StarCraft,
link |
01:18:08.400
we start thinking about expert systems
link |
01:18:10.600
because it's a very, you can define,
link |
01:18:12.840
I mean, people actually build StarCraft bots
link |
01:18:15.120
by thinking about those principles like state machines
link |
01:18:18.720
and rule based and then you could think
link |
01:18:21.600
of combining a bit of a rule based system,
link |
01:18:24.520
but that has also neural networks incorporated
link |
01:18:27.480
to make it generalize a bit better.
link |
01:18:29.080
So absolutely, I mean, we should definitely go back
link |
01:18:31.840
to those ideas and anything that makes the problem simpler.
link |
01:18:35.440
As long as your problem is important, that's okay.
link |
01:18:38.040
And that's research driving a very important problem.
link |
01:18:41.080
And on the other hand,
link |
01:18:42.160
if you wanna really focus on the limits
link |
01:18:45.240
of reinforcement learning, then of course,
link |
01:18:47.240
you must try not to look at imitation data
link |
01:18:50.800
or to look for some rules of the domain
link |
01:18:54.200
that would help a lot or even feature engineering, right?
link |
01:18:57.040
So this is a tension that depending on what you do,
link |
01:19:00.760
I think both ways are definitely fine.
link |
01:19:03.360
And I would never not do one or the other
link |
01:19:06.080
if you're, as long as what you're doing
link |
01:19:08.040
is important and needs to be solved, right?
link |
01:19:10.080
All right, so there's a bunch of different ideas
link |
01:19:13.520
that you've developed that I really enjoy.
link |
01:19:16.920
But one is translating from image captioning,
link |
01:19:22.240
translating from image to text.
link |
01:19:23.960
Just another beautiful idea, I think,
link |
01:19:28.720
that resonates throughout your work, actually.
link |
01:19:33.240
So the underlying nature of reality being language always.
link |
01:19:36.760
Yeah, somehow.
link |
01:19:37.680
So what's the connection between images and text?
link |
01:19:42.520
Or rather the visual world and the world of language
link |
01:19:45.880
in your view?
link |
01:19:46.720
Right, so I think a piece of research that's been central
link |
01:19:51.480
to, I would say, even extending into StarCraft
link |
01:19:54.400
is this idea of sequence to sequence learning,
link |
01:19:57.680
which what we really meant by that
link |
01:19:59.840
is that you can now really input anything
link |
01:20:03.520
to a neural network as the input X
link |
01:20:06.160
and then the neural network will learn a function F
link |
01:20:09.600
that will take X as an input and produce any output Y.
link |
01:20:12.840
And these X and Ys don't need to be like static
link |
01:20:16.240
or like a feature, like fixed vectors
link |
01:20:21.240
or anything like that.
link |
01:20:22.240
It could be really sequences
link |
01:20:23.800
and now beyond like data structures, right?
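[Editor's sketch] One common way to write down the idea above, a function that maps an input X to an output sequence Y, is the standard autoregressive factorization used in sequence to sequence models. The symbols below (x for the input, y_1 ... y_T for the output tokens) are our notation, not something stated in the conversation.

```latex
p(y_1, \dots, y_T \mid x) \;=\; \prod_{t=1}^{T} p\left(y_t \mid y_1, \dots, y_{t-1},\, x\right)
```

Here x can be a sentence, an image, or any other input that has been turned into vectors, and each output token is predicted from the input together with the tokens produced so far.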
link |
01:20:26.600
So that paradigm was tested in a very interesting way
link |
01:20:31.600
when we moved from translating French to English
link |
01:20:35.760
to translating an image to its caption.
link |
01:20:37.960
But the beauty of it is that really,
link |
01:20:40.760
and that's actually how it happened.
link |
01:20:42.160
I ran, I changed a line of code in this thing
link |
01:20:45.240
that was doing machine translation
link |
01:20:47.520
and I came the next day and I saw how it was producing
link |
01:20:51.800
captions that seemed like, oh my God,
link |
01:20:54.200
this is really, really working.
link |
01:20:56.040
And the principle is the same, right?
link |
01:20:57.560
So I think I don't see text, vision, speech, waveforms
link |
01:21:02.560
as something different, as long as you basically learn
link |
01:21:08.120
a function that will vectorize these into,
link |
01:21:13.480
and then after we vectorize it, we can then use transformers,
link |
01:21:17.480
LSTMs, whatever the flavor of the month of the model is.
link |
01:21:21.160
And then as long as we have enough supervised data,
link |
01:21:24.280
really this formula will work and will keep working,
link |
01:21:28.280
I believe to some extent.
link |
01:21:30.280
Modulo these generalization issues that I mentioned before.
link |
01:21:33.360
So, but the task there is to vectorize
link |
01:21:35.400
sort of form a representation that's meaningful,
link |
01:21:37.880
and your intuition now, having worked with all this media,
link |
01:21:41.400
is that once you are able to form that representation,
link |
01:21:45.240
you could basically take anything, any sequence.
link |
01:21:48.960
Is there, going back to StarCraft,
link |
01:21:51.240
is there limits on the length?
link |
01:21:54.080
So we didn't really touch on the long term aspect.
link |
01:21:57.960
How did you overcome the whole really long term aspect
link |
01:22:01.640
of things here?
link |
01:22:02.480
Is there some tricks or is it?
link |
01:22:03.920
So the main trick, so StarCraft,
link |
01:22:07.000
if you look at absolutely every frame,
link |
01:22:09.360
you might think it's quite a long game.
link |
01:22:11.120
So we would have to multiply 22 times,
link |
01:22:15.600
60 seconds per minute times maybe
link |
01:22:18.200
at least 10 minutes per game on average.
link |
01:22:20.360
So there are quite a few frames,
link |
01:22:24.160
but the trick really was to,
link |
01:22:26.600
only observe, in fact, which might be seen as a limitation,
link |
01:22:30.760
but it is also a computational advantage.
link |
01:22:33.600
Only observe when you act.
link |
01:22:36.040
And then what the neural network decides
link |
01:22:38.440
is what is the gap gonna be until the next action?
link |
01:22:43.200
And if you look at most StarCraft games
link |
01:22:46.520
that we have in the data set that Blizzard provided,
link |
01:22:50.200
it turns out that most games are actually only,
link |
01:22:54.720
I mean, it is still a long sequence,
link |
01:22:56.720
but it's maybe like 1,000 to 1,500 actions,
link |
01:23:00.720
which if you start looking at LSTMs,
link |
01:23:04.720
large LSTMs, transformers,
link |
01:23:07.720
it's not that difficult,
link |
01:23:10.720
especially if you have supervised learning.
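[Editor's sketch] The arithmetic above can be made concrete. The snippet assumes the "22" refers to game frames per second, and it uses a hypothetical policy that always waits 10 frames before its next action. Neither the function names nor the fixed delay come from the conversation.

```python
FRAMES_PER_SECOND = 22   # assumption: the "22" above, read as frames per second
SECONDS_PER_MINUTE = 60
MINUTES_PER_GAME = 10    # "at least 10 minutes per game on average"

frames_per_game = FRAMES_PER_SECOND * SECONDS_PER_MINUTE * MINUTES_PER_GAME


def frames_per_action(frames, actions):
    """Average number of game frames covered by one agent action."""
    return frames / actions


def count_observations(policy, total_frames):
    """Count observations when the agent only observes on frames where it acts.

    `policy` is a hypothetical callable that takes the current frame index and
    returns how many frames to wait until the next action.
    """
    frame, observations = 0, 0
    while frame < total_frames:
        observations += 1
        frame += max(1, policy(frame))  # always advance by at least one frame
    return observations


print(frames_per_game)  # frames the agent would see if it observed every frame
for actions in (1000, 1500):
    print(actions, round(frames_per_action(frames_per_game, actions), 1))

# A fixed 10-frame delay is an illustrative stand-in for a learned delay.
print(count_observations(lambda frame: 10, frames_per_game))
```

Under these assumptions, a sequence of roughly a thousand actions stands in for a game of more than ten thousand frames, which is the computational advantage described above.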
link |
01:23:13.720
If you had to do it with reinforcement learning,
link |
01:23:15.720
the credit assignment problem,
link |
01:23:16.720
what is it that in this game that made you win?
link |
01:23:18.720
That would be really difficult.
link |
01:23:20.720
But thankfully, because of imitation learning,
link |
01:23:23.720
we didn't kind of have to deal with this directly.
link |
01:23:26.720
Although if we had to, we tried it,
link |
01:23:28.720
and what happened is you just take all your workers
link |
01:23:30.720
and attack with them.
link |
01:23:32.720
And that sort of is kind of obvious in retrospect,
link |
01:23:35.720
because you start trying random actions.
link |
01:23:37.720
One of the actions will be a worker
link |
01:23:39.720
that goes to the enemy base,
link |
01:23:40.720
and because it's self play,
link |
01:23:42.720
it's not gonna know how to defend,
link |
01:23:44.720
because it basically doesn't know almost anything.
link |
01:23:46.720
And eventually what you develop is this,
link |
01:23:48.720
take all workers and attack,
link |
01:23:51.720
because the credit assignment issue in RL
link |
01:23:54.720
is really, really hard.
link |
01:23:55.720
I do believe we could do better,
link |
01:23:57.720
and that's maybe a research challenge for the future.
link |
01:24:00.720
But yeah, even in StarCraft,
link |
01:24:03.720
the sequences are maybe 1,000,
link |
01:24:05.720
which I believe is within the realm
link |
01:24:08.720
of what transformers can do.
link |
01:24:10.720
Yeah, I guess the difference between StarCraft and Go
link |
01:24:13.720
is in Go and chess,
link |
01:24:15.720
stuff starts happening right away.
link |
01:24:17.720
Right.
link |
01:24:18.720
Yeah, it's pretty easy to self play,
link |
01:24:21.720
not easy, but to self play is possible
link |
01:24:23.720
to develop reasonable strategies quickly
link |
01:24:25.720
as opposed to StarCraft.
link |
01:24:27.720
In Go, there's only 400 actions,
link |
01:24:29.720
but one action is what people would call
link |
01:24:32.720
the God action that would be,
link |
01:24:34.720
if you had expanded the whole search tree,
link |
01:24:37.720
that's the best action if you did minimax
link |
01:24:39.720
or whatever algorithm you would do
link |
01:24:41.720
if you had the computational capacity.
link |
01:24:43.720
But in StarCraft,
link |
01:24:45.720
400 is minuscule.
link |
01:24:48.720
In 400, you couldn't even click
link |
01:24:51.720
on the pixels around a unit, right?
link |
01:24:53.720
So I think the problem there
link |
01:24:55.720
is in terms of action space size
link |
01:24:58.720
is way harder,
link |
01:25:00.720
and that search is impossible.
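[Editor's sketch] A rough count shows why 400 actions is minuscule by comparison. The Go figure follows from the 19 by 19 board. The StarCraft figures (a 64 by 64 click grid and 10 action types) are purely hypothetical numbers chosen for illustration and are not taken from the conversation or from AlphaStar.

```python
GO_BOARD_SIZE = 19
# 361 board points plus "pass": the "only 400 actions" above, rounded up.
go_moves = GO_BOARD_SIZE * GO_BOARD_SIZE + 1

# Hypothetical, illustrative numbers only.
SCREEN_WIDTH, SCREEN_HEIGHT = 64, 64   # assumed click grid
ACTION_TYPES = 10                      # assumed kinds of command (move, attack, ...)

click_actions = ACTION_TYPES * SCREEN_WIDTH * SCREEN_HEIGHT

print(go_moves)
print(click_actions)
print(click_actions // go_moves)  # how many times larger, even in this toy setting
```

Even this toy count ignores which units are selected and the arguments an action takes, so the real space is larger still.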
link |
01:25:03.720
So there's quite a few challenges indeed
link |
01:25:05.720
that make this kind of a step up
link |
01:25:08.720
in terms of machine learning.
link |
01:25:10.720
For humans, maybe playing StarCraft
link |
01:25:12.720
seems more intuitive
link |
01:25:14.720
because it looks real,
link |
01:25:16.720
the graphics and everything moves smoothly,
link |
01:25:18.720
whereas I don't know how to...
link |
01:25:20.720
Go is a game that I would really need to study.
link |
01:25:22.720
It feels quite complicated,
link |
01:25:24.720
but for machines, maybe it's the reverse, yes.
link |
01:25:26.720
Which shows you the gap, actually,
link |
01:25:28.720
between deep learning and however the heck
link |
01:25:30.720
our brains work.
link |
01:25:32.720
So you developed a lot of really interesting ideas.
link |
01:25:35.720
It's interesting to just ask,
link |
01:25:37.720
what's your process of developing new ideas?
link |
01:25:40.720
Do you like brainstorming with others?
link |
01:25:42.720
Do you like thinking alone?
link |
01:25:44.720
Do you like...
link |
01:25:46.720
Like what was it?
link |
01:25:48.720
Ian Goodfellow said he came up with GANs
link |
01:25:50.720
after a few beers.
link |
01:25:52.720
He thinks beers are essential
link |
01:25:54.720
for coming up with new ideas.
link |
01:25:56.720
We had beers to decide to play another game
link |
01:25:58.720
of StarCraft after a week,
link |
01:26:00.720
so it's really similar to that story.
link |
01:26:02.720
Actually, I explained this
link |
01:26:04.720
in a DeepMind retreat,
link |
01:26:06.720
and I said this is the same as the GANs story.
link |
01:26:08.720
I mean, we were at a bar and we decided,
link |
01:26:10.720
we were on a week and that's what happened.
link |
01:26:12.720
I feel like we're giving the wrong message
link |
01:26:14.720
to young undergrads.
link |
01:26:16.720
But in general, do you like brainstorming?
link |
01:26:18.720
Do you like thinking alone, working stuff out?
link |
01:26:20.720
So I think throughout the years
link |
01:26:22.720
also things changed, right?
link |
01:26:24.720
So initially, I was
link |
01:26:26.720
very fortunate to be
link |
01:26:28.720
with great minds like
link |
01:26:30.720
Geoff Hinton,
link |
01:26:32.720
Jeff Dean, Ilya Sutskever.
link |
01:26:34.720
I was really fortunate to join Brain
link |
01:26:36.720
at a very good time.
link |
01:26:38.720
At that point, ideas,
link |
01:26:40.720
I was just kind of brainstorming with my colleagues
link |
01:26:42.720
and learned a lot,
link |
01:26:44.720
and keep learning is actually
link |
01:26:46.720
something you should never stop doing, right?
link |
01:26:48.720
So learning implies
link |
01:26:50.720
reading papers and also discussing ideas
link |
01:26:52.720
with others. It's very hard
link |
01:26:54.720
at some point to not communicate
link |
01:26:56.720
that being reading a paper from someone
link |
01:26:58.720
or actually discussing, right?
link |
01:27:00.720
So definitely
link |
01:27:02.720
that communication aspect
link |
01:27:04.720
needs to be there, whether it's written
link |
01:27:06.720
or oral.
link |
01:27:08.720
Nowadays,
link |
01:27:10.720
I'm also trying to be a bit more strategic
link |
01:27:12.720
about what research to do.
link |
01:27:14.720
So I was describing
link |
01:27:16.720
a little bit this sort of tension between
link |
01:27:18.720
research for the sake of research,
link |
01:27:20.720
and then you have, on the other hand,
link |
01:27:22.720
applications that can drive the research, right?
link |
01:27:24.720
And honestly,
link |
01:27:26.720
the formula that has worked best for me is
link |
01:27:28.720
just find a hard problem
link |
01:27:30.720
and then try to
link |
01:27:32.720
see how research fits into it,
link |
01:27:34.720
how it doesn't fit into it,
link |
01:27:36.720
and then you must innovate.
link |
01:27:38.720
So I think machine translation
link |
01:27:40.720
drove sequence to sequence.
link |
01:27:42.720
Then maybe
link |
01:27:44.720
learning algorithms
link |
01:27:46.720
that had to, like combinatorial algorithms
link |
01:27:48.720
led to pointer networks.
link |
01:27:50.720
StarCraft led to really scaling up
link |
01:27:52.720
imitation learning and the AlphaStar League.
link |
01:27:54.720
So that's been a formula
link |
01:27:56.720
that I personally like,
link |
01:27:58.720
but the other one is also valid,
link |
01:28:00.720
and I see it succeed a lot of the times
link |
01:28:02.720
where you just want to investigate
link |
01:28:04.720
model based RL
link |
01:28:06.720
as a kind of a research topic,
link |
01:28:08.720
and then you must then start to think,
link |
01:28:10.720
well, how are the tests?
link |
01:28:12.720
How are you going to test these ideas?
link |
01:28:14.720
You need kind of a minimal
link |
01:28:16.720
environment to try things.
link |
01:28:18.720
You need to read a lot of papers and so on,
link |
01:28:20.720
and that's also very fun to do,
link |
01:28:22.720
and something I've also done quite a few times,
link |
01:28:24.720
both at Brain, at DeepMind,
link |
01:28:26.720
and obviously as a PhD.
link |
01:28:28.720
So I think
link |
01:28:30.720
the ideas and discussions,
link |
01:28:32.720
I think it's important also
link |
01:28:34.720
because you start sort of
link |
01:28:36.720
guiding not only
link |
01:28:38.720
your own goals, but
link |
01:28:40.720
other people's goals
link |
01:28:42.720
to the next breakthrough, so
link |
01:28:44.720
you must really kind of understand
link |
01:28:46.720
this feasibility also
link |
01:28:48.720
as we were discussing before, right?
link |
01:28:50.720
Whether this domain is ready
link |
01:28:52.720
to be tackled or not, and you don't want
link |
01:28:54.720
to be too early, you obviously don't want
link |
01:28:56.720
to be too late, so it's really interesting
link |
01:28:58.720
this strategic component of research,
link |
01:29:00.720
which I think as a grad student
link |
01:29:02.720
I just had no idea,
link |
01:29:04.720
I just read papers and discussed
link |
01:29:06.720
ideas, and I think this has been maybe
link |
01:29:08.720
the major change, and I recommend
link |
01:29:10.720
people kind of
link |
01:29:12.720
fast forward to success, what it looks like
link |
01:29:14.720
and try to backtrack, rather than just
link |
01:29:16.720
kind of looking out, this looks cool,
link |
01:29:18.720
this looks cool, and then you do a bit of
link |
01:29:20.720
random work, which sometimes you stumble upon
link |
01:29:22.720
some interesting things, but
link |
01:29:24.720
in general it's also good to plan a bit.
link |
01:29:26.720
Yeah, I like it.
link |
01:29:28.720
Especially like your approach of taking
link |
01:29:30.720
on really hard problems, stepping right in
link |
01:29:32.720
and then being super skeptical about
link |
01:29:34.720
being able to solve the problem.
link |
01:29:36.720
I mean, there's a
link |
01:29:38.720
balance of both, right? There's a silly
link |
01:29:40.720
optimism
link |
01:29:42.720
and a critical
link |
01:29:44.720
sort of skepticism
link |
01:29:46.720
that's good to balance, which
link |
01:29:48.720
is why it's good to have a team of people
link |
01:29:50.720
that balance that.
link |
01:29:52.720
You don't do that on your own, you have both
link |
01:29:54.720
mentors that have seen
link |
01:29:56.720
or you obviously want to chat and
link |
01:29:58.720
discuss whether it's the right time.
link |
01:30:00.720
I mean, Demis
link |
01:30:02.720
came in 2014 and he said
link |
01:30:04.720
maybe in a bit we'll do StarCraft and
link |
01:30:06.720
maybe he knew
link |
01:30:08.720
and I'm just following his lead, which
link |
01:30:10.720
is great because he's brilliant, right?
link |
01:30:12.720
So, these things are
link |
01:30:14.720
obviously quite
link |
01:30:16.720
important that you want to
link |
01:30:18.720
be surrounded by people
link |
01:30:20.720
who are diverse, they
link |
01:30:22.720
have their knowledge. It's also
link |
01:30:24.720
important to...
link |
01:30:26.720
I've learned a lot from people
link |
01:30:28.720
who actually
link |
01:30:30.720
have an idea that I might not think is good
link |
01:30:32.720
but if I give them the space to try it
link |
01:30:34.720
I've been proven wrong many, many times
link |
01:30:36.720
as well. So, that's great.
link |
01:30:38.720
I think it's...
link |
01:30:40.720
Your colleagues are more important than yourself
link |
01:30:42.720
I think so.
link |
01:30:44.720
Now, let's real quick
link |
01:30:46.720
talk about another impossible problem.
link |
01:30:48.720
AGI.
link |
01:30:50.720
What do you think it takes to build a system
link |
01:30:52.720
that's human level intelligence?
link |
01:30:54.720
We talked a little bit about the Turing test, StarCraft,
link |
01:30:56.720
all these have echoes of general intelligence
link |
01:30:58.720
but if you think about
link |
01:31:00.720
just something that you would sit back
link |
01:31:02.720
and say, wow, this is
link |
01:31:04.720
really something that resembles
link |
01:31:06.720
human level intelligence, what do you think it takes
link |
01:31:08.720
to build that?
link |
01:31:10.720
So, I find that
link |
01:31:12.720
AGI oftentimes is maybe not
link |
01:31:14.720
very well defined
link |
01:31:16.720
so what I'm trying to
link |
01:31:18.720
then come up with for myself is
link |
01:31:20.720
what would be a result
link |
01:31:22.720
look like that
link |
01:31:24.720
you would start to believe that
link |
01:31:26.720
you would have agents or neural nets
link |
01:31:28.720
that no longer sort of overfit
link |
01:31:30.720
to a single task, right?
link |
01:31:32.720
But actually
link |
01:31:34.720
kind of learn
link |
01:31:36.720
the skill of learning, so to speak
link |
01:31:38.720
and that actually is a field that I
link |
01:31:40.720
am fascinated by which is
link |
01:31:42.720
the learning to learn or meta learning
link |
01:31:44.720
which is about no longer
link |
01:31:46.720
learning about a single domain
link |
01:31:48.720
so you can think about the learning algorithm
link |
01:31:50.720
itself is general, right?
link |
01:31:52.720
So the same formula we applied for
link |
01:31:54.720
AlphaStar or StarCraft
link |
01:31:56.720
we can now apply to kind of almost any
link |
01:31:58.720
video game or you could apply to
link |
01:32:00.720
many other problems and domains
link |
01:32:02.720
but the algorithm
link |
01:32:04.720
is what's kind of generalizing
link |
01:32:06.720
but the neural network, the weights
link |
01:32:08.720
those weights are useless even
link |
01:32:10.720
to play another race, right? I train
link |
01:32:12.720
a network to play very well at Protoss vs Protoss
link |
01:32:14.720
I need to throw away those weights
link |
01:32:16.720
if I want to play
link |
01:32:18.720
now Terran vs Terran
link |
01:32:20.720
I would need to retrain
link |
01:32:22.720
a network from scratch with the same algorithm
link |
01:32:24.720
that's beautiful but the network
link |
01:32:26.720
itself will not be useful
link |
01:32:28.720
so I think when I, if I see
link |
01:32:30.720
an approach that
link |
01:32:32.720
can absorb or start
link |
01:32:34.720
solving new problems
link |
01:32:36.720
without the need to kind of restart
link |
01:32:38.720
the process I think that
link |
01:32:40.720
to me would be a nice way to define
link |
01:32:42.720
some form of AGI
link |
01:32:44.720
again, I don't know
link |
01:32:46.720
the grandiose, like, AGI, I mean
link |
01:32:48.720
will the Turing test be solved before AGI?
link |
01:32:50.720
I mean, I don't know, I think concretely
link |
01:32:52.720
I would like to see clearly
link |
01:32:54.720
that meta learning happen
link |
01:32:56.720
meaning there is
link |
01:32:58.720
an architecture or a network
link |
01:33:00.720
that as it sees a new problem
link |
01:33:02.720
or new data it solves it
link |
01:33:04.720
and to make it
link |
01:33:06.720
kind of a benchmark it should
link |
01:33:08.720
solve it at the same speed that we do solve
link |
01:33:10.720
new problems when I define
link |
01:33:12.720
a new object and you have to recognize it
link |
01:33:14.720
when you start playing a new game
link |
01:33:16.720
you played all the Atari games but now you play a new Atari game
link |
01:33:18.720
well, you're going to be
link |
01:33:20.720
pretty quickly pretty good at the game
link |
01:33:22.720
so that's perhaps
link |
01:33:24.720
what's the domain and what's the exact benchmark
link |
01:33:26.720
it's a bit difficult, I think as a community
link |
01:33:28.720
we might need to do some work to define it
link |
01:33:32.720
but I think this first step
link |
01:33:34.720
I could see it happen relatively soon
link |
01:33:36.720
but then the whole
link |
01:33:38.720
what AGI means and so on
link |
01:33:40.720
I am a bit more confused about
link |
01:33:42.720
I think people mean different things
link |
01:33:44.720
there's an emotional psychological level
link |
01:33:48.720
that
link |
01:33:50.720
like even the Turing test, passing the Turing test
link |
01:33:52.720
is something that we just pass judgment
link |
01:33:54.720
on as human beings what it means to be
link |
01:33:56.720
you know, as a
link |
01:33:58.720
as a dog
link |
01:34:00.720
an AGI system
link |
01:34:02.720
like what level, what does it mean
link |
01:34:04.720
what does it mean
link |
01:34:06.720
but I like the generalization
link |
01:34:08.720
and maybe as a community we converge towards
link |
01:34:10.720
a group of domains
link |
01:34:12.720
that are sufficiently far away
link |
01:34:14.720
that would be really damn impressive
link |
01:34:16.720
if we're able to generalize
link |
01:34:18.720
so perhaps not as close as Protoss and Zerg
link |
01:34:20.720
but like Wikipedia
link |
01:34:22.720
that would be a good step
link |
01:34:24.720
and then a really good step
link |
01:34:26.720
but then from StarCraft to Wikipedia
link |
01:34:28.720
and back
link |
01:34:30.720
that kind of thing
link |
01:34:32.720
and that feels also quite hard and far
link |
01:34:34.720
I think this
link |
01:34:36.720
as long as you put the benchmark out
link |
01:34:38.720
as we discovered for instance with ImageNet
link |
01:34:40.720
then tremendous progress can be had
link |
01:34:42.720
so I think maybe there's a lack of
link |
01:34:44.720
benchmark
link |
01:34:46.720
but I'm sure we'll find one and the community
link |
01:34:48.720
will then work towards that
link |
01:34:52.720
and then beyond what AGI might mean
link |
01:34:54.720
or would imply
link |
01:34:56.720
I really am hopeful to see
link |
01:34:58.720
basically machine learning
link |
01:35:00.720
or AI just scaling up
link |
01:35:02.720
and helping
link |
01:35:04.720
people that might not have the resources
link |
01:35:06.720
to hire an assistant
link |
01:35:08.720
or that
link |
01:35:10.720
they might not even know what the weather is like
link |
01:35:12.720
but so I think there's
link |
01:35:14.720
in terms of the impact
link |
01:35:16.720
the positive impact of AI
link |
01:35:18.720
I think that's maybe what we should also
link |
01:35:20.720
not lose focus on
link |
01:35:22.720
the research community building AGI
link |
01:35:24.720
that's a real nice goal
link |
01:35:26.720
and I think the way that DeepMind puts it
link |
01:35:28.720
is and then use it to solve everything else
link |
01:35:30.720
so I think we should parallelize
link |
01:35:32.720
yeah we shouldn't forget
link |
01:35:34.720
about all the positive things that are actually
link |
01:35:36.720
coming out of AI already and are going
link |
01:35:38.720
to be coming out
link |
01:35:40.720
right
link |
01:35:42.720
and then let me ask
link |
01:35:44.720
relative to popular perception
link |
01:35:46.720
do you have any worry about the existential
link |
01:35:48.720
threat of artificial intelligence
link |
01:35:50.720
in the near or far future
link |
01:35:52.720
that some people have
link |
01:35:54.720
I think in the near future
link |
01:35:56.720
I'm skeptical so I hope
link |
01:35:58.720
I'm not wrong but
link |
01:36:00.720
I'm not concerned
link |
01:36:02.720
but I appreciate efforts
link |
01:36:04.720
ongoing efforts
link |
01:36:06.720
and even like a whole research field on
link |
01:36:08.720
AI safety emerging and in conferences
link |
01:36:10.720
and so on I think that's great
link |
01:36:12.720
in the long term
link |
01:36:14.720
I really hope we
link |
01:36:16.720
just can simply
link |
01:36:18.720
have the benefits outweigh the potential dangers
link |
01:36:20.720
I am hopeful for that
link |
01:36:22.720
but also we must
link |
01:36:24.720
remain vigilant to kind of monitor
link |
01:36:26.720
and assess whether the tradeoffs
link |
01:36:28.720
are there and we have
link |
01:36:30.720
enough
link |
01:36:32.720
also lead time to prevent
link |
01:36:34.720
or to redirect our efforts
link |
01:36:36.720
if need be
link |
01:36:38.720
but I'm quite optimistic
link |
01:36:40.720
about the technology
link |
01:36:42.720
and definitely more fearful
link |
01:36:44.720
of other threats in terms of
link |
01:36:46.720
planetary level
link |
01:36:48.720
at this point but obviously
link |
01:36:50.720
that's the one I kind of have more
link |
01:36:52.720
power on so clearly
link |
01:36:54.720
start thinking more and more about this
link |
01:36:56.720
and it's kind of
link |
01:36:58.720
it's grown on me actually to
link |
01:37:00.720
start reading more about AI safety
link |
01:37:02.720
which is a field that so far I have not
link |
01:37:04.720
really contributed to but maybe
link |
01:37:06.720
there's something to be done there as well
link |
01:37:08.720
I think it's really important
link |
01:37:10.720
I talk about this with a few folks
link |
01:37:12.720
but it's important to ask you
link |
01:37:14.720
and shove it in your head because you're at the
link |
01:37:16.720
leading edge of actually
link |
01:37:18.720
what people are excited about in AI
link |
01:37:20.720
I mean the work with AlphaStar
link |
01:37:22.720
at the very cutting edge of the kind
link |
01:37:24.720
of thing that people are afraid of
link |
01:37:26.720
and so you speaking
link |
01:37:28.720
to that fact and
link |
01:37:30.720
that we're actually quite far away
link |
01:37:32.720
from the kind of thing that people might be
link |
01:37:34.720
afraid of but it's still
link |
01:37:36.720
worthwhile to think about
link |
01:37:38.720
and it's also good that you're
link |
01:37:40.720
that you're not as worried
link |
01:37:42.720
and you're also open to
link |
01:37:44.720
I mean there's two aspects
link |
01:37:46.720
I mean me not being worried but obviously
link |
01:37:48.720
we should prepare
link |
01:37:50.720
for it
link |
01:37:52.720
for things that could
link |
01:37:54.720
go wrong, misuse of the technologies
link |
01:37:56.720
as with any technologies
link |
01:37:58.720
so I think
link |
01:38:00.720
there's always tradeoffs
link |
01:38:02.720
and as a society we've kind of
link |
01:38:04.720
solved this to some extent
link |
01:38:06.720
in the past so I'm hoping that
link |
01:38:08.720
by having the researchers
link |
01:38:10.720
and the whole community
link |
01:38:12.720
brainstorm and come up with
link |
01:38:14.720
interesting solutions to the new things
link |
01:38:16.720
that will happen in the future
link |
01:38:18.720
that we can still also push the research
link |
01:38:20.720
to the avenue that
link |
01:38:22.720
I think is kind of the greatest avenue
link |
01:38:24.720
which is to
link |
01:38:26.720
understand intelligence, right? How are we doing
link |
01:38:28.720
what we're doing and
link |
01:38:30.720
obviously from a scientific standpoint
link |
01:38:32.720
that is kind of the drive
link |
01:38:34.720
my personal drive of
link |
01:38:36.720
all the time that I spend doing
link |
01:38:38.720
what I'm doing really.
link |
01:38:40.720
Where do you see deep learning as a field heading,
link |
01:38:42.720
where do you think the next big
link |
01:38:44.720
breakthrough might be?
link |
01:38:46.720
So I think deep learning
link |
01:38:48.720
I discussed a little of this before
link |
01:38:50.720
deep learning has to be
link |
01:38:52.720
combined with some form of discretization
link |
01:38:54.720
program synthesis
link |
01:38:56.720
I think that's kind of as a research
link |
01:38:58.720
in itself is an interesting topic
link |
01:39:00.720
to expand and start doing more research
link |
01:39:02.720
and then
link |
01:39:04.720
as kind of what will deep learning
link |
01:39:06.720
enable us to do in the future
link |
01:39:08.720
I don't think that's going to be what's going to happen
link |
01:39:10.720
this year but also this
link |
01:39:12.720
idea of
link |
01:39:14.720
not to throw away all the weights
link |
01:39:16.720
this idea of learning to learn
link |
01:39:18.720
and really having
link |
01:39:20.720
these agents
link |
01:39:22.720
not having to restart their weights
link |
01:39:24.720
and you can have an agent
link |
01:39:26.720
that is kind of solving
link |
01:39:28.720
or classifying images on ImageNet
link |
01:39:30.720
but also generating speech
link |
01:39:32.720
if you ask it to generate some speech
link |
01:39:34.720
and it should really be kind of
link |
01:39:36.720
almost the same
link |
01:39:38.720
network but
link |
01:39:40.720
might not be a neural network it might be a neural network
link |
01:39:42.720
with an optimization algorithm
link |
01:39:44.720
attached to it but I think this idea
link |
01:39:46.720
of generalization to new tasks
link |
01:39:48.720
is something that we first
link |
01:39:50.720
must define good benchmarks but then
link |
01:39:52.720
I think that's going to be exciting
link |
01:39:54.720
and I'm not sure how close we are
link |
01:39:56.720
but I think there's
link |
01:39:58.720
if you have a very limited domain
link |
01:40:00.720
I think we can start making some progress
link |
01:40:02.720
and
link |
01:40:04.720
much like how we made a lot of progress
link |
01:40:06.720
in computer vision we should start thinking
link |
01:40:08.720
I really like a talk that
link |
01:40:10.720
Léon Bottou gave at ICML
link |
01:40:12.720
a few years ago which is
link |
01:40:14.720
this train test paradigm should be broken
link |
01:40:16.720
we should stop
link |
01:40:18.720
thinking about a training test
link |
01:40:20.720
sorry a training set and a test set
link |
01:40:22.720
and these are closed
link |
01:40:24.720
things that are untouchable
link |
01:40:26.720
I think we should go beyond these and
link |
01:40:28.720
in meta learning we call these the meta training set
link |
01:40:30.720
and the meta test set which is
link |
01:40:32.720
really thinking about
link |
01:40:34.720
if I know about ImageNet
link |
01:40:36.720
why would that network
link |
01:40:38.720
not work on MNIST which is a much
link |
01:40:40.720
simpler problem but right now it really doesn't
link |
01:40:42.720
it you know
link |
01:40:44.720
but it just feels wrong right so I think
link |
01:40:46.720
that's kind of the
link |
01:40:48.720
there's the on the application
link |
01:40:50.720
or the benchmark side, we probably
link |
01:40:52.720
will see quite a few
link |
01:40:54.720
more interest and progress and hopefully
link |
01:40:56.720
people defining new
link |
01:40:58.720
and exciting challenges really
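
The meta training set and meta test set idea mentioned above can be sketched roughly as follows. This is a minimal illustration, not the setup used in any particular paper: the function names `split_tasks` and `make_episode` are hypothetical, and the task names other than ImageNet and MNIST are placeholders. The point it tries to show is that whole tasks are held out, so the system is evaluated on problems it has never trained on.

```python
import random

def split_tasks(tasks, meta_test_fraction=0.25, seed=0):
    """Hold out whole tasks (not individual examples) for meta testing."""
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * meta_test_fraction))
    # First return value: meta training tasks; second: meta test tasks.
    return shuffled[n_test:], shuffled[:n_test]

def make_episode(examples, n_support, rng):
    """Split one task's examples into a small support set (used to adapt)
    and a query set (used to measure how well the adaptation worked)."""
    pool = list(examples)
    rng.shuffle(pool)
    return pool[:n_support], pool[n_support:]

# Placeholder task names; only "imagenet" and "mnist" come from the conversation.
TASKS = ["imagenet", "mnist", "task_c", "task_d",
         "task_e", "task_f", "task_g", "task_h"]

meta_train, meta_test = split_tasks(TASKS)
support, query = make_episode(range(10), n_support=3, rng=random.Random(0))
```

A learner would adapt on `support` and be scored on `query`, with the final benchmark number computed only over tasks in `meta_test`.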
link |
01:41:00.720
do you have any hope or
link |
01:41:02.720
interest in knowledge graphs
link |
01:41:04.720
within this context so it's kind of
link |
01:41:06.720
constructing graphs
link |
01:41:08.720
going back to graphs
link |
01:41:10.720
well neural networks are graphs but I mean
link |
01:41:12.720
a different kind of knowledge graph
link |
01:41:14.720
sort of like semantic graphs
link |
01:41:16.720
where there's concepts
link |
01:41:18.720
so I think
link |
01:41:20.720
the idea of graphs
link |
01:41:22.720
is so I've been quite interested
link |
01:41:24.720
in sequences first and then more
link |
01:41:26.720
interesting or different data structures
link |
01:41:28.720
like graphs and
link |
01:41:30.720
I've studied graph neural networks
link |
01:41:32.720
in the last three years or so
link |
01:41:34.720
I
link |
01:41:36.720
found these models just very interesting from
link |
01:41:38.720
like deep learning
link |
01:41:40.720
standpoint but then
link |
01:41:42.720
how what do we want
link |
01:41:44.720
why do we want these models and why would we
link |
01:41:46.720
use them what's the application
link |
01:41:48.720
what's kind of the killer application of graphs
link |
01:41:50.720
right and
link |
01:41:52.720
perhaps
link |
01:41:54.720
if we
link |
01:41:56.720
could extract a knowledge graph
link |
01:41:58.720
from Wikipedia automatically
link |
01:42:00.720
that would be interesting because
link |
01:42:02.720
then these graphs have
link |
01:42:04.720
this very interesting structure
link |
01:42:06.720
that also is a bit more compatible with
link |
01:42:08.720
this idea of programs and
link |
01:42:10.720
deep learning kind of working together
link |
01:42:12.720
jumping neighborhoods and so on
link |
01:42:14.720
you could imagine defining some primitives
link |
01:42:16.720
to go around graphs right so
link |
01:42:18.720
I think
link |
01:42:20.720
I really like the idea of a knowledge
link |
01:42:22.720
graph and in fact
link |
01:42:24.720
when we
link |
01:42:26.720
we started or you know
link |
01:42:28.720
as part of the research we did for StarCraft
link |
01:42:30.720
I thought wouldn't it be cool to give
link |
01:42:32.720
the graph of
link |
01:42:34.720
you know all the
link |
01:42:36.720
all these buildings that depend on each other
link |
01:42:38.720
and units that have
link |
01:42:40.720
prerequisites of being built by that and so
link |
01:42:42.720
this is information
link |
01:42:44.720
that the network can learn and extract
link |
01:42:46.720
but it would have been great to see
link |
01:42:48.720
or to think of
link |
01:42:50.720
really StarCraft as a giant graph
link |
01:42:52.720
that even also as the game evolves
link |
01:42:54.720
you kind of start taking branches
link |
01:42:56.720
and so on and we tried
link |
01:42:58.720
to do a little bit of research on this
link |
01:43:00.720
nothing too relevant
link |
01:43:02.720
but I really like the idea
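
One way to picture the graph of buildings and units that depend on each other is a small dependency graph. The sketch below is an illustration only: the `TECH_TREE` dictionary is a simplified, hand-written fragment of the Protoss tech tree, and `all_prerequisites` is a hypothetical helper, not something taken from AlphaStar. It uses Python's standard `graphlib` module (Python 3.9+) to derive one valid build order.

```python
from graphlib import TopologicalSorter

# Each entry maps a building or unit to its direct prerequisites.
# Simplified, illustrative fragment; not a complete tech tree.
TECH_TREE = {
    "Pylon": set(),
    "Gateway": {"Pylon"},
    "Cybernetics Core": {"Gateway"},
    "Zealot": {"Gateway"},
    "Stalker": {"Gateway", "Cybernetics Core"},
    "Stargate": {"Cybernetics Core"},
}

def all_prerequisites(name, tree=TECH_TREE):
    """Return every direct and indirect prerequisite of `name`."""
    seen = set()
    stack = list(tree[name])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree[node])
    return seen

# A topological order lists every prerequisite before the things that need it.
build_order = list(TopologicalSorter(TECH_TREE).static_order())
```

Calling `all_prerequisites("Stalker")` walks the graph back to the Pylon, which is the kind of structure a network would otherwise have to learn from play.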
link |
01:43:04.720
and it has elements that are
link |
01:43:06.720
which is something you also worked with in terms of visualizing
link |
01:43:08.720
your networks as elements of
link |
01:43:10.720
having human interpretable
link |
01:43:12.720
being able to generate knowledge
link |
01:43:14.720
representations that are human interpretable
link |
01:43:16.720
that maybe human experts can then tweak
link |
01:43:18.720
or at least understand
link |
01:43:20.720
so there's a lot of interesting
link |
01:43:22.720
aspects there and for me personally I'm just a huge fan of
link |
01:43:24.720
Wikipedia and it's a shame
link |
01:43:26.720
that our neural networks
link |
01:43:28.720
aren't taking advantage of all the structured
link |
01:43:30.720
knowledge that's on the web.
link |
01:43:32.720
What's next for you?
link |
01:43:34.720
What's next for DeepMind?
link |
01:43:36.720
What are you excited about?
link |
01:43:38.720
For AlphaStar?
link |
01:43:40.720
Yeah so I think
link |
01:43:42.720
the obvious next steps
link |
01:43:44.720
would be to
link |
01:43:46.720
apply AlphaStar to
link |
01:43:48.720
other races I mean that sort of
link |
01:43:50.720
shows that the algorithm
link |
01:43:52.720
works because
link |
01:43:54.720
there could be, by mistake, something in the architecture
link |
01:43:56.720
that happens to work for Protoss
link |
01:43:58.720
but not for other races right so
link |
01:44:00.720
as verification I think
link |
01:44:02.720
that's an obvious next step that we are working on
link |
01:44:04.720
and
link |
01:44:06.720
then I would like to see
link |
01:44:08.720
so agents and players
link |
01:44:10.720
can specialize on
link |
01:44:12.720
different skill sets that allow them to be
link |
01:44:14.720
very good. I think we've seen
link |
01:44:16.720
AlphaStar understanding
link |
01:44:18.720
very well when to take battles and when
link |
01:44:20.720
to not do that
link |
01:44:22.720
also very good at micromanagement
link |
01:44:24.720
and moving the units around and so on
link |
01:44:26.720
and also very good at producing
link |
01:44:28.720
nonstop and trading off economy
link |
01:44:30.720
with building units
link |
01:44:32.720
but I have not
link |
01:44:34.720
perhaps seen as much as I would like
link |
01:44:36.720
this idea of the poker idea
link |
01:44:38.720
that you mentioned right.
link |
01:44:40.720
I'm not sure StarCraft or AlphaStar
link |
01:44:42.720
rather has developed a very
link |
01:44:44.720
deep understanding of
link |
01:44:46.720
what the opponent is doing
link |
01:44:48.720
and reacting to that and sort of
link |
01:44:50.720
trying to
link |
01:44:52.720
trick the player into doing something else or that
link |
01:44:54.720
you know so this kind of reasoning
link |
01:44:56.720
I would like to see more so I think
link |
01:44:58.720
purely from a research standpoint
link |
01:45:00.720
there's perhaps also quite a few
link |
01:45:02.720
things to be done there
link |
01:45:04.720
in the domain of StarCraft. Yeah in the
link |
01:45:06.720
domain of games I've seen some
link |
01:45:08.720
interesting work in sort of
link |
01:45:10.720
in even auctions manipulating
link |
01:45:12.720
other players sort of forming a belief
link |
01:45:14.720
state and just messing with
link |
01:45:16.720
people. Yeah it's called theory of mind
link |
01:45:18.720
so it's a fast
link |
01:45:20.720
theory of mind and StarCraft
link |
01:45:22.720
is kind of they're really
link |
01:45:24.720
made for each other so
link |
01:45:26.720
that would be very exciting to see
link |
01:45:28.720
those techniques applied to StarCraft
link |
01:45:30.720
or perhaps StarCraft driving
link |
01:45:32.720
new techniques as I said
link |
01:45:34.720
this is always the tension between the two.
link |
01:45:36.720
Wow, Oriol, thank you so much for talking.
link |
01:45:38.720
awesome it was great to be here thanks