Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12



link |
00:00:00.000
The following is a conversation with Tuomas Sandholm.
link |
00:00:03.440
He's a professor at CMU and co-creator of Libratus,
link |
00:00:06.880
which is the first AI system to beat top human players
link |
00:00:09.880
in the game of Heads Up No Limit Texas Holdem.
link |
00:00:13.000
He has published over 450 papers
link |
00:00:15.600
on game theory and machine learning,
link |
00:00:17.320
including a best paper in 2017 at NIPS,
link |
00:00:21.120
now renamed to NeurIPS,
link |
00:00:23.560
which is where I caught up with him for this conversation.
link |
00:00:27.040
His research and companies have had wide reaching impact
link |
00:00:30.680
in the real world,
link |
00:00:32.160
especially because he and his group
link |
00:00:34.400
not only propose new ideas,
link |
00:00:36.640
but also build systems to prove that these ideas work
link |
00:00:40.440
in the real world.
link |
00:00:42.120
This conversation is part of the MIT course
link |
00:00:44.640
on artificial general intelligence
link |
00:00:46.440
and the artificial intelligence podcast.
link |
00:00:49.040
If you enjoy it, subscribe on YouTube, iTunes,
link |
00:00:52.400
or simply connect with me on Twitter
link |
00:00:54.320
at Lex Fridman, spelled F R I D.
link |
00:00:58.080
And now here's my conversation with Tuomas Sandholm.
link |
00:01:03.080
Can you describe at the high level
link |
00:01:06.120
the game of poker, Texas Holdem, Heads Up Texas Holdem
link |
00:01:09.320
for people who might not be familiar with this card game?
link |
00:01:13.280
Yeah, happy to.
link |
00:01:14.440
So Heads Up No Limit Texas Holdem
link |
00:01:16.520
has really emerged in the AI community
link |
00:01:18.840
as a main benchmark for testing these
link |
00:01:21.360
application independent algorithms
link |
00:01:23.560
for imperfect information game solving.
link |
00:01:26.440
And this is a game that's actually played by humans.
link |
00:01:30.960
You don't see it that much on TV or in casinos
link |
00:01:33.960
because well, for various reasons,
link |
00:01:36.160
but you do see it in some expert level casinos
link |
00:01:40.240
and you see it in the best poker movies of all time.
link |
00:01:43.080
It's actually an event in the World Series of Poker,
link |
00:01:45.720
but mostly it's played online
link |
00:01:48.200
and typically for pretty big sums of money.
link |
00:01:50.880
And this is a game that usually only experts play.
link |
00:01:54.560
So if you go to your home game on a Friday night,
link |
00:01:58.720
it probably is not gonna be Heads Up No Limit Texas Holdem.
link |
00:02:01.280
It might be No Limit Texas Holdem in some cases,
link |
00:02:04.640
but typically for a big group and it's not as competitive.
link |
00:02:08.720
While Heads Up means it's two players.
link |
00:02:10.520
So it's really like me against you.
link |
00:02:13.360
Am I better or are you better?
link |
00:02:14.680
Much like chess or go in that sense,
link |
00:02:17.520
but an imperfect information game,
link |
00:02:19.520
which makes it much harder because I have to deal
link |
00:02:21.520
with issues of you knowing things that I don't know
link |
00:02:25.560
and I know things that you don't know
link |
00:02:27.200
instead of pieces being nicely laid on the board
link |
00:02:29.720
for both of us to see.
link |
00:02:31.120
So in Texas Holdem, there's two cards
link |
00:02:34.840
that you only see that belong to you.
link |
00:02:37.440
Yeah. And there is,
link |
00:02:38.520
they gradually lay out some cards
link |
00:02:40.400
that add up overall to five cards that everybody can see.
link |
00:02:44.080
Yeah. So the imperfect nature
link |
00:02:45.720
of the information is the two cards
link |
00:02:47.560
that you're holding in your hand.
link |
00:02:48.400
Up front, yeah.
link |
00:02:49.380
So as you said, you first get two cards
link |
00:02:51.840
in private each and then there's a betting round.
link |
00:02:55.200
Then you get three cards in public on the table.
link |
00:02:58.320
Then there's a betting round.
link |
00:02:59.240
Then you get the fourth card in public on the table.
link |
00:03:01.680
There's a betting round.
link |
00:03:02.580
Then you get the fifth card on the table.
link |
00:03:04.920
There's a betting round.
link |
00:03:05.760
So there's a total of four betting rounds
link |
00:03:07.480
and four tranches of information revelation if you will.
link |
00:03:11.140
Only the first tranche is private
link |
00:03:14.120
and then it's public from there.
link |
00:03:16.520
And this is probably by far the most popular game in AI
link |
00:03:24.040
and just the general public
link |
00:03:26.380
in terms of imperfect information.
link |
00:03:28.400
So that's probably the most popular spectator game
link |
00:03:32.520
to watch, right?
link |
00:03:33.400
So, which is why it's a super exciting game to tackle.
link |
00:03:37.260
So it's on the order of chess, I would say,
link |
00:03:40.480
in terms of popularity, in terms of AI setting it
link |
00:03:43.680
as the bar of what is intelligence.
link |
00:03:46.360
So in 2017, Libratus, how do you pronounce it?
link |
00:03:50.400
Libratus.
link |
00:03:51.220
Libratus.
link |
00:03:52.060
Libratus beats.
link |
00:03:52.900
A little Latin there.
link |
00:03:54.080
A little bit of Latin.
link |
00:03:55.520
Libratus beat four expert human players.
link |
00:04:01.040
Can you describe that event?
link |
00:04:03.080
What you learned from it?
link |
00:04:04.060
What was it like?
link |
00:04:04.900
What was the process in general
link |
00:04:06.860
for people who have not read the papers and the study?
link |
00:04:09.960
Yeah, so the event was that we invited
link |
00:04:12.920
four of the top 10 players,
link |
00:04:14.840
who are specialist players in Heads Up No Limit
link |
00:04:17.080
Texas Holdem, which is very important
link |
00:04:19.080
because this game is actually quite different
link |
00:04:21.400
than the multiplayer version.
link |
00:04:23.900
We brought them in to Pittsburgh
link |
00:04:25.680
to play at the Rivers Casino for 20 days.
link |
00:04:28.920
We wanted to get 120,000 hands in
link |
00:04:31.840
because we wanted to get statistical significance.
link |
00:04:36.160
So it's a lot of hands for humans to play,
link |
00:04:39.040
even for these top pros who play fairly quickly normally.
link |
00:04:42.840
So we couldn't just have one of them play so many hands.
link |
00:04:46.400
20 days, they were playing basically morning to evening.
link |
00:04:50.400
And I raised $200,000 as a little incentive for them to play.
link |
00:04:55.660
And the setup was such that they didn't all get $50,000.
link |
00:05:01.080
We actually paid them out
link |
00:05:02.640
based on how they did against the AI each.
link |
00:05:05.480
So they had an incentive to play as hard as they could,
link |
00:05:09.440
whether they're way ahead or way behind
link |
00:05:11.160
or right at the mark of beating the AI.
link |
00:05:13.760
And you don't make any money, unfortunately.
link |
00:05:16.000
Right, no, we can't make any money.
link |
00:05:17.920
So originally, a couple of years earlier,
link |
00:05:20.320
I actually explored whether we could actually play for money
link |
00:05:24.080
because that would be, of course, interesting as well,
link |
00:05:28.000
to play against the top people for money.
link |
00:05:29.520
But the Pennsylvania Gaming Board said no, so we couldn't.
link |
00:05:33.040
So this is much like an exhibit,
link |
00:05:36.400
like for a musician or a boxer or something like that.
link |
00:05:39.760
Nevertheless, they were keeping track of the money
link |
00:05:41.600
and it came out to close to $2 million, I think.
link |
00:05:48.200
So if it was for real money, if you were able to earn money,
link |
00:05:51.840
that was quite an impressive and inspiring achievement.
link |
00:05:55.360
Just a few details, what were the players looking at?
link |
00:05:59.280
Were they behind a computer?
link |
00:06:00.460
What was the interface like?
link |
00:06:02.080
Yes, they were playing much like they normally do.
link |
00:06:05.240
These top players, when they play this game,
link |
00:06:07.200
they play mostly online.
link |
00:06:08.680
So they're used to playing through a UI.
link |
00:06:11.640
And they did the same thing here.
link |
00:06:13.280
So there was this layout.
link |
00:06:14.520
You could imagine there's a table on a screen.
link |
00:06:17.920
There's the human sitting there,
link |
00:06:20.080
and then there's the AI sitting there.
link |
00:06:21.720
And the screen shows everything that's happening.
link |
00:06:24.560
The cards coming out and shows the bets being made.
link |
00:06:27.480
And we also had the betting history for the human.
link |
00:06:29.940
So if the human forgot what had happened in the hand so far,
link |
00:06:33.320
they could actually reference back and so forth.
link |
00:06:37.240
Is there a reason they were given access
link |
00:06:39.480
to the betting history?
link |
00:06:41.200
Well, we just, it didn't really matter.
link |
00:06:45.860
They wouldn't have forgotten anyway.
link |
00:06:47.360
These are top quality people.
link |
00:06:48.800
But we just wanted to put it out there
link |
00:06:51.300
so it's not a question of the human forgetting
link |
00:06:53.460
and the AI somehow trying to get advantage
link |
00:06:55.320
of better memory.
link |
00:06:56.760
So what was that like?
link |
00:06:57.640
I mean, that was an incredible accomplishment.
link |
00:06:59.720
So what did it feel like before the event?
link |
00:07:02.760
Did you have doubt, hope?
link |
00:07:05.640
Where was your confidence at?
link |
00:07:08.160
Yeah, that's great.
link |
00:07:09.240
So great question.
link |
00:07:10.160
So 18 months earlier, I had organized a similar brains
link |
00:07:14.200
versus AI competition with a previous AI called Claudico
link |
00:07:17.840
and we couldn't beat the humans.
link |
00:07:20.560
So this time around, it was only 18 months later.
link |
00:07:23.800
And I knew that this new AI, Libratus, was way stronger,
link |
00:07:27.820
but it's hard to say how you'll do against the top humans
link |
00:07:31.360
before you try.
link |
00:07:32.440
So I thought we had about a 50-50 shot.
link |
00:07:35.160
And the international betting sites put us
link |
00:07:38.880
as a four to one or five to one underdog.
link |
00:07:41.800
So it's kind of interesting that people really believe
link |
00:07:44.700
in people over AI.
link |
00:07:48.440
People don't just overbelieve in themselves,
link |
00:07:50.720
but they have overconfidence in other people as well
link |
00:07:53.280
compared to the performance of AI.
link |
00:07:55.440
And yeah, so we were a four to one or five to one underdog.
link |
00:07:59.120
And even after three days of beating the humans in a row,
link |
00:08:02.880
we were still 50-50 on the international betting sites.
link |
00:08:06.520
Do you think there's something special and magical
link |
00:08:09.040
about poker and the way people think about it,
link |
00:08:12.160
in the sense you have,
link |
00:08:14.600
I mean, even in chess, there are no Hollywood movies.
link |
00:08:17.320
Poker is the star of many movies.
link |
00:08:21.200
And there's this feeling that certain human facial
link |
00:08:26.640
expressions and body language, eye movement,
link |
00:08:30.760
all these tells are critical to poker.
link |
00:08:33.360
Like you can look into somebody's soul
link |
00:08:35.000
and understand their betting strategy and so on.
link |
00:08:37.880
So that's probably why, possibly,
link |
00:08:41.520
do you think that is why people have the confidence
link |
00:08:43.640
that humans will outperform?
link |
00:08:45.640
Because AI systems cannot, in this construct,
link |
00:08:48.920
perceive these kinds of tells.
link |
00:08:51.040
They're only looking at betting patterns
link |
00:08:53.200
and nothing else, betting patterns and statistics.
link |
00:08:58.200
So what's more important to you
link |
00:09:02.200
if you step back on human players, human versus human?
link |
00:09:06.120
What's the role of these tells,
link |
00:09:08.600
of these ideas that we romanticize?
link |
00:09:11.880
Yeah, so I'll split it into two parts.
link |
00:09:15.480
So one is why do humans trust humans more than AI
link |
00:09:20.480
and have overconfidence in humans?
link |
00:09:22.600
I think that's not really related to the tell question.
link |
00:09:25.920
It's just that they've seen these top players,
link |
00:09:28.600
how good they are, and they're really fantastic.
link |
00:09:31.040
So it's just hard to believe that an AI could beat them.
link |
00:09:36.040
So I think that's where that comes from.
link |
00:09:37.680
And that's actually maybe a more general lesson about AI.
link |
00:09:40.600
That until you've seen it overperform a human,
link |
00:09:43.200
it's hard to believe that it could.
link |
00:09:45.080
But then the tells, a lot of these top players,
link |
00:09:50.560
they're so good at hiding tells
link |
00:09:52.760
that among the top players,
link |
00:09:56.240
it's actually not really worth it
link |
00:09:59.480
for them to invest a lot of effort
link |
00:10:01.200
trying to find tells in each other
link |
00:10:03.160
because they're so good at hiding them.
link |
00:10:05.640
So yes, at the kind of Friday evening game,
link |
00:10:09.840
tells are gonna be a huge thing.
link |
00:10:11.800
You can read other people.
link |
00:10:13.160
And if you're a good reader,
link |
00:10:14.120
you'll read them like an open book.
link |
00:10:16.440
But at the top levels of poker now,
link |
00:10:18.280
the tells become a much smaller and smaller aspect
link |
00:10:21.960
of the game as you go to the top levels.
link |
00:10:24.480
The amount of strategies, the amount of possible actions
link |
00:10:28.120
is very large, 10 to the power of 100 plus.
link |
00:10:35.400
So there has to be some, I've read a few of the papers
link |
00:10:37.880
related, it has to form some abstractions
link |
00:10:42.080
of various hands and actions.
link |
00:10:44.040
So what kind of abstractions are effective
link |
00:10:47.560
for the game of poker?
link |
00:10:49.200
Yeah, so you're exactly right.
link |
00:10:50.880
So when you go from a game tree that's 10 to the 161,
link |
00:10:55.360
especially in an imperfect information game,
link |
00:10:58.000
it's way too large to solve directly,
link |
00:11:00.200
even with our fastest equilibrium finding algorithms.
link |
00:11:03.280
So you wanna abstract it first.
link |
00:11:07.200
And abstraction in games is much trickier
link |
00:11:10.920
than abstraction in MDPs or other single agent settings.
link |
00:11:15.440
Because you have these abstraction pathologies
link |
00:11:17.760
that if I have a finer grained abstraction,
link |
00:11:19.880
the strategy that I can get from that for the real game
link |
00:11:23.240
might actually be worse than the strategy
link |
00:11:25.240
I can get from the coarse grained abstraction.
link |
00:11:27.160
So you have to be very careful.
link |
00:11:28.760
Now the kinds of abstractions, just to zoom out,
link |
00:11:31.080
we're talking about, there's the hands abstractions
link |
00:11:34.480
and then there's betting strategies.
link |
00:11:37.280
Yeah, betting actions, yeah.
link |
00:11:38.600
Betting actions.
link |
00:11:39.440
So there's information abstraction,
link |
00:11:41.640
to talk about general games, information abstraction,
link |
00:11:44.720
which is the abstraction of what chance does.
link |
00:11:47.560
And this would be the cards in the case of poker.
link |
00:11:50.080
And then there's action abstraction,
link |
00:11:52.480
which is abstracting the actions of the actual players,
link |
00:11:57.000
which would be bets in the case of poker.
link |
00:11:59.560
Yourself and the other players?
link |
00:12:01.320
Yes, yourself and other players.
link |
00:12:03.680
And for information abstraction,
link |
00:12:08.280
we were completely automated.
link |
00:12:11.160
So these are algorithms,
link |
00:12:13.840
but they do what we call potential aware abstraction,
link |
00:12:16.760
where we don't just look at the value of the hand,
link |
00:12:19.000
but also how it might materialize
link |
00:12:20.840
into good or bad hands over time.
link |
00:12:22.560
And it's a certain kind of bottom up process
link |
00:12:25.280
with integer programming there and clustering
link |
00:12:27.640
and various aspects of how you build this abstraction.
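The potential aware idea, bucketing hands by how their equity can evolve rather than by a single current value, can be caricatured in a few lines of Python. This is a toy sketch with made-up equity distributions and a simple L1 distance, not the actual Libratus pipeline, which uses integer programming and more careful distance measures:

```python
import random

random.seed(0)

def future_equity_histogram(bins=5):
    # Stand-in for rollouts: a made-up distribution over the equities
    # a hand might reach on later streets, normalized to sum to 1.
    raw = [random.random() for _ in range(bins)]
    total = sum(raw)
    return [x / total for x in raw]

# Twenty toy "hands", each summarized by its future-equity distribution.
hands = {h: future_equity_histogram() for h in range(20)}

def distance(a, b):
    # L1 distance between histograms; the literature typically uses
    # an earth mover's distance here instead.
    return sum(abs(x - y) for x, y in zip(a, b))

# Assign each hand to the nearest of three fixed centroids, a crude
# stand-in for the clustering step that produces abstraction buckets.
centroids = [hands[0], hands[1], hands[2]]
buckets = {h: min(range(len(centroids)),
                  key=lambda k: distance(hist, centroids[k]))
           for h, hist in hands.items()}

print(sorted(set(buckets.values())))  # [0, 1, 2]
```

Hands with similar distributions over future equity, not just similar current strength, end up in the same bucket and share one strategy in the abstracted game.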
link |
00:12:31.480
And then in the action abstraction,
link |
00:12:34.400
there it's largely based on how humans and other AIs
link |
00:12:40.520
have played this game in the past.
link |
00:12:42.320
But in the beginning,
link |
00:12:43.880
we actually used an automated action abstraction technology,
link |
00:12:47.680
which is provably convergent
link |
00:12:51.240
in that it finds the optimal combination of bet sizes,
link |
00:12:54.040
but it's not very scalable.
link |
00:12:55.480
So we couldn't use it for the whole game,
link |
00:12:57.280
but we use it for the first couple of betting actions.
link |
00:12:59.880
So what's more important, the strength of the hand,
link |
00:13:03.080
so the information abstraction or the how you play them,
link |
00:13:09.320
the actions, does it, you know,
link |
00:13:11.640
the romanticized notion again,
link |
00:13:13.200
is that it doesn't matter what hands you have,
link |
00:13:15.600
that the actions, the betting may be the way you win
link |
00:13:19.240
no matter what hands you have.
link |
00:13:20.320
Yeah, so that's why you have to play a lot of hands
link |
00:13:23.280
so that the role of luck gets smaller.
link |
00:13:26.800
So you could otherwise get lucky and get some good hands
link |
00:13:29.920
and then you're gonna win the match.
link |
00:13:31.480
Even with thousands of hands, you can get lucky
link |
00:13:35.280
because there's so much variance
link |
00:13:36.720
in No Limit Texas Holdem because if we both go all in,
link |
00:13:40.880
it's a huge amount of variance, so there are these
link |
00:13:43.640
massive swings in No Limit Texas Holdem.
link |
00:13:47.800
So that's why you have to play not just thousands,
link |
00:13:50.240
but over 100,000 hands to get statistical significance.
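The need for on the order of 100,000 hands follows from a back-of-envelope significance calculation. The edge and per-hand standard deviation below are assumed numbers for illustration, not figures from the match:

```python
import math

def hands_needed(edge_bb, sd_bb, z=1.96):
    # To call a win-rate edge significant at ~95% confidence, we need
    # z * sd / sqrt(n) <= edge, i.e. n >= (z * sd / edge) ** 2.
    return math.ceil((z * sd_bb / edge_bb) ** 2)

# A modest edge (in big blinds per hand) against large per-hand
# swings pushes the required sample into the tens of thousands.
print(hands_needed(edge_bb=0.10, sd_bb=5.0))   # 9604
print(hands_needed(edge_bb=0.05, sd_bb=8.0))   # 98345, on the order of 100,000
```

Halving the edge or raising the per-hand variance quadruples the sample size, which is why the high-variance all-in situations in No Limit make statistical significance so expensive.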
link |
00:13:55.000
So let me ask this question another way.
link |
00:13:57.880
If you didn't even look at your hands,
link |
00:14:02.000
but they didn't know that, the opponents didn't know that,
link |
00:14:04.560
how well would you be able to do?
link |
00:14:06.680
Oh, that's a good question.
link |
00:14:07.760
There's actually, I heard this story
link |
00:14:09.600
that there's this Norwegian female poker player
link |
00:14:11.800
called Annette Obrestad who actually won a tournament
link |
00:14:15.240
by doing exactly that, but that would be extremely rare.
link |
00:14:18.640
So you cannot really play well that way.
link |
00:14:23.440
Okay, so the hands do have some role to play, okay.
link |
00:14:27.840
So Libratus does not use, as far as I understand,
link |
00:14:33.120
learning methods, deep learning.
link |
00:14:35.320
Is there room for learning in,
link |
00:14:40.600
there's no reason why Libratus couldn't combine
link |
00:14:44.120
with an AlphaGo type approach for estimating
link |
00:14:46.400
the quality, a function estimator.
link |
00:14:49.200
What are your thoughts on this,
link |
00:14:52.040
maybe as compared to another algorithm
link |
00:14:54.760
which I'm not that familiar with, DeepStack,
link |
00:14:56.720
the engine that does use deep learning,
link |
00:14:59.280
that it's unclear how well it does,
link |
00:15:01.560
but nevertheless uses deep learning.
link |
00:15:03.480
So what are your thoughts about learning methods
link |
00:15:05.400
to aid in the way that Libratus plays in the game of poker?
link |
00:15:09.280
Yeah, so as you said,
link |
00:15:10.640
Libratus did not use learning methods
link |
00:15:13.080
and played very well without them.
link |
00:15:15.680
Since then, actually here,
link |
00:15:17.840
we have a couple of papers on things
link |
00:15:20.000
that do use learning techniques.
link |
00:15:22.360
Excellent.
link |
00:15:24.440
And deep learning in particular.
link |
00:15:26.360
And sort of the way you're talking about
link |
00:15:29.920
where it's learning an evaluation function,
link |
00:15:33.360
but in imperfect information games,
link |
00:15:37.400
unlike let's say in Go or now also in chess and shogi,
link |
00:15:42.440
it's not sufficient to learn an evaluation for a state
link |
00:15:47.400
because the value of an information set
link |
00:15:52.920
depends not only on the exact state,
link |
00:15:55.400
but it also depends on both players beliefs.
link |
00:15:59.200
Like if I have a bad hand,
link |
00:16:01.240
I'm much better off if the opponent thinks I have a good hand
link |
00:16:04.720
and vice versa.
link |
00:16:05.560
If I have a good hand,
link |
00:16:06.480
I'm much better off if the opponent believes
link |
00:16:09.360
I have a bad hand.
link |
00:16:11.360
So the value of a state is not just a function of the cards.
link |
00:16:15.640
It depends on, if you will, the path of play,
link |
00:16:19.600
but only to the extent that it's captured
link |
00:16:22.040
in the belief distributions.
link |
00:16:23.720
So that's why it's not as simple
link |
00:16:26.240
as it is in perfect information games.
link |
00:16:29.320
And I don't wanna say it's simple there either.
link |
00:16:31.080
It's of course very complicated computationally there too,
link |
00:16:34.200
but at least conceptually, it's very straightforward.
link |
00:16:36.520
There's a state, there's an evaluation function.
link |
00:16:38.760
You can try to learn it.
link |
00:16:39.800
Here, you have to do something more.
link |
00:16:43.280
And what we do is in one of these papers,
link |
00:16:47.160
we're looking at where we allow the opponent
link |
00:16:50.800
to actually take different strategies
link |
00:16:53.000
at the leaf of the search tree, if you will.
link |
00:16:56.440
And that is a different way of doing it.
link |
00:16:59.840
And it doesn't assume therefore a particular way
link |
00:17:02.560
that the opponent plays,
link |
00:17:04.040
but it allows the opponent to choose
link |
00:17:05.840
from a set of different continuation strategies.
link |
00:17:09.800
And that forces us to not be too optimistic
link |
00:17:13.400
in a look ahead search.
link |
00:17:15.520
And that's one way you can do sound look ahead search
link |
00:17:19.040
in imperfect information games,
link |
00:17:21.480
which is very difficult.
link |
00:17:23.360
And you were asking about DeepStack.
link |
00:17:26.080
What they did, it was very different than what we do,
link |
00:17:29.280
either in Libratus or in this new work.
link |
00:17:32.000
They were randomly generating various situations
link |
00:17:35.440
in the game.
link |
00:17:36.440
Then they were doing the look ahead
link |
00:17:38.080
from there to the end of the game,
link |
00:17:39.840
as if that was the start of a different game.
link |
00:17:42.960
And then they were using deep learning
link |
00:17:44.920
to learn those values of those states,
link |
00:17:47.960
but the states were not just the physical states.
link |
00:17:50.280
They include belief distributions.
link |
00:17:52.560
When you talk about look ahead for DeepStack
link |
00:17:56.800
or with Libratus, does it mean,
link |
00:17:59.480
considering every possibility that the game can evolve,
link |
00:18:02.680
are we talking about extremely,
link |
00:18:04.280
sort of this exponentially growth of a tree?
link |
00:18:06.880
Yes, so we're talking about exactly that.
link |
00:18:11.280
Much like you do in alpha beta search
link |
00:18:14.280
or Monte Carlo tree search, but with different techniques.
link |
00:18:17.480
So there's a different search algorithm.
link |
00:18:19.280
And then we have to deal with the leaves differently.
link |
00:18:21.920
So if you think about what Libratus did,
link |
00:18:24.000
we didn't have to worry about this
link |
00:18:25.520
because we only did it at the end of the game.
link |
00:18:28.560
So we would always terminate into a real situation
link |
00:18:32.280
and we would know what the payout is.
link |
00:18:34.000
It didn't do these depth limited lookaheads,
link |
00:18:36.880
but now in this new paper, which is called depth limited,
link |
00:18:40.680
I think it's called depth limited search
link |
00:18:42.120
for imperfect information games,
link |
00:18:43.880
we can actually do sound depth limited lookahead.
link |
00:18:47.040
So we can actually start to do the look ahead
link |
00:18:49.240
from the beginning of the game on,
link |
00:18:51.080
because that's too complicated to do
link |
00:18:53.400
for this whole long game.
link |
00:18:54.920
So in Libratus, we were just doing it for the end.
link |
00:18:57.680
So, and then the other side, this belief distribution,
link |
00:19:00.720
so is it explicitly modeled what kind of beliefs
link |
00:19:05.320
that the opponent might have?
link |
00:19:07.400
Yeah, it is explicitly modeled, but it's not assumed.
link |
00:19:11.840
The beliefs are actually output, not input.
link |
00:19:15.400
Of course, the starting beliefs are input,
link |
00:19:18.840
but they just fall from the rules of the game
link |
00:19:20.640
because we know that the dealer deals uniformly
link |
00:19:23.520
from the deck, so I know that every pair of cards
link |
00:19:27.720
that you might have is equally likely.
link |
00:19:30.440
I know that for a fact, that just follows
link |
00:19:32.200
from the rules of the game.
link |
00:19:33.160
Of course, except the two cards that I have,
link |
00:19:35.200
I know you don't have those.
link |
00:19:36.560
Yeah.
link |
00:19:37.560
You have to take that into account.
link |
00:19:38.720
That's called card removal and that's very important.
link |
00:19:40.920
Is the dealing always coming from a single deck
link |
00:19:43.760
in Heads Up, so you can assume.
link |
00:19:45.880
Single deck, so you know that if I have the ace of spades,
link |
00:19:50.880
I know you don't have an ace of spades.
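This card-removal effect on the starting belief distribution can be sketched directly; the particular hole cards below are an assumed example for illustration:

```python
from itertools import combinations

# Build a standard 52-card deck as rank + suit strings.
ranks = "23456789TJQKA"
suits = "shdc"
deck = [r + s for r in ranks for s in suits]

# Suppose I was dealt the ace and king of spades (an assumed example).
my_hand = {"As", "Ks"}

# Card removal: the opponent's two private cards are drawn uniformly
# from the 50 cards I cannot see, so no hand containing my cards
# gets any probability mass.
unseen = [c for c in deck if c not in my_hand]
opponent_hands = list(combinations(unseen, 2))
belief = 1 / len(opponent_hands)

print(len(opponent_hands))  # C(50, 2) = 1225 equally likely hands
```

These 1,225 equally likely hands are exactly the starting belief that follows from the rules of the game, before any betting has been observed.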
link |
00:19:53.560
Great, so in the beginning, your belief is basically
link |
00:19:56.880
the fact that it's a fair dealing of hands,
link |
00:19:59.320
but how do you start to adjust that belief?
link |
00:20:02.800
Well, that's where this beauty of game theory comes.
link |
00:20:06.800
So Nash equilibrium, which John Nash introduced in 1950,
link |
00:20:10.920
introduces what rational play is
link |
00:20:13.800
when you have more than one player.
link |
00:20:16.040
And these are pairs of strategies
link |
00:20:18.440
where strategies are contingency plans,
link |
00:20:20.360
one for each player.
link |
00:20:22.880
So that neither player wants to deviate
link |
00:20:25.720
to a different strategy,
link |
00:20:26.960
given that the other doesn't deviate.
link |
00:20:29.160
But as a side effect, you get the beliefs from Bayes' rule.
link |
00:20:33.840
So Nash equilibrium really isn't just deriving strategies
link |
00:20:36.440
in these imperfect information games.
link |
00:20:38.360
Nash equilibrium doesn't just define strategies.
link |
00:20:41.920
It also defines beliefs for both of us
link |
00:20:44.960
and defines beliefs for each state.
link |
00:20:48.840
So at each state, it's called information sets.
link |
00:20:53.280
At each information set in the game,
link |
00:20:55.560
there's a set of different states that we might be in,
link |
00:20:59.000
but I don't know which one we're in.
link |
00:21:01.760
Nash equilibrium tells me exactly
link |
00:21:03.400
what is the probability distribution
link |
00:21:05.000
over those real world states in my mind.
link |
00:21:08.280
How does Nash equilibrium give you that distribution?
link |
00:21:11.440
So why?
link |
00:21:12.280
I'll do a simple example.
link |
00:21:13.320
So you know the game Rock, Paper, Scissors?
link |
00:21:16.760
So we can draw it as player one moves first
link |
00:21:20.000
and then player two moves.
link |
00:21:21.600
But of course, it's important that player two
link |
00:21:24.520
doesn't know what player one moved,
link |
00:21:26.400
otherwise player two would win every time.
link |
00:21:28.600
So we can draw that as an information set
link |
00:21:30.480
where player one makes one of three moves first,
link |
00:21:33.280
and then there's an information set for player two.
link |
00:21:36.200
So player two doesn't know which of those nodes
link |
00:21:39.920
the world is in.
link |
00:21:41.800
But once we know the strategy for player one,
link |
00:21:44.920
Nash equilibrium will say that you play one third Rock,
link |
00:21:47.320
one third Paper, one third Scissors.
link |
00:21:49.400
From that, I can derive my beliefs on the information set
link |
00:21:52.600
that they're one third, one third, one third.
link |
00:21:54.480
So Bayes gives you that.
link |
00:21:56.280
Bayes gives you.
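The Bayes step in this Rock, Paper, Scissors example is small enough to write out; this is an illustrative sketch, not code from Libratus:

```python
# Player one's Nash equilibrium strategy in Rock, Paper, Scissors.
strategy = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}

# Player two's information set has one node per possible first move.
# Bayes' rule: P(node | info set reached) =
#   P(reach node) / sum of reach probabilities over the info set.
total_reach = sum(strategy.values())
beliefs = {move: p / total_reach for move, p in strategy.items()}

print(beliefs)  # each node gets belief one third
```

With a non-uniform strategy for player one, the same division would yield non-uniform beliefs, which is how the equilibrium strategies pin down beliefs at every information set.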
link |
00:21:57.560
But is that specific to a particular player,
link |
00:21:59.760
or is it something you quickly update
link |
00:22:03.960
with the specific player?
link |
00:22:05.040
No, the game theory isn't really player specific.
link |
00:22:08.800
So that's also why we don't need any data.
link |
00:22:11.720
We don't need any history
link |
00:22:12.760
how these particular humans played in the past
link |
00:22:14.800
or how any AI or human had played before.
link |
00:22:17.400
It's all about rationality.
link |
00:22:20.240
So the AI just thinks about
link |
00:22:22.720
what would a rational opponent do?
link |
00:22:24.880
And what would I do if I am rational?
link |
00:22:28.000
And that's the idea of game theory.
link |
00:22:31.080
So it's really a data free, opponent free approach.
link |
00:22:35.560
So it comes from the design of the game
link |
00:22:37.680
as opposed to the design of the player.
link |
00:22:40.040
Exactly, there's no opponent modeling per se.
link |
00:22:43.080
I mean, we've done some work on combining opponent modeling
link |
00:22:45.600
with game theory so you can exploit weak players even more,
link |
00:22:48.840
but that's another strand.
link |
00:22:50.280
And in Libratus, we didn't turn that on.
link |
00:22:52.320
So I decided that these players are too good.
link |
00:22:55.000
And when you start to exploit an opponent,
link |
00:22:58.080
you typically open yourself up to exploitation.
link |
00:23:01.800
And these guys have so few holes to exploit
link |
00:23:04.000
and they're world's leading experts in counter exploitation.
link |
00:23:06.760
So I decided that we're not gonna turn that stuff on.
link |
00:23:09.200
Actually, I saw a few of your papers exploiting opponents.
link |
00:23:12.160
It sounded very interesting to explore.
link |
00:23:15.720
Do you think there's room for exploitation
link |
00:23:17.880
generally outside of Libratus?
link |
00:23:19.920
Is there a subject or people differences
link |
00:23:24.080
that could be exploited, maybe not just in poker,
link |
00:23:27.920
but in general interactions and negotiations,
link |
00:23:30.440
all these other domains that you're considering?
link |
00:23:33.480
Yeah, definitely.
link |
00:23:34.680
We've done some work on that.
link |
00:23:35.920
And I really like the hybrid idea too.
link |
00:23:39.880
So you figure out what would a rational opponent do.
link |
00:23:43.440
And by the way, that's safe in these zero sum games,
link |
00:23:46.280
two player zero sum games,
link |
00:23:47.480
because if the opponent does something irrational,
link |
00:23:49.560
yes, it might throw off my beliefs,
link |
00:23:53.080
but the amount that the player can gain
link |
00:23:55.760
by throwing off my belief is always less
link |
00:23:59.160
than they lose by playing poorly.
link |
00:24:01.800
So it's safe.
link |
00:24:03.080
But still, if somebody's weak as a player,
link |
00:24:06.720
you might wanna play differently to exploit them more.
link |
00:24:10.240
So you can think about it this way,
link |
00:24:12.040
a game theoretic strategy is unbeatable,
link |
00:24:15.600
but it doesn't maximally beat the other opponent.
link |
00:24:19.600
So the winnings per hand might be better
link |
00:24:22.800
with a different strategy.
link |
00:24:24.240
And the hybrid is that you start
link |
00:24:25.720
from a game theoretic approach.
link |
00:24:27.080
And then as you gain data about the opponent
link |
00:24:30.840
in certain parts of the game tree,
link |
00:24:32.600
then in those parts of the game tree,
link |
00:24:34.360
you start to tweak your strategy more and more
link |
00:24:37.800
towards exploitation while still staying fairly close
link |
00:24:40.960
to the game theoretic strategy
link |
00:24:42.160
so as to not open yourself up to exploitation too much.
link |
00:24:46.840
How do you do that?
link |
00:24:48.320
Do you try to vary up strategies, make it unpredictable?
link |
00:24:53.640
It's like, what is it, tit for tat strategies
link |
00:24:57.520
in Prisoner's Dilemma or?
link |
00:25:00.720
Well, that's a repeated game.
link |
00:25:03.240
Repeated games.
link |
00:25:04.080
Simple Prisoner's Dilemma, repeated games.
link |
00:25:06.520
But even there, there's no proof that says
link |
00:25:08.760
that that's the best thing.
link |
00:25:10.080
But experimentally, it actually does well.
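Tit for tat is easy to make concrete. Below is a minimal, self-contained sketch (illustrative only, with standard Prisoner's Dilemma payoffs; it is not code from Libratus or the conversation): cooperate first, then mirror the opponent's previous move.

```python
# Illustrative payoffs for the Prisoner's Dilemma:
# (my move, their move) -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Repeated game: each strategy sees only the opponent's past moves."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b
```

Against itself, tit for tat cooperates throughout; against a pure defector it loses only the first round and then defends itself, which matches the experimental observation that it does well without any optimality proof.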
link |
00:25:13.280
So what kind of games are there, first of all?
link |
00:25:15.320
I don't know if this is something
link |
00:25:17.040
that you could just summarize.
link |
00:25:18.600
There's perfect information games
link |
00:25:20.360
where all the information's on the table.
link |
00:25:22.400
There are imperfect information games.
link |
00:25:25.480
There's repeated games that you play over and over.
link |
00:25:28.560
There's zero sum games.
link |
00:25:31.320
There's non zero sum games.
link |
00:25:34.440
And then there's a really important distinction
link |
00:25:37.520
you're making, two player versus more players.
link |
00:25:40.720
So what other games are there?
link |
00:25:44.760
And what's the difference, for example,
link |
00:25:46.160
with this two player game versus more players?
link |
00:25:50.040
What are the key differences in your view?
link |
00:25:51.680
So let me start from the basics.
link |
00:25:54.600
So a repeated game is a game where the same exact game
link |
00:25:59.600
is played over and over.
link |
00:26:01.800
In these extensive form games, where it's,
link |
00:26:05.800
think about tree form, maybe with these information sets
link |
00:26:08.480
to represent incomplete information,
link |
00:26:11.400
you can have kind of repetitive interactions.
link |
00:26:14.840
Even repeated games are a special case of that, by the way.
link |
00:26:17.760
But the game doesn't have to be exactly the same.
link |
00:26:21.520
It's like in sourcing auctions.
link |
00:26:23.040
Yes, we're gonna see the same supply base year to year,
link |
00:26:26.320
but what I'm buying is a little different every time.
link |
00:26:28.800
And the supply base is a little different every time
link |
00:26:31.000
and so on.
link |
00:26:31.840
So it's not really repeated.
link |
00:26:33.400
So to find a purely repeated game
link |
00:26:35.680
is actually very rare in the world.
link |
00:26:37.840
So they're really a very coarse model of what's going on.
link |
00:26:42.840
Then if you move up from just repeated,
link |
00:26:46.360
simple repeated matrix games,
link |
00:26:49.040
not all the way to extensive form games,
link |
00:26:50.800
but in between, there are stochastic games,
link |
00:26:53.600
where, you know, there's these,
link |
00:26:57.000
you think about it like these little matrix games.
link |
00:27:00.520
And when you take an action and your opponent takes an action,
link |
00:27:04.200
they determine not which next state I'm going to,
link |
00:27:07.680
next game I'm going to,
link |
00:27:09.120
but the distribution over next games
link |
00:27:11.440
where I might be going to.
link |
00:27:13.360
So that's the stochastic game.
link |
00:27:15.360
But it's like matrix games, repeated games, stochastic games,
link |
00:27:19.000
extensive form games.
link |
00:27:20.400
That is from less to more general.
link |
00:27:23.040
And poker is an example of the last one.
link |
00:27:26.280
So it's really in the most general setting.
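The stochastic-game idea described here, where the joint action selects a distribution over the next matrix game, can be sketched as follows. This is a hypothetical two-state example; the state names, payoffs, and transition probabilities are all made up for illustration.

```python
import random

# Hypothetical two-state stochastic game. Each state is a small matrix
# game; the joint action determines the *distribution* over which game
# is played next, not the next game itself.
PAYOFFS = {
    "calm":  [[1, 0], [0, 1]],
    "storm": [[0, 2], [2, 0]],
}
# (state, row action, col action) -> probability of moving to "storm"
TO_STORM = {
    ("calm", 0, 0): 0.1, ("calm", 0, 1): 0.5,
    ("calm", 1, 0): 0.5, ("calm", 1, 1): 0.9,
    ("storm", 0, 0): 0.7, ("storm", 0, 1): 0.7,
    ("storm", 1, 0): 0.3, ("storm", 1, 1): 0.3,
}

def step(state, a_row, a_col, rng):
    """One stage: collect the matrix payoff, then sample the next game."""
    reward = PAYOFFS[state][a_row][a_col]
    to_storm = TO_STORM[(state, a_row, a_col)]
    next_state = "storm" if rng.random() < to_storm else "calm"
    return reward, next_state
```

A repeated game is the special case with a single state that always transitions to itself.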
link |
00:27:29.560
Extensive form games.
link |
00:27:30.640
And that's kind of what the AI community has been working on
link |
00:27:34.520
and being benchmarked on
link |
00:27:36.280
with this Heads Up No Limit Texas Holdem.
link |
00:27:38.040
Can you describe extensive form games?
link |
00:27:39.760
What's the model here?
link |
00:27:41.560
Yeah, so if you're familiar with the tree form,
link |
00:27:44.320
so it's really the tree form.
link |
00:27:45.760
Like in chess, there's a search tree.
link |
00:27:47.560
Versus a matrix.
link |
00:27:48.720
Versus a matrix, yeah.
link |
00:27:50.080
And the matrix is called the matrix form
link |
00:27:53.000
or bi matrix form or normal form game.
link |
00:27:55.320
And here you have the tree form.
link |
00:27:57.080
So you can actually do certain types of reasoning there
link |
00:28:00.000
that you lose the information when you go to normal form.
link |
00:28:04.680
There's a certain form of equivalence.
link |
00:28:07.000
Like if you go from tree form and you say that
link |
00:28:08.880
every possible contingency plan is a strategy.
link |
00:28:12.720
Then I can actually go back to the normal form,
link |
00:28:15.080
but I lose some information from the lack of sequentiality.
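The "every possible contingency plan is a strategy" conversion can be illustrated directly. The information sets and action names below are hypothetical, chosen only to show the combinatorics.

```python
from itertools import product

# Hypothetical player with three information sets offering 2, 3, and 2
# actions. A pure normal-form strategy is one complete contingency
# plan: an action fixed at every information set, even at sets that a
# given line of play never reaches.
info_set_actions = [["a", "b"], ["x", "y", "z"], ["l", "r"]]
strategies = list(product(*info_set_actions))
# 2 * 3 * 2 = 12 plans; this product grows exponentially with the
# number of information sets, which is what the sequential (tree)
# representation avoids spelling out.
```

This is why the normal form of a nontrivial tree is huge even when the tree itself is small.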
link |
00:28:18.600
Then the multiplayer versus two player distinction
link |
00:28:21.280
is an important one.
link |
00:28:22.880
So two player games in zero sum
link |
00:28:27.320
are conceptually easier and computationally easier.
link |
00:28:32.840
They're still huge like this one,
link |
00:28:36.000
but they're conceptually easier and computationally easier
link |
00:28:39.680
in that conceptually, you don't have to worry about
link |
00:28:42.920
which equilibrium is the other guy going to play
link |
00:28:45.360
when there are multiple,
link |
00:28:46.640
because any equilibrium strategy is a best response
link |
00:28:49.920
to any other equilibrium strategy.
link |
00:28:52.000
So I can play a different equilibrium from you
link |
00:28:54.360
and we'll still get the right values of the game.
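This interchangeability rests on the fact that a two-player zero-sum game has a unique value. For a 2x2 game with a fully mixed equilibrium, the row player's mix and the value follow from indifference between the columns; the sketch below assumes that interiority and is illustrative, not code from the conversation.

```python
def row_equilibrium_2x2(A):
    """Row player's equilibrium mix and the game value for a 2x2
    zero-sum game with row-player payoff matrix A, assuming a fully
    mixed (interior) equilibrium: the row mix makes the column player
    indifferent between her two columns."""
    # Probability p on row 0 making both columns equally good:
    # p*A[0][0] + (1-p)*A[1][0] == p*A[0][1] + (1-p)*A[1][1]
    num = A[1][1] - A[1][0]
    den = (A[0][0] - A[1][0]) - (A[0][1] - A[1][1])
    p = num / den
    value = p * A[0][0] + (1 - p) * A[1][0]
    return p, value
```

For matching pennies, `[[1, -1], [-1, 1]]`, this gives p = 0.5 and value 0; the value is what each side can guarantee no matter which equilibrium the other one picks.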
link |
00:28:57.320
That falls apart even with two players
link |
00:28:59.240
when you have general sum games.
link |
00:29:01.360
Even without cooperation just in general.
link |
00:29:03.120
Even without cooperation.
link |
00:29:04.800
So there's a big gap from two player zero sum
link |
00:29:07.640
to two player general sum or even to three player zero sum.
link |
00:29:11.160
That's a big gap, at least in theory.
link |
00:29:14.280
Can you maybe non mathematically provide the intuition
link |
00:29:18.920
why it all falls apart with three or more players?
link |
00:29:22.120
It seems like you should still be able to have
link |
00:29:24.400
a Nash equilibrium that's instructive, that holds.
link |
00:29:31.280
Okay, so it is true that all finite games
link |
00:29:36.000
have a Nash equilibrium.
link |
00:29:38.200
So this is what John Nash actually proved.
link |
00:29:41.080
So they do have a Nash equilibrium.
link |
00:29:42.920
That's not the problem.
link |
00:29:43.840
The problem is that there can be many.
link |
00:29:46.600
And then there's a question of which equilibrium to select.
link |
00:29:50.400
So, and if you select your strategy
link |
00:29:52.200
from a different equilibrium and I select mine,
link |
00:29:57.920
then what does that mean?
link |
00:29:59.920
And in these non zero sum games,
link |
00:30:02.080
we may lose some joint benefit
link |
00:30:05.720
by being just simply stupid.
link |
00:30:07.040
We could actually both be better off
link |
00:30:08.400
if we did something else.
link |
00:30:09.920
And in three player, you get other problems
link |
00:30:11.760
also like collusion.
link |
00:30:13.200
Like maybe you and I can gang up on a third player
link |
00:30:16.560
and we can do radically better by colluding.
link |
00:30:19.800
So there are lots of issues that come up there.
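The equilibrium selection problem is already visible in a 2x2 coordination game. The sketch below uses hypothetical "Battle of the Sexes" payoffs and enumerates the pure equilibria; if each player picks their strategy from a different one, the joint play is no longer an equilibrium and both are worse off.

```python
# Hypothetical "Battle of the Sexes" payoffs:
# (row action, col action) -> (row payoff, col payoff).
PAYOFFS = {
    (0, 0): (2, 1), (0, 1): (0, 0),
    (1, 0): (0, 0), (1, 1): (1, 2),
}

def is_pure_nash(a, b):
    """True if neither player gains by deviating unilaterally."""
    row_payoff, col_payoff = PAYOFFS[(a, b)]
    row_ok = all(PAYOFFS[(alt, b)][0] <= row_payoff for alt in (0, 1))
    col_ok = all(PAYOFFS[(a, alt)][1] <= col_payoff for alt in (0, 1))
    return row_ok and col_ok

equilibria = [(a, b) for a in (0, 1) for b in (0, 1) if is_pure_nash(a, b)]
# equilibria == [(0, 0), (1, 1)]: two pure equilibria with different
# payoffs. If the row player plays from one and the column player from
# the other, the mismatched profile (0, 1) pays (0, 0) to both.
```

In a zero-sum game this mismatch cannot cost you the value; in a general-sum game it can, which is the gap being described.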
link |
00:30:22.200
So Noah Brown, the student you work with on this
link |
00:30:25.640
has mentioned, I looked through the AMA on Reddit.
link |
00:30:29.360
He mentioned that the ability of poker players
link |
00:30:31.280
to collaborate will make the game.
link |
00:30:33.800
He was asked the question of,
link |
00:30:35.200
how would you make the game of poker,
link |
00:30:37.920
or both of you were asked the question,
link |
00:30:39.280
how would you make the game of poker
link |
00:30:41.560
beyond being solvable by current AI methods?
link |
00:30:47.000
And he said that there's not many ways
link |
00:30:50.560
of making poker more difficult,
link |
00:30:53.120
but a collaboration or cooperation between players
link |
00:30:57.760
would make it extremely difficult.
link |
00:30:59.760
So can you provide the intuition behind why that is,
link |
00:31:03.320
if you agree with that idea?
link |
00:31:05.280
Yeah, so I've done a lot of work on coalitional games
link |
00:31:10.200
and we actually have a paper here
link |
00:31:11.680
with my other student Gabriele Farina
link |
00:31:13.680
and some other collaborators at NIPS on that.
link |
00:31:16.640
Actually just came back from the poster session
link |
00:31:18.520
where we presented this.
link |
00:31:19.760
But when you have collusion, it's a different problem.
link |
00:31:23.800
And it typically gets even harder then.
link |
00:31:27.520
Even the game representations,
link |
00:31:29.600
some of the game representations don't really allow
link |
00:31:33.600
good computation.
link |
00:31:34.480
So we actually introduced a new game representation
link |
00:31:37.600
for that.
link |
00:31:38.720
Is that kind of cooperation part of the model?
link |
00:31:42.040
Do you have information
link |
00:31:44.560
about the fact that other players are cooperating
link |
00:31:47.040
or is it just this chaos that where nothing is known?
link |
00:31:50.000
So there's some things unknown.
link |
00:31:52.360
Can you give an example of a collusion type game
link |
00:31:55.840
or is it usually?
link |
00:31:56.680
So like bridge.
link |
00:31:58.360
So think about bridge.
link |
00:31:59.640
It's like when you and I are on a team,
link |
00:32:02.320
our payoffs are the same.
link |
00:32:04.480
The problem is that we can't talk.
link |
00:32:06.400
So when I get my cards, I can't whisper to you
link |
00:32:09.000
what my cards are.
link |
00:32:10.320
That would not be allowed.
link |
00:32:12.480
So we have to somehow coordinate our strategies
link |
00:32:16.080
ahead of time and only ahead of time.
link |
00:32:19.920
And then there's certain signals we can talk about,
link |
00:32:22.760
but they have to be such that the other team
link |
00:32:25.240
also understands them.
link |
00:32:26.840
So that's an example where the coordination
link |
00:32:30.440
is already built into the rules of the game.
link |
00:32:33.000
But in many other situations like auctions
link |
00:32:35.640
or negotiations or diplomatic relationships, poker,
link |
00:32:40.880
it's not really built in, but it still can be very helpful
link |
00:32:44.160
for the colluders.
link |
00:32:45.280
I've read you write somewhere,
link |
00:32:48.240
that in negotiations you come to the table with a prior,
link |
00:32:52.800
like a strategy that you're willing to do
link |
00:32:56.080
and not willing to do those kinds of things.
link |
00:32:58.320
So how do you start to now moving away from poker,
link |
00:33:01.960
moving beyond poker into other applications
link |
00:33:04.520
like negotiations, how do you start applying this
link |
00:33:07.000
to other domains, even real world domains
link |
00:33:11.640
that you've worked on?
link |
00:33:12.520
Yeah, I actually have two startup companies
link |
00:33:14.440
doing exactly that.
link |
00:33:15.480
One is called Strategic Machine,
link |
00:33:17.800
and that's for kind of business applications,
link |
00:33:20.000
gaming, sports, all sorts of things like that.
link |
00:33:22.880
Any applications of this to business and to sports
link |
00:33:27.200
and to gaming, to various types of things
link |
00:33:32.120
in finance, electricity markets and so on.
link |
00:33:34.240
And the other is called Strategy Robot,
link |
00:33:36.600
where we are taking these to military security,
link |
00:33:40.640
cyber security and intelligence applications.
link |
00:33:43.520
I think you worked a little bit in,
link |
00:33:48.000
how do you put it, advertisement,
link |
00:33:51.000
sort of suggesting ads kind of thing, auction.
link |
00:33:55.360
That's another company, optimized markets.
link |
00:33:57.800
But that's much more about a combinatorial market
link |
00:34:00.880
and optimization based technology.
link |
00:34:02.840
That's not using these game theoretic reasoning technologies.
link |
00:34:06.840
I see, okay, so what sort of high level
link |
00:34:11.600
do you think about our ability to use
link |
00:34:15.280
game theoretic concepts to model human behavior?
link |
00:34:18.040
Do you think human behavior is amenable
link |
00:34:21.640
to this kind of modeling outside of the poker games,
link |
00:34:24.720
and where have you seen it done successfully in your work?
link |
00:34:27.520
I'm not sure the goal really is modeling humans.
link |
00:34:33.640
Like for example, if I'm playing a zero sum game,
link |
00:34:36.480
I don't really care that the opponent
link |
00:34:39.840
is actually following my model of rational behavior,
link |
00:34:42.960
because if they're not, that's even better for me.
link |
00:34:46.400
Right, so you see, with the opponents in games,
link |
00:34:51.120
the prerequisite is that you formalize
link |
00:34:56.120
the interaction in some way
link |
00:34:57.800
that can be amenable to analysis.
link |
00:35:01.000
And you've done this amazing work with mechanism design,
link |
00:35:04.160
designing games that have certain outcomes.
link |
00:35:10.040
But, so I'll tell you an example
link |
00:35:12.320
from my world of autonomous vehicles, right?
link |
00:35:15.460
We're studying pedestrians,
link |
00:35:17.040
and pedestrians and cars negotiate
link |
00:35:20.200
in this nonverbal communication.
link |
00:35:22.160
There's this weird game dance of tension
link |
00:35:25.040
where pedestrians are basically saying,
link |
00:35:27.280
I trust that you won't kill me,
link |
00:35:28.800
and so as a jaywalker, I will step onto the road
link |
00:35:31.840
even though I'm breaking the law, and there's this tension.
link |
00:35:34.720
And the question is, we really don't know
link |
00:35:36.640
how to model that well in trying to model intent.
link |
00:35:40.720
And so people sometimes bring up ideas
link |
00:35:43.080
of game theory and so on.
link |
00:35:44.880
Do you think that aspect of human behavior
link |
00:35:49.120
can use these kinds of imperfect information approaches,
link |
00:35:53.080
modeling, how do you start to attack a problem like that
link |
00:35:57.860
when you don't even know how to design the game
link |
00:36:00.940
to describe the situation in order to solve it?
link |
00:36:04.280
Okay, so I haven't really thought about jaywalking,
link |
00:36:06.800
but one thing that I think could be a good application
link |
00:36:10.120
in autonomous vehicles is the following.
link |
00:36:13.000
So let's say that you have fleets of autonomous cars
link |
00:36:16.320
operating by different companies.
link |
00:36:18.340
So maybe here's the Waymo fleet and here's the Uber fleet.
link |
00:36:22.120
If you think about the rules of the road,
link |
00:36:24.320
they define certain legal rules,
link |
00:36:26.560
but that still leaves a huge strategy space open.
link |
00:36:30.080
Like as a simple example, when cars merge,
link |
00:36:32.840
how humans merge, they slow down and look at each other
link |
00:36:36.000
and try to merge.
link |
00:36:39.240
Wouldn't it be better if these situations
link |
00:36:40.920
would already be prenegotiated
link |
00:36:43.480
so we can actually merge at full speed
link |
00:36:45.200
and we know that this is the situation,
link |
00:36:47.440
this is how we do it, and it's all gonna be faster.
link |
00:36:50.540
But there are way too many situations to negotiate manually.
link |
00:36:54.120
So you could use automated negotiation,
link |
00:36:56.400
this is the idea at least,
link |
00:36:57.780
you could use automated negotiation
link |
00:36:59.840
to negotiate all of these situations
link |
00:37:02.060
or many of them in advance.
link |
00:37:04.320
And of course it might be that,
link |
00:37:05.460
hey, maybe you're not gonna always let me go first.
link |
00:37:09.180
Maybe you said, okay, well, in these situations,
link |
00:37:11.280
I'll let you go first, but in exchange,
link |
00:37:13.560
you're gonna give me too much,
link |
00:37:14.520
you're gonna let me go first in this situation.
link |
00:37:17.260
So it's this huge combinatorial negotiation.
link |
00:37:20.680
And do you think there's room in that example of merging
link |
00:37:24.080
to model this whole situation
link |
00:37:25.600
as an imperfect information game
link |
00:37:27.160
or do you really want to consider it to be a perfect?
link |
00:37:30.120
No, that's a good question, yeah.
link |
00:37:32.240
That's a good question.
link |
00:37:33.080
Do you pay the price of assuming
link |
00:37:37.080
that you don't know everything?
link |
00:37:39.800
Yeah, I don't know.
link |
00:37:40.760
It's certainly much easier.
link |
00:37:42.120
Games with perfect information are much easier.
link |
00:37:45.060
So if you can get away with it, you should.
link |
00:37:49.280
But if the real situation is of imperfect information,
link |
00:37:52.640
then you're gonna have to deal with imperfect information.
link |
00:37:55.160
Great, so what lessons have you learned
link |
00:37:58.080
from the Annual Computer Poker Competition?
link |
00:38:00.680
An incredible accomplishment of AI.
link |
00:38:03.440
You look at the history of Deep Blue, AlphaGo,
link |
00:38:07.000
these kind of moments when AI stepped up
link |
00:38:10.400
in an engineering effort and a scientific effort combined
link |
00:38:13.960
to beat the best of human players.
link |
00:38:16.400
So what do you take away from this whole experience?
link |
00:38:19.480
What have you learned about designing AI systems
link |
00:38:22.440
that play these kinds of games?
link |
00:38:23.960
And what does that mean for AI in general,
link |
00:38:28.280
for the future of AI development?
link |
00:38:30.760
Yeah, so that's a good question.
link |
00:38:32.800
So there's so much to say about it.
link |
00:38:35.440
I do like this type of performance oriented research.
link |
00:38:39.120
Although in my group, we go all the way from like idea
link |
00:38:42.000
to theory, to experiments, to big system building,
link |
00:38:44.880
to commercialization, so we span that spectrum.
link |
00:38:47.960
But I think that in a lot of situations in AI,
link |
00:38:51.080
you really have to build the big systems
link |
00:38:53.440
and evaluate them at scale
link |
00:38:55.640
before you know what works and doesn't.
link |
00:38:57.520
And we've seen that in the computational
link |
00:39:00.080
game theory community, that there are a lot of techniques
link |
00:39:02.880
that look good in the small,
link |
00:39:05.200
but then they cease to look good in the large.
link |
00:39:07.120
And we've also seen that there are a lot of techniques
link |
00:39:10.160
that look superior in theory.
link |
00:39:13.280
And I really mean in terms of convergence rates,
link |
00:39:16.200
like first order methods, better convergence rates,
link |
00:39:18.440
than the CFR based algorithms,
link |
00:39:20.880
yet the CFR based algorithms are the fastest in practice.
link |
00:39:24.880
So it really tells me that you have to test this in reality.
link |
00:39:28.240
The theory isn't tight enough, if you will,
link |
00:39:30.880
to tell you which algorithms are better than the others.
link |
00:39:34.360
And you have to look at these things in the large,
link |
00:39:38.600
because any sort of projections you do from the small
link |
00:39:41.480
can at least in this domain be very misleading.
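The CFR family mentioned here is built on regret matching. The following is a toy self-play sketch on a small matrix game, not Libratus's implementation: each player mixes in proportion to positive cumulative regret, and the time-averaged strategies approach an equilibrium of the zero-sum game.

```python
def regret_matching(regrets):
    """Mix in proportion to positive cumulative regret (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in pos]

def solve_zero_sum(A, iters=10000):
    """Self-play regret matching on a zero-sum matrix game (row payoffs A).
    The time-averaged strategies approach a Nash equilibrium."""
    n_rows, n_cols = len(A), len(A[0])
    reg_row, reg_col = [0.0] * n_rows, [0.0] * n_cols
    avg_row, avg_col = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iters):
        s_row = regret_matching(reg_row)
        s_col = regret_matching(reg_col)
        for i in range(n_rows):
            avg_row[i] += s_row[i]
        for j in range(n_cols):
            avg_col[j] += s_col[j]
        # Expected payoff of each pure action against the opponent's mix.
        u_row = [sum(A[i][j] * s_col[j] for j in range(n_cols))
                 for i in range(n_rows)]
        u_col = [-sum(A[i][j] * s_row[i] for i in range(n_rows))
                 for j in range(n_cols)]
        v_row = sum(s_row[i] * u_row[i] for i in range(n_rows))
        v_col = sum(s_col[j] * u_col[j] for j in range(n_cols))
        for i in range(n_rows):
            reg_row[i] += u_row[i] - v_row
        for j in range(n_cols):
            reg_col[j] += u_col[j] - v_col
    return ([x / iters for x in avg_row], [x / iters for x in avg_col])
```

On `A = [[2, -1], [-1, 1]]` both averages drift toward the equilibrium mix of (0.4, 0.6). The per-iteration strategies oscillate; it is only the average that converges, which is one place where small-scale intuition and large-scale behavior can diverge.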
link |
00:39:43.800
So that's kind of from a kind of a science
link |
00:39:46.240
and engineering perspective, from a personal perspective,
link |
00:39:49.120
it's been just a wild experience
link |
00:39:51.280
in that with the first poker competition,
link |
00:39:54.160
the first brains versus AI,
link |
00:39:57.200
man machine poker competition that we organized.
link |
00:39:59.840
There had been, by the way, for other poker games,
link |
00:40:01.760
there had been previous competitions,
link |
00:40:03.240
but this was for Heads Up No Limit, this was the first.
link |
00:40:06.360
And I probably became the most hated person
link |
00:40:09.560
in the world of poker.
link |
00:40:10.880
And I didn't mean to, I just saw.
link |
00:40:12.880
Why is that?
link |
00:40:13.720
For cracking the game, for something.
link |
00:40:15.840
Yeah, a lot of people felt that it was a real threat
link |
00:40:20.000
to the whole game, the whole existence of the game.
link |
00:40:22.760
If AI becomes better than humans,
link |
00:40:26.080
people would be scared to play poker
link |
00:40:28.520
because there are these superhuman AIs running around
link |
00:40:30.680
taking their money and all of that.
link |
00:40:32.760
So I just, it's just really aggressive.
link |
00:40:36.200
The comments were super aggressive.
link |
00:40:37.880
I got everything just short of death threats.
link |
00:40:40.920
Do you think the same was true for chess?
link |
00:40:44.000
Because right now they just completed
link |
00:40:45.760
the world championships in chess,
link |
00:40:47.720
and humans just started ignoring the fact
link |
00:40:49.560
that there's AI systems now that outperform humans
link |
00:40:52.920
and they still enjoy the game, it's still a beautiful game.
link |
00:40:55.520
That's what I think.
link |
00:40:56.360
And I think the same thing happens in poker.
link |
00:40:58.800
And so I didn't think of myself
link |
00:41:01.040
as somebody who was gonna kill the game,
link |
00:41:02.360
and I don't think I did.
link |
00:41:03.800
I've really learned to love this game.
link |
00:41:05.600
I wasn't a poker player before,
link |
00:41:06.960
but learned so many nuances about it from these AIs,
link |
00:41:10.520
and they've really changed how the game is played,
link |
00:41:12.480
by the way.
link |
00:41:13.320
So they have these very Martian ways of playing poker,
link |
00:41:16.240
and the top humans are now incorporating
link |
00:41:18.400
those types of strategies into their own play.
link |
00:41:21.400
So if anything, to me, our work has made poker
link |
00:41:26.560
a richer, more interesting game for humans to play,
link |
00:41:29.800
not something that is gonna steer humans
link |
00:41:32.160
away from it entirely.
link |
00:41:34.200
Just a quick comment on something you said,
link |
00:41:35.960
which is, if I may say so,
link |
00:41:39.400
in academia is a little bit rare sometimes.
link |
00:41:42.400
It's pretty brave to put your ideas to the test
link |
00:41:45.520
in the way you described,
link |
00:41:47.200
saying that sometimes good ideas don't work
link |
00:41:49.360
when you actually try to apply them at scale.
link |
00:41:52.760
So where does that come from?
link |
00:41:54.200
I mean, if you could do advice for people,
link |
00:41:58.880
what drives you in that sense?
link |
00:42:00.760
Were you always this way?
link |
00:42:02.360
I mean, it takes a brave person.
link |
00:42:04.080
I guess is what I'm saying, to test their ideas
link |
00:42:06.760
and to see if this thing actually works
link |
00:42:08.640
against human top human players and so on.
link |
00:42:11.680
Yeah, I don't know about brave,
link |
00:42:12.960
but it takes a lot of work.
link |
00:42:15.000
It takes a lot of work and a lot of time
link |
00:42:18.400
to organize, to make something big
link |
00:42:20.360
and to organize an event and stuff like that.
link |
00:42:22.920
And what drives you in that effort?
link |
00:42:24.760
Because you could still, I would argue,
link |
00:42:26.880
get a best paper award at NIPS as you did in 17
link |
00:42:30.280
without doing this.
link |
00:42:31.440
That's right, yes.
link |
00:42:32.960
And so in general, I believe it's very important
link |
00:42:37.640
to do things in the real world and at scale.
link |
00:42:41.480
And that's really where the pudding, if you will,
link |
00:42:46.160
proof is in the pudding, that's where it is.
link |
00:42:48.400
In this particular case,
link |
00:42:50.080
it was kind of a competition between different groups
link |
00:42:55.160
and for many years as to who can be the first one
link |
00:42:59.080
to beat the top humans at Heads Up No Limit Texas Holdem.
link |
00:43:02.040
So it became kind of like a competition who can get there.
link |
00:43:09.560
Yeah, so a little friendly competition
link |
00:43:11.800
could do wonders for progress.
link |
00:43:14.040
Yes, absolutely.
link |
00:43:16.400
So the topic of mechanism design,
link |
00:43:19.040
which is really interesting, also kind of new to me,
link |
00:43:22.280
except as an observer of, I don't know, politics and any,
link |
00:43:25.680
I'm an observer of mechanisms,
link |
00:43:27.600
but you write in your paper an automated mechanism design
link |
00:43:31.440
that I quickly read.
link |
00:43:34.000
So mechanism design is designing the rules of the game
link |
00:43:37.880
so you get a certain desirable outcome.
link |
00:43:40.200
And you have this work on doing so in an automatic fashion
link |
00:43:44.920
as opposed to fine tuning it.
link |
00:43:46.720
So what have you learned from those efforts?
link |
00:43:50.680
If you look, say, I don't know,
link |
00:43:52.280
at complexes like our political system,
link |
00:43:56.200
can we design our political system
link |
00:43:58.560
to have, in an automated fashion,
link |
00:44:01.800
to have outcomes that we want?
link |
00:44:03.360
Can we design something like traffic lights to be smart
link |
00:44:09.000
where it gets outcomes that we want?
link |
00:44:11.800
So what are the lessons that you draw from that work?
link |
00:44:14.840
Yeah, so I still very much believe
link |
00:44:17.240
in the automated mechanism design direction.
link |
00:44:19.400
Yes.
link |
00:44:20.840
But it's not a panacea.
link |
00:44:23.000
There are impossibility results in mechanism design
link |
00:44:26.520
saying that there is no mechanism that accomplishes
link |
00:44:30.240
objective X in class C.
link |
00:44:33.920
So it's not going to,
link |
00:44:36.120
there's no way using any mechanism design tools,
link |
00:44:39.000
manual or automated,
link |
00:44:41.000
to do certain things in mechanism design.
link |
00:44:42.800
Can you describe that again?
link |
00:44:43.800
So meaning it's impossible to achieve that?
link |
00:44:47.480
Yeah, yeah.
link |
00:44:48.320
And it's unlikely.
link |
00:44:50.440
Impossible.
link |
00:44:51.280
Impossible.
link |
00:44:52.120
So these are not statements about human ingenuity
link |
00:44:55.240
who might come up with something smart.
link |
00:44:57.120
These are proofs that if you want to accomplish
link |
00:44:59.880
properties X in class C,
link |
00:45:02.480
that is not doable with any mechanism.
link |
00:45:04.880
The good thing about automated mechanism design
link |
00:45:07.080
is that we're not really designing for a class.
link |
00:45:10.840
We're designing for specific settings at a time.
link |
00:45:14.160
So even if there's an impossibility result
link |
00:45:16.720
for the whole class,
link |
00:45:18.240
it just doesn't mean that all of the cases
link |
00:45:21.360
in the class are impossible.
link |
00:45:22.560
It just means that some of the cases are impossible.
link |
00:45:25.080
So we can actually carve these islands of possibility
link |
00:45:28.240
within these known impossible classes.
link |
00:45:30.920
And we've actually done that.
link |
00:45:31.960
So one of the famous results in mechanism design
link |
00:45:35.160
is the Myerson-Satterthwaite theorem
link |
00:45:37.360
by Roger Myerson and Mark Satterthwaite from 1983.
link |
00:45:41.000
It's an impossibility of efficient trade
link |
00:45:43.480
under imperfect information.
link |
00:45:45.200
We show that you can, in many settings,
link |
00:45:48.560
avoid that and get efficient trade anyway.
link |
00:45:51.480
Depending on how they design the game, okay.
link |
00:45:54.160
Depending how you design the game.
link |
00:45:55.880
And of course, it doesn't in any way
link |
00:46:00.240
contradict the impossibility result.
link |
00:46:01.800
The impossibility result is still there,
link |
00:46:03.920
but it just finds spots within this impossible class
link |
00:46:08.920
where in those spots, you don't have the impossibility.
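One way to see such an "island of possibility" is bilateral trade with a posted price. This is a standard textbook illustration, not the specific mechanism from the paper being discussed: if the buyer's value support and the seller's cost support are separated by the price, truthful trade is efficient every time, while overlapping supports bring the Myerson-Satterthwaite impossibility back.

```python
import random

def posted_price_trade(buyer_value, seller_cost, price):
    """Trade occurs iff value >= price >= cost. At a fixed posted price,
    reporting truthfully is weakly dominant for both parties."""
    return buyer_value >= price >= seller_cost

def efficient(buyer_value, seller_cost):
    """Trade is socially efficient whenever value is at least cost."""
    return buyer_value >= seller_cost

rng = random.Random(0)
price = 0.5

# Separated supports (costs in [0, 0.4], values in [0.6, 1.0]):
# the posted price captures every efficient trade.
separated_ok = all(
    posted_price_trade(rng.uniform(0.6, 1.0), rng.uniform(0.0, 0.4), price)
    for _ in range(1000)
)

# Overlapping supports (both on [0, 1]): some efficient trades are
# missed, reflecting the general impossibility.
misses = 0
for _ in range(1000):
    cost, value = rng.uniform(0, 1), rng.uniform(0, 1)
    if efficient(value, cost) and not posted_price_trade(value, cost, price):
        misses += 1
```

The impossibility result for the whole class still stands; the separated-support case is simply one of the spots inside the class where it does not bite.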
link |
00:46:12.440
Sorry if I'm going a bit philosophical,
link |
00:46:14.760
but what lessons do you draw towards,
link |
00:46:17.480
like I mentioned, politics or human interaction
link |
00:46:20.160
and designing mechanisms for outside of just
link |
00:46:24.880
these kinds of trading or auctioning
link |
00:46:26.960
or purely formal games or human interaction,
link |
00:46:33.480
like a political system?
link |
00:46:34.920
How, do you think it's applicable to, yeah, politics
link |
00:46:39.160
or to business, to negotiations, these kinds of things,
link |
00:46:46.280
designing rules that have certain outcomes?
link |
00:46:49.040
Yeah, yeah, I do think so.
link |
00:46:51.360
Have you seen that successfully done?
link |
00:46:54.200
They haven't really, oh, you mean mechanism design
link |
00:46:56.440
or automated mechanism design?
link |
00:46:57.280
Automated mechanism design.
link |
00:46:59.000
So mechanism design itself
link |
00:47:01.520
has had fairly limited success so far.
link |
00:47:06.440
There are certain cases,
link |
00:47:07.600
but most of the real world situations
link |
00:47:10.200
are actually not sound from a mechanism design perspective,
link |
00:47:14.680
even in those cases where they've been designed
link |
00:47:16.920
by very knowledgeable mechanism design people,
link |
00:47:20.000
the people are typically just taking some insights
link |
00:47:22.760
from the theory and applying those insights
link |
00:47:25.040
into the real world,
link |
00:47:26.280
rather than applying the mechanisms directly.
link |
00:47:29.280
So one famous example is the FCC spectrum auctions.
link |
00:47:33.520
So I've also had a small role in that
link |
00:47:36.880
and very good economists have been working,
link |
00:47:40.600
excellent economists have been working on that
link |
00:47:42.560
who know game theory,
link |
00:47:44.040
yet the rules that are designed in practice there,
link |
00:47:47.440
they're such that bidding truthfully
link |
00:47:49.840
is not the best strategy.
link |
00:47:51.800
Usually mechanism design,
link |
00:47:52.960
we try to make things easy for the participants.
link |
00:47:56.160
So telling the truth is the best strategy,
link |
00:47:58.560
but even in those very high stakes auctions
link |
00:48:01.480
where you have tens of billions of dollars
link |
00:48:03.080
worth of spectrum being auctioned,
link |
00:48:06.360
truth telling is not the best strategy.
link |
00:48:08.280
And by the way,
link |
00:48:10.040
nobody knows even a single optimal bidding strategy
link |
00:48:12.920
for those auctions.
link |
00:48:14.120
What's the challenge of coming up with an optimal,
link |
00:48:15.960
because there's a lot of players and there's imperfect information?
link |
00:48:18.160
It's not so much that there are a lot of players,
link |
00:48:20.040
but many items for sale,
link |
00:48:22.320
and these mechanisms are such that even with just two items
link |
00:48:26.000
or one item, bidding truthfully
link |
00:48:28.400
wouldn't be the best strategy.
link |
00:48:31.400
If you look at the history of AI,
link |
00:48:34.560
it's marked by seminal events.
link |
00:48:37.160
AlphaGo beating a world champion human Go player,
link |
00:48:40.160
I would put Libratus winning the Heads Up No Limit Holdem
link |
00:48:43.680
as one such event.
link |
00:48:45.000
Thank you.
link |
00:48:46.040
And what do you think is the next such event,
link |
00:48:52.560
whether it's in your life or in the broader AI community
link |
00:48:56.640
that you think might be out there
link |
00:48:59.040
that would surprise the world?
link |
00:49:01.640
So that's a great question,
link |
00:49:02.800
and I don't really know the answer.
link |
00:49:04.520
In terms of game solving,
link |
00:49:07.360
Heads Up No Limit Texas Holdem
link |
00:49:08.920
really was the one remaining widely agreed upon benchmark.
link |
00:49:14.400
So that was the big milestone.
link |
00:49:15.880
Now, are there other things?
link |
00:49:17.800
Yeah, certainly there are,
link |
00:49:18.920
but there's not one that the community
link |
00:49:21.080
has kind of focused on.
link |
00:49:22.920
So what could be other things?
link |
00:49:25.240
There are groups working on StarCraft.
link |
00:49:27.640
There are groups working on Dota 2.
link |
00:49:29.840
These are video games.
link |
00:49:31.560
Or you could have like Diplomacy or Hanabi,
link |
00:49:36.240
things like that.
link |
00:49:37.080
These are like recreational games,
link |
00:49:38.640
but none of them are really acknowledged
link |
00:49:42.040
as kind of the main next challenge problem,
link |
00:49:45.840
like chess or Go or Heads Up No Limit Texas Holdem was.
link |
00:49:50.000
So I don't really know in the game solving space
link |
00:49:52.360
what is or what will be the next benchmark.
link |
00:49:55.400
I kind of hope that there will be a next benchmark
link |
00:49:57.840
because really the different groups
link |
00:49:59.560
working on the same problem
link |
00:50:01.120
really drove these application independent techniques
link |
00:50:05.120
forward very quickly over 10 years.
link |
00:50:07.480
Do you think there's an open problem
link |
00:50:09.120
that excites you that you start moving away
link |
00:50:11.480
from games into real world games,
link |
00:50:15.000
like say the stock market trading?
link |
00:50:17.200
Yeah, so that's kind of how I am.
link |
00:50:19.320
So I am probably not going to work
link |
00:50:23.120
as hard on these recreational benchmarks.
link |
00:50:27.640
I'm doing two startups on game solving technology,
link |
00:50:30.200
Strategic Machine and Strategy Robot,
link |
00:50:32.320
and we're really interested
link |
00:50:34.160
in pushing this stuff into practice.
link |
00:50:36.560
What do you think would be really
link |
00:50:43.160
a powerful result that would be surprising
link |
00:50:45.920
that would be, if you can say,
link |
00:50:49.960
I mean, five years, 10 years from now,
link |
00:50:53.280
something that statistically you would say
link |
00:50:56.480
is not very likely,
link |
00:50:57.920
but if there's a breakthrough, would achieve?
link |
00:51:01.480
Yeah, so I think that overall,
link |
00:51:03.800
we're in a very different situation in game theory
link |
00:51:09.000
than we are in, let's say, machine learning.
link |
00:51:11.760
So in machine learning, it's a fairly mature technology
link |
00:51:14.360
and it's very broadly applied
link |
00:51:16.480
and proven success in the real world.
link |
00:51:19.680
In game solving, there are almost no applications yet.
link |
00:51:24.320
We have just become superhuman,
link |
00:51:26.680
which in machine learning you could argue happened in the 90s,
link |
00:51:29.600
if not earlier,
link |
00:51:30.640
and at least on supervised learning,
link |
00:51:32.960
certain complex supervised learning applications.
link |
00:51:36.960
Now, I think the next challenge problem,
link |
00:51:39.000
I know you're not asking about it this way,
link |
00:51:40.560
you're asking about the technology breakthrough,
link |
00:51:42.640
but I think that big, big breakthrough
link |
00:51:44.240
is to be able to show that, hey,
link |
00:51:46.120
maybe most of, let's say, military planning
link |
00:51:48.280
or most of business strategy
link |
00:51:50.080
will actually be done strategically
link |
00:51:52.200
using computational game theory.
link |
00:51:54.120
That's what I would like to see
link |
00:51:55.800
as the next five or 10 year goal.
link |
00:51:57.640
Maybe you can explain to me again,
link |
00:51:59.520
forgive me if this is an obvious question,
link |
00:52:01.920
but machine learning methods,
link |
00:52:04.000
neural networks suffer from not being transparent,
link |
00:52:07.840
not being explainable.
link |
00:52:09.280
Game theoretic methods, Nash equilibria,
link |
00:52:12.400
do they generally, when you see the different solutions,
link |
00:52:15.280
are they, when you talk about military operations,
link |
00:52:19.640
are they, once you see the strategies,
link |
00:52:21.800
do they make sense, are they explainable,
link |
00:52:23.880
or do they suffer from the same problems
link |
00:52:25.840
as neural networks do?
link |
00:52:27.120
So that's a good question.
link |
00:52:28.720
I would say a little bit yes and no.
link |
00:52:31.240
And what I mean by that is that
link |
00:52:34.560
these game theoretic strategies,
link |
00:52:36.160
let's say, Nash equilibrium,
link |
00:52:38.520
it has provable properties.
link |
00:52:40.320
So it's unlike, let's say, deep learning
link |
00:52:42.360
where you kind of cross your fingers,
link |
00:52:44.440
hopefully it'll work.
link |
00:52:45.680
And then after the fact, when you have the weights,
link |
00:52:47.880
you're still crossing your fingers,
link |
00:52:48.920
and I hope it will work.
link |
00:52:51.160
Here, you know that the solution quality is there.
link |
00:52:55.400
There's provable solution quality guarantees.
link |
00:52:58.040
Now, that doesn't necessarily mean
link |
00:53:00.920
that the strategies are human understandable.
link |
00:53:03.480
That's a whole other problem.
link |
00:53:04.720
So I think that deep learning and computational game theory
link |
00:53:08.680
are in the same boat in that sense,
link |
00:53:10.720
that both are difficult to understand.
link |
00:53:13.760
But at least the game theoretic techniques,
link |
00:53:15.680
they have these guarantees of solution quality.
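The solution quality guarantee mentioned here can be made concrete with a toy zero-sum example (illustrative only, not Libratus's actual computation): a strategy's exploitability measures how much a best-responding opponent can gain over the game value, and at a Nash equilibrium it is provably zero.

```python
# Rock-paper-scissors payoffs for the row player (the game's value is 0).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def exploitability(strategy, A):
    """Best payoff the column player can achieve against `strategy`.

    In a zero-sum game with value 0, this is exactly how far the
    row strategy is from a Nash equilibrium.
    """
    return max(-sum(p * A[i][j] for i, p in enumerate(strategy))
               for j in range(len(A[0])))

# The uniform strategy is an equilibrium: exploitability is zero.
assert abs(exploitability([1/3, 1/3, 1/3], A)) < 1e-9
# A biased strategy is measurably exploitable.
assert exploitability([0.5, 0.25, 0.25], A) > 0
```

This is the kind of certificate a game-theoretic solver can provide and a trained neural network, by itself, cannot.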
link |
00:53:19.840
So do you see business operations, strategic operations,
link |
00:53:22.880
or even military in the future being
link |
00:53:26.040
at least the strong candidates
link |
00:53:28.320
being proposed by automated systems?
link |
00:53:32.760
Do you see that?
link |
00:53:34.000
Yeah, I do, I do.
link |
00:53:35.040
But that's more of a belief than a substantiated fact.
link |
00:53:39.640
Depending on where you land in optimism or pessimism,
link |
00:53:42.320
that's a really, to me, that's an exciting future,
link |
00:53:45.720
especially if there's provable things
link |
00:53:48.760
in terms of optimality.
link |
00:53:50.560
So looking into the future,
link |
00:53:54.040
there's a few folks worried about the,
link |
00:53:58.760
especially you look at the game of poker,
link |
00:54:01.200
which is probably one of the last benchmarks
link |
00:54:03.360
in terms of games being solved.
link |
00:54:05.480
They worry about the future
link |
00:54:07.520
and the existential threats of artificial intelligence.
link |
00:54:10.520
So the negative impact in whatever form on society.
link |
00:54:13.840
Is that something that concerns you as much,
link |
00:54:17.440
or are you more optimistic about the positive impacts of AI?
link |
00:54:21.600
Oh, I am much more optimistic about the positive impacts.
link |
00:54:24.720
So just in my own work, what we've done so far,
link |
00:54:27.560
we run the nationwide kidney exchange.
link |
00:54:29.920
Hundreds of people are walking around alive today,
link |
00:54:32.960
who otherwise wouldn't be.
link |
00:54:34.080
And it's increased employment.
link |
00:54:36.120
You have a lot of people now running kidney exchanges
link |
00:54:39.920
and at the transplant centers,
link |
00:54:42.200
interacting with the kidney exchange.
link |
00:54:45.560
You have extra surgeons, nurses, anesthesiologists,
link |
00:54:49.440
hospitals, all of that.
link |
00:54:51.400
So employment is increasing from that
link |
00:54:53.560
and the world is becoming a better place.
link |
00:54:55.320
Another example is combinatorial sourcing auctions.
link |
00:54:59.040
We did 800 large scale combinatorial sourcing auctions
link |
00:55:04.040
from 2001 to 2010 in a previous startup of mine
link |
00:55:08.240
called CombineNet.
link |
00:55:09.400
And we increased the supply chain efficiency
link |
00:55:13.080
on that $60 billion of spend by 12.6%.
link |
00:55:18.080
So that's over $6 billion of efficiency improvement
link |
00:55:21.440
in the world.
link |
00:55:22.240
And this is not like shifting value
link |
00:55:23.760
from somebody to somebody else,
link |
00:55:25.240
just efficiency improvement, like in trucking,
link |
00:55:28.200
less empty driving, so there's less waste,
link |
00:55:31.120
less carbon footprint and so on.
link |
00:55:33.440
So a huge positive impact in the near term,
link |
00:55:36.720
but sort of to stay in it for a little longer,
link |
00:55:40.680
because I think game theory has a role to play here.
link |
00:55:43.080
Oh, let me actually come back on that as one thing.
link |
00:55:45.320
I think AI is also going to make the world much safer.
link |
00:55:49.400
So that's another aspect that often gets overlooked.
link |
00:55:53.760
Well, let me ask this question.
link |
00:55:54.920
Maybe you can speak to the safer.
link |
00:55:56.960
So I talked to Max Tegmark and Stuart Russell,
link |
00:55:59.960
who are very concerned about existential threats of AI.
link |
00:56:02.680
And often the concern is about value misalignment.
link |
00:56:06.240
So AI systems basically working,
link |
00:56:11.880
operating towards goals that are not the same
link |
00:56:14.680
as human civilization, human beings.
link |
00:56:17.920
So it seems like game theory has a role to play there
link |
00:56:24.200
to make sure the values are aligned with human beings.
link |
00:56:27.880
I don't know if that's how you think about it.
link |
00:56:29.960
If not, how do you think AI might help with this problem?
link |
00:56:35.280
How do you think AI might make the world safer?
link |
00:56:39.240
Yeah, I think this value misalignment
link |
00:56:43.000
is a fairly theoretical worry.
link |
00:56:46.480
And I haven't really seen it in,
link |
00:56:49.960
because I do a lot of real applications.
link |
00:56:51.840
I don't see it anywhere.
link |
00:56:53.920
The closest I've seen it
link |
00:56:55.240
was the following type of mental exercise really,
link |
00:56:57.920
where I had this argument in the late eighties
link |
00:57:00.720
when we were building
link |
00:57:01.560
these transportation optimization systems.
link |
00:57:03.560
And somebody had heard that it's a good idea
link |
00:57:05.360
to have high utilization of assets.
link |
00:57:08.160
So they told me, hey, why don't you put that as objective?
link |
00:57:11.400
And we didn't even put it as an objective
link |
00:57:14.720
because I just showed him that,
link |
00:57:16.480
if you had that as your objective,
link |
00:57:18.480
the solution would be to load your trucks full
link |
00:57:20.320
and drive in circles.
link |
00:57:21.840
Nothing would ever get delivered.
link |
00:57:23.000
You'd have a hundred percent utilization.
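The failure mode described here can be sketched in a few lines (a hypothetical toy with made-up numbers, not the actual transportation system): scoring candidate truck plans purely by asset utilization selects the degenerate plan, while a delivery-based objective does not.

```python
# Two hypothetical truck plans scored under two different objectives.
plans = {
    "drive_in_circles": {"utilization": 1.00, "deliveries": 0},
    "real_routing":     {"utilization": 0.80, "deliveries": 12},
}

def best(plans, objective):
    """Return the plan name that maximizes the given objective key."""
    return max(plans, key=lambda name: plans[name][objective])

# Optimizing utilization alone picks the degenerate plan...
assert best(plans, "utilization") == "drive_in_circles"
# ...while optimizing deliveries picks the useful one.
assert best(plans, "deliveries") == "real_routing"
```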
link |
00:57:25.120
So yeah, I know this phenomenon.
link |
00:57:27.240
I've known this for over 30 years,
link |
00:57:29.680
but I've never seen it actually be a problem in reality.
link |
00:57:33.360
And yes, if you have the wrong objective,
link |
00:57:35.240
the AI will optimize that to the hilt
link |
00:57:37.800
and it's gonna hurt more than some human
link |
00:57:39.800
who's kind of trying to solve it in a half baked way
link |
00:57:43.800
with some human insight too.
link |
00:57:45.480
But I just haven't seen that materialize in practice.
link |
00:57:49.160
There's this gap that you've actually put your finger on
link |
00:57:52.720
very clearly just now between theory and reality.
link |
00:57:57.080
That's very difficult to put into words, I think.
link |
00:57:59.680
It's what you can theoretically imagine,
link |
00:58:03.240
the worst possible case or even, yeah, I mean bad cases.
link |
00:58:08.000
And what usually happens in reality.
link |
00:58:10.520
So for example, to me,
link |
00:58:11.960
maybe it's something you can comment on having grown up
link |
00:58:15.720
and I grew up in the Soviet Union.
link |
00:58:19.120
There are currently 10,000 nuclear weapons in the world.
link |
00:58:22.120
And for many decades,
link |
00:58:24.200
it's theoretically surprising to me
link |
00:58:28.360
that nuclear war has not broken out.
link |
00:58:30.880
Do you think about this aspect
link |
00:58:33.760
from a game theoretic perspective in general,
link |
00:58:36.080
why is that true?
link |
00:58:38.440
Why in theory you could see
link |
00:58:40.720
how things would go terribly wrong
link |
00:58:42.600
and somehow yet they have not?
link |
00:58:44.280
Yeah, how do you think about it?
link |
00:58:45.600
So I do think about that a lot.
link |
00:58:47.240
I think the biggest two threats that we're facing as mankind,
link |
00:58:50.320
one is climate change and the other is nuclear war.
link |
00:58:53.320
So those are my main two worries that I worry about.
link |
00:58:57.200
And I've tried to do something about climate,
link |
00:58:59.920
thought about trying to do something
link |
00:59:01.320
for climate change twice.
link |
00:59:02.880
Actually, for two of my startups,
link |
00:59:05.040
I've actually commissioned studies
link |
00:59:06.760
of what we could do on those things.
link |
00:59:09.480
And we didn't really find a sweet spot,
link |
00:59:11.040
but I'm still keeping an eye out on that.
link |
00:59:12.680
If there's something where we could actually
link |
00:59:15.160
provide a market solution or optimization solution
link |
00:59:17.800
or some other technology solution to problems.
link |
00:59:20.960
Right now, like for example,
link |
00:59:23.360
pollution credit markets were what we were looking at then.
link |
00:59:26.760
And it was much more the lack of political will
link |
00:59:30.040
why those markets were not so successful
link |
00:59:32.840
rather than bad market design.
link |
00:59:34.640
So I could go in and make a better market design,
link |
00:59:37.080
but that wouldn't really move the needle
link |
00:59:38.600
on the world very much if there's no political will.
link |
00:59:41.160
And in the US, the market,
link |
00:59:43.600
at least the Chicago market was just shut down and so on.
link |
00:59:47.520
So then it doesn't really help
link |
00:59:48.760
how great your market design was.
link |
00:59:51.040
And then the nuclear side, it's more,
link |
00:59:53.560
so global warming is a more encroaching problem.
link |
01:00:00.840
Nuclear weapons have been here.
link |
01:00:03.280
It's an obvious problem that's just been sitting there.
link |
01:00:05.720
So how do you think about,
link |
01:00:07.480
what is the mechanism design there
link |
01:00:09.240
that just made everything seem stable?
link |
01:00:12.280
And are you still extremely worried?
link |
01:00:14.800
I am still extremely worried.
link |
01:00:16.640
So you probably know the simple game theory of MAD.
link |
01:00:20.040
So this is mutually assured destruction,
link |
01:00:23.760
and it doesn't require any computation with small matrices.
link |
01:00:27.360
You can actually convince yourself
link |
01:00:28.600
that the game is such that nobody wants to initiate.
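That small-matrix argument can be checked mechanically (illustrative payoffs only, made up for this sketch): with mutual destruction from any first strike, plus a small cost to launching, holding is each side's best response and (hold, hold) is the only pure-strategy equilibrium.

```python
import itertools

# Illustrative MAD payoff matrix: actions are 0 = hold, 1 = initiate.
# A first strike triggers retaliation, so it leads to mutual destruction
# (-100), plus a small launch cost (-1) for whoever initiates.
payoffs = {
    (0, 0): (0, 0),
    (1, 0): (-101, -100),
    (0, 1): (-100, -101),
    (1, 1): (-101, -101),
}

def pure_nash(payoffs):
    """Enumerate pure-strategy Nash equilibria of a 2x2 bimatrix game."""
    equilibria = []
    for a, b in itertools.product((0, 1), repeat=2):
        row_best = all(payoffs[(a, b)][0] >= payoffs[(x, b)][0] for x in (0, 1))
        col_best = all(payoffs[(a, b)][1] >= payoffs[(a, y)][1] for y in (0, 1))
        if row_best and col_best:
            equilibria.append((a, b))
    return equilibria

# Neither side wants to initiate: (hold, hold) is the only equilibrium.
assert pure_nash(payoffs) == [(0, 0)]
```

As the conversation goes on to note, this coarse-grained analysis depends on having a small number of superpowers; it breaks down as the players multiply.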
link |
01:00:31.480
Yeah, that's a very coarse grained analysis.
link |
01:00:34.600
And it really works in a particular situation.
link |
01:00:36.880
You have two superpowers or a small number of superpowers.
link |
01:00:40.400
Now things are very different.
link |
01:00:41.960
You have smaller nukes.
link |
01:00:43.080
So the threshold for initiating is lower
link |
01:00:47.240
and you have smaller countries and non-state actors
link |
01:00:51.520
who may get a nuke and so on.
link |
01:00:53.760
So I think it's riskier now than it was maybe ever before.
link |
01:00:58.320
And what idea, application of AI,
link |
01:01:03.640
you've talked about a little bit,
link |
01:01:04.640
but what is the most exciting to you right now?
link |
01:01:07.560
I mean, you're here at NIPS, NeurIPS.
link |
01:01:10.160
Now you have a few excellent pieces of work,
link |
01:01:14.920
but what are you thinking into the future
link |
01:01:16.680
with several companies you're doing?
link |
01:01:17.840
What's the most exciting thing or one of the exciting things?
link |
01:01:21.120
The number one thing for me right now
link |
01:01:23.160
is coming up with these scalable techniques
link |
01:01:26.360
for game solving and applying them into the real world.
link |
01:01:30.440
I'm still very interested in market design as well.
link |
01:01:33.160
And we're doing that at Optimized Markets,
link |
01:01:35.400
but what I'm most interested in, number one right now,
link |
01:01:37.560
is Strategic Machine and Strategy Robot,
link |
01:01:40.000
getting that technology out there
link |
01:01:41.440
and seeing as you were in the trenches doing applications,
link |
01:01:45.560
what needs to be actually filled,
link |
01:01:47.120
what technology gaps still need to be filled.
link |
01:01:49.800
So it's so hard to just put your feet on the table
link |
01:01:52.040
and imagine what needs to be done.
link |
01:01:53.800
But when you're actually doing real applications,
link |
01:01:56.280
the applications tell you what needs to be done.
link |
01:01:59.120
And I really enjoy that interaction.
link |
01:02:00.840
Is it a challenging process to apply
link |
01:02:04.480
some of the state of the art techniques you're working on
link |
01:02:07.760
and having the various players in industry
link |
01:02:14.080
or the military or people who could really benefit from it
link |
01:02:17.720
actually use it?
link |
01:02:19.040
What's that process like?
link |
01:02:21.400
In autonomous vehicles, you work with automotive companies
link |
01:02:23.680
and in many ways they're a little bit old fashioned.
link |
01:02:28.200
It's difficult.
link |
01:02:29.240
They really want to use this technology.
link |
01:02:31.840
It clearly will have a significant benefit,
link |
01:02:34.640
but the systems aren't quite in place
link |
01:02:37.480
to easily have them integrated in terms of data,
link |
01:02:41.080
in terms of compute, in terms of all these kinds of things.
link |
01:02:43.760
So is that one of the bigger challenges that you're facing
link |
01:02:48.680
and how do you tackle that challenge?
link |
01:02:50.000
Yeah, I think that's always a challenge.
link |
01:02:52.360
That's the kind of slowness and inertia, really,
link |
01:02:55.560
of let's do things the way we've always done it.
link |
01:02:57.920
You just have to find the internal champions
link |
01:03:00.120
at the customer who understand that,
link |
01:03:02.120
hey, things can't be the same way in the future.
link |
01:03:04.680
Otherwise bad things are going to happen.
link |
01:03:06.960
And it's in autonomous vehicles.
link |
01:03:08.600
It's actually very interesting
link |
01:03:09.680
that the car makers are doing that
link |
01:03:11.120
and they're very traditional,
link |
01:03:12.440
but at the same time you have tech companies
link |
01:03:14.360
who have nothing to do with cars or transportation
link |
01:03:17.120
like Google and Baidu really pushing on autonomous cars.
link |
01:03:21.880
I find that fascinating.
link |
01:03:23.240
Clearly you're super excited
link |
01:03:25.160
about actually these ideas having an impact in the world.
link |
01:03:29.320
In terms of the technology, in terms of ideas and research,
link |
01:03:32.680
are there directions that you're also excited about?
link |
01:03:36.600
Whether that's on some of the approaches you talked about
link |
01:03:40.840
for the imperfect information games,
link |
01:03:42.760
whether it's applying deep learning
link |
01:03:44.000
to some of these problems,
link |
01:03:45.120
is there something that you're excited
link |
01:03:46.520
in the research side of things?
link |
01:03:48.840
Yeah, yeah, lots of different things
link |
01:03:51.120
in the game solving.
link |
01:03:53.240
So solving even bigger games,
link |
01:03:56.400
games where you have more hidden actions,
link |
01:03:59.760
where the player actions are hidden as well.
link |
01:04:02.040
Poker is a game where really the chance actions are hidden
link |
01:04:05.880
or some of them are hidden,
link |
01:04:07.080
but the player actions are public.
link |
01:04:11.440
Multiplayer games of various sorts,
link |
01:04:14.000
collusion, opponent exploitation,
link |
01:04:18.080
and even longer games.
link |
01:04:21.280
So games that basically go forever,
link |
01:04:23.160
but they're not repeated.
link |
01:04:24.680
So extensive form games that go forever.
link |
01:04:27.880
What would that even look like?
link |
01:04:30.080
How do you represent that?
link |
01:04:31.040
How do you solve that?
link |
01:04:32.040
What's an example of a game like that?
link |
01:04:33.440
Or is this some of the stochastic games
link |
01:04:35.600
that you mentioned?
link |
01:04:36.440
Let's say business strategy.
link |
01:04:37.320
So it's not just modeling like a particular interaction,
link |
01:04:40.840
but thinking about the business from here to eternity.
link |
01:04:44.440
Or let's say military strategy.
link |
01:04:49.040
So it's not like war is gonna go away.
link |
01:04:51.000
How do you think about military strategy
link |
01:04:54.280
that's gonna go forever?
link |
01:04:56.680
How do you even model that?
link |
01:04:58.080
How do you know whether a move was good
link |
01:05:01.000
that somebody made and so on?
link |
01:05:05.200
So that's kind of one direction.
link |
01:05:06.960
I'm also very interested in learning
link |
01:05:09.800
much more scalable techniques for integer programming.
link |
01:05:13.440
So we had an ICML paper this summer on that.
link |
01:05:16.560
The first automated algorithm configuration paper
link |
01:05:20.280
that has theoretical generalization guarantees.
link |
01:05:23.560
So if I see this many training examples
link |
01:05:26.200
and I tune my algorithm in this way,
link |
01:05:28.560
it's going to have good performance
link |
01:05:30.560
on the real distribution, which I've not seen.
link |
01:05:33.200
So, which is kind of interesting
link |
01:05:34.840
that algorithm configuration has been going on now
link |
01:05:37.680
for at least 17 years seriously.
link |
01:05:41.200
And there has not been any generalization theory before.
link |
01:05:45.960
Well, this is really exciting
link |
01:05:47.200
and it's a huge honor to talk to you.
link |
01:05:49.840
Thank you so much, Tuomas.
link |
01:05:51.160
Thank you for bringing Libratus to the world
link |
01:05:52.880
and all the great work you're doing.
link |
01:05:54.160
Well, thank you very much.
link |
01:05:55.000
It's been fun.
link |
01:05:55.840
No more questions.