
Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12



link |
00:00:00.000
The following is a conversation with Tuomas Sandholm. He's a professor at CMU and co-creator of
link |
00:00:05.840
Libratus, which is the first AI system to beat top human players in the game of Heads Up No Limit
link |
00:00:11.440
Texas Holdem. He has published over 450 papers on game theory and machine learning, including a
link |
00:00:17.840
best paper in 2017 at NIPS, now renamed to NeurIPS, which is where I caught up with him for this
link |
00:00:25.360
conversation. His research and companies have had wide reaching impact in the real world,
link |
00:00:32.080
especially because he and his group not only propose new ideas, but also build systems to prove
link |
00:00:38.720
that these ideas work in the real world. This conversation is part of the MIT course on
link |
00:00:44.800
artificial general intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe
link |
00:00:50.400
on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now
link |
00:00:58.720
here's my conversation with Tuomas Sandholm. Can you describe at the high level the game of poker,
link |
00:01:06.960
Texas Holdem, Heads Up Texas Holdem, for people who might not be familiar with this card game?
link |
00:01:13.200
Yeah, happy to. So Heads Up No Limit Texas Holdem has really emerged in the AI community
link |
00:01:18.720
as a main benchmark for testing these application independent algorithms for
link |
00:01:24.160
imperfect information game solving. And this is a game that's actually played by humans. You don't
link |
00:01:31.200
see that much on TV or in casinos because, well, for various reasons, but you do see it in some
link |
00:01:38.560
expert level casinos and you see it in the best poker movies of all time. It's actually an event
link |
00:01:44.000
in the World Series of Poker, but mostly it's played online and typically for pretty big sums
link |
00:01:50.400
of money. And this is a game that usually only experts play. So if you go to your home game
link |
00:01:57.200
on a Friday night, it probably is not going to be Heads Up No Limit Texas Holdem. It might be
link |
00:02:02.800
No Limit Texas Holdem in some cases, but typically for a big group and it's not as competitive.
link |
00:02:08.560
Well, Heads Up means it's two players, so it's really like me against you. Am I better or are you
link |
00:02:14.320
better, much like chess or go in that sense, but an imperfect information game which makes it much
link |
00:02:20.240
harder because I have to deal with issues of you knowing things that I don't know and I know
link |
00:02:25.920
things that you don't know instead of pieces being nicely laid on the board for both of us to see.
link |
00:02:30.960
So in Texas Holdem, there's two cards that only you see, that belong to you, and then
link |
00:02:38.400
they gradually lay out some cards that add up overall to five cards that everybody can see.
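As a minimal sketch of the deal structure just described, in Python; the card encoding here is my own, purely illustrative:

```python
import random

# Standard 52-card deck: ranks 2..A, suits spades/hearts/diamonds/clubs.
RANKS = "23456789TJQKA"
SUITS = "shdc"
deck = [rank + suit for rank in RANKS for suit in SUITS]
random.shuffle(deck)

# Heads up: each of the two players gets two private (hole) cards...
hole_p1, hole_p2 = deck[0:2], deck[2:4]
# ...then five public cards are laid out in three tranches: flop, turn, river.
flop, turn, river = deck[4:7], deck[7:8], deck[8:9]
```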
link |
00:02:44.080
Yeah, the imperfect nature of the information is the two cards that you're holding up front.
link |
00:02:48.800
Yeah. So as you said, you know, you first get two cards in private each, and then there's a
link |
00:02:54.160
betting round, then you get three cards in public on the table, then there's a betting round, then
link |
00:02:59.440
you get the fourth card in public on the table, there's a betting round, then you get the fifth
link |
00:03:03.760
card on the table, there's a betting round. So there's a total of four betting rounds and four
link |
00:03:08.480
tranches of information revelation, if you will. Only the first tranche is private and then
link |
00:03:14.480
it's public from there. And this is probably by far the most popular game in AI and just
link |
00:03:25.200
the general public in terms of imperfect information. So it's probably the most popular
link |
00:03:30.080
spectator game to watch, right? Which is why it's a super exciting game to tackle. So it's on
link |
00:03:38.000
the order of chess, I would say in terms of popularity, in terms of AI setting it as the bar
link |
00:03:44.400
of what is intelligence. So in 2017, Libratus beat four expert human players. Can you
link |
00:04:01.280
describe that event? What you learned from it? What was it like? What was the process in general
link |
00:04:06.720
for people who have not read the papers in the study? Yeah. So the event was that we invited
link |
00:04:12.960
four of the top 10 players, and these were specialist players in Heads Up No Limit Texas Holdem,
link |
00:04:17.600
which is very important because this game is actually quite different than the multiplayer
link |
00:04:22.960
version. We brought them in to Pittsburgh to play at the Rivers Casino for 20 days. We wanted to
link |
00:04:29.360
get 120,000 hands in because we wanted to get statistical significance. So it's a lot of hands
link |
00:04:37.440
for humans to play, even for these top pros who play fairly quickly normally. So we couldn't just
link |
00:04:44.000
have one of them play so many hands. 20 days, they were playing basically morning to evening.
link |
00:04:50.320
And you raised $200,000 as a little incentive for them to play. And the setting was so that
link |
00:04:57.920
they didn't all get $50,000 each. We actually paid them out based on how they did against the AI.
link |
00:05:05.280
So they had an incentive to play as hard as they could, whether they're way ahead or way behind
link |
00:05:11.040
or right at the mark of beating the AI. And you don't make any money, unfortunately. Right. No,
link |
00:05:16.400
we can't make any money. So originally, a couple of years earlier, I actually explored whether we
link |
00:05:22.560
could actually play for money because that would be, of course, interesting as well to play against
link |
00:05:28.400
the top people for money. But the Pennsylvania Gaming Board said no. So we couldn't. So this
link |
00:05:33.280
is much like an exhibition, like for a musician or a boxer or something like that. Nevertheless,
link |
00:05:40.160
they were keeping track of the money, and Libratus won close to $2 million, I think. So if it was
link |
00:05:49.040
for real money, if you were able to earn money, that would have been a quite impressive and inspiring achievement.
link |
00:05:55.200
Just a few details. What were the players looking at? Were they behind a computer? What
link |
00:06:00.720
was the interface like? Yes, they were playing much like they normally do. These top players,
link |
00:06:06.080
when they play this game, they play mostly online. So they're used to playing through a UI.
link |
00:06:11.600
And they did the same thing here. So there was this layout, you could imagine. There's a table
link |
00:06:16.880
on a screen. There's the human sitting there. And then there's the AI sitting there. And the
link |
00:06:22.880
screen shows everything that's happening. The cards coming out and shows the bets being made.
link |
00:06:27.440
And we also had the betting history for the human. So if the human forgot what had happened in the
link |
00:06:32.320
hand so far, they could actually reference back and so forth. Is there a reason they were given
link |
00:06:39.040
access to the betting history? Well, it didn't really matter. They wouldn't have forgotten
link |
00:06:46.880
anyway. These are top quality people. But we just wanted to put it out there so it's not a question
link |
00:06:52.160
of the human forgetting and the AI somehow trying to get that advantage of better memory.
link |
00:06:56.560
So what was that like? I mean, that was an incredible accomplishment. So what did it feel like
link |
00:07:01.040
before the event? Did you have doubt, hope? Where was your confidence at?
link |
00:07:08.080
Yeah, that's great. Great question. So 18 months earlier, I had organized a similar
link |
00:07:13.760
brains versus AI competition with a previous AI called Claudico. And we couldn't beat the humans.
link |
00:07:20.480
So this time around, it was only 18 months later. And I knew that this new AI, Libratus,
link |
00:07:26.400
was way stronger. But it's hard to say how you'll do against the top humans before you try.
link |
00:07:32.320
So I thought we had about a 50-50 shot. And the international betting sites put us as a
link |
00:07:39.040
four to one or five to one underdog. So it's kind of interesting that people really believe in people
link |
00:07:45.440
over AI. People don't just believe in themselves,
link |
00:07:50.560
but they have overconfidence in other people as well, compared to the performance of AI.
link |
00:07:54.800
And yeah, so we were a four to one or five to one underdog. And even after three days of beating
link |
00:08:00.800
the humans in a row, we were still 50-50 on the international betting sites.
link |
00:08:05.680
Do you think there's something special and magical about poker in the way people think about it?
link |
00:08:11.120
In the sense that, I mean, even in chess, there are no Hollywood movies. Poker is the star
link |
00:08:19.120
of many movies. And there's this feeling that certain human facial expressions and body language,
link |
00:08:29.760
eye movement, all these tells are critical to poker. Like you can look into somebody's soul
link |
00:08:34.880
and understand their betting strategy and so on. So is that possibly, do you think, why
link |
00:08:41.760
people have a confidence that humans will outperform? Because AI systems cannot, in this
link |
00:08:48.000
construct, perceive these kinds of tells; they're only looking at betting patterns and
link |
00:08:55.600
nothing else, the betting patterns and statistics. So what's more important, if you step back
link |
00:09:03.920
and look at human versus human play? What's the role of these tells, of these ideas that we romanticize?
link |
00:09:12.960
Yeah, so I'll split it into two parts. So one is why do humans trust humans more than AI and
link |
00:09:21.760
have overconfidence in humans? I think that's not really related to the tell question. It's
link |
00:09:27.040
just that they've seen how good these top players are, and they're really fantastic. So it's just
link |
00:09:33.120
hard to believe that the AI could beat them. So I think that's where that comes from. And that's
link |
00:09:39.200
actually maybe a more general lesson about AI: until you've seen it outperform a human,
link |
00:09:44.080
it's hard to believe that it could. But then the tells, a lot of these top players, they're so good
link |
00:09:52.080
at hiding tells that among the top players, it's actually not really worth it for them to invest a
link |
00:10:00.320
lot of effort trying to find tells in each other because they're so good at hiding them. So yes,
link |
00:10:06.560
at the kind of Friday evening game, tells are going to be a huge thing. You can read other people
link |
00:10:12.960
and if you're a good reader, you'll read them like an open book. But at the top levels of poker,
link |
00:10:17.760
no, the tells become a much smaller and smaller aspect of the game as you go to the top
link |
00:10:23.120
levels. The number of strategies, the number of possible actions, is very large, 10 to the power
link |
00:10:32.960
of 100 plus. So there has to be some abstraction. I've read a few of the related papers:
link |
00:10:40.400
it has to form some abstractions of various hands and actions. So what kind of abstractions are
link |
00:10:46.880
effective for the game of poker? Yeah, so you're exactly right. So when you go from a game tree
link |
00:10:52.800
that's 10 to the 161, especially in an imperfect information game, it's way too large to solve
link |
00:10:59.280
directly. Even with our fastest equilibrium finding algorithms. So you want to abstract it
link |
00:11:06.160
first. And abstraction in games is much trickier than abstraction in MDPs or other single agent
link |
00:11:13.840
settings, because you have these abstraction pathologies: if I have a finer grained abstraction,
link |
00:11:20.560
the strategy that I can get from that for the real game might actually be worse than the strategy
link |
00:11:25.120
I can get from the coarse grained abstraction. So you have to be very careful. Now, the kinds
link |
00:11:29.440
of abstractions, just to zoom out, that we're talking about: there's the hand abstractions and then there
link |
00:11:34.880
is betting strategies. Yeah, betting actions. So there's information abstraction to talk about
link |
00:11:42.480
general games, information abstraction, which is the abstraction of what chance does. And this
link |
00:11:47.760
would be the cards in the case of poker. And then there's action abstraction, which is abstracting
link |
00:11:54.400
the actions of the actual players, which would be bets in the case of poker, yourself, any other
link |
00:12:00.640
players, yes, yourself and other players. And for information abstraction, we were completely
link |
00:12:10.240
automated. So these are algorithms that do what we call potential aware abstraction,
link |
00:12:16.640
where we don't just look at the value of the hand, but also how it might materialize into
link |
00:12:20.880
good or bad hands over time. And it's a certain kind of bottom up process with integer programming
link |
00:12:26.320
there, and clustering, and various aspects of how you build this abstraction.
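A toy illustration of that information-abstraction idea, not Libratus's actual pipeline (which, as he says, combines integer programming with a bottom-up process): describe each hand by a histogram of where its equity may land on future cards, then cluster hands with similar histograms so that hands which "materialize" similarly share a bucket. The data and parameters below are made up.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is a hand, described not by one equity number but by a histogram of
# the equities it may reach after future cards (the potential-aware idea).
rng = np.random.default_rng(0)
equity_histograms = rng.dirichlet(np.ones(10), size=1000)  # 1000 fake hands

# Merge similar hands into buckets; the equilibrium-finding step then treats
# every hand in a bucket identically, shrinking the game before solving it.
buckets = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(equity_histograms)
```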
link |
00:12:32.640
And then in the action abstraction, it's largely based on how humans and other AIs have played
link |
00:12:40.720
this game in the past. But in the beginning, we actually used an automated action abstraction
link |
00:12:46.560
technology, which is provably convergent, in that it finds the optimal combination of bet sizes,
link |
00:12:53.840
but it's not very scalable. So we couldn't use it for the whole game, but we use it for the first
link |
00:12:58.320
couple of betting actions. So what's more important, the strength of the hand, so the
link |
00:13:03.760
information abstraction, or how you play them, the actions? You know, the romanticized
link |
00:13:12.320
notion again is that it doesn't matter what hands you have; the actions, the betting,
link |
00:13:18.000
may be the way you win, no matter what hands you have. Yeah, so that's why you have to play a lot
link |
00:13:22.240
of hands, so that the role of luck gets smaller. So you could otherwise get lucky and get some good
link |
00:13:29.440
hands, and then you're going to win the match. Even with thousands of hands, you can get lucky.
link |
00:13:35.120
Because there's so much variance in no limit Texas Holdem, because if we both go all in,
link |
00:13:40.720
it's a huge amount of variance. So there are these massive swings in no limit Texas Holdem. So
link |
00:13:47.920
that's why you have to play not just thousands, but over a hundred thousand hands to get statistical significance.
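A rough back-of-the-envelope check of why that many hands are needed, assuming, purely for illustration, that per-hand results have a standard deviation of around 10 big blinds (a common ballpark for heads up no limit, not a figure from the match):

```python
import math

sd_bb = 10.0  # assumed per-hand standard deviation, in big blinds
for hands in (1_000, 120_000):
    se = sd_bb / math.sqrt(hands)  # standard error of the mean win rate
    print(f"{hands} hands -> +/- {se * 100:.1f} bb per 100 hands")
# 1,000 hands   -> +/- 31.6 bb/100 (noise swamps any plausible skill edge)
# 120,000 hands -> +/-  2.9 bb/100 (a solid win rate becomes statistically visible)
```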
link |
00:13:53.600
So let me ask this question another way. If you didn't even look at your hands,
link |
00:14:01.920
but the opponents didn't know that, how well would you be able to do?
link |
00:14:05.840
That's a good question. There's actually, I heard the story that there's this Norwegian female poker
link |
00:14:11.440
player called Annette Obrestad, who actually won a tournament by doing exactly that. But that
link |
00:14:17.040
would be extremely rare. So you cannot really play well that way. Okay, so the hands do have
link |
00:14:26.160
some role to play. So Libratus does not, as far as I understand, use learning methods,
link |
00:14:34.320
deep learning. Is there room for learning? You know, there's no reason why Libratus couldn't,
link |
00:14:42.880
you know, combine with an AlphaGo type approach for estimating the quality of states, a value function estimator.
link |
00:14:49.520
What are your thoughts on this? Maybe as compared to another algorithm, which I'm not
link |
00:14:55.040
that familiar with: DeepStack, the engine that does use deep learning. It is unclear how well
link |
00:15:01.120
it does, but nevertheless uses deep learning. So what are your thoughts about learning methods to
link |
00:15:05.680
aid in the way that Libratus plays the game of poker? Yeah, so as you said, Libratus did not
link |
00:15:11.520
use learning methods and played very well without them. Since then, we have actually, here,
link |
00:15:17.760
we have a couple of papers on things that do use learning techniques. Excellent. So, deep learning
link |
00:15:25.200
in particular, and sort of the way you're talking about where it's learning an evaluation function.
link |
00:15:33.360
But in imperfect information games, unlike, let's say, in Go, or now also in chess and shogi,
link |
00:15:42.400
it's not sufficient to learn an evaluation for a state because the value of an information set
link |
00:15:52.160
depends not only on the exact state, but it also depends on both players beliefs. Like if I have
link |
00:16:00.080
a bad hand, I'm much better off if the opponent thinks I have a good hand. And vice versa,
link |
00:16:05.360
if I have a good hand, I'm much better off if the opponent believes I have a bad hand. So the value
link |
00:16:12.640
of a state is not just a function of the cards. It depends on if you will, the path of play,
link |
00:16:18.720
but only to the extent that it's captured in the belief distributions. So that's why it's not as
link |
00:16:25.440
simple as it is in perfect information games. And I don't want to say it's simple there either.
link |
00:16:31.040
It's of course, very complicated computationally there too. But at least conceptually, it's very
link |
00:16:35.680
straightforward. There's a state, there's an evaluation function, you can try to learn it.
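A toy numerical illustration of the belief dependence he describes, with made-up numbers: the same weak hand is worth very different amounts depending on what the opponent believes.

```python
# Crude model of a bluff: assume the opponent folds with probability equal to
# their belief that we are strong (an assumption made only for this toy).
def bluff_value(p_opponent_thinks_we_are_strong, pot=10.0, bet=5.0):
    p_fold = p_opponent_thinks_we_are_strong
    return p_fold * pot + (1.0 - p_fold) * (-bet)  # win pot on a fold, lose bet when called

print(bluff_value(0.8))  # same cards, credible image:  +7.0
print(bluff_value(0.2))  # same cards, no credibility:  -2.0
```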
link |
00:16:40.080
Here, you have to do something more. And what we do is in one of these papers, we're looking at
link |
00:16:48.640
allowing the opponent to actually take different strategies at the leaves of the
link |
00:16:54.000
search tree, if you will. And that is a different way of doing it. And it doesn't assume therefore
link |
00:17:01.600
a particular way that the opponent plays. But it allows the opponent to choose from a set of
link |
00:17:07.200
different continuation strategies. And that forces us to not be too optimistic in a lookahead
link |
00:17:14.560
search. And that's one way you can do sound lookahead search in imperfect information
link |
00:17:20.720
games, which is very difficult. And you were asking about DeepStack; what they
link |
00:17:26.400
did, it was very different than what we do, either in Libratus or in this new work.
link |
00:17:31.920
They were generally randomly generating various situations in the game. Then they were doing
link |
00:17:37.120
the lookahead from there to the end of the game, as if that was the start of a different game. And
link |
00:17:42.960
then they were using deep learning to learn those values of those states. But the states were not
link |
00:17:48.800
just the physical states, they include the belief distributions. When you talk about lookahead,
link |
00:17:55.680
for DeepStack or for Libratus, does it mean considering every possibility that the game can
link |
00:18:01.600
evolve into? Are we talking about, sort of, this extreme exponential growth of a tree? Yes. So we're
link |
00:18:08.160
talking about exactly that. Much like you do in alpha-beta search or Monte Carlo tree search,
link |
00:18:15.760
but with different techniques. So there's a different search algorithm. And then we have to
link |
00:18:19.920
deal with the leaves differently. So if you think about what Libratus did, we didn't have to
link |
00:18:24.400
worry about this, because we only did it at the end of the game. So we would always terminate into a
link |
00:18:30.880
real situation. And we would know what the payout is. It didn't do these depth limited lookaheads.
link |
00:18:36.720
But now in this new paper, which I think is called depth limited search
link |
00:18:42.000
for imperfect information games, we can actually do sound depth limited lookaheads. So we can
link |
00:18:47.360
actually start to do the lookahead from the beginning of the game on, because that's too
link |
00:18:52.080
complicated to do for this whole long game. So in Libratus, we were just doing it for the end.
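A caricature of the depth-limited idea in code, my own simplification rather than the paper's algorithm: instead of trusting one value estimate at a leaf of the lookahead, let the opponent pick, leaf by leaf, whichever continuation strategy is worst for us, which keeps the lookahead from being over-optimistic.

```python
def leaf_value(payoff_per_continuation):
    # payoff_per_continuation: our estimated payoff at this leaf under each of a
    # small set of opponent continuation strategies (hypothetical numbers).
    return min(payoff_per_continuation)

# If the opponent could continue aggressively (-2.0), passively (+1.5), or in a
# balanced way (+0.5), we score the leaf pessimistically at -2.0.
print(leaf_value([-2.0, 1.5, 0.5]))
```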
link |
00:18:57.680
And then the other side, this belief distribution: is it explicitly modeled,
link |
00:19:03.040
what kind of beliefs the opponent might have? Yeah, it is explicitly modeled, but it's not
link |
00:19:10.800
assumed. The beliefs are actually output, not input. Of course, the starting beliefs are input,
link |
00:19:18.720
but they just fall from the rules of the game, because we know that the dealer deals uniformly
link |
00:19:23.440
from the deck. So I know that every pair of cards that you might have is equally likely.
link |
00:19:29.600
I know that for a fact, that just follows from the rules of the game. Of course,
link |
00:19:33.600
except the two cards that I have, I know you don't have those. You have to take that into
link |
00:19:38.320
account. That's called card removal, and that's very important.
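A minimal sketch of that uniform starting belief with card removal; the hand encoding and example cards are my own:

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
deck = [r + s for r in RANKS for s in SUITS]

my_hole = {"As", "Kd"}                        # the two cards only I can see
live = [c for c in deck if c not in my_hole]  # card removal: 50 cards remain
opp_hands = list(combinations(live, 2))       # C(50, 2) = 1225 possible holdings
prior = 1.0 / len(opp_hands)                  # each equally likely before any betting
# e.g. the prior probability the opponent holds at least one ace:
p_ace = sum(any(c[0] == "A" for c in hand) for hand in opp_hands) / len(opp_hands)
```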
link |
00:19:42.480
Is the dealing always coming from a single deck in heads up? Yes. So you can assume a single deck. So you know that if I have
link |
00:19:50.000
the ace of spades, I know you don't have an ace of spades. So in the beginning, your belief is
link |
00:19:55.680
basically the fact that it's a fair dealing of hands. But how do you start to adjust that belief?
link |
00:20:02.640
Well, that's where the beauty of game theory comes in. So Nash equilibrium, which John Nash
link |
00:20:08.960
introduced in 1950, defines what rational play is when you have more than one player.
link |
00:20:15.920
And these are pairs of strategies where strategies are contingency plans, one for each player.
link |
00:20:21.200
So that neither player wants to deviate to a different strategy, given that the other
link |
00:20:27.920
doesn't deviate. But as a side effect, you get the beliefs from Bayes' rule. So Nash equilibrium
link |
00:20:34.800
really does more in these imperfect information games. Nash equilibrium doesn't
link |
00:20:39.760
just define strategies. It also defines beliefs for both of us, and it defines beliefs for each state.
link |
00:20:47.920
So at each state, or what they call information sets, at each information set in the game,
link |
00:20:54.720
there's a set of different states that we might be in, but I don't know which one we're in.
link |
00:21:00.960
Nash equilibrium tells me exactly what is the probability distribution over those real
link |
00:21:05.440
world states in my mind. How does Nash equilibrium give you that distribution?
link |
00:21:11.360
I'll do a simple example. So you know the game Rock Paper Scissors? So we can draw it
link |
00:21:17.280
as player one moves first, and then player two moves. But of course, it's important that player
link |
00:21:23.600
two doesn't know what player one moved. Otherwise player two would win every time. So we can draw
link |
00:21:28.560
that as an information set where player one makes one of three moves first. And then there's an
link |
00:21:33.440
information set for player two. So player two doesn't know which of those nodes the world is in.
link |
00:21:40.480
But once we know the strategy for player one, Nash equilibrium will say that you play one third
link |
00:21:46.400
rock, one third paper, one third scissors. From that I can derive my beliefs on the information
link |
00:21:52.160
set that they're one third, one third, one third. So Bayes gives you that.
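That rock paper scissors argument can be checked in a few lines; this is just the standard payoff matrix, nothing specific to his systems:

```python
import numpy as np

# Row player's payoffs (rows and columns ordered rock, paper, scissors).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

sigma = np.ones(3) / 3   # the uniform mixed strategy
print(A @ sigma)         # every pure response earns 0, so there is no profitable
                         # deviation: uniform versus uniform is a Nash equilibrium.
# Player two's belief at her information set then follows from Bayes' rule and is
# simply player one's mix: one third rock, one third paper, one third scissors.
```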
link |
00:21:58.480
But is that specific to a particular player? Or is it something you quickly update as you go? No, the game theory
link |
00:22:06.800
isn't really player specific. So that's also why we don't need any data. We don't need any history
link |
00:22:12.560
of how these particular humans played in the past, or how any AI or human had played before. It's all
link |
00:22:17.840
about rationality. So the AI just thinks about what a rational opponent would do,
link |
00:22:24.720
and what it would do itself if it were rational. That's the idea of game theory.
link |
00:22:30.960
So it's really a data free opponent free approach. So it comes from the design of the game as opposed
link |
00:22:38.080
to the design of the player. Exactly. There's no opponent modeling per se. I mean, we've done
link |
00:22:43.600
some work on combining opponent modeling with game theory. So you can exploit weak players
link |
00:22:47.760
even more. But that's another strand, and in Libratus, we didn't turn that on. So I decided that
link |
00:22:53.440
these players are too good. And when you start to exploit an opponent, you typically open yourself
link |
00:22:59.520
up to exploitation. And these guys have so few holes to exploit, and they're the world's leading
link |
00:23:04.720
experts in counter exploitation. So I decided that we're not going to turn that stuff on.
link |
00:23:09.120
Actually, I saw a few of your papers; exploiting opponents sounds very interesting to explore.
link |
00:23:15.600
Do you think there's room for exploitation generally, outside of Libratus?
link |
00:23:19.840
Are there personal differences between people that could be exploited? Maybe not just in poker,
link |
00:23:27.840
but in general interactions and negotiations all these other domains that you're considering?
link |
00:23:33.360
Yeah, definitely. We've done some work on that. And I really like the work that hybridizes the two.
link |
00:23:39.760
So you figure out what a rational opponent would do. And by the way, that's safe in these zero
link |
00:23:45.200
sum games, two player zero sum games, because if the opponent does something irrational,
link |
00:23:49.440
yes, it might throw off my beliefs. But the amount that the player can gain by throwing
link |
00:23:56.320
off my beliefs is always less than what they lose by playing poorly. So it's safe. But still,
link |
00:24:04.240
if somebody's weak as a player, you might want to play differently to exploit them more.
link |
00:24:10.160
So you can think about it this way, a game theoretic strategy is unbeatable,
link |
00:24:15.600
but it doesn't maximally beat the other opponent. So the winnings per hand might be better
link |
00:24:22.720
with a different strategy. And the hybrid is that you start from a game theoretic approach,
link |
00:24:27.120
and then as you gain data about the opponent in certain parts of the game tree, then in those
link |
00:24:33.040
parts of the game tree, you start to tweak your strategy more and more towards exploitation,
link |
00:24:39.280
while still staying fairly close to the game theoretic strategy so as to not open yourself
link |
00:24:44.160
up to exploitation too much. How do you do that? Do you try to vary your strategies, make it unpredictable?
link |
00:24:53.600
It's like, what is it, tit for tat strategies in Prisoner's Dilemma or?
link |
00:25:00.560
Well, that's a repeated game, simple Prisoner's Dilemma, repeated games. But even there,
link |
00:25:07.440
there's no proof that says that that's the best thing. But experimentally, it actually does well.
link |
00:25:13.120
So what kind of games are there, first of all? I don't know if this is something that you could
link |
00:25:17.360
just summarize. There's perfect information games with all the information on the table.
link |
00:25:22.320
There is imperfect information games. There's repeated games that you play over and over.
link |
00:25:28.480
There's zero sum games. There's non zero sum games. And then there's a really important
link |
00:25:36.960
distinction you're making, two player versus more players. So what other games are there?
link |
00:25:44.640
And what's the difference, for example, with this two player game versus more players? Yeah,
link |
00:25:49.920
what are the key differences? Right. So let me start from the basics. So
link |
00:25:56.320
a repeated game is a game where the same exact game is played over and over.
link |
00:26:02.160
In these extensive form games, where you think about the tree form, maybe with these
link |
00:26:08.160
information sets to represent incomplete information, you can have kind of repetitive
link |
00:26:14.000
interactions and even repeated games are a special case of that, by the way. But
link |
00:26:19.680
the game doesn't have to be exactly the same. It's like in sourcing auctions. Yes, we're going to
link |
00:26:24.160
see the same supply base year to year. But what I'm buying is a little different every time.
link |
00:26:29.040
And the supply base is a little different every time and so on. So it's not really repeated.
link |
00:26:34.080
So to find a purely repeated game is actually very rare in the world. So they're really a very
link |
00:26:40.960
coarse model of what's going on. Then if you move up from just repeated, simple,
link |
00:26:47.360
repeated matrix games, not all the way to extensive form games, but in between,
link |
00:26:52.560
there's stochastic games, where you know, there's these, you think about it like these little
link |
00:26:59.360
matrix games. And when you take an action and your opponent takes an action, they determine not which
link |
00:27:06.080
next state, which next game, I'm going to, but the distribution over next games,
link |
00:27:11.280
where I might be going to. So that's a stochastic game. So it's
link |
00:27:15.760
matrix games, repeated games, stochastic games, extensive form games; that is from less to more
link |
00:27:22.160
general. And poker is an example of the last one. So it's really in the most general setting,
link |
00:27:29.440
extensive form games. And that's kind of what the AI community has been working on and being
link |
00:27:34.720
benchmarked on with this Heads Up No Limit Texas Holdem. Can you describe extensive form games?
link |
00:27:39.680
What's the model here? So if you're familiar with the tree form, so it's really the tree form,
link |
00:27:45.600
like in chess, there's a search tree, versus a matrix? Versus a matrix, yes. And the
link |
00:27:51.280
matrix is called the matrix form, or bimatrix form, or normal form game. And here you have the tree
link |
00:27:56.640
form. So you can actually do certain types of reasoning there that you'd lose
link |
00:28:02.320
when you go to normal form. There's a certain form of equivalence, like if you go from tree
link |
00:28:07.840
form and you say every possible contingency plan is a strategy, then I can actually go back
link |
00:28:13.840
to the normal form, but I lose some information from the lack of sequentiality. Then the multiplayer
link |
00:28:19.600
versus two player distinction is an important one. So two player games in zero sum are conceptually
link |
00:28:29.520
easier and computationally easier. They're still huge, like this one. But they're conceptually
link |
00:28:37.040
easier and computationally easier. In that conceptually, you don't have to worry about
link |
00:28:42.160
which equilibrium is the other guy going to play when there are multiple, because any equilibrium
link |
00:28:47.440
strategy is the best response to any other equilibrium strategy. So I can play a different
link |
00:28:52.400
equilibrium from you and we'll still get the right values of the game. That falls apart even
link |
00:28:57.600
with two players when you have general sum games. Even without cooperation, just even without
link |
00:29:02.960
cooperation. So there's a big gap from two player zero sum to two player general sum, or even to
link |
00:29:09.040
three players zero sum. That's a big gap, at least in theory. Can you maybe non mathematically
link |
00:29:17.600
provide the intuition why it all falls apart with three or more players? It seems like you should
link |
00:29:23.040
still be able to have a Nash equilibrium that's instructive, that holds. Okay, so it is true
link |
00:29:32.640
that all finite games have a Nash equilibrium. This is what John Nash actually proved.
link |
00:29:40.880
So they do have a Nash equilibrium. That's not the problem. The problem is that there can be many.
link |
00:29:46.480
And then there's a question of which equilibrium to select. So and if you select your strategy
link |
00:29:52.000
from a different equilibrium and I select mine, then what does that mean? And in these non zero
link |
00:30:00.960
sum games, we may lose some joint benefit by just being, simply put, stupid; we could actually both be
link |
00:30:07.760
better off if we did something else. And in three player, you get other problems also like collusion.
link |
00:30:12.960
Like maybe you and I can gang up on a third player, and we can do radically better by colluding.
link |
00:30:19.600
So there are lots of issues that come up there. So Noam Brown, the student you worked with on this,
link |
00:30:25.440
has mentioned, I looked through the AMA on Reddit, he mentioned that the ability of poker players
link |
00:30:31.120
to collaborate would make the game harder. He was asked the question of, how would you make the game of
link |
00:30:36.320
poker, or both of you were asked the question, how would you make the game of poker beyond
link |
00:30:43.440
being solvable by current AI methods? And he said that there's not many ways of making poker more
link |
00:30:51.440
difficult, but a collaboration or cooperation between players would make it extremely difficult.
link |
00:30:59.600
So can you provide the intuition behind why that is, if you agree with that idea?
link |
00:31:05.200
Yeah, so we've done a lot of work on coalitional games. And we actually have a paper here with
link |
00:31:11.840
my other student Gabriele Farina and some other collaborators, at NIPS; actually, I just
link |
00:31:17.040
came back from the poster session where we presented this. So when you have collusion,
link |
00:31:22.080
it's a different problem. And it typically gets even harder then. Even the game representations,
link |
00:31:29.440
some of the game representations don't really allow good computation. So we actually introduced a new
link |
00:31:35.840
game representation for that. Is that kind of cooperation part of the model? Do you have
link |
00:31:43.840
information about the fact that other players are cooperating? Or is it just this chaos that
link |
00:31:48.720
where nothing is known? So there are some things unknown. Can you give an example of a
link |
00:31:53.760
collusion type game? Or is it usually? So like bridge. Yeah, so think about bridge. It's like
link |
00:31:59.920
when you and I are on a team, our payoffs are the same. The problem is that we can't talk. So when
link |
00:32:06.960
I get my cards, I can't whisper to you what my cards are. That would not be allowed. So we have
link |
00:32:13.360
to somehow coordinate our strategies ahead of time. And only ahead of time. And then there's
link |
00:32:20.480
certain signals we can talk about. But they have to be such that the other team also understands
link |
00:32:25.920
that. So that's an example where the coordination is already built into the rules
link |
00:32:31.920
of the game. But in many other situations, like auctions or negotiations or diplomatic
link |
00:32:38.400
relationships, poker, it's not really built in. But it still can be very helpful for the
link |
00:32:44.320
colluders. I've read you write somewhere that in negotiations, you come to the table with a prior,
link |
00:32:52.640
like a strategy of what you're willing to do and not willing to do, those kinds of things.
link |
00:32:58.160
So now, moving away from poker, moving beyond poker into other applications
link |
00:33:04.320
like negotiations, how do you start applying this to other domains, even real world
link |
00:33:10.960
domains that you've worked on? Yeah, I actually have two startup companies doing exactly that.
link |
00:33:15.360
One is called Strategic Machine. And that's for kind of business applications, gaming, sports,
link |
00:33:21.520
all sorts of things like that. Any applications of this to business and to sports and to gaming,
link |
00:33:29.040
various types of things in finance, electricity markets and so on. And the other
link |
00:33:34.800
is called Strategy Robot, where we are taking these to military security, cybersecurity,
link |
00:33:41.360
and intelligence applications. I think you worked a little bit in
link |
00:33:47.920
how do you put it, advertisement, sort of suggesting ads kind of thing.
link |
00:33:54.560
Yeah, that's another company, Optimized Markets. But that's much more about a
link |
00:33:59.040
combinatorial market and optimization based technology. That's not using these
link |
00:34:04.720
game theoretic reasoning technologies. I see. Okay, so at a sort of high level, how do you think about
link |
00:34:13.040
our ability to use game theoretic concepts to model human behavior? Do you think human behavior is
link |
00:34:20.320
amenable to this kind of modeling? So outside of the poker games and where have you seen it
link |
00:34:25.680
done successfully in your work? I'm not sure the goal really is modeling humans.
link |
00:34:33.520
Like for example, if I'm playing a zero sum game, I don't really care that the opponent is actually
link |
00:34:40.240
following my model of rational behavior. Because if they're not, that's even better for me.
link |
00:34:45.680
All right, so with the opponents in games, the prerequisite is that you've formalized
link |
00:34:56.240
the interaction in some way that can be amenable to analysis. I mean, you've done this amazing work
link |
00:35:02.720
with mechanism design, designing games that have certain outcomes. So I'll tell you an example
link |
00:35:12.160
from my world of autonomous vehicles. We're studying pedestrians, and pedestrians and cars
link |
00:35:19.360
negotiate in this nonverbal communication. There's this weird game, a dance of tension, where
link |
00:35:25.760
pedestrians are basically saying, I trust that you won't kill me. And so as a jaywalker,
link |
00:35:30.400
I will step onto the road even though I'm breaking the law and there's this tension.
link |
00:35:34.560
And the question is, we really don't know how to model that well in trying to model intent.
link |
00:35:40.480
And so people sometimes bring up ideas of game theory and so on. Do you think that aspect
link |
00:35:47.120
of human behavior can use these kinds of imperfect information approaches, modeling?
link |
00:35:54.800
How do you start to attack a problem like that when you don't even know how to design the game
link |
00:36:00.800
to describe the situation in order to solve it? Okay, so I haven't really thought about jaywalking.
link |
00:36:06.640
But one thing that I think could be a good application in autonomous vehicles is the
link |
00:36:12.080
following. So let's say that you have fleets of autonomous cars operating by different companies.
link |
00:36:18.240
So maybe here's the Waymo fleet and here's the Uber fleet. If you think about the rules of the road,
link |
00:36:24.160
they define certain legal rules, but that still leaves a huge strategy space open.
link |
00:36:29.920
Like as a simple example, when cars merge, you know, how humans merge, you know,
link |
00:36:33.760
they slow down and look at each other and try to merge. Wouldn't it be better if these situations
link |
00:36:40.800
would already be prenegotiated so we can actually merge at full speed and we know that this is the
link |
00:36:46.240
situation, this is how we do it and it's all going to be faster. But there are way too many
link |
00:36:51.680
situations to negotiate manually. So you could use automated negotiation. This is the idea at least.
link |
00:36:57.600
You could use automated negotiation to negotiate all of these situations or many of them in advance.
link |
00:37:04.160
And of course, it might be that, hey, maybe you're not going to always let me go first.
link |
00:37:09.040
Maybe you said, okay, well, in these situations, I'll let you go first. But in exchange, you're
link |
00:37:13.520
going to let me go first in these other situations. So it's this huge
link |
00:37:18.240
combinatorial negotiation. And do you think there's room in that example of merging to
link |
00:37:24.240
model this whole situation as an imperfect information game? Or do you really want to
link |
00:37:28.080
consider it to be a perfect information game? That's a good question. Do you
link |
00:37:33.520
pay the price of assuming that you don't know everything? Yeah, I don't know. It's certainly
link |
00:37:41.120
much easier. Games with perfect information are much easier. So if you can get away with it,
link |
00:37:48.240
you should. But if the real situation is of imperfect information, then you're going to
link |
00:37:52.960
have to deal with imperfect information. Great. So what lessons have you learned from the annual
link |
00:37:58.480
computer poker competition? An incredible accomplishment of AI. You look at the history
link |
00:38:04.560
of Deep Blue, AlphaGo, these kinds of moments when AI stepped up in an engineering effort and a
link |
00:38:12.240
scientific effort combined to beat the best human player. So what do you take away from
link |
00:38:18.240
this whole experience? What have you learned about designing AI systems that play these kinds
link |
00:38:23.200
of games? And what does that mean for AI in general, for the future of AI development?
link |
00:38:30.720
Yeah, so that's a good question. So there's so much to say about it.
link |
00:38:35.280
I do like this type of performance oriented research, although in my group, we go all the
link |
00:38:40.320
way from like idea to theory to experiments to big system building to commercialization. So we
link |
00:38:46.160
span that spectrum. But I think that in a lot of situations in AI, you really have to build the
link |
00:38:52.560
big systems and evaluate them at scale before you know what works and doesn't. And we've seen
link |
00:38:58.400
that in the computational game theory community, that there are a lot of techniques that look good
link |
00:39:03.280
in the small, but then they cease to look good in the large. And we've also seen that there are a
link |
00:39:08.640
lot of techniques that look superior in theory. And I really mean in terms of convergence rates,
link |
00:39:15.600
better, like first order methods that have better convergence rates than the CFR based algorithms,
link |
00:39:20.720
yet the CFR based algorithms are the fastest in practice. So it really tells me that you have
link |
00:39:26.080
to test these in reality, the theory isn't tight enough, if you will, to tell you which
link |
00:39:32.400
algorithms are better than the others. And you have to look at these things in the large,
link |
00:39:38.480
because any sort of projections you do from the small can at least in this domain be very misleading.
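To make the CFR remark above concrete, here is a minimal regret-matching loop, the core update behind CFR-style algorithms, in self-play on rock paper scissors. This is a toy sketch, not anything resembling the tournament systems:

```python
import numpy as np

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # row player's payoffs; zero sum

def strategy(regrets):
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3

rng = np.random.default_rng(1)
r1, r2 = rng.random(3), rng.random(3)  # random initial regrets to break symmetry
avg1 = np.zeros(3)
for _ in range(20_000):
    s1, s2 = strategy(r1), strategy(r2)
    avg1 += s1
    u1 = A @ s2          # row player's expected payoff for each pure action
    u2 = -(A.T @ s1)     # column player's expected payoff for each pure action
    r1 += u1 - s1 @ u1   # accumulate regret for not having played each action
    r2 += u2 - s2 @ u2
print(avg1 / avg1.sum())  # average strategy converges to roughly [1/3, 1/3, 1/3]
```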
link |
00:39:43.600
So that's from a kind of science and engineering perspective. From a personal perspective,
link |
00:39:49.040
it's been just a wild experience in that with the first poker competition, the first
link |
00:39:56.160
brains versus AI man machine poker competition that we organized. There had been, by the way,
link |
00:40:00.640
for other poker games, there had been previous competitions, but this was for heads up no limit,
link |
00:40:04.880
this was the first. And I probably became the most hated person in the world of poker. And I
link |
00:40:11.200
didn't mean to be. Why is that? For cracking the game? Yeah, a lot of people
link |
00:40:18.400
felt that it was a real threat to the whole game, the whole existence of the game. If AI becomes
link |
00:40:24.160
better than humans, people would be scared to play poker, because there are these superhuman
link |
00:40:29.680
AIs running around taking their money, and you know, all of that. So, it was really aggressive.
link |
00:40:35.120
Interesting. The comments were super aggressive. I got everything just short of death threats.
link |
00:40:42.000
Do you think the same was true for chess? Because right now, they just completed the
link |
00:40:45.760
world championships in chess, and humans just started ignoring the fact that there's AI systems
link |
00:40:50.720
now that outperform humans, and they still enjoy the game; it is still a beautiful game.
link |
00:40:55.360
That's what I think. And I think the same thing happened in poker. And so I didn't
link |
00:41:00.160
think of myself as somebody who was going to kill the game. And I don't think I did.
link |
00:41:03.760
I've really learned to love this game. I wasn't a poker player before, but
link |
00:41:07.360
learned so many nuances about it from these AIs. And they've really changed how the game is played,
link |
00:41:12.400
by the way. So they have these very Martian ways of playing poker. And the top humans are now
link |
00:41:17.600
incorporating those types of strategies into their own play. So if anything, to me, our work has made
link |
00:41:25.600
poker a richer, more interesting game for humans to play, not something that is going to steer
link |
00:41:31.760
humans away from it entirely. Just a quick comment on something you said, which is,
link |
00:41:37.440
if I may say so, in academia it's a little bit rare sometimes. It's pretty brave to put your ideas
link |
00:41:44.800
to the test in the way you described, saying that sometimes good ideas don't work when you actually
link |
00:41:50.560
try to apply them at scale. So where does that come from? I mean, if you could do advice for
link |
00:41:57.600
people, what drives you in that sense? Were you always this way? I mean, it takes a brave person,
link |
00:42:04.000
I guess is what I'm saying, to test their ideas and to see if this thing actually works against human
link |
00:42:09.840
top human players and so on. I don't know about brave, but it takes a lot of work.
link |
00:42:14.800
It takes a lot of work and a lot of time to organize, to make something big and to organize
link |
00:42:20.960
an event and stuff like that. And what drives you in that effort? Because you could still,
link |
00:42:26.080
I would argue, get a Best Paper Award at NIPS, as you did in '17, without doing this.
link |
00:42:31.280
That's right, yes.
link |
00:42:34.400
So in general, I believe it's very important to do things in the real world and at scale.
link |
00:42:41.600
And that's really where the proof, if you will, is in the pudding. That's where it is.
link |
00:42:48.320
In this particular case, it was kind of a competition between different groups.
link |
00:42:54.720
And for many years, as to who can be the first one to beat the top humans at heads up,
link |
00:43:00.400
no limit Texas Holdem. So it became kind of like a competition of who can get there.
link |
00:43:09.440
Yeah, so a little friendly competition could do wonders for progress.
link |
00:43:13.840
Yes, absolutely.
link |
00:43:16.240
So the topic of mechanism design, which is really interesting, also kind of new to me,
link |
00:43:21.440
except as an observer of, I don't know, politics and any, I'm an observer of mechanisms,
link |
00:43:27.440
but you write in your paper on automated mechanism design, which I quickly read.
link |
00:43:33.840
So mechanism design is designing the rules of the game,
link |
00:43:37.760
so you get a certain desirable outcome. And you have this work on doing so in an automatic
link |
00:43:44.400
fashion as opposed to fine tuning it. So what have you learned from those efforts?
link |
00:43:49.440
If you look, say, I don't know, at something complex like our political system: can we design our
link |
00:43:57.040
political system, in an automated fashion, to have outcomes that we want? Can we design
link |
00:44:04.480
something like traffic lights to be smart, where it gets outcomes that we want?
link |
00:44:11.680
So what are the lessons that you draw from that work?
link |
00:44:14.880
Yeah, so I still very much believe in the automated mechanism design direction.
link |
00:44:19.280
Yes.
link |
00:44:20.640
But it's not a panacea. There are impossibility results in mechanism design,
link |
00:44:26.400
saying that there is no mechanism that accomplishes objective X in class C.
link |
00:44:33.840
So it's not going to happen; there's no way, using any mechanism design tools, manual or automated,
link |
00:44:40.800
to do certain things in mechanism design.
link |
00:44:42.720
Can you describe that again? So meaning there, it's impossible to achieve that?
link |
00:44:46.960
Yeah, it's outright impossible. So these are not statements about human ingenuity,
link |
00:44:55.120
who might come up with something smart. These are proofs that if you want to accomplish properties
link |
00:45:00.320
X in class C, that is not doable with any mechanism. The good thing about automated
link |
00:45:06.080
mechanism design is that we're not really designing for a class; we're designing for one specific setting
link |
00:45:12.800
at a time. So even if there's an impossibility result for the whole class, it just doesn't
link |
00:45:18.800
mean that all of the cases in the class are impossible, it just means that some of the
link |
00:45:23.520
cases are impossible. So we can actually carve these islands of possibility within these
link |
00:45:28.800
otherwise impossible classes. And we've actually done that. So one of the famous results in mechanism
link |
00:45:34.640
design is the Myerson-Satterthwaite theorem by Roger Myerson and Mark Satterthwaite from 1983.
link |
00:45:40.640
So it's an impossibility of efficient trade under imperfect information. We show that
link |
00:45:46.880
you can in many settings avoid that and get efficient trade anyway.
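For reference, the standard bilateral-trade statement of that theorem, in my own paraphrase:

```latex
% Myerson--Satterthwaite (1983), informal bilateral-trade version:
Let a seller's valuation $s \sim F_s$ and a buyer's valuation $b \sim F_b$ be
privately known, with overlapping supports. Then no trading mechanism is
simultaneously (i) Bayesian incentive compatible, (ii) interim individually
rational, (iii) ex post budget balanced, and (iv) ex post efficient
(i.e., trading exactly when $b > s$).
```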
link |
00:45:51.360
Depending on how you design the game. Depending on how you design the game. And of course,
link |
00:45:56.640
it doesn't in any way contradict the impossibility result. The impossibility result is still there,
link |
00:46:03.760
but it just finds spots within this impossible class where in those spots you don't have the
link |
00:46:11.200
impossibility. Sorry if I'm going a bit philosophical, but what lessons do you draw towards like I
link |
00:46:17.600
mentioned politics or the human interaction and designing mechanisms for outside of just
link |
00:46:24.800
these kinds of trading or auctioning or
link |
00:46:27.120
purely formal games or human interaction like a political system. Do you think it's applicable
link |
00:46:37.200
to politics or to business, to negotiations, these kinds of things, designing rules that have
link |
00:46:47.920
certain outcomes? Yeah, yeah, I do think so. Have you seen that successfully done?
link |
00:46:53.360
There hasn't really. Oh, you mean mechanism design or automated mechanism? Automated mechanism design.
link |
00:46:58.880
So mechanism design itself has had fairly limited success so far. There are certain cases,
link |
00:47:07.440
but most of the real world situations are actually not sound from a mechanism design
link |
00:47:13.600
perspective. Even in those cases where they've been designed by very knowledgeable mechanism
link |
00:47:18.640
design people, the people are typically just taking some insights from the theory and applying
link |
00:47:24.080
those insights into the real world rather than applying the mechanisms directly. So one famous
link |
00:47:29.920
example is the FCC spectrum auctions. I've also had a small role in that, and
link |
00:47:38.560
very good economists who know game theory have been working on that. Yet the rules that are
link |
00:47:45.040
designed in practice there, they're such that bidding truthfully is not the best strategy.
link |
00:47:51.680
Usually in mechanism design, we try to make things easy for the participants, so telling the truth
link |
00:47:57.040
is the best strategy. But even in those very high stakes auctions where you have tens of billions
link |
00:48:02.480
of dollars worth of spectrum being auctioned, truth telling is not the best strategy.
link |
00:48:09.200
And by the way, nobody knows even a single optimal bidding strategy for those auctions.
link |
00:48:14.000
What's the challenge of coming up with an optimal bid? Because there's a lot of players and there's
link |
00:48:17.600
imperfections. It's not so much that there are a lot of players, but many items for sale. And these
link |
00:48:23.440
mechanisms are such that even with just two items or one item, bidding truthfully wouldn't be
link |
00:48:29.200
the best strategy. If you look at the history of AI, it's marked by seminal events, like
link |
00:48:37.040
AlphaGo beating a world champion human Go player. I would put Libratus winning the heads
link |
00:48:42.160
up no limit Holdem as one such event. Thank you. And what do you think is the next such event,
link |
00:48:52.400
whether in your life or in the AI community broadly, that you think might be out there
link |
00:48:58.880
that would surprise the world? So that's a great question, and I don't really know the answer. In terms
link |
00:49:04.800
of game solving, heads up no limit Texas Holdem really was the one remaining widely agreed
link |
00:49:12.880
upon benchmark. So that was the big milestone. Now, are there other things? Yeah, certainly
link |
00:49:18.400
there are, but there is not one that the community has kind of focused on. So what could
link |
00:49:23.440
be other things? There are groups working on Starcraft. There are groups working on Dota 2.
link |
00:49:29.680
These are video games. Yes, or you could have, like, Diplomacy or Hanabi, you know, things like
link |
00:49:36.560
that. These are like recreational games, but none of them are really acknowledged as kind of the
link |
00:49:42.640
main next challenge problem, like chess or Go or heads up no limit Texas Holdem was.
link |
00:49:49.920
So I don't really know in the game solving space what is or what will be the next benchmark. I
link |
00:49:55.600
kind of hope that there will be a next benchmark because really the different groups working on
link |
00:49:59.920
the same problem really drove these application independent techniques forward very quickly
link |
00:50:06.160
over 10 years. Do you think there's an open problem that excites you that you start moving
link |
00:50:11.120
away from games into real world games, like, say, stock market trading? Yeah, so that's kind
link |
00:50:18.240
of how I am. So I am probably not going to work as hard on these recreational benchmarks.
link |
00:50:27.760
I'm doing two startups on game solving technology, Strategic Machine and Strategy Robot, and we're
link |
00:50:32.960
really interested in pushing this stuff into practice. What do you think would be really,
link |
00:50:39.680
you know, a powerful result that would be surprising? If you can say, I mean,
link |
00:50:50.400
you know, five years, 10 years from now, something that statistically you would say is not
link |
00:50:56.880
very likely, but that a breakthrough would achieve. Yeah, so I think that overall,
link |
00:51:03.200
we're in a very different situation in game theory than we are in, let's say, machine learning. Yes.
link |
00:51:11.600
So in machine learning, it's a fairly mature technology and it's very broadly applied and
link |
00:51:17.280
proven success in the real world. In game solving, there are almost no applications yet.
link |
00:51:24.320
We have just become superhuman, which in machine learning you could argue happened in the 90s,
link |
00:51:29.440
if not earlier, and at least on supervised learning, certain complex supervised learning
link |
00:51:35.120
applications. Now, I think the next challenge problem, I know you're not asking about it this
link |
00:51:40.960
way, you're asking about technology breakthrough. But I think that big breakthrough is to be able
link |
00:51:45.360
to show that, hey, maybe most of, let's say, military planning or most of business strategy
link |
00:51:50.720
will actually be done strategically using computational game theory. That's what I would
link |
00:51:55.600
like to see as a next five or 10 year goal. Maybe you can explain to me again, forgive me if this
link |
00:52:01.120
is an obvious question, but machine learning methods and neural networks suffer from not
link |
00:52:07.360
being transparent, not being explainable. Game theoretic methods, Nash Equilibria,
link |
00:52:12.800
do they generally, when you see the different solutions, say when you talk about military
link |
00:52:18.960
operations, once you see the strategies, do they make sense? Are they explainable, or do
link |
00:52:24.640
they suffer from the same problems as neural networks do? So that's a good question. I would say
link |
00:52:30.400
a little bit yes and no. And what I mean by that is that these game theoretic strategies,
link |
00:52:36.560
let's say a Nash equilibrium, have provable properties. So it's unlike, let's say, deep
link |
00:52:42.320
learning where you kind of cross your fingers, hopefully it'll work. And then after the fact,
link |
00:52:47.040
when you have the weights, you're still crossing your fingers, hoping it will work. Here,
link |
00:52:52.560
you know that the solution quality is there. There's provable solution quality guarantees.
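As a minimal sketch of what such a guarantee looks like, using rock-paper-scissors and an off-the-shelf linear program rather than poker and Libratus's algorithms: the computed strategy's exploitability, meaning how much a best-responding opponent can gain beyond the game value, can be checked to be essentially zero.

```python
# A minimal sketch, not Libratus: solve a tiny two-player zero-sum
# matrix game (rock-paper-scissors) by linear programming and verify
# the provable guarantee -- a best-responding opponent gains nothing
# beyond the game value against the computed strategy.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 0.0, -1.0,  1.0],   # row player's payoff A[i, j]
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])
m, n = A.shape

# Variables (x_1..x_m, v): maximize v subject to (A^T x)_j >= v for
# every opponent column j, sum(x) = 1, x >= 0. linprog minimizes, so
# the objective is -v.
c = np.zeros(m + 1); c[-1] = -1.0
A_ub = np.hstack([-A.T, np.ones((n, 1))])       # v - (A^T x)_j <= 0
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
              A_eq=np.hstack([np.ones((1, m)), [[0.0]]]), b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])
x, v = res.x[:m], res.x[-1]

exploitability = v - (A.T @ x).min()   # provably ~0 at equilibrium
print(x, v, exploitability)            # uniform strategy, value 0
```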
link |
00:52:58.480
Now, that doesn't necessarily mean that the strategies are human understandable.
link |
00:53:03.360
That's a whole other problem. So I think that deep learning and computational game theory
link |
00:53:08.560
are in the same boat in that sense, that both are difficult to understand.
link |
00:53:13.680
But at least the game theoretic techniques have these
link |
00:53:16.160
guarantees of solution quality. So do you see business operations, strategic operations,
link |
00:53:22.800
even military operations, in the future at least having their candidate strategies proposed by automated
link |
00:53:31.440
systems? Do you see that? Yeah, I do. I do. But that's more of a belief than a substantiated fact.
link |
00:53:39.520
Depending on where you land, in optimism or pessimism, to me that's an
link |
00:53:43.840
exciting future, especially if there are provable guarantees in terms of optimality. So looking into
link |
00:53:52.560
the future, there are a few folks worried, especially as you look at the game of poker,
link |
00:54:01.120
which is probably one of the last benchmarks in terms of games being solved. They worry about
link |
00:54:06.800
the future and the existential threats of artificial intelligence. So the negative impact
link |
00:54:11.760
in whatever form on society, is that something that concerns you as much? Or are you more optimistic
link |
00:54:18.800
about the positive impacts of AI? I am much more optimistic about the positive impacts.
link |
00:54:24.640
So just in my own work, what we've done so far, we run the nationwide kidney exchange.
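As a toy sketch of the optimization at the core of kidney exchange, not the fielded system, with pairs, compatibilities, and the cycle cap invented for illustration: pack disjoint short cycles in a directed compatibility graph so that incompatible patient-donor pairs can swap kidneys.

```python
# A toy sketch of the core kidney exchange computation (invented data):
# find disjoint short cycles in a directed compatibility graph so that
# incompatible patient-donor pairs can swap kidneys.
from itertools import permutations

pairs = range(4)
# (u, v): pair u's donor is medically compatible with pair v's patient
edges = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 1), (0, 2), (2, 0)}
MAX_LEN = 3  # cycles are kept short: the surgeries happen simultaneously

def all_cycles():
    seen = set()
    for length in range(2, MAX_LEN + 1):
        for perm in permutations(pairs, length):
            if all((perm[i], perm[(i + 1) % length]) in edges
                   for i in range(length)):
                seen.add(frozenset(perm))
    return list(seen)

def best_packing():
    """Exhaustively choose disjoint cycles covering the most pairs."""
    cycles, best, best_n = all_cycles(), [], 0
    def search(i, used, chosen):
        nonlocal best, best_n
        covered = sum(len(c) for c in chosen)
        if covered > best_n:
            best, best_n = chosen, covered
        for j in range(i, len(cycles)):
            if not (cycles[j] & used):
                search(j + 1, used | cycles[j], chosen + [cycles[j]])
    search(0, frozenset(), [])
    return best, best_n

print(best_packing())  # two 2-cycles, {0,1} and {2,3}: all 4 pairs matched
```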
link |
00:54:29.920
Hundreds of people are walking around alive today who otherwise wouldn't be. And it's increased employment.
link |
00:54:36.000
You have a lot of people now running kidney exchanges at transplant centers, interacting
link |
00:54:42.720
with the kidney exchange. You have some extra surgeons, nurses, anesthesiologists, hospitals,
link |
00:54:50.240
all of that. So employment is increasing from that and the world is becoming a better place.
link |
00:54:55.200
Another example is combinatorial sourcing auctions. We did 800 large scale combinatorial
link |
00:55:03.120
sourcing auctions from 2001 to 2010 in a previous startup of mine called CombineNet.
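As a toy sketch of the winner determination problem behind such auctions, with bidders, items, and prices invented (real instances have thousands of bids and side constraints and are solved with integer programming): choose non-overlapping package bids that cover all items at minimum cost.

```python
# A toy sketch of combinatorial (reverse) auction winner determination,
# with invented bids: pick non-overlapping package bids that cover all
# items at minimum total cost. Real instances use integer programming.
from itertools import combinations

items = {"laneA", "laneB", "laneC"}          # e.g., trucking lanes to source
bids = [                                     # (bidder, package, price)
    ("carrier1", {"laneA", "laneB"}, 8.0),   # package discount
    ("carrier2", {"laneA"}, 5.0),
    ("carrier2", {"laneC"}, 4.0),
    ("carrier3", {"laneB", "laneC"}, 7.0),
]

def winner_determination(bids, items):
    best_cost, best_combo = float("inf"), None
    for r in range(1, len(bids) + 1):
        for combo in combinations(bids, r):
            covered = set().union(*(b[1] for b in combo))
            disjoint = sum(len(b[1]) for b in combo) == len(covered)
            if disjoint and covered == items:
                cost = sum(b[2] for b in combo)
                if cost < best_cost:
                    best_cost, best_combo = cost, combo
    return best_cost, best_combo

cost, winners = winner_determination(bids, items)
print(cost, [w[0] for w in winners])  # 12.0: carrier1's package + carrier2's laneC
```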
link |
00:55:09.280
And we increased the supply chain efficiency on that $60 billion of spend by 12.6%. So that's
link |
00:55:18.480
over $6 billion of efficiency improvement in the world. And this is not like shifting value from
link |
00:55:24.000
somebody to somebody else, just efficiency improvement, like in trucking, less empty
link |
00:55:28.960
driving. So there's less waste, less carbon footprint and so on. This is a huge positive
link |
00:55:35.040
impact in the near term. But sort of to stay in it for a little longer, because I think game theory
link |
00:55:42.080
has a role to play here. Let me actually come back on that. That's one thing. I think AI is also going
link |
00:55:46.720
to make the world much safer. So that's another aspect that often gets overlooked.
link |
00:55:53.920
Well, let me ask this question. Maybe you can speak to the safety aspect. So I talked to Max Tegmark
link |
00:55:58.560
and Stuart Russell, who are very concerned about existential threats of AI. And often,
link |
00:56:03.200
the concern is about value misalignment. So AI systems basically working, operating towards
link |
00:56:12.960
goals that are not the same as those of human civilization, of human beings. So it seems like game theory has
link |
00:56:19.680
a role to play there to make sure the values are aligned with human beings. I don't know if that's
link |
00:56:28.800
how you think about it. If not, how do you think AI might help with this problem? How do you think
link |
00:56:36.160
AI might make the world safer? Yeah, I think this value misalignment is a fairly theoretical
link |
00:56:44.800
worry. And I haven't really seen it, because I do a lot of real applications, and I don't see it
link |
00:56:52.880
anywhere. The closest I've seen it was the following type of mental exercise, really,
link |
00:56:58.080
where I had this argument in the late 80s when we were building these transportation optimization
link |
00:57:03.200
systems. And somebody had heard that it's a good idea to have high utilization of assets.
link |
00:57:08.160
So they told me that, hey, why don't you put that as the objective? And we didn't even put it as an
link |
00:57:13.840
objective, because I just showed him that if you had that as your objective, the solution would be
link |
00:57:19.360
to load your trucks full and drive in circles. Nothing would ever get delivered. You'd have
link |
00:57:23.360
100% utilization.
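A toy sketch of that wrong-objective story, with invented plans and numbers: an optimizer that scores plans by asset utilization alone happily picks the plan that delivers nothing.

```python
# A toy sketch of the wrong-objective story (invented plans and numbers):
# a routing "optimizer" that maximizes asset utilization alone prefers
# the plan that delivers nothing.
plans = [
    # (name, fraction of truck-hours loaded, packages delivered)
    ("drive_loaded_circles", 1.00, 0),    # trucks always full, never unload
    ("efficient_routes",     0.80, 950),
    ("naive_routes",         0.60, 900),
]

def utilization(plan):
    return plan[1]    # the objective that was proposed

def deliveries(plan):
    return plan[2]    # the objective actually wanted

print(max(plans, key=utilization)[0])  # drive_loaded_circles
print(max(plans, key=deliveries)[0])   # efficient_routes
```

So yeah, I know this phenomenon. I've known this for over 30 years. But I've never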
link |
00:57:30.480
seen it actually be a problem in reality. And yes, if you have the wrong objective, the AI will
link |
00:57:35.920
optimize that to the hilt. And it's going to hurt more than some human who's kind of trying to
link |
00:57:40.720
solve it in a half-baked way with some human insight, too. But I just haven't seen that
link |
00:57:47.200
materialize in practice. There's this gap that you've actually put your finger on
link |
00:57:52.880
very clearly just now between theory and reality that's very difficult to put into words, I think.
link |
00:57:59.760
It's what you can theoretically imagine, the worst possible case or even, yeah, I mean,
link |
00:58:06.720
bad cases, and what usually happens in reality. So for example, to me, maybe it's something you
link |
00:58:12.720
can comment on: I grew up in the Soviet Union. You know, there are currently
link |
00:58:19.680
10,000 nuclear weapons in the world. And for many decades, it's been theoretically surprising to me
link |
00:58:28.320
that nuclear war has not broken out. Do you think about this aspect from a game
link |
00:58:34.240
theoretic perspective in general? Why is that true? Why? In theory, you could see how things
link |
00:58:41.520
go terribly wrong, and somehow yet they have not. So how do you think about that? So I do think
link |
00:58:46.480
about that a lot. I think the two biggest threats that we're facing as mankind, one is climate
link |
00:58:51.120
change, and the other is nuclear war. So those are my two main worries that I worry about.
link |
00:58:57.200
And I've thought about trying to do something about climate
link |
00:59:01.840
change twice. Actually, for two of my startups, I've actually commissioned studies of what we
link |
00:59:07.920
could do on those things. And we didn't really find a sweet spot, but I'm still keeping an eye out
link |
00:59:12.320
for that, if there's something where we could actually provide a market solution or optimization
link |
00:59:17.280
solution or some other technology solution to those problems. Back then, for example, pollution
link |
00:59:24.080
credit markets were what we were looking at. And it was much more the lack of political will
link |
00:59:30.000
why those markets were not so successful, rather than bad market design. So I could go in and
link |
00:59:35.520
make a better market design. But that wouldn't really move the needle on the world very much
link |
00:59:39.920
if there's no political will. And in the US, you know, at least the Chicago market
link |
00:59:44.800
was just shut down, and so on. So then it doesn't really help how great your market design was.
link |
00:59:50.560
And on the nuclear side, it's different: global warming is a more encroaching problem.
link |
01:00:00.560
You know, nuclear weapons have been here; it's an obvious problem that has just been sitting there.
link |
01:00:05.680
So how do you think about the mechanism design there that has made everything seem stable?
link |
01:00:12.240
And are you still extremely worried? I am still extremely worried. So you probably know the simple
link |
01:00:18.560
game theory of MAD, mutually assured destruction. And it doesn't require any
link |
01:00:25.280
computation; with small matrices, you can actually convince yourself that the game is such that
link |
01:00:29.600
nobody wants to initiate.
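As a minimal sketch of that small-matrix argument, with payoffs invented for illustration: initiating triggers retaliation and ruin either way, so holding is a best response when the other side holds.

```python
# A minimal sketch of the MAD matrix argument (invented payoffs):
# initiating triggers retaliation, so both sides are ruined unless
# both hold; (hold, hold) survives the equilibrium check below.
actions = ["hold", "initiate"]
# payoff[(row, col)] = (row player's payoff, column player's payoff)
payoff = {
    ("hold", "hold"):         (0, 0),        # tense status quo
    ("hold", "initiate"):     (-100, -100),  # mutual destruction
    ("initiate", "hold"):     (-100, -100),
    ("initiate", "initiate"): (-100, -100),
}

def is_nash(row, col):
    """True if neither player gains by unilaterally deviating."""
    row_ok = all(payoff[(d, col)][0] <= payoff[(row, col)][0]
                 for d in actions)
    col_ok = all(payoff[(row, d)][1] <= payoff[(row, col)][1]
                 for d in actions)
    return row_ok and col_ok

print(is_nash("hold", "hold"))  # True: nobody wants to initiate
```

Yeah, that's a very coarse grained analysis. And it really works in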
link |
01:00:36.000
a situation where you have two superpowers or small numbers of superpowers. Now things are
link |
01:00:40.960
very different. You have smaller nukes, so the threshold of initiating is lower. And you have
link |
01:00:47.920
smaller countries and non-nation-state actors who may get nukes and so on. So I think it's
link |
01:00:54.800
riskier now than it was maybe ever before. And what idea or application of AI, you've talked about
link |
01:01:04.240
a little bit, but what is most exciting to you right now? I mean, you're here at NIPS,
link |
01:01:09.440
now NeurIPS. Now, you have a few excellent pieces of work. But what are you thinking, into the future,
link |
01:01:16.640
with several companies you're doing? What's the most exciting thing or one of the exciting things?
link |
01:01:21.200
The number one thing for me right now is coming up with these scalable techniques for
link |
01:01:26.960
game solving and applying them in the real world. I'm still very interested in market design
link |
01:01:32.800
as well, and we're doing that at Optimized Markets. But what I'm most interested in, number one
link |
01:01:37.200
right now, is Strategic Machine and Strategy Robot: getting that technology out there and seeing,
link |
01:01:42.320
as you're in the trenches doing applications, what actually needs to be built, what technology
link |
01:01:47.920
gaps still need to be filled. So it's so hard to just put your feet on the table and imagine what
link |
01:01:52.720
needs to be done. But when you're actually doing real applications, the applications tell you
link |
01:01:58.000
what needs to be done. And I really enjoy that interaction. Is it a challenging process to
link |
01:02:03.040
apply some of the state of the art techniques you're working on, and have the various players in
link |
01:02:13.520
industry or the military, or people who could really benefit from it actually use it? What's
link |
01:02:19.280
that process like? You know, in autonomous vehicles, we work with automotive companies, and
link |
01:02:23.920
they in many ways are a little bit old fashioned; it's difficult. They really want to use this
link |
01:02:31.120
technology, and it clearly will have a significant benefit. But the systems aren't quite in place
link |
01:02:37.520
to easily have it integrated, in terms of data, in terms of compute, in terms of all these kinds
link |
01:02:43.280
of things. So is that one of the bigger challenges that you're facing? And how do you
link |
01:02:49.200
tackle that challenge? Yeah, I think that's always a challenge, that kind of slowness and inertia,
link |
01:02:54.000
really, of let's do things the way we've always done it. You just have to find the internal
link |
01:02:59.360
champions at the customer who understand that, hey, things can't be the same way in the future,
link |
01:03:04.480
otherwise bad things are going to happen. And in autonomous vehicles, it's actually very
link |
01:03:09.120
interesting that the car makers are doing that, and they're very traditional. But at the same time,
link |
01:03:13.200
you have tech companies who have nothing to do with cars or transportation, like Google and Baidu,
link |
01:03:19.120
really pushing on autonomous cars. I find that fascinating. Clearly, you're super excited about
link |
01:03:25.360
how these ideas actually have an impact in the world. In terms of the technology, in terms of
link |
01:03:31.120
ideas and research, are there directions that you're also excited about, whether that's
link |
01:03:39.440
some of the approaches you talked about for imperfect information games, whether it's applying
link |
01:03:43.360
deep learning to some of these problems? Is there something that you're excited about
link |
01:03:47.360
on the research side of things? Yeah, yeah, lots of different things in game solving.
link |
01:03:52.240
So solving even bigger games, games where you have more hidden actions, where the player
link |
01:04:00.240
actions are hidden as well. Poker is a game where really the chance actions are hidden, or some of them are
link |
01:04:06.560
hidden, but the player actions are public. Multiplayer games of various sorts, collusion,
link |
01:04:14.480
opponent exploitation, and even longer games. So games that basically go forever,
link |
01:04:22.960
but they're not repeated. So, extensive-form games that go forever. What would that even look
link |
01:04:29.520
like? How do you represent that? How do you solve that? What's an example of a game like that?
link |
01:04:33.920
Like some of the stochastic games that you mentioned. Let's say business strategy. So it's
link |
01:04:37.680
and not just modeling like a particular interaction, but thinking about the business from here to
link |
01:04:42.880
eternity. I see. Or let's say military strategy. So it's not like war is going to go away.
link |
01:04:50.960
How do you think about military strategy that's going to go forever? How do you even model that?
link |
01:04:58.000
How do you know whether a move that somebody made was good? And so on. So that's kind of one direction.
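One classical formalism for games that go forever, offered as a hedged sketch rather than a model Sandholm describes here, is the discounted zero-sum stochastic game, solvable by Shapley-style value iteration: each state's value is the value of a one-shot matrix game whose entries fold in discounted continuation values. The two states, payoffs, transitions, and discount below are invented.

```python
# A hedged sketch of Shapley-style value iteration for a discounted
# zero-sum stochastic game (invented two-state game, not a real
# business or military model).
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of a zero-sum matrix game for the row (max) player, by LP."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0           # maximize v
    res = linprog(c, A_ub=np.hstack([-M.T, np.ones((n, 1))]),
                  b_ub=np.zeros(n),
                  A_eq=np.hstack([np.ones((1, m)), [[0.0]]]), b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun

# payoff[s][i, j]: stage payoff in state s; next_state[s][i, j]: the
# (deterministic, for simplicity) successor state.
payoff = [np.array([[1.0, -1.0], [-1.0, 1.0]]),
          np.array([[2.0, 0.0], [0.0, -2.0]])]
next_state = [np.array([[0, 1], [1, 0]]), np.array([[1, 0], [0, 1]])]
gamma = 0.9                                     # discount factor

V = np.zeros(2)
for _ in range(200):                            # contraction: converges
    V = np.array([matrix_game_value(payoff[s] + gamma * V[next_state[s]])
                  for s in range(2)])
print(V)  # the game's discounted value from each state
```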
link |
01:05:05.760
I'm also very interested in learning much more scalable techniques for
link |
01:05:11.920
integer programming. So we had an ICML paper this summer on that, the first automated algorithm
link |
01:05:18.560
configuration paper that has theoretical generalization guarantees. So if I see these many
link |
01:05:24.400
training examples, and I tune my algorithm in this way, it's going to have good performance
link |
01:05:30.480
on the real distribution, which I've not seen.
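As a toy sketch of the flavor of sample-based algorithm configuration, not the ICML paper's method or its guarantees, with an invented instance generator and runtime model: tune a solver parameter on sampled training instances and hope the choice generalizes to the unseen distribution.

```python
# A toy sketch of sample-based algorithm configuration (invented
# stand-ins, not the ICML paper's method): pick the parameter with the
# best average runtime over sampled training instances.
import random

random.seed(0)

def sample_instance():
    """Hypothetical stand-in for drawing an integer program instance."""
    return random.uniform(0.5, 2.0)             # instance "hardness"

def runtime(param, hardness):
    """Hypothetical runtime of the solver under this parameter setting."""
    return hardness * (1.0 + (param - 1.3) ** 2)

candidate_params = [0.5, 0.9, 1.3, 1.7]
training = [sample_instance() for _ in range(200)]

def avg_runtime(p):
    return sum(runtime(p, h) for h in training) / len(training)

best = min(candidate_params, key=avg_runtime)
print(best, avg_runtime(best))  # 1.3: lowest expected runtime here
```

Which is kind of interesting, because, you know,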
link |
01:05:35.280
algorithm configuration has been going on now for at least 17 years seriously. And there has not
link |
01:05:42.560
been any generalization theory before. Well, this is really exciting. It's been a huge
link |
01:05:48.800
honor to talk to you. Thank you so much, Tuomas. Thank you for bringing Libratus to the world
link |
01:05:52.720
and all the great work you're doing. Well, thank you very much. It's been fun. Good questions.