Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144
The following is a conversation with Michael Littman, a computer science professor at Brown University
doing research on and teaching machine learning, reinforcement learning, and artificial intelligence.
He enjoys being silly and lighthearted in conversation, so this was definitely a fun one.
Quick mention of each sponsor, followed by some thoughts related to the episode.
Thank you to SimpliSafe, a home security company I use to monitor and protect my apartment,
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet,
MasterClass, online courses that I enjoy from some of the most amazing humans in history,
and BetterHelp, online therapy with a licensed professional.
Please check out these sponsors in the description to get a discount and to support this podcast.
As a side note, let me say that I may experiment with doing some solo episodes in the coming month
or two. The three ideas I have floating in my head currently are to use one, a particular moment
in history, two, a particular movie, or three, a book to drive a conversation about a set of
related concepts. For example, I could use 2001: A Space Odyssey or Ex Machina to talk about AGI
for one, two, three hours, or I could do an episode on the rise and fall of Hitler and Stalin,
each in a separate episode, using relevant books and historical moments for reference.
I find the format of a solo episode very uncomfortable and challenging, but that just
tells me that it's something I definitely need to do and learn from the experience.
Of course, I hope you come along for the ride. Also, since we have all this momentum built up
on announcements, I'm giving a few lectures on machine learning at MIT this January. In general,
if you have ideas for the episodes, for the lectures, or for just short videos on YouTube,
let me know in the comments that I still definitely read despite my better judgment
and the wise sage advice of the great Joe Rogan. If you enjoy this thing, subscribe on YouTube,
review it with five stars on Apple Podcasts, follow on Spotify,
support on Patreon, or connect with me on Twitter at Lex Fridman.
And now here's my conversation with Michael Littman. I saw a video of you talking to Charles
Isbell about Westworld, the TV series. You guys were doing a kind of thing where you're
watching new things together, but let's rewind back. Is there a sci-fi movie, book,
or show that was profound, that had an impact on you philosophically, or just specifically
something you enjoyed nerding out about? Yeah, interesting. I think a lot of us have been inspired
by robots in movies. The one that I really like is a movie called Robot & Frank,
which I think is really interesting because it's a very near-term future where robots are being deployed
as helpers in people's homes. And we don't know how to make robots like that at this point,
but it seemed very plausible. It seemed very realistic or imaginable. I thought that was
really cool because they're awkward. They do funny things that raise some interesting issues,
but it seemed like something that would ultimately be helpful and good if we could do it right.
Yeah, he was an older cranky gentleman. He was an older cranky jewel thief. Yeah.
It's kind of a funny little thing, which is he's a jewel thief and so he pulls the robot
into his life, which is something you could imagine taking a home robotics thing and pulling
into whatever quirky thing is involved in your existence. It's meaningful to you,
exactly so. Yeah, and I think from that perspective, I mean, not all of us are jewel thieves. And so
when we bring our robots into our lives... Explains a lot about this apartment, actually. But no,
the idea that people should have the ability to make this technology their own, that it becomes
part of their lives. And I think it's hard for us as technologists to make that kind
of technology. It's easier to mold people into what we need them to be. And just that opposite
vision I think is really inspiring. And then there's an anthropomorphization where we project
certain things on them because I think the robot was kind of dumb. But I have a bunch of Roombas
that I play with, and you immediately project onto them a much greater level of intelligence.
We'll probably do that with each other too. Much, much, much greater degree of compassion.
That's right. One of the things we're learning from AI is where we are smart and where we are
not smart. Yeah. You also enjoy, as people can see, and I enjoyed myself watching you sing
and even dance a little bit, a little bit of dancing. A little bit of dancing. That's not
quite my thing. As a method of education, or just in life, in general. So easy question.
What are the definitive, objectively speaking, top three songs of all time? Maybe something that,
to walk that back a little bit, maybe something that others might be surprised by. Three songs
that you kind of enjoy. That is a great question that I cannot answer. But instead, let me tell
you a story. Pick a question you do want to answer. That's right. I've been watching the
presidential debates and vice presidential debates and it turns out, yeah, it's really,
you can just answer any question you want. So it's a related question. Yeah, well said.
I really like pop music. I've enjoyed pop music ever since I was very young. So 60s music, 70s
music, 80s music, this is all awesome. And then I had kids and I think I stopped listening to music
and I was starting to realize that my musical taste had sort of frozen out. And so I
decided in 2011, I think, to start listening to the top 10 Billboard songs each week. So I'd be on
the treadmill and I would listen to that week's top 10 songs so I could find out what was popular
now. And what I discovered is that I have no musical taste whatsoever. I like what I'm familiar
with. And so the first time I'd hear a song, the first week that it was on the charts, I'd be like,
and then the second week, I was into it a little bit and the third week, I was loving it. And by
the fourth week, it's just part of me. And so I'm afraid that I can't tell you my
favorite song of all time, because it's whatever I heard most recently. Yeah, that's interesting.
People have told me that there's an art to listening to music as well. And you can start to,
if you listen to a song, just carefully, explicitly just force yourself to really listen. You start
to, I did this when I was part of a jazz band and a fusion band in college. You start to hear the layers
of the instruments. You start to hear the individual instruments and you start to,
you can listen to classical music or to orchestra this way, you can listen to jazz this way.
It's funny to imagine you now, walking that forward, listening to pop hits as like a
scholar listening to like Cardi B or something like that or Justin Timberlake. No, not Timberlake,
Bieber. They've both been in the top 10 since I've been listening. They're still up there. Oh,
my God, I'm so cool. If you haven't heard Justin Timberlake's top 10 in the last few years,
there was one song that he did where the music video was set at essentially NeurIPS. Oh, wow.
Oh, the one with the robotics. Yeah, yeah, yeah, yeah. Yeah, yeah. It's like at an academic conference
and he's doing a demo and it was sort of a cross between the Apple, like Steve Jobs kind of talk
and NeurIPS. So, you know, it's always fun when AI shows up in pop culture. I wonder if he consulted
somebody for that. That's really interesting. So maybe on that topic, I've seen your celebrity
in multiple dimensions, but one of them is you've done cameos in different places. I've seen you
in a TurboTax commercial as, I guess, the brilliant Einstein character. And the point
is that TurboTax doesn't need somebody like you. It doesn't need a brilliant person.
Very few things need someone like me. But yes, they were specifically emphasizing the idea that
you don't need to be like a computer expert to be able to use their software. How did you end up
in that world? I think it's an interesting story. So I was teaching my class. It was an
intro computer science class for non-concentrators, non-majors. And sometimes when people would visit
campus, they would check in to say, hey, we want to see what a class is like. Can we sit in on your
class? So a person came to my class who was the daughter of the brother of the husband of the best
friend of my wife. Anyway, basically a family friend came to campus to check out Brown and
asked to come to my class and came with her dad. Her dad is someone I've known from various kinds of
family events and so forth, but he also does advertising. And he said that he was recruiting
scientists for this ad, this TurboTax set of ads. And he said, we wrote the ad with the idea that
we get the most brilliant researchers, but they all said no. So can you help us find the B-level
scientists? And I'm like, sure, that's who I hang out with. So that should be fine. So I put
together a list and I did what some people call a Dick Cheney. So I included myself on the list
of possible candidates with a little blurb about each one and why I thought that would make sense
for them to do it. And they reached out to a handful of them, but then ultimately
they YouTube-stalked me a little bit and they thought, oh, I think he could do this. And they
said, okay, we're going to offer you the commercial. I'm like, what? So it was such an interesting
experience because there's another world, the people who do nationwide kind of ad campaigns
and television shows and movies and so forth. It's quite a remarkable system that they have going
because it's like a set. Yeah. So I went to, it was just somebody's house that they rented in
New Jersey, but in the commercial, it's just me and this other woman. In reality, there were 50
people in that room and another, I don't know, half a dozen kind of spread out around the house
in various ways. There were people whose job it was to control the sun. They were in the backyard
on ladders putting filters up to try to make sure that the sun didn't glare off the window in a way
that would wreck the shot. So there were like six people out there doing that. There were three people
out there giving snacks, the craft table. There were another three people giving healthy snacks
because that was a separate craft table. There was one person whose job it was to keep me from
getting lost. And I think the reason for all this is because so many people are in one place at one
time, they have to be time efficient. They have to get it done. The morning they were going to do
my commercial and the afternoon they were going to do a commercial of a mathematics professor from
Princeton. They had to get it done. No wasted time or energy. And so there's just a fleet of people
all working as an organism and it was fascinating. I was just the whole time just looking around like
this is so neat. Like one person whose job it was to take the camera off of the cameraman
so that someone else whose job it was to remove the film canister could do it, because every couple of takes,
they had to replace the film because film gets used up. It was just, I don't know, I was geeking
out the whole time. It was so fun. How many takes did it take? It looked the opposite, like there
weren't more than two people there. It was very relaxed. Right. Yeah. The person who I was in the scene
with is a professional. She's an improv comedian from New York City. And when I got there, they
had given me a script such as it was. And then I got there and they said, we're going to do this
as improv. I'm like, I don't know how to improv. I don't know what you're telling me to do here.
Yeah. Don't worry. She knows. I'm like, okay, we'll see how this goes. I guess I got pulled
into the story because like, where the heck did you come from? I guess in the scene. Like,
how did you show up in this random person's house? I don't know. Yeah. Well, I mean,
the reality of it is I stood outside in the blazing sun. There was someone whose job it was
to keep an umbrella over me because I started to sweat. And so I would
wreck the shot because my face was all shiny with sweat. So there was one person who would dab me
off, and an umbrella. But yeah, like the reality of it, like why is this strange stalkery person
hanging around outside somebody's house? We're not sure. We have to look in. We have to wait
for the book. But are you... So you make, like you said, YouTube, you make videos yourself.
You make awesome parody, sort of parody songs that kind of focus on a particular aspect of
computer science. How much... Those seem really natural. How much production value goes into
that? Do you also have a team of 50 people? The videos, almost all the videos, except for the
ones that people would have actually seen, were just me. I write the lyrics. I sing the song. I
generally find a backing track online because I can't really play an instrument.
And then I do, in some cases, I'll do visuals using just like PowerPoint. Lots and lots of
PowerPoint to make it sort of like an animation. The most produced one is the one that people
might have seen, which is the overfitting video that I did with Charles Isbell. And that was
produced by the Georgia Tech and Udacity people because we were doing a class together. It was
kind of... I usually do parody songs kind of to cap off a class at the end of a class.
So in that one, which was Thriller, you're wearing the Michael Jackson,
the red leather jacket. The interesting thing with podcasting, which you're also into, and that
I really enjoy, is that there's not a team of people. It's kind of more... Because you know,
there's something that happens when there's more people involved than just one person. That's just
the way you start acting. I don't know. There's a censorship. You're not given, especially for
like slow thinkers like me, you're not... And I think most of us are, if we're trying to actually
think, we're a little bit slow and careful. Large teams kind of get in the way of that.
And I don't know what to do with that. To me, it's very popular to criticize
quote unquote mainstream media, but there is legitimacy to criticizing them all the same. I
love listening to NPR for example, but it's clear that there's a team behind it.
There are constant commercial breaks. There's this kind of like rush of like,
okay, I have to interrupt you now because we have to go to commercial. Just this whole,
it destroys the possibility of nuanced conversation. Yeah, exactly.
Evian... Charles Isbell, who I talked to yesterday, told me that Evian is naive backwards,
and the fact that his mind thinks this way is just quite brilliant. Anyway, there's a freedom
to this podcast. He's Dr. Awkward, which by the way, is a palindrome. That's a palindrome that
I happen to know from other parts of my life. And I just figured, well, you know, use it against
Charles. Dr. Awkward. So what was the most challenging parody song to make? Was it the
Thriller one? No, that one was really fun. I wrote the lyrics really quickly. And then I gave it
over to the production team. They recruited an a cappella group to sing. That went really smoothly.
It's great having a team because then you can just focus on the part that you really love,
which in my case is writing the lyrics. For me, the most challenging one, not challenging in a bad
way, but challenging in a really fun way, was one of the parody songs I did about
the halting problem in computer science. The fact that you can't create a program that can tell
for any other arbitrary program, whether it's actually going to get stuck in an infinite loop
or whether it's going to eventually stop. And so I did it to an 80s song because I hadn't
started my new thing of learning current songs. And it was Billy Joel's The Piano Man, which is
a great song. Sing me a song, you're the piano man. So the lyrics are great because first of all,
it rhymes. Not all songs rhyme. I've done Rolling Stones songs, which turn out to have no rhyme scheme
whatsoever. They're just sort of yelling and having a good time, which makes it not fun from a
parody perspective because like you can say anything. But here the lines rhyme and there were a lot
of internal rhymes as well. And so figuring out how to sing, with internal rhymes, a proof of the
halting problem was really challenging. And it was, I really enjoyed that process.
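
The halting-problem claim above can be made concrete with a short sketch. This is not from the conversation or the song; the names `make_paradox` and `refutes`, and the choice of Python, are assumptions made purely for illustration. Suppose someone hands you a function `halts(program)` that claims to tell whether calling `program()` eventually stops. The standard diagonal argument builds a program that asks the decider about itself and then does the opposite, so the decider is wrong on at least that one program.

```python
def make_paradox(halts):
    """Build a program that does the opposite of what `halts` predicts for it.

    `halts` is a hypothetical decider: it takes a zero-argument function and
    returns True if it claims the function eventually stops, False otherwise.
    """
    def paradox():
        if halts(paradox):
            while True:  # decider said "stops", so loop forever
                pass
        return None      # decider said "loops forever", so stop right away
    return paradox


def refutes(halts):
    """Show that `halts` is wrong about its own paradox program.

    Assumes `halts` is deterministic and always returns an answer.
    """
    paradox = make_paradox(halts)
    if halts(paradox):
        # Predicted "stops", but by construction paradox() would now spin
        # forever, so we do not call it; the prediction is wrong.
        return True
    # Predicted "loops forever"; calling it shows that it actually returns.
    paradox()
    return True
```

Any candidate decider, for example `lambda program: True` or `lambda program: False`, fails on the program built from it, which is the informal core of why no general halting checker can exist.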
Last question on this topic: what about the dancing in the Thriller video?
How many takes did that take? So I wasn't planning to dance. They had me in the studio and they gave
me the jacket and it's like, well, you can't, if you have the jacket and the glove, like there's
not much you can do. So I think I just danced around and then they said, why don't you dance a
little bit? There was a scene with me and Charles dancing together. They did not use it in the
video, but we recorded it. I don't remember. Yeah, yeah. No, it was pretty funny. And Charles,
who has this beautiful, wonderful voice, doesn't really sing. He's not really a singer. And so
that was why I designed the song with him doing a spoken section and me doing the singing. It's
very like Barry White. Yeah, smooth baritone. Yeah. Yeah, it's great. That was awesome.
So one of the other things Charles said is that, you know, everyone knows you as like a super nice
guy, super passionate about teaching and so on. What he said, I don't know if it's true, is that
despite the fact that you are... Cold-blooded? Like, okay, I will admit this finally for the first
time, that was me. It's the Johnny Cash song. Shot a man in Reno, just to watch him die.
That you actually do have some strong opinions on some topics. So if this in fact is true, what
strong opinions would you say you have? Are there ideas, maybe in artificial
intelligence, machine learning, maybe in life, that you believe are true, that others might,
you know, some number of people might disagree with you on. So I try very hard to see things from
multiple perspectives. There's this great Calvin and Hobbes cartoon where Cal...
do you know, okay, so Calvin's dad is always kind of a bit of a foil and he talks to Calvin, and Calvin
had done something wrong. The dad talks him into seeing it from another perspective
and this breaks Calvin because he's like, oh my gosh, now I can see the opposite
sides of things. And so it becomes like a Cubist cartoon where there is no front and back. Everything
is just exposed and it really freaks him out. And finally he settles back down. It's like,
oh good, no, I can make that go away. But like I'm that, I live in that world where I'm trying to
see everything from every perspective all the time. So there are some things that I've formed
opinions about that it would be harder, I think, to disabuse me of. One is the super intelligence
argument and the existential threat of AI is one where I feel pretty confident in my feeling about
that one. Like I'm willing to hear other arguments, but like I am not particularly moved by the idea
that if we're not careful, we will accidentally create a super intelligence that will destroy
human life. Let's talk about that. Let's get you in trouble and record you on video. It's like Bill
Gates, I think he said like some quote about the internet, that that's just going to be a small
thing. It's not going to really go anywhere. And then I think Steve Ballmer said, I don't know why
I'm sticking on Microsoft, something like smartphones are useless. There's no reason
why Microsoft should get into smartphones, that kind of thing. So let's talk about
AGI. As AGI is destroying the world, we'll look back at this video and see. No, I think it's
really interesting to actually talk about because nobody really knows the future. So you have to
use your best intuition. It's very difficult to predict it, but you have spoken about AGI
and the existential risks around it and sort of based on your intuition that
we're quite far away from that being a serious concern relative to the other concerns we have.
Can you maybe unpack that a little bit? Yeah, sure, sure, sure. So as I understand it,
for example, I read Bostrom's book and a bunch of other reading material about this sort of general
way of thinking about the world. And I think the story goes something like this, that we
will at some point create computers that are smart enough that they can help design the next
version of themselves, which itself will be smarter than the previous version of themselves,
and eventually bootstrap up to being smarter than us, at which point we are essentially at
the mercy of this sort of more powerful intellect, which in principle, we don't have any control over
what its goals are. And so if its goals are at all out of sync with our goals, for example,
the continued existence of humanity, we won't be able to stop it. It'll be way more powerful than
us and we will be toast. So there's some, I don't know, very smart people who have signed on to
that story. And it's a compelling story. I want to, now I can really get myself in trouble.
I once wrote an op-ed about this, specifically responding to some quotes from Elon Musk,
who has been, you know, on this very podcast, more than once.
AI is summoning the demon, I think he said. But then he came to Providence, Rhode Island,
which is where I live, and said to the governors of all the states, you know,
you're worried about entirely the wrong thing. You need to be worried about AI. You need to be
very, very worried about AI. And journalists kind of reacted to that. They wanted to get
people's take. And I was like, okay, my belief is that one of the things that
makes Elon Musk so successful and so remarkable as an individual is that he believes in the power
of ideas. He believes that, you know, if you have a really good idea for
getting into space, you can get into space. If you have a really good idea for a company or for
how to change the way that people drive, you just have to do it and it can happen.
It's really natural to apply that same idea to AI. You see the systems that are doing some pretty
remarkable computational tricks, demonstrations, and then to take that idea and just push it
all the way to the limit and think, okay, where does this go? Where is this going to take us next?
And if you're a deep believer in the power of ideas, then it's really natural to believe that
those ideas could be taken to the extreme and kill us. So I think, you know, his strength is
also his undoing because that doesn't mean it's true. Like it doesn't mean that that has to happen,
but it's natural for him to think that. So another way to phrase the way he thinks, and
I find it very difficult to argue with that line of thinking. So Sam Harris is another person
from the neuroscience perspective that thinks like that, is saying, well, is there something
fundamental in the physics of the universe that prevents this from eventually happening?
And Nick Bostrom thinks in the same way, that kind of zooming out, yeah, okay, we humans
now are existing in this like timescale of minutes and days. And so our intuition is in this timescale
of minutes, hours and days. But if you look at the span of human history, is there any reason
we can't see this in 100 years? And like, is there something fundamental about
the laws of physics that prevents this? And if there isn't, then it eventually will happen,
or we will destroy ourselves in some other way. And it's very difficult, I find, to actually argue
against that. Yeah. Me too. And not sound like you're just like rolling your
eyes. Like, ah, that's science fiction, we don't have to think about it. But even worse than
that, which is like, I don't have kids, but like, I got to pick up my kids now, like this, okay,
I see more pressing short term. Yeah, there's more pressing short term things that like,
stop it with this existential crisis, we have much, much shorter term things like now,
especially this year, there's COVID. So like any kind of discussion like that is,
like there's, you know, there's pressing things today. And so the Sam Harris argument,
well, like any day, the exponential singularity can occur, is very difficult to argue against.
I mean, I don't know. But part of his story is also, he's not going to put a date on it.
It could be in a thousand years, it could be in a hundred years, it could be in two years.
It's just that as long as we keep making this kind of progress, it ultimately
has to become a concern. I kind of am on board with that. But the piece that I
feel like is missing from that way of extrapolating from the moment that we're in
is that I believe that in the process of actually developing technology that can
really get around in the world and really process and do things in the world in a sophisticated
way, we're going to learn a lot about what that means, which we don't know now,
because we don't know how to do this right now. If you believe that you can just turn on a deep
learning network and give it enough compute and it'll eventually get there,
well, sure, that seems really scary, because we won't be in the loop at all. We won't be
helping to design or target these kinds of systems. But I don't see that. That feels like it
is against the laws of physics, because these systems need help. They need to surpass the
difficulty, the wall of complexity that happens in arranging something in the form that will
happen. I believe in evolution. I believe that there's an argument. There's another argument,
just to look at it from a different perspective, that people say, well, I don't believe in evolution.
How could evolution, it's sort of like a random set of parts assembling themselves into a 747,
and that could just never happen. So it's like, okay, that's maybe hard to argue against. But
clearly, 747s do get assembled. They get assembled by us. Basically, the idea being that
there's a process by which we will get to the point of making technology that has that kind of
awareness. And in that process, we're going to learn a lot about that process, and we'll have
more ability to control it or to shape it or to build it in our own image. It's not something
that is going to spring into existence like that 747. And we're just going to have to contend with
it completely unprepared. Now, it's very possible that in the context of the long arc of human history,
it will in fact spring into existence. But that springing might take, like, if you look at
nuclear weapons, like even 20 years is a springing in the context of human history. And it's very
possible just like with nuclear weapons that we could have, I don't know what percentage you want
to put on it, but the possibility of... Could have knocked ourselves out. Yeah, the possibility of
human beings destroying themselves in the 20th century with nuclear weapons, I don't know,
if you really think through it, you could really put it close to like, I don't know,
30, 40%, given like the certain moments of crisis that happened. So like, I think
one, like fear in the shadows that's not being acknowledged is it's not so much that the AI will
run away, it's that as it's running away, we won't have enough time to think through how to stop it.
Right. Fast takeoff or foom. Yeah, I mean, my much bigger concern, I wonder what you think about
it, which is we won't know it's happening. So I kind of think that there's an AGI situation
already happening with social media, that our minds, our collective intelligence of human
civilization is already being controlled by an algorithm. And like, we're already super,
like at the level of a collective intelligence, thanks to Wikipedia. People should donate to
Wikipedia to feed the AGI. Man, if we had a super intelligence that was in line with
Wikipedia's values, that's a lot better than a lot of other things I could imagine. I trust
Wikipedia more than I trust Facebook or YouTube as far as trying to do the right thing from a
rational perspective. Now, that's not where you were going. I understand that. But it does strike
me that there's sort of smarter and less smart ways of exposing ourselves to each other on the
internet. Yeah, the interesting thing is that Wikipedia and social media have very different
forces. You're right. I mean, Wikipedia, if AGI was Wikipedia, it'd be just like this cranky,
overly competent editor of articles. There's something to that. But the social media aspect
is not... So the vision of AGI is as a separate system that's super intelligent.
That's one key little thing. I mean, there's the paperclip argument, which is about super dumb
but super powerful systems. But with social media, you have relatively simple algorithms,
like the ones we may talk about today, and, something Charles talks a lot about,
which is interactive AI, when they start having, at scale, tiny little interactions
with human beings, they can start controlling these human beings. So a single algorithm can
control the minds of human beings slowly, to where we might not realize it. It can start wars,
it can change the way we think about things. It feels like in the long arc of history,
if I were to sort of zoom out from all the outrage and all the tension on social media,
that it's progressing us towards better and better things. It feels like chaos and toxic and all that
kind of stuff. It's chaos and toxic. Yeah. But it feels like actually, the chaos and toxicity is
similar to the kind of debates we had from the founding of this country. There's a civil war
that happened over that period. And ultimately, it was all about this tension of like,
something doesn't feel right about our implementation of the core values we hold
as human beings and they're constantly struggling with this. And that results in
people calling each other, just being shady to each other on Twitter. But ultimately,
the algorithm is managing all that. And it feels like there's a possible future in which that algorithm
controls us into the direction of self destruction and whatever that looks like.
Yeah. So all right, I do believe in the power of social media to screw us up royally. I do believe
link |
in the power of social media to benefit us too. I do think that we're in a, yeah, it's sort of
link |
almost got dropped on top of us. And now we're trying to, as a culture, figure out how to cope
link |
with it. There's a sense in which, I don't know, there's some arguments that say that, for example,
link |
I guess, college age students now, late college age students now, people who were in middle school
link |
when social media started to really take off, maybe really damaged. This may have really
link |
hurt their development in a way that we don't have all the implications of quite yet.
link |
That's the generation who, and I hate to make it somebody else's responsibility,
link |
but they're the ones who can fix it. They're the ones who can figure out,
link |
how do we keep the good of this kind of technology without letting it eat us alive?
link |
And if they're successful, we move on to the next phase, the next level of the game. If they're not
link |
successful, then, yeah, then we're going to wreck each other. We're going to destroy society.
link |
So you're going to, in your old age, sit on a porch and watch the world burn because of the
link |
TikTok generation that... I believe, well, so this is my kid's age, right? And certainly my
link |
daughter's age, and she's very tapped in to social stuff, but she's also, she's trying to find that
link |
balance of participating in it and in getting the positives of it, but without letting it eat her
link |
alive. And I think sometimes she ventures... I hope she doesn't watch this. Sometimes I think
link |
she ventures a little too far and is consumed by it, and other times she gets a little distant.
link |
And if there's enough people like her out there, they're going to navigate these choppy waters.
link |
That's an interesting skill, actually, to develop. I talked to my dad about it. You know, I've now
link |
somehow, this podcast in particular, but other reasons has received a little bit of attention.
link |
And with that, apparently in this world, even though I don't shut up about love and I'm just
link |
all about kindness, I have now a little mini army of trolls. It's kind of hilarious, actually,
link |
but it also doesn't feel good. But it's a skill to learn to not look at that.
link |
Like to moderate, actually, how much you look at that. The discussion I have with my dad, it's
link |
similar to, it doesn't have to be about trolls. It could be about checking email, which is like,
link |
if you're anticipating, you know, there's, my dad runs a large institute at Drexel University,
link |
and there could be stressful emails you're waiting, like there's drama of some kinds.
link |
And so like, there's a temptation to check the email. If you send an email and you got it,
link |
and that pulls you in into, it doesn't feel good. And it's a skill that he actually complains that
link |
he hasn't learned, I mean, he grew up without it. So he hasn't learned the skill of how to
link |
shut off the internet and walk away. And I think young people, while they're also being,
link |
quote unquote, damaged by like, you know, being bullied online, all of those stories,
link |
which are very like horrific, you basically can't escape your bullies these days when you're growing
link |
up. But at the same time, they're also learning that skill of how to be able to shut off the,
link |
like disconnect with it, be able to laugh at it, not take it too seriously. It's a fascinating,
link |
like we're all trying to figure this out. Just like you said, it's been dropped on us. And
link |
we're trying to figure it out. Yeah, I think that's really interesting. And I, I, I guess I've become
link |
a believer in the human design, which I feel like I don't completely understand, like, how do you
link |
make something as robust as us? Like we're so flawed in so many ways. And yet, and yet, you know,
link |
we dominate the planet. And we do seem to manage to get ourselves out of scrapes, eventually,
link |
not necessarily the most elegant possible way, but somehow we get, we get to the next step.
link |
And I don't know how I'd make a machine do that. I, I, I generally speaking, like if I train one
link |
of my reinforcement learning agents to play a video game, and it works really hard on that
link |
first stage over and over and over again, and it makes it through it succeeds on that first level.
link |
And then the new level comes, and it's just like, okay, I'm back to the drawing board.
link |
And somehow humanity, we keep leveling up, and then somehow managing to put together the skills
link |
necessary to achieve success, some semblance of success in that next level too. And, you know,
link |
I hope we can keep doing that. You mentioned reinforcement learning. So you've had a couple
link |
of years in the field. No, quite, you know, quite a few, quite a long career in artificial
link |
intelligence broadly, but reinforcement learning specifically. Can you maybe give a hint about
link |
your sense of the history of the field? And in some ways it's changed with the advent of deep
link |
learning, but has long roots. Like, how has it weaved in and out of your own life? How have you seen
link |
the community change or maybe the ideas that it's playing with change? I've had the privilege,
link |
the pleasure of being, of having almost a front row seat to a lot of this stuff. And it's been
link |
really, really fun and interesting. So when I was in college in the 80s, early 80s, the neural net
link |
thing was starting to happen. And I was taking a lot of psychology classes and a lot of computer
link |
science classes as a college student. And I thought, you know, something that can play tic tac toe
link |
and just like learn to get better at it, that ought to be a really easy thing. So I spent almost,
link |
almost all of my, what would have been vacations during college, like hacking on my home computer,
link |
trying to teach it how to play tic tac toe. In what programming language? BASIC. Oh yeah. That's
link |
my first language. That's my native language. Is that when you first fell in love with computer
link |
science, just like programming BASIC on that? What was the computer? Do you remember? I had a
link |
TRS-80 Model 1 before they were called Model 1s, because there was nothing else. I got my
link |
computer in 1979. So I was, I would have been bar mitzvahed, but instead of having a big party
link |
that my parents threw on my behalf, they just got me a computer, because that's what I really,
link |
really, really wanted. I saw them in the, in the, in the mall in Radio Shack. And I thought,
link |
what, how are they doing that? I would try to stump them. I would give them math problems,
link |
like one plus, and then in parentheses, two plus one. And it would always get it right. I'm like,
link |
how do you know so much? Like I've had to go to an algebra class for the last few years to learn
link |
this stuff. And you just seem to know. So I was, I was, I was smitten and I got a computer. And I
link |
think ages 13 to 15, I have no memory of those years. I think I just was in my room with the
link |
computer. Listening to Billy Joel. Communing, possibly listening to the radio, listening to
link |
Billy Joel. That was the one album I had on vinyl at that time. And, and then I got it on cassette
link |
tape. And that was really helpful. Because then I could play it. I didn't have to go down in my
link |
parents' Wi-Fi, or hi-fi, sorry. And then age 15, I remember kind of walking out and like, okay,
link |
I'm ready to talk to people again. Like I've learned what I need to learn here.
link |
And so yeah, so, so that was, that was my home computer. And so I went to college and I was
link |
like, oh, I'm totally going to study computer science. I opted the college I chose specifically
link |
had a computer science major, the one that I really want the college I really wanted to go to
link |
didn't. So bye bye to them. Which college did you go to? So I went to Yale. Princeton would
link |
have been way more convenient. And it was just beautiful campus. And it was close enough to
link |
home. And I was really excited about Princeton. And I visited, I said, so computer science major
link |
like, well, we have computer engineering. I'm like, oh, I don't like that word engineering.
link |
I like computer science. I really, I want to do like, you're saying hardware and software.
link |
They're like, yeah, I'm like, I just want to do software. I couldn't care less about hardware.
link |
You grew up in Philadelphia? I grew up outside Philly. Yeah. Yeah. So the, you know, local
link |
schools were like Penn and Drexel and Temple, like everyone in my family went to Temple,
link |
at least at one point in their lives, except for me. So yeah, Philly family.
link |
Yale had a computer science department. And that's when you, it's kind of interesting.
link |
You said 80s and neural networks. That's when neural networks were a hot new thing or a hot thing
link |
period. So was that in college when you first learned about neural networks? Yeah.
link |
And I learned about it in a psychology class, not in a CS class.
link |
Yeah. Was it psychology or cognitive science or like, do you remember like what context?
link |
It was, yeah, yeah, yeah. So, so I was a, I've always been a bit of a cognitive psychology
link |
groupie. So like I, I studied computer science, but I like, I like to hang around where the
link |
cognitive scientists are. Cause I don't know brains, man. They're like, they're wacky. Cool.
link |
And they have a bigger picture view of things. They're a little less engineering, I would say.
link |
They're more, they're more interested in the nature of cognition and intelligence and perception
link |
and how, like, the vision system works. Like they're asking always bigger questions. Now with the
link |
deep learning community, there, I think more, there's a lot of intersections, but I do find in
link |
that the, the neuroscience folks actually, and cognitive psychology, cognitive science folks
link |
are starting to learn how to program, how to use neural artificial neural networks.
link |
And they are actually approaching problems in like totally new, interesting ways. It's fun to
link |
watch grad students from those departments like approach a problem of machine learning.
link |
Right. They come in with a different perspective. Yeah. They don't care about like your
link |
ImageNet dataset or whatever. They, they want like to understand the, the, the like the basic mechanisms
link |
at the, at the neuronal level, at the functional level of intelligence. So it's kind of, it's
link |
kind of cool to see them work. But yeah. Okay. So you always love, you're always a
link |
groupie of cognitive psychology. Yeah. Yeah. And so, so it was in a class by Richard Gerrig.
link |
He was kind of my, my favorite psych professor in college. And I took like three different
link |
classes with him. And yeah, so that we were, they were talking specifically the class, I think was
link |
kind of a, there was a big paper that was written by Steven Pinker and Prince. I don't, I'm blanking
link |
on Prince's first name, but Pinker and Prince, they wrote kind of a, they were at that
link |
time kind of like, I'm blanking on the names of the current people. The cognitive scientists who
link |
were complaining a lot about deep networks. Oh, Gary, Gary Marcus. Gary Marcus. And who else? I
link |
mean, there's a few, but Gary, Gary is the most feisty. Sure. Gary is very feisty. And with this,
link |
with his coauthor, they, they, you know, they're kind of doing these kind of takedowns where they
link |
say, okay, well, yeah, it does all these amazing things, amazing things. But here's a shortcoming,
link |
here's a shortcoming, here's a shortcoming. And so the Pinker Prince paper is kind of like the,
link |
that generation's version of Marcus and Davis, right? Where they're, they're trained as cognitive
link |
scientists, but they're looking skeptically at the results in the, in the artificial intelligence,
link |
neural net kind of world and saying, yeah, it can do this and this and this, but like, it can't do
link |
that. And it can't do that. And it can't do that. Maybe in principle, or maybe just in practice at
link |
this point. But, but the fact of the matter is you're, you've narrowed your focus too far to be
link |
impressed, you know, you're impressed with the things within that circle, but you need to broaden
link |
that circle a little bit. You need to look at a wider set of problems. And so, so we, so I was
link |
in this seminar in college, that was basically a close reading of the Pinker Prince paper,
link |
which was like really thick. There was a lot going on in there. And, and it, and it talked about
link |
the reinforcement learning idea a little bit. I'm like, Oh, that sounds really cool, because
link |
behavior is what is really interesting to me about psychology anyway. So making programs that, I mean,
link |
programs are things that behave. People are things that behave. Like I want to make learning that
link |
learns to behave. In which way was reinforcement learning presented? Is this talking about human
link |
and animal behavior? Or are we talking about actual mathematical construct?
link |
That's right. So that's a good question. Right. So this is, I think it wasn't actually talked about
link |
as behavior in the paper that I was reading. I think that it just talked about learning.
link |
And to me, learning is about learning to behave, but really neural nets at that point were about
link |
learning, like supervised learning. So learning to produce outputs from inputs. So I kind of tried
link |
to invent reinforcement learning. I, when I graduated, I joined a research group at Bellcore,
link |
which had spun out of Bell Labs recently at that time, because of the divestiture of the, of
link |
long distance and local phone service in the 1980s, 1984. And I was in a group with Dave Ackley,
link |
who was the first author of the Boltzmann machine paper. So the very first neural net paper that
link |
could handle XOR, right? So XOR sort of killed neural nets, the very first, the zeroth,
link |
the first winter. Yeah. The, the Perceptrons paper. And Hinton, along with his student Dave Ackley,
link |
and, and I think there were other authors as well, showed that no, no, no, with Boltzmann machines,
link |
we can actually learn nonlinear concepts. And so everything's back on the table again. And that
link |
kind of started that second wave of neural networks. So Dave Ackley was, he became my mentor at,
link |
at Bellcore. And we talked a lot about learning and life and computation and how all these things
link |
fit together. Now, Dave and I have a podcast together. So, so I get to kind of enjoy that
link |
sort of his perspective once again, even, even all these years later. And so I said, so I said,
link |
I was really interested in learning, but in the concept of behavior. And he's like, oh, well,
link |
that's reinforcement learning here. And he gave me Rich Sutton's 1984 TD paper. So I read that
link |
paper, I honestly didn't get all of it. But I got the idea, I got that they were using
link |
that he was using ideas that I was familiar with in the context of neural nets and, and
link |
like sort of back prop. But with this idea of making predictions over time, I'm like,
link |
this is so interesting, but I don't really get all the details I said to Dave. And Dave said,
link |
oh, well, why don't we have him come and give a talk. And I was like,
link |
wait, what, you can do that? Like, these are real people? I thought they were just words. I thought
link |
it was just like ideas that somehow magically seeped into paper. He's like, no, I, I, I, I know
link |
Rich, like, we'll just have him come down and he'll give a talk. And so I was, you know,
link |
my mind was blown. And so Rich came and he gave a talk at Bellcore. And he talked about what he
link |
was super excited, which was they had just figured out at the time, Q learning. So Watkins had
link |
visited Rich Sutton's lab at UMass, or Andy Barto's lab that Rich was a part of.
link |
And he was really excited about this because it resolved a whole bunch of problems that he
link |
didn't know how to resolve in the, in the earlier paper. And so, for people who don't know,
link |
TD, temporal difference, these are all just algorithms for reinforcement learning.
link |
Right. And TD, temporal difference, in particular, is about making predictions over time.
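To make "making predictions over time" concrete, here is a minimal sketch of tabular TD(0) value prediction. The three-state chain, the single reward at the end, and the parameter values are all hypothetical choices made for illustration; none of them come from the conversation or from Sutton's paper.

```python
def td0_predict(episodes, alpha=0.5, gamma=0.9):
    """Tabular TD(0) prediction on a toy chain (hypothetical example).

    States run 0 -> 1 -> 2 -> terminal, and the only reward (1.0)
    arrives on the final step.
    """
    n_states = 3
    values = [0.0] * n_states  # current prediction of future reward per state
    for _ in range(episodes):
        state = 0
        while state < n_states:
            next_state = state + 1
            terminal = next_state == n_states
            reward = 1.0 if terminal else 0.0
            next_value = 0.0 if terminal else values[next_state]
            # TD error: the gap between the prediction made one step later
            # (reward plus discounted next prediction) and the current one.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values


values = td0_predict(episodes=100)
print([round(v, 3) for v in values])
```

Each prediction is nudged toward the prediction made one step later, so the estimates should settle near the discounted reward seen from each state (roughly 0.81, 0.9, and 1.0 under these assumed settings).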
link |
And you can try to use it for making decisions, right? Because if you can predict how good a
link |
future action, an action outcomes will be in the future, you can choose one that has better.
link |
And, but the theory didn't really support changing your behavior. Like the predictions
link |
had to be of a consistent process if you really wanted it to work. And one of the things that
link |
was really cool about Q learning, another algorithm for reinforcement learning is it was
link |
off policy, which meant that you could actually be learning about the environment and what
link |
the value of different actions would be while actually figuring out how to behave optimally.
link |
Yeah. So that was a revelation.
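As a rough illustration of the off-policy idea, here is a minimal tabular Q-learning sketch. The four-state corridor, the uniformly random behavior policy, and the parameter values are assumptions made for the example, not details from the conversation. The point it tries to show is that the agent behaves randomly while its update still targets the best available next action.

```python
import random


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical example).

    States 0..3, where state 3 is the goal. Entering the goal pays 1.0.
    Moving left from state 0 leaves the agent in state 0.
    """
    n_states, goal = 4, 3
    left, right = 0, 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    rng = random.Random(seed)
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Behavior policy: act uniformly at random (pure exploration).
            action = rng.choice((left, right))
            next_state = max(state - 1, 0) if action == left else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Off-policy target: use the best next action's value,
            # regardless of what the random behavior policy does next.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy read off the learned table (1 means "move right").
greedy_policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(3)]
print(greedy_policy)
```

Even though the agent never acts greedily while learning, the greedy policy extracted from the table should point toward the goal in every state.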
link |
Yeah. And the proof of that is kind of interesting. I mean, that's really surprising to me when I
link |
first read that and then in Rich Sutton's book on the matter. It's kind of beautiful that
link |
a single equation can capture all one line of code and like you can learn anything.
link |
Yeah, like, so equation and code, you're right. Like you can, the code that you can arguably,
link |
at least if you like squint your eyes can say this is all of intelligence,
link |
that you can implement that in a single, well, I think I started with Lisp, which is a shout
link |
out to Lisp, with like a single line of code, key piece of code, maybe a couple,
link |
that you could do that is kind of magical. It feels too good to be true.
link |
Well, and it sort of is. Yeah. It seems to require an awful lot of extra stuff supporting it.
link |
But nonetheless, the idea is really good. And as far as we know, it is a very reasonable way of
link |
trying to create adaptive behavior, behavior that gets better at something over time.
link |
Did you find the idea of optimal at all compelling that you could prove that it's optimal? So like
link |
one part of computer science that it makes people feel warm and fuzzy inside is when you
link |
can prove something like: a sorting algorithm, worst case, runs in n log n, and it makes everybody
link |
feel so good. Even though in reality, it doesn't really matter what the worst case is, what matters
link |
is like, does this thing actually work in practice on this particular actual set of data that I
link |
enjoy? Did you? So here's a place where I have maybe a strong opinion, which is like,
link |
you're right, of course, but no, no. So what makes worst case so great, right? If you have a
link |
worst case analysis, so great is that you get modularity. You can take that thing and plug it
link |
into another thing and still have some understanding of what's going to happen when you click them
link |
together, right? If it just works well in practice, in other words, with respect to some distribution
link |
that you care about, when you go plug it into another thing, that distribution can shift,
link |
it can change, and your thing may not work well anymore. And you want it to and you wish it does
link |
and you hope that it will, but it might not. So you're saying you don't like machine learning.
link |
But we have some positive theoretical results for these things. You can come back at me
link |
with, yeah, but they're really weak and yeah, they're really weak. And you can even say that
link |
sorting algorithms, like if you do the optimal sorting algorithm, it's not really the one that
link |
you want. And that might be true as well. But it is, the modularity is a really powerful statement.
link |
I really like that. As an engineer, you can then assemble different things you can count on them
link |
to be, I mean, it's interesting. It's a balance, like with everything else in life, you don't want
link |
to get too obsessed. I mean, this is what computer scientists do, which they tend to get obsessed,
link |
they over optimize things, or they start by optimizing them, they over optimize. So it's
link |
easy to like get really granular about this thing. But like the step from an n squared to an n log n
link |
sorting algorithm is a big leap for most real world systems, no matter what the actual
link |
behavior of the system is, that's a big leap. And the same can probably be said for other kind of
link |
first leaps that you would take on a particular problem. Like it's picking the low hanging fruit
link |
or whatever the equivalent of doing the not the dumbest thing, but the next to the dumbest thing
link |
is picking the most delicious, reachable fruit. Yeah, most delicious, reachable fruit. I don't
link |
know why that's not a saying. And yeah. Okay, so, so you then this is the 80s and this kind of idea
link |
starts to percolate of learning. Yeah, I got to meet Rich Sutton. So everything was sort of
link |
downhill from there. And that was that was really the pinnacle of everything. But then I, you know,
link |
then I felt like I was kind of on the inside. So then as interesting results were happening,
link |
I could like check in with, with Rich or with Gerry Tesauro, who had a huge impact on kind of
link |
early thinking in, in temporal difference learning and reinforcement learning and showed that you
link |
could do, you could solve problems that we didn't know how to solve any other way.
link |
And so that was really cool. So it was good things were happening, I would hear about it from
link |
either the people who were doing it, or the people who were talking to the people who were
link |
doing it. And so I was able to track things pretty well through, through the 90s.
link |
So wasn't most of the excitement on reinforcement learning in the 90s era with,
link |
what is it, TD-Gammon, like, what's the role of these kind of little like fun,
link |
game playing things and breakthroughs about, you know, exciting the community? Was that,
link |
like, what were your, because you've also built, or were part of building, a crossword
link |
puzzle, uh, solver program, yeah, solving program, uh, called Proverb. So,
link |
so you were interested in this as a problem, like in forming, in using games to understand how to
link |
build, uh, intelligent systems. So like, what did you think about TD-Gammon? Like, what did you
link |
think about that whole thing in the 90s? Yeah. I mean, I found the TD-Gammon result really just
link |
remarkable. So I had known about some of Gerry's stuff before he did TD-Gammon and he did a system,
link |
just more vanilla, well, not entirely vanilla, but a more classical backpropy kind of, uh,
link |
network for playing backgammon, where he was training it on expert moves. So it was kind of
link |
supervised. But the way that it worked was not to mimic the actions, but to learn internally
link |
an evaluation function. So to learn, well, if the expert chose this over this, that must mean
link |
that the expert values this more than this. And so let me adjust my weights to make it so that
link |
the network evaluates this as being better than this. So it could learn from, from human
link |
preferences, it could learn its own preferences. And then when he took the step from that to
link |
actually doing it as a full on reinforcement learning problem where you didn't need a trainer,
link |
you could just let it play, that was, that was remarkable. Right. And so I think as,
link |
as humans often do, as we've done in the recent past as well, people extrapolate and it's like,
link |
Oh, well, if you can do that, which is obviously very hard, then obviously you could do all these
link |
other problems that we, that we want to solve that we know are also really hard. And it turned out
link |
very few of them ended up being practical. Um, partly because I think neural nets,
link |
certainly at the time, were struggling to be consistent and reliable. And so training them
link |
in a reinforcement learning setting was a bit of a mess. I had, uh, I don't know, generation
link |
after generation of like master's students who wanted to do value function approximation,
link |
basically learn reinforcement learning with neural nets. And over and over and over again,
link |
we were failing. We couldn't get the, the good results that Gerry Tesauro got. I now believe
link |
that Gerry is a neural net whisperer. He has a particular ability to get neural networks to
link |
do things that other people would find impossible. And it's not the technology, it's the technology
link |
and Gerry together. Yeah. Which I think speaks to the role of the human expert in the
link |
process of machine learning. Right. It's so easy. We were so drawn to the idea that,
link |
that it's the technology that is, that is where the power is coming from,
link |
that I think we lose sight of the, of the fact that sometimes you need a really good,
link |
just like, I mean, no one would think, Hey, here's this great piece of software. Here's like,
link |
I don't know, GNU Emacs or whatever. Um, and doesn't that prove that computers are super
link |
powerful and basically going to take over the world. It's like, no, Stallman is a hell of a hacker.
link |
Right. So he was able to make the code do these amazing things. He couldn't have done it without
link |
the computer, but the computer couldn't have done it without him. And so I think people discount
link |
the role of people like Gerry who, who, um, who have just a particular, a particular set of skills.
link |
On that topic, by the way, as a small side note, I tweeted Emacs is greater than Vim yesterday
link |
and deleted, deleted the tweet 10 minutes later when I realized it started a war. Yeah. I was like,
link |
Oh, I was just kidding. I was just being provocative walk, walk, walk back. So people still feel
link |
passionately about that particular piece of, uh, I don't get that cause Emacs is clearly so much
link |
better. I don't understand. But you know, why do I say that? Because I, cause like I spent a block
link |
of time in the eighties, um, making my fingers know the Emacs keys. And now like that's part
link |
of the thought process for me. Like I need to express. And if you take that, if you take my Emacs
link |
key bindings away, I become lit. Yeah. I can't express myself. I'm the same way with the, I
link |
don't know if you know what, what it is, but a Kinesis keyboard, which is, uh, this butt shaped
link |
keyboard. Yes, I've seen them. Yeah. And they're very, uh, I don't know, sexy, elegant. They're
link |
beautiful. Yeah. They're, they're gorgeous, uh, way too expensive. But, uh, the, the problem with
link |
them similar with Emacs is when you, once you learn to use it, it's harder to use other things.
link |
It's hard to use other things. There's this absurd thing where I have like small, elegant,
link |
lightweight, beautiful little laptops and I'm sitting there in a coffee shop with a giant
link |
Kinesis keyboard and a sexy little laptop. It's absurd. But it, you know, like I used to feel
link |
bad about it, but at the same time, you just kind of have to, sometimes it's back to the Billy Joel
link |
thing. You just have to throw that Billy Joel record and throw Taylor Swift and Justin Bieber
link |
to the wind. So. See, but I like them now because I, because again, I have no musical taste. Like,
link |
like now that I've heard Justin Bieber enough, I like, I really like his songs and Taylor Swift,
link |
not only do I like her songs, but my daughter's convinced that she's a genius. And so now I
link |
basically have, I'm signed on to that. So. So yeah, that, that speaks to the back to the robustness
link |
of the human brain. That speaks to the neuroplasticity that you can just, you can just like a mouse,
link |
teach yourself to a, or a dog, teach yourself to enjoy Taylor Swift. I'll try it out. I don't know.
link |
I try, you know what it has to do with just like acclimation, right? Just like you said,
link |
a couple of weeks. Yeah. That's an interesting experiment. I'll actually try that. Like I'll
link |
listen to. If that wasn't the intent of the experiment, just like social media, it wasn't
link |
intended as an experiment to see what we can take as a society, but it turned out that way.
link |
I don't think I'll be the same person on the other side of the week listening to Taylor Swift,
link |
but let's try. You know, it's more compartmentalized. Don't be so worried. Like it's,
link |
like I get that you can be worried, but don't be so worried because we compartmentalize really
link |
well. And so it won't bleed into other parts of your life. You won't start, I don't know,
link |
wearing red lipstick or whatever. Like it's, it's fine. It's fine. Change fashion and everything.
link |
It's fine. But you know what? The thing you have to watch out for is you'll walk into a
link |
coffee shop once we can do that again. And recognize the song. And you'll be, no,
link |
you won't know that you're singing along until everybody in the coffee shop is looking at you.
link |
And then you're like, that wasn't me. Yeah. That's the, you know, people are afraid of AGI. I'm
link |
afraid of the Taylor Swift takeover. Yeah. And I mean, people should know that TD-Gammon
link |
was, I guess, would you call it, do you like the terminology of self play by any chance?
link |
So, so like systems that learn by playing themselves, just, I don't know if it's the best
link |
word, but, uh, so what's, what's the problem with that term? Okay. So it's like the big bang,
link |
like it's, it's like talking to serious physicists. Do you like the term big bang when, when it was
link |
early? I feel like it's the early days of self play. I don't know. Maybe it was just previously,
link |
but I think it's been used by only a small group of people. And so like, I think we're still deciding,
link |
is this ridiculously silly name, a good name for the concept, potentially one of the most
link |
important concepts in artificial intelligence. Okay. Depends how broadly you apply the term. So I
link |
used the term in my 1996 PhD dissertation. Are you, wow, the actual term self play.
link |
Yeah. Because, because Tesauro's paper was something like, um, training up an expert
link |
backgammon player through self play. So I think it was in the title of his paper. If not in the
link |
title, it was definitely a term that he used. There's another term that we got from that work
link |
is rollout. So I don't know if you, do you ever hear the term rollout? That's a backgammon term
link |
that is now applied generally in computers. Well, at least in AI, because of TD-Gammon. Yeah.
link |
That's fascinating. So how is self play being used now? And like, why is it, does it, does it
link |
feel like a more general powerful concept is sort of the idea of, well, the machine just
link |
going to teach itself to be smart? Yeah. So that's, that's where maybe you can correct me,
link |
but that's where, you know, the continuation of the spirit and actually like literally the exact
link |
algorithms of TD-Gammon are applied by DeepMind and OpenAI to learn games that are a little bit
link |
more complex. When I was learning artificial intelligence, Go was presented to me with Artificial
link |
Intelligence: A Modern Approach. I don't know if they explicitly pointed to Go in those books
link |
as like an unsolvable kind of thing, like implying that these approaches hit their limit
link |
with these particular kinds of games. So something, I don't remember if the book said
link |
it or not, but something in my head, or it was the professors, instilled in me the idea like,
link |
this is the limits of artificial intelligence of the field. Like it instilled in me the idea that
link |
if we can create a system that can solve the game of go, we've achieved AGI. That was kind of
link |
I didn't explicitly like say this, but that was the feeling. And so from, I was one of the people
link |
that it seemed magical when a learning system was able to beat a human world champion at the game
link |
of Go. And even more so from that, that was AlphaGo, even more so with AlphaGo Zero, then kind of
link |
renamed and advanced into AlphaZero, beating a world champion or world class player without
link |
any supervised learning on expert games, doing so only by playing itself. So that
link |
is, I don't know what to make of it. I think it'll be interesting to hear what your opinions are on
link |
just how exciting, surprising, profound, interesting, or boring the breakthrough performance of
link |
AlphaZero was. Okay. So AlphaGo knocked my socks off. That was so remarkable. Which aspect of it?
link |
They got it to work that they actually were able to leverage a whole bunch of different ideas,
link |
integrate them into one giant system. Just the software engineering aspect of it is mind blowing.
link |
I don't, I've never been a part of a program as complicated as the program that they built for
link |
that. And, and just the, you know, like, like Gerry Tesauro is a neural net whisperer, like,
link |
you know, David Silver is a kind of neural net whisperer too. He was able to coax these networks
link |
and these new way out there architectures to do these, you know, solve these problems that,
link |
as you said, you know, when we were learning from AI, no one had an idea how to make it work.
link |
It was, it was remarkable that these, you know, these, these techniques that were so good at
link |
playing chess and that could beat the world champion in chess couldn't beat, you know,
link |
your typical Go-playing teenager in Go. So the fact that the, you know, in a very short number
link |
of years, we kind of ramped up to trouncing people in Go just blew me away.
link |
So you, you're kind of focusing on the engineering aspect, which is also very surprising. I mean,
link |
there's something different about large, well funded companies. I mean, there's a compute aspect to
link |
it too. Sure. Like that, of course, I mean, that's similar to Deep Blue, right, with, with IBM.
link |
Like there's something important to be learned and remembered about a large company taking
link |
the ideas that are already out there and investing a few million dollars into it or, or more. And
link |
so you're kind of saying the engineering is kind of fascinating, both on the, with AlphaGo is
link |
probably just gathering all the data, right, of the, of the expert games, like organizing everything,
link |
actually doing distributed supervised learning. And to me, see the engineering I kind of took
link |
for granted, to me philosophically being able to persist in the, in the face of like long odds,
link |
because it feels like for me, I'll be one of the skeptical people in the room thinking that you
link |
can learn your way to, to beat Go. Like it sounded like, especially with David Silver, it sounded
link |
like David was not confident at all. So like it was like, not, it's funny how confidence works.
link |
Yeah. It's like, you're not like cocky about it. Like, but. Right. Cause if you're cocky about it,
link |
you, you kind of stop and stall and don't get anywhere. Yeah. But there's like a hope
link |
that's unbreakable. Maybe that's better than confidence. It's a kind of wishful
link |
hope and a little dream. And you almost don't want to do anything else. You kind of keep doing it.
link |
That's, that seems to be the story and. But with enough skepticism that you're looking for where
link |
the problems are and fighting through them. Yeah. Cause you know, there's got to be a way out of
link |
this thing. Yeah. And for him, it was probably, there's, there's a bunch of little factors that
link |
come into play. It's funny how these stories just all come together. Like everything he did in his
link |
life came into play, which is like a love for video games and also a connection to, so the,
link |
the nineties had to happen with TD-Gammon and so on. Yeah. And in some ways it's surprising,
link |
maybe you can provide some intuition to it that not much more than TD-Gammon was done for quite
link |
a long time on the reinforcement learning front. Yeah. Is that weird to you? I mean,
link |
like I said, the, the students who I worked with, we tried to get,
link |
to basically apply that architecture to other problems and we consistently failed. There were
link |
a couple, a couple of really nice demonstrations that ended up being in the literature. There was
link |
a paper about controlling elevators, right? Where it's, it's like, okay, can we modify the heuristic
link |
that elevators use for deciding, like a bank of elevators for deciding which floors we should
link |
be stopping on to maximize throughput essentially. And you can set that up as a reinforcement
link |
learning problem and you can, you know, have a neural net represent the value function so that
link |
it's taking where all the elevators, where the button pushes, you know, this high dimensional,
link |
well, at the time, high dimensional input, you know, a couple dozen dimensions and turn that
link |
into a prediction as to, oh, is it going to be better if I stop at this floor or not? And ultimately,
link |
it appeared as though for the standard simulation distribution for people trying to leave the
link |
building at the end of the day, that the neural net learned a better strategy than the standard one
link |
that's implemented in elevator controllers. So that, that was nice. There was some work that
link |
Satinder Singh et al. did on handoffs with cell phones, you know, deciding when, when should you
link |
hand off from this cell tower to this cell tower. Oh, okay, communication networks. Yeah. And so
link |
a couple things seemed like they were really promising. None of them made it into production
link |
that I'm aware of. And neural nets as a whole started to kind of implode around then. And so
link |
there just wasn't a lot of air in the room for people to try to figure out, okay,
link |
how do we get this to work in the RL setting? And then they, they found their way back in 10,
link |
in 10 plus years. So you said AlphaGo was impressive, like it's a big spectacle. Is there
link |
Right. So then AlphaZero. So I think I may have a slightly different opinion on this than
link |
some people. So I talked to Satinder Singh in particular about this. So Satinder was
link |
like Rich Sutton, a student of Andy Barto. So they came out of the same lab, very influential,
link |
machine learning, reinforcement learning researcher. Now at DeepMind, as is, as is Rich,
link |
though different sites, the two of them. He's in Alberta. Rich is in Alberta. And
link |
Satinder would be in England, but I think he's in England from Michigan at the moment.
link |
But the, but he was, yes, he was much more impressed with AlphaGo Zero, which didn't,
link |
didn't get a kind of a bootstrap in the beginning with human trained games. And it just was purely
link |
self play. Though the first one, AlphaGo, was also a tremendous amount of self play, right? They
link |
started off, they kickstarted the, the action network that was making decisions. But then
link |
they trained it for a really long time using more traditional temporal difference methods.
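For readers unfamiliar with the term, temporal difference methods update a value estimate toward the reward plus the estimated value of the next state. The following is only a minimal sketch of tabular TD(0) on a toy random walk, not the AlphaGo or TD-Gammon training code; the function name `td0_random_walk`, the five-state environment, and all parameter values are assumptions made for illustration.

```python
import random


def td0_random_walk(num_states=5, episodes=10000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values for a toy random walk with tabular TD(0).

    Hypothetical environment: states 0..num_states-1 in a row. Each step moves
    left or right with equal probability. Stepping off the right end gives
    reward 1, stepping off the left end gives reward 0, and both end the episode.
    """
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial guess for every non-terminal state
    for _ in range(episodes):
        state = num_states // 2  # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0) update: nudge the estimate toward reward + gamma * V(next state).
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    # For this walk the true values are 1/6, 2/6, ..., 5/6.
    print([round(v, 2) for v in td0_random_walk()])
```

In self play the same kind of update is driven by games the system plays against itself, with a neural network standing in for the table of values.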
link |
So, so as a result, I didn't, it didn't seem that different to me. Like, it seems like, yeah, why
link |
wouldn't that work? Like once, once it works, it works. So what, but he found that, that removal
link |
of that extra information to be breathtaking, like that, that's a game changer. To me, the first
link |
thing was more of a game changer. But the open question, I mean, I guess that's the assumption
link |
is the expert games might contain within them, within them a, yeah, this amount of information.
link |
But we know that it went beyond that, right? We know that it somehow got away from that information
link |
because it was learning strategies. I don't think it's, I don't think AlphaGo is just
link |
better at implementing human strategies. I think it actually developed its own strategies that were,
link |
that were more effective. And so from that perspective, okay, well, so it, it made at
link |
least one quantum leap in terms of strategic knowledge. Okay, so now maybe it makes three.
link |
Like, okay, but that first one is the doozy, right? Getting it to, to, to work reliably and,
link |
and for the networks to, to hold on to the value well enough. Like that was, that was a big step.
link |
Well, isn't, maybe you could speak to this on the reinforcement learning front. So the
link |
starting from scratch and learning to do something like the first like, like random behavior to
link |
like crappy behavior to like somewhat okay behavior. It's not obvious to me that that's not
link |
like impossible to take those steps. Like if you just think about the intuition, like how the heck
link |
does random behavior become somewhat basic intelligent behavior? Not, not human level,
link |
not super human level, but just basic. But you're saying to you kind of the intuition is like,
link |
if you can go from human to super human level intelligence on the, on this particular task
link |
of game playing, then you're good at taking leaps. So you can take many of them.
link |
That the system, I believe that the system can take that kind of leap. Yeah. And also,
link |
I think that the beginner knowledge in Go, like you can start to get a feel really quickly for
link |
the idea that, you know, certain parts of the being in certain parts of the board seems to be
link |
more associated with winning, right? Cause it's not, it's not stumbling upon the concept of winning.
link |
It's told that it wins or that it loses. Well, it's self play. So it both wins and loses.
link |
It's told which, which side won. And the information is kind of there to start percolating around to
link |
make a difference as to, um, well, these things have a better chance of helping you win and these
link |
things have a worse chance of helping you win. And so, you know, it can get to basic play,
link |
I think pretty quickly, then once it has basic play, well, now it's kind of forced to do some
link |
search to actually experiment with, okay, well, what gets me that next increment of, of improvement?
link |
How far do you think, okay, this is where you kind of bring up the, the Elon Musks and the
link |
Sam Harrises, right? How far is your intuition about these kinds of self play mechanisms being
link |
able to take us? Cause it feels one of the ominous, but stated calmly things that when I talked to
link |
David Silver, he said, is that they have not yet discovered a ceiling for AlphaZero, for example,
link |
on the game of Go or chess. Like it's, it keeps, no matter how much compute they throw at it,
link |
it keeps improving. So it's possible, it's very possible that you, if you throw, you know, some
link |
like 10 X compute that it will improve by five X or something like that. And when stated calmly,
link |
it's so like, oh yeah, I guess so. But, but like, then you think like, well, can we potentially
link |
have like continuations of Moore's law in totally different way, like broadly defined Moore's law,
link |
not the exponential improvement like, are we going to have an AlphaZero that swallows the world?
link |
But notice it's not getting better at other things. It's getting better at Go. And I think it's a,
link |
that's a big leap to say, okay, well, therefore it's better at other things.
link |
Well, I mean, the question is how much of the game of life can be turned into,
link |
right? So that's, that I think is a really good question. And I think that we don't,
link |
I don't think we as a, I don't know, community really know that the answer to this, but
link |
so okay, so, so I went, I went to a talk by some experts on computer chess. So in particular,
link |
computer chess is really interesting because for, you know, for, of course, for a thousand years,
link |
humans were the best chess playing things on the planet. And then computers, like, edged
link |
ahead of the best person, and they've been ahead ever since. It's not like people have, have
link |
overtaken computers, but, but computers and people together have overtaken computers.
link |
Right. So at least last time I checked, I don't know what the very latest is, but last time I
link |
checked that there were teams of people who could work with computer programs to defeat the best
link |
computer programs. In the game of Go. In the game of chess. In the game of chess. Right.
link |
And so using the information about how these things called Elo scores, the sort of notion of
link |
how strong a player are you, there's a, there's kind of a range of possible scores and this,
link |
the, you, you increment in score. Basically, if you can beat another player of that lower score,
link |
62% of the time or something like that, like there's some threshold of, if you can somewhat
link |
consistently beat someone, then you are of a higher score than that person. And there's a question
link |
as to how many times can you do that in chess? Right. And so we know that there's a range of
link |
human ability levels that cap out with the best playing humans. And the computers went a step
link |
beyond that. And computers and people together have not gone, I think, a full step beyond that.
link |
It feels, the estimates that they have is that it's starting to asymptote, that we've reached
link |
kind of the maximum, the best possible chess playing. And so that means that there's kind of a
link |
finite strategic depth, right? At some point, you just can't get any better at this game.
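The idea of "steps" of playing strength can be made concrete with the standard Elo expected-score formula, in which a rating gap maps to an expected score. This is only a sketch: the function names are illustrative, the expected score counts a draw as half a win, and the 62% figure is the approximate threshold quoted in the conversation rather than an official definition of a step.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for_expected_score(p):
    """Rating gap at which the stronger player's expected score equals p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    # A 100-point gap corresponds to an expected score of about 0.64.
    print(round(elo_expected_score(1600, 1500), 3))
    # The 62% threshold mentioned above corresponds to a gap of about 85 points.
    print(round(rating_gap_for_expected_score(0.62), 1))
```

Counting how many such gaps fit between a beginner and the best known player gives a rough measure of the strategic depth being discussed.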
link |
Yeah. I mean, I don't, so I'll actually check that. I think it's interesting, because if you
link |
have somebody like Magnus Carlsen, who's using these chess programs to train his mind, like to
link |
learn about chess. To become a better chess player, yeah. And so like, that's a very interesting
link |
thing, because we're not static creatures. We're learning together. I mean, just like we're talking
link |
about social networks, those algorithms are teaching us, just like we're teaching those
link |
algorithms. So that's a fascinating thing. But I think the best chess playing programs are now
link |
better than the pairs. Like they have competition between pairs, but it's still, even if they weren't,
link |
it's an interesting question. Where's the ceiling? So the David, the ominous David Silver kind of
link |
statement is like, we have not found the ceiling. Right. So the question is, okay, so I don't know
link |
his analysis on that. From talking to Go experts, the depth, the strategic depth of Go seems to be
link |
substantially greater than that of chess, that there's more kind of steps of improvement that
link |
you can make getting better and better and better and better. But there's no reason to think that
link |
it's infinite. Infinite, yeah. And so it could be that what David is seeing is a kind of asymptoting,
link |
that you can keep getting better, but with diminishing returns. And at some point,
link |
you hit optimal play. Like in theory, all these finite games, they're finite. They have an optimal
link |
strategy. There's a strategy that is the minimax optimal strategy. And so at that point,
link |
you can't get any better. You can't beat that, that strategy. Now, that strategy may be
link |
from an information processing perspective, intractable, right? That you need.
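The claim that every finite game has a minimax optimal strategy can be illustrated on a game small enough to search completely. The sketch below uses a hypothetical single-pile take-away game (remove 1 to 3 stones, whoever takes the last stone wins), chosen only because it fits in a few lines; for chess or Go the same recursion is well defined but far too large to run, which is the intractability point being made.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def minimax_value(stones):
    """Return +1 if the player to move can force a win, -1 otherwise.

    Toy game (an assumption for illustration): players alternate removing
    1, 2, or 3 stones from one pile; the player who takes the last stone wins.
    """
    if stones == 0:
        return -1  # the opponent just took the last stone, so the player to move has lost
    # The opponent's best outcome after our move is our worst, hence the negation.
    return max(-minimax_value(stones - take) for take in (1, 2, 3) if take <= stones)


def best_move(stones):
    """Return a move that achieves the minimax value from this position."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: -minimax_value(stones - take))


if __name__ == "__main__":
    print([minimax_value(n) for n in range(1, 9)])  # losing positions are multiples of 4
    print(best_move(10))  # leaves the opponent with a multiple of 4
```

Once play matches this value at every position, no further improvement is possible, which is the cap on levels of improvement described here.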
link |
All the situations are sufficiently different that you can't compress it at all. It's this
link |
giant mess of hardcoded rules. And we can never achieve that. But that still puts a cap on how
link |
many levels of improvement that we can actually make. But the thing about self play is if you put
link |
it, although I don't like doing that, in the broader category of self supervised learning,
link |
is that it doesn't require too much or any human labeling. Yeah. Yeah. Human label or just
link |
human effort, the human involvement past a certain point. And the same thing you could argue is true
link |
for the recent breakthroughs in natural language processing with language models. Oh, this is
link |
how you get to GPT-3. Yeah, see how I did that. That was a good, good transition. Yeah,
link |
you're proud. I practiced that for days, leading up to this. But that's one of the questions is,
link |
can we find ways to formulate problems in this world that are important to us humans,
link |
like more important than the game of chess, that to which self supervised kinds of approaches
link |
could be applied, whether it's self play, for example, for like, maybe you could think of
link |
like autonomous vehicles in simulation, that kind of stuff, or just robotics applications
link |
in simulation, or in the self supervised learning, where unannotated data or data that's generated
link |
by humans naturally without extra costs, like the Wikipedia or like all of the internet can be
link |
used to, to learn something about, to create intelligent systems that do something really
link |
powerful, that pass the Turing test, or that do some kind of super human level performance.
link |
So what's your intuition, like trying to stitch all of it together about our discussion of AGI,
link |
the limits of self play, and your thoughts about maybe the limits of neural networks
link |
in the context of language models? Is there some intuition in there that might be useful to think
link |
about? Yeah, yeah, yeah. So first of all, the whole transformer network family of things
link |
is really cool. It's really, really cool. I mean, if you've ever, back in the day, you played with,
link |
I don't know, Markov models for generating text, and you've seen the kind of text that they spit
link |
out, and you compare it to what's happening now. It's, it's amazing. It's so amazing. Now, it doesn't
link |
take very long interacting with one of these systems before you find the holes, right? It's,
link |
it's not smart in any kind of general way. It's really good at a bunch of things, and it does seem
link |
to understand a lot of the statistics of language extremely well. And that turns out to be very
link |
powerful. You can answer many questions with that, but it doesn't make it a good conversationalist,
link |
right? And it doesn't make it a good storyteller. It just makes it good at imitating things that
link |
it has seen in the past. The exact same thing could be said by people who are voting for Donald Trump
link |
about Joe Biden supporters and people voting for Joe Biden about Donald Trump supporters is,
link |
you know, that they're not intelligent. They're just following the, yeah, they're following things
link |
they've seen in the past. And so it's very, it doesn't take long to find the flaws in their,
link |
in their like natural language generation abilities. Yes. Yes. So we're being very.
link |
That's interesting. Critical of AI systems. Right. So, so I've had a similar thought,
link |
which was that the stories that GPT-3 spits out are amazing and very human like.
link |
And it doesn't mean that computers are smarter than we realize necessarily. It partly means that
link |
people are dumber than we realize or that much of what we do day to day is not that deep. Like,
link |
we're just, we're just kind of going with the flow. We're saying whatever feels like the natural
link |
thing to say next. Not a lot of it is, is, is creative or meaningful or intentional. But enough
link |
is that we actually get, we get by, right? And we do come up with new ideas sometimes and we do
link |
manage to talk each other into things sometimes. And we do sometimes vote for reasonable people
link |
sometimes. But, but it's really hard to see in the statistics because so much of what we're saying
link |
is kind of rote. And so our metrics that we use to measure how these systems are doing,
link |
don't reveal that because it's, it's, it's in the interstices that, that is very hard to detect.
link |
But is your, do you have an intuition that with these language models, if they grow in size,
link |
it's already surprising that when you go from GPT-2 to GPT-3, that there is a noticeable
link |
improvement. So the question now goes back to the ominous David Silver and the ceiling.
link |
Right. So maybe there's just no ceiling. We just need more compute. Now,
link |
I mean, okay. So now I'm speculating as opposed to before when I was completely on firm ground.
link |
Yeah. All right. I don't believe that you can get something that really can do language and use
link |
language as a thing that doesn't interact with people. Like, I think that it's not enough to just
link |
take everything that we've said written down and just say, that's enough. You can just learn from
link |
that and you can be intelligent. I think you really need to be pushed back at. I think that
link |
conversations, even people who are pretty smart, maybe the smartest thing that we know, maybe
link |
not the smartest thing we can imagine, but we get so much benefit out of talking to each other
link |
and interacting. That's presumably why you have conversations live with guests is that, that there's
link |
something in that interaction that would not be exposed by, oh, I'll just write you a story and
link |
then you can read it later. And I think, I think because these systems are just learning from our
link |
stories, they're not learning from being pushed back at by us, that they're fundamentally limited
link |
in what they could actually become on this route. They have to, they have to get, you know,
link |
shut down. We have to have an argument that, they have to have an argument with us and lose
link |
a couple of times before they start to realize, oh, okay, wait, there's some nuance here that
link |
actually matters. Yeah, that's actually subtle sounding, but quite profound,
link |
that the interaction with humans is, is essential. And the limitation within that is
link |
profound as well, because the time scale, like the bandwidth at which you can really interact
link |
with humans is very low. So it's costly. So you can't, one of the underlying things about self
link |
play is it has to do, you know, a very large number of interactions. And so you can't really deploy
link |
reinforcement learning systems into the real world to interact, like you couldn't deploy a language
link |
model into the real world to interact with humans, because it would just not get enough data relative
link |
to the cost it takes to interact, like the time of humans is, is expensive, which is really
link |
interesting. That's good, that takes us back to reinforcement learning and trying to figure out
link |
if there's ways to make algorithms that are more efficient at learning, keep the spirit of reinforcement
link |
learning and become more efficient. In some sense, this seems to be the goal. I'd love to hear what
link |
your thoughts are. I don't know if you got the chance to see it. The blog post called The Bitter
link |
Lesson. Oh, yes. By Rich Sutton, that makes an argument, hopefully I can summarize it perhaps,
link |
perhaps you can. Yeah, but good. Okay. So I mean, I could try and you can correct me, which is,
link |
he makes an argument that it seems if we look at the long arc of the history of the artificial
link |
intelligence field, call it, you know, 70 years, that the algorithms from which we've seen the biggest
link |
improvements in practice are the very simple, like dumb algorithms that are able to leverage
link |
computation. And you just wait for the computation to improve, like all of the academics and so on
link |
have fun by finding little tricks and, and congratulate themselves on those tricks. And
link |
sometimes those tricks can be like big, that feel in the moment, like big spikes and breakthroughs,
link |
but in reality, over the decades, it's still the same dumb algorithm that just waits for the
link |
compute to get faster and faster. Do you find that to be an interesting argument against the
link |
entirety of the field of machine learning as an academic discipline? That we're really just a
link |
subfield of computer architecture. Yeah. We're just kind of waiting around for them to do their
link |
next thing. We really don't want to do hardware work. So like, that's right. I really don't want
link |
to, I don't want to think about hardware. We're procrastinating. Yes, that's right. Just waiting
link |
for them to do their job so that we can pretend to have done ours. So, yeah, I mean, the argument
link |
reminds me a lot of, I think it was a Fred Jelinek quote, early computational linguist who said,
link |
you know, we're building these computational linguistic systems. And every time we fire a
link |
linguist, performance goes up by 10%. Something like that. And so the idea of us building the
link |
knowledge in, in that, in that case, was much less, he was finding it to be much less successful
link |
than get rid of the people who know about language as a, you know, from a kind of
link |
scholastic academic kind of perspective and replace them with more compute.
link |
And so I think this is kind of a modern version of that story, which is, okay, we want to do better
link |
on machine vision. You could build in all these, you know, motivated, part based models that,
link |
you know, that just feel like obviously the right thing that you have to have, or we can throw a
link |
lot of data at it and guess what we're doing better with it with a lot of data. So I hadn't
link |
thought about it until this moment in this way. But what I believe, well, I've thought about what
link |
I believe. What I believe is that, you know, compositionality and what's the right way to
link |
say it, the complexity grows rapidly as you consider more and more possibilities, like
link |
explosively. And so far, Moore's law has also been growing explosively, exponentially. And so,
link |
so it really does seem like, well, we don't have to think really hard about the algorithm design
link |
or the way that we build the systems, because the best benefit we could get is exponential,
link |
and the best benefit that we can get from waiting is exponential, so we can just wait.
link |
It's got, that's got to end, right? And there's hints now that Moore's law is starting to feel
link |
some friction, starting to, the world is pushing back a little bit. One thing I don't know,
link |
lots of people know this, I didn't know this, I was trying to write an essay. And
link |
yeah, Moore's law has been amazing, and it's been, it's enabled all sorts of things. But there's a,
link |
there's also a kind of counter Moore's law, which is that the development cost for each
link |
successive generation of chips also is doubling. So it's costing twice as much money. So the amount
link |
of development money per cycle or whatever is actually sort of constant. And at some point,
link |
we run out of money. So, or we have to come up with an entirely different way of doing the
link |
development process. So like, I guess I'm always, always a bit skeptical of the look, it's an
link |
exponential curve, therefore it has no end. Soon the number of people going to NeurIPS will be
link |
greater than the population of the earth. That means we're going to discover life on other planets.
link |
No, it doesn't. It means that we're in a, in a sigmoid curve on the front half, which looks
link |
a lot like an exponential. The second half is going to look a lot like diminishing returns.
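The point that the front half of a sigmoid looks like an exponential can be checked numerically. The sketch below compares a logistic curve with the pure exponential that matches its early growth; the function names and the parameter values (ceiling 1, rate 1, midpoint 0) are arbitrary assumptions for illustration.

```python
import math


def logistic(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Logistic (sigmoid) growth that saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def early_exponential(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Pure exponential that the logistic curve approaches well before its midpoint."""
    return ceiling * math.exp(rate * (t - midpoint))


if __name__ == "__main__":
    for t in (-8, -5, -2, 0, 3):
        ratio = logistic(t) / early_exponential(t)
        print(f"t={t:>3}  logistic={logistic(t):.5f}  "
              f"exponential={early_exponential(t):.5f}  ratio={ratio:.3f}")
```

Well before the midpoint the two curves are nearly indistinguishable, so early data alone cannot tell a saturating process from an unbounded one; after the midpoint the logistic flattens toward its ceiling while the exponential keeps growing.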
link |
Yeah. The, I mean, but the interesting thing about Moore's law, if you actually like, look at the
link |
technologies involved, it's hundreds, if not thousands of S curves stacked on top of each other. It's
link |
not actually an exponential curve. It's constant breakthroughs. And then what becomes useful to
link |
think about, which is exactly what you're saying, the cost of development, like the size of teams,
link |
the amount of resources that are invested in continuing to find new S curves, new breakthroughs.
link |
And yeah, it's a, it's an interesting idea. You know, if we live in the moment, if we sit here
link |
today, it seems to be the reasonable thing to say that exponentials end. And yet in the software
link |
realm, they just keep appearing to be happening. And it's so, I mean, it's so hard to disagree
link |
with Elon Musk on this because it like, I've, you know, I used to be one of those folks. I'm
link |
still one of those folks I've studied autonomous vehicles. This is what I worked on. And, and
link |
it's, it's like, you look at what Elon Musk is saying about autonomous vehicles, well, obviously
link |
in a couple of years or in a year or, or next month, we'll have fully autonomous vehicles.
link |
Like there's no reason why we can't driving is pretty simple. Like it's just a learning problem.
link |
And you just need to convert all the driving that we're doing into data and just have a neural
link |
network that trains on that data. And like we use only our eyes. So you can use cameras and
link |
you can train on it. And it's like, yeah, that should work. And then you put that
link |
hat on like the philosophical hat. And but then you put the pragmatic hat and it's like, this is
link |
what the flaws of computer vision are like, this is what it means to train at scale. And then you
link |
you put the human factors, the psychology hat on, which is like, actually, driving is a lot,
link |
the cognitive science or cognitive, whatever the heck you call it is, it's really hard. It's much
link |
harder to drive than, than we realize there's a much larger number of edge cases. So building up
link |
an intuition around this, around exponentials, is really difficult. And on top of that, the pandemic
link |
is making us think about exponentials, making us realize that like, we don't understand anything
link |
about it. We're not able to intuit exponentials. We're either ultra terrified, some part of the
link |
population and some part is like the opposite of whatever the different carefree. And we're not
link |
managing it. Blasé. Blasé. Well, wow, that's French. So it's got an accent. So it's, it's, it's
link |
fascinating to think what, what the limits of this exponential growth of technology, not just
link |
Moore's law, it's technology, how that rubs up against the bitter lesson in GPT-3 and
link |
self-play mechanisms, that is not obvious. I used to be much more skeptical about neural networks,
link |
now I at least give a sliver of possibility that we'll all be very much surprised.
link |
And also, you know, caught in a way that like, we are not prepared for, like in applications of
link |
social networks, for example, because it feels like really good transformer models that are able
link |
to do some kind of like very good natural language generation are the same kind of models that could
link |
be used to learn human behavior and then manipulate that human behavior to gain advertiser dollars
link |
and all those kinds of things. Sure. To feed the capitalist system. And they arguably already
link |
are manipulating human behavior. Yeah. So, but not for self preservation, which I think is a big,
link |
that would be a big step. Like if they were trying to manipulate us to convince us not to
link |
shut them off, I would be very freaked out. But I don't see a path to that from where we are now.
link |
They don't have any of those abilities. That's not what they're trying to do. They're trying to
link |
keep people on the site. But see, the thing is this, this is the thing about life on earth is
link |
they might be borrowing our consciousness and sentience. Like, so like in a sense, they do
link |
because the creators of the algorithms have like, they're not, you know, if you look at our body,
link |
yeah, okay, we're not a single organism, we're a huge number of organisms with like tiny little
link |
motivations, we're built on top of each other. In the same sense, the AI algorithms that they're not
link |
like a system that includes human companies and corporations, right? Because corporations are
link |
funny organisms in and of themselves that really do seem to have self preservation built in. And I
link |
think that's at the, at the design level, I think they're designed to have self preservation be a focus.
link |
So you're right, in that, in that broader system that we're also a part of and can have some influence
link |
on, it's, it's, it is much more complicated, much more powerful. Yeah, I agree with that.
link |
So people really love it when I ask, what three books, technical, philosophical fiction had a
link |
big impact on your life, maybe you can recommend, we went with movies, we went with Billy Joel,
link |
and I forgot what you, what music you recommended, but I didn't, I just said I have no taste in music,
link |
I just like pop music. That was actually really skillful the way you avoided that question.
link |
Thanks, I was, I'm going to try to do the same with the books.
link |
So do you have a skillful way to avoid answering the question about three books you would recommend?
link |
I'd like to tell you a story. So my first job out of college was at Bellcore, I mentioned that
link |
before, where I worked with Dave Ackley, the head of the group was a guy named Tom Landauer,
link |
and I don't know how well known he is now, but arguably he's the, he's the inventor and the
link |
first proselytizer of word embeddings. So they, they developed a system shortly before I got to the
link |
group. Yeah, that was called latent semantic analysis, that would take words of English and
link |
embed them in, you know, multi hundred dimensional space, and then use that as a way of, you know,
link |
assessing similarity and basically doing reinforcement learning, not sorry, not reinforcement,
link |
information retrieval, you know, sort of pre Google information retrieval.
link |
And he was trained as an anthropologist, but then became a cognitive scientist,
link |
I was in the cognitive science research group. It's, you know, like I said,
link |
I'm a cognitive science groupie. At the time, I thought I'd become a cognitive scientist,
link |
but then I realized in that group, no, I'm a computer scientist, but I'm a computer scientist
link |
who really loves to hang out with cognitive scientists. And he said, he studied language
link |
acquisition in particular, he said, you know, humans have about this number of words of vocabulary,
link |
and most of that is learned from reading. And I said, that can't be true, because I have a really
link |
big vocabulary, and I don't read. He's like, you must. I'm like, I don't think I do. I mean,
link |
like stop signs, I definitely read stop signs. But like reading books is not, is not a thing
link |
that I do a lot. Really, though, it might be just, maybe the red color. Do I read stop signs?
link |
Yeah. No, it's just pattern recognition at this point. I don't sound it out.
link |
Stop. So now I do, I wonder what that, oh yeah, stop the guns. So,
link |
that's fascinating. So you don't, so I don't read very, I mean, obviously, I read and I've read,
link |
I've read plenty of books. But like some people like Charles, my friend Charles and others,
link |
like a lot of people in my field, a lot of academics, like reading was really a central
link |
topic to them in development. And I'm not that guy. In fact, I used to joke that when I got into
link |
college, that it was on kind of a help out the illiterate kind of program, because I got to
link |
college, like in my house, I wasn't a particularly bad or good reader. But when I got to college,
link |
I was surrounded by these people that were just voracious in their reading appetite.
link |
And they would like, have you read this? Have you read this? Have you read this? And I'd be like,
link |
no, I'm clearly not qualified to be at this school. Like there's no way I should be here.
link |
Now I've discovered books on tape, like audio books. And so I'm much better. I'm more caught up,
link |
I read a lot of books. A small tangent on that. It is a fascinating open question to me
link |
on the topic of driving, whether, you know, supervised learning people, machine learning
link |
people think you have to like drive to learn how to drive. To me, it's very possible that just by
link |
us humans, by first of all, walking, but also by watching other people drive, not even being inside
link |
cars as a passenger, but let's say being inside the cars as a passenger, but even just like being
link |
a pedestrian and crossing the road, you learn so much about driving from that. It's very possible
link |
that you can, without ever being inside of a car, be okay at driving once you get in it.
link |
Or like watching a movie, for example, I don't know, something like that.
link |
Have you, have you taught anyone to drive? No.
link |
So I have two children and I learned a lot about car driving because my wife doesn't want to be the
link |
one in the car while they're learning. So that's my job. So I sit in the passenger seat and it's
link |
really scary. I have, you know, I have wishes to live and they're, you know, they're figuring things
link |
out. Now, they start off very, very much better than I imagine like a neural network would, right?
link |
They get that they're seeing the world. They get that there's a road that they're trying to be on.
link |
They get that there's a relationship between the angle of the steering, but it takes a while to
link |
not be very jerky. And so that happens pretty quickly. Like the ability to stay in lane at
link |
speed, that happens relatively fast. It's not zero shot learning, but it's pretty fast.
link |
The thing that's remarkably hard, and this is I think partly why self driving cars are really hard,
link |
is the degree to which driving is a social interaction activity. And that blew me away.
link |
I was completely unaware of it until I watched my son learning to drive. And I was realizing
link |
that he was sending signals to all the cars around him. And those, in his case, he's,
link |
he's always had social communication challenges. He was sending very mixed confusing signals to
link |
the other cars. And that was causing the other cars to drive weirdly and erratically. And there
link |
was no question in my mind that he would, he would have an accident because they didn't know how to
link |
read him. There's things you do with the speed that you drive, the positioning of your car,
link |
that you're constantly like in the head of the other drivers. And seeing him not knowing how to
link |
do that and having to be taught explicitly, okay, you have to be thinking about what the other driver
link |
is thinking was a revelation to me. I was stunned. It's a creating kind of theories of mind of the
link |
other. The theories of mind of the other cars. Yeah. Yeah. Which I just hadn't heard discussed in
link |
the self driving car talks that I've been to. Since then, there's some people who do
link |
consider those kinds of issues, but it's way more subtle than I think there's a little bit of work
link |
involved with that. When you realize, like when you especially focus not on other cars, but on
link |
pedestrians, for example, it's literally staring you in the face. Yeah. So then when you're just
link |
like, how do I interact with pedestrians? You have pedestrians, you're practically talking to
link |
an octopus at that point. They've got all these weird degrees of freedom. You don't know what
link |
they're going to do. They can turn around any second. But the point is, we humans know what
link |
they're going to do. We have a good theory of mind. We have a good mental model of what they're
link |
doing. And we have a good model of the model they have of you and the model of the model of the
link |
model. We're able to kind of reason about this kind of the social game of it. The hope is that
link |
it's quite simple actually, that it could be learned. That's why I just talked to Waymo.
link |
I don't know if you know of that company. Google's self driving car. I talked to their CTO
link |
on this podcast and I rode in their car and it's quite aggressive and it's quite fast
link |
and it's good and it feels great. It also just like Tesla, Waymo made me change my mind about
link |
maybe driving is easier than I thought. Maybe I'm just being speciesist, human centric. Maybe...
link |
It's a speciesist argument. Yes, I don't know. But it's fascinating to think about like the same
link |
as with reading, which I think you just said. You avoided the question. I still hope you answered
link |
someone. You avoided it brilliantly. There's blind spots that
link |
artificial intelligence researchers have about what it actually takes to learn to solve a problem.
link |
That's fascinating. Have you had Anca Dragan on? Yes. She's one of my favorites. So much energy.
link |
She's amazing. Fantastic. And in particular, she thinks a lot about this kind of...
link |
I know that you know that I know kind of planning. And the last time I spoke with her,
link |
she was very articulate about the ways in which self driving cars are not solved,
link |
like what's still really, really hard. But even her intuition is limited. We're all new to this.
link |
So in some sense, the Elon Musk approach of being ultra confident and just like plowing...
link |
Put it out there. Put it out there. Like some people say it's reckless and dangerous and so on.
link |
But partly it seems to be one of the only ways to make progress in artificial intelligence.
link |
These are difficult things. Democracy is messy. Implementation of artificial
link |
intelligence systems in the real world is messy. So many years ago, before self driving cars were
link |
an actual thing you could have a discussion about, somebody asked me, what if we could
link |
use that robotic technology and use it to drive cars around? Aren't people going to be killed?
link |
That's not what's going to happen. I said with confidence incorrectly, obviously.
link |
What I think is going to happen is we're going to have a lot more like a very gradual kind of
link |
rollout where people have these cars in like closed communities, where it's somewhat realistic,
link |
but it's still in a box so that we can really get a sense of what are the weird things that
link |
can happen? How do we have to change the way we behave around these vehicles? It obviously requires
link |
a kind of co evolution that you can't just plop them in and see what happens.
link |
But of course, we're basically plopping them in and see what happens. So I was wrong,
link |
but I do think that would have been a better plan.
link |
But your intuition is funny, just zooming out and looking at the forces of capitalism.
link |
And it seems that capitalism rewards risk takers and rewards and punishes risk takers and
link |
try it out. The academic approach to let's try a small thing and try to understand slowly the
link |
fundamentals of the problem. And let's start with one and do two and then see that and then do the
link |
three. The capitalists like startup entrepreneurial dream is let's build a thousand and let's...
link |
Right. And 500 of them fail, but whatever the other 500 we learned from them.
link |
But if you're good enough, I mean, one thing is like your intuition would say like,
link |
that's going to be hugely destructive to everything. But actually it's kind of the forces
link |
of capitalism. People are quite... It's easy to be critical, but if you actually look at the data
link |
at the way our world has progressed in terms of the quality of life, it seems like the competent,
link |
good people rise to the top. This is coming from me from the Soviet Union and so on.
link |
It's interesting that somebody like Elon Musk is the way you push progress in artificial
link |
intelligence. Like it's forcing Waymo to step their stuff up and Waymo is forcing Elon Musk
link |
to step up. It's fascinating because I have this tension in my heart and just being upset by
link |
the lack of progress in autonomous vehicles within academia. So there's huge progress
link |
in the early days of the DARPA challenges. And then it just kind of stopped at MIT,
link |
but it's true everywhere else with an exception of a few sponsors here and there. It's not seen
link |
as a sexy problem. The moment artificial intelligence starts approaching the problems
link |
of the real world, like academics kind of like, all right, let the...
link |
Because they get really hard in a different way.
link |
In a different way. That's right.
link |
I think some of us are not excited about that other way.
link |
But I still think there's fundamental problems to be solved in those difficult things. It's
link |
still publishable, I think. We just need to... It's the same criticism you could have of all
link |
these conferences, in NeurIPS, in CVPR, where application papers are often as powerful and
link |
as important as a theory paper, even though theory just seems much more respectable and so on.
link |
I mean, machine learning community is changing that a little bit, at least in statements,
link |
but it's still not seen as the sexiest of pursuits, which is like, how do I actually
link |
make this thing work in practice as opposed to on this toy dataset?
link |
All that to say, are you still avoiding the three books question? Is there something on
link |
audiobook that you can recommend? Oh, yeah. I mean, yeah, I've read a lot of really fun stuff.
link |
In terms of books that I find myself thinking back on that I read a while ago,
link |
like that have stood the test of time to some degree, I find myself thinking of
link |
Program or Be Programmed a lot, by Douglas Rushkoff, which was... It basically put out the premise
link |
that we all need to become programmers in one form or another. It was in analogy to,
link |
once upon a time, we all had to become readers. We had to become literate. There was a time before
link |
that when not everybody was literate, but once literacy was possible, the people who were literate
link |
had more of a say in society than the people who weren't. We made a big effort to get everybody
link |
up to speed and now it's not 100% universal, but it's quite widespread. The assumption
link |
is generally that people can read. The analogy that he makes is that programming is a similar
link |
kind of thing, that we need to have a say in... Right. Being a reader, being literate, being a
link |
reader means you can receive all this information, but you don't get to put it out there. Programming
link |
is the way that we get to put it out there. That was the argument he made. I think he
link |
specifically has now backed away from this idea. He doesn't think it's happening quite this way.
link |
That might be true that society didn't play forward quite that way.
link |
I still believe in the premise. I still believe that at some point,
link |
we have... The relationship that we have to these machines and these networks has to be one of each
link |
individual has the wherewithal to make the machines help them do the things that that
link |
person wants done. As software people, we know how to do that. When we have a problem, we're like,
link |
okay, I'll hack up a Perl script or something and make it so. If we lived in a world where
link |
everybody could do that, that would be a better world. Computers would have, I think, less sway
link |
over us. Other people's software would have less sway over us as a group.
link |
Yeah. In some sense, software engineering, programming, is power.
link |
Programming is power. Right. Yeah. It's like magic. It's like magic spells. It's not out of reach
link |
of everyone, but at the moment, it's just a sliver of the population who can
link |
commune with machines in this way. I don't know. That book had a big impact on me.
link |
Currently, I'm reading The Alignment Problem actually by Brian Christian. I don't know if
link |
you've seen this out there yet. Is it similar to Stuart Russell's work with the control problem?
link |
It's in that same general neighborhood. They have different
link |
emphases that they're concentrating on. I think Stuart's book did a remarkably good job.
link |
Like just a celebratory good job at describing AI technology and how it works. I thought that
link |
was great. It was really cool to see that in a book. I think he has some experience writing
link |
some books. That's probably a possible thing. He's maybe thought a thing or two about how to explain
link |
AI to people. Yeah. Yeah. That's a really good point. This book so far has been remarkably good
link |
at telling the story of the recent history of some of the things that have happened.
link |
I'm in the first third. He said this book is in three thirds. The first third is essentially AI
link |
fairness and implications of AI on society that we're seeing right now. That's been great. He's
link |
telling those stories really well. He went out and talked to the frontline people whose names
link |
were associated with some of these ideas. It's been terrific. He says the second half of the book
link |
is on reinforcement learning. Maybe that'll be fun. Then the third half, third third,
link |
is on the superintelligence alignment problem. I suspect that that part will be less fun for
link |
me to read. Yeah. It's an interesting problem to talk about. I find it to be the most interesting
link |
just like thinking about whether we live in a simulation or not as a thought experiment to
link |
think about our own existence. In the same way, talking about alignment problem with AGI is a
link |
good way to think, similar to the trolley problem with autonomous vehicles. It's a useless thing
link |
for engineering, but it's a nice little thought experiment for actually thinking about our own
link |
human ethical systems, our moral systems, by thinking how we engineer these things,
link |
you start to understand yourself. Sci-fi can be good at that too. One sci-fi book to recommend
link |
is Exhalation by Ted Chiang, a bunch of short stories. Ted Chiang is the guy who wrote the
link |
short story that became the movie Arrival. All of his stories, just from a, he was a computer
link |
scientist. Actually, he studied at Brown. They all have this really insightful bit of science
link |
or computer science that drives them. It's just a romp. He creates these artificial worlds by
link |
extrapolating on these ideas that we know about, but hadn't really thought through to this kind of
link |
conclusion. His stuff is, it's really fun to read. It's mind warping. I'm not sure if you're
link |
familiar. I seem to mention this every other word. I'm from the Soviet Union and I'm Russian.
link |
I think my roots are Russian too, but a couple of generations back.
link |
Well, it's probably in there somewhere. Maybe we can pull at that thread a little bit
link |
of the existential dread that we all feel. I think somewhere in the conversation,
link |
you mentioned that you don't really pretty much like dying. I forget in which context.
link |
It might have been a reinforcement learning perspective. I don't know.
link |
I know what it was. It was in teaching my kids to drive. That's how you face your mortality.
link |
Yes. From a human being's perspective or from a reinforcement learning researcher's perspective,
link |
let me ask you the most absurd question. What do you think is the meaning of this whole thing,
link |
the meaning of life on this spinning rock? I mean, I think reinforcement learning researchers
link |
maybe think about this from a science perspective more often than a lot of other people. As a
link |
supervised learning person, you're probably not thinking about the sweep of a lifetime,
link |
but reinforcement learning agents are having little lifetimes, little weird little lifetimes,
link |
and it's hard not to project yourself into their world sometimes.
link |
As far as the meaning of life, when I turned 42, you may know from, that is a book I read,
link |
The Hitchhiker's Guide to the Galaxy, that that is the meaning of life. When I turned 42,
link |
I had a meaning of life party where I invited people over and everyone shared their meaning
link |
of life. We had slides made up and so we all sat down and did a slide presentation to each other
link |
about the meaning of life. Mine was balance. I think that life is balance.
link |
And so the activity at the party, for a 42 year old, maybe this is a little bit nonstandard,
link |
but I found all the little toys and devices that I had that where you had to balance on them.
link |
You had to stand on it and balance, or a pogo stick I brought, a RipStik, which is like a weird
link |
two wheeled skateboard. I got a unicycle, but I didn't know how to do it. I now can do it.
link |
I love watching you try. Yeah, I'll send you a video. I'm not great, but I managed.
link |
And so balance, yeah. So my wife has a really good one that she sticks to and is probably
link |
pretty accurate and it has to do with healthy relationships with people that you love and
link |
working hard for good causes. But to me, yeah, balance, balance in a word. That works for me.
link |
Not too much of anything because too much of anything is iffy.
link |
That feels like a Rolling Stones song. I feel like there must be.
link |
You can't always get what you want, but if you try sometimes, you can strike a balance.
link |
Yeah, I think that's how it goes. I'll write you a parody.
link |
It's a huge honor to talk to you. This is really fun. I've been a big fan of yours, so
link |
I can't wait to see what you do next in the world of education, the world of parody,
link |
in the world of reinforcement learning. Thanks for talking to me. My pleasure.
link |
Thank you for listening to this conversation with Michael Littman and thank you to our sponsors.
link |
SimpliSafe, a home security company I use to monitor and protect my apartment,
link |
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet,
link |
Masterclass, online courses that I enjoy from some of the most amazing humans in history,
link |
and BetterHelp, online therapy with a licensed professional.
link |
Please check out these sponsors in the description to get a discount and to support this podcast.
link |
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts,
link |
follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
link |
And now, let me leave you with some words from Groucho Marx. If you're not having fun,
link |
you're doing something wrong. Thank you for listening and hope to see you next time.