Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144


link |
00:00:00.000
The following is a conversation with Michael Littman, a computer science professor at Brown University
link |
00:00:05.440
doing research on and teaching machine learning, reinforcement learning, and artificial intelligence.
link |
00:00:12.240
He enjoys being silly and lighthearted in conversation, so this was definitely a fun one.
link |
00:00:18.960
Quick mention of each sponsor, followed by some thoughts related to the episode.
link |
00:00:23.440
Thank you to SimpliSafe, a home security company I use to monitor and protect my apartment.
link |
00:00:29.280
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet.
link |
00:00:34.800
MasterClass, online courses that I enjoy from some of the most amazing humans in history,
link |
00:00:40.640
and BetterHelp, online therapy with a licensed professional.
link |
00:00:45.200
Please check out these sponsors in the description to get a discount and to support this podcast.
link |
00:00:50.560
As a side note, let me say that I may experiment with doing some solo episodes in the coming month
link |
00:00:56.480
or two. The three ideas I have floating in my head currently are to use one, a particular moment
link |
00:01:03.680
in history, two, a particular movie, or three, a book to drive a conversation about a set of
link |
00:01:11.200
related concepts. For example, I could use 2001: A Space Odyssey or Ex Machina to talk about AGI
link |
00:01:18.400
for one, two, three hours, or I could do an episode on the rise and fall of Hitler and Stalin
link |
00:01:28.160
each in a separate episode, using relevant books and historical moments for reference.
link |
00:01:34.000
I find the format of a solo episode very uncomfortable and challenging, but that just
link |
00:01:39.520
tells me that it's something I definitely need to do and learn from the experience.
link |
00:01:44.080
Of course, I hope you come along for the ride. Also, since we have all this momentum of built up
link |
00:01:50.080
announcements, I'm giving a few lectures on machine learning at MIT this January. In general,
link |
00:01:55.600
if you have ideas for the episodes, for the lectures, or for just short videos on YouTube,
link |
00:02:03.440
let me know in the comments that I still definitely read despite my better judgment
link |
00:02:11.120
and the wise sage advice of the great Joe Rogan. If you enjoy this thing, subscribe on YouTube,
link |
00:02:18.240
review it with five stars on Apple Podcast, follow on Spotify,
link |
00:02:21.680
support on Patreon, or connect with me on Twitter at Lex Fridman.
link |
00:02:26.480
And now here's my conversation with Michael Littman. I saw a video of you talking to Charles
link |
00:02:33.440
Isbell about Westworld, the TV series. You guys were doing a kind of thing where you're
link |
00:02:37.760
watching new things together, but let's rewind back. Is there a sci fi movie or book
link |
00:02:46.000
or show that was profound, that had an impact on you philosophically, or just specifically
link |
00:02:53.040
something you enjoyed nerding out about? Yeah, interesting. I think a lot of us have been inspired
link |
00:02:58.560
by robots in movies. The one that I really like is there's a movie called Robot and Frank,
link |
00:03:05.200
which I think is really interesting because it's very near term future where robots are being deployed
link |
00:03:11.600
as helpers in people's homes. And we don't know how to make robots like that at this point,
link |
00:03:18.160
but it seemed very plausible. It seemed very realistic or imaginable. I thought that was
link |
00:03:24.000
really cool because they're awkward. They do funny things that raise some interesting issues,
link |
00:03:28.480
but it seemed like something that would ultimately be helpful and good if we could do it right.
link |
00:03:32.800
Yeah, he was an older cranky gentleman. He was an older cranky jewel thief. Yeah.
link |
00:03:38.000
It's kind of a funny little thing, which is he's a jewel thief and so he pulls the robot
link |
00:03:43.680
into his life, which is something you could imagine taking a home robotics thing and pulling
link |
00:03:51.840
into whatever quirky thing that's involved in your existence. It's meaningful to you,
link |
00:03:57.200
exactly so. Yeah, and I think from that perspective, I mean, not all of us are jewel thieves. And so
link |
00:04:01.360
when we bring our robots into our lives, explains a lot about this apartment, actually. But no,
link |
00:04:08.160
the idea that people should have the ability to make this technology their own, that it becomes
link |
00:04:14.800
part of their lives. And I think that's, it's hard for us as technologists to make that kind
link |
00:04:19.840
of technology. It's easier to mold people into what we need them to be. And just that opposite
link |
00:04:25.200
vision I think is really inspiring. And then there's an anthropomorphization where we project
link |
00:04:30.400
certain things on them because I think the robot was kind of dumb. But I have a bunch of Roombas
link |
00:04:34.720
that I play with, and you immediately project stuff onto them, a much greater level of intelligence.
link |
00:04:40.320
We'll probably do that with each other too. Much, much, much greater degree of compassion.
link |
00:04:44.720
That's right. One of the things we're learning from AI is where we are smart and where we are
link |
00:04:48.640
not smart. Yeah. You also enjoy, as people can see, and I enjoyed myself watching you sing
link |
00:04:58.480
and even dance a little bit, a little bit of dancing. A little bit of dancing. That's not
link |
00:05:03.440
quite my thing. As a method of education, or just in life, in general. So easy question.
link |
00:05:11.840
What's the definitive, objectively speaking, top three songs of all time? Maybe something that,
link |
00:05:20.720
to walk that back a little bit, maybe something that others might be surprised by. Three songs
link |
00:05:26.960
that you kind of enjoy. That is a great question that I cannot answer. But instead, let me tell
link |
00:05:31.840
you a story. Pick a question you do want to answer. That's right. I've been watching the
link |
00:05:36.480
presidential debates and vice presidential debates and it turns out, yeah, it's really,
link |
00:05:39.440
you can just answer any question you want. So it's a related question. Yeah, well said.
link |
00:05:47.280
I really like pop music. I've enjoyed pop music ever since I was very young. So 60s music, 70s
link |
00:05:52.160
music, 80s music, this is all awesome. And then I had kids and I think I stopped listening to music
link |
00:05:57.040
and I was starting to realize that the, like my musical taste had sort of frozen out. And so I
link |
00:06:02.000
decided in 2011, I think to start listening to the top 10 Billboard songs each week. So I'd be on
link |
00:06:09.120
the treadmill and I would listen to that week's top 10 songs so I could find out what was popular
link |
00:06:13.680
now. And what I discovered is that I have no musical taste whatsoever. I like what I'm familiar
link |
00:06:20.000
with. And so the first time I'd hear a song is the first week that was on the charts. I'd be like,
link |
00:06:25.680
and then the second week, I was into it a little bit and the third week, I was loving it. And by
link |
00:06:29.840
the fourth week it's like just part of me. And so I'm afraid that I can't tell you the most, my
link |
00:06:35.680
favorite song of all time, because it's whatever I heard most recently. Yeah, that's interesting.
link |
00:06:39.840
People have told me that there's an art to listening to music as well. And you can start to,
link |
00:06:47.680
if you listen to a song, just carefully, explicitly just force yourself to really listen. You start
link |
00:06:53.360
to, I did this when I was part of Jazz Band and Fusion Band in college. You start to hear the layers
link |
00:07:01.200
of the instruments. You start to hear the individual instruments and you start to,
link |
00:07:05.680
you can listen to classical music or to orchestra this way, you can listen to jazz this way.
link |
00:07:10.800
It's funny to imagine you now, to walk that forward, listening to pop hits now as like a
link |
00:07:17.440
scholar listening to like Cardi B or something like that or Justin Timberlake. No, not Timberlake,
link |
00:07:23.920
Bieber. They've both been in the top 10 since I've been listening. They're still up there. Oh,
link |
00:07:29.440
my God, I'm so cool. If you haven't heard Justin Timberlake's top 10 in the last few years,
link |
00:07:34.080
there was one song that he did where the music video was set at essentially NeurIPS. Oh, wow.
link |
00:07:40.720
Oh, the one with the robotics. Yeah, yeah, yeah, yeah. Yeah, yeah. It's like at an academic conference
link |
00:07:45.680
and he's doing a demo and it was sort of a cross between the Apple, like Steve Jobs kind of talk
link |
00:07:52.640
and NeurIPS. So, you know, it's always fun when AI shows up in pop culture. I wonder if he consulted
link |
00:07:59.120
somebody for that. That's really interesting. So maybe on that topic, I've seen your celebrity
link |
00:08:05.680
multiple dimensions, but one of them is you've done cameos in different places. I've seen you
link |
00:08:11.440
in a TurboTax commercial as like I guess the brilliant Einstein character. And the point
link |
00:08:19.360
is that TurboTax doesn't need somebody like you. It doesn't need a brilliant person.
link |
00:08:25.520
Very few things need someone like me. But yes, they were specifically emphasizing the idea that
link |
00:08:29.840
you don't need to be like a computer expert to be able to use their software. How did you end up
link |
00:08:34.320
in that world? I think it's an interesting story. So I was teaching my class. It was an
link |
00:08:39.120
intro computer science class for non-concentrators, non-majors. And sometimes when people would visit
link |
00:08:46.400
campus, they would check in to say, hey, we want to see what a class is like. Can we sit in on your
link |
00:08:50.080
class? So a person came to my class who was the daughter of the brother of the husband of the best
link |
00:09:04.320
friend of my wife. Anyway, basically a family friend came to campus to check out Brown and
link |
00:09:12.720
asked to come to my class and came with her dad. Her dad is who I've known from various kinds of
link |
00:09:19.120
family events and so forth, but he also does advertising. And he said that he was recruiting
link |
00:09:24.880
scientists for this ad, this TurboTax set of ads. And he said, we wrote the ad with the idea that
link |
00:09:33.440
we get the most brilliant researchers, but they all said no. So can you help us find the B level
link |
00:09:42.720
scientists? And I'm like, sure, that's who I hang out with. So that should be fine. So I put
link |
00:09:48.800
together a list and I did what some people called a Dick Cheney. So I included myself on the list
link |
00:09:53.440
of possible candidates with a little blurb about each one and why I thought that would make sense
link |
00:09:58.320
for them to do it. And they reached out to a handful of them, but then they ultimately,
link |
00:10:02.240
they YouTube stalked me a little bit and they thought, oh, I think he could do this. And they
link |
00:10:07.280
said, okay, we're going to offer you the commercial. I'm like, what? So it was such an interesting
link |
00:10:13.440
experience because they have another world, the people who do nationwide kind of ad campaigns
link |
00:10:21.200
and television shows and movies and so forth. It's quite a remarkable system that they have going
link |
00:10:27.520
because it's like a set. Yeah. So I went to, it was just somebody's house that they rented in
link |
00:10:32.640
New Jersey, but in the commercial, it's just me and this other woman. In reality, there were 50
link |
00:10:41.280
people in that room and another, I don't know, half a dozen kind of spread out around the house
link |
00:10:46.080
in various ways. There were people whose job it was to control the sun. They were in the backyard
link |
00:10:51.360
on ladders putting filters up to try to make sure that the sun didn't glare off the window in a way
link |
00:10:57.120
that would wreck the shot. So there was like six people out there doing that. There was three people
link |
00:11:00.960
out there giving snacks, the craft table. There was another three people giving healthy snacks
link |
00:11:06.240
because that was a separate craft table. There was one person whose job it was to keep me from
link |
00:11:10.560
getting lost. And I think the reason for all this is because so many people are in one place at one
link |
00:11:16.800
time, they have to be time efficient. They have to get it done. The morning they were going to do
link |
00:11:21.360
my commercial and the afternoon they were going to do a commercial of a mathematics professor from
link |
00:11:25.040
Princeton. They had to get it done. No wasted time or energy. And so there's just a fleet of people
link |
00:11:31.920
all working as an organism and it was fascinating. I was just the whole time just looking around like
link |
00:11:36.800
this is so neat. Like one person whose job it was to take the camera off of the cameraman
link |
00:11:43.680
so that someone else whose job it was to remove the film canister, because every couple of takes,
link |
00:11:48.240
they had to replace the film because film gets used up. It was just, I don't know, I was geeking
link |
00:11:54.320
out the whole time. It was so fun. How many takes did it take? It looked the opposite like there
link |
00:11:58.160
weren't more than two people there. It was very relaxed. Right. Yeah. The person who I was in the scene
link |
00:12:03.680
with is a professional. She's an improv comedian from New York City. And when I got there, they
link |
00:12:10.880
had given me a script such as it was. And then I got there and they said, we're going to do this
link |
00:12:15.440
as improv. I'm like, I don't know how to improv. I don't know what you're telling me to do here.
link |
00:12:21.280
Yeah. Don't worry. She knows. I'm like, okay, we'll see how this goes. I guess I got pulled
link |
00:12:27.360
into the story because like, where the heck did you come from? I guess in the scene. Like,
link |
00:12:32.000
how did you show up in this random person's house? I don't know. Yeah. Well, I mean,
link |
00:12:36.160
the reality of it is I stood outside in the blazing sun. There was someone whose job it was
link |
00:12:40.080
to keep an umbrella over me because I started to sweat. I started to sweat. And so I would
link |
00:12:44.240
wreck the shot because my face was all shiny with sweat. So there was one person who would dab me
link |
00:12:48.080
off and hold an umbrella. But yeah, like the reality of it, like why is this strange stalkery person
link |
00:12:54.240
hanging around outside somebody's house? We're not sure. We have to look in. We have to wait
link |
00:12:58.400
for the book. But are you... So you make, like you said, YouTube, you make videos yourself.
link |
00:13:04.960
You make awesome parody, sort of parody songs that kind of focus on a particular aspect of
link |
00:13:11.680
computer science. How much... Those seem really natural. How much production value goes into
link |
00:13:17.840
that? Do you also have a team of 50 people? The videos, almost all the videos, except for the
link |
00:13:22.960
ones that people would have actually seen, were just me. I write the lyrics. I sing the song. I
link |
00:13:28.240
generally find a backing track online because, like, I can't really play an instrument.
link |
00:13:36.160
And then I do, in some cases, I'll do visuals using just like PowerPoint. Lots and lots of
link |
00:13:41.840
PowerPoint to make it sort of like an animation. The most produced one is the one that people
link |
00:13:46.640
might have seen, which is the overfitting video that I did with Charles Isbell. And that was
link |
00:13:51.920
produced by the Georgia Tech and Udacity people because we were doing a class together. It was
link |
00:13:57.200
kind of... I usually do parody songs kind of to cap off a class at the end of a class.
link |
00:14:01.760
So that one you're wearing, so it was Thriller, you're wearing the Michael Jackson,
link |
00:14:06.800
the red leather jacket. The interesting thing with podcasting that you're also into is that
link |
00:14:14.480
I really enjoy is that there's not a team of people. It's kind of more... Because you know,
link |
00:14:25.440
there's something that happens when there's more people involved than just one person. That's just
link |
00:14:31.840
the way you start acting. I don't know. There's a censorship. You're not given, especially for
link |
00:14:38.640
like slow thinkers like me, you're not... And I think most of us are if we're trying to actually
link |
00:14:43.760
think, we're a little bit slow and careful, and large teams kind of get in the way of that.
link |
00:14:52.560
And I don't know what to do with that. To me, it's very popular to criticize
link |
00:14:58.640
quote unquote mainstream media, but there is legitimacy to criticizing them just the same. I
link |
00:15:04.800
love listening to NPR for example, but it's clear that there's a team behind it.
link |
00:15:10.640
There's constant commercial breaks. There's this kind of like rush of like,
link |
00:15:16.080
okay, I have to interrupt you now because we have to go to commercial. Just this whole,
link |
00:15:20.560
it creates, it destroys the possibility of nuanced conversation. Yeah, exactly.
link |
00:15:28.160
Evian, which Charles Isbell, who I talked to yesterday, told me that Evian is naive backwards,
link |
00:15:36.080
which the fact that his mind thinks this way is just quite brilliant. Anyway, there's a freedom
link |
00:15:41.920
to this podcast. He's Dr. Awkward, which by the way, is a palindrome. That's a palindrome that
link |
00:15:46.560
I happen to know from other parts of my life. And I just figured out, well, you know, use it against
link |
00:15:52.400
Charles. Dr. Awkward. So what was the most challenging parody song to make? Was it the
link |
00:15:59.520
Thriller one? No, that one was really fun. I wrote the lyrics really quickly. And then I gave it
link |
00:16:05.360
over to the production team. They recruited an acapella group to sing. That went really smoothly.
link |
00:16:11.600
It's great having a team because then you can just focus on the part that you really love,
link |
00:16:14.880
which in my case is writing the lyrics. For me, the most challenging one, not challenging in a bad
link |
00:16:20.400
way, but challenging in a really fun way, was I did, one of the parody songs I did is about
link |
00:16:26.560
the halting problem in computer science. The fact that you can't create a program that can tell
link |
00:16:32.240
for any other arbitrary program, whether it's actually going to get stuck in an infinite loop
link |
00:16:36.960
or whether it's going to eventually stop. And so I did it to an 80s song because that's, I hadn't
link |
00:16:44.400
started my new thing of learning current songs. And it was Billy Joel's The Piano Man, which is
link |
00:16:51.600
a great song. Sing me a song, you're the piano man. So the lyrics are great because first of all,
link |
00:17:03.040
it rhymes. Not all songs rhyme. I've done Rolling Stones songs, which turn out to have no rhyme scheme
link |
00:17:09.200
whatsoever. They're just sort of yelling and having a good time, which makes it not fun from a
link |
00:17:13.680
parody perspective because like you can say anything, but the lines rhyme and there were a lot
link |
00:17:17.280
of internal rhymes as well. And so figuring out how to sing with internal rhymes, a proof of the
link |
00:17:23.200
halting problem was really challenging. And it was, I really enjoyed that process.
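For readers who want the argument behind that parody, here is a minimal sketch of the standard halting-problem contradiction, written in Python purely for illustration. The halts function is hypothetical; the whole point of the proof is that no such general function can exist.

```python
# A minimal sketch of the classic halting-problem argument referenced above.
# Suppose, hypothetically, someone claims to have written halts(program, data),
# returning True if program(data) eventually stops and False if it loops forever.

def paradox(program):
    if halts(program, program):   # predicted to halt on its own source...
        while True:               # ...so loop forever instead
            pass
    else:                         # predicted to loop forever on its own source...
        return                    # ...so halt immediately

# Asking whether paradox(paradox) halts contradicts whatever halts predicts,
# so no general-purpose halts can exist for arbitrary programs.
```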
link |
00:17:28.080
What about the last question on this topic? What about the dancing in the Thriller video?
link |
00:17:32.720
How many takes did that take? So I wasn't planning to dance. They had me in the studio and they gave
link |
00:17:38.800
me the jacket and it's like, well, you can't, if you have the jacket and the glove, like there's
link |
00:17:42.240
not much you can do. So I think I just danced around and then they said, why don't you dance a
link |
00:17:48.000
little bit? There was a scene with me and Charles dancing together. They did not use it in the
link |
00:17:52.720
video, but we recorded it. I don't remember. Yeah, yeah. No, it was pretty funny. And Charles,
link |
00:17:59.040
who has this beautiful, wonderful voice, doesn't really sing. He's not really a singer. And so
link |
00:18:04.320
that was why I designed the song with him doing a spoken section and me doing the singing. It's
link |
00:18:08.320
very like Barry White. Yeah, smooth baritone. Yeah. Yeah, it's great. That was awesome.
link |
00:18:14.320
So one of the other things Charles said is that, you know, everyone knows you as like a super nice
link |
00:18:20.960
guy, super passionate about teaching and so on. What he said, I don't know if it's true, that
link |
00:18:28.800
despite the fact that you are cold blooded, like, okay, I will admit this finally for the first
link |
00:18:34.880
time, that was me. It's the Johnny Cash song. I shot a man in Reno just to watch him die.
link |
00:18:41.760
That you actually do have some strong opinions on some topics. So if this in fact is true, what
link |
00:18:49.120
strong opinions would you say you have? Are there ideas you think, maybe in artificial
link |
00:18:55.120
intelligence, machine learning, maybe in life that you believe is true, that others might,
link |
00:19:01.200
you know, some number of people might disagree with you on. So I try very hard to see things from
link |
00:19:08.640
multiple perspectives. There's this great Calvin and Hobbes cartoon where Calvin,
link |
00:19:15.360
do you know, okay, so Calvin's dad is always kind of a bit of a foil, and Calvin
link |
00:19:21.760
had done something wrong. The dad talks him into like seeing it from another perspective
link |
00:19:25.440
and Calvin like this breaks Calvin because he's like, oh my gosh, now I can see the opposite
link |
00:19:30.560
sides of things. And so it becomes like a Cubist cartoon where there is no front and back. Everything
link |
00:19:36.320
is just exposed and it really freaks him out. And finally he settles back down. It's like,
link |
00:19:39.840
oh good, no, I can make that go away. But like I'm that, I live in that world where I'm trying to
link |
00:19:44.960
see everything from every perspective all the time. So there are some things that I've formed
link |
00:19:48.720
opinions about that it would be harder, I think, to disabuse me of. One is the super intelligence
link |
00:19:56.800
argument and the existential threat of AI is one where I feel pretty confident in my feeling about
link |
00:20:03.360
that one. Like I'm willing to hear other arguments, but like I am not particularly moved by the idea
link |
00:20:09.120
that if we're not careful, we will accidentally create a super intelligence that will destroy
link |
00:20:14.640
human life. Let's talk about that. Let's get you in trouble and record your video. It's like Bill
link |
00:20:19.680
Gates, I think he said like some quote about the internet that that's just going to be a small
link |
00:20:25.600
thing. It's not going to really go anywhere. And then I think Steve Ballmer said, I don't know why
link |
00:20:31.200
I'm sticking on Microsoft, something like smartphones are useless. There's no reason
link |
00:20:37.920
why Microsoft should get into smartphones, that kind of thing. So let's get you, let's talk about
link |
00:20:41.920
AGI. As AGI is destroying the world, we'll look back at this video and see. No, I think it's
link |
00:20:47.280
really interesting to actually talk about because nobody really knows the future. So you have to
link |
00:20:51.040
use your best intuition. It's very difficult to predict it, but you have spoken about AGI
link |
00:20:57.280
and the existential risks around it and sort of based on your intuition that
link |
00:21:03.280
we're quite far away from that being a serious concern relative to the other concerns we have.
link |
00:21:09.600
Can you maybe unpack that a little bit? Yeah, sure, sure, sure. So as I understand it, that
link |
00:21:16.080
for example, I read Bostrom's book and a bunch of other reading material about this sort of general
link |
00:21:22.880
way of thinking about the world. And I think the story goes something like this, that we
link |
00:21:27.520
will at some point create computers that are smart enough that they can help design the next
link |
00:21:35.520
version of themselves, which itself will be smarter than the previous version of themselves,
link |
00:21:41.440
and eventually bootstrapped up to being smarter than us, at which point we are essentially at
link |
00:21:48.000
the mercy of this sort of more powerful intellect, which in principle, we don't have any control over
link |
00:21:55.680
what its goals are. And so if its goals are at all out of sync with our goals, for example,
link |
00:22:03.440
the continued existence of humanity, we won't be able to stop it. It'll be way more powerful than
link |
00:22:09.920
us and we will be toast. So there's some, I don't know, very smart people who have signed on to
link |
00:22:16.800
that story. And it's a, it's a compelling story. I want to, now I can really get myself in trouble.
link |
00:22:23.040
I once wrote an op ed about this, specifically responding to some quotes from Elon Musk,
link |
00:22:28.160
who has been, you know, on this very podcast, more than once, and
link |
00:22:34.160
AI summoning the demon, I think he said. But then he came to Providence, Rhode Island,
link |
00:22:38.480
which is where I live, and said to the governors of all the states, you know,
link |
00:22:44.720
you're worried about entirely the wrong thing. You need to be worried about AI. You need to be
link |
00:22:47.920
very, very worried about AI. So, and journalists kind of reacted to that. They wanted to get
link |
00:22:54.160
people's take. And I was like, okay, my belief is that one of the things that
link |
00:23:00.320
makes Elon Musk so successful and so remarkable as an individual is that he believes in the power
link |
00:23:06.480
of ideas. He believes that you can have, you can, if you know, if you have a really good idea for
link |
00:23:11.120
getting into space, you can get into space. If you have a really good idea for a company or for
link |
00:23:15.280
how to change the way that people drive, you just have to do it and it can happen.
link |
00:23:21.360
It's really natural to apply that same idea to AI. You see the systems that are doing some pretty
link |
00:23:26.000
remarkable computational tricks, demonstrations, and then to take that idea and just push it
link |
00:23:33.280
all the way to the limit and think, okay, where does this go? Where is this going to take us next?
link |
00:23:37.760
And if you're a deep believer in the power of ideas, then it's really natural to believe that
link |
00:23:42.800
those ideas could be taken to the extreme and, and kill us. So I think, you know, his strength is
link |
00:23:49.280
also his undoing because that doesn't mean it's true. Like it doesn't mean that that has to happen,
link |
00:23:54.640
but it's natural for him to think that. So another way to phrase the way he thinks, and
link |
00:23:59.760
I, I find it very difficult to argue with that line of thinking. So Sam Harris is another person
link |
00:24:07.360
from the neuroscience perspective that thinks like that is saying, well, is there something
link |
00:24:13.600
fundamental in the physics of the universe that prevents this from eventually happening?
link |
00:24:20.080
And that's, Nick Bostrom thinks in the same way that kind of zooming out, yeah, okay, we humans
link |
00:24:25.440
now are existing in this like timescale of minutes and days. And so our intuition is in this timescale
link |
00:24:33.520
of minutes, hours and days. But if you look at the span of human history, is there any reason
link |
00:24:41.200
we can't see this in 100 years? And like, is there, is there something fundamental about
link |
00:24:47.680
the laws of physics that prevents this? And if there isn't, then it eventually will happen,
link |
00:24:52.320
or we will destroy ourselves in some other way. And it's very difficult, I find to actually argue
link |
00:24:58.320
against that. Yeah. Me too. And not sound like, not sound like you're just like rolling your
link |
00:25:09.040
eyes. Like, ah, this is like science fiction, we don't have to think about it. But even, even worse than
link |
00:25:14.400
that, which is like, I don't know, kids, but like, I got to pick up my kids now, like this, okay,
link |
00:25:18.800
I see more pressing short term. Yeah, there's more pressing short term things that like,
link |
00:25:23.280
stop it with this existential crisis will have much, much shorter things like now,
link |
00:25:27.040
especially this year, there's COVID. So like any kind of discussion like that is,
link |
00:25:31.280
like there's, you know, there's pressing things today. And then so the Sam Harris argument,
link |
00:25:38.240
well, like any day, the exponential singularity can occur is very difficult to argue against.
link |
00:25:45.840
I mean, I don't know. But part of his story is also, he's not going to put a date on it.
link |
00:25:50.400
It could be in a thousand years, it could be in a hundred years, it could be in two years.
link |
00:25:53.520
It's just that as long as we keep making this kind of progress, it's ultimately
link |
00:25:57.600
has to become a concern. I kind of am on board with that. But the thing that the piece that I
link |
00:26:02.880
feel like is missing from that, that way of extrapolating from the moment that we're in,
link |
00:26:08.240
is that I believe that in the process of actually developing technology that can
link |
00:26:12.400
really get around in the world and really process and do things in the world in a sophisticated
link |
00:26:17.360
way, we're going to learn a lot about what that means, which that we don't know now,
link |
00:26:22.080
because we don't know how to do this right now. If you believe that you can just turn on a deep
link |
00:26:26.160
learning network and give it enough compute and it'll eventually get there.
link |
00:26:29.760
Well, sure, that seems really scary, because we won't be in the loop at all. We won't be
link |
00:26:34.240
helping to design or target these kinds of systems. But I don't see that feels like it
link |
00:26:42.720
is against the laws of physics, because these systems need help. They need to surpass the
link |
00:26:49.120
difficulty, the wall of complexity that happens in arranging something in the form that will
link |
00:26:54.240
happen. I believe in evolution. I believe that there's an argument. There's another argument,
link |
00:27:01.760
just to look at it from a different perspective, that people who don't believe in evolution say,
link |
00:27:05.440
How could evolution, it's sort of like a random set of parts assemble themselves into a 747,
link |
00:27:12.960
and that could just never happen. So it's like, okay, that's maybe hard to argue against. But
link |
00:27:17.520
clearly, 747s do get assembled. They get assembled by us. Basically, the idea being that
link |
00:27:23.680
there's a process by which we will get to the point of making technology that has that kind of
link |
00:27:28.720
awareness. And in that process, we're going to learn a lot about that process, and we'll have
link |
00:27:33.680
more ability to control it or to shape it or to build it in our own image. It's not something
link |
00:27:39.680
that is going to spring into existence like that 747. And we're just going to have to contend with
link |
00:27:44.720
it completely unprepared. Now, it's very possible that in the context of the long arc of human history,
link |
00:27:52.640
it will in fact spring into existence. But that that springing might take like if you look at
link |
00:27:58.720
nuclear weapons, like even 20 years is a springing in the context of human history. And it's very
link |
00:28:06.320
possible just like with nuclear weapons that we could have, I don't know what percentage you want
link |
00:28:10.800
to put at it, but the possibility that we could have knocked ourselves out. Yeah, the possibility of
link |
00:28:15.520
human beings destroying themselves in the 20th century with the nuclear weapons, I don't know,
link |
00:28:21.200
you can, if you really think through it, you could really put it close to like, I don't know,
link |
00:28:25.360
30, 40%, given like the certain moments of crisis that happened. So like, I think
link |
00:28:33.120
one, like fear in the shadows that's not being acknowledged is it's not so much the AI will
link |
00:28:40.480
run away, it's that as it's running away, we won't have enough time to think through how to stop it.
link |
00:28:49.600
Right. Fast takeoff or foom. Yeah, I mean, my much bigger concern, I wonder what you think about
link |
00:28:55.760
it, which is we won't know it's happening. So I kind of think that there's an AGI situation
link |
00:29:06.480
already happening with social media, that our minds, our collective intelligence of human
link |
00:29:12.400
civilization is already being controlled by an algorithm. And like, we're already super,
link |
00:29:19.120
like the level of a collective intelligence, thanks to Wikipedia, people should donate to
link |
00:29:23.600
Wikipedia to feed the AGI. Man, if we had a super intelligence that that was in line with
link |
00:29:29.200
Wikipedia's values, that it's a lot better than a lot of other things I could imagine. I trust
link |
00:29:35.200
Wikipedia more than I trust Facebook or YouTube as far as trying to do the right thing from a
link |
00:29:40.480
rational perspective. Now, that's not where you were going. I understand that. But it does strike
link |
00:29:44.800
me that there's sort of smarter and less smart ways of exposing ourselves to each other on the
link |
00:29:50.640
internet. Yeah, the interesting thing is that Wikipedia and social media have very different
link |
00:29:54.720
forces. You're right. I mean, Wikipedia, if AGI was Wikipedia, it'd be just like this cranky,
link |
00:30:01.680
overly competent editor of articles. There's something to that. But the social media aspect
link |
00:30:09.520
is not. So the vision of AGI is as a separate system that's super intelligent. That's
link |
00:30:17.680
one key little thing. I mean, there's the paperclip argument that's super dumb,
link |
00:30:22.080
but super powerful systems. But with social media, you have a relatively like algorithms
link |
00:30:28.080
we may talk about today, very simple algorithms that when something Charles talks a lot about,
link |
00:30:35.440
which is interactive AI, when they start like having at scale, like tiny little interactions
link |
00:30:41.120
with human beings, they can start controlling these human beings. So a single algorithm can
link |
00:30:46.400
control the minds of human beings slowly to where we might not realize it can start wars,
link |
00:30:51.920
it can start, it can change the way we think about things. It feels like in the long arc of history,
link |
00:30:59.600
if I were to sort of zoom out from all the outrage and all the tension on social media,
link |
00:31:04.880
that it's progressing us towards better and better things. It feels like chaos and toxic and all that
link |
00:31:12.640
kind of stuff. It's chaos and toxic. Yeah. But it feels like actually, the chaos and toxic is
link |
00:31:18.320
similar to the kind of debates we had from the founding of this country. There's a civil war
link |
00:31:23.040
that happened over that period. And ultimately, it was all about this tension of like,
link |
00:31:29.520
something doesn't feel right about our implementation of the core values we hold
link |
00:31:33.520
as human beings and they're constantly struggling with this. And that results in
link |
00:31:38.080
people calling each other, just being shady to each other on Twitter. But ultimately,
link |
00:31:46.560
the algorithm is managing all that. And it feels like there's a possible future in which that algorithm
link |
00:31:53.120
controls us into the direction of self destruction and whatever that looks like.
link |
00:31:58.640
Yeah. So all right, I do believe in the power of social media to screw us up royally. I do believe
link |
00:32:05.200
in the power of social media to benefit us too. I do think that we're in a, yeah, it's sort of
link |
00:32:12.160
almost got dropped on top of us. And now we're trying to, as a culture, figure out how to cope
link |
00:32:16.000
with it. There's a sense in which, I don't know, there's some arguments that say that, for example,
link |
00:32:23.600
I guess, college age students now, late college age students now, people who were in middle school
link |
00:32:27.840
when social media started to really take off, maybe really damaged. This may have really
link |
00:32:34.560
hurt their development in a way that we don't have all the implications of quite yet.
link |
00:32:38.880
That's the generation who, and I hate to make it somebody else's responsibility,
link |
00:32:45.840
but they're the ones who can fix it. They're the ones who can figure out,
link |
00:32:50.080
how do we keep the good of this kind of technology without letting it eat us alive?
link |
00:32:55.520
And if they're successful, we move on to the next phase, the next level of the game. If they're not
link |
00:33:03.440
successful, then, yeah, then we're going to wreck each other. We're going to destroy society.
link |
00:33:07.840
So you're going to, in your old age, sit on a porch and watch the world burn because of the
link |
00:33:12.480
TikTok generation that... I believe, well, so this is my kid's age, right? And certainly my
link |
00:33:18.080
daughter's age, and she's very tapped in to social stuff, but she's also, she's trying to find that
link |
00:33:23.760
balance of participating in it and in getting the positives of it, but without letting it eat her
link |
00:33:28.800
alive. And I think sometimes she ventures... I hope she doesn't watch this. Sometimes I think
link |
00:33:35.760
she ventures a little too far and is consumed by it, and other times she gets a little distant.
link |
00:33:43.040
And if there's enough people like her out there, they're going to navigate these choppy waters.
link |
00:33:48.240
That's an interesting skill, actually, to develop. I talked to my dad about it. You know, I've now
link |
00:33:56.720
somehow, this podcast in particular, but other reasons has received a little bit of attention.
link |
00:34:03.440
And with that, apparently in this world, even though I don't shut up about love and I'm just
link |
00:34:08.720
all about kindness, I have now a little mini army of trolls. It's kind of hilarious, actually,
link |
00:34:15.840
but it also doesn't feel good. But it's a skill to learn to not look at that.
link |
00:34:23.280
Like to moderate, actually, how much you look at that. The discussion I have with my dad, it's
link |
00:34:27.200
similar to, it doesn't have to be about trolls. It could be about checking email, which is like,
link |
00:34:33.280
if you're anticipating, you know, there's, my dad runs a large institute at Drexel University,
link |
00:34:39.200
and there could be stressful emails you're waiting, like there's drama of some kinds.
link |
00:34:44.000
And so like, there's a temptation to check the email. If you send an email and you got it,
link |
00:34:49.200
and that pulls you in into, it doesn't feel good. And it's a skill that he actually complains that
link |
00:34:56.320
he hasn't learned, I mean, he grew up without it. So he hasn't learned the skill of how to
link |
00:35:01.360
shut off the internet and walk away. And I think young people, while they're also being,
link |
00:35:05.760
quote unquote, damaged by like, you know, being bullied online, all of those stories,
link |
00:35:11.600
which are very like horrific, you basically can't escape your bullies these days when you're growing
link |
00:35:16.640
up. But at the same time, they're also learning that skill of how to be able to shut off the,
link |
00:35:23.920
like disconnect with it, be able to laugh at it, not take it too seriously. It's a fascinating,
link |
00:35:28.880
like we're all trying to figure this out. Just like you said, has it been dropped on us? And
link |
00:35:31.920
we're trying to figure it out. Yeah, I think that's really interesting. And I, I, I guess I've become
link |
00:35:36.160
a believer in the human design, which I feel like I don't completely understand, like, how do you
link |
00:35:42.480
make something as robust as us? Like we're so flawed in so many ways. And yet, and yet, you know,
link |
00:35:48.960
we dominate the planet. And we do seem to manage to get ourselves out of scrapes, eventually,
link |
00:35:57.680
not necessarily the most elegant possible way, but somehow we get, we get to the next step.
link |
00:36:02.320
And I don't know how I'd make a machine do that. I, I, I generally speaking, like if I train one
link |
00:36:09.440
of my reinforcement learning agents to play a video game, and it works really hard on that
link |
00:36:13.120
first stage over and over and over again, and it makes it through it succeeds on that first level.
link |
00:36:17.680
And then the new level comes, and it's just like, okay, I'm back to the drawing board.
link |
00:36:21.040
And somehow humanity, we keep leveling up, and then somehow managing to put together the skills
link |
00:36:26.080
necessary to achieve success, some semblance of success in that next level too. And, you know,
link |
00:36:34.400
I hope we can keep doing that. You mentioned reinforcement learning. So you've had a couple
link |
00:36:40.800
of years in the field. No, quite, you know, quite a few, quite a long career in artificial
link |
00:36:48.720
intelligence broadly, but reinforcement learning specifically. Can you maybe give a hint about
link |
00:36:55.280
your sense of the history of the field? And in some ways it's changed with the advent of deep
link |
00:37:01.760
learning, but it has long roots. Like, as you've seen it throughout your own life, how have you seen
link |
00:37:07.760
the community change or maybe the ideas that it's playing with change? I've had the privilege,
link |
00:37:13.680
the pleasure of being, of having almost a front row seat to a lot of this stuff. And it's been
link |
00:37:17.760
really, really fun and interesting. So when I was in college in the 80s, early 80s, the neural net
link |
00:37:27.280
thing was starting to happen. And I was taking a lot of psychology classes and a lot of computer
link |
00:37:32.400
science classes as a college student. And I thought, you know, something that can play tic tac toe
link |
00:37:37.840
and just like learn to get better at it, that ought to be a really easy thing. So I spent almost,
link |
00:37:41.760
almost all of my, what would have been vacations during college, like hacking on my home computer,
link |
00:37:47.760
trying to teach it how to play tic tac toe in the programming language Basic. Oh yeah. That's
link |
00:37:53.040
my first language. That's my native language. Is that when you first fell in love with computer
link |
00:37:57.440
science, just like programming basic on that? What was the computer? Do you remember? I had a
link |
00:38:03.440
TRS-80 Model 1 before they were called Model 1s, because there was nothing else. I got my
link |
00:38:08.560
computer in 1979. So I was, I would have been bar mitzvahed, but instead of having a big party
link |
00:38:20.640
that my parents threw on my behalf, they just got me a computer, because that's what I really,
link |
00:38:24.240
really, really wanted. I saw them in the mall, in the Radio Shack. And I thought,
link |
00:38:29.680
what, how are they doing that? I would try to stump them. I would give them math problems,
link |
00:38:33.040
like one plus, and then in parentheses, two plus one. And it would always get it right. I'm like,
link |
00:38:38.160
how do you know so much? Like I've had to go to an algebra class for the last few years to learn
link |
00:38:43.200
this stuff. And you just seem to know. So I was smitten and I got a computer. And I
link |
00:38:48.880
think ages 13 to 15, I have no memory of those years. I think I just was in my room with the
link |
00:38:56.480
computer. Listening to Billy Joel. Communing, possibly listening to the radio, listening to
link |
00:39:00.320
Billy Joel. That was the one album I had on vinyl at that time. And, and then I got it on cassette
link |
00:39:07.280
tape. And that was really helpful. Because then I could play it. I didn't have to go down to my
link |
00:39:10.640
parents' Wi-Fi, or Hi-Fi, sorry. And then age 15, I remember kind of walking out and like, okay,
link |
00:39:18.000
I'm ready to talk to people again. Like I've learned what I need to learn here.
link |
00:39:22.640
And so yeah, so, so that was, that was my home computer. And so I went to college and I was
link |
00:39:26.560
like, oh, I'm totally going to study computer science. The college I chose specifically
link |
00:39:31.040
had a computer science major; the one that I really wanted to go to
link |
00:39:35.280
didn't. So bye bye to them. Which college did you go to? So I went to Yale. Princeton would
link |
00:39:41.280
have been way more convenient. And it was just a beautiful campus. And it was close enough to
link |
00:39:44.960
home. And I was really excited about Princeton. And I visited, I said, so, computer science major? And they're
link |
00:39:49.040
like, well, we have computer engineering. I'm like, oh, I don't like that word engineering.
link |
00:39:54.800
I like computer science. I really, I want to do like, you're saying hardware and software.
link |
00:39:58.880
They're like, yeah, I'm like, I just want to do software. I couldn't care less about hardware.
link |
00:40:01.840
You grew up in Philadelphia? I grew up outside Philly. Yeah. Yeah. So the, you know, local
link |
00:40:06.320
schools were like Penn and Drexel and Temple, like everyone in my family went to Temple,
link |
00:40:12.480
at least at one point in their lives, except for me. So yeah, Philly family.
link |
00:40:17.280
Yale had a computer science department. And that's when you, it's kind of interesting.
link |
00:40:21.360
You said 80s and neural networks. That's when the neural networks was a hot new thing, or a hot thing
link |
00:40:26.080
period. So was that in college when you first learned about neural networks? Yeah.
link |
00:40:30.880
And you learned about it in a psychology class, not in a CS class.
link |
00:40:34.400
Yeah. Was it psychology or cognitive science or like, do you remember like what context?
link |
00:40:39.040
It was, yeah, yeah, yeah. So, so I was a, I've always been a bit of a cognitive psychology
link |
00:40:44.160
groupie. So like I, I studied computer science, but I like, I like to hang around where the
link |
00:40:48.720
cognitive scientists are. Cause I don't know brains, man. They're like, they're wacky. Cool.
link |
00:40:55.200
And they have a bigger picture view of things. They're a little less engineering, I would say.
link |
00:40:59.600
They're more, they're more interested in the nature of cognition and intelligence and perception
link |
00:41:04.480
and how, like, the vision system works. Like they're always asking bigger questions. Now with the
link |
00:41:09.440
deep learning community, there, I think more, there's a lot of intersections, but I do find in
link |
00:41:15.200
that the, the neuroscience folks actually, and cognitive psychology, cognitive science folks
link |
00:41:23.120
are starting to learn how to program, how to use neural artificial neural networks.
link |
00:41:27.760
And they are actually approaching problems in like totally new, interesting ways. It's fun to
link |
00:41:31.840
watch the grad students from those departments like approach a problem of machine learning.
link |
00:41:37.200
Right. They come in with a different perspective. Yeah. They don't care about like your image
link |
00:41:40.960
net data set or whatever. They want, like, to understand the basic mechanisms
link |
00:41:48.960
at the, at the neuronal level, at the functional level of intelligence. So it's kind of, it's
link |
00:41:53.920
kind of cool to see them work. But yeah. Okay. So you always loved, you were always a
link |
00:41:58.400
groupie of cognitive psychology. Yeah. Yeah. And so, so it was in a class by Richard Garrick.
link |
00:42:04.240
He was kind of my, my favorite psych professor in college. And I took like three different
link |
00:42:09.840
classes with him. And yeah, so that we were, they were talking specifically the class, I think was
link |
00:42:15.280
kind of a, there was a big paper that was written by Steven Pinker and Prince. I don't, I'm blanking
link |
00:42:22.400
on Prince's first name, but Pinker and Prince, they wrote kind of a, they were at that
link |
00:42:28.480
time kind of like, I'm blanking on the names of the current people. The cognitive scientists who
link |
00:42:36.400
were complaining a lot about deep networks. Oh, Gary, Gary Marcus. Gary Marcus. And who else? I
link |
00:42:44.160
mean, there's a few, but Gary, Gary is the most feisty. Sure. Gary is very feisty. And with this,
link |
00:42:48.960
with his coauthor, they, they, you know, they're kind of doing these kind of takedowns where they
link |
00:42:52.720
say, okay, well, yeah, it does all these amazing things, amazing things. But here's a shortcoming,
link |
00:42:56.800
here's a shortcoming, here's a shortcoming. And so the Pinker Prince paper is kind of like the,
link |
00:43:01.600
that generation's version of Marcus and Davis, right? Where they're, they're trained as cognitive
link |
00:43:07.360
scientists, but they're looking skeptically at the results in the, in the artificial intelligence,
link |
00:43:12.480
neural net kind of world and saying, yeah, it can do this and this and this, but like, it can't do
link |
00:43:17.280
that. And it can't do that. And it can't do that. Maybe in principle, or maybe just in practice at
link |
00:43:21.280
this point. But, but the fact of the matter is you're, you've narrowed your focus too far to be
link |
00:43:27.520
impressed, you know, you're impressed with the things within that circle, but you need to broaden
link |
00:43:31.680
that circle a little bit. You need to look at a wider set of problems. And so, so we, so I was
link |
00:43:36.720
in this seminar in college, that was basically a close reading of the Pinker Prince paper,
link |
00:43:42.160
which was like really thick. There was a lot going on in there. And, and it, and it talked about
link |
00:43:49.600
the reinforcement learning idea a little bit. I'm like, Oh, that sounds really cool, because
link |
00:43:53.200
behavior is what is really interesting to me about psychology anyway. So making programs that, I mean,
link |
00:43:58.880
programs are things that behave. People are things that behave. Like I want to make learning that
link |
00:44:03.840
learns to behave. In which way was reinforcement learning presented? Is this talking about human
link |
00:44:09.520
and animal behavior? Or are we talking about actual mathematical construct?
link |
00:44:13.040
That's right. So that's a good question. Right. So this is, I think it wasn't actually talked about
link |
00:44:18.720
as behavior in the paper that I was reading. I think that it just talked about learning.
link |
00:44:23.040
And to me, learning is about learning to behave, but really neural nets at that point were about
link |
00:44:27.600
learning, like supervised learning. So learning to produce outputs from inputs. So I kind of tried
link |
00:44:32.400
to invent reinforcement learning. I, when I graduated, I joined a research group at Bellcore,
link |
00:44:38.480
which had spun out of Bell Labs recently at that time, because of the divestiture of the, of
link |
00:44:43.200
long distance and local phone service in the 1980s, 1984. And I was in a group with Dave Ackley,
link |
00:44:51.280
who was the first author of the Boltzmann machine paper. So the very first neural net paper that
link |
00:44:56.880
could handle XOR, right? So XOR sort of killed neural nets, the very first, the zero width
link |
00:45:02.720
the first winter. Yeah. The, the Perceptron's paper and Hinton, along with the student Dave Ackley,
link |
00:45:11.280
and, and I think there was other authors as well, showed that no, no, no, with bolts machines,
link |
00:45:15.280
we can actually learn nonlinear concepts. And so everything's back on the table again. And that
link |
00:45:20.640
kind of started that second wave of neural networks. So Dave Ackley was, he became my mentor at,
link |
00:45:26.000
at Belcore. And we talked a lot about learning and life and computation and how all these things
link |
00:45:31.760
fit together. Now, Dave and I have a podcast together. So, so I get to kind of enjoy that
link |
00:45:38.560
sort of his perspective once again, even, even all these years later. And so I said, so I said,
link |
00:45:45.520
I was really interested in learning, but in the concept of behavior. And he's like, oh, well,
link |
00:45:49.760
that's reinforcement learning here. And he gave me Rich Sutton's 1984 TD paper. So I read that
link |
00:45:56.160
paper, I honestly didn't get all of it. But I got the idea, I got that they were using
link |
00:46:01.280
that he was using ideas that I was familiar with in the context of neural nets and, and
link |
00:46:06.080
like sort of back prop. But with this idea of making predictions over time, I'm like,
link |
00:46:11.280
this is so interesting, but I don't really get all the details I said to Dave. And Dave said,
link |
00:46:15.280
oh, well, why don't we have him come and give a talk. And I was like,
link |
00:46:20.000
wait, what, you can do that? Like, these are real people? I thought they were just words. I thought
link |
00:46:24.320
it was just like ideas that somehow magically seeped into paper. He's like, no, I, I, I, I know
link |
00:46:30.400
Rich, like, we'll just have him come down and he'll give a talk. And so I was, you know,
link |
00:46:36.080
my mind was blown. And so Rich came and he gave a talk at Bellcore. And he talked about what he
link |
00:46:41.760
was super excited about, which was they had just figured out at the time, Q learning. So Watkins had
link |
00:46:49.040
visited Rich Sutton's lab at UMass, or Andy Barto's lab that Rich was a part of.
link |
00:46:55.760
And he was really excited about this because it resolved a whole bunch of problems that he
link |
00:47:01.520
didn't know how to resolve in the, in the earlier paper. And so, for people who don't know,
link |
00:47:07.200
TD, temporal difference, these are all just algorithms for reinforcement learning.
link |
00:47:11.440
Right. And TD, temporal difference, in particular, is about making predictions over time.
link |
00:47:16.080
And you can try to use it for making decisions, right? Because if you can predict how good a
link |
00:47:19.680
future action, an action's outcomes will be in the future, you can choose the one that is better.
link |
00:47:24.480
And, but the theory didn't really support changing your behavior. Like the predictions
link |
00:47:29.520
had to be of a consistent process if you really wanted it to work. And one of the things that
link |
00:47:35.040
was really cool about Q learning, another algorithm for reinforcement learning is it was
link |
00:47:38.960
off policy, which meant that you could actually be learning about the environment and what
link |
00:47:42.640
the value of different actions would be while actually figuring out how to behave optimally.
link |
00:47:48.800
Yeah. So that was a revelation.
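As an aside, here is a minimal sketch of the TD(0) "prediction over time" update being described, assuming a simple dictionary of value estimates and a stream of (state, reward, next_state) transitions; this is illustrative Python, not any particular library's API.

```python
# Sketch of the TD(0) prediction update: learn V(s), an estimate of long-term
# return from state s, by nudging it toward a bootstrapped one-step target.
# transitions is an iterable of (state, reward, next_state) tuples observed
# while following some fixed policy.

def td0(V, transitions, alpha=0.1, gamma=0.99):
    for state, reward, next_state in transitions:
        target = reward + gamma * V.get(next_state, 0.0)     # bootstrapped prediction
        V[state] = V.get(state, 0.0) + alpha * (target - V.get(state, 0.0))
    return V
```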
link |
00:47:50.320
Yeah. And the proof of that is kind of interesting. I mean, that's really surprising to me when I
link |
00:47:53.760
first read that, and then in Rich Sutton's book on the matter. It's kind of beautiful that
link |
00:48:00.080
a single equation can capture it all, one line of code, and, like, you can learn anything.
link |
00:48:04.800
Yeah, like, so equation and code, you're right. Like you can, the code that you can arguably,
link |
00:48:13.200
at least if you like squint your eyes can say this is all of intelligence,
link |
00:48:18.640
that you can implement that in a single, well, I think I started with Lisp, which is a shout
link |
00:48:25.120
out to Lisp, with like a single line of code, key piece of code, maybe a couple,
link |
00:48:31.200
that you could do that is kind of magical. It feels too good to be true.
link |
00:48:36.880
Well, and it sort of is. Yeah. It seems to require an awful lot of extra stuff supporting it.
link |
00:48:43.280
But nonetheless, the idea is really good. And as far as we know, it is a very reasonable way of
link |
00:48:50.480
trying to create adaptive behavior, behavior that gets better at something over time.
link |
00:48:56.640
Did you find the idea of optimal at all compelling that you could prove that it's optimal? So like
link |
00:49:02.400
one part of computer science that makes people feel warm and fuzzy inside is when you
link |
00:49:08.560
can prove something, like that a sorting algorithm, worst case, runs in n log n, and it makes everybody
link |
00:49:14.960
feel so good. Even though in reality, it doesn't really matter what the worst case is, what matters
link |
00:49:19.600
is like, does this thing actually work in practice on this particular actual set of data that I
link |
00:49:25.440
enjoy? Did you? So here's a place where I have maybe a strong opinion, which is like,
link |
00:49:30.880
you're right, of course, but no, no. So what makes worst case so great, right? If you have a
link |
00:49:38.080
worst case analysis, so great is that you get modularity. You can take that thing and plug it
link |
00:49:43.280
into another thing and still have some understanding of what's going to happen when you click them
link |
00:49:48.160
together, right? If it just works well in practice, in other words, with respect to some distribution
link |
00:49:53.120
that you care about, when you go plug it into another thing, that distribution can shift,
link |
00:49:57.680
it can change, and your thing may not work well anymore. And you want it to and you wish it does
link |
00:50:02.480
and you hope that it will, but it might not. So you're saying you don't like machine learning.
link |
00:50:13.040
But we have some positive theoretical results for these things. You can come back at me
link |
00:50:19.440
with, yeah, but they're really weak and yeah, they're really weak. And you can even say that
link |
00:50:24.320
sorting algorithms, like if you do the optimal sorting algorithm, it's not really the one that
link |
00:50:28.000
you want. And that might be true as well. But it is, the modularity is a really powerful statement.
link |
00:50:33.920
I really like that. As an engineer, you can then assemble different things you can count on them
link |
00:50:38.160
to be, I mean, it's interesting. It's a balance, like with everything else in life, you don't want
link |
00:50:45.440
to get too obsessed. I mean, this is what computer scientists do, which they tend to get obsessed,
link |
00:50:51.280
they over optimize things, or they start by optimizing them, they over optimize. So it's
link |
00:50:57.360
easy to like get really granular about this thing. But like the step from an n squared to an n log n
link |
00:51:06.080
sorting algorithm is a big leap for most real world systems, no matter what the actual
link |
00:51:12.400
behavior of the system is, that's a big leap. And the same can probably be said for other kind of
link |
00:51:18.880
first leaps that you would take on a particular problem. Like it's picking the low hanging fruit
link |
00:51:25.520
or whatever the equivalent of doing the not the dumbest thing, but the next to the dumbest thing
link |
00:51:31.920
is picking the most delicious, reachable fruit. Yeah, most delicious, reachable fruit. I don't
link |
00:51:36.400
know why that's not a saying. And yeah. Okay, so, so you then this is the 80s and this kind of idea
link |
00:51:45.120
starts to percolate, of learning. Yeah, I got to meet Rich Sutton. So everything was sort of
link |
00:51:51.440
downhill from there. And that was that was really the pinnacle of everything. But then I, you know,
link |
00:51:55.920
then I felt like I was kind of on the inside. So then as interesting results were happening,
link |
00:52:00.000
I could like check in with, with Rich or with Jerry Tesauro, who had a huge impact on kind of
link |
00:52:06.080
early thinking in, in temporal difference learning and reinforcement learning, and showed that you
link |
00:52:10.880
could do, you could solve problems that we didn't know how to solve any other way.
link |
00:52:16.000
And so that was really cool. So it was good things were happening, I would hear about it from
link |
00:52:19.840
either the people who were doing it, or the people who were talking to the people who were
link |
00:52:23.120
doing it. And so I was able to track things pretty well through, through the 90s.
link |
00:52:28.080
So wasn't most of the excitement on reinforcement learning in the 90s era with,
link |
00:52:35.600
what is it, TD-Gammon, like, what's the role of these kind of little like fun,
link |
00:52:41.680
game playing things and breakthroughs about, you know, exciting the community? Was that,
link |
00:52:47.360
like, what were your, because you've also built, or were part of building, a crossword
link |
00:52:53.600
puzzle, uh, solver program, yeah, solving program, uh, called Proverb. So,
link |
00:53:01.120
so you were interested in this as a problem, like in forming, in using games to understand how to
link |
00:53:09.760
build, uh, intelligent systems. So like, what did you think about TD-Gammon? Like, what did you
link |
00:53:14.640
think about that whole thing in the 90s? Yeah. I mean, I found the TD-Gammon result really just
link |
00:53:19.680
remarkable. So I had known about some of Jerry's stuff before he did TD-Gammon, and he did a system,
link |
00:53:24.720
just more vanilla, well, not entirely vanilla, but a more classical backpropy kind of, uh,
link |
00:53:30.560
network for playing backgammon, where he was training it on expert moves. So it was kind of
link |
00:53:36.000
supervised. But the way that it worked was not to mimic the actions, but to learn internally
link |
00:53:42.640
an evaluation function. So to learn, well, if the expert chose this over this, that must mean
link |
00:53:47.920
that the expert values this more than this. And so let me adjust my weights to make it so that
link |
00:53:52.800
the network evaluates this as being better than this. So it could learn from, from human
link |
00:53:58.960
preferences, it could learn its own preferences.
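To make that comparison-training idea concrete, here is a rough sketch of learning an evaluation function from expert preferences: for each pair where the expert chose one position over another, nudge the weights so the chosen position scores higher. The linear evaluator and the Bradley-Terry style loss are simplifying assumptions for illustration; the actual system was a backprop-trained neural network.

```python
# Illustrative sketch: learn an evaluation function from expert preference pairs.
import numpy as np

def train_from_preferences(pairs, n_features, lr=0.01, epochs=10):
    # pairs: list of (chosen, rejected) feature vectors describing board positions.
    w = np.zeros(n_features)
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = w @ chosen - w @ rejected
            p = 1.0 / (1.0 + np.exp(-margin))   # model's probability that "chosen" is better
            # Gradient ascent on the log-likelihood of the expert's observed choice.
            w += lr * (1.0 - p) * (chosen - rejected)
    return w
```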
link |
00:54:04.880
And then when he took the step from that to actually doing it as a full on reinforcement learning problem where you didn't need a trainer,
link |
00:54:10.000
you could just let it play, that was, that was remarkable. Right. And so I think as,
link |
00:54:16.400
as humans often do, as we've done in the recent past as well, people extrapolate and it's like,
link |
00:54:22.160
Oh, well, if you can do that, which is obviously very hard, then obviously you could do all these
link |
00:54:27.120
other problems that we, that we want to solve that we know are also really hard. And it turned out
link |
00:54:32.240
very few of them ended up being practical. Um, partly because I think neural nets,
link |
00:54:36.720
certainly at the time, were struggling to be consistent and reliable. And so training them
link |
00:54:43.520
in a reinforcement learning setting was a bit of a mess. I had, uh, I don't know, generation
link |
00:54:49.120
after generation of like master students who wanted to do value function approximation,
link |
00:54:55.600
basically learn reinforcement learning with neural nets. And over and over and over again,
link |
00:55:02.720
we were failing. We couldn't get the, the good results that Jerry Tesauro got. I now believe
link |
00:55:06.880
that Jerry is a neural net whisperer. He has a particular ability to get neural networks to
link |
00:55:14.160
do things that other people would find impossible. And it's not the technology, it's the technology
link |
00:55:20.640
and Jerry together. Yeah. Which I think speaks to the role of the human expert in the
link |
00:55:27.360
process of machine learning. Right. It's so easy. We were so drawn to the idea that,
link |
00:55:31.680
that it's the technology that is, that is where the power is coming from,
link |
00:55:35.920
that I think we lose sight of the, of the fact that sometimes you need a really good,
link |
00:55:39.360
just like, I mean, no one would think, Hey, here's this great piece of software. Here's like,
link |
00:55:42.640
I don't know, GNU Emacs or whatever. Um, and doesn't that prove that computers are super
link |
00:55:47.760
powerful and basically going to take over the world? It's like, no, Stallman is a hell of a hacker.
link |
00:55:52.240
Right. So he was able to make the code do these amazing things. He couldn't have done it without
link |
00:55:56.640
the computer, but the computer couldn't have done it without him. And so I think people discount
link |
00:56:01.360
the role of people like Jerry who, who, um, who have just a particular, a particular set of skills.
link |
00:56:08.960
On that topic, by the way, as a small side note, I tweeted Emacs is greater than Vim yesterday
link |
00:56:16.000
and deleted, deleted the tweet 10 minutes later when I realized it started a war. Yeah. I was like,
link |
00:56:23.680
Oh, I was just kidding. I was just being provocative walk, walk, walk back. So people still feel
link |
00:56:31.440
passionately about that particular piece of, uh, I don't get that cause Emacs is clearly so much
link |
00:56:36.480
better. I don't understand. But you know, why do I say that? Because I, cause like I spent a block
link |
00:56:41.200
of time in the eighties, um, making my fingers know the Emacs keys. And now like that's part
link |
00:56:48.400
of the thought process for me. Like I need to express. And if you take that, if you take my Emacs
link |
00:56:53.280
key bindings away, I become, like... Yeah. I can't express myself. I'm the same way with the, I
link |
00:57:00.640
don't know if you know what, what it is, but a Kinesis keyboard, which is, uh, this butt shaped
link |
00:57:05.200
keyboard. Yes, I've seen them. Yeah. And they're very, uh, I don't know, sexy, elegant. They're
link |
00:57:11.520
beautiful. Yeah. They're, they're gorgeous, uh, way too expensive. But, uh, the, the problem with
link |
00:57:18.320
them similar with Emacs is when you, once you learn to use it, it's harder to use other things.
link |
00:57:25.520
It's hard to use other things. There's this absurd thing where I have like small, elegant,
link |
00:57:29.760
lightweight, beautiful little laptops and I'm sitting there in a coffee shop with a giant
link |
00:57:34.240
Kinesis keyboard and a sexy little laptop. It's absurd. But it, you know, like I used to feel
link |
00:57:40.320
bad about it, but at the same time, you just kind of have to, sometimes it's back to the Billy Joel
link |
00:57:45.200
thing. You just have to throw that Billy Joel record and throw Taylor Swift and Justin Bieber
link |
00:57:51.040
to the wind. So. See, but I like them now because I, because again, I have no musical taste. Like,
link |
00:57:56.560
like now that I've heard Justin Bieber enough, I like, I really like his songs and Taylor Swift,
link |
00:58:01.840
not only do I like her songs, but my daughter's convinced that she's a genius. And so now I
link |
00:58:05.280
basically have, I'm signed on to that. So. So yeah, that, that speaks to the back to the robustness
link |
00:58:10.880
of the human brain. That speaks to the neuroplasticity that you can just, you can just like a mouse,
link |
00:58:16.640
teach yourself to a, or a dog, teach yourself to enjoy Taylor Swift. I'll try it out. I don't know.
link |
00:58:24.000
I try, you know what it has to do with just like acclimation, right? Just like you said,
link |
00:58:28.560
a couple of weeks. Yeah. That's an interesting experiment. I'll actually try that. Like I'll
link |
00:58:32.080
listen to. If that wasn't the intent of the experiment, just like social media, it wasn't
link |
00:58:34.880
intended as an experiment to see what we can take as a society, but it turned out that way.
link |
00:58:39.360
I don't think I'll be the same person on the other side of the week listening to Taylor Swift,
link |
00:58:43.200
but let's try. You know, it's more compartmentalized. Don't be so worried. Like it's,
link |
00:58:47.520
like I get that you can be worried, but don't be so worried because we compartmentalize really
link |
00:58:51.120
well. And so it won't bleed into other parts of your life. You won't start, I don't know,
link |
00:58:56.160
wearing red lipstick or whatever. Like it's, it's fine. It's fine. Change fashion and everything.
link |
00:58:59.840
It's fine. But you know what? The thing you have to watch out for is you'll walk into a
link |
00:59:03.200
coffee shop once we can do that again. And recognize the song. And you'll be, no,
link |
00:59:06.960
you won't know that you're singing along until everybody in the coffee shop is looking at you.
link |
00:59:11.520
And then you're like, that wasn't me. Yeah. That's the, you know, people are afraid of AGI. I'm
link |
00:59:18.400
afraid of the Taylor Swift takeover. Yeah. And I mean, people should know that TD-Gammon
link |
00:59:24.720
was, I guess, would you call it, do you like the terminology of self play by any chance?
link |
00:59:30.800
So, so like systems that learn by playing themselves, just, I don't know if it's the best
link |
00:59:36.640
word, but, uh, so what's, what's the problem with that term? Okay. So it's like the big bang,
link |
00:59:43.520
like it's, it's like talking to serious physicists. Do you like the term big bang when, when it was
link |
00:59:49.040
early? I feel like it's the early days of self play. I don't know. Maybe it was used previously,
link |
00:59:53.120
but I think it's been used by only a small group of people. And so like, I think we're still deciding,
link |
00:59:59.600
is this ridiculously silly name, a good name for the concept, potentially one of the most
link |
01:00:04.880
important concepts in artificial intelligence. Okay. Depends how broadly you apply the term. So I
link |
01:00:09.360
used the term in my 1996 PhD dissertation. Oh, wow, the actual term, self play?
link |
01:00:14.560
Yeah. Because, because Tesauro's paper was something like, um, training up an expert
link |
01:00:20.080
backgammon player through self play. So I think it was in the title of his paper. If not in the
link |
01:00:24.480
title, it was definitely a term that he used. There's another term that we got from that work
link |
01:00:28.720
is rollout. So I don't know if you, do you ever hear the term rollout? That's a backgammon term
link |
01:00:33.200
that is now applied generally in computing. Well, at least in AI, because of TD-Gammon. Yeah.
link |
01:00:40.560
That's fascinating. So how is self play being used now? And like, why is it, does it, does it
link |
01:00:44.480
feel like a more general powerful concept is sort of the idea of, well, the machine just
link |
01:00:48.320
going to teach itself to be smart? Yeah. So that's, that's where maybe you can correct me,
link |
01:00:53.600
but that's where, you know, the continuation of the spirit and actually like literally the exact
link |
01:00:58.880
algorithms of TD-Gammon are applied by DeepMind and OpenAI to learn games that are a little bit
link |
01:01:05.680
more complex. When I was learning artificial intelligence, Go was presented to me, in Artificial
link |
01:01:11.760
Intelligence: A Modern Approach, I don't know if they explicitly pointed to Go in those books
link |
01:01:16.720
as, like, an unsolvable kind of thing, like implying that these approaches hit the limit they
link |
01:01:24.640
have with these particular kinds of games. So something, I don't remember if the book said
link |
01:01:28.640
it or not, but something, maybe it was the professors, instilled in me the idea like,
link |
01:01:34.400
these are the limits of artificial intelligence, of the field. Like it instilled in me the idea that
link |
01:01:41.040
if we can create a system that can solve the game of Go, we've achieved AGI. That was kind of
link |
01:01:46.640
They didn't explicitly, like, say this, but that was the feeling. And so, I was one of the people
link |
01:01:52.640
that it seemed magical when a learning system was able to beat a human world champion at the game
link |
01:02:01.280
of Go. And even more so from that, that was AlphaGo, even more so with AlphaGo Zero, then kind of
link |
01:02:08.880
renamed and advanced into AlphaZero, beating a world champion or world class player without
link |
01:02:17.840
any supervised learning on expert games, learning only by playing itself. So that
link |
01:02:26.640
is, I don't know what to make of it. I think it'll be interesting to hear what your opinions are on
link |
01:02:32.880
just how exciting, surprising, profound, interesting, or boring the breakthrough performance of Alpha
link |
01:02:43.840
zero was. Okay. So AlphaGo knocked my socks off. That was so remarkable. Which aspect of it?
link |
01:02:54.000
They got it to work that they actually were able to leverage a whole bunch of different ideas,
link |
01:02:58.880
integrate them into one giant system. Just the software engineering aspect of it is mind blowing.
link |
01:03:04.080
I don't, I've never been a part of a program as complicated as the program that they built for
link |
01:03:08.480
that. And, and just the, you know, like, like Jerry Tesauro is a neural net whisperer, like,
link |
01:03:14.720
you know, David Silver is a kind of neural net whisperer too. He was able to coax these networks
link |
01:03:19.280
and these new way out there architectures to do these, you know, solve these problems that,
link |
01:03:24.960
as you said, you know, when we were learning from AI, no one had an idea how to make it work.
link |
01:03:32.640
It was, it was remarkable that these, you know, these, these techniques that were so good at
link |
01:03:39.440
playing chess and that could beat the world champion in chess couldn't beat, you know,
link |
01:03:42.960
your typical go playing teenager in Go. So the fact that the, you know, in a very short number
link |
01:03:49.200
of years, we kind of ramped up to trouncing people in Go just blew me away.
link |
01:03:55.760
So you, you're kind of focusing on the engineering aspect, which is also very surprising. I mean,
link |
01:04:00.160
there's something different about large, well funded companies. I mean, there's a compute aspect to
link |
01:04:06.720
it too. Sure. Like that, of course, I mean, that's similar to Deep Blue, right, with, with IBM.
link |
01:04:14.160
Like there's something important to be learned and remembered about a large company taking
link |
01:04:19.840
the ideas that are already out there and investing a few million dollars into it or, or more. And
link |
01:04:27.680
so you're kind of saying the engineering is kind of fascinating, both on the, with AlphaGo is
link |
01:04:33.040
probably just gathering all the data, right, of the, of the expert games, like organizing everything,
link |
01:04:38.720
actually doing distributed supervised learning. And to me, see the engineering I kind of took
link |
01:04:47.120
for granted, to me philosophically being able to persist in the, in the face of like long odds,
link |
01:04:57.760
because it feels like for me, I'll be one of the skeptical people in the room thinking that you
link |
01:05:02.720
can learn your way to, to beat go. Like it sounded like, especially with David Silver, it sounded
link |
01:05:08.720
like David was not confident at all. So like it was like, not, it's funny how confidence works.
link |
01:05:18.000
Yeah. It's like, you're not like cocky about it. Like, but. Right. Cause if you're cocky about it,
link |
01:05:25.680
you, you kind of stop and stall and don't get anywhere. Yeah. But there's like a hope
link |
01:05:30.160
that's unbreakable. Maybe that's better than confidence. It's a kind of wishful
link |
01:05:34.800
hope and a little dream. And you almost don't want to do anything else. You kind of keep doing it.
link |
01:05:40.720
That's, that seems to be the story and. But with enough skepticism that you're looking for where
link |
01:05:45.840
the problems are and fighting through them. Yeah. Cause you know, there's got to be a way out of
link |
01:05:50.080
this thing. Yeah. And for him, it was probably, there's, there's a bunch of little factors that
link |
01:05:55.280
come into play. It's funny how these stories just all come together. Like everything he did in his
link |
01:05:58.800
life came into play, which is like a love for video games and also a connection to, so the,
link |
01:06:05.680
the nineties had to happen with TD-Gammon and so on. Yeah. And in some ways it's surprising,
link |
01:06:10.720
maybe you can provide some intuition to it that not much more than TD Gammon was done for quite
link |
01:06:16.560
a long time on the reinforcement learning front. Yeah. Is that weird to you? I mean,
link |
01:06:21.440
like I said, the, the students who I worked with, we tried to get,
link |
01:06:24.800
to basically apply that architecture to other problems and we consistently failed. There were
link |
01:06:30.880
a couple, a couple of really nice demonstrations that ended up being in the literature. There was
link |
01:06:35.280
a paper about controlling elevators, right? Where it's, it's like, okay, can we modify the heuristic
link |
01:06:42.080
that elevators use for deciding, like a bank of elevators for deciding which floors we should
link |
01:06:46.240
be stopping on to maximize throughput essentially. And you can set that up as a reinforcement
link |
01:06:51.520
learning problem and you can, you know, have a neural net represent the value function so that
link |
01:06:55.680
it's taking where all the elevators are, where the button pushes are, you know, this high dimensional,
link |
01:07:00.480
well, at the time, high dimensional input, you know, a couple dozen dimensions and turn that
link |
01:07:06.560
into a prediction as to, oh, is it going to be better if I stop at this floor or not? And ultimately,
link |
01:07:12.160
it appeared as though for the standard simulation distribution for people trying to leave the
link |
01:07:17.840
building at the end of the day, that the neural net learned a better strategy than the standard one
link |
01:07:21.840
that's implemented in elevator controllers. So that, that was nice.
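For a sense of how that elevator problem can be framed as reinforcement learning (this is in the spirit of the Crites and Barto elevator-dispatch work the anecdote likely refers to, but the exact features and reward below are illustrative assumptions, not the original formulation):

```python
# Sketch of an elevator-dispatch RL formulation: state = car positions/directions plus
# pending hall-call buttons; action = stop at a floor or keep going; reward = negative waiting time.
import numpy as np

def elevator_state(car_floors, car_directions, hall_calls, num_floors):
    """Encode a bank of elevators plus the pending hall-call buttons as one feature vector."""
    features = []
    for floor, direction in zip(car_floors, car_directions):
        features.append(floor / (num_floors - 1))   # normalized car position
        features.append(float(direction))           # -1 down, 0 idle, +1 up
    features.extend(float(call) for call in hall_calls)  # one 0/1 flag per (floor, direction) button
    return np.array(features)

def reward(waiting_times):
    """Penalize (squared) passenger waiting time, so maximizing return minimizes waiting."""
    return -sum(t * t for t in waiting_times)

# A value-function network would map elevator_state(...) to a prediction of future reward,
# and the dispatch decision would be chosen to maximize that prediction.
```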
link |
01:07:27.520
There was some work that Satinder Singh et al. did on handoffs with cell phones, you know, deciding when, when should you
link |
01:07:36.000
hand off from this cell tower to this cell tower. Oh, okay, communication networks. Yeah. And so
link |
01:07:41.760
a couple things seemed like they were really promising. None of them made it into production
link |
01:07:45.600
that I'm aware of. And neural nets as a whole started to kind of implode around then. And so
link |
01:07:51.360
there just wasn't a lot of air in the room for people to try to figure out, okay,
link |
01:07:55.040
how do we get this to work in the RL setting? And then they, they found their way back in 10,
link |
01:08:01.200
in 10 plus years. So you said AlphaGo was impressive, like it's a big spectacle. Is there
link |
01:08:07.680
Right. So then AlphaZero. So I think I may have a slightly different opinion on this than
link |
01:08:11.760
some people. So I talked to Satinder Singh in particular about this. So Satinder was
link |
01:08:17.120
like Rich Sutton, a student of Andy Barto. So they came out of the same lab, very influential,
link |
01:08:22.720
machine learning, reinforcement learning researcher. Now at DeepMind, as is, as is Rich,
link |
01:08:29.680
though at different sites, the two of them. Rich is in Alberta. And
link |
01:08:34.480
Satinder would be in England, but I think he's working from Michigan at the moment.
link |
01:08:38.240
But the, but he was, yes, he was much more impressed with AlphaGo Zero, which didn't
link |
01:08:48.320
get a kind of a bootstrap in the beginning with human trained games. It was just purely
link |
01:08:52.480
self play. Though the first one, AlphaGo, was also a tremendous amount of self play, right? They
link |
01:08:58.000
started off, they kickstarted the, the action network that was making decisions. But then
link |
01:09:02.720
they trained it for a really long time using more traditional temporal difference methods.
link |
01:09:06.800
So, so as a result, I didn't, it didn't seem that different to me. Like, it seems like, yeah, why
link |
01:09:14.000
wouldn't that work? Like once, once it works, it works. So what, but he found that, that removal
link |
01:09:21.280
of that extra information to be breathtaking, like that, that's a game changer. To me, the first
link |
01:09:26.320
thing was more of a game changer. But the open question, I mean, I guess that's the assumption
link |
01:09:30.480
is the expert games might contain within them, yeah, this amount of information.
link |
01:09:38.960
But we know that it went beyond that, right? We know that it somehow got away from that information
link |
01:09:43.520
because it was learning strategies. I don't think it's, I don't think Alpha Go is just
link |
01:09:48.000
better at implementing human strategies. I think it actually developed its own strategies that were,
link |
01:09:52.800
that were more effective. And so from that perspective, okay, well, so it, it made at
link |
01:09:57.920
least one quantum leap in terms of strategic knowledge. Okay, so now maybe it makes three.
link |
01:10:04.560
Like, okay, but that first one is the doozy, right? Getting it to, to, to work reliably and,
link |
01:10:10.480
and for the networks to, to hold on to the value well enough. Like that was, that was a big step.
link |
01:10:15.920
Well, isn't, maybe you could speak to this on the reinforcement learning front. So the
link |
01:10:19.360
starting from scratch and learning to do something like the first like, like random behavior to
link |
01:10:30.240
like crappy behavior to like somewhat okay behavior. It's not obvious to me that that's not
link |
01:10:38.240
like impossible to take those steps. Like if you just think about the intuition, like how the heck
link |
01:10:44.960
does random behavior become somewhat basic intelligent behavior? Not, not human level,
link |
01:10:52.400
not super human level, but just basic. But you're saying to you kind of the intuition is like,
link |
01:10:57.920
if you can go from human to super human level intelligence on the, on this particular task
link |
01:11:02.800
of game playing, then you're good at taking leaps. So you can take many of them.
link |
01:11:08.320
That the system, I believe that the system can take that kind of leap. Yeah. And also,
link |
01:11:13.200
I think that the beginner knowledge in go, like you can start to get a feel really quickly for
link |
01:11:20.000
the idea that, you know, being in certain parts of the board seems to be
link |
01:11:25.760
more associated with winning, right? Cause it's not, it's not stumbling upon the concept of winning.
link |
01:11:31.920
It's told that it wins or that it loses. Well, it's self play. So it both wins and loses.
link |
01:11:36.560
It's told which, which side won. And the information is kind of there to start percolating around to
link |
01:11:43.680
make a difference as to, um, well, these things have a better chance of helping you win and these
link |
01:11:48.880
things have a worse chance of helping you win. And so, you know, it can get to basic play,
link |
01:11:52.560
I think pretty quickly, then once it has basic play, well, now it's kind of forced to do some
link |
01:11:58.000
search to actually experiment with, okay, well, what gets me that next increment of, of improvement?
link |
01:12:03.200
How far do you think, okay, this is where you kind of bring up the, the Elon Musks and the
link |
01:12:08.800
Sam Harrises, right. How far is your intuition about these kinds of self play mechanisms being
link |
01:12:14.480
able to take us? Cause it feels one of the ominous, but stated calmly things that when I talked to
link |
01:12:23.600
David Silver, he said, is that they have not yet discovered a ceiling for AlphaZero, for example,
link |
01:12:31.040
on the game of Go or chess. Like it's, it keeps, no matter how much compute they throw at it,
link |
01:12:36.000
it keeps improving. So it's possible, it's very possible that you, if you throw, you know, some
link |
01:12:44.400
like 10 X compute that it will improve by five X or something like that. And when stated calmly,
link |
01:12:51.040
it's so like, oh yeah, I guess so. But, but like, then you think like, well, can we potentially
link |
01:12:58.000
have like continuations of Moore's law in totally different way, like broadly defined Moore's law,
link |
01:13:04.640
not the exponential improvement, like, are we going to have an AlphaZero that swallows the world?
link |
01:13:12.960
But notice it's not getting better at other things. It's getting better at Go. And I think it's a,
link |
01:13:18.000
that's a big leap to say, okay, well, therefore it's better at other things.
link |
01:13:22.560
Well, I mean, the question is how much of the game of life can be turned into,
link |
01:13:27.520
right? So that's, that I think is a really good question. And I think that we don't,
link |
01:13:30.960
I don't think we as a, I don't know, community really know the answer to this, but
link |
01:13:36.320
so okay, so, so I went, I went to a talk by some experts on computer chess. So in particular,
link |
01:13:44.080
computer chess is really interesting because for, you know, for, of course, for a thousand years,
link |
01:13:49.200
humans were the best chess playing things on the planet. And then computers, like, edged
link |
01:13:55.360
ahead of the best person, and they've been ahead ever since. It's not like people have, have
link |
01:13:59.200
overtaken computers, but, but computers and people together have overtaken computers.
link |
01:14:06.400
Right. So at least last time I checked, I don't know what the very latest is, but last time I
link |
01:14:10.800
checked that there were teams of people who could work with computer programs to defeat the best
link |
01:14:16.800
computer programs. In the game of Go. In the game of chess. In the game of chess. Right.
link |
01:14:20.640
And so using the information about these things called Elo scores, the sort of notion of
link |
01:14:28.000
how strong a player you are, there's a, there's kind of a range of possible scores, and
link |
01:14:33.040
you, you increment in score. Basically, if you can beat another player of a lower score,
link |
01:14:39.760
62% of the time or something like that, like there's some threshold of, if you can somewhat
link |
01:14:44.560
consistently beat someone, then you are of a higher score than that person.
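For reference, the Elo mechanics being described look roughly like this; the 400-point scale constant and the K-factor are the conventional choices, and the "beat someone around 62 percent of the time" threshold corresponds to a rating gap on the order of 85 to 100 points.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score (roughly, win probability) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating, expected, actual, k=32):
    """Standard rating update: move the rating toward the observed result."""
    return rating + k * (actual - expected)

# A gap of about 85-100 rating points corresponds to winning roughly 62-64% of the time.
print(round(elo_expected_score(1585, 1500), 2))  # ~0.62
print(round(elo_expected_score(1600, 1500), 2))  # ~0.64
```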
link |
01:14:49.760
And there's a question as to how many times can you do that in chess? Right. And so we know that there's a range of
link |
01:14:54.160
human ability levels that cap out with the best playing humans. And the computers went a step
link |
01:14:59.120
beyond that. And computers and people together have not gone, I think, a full step beyond that.
link |
01:15:05.120
It feels, the estimates that they have is that it's starting to asymptote, that we've reached
link |
01:15:09.760
kind of the maximum, the best possible chess playing. And so that means that there's kind of a
link |
01:15:15.440
finite strategic depth, right? At some point, you just can't get any better at this game.
link |
01:15:21.600
Yeah. I mean, I don't, so I'll actually check that. I think it's interesting, because if you
link |
01:15:28.960
have somebody like Magnus Carlsen, who's using these chess programs to train his mind, like to
link |
01:15:36.880
learn about chess. To become a better chess player, yeah. And so like, that's a very interesting
link |
01:15:41.440
thing, because we're not static creatures. We're learning together. I mean, just like we're talking
link |
01:15:46.160
about social networks, those algorithms are teaching us, just like we're teaching those
link |
01:15:50.560
algorithms. So that's a fascinating thing. But I think the best chess playing programs are now
link |
01:15:57.520
better than the pairs. Like they have competition between pairs, but it's still, even if they weren't,
link |
01:16:03.440
it's an interesting question. Where's the ceiling? So the David, the ominous David Silver kind of
link |
01:16:08.880
statement is like, we have not found the ceiling. Right. So the question is, okay, so I don't know
link |
01:16:15.280
his analysis on that. From talking to Go experts, the depth, the strategic depth of Go seems to be
link |
01:16:23.280
substantially greater than that of chess, that there's more kind of steps of improvement that
link |
01:16:27.920
you can make getting better and better and better and better. But there's no reason to think that
link |
01:16:31.120
it's infinite. Infinite, yeah. And so it could be that what David is seeing is a kind of asymptoting,
link |
01:16:38.080
that you can keep getting better, but with diminishing returns. And at some point,
link |
01:16:42.240
you hit optimal play. Like in theory, all these finite games, they're finite. They have an optimal
link |
01:16:48.720
strategy. There's a strategy that is the minimax optimal strategy. And so at that point,
link |
01:16:53.680
you can't get any better. You can't beat that, that strategy.
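The game-theoretic claim here is just that exhaustive search can, in principle, compute that minimax-optimal strategy for any finite game. A compact sketch is below, with the game interface (is_terminal, value, moves, play) as a hypothetical abstraction; for Go-sized games this search is hopelessly intractable, which is exactly the point being made.

```python
def minimax(game, state, maximizing):
    """Value of `state` under optimal play by both sides in a finite zero-sum game."""
    if game.is_terminal(state):
        return game.value(state)        # e.g., +1 win for the maximizing player, -1 loss, 0 draw
    child_values = (minimax(game, game.play(state, move), not maximizing)
                    for move in game.moves(state))
    return max(child_values) if maximizing else min(child_values)
```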
link |
01:16:58.080
Now, that strategy may be, from an information processing perspective, intractable, right? That you need.
link |
01:17:03.360
All the situations are sufficiently different that you can't compress it at all. It's this
link |
01:17:09.120
giant mess of hardcoded rules. And we can never achieve that. But that still puts a cap on how
link |
01:17:16.480
many levels of improvement that we can actually make. But the thing about self play is if you put
link |
01:17:22.720
it, although I don't like doing that, in the broader category of self supervised learning,
link |
01:17:27.440
is that it doesn't require too much or any human labeling. Yeah. Yeah. Human labeling, or just
link |
01:17:34.000
human effort, the human involvement past a certain point. And the same thing you could argue is true
link |
01:17:41.040
for the recent breakthroughs in natural language processing with language models. Oh, this is
link |
01:17:46.160
how you get to GPT-3. Yeah, see how I did that? That was a good, good transition. Yeah,
link |
01:17:51.680
you're proud. I practiced that for days, leading up to this. But that's one of the questions is,
link |
01:17:59.600
can we find ways to formulate problems in this world that are important to us humans,
link |
01:18:05.440
like more important than the game of chess, that to which self supervised kinds of approaches
link |
01:18:12.400
could be applied, whether it's self play, for example, for like, maybe you could think of
link |
01:18:16.960
like autonomous vehicles in simulation, that kind of stuff, or just robotics applications
link |
01:18:23.920
and simulation, or in the self supervised learning, where unannotated data, or data that's generated
link |
01:18:35.760
by humans naturally without extra cost, like Wikipedia or like all of the internet, can be
link |
01:18:43.280
used to, to learn something about, to create intelligent systems that do something really
link |
01:18:49.760
powerful, that pass the Turing test, or that do some kind of superhuman level performance.
link |
01:18:56.400
So what's your intuition, like trying to stitch all of it together about our discussion of AGI,
link |
01:19:05.120
the limits of self play, and your thoughts about maybe the limits of neural networks
link |
01:19:10.320
in the context of language models? Is there some intuition in there that might be useful to think
link |
01:19:16.400
about? Yeah, yeah, yeah. So first of all, the whole transformer network family of things
link |
01:19:25.200
is really cool. It's really, really cool. I mean, if you've ever, back in the day, you played with,
link |
01:19:31.600
I don't know, Markov models for generating text, and you've seen the kind of text that they spit
link |
01:19:35.360
out, and you compare it to what's happening now. It's, it's amazing. It's so amazing.
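For context, the kind of Markov-chain text generator being compared against here can be sketched in a few lines: learn bigram counts from a corpus, then repeatedly sample the next word given only the current one. The toy corpus is made up purely for illustration.

```python
# Tiny bigram Markov-chain text generator, of the kind GPT-style models are being contrasted with.
import random
from collections import defaultdict

def build_bigram_model(text):
    model = defaultdict(list)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        model[current_word].append(next_word)
    return model

def generate(model, start, length=20):
    word, output = start, [start]
    for _ in range(length):
        if word not in model:
            break
        word = random.choice(model[word])  # sampling is proportional to observed bigram counts
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the log"
model = build_bigram_model(corpus)
print(generate(model, "the"))
```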
link |
01:19:42.400
Now, it doesn't take very long interacting with one of these systems before you find the holes, right? It's,
link |
01:19:47.600
it's not smart in any kind of general way. It's really good at a bunch of things, and it does seem
link |
01:19:55.760
to understand a lot of the statistics of language extremely well. And that turns out to be very
link |
01:20:01.360
powerful. You can answer many questions with that, but it doesn't make it a good conversationalist,
link |
01:20:06.160
right? And it doesn't make it a good storyteller. It just makes it good at imitating things that
link |
01:20:10.480
it's seen in the past. The exact same thing could be said by people voting for Donald Trump
link |
01:20:16.480
about Joe Biden supporters and people voting for Joe Biden about Donald Trump supporters is,
link |
01:20:22.400
you know, that they're not intelligent. They're just following the, yeah, they're following things
link |
01:20:26.080
they've seen in the past. And so it's very, it doesn't take long to find the flaws in their,
link |
01:20:32.880
in their like natural language generation abilities. Yes. Yes. So we're being very.
link |
01:20:37.920
That's interesting. Critical of AI systems. Right. So, so I've had a similar thought,
link |
01:20:43.280
which was that the stories that GPT-3 spits out are amazing and very humanlike.
link |
01:20:51.360
And it doesn't mean that computers are smarter than we realize necessarily. It partly means that
link |
01:20:57.200
people are dumber than we realize or that much of what we do day to day is not that deep. Like,
link |
01:21:03.600
we're just, we're just kind of going with the flow. We're saying whatever feels like the natural
link |
01:21:08.160
thing to say next. Not a lot of it is, is, is creative or meaningful or intentional. But enough
link |
01:21:16.560
is that we actually get, we get by, right? And we do come up with new ideas sometimes and we do
link |
01:21:22.000
manage to talk each other into things sometimes. And we do sometimes vote for reasonable people
link |
01:21:26.720
sometimes. But, but it's really hard to see in the statistics because so much of what we're saying
link |
01:21:33.280
is kind of rote. And so our metrics that we use to measure how these systems are doing,
link |
01:21:38.640
don't reveal that because it's, it's, it's in the interstices that, that is very hard to detect.
link |
01:21:45.840
But is your, do you have an intuition that with these language models, if they grow in size,
link |
01:21:52.240
it's already surprising that when you go from GPT-2 to GPT-3, there is a noticeable
link |
01:21:57.840
improvement. So the question now goes back to the ominous David Silver and the ceiling.
link |
01:22:02.320
Right. So maybe there's just no ceiling. We just need more compute. Now,
link |
01:22:05.680
I mean, okay. So now I'm speculating as opposed to before when I was completely on firm ground.
link |
01:22:10.800
Yeah. All right. I don't believe that you can get something that really can do language and use
link |
01:22:17.840
language as a thing that doesn't interact with people. Like, I think that it's not enough to just
link |
01:22:23.520
take everything that we've said written down and just say, that's enough. You can just learn from
link |
01:22:27.920
that and you can be intelligent. I think you really need to be pushed back at. I think that
link |
01:22:33.360
conversations, even people who are pretty smart, maybe the smartest things that we know of, maybe
link |
01:22:38.080
not the smartest thing we can imagine, but we get so much benefit out of talking to each other
link |
01:22:44.080
and interacting. That's presumably why you have conversations live with guests is that, that there's
link |
01:22:49.600
something in that interaction that would not be exposed by, oh, I'll just write you a story and
link |
01:22:54.480
then you can read it later. And I think, I think because these systems are just learning from our
link |
01:22:58.720
stories, they're not learning from being pushed back at by us, that they're fundamentally limited
link |
01:23:04.400
into what they could actually become on this route. They have to, they have to get, you know,
link |
01:23:09.760
shut down. They have to have an argument with us and lose
link |
01:23:14.640
a couple of times before they start to realize, oh, okay, wait, there's some nuance here that
link |
01:23:19.920
actually matters. Yeah, that's actually subtle sounding, but quite profound, that the
link |
01:23:26.880
interaction with humans is, is essential. And the limitation within that is
link |
01:23:34.320
profound as well, because the time scale, like the bandwidth at which you can really interact
link |
01:23:40.400
with humans is very low. So it's costly. So you can't, one of the underlying things about self
link |
01:23:47.120
play is, it has to do, you know, a very large number of interactions. And so you can't really deploy
link |
01:23:54.800
reinforcement learning systems into the real world to interact, like you couldn't deploy a language
link |
01:24:00.400
model into the real world to interact with humans, because it would just not get enough data relative
link |
01:24:07.360
to the cost it takes to interact, like the time of humans is, is expensive, which is really
link |
01:24:13.200
interesting. That's, that's good, that takes us back to reinforcement learning and trying to figure out
link |
01:24:17.680
if there's ways to make algorithms that are more efficient at learning, keep the spirit of reinforcement
link |
01:24:23.920
learning and become more efficient. In some sense, this seems to be the goal. I'd love to hear what
link |
01:24:29.200
your thoughts are. I don't know if you got the chance to see it. The blog post called The Bitter
link |
01:24:34.560
Lesson. Oh, yes. By Rich Sutton, that makes an argument, hopefully I can summarize it perhaps,
link |
01:24:42.080
perhaps you can. Yeah, but good. Okay. So I mean, I could try and you can correct me, which is,
link |
01:24:48.480
he makes an argument that it seems if we look at the long arc of the history of the artificial
link |
01:24:53.600
intelligence field, call it, you know, 70 years, that the algorithms from which we've seen the biggest
link |
01:25:00.800
improvements in practice are the very simple, like dumb algorithms that are able to leverage
link |
01:25:07.120
computation. And you just wait for the computation to improve, like all of the academics and so on
link |
01:25:13.120
have fun by finding little tricks and, and congratulate themselves on those tricks. And
link |
01:25:17.440
sometimes those tricks can be like big, that feel in the moment, like big spikes and breakthroughs,
link |
01:25:22.560
but in reality, over the decades, it's still the same dumb algorithm that just waits for the
link |
01:25:29.120
compute to get faster and faster. Do you find that to be an interesting argument against the
link |
01:25:36.720
entirety of the field of machine learning as an academic discipline? That we're really just a
link |
01:25:41.600
subfield of computer architecture. Yeah. We're just kind of waiting around for them to do their
link |
01:25:46.000
next thing. We really don't want to do hardware work. So like, that's right. I really don't want
link |
01:25:49.280
to, I don't want to think about hardware. We're procrastinating. Yes, that's right. Just waiting
link |
01:25:52.480
for them to do their job so that we can pretend to have done ours. So, yeah, I mean, the argument
link |
01:25:58.240
reminds me a lot of, I think it was a Fred Jelinek quote, an early computational linguist, who said,
link |
01:26:04.320
you know, we're building these computational linguistic systems. And every time we fire a
link |
01:26:09.600
linguist, performance goes up by 10%. Something like that. And so the idea of us building the
link |
01:26:14.640
knowledge in, in that, in that case, was much less, he was finding it to be much less successful
link |
01:26:20.880
than get rid of the people who know about language as a, you know, from a kind of
link |
01:26:27.280
scholastic academic kind of perspective and replace them with more compute.
link |
01:26:32.480
And so I think this is kind of a modern version of that story, which is, okay, we want to do better
link |
01:26:36.320
on machine vision. You could build in all these, you know, motivated, part based models that,
link |
01:26:44.960
you know, that just feel like obviously the right thing that you have to have, or we can throw a
link |
01:26:49.040
lot of data at it, and guess what, we're doing better with it, with a lot of data. So I hadn't
link |
01:26:55.280
thought about it until this moment in this way. But what I believe, well, I've thought about what
link |
01:26:59.680
I believe. What I believe is that, you know, compositionality and what's the right way to
link |
01:27:08.240
say it, the complexity grows rapidly as you consider more and more possibilities, like
link |
01:27:14.720
explosively. And so far, Moore's law has also been growing explosively, exponentially. And so,
link |
01:27:21.200
so it really does seem like, well, we don't have to think really hard about the algorithm design
link |
01:27:27.360
or the way that we build the systems, because the best benefit we could get is exponential,
link |
01:27:32.560
and the best benefit that we can get from waiting is exponential, so we can just wait.
link |
01:27:38.000
It's got, that's got to end, right? And there's hints now that Moore's law is starting to feel
link |
01:27:43.200
some friction, starting to, the world is pushing back a little bit. One thing I don't know,
link |
01:27:50.000
lots of people know this, I didn't know this, I was trying to write an essay. And
link |
01:27:54.240
yeah, Moore's law has been amazing, and it's been, it's enabled all sorts of things. But there's a,
link |
01:27:59.200
there's also a kind of counter Moore's law, which is that the development cost for each
link |
01:28:03.680
successive generation of chips also is doubling. So it's costing twice as much money. So the amount
link |
01:28:09.680
of development money per cycle or whatever is actually sort of constant. And at some point,
link |
01:28:15.680
we run out of money. So, or we have to come up with an entirely different way of doing the
link |
01:28:20.720
development process. So like, I guess I'm always, always a bit skeptical of the, look, it's an
link |
01:28:26.080
exponential curve, therefore it has no end. Soon the number of people going to NeurIPS will be
link |
01:28:30.640
greater than the population of the earth. That means we're going to discover life on other planets.
link |
01:28:35.280
No, it doesn't. It means that we're in a, in a sigmoid curve on the front half, which looks
link |
01:28:40.640
a lot like an exponential. The second half is going to look a lot like diminishing returns.
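A quick numeric illustration of that point, with arbitrary constants chosen purely to make the comparison visible:

```python
# On its "front half" a logistic curve is nearly indistinguishable from an exponential;
# only later does it flatten out toward its ceiling.
import math

def exponential(t):
    return math.exp(t)

def logistic(t, ceiling=1000.0):
    return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-t))  # starts near 1, saturates at `ceiling`

for t in range(0, 13, 2):
    print(t, round(exponential(t), 1), round(logistic(t), 1))
# Early on the two columns track each other closely; later the logistic flattens toward 1000
# while the exponential keeps exploding.
```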
link |
01:28:45.440
Yeah. The, I mean, but the interesting thing about Moore's law, if you actually like, look at the
link |
01:28:50.640
technologies involved, it's hundreds, not thousands of S curves stacked on top of each other. It's
link |
01:28:56.720
not actually an exponential curve. It's constant breakthroughs. And then what becomes useful to
link |
01:29:03.600
think about, which is exactly what you're saying, the cost of development, like the size of teams,
link |
01:29:07.920
the amount of resources that are invested in continuing to find new S curves, new breakthroughs.
link |
01:29:13.600
And yeah, it's a, it's an interesting idea. You know, if we live in the moment, if we sit here
link |
01:29:21.360
today, it seems to be the reasonable thing to say that exponentials end. And yet in the software
link |
01:29:30.800
realm, they just keep appearing to be happening. And it's so, I mean, it's so hard to disagree
link |
01:29:39.520
with Elon Musk on this because it like, I've, you know, I used to be one of those folks. I'm
link |
01:29:48.000
still one of those folks. I've studied autonomous vehicles. This is what I worked on. And, and
link |
01:29:53.120
it's, it's like, you look at what Elon Musk is saying about autonomous vehicles, well, obviously
link |
01:29:57.920
in a couple of years or in a year or, or next month, we'll have fully autonomous vehicles.
link |
01:30:03.040
Like there's no reason why we can't; driving is pretty simple. Like it's just a learning problem.
link |
01:30:07.840
And you just need to convert all the driving that we're doing into data and just have a neural
link |
01:30:12.720
network train on that data. And like we use only our eyes. So you can use cameras and
link |
01:30:18.880
you can train on it. And it's like, yeah, that, that should work. And then you put that
link |
01:30:27.120
hat on, like the philosophical hat. But then you put the pragmatic hat on, and it's like, this is
link |
01:30:31.840
what the flaws of computer vision are like, this is what it means to train at scale. And then you
link |
01:30:36.640
you put the human factors, the psychology hat on, which is like, driving actually involves a lot of,
link |
01:30:43.440
the cognitive science or cognitive, whatever the heck you call it; it's really hard. It's much
link |
01:30:48.400
harder to drive than, than we realize; there's a much larger number of edge cases. So building up
link |
01:30:53.840
an intuition around this, around exponentials, is really difficult. And on top of that, the pandemic
link |
01:31:01.680
is making us think about exponentials, making us realize that like, we don't understand anything
link |
01:31:07.760
about it. We're not able to intuit exponentials. We're either ultra terrified, some part of the
link |
01:31:14.160
population and some part is like the opposite of that, whatever, indifferent, carefree. And we're not
link |
01:31:22.480
managing it. Blasé. Blasé. Well, wow, that's French. So it's got an accent. So it's, it's, it's
link |
01:31:31.360
fascinating to think what, what the limits of this exponential growth of technology, not just
link |
01:31:41.360
Moore's law, it's technology, how that rubs up against the bitter lesson and GPT-3 and self
link |
01:31:52.640
play mechanisms, that is not obvious. I used to be much more skeptical about neural networks,
link |
01:31:58.160
now I at least allow a sliver of possibility that we'll all, that we'll be very much surprised.
link |
01:32:04.400
And also, you know, caught in a way that like, we are not prepared for, like in applications of
link |
01:32:17.600
social networks, for example, because it feels like really good transformer models that are able
link |
01:32:24.400
to do some kind of like very good natural language generation of the same kind of models that could
link |
01:32:31.360
be used to learn human behavior and then manipulate that human behavior to gain advertiser dollars
link |
01:32:37.680
and all those kinds of things. Sure. To feed the capitalist system. And they arguably already
link |
01:32:43.280
are manipulating human behavior. Yeah. So, but not for self preservation, which I think is a big,
link |
01:32:51.120
that would be a big step. Like if they were trying to manipulate us to convince us not to
link |
01:32:55.040
shut them off, I would be very freaked out. But I don't see a path to that from where we are now.
link |
01:33:02.560
They don't have any of those abilities. That's not what they're trying to do. They're trying to
link |
01:33:08.080
keep people on the site. But see, the thing is this, this is the thing about life on earth is
link |
01:33:13.360
they might be borrowing our consciousness and sentience. Like, so like in a sense, they do
link |
01:33:20.880
because the creators of the algorithms have like, they're not, you know, if you look at our body,
link |
01:33:26.240
yeah, okay, we're not a single organism, we're a huge number of organisms with like tiny little
link |
01:33:30.960
motivations, we're built on top of each other. In the same sense, the AI algorithms, they're part of,
link |
01:33:36.480
like a system that includes human companies and corporations, right? Because corporations are
link |
01:33:41.120
funny organisms in and of themselves that really do seem to have self preservation built in. And I
link |
01:33:45.840
think that's at the, at the design level. I think they're designed to have self preservation be a focus.
link |
01:33:52.400
So you're right, in that, in that broader system that we're also a part of and can have some influence
link |
01:33:59.680
on, it's, it's, it is much more complicated, much more powerful. Yeah, I agree with that.
link |
01:34:06.720
So people really love it when I ask, what three books, technical, philosophical fiction had a
link |
01:34:13.600
big impact on your life, maybe you can recommend, we went with movies, we went with Billy Joel,
link |
01:34:21.120
and I forgot what you, what music you recommended, but I didn't, I just said I have no taste in music,
link |
01:34:26.400
I just like pop music. That was actually really skillful the way you avoided that question.
link |
01:34:31.040
Thanks, I was, I'm going to try to do the same with the books.
link |
01:34:34.560
So do you have a skillful way to avoid answering the question about three books you would recommend?
link |
01:34:39.600
I'd like to tell you a story. So my first job out of college was at Bellcore, I mentioned that
link |
01:34:46.240
before, where I worked with Dave Ackley, the head of the group was a guy named Tom Landauer,
link |
01:34:50.080
and I don't know how well known he is now, but arguably he's the, he's the inventor and the
link |
01:34:56.320
first proselytizer of word embeddings. So they, they developed a system shortly before I got to the
link |
01:35:01.920
group. Yeah, that, that was called latent semantic analysis, that would take words of English and
link |
01:35:09.280
embed them in, you know, multi hundred dimensional space, and then use that as a way of, you know,
link |
01:35:15.040
assessing similarity and basically doing reinforcement learning, not sorry, not reinforcement,
link |
01:35:18.800
information retrieval, you know, sort of pre Google information retrieval.
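Here is a minimal sketch of the latent semantic analysis idea being described: build a term-document count matrix, take a truncated SVD, and compare words by cosine similarity of their low-dimensional vectors. The toy corpus and the choice of two dimensions are illustrative; the real system used large corpora and hundreds of dimensions.

```python
# Minimal LSA sketch: term-document counts -> truncated SVD -> word similarity.
import numpy as np

docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

vocab = sorted({word for doc in docs for word in doc.split()})
counts = np.array([[doc.split().count(word) for doc in docs] for word in vocab])  # terms x docs

# Truncated SVD: keep the top k singular vectors as the word embedding space.
U, S, Vt = np.linalg.svd(counts.astype(float), full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]   # each row is a word's k-dimensional representation

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

cat, dog, stocks = vocab.index("cat"), vocab.index("dog"), vocab.index("stocks")
print(cosine(word_vectors[cat], word_vectors[dog]))     # words from the same topic: high
print(cosine(word_vectors[cat], word_vectors[stocks]))  # words from different topics: low
```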
link |
01:35:23.360
And he was trained as an anthropologist, but then became a cognitive scientist,
link |
01:35:29.600
I was in the cognitive science research group. It's, you know, like I said,
link |
01:35:32.720
I'm a cognitive science groupie. At the time, I thought I'd become a cognitive scientist,
link |
01:35:36.960
but then I realized in that group, no, I'm a computer scientist, but I'm a computer scientist
link |
01:35:41.040
who really loves to hang out with cognitive scientists. And he said, he studied language
link |
01:35:46.640
acquisition in particular, he said, you know, humans have about this number of words of vocabulary,
link |
01:35:52.640
and most of that is learned from reading. And I said, that can't be true, because I have a really
link |
01:35:58.080
big vocabulary, and I don't read. He's like, you must. I'm like, I don't think I do. I mean,
link |
01:36:03.120
like stop signs, I definitely read stop signs. But like reading books is not, is not a thing
link |
01:36:08.080
that I do a lot. Really, though, it might be just, maybe the red color. Do I read stop signs?
link |
01:36:14.000
Yeah. No, it's just pattern recognition at this point. I don't sound it out.
link |
01:36:19.360
Stop. So now I do, I wonder what that, oh yeah, stop the guns. So,
link |
01:36:25.280
that's fascinating. So you don't, so I don't read very, I mean, obviously, I read and I've read,
link |
01:36:30.080
I've read plenty of books. But like some people like Charles, my friend Charles and others,
link |
01:36:35.760
like a lot of people in my field, a lot of academics, like reading was really a central
link |
01:36:40.560
topic to them in development. And I'm not that guy. In fact, I used to joke that when I got into
link |
01:36:48.400
college, that it was on kind of a help out the illiterate kind of program, because I got to
link |
01:36:54.880
college, like in my house, I wasn't a particularly bad or good reader. But when I got to college,
link |
01:36:58.560
I was surrounded by these people that were just voracious in their reading appetite.
link |
01:37:03.520
And they would like, have you read this? Have you read this? Have you read this? And I'd be like,
link |
01:37:07.120
no, I'm clearly not qualified to be at this school. Like there's no way I should be here.
link |
01:37:11.600
Now I've discovered books on tape, like audio books. And so I'm much better. I'm more caught up,
link |
01:37:18.160
I read a lot of books. A small tangent on that. It is a fascinating open question to me
link |
01:37:24.480
on the topic of driving, whether, you know, supervised learning people, machine learning
link |
01:37:31.440
people think you have to like drive to learn how to drive. To me, it's very possible that just by
link |
01:37:38.720
us humans, by first of all, walking, but also by watching other people drive, not even being inside
link |
01:37:45.040
cars as a passenger, but let's say being inside the cars as a passenger, but even just like being
link |
01:37:50.960
a pedestrian and crossing the road, you learn so much about driving from that. It's very possible
link |
01:37:57.680
that you can, without ever being inside of a car, be okay at driving once you get in it.
link |
01:38:04.240
Or like watching a movie, for example, I don't know, something like that.
link |
01:38:07.440
Have you, have you taught anyone to drive? No.
link |
01:38:11.840
So I have two children and I learned a lot about car driving because my wife doesn't want to be the
link |
01:38:20.080
one in the car while they're learning. So that's my job. So I sit in the passenger seat and it's
link |
01:38:24.800
really scary. I have, you know, I have wishes to live and they're, you know, they're figuring things
link |
01:38:32.000
out. Now, they start off very, very much better than I imagine like a neural network would, right?
link |
01:38:39.600
They get that they're seeing the world. They get that there's a road that they're trying to be on.
link |
01:38:43.920
They get that there's a relationship between the angle of the steering, but it takes a while to
link |
01:38:48.080
not be very jerky. And so that happens pretty quickly. Like the ability to stay in lane at
link |
01:38:54.240
speed, that happens relatively fast. It's not zero shot learning, but it's pretty fast.
link |
01:39:00.000
The thing that's remarkably hard, and this is I think partly why self driving cars are really hard,
link |
01:39:04.640
is the degree to which driving is a social interaction activity. And that blew me away.
link |
01:39:10.160
I was completely unaware of it until I watched my son learning to drive. And I was realizing
link |
01:39:15.920
that he was sending signals to all the cars around him. And those, in his case, he's,
link |
01:39:20.800
he's always had social communication challenges. He was sending very mixed confusing signals to
link |
01:39:28.160
the other cars. And that was causing the other cars to drive weirdly and erratically. And there
link |
01:39:32.640
was no question in my mind that he would, he would have an accident because they didn't know how to
link |
01:39:38.720
read him. There's things you do with the speed that you drive, the positioning of your car,
link |
01:39:43.520
that you're constantly like in the head of the other drivers. And seeing him not knowing how to
link |
01:39:50.160
do that and having to be taught explicitly, okay, you have to be thinking about what the other driver
link |
01:39:54.480
is thinking was a revelation to me. I was stunned. It's creating a kind of theory of mind of the
link |
01:40:02.640
other. The theory of mind of the other cars. Yeah. Yeah. Which I just hadn't heard discussed in
link |
01:40:07.200
the self driving car talks that I've been to. Since then, there are some people who do
link |
01:40:11.600
consider those kinds of issues, but it's way more subtle, I think. There's a little bit of work
link |
01:40:16.960
involved with that. When you realize, like when you especially focus not on other cars, but on
link |
01:40:21.520
pedestrians, for example, it's literally staring you in the face. Yeah. So then when you're just
link |
01:40:27.600
like, how do I interact with pedestrians? You have pedestrians, you're practically talking to
link |
01:40:32.560
an octopus at that point. They've got all these weird degrees of freedom. You don't know what
link |
01:40:35.680
they're going to do. They can turn around any second. But the point is, we humans know what
link |
01:40:39.760
they're going to do. We have a good theory of mind. We have a good mental model of what they're
link |
01:40:45.280
doing. And we have a good model of the model they have of you and the model of the model of the
link |
01:40:50.800
model. We're able to kind of reason about this kind of the social game of it. The hope is that
link |
01:41:00.160
it's quite simple actually, that it could be learned. That's why I just talked to Waymo.
link |
01:41:05.280
I don't know if you know of that company. It's Google's self driving car company. I talked to their CTO
link |
01:41:11.120
on this podcast and rode in their car, and it's quite aggressive and it's quite fast
link |
01:41:17.680
and it's good and it feels great. Also, just like Tesla, Waymo made me change my mind about
link |
01:41:24.560
maybe driving is easier than I thought. Maybe I'm just being speciesist, human centered. Maybe...
link |
01:41:32.400
It's a speciesist argument. Yes, I don't know. But it's fascinating to think about, like the same
link |
01:41:41.120
as with reading, which I think you just said. You avoided the question. I still hope you answer it at
link |
01:41:46.400
some point. You avoided it brilliantly. There are blind spots that
link |
01:41:52.560
artificial intelligence researchers have about what it actually takes to learn to solve a problem.
link |
01:41:58.640
That's fascinating. Have you had Anca Dragan on? Yes. She's one of my favorites. So much energy.
link |
01:42:04.960
She's amazing. Fantastic. And in particular, she thinks a lot about this kind of...
link |
01:42:10.240
I know that you know that I know kind of planning. And the last time I spoke with her,
link |
01:42:14.720
she was very articulate about the ways in which self driving cars are not solved,
link |
01:42:19.920
like what's still really, really hard. But even her intuition is limited. We're all new to this.
link |
01:42:26.000
So in some sense, the Elon Musk approach of being ultra confident and just like plowing...
link |
01:42:30.160
Put it out there. Put it out there. Like some people say it's reckless and dangerous and so on.
link |
01:42:35.280
But partly it seems to be one of the only ways to make progress in artificial intelligence.
link |
01:42:44.240
These are difficult things. Democracy is messy. Implementation of artificial
link |
01:42:50.960
intelligence systems in the real world is messy. So many years ago, before self driving cars were
link |
01:42:56.400
an actual thing you could have a discussion about, somebody asked me, what if we could
link |
01:43:01.440
use that robotic technology and use it to drive cars around? Aren't people going to be killed?
link |
01:43:08.400
That's not what's going to happen, I said with confidence. Incorrectly, obviously.
link |
01:43:13.200
What I think is going to happen is we're going to have a lot more like a very gradual kind of
link |
01:43:16.960
rollout where people have these cars in like closed communities, where it's somewhat realistic,
link |
01:43:24.240
but it's still in a box so that we can really get a sense of what are the weird things that
link |
01:43:29.840
can happen? How do we have to change the way we behave around these vehicles? It obviously requires
link |
01:43:37.920
a kind of co evolution that you can't just plop them in and see what happens.
link |
01:43:42.480
But of course, we're basically plopping them in and seeing what happens. So I was wrong,
link |
01:43:45.280
but I do think that would have been a better plan.
link |
01:43:48.480
But your intuition is funny, just zooming out and looking at the forces of capitalism.
link |
01:43:54.000
And it seems that capitalism rewards risk takers, rewards and punishes the risk takers who
link |
01:44:02.640
try it out. The academic approach is to try a small thing and try to understand slowly the
link |
01:44:13.200
fundamentals of the problem. And let's start with one and do two and then see that and then do the
link |
01:44:19.040
three. The capitalist, startup, entrepreneurial dream is let's build a thousand and let's...
link |
01:44:26.640
Right. And 500 of them fail, but whatever, the other 500, we learn from them.
link |
01:44:30.640
But if you're good enough, I mean, one thing is like your intuition would say like,
link |
01:44:34.800
that's going to be hugely destructive to everything. But actually it's kind of the forces
link |
01:44:41.600
of capitalism. People are quite... It's easy to be critical, but if you actually look at the data
link |
01:44:46.880
on the way our world has progressed in terms of the quality of life, it seems like the competent,
link |
01:44:52.480
good people rise to the top. This is coming from me from the Soviet Union and so on.
link |
01:44:58.400
It's interesting that somebody like Elon Musk is the way you push progress in artificial
link |
01:45:07.440
intelligence. Like it's forcing Waymo to step their stuff up and Waymo is forcing Elon Musk
link |
01:45:15.760
to step up. It's fascinating because I have this tension in my heart and just being upset by
link |
01:45:24.320
the lack of progress in autonomous vehicles within academia. So there was huge progress
link |
01:45:31.360
in the early days of the DARPA challenges. And then it just kind of stopped at MIT,
link |
01:45:38.560
but it's true everywhere else with the exception of a few sponsors here and there. It's not seen
link |
01:45:47.120
as a sexy problem. The moment artificial intelligence starts approaching the problems
link |
01:45:54.080
of the real world, like academics kind of like, all right, let the...
link |
01:45:59.360
Because they get really hard in a different way.
link |
01:46:01.760
In a different way. That's right.
link |
01:46:03.120
I think some of us are not excited about that other way.
link |
01:46:07.040
But I still think there are fundamental problems to be solved in those difficult things. It's
link |
01:46:12.960
still publishable, I think. We just need to... It's the same criticism you could have of all
link |
01:46:17.680
these conferences, in NeurIPS, in CVPR, where application papers are often as powerful and
link |
01:46:24.320
as important as theory papers, and yet theory just seems much more respectable and so on.
link |
01:46:31.120
I mean, the machine learning community is changing that a little bit, at least in statements,
link |
01:46:35.280
but it's still not seen as the sexiest of pursuits, which is like, how do I actually
link |
01:46:41.360
make this thing work in practice as opposed to on this toy dataset?
link |
01:46:46.960
All that to say, are you still avoiding the three books question? Is there something on
link |
01:46:51.680
audiobook that you can recommend? Oh, yeah. I mean, yeah, I've read a lot of really fun stuff.
link |
01:46:58.640
In terms of books that I find myself thinking back on that I read a while ago,
link |
01:47:03.200
like that have stood the test of time to some degree, I find myself thinking of
link |
01:47:07.360
Program or Be Programmed a lot, by Douglas Rushkoff, which was... It basically put out the premise
link |
01:47:15.680
that we all need to become programmers in one form or another. It was in analogy to,
link |
01:47:23.200
once upon a time, we all had to become readers. We had to become literate. There was a time before
link |
01:47:28.480
that when not everybody was literate, but once literacy was possible, the people who were literate
link |
01:47:32.800
had more of a say in society than the people who weren't. We made a big effort to get everybody
link |
01:47:39.040
up to speed and now it's not 100% universal, but it's quite widespread. The assumption
link |
01:47:45.600
is generally that people can read. The analogy that he makes is that programming is a similar
link |
01:47:50.960
kind of thing, that we need to have a say in... Right. Being a reader, being literate, being a
link |
01:47:59.120
reader means you can receive all this information, but you don't get to put it out there. Programming
link |
01:48:04.720
is the way that we get to put it out there. That was the argument he made. I think he
link |
01:48:08.160
specifically has now backed away from this idea. He doesn't think it's happening quite this way.
link |
01:48:13.440
That might be true, that society didn't play out quite that way.
link |
01:48:20.640
I still believe in the premise. I still believe that at some point,
link |
01:48:23.840
we have... The relationship that we have to these machines and these networks has to be one where each
link |
01:48:28.560
individual has the wherewithal to make the machines help them do the things that that
link |
01:48:36.000
person wants done. As software people, we know how to do that. When we have a problem, we're like,
link |
01:48:41.120
okay, I'll hack up a Perl script or something and make it so. If we lived in a world where
link |
01:48:45.760
everybody could do that, that would be a better world. Computers would have, I think, less sway
link |
01:48:52.160
over us. Other people's software would have less sway over us as a group.
link |
01:48:57.360
Yeah. In some sense, software engineering, programming, is power.
link |
01:49:00.720
Programming is power. Right. Yeah. It's like magic. It's like magic spells. It's not out of reach
link |
01:49:07.840
of everyone, but at the moment, it's just a sliver of the population who can
link |
01:49:12.800
commune with machines in this way. I don't know. That book had a big impact on me.
link |
01:49:18.400
Currently, I'm reading The Alignment Problem actually by Brian Christian. I don't know if
link |
01:49:22.480
you've seen this out there yet. Is it similar to Stuart Russell's work with the control problem?
link |
01:49:26.880
It's in that same general neighborhood. They have different
link |
01:49:30.640
emphases that they're concentrating on. I think Stuart's book did a remarkably good job.
link |
01:49:36.240
Like just a celebratory good job at describing AI technology and how it works. I thought that
link |
01:49:43.520
was great. It was really cool to see that in a book. I think he has some experience writing
link |
01:49:48.400
some books, so that's probably part of it. He's maybe thought a thing or two about how to explain
link |
01:49:54.800
AI to people. Yeah. Yeah. That's a really good point. This book so far has been remarkably good
link |
01:50:00.640
at telling the story of the recent history of some of the things that have happened.
link |
01:50:08.240
I'm in the first third. He said this book is in three thirds. The first third is essentially AI
link |
01:50:13.760
fairness and implications of AI on society that we're seeing right now. That's been great. He's
link |
01:50:19.920
telling those stories really well. He went out and talked to the frontline people whose names
link |
01:50:24.960
were associated with some of these ideas. It's been terrific. He says the second half of the book
link |
01:50:29.280
is on reinforcement learning. Maybe that'll be fun. Then the third half, third third,
link |
01:50:36.320
is on the superintelligence alignment problem. I suspect that that part will be less fun for
link |
01:50:43.360
me to read. Yeah. It's an interesting problem to talk about. I find it to be the most interesting
link |
01:50:50.640
just like thinking about whether we live in a simulation or not as a thought experiment to
link |
01:50:56.480
think about our own existence. In the same way, talking about the alignment problem with AGI is a
link |
01:51:02.400
good way to think, similar to the trolley problem with autonomous vehicles. It's a useless thing
link |
01:51:07.600
for engineering, but it's a nice little thought experiment for actually thinking about our own
link |
01:51:14.400
human ethical systems, our moral systems. By thinking about how we engineer these things,
link |
01:51:22.960
you start to understand yourself. SciFi can be good at that too. One sci fi book to recommend
link |
01:51:28.960
is Exhalation by Ted Chiang, a bunch of short stories. Ted Chiang is the guy who wrote the
link |
01:51:35.200
short story that became the movie Arrival. All of his stories, just from a, he was a computer
link |
01:51:42.720
scientist. Actually, he studied at Brown. They all have this really insightful bit of science
link |
01:51:49.920
or computer science that drives them. It's just a romp. He creates these artificial worlds by
link |
01:51:58.320
extrapolating on these ideas that we know about, but hadn't really thought through to this kind of
link |
01:52:03.360
conclusion. His stuff is, it's really fun to read. It's mind warping. I'm not sure if you're
link |
01:52:09.840
familiar. I seem to mention this every other word. I'm from the Soviet Union and I'm Russian.
link |
01:52:16.240
I think my roots are Russian too, but a couple of generations back.
link |
01:52:22.480
Well, it's probably in there somewhere. Maybe we can pull at that thread a little bit
link |
01:52:28.640
of the existential dread that we all feel. I think somewhere in the conversation,
link |
01:52:34.000
you mentioned that you don't really pretty much like dying. I forget in which context.
link |
01:52:39.360
It might have been a reinforcement learning perspective. I don't know.
link |
01:52:42.160
I know what it was. It was in teaching my kids to drive. That's how you face your mortality.
link |
01:52:48.560
Yes. From a human being's perspective or from a reinforcement learning researcher's perspective,
link |
01:52:55.200
let me ask you the most absurd question. What do you think is the meaning of this whole thing,
link |
01:53:01.440
the meaning of life on this spinning rock? I mean, I think reinforcement learning researchers
link |
01:53:08.880
maybe think about this from a science perspective more often than a lot of other people. As a
link |
01:53:13.680
supervised learning person, you're probably not thinking about the sweep of a lifetime,
link |
01:53:18.320
but reinforcement learning agents are having little lifetimes, little weird little lifetimes,
link |
01:53:22.800
and it's hard not to project yourself into their world sometimes.
link |
01:53:28.080
As far as the meaning of life, when I turned 42, you may know from, that is a book I read,
link |
01:53:34.720
The Hitchhiker's Guide to the Galaxy, that that is the meaning of life. When I turned 42,
link |
01:53:41.120
I had a meaning of life party where I invited people over and everyone shared their meaning
link |
01:53:48.320
of life. We had slides made up and so we all sat down and did a slide presentation to each other
link |
01:53:54.960
about the meaning of life. Mine was balance. I think that life is balance.
link |
01:54:01.680
And so the activity at the party, for a 42 year old, maybe this is a little bit nonstandard,
link |
01:54:08.960
but I found all the little toys and devices that I had where you had to balance on them.
link |
01:54:13.440
You had to stand on it and balance, or a pogo stick. I brought a RipStik, which is like a weird
link |
01:54:19.920
two wheeled skateboard. I got a Unicycle, but I didn't know how to do it. I now can do it.
link |
01:54:27.840
I'd love to watch you try. Yeah, I'll send you a video. I'm not great, but I managed.
link |
01:54:35.360
And so balance, yeah. So my wife has a really good one that she sticks to and is probably
link |
01:54:42.960
pretty accurate and it has to do with healthy relationships with people that you love and
link |
01:54:49.200
working hard for good causes. But to me, yeah, balance, balance in a word. That works for me.
link |
01:54:55.360
Not too much of anything because too much of anything is iffy.
link |
01:54:59.600
That feels like a Rolling Stones song. I feel like there must be.
link |
01:55:03.280
You can't always get what you want, but if you try sometimes, you can strike a balance.
link |
01:55:09.520
Yeah, I think that's how it goes. I'll write you a parody.
link |
01:55:14.640
It's a huge honor to talk to you. This is really fun. I've been a big fan of yours, so
link |
01:55:18.880
I can't wait to see what you do next in the world of education, the world of parody,
link |
01:55:27.040
in the world of reinforcement learning. Thanks for talking to me. My pleasure.
link |
01:55:30.640
Thank you for listening to this conversation with Michael Littman and thank you to our sponsors.
link |
01:55:35.040
SimpliSafe, a home security company I use to monitor and protect my apartment,
link |
01:55:40.160
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet,
link |
01:55:44.960
Masterclass, online courses that I enjoy from some of the most amazing humans in history,
link |
01:55:51.360
and BetterHelp, online therapy with a licensed professional.
link |
01:55:55.440
Please check out these sponsors in the description to get a discount and to support this podcast.
link |
01:56:00.720
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcast,
link |
01:56:05.680
follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
link |
01:56:11.440
And now, let me leave you with some words from Groucho Marx. If you're not having fun,
link |
01:56:17.680
you're doing something wrong. Thank you for listening and hope to see you next time.