Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144



link |
00:00:00.000
The following is a conversation with Michael Littman, a computer science professor at Brown
link |
00:00:04.560
University doing research on and teaching machine learning, reinforcement learning,
link |
00:00:10.320
and artificial intelligence. He enjoys being silly and lighthearted in conversation,
link |
00:00:16.400
so this was definitely a fun one. Quick mention of each sponsor,
link |
00:00:20.640
followed by some thoughts related to the episode. Thank you to SimpliSafe, a home security company
link |
00:00:26.800
I use to monitor and protect my apartment, ExpressVPN, the VPN I've used for many years
link |
00:00:32.480
to protect my privacy on the internet, MasterClass, online courses that I enjoy from
link |
00:00:38.000
some of the most amazing humans in history, and BetterHelp, online therapy with a licensed
link |
00:00:43.760
professional. Please check out these sponsors in the description to get a discount and to support
link |
00:00:49.200
this podcast. As a side note, let me say that I may experiment with doing some solo episodes
link |
00:00:55.440
in the coming month or two. The three ideas I have floating in my head currently are to use, one,
link |
00:01:02.720
a particular moment in history, two, a particular movie, or three, a book to drive a conversation
link |
00:01:10.240
about a set of related concepts. For example, I could use 2001: A Space Odyssey or Ex Machina
link |
00:01:17.120
to talk about AGI for one, two, three hours. Or I could do an episode on the, yes, rise and fall of
link |
00:01:26.000
Hitler and Stalin, each in a separate episode, using relevant books and historical moments
link |
00:01:32.560
for reference. I find the format of a solo episode very uncomfortable and challenging,
link |
00:01:38.800
but that just tells me that it's something I definitely need to do and learn from the experience.
link |
00:01:44.080
Of course, I hope you come along for the ride. Also, since we have all this momentum built up
link |
00:01:49.280
on announcements, I'm giving a few lectures on machine learning at MIT this January.
link |
00:01:54.240
In general, if you have ideas for the episodes, for the lectures, or for just short videos on
link |
00:02:01.600
YouTube, let me know in the comments that I still definitely read, despite my better judgment,
link |
00:02:10.080
and the wise sage advice of the great Joe Rogan. If you enjoy this thing, subscribe on YouTube,
link |
00:02:17.200
review it with five stars on Apple Podcast, follow on Spotify, support on Patreon, or connect
link |
00:02:22.400
with me on Twitter at Lex Fridman. And now, here's my conversation with Michael Littman.
link |
00:02:29.920
I saw a video of you talking to Charles Isbell about Westworld, the TV series. You guys were
link |
00:02:35.680
doing the kind of thing where you're watching new things together, but let's rewind back.
link |
00:02:41.360
Is there a sci-fi movie or book or show that was profound, that had an impact on you philosophically,
link |
00:02:50.560
or just specifically something you enjoyed nerding out about?
link |
00:02:55.200
Yeah, interesting. I think a lot of us have been inspired by robots in movies. One that I really
link |
00:03:00.640
like is, there's a movie called Robot and Frank, which I think is really interesting because it's
link |
00:03:05.760
very near term future, where robots are being deployed as helpers in people's homes. And we
link |
00:03:15.200
don't know how to make robots like that at this point, but it seemed very plausible. It seemed
link |
00:03:19.200
very realistic or imaginable. And I thought that was really cool because they're awkward,
link |
00:03:25.280
they do funny things that raise some interesting issues, but it seemed like something that would
link |
00:03:29.040
ultimately be helpful and good if we could do it right.
link |
00:03:31.600
Yeah, he was an older cranky gentleman, right?
link |
00:03:33.760
He was an older cranky jewel thief, yeah.
link |
00:03:36.800
It's kind of funny little thing, which is, you know, he's a jewel thief and so he pulls the
link |
00:03:42.240
robot into his life, which is like, which is something you could imagine taking a home robotics
link |
00:03:49.520
thing and pulling into whatever quirky thing that's involved in your existence.
link |
00:03:54.800
It's meaningful to you. Exactly so. Yeah. And I think from that perspective, I mean,
link |
00:04:00.000
not all of us are jewel thieves. And so when we bring our robots into our lives, it explains a
link |
00:04:05.680
lot about this apartment, actually. But no, the idea that people should have the ability to make
link |
00:04:12.400
this technology their own, that it becomes part of their lives. And I think it's hard for us
link |
00:04:18.400
as technologists to make that kind of technology. It's easier to mold people into what we need them
link |
00:04:22.720
to be. And just that opposite vision, I think, is really inspiring. And then there's an
link |
00:04:28.080
anthropomorphization where we project certain things on them, because I think the robot was
link |
00:04:32.640
kind of dumb. But I have a bunch of Roombas I play with and you immediately project stuff onto
link |
00:04:38.240
them. Much greater level of intelligence. We'll probably do that with each other too. Much greater
link |
00:04:43.920
degree of compassion. That's right. One of the things we're learning from AI is where we are
link |
00:04:47.760
smart and where we are not smart. Yeah. You also enjoy, as people can see, and I enjoyed
link |
00:04:55.760
myself watching you sing and even dance a little bit, a little bit, a little bit of dancing.
link |
00:05:02.160
A little bit of dancing. That's not quite my thing. As a method of education or just in life,
link |
00:05:08.800
you know, in general. So easy question. What's the definitive, objectively speaking,
link |
00:05:15.920
top three songs of all time? Maybe something that, you know, to walk that back a little bit,
link |
00:05:22.000
maybe something that others might be surprised by the three songs that you kind of enjoy.
link |
00:05:28.480
That is a great question that I cannot answer. But instead, let me tell you a story.
link |
00:05:32.560
So pick a question you do want to answer. That's right. I've been watching the
link |
00:05:36.480
presidential debates and vice presidential debates. And it turns out, yeah, it's really,
link |
00:05:39.440
you can just answer any question you want. So it's a related question. Well said.
link |
00:05:47.280
I really like pop music. I've enjoyed pop music ever since I was very young. So 60s music,
link |
00:05:51.760
70s music, 80s music. This is all awesome. And then I had kids and I think I stopped listening
link |
00:05:56.560
to music and I was starting to realize that my musical taste had sort of frozen out.
link |
00:06:01.440
And so I decided in 2011, I think, to start listening to the top 10 Billboard songs each week.
link |
00:06:08.240
So I'd be on the on the treadmill and I would listen to that week's top 10 songs
link |
00:06:11.920
so I could find out what was popular now. And what I discovered is that I have no musical
link |
00:06:17.280
taste whatsoever. I like what I'm familiar with. And so the first time I'd hear a song
link |
00:06:22.960
is the first week that was on the charts, I'd be like, and then the second week,
link |
00:06:26.880
I was into it a little bit. And the third week, I was loving it. And by the fourth week is like,
link |
00:06:30.640
just part of me. And so I'm afraid that I can't tell you my most favorite song of all time,
link |
00:06:36.720
because it's whatever I heard most recently. Yeah, that's interesting. People have told me that
link |
00:06:44.240
there's an art to listening to music as well. And you can start to, if you listen to a song,
link |
00:06:48.800
just carefully, like explicitly, just force yourself to really listen. You start to,
link |
00:06:54.080
I did this when I was part of jazz band and fusion band in college. You start to hear the layers
link |
00:07:01.200
of the instruments. You start to hear the individual instruments and you start to,
link |
00:07:04.720
you can listen to classical music or to orchestra this way. You can listen to jazz this way.
link |
00:07:08.240
I mean, it's funny to imagine you now to walking that forward to listening to pop hits now as like
link |
00:07:16.240
a scholar, listening to like Cardi B or something like that, or Justin Timberlake. Is he? No,
link |
00:07:22.160
not Timberlake, Bieber. They've both been in the top 10 since I've been listening.
link |
00:07:26.640
They're still up there. Oh my God, I'm so cool.
link |
00:07:29.520
If you haven't heard Justin Timberlake's top 10 in the last few years, there was one
link |
00:07:33.440
song that he did where the music video was set at essentially NeurIPS.
link |
00:07:38.720
Oh, wow. Oh, the one with the robotics. Yeah, yeah, yeah, yeah, yeah.
link |
00:07:42.400
Yeah, yeah. It's like at an academic conference and he's doing a demo.
link |
00:07:45.520
He was presenting, right?
link |
00:07:46.640
It was sort of a cross between the Apple, like Steve Jobs kind of talk and NeurIPS.
link |
00:07:51.920
Yeah.
link |
00:07:53.120
So, you know, it's always fun when AI shows up in pop culture.
link |
00:07:56.560
I wonder if he consulted somebody for that. That's really interesting. So maybe on that topic,
link |
00:08:01.840
I've seen your celebrity in multiple dimensions, but one of them is you've done cameos in different
link |
00:08:08.000
places. I've seen you in a TurboTax commercial as like, I guess, the brilliant Einstein character.
link |
00:08:16.720
And the point is that TurboTax doesn't need somebody like you. It doesn't need a brilliant
link |
00:08:23.840
person.
link |
00:08:24.340
Very few things need someone like me. But yes, they were specifically emphasizing the
link |
00:08:28.000
idea that you don't need to be like a computer expert to be able to use their software.
link |
00:08:32.080
How did you end up in that world?
link |
00:08:33.680
I think it's an interesting story. So I was teaching my class. It was an intro computer
link |
00:08:38.560
science class for non concentrators, non majors. And sometimes when people would visit campus,
link |
00:08:45.440
they would check in to say, hey, we want to see what a class is like. Can we sit on your class?
link |
00:08:48.960
So a person came to my class who was the daughter of the brother of the husband of the best friend
link |
00:09:02.800
of my wife. Anyway, basically a family friend came to campus to check out Brown and asked to
link |
00:09:11.200
come to my class and came with her dad. Her dad is, who I've known from various
link |
00:09:16.800
kinds of family events and so forth, but he also does advertising. And he said that he was
link |
00:09:21.360
recruiting scientists for this ad, this TurboTax set of ads. And he said, we wrote the ad with the
link |
00:09:31.200
idea that we get like the most brilliant researchers, but they all said no. So can you
link |
00:09:36.720
help us find like B level scientists? And I'm like, sure, that's who I hang out with.
link |
00:09:44.800
So that should be fine. So I put together a list and I did what some people call the Dick Cheney.
link |
00:09:49.840
So I included myself on the list of possible candidates, with a little blurb about each one
link |
00:09:55.040
and why I thought that would make sense for them to do it. And they reached out to a handful of
link |
00:09:59.200
them, but then they ultimately, they YouTube stalked me a little bit and they thought,
link |
00:10:03.120
oh, I think he could do this. And they said, okay, we're going to offer you the commercial.
link |
00:10:07.600
I'm like, what? So it was such an interesting experience because they have another world, the
link |
00:10:14.320
people who do like nationwide kind of ad campaigns and television shows and movies and so forth.
link |
00:10:21.760
It's quite a remarkable system that they have going because they have a set. Yeah. So I went to,
link |
00:10:28.400
it was just somebody's house that they rented in New Jersey. But in the commercial, it's just me
link |
00:10:35.680
and this other woman. In reality, there were 50 people in that room and another, I don't know,
link |
00:10:41.680
half a dozen kind of spread out around the house in various ways. There were people whose job it
link |
00:10:46.400
was to control the sun. They were in the backyard on ladders, putting filters up to try to make sure
link |
00:10:53.440
that the sun didn't glare off the window in a way that would wreck the shot. So there was like
link |
00:10:57.120
six people out there doing that. There was three people out there giving snacks, the craft table.
link |
00:11:02.160
There was another three people giving healthy snacks because that was a separate craft table.
link |
00:11:05.840
There was one person whose job it was to keep me from getting lost. And I think the reason for all
link |
00:11:12.720
this is because so many people are in one place at one time. They have to be time efficient. They
link |
00:11:16.560
have to get it done. The morning they were going to do my commercial. In the afternoon, they were
link |
00:11:20.640
going to do a commercial of a mathematics professor from Princeton. They had to get it done. No wasted
link |
00:11:27.600
time or energy. And so there's just a fleet of people all working as an organism. And it was
link |
00:11:32.320
fascinating. I was just the whole time just looking around like, this is so neat. Like one person
link |
00:11:36.880
whose job it was to take the camera off of the cameraman so that someone else whose job it was
link |
00:11:43.760
to remove the film canister. Because every couple of takes, they had to replace the film because film
link |
00:11:48.720
gets used up. It was just, I don't know. I was geeking out the whole time. It was so fun.
link |
00:11:53.520
How many takes did it take? It looked the opposite, like there was no more than two people there. It was very
link |
00:11:57.920
relaxed. Right. Yeah. The person who I was in the scene with is a professional. She's an improv
link |
00:12:06.320
comedian from New York City. And when I got there, they had given me a script as such as it was. And
link |
00:12:11.040
then I got there and they said, we're going to do this as improv. I'm like, I don't know how to
link |
00:12:15.280
improv. I don't know what you're telling me to do here. Don't worry. She knows. I'm like, okay.
link |
00:12:21.600
I'll go see how this goes. I guess I got pulled into the story because like, where the heck did
link |
00:12:26.320
you come from? I guess in the scene. Like, how did you show up in this random person's house?
link |
00:12:32.480
Yeah. Well, I mean, the reality of it is I stood outside in the blazing sun. There was someone
link |
00:12:36.320
whose job it was to keep an umbrella over me because I started to sweat. And so I would wreck
link |
00:12:41.440
the shot because my face was all shiny with sweat. So there was one person who would dab me off,
link |
00:12:45.600
had an umbrella. But yeah, like the reality of it, like, why is this strange stalkery person hanging
link |
00:12:51.600
around outside somebody's house? We're not sure when you have to look in,
link |
00:12:54.960
what the ways for the book, but are you, so you make, you make, like you said, YouTube,
link |
00:13:00.400
you make videos yourself, you make awesome parody, sort of parody songs that kind of focus on a
link |
00:13:07.760
particular aspect of computer science. Those seem really interesting to you.
link |
00:13:13.360
They seem really natural. How much production value goes into that?
link |
00:13:18.000
Do you also have a team of 50 people? The videos, almost all the videos,
link |
00:13:22.480
except for the ones that people would have actually seen, are just me. I write the lyrics,
link |
00:13:26.880
I sing the song. I generally find a, like a backing track online because I, like you,
link |
00:13:34.400
can't really play an instrument. And then I do, in some cases I'll do visuals using just like
link |
00:13:39.120
PowerPoint. Lots and lots of PowerPoint to make it sort of like an animation.
link |
00:13:44.240
The most produced one is the one that people might have seen, which is the overfitting video
link |
00:13:49.120
that I did with Charles Isbell. And that was produced by the Georgia Tech and Udacity people
link |
00:13:55.760
because we were doing a class together. It was kind of, I usually do parody songs kind of to
link |
00:13:59.680
cap off a class at the end of a class. So that one you're wearing, so it was just a
link |
00:14:04.560
Thriller. You're wearing the Michael Jackson, the red leather jacket. The interesting thing
link |
00:14:09.920
with podcasting that you're also into is that I really enjoy is that there's not a team of people.
link |
00:14:21.040
It's kind of more, because you know, there's something that happens when there's more people
link |
00:14:29.040
involved than just one person that just the way you start acting, I don't know. There's a censorship.
link |
00:14:36.400
You're not given, especially for like slow thinkers like me, you're not. And I think most of us are,
link |
00:14:42.480
if we're trying to actually think we're a little bit slow and careful, it kind of large teams get
link |
00:14:50.640
in the way of that. And I don't know what to do with that. Like that's the, to me, like if,
link |
00:14:56.480
yeah, it's very popular to criticize quote unquote mainstream media.
link |
00:15:01.760
But there is legitimacy to criticizing them the same. I love listening to NPR, for example,
link |
00:15:06.880
but every, it's clear that there's a team behind it. There's a commercial,
link |
00:15:11.440
there's constant commercial breaks. There's this kind of like rush of like,
link |
00:15:16.080
okay, I have to interrupt you now because we have to go to commercial. Just this whole,
link |
00:15:20.320
it creates, it destroys the possibility of nuanced conversation. Yeah, exactly. Evian,
link |
00:15:29.280
which Charles Isbell, who I talked to yesterday told me that Evian is naive backwards, which
link |
00:15:36.800
the fact that his mind thinks this way is quite brilliant. Anyway, there's a freedom to this
link |
00:15:42.240
podcast. He's Dr. Awkward, which by the way, is a palindrome. That's a palindrome that I happen to
link |
00:15:46.960
know from other parts of my life. And I just, well, you know, use it against Charles. Dr. Awkward.
link |
00:15:54.640
So what was the most challenging parody song to make? Was it the Thriller one?
link |
00:16:00.800
No, that one was really fun. I wrote the lyrics really quickly and then I gave it over to the
link |
00:16:06.080
production team. They recruited an a cappella group to sing. That went really smoothly. It's great
link |
00:16:11.920
having a team because then you can just focus on the part that you really love, which in my case
link |
00:16:15.520
is writing the lyrics. For me, the most challenging one, not challenging in a bad way, but challenging
link |
00:16:21.040
in a really fun way, was, one of the parody songs I did is about the halting problem in
link |
00:16:27.520
computer science. The fact that you can't create a program that can tell for any other arbitrary
link |
00:16:34.480
program whether it's actually going to get stuck in an infinite loop or whether it's going to eventually
link |
00:16:38.080
stop. And so I did it to an 80's song because I hadn't started my new thing of learning current
link |
00:16:46.000
songs. And it was Billy Joel's The Piano Man. Nice. Which is a great song. Sing me a song.
link |
00:16:56.560
You're the piano man. Yeah. So the lyrics are great because first of all, it rhymes. Not all
link |
00:17:04.560
songs rhyme. I've done Rolling Stones songs which turn out to have no rhyme scheme whatsoever. They're
link |
00:17:09.760
just sort of yelling and having a good time, which makes it not fun from a parody perspective because
link |
00:17:14.640
like you can say anything. But the lines rhymed and there was a lot of internal rhymes as well.
link |
00:17:18.960
And so figuring out how to sing with internal rhymes, a proof of the halting problem was really
link |
00:17:24.720
challenging. And I really enjoyed that process. What about, last question on this topic, what
link |
00:17:30.960
about the dancing in the Thriller video? How many takes that take? So I wasn't planning to dance.
link |
00:17:36.800
They had me in the studio and they gave me the jacket and it's like, well, you can't,
link |
00:17:40.560
if you have the jacket and the glove, like there's not much you can do. Yeah. So I think I just
link |
00:17:46.080
danced around and then they said, why don't you dance a little bit? There was a scene with me
link |
00:17:49.600
and Charles dancing together. They did not use it in the video, but we recorded it. Yeah. Yeah. No,
link |
00:17:55.920
it was pretty funny. And Charles, who has this beautiful, wonderful voice doesn't really sing.
link |
00:18:02.720
He's not really a singer. And so that was why I designed the song with him doing a spoken section
link |
00:18:07.520
and me doing the singing. It's very like Barry White. Yeah. Smooth baritone. Yeah. Yeah. It's
link |
00:18:12.320
great. That was awesome. So one of the other things Charles said is that, you know, everyone
link |
00:18:19.200
knows you as like a super nice guy, super passionate about teaching and so on. What he said,
link |
00:18:27.040
don't know if it's true, that despite the fact that you're, you are. Okay. I will admit this
link |
00:18:34.000
finally for the first time. That was, that was me. It's the Johnny Cash song. Killed a man in Reno just
link |
00:18:39.360
to watch him die. That you actually do have some strong opinions on some topics. So if this in fact
link |
00:18:46.880
is true, what strong opinions would you say you have? Is there ideas you think maybe in artificial
link |
00:18:55.120
intelligence and machine learning, maybe in life that you believe is true that others might,
link |
00:19:02.640
you know, some number of people might disagree with you on? So I try very hard to see things
link |
00:19:08.400
from multiple perspectives. There's this great Calvin and Hobbes cartoon where, do you know?
link |
00:19:15.680
Yeah. Okay. So Calvin's dad is always kind of a bit of a foil and he talked Calvin into,
link |
00:19:21.440
Calvin had done something wrong. The dad talks him into like seeing it from another perspective
link |
00:19:25.440
and Calvin, like this breaks Calvin because he's like, oh my gosh, now I can see the opposite sides
link |
00:19:30.880
of things. And so the, it's, it becomes like a Cubist cartoon where there is no front and back.
link |
00:19:35.920
Everything's just exposed and it really freaks him out. And finally he settles back down. It's
link |
00:19:39.680
like, oh good. No, I can make that go away. But like, I'm that, I'm that I live in that world where
link |
00:19:44.160
I'm trying to see everything from every perspective all the time. So there are some things that I've
link |
00:19:48.400
formed opinions about that it would be harder, I think, to disavow me of. One is the super
link |
00:19:56.160
intelligence argument and the existential threat of AI is one where I feel pretty confident in my
link |
00:20:02.640
feeling about that one. Like I'm willing to hear other arguments, but like, I am not particularly
link |
00:20:07.840
moved by the idea that if we're not careful, we will accidentally create a super intelligence
link |
00:20:13.600
that will destroy human life. Let's talk about that. Let's get you in trouble and record your
link |
00:20:17.600
video. It's like Bill Gates, I think he said like some quote about the internet that that's just
link |
00:20:24.800
going to be a small thing. It's not going to really go anywhere. And then I think Steve
link |
00:20:29.360
Ballmer said, I don't know why I'm sticking on Microsoft. That's something that like smartphones
link |
00:20:36.080
are useless. There's no reason why Microsoft should get into smartphones, that kind of.
link |
00:20:40.400
So let's get, let's talk about AGI. As AGI is destroying the world, we'll look back at this
link |
00:20:45.200
video and see. No, I think it's really interesting to actually talk about because nobody really
link |
00:20:49.920
knows the future. So you have to use your best intuition. It's very difficult to predict it,
link |
00:20:54.080
but you have spoken about AGI and the existential risks around it and sort of basing your intuition
link |
00:21:01.760
that we're quite far away from that being a serious concern relative to the other concerns
link |
00:21:08.960
we have. Can you maybe unpack that a little bit? Yeah, sure, sure, sure. So as I understand it,
link |
00:21:15.840
that for example, I read Bostrom's book and a bunch of other reading material about this sort
link |
00:21:22.320
of general way of thinking about the world. And I think the story goes something like this, that we
link |
00:21:27.520
will at some point create computers that are smart enough that they can help design the next version
link |
00:21:35.840
of themselves, which itself will be smarter than the previous version of themselves and eventually
link |
00:21:42.160
bootstrapped up to being smarter than us. At which point we are essentially at the mercy of this sort
link |
00:21:49.120
of more powerful intellect, which in principle we don't have any control over what its goals are.
link |
00:21:56.720
And so if its goals are at all out of sync with our goals, for example, the continued existence
link |
00:22:04.480
of humanity, we won't be able to stop it. It'll be way more powerful than us and we will be toast.
link |
00:22:12.640
So there's some, I don't know, very smart people who have signed on to that story. And it's a
link |
00:22:18.800
compelling story. Now I can really get myself in trouble. I once wrote an op ed about this,
link |
00:22:25.360
specifically responding to some quotes from Elon Musk, who has been on this very podcast
link |
00:22:30.640
more than once. AI summoning the demon. But then he came to Providence, Rhode Island,
link |
00:22:38.480
which is where I live, and said to the governors of all the states, you know, you're worried about
link |
00:22:45.360
entirely the wrong thing. You need to be worried about AI. You need to be very, very worried about
link |
00:22:49.360
AI. And journalists kind of reacted to that and they wanted to get people's take. And I was like,
link |
00:22:56.240
OK, my my my belief is that one of the things that makes Elon Musk so successful and so remarkable
link |
00:23:03.440
as an individual is that he believes in the power of ideas. He believes that
link |
00:23:08.880
if you know, if you have a really good idea for getting into space, you can get into space.
link |
00:23:12.960
If you have a really good idea for a company or for how to change the way that people drive,
link |
00:23:18.080
you just have to do it and it can happen. It's really natural to apply that same idea to AI.
link |
00:23:23.840
You see these systems that are doing some pretty remarkable computational tricks, demonstrations,
link |
00:23:30.720
and then to take that idea and just push it all the way to the limit and think, OK, where does
link |
00:23:35.760
this go? Where is this going to take us next? And if you're a deep believer in the power of ideas,
link |
00:23:40.720
then it's really natural to believe that those ideas could be taken to the extreme and kill us.
link |
00:23:47.760
So I think, you know, his strength is also his undoing, because that doesn't mean it's true.
link |
00:23:52.720
Like, it doesn't mean that that has to happen, but it's natural for him to think that.
link |
00:23:56.720
So another way to phrase the way he thinks, and I find it very difficult to argue with that line
link |
00:24:04.160
of thinking. So Sam Harris is another person from a neuroscience perspective that thinks like that
link |
00:24:09.920
is saying, well, is there something fundamental in the physics of the universe that prevents this
link |
00:24:18.080
from eventually happening? And Nick Bostrom thinks in the same way, that kind of zooming out, yeah,
link |
00:24:24.320
OK, we humans now are existing in this like time scale of minutes and days. And so our intuition
link |
00:24:32.400
is in this time scale of minutes, hours and days. But if you look at the span of human history,
link |
00:24:39.200
is there any reason you can't see this in 100 years? And like, is there something fundamental
link |
00:24:47.520
about the laws of physics that prevent this? And if it doesn't, then it eventually will happen
link |
00:24:52.320
or we will destroy ourselves in some other way. And it's very difficult, I find,
link |
00:24:57.200
to actually argue against that. Yeah, me too.
link |
00:25:03.680
And not sound like. Not sound like you're just like rolling your eyes like I have like science
link |
00:25:11.600
fiction, we don't have to think about it, but even worse than that, which is like, I don't have
link |
00:25:16.000
kids, but like I got to pick up my kids now like this. OK, I see there's more pressing short. Yeah,
link |
00:25:20.400
there's more pressing short term things that like stop over the next national crisis. We have much,
link |
00:25:25.440
much shorter things like now, especially this year, there's covid. So like any kind of discussion
link |
00:25:30.000
like that is like there's this, you know, this pressing things today is. And then so the Sam
link |
00:25:37.520
Harris argument, well, like any day the exponential singularity can occur, is very difficult to
link |
00:25:45.200
argue against. I mean, I don't know. But part of his story is also he's not going to put a date on
link |
00:25:50.160
it. It could be in a thousand years, it could be in a hundred years, it could be in two years. It's
link |
00:25:53.680
just that as long as we keep making this kind of progress, it ultimately has to become a concern.
link |
00:25:59.680
I kind of am on board with that. But the thing that the piece that I feel like is missing from
link |
00:26:03.920
that way of extrapolating from the moment that we're in, is that I believe that in the
link |
00:26:09.600
process of actually developing technology that can really get around in the world and really process
link |
00:26:14.560
and do things in the world in a sophisticated way, we're going to learn a lot about what that means,
link |
00:26:20.960
which that we don't know now because we don't know how to do this right now.
link |
00:26:24.240
If you believe that you can just turn on a deep learning network and eventually give it enough
link |
00:26:28.160
compute and eventually get there. Well, sure, that seems really scary because we won't be
link |
00:26:32.320
in the loop at all. We won't be helping to design or target these kinds of systems.
link |
00:26:38.640
But I don't see that. That feels like it is against the laws of physics,
link |
00:26:43.840
because these systems need help, right? They need to surpass the difficulty,
link |
00:26:49.760
the wall of complexity that happens in arranging something in the form that will happen.
link |
00:26:55.520
Yeah, like I believe in evolution, like I believe that there's an argument. Right. So
link |
00:27:00.880
there's another argument, just to look at it from a different perspective, that people say,
link |
00:27:04.400
why they don't believe in evolution. How could evolution? It's sort of like a random set of
link |
00:27:10.000
parts assemble themselves into a 747. And that could just never happen. So it's like,
link |
00:27:15.680
OK, that's maybe hard to argue against. But clearly, 747s do get assembled. They get assembled
link |
00:27:20.480
by us. Basically, the idea being that there's a process by which we will get to the point of
link |
00:27:26.480
making technology that has that kind of awareness. And in that process, we're going to learn a lot
link |
00:27:31.920
about that process and we'll have more ability to control it or to shape it or to build it in our
link |
00:27:37.760
own image. It's not something that is going to spring into existence like that 747. And we're
link |
00:27:43.680
just going to have to contend with it completely unprepared. That's very possible that in the
link |
00:27:49.440
context of the long arc of human history, it will, in fact, spring into existence.
link |
00:27:55.200
But that springing might take like if you look at nuclear weapons, like even 20 years is a springing
link |
00:28:02.640
in in the context of human history. And it's very possible, just like with nuclear weapons,
link |
00:28:07.760
that we could have I don't know what percentage you want to put at it, but the possibility could
link |
00:28:13.040
have knocked ourselves out. Yeah. The possibility of human beings destroying themselves in the 20th
link |
00:28:17.520
century with nuclear weapons. I don't know. You can if you really think through it, you could
link |
00:28:23.200
really put it close to, like, I don't know, 30, 40 percent, given like the certain moments of
link |
00:28:28.400
crisis that happen. So, like, I think one, like, fear in the shadows that's not being acknowledged
link |
00:28:38.240
is, it's not so much that the A.I. will run away, it's that as it's running away,
link |
00:28:44.240
we won't have enough time to think through how to stop it. Right. Fast takeoff or FOOM. Yeah.
link |
00:28:52.080
I mean, my much bigger concern, I wonder what you think about it, which is
link |
00:28:55.760
we won't know it's happening. So I kind of think that there's an A.G.I. situation already happening
link |
00:29:05.760
with social media that our minds, our collective intelligence of human civilization is already
link |
00:29:11.840
being controlled by an algorithm. And like we're already super, like, the level of a collective
link |
00:29:19.520
intelligence, thanks to Wikipedia, people should donate to Wikipedia to feed the A.G.I.
link |
00:29:23.840
Man, if we had a super intelligence that was in line with Wikipedia's values,
link |
00:29:31.920
that it's a lot better than a lot of other things I could imagine. I trust Wikipedia more than I
link |
00:29:36.160
trust Facebook or YouTube as far as trying to do the right thing from a rational perspective.
link |
00:29:41.520
Yeah. Now, that's not where you were going. I understand that. But it does strike me that
link |
00:29:45.120
there's sort of smarter and less smart ways of exposing ourselves to each other on the Internet.
link |
00:29:51.200
Yeah. The interesting thing is that Wikipedia and social media have very different forces.
link |
00:29:55.360
You're right. I mean, Wikipedia, if A.G.I. was Wikipedia, it'd be just like this cranky, overly
link |
00:30:02.160
competent editor of articles. You know, there's something to that. But the social
link |
00:30:08.480
media aspect is not. So the vision of A.G.I. is as a separate system that's super intelligent.
link |
00:30:17.120
That's super intelligent. That's one key little thing. I mean, there's the paperclip argument
link |
00:30:20.880
that's super dumb, but super powerful systems. But with social media, you have a relatively like
link |
00:30:27.200
algorithms we may talk about today, very simple algorithms that when something Charles talks a
link |
00:30:35.040
lot about, which is interactive A.I., when they start like having at scale, like tiny little
link |
00:30:40.640
interactions with human beings, they can start controlling these human beings. So a single
link |
00:30:45.200
algorithm can control the minds of human beings slowly to what we might not realize. It could
link |
00:30:51.040
start wars. It could start. It could change the way we think about things. It feels like
link |
00:30:57.840
in the long arc of history, if I were to sort of zoom out from all the outrage and all the tension
link |
00:31:03.680
on social media, that it's progressing us towards better and better things. It feels like chaos and
link |
00:31:11.840
toxic and all that kind of stuff. It's chaos and toxic. Yeah. But it feels like actually
link |
00:31:17.040
the chaos and toxic is similar to the kind of debates we had from the founding of this country.
link |
00:31:22.000
You know, there was a civil war that happened over that period. And ultimately it was all about
link |
00:31:28.000
this tension of like something doesn't feel right about our implementation of the core values we
link |
00:31:33.280
hold as human beings. And they're constantly struggling with this. And that results in people
link |
00:31:38.720
calling each other, just being shady to each other on Twitter. But ultimately the algorithm is
link |
00:31:47.680
managing all that. And it feels like there's a possible future in which that algorithm
link |
00:31:53.120
controls us into the direction of self destruction and whatever that looks like.
link |
00:31:59.200
Yeah. So, all right. I do believe in the power of social media to screw us up royally. I do believe
link |
00:32:05.200
in the power of social media to benefit us too. I do think that we're in a, yeah, it's sort of
link |
00:32:12.160
almost got dropped on top of us. And now we're trying to, as a culture, figure out how to cope
link |
00:32:16.000
with it. There's a sense in which, I don't know, there's some arguments that say that, for example,
link |
00:32:23.600
I guess college age students now, late college age students now, people who were in middle school
link |
00:32:27.840
when social media started to really take off, may be really damaged. Like this may have really hurt
link |
00:32:34.720
their development in a way that we don't have all the implications of quite yet. That's the generation
link |
00:32:40.000
who, and I hate to make it somebody else's responsibility, but like they're the ones who
link |
00:32:46.880
can fix it. They're the ones who can figure out how do we keep the good of this kind of technology
link |
00:32:53.280
without letting it eat us alive. And if they're successful, we move on to the next phase, the next
link |
00:33:01.920
level of the game. If they're not successful, then yeah, then we're going to wreck each other. We're
link |
00:33:06.080
going to destroy society. So you're going to, in your old age, sit on a porch and watch the world
link |
00:33:11.360
burn because of the TikTok generation that... I believe, well, so this is my kid's age,
link |
00:33:17.040
right? And that's certainly my daughter's age. And she's very tapped in to social stuff, but she's
link |
00:33:21.520
also, she's trying to find that balance, right? Of participating in it and in getting the positives
link |
00:33:26.720
of it, but without letting it eat her alive. And I think sometimes she ventures, I hope she doesn't
link |
00:33:33.120
watch this. Sometimes I think she ventures a little too far and is consumed by it. And other
link |
00:33:39.440
times she gets a little distance. And if there's enough people like her out there, they're going to
link |
00:33:46.320
navigate these choppy waters. That's an interesting skill, actually, to develop. I talked to my dad
link |
00:33:52.960
about it. I've now, somehow this podcast in particular, but other reasons has received a
link |
00:34:01.920
little bit of attention. And with that, apparently in this world, even though I don't shut up about
link |
00:34:07.600
love and I'm just all about kindness, I have now a little mini army of trolls. It's kind of hilarious
link |
00:34:15.040
actually, but it also doesn't feel good, but it's a skill to learn to not look at that, like to
link |
00:34:23.920
moderate actually how much you look at that. The discussion I have with my dad, it's similar to,
link |
00:34:28.800
it doesn't have to be about trolls. It could be about checking email, which is like, if you're
link |
00:34:33.840
anticipating, you know, there's a, my dad runs a large Institute at Drexel University and there
link |
00:34:39.840
could be stressful like emails you're waiting, like there's drama of some kinds. And so like,
link |
00:34:45.680
there's a temptation to check the email. If you send an email and you kind of,
link |
00:34:49.200
and that pulls you in into, it doesn't feel good. And it's a skill that he actually complains that
link |
00:34:56.320
he hasn't learned. I mean, he grew up without it. So he hasn't learned the skill of how to
link |
00:35:01.440
shut off the internet and walk away. And I think young people, while they're also being
link |
00:35:05.840
quote unquote damaged by like, you know, being bullied online, all of those stories, which are
link |
00:35:12.000
very like horrific, you basically can't escape your bullies these days when you're growing up.
link |
00:35:17.200
But at the same time, they're also learning that skill of how to be able to shut off the,
link |
00:35:23.920
like disconnect with it, be able to laugh at it, not take it too seriously. It's fascinating. Like
link |
00:35:29.040
we're all trying to figure this out. Just like you said, it's been dropped on us and we're trying to
link |
00:35:32.320
figure it out. Yeah. I think that's really interesting. And I guess I've become a believer
link |
00:35:37.280
in the human design, which I feel like I don't completely understand. Like how do you make
link |
00:35:42.720
something as robust as us? Like we're so flawed in so many ways. And yet, and yet, you know,
link |
00:35:48.960
we dominate the planet and we do seem to manage to get ourselves out of scrapes eventually,
link |
00:35:57.680
not necessarily the most elegant possible way, but somehow we get, we get to the next step.
link |
00:36:02.160
And I don't know how I'd make a machine do that. Generally speaking, like if I train one of my
link |
00:36:09.600
reinforcement learning agents to play a video game and it works really hard on that first stage
link |
00:36:13.760
over and over and over again, and it makes it through, it succeeds on that first level.
link |
00:36:17.680
And then the new level comes and it's just like, okay, I'm back to the drawing board. And somehow
link |
00:36:21.520
humanity, we keep leveling up and then somehow managing to put together the skills necessary to
link |
00:36:26.800
achieve success, some semblance of success in that next level too. And, you know,
link |
00:36:33.760
I hope we can keep doing that.
link |
00:36:36.320
You mentioned reinforcement learning. So you've had a couple of years in the field. No, quite,
link |
00:36:42.880
you know, quite a few, quite a long career in artificial intelligence broadly, but reinforcement
link |
00:36:50.160
learning specifically, can you maybe give a hint about your sense of the history of the field?
link |
00:36:58.320
And in some ways it's changed with the advent of deep learning, but it has long roots. Like, how is it
link |
00:37:05.280
weaved in and out of your own life? How have you seen the community change or maybe the ideas that
link |
00:37:09.840
it's playing with change? I've had the privilege, the pleasure of being, of having almost a front
link |
00:37:16.080
row seat to a lot of this stuff. And it's been really, really fun and interesting. So when I was
link |
00:37:21.040
in college in the eighties, early eighties, the neural net thing was starting to happen.
link |
00:37:29.280
And I was taking a lot of psychology classes and a lot of computer science classes as a college
link |
00:37:34.000
student. And I thought, you know, something that can play tic tac toe and just like learn to get
link |
00:37:38.720
better at it. That ought to be a really easy thing. So I spent almost, almost all of my, what would
link |
00:37:43.440
have been vacations during college, like hacking on my home computer, trying to teach it how to
link |
00:37:48.640
play tic tac toe in the programming language Basic. Oh yeah. That's, that's, I was, that's my first
link |
00:37:53.520
language. That's my native language. Is that when you first fell in love with computer science,
link |
00:37:57.760
just like programming basic on that? Uh, what was, what was the computer? Do you remember? I had,
link |
00:38:02.880
I had a TRS-80 Model 1 before they were called Model 1s. Cause there was nothing else. Uh,
link |
00:38:08.000
I got my computer in 1979, uh, instead. So I was, I was, I would have been bar mitzvahed,
link |
00:38:18.960
but instead of having a big party that my parents threw on my behalf, they just got me a computer.
link |
00:38:23.440
Cause that's what I really, really, really wanted. I saw them in the, in the, in the mall and
link |
00:38:26.960
radio shack. And I thought, what, how are they doing that? I would try to stump them. I would
link |
00:38:32.080
give them math problems like one plus and then in parentheses, two plus one. And I would always get
link |
00:38:37.280
it right. I'm like, how do you know so much? Like I've had to go to algebra class for the last few
link |
00:38:42.640
years to learn this stuff and you just seem to know. So I was, I was, I was smitten and, uh,
link |
00:38:48.000
got a computer and I think ages 13 to 15. I have no memory of those years. I think I just was in
link |
00:38:55.520
my room with the computer, listening to Billy Joel, communing, possibly listening to the radio,
link |
00:38:59.920
listening to Billy Joel. That was the one album I had, uh, on vinyl at that time. And, um, and then
link |
00:39:06.480
I got it on cassette tape and that was really helpful because then I could play it. I didn't
link |
00:39:09.920
have to go down to my parents' wifi, or hi-fi, sorry. Uh, and at age 15, I remember kind of
link |
00:39:16.320
walking out and like, okay, I'm ready to talk to people again. Like I've learned what I need to
link |
00:39:20.480
learn here. And, um, so yeah, so, so that was, that was my home computer. And so I went to college
link |
00:39:26.240
and I was like, oh, I'm totally going to study computer science. And I opted the college I chose
link |
00:39:30.400
specifically had a computer science major. The one that I really wanted the college I really wanted
link |
00:39:34.720
to go to, didn't, so bye-bye to them. So I went to Yale. Uh, Princeton would have been way more
link |
00:39:41.840
convenient and it was just beautiful campus and it was close enough to home. And I was really
link |
00:39:45.760
excited about Princeton. And I visited, I said, so, computer science? They're like, well, we have
link |
00:39:50.240
computer engineering. I'm like, Oh, I don't like that word engineering. I like computer science.
link |
00:39:55.920
I really, I want to do like, you're saying hardware and software. They're like, yeah.
link |
00:39:59.360
I'm like, I just want to do software. I couldn't care less about hardware. And you grew up in
link |
00:40:02.240
Philadelphia. I grew up outside Philly. Yeah. Yeah. Uh, so the, you know, local schools were
link |
00:40:07.280
like Penn and Drexel and, uh, Temple. Like everyone in my family went to Temple at least at
link |
00:40:12.800
one point in their lives, except for me. So yeah, Philly, Philly family, Yale had a computer science
link |
00:40:18.400
department. And that's when you, it's kind of interesting. You said eighties and neural
link |
00:40:22.560
networks. That's when the neural networks was a hot new thing or a hot thing period. Uh, so what
link |
00:40:27.760
is that in college when you first learned about neural networks, or when you learned, like how did
link |
00:40:31.760
it was in a psychology class, not in a CS. Yeah. Was it psychology or cognitive science or like,
link |
00:40:36.960
do you remember like what context it was? Yeah. Yeah. Yeah. So, so I was a, I've always been a
link |
00:40:42.320
bit of a cognitive psychology groupie. So like I'm, I studied computer science, but I like,
link |
00:40:47.600
I like to hang around where the cognitive scientists are. Cause I don't know brains, man.
link |
00:40:52.640
They're like, they're wacky. Cool. And they have a bigger picture view of things. They're a little
link |
00:40:57.920
less engineering. I would say they're more, they're more interested in the nature of cognition and
link |
00:41:03.120
intelligence and perception and how like the vision system work. Like they're asking always
link |
00:41:07.440
bigger questions. Now with the deep learning community there, I think more, there's a lot of
link |
00:41:12.880
intersections, but I do find that the neuroscience folks actually in cognitive psychology, cognitive
link |
00:41:21.920
science folks are starting to learn how to program, how to use neural, artificial neural networks.
link |
00:41:27.760
And they are actually approaching problems in like totally new, interesting ways. It's fun to
link |
00:41:31.840
watch that grad students from those departments, like approach a problem of machine learning.
link |
00:41:37.200
Right. They come in with a different perspective. Yeah. They don't care about like your
link |
00:41:40.640
image net data set or whatever they want, like to understand the, the, the, like the basic
link |
00:41:47.440
mechanisms at the, at the neuronal level and the functional level of intelligence. It's kind of,
link |
00:41:53.760
it's kind of cool to see them work, but yeah. Okay. So you always love, you're always a groupie
link |
00:41:58.720
of cognitive psychology. Yeah. Yeah. And so, so it was in a class by Richard Gerrig. He was kind of
link |
00:42:04.800
like my favorite psych professor in college. And I took like three different classes with him
link |
00:42:11.600
and yeah. So they were talking specifically the class, I think was kind of a,
link |
00:42:17.440
there was a big paper that was written by Steven Pinker and Prince. I don't, I'm blanking on
link |
00:42:22.560
Prince's first name, but Prince and Pinker and Prince, they wrote kind of a, they were at that
link |
00:42:28.480
time kind of like, ah, I'm blanking on the names of the current people. The cognitive scientists
link |
00:42:36.240
who are complaining a lot about deep networks. Oh, Gary, Gary Marcus, Marcus and who else? I mean,
link |
00:42:44.720
there's a few, but Gary, Gary's the most feisty. Sure. Gary's very feisty. And with this, with his
link |
00:42:49.280
coauthor, they, they, you know, they're kind of doing these kinds of take downs where they say,
link |
00:42:52.880
okay, well, yeah, it does all these amazing, amazing things, but here's a shortcoming. Here's
link |
00:42:56.960
a shortcoming. Here's a shortcoming. And so the Pinker Prince paper is kind of like the,
link |
00:43:01.600
that generation's version of Marcus and Davis, right? Where they're, they're trained as cognitive
link |
00:43:07.360
scientists, but they're looking skeptically at the results in the, in the artificial intelligence,
link |
00:43:12.480
neural net kind of world and saying, yeah, it can do this and this and this, but lo,
link |
00:43:16.720
it can't do that. And it can't do that. And it can't do that maybe in principle or maybe just
link |
00:43:20.640
in practice at this point. But, but the fact of the matter is you're, you've narrowed your focus
link |
00:43:26.000
too far to be impressed. You know, you're impressed with the things within that circle,
link |
00:43:30.720
but you need to broaden that circle a little bit. You need to look at a wider set of problems.
link |
00:43:34.800
And so, so we had, so I was in this seminar in college that was basically a close reading of
link |
00:43:40.720
the Pinker Prince paper, which was like really thick. There was a lot going on in there. And,
link |
00:43:47.920
and it, you know, and it talked about the reinforcement learning idea a little bit.
link |
00:43:51.120
I'm like, oh, that sounds really cool because behavior is what is really interesting to me
link |
00:43:55.120
about psychology anyway. So making programs that, I mean, programs are things that behave.
link |
00:44:00.640
People are things that behave. Like I want to make learning that learns to behave.
link |
00:44:05.360
And which way was reinforcement learning presented? Is this talking about human and
link |
00:44:09.760
animal behavior or are we talking about actual mathematical construct?
link |
00:44:12.960
Ah, that's right. So that's a good question. Right. So this is, I think it wasn't actually
link |
00:44:17.760
talked about as behavior in the paper that I was reading. I think that it just talked about
link |
00:44:22.000
learning. And to me, learning is about learning to behave, but really neural nets at that point
link |
00:44:27.120
were about learning like supervised learning. So learning to produce outputs from inputs.
link |
00:44:31.360
So I kind of tried to invent reinforcement learning. When I graduated, I joined a research
link |
00:44:36.800
group at Bellcore, which had spun out of Bell Labs recently at that time because of the divestiture
link |
00:44:42.240
of the long distance and local phone service in the 1980s, 1984. And I was in a group with
link |
00:44:50.400
Dave Ackley, who was the first author of the Boltzmann machine paper. So the very first neural
link |
00:44:56.240
net paper that could handle XOR, right? So XOR sort of killed neural nets. The very first,
link |
00:45:02.000
the XOR, with the first winter. Yeah. Um, the, the Perceptrons paper. And Hinton, along with his
link |
00:45:10.320
student, Dave Ackley, and I think there was other authors as well showed that no, no, no,
link |
00:45:14.480
with Boltzmann machines, we can actually learn nonlinear concepts. And so everything's back on
link |
00:45:19.600
the table again. And that kind of started that second wave of neural networks. So Dave Ackley
link |
00:45:24.240
was, he became my mentor at, at Bellcore and we talked a lot about learning and life and
link |
00:45:30.320
computation and how all these things fit together. Now Dave and I have a podcast together. So,
link |
00:45:35.440
so I get to kind of enjoy that sort of his, his perspective once again, even, even all these years
link |
00:45:42.320
later. And so I said, so I said, I was really interested in learning, but in the context of
link |
00:45:48.240
behavior and he's like, oh, well that's reinforcement learning here. And he gave me
link |
00:45:52.640
Rich Sutton's 1984 TD paper. So I read that paper. I honestly didn't get all of it,
link |
00:45:58.880
but I got the idea. I got that they were using, that he was using ideas that I was familiar with
link |
00:46:04.000
in the context of neural nets and, and like sort of back prop. But with this idea of making
link |
00:46:09.920
predictions over time, I'm like, this is so interesting, but I don't really get all the
link |
00:46:13.200
details I said to Dave. And Dave said, oh, well, why don't we have him come and give a talk?
link |
00:46:18.560
And I was like, wait, what, you can do that? Like, these are real people. I thought they
link |
00:46:23.040
were just words. I thought it was just like ideas that somehow magically seeped into paper. He's
link |
00:46:28.240
like, no, I, I, I know Rich like, we'll just have him come down and he'll give a talk. And so I was,
link |
00:46:35.680
you know, my mind was blown. And so Rich came and he gave a talk at Bellcore and he talked about
link |
00:46:41.440
what he was super excited, which was they had just figured out at the time Q learning. So Watkins
link |
00:46:48.880
had visited Rich Sutton's lab at UMass, or Andy Barto's lab that Rich was a part of.
link |
00:46:55.760
And, um, he was really excited about this because it resolved a whole bunch of problems that he
link |
00:47:00.560
didn't know how to resolve in the, in the earlier paper. And so, um,
link |
00:47:05.040
For people who don't know TD, temporal difference, these are all just algorithms
link |
00:47:09.200
for reinforcement learning.
link |
00:47:10.320
Right. And TD, temporal difference in particular is about making predictions over time. And you can
link |
00:47:15.520
try to use it for making decisions, right? Cause if you can predict how good a future action or an
link |
00:47:19.840
action's outcomes will be in the future, you can choose one that has a better outcome. But the thing
link |
00:47:24.960
that's really cool about Q learning is it was off policy, which meant that you could actually be
link |
00:47:29.040
learning about the environment and what the value of different actions would be while actually
link |
00:47:33.840
figuring out how to behave optimally. So that was a revelation.
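For reference, a minimal sketch of the tabular Q-learning update being described, on a made-up five-state chain world rather than any real task; the max over the next state's actions is what makes it off policy, since the update doesn't care which action the exploring agent actually takes next:

import random
from collections import defaultdict

# Toy chain world (purely illustrative): states 0..4, actions -1/+1,
# reward 1 for reaching state 4, which ends the episode.
ACTIONS = (-1, +1)

def step(s, a):
    s_next = min(max(s + a, 0), 4)
    return s_next, (1.0 if s_next == 4 else 0.0), (s_next == 4)

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # The "one line of code": nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
    # Taking the max over next actions, rather than the action actually taken,
    # is what makes Q-learning off policy.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)])

Q = defaultdict(float)
for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy; the update above still learns about
        # the greedy policy, which is the off-policy part.
        a = random.choice(ACTIONS) if random.random() < 0.2 else max(ACTIONS, key=lambda b: Q[(s, b)])
        s_next, r, done = step(s, a)
        q_learning_step(Q, s, a, r, s_next)
        s = s_next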
link |
00:47:38.160
Yeah. And the proof of that is kind of interesting. I mean, that's really surprising
link |
00:47:41.280
to me when I first read that paper, and then in
link |
00:47:55.840
Rich Sutton's book on the matter. It's kind of beautiful that a single equation can
link |
00:48:01.120
capture it all, one line of code, and, like, you can learn anything. Yeah, given enough time.
link |
00:48:06.160
So equation and code, you're right. Like, arguably, at least,
link |
00:48:13.600
if you squint your eyes, you can say,
link |
00:48:17.180
this is all of intelligence, and you can implement
link |
00:48:21.880
it in a single line.
link |
00:48:22.720
I think I started with Lisp, which is a shout out to Lisp
link |
00:48:26.720
with like a single line of code, key piece of code,
link |
00:48:29.860
maybe a couple that you could do that.
link |
00:48:32.200
It's kind of magical.
link |
00:48:33.480
It's feels too good to be true.
link |
00:48:37.040
Well, and it sort of is.
link |
00:48:38.400
Yeah, kind of.
link |
00:48:40.360
It seems to require an awful lot
link |
00:48:41.980
of extra stuff supporting it.
link |
00:48:43.400
But nonetheless, the idea is really good.
link |
00:48:46.500
And as far as we know, it is a very reasonable way
link |
00:48:50.480
of trying to create adaptive behavior,
link |
00:48:52.480
behavior that gets better at something over time.
link |
00:48:56.840
Did you find the idea of optimal at all compelling
link |
00:49:00.240
that you could prove that it's optimal?
link |
00:49:02.040
So like one part of computer science
link |
00:49:04.920
that it makes people feel warm and fuzzy inside
link |
00:49:08.240
is when you can prove something like
link |
00:49:10.440
that a sorting algorithm worst case runs
link |
00:49:13.000
in N log N, and it makes everybody feel so good.
link |
00:49:16.220
Even though in reality, it doesn't really matter
link |
00:49:18.200
what the worst case is, what matters is like,
link |
00:49:20.080
does this thing actually work in practice
link |
00:49:22.500
on this particular actual set of data that I enjoy?
link |
00:49:26.000
Did you?
link |
00:49:26.840
So here's a place where I have maybe a strong opinion,
link |
00:49:29.880
which is like, you're right, of course, but no, no.
link |
00:49:34.040
Like, so what makes worst case so great, right,
link |
00:49:37.760
what's so great if you have a worst case analysis,
link |
00:49:39.520
is that you get modularity.
link |
00:49:41.040
You can take that thing and plug it into another thing
link |
00:49:44.320
and still have some understanding of what's gonna happen
link |
00:49:47.400
when you click them together, right?
link |
00:49:49.320
If it just works well in practice, in other words,
link |
00:49:51.600
with respect to some distribution that you care about,
link |
00:49:54.640
when you go plug it into another thing,
link |
00:49:56.300
that distribution can shift, it can change,
link |
00:49:58.560
and your thing may not work well anymore.
link |
00:50:00.480
And you want it to, and you wish it does,
link |
00:50:02.620
and you hope that it will, but it might not,
link |
00:50:04.960
and then, ah.
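To make the worst-case point concrete, here's a toy sketch, not from the conversation: a naive first-element-pivot quicksort looks fine on random data, but shift the input distribution to already-sorted data and the quadratic worst case shows up, which is exactly the modularity failure being described:

import random

def quicksort_first_pivot(xs, stats):
    # Naive quicksort that always pivots on the first element:
    # roughly n log n on random input, but n^2 on already-sorted input.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    stats["partitioned"] += len(rest)   # elements compared against a pivot at this level
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort_first_pivot(left, stats) + [pivot] + quicksort_first_pivot(right, stats)

n = 500
for name, data in [("random input", random.sample(range(n), n)),
                   ("sorted input", list(range(n)))]:
    stats = {"partitioned": 0}
    quicksort_first_pivot(data, stats)
    print(name, stats["partitioned"])
# Random input costs a few thousand comparisons; sorted input degrades toward
# n*(n-1)/2 = 124,750, the kind of distribution shift a worst-case guarantee rules out.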
link |
00:50:06.560
So you're saying you don't like machine learning.
link |
00:50:13.220
But we have some positive theoretical results
link |
00:50:15.680
for these things.
link |
00:50:17.680
You can come back at me with,
link |
00:50:20.460
yeah, but they're really weak,
link |
00:50:21.520
and yeah, they're really weak.
link |
00:50:22.960
And you can even say that sorting algorithms,
link |
00:50:25.520
like if you do the optimal sorting algorithm,
link |
00:50:27.200
it's not really the one that you want,
link |
00:50:30.000
and that might be true as well.
link |
00:50:31.860
But it is, the modularity is a really powerful statement.
link |
00:50:34.200
I really like that.
link |
00:50:35.040
If you're an engineer, you can then assemble
link |
00:50:36.880
different things, you can count on them to be,
link |
00:50:39.240
I mean, it's interesting.
link |
00:50:42.040
It's a balance, like with everything else in life,
link |
00:50:45.280
you don't want to get too obsessed.
link |
00:50:47.300
I mean, this is what computer scientists do,
link |
00:50:48.760
which they tend to get obsessed,
link |
00:50:51.440
and they overoptimize things,
link |
00:50:53.560
or they start by optimizing, and then they overoptimize.
link |
00:50:56.560
So it's easy to get really granular about this thing,
link |
00:51:00.960
but like the step from an n squared to an n log n
link |
00:51:06.160
sorting algorithm is a big leap for most real world systems.
link |
00:51:10.480
No matter what the actual behavior of the system is,
link |
00:51:13.560
that's a big leap.
link |
00:51:14.760
And the same can probably be said
link |
00:51:17.400
for other kinds of first leaps
link |
00:51:20.800
that you would take on a particular problem.
link |
00:51:22.380
Like it's picking the low hanging fruit,
link |
00:51:25.680
or whatever the equivalent of doing the,
link |
00:51:29.120
not the dumbest thing, but the next to the dumbest thing.
link |
00:51:32.560
Picking the most delicious reachable fruit.
link |
00:51:34.760
Yeah, most delicious reachable fruit.
link |
00:51:36.440
I don't know why that's not a saying.
link |
00:51:38.920
Yeah.
link |
00:51:39.960
Okay, so then this is the 80s,
link |
00:51:44.000
and this kind of idea starts to percolate of learning.
link |
00:51:47.680
At that point, I got to meet Rich Sutton,
link |
00:51:50.680
so everything was sort of downhill from there,
link |
00:51:52.240
and that was really the pinnacle of everything.
link |
00:51:55.280
But then I felt like I was kind of on the inside.
link |
00:51:58.020
So then as interesting results were happening,
link |
00:52:00.080
I could like check in with Rich or with Jerry Tesauro,
link |
00:52:03.560
who had a huge impact on kind of early thinking
link |
00:52:06.920
in temporal difference learning and reinforcement learning
link |
00:52:10.200
and showed that you could do,
link |
00:52:11.700
you could solve problems
link |
00:52:12.720
that we didn't know how to solve any other way.
link |
00:52:16.120
And so that was really cool.
link |
00:52:17.240
So as good things were happening,
link |
00:52:18.780
I would hear about it from either the people
link |
00:52:20.720
who were doing it,
link |
00:52:21.560
or the people who were talking to the people
link |
00:52:23.080
who were doing it.
link |
00:52:23.920
And so I was able to track things pretty well
link |
00:52:25.800
through the 90s.
link |
00:52:28.240
So wasn't most of the excitement
link |
00:52:32.000
in reinforcement learning in the 90s era
link |
00:52:34.640
with, what is it, TD Gammon?
link |
00:52:37.100
Like what's the role of these kind of little
link |
00:52:40.560
like fun game playing things and breakthroughs
link |
00:52:43.360
about exciting the community?
link |
00:52:46.840
Was that, like what were your,
link |
00:52:48.720
because you've also built a crossword,
link |
00:52:50.720
or been part of building, a crossword puzzle
link |
00:52:56.680
solving program called Proverb.
link |
00:53:00.000
So you were interested in this as a problem,
link |
00:53:05.600
like in forming, using games to understand
link |
00:53:09.660
how to build intelligence systems.
link |
00:53:12.480
So like, what did you think about TD Gammon?
link |
00:53:14.240
Like what did you think about that whole thing in the 90s?
link |
00:53:16.560
Yeah, I mean, I found the TD Gammon result
link |
00:53:19.000
really just remarkable.
link |
00:53:20.320
So I had known about some of Jerry's stuff
link |
00:53:22.280
before he did TD Gammon. And he did a system,
link |
00:53:24.840
just more vanilla, well, not entirely vanilla,
link |
00:53:27.840
but a more classical back proppy kind of network
link |
00:53:31.320
for playing backgammon,
link |
00:53:32.720
where he was training it on expert moves.
link |
00:53:35.200
So it was kind of supervised,
link |
00:53:37.280
but the way that it worked was not to mimic the actions,
link |
00:53:41.100
but to learn internally an evaluation function.
link |
00:53:44.040
So to learn, well, if the expert chose this over this,
link |
00:53:47.440
that must mean that the expert values this more than this.
link |
00:53:50.480
And so let me adjust my weights to make it
link |
00:53:52.280
so that the network evaluates this
link |
00:53:54.760
as being better than this.
link |
00:53:56.240
So it could learn from human preferences,
link |
00:53:59.940
it could learn its own preferences.
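A rough sketch of that kind of preference-driven weight update, assuming a simple linear evaluation function in place of Tesauro's actual network; the point is just to nudge the score of the position the expert chose above the score of the alternative:

import math
import random

def evaluate(weights, features):
    # Linear evaluation function standing in for the network: score = w . f(position).
    return sum(w * f for w, f in zip(weights, features))

def preference_update(weights, chosen, rejected, lr=0.05):
    # Treat "expert picked `chosen` over `rejected`" as the training signal:
    # the model's probability of agreeing is sigmoid(score(chosen) - score(rejected)),
    # and we take a gradient step that raises that probability.
    margin = evaluate(weights, chosen) - evaluate(weights, rejected)
    p_agree = 1.0 / (1.0 + math.exp(-margin))
    grad_scale = lr * (1.0 - p_agree)
    return [w + grad_scale * (c - r) for w, c, r in zip(weights, chosen, rejected)]

# Toy demo with made-up 3-dimensional position features:
random.seed(0)
weights = [0.0, 0.0, 0.0]
hidden_true = [1.0, -2.0, 0.5]          # the "expert's" unknown values
for _ in range(2000):
    a = [random.uniform(-1, 1) for _ in range(3)]
    b = [random.uniform(-1, 1) for _ in range(3)]
    chosen, rejected = (a, b) if evaluate(hidden_true, a) > evaluate(hidden_true, b) else (b, a)
    weights = preference_update(weights, chosen, rejected)
print(weights)  # drifts toward the same ranking of positions as hidden_true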
link |
00:54:02.080
And then when he took the step from that
link |
00:54:04.480
to actually doing it
link |
00:54:06.520
as a full on reinforcement learning problem,
link |
00:54:08.580
where you didn't need a trainer,
link |
00:54:10.080
you could just let it play, that was remarkable, right?
link |
00:54:13.840
And so I think as humans often do,
link |
00:54:17.920
as we've done in the recent past as well,
link |
00:54:20.960
people extrapolate.
link |
00:54:22.000
It's like, oh, well, if you can do that,
link |
00:54:23.460
which is obviously very hard,
link |
00:54:24.960
then obviously you could do all these other problems
link |
00:54:27.960
that we wanna solve that we know are also really hard.
link |
00:54:31.560
And it turned out very few of them ended up being practical,
link |
00:54:35.320
partly because I think neural nets,
link |
00:54:38.000
certainly at the time,
link |
00:54:39.100
were struggling to be consistent and reliable.
link |
00:54:42.740
And so training them in a reinforcement learning setting
link |
00:54:45.020
was a bit of a mess.
link |
00:54:46.720
I had, I don't know, generation after generation
link |
00:54:50.120
of, like, master's students
link |
00:54:51.880
who wanted to do value function approximation,
link |
00:54:55.700
basically reinforcement learning with neural nets.
link |
00:54:59.380
And over and over and over again, we were failing.
link |
00:55:03.620
We couldn't get the good results that Jerry Tesauro got.
link |
00:55:06.160
I now believe that Jerry is a neural net whisperer.
link |
00:55:09.680
He has a particular ability to get neural networks
link |
00:55:14.080
to do things that other people would find impossible.
link |
00:55:18.040
And it's not the technology,
link |
00:55:19.640
it's the technology and Jerry together.
link |
00:55:22.700
Which I think speaks to the role of the human expert
link |
00:55:27.200
in the process of machine learning.
link |
00:55:28.760
Right, it's so easy.
link |
00:55:30.060
We're so drawn to the idea that it's the technology
link |
00:55:32.860
that is where the power is coming from
link |
00:55:36.000
that I think we lose sight of the fact
link |
00:55:38.000
that sometimes you need a really good,
link |
00:55:39.440
just like, I mean, no one would think,
link |
00:55:40.800
hey, here's this great piece of software.
link |
00:55:42.240
Here's like, I don't know, GNU Emacs or whatever.
link |
00:55:44.800
And doesn't that prove that computers are super powerful
link |
00:55:48.380
and basically gonna take over the world?
link |
00:55:49.960
It's like, no, Stalman is a hell of a hacker, right?
link |
00:55:52.640
So he was able to make the code do these amazing things.
link |
00:55:55.880
He couldn't have done it without the computer,
link |
00:55:57.520
but the computer couldn't have done it without him.
link |
00:55:59.160
And so I think people discount the role of people
link |
00:56:02.360
like Jerry who have just a particular set of skills.
link |
00:56:07.360
On that topic, by the way, as a small side note,
link |
00:56:10.620
I tweeted Emacs is greater than Vim yesterday
link |
00:56:14.620
and deleted the tweet 10 minutes later
link |
00:56:18.020
when I realized it started a war.
link |
00:56:21.860
I was like, oh, I was just kidding.
link |
00:56:24.340
I was just being, and I'm gonna walk back and forth.
link |
00:56:29.340
So people still feel passionately
link |
00:56:30.980
about that particular piece of good stuff.
link |
00:56:32.940
Yeah, I don't get that
link |
00:56:33.780
because Emacs is clearly so much better, I don't understand.
link |
00:56:37.380
But why do I say that?
link |
00:56:38.220
Because I spent a block of time in the 80s
link |
00:56:43.220
making my fingers know the Emacs keys
link |
00:56:46.180
and now that's part of the thought process for me.
link |
00:56:49.060
Like I need to express, and if you take that,
link |
00:56:51.460
if you take my Emacs key bindings away, I become...
link |
00:56:57.660
I can't express myself.
link |
00:56:58.820
I'm the same way with the,
link |
00:56:59.660
I don't know if you know what it is,
link |
00:57:01.060
but it's a Kinesis keyboard, which is this butt shaped keyboard.
link |
00:57:05.100
Yes, I've seen them.
link |
00:57:06.940
They're very, I don't know, sexy, elegant?
link |
00:57:10.540
They're just beautiful.
link |
00:57:11.700
Yeah, they're gorgeous, way too expensive.
link |
00:57:14.460
But the problem with them, similar with Emacs,
link |
00:57:19.220
is once you learn to use it.
link |
00:57:23.860
It's harder to use other things.
link |
00:57:24.860
It's hard to use other things.
link |
00:57:26.100
There's this absurd thing where I have like small, elegant,
link |
00:57:29.060
lightweight, beautiful little laptops
link |
00:57:31.500
and I'm sitting there in a coffee shop
link |
00:57:33.180
with a giant Kinesis keyboard and a sexy little laptop.
link |
00:57:36.340
It's absurd, but I used to feel bad about it,
link |
00:57:40.460
but at the same time, you just kind of have to,
link |
00:57:42.900
sometimes it's back to the Billy Joel thing.
link |
00:57:44.780
You just have to throw that Billy Joel record
link |
00:57:47.220
and throw Taylor Swift and Justin Bieber to the wind.
link |
00:57:51.380
So...
link |
00:57:52.220
See, but I like them now because again,
link |
00:57:54.820
I have no musical taste.
link |
00:57:55.740
Like now that I've heard Justin Bieber enough,
link |
00:57:57.900
I'm like, I really like his songs.
link |
00:57:59.980
And Taylor Swift, not only do I like her songs,
link |
00:58:02.980
but my daughter's convinced that she's a genius.
link |
00:58:04.820
And so now I basically have signed onto that.
link |
00:58:07.020
So...
link |
00:58:08.100
So yeah, that speaks to the,
link |
00:58:10.060
back to the robustness of the human brain.
link |
00:58:11.700
That speaks to the neuroplasticity
link |
00:58:13.300
that you can just like a mouse teach yourself to,
link |
00:58:17.980
or probably a dog teach yourself to enjoy Taylor Swift.
link |
00:58:21.500
I'll try it out.
link |
00:58:22.340
I don't know.
link |
00:58:23.660
I try, you know what?
link |
00:58:25.300
It has to do with just like acclimation, right?
link |
00:58:28.060
Just like you said, a couple of weeks.
link |
00:58:29.660
Yeah.
link |
00:58:30.500
That's an interesting experiment.
link |
00:58:31.340
I'll actually try that.
link |
00:58:32.180
Like I'll listen to it.
link |
00:58:33.020
That wasn't the intent of the experiment?
link |
00:58:33.860
Just like social media,
link |
00:58:34.700
it wasn't intended as an experiment
link |
00:58:36.100
to see what we can take as a society,
link |
00:58:38.220
but it turned out that way.
link |
00:58:39.540
I don't think I'll be the same person
link |
00:58:40.860
on the other side of the week listening to Taylor Swift,
link |
00:58:43.300
but let's try.
link |
00:58:44.140
No, it's more compartmentalized.
link |
00:58:45.820
Don't be so worried.
link |
00:58:46.860
Like it's, like I get that you can be worried,
link |
00:58:48.980
but don't be so worried
link |
00:58:49.820
because we compartmentalize really well.
link |
00:58:51.420
And so it won't bleed into other parts of your life.
link |
00:58:53.860
You won't start, I don't know,
link |
00:58:56.220
wearing red lipstick or whatever.
link |
00:58:57.260
Like it's fine.
link |
00:58:58.260
It's fine.
link |
00:58:59.100
It changed fashion and everything.
link |
00:58:59.940
It's fine.
link |
00:59:00.780
But you know what?
link |
00:59:01.620
The thing you have to watch out for
link |
00:59:02.460
is you'll walk into a coffee shop
link |
00:59:03.860
once we can do that again.
link |
00:59:05.180
And recognize the song?
link |
00:59:06.220
And you'll be, no,
link |
00:59:07.060
you won't know that you're singing along
link |
00:59:09.220
until everybody in the coffee shop is looking at you.
link |
00:59:11.540
And then you're like, that wasn't me.
link |
00:59:16.060
Yeah, that's the, you know,
link |
00:59:17.140
people are afraid of AGI.
link |
00:59:18.300
I'm afraid of the Taylor Swift.
link |
00:59:21.020
The Taylor Swift takeover.
link |
00:59:22.300
Yeah, and I mean, people should know that TD Gammon was,
link |
00:59:26.940
I get, would you call it,
link |
00:59:28.300
do you like the terminology of self play by any chance?
link |
00:59:31.300
So like systems that learn by playing themselves.
link |
00:59:35.300
Just, I don't know if it's the best word, but.
link |
00:59:38.060
So what's the problem with that term?
link |
00:59:41.180
I don't know.
link |
00:59:42.020
So it's like the big bang,
link |
00:59:43.540
like it's like talking to a serious physicist.
link |
00:59:46.780
Do you like the term big bang?
link |
00:59:47.980
And when it was early,
link |
00:59:49.740
I feel like it's the early days of self play.
link |
00:59:51.620
I don't know, maybe it was used previously,
link |
00:59:53.220
but I think it's been used by only a small group of people.
link |
00:59:57.660
And so like, I think we're still deciding
link |
00:59:59.660
is this ridiculously silly name a good name
link |
01:00:02.860
for potentially one of the most important concepts
link |
01:00:05.860
in artificial intelligence?
link |
01:00:07.140
Okay, it depends how broadly you apply the term.
link |
01:00:09.020
So I used the term in my 1996 PhD dissertation.
link |
01:00:12.980
Wow, the actual terms of self play.
link |
01:00:14.660
Yeah, because Tesauro's paper was something like
link |
01:00:18.540
training up an expert backgammon player through self play.
link |
01:00:21.660
So I think it was in the title of his paper.
link |
01:00:24.060
If not in the title, it was definitely a term that he used.
link |
01:00:27.140
There's another term that we got from that work is rollout.
link |
01:00:29.740
So I don't know if you, do you ever hear the term rollout?
link |
01:00:32.020
That's a backgammon term that has now applied
link |
01:00:35.180
generally in computers, well, at least in AI
link |
01:00:38.380
because of TD Gammon.
link |
01:00:40.740
That's fascinating.
link |
01:00:41.580
So how is self play being used now?
link |
01:00:43.140
And like, why is it,
link |
01:00:44.380
does it feel like a more general powerful concept
link |
01:00:46.460
is sort of the idea of,
link |
01:00:47.860
well, the machine's just gonna teach itself to be smart.
link |
01:00:50.020
Yeah, so that's where maybe you can correct me,
link |
01:00:53.740
but that's where the continuation of the spirit
link |
01:00:56.740
and actually like literally the exact algorithms
link |
01:01:00.220
of TD gammon are applied by DeepMind and OpenAI
link |
01:01:03.980
to learn games that are a little bit more complex
link |
01:01:07.220
that when I was learning artificial intelligence,
link |
01:01:09.060
Go was presented to me
link |
01:01:10.780
with Artificial Intelligence: A Modern Approach.
link |
01:01:13.900
I don't know if they explicitly pointed to Go
link |
01:01:16.180
in those books as like unsolvable kind of thing,
link |
01:01:20.900
like implying that these approaches hit their limit
link |
01:01:24.340
in this, with these particular kind of games.
link |
01:01:26.380
So something, I don't remember if the book said it or not,
link |
01:01:29.460
but something in my head,
link |
01:01:31.140
or if it was the professors instilled in me the idea
link |
01:01:34.380
like this is the limits of artificial intelligence
link |
01:01:37.060
of the field.
link |
01:01:38.300
Like it instilled in me the idea
link |
01:01:40.780
that if we can create a system that can solve the game of Go
link |
01:01:44.900
we've achieved AGI.
link |
01:01:46.180
That was kind of, I didn't explicitly like say this,
link |
01:01:49.580
but that was the feeling.
link |
01:01:51.180
And so from, I was one of the people that it seemed magical
link |
01:01:54.140
when a learning system was able to beat
link |
01:01:59.340
a human world champion at the game of Go
link |
01:02:02.340
and even more so from that, that was AlphaGo,
link |
01:02:06.740
even more so with AlphaGo Zero
link |
01:02:08.380
then kind of renamed and advanced into AlphaZero,
link |
01:02:11.900
beating a world champion or world class player
link |
01:02:16.940
without any supervised learning on expert games,
link |
01:02:21.420
learning only by playing itself.
link |
01:02:24.580
So that is, I don't know what to make of it.
link |
01:02:29.020
I think it would be interesting to hear
link |
01:02:31.300
what your opinions are on just how exciting,
link |
01:02:35.180
surprising, profound, interesting, or boring
link |
01:02:40.180
the breakthrough performance of AlphaZero was.
link |
01:02:45.180
Okay, so AlphaGo knocked my socks off.
link |
01:02:48.380
That was so remarkable.
link |
01:02:50.780
Which aspect of it?
link |
01:02:52.940
That they got it to work,
link |
01:02:55.020
that they actually were able to leverage
link |
01:02:57.540
a whole bunch of different ideas,
link |
01:02:58.980
integrate them into one giant system.
link |
01:03:01.060
Just the software engineering aspect of it is mind blowing.
link |
01:03:04.220
I don't, I've never been a part of a program
link |
01:03:06.760
as complicated as the program that they built for that.
link |
01:03:09.660
And just the, like Jerry Tesauro is a neural net whisperer,
link |
01:03:14.660
like David Silver is a kind of neural net whisperer too.
link |
01:03:17.420
He was able to coax these networks
link |
01:03:19.300
and these new way out there architectures
link |
01:03:22.380
to do these, solve these problems that,
link |
01:03:25.980
as you said, when we were learning from AI,
link |
01:03:31.220
no one had an idea how to make it work.
link |
01:03:32.780
It was remarkable that these techniques
link |
01:03:35.780
that were so good at playing chess
link |
01:03:40.140
and that could beat the world champion in chess
link |
01:03:42.020
couldn't beat your typical Go playing teenager in Go.
link |
01:03:46.660
So the fact that in a very short number of years,
link |
01:03:49.740
we kind of ramped up to trouncing people in Go
link |
01:03:54.180
just blew me away.
link |
01:03:55.980
So you're kind of focusing on the engineering aspect,
link |
01:03:58.500
which is also very surprising.
link |
01:04:00.060
I mean, there's something different
link |
01:04:02.580
about large, well funded companies.
link |
01:04:05.260
I mean, there's a compute aspect to it too.
link |
01:04:07.940
Like that, of course, I mean, that's similar
link |
01:04:11.500
to Deep Blue, right, with IBM.
link |
01:04:14.300
Like there's something important to be learned
link |
01:04:16.660
and remembered about a large company
link |
01:04:19.500
taking the ideas that are already out there
link |
01:04:22.020
and investing a few million dollars into it or more.
link |
01:04:26.180
And so you're kind of saying the engineering
link |
01:04:29.820
is kind of fascinating, both on the,
link |
01:04:32.060
with AlphaGo is probably just gathering all the data,
link |
01:04:35.300
right, of the expert games, like organizing everything,
link |
01:04:38.860
actually doing distributed supervised learning.
link |
01:04:42.780
And to me, see the engineering I kind of took for granted,
link |
01:04:49.420
to me philosophically being able to persist
link |
01:04:55.100
in the face of like long odds,
link |
01:04:57.940
because it feels like for me,
link |
01:05:00.180
I would be one of the skeptical people in the room
link |
01:05:02.260
thinking that you can learn your way to beat Go.
link |
01:05:05.140
Like it sounded like, especially with David Silver,
link |
01:05:08.500
it sounded like David was not confident at all.
link |
01:05:11.780
So like it was, like not,
link |
01:05:15.780
it's funny how confidence works.
link |
01:05:18.540
It's like, you're not like cocky about it, like, but.
link |
01:05:24.860
Right, because if you're cocky about it,
link |
01:05:26.140
you kind of stop and stall and don't get anywhere.
link |
01:05:28.660
But there's like a hope that's unbreakable.
link |
01:05:31.620
Maybe that's better than confidence.
link |
01:05:33.280
It's a kind of wishful hope and a little dream.
link |
01:05:36.380
And you almost don't want to do anything else.
link |
01:05:38.980
You kind of keep doing it.
link |
01:05:40.900
That's, that seems to be the story and.
link |
01:05:43.660
But with enough skepticism that you're looking
link |
01:05:45.660
for where the problems are and fighting through them.
link |
01:05:48.420
Cause you know, there's gotta be a way out of this thing.
link |
01:05:51.100
And for him, it was probably,
link |
01:05:52.500
there's a bunch of little factors that come into play.
link |
01:05:55.980
It's funny how these stories just all come together.
link |
01:05:57.780
Like everything he did in his life came into play,
link |
01:06:00.660
which is like a love for video games
link |
01:06:02.940
and also a connection to,
link |
01:06:05.380
so the nineties had to happen with TD Gammon and so on.
link |
01:06:09.020
In some ways it's surprising,
link |
01:06:10.900
maybe you can provide some intuition to it
link |
01:06:13.700
that not much more than TD Gammon was done
link |
01:06:16.300
for quite a long time on the reinforcement learning front.
link |
01:06:19.840
Is that weird to you?
link |
01:06:21.140
I mean, like I said, the students who I worked with,
link |
01:06:24.180
we tried to get, basically apply that architecture
link |
01:06:27.140
to other problems and we consistently failed.
link |
01:06:30.700
There were a couple of really nice demonstrations
link |
01:06:33.900
that ended up being in the literature.
link |
01:06:35.100
There was a paper about controlling elevators, right?
link |
01:06:38.700
Where it's like, okay, can we modify the heuristic
link |
01:06:42.260
that elevators use for deciding,
link |
01:06:43.620
like a bank of elevators for deciding which floors
link |
01:06:46.160
we should be stopping on to maximize throughput essentially.
link |
01:06:50.260
And you can set that up as a reinforcement learning problem
link |
01:06:52.320
and you can have a neural net represent the value function
link |
01:06:55.580
so that it's taking where all the elevators are,
link |
01:06:57.680
where the button pushes are, you know, this high dimensional,
link |
01:07:00.580
well, at the time high dimensional input,
link |
01:07:03.700
you know, a couple of dozen dimensions
link |
01:07:05.620
and turn that into a prediction as to,
link |
01:07:07.980
oh, is it gonna be better if I stop at this floor or not?
link |
01:07:10.620
And ultimately it appeared as though
link |
01:07:13.460
for the standard simulation distribution
link |
01:07:16.780
for people trying to leave the building
link |
01:07:18.280
at the end of the day,
link |
01:07:19.300
that the neural net learned a better strategy
link |
01:07:21.160
than the standard one that's implemented
link |
01:07:22.740
in elevator controllers.
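A bare-bones illustration of that setup, with a made-up state encoding and an untrained network, not the actual elevator work: a small value network maps the building state to a score, and the controller just compares the predicted value of stopping versus passing:

import math
import random

random.seed(1)

STATE_DIM = 24        # e.g. elevator positions plus hall-call buttons (made-up encoding)
HIDDEN = 16

# Randomly initialized one-hidden-layer value network; in the real setting these
# weights would be trained with TD updates against observed waiting times.
W1 = [[random.gauss(0, 0.3) for _ in range(STATE_DIM)] for _ in range(HIDDEN)]
W2 = [random.gauss(0, 0.3) for _ in range(HIDDEN)]

def value(state):
    # V(state): predicted long-run benefit of being in this building configuration.
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))

def should_stop(state_if_stop, state_if_pass):
    # The controller's decision reduces to comparing predicted values of the two
    # successor states: "is it gonna be better if I stop at this floor or not?"
    return value(state_if_stop) > value(state_if_pass)

# Hypothetical usage with two made-up successor-state encodings:
stop_state = [random.random() for _ in range(STATE_DIM)]
pass_state = [random.random() for _ in range(STATE_DIM)]
print(should_stop(stop_state, pass_state))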
link |
01:07:24.860
So that was nice.
link |
01:07:26.540
There was some work that Satinder Singh et al
link |
01:07:28.820
did on handoffs with cell phones,
link |
01:07:34.060
you know, deciding when should you hand off
link |
01:07:36.680
from this cell tower to this cell tower.
link |
01:07:38.100
Oh, okay, communication networks, yeah.
link |
01:07:39.980
Yeah, and so a couple of things
link |
01:07:42.700
seemed like they were really promising.
link |
01:07:44.180
None of them made it into production that I'm aware of.
link |
01:07:46.780
And neural nets as a whole started
link |
01:07:48.420
to kind of implode around then.
link |
01:07:50.300
And so there just wasn't a lot of air in the room
link |
01:07:53.800
for people to try to figure out,
link |
01:07:55.020
okay, how do we get this to work in the RL setting?
link |
01:07:58.420
And then they found their way back in 10 plus years.
link |
01:08:03.140
So you said AlphaGo was impressive,
link |
01:08:05.180
like it's a big spectacle.
link |
01:08:06.540
Is there, is that?
link |
01:08:07.860
Right, so then AlphaZero.
link |
01:08:09.120
So I think I may have a slightly different opinion
link |
01:08:11.460
on this than some people.
link |
01:08:12.440
So I talked to Satinder Singh in particular about this.
link |
01:08:15.540
So Satinder was, like Rich Sutton,
link |
01:08:18.400
a student of Andy Barto.
link |
01:08:19.660
So they came out of the same lab,
link |
01:08:21.280
very influential machine learning,
link |
01:08:23.940
reinforcement learning researcher.
link |
01:08:26.100
Now at DeepMind, as is Rich.
link |
01:08:29.900
Though different sites, the two of them.
link |
01:08:31.940
He's in Alberta.
link |
01:08:33.020
Rich is in Alberta and Satinder would be in England,
link |
01:08:36.340
but I think he's in England from Michigan at the moment.
link |
01:08:39.620
But the, but he was, yes,
link |
01:08:41.860
he was much more impressed with AlphaGo Zero,
link |
01:08:46.780
which didn't get a kind of a bootstrap
link |
01:08:50.100
in the beginning with human trained games.
link |
01:08:51.660
It just was purely self play.
link |
01:08:53.300
Though the first one AlphaGo
link |
01:08:55.740
was also a tremendous amount of self play, right?
link |
01:08:58.080
They started off, they kickstarted the action network
link |
01:09:01.060
that was making decisions,
link |
01:09:02.540
but then they trained it for a really long time
link |
01:09:04.460
using more traditional temporal difference methods.
link |
01:09:08.220
So as a result, I didn't,
link |
01:09:09.860
it didn't seem that different to me.
link |
01:09:11.860
Like, it seems like, yeah, why wouldn't that work?
link |
01:09:15.940
Like once you, once it works, it works.
link |
01:09:17.780
So what, but he found that removal
link |
01:09:21.420
of that extra information to be breathtaking.
link |
01:09:23.780
Like that's a game changer.
link |
01:09:25.940
To me, the first thing was more of a game changer.
link |
01:09:27.860
But the open question, I mean,
link |
01:09:29.420
I guess that's the assumption is the expert games
link |
01:09:32.980
might contain within them a humongous amount of information.
link |
01:09:39.180
But we know that it went beyond that, right?
link |
01:09:41.140
We know that it somehow got away from that information
link |
01:09:43.740
because it was learning strategies.
link |
01:09:45.140
I don't think AlphaGo is just better
link |
01:09:48.540
at implementing human strategies.
link |
01:09:50.260
I think it actually developed its own strategies
link |
01:09:52.540
that were more effective.
link |
01:09:54.500
And so from that perspective, okay, well,
link |
01:09:56.780
so it made at least one quantum leap
link |
01:10:00.220
in terms of strategic knowledge.
link |
01:10:02.460
Okay, so now maybe it makes three, like, okay.
link |
01:10:05.460
But that first one is the doozy, right?
link |
01:10:07.540
Getting it to work reliably and for the networks
link |
01:10:11.660
to hold onto the value well enough.
link |
01:10:13.500
Like that was a big step.
link |
01:10:16.100
Well, maybe you could speak to this
link |
01:10:17.820
on the reinforcement learning front.
link |
01:10:19.140
So starting from scratch and learning to do something,
link |
01:10:25.260
like the first like random behavior
link |
01:10:29.140
to like crappy behavior to like somewhat okay behavior.
link |
01:10:34.860
It's not obvious to me that that's not like impossible
link |
01:10:39.860
to take those steps.
link |
01:10:41.420
Like if you just think about the intuition,
link |
01:10:43.900
like how the heck does random behavior
link |
01:10:46.780
become somewhat basic intelligent behavior?
link |
01:10:51.100
Not human level, not superhuman level, but just basic.
link |
01:10:55.180
But you're saying to you kind of the intuition is like,
link |
01:10:58.100
if you can go from human to superhuman level intelligence
link |
01:11:01.060
on this particular task of game playing,
link |
01:11:04.060
then you're good at taking leaps.
link |
01:11:07.020
So you can take many of them.
link |
01:11:08.580
That the system, I believe that the system
link |
01:11:10.020
can take that kind of leap.
link |
01:11:12.140
Yeah, and also I think that beginner knowledge in Go,
link |
01:11:17.060
like you can start to get a feel really quickly
link |
01:11:19.700
for the idea that being in certain parts of the board
link |
01:11:25.180
seems to be more associated with winning, right?
link |
01:11:28.460
Cause it's not stumbling upon the concept of winning.
link |
01:11:32.060
It's told that it wins or that it loses.
link |
01:11:34.660
Well, it's self play.
link |
01:11:35.500
So it both wins and loses.
link |
01:11:36.700
It's told which side won.
link |
01:11:39.540
And the information is kind of there
link |
01:11:41.900
to start percolating around to make a difference as to,
link |
01:11:46.460
well, these things have a better chance of helping you win.
link |
01:11:48.860
And these things have a worse chance of helping you win.
link |
01:11:50.660
And so it can get to basic play, I think pretty quickly.
link |
01:11:54.340
Then once it has basic play,
link |
01:11:55.980
well now it's kind of forced to do some search
link |
01:11:58.580
to actually experiment with, okay,
link |
01:12:00.100
well what gets me that next increment of improvement?
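A skeleton of that bootstrapping process on a deliberately tiny game (players alternately add 1 or 2, whoever reaches 10 wins; a toy stand-in, nothing like Go): the only signal is which side won, and pushing that outcome back into the positions each side visited is enough to go from random play to basic play:

import random
from collections import defaultdict

random.seed(0)
TARGET = 10                       # toy game: add 1 or 2 per turn; reaching 10 wins
V = defaultdict(lambda: 0.5)      # V[(total, player)] ~ chance the player to move wins from here

def value_for(total, player):
    if total == TARGET:
        return 0.0                # the opponent never gets to move; they have already lost
    return V[(total, player)]

def play_one_self_play_game(epsilon=0.1, lr=0.1):
    total, player, history = 0, 0, []
    while total < TARGET:
        moves = [m for m in (1, 2) if total + m <= TARGET]
        if random.random() < epsilon:
            m = random.choice(moves)
        else:
            # Greedy: pick the move whose successor looks worst for the opponent.
            m = max(moves, key=lambda m: 1.0 - value_for(total + m, 1 - player))
        history.append((total, player))
        total += m
        if total == TARGET:
            winner = player
        player = 1 - player
    # The only training signal is who won: nudge every visited position's value
    # toward 1 for the winner and 0 for the loser.
    for state, who in history:
        target = 1.0 if who == winner else 0.0
        V[(state, who)] += lr * (target - V[(state, who)])

for _ in range(20000):
    play_one_self_play_game()
# After enough games the values roughly recover the known strategy for this toy
# game (being left a total of 1, 4, or 7 is a forced loss), purely from outcomes.
print(sorted((s[0], round(v, 2)) for s, v in V.items() if s[1] == 0))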
link |
01:12:04.140
How far do you think, okay, this is where you kind of
link |
01:12:07.180
bring up the Elon Musk and the Sam Harris, right?
link |
01:12:10.500
How far is your intuition about these kinds
link |
01:12:13.140
of self play mechanisms being able to take us?
link |
01:12:16.020
Cause it feels, one of the ominous but stated calmly things
link |
01:12:23.060
that when I talked to David Silver, he said,
link |
01:12:25.500
is that they have not yet discovered a ceiling
link |
01:12:29.180
for Alpha Zero, for example, in the game of Go or chess.
link |
01:12:32.660
Like it keeps, no matter how much they compute,
link |
01:12:35.540
they throw at it, it keeps improving.
link |
01:12:37.620
So it's possible, it's very possible that if you throw,
link |
01:12:43.100
you know, some like 10 X compute that it will improve
link |
01:12:46.540
by five X or something like that.
link |
01:12:48.660
And when stated calmly, it's so like, oh yeah, I guess so.
link |
01:12:54.580
But like, and then you think like,
link |
01:12:56.300
well, can we potentially have like continuations
link |
01:13:00.900
of Moore's law in totally different way,
link |
01:13:02.860
like broadly defined Moore's law,
link |
01:13:04.980
not the exponential improvement, like,
link |
01:13:08.500
are we going to have an Alpha Zero that swallows the world?
link |
01:13:13.180
But notice it's not getting better at other things.
link |
01:13:15.140
It's getting better at Go.
link |
01:13:16.820
And I think that's a big leap to say,
link |
01:13:19.460
okay, well, therefore it's better at other things.
link |
01:13:22.820
Well, I mean, the question is how much of the game of life
link |
01:13:26.500
can be turned into.
link |
01:13:27.700
Right, so that I think is a really good question.
link |
01:13:30.100
And I think that we don't, I don't think we as a,
link |
01:13:32.460
I don't know, community really know the answer to this,
link |
01:13:34.860
but so, okay, so I went to a talk
link |
01:13:39.060
by some experts on computer chess.
link |
01:13:43.260
So in particular, computer chess is really interesting
link |
01:13:45.980
because for, of course, for a thousand years,
link |
01:13:49.340
humans were the best chess playing things on the planet.
link |
01:13:52.460
And then computers like edged ahead of the best person.
link |
01:13:56.420
And they've been ahead ever since.
link |
01:13:57.620
It's not like people have overtaken computers.
link |
01:14:01.160
But computers and people together
link |
01:14:05.020
have overtaken computers.
link |
01:14:07.100
So at least last time I checked,
link |
01:14:09.060
I don't know what the very latest is,
link |
01:14:10.340
but last time I checked that there were teams of people
link |
01:14:14.220
who could work with computer programs
link |
01:14:16.100
to defeat the best computer programs.
link |
01:14:17.980
In the game of Go?
link |
01:14:18.820
In the game of chess.
link |
01:14:19.740
In the game of chess.
link |
01:14:20.580
Right, and so using the information about how,
link |
01:14:25.740
these things called Elo scores,
link |
01:14:27.080
this sort of notion of how strong a player you are.
link |
01:14:30.320
There's kind of a range of possible scores.
link |
01:14:32.540
And you increment in score,
link |
01:14:35.500
basically if you can beat another player
link |
01:14:37.820
of that lower score 62% of the time or something like that.
link |
01:14:41.760
Like there's some threshold
link |
01:14:42.900
of if you can somewhat consistently beat someone,
link |
01:14:46.220
then you are of a higher score than that person.
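For reference, the standard Elo expectation formula behind that kind of threshold, with made-up ratings; a gap of roughly 85 to 100 points corresponds to winning in the low-to-mid 60 percent range:

def elo_expected_score(rating_a, rating_b):
    # Standard Elo expectation: the score player A is expected to get against B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

for gap in (50, 85, 100, 200):
    print(gap, round(elo_expected_score(1500 + gap, 1500), 3))
# 50 -> ~0.57, 85 -> ~0.62, 100 -> ~0.64, 200 -> ~0.76, so a gap of roughly 85 to 100
# points matches the "62% of the time or something like that" threshold.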
link |
01:14:48.800
And there's a question as to how many times
link |
01:14:50.820
can you do that in chess, right?
link |
01:14:52.700
And so we know that there's a range of human ability levels
link |
01:14:55.460
that cap out with the best playing humans.
link |
01:14:57.820
And the computers went a step beyond that.
link |
01:15:00.140
And computers and people together have not gone,
link |
01:15:03.100
I think a full step beyond that.
link |
01:15:05.200
It feels, the estimates that they have
link |
01:15:07.540
is that it's starting to asymptote.
link |
01:15:09.160
That we've reached kind of the maximum,
link |
01:15:11.000
the best possible chess playing.
link |
01:15:13.940
And so that means that there's kind of
link |
01:15:15.460
a finite strategic depth, right?
link |
01:15:18.500
At some point you just can't get any better at this game.
link |
01:15:21.700
Yeah, I mean, I don't, so I'll actually check that.
link |
01:15:25.740
I think it's interesting because if you have somebody
link |
01:15:29.660
like Magnus Carlsen, who's using these chess programs
link |
01:15:34.980
to train his mind, like to learn about chess.
link |
01:15:37.940
To become a better chess player, yeah.
link |
01:15:38.900
And so like, that's a very interesting thing
link |
01:15:41.820
because we're not static creatures.
link |
01:15:43.980
We're learning together.
link |
01:15:45.180
I mean, just like we're talking about social networks,
link |
01:15:47.820
those algorithms are teaching us
link |
01:15:49.540
just like we're teaching those algorithms.
link |
01:15:51.540
So that's a fascinating thing.
link |
01:15:52.500
But I think the best chess playing programs
link |
01:15:57.140
are now better than the pairs.
link |
01:15:58.700
Like they have competition between pairs,
link |
01:16:00.700
but it's still, even if they weren't,
link |
01:16:03.620
it's an interesting question, where's the ceiling?
link |
01:16:06.020
So the David, the ominous David Silver kind of statement
link |
01:16:09.420
is like, we have not found the ceiling.
link |
01:16:12.180
Right, so the question is, okay,
link |
01:16:14.260
so I don't know his analysis on that.
link |
01:16:16.540
My, from talking to Go experts,
link |
01:16:20.060
the depth, the strategic depth of Go
link |
01:16:22.620
seems to be substantially greater than that of chess.
link |
01:16:25.180
That there's more kind of steps of improvement
link |
01:16:27.920
that you can make, getting better and better
link |
01:16:29.700
and better and better.
link |
01:16:30.540
But there's no reason to think that it's infinite.
link |
01:16:32.100
Infinite, yeah.
link |
01:16:33.420
And so it could be that what David is seeing
link |
01:16:37.060
is a kind of asymptoting that you can keep getting better,
link |
01:16:39.780
but with diminishing returns.
link |
01:16:41.140
And at some point you hit optimal play.
link |
01:16:43.620
Like in theory, all these finite games, they're finite.
link |
01:16:47.620
They have an optimal strategy.
link |
01:16:49.280
There's a strategy that is the minimax optimal strategy.
link |
01:16:51.820
And so at that point, you can't get any better.
link |
01:16:54.780
You can't beat that strategy.
link |
01:16:56.460
Now that strategy may be,
link |
01:16:58.220
from an information processing perspective, intractable.
link |
01:17:02.380
Right, you need, all the situations
link |
01:17:06.260
are sufficiently different that you can't compress it at all.
link |
01:17:08.460
It's this giant mess of hardcoded rules.
link |
01:17:12.220
And we can never achieve that.
link |
01:17:14.720
But that still puts a cap on how many levels of improvement
link |
01:17:17.740
that we can actually make.
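The minimax idea being referenced, sketched on the same tiny add-1-or-2 game used above; the point is just that any finite game has an optimal value you could compute in principle, even though for Go the tree is far too large to enumerate:

from functools import lru_cache

TARGET = 10  # same toy game: players alternately add 1 or 2; reaching 10 wins

@lru_cache(maxsize=None)
def minimax_value(total):
    # Value for the player about to move: +1 means a forced win, -1 a forced loss.
    best = -1
    for m in (1, 2):
        if total + m > TARGET:
            continue
        if total + m == TARGET:
            value = +1                         # this move wins outright
        else:
            value = -minimax_value(total + m)  # otherwise, the negation of the opponent's value
        best = max(best, value)
    return best

print([(t, minimax_value(t)) for t in range(TARGET)])
# Totals 1, 4, and 7 come out as forced losses for the player to move; everything else
# is a forced win, which is what the self-play sketch earlier converges toward.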
link |
01:17:19.020
But the thing about self play is if you put it,
link |
01:17:23.260
although I don't like doing that,
link |
01:17:24.540
in the broader category of self supervised learning,
link |
01:17:28.420
is that it doesn't require too much or any human input.
link |
01:17:31.780
Human labeling, yeah.
link |
01:17:32.700
Yeah, human label or just human effort.
link |
01:17:34.900
The human involvement passed a certain point.
link |
01:17:37.940
And the same thing you could argue is true
link |
01:17:41.100
for the recent breakthroughs in natural language processing
link |
01:17:44.820
with language models.
link |
01:17:45.860
Oh, this is how you get to GPT3.
link |
01:17:47.780
Yeah, see how I did that?
link |
01:17:49.780
That was a good transition.
link |
01:17:51.300
Yeah, I practiced that for days leading up to this now.
link |
01:17:56.460
But like that's one of the questions is,
link |
01:17:59.680
can we find ways to formulate problems in this world
link |
01:18:03.400
that are important to us humans,
link |
01:18:05.520
like more important than the game of chess,
link |
01:18:08.260
that to which self supervised kinds of approaches
link |
01:18:12.540
could be applied?
link |
01:18:13.380
Whether it's self play, for example,
link |
01:18:15.540
for like maybe you could think of like autonomous vehicles
link |
01:18:19.260
in simulation, that kind of stuff,
link |
01:18:22.340
or just robotics applications and simulation,
link |
01:18:25.720
or in the self supervised learning,
link |
01:18:29.440
where unannotated data,
link |
01:18:33.660
or data that's generated by humans naturally
link |
01:18:37.460
without extra costs, like Wikipedia,
link |
01:18:41.420
or like all of the internet can be used
link |
01:18:44.060
to learn something about,
link |
01:18:46.300
to create intelligent systems that do something
link |
01:18:49.300
really powerful, that pass the Turing test,
link |
01:18:52.380
or that do some kind of superhuman level performance.
link |
01:18:56.500
So what's your intuition,
link |
01:18:58.820
like trying to stitch all of it together
link |
01:19:01.600
about our discussion of AGI,
link |
01:19:05.180
the limits of self play,
link |
01:19:07.260
and your thoughts about maybe the limits of neural networks
link |
01:19:10.420
in the context of language models.
link |
01:19:13.100
Is there some intuition in there
link |
01:19:14.540
that might be useful to think about?
link |
01:19:17.020
Yeah, yeah, yeah.
link |
01:19:17.860
So first of all, the whole Transformer network
link |
01:19:22.820
family of things is really cool.
link |
01:19:26.620
It's really, really cool.
link |
01:19:28.140
I mean, if you've ever,
link |
01:19:30.260
back in the day you played with,
link |
01:19:31.780
I don't know, Markov models for generating texts,
link |
01:19:34.020
and you've seen the kind of texts that they spit out,
link |
01:19:35.820
and you compare it to what's happening now,
link |
01:19:37.960
it's amazing, it's so amazing.
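For comparison, the kind of Markov model text generator being remembered here, a minimal bigram sketch over a made-up corpus; its output has local word-to-word plausibility and no global coherence, which is roughly the gap the transformer models closed:

import random
from collections import defaultdict

random.seed(0)

corpus = ("the agent takes an action and the environment returns a reward "
          "and the agent updates its value function and takes another action").split()

# Bigram model: for each word, the list of words observed to follow it.
follows = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1].append(w2)

def generate(start, length=15):
    out = [start]
    for _ in range(length):
        nxt = follows.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

print(generate("the"))
# Each transition is locally plausible, but nothing ties the whole string together.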
link |
01:19:41.820
Now, it doesn't take very long interacting
link |
01:19:43.980
with one of these systems before you find the holes, right?
link |
01:19:47.340
It's not smart in any kind of general way.
link |
01:19:53.100
It's really good at a bunch of things.
link |
01:19:55.300
And it does seem to understand
link |
01:19:56.540
a lot of the statistics of language extremely well.
link |
01:19:59.980
And that turns out to be very powerful.
link |
01:20:01.860
You can answer many questions with that.
link |
01:20:04.040
But it doesn't make it a good conversationalist, right?
link |
01:20:06.580
And it doesn't make it a good storyteller.
link |
01:20:08.460
It just makes it good at imitating
link |
01:20:10.040
things that it's seen in the past.
link |
01:20:12.620
The exact same thing could be said
link |
01:20:14.540
by people who are voting for Donald Trump
link |
01:20:16.620
about Joe Biden supporters,
link |
01:20:18.060
and people voting for Joe Biden
link |
01:20:19.420
about Donald Trump supporters is, you know.
link |
01:20:22.900
That they're not intelligent, they're just following the.
link |
01:20:25.100
Yeah, they're following things they've seen in the past.
link |
01:20:27.420
And it doesn't take long to find the flaws
link |
01:20:31.220
in their natural language generation abilities.
link |
01:20:36.380
Yes, yes.
link |
01:20:37.220
So we're being very.
link |
01:20:38.060
That's interesting.
link |
01:20:39.500
Critical of AI systems.
link |
01:20:41.260
Right, so I've had a similar thought,
link |
01:20:43.420
which was that the stories that GPT3 spits out
link |
01:20:48.700
are amazing and very humanlike.
link |
01:20:52.420
And it doesn't mean that computers are smarter
link |
01:20:55.940
than we realize necessarily.
link |
01:20:57.500
It partly means that people are dumber than we realize.
link |
01:21:00.280
Or that much of what we do day to day is not that deep.
link |
01:21:04.520
Like we're just kind of going with the flow.
link |
01:21:07.300
We're saying whatever feels like the natural thing
link |
01:21:09.360
to say next.
link |
01:21:10.380
Not a lot of it is creative or meaningful or intentional.
link |
01:21:17.060
But enough is that we actually get by, right?
link |
01:21:20.460
We do come up with new ideas sometimes,
link |
01:21:22.280
and we do manage to talk each other into things sometimes.
link |
01:21:24.860
And we do sometimes vote for reasonable people sometimes.
link |
01:21:29.420
But it's really hard to see in the statistics
link |
01:21:32.660
because so much of what we're saying is kind of rote.
link |
01:21:35.620
And so our metrics that we use to measure
link |
01:21:38.160
how these systems are doing don't reveal that
link |
01:21:41.700
because it's in the interstices that is very hard to detect.
link |
01:21:47.100
But is your, do you have an intuition
link |
01:21:49.020
that with these language models, if they grow in size,
link |
01:21:53.380
it's already surprising when you go from GPT2 to GPT3
link |
01:21:57.460
that there is a noticeable improvement.
link |
01:21:59.540
So the question now goes back to the ominous David Silver
link |
01:22:02.560
and the ceiling.
link |
01:22:03.420
Right, so maybe there's just no ceiling.
link |
01:22:04.980
We just need more compute.
link |
01:22:06.140
Now, I mean, okay, so now I'm speculating.
link |
01:22:10.340
Yes.
link |
01:22:11.180
As opposed to before when I was completely on firm ground.
link |
01:22:13.860
All right, I don't believe that you can get something
link |
01:22:17.300
that really can do language and use language as a thing
link |
01:22:21.940
that doesn't interact with people.
link |
01:22:24.360
Like I think that it's not enough
link |
01:22:25.940
to just take everything that we've said written down
link |
01:22:28.300
and just say, that's enough.
link |
01:22:29.840
You can just learn from that and you can be intelligent.
link |
01:22:32.020
I think you really need to be pushed back at.
link |
01:22:35.360
I think that conversations,
link |
01:22:36.780
even people who are pretty smart,
link |
01:22:38.940
maybe the smartest thing that we know,
link |
01:22:40.720
maybe not the smartest thing we can imagine,
link |
01:22:43.020
but we get so much benefit
link |
01:22:44.700
out of talking to each other and interacting.
link |
01:22:48.620
That's presumably why you have conversations live with guests
link |
01:22:51.260
is that there's something in that interaction
link |
01:22:53.900
that would not be exposed by,
link |
01:22:55.920
oh, I'll just write you a story
link |
01:22:57.180
and then you can read it later.
link |
01:22:58.340
And I think because these systems
link |
01:23:00.300
are just learning from our stories,
link |
01:23:01.800
they're not learning from being pushed back at by us,
link |
01:23:05.200
that they're fundamentally limited
link |
01:23:06.540
into what they can actually become on this route.
link |
01:23:08.860
They have to get shut down.
link |
01:23:12.300
Like we have to have an argument,
link |
01:23:14.940
they have to have an argument with us
link |
01:23:15.980
and lose a couple of times
link |
01:23:17.540
before they start to realize, oh, okay, wait,
link |
01:23:20.540
there's some nuance here that actually matters.
link |
01:23:23.240
Yeah, that's actually subtle sounding,
link |
01:23:25.820
but quite profound that the interaction with humans
link |
01:23:30.020
is essential and the limitation within that
link |
01:23:34.240
is profound as well because the timescale,
link |
01:23:37.380
like the bandwidth at which you can really interact
link |
01:23:40.520
with humans is very low.
link |
01:23:43.500
So it's costly.
link |
01:23:44.460
So you can't, one of the underlying things about self play is,
link |
01:23:47.700
it has to do a very large number of interactions.
link |
01:23:53.100
And so you can't really deploy reinforcement learning systems
link |
01:23:56.660
into the real world to interact.
link |
01:23:58.140
Like you couldn't deploy a language model
link |
01:24:01.340
into the real world to interact with humans
link |
01:24:04.580
because it was just not getting enough data
link |
01:24:06.780
relative to the cost it takes to interact.
link |
01:24:09.860
Like the time of humans is expensive,
link |
01:24:12.820
which is really interesting.
link |
01:24:13.700
That takes us back to reinforcement learning
link |
01:24:16.300
and trying to figure out if there's ways
link |
01:24:18.700
to make algorithms that are more efficient at learning,
link |
01:24:22.500
keep the spirit in reinforcement learning
link |
01:24:24.660
and become more efficient.
link |
01:24:26.300
In some sense, that seems to be the goal.
link |
01:24:28.220
I'd love to hear what your thoughts are.
link |
01:24:31.380
I don't know if you got a chance to see
link |
01:24:33.380
the blog post called Bitter Lesson.
link |
01:24:35.140
Oh yes.
link |
01:24:37.060
By Rich Sutton that makes an argument,
link |
01:24:39.620
hopefully I can summarize it.
link |
01:24:41.620
Perhaps you can.
link |
01:24:43.460
Yeah, but do you want?
link |
01:24:44.660
Okay.
link |
01:24:45.500
So I mean, I could try and you can correct me,
link |
01:24:47.380
which is he makes an argument that it seems
link |
01:24:50.340
if we look at the long arc of the history
link |
01:24:52.940
of the artificial intelligence field,
link |
01:24:55.020
he calls 70 years that the algorithms
link |
01:24:58.380
from which we've seen the biggest improvements in practice
link |
01:25:02.900
are the very simple, like dumb algorithms
link |
01:25:05.980
that are able to leverage computation.
link |
01:25:08.660
And you just wait for the computation to improve.
link |
01:25:11.420
Like all of the academics and so on have fun
link |
01:25:13.660
by finding little tricks
link |
01:25:15.020
and congratulate themselves on those tricks.
link |
01:25:17.460
And sometimes those tricks can be like big,
link |
01:25:20.060
that feel in the moment like big spikes and breakthroughs,
link |
01:25:22.700
but in reality over the decades,
link |
01:25:25.660
it's still the same dumb algorithm
link |
01:25:27.620
that just waits for the compute to get faster and faster.
link |
01:25:31.700
Do you find that to be an interesting argument
link |
01:25:36.300
against the entirety of the field of machine learning
link |
01:25:39.540
as an academic discipline?
link |
01:25:41.020
That we're really just a subfield of computer architecture.
link |
01:25:44.380
We're just kind of waiting around
link |
01:25:45.500
for them to do their next thing.
link |
01:25:46.340
Who really don't want to do hardware work.
link |
01:25:48.140
So like.
link |
01:25:48.980
That's right.
link |
01:25:49.820
I really don't want to think about it.
link |
01:25:50.660
We're procrastinating.
link |
01:25:51.500
Yes, that's right, just waiting for them to do their jobs
link |
01:25:53.740
so that we can pretend to have done ours.
link |
01:25:55.180
So yeah, I mean, the argument reminds me a lot of,
link |
01:26:00.180
I think it was a Fred Jelinek quote,
link |
01:26:02.300
early computational linguist who said,
link |
01:26:04.740
we're building these computational linguistic systems
link |
01:26:07.260
and every time we fire a linguist, performance goes up
link |
01:26:11.100
by 10%, something like that.
link |
01:26:13.060
And so the idea of us building the knowledge in,
link |
01:26:16.060
in that case was much less,
link |
01:26:19.100
he was finding it to be much less successful
link |
01:26:20.980
than get rid of the people who know about language as a,
link |
01:26:25.020
from a kind of scholastic academic kind of perspective
link |
01:26:29.700
and replace them with more compute.
link |
01:26:32.180
And so I think this is kind of a modern version
link |
01:26:34.380
of that story, which is, okay,
link |
01:26:35.620
we want to do better on machine vision.
link |
01:26:38.420
You could build in all these,
link |
01:26:41.940
motivated part based models that,
link |
01:26:45.420
that just feel like obviously the right thing
link |
01:26:47.420
that you have to have,
link |
01:26:48.500
or we can throw a lot of data at it
link |
01:26:49.980
and guess what we're doing better with a lot of data.
link |
01:26:52.100
So I hadn't thought about it until this moment in this way,
link |
01:26:57.460
but what I believe, well, I've thought about what I believe.
link |
01:27:00.620
What I believe is that, you know, compositionality
link |
01:27:05.780
and what's the right way to say it,
link |
01:27:08.820
the complexity grows rapidly
link |
01:27:12.180
as you consider more and more possibilities,
link |
01:27:14.580
like explosively.
link |
01:27:16.740
And so far Moore's law has also been growing explosively
link |
01:27:20.180
exponentially.
link |
01:27:21.020
And so it really does seem like, well,
link |
01:27:23.020
we don't have to think really hard about the algorithm
link |
01:27:27.140
design or the way that we build the systems,
link |
01:27:29.340
because the best benefit we could get is exponential.
link |
01:27:32.740
And the best benefit that we can get from waiting
link |
01:27:34.700
is exponential.
link |
01:27:35.860
So we can just wait.
link |
01:27:38.180
It's got, that's gotta end, right?
link |
01:27:39.940
And there's hints now that,
link |
01:27:41.100
that Moore's law is starting to feel some friction,
link |
01:27:44.740
starting to, the world is pushing back a little bit.
link |
01:27:48.380
One thing that I don't know, do lots of people know this?
link |
01:27:50.940
I didn't know this, I was trying to write an essay
link |
01:27:54.020
and yeah, Moore's law has been amazing
link |
01:27:56.940
and it's enabled all sorts of things,
link |
01:27:58.580
but there's also a kind of counter Moore's law,
link |
01:28:01.380
which is that the development cost
link |
01:28:03.260
for each successive generation of chips also is doubling.
link |
01:28:07.660
So it's costing twice as much money.
link |
01:28:09.380
So the amount of development money per cycle or whatever
link |
01:28:12.900
is actually sort of constant.
link |
01:28:14.860
And at some point we run out of money.
link |
01:28:17.180
So, or we have to come up with an entirely different way
link |
01:28:19.540
of doing the development process.
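One way to read that argument, as a minimal sketch in Python with made-up round numbers rather than real industry figures: if each chip generation doubles both the transistor count and the development cost, then the development cost per transistor stays roughly flat, and the whole thing only works until the absolute dollar amounts become unaffordable.

# Hypothetical starting points, chosen only to illustrate the doubling argument.
transistors = 1_000_000   # capability proxy for generation 0
dev_cost = 100.0          # development cost for generation 0, in arbitrary units

for generation in range(5):
    cost_per_transistor = dev_cost / transistors
    print(f"gen {generation}: transistors={transistors:,}, "
          f"dev cost={dev_cost:,.0f}, cost per transistor={cost_per_transistor:.6f}")
    transistors *= 2   # Moore's law: capability doubles each generation
    dev_cost *= 2      # the counter trend: development cost doubles too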
link |
01:28:22.100
So like, I guess I'm always a bit skeptical of the, look,
link |
01:28:25.980
it's an exponential curve, therefore it has no end.
link |
01:28:28.700
Soon the number of people going to NeurIPS
link |
01:28:30.500
will be greater than the population of the earth.
link |
01:28:32.660
That means we're gonna discover life on other planets.
link |
01:28:35.460
No, it doesn't.
link |
01:28:36.300
It means that we're in a sigmoid curve on the front half,
link |
01:28:40.340
which looks a lot like an exponential.
link |
01:28:42.700
The second half is gonna look a lot like diminishing returns.
link |
01:28:46.140
Yeah, I mean, but the interesting thing about Moore's law,
link |
01:28:48.980
if you actually like look at the technologies involved,
link |
01:28:52.220
it's hundreds, if not thousands of S curves
link |
01:28:55.620
stacked on top of each other.
link |
01:28:56.700
It's not actually an exponential curve,
link |
01:28:58.700
it's constant breakthroughs.
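A minimal sketch of the sigmoid point, with a ceiling and growth rate chosen arbitrarily for illustration: a logistic curve tracks an exponential almost exactly on the front half and then flattens into diminishing returns as it approaches its ceiling.

import math

K = 1000.0   # ceiling (carrying capacity) of the logistic curve, arbitrary
r = 0.5      # growth rate shared by both curves, arbitrary

def exponential(t):
    return math.exp(r * t)

def logistic(t):
    # Starts near 1, grows like the exponential early on, saturates at K.
    return K / (1.0 + (K - 1.0) * math.exp(-r * t))

for t in range(0, 21, 2):
    print(f"t={t:2d}  exponential={exponential(t):10.1f}  logistic={logistic(t):7.1f}")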
link |
01:29:01.100
And then what becomes useful to think about,
link |
01:29:04.140
which is exactly what you're saying,
link |
01:29:05.500
the cost of development, like the size of teams,
link |
01:29:08.100
the amount of resources that are invested
link |
01:29:10.220
in continuing to find new S curves, new breakthroughs.
link |
01:29:14.300
And yeah, it's an interesting idea.
link |
01:29:19.100
If we live in the moment, if we sit here today,
link |
01:29:22.860
it seems to be the reasonable thing
link |
01:29:25.820
to say that exponentials end.
link |
01:29:29.180
And yet in the software realm,
link |
01:29:31.420
they just keep appearing to be happening.
link |
01:29:34.740
And it's so, I mean, it's so hard to disagree
link |
01:29:39.700
with Elon Musk on this.
link |
01:29:41.060
Because it like, I've, you know,
link |
01:29:45.980
I used to be one of those folks,
link |
01:29:47.740
I'm still one of those folks that studied
link |
01:29:49.980
autonomous vehicles, that's what I worked on.
link |
01:29:52.180
And it's like, you look at what Elon Musk is saying
link |
01:29:56.260
about autonomous vehicles, well, obviously,
link |
01:29:58.100
in a couple of years, or in a year, or next month,
link |
01:30:01.580
we'll have fully autonomous vehicles.
link |
01:30:03.220
Like there's no reason why we can't.
link |
01:30:04.700
Driving is pretty simple, like it's just a learning problem
link |
01:30:07.980
and you just need to convert all the driving
link |
01:30:11.060
that we're doing into data and just, you know, have a neural network
link |
01:30:13.140
train on that data.
link |
01:30:14.660
And like, we use only our eyes, so you can use cameras
link |
01:30:18.620
and you can train on it.
link |
01:30:20.380
And it's like, yeah, that should work.
link |
01:30:26.180
And then you put that hat on, like the philosophical hat,
link |
01:30:29.100
but then you put the pragmatic hat on and it's like,
link |
01:30:31.540
this is what the flaws of computer vision are.
link |
01:30:33.900
Like, this is what it means to train at scale.
link |
01:30:35.980
And then you put the human factors, the psychology hat on,
link |
01:30:40.940
which is like, driving actually involves a lot of,
link |
01:30:43.620
the cognitive science or cognitive,
link |
01:30:44.900
whatever the heck you call it, it's really hard,
link |
01:30:48.180
it's much harder to drive than we realize,
link |
01:30:50.900
there's a much larger number of edge cases.
link |
01:30:53.420
So building up an intuition around this is,
link |
01:30:57.460
around exponentials is really difficult.
link |
01:30:59.380
And on top of that, the pandemic is making us think
link |
01:31:03.180
about exponentials, making us realize that like,
link |
01:31:06.980
we don't understand anything about it,
link |
01:31:08.900
we're not able to intuit exponentials,
link |
01:31:11.060
we're either ultra terrified, some part of the population
link |
01:31:15.540
and some part is like the opposite of whatever
link |
01:31:20.260
indifferent, carefree, and we're not managing it very well.
link |
01:31:24.620
Blasé, well, wow, is that French?
link |
01:31:28.260
I assume so, it's got an accent.
link |
01:31:29.780
So it's fascinating to think about the limits
link |
01:31:35.460
of this exponential growth of technology,
link |
01:31:41.060
not just Moore's law but technology broadly,
link |
01:31:44.460
and how that rubs up against the bitter lesson
link |
01:31:49.460
and GPT-3 and self play mechanisms.
link |
01:31:53.700
Like it's not obvious, I used to be much more skeptical
link |
01:31:56.980
about neural networks.
link |
01:31:58.220
Now I at least give a sliver of possibility
link |
01:32:00.980
that we'll be very much surprised
link |
01:32:04.420
and also caught in a way that like,
link |
01:32:10.900
we are not prepared for.
link |
01:32:14.140
Like in applications of social networks, for example,
link |
01:32:19.420
cause it feels like really good transformer models
link |
01:32:23.460
that are able to do some kind of like very good
link |
01:32:28.460
natural language generation are the same kind of models
link |
01:32:31.220
that can be used to learn human behavior
link |
01:32:33.860
and then manipulate that human behavior
link |
01:32:35.940
to gain advertisers' dollars and all those kinds of things
link |
01:32:38.980
through the capitalist system.
link |
01:32:41.380
And they arguably already are manipulating human behavior.
link |
01:32:46.420
But not for self preservation, which I think is a big,
link |
01:32:51.220
that would be a big step.
link |
01:32:52.340
Like if they were trying to manipulate us
link |
01:32:54.020
to convince us not to shut them off,
link |
01:32:57.020
I would be very freaked out.
link |
01:32:58.580
But I don't see a path to that from where we are now.
link |
01:33:01.780
They don't have any of those abilities.
link |
01:33:05.820
That's not what they're trying to do.
link |
01:33:07.660
They're trying to keep people on the site.
link |
01:33:10.100
But see the thing is, this is the thing about life on earth
link |
01:33:13.020
is they might be borrowing our consciousness
link |
01:33:16.860
and sentience like, so like in a sense they do
link |
01:33:20.940
because the creators of the algorithms have,
link |
01:33:23.740
like they're not, if you look at our body,
link |
01:33:26.940
we're not a single organism.
link |
01:33:28.540
We're a huge number of organisms
link |
01:33:30.340
with like tiny little motivations
link |
01:33:31.700
that are built on top of each other.
link |
01:33:33.300
In the same sense, the AI algorithms that are,
link |
01:33:36.220
they're not like.
link |
01:33:37.060
It's a system that includes companies and corporations,
link |
01:33:40.260
because corporations are funny organisms
link |
01:33:42.100
in and of themselves that really do seem
link |
01:33:44.380
to have self preservation built in.
link |
01:33:45.780
And I think that's at the design level.
link |
01:33:48.180
I think they're designed to have self preservation
link |
01:33:50.020
to be a focus.
link |
01:33:52.540
So you're right.
link |
01:33:53.380
In that broader system that we're also a part of
link |
01:33:58.620
and can have some influence on,
link |
01:34:02.460
it is much more complicated, much more powerful.
link |
01:34:04.780
Yeah, I agree with that.
link |
01:34:06.980
So people really love it when I ask,
link |
01:34:09.380
what three books, technical, philosophical, fiction
link |
01:34:13.500
had a big impact on your life?
link |
01:34:14.860
Maybe you can recommend.
link |
01:34:16.180
We went with movies, we went with Billy Joel
link |
01:34:21.260
and I forgot what music you recommended, but.
link |
01:34:24.460
I didn't, I just said I have no taste in music.
link |
01:34:26.580
I just like pop music.
link |
01:34:27.740
That was actually really skillful
link |
01:34:30.020
the way you avoided that question.
link |
01:34:30.860
Thank you, thanks.
link |
01:34:31.700
I'm gonna try to do the same with the books.
link |
01:34:33.780
So do you have a skillful way to avoid answering
link |
01:34:37.300
the question about three books you would recommend?
link |
01:34:39.820
I'd like to tell you a story.
link |
01:34:42.900
So my first job out of college was at Bellcore.
link |
01:34:45.900
I mentioned that before, where I worked with Dave Ackley.
link |
01:34:48.180
The head of the group was a guy named Tom Landauer.
link |
01:34:50.180
And I don't know how well known he is now,
link |
01:34:53.580
but arguably he's the inventor
link |
01:34:56.260
and the first proselytizer of word embeddings.
link |
01:34:59.100
So they developed a system shortly before I got to the group
link |
01:35:04.740
that was called latent semantic analysis
link |
01:35:07.700
that would take words of English
link |
01:35:09.300
and embed them in multi hundred dimensional space
link |
01:35:12.780
and then use that as a way of assessing
link |
01:35:15.740
similarity and basically doing reinforcement learning,
link |
01:35:17.860
I'm sorry, not reinforcement, information retrieval,
link |
01:35:20.940
sort of pre Google information retrieval.
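A minimal sketch of the latent semantic analysis idea described here, on a tiny made-up corpus rather than anything from Landauer's actual system: build a term-document count matrix, take a truncated SVD, and compare words by the cosine similarity of their low-dimensional vectors.

import numpy as np

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix: rows are words, columns are documents.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Truncated SVD keeps only the top k latent dimensions.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vecs = U[:, :k] * S[:k]   # each row is a k-dimensional word embedding

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(cosine(word_vecs[index["cat"]], word_vecs[index["dog"]]))      # close to 1: similar contexts
print(cosine(word_vecs[index["cat"]], word_vecs[index["stocks"]]))   # close to 0: unrelated contexts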
link |
01:35:23.460
And he was trained as an anthropologist,
link |
01:35:28.060
but then became a cognitive scientist.
link |
01:35:29.780
So I was in the cognitive science research group.
link |
01:35:32.020
Like I said, I'm a cognitive science groupie.
link |
01:35:34.980
At the time I thought I'd become a cognitive scientist,
link |
01:35:37.100
but then I realized in that group,
link |
01:35:38.740
no, I'm a computer scientist,
link |
01:35:40.380
but I'm a computer scientist who really loves
link |
01:35:41.780
to hang out with cognitive scientists.
link |
01:35:43.660
And he said, he studied language acquisition in particular.
link |
01:35:48.660
He said, you know, humans have about this number of words
link |
01:35:51.500
of vocabulary and most of that is learned from reading.
link |
01:35:55.540
And I said, that can't be true
link |
01:35:57.260
because I have a really big vocabulary and I don't read.
link |
01:36:00.580
He's like, you must.
link |
01:36:01.420
I'm like, I don't think I do.
link |
01:36:03.020
I mean like stop signs, I definitely read stop signs,
link |
01:36:05.740
but like reading books is not a thing that I do a lot of.
link |
01:36:08.900
Do you really though?
link |
01:36:09.860
It might be just visual, maybe the red color.
link |
01:36:12.260
Do I read stop signs?
link |
01:36:14.340
No, it's just pattern recognition at this point.
link |
01:36:15.900
I don't sound it out.
link |
01:36:19.740
So now I do.
link |
01:36:21.780
I wonder what that, oh yeah, stop the guns.
link |
01:36:25.140
So.
link |
01:36:26.620
That's fascinating.
link |
01:36:27.460
So you don't.
link |
01:36:28.300
So I don't read very, I mean, obviously I read
link |
01:36:29.700
and I've read plenty of books,
link |
01:36:31.980
but like some people like Charles,
link |
01:36:34.020
my friend Charles and others,
link |
01:36:35.940
like a lot of people in my field, a lot of academics,
link |
01:36:38.620
like reading was really a central topic to them
link |
01:36:42.260
in development and I'm not that guy.
link |
01:36:45.100
In fact, I used to joke that when I got into college,
link |
01:36:49.420
that it was on kind of a help out the illiterate
link |
01:36:53.740
kind of program because I got to,
link |
01:36:55.180
like in my house, I wasn't a particularly bad
link |
01:36:57.260
or good reader, but when I got to college,
link |
01:36:58.740
I was surrounded by these people that were just voracious
link |
01:37:01.900
in their reading appetite.
link |
01:37:03.380
And they would like, have you read this?
link |
01:37:04.900
Have you read this?
link |
01:37:05.740
Have you read this?
link |
01:37:06.580
And I'm like, no, I'm clearly not qualified
link |
01:37:09.060
to be at this school.
link |
01:37:10.220
Like there's no way I should be here.
link |
01:37:11.700
Now I've discovered books on tape, like audio books.
link |
01:37:14.780
And so I'm much better.
link |
01:37:17.580
I'm more caught up.
link |
01:37:18.420
I read a lot of books.
link |
01:37:20.260
The small tangent on that,
link |
01:37:22.140
it is a fascinating open question to me
link |
01:37:24.620
on the topic of driving.
link |
01:37:27.020
Whether, you know, supervised learning people,
link |
01:37:30.980
machine learning people think you have to like drive
link |
01:37:33.860
to learn how to drive.
link |
01:37:35.900
To me, it's very possible that just by us humans,
link |
01:37:40.020
by first of all, walking,
link |
01:37:41.500
but also by watching other people drive,
link |
01:37:44.140
not even being inside cars as a passenger,
link |
01:37:46.500
but let's say being inside the car as a passenger,
link |
01:37:49.260
but even just like being a pedestrian and crossing the road,
link |
01:37:53.340
you learn so much about driving from that.
link |
01:37:56.260
It's very possible that you can,
link |
01:37:58.660
without ever being inside of a car,
link |
01:38:01.300
be okay at driving once you get in it.
link |
01:38:04.420
Or like watching a movie, for example.
link |
01:38:06.380
I don't know, something like that.
link |
01:38:08.100
Have you taught anyone to drive?
link |
01:38:11.140
No, except myself.
link |
01:38:13.460
I have two children.
link |
01:38:15.020
And I learned a lot about car driving
link |
01:38:18.740
because my wife doesn't want to be the one in the car
link |
01:38:21.060
while they're learning.
link |
01:38:21.900
So that's my job.
link |
01:38:22.980
So I sit in the passenger seat and it's really scary.
link |
01:38:27.260
You know, I have wishes to live
link |
01:38:30.460
and they're figuring things out.
link |
01:38:32.260
Now, they start off very much better
link |
01:38:37.140
than I imagine like a neural network would, right?
link |
01:38:39.700
They get that they're seeing the world.
link |
01:38:41.660
They get that there's a road that they're trying to be on.
link |
01:38:44.100
They get that there's a relationship
link |
01:38:45.420
between the angle of the steering,
link |
01:38:47.020
but it takes a while to not be very jerky.
link |
01:38:51.020
And so that happens pretty quickly.
link |
01:38:52.340
Like the ability to stay in lane at speed,
link |
01:38:55.100
that happens relatively fast.
link |
01:38:56.940
It's not zero shot learning, but it's pretty fast.
link |
01:39:00.140
The thing that's remarkably hard,
link |
01:39:01.900
and this is I think partly why self driving cars
link |
01:39:03.860
are really hard,
link |
01:39:04.780
is the degree to which driving
link |
01:39:06.700
is a social interaction activity.
link |
01:39:09.460
And that blew me away.
link |
01:39:10.380
I was completely unaware of it
link |
01:39:11.940
until I watched my son learning to drive.
link |
01:39:14.260
And I was realizing that he was sending signals
link |
01:39:17.780
to all the cars around him.
link |
01:39:19.420
And those in his case,
link |
01:39:20.980
he's always had social communication challenges.
link |
01:39:25.940
He was sending very mixed confusing signals
link |
01:39:28.220
to the other cars.
link |
01:39:29.060
And that was causing the other cars
link |
01:39:30.460
to drive weirdly and erratically.
link |
01:39:32.540
And there was no question in my mind
link |
01:39:34.300
that he would have an accident
link |
01:39:36.620
because they didn't know how to read him.
link |
01:39:39.860
There's things you do with the speed that you drive,
link |
01:39:42.220
the positioning of your car,
link |
01:39:43.740
that you're constantly like in the head
link |
01:39:46.220
of the other drivers.
link |
01:39:47.580
And seeing him not knowing how to do that
link |
01:39:50.740
and having to be taught explicitly,
link |
01:39:52.220
okay, you have to be thinking
link |
01:39:53.420
about what the other driver is thinking,
link |
01:39:55.980
was a revelation to me.
link |
01:39:57.460
I was stunned.
link |
01:39:58.780
So creating kind of theories of mind of the other.
link |
01:40:02.980
Theories of mind of the other cars.
link |
01:40:04.740
Yeah, yeah.
link |
01:40:05.580
Which I just hadn't heard discussed
link |
01:40:07.260
in the self driving car talks that I've been to.
link |
01:40:09.700
Since then, there's some people who do consider
link |
01:40:13.620
those kinds of issues,
link |
01:40:14.460
but it's way more subtle than that, I think.
link |
01:40:16.140
There's a little bit of work involved with that
link |
01:40:19.140
when you realize like when you especially focus
link |
01:40:21.340
not on other cars, but on pedestrians, for example,
link |
01:40:24.260
it's literally staring you in the face.
link |
01:40:27.620
So then when you're just like,
link |
01:40:28.700
how do I interact with pedestrians?
link |
01:40:32.020
Pedestrians, you're practically talking
link |
01:40:33.340
to an octopus at that point.
link |
01:40:34.460
They've got all these weird degrees of freedom.
link |
01:40:36.180
You don't know what they're gonna do.
link |
01:40:37.140
They can turn around any second.
link |
01:40:38.420
But the point is, we humans know what they're gonna do.
link |
01:40:42.020
Like we have a good theory of mind.
link |
01:40:43.860
We have a good mental model of what they're doing.
link |
01:40:46.740
And we have a good model of the model they have a view
link |
01:40:50.460
and the model of the model of the model.
link |
01:40:52.020
Like we're able to kind of reason about this kind of,
link |
01:40:55.540
the social like game of it all.
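A minimal sketch of that kind of nested reasoning, purely my own illustration and not anything discussed here: a level-k agent best-responds to a level-(k-1) model of the other party, and even in a toy driver-versus-pedestrian standoff the predictions keep flipping as the modeling depth grows, which is part of why explicit signaling matters so much.

def driver_yields(level: int) -> bool:
    if level == 0:
        return False                          # level-0 driver: just keeps going
    return pedestrian_crosses(level - 1)      # yields if it predicts the pedestrian will cross

def pedestrian_crosses(level: int) -> bool:
    if level == 0:
        return True                           # level-0 pedestrian: just steps out
    return driver_yields(level - 1)           # crosses only if it predicts the driver will yield

for k in range(4):
    print(f"level {k}: driver yields={driver_yields(k)}, pedestrian crosses={pedestrian_crosses(k)}")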
link |
01:40:59.980
The hope is that it's quite simple actually,
link |
01:41:03.180
that it could be learned.
link |
01:41:04.340
That's why I just talked to Waymo.
link |
01:41:06.180
I don't know if you know that company.
link |
01:41:07.540
It's Google's self driving car company.
link |
01:41:09.340
They, I talked to their CTO about this podcast
link |
01:41:12.900
and they like, I rode in their car
link |
01:41:15.340
and it's quite aggressive and it's quite fast
link |
01:41:17.820
and it's good and it feels great.
link |
01:41:20.060
It also, just like Tesla,
link |
01:41:21.860
Waymo made me change my mind about like,
link |
01:41:24.580
maybe driving is easier than I thought.
link |
01:41:27.540
Maybe I'm just being speciesist, human centric, maybe.
link |
01:41:33.260
It's a speciesist argument.
link |
01:41:35.100
Yeah, so I don't know.
link |
01:41:36.620
But it's fascinating to think about like the same
link |
01:41:41.220
as with reading, which I think you just said.
link |
01:41:43.860
You avoided the question,
link |
01:41:45.380
though I still hope you answered it somewhat.
link |
01:41:47.100
You avoided it brilliantly.
link |
01:41:48.620
It is, there's blind spots
link |
01:41:52.140
that artificial intelligence researchers have
link |
01:41:55.140
about what it actually takes to learn to solve a problem.
link |
01:41:58.820
That's fascinating.
link |
01:41:59.660
Have you had Anca Dragan on?
link |
01:42:00.820
Yeah.
link |
01:42:01.660
Okay.
link |
01:42:02.500
She's one of my favorites.
link |
01:42:03.320
So much energy.
link |
01:42:04.160
She's right.
link |
01:42:05.000
Oh, yeah.
link |
01:42:05.820
She's amazing.
link |
01:42:06.660
Fantastic.
link |
01:42:07.500
And in particular, she thinks a lot about this kind of,
link |
01:42:10.380
I know that you know that I know kind of planning.
link |
01:42:12.820
And the last time I spoke with her,
link |
01:42:14.820
she was very articulate about the ways
link |
01:42:17.340
in which self driving cars are not solved.
link |
01:42:20.060
Like what's still really, really hard.
link |
01:42:22.100
But even her intuition is limited.
link |
01:42:23.780
Like we're all like new to this.
link |
01:42:26.060
So in some sense, the Elon Musk approach
link |
01:42:27.900
of being ultra confident and just like plowing.
link |
01:42:30.300
Put it out there.
link |
01:42:31.140
Putting it out there.
link |
01:42:32.180
Like some people say it's reckless and dangerous and so on.
link |
01:42:35.340
But like, partly it's like, it seems to be one
link |
01:42:39.060
of the only ways to make progress
link |
01:42:40.500
in artificial intelligence.
link |
01:42:41.540
So it's, you know, these are difficult things.
link |
01:42:45.540
You know, democracy is messy.
link |
01:42:49.360
Implementation of artificial intelligence systems
link |
01:42:51.940
in the real world is messy.
link |
01:42:53.980
So many years ago, before self driving cars
link |
01:42:56.260
were an actual thing you could have a discussion about,
link |
01:42:58.500
somebody asked me, like, what if we could use
link |
01:43:01.820
that robotic technology and use it to drive cars around?
link |
01:43:04.780
Like, isn't that, aren't people gonna be killed?
link |
01:43:06.580
And then it's not, you know, blah, blah, blah.
link |
01:43:08.060
I'm like, that's not what's gonna happen.
link |
01:43:09.700
I said with confidence, incorrectly, obviously.
link |
01:43:13.320
What I think is gonna happen is we're gonna have a lot more,
link |
01:43:15.820
like a very gradual kind of rollout
link |
01:43:17.580
where people have these cars in like closed communities,
link |
01:43:22.540
right, where it's somewhat realistic,
link |
01:43:24.480
but it's still in a box, right?
link |
01:43:26.660
So that we can really get a sense of what,
link |
01:43:28.980
what are the weird things that can happen?
link |
01:43:30.620
How do we, how do we have to change the way we behave
link |
01:43:34.580
around these vehicles?
link |
01:43:35.700
Like, it obviously requires a kind of co evolution
link |
01:43:39.500
that you can't just plop them in and see what happens.
link |
01:43:42.720
But of course, we're basically plopping them in
link |
01:43:44.240
and seeing what happens.
link |
01:43:45.080
So I was wrong, but I do think that would have been
link |
01:43:46.860
a better plan.
link |
01:43:47.900
So that's, but your intuition, that's funny,
link |
01:43:50.600
just zooming out and looking at the forces of capitalism.
link |
01:43:54.180
And it seems that capitalism rewards risk takers
link |
01:43:57.700
and punishes risk takers, like,
link |
01:44:00.860
and like, try it out.
link |
01:44:03.900
The academic approach to let's try a small thing
link |
01:44:11.200
and try to understand slowly the fundamentals
link |
01:44:13.980
of the problem.
link |
01:44:14.820
And let's start with one, then do two, and then see that.
link |
01:44:18.420
And then do three, you know, the capitalist
link |
01:44:21.900
like startup entrepreneurial dream is let's build a thousand
link |
01:44:26.180
and let's.
link |
01:44:27.020
Right, and 500 of them fail, but whatever,
link |
01:44:28.820
the other 500, we learned from them.
link |
01:44:30.680
But if you're good enough, I mean, one thing is like,
link |
01:44:33.340
your intuition would say like, that's gonna be
link |
01:44:35.740
hugely destructive to everything.
link |
01:44:37.940
But actually, it's kind of the forces of capitalism,
link |
01:44:42.260
like people are quite, it's easy to be critical,
link |
01:44:44.940
but if you actually look at the data at the way
link |
01:44:47.780
our world has progressed in terms of the quality of life,
link |
01:44:50.660
it seems like the competent good people rise to the top.
link |
01:44:54.700
This is coming from me from the Soviet Union and so on.
link |
01:44:58.500
It's like, it's interesting that somebody like Elon Musk
link |
01:45:03.540
is the way you push progress in artificial intelligence.
link |
01:45:08.060
Like it's forcing Waymo to step their stuff up
link |
01:45:11.580
and Waymo is forcing Elon Musk to step up.
link |
01:45:17.020
It's fascinating, because I have this tension in my heart
link |
01:45:21.180
and just being upset by the lack of progress
link |
01:45:26.100
in autonomous vehicles within academia.
link |
01:45:29.760
So there was huge progress in the early days
link |
01:45:33.580
of the DARPA challenges.
link |
01:45:35.620
And then it just kind of stopped like at MIT,
link |
01:45:39.260
but it's true everywhere else, with the exception
link |
01:45:43.060
of a few sponsors here and there, it's like,
link |
01:45:46.940
it's not seen as a sexy problem.
link |
01:45:50.260
Like the moment artificial intelligence starts approaching
link |
01:45:53.900
the problems of the real world,
link |
01:45:56.180
like academics kind of like, all right, let the...
link |
01:46:00.300
They get really hard in a different way.
link |
01:46:01.860
In a different way, that's right.
link |
01:46:03.260
I think, yeah, right, some of us are not excited
link |
01:46:05.880
about that other way.
link |
01:46:07.220
But I still think there's fundamental problems
link |
01:46:09.540
to be solved in those difficult things.
link |
01:46:12.140
It's not, it's still publishable, I think.
link |
01:46:14.700
Like we just need to, it's the same criticism
link |
01:46:17.100
you could have of all these conferences, NeurIPS, CVPR,
link |
01:46:20.300
where application papers are often as powerful
link |
01:46:24.340
and as important as like a theory paper.
link |
01:46:27.420
Even like theory just seems much more respectable and so on.
link |
01:46:31.300
I mean, machine learning community is changing
link |
01:46:32.860
that a little bit.
link |
01:46:33.820
I mean, at least in statements,
link |
01:46:35.380
but it's still not seen as the sexiest of pursuits,
link |
01:46:40.300
which is like, how do I actually make this thing
link |
01:46:42.060
work in practice as opposed to on this toy data set?
link |
01:46:47.060
All that to say, are you still avoiding
link |
01:46:49.860
the three books question?
link |
01:46:50.900
Is there something on audio book that you can recommend?
link |
01:46:54.620
Oh, yeah, I mean, yeah, I've read a lot of really fun stuff.
link |
01:46:58.740
In terms of books that I find myself thinking back on
link |
01:47:02.140
that I read a while ago,
link |
01:47:03.460
like that stood the test of time to some degree.
link |
01:47:06.380
I find myself thinking of Program or Be Programmed a lot
link |
01:47:09.200
by Douglas Rushkoff, which was,
link |
01:47:13.980
it basically put out the premise
link |
01:47:15.780
that we all need to become programmers
link |
01:47:19.180
in one form or another.
link |
01:47:21.200
And it was an analogy to once upon a time
link |
01:47:24.180
we all had to become readers.
link |
01:47:26.500
We had to become literate.
link |
01:47:27.600
And there was a time before that
link |
01:47:28.860
when not everybody was literate,
link |
01:47:30.060
but once literacy was possible,
link |
01:47:31.740
the people who were literate had more of a say in society
link |
01:47:36.080
than the people who weren't.
link |
01:47:37.660
And so we made a big effort to get everybody up to speed.
link |
01:47:39.700
And now it's not 100% universal, but it's quite widespread.
link |
01:47:44.000
Like the assumption is generally that people can read.
link |
01:47:48.340
The analogy that he makes is that programming
link |
01:47:50.600
is a similar kind of thing,
link |
01:47:51.760
that we need to have a say in, right?
link |
01:47:57.100
So being literate, being a reader, means
link |
01:47:59.780
you can receive all this information,
link |
01:48:01.900
but you don't get to put it out there.
link |
01:48:04.260
And programming is the way that we get to put it out there.
link |
01:48:06.720
And that was the argument that he made.
link |
01:48:07.740
I think he specifically has now backed away from this idea.
link |
01:48:11.780
He doesn't think it's happening quite this way.
link |
01:48:14.880
And that might be true that it didn't,
link |
01:48:17.500
society didn't sort of play forward quite that way.
link |
01:48:20.740
I still believe in the premise.
link |
01:48:22.220
I still believe that at some point,
link |
01:48:24.460
the relationship that we have to these machines
link |
01:48:26.420
and these networks has to be one of each individual
link |
01:48:29.260
has the wherewithal to make the machines help them
link |
01:48:34.940
do the things that that person wants done.
link |
01:48:37.140
And as software people, we know how to do that.
link |
01:48:40.140
And when we have a problem, we're like, okay,
link |
01:48:41.500
I'll just, I'll hack up a Perl script or something
link |
01:48:43.380
and make it so.
link |
01:48:44.900
If we lived in a world where everybody could do that,
link |
01:48:47.260
that would be a better world.
link |
01:48:49.260
And computers would, I think, have less sway over us.
link |
01:48:53.780
And other people's software would have less sway over us
link |
01:48:56.920
as a group.
link |
01:48:57.760
In some sense, software engineering, programming is power.
link |
01:49:00.860
Programming is power, right?
link |
01:49:03.100
Yeah, it's like magic.
link |
01:49:04.220
It's like magic spells.
link |
01:49:05.420
And it's not out of reach of everyone.
link |
01:49:09.220
But at the moment, it's just a sliver of the population
link |
01:49:11.780
who can commune with machines in this way.
link |
01:49:15.300
So I don't know, so that book had a big impact on me.
link |
01:49:18.460
Currently, I'm reading The Alignment Problem,
link |
01:49:20.900
actually by Brian Christian.
link |
01:49:22.180
So I don't know if you've seen this out there yet.
link |
01:49:23.660
Is this similar to Stuart Russell's work
link |
01:49:25.380
with the control problem?
link |
01:49:27.040
It's in that same general neighborhood.
link |
01:49:28.860
I mean, they have different emphases
link |
01:49:31.320
that they're concentrating on.
link |
01:49:32.540
I think Stuart's book did a remarkably good job,
link |
01:49:36.380
like just a celebratory good job
link |
01:49:38.940
at describing AI technology and sort of how it works.
link |
01:49:43.220
I thought that was great.
link |
01:49:44.180
It was really cool to see that in a book.
link |
01:49:46.500
I think he has some experience writing some books.
link |
01:49:49.540
You know, that's probably a possible thing.
link |
01:49:52.100
He's maybe thought a thing or two
link |
01:49:53.620
about how to explain AI to people.
link |
01:49:56.200
Yeah, that's a really good point.
link |
01:49:57.820
This book so far has been remarkably good
link |
01:50:00.720
at telling the story of sort of the history,
link |
01:50:04.860
the recent history of some of the things
link |
01:50:07.060
that have happened.
link |
01:50:08.420
I'm in the first third.
link |
01:50:09.600
He said this book is in three thirds.
link |
01:50:10.980
The first third is essentially AI fairness
link |
01:50:14.540
and implications of AI on society
link |
01:50:16.860
that we're seeing right now.
link |
01:50:18.420
And that's been great.
link |
01:50:19.720
I mean, he's telling the stories really well.
link |
01:50:21.220
He went out and talked to the frontline people
link |
01:50:23.700
whose names were associated with some of these ideas
link |
01:50:26.620
and it's been terrific.
link |
01:50:28.220
He says the second half of the book
link |
01:50:29.420
is on reinforcement learning.
link |
01:50:30.700
So maybe that'll be fun.
link |
01:50:33.220
And then the third half, third third,
link |
01:50:36.420
is on the super intelligence alignment problem.
link |
01:50:39.980
And I suspect that that part will be less fun
link |
01:50:43.360
for me to read.
link |
01:50:44.320
Yeah.
link |
01:50:46.260
Yeah, it's an interesting problem to talk about.
link |
01:50:48.940
I find it to be the most interesting,
link |
01:50:50.740
just like thinking about whether we live
link |
01:50:52.560
in a simulation or not,
link |
01:50:54.060
as a thought experiment to think about our own existence.
link |
01:50:58.280
So in the same way,
link |
01:50:59.700
talking about the alignment problem with AGI
link |
01:51:02.260
is a good way to think similar
link |
01:51:04.180
to like the trolley problem with autonomous vehicles.
link |
01:51:06.660
It's a useless thing for engineering,
link |
01:51:08.520
but it's a nice little thought experiment
link |
01:51:10.900
for actually thinking about what are like
link |
01:51:13.580
our own human ethical systems, our moral systems.
link |
01:51:17.180
By thinking about how we engineer these things,
link |
01:51:23.100
you start to understand yourself.
link |
01:51:25.660
So sci fi can be good at that too.
link |
01:51:27.180
So one sci fi book to recommend
link |
01:51:29.020
is Exhalation by Ted Chiang,
link |
01:51:31.900
bunch of short stories.
link |
01:51:33.880
Ted Chiang is the guy who wrote the short story
link |
01:51:35.940
that became the movie Arrival.
link |
01:51:38.660
And all of his stories just from a,
link |
01:51:41.660
he was a computer scientist,
link |
01:51:43.340
actually he studied at Brown.
link |
01:51:44.740
And they all have this sort of really insightful bit
link |
01:51:49.140
of science or computer science that drives them.
link |
01:51:52.260
And so it's just a romp, right?
link |
01:51:54.940
To just like, he creates these artificial worlds
link |
01:51:57.420
by extrapolating on these ideas
link |
01:51:59.840
that we know about,
link |
01:52:01.460
but hadn't really thought through
link |
01:52:02.780
to this kind of conclusion.
link |
01:52:04.120
And so his stuff is, it's really fun to read,
link |
01:52:06.460
it's mind warping.
link |
01:52:08.620
So I'm not sure if you're familiar,
link |
01:52:10.820
I seem to mention this every other word
link |
01:52:13.820
that I'm from the Soviet Union and I'm Russian.
link |
01:52:17.820
Way too much to see us.
link |
01:52:18.940
My roots are Russian too,
link |
01:52:20.220
but a couple generations back.
link |
01:52:22.580
Well, it's probably in there somewhere.
link |
01:52:24.240
So maybe we can pull at that thread a little bit
link |
01:52:28.740
of the existential dread that we all feel.
link |
01:52:31.500
You mentioned that you,
link |
01:52:32.740
I think somewhere in the conversation you mentioned
link |
01:52:34.540
that you pretty much don't like dying.
link |
01:52:38.120
I forget in which context,
link |
01:52:39.540
it might've been a reinforcement learning perspective.
link |
01:52:41.560
I don't know.
link |
01:52:42.400
No, you know what it was?
link |
01:52:43.220
It was in teaching my kids to drive.
link |
01:52:47.100
That's how you face your mortality, yes.
link |
01:52:49.860
From a human being's perspective
link |
01:52:52.820
or from a reinforcement learning researcher's perspective,
link |
01:52:55.420
let me ask you the most absurd question.
link |
01:52:57.340
What do you think is the meaning of this whole thing?
link |
01:53:01.660
The meaning of life on this spinning rock.
link |
01:53:06.660
I mean, I think reinforcement learning researchers
link |
01:53:08.980
maybe think about this from a science perspective
link |
01:53:11.380
more often than a lot of other people, right?
link |
01:53:13.680
As a supervised learning person,
link |
01:53:14.940
you're probably not thinking about the sweep of a lifetime,
link |
01:53:18.500
but reinforcement learning agents
link |
01:53:20.180
are having little lifetimes, little weird little lifetimes.
link |
01:53:22.860
And it's hard not to project yourself
link |
01:53:25.420
into their world sometimes.
link |
01:53:27.740
But as far as the meaning of life,
link |
01:53:30.300
so when I turned 42, you may know from,
link |
01:53:34.060
that is a book I read,
link |
01:53:35.700
The Hitchhiker's Guide to the Galaxy,
link |
01:53:38.940
that that is the meaning of life.
link |
01:53:40.100
So when I turned 42, I had a meaning of life party
link |
01:53:43.660
where I invited people over
link |
01:53:45.300
and everyone shared their meaning of life.
link |
01:53:48.980
We had slides made up.
link |
01:53:50.860
And so we all sat down and did a slide presentation
link |
01:53:54.660
to each other about the meaning of life.
link |
01:53:56.700
And mine was balance.
link |
01:54:00.500
I think that life is balance.
link |
01:54:02.100
And so the activity at the party,
link |
01:54:06.740
for a 42 year old, maybe this is a little bit nonstandard,
link |
01:54:09.180
but I found all the little toys and devices that I had
link |
01:54:12.380
where you had to balance on them.
link |
01:54:13.620
You had to like stand on it and balance,
link |
01:54:15.740
or a pogo stick I brought,
link |
01:54:17.500
a RipStik, which is like a weird two wheeled skateboard.
link |
01:54:23.180
I got a unicycle, but I didn't know how to do it.
link |
01:54:26.860
I now can do it.
link |
01:54:28.280
I would love watching you try.
link |
01:54:29.540
Yeah, I'll send you a video.
link |
01:54:31.820
I'm not great, but I managed.
link |
01:54:35.460
And so balance, yeah.
link |
01:54:37.220
So my wife has a really good one that she sticks to
link |
01:54:42.460
and is probably pretty accurate.
link |
01:54:43.700
And it has to do with healthy relationships
link |
01:54:47.060
with people that you love and working hard for good causes.
link |
01:54:51.440
But to me, yeah, balance, balance in a word.
link |
01:54:53.700
That works for me.
link |
01:54:56.080
Not too much of anything,
link |
01:54:57.220
because too much of anything is iffy.
link |
01:55:00.340
That feels like a Rolling Stones song.
link |
01:55:02.300
I feel like they must be.
link |
01:55:03.420
You can't always get what you want,
link |
01:55:05.020
but if you try sometimes, you can strike a balance.
link |
01:55:09.620
Yeah, I think that's how it goes, Michael.
link |
01:55:12.860
I'll write you a parody.
link |
01:55:14.620
It's a huge honor to talk to you.
link |
01:55:16.220
This is really fun.
link |
01:55:17.060
Oh, no, the honor's mine.
link |
01:55:17.880
I've been a big fan of yours,
link |
01:55:18.800
so can't wait to see what you do next
link |
01:55:24.460
in the world of education, in the world of parody,
link |
01:55:27.160
in the world of reinforcement learning.
link |
01:55:28.420
Thanks for talking to me.
link |
01:55:29.340
My pleasure.
link |
01:55:30.840
Thank you for listening to this conversation
link |
01:55:32.340
with Michael Littman, and thank you to our sponsors,
link |
01:55:35.140
SimpliSafe, a home security company I use
link |
01:55:37.780
to monitor and protect my apartment, ExpressVPN,
link |
01:55:41.680
the VPN I've used for many years
link |
01:55:43.420
to protect my privacy on the internet,
link |
01:55:45.700
Masterclass, online courses that I enjoy
link |
01:55:48.540
from some of the most amazing humans in history,
link |
01:55:51.400
and BetterHelp, online therapy with a licensed professional.
link |
01:55:55.640
Please check out these sponsors in the description
link |
01:55:58.180
to get a discount and to support this podcast.
link |
01:56:00.900
If you enjoy this thing, subscribe on YouTube,
link |
01:56:03.540
review it with five stars on Apple Podcast,
link |
01:56:05.860
follow on Spotify, support it on Patreon,
link |
01:56:08.660
or connect with me on Twitter at Lex Friedman.
link |
01:56:12.220
And now, let me leave you with some words
link |
01:56:14.660
from Groucho Marx.
link |
01:56:16.760
If you're not having fun, you're doing something wrong.
link |
01:56:20.700
Thank you for listening, and hope to see you next time.