
Andrew Ng: Deep Learning, Education, and Real-World AI | Lex Fridman Podcast #73



link |
00:00:00.000
The following is a conversation with Andrew Ng, one of the most impactful educators, researchers,
link |
00:00:06.480
innovators, and leaders in artificial intelligence and the technology space in general.
link |
00:00:11.920
He co-founded Coursera and Google Brain, launched DeepLearning.AI, Landing AI, and the AI Fund,
link |
00:00:19.680
and was the chief scientist at Baidu. As a Stanford professor, and with Coursera and
link |
00:00:26.160
DeepLearning.AI, he has helped educate and inspire millions of students, including me.
link |
00:00:33.600
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube,
link |
00:00:38.400
give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter
link |
00:00:43.760
at Lex Friedman, spelled F R I D M A N. As usual, I'll do one or two minutes of ads now
link |
00:00:51.200
and never any ads in the middle that can break the flow of the conversation. I hope that works for
link |
00:00:55.760
you and doesn't hurt the listening experience. This show is presented by Cash App, the number
link |
00:01:01.440
one finance app in the App Store. When you get it, use code Lex Podcast. Cash App lets you send
link |
00:01:08.240
money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Brokerage services
link |
00:01:14.560
are provided by Cash App Investing, a subsidiary of Square, and member SIPC.
link |
00:01:19.680
Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency
link |
00:01:25.280
in the context of the history of money is fascinating. I recommend The Ascent of Money as a great book
link |
00:01:32.000
on this history. Debits and credits on ledgers started over 30,000 years ago. The US dollar
link |
00:01:39.520
was created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, released
link |
00:01:46.000
just over 10 years ago. So given that history, cryptocurrency is still very much in its early
link |
00:01:52.080
days of development, but it's still aiming to and just might redefine the nature of money.
link |
00:01:59.680
So again, if you get Cash App from the App Store or Google Play and use the code Lex Podcast,
link |
00:02:05.840
you'll get $10 and Cash App will also donate $10 to FIRST, one of my favorite organizations
link |
00:02:12.080
that is helping to advance robotics and STEM education for young people around the world.
link |
00:02:18.480
And now, here's my conversation with Andrew Ng. The courses you taught on machine learning
link |
00:02:25.120
at Stanford, and later on Coursera, which you co-founded, have educated and inspired millions of
link |
00:02:31.120
people. So let me ask you, what people or ideas inspired you to get into computer science and
link |
00:02:36.240
machine learning when you were young? When did you first fall in love with the field?
link |
00:02:41.440
Or, another way to put it.
link |
00:02:43.760
Growing up in Hong Kong and Singapore, I started learning to code when I was five or six years
link |
00:02:49.520
old. At that time, I was learning the BASIC programming language, and I would take these
link |
00:02:54.400
books and they'd tell you, type this program into your computer. So I'd type those programs into my
link |
00:02:59.120
computer. And as a result of all that typing, I would get to play these very simple shoot-'em-up
link |
00:03:05.120
games that I had implemented on my little computer. So I thought it was fascinating as a young
link |
00:03:11.840
kid that I could write this code that's really just copying code from a book into my computer
link |
00:03:18.160
to then play these cooler video games. Another moment for me was when I was a teenager and my
link |
00:03:25.760
father, who is a doctor, was reading about expert systems and about neural networks. So he got me
link |
00:03:31.920
to read some of these books. And I thought it was really cool that you could write a computer
link |
00:03:36.160
program that started to exhibit intelligence. Then I remember doing an internship while I was in
link |
00:03:42.000
high school. This was in Singapore, where I remember doing a lot of photocopying, and I was
link |
00:03:48.640
an office assistant. And the highlight of my job was when I got to use the shredder. So the
link |
00:03:54.800
teenage me remembers thinking, boy, this is a lot of photocopying. If only we could write software,
link |
00:03:58.960
build a robot, something to automate this, maybe I could do something else. So I think a lot of
link |
00:04:03.680
my work since then has centered on the theme of automation. Even the way I think about machine
link |
00:04:08.880
learning today, we're very good at writing learning algorithms that can automate things that people
link |
00:04:13.840
can do. Or even launching the first MOOCs, Massive Open Online Courses, that later led to Coursera,
link |
00:04:20.000
I was trying to automate what could be automatable in how I was teaching on campus.
link |
00:04:25.280
The process of education, you tried to automate parts of that to make it more,
link |
00:04:30.240
sort of to have more impact from a single teacher, a single educator.
link |
00:04:34.720
Yeah. You know, I was teaching at Stanford, teaching machine learning to about 400 students
link |
00:04:40.080
a year at the time. And I found myself filming the exact same video every year, telling the same
link |
00:04:46.640
jokes in the same room. And I thought, why am I doing this? Why don't we just take last year's
link |
00:04:51.360
video and then I can spend my time building a deeper relationship with students. So that
link |
00:04:55.680
process of thinking through how to do that, that led to the first MOOCs that we launched.
link |
00:05:02.320
And then you have more time to write new jokes. Are there favorite memories from your early days
link |
00:05:07.440
at Stanford teaching thousands of people in person and then millions of people online?
link |
00:05:12.880
You know, teaching online, what not many people know was that a lot of those videos were shot
link |
00:05:22.480
between the hours of 10 p.m. and 3 a.m. When we were launching the first MOOCs
link |
00:05:30.400
out of Stanford, we had already announced the course and about 100,000 people signed up.
link |
00:05:34.240
We had just started to write the code and had not yet actually filmed the videos. So we
link |
00:05:39.680
were under a lot of pressure, with 100,000 people waiting for us to produce the content. So many
link |
00:05:44.240
Fridays, Saturdays, I would go out and have dinner with my friends and then I would think,
link |
00:05:50.000
okay, do you want to go home now or do you want to go to the office to film videos? And the thought
link |
00:05:55.840
of being able to help 100,000 people potentially learn machine learning, fortunately that made me
link |
00:06:01.520
think, okay, I want to go to my office, go to my tiny recording studio. I would adjust my Logitech
link |
00:06:06.960
webcam, adjust my Wacom tablet, make sure my lapel mic was on, and then I would start recording,
link |
00:06:13.840
often until 2 a.m. or 3 a.m. I think, fortunately, it doesn't show that it was recorded that late at
link |
00:06:20.000
night, but it was really inspiring, the thought that we could create content to help so many people
link |
00:06:26.800
learn about machine learning. How did that feel? The fact that you're probably somewhat alone,
link |
00:06:31.440
maybe a couple of friends recording with a Logitech webcam and kind of going home alone at 1
link |
00:06:38.880
or 2 a.m. at night and knowing that that's going to reach sort of thousands of people,
link |
00:06:45.120
eventually millions of people, what's that feeling like? I mean, is there a feeling of just
link |
00:06:50.800
satisfaction of pushing through? I think it's humbling and I wasn't thinking about what I was
link |
00:06:57.360
feeling. I think one thing I'm proud to say we got right from the early days was I told my whole
link |
00:07:04.240
team back then that the number one priority is to do what's best for learners, do what's best for
link |
00:07:08.640
students, and so when I went to the recording studio, the only thing on my mind was, what can I say,
link |
00:07:13.920
how can I design my slides, what do I need to draw, to make these concepts as clear as possible
link |
00:07:18.960
for learners. I think, you know, I've seen with some instructors it's tempting to say, hey,
link |
00:07:24.320
let's talk about my work. Maybe if I teach you about my research, someone will cite my papers
link |
00:07:28.880
a couple more times. And I think one of the things we got right, launching the first few MOOCs
link |
00:07:32.960
and later building Coursera, was putting in place that bedrock principle of let's just do what's
link |
00:07:38.000
best for learners and forget about everything else. And I think that guiding principle
link |
00:07:43.200
turned out to be really important to the rise of the MOOC movement. And the kind of learner you
link |
00:07:48.000
imagined in your mind is as broad as possible, as global as possible. So you really tried to reach
link |
00:07:55.520
as many people interested in machine learning and AI as possible.
link |
00:07:59.680
I really want to help anyone that had an interest in machine learning to break into the field.
link |
00:08:04.640
And I think sometimes, I've actually had people ask me, hey, why are you spending so much time
link |
00:08:10.000
explaining gradient descent? And my answer was, if I look at what I think the learner needs and
link |
00:08:15.680
would benefit from, I felt that having a good understanding of the foundations, coming back
link |
00:08:21.040
to the basics, would put them in a better state to then build on a long term career.
link |
00:08:26.800
So I tried to consistently make decisions based on that principle.
link |
00:08:30.560
One of the things you actually revealed to the narrow AI community at the time and to the world
link |
00:08:38.080
is that the amount of people who are actually interested in AI is much larger than we imagined.
link |
00:08:42.640
By you teaching the class and how popular it became, it showed that, wow, this isn't just a small
link |
00:08:58.720
community of people who go to NeurIPS, it's much bigger. It's developers. It's people from
link |
00:08:58.720
all over the world. I mean, I'm Russian. So everybody in Russia is really interested.
link |
00:09:03.360
There's a huge number of programmers who are interested in machine learning, India, China,
link |
00:09:07.760
South America, everywhere. There's just millions of people who are interested in machine learning.
link |
00:09:13.520
So how big, do you get a sense, is the number of people that are interested, from your
link |
00:09:19.520
perspective? I think the number's grown over time. I think it's one of those things that maybe feels
link |
00:09:24.960
like it came out of nowhere, but building it took years. It's one of those
link |
00:09:29.120
overnight successes that took years to get there. My first foray into this type of online education
link |
00:09:35.920
was when we were filming my Stanford class and sticking the videos on YouTube, and some of the
link |
00:09:40.080
things we had uploaded were the whole lectures, basically the one-hour, 15-minute videos that
link |
00:09:45.360
we put on YouTube. Then we had four or five other versions of websites that we had built,
link |
00:09:52.000
most of which you would never have heard of because they reached small audiences,
link |
00:09:55.760
but that allowed me to iterate, allowed my team and me to iterate, to learn what ideas
link |
00:10:00.240
work and what doesn't. For example, one of the features I was really excited about and really
link |
00:10:05.280
proud of was building this website where multiple people could be logged into the website at the
link |
00:10:10.080
same time. So today, if you go to a website, if you are logged in and then I want to log in,
link |
00:10:15.840
you need to log out if it's the same browser, the same computer. But I thought, well, what if two
link |
00:10:20.000
people say you and me were watching a video together in front of a computer? What if a website
link |
00:10:25.520
could have you type your name and password, have me type my name and password, and then now the
link |
00:10:29.440
computer knows both of us are watching together and it gives both of us credit for anything we do
link |
00:10:33.920
as a group. So we built this feature and rolled it out in a high school in San Francisco. We had about
link |
00:10:40.400
20-something users there, at Sacred Heart Cathedral Prep. The teacher was great.
link |
00:10:46.400
I mean, guess what? Zero people used this feature. It turns out people studying online,
link |
00:10:51.760
they want to watch the videos by themselves so you can play back, pause at your own speed rather
link |
00:10:56.880
than in groups. So that was one example of a tiny lesson learned, out of many, that allowed us to hone
link |
00:11:02.960
in on the right set of features. And it sounds like a brilliant feature. So I guess the lesson to take
link |
00:11:08.240
from that is that something can look amazing on paper and then nobody uses it; it
link |
00:11:15.200
doesn't actually have the impact that you think it might have. So yeah, I saw that you really
link |
00:11:20.160
went through a lot of different features and a lot of ideas to arrive at the final, at Coursera,
link |
00:11:24.960
at the final kind of powerful thing that showed the world that MOOCs can educate millions.
link |
00:11:32.080
And I think with the whole machine learning movement as well, I think it didn't come out of
link |
00:11:37.440
nowhere. Instead, what happened was, as more people learned about machine learning,
link |
00:11:42.240
they would tell their friends, and their friends would see how it's applicable to their
link |
00:11:45.600
work. And then the community kept on growing. And I think we're still growing.
link |
00:11:50.960
I don't know in the future what percentage of all developers will be AI developers.
link |
00:11:56.080
I could easily see it being more than 50%, right? Because so many AI developers, broadly
link |
00:12:04.080
construed, not just people doing the machine learning modeling, but the people building
link |
00:12:07.600
infrastructure, data pipelines, all the software surrounding the core machine learning model,
link |
00:12:13.680
maybe is even bigger. I feel like today almost every software engineer has some understanding
link |
00:12:19.360
of the cloud. Not all, maybe the microcontroller developer doesn't need to touch the
link |
00:12:24.720
cloud. But I feel like the vast majority of software engineers today have some
link |
00:12:30.240
appreciation of the cloud. I think in the future, maybe we're approaching nearly 100% of all developers
link |
00:12:35.920
being in some way an AI developer, at least having an appreciation of machine learning.
link |
00:12:41.920
And my hope is that there's this kind of effect that there's people who are not really interested
link |
00:12:46.960
in being a programmer or being into software engineering, like biologists, chemists, and
link |
00:12:52.960
physicists, even mechanical engineers, all these disciplines that are now more and more
link |
00:12:59.280
sitting on large data sets. And they didn't think they were interested in programming until
link |
00:13:04.640
they have this data set and they realize there's this set of machine learning tools that allow
link |
00:13:08.080
you to use the data set. So they actually become, they learn to program and they become new programmers.
link |
00:13:13.520
So not just, like you've mentioned, a larger percentage of developers becoming
link |
00:13:18.320
machine learning people, it seems like more and more the kinds of people who are becoming
link |
00:13:24.560
developers are also growing significantly. Yeah, I think once upon a time, only a small part of
link |
00:13:30.880
humanity was literate, you know, could read and write. And maybe you thought maybe not everyone
link |
00:13:36.240
needs to learn to read and write. You know, you just go listen to a few monks read to you,
link |
00:13:43.200
and maybe that was enough, or maybe we just need a few handful of authors to write the best sellers
link |
00:13:47.680
and then no one else needs to write. But what we found was that by giving as many people, you know,
link |
00:13:53.360
in some countries, almost everyone, basic literacy, it dramatically enhanced human to human
link |
00:13:58.640
communications. And we can now write for an audience of one, such as when I send you an email
link |
00:14:02.480
or you send me an email. I think in computing, we're still in that phase where so few people
link |
00:14:08.640
know how to code that the coders mostly have to code for relatively large audiences. But if everyone,
link |
00:14:15.120
or most people, became developers at some level, similar to how most people in developed economies
link |
00:14:22.240
are somewhat literate, I would love to see the owners of a mom and pop store be able to write
link |
00:14:27.680
a little bit of code to customize the TV display for their special this week. And I think it'll
link |
00:14:32.400
enhance human to computer communications, which is becoming more and more important today as well.
link |
00:14:37.840
So you think it's possible that machine learning becomes kind of similar to literacy,
link |
00:14:44.320
where, yeah, like you said, the owners of a mom and pop shop, and basically everybody in all walks
link |
00:14:51.680
of life would have some degree of programming capability. I could see society getting there.
link |
00:14:58.400
There's one interesting thing, you know, if I go talk to the mom and pop store, if I talk to a lot
link |
00:15:03.360
of people in their daily professions, I previously didn't have a good story for why they should learn
link |
00:15:08.400
to code, you know, good reasons to give them. But what I found with the rise of machine learning
link |
00:15:13.200
and data science is that I think the number of people with a concrete use of data science
link |
00:15:18.320
in their daily lives and their jobs may be even larger than the number of people with a concrete
link |
00:15:23.440
use for software engineering. For example, if you run a small mom and pop store,
link |
00:15:28.080
I think if you can analyze the data about your sales, your customers, I think there's actually
link |
00:15:32.560
real value there, maybe even more than traditional software engineering. So I find that for a lot
link |
00:15:38.400
of my friends in various professions, be it recruiters or accountants or, you know, people that
link |
00:15:44.080
work in factories, which I deal with more and more these days, I feel if they were data scientists
link |
00:15:50.480
at some level, they could immediately use that in their work. So I think that data science and
link |
00:15:55.760
machine learning may be an even easier entree into the developer world for a lot of people
link |
00:16:01.600
than software engineering. That's interesting, and I agree with that. That's
link |
00:16:05.680
beautifully put. We live in a world where most courses and talks have slides, PowerPoint, Keynote,
link |
00:16:12.720
and yet you famously often still use a marker and a whiteboard. The simplicity of that is
link |
00:16:18.720
compelling and for me at least fun to watch. So let me ask, why do you like using a marker and
link |
00:16:24.960
whiteboard, even on the biggest of stages? I think it depends on the concepts you want to explain.
link |
00:16:31.280
For mathematical concepts, it's nice to build up the equation one piece at a time. And the
link |
00:16:37.680
whiteboard marker or the pen and stylus is a very easy way to build up the equation,
link |
00:16:42.960
build up a complex concept one piece at a time while you're talking about it. And sometimes
link |
00:16:47.840
that enhances understandability. The downside of writing is that it's slow. And so if you want a
link |
00:16:54.400
long sentence, it's very hard to write that. So I think there are pros and cons, and sometimes I
link |
00:16:58.400
use slides and sometimes I use a whiteboard or a stylus. The slowness of a whiteboard is also
link |
00:17:04.960
its upside because it forces you to reduce everything to the basics. So some of your talks
link |
00:17:13.360
involve the whiteboard. I mean, you go very slowly and you really focus on the
link |
00:17:18.880
most simple principles. And that's beautiful; it enforces a kind of minimalism of ideas
link |
00:17:26.400
that, surprisingly to me, is great for education. Like a great talk,
link |
00:17:33.760
I think is not one that has a lot of content. A great talk is one that just clearly says a few
link |
00:17:40.720
simple ideas. And I think the whiteboard somehow enforces that. Pieter Abbeel, who's now one of the
link |
00:17:48.400
top roboticists and reinforcement learning experts in the world, was your first PhD student.
link |
00:17:52.880
And so I bring him up just because I imagine this must have been an interesting time in
link |
00:18:00.640
your life. Do you have any favorite memories of working with Pieter, your first student, in those
link |
00:18:07.120
uncertain times, especially before deep learning really, really sort of blew up any favorite
link |
00:18:16.080
memories from those times? Yeah, I was really fortunate to have had Pieter Abbeel as my first
link |
00:18:21.280
PhD student. And I think even my long term professional success builds on early foundations
link |
00:18:26.880
or early work that Pieter was so critical to. So I was really grateful to him for working with me.
link |
00:18:34.880
What not a lot of people know is just how hard research was, and still is. Pieter's PhD thesis
link |
00:18:43.600
was using reinforcement learning to fly helicopters. And so even today, on the website
link |
00:18:49.920
heli.stanford.edu, you can watch videos of us using reinforcement learning to make a helicopter
link |
00:18:57.200
fly upside down, fly loops, so it's cool. It's one of the most incredible robotics videos ever,
link |
00:19:02.320
so people should watch it. Oh yeah, thank you. It's inspiring. That's from like 2008 or seven
link |
00:19:08.960
or six, that range. Something like that, it was over 10 years ago. That was really inspiring
link |
00:19:13.840
to a lot of people. What not many people see is how hard it was. So Pieter and Adam Coates and
link |
00:19:21.680
Morgan Quigley and I were working on various versions of the helicopter. And a lot of things
link |
00:19:26.480
did not work. For example, it turns out one of the hardest problems we had was when the helicopter
link |
00:19:30.800
is flying around upside down doing stunts, how do you figure out the position? How do you localize
link |
00:19:35.600
the helicopter? So we tried all sorts of things. Having one GPS unit doesn't work because
link |
00:19:41.280
you're flying upside down. The GPS unit is facing down, so you can't see the satellites. So we experimented
link |
00:19:47.040
trying to have two GPS units, one facing up, one facing down, so if you flip over, that didn't work
link |
00:19:51.760
because the downward facing one couldn't synchronize if you're flipping quickly. Morgan Quigley was
link |
00:19:58.160
exploring this crazy complicated configuration of specialized hardware to interpret GPS signals.
link |
00:20:04.560
Looking at FPGAs, it was completely insane. He spent about a year working on that. It didn't work. So I remember
link |
00:20:11.360
Pieter, great guy, him and me sitting down in my office looking at some of the latest things we
link |
00:20:18.000
had tried that didn't work, and saying, you know, darn it, what now? Because we tried so many
link |
00:20:24.560
things and it just didn't work. In the end, what we did, and Adam Coates was crucial to this, was
link |
00:20:32.640
put cameras on the ground and use cameras on the ground to localize the helicopter and that
link |
00:20:38.000
solved the localization problem so that we could then focus on the reinforcement learning and
link |
00:20:42.400
inverse reinforcement learning techniques to actually make the helicopter fly.
link |
00:20:47.840
And, you know, I'm reminded, when I was doing this work at Stanford around that time,
link |
00:20:52.880
there were a lot of reinforcement learning theoretical papers, but not a lot of practical
link |
00:20:58.160
applications. So the autonomous helicopter for flying helicopters was one of the few, you know,
link |
00:21:05.200
practical applications of reinforcement learning at the time, which caused it to become pretty
link |
00:21:10.240
well known. I feel like we might have almost come full circle today. There's so much buzz,
link |
00:21:15.520
so much hype, so much excitement about reinforcement learning. But again, we're hunting
link |
00:21:20.640
for more applications of all of these great ideas that the community has come up with.
link |
00:21:24.880
What was the drive, sort of in the face of the fact that most people are doing theoretical work,
link |
00:21:30.080
what motivated you, in the uncertainty and the challenges, to get the helicopter sort of
link |
00:21:34.800
to do the applied work, to get the actual system to work? Yeah, in the face of fear, uncertainty,
link |
00:21:41.840
sort of the setbacks that you mentioned for localization.
link |
00:21:45.920
I like stuff that works in the physical world. So, like, it's back to the shredder.
link |
00:21:50.960
You know, I like theory, but when I work on theory myself, and this is personal taste,
link |
00:21:58.560
I'm not saying anyone else should do what I do. But when I work on theory, I personally
link |
00:22:03.280
enjoy it more if I feel that the work I do will influence people, have positive impact or help
link |
00:22:10.080
someone. I remember, many years ago, I was speaking with a mathematics professor,
link |
00:22:16.960
and he kind of just said, hey, why do you do what you do? And when he answered, he actually had stars in
link |
00:22:24.400
his eyes. And this mathematician, not from Stanford, a different university, he said,
link |
00:22:30.160
I do what I do because it helps me to discover truth and beauty in the universe. He had stars
link |
00:22:37.200
in his eyes when he said it. And I thought that's great, but I don't want to do that. I think it's great that
link |
00:22:43.360
someone does that, and I fully support the people that do it. A lot of respect for people who do. But I am
link |
00:22:47.600
more motivated when I can see a line to how the work that my teams and I are doing helps people.
link |
00:22:56.960
The world needs all sorts of people. I'm just one type. I don't think everyone should do things
link |
00:23:00.880
the same way as I do. But when I delve into either theory or practice, if I personally have conviction
link |
00:23:07.680
that here's a pathway to help people, I find that more satisfying to have that conviction.
link |
00:23:15.200
That's your path. You were a proponent of deep learning before it gained widespread acceptance.
link |
00:23:23.280
What did you see in this field that gave you confidence? What was your thinking process like
link |
00:23:27.600
in that first decade of the, I don't know what that's called, 2000s, the aughts?
link |
00:23:32.800
Yeah. I can tell you the thing we got wrong and the thing we got right. The thing we really got
link |
00:23:37.840
wrong was the early importance of unsupervised learning. So in the early days of Google
link |
00:23:46.160
Brain, we put a lot of effort into unsupervised learning rather than supervised learning. And
link |
00:23:50.400
there was this argument, I think it was around 2005, after, you know, NeurIPS, at that time
link |
00:23:56.480
called NIPS, had ended. And Geoff Hinton and I were sitting in the
link |
00:24:01.120
cafeteria outside the conference hall where lunch was, just chatting. And Geoff pulled up this napkin,
link |
00:24:06.080
he started sketching this argument on the napkin. It was very compelling, so I'll repeat it.
link |
00:24:11.760
The human brain has about 100 trillion, so 10 to the 14, synaptic connections.
link |
00:24:17.760
You will live for about 10 to the nine seconds. That's 30 years. You actually live for two times
link |
00:24:23.920
10 to the nine, maybe three times 10 to the nine seconds. So let's just say 10 to the nine.
link |
00:24:27.040
So if each synaptic connection, each weight in your brain's neural network has just a one bit
link |
00:24:33.760
parameter, that's 10 to the 14 bits you need to learn in up to 10 to the nine seconds of your life.
link |
00:24:41.840
So via this simple argument, which has a lot of problems, it's very simplified,
link |
00:24:46.000
that's 10 to the five bits per second you need to learn in your life. And I have a one year old
link |
00:24:51.200
daughter. I am not pointing out 10 to the five bits per second of labels to her. I think I'm a very
link |
00:25:01.440
loving parent, but I'm just not going to do that. So from this very crude, definitely problematic
link |
00:25:08.320
argument, there's just no way that most of what we know comes from supervised learning.
link |
00:25:13.360
But where you get so many bits of information is from sucking in images, audio, just experiences
link |
00:25:17.840
in the world. And so that argument, and there are a lot of known flaws with this argument,
link |
00:25:24.000
really convinced me that there's a lot of power in unsupervised learning.
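The napkin arithmetic above can be checked in a few lines. Every number here is the rough order-of-magnitude estimate from the conversation, not a measured value:

```python
# Hinton's napkin argument, as recounted above, in a few lines.
# All quantities are rough order-of-magnitude estimates, not measurements.

synapses = 1e14        # ~100 trillion synaptic connections in the human brain
lifetime_s = 1e9       # ~30 years of life, rounded to one significant figure
bits_per_synapse = 1   # assume each connection stores a single one-bit parameter

bits_to_learn = synapses * bits_per_synapse   # 1e14 bits over a lifetime
bits_per_second = bits_to_learn / lifetime_s  # learning rate implied by the estimate

print(f"{bits_per_second:,.0f} bits/second")  # prints "100,000 bits/second"
```

That 10^5 bits per second is the punchline: far more information than any plausible stream of supervised labels, which is what made the case for unsupervised learning.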
link |
00:25:29.440
So that was the part that we actually maybe got wrong. I still think unsupervised learning is
link |
00:25:33.920
really important, but in the early days, 10, 15 years ago, a lot of us thought that was the path
link |
00:25:40.320
forward. Oh, so you're saying that that perhaps was the wrong intuition for the time? For the time.
link |
00:25:46.080
That was the part we got wrong. The part we got right was the importance of scale. So
link |
00:25:52.400
Adam Coates, another wonderful person, I was fortunate to have worked with him. He was in my group at
link |
00:25:58.880
Stanford at the time. And Adam had run these experiments at Stanford, showing that the bigger
link |
00:26:03.600
we train a learning algorithm, the better its performance. And it was based on that there was
link |
00:26:10.240
a graph that Adam generated, where as you move along the x axis, the lines go up and to the right: the
link |
00:26:15.920
bigger you make this thing, the better its performance, with accuracy as the vertical axis. So it's really
link |
00:26:20.800
based on that chart that Adam generated, which gave me the conviction that we could scale these
link |
00:26:25.280
models way bigger than what we could on a few CPUs, which is what we had at Stanford, and that we
link |
00:26:29.520
could get even better results. And it was really based on that one figure that Adam generated
link |
00:26:34.960
that gave me the conviction to go with Sebastian Thrun to pitch starting a project at Google,
link |
00:26:42.640
which became the Google Brain project. And you co-founded Google Brain. And there the intuition was
link |
00:26:48.080
scale will bring performance for the system. So we should chase a larger and larger scale.
link |
00:26:55.360
And I think people don't realize how groundbreaking it is. It's simple, but it's a
link |
00:27:01.040
groundbreaking idea that bigger data sets will result in better performance.
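As a toy illustration of the kind of chart described, and emphatically not Adam Coates's actual data, one can model test error as a power law in training-set size; the exponent and irreducible-error floor below are invented numbers, chosen only to make the "up and to the right" shape concrete:

```python
# Toy illustration of "bigger is better": test error modeled as a power law
# in training-set size. The exponent (alpha) and irreducible-error floor are
# made up for illustration; this is not the data behind the original chart.

def toy_accuracy(n_examples: int, alpha: float = 0.35, floor: float = 0.02) -> float:
    """Hypothetical learning curve: error ~ n^-alpha + floor, accuracy = 1 - error."""
    return 1.0 - (n_examples ** -alpha + floor)

# Accuracy climbs monotonically as the training set grows by factors of 10.
for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n = {n:>9,}  accuracy = {toy_accuracy(n):.3f}")
```

Under any such power law, the curve keeps rising with scale until it approaches the irreducible-error floor, which is the intuition the chart conveyed.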
link |
00:27:06.000
It was controversial at the time. Some of my well meaning friends, you know,
link |
00:27:10.160
senior people in the machine learning community, I won't name names, but people, some of whom we
link |
00:27:14.800
know. My well meaning friends came and were trying to give me friendly advice and say, hey, Andrew,
link |
00:27:20.080
why are you doing this? This is crazy. It's in the neural network architecture. Look at these
link |
00:27:23.600
architectures you're building. You just want to go for scale? That's a bad career move. So my
link |
00:27:28.080
well meaning friends, you know, some of them were trying to talk me out of it.
link |
00:27:33.920
But I find that if you want to make a breakthrough, you sometimes have to have conviction and
link |
00:27:39.280
do something before it's popular since that lets you have a bigger impact.
link |
00:27:42.960
Let me ask you just a small tangent on that topic. I find myself arguing with people saying that
link |
00:27:50.000
greater scale, especially in the context of active learning. So very carefully selecting the
link |
00:27:56.080
data set, but growing the scale of the data set is going to lead to even further breakthroughs
link |
00:28:01.600
in deep learning. And there's currently pushback against that idea, that larger data sets alone will no longer get us
link |
00:28:07.760
there. So you want to increase the efficiency of learning. You want to make better learning
link |
00:28:12.960
mechanisms. And I personally believe that just bigger data sets, even with the same learning
link |
00:28:18.720
methods we have now will result in better performance. What's your intuition at this time
link |
00:28:23.920
on this dual side? Do we need to come up with better architectures for learning?
link |
00:28:31.680
Or can we just get bigger, better data sets that will improve performance? I think both are
link |
00:28:38.240
important. And it's also problem dependent. So for a few data sets, we may be approaching,
link |
00:28:44.800
you know, Bayes error rate or approaching or surpassing human level performance. And then
link |
00:28:49.600
there's that theoretical ceiling that we will never surpass, the Bayes error rate.
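The Bayes error rate mentioned here, the theoretical ceiling no classifier can beat, can be computed exactly for a toy case. The two-Gaussian setup below is an illustrative assumption, not from the conversation:

```python
import math

def bayes_error(mu0, mu1, sigma):
    # two equal-prior Gaussian classes with equal variance: the optimal
    # decision boundary is the midpoint between the means, and the Bayes
    # error is the Gaussian tail mass beyond half the class separation
    d = abs(mu1 - mu0) / (2 * sigma)
    return 0.5 * math.erfc(d / math.sqrt(2))

# classes centered at -1 and +1 with unit variance: no classifier,
# however big, can do better than ~15.9% error on this problem
print(round(bayes_error(-1.0, 1.0, 1.0), 4))  # → 0.1587
```

When classes overlap like this, no amount of extra data or model capacity removes that residual error; when they are well separated, the ceiling drops toward zero.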
link |
00:28:54.560
But then I think there are plenty of problems where we're still quite far from either human
link |
00:28:58.880
level performance or from Bayes error rate. And bigger data sets with neural networks
link |
00:29:05.360
without further algorithmic innovation will be sufficient to take us further. But on the flip
link |
00:29:11.040
side, if we look at the recent breakthroughs using transformer networks or language models,
link |
00:29:15.520
it was a combination of novel architecture, but also scale had a lot to do with it. We look at
link |
00:29:20.880
what happened with GPT-2 and BERT, I think scale was a large part of the story.
link |
00:29:25.520
Yeah, what's not often talked about is the scale of the data set it was trained on and
link |
00:29:31.040
the quality of the data set, because it was like Reddit threads that
link |
00:29:38.240
were upvoted highly. So there's already some weak supervision on a very large data set
link |
00:29:44.640
that people don't often talk about, right? I find that today we have maturing processes for managing
link |
00:29:51.280
code, things like Git, right? Version control. It took us a long time to evolve the good processes.
link |
00:29:58.320
I remember when my friends and I were emailing each other C++ files over email,
link |
00:30:02.080
you know, but then we had, was it CVS, Subversion, Git, maybe something else in the future.
link |
00:30:07.440
We're very immature in terms of tools for managing data, thinking about clean data, and how
link |
00:30:12.160
to solve these very hard, messy data problems. I think there's a lot of innovation there to be had still.
link |
00:30:18.000
I love the idea that you were versioning through email.
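The point about immature tooling for data, compared to mature tooling like Git for code, can be illustrated with a minimal sketch of content-addressing a labeled data set the way Git hashes file contents. The function and field names are made up for illustration, not any real tool:

```python
import hashlib
import json

def dataset_version(examples):
    # hash a canonical serialization of the examples, so that changing
    # any image reference or any label yields a new version id,
    # the way Git content-addresses a blob
    blob = json.dumps(examples, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

v1 = dataset_version([{"img": "part_01.png", "label": "defect"}])
v2 = dataset_version([{"img": "part_01.png", "label": "ok"}])
print(v1, v2)  # relabeling a single example produces a different version id
```

Real data versioning systems layer storage and diffing on top of exactly this kind of content hashing.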
link |
00:30:22.000
I'll give you one example. When we work with manufacturing companies,
link |
00:30:29.200
it's not at all uncommon for there to be multiple labelers that disagree with each other, right? And
link |
00:30:36.480
so during our work on visual inspection, we will take, say, a plastic part and show it to
link |
00:30:43.600
one inspector. And the inspector, sometimes very opinionated, they'll go, clearly, that's a defect.
link |
00:30:48.560
This scratch is unacceptable. Gotta reject this part. Take the same part to a different inspector,
link |
00:30:53.360
equally opinionated: clearly, the scratch is small. It's fine. Don't throw it away. You're
link |
00:30:57.680
going to make us, you know. And then sometimes you take the same plastic part, show it to the same
link |
00:31:02.640
inspector in the afternoon, as opposed to in the morning, and very opinionated. In the morning they
link |
00:31:07.600
say, clearly, it's okay. In the afternoon, equally confident: clearly, this is a defect. And so what
link |
00:31:13.200
is the AI team supposed to do if sometimes even one person doesn't agree with himself or
link |
00:31:18.000
herself in the span of a day? So I think these are the types of very practical, very messy data
link |
00:31:24.880
problems that my teams wrestle with. In the case of large consumer internet companies,
link |
00:31:32.880
where you have a billion users, you have a lot of data, you don't worry about it. Just take
link |
00:31:36.640
the average. It kind of works. But in the case of other industry settings, we don't have big data.
link |
00:31:42.400
If you just have small data, very small data sets, maybe around 100 defective parts,
link |
00:31:47.760
or 100 examples of a defect. If you have only 100 examples, these little labeling errors,
link |
00:31:52.960
you know, if 10 of your 100 labels are wrong, that's actually 10% of your data set, and it has a big
link |
00:31:57.840
impact. So how do you clean this up? What are you supposed to do? This is an example of the types
link |
00:32:03.040
of things that my teams, this is a Landing AI example, are wrestling with to deal with small
link |
00:32:08.480
data, which comes up all the time once you're outside consumer internet. Yeah, that's fascinating.
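The two messy-data problems just described, disagreeing inspectors and the impact of 10 wrong labels out of 100, can be sketched in a few lines. Majority voting is one common way to resolve disagreement; this is an assumed illustration, not Landing AI's actual process:

```python
import random
from collections import Counter

random.seed(1)

def majority_vote(labels):
    # resolve disagreement between inspectors by taking the most common label
    return Counter(labels).most_common(1)[0][0]

print(majority_vote(["defect", "ok", "defect"]))  # → defect

# with only 100 examples, 10 bad labels are already 10% of the data set
true_labels = ["ok"] * 100
noisy_labels = list(true_labels)
for i in random.sample(range(100), 10):
    noisy_labels[i] = "defect"

agreement = sum(t == n for t, n in zip(true_labels, noisy_labels)) / 100
print(agreement)  # → 0.9
```

With a billion examples the same 10% noise would average out far less painfully; with 100 examples it directly caps how well any model can look.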
link |
00:32:12.960
So then you invest more effort and time in thinking about the actual labeling process.
link |
00:32:18.080
What are the labels? How are the disagreements resolved? And all those kinds of
link |
00:32:23.600
like pragmatic real world problems? That's a fascinating space. Yeah, I find that actually
link |
00:32:28.240
when I'm teaching at Stanford, I increasingly encourage students at Stanford to try to find
link |
00:32:34.800
their own project for the end of term project rather than just downloading someone else's
link |
00:32:40.320
nicely clean data set. It's actually much harder if you need to go and define your own problem and
link |
00:32:44.480
find your own data set, rather than going to one of the several very good websites
link |
00:32:49.920
with clean scoped data sets that you could just work on. You're now running three efforts, the AI
link |
00:32:57.760
Fund, Landing AI, and deeplearning.ai. As you've said, the AI Fund is involved in creating new
link |
00:33:05.280
companies from scratch, Landing AI is involved in helping already established companies do AI, and
link |
00:33:11.200
deeplearning.ai is for the education of everyone else, of individuals interested in getting
link |
00:33:17.440
into the field and excelling in it. So let's perhaps talk about each of these areas first,
link |
00:33:22.560
deeplearning.ai. The basic question: how does a person interested in deep learning get started
link |
00:33:30.720
in the field? deeplearning.ai is working to create courses to help people break into AI. So
link |
00:33:38.880
my machine learning course that I taught through Stanford remains one of the most popular courses
link |
00:33:44.240
on Coursera. To this day, it's probably one of the courses, sort of, if I asked somebody, how did
link |
00:33:50.240
you get into machine learning, or how did you fall in love with machine learning, or what got you
link |
00:33:54.800
interested, it always goes back to Andrew Ng at some point. The number of people you've
link |
00:34:02.160
influenced is ridiculous. So for that, I'm sure I speak for a lot of people when I say a big thank you.
link |
00:34:07.040
No, thank you. I was once reading a news article, I think it was Tech Review, and I'm going to
link |
00:34:16.240
mess up the statistic. But I remember reading an article that said something like one third of our
link |
00:34:21.680
programmers are self taught. I may have the number wrong, maybe it was one third, maybe two thirds. But when I
link |
00:34:25.920
read that article, I thought, this doesn't make sense. Everyone is self taught. So because you
link |
00:34:30.480
teach yourself, I don't teach people. That's well played. So yeah, so how does one get started
link |
00:34:36.880
in deep learning and where does deeplearning.ai fit into that? So the deep learning specialization
link |
00:34:42.480
offered by deeplearning.ai is, I think it was called Coursera's top specialization,
link |
00:34:49.920
it might still be. So it's a very popular way for people to take that specialization to learn about
link |
00:34:55.680
everything from neural networks to how to tune a neural network to what is a ConvNet to what is an
link |
00:35:02.400
RNN or sequence model or what is an attention model. And so the deep learning specialization
link |
00:35:07.840
steps everyone through those algorithms. So you deeply understand it and can implement it and use
link |
00:35:13.520
it for whatever. From the very beginning. So what would you say are the prerequisites for somebody
link |
00:35:20.000
to take the deep learning specialization in terms of maybe math or programming background?
link |
00:35:25.600
Yeah, you need to understand basic programming, since there are programming exercises in Python.
link |
00:35:31.280
And the math prereq is quite basic. So no calculus is needed. If you know calculus, that's great,
link |
00:35:37.040
you get better intuitions, but we deliberately tried to teach that specialization without
link |
00:35:41.520
requiring calculus. So I think high school math would be sufficient. If you know how to
link |
00:35:47.760
multiply two matrices, I think that's great. So a little basic linear algebra is great.
link |
00:35:54.720
Basic linear algebra, even very, very basic linear algebra, and some programming.
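The "multiply two matrices" prerequisite can be shown concretely. A minimal pure-Python sketch of exactly that operation:

```python
def matmul(A, B):
    # (n x k) times (k x m): each output entry is the dot product
    # of a row of A with a column of B
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```

That one operation, applied layer after layer, is most of the arithmetic inside a neural network.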
link |
00:36:00.000
I think that people that have done the machine learning course will find the deep learning
link |
00:36:03.040
specialization a bit easier. But it's also possible to jump into the deep learning specialization
link |
00:36:07.760
directly. But it will be a little bit harder, since we tend to go faster over concepts like
link |
00:36:14.800
how gradient descent works and what an objective function is, which we cover more slowly
link |
00:36:18.880
in the machine learning course. Could you briefly mention some of the key concepts in deep learning
link |
00:36:23.520
that students should learn that you envision them learning in the first few months in the first
link |
00:36:28.160
year or so? So if you take the deep learning specialization, you learn the foundations of
link |
00:36:33.600
what is a neural network? How do you build up a neural network from a single logistic unit
link |
00:36:38.720
to a stack of layers to different activation functions? You learn how to train the neural
link |
00:36:44.080
networks. One thing I'm very proud of in that specialization is we go through a lot of
link |
00:36:48.880
practical know how of how to actually make these things work. So what are the differences between
link |
00:36:53.840
different optimization algorithms? What do you do if the algorithm overfits? How do you tell if the
link |
00:36:58.000
algorithm is overfitting? When do you collect more data? When should you not bother to collect more
link |
00:37:02.320
data? I find that even today, unfortunately, there are engineers that will spend six months trying to
link |
00:37:10.320
pursue a particular direction, such as collect more data because we heard more data is valuable.
link |
00:37:15.840
But sometimes you could run some tests and could have figured out six months earlier that for this
link |
00:37:20.960
particular problem, collecting more data isn't going to cut it. So just don't spend six months
link |
00:37:25.040
collecting more data. Spend your time modifying the architecture or trying something else. So
link |
00:37:30.800
We go through a lot of the practical know how, so that when you take the deep learning
link |
00:37:36.160
specialization, you have those skills to be very efficient in how you build these networks.
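The path from a single logistic unit to a stack of layers, as the specialization teaches it, can be sketched as a toy forward pass. The weights here are made-up numbers, not a trained network:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logistic_unit(x, w, b):
    # the single building block: a weighted sum followed by a nonlinearity
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def layer(x, weights, biases):
    # a layer is just several logistic units sharing the same input
    return [logistic_unit(x, w, b) for w, b in zip(weights, biases)]

# stacking two layers: the output of one layer is the input to the next
x = [1.0, 0.5]
h = layer(x, weights=[[0.2, -0.4], [0.7, 0.1]], biases=[0.0, -0.1])
y = layer(h, weights=[[1.0, -1.0]], biases=[0.0])[0]
print(round(y, 3))
```

Training then consists of adjusting those weights and biases by gradient descent, which the courses build up next.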
link |
00:37:41.920
So dive right in to play with the network, to train it, to do the inference on a particular
link |
00:37:46.720
data set, to build intuition about it without building it up too big to where you spend,
link |
00:37:52.880
like you said, six months learning, building up your big project without building any intuition
link |
00:37:58.720
of a small aspect of the data that could already tell you everything you need to know about that
link |
00:38:04.720
data. Yes, and also the systematic frameworks of thinking for how to go about building practical
link |
00:38:11.360
machine learning. Maybe to make an analogy, when we learn to code, we have to learn the syntax of
link |
00:38:16.880
some programming language, right, be it Python or C++ or Octave or whatever. But the equally
link |
00:38:22.480
important or maybe even more important part of coding is to understand how to string together
link |
00:38:26.800
these lines of code into coherent things. So when should you put something in a function?
link |
00:38:31.600
When should you not? How do you think about abstraction? So those frameworks are what
link |
00:38:36.960
makes a programmer efficient, even more than understanding the syntax. I remember when I was
link |
00:38:42.560
an undergrad at Carnegie Mellon, one of my friends would debug their code by first trying to compile
link |
00:38:48.720
it, and it was C++ code. And then for every line with a syntax error, they wanted to get
link |
00:38:53.760
rid of the syntax errors as quickly as possible. So how do you do that? Well, they would delete
link |
00:38:57.360
every single line of code with a syntax error. So really efficient for getting rid of syntax
link |
00:39:01.360
errors, but a horrible debugging strategy. So we learn how to debug. And I think in machine
link |
00:39:06.240
learning, the way you debug a machine learning program is very different than the way you,
link |
00:39:11.120
you know, do binary search or whatever, use a debugger to trace through the code in traditional
link |
00:39:15.760
software engineering. So it's an evolving discipline, but I find that the people that are really good
link |
00:39:20.720
at debugging machine learning algorithms are easily 10x, maybe 100x faster at getting something to
link |
00:39:26.800
work. And the basic process of debugging is, so the bug in this case, why isn't this thing learning,
link |
00:39:34.080
learning, improving, sort of going into the questions of overfitting and all those kinds of
link |
00:39:40.080
things. That's the logical space that the debugging is happening in with neural networks.
link |
00:39:46.480
Yeah, the question often is, why doesn't it work yet? Or can I expect it to eventually work?
link |
00:39:52.960
And what are the things I could try, change the architecture, more data, more regularization,
link |
00:39:57.440
different optimization algorithm, you know, different types of data. So to answer those
link |
00:40:02.720
questions systematically, so that you don't spend six months heading
link |
00:40:07.120
down a blind alley before someone comes and says, why did you spend six months doing this?
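Answering those questions systematically often starts with comparing training and validation error, in the spirit of the bias/variance advice from the courses. The decision rule and thresholds below are illustrative assumptions, not a quoted recipe:

```python
def diagnose(train_err, val_err, target_err, gap_tol=0.02):
    # training error well above the target -> a bias problem: a bigger
    # model or longer training may help; more data will not fix it
    if train_err > target_err + gap_tol:
        return "high bias: try a bigger architecture, not more data"
    # training error is fine but validation error is much worse ->
    # a variance problem: more data or regularization can help
    if val_err - train_err > gap_tol:
        return "high variance: collect more data or regularize"
    return "close to target: ship it"

print(diagnose(train_err=0.15, val_err=0.16, target_err=0.02))
print(diagnose(train_err=0.01, val_err=0.12, target_err=0.02))
```

Running a check like this first is exactly what saves the hypothetical six months of collecting data that was never going to help.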
link |
00:40:12.160
What concepts in deep learning do you think students struggle the most with?
link |
00:40:16.480
Or, sort of, what is the biggest challenge for them, where once they get over that hill,
link |
00:40:23.120
it hooks them and it inspires them and they really get it.
link |
00:40:26.400
Similar to learning mathematics, I think one of the challenges of deep learning is that there are
link |
00:40:31.920
a lot of concepts that build on top of each other. If you ask me what's hard about mathematics,
link |
00:40:37.440
I have a hard time pinpointing one thing. Is it addition, subtraction? Is it a carry? Is it
link |
00:40:42.160
multiplication? There's just a lot of stuff. I think one of the challenges of learning math
link |
00:40:46.400
and of learning certain technical fields is that there are a lot of concepts and if you
link |
00:40:50.640
miss a concept, then you're kind of missing the prerequisite for something that comes
link |
00:40:54.800
later. So in the deep learning specialization, we try to break down the concepts to maximize the
link |
00:41:02.240
odds of each component being understandable. So when you move on to the more advanced thing,
link |
00:41:07.120
like ConvNets, hopefully you have enough intuitions from the earlier sections
link |
00:41:11.600
to then understand why we structure ConvNets in a certain way and then eventually why we
link |
00:41:18.080
build RNNs and LSTMs or attention models in a certain way, building on top of the earlier concepts.
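As one example of why ConvNets are structured a certain way: a convolutional layer slides a single small set of shared weights across the whole input, rather than learning a separate weight for every position. A minimal 1-D sketch:

```python
def conv1d(signal, kernel):
    # the core ConvNet idea: the same small kernel of weights is slid
    # across the input, so parameters are shared at every position
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# a difference kernel responds exactly where the signal jumps (an "edge")
print(conv1d([0, 0, 1, 1, 1], [-1, 1]))  # → [0, 1, 0, 0]
```

Stacking such layers, with learned kernels instead of this hand-picked one, gives the feature hierarchies the courses build up to.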
link |
00:41:24.800
Actually, I'm curious. You do a lot of teaching as well. Do you have a favorite,
link |
00:41:30.400
this is the hard concept, moment in your teaching?
link |
00:41:36.960
Well, I don't think anyone's ever turned the interview on me.
link |
00:41:40.160
I'm glad you get to be the first.
link |
00:41:41.680
I think that's a really good question. Yeah, it's really hard to capture the moment when they
link |
00:41:50.080
struggle. I think you put it really eloquently. I do think there's moments that are like aha
link |
00:41:55.280
moments that really inspire people. I think for some reason, reinforcement learning,
link |
00:42:01.920
especially deep reinforcement learning, is a really great way to really inspire people
link |
00:42:08.000
and get across what neural networks can do. Even though neural networks really are just a part
link |
00:42:15.280
of the deep RL framework, but it's a really nice way to paint the entirety of the picture
link |
00:42:21.040
of a neural network being able to learn from scratch, knowing nothing and explore the world
link |
00:42:26.560
and pick up lessons. I find that a lot of the aha moments happen when you use deep RL to teach
link |
00:42:33.600
people about neural networks, which is counterintuitive. I find that a lot of the inspiration,
link |
00:42:38.560
that sort of fire in people's passion, in people's eyes, comes from the RL world. Do you find
link |
00:42:43.920
reinforcement learning to be a useful part of the teaching process or not?
link |
00:42:50.720
I still teach reinforcement learning in one of my Stanford classes, and my PhD thesis was on
link |
00:42:55.360
reinforcement learning, so I clearly love the field. I find that if I'm trying to teach
link |
00:42:59.840
students the most useful techniques for them to use today, I end up shrinking the amount of time
link |
00:43:06.320
I talk about reinforcement learning. It's not what's working today. Now, our world changes so fast.
link |
00:43:11.520
Maybe this will be totally different in a couple of years, but I think we need a couple more things
link |
00:43:16.960
for reinforcement learning to get there. One of my teams is looking at reinforcement learning
link |
00:43:21.920
for some robotic control tasks. I see the applications, but if you look at it as a percentage
link |
00:43:26.800
of all of the impact of the types of things we do, at least today, outside of playing video games
link |
00:43:35.280
and a few other games, the scope is limited. Actually, at NeurIPS, a bunch of us were standing around
link |
00:43:40.880
saying, hey, what's your best example of an actual deployment of reinforcement learning
link |
00:43:44.560
application among senior machine learning researchers? Again, there are some emerging
link |
00:43:51.040
ones, but there are not that many great examples.
link |
00:43:54.720
Well, I think you're absolutely right. The sad thing is there hasn't been a big,
link |
00:44:01.920
impactful real world application of reinforcement learning. I think its biggest impact to me
link |
00:44:07.520
has been in the toy domain, in the game domain, in the small example. That's what I mean for
link |
00:44:12.320
educational purposes. It seems to be a fun thing to explore neural networks with,
link |
00:44:16.720
but I think from your perspective, and I think that might be the best perspective,
link |
00:44:21.920
is if you're trying to educate with a simple example in order to illustrate how this can
link |
00:44:26.480
actually be grown to scale and have a real world impact, then perhaps focusing on the
link |
00:44:33.040
fundamentals of supervised learning in the context of a simple data set, even like an
link |
00:44:39.520
MNIST data set, is the right way, is the right path to take. The amount of fun I've seen people
link |
00:44:46.560
have with reinforcement learning has been great, but not in the applied impact on the real world
link |
00:44:52.080
setting. It's a trade off. How much impact you want to have versus how much fun you want to have.
link |
00:44:57.280
Yeah, that's really cool. I feel like the world actually needs all sorts. Even within machine
link |
00:45:02.240
learning, I feel like deep learning is so exciting, but the AI team shouldn't just use
link |
00:45:07.760
deep learning. I find that my teams use a portfolio of tools, and maybe that's not the
link |
00:45:12.480
exciting thing to say, but some days we use a neural net, some days we use a PCA. Actually,
link |
00:45:20.240
the other day I was sitting down with my team looking at PCA residuals, trying to figure out
link |
00:45:23.200
what's going on with PCA applied to a manufacturing problem. Some days we use a probabilistic graphical
link |
00:45:27.760
model. Some days we use a knowledge graph, which is one of the things that has tremendous industry
link |
00:45:32.320
impact, but the amount of chatter about knowledge graphs in academia is really thin compared to
link |
00:45:37.760
the actual real world impact. I think reinforcement learning should be in that portfolio and then
link |
00:45:42.800
it's about balancing how much we teach all of these things. The world should have diverse
link |
00:45:47.520
skills. It would be sad if everyone just learned one narrow thing. Yeah, the diverse skills help
link |
00:45:52.560
you discover the right tool for the job. What is the most beautiful, surprising, or inspiring
link |
00:45:58.480
idea in deep learning to you? Something that captivated your imagination? Is it the scale
link |
00:46:05.760
that is, the performance that could be achieved with scale, or are there other ideas?
link |
00:46:11.520
I think that if my only job was being an academic researcher with an unlimited budget
link |
00:46:18.160
and didn't have to worry about short term impact and only focus on long term impact,
link |
00:46:23.760
I'd really spend all my time doing research on unsupervised learning. I still think unsupervised
link |
00:46:28.480
learning is a beautiful idea. At both this past NeurIPS and ICML, I was attending workshops or
link |
00:46:36.080
listening to various talks about self supervised learning, which is one vertical segment, maybe
link |
00:46:41.840
of sort of unsupervised learning that I'm excited about. Maybe just to summarize the idea. I guess
link |
00:46:46.800
you know the idea, but we'll describe it briefly. No, please. So here's an example of self supervised
link |
00:46:51.040
learning. Let's say we grab a lot of unlabeled images off the internet, so with infinite amounts
link |
00:46:56.640
of this type of data, I'm going to take each image and rotate it by a random multiple of 90 degrees,
link |
00:47:02.960
and then I'm going to train a supervised neural network to predict what was the original orientation.
link |
00:47:08.960
So was it rotated 90 degrees, 180 degrees, 270 degrees, or zero degrees? So you can generate
link |
00:47:15.680
an infinite amount of labeled data because you rotated the image so you know what's the
link |
00:47:19.920
ground truth label. And so various researchers have found that by taking unlabeled data and making
link |
00:47:27.120
up labeled data sets and training a large neural network on these tasks, you can then take the
link |
00:47:32.240
hidden layer representation and transfer it to a different task very powerfully. Learning word
link |
00:47:38.880
embeddings, where we take a sentence, delete a word, and predict the missing word, which is how we
link |
00:47:43.440
learn. One of the ways we learn word embeddings is another example. And I think there's now this
link |
00:47:49.440
portfolio of techniques for generating these made up tasks. Another one called jigsaw would be if
link |
00:47:55.680
you take an image, cut it up into a three by three grid, like a nine piece jigsaw puzzle,
link |
00:48:02.000
jumble up the nine pieces and have a neural network predict which of the nine factorial possible
link |
00:48:07.200
permutations it came from. So many groups, including OpenAI, Pieter Abbeel has been doing some work on this
link |
00:48:15.760
too, Facebook, Google Brain, I think DeepMind. Oh, actually, Aaron van den Oord has great work on the
link |
00:48:23.120
CPC objective. So many teams are doing exciting work. And I think this is a way to generate
link |
00:48:28.320
infinite labeled data. And I find this a very exciting piece of unsupervised learning.
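The rotation pretext task described above can be sketched with tiny 2-D grids standing in for images. The helper names are made up for illustration:

```python
import random

random.seed(0)

def rotate90(img):
    # rotate a 2-D grid clockwise by 90 degrees
    return [list(row) for row in zip(*img[::-1])]

def make_pretext_example(img):
    # pick a random multiple of 90 degrees; the rotation index itself
    # is the free "ground truth" label, with no human labeling needed
    k = random.randrange(4)
    rotated = img
    for _ in range(k):
        rotated = rotate90(rotated)
    return rotated, k  # (input, label) pair

img = [[1, 2], [3, 4]]
x, label = make_pretext_example(img)
print(label, x)
```

A network trained to predict `label` from `x` has to learn about orientation and object structure, and its hidden layers can then be transferred to a real task.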
link |
00:48:34.080
So long term, you think that's going to unlock a lot of power in machine learning systems? Is this
link |
00:48:40.320
kind of unsupervised learning? I don't think it's the whole enchilada. I think it's just a piece of
link |
00:48:44.880
it. And I think this one piece of unsupervised learning is starting to get traction. We're very
link |
00:48:50.880
close to it being useful. Well, word embeddings are really useful. I think we're getting closer
link |
00:48:56.560
and closer to just having a significant real world impact, maybe in computer vision and video.
link |
00:49:03.040
But I think this concept, and I think there'll be other concepts around it, you know,
link |
00:49:08.000
other unsupervised learning things that I worked on, I've been excited about. I was really excited
link |
00:49:12.800
about sparse coding and ICA, slow feature analysis. I think all of these are ideas that
link |
00:49:19.440
various of us were working on about a decade ago before we all got distracted by how well
link |
00:49:24.160
supervised learning was doing. So we could return to the fundamentals of representation
link |
00:49:30.400
learning that really started this movement of deep learning. I think there's a lot more work
link |
00:49:34.880
that one could explore around this stream of ideas, and other ideas, to come up with better algorithms.
link |
00:49:39.520
So if we could return to maybe talk quickly about the specifics of deeplearning.ai,
link |
00:49:46.640
the deep learning specialization perhaps, how long does it take to complete the course, would you say?
link |
00:49:52.640
The official length of the deep learning specialization is I think 16 weeks, so about four
link |
00:49:57.920
months, but it's go at your own pace. So if you subscribe to the deep learning specialization,
link |
00:50:03.520
there are people that finish it in less than a month by working more intensely and studying
link |
00:50:07.360
more intensely. So it really depends on the individual. When we created the deep learning
link |
00:50:12.080
specialization, we wanted to make it very accessible and very affordable. And with
link |
00:50:19.360
Coursera and deeplearning.ai's education mission, one of the things that's really important to me
link |
00:50:23.520
is that if there's someone for whom paying anything is a financial hardship,
link |
00:50:29.280
then just apply for financial aid and get it for free.
link |
00:50:32.480
If you were to recommend a daily schedule for people in learning, whether it's through the
link |
00:50:39.600
deeplearning.ai specialization or just learning in the world of deep learning,
link |
00:50:45.440
what would you recommend? How should they go about it day to day? Sort of specific advice
link |
00:50:50.240
about learning and about their journey in the world of deep learning machine learning?
link |
00:50:54.240
I think getting the habit of learning is key and that means regularity. So for example,
link |
00:51:04.320
we send out a weekly newsletter, The Batch, every Wednesday. So people know it's coming
link |
00:51:09.120
Wednesday, you can spend a little bit of time on Wednesday catching up on the latest news
link |
00:51:13.680
through The Batch on Wednesday. And for myself, I've picked up a habit of spending
link |
00:51:21.120
some time every Saturday and every Sunday reading or studying. And so I don't wake up on the Saturday
link |
00:51:26.640
and have to make a decision. Do I feel like reading or studying today or not? It's just
link |
00:51:31.040
what I do. And the fact that it's a habit makes it easier. So I think if someone can get into that habit,
link |
00:51:37.600
it's like, you know, just like we brush our teeth every morning. I don't think about it. If I thought
link |
00:51:42.480
about it, it's a little bit annoying to have to spend two minutes doing that. But it's a habit
link |
00:51:47.120
so it takes no cognitive load. But this would be so much harder if we had to make a decision
link |
00:51:51.280
every morning. And then actually, that's the reason why I wear the same thing every day as well.
link |
00:51:56.080
It's just one less decision. I just get up and wear what I prefer. So I think if you can get
link |
00:52:00.640
that habit, that consistency of studying, then it actually feels easier. So yeah, it's kind of
link |
00:52:06.960
amazing. In my own life, like, I play guitar every day. I force myself to, at least for five minutes,
link |
00:52:14.880
play guitar. It's a ridiculously short period of time. But because I've gotten into that habit,
link |
00:52:20.080
it's incredible what you can accomplish in a period of a year or two years, you can become
link |
00:52:26.160
you know, exceptionally good at certain aspects of a thing by just doing it every day for a very
link |
00:52:31.280
short period of time. It's kind of a miracle that that's how it works. It adds up over time.
link |
00:52:36.240
Yeah. And I think it's often not about the bursts of intense effort and the all nighters,
link |
00:52:41.920
because you could only do that a limited number of times. It's the sustained effort over a long time.
link |
00:52:47.200
I think, you know, reading two research papers is a nice thing to do, but the power is not in
link |
00:52:52.880
reading two research papers. It's reading two research papers a week for a year. Then you
link |
00:52:57.760
read 100 papers and you actually learn a lot when you read 100 papers. So regularity and
link |
00:53:03.600
making learning a habit. Do you have other general study tips, particularly for deep learning, that
link |
00:53:12.560
people should, in their process of learning, is there some kind of recommendations or tips you
link |
00:53:17.760
have as they learn? One thing I still do when I'm trying to study something really deeply is
link |
00:53:24.400
take handwritten notes. It varies. I know there are a lot of people that take the deep learning
link |
00:53:29.360
courses during a commute or something where it may be more awkward to take notes. So I know it
link |
00:53:35.440
may not work for everyone. But when I'm taking courses on Coursera, and I still take some
link |
00:53:41.680
every now and then, the most recent one I took was a course on clinical trials because I was
link |
00:53:45.280
interested in that. I got out my little Moleskine notebook and I was sitting at my desk
link |
00:53:49.600
just taking down notes of what the instructor was saying. And we know that that act of taking notes,
link |
00:53:55.440
preferably handwritten notes, increases retention. So as you're sort of watching the video, just kind
link |
00:54:02.720
of pausing maybe and then taking the basic insights down on paper? Yeah. So there have been a few
link |
00:54:09.600
studies. If you search online, you find some of these studies that taking handwritten notes,
link |
00:54:15.200
because handwriting is slower, as we were saying just now, it causes you to recode the knowledge
link |
00:54:21.360
in your own words more. And that process of recoding promotes long term retention. This is as
link |
00:54:27.200
opposed to typing, which is fine. Again, typing is better than nothing, and taking a class and not
link |
00:54:32.080
taking notes is better than not taking any class at all. But comparing handwritten notes and typing,
link |
00:54:38.000
a lot of people can type faster than they can handwrite notes. And so when people
link |
00:54:42.640
type, they're more likely to just try to transcribe verbatim what they heard. And that reduces the
link |
00:54:47.360
amount of recoding. And that actually results in less long term retention. I don't know what the
link |
00:54:52.960
psychological effect there is, but it's so true. There's something fundamentally different about
link |
00:54:57.200
handwriting. I wonder what that is. I wonder if it is as simple as just the time it
link |
00:55:02.560
takes to write is slower. Yeah. And because you can't write as many words, you have to take
link |
00:55:09.360
whatever they said and summarize it into fewer words. And that summarization process requires
link |
00:55:13.840
deeper processing of the meaning, which then results in better retention. That's fascinating.
link |
00:55:20.400
And I've spent, I think because of Coursera, I've spent so much time studying pedagogy.
link |
00:55:24.160
It's actually one of my passions. I really love learning how to more efficiently help others
link |
00:55:28.640
learn. Yeah, one of the things I do, both when creating videos or when we write The Batch, is
link |
00:55:36.160
I try to think, is one minute spent with us going to be a more efficient learning experience than
link |
00:55:42.160
one minute spent anywhere else. And we really try to make it time efficient for the learners,
link |
00:55:48.320
because you know, everyone's busy. So when we're editing, I often tell my teams,
link |
00:55:53.520
every word needs to fight for its life. And if you can delete a word, let's delete it and
link |
00:55:57.280
not waste the learner's time. That's so amazing that you think that way. Because
link |
00:56:02.480
there are millions of people that are impacted by your teaching, and sort of that one minute spent
link |
00:56:06.640
has a ripple effect, right? Through years of time, which is just fascinating to think about. How
link |
00:56:12.720
does one make a career out of an interest in deep learning? Do you have advice for people? We just
link |
00:56:18.960
talked about sort of the beginning early steps. But if you want to make it an entire life's journey,
link |
00:56:24.240
or at least a journey of a decade or two, how do you do it? So the most important thing is to get
link |
00:56:30.080
started. Right. And I think in the early parts of a career, coursework, like the deep learning
link |
00:56:37.600
specialization, is a very efficient way to master this material. Because, you know,
link |
00:56:46.160
instructors, be it me or someone else, or, you know, Laurence Moroney, who teaches the TensorFlow
link |
00:56:50.960
specialization or other things we're working on, spend effort to try to make it time efficient for
link |
00:56:56.240
you to learn new concepts. So coursework is actually a very efficient way for people to learn
link |
00:57:02.080
concepts in the beginning parts of breaking into new fields. In fact, one thing I see at Stanford,
link |
00:57:08.880
some of my PhD students want to jump into research right away, and I actually tend to say,
link |
00:57:12.960
look, in your first couple years of PhD, spend time taking courses because it lays a foundation.
link |
00:57:18.320
It's fine if you're less productive in your first couple years, you'll be better off in the long
link |
00:57:22.320
term. Beyond a certain point, there's material that doesn't exist in courses because it's too
link |
00:57:28.240
cutting edge; the courses haven't been created yet. And there's some practical experience that we're
link |
00:57:32.160
not yet that good at teaching in a course. And I think after exhausting the efficient coursework,
link |
00:57:37.760
then most people need to go on to either ideally work on projects and then maybe also continue
link |
00:57:46.160
their learning by reading blog posts and research papers and things like that. Doing
link |
00:57:51.200
projects is really important. And again, I think it's important to start small and just do something.
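As a concrete illustration of the kind of tiny starter project described here, a minimal neural network can be trained from scratch in a few dozen lines. This is only a sketch: it uses a small synthetic two-class dataset in place of MNIST (which requires a download), and all names and numbers in it are illustrative.

```python
import numpy as np

# A minimal one-hidden-layer network trained from scratch with NumPy.
# MNIST itself requires a download, so this sketch uses a tiny synthetic
# two-class dataset; the training loop would be the same shape for real images.

rng = np.random.default_rng(0)

# Synthetic "images": class 0 clusters near -1, class 1 near +1 (16 "pixels" each).
n, d = 200, 16
X = np.vstack([rng.normal(-1, 1, (n // 2, d)), rng.normal(+1, 1, (n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

# Initialize a 16 -> 8 -> 1 network with small random weights.
W1 = rng.normal(0, 0.1, (d, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, (8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)               # hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid output probability
    return h, p.ravel()

for _ in range(300):  # plain full-batch gradient descent on binary cross-entropy
    h, p = forward(X)
    g = (p - y)[:, None] / n               # dLoss/dlogit, averaged over examples
    gW2 = h.T @ g; gb2 = g.sum(0)
    gh = g @ W2.T * (1 - h ** 2)           # backprop through tanh
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    lr = 1.0
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, p = forward(X)
accuracy = ((p > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")  # high on this easy, separable data
```

Swapping in real MNIST images and labels would change only the data-loading step; the point of a project this small is exactly what is said above, that finishing it builds the skills for bigger ones.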
link |
00:57:58.240
Today, you read about deep learning. If you're like, oh, all these people are doing such exciting
link |
00:58:01.360
things; if I'm not building a neural network that changes the world, then what's the point?
link |
00:58:05.040
Well, the point is sometimes building that tiny neural network, be it on MNIST or upgrading to Fashion
link |
00:58:11.360
MNIST or whatever, doing your own fun hobby project. That's how you gain the skills to let
link |
00:58:16.880
you do bigger and bigger projects. I find this to be true at the individual level and also at the
link |
00:58:22.400
organizational level. For a company to become good at machine learning, sometimes the right thing to
link |
00:58:26.560
do is not to tackle the giant project, but instead to do the small project that lets the
link |
00:58:32.560
organization learn and then build up from there. But this is true both for individuals and for
link |
00:58:38.160
companies. Taking the first step and then taking small steps is the key. Should students pursue
link |
00:58:46.240
a PhD? You can do so much; that's one of the fascinating things about machine learning.
link |
00:58:51.360
You can have so much impact without ever getting a PhD. So what are your thoughts?
link |
00:58:56.240
Should people go to grad school? Should people get a PhD?
link |
00:58:59.680
I think that there are multiple good options of which doing a PhD could be one of them. I think
link |
00:59:05.600
that if someone's admitted to a top PhD program at MIT, Stanford, top schools, I think that's a
link |
00:59:12.880
very good experience. Or if someone gets a job at a top organization, at the top AI team, I think
link |
00:59:20.800
that's also a very good experience. There are some things you still need a PhD to do. If someone's
link |
00:59:26.320
aspiration is to be a professor at the top academic university, you just need a PhD to do that.
link |
00:59:30.960
But if your goal is to start a company, build a company, do great technical work, I think a
link |
00:59:36.320
PhD is a good experience. But I would look at the different options available to someone.
link |
00:59:41.440
Where are the places where you can get a job? Where are the places you can get in a PhD program
link |
00:59:44.960
and weigh the pros and cons of those? Just to linger on that for a little bit longer,
link |
00:59:50.000
what final dreams and goals do you think people should have? What options should they explore?
link |
00:59:57.280
So you can work in industry for a large company, like Google, Facebook, Baidu,
link |
01:00:03.520
all these large companies that already have huge teams of machine learning engineers.
link |
01:00:09.120
You can also work within industry in more research-oriented groups, like Google Research, Google Brain.
link |
01:00:15.120
Then you can also, like we said, be a professor in academia. And what else? Oh, you can build
link |
01:00:22.720
your own company. You can do a startup. Is there anything that stands out between those options?
link |
01:00:28.480
Or are they all beautiful, different journeys that people should consider?
link |
01:00:32.560
I think the thing that affects your experience more is less, are you in this company versus
link |
01:00:37.440
that company or academia versus industry? I think the thing that affects your experience,
link |
01:00:41.280
most is, who are the people you're interacting with on a daily basis. So even if you look at
link |
01:00:47.280
some of the large companies, the experience of individuals in different teams is very different.
link |
01:00:52.960
And what matters most is not the logo above the door when you walk into the giant building every
link |
01:00:57.760
day. What matters the most is who are the 10 people, who are the 30 people you interact with every
link |
01:01:02.400
day. So I tend to advise people, if you get a job from a company, ask who is your manager?
link |
01:01:09.280
Who are your peers? Who are you actually going to talk to? We're all social creatures. We tend
link |
01:01:12.800
to become more like the people around us. And if you're working with great people, you will learn
link |
01:01:18.080
faster. Or if you get admitted, if you get a job at a great company or a great university,
link |
01:01:24.160
maybe the logo you walk in under is great, but you could actually be stuck on some team doing
link |
01:01:28.800
work that doesn't excite you. And then that's actually a really bad experience.
link |
01:01:33.680
So this is true both for universities and for large companies. For small companies,
link |
01:01:38.800
you can kind of figure out who you'd be working with quite quickly. And I tend to advise people,
link |
01:01:43.760
if a company refuses to tell you who you'll work with, and just says, oh, join us, the rotation
link |
01:01:48.240
system will figure it out, I think that's a worrying answer, because it means you may not
link |
01:01:54.560
actually get to a team with great peers and great people to work with.
link |
01:02:00.720
That's actually really profound advice that we kind of sometimes sweep aside. We don't consider it
link |
01:02:06.720
too rigorously or carefully. Especially when you
link |
01:02:11.840
accomplish great things, it seems the great things are accomplished because of the people around you.
link |
01:02:16.640
So it's not about whether you learn this thing or that thing, or like you said,
link |
01:02:23.280
the logo that hangs up top, it's the people. That's a fascinating and it's such a hard search process
link |
01:02:30.480
of finding, just like finding the right friends and somebody to get married with and that kind
link |
01:02:36.960
of thing. It's a very hard search. It's a people search problem. Yeah. But I think when someone
link |
01:02:42.560
interviews at a university or a research lab or a large corporation, it's good to insist on
link |
01:02:48.640
just asking, who are the people, who is my manager? And if they refuse to tell me, I'm going to
link |
01:02:53.200
think, well, maybe that's because you don't have a good answer. It may not be someone I like.
link |
01:02:57.200
And if you don't particularly connect, if something feels off with the people,
link |
01:03:02.320
then don't stick to it. That's a really important signal to consider.
link |
01:03:08.480
Yeah. And actually, in my Stanford class, CS230, as well as in an ACM talk, I think I gave like an
link |
01:03:15.920
hour long talk on career advice, including on the job search process and some of these topics.
link |
01:03:20.880
So you can find those videos online. Awesome. I'll point people to them.
link |
01:03:26.400
Beautiful. So the AI Fund helps AI startups get off the ground. Or perhaps you can elaborate on all
link |
01:03:34.960
the fun things it's involved with. What's your advice on how one builds a successful AI startup?
link |
01:03:41.840
You know, in Silicon Valley, a lot of startup failures come from building products that
link |
01:03:47.120
no one wanted. So it's, you know, cool technology, but who's going to use it? So
link |
01:03:54.560
I think I tend to be very outcome driven and customer obsessed. Ultimately, we don't get
link |
01:04:01.760
to vote on whether we succeed or fail. The customer is the only one that gets a
link |
01:04:07.360
thumbs up or thumbs down vote in the long term. In the short term, you know, there are various
link |
01:04:11.680
people that get various votes, but in the long term, that's what really matters.
link |
01:04:15.600
So as you build this startup, you have to constantly ask the question,
link |
01:04:20.720
will the customer give a thumbs up on this?
link |
01:04:24.080
I think so. I think startups that are very customer focused, that
link |
01:04:28.240
deeply understand the customer and are oriented to serve the customer are more likely to succeed.
link |
01:04:36.320
With the proviso that I think all of us should only do things that we think
link |
01:04:39.920
create social good and move the world forward. So I personally don't want to build
link |
01:04:44.240
addictive digital products just to sell ads. There are things that could be lucrative that I
link |
01:04:49.680
won't do. But if we can find ways to serve people in meaningful ways, I think those can be great
link |
01:04:57.920
things to do, either in an academic setting or in a corporate setting or a startup setting.
link |
01:05:02.880
So can you give me the idea of why you started the AI fund?
link |
01:05:07.360
I remember when I was leading the AI group at Baidu, I had two jobs, two parts of my job. One was
link |
01:05:16.080
to build an AI engine to support the existing businesses, and that was running well, it just
link |
01:05:21.280
performed by itself. The second part of my job at the time was to try to systematically
link |
01:05:26.240
initiate new lines of businesses using the company's AI capabilities. So, you know,
link |
01:05:32.000
the self driving car team came out of my group, as did the smart speaker team, similar to
link |
01:05:38.800
Amazon Echo or Alexa in the US, but we actually announced it before Amazon did. So Baidu wasn't
link |
01:05:44.240
following Amazon. That came out of my group, and I found that to be actually the most fun part of
link |
01:05:52.000
my job. So what I wanted to do was to build AI Fund as a startup studio to systematically create new
link |
01:06:00.320
startups from scratch. With all of the things we can now do with AI, I think the ability to build new
link |
01:06:06.880
teams to go after this rich space of opportunities is a very important way, a very important mechanism
link |
01:06:13.520
to get these projects done that I think will move the world forward. So I've been fortunate to build
link |
01:06:18.560
a few teams that had a meaningful positive impact, and I felt that we might be able to do this in a
link |
01:06:25.280
more systematic, repeatable way. So a startup studio is a relatively new concept. There are
link |
01:06:31.920
maybe dozens of startup studios right now, but I feel like all of us, many teams are still trying
link |
01:06:39.920
to figure out how do you systematically build companies with a high success rate. So I think
link |
01:06:46.720
even a lot of my venture capital friends seem to be more and more building companies rather than
link |
01:06:52.080
investing in companies. But I find it a fascinating thing to do, to figure out the mechanisms by
link |
01:06:56.800
which we could systematically build successful teams, successful businesses in areas that we
link |
01:07:02.480
find meaningful. So a startup studio is a place and a mechanism for startups to go from zero to
link |
01:07:10.320
success. So try to develop a blueprint. It's actually a place for us to build startups from
link |
01:07:15.760
scratch. So we often bring in founders and work with them or maybe even have existing ideas
link |
01:07:23.760
that we match founders with. And then this launches, hopefully into successful companies.
link |
01:07:30.960
So how close are you to figuring out a way to automate the process of starting from scratch
link |
01:07:38.240
and building a successful AI startup? Yeah, I think we've been constantly improving and iterating on
link |
01:07:45.280
our processes, how we do that. So things like how many customer calls do we need to make in order
link |
01:07:51.520
to get customer validation? How do we make sure this technology can be built? Quite a lot of our
link |
01:07:55.840
businesses need cutting edge machine learning algorithms, the kind of algorithms that have been
link |
01:08:00.480
developed in the last one or two years. And even if it works in a research paper, it turns out taking
link |
01:08:05.600
it to production is really hard. There are a lot of issues with making these things work in real
link |
01:08:09.760
life that are not widely addressed in academia. So how do we validate that this is actually doable?
link |
01:08:17.040
How do we build a team, get specialized domain knowledge, be it in education or healthcare,
link |
01:08:21.360
whatever sector we're focusing on? So I think we've actually been getting much better at
link |
01:08:27.120
giving the entrepreneurs a high success rate. But I think the whole world is still in the
link |
01:08:33.840
early phases. But do you think there are some aspects of that process that are transferable
link |
01:08:40.000
from one startup to another, to another, to another? Yeah, very much so. Starting a company
link |
01:08:46.480
for most entrepreneurs, is a really lonely thing. And I've seen so many entrepreneurs not know how
link |
01:08:54.560
to make certain decisions, like when and how do you do B2B sales? If you don't know
link |
01:09:00.560
that, it's really hard. Or how do you market this efficiently other than buying ads, which is
link |
01:09:07.360
really expensive. Are there more efficient tactics than that? Or for a machine learning project,
link |
01:09:12.960
basic decisions can change the course of whether the machine learning product works or not. And so
link |
01:09:18.960
there are so many hundreds of decisions that entrepreneurs need to make. And making a mistake
link |
01:09:24.480
in a couple of key decisions can have a huge impact on the fate of the company.
link |
01:09:30.160
So I think a startup studio provides a support structure that makes starting a company much
link |
01:09:34.560
less of a lonely experience. And also, when faced with these key decisions, like trying to hire your
link |
01:09:42.000
first VP of engineering, what are good selection criteria? How do you decide, should I hire this
link |
01:09:47.600
person or not? By having our ecosystem around the entrepreneurs, the founders, to help, I think we
link |
01:09:55.520
help them at the key moments and hopefully make the journey significantly more enjoyable, with a
link |
01:10:01.040
higher success rate. So they have somebody to brainstorm with in these very difficult
link |
01:10:06.320
decision points. And also to help them recognize what they may not even realize is a key decision
link |
01:10:13.840
point. That's the first and probably the most important part, yeah. I can say one other thing.
link |
01:10:19.440
You know, I think building companies is one thing, but I feel like it's really important that we
link |
01:10:25.840
build companies that move the world forward. For example, within the AI Fund team, there was once an
link |
01:10:31.840
idea for a new company that if it had succeeded, would have resulted in people watching a lot
link |
01:10:38.160
more videos in a certain narrow vertical type of video. I looked at it, the business case was fine,
link |
01:10:44.160
the revenue case was fine, but I looked and I just said, I don't want to do this. I don't actually
link |
01:10:49.440
just want to have a lot more people watch this type of video. It wasn't educational; it was
link |
01:10:53.600
maybe even anti-educational. And so I killed the idea, on the basis that I didn't think it would actually
link |
01:10:59.600
help people. So whether building companies or working in enterprises or doing personal projects,
link |
01:11:05.200
I think it's up to each of us to figure out what's the difference we want to make in the world.
link |
01:11:11.520
With Landing AI, you help already established companies grow their AI and machine learning
link |
01:11:16.240
efforts. How does a large company integrate machine learning into their efforts?
link |
01:11:22.400
AI is a general purpose technology and I think it will transform every industry. Our community has
link |
01:11:29.040
already transformed to a large extent the software internet sector. Most software internet companies
link |
01:11:34.080
outside maybe the top five or six, or three or four, already have reasonable machine learning capabilities or are
link |
01:11:41.520
getting there. There's still room for improvement. But when I look outside the software internet sector,
link |
01:11:47.120
everything from manufacturing, agriculture, healthcare, logistics, transportation, there's
link |
01:11:51.760
so many opportunities that very few people are working on. So I think the next wave for AI
link |
01:11:57.520
is to also transform all of those other industries. There was a McKinsey study
link |
01:12:02.320
estimating $13 trillion of global economic growth. U.S. GDP is $19 trillion. So $13 trillion is a big
link |
01:12:10.240
number. Or PwC estimated $16 trillion. Whichever number it is, it's large. But the interesting thing
link |
01:12:16.400
to me was a lot of that impact will be outside the software internet sector. So we need more teams
link |
01:12:23.280
to work with these companies to help them adopt AI. And I think this is one of the things that
link |
01:12:28.560
help drive global economic growth and make humanity more powerful.
link |
01:12:33.040
And like you said, the impact is there. So what are the best industries, the biggest industries
link |
01:12:37.440
where AI can help perhaps outside the software tech sector?
link |
01:12:41.200
Frankly, I think it's all of them. Some of the ones I'm spending a lot of time on are manufacturing,
link |
01:12:48.320
agriculture, and looking into healthcare. For example, in manufacturing, we do a lot of work in visual
link |
01:12:55.120
inspection. Where today, there are people standing around using the human eye to check if, you know,
link |
01:13:01.120
this plastic part or the smartphone or this thing has a scratch or a dent or something in it.
link |
01:13:05.840
We can use a camera to take a picture, use deep learning and other things to check if it's
link |
01:13:13.520
defective or not, and that helps factories improve yield and improve quality and improve throughput.
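The inspection loop just described can be sketched end to end. Everything below is hypothetical: the "camera" is simulated, and the defect score is a simple brightness heuristic standing in for a trained deep learning model; only the overall pipeline shape (capture, score, route the part) reflects what is said here.

```python
import numpy as np

# A minimal sketch of a visual inspection loop: grab an image, score it,
# and route the part. The "model" is a stand-in brightness heuristic,
# not a real trained network; a deployed system would use a trained classifier.

rng = np.random.default_rng(1)

def capture(defective: bool) -> np.ndarray:
    """Simulate a 32x32 grayscale camera frame; defects add a bright scratch."""
    img = rng.normal(0.5, 0.05, (32, 32))
    if defective:
        img[16, :] = 1.0  # a horizontal scratch across the part
    return img

def defect_score(img: np.ndarray) -> float:
    """Stand-in for a trained model: how much the brightest row sticks out."""
    row_means = img.mean(axis=1)
    return float(row_means.max() - np.median(row_means))

THRESHOLD = 0.2  # in a real system, tuned on held-out labeled examples

def inspect(img: np.ndarray) -> str:
    return "reject" if defect_score(img) > THRESHOLD else "pass"

# Simulate a production run where every 4th part is defective.
results = [inspect(capture(defective=(i % 4 == 0))) for i in range(100)]
print(results.count("reject"))  # 25 of 100 parts carry the simulated scratch
```

As discussed later in the conversation, the interesting part in practice is what happens after a "reject": rework, double-check by a person, or discard, which is a workflow question, not a model question.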
link |
01:13:20.160
It turns out the practical problems we run into are very different than the ones you might read
link |
01:13:25.040
about in most research papers. The data sets are really small. So we face small data problems.
link |
01:13:29.440
You know, the factories keep on changing the environment. So it works well on your test set.
link |
01:13:35.440
But guess what? Something changes in the factory. The lights go on or off. Recently,
link |
01:13:41.120
there was a factory in which a bird flew through the factory and pooped on something.
link |
01:13:46.080
And so that changed stuff. So we work on improving our algorithms to be robust to all the changes
link |
01:13:52.720
that happen in the factory. I find that we run into a lot of practical problems that are not as widely
link |
01:13:58.320
discussed in academia. And it's really fun kind of being on the cutting edge solving these problems
link |
01:14:03.680
before, you know, maybe before many people are even aware that there is a problem there.
link |
01:14:07.840
And that's such a fascinating space. You're absolutely right. But what is the first step
link |
01:14:12.960
that a company should take? It's just a scary leap into this new world of going from the human eye
link |
01:14:19.120
inspecting to digitizing that process, having a camera, having an algorithm. What's the first step?
link |
01:14:25.440
Like, what's the early journey that you recommend that you see these companies taking?
link |
01:14:30.400
I published a document called the AI Transformation Playbook. It's online, and taught briefly in
link |
01:14:36.320
the AI for Everyone course on Coursera, about the long term journey that a company should take.
link |
01:14:41.840
But the first step is actually to start small. I've seen a lot more companies fail by starting
link |
01:14:47.760
too big than by starting too small. Take even Google. Most people don't realize how hard it was
link |
01:14:54.960
and how controversial it was in the early days. So when I started Google Brain, it was controversial.
link |
01:15:01.760
People thought deep learning, neural networks, had been tried and didn't work. Why would you want to do deep learning?
link |
01:15:07.120
So my first internal customer in Google was the Google
link |
01:15:13.040
Speech Team, which is not the most lucrative project in Google, not the most important.
link |
01:15:18.320
It's not web search or advertising. But by starting small, my team helped the speech team
link |
01:15:25.760
build a more accurate speech recognition system. And this caused their peers, other teams to start
link |
01:15:31.120
to have more faith in deep learning. My second internal customer was the Google Maps team,
link |
01:15:36.320
where we used computer vision to read house numbers from basic Street View images to more
link |
01:15:41.280
accurately locate houses in Google Maps to improve the quality of the geodata.
link |
01:15:45.760
And it was only after those two successes that I then started a more serious conversation with
link |
01:15:50.640
the Google Ads team. And so there's a ripple effect: you showed that it works in these
link |
01:15:56.000
cases, and it just propagates through the entire company that this
link |
01:15:59.840
thing has a lot of value and use for us. I think the early small scale projects,
link |
01:16:05.120
it helps the teams gain faith, but also helps the teams learn what these technologies do.
link |
01:16:11.520
I still remember our first GPU server; it was a server under some guy's desk,
link |
01:16:16.880
and, you know, that taught us early important lessons about how do you have multiple
link |
01:16:22.960
users share a set of GPUs, which is really not obvious at the time. But those early lessons
link |
01:16:28.480
were important. We learned a lot from that first GPU server that later helped the teams think through
link |
01:16:33.920
how to scale it up to much larger deployments. Are there concrete challenges that companies face
link |
01:16:40.160
that you see as important for them to solve? I think building and deploying machine learning
link |
01:16:45.680
systems is hard. There's a huge gap between something that works in a Jupyter notebook on
link |
01:16:50.720
your laptop versus something that runs in a production deployment setting in a factory
link |
01:16:55.440
or an agriculture plant or whatever. So I see a lot of people, you know, get something to work on
link |
01:17:00.560
your laptop and say, wow, look what I've done. And that's great. That's hard. That's a very
link |
01:17:04.400
important first step. But a lot of teams underestimate the rest of the steps needed.
link |
01:17:09.520
So for example, I've heard this exact same conversation between a lot of machine learning
link |
01:17:13.600
people and business people. The machine learning person says, look, my algorithm does well on the
link |
01:17:20.240
test set. And it's a clean test set. I didn't peek. And the business person says,
link |
01:17:25.440
thank you very much, but your algorithm sucks. It doesn't work. And the machine learning person
link |
01:17:30.240
says, no, wait, I did well on the test set. And I think there is a gulf between what it takes to
link |
01:17:38.480
do well on a test set on your hard drive versus what it takes to work well in a deployment setting.
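One way to see that gulf concretely: the same model can score well on an i.i.d. test set and much worse once the input distribution shifts. Below is a toy sketch, with a one-feature threshold classifier standing in for a real model, and a constant offset standing in for something like the changed lighting mentioned next; none of the numbers come from the conversation.

```python
import numpy as np

# Toy illustration of train/deployment distribution shift: one model,
# evaluated on an i.i.d. test set versus a shifted "deployment" distribution.
# The +2.0 shift is a stand-in for, e.g., brighter lighting in the factory.

rng = np.random.default_rng(2)

def make_data(n, shift=0.0):
    """Two classes separated along one feature; `shift` moves all inputs."""
    x0 = rng.normal(0.0, 1.0, n // 2) + shift
    x1 = rng.normal(3.0, 1.0, n // 2) + shift
    X = np.concatenate([x0, x1])
    y = np.array([0] * (n // 2) + [1] * (n // 2))
    return X, y

X_train, y_train = make_data(1000)

# "Train" the simplest possible model: a threshold at the midpoint of class means.
threshold = (X_train[y_train == 0].mean() + X_train[y_train == 1].mean()) / 2

def accuracy(X, y):
    return float(((X > threshold).astype(int) == y).mean())

X_test, y_test = make_data(1000)                    # same distribution as training
X_shifted, y_shifted = make_data(1000, shift=2.0)   # deployment: everything "brighter"

print(f"i.i.d. test accuracy:  {accuracy(X_test, y_test):.2f}")
print(f"shifted test accuracy: {accuracy(X_shifted, y_shifted):.2f}")
```

The clean test accuracy stays high while the shifted accuracy drops sharply, which is exactly the disagreement between the machine learning person and the business person described above: both are right about the metric they are looking at.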
link |
01:17:44.320
Some common problems, robustness and generalization, you know, you deploy something in a factory,
link |
01:17:50.480
maybe they chopped down a tree outside the factory so the tree no longer covers the
link |
01:17:54.960
window and the lighting is different. So the test set changes. And in machine learning,
link |
01:17:59.520
and especially in academia, we don't know how to deal with test set distributions that are
link |
01:18:04.080
dramatically different than the training set distribution. There's research on this, stuff like
link |
01:18:09.280
domain adaptation, transfer learning; you know, there are people working on it, but we're really
link |
01:18:14.240
not good at this. So how do you actually get this to work because your test set distribution is
link |
01:18:19.600
going to change? And I think also, if you look at the number of lines of code in the software system,
link |
01:18:27.040
the machine learning model is maybe 5% or even fewer relative to the entire software system
link |
01:18:34.240
you need to build. So how do you get all that work done and make it reliable and systematic?
link |
01:18:38.800
So good software engineering work is fundamental here to building a successful machine
link |
01:18:45.040
learning system? Yes. And the software system needs to interface with people's workflows. So
link |
01:18:51.280
machine learning is automation on steroids. If we take one task out of many tasks that are done
link |
01:18:56.560
in the factory, so the factory does lots of things. One task is visual inspection. If we
link |
01:19:01.120
automate that one task, it can be really valuable, but you may need to redesign a lot of other tasks
link |
01:19:06.080
around that one task. For example, say the machine learning algorithm says this is defective. What
link |
01:19:11.120
are you supposed to do? Do you throw it away? Do you get a human to double check? Do you want to
link |
01:19:14.960
rework it or fix it? So you need to redesign a lot of tasks around that thing you've now automated.
link |
01:19:20.720
So planning for the change management and making sure that the software you write is consistent
link |
01:19:26.320
with the new workflow. And you take the time to explain to people what needs to happen. So I think
link |
01:19:31.280
what Landing AI has become good at, and I think we learned by making mistakes and through
link |
01:19:37.920
painful experiences, what we've become good at is working with partners to think through
link |
01:19:45.120
all the things beyond just the machine learning model that you put in a notebook,
link |
01:19:48.960
but to build the entire system, manage the change process, and figure out how to deploy
link |
01:19:54.080
this in the way that has an actual impact. The processes that the large software tech companies
link |
01:19:59.440
use for deploying don't work for a lot of other scenarios. For example, when I was leading large
link |
01:20:05.360
speech teams, if the speech recognition system goes down, what happens? Well, an alarm goes off,
link |
01:20:10.960
and then someone like me would say, hey, you 20 engineers, please fix this. But if you have a
link |
01:20:17.760
system go down in the factory, there are not 20 machine learning engineers sitting around,
link |
01:20:21.920
whom you can page on duty and have them fix it. So how do you deal with the maintenance or the
link |
01:20:26.800
DevOps or the MLOps or the other aspects of this? So these are concepts that I think Landing AI and
link |
01:20:34.400
a few other teams are on the cutting edge of, but we don't even have systematic terminology yet to
link |
01:20:39.680
describe some of the stuff we do, because I think we're inventing it on the fly.
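The maintenance problem Andrew raises, that a factory has no on-call ML team, implies the deployed system has to detect its own degradation and fall back safely. A minimal sketch under that assumption (all class and parameter names here are hypothetical):

```python
# Hypothetical sketch: with no 20 engineers to page, the deployed system
# itself monitors recent model confidence and flags degradation (e.g. the
# lighting at the factory changed), so parts can be routed to manual
# inspection instead. Names and thresholds are illustrative assumptions.

from collections import deque

class DriftMonitor:
    """Track recent model confidence and flag degradation."""

    def __init__(self, window: int = 100, min_avg_confidence: float = 0.7):
        self.scores = deque(maxlen=window)
        self.min_avg_confidence = min_avg_confidence

    def record(self, confidence: float) -> None:
        self.scores.append(confidence)

    def healthy(self) -> bool:
        # Until the window fills, assume healthy.
        if len(self.scores) < self.scores.maxlen:
            return True
        return sum(self.scores) / len(self.scores) >= self.min_avg_confidence

monitor = DriftMonitor(window=5, min_avg_confidence=0.7)
for c in (0.9, 0.85, 0.4, 0.3, 0.2):  # confidence drops mid-stream
    monitor.record(c)
print(monitor.healthy())  # False: fall back to manual inspection
```

This is one small piece of what a systematic MLOps vocabulary would eventually name: monitoring, fallback, and escalation paths designed in before deployment rather than improvised after an outage.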
link |
01:20:44.640
So you mentioned some people are interested in discovering mathematical beauty and truth in
link |
01:20:48.800
the universe, and you're interested in having a big positive impact in the world. So let me ask
link |
01:20:55.600
you. The two are not inconsistent. No, not at all. I'm only half joking because
link |
01:21:01.040
you're probably interested a little bit in both. But let me ask a romanticized question. So much
link |
01:21:06.720
of the work, your work and our discussion today has been on applied AI. Maybe you can even call
link |
01:21:13.040
narrow AI, where the goal is to create systems that automate some specific process that adds a
link |
01:21:17.840
lot of value to the world. But there's another branch of AI starting with Alan Turing that kind
link |
01:21:23.120
of dreams of creating human level or superhuman level intelligence. Is this something you dream
link |
01:21:29.680
of as well? Do you think we human beings will ever build a human level intelligence or superhuman
link |
01:21:35.200
level intelligence system? I would love to get to AGI, and I think humanity will. But whether it
link |
01:21:41.520
takes 100 years or 500 or 5000, I find hard to estimate. Some folks have worries about the
link |
01:21:52.160
different trajectories that path would take, even existential threats of an AGI system. Do you have
link |
01:21:57.920
such concerns, whether in the short term or the long term? I do worry about the long term fate
link |
01:22:04.880
of humanity. I do wonder as well. I do worry about overpopulation on the planet Mars, just not
link |
01:22:13.760
today. I think there will be a day when maybe someday in the future, Mars will be polluted,
link |
01:22:20.000
there are all these children dying, and someone will look back at this video and say,
link |
01:22:23.200
Andrew, how is Andrew so heartless? He didn't care about all these children dying on the planet
link |
01:22:27.280
Mars. And I apologize to the future viewer. I do care about the children, but I just don't know
link |
01:22:32.800
how to productively work on that today. Your picture will be in the dictionary for the people
link |
01:22:38.400
who are ignorant about the overpopulation on Mars. Yes. So it's a long term problem. Is there
link |
01:22:44.720
something in the short term we should be thinking about in terms of aligning the values of our AGI
link |
01:22:50.000
systems with the values of us humans? Something that Stuart Russell and other folks are thinking
link |
01:22:57.440
about as this system develops more and more, we want to make sure that it represents the
link |
01:23:03.440
better angels of our nature, the ethics, the values of our society.
link |
01:23:09.440
You know, if you take self-driving cars, the biggest problem with self-driving cars is not
link |
01:23:14.640
that there's some trolley dilemma that you need to teach it. So, you know, how many times when you are
link |
01:23:20.640
driving your car, did you face this moral dilemma of who do I crash into? So I think
link |
01:23:26.240
the self-driving cars will run into that problem roughly as often as we do when we drive our cars.
link |
01:23:32.080
The biggest problem with self-driving cars is when there's a big white truck across the road
link |
01:23:35.760
and what you should do is brake and not crash into it, and the self-driving car fails and it
link |
01:23:40.320
crashes into it. So I think we need to solve that problem first. I think the problem with some of
link |
01:23:44.640
these discussions about AGI, you know, alignment, the paperclip problem, is that it's a huge distraction
link |
01:23:54.320
from the much harder problems that we actually need to address today. Some of the hard problems
link |
01:23:59.520
we need to address today: I think bias is a huge issue. I worry about wealth inequality.
link |
01:24:06.640
AI and internet are causing an acceleration of concentration of power because we can now
link |
01:24:12.240
centralize data, use AI to process it, and so this will affect industry after industry.
link |
01:24:17.520
So the internet industry has a lot of winner-take-most or winner-take-all dynamics,
link |
01:24:22.000
but it's infecting all these other industries, also giving these other industries winner-take-most
link |
01:24:26.400
or winner-take-all flavors. So look at what Uber and Lyft did to the taxi industry.
link |
01:24:32.480
So we'll see more of this type of thing. So we're creating tremendous wealth,
link |
01:24:36.400
but how do we make sure that the wealth is fairly shared? And how do we help people
link |
01:24:43.360
whose jobs are displaced? You know, I think education is part of it. There may be even more
link |
01:24:48.400
that we need to do than education. I think bias is a serious issue. There are adversarial
link |
01:24:56.080
uses of AI, like deepfakes being used for various nefarious purposes. So I worry about
link |
01:25:03.600
some teams, maybe accidentally, and I hope not deliberately, making a lot of noise about
link |
01:25:10.000
problems in the distant future rather than focusing on some of the much
link |
01:25:14.960
harder problems. Yeah, they overshadow the problems that we have already today that are exceptionally
link |
01:25:19.760
challenging. Like those you said, and even the silly ones, but the ones that have a huge impact,
link |
01:25:24.480
which is the lighting variation outside of your factory window that ultimately is what makes
link |
01:25:30.880
the difference between, like you said, the Jupyter notebook and something that actually
link |
01:25:34.720
transforms an entire industry potentially. Yeah. And I think if some company
link |
01:25:40.560
or regulator comes to you and says, look, your product is messing things up. Fixing it may have
link |
01:25:46.640
a revenue impact. Well, it's much more fun to talk to them about how you promise not to wipe out
link |
01:25:51.120
humanity than to face the actually really hard problems we face. So your life has been a great
link |
01:25:57.120
journey from teaching to research to entrepreneurship. Two questions. One, are there regrets, moments
link |
01:26:04.480
that if you went back, you would do differently? And two, are there moments you're especially proud
link |
01:26:10.000
of moments that made you truly happy? You know, I've made so many mistakes. It feels like every
link |
01:26:18.160
time I discover something, I go, why didn't I think of this, you know, five years earlier or even
link |
01:26:25.920
10 years earlier? And then sometimes I read a book and I go, I wish I read
link |
01:26:33.600
this book 10 years ago, my life would have been so different. Actually, that happened recently.
link |
01:26:37.600
And then I was thinking, if only I had read this book when we were starting Coursera,
link |
01:26:41.600
it could have been so much better. But I discovered the book had not yet been written
link |
01:26:45.680
when we were starting Coursera. So that made me feel better. But I find that the process of discovery,
link |
01:26:53.120
we keep on finding out things that seem so obvious in hindsight. But it always takes us so
link |
01:26:58.720
much longer than I wish to figure it out. So on the second question, are there moments in your
link |
01:27:07.280
life that if you look back that you're especially proud of or you're especially happy that filled
link |
01:27:16.000
you with happiness and fulfillment? Well, two answers. One, that's my daughter Nova.
link |
01:27:21.360
Yes, of course. No matter how much time I spend with her, I just can't spend enough time with her.
link |
01:27:25.360
Congratulations, by the way. Thank you. And then second is helping other people. I think to me,
link |
01:27:30.480
I think the meaning of life is helping others achieve whatever are their dreams.
link |
01:27:36.080
And then also, to try to move the world forward by making humanity more powerful as a whole.
link |
01:27:42.480
So the times that I felt most happy and most proud were when I felt someone else
link |
01:27:50.080
allowed me the good fortune of helping them a little bit on the path to their dreams.
link |
01:27:56.080
I think there's no better way to end it than talking about happiness and the meaning of life.
link |
01:28:00.160
So it's true. It's a huge honor. Me and millions of people thank you for all the work you've done.
link |
01:28:05.040
Thank you for talking to me today. Thank you so much. Thanks.
link |
01:28:08.720
Thanks for listening to this conversation with Andrew Ng and thank you to our presenting
link |
01:28:13.120
sponsor, Cash App. Download it, use code LEX Podcast. You'll get $10 and $10 will go to FIRST,
link |
01:28:20.080
an organization that inspires and educates young minds to become science and technology
link |
01:28:24.720
innovators of tomorrow. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple
link |
01:28:30.560
Podcasts, support on Patreon, or simply connect with me on Twitter at Lex Fridman.
link |
01:28:36.960
And now let me leave you with some words of wisdom from Andrew Ng.
link |
01:28:41.120
Ask yourself if what you're working on succeeds beyond your wildest dreams,
link |
01:28:46.160
would you have significantly helped other people? If not, then keep searching for something else to
link |
01:28:52.160
work on. Otherwise, you're not living up to your full potential. Thank you for listening
link |
01:28:58.800
and hope to see you next time.