
Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35



link |
00:00:00.000
The following is a conversation with Jeremy Howard.
link |
00:00:03.120
He's the founder of FastAI, a research institute dedicated
link |
00:00:07.040
to making deep learning more accessible.
link |
00:00:09.720
He's also a distinguished research scientist
link |
00:00:12.560
at the University of San Francisco,
link |
00:00:14.600
a former president of Kaggle,
link |
00:00:16.640
as well as a top ranking competitor there.
link |
00:00:18.760
And in general, he's a successful entrepreneur,
link |
00:00:21.680
educator, researcher, and an inspiring personality
link |
00:00:25.200
in the AI community.
link |
00:00:27.000
When someone asks me, how do I get started with deep learning?
link |
00:00:30.200
FastAI is one of the top places that I point them to.
link |
00:00:33.320
It's free, it's easy to get started,
link |
00:00:35.480
it's insightful and accessible,
link |
00:00:37.600
and if I may say so, it has very little BS
link |
00:00:40.960
that can sometimes dilute the value of educational content
link |
00:00:44.120
on popular topics like deep learning.
link |
00:00:46.720
FastAI has a focus on practical application of deep learning
link |
00:00:50.280
and hands on exploration of the cutting edge
link |
00:00:52.800
that is both incredibly accessible to beginners
link |
00:00:56.000
and useful to experts.
link |
00:00:57.960
This is the Artificial Intelligence Podcast.
link |
00:01:01.320
If you enjoy it, subscribe on YouTube,
link |
00:01:03.800
give it five stars on iTunes,
link |
00:01:05.480
support it on Patreon,
link |
00:01:06.920
or simply connect with me on Twitter
link |
00:01:09.040
at Lex Fridman, spelled F R I D M A N.
link |
00:01:13.280
And now, here's my conversation with Jeremy Howard.
link |
00:01:18.520
What's the first program you've ever written?
link |
00:01:21.680
First program I wrote that I remember
link |
00:01:24.760
would be at high school.
link |
00:01:29.200
I did an assignment where I decided
link |
00:01:31.200
to try to find out if there were some better musical scales
link |
00:01:36.200
than the normal 12 tone, 12 interval scale.
link |
00:01:40.600
So I wrote a program on my Commodore 64 in basic
link |
00:01:43.640
that searched through other scale sizes
link |
00:01:46.000
to see if it could find one
link |
00:01:47.240
where there were more accurate harmonies.
link |
00:01:51.880
Like mid tone?
link |
00:01:53.520
Like you want an actual exactly three to two ratio
link |
00:01:56.520
or else with a 12 interval scale,
link |
00:01:59.400
it's not exactly three to two, for example.
link |
00:02:01.480
So that's well tempered, as they say.
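[A rough sketch of the kind of search described here, written in Python rather than Commodore 64 BASIC; the scale range and the single 3:2 "fifth" check are illustrative choices, not the original program.]

```python
import math

def best_fifth_error(n_intervals):
    """Smallest error, in cents, between any step of an n-interval
    equal-tempered scale and a pure 3:2 (just fifth) ratio."""
    target = math.log2(3 / 2)  # a just fifth, measured as a fraction of an octave
    best = min(abs(k / n_intervals - target) for k in range(1, n_intervals + 1))
    return best * 1200  # octaves -> cents

for n in range(5, 54):
    print(n, round(best_fifth_error(n), 2))
# 12 intervals misses a pure fifth by about 2 cents; some larger scales,
# such as 41 or 53 intervals, come much closer.
```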
link |
00:02:05.040
And basic on a Commodore 64.
link |
00:02:07.680
Where was the interest in music from?
link |
00:02:09.440
Or is it just technical?
link |
00:02:10.440
I did music all my life.
link |
00:02:12.000
So I played saxophone and clarinet and piano
link |
00:02:15.360
and guitar and drums and whatever.
link |
00:02:18.120
How does that thread go through your life?
link |
00:02:22.120
Where's music today?
link |
00:02:24.200
It's not where I wish it was.
link |
00:02:28.360
For various reasons, couldn't really keep it going,
link |
00:02:30.200
particularly because I had a lot of problems
link |
00:02:31.640
with RSI with my fingers.
link |
00:02:33.520
And so I had to kind of like cut back anything
link |
00:02:35.560
that used hands and fingers.
link |
00:02:39.400
I hope one day I'll be able to get back to it health wise.
link |
00:02:43.920
So there's a love for music underlying it all.
link |
00:02:46.400
Yeah.
link |
00:02:47.840
What's your favorite instrument?
link |
00:02:49.520
Saxophone.
link |
00:02:50.360
Sax.
link |
00:02:51.200
Or baritone saxophone.
link |
00:02:52.880
Well, probably bass saxophone, but they're awkward.
link |
00:02:57.480
Well, I always love it when music
link |
00:03:00.040
is coupled with programming.
link |
00:03:01.720
There's something about a brain that utilizes those
link |
00:03:04.680
that emerges with creative ideas.
link |
00:03:07.560
So you've used and studied quite a few programming languages.
link |
00:03:11.240
Can you give an overview of what you've used?
link |
00:03:15.160
What are the pros and cons of each?
link |
00:03:17.880
Well, my favorite programming environment,
link |
00:03:20.080
well, most certainly was Microsoft Access
link |
00:03:24.560
back in like the earliest days.
link |
00:03:26.440
So that was Visual Basic for applications,
link |
00:03:28.880
which is not a good programming language,
link |
00:03:30.680
but the programming environment was fantastic.
link |
00:03:33.040
It's like the ability to create, you know,
link |
00:03:38.000
user interfaces and tie data and actions to them
link |
00:03:41.200
and create reports and all that
link |
00:03:43.680
I've never seen anything as good.
link |
00:03:46.760
There's things nowadays like Airtable,
link |
00:03:48.560
which are like small subsets of that,
link |
00:03:54.240
which people love for good reason,
link |
00:03:56.160
but unfortunately, nobody's ever achieved
link |
00:04:00.080
anything like that.
link |
00:04:01.080
What is that?
link |
00:04:01.920
If you could pause on that for a second.
link |
00:04:03.280
Oh, Access?
link |
00:04:04.120
Is it a database?
link |
00:04:06.280
It was a database program that Microsoft produced,
link |
00:04:09.600
part of Office, and they kind of withered, you know,
link |
00:04:13.400
but basically it lets you in a totally graphical way
link |
00:04:16.240
create tables and relationships and queries
link |
00:04:18.440
and tie them to forms and set up, you know,
link |
00:04:21.800
event handlers and calculations.
link |
00:04:24.680
And it was a very complete powerful system
link |
00:04:28.120
designed for not massive scalable things,
link |
00:04:31.480
but for like useful little applications that I loved.
link |
00:04:36.360
So what's the connection between Excel and Access?
link |
00:04:40.240
So very close.
link |
00:04:42.120
So Access kind of was the relational database equivalent,
link |
00:04:47.120
if you like.
link |
00:04:47.960
So people still do a lot of that stuff
link |
00:04:50.600
in Excel that really should be in Access, you know.
link |
00:04:53.600
Excel's great as well.
link |
00:04:54.840
So, but it's just not as rich a programming model
link |
00:04:59.680
as VBA combined with a relational database.
link |
00:05:04.080
And so I've always loved relational databases,
link |
00:05:06.800
but today programming on top of relational database
link |
00:05:10.480
is just a lot more of a headache.
link |
00:05:13.040
You know, you generally either need to kind of,
link |
00:05:15.680
you know, you need something that connects,
link |
00:05:18.240
that runs some kind of database server
link |
00:05:19.920
unless you use SQLite, which has its own issues.
link |
00:05:25.000
Then you kind of often,
link |
00:05:25.920
if you want to get a nice programming model,
link |
00:05:27.600
you'll need to, like, add an ORM on top.
link |
00:05:30.400
And then, I don't know,
link |
00:05:31.960
there's all these pieces to tie together
link |
00:05:34.360
and it's just a lot more awkward than it should be.
link |
00:05:37.000
There are people that are trying to make it easier.
link |
00:05:39.200
So in particular, I think of F sharp, you know, Don Syme,
link |
00:05:42.440
who, him and his team have done a great job
link |
00:05:45.760
of making something like a database appear
link |
00:05:50.520
in the type system.
link |
00:05:51.640
So you actually get like tab completion for fields
link |
00:05:54.200
and tables and stuff like that.
link |
00:05:57.800
Anyway, so that was kind of, anyway,
link |
00:05:59.280
so like that whole VBA office thing, I guess,
link |
00:06:01.880
was a starting point, which I still miss.
link |
00:06:04.640
And I got into standard Visual Basic, which...
link |
00:06:07.800
That's interesting, just to pause on that for a second.
link |
00:06:09.880
It's interesting that you're connecting programming languages
link |
00:06:13.520
to the ease of management of data.
link |
00:06:17.440
Yeah.
link |
00:06:18.280
So in your use of programming languages,
link |
00:06:20.600
you always had a love and a connection with data.
link |
00:06:24.880
I've always been interested in doing useful things
link |
00:06:28.000
for myself and for others,
link |
00:06:29.480
which generally means getting some data
link |
00:06:31.920
and doing something with it and putting it out there again.
link |
00:06:34.600
So that's been my interest throughout.
link |
00:06:38.440
So I also did a lot of stuff with AppleScript
link |
00:06:41.600
back in the early days.
link |
00:06:43.880
So it's kind of nice being able to get the computer
link |
00:06:48.000
and computers to talk to each other
link |
00:06:50.160
and to do things for you.
link |
00:06:52.960
And then I think that one,
link |
00:06:54.640
the programming language I most loved then
link |
00:06:58.240
would have been Delphi, which was Object Pascal,
link |
00:07:02.920
created by Anders Hejlsberg,
link |
00:07:04.880
who previously did Turbo Pascal
link |
00:07:07.480
and then went on to create .NET
link |
00:07:08.840
and then went on to create TypeScript.
link |
00:07:11.080
Delphi was amazing because it was like a compiled,
link |
00:07:14.880
fast language that was as easy to use as Visual Basic.
link |
00:07:20.200
Delphi, what is it similar to in more modern languages?
link |
00:07:27.520
Visual Basic.
link |
00:07:28.880
Visual Basic.
link |
00:07:29.720
Yeah, but a compiled, fast version.
link |
00:07:32.320
So I'm not sure there's anything quite like it anymore.
link |
00:07:37.080
If you took like C Sharp or Java
link |
00:07:40.680
and got rid of the virtual machine
link |
00:07:42.520
and replaced it with something,
link |
00:07:43.440
you could compile a small, tight binary.
link |
00:07:46.560
I feel like it's where Swift could get to
link |
00:07:50.720
with the new SwiftUI
link |
00:07:52.640
and the cross platform development going on.
link |
00:07:56.520
Like that's one of my dreams
link |
00:07:59.360
is that we'll hopefully get back to where Delphi was.
link |
00:08:02.840
There is actually a free Pascal project nowadays
link |
00:08:08.520
called Lazarus,
link |
00:08:09.360
which is also attempting to kind of recreate Delphi.
link |
00:08:13.400
So they're making good progress.
link |
00:08:16.080
So, okay, Delphi,
link |
00:08:18.560
that's one of your favorite programming languages.
link |
00:08:20.960
Well, it's programming environments.
link |
00:08:22.360
Again, I'd say Pascal's not a nice language.
link |
00:08:26.280
If you wanted to know specifically
link |
00:08:27.880
about what languages I like,
link |
00:08:29.640
I would definitely pick J as being an amazingly wonderful
link |
00:08:33.600
language.
link |
00:08:35.480
What's J?
link |
00:08:37.040
J, are you aware of APL?
link |
00:08:39.640
I am not, except from doing a little research
link |
00:08:42.440
on the work you've done.
link |
00:08:44.080
Okay, so not at all surprising you're not familiar with it
link |
00:08:48.000
because it's not well known,
link |
00:08:49.000
but it's actually one of the main families
link |
00:08:54.880
of programming languages going back to the late 50s,
link |
00:08:57.080
early 60s.
link |
00:08:57.920
So there was a couple of major directions.
link |
00:09:01.640
One was the kind of Lambda Calculus Alonzo Church direction,
link |
00:09:06.120
which I guess kind of lisp and scheme and whatever,
link |
00:09:09.960
which has a history going back
link |
00:09:12.280
to the early days of computing.
link |
00:09:13.360
The second was the kind of imperative slash OO,
link |
00:09:18.680
Algol, Simula, going on to C, C++ and so forth.
link |
00:09:23.160
There was a third,
link |
00:09:24.000
which are called array oriented languages,
link |
00:09:26.920
which started with a paper by a guy called Ken Iverson,
link |
00:09:31.480
which was actually a math theory paper,
link |
00:09:35.160
not a programming paper.
link |
00:09:37.480
It was called Notation as a Tool of Thought.
link |
00:09:41.440
And it was the development of a new way,
link |
00:09:43.480
a new type of math notation.
link |
00:09:45.280
And the idea is that this math notation
link |
00:09:47.560
was much more flexible, expressive,
link |
00:09:51.320
and also well defined than traditional math notation,
link |
00:09:55.280
which is none of those things.
link |
00:09:56.440
Math notation is awful.
link |
00:09:59.200
And so he actually turned that into a programming language
link |
00:10:02.280
and because this was the early 50s or, sorry, late 50s,
link |
00:10:05.640
all the names were available.
link |
00:10:06.760
So he called his language a programming language or APL.
link |
00:10:10.560
APL.
link |
00:10:11.400
So APL is an implementation of notation
link |
00:10:15.360
as a tool of thought, by which he means math notation.
link |
00:10:18.320
And Ken and his son went on to do many things,
link |
00:10:22.880
but eventually they actually produced a new language
link |
00:10:26.600
that was built on top of all the learnings of APL.
link |
00:10:28.440
And that was called J.
link |
00:10:30.600
And J is the most expressive, composable language
link |
00:10:39.360
and beautifully designed language I've ever seen.
link |
00:10:42.440
Does it have object oriented components?
link |
00:10:44.560
Does it have that kind of thing?
link |
00:10:45.560
Not really, it's an array oriented language.
link |
00:10:47.720
It's the third path.
link |
00:10:51.440
Are you saying array?
link |
00:10:52.800
Array oriented, yeah.
link |
00:10:53.960
What does it mean to be array oriented?
link |
00:10:55.520
So array oriented means that you generally
link |
00:10:57.520
don't use any loops,
link |
00:10:59.560
but the whole thing is done with kind of
link |
00:11:02.400
an extreme version of broadcasting,
link |
00:11:06.360
if you're familiar with that NumPy slash Python concept.
link |
00:11:09.920
So you do a lot with one line of code.
link |
00:11:14.640
It looks a lot like math notation, highly compact.
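[A loose NumPy sketch of what "array oriented" means in practice: the computation is written over whole arrays, and broadcasting replaces the explicit loops. The product and order numbers are made up for illustration.]

```python
import numpy as np

prices = np.array([3.0, 5.0, 7.5])           # one price per product
quantities = np.array([[10, 2, 1],            # rows are orders, columns are products
                       [ 0, 4, 8]])

# Broadcasting stretches `prices` across every row of `quantities`,
# so one line replaces a doubly nested loop over orders and products.
revenue_per_order = (prices * quantities).sum(axis=1)
print(revenue_per_order)  # [47.5  80. ]
```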
link |
00:11:19.640
And the idea is that you can kind of,
link |
00:11:22.880
because you can do so much with one line of code,
link |
00:11:24.760
a single screen of code is very unlikely to,
link |
00:11:27.760
you very rarely need more than that
link |
00:11:29.560
to express your program.
link |
00:11:31.120
And so you can kind of keep it all in your head
link |
00:11:33.320
and you can kind of clearly communicate it.
link |
00:11:36.080
It's interesting that APL created two main branches,
link |
00:11:40.000
K and J.
link |
00:11:41.640
J is this kind of like open source,
link |
00:11:44.560
niche community of crazy enthusiasts like me.
link |
00:11:49.440
And then the other path, K, was fascinating.
link |
00:11:52.160
It's an astonishingly expensive programming language,
link |
00:11:56.640
which many of the world's
link |
00:11:58.520
most ludicrously rich hedge funds use.
link |
00:12:02.920
So the entire K machine is so small
link |
00:12:06.680
it sits inside level three cache on your CPU.
link |
00:12:09.360
And it easily wins every benchmark I've ever seen
link |
00:12:14.120
in terms of data processing speed.
link |
00:12:16.760
But you don't come across it very much
link |
00:12:17.920
because it's like $100,000 per CPU to run it.
link |
00:12:22.760
It's like this path of programming languages
link |
00:12:26.280
is just so much, I don't know,
link |
00:12:28.920
so much more powerful in every way
link |
00:12:30.360
than the ones that almost anybody uses every day.
link |
00:12:33.920
So it's all about computation.
link |
00:12:37.520
It's really focused on computation.
link |
00:12:38.360
It's pretty heavily focused on computation.
link |
00:12:40.640
I mean, so much of programming
link |
00:12:43.200
is data processing by definition.
link |
00:12:45.640
So there's a lot of things you can do with it.
link |
00:12:48.960
But yeah, there's not much work being done
link |
00:12:51.440
on making like user interface toolkits or whatever.
link |
00:12:57.000
I mean, there's some, but they're not great.
link |
00:12:59.320
At the same time, you've done a lot of stuff
link |
00:13:00.880
with Perl and Python.
link |
00:13:03.120
So where does that fit into the picture of J and K and APL?
link |
00:13:08.840
Well, it's just much more pragmatic.
link |
00:13:11.000
Like in the end, you kind of have to end up
link |
00:13:13.880
where the libraries are, you know?
link |
00:13:17.760
Like, cause to me, my focus is on productivity.
link |
00:13:21.240
I just want to get stuff done and solve problems.
link |
00:13:23.680
So Perl was great.
link |
00:13:27.280
I created an email company called FastMail
link |
00:13:29.680
and Perl was great cause back in the late nineties,
link |
00:13:32.840
early two thousands, it just had a lot of stuff it could do.
link |
00:13:38.080
I still had to write my own monitoring system
link |
00:13:41.760
and my own web framework, my own whatever,
link |
00:13:43.800
cause like none of that stuff existed.
link |
00:13:45.720
But it was a super flexible language to do that in.
link |
00:13:50.280
And you used Perl for FastMail, you used it as a backend?
link |
00:13:54.240
Like so everything was written in Perl?
link |
00:13:55.760
Yeah, yeah, everything, everything was Perl.
link |
00:13:58.720
Why do you think Perl hasn't succeeded
link |
00:14:02.920
or hasn't dominated the market where Python
link |
00:14:05.960
really takes over a lot of the tasks?
link |
00:14:07.560
Well, I mean, Perl did dominate.
link |
00:14:09.600
It was everything, everywhere,
link |
00:14:13.080
but then the guy that ran Perl, Larry Wall,
link |
00:14:17.120
kind of just didn't put the time in anymore.
link |
00:14:22.320
And no project can be successful if there isn't,
link |
00:14:28.040
you know, particularly one that started with a strong leader
link |
00:14:31.600
that loses that strong leadership.
link |
00:14:35.040
So then Python has kind of replaced it.
link |
00:14:37.840
You know, Python is a lot less elegant language
link |
00:14:43.400
in nearly every way,
link |
00:14:45.040
but it has the data science libraries
link |
00:14:48.880
and a lot of them are pretty great.
link |
00:14:51.280
So I kind of use it
link |
00:14:56.240
cause it's the best we have,
link |
00:14:58.280
but it's definitely not good enough.
link |
00:15:01.800
But what do you think the future of programming looks like?
link |
00:15:04.080
What do you hope the future of programming looks like
link |
00:15:06.560
if we zoom in on the computational fields,
link |
00:15:08.760
on data science, on machine learning?
link |
00:15:11.840
I hope Swift is successful
link |
00:15:15.440
because the goal of Swift,
link |
00:15:19.440
the way Chris Lattner describes it,
link |
00:15:21.040
is to be infinitely hackable.
link |
00:15:22.640
And that's what I want.
link |
00:15:23.480
I want something where me and the people I do research with
link |
00:15:26.920
and my students can look at
link |
00:15:29.480
and change everything from top to bottom.
link |
00:15:32.000
There's nothing mysterious and magical and inaccessible.
link |
00:15:36.240
Unfortunately with Python, it's the opposite of that
link |
00:15:38.600
because Python is so slow.
link |
00:15:40.800
It's extremely unhackable.
link |
00:15:42.640
You get to a point where it's like,
link |
00:15:43.840
okay, from here on down it's C.
link |
00:15:45.360
So your debugger doesn't work in the same way.
link |
00:15:47.280
Your profiler doesn't work in the same way.
link |
00:15:48.920
Your build system doesn't work in the same way.
link |
00:15:50.760
It's really not very hackable at all.
link |
00:15:53.760
What's the part you'd like to be hackable?
link |
00:15:55.600
Is it for the objective of optimizing training
link |
00:16:00.120
of neural networks, inference of neural networks?
link |
00:16:02.560
Is it performance of the system
link |
00:16:04.320
or is there some non performance related, just?
link |
00:16:07.840
It's everything.
link |
00:16:09.000
I mean, in the end, I want to be productive
link |
00:16:11.280
as a practitioner.
link |
00:16:13.840
So that means that, so like at the moment,
link |
00:16:16.280
our understanding of deep learning is incredibly primitive.
link |
00:16:20.000
There's very little we understand.
link |
00:16:21.440
Most things don't work very well,
link |
00:16:23.200
even though it works better than anything else out there.
link |
00:16:26.120
There's so many opportunities to make it better.
link |
00:16:28.600
So you look at any domain area,
link |
00:16:31.280
like, I don't know, speech recognition with deep learning
link |
00:16:35.720
or natural language processing classification
link |
00:16:38.360
with deep learning or whatever.
link |
00:16:39.400
Every time I look at an area with deep learning,
link |
00:16:41.920
I always see like, oh, it's terrible.
link |
00:16:44.440
There's lots and lots of obviously stupid ways
link |
00:16:47.480
to do things that need to be fixed.
link |
00:16:50.160
So then I want to be able to jump in there
link |
00:16:51.600
and quickly experiment and make them better.
link |
00:16:54.840
You think the programming language has a role in that?
link |
00:16:59.240
Huge role, yeah.
link |
00:17:00.240
So currently, Python has a big gap
link |
00:17:05.960
in terms of our ability to innovate,
link |
00:17:09.240
particularly around recurrent neural networks
link |
00:17:11.800
and natural language processing.
link |
00:17:14.880
Because it's so slow, the actual loop
link |
00:17:18.240
where we actually loop through words,
link |
00:17:20.160
we have to do that whole thing in CUDA C.
link |
00:17:23.720
So we actually can't innovate with the kernel,
link |
00:17:27.080
the heart of that most important algorithm.
link |
00:17:31.520
And it's just a huge problem.
link |
00:17:33.640
And this happens all over the place.
link |
00:17:36.440
So we hit research limitations.
link |
00:17:40.040
Another example, convolutional neural networks,
link |
00:17:42.600
which are actually the most popular architecture
link |
00:17:44.720
for lots of things, maybe most things in deep learning.
link |
00:17:48.880
We almost certainly should be using
link |
00:17:50.280
sparse convolutional neural networks,
link |
00:17:52.880
but only like two people are,
link |
00:17:55.360
because to do it, you have to rewrite
link |
00:17:57.800
all of that CUDA C level stuff.
link |
00:17:59.880
And yeah, just researchers and practitioners don't.
link |
00:18:04.480
So there's just big gaps in what people actually research on,
link |
00:18:09.200
what people actually implement
link |
00:18:10.520
because of the programming language problem.
link |
00:18:13.200
So you think it's just too difficult to write in CUDA C
link |
00:18:20.600
that a higher level programming language like Swift
link |
00:18:24.480
should enable the easier,
link |
00:18:30.480
fooling around creative stuff with RNNs
link |
00:18:33.080
or with sparse convolutional neural networks?
link |
00:18:34.840
Kind of.
link |
00:18:35.680
Who's at fault?
link |
00:18:37.680
Who's in charge of making it easy
link |
00:18:41.000
for a researcher to play around?
link |
00:18:42.240
I mean, no one's at fault,
link |
00:18:43.480
just nobody's got around to it yet,
link |
00:18:45.040
or it's just, it's hard, right?
link |
00:18:46.960
And I mean, part of the fault is that we ignored
link |
00:18:49.280
that whole APL kind of direction.
link |
00:18:53.000
Nearly everybody did for 60 years, 50 years.
link |
00:18:57.440
But recently people have been starting to
link |
00:19:01.520
reinvent pieces of that
link |
00:19:03.480
and kind of create some interesting new directions
link |
00:19:05.400
in the compiler technology.
link |
00:19:07.240
So the place where that's particularly happening right now
link |
00:19:11.680
is something called MLIR,
link |
00:19:13.440
which is something that, again,
link |
00:19:14.840
Chris Lattner, the Swift guy, is leading.
link |
00:19:18.000
And yeah, because it's actually not gonna be Swift
link |
00:19:20.560
on its own that solves this problem,
link |
00:19:22.080
because the problem is that currently writing
link |
00:19:24.920
an acceptably fast, you know, GPU program
link |
00:19:30.960
is too complicated regardless of what language you use.
link |
00:19:33.720
Right.
link |
00:19:36.440
And that's just because if you have to deal with the fact
link |
00:19:38.640
that I've got, you know, 10,000 threads
link |
00:19:41.680
and I have to synchronize between them all
link |
00:19:43.440
and I have to put my thing into grid blocks
link |
00:19:45.320
and think about warps and all this stuff,
link |
00:19:47.000
it's just so much boilerplate that to do that well,
link |
00:19:50.680
you have to be a specialist at that
link |
00:19:52.200
and it's gonna be a year's work to, you know,
link |
00:19:56.440
optimize that algorithm in that way.
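[A minimal sketch of the thread-and-block bookkeeping being described, using Numba's CUDA support as one concrete option; it assumes an NVIDIA GPU and the numba package. Even this trivial elementwise add has to think about grid sizing and bounds checks, and a genuinely fast kernel also has to worry about shared memory, warps, coalescing, and synchronization.]

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)      # this thread's global index across all blocks
    if i < x.size:        # guard: the grid is usually larger than the data
        out[i] = x[i] + y[i]

x = np.arange(1_000_000, dtype=np.float32)
y = 2 * x
out = np.empty_like(x)

threads_per_block = 256
blocks = (x.size + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # Numba handles the host/device copies
```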
link |
00:19:59.640
But with things like tensor comprehensions
link |
00:20:03.520
and TILE and MLIR and TVM,
link |
00:20:07.120
there's all these various projects
link |
00:20:08.640
which are all about saying,
link |
00:20:10.840
let's let people create like domain specific languages
link |
00:20:14.000
for tensor computations.
link |
00:20:16.840
These are the kinds of things we do generally
link |
00:20:19.320
on the GPU for deep learning and then have a compiler
link |
00:20:22.840
which can optimize that tensor computation.
link |
00:20:28.080
A lot of this work is actually sitting
link |
00:20:29.840
on top of a project called Halide,
link |
00:20:32.640
which is a mind blowing project where they came up
link |
00:20:37.080
with such a domain specific language.
link |
00:20:38.840
In fact, two, one domain specific language for expressing
link |
00:20:41.200
this is what my tensor computation is
link |
00:20:43.800
and another domain specific language for expressing
link |
00:20:46.280
this is the kind of the way I want you to structure
link |
00:20:50.280
the compilation of that and like do it block by block
link |
00:20:53.040
and do these bits in parallel.
link |
00:20:54.920
And they were able to show how you can compress
link |
00:20:57.720
the amount of code by 10X compared to optimized GPU code
link |
00:21:03.280
and get the same performance.
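[Not actual Halide code, just a loose NumPy illustration of the idea of separating "what is computed" from "how it is scheduled": the same small blur is run once over the whole image and once tile by tile, without touching the algorithm's definition.]

```python
import numpy as np

img = np.random.rand(512, 512)

def blur(x):
    """The algorithm: a 3-tap horizontal average."""
    return (x[:, :-2] + x[:, 1:-1] + x[:, 2:]) / 3

# Schedule 1: compute the whole image in one pass.
out_a = blur(img)

# Schedule 2: compute the same thing in row tiles (for locality or parallelism);
# the algorithm definition above is untouched.
out_b = np.empty_like(out_a)
tile = 128
for r in range(0, img.shape[0], tile):
    out_b[r:r + tile] = blur(img[r:r + tile])

assert np.allclose(out_a, out_b)
```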
link |
00:21:05.520
So that's like, so these other things are kind of sitting
link |
00:21:08.080
on top of that kind of research and MLIR is pulling a lot
link |
00:21:12.760
of those best practices together.
link |
00:21:15.120
And now we're starting to see work done on making all
link |
00:21:18.240
of that directly accessible through Swift
link |
00:21:21.360
so that I could use Swift to kind of write those
link |
00:21:23.720
domain specific languages and hopefully we'll get
link |
00:21:27.240
then Swift CUDA kernels written in a very expressive
link |
00:21:30.680
and concise way that looks a bit like J and APL
link |
00:21:34.160
and then Swift layers on top of that
link |
00:21:36.680
and then a Swift UI on top of that.
link |
00:21:38.440
And it'll be so nice if we can get to that point.
link |
00:21:42.600
Now does it all eventually boil down to CUDA
link |
00:21:46.520
and NVIDIA GPUs?
link |
00:21:48.560
Unfortunately at the moment it does,
link |
00:21:50.160
but one of the nice things about MLIR if AMD ever
link |
00:21:54.480
gets their act together which they probably won't
link |
00:21:56.760
is that they or others could write MLIR backends
link |
00:22:02.120
for other GPUs or rather tensor computation devices
link |
00:22:09.720
of which today there are increasing number
link |
00:22:11.600
like Graphcore or Vertex AI or whatever.
link |
00:22:18.760
So yeah, being able to target lots of backends
link |
00:22:22.560
would be another benefit of this
link |
00:22:23.920
and the market really needs competition
link |
00:22:26.680
because at the moment NVIDIA is massively overcharging
link |
00:22:29.520
for their kind of enterprise class cards
link |
00:22:33.640
because there is no serious competition
link |
00:22:36.720
because nobody else is doing the software properly.
link |
00:22:39.280
In the cloud there is some competition, right?
link |
00:22:41.400
But...
link |
00:22:42.920
Not really, other than TPUs perhaps,
link |
00:22:45.120
but TPUs are almost unprogrammable at the moment.
link |
00:22:48.240
So TPUs have the same problem that you can't?
link |
00:22:51.200
It's even worse.
link |
00:22:52.040
So TPUs, Google actually made an explicit decision
link |
00:22:54.840
to make them almost entirely unprogrammable
link |
00:22:57.200
because they felt that there was too much IP in there
link |
00:22:59.960
and if they gave people direct access to program them,
link |
00:23:02.640
people would learn their secrets.
link |
00:23:04.360
So you can't actually directly program the memory
link |
00:23:09.360
in a TPU.
link |
00:23:11.000
You can't even directly create code that runs on
link |
00:23:15.200
and that you look at on the machine that has the TPU,
link |
00:23:18.040
it all goes through a virtual machine.
link |
00:23:19.920
So all you can really do is this kind of cookie cutter thing
link |
00:23:22.920
of like plug in high level stuff together,
link |
00:23:26.720
which is just super tedious and annoying
link |
00:23:30.520
and totally unnecessary.
link |
00:23:32.920
So what was the, tell me if you could,
link |
00:23:36.040
the origin story of fast AI.
link |
00:23:38.080
What is the motivation, its mission, its dream?
link |
00:23:43.280
So I guess the founding story is heavily tied
link |
00:23:48.280
to my previous startup, which is a company called Enlitic,
link |
00:23:51.480
which was the first company to focus on deep learning
link |
00:23:54.880
for medicine and I created that because I saw
link |
00:23:58.720
that was a huge opportunity to,
link |
00:24:02.120
there's about a 10X shortage of the number of doctors
link |
00:24:05.840
in the world, in the developing world that we need.
link |
00:24:08.200
I expected it would take about 300 years
link |
00:24:11.760
to train enough doctors to meet that gap.
link |
00:24:13.920
But I guessed that maybe if we used deep learning
link |
00:24:19.400
for some of the analytics, we could maybe make it
link |
00:24:22.800
so you don't need as highly trained doctors.
link |
00:24:25.240
For diagnosis.
link |
00:24:26.080
For diagnosis and treatment planning.
link |
00:24:27.720
Where's the biggest benefit just before we get to fast AI,
link |
00:24:31.440
where's the biggest benefit of AI
link |
00:24:33.880
in medicine that you see today?
link |
00:24:36.400
And maybe next time.
link |
00:24:37.240
Not much happening today in terms of like stuff
link |
00:24:39.480
that's actually out there, it's very early.
link |
00:24:41.040
But in terms of the opportunity,
link |
00:24:42.880
it's to take markets like India and China and Indonesia,
link |
00:24:48.720
which have big populations, Africa,
link |
00:24:52.080
small numbers of doctors,
link |
00:24:55.760
and provide diagnostic, particularly treatment planning
link |
00:25:00.760
and triage kind of on device so that if you do a test
link |
00:25:05.760
for malaria or tuberculosis or whatever,
link |
00:25:09.240
you immediately get something that even a healthcare worker
link |
00:25:12.440
that's had a month of training can get
link |
00:25:16.160
a very high quality assessment of whether the patient
link |
00:25:20.440
might be at risk and tell, okay,
link |
00:25:22.320
we'll send them off to a hospital.
link |
00:25:25.240
So for example, in Africa, outside of South Africa,
link |
00:25:29.240
there's only five pediatric radiologists
link |
00:25:31.640
for the entire continent.
link |
00:25:32.960
So most countries don't have any.
link |
00:25:34.720
So if your kid is sick and they need something diagnosed
link |
00:25:37.440
through medical imaging, the person,
link |
00:25:39.800
even if you're able to get medical imaging done,
link |
00:25:41.640
the person that looks at it will be a nurse at best.
link |
00:25:46.400
But actually in India, for example, and China,
link |
00:25:50.080
almost no x rays are read by anybody,
link |
00:25:52.360
by any trained professional because they don't have enough.
link |
00:25:57.040
So if instead we had an algorithm that could take
link |
00:26:02.040
the most likely high risk 5% and say triage,
link |
00:26:08.040
basically say, okay, someone needs to look at this,
link |
00:26:11.040
it would massively change
link |
00:26:14.240
what's possible with medicine in the developing world.
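[A toy sketch of the triage idea: a model scores each study for risk and only the highest-risk slice is routed to a scarce human expert. The scores below are random placeholders standing in for real model output.]

```python
import numpy as np

rng = np.random.default_rng(0)
risk_scores = rng.random(10_000)                 # hypothetical model output, one score per study

threshold = np.quantile(risk_scores, 0.95)       # cut-off for the top 5% by risk
needs_review = risk_scores >= threshold
print(f"{needs_review.sum()} of {risk_scores.size} studies flagged for an expert")
```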
link |
00:26:18.680
And remember, they have, increasingly they have money.
link |
00:26:21.600
They're the developing world, they're not the poor world,
link |
00:26:23.560
they're the developing world.
link |
00:26:24.400
So they have the money.
link |
00:26:25.240
So they're building the hospitals,
link |
00:26:27.040
they're getting the diagnostic equipment,
link |
00:26:30.440
but there's no way for a very long time
link |
00:26:33.320
will they be able to have the expertise.
link |
00:26:37.040
Shortage of expertise, okay.
link |
00:26:38.480
And that's where the deep learning systems can step in
link |
00:26:41.760
and magnify the expertise they do have.
link |
00:26:44.320
Exactly, yeah.
link |
00:26:46.240
So you do see, just to linger a little bit longer,
link |
00:26:51.240
the interaction, do you still see the human experts
link |
00:26:55.760
still at the core of these systems?
link |
00:26:57.560
Yeah, absolutely.
link |
00:26:58.400
Is there something in medicine
link |
00:26:59.240
that could be automated almost completely?
link |
00:27:01.280
I don't see the point of even thinking about that
link |
00:27:03.880
because we have such a shortage of people.
link |
00:27:06.080
Why would we want to find a way not to use them?
link |
00:27:09.760
We have people, so the idea of like,
link |
00:27:13.000
even from an economic point of view,
link |
00:27:14.680
if you can make them 10X more productive,
link |
00:27:17.320
getting rid of the person,
link |
00:27:18.920
doesn't impact your unit economics at all.
link |
00:27:21.600
And it totally ignores the fact
link |
00:27:23.360
that there are things people do better than machines.
link |
00:27:26.520
So it's just to me,
link |
00:27:27.360
that's not a useful way of framing the problem.
link |
00:27:32.000
I guess, just to clarify,
link |
00:27:33.760
I guess I meant there may be some problems
link |
00:27:36.480
where you can avoid even going to the expert ever,
link |
00:27:40.000
sort of maybe preventative care or some basic stuff,
link |
00:27:44.000
allowing, you know,
link |
00:27:44.840
allowing the expert to focus on the things
link |
00:27:46.600
that are really that, you know.
link |
00:27:49.200
Well, that's what the triage would do, right?
link |
00:27:50.920
So the triage would say,
link |
00:27:52.800
okay, there's 99% sure there's nothing here.
link |
00:27:58.640
So that can be done on device
link |
00:28:01.960
and they can just say, okay, go home.
link |
00:28:03.840
So the experts are being used to look at the stuff
link |
00:28:07.320
which has some chance it's worth looking at,
link |
00:28:10.160
which most things it's not, it's fine.
link |
00:28:14.320
Why do you think that is?
link |
00:28:15.520
You know, it's fine.
link |
00:28:16.880
Why do you think we haven't quite made progress on that yet
link |
00:28:19.920
in terms of the scale of how much AI is applied
link |
00:28:27.000
in the medical field?
link |
00:28:27.840
Oh, there's a lot of reasons.
link |
00:28:28.680
I mean, one is it's pretty new.
link |
00:28:29.720
I only started Enlitic in like 2014.
link |
00:28:32.120
And before that, it's hard to express
link |
00:28:36.040
to what degree the medical world
link |
00:28:37.440
was not aware of the opportunities here.
link |
00:28:40.760
So I went to RSNA,
link |
00:28:42.960
which is the world's largest radiology conference.
link |
00:28:46.240
And I told everybody I could, you know,
link |
00:28:49.520
like I'm doing this thing with deep learning,
link |
00:28:51.760
please come and check it out.
link |
00:28:53.360
And no one had any idea what I was talking about
link |
00:28:56.840
and no one had any interest in it.
link |
00:28:59.680
So like we've come from absolute zero, which is hard.
link |
00:29:05.120
And then the whole regulatory framework, education system,
link |
00:29:09.920
everything is just set up to think of doctoring
link |
00:29:13.440
in a very different way.
link |
00:29:14.960
So today there is a small number of people
link |
00:29:17.120
who are deep learning practitioners
link |
00:29:20.600
and doctors at the same time.
link |
00:29:23.040
And we're starting to see the first ones
link |
00:29:24.640
come out of their PhD programs.
link |
00:29:26.600
So Zak Kohane over in Boston, Cambridge
link |
00:29:31.600
has a number of students now who are data science experts,
link |
00:29:37.880
deep learning experts, and actual medical doctors.
link |
00:29:43.480
Quite a few doctors have completed our fast AI course now
link |
00:29:47.000
and are publishing papers and creating journal reading groups
link |
00:29:52.560
in the American College of Radiology.
link |
00:29:55.200
And like, it's just starting to happen,
link |
00:29:57.360
but it's gonna be a long time coming.
link |
00:29:59.640
It's gonna happen, but it's gonna be a long process.
link |
00:30:02.880
The regulators have to learn how to regulate this.
link |
00:30:04.880
They have to build guidelines.
link |
00:30:08.720
And then the lawyers at hospitals
link |
00:30:12.120
have to develop a new way of understanding
link |
00:30:15.080
that sometimes it makes sense for data to be looked at
link |
00:30:22.440
in raw form in large quantities
link |
00:30:24.880
in order to create world changing results.
link |
00:30:27.000
Yeah, so the regulation around data, all that,
link |
00:30:30.120
it sounds probably the hardest problem,
link |
00:30:33.880
but sounds reminiscent of autonomous vehicles as well.
link |
00:30:36.800
Many of the same regulatory challenges,
link |
00:30:38.760
many of the same data challenges.
link |
00:30:40.640
Yeah, I mean, funnily enough,
link |
00:30:41.560
the problem is less the regulation
link |
00:30:43.680
and more the interpretation of that regulation
link |
00:30:45.880
by lawyers in hospitals.
link |
00:30:48.240
So HIPAA actually, the P in HIPAA
link |
00:30:52.920
does not stand for privacy.
link |
00:30:56.480
It stands for portability.
link |
00:30:57.680
It's actually meant to be a way that data can be used.
link |
00:31:01.240
And it was created with lots of gray areas
link |
00:31:04.400
because the idea is that would be more practical
link |
00:31:06.560
and it would help people to use this legislation
link |
00:31:10.480
to actually share data in a more thoughtful way.
link |
00:31:13.720
Unfortunately, it's done the opposite
link |
00:31:15.360
because when a lawyer sees a gray area,
link |
00:31:17.800
they say, oh, if we don't know, we won't get sued,
link |
00:31:20.760
then we can't do it.
link |
00:31:22.440
So HIPAA is not exactly the problem.
link |
00:31:26.360
The problem is more that there's,
link |
00:31:29.200
hospital lawyers are not incented
link |
00:31:31.000
to make bold decisions about data portability.
link |
00:31:36.520
Or even to embrace technology that saves lives.
link |
00:31:40.440
They more want to not get in trouble
link |
00:31:42.440
for embracing that technology.
link |
00:31:44.760
It also saves lives in a very abstract way,
link |
00:31:47.840
which is like, oh, we've been able to release
link |
00:31:49.840
these 100,000 anonymized records.
link |
00:31:52.320
I can't point to the specific person
link |
00:31:54.120
whose life that saved.
link |
00:31:55.320
I can say like, oh, we ended up with this paper
link |
00:31:57.720
which found this result,
link |
00:31:58.960
which diagnosed a thousand more people
link |
00:32:02.200
than we would have otherwise,
link |
00:32:03.080
but it's like, which ones were helped?
link |
00:32:05.480
It's very abstract.
link |
00:32:07.320
And on the counter side of that,
link |
00:32:09.360
you may be able to point to a life that was taken
link |
00:32:13.080
because of something that was.
link |
00:32:14.320
Yeah, or a person whose privacy was violated.
link |
00:32:18.160
It's like, oh, this specific person was deidentified.
link |
00:32:24.200
And then reidentified.
link |
00:32:25.960
Just a fascinating topic.
link |
00:32:27.280
We're jumping around.
link |
00:32:28.240
We'll get back to fast AI,
link |
00:32:29.400
but on the question of privacy,
link |
00:32:32.600
data is the fuel for so much innovation in deep learning.
link |
00:32:38.080
What's your sense on privacy?
link |
00:32:39.760
Whether we're talking about Twitter, Facebook, YouTube,
link |
00:32:44.000
just the technologies like in the medical field
link |
00:32:48.640
that rely on people's data in order to create impact.
link |
00:32:53.360
How do we get that right,
link |
00:32:56.600
respecting people's privacy and yet creating technology
link |
00:33:01.200
that is learning from data?
link |
00:33:03.320
One of my areas of focus is on doing more with less data.
link |
00:33:08.320
More with less data, which,
link |
00:33:11.840
so most vendors, unfortunately,
link |
00:33:14.400
are strongly incented to find ways
link |
00:33:17.560
to require more data and more computation.
link |
00:33:20.040
So, Google and IBM being the most obvious.
link |
00:33:24.400
IBM.
link |
00:33:25.920
Yeah, so Watson.
link |
00:33:27.680
So, Google and IBM both strongly push the idea
link |
00:33:31.160
that you have to be,
link |
00:33:33.080
that they have more data and more computation
link |
00:33:35.440
and more intelligent people than anybody else.
link |
00:33:37.840
And so you have to trust them to do things
link |
00:33:39.880
because nobody else can do it.
link |
00:33:42.640
And Google's very upfront about this,
link |
00:33:45.400
like Jeff Dean has gone out there and given talks
link |
00:33:48.440
and said, our goal is to require
link |
00:33:50.560
a thousand times more computation, but less people.
link |
00:33:55.160
Our goal is to use the people that you have better
link |
00:34:00.640
and the data you have better
link |
00:34:01.680
and the computation you have better.
link |
00:34:03.000
So, one of the things that we've discovered is,
link |
00:34:06.040
or at least highlighted,
link |
00:34:08.000
is that you very, very, very often
link |
00:34:11.080
don't need much data at all.
link |
00:34:13.360
And so the data you already have in your organization
link |
00:34:16.160
will be enough to get state of the art results.
link |
00:34:19.240
So, like my starting point would be to kind of say
link |
00:34:21.320
around privacy is a lot of people are looking for ways
link |
00:34:25.760
to share data and aggregate data,
link |
00:34:28.160
but I think often that's unnecessary.
link |
00:34:29.960
They assume that they need more data than they do
link |
00:34:32.200
because they're not familiar with the basics
link |
00:34:34.160
of transfer learning, which is this critical technique
link |
00:34:38.440
for needing orders of magnitude less data.
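[A minimal transfer learning sketch using PyTorch and torchvision as one concrete stack; the choice of ResNet-34 and the two-class head are arbitrary examples. The idea is to start from ImageNet-pretrained weights, freeze the backbone, and train only a small new head, which is why far less data is needed.]

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)  # pretrained backbone

for param in model.parameters():                 # freeze the pretrained layers
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)    # fresh head for a 2-class task

# Only the new head's parameters get updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```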
link |
00:34:42.000
Is your sense, one reason you might wanna collect data
link |
00:34:44.680
from everyone is like in the recommender system context,
link |
00:34:50.440
where your individual, Jeremy Howard's individual data
link |
00:34:54.520
is the most useful for providing a product
link |
00:34:58.440
that's impactful for you.
link |
00:34:59.840
So, for giving you advertisements,
link |
00:35:02.240
for recommending to you movies,
link |
00:35:04.160
for doing medical diagnosis,
link |
00:35:07.600
is your sense we can build with a small amount of data,
link |
00:35:11.680
general models that will have a huge impact
link |
00:35:15.200
for most people that we don't need to have data
link |
00:35:18.280
from each individual?
link |
00:35:19.160
On the whole, I'd say yes.
link |
00:35:20.560
I mean, there are things like,
link |
00:35:25.240
you know, recommender systems have this cold start problem
link |
00:35:28.360
where, you know, Jeremy is a new customer,
link |
00:35:30.960
we haven't seen him before, so we can't recommend him things
link |
00:35:33.280
based on what else he's bought and liked with us.
link |
00:35:36.000
And there's various workarounds to that.
link |
00:35:38.840
Like in a lot of music programs,
link |
00:35:40.640
we'll start out by saying, which of these artists do you like?
link |
00:35:44.880
Which of these albums do you like?
link |
00:35:46.760
Which of these songs do you like?
link |
00:35:49.760
Netflix used to do that, nowadays they tend not to.
link |
00:35:53.520
People kind of don't like that
link |
00:35:54.760
because they think, oh, we don't wanna bother the user.
link |
00:35:57.320
So, you could work around that
link |
00:35:58.680
by having some kind of data sharing
link |
00:36:00.960
where you get my marketing record from Acxiom or whatever,
link |
00:36:04.880
and try to guess from that.
link |
00:36:06.560
To me, the benefit to me and to society
link |
00:36:12.320
of saving me five minutes on answering some questions
link |
00:36:16.440
versus the negative externalities of the privacy issue
link |
00:36:23.480
doesn't add up.
link |
00:36:24.760
So, I think like a lot of the time,
link |
00:36:26.120
the places where people are invading our privacy
link |
00:36:30.120
in order to provide convenience
link |
00:36:32.760
is really about just trying to make them more money
link |
00:36:36.800
and they move these negative externalities
link |
00:36:40.720
to places that they don't have to pay for them.
link |
00:36:44.240
So, when you actually see regulations appear
link |
00:36:48.440
that actually cause the companies
link |
00:36:50.360
that create these negative externalities
link |
00:36:52.080
to have to pay for it themselves,
link |
00:36:53.480
they say, well, we can't do it anymore.
link |
00:36:56.080
So, the cost is actually too high.
link |
00:36:58.160
But for something like medicine,
link |
00:37:00.320
yeah, I mean, the hospital has my medical imaging,
link |
00:37:05.200
my pathology studies, my medical records,
link |
00:37:08.880
and also I own my medical data.
link |
00:37:11.840
So, you can, so I help a startup called Doc.ai.
link |
00:37:16.920
One of the things Doc.ai does is that it has an app.
link |
00:37:19.680
You can connect to, you know, Sutter Health
link |
00:37:23.760
and LabCorp and Walgreens
link |
00:37:26.080
and download your medical data to your phone
link |
00:37:29.800
and then upload it again at your discretion
link |
00:37:33.520
to share it as you wish.
link |
00:37:35.960
So, with that kind of approach,
link |
00:37:38.000
we can share our medical information
link |
00:37:41.120
with the people we want to.
link |
00:37:44.760
Yeah, so control.
link |
00:37:45.680
I mean, really being able to control
link |
00:37:47.440
who you share it with and so on.
link |
00:37:48.760
Yeah.
link |
00:37:49.720
So, that has a beautiful, interesting tangent
link |
00:37:53.480
to return back to the origin story of Fast.ai.
link |
00:37:59.360
Right, so before I started Fast.ai,
link |
00:38:02.480
I spent a year researching
link |
00:38:06.320
where are the biggest opportunities for deep learning?
link |
00:38:10.360
Because I knew from my time at Kaggle in particular
link |
00:38:14.040
that deep learning had kind of hit this threshold point
link |
00:38:16.880
where it was rapidly becoming the state of the art approach
link |
00:38:19.840
in every area that looked at it.
link |
00:38:21.560
And I'd been working with neural nets for over 20 years.
link |
00:38:25.360
I knew that from a theoretical point of view,
link |
00:38:27.400
once it hit that point,
link |
00:38:28.520
it would do that in kind of just about every domain.
link |
00:38:31.520
And so I kind of spent a year researching
link |
00:38:34.440
what are the domains that's gonna have
link |
00:38:36.200
the biggest low hanging fruit
link |
00:38:37.360
in the shortest time period.
link |
00:38:39.360
I picked medicine, but there were so many
link |
00:38:42.040
I could have picked.
link |
00:38:43.880
And so there was a kind of level of frustration for me
link |
00:38:46.200
of like, okay, I'm really glad we've opened up
link |
00:38:49.920
the medical deep learning world.
link |
00:38:51.120
And today it's huge, as you know,
link |
00:38:53.880
but we can't do, I can't do everything.
link |
00:38:58.240
I don't even know, like in medicine,
link |
00:39:00.360
it took me a really long time to even get a sense
link |
00:39:02.240
of like what kind of problems do medical practitioners solve?
link |
00:39:05.040
What kind of data do they have?
link |
00:39:06.360
Who has that data?
link |
00:39:08.480
So I kind of felt like I need to approach this differently
link |
00:39:12.440
if I wanna maximize the positive impact of deep learning.
link |
00:39:16.200
Rather than me picking an area
link |
00:39:19.160
and trying to become good at it and building something,
link |
00:39:21.720
I should let people who are already domain experts
link |
00:39:24.440
in those areas and who already have the data
link |
00:39:27.760
do it themselves.
link |
00:39:29.200
So that was the reason for Fast.ai
link |
00:39:33.080
is to basically try and figure out
link |
00:39:36.760
how to get deep learning into the hands of people
link |
00:39:40.120
who could benefit from it and help them to do so
link |
00:39:43.240
in as quick and easy and effective a way as possible.
link |
00:39:47.080
Got it, so sort of empower the domain experts.
link |
00:39:50.200
Yeah, and like partly it's because like,
link |
00:39:54.240
unlike most people in this field,
link |
00:39:56.280
my background is very applied and industrial.
link |
00:39:59.920
Like my first job was at McKinsey & Company.
link |
00:40:02.440
I spent 10 years in management consulting.
link |
00:40:04.640
I spent a lot of time with domain experts.
link |
00:40:10.440
So I kind of respect them and appreciate them.
link |
00:40:12.760
And I know that's where the value generation in society is.
link |
00:40:16.480
And so I also know how most of them can't code
link |
00:40:21.600
and most of them don't have the time to invest
link |
00:40:26.320
three years in a graduate degree or whatever.
link |
00:40:29.320
So I was like, how do I upskill those domain experts?
link |
00:40:33.520
I think that would be a super powerful thing,
link |
00:40:36.600
the biggest societal impact I could have.
link |
00:40:40.240
So yeah, that was the thinking.
link |
00:40:41.680
So many of the Fast.ai students and researchers
link |
00:40:45.680
and the things you teach are pragmatically minded,
link |
00:40:50.160
practically minded,
link |
00:40:52.080
figuring out ways to solve real problems, and fast.
link |
00:40:55.800
So from your experience,
link |
00:40:57.480
what's the difference between theory
link |
00:40:59.120
and practice of deep learning?
link |
00:41:03.680
Well, most of the research in the deep learning world
link |
00:41:07.520
is a total waste of time.
link |
00:41:09.840
Right, that's what I was getting at.
link |
00:41:11.040
Yeah.
link |
00:41:12.200
It's a problem in science in general.
link |
00:41:16.240
Scientists need to be published,
link |
00:41:19.600
which means they need to work on things
link |
00:41:21.480
that their peers are extremely familiar with
link |
00:41:24.080
and can recognize in advance in that area.
link |
00:41:26.200
So that means that they all need to work on the same thing.
link |
00:41:30.120
And so it really, and the thing they work on,
link |
00:41:33.040
there's nothing to encourage them to work on things
link |
00:41:35.640
that are practically useful.
link |
00:41:38.840
So you get just a whole lot of research,
link |
00:41:41.160
which is minor advances and stuff
link |
00:41:43.240
that's been very highly studied
link |
00:41:44.640
and has no significant practical impact.
link |
00:41:49.360
Whereas the things that really make a difference,
link |
00:41:50.920
like I mentioned transfer learning,
link |
00:41:52.800
like if we can do better at transfer learning,
link |
00:41:55.640
then it's this like world changing thing
link |
00:41:58.200
where suddenly like lots more people
link |
00:41:59.800
can do world class work with less resources and less data.
link |
00:42:06.840
But almost nobody works on that.
link |
00:42:08.560
Or another example, active learning,
link |
00:42:10.800
which is the study of like,
link |
00:42:11.920
how do we get more out of the human beings in the loop?
link |
00:42:15.920
That's my favorite topic.
link |
00:42:17.160
Yeah, so active learning is great,
link |
00:42:18.560
but it's almost nobody working on it
link |
00:42:21.200
because it's just not a trendy thing right now.
link |
00:42:23.840
You know what somebody, sorry to interrupt,
link |
00:42:27.080
you're saying that nobody is publishing on active learning,
link |
00:42:31.560
but there's people inside companies,
link |
00:42:33.480
anybody who actually has to solve a problem,
link |
00:42:36.840
they're going to innovate on active learning.
link |
00:42:39.680
Yeah, everybody kind of reinvents active learning
link |
00:42:42.120
when they actually have to work in practice
link |
00:42:43.800
because they start labeling things and they think,
link |
00:42:46.400
gosh, this is taking a long time and it's very expensive.
link |
00:42:49.320
And then they start thinking,
link |
00:42:51.240
well, why am I labeling everything?
link |
00:42:52.640
I'm only, the machine's only making mistakes
link |
00:42:54.840
on those two classes.
link |
00:42:56.040
They're the hard ones.
link |
00:42:56.880
Maybe I'll just start labeling those two classes.
link |
00:42:58.880
And then you start thinking,
link |
00:43:00.360
well, why did I do that manually?
link |
00:43:01.600
Why can't I just get the system to tell me
link |
00:43:03.000
which things are going to be hardest?
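[A minimal sketch of the uncertainty-sampling flavor of active learning described here: rather than labeling everything, ask the model which unlabeled examples it is least sure about and send those to the human labeler first. `predict_proba` is a stand-in for any classifier's probability output.]

```python
import numpy as np

def most_uncertain(predict_proba, unlabeled_X, n=100):
    """Indices of the n examples with the smallest margin between the
    top two predicted class probabilities (the 'hard' ones)."""
    probs = np.sort(predict_proba(unlabeled_X), axis=1)   # shape: (n_samples, n_classes)
    margins = probs[:, -1] - probs[:, -2]
    return np.argsort(margins)[:n]

# Typical loop: train on the small labeled set, score the unlabeled pool,
# label the n hardest examples, add them to the training set, and repeat.
```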
link |
00:43:05.080
It's an obvious thing to do, but yeah,
link |
00:43:08.320
it's just like transfer learning.
link |
00:43:11.440
It's understudied and the academic world
link |
00:43:14.160
just has no reason to care about practical results.
link |
00:43:17.480
The funny thing is,
link |
00:43:18.320
like I've only really ever written one paper.
link |
00:43:19.960
I hate writing papers.
link |
00:43:21.560
And I didn't even write it.
link |
00:43:22.800
It was my colleague, Sebastian Ruder,
link |
00:43:24.640
who actually wrote it.
link |
00:43:25.520
I just did the research for it,
link |
00:43:28.080
but it was basically introducing transfer learning,
link |
00:43:30.600
successful transfer learning to NLP for the first time.
link |
00:43:34.280
The algorithm is called ULMFiT.
link |
00:43:36.960
And it actually, I actually wrote it for the course,
link |
00:43:42.280
for the Fast AI course.
link |
00:43:43.680
I wanted to teach people NLP and I thought,
link |
00:43:45.760
I only want to teach people practical stuff.
link |
00:43:47.480
And I think the only practical stuff is transfer learning.
link |
00:43:50.520
And I couldn't find any examples of transfer learning in NLP.
link |
00:43:53.280
So I just did it.
link |
00:43:54.520
And I was shocked to find that as soon as I did it,
link |
00:43:57.280
which, you know, the basic prototype took a couple of days,
link |
00:44:01.040
smashed the state of the art
link |
00:44:02.480
on one of the most important data sets
link |
00:44:04.240
in a field that I knew nothing about.
link |
00:44:06.680
And I just thought, well, this is ridiculous.
link |
00:44:10.320
And so I spoke to Sebastian about it
link |
00:44:13.760
and he kindly offered to write it up, the results.
link |
00:44:17.640
And so it ended up being published in ACL,
link |
00:44:21.320
which is the top computational linguistics conference.
link |
00:44:25.520
So like people do actually care once you do it,
link |
00:44:28.840
but I guess it's difficult for maybe like junior researchers
link |
00:44:32.760
or like, I don't care whether I get citations
link |
00:44:36.560
or papers or whatever.
link |
00:44:37.720
There's nothing in my life that makes that important,
link |
00:44:39.600
which is why I've never actually bothered
link |
00:44:41.480
to write a paper myself.
link |
00:44:43.000
But for people who do,
link |
00:44:43.960
I guess they have to pick the kind of safe option,
link |
00:44:49.560
which is like, yeah, make a slight improvement
link |
00:44:52.240
on something that everybody's already working on.
link |
00:44:54.920
Yeah, nobody does anything interesting
link |
00:44:58.240
or succeeds in life with the safe option.
link |
00:45:01.160
Although, I mean, the nice thing is,
link |
00:45:02.400
nowadays everybody is now working on NLP transfer learning
link |
00:45:05.280
because since that time we've had GPT and GPT2 and BERT,
link |
00:45:09.720
and, you know, it's like, it's, so yeah,
link |
00:45:12.640
once you show that something's possible,
link |
00:45:15.360
everybody jumps in, I guess, so.
link |
00:45:17.600
I hope to be a part of,
link |
00:45:19.160
and I hope to see more innovation
link |
00:45:20.640
and active learning in the same way.
link |
00:45:22.120
I think transfer learning and active learning
link |
00:45:24.480
are fascinating areas for public, open work.
link |
00:45:27.320
I actually helped start a startup called Platform.ai,
link |
00:45:29.960
which is really all about active learning.
link |
00:45:31.720
And yeah, it's been interesting trying to kind of see
link |
00:45:35.840
what research is out there and make the most of it.
link |
00:45:37.760
And there's basically none.
link |
00:45:39.160
So we've had to do all our own research.
link |
00:45:41.000
Once again, and just as you described.
link |
00:45:44.240
Can you tell the story of the Stanford competition,
link |
00:45:47.640
DAWNBench, and FastAI's achievement on it?
link |
00:45:51.480
Sure, so something which I really enjoy
link |
00:45:54.280
is that I basically teach two courses a year,
link |
00:45:57.400
the Practical Deep Learning for Coders,
link |
00:45:59.640
which is kind of the introductory course,
link |
00:46:02.080
and then Cutting Edge Deep Learning for Coders,
link |
00:46:04.000
which is the kind of research level course.
link |
00:46:08.040
And while I teach those courses,
link |
00:46:10.360
I basically have a big office
link |
00:46:16.760
at the University of San Francisco,
link |
00:46:18.520
big enough for like 30 people.
link |
00:46:19.760
And I invite anybody, any student who wants to come
link |
00:46:22.080
and hang out with me while I build the course.
link |
00:46:25.320
And so generally it's full.
link |
00:46:26.600
And so we have 20 or 30 people in a big office
link |
00:46:30.840
with nothing to do but study deep learning.
link |
00:46:33.840
So it was during one of these times
link |
00:46:35.880
that somebody in the group said,
link |
00:46:37.320
oh, there's a thing called DAWNBench
link |
00:46:40.520
that looks interesting.
link |
00:46:41.400
And I was like, what the hell is that?
link |
00:46:42.760
And they set out some competition
link |
00:46:44.040
to see how quickly you can train a model.
link |
00:46:46.320
Seems kind of, not exactly relevant to what we're doing,
link |
00:46:50.240
but it sounds like the kind of thing
link |
00:46:51.320
which you might be interested in.
link |
00:46:52.400
And I checked it out and I was like,
link |
00:46:53.320
oh crap, there's only 10 days till it's over.
link |
00:46:55.760
It's too late.
link |
00:46:58.000
And we're kind of busy trying to teach this course.
link |
00:47:00.880
But we're like, oh, it would make an interesting case study
link |
00:47:05.520
for the course.
link |
00:47:06.360
It's like, it's all the stuff we're already doing.
link |
00:47:08.160
Why don't we just put together
link |
00:47:09.480
our current best practices and ideas?
link |
00:47:12.440
So me and I guess about four students
link |
00:47:16.040
just decided to give it a go.
link |
00:47:17.520
And we focused on this small one called CIFAR-10,
link |
00:47:20.840
which is little 32 by 32 pixel images.
link |
00:47:24.600
Can you say what DAWNBench is?
link |
00:47:26.080
Yeah, so it's a competition to train a model
link |
00:47:28.600
as fast as possible.
link |
00:47:29.520
It was run by Stanford.
link |
00:47:30.960
And it's cheap as possible too.
link |
00:47:32.480
That's also another one for as cheap as possible.
link |
00:47:34.280
And there was a couple of categories,
link |
00:47:36.400
ImageNet and CIFAR-10.
link |
00:47:38.120
So ImageNet is this big 1.3 million image thing
link |
00:47:42.040
that took a couple of days to train.
link |
00:47:45.400
I remember a friend of mine, Pete Warden,
link |
00:47:47.840
who's now at Google.
link |
00:47:51.240
I remember he told me how he trained ImageNet
link |
00:47:53.240
a few years ago when he basically like had this
link |
00:47:58.320
little granny flat out the back
link |
00:47:59.720
that he turned into his ImageNet training center.
link |
00:48:01.880
And he figured, you know, after like a year of work,
link |
00:48:03.760
he figured out how to train it in like 10 days or something.
link |
00:48:07.040
It's like, that was a big job.
link |
00:48:08.440
Whereas CIFAR-10, at that time,
link |
00:48:10.480
you could train in a few hours.
link |
00:48:12.840
You know, it's much smaller and easier.
link |
00:48:14.480
So we thought we'd try CIFAR-10.
link |
00:48:18.120
And yeah, I've really never done that before.
link |
00:48:23.760
Like I'd never really,
link |
00:48:24.760
like things like using more than one GPU at a time
link |
00:48:27.880
was something I tried to avoid.
link |
00:48:29.800
Cause to me, it's like very against the whole idea
link |
00:48:32.120
of accessibility; you should be able to do things with one GPU.
link |
00:48:35.000
I mean, have you asked in the past before,
link |
00:48:38.000
after having accomplished something,
link |
00:48:39.640
how do I do this faster, much faster?
link |
00:48:42.480
Oh, always, but it's always, for me,
link |
00:48:44.160
it's always how do I make it much faster on a single GPU
link |
00:48:47.680
that a normal person could afford in their day to day life.
link |
00:48:50.360
It's not how could I do it faster by, you know,
link |
00:48:53.880
having a huge data center.
link |
00:48:55.280
Cause to me, it's all about like,
link |
00:48:57.240
as many people as possible should be able to use something
link |
00:48:59.520
without fussing around with infrastructure.
link |
00:49:04.080
So anyways, in this case it's like, well,
link |
00:49:06.040
we can use eight GPUs just by renting an AWS machine.
link |
00:49:10.200
So we thought we'd try that.
link |
00:49:11.840
And yeah, basically using the stuff we were already doing,
link |
00:49:16.520
we were able to get, you know, the speed,
link |
00:49:20.120
you know, within a few days we had the speed down to,
link |
00:49:23.840
I don't know, a very small number of minutes.
link |
00:49:26.000
I can't remember exactly how many minutes it was,
link |
00:49:28.760
but it might've been like 10 minutes or something.
link |
00:49:31.360
And so, yeah, we found ourselves
link |
00:49:32.880
at the top of the leaderboard easily
link |
00:49:34.720
for both time and money, which really shocked me
link |
00:49:39.040
cause the other people competing in this
link |
00:49:40.160
were like Google and Intel and stuff
link |
00:49:41.880
who I like know a lot more about this stuff
link |
00:49:43.880
than I think we do.
link |
00:49:45.360
So then we were emboldened.
link |
00:49:46.800
We thought let's try the ImageNet one too.
link |
00:49:50.640
I mean, it seemed way out of our league,
link |
00:49:53.320
but our goal was to get under 12 hours.
link |
00:49:55.960
And we did, which was really exciting.
link |
00:49:59.520
But we didn't put anything up on the leaderboard,
link |
00:50:01.400
but we were down to like 10 hours.
link |
00:50:03.040
But then Google put in like five hours or something
link |
00:50:09.960
and we're just like, oh, we're so screwed.
link |
00:50:13.360
But we kind of thought, we'll keep trying.
link |
00:50:16.560
You know, if Google can do it in five,
link |
00:50:17.800
I mean, Google did it in five hours on something
link |
00:50:19.480
on like a TPU pod or something, like a lot of hardware.
link |
00:50:23.280
But we kind of like had a bunch of ideas to try.
link |
00:50:26.360
Like a really simple thing was
link |
00:50:28.720
why are we using these big images?
link |
00:50:30.480
They're like 224 or 256 by 256 pixels.
link |
00:50:35.400
You know, why don't we try smaller ones?
link |
00:50:37.720
And just to elaborate, there's a constraint
link |
00:50:40.400
on the accuracy that your trained model
link |
00:50:42.200
is supposed to achieve, right?
link |
00:50:43.040
Yeah, you gotta achieve 93%, I think it was,
link |
00:50:46.400
for ImageNet, exactly.
link |
00:50:49.200
Which is very tough, so you have to.
link |
00:50:51.080
Yeah, 93%, like they picked a good threshold.
link |
00:50:54.680
It was a little bit higher
link |
00:50:56.920
than what the most commonly used ResNet 50 model
link |
00:51:00.840
could achieve at that time.
link |
00:51:03.360
So yeah, so it's quite a difficult problem to solve.
link |
00:51:08.200
But yeah, we realized if we actually
link |
00:51:09.720
just use 64 by 64 images,
link |
00:51:14.680
it trained a pretty good model.
link |
00:51:16.280
And then we could take that same model
link |
00:51:18.040
and just give it a couple of epochs to learn 224 by 224 images.
link |
00:51:21.920
And it was basically already trained.
link |
00:51:24.520
It makes a lot of sense.
link |
00:51:25.480
Like if you teach somebody,
link |
00:51:26.640
like here's what a dog looks like
link |
00:51:28.120
and you show them low res versions,
link |
00:51:30.200
and then you say, here's a really clear picture of a dog,
link |
00:51:33.600
they already know what a dog looks like.
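(A sketch of that progressive-resizing trick using the fastai-style API, with Imagenette standing in for full ImageNet and purely illustrative sizes and epoch counts; vision_learner is called cnn_learner in older fastai releases. Assigning a new DataLoaders to learn.dls is the usual way to switch image sizes mid-training.)

```python
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_320)   # small ImageNet stand-in for the sketch

def dls_at(size, bs):
    return ImageDataLoaders.from_folder(path, valid='val',
                                        item_tfms=Resize(size), bs=bs)

learn = vision_learner(dls_at(64, 128), resnet50, metrics=accuracy)
learn.fit_one_cycle(5)          # do most of the training on cheap 64px images

learn.dls = dls_at(224, 32)     # then swap in the full-size images
learn.fit_one_cycle(2)          # a couple of epochs to adapt, and it's basically trained
```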
link |
00:51:35.960
So that like just, we jumped to the front
link |
00:51:39.880
and we ended up winning parts of that competition.
link |
00:51:43.880
We actually ended up doing a distributed version
link |
00:51:47.280
over multiple machines a couple of months later
link |
00:51:49.560
and ended up at the top of the leaderboard.
link |
00:51:51.120
We had 18 minutes.
link |
00:51:53.000
ImageNet.
link |
00:51:53.840
Yeah, and it was,
link |
00:51:55.640
and people have just kept on blasting through
link |
00:51:57.920
again and again since then, so.
link |
00:52:00.000
So what's your view on multi GPU
link |
00:52:03.200
or multiple machine training in general
link |
00:52:06.120
as a way to speed code up?
link |
00:52:09.520
I think it's largely a waste of time.
link |
00:52:11.240
Both of them.
link |
00:52:12.080
I think it's largely a waste of time.
link |
00:52:13.960
Both multi GPU on a single machine and.
link |
00:52:15.840
Yeah, particularly multi machines,
link |
00:52:17.640
cause it's just clunky.
link |
00:52:21.840
Multi GPUs is less clunky than it used to be,
link |
00:52:25.320
but to me anything that slows down your iteration speed
link |
00:52:28.520
is a waste of time.
link |
00:52:31.680
So you could maybe do your very last,
link |
00:52:34.960
you know, perfecting of the model on multi GPUs
link |
00:52:38.000
if you need to, but.
link |
00:52:40.040
So for example, I think doing stuff on ImageNet
link |
00:52:44.560
is generally a waste of time.
link |
00:52:46.000
Why test things on 1.3 million images?
link |
00:52:48.240
Most of us don't use 1.3 million images.
link |
00:52:51.080
And we've also done research that shows that
link |
00:52:53.840
doing things on a smaller subset of images
link |
00:52:56.480
gives you the same relative answers anyway.
link |
00:52:59.160
So from a research point of view, why waste that time?
link |
00:53:02.080
So actually I released a couple of new data sets recently.
link |
00:53:06.120
One is called Imagenette,
link |
00:53:07.720
the French ImageNet, which is a small subset of ImageNet,
link |
00:53:12.880
which is designed to be easy to classify.
link |
00:53:15.040
How do you spell Imagenette?
link |
00:53:17.240
It's got an extra T and E at the end,
link |
00:53:19.200
cause it's very French.
link |
00:53:20.440
And then another one called Imagewoof,
link |
00:53:24.680
which is a subset of ImageNet that only contains dog breeds.
link |
00:53:29.960
And that's a hard one, right?
link |
00:53:31.080
That's a hard one.
link |
00:53:31.960
And I've discovered that if you just look at these
link |
00:53:34.120
two subsets, you can train things on a single GPU
link |
00:53:37.760
in 10 minutes.
link |
00:53:39.080
And the results you get are directly transferable
link |
00:53:42.040
to ImageNet nearly all the time.
link |
00:53:44.280
And so now I'm starting to see some researchers
link |
00:53:46.360
start to use these much smaller data sets.
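(Both subsets are registered in fastai's dataset URLs, so a quick single-GPU run looks roughly like this; the architecture, image size, and epoch count are illustrative.)

```python
from fastai.vision.all import *

# Imagenette: ten easy-to-distinguish ImageNet classes.
# Imagewoof:  ten dog breeds, the deliberately hard one.
path = untar_data(URLs.IMAGEWOOF_160)      # or URLs.IMAGENETTE_160

dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(128), bs=64)
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(5)                     # minutes, not days, on one GPU
```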
link |
00:53:48.960
I so deeply love the way you think,
link |
00:53:51.120
because I think you might've written a blog post
link |
00:53:55.040
saying that sort of going these big data sets
link |
00:54:00.120
is encouraging people to not think creatively.
link |
00:54:03.840
Absolutely.
link |
00:54:04.680
So you're too, it sort of constrains you to train
link |
00:54:08.760
on large resources.
link |
00:54:09.800
And because you have these resources,
link |
00:54:11.240
you think more research will be better.
link |
00:54:13.960
And then you start, so like somehow you kill the creativity.
link |
00:54:17.720
Yeah, and even worse than that, Lex,
link |
00:54:19.240
I keep hearing from people who say,
link |
00:54:21.160
I decided not to get into deep learning
link |
00:54:23.320
because I don't believe it's accessible to people
link |
00:54:26.040
outside of Google to do useful work.
link |
00:54:28.480
So like I see a lot of people make an explicit decision
link |
00:54:31.600
to not learn this incredibly valuable tool
link |
00:54:35.960
because they've drunk the Google Kool-Aid,
link |
00:54:39.000
which is that only Google's big enough
link |
00:54:40.680
and smart enough to do it.
link |
00:54:42.400
And I just find that so disappointing and it's so wrong.
link |
00:54:45.320
And I think all of the major breakthroughs in AI
link |
00:54:49.120
in the next 20 years will be doable on a single GPU.
link |
00:54:53.240
Like I would say, my sense is all the big sort of.
link |
00:54:57.360
Well, let's put it this way.
link |
00:54:58.200
None of the big breakthroughs of the last 20 years
link |
00:55:00.120
have required multiple GPUs.
link |
00:55:01.680
So like batch norm, ReLU, Dropout.
link |
00:55:05.920
To demonstrate that there's something to them.
link |
00:55:08.040
Every one of them, none of them has required multiple GPUs.
link |
00:55:11.760
GANs, the original GANs didn't require multiple GPUs.
link |
00:55:15.760
Well, and we've actually recently shown
link |
00:55:18.000
that you don't even need GANs.
link |
00:55:19.600
So we've developed GAN level outcomes without needing GANs.
link |
00:55:24.640
And we can now do it with, again,
link |
00:55:26.840
by using transfer learning,
link |
00:55:27.920
we can do it in a couple of hours on a single GPU.
link |
00:55:30.200
You're just using a generator model
link |
00:55:31.600
without the adversarial part?
link |
00:55:32.960
Yeah, so we've found loss functions
link |
00:55:35.680
that work super well without the adversarial part.
link |
00:55:38.640
And then one of our students, a guy called Jason Antic,
link |
00:55:41.800
has created a system called DeOldify,
link |
00:55:44.600
which uses this technique to colorize
link |
00:55:47.240
old black and white movies.
link |
00:55:48.800
You can do it on a single GPU,
link |
00:55:50.440
colorize a whole movie in a couple of hours.
link |
00:55:52.840
And one of the things that Jason and I did together
link |
00:55:56.040
was we figured out how to add a little bit of GAN
link |
00:56:00.400
at the very end, which it turns out for colorization
link |
00:56:02.920
makes it just a bit brighter and nicer.
link |
00:56:05.920
And then Jason did masses of experiments
link |
00:56:07.880
to figure out exactly how much to do,
link |
00:56:09.960
but it's still all done on his home machine
link |
00:56:12.760
on a single GPU in his lounge room.
link |
00:56:15.320
And if you think about colorizing Hollywood movies,
link |
00:56:19.160
that sounds like something a huge studio would have to do,
link |
00:56:21.680
but he has the world's best results on this.
link |
00:56:25.160
There's this problem of microphones.
link |
00:56:27.000
We're just talking to microphones now.
link |
00:56:29.080
It's such a pain in the ass to have these microphones
link |
00:56:32.520
to get good quality audio.
link |
00:56:34.360
And I tried to see if it's possible to plop down
link |
00:56:36.680
a bunch of cheap sensors and reconstruct
link |
00:56:39.200
higher quality audio from multiple sources.
link |
00:56:41.840
Because right now I haven't seen the work from,
link |
00:56:45.160
okay, we can say even expensive mics
link |
00:56:47.440
automatically combining audio from multiple sources
link |
00:56:50.040
to improve the combined audio.
link |
00:56:52.280
People haven't done that.
link |
00:56:53.120
And that feels like a learning problem.
link |
00:56:55.080
So hopefully somebody can.
link |
00:56:56.840
Well, I mean, it's evidently doable
link |
00:56:58.800
and it should have been done by now.
link |
00:57:01.400
I felt the same way about computational photography
link |
00:57:03.600
four years ago.
link |
00:57:05.240
Why are we investing in big lenses
link |
00:57:07.120
when three cheap lenses plus actually
link |
00:57:10.640
a little bit of intentional movement,
link |
00:57:13.760
so like take a few frames,
link |
00:57:16.640
gives you enough information
link |
00:57:18.280
to get excellent subpixel resolution,
link |
00:57:20.560
which particularly with deep learning,
link |
00:57:22.440
you would know exactly what you meant to be looking at.
link |
00:57:25.840
We can totally do the same thing with audio.
link |
00:57:28.160
I think it's madness that it hasn't been done yet.
link |
00:57:30.680
Is there progress on the photography front?
link |
00:57:33.280
Yeah, photography is basically standard now.
link |
00:57:36.720
So the Google Pixel Night Sight,
link |
00:57:40.800
I don't know if you've ever tried it,
link |
00:57:42.120
but it's astonishing.
link |
00:57:43.200
You take a picture in almost pitch black
link |
00:57:45.440
and you get back a very high quality image.
link |
00:57:49.160
And it's not because of the lens.
link |
00:57:51.480
Same stuff with like adding the bokeh
link |
00:57:53.440
to the background blurring,
link |
00:57:55.800
it's done computationally.
link |
00:57:57.200
This is the pixel right here.
link |
00:57:58.600
Yeah, basically everybody now
link |
00:58:01.880
is doing most of the fanciest stuff
link |
00:58:05.000
on their phones with computational photography
link |
00:58:07.120
and also increasingly people are putting
link |
00:58:08.680
more than one lens on the back of the camera.
link |
00:58:11.800
So the same will happen for audio for sure.
link |
00:58:14.360
And there's applications in the audio side.
link |
00:58:16.480
If you look at an Alexa type device,
link |
00:58:19.320
most people I've seen,
link |
00:58:20.840
especially I worked at Google before,
link |
00:58:22.320
when you look at noise background removal,
link |
00:58:25.920
you don't think of multiple sources of audio.
link |
00:58:29.560
You don't play with that as much
link |
00:58:31.040
as I would hope people would.
link |
00:58:31.880
But I mean, you can still do it even with one.
link |
00:58:33.600
Like again, not much work's been done in this area.
link |
00:58:36.120
So we're actually gonna be releasing an audio library soon,
link |
00:58:39.000
which hopefully will encourage development of this
link |
00:58:41.040
because it's so underused.
link |
00:58:43.160
The basic approach we used for our super resolution
link |
00:58:46.480
and which Jason uses for DeOldify
link |
00:58:48.640
of generating high quality images,
link |
00:58:50.960
the exact same approach would work for audio.
link |
00:58:53.440
No one's done it yet,
link |
00:58:54.440
but it would be a couple of months work.
link |
00:58:57.120
Okay, also learning rate in terms of DAWNBench.
link |
00:59:01.560
There's some magic on learning rate
link |
00:59:03.520
that you played around with that's kind of interesting.
link |
00:59:05.720
Yeah, so this is all work that came
link |
00:59:06.960
from a guy called Leslie Smith.
link |
00:59:09.280
Leslie's a researcher who, like us,
link |
00:59:12.720
cares a lot about just the practicalities
link |
00:59:15.800
of training neural networks quickly and accurately,
link |
00:59:20.360
which I think is what everybody should care about,
link |
00:59:22.120
but almost nobody does.
link |
00:59:24.920
And he discovered something very interesting,
link |
00:59:28.080
which he calls super convergence,
link |
00:59:29.760
which is there are certain networks
link |
00:59:31.240
that with certain settings of hyperparameters
link |
00:59:33.320
could suddenly be trained 10 times faster
link |
00:59:37.080
by using a 10 times higher learning rate.
link |
00:59:39.480
Now, no one would publish that paper
link |
00:59:43.640
because it's not an area of kind of active research
link |
00:59:49.520
in the academic world.
link |
00:59:50.440
No academics recognize that this is important.
link |
00:59:52.640
And also deep learning in academia
link |
00:59:56.080
is not considered an experimental science.
link |
00:59:59.840
So unlike in physics where you could say like,
link |
01:00:02.440
I just saw a subatomic particle do something
link |
01:00:05.360
which the theory doesn't explain,
link |
01:00:07.240
you could publish that without an explanation.
link |
01:00:10.440
And then in the next 60 years,
link |
01:00:11.840
people can try to work out how to explain it.
link |
01:00:14.080
We don't allow this in the deep learning world.
link |
01:00:16.120
So it's literally impossible for Leslie
link |
01:00:19.520
to publish a paper that says,
link |
01:00:21.600
I've just seen something amazing happen.
link |
01:00:23.520
This thing trained 10 times faster than it should have.
link |
01:00:25.640
I don't know why.
link |
01:00:27.360
And so the reviewers were like,
link |
01:00:28.520
well, you can't publish that because you don't know why.
link |
01:00:30.280
So anyway.
link |
01:00:31.120
That's important to pause on
link |
01:00:32.160
because there's so many discoveries
link |
01:00:34.280
that would need to start like that.
link |
01:00:36.120
Every other scientific field I know of works that way.
link |
01:00:39.240
I don't know why ours is uniquely disinterested
link |
01:00:43.520
in publishing unexplained experimental results,
link |
01:00:47.720
but there it is.
link |
01:00:48.680
So it wasn't published.
link |
01:00:51.200
Having said that,
link |
01:00:52.560
I read a lot more unpublished papers than published papers
link |
01:00:56.840
because that's where you find the interesting insights.
link |
01:01:00.040
So I absolutely read this paper.
link |
01:01:02.680
And I was just like,
link |
01:01:04.520
this is astonishingly mind blowing and weird
link |
01:01:08.920
and awesome.
link |
01:01:09.760
And like, why isn't everybody only talking about this?
link |
01:01:12.400
Because like, if you can train these things 10 times faster,
link |
01:01:15.480
they also generalize better
link |
01:01:16.720
because you're doing less epochs,
link |
01:01:18.800
which means you look at the data less,
link |
01:01:20.080
you get better accuracy.
link |
01:01:22.360
So I've been kind of studying that ever since.
link |
01:01:24.640
And eventually Leslie kind of figured out
link |
01:01:28.520
a lot of how to get this done.
link |
01:01:30.120
And we added minor tweaks.
link |
01:01:32.240
And a big part of the trick
link |
01:01:33.600
is starting at a very low learning rate,
link |
01:01:36.440
very gradually increasing it.
link |
01:01:37.880
So as you're training your model,
link |
01:01:39.800
you would take very small steps at the start
link |
01:01:42.120
and you gradually make them bigger and bigger
link |
01:01:44.040
until eventually you're taking much bigger steps
link |
01:01:46.400
than anybody thought was possible.
link |
01:01:49.400
There's a few other little tricks to make it work,
link |
01:01:51.120
but basically we can reliably get super convergence.
link |
01:01:55.240
And so for the DAWNBench thing,
link |
01:01:56.600
we were using just much higher learning rates
link |
01:01:59.280
than people expected to work.
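(The schedule being described, tiny steps first, a warm-up to a much larger learning rate, then annealing back down, is packaged as the 1cycle policy. PyTorch's built-in OneCycleLR shows the shape; the toy model and numbers below are purely illustrative.)

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(10, 2)              # toy model, only here to drive the schedule
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
steps = 100 * 5                             # batches per epoch * epochs
sched = OneCycleLR(opt, max_lr=1.0, total_steps=steps)

lrs = []
for _ in range(steps):
    # a real loop would compute a loss and call loss.backward() here
    opt.step()
    sched.step()
    lrs.append(sched.get_last_lr()[0])

# lrs starts tiny, warms up to max_lr, then anneals far below where it began:
# small steps at the start, much bigger steps in the middle, as described above.
```

In fastai the same policy is a single call, learn.fit_one_cycle(epochs, lr_max).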
link |
01:02:02.200
What do you think the future of,
link |
01:02:03.840
I mean, it makes so much sense
link |
01:02:04.880
for that to be a critical hyperparameter learning rate
link |
01:02:07.600
that you vary.
link |
01:02:08.640
What do you think the future
link |
01:02:09.520
of learning rate magic looks like?
link |
01:02:13.480
Well, there's been a lot of great work
link |
01:02:14.920
in the last 12 months in this area.
link |
01:02:17.400
And people are increasingly realizing that optimizers,
link |
01:02:20.160
like we just have no idea really how optimizers work.
link |
01:02:23.120
And the combination of weight decay,
link |
01:02:25.840
which is how we regularize optimizers,
link |
01:02:27.480
and the learning rate,
link |
01:02:29.200
and then other things like the epsilon we use
link |
01:02:31.520
in the Adam optimizer,
link |
01:02:32.760
they all work together in weird ways.
link |
01:02:36.560
And different parts of the model,
link |
01:02:38.560
this is another thing we've done a lot of work on
link |
01:02:40.480
is research into how different parts of the model
link |
01:02:43.480
should be trained at different rates in different ways.
link |
01:02:46.640
So we do something we call discriminative learning rates,
link |
01:02:49.040
which is really important,
link |
01:02:50.160
particularly for transfer learning.
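(A sketch of discriminative learning rates in fastai, where slice(lo, hi) spreads learning rates from the earliest to the latest layer groups; the dataset and the rates themselves are just placeholders.)

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS) / 'images'

def is_cat(f): return f.name[0].isupper()   # in this dataset, cat filenames are capitalized

dls = ImageDataLoaders.from_name_func(path, get_image_files(path),
                                      label_func=is_cat, item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)

learn.fit_one_cycle(1)                      # train the new head while the body stays frozen
learn.unfreeze()
# Discriminative learning rates: early, general-purpose layers take tiny steps,
# later, task-specific layers take much larger ones.
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-4))
```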
link |
01:02:53.240
So really, I think in the last 12 months,
link |
01:02:54.920
a lot of people have realized
link |
01:02:55.880
that all this stuff is important.
link |
01:02:57.400
There's been a lot of great work coming out
link |
01:03:00.000
and we're starting to see algorithms appear,
link |
01:03:03.680
which have very, very few dials, if any,
link |
01:03:06.920
that you have to touch.
link |
01:03:07.960
So I think what's gonna happen
link |
01:03:09.280
is the idea of a learning rate,
link |
01:03:10.440
well, it almost already has disappeared
link |
01:03:12.840
in the latest research.
link |
01:03:14.360
And instead, it's just like we know enough
link |
01:03:18.240
about how to interpret the gradients
link |
01:03:22.600
and the change of gradients we see
link |
01:03:23.840
to know how to set every parameter
link |
01:03:25.320
in an optimal way.
link |
01:03:26.160
So you see the future of deep learning
link |
01:03:30.840
where really, where's the input of a human expert needed?
link |
01:03:34.560
Well, hopefully the input of a human expert
link |
01:03:36.520
will be almost entirely unneeded
link |
01:03:38.760
from the deep learning point of view.
link |
01:03:40.440
So again, like Google's approach to this
link |
01:03:43.480
is to try and use thousands of times more compute
link |
01:03:46.000
to run lots and lots of models at the same time
link |
01:03:49.400
and hope that one of them is good.
link |
01:03:51.080
AutoML kind of thing?
link |
01:03:51.920
Yeah, AutoML kind of stuff, which I think is insane.
link |
01:03:56.720
When you better understand the mechanics
link |
01:03:59.600
of how models learn,
link |
01:04:01.680
you don't have to try a thousand different models
link |
01:04:03.800
to find which one happens to work the best.
link |
01:04:05.640
You can just jump straight to the best one,
link |
01:04:08.120
which means that it's more accessible
link |
01:04:09.720
in terms of compute, cheaper,
link |
01:04:12.720
and also with less hyperparameters to set,
link |
01:04:14.920
it means you don't need deep learning experts
link |
01:04:16.800
to train your deep learning model for you,
link |
01:04:19.320
which means that domain experts can do more of the work,
link |
01:04:22.280
which means that now you can focus the human time
link |
01:04:24.960
on the kind of interpretation, the data gathering,
link |
01:04:28.320
identifying model errors and stuff like that.
link |
01:04:31.360
Yeah, the data side.
link |
01:04:32.840
How often do you work with data these days
link |
01:04:34.720
in terms of the cleaning, looking at it?
link |
01:04:37.800
Like Darwin looked at different species
link |
01:04:41.120
while traveling about.
link |
01:04:42.880
Do you look at data?
link |
01:04:45.000
Have you in your roots in Kaggle?
link |
01:04:48.040
Always, yeah.
link |
01:04:48.880
Look at data.
link |
01:04:49.720
Yeah, I mean, it's a key part of our course.
link |
01:04:51.320
It's like before we train a model in the course,
link |
01:04:53.480
we see how to look at the data.
link |
01:04:55.200
And then the first thing we do
link |
01:04:56.520
after we train our first model,
link |
01:04:57.920
which is an ImageNet model fine-tuned for five minutes.
link |
01:05:00.520
And then the thing we immediately do after that
link |
01:05:02.240
is we learn how to analyze the results of the model
link |
01:05:05.800
by looking at examples of misclassified images
link |
01:05:08.920
and looking at a classification matrix,
link |
01:05:10.880
and then doing research on Google
link |
01:05:15.080
to learn about the kinds of things that it's misclassifying.
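(Roughly what that looks like in a fastai notebook, with an example dataset: a few minutes of fine-tuning, then interrogating the model's mistakes rather than just reading off a metric.)

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS) / 'images'
dls = ImageDataLoaders.from_name_re(path, get_image_files(path),
                                    pat=r'(.+)_\d+.jpg$',    # breed name from the filename
                                    item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)                                           # a few minutes on one GPU

# Then interrogate the model instead of just reading off a metric.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(8, 8))   # which breeds get confused with which
interp.plot_top_losses(9)                      # the most confidently wrong images
```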
link |
01:05:18.120
So to me, one of the really cool things
link |
01:05:19.520
about machine learning models in general
link |
01:05:21.840
is that when you interpret them,
link |
01:05:24.320
they tell you about things like
link |
01:05:25.400
what are the most important features,
link |
01:05:27.320
which groups are you misclassifying,
link |
01:05:29.360
and they help you become a domain expert more quickly
link |
01:05:32.440
because you can focus your time on the bits
link |
01:05:34.840
that the model is telling you is important.
link |
01:05:38.680
So it lets you deal with things like data leakage,
link |
01:05:40.720
for example, if it says,
link |
01:05:41.720
oh, the main feature I'm looking at is customer ID.
link |
01:05:45.640
And you're like, oh, customer ID shouldn't be predictive.
link |
01:05:47.600
And then you can talk to the people
link |
01:05:50.640
that manage customer IDs and they'll tell you like,
link |
01:05:53.240
oh yes, as soon as a customer's application is accepted,
link |
01:05:57.480
we add a one on the end of their customer ID or something.
link |
01:06:01.160
So yeah, looking at data,
link |
01:06:03.720
particularly from the lens of which parts of the data
link |
01:06:06.000
the model says is important is super important.
link |
01:06:09.360
Yeah, and using the model to almost debug the data
link |
01:06:12.920
to learn more about the data.
link |
01:06:14.240
Exactly.
link |
01:06:16.800
What are the different cloud options
link |
01:06:18.600
for training your own networks?
link |
01:06:20.160
Last question related to DawnBench.
link |
01:06:21.960
Well, it's part of a lot of the work you do,
link |
01:06:24.200
but from a perspective of performance,
link |
01:06:27.240
I think you've written this in a blog post.
link |
01:06:29.440
There's AWS, there's TPU from Google.
link |
01:06:32.720
What's your sense?
link |
01:06:33.560
What the future holds?
link |
01:06:34.480
What would you recommend now in terms of training?
link |
01:06:37.360
So from a hardware point of view,
link |
01:06:40.520
Google's TPUs and the best Nvidia GPUs are similar.
link |
01:06:45.320
I mean, maybe the TPUs are like 30% faster,
link |
01:06:47.920
but they're also much harder to program.
link |
01:06:49.920
There isn't a clear leader in terms of hardware right now,
link |
01:06:54.640
although much more importantly,
link |
01:06:56.240
the Nvidia GPUs are much more programmable.
link |
01:06:59.520
There's much more software written for them.
link |
01:07:00.920
So like that's the clear leader for me
link |
01:07:03.120
and where I would spend my time
link |
01:07:04.360
as a researcher and practitioner.
link |
01:07:08.560
But then in terms of the platform,
link |
01:07:12.160
I mean, we're super lucky now with stuff like Google GCP,
link |
01:07:16.200
Google Cloud, and AWS that you can access a GPU
link |
01:07:21.480
pretty quickly and easily.
link |
01:07:25.400
But I mean, for AWS, it's still too hard.
link |
01:07:28.040
Like you have to find an AMI and get the instance running
link |
01:07:33.720
and then install the software you want and blah, blah, blah.
link |
01:07:37.040
GCP is currently the best way to get started
link |
01:07:40.720
on a full server environment
link |
01:07:42.280
because they have a fantastic fastai and PyTorch ready
link |
01:07:46.360
to go instance, which has all the courses preinstalled.
link |
01:07:51.040
It has Jupyter Notebook already running.
link |
01:07:53.000
Jupyter Notebook is this wonderful
link |
01:07:55.880
interactive computing system,
link |
01:07:57.560
which everybody basically should be using
link |
01:08:00.360
for any kind of data driven research.
link |
01:08:02.880
But then even better than that,
link |
01:08:05.600
there are platforms like Salamander, which we own
link |
01:08:09.480
and Paperspace, where literally you click a single button
link |
01:08:13.560
and it pops up a Jupyter Notebook straight away
link |
01:08:17.200
without any kind of installation or anything.
link |
01:08:22.200
And all the course notebooks are all preinstalled.
link |
01:08:25.800
So like for me, this is one of the things
link |
01:08:28.560
we spent a lot of time kind of curating and working on.
link |
01:08:34.200
Because when we first started our courses,
link |
01:08:35.960
the biggest problem was people dropped out of lesson one
link |
01:08:39.600
because they couldn't get an AWS instance running.
link |
01:08:42.680
So things are so much better now.
link |
01:08:44.880
And like we actually have, if you go to course.fast.ai,
link |
01:08:47.800
the first thing it says is here's how to get started
link |
01:08:49.680
with your GPU.
link |
01:08:50.520
And there's like, you just click on the link
link |
01:08:52.120
and you click start and you're going.
link |
01:08:55.360
You'll go GCP.
link |
01:08:56.280
I have to confess, I've never used the Google GCP.
link |
01:08:58.800
Yeah, GCP gives you $300 of compute for free,
link |
01:09:01.640
which is really nice.
link |
01:09:03.920
But as I say, Salamander and Paperspace
link |
01:09:07.280
are even easier still.
link |
01:09:09.440
Okay.
link |
01:09:10.960
So from the perspective of deep learning frameworks,
link |
01:09:15.080
you work with fast.ai, if you can call it a framework,
link |
01:09:18.440
and PyTorch and TensorFlow.
link |
01:09:21.240
What are the strengths of each platform in your perspective?
link |
01:09:25.800
So in terms of what we've done our research on
link |
01:09:28.760
and taught in our course,
link |
01:09:30.240
we started with Theano and Keras,
link |
01:09:34.360
and then we switched to TensorFlow and Keras,
link |
01:09:38.080
and then we switched to PyTorch,
link |
01:09:40.360
and then we switched to PyTorch and fast.ai.
link |
01:09:42.960
And that kind of reflects a growth and development
link |
01:09:47.560
of the ecosystem of deep learning libraries.
link |
01:09:52.560
Theano and TensorFlow were great,
link |
01:09:57.080
but were much harder to teach and to do research
link |
01:10:00.800
and development on because they define
link |
01:10:02.800
what's called a computational graph upfront,
link |
01:10:05.080
a static graph, where you basically have to say,
link |
01:10:07.520
here are all the things that I'm gonna eventually do
link |
01:10:10.880
in my model, and then later on you say,
link |
01:10:13.240
okay, do those things with this data.
link |
01:10:15.120
And you can't like debug them,
link |
01:10:17.160
you can't do them step by step,
link |
01:10:18.560
you can't program them interactively
link |
01:10:20.160
in a Jupyter notebook and so forth.
link |
01:10:22.320
PyTorch was not the first,
link |
01:10:23.760
but PyTorch was certainly the strongest entrant
link |
01:10:26.880
to come along and say, let's not do it that way,
link |
01:10:28.720
let's just use normal Python.
link |
01:10:31.400
And everything you know about in Python
link |
01:10:32.920
is just gonna work, and we'll figure out
link |
01:10:35.280
how to make that run on the GPU as and when necessary.
link |
01:10:40.840
That turned out to be a huge leap
link |
01:10:44.640
in terms of what we could do with our research
link |
01:10:46.840
and what we could do with our teaching.
link |
01:10:49.760
Because it wasn't limiting.
link |
01:10:51.240
Yeah, I mean, it was critical for us
link |
01:10:52.760
for something like DawnBench
link |
01:10:53.880
to be able to rapidly try things.
link |
01:10:55.960
It's just so much harder to be a researcher
link |
01:10:57.840
and practitioner when you have to do everything upfront
link |
01:11:00.520
and you can't inspect it.
link |
01:11:03.400
Problem with PyTorch is it's not at all accessible
link |
01:11:07.960
to newcomers because you have to like
link |
01:11:10.160
write your own training loop and manage the gradients
link |
01:11:12.920
and all this stuff.
link |
01:11:15.680
And it's also like not great for researchers
link |
01:11:17.880
because you're spending your time dealing
link |
01:11:19.640
with all this boilerplate and overhead
link |
01:11:21.640
rather than thinking about your algorithm.
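(For contrast, the boilerplate being described looks roughly like this in plain PyTorch, with toy data and a toy model just to show the shape of the loop.)

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model, purely to show the shape of the loop.
ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
dl = DataLoader(ds, batch_size=32, shuffle=True)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for xb, yb in dl:
        opt.zero_grad()                    # manage the gradients yourself...
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()                         # ...and the optimizer step, every single time
```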
link |
01:11:23.880
So we ended up writing this very multi layered API
link |
01:11:27.760
that at the top level, you can train
link |
01:11:29.960
a state of the art neural network
link |
01:11:31.400
in three lines of code.
link |
01:11:33.640
And which kind of talks to an API,
link |
01:11:35.120
which talks to an API, which talks to an API,
link |
01:11:36.680
which like you can dive into at any level
link |
01:11:38.880
and get progressively closer to the machine
link |
01:11:42.720
kind of levels of control.
link |
01:11:45.360
And this is the fast AI library.
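(A sketch of that layering with an example dataset: the top-level call is close to the few-lines version described, and the DataBlock below it is one level down, where each piece of the pipeline is explicit and swappable. Dataset, architecture, and sizes are illustrative.)

```python
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)

# Top layer: a strong transfer-learning run in a few lines.
dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(160))
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(1)

# One layer down: the same data pipeline with the DataBlock API, where each
# piece (items, split, labels, transforms) can be customized independently.
block = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=GrandparentSplitter(valid_name='val'),
                  get_y=parent_label,
                  item_tfms=Resize(160))
dls = block.dataloaders(path)
```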
link |
01:11:47.480
That's been critical for us and for our students
link |
01:11:51.840
and for lots of people that have won deep learning
link |
01:11:54.200
competitions with it and written academic papers with it.
link |
01:11:58.400
It's made a big difference.
link |
01:12:00.640
We're still limited though by Python.
link |
01:12:03.920
And particularly this problem with things like
link |
01:12:06.400
recurrent neural nets say where you just can't change things
link |
01:12:11.400
unless you accept it going so slowly that it's impractical.
link |
01:12:15.640
So in the latest incarnation of the course
link |
01:12:18.320
and with some of the research we're now starting to do,
link |
01:12:20.880
we're starting to do stuff, some stuff in Swift.
link |
01:12:24.520
I think we're three years away from that
link |
01:12:28.040
being super practical, but I'm in no hurry.
link |
01:12:31.040
I'm very happy to invest the time to get there.
link |
01:12:35.520
But with that, we actually already have a nascent version
link |
01:12:39.040
of the fast AI library for vision running
link |
01:12:42.520
on Swift and TensorFlow.
link |
01:12:44.760
Cause Python for TensorFlow is not gonna cut it.
link |
01:12:48.040
It's just a disaster.
link |
01:12:49.960
What they did was they tried to replicate
link |
01:12:53.960
the bits that people were saying they like about PyTorch,
link |
01:12:57.120
this kind of interactive computation,
link |
01:12:59.200
but they didn't actually change
link |
01:13:00.640
their foundational runtime components.
link |
01:13:03.920
So they kind of added this like syntax sugar
link |
01:13:06.640
they call TF Eager, TensorFlow Eager,
link |
01:13:08.400
which makes it look a lot like PyTorch,
link |
01:13:10.920
but it's 10 times slower than PyTorch
link |
01:13:12.760
to actually do a step.
link |
01:13:16.400
So because they didn't invest the time in like retooling
link |
01:13:20.200
the foundations, cause their code base is so horribly
link |
01:13:23.280
complex.
link |
01:13:24.120
Yeah, I think it's probably very difficult
link |
01:13:25.280
to do that kind of retooling.
link |
01:13:26.440
Yeah, well, particularly the way TensorFlow was written,
link |
01:13:28.640
it was written by a lot of people very quickly
link |
01:13:31.480
in a very disorganized way.
link |
01:13:33.320
So like when you actually look in the code,
link |
01:13:35.000
as I do often, I'm always just like,
link |
01:13:37.080
Oh God, what were they thinking?
link |
01:13:38.840
It's just, it's pretty awful.
link |
01:13:41.400
So I'm really extremely negative
link |
01:13:45.240
about the potential future for Python for TensorFlow.
link |
01:13:50.080
But Swift for TensorFlow can be a different beast altogether.
link |
01:13:53.760
It can be like, it can basically be a layer on top of MLIR
link |
01:13:57.560
that takes advantage of, you know,
link |
01:14:00.440
all the great compiler stuff that Swift builds on with LLVM
link |
01:14:04.760
and yeah, I think it will be absolutely fantastic.
link |
01:14:10.280
Well, you're inspiring me to try.
link |
01:14:11.880
I haven't truly felt the pain of TensorFlow 2.0 Python.
link |
01:14:17.640
It's fine by me, but of...
link |
01:14:21.040
Yeah, I mean, it does the job
link |
01:14:22.120
if you're using like predefined things
link |
01:14:25.120
that somebody has already written.
link |
01:14:27.720
But if you actually compare, you know,
link |
01:14:29.560
like I've had to do,
link |
01:14:31.360
cause I've been having to do a lot of stuff
link |
01:14:32.640
with TensorFlow recently,
link |
01:14:33.680
you actually compare like,
link |
01:14:34.760
okay, I want to write something from scratch
link |
01:14:37.360
and you're like, I just keep finding it's like,
link |
01:14:38.880
Oh, it's running 10 times slower than PyTorch.
link |
01:14:41.520
So is the biggest cost,
link |
01:14:43.800
let's throw running time out the window.
link |
01:14:47.320
How long it takes you to program?
link |
01:14:49.600
That's not too different now,
link |
01:14:50.960
thanks to TensorFlow Eager, that's not too different.
link |
01:14:54.040
But because so many things take so long to run,
link |
01:14:58.640
you wouldn't run it at 10 times slower.
link |
01:15:00.280
Like you just go like, Oh, this is taking too long.
link |
01:15:03.240
And also there's a lot of things
link |
01:15:04.240
which are just less programmable,
link |
01:15:05.840
like tf.data, which is the way data processing works
link |
01:15:08.960
in TensorFlow is just this big mess.
link |
01:15:11.360
It's incredibly inefficient.
link |
01:15:13.200
And they kind of had to write it that way
link |
01:15:14.800
because of the TPU problems I described earlier.
link |
01:15:19.160
So I just, you know,
link |
01:15:22.160
I just feel like they've got this huge technical debt,
link |
01:15:24.720
which they're not going to solve
link |
01:15:26.200
without starting from scratch.
link |
01:15:27.920
So here's an interesting question then,
link |
01:15:29.400
if there's a new student starting today,
link |
01:15:34.560
what would you recommend they use?
link |
01:15:37.480
Well, I mean, we obviously recommend Fastai and PyTorch
link |
01:15:40.440
because we teach new students and that's what we teach with.
link |
01:15:43.880
So we would very strongly recommend that
link |
01:15:46.080
because it will let you get on top of the concepts
link |
01:15:50.000
much more quickly.
link |
01:15:51.920
So then you'll become an actual,
link |
01:15:53.120
and you'll also learn the actual state
link |
01:15:54.920
of the art techniques, you know,
link |
01:15:56.400
so you actually get world class results.
link |
01:15:59.200
Honestly, it doesn't much matter what library you learn
link |
01:16:03.920
because switching from Chainer to MXNet
link |
01:16:08.320
to TensorFlow to PyTorch is gonna be a couple of days work
link |
01:16:12.000
as long as you understand the foundations well.
link |
01:16:15.240
But you think will Swift creep in there
link |
01:16:19.400
as a thing that people start using?
link |
01:16:22.920
Not for a few years,
link |
01:16:24.360
particularly because like Swift has no data science
link |
01:16:29.720
community, libraries, tooling.
link |
01:16:33.400
And the Swift community has a total lack of appreciation
link |
01:16:39.080
and understanding of numeric computing.
link |
01:16:40.880
So like they keep on making stupid decisions, you know,
link |
01:16:43.600
for years, they've just done dumb things
link |
01:16:45.440
around performance and prioritization.
link |
01:16:50.240
That's clearly changing now
link |
01:16:53.440
because the developer of Swift, Chris Latner,
link |
01:16:58.000
is working at Google on Swift for TensorFlow.
link |
01:17:00.720
So like that's a priority.
link |
01:17:04.120
It'll be interesting to see what happens with Apple
link |
01:17:05.800
because like Apple hasn't shown any sign of caring
link |
01:17:10.760
about numeric programming in Swift.
link |
01:17:13.760
So I mean, hopefully they'll get off their ass
link |
01:17:17.360
and start appreciating this
link |
01:17:18.800
because currently all of their low level libraries
link |
01:17:22.200
are not written in Swift.
link |
01:17:25.080
They're not particularly Swifty at all,
link |
01:17:27.360
stuff like CoreML, they're really pretty rubbish.
link |
01:17:30.760
So yeah, so there's a long way to go.
link |
01:17:33.680
But at least one nice thing is that Swift for TensorFlow
link |
01:17:36.080
can actually directly use Python code and Python libraries
link |
01:17:40.760
and literally the entire lesson one notebook of fast AI
link |
01:17:45.040
runs in Swift right now in Python mode.
link |
01:17:48.560
So that's a nice intermediate thing.
link |
01:17:51.640
How long does it take?
link |
01:17:53.320
If you look at the two fast AI courses,
link |
01:17:57.560
how long does it take to get from point zero
link |
01:18:00.440
to completing both courses?
link |
01:18:03.240
It varies a lot.
link |
01:18:05.720
Somewhere between two months and two years generally.
link |
01:18:13.120
So for two months, how many hours a day on average?
link |
01:18:16.040
So like somebody who is a very competent coder
link |
01:18:20.480
can do it in 70 hours per course.
link |
01:18:27.800
70, seven zero, that's it, okay.
link |
01:18:30.760
But a lot of people I know take a year off
link |
01:18:35.640
to study fast AI full time and say at the end of the year,
link |
01:18:40.440
they feel pretty competent
link |
01:18:43.440
because generally there's a lot of other things you do
link |
01:18:45.560
like generally they'll be entering Kaggle competitions,
link |
01:18:48.680
they might be reading Ian Goodfellow's book,
link |
01:18:51.440
they might, they'll be doing a bunch of stuff
link |
01:18:54.560
and often particularly if they are a domain expert,
link |
01:18:57.760
their coding skills might be a little
link |
01:19:00.560
on the pedestrian side.
link |
01:19:01.720
So part of it's just like doing a lot more writing.
link |
01:19:04.760
What do you find is the bottleneck for people usually
link |
01:19:07.960
except getting started and setting stuff up?
link |
01:19:11.720
I would say coding.
link |
01:19:13.360
Yeah, I would say the best,
link |
01:19:14.320
the people who are strong coders pick it up the best.
link |
01:19:18.800
Although another bottleneck is people who have a lot
link |
01:19:21.640
of experience of classic statistics can really struggle
link |
01:19:27.440
because the intuition is so the opposite
link |
01:19:30.000
of what they're used to.
link |
01:19:30.880
They're very used to like trying to reduce the number
link |
01:19:33.040
of parameters in their model
link |
01:19:34.320
and looking at individual coefficients and stuff like that.
link |
01:19:39.400
So I find people who have a lot of coding background
link |
01:19:42.920
and know nothing about statistics
link |
01:19:44.640
are generally gonna be the best off.
link |
01:19:48.560
So you taught several courses on deep learning
link |
01:19:51.360
and as Feynman says,
link |
01:19:52.960
best way to understand something is to teach it.
link |
01:19:55.640
What have you learned about deep learning from teaching it?
link |
01:19:59.160
A lot.
link |
01:20:00.600
That's a key reason for me to teach the courses.
link |
01:20:03.560
I mean, obviously it's gonna be necessary
link |
01:20:04.960
to achieve our goal of getting domain experts
link |
01:20:07.680
to be familiar with deep learning,
link |
01:20:09.320
but it was also necessary for me to achieve my goal
link |
01:20:12.080
of being really familiar with deep learning.
link |
01:20:18.240
I mean, to see so many domain experts
link |
01:20:24.080
from so many different backgrounds,
link |
01:20:25.680
it's definitely, I wouldn't say taught me,
link |
01:20:28.840
but convinced me something that I liked to believe was true,
link |
01:20:32.200
which was anyone can do it.
link |
01:20:34.920
So there's a lot of kind of snobbishness out there
link |
01:20:37.440
about only certain people can learn to code.
link |
01:20:40.240
Only certain people are gonna be smart enough
link |
01:20:42.000
to do AI, that's definitely bullshit.
link |
01:20:45.360
I've seen so many people from so many different backgrounds
link |
01:20:48.880
get state of the art results in their domain areas now.
link |
01:20:53.880
It's definitely taught me that the key differentiator
link |
01:20:57.160
between people that succeed
link |
01:20:58.720
and people that fail is tenacity.
link |
01:21:00.680
That seems to be basically the only thing that matters.
link |
01:21:05.560
A lot of people give up.
link |
01:21:06.760
But of the ones who don't give up,
link |
01:21:09.760
pretty much everybody succeeds.
link |
01:21:12.760
Even if at first I'm just kind of like thinking like,
link |
01:21:15.640
wow, they really aren't quite getting it yet, are they?
link |
01:21:18.440
But eventually people get it and they succeed.
link |
01:21:22.560
So I think that's been,
link |
01:21:24.240
I think they're both things I liked to believe were true,
link |
01:21:26.560
but I don't feel like I really had strong evidence
link |
01:21:28.680
for them to be true,
link |
01:21:29.520
but now I can say I've seen it again and again.
link |
01:21:32.520
I've seen it again and again. So what advice do you have
link |
01:21:37.760
for someone who wants to get started in deep learning?
link |
01:21:42.200
Train lots of models.
link |
01:21:44.400
That's how you learn it.
link |
01:21:47.080
So I think, it's not just me,
link |
01:21:51.600
I think our course is very good,
link |
01:21:53.360
but also lots of people independently
link |
01:21:54.760
have said it's very good.
link |
01:21:55.600
It recently won the CogX award for AI courses
link |
01:21:58.640
as being the best in the world.
link |
01:21:59.920
So I'd say come to our course, course.fast.ai.
link |
01:22:02.960
And the thing I keep on harping on in my lessons
link |
01:22:05.240
is train models, print out the inputs to the models,
link |
01:22:09.120
print out the outputs of the models,
link |
01:22:11.040
like study, change the inputs a bit,
link |
01:22:15.320
look at how the outputs vary,
link |
01:22:17.320
just run lots of experiments
link |
01:22:18.600
to get an intuitive understanding of what's going on.
link |
01:22:25.400
To get hooked, do you think, you mentioned training,
link |
01:22:29.080
do you think just running the models inference,
link |
01:22:32.640
like if we talk about getting started?
link |
01:22:35.400
No, you've got to fine tune the models.
link |
01:22:37.480
So that's the critical thing,
link |
01:22:39.480
because at that point you now have a model
link |
01:22:41.240
that's in your domain area.
link |
01:22:43.280
So there's no point running somebody else's model
link |
01:22:46.840
because it's not your model.
link |
01:22:48.120
So it only takes five minutes to fine tune a model
link |
01:22:50.480
for the data you care about.
link |
01:22:52.080
And in lesson two of the course,
link |
01:22:53.560
we teach you how to create your own data set from scratch
link |
01:22:56.360
by scripting Google image search.
link |
01:22:58.560
So, and we show you how to actually create
link |
01:23:01.120
a web application running online.
link |
01:23:02.840
So I create one in the course that differentiates
link |
01:23:05.280
between a teddy bear, a grizzly bear and a brown bear.
link |
01:23:08.320
And it does it with basically 100% accuracy,
link |
01:23:11.040
took me about four minutes to scrape the images
link |
01:23:13.120
from Google search in the script.
link |
01:23:15.080
There are little graphical widgets we have in the notebook
link |
01:23:18.760
that help you clean up the data set.
link |
01:23:21.400
There's other widgets that help you study the results
link |
01:23:24.040
to see where the errors are happening.
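(A rough sketch of that lesson-two workflow; image_search_urls below is a hypothetical stand-in for whatever image-search scraping you script, while the download, verify, and cleaning helpers are fastai's. Class names, counts, and epochs are illustrative.)

```python
from fastai.vision.all import *

def image_search_urls(query, n=150):
    """Hypothetical stand-in: return n image URLs for `query` from whatever
    image-search API you script (the course shows one way to do this)."""
    raise NotImplementedError

root = Path('bears')
for kind in ['teddy', 'grizzly', 'brown']:
    dest = root / kind
    dest.mkdir(parents=True, exist_ok=True)
    download_images(dest, urls=image_search_urls(f'{kind} bear'))

failed = verify_images(get_image_files(root))
failed.map(Path.unlink)                 # drop anything that didn't download cleanly

dls = ImageDataLoaders.from_folder(root, valid_pct=0.2, item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(3)
# In a notebook, fastai's ImageClassifierCleaner widget can then be used to
# relabel or delete the worst images before retraining.
```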
link |
01:23:26.360
And so now we've got over a thousand replies
link |
01:23:29.280
in our share your work here thread
link |
01:23:31.400
of students saying, here's the thing I built.
link |
01:23:34.280
And so there's people who like,
link |
01:23:35.880
and a lot of them are state of the art.
link |
01:23:37.600
Like somebody said, oh, I tried looking
link |
01:23:39.000
at Devanagari characters and I couldn't believe it.
link |
01:23:41.160
The thing that came out was more accurate
link |
01:23:43.320
than the best academic paper after lesson one.
link |
01:23:46.640
And then there's others which are just more kind of fun,
link |
01:23:48.560
like somebody who's doing Trinidad and Tobago hummingbirds.
link |
01:23:53.080
She said that's kind of their national bird
link |
01:23:54.880
and she's got something that can now classify Trinidad
link |
01:23:57.400
and Tobago hummingbirds.
link |
01:23:58.840
So yeah, train models, fine tune models with your data set
link |
01:24:02.440
and then study their inputs and outputs.
link |
01:24:05.200
How much is Fast.ai courses?
link |
01:24:07.160
Free.
link |
01:24:08.920
Everything we do is free.
link |
01:24:10.520
We have no revenue sources of any kind.
link |
01:24:12.720
It's just a service to the community.
link |
01:24:15.400
You're a saint.
link |
01:24:16.600
Okay, once a person understands the basics,
link |
01:24:20.080
trains a bunch of models,
link |
01:24:22.360
if we look at the scale of years,
link |
01:24:25.840
what advice do you have for someone wanting
link |
01:24:27.600
to eventually become an expert?
link |
01:24:30.800
Train lots of models.
link |
01:24:31.800
But specifically train lots of models in your domain area.
link |
01:24:35.320
So an expert in what, right?
link |
01:24:37.040
We don't need more experts
link |
01:24:39.120
creating slightly evolutionary research in areas
link |
01:24:45.400
that everybody's studying.
link |
01:24:46.680
We need experts at using deep learning
link |
01:24:50.400
to diagnose malaria.
link |
01:24:52.600
Or we need experts at using deep learning
link |
01:24:55.480
to analyze language to study media bias.
link |
01:25:01.000
So we need experts in analyzing fisheries
link |
01:25:08.320
to identify problem areas in the ocean.
link |
01:25:11.880
That's what we need.
link |
01:25:13.200
So become the expert in your passion area.
link |
01:25:17.720
And this is a tool which you can use for just about anything
link |
01:25:21.200
and you'll be able to do that thing better
link |
01:25:22.880
than other people, particularly by combining it
link |
01:25:25.720
with your passion and domain expertise.
link |
01:25:27.400
So that's really interesting.
link |
01:25:28.360
Even if you do wanna innovate on transfer learning
link |
01:25:30.840
or active learning, your thought is,
link |
01:25:34.000
I mean, it's one I certainly share,
link |
01:25:36.200
is you also need to find a domain or data set
link |
01:25:40.120
that you actually really care for.
link |
01:25:42.000
If you're not working on a real problem that you understand,
link |
01:25:45.360
how do you know if you're doing it any good?
link |
01:25:48.040
How do you know if your results are good?
link |
01:25:49.320
How do you know if you're getting bad results?
link |
01:25:50.800
Why are you getting bad results?
link |
01:25:52.040
Is it a problem with the data?
link |
01:25:54.080
Like, how do you know you're doing anything useful?
link |
01:25:57.400
Yeah, to me, the only really interesting research is,
link |
01:26:00.960
not the only, but the vast majority
link |
01:26:02.360
of interesting research is like,
link |
01:26:04.480
try and solve an actual problem and solve it really well.
link |
01:26:06.880
So both understanding sufficient tools
link |
01:26:09.440
on the deep learning side and becoming a domain expert
link |
01:26:13.720
in a particular domain are really things
link |
01:26:15.640
within reach for anybody.
link |
01:26:18.240
Yeah, I mean, to me, I would compare it
link |
01:26:20.520
to like studying self driving cars,
link |
01:26:23.440
having never looked at a car or been in a car
link |
01:26:26.520
or turned a car on, which is like the way it is
link |
01:26:29.320
for a lot of people, they'll study some academic data set
link |
01:26:33.960
where they literally have no idea about that.
link |
01:26:36.200
By the way, I'm not sure how familiar you are
link |
01:26:37.680
with autonomous vehicles, but that is literally
link |
01:26:40.840
how you'd describe a large percentage of robotics folks
link |
01:26:43.400
working on self driving cars: they actually
link |
01:26:45.800
haven't considered driving.
link |
01:26:48.640
They haven't actually looked at what driving looks like.
link |
01:26:50.560
They haven't driven.
link |
01:26:51.400
And it's a problem because you know,
link |
01:26:53.280
when you've actually driven, you know,
link |
01:26:54.360
like these are the things that happened
link |
01:26:55.920
to me when I was driving.
link |
01:26:57.400
There's nothing that beats the real world examples
link |
01:26:59.640
of just experiencing them.
link |
01:27:02.360
You've created many successful startups.
link |
01:27:04.840
What does it take to create a successful startup?
link |
01:27:08.600
Same thing as becoming a successful
link |
01:27:11.480
deep learning practitioner, which is not giving up.
link |
01:27:15.000
So you can run out of money or run out of time
link |
01:27:23.160
or run out of something, you know,
link |
01:27:24.680
but if you keep costs super low
link |
01:27:28.000
and try and save up some money beforehand
link |
01:27:29.920
so you can afford to have some time,
link |
01:27:35.360
then just sticking with it is one important thing.
link |
01:27:38.040
Doing something you understand and care about is important.
link |
01:27:42.640
By something, I don't mean,
link |
01:27:44.840
the biggest problem I see with deep learning people
link |
01:27:46.680
is they do a PhD in deep learning
link |
01:27:50.120
and then they try and commercialize their PhD.
link |
01:27:52.400
It is a waste of time
link |
01:27:53.280
because that doesn't solve an actual problem.
link |
01:27:55.840
You picked your PhD topic
link |
01:27:57.560
because it was an interesting kind of engineering
link |
01:28:00.080
or math or research exercise.
link |
01:28:02.480
But yeah, if you've actually spent time as a recruiter
link |
01:28:06.640
and you know that most of your time was spent
link |
01:28:09.240
sifting through resumes
link |
01:28:10.640
and you know that most of the time
link |
01:28:12.840
you're just looking for certain kinds of things
link |
01:28:14.680
and you can try doing that with a model for a few minutes
link |
01:28:19.680
and see whether that's something which a model
link |
01:28:21.000
seems to be able to do as well as you could,
link |
01:28:23.720
then you're on the right track to creating a startup.
link |
01:28:27.600
And then I think, yeah, just be pragmatic and
link |
01:28:32.280
try and stay away from venture capital money
link |
01:28:36.760
as long as possible, preferably forever.
link |
01:28:39.160
So yeah, on that point, about venture capital,
link |
01:28:43.400
were you able to successfully run startups
link |
01:28:47.120
self funded for quite a while?
link |
01:28:48.200
Yeah, so my first two were self funded
link |
01:28:50.160
and that was the right way to do it.
link |
01:28:52.320
Is that scary?
link |
01:28:54.240
No, VC startups are much more scary
link |
01:28:57.800
because you have these people on your back
link |
01:29:00.640
who do this all the time and who have done it for years
link |
01:29:03.320
telling you grow, grow, grow, grow.
link |
01:29:05.400
And they don't care if you fail.
link |
01:29:07.160
They only care if you don't grow fast enough.
link |
01:29:09.440
So that's scary.
link |
01:29:10.800
Whereas doing the ones myself, well, with partners
link |
01:29:16.600
who were friends was nice
link |
01:29:18.400
because like we just went along at a pace that made sense
link |
01:29:22.360
and we were able to build it to something
link |
01:29:23.760
which was big enough that we never had to work again
link |
01:29:27.280
but was not big enough that any VC
link |
01:29:29.280
would think it was impressive.
link |
01:29:31.480
And that was enough for us to be excited, you know?
link |
01:29:35.920
So I thought that's a much better way
link |
01:29:38.840
to do things than most people.
link |
01:29:40.280
Generally speaking, not for yourself,
link |
01:29:41.920
but how do you make money during that process?
link |
01:29:44.520
Do you cut into savings?
link |
01:29:47.440
So yeah, so I started FastMail
link |
01:29:49.840
and Optimal Decisions at the same time in 1999
link |
01:29:52.760
with two different friends.
link |
01:29:54.560
And for FastMail, I guess I spent $70 a month
link |
01:30:01.160
on the server.
link |
01:30:04.000
And when the server ran out of space
link |
01:30:06.240
I put a payments button on the front page
link |
01:30:09.400
and said, if you want more than 10 megs of space
link |
01:30:11.880
you have to pay $10 a year.
link |
01:30:15.640
And.
link |
01:30:16.480
So run low, like keep your costs down.
link |
01:30:18.520
Yeah, so I kept my costs down.
link |
01:30:19.480
And once, you know, once I needed to spend more money
link |
01:30:22.960
I asked people to spend the money for me.
link |
01:30:25.600
And that, that was that.
link |
01:30:28.400
Basically from then on, we were making money
link |
01:30:30.800
and it was profitable from then on.
link |
01:30:35.400
For Optimal Decisions, it was a bit harder
link |
01:30:37.680
because we were trying to sell something
link |
01:30:40.040
that was more like a $1 million sale.
link |
01:30:42.160
But what we did was we would sell scoping projects.
link |
01:30:46.400
So kind of like prototypy projects
link |
01:30:50.560
but rather than doing it for free
link |
01:30:51.720
we would sell them for $50,000 to $100,000.
link |
01:30:54.200
So again, we were covering our costs
link |
01:30:56.920
and also making the client feel
link |
01:30:58.320
like we were doing something valuable.
link |
01:31:00.200
So in both cases, we were profitable from six months in.
link |
01:31:06.000
Ah, nevertheless, it's scary.
link |
01:31:08.160
I mean, yeah, sure.
link |
01:31:10.040
I mean, it's scary before you jump in
link |
01:31:13.280
and I guess I was comparing it
link |
01:31:15.600
to the scariness of VC.
link |
01:31:18.120
I felt like with VC stuff, it was more scary.
link |
01:31:20.480
Kind of much more in somebody else's hands,
link |
01:31:24.320
will they fund you or not?
link |
01:31:26.120
And what do they think of what you're doing?
link |
01:31:27.840
I also found it very difficult with VC
link |
01:31:29.760
backed startups to actually do the thing
link |
01:31:32.600
which I thought was important for the company
link |
01:31:34.880
rather than doing the thing
link |
01:31:35.920
which I thought would make the VC happy.
link |
01:31:38.840
And VCs always tell you not to do the thing
link |
01:31:40.880
that makes them happy.
link |
01:31:42.360
But then if you don't do the thing that makes them happy
link |
01:31:44.040
they get sad, so.
link |
01:31:46.360
And do you think optimizing for the,
link |
01:31:48.080
whatever they call it, the exit is a good thing
link |
01:31:51.960
to optimize for?
link |
01:31:53.040
I mean, it can be, but not at the VC level
link |
01:31:54.880
because the VC exit needs to be, you know, a thousand X.
link |
01:31:59.560
Whereas with the lifestyle exit,
link |
01:32:03.120
if you can sell something for $10 million,
link |
01:32:05.360
then you've made it, right?
link |
01:32:06.440
So I don't, it depends.
link |
01:32:09.160
If you want to build something that
link |
01:32:11.200
you're kind of happy to do forever, then fine.
link |
01:32:13.560
If you want to build something you want to sell
link |
01:32:16.720
in three years time, that's fine too.
link |
01:32:18.440
I mean, they're both perfectly good outcomes.
link |
01:32:21.280
So you're learning Swift now, in a way.
link |
01:32:24.880
I mean, you've already.
link |
01:32:25.720
I'm trying to.
link |
01:32:26.760
And I read that you use, at least in some cases,
link |
01:32:31.120
spaced repetition as a mechanism for learning new things.
link |
01:32:34.400
I use Anki quite a lot myself.
link |
01:32:36.400
Me too.
link |
01:32:38.920
I actually never talk to anybody about it.
link |
01:32:41.440
Don't know how many people do it,
link |
01:32:44.120
but it works incredibly well for me.
link |
01:32:46.720
Can you talk to your experience?
link |
01:32:47.920
Like how did you, what do you?
link |
01:32:51.080
First of all, okay, let's back it up.
link |
01:32:53.080
What is spaced repetition?
link |
01:32:55.080
So spaced repetition is an idea created
link |
01:33:00.280
by a psychologist named Ebbinghaus.
link |
01:33:04.200
I don't know, must be a couple of hundred years ago
link |
01:33:06.080
or something, 150 years ago.
link |
01:33:08.000
He did something which sounds pretty damn tedious.
link |
01:33:10.680
He wrote down random sequences of letters on cards
link |
01:33:15.600
and tested how well he would remember
link |
01:33:18.840
those random sequences a day later, a week later, whatever.
link |
01:33:23.000
He discovered that there was this kind of a curve
link |
01:33:26.120
where his probability of remembering one of them
link |
01:33:28.800
would be dramatically smaller the next day
link |
01:33:30.640
and then a little bit smaller the next day
link |
01:33:31.960
and a little bit smaller the next day.
link |
01:33:33.520
What he discovered is that if he revised those cards
link |
01:33:36.880
after a day, the probabilities would decrease
link |
01:33:41.600
at a smaller rate.
link |
01:33:42.880
And then if you revise them again a week later,
link |
01:33:44.960
they would decrease at a smaller rate again.
link |
01:33:47.040
And so he basically figured out a roughly optimal equation
link |
01:33:51.800
for when you should revise something you wanna remember.
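(A toy version of that idea, not Ebbinghaus's actual equation: recall decays roughly exponentially with time since the last review, and each successful review increases the memory's "stability", so the same delay costs less recall. The decay shape and the 2.5x growth factor below are illustrative assumptions only.)

```python
import math

def recall_probability(days_since_review: float, stability: float) -> float:
    """Toy exponential forgetting curve: P(recall) = exp(-t / stability)."""
    return math.exp(-days_since_review / stability)

stability = 1.0
for review in range(4):
    p = recall_probability(7, stability)
    print(f"after review {review}: P(recall after 7 days) = {p:.2f}")
    stability *= 2.5  # each successful revision flattens the curve (made-up factor)
```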
link |
01:33:56.560
So spaced repetition learning is using this simple algorithm,
link |
01:34:00.440
just something like revise something after a day
link |
01:34:03.640
and then three days and then a week and then three weeks
link |
01:34:06.640
and so forth.
link |
01:34:07.720
And so if you use a program like Anki, as you know,
link |
01:34:10.680
it will just do that for you.
link |
01:34:12.120
And it will say, did you remember this?
link |
01:34:14.560
And if you say no, it will reschedule it back
link |
01:34:17.680
to appear again like 10 times faster
link |
01:34:20.320
than it otherwise would have.
link |
01:34:23.080
It's a kind of a way of being guaranteed to learn something
link |
01:34:27.920
because by definition, if you're not learning it,
link |
01:34:30.240
it will be rescheduled to be revised more quickly.
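(For concreteness, a minimal sketch of that scheduling behavior. The interval ladder and the reset-on-failure rule below are simplified placeholders, not Anki's or SuperMemo's actual algorithm.)

```python
from datetime import date, timedelta

# Simplified review ladder: 1 day, then 3, 7, 21, 60 days between revisions.
INTERVALS = [1, 3, 7, 21, 60]

class Card:
    def __init__(self, prompt: str, answer: str):
        self.prompt = prompt
        self.answer = answer
        self.level = 0            # position on the interval ladder
        self.due = date.today()

    def review(self, remembered: bool) -> None:
        if remembered:
            days = INTERVALS[self.level]
            # Each success moves the card to the next, longer interval.
            self.level = min(self.level + 1, len(INTERVALS) - 1)
        else:
            # A forgotten card comes back much sooner (here: back to the start).
            self.level = 0
            days = INTERVALS[0]
        self.due = date.today() + timedelta(days=days)

# Usage: answer honestly; failures make the card reappear quickly.
card = Card("你好", "hello")
card.review(remembered=True)
print(card.due)  # one day out on this toy schedule
```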
link |
01:34:33.680
Unfortunately though, it's also like,
link |
01:34:36.120
it doesn't let you fool yourself.
link |
01:34:37.480
If you're not learning something,
link |
01:34:40.160
you know, your revisions will just pile up more and more.
link |
01:34:44.080
So you have to find ways to learn things productively
link |
01:34:48.280
and effectively like treat your brain well.
link |
01:34:50.560
So using like mnemonics and stories and context
link |
01:34:54.880
and stuff like that.
link |
01:34:57.560
So yeah, it's a super great technique.
link |
01:34:59.760
It's like learning how to learn is something
link |
01:35:01.360
which everybody should learn
link |
01:35:03.800
before they actually learn anything.
link |
01:35:05.680
But almost nobody does.
link |
01:35:07.840
So it certainly works well
link |
01:35:10.120
for learning new languages for, I mean,
link |
01:35:13.720
for learning like small projects almost.
link |
01:35:16.440
But do you, you know, I started using it for,
link |
01:35:19.840
I forget who wrote a blog post about this that inspired me.
link |
01:35:22.160
It might've been you, I'm not sure.
link |
01:35:26.840
I started when I read papers,
link |
01:35:28.520
I'll take the concepts and ideas and I'll put them in.
link |
01:35:31.920
Was it Michael Nielsen?
link |
01:35:32.840
It was Michael Nielsen.
link |
01:35:33.680
So Michael started doing this recently
link |
01:35:36.400
and has been writing about it.
link |
01:35:41.000
So the kind of today's Ebbinghaus
link |
01:35:43.200
is a guy called Piotr Wozniak
link |
01:35:45.080
who developed a system called SuperMemo.
link |
01:35:47.720
And he's been basically trying to become like
link |
01:35:51.680
the world's greatest Renaissance man
link |
01:35:54.080
over the last few decades.
link |
01:35:55.960
He's basically lived his life
link |
01:35:57.280
with spaced repetition learning for everything.
link |
01:36:03.840
And sort of like,
link |
01:36:05.800
Michael's only very recently got into this,
link |
01:36:07.440
but he started really getting excited
link |
01:36:08.920
about doing it for a lot of different things.
link |
01:36:11.200
For me personally, I actually don't use it
link |
01:36:14.600
for anything except Chinese.
link |
01:36:16.920
And the reason for that is that
link |
01:36:20.120
Chinese is specifically a thing I made a conscious decision
link |
01:36:23.080
that I want to continue to remember,
link |
01:36:27.680
even if I don't get much of a chance to exercise it,
link |
01:36:30.080
cause like I'm not often in China, so I don't.
link |
01:36:33.840
Or else something like programming languages or papers.
link |
01:36:38.280
I have a very different approach,
link |
01:36:39.600
which is I try not to learn anything from them,
link |
01:36:43.040
but instead I try to identify the important concepts
link |
01:36:47.040
and like actually ingest them.
link |
01:36:48.960
So like really understand that concept deeply
link |
01:36:53.600
and study it carefully.
link |
01:36:54.760
I will decide if it really is important,
link |
01:36:56.560
and if it is, like, incorporate it into our library,
link |
01:37:01.560
you know, incorporate it into how I do things,
link |
01:37:04.160
or decide it's not worth it, say.
link |
01:37:07.960
So I find I then remember the things
link |
01:37:12.200
that I care about because I'm using it all the time.
link |
01:37:15.720
So I've, for the last 25 years,
link |
01:37:20.160
I've committed to spending at least half of every day
link |
01:37:23.440
learning or practicing something new,
link |
01:37:25.920
which all my colleagues have always hated
link |
01:37:28.800
because it always looks like I'm not working on
link |
01:37:31.040
what I'm meant to be working on,
link |
01:37:32.000
but it always means I do everything faster
link |
01:37:34.560
because I've been practicing a lot of stuff.
link |
01:37:36.920
So I kind of give myself a lot of opportunity
link |
01:37:39.400
to practice new things.
link |
01:37:41.680
And so I find now I don't,
link |
01:37:43.280
yeah, I don't often kind of find myself
link |
01:37:47.840
wishing I could remember something
link |
01:37:50.240
because if it's something that's useful,
link |
01:37:51.400
then I've been using it a lot.
link |
01:37:53.840
It's easy enough to look it up on Google,
link |
01:37:56.120
but speaking Chinese, you can't look it up on Google.
link |
01:37:59.640
Do you have advice for people learning new things?
link |
01:38:01.520
So what have you learned as a process?
link |
01:38:04.800
I mean, it all starts with just making the hours
link |
01:38:07.600
in the day available.
link |
01:38:08.920
Yeah, you got to stick with it,
link |
01:38:10.120
which is again, the number one thing
link |
01:38:12.000
that 99% of people don't do.
link |
01:38:13.600
So the people I started learning Chinese with,
link |
01:38:15.840
none of them were still doing it 12 months later.
link |
01:38:18.320
I'm still doing it 10 years later.
link |
01:38:20.320
I tried to stay in touch with them,
link |
01:38:21.840
but they just, no one did it.
link |
01:38:24.560
For something like Chinese,
link |
01:38:26.160
like study how human learning works.
link |
01:38:28.440
So every one of my Chinese flashcards
link |
01:38:31.160
is associated with a story.
link |
01:38:33.680
And that story is specifically designed to be memorable.
link |
01:38:36.680
And we find things memorable,
link |
01:38:37.800
which are like funny or disgusting or sexy
link |
01:38:41.320
or related to people that we know or care about.
link |
01:38:44.200
So I try to make sure all of the stories
link |
01:38:46.040
that are in my head have those characteristics.
link |
01:38:51.000
Yeah, so you have to, you know,
link |
01:38:52.120
you won't remember things well
link |
01:38:53.200
if they don't have some context.
link |
01:38:56.000
And yeah, you won't remember them well
link |
01:38:57.240
if you don't regularly practice them,
link |
01:39:00.600
whether it be just part of your day to day life
link |
01:39:02.440
or the Chinese Anki flashcards.
link |
01:39:06.040
I mean, the other thing is,
link |
01:39:07.800
let yourself fail sometimes.
link |
01:39:09.520
So like I've had various medical problems
link |
01:39:11.840
over the last few years.
link |
01:39:13.040
And basically my flashcards
link |
01:39:16.400
just stopped for about three years.
link |
01:39:18.640
And there've been other times I've stopped for a few months
link |
01:39:22.600
and it's so hard because you get back to it
link |
01:39:24.240
and it's like, you have 18,000 cards due.
link |
01:39:27.400
It's like, and so you just have to go, all right,
link |
01:39:30.920
well, I can either stop and give up everything
link |
01:39:34.160
or just decide to do this every day for the next two years
link |
01:39:37.560
until I get back to it.
link |
01:39:39.000
The amazing thing has been that even after three years,
link |
01:39:41.680
I, you know, the Chinese were still in there.
link |
01:39:45.880
Like it was so much faster to relearn
link |
01:39:48.480
than it was to learn the first time.
link |
01:39:50.120
Yeah, absolutely.
link |
01:39:52.320
It's in there.
link |
01:39:53.160
I have the same with guitar, with music and so on.
link |
01:39:56.560
It's sad because the work sometimes takes away
link |
01:39:59.160
and then you won't play for a year.
link |
01:40:01.200
But really, if you then just get back to it every day,
link |
01:40:03.560
you're right there again.
link |
01:40:06.040
What do you think is the next big breakthrough
link |
01:40:08.400
in artificial intelligence?
link |
01:40:09.400
What are your hopes in deep learning or beyond
link |
01:40:12.720
that people should be working on
link |
01:40:14.120
or you hope there'll be breakthroughs?
link |
01:40:16.320
I don't think it's possible to predict.
link |
01:40:17.960
I think what we already have
link |
01:40:20.600
is an incredibly powerful platform
link |
01:40:23.720
to solve lots of societally important problems
link |
01:40:26.520
that are currently unsolved.
link |
01:40:27.600
So I just hope that people will,
link |
01:40:29.920
lots of people will learn this toolkit and try to use it.
link |
01:40:33.360
I don't think we need a lot of new technological breakthroughs
link |
01:40:36.800
to do a lot of great work right now.
link |
01:40:39.880
And when do you think we're going to create
link |
01:40:42.760
a human level intelligence system?
link |
01:40:45.160
Do you think?
link |
01:40:46.000
Don't know.
link |
01:40:46.840
How hard is it?
link |
01:40:47.680
How far away are we?
link |
01:40:48.720
Don't know.
link |
01:40:49.560
Don't know.
link |
01:40:50.400
I have no way to know.
link |
01:40:51.240
I don't know why people make predictions about this
link |
01:40:53.840
because there's no data and nothing to go on.
link |
01:40:57.480
And it's just like,
link |
01:41:00.320
there's so many societally important problems
link |
01:41:03.480
to solve right now.
link |
01:41:04.400
I just don't find it a really interesting question
link |
01:41:08.680
to even answer.
link |
01:41:10.280
So in terms of societally important problems,
link |
01:41:12.960
what's the problem that is within reach?
link |
01:41:16.360
Well, I mean, for example,
link |
01:41:17.440
there are problems that AI creates, right?
link |
01:41:19.760
So more specifically,
link |
01:41:23.160
labor force displacement is going to be huge
link |
01:41:26.800
and people keep making this
link |
01:41:29.160
frivolous econometric argument of being like,
link |
01:41:31.520
oh, there's been other things that aren't AI
link |
01:41:33.960
that have come along before
link |
01:41:34.920
and haven't created massive labor force displacement,
link |
01:41:37.800
therefore AI won't.
link |
01:41:39.880
So that's a serious concern for you?
link |
01:41:41.560
Oh yeah.
link |
01:41:42.400
Andrew Yang is running on it.
link |
01:41:43.680
Yeah, it's, I'm desperately concerned.
link |
01:41:47.320
And you see already that the changing workplace
link |
01:41:53.080
has led to a hollowing out of the middle class.
link |
01:41:55.720
You're seeing that students coming out of school today
link |
01:41:59.000
have a less rosy financial future ahead of them
link |
01:42:03.120
than their parents did,
link |
01:42:03.960
which has never happened in recent,
link |
01:42:06.560
in the last few hundred years.
link |
01:42:08.600
You know, we've always had progress before.
link |
01:42:11.760
And you see this turning into anxiety
link |
01:42:15.520
and despair and even violence.
link |
01:42:19.440
So I very much worry about that.
link |
01:42:23.400
You've written quite a bit about ethics too.
link |
01:42:25.720
I do think that every data scientist
link |
01:42:29.600
working with deep learning needs to recognize
link |
01:42:33.920
they have an incredibly high leverage tool
link |
01:42:35.600
that they're using that can influence society
link |
01:42:37.960
in lots of ways.
link |
01:42:39.000
And if they're doing research,
link |
01:42:40.320
that that research is gonna be used by people
link |
01:42:42.760
doing this kind of work.
link |
01:42:44.400
And they have a responsibility to consider the consequences
link |
01:42:48.360
and to think about things like
link |
01:42:51.760
how will humans be in the loop here?
link |
01:42:53.920
How do we avoid runaway feedback loops?
link |
01:42:56.520
How do we ensure an appeals process for humans
link |
01:42:59.200
that are impacted by my algorithm?
link |
01:43:01.720
How do I ensure that the constraints of my algorithm
link |
01:43:04.960
are adequately explained to the people
link |
01:43:06.720
that end up using them?
link |
01:43:09.160
There's all kinds of human issues
link |
01:43:11.880
which only data scientists are actually
link |
01:43:15.400
in the right place to educate people about,
link |
01:43:17.960
but data scientists tend to think of themselves
link |
01:43:20.280
as just engineers and that they don't need
link |
01:43:23.400
to be part of that process, which is wrong.
link |
01:43:26.720
Well, you're in the perfect position to educate them better,
link |
01:43:30.320
to read literature, to read history, to learn from history.
link |
01:43:35.800
Well, Jeremy, thank you so much for everything you do
link |
01:43:39.160
for inspiring a huge amount of people,
link |
01:43:41.360
getting them into deep learning
link |
01:43:42.520
and having the ripple effects,
link |
01:43:45.120
the flap of a butterfly's wings
link |
01:43:47.480
that will probably change the world.
link |
01:43:48.680
So thank you very much.
link |
01:43:50.120
Thank you, thank you, thank you, thank you.