
Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35



link |
00:00:00.000
The following is a conversation with Jeremy Howard.
link |
00:00:03.120
He's the founder of Fast AI, a research institute dedicated
link |
00:00:07.080
to making deep learning more accessible.
link |
00:00:09.760
He's also a distinguished research scientist
link |
00:00:12.560
at the University of San Francisco,
link |
00:00:14.600
a former president of Kaggle, as well as a top ranking
link |
00:00:17.600
competitor there.
link |
00:00:18.800
And in general, he's a successful entrepreneur,
link |
00:00:21.680
educator, researcher, and an inspiring personality
link |
00:00:25.240
in the AI community.
link |
00:00:27.000
When someone asked me, how do I get
link |
00:00:28.680
started with deep learning?
link |
00:00:30.240
Fast AI is one of the top places I point them to.
link |
00:00:33.360
It's free.
link |
00:00:34.120
It's easy to get started.
link |
00:00:35.520
It's insightful and accessible.
link |
00:00:37.600
And if I may say so, it has very little of the BS
link |
00:00:40.960
that can sometimes dilute the value of educational content
link |
00:00:44.160
on popular topics like deep learning.
link |
00:00:46.720
Fast AI has a focus on practical application
link |
00:00:49.440
of deep learning and hands on exploration
link |
00:00:51.600
of the cutting edge that is incredibly
link |
00:00:53.880
both accessible to beginners and useful to experts.
link |
00:00:57.960
This is the Artificial Intelligence Podcast.
link |
00:01:01.360
If you enjoy it, subscribe on YouTube,
link |
00:01:03.760
give it five stars on iTunes, support it on Patreon,
link |
00:01:06.920
or simply connect with me on Twitter.
link |
00:01:09.040
Lex Fridman, spelled F R I D M A N.
link |
00:01:13.280
And now, here's my conversation with Jeremy Howard.
link |
00:01:18.560
What's the first program you've ever written?
link |
00:01:21.680
First program I wrote that I remember
link |
00:01:24.800
would be at high school.
link |
00:01:29.200
I did an assignment where I decided
link |
00:01:31.240
to try to find out if there were some better musical scales
link |
00:01:36.240
than the normal 12 tone, 12 interval scale.
link |
00:01:40.640
So I wrote a program on my Commodore 64 in BASIC
link |
00:01:43.680
that searched through other scale sizes
link |
00:01:46.080
to see if it could find one where there
link |
00:01:48.440
were more accurate harmonies.
link |
00:01:51.880
Like mid tone?
link |
00:01:53.040
Like you want an actual exactly 3 to 2 ratio,
link |
00:01:56.520
whereas with a 12 interval scale,
link |
00:01:59.400
it's not exactly 3 to 2, for example.
link |
00:02:01.480
So that's well tempered, as they say.
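(Purely as an illustration of the kind of search Jeremy describes, not his original Commodore 64 BASIC program, here is a minimal Python sketch that scores equal temperaments by how closely their nearest step approximates a pure 3:2 fifth.)

```python
import math

def fifth_error_cents(divisions):
    """How far the closest step of an equal temperament with `divisions`
    notes per octave is from a pure 3:2 fifth, measured in cents."""
    pure_fifth = 1200 * math.log2(3 / 2)        # ~701.96 cents
    step = 1200 / divisions                     # size of one interval
    nearest = round(pure_fifth / step) * step   # closest available note
    return abs(nearest - pure_fifth)

# Search other scale sizes for more accurate harmonies than 12-tone.
baseline = fifth_error_cents(12)                # ~1.96 cents off
for n in range(5, 54):
    err = fifth_error_cents(n)
    if err < baseline:
        print(f"{n}-tone scale: fifth off by {err:.3f} cents")
```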
link |
00:02:05.080
And BASIC on a Commodore 64.
link |
00:02:07.680
Where was the interest in music from?
link |
00:02:09.440
Or is it just technical?
link |
00:02:10.480
I did music all my life, so I played saxophone and clarinet
link |
00:02:14.640
and piano and guitar and drums and whatever.
link |
00:02:18.120
How does that thread go through your life?
link |
00:02:22.200
Where's music today?
link |
00:02:24.160
It's not where I wish it was.
link |
00:02:28.320
For various reasons, couldn't really keep it going,
link |
00:02:30.200
particularly because I had a lot of problems with RSI,
link |
00:02:32.560
with my fingers.
link |
00:02:33.480
And so I had to cut back anything that used hands
link |
00:02:37.360
and fingers.
link |
00:02:39.360
I hope one day I'll be able to get back to it health wise.
link |
00:02:43.920
So there's a love for music underlying it all.
link |
00:02:46.240
Sure, yeah.
link |
00:02:47.840
What's your favorite instrument?
link |
00:02:49.480
Saxophone.
link |
00:02:50.360
Sax.
link |
00:02:51.000
Baritone saxophone.
link |
00:02:52.840
Well, probably bass saxophone, but they're awkward.
link |
00:02:57.440
Well, I always love it when music is
link |
00:03:00.120
coupled with programming.
link |
00:03:01.760
There's something about a brain that
link |
00:03:03.800
utilizes both that emerges with creative ideas.
link |
00:03:07.520
So you've used and studied quite a few programming languages.
link |
00:03:11.200
Can you give an overview of what you've used?
link |
00:03:15.120
What are the pros and cons of each?
link |
00:03:17.920
Well, my favorite programming environment almost certainly
link |
00:03:21.960
was Microsoft Access back in the earliest days.
link |
00:03:26.520
So that used Visual Basic for Applications, which
link |
00:03:29.080
is not a good programming language,
link |
00:03:30.720
but the programming environment is fantastic.
link |
00:03:33.080
It's like the ability to create user interfaces and tie data
link |
00:03:40.120
and actions to them and create reports and all that.
link |
00:03:43.720
I've never seen anything as good.
link |
00:03:46.800
So things nowadays like Airtable, which
link |
00:03:48.920
are like small subsets of that, which people love for good reason.
link |
00:03:56.200
But unfortunately, nobody's ever achieved anything like that.
link |
00:04:01.160
What is that, if you could pause on that for a second?
link |
00:04:03.320
Oh, Access.
link |
00:04:03.840
Access.
link |
00:04:04.340
Is it a fundamental database?
link |
00:04:06.320
It was a database program that Microsoft produced,
link |
00:04:09.600
part of Office, and it kind of withered.
link |
00:04:13.440
But basically, it lets you in a totally graphical way
link |
00:04:16.320
create tables and relationships and queries
link |
00:04:18.480
and tie them to forms and set up event handlers and calculations.
link |
00:04:24.720
And it was a very complete, powerful system designed
link |
00:04:28.680
for not massive scalable things, but for useful little applications
link |
00:04:35.000
that I loved.
link |
00:04:36.400
So what's the connection between Excel and Access?
link |
00:04:40.240
So very close.
link |
00:04:42.160
So Access was the relational database equivalent,
link |
00:04:47.680
if you like.
link |
00:04:48.360
So people still do a lot of that stuff
link |
00:04:51.080
that should be in Access in Excel because they know it.
link |
00:04:54.120
Excel's great as well.
link |
00:04:56.680
But it's just not as rich a programming model as VBA
link |
00:05:01.760
combined with a relational database.
link |
00:05:04.680
And so I've always loved relational databases.
link |
00:05:07.320
But today, programming on top of relational databases
link |
00:05:11.080
is just a lot more of a headache.
link |
00:05:13.840
You generally either need to kind of,
link |
00:05:16.680
you need something that connects, that runs some kind
link |
00:05:19.040
of database server, unless you use SQLite, which
link |
00:05:21.560
has its own issues.
link |
00:05:25.000
Then you kind of often, if you want
link |
00:05:26.320
to get a nice programming model, you
link |
00:05:27.760
need to create an ORM on top.
link |
00:05:30.440
And then, I don't know, there's all these pieces tied together.
link |
00:05:34.360
And it's just a lot more awkward than it should be.
link |
00:05:37.000
There are people that are trying to make it easier,
link |
00:05:39.200
so in particular, I think of F#, where Don Syme
link |
00:05:44.480
and his team have done a great job of making something
link |
00:05:49.320
like a database appear in the type system,
link |
00:05:51.640
so you actually get tab completion for fields and tables
link |
00:05:54.960
and stuff like that.
link |
00:05:57.840
Anyway, so that was kind of, anyway,
link |
00:05:59.280
so that whole VBA Office thing, I guess,
link |
00:06:01.880
was a starting point, which I still miss.
link |
00:06:04.560
And I got into Standard Visual Basic, which
link |
00:06:07.800
that's interesting, just to pause on that for a second.
link |
00:06:09.840
And it's interesting that you're connecting programming
link |
00:06:12.600
languages to the ease of management of data.
link |
00:06:18.200
So in your use of programming languages,
link |
00:06:20.600
you always had a love and a connection with data.
link |
00:06:24.880
I've always been interested in doing useful things for myself
link |
00:06:28.640
and for others, which generally means getting some data
link |
00:06:31.880
and doing something with it and putting it out there again.
link |
00:06:34.600
So that's been my interest throughout.
link |
00:06:38.400
So I also did a lot of stuff with Apple script
link |
00:06:41.560
back in the early days.
link |
00:06:43.880
So it's kind of nice being able to get the computer
link |
00:06:47.960
and computers to talk to each other and to do things for you.
link |
00:06:52.960
And then I think that one night, the programming language
link |
00:06:56.600
I most loved then would have been Delphi, which
link |
00:06:59.960
was Object Pascal created by Anders Hejlsberg, who previously
link |
00:07:05.960
did Turbo Pascal and then went on to create .NET
link |
00:07:08.840
and then went on to create TypeScript.
link |
00:07:11.080
Delphi was amazing because it was like a compiled, fast language
link |
00:07:16.720
that was as easy to use as Visual Basic.
link |
00:07:20.200
Delphi, what is it similar to in more modern languages?
link |
00:07:27.480
Visual Basic.
link |
00:07:28.840
Visual Basic.
link |
00:07:29.680
Yeah, but a compiled, fast version.
link |
00:07:32.320
So I'm not sure there's anything quite like it anymore.
link |
00:07:37.080
If you took C Sharp or Java and got rid of the virtual machine
link |
00:07:42.520
and replaced it with something you could compile to a small, tight
link |
00:07:45.040
binary.
link |
00:07:46.520
I feel like it's where Swift could get to with the new Swift
link |
00:07:51.680
UI and the cross platform development going on.
link |
00:07:56.640
That's one of my dreams is that we'll hopefully get back
link |
00:08:01.600
to where Delphi was.
link |
00:08:02.840
There is actually a free Pascal project nowadays
link |
00:08:08.520
called Lazarus, which is also attempting
link |
00:08:10.320
to recreate Delphi.
link |
00:08:13.960
They're making good progress.
link |
00:08:16.080
So OK, Delphi, that's one of your favorite programming languages?
link |
00:08:21.000
Well, it's programming environments.
link |
00:08:22.360
Again, as I said, Pascal's not a nice language.
link |
00:08:26.280
If you wanted to know specifically
link |
00:08:27.880
about what languages I like, I would definitely
link |
00:08:30.360
pick J as being an amazingly wonderful language.
link |
00:08:35.480
What's J?
link |
00:08:37.000
J. Are you aware of APL?
link |
00:08:39.600
I am not, except from doing a little research on the work
link |
00:08:43.520
you've done.
link |
00:08:44.080
OK, so not at all surprising you're not
link |
00:08:47.280
familiar with it because it's not well known,
link |
00:08:49.040
but it's actually one of the main families of programming
link |
00:08:55.480
languages going back to the late 50s, early 60s.
link |
00:08:57.920
So there was a couple of major directions.
link |
00:09:01.720
One was the kind of lambda calculus,
link |
00:09:04.440
Alonzo Church direction, which I guess became kind of Lisp and Scheme
link |
00:09:08.640
and whatever, which has a history going back
link |
00:09:12.040
to the early days of computing.
link |
00:09:13.440
The second was the kind of imperative slash
link |
00:09:17.360
OO, Algol, Simula, going on to C, C++, and so forth.
link |
00:09:23.240
There was a third, which are called array oriented languages,
link |
00:09:26.960
which started with a paper by a guy called Ken Iverson, which
link |
00:09:31.720
was actually a math theory paper, not a programming paper.
link |
00:09:37.480
It was called Notation as a Tool for Thought.
link |
00:09:41.520
And it was the development of a new type of math notation.
link |
00:09:45.320
And the idea is that this math notation was much more
link |
00:09:48.560
flexible, expressive, and also well defined than traditional
link |
00:09:54.480
math notation, which is none of those things.
link |
00:09:56.440
Math notation is awful.
link |
00:09:59.160
And so he actually turned that into a programming language.
link |
00:10:02.840
Because this was the late 50s, all the names were available.
link |
00:10:06.720
So he called his programming language A Programming Language, or APL.
link |
00:10:10.520
APL, what?
link |
00:10:11.160
So APL is an implementation of notation
link |
00:10:15.360
as a tool for thought, by which he means math notation.
link |
00:10:18.280
And Ken and his son went on to do many things,
link |
00:10:22.880
but eventually they actually produced a new language that
link |
00:10:26.720
was built on top of all the learnings of APL.
link |
00:10:28.440
And that was called J. And J is the most
link |
00:10:32.800
expressive, composable, beautifully designed language
link |
00:10:41.040
I've ever seen.
link |
00:10:42.400
Does it have object oriented components?
link |
00:10:44.520
Does it have that kind of thing?
link |
00:10:45.520
Not really.
link |
00:10:46.240
It's an array oriented language.
link |
00:10:47.720
It's the third path.
link |
00:10:51.400
Are you saying array?
link |
00:10:52.760
Array oriented.
link |
00:10:53.720
Yeah.
link |
00:10:54.200
What does it mean to be array oriented?
link |
00:10:55.480
So array oriented means that you generally
link |
00:10:57.480
don't use any loops.
link |
00:10:59.520
But the whole thing is done with kind
link |
00:11:02.240
of an extreme version of broadcasting,
link |
00:11:06.360
if you're familiar with that NumPy slash Python concept.
link |
00:11:09.880
So you do a lot with one line of code.
link |
00:11:14.240
It looks a lot like math notation.
link |
00:11:17.520
It's basically highly compact.
link |
00:11:20.280
And the idea is that you can kind of,
link |
00:11:22.800
because you can do so much with one line of code,
link |
00:11:24.760
a single screen of code is very unlikely to,
link |
00:11:27.720
you very rarely need more than that to express your program.
link |
00:11:31.080
And so you can kind of keep it all in your head.
link |
00:11:33.240
And you can kind of clearly communicate it.
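(To make the broadcasting comparison concrete, here is a small NumPy sketch of my own, not from the conversation, where a whole-array operation replaces an explicit loop, in the same spirit as array-oriented languages.)

```python
import numpy as np

prices = np.array([3.50, 12.00, 7.25, 1.99])   # one value per product
quantities = np.array([10, 2, 5, 40])

# Loop-free, "array at a time" style: every element is handled at once.
revenue = prices * quantities        # elementwise multiply
discounted = revenue * 0.9           # a scalar broadcasts across the array
total = discounted.sum()

# An APL/J program pushes this idea much further: whole programs are
# compositions of such array operations, so a few characters can replace
# what would otherwise be nested loops.
print(revenue, total)
```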
link |
00:11:36.000
It's interesting that APL created two main branches, K and J.
link |
00:11:41.560
J is this kind of like open source niche community of crazy
link |
00:11:47.920
enthusiasts like me.
link |
00:11:49.360
And then the other path, K, was fascinating.
link |
00:11:52.120
It's an astonishingly expensive programming language,
link |
00:11:56.600
which many of the world's most ludicrously rich hedge funds
link |
00:12:01.920
use.
link |
00:12:02.840
So the entire K machine is so small,
link |
00:12:06.640
it sits inside level three cache on your CPU.
link |
00:12:09.320
And it easily wins every benchmark I've ever seen
link |
00:12:14.040
in terms of data processing speed.
link |
00:12:16.440
But you don't come across it very much,
link |
00:12:17.840
because it's like $100,000 per CPU to run it.
link |
00:12:22.640
But it's like this path of programming languages
link |
00:12:26.240
is just so much, I don't know, so much more powerful
link |
00:12:29.760
in every way than the ones that almost anybody uses every day.
link |
00:12:33.840
So it's all about computation.
link |
00:12:37.400
It's really focusing on it.
link |
00:12:38.360
Pretty heavily focused on computation.
link |
00:12:40.640
I mean, so much of programming is data processing
link |
00:12:44.320
by definition.
link |
00:12:45.640
And so there's a lot of things you can do with it.
link |
00:12:49.000
But yeah, there's not much work being
link |
00:12:51.320
done on making user interface toolkits or whatever.
link |
00:12:57.080
I mean, there's some, but they're not great.
link |
00:12:59.400
At the same time, you've done a lot of stuff with Perl and Python.
link |
00:13:03.160
So where does that fit into the picture of J and K and APL
link |
00:13:08.320
and Python?
link |
00:13:08.880
Well, it's just much more pragmatic.
link |
00:13:12.400
In the end, you kind of have to end up
link |
00:13:13.960
where the libraries are.
link |
00:13:17.960
Because to me, my focus is on productivity.
link |
00:13:21.320
I just want to get stuff done and solve problems.
link |
00:13:23.800
So Perl was great.
link |
00:13:27.360
I created an email company called Fastmail.
link |
00:13:29.760
And Perl was great, because back in the late 90s, early 2000s,
link |
00:13:35.200
it just had a lot of stuff it could do.
link |
00:13:38.160
I still had to write my own monitoring system
link |
00:13:41.840
and my own web framework and my own whatever,
link |
00:13:43.840
because none of that stuff existed.
link |
00:13:45.760
But it was a super flexible language to do that in.
link |
00:13:50.280
And you used Perl for Fastmail.
link |
00:13:52.720
You used it as a back end.
link |
00:13:54.520
So everything was written in Perl?
link |
00:13:55.800
Yeah.
link |
00:13:56.520
Yeah, everything was Perl.
link |
00:13:58.720
Why do you think Perl hasn't succeeded or hasn't dominated
link |
00:14:04.480
the market where Python really takes over a lot of the
link |
00:14:07.120
tasks?
link |
00:14:08.200
Well, I mean, Perl did dominate.
link |
00:14:09.640
It was everything, everywhere.
link |
00:14:13.080
But then the guy that ran Perl, Larry Wall,
link |
00:14:19.920
just didn't put the time in anymore.
link |
00:14:22.280
And no project can be successful if that isn't there.
link |
00:14:29.680
Particularly one that started with a strong leader that
link |
00:14:32.640
loses that strong leadership.
link |
00:14:35.040
So then Python has kind of replaced it.
link |
00:14:38.040
Python is a lot less elegant language in nearly every way.
link |
00:14:45.040
But it has the data science libraries.
link |
00:14:48.880
And a lot of them are pretty great.
link |
00:14:51.240
So I kind of use it because it's the best we have.
link |
00:14:58.280
But it's definitely not good enough.
link |
00:15:01.800
What do you think the future of programming looks like?
link |
00:15:04.040
What do you hope the future of programming looks like if we
link |
00:15:06.880
zoom in on the computational fields on data science
link |
00:15:10.200
and machine learning?
link |
00:15:11.800
I hope Swift is successful because the goal of Swift,
link |
00:15:19.440
the way Chris Lattner describes it,
link |
00:15:21.000
is to be infinitely hackable.
link |
00:15:22.640
And that's what I want.
link |
00:15:23.480
I want something where me and the people I do research with
link |
00:15:26.920
and my students can look at and change everything
link |
00:15:30.360
from top to bottom.
link |
00:15:32.000
There's nothing mysterious and magical and inaccessible.
link |
00:15:36.240
Unfortunately, with Python, it's the opposite of that
link |
00:15:38.600
because Python is so slow, it's extremely unhackable.
link |
00:15:42.640
You get to a point where it's like, OK, from here on down
link |
00:15:44.840
it's C. So your debugger doesn't work in the same way.
link |
00:15:47.320
Your profiler doesn't work in the same way.
link |
00:15:48.920
Your build system doesn't work in the same way.
link |
00:15:50.880
It's really not very hackable at all.
link |
00:15:53.760
What's the part you like to be hackable?
link |
00:15:55.600
Is it for the objective of optimizing training
link |
00:16:00.120
of neural networks, inference of neural networks?
link |
00:16:02.600
Is it performance of the system?
link |
00:16:04.360
Or is there some nonperformance related, just creative idea?
link |
00:16:08.440
It's everything.
link |
00:16:09.080
I mean, in the end, I want to be productive as a practitioner.
link |
00:16:15.480
So at the moment, our understanding of deep learning
link |
00:16:18.440
is incredibly primitive.
link |
00:16:20.080
There's very little we understand.
link |
00:16:21.520
Most things don't work very well, even though it works better
link |
00:16:24.200
than anything else out there.
link |
00:16:26.200
There's so many opportunities to make it better.
link |
00:16:28.760
So you look at any domain area like speech recognition
link |
00:16:34.360
with deep learning or natural language processing
link |
00:16:37.720
classification with deep learning or whatever.
link |
00:16:39.440
Every time I look at an area with deep learning,
link |
00:16:41.960
I always see like, oh, it's terrible.
link |
00:16:44.480
There's lots and lots of obviously stupid ways
link |
00:16:47.560
to do things that need to be fixed.
link |
00:16:50.000
So then I want to be able to jump in there and quickly
link |
00:16:53.320
experiment and make them better.
link |
00:16:54.880
Do you think the programming language has a role in that?
link |
00:16:59.320
Huge role, yeah.
link |
00:17:00.280
So currently, Python has a big gap in terms of our ability
link |
00:17:07.080
to innovate particularly around recurrent neural networks
link |
00:17:11.880
and natural language processing because it's so slow.
link |
00:17:16.840
The actual loop where we actually loop through words,
link |
00:17:20.200
we have to do that whole thing in CUDA C.
link |
00:17:23.760
So we actually can't innovate with the kernel, the heart,
link |
00:17:27.600
of that most important algorithm.
link |
00:17:31.560
And it's just a huge problem.
link |
00:17:33.680
And this happens all over the place.
link |
00:17:36.600
So we hit research limitations.
link |
00:17:40.080
Another example, convolutional neural networks, which
link |
00:17:42.840
are actually the most popular architecture for lots of things,
link |
00:17:46.800
maybe most things in deep learning.
link |
00:17:48.920
We almost certainly should be using
link |
00:17:50.360
sparse convolutional neural networks, but only like two
link |
00:17:54.600
people are because to do it, you have
link |
00:17:56.800
to rewrite all of that CUDA C level stuff.
link |
00:17:59.920
And yeah, researchers just don't, and practitioners don't.
link |
00:18:04.520
So there's just big gaps in what people actually research on,
link |
00:18:09.240
what people actually implement because of the programming
link |
00:18:11.640
language problem.
link |
00:18:13.240
So you think it's just too difficult
link |
00:18:17.560
to write in CUDA C that a higher level programming language
link |
00:18:23.480
like Swift should enable the easier,
link |
00:18:30.520
fooling around, creating stuff with RNNs,
link |
00:18:33.160
or sparse convolutional neural networks?
link |
00:18:34.920
Kind of.
link |
00:18:35.920
Who is at fault?
link |
00:18:38.520
Who is in charge of making it easy for a researcher to play around?
link |
00:18:42.320
I mean, no one's at fault.
link |
00:18:43.520
Just nobody's got around to it yet.
link |
00:18:45.120
Or it's just it's hard.
link |
00:18:47.080
And I mean, part of the fault is that we ignored that whole APL
link |
00:18:51.800
kind of direction, or nearly everybody did for 60 years,
link |
00:18:55.640
50 years.
link |
00:18:57.720
But recently, people have been starting
link |
00:18:59.920
to reinvent pieces of that and kind of create some interesting
link |
00:19:04.840
new directions in the compiler technology.
link |
00:19:07.400
So the place where that's particularly happening right now
link |
00:19:11.760
is something called MLIR, which is something that, again,
link |
00:19:14.920
Chris Lattner, the Swift guy, is leading.
link |
00:19:18.000
And because it's actually not going
link |
00:19:20.080
to be Swift on its own that solves this problem.
link |
00:19:22.160
Because the problem is that currently writing
link |
00:19:24.880
an acceptably fast GPU program is too complicated,
link |
00:19:32.360
regardless of what language you use.
link |
00:19:36.480
And that's just because if you have to deal with the fact
link |
00:19:38.680
that I've got 10,000 threads and I have to synchronize between them
link |
00:19:43.160
all, and I have to put my thing into grid blocks
link |
00:19:45.360
and think about warps and all this stuff,
link |
00:19:47.040
it's just so much boilerplate that to do that well,
link |
00:19:50.720
you have to be a specialist at that.
link |
00:19:52.240
And it's going to be a year's work to optimize that algorithm
link |
00:19:58.200
in that way.
link |
00:19:59.720
But with things like Tensor Comprehensions, and Tile,
link |
00:20:04.640
and MLIR, and TVM, there's all these various projects which
link |
00:20:08.880
are all about saying, let's let people
link |
00:20:11.840
create domain specific languages for tensor
link |
00:20:16.080
computations.
link |
00:20:16.880
These are the kinds of things we do generally
link |
00:20:19.120
on the GPU for deep learning, and then
link |
00:20:21.640
have a compiler which can optimize that tensor computation.
link |
00:20:28.280
A lot of this work is actually sitting on top of a project
link |
00:20:31.440
called Halide, which is a mind blowing project
link |
00:20:36.040
where they came up with such a domain specific language.
link |
00:20:38.880
In fact, two, one domain specific language for expressing,
link |
00:20:41.240
this is what my tensor computation is.
link |
00:20:43.840
And another domain specific language for expressing,
link |
00:20:46.320
this is the way I want you to structure
link |
00:20:50.320
the compilation of that, and do it block by block
link |
00:20:53.040
and do these bits in parallel.
link |
00:20:54.960
And they were able to show how you can compress
link |
00:20:57.760
the amount of code by 10x compared to optimized GPU
link |
00:21:02.880
code and get the same performance.
link |
00:21:05.600
So these are the things that are sitting on top
link |
00:21:08.480
of that kind of research, and MLIR
link |
00:21:12.240
is pulling a lot of those best practices together.
link |
00:21:15.160
And now we're starting to see work done
link |
00:21:17.160
on making all of that directly accessible through Swift
link |
00:21:21.400
so that I could use Swift to write those domain specific
link |
00:21:25.040
languages.
link |
00:21:25.880
And hopefully we'll get then Swift CUDA kernels
link |
00:21:29.520
written in a very expressive and concise way that
link |
00:21:31.720
looks a bit like J in APL, and then Swift layers on top
link |
00:21:36.280
of that, and then a Swift UI on top of that,
link |
00:21:38.360
and it'll be so nice if we can get to that point.
link |
00:21:42.600
Now does it all eventually boil down to CUDA and NVIDIA GPUs?
link |
00:21:48.560
Unfortunately at the moment it does,
link |
00:21:50.120
but one of the nice things about MLIR,
link |
00:21:52.600
if AMD ever gets their act together, which they probably
link |
00:21:56.120
won't, is that they or others could
link |
00:21:59.040
write MLIR backends for other GPUs
link |
00:22:05.000
or rather tensor computation devices, of which today
link |
00:22:10.320
there are increasing number like Graphcore or Vertex AI
link |
00:22:15.520
or whatever.
link |
00:22:18.840
So yeah, being able to target lots of backends
link |
00:22:22.600
would be another benefit of this,
link |
00:22:23.960
and the market really needs competition,
link |
00:22:26.680
because at the moment NVIDIA is massively
link |
00:22:28.680
overcharging for their kind of enterprise class cards,
link |
00:22:33.640
because there is no serious competition,
link |
00:22:36.720
because nobody else is doing the software properly.
link |
00:22:39.280
In the cloud there is some competition, right?
link |
00:22:41.400
But not really, other than TPUs perhaps,
link |
00:22:45.080
but TPUs are almost unprogrammable at the moment.
link |
00:22:49.040
TPUs have the same problem that you can't.
link |
00:22:51.080
It's even worse.
link |
00:22:51.760
So TPUs, Google actually made an explicit decision
link |
00:22:54.800
to make them almost entirely unprogrammable,
link |
00:22:57.200
because they felt that there was too much IP in there,
link |
00:22:59.960
and if they gave people direct access to program them,
link |
00:23:02.640
people would learn their secrets.
link |
00:23:04.360
So you can't actually directly program
link |
00:23:09.720
the memory in a TPU.
link |
00:23:12.120
You can't even directly create code that runs on
link |
00:23:16.360
and that you look at on the machine that has the TPU.
link |
00:23:19.080
It all goes through a virtual machine.
link |
00:23:20.920
So all you can really do is this kind of cookie cutter
link |
00:23:23.680
thing of like plug in high level stuff together,
link |
00:23:27.760
which is just super tedious and annoying
link |
00:23:31.440
and totally unnecessary.
link |
00:23:33.920
So tell me if you could, the origin story of fast AI.
link |
00:23:40.960
What is the motivation, its mission, its dream?
link |
00:23:45.760
So I guess the founding story is heavily
link |
00:23:50.040
tied to my previous startup, which
link |
00:23:51.840
is a company called Enlitic, which
link |
00:23:53.960
was the first company to focus on deep learning for medicine.
link |
00:23:58.280
And I created that because I saw there was a huge opportunity
link |
00:24:03.240
to, there's about a 10x shortage of the number of doctors
link |
00:24:07.960
in the world, in the developing world, compared to what we need.
link |
00:24:12.120
I expected it would take about 300 years
link |
00:24:13.840
to train enough doctors to meet that gap.
link |
00:24:16.120
But I guessed that maybe if we used
link |
00:24:20.760
deep learning for some of the analytics,
link |
00:24:23.760
we could maybe make it so you don't need
link |
00:24:25.760
as highly trained doctors.
link |
00:24:27.320
For diagnosis?
link |
00:24:28.320
For diagnosis and treatment planning.
link |
00:24:29.840
Just before we get to fast.ai,
link |
00:24:33.440
where's the biggest benefit of AI in medicine that you see
link |
00:24:37.280
today and in the future?
link |
00:24:39.440
Not much happening today in terms of stuff that's actually
link |
00:24:41.960
out there.
link |
00:24:42.440
It's very early.
link |
00:24:43.160
But in terms of the opportunity, it's
link |
00:24:45.320
to take markets like India and China and Indonesia, which
link |
00:24:51.080
have big populations, Africa, small numbers of doctors,
link |
00:24:58.120
and provide diagnostic, particularly treatment
link |
00:25:02.440
planning and triage kind of on device
link |
00:25:05.160
so that if you do a test for malaria or tuberculosis
link |
00:25:10.360
or whatever, you immediately get something
link |
00:25:12.800
that even a health care worker that's
link |
00:25:14.840
had a month of training can get a very high quality
link |
00:25:20.360
assessment of whether the patient might be at risk
link |
00:25:23.480
and say, OK, we'll send them off to a hospital.
link |
00:25:27.480
So for example, in Africa, outside of South Africa,
link |
00:25:31.720
there's only five pediatric radiologists
link |
00:25:34.080
for the entire continent.
link |
00:25:35.320
So most countries don't have any.
link |
00:25:37.200
So if your kid is sick and they need something
link |
00:25:39.240
diagnosed through medical imaging,
link |
00:25:41.200
the person, even if you're able to get medical imaging done,
link |
00:25:44.040
the person that looks at it will be a nurse at best.
link |
00:25:48.920
But actually, in India, for example, and China,
link |
00:25:52.480
almost no x rays are read by anybody,
link |
00:25:54.760
by any trained professional, because they don't have enough.
link |
00:25:59.400
So if instead we had an algorithm that
link |
00:26:02.880
could take the most likely high risk 5% and say triage,
link |
00:26:10.080
basically say, OK, somebody needs to look at this,
link |
00:26:13.280
it would massively change
link |
00:26:16.240
what's possible with medicine in the developing world.
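(As a rough illustration of the triage idea, my own sketch rather than anything discussed in the episode: the model only needs to rank studies by predicted risk and flag the top slice for human review.)

```python
import numpy as np

def triage(risk_scores, review_fraction=0.05):
    """Flag the highest-risk fraction of cases for expert review.

    risk_scores: model-predicted probability of disease per case,
    assumed to come from some upstream classifier.
    """
    n_review = max(1, int(len(risk_scores) * review_fraction))
    order = np.argsort(risk_scores)[::-1]     # highest risk first
    return order[:n_review]                   # indices to send to a specialist

scores = np.random.rand(1000)                 # stand-in predictions
print(triage(scores))
```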
link |
00:26:20.640
And remember, increasingly, they have money.
link |
00:26:23.680
They're the developing world.
link |
00:26:24.800
They're not the poor world, the developing world.
link |
00:26:26.160
So they have the money.
link |
00:26:26.920
So they're building the hospitals.
link |
00:26:28.480
They're getting the diagnostic equipment.
link |
00:26:31.960
But there's no way for a very long time
link |
00:26:34.880
will they be able to have the expertise.
link |
00:26:38.480
Shortage of expertise.
link |
00:26:39.760
OK, and that's where the deep learning systems
link |
00:26:42.720
can step in and magnify the expertise they do have.
link |
00:26:46.040
Exactly.
link |
00:26:47.840
So you do see, just to linger a little bit longer,
link |
00:26:54.160
the interaction, do you still see the human experts still
link |
00:26:58.520
at the core of the system?
link |
00:26:59.840
Yeah, absolutely.
link |
00:27:00.480
Is there something in medicine that
link |
00:27:01.720
could be automated almost completely?
link |
00:27:03.760
I don't see the point of even thinking about that,
link |
00:27:06.360
because we have such a shortage of people.
link |
00:27:08.480
Why would we want to find a way not to use them?
link |
00:27:12.160
Like, we have people.
link |
00:27:13.840
So the idea of, even from an economic point of view,
link |
00:27:17.200
if you can make them 10x more productive,
link |
00:27:19.800
getting rid of the person doesn't
link |
00:27:21.600
impact your unit economics at all.
link |
00:27:23.880
And it totally ignores the fact that there are things
link |
00:27:26.680
people do better than machines.
link |
00:27:28.760
So it's just, to me, that's not a useful way
link |
00:27:33.120
of framing the problem.
link |
00:27:34.120
I guess, just to clarify, I guess I
link |
00:27:36.440
meant there may be some problems where you can avoid even
link |
00:27:40.560
going to the expert ever.
link |
00:27:42.160
Sort of maybe preventative care or some basic stuff,
link |
00:27:46.160
the low hanging fruit, allowing the expert
link |
00:27:47.800
to focus on the things that are really hard.
link |
00:27:51.320
Well, that's what the triage would do, right?
link |
00:27:52.960
So the triage would say, OK, 99% sure there's nothing here.
link |
00:28:00.760
So that can be done on device.
link |
00:28:04.040
And they can just say, OK, go home.
link |
00:28:05.920
So the experts are being used to look at the stuff which
link |
00:28:10.520
has some chance it's worth looking at,
link |
00:28:12.240
which most things is not.
link |
00:28:15.720
It's fine.
link |
00:28:16.280
Why do you think we haven't quite made progress on that yet
link |
00:28:19.840
in terms of the scale of how much AI is applied in the medical field?
link |
00:28:27.480
There's a lot of reasons.
link |
00:28:28.400
I mean, one is it's pretty new.
link |
00:28:29.640
I only started in late 2014.
link |
00:28:32.040
And before that, it's hard to express
link |
00:28:35.920
to what degree the medical world was not
link |
00:28:37.760
aware of the opportunities here.
link |
00:28:40.720
So I went to RSNA, which is the world's largest radiology
link |
00:28:45.520
conference.
link |
00:28:46.240
And I told everybody I could, like,
link |
00:28:50.040
I'm doing this thing with deep learning.
link |
00:28:51.800
Please come and check it out.
link |
00:28:53.320
And no one had any idea what I was talking about.
link |
00:28:56.880
No one had any interest in it.
link |
00:28:59.640
So we've come from absolute zero, which is hard.
link |
00:29:05.040
And then the whole regulatory framework, education system,
link |
00:29:09.920
everything is just set up to think of doctoring
link |
00:29:13.400
in a very different way.
link |
00:29:14.920
So today, there is a small number
link |
00:29:16.400
of people who are deep learning practitioners and doctors
link |
00:29:22.040
at the same time.
link |
00:29:22.960
And we're starting to see the first ones come out
link |
00:29:25.040
of their PhD programs.
link |
00:29:26.520
So Zak Kohane over in Boston, Cambridge
link |
00:29:33.960
has a number of students now who are data science experts,
link |
00:29:41.040
deep learning experts, and actual medical doctors.
link |
00:29:46.400
Quite a few doctors have completed our fast AI course
link |
00:29:49.480
now and are publishing papers and creating journal reading
link |
00:29:54.920
groups in the American College of Radiology.
link |
00:29:58.040
And it's just starting to happen.
link |
00:30:00.280
But it's going to be a long process.
link |
00:30:02.840
The regulators have to learn how to regulate this.
link |
00:30:04.920
They have to build guidelines.
link |
00:30:08.720
And then the lawyers at hospitals
link |
00:30:12.120
have to develop a new way of understanding
link |
00:30:15.080
that sometimes it makes sense for data
link |
00:30:18.680
to be looked at in raw form in large quantities
link |
00:30:24.880
in order to create world changing results.
link |
00:30:27.000
Yeah, there's a regulation around data, all that.
link |
00:30:30.080
It sounds probably the hardest problem,
link |
00:30:33.840
but it sounds reminiscent of autonomous vehicles as well.
link |
00:30:36.760
Many of the same regulatory challenges,
link |
00:30:38.760
many of the same data challenges.
link |
00:30:40.560
Yeah, I mean, funnily enough, the problem
link |
00:30:42.160
is less the regulation and more the interpretation
link |
00:30:44.880
of that regulation by lawyers in hospitals.
link |
00:30:48.200
So HIPAA was actually designed.
link |
00:30:52.560
The P in HIPAA does not stand for privacy.
link |
00:30:56.400
It stands for portability.
link |
00:30:57.640
It's actually meant to be a way that data can be used.
link |
00:31:01.200
And it was created with lots of gray areas
link |
00:31:04.400
because the idea is that would be more practical
link |
00:31:06.560
and it would help people to use this legislation
link |
00:31:10.480
to actually share data in a more thoughtful way.
link |
00:31:13.680
Unfortunately, it's done the opposite
link |
00:31:15.320
because when a lawyer sees a gray area, they see, oh,
link |
00:31:18.880
if we don't know we won't get sued, then we can't do it.
link |
00:31:22.440
So HIPAA is not exactly the problem.
link |
00:31:26.360
The problem is more that hospital lawyers
link |
00:31:30.080
are not incented to make bold decisions
link |
00:31:34.720
about data portability.
link |
00:31:36.520
Or even to embrace technology that saves lives.
link |
00:31:40.480
They more want to not get in trouble
link |
00:31:42.440
for embracing that technology.
link |
00:31:44.280
Also, it saves lives in a very abstract way,
link |
00:31:47.840
which is like, oh, we've been able to release
link |
00:31:49.840
these 100,000 anonymous records.
link |
00:31:52.360
I can't point at the specific person whose life that's saved.
link |
00:31:55.360
I can say like, oh, we've ended up with this paper
link |
00:31:57.760
which found this result, which diagnosed 1,000 more people
link |
00:32:02.200
than we would have otherwise, but it's like,
link |
00:32:04.200
which ones were helped, it's very abstract.
link |
00:32:07.360
Yeah, and on the counter side of that,
link |
00:32:09.400
you may be able to point to a life that was taken
link |
00:32:13.080
because of something that was...
link |
00:32:14.360
Yeah, or a person whose privacy was violated.
link |
00:32:18.240
It's like, oh, this specific person,
link |
00:32:20.360
you know, who was supposed to be deidentified.
link |
00:32:25.480
Just a fascinating topic.
link |
00:32:27.360
We're jumping around.
link |
00:32:28.360
We'll get back to fast AI, but on the question of privacy,
link |
00:32:32.880
data is the fuel for so much innovation in deep learning.
link |
00:32:38.160
What's your sense on privacy,
link |
00:32:39.840
whether we're talking about Twitter, Facebook, YouTube,
link |
00:32:44.080
just the technologies like in the medical field
link |
00:32:48.720
that rely on people's data in order to create impact?
link |
00:32:53.440
How do we get that right, respecting people's privacy
link |
00:32:58.840
and yet creating technology that is learned from data?
link |
00:33:03.360
One of my areas of focus is on doing more with less data,
link |
00:33:11.480
whereas most vendors, unfortunately, are strongly
link |
00:33:15.000
incented to find ways to require more data and more computation.
link |
00:33:20.000
So Google and IBM being the most obvious...
link |
00:33:24.000
IBM.
link |
00:33:26.000
Yeah, so Watson, you know, so Google and IBM both strongly push
link |
00:33:30.600
the idea that they have more data and more computation
link |
00:33:35.400
and more intelligent people than anybody else,
link |
00:33:37.800
and so you have to trust them to do things
link |
00:33:39.840
because nobody else can do it.
link |
00:33:42.600
And Google's very upfront about this,
link |
00:33:45.360
like Jeff Dean has gone out there and given talks and said,
link |
00:33:48.680
our goal is to require 1,000 times more computation,
link |
00:33:52.840
but less people.
link |
00:33:55.120
Whereas our goal is to use the people that you have better
link |
00:34:00.600
and the data you have better and the computation you have better.
link |
00:34:02.960
So one of the things that we've discovered is,
link |
00:34:06.000
or at least highlighted, is that you very, very, very often
link |
00:34:11.080
don't need much data at all.
link |
00:34:13.360
And so the data you already have in your organization
link |
00:34:16.160
will be enough to get state of the art results.
link |
00:34:19.240
So like my starting point would be to kind of say around privacy
link |
00:34:22.600
is a lot of people are looking for ways
link |
00:34:25.760
to share data and aggregate data,
link |
00:34:28.120
but I think often that's unnecessary.
link |
00:34:29.920
They assume that they need more data than they do
link |
00:34:32.160
because they're not familiar with the basics of transfer
link |
00:34:35.240
learning, which is this critical technique
link |
00:34:38.440
for needing orders of magnitude less data.
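(For context, here is a minimal transfer-learning sketch in PyTorch, my own illustration rather than fast.ai's library, assuming a recent torchvision: start from a model pretrained on a large dataset, replace its final layer, and fine-tune on a small amount of your own data.)

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from weights learned on ImageNet rather than from scratch.
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new head trains at first.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with one sized for our own task,
# e.g. 5 classes and perhaps only a few hundred labeled images.
model.fc = nn.Linear(model.fc.in_features, 5)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# ...then train for a few epochs on the small dataset, optionally
# unfreezing the backbone later with a lower learning rate.
```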
link |
00:34:42.000
Is your sense, one reason you might want to collect data
link |
00:34:44.680
from everyone is like in the recommender system context,
link |
00:34:50.440
where your individual, Jeremy Howard's individual data
link |
00:34:54.520
is the most useful for providing a product that's
link |
00:34:58.600
impactful for you.
link |
00:34:59.880
So for giving you advertisements,
link |
00:35:02.240
for recommending to you movies, for doing medical diagnosis.
link |
00:35:07.640
Is your sense we can build with a small amount of data,
link |
00:35:11.720
general models that will have a huge impact for most people,
link |
00:35:16.040
that we don't need to have data from each individual?
link |
00:35:19.120
On the whole, I'd say yes.
link |
00:35:20.560
I mean, there are things like, recommender systems
link |
00:35:26.400
have this cold start problem, where Jeremy is a new customer.
link |
00:35:30.960
We haven't seen him before, so we can't recommend him things
link |
00:35:33.280
based on what else he's bought and liked with us.
link |
00:35:36.520
And there's various workarounds to that.
link |
00:35:39.440
A lot of music programs will start out
link |
00:35:41.160
by saying, which of these artists do you like?
link |
00:35:44.920
Which of these albums do you like?
link |
00:35:46.800
Which of these songs do you like?
link |
00:35:49.800
Netflix used to do that.
link |
00:35:51.040
Nowadays, people don't like that because they think, oh,
link |
00:35:55.320
we don't want to bother the user.
link |
00:35:57.400
So you could work around that by having some kind of data
link |
00:36:00.560
sharing where you get my marketing record from Acxiom
link |
00:36:04.240
or whatever and try to question that.
link |
00:36:06.360
To me, the benefit to me and to society
link |
00:36:12.360
of saving me five minutes on answering some questions
link |
00:36:16.520
versus the negative externalities of the privacy issue
link |
00:36:23.520
doesn't add up.
link |
00:36:24.800
So I think a lot of the time, the places
link |
00:36:26.600
where people are invading our privacy in order
link |
00:36:30.520
to provide convenience is really about just trying
link |
00:36:35.360
to make them more money.
link |
00:36:36.880
And they move these negative externalities
link |
00:36:40.760
into places that they don't have to pay for them.
link |
00:36:44.360
So when you actually see regulations
link |
00:36:48.120
appear that actually cause the companies that
link |
00:36:50.560
create these negative externalities to have
link |
00:36:52.360
to pay for it themselves, they say, well,
link |
00:36:54.320
we can't do it anymore.
link |
00:36:56.160
So the cost is actually too high.
link |
00:36:58.240
But for something like medicine, the hospital
link |
00:37:02.280
has my medical imaging, my pathology studies,
link |
00:37:06.440
my medical records.
link |
00:37:08.920
And also, I own my medical data.
link |
00:37:11.920
So I help a startup called DocAI.
link |
00:37:16.960
One of the things DocAI does is that it has an app.
link |
00:37:19.760
You can connect to Sutter Health and LabCorp and Walgreens
link |
00:37:26.120
and download your medical data to your phone
link |
00:37:29.840
and then upload it, again, at your discretion
link |
00:37:33.560
to share it as you wish.
link |
00:37:36.040
So with that kind of approach, we
link |
00:37:38.440
can share our medical information
link |
00:37:41.160
with the people we want to.
link |
00:37:44.840
Yeah, so control.
link |
00:37:45.720
I mean, really being able to control who you share it with
link |
00:37:48.240
and so on.
link |
00:37:49.760
So that has a beautiful, interesting tangent
link |
00:37:53.080
to return back to the origin story of FastAI.
link |
00:37:59.360
Right, so before I started FastAI,
link |
00:38:02.520
I spent a year researching where are the biggest
link |
00:38:07.160
opportunities for deep learning.
link |
00:38:10.400
Because I knew from my time at Kaggle in particular
link |
00:38:14.080
that deep learning had hit this threshold point where it was
link |
00:38:17.960
rapidly becoming the state of the art approach in every area
link |
00:38:20.520
that looked at it.
link |
00:38:21.600
And I'd been working with neural nets for over 20 years.
link |
00:38:25.400
I knew that from a theoretical point of view,
link |
00:38:27.440
once it hit that point, it would do that in just about every
link |
00:38:30.760
domain.
link |
00:38:31.600
And so I spent a year researching
link |
00:38:34.480
what are the domains it's going to have the biggest low hanging
link |
00:38:37.120
fruit in the shortest time period.
link |
00:38:39.400
I picked medicine, but there were so many I could have picked.
link |
00:38:43.880
And so there was a level of frustration for me of like, OK,
link |
00:38:47.640
I'm really glad we've opened up the medical deep learning
link |
00:38:50.840
world and today it's huge, as you know.
link |
00:38:53.880
But we can't do, you know, I can't do everything.
link |
00:38:58.280
I don't even know like, like in medicine,
link |
00:39:00.400
it took me a really long time to even get a sense of like,
link |
00:39:02.760
what kind of problems do medical practitioners solve?
link |
00:39:05.080
What kind of data do they have?
link |
00:39:06.400
Who has that data?
link |
00:39:08.520
So I kind of felt like I need to approach this differently
link |
00:39:12.480
if I want to maximize the positive impact of deep learning.
link |
00:39:16.200
Rather than me picking an area and trying
link |
00:39:19.480
to become good at it and building something,
link |
00:39:21.720
I should let people who are already domain experts
link |
00:39:24.480
in those areas and who already have the data do it themselves.
link |
00:39:29.240
So that was the reason for fast AI: to basically try
link |
00:39:35.520
and figure out how to get deep learning
link |
00:39:38.840
into the hands of people who could benefit from it
link |
00:39:41.800
and help them to do so in as quick and easy and effective
link |
00:39:45.400
a way as possible.
link |
00:39:47.080
Got it.
link |
00:39:47.560
So sort of empower the domain experts.
link |
00:39:50.240
Yeah.
link |
00:39:51.320
And like partly it's because like,
link |
00:39:54.200
unlike most people in this field,
link |
00:39:56.280
my background is very applied and industrial.
link |
00:39:59.960
Like my first job was at McKinsey & Company.
link |
00:40:02.480
I spent 10 years in management consulting.
link |
00:40:04.640
I spend a lot of time with domain experts.
link |
00:40:10.240
You know, so I kind of respect them and appreciate them.
link |
00:40:12.800
And I know that's where the value generation in society is.
link |
00:40:16.440
And so I also know how most of them can't code.
link |
00:40:21.560
And most of them don't have the time to invest, you know,
link |
00:40:26.320
three years in a graduate degree or whatever.
link |
00:40:29.320
So it's like, how do I upskill those domain experts?
link |
00:40:33.520
I think that would be a super powerful thing,
link |
00:40:36.080
you know, the biggest societal impact I could have.
link |
00:40:40.200
So yeah, that was the thinking.
link |
00:40:41.680
So so much of fast AI students and researchers
link |
00:40:45.680
and the things you teach are programmatically minded,
link |
00:40:50.120
practically minded,
link |
00:40:51.520
figuring out ways how to solve real problems and fast.
link |
00:40:55.840
So from your experience,
link |
00:40:57.480
what's the difference between theory and practice of deep learning?
link |
00:41:02.040
Hmm.
link |
00:41:03.680
Well, most of the research in the deep learning world
link |
00:41:07.520
is a total waste of time.
link |
00:41:09.840
Right. That's what I was getting at.
link |
00:41:11.040
Yeah.
link |
00:41:12.200
It's it's a problem in science in general.
link |
00:41:16.240
Scientists need to be published,
link |
00:41:19.600
which means they need to work on things
link |
00:41:21.480
that their peers are extremely familiar with
link |
00:41:24.040
and can recognize in advance in that area.
link |
00:41:26.200
So that means that they all need to work on the same thing.
link |
00:41:30.040
And so, really, the thing is, there
link |
00:41:33.040
is nothing to encourage them to work on things
link |
00:41:35.640
that are practically useful.
link |
00:41:38.840
So you get just a whole lot of research,
link |
00:41:41.120
which is minor advances in stuff
link |
00:41:43.200
that's been very highly studied
link |
00:41:44.600
and has no significant practical impact.
link |
00:41:49.280
Whereas the things that really make a difference
link |
00:41:50.840
like I mentioned transfer learning,
link |
00:41:52.760
like if we can do better at transfer learning,
link |
00:41:55.560
then it's this like world changing thing
link |
00:41:58.160
where suddenly like lots more people can do world class work
link |
00:42:02.880
with less resources and less data and.
link |
00:42:06.760
But almost nobody works on that.
link |
00:42:08.480
Or another example, active learning,
link |
00:42:10.760
which is the study of like,
link |
00:42:11.880
how do we get more out of the human beings in the loop?
link |
00:42:15.880
That's my favorite topic.
link |
00:42:17.120
Yeah. So active learning is great,
link |
00:42:18.520
but it's almost nobody working on it
link |
00:42:21.160
because it's just not a trendy thing right now.
link |
00:42:23.800
You know what, sorry to interrupt.
link |
00:42:27.040
You were saying that nobody is publishing
link |
00:42:29.720
on active learning, right?
link |
00:42:31.520
But there's people inside companies,
link |
00:42:33.440
anybody who actually has to solve a problem,
link |
00:42:36.800
they're going to innovate on active learning.
link |
00:42:39.600
Yeah. Everybody kind of reinvents active learning
link |
00:42:42.080
when they actually have to work in practice
link |
00:42:43.760
because they start labeling things and they think,
link |
00:42:46.360
gosh, this is taking a long time and it's very expensive.
link |
00:42:49.280
And then they start thinking,
link |
00:42:51.200
well, why am I labeling everything?
link |
00:42:52.640
I'm only, the machine's only making mistakes
link |
00:42:54.840
on those two classes.
link |
00:42:56.040
They're the hard ones.
link |
00:42:56.880
Maybe I'll just start labeling those two classes
link |
00:42:58.840
and then you start thinking,
link |
00:43:00.360
well, why did I do that manually?
link |
00:43:01.560
Why can't I just get the system to tell me
link |
00:43:03.000
which things are going to be the hardest?
link |
00:43:04.760
It's an obvious thing to do.
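(The loop Jeremy describes is essentially uncertainty sampling. Here is a minimal sketch of my own, using scikit-learn, of asking the model which unlabeled examples it is least sure about.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def most_uncertain(model, unlabeled_X, n=10):
    """Return indices of the unlabeled examples the model is least confident about."""
    probs = model.predict_proba(unlabeled_X)
    confidence = probs.max(axis=1)        # confidence in the predicted class
    return np.argsort(confidence)[:n]     # lowest confidence first

# Toy active-learning round: train on what's labeled so far,
# then ask a human to label only the hardest examples.
rng = np.random.default_rng(0)
X_labeled, y_labeled = rng.normal(size=(50, 5)), rng.integers(0, 2, 50)
X_pool = rng.normal(size=(1000, 5))       # unlabeled pool

model = LogisticRegression().fit(X_labeled, y_labeled)
to_label_next = most_uncertain(model, X_pool, n=10)
print(to_label_next)
```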
link |
00:43:06.200
But yeah, it's just like transfer learning.
link |
00:43:11.400
It's understudied and the academic world
link |
00:43:14.120
just has no reason to care about practical results.
link |
00:43:17.440
The funny thing is, like,
link |
00:43:18.360
I've only really ever written one paper.
link |
00:43:19.920
I hate writing papers.
link |
00:43:21.520
And I didn't even write it.
link |
00:43:22.760
It was my colleague, Sebastian Ruder, who actually wrote it.
link |
00:43:25.480
I just did the research for it.
link |
00:43:28.040
But it was basically introducing successful transfer learning
link |
00:43:31.640
to NLP for the first time.
link |
00:43:34.200
And the algorithm is called ULMFiT.
link |
00:43:37.000
And I actually wrote it for the course,
link |
00:43:42.320
for the fast.ai course.
link |
00:43:43.720
I wanted to teach people NLP.
link |
00:43:45.360
And I thought I only want to teach people practical stuff.
link |
00:43:47.520
And I think the only practical stuff is transfer learning.
link |
00:43:50.560
And I couldn't find any examples of transfer learning in NLP.
link |
00:43:53.360
So I just did it.
link |
00:43:54.560
And I was shocked to find that as soon as I did it,
link |
00:43:57.320
which, you know, the basic prototype took a couple of days,
link |
00:44:01.080
smashed the state of the art
link |
00:44:02.520
on one of the most important data sets in a field
link |
00:44:04.760
that I knew nothing about.
link |
00:44:06.720
And I just thought, well, this is ridiculous.
link |
00:44:10.400
And so I spoke to Sebastian about it.
link |
00:44:13.800
And he kindly offered to write it up the results.
link |
00:44:17.680
And so it ended up being published in ACL,
link |
00:44:21.360
which is the top computational linguistics conference.
link |
00:44:25.560
So like, people do actually care once you do it.
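(For readers curious what the ULMFiT recipe looks like as code, here is a compressed sketch using fastai's text API as I understand it. Names like TextDataLoaders, language_model_learner, and AWD_LSTM are from fastai v2; treat the exact calls, arguments, and the file name below as approximate, and note that the full ULMFiT method also uses gradual unfreezing and discriminative learning rates, which fine_tune only approximates.)

```python
from fastai.text.all import *   # assumes fastai v2 is installed
import pandas as pd

df = pd.read_csv('reviews.csv')  # hypothetical file with 'text' and 'label' columns

# Stage 1/2: take a language model pretrained on Wikipedia (AWD_LSTM)
# and fine-tune it on the target corpus so it learns the domain's language.
dls_lm = TextDataLoaders.from_df(df, text_col='text', is_lm=True)
learn_lm = language_model_learner(dls_lm, AWD_LSTM)
learn_lm.fine_tune(3)
learn_lm.save_encoder('finetuned_encoder')

# Stage 3: reuse that encoder inside a classifier and fine-tune it
# on the (possibly small) labeled dataset.
dls_clas = TextDataLoaders.from_df(df, text_col='text', label_col='label',
                                    text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM)
learn_clas.load_encoder('finetuned_encoder')
learn_clas.fine_tune(3)
```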
link |
00:44:28.880
But I guess it's difficult for maybe junior researchers.
link |
00:44:34.160
I don't care whether I get citations or papers or whatever.
link |
00:44:37.720
There's nothing in my life that makes that important,
link |
00:44:39.640
which is why I've never actually
link |
00:44:41.240
bothered to write a paper myself.
link |
00:44:43.040
But for people who do, I guess they
link |
00:44:44.400
have to pick the kind of safe option, which is like,
link |
00:44:50.960
yeah, make a slight improvement on something
link |
00:44:52.720
that everybody's already working on.
link |
00:44:55.160
Yeah, nobody does anything interesting or succeeds
link |
00:44:59.040
in life with the safe option.
link |
00:45:01.240
Well, I mean, the nice thing is nowadays,
link |
00:45:02.960
everybody is now working on NLP transfer learning.
link |
00:45:05.320
Because since that time, we've had GPT and GPT2 and BERT.
link |
00:45:12.240
So yeah, once you show that something's possible,
link |
00:45:15.400
everybody jumps in, I guess.
link |
00:45:17.680
I hope to be a part of it.
link |
00:45:19.320
I hope to see more innovation and active learning
link |
00:45:21.600
in the same way.
link |
00:45:22.160
I think transfer learning and active learning
link |
00:45:24.560
are fascinating open problems.
link |
00:45:27.360
I actually helped start a startup called Platform AI, which
link |
00:45:30.160
is really all about active learning.
link |
00:45:31.760
And yeah, it's been interesting trying
link |
00:45:34.200
to kind of see what research is out there
link |
00:45:36.920
and make the most of it.
link |
00:45:37.800
And there's basically none.
link |
00:45:39.200
So we've had to do all our own research.
link |
00:45:41.040
Once again, and just as you described,
link |
00:45:44.240
can you tell the story of the Stanford competition,
link |
00:45:47.640
DAWNBench, and fast AI's achievement on it?
link |
00:45:51.520
Sure.
link |
00:45:51.960
So something which I really enjoy is that I basically
link |
00:45:55.560
teach two courses a year, the practical deep learning
link |
00:45:59.000
for coders, which is kind of the introductory course,
link |
00:46:02.120
and then cutting edge deep learning for coders, which
link |
00:46:04.280
is the kind of research level course.
link |
00:46:08.080
And while I teach those courses, I basically
link |
00:46:14.320
have a big office at the University of San Francisco.
link |
00:46:18.440
It'd be enough for like 30 people.
link |
00:46:19.800
And I invite any student who wants to come and hang out
link |
00:46:22.960
with me while I build the course.
link |
00:46:25.320
And so generally, it's full.
link |
00:46:26.640
And so we have 20 or 30 people in a big office
link |
00:46:30.880
with nothing to do but study deep learning.
link |
00:46:33.880
So it was during one of these times
link |
00:46:35.880
that somebody in the group said, oh, there's
link |
00:46:38.640
a thing called Dawn Bench that looks interesting.
link |
00:46:41.480
And I say, what the hell is that?
link |
00:46:42.800
They've set up some competition
link |
00:46:44.120
to see how quickly you can train a model.
link |
00:46:46.440
It seems kind of not exactly relevant to what we're doing,
link |
00:46:50.080
but it sounds like the kind of thing
link |
00:46:51.440
which you might be interested in.
link |
00:46:52.480
And I checked it out and I was like, oh, crap.
link |
00:46:53.960
There's only 10 days till it's over.
link |
00:46:55.840
It's pretty much too late.
link |
00:46:58.120
And we're kind of busy trying to teach this course.
link |
00:47:01.000
But we're like, oh, it would make an interesting case study
link |
00:47:05.640
for the course like it's all the stuff we're already doing.
link |
00:47:08.200
Why don't we just put together our current best practices
link |
00:47:11.120
and ideas.
link |
00:47:12.480
So me and I guess about four students just decided
link |
00:47:16.880
to give it a go.
link |
00:47:17.560
And we focused on this small one called
link |
00:47:19.880
CIFAR 10, which is little 32 by 32 pixel images.
link |
00:47:24.640
Can you say what Dawn Bench is?
link |
00:47:26.160
Yeah, so it's a competition to train a model as fast as possible.
link |
00:47:29.560
It was run by Stanford.
link |
00:47:31.000
And as cheap as possible, too.
link |
00:47:32.480
That's also another one for as cheap as possible.
link |
00:47:34.320
And there's a couple of categories, ImageNet and CIFAR 10.
link |
00:47:38.160
So ImageNet's this big 1.3 million image thing
link |
00:47:42.080
that took a couple of days to train.
link |
00:47:45.400
I remember a friend of mine, Pete Warden, who's now at Google.
link |
00:47:51.240
I remember he told me how he trained ImageNet a few years
link |
00:47:53.760
ago when he basically had this little granny flat out
link |
00:47:59.440
the back that he turned into his ImageNet training center.
link |
00:48:01.920
And after a year of work, he figured out
link |
00:48:04.240
how to train it in 10 days or something.
link |
00:48:07.040
It's like that was a big job.
link |
00:48:08.480
Whereas CIFAR 10, at that time, you
link |
00:48:10.640
could train in a few hours.
link |
00:48:13.040
It's much smaller and easier.
link |
00:48:14.520
So we thought we'd try CIFAR 10.
link |
00:48:18.120
And yeah, I've really never done that before.
link |
00:48:23.800
Like, things like using more than one GPU at a time
link |
00:48:27.920
was something I tried to avoid.
link |
00:48:29.800
Because to me, it's very against the whole idea
link |
00:48:32.160
of accessibility. You should be able to do things with one GPU.
link |
00:48:35.080
I mean, have you asked in the past
link |
00:48:36.480
before, after having accomplished something,
link |
00:48:39.680
how do I do this faster, much faster?
link |
00:48:42.520
Oh, always.
link |
00:48:43.240
But it's always, for me, it's always,
link |
00:48:44.680
how do I make it much faster on a single GPU
link |
00:48:47.640
that a normal person could afford in their day to day life?
link |
00:48:50.400
It's not, how could I do it faster by having a huge data
link |
00:48:54.760
center?
link |
00:48:55.280
Because to me, it's all about, like,
link |
00:48:57.240
as many people should be able to use something as possible
link |
00:48:59.560
without fussing around with infrastructure.
link |
00:49:04.160
So anyway, so in this case, it's like, well,
link |
00:49:06.080
we can use eight GPUs just by renting an AWS machine.
link |
00:49:10.240
So we thought we'd try that.
link |
00:49:11.920
And yeah, basically, using the stuff we were already doing,
link |
00:49:16.560
we were able to get the speed.
link |
00:49:20.360
Within a few days, we had the speed down to a very small
link |
00:49:25.360
number of minutes.
link |
00:49:26.040
I can't remember exactly how many minutes it was,
link |
00:49:28.800
but it might have been like 10 minutes or something.
link |
00:49:31.440
And so yeah, we found ourselves at the top of the leaderboard
link |
00:49:34.200
easily for both time and money, which really shocked me.
link |
00:49:38.720
Because the other people competing in this
link |
00:49:40.160
were like Google and Intel and stuff,
link |
00:49:41.880
who know a lot more about this stuff than I think we do.
link |
00:49:45.360
So then we were emboldened.
link |
00:49:46.800
We thought, let's try the ImageNet one too.
link |
00:49:50.640
I mean, it seemed way out of our league.
link |
00:49:53.280
But our goal was to get under 12 hours.
link |
00:49:57.120
And we did, which was really exciting.
link |
00:49:59.280
And we didn't put anything up on the leaderboard,
link |
00:50:01.440
but we were down to like 10 hours.
link |
00:50:03.080
But then Google put in like five hours or something,
link |
00:50:10.000
and we're just like, oh, we're so screwed.
link |
00:50:13.360
But we kind of thought, well, keep trying.
link |
00:50:16.880
If Google can do it in five hours.
link |
00:50:17.880
I mean, Google did it on five hours on like a TPU pod
link |
00:50:20.760
or something, like a lot of hardware.
link |
00:50:24.280
But we kind of like had a bunch of ideas to try.
link |
00:50:26.360
Like a really simple thing was, why
link |
00:50:28.920
are we using these big images?
link |
00:50:30.480
They're like 224 by 224 or 256 by 256 pixels.
link |
00:50:36.280
Why don't we try smaller ones?
link |
00:50:37.640
And just to elaborate, there's a constraint on the accuracy
link |
00:50:41.360
that your trained model is supposed to achieve.
link |
00:50:43.080
Yeah, you've got to achieve 93%.
link |
00:50:45.760
I think it was for ImageNet.
link |
00:50:47.640
Exactly.
link |
00:50:49.160
Which is very tough.
link |
00:50:50.240
So you have to repeat that.
link |
00:50:51.240
Yeah, 93%.
link |
00:50:52.120
Like they picked a good threshold.
link |
00:50:54.680
It was a little bit higher than what the most commonly used
link |
00:50:58.920
ResNet 50 model could achieve at that time.
link |
00:51:03.320
So yeah, so it's quite a difficult problem to solve.
link |
00:51:08.080
But yeah, we realized if we actually just
link |
00:51:09.920
use 64 by 64 images, it trained a pretty good model.
link |
00:51:16.200
And then we could take that same model
link |
00:51:17.960
and just give it a couple of epochs
link |
00:51:19.560
to learn 224 by 224 images.
link |
00:51:21.880
And it was basically already trained.
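A rough sketch of that progressive resizing trick with the fastai v1 API; the dataset, folder layout, batch sizes, and epoch counts here are illustrative assumptions, not the actual DAWNBench code. Train briefly on small images, then swap in larger images and fine tune.

# Hedged sketch of progressive resizing, not the actual competition script.
from fastai.vision import *

path = untar_data(URLs.IMAGENETTE_160)   # stand-in dataset, assumed train/val folders

data_small = ImageDataBunch.from_folder(path, valid='val', size=64, bs=256,
                                        ds_tfms=get_transforms()).normalize(imagenet_stats)
learn = cnn_learner(data_small, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(5)                   # learn the task cheaply at 64 by 64

data_big = ImageDataBunch.from_folder(path, valid='val', size=224, bs=64,
                                      ds_tfms=get_transforms()).normalize(imagenet_stats)
learn.data = data_big                    # same weights, bigger images
learn.fit_one_cycle(2)                   # a couple of epochs to adapt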
link |
00:51:24.440
It makes a lot of sense.
link |
00:51:25.480
Like if you teach somebody, like here's
link |
00:51:27.200
what a dog looks like, and you show them low res versions,
link |
00:51:30.240
and then you say, here's a really clear picture of a dog.
link |
00:51:33.640
They already know what a dog looks like.
link |
00:51:36.000
So with that, like, we just jumped to the front,
link |
00:51:39.920
and we ended up winning parts of that competition.
link |
00:51:46.400
We actually ended up doing a distributed version
link |
00:51:49.680
over multiple machines a couple of months later
link |
00:51:51.960
and ended up at the top of the leaderboard.
link |
00:51:53.560
We had 18 minutes.
link |
00:51:55.440
ImageNet.
link |
00:51:56.280
Yeah, and people have just kept on blasting through again
link |
00:52:00.560
and again since then.
link |
00:52:02.320
So what's your view on multi GPU or multiple machine
link |
00:52:06.760
training in general as a way to speed code up?
link |
00:52:11.960
I think it's largely a waste of time.
link |
00:52:13.680
Both multi GPU on a single machine and?
link |
00:52:15.880
Yeah, particularly multi machines,
link |
00:52:17.640
because it's just clunky.
link |
00:52:21.840
Multi GPUs is less clunky than it used to be.
link |
00:52:25.320
But to me, anything that slows down your iteration speed
link |
00:52:28.520
is a waste of time.
link |
00:52:31.800
So you could maybe do your very last perfecting of the model
link |
00:52:36.960
on multi GPUs if you need to.
link |
00:52:38.960
But so for example, I think doing stuff on ImageNet
link |
00:52:44.560
is generally a waste of time.
link |
00:52:46.000
Why test things on 1.3 million images?
link |
00:52:48.240
Most of us don't use 1.3 million images.
link |
00:52:51.040
And we've also done research that shows that doing things
link |
00:52:54.360
on a smaller subset of images gives you
link |
00:52:56.840
the same relative answers anyway.
link |
00:52:59.280
So from a research point of view, why waste that time?
link |
00:53:02.120
So actually, I released a couple of new data sets recently.
link |
00:53:06.200
One is called Imagenette.
link |
00:53:08.880
The French ImageNet, which is a small subset of ImageNet,
link |
00:53:12.920
which is designed to be easy to classify.
link |
00:53:15.200
Wait, how do you spell Imagenette?
link |
00:53:17.320
It's got an extra T and E at the end,
link |
00:53:19.200
because it's very French.
link |
00:53:20.520
Imagenette, OK.
link |
00:53:21.640
And then another one called Imagewoof,
link |
00:53:24.720
which is a subset of ImageNet that only contains dog breeds.
link |
00:53:29.840
But that's a hard one, right?
link |
00:53:31.120
That's a hard one.
link |
00:53:32.000
And I've discovered that if you just look at these two
link |
00:53:34.360
subsets, you can train things on a single GPU in 10 minutes.
link |
00:53:39.120
And the results you get are directly transferrable
link |
00:53:42.040
to ImageNet nearly all the time.
link |
00:53:44.320
And so now I'm starting to see some researchers start
link |
00:53:46.600
to use these smaller data sets.
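As a concrete example of working at that scale, here is a hedged sketch in fastai v1; the dataset URL constants are as I recall them from the library and should be double checked, and the sizes and epoch counts are only illustrative. The point is that a meaningful experiment fits in minutes on one GPU.

# Hedged sketch: quick experiments on Imagewoof instead of full ImageNet.
from fastai.vision import *

path = untar_data(URLs.IMAGEWOOF_160)    # the harder, dog breeds only subset
data = ImageDataBunch.from_folder(path, valid='val', size=128, bs=64,
                                  ds_tfms=get_transforms()).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(5)                   # minutes on a single GPU, enough to compare ideas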
link |
00:53:48.960
I so deeply love the way you think,
link |
00:53:51.120
because I think you might have written a blog post saying
link |
00:53:57.000
that going with these big data sets
link |
00:54:00.200
is encouraging people to not think creatively.
link |
00:54:03.920
Absolutely.
link |
00:54:04.560
So yeah, it sort of constrains you
link |
00:54:08.320
to train on large resources.
link |
00:54:09.840
And because you have these resources,
link |
00:54:11.280
you think more research will be better.
link |
00:54:14.040
And then somehow you kill the creativity.
link |
00:54:17.760
Yeah.
link |
00:54:18.000
And even worse than that, Lex, I keep hearing from people
link |
00:54:20.760
who say, I decided not to get into deep learning
link |
00:54:23.480
because I don't believe it's accessible to people
link |
00:54:26.080
outside of Google to do useful work.
link |
00:54:28.560
So like I see a lot of people make an explicit decision
link |
00:54:31.640
to not learn this incredibly valuable tool
link |
00:54:36.000
because they've drunk the Google Kool Aid, which is that only
link |
00:54:39.840
Google's big enough and smart enough to do it.
link |
00:54:42.440
And I just find that so disappointing and it's so wrong.
link |
00:54:45.400
And I think all of the major breakthroughs in AI
link |
00:54:49.200
in the next 20 years will be doable on a single GPU.
link |
00:54:53.280
Like I would say, my sense is all the big sort of.
link |
00:54:57.120
Well, let's put it this way.
link |
00:54:58.200
None of the big breakthroughs of the last 20 years
link |
00:55:00.200
have required multiple GPUs.
link |
00:55:01.720
So like batch norm, ReLU, dropout,
link |
00:55:05.920
to demonstrate that there's something to them.
link |
00:55:08.080
Every one of them, none of them has required multiple GPUs.
link |
00:55:11.840
GANs, the original GANs, didn't require multiple GPUs.
link |
00:55:15.800
Well, and we've actually recently shown
link |
00:55:18.040
that you don't even need GANs.
link |
00:55:19.680
So we've developed GAN level outcomes
link |
00:55:23.360
without needing GANs.
link |
00:55:24.720
And we can now do it with, again,
link |
00:55:26.880
by using transfer learning, we can do it in a couple of hours
link |
00:55:29.680
on a single GPU.
link |
00:55:30.520
So you're using a generator model
link |
00:55:31.600
without the adversarial part?
link |
00:55:32.960
Yeah.
link |
00:55:33.440
So we've found loss functions that
link |
00:55:35.880
work super well without the adversarial part.
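The rough idea, sketched in plain PyTorch: this is a generic perceptual "feature loss", a stand-in for what is described here rather than the exact fastai implementation. You compare the generated and target images in the feature space of a frozen pretrained network instead of using a discriminator.

# Hedged sketch of a feature (perceptual) loss, the GAN-free idea described above.
import torch
import torch.nn.functional as F
from torchvision import models

vgg = models.vgg16_bn(pretrained=True).features[:23].eval()   # frozen feature extractor
for p in vgg.parameters():
    p.requires_grad_(False)

def feature_loss(pred, target):
    # pixel term keeps colors roughly right, feature term keeps textures and structure
    pixel = F.l1_loss(pred, target)
    feats = F.mse_loss(vgg(pred), vgg(target))
    return pixel + feats

# usage inside an ordinary training loop (generator and batches are hypothetical names):
# loss = feature_loss(generator(low_res_batch), high_res_batch)
# loss.backward()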
link |
00:55:38.680
And then one of our students, a guy called Jason Antic,
link |
00:55:41.840
has created a system called DeOldify,
link |
00:55:44.640
which uses this technique to colorize
link |
00:55:47.280
old black and white movies.
link |
00:55:48.840
You can do it on a single GPU, colorize a whole movie
link |
00:55:51.480
in a couple of hours.
link |
00:55:52.920
And one of the things that Jason and I did together
link |
00:55:56.080
was we figured out how to add a little bit of GAN
link |
00:56:00.480
at the very end, which it turns out for colorization,
link |
00:56:03.000
makes it just a bit brighter and nicer.
link |
00:56:06.000
And then Jason did masses of experiments
link |
00:56:07.920
to figure out exactly how much to do.
link |
00:56:10.000
But it's still all done on his home machine,
link |
00:56:12.840
on a single GPU in his lounge room.
link |
00:56:15.400
And if you think about colorizing Hollywood movies,
link |
00:56:19.200
that sounds like something a huge studio would have to do.
link |
00:56:21.720
But he has the world's best results on this.
link |
00:56:25.280
There's this problem of microphones.
link |
00:56:27.040
We're just talking to microphones now.
link |
00:56:28.640
Yeah.
link |
00:56:29.140
It's such a pain in the ass to have these microphones
link |
00:56:32.520
to get good quality audio.
link |
00:56:34.440
And I tried to see if it's possible to plop down
link |
00:56:36.720
a bunch of cheap sensors and reconstruct higher quality
link |
00:56:39.960
audio from multiple sources.
link |
00:56:41.840
Because right now, I haven't seen work from, OK,
link |
00:56:45.440
we can use inexpensive mics, automatically combining
link |
00:56:48.760
audio from multiple sources to improve the combined audio.
link |
00:56:52.280
People haven't done that.
link |
00:56:53.200
And that feels like a learning problem.
link |
00:56:55.080
So hopefully somebody can.
link |
00:56:56.800
Well, I mean, it's evidently doable.
link |
00:56:58.760
And it should have been done by now.
link |
00:57:01.000
I felt the same way about computational photography
link |
00:57:03.640
four years ago.
link |
00:57:04.480
That's right.
link |
00:57:05.240
Why are we investing in big lenses when
link |
00:57:08.240
three cheap lenses plus actually a little bit of intentional
link |
00:57:13.160
movement, so like take a few frames,
link |
00:57:16.640
gives you enough information to get excellent subpixel
link |
00:57:19.840
resolution, which particularly with deep learning,
link |
00:57:22.440
you would know exactly what you meant to be looking at.
link |
00:57:25.840
We can totally do the same thing with audio.
link |
00:57:28.200
I think it's madness that it hasn't been done yet.
link |
00:57:30.720
Has there been progress with photography companies?
link |
00:57:33.320
Yeah.
link |
00:57:33.820
Computational photography is basically standard now.
link |
00:57:36.720
So the Google Pixel Night Sight, I
link |
00:57:41.120
don't know if you've ever tried it, but it's astonishing.
link |
00:57:43.240
You take a picture in almost pitch black
link |
00:57:45.440
and you get back a very high quality image.
link |
00:57:49.120
And it's not because of the lens.
link |
00:57:51.440
Same stuff with like adding the bokeh to the background
link |
00:57:55.280
blurring.
link |
00:57:55.800
It's done computationally.
link |
00:57:57.200
Just the pics over here.
link |
00:57:58.520
Yeah.
link |
00:57:59.020
Basically, everybody now is doing most of the fanciest stuff
link |
00:58:05.000
on their phones with computational photography
link |
00:58:07.120
and also increasingly, people are putting more than one lens
link |
00:58:10.640
on the back of the camera.
link |
00:58:11.840
So the same will happen for audio, for sure.
link |
00:58:14.360
And there's applications in the audio side.
link |
00:58:16.520
If you look at an Alexa type device,
link |
00:58:19.360
most people I've seen, especially I worked at Google
link |
00:58:21.840
before, when you look at noise background removal,
link |
00:58:26.000
you don't think of multiple sources of audio.
link |
00:58:29.480
You don't play with that as much as I would hope people would.
link |
00:58:31.920
But I mean, you can still do it even with one.
link |
00:58:33.640
Like, again, not much work's been done in this area.
link |
00:58:36.120
So we're actually going to be releasing an audio library
link |
00:58:38.440
soon, which hopefully will encourage development of this
link |
00:58:41.040
because it's so underused.
link |
00:58:43.200
The basic approach we used for our super resolution,
link |
00:58:46.480
which Jason uses for DeOldify, of generating
link |
00:58:49.960
high quality images, the exact same approach
link |
00:58:51.920
would work for audio.
link |
00:58:53.480
No one's done it yet, but it would be a couple of months work.
link |
00:58:57.160
OK, also learning rate in terms of dawn bench.
link |
00:59:01.600
There's some magic on learning rate that you played around
link |
00:59:04.280
with.
link |
00:59:04.760
It's kind of interesting.
link |
00:59:05.800
Yeah, so this is all work that came from a guy called Leslie
link |
00:59:08.120
Smith.
link |
00:59:09.360
Leslie's a researcher who, like us,
link |
00:59:12.760
cares a lot about just the practicalities of training
link |
00:59:17.720
neural networks quickly and accurately,
link |
00:59:20.000
which you would think is what everybody should care about,
link |
00:59:22.120
but almost nobody does.
link |
00:59:25.000
And he discovered something very interesting,
link |
00:59:28.120
which he calls super convergence, which
link |
00:59:30.000
is there are certain networks that with certain settings
link |
00:59:32.360
of hyperparameters could suddenly
link |
00:59:34.320
be trained 10 times faster by using
link |
00:59:37.440
a 10 times higher learning rate.
link |
00:59:39.480
Now, no one would publish that paper
link |
00:59:44.680
because it's not an area of active research
link |
00:59:49.520
in the academic world.
link |
00:59:50.440
No academics recognize this is important.
link |
00:59:52.840
And also, deep learning in academia
link |
00:59:56.080
is not considered an experimental science.
link |
01:00:00.040
So unlike in physics, where you could say,
link |
01:00:02.440
I just saw a subatomic particle do something
link |
01:00:05.360
which the theory doesn't explain,
link |
01:00:07.240
you could publish that without an explanation.
link |
01:00:10.440
And then in the next 60 years, people
link |
01:00:12.120
can try to work out how to explain it.
link |
01:00:14.120
We don't allow this in the deep learning world.
link |
01:00:16.200
So it's literally impossible for Leslie to publish a paper that
link |
01:00:20.720
says, I've just seen something amazing happen.
link |
01:00:23.560
This thing trained 10 times faster than it should have.
link |
01:00:25.680
I don't know why.
link |
01:00:27.080
And so the reviewers were like, well,
link |
01:00:28.600
you can't publish that because you don't know why.
link |
01:00:30.280
So anyway.
link |
01:00:31.000
That's important to pause on because there's
link |
01:00:32.680
so many discoveries that would need to start like that.
link |
01:00:36.160
Every other scientific field I know of works that way.
link |
01:00:39.280
I don't know why ours is uniquely
link |
01:00:42.520
disinterested in publishing unexplained
link |
01:00:46.480
experimental results.
link |
01:00:47.680
But there it is.
link |
01:00:48.680
So it wasn't published.
link |
01:00:51.200
Having said that, I read a lot more
link |
01:00:55.080
unpublished papers than published papers
link |
01:00:56.840
because that's where you find the interesting insights.
link |
01:01:00.080
So I absolutely read this paper.
link |
01:01:02.680
And I was just like, this is astonishingly mind blowing
link |
01:01:08.120
and weird and awesome.
link |
01:01:09.760
And why isn't everybody only talking about this?
link |
01:01:12.400
Because if you can train these things 10 times faster,
link |
01:01:15.520
they also generalize better because you're doing less epochs,
link |
01:01:18.480
which means you look at the data less,
link |
01:01:20.080
you get better accuracy.
link |
01:01:22.400
So I've been kind of studying that ever since.
link |
01:01:24.640
And eventually Leslie kind of figured out
link |
01:01:28.520
a lot of how to get this done.
link |
01:01:30.160
And we added minor tweaks.
link |
01:01:32.280
And a big part of the trick is starting
link |
01:01:34.840
at a very low learning rate, very gradually increasing it.
link |
01:01:37.920
So as you're training your model,
link |
01:01:39.800
you take very small steps at the start.
link |
01:01:42.120
And you gradually make them bigger and bigger
link |
01:01:44.080
until eventually you're taking much bigger steps
link |
01:01:46.440
than anybody thought was possible.
link |
01:01:49.400
There's a few other little tricks to make it work.
link |
01:01:52.280
Basically, we can reliably get super convergence.
link |
01:01:55.240
And so for the Dawn Bench thing,
link |
01:01:56.640
we were using just much higher learning rates
link |
01:01:59.320
than people expected to work.
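In fastai v1 terms, that recipe looks roughly like the sketch below; the dataset is a tiny stand-in and the exact learning rate is an assumption. You find a good peak rate, then let the one-cycle schedule ramp up to it and anneal back down.

# Hedged sketch of the learning rate finder plus one-cycle schedule described above.
from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)                 # tiny stand-in dataset
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet18, metrics=accuracy)

learn.lr_find()                        # try a range of rates on a few mini batches
learn.recorder.plot()                  # pick a rate where the loss drops steeply
learn.fit_one_cycle(5, max_lr=3e-3)    # warm up to a high rate, then anneal back down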
link |
01:02:02.200
What do you think the future of,
link |
01:02:03.880
I mean, it makes so much sense for that
link |
01:02:05.200
to be a critical hyperparameter, the learning rate that you vary.
link |
01:02:08.640
What do you think the future of learning rate magic looks like?
link |
01:02:13.480
Well, there's been a lot of great work
link |
01:02:14.960
in the last 12 months in this area.
link |
01:02:17.400
And people are increasingly realizing that we just
link |
01:02:20.800
have no idea really how optimizers work.
link |
01:02:23.120
And the combination of weight decay,
link |
01:02:25.840
which is how we regularize optimizers,
link |
01:02:27.480
and the learning rate, and then other things
link |
01:02:30.120
like the epsilon we use in the Adam optimizer,
link |
01:02:32.760
they all work together in weird ways.
link |
01:02:36.560
And different parts of the model,
link |
01:02:38.560
this is another thing we've done a lot of work on,
link |
01:02:40.480
is research into how different parts of the model
link |
01:02:43.480
should be trained at different rates in different ways.
link |
01:02:46.600
So we do something we call discriminative learning rates,
link |
01:02:49.040
which is really important, particularly for transfer
link |
01:02:51.040
learning.
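Continuing the hypothetical learn object from the sketch a few lines back, discriminative learning rates in fastai v1 look roughly like this; the specific rates are illustrative assumptions. The slice spreads learning rates across layer groups, with earlier layers taking smaller steps.

# Hedged sketch of discriminative learning rates for transfer learning.
learn.freeze()
learn.fit_one_cycle(1)                                # train just the new head first
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-5, 1e-3))      # early layers near 1e-5, later layers up to 1e-3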
link |
01:02:53.200
So really, I think in the last 12 months,
link |
01:02:54.880
a lot of people have realized that all this stuff is important.
link |
01:02:57.360
There's been a lot of great work coming out.
link |
01:03:00.000
And we're starting to see algorithms
link |
01:03:02.880
appear which have very, very few dials, if any,
link |
01:03:06.880
that you have to touch.
link |
01:03:07.920
So I think what's going to happen
link |
01:03:09.240
is the idea of a learning rate, well,
link |
01:03:10.840
it almost already has disappeared in the latest research.
link |
01:03:14.360
And instead, it's just like, we know enough
link |
01:03:18.240
about how to interpret the gradients
link |
01:03:22.440
and the change of gradients we see
link |
01:03:23.840
to know how to set every parameter the right way.
link |
01:03:25.440
Then you can automate it.
link |
01:03:26.440
So you see the future of deep learning, where really,
link |
01:03:31.720
where is the input of a human expert needed?
link |
01:03:34.600
Well, hopefully, the input of a human expert
link |
01:03:36.520
will be almost entirely unneeded from the deep learning
link |
01:03:39.680
point of view.
link |
01:03:40.560
So again, Google's approach to this
link |
01:03:43.480
is to try and use thousands of times more compute
link |
01:03:46.000
to run lots and lots of models at the same time
link |
01:03:49.400
and hope that one of them is good.
link |
01:03:51.040
A lot of AutoML kind of stuff.
link |
01:03:51.960
Yeah, a lot of AutoML kind of stuff, which I think is insane.
link |
01:03:56.800
When you better understand the mechanics of how models learn,
link |
01:04:01.720
you don't have to try 1,000 different models
link |
01:04:03.800
to find which one happens to work the best.
link |
01:04:05.680
You can just jump straight to the best one, which
link |
01:04:08.240
means that it's more accessible in terms of compute, cheaper,
link |
01:04:12.720
and also with less hyperparameters to set.
link |
01:04:14.920
That means you don't need deep learning experts
link |
01:04:16.800
to train your deep learning model for you,
link |
01:04:19.360
which means that domain experts can do more of the work, which
link |
01:04:22.480
means that now you can focus the human time
link |
01:04:24.960
on the kind of interpretation, the data gathering,
link |
01:04:28.320
identifying model errors, and stuff like that.
link |
01:04:31.360
Yeah, the data side.
link |
01:04:32.840
How often do you work with data these days
link |
01:04:34.720
in terms of the cleaning? Like Darwin looked
link |
01:04:38.680
at different species while traveling about,
link |
01:04:43.120
do you look at data?
link |
01:04:45.040
Have you, in your roots in Kaggle, just looked at data?
link |
01:04:49.400
Yeah, I mean, it's a key part of our course.
link |
01:04:51.320
It's like before we train a model in the course,
link |
01:04:53.480
we see how to look at the data.
link |
01:04:55.160
And then the first thing we do after we train our first model,
link |
01:04:57.920
which we fine tune an ImageNet model for five minutes.
link |
01:05:00.520
And then the thing we immediately do after that
link |
01:05:02.240
is we learn how to analyze the results of the model
link |
01:05:05.760
by looking at examples of misclassified images,
link |
01:05:08.920
and looking at a confusion matrix,
link |
01:05:10.880
and then doing research on Google
link |
01:05:15.080
to learn about the kinds of things that it's misclassifying.
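In fastai v1 that workflow looks roughly like the sketch below, assuming a learner (learn) that has already been fine tuned as in the earlier sketches; the argument values are illustrative.

# Hedged sketch of inspecting a trained model's mistakes.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)          # the images the model got most confidently wrong
interp.plot_confusion_matrix()     # which classes get mixed up with which
interp.most_confused(min_val=2)    # (actual, predicted, count) pairs worth researching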
link |
01:05:18.000
So to me, one of the really cool things
link |
01:05:19.520
about machine learning models in general
link |
01:05:21.840
is that when you interpret them, they
link |
01:05:24.480
tell you about things like what are the most important features,
link |
01:05:27.360
which groups you're misclassifying,
link |
01:05:29.400
and they help you become a domain expert more quickly,
link |
01:05:32.440
because you can focus your time on the bits
link |
01:05:34.880
that the model is telling you is important.
link |
01:05:38.680
So it lets you deal with things like data leakage,
link |
01:05:40.760
for example, if it says, oh, the main feature I'm looking at
link |
01:05:43.080
is customer ID.
link |
01:05:45.240
And you're like, oh, customer ID shouldn't be predictive.
link |
01:05:47.640
And then you can talk to the people that manage customer IDs,
link |
01:05:52.280
and they'll tell you, oh, yes, as soon as a customer's application
link |
01:05:56.840
is accepted, we add a one on the end of their customer ID
link |
01:05:59.480
or something.
link |
01:06:01.200
So yeah, looking at data, particularly
link |
01:06:04.360
from the lens of which parts of the data the model says
link |
01:06:06.600
is important, is super important.
link |
01:06:09.400
Yeah, and using kind of using the model
link |
01:06:11.480
to almost debug the data to learn more about the data.
link |
01:06:14.240
Exactly.
link |
01:06:16.800
What are the different cloud options
link |
01:06:18.600
for training your networks?
link |
01:06:20.160
Last question related to Dawn Bench.
link |
01:06:22.000
Well, it's part of a lot of the work you do,
link |
01:06:24.240
but from a perspective of performance,
link |
01:06:27.280
I think you've written this in a blog post.
link |
01:06:29.480
There's AWS, there's a TPU from Google.
link |
01:06:32.720
What's your sense?
link |
01:06:33.440
What the future holds?
link |
01:06:34.520
What would you recommend now in terms of training in the cloud?
link |
01:06:37.360
So from a hardware point of view,
link |
01:06:40.520
Google's TPUs and the best Nvidia GPUs are similar.
link |
01:06:45.520
And maybe the TPUs are like 30% faster,
link |
01:06:47.880
but they're also much harder to program.
link |
01:06:51.160
There isn't a clear leader in terms of hardware right now,
link |
01:06:54.720
although much more importantly, Nvidia's GPUs
link |
01:06:57.840
are much more programmable.
link |
01:06:59.560
They've got much more software written for them.
link |
01:07:01.280
That's the clear leader for me and where
link |
01:07:03.480
I would spend my time as a researcher and practitioner.
link |
01:07:08.640
But then in terms of the platform,
link |
01:07:12.280
I mean, we're super lucky now with stuff like Google,
link |
01:07:15.680
GCP, Google Cloud, and AWS that you can access a GPU
link |
01:07:21.520
pretty quickly and easily.
link |
01:07:25.440
But I mean, for AWS, it's still too hard.
link |
01:07:28.280
You have to find an AMI and get the instance running
link |
01:07:33.760
and then install the software you want and blah, blah, blah.
link |
01:07:37.080
GCP is currently the best way to get
link |
01:07:40.400
started on a full server environment
link |
01:07:42.320
because they have a fantastic fast.ai and PyTorch
link |
01:07:46.120
ready to go instance, which has all the courses preinstalled.
link |
01:07:51.120
It has Jupyter Notebook prerunning.
link |
01:07:53.040
Jupyter Notebook is this wonderful interactive computing
link |
01:07:57.080
system, which everybody basically
link |
01:07:59.440
should be using for any kind of data driven research.
link |
01:08:02.920
But then even better than that, there
link |
01:08:05.880
are platforms like Salamander, which we own,
link |
01:08:09.560
and Paperspace, where literally you click a single button
link |
01:08:13.600
and it pops up and you put a notebook straight away
link |
01:08:17.240
without any kind of installation or anything.
link |
01:08:22.240
And all the course notebooks are all preinstalled.
link |
01:08:25.800
So for me, this is one of the things
link |
01:08:28.560
we spent a lot of time curating and working on.
link |
01:08:34.160
Because when we first started our courses,
link |
01:08:35.960
the biggest problem was people dropped out of lesson one
link |
01:08:39.560
because they couldn't get an AWS instance running.
link |
01:08:42.680
So things are so much better now.
link |
01:08:44.880
And we actually have, if you go to course.fast.ai,
link |
01:08:47.760
the first thing it says is, here's
link |
01:08:49.040
how to get started with your GPU.
link |
01:08:50.480
And it's like, you just click on the link
link |
01:08:52.120
and you click start and you're going.
link |
01:08:55.120
So you would go GCP.
link |
01:08:56.240
I have to confess, I've never used the Google GCP.
link |
01:08:58.760
Yeah, GCP gives you $300 of compute for free,
link |
01:09:01.600
which is really nice.
link |
01:09:04.920
But as I say, Salamander and Paperspace are even easier still.
link |
01:09:10.960
So from the perspective of deep learning frameworks,
link |
01:09:15.120
you work with Fast.ai, if you think of it as framework,
link |
01:09:18.400
and PyTorch and TensorFlow, what are the strengths
link |
01:09:22.960
of each platform in your perspective?
link |
01:09:25.840
So in terms of what we've done our research on and taught
link |
01:09:29.240
in our course, we started with Theano and Keras.
link |
01:09:34.400
And then we switched to TensorFlow and Keras.
link |
01:09:38.120
And then we switched to PyTorch.
link |
01:09:40.400
And then we switched to PyTorch and Fast.ai.
link |
01:09:43.360
And that kind of reflects a growth and development
link |
01:09:47.560
of the ecosystem of deep learning libraries.
link |
01:09:52.560
Theano and TensorFlow were great,
link |
01:09:57.040
but were much harder to teach and to do research and development
link |
01:10:01.360
on because they define what's called a computational graph
link |
01:10:04.560
up front, a static graph, where you basically
link |
01:10:06.680
have to say, here are all the things
link |
01:10:08.360
that I'm going to eventually do in my model.
link |
01:10:12.040
And then later on, you say, OK, do those things with this data.
link |
01:10:15.080
And you can't debug them.
link |
01:10:17.160
You can't do them step by step.
link |
01:10:18.560
You can't program them interactively
link |
01:10:20.160
in a Jupyter notebook and so forth.
link |
01:10:22.280
PyTorch was not the first, but PyTorch
link |
01:10:24.320
was certainly the strongest entrant to come along
link |
01:10:27.400
and say, let's not do it that way.
link |
01:10:28.720
Let's just use normal Python.
link |
01:10:31.320
And everything you know about in Python
link |
01:10:32.880
is just going to work.
link |
01:10:34.000
And we'll figure out how to make that run on the GPU
link |
01:10:37.880
as and when necessary.
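That is the property that makes the toy sketch below possible: ordinary Python you can step through, print from, and debug line by line. This is a generic example, not anything from the fastai codebase.

# Hedged toy example of PyTorch's define-by-run style: just normal Python.
import torch
from torch import nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 10), torch.randn(64, 1)

for step in range(5):
    pred = model(x)                       # inspect pred.shape, pred[:3], etc. right here
    loss = ((pred - y) ** 2).mean()
    print(step, loss.item())              # ordinary print debugging, mid training
    loss.backward()
    opt.step()
    opt.zero_grad()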
link |
01:10:40.800
That turned out to be a huge leap in terms
link |
01:10:45.120
of what we could do with our research
link |
01:10:46.800
and what we could do with our teaching.
link |
01:10:49.720
Because it wasn't limiting.
link |
01:10:51.160
Yeah, I mean, it was critical for us
link |
01:10:52.760
for something like Dawnbench to be able to rapidly try things.
link |
01:10:55.960
It's just so much harder to be a researcher and practitioner
link |
01:10:58.560
when you have to do everything upfront
link |
01:11:00.520
and you can't inspect it.
link |
01:11:03.400
Problem with PyTorch is it's not at all
link |
01:11:07.360
accessible to newcomers because you
link |
01:11:09.360
have to write your own training loop
link |
01:11:11.600
and manage the gradients and all this stuff.
link |
01:11:15.680
And it's also not great for researchers
link |
01:11:17.920
because you're spending your time dealing with all this boiler
link |
01:11:20.680
plate and overhead rather than thinking about your algorithm.
link |
01:11:23.920
So we ended up writing this very multi layered API
link |
01:11:27.760
that at the top level, you can train a state of the art neural
link |
01:11:31.040
network in three lines of code.
link |
01:11:33.640
And which talks to an API, which talks to an API,
link |
01:11:35.920
which talks to an API, which you can dive into at any level
link |
01:11:38.880
and get progressively closer to the machine levels of control.
link |
01:11:45.400
And this is the fast AI library.
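Roughly, the top of that stack looks like the three lines below (plus an import), and the lower layers are still reachable when you need them. This is a hedged sketch against the fastai v1 API, not the library's documentation; the dataset is a stand-in.

# Hedged sketch of the layered fastai v1 API: a few lines at the top...
from fastai.vision import *
data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE))
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(1)

# ...and the raw PyTorch underneath is still there when you need it:
pytorch_model = learn.model            # a plain torch.nn.Module you can modify
xb, yb = data.one_batch()              # grab a raw batch of tensors to poke at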
link |
01:11:47.480
That's been critical for us and for our students
link |
01:11:51.920
and for lots of people that have won deep learning
link |
01:11:54.200
competitions with it and written academic papers with it.
link |
01:11:58.560
It's made a big difference.
link |
01:12:00.680
We're still limited though by Python.
link |
01:12:03.960
And particularly this problem with things
link |
01:12:05.920
like recurrent neural nets, say, where you just can't change
link |
01:12:10.640
things unless you accept it going so slowly
link |
01:12:13.320
that it's impractical.
link |
01:12:15.680
So in the latest incarnation of the course
link |
01:12:18.320
and with some of the research we're now starting to do,
link |
01:12:20.880
we're starting to do some stuff in Swift.
link |
01:12:24.480
I think we're three years away from that being
link |
01:12:28.920
super practical, but I'm in no hurry.
link |
01:12:31.080
I'm very happy to invest the time to get there.
link |
01:12:35.480
But with that, we actually already
link |
01:12:38.000
have a nascent version of the fast AI library for vision
link |
01:12:41.840
running on Swift and TensorFlow.
link |
01:12:44.720
Because Python for TensorFlow is not going to cut it.
link |
01:12:48.040
It's just a disaster.
link |
01:12:49.920
What they did was they tried to replicate the bits
link |
01:12:54.440
that people were saying they like about PyTorch,
link |
01:12:56.640
this kind of interactive computation.
link |
01:12:59.160
But they didn't actually change their foundational runtime
link |
01:13:02.760
components.
link |
01:13:03.920
So they kind of added this, like, syntax sugar,
link |
01:13:06.640
they call TF Eager, TensorFlow Eager, which
link |
01:13:08.560
makes it look a lot like PyTorch.
link |
01:13:10.880
But it's 10 times slower than PyTorch to actually do a step.
link |
01:13:16.400
So because they didn't invest the time
link |
01:13:19.080
in retooling the foundations because their code base
link |
01:13:22.080
is so horribly complex.
link |
01:13:23.520
Yeah, I think it's probably very difficult
link |
01:13:25.280
to do that kind of retooling.
link |
01:13:26.440
Yeah, well, particularly the way TensorFlow was written,
link |
01:13:28.680
it was written by a lot of people very quickly
link |
01:13:31.480
in a very disorganized way.
link |
01:13:33.320
So when you actually look in the code, as I do often,
link |
01:13:36.000
I'm always just like, oh, god, what were they thinking?
link |
01:13:38.840
It's just, it's pretty awful.
link |
01:13:41.480
So I'm really extremely negative about the potential future
link |
01:13:47.080
for Python TensorFlow. But Swift for TensorFlow
link |
01:13:52.120
can be a different beast altogether.
link |
01:13:53.760
It can be like, it can basically be a layer on top of MLIR
link |
01:13:57.560
that takes advantage of all the great compiler stuff
link |
01:14:02.640
that Swift builds on with LLVM.
link |
01:14:04.760
And yeah, it could be absolutely.
link |
01:14:07.040
I think it will be absolutely fantastic.
link |
01:14:10.320
Well, you're inspiring me to try.
link |
01:14:11.920
I haven't truly felt the pain of TensorFlow 2.0 Python.
link |
01:14:17.640
It's fine by me.
link |
01:14:19.040
But of course.
link |
01:14:21.080
Yeah, I mean, it does the job if you're using
link |
01:14:23.240
predefined things that somebody's already written.
link |
01:14:27.720
But if you actually compare, like I've
link |
01:14:29.920
had to do a lot of stuff with TensorFlow recently,
link |
01:14:33.680
you actually compare like, I want
link |
01:14:35.480
to write something from scratch.
link |
01:14:37.360
And you're like, I just keep finding it's like, oh,
link |
01:14:39.040
it's running 10 times slower than PyTorch.
link |
01:14:41.560
So what's the biggest cost?
link |
01:14:43.800
Let's throw running time out the window.
link |
01:14:47.360
How long does it take you to program?
link |
01:14:49.640
That's not too different now.
link |
01:14:51.000
Thanks to TensorFlow Eager, that's not too different.
link |
01:14:54.080
But because so many things take so long to run,
link |
01:14:58.640
you wouldn't run it at 10 times slower.
link |
01:15:00.320
Like, you just go like, oh, this is taking too long.
link |
01:15:03.000
And also, there's a lot of things
link |
01:15:04.240
which are just less programmable,
link |
01:15:05.840
like tf.data, which is the way data processing works
link |
01:15:09.000
in TensorFlow, is just this big mess.
link |
01:15:11.400
It's incredibly inefficient.
link |
01:15:13.160
And they kind of had to write it that way
link |
01:15:14.800
because of the TPU problems I described earlier.
link |
01:15:19.160
So I just feel like they've got this huge technical debt,
link |
01:15:24.680
which they're not going to solve without starting from scratch.
link |
01:15:27.960
So here's an interesting question then.
link |
01:15:29.440
If there's a new student starting today,
link |
01:15:34.720
what would you recommend they use?
link |
01:15:37.480
Well, I mean, we obviously recommend
link |
01:15:39.160
FastAI and PyTorch because we teach new students.
link |
01:15:42.760
And that's what we teach with.
link |
01:15:43.960
So we would very strongly recommend that
link |
01:15:46.080
because it will let you get on top of the concepts much
link |
01:15:50.280
more quickly.
link |
01:15:51.960
So then you'll become a practitioner.
link |
01:15:53.160
And you'll also learn the actual state of the art techniques.
link |
01:15:56.400
So you actually get world class results.
link |
01:15:59.240
Honestly, it doesn't much matter what library
link |
01:16:03.000
you learn because switching from Chainer to MXNet to TensorFlow
link |
01:16:09.240
to PyTorch is going to be a couple of days work
link |
01:16:12.000
as long as you understand the foundations well.
link |
01:16:15.280
But you think we'll Swift creep in there as a thing
link |
01:16:21.600
that people start using?
link |
01:16:22.960
Not for a few years, particularly because Swift
link |
01:16:26.400
has no data science community, libraries, tooling.
link |
01:16:33.440
And the Swift community has a total lack of appreciation
link |
01:16:39.080
and understanding of numeric computing.
link |
01:16:41.040
So they keep on making stupid decisions.
link |
01:16:43.640
For years, they've just done dumb things around performance
link |
01:16:47.480
and prioritization.
link |
01:16:50.280
That's clearly changing now because the developer of Swift, Chris
link |
01:16:56.360
Lattner is working at Google on Swift for TensorFlow.
link |
01:16:59.960
So that's a priority.
link |
01:17:03.200
It'll be interesting to see what happens with Apple
link |
01:17:05.000
because Apple hasn't shown any sign of caring
link |
01:17:10.000
about numeric programming in Swift.
link |
01:17:12.960
So hopefully they'll get off their arse
link |
01:17:16.600
and start appreciating this because currently all
link |
01:17:18.840
of their low level libraries are not written in Swift.
link |
01:17:24.240
They're not particularly Swifty at all, stuff like Core ML.
link |
01:17:27.640
They're really pretty rubbish.
link |
01:17:30.840
So yeah, so there's a long way to go.
link |
01:17:32.760
But at least one nice thing is that Swift for TensorFlow
link |
01:17:35.360
can actually directly use Python code and Python libraries.
link |
01:17:40.000
Literally, the entire lesson one notebook of fast AI
link |
01:17:44.240
runs in Swift right now in Python mode.
link |
01:17:47.800
So that's a nice intermediate thing.
link |
01:17:50.800
How long does it take if you look at the two fast AI courses,
link |
01:17:56.800
how long does it take to get from 0.0 to completing
link |
01:18:00.360
both courses?
link |
01:18:02.360
It varies a lot.
link |
01:18:04.800
Somewhere between two months and two years, generally.
link |
01:18:12.360
So for two months, how many hours a day on average?
link |
01:18:15.360
So like somebody who is a very competent coder
link |
01:18:20.360
can do 70 hours per course and pick it up.
link |
01:18:27.360
70, 70.
link |
01:18:28.360
That's it?
link |
01:18:29.360
OK.
link |
01:18:30.360
But a lot of people I know take a year off to study fast AI
link |
01:18:36.360
full time and say at the end of the year,
link |
01:18:39.360
they feel pretty competent.
link |
01:18:42.360
Because generally, there's a lot of other things you do.
link |
01:18:45.360
Generally, they'll be entering Kaggle competitions.
link |
01:18:48.360
They might be reading Ian Goodfellow's book.
link |
01:18:51.360
They might be doing a bunch of stuff.
link |
01:18:54.360
And often, particularly if they are a domain expert,
link |
01:18:57.360
their coding skills might be a little on the pedestrian side.
link |
01:19:01.360
So part of it's just like doing a lot more writing.
link |
01:19:04.360
What do you find is the bottleneck for people usually,
link |
01:19:07.360
except getting started and setting stuff up?
link |
01:19:11.360
I would say coding.
link |
01:19:13.360
The people who are strong coders pick it up the best.
link |
01:19:17.360
Although another bottleneck is people who have a lot of
link |
01:19:21.360
experience of classic statistics can really struggle
link |
01:19:27.360
because the intuition is so the opposite of what they're used to.
link |
01:19:30.360
They're very used to trying to reduce the number of parameters
link |
01:19:33.360
in their model and looking at individual coefficients
link |
01:19:38.360
and stuff like that.
link |
01:19:39.360
So I find people who have a lot of coding background
link |
01:19:42.360
and know nothing about statistics are generally
link |
01:19:45.360
going to be the best off.
link |
01:19:48.360
So you taught several courses on deep learning
link |
01:19:51.360
and as Feynman says, the best way to understand something
link |
01:19:54.360
is to teach it.
link |
01:19:55.360
What have you learned about deep learning from teaching it?
link |
01:19:58.360
A lot.
link |
01:20:00.360
It's a key reason for me to teach the courses.
link |
01:20:03.360
Obviously, it's going to be necessary to achieve our goal
link |
01:20:06.360
of getting domain experts to be familiar with deep learning,
link |
01:20:09.360
but it was also necessary for me to achieve my goal
link |
01:20:12.360
of being really familiar with deep learning.
link |
01:20:16.360
I mean, to see so many domain experts from so many different
link |
01:20:24.360
backgrounds, it's definitely, I wouldn't say taught me,
link |
01:20:28.360
but convinced me something that I liked to believe was true,
link |
01:20:31.360
which was anyone can do it.
link |
01:20:34.360
So there's a lot of kind of snobbishness out there about
link |
01:20:37.360
only certain people can learn to code,
link |
01:20:39.360
only certain people are going to be smart enough to do AI.
link |
01:20:42.360
That's definitely bullshit.
link |
01:20:44.360
I've seen so many people from so many different backgrounds
link |
01:20:48.360
get state of the art results in their domain areas now.
link |
01:20:52.360
It's definitely taught me that the key differentiator
link |
01:20:56.360
between people that succeed and people that fail is tenacity.
link |
01:21:00.360
That seems to be basically the only thing that matters.
link |
01:21:03.360
A lot of people give up.
link |
01:21:07.360
But of the ones who don't give up, pretty much everybody succeeds,
link |
01:21:13.360
even if at first I'm just kind of thinking,
link |
01:21:17.360
wow, they really aren't quite getting it yet, are they?
link |
01:21:20.360
But eventually people get it and they succeed.
link |
01:21:24.360
So I think that's been, I think they're both things I liked
link |
01:21:27.360
to believe was true, but I don't feel like I really had
link |
01:21:29.360
strong evidence for them to be true,
link |
01:21:31.360
but now I can see I've seen it again and again.
link |
01:21:34.360
So what advice do you have for someone
link |
01:21:39.360
who wants to get started in deep learning?
link |
01:21:42.360
Train lots of models.
link |
01:21:44.360
That's how you learn it.
link |
01:21:47.360
So I think, it's not just me.
link |
01:21:51.360
I think our course is very good,
link |
01:21:53.360
but also lots of people independently have said it's very good.
link |
01:21:55.360
It recently won the CogX Award for AI courses,
link |
01:21:58.360
as being the best in the world.
link |
01:22:00.360
I'd say come to our course, course.fast.ai.
link |
01:22:02.360
And the thing I keep on harping on in my lessons is
link |
01:22:05.360
train models, print out the inputs to the models,
link |
01:22:08.360
print out the outputs of the models,
link |
01:22:10.360
like study, you know, change the inputs a bit,
link |
01:22:14.360
look at how the outputs vary,
link |
01:22:16.360
just run lots of experiments to get an intuitive understanding
link |
01:22:22.360
of what's going on.
link |
01:22:24.360
To get hooked, do you think, you mentioned training,
link |
01:22:28.360
do you think just running the models for inference?
link |
01:22:32.360
If we talk about getting started.
link |
01:22:35.360
No, you've got to fine tune the models.
link |
01:22:37.360
So that's the critical thing,
link |
01:22:39.360
because at that point, you now have a model that's in your domain area.
link |
01:22:43.360
So there's no point running somebody else's model,
link |
01:22:46.360
because it's not your model.
link |
01:22:48.360
So it only takes five minutes to fine tune a model
link |
01:22:50.360
for the data you care about.
link |
01:22:52.360
And in lesson two of the course,
link |
01:22:54.360
we teach you how to create your own dataset from scratch
link |
01:22:56.360
by scripting Google image search.
link |
01:22:58.360
And we show you how to actually create a web application running online.
link |
01:23:02.360
So I created one in the course that differentiates
link |
01:23:05.360
between a teddy bear, a grizzly bear, and a brown bear.
link |
01:23:08.360
And it does it with basically 100% accuracy.
link |
01:23:10.360
It took me about four minutes to scrape the images
link |
01:23:13.360
from Google search in the script.
link |
01:23:15.360
There's a little graphical widgets we have in the notebook
link |
01:23:18.360
that help you clean up the dataset.
link |
01:23:21.360
There's other widgets that help you study the results
link |
01:23:24.360
and see where the errors are happening.
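A hedged sketch of that lesson-two workflow with fastai v1; the folder, the urls file naming, and the bears example are illustrative assumptions, and the course notebook has the exact details. You download the scraped image URLs, build a dataset, and fine tune.

# Hedged sketch of building your own classifier from scraped images (lesson 2 style).
from fastai.vision import *

path = Path('data/bears')
classes = ['teddy', 'grizzly', 'brown']
for c in classes:
    # assumes a '<class>_urls.txt' file of image URLs exported from a Google image search
    download_images(path/f'{c}_urls.txt', path/c, max_pics=200)
    verify_images(path/c, delete=True, max_size=500)   # drop broken downloads

data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, size=224,
                                  ds_tfms=get_transforms()).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)     # a few minutes of fine tuning on the new classes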
link |
01:23:26.360
And so now we've got over a thousand replies
link |
01:23:29.360
in our Share Your Work Here thread of students saying,
link |
01:23:32.360
here's the thing I built.
link |
01:23:34.360
And so there's people who, like,
link |
01:23:36.360
and a lot of them are state of the art.
link |
01:23:38.360
Like somebody said, oh, I tried looking at Devanagari characters
link |
01:23:40.360
and I couldn't believe it.
link |
01:23:42.360
The thing that came out was more accurate
link |
01:23:44.360
than the best academic paper after lesson one.
link |
01:23:46.360
And then there's others which are just more kind of fun,
link |
01:23:48.360
like somebody who's doing Trinidad and Tobago hummingbirds.
link |
01:23:53.360
So that's kind of their national bird.
link |
01:23:55.360
And so he's got something that can now classify Trinidad
link |
01:23:57.360
and Tobago hummingbirds.
link |
01:23:59.360
So yeah, train models, fine tune models with your dataset
link |
01:24:02.360
and then study their inputs and outputs.
link |
01:24:05.360
How much is Fast AI courses?
link |
01:24:07.360
Free.
link |
01:24:09.360
Everything we do is free.
link |
01:24:11.360
We have no revenue sources of any kind.
link |
01:24:13.360
It's just a service to the community.
link |
01:24:15.360
You're a saint.
link |
01:24:17.360
Okay, once a person understands the basics,
link |
01:24:20.360
trains a bunch of models,
link |
01:24:22.360
if we look at the scale of years,
link |
01:24:25.360
what advice do you have for someone wanting
link |
01:24:27.360
to eventually become an expert?
link |
01:24:30.360
Train lots of models.
link |
01:24:32.360
Specifically, train lots of models in your domain area.
link |
01:24:35.360
An expert at what, right?
link |
01:24:37.360
We don't need more experts, like,
link |
01:24:40.360
creating slightly evolutionary research in areas
link |
01:24:45.360
that everybody's studying.
link |
01:24:47.360
We need experts at using deep learning
link |
01:24:50.360
to diagnose malaria.
link |
01:24:52.360
Well, we need experts at using deep learning
link |
01:24:55.360
to analyze language to study media bias.
link |
01:25:00.360
So we need experts in analyzing fisheries
link |
01:25:08.360
to identify problem areas in the ocean.
link |
01:25:11.360
That's what we need.
link |
01:25:13.360
So become the expert in your passion area.
link |
01:25:17.360
And this is a tool which you can use for just about anything,
link |
01:25:21.360
and you'll be able to do that thing better than other people,
link |
01:25:24.360
particularly by combining it with your passion
link |
01:25:26.360
and domain expertise.
link |
01:25:27.360
So that's really interesting.
link |
01:25:28.360
Even if you do want to innovate on transfer learning
link |
01:25:30.360
or active learning,
link |
01:25:32.360
your thought is, I mean,
link |
01:25:34.360
what I certainly share is you also need to find
link |
01:25:38.360
a domain or data set that you actually really care for.
link |
01:25:41.360
Right.
link |
01:25:42.360
If you're not working on a real problem that you understand,
link |
01:25:45.360
how do you know if you're doing it any good?
link |
01:25:47.360
How do you know if your results are good?
link |
01:25:49.360
How do you know if you're getting bad results?
link |
01:25:51.360
Why are you getting bad results?
link |
01:25:52.360
Is it a problem with the data?
link |
01:25:54.360
How do you know you're doing anything useful?
link |
01:25:57.360
Yeah, to me, the only really interesting research is,
link |
01:26:00.360
not the only, but the vast majority of interesting research
link |
01:26:03.360
is try and solve an actual problem and solve it really well.
link |
01:26:06.360
So both understanding sufficient tools on the deep learning side
link |
01:26:10.360
and becoming a domain expert in a particular domain
link |
01:26:14.360
are really things within reach for anybody.
link |
01:26:18.360
Yeah.
link |
01:26:19.360
To me, I would compare it to studying self driving cars,
link |
01:26:23.360
having never looked at a car or been in a car
link |
01:26:26.360
or turned a car on, which is like the way it is
link |
01:26:29.360
for a lot of people.
link |
01:26:30.360
They'll study some academic data set
link |
01:26:33.360
where they literally have no idea about that.
link |
01:26:36.360
By the way, I'm not sure how familiar
link |
01:26:37.360
you are with autonomous vehicles,
link |
01:26:39.360
but that is literally, you describe a large percentage
link |
01:26:42.360
of robotics folks working in self driving cars,
link |
01:26:45.360
as they actually haven't considered driving.
link |
01:26:48.360
They haven't actually looked at what driving looks like.
link |
01:26:50.360
They haven't driven.
link |
01:26:51.360
And it applies.
link |
01:26:52.360
It's a problem because you know when you've actually driven,
link |
01:26:54.360
these are the things that happened to me when I was driving.
link |
01:26:57.360
There's nothing that beats the real world examples
link |
01:26:59.360
or just experiencing them.
link |
01:27:02.360
You've created many successful startups.
link |
01:27:04.360
What does it take to create a successful startup?
link |
01:27:08.360
Same thing as becoming successful deep learning practitioner,
link |
01:27:12.360
which is not giving up.
link |
01:27:14.360
So you can run out of money or run out of time
link |
01:27:22.360
or run out of something, you know,
link |
01:27:24.360
but if you keep costs super low
link |
01:27:27.360
and try and save up some money beforehand
link |
01:27:29.360
so you can afford to have some time,
link |
01:27:34.360
then just sticking with it is one important thing.
link |
01:27:37.360
Doing something you understand and care about is important.
link |
01:27:42.360
By something, I don't mean...
link |
01:27:44.360
The biggest problem I see with deep learning people
link |
01:27:46.360
is they do a PhD in deep learning
link |
01:27:49.360
and then they try and commercialize their PhD.
link |
01:27:52.360
That's a waste of time
link |
01:27:53.360
because that doesn't solve an actual problem.
link |
01:27:55.360
You picked your PhD topic
link |
01:27:57.360
because it was an interesting kind of engineering
link |
01:28:00.360
or math or research exercise.
link |
01:28:02.360
But yeah, if you've actually spent time as a recruiter
link |
01:28:06.360
and you know that most of your time was spent sifting through resumes
link |
01:28:10.360
and you know that most of the time
link |
01:28:12.360
you're just looking for certain kinds of things
link |
01:28:14.360
and you can try doing that with a model for a few minutes
link |
01:28:19.360
and see whether that's something which a model
link |
01:28:21.360
seems to be able to do as well as you could,
link |
01:28:23.360
then you're on the right track to creating a startup.
link |
01:28:27.360
And then I think just being...
link |
01:28:30.360
Just be pragmatic and...
link |
01:28:34.360
try and stay away from venture capital money
link |
01:28:36.360
as long as possible, preferably forever.
link |
01:28:38.360
So yeah, on that point, do you...
link |
01:28:42.360
venture capital...
link |
01:28:43.360
So were you able to successfully run startups
link |
01:28:46.360
with self funded for quite a while?
link |
01:28:48.360
Yeah, so my first two were self funded
link |
01:28:50.360
and that was the right way to do it.
link |
01:28:52.360
Is that scary?
link |
01:28:53.360
No.
link |
01:28:55.360
VCs startups are much more scary
link |
01:28:57.360
because you have these people on your back
link |
01:29:00.360
who do this all the time
link |
01:29:01.360
and who have done it for years
link |
01:29:03.360
telling you grow, grow, grow, grow.
link |
01:29:05.360
And they don't care if you fail.
link |
01:29:07.360
They only care if you don't grow fast enough.
link |
01:29:09.360
So that's scary.
link |
01:29:10.360
Whereas doing the ones myself
link |
01:29:13.360
with partners who were friends.
link |
01:29:17.360
It's nice because we just went along
link |
01:29:20.360
at a pace that made sense
link |
01:29:22.360
and we were able to build it to something
link |
01:29:24.360
which was big enough that we never had to work again
link |
01:29:27.360
but was not big enough that any VC
link |
01:29:29.360
would think it was impressive
link |
01:29:31.360
and that was enough for us to be excited.
link |
01:29:35.360
So I thought that's a much better way
link |
01:29:38.360
to do things for most people.
link |
01:29:40.360
And generally speaking now for yourself
link |
01:29:42.360
but how do you make money during that process?
link |
01:29:44.360
Do you cut into savings?
link |
01:29:47.360
So yeah, so I started FastMail
link |
01:29:49.360
and Optimal Decisions at the same time
link |
01:29:51.360
in 1999 with two different friends.
link |
01:29:54.360
And for FastMail,
link |
01:29:59.360
I guess I spent $70 a month on the server.
link |
01:30:03.360
And when the server ran out of space
link |
01:30:06.360
I put a payments button on the front page
link |
01:30:09.360
and said if you want more than 10 meg of space
link |
01:30:11.360
you have to pay $10 a year.
link |
01:30:15.360
So run lean, like, keep your costs down.
link |
01:30:18.360
Yeah, so I kept my cost down
link |
01:30:19.360
and once I needed to spend more money
link |
01:30:22.360
I asked people to spend the money for me
link |
01:30:25.360
and that was that basically from then on.
link |
01:30:29.360
We were making money and it was profitable from then on.
link |
01:30:34.360
For Optimal Decisions it was a bit harder
link |
01:30:37.360
because we were trying to sell something
link |
01:30:40.360
that was more like a $1 million sale
link |
01:30:42.360
but what we did was we would sell scoping projects
link |
01:30:46.360
so kind of like prototypy projects
link |
01:30:50.360
but rather than doing it for free
link |
01:30:51.360
we would sell them for $50,000 to $100,000.
link |
01:30:54.360
So again we were covering our costs
link |
01:30:57.360
and also making the client feel like
link |
01:30:58.360
we were doing something valuable.
link |
01:31:00.360
So in both cases we were profitable from six months in.
link |
01:31:06.360
Nevertheless it's scary.
link |
01:31:08.360
I mean, yeah, sure.
link |
01:31:10.360
I mean it's scary before you jump in
link |
01:31:13.360
and I guess I was comparing it to the scariness of VC.
link |
01:31:18.360
I felt like with VC stuff it was more scary.
link |
01:31:20.360
Much more in somebody else's hands.
link |
01:31:24.360
Will they fund you or not?
link |
01:31:26.360
What do they think of what you're doing?
link |
01:31:28.360
I also found it very difficult with VC backed startups
link |
01:31:30.360
to actually do the thing which I thought was important
link |
01:31:33.360
for the company rather than doing the thing
link |
01:31:35.360
which I thought would make the VC happy.
link |
01:31:38.360
Now, VCs always tell you not to do the thing
link |
01:31:40.360
that makes them happy
link |
01:31:41.360
but then if you don't do the thing that makes them happy
link |
01:31:43.360
they get sad.
link |
01:31:45.360
And do you think optimizing for the whatever they call it
link |
01:31:48.360
the exit is a good thing to optimize for?
link |
01:31:52.360
I mean it can be but not at the VC level
link |
01:31:54.360
because the VC exit needs to be, you know, a thousand X.
link |
01:31:59.360
Whereas the lifestyle exit,
link |
01:32:02.360
if you can sell something for $10 million
link |
01:32:04.360
then you've made it, right?
link |
01:32:06.360
So it depends.
link |
01:32:08.360
If you want to build something that's going to,
link |
01:32:10.360
you're kind of happy to do forever, then fine.
link |
01:32:13.360
If you want to build something you want to sell
link |
01:32:16.360
in three years' time, that's fine too.
link |
01:32:18.360
I mean they're both perfectly good outcomes.
link |
01:32:21.360
So you're learning Swift now?
link |
01:32:24.360
In a way, I mean you already.
link |
01:32:26.360
And I read that you use at least in some cases
link |
01:32:31.360
spaced repetition as a mechanism for learning new things.
link |
01:32:34.360
I use Anki quite a lot myself.
link |
01:32:38.360
I actually never talk to anybody about it.
link |
01:32:41.360
Don't know how many people do it
link |
01:32:44.360
and it works incredibly well for me.
link |
01:32:46.360
Can you talk to your experience?
link |
01:32:48.360
Like how did you, what do you, first of all, okay,
link |
01:32:52.360
let's back it up.
link |
01:32:53.360
What is spaced repetition?
link |
01:32:55.360
So spaced repetition is an idea created
link |
01:33:00.360
by a psychologist named Ebbinghaus,
link |
01:33:03.360
I don't know, must be a couple hundred years ago
link |
01:33:06.360
or something 150 years ago.
link |
01:33:08.360
He did something which sounds pretty damn tedious.
link |
01:33:11.360
He found random sequences of letters on cards
link |
01:33:16.360
and tested how well he would remember those random sequences
link |
01:33:21.360
a day later, a week later, whatever.
link |
01:33:23.360
He discovered that there was this kind of a curve
link |
01:33:26.360
where his probability of remembering one of them
link |
01:33:29.360
would be dramatically smaller the next day
link |
01:33:31.360
and then a little bit smaller the next day
link |
01:33:32.360
and a little bit smaller the next day.
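(For reference: the curve described here is the classic exponential forgetting curve. The form below is the standard textbook summary, not something stated in the conversation, and the symbols are only illustrative.)

$R(t) = e^{-t/s}$

where $R(t)$ is the probability of recall a time $t$ after the last review and $s$ is the stability of the memory; each successful spaced review increases $s$, so the curve decays more slowly after every revision.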
link |
01:33:34.360
What he discovered is that if he revised those cards
link |
01:33:37.360
after a day, the probabilities would decrease at a smaller rate
link |
01:33:42.360
and then if he revised them again a week later,
link |
01:33:44.360
they would decrease at a smaller rate again.
link |
01:33:46.360
And so he basically figured out a roughly optimal equation
link |
01:33:51.360
for when you should revise something you want to remember.
link |
01:33:56.360
So spaced repetition learning is using this simple algorithm,
link |
01:34:00.360
just something like revise something after a day
link |
01:34:03.360
and then three days and then a week and then three weeks
link |
01:34:06.360
and so forth.
link |
01:34:07.360
And so if you use a program like Anki, as you know,
link |
01:34:10.360
it will just do that for you.
link |
01:34:12.360
And it will say, did you remember this?
link |
01:34:14.360
And if you say no, it will reschedule it back to
link |
01:34:18.360
appear again like 10 times faster than it otherwise would have.
link |
01:34:22.360
It's a kind of a way of being guaranteed to learn something
link |
01:34:27.360
because by definition, if you're not learning it,
link |
01:34:30.360
it will be rescheduled to be revised more quickly.
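(A minimal sketch, in Python, of the scheduling idea just described. This is not Anki's actual algorithm, which derives from SuperMemo's SM-2; the Card class, the 2.5x growth factor, and the divide-by-10 penalty are assumptions made purely for illustration.)

from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Card:
    prompt: str
    answer: str
    interval_days: float = 1.0                  # current gap between reviews
    due: date = field(default_factory=date.today)

def review(card: Card, remembered: bool) -> None:
    # Grow the gap after a success (roughly: a day -> three days -> a week -> weeks),
    # shrink it sharply after a failure so the card comes back much sooner.
    if remembered:
        card.interval_days *= 2.5               # assumed growth factor
    else:
        card.interval_days = max(1.0, card.interval_days / 10)  # "10 times faster"
    card.due = date.today() + timedelta(days=round(card.interval_days))

# Usage: work through whatever is due today.
deck = [Card("你好", "hello"), Card("谢谢", "thank you")]
for card in deck:
    if card.due <= date.today():
        review(card, remembered=True)           # a real app would ask the user
        print(card.prompt, "is next due on", card.due)

The design point is simply that the schedule is driven by the learner's own recall: a failed card is automatically pulled forward.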
link |
01:34:33.360
Unfortunately though, it doesn't let you fool yourself.
link |
01:34:37.360
If you're not learning something, you know your revisions
link |
01:34:42.360
will just get more and more.
link |
01:34:44.360
So you have to find ways to learn things productively
link |
01:34:48.360
and effectively treat your brain well.
link |
01:34:50.360
So using mnemonics and stories and context and stuff like that.
link |
01:34:57.360
So yeah, it's a super great technique.
link |
01:34:59.360
It's like learning how to learn is something
link |
01:35:01.360
which everybody should learn before they actually learn anything.
link |
01:35:05.360
But almost nobody does.
link |
01:35:07.360
Yes, so what have you, so it certainly works well
link |
01:35:10.360
for learning new languages, for, I mean, for learning,
link |
01:35:14.360
like small projects almost.
link |
01:35:16.360
But do you, you know, I started using it for,
link |
01:35:19.360
I forget who wrote a blog post about this inspired me.
link |
01:35:22.360
It might have been you, I'm not sure.
link |
01:35:25.360
I started, when I read papers,
link |
01:35:28.360
putting the concepts and ideas into it.
link |
01:35:31.360
Was it Michael Nielsen?
link |
01:35:32.360
It was Michael Nielsen.
link |
01:35:33.360
Yeah, it was Michael Nielsen.
link |
01:35:34.360
Michael started doing this recently
link |
01:35:36.360
and has been writing about it.
link |
01:35:39.360
So the kind of today's Ebbinghaus is a guy called Piotr Wozniak
link |
01:35:44.360
who developed a system called SuperMemo.
link |
01:35:47.360
And he's been basically trying to become like
link |
01:35:51.360
the world's greatest renaissance man over the last few decades.
link |
01:35:55.360
He's basically lived his life with spaced repetition learning
link |
01:36:00.360
for everything.
link |
01:36:03.360
And sort of like, Michael's only very recently got into this,
link |
01:36:07.360
but he started really getting excited about doing it
link |
01:36:09.360
for a lot of different things.
link |
01:36:10.360
For me personally, I actually don't use it
link |
01:36:14.360
for anything except Chinese.
link |
01:36:16.360
And the reason for that is that Chinese is specifically a thing
link |
01:36:21.360
I made a conscious decision that I want to continue to remember
link |
01:36:26.360
even if I don't get much of a chance to exercise it
link |
01:36:29.360
because like I'm not often in China, so I don't.
link |
01:36:33.360
Whereas for something like programming languages or papers,
link |
01:36:37.360
I have a very different approach,
link |
01:36:39.360
which is I try not to learn anything from them,
link |
01:36:42.360
but instead I try to identify the important concepts
link |
01:36:46.360
and like actually ingest them.
link |
01:36:48.360
So like really understand that concept deeply
link |
01:36:53.360
and study it carefully.
link |
01:36:54.360
Well, decide if it really is important.
link |
01:36:56.360
If it is like incorporate it into our library,
link |
01:37:00.360
you know, incorporate it into how I do things
link |
01:37:03.360
or decide it's not worth it.
link |
01:37:06.360
So I find I then remember the things that I care about
link |
01:37:12.360
because I'm using it all the time.
link |
01:37:15.360
So for the last 25 years,
link |
01:37:19.360
I've committed to spending at least half of every day
link |
01:37:23.360
learning or practicing something new,
link |
01:37:25.360
which is all my colleagues have always hated
link |
01:37:28.360
because it always looks like I'm not working on
link |
01:37:30.360
what I'm meant to be working on,
link |
01:37:31.360
but that always means I do everything faster
link |
01:37:34.360
because I've been practicing a lot of stuff.
link |
01:37:36.360
So I kind of give myself a lot of opportunity
link |
01:37:39.360
to practice new things.
link |
01:37:41.360
And so I find now I don't often kind of find myself
link |
01:37:47.360
wishing I could remember something
link |
01:37:50.360
because if it's something that's useful,
link |
01:37:51.360
then I've been using it a lot.
link |
01:37:53.360
It's easy enough to look it up on Google.
link |
01:37:55.360
But speaking Chinese, you can't look it up on Google.
link |
01:37:59.360
Do you have advice for people learning new things?
link |
01:38:01.360
What have you learned as a process?
link |
01:38:04.360
I mean, it all starts with just making the hours
link |
01:38:07.360
in the day available.
link |
01:38:08.360
Yeah, you've got to stick with it,
link |
01:38:10.360
which is, again, the number one thing
link |
01:38:12.360
that 99% of people don't do.
link |
01:38:14.360
So the people I started learning Chinese with,
link |
01:38:16.360
none of them were still doing it 12 months later.
link |
01:38:18.360
I'm still doing it 10 years later.
link |
01:38:20.360
I tried to stay in touch with them,
link |
01:38:22.360
but they just, no one did it.
link |
01:38:24.360
For something like Chinese,
link |
01:38:26.360
like study how human learning works.
link |
01:38:28.360
So every one of my Chinese flashcards
link |
01:38:31.360
is associated with a story,
link |
01:38:33.360
and that story is specifically designed to be memorable.
link |
01:38:36.360
And we find things memorable,
link |
01:38:38.360
if they're funny or disgusting or sexy
link |
01:38:41.360
or related to people that we know or care about.
link |
01:38:44.360
So I try to make sure all the stories that are in my head
link |
01:38:47.360
have those characteristics.
link |
01:38:50.360
Yeah, so you have to, you know,
link |
01:38:52.360
you won't remember things well if they don't have some context.
link |
01:38:55.360
And yeah, you won't remember them well
link |
01:38:57.360
if you don't regularly practice them,
link |
01:39:00.360
whether it be just part of your day to day life
link |
01:39:02.360
or, for the Chinese and me, flashcards.
link |
01:39:05.360
I mean, the other thing is, let yourself fail sometimes.
link |
01:39:09.360
So like, I've had various medical problems
link |
01:39:11.360
over the last few years,
link |
01:39:13.360
and basically my flashcards just stopped
link |
01:39:16.360
for about three years.
link |
01:39:18.360
And then there've been other times I've stopped
link |
01:39:21.360
for a few months, and it's so hard because you get back to it,
link |
01:39:24.360
and it's like, you have 18,000 cards due.
link |
01:39:27.360
It's like, and so you just have to go,
link |
01:39:30.360
all right, well, I can either stop and give up everything
link |
01:39:33.360
or just decide to do this every day for the next two years
link |
01:39:36.360
until I get back to it.
link |
01:39:38.360
The amazing thing has been that even after three years,
link |
01:39:41.360
I, you know, the Chinese were still in there.
link |
01:39:45.360
Like, it was so much faster to relearn
link |
01:39:47.360
than it was to learn it the first time.
link |
01:39:49.360
Yeah, absolutely.
link |
01:39:51.360
It's in there.
link |
01:39:52.360
I have the same with guitar, with music and so on.
link |
01:39:55.360
It's sad because work sometimes takes away
link |
01:39:58.360
and then you won't play for a year.
link |
01:40:00.360
But really, if you then just get back to it every day,
link |
01:40:03.360
you're right there again.
link |
01:40:05.360
What do you think is the next big breakthrough
link |
01:40:08.360
in artificial intelligence?
link |
01:40:09.360
What are your hopes in deep learning or beyond
link |
01:40:12.360
that people should be working on,
link |
01:40:14.360
or you hope there'll be breakthroughs?
link |
01:40:16.360
I don't think it's possible to predict.
link |
01:40:18.360
I think what we already have
link |
01:40:20.360
is an incredibly powerful platform
link |
01:40:23.360
to solve lots of societally important problems
link |
01:40:26.360
that are currently unsolved.
link |
01:40:28.360
I just hope that people will, lots of people
link |
01:40:30.360
will learn this toolkit and try to use it.
link |
01:40:33.360
I don't think we need a lot of new technological breakthroughs
link |
01:40:36.360
to do a lot of great work right now.
link |
01:40:39.360
And when do you think we're going to create
link |
01:40:42.360
a human level intelligence system?
link |
01:40:44.360
Do you think?
link |
01:40:45.360
I don't know.
link |
01:40:46.360
How hard is it?
link |
01:40:47.360
How far away are we?
link |
01:40:48.360
I don't know.
link |
01:40:49.360
I have no way to know.
link |
01:40:50.360
I don't know.
link |
01:40:51.360
Like, I don't know why people make predictions about this
link |
01:40:53.360
because there's no data and nothing to go on.
link |
01:40:57.360
And it's just like,
link |
01:40:59.360
there's so many societally important problems
link |
01:41:03.360
to solve right now,
link |
01:41:04.360
I just don't find it a really interesting question
link |
01:41:08.360
to even answer.
link |
01:41:09.360
So in terms of societally important problems,
link |
01:41:12.360
what's the problem that is within reach?
link |
01:41:15.360
Well, I mean, for example,
link |
01:41:17.360
there are problems that AI creates, right?
link |
01:41:19.360
So more specifically,
link |
01:41:22.360
labor force displacement is going to be huge
link |
01:41:26.360
and people keep making this
link |
01:41:28.360
frivolous econometric argument of being like,
link |
01:41:31.360
oh, there's been other things that aren't AI
link |
01:41:33.360
that have come along before
link |
01:41:34.360
and haven't created massive labor force displacement.
link |
01:41:37.360
Therefore, AI won't.
link |
01:41:39.360
So that's a serious concern for you?
link |
01:41:41.360
Oh, yeah.
link |
01:41:42.360
Andrew Yang is running on it.
link |
01:41:43.360
Yeah.
link |
01:41:44.360
I'm desperately concerned.
link |
01:41:46.360
And you see already that the changing workplace
link |
01:41:52.360
has led to a hollowing out of the middle class.
link |
01:41:55.360
You're seeing that students coming out of school today
link |
01:41:58.360
have a less rosy financial future ahead of them
link |
01:42:03.360
than their parents did,
link |
01:42:04.360
which has never happened in recent,
link |
01:42:06.360
in the last 300 years.
link |
01:42:08.360
We've always had progress before.
link |
01:42:11.360
And you see this turning into anxiety and despair
link |
01:42:16.360
and even violence.
link |
01:42:19.360
So I very much worry about that.
link |
01:42:21.360
You've written quite a bit about ethics, too.
link |
01:42:24.360
I do think that every data scientist
link |
01:42:27.360
working with deep learning needs to recognize
link |
01:42:32.360
they have an incredibly high leverage tool
link |
01:42:34.360
that they're using that can influence society
link |
01:42:36.360
in lots of ways.
link |
01:42:37.360
And if they're doing research,
link |
01:42:38.360
that research is going to be used by people
link |
01:42:41.360
doing this kind of work
link |
01:42:42.360
and they have a responsibility
link |
01:42:44.360
to consider the consequences
link |
01:42:46.360
and to think about things like
link |
01:42:49.360
how will humans be in the loop here?
link |
01:42:53.360
How do we avoid runaway feedback loops?
link |
01:42:55.360
How do we ensure an appeals process for humans
link |
01:42:58.360
that are impacted by my algorithm?
link |
01:43:00.360
How do I ensure that the constraints of my algorithm
link |
01:43:04.360
are adequately explained to the people that end up using them?
link |
01:43:08.360
There's all kinds of human issues,
link |
01:43:11.360
which only data scientists
link |
01:43:13.360
are actually in the right place to educate people about,
link |
01:43:17.360
but data scientists tend to think of themselves as
link |
01:43:21.360
just engineers
link |
01:43:22.360
and that they don't need to be part of that process,
link |
01:43:24.360
which is wrong.
link |
01:43:26.360
Well, you're in the perfect position to educate them better,
link |
01:43:29.360
to read literature, to read history,
link |
01:43:32.360
to learn from history.
link |
01:43:35.360
Well, Jeremy, thank you so much for everything you do
link |
01:43:38.360
for inspiring a huge amount of people,
link |
01:43:40.360
getting them into deep learning
link |
01:43:42.360
and having the ripple effects,
link |
01:43:44.360
the flap of a butterfly's wings that will probably change the world.
link |
01:43:48.360
So thank you very much.