Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
The following is a conversation with Jeremy Howard.
He's the founder of Fast AI, a research institute dedicated
to making deep learning more accessible.
He's also a distinguished research scientist
at the University of San Francisco,
a former president of Kaggle, as well as a top breaking
And in general, he's a successful entrepreneur,
educator, researcher, and an inspiring personality
in the AI community.
When someone asks me, how do I get
started with deep learning?
Fast AI is one of the top places I point them to.
It's easy to get started.
It's insightful and accessible.
And if I may say so, it has very little of the BS
that can sometimes dilute the value of educational content
on popular topics like deep learning.
Fast AI has a focus on practical application
of deep learning and hands on exploration
of the cutting edge that is incredibly
both accessible to beginners and useful to experts.
This is the Artificial Intelligence Podcast.
If you enjoy it, subscribe on YouTube,
give it five stars on iTunes, support it on Patreon,
or simply connect with me on Twitter
at Lex Fridman, spelled F R I D M A N.
And now, here's my conversation with Jeremy Howard.
What's the first program you've ever written?
First program I wrote that I remember
would be at high school.
I did an assignment where I decided
to try to find out if there were some better musical scales
than the normal 12 tone, 12 interval scale.
So I wrote a program on my Commodore 64 in BASIC
that searched through other scale sizes
to see if it could find one where there
were more accurate harmonies.
Like you want an actual exactly 3 to 2 ratio,
whereas with a 12 interval scale,
it's not exactly 3 to 2, for example.
So that's well tempered, as they say.
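The search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the original BASIC program: it assumes equally spaced scale steps, measures error in cents, and looks only at the 3 to 2 ratio; the function names and the size limit of 60 are made up for the example.

```python
import math

def fifth_error_cents(n):
    """Error, in cents, of the closest approximation to a 3:2 ratio
    available in a scale of n equal steps per octave."""
    target = 1200 * math.log2(3 / 2)   # a pure 3:2 fifth, about 701.955 cents
    step = 1200 / n                    # size of one scale step in cents
    k = round(target / step)           # nearest whole number of steps
    return abs(k * step - target)

def better_scales(max_size, baseline_error):
    """Scale sizes up to max_size whose fifth is strictly closer to 3:2
    than baseline_error (a tiny margin guards against float noise)."""
    return [n for n in range(2, max_size + 1)
            if fifth_error_cents(n) < baseline_error - 1e-9]

err12 = fifth_error_cents(12)          # the usual 12 interval scale
candidates = better_scales(60, err12)  # scale sizes that beat it on this one ratio
```

A fuller search would presumably score several ratios at once (for example 5 to 4 as well), since a scale that nails one interval can be poor at others.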
And BASIC on a Commodore 64.
Where was the interest in music from?
Or is it just technical?
I did music all my life, so I played saxophone and clarinet
and piano and guitar and drums and whatever.
How does that thread go through your life?
Where's music today?
It's not where I wish it was.
For various reasons, couldn't really keep it going,
particularly because I had a lot of problems with RSI,
And so I had to cut back anything that used hands
I hope one day I'll be able to get back to it health wise.
So there's a love for music underlying it all.
What's your favorite instrument?
Baritone saxophone.
Well, probably bass saxophone, but they're awkward.
Well, I always love it when music is
coupled with programming.
There's something about a brain that
utilizes those that emerges with creative ideas.
So you've used and studied quite a few programming languages.
Can you give an overview of what you've used?
What are the pros and cons of each?
Well, my favorite programming environment almost certainly
was Microsoft Access back in the earliest days.
So that was Visual Basic for Applications, which
is not a good programming language,
but the programming environment is fantastic.
It's like the ability to create user interfaces and tie data
and actions to them and create reports and all that.
I've never seen anything as good.
So there are things nowadays like Airtable, which
are like small subsets of that, which people love for good reason.
But unfortunately, nobody's ever achieved anything like that.
What is that, if you could pause on that for a second?
Is it a fundamental database?
It was a database program that Microsoft produced,
part of Office, and it kind of withered.
But basically, it lets you in a totally graphical way
create tables and relationships and queries
and tie them to forms and set up event handlers and calculations.
And it was a very complete, powerful system designed
for not massive scalable things, but for useful little applications
So what's the connection between Excel and Access?
So Access was the relational database equivalent,
So people still do a lot of that stuff
that should be in Access in Excel because they know it.
Excel's great as well.
But it's just not as rich a programming model as VBA
combined with a relational database.
And so I've always loved relational databases.
But today, programming on top of relational databases
is just a lot more of a headache.
You generally either need to kind of,
you need something that connects, that runs some kind
of database server, unless you use SQLite, which
has its own issues.
Then you kind of often, if you want
to get a nice programming model, you
need to create an ORM on top.
And then, I don't know, there's all these pieces tied together.
And it's just a lot more awkward than it should be.
There are people that are trying to make it easier,
so in particular, I think of F#, where Don Syme
and his team have done a great job of making something
like a database appear in the type system,
so you actually get tab completion for fields and tables
and stuff like that.
Anyway, so that was kind of, anyway,
so that whole VBA Office thing, I guess,
was a starting point, which I still miss.
And I got into Standard Visual Basic, which
That's interesting, just to pause on that for a second.
And it's interesting that you're connecting programming
languages to the ease of management of data.
So in your use of programming languages,
you always had a love and a connection with data.
I've always been interested in doing useful things for myself
and for others, which generally means getting some data
and doing something with it and putting it out there again.
So that's been my interest throughout.
So I also did a lot of stuff with AppleScript
back in the early days.
So it's kind of nice being able to get the computer
and computers to talk to each other and to do things for you.
And then I think that the programming language
I most loved then would have been Delphi, which
was Object Pascal created by Anders Hejlsberg, who previously
did Turbo Pascal and then went on to create .NET
and then went on to create TypeScript.
Delphi was amazing because it was like a compiled, fast language
that was as easy to use as Visual Basic.
Delphi, what is it similar to in more modern languages?
Yeah, that a compiled, fast version.
So I'm not sure there's anything quite like it anymore.
If you took C# or Java and got rid of the virtual machine
and replaced it with something, you could compile a small type
I feel like it's where Swift could get to with the new SwiftUI
and the cross platform development going on.
That's one of my dreams is that we'll hopefully get back
to where Delphi was.
There is actually a Free Pascal project nowadays
called Lazarus, which is also attempting
to recreate Delphi.
They're making good progress.
So OK, Delphi, that's one of your favorite programming languages?
Well, it's programming environments.
Again, say Pascal's not a nice language.
If you wanted to know specifically
about what languages I like, I would definitely
pick J as being an amazingly wonderful language.
J, are you aware of APL?
I am not, except from doing a little research on the work
OK, so not at all surprising you're not
familiar with it because it's not well known,
but it's actually one of the main families of programming
languages going back to the late 50s, early 60s.
So there was a couple of major directions.
One was the kind of lambda calculus,
Alonzo Church direction, which I guess is kind of Lisp and Scheme
and whatever, which has a history going back
to the early days of computing.
The second was the kind of imperative slash
OO, ALGOL, Simula, going on to C, C++, so forth.
There was a third, which are called array oriented languages,
which started with a paper by a guy called Ken Iverson, which
was actually a math theory paper, not a programming paper.
It was called Notation as a Tool for Thought.
And it was the development of a new type of math notation.
And the idea is that this math notation was much more
flexible, expressive, and also well defined than traditional
math notation, which is none of those things.
Math notation is awful.
And so he actually turned that into a programming language.
Because this was the late 50s, all the names were available.
So he called his programming language A Programming Language, or APL.
So APL is an implementation of notation
as a tool for thought, by which he means math notation.
And Ken and his son went on to do many things,
but eventually they actually produced a new language that
was built on top of all the learnings of APL.
And that was called J. And J is the most
expressive, composable, beautifully designed language
Does it have object oriented components?
Does it have that kind of thing?
It's an array oriented language.
It's the third path.
Are you saying array?
It needs to be array oriented.
So array oriented means that you generally
don't use any loops.
But the whole thing is done with kind
of an extreme version of broadcasting,
if you're familiar with that NumPy slash Python concept.
So you do a lot with one line of code.
It looks a lot like math.
Notation is basically highly compact.
And the idea is that you can kind of,
because you can do so much with one line of code,
a single screen of code is very unlikely to,
you very rarely need more than that to express your program.
And so you can kind of keep it all in your head.
And you can kind of clearly communicate it.
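For readers who have not met the NumPy broadcasting concept mentioned above, here is a small sketch of the same computation written with explicit loops and then as a single array expression. The arrays and variable names are invented for illustration; J and APL take the idea much further than NumPy does.

```python
import numpy as np

prices = np.array([[10.0, 20.0, 30.0],
                   [40.0, 50.0, 60.0]])   # shape (2, 3)
factors = np.array([0.5, 1.0, 2.0])       # shape (3,), one factor per column

# Loop style: visit every element by hand.
looped = np.empty_like(prices)
for i in range(prices.shape[0]):
    for j in range(prices.shape[1]):
        looped[i, j] = prices[i, j] * factors[j]

# Array style: the (3,) vector is broadcast across both rows.
broadcast = prices * factors

# Another one-liner: subtract each column's mean from that column.
centered = prices - prices.mean(axis=0)
```

Both versions produce the same values; the array style simply leaves the iteration to the library.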
It's interesting that APL created two main branches, K and J.
J is this kind of like open source niche community of crazy
enthusiasts like me.
And then the other path, K, was fascinating.
It's an astonishingly expensive programming language,
which many of the world's most ludicrously rich hedge funds
So the entire K machine is so small,
it sits inside level three cache on your CPU.
And it easily wins every benchmark I've ever seen
in terms of data processing speed.
But you don't come across it very much,
because it's like $100,000 per CPU to run it.
But it's like this path of programming languages
is just so much, I don't know, so much more powerful
in every way than the ones that almost anybody uses every day.
So it's all about computation.
It's really focusing on it.
Pretty heavily focused on computation.
I mean, so much of programming is data processing
And so there's a lot of things you can do with it.
But yeah, there's not much work being
done on making user interface toolkits or whatever.
I mean, there's some, but they're not great.
At the same time, you've done a lot of stuff with Perl and Python.
So where does that fit into the picture of J and K and APL
Well, it's just much more pragmatic.
In the end, you kind of have to end up
where the libraries are.
Because to me, my focus is on productivity.
I just want to get stuff done and solve problems.
So Perl was great.
I created an email company called Fastmail.
And Perl was great, because back in the late 90s, early 2000s,
it just had a lot of stuff it could do.
I still had to write my own monitoring system
and my own web framework and my own whatever,
because none of that stuff existed.
But it was a super flexible language to do that in.
And you used Perl for Fastmail.
You used it as a back end.
So everything was written in Perl?
Yeah, everything was Perl.
Why do you think Perl hasn't succeeded or hasn't dominated
the market where Python really takes over a lot of the
Well, I mean, Perl did dominate.
It was everything, everywhere.
But then the guy that ran Perl, Larry Wall,
just didn't put the time in anymore.
And no project can be successful if there isn't.
Particularly one that started with a strong leader that
loses that strong leadership.
So then Python has kind of replaced it.
Python is a much less elegant language in nearly every way.
But it has the data science libraries.
And a lot of them are pretty great.
So I kind of use it because it's the best we have.
But it's definitely not good enough.
What do you think the future of programming looks like?
What do you hope the future of programming looks like if we
zoom in on the computational fields on data science
and machine learning?
I hope Swift is successful because the goal of Swift,
the way Chris Lattner describes it,
is to be infinitely hackable.
And that's what I want.
I want something where me and the people I do research with
and my students can look at and change everything
from top to bottom.
There's nothing mysterious and magical and inaccessible.
Unfortunately, with Python, it's the opposite of that
because Python is so slow, it's extremely unhackable.
You get to a point where it's like, OK, from here on down
it's C. So your debugger doesn't work in the same way.
Your profiler doesn't work in the same way.
Your build system doesn't work in the same way.
It's really not very hackable at all.
What's the part you'd like to be hackable?
Is it for the objective of optimizing training
of neural networks, inference of neural networks?
Is it performance of the system?
Or is there some nonperformance related, just creative idea?
I mean, in the end, I want to be productive as a practitioner.
So at the moment, our understanding of deep learning
is incredibly primitive.
There's very little we understand.
Most things don't work very well, even though it works better
than anything else out there.
There's so many opportunities to make it better.
So you look at any domain area like speech recognition
with deep learning or natural language processing
classification with deep learning or whatever.
Every time I look at an area with deep learning,
I always see like, oh, it's terrible.
There's lots and lots of obviously stupid ways
to do things that need to be fixed.
So then I want to be able to jump in there and quickly
experiment and make them better.
Do you think the programming language has a role in that?
So currently, Python has a big gap in terms of our ability
to innovate particularly around recurrent neural networks
and natural language processing because it's so slow.
The actual loop where we actually loop through words,
we have to do that whole thing in CUDA C.
So we actually can't innovate with the kernel, the heart,
of that most important algorithm.
And it's just a huge problem.
And this happens all over the place.
So we hit research limitations.
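To make the loop being discussed concrete, here is a minimal sketch of a plain recurrent network's forward pass in NumPy. It is an assumption-laden illustration (a simple tanh cell, random weights, made-up names and sizes), not the code of any particular library. The point is the `for` loop: each step needs the hidden state from the previous one, so the steps run one after another, and in practice that inner loop is the part that ends up in hand-written CUDA C.

```python
import numpy as np

def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a simple tanh RNN over a sequence, one time step at a time."""
    hidden = np.zeros(w_hh.shape[0])
    states = []
    for x_t in inputs:   # sequential: step t depends on the state from step t-1
        hidden = np.tanh(x_t @ w_xh + hidden @ w_hh + b_h)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
seq_len, n_in, n_hidden = 5, 4, 3              # illustrative sizes only
inputs = rng.normal(size=(seq_len, n_in))      # e.g. one vector per word
states = rnn_forward(
    inputs,
    rng.normal(size=(n_in, n_hidden)),
    rng.normal(size=(n_hidden, n_hidden)),
    np.zeros(n_hidden),
)
```

Changing the cell here is a one-line edit; changing it inside a fused GPU kernel is a much bigger job, which is the research limitation described above.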
Another example, convolutional neural networks, which
are actually the most popular architecture for lots of things,
maybe most things in deep learning.
We almost certainly should be using
sparse convolutional neural networks, but only like two
people are because to do it, you have
to rewrite all of that CUDA C level stuff.
And yeah, researchers and practitioners just don't.
So there's just big gaps in what people actually research on,
what people actually implement because of the programming
So you think it's just too difficult
to write in CUDA C that a higher level programming language
like Swift should enable the easier,
fooling around, create stuff with RNNs,
or sparse convolutional neural networks?
Who is in charge of making it easy for a researcher to play around?
I mean, no one's at fault.
Just nobody's gotten around to it yet.
Or it's just it's hard.
And I mean, part of the fault is that we ignored that whole APL
kind of direction, or nearly everybody did for 60 years,
But recently, people have been starting
to reinvent pieces of that and kind of create some interesting
new directions in the compiler technology.
So the place where that's particularly happening right now
is something called MLIR, which is something that, again,
Chris Lattner, the Swift guy, is leading.
And because it's actually not going
to be Swift on its own that solves this problem.
Because the problem is that currently writing
an acceptably fast GPU program is too complicated,
regardless of what language you use.
And that's just because if you have to deal with the fact
that I've got 10,000 threads and I have to synchronize between them
all, and I have to put my thing into grid blocks
and think about warps and all this stuff,
it's just so much boilerplate that to do that well,
you have to be a specialist at that.
And it's going to be a year's work to optimize that algorithm
But with things like Tensor Comprehensions, and Tile,
and MLIR, and TVM, there's all these various projects which
are all about saying, let's let people
create domain specific languages for tensor
These are the kinds of things we do generally
on the GPU for deep learning, and then
have a compiler which can optimize that tensor computation.
A lot of this work is actually sitting on top of a project
called Halide, which is a mind blowing project
where they came up with such a domain specific language.
In fact, two, one domain specific language for expressing,
this is what my tensor computation is.
And another domain specific language for expressing,
this is the way I want you to structure
the compilation of that, and do it block by block
and do these bits in parallel.
And they were able to show how you can compress
the amount of code by 10x compared to optimized GPU
code and get the same performance.
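The split described above, one description of what to compute and a separate description of how to walk through the work, can be imitated in plain Python. This is a toy sketch only: it does not use Halide or its API, every name in it is hypothetical, and it shows the separation of concerns rather than any performance benefit.

```python
import numpy as np

def blur_at(image, y, x):
    """The 'algorithm': what one output value is (a 3-wide horizontal mean)."""
    return (image[y, x - 1] + image[y, x] + image[y, x + 1]) / 3.0

def schedule_rows(height, width):
    """One 'schedule': visit the interior output points row by row."""
    for y in range(height):
        for x in range(1, width - 1):
            yield y, x

def schedule_tiles(height, width, tile=2):
    """Another 'schedule': visit the same points block by block."""
    for y0 in range(0, height, tile):
        for x0 in range(1, width - 1, tile):
            for y in range(y0, min(y0 + tile, height)):
                for x in range(x0, min(x0 + tile, width - 1)):
                    yield y, x

def realize(image, schedule):
    """Evaluate the algorithm in whatever order the schedule dictates."""
    out = np.zeros_like(image)
    height, width = image.shape
    for y, x in schedule(height, width):
        out[y, x] = blur_at(image, y, x)
    return out

image = np.arange(30, dtype=float).reshape(5, 6)
by_rows = realize(image, schedule_rows)
by_tiles = realize(image, schedule_tiles)   # same result, different traversal
```

Because `blur_at` never changes, swapping the schedule cannot change the answer, only the order of the work. In a real system the compiler uses that freedom to pick tilings and parallelism for the hardware.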
So these are the things that are sitting on top
of that kind of research, and MLIR
is pulling a lot of those best practices together.
And now we're starting to see work done
on making all of that directly accessible through Swift
so that I could use Swift to write those domain specific
And hopefully we'll get then Swift CUDA kernels
written in a very expressive and concise way that
looks a bit like J and APL, and then Swift layers on top
of that, and then a Swift UI on top of that,
and it'll be so nice if we can get to that point.
Now does it all eventually boil down to CUDA and NVIDIA GPUs?
Unfortunately at the moment it does,
but one of the nice things about MLIR,
if AMD ever gets their act together, which they probably
won't, is that they or others could
write MLIR backends for other GPUs
or rather tensor computation devices, of which today
there are an increasing number like Graphcore or Vertex AI
So yeah, being able to target lots of backends
would be another benefit of this,
and the market really needs competition,
because at the moment NVIDIA is massively
overcharging for their kind of enterprise class cards,
because there is no serious competition,
because nobody else is doing the software properly.
In the cloud there is some competition, right?
But not really, other than TPUs perhaps,
but TPUs are almost unprogrammable at the moment.
TPUs have the same problem that you can't.
So TPUs, Google actually made an explicit decision
to make them almost entirely unprogrammable,
because they felt that there was too much IP in there,
and if they gave people direct access to program them,
people would learn their secrets.
So you can't actually directly program
the memory in a TPU.
You can't even directly create code that runs on
and that you look at on the machine that has the TPU.
It all goes through a virtual machine.
So all you can really do is this kind of cookie cutter
thing of like plugging high level stuff together,
which is just super tedious and annoying
and totally unnecessary.
So tell me if you could, the origin story of Fast AI.
What is the motivation, its mission, its dream?
So I guess the founding story is heavily
tied to my previous startup, which
is a company called Enlitic, which
was the first company to focus on deep learning for medicine.
And I created that because I saw there was a huge opportunity
to, there's about a 10x shortage of the number of doctors
in the world and the developing world that we need.
I expected it would take about 300 years
to train enough doctors to meet that gap.
But I guessed that maybe if we used
deep learning for some of the analytics,
we could maybe make it so you don't need
as highly trained doctors.
For diagnosis and treatment planning.
Where's the biggest benefit just before we get to Fast AI?
Where's the biggest benefit of AI in medicine that you see
today and in the future?
Not much happening today in terms of stuff that's actually
But in terms of the opportunity, it's
to take markets like India and China and Indonesia, which
have big populations, Africa, small numbers of doctors,
and provide diagnostic, particularly treatment
planning and triage kind of on device
so that if you do a test for malaria or tuberculosis
or whatever, you immediately get something
that even a health care worker that's
had a month of training can get a very high quality
assessment of whether the patient might be at risk
and tell, OK, we'll send them off to a hospital.
So for example, in Africa, outside of South Africa,
there's only five pediatric radiologists
for the entire continent.
So most countries don't have any.
So if your kid is sick and they need something
diagnosed through medical imaging,
the person, even if you're able to get medical imaging done,
the person that looks at it will be a nurse at best.
But actually, in India, for example, and China,
almost no x rays are read by anybody,
by any trained professional, because they don't have enough.
So if instead we had an algorithm that
could take the most likely high risk 5% and say triage,
basically say, OK, somebody needs to look at this,
it would massively change
what's possible with medicine in the developing world.
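The triage step described above, keeping the highest risk 5% for a human to review, reduces to a ranking once a model has produced risk scores. A minimal sketch, assuming the scores already exist; here they are random placeholders rather than real model output, and the function name and default fraction are illustrative only.

```python
import numpy as np

def triage(risk_scores, review_fraction=0.05):
    """Return indices of the cases to send to an expert: the top
    review_fraction of cases ranked by risk score (at least one case)."""
    scores = np.asarray(risk_scores, dtype=float)
    n_review = max(1, int(round(review_fraction * scores.size)))
    highest_first = np.argsort(scores)[::-1]
    return highest_first[:n_review]

rng = np.random.default_rng(0)
scores = rng.random(200)      # placeholder scores for 200 hypothetical scans
flagged = triage(scores)      # the 10 cases somebody needs to look at
```

How the scores are produced and where the cutoff should sit are the hard parts, and neither is addressed by this sketch.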
And remember, increasingly, they have money.
They're the developing world.
They're not the poor world, the developing world.
So they have the money.
So they're building the hospitals.
They're getting the diagnostic equipment.
But there's no way for a very long time
will they be able to have the expertise.
Shortage of expertise.
OK, and that's where the deep learning systems
can step in and magnify the expertise they do have.
So you do see, just to linger a little bit longer,
the interaction, do you still see the human experts still
at the core of the system?
Is there something in medicine that
could be automated almost completely?
I don't see the point of even thinking about that,
because we have such a shortage of people.
Why would we want to find a way not to use them?
Like, we have people.
So the idea of, even from an economic point of view,
if you can make them 10x more productive,
getting rid of the person doesn't
impact your unit economics at all.
And it totally ignores the fact that there are things
people do better than machines.
So it's just, to me, that's not a useful way
of framing the problem.
I guess, just to clarify, I guess I
meant there may be some problems where you can avoid even
going to the expert ever.
Sort of maybe preventative care or some basic stuff,
the low hanging fruit, allowing the expert
to focus on the things that are really that.
Well, that's what the triage would do, right?
So the triage would say, OK, 99% sure there's nothing here.
So that can be done on device.
And they can just say, OK, go home.
So the experts are being used to look at the stuff which
has some chance it's worth looking at,
which most things are not.
Why do you think we haven't quite made progress on that yet
in terms of the scale of how much AI is applied in medicine?
There's a lot of reasons.
I mean, one is it's pretty new.
I only started in late 2014.
And before that, it's hard to express
to what degree the medical world was not
aware of the opportunities here.
So I went to RSNA, which is the world's largest radiology
And I told everybody I could, like,
I'm doing this thing with deep learning.
Please come and check it out.
And no one had any idea what I was talking about.
No one had any interest in it.
So we've come from absolute zero, which is hard.
And then the whole regulatory framework, education system,
everything is just set up to think of doctoring
in a very different way.
So today, there is a small number
of people who are deep learning practitioners and doctors
And we're starting to see the first ones come out
of their PhD programs.
So Zak Kohane over in Boston, Cambridge
has a number of students now who are data science experts,
deep learning experts, and actual medical doctors.
Quite a few doctors have completed our Fast AI course
now and are publishing papers and creating journal reading
groups in the American Council of Radiology.
And it's just starting to happen.
But it's going to be a long process.
The regulators have to learn how to regulate this.
They have to build guidelines.
And then the lawyers at hospitals
have to develop a new way of understanding
that sometimes it makes sense for data
to be looked at in raw form in large quantities
in order to create world changing results.
Yeah, there's a regulation around data, all that.
It sounds probably the hardest problem,
but it sounds reminiscent of autonomous vehicles as well.
Many of the same regulatory challenges,
many of the same data challenges.
Yeah, I mean, funnily enough, the problem
is less the regulation and more the interpretation
of that regulation by lawyers in hospitals.
So HIPAA was actually designed.
The P in HIPAA does not stand for privacy.
It stands for portability.
It's actually meant to be a way that data can be used.
And it was created with lots of gray areas
because the idea is that would be more practical
and it would help people to use this legislation
to actually share data in a more thoughtful way.
Unfortunately, it's done the opposite
because when a lawyer sees a gray area, they see, oh,
if we don't know we won't get sued, then we can't do it.
So HIPAA is not exactly the problem.
The problem is more that hospital lawyers
are not incented to make bold decisions
about data portability.
Or even to embrace technology that saves lives.
They more want to not get in trouble
for embracing that technology.
Also, it saves lives in a very abstract way,
which is like, oh, we've been able to release
these 100,000 anonymous records.
I can't point at the specific person whose life that's saved.
I can say like, oh, we've ended up with this paper
which found this result, which diagnosed 1,000 more people
than we would have otherwise, but it's like,
which ones were helped, it's very abstract.
Yeah, and on the counter side of that,
you may be able to point to a life that was taken
because of something that was...
Yeah, or a person whose privacy was violated.
It's like, oh, this specific person,
you know, was re-identified.
Just a fascinating topic.
We're jumping around.
We'll get back to Fast AI, but on the question of privacy,
data is the fuel for so much innovation in deep learning.
What's your sense on privacy,
whether we're talking about Twitter, Facebook, YouTube,
just the technologies like in the medical field
that rely on people's data in order to create impact?
How do we get that right, respecting people's privacy
and yet creating technology that is learned from data?
One of my areas of focus is on doing more with less data,
which so most vendors, unfortunately, are strongly
incented to find ways to require more data and more computation.
So Google and IBM being the most obvious...
Yeah, so Watson, you know, so Google and IBM both strongly push
the idea that they have more data and more computation
and more intelligent people than anybody else,
and so you have to trust them to do things
because nobody else can do it.
And Google's very upfront about this,
like Jeff Dean has gone out there and given talks and said,
our goal is to require 1,000 times more computation,
Our goal is to use the people that you have better
and the data you have better and the computation you have better.
So one of the things that we've discovered is,
or at least highlighted, is that you very, very, very often
don't need much data at all.
And so the data you already have in your organization
will be enough to get state of the art results.
So like my starting point would be to kind of say around privacy
is a lot of people are looking for ways
to share data and aggregate data,
but I think often that's unnecessary.
They assume that they need more data than they do
because they're not familiar with the basics of transfer
learning, which is this critical technique
for needing orders of magnitude less data.
Is your sense, one reason you might want to collect data
link |
from everyone is like in the recommender system context,
link |
where your individual, Jeremy Howard's individual data
link |
is the most useful for providing a product that's
link |
impactful for you.
link |
So for giving you advertisements,
link |
for recommending to you movies, for doing medical diagnosis.
link |
Is your sense we can build with a small amount of data,
link |
general models that will have a huge impact for most people,
link |
that we don't need to have data from each individual?
link |
On the whole, I'd say yes.
link |
I mean, there are things like, recommender systems
link |
have this cold start problem, where Jeremy is a new customer.
link |
We haven't seen him before, so we can't recommend him things
link |
based on what else he's bought and liked with us.
link |
And there's various workarounds to that.
link |
A lot of music programs will start out
link |
by saying, which of these artists do you like?
link |
Which of these albums do you like?
link |
Which of these songs do you like?
link |
Netflix used to do that.
link |
Nowadays, people don't like that because they think, oh,
link |
we don't want to bother the user.
link |
So you could work around that by having some kind of data
link |
sharing where you get my marketing record from Acxiom
link |
or whatever and try to question that.
link |
To me, the benefit to me and to society
link |
of saving me five minutes on answering some questions
link |
versus the negative externalities of the privacy issue
link |
So I think a lot of the time, the places
link |
where people are invading our privacy in order
link |
to provide convenience is really about just trying
link |
to make them more money.
link |
And they move these negative externalities
link |
into places that they don't have to pay for them.
link |
So when you actually see regulations
link |
appear that actually cause the companies that
link |
create these negative externalities to have
link |
to pay for it themselves, they say, well,
link |
we can't do it anymore.
link |
So the cost is actually too high.
link |
But for something like medicine, the hospital
link |
has my medical imaging, my pathology studies,
link |
my medical records.
link |
And also, I own my medical data.
link |
So I help a startup called doc.ai.
link |
One of the things doc.ai does is that it has an app.
link |
You can connect to Sutter Health and LabCorp and Walgreens
link |
and download your medical data to your phone
link |
and then upload it, again, at your discretion
link |
to share it as you wish.
link |
So with that kind of approach, we
link |
can share our medical information
link |
with the people we want to.
link |
I mean, really being able to control who you share it with
link |
So that has a beautiful, interesting tangent
link |
to return back to the origin story of FastAI.
link |
Right, so before I started FastAI,
link |
I spent a year researching where are the biggest
link |
opportunities for deep learning.
link |
Because I knew from my time at Kaggle in particular
link |
that deep learning had hit this threshold point where it was
link |
rapidly becoming the state of the art approach in every area
link |
that looked at it.
link |
And I'd been working with neural nets for over 20 years.
link |
I knew that from a theoretical point of view,
link |
once it hit that point, it would do that in just about every
link |
And so I spent a year researching
link |
what are the domains it's going to have the biggest low hanging
link |
fruit in the shortest time period.
link |
I picked medicine, but there were so many I could have picked.
link |
And so there was a level of frustration for me of like, OK,
link |
I'm really glad we've opened up the medical deep learning
link |
world and today it's huge, as you know.
link |
But we can't do, you know, I can't do everything.
link |
I don't even know like, like in medicine,
link |
it took me a really long time to even get a sense of like,
link |
what kind of problems do medical practitioners solve?
link |
What kind of data do they have?
link |
Who has that data?
link |
So I kind of felt like I need to approach this differently
link |
if I want to maximize the positive impact of deep learning.
link |
Rather than me picking an area and trying
link |
to become good at it and building something,
link |
I should let people who are already domain experts
link |
in those areas and who already have the data do it themselves.
link |
So that was the reason for fast.ai, to basically try
link |
and figure out how to get deep learning
link |
into the hands of people who could benefit from it
link |
and help them to do so in as quick and easy and effective
link |
a way as possible.
link |
So sort of empower the domain experts.
link |
And like partly it's because like,
link |
unlike most people in this field,
link |
my background is very applied and industrial.
link |
Like my first job was at McKinsey & Company.
link |
I spent 10 years of management consulting.
link |
I spent a lot of time with domain experts.
link |
You know, so I kind of respect them and appreciate them.
link |
And I know that's where the value generation in society is.
link |
And so I also know how most of them can't code.
link |
And most of them don't have the time to invest, you know,
link |
three years in a graduate degree or whatever.
link |
So it's like, how do I upskill those domain experts?
link |
I think that would be a super powerful thing,
link |
you know, the biggest societal impact I could have.
link |
So yeah, that was the thinking.
link |
So so much of fast AI students and researchers
link |
and the things you teach are programmatically minded,
link |
practically minded,
link |
figuring out ways how to solve real problems and fast.
link |
So from your experience,
link |
what's the difference between theory and practice of deep learning?
link |
Well, most of the research in the deep learning world
link |
is a total waste of time.
link |
Right. That's what I was getting at.
link |
It's it's a problem in science in general.
link |
Scientists need to be published,
link |
which means they need to work on things
link |
that their peers are extremely familiar with
link |
and can recognize in advance in that area.
link |
So that means that they all need to work on the same thing.
link |
And so, really, in the thing they work on, there
link |
is nothing to encourage them to work on things
link |
that are practically useful.
link |
So you get just a whole lot of research,
link |
which is minor advances in stuff
link |
that's been very highly studied
link |
and has no significant practical impact.
link |
Whereas the things that really make a difference
link |
like I mentioned transfer learning,
link |
like if we can do better at transfer learning,
link |
then it's this like world changing thing
link |
where suddenly like lots more people can do world class work
link |
with less resources and less data and.
link |
But almost nobody works on that.
link |
Or another example, active learning,
link |
which is the study of like,
link |
how do we get more out of the human beings in the loop?
link |
That's my favorite topic.
link |
Yeah. So active learning is great,
link |
but it's almost nobody working on it
link |
because it's just not a trendy thing right now.
link |
You know what somebody started to interrupt?
link |
He was saying that nobody is publishing
link |
on active learning, right?
link |
But there's people inside companies,
link |
anybody who actually has to solve a problem,
link |
they're going to innovate on active learning.
link |
Yeah. Everybody kind of reinvents active learning
link |
when they actually have to work in practice
link |
because they start labeling things and they think,
link |
gosh, this is taking a long time and it's very expensive.
link |
And then they start thinking,
link |
well, why am I labeling everything?
link |
I'm only, the machine's only making mistakes
link |
on those two classes.
link |
They're the hard ones.
link |
Maybe I'll just start labeling those two classes
link |
and then you start thinking,
link |
well, why did I do that manually?
link |
Why can't I just get the system to tell me
link |
which things are going to be hardest?
link |
It's an obvious thing to do.
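The step described here, getting the system to say which items are hardest, is commonly done with uncertainty sampling. What follows is a minimal sketch in Python, assuming the model exposes per-class probabilities for each unlabeled item; the function name `least_confident` and the sample numbers are illustrative and do not come from the conversation.

```python
def least_confident(probabilities, k):
    """Return indices of the k items whose top predicted probability is lowest."""
    # Confidence of each prediction = probability of its most likely class.
    confidence = [(max(p), i) for i, p in enumerate(probabilities)]
    # Lowest confidence first; ties are broken by the original index.
    confidence.sort()
    return [i for _, i in confidence[:k]]


# Hypothetical model outputs for five unlabeled items over three classes.
unlabeled_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
]

# Send only the two least certain items to a human labeler.
to_label = least_confident(unlabeled_probs, k=2)
print(to_label)
```

Least-confidence is only one possible scoring rule; margin or entropy scores would slot into the same loop.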
link |
But yeah, it's just like transfer learning.
link |
It's understudied and the academic world
link |
just has no reason to care about practical results.
link |
The funny thing is, like,
link |
I've only really ever written one paper.
link |
I hate writing papers.
link |
And I didn't even write it.
link |
It was my colleague, Sebastian Ruder, who actually wrote it.
link |
I just did the research for it.
link |
But it was basically introducing successful transfer learning
link |
to NLP for the first time.
link |
And the algorithm is called ULMFiT.
link |
And I actually wrote it for the course,
link |
for the fast.ai course.
link |
I wanted to teach people NLP.
link |
And I thought I only want to teach people practical stuff.
link |
And I think the only practical stuff is transfer learning.
link |
And I couldn't find any examples of transfer learning in NLP.
link |
And I was shocked to find that as soon as I did it,
link |
which, you know, the basic prototype took a couple of days,
link |
smashed the state of the art
link |
on one of the most important data sets in a field
link |
that I knew nothing about.
link |
And I just thought, well, this is ridiculous.
link |
And so I spoke to Sebastian about it.
link |
And he kindly offered to write up the results.
link |
And so it ended up being published in ACL,
link |
which is the top computational linguistics conference.
link |
So like, people do actually care once you do it.
link |
But I guess it's difficult for maybe junior researchers.
link |
I don't care whether I get citations or papers or whatever.
link |
There's nothing in my life that makes that important,
link |
which is why I've never actually
link |
bothered to write a paper myself.
link |
But for people who do, I guess they
link |
have to pick the kind of safe option, which is like,
link |
yeah, make a slight improvement on something
link |
that everybody's already working on.
link |
Yeah, nobody does anything interesting or succeeds
link |
in life with the safe option.
link |
Well, I mean, the nice thing is nowadays,
link |
everybody is now working on NLP transfer learning.
link |
Because since that time, we've had GPT and GPT2 and BERT.
link |
So yeah, once you show that something's possible,
link |
everybody jumps in, I guess.
link |
I hope to be a part of it.
link |
I hope to see more innovation and active learning
link |
I think transfer learning and active learning
link |
are a fascinating public open work.
link |
I actually helped start a startup called Platform AI, which
link |
is really all about active learning.
link |
And yeah, it's been interesting trying
link |
to kind of see what research is out there
link |
and make the most of it.
link |
And there's basically none.
link |
So we've had to do all our own research.
link |
Once again, and just as you described,
link |
can you tell the story of the Stanford competition,
link |
DAWNBench, and fast.ai's achievement on it?
link |
So something which I really enjoy is that I basically
link |
teach two courses a year, the practical deep learning
link |
for coders, which is kind of the introductory course,
link |
and then cutting edge deep learning for coders, which
link |
is the kind of research level course.
link |
And while I teach those courses, I basically
link |
have a big office at the University of San Francisco.
link |
It'd be enough for like 30 people.
link |
And I invite any student who wants to come and hang out
link |
with me while I build the course.
link |
And so generally, it's full.
link |
And so we have 20 or 30 people in a big office
link |
with nothing to do but study deep learning.
link |
So it was during one of these times
link |
that somebody in the group said, oh, there's
link |
a thing called DAWNBench that looks interesting.
link |
And I say, what the hell is that?
link |
They said, it's some competition
link |
to see how quickly you can train a model.
link |
It seems kind of not exactly relevant to what we're doing,
link |
but it sounds like the kind of thing
link |
which you might be interested in.
link |
And I checked it out and I was like, oh, crap.
link |
There's only 10 days till it's over.
link |
It's pretty much too late.
link |
And we're kind of busy trying to teach this course.
link |
But we're like, oh, it would make an interesting case study
link |
for the course like it's all the stuff we're already doing.
link |
Why don't we just put together our current best practices
link |
So me and I guess about four students just decided
link |
And we focused on this small one called
link |
CIFAR-10, which is little 32 by 32 pixel images.
link |
Can you say what DAWNBench is?
link |
Yeah, so it's a competition to train a model as fast as possible.
link |
It was run by Stanford.
link |
And as cheap as possible, too.
link |
That's also another one for as cheap as possible.
link |
And there's a couple of categories, ImageNet and CIFAR-10.
link |
So ImageNet's this big 1.3 million image thing
link |
that took a couple of days to train.
link |
I remember a friend of mine, Pete Warden, who's now at Google.
link |
I remember he told me how he trained ImageNet a few years
link |
ago when he basically had this little granny flat out
link |
the back that he turned into his ImageNet training center.
link |
And after a year of work, he figured out
link |
how to train it in 10 days or something.
link |
It's like that was a big job.
link |
Whereas CIFAR-10, at that time, you
link |
could train in a few hours.
link |
It's much smaller and easier.
link |
So we thought we'd try CIFAR-10.
link |
And yeah, I've really never done that before.
link |
Like, things like using more than one GPU at a time
link |
was something I tried to avoid.
link |
Because to me, it's very against the whole idea
link |
of accessibility. It's better to do things with one GPU.
link |
I mean, have you asked in the past
link |
before, after having accomplished something,
link |
how do I do this faster, much faster?
link |
But it's always, for me, it's always,
link |
how do I make it much faster on a single GPU
link |
that a normal person could afford in their day to day life?
link |
It's not, how could I do it faster by having a huge data
link |
Because to me, it's all about, like,
link |
as many people should be able to use something as possible
link |
without fussing around with infrastructure.
link |
So anyway, so in this case, it's like, well,
link |
we can use eight GPUs just by renting an AWS machine.
link |
So we thought we'd try that.
link |
And yeah, basically, using the stuff we were already doing,
link |
we were able to get the speed.
link |
Within a few days, we had the speed down to a very small
link |
number of minutes.
link |
I can't remember exactly how many minutes it was,
link |
but it might have been like 10 minutes or something.
link |
And so yeah, we found ourselves at the top of the leaderboard
link |
easily for both time and money, which really shocked me.
link |
Because the other people competing in this
link |
were like Google and Intel and stuff,
link |
who know a lot more about this stuff than I think we do.
link |
So then we were emboldened.
link |
We thought, let's try the ImageNet one too.
link |
I mean, it seemed way out of our league.
link |
But our goal was to get under 12 hours.
link |
And we did, which was really exciting.
link |
And we didn't put anything up on the leaderboard,
link |
but we were down to like 10 hours.
link |
But then Google put in like five hours or something,
link |
and we're just like, oh, we're so screwed.
link |
But we kind of thought, well, keep trying.
link |
If Google can do it in five hours.
link |
I mean, Google did it in five hours on like a TPU pod
link |
or something, like a lot of hardware.
link |
But we kind of like had a bunch of ideas to try.
link |
Like a really simple thing was, why
link |
are we using these big images?
link |
They're like 224 by 224 or 256 by 256 pixels.
link |
Why don't we try smaller ones?
link |
And just to elaborate, there's a constraint on the accuracy
link |
that your trained model is supposed to achieve.
link |
Yeah, you've got to achieve 93%.
link |
I think it was for ImageNet.
link |
Which is very tough.
link |
So you have to repeat that.
link |
Like they picked a good threshold.
link |
It was a little bit higher than what the most commonly used
link |
ResNet 50 model could achieve at that time.
link |
So yeah, so it's quite a difficult problem to solve.
link |
But yeah, we realized if we actually just
link |
use 64 by 64 images, it trained a pretty good model.
link |
And then we could take that same model
link |
and just give it a couple of epochs
link |
to learn 224 by 224 images.
link |
And it was basically already trained.
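The idea of training mostly on small images and finishing with a couple of epochs at full size can be written down as a per-epoch size schedule. This is a rough sketch in Python, not the code used for the competition entry: the names `size_schedule` and `relative_pixel_cost`, the total of 10 epochs, and the use of pixel count as a stand-in for compute are all assumptions made for illustration.

```python
def size_schedule(total_epochs, small=64, large=224, large_epochs=2):
    """Image side length to use for each epoch: small first, large at the end."""
    if large_epochs > total_epochs:
        raise ValueError("large_epochs cannot exceed total_epochs")
    small_epochs = total_epochs - large_epochs
    return [small] * small_epochs + [large] * large_epochs


def relative_pixel_cost(schedule, reference=224):
    """Pixels processed, relative to running every epoch at the reference size."""
    used = sum(side * side for side in schedule)
    full = len(schedule) * reference * reference
    return used / full


# Hypothetical run: 8 epochs at 64x64, then 2 epochs at 224x224.
schedule = size_schedule(total_epochs=10)
cost = relative_pixel_cost(schedule)
print(schedule)
print(f"pixels processed vs. all-224 training: {cost:.1%}")
```

Pixel count is only a crude proxy for training time, but it shows why the small-image epochs are so cheap compared with the final full-size ones.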
link |
It makes a lot of sense.
link |
Like if you teach somebody, like here's
link |
what a dog looks like, and you show them low res versions,
link |
and then you say, here's a really clear picture of a dog.
link |
They already know what a dog looks like.
link |
So that, like, just we jumped to the front,
link |
and we ended up winning parts of that competition.
link |
We actually ended up doing a distributed version
link |
over multiple machines a couple of months later
link |
and ended up at the top of the leaderboard.
link |
We had 18 minutes.
link |
Yeah, and people have just kept on blasting through again
link |
and again since then.
link |
So what's your view on multi GPU or multiple machine
link |
training in general as a way to speed code up?
link |
I think it's largely a waste of time.
link |
Both multi GPU on a single machine and?
link |
Yeah, particularly multi machines,
link |
because it's just clunky.
link |
Multi GPUs is less clunky than it used to be.
link |
But to me, anything that slows down your iteration speed
link |
is a waste of time.
link |
So you could maybe do your very last perfecting of the model
link |
on multi GPUs if you need to.
link |
But so for example, I think doing stuff on ImageNet
link |
is generally a waste of time.
link |
Why test things on 1.3 million images?
link |
Most of us don't use 1.3 million images.
link |
And we've also done research that shows that doing things
link |
on a smaller subset of images gives you
link |
the same relative answers anyway.
link |
So from a research point of view, why waste that time?
link |
So actually, I released a couple of new data sets recently.
link |
One is called Imagenette.
link |
The French Imagenet, which is a small subset of ImageNet,
link |
which is designed to be easy to classify.
link |
Well, how do you spell Imagenette?
link |
It's got an extra T and E at the end,
link |
because it's very French.
link |
And then another one called Imagewoof,
link |
which is a subset of ImageNet that only contains dog breeds.
link |
But that's a hard one, right?
link |
That's a hard one.
link |
And I've discovered that if you just look at these two
link |
subsets, you can train things on a single GPU in 10 minutes.
link |
And the results you get are directly transferrable
link |
to ImageNet nearly all the time.
link |
And so now I'm starting to see some researchers start
link |
to use these smaller data sets.
link |
I so deeply love the way you think,
link |
because I think you might have written a blog post saying
link |
that going with these big data sets
link |
is encouraging people to not think creatively.
link |
So year two, it sort of constrains you
link |
to train on large resources.
link |
And because you have these resources,
link |
you think more resources will be better.
link |
And then you start to like somehow you kill the creativity.
link |
And even worse than that, Lex, I keep hearing from people
link |
who say, I decided not to get into deep learning
link |
because I don't believe it's accessible to people
link |
outside of Google to do useful work.
link |
So like I see a lot of people make an explicit decision
link |
to not learn this incredibly valuable tool
link |
because they've drunk the Google Kool Aid, which is that only
link |
Google's big enough and smart enough to do it.
link |
And I just find that so disappointing and it's so wrong.
link |
And I think all of the major breakthroughs in AI
link |
in the next 20 years will be doable on a single GPU.
link |
Like I would say, my sense is all the big sort of.
link |
Well, let's put it this way.
link |
None of the big breakthroughs of the last 20 years
link |
have required multiple GPUs.
link |
So like batch norm, ReLU, dropout,
link |
to demonstrate that there's something to them.
link |
Every one of them, none of them has required multiple GPUs.
link |
GANs, the original GANs, didn't require multiple GPUs.
link |
Well, and we've actually recently shown
link |
that you don't even need GANs.
link |
So we've developed GAN level outcomes
link |
without needing GANs.
link |
And we can now do it with, again,
link |
by using transfer learning, we can do it in a couple of hours
link |
So you're using a generator model
link |
without the adversarial part?
link |
So we've found loss functions that
link |
work super well without the adversarial part.
link |
And then one of our students, a guy called Jason Antic,
link |
has created a system called DeOldify,
link |
which uses this technique to colorize
link |
old black and white movies.
link |
You can do it on a single GPU, colorize a whole movie
link |
in a couple of hours.
link |
And one of the things that Jason and I did together
link |
was we figured out how to add a little bit of GAN
link |
at the very end, which it turns out for colorization,
link |
makes it just a bit brighter and nicer.
link |
And then Jason did masses of experiments
link |
to figure out exactly how much to do.
link |
But it's still all done on his home machine,
link |
on a single GPU in his lounge room.
link |
And if you think about colorizing Hollywood movies,
link |
that sounds like something a huge studio would have to do.
link |
But he has the world's best results on this.
link |
There's this problem of microphones.
link |
We're just talking to microphones now.
link |
It's such a pain in the ass to have these microphones
link |
to get good quality audio.
link |
And I tried to see if it's possible to plop down
link |
a bunch of cheap sensors and reconstruct higher quality
link |
audio from multiple sources.
link |
Because right now, I haven't seen work from, OK,
link |
we can save inexpensive mics, automatically combining
link |
audio from multiple sources to improve the combined audio.
link |
People haven't done that.
link |
And that feels like a learning problem.
link |
So hopefully somebody can.
link |
Well, I mean, it's evidently doable.
link |
And it should have been done by now.
link |
I felt the same way about computational photography
link |
Why are we investing in big lenses when
link |
three cheap lenses plus actually a little bit of intentional
link |
movement, so like take a few frames,
link |
gives you enough information to get excellent subpixel
link |
resolution, which particularly with deep learning,
link |
you would know exactly what you meant to be looking at.
link |
We can totally do the same thing with audio.
link |
I think it's madness that it hasn't been done yet.
link |
Has there been progress on photography companies?
link |
Photography is basically a standard now.
link |
So the Google Pixel Night Sight, I
link |
don't know if you've ever tried it, but it's astonishing.
link |
You take a picture in almost pitch black
link |
and you get back a very high quality image.
link |
And it's not because of the lens.
link |
Same stuff with like adding the bokeh to the background
link |
It's done computationally.
link |
Just the pics over here.
link |
Basically, everybody now is doing most of the fanciest stuff
link |
on their phones with computational photography
link |
and also increasingly, people are putting more than one lens
link |
on the back of the camera.
link |
So the same will happen for audio, for sure.
link |
And there's applications in the audio side.
link |
If you look at an Alexa type device,
link |
most people I've seen, especially I worked at Google
link |
before, when you look at noise background removal,
link |
you don't think of multiple sources of audio.
link |
You don't play with that as much as I would hope people would.
link |
But I mean, you can still do it even with one.
link |
Like, again, it's not much work's been done in this area.
link |
So we're actually going to be releasing an audio library
link |
soon, which hopefully will encourage development of this
link |
because it's so underused.
link |
The basic approach we used for our super resolution,
link |
and which Jason uses for DeOldify, of generating
link |
high quality images, the exact same approach
link |
would work for audio.
link |
No one's done it yet, but it would be a couple of months work.
link |
OK, also learning rate in terms of DAWNBench.
link |
There's some magic on learning rate that you played around
link |
It's kind of interesting.
link |
Yeah, so this is all work that came from a guy called Leslie Smith.
link |
Leslie's a researcher who, like us,
link |
cares a lot about just the practicalities of training
link |
neural networks quickly and accurately,
link |
which you would think is what everybody should care about,
link |
but almost nobody does.
link |
And he discovered something very interesting,
link |
which he calls super convergence, which
link |
is there are certain networks that with certain settings
link |
of hyperparameters could suddenly
link |
be trained 10 times faster by using
link |
a 10 times higher learning rate.
link |
Now, no one would publish that paper
link |
because it's not an area of active research
link |
in the academic world.
link |
No academics recognize this is important.
link |
And also, deep learning in academia
link |
is not considered an experimental science.
link |
So unlike in physics, where you could say,
link |
I just saw a subatomic particle do something
link |
which the theory doesn't explain,
link |
you could publish that without an explanation.
link |
And then in the next 60 years, people
link |
can try to work out how to explain it.
link |
We don't allow this in the deep learning world.
link |
So it's literally impossible for Leslie to publish a paper that
link |
says, I've just seen something amazing happen.
link |
This thing trained 10 times faster than it should have.
link |
And so the reviewers were like, well,
link |
you can't publish that because you don't know why.
link |
That's important to pause on because there's
link |
so many discoveries that would need to start like that.
link |
Every other scientific field I know of works that way.
link |
I don't know why ours is uniquely
link |
disinterested in publishing unexplained
link |
experimental results.
link |
So it wasn't published.
link |
Having said that, I read a lot more
link |
unpublished papers than published papers
link |
because that's where you find the interesting insights.
link |
So I absolutely read this paper.
link |
And I was just like, this is astonishingly mind blowing
link |
and weird and awesome.
link |
And why isn't everybody only talking about this?
link |
Because if you can train these things 10 times faster,
link |
they also generalize better because you're doing less epochs,
link |
which means you look at the data less,
link |
you get better accuracy.
link |
So I've been kind of studying that ever since.
link |
And eventually Leslie kind of figured out
link |
a lot of how to get this done.
link |
And we added minor tweaks.
link |
And a big part of the trick is starting
link |
at a very low learning rate, very gradually increasing it.
link |
So as you're training your model,
link |
you take very small steps at the start.
link |
And you gradually make them bigger and bigger
link |
until eventually you're taking much bigger steps
link |
than anybody thought was possible.
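The schedule described here, tiny steps at first and then progressively larger ones, can be sketched as a function from step number to learning rate. The Python below is an illustration under stated assumptions: the linear ramp, the 30% warm-up fraction, the specific minimum and maximum rates, and the linear decay after the peak are all choices made for this sketch rather than details given in the conversation.

```python
def warmup_schedule(step, total_steps, lr_min=1e-4, lr_max=1e-1, warmup_frac=0.3):
    """Learning rate at a given step: linear ramp up, then linear ramp back down."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Small steps at the start, growing linearly toward lr_max.
        t = step / warmup_steps
        return lr_min + t * (lr_max - lr_min)
    # After the peak, decay linearly back toward lr_min (an assumption of this sketch).
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_max - t * (lr_max - lr_min)


# Learning rate for every step of a hypothetical 100-step run.
lrs = [warmup_schedule(s, total_steps=100) for s in range(100)]
peak_step = lrs.index(max(lrs))
print(f"start={lrs[0]:.4f} peak={max(lrs):.4f} at step {peak_step} end={lrs[-1]:.4f}")
```

In a real training loop the returned value would be assigned to the optimizer's learning rate before each batch.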
link |
There's a few other little tricks to make it work.
link |
Basically, we can reliably get super convergence.
link |
And so for the DAWNBench thing,
link |
we were using just much higher learning rates
link |
than people expected to work.
link |
What do you think the future of,
link |
I mean, it makes so much sense for that
link |
to be a critical hyperparameter learning rate that you vary.
link |
What do you think the future of learning rate magic looks like?
link |
Well, there's been a lot of great work
link |
in the last 12 months in this area.
link |
And people are increasingly realizing that we just
link |
have no idea really how optimizers work.
link |
And the combination of weight decay,
link |
which is how we regularize optimizers,
link |
and the learning rate, and then other things
link |
like the epsilon we use in the Adam optimizer,
link |
they all work together in weird ways.
link |
And different parts of the model,
link |
this is another thing we've done a lot of work on,
link |
is research into how different parts of the model
link |
should be trained at different rates in different ways.
link |
So we do something we call discriminative learning rates,
link |
which is really important, particularly for transfer
link |
So really, I think in the last 12 months,
link |
a lot of people have realized that all this stuff is important.
link |
There's been a lot of great work coming out.
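Training different parts of a model at different rates can be illustrated by giving each layer group its own learning rate. The sketch below assumes geometric spacing from the earliest (pretrained) layers to the newly added head; that spacing, the function name `discriminative_lrs`, the group names, and the two endpoint rates are assumptions for illustration, not values quoted in the conversation.

```python
def discriminative_lrs(n_groups, lr_first, lr_last):
    """Geometrically spaced learning rates, one per layer group, first to last."""
    if n_groups < 1:
        raise ValueError("need at least one layer group")
    if n_groups == 1:
        return [lr_last]
    # Constant multiplicative gap between neighbouring groups.
    ratio = (lr_last / lr_first) ** (1 / (n_groups - 1))
    return [lr_first * ratio**i for i in range(n_groups)]


# Hypothetical split of a pretrained network into three layer groups.
groups = ["early_layers", "middle_layers", "new_head"]
group_lrs = discriminative_lrs(len(groups), lr_first=1e-5, lr_last=1e-3)
for name, lr in zip(groups, group_lrs):
    print(f"{name}: {lr:.0e}")
```

The intuition is that early pretrained layers already hold general features and should move slowly, while the new head starts from scratch and can take larger steps.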
link |
And we're starting to see algorithms
link |
appear which have very, very few dials, if any,
link |
that you have to touch.
link |
So I think what's going to happen
link |
is the idea of a learning rate, well,
link |
it almost already has disappeared in the latest research.
link |
And instead, it's just like, we know enough
link |
about how to interpret the gradients
link |
and the change of gradients we see
link |
to know how to set every parameter of our way.
link |
There you can automate it.
link |
So you see the future of deep learning, where really,
link |
where is the input of a human expert needed?
link |
Well, hopefully, the input of a human expert
link |
will be almost entirely unneeded from the deep learning
link |
So again, Google's approach to this
link |
is to try and use thousands of times more compute
link |
to run lots and lots of models at the same time
link |
and hope that one of them is good.
link |
A lot of AutoML kind of stuff.
link |
Yeah, a lot of AutoML kind of stuff, which I think is insane.
link |
When you better understand the mechanics of how models learn,
link |
you don't have to try 1,000 different models
link |
to find which one happens to work the best.
link |
You can just jump straight to the best one, which
link |
means that it's more accessible in terms of compute, cheaper,
link |
and also with less hyperparameters to set.
link |
That means you don't need deep learning experts
link |
to train your deep learning model for you,
link |
which means that domain experts can do more of the work, which
link |
means that now you can focus the human time
link |
on the kind of interpretation, the data gathering,
link |
identifying model errors, and stuff like that.
link |
Yeah, the data side.
link |
How often do you work with data these days
link |
in terms of the cleaning, Darwin looked
link |
at different species while traveling about,
link |
do you look at data?
link |
Have you, in your roots in Kaggle, just look at data?
link |
Yeah, I mean, it's a key part of our course.
link |
It's like before we train a model in the course,
link |
we see how to look at the data.
link |
And then the first thing we do after we train our first model,
link |
which we fine tune an ImageNet model for five minutes.
link |
And then the thing we immediately do after that
link |
is we learn how to analyze the results of the model
link |
by looking at examples of misclassified images,
link |
and looking at a classification matrix,
link |
and then doing research on Google
link |
to learn about the kinds of things that it's misclassifying.
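As a rough illustration of the "look at what it gets wrong" step described above, here is a minimal sketch in plain Python. It is not the fast.ai tooling used in the course; the labels are made up, and the helper names `confusion_counts` and `most_confused` are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def most_confused(actual, predicted):
    """Return the misclassified (actual, predicted) pairs, most frequent first."""
    counts = confusion_counts(actual, predicted)
    errors = [(pair, n) for pair, n in counts.items() if pair[0] != pair[1]]
    # Sort by descending count, then by label pair so the order is deterministic.
    return sorted(errors, key=lambda item: (-item[1], item[0]))


# Hypothetical labels, invented purely for illustration.
actual = ["grizzly", "grizzly", "teddy", "brown", "brown", "brown"]
predicted = ["grizzly", "brown", "teddy", "grizzly", "grizzly", "brown"]

for (true_label, guess), n in most_confused(actual, predicted):
    print(f"{true_label} predicted as {guess}: {n}")
```

The pairs at the top of that list are the ones worth pulling up as images and reading about.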
link |
So to me, one of the really cool things
link |
about machine learning models in general
link |
is that when you interpret them, they
link |
tell you about things like what are the most important features,
link |
which groups you're misclassifying,
link |
and they help you become a domain expert more quickly,
link |
because you can focus your time on the bits
link |
that the model is telling you is important.
link |
So it lets you deal with things like data leakage,
link |
for example, if it says, oh, the main feature I'm looking at is customer ID.
link |
And you're like, oh, customer ID shouldn't be predictive.
link |
And then you can talk to the people that manage customer IDs,
link |
and they'll tell you, oh, yes, as soon as a customer's application
link |
is accepted, we add a one on the end of their customer ID
link |
So yeah, looking at data, particularly
link |
from the lens of which parts of the data the model says
link |
is important, is super important.
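The customer ID story above can be sketched with a crude check, assuming a tiny made-up table. This is not the feature-importance method a real model would report; it only asks how well each column alone predicts the target, which is enough to make a leaked column stand out. The data and the helper `single_feature_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def single_feature_accuracy(rows, feature, target):
    """Accuracy of predicting `target` from `feature` alone, by majority vote per value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[target]] += 1
    # For each feature value, the best single guess is its most common target.
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)


# Hypothetical applications: a 1 is appended to the ID once an application is accepted.
rows = [
    {"customer_id": 1041, "income_band": "low", "accepted": 1},
    {"customer_id": 2070, "income_band": "low", "accepted": 0},
    {"customer_id": 3151, "income_band": "high", "accepted": 1},
    {"customer_id": 4180, "income_band": "high", "accepted": 0},
    {"customer_id": 5201, "income_band": "high", "accepted": 1},
    {"customer_id": 6230, "income_band": "low", "accepted": 0},
]
for row in rows:
    row["id_last_digit"] = row["customer_id"] % 10

for feature in ("income_band", "id_last_digit"):
    print(feature, single_feature_accuracy(rows, feature, "accepted"))
```

A column that predicts the outcome perfectly on its own, like the last digit here, is the cue to go and ask the people who manage that column how it gets filled in.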
link |
Yeah, and using kind of using the model
link |
to almost debug the data to learn more about the data.
link |
What are the different cloud options
link |
for training your networks?
link |
Last question related to DAWNBench.
link |
Well, it's part of a lot of the work you do,
link |
but from a perspective of performance,
link |
I think you've written this in a blog post.
link |
There's AWS, there's a TPU from Google.
link |
What's your sense?
link |
What the future holds?
link |
What would you recommend now in terms of training in the cloud?
link |
So from a hardware point of view,
link |
Google's TPUs and the best Nvidia GPUs are similar.
link |
And maybe the TPUs are like 30% faster,
link |
but they're also much harder to program.
link |
There isn't a clear leader in terms of hardware right now,
link |
although much more importantly, Nvidia's GPUs
link |
are much more programmable.
link |
They've got much more written for them.
link |
That's the clear leader for me and where
link |
I would spend my time as a researcher and practitioner.
link |
But then in terms of the platform,
link |
I mean, we're super lucky now with stuff like Google,
link |
GCP, Google Cloud, and AWS that you can access a GPU
link |
pretty quickly and easily.
link |
But I mean, for AWS, it's still too hard.
link |
You have to find an AMI and get the instance running
link |
and then install the software you want and blah, blah, blah.
link |
GCP is currently the best way to get
link |
started on a full server environment
link |
because they have a fantastic fast.ai and PyTorch
link |
ready to go instance, which has all the courses preinstalled.
link |
It has Jupyter Notebook prerunning.
link |
Jupyter Notebook is this wonderful interactive computing
link |
system, which everybody basically
link |
should be using for any kind of data driven research.
link |
But then even better than that, there
link |
are platforms like Salamander, which we own,
link |
and Paperspace, where literally you click a single button
link |
and it pops up a notebook straight away
link |
without any kind of installation or anything.
link |
And all the course notebooks are all preinstalled.
link |
So for me, this is one of the things
link |
we spent a lot of time curating and working on.
link |
Because when we first started our courses,
link |
the biggest problem was people dropped out of lesson one
link |
because they couldn't get an AWS instance running.
link |
So things are so much better now.
link |
And we actually have, if you go to course.fast.ai,
link |
the first thing it says is, here's
link |
how to get started with your GPU.
link |
And it's like, you just click on the link
link |
and you click start and you're going.
link |
I have to confess, I've never used the Google GCP.
link |
Yeah, GCP gives you $300 of compute for free,
link |
which is really nice.
link |
But as I say, Salamander and Paperspace are even easier still.
link |
So from the perspective of deep learning frameworks,
link |
you work with Fast.ai, if you think of it as a framework,
link |
and PyTorch and TensorFlow, what are the strengths
link |
of each platform in your perspective?
link |
So in terms of what we've done our research on and taught
link |
in our course, we started with Theano and Keras.
link |
And then we switched to TensorFlow and Keras.
link |
And then we switched to PyTorch.
link |
And then we switched to PyTorch and Fast.ai.
link |
And that kind of reflects a growth and development
link |
of the ecosystem of deep learning libraries.
link |
Theano and TensorFlow were great,
link |
but were much harder to teach and to do research and development
link |
on because they define what's called a computational graph
link |
up front, a static graph, where you basically
link |
have to say, here are all the things
link |
that I'm going to eventually do in my model.
link |
And then later on, you say, OK, do those things with this data.
link |
And you can't debug them.
link |
You can't do them step by step.
link |
You can't program them interactively
link |
in a Jupyter notebook and so forth.
link |
PyTorch was not the first, but PyTorch
link |
was certainly the strongest entrant to come along
link |
and say, let's not do it that way.
link |
Let's just use normal Python.
link |
And everything you know about in Python
link |
is just going to work.
link |
And we'll figure out how to make that run on the GPU
link |
as and when necessary.
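The difference between the two styles described above can be sketched in plain Python. This is a toy stand-in, not Theano, TensorFlow, or PyTorch code: the `Node`, `placeholder`, and `run` names are hypothetical and exist only to show "describe everything first, feed data later" next to "just run normal Python".

```python
# Style 1: define the graph up front, run it later (toy stand-in for a static graph).
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def placeholder(name):
    """A named slot whose value is only supplied when the graph is run."""
    return Node("input", name=name)


def run(node, feed):
    """Evaluate the graph only now, once the data is supplied."""
    if node.op == "input":
        return feed[node.name]
    left, right = (run(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right


x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
graph = x * w + b  # nothing is computed here; this is only a description
static_result = run(graph, {"x": 3.0, "w": 2.0, "b": 1.0})


# Style 2: define by run. Ordinary Python; each intermediate value exists immediately.
def eager_model(x, w, b):
    hidden = x * w  # can be printed or stepped through in a debugger
    return hidden + b


eager_result = eager_model(3.0, 2.0, 1.0)
print(static_result, eager_result)
```

In the first style there is no `hidden` value to inspect until the whole graph runs, which is the debugging problem described above.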
link |
That turned out to be a huge leap in terms
link |
of what we could do with our research
link |
and what we could do with our teaching.
link |
Because it wasn't limiting.
link |
Yeah, I mean, it was critical for us
link |
for something like DAWNBench to be able to rapidly try things.
link |
It's just so much harder to be a researcher and practitioner
link |
when you have to do everything upfront
link |
and you can't inspect it.
link |
Problem with PyTorch is it's not at all
link |
accessible to newcomers because you
link |
have to write your own training loop
link |
and manage the gradients and all this stuff.
link |
And it's also not great for researchers
link |
because you're spending your time dealing with all this boilerplate
link |
and overhead rather than thinking about your algorithm.
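To give a feel for the boilerplate being described, here is a hand-written training loop in plain Python, assuming the simplest possible model (a line fit by gradient descent on mean squared error). It is not PyTorch code; the function `train` and its default settings are hypothetical, but the steps (forward pass, gradients, reset, update) are the ones you end up managing yourself.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0  # reset the gradients every step
        for x, y in zip(xs, ys):
            error = (w * x + b) - y  # forward pass and residual
            grad_w += 2 * error * x / n  # d(MSE)/dw
            grad_b += 2 * error / n  # d(MSE)/db
        w -= lr * grad_w  # parameter update
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```

None of those lines is about the idea being researched, which is the argument for wrapping them in a higher-level API.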
link |
So we ended up writing this very multi layered API
link |
that at the top level, you can train a state of the art neural
link |
network in three lines of code.
link |
And which talks to an API, which talks to an API,
link |
which talks to an API, which you can dive into at any level
link |
and get progressively closer to the machine levels of control.
link |
And this is the fast AI library.
link |
That's been critical for us and for our students
link |
and for lots of people that have won big learning
link |
competitions with it and written academic papers with it.
link |
It's made a big difference.
link |
We're still limited though by Python.
link |
And particularly this problem with things
link |
like recurrent neural nets, say, where you just can't change
link |
things unless you accept it going so slowly
link |
that it's impractical.
link |
So in the latest incarnation of the course
link |
and with some of the research we're now starting to do,
link |
we're starting to do some stuff in Swift.
link |
I think we're three years away from that being
link |
super practical, but I'm in no hurry.
link |
I'm very happy to invest the time to get there.
link |
But with that, we actually already
link |
have a nascent version of the fast AI library for vision
link |
running on Swift for TensorFlow.
link |
Because Python for TensorFlow is not going to cut it.
link |
It's just a disaster.
link |
What they did was they tried to replicate the bits
link |
that people were saying they like about PyTorch,
link |
this kind of interactive computation.
link |
But they didn't actually change their foundational runtime
link |
So they kind of added this like syntax sugar
link |
they call TF Eager, TensorFlow Eager, which
link |
makes it look a lot like PyTorch.
link |
But it's 10 times slower than PyTorch to actually do a step.
link |
So because they didn't invest the time
link |
in retooling the foundations because their code base
link |
is so horribly complex.
link |
Yeah, I think it's probably very difficult
link |
to do that kind of reengineering.
link |
Yeah, well, particularly the way TensorFlow was written,
link |
it was written by a lot of people very quickly
link |
in a very disorganized way.
link |
So when you actually look in the code, as I do often,
link |
I'm always just like, oh, god, what were they thinking?
link |
It's just, it's pretty awful.
link |
So I'm really extremely negative about the potential future
link |
for Python for TensorFlow, but Swift for TensorFlow
link |
can be a different beast altogether.
link |
It can be like, it can basically be a layer on top of MLIR
link |
that takes advantage of all the great compiler stuff
link |
that Swift builds on with LLVM.
link |
And yeah, it could be absolutely.
link |
I think it will be absolutely fantastic.
link |
Well, you're inspiring me to try.
link |
I haven't truly felt the pain of TensorFlow 2.0 Python.
link |
Yeah, I mean, it does the job if you're using
link |
predefined things that somebody's already written.
link |
But if you actually compare, like I've
link |
had to do a lot of stuff with TensorFlow recently,
link |
you actually compare like, I want
link |
to write something from scratch.
link |
And you're like, I just keep finding it's like, oh,
link |
it's running 10 times slower than PyTorch.
link |
So is the biggest cost.
link |
Let's throw running time out the window.
link |
How long it takes you to program?
link |
That's not too different now.
link |
Thanks to TensorFlow Eager, that's not too different.
link |
But because so many things take so long to run,
link |
you wouldn't run it at 10 times slower.
link |
Like, you just go like, oh, this is taking too long.
link |
And also, there's a lot of things
link |
which are just less programmable,
link |
like tf.data, which is the way data processing works
link |
in TensorFlow, is just this big mess.
link |
It's incredibly inefficient.
link |
And they kind of had to write it that way
link |
because of the TPU problems I described earlier.
link |
So I just feel like they've got this huge technical debt,
link |
which they're not going to solve without starting from scratch.
link |
So here's an interesting question then.
link |
If there's a new student starting today,
link |
what would you recommend they use?
link |
Well, I mean, we obviously recommend
link |
FastAI and PyTorch because we teach new students.
link |
And that's what we teach with.
link |
So we would very strongly recommend that
link |
because it will let you get on top of the concepts much
link |
So then you'll become an action.
link |
And you'll also learn the actual state of the art techniques.
link |
So you actually get world class results.
link |
Honestly, it doesn't much matter what library
link |
you learn because switching from Chainer to MXNet to TensorFlow
link |
to PyTorch is going to be a couple of days work
link |
as long as you understand the foundations well.
link |
But do you think Swift will creep in there as a thing
link |
that people start using?
link |
Not for a few years, particularly because Swift
link |
has no data science community, libraries, tooling.
link |
And the Swift community has a total lack of appreciation
link |
and understanding of numeric computing.
link |
So they keep on making stupid decisions.
link |
For years, they've just done dumb things around performance
link |
and prioritization.
link |
That's clearly changing now because the developer of Swift, Chris
link |
Lattner, is working at Google on Swift for TensorFlow.
link |
So that's a priority.
link |
It'll be interesting to see what happens with Apple
link |
because Apple hasn't shown any sign of caring
link |
about numeric programming in Swift.
link |
So hopefully they'll get off their arse
link |
and start appreciating this because currently all
link |
of their low level libraries are not written in Swift.
link |
They're not particularly Swifty at all, stuff like Core ML.
link |
They're really pretty rubbish.
link |
So yeah, so there's a long way to go.
link |
But at least one nice thing is that Swift for TensorFlow
link |
can actually directly use Python code and Python libraries.
link |
Literally, the entire lesson one notebook of fast AI
link |
runs in Swift right now in Python mode.
link |
So that's a nice intermediate thing.
link |
How long does it take if you look at the two fast AI courses,
link |
how long does it take to get from point zero to completing
link |
Somewhere between two months and two years, generally.
link |
So for two months, how many hours a day on average?
link |
So like somebody who is a very competent coder
link |
can do 70 hours per course and pick it up.
link |
But a lot of people I know take a year off to study fast AI
link |
full time and say at the end of the year,
link |
they feel pretty competent.
link |
Because generally, there's a lot of other things you do.
link |
Generally, they'll be entering Kaggle competitions.
link |
They might be reading Ian Goodfellow's book.
link |
They might be doing a bunch of stuff.
link |
And often, particularly if they are a domain expert,
link |
their coding skills might be a little on the pedestrian side.
link |
So part of it's just like doing a lot more writing.
link |
What do you find is the bottleneck for people usually,
link |
except getting started and setting stuff up?
link |
I would say coding.
link |
The people who are strong coders pick it up the best.
link |
Although another bottleneck is people who have a lot of
link |
experience of classic statistics can really struggle
link |
because the intuition is so the opposite of what they're used to.
link |
They're very used to trying to reduce the number of parameters
link |
in their model and looking at individual coefficients
link |
and stuff like that.
link |
So I find people who have a lot of coding background
link |
and know nothing about statistics are generally
link |
going to be the best off.
link |
So you taught several courses on deep learning
link |
and as Feynman says, the best way to understand something is to teach it.
link |
What have you learned about deep learning from teaching it?
link |
It's a key reason for me to teach the courses.
link |
Obviously, it's going to be necessary to achieve our goal
link |
of getting domain experts to be familiar with deep learning,
link |
but it was also necessary for me to achieve my goal
link |
of being really familiar with deep learning.
link |
I mean, to see so many domain experts from so many different
link |
backgrounds, it's definitely, I wouldn't say taught me,
link |
but convinced me something that I liked to believe was true,
link |
which was anyone can do it.
link |
So there's a lot of kind of snobbishness out there about
link |
only certain people can learn to code,
link |
only certain people are going to be smart enough to do AI.
link |
That's definitely bullshit.
link |
I've seen so many people from so many different backgrounds
link |
get state of the art results in their domain areas now.
link |
It's definitely taught me that the key differentiator
link |
between people that succeed and people that fail is tenacity.
link |
That seems to be basically the only thing that matters.
link |
A lot of people give up.
link |
But of the ones who don't give up, pretty much everybody succeeds,
link |
even if at first I'm just kind of thinking,
link |
wow, they really aren't quite getting it yet, are they?
link |
But eventually people get it and they succeed.
link |
So I think that's been, I think they're both things I liked
link |
to believe was true, but I don't feel like I really had
link |
strong evidence for them to be true,
link |
but now I can see I've seen it again and again.
link |
So what advice do you have for someone
link |
who wants to get started in deep learning?
link |
Train lots of models.
link |
That's how you learn it.
link |
So I think, it's not just me.
link |
I think our course is very good,
link |
but also lots of people independently have said it's very good.
link |
It recently won the CogX Award for AI courses,
link |
as being the best in the world.
link |
I'd say come to our course, course.fast.ai.
link |
And the thing I keep on harping on in my lessons is
link |
train models, print out the inputs to the models,
link |
print out the outputs of the models,
link |
like study, you know, change the inputs a bit,
link |
look at how the outputs vary,
link |
just run lots of experiments to get an intuitive understanding
link |
of what's going on.
link |
To get hooked, do you think, you mentioned training,
link |
do you think just running the models inference?
link |
If we talk about getting started.
link |
No, you've got to fine tune the models.
link |
So that's the critical thing,
link |
because at that point, you now have a model that's in your domain area.
link |
So there's no point running somebody else's model,
link |
because it's not your model.
link |
So it only takes five minutes to fine tune a model
link |
for the data you care about.
link |
And in lesson two of the course,
link |
we teach you how to create your own dataset from scratch
link |
by scraping Google image search.
link |
And we show you how to actually create a web application running online.
link |
So I create one in the course that differentiates
link |
between a teddy bear, a grizzly bear, and a brown bear.
link |
And it does it with basically 100% accuracy.
link |
It took me about four minutes to scrape the images
link |
from Google search in the script.
link |
There are little graphical widgets we have in the notebook
link |
that help you clean up the dataset.
link |
There's other widgets that help you study the results
link |
and see where the errors are happening.
link |
And so now we've got over a thousand replies
link |
in our Share Your Work Here thread of students saying,
link |
here's the thing I built.
link |
And so there's people who, like,
link |
and a lot of them are state of the art.
link |
Like somebody said, oh, I tried looking at Devanagari characters
link |
and I couldn't believe it.
link |
The thing that came out was more accurate
link |
than the best academic paper after lesson one.
link |
And then there's others which are just more kind of fun,
link |
like somebody who's doing Trinidad and Tobago hummingbirds.
link |
So that's kind of their national bird.
link |
And so he's got something that can now classify Trinidad
link |
and Tobago hummingbirds.
link |
So yeah, train models, fine tune models with your dataset
link |
and then study their inputs and outputs.
link |
How much are the fast.ai courses?
link |
Everything we do is free.
link |
We have no revenue sources of any kind.
link |
It's just a service to the community.
link |
Okay, once a person understands the basics,
link |
trains a bunch of models,
link |
if we look at the scale of years,
link |
what advice do you have for someone wanting
link |
to eventually become an expert?
link |
Train lots of models.
link |
Specifically, train lots of models in your domain area.
link |
So an expert at what, right?
link |
We don't need more experts
link |
creating slightly evolutionary research in areas
link |
that everybody's studying.
link |
We need experts at using deep learning
link |
to diagnose malaria.
link |
Well, we need experts at using deep learning
link |
to analyze language to study media bias.
link |
So we need experts in analyzing fisheries
link |
to identify problem areas in the ocean.
link |
That's what we need.
link |
So become the expert in your passion area.
link |
And this is a tool which you can use for just about anything,
link |
and you'll be able to do that thing better than other people,
link |
particularly by combining it with your passion
link |
and domain expertise.
link |
So that's really interesting.
link |
Even if you do want to innovate on transfer learning
link |
or active learning,
link |
your thought is, I mean,
link |
what I certainly share is you also need to find
link |
a domain or data set that you actually really care for.
link |
If you're not working on a real problem that you understand,
link |
how do you know if you're doing it any good?
link |
How do you know if your results are good?
link |
How do you know if you're getting bad results?
link |
Why are you getting bad results?
link |
Is it a problem with the data?
link |
How do you know you're doing anything useful?
link |
Yeah, to me, the only really interesting research is,
link |
not the only, but the vast majority of interesting research
link |
is try and solve an actual problem and solve it really well.
link |
So both understanding sufficient tools on the deep learning side
link |
and becoming a domain expert in a particular domain
link |
are really things within reach for anybody.
link |
To me, I would compare it to studying self driving cars,
link |
having never looked at a car or been in a car
link |
or turned a car on, which is like the way it is
link |
for a lot of people.
link |
They'll study some academic data set
link |
where they literally have no idea about that.
link |
By the way, I'm not sure how familiar
link |
you are with autonomous vehicles,
link |
but that is literally, you describe a large percentage
link |
of robotics folks working in self driving cars,
link |
as they actually haven't considered driving.
link |
They haven't actually looked at what driving looks like.
link |
They haven't driven.
link |
It's a problem because you know when you've actually driven,
link |
these are the things that happened to me when I was driving.
link |
There's nothing that beats the real world examples
link |
or just experiencing them.
link |
You've created many successful startups.
link |
What does it take to create a successful startup?
link |
Same thing as becoming a successful deep learning practitioner,
link |
which is not giving up.
link |
So you can run out of money or run out of time
link |
or run out of something, you know,
link |
but if you keep costs super low
link |
and try and save up some money beforehand
link |
so you can afford to have some time,
link |
then just sticking with it is one important thing.
link |
Doing something you understand and care about is important.
link |
By something, I don't mean...
link |
The biggest problem I see with deep learning people
link |
is they do a PhD in deep learning
link |
and then they try and commercialize their PhD.
link |
It's a waste of time
link |
because that doesn't solve an actual problem.
link |
You picked your PhD topic
link |
because it was an interesting kind of engineering
link |
or math or research exercise.
link |
But yeah, if you've actually spent time as a recruiter
link |
and you know that most of your time was spent sifting through resumes
link |
and you know that most of the time
link |
you're just looking for certain kinds of things
link |
and you can try doing that with a model for a few minutes
link |
and see whether that's something which a model
link |
seems to be able to do as well as you could,
link |
then you're on the right track to creating a startup.
link |
And then I think just being...
link |
Just be pragmatic and...
link |
try and stay away from venture capital money
link |
as long as possible, preferably forever.
link |
So yeah, on that point, do you...
link |
venture capital...
link |
So were you able to successfully run startups
link |
that were self funded for quite a while?
link |
Yeah, so my first two were self funded
link |
and that was the right way to do it.
link |
VC startups are much more scary
link |
because you have these people on your back
link |
who do this all the time
link |
and who have done it for years
link |
telling you grow, grow, grow, grow.
link |
And they don't care if you fail.
link |
They only care if you don't grow fast enough.
link |
Whereas doing the ones myself
link |
with partners who were friends.
link |
It's nice because we just went along
link |
at a pace that made sense
link |
and we were able to build it to something
link |
which was big enough that we never had to work again
link |
but was not big enough that any VC
link |
would think it was impressive
link |
and that was enough for us to be excited.
link |
So I thought that's a much better way
link |
to do things for most people.
link |
And generally speaking, not for yourself,
link |
but how do you make money during that process?
link |
Do you cut into savings?
link |
So yeah, so I started FastMail
link |
and Optimal Decisions at the same time
link |
in 1999 with two different friends.
link |
And for FastMail,
link |
I guess I spent $70 a month on the server.
link |
And when the server ran out of space
link |
I put a payments button on the front page
link |
and said if you want more than 10 meg of space
link |
you have to pay $10 a year.
link |
So run lean, like keep your costs down.
link |
Yeah, so I kept my cost down
link |
and once I needed to spend more money
link |
I asked people to spend the money for me
link |
and that was that basically from then on.
link |
We were making money and I was profitable from then.
link |
For Optimal Decisions it was a bit harder
link |
because we were trying to sell something
link |
that was more like a $1 million sale
link |
but what we did was we would sell scoping projects
link |
so kind of like prototypy projects
link |
but rather than doing it for free
link |
we would sell them for $50,000 to $100,000.
link |
So again we were covering our costs
link |
and also making the client feel like
link |
we were doing something valuable.
link |
So in both cases we were profitable from six months in.
link |
Nevertheless it's scary.
link |
I mean, yeah, sure.
link |
I mean it's scary before you jump in
link |
and I guess I was comparing it to the scariness of VC.
link |
I felt like with VC stuff it was more scary.
link |
Much more in somebody else's hands.
link |
Will they fund you or not?
link |
What do they think of what you're doing?
link |
I also found it very difficult with VC-backed startups
link |
to actually do the thing which I thought was important
link |
for the company rather than doing the thing
link |
which I thought would make the VC happy.
link |
Now, VCs always tell you not to do the thing
link |
that makes them happy
link |
but then if you don't do the thing that makes them happy
link |
And do you think optimizing for the whatever they call it
link |
the exit is a good thing to optimize for?
link |
I mean it can be but not at the VC level
link |
because the VC exit needs to be, you know, a thousand X.
link |
So whereas with the lifestyle exit,
link |
if you can sell something for $10 million
link |
then you've made it, right?
link |
If you want to build something that's going to,
link |
you're kind of happy to do forever then fine.
link |
If you want to build something you want to sell
link |
in three years' time, that's fine too.
link |
I mean they're both perfectly good outcomes.
link |
So you're learning Swift now?
link |
In a way, I mean you already.
link |
And I read that you use at least in some cases
link |
spaced repetition as a mechanism for learning new things.
link |
I use Anki quite a lot myself.
link |
I actually never talk to anybody about it.
link |
Don't know how many people do it
link |
and it works incredibly well for me.
link |
Can you talk to your experience?
link |
Like how did you, what do you, first of all, okay,
link |
What is spaced repetition?
link |
So spaced repetition is an idea created
link |
by a psychologist named Ebbinghaus,
link |
I don't know, must be a couple hundred years ago
link |
or something 150 years ago.
link |
He did something which sounds pretty damn tedious.
link |
He wrote down random sequences of letters on cards
link |
and tested how well he would remember those random sequences
link |
a day later, a week later, whatever.
link |
He discovered that there was this kind of a curve
link |
where his probability of remembering one of them
link |
would be dramatically smaller the next day
link |
and then a little bit smaller the next day
link |
and a little bit smaller the next day.
link |
What he discovered is that if he revised those cards after
link |
a day, the probabilities would decrease at a smaller rate
link |
and then if he revised them again a week later,
link |
they would decrease at a smaller rate again.
link |
And so he basically figured out a roughly optimal equation
link |
for when you should revise something you want to remember.
link |
So spaced repetition learning is using this simple algorithm,
link |
just something like revise something after a day
link |
and then three days and then a week and then three weeks
link |
And so if you use a program like Anki, as you know,
link |
it will just do that for you.
link |
And it will say, did you remember this?
link |
And if you say no, it will reschedule it back to
link |
appear again like 10 times faster than it otherwise would have.
link |
It's a kind of a way of being guaranteed to learn something
link |
because by definition, if you're not learning it,
link |
it will be rescheduled to be revised more quickly.
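The scheduling rule described above can be sketched in a few lines, assuming the simplest version of it: each successful review roughly triples the wait, and a failed review sends the card back to the start. This is not Anki's or SuperMemo's actual algorithm; the growth factor of 3 and the function names are assumptions chosen to match the "a day, three days, a week, three weeks" pattern.

```python
def next_interval(current_days, remembered, growth=3.0, first_interval=1.0):
    """Return the number of days to wait before the next review of a card."""
    if not remembered:
        return first_interval  # forgotten: see it again soon
    return current_days * growth  # remembered: wait longer next time


def schedule(answers, first_interval=1.0):
    """Map a sequence of remembered/forgotten answers to review intervals in days."""
    intervals = []
    interval = first_interval
    for remembered in answers:
        intervals.append(interval)
        interval = next_interval(interval, remembered, first_interval=first_interval)
    return intervals


# Three successful reviews, one lapse, then the card starts over.
print(schedule([True, True, True, False, True]))
```

A card you keep forgetting never gets past the short intervals, which is why the reviews pile up when something is not really being learned.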
link |
Unfortunately though, it doesn't let you fool yourself.
link |
If you're not learning something, you know your revisions
link |
will just get more and more.
link |
So you have to find ways to learn things productively
link |
and effectively treat your brain well.
link |
So using mnemonics and stories and context and stuff like that.
link |
So yeah, it's a super great technique.
link |
It's like learning how to learn is something
link |
which everybody should learn before they actually learn anything.
link |
But almost nobody does.
link |
Yes, so what have you, so it certainly works well
link |
for learning new languages, for, I mean, for learning,
link |
like small projects almost.
link |
But do you, you know, I started using it for,
link |
I forget who wrote a blog post about this that inspired me.
link |
It might have been you, I'm not sure.
link |
I started when I read papers.
link |
I'll, concepts and ideas, I'll put them.
link |
Was it Michael Nielsen?
link |
It was Michael Nielsen.
link |
Yeah, it was Michael Nielsen.
link |
Michael started doing this recently
link |
and has been writing about it.
link |
So the kind of today's Ebbinghaus is a guy called Piotr Wozniak
link |
who developed a system called SuperMemo.
link |
And he's been basically trying to become like
link |
the world's greatest renaissance man over the last few decades.
link |
He's basically lived his life with spaced repetition learning
link |
And Michael's only very recently got into this,
link |
but he started really getting excited about doing it
link |
for a lot of different things.
link |
For me personally, I actually don't use it
link |
for anything except Chinese.
link |
And the reason for that is that Chinese is specifically a thing.
link |
I made a conscious decision that I want to continue to remember
link |
even if I don't get much of a chance to exercise it
link |
because like I'm not often in China, so I don't.
link |
Whereas for something like programming languages or papers,
link |
I have a very different approach,
link |
which is I try not to learn anything from them,
link |
but instead I try to identify the important concepts
link |
and like actually ingest them.
link |
So like really understand that concept deeply
link |
and study it carefully.
link |
I'll decide if it really is important.
link |
If it is, like, incorporate it into our library,
link |
you know, incorporate it into how I do things
link |
or decide it's not worth it.
link |
So I find I then remember the things that I care about
link |
because I'm using it all the time.
link |
So for the last 25 years,
link |
I've committed to spending at least half of every day
link |
learning or practicing something new,
link |
which all my colleagues have always hated
link |
because it always looks like I'm not working on
link |
what I'm meant to be working on,
link |
but that always means I do everything faster
link |
because I've been practicing a lot of stuff.
link |
So I kind of give myself a lot of opportunity
link |
to practice new things.
link |
And so I find now I don't often kind of find myself
link |
wishing I could remember something
link |
because if it's something that's useful,
link |
then I've been using it a lot.
link |
It's easy enough to look it up on Google.
link |
But speaking Chinese, you can't look it up on Google.
link |
Do you have advice for people learning new things?
link |
What have you learned as a process?
link |
I mean, it all starts with just making the hours
link |
in the day available.
link |
Yeah, you've got to stick with it,
link |
which is, again, the number one thing
link |
that 99% of people don't do.
link |
So the people I started learning Chinese with,
link |
none of them were still doing it 12 months later.
link |
I'm still doing it 10 years later.
link |
I tried to stay in touch with them,
link |
but they just, no one did it.
link |
For something like Chinese,
link |
like study how human learning works.
link |
So every one of my Chinese flashcards
link |
is associated with a story,
link |
and that story is specifically designed to be memorable.
link |
And we find things memorable if they're
link |
funny or disgusting or sexy
link |
or related to people that we know or care about.
link |
So I try to make sure all the stories that are in my head
link |
have those characteristics.
link |
Yeah, so you have to, you know,
link |
you won't remember things well if they don't have some context.
link |
And yeah, you won't remember them well
link |
if you don't regularly practice them,
link |
whether it be just part of your day to day life
link |
or, for Chinese and me, flashcards.
link |
I mean, the other thing is, let yourself fail sometimes.
link |
So like, I've had various medical problems
link |
over the last few years,
link |
and basically my flashcards just stopped
link |
for about three years.
link |
And then there've been other times I've stopped
link |
for a few months, and it's so hard because you get back to it,
link |
and it's like, you have 18,000 cards due.
link |
It's like, and so you just have to go,
link |
all right, well, I can either stop and give up everything
link |
or just decide to do this every day for the next two years
link |
until I get back to it.
link |
The amazing thing has been that even after three years,
link |
you know, the Chinese was still in there.
link |
Like, it was so much faster to relearn
link |
than it was to learn the first time.
link |
I have the same with guitar, with music and so on.
link |
It's sad because work sometimes takes away
link |
and then you won't play for a year.
link |
But really, if you then just get back to it every day,
link |
you're right there again.
link |
What do you think is the next big breakthrough
link |
in artificial intelligence?
link |
What are your hopes in deep learning or beyond
link |
that people should be working on,
link |
or you hope there'll be breakthroughs?
link |
I don't think it's possible to predict.
link |
I think what we already have
link |
is an incredibly powerful platform
link |
to solve lots of societally important problems
link |
that are currently unsolved.
link |
I just hope that people will, lots of people
link |
will learn this toolkit and try to use it.
link |
I don't think we need a lot of new technological breakthroughs
link |
to do a lot of great work right now.
link |
And when do you think we're going to create
link |
a human level intelligence system?
link |
How far away are we?
link |
I have no way to know.
link |
Like, I don't know why people make predictions about this
link |
because there's no data and nothing to go on.
link |
And it's just like,
link |
there's so many societally important problems
link |
to solve right now,
link |
I just don't find it a really interesting question.
link |
So in terms of societally important problems,
link |
what's the problem that is within reach?
link |
Well, I mean, for example,
link |
there are problems that AI creates, right?
link |
So more specifically,
link |
labor force displacement is going to be huge
link |
and people keep making this
link |
frivolous econometric argument of being like,
link |
oh, there's been other things that aren't AI
link |
that have come along before
link |
and haven't created massive labor force displacement.
link |
Therefore, AI won't.
link |
So that's a serious concern for you?
link |
Andrew Yang is running on it.
link |
I'm desperately concerned.
link |
And you see already that the changing workplace
link |
has led to a hollowing out of the middle class.
link |
You're seeing that students coming out of school today
link |
have a less rosy financial future ahead of them
link |
than their parents did,
link |
which has never happened in recent,
link |
in the last 300 years.
link |
We've always had progress before.
link |
And you see this turning into anxiety and despair
link |
and even violence.
link |
So I very much worry about that.
link |
You've written quite a bit about ethics, too.
link |
I do think that every data scientist
link |
working with deep learning needs to recognize
link |
they have an incredibly high leverage tool
link |
that they're using that can influence society.
link |
And if they're doing research,
link |
that research is going to be used by people
link |
doing this kind of work
link |
and they have a responsibility
link |
to consider the consequences
link |
and to think about things like
link |
how will humans be in the loop here?
link |
How do we avoid runaway feedback loops?
link |
How do we ensure an appeals process for humans
link |
that are impacted by my algorithm?
link |
How do I ensure that the constraints of my algorithm
link |
are adequately explained to the people that end up using them?
link |
There's all kinds of human issues,
link |
which only data scientists
link |
are actually in the right place to educate people about,
link |
but data scientists tend to think of themselves as
link |
and that they don't need to be part of that process,
link |
Well, you're in the perfect position to educate them better,
link |
to read literature, to read history,
link |
to learn from history.
link |
Well, Jeremy, thank you so much for everything you do
link |
for inspiring a huge amount of people,
link |
getting them into deep learning
link |
and having the ripple effects,
link |
the flap of a butterfly's wings that will probably change the world.
link |
So thank you very much.