Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35

The following is a conversation with Jeremy Howard. He's the founder of fast.ai, a research institute dedicated to making deep learning more accessible. He's also a distinguished research scientist at the University of San Francisco, a former president of Kaggle, as well as a top-ranking competitor there. And in general, he's a successful entrepreneur, educator, researcher, and an inspiring personality in the AI community.

When someone asks me how to get started with deep learning, fast.ai is one of the top places I point them to. It's free, it's easy to get started, it's insightful and accessible, and, if I may say so, it has very little of the BS that can sometimes dilute the value of educational content on popular topics like deep learning. fast.ai focuses on the practical application of deep learning and hands-on exploration of the cutting edge, in a way that is both accessible to beginners and useful to experts.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Jeremy Howard.

What's the first program you ever wrote?

The first program I wrote that I remember would be at high school. I did an assignment where I decided to try to find out if there were some better musical scales than the normal 12-tone, 12-interval scale. So I wrote a program on my Commodore 64 in BASIC that searched through other scale sizes to see if it could find one where there were more accurate harmonies. Like, you want an exact 3:2 ratio, whereas with a 12-interval scale it's not exactly 3:2, for example. So that's well-tempered, as they say.
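The search he describes can be sketched in a few lines of modern Python (the original was Commodore 64 BASIC; the range of scale sizes tried here is an illustrative choice):

```python
def best_fifth_error(n):
    """For an n-interval equal-temperament scale, return how far its
    closest step ratio is from an exact 3:2 (a 'perfect fifth')."""
    # Each of the n equal steps multiplies frequency by 2 ** (1 / n).
    return min(abs(2 ** (k / n) - 1.5) for k in range(1, n))

# Search other scale sizes for more accurate fifths than the usual 12.
baseline = best_fifth_error(12)          # roughly 0.0017 for 12-tone
better = [n for n in range(2, 60) if best_fifth_error(n) < baseline]
print(better)  # [29, 41, 53, 58]: divisions known for near-pure fifths
```

The 53-division scale, for instance, lands within about 0.00006 of an exact 3:2, which is one reason 53-note equal temperament keeps turning up in tuning theory.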
And BASIC on a Commodore 64. Where was the interest in music from? Or was it just technical?

I did music all my life, so I played saxophone and clarinet and piano and guitar and drums and whatever.

How does that thread go through your life? Where's music today?

It's not where I wish it was. For various reasons, I couldn't really keep it going, particularly because I had a lot of problems with RSI, with my fingers, and so I had to kind of cut back anything that used hands and fingers. I hope one day I'll be able to get back to it, health-wise.

So there's a love for music underlying it all. What's your favorite instrument?

Oh, baritone saxophone. Well, probably bass saxophone, but they're awkward.

Well, I always love it when music is coupled with programming. There's something about a brain that utilizes both that emerges with creative ideas. So you've used and studied quite a few programming languages. Can you give an overview of what you've used? What are the pros and cons of each?

Well, my favorite programming environment almost certainly was Microsoft Access, back in the earliest days. So that was Visual Basic for Applications, which is not a good programming language, but the programming environment was fantastic. The ability to create user interfaces and tie data and actions to them and create reports and all that, I've never seen anything as good. There are things nowadays like Airtable, which are small subsets of that, which people love for good reason, but unfortunately nobody's ever achieved anything like it.

If you could pause on that for a second?

It was a database program that Microsoft produced as part of Office, and it kind of withered, you know. But basically it let you, in a totally graphical way, create tables and relationships and queries and tie them to forms, and set up event handlers and calculations. And it was a very complete, powerful system, designed not for massively scalable things, but for useful little applications, and I loved it.

So what's the connection between Excel and Access?

So Access was kind of the relational database equivalent. People still do a lot of the stuff that should be in Access in Excel, because they know it. Excel's great as well, but it's just not as rich a programming model as VBA combined with a relational database. And so I've always loved relational databases, but today, programming on top of a relational database is just a lot more of a headache. You know, you generally need something that runs some kind of database server, unless you use SQLite, which has its own issues. Then, if you want to get a nice programming model, you'll often need to add an ORM on top. And then, I don't know, there are all these pieces to tie together, and it's just a lot more awkward than it should be.
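The pieces he's describing show up even in the simplest case. Python's built-in sqlite3 module gives you a serverless database, but the programming model is strings and tuples (the `users` table here is purely illustrative):

```python
import sqlite3

# SQLite needs no server process: the database is a file, or just memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Jeremy",))

# Without an ORM, tables and fields live inside SQL strings: nothing
# tab-completes them, and typos only surface when the query runs.
row = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
print(row[0])  # Jeremy
```

This is the gap that ORMs, and the F# type providers he mentions next, try to close by surfacing tables and fields to the language itself.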
There are people that are trying to make it easier. So in particular, I think of F#, you know, Don Syme; he and his team have done a great job of making something like a database appear in the type system, so you actually get tab completion for fields and tables and stuff like that. Anyway, so that whole VBA Office thing, I guess, was a starting point, which I still miss. And I got into standard Visual Basic, which...

That's interesting, just to pause on that for a second. It's interesting that you're connecting programming languages to the ease of management of data. So in your use of programming languages, you always had a love and a connection with data.

I've always been interested in doing useful things for myself and for others, which generally means getting some data and doing something with it and putting it out there again. So that's been my interest throughout. So I also did a lot of stuff with AppleScript back in the early days. It's kind of nice being able to get the computer, and computers, to talk to each other and to do things for you. And then I think the programming language I most loved back then would have been Delphi, which was Object Pascal, created by Anders Hejlsberg, who previously did Turbo Pascal and then went on to create .NET and then went on to create TypeScript. Delphi was amazing because it was a compiled, fast language that was as easy to use as Visual Basic.

Delphi, what is it similar to in more modern languages?

Yeah, but a compiled, fast version. So I'm not sure there's anything quite like it anymore. If you took C# or Java and got rid of the virtual machine and replaced it with something where you could compile a small, tight binary. I feel like it's where Swift could get to, with the new SwiftUI and the cross-platform development going on. That's one of my dreams, that we'll hopefully get back to where Delphi was. There is actually a Free Pascal project nowadays which is also attempting to kind of recreate Delphi, and they're making good progress.

So that's one of your favorite programming languages?

Well, favorite programming environments. Again, I'd say Pascal's not a nice language. If you wanted to know specifically about what languages I like, I would definitely pick J as being amazingly wonderful. J. Are you aware of APL?

I am not, except from doing a little research on the work you've done.

Okay, so it's not at all surprising you're not familiar with it, because it's not well known, but it's actually one of the main families of programming languages, going back to the late '50s. So there were a couple of major directions. One was the lambda calculus, Alonzo Church direction, which I guess became Lisp and Scheme and whatever, which has a history going back to the early days of computing. The second was the imperative slash OO direction, ALGOL, Simula, going on to C, C++, and so forth. There was a third, which are called array-oriented languages, which started with a paper by a guy called Ken Iverson, which was actually a math theory paper, not a programming paper. It was called "Notation as a Tool for Thought," and it was the development of a new type of math notation. And the idea was that this math notation was much more flexible, expressive, and also well defined than traditional math notation, which is none of those things. Math notation is awful. And so he actually turned that into a programming language, and because this was the early '50s, or sorry, late '50s, all the names were available, so he called his language "a programming language," or APL. So APL is an implementation of notation as a tool for thought, by which he means math notation. And Ken and his son went on to do many things, but eventually they actually produced a new language that was built on top of all the learnings of APL, and that was called J. And J is the most expressive, composable, beautifully designed language I've ever seen.

Does it have object-oriented components? Does it have that kind of thing?

Not really, it's an array-oriented language. It's the third path.

Are you saying "array"?

Array-oriented, yeah.

What does it mean to be array-oriented?

So array-oriented means that you generally don't use any loops; the whole thing is done with kind of an extreme version of broadcasting, if you're familiar with that NumPy slash Python concept. So you do a lot with one line of code. It looks a lot like math notation: highly compact. And the idea is that because you can do so much with one line of code, you very rarely need more than a single screen of code to express your program. And so you can kind of keep it all in your head, and you can kind of clearly communicate it.
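For readers who haven't met the NumPy concept he's referring to, here is a minimal sketch of broadcasting (the arrays are arbitrary examples):

```python
import numpy as np

# Array-oriented style: no explicit loops; operations apply elementwise
# to whole arrays at once.
prices = np.array([10.0, 20.0, 30.0])       # shape (3,)
quantities = np.array([[1.0], [2.0]])       # shape (2, 1)

# Broadcasting virtually stretches (2, 1) and (3,) into (2, 3), so one
# expression computes every quantity-times-price combination.
totals = quantities * prices
print(totals)
# [[10. 20. 30.]
#  [20. 40. 60.]]
```

Array languages like J take this idea to its limit: nearly every operator broadcasts, which is how a line of J can replace a page of loops.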
It's interesting that APL created two main branches. J is this kind of open-source, niche community of crazy enthusiasts like me. And then the other path, K, was fascinating. It's an astonishingly expensive programming language, which many of the world's most ludicrously rich hedge funds use. So the entire K machine is so small it sits inside the level-three cache on your CPU, and it easily wins every benchmark I've ever seen in terms of data processing speed. But you don't come across it very much, because it's like $100,000 per CPU to run it. It's like this path of programming languages is just so much, I don't know, so much more powerful in every way than the ones that almost anybody uses every day.

So it's all about computation. It's really focused on computation.

It's pretty heavily focused on computation. I mean, so much of programming is data processing by definition, so there's a lot of things you can do with it. But yeah, there's not much work being done on making things like user interface toolkits or whatever. I mean, there are some, but they're not great.

At the same time, you've done a lot of stuff with Perl and Python. So where does that fit into the picture of J and K and APL?

Well, it's just much more pragmatic. Like, in the end, you kind of have to end up where the libraries are, you know? Because to me, my focus is on productivity. I just want to get stuff done and solve problems. So Perl was great. I created an email company called FastMail, and Perl was great because back in the late '90s and early 2000s it just had a lot of stuff it could do. I still had to write my own monitoring system and my own web framework, my own whatever, because none of that stuff existed. But it was a super flexible language to do that in.

And you used Perl for FastMail, you used it as a backend? Like, so everything was written in Perl?

Yeah, yeah, everything. Everything was Perl.

Why do you think Perl hasn't succeeded, or hasn't dominated the market, where Python really takes over a lot of the tasks?

Well, I mean, Perl did dominate. It was everything, everywhere. But then the guy that ran Perl, Larry Wall, kind of just didn't put the time in anymore. And no project can be successful if there isn't, you know, particularly one that started with a strong leader, that loses that strong leadership. So then Python has kind of replaced it. You know, Python is a lot less elegant a language in nearly every way, but it has the data science libraries, and a lot of them are pretty great. So I kind of use it because it's the best we have, but it's definitely not good enough.

What do you think the future of programming looks like? What do you hope the future of programming looks like, if we zoom in on the computational fields, on data science, on machine learning?

I hope Swift is successful, because the goal of Swift, the way Chris Lattner describes it, is to be infinitely hackable. And that's what I want. I want something where me and the people I do research with and my students can look at and change everything from top to bottom. There's nothing mysterious and magical and inaccessible. Unfortunately with Python, it's the opposite of that, because Python is so slow, it's extremely unhackable. You get to a point where it's like, okay, from here on down, it's C. So your debugger doesn't work in the same way, your profiler doesn't work in the same way, your build system doesn't work in the same way. It's really not very hackable at all.

What's the part you'd like to be hackable? Is it for the objective of optimizing training of neural networks, inference of neural networks? Is it performance of the system, or is there something non-performance-related, just...?

I mean, in the end, I want to be productive as a practitioner. So, like, at the moment, our understanding of deep learning is incredibly primitive. There's very little we understand. Most things don't work very well, even though it works better than anything else out there. There are so many opportunities to make it better. So you look at any domain area, like, I don't know, speech recognition with deep learning, or natural language processing classification with deep learning, or whatever. Every time I look at an area with deep learning, I always see, like, oh, it's terrible. There are lots and lots of obviously stupid ways to do things that need to be fixed. So then I want to be able to jump in there and quickly experiment and make them better.

You think the programming language has a role in that?

So currently, Python has a big gap in terms of our ability to innovate, particularly around recurrent neural networks and natural language processing, because it's so slow. The actual loop where we loop through words, we have to do that whole thing in CUDA C. So we actually can't innovate with the kernel, the heart of that most important algorithm. And it's just a huge problem. And this happens all over the place, so we hit research limitations.
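The loop he means is the recurrent network's step-by-step pass over a sequence. A toy NumPy sketch (the sizes and random weights are illustrative, not any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_in, d_hidden = 5, 8, 16                 # toy sizes
Wx = rng.normal(size=(d_in, d_hidden)) * 0.1       # input-to-hidden
Wh = rng.normal(size=(d_hidden, d_hidden)) * 0.1   # hidden-to-hidden
xs = rng.normal(size=(seq_len, d_in))              # one vector per word

# Step t depends on the hidden state from step t - 1, so this loop can't
# be vectorized away. Run in the Python interpreter it is slow, which is
# why frameworks push exactly this loop down into hand-written CUDA C,
# and why the kernel becomes hard for researchers to modify.
h = np.zeros(d_hidden)
for x in xs:
    h = np.tanh(x @ Wx + h @ Wh)
print(h.shape)  # (16,)
```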
Another example: convolutional neural networks, which are actually the most popular architecture for lots of things, maybe most things, in deep learning. We almost certainly should be using sparse convolutional neural networks, but only like two people are, because to do it, you have to rewrite all of that CUDA C level stuff. And yeah, researchers and practitioners just don't. So there are just big gaps between what people actually research and what people actually implement, because of the programming language problem.

So you think it's just too difficult to write in CUDA C, and that a higher-level programming language like Swift should enable the easier fooling around, the creative stuff, with RNNs or with sparse convolutional neural networks? Who's in charge of making it easy for a researcher to play around?

I mean, no one's at fault. Nobody's just got around to it yet, or it's just hard, right? And, I mean, part of the fault is that we ignored that whole APL kind of direction. Nearly everybody did, for 50 or 60 years. But recently, people have been starting to reinvent pieces of that and kind of create some interesting new directions in compiler technology. So the place where that's particularly happening right now is something called MLIR, which is something that, again, Chris Lattner, the Swift guy, is leading. And yeah, because it's actually not going to be Swift on its own that solves this problem, because the problem is that currently, writing an acceptably fast GPU program is too complicated, regardless of what language you use. And that's just because you have to deal with the fact that you've got, you know, 10,000 threads, and you have to synchronize between them all, and you have to put your thing into grid blocks and think about warps and all this stuff. It's just so much boilerplate that to do that well, you have to be a specialist at it, and it's going to be a year's work to, you know, optimize that algorithm in that way.

But with things like tensor comprehensions and Tile and MLIR and TVM, there are all these various projects which are all about saying, let's let people create domain-specific languages for tensor computations, the kinds of things we generally do on the GPU for deep learning, and then have a compiler which can optimize that tensor computation. A lot of this work is actually sitting on top of a project called Halide, which is a mind-blowing project where they came up with such a domain-specific language. In fact, two: one domain-specific language for expressing "this is what my tensor computation is," and another for expressing "this is the way I want you to structure the compilation of that: do it block by block, and do these bits in parallel." And they were able to show how you can compress the amount of code by 10x compared to optimized GPU code and get the same performance.

So these other things are kind of sitting on top of that kind of research, and MLIR is pulling a lot of those best practices together. And now we're starting to see work done on making all of that directly accessible through Swift, so that I could use Swift to kind of write those domain-specific languages. And hopefully we'll get Swift CUDA kernels written in a very expressive and concise way that looks a bit like J and APL, and then Swift layers on top of that, and then a Swift UI on top of that. And it'll be so nice if we can get to that point.
Now, does it all eventually boil down to CUDA?

Unfortunately, at the moment it does. But one of the nice things about MLIR, if AMD ever gets their act together, which they probably won't, is that they or others could write MLIR backends for other GPUs, or rather other tensor computation devices, of which today there are an increasing number, like Graphcore or Vertex AI or whatever. So yeah, being able to target lots of backends would be another benefit of this. And the market really needs competition, because at the moment NVIDIA is massively overcharging for their kind of enterprise-class cards, because there is no serious competition, because nobody else is doing the software properly.

In the cloud there is some competition, right?

Not really, other than TPUs perhaps, but TPUs are almost unprogrammable at the moment.

So TPUs have the same problem, that you can't...?

So with TPUs, Google actually made an explicit decision to make them almost entirely unprogrammable, because they felt that there was too much IP in there, and that if they gave people direct access to program them, people would learn their secrets. So you can't actually directly program the memory. You can't even directly create code that runs on, and that you look at on, the machine that has the TPU; it all goes through a virtual machine. So all you can really do is this kind of cookie-cutter thing of plugging high-level stuff together, which is just super tedious and annoying and totally unnecessary.
So tell me, if you could, the origin story of fast.ai. What is its motivation, its mission, its dream?

So I guess the founding story is heavily tied to my previous startup, which was a company called Enlitic, which was the first company to focus on deep learning for medicine. And I created that because I saw there was a huge opportunity: there's about a 10x shortage of the number of doctors that we need in the developing world. I expected it would take about 300 years to train enough doctors to meet that gap. But I guessed that maybe if we used deep learning for some of the analytics, we could maybe make it so you don't need such highly trained doctors for diagnosis and treatment planning.

Where's the biggest benefit, just before we get to fast.ai, where's the biggest benefit of AI in medicine that you see today? And maybe next time.

Not much is happening today in terms of stuff that's actually out there; it's very early. But in terms of the opportunity, it's to take markets like India and China and Indonesia, which have big populations, and Africa, with small numbers of doctors, and provide diagnostics, particularly treatment planning and triage, kind of on-device, so that if you do a test for malaria or tuberculosis or whatever, you immediately get something where even a healthcare worker that's had a month of training can get a very high-quality assessment of whether the patient might be at risk and tell them, okay, we'll send them off to a hospital.

So, for example, in Africa, outside of South Africa, there are only five pediatric radiologists for the entire continent, so most countries don't have any. So if your kid is sick and they need something diagnosed through medical imaging, even if you're able to get medical imaging done, the person that looks at it will be a nurse at best. But actually in India, for example, and China, almost no X-rays are read by anybody, by any trained professional, because they don't have enough. So if instead we had an algorithm that could take the most likely high-risk 5% and triage, basically say, okay, someone needs to look at this, it would massively change what's possible with medicine in the developing world. And remember, increasingly, they have money. They're the developing world, they're not the poor world, they're the developing world. So they have the money, so they're building the hospitals, they're getting the diagnostic equipment. But there's no way, for a very long time, that they'll be able to have the expertise.

A shortage of expertise, okay. And that's where the deep learning systems can step in and magnify the expertise they do have.
So do you see, just to linger a little bit longer on the interaction, do you still see the human experts at the core of these systems? Is there something in medicine that could be automated almost completely?

I don't see the point of even thinking about that, because we have such a shortage of people. Why would we want to find a way not to use them? We have people, so, even from an economic point of view, if you can make them 10x more productive, getting rid of the person doesn't impact your unit economics at all. And it totally ignores the fact that there are things people do better than machines. So to me, that's just not a useful way of framing the problem.

I guess, just to clarify, I meant there may be some problems where you can avoid even going to the expert ever, sort of maybe preventative care or some basic stuff, allowing the expert to focus on the things that really need them, you know.

Well, that's what the triage would do, right? So the triage would say, okay, it's 99% sure there's nothing here. So that can be done on-device, and they can just say, okay, go home. So the experts are being used to look at the stuff which has some chance it's worth looking at, which most things, it's not. You know, it's fine.

Why do you think that is? Why do you think we haven't quite made progress on that yet, in terms of the scale of how much AI is applied in the medical field?

Oh, there's a lot of reasons. I mean, one is it's pretty new. I only started Enlitic in like 2014, and before that, it's hard to express to what degree the medical world was not aware of the opportunities here. So I went to RSNA, which is the world's largest radiology conference, and I told everybody I could, you know, like, I'm doing this thing with deep learning, please come and check it out. And no one had any idea what I was talking about, and no one had any interest in it. So like, we've come from absolute zero, which is hard. And then the whole regulatory framework, the education system, everything is just set up to think of doctoring in a very different way.

So today there is a small number of people who are deep learning practitioners and doctors at the same time, and we're starting to see the first ones come out of their PhD programs. So Zak Kohane over in Boston, Cambridge, has a number of students now who are data science experts, deep learning experts, and actual medical doctors. Quite a few doctors have completed our fast.ai course now and are publishing papers and creating journal reading groups in the American College of Radiology. And like, it's just starting to happen, but it's going to be a long time coming. It's going to happen, but it's going to be a long process. The regulators have to learn how to regulate this, they have to build guidelines, and then the lawyers at hospitals have to develop a new way of understanding that sometimes it makes sense for data to be looked at in raw form, in large quantities, in order to create world-changing results.
Yeah, so the regulation around data, all of that, sounds like probably the hardest problem, but it sounds reminiscent of autonomous vehicles as well: many of the same regulatory challenges, many of the same data challenges.

Yeah, I mean, funnily enough, the problem is less the regulation and more the interpretation of that regulation by lawyers in hospitals. So HIPAA, actually, the P in HIPAA does not stand for privacy; it stands for portability. It's actually meant to be a way that data can be used. And it was created with lots of gray areas, because the idea was that that would be more practical, and it would help people to use this legislation to actually share data in a more thoughtful way. Unfortunately, it's done the opposite, because when a lawyer sees a gray area, they say, oh, if we don't know we won't get sued, then we can't do it. So HIPAA is not exactly the problem. The problem is more that hospital lawyers are not incented to make bold decisions about data portability.

Or even to embrace technology that saves lives.

They more want to not get in trouble for embracing that technology. And it also saves lives in a very abstract way, which is like, oh, we've been able to release these 100,000 anonymized records. I can't point to the specific person whose life that saved. I can say, like, oh, we ended up with this paper which found this result, which diagnosed a thousand more people than we would have otherwise, but it's like, which ones were helped? It's very abstract.

And on the counter side of that, you may be able to point to a life that was taken because of something that was...

Yeah, or a person whose privacy was violated. It's like, oh, this specific person was reidentified.
link |
Just a fascinating topic.
link |
We're jumping around.
link |
We'll get back to fast AI,
link |
but on the question of privacy,
link |
data is the fuel for so much innovation in deep learning.
link |
What's your sense on privacy?
link |
Whether we're talking about Twitter, Facebook, YouTube,
link |
just the technologies like in the medical field
link |
that rely on people's data in order to create impact.
link |
How do we get that right,
link |
respecting people's privacy and yet creating technology
link |
that is learning from data?
link |
One of my areas of focus is on doing more with less data.
link |
More with less data, which,
link |
so most vendors, unfortunately,
link |
are strongly incented to find ways
link |
to require more data and more computation.
link |
So, Google and IBM being the most obvious.
link |
So, Google and IBM both strongly push the idea
link |
that you have to be,
link |
that they have more data and more computation
link |
and more intelligent people than anybody else.
link |
And so you have to trust them to do things
link |
because nobody else can do it.
link |
And Google's very upfront about this,
link |
like Jeff Dean has gone out there and given talks
link |
and said, our goal is to require
link |
a thousand times more computation, but less people.
link |
Our goal is to use the people that you have better
link |
and the data you have better
link |
and the computation you have better.
link |
So, one of the things that we've discovered is,
link |
or at least highlighted,
link |
is that you very, very, very often
link |
don't need much data at all.
link |
And so the data you already have in your organization
link |
will be enough to get state of the art results.
link |
So, like my starting point would be to kind of say
link |
around privacy is a lot of people are looking for ways
link |
to share data and aggregate data,
link |
but I think often that's unnecessary.
link |
They assume that they need more data than they do
link |
because they're not familiar with the basics
link |
of transfer learning, which is this critical technique
link |
for needing orders of magnitude less data.
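Transfer learning as described here, reusing a pretrained model so a new task needs orders of magnitude less data, can be sketched in miniature. Everything below is hypothetical stand-in code, not fastai's API: a frozen "backbone" supplies features learned elsewhere, and only a tiny head is fit on a handful of new examples.

```python
# Toy sketch of transfer learning. backbone() stands in for a model
# pretrained on a big generic dataset; for the new task we keep it
# frozen and fit only a tiny head on its features.

def backbone(x):
    # Pretend these features were learned from millions of examples.
    return [x, x * x]

def train_head(examples, lr=0.01, epochs=2000):
    """Fit a 2-weight linear head on frozen backbone features with
    plain stochastic gradient descent -- the only part that needs
    any new labeled data."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in examples:
            f = backbone(x)
            pred = w[0] * f[0] + w[1] * f[1]
            err = pred - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# Four labeled points suffice because the heavy lifting (the
# features) was already done elsewhere: the head learns y = x**2.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]
w = train_head(data)
```

Because the backbone already encodes the hard part, a handful of labeled points is enough to fit the head almost exactly.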
link |
Is your sense, one reason you might wanna collect data
link |
from everyone is like in the recommender system context,
link |
where your individual, Jeremy Howard's individual data
link |
is the most useful for providing a product
link |
that's impactful for you.
link |
So, for giving you advertisements,
link |
for recommending to you movies,
link |
for doing medical diagnosis,
link |
is your sense we can build with a small amount of data,
link |
general models that will have a huge impact
link |
for most people that we don't need to have data
link |
from each individual?
link |
On the whole, I'd say yes.
link |
I mean, there are things like,
link |
you know, recommender systems have this cold start problem
link |
where, you know, Jeremy is a new customer,
link |
we haven't seen him before, so we can't recommend him things
link |
based on what else he's bought and liked with us.
link |
And there's various workarounds to that.
link |
Like in a lot of music programs,
link |
we'll start out by saying, which of these artists do you like?
link |
Which of these albums do you like?
link |
Which of these songs do you like?
link |
Netflix used to do that, nowadays they tend not to.
link |
People kind of don't like that
link |
because they think, oh, we don't wanna bother the user.
link |
So, you could work around that
link |
by having some kind of data sharing
link |
where you get my marketing record from Acxiom or whatever,
link |
and try to guess from that.
link |
To me, the benefit to me and to society
link |
of saving me five minutes on answering some questions
link |
versus the negative externalities of the privacy issue
link |
So, I think like a lot of the time,
link |
the places where people are invading our privacy
link |
in order to provide convenience
link |
is really about just trying to make them more money
link |
and they move these negative externalities
link |
to places that they don't have to pay for them.
link |
So, when you actually see regulations appear
link |
that actually cause the companies
link |
that create these negative externalities
link |
to have to pay for it themselves,
link |
they say, well, we can't do it anymore.
link |
So, the cost is actually too high.
link |
But for something like medicine,
link |
yeah, I mean, the hospital has my medical imaging,
link |
my pathology studies, my medical records,
link |
and also I own my medical data.
link |
So, you can, so I help a startup called Doc.ai.
link |
One of the things Doc.ai does is that it has an app.
link |
You can connect to, you know, Sutter Health
link |
and LabCorp and Walgreens
link |
and download your medical data to your phone
link |
and then upload it again at your discretion
link |
to share it as you wish.
link |
So, with that kind of approach,
link |
we can share our medical information
link |
with the people we want to.
link |
I mean, really being able to control
link |
who you share it with and so on.
link |
So, that has a beautiful, interesting tangent
link |
to return back to the origin story of Fast.ai.
link |
Right, so before I started Fast.ai,
link |
I spent a year researching
link |
where are the biggest opportunities for deep learning?
link |
Because I knew from my time at Kaggle in particular
link |
that deep learning had kind of hit this threshold point
link |
where it was rapidly becoming the state of the art approach
link |
in every area that looked at it.
link |
And I'd been working with neural nets for over 20 years.
link |
I knew that from a theoretical point of view,
link |
once it hit that point,
link |
it would do that in kind of just about every domain.
link |
And so I kind of spent a year researching
link |
what are the domains that's gonna have
link |
the biggest low hanging fruit
link |
in the shortest time period.
link |
I picked medicine, but there were so many
link |
I could have picked.
link |
And so there was a kind of level of frustration for me
link |
of like, okay, I'm really glad we've opened up
link |
the medical deep learning world.
link |
And today it's huge, as you know,
link |
but we can't do, I can't do everything.
link |
I don't even know, like in medicine,
link |
it took me a really long time to even get a sense
link |
of like what kind of problems do medical practitioners solve?
link |
What kind of data do they have?
link |
Who has that data?
link |
So I kind of felt like I need to approach this differently
link |
if I wanna maximize the positive impact of deep learning.
link |
Rather than me picking an area
link |
and trying to become good at it and building something,
link |
I should let people who are already domain experts
link |
in those areas and who already have the data do it.
link |
So that was the reason for Fast.ai
link |
is to basically try and figure out
link |
how to get deep learning into the hands of people
link |
who could benefit from it and help them to do so
link |
in as quick and easy and effective a way as possible.
link |
Got it, so sort of empower the domain experts.
link |
Yeah, and like partly it's because like,
link |
unlike most people in this field,
link |
my background is very applied and industrial.
link |
Like my first job was at McKinsey & Company.
link |
I spent 10 years in management consulting.
link |
I spend a lot of time with domain experts.
link |
So I kind of respect them and appreciate them.
link |
And I know that's where the value generation in society is.
link |
And so I also know how most of them can't code
link |
and most of them don't have the time to invest
link |
three years in a graduate degree or whatever.
link |
So I was like, how do I upskill those domain experts?
link |
I think that would be a super powerful thing,
link |
the biggest societal impact I could have.
link |
So yeah, that was the thinking.
link |
So much of Fast.ai students and researchers
link |
and the things you teach are pragmatically minded,
link |
practically minded,
link |
figuring out ways how to solve real problems and fast.
link |
So from your experience,
link |
what's the difference between theory
link |
and practice of deep learning?
link |
Well, most of the research in the deep learning world
link |
is a total waste of time.
link |
Right, that's what I was getting at.
link |
It's a problem in science in general.
link |
Scientists need to be published,
link |
which means they need to work on things
link |
that their peers are extremely familiar with
link |
and can recognize in advance in that area.
link |
So that means that they all need to work on the same thing.
link |
And so it really, and the thing they work on,
link |
there's nothing to encourage them to work on things
link |
that are practically useful.
link |
So you get just a whole lot of research,
link |
which is minor advances and stuff
link |
that's been very highly studied
link |
and has no significant practical impact.
link |
Whereas the things that really make a difference,
link |
like I mentioned transfer learning,
link |
like if we can do better at transfer learning,
link |
then it's this like world changing thing
link |
where suddenly like lots more people
link |
can do world class work with less resources and less data.
link |
But almost nobody works on that.
link |
Or another example, active learning,
link |
which is the study of like,
link |
how do we get more out of the human beings in the loop?
link |
That's my favorite topic.
link |
Yeah, so active learning is great,
link |
but it's almost nobody working on it
link |
because it's just not a trendy thing right now.
link |
You know what somebody, sorry to interrupt,
link |
you're saying that nobody is publishing on active learning,
link |
but there's people inside companies,
link |
anybody who actually has to solve a problem,
link |
they're going to innovate on active learning.
link |
Yeah, everybody kind of reinvents active learning
link |
when they actually have to work in practice
link |
because they start labeling things and they think,
link |
gosh, this is taking a long time and it's very expensive.
link |
And then they start thinking,
link |
well, why am I labeling everything?
link |
I'm only, the machine's only making mistakes
link |
on those two classes.
link |
They're the hard ones.
link |
Maybe I'll just start labeling those two classes.
link |
And then you start thinking,
link |
well, why did I do that manually?
link |
Why can't I just get the system to tell me
link |
which things are going to be hardest?
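The labeling loop described here, asking the system which examples it finds hardest and labeling only those, is uncertainty sampling, the simplest form of active learning. A minimal sketch; the pool and names are made up for illustration:

```python
def least_confident(probs, k):
    """Return indices of the k pool examples whose top predicted
    probability is lowest -- the ones the model is least sure about,
    and so the ones most worth paying a human to label."""
    confidence = [max(p) for p in probs]
    return sorted(range(len(probs)), key=lambda i: confidence[i])[:k]

# Toy pool of per-class probabilities: the model is confident on
# examples 0 and 2, torn on 1 and 3 -- label those two first.
pool = [[0.98, 0.02], [0.55, 0.45], [0.90, 0.10], [0.51, 0.49]]
to_label = least_confident(pool, 2)  # -> [3, 1]
```

In practice you would retrain on the newly labeled examples and repeat, which is exactly the loop people keep reinventing.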
link |
It's an obvious thing to do, but yeah,
link |
it's just like transfer learning.
link |
It's understudied and the academic world
link |
just has no reason to care about practical results.
link |
The funny thing is,
link |
like I've only really ever written one paper.
link |
I hate writing papers.
link |
And I didn't even write it.
link |
It was my colleague, Sebastian Ruder,
link |
who actually wrote it.
link |
I just did the research for it,
link |
but it was basically introducing transfer learning,
link |
successful transfer learning to NLP for the first time.
link |
The algorithm is called ULMFiT.
link |
And it actually, I actually wrote it for the course,
link |
for the Fast AI course.
link |
I wanted to teach people NLP and I thought,
link |
I only want to teach people practical stuff.
link |
And I think the only practical stuff is transfer learning.
link |
And I couldn't find any examples of transfer learning in NLP.
link |
And I was shocked to find that as soon as I did it,
link |
which, you know, the basic prototype took a couple of days,
link |
smashed the state of the art
link |
on one of the most important data sets
link |
in a field that I knew nothing about.
link |
And I just thought, well, this is ridiculous.
link |
And so I spoke to Sebastian about it
link |
and he kindly offered to write it up, the results.
link |
And so it ended up being published in ACL,
link |
which is the top computational linguistics conference.
link |
So like people do actually care once you do it,
link |
but I guess it's difficult for maybe like junior researchers
link |
or like, I don't care whether I get citations
link |
or papers or whatever.
link |
There's nothing in my life that makes that important,
link |
which is why I've never actually bothered
link |
to write a paper myself.
link |
But for people who do,
link |
I guess they have to pick the kind of safe option,
link |
which is like, yeah, make a slight improvement
link |
on something that everybody's already working on.
link |
Yeah, nobody does anything interesting
link |
or succeeds in life with the safe option.
link |
Although, I mean, the nice thing is,
link |
nowadays everybody is now working on NLP transfer learning
link |
because since that time we've had GPT and GPT2 and BERT,
link |
and, you know, it's like, it's, so yeah,
link |
once you show that something's possible,
link |
everybody jumps in, I guess, so.
link |
I hope to be a part of,
link |
and I hope to see more innovation
link |
and active learning in the same way.
link |
I think transfer learning and active learning
link |
are fascinating, public, open work.
link |
I actually helped start a startup called Platform.ai,
link |
which is really all about active learning.
link |
And yeah, it's been interesting trying to kind of see
link |
what research is out there and make the most of it.
link |
And there's basically none.
link |
So we've had to do all our own research.
link |
Once again, and just as you described.
link |
Can you tell the story of the Stanford competition,
link |
DAWNBench, and FastAI's achievement on it?
link |
Sure, so something which I really enjoy
link |
is that I basically teach two courses a year,
link |
the Practical Deep Learning for Coders,
link |
which is kind of the introductory course,
link |
and then Cutting Edge Deep Learning for Coders,
link |
which is the kind of research level course.
link |
And while I teach those courses,
link |
I basically have a big office
link |
at the University of San Francisco,
link |
big enough for like 30 people.
link |
And I invite anybody, any student who wants to come
link |
and hang out with me while I build the course.
link |
And so generally it's full.
link |
And so we have 20 or 30 people in a big office
link |
with nothing to do but study deep learning.
link |
So it was during one of these times
link |
that somebody in the group said,
link |
oh, there's a thing called DAWNBench
link |
that looks interesting.
link |
And I was like, what the hell is that?
link |
And they set out some competition
link |
to see how quickly you can train a model.
link |
Seems kind of, not exactly relevant to what we're doing,
link |
but it sounds like the kind of thing
link |
which you might be interested in.
link |
And I checked it out and I was like,
link |
oh crap, there's only 10 days till it's over.
link |
And we're kind of busy trying to teach this course.
link |
But we're like, oh, it would make an interesting case study
link |
It's like, it's all the stuff we're already doing.
link |
Why don't we just put together
link |
our current best practices and ideas?
link |
So me and I guess about four students
link |
just decided to give it a go.
link |
And we focused on this small one called CIFAR 10,
link |
which is little 32 by 32 pixel images.
link |
Can you say what DAWNBench is?
link |
Yeah, so it's a competition to train a model
link |
as fast as possible.
link |
It was run by Stanford.
link |
And it's cheap as possible too.
link |
That's also another one for as cheap as possible.
link |
And there was a couple of categories,
link |
ImageNet and CIFAR 10.
link |
So ImageNet is this big 1.3 million image thing
link |
that took a couple of days to train.
link |
Remember a friend of mine, Pete Warden,
link |
who's now at Google.
link |
I remember he told me how he trained ImageNet
link |
a few years ago when he basically like had this
link |
little granny flat out the back
link |
that he turned into his ImageNet training center.
link |
And he figured, you know, after like a year of work,
link |
he figured out how to train it in like 10 days or something.
link |
It's like, that was a big job.
link |
Whereas CIFAR 10, at that time,
link |
you could train in a few hours.
link |
You know, it's much smaller and easier.
link |
So we thought we'd try CIFAR 10.
link |
And yeah, I've really never done that before.
link |
Like I'd never really,
link |
like things like using more than one GPU at a time
link |
was something I tried to avoid.
link |
Cause to me, it's very against the whole idea
link |
of accessibility. You should be able to do things with one GPU.
link |
I mean, have you asked in the past before,
link |
after having accomplished something,
link |
how do I do this faster, much faster?
link |
Oh, always, but it's always, for me,
link |
it's always how do I make it much faster on a single GPU
link |
that a normal person could afford in their day to day life.
link |
It's not how could I do it faster by, you know,
link |
having a huge data center.
link |
Cause to me, it's all about like,
link |
as many people as possible being able to use something
link |
without fussing around with infrastructure.
link |
So anyways, in this case it's like, well,
link |
we can use eight GPUs just by renting an AWS machine.
link |
So we thought we'd try that.
link |
And yeah, basically using the stuff we were already doing,
link |
we were able to get, you know, the speed,
link |
you know, within a few days we had the speed down to,
link |
I don't know, a very small number of minutes.
link |
I can't remember exactly how many minutes it was,
link |
but it might've been like 10 minutes or something.
link |
And so, yeah, we found ourselves
link |
at the top of the leaderboard easily
link |
for both time and money, which really shocked me
link |
cause the other people competing in this
link |
were like Google and Intel and stuff
link |
who I like know a lot more about this stuff
link |
than I think we do.
link |
So then we were emboldened.
link |
We thought let's try the ImageNet one too.
link |
I mean, it seemed way out of our league,
link |
but our goal was to get under 12 hours.
link |
And we did, which was really exciting.
link |
But we didn't put anything up on the leaderboard,
link |
but we were down to like 10 hours.
link |
But then Google put in like five hours or something
link |
and we're just like, oh, we're so screwed.
link |
But we kind of thought, we'll keep trying.
link |
You know, if Google can do it in five,
link |
I mean, Google did it in five hours on something
link |
on like a TPU pod or something, like a lot of hardware.
link |
But we kind of like had a bunch of ideas to try.
link |
Like a really simple thing was
link |
why are we using these big images?
link |
They're like 224 or 256 by 256 pixels.
link |
You know, why don't we try smaller ones?
link |
And just to elaborate, there's a constraint
link |
on the accuracy that your trained model
link |
is supposed to achieve, right?
link |
Yeah, you gotta achieve 93%, I think it was,
link |
for ImageNet, exactly.
link |
Which is very tough, so you have to.
link |
Yeah, 93%, like they picked a good threshold.
link |
It was a little bit higher
link |
than what the most commonly used ResNet 50 model
link |
could achieve at that time.
link |
So yeah, so it's quite a difficult problem to solve.
link |
But yeah, we realized if we actually
link |
just use 64 by 64 images,
link |
it trained a pretty good model.
link |
And then we could take that same model
link |
and just give it a couple of epochs to learn 224 by 224 images.
link |
And it was basically already trained.
link |
It makes a lot of sense.
link |
Like if you teach somebody,
link |
like here's what a dog looks like
link |
and you show them low res versions,
link |
and then you say, here's a really clear picture of a dog,
link |
they already know what a dog looks like.
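The small-images-first trick described here, now usually called progressive resizing, can be sketched as a simple stage schedule. The sizes and epoch counts below are illustrative, not the exact settings used in the competition:

```python
def progressive_resizing_plan(stages):
    """Expand (image_size, n_epochs) stages into a flat per-epoch
    list of training sizes: lots of cheap small-image epochs first,
    then just a couple at full resolution to finish the model off."""
    plan = []
    for size, n_epochs in stages:
        plan += [size] * n_epochs
    return plan

# Mostly 64px epochs, a few at 128, only two at full 224px --
# by then the model already knows roughly what a dog looks like.
plan = progressive_resizing_plan([(64, 10), (128, 4), (224, 2)])
```

Each stage reuses the same model weights, so the expensive full-resolution epochs only refine what the cheap ones already learned.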
link |
So that like just, we jumped to the front
link |
and we ended up winning parts of that competition.
link |
We actually ended up doing a distributed version
link |
over multiple machines a couple of months later
link |
and ended up at the top of the leaderboard.
link |
We had 18 minutes.
link |
and people have just kept on blasting through
link |
again and again since then, so.
link |
So what's your view on multi GPU
link |
or multiple machine training in general
link |
as a way to speed code up?
link |
I think it's largely a waste of time.
link |
Both multi GPU on a single machine and.
link |
Yeah, particularly multi machines,
link |
cause it's just clunky.
link |
Multi GPUs is less clunky than it used to be,
link |
but to me anything that slows down your iteration speed
link |
is a waste of time.
link |
So you could maybe do your very last,
link |
you know, perfecting of the model on multi GPUs
link |
if you need to, but.
link |
So for example, I think doing stuff on ImageNet
link |
is generally a waste of time.
link |
Why test things on 1.3 million images?
link |
Most of us don't use 1.3 million images.
link |
And we've also done research that shows that
link |
doing things on a smaller subset of images
link |
gives you the same relative answers anyway.
link |
So from a research point of view, why waste that time?
link |
So actually I released a couple of new data sets recently.
link |
One is called Imagenette,
link |
the French ImageNet, which is a small subset of ImageNet,
link |
which is designed to be easy to classify.
link |
What's, how do you spell Imagenette?
link |
It's got an extra T and E at the end,
link |
cause it's very French.
link |
And then another one called Imagewoof,
link |
which is a subset of ImageNet that only contains dog breeds.
link |
And that's a hard one, right?
link |
That's a hard one.
link |
And I've discovered that if you just look at these
link |
two subsets, you can train things on a single GPU
link |
And the results you get are directly transferable
link |
to ImageNet nearly all the time.
link |
And so now I'm starting to see some researchers
link |
start to use these much smaller data sets.
link |
I so deeply love the way you think,
link |
because I think you might've written a blog post
link |
saying that sort of going these big data sets
link |
is encouraging people to not think creatively.
link |
So you're too, it sort of constrains you to train
link |
on large resources.
link |
And because you have these resources,
link |
you think more research will be better.
link |
And then you start, so like somehow you kill the creativity.
link |
Yeah, and even worse than that, Lex,
link |
I keep hearing from people who say,
link |
I decided not to get into deep learning
link |
because I don't believe it's accessible to people
link |
outside of Google to do useful work.
link |
So like I see a lot of people make an explicit decision
link |
to not learn this incredibly valuable tool
link |
because they've drunk the Google Koolaid,
link |
which is that only Google's big enough
link |
and smart enough to do it.
link |
And I just find that so disappointing and it's so wrong.
link |
And I think all of the major breakthroughs in AI
link |
in the next 20 years will be doable on a single GPU.
link |
Like I would say, my sense is all the big sort of.
link |
Well, let's put it this way.
link |
None of the big breakthroughs of the last 20 years
link |
have required multiple GPUs.
link |
So like batch norm, ReLU, Dropout.
link |
To demonstrate that there's something to them.
link |
Every one of them, none of them has required multiple GPUs.
link |
GANs, the original GANs didn't require multiple GPUs.
link |
Well, and we've actually recently shown
link |
that you don't even need GANs.
link |
So we've developed GAN level outcomes without needing GANs.
link |
And we can now do it with, again,
link |
by using transfer learning,
link |
we can do it in a couple of hours on a single GPU.
link |
You're just using a generator model
link |
without the adversarial part?
link |
Yeah, so we've found loss functions
link |
that work super well without the adversarial part.
link |
And then one of our students, a guy called Jason Antic,
link |
has created a system called DeOldify,
link |
which uses this technique to colorize
link |
old black and white movies.
link |
You can do it on a single GPU,
link |
colorize a whole movie in a couple of hours.
link |
And one of the things that Jason and I did together
link |
was we figured out how to add a little bit of GAN
link |
at the very end, which it turns out for colorization
link |
makes it just a bit brighter and nicer.
link |
And then Jason did masses of experiments
link |
to figure out exactly how much to do,
link |
but it's still all done on his home machine
link |
on a single GPU in his lounge room.
link |
And if you think about colorizing Hollywood movies,
link |
that sounds like something a huge studio would have to do,
link |
but he has the world's best results on this.
link |
There's this problem of microphones.
link |
We're just talking to microphones now.
link |
It's such a pain in the ass to have these microphones
link |
to get good quality audio.
link |
And I tried to see if it's possible to plop down
link |
a bunch of cheap sensors and reconstruct
link |
higher quality audio from multiple sources.
link |
Because right now I haven't seen the work from,
link |
okay, we can say even expensive mics
link |
automatically combining audio from multiple sources
link |
to improve the combined audio.
link |
People haven't done that.
link |
And that feels like a learning problem.
link |
So hopefully somebody can.
link |
Well, I mean, it's evidently doable
link |
and it should have been done by now.
link |
I felt the same way about computational photography
link |
Why are we investing in big lenses
link |
when three cheap lenses plus actually
link |
a little bit of intentional movement,
link |
so like take a few frames,
link |
gives you enough information
link |
to get excellent subpixel resolution,
link |
which particularly with deep learning,
link |
you would know exactly what you meant to be looking at.
link |
We can totally do the same thing with audio.
link |
I think it's madness that it hasn't been done yet.
link |
Is there progress on the photography company?
link |
Yeah, photography is basically standard now.
link |
So the Google Pixel Night Sight,
link |
I don't know if you've ever tried it,
link |
but it's astonishing.
link |
You take a picture in almost pitch black
link |
and you get back a very high quality image.
link |
And it's not because of the lens.
link |
Same stuff with like adding the bokeh
link |
to the background blurring,
link |
it's done computationally.
link |
This is the pixel right here.
link |
Yeah, basically everybody now
link |
is doing most of the fanciest stuff
link |
on their phones with computational photography
link |
and also increasingly people are putting
link |
more than one lens on the back of the camera.
link |
So the same will happen for audio for sure.
link |
And there's applications in the audio side.
link |
If you look at an Alexa type device,
link |
most people I've seen,
link |
especially I worked at Google before,
link |
when you look at noise background removal,
link |
you don't think of multiple sources of audio.
link |
You don't play with that as much
link |
as I would hope people would.
link |
But I mean, you can still do it even with one.
link |
Like again, not much work's been done in this area.
link |
So we're actually gonna be releasing an audio library soon,
link |
which hopefully will encourage development of this
link |
because it's so underused.
link |
The basic approach we used for our super resolution
link |
and which Jason uses for DeOldify
link |
of generating high quality images,
link |
the exact same approach would work for audio.
link |
No one's done it yet,
link |
but it would be a couple of months work.
link |
Okay, also learning rate in terms of DAWNBench.
link |
There's some magic on learning rate
link |
that you played around with that's kind of interesting.
link |
Yeah, so this is all work that came
link |
from a guy called Leslie Smith.
link |
Leslie's a researcher who, like us,
link |
cares a lot about just the practicalities
link |
of training neural networks quickly and accurately,
link |
which I think is what everybody should care about,
link |
but almost nobody does.
link |
And he discovered something very interesting,
link |
which he calls super convergence,
link |
which is there are certain networks
link |
that with certain settings of hyperparameters
link |
could suddenly be trained 10 times faster
link |
by using a 10 times higher learning rate.
link |
Now, no one published that paper
link |
because it's not an area of kind of active research
link |
in the academic world.
link |
No academics recognize that this is important.
link |
And also deep learning in academia
link |
is not considered a experimental science.
link |
So unlike in physics where you could say like,
link |
I just saw a subatomic particle do something
link |
which the theory doesn't explain,
link |
you could publish that without an explanation.
link |
And then in the next 60 years,
link |
people can try to work out how to explain it.
link |
We don't allow this in the deep learning world.
link |
So it's literally impossible for Leslie
link |
to publish a paper that says,
link |
I've just seen something amazing happen.
link |
This thing trained 10 times faster than it should have.
link |
And so the reviewers were like,
link |
well, you can't publish that because you don't know why.
link |
That's important to pause on
link |
because there's so many discoveries
link |
that would need to start like that.
link |
Every other scientific field I know of works that way.
link |
I don't know why ours is uniquely disinterested
link |
in publishing unexplained experimental results.
link |
So it wasn't published.
link |
I read a lot more unpublished papers than published papers
link |
because that's where you find the interesting insights.
link |
So I absolutely read this paper.
link |
And I was just like,
link |
this is astonishingly mind blowing and weird
link |
And like, why isn't everybody only talking about this?
link |
Because like, if you can train these things 10 times faster,
link |
they also generalize better
link |
because you're doing less epochs,
link |
which means you look at the data less,
link |
you get better accuracy.
link |
So I've been kind of studying that ever since.
link |
And eventually Leslie kind of figured out
link |
a lot of how to get this done.
link |
And we added minor tweaks.
link |
And a big part of the trick
link |
is starting at a very low learning rate,
link |
very gradually increasing it.
link |
So as you're training your model,
link |
you would take very small steps at the start
link |
and you gradually make them bigger and bigger
link |
until eventually you're taking much bigger steps
link |
than anybody thought was possible.
link |
There's a few other little tricks to make it work,
link |
but basically we can reliably get super convergence.
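The schedule described, starting with a very low learning rate, ramping it up to an unusually high peak, then annealing back down, is roughly Leslie Smith's 1cycle policy. Below is a simplified sketch; real implementations such as fastai's fit_one_cycle also schedule momentum, and the 30% warm-up fraction and divisors here are just illustrative defaults:

```python
import math

def one_cycle_lr(step, total_steps, max_lr, start_div=25, end_div=10000):
    """Learning rate at a given step under a 1cycle-style schedule:
    ramp linearly from max_lr/start_div up to max_lr over the first
    30% of training, then cosine-anneal down to max_lr/end_div."""
    warmup = int(0.3 * total_steps)
    low, floor = max_lr / start_div, max_lr / end_div
    if step < warmup:
        # very small steps at the start, gradually made bigger
        return low + (step / warmup) * (max_lr - low)
    # after the peak, decay smoothly back toward zero
    frac = (step - warmup) / (total_steps - warmup)
    return floor + 0.5 * (max_lr - floor) * (1 + math.cos(math.pi * frac))
```

The peak max_lr can be far higher than a constant schedule would tolerate, which is where the training speedups come from.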
link |
And so for the DAWNBench thing,
link |
we were using just much higher learning rates
link |
than people expected to work.
link |
What do you think the future of,
link |
I mean, it makes so much sense
link |
for that to be a critical hyperparameter learning rate
link |
What do you think the future
link |
of learning rate magic looks like?
link |
Well, there's been a lot of great work
link |
in the last 12 months in this area.
link |
And people are increasingly realizing that optimize,
link |
like we just have no idea really how optimizers work.
link |
And the combination of weight decay,
link |
which is how we regularize optimizers,
link |
and the learning rate,
link |
and then other things like the epsilon we use
link |
in the Adam optimizer,
link |
they all work together in weird ways.
link |
And different parts of the model,
link |
this is another thing we've done a lot of work on
link |
is research into how different parts of the model
link |
should be trained at different rates in different ways.
link |
So we do something we call discriminative learning rates,
link |
which is really important,
link |
particularly for transfer learning.
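Discriminative learning rates can be sketched as assigning each layer group its own rate, smallest for the earliest, most general layers and largest for the new head. The geometric spacing of 2.6 is the value suggested in the ULMFiT paper; the function itself is an illustrative sketch, not fastai's actual API:

```python
def discriminative_lrs(n_groups, max_lr, factor=2.6):
    """One learning rate per layer group: the head (last group)
    gets max_lr, and each earlier, more general group gets a rate
    `factor` times smaller than the group after it."""
    return [max_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

# Three groups: early layers barely move, the new head learns fastest.
lrs = discriminative_lrs(3, 1e-2)
```

Each rate would then be handed to the optimizer as a separate parameter group, so a fine-tuned backbone is nudged gently while the fresh head trains at full speed.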
link |
So really, I think in the last 12 months,
link |
a lot of people have realized
link |
that all this stuff is important.
link |
There's been a lot of great work coming out
link |
and we're starting to see algorithms appear,
link |
which have very, very few dials, if any,
link |
that you have to touch.
link |
So I think what's gonna happen
link |
is the idea of a learning rate,
link |
well, it almost already has disappeared
link |
in the latest research.
link |
And instead, it's just like we know enough
link |
about how to interpret the gradients
link |
and the change of gradients we see
link |
to know how to set every parameter
link |
in an optimal way.
link |
So you see the future of deep learning
link |
where really, where's the input of a human expert needed?
link |
Well, hopefully the input of a human expert
link |
will be almost entirely unneeded
link |
from the deep learning point of view.
link |
So again, like Google's approach to this
link |
is to try and use thousands of times more compute
link |
to run lots and lots of models at the same time
link |
and hope that one of them is good.
link |
AutoML kind of thing?
link |
Yeah, AutoML kind of stuff, which I think is insane.
link |
When you better understand the mechanics
link |
of how models learn,
link |
you don't have to try a thousand different models
link |
to find which one happens to work the best.
link |
You can just jump straight to the best one,
link |
which means that it's more accessible
link |
in terms of compute, cheaper,
link |
and also with less hyperparameters to set,
link |
it means you don't need deep learning experts
link |
to train your deep learning model for you,
link |
which means that domain experts can do more of the work,
link |
which means that now you can focus the human time
link |
on the kind of interpretation, the data gathering,
link |
identifying model errors and stuff like that.
link |
Yeah, the data side.
link |
How often do you work with data these days
link |
in terms of the cleaning, looking at it?
link |
Like Darwin looked at different species
link |
while traveling about.
link |
Do you look at data?
link |
Have you, in your roots in Kaggle?
link |
Yeah, I mean, it's a key part of our course.
link |
It's like before we train a model in the course,
link |
we see how to look at the data.
link |
And then the first thing we do
link |
after we train our first model,
link |
which is fine tuning an ImageNet model for five minutes.
link |
And then the thing we immediately do after that
link |
is we learn how to analyze the results of the model
link |
by looking at examples of misclassified images
link |
and looking at a confusion matrix,
link |
and then doing research on Google
link |
to learn about the kinds of things that it's misclassifying.
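The kind of error analysis described, a confusion matrix plus a list of the misclassified examples, can be reproduced in a few lines of plain Python. The labels and predictions below are toy values invented for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, ["cat", "dog"])
# indices of misclassified examples, for a human to go inspect
errors = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
print(matrix)   # [[1, 1], [1, 2]]
print(errors)   # [1, 4]
```

Off-diagonal cells show which classes get confused with which, which is exactly what points you at the examples worth researching.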
link |
So to me, one of the really cool things
link |
about machine learning models in general
link |
is that when you interpret them,
link |
they tell you about things like
link |
what are the most important features,
link |
which groups are you misclassifying,
link |
and they help you become a domain expert more quickly
link |
because you can focus your time on the bits
link |
that the model is telling you is important.
link |
So it lets you deal with things like data leakage,
link |
for example, if it says,
link |
oh, the main feature I'm looking at is customer ID.
link |
And you're like, oh, customer ID shouldn't be predictive.
link |
And then you can talk to the people
link |
that manage customer IDs and they'll tell you like,
link |
oh yes, as soon as a customer's application is accepted,
link |
we add a one on the end of their customer ID or something.
link |
So yeah, looking at data,
link |
particularly from the lens of which parts of the data
link |
the model says is important is super important.
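One standard way a model can report which features matter, and so surface leakage like the customer ID example, is permutation importance: shuffle one column at a time and measure how much the score drops. This is a generic sketch with a toy stand-in model, not any particular library's implementation.

```python
import random

def permutation_importance(score_fn, rows, n_features, seed=0):
    """Drop in score when each feature column is shuffled.
    A large drop means the model leans heavily on that feature."""
    rng = random.Random(seed)
    base = score_fn(rows)
    importances = []
    for j in range(n_features):
        col = [r[j] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:j] + (col[i],) + r[j + 1:] for i, r in enumerate(rows)]
        importances.append(base - score_fn(shuffled))
    return importances

# toy rows: (feature_0, feature_1, label); feature_0 leaks the label
data = [(i % 2, i % 3, i % 2) for i in range(12)]

def accuracy(rows):
    # stand-in "model" that predicts the label from feature_0 alone
    return sum(r[0] == r[2] for r in rows) / len(rows)

importances = permutation_importance(accuracy, data, 2)
# shuffling feature_1 changes nothing; shuffling feature_0 hurts
```

A feature whose importance looks implausibly large, like a customer ID, is the cue to go talk to whoever generates that field.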
link |
Yeah, and using the model to almost debug the data
link |
to learn more about the data.
link |
What are the different cloud options
link |
for training your own networks?
link |
Last question related to DawnBench.
link |
Well, it's part of a lot of the work you do,
link |
but from a perspective of performance,
link |
I think you've written this in a blog post.
link |
There's AWS, there's TPU from Google.
link |
What's your sense?
link |
What the future holds?
link |
What would you recommend now in terms of training?
link |
So from a hardware point of view,
link |
Google's TPUs and the best Nvidia GPUs are similar.
link |
I mean, maybe the TPUs are like 30% faster,
link |
but they're also much harder to program.
link |
There isn't a clear leader in terms of hardware right now,
link |
although much more importantly,
link |
the Nvidia GPUs are much more programmable.
link |
They've got much more software written for them.
link |
So like that's the clear leader for me
link |
and where I would spend my time
link |
as a researcher and practitioner.
link |
But then in terms of the platform,
link |
I mean, we're super lucky now with stuff like Google GCP,
link |
Google Cloud, and AWS that you can access a GPU
link |
pretty quickly and easily.
link |
But I mean, for AWS, it's still too hard.
link |
Like you have to find an AMI and get the instance running
link |
and then install the software you want and blah, blah, blah.
link |
GCP is currently the best way to get started
link |
on a full server environment
link |
because they have a fantastic fast.ai and PyTorch ready
link |
to go instance, which has all the courses preinstalled.
link |
It has Jupyter Notebook already running.
link |
Jupyter Notebook is this wonderful
link |
interactive computing system,
link |
which everybody basically should be using
link |
for any kind of data driven research.
link |
But then even better than that,
link |
there are platforms like Salamander, which we own
link |
and Paperspace, where literally you click a single button
link |
and it pops up a Jupyter Notebook straight away
link |
without any kind of installation or anything.
link |
And all the course notebooks are all preinstalled.
link |
So like for me, this is one of the things
link |
we spent a lot of time kind of curating and working on.
link |
Because when we first started our courses,
link |
the biggest problem was people dropped out of lesson one
link |
because they couldn't get an AWS instance running.
link |
So things are so much better now.
link |
And like we actually have, if you go to course.fast.ai,
link |
the first thing it says is here's how to get started.
link |
And there's like, you just click on the link
link |
and you click start and you're going.
link |
I have to confess, I've never used the Google GCP.
link |
Yeah, GCP gives you $300 of compute for free,
link |
which is really nice.
link |
But as I say, Salamander and Paperspace
link |
are even easier still.
link |
So from the perspective of deep learning frameworks,
link |
you work with fast.ai, if we can call it a framework,
link |
and PyTorch and TensorFlow.
link |
What are the strengths of each platform in your perspective?
link |
So in terms of what we've done our research on
link |
and taught in our course,
link |
we started with Theano and Keras,
link |
and then we switched to TensorFlow and Keras,
link |
and then we switched to PyTorch,
link |
and then we switched to PyTorch and fast.ai.
link |
And that kind of reflects a growth and development
link |
of the ecosystem of deep learning libraries.
link |
Theano and TensorFlow were great,
link |
but were much harder to teach and to do research
link |
and development on because they define
link |
what's called a computational graph upfront,
link |
a static graph, where you basically have to say,
link |
here are all the things that I'm gonna eventually do
link |
in my model, and then later on you say,
link |
okay, do those things with this data.
link |
And you can't like debug them,
link |
you can't do them step by step,
link |
you can't program them interactively
link |
in a Jupyter notebook and so forth.
link |
PyTorch was not the first,
link |
but PyTorch was certainly the strongest entrant
link |
to come along and say, let's not do it that way,
link |
let's just use normal Python.
link |
And everything you know about in Python
link |
is just gonna work, and we'll figure out
link |
how to make that run on the GPU as and when necessary.
link |
That turned out to be a huge leap
link |
in terms of what we could do with our research
link |
and what we could do with our teaching.
link |
Because it wasn't limiting.
link |
Yeah, I mean, it was critical for us
link |
for something like DawnBench
link |
to be able to rapidly try things.
link |
It's just so much harder to be a researcher
link |
and practitioner when you have to do everything upfront
link |
and you can't inspect it.
link |
Problem with PyTorch is it's not at all accessible
link |
to newcomers because you have to like
link |
write your own training loop and manage the gradients
link |
and all this stuff.
link |
And it's also like not great for researchers
link |
because you're spending your time dealing
link |
with all this boilerplate and overhead
link |
rather than thinking about your algorithm.
link |
So we ended up writing this very multi layered API
link |
that at the top level, you can train
link |
a state of the art neural network
link |
in three lines of code.
link |
And which kind of talks to an API,
link |
which talks to an API, which talks to an API,
link |
which like you can dive into at any level
link |
and get progressively closer to the machine
link |
kind of levels of control.
link |
And this is the fast AI library.
link |
That's been critical for us and for our students
link |
and for lots of people that have won deep learning
link |
competitions with it and written academic papers with it.
link |
It's made a big difference.
link |
We're still limited though by Python.
link |
And particularly this problem with things like
link |
recurrent neural nets say where you just can't change things
link |
unless you accept it going so slowly that it's impractical.
link |
So in the latest incarnation of the course
link |
and with some of the research we're now starting to do,
link |
we're starting to do stuff, some stuff in Swift.
link |
I think we're three years away from that
link |
being super practical, but I'm in no hurry.
link |
I'm very happy to invest the time to get there.
link |
But with that, we actually already have a nascent version
link |
of the fast AI library for vision running
link |
on Swift and TensorFlow.
link |
Cause Python for TensorFlow is not gonna cut it.
link |
It's just a disaster.
link |
What they did was they tried to replicate
link |
the bits that people were saying they like about PyTorch,
link |
this kind of interactive computation,
link |
but they didn't actually change
link |
their foundational runtime components.
link |
So they kind of added this like syntax sugar
link |
they call TF Eager, TensorFlow Eager,
link |
which makes it look a lot like PyTorch,
link |
but it's 10 times slower than PyTorch
link |
to actually do a step.
link |
So because they didn't invest the time in like retooling
link |
the foundations, cause their code base is so horribly...
link |
Yeah, I think it's probably very difficult
link |
to do that kind of retooling.
link |
Yeah, well, particularly the way TensorFlow was written,
link |
it was written by a lot of people very quickly
link |
in a very disorganized way.
link |
So like when you actually look in the code,
link |
as I do often, I'm always just like,
link |
Oh God, what were they thinking?
link |
It's just, it's pretty awful.
link |
So I'm really extremely negative
link |
about the potential future for Python for TensorFlow.
link |
But Swift for TensorFlow can be a different beast altogether.
link |
It can be like, it can basically be a layer on top of MLIR
link |
that takes advantage of, you know,
link |
all the great compiler stuff that Swift builds on with LLVM
link |
and yeah, I think it will be absolutely fantastic.
link |
Well, you're inspiring me to try.
link |
I haven't truly felt the pain of TensorFlow 2.0 Python.
link |
It's fine by me, but of...
link |
Yeah, I mean, it does the job
link |
if you're using like predefined things
link |
that somebody has already written.
link |
But if you actually compare, you know,
link |
like I've had to do,
link |
cause I've been having to do a lot of stuff
link |
with TensorFlow recently,
link |
you actually compare like,
link |
okay, I want to write something from scratch
link |
and you're like, I just keep finding it's like,
link |
Oh, it's running 10 times slower than PyTorch.
link |
So is the biggest cost,
link |
if we throw running time out the window,
link |
how long it takes you to program?
link |
That's not too different now,
link |
thanks to TensorFlow Eager, that's not too different.
link |
But because so many things take so long to run,
link |
you wouldn't run it at 10 times slower.
link |
Like you just go like, Oh, this is taking too long.
link |
And also there's a lot of things
link |
which are just less programmable,
link |
like tf.data, which is the way data processing works
link |
in TensorFlow is just this big mess.
link |
It's incredibly inefficient.
link |
And they kind of had to write it that way
link |
because of the TPU problems I described earlier.
link |
So I just, you know,
link |
I just feel like they've got this huge technical debt,
link |
which they're not going to solve
link |
without starting from scratch.
link |
So here's an interesting question then,
link |
if there's a new student starting today,
link |
what would you recommend they use?
link |
Well, I mean, we obviously recommend Fastai and PyTorch
link |
because we teach new students and that's what we teach with.
link |
So we would very strongly recommend that
link |
because it will let you get on top of the concepts
link |
much more quickly.
link |
So then you'll become an actual,
link |
and you'll also learn the actual state
link |
of the art techniques, you know,
link |
so you actually get world class results.
link |
Honestly, it doesn't much matter what library you learn
link |
because switching from Chainer to MXNet
link |
to TensorFlow to PyTorch is gonna be a couple of days work
link |
as long as you understand the foundation as well.
link |
But you think will Swift creep in there
link |
as a thing that people start using?
link |
Not for a few years,
link |
particularly because like Swift has no data science
link |
community, libraries, tooling.
link |
And the Swift community has a total lack of appreciation
link |
and understanding of numeric computing.
link |
So like they keep on making stupid decisions, you know,
link |
for years, they've just done dumb things
link |
around performance and prioritization.
link |
That's clearly changing now
link |
because the developer of Swift, Chris Lattner,
link |
is working at Google on Swift for TensorFlow.
link |
So like that's a priority.
link |
It'll be interesting to see what happens with Apple
link |
because like Apple hasn't shown any sign of caring
link |
about numeric programming in Swift.
link |
So I mean, hopefully they'll get off their ass
link |
and start appreciating this
link |
because currently all of their low level libraries
link |
are not written in Swift.
link |
They're not particularly Swifty at all,
link |
stuff like CoreML, they're really pretty rubbish.
link |
So yeah, so there's a long way to go.
link |
But at least one nice thing is that Swift for TensorFlow
link |
can actually directly use Python code and Python libraries
link |
and literally the entire lesson one notebook of fast AI
link |
runs in Swift right now in Python mode.
link |
So that's a nice intermediate thing.
link |
How long does it take?
link |
If you look at the two fast AI courses,
link |
how long does it take to get from point zero
link |
to completing both courses?
link |
Somewhere between two months and two years generally.
link |
So for two months, how many hours a day on average?
link |
So like somebody who is a very competent coder
link |
can do it in 70 hours per course.
link |
70, seven zero, that's it, okay.
link |
But a lot of people I know take a year off
link |
to study fast AI full time and say at the end of the year,
link |
they feel pretty competent
link |
because generally there's a lot of other things you do
link |
like generally they'll be entering Kaggle competitions,
link |
they might be reading Ian Goodfellow's book,
link |
they might, they'll be doing a bunch of stuff
link |
and often particularly if they are a domain expert,
link |
their coding skills might be a little
link |
on the pedestrian side.
link |
So part of it's just like doing a lot more writing.
link |
What do you find is the bottleneck for people usually
link |
except getting started and setting stuff up?
link |
I would say coding.
link |
Yeah, I would say the best,
link |
the people who are strong coders pick it up the best.
link |
Although another bottleneck is people who have a lot
link |
of experience of classic statistics can really struggle
link |
because the intuition is so the opposite
link |
of what they're used to.
link |
They're very used to like trying to reduce the number
link |
of parameters in their model
link |
and looking at individual coefficients and stuff like that.
link |
So I find people who have a lot of coding background
link |
and know nothing about statistics
link |
are generally gonna be the best off.
link |
So you taught several courses on deep learning
link |
and as Feynman says,
link |
best way to understand something is to teach it.
link |
What have you learned about deep learning from teaching it?
link |
That's a key reason for me to teach the courses.
link |
I mean, obviously it's gonna be necessary
link |
to achieve our goal of getting domain experts
link |
to be familiar with deep learning,
link |
but it was also necessary for me to achieve my goal
link |
of being really familiar with deep learning.
link |
I mean, to see so many domain experts
link |
from so many different backgrounds,
link |
it's definitely, I wouldn't say taught me,
link |
but convinced me something that I liked to believe was true,
link |
which was anyone can do it.
link |
So there's a lot of kind of snobbishness out there
link |
about only certain people can learn to code.
link |
Only certain people are gonna be smart enough
link |
to do AI, that's definitely bullshit.
link |
I've seen so many people from so many different backgrounds
link |
get state of the art results in their domain areas now.
link |
It's definitely taught me that the key differentiator
link |
between people that succeed
link |
and people that fail is tenacity.
link |
That seems to be basically the only thing that matters.
link |
A lot of people give up.
link |
But of the ones who don't give up,
link |
pretty much everybody succeeds.
link |
Even if at first I'm just kind of like thinking like,
link |
wow, they really aren't quite getting it yet, are they?
link |
But eventually people get it and they succeed.
link |
So I think that's been,
link |
I think they're both things I liked to believe was true,
link |
but I don't feel like I really had strong evidence
link |
for them to be true,
link |
but now I can say I've seen it again and again.
link |
So what advice do you have
link |
for someone who wants to get started in deep learning?
link |
Train lots of models.
link |
That's how you learn it.
link |
So I think, it's not just me,
link |
I think our course is very good,
link |
but also lots of people independently
link |
have said it's very good.
link |
It recently won the CogX award for AI courses
link |
as being the best in the world.
link |
So I'd say come to our course, course.fast.ai.
link |
And the thing I keep on harping on in my lessons
link |
is train models, print out the inputs to the models,
link |
print out the outputs of the models,
link |
like study, change the inputs a bit,
link |
look at how the outputs vary,
link |
just run lots of experiments
link |
to get an intuitive understanding of what's going on.
link |
To get hooked, do you think, you mentioned training,
link |
do you think just running the models inference,
link |
like if we talk about getting started?
link |
No, you've got to fine tune the models.
link |
So that's the critical thing,
link |
because at that point you now have a model
link |
that's in your domain area.
link |
So there's no point running somebody else's model
link |
because it's not your model.
link |
So it only takes five minutes to fine tune a model
link |
for the data you care about.
link |
And in lesson two of the course,
link |
we teach you how to create your own data set from scratch
link |
by scripting Google image search.
link |
So, and we show you how to actually create
link |
a web application running online.
link |
So I create one in the course that differentiates
link |
between a teddy bear, a grizzly bear and a brown bear.
link |
And it does it with basically 100% accuracy,
link |
took me about four minutes to scrape the images
link |
from Google search in the script.
link |
There's a little graphical widgets we have in the notebook
link |
that help you clean up the data set.
link |
There's other widgets that help you study the results
link |
to see where the errors are happening.
link |
And so now we've got over a thousand replies
link |
in our share your work here thread
link |
of students saying, here's the thing I built.
link |
And so there's people who like,
link |
and a lot of them are state of the art.
link |
Like somebody said, oh, I tried looking
link |
at Devangari characters and I couldn't believe it.
link |
The thing that came out was more accurate
link |
than the best academic paper after lesson one.
link |
And then there's others which are just more kind of fun,
link |
like somebody who's doing Trinidad and Tobago hummingbirds.
link |
She said that's kind of their national bird
link |
and she's got something that can now classify Trinidad
link |
and Tobago hummingbirds.
link |
So yeah, train models, fine tune models with your data set
link |
and then study their inputs and outputs.
link |
How much are the fast.ai courses?
link |
Everything we do is free.
link |
We have no revenue sources of any kind.
link |
It's just a service to the community.
link |
Okay, once a person understands the basics,
link |
trains a bunch of models,
link |
if we look at the scale of years,
link |
what advice do you have for someone wanting
link |
to eventually become an expert?
link |
Train lots of models.
link |
But specifically train lots of models in your domain area.
link |
An expert at what, right?
link |
We don't need more experts
link |
creating slightly evolutionary research in areas
link |
that everybody's studying.
link |
We need experts at using deep learning
link |
to diagnose malaria.
link |
Or we need experts at using deep learning
link |
to analyze language to study media bias.
link |
So we need experts in analyzing fisheries
link |
to identify problem areas in the ocean.
link |
That's what we need.
link |
So become the expert in your passion area.
link |
And this is a tool which you can use for just about anything
link |
and you'll be able to do that thing better
link |
than other people, particularly by combining it
link |
with your passion and domain expertise.
link |
So that's really interesting.
link |
Even if you do wanna innovate on transfer learning
link |
or active learning, your thought is,
link |
I mean, it's one I certainly share,
link |
is you also need to find a domain or data set
link |
that you actually really care for.
link |
If you're not working on a real problem that you understand,
link |
how do you know if you're doing it any good?
link |
How do you know if your results are good?
link |
How do you know if you're getting bad results?
link |
Why are you getting bad results?
link |
Is it a problem with the data?
link |
Like, how do you know you're doing anything useful?
link |
Yeah, to me, the only really interesting research is,
link |
not the only, but the vast majority
link |
of interesting research is like,
link |
try and solve an actual problem and solve it really well.
link |
So both understanding sufficient tools
link |
on the deep learning side and becoming a domain expert
link |
in a particular domain are really things
link |
within reach for anybody.
link |
Yeah, I mean, to me, I would compare it
link |
to like studying self driving cars,
link |
having never looked at a car or been in a car
link |
or turned a car on, which is like the way it is
link |
for a lot of people, they'll study some academic data set
link |
where they literally have no idea about that.
link |
By the way, I'm not sure how familiar you are
link |
with autonomous vehicles, but that is literally,
link |
you describe a large percentage of robotics folks
link |
working in self driving cars is they actually
link |
haven't considered driving.
link |
They haven't actually looked at what driving looks like.
link |
They haven't driven.
link |
And it's a problem because you know,
link |
when you've actually driven, you know,
link |
like these are the things that happened
link |
to me when I was driving.
link |
There's nothing that beats the real world examples
link |
of just experiencing them.
link |
You've created many successful startups.
link |
What does it take to create a successful startup?
link |
Same thing as becoming a successful
link |
deep learning practitioner, which is not giving up.
link |
So you can run out of money or run out of time
link |
or run out of something, you know,
link |
but if you keep costs super low
link |
and try and save up some money beforehand
link |
so you can afford to have some time,
link |
then just sticking with it is one important thing.
link |
Doing something you understand and care about is important.
link |
By something, I don't mean,
link |
the biggest problem I see with deep learning people
link |
is they do a PhD in deep learning
link |
and then they try and commercialize their PhD.
link |
It is a waste of time
link |
because that doesn't solve an actual problem.
link |
You picked your PhD topic
link |
because it was an interesting kind of engineering
link |
or math or research exercise.
link |
But yeah, if you've actually spent time as a recruiter
link |
and you know that most of your time was spent
link |
sifting through resumes
link |
and you know that most of the time
link |
you're just looking for certain kinds of things
link |
and you can try doing that with a model for a few minutes
link |
and see whether that's something which a model
link |
seems to be able to do as well as you could,
link |
then you're on the right track to creating a startup.
link |
And then I think just, yeah, being, just be pragmatic and
link |
try and stay away from venture capital money
link |
as long as possible, preferably forever.
link |
So yeah, on that point of venture capital,
link |
did you, were you able to successfully run startups
link |
self funded for quite a while?
link |
Yeah, so my first two were self funded
link |
and that was the right way to do it.
link |
No, VC startups are much more scary
link |
because you have these people on your back
link |
who do this all the time and who have done it for years
link |
telling you grow, grow, grow, grow.
link |
And they don't care if you fail.
link |
They only care if you don't grow fast enough.
link |
Whereas doing the ones myself, well, with partners
link |
who were friends was nice
link |
because like we just went along at a pace that made sense
link |
and we were able to build it to something
link |
which was big enough that we never had to work again
link |
but was not big enough that any VC
link |
would think it was impressive.
link |
And that was enough for us to be excited, you know?
link |
So I thought that's a much better way
link |
to do things than most people.
link |
Generally speaking, not for yourself
link |
but how do you make money during that process?
link |
Do you cut into savings?
link |
So yeah, so for, so I started Fast Mail
link |
and Optimal Decisions at the same time in 1999
link |
with two different friends.
link |
And for Fast Mail, I guess I spent $70 a month
link |
And when the server ran out of space
link |
I put a payments button on the front page
link |
and said, if you want more than 10 megs of space
link |
you have to pay $10 a year.
link |
So run lean, like keep your costs down.
link |
Yeah, so I kept my costs down.
link |
And once, you know, once I needed to spend more money
link |
I asked people to spend the money for me.
link |
And that, that was that.
link |
Basically from then on, we were making money
link |
and I was profitable from then.
link |
For Optimal Decisions, it was a bit harder
link |
because we were trying to sell something
link |
that was more like a $1 million sale.
link |
But what we did was we would sell scoping projects.
link |
So kind of like prototypy projects
link |
but rather than doing it for free
link |
we would sell them for $50,000 to $100,000.
link |
So again, we were covering our costs
link |
and also making the client feel
link |
like we were doing something valuable.
link |
So in both cases, we were profitable from six months in.
link |
Ah, nevertheless, it's scary.
link |
I mean, yeah, sure.
link |
I mean, it's, it's scary before you jump in
link |
and I just, I guess I was comparing it
link |
to the scarediness of VC.
link |
I felt like with VC stuff, it was more scary.
link |
Kind of much more in somebody else's hands,
link |
will they fund you or not?
link |
And what do they think of what you're doing?
link |
I also found it very difficult with VCs,
link |
back startups to actually do the thing
link |
which I thought was important for the company
link |
rather than doing the thing
link |
which I thought would make the VC happy.
link |
And VCs always tell you not to do the thing
link |
that makes them happy.
link |
But then if you don't do the thing that makes them happy...
link |
And do you think optimizing for the,
link |
whatever they call it, the exit, is a good thing?
link |
I mean, it can be, but not at the VC level
link |
because the VC exit needs to be, you know, a thousand X.
link |
Whereas the lifestyle exit,
link |
if you can sell something for $10 million,
link |
then you've made it, right?
link |
So I don't, it depends.
link |
If you want to build something that's gonna,
link |
you're kind of happy to do forever, then fine.
link |
If you want to build something you want to sell
link |
in three years time, that's fine too.
link |
I mean, they're both perfectly good outcomes.
link |
So you're learning Swift now, in a way.
link |
I mean, you've already.
link |
And I read that you use, at least in some cases,
link |
spaced repetition as a mechanism for learning new things.
link |
I use Anki quite a lot myself.
link |
I actually never talk to anybody about it.
link |
Don't know how many people do it,
link |
but it works incredibly well for me.
link |
Can you talk to your experience?
link |
Like how did you, what do you?
link |
First of all, okay, let's back it up.
link |
What is spaced repetition?
link |
So spaced repetition is an idea created
link |
by a psychologist named Ebbinghaus.
link |
I don't know, must be a couple of hundred years ago
link |
or something, 150 years ago.
link |
He did something which sounds pretty damn tedious.
link |
He wrote down random sequences of letters on cards
link |
and tested how well he would remember
link |
those random sequences a day later, a week later, whatever.
link |
He discovered that there was this kind of a curve
link |
where his probability of remembering one of them
link |
would be dramatically smaller the next day
link |
and then a little bit smaller the next day
link |
and a little bit smaller the next day.
link |
What he discovered is that if he revised those cards
link |
after a day, the probabilities would decrease
link |
at a smaller rate.
link |
And then if you revise them again a week later,
link |
they would decrease at a smaller rate again.
link |
And so he basically figured out a roughly optimal equation
link |
for when you should revise something you wanna remember.
link |
So spaced repetition learning is using this simple algorithm,
link |
just something like revise something after a day
link |
and then three days and then a week and then three weeks, and so on.
link |
And so if you use a program like Anki, as you know,
link |
it will just do that for you.
link |
And it will say, did you remember this?
link |
And if you say no, it will reschedule it back
link |
to appear again like 10 times faster
link |
than it otherwise would have.
link |
It's a kind of a way of being guaranteed to learn something
link |
because by definition, if you're not learning it,
link |
it will be rescheduled to be revised more quickly.
link |
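The scheme Jeremy describes — intervals that grow after each successful recall, and a card that comes back roughly ten times sooner after a lapse — can be sketched as a toy scheduler. This is an illustrative sketch only, not Anki's actual algorithm (Anki uses a variant of SuperMemo's SM-2); the tripling factor and the 10x penalty are just example numbers matching the conversation.

```python
# Toy spaced-repetition scheduler: a sketch of the idea described above,
# not Anki's real SM-2-derived implementation.

def next_interval(current_days: float, remembered: bool) -> float:
    """Return the number of days until a card's next review."""
    if remembered:
        # Each successful recall widens the gap:
        # 1 day -> 3 days -> 9 days -> 27 days ...
        return current_days * 3
    # A lapse brings the card back ~10x sooner, never less than a day.
    return max(1.0, current_days / 10)

def simulate(results):
    """Walk one card through a sequence of review outcomes."""
    interval = 1.0
    history = []
    for remembered in results:
        interval = next_interval(interval, remembered)
        history.append(interval)
    return history

print(simulate([True, True, True]))   # [3.0, 9.0, 27.0]
print(simulate([True, True, False]))  # [3.0, 9.0, 1.0] - lapse resets the gap
```

This captures the "guaranteed to learn it" property he mentions: a card you keep failing keeps returning at short intervals until you finally retain it.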
Unfortunately though, it's also like,
link |
it doesn't let you fool yourself.
link |
If you're not learning something,
link |
you know, your revisions will just pile up more and more.
link |
So you have to find ways to learn things productively
link |
and effectively like treat your brain well.
link |
So using like mnemonics and stories and context
link |
and stuff like that.
link |
So yeah, it's a super great technique.
link |
It's like learning how to learn is something
link |
which everybody should learn
link |
before they actually learn anything.
link |
But almost nobody does.
link |
So what have you, so it certainly works well
link |
for learning new languages for, I mean,
link |
for learning like small projects almost.
link |
But do you, you know, I started using it for,
link |
I forget who wrote a blog post about this inspired me.
link |
It might've been you, I'm not sure.
link |
I started when I read papers,
link |
I'll take concepts and ideas, I'll put them in.
link |
Was it Michael Nielsen?
link |
It was Michael Nielsen.
link |
So Michael started doing this recently
link |
and has been writing about it.
link |
So the kind of today's Ebbinghaus
link |
is a guy called Piotr Wozniak
link |
who developed a system called SuperMemo.
link |
And he's been basically trying to become like
link |
the world's greatest Renaissance man
link |
over the last few decades.
link |
He's basically lived his life
link |
with space repetition learning for everything.
link |
And sort of like,
link |
Michael's only very recently got into this,
link |
but he started really getting excited
link |
about doing it for a lot of different things.
link |
For me personally, I actually don't use it
link |
for anything except Chinese.
link |
And the reason for that is that
link |
Chinese is specifically a thing I made a conscious decision
link |
that I want to continue to remember,
link |
even if I don't get much of a chance to exercise it,
link |
cause like I'm not often in China, so I don't get to use it.
link |
Whereas for something like programming languages or papers,
link |
I have a very different approach,
link |
which is I try not to learn anything from them,
link |
but instead I try to identify the important concepts
link |
and like actually ingest them.
link |
So like really understand that concept deeply
link |
and study it carefully.
link |
I will decide if it really is important,
link |
and if it is, incorporate it into our library,
link |
you know, incorporate it into how I do things,
link |
or decide it's not worth it, say.
link |
So I find, I find I then remember the things
link |
that I care about because I'm using it all the time.
link |
So I've, for the last 25 years,
link |
I've committed to spending at least half of every day
link |
learning or practicing something new,
link |
which all my colleagues have always hated
link |
because it always looks like I'm not working on
link |
what I'm meant to be working on,
link |
but it always means I do everything faster
link |
because I've been practicing a lot of stuff.
link |
So I kind of give myself a lot of opportunity
link |
to practice new things.
link |
And so I find now I don't,
link |
yeah, I don't often kind of find myself
link |
wishing I could remember something
link |
because if it's something that's useful,
link |
then I've been using it a lot.
link |
It's easy enough to look it up on Google,
link |
but speaking Chinese, you can't look it up on Google.
link |
Do you have advice for people learning new things?
link |
So if you, what have you learned as a process as a,
link |
I mean, it all starts with just making the hours
link |
and the day available.
link |
Yeah, you got to stick with it,
link |
which is again, the number one thing
link |
that 99% of people don't do.
link |
So the people I started learning Chinese with,
link |
none of them were still doing it 12 months later.
link |
I'm still doing it 10 years later.
link |
I tried to stay in touch with them,
link |
but they just, no one did it.
link |
For something like Chinese,
link |
like study how human learning works.
link |
So every one of my Chinese flashcards
link |
is associated with a story.
link |
And that story is specifically designed to be memorable.
link |
And we find things memorable,
link |
which are like funny or disgusting or sexy
link |
or related to people that we know or care about.
link |
So I try to make sure all of the stories
link |
that are in my head have those characteristics.
link |
Yeah, so you have to, you know,
link |
you won't remember things well
link |
if they don't have some context.
link |
And yeah, you won't remember them well
link |
if you don't regularly practice them,
link |
whether it be just part of your day to day life
link |
or the Chinese Anki flashcards.
link |
I mean, the other thing is,
link |
let yourself fail sometimes.
link |
So like I've had various medical problems
link |
over the last few years.
link |
And basically my flashcards
link |
just stopped for about three years.
link |
And there've been other times I've stopped for a few months
link |
and it's so hard because you get back to it
link |
and it's like, you have 18,000 cards due.
link |
It's like, and so you just have to go, all right,
link |
well, I can either stop and give up everything
link |
or just decide to do this every day for the next two years
link |
until I get back to it.
link |
The amazing thing has been that even after three years,
link |
I, you know, the Chinese were still in there.
link |
Like it was so much faster to relearn
link |
than it was to learn the first time.
link |
I have the same with guitar, with music and so on.
link |
It's sad because the work sometimes takes away
link |
and then you won't play for a year.
link |
But really, if you then just get back to it every day,
link |
you're right there again.
link |
What do you think is the next big breakthrough
link |
in artificial intelligence?
link |
What are your hopes in deep learning or beyond
link |
that people should be working on
link |
or you hope there'll be breakthroughs?
link |
I don't think it's possible to predict.
link |
I think what we already have
link |
is an incredibly powerful platform
link |
to solve lots of societally important problems
link |
that are currently unsolved.
link |
So I just hope that people will,
link |
lots of people will learn this toolkit and try to use it.
link |
I don't think we need a lot of new technological breakthroughs
link |
to do a lot of great work right now.
link |
And when do you think we're going to create
link |
a human level intelligence system?
link |
How far away are we?
link |
I have no way to know.
link |
I don't know why people make predictions about this
link |
because there's no data and nothing to go on.
link |
And it's just like,
link |
there's so many societally important problems
link |
to solve right now.
link |
I just don't find it a really interesting question.
link |
So in terms of societally important problems,
link |
what's the problem that is within reach?
link |
Well, I mean, for example,
link |
there are problems that AI creates, right?
link |
So more specifically,
link |
labor force displacement is going to be huge
link |
and people keep making this
link |
frivolous econometric argument of being like,
link |
oh, there's been other things that aren't AI
link |
that have come along before
link |
and haven't created massive labor force displacement,
link |
therefore AI won't.
link |
So that's a serious concern for you?
link |
Andrew Yang is running on it.
link |
Yeah, it's, I'm desperately concerned.
link |
And you see already that the changing workplace
link |
has led to a hollowing out of the middle class.
link |
You're seeing that students coming out of school today
link |
have a less rosy financial future ahead of them
link |
than their parents did,
link |
which has never happened in recent,
link |
in the last few hundred years.
link |
You know, we've always had progress before.
link |
And you see this turning into anxiety
link |
and despair and even violence.
link |
So I very much worry about that.
link |
You've written quite a bit about ethics too.
link |
I do think that every data scientist
link |
working with deep learning needs to recognize
link |
they have an incredibly high leverage tool
link |
that they're using that can influence society
link |
And if they're doing research,
link |
that that research is gonna be used by people
link |
doing this kind of work.
link |
And they have a responsibility to consider the consequences
link |
and to think about things like
link |
how will humans be in the loop here?
link |
How do we avoid runaway feedback loops?
link |
How do we ensure an appeals process for humans
link |
that are impacted by my algorithm?
link |
How do I ensure that the constraints of my algorithm
link |
are adequately explained to the people
link |
that end up using them?
link |
There's all kinds of human issues
link |
which only data scientists are actually
link |
in the right place to educate people about,
link |
but data scientists tend to think of themselves
link |
as just engineers and that they don't need
link |
to be part of that process, which is wrong.
link |
Well, you're in the perfect position to educate them better,
link |
to read literature, to read history, to learn from history.
link |
Well, Jeremy, thank you so much for everything you do
link |
for inspiring huge amount of people,
link |
getting them into deep learning
link |
and having the ripple effects,
link |
the flap of a butterfly's wings
link |
that will probably change the world.
link |
So thank you very much.
link |
Thank you, thank you, thank you, thank you.