Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
The following is a conversation with Yann LeCun. He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that has captivated the world with the possibility of what machines can learn from data. He's a professor at New York University, a vice president and chief AI scientist at Facebook, and co-recipient of the Turing Award for his work on deep learning. He's probably best known as the founding father of convolutional neural networks, in particular their application to optical character recognition and the famed MNIST dataset. He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent and explore provocative ideas, both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N.

And now, here's my conversation with Yann LeCun.
You said that 2001: A Space Odyssey is one of your favorite movies. HAL 9000 decides to get rid of the astronauts, for people who haven't seen the movie, spoiler alert, because he, it, she believes that they will interfere with the mission. Do you see HAL as flawed in some fundamental way, or even evil, or did he do the right thing?

There's no notion of evil in that context, other than the fact that people die, but it was an example of what people call value misalignment, right? You give an objective to a machine, and the machine strives to achieve this objective. And if you don't put any constraints on this objective, like don't kill people and don't do things like this, the machine, given the power, will do stupid things just to achieve this objective, or damaging things to achieve this objective. It's a little bit like, I mean, we're used to this in the context of human society. We put in place laws to prevent people from doing bad things, because spontaneously they would do those bad things, right? So we have to shape their cost function, their objective function if you want, through laws, and through education, obviously, to sort of correct for those.
So maybe just pushing a little further on that point: there's a mission, there's this fuzziness around, the ambiguity around, what the actual mission is. But do you think there will be a time, from a utilitarian perspective, where an AI system, where it is not misalignment but alignment, for the greater good of society, that an AI system will make decisions that are difficult?

Well, that's the trick. I mean, eventually we'll have to figure out how to do this. And again, we're not starting from scratch, because we've been doing this with humans for millennia. So designing objective functions for people is something that we know how to do. And we don't do it by, you know, programming things, although the legal code is called code. So that tells you something. And it's actually the design of an objective function. That's really what legal code is, right? It tells you, here is what you can do, here is what you can't do. If you do it, you pay that much. That's an objective function. So there is this idea somehow that it's a new thing for people to try to design objective functions that are aligned with the common good. But no, we've been writing laws for millennia, and that's exactly what it is. So that's where, you know, the science of lawmaking and computer science will...

Will come together.
So there's nothing special about HAL or AI systems; it's just a continuation of the tools used to make some of these difficult ethical judgments?

Yeah, and we have systems like this already that make many decisions for ourselves in society, that need to be designed in a way that they, like rules about things that sometimes have bad side effects, and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied.

So you don't see this on the camera here, but all the decoration in this room is all pictures from 2001: A Space Odyssey.

Wow, is that by accident, or is there...

No, it's not by accident, it's by design.
So if you were to build HAL 10,000, an improvement of HAL 9,000, what would you improve?

Well, first of all, I wouldn't ask it to hold secrets and tell lies, because that's really what breaks it in the end. It's the fact that it's asking itself questions about the purpose of the mission, and it, you know, pieces together things that it's heard, you know, all the secrecy of the preparation of the mission, and the fact that it was the discovery on the lunar surface that really was kept secret. One part of HAL's memory knows this, and the other part does not know it and is supposed to not tell anyone, and that creates internal conflict.
So you think there should never be a set of things that an AI system should not be allowed to share, like a set of facts that should not be shared with the human operators?

Well, I think, no, I think it should be a bit like, in the design of autonomous AI systems, there should be the equivalent of, you know, the Hippocratic oath that doctors sign up to, right? So there are certain things, certain rules that you have to abide by, and we can sort of hardwire this into our machines to kind of make sure they don't go... So I'm not, you know, an advocate of the three laws of robotics, you know, the Asimov kind of thing, because I don't think it's practical, but, you know, some level of limits. But to be clear, these are not questions that are really worth asking today, because we just don't have the technology to do this. We don't have autonomous intelligent machines; we have intelligent machines. Some are intelligent machines that are very specialized, but they don't really sort of satisfy an objective; they're just, you know, kind of trained to do one thing. So until we have some idea for the design of a full-fledged autonomous intelligent system, asking the question of how we design its objective, I think, is a little too abstract.
It's a little too abstract. There are useful elements to it, in that it helps us understand our own ethical codes as humans. So even just as a thought experiment: if you imagine that an AGI system is here today, how would we program it? It's a kind of nice thought experiment for constructing how we should have a system of laws for us humans. It's just a nice practical tool. And I think there are echoes of that idea too in the AI systems we have today that don't have to be that intelligent, like autonomous vehicles. These things start creeping in that are worth thinking about, but certainly they shouldn't be framed as HAL.

Looking back, what is the most, I'm sorry if it's a silly question, but what is the most beautiful or surprising idea in deep learning or AI in general that you've ever come across? Sort of personally, when you sat back and just had this kind of, oh, that's pretty cool moment. That's surprising.
I don't know if it's an idea rather than a sort of empirical fact. The fact that you can build gigantic neural nets, train them on relatively small amounts of data with stochastic gradient descent, and that it actually works, breaks everything you read in every textbook, right? Every pre-deep-learning textbook told you you need to have fewer parameters than you have data samples; that if you have a non-convex objective function, you have no guarantee of convergence. All those things that you read in textbooks, they tell you to stay away from this, and they're all wrong. A huge number of parameters, a non-convex objective, an amount of data that is small relative to the number of parameters, and somehow it's able to learn anything.

Does that still surprise you today?

Well, it was kind of obvious to me, before I knew anything, that this is a good idea. And then it became surprising that it worked, because I started reading those textbooks.
So can you talk through the intuition of why it was obvious to you, if you remember?

So the intuition was, it's sort of like those people in the late 19th century who proved that heavier-than-air flight was impossible. And of course you have birds, right? And so on the face of it, it's obviously wrong as an empirical question, right? And so we have the same kind of thing: we know that the brain works. We don't know how, but we know it works. And we know it's a large network of neurons in interaction, and that learning takes place by changing the connections. So kind of getting this level of inspiration without copying the details, but sort of trying to derive basic principles, that kind of gives you a clue as to which direction to go. There's also the idea, somehow, that I've been convinced of since I was an undergrad, even before, that intelligence is inseparable from learning. So the idea that you can create an intelligent machine by basically programming, for me it was a non-starter from the start. Every intelligent entity that we know about arrives at this intelligence through learning. So machine learning was a completely obvious path. Also because I'm lazy, so, you know, the idea is to automate basically everything, and learning is the automation of intelligence.
So what is learning then? What falls under learning? Because do you think of reasoning as learning?

Well, reasoning is certainly a consequence of learning as well, just like other functions of the brain. The big question about reasoning is, how do you make reasoning compatible with gradient-based learning?

Do you think neural networks can be made to reason?

Yes, there is no question about that. Again, we have a good example, right? The question is how. So the question is, how much prior structure do you have to put in the neural net so that something like human reasoning will emerge from it, you know, from learning? Another question is, all of our models of what reasoning is that are based on logic are discrete, and are therefore incompatible with gradient-based learning. And I'm a very strong believer in this idea of gradient-based learning. I don't believe in other types of learning that don't use kind of gradient information, if you want.

So you don't like discrete mathematics? You don't like anything discrete?

Well, it's not that I don't like it, it's just that it's incompatible with learning, and I'm a big fan of learning, right? So in fact, that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists, because the math is very different. The math that you use for deep learning, you know, kind of has more to do with cybernetics, the kind of math you do in electrical engineering, than the kind of math you do in computer science. And, you know, nothing in machine learning is exact, right? Computer science is all about sort of, you know, obviously compulsive attention to details, like, you know, every index has to be right, and you can prove that an algorithm is correct, right? Machine learning is the science of sloppiness, really.
So, okay, maybe let's feel around in the dark of what is a neural network that reasons, or a system that works with continuous functions that's able to build knowledge, however we think about reasoning: build on previous knowledge, build on extra knowledge, create new knowledge, generalize outside of any training set ever built. What does that look like? Maybe give inklings of thoughts of what that might look like.

Yeah, I mean, yes and no. If I had precise ideas about this, I think, you know, we'd be building it right now. And there are people working on this whose main research interest is actually exactly that, right? So what you need to have is a working memory. So you need to have some device, if you want, some subsystem, that can store a relatively large number of factual, episodic pieces of information for, you know, a reasonable amount of time. So, you know, in the brain, for example, there are kind of three main types of memory. One is the sort of memory of the state of your cortex, and that sort of disappears within 20 seconds. You can't remember things for more than about 20 seconds or a minute if you don't have any other form of memory. The second type of memory, which is longer-term but still short-term, is the hippocampus. So, you know, you came into this building, you remember where the exit is, where the elevators are. You have some map of that building that's stored in your hippocampus. You might remember something about what I said, you know, a few minutes ago.

I forgot it all already.

Of course, it's been erased, but, you know, that would be in your hippocampus. And then the longer-term memory is in the synapses, right? So what you need, if you want a system that's capable of reasoning, is a hippocampus-like thing, right? And that's what people have tried to do with memory networks and, you know, neural Turing machines and stuff like that, right? And now with transformers, which have sort of a memory in their kind of self-attention system; you can think of it this way. So that's one element you need. Another thing you need is some sort of network that can access this memory, get an information back, and then kind of crunch on it, and then do this iteratively multiple times, because a chain of reasoning is a process by which you update your knowledge about the state of the world, about, you know, what's going to happen, et cetera. And that has to be this sort of recurrent operation, basically.
And you think that kind of, if we think about a transformer, that seems to be too small to contain the knowledge, to represent the knowledge that's contained in Wikipedia, for example.

Well, a transformer doesn't have this idea of recurrence. It's got a fixed number of layers, and that's the number of steps that, you know, limits basically its representation.

But recurrence would build on the knowledge somehow. I mean, it would evolve the knowledge and expand the amount of information, perhaps, or the useful information within that knowledge. But is this something that just can emerge with size? Because it seems like everything we have now is too small.

No, it's not clear. I mean, how do you access and write into an associative memory in an efficient way? I mean, sort of the original memory network maybe had something like the right architecture, but if you try to scale up a memory network so that the memory contains all of Wikipedia, it doesn't quite work. So there's a need for new ideas there, okay.
But it's not the only form of reasoning. So there's another form of reasoning, which is very classical, also, in some types of AI, and it's based on, let's call it energy minimization. Okay, so you have some sort of objective, some energy function, that represents the quality, or the negative quality, okay: energy goes up when things get bad, and it goes down when things get good. So let's say you want to figure out what gestures you need to do to grab an object or walk out the door. If you have a good model of your own body, a good model of the environment, using this kind of energy minimization, you can do planning. In optimal control, it's called model predictive control. You have a model of what's gonna happen in the world as a consequence of your actions, and that allows you to figure out, by energy minimization, the sequence of actions that optimizes a particular objective function, which measures, say, the number of times you're gonna hit something and the energy you're gonna spend doing the gesture, et cetera. So that's a form of reasoning. Planning is a form of reasoning.
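As a concrete, hedged illustration of that planning-as-energy-minimization idea, here is a minimal gradient-based model predictive control sketch in Python. Everything in it is illustrative: predict_next_state is a trivial stand-in for a learned world model, and the energy just penalizes distance to a goal plus action effort.

```python
# A minimal sketch of planning by energy minimization (model predictive control).
# The world model and energy below are illustrative stand-ins, not a real system.
import torch

def predict_next_state(state, action):
    # Stand-in for a learned model of "what's gonna happen as a consequence
    # of your actions"; a real one would be a neural net.
    return state + action

def energy(states, actions, goal):
    # Energy goes up when things get bad: far from the goal, or wasteful actions.
    return ((states[-1] - goal) ** 2).sum() + 0.01 * (actions ** 2).sum()

def plan(initial_state, goal, horizon=10, steps=200, lr=0.1):
    # Optimize the whole action sequence by gradient descent on the energy.
    actions = torch.zeros(horizon, initial_state.shape[0], requires_grad=True)
    optimizer = torch.optim.SGD([actions], lr=lr)
    for _ in range(steps):
        state, states = initial_state, []
        for t in range(horizon):
            state = predict_next_state(state, actions[t])  # roll the model forward
            states.append(state)
        loss = energy(states, actions, goal)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return actions.detach()

print(plan(torch.zeros(2), torch.tensor([1.0, -1.0]))[:3])  # first planned actions
```

With a learned, differentiable world model in place of the stand-in, the same loop picks the sequence of actions that minimizes the objective.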
And perhaps what led to the ability of humans to reason is the fact that species that appeared before us had to do some sort of planning to be able to hunt and survive, and survive the winter in particular. And so it's the same capacity that you need to have.

So in your intuition, if we look at expert systems, and encoding knowledge as logic systems, as graphs, in this kind of way, is that not a useful way to think about knowledge?
Graphs are a little brittle, or logic representations. So basically, variables that have values, and then constraints between them that are represented by rules, is a little too rigid and too brittle, right? So some of the early efforts in that respect were to put probabilities on them. So a rule: if you have this and that symptom, you have this disease with that probability, and you should prescribe that antibiotic with that probability, right? That's the MYCIN system from the 70s. And that's what that branch of AI led to: Bayesian networks and graphical models and causal inference and variational methods. So there is certainly a lot of interesting work going on in this area. The main issue with this is knowledge acquisition. How do you reduce a bunch of data to a graph of this type?

Yeah, it relies on the expert, on the human being, to encode, to add knowledge. And that's essentially impractical.
Yeah, it's not scalable. That's a big question. The second question is, do you want to represent knowledge as symbols, and do you want to manipulate them with logic? And again, that's incompatible with learning. So one suggestion, which Geoff Hinton has been advocating for many decades, is replace symbols by vectors. Think of it as patterns of activities in a bunch of neurons or units, or whatever you want to call them, and replace logic by continuous functions. Okay, and that becomes now compatible. There's a very good set of ideas written in a paper about 10 years ago by Léon Bottou, who is here at Facebook. The title of the paper is From Machine Learning to Machine Reasoning, and his idea is that a learning system should be able to manipulate objects that are in a space and then put the result back in the same space. So it's this idea of working memory, basically. And it's very enlightening.

And in a sense, that might learn something like the simple expert systems. I mean, you can learn basic logic operations there.

Yeah, quite possibly. There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge. That's the debate I have with Gary Marcus and people like that.
Yeah, so, and the other person, so I just talked to Judea Pearl, who you mentioned, from the causal inference world. So his worry is that the current neural networks are not able to learn what causes what, causal inference between things.

So I think he's right and wrong about this. If he's talking about the sort of classic type of neural nets, people sort of didn't worry too much about this. But there's a lot of people now working on causal inference, and there's a paper that just came out last week by Léon Bottou, among others, David Lopez-Paz, and a bunch of other people, exactly on that problem of how do you kind of get a neural net to sort of pay attention to real causal relationships, which may also solve issues of bias in data and things like this.

I'd like to read that paper, because ultimately the challenge also seems to fall back on the human expert to decide causality between things.

People are not very good at establishing causality, first of all. So first of all, you talk to physicists, and physicists actually don't believe in causality, because all the basic laws of microphysics are time-reversible, so there's no causality.
The arrow of time is not real, yeah.

It's as soon as you start looking at macroscopic systems, where there is unpredictable randomness, where there is clearly an arrow of time. But it's a big mystery in physics, actually: is it emergent, or is it part of the fundamental fabric of reality? Or is it a bias of intelligent systems, that because of the second law of thermodynamics we perceive a particular arrow of time, but in fact it's kind of arbitrary, right?

So yeah, physicists, mathematicians, they don't care about, I mean, the math doesn't care about the flow of time.

Well, certainly, microphysics doesn't. People themselves are not very good at establishing causal relationships. I think it was in one of Seymour Papert's books on children's learning. He studied with Jean Piaget. He's the guy who coauthored the book Perceptrons with Marvin Minsky, that kind of killed the first wave of neural nets, but he was actually a learning person, in the sense of studying learning in humans and machines. That's why he got interested in perceptrons.
And he wrote that if you ask a little kid about what is the cause of the wind, a lot of kids will think for a while and say, oh, it's the branches in the trees: they move, and that creates wind, right? So they get the causal relationship backwards. And it's because their understanding of the world and intuitive physics is not that great, right? I mean, these are like, you know, four- or five-year-old kids. You know, it gets better, and then you understand that it can't be, right? But there are many things where, because of our common-sense understanding of things, what people call common sense, and our understanding of physics, there's a lot of stuff for which we can figure out causality. Even with diseases, we can often figure out what's not causing what.

There's a lot of mystery, of course, but the idea is that you should be able to encode that into systems, because it seems unlikely they'd be able to figure that out themselves.

Well, whenever we can do intervention. But, you know, all of humanity has been completely deluded for millennia, probably since its existence, about a very, very wrong causal relationship, where whatever you can't explain, you attribute to, you know, some deity, some divinity, right? And that's a cop-out, that's a way of saying, like, I don't know the cause, so, you know, God did it, right?
So you mentioned Marvin Minsky, and the irony of, you know, maybe causing the first AI winter. You were there in the 90s, you were there in the 80s. In the 90s, why do you think people lost faith in deep learning, and found it again over a decade later?

Yeah, it wasn't called deep learning yet, it was just called neural nets, but yeah, they lost interest. I mean, I think I would put that around 1995, at least in the machine learning community. There was always a neural net community, but it became kind of disconnected from sort of mainstream machine learning, if you want. It was basically electrical engineering that kept at it, and computer science gave up on neural nets. I don't know, you know, I was too close to it to really sort of analyze it with an unbiased eye, if you want, but I would make a few guesses.
So the first one is, at the time, it was very hard to make neural nets work, in the sense that you would implement backprop in your favorite language, and that favorite language was not Python, it was not MATLAB, it was not any of those things, because they didn't exist, right? You had to write it in Fortran or C, or something like this. So you would experiment with it, you would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in the textbook, you know, that you don't want too many parameters, right? And of course, you know, you would train on XOR, because you didn't have any other dataset to train on. And of course, you know, it works half the time. So you would say, I give up. Also, you would train it with batch gradient, which, you know, isn't that efficient. So there's a bag of tricks that you had to know to make those things work, or you had to reinvent, and a lot of people just didn't, and they just couldn't make it work. So that's one thing. The other is the investment in a software platform to be able to, you know, display things, figure out why things don't work, get a good intuition for how to get them to work, have enough flexibility so you can create, you know, network architectures like convolutional nets and stuff like that. I mean, you had to write everything from scratch. And again, you didn't have any Python or MATLAB or anything, right?
Sorry to interrupt, but I read that you wrote the first versions of LeNet, the convolutional networks, in Lisp, which, by the way, is one of my favorite languages. That's how I knew you were legit. Turing Award, whatever; you programmed in Lisp, that's...

It's still my favorite language. But it's not just that we programmed in Lisp, it's that we had to write our own Lisp interpreter, okay? Because it's not like we used one that existed. So we wrote a Lisp interpreter that we hooked up to, you know, a backend library that we also wrote for sort of neural net computation. And then after a few years, around 1991, we invented this idea of basically having modules that know how to forward propagate and backpropagate gradients, and then interconnecting those modules in a graph. Léon Bottou had made proposals about this in the late eighties, and we were able to implement this using our Lisp system.
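To make that module abstraction concrete, here is a toy Python reconstruction (my sketch, not the original Lisp code): each module knows how to forward-propagate values and backpropagate gradients, and modules are chained into a graph.

```python
# Toy modules that forward-propagate values and backpropagate gradients.
class Linear:
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        self.x = x                       # cache the input for the backward pass
        return self.w * x
    def backward(self, grad_out):
        self.grad_w = grad_out * self.x  # gradient with respect to the weight
        return grad_out * self.w         # gradient handed to the previous module

class Square:
    def forward(self, x):
        self.x = x
        return x * x
    def backward(self, grad_out):
        return grad_out * 2 * self.x

graph = [Linear(3.0), Square()]          # interconnect modules in a (chain) graph
y = 2.0
for m in graph:
    y = m.forward(y)                     # forward pass: 2 -> 6 -> 36
g = 1.0
for m in reversed(graph):
    g = m.backward(g)                    # backward pass through the same graph
print(y, g, graph[0].grad_w)             # 36.0, 36.0, 24.0
```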
Eventually we wanted to use that system to build production code for character recognition. So we actually wrote a compiler for that Lisp interpreter. Patrice Simard, who is now at Microsoft, did the bulk of it, with Léon and me. So we could write our system in Lisp and then compile it to C, and then we'd have a self-contained, complete system that could do the entire thing. Neither PyTorch nor TensorFlow can do this today.

Yeah, okay, it's coming.

I mean, there's something like that in PyTorch called TorchScript. So, you know, we had to write our Lisp interpreter, we had to write our Lisp compiler, we had to invest a huge amount of effort to do this. And not everybody will do this: if you don't completely believe in the concept, you're not going to invest the time to do it. Nowadays, this would turn into Torch or PyTorch or TensorFlow or whatever; we'd put it in open source, everybody would use it and, you know, realize it's good. Back before 1995, working at AT&T, there's no way the lawyers would let you release anything of this nature in open source. And so we could not really distribute our code.

And on that point, sorry to go on a million tangents, but on that point, I also read that there was almost, like, a patent on convolutional neural networks.

There are two, actually.

Thankfully, they expired in 2007.
So, can we just talk about that for a second? I know you're at Facebook, but you're also at NYU. What does it mean to patent ideas like these, software ideas, essentially? Or are they mathematical ideas?

Okay, so they're not mathematical ideas; they are, you know, algorithms. And there was a period where the US Patent Office would allow the patenting of software as long as it was embodied. The Europeans are very different; they don't quite accept that, they have a different concept. But, you know, I no longer, I mean, I never actually strongly believed in this, but I don't believe in this kind of patent. Facebook basically doesn't believe in this kind of patent. Google files patents because they've been burned with Apple, and so now they do this for defensive purposes, but usually they say, we're not gonna sue you if you infringe. Facebook has a similar policy. They say, you know, we file patents on certain things for defensive purposes. We're not gonna sue you if you infringe, unless you sue us. So the industry does not believe in patents. They are there because of, you know, the legal landscape and various things, but I don't really believe in patents for this kind of stuff.

So that's a great thing.
I'll tell you a worse story, actually. So what happened was, the first patent about convolutional nets was about kind of the early version of a convolutional net that didn't have separate pooling layers. It had convolutional layers with strides larger than one, if you want, right? And then there was a second one, on convolutional nets with separate pooling layers, trained with backprop. They were filed in 1989 and 1990 or something like this. At the time, the life of a patent was 17 years. So here's what happened over the next few years: we started developing character recognition technology around convolutional nets, and a check-reading system was deployed in ATM machines in 1995, and in large check-reading machines in back offices, et cetera. Those systems were developed by an engineering group that we were collaborating with at AT&T, and they were commercialized by NCR, which at the time was a subsidiary of AT&T. Now, AT&T split up in 1996, and the lawyers just looked at all the patents and distributed them among the various companies. They gave the convolutional net patents to NCR, because they were actually selling products that used them. But nobody at NCR had any idea what a convolutional net was. So between 1996 and 2007, well, there's a whole period until 2002 when I didn't actually work on machine learning or convolutional nets; I resumed working on this around 2002. And between 2002 and 2007, I was working on them, crossing my fingers that nobody at NCR would notice.
Yeah, and I hope that this, as you said, lawyers aside, relative openness of the community will continue.

It accelerates the entire progress of the industry. And the problem that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the others; it's that we don't have the technology to build the things we want to build. We want to build intelligent virtual assistants that have common sense. We don't have a monopoly on good ideas for this. We don't believe we do. Maybe others believe they do, but we don't. If a startup tells you they have the secret to human-level intelligence and common sense, don't believe them; they don't. And it's gonna take the entire work of the world research community for a while to get to the point where each of those companies can go off and kind of start to build things on this. We're not there yet.
Absolutely, and this speaks to the gap between the space of ideas and the rigorous testing of those ideas, the practical application, that you often speak to. You've written advice saying: don't get fooled by people who claim to have a solution to artificial general intelligence, who claim to have an AI system that works just like the human brain, or who claim to have figured out how the brain works; ask them what error rate they get on MNIST or ImageNet.

So this is a little dated, by the way.

I mean, five years, who's counting? Okay, but I think your opinion is still, MNIST and ImageNet, yes, may be dated, there may be new benchmarks, right? But I think that philosophy is one you still somewhat hold: that benchmarks and the practical testing, the practical application, is where you really get to test the ideas.

Well, it may not be completely practical. For example, it could be a toy dataset, but it has to be some sort of task that the community as a whole has accepted as some sort of standard kind of benchmark, if you want.
It doesn't need to be real. So for example, many years ago here at FAIR, people, Jason Weston and Antoine Bordes and a few others, proposed the bAbI tasks, which were kind of a toy problem to test the ability of machines to reason, actually, to access working memory and things like this. And it was very useful even though it wasn't a real task. MNIST is kind of a halfway real task. So toy problems can be very useful. It's just that I was really struck by the fact that a lot of people, particularly a lot of people with money to invest, would be fooled by people telling them, oh, we have the algorithm of the cortex and you should give us 50 million.

So there's a lot of people who try to take advantage of the hype, for business reasons and so on. But let me sort of talk to this idea that new ideas, the ideas that push the field forward, may not yet have a benchmark, or it may be very difficult to establish a benchmark.

That's part of the process. Establishing benchmarks is part of the process.
So what are your thoughts about, so we have these benchmarks around stuff we can do with images, from classification to captioning to just every kind of information you can pull off from images at the surface level. There are audio datasets, there's some video. Where can we start, natural language? What kind of benchmarks do you see that start creeping onto something more like intelligence, like reasoning, like, maybe you don't like the term, but AGI, echoes of that kind of formulation?

A lot of people are working on interactive environments in which you can train and test intelligent systems. So, for example, the classical paradigm of supervised learning is that you have a dataset, you partition it into a training set, validation set, and test set, and there's a clear protocol, right? But that assumes that the samples are statistically independent: you can exchange them, the order in which you see them shouldn't matter. But what if the answer you give determines the next sample you see? Which is the case, for example, in robotics, right? Your robot does something and then it gets exposed to a new room, and depending on where it goes, the room will be different. So that creates the exploration problem. That also creates a dependency between samples, right? If you can only move in space, the next sample you're gonna see is gonna be probably in the same building, most likely, right? So all the assumptions about the validity of this training set/test set hypothesis break whenever a machine can take an action that has an influence on the world and on what it's gonna see. So people are setting up artificial environments where that takes place, right? The robot runs around a 3D model of a house and can interact with objects and things like this. So you do robotics-based simulation, you have those OpenAI Gym type things, or MuJoCo kind of simulated robots, and you have games, things like that. So that's where the field is going, really, this kind of environment.
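As a minimal illustration of such an interactive environment, here is a standard loop in the OpenAI Gym style, written against the gymnasium package (the maintained successor to the original gym; the package and environment name are my choices, not from the conversation). The point is that the action taken determines the next observation, so the samples are not independent:

```python
# An agent-environment interaction loop: actions influence what you see next.
import gymnasium as gym  # successor package to the original OpenAI Gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # a random policy, just for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:         # episode over: start a fresh one
        obs, info = env.reset()
env.close()
```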
Now, back to the question of AGI. I don't like the term AGI, because it implies that human intelligence is general, and human intelligence is nothing like general. It's very, very specialized. We think it's general; we'd like to think of ourselves as having general intelligence. We don't; we're very specialized. We're only slightly more general than...

Why does it feel general? So you kind of, the term general: I think what's impressive about humans is the ability to learn, as we were talking about learning, to learn in just so many different domains. It's perhaps not arbitrarily general, but you can learn in many domains and integrate that knowledge somehow. The knowledge persists.

So let me take a very specific example. It's not an example, it's more like a quasi-mathematical demonstration. So you have about 1 million fibers coming out of one of your eyes, okay, 2 million total, but let's talk about just one of them. There are 1 million nerve fibers in your optical nerve. Let's imagine that they are binary, so they can be active or inactive, right? So the input to your visual cortex is 1 million bits. Now, they're connected to your brain in a particular way, and your brain has connections that are kind of a little bit like a convolutional net: they're kind of local, you know, in space, and things like this.
Now, imagine I play a trick on you. It's a pretty nasty trick, I admit. I cut your optical nerve, and I put a device that makes a random perturbation, a permutation, of all the nerve fibers. So now what comes to your brain is a fixed but random permutation of all the pixels. There's no way in hell that your visual cortex, even if I do this to you in infancy, will actually learn vision to the same level of quality that you can.

Got it, and you're saying there's no way you'd learn that?

No, because now two pixels that are nearby in the world will end up in very different places in your visual cortex, and your neurons there have no connections with each other, because they're only connected locally.

So this whole, our entire, the hardware is built in many ways to support...

The locality of the real world. Yes, that's specialization.

Yeah, but it's still pretty damn impressive. So it's not perfect generalization; it's not even close.

No, no, it's not that it's not even close; it's not at all.

Yeah, it's specialized, yeah.
So how many Boolean functions? So let's imagine you want to train your visual system to recognize particular patterns of those one million bits. Okay, so that's a Boolean function, right? Either the pattern is here or not here; this is a two-way classification with one million binary inputs. How many such Boolean functions are there? Okay, you have two to the one million combinations of inputs, and for each of those you have an output bit, and so you have two to the two to the one million Boolean functions of this type, okay? Which is an unimaginably large number. How many of those functions can actually be computed by your visual cortex? And the answer is a tiny, tiny, tiny, tiny, tiny, tiny sliver. Like an enormously tiny sliver. So we are ridiculously specialized.
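Written out, the counting argument is a standard one (the arithmetic, not a quote from the conversation):

```latex
% Each of the 2^n possible input patterns gets one output bit, so the number
% of Boolean functions on n binary inputs is 2^(2^n).
\[
\bigl|\{\, f : \{0,1\}^n \to \{0,1\} \,\}\bigr| = 2^{2^n},
\qquad n = 10^6 \;\Longrightarrow\; 2^{\,2^{10^6}} \text{ functions.}
\]
```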
But, okay, that's an argument against the word general. I agree with your intuition, but I'm not sure. It seems the brain is impressively capable of adjusting to things.

It's because we can't imagine tasks that are outside of our comprehension, right? So we think we're general because we're general for all the things that we can apprehend. But there is a huge world out there of things that we have no idea about. We call that heat, by the way. At least, physicists call that heat, or they call it entropy, which is kind of... You have a thing full of gas, right? A closed system of gas, closed or not closed. It has pressure, it has temperature, and you can write equations: PV = nRT, you know, things like that, right? When you reduce the volume, the temperature goes up, the pressure goes up, things like that, right? For a perfect gas, at least. Those are the things you can know about that system.
And it's a tiny, tiny number of bits compared to the complete information of the state of the entire system, because the state of the entire system would give you the position and momentum of every molecule of the gas. What you don't know about it is the entropy, and you interpret it as heat. The energy contained in that thing is what we call heat. Now, it's very possible that, in fact, there is some very strong structure in how those molecules are moving; it's just that they are arranged in a way that we are just not wired to perceive.

Yeah, we're ignorant of it, and there's an infinite amount of things we're not wired to perceive. And you're right, that's a nice way to put it: we're general for all the things we can imagine, which is a very tiny subset of all things that are possible.

So it's like Kolmogorov complexity, or the Kolmogorov-Chaitin-Solomonoff complexity: every bit string, or every integer, is random, except for all the ones that you can actually write down.
So beautifully put. But, you know, so we can just call it artificial intelligence; we don't need to have 'general'.

Human-level intelligence is good. You know, anytime you touch 'human' it gets interesting, because, you know, we attach ourselves to human, and it's difficult to define what human intelligence is. Nevertheless, my definition is maybe 'damn impressive intelligence', okay? A damn impressive demonstration of intelligence, whatever.

And so, on that topic, most successes in deep learning have been in supervised learning. What is your view on unsupervised learning? Is there a hope to reduce the involvement of human input and still have successful systems that have practical use?
Yeah, I mean, there's definitely a hope. It's more than a hope, actually; there's mounting evidence for it. And that's basically all I do. Like, the only thing I'm interested in at the moment is what I call self-supervised learning, not unsupervised, because unsupervised learning is a loaded term. People who know something about machine learning, you know, tell you, so you're doing clustering or PCA, which is not the case. And the wider public, you know, when you say unsupervised learning, oh my God, machines are gonna learn by themselves, without supervision. You know, they see this as...

Where are the parents?

Yeah. So I call it self-supervised learning, because, in fact, the underlying algorithms that are used are the same algorithms as the supervised learning algorithms, except that what we train them to do is not predict a particular set of variables, like the category of an image, and not to predict a set of variables that have been provided by human labelers. What you're training the machine to do is basically reconstruct a piece of its input that is being masked out, essentially. You can think of it this way, right? So show a piece of video to a machine and ask it to predict what's gonna happen next. And of course, after a while, you can show what happens, and the machine will kind of train itself to do better at that task. All the latest, most successful models in natural language processing use self-supervised learning, you know, sort of BERT-style systems, for example, right? You show it a window of a dozen words on a text corpus, you take out 15% of the words, and then you train the machine to predict the words that are missing. That's self-supervised learning. It's not predicting the future, it's just predicting things in the middle, but you could have it predict the future; that's what language models do.

So you construct, in an unsupervised way, a model of language.

Or video, or the physical world, or whatever, right?
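Here is a minimal sketch of that BERT-style fill-in-the-blank training on a toy five-word sentence; the vocabulary, dimensions, and one-layer model are illustrative stand-ins, not a real BERT:

```python
# Self-supervised masked-word prediction: reconstruct a masked piece of the input.
import torch
import torch.nn as nn

vocab = ["the", "cat", "sat", "on", "mat", "[MASK]"]
word_to_id = {w: i for i, w in enumerate(vocab)}

masked = ["the", "cat", "[MASK]", "on", "mat"]  # "sat" has been taken out
target = torch.tensor([word_to_id["sat"]])

embed = nn.Embedding(len(vocab), 16)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)
to_vocab = nn.Linear(16, len(vocab))            # scores over the whole lexicon

ids = torch.tensor([[word_to_id[w] for w in masked]])
hidden = encoder(embed(ids))
logits = to_vocab(hidden[0, 2])                 # prediction at the masked slot
loss = nn.functional.cross_entropy(logits.unsqueeze(0), target)
loss.backward()                                 # same machinery as supervised learning
print(logits.softmax(-1))                       # probability vector over the lexicon
```

The labels come from the text itself, which is the point: no human labeler is involved.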
How far do you think that can take us? Do you think BERT understands anything?

To some level it has a shallow understanding of text, but to have kind of true human-level intelligence, I think you need to ground language in reality. So some people are attempting to do this, right? Having systems that kind of have some visual representation of what is being talked about, which is one reason you need those interactive environments, actually. But this is like a huge technical problem that is not solved, and that explains why self-supervised learning works in the context of natural language but does not work, or at least not well, in the context of image recognition and video, although it's making progress quickly. And the reason is the fact that it's much easier to represent uncertainty in the prediction in the context of natural language than it is in the context of things like video and images. So for example, if I ask you to predict what words are missing, 15% of the words that I've taken out...

The possibilities are small.

It's small, right? There are 100,000 words in the lexicon, and what the machine spits out is a big probability vector, right? A bunch of numbers between zero and one, and we know how to do this with computers. So there, representing uncertainty in the prediction is relatively easy, and that's, in my opinion, why those techniques work for NLP.
For images, if you block a piece of an image and you ask the system to reconstruct that piece of the image, there are many possible answers that are all perfectly legit, right? And how do you represent that set of possible answers? You can't train a system to make one prediction; you can't train a neural net to say, here it is, that's the image, because there's a whole set of things that are compatible with it. So how do you get the machine to represent not a single output, but a whole set of outputs? And similarly with video prediction, there's a lot of things that can happen in the future of a video. You're looking at me right now; I'm not moving my head very much, but I might turn my head to the left or to the right. If you don't have a system that can predict this, and you train it with least squares to minimize the error between the prediction and what I'm doing, what you get is a blurry image of myself in all possible future positions that I might be in, which is not a good prediction.
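A small numeric illustration of why least squares gives that blur (my own example, not from the conversation): when two futures are equally likely, the MSE-optimal single prediction is their average, which matches neither.

```python
# The MSE-optimal point prediction of an uncertain outcome is the mean outcome.
import numpy as np

futures = np.array([-1.0, +1.0])          # head turns left (-1) or right (+1)
candidates = np.linspace(-1.5, 1.5, 301)  # candidate single predictions
mse = [np.mean((futures - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(best)  # ~0.0: the "blurry" average of left and right
```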
So there might be other ways to do the self-supervision for visual scenes?

I mean, if I knew, I wouldn't tell you, I'd publish it first. I don't know; there might be.

So, I mean, there might be artificial ways, like self-play in games, the way you can simulate part of the environment.

Oh, that doesn't solve the problem. It's just a way of generating data.

But because you have more control, like maybe you can control...

Yeah, it's a way to generate data, and because you can do huge amounts of data generation... but it doesn't, you're right.

Well, it creeps up on the problem from the side of data, and you don't think that's the right way to creep up on it?

It doesn't solve this problem of handling uncertainty in the world, right? So if you have a machine learn a predictive model of the world in a game that is deterministic or quasi-deterministic, it's easy, right? Just give a few frames of the game to a ConvNet, put a bunch of layers, and then have the game generate the next few frames. And if the game is deterministic, it works fine. And that includes feeding the system with the action that your little character is gonna take. The problem comes from the fact that the real world and most games are not entirely predictable, and so there you get those blurry predictions, and you can't do planning with blurry predictions, right? So if you have a perfect model of the world, you can, in your head, run this model with a hypothesis for a sequence of actions, and you're going to predict the outcome of that sequence of actions. But if your model is imperfect, how can you plan?
Yeah, it quickly explodes. What are your thoughts on the extension of this, which is a topic I'm super excited about, and it's connected to something you were talking about in terms of robotics: active learning. So as opposed to sort of completely unsupervised or self-supervised learning, you ask the system for human help in selecting parts you want annotated next. So if you think about a robot exploring a space, or a baby exploring a space, or a system exploring a dataset, every once in a while asking for human input, do you see value in that kind of work?

I don't see transformative value. It's going to make things that we can already do more efficient, or they will learn slightly more efficiently, but it's not going to make machines sort of significantly more intelligent. And by the way, there is no opposition, there's no conflict, between self-supervised learning, reinforcement learning, and supervised learning, or imitation learning, or active learning. I see self-supervised learning as a preliminary to all of the above.
So the example I use very often is: if you use classical reinforcement learning, deep reinforcement learning if you want, the best methods today, so-called model-free reinforcement learning, to learn to play Atari games, it takes about 80 hours of training to reach the level that any human can reach in about 15 minutes. They get better than humans, but it takes them a long time. AlphaStar, okay, you know, Oriol Vinyals and his team's system to play StarCraft, plays, you know, a single map, a single type of player, and can reach better-than-human level with about the equivalent of 200 years of training playing against itself. That's 200 years, right? It's not something that any human can ever do.

I mean, I'm not sure what lesson to take away from that.
Okay, now take those algorithms,
link |
the best algorithms we have today
link |
to train a car to drive itself.
link |
It would probably have to drive millions of hours.
link |
It will have to kill thousands of pedestrians.
link |
It will have to run into thousands of trees.
link |
It will have to run off cliffs.
link |
And it would have to run off a cliff multiple times
link |
before it figures out that it's a bad idea, first of all.
link |
And second of all, before it figures out how not to do it.
link |
And so, I mean, this type of learning obviously
link |
does not reflect the kind of learning
link |
that animals and humans do.
link |
There is something missing
link |
that's really, really important there.
link |
And my hypothesis, which I've been advocating
link |
for like five years now,
link |
is that we have predictive models of the world
link |
that include the ability to predict under uncertainty.
link |
And what allows us to not run off a cliff
link |
when we learn to drive,
link |
most of us can learn to drive in about 20 or 30 hours
link |
of training without ever crashing or causing any accident.
link |
And if we drive next to a cliff,
link |
we know that if we turn the wheel to the right,
link |
the car is gonna run off the cliff
link |
and nothing good is gonna come out of this.
link |
Because we have a pretty good model of intuitive physics
link |
that tells us the car is gonna fall.
link |
We know about gravity.
link |
Babies learn this around the age of eight or nine months
link |
that objects don't float, they fall.
link |
And we have a pretty good idea of the effect
link |
of turning the wheel on the car
link |
and we know we need to stay on the road.
link |
So there's a lot of things that we bring to the table,
link |
which is basically our predictive model of the world.
link |
And that model allows us to not do stupid things.
link |
And to basically stay within the context
link |
of things we need to do.
link |
We still face unpredictable situations
link |
and that's how we learn.
link |
But that allows us to learn really, really, really quickly.
link |
So that's called model based reinforcement learning.
link |
There's some imitation and supervised learning
link |
because we have a driving instructor
link |
that tells us occasionally what to do.
link |
But most of the learning is learning the model,
link |
learning physics that we've done since we were babies.
link |
That's where almost all the learning happens.
link |
And the physics is transferable
link |
from scene to scene.
link |
Stupid things are the same everywhere.
link |
Yeah, I mean, if you have experience of the world,
link |
you don't need to be from a particularly intelligent species
link |
to know that if you spill water from a container,
link |
the rest is gonna get wet.
link |
You might get wet.
link |
So cats know this, right?
link |
Right, so the main problem we need to solve
link |
is how do we learn models of the world?
link |
That's what I'm interested in.
link |
That's what self supervised learning is all about.
link |
If you were to try to construct a benchmark for,
link |
let's look at MNIST.
link |
I love that data set.
link |
Do you think it's useful, interesting, slash possible
link |
to perform well on MNIST with just one example
link |
of each digit and how would we solve that problem?
link |
The answer is probably yes.
link |
The question is what other type of learning
link |
are you allowed to do?
link |
So if what you're allowed to do is train
link |
on some gigantic data set of labeled digits,
link |
that's called transfer learning.
link |
And we know that works, okay?
link |
We do this at Facebook, like in production, right?
link |
We train large convolutional nets to predict hashtags
link |
that people type on Instagram
link |
and we train on billions of images, literally billions.
link |
And then we chop off the last layer
link |
and fine tune on whatever task we want.
link |
That works really well.
link |
You can beat the ImageNet record with this.
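The recipe described here (pretrain a big network, chop off the last layer, fine-tune) looks roughly like the following sketch; it substitutes a standard torchvision ImageNet-pretrained ResNet for the hashtag-pretrained Instagram models, so treat it as an illustration of the pattern rather than the actual released code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(pretrained=True)        # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)  # "chop off the last layer":
                                                # a fresh head for the target task

# Optionally freeze the backbone so only the new head trains at first.
for name, p in model.named_parameters():
    if not name.startswith("fc"):
        p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
```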
link |
We actually open sourced the whole thing
link |
like a few weeks ago.
link |
Yeah, that's still pretty cool.
link |
But yeah, so what would be impressive?
link |
What's useful and impressive?
link |
What kind of transfer learning
link |
would be useful and impressive?
link |
Is it Wikipedia, that kind of thing?
link |
No, no, so I don't think transfer learning
link |
is really where we should focus.
link |
We should try to do,
link |
you know, have a kind of scenario for a benchmark
link |
where you have unlabeled data
link |
and it's a very large amount of unlabeled data.
link |
It could be video clips.
link |
It could be where you do, you know, frame prediction.
link |
It could be images where you could choose to,
link |
you know, mask a piece of it, could be whatever,
link |
but they're unlabeled and you're not allowed to label them.
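To make the masking option concrete, a toy sketch of that kind of pretext task might look like this, where the "label" is just the image's own pixels; the tiny architecture and patch size are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

class Inpainter(nn.Module):
    """Reconstruct an image from a version with a patch masked out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def mask_random_patch(images, size=8):
    # Zero out one random square patch per image.
    masked = images.clone()
    b, _, h, w = images.shape
    ys = torch.randint(0, h - size, (b,))
    xs = torch.randint(0, w - size, (b,))
    for i in range(b):
        masked[i, :, ys[i]:ys[i] + size, xs[i]:xs[i] + size] = 0.0
    return masked

images = torch.rand(16, 1, 28, 28)    # stand-in for unlabeled images
model = Inpainter()
loss = nn.functional.mse_loss(model(mask_random_patch(images)), images)
loss.backward()                       # no human labels anywhere in the loop
```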
link |
So you do some training on this,
link |
and then you train on a particular supervised task,
link |
ImageNet or MNIST,
link |
and you measure how your test error decreases
link |
or validation error decreases
link |
as you increase the number of label training samples.
link |
Okay, and what you'd like to see is that,
link |
you know, your error decreases much faster
link |
than if you train from scratch from random weights.
link |
So that, to reach the same level of performance
link |
that a completely supervised, purely supervised system
link |
would reach, you would need way fewer samples.
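That evaluation protocol can be made concrete with a deliberately tiny stand-in: below, PCA fit on unlabeled data plays the role of self-supervised pretraining, and we compare accuracy against training on raw pixels as the label budget grows. Everything here is a toy assumption; the point is only the shape of the experiment.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
# Split off an "unlabeled" pool; its labels are never used.
X_unlab, X_lab, _, y_lab = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_lab, y_lab, test_size=0.5, random_state=0)

features = PCA(n_components=32).fit(X_unlab)   # "pretraining" sees no labels

for n in [50, 100, 200, 400]:                  # growing label budgets
    pre = LogisticRegression(max_iter=2000).fit(
        features.transform(X_train[:n]), y_train[:n])
    raw = LogisticRegression(max_iter=2000).fit(X_train[:n], y_train[:n])
    print(n, pre.score(features.transform(X_test), y_test),
          raw.score(X_test, y_test))
```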
link |
So that's the crucial question
link |
because it will answer the question for, you know,
link |
people interested in medical image analysis.
link |
Okay, you know, if I want to get to a particular level
link |
of error rate for this task,
link |
I know I need a million samples.
link |
Can I do, you know, self supervised pre training
link |
to reduce this to about 100 or something?
link |
And you think the answer there
link |
is self supervised pre training?
link |
Yeah, some form, some form of it.
link |
I'm telling you, active learning, but you disagree.
link |
No, it's not useless.
link |
It's just not gonna lead to a quantum leap.
link |
It's just gonna make things that we already do more efficient.
link |
So you're way smarter than me.
link |
I just disagree with you.
link |
But I don't have anything to back that.
link |
It's just intuition.
link |
So I've worked a lot with large scale data sets
link |
and there's something that might be magic
link |
in active learning, but okay.
link |
And at least I said it publicly.
link |
At least I'm being an idiot publicly.
link |
It's not being an idiot.
link |
It's, you know, working with the data you have.
link |
I mean, certainly people are doing things like,
link |
okay, I have 3,000 hours of, you know,
link |
imitation learning for a self-driving car,
link |
but most of those are incredibly boring.
link |
What I'd like is to select, you know, 10% of them
link |
that are kind of the most informative.
link |
And with just that, I would probably reach the same performance.
link |
So it's a weak form of active learning if you want.
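A minimal sketch of that weak form, picking the most informative 10% by prediction uncertainty, assuming you already have class probabilities from some current model:

```python
import numpy as np

def select_most_informative(probs, fraction=0.1):
    """Return indices of the most uncertain samples by predictive entropy.

    probs: (N, n_classes) array of predicted class probabilities.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    k = max(1, int(fraction * len(probs)))
    return np.argsort(entropy)[-k:]   # the k highest-entropy samples

# Example: keep the most ambiguous 10% of 1,000 three-class predictions.
probs = np.random.dirichlet(np.ones(3), size=1000)
chosen = select_most_informative(probs)
```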
link |
Yes, but there might be a much stronger version.
link |
Yeah, that's right.
link |
And that's an open question, whether it exists.
link |
The question is how much stronger can you get?
link |
Elon Musk is confident.
link |
Talked to him recently.
link |
He's confident that large scale data and deep learning
link |
can solve the autonomous driving problem.
link |
What are your thoughts on the limits,
link |
possibilities of deep learning in this space?
link |
It's obviously part of the solution.
link |
I mean, I don't think we'll ever have a self-driving system
link |
or at least not in the foreseeable future
link |
that does not use deep learning.
link |
Let me put it this way.
link |
Now, how much of it?
link |
So in the history of sort of engineering,
link |
particularly sort of AI like systems,
link |
there's generally a first phase where everything is built by hand.
link |
And that was the case for autonomous driving 20, 30 years ago.
link |
Then there is a second phase
link |
where a little bit of learning is used,
link |
but there's a lot of engineering that's involved in kind of
link |
taking care of corner cases and putting limits, et cetera,
link |
because the learning system is not perfect.
link |
And then as technology progresses,
link |
we end up relying more and more on learning.
link |
That's the history of character recognition,
link |
it's the history of speech recognition,
link |
now computer vision, natural language processing.
link |
And I think the same is going to happen with autonomous driving
link |
that currently the methods that are closest
link |
to providing some level of autonomy,
link |
some decent level of autonomy
link |
where you don't expect a driver to kind of do anything
link |
is where you constrain the world.
link |
So you only run within 100 square kilometers
link |
or square miles in Phoenix where the weather is nice
link |
and the roads are wide, which is what Waymo is doing.
link |
You completely overengineer the car with tons of LIDARs
link |
and sophisticated sensors that are too expensive
link |
for consumer cars,
link |
but they're fine if you just run a fleet.
link |
And you engineer the hell out of everything else.
link |
You map the entire world.
link |
So you have a complete 3D model of everything.
link |
So the only thing that the perception system
link |
has to take care of is moving objects
link |
and construction and sort of things that weren't in your map.
link |
And you can engineer a good SLAM system and all that stuff.
link |
So that's kind of the current approach
link |
that's closest to some level of autonomy.
link |
But I think eventually the longterm solution
link |
is going to rely more and more on learning
link |
and possibly using a combination
link |
of self supervised learning and model based reinforcement
link |
or something like that.
link |
But ultimately learning will be not just at the core,
link |
but really the fundamental part of the system.
link |
Yeah, it already is, but it will become more and more.
link |
What do you think it takes to build a system
link |
with human level intelligence?
link |
You talked about the AI system in the movie Her
link |
being way out of our current reach.
link |
This might be outdated as well, but.
link |
It's still way out of reach.
link |
It's still way out of reach.
link |
What would it take to build Her?
link |
So I can tell you the first two obstacles
link |
that we have to clear,
link |
but I don't know how many obstacles there are after this.
link |
So the image I usually use is that
link |
there is a bunch of mountains that we have to climb
link |
and we can see the first one,
link |
but we don't know if there are 50 mountains behind it or not.
link |
And this might be a good sort of metaphor
link |
for why AI researchers in the past
link |
have been overly optimistic about the progress of AI.
link |
You know, for example,
link |
Newell and Simon wrote the General Problem Solver,
link |
and they called it the General Problem Solver.
link |
General problem solver.
link |
And of course, the first thing you realize
link |
is that all the problems you want to solve are exponential.
link |
And so you can't actually use it for anything useful.
link |
Yeah, so yeah, all you see is the first peak.
link |
So in general, what are the first couple of peaks for Her?
link |
So the first peak, which is precisely what I'm working on
link |
is self supervised learning.
link |
How do we get machines to learn models of the world
link |
by observation, kind of like babies and like young animals?
link |
So we've been working with, you know, cognitive scientists.
link |
So Emmanuel Dupoux, who's at FAIR in Paris
link |
half time, and is also a researcher at a French university.
link |
And he has this chart that shows at which,
link |
at how many months of life human babies
link |
kind of learn different concepts.
link |
And you can measure this in sort of various ways.
link |
So things like distinguishing animate objects
link |
from inanimate objects,
link |
you can tell the difference at age two, three months.
link |
Whether an object is going to stay stable
link |
or is going to fall, you know,
link |
at about four months, you can tell.
link |
You know, there are various things like this.
link |
And then things like gravity,
link |
the fact that objects are not supposed to float in the air,
link |
but are supposed to fall,
link |
you learn this around the age of eight or nine months.
link |
If you look at a lot of, you know,
link |
eight or nine month old babies,
link |
you give them a bunch of toys
link |
on their high chair.
link |
First thing they do is they throw them on the ground
link |
and they look at them.
link |
It's because, you know, they're learning about,
link |
actively learning about gravity.
link |
Okay, so they're not trying to annoy you,
link |
but they, you know, they need to do the experiment, right?
link |
So, you know, how do we get machines to learn like babies,
link |
mostly by observation with a little bit of interaction
link |
and learning those models of the world?
link |
Because I think that's really a crucial piece
link |
of an intelligent autonomous system.
link |
So if you think about the architecture
link |
of an intelligent autonomous system,
link |
it needs to have a predictive model of the world.
link |
So something that says, here is the state of the world at time T,
link |
here is a state of the world at time T plus one,
link |
if I take this action.
link |
And it's not a single answer, it can be a...
link |
Yeah, it can be a distribution, yeah.
link |
Yeah, well, but we don't know how to represent
link |
distributions in high dimensional continuous spaces.
link |
So it's gotta be something weaker than that, okay?
link |
But with some representation of uncertainty.
link |
If you have that, then you can do what optimal control
link |
theorists call model predictive control,
link |
which means that you can run your model
link |
with a hypothesis for a sequence of actions
link |
and then see the result.
link |
Now, what you need, the other thing you need
link |
is some sort of objective that you want to optimize.
link |
Am I reaching the goal of grabbing this object?
link |
Am I minimizing energy?
link |
Am I whatever, right?
link |
So there is some sort of objective that you have to minimize.
link |
And so in your head, if you have this model,
link |
you can figure out the sequence of action
link |
that will optimize your objective.
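The "run your model in your head" procedure is essentially what a random-shooting version of model predictive control does; this sketch assumes a dynamics model and a cost function are given (here as plain functions), and every name and constant is illustrative.

```python
import numpy as np

def plan(state, dynamics, cost, horizon=10, n_candidates=256,
         action_dim=2, rng=None):
    """Try random action sequences in the model; return the first action
    of the cheapest imagined trajectory."""
    rng = rng or np.random.default_rng(0)
    best_actions, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:             # roll the model forward "in your head"
            s = dynamics(s, a)
            total += cost(s, a)
        if total < best_cost:
            best_cost, best_actions = total, actions
    return best_actions[0]            # execute one step, then replan

# Toy usage: a point that drifts with its action; the objective is to
# stay near the origin.
a0 = plan(np.ones(2),
          dynamics=lambda s, a: s + 0.1 * a,
          cost=lambda s, a: float(np.sum(s ** 2)))
```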
link |
That objective is something that ultimately is rooted
link |
in your basal ganglia, at least in the human brain,
link |
that's what the basal ganglia
link |
computes, your level of contentment or miscontentment.
link |
I don't know if that's a word.
link |
Unhappiness, okay?
link |
Discontentment, maybe.
link |
And so your entire behavior is driven towards
link |
kind of minimizing that objective,
link |
which is maximizing your contentment,
link |
computed by your basal ganglia.
link |
And what you have is an objective function,
link |
which is basically a predictor
link |
of what your basal ganglia is going to tell you.
link |
So you're not going to put your hand on fire
link |
because you know it's going to burn
link |
and you're going to get hurt.
link |
And you're predicting this because of your model
link |
of the world and your sort of predictor
link |
of this objective, right?
link |
So if you have those three components,
link |
well, four components,
link |
you have the hardwired objective,
link |
the hardwired contentment objective computer,
link |
or calculator, if you want.
link |
And then you have the three components.
link |
One is the objective predictor,
link |
which basically predicts your level of contentment.
link |
One is the model of the world.
link |
And there's a third module I didn't mention,
link |
which is the module that will figure out
link |
the best course of action to optimize an objective
link |
given your model, okay?
link |
And you can call this a policy network
link |
or something like that, right?
link |
Now, you need those three components
link |
to act autonomously intelligently.
link |
And you can be stupid in three different ways.
link |
You can be stupid because your model of the world is wrong.
link |
You can be stupid because your objective is not aligned
link |
with what you actually want to achieve, okay?
link |
In humans, that would be a psychopath.
link |
And then the third way you can be stupid
link |
is that you have the right model,
link |
you have the right objective,
link |
but you're unable to figure out a course of action
link |
to optimize your objective given your model.
link |
Some people who are in charge of big countries
link |
actually have all three that are wrong.
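Put together, the architecture sketched over the last few exchanges, a hardwired objective plus three learned modules (world model, objective predictor, policy), can be captured in a few lines; the interfaces below are illustrative assumptions, with lower critic scores meaning more contentment.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    world_model: Callable  # (state, action) -> predicted next state
    critic: Callable       # state -> predicted discontentment (lower = better)
    policy: Callable       # state -> proposed action

    def act(self, state):
        action = self.policy(state)
        imagined = self.world_model(state, action)  # simulate before acting
        if self.critic(imagined) > self.critic(state):
            return None                             # predicted to hurt: refuse
        return action

# Each of the "three ways to be stupid" is a bug in one of these callables.
agent = Agent(world_model=lambda s, a: s + a,
              critic=lambda s: abs(s),
              policy=lambda s: -0.5 * s)
print(agent.act(2.0))   # -1.0: moving toward zero lowers discontentment
```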
link |
Okay, so if we think about this agent,
link |
if we think about the movie Her,
link |
you've criticized the art project
link |
that is Sophia the Robot.
link |
And what that project essentially does
link |
is use our natural inclination to anthropomorphize
link |
things that look human and to attribute more to them than is really there.
link |
Do you think that could be used by AI systems
link |
like in the movie Her?
link |
So do you think that body is needed
link |
to create a feeling of intelligence?
link |
Well, if Sophia was just an art piece,
link |
I would have no problem with it,
link |
but it's presented as something else.
link |
Let me, on that comment real quick,
link |
if the creators of Sophia could change something
link |
about their marketing or behavior in general, what would it be?
link |
Just about everything.
link |
I mean, don't you think, here's a tough question.
link |
Let me, so I agree with you.
link |
So with Sophia, the general public feels
link |
that Sophia can do way more than she actually can.
link |
And the people who created Sophia
link |
are not honestly publicly communicating,
link |
trying to teach the public.
link |
But here's a tough question.
link |
Don't you think the same thing is happening,
link |
that scientists in industry and research are taking advantage
link |
of the same misunderstanding in the public
link |
when they create AI companies or publish stuff?
link |
Some companies, yes.
link |
I mean, there is no sense of,
link |
there's no desire to delude.
link |
There's no desire to kind of over claim
link |
when something is done, right?
link |
You publish a paper on AI that has this result
link |
on ImageNet, it's pretty clear.
link |
I mean, it's not even interesting anymore,
link |
but I don't think there is that.
link |
I mean, the reviewers are generally not very forgiving
link |
of unsupported claims of this type.
link |
And, but there are certainly quite a few startups
link |
that have had a huge amount of hype around this
link |
that I find extremely damaging
link |
and I've been calling it out when I've seen it.
link |
So yeah, but to go back to your original question,
link |
like the necessity of embodiment,
link |
I think, I don't think embodiment is necessary.
link |
I think grounding is necessary.
link |
So I don't think we're gonna get machines
link |
that really understand language
link |
without some level of grounding in the real world.
link |
And it's not clear to me that language
link |
is a high enough bandwidth medium
link |
to communicate how the real world works.
link |
So I think for this.
link |
Can you talk to what grounding means?
link |
So grounding means that,
link |
so there is this classic problem of common sense reasoning,
link |
you know, the Winograd schema, right?
link |
And so I tell you the trophy doesn't fit in the suitcase
link |
because it's too big,
link |
or the trophy doesn't fit in the suitcase
link |
because it's too small.
link |
And the it in the first case refers to the trophy
link |
in the second case to the suitcase.
link |
And the reason you can figure this out
link |
is because you know where the trophy and the suitcase are,
link |
you know, one is supposed to fit in the other one
link |
and you know the notion of size
link |
and a big object doesn't fit in a small object,
link |
unless it's a Tardis, you know, things like that, right?
link |
So you have this knowledge of how the world works,
link |
of geometry and things like that.
link |
I don't believe you can learn everything about the world
link |
by just being told in language how the world works.
link |
I think you need some low level perception of the world,
link |
you know, be it visual, touch, you know, whatever,
link |
but some higher bandwidth perception of the world.
link |
By reading all the world's text,
link |
you still might not have enough information.
link |
There's a lot of things that just will never appear in text
link |
and that you can't really infer.
link |
So I think common sense will emerge from,
link |
you know, certainly a lot of language interaction,
link |
but also with watching videos
link |
or perhaps even interacting in virtual environments
link |
and possibly, you know, robot interacting in the real world.
link |
But I don't actually believe necessarily
link |
that this last one is absolutely necessary.
link |
But I think that there's a need for some grounding.
link |
But the final product
link |
doesn't necessarily need to be embodied, you're saying.
link |
It just needs to have an awareness, a grounding to.
link |
Right, but it needs to know how the world works
link |
to have, you know, to not be frustrating to talk to.
link |
And you talked about emotions being important.
link |
That's a whole nother topic.
link |
Well, so, you know, I talked about this,
link |
the basal ganglia as the thing
link |
that calculates your level of miscontentment.
link |
And then there is this other module
link |
that sort of tries to do a prediction
link |
of whether you're going to be content or not.
link |
That's the source of some emotion.
link |
So fear, for example, is an anticipation
link |
of bad things that can happen to you, right?
link |
You have this inkling that there is some chance
link |
that something really bad is going to happen to you
link |
and that creates fear.
link |
When you know for sure
link |
that something bad is going to happen to you,
link |
you kind of give up, right?
link |
It's not fear anymore.
link |
It's uncertainty that creates fear.
link |
So the punchline is,
link |
we're not going to have autonomous intelligence without emotions.
link |
Whatever the heck emotions are.
link |
So you mentioned very practical things of fear,
link |
but there's a lot of other mess around it.
link |
But they are kind of the results of, you know, drives.
link |
Yeah, there's deeper biological stuff going on.
link |
And I've talked to a few folks on this.
link |
There's fascinating stuff
link |
that ultimately connects to our brain.
link |
If we create an AGI system, sorry.
link |
Human level intelligence.
link |
Human level intelligence system.
link |
And you get to ask her one question.
link |
What would that question be?
link |
You know, I think the first one we'll create
link |
would probably not be that smart.
link |
They'd be like a four year old.
link |
So you would have to ask her a question
link |
to know she's not that smart.
link |
Well, what's a good question to ask? You know,
link |
what is the cause of wind?
link |
And if she answers,
link |
oh, it's because the leaves of the tree are moving
link |
and that creates wind.
link |
She's onto something.
link |
And if she says that's a stupid question,
link |
she's really onto something.
link |
No, and then you tell her,
link |
actually, you know, here is the real thing.
link |
She says, oh yeah, that makes sense.
link |
So questions that reveal the ability
link |
to do common sense reasoning about the physical world.
link |
And you'll sum it up with causal inference.
link |
Well, it was a huge honor.
link |
Congratulations on your Turing Award.
link |
Thank you so much for talking today.
link |
Thank you for having me.