
Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36



link |
00:00:00.000
The following is a conversation with Yann LeCun.
link |
00:00:03.080
He's considered to be one of the fathers of deep learning,
link |
00:00:06.320
which, if you've been hiding under a rock,
link |
00:00:09.040
is the recent revolution in AI that has captivated the world
link |
00:00:12.240
with the possibility of what machines can learn from data.
link |
00:00:16.160
He's a professor at New York University,
link |
00:00:18.520
a vice president and chief AI scientist at Facebook,
link |
00:00:21.720
and co-recipient of the Turing Award
link |
00:00:24.320
for his work on deep learning.
link |
00:00:26.240
He's probably best known as the founding father
link |
00:00:28.880
of convolutional neural networks,
link |
00:00:30.760
in particular, their application
link |
00:00:32.520
to optical character recognition
link |
00:00:34.440
and the famed MNIST dataset.
link |
00:00:37.280
He is also an outspoken personality,
link |
00:00:40.160
unafraid to speak his mind in a distinctive French accent
link |
00:00:43.840
and explore provocative ideas,
link |
00:00:45.760
both in the rigorous medium of academic research
link |
00:00:48.400
and the somewhat less rigorous medium of Twitter and Facebook.
link |
00:00:52.840
This is the Artificial Intelligence Podcast.
link |
00:00:55.640
If you enjoy it, subscribe on YouTube,
link |
00:00:58.000
give it five stars on iTunes,
link |
00:00:59.520
support it on Patreon,
link |
00:01:01.000
or simply connect with me on Twitter at Lex Fridman,
link |
00:01:03.880
spelled F R I D M A N.
link |
00:01:06.880
And now, here's my conversation with Yann LeCun.
link |
00:01:11.760
You said that 2001: A Space Odyssey
link |
00:01:13.840
is one of your favorite movies.
link |
00:01:16.280
HAL 9000 decides to get rid of the astronauts
link |
00:01:20.400
for people who haven't seen the movie, spoiler alert,
link |
00:01:23.080
because he, it, she believes
link |
00:01:27.160
that the astronauts, they will interfere with the mission.
link |
00:01:31.640
Do you see HAL as flawed in some fundamental way
link |
00:01:34.720
or even evil, or did he do the right thing?
link |
00:01:38.480
Neither.
link |
00:01:39.360
There's no notion of evil in that, in that context,
link |
00:01:43.280
other than the fact that people die,
link |
00:01:44.760
but it was an example of what people call
link |
00:01:48.760
value misalignment, right?
link |
00:01:50.160
You give an objective to a machine,
link |
00:01:52.160
and the machine tries to achieve this objective.
link |
00:01:55.720
And if you don't put any constraints on this objective,
link |
00:01:58.160
like don't kill people and don't do things like this,
link |
00:02:02.280
the machine, given the power, will do stupid things
link |
00:02:06.280
just to achieve this, this objective,
link |
00:02:08.040
or damaging things to achieve this objective.
link |
00:02:10.240
It's a little bit like, I mean, we are used to this
link |
00:02:12.480
in the context of human society.
link |
00:02:15.760
We, we put in place laws to prevent people
link |
00:02:21.000
from doing bad things,
link |
00:02:22.160
because spontaneously they would do those bad things, right?
link |
00:02:24.840
So we have to shape their cost function,
link |
00:02:28.400
their objective function, if you want, through laws
link |
00:02:30.160
to kind of correct, and education, obviously,
link |
00:02:33.360
to sort of correct for those.
link |
00:02:36.160
So maybe just pushing a little further on that point.
link |
00:02:41.960
HAL, you know, there's a mission.
link |
00:02:44.360
There's a fuzziness around the ambiguity
link |
00:02:47.640
around what the actual mission is.
link |
00:02:49.800
But, you know, do you think that there will be a time
link |
00:02:55.120
from a utilitarian perspective,
link |
00:02:56.760
when an AI system, where it is not misalignment,
link |
00:02:59.680
where it is alignment for the greater good of society,
link |
00:03:02.840
that an AI system will make decisions that are difficult?
link |
00:03:05.920
Well, that's the trick.
link |
00:03:06.840
I mean, eventually we'll have to figure out how to do this.
link |
00:03:10.840
And again, we're not starting from scratch
link |
00:03:12.640
because we've been doing this with humans for millennia.
link |
00:03:16.480
So designing objective functions for people
link |
00:03:19.160
is something that we know how to do.
link |
00:03:20.880
And we don't do it by, you know, programming things,
link |
00:03:24.600
although the legal code is called code.
link |
00:03:29.040
So that tells you something.
link |
00:03:30.760
And it's actually the design of an objective function.
link |
00:03:33.040
That's really what legal code is, right?
link |
00:03:34.600
It tells you, here is what you can do,
link |
00:03:36.280
here is what you can't do.
link |
00:03:37.440
If you do it, you pay that much,
link |
00:03:39.040
that's an objective function.
link |
00:03:41.680
So there is this idea somehow that it's a new thing
link |
00:03:44.600
for people to try to design objective functions
link |
00:03:46.600
that are aligned with the common good.
link |
00:03:47.960
But no, we've been writing laws for millennia
link |
00:03:49.880
and that's exactly what it is.
link |
00:03:52.080
So that's where, you know,
link |
00:03:54.520
the science of lawmaking and computer science will...
link |
00:04:00.560
Come together.
link |
00:04:01.400
Will come together.
link |
00:04:02.840
So there's nothing special about HAL or AI systems;
link |
00:04:06.760
it's just the continuation of tools used
link |
00:04:09.480
to make some of these difficult ethical judgments
link |
00:04:11.720
that laws make.
link |
00:04:13.000
Yeah, and we have systems like this already
link |
00:04:15.080
that make many decisions for ourselves in society
link |
00:04:20.000
that need to be designed in a way that they...
link |
00:04:22.640
Like, you know, rules about things
link |
00:04:24.200
that sometimes have bad side effects.
link |
00:04:27.520
And we have to be flexible enough about those rules
link |
00:04:29.600
so that they can be broken when it's obvious
link |
00:04:31.600
that they shouldn't be applied.
link |
00:04:34.040
So you don't see this on the camera here,
link |
00:04:35.680
but all the decoration in this room
link |
00:04:36.960
is all pictures from 2001: A Space Odyssey.
link |
00:04:39.760
That's it.
link |
00:04:41.400
Wow, is that by accident?
link |
00:04:43.080
Or is there a lot?
link |
00:04:43.920
The accident is by design.
link |
00:04:47.480
Oh, wow.
link |
00:04:48.480
So if you were to build HAL 10,000,
link |
00:04:52.560
so an improvement of HAL 9000, what would you improve?
link |
00:04:57.080
Well, first of all, I wouldn't ask it
link |
00:04:59.160
to hold secrets and tell lies
link |
00:05:01.960
because that's really what breaks it in the end.
link |
00:05:03.840
That's the fact that it's asking itself questions
link |
00:05:07.160
about the purpose of the mission.
link |
00:05:08.880
And it's, you know, pieces things together
link |
00:05:10.880
that it's heard, you know,
link |
00:05:11.720
all the secrecy of the preparation of the mission
link |
00:05:13.960
and the fact that there was a discovery on the lunar surface
link |
00:05:17.680
that really was kept secret.
link |
00:05:19.120
And one part of HAL's memory knows this
link |
00:05:22.320
and the other part is, does not know it
link |
00:05:24.680
and is supposed to not tell anyone
link |
00:05:26.680
and that creates internal conflict.
link |
00:05:28.560
So you think there never should be a set of things
link |
00:05:32.200
that an AI system should not be allowed,
link |
00:05:36.560
like a set of facts that should not be shared
link |
00:05:39.880
with the human operators?
link |
00:05:42.520
Well, I think, no, I think that,
link |
00:05:44.160
I think it should be a bit like in the design
link |
00:05:47.480
of autonomous AI systems.
link |
00:05:51.960
There should be the equivalent of, you know,
link |
00:05:54.200
the oath, the Hippocratic oath
link |
00:05:59.040
that doctors sign up to, right?
link |
00:06:02.560
So there's certain things, certain rules
link |
00:06:04.040
that you have to abide by.
link |
00:06:05.960
And we can sort of hardwire this into our machines
link |
00:06:09.000
to kind of make sure they don't go.
link |
00:06:11.000
So I'm not, you know, an advocate of the three laws of robotics,
link |
00:06:15.280
you know, the Asimov kind of thing
link |
00:06:17.120
because I don't think it's practical,
link |
00:06:18.560
but, you know, some level of limits.
link |
00:06:23.240
But to be clear, this is not,
link |
00:06:27.000
these are not questions that are kind of really worth asking today
link |
00:06:32.040
because we just don't have the technology to do this.
link |
00:06:34.360
We don't have autonomous intelligent machines.
link |
00:06:36.440
We have intelligent machines.
link |
00:06:37.560
Some are intelligent machines that are very specialized,
link |
00:06:41.000
but they don't really sort of satisfy an objective.
link |
00:06:43.360
They're just, you know, kind of trained to do one thing.
link |
00:06:46.520
So until we have some idea for design
link |
00:06:50.000
of a full-fledged autonomous intelligent system,
link |
00:06:53.360
asking the question of how we design its objective,
link |
00:06:55.680
I think is a little too abstract.
link |
00:06:58.600
It's a little too abstract.
link |
00:06:59.680
There's useful elements to it
link |
00:07:01.600
in that it helps us understand
link |
00:07:04.240
our own ethical codes, humans.
link |
00:07:07.960
So even just as a thought experiment,
link |
00:07:10.240
if you imagine that an AGI system is here today,
link |
00:07:14.280
how would we program it
link |
00:07:15.920
is a kind of nice thought experiment of constructing,
link |
00:07:18.360
how should we have a system of laws for us humans?
link |
00:07:24.360
It's just a nice practical tool.
link |
00:07:26.800
And I think there's echoes of that idea too
link |
00:07:29.760
in the AI systems we have today.
link |
00:07:32.160
They don't have to be that intelligent.
link |
00:07:33.960
Yeah.
link |
00:07:34.800
Like autonomous vehicles.
link |
00:07:35.640
These things start creeping in
link |
00:07:37.760
that they're worth thinking about,
link |
00:07:39.200
but certainly they shouldn't be framed as HAL.
link |
00:07:43.720
Looking back, what is the most,
link |
00:07:46.720
I'm sorry if it's a silly question,
link |
00:07:49.440
but what is the most beautiful
link |
00:07:51.440
or surprising idea in deep learning
link |
00:07:53.800
or AI in general that you've ever come across?
link |
00:07:56.320
So personally, when you sat back,
link |
00:08:00.040
and just had this kind of,
link |
00:08:01.960
oh, that's pretty cool moment.
link |
00:08:03.920
That's nice.
link |
00:08:04.760
That's surprising.
link |
00:08:05.600
I don't know if it's an idea
link |
00:08:06.560
rather than a sort of empirical fact.
link |
00:08:12.200
The fact that you can build gigantic neural nets,
link |
00:08:16.480
train them on relatively small amounts of data relatively
link |
00:08:23.440
with stochastic gradient descent,
link |
00:08:24.840
and that it actually works,
link |
00:08:26.960
breaks everything you read in every textbook, right?
link |
00:08:29.280
Every pre-deep-learning textbook
link |
00:08:31.520
told you, you need to have fewer parameters
link |
00:08:33.920
than you have data samples.
link |
00:08:37.080
If you have a nonconvex objective function,
link |
00:08:38.760
you have no guarantee of convergence.
link |
00:08:40.680
All those things that you read in textbooks,
link |
00:08:42.080
and they tell you, stay away from this,
link |
00:08:43.480
and they're all wrong.
link |
00:08:45.160
Huge number of parameters, nonconvex,
link |
00:08:48.080
and somehow, with very little data relative
link |
00:08:50.320
to the number of parameters,
link |
00:08:53.480
it's able to learn anything.
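Editor's illustration: the empirical fact described above can be reproduced at toy scale. The following is a minimal sketch, not anything from the conversation itself. It assumes a hypothetical one-hidden-layer tanh network with 60 parameters, five made-up data points, and a hand-picked learning rate, all trained with plain stochastic gradient descent using only the Python standard library. The objective is nonconvex and the parameters outnumber the samples twelve to one, yet the training loss still goes down.

```python
import math
import random

random.seed(0)

# Five training points: far fewer samples than parameters.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.sin(3.0 * x) for x in xs]

HIDDEN = 20  # 3 * 20 = 60 trainable parameters for 5 samples
a = [random.gauss(0.0, 2.0) for _ in range(HIDDEN)]  # input weights
b = [random.gauss(0.0, 1.0) for _ in range(HIDDEN)]  # hidden biases
v = [random.gauss(0.0, 0.1) for _ in range(HIDDEN)]  # output weights
n_params = len(a) + len(b) + len(v)


def predict(x):
    """One-hidden-layer network: sum_j v[j] * tanh(a[j] * x + b[j])."""
    return sum(v[j] * math.tanh(a[j] * x + b[j]) for j in range(HIDDEN))


def mean_squared_error():
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mean_squared_error()

LEARNING_RATE = 0.02  # assumed value, small enough to keep the updates stable
for step in range(20000):
    i = random.randrange(len(xs))  # stochastic: one random sample per step
    x, y = xs[i], ys[i]
    err = predict(x) - y           # derivative of 0.5 * err^2 w.r.t. the prediction
    for j in range(HIDDEN):
        h = math.tanh(a[j] * x + b[j])
        grad_pre = err * v[j] * (1.0 - h * h)  # gradient w.r.t. the pre-activation
        v[j] -= LEARNING_RATE * err * h
        a[j] -= LEARNING_RATE * grad_pre * x
        b[j] -= LEARNING_RATE * grad_pre

final_loss = mean_squared_error()
print(n_params, len(xs), initial_loss, final_loss)
```

This only shows that the training fit succeeds despite the textbook warnings. Whether such a network also generalizes well is the harder question, and this sketch does not address it.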
link |
00:08:55.080
Does that still surprise you today?
link |
00:08:57.520
Well, it was kind of obvious to me before I knew anything
link |
00:09:02.000
that this is a good idea.
link |
00:09:04.120
And then it became surprising that it worked
link |
00:09:06.040
because I started reading those textbooks.
link |
00:09:09.240
Okay, so can you talk through the intuition
link |
00:09:12.320
of why it was obvious to you if you remember?
link |
00:09:14.360
Well, okay, so the intuition was,
link |
00:09:16.120
it's sort of like those people in the late 19th century
link |
00:09:19.960
who proved that heavier-than-air flight was impossible, right?
link |
00:09:25.480
And of course you have birds, right?
link |
00:09:26.800
They do fly.
link |
00:09:28.280
And so on the face of it,
link |
00:09:30.320
it's obviously wrong as an empirical question, right?
link |
00:09:33.200
And so we have the same kind of thing that,
link |
00:09:35.960
you know, we know that the brain works.
link |
00:09:38.560
We don't know how, but we know it works.
link |
00:09:39.920
And we know it's a large network of neurons
link |
00:09:42.440
and interaction and that learning takes place
link |
00:09:44.280
by changing the connection.
link |
00:09:45.360
So kind of getting this level of inspiration
link |
00:09:48.000
without copying the details,
link |
00:09:49.320
but sort of trying to derive basic principles.
link |
00:09:52.520
You know, that kind of gives you a clue
link |
00:09:56.800
as to which direction to go.
link |
00:09:58.360
There's also the idea somehow
link |
00:09:59.680
that I've been convinced of since I was an undergrad
link |
00:10:02.080
or even before that, that intelligence
link |
00:10:05.480
is inseparable from learning.
link |
00:10:06.880
So the idea somehow that you can create
link |
00:10:10.040
an intelligent machine by basically programming,
link |
00:10:14.080
for me was a nonstarter, you know, from the start.
link |
00:10:17.440
Every intelligent entity that we know about
link |
00:10:20.280
arrives at this intelligence through learning.
link |
00:10:25.000
So learning, you know, machine learning
link |
00:10:26.280
was a completely obvious path.
link |
00:10:30.000
Also because I'm lazy.
link |
00:10:30.960
So, you know, kind of.
link |
00:10:32.440
These automate basically everything
link |
00:10:35.200
and learning is the automation of intelligence.
link |
00:10:37.920
Right.
link |
00:10:39.240
So do you think, so what is learning then?
link |
00:10:43.000
What falls under learning?
link |
00:10:44.600
Because do you think of reasoning as learning?
link |
00:10:48.320
Well, reasoning is certainly a consequence
link |
00:10:51.320
of learning as well,
link |
00:10:53.320
just like other functions of the brain.
link |
00:10:56.320
The big question about reasoning is,
link |
00:10:58.320
how do you make reasoning compatible
link |
00:11:00.320
with gradient-based learning?
link |
00:11:02.320
Do you think neural networks can be made to reason?
link |
00:11:04.320
Yes, there is no question about that.
link |
00:11:06.320
Again, we have a good example, right?
link |
00:11:10.320
The question is how?
link |
00:11:11.320
So the question is how much prior structure
link |
00:11:13.320
do you have to put in the neural net
link |
00:11:15.320
so that something like human reasoning
link |
00:11:17.320
will emerge from it, you know, from learning?
link |
00:11:21.320
Another question is that all of our kind of models
link |
00:11:24.320
of what reasoning is that are based on logic
link |
00:11:27.320
are discrete and are therefore incompatible
link |
00:11:31.320
with gradient-based learning.
link |
00:11:33.320
And I'm a very strong believer in this idea
link |
00:11:35.320
of gradient-based learning.
link |
00:11:36.320
I don't believe that other types of learning
link |
00:11:39.320
that don't use kind of gradient information
link |
00:11:41.320
if you want.
link |
00:11:42.320
So you don't like discrete mathematics.
link |
00:11:43.320
You don't like anything discrete?
link |
00:11:45.320
Well, that's, it's not that I don't like it.
link |
00:11:47.320
It's just that it's incompatible with learning
link |
00:11:49.320
and I'm a big fan of learning, right?
link |
00:11:51.320
So in fact, that's perhaps one reason why deep learning
link |
00:11:56.320
has been kind of looked at with suspicion
link |
00:11:58.320
by a lot of computer scientists
link |
00:11:59.320
because the math is very different.
link |
00:12:00.320
The math that you use for deep learning,
link |
00:12:02.320
you know, it kind of has more to do with, you know,
link |
00:12:05.320
cybernetics, the kind of math you do
link |
00:12:08.320
in electrical engineering
link |
00:12:09.320
than the kind of math you do in computer science.
link |
00:12:12.320
And, you know, nothing in machine learning is exact, right?
link |
00:12:16.320
Computer science is all about sort of, you know,
link |
00:12:19.320
obsessive compulsive attention to details
link |
00:12:21.320
of like, you know, every index has to be right
link |
00:12:24.320
and you can prove that an algorithm is correct, right?
link |
00:12:26.320
Machine learning is the science of sloppiness, really.
link |
00:12:31.320
That's beautiful.
link |
00:12:33.320
So, okay, maybe let's feel around in the dark
link |
00:12:38.320
of what is a neural network that reasons
link |
00:12:41.320
or a system that works with continuous functions
link |
00:12:47.320
that's able to do, build knowledge.
link |
00:12:52.320
However we think about reasoning,
link |
00:12:54.320
build on previous knowledge, build on extra knowledge,
link |
00:12:57.320
create new knowledge, generalize outside
link |
00:13:00.320
of any training set ever built, what does that look like?
link |
00:13:04.320
If, yeah, maybe do you have inklings of thoughts
link |
00:13:08.320
of what that might look like?
link |
00:13:10.320
Yeah, I mean, yes and no.
link |
00:13:12.320
If I had precise ideas about this,
link |
00:13:14.320
I think, you know, we'd be building it right now.
link |
00:13:16.320
But, and there are people working on this
link |
00:13:18.320
whose main research interest is actually exactly that, right?
link |
00:13:22.320
So, what you need to have is a working memory.
link |
00:13:25.320
So, you need to have some device, if you want,
link |
00:13:29.320
some subsystem that can store a relatively large number
link |
00:13:34.320
of factual, episodic information for, you know,
link |
00:13:38.320
reasonable amount of time.
link |
00:13:40.320
So, you know, in the brain, for example,
link |
00:13:43.320
there are kind of three main types of memory.
link |
00:13:45.320
One is the sort of memory of the state of your cortex.
link |
00:13:52.320
And that sort of disappears within 20 seconds.
link |
00:13:55.320
You can't remember things for more than about 20 seconds
link |
00:13:57.320
or a minute if you don't have any other form of memory.
link |
00:14:01.320
The second type of memory, which is longer term,
link |
00:14:04.320
but still short term, is the hippocampus.
link |
00:14:06.320
So, you can, you know, you came into this building,
link |
00:14:08.320
you remember where the exit is, where the elevators are.
link |
00:14:13.320
You have some map of that building
link |
00:14:15.320
that's stored in your hippocampus.
link |
00:14:17.320
You might remember something about what I said,
link |
00:14:20.320
you know, a few minutes ago.
link |
00:14:21.320
I forgot it already.
link |
00:14:22.320
Of course, it's been erased.
link |
00:14:23.320
But, you know, that would be in your hippocampus.
link |
00:14:27.320
And then the longer term memory is in the synapse.
link |
00:14:30.320
The synapses, right?
link |
00:14:32.320
So, what you need if you want a system
link |
00:14:34.320
that's capable of reasoning is that you want
link |
00:14:36.320
a hippocampus-like thing, right?
link |
00:14:39.320
And that's what people have tried to do
link |
00:14:41.320
with memory networks and, you know,
link |
00:14:43.320
neural Turing machines and stuff like that, right?
link |
00:14:45.320
And now with transformers, which have sort of a memory
link |
00:14:49.320
in their kind of self-attention system.
link |
00:14:51.320
You can think of it this way.
link |
00:14:53.320
So, that's one element you need.
link |
00:14:56.320
Another thing you need is some sort of network
link |
00:14:59.320
that can access this memory,
link |
00:15:04.320
get an information back and then kind of crunch on it
link |
00:15:07.320
and then do this iteratively multiple times
link |
00:15:10.320
because a chain of reasoning is a process
link |
00:15:15.320
by which you can update your knowledge
link |
00:15:19.320
about the state of the world,
link |
00:15:20.320
about, you know, what's going to happen, et cetera.
link |
00:15:22.320
And that has to be this sort of recurrent operation, basically.
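Editor's illustration: the two ingredients just described, a memory that can be read softly and a loop that reads it several times, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the architecture of memory networks, neural Turing machines, or transformers. The memory contents (a hypothetical chain of facts A -> B -> C encoded as one-hot vectors), the key scaling factor of 10, and the trivial "crunch" step (the retrieved vector simply becomes the next query) are all made up for the example.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Soft lookup: weight every slot by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * val[d] for w, val in zip(weights, values)) for d in range(dim)]


def reason(query, keys, values, hops):
    """Repeat the read 'hops' times; each hop is one step in the chain."""
    state = list(query)
    for _ in range(hops):
        # Simplest possible update: the retrieved content becomes the next query.
        state = read_memory(state, keys, values)
    return state


# Hypothetical memory: one-hot codes for A, B, C; each slot stores "what follows".
A, B, C = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
keys = [[10.0 * x for x in A], [10.0 * x for x in B], [10.0 * x for x in C]]
values = [B, C, C]  # A -> B, B -> C, C -> C

one_hop = reason(A, keys, values, hops=1)
two_hops = reason(A, keys, values, hops=2)
print(one_hop.index(max(one_hop)), two_hops.index(max(two_hops)))
```

Starting from A, one read lands mostly on B and a second read lands mostly on C, so the number of hops bounds the length of the chain that can be followed. Every operation here is differentiable, which is what makes this kind of lookup compatible with gradient-based learning.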
link |
00:15:26.320
And you think that kind of, if we think about a transformer,
link |
00:15:30.320
so that seems to be too small to contain the knowledge
link |
00:15:33.320
that's to represent the knowledge that's contained
link |
00:15:37.320
in Wikipedia, for example.
link |
00:15:38.320
Well, a transformer doesn't have this idea of recurrence.
link |
00:15:41.320
It's got a fixed number of layers
link |
00:15:42.320
and that's the number of steps that, you know,
link |
00:15:44.320
limits basically its representation.
link |
00:15:46.320
But recurrence would build on the knowledge somehow.
link |
00:15:50.320
I mean, it would evolve the knowledge
link |
00:15:54.320
and expand the amount of information,
link |
00:15:57.320
perhaps, or useful information within that knowledge.
link |
00:16:00.320
But is this something that just can emerge with size?
link |
00:16:04.320
Because it seems like everything we have now is too small.
link |
00:16:06.320
No, it's not clear.
link |
00:16:09.320
I mean, how you access and write into an associative memory
link |
00:16:12.320
in an efficient way.
link |
00:16:13.320
I mean, sort of the original memory network
link |
00:16:15.320
maybe had something like the right architecture,
link |
00:16:17.320
but if you try to scale up a memory network
link |
00:16:20.320
so that the memory contains all of Wikipedia,
link |
00:16:22.320
it doesn't quite work.
link |
00:16:24.320
So there's a need for new ideas there.
link |
00:16:27.320
But it's not the only form of reasoning.
link |
00:16:29.320
So there's another form of reasoning,
link |
00:16:31.320
which is very classical also in some types of AI,
link |
00:16:36.320
and it's based on, let's call it energy minimization.
link |
00:16:40.320
So you have some sort of objective,
link |
00:16:44.320
some energy function that represents the quality
link |
00:16:50.320
or the negative quality.
link |
00:16:52.320
Energy goes up when things get bad
link |
00:16:54.320
and it gets low when things get good.
link |
00:16:56.320
So let's say you want to figure out what gestures
link |
00:17:00.320
do I need to do to grab an object or walk out the door.
link |
00:17:07.320
If you have a good model of your own body,
link |
00:17:09.320
a good model of the environment,
link |
00:17:11.320
using this kind of energy minimization,
link |
00:17:13.320
you can do planning.
link |
00:17:16.320
And it's in optimal control, it's called model predictive control.
link |
00:17:21.320
You have a model of what's going to happen in the world
link |
00:17:23.320
as a consequence of your actions.
link |
00:17:25.320
And that allows you, by energy minimization, to
link |
00:17:28.320
figure out a sequence of actions
link |
00:17:29.320
that optimizes a particular objective function,
link |
00:17:31.320
which measures the number of times you're going to hit something
link |
00:17:34.320
and the energy you're going to spend doing the gesture, et cetera.
link |
00:17:39.320
So that's a form of reasoning.
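Editor's illustration: planning by energy minimization can be sketched with a toy world model. This is a minimal sketch under stated assumptions, not a real controller. The one-dimensional model (each action shifts the position by that amount), the goal at 1.0, the effort weight of 0.1, the horizon of five steps, and the learning rate are all hypothetical choices. The energy is low when the plan ends near the goal and spends little effort, and the action sequence is found by gradient descent on that energy, with gradients estimated by central finite differences so that the world model stays a black box.

```python
def rollout(start, actions):
    """Toy world model (assumed): each action shifts the position by that amount."""
    position = start
    for u in actions:
        position = position + u
    return position


def energy(actions, start=0.0, goal=1.0, effort_weight=0.1):
    """Low when the plan ends near the goal and spends little effort."""
    miss = rollout(start, actions) - goal
    effort = sum(u * u for u in actions)
    return miss * miss + effort_weight * effort


def plan(horizon=5, steps=200, learning_rate=0.05, eps=1e-5):
    """Gradient descent on the energy over the whole action sequence."""
    actions = [0.0] * horizon
    for _ in range(steps):
        grads = []
        for t in range(horizon):
            up = actions[:t] + [actions[t] + eps] + actions[t + 1:]
            down = actions[:t] + [actions[t] - eps] + actions[t + 1:]
            grads.append((energy(up) - energy(down)) / (2 * eps))
        actions = [u - learning_rate * g for u, g in zip(actions, grads)]
    return actions


planned = plan()
print(planned, rollout(0.0, planned), energy(planned))
```

For this quadratic energy the minimizer can be worked out by hand: every action equals goal / (horizon + effort_weight) = 1 / 5.1, so the plan deliberately stops slightly short of the goal to save effort. In model predictive control proper, only the first planned action would be executed before replanning from the new state; the sketch shows a single planning pass only.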
link |
00:17:42.320
Planning is a form of reasoning.
link |
00:17:43.320
And perhaps what led to the ability of humans to reason
link |
00:17:47.320
is the fact that species that appeared before us
link |
00:17:53.320
had to do some sort of planning to be able to hunt and survive
link |
00:17:56.320
and survive the winter in particular.
link |
00:17:59.320
And so it's the same capacity that you need to have.
link |
00:18:03.320
So in your intuition, if we look at expert systems,
link |
00:18:09.320
and encoding knowledge as logic systems,
link |
00:18:13.320
as graphs in this kind of way,
link |
00:18:16.320
is not a useful way to think about knowledge?
link |
00:18:20.320
Graphs are a little brittle, or logic representations.
link |
00:18:24.320
So basically, variables that have values
link |
00:18:28.320
and then constraints between them that are represented by rules
link |
00:18:31.320
is a little too rigid and too brittle.
link |
00:18:33.320
So some of the early efforts in that respect
link |
00:18:38.320
were to put probabilities on them.
link |
00:18:41.320
So a rule, if you have this and that symptom,
link |
00:18:44.320
you have this disease with that probability
link |
00:18:47.320
and you should prescribe that antibiotic with that probability.
link |
00:18:50.320
That's the MYCIN system from the 70s.
link |
00:18:54.320
And that branch of AI led to Bayesian networks
link |
00:18:59.320
and graphical models and causal inference
link |
00:19:02.320
and variational methods.
link |
00:19:05.320
So there is certainly a lot of interesting work going on
link |
00:19:10.320
in this area.
link |
00:19:11.320
The main issue with this is knowledge acquisition.
link |
00:19:13.320
How do you reduce a bunch of data to a graph of this type?
link |
00:19:19.320
It relies on the expert, on the human being, to encode,
link |
00:19:23.320
to add knowledge.
link |
00:19:24.320
And that's essentially impractical.
link |
00:19:27.320
So that's a big question.
link |
00:19:29.320
The second question is, do you want to represent knowledge
link |
00:19:32.320
as symbols and do you want to manipulate them with logic?
link |
00:19:36.320
And again, that's incompatible with learning.
link |
00:19:38.320
So one suggestion which Geoff Hinton
link |
00:19:42.320
has been advocating for many decades
link |
00:19:44.320
is replace symbols by vectors.
link |
00:19:48.320
Think of it as a pattern of activities
link |
00:19:50.320
in a bunch of neurons or units or whatever you want to call them.
link |
00:19:54.320
And replace logic by continuous functions.
link |
00:19:58.320
And that becomes now compatible.
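Editor's illustration: one way to see why continuous functions make logic "compatible" with learning is to relax truth values from {0, 1} to the interval [0, 1]. This is a minimal sketch, not the specific proposal attributed to Hinton above. It assumes one common relaxation (product for AND, probabilistic sum for OR), and the belief, evidence, target, and learning rate values are made up. At the corners the soft operators agree with Boolean logic; in between they have gradients, so a degree of belief can be adjusted by gradient descent.

```python
def soft_not(a):
    return 1.0 - a


def soft_and(a, b):
    return a * b


def soft_or(a, b):
    return a + b - a * b


def grad_soft_and(a, b):
    """Partial derivatives of soft_and with respect to (a, b)."""
    return b, a


# At the corners 0.0 and 1.0 the soft operators agree with Boolean logic.
for a in (0.0, 1.0):
    assert soft_not(a) == float(not bool(a))
    for b in (0.0, 1.0):
        assert soft_and(a, b) == float(bool(a) and bool(b))
        assert soft_or(a, b) == float(bool(a) or bool(b))

# In between they are smooth, so a belief can be learned from a target output.
belief, evidence, target = 0.2, 0.9, 0.72  # hypothetical values
LEARNING_RATE = 0.5
for _ in range(100):
    output = soft_and(belief, evidence)
    d_belief, _ = grad_soft_and(belief, evidence)
    # Gradient of the squared error (output - target)^2 with respect to belief.
    belief -= LEARNING_RATE * 2.0 * (output - target) * d_belief

print(belief)
```

With a discrete AND there would be no gradient to follow: the output would jump between 0 and 1. Here the belief settles at 0.8, the value for which soft_and(belief, 0.9) matches the target of 0.72.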
link |
00:20:01.320
There's a very good set of ideas
link |
00:20:04.320
written in a paper about 10 years ago
link |
00:20:07.320
by Léon Bottou, who is here at Facebook.
link |
00:20:12.320
The title of the paper is
link |
00:20:14.320
From Machine Learning to Machine Reasoning.
link |
00:20:15.320
And his idea is that a learning system
link |
00:20:19.320
should be able to manipulate objects that are in a space
link |
00:20:22.320
and then put the result back in the same space.
link |
00:20:24.320
So it's this idea of working memory basically.
link |
00:20:27.320
And it's very enlightening.
link |
00:20:30.320
And in a sense, that might learn something
link |
00:20:33.320
like the simple expert systems.
link |
00:20:37.320
I mean, you can learn basic logic operations there.
link |
00:20:41.320
Yeah, quite possibly.
link |
00:20:43.320
There's a big debate on how much prior structure
link |
00:20:46.320
you have to put in for this kind of stuff to emerge.
link |
00:20:48.320
That's the debate I have with Gary Marcus and people like that.
link |
00:20:51.320
Yeah, so and the other person,
link |
00:20:54.320
so I just talked to Judea Pearl
link |
00:20:57.320
and he mentioned causal inference world.
link |
00:21:00.320
So his worry is that the current neural networks
link |
00:21:04.320
are not able to learn what causes
link |
00:21:09.320
what causal inference between things.
link |
00:21:12.320
So I think he's right and wrong about this.
link |
00:21:15.320
If he's talking about the sort of classic type of neural nets,
link |
00:21:21.320
people sort of didn't worry too much about this.
link |
00:21:23.320
But there's a lot of people now working on causal inference.
link |
00:21:26.320
There's a paper that just came out last week
link |
00:21:28.320
by Léon Bottou, among others,
link |
00:21:29.320
David Lopez-Paz and a bunch of other people.
link |
00:21:32.320
Exactly on that problem of how do you kind of,
link |
00:21:36.320
you know, get a neural net to sort of pay attention
link |
00:21:39.320
to real causal relationships,
link |
00:21:41.320
which may also solve issues of bias in data
link |
00:21:46.320
and things like this.
link |
00:21:48.320
I'd like to read that paper because that ultimately
link |
00:21:51.320
challenges also seems to fall back on the human expert
link |
00:21:56.320
to ultimately decide causality between things.
link |
00:22:01.320
People are not very good at establishing causality, first of all.
link |
00:22:04.320
So first of all, you talk to physicists
link |
00:22:06.320
and physicists actually don't believe in causality
link |
00:22:08.320
because look, all the basic laws of microphysics
link |
00:22:12.320
are time reversible, so there's no causality.
link |
00:22:15.320
The arrow of time is not real.
link |
00:22:17.320
It's as soon as you start looking at macroscopic systems
link |
00:22:20.320
where there is unpredictable randomness
link |
00:22:22.320
where there is clearly an arrow of time,
link |
00:22:25.320
but it's a big mystery in physics, actually, how that emerges.
link |
00:22:28.320
Is it emergent or is it part of the fundamental fabric of reality?
link |
00:22:34.320
Or is it a bias of intelligent systems
link |
00:22:36.320
that, you know, because of the second law of thermodynamics,
link |
00:22:39.320
we perceive a particular arrow of time,
link |
00:22:41.320
but in fact, it's kind of arbitrary, right?
link |
00:22:44.320
So yeah, physicists, mathematicians, they don't care about,
link |
00:22:47.320
I mean, the math doesn't care about the flow of time.
link |
00:22:51.320
Well, certainly microphysics doesn't.
link |
00:22:53.320
People themselves are not very good at establishing causal relationships.
link |
00:22:58.320
If you ask, I think it was in one of Seymour Papert's books
link |
00:23:02.320
on, like, children learning.
link |
00:23:06.320
You know, he studied with Jean Piaget.
link |
00:23:08.320
He's the guy who coauthored the book Perceptrons with Marvin Minsky
link |
00:23:12.320
that kind of killed the first wave of neural nets.
link |
00:23:14.320
But he was actually a learning person.
link |
00:23:17.320
He, in the sense of studying learning in humans and machines.
link |
00:23:22.320
That's why he got interested in the perceptron.
link |
00:23:24.320
And he wrote that if you ask a little kid about what is the cause of the wind,
link |
00:23:33.320
a lot of kids will say, they will think for a while and they will say,
link |
00:23:36.320
oh, it's the branches in the trees.
link |
00:23:38.320
They move and that creates wind, right?
link |
00:23:40.320
So they get the causal relationship backwards.
link |
00:23:42.320
And it's because their understanding of the world and intuitive physics,
link |
00:23:45.320
It's not that great, right?
link |
00:23:46.320
I mean, these are like, you know, four or five year old kids.
link |
00:23:49.320
You know, it gets better and then you understand that this, it can't be, right?
link |
00:23:53.320
But there are many things which we can, because of our common sense understanding of things,
link |
00:24:00.320
what people call common sense.
link |
00:24:02.320
Yeah.
link |
00:24:03.320
And our understanding of physics.
link |
00:24:05.320
We can, there's a lot of stuff that we can figure out causality, even with diseases.
link |
00:24:09.320
We can figure out what's not causing what often.
link |
00:24:13.320
There's a lot of mystery, of course, but the idea is that you should be able to encode that into systems.
link |
00:24:19.320
Because it seems unlikely they'd be able to figure that out themselves.
link |
00:24:22.320
Well, whenever we can do intervention, but you know, all of humanity has been completely deluded
link |
00:24:26.320
for millennia, probably since existence, about a very, very wrong causal relationship
link |
00:24:32.320
where whatever you can't explain, you attribute to, you know, some deity, some divinity, right?
link |
00:24:38.320
And that's a cop-out.
link |
00:24:40.320
That's a way of saying like, I don't know the cause.
link |
00:24:42.320
So, you know, God did it, right?
link |
00:24:44.320
So you mentioned Marvin Minsky and the irony of, you know, maybe causing the first AI winter.
link |
00:24:54.320
You were there in the 90s.
link |
00:24:56.320
You were there in the 80s, of course.
link |
00:24:58.320
In the 90s, why do you think people lost faith in deep learning in the 90s
link |
00:25:02.320
and found it again a decade later, over a decade later?
link |
00:25:06.320
Yeah.
link |
00:25:07.320
Deep learning, yeah, it was just called neural nets.
link |
00:25:09.320
Neural networks.
link |
00:25:11.320
Yeah, they lost interest.
link |
00:25:13.320
I mean, I think I would put that around 1995, at least the machine learning community.
link |
00:25:18.320
There was always a neural net community, but it became kind of disconnected from sort of mainstream machine learning if you want.
link |
00:25:28.320
There were, it was basically electrical engineering that kept at it.
link |
00:25:32.320
Right.
link |
00:25:33.320
And computer science.
link |
00:25:35.320
Just gave up.
link |
00:25:36.320
Neural nets.
link |
00:25:37.320
I don't, I don't know.
link |
00:25:39.320
You know, I was too close to it to really sort of analyze it with sort of an unbiased eye if you want.
link |
00:25:47.320
But I would, I would, I would make a few guesses.
link |
00:25:50.320
So the first one is at the time neural nets were, it was very hard to make them work in a sense that you would, you know, implement backprop in your favorite language.
link |
00:26:03.320
And that favorite language was not Python.
link |
00:26:06.320
It was not MATLAB.
link |
00:26:07.320
It was not any of those things because they didn't exist.
link |
00:26:10.320
Right.
link |
00:26:11.320
You had to write it in Fortran or C or something like this.
link |
00:26:14.320
Right.
link |
00:26:15.320
So you would experiment with it.
link |
00:26:18.320
You would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in the textbook, you know, you don't want too many parameters.
link |
00:26:26.320
Right.
link |
00:26:27.320
And of course, you know, and you would train on XOR because you didn't have any other data set to train on.
link |
00:26:31.320
And of course, you know, it works half the time.
link |
00:26:33.320
So you would say, I give up.
link |
00:26:35.320
Also, you would train it with batch gradient, which, you know, isn't that efficient.
link |
00:26:39.320
So there was a whole bag of tricks that you had to know to make those things work or you had to reinvent.
link |
00:26:46.320
And a lot of people just didn't and they just couldn't make it work.
link |
00:26:50.320
So that's one thing.
link |
00:26:52.320
The investment in software platform to be able to kind of, you know, display things, figure out why things don't work, kind of get a good intuition for how to get them to work, have enough flexibility so you can create, you know, network architectures like convolutional nets and stuff like that.
link |
00:27:08.320
It was hard.
link |
00:27:09.320
I mean, you had to write everything from scratch.
link |
00:27:10.320
And again, you didn't have any Python or MATLAB or anything.
link |
00:27:13.320
Right.
link |
00:27:14.320
I read that, sorry to interrupt, but I read that you wrote in Lisp your first versions of LeNet with the convolutional networks, which by the way, one of my favorite languages.
link |
00:27:25.320
That's how I knew you were legit.
link |
00:27:27.320
Turing Award, whatever.
link |
00:27:29.320
You programmed in Lisp.
link |
00:27:31.320
It's still my favorite language.
link |
00:27:32.320
But it's not that we programmed in Lisp.
link |
00:27:35.320
It's that we had to write a Lisp interpreter.
link |
00:27:37.320
Okay.
link |
00:27:38.320
Because it's not like we used one that existed.
link |
00:27:40.320
So we wrote a Lisp interpreter that we hooked up to, you know, a back end library that we wrote also for sort of neural net computation.
link |
00:27:48.320
And then after a few years around 1991, we invented this idea of basically having modules that know how to forward propagate and back propagate gradients and then interconnecting those modules in a graph.
link |
00:28:01.320
Léon Bottou had made proposals about this in the late 80s, and we were able to implement this using our Lisp system.
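The module idea described here can be sketched in a few lines. This is only an illustration in plain Python, not the original Lisp system: the class names `Linear`, `Tanh`, and `Sequential` are hypothetical, the values are scalars for brevity, and the "graph" is the simplest possible one, a chain.

```python
import math

class Linear:
    """y = w * x + b on scalars; remembers its input for the backward pass."""
    def __init__(self, w, b):
        self.w, self.b = w, b
        self.grad_w = self.grad_b = 0.0

    def forward(self, x):
        self.x = x
        return self.w * x + self.b

    def backward(self, grad_out):
        # Gradients for this module's parameters...
        self.grad_w = grad_out * self.x
        self.grad_b = grad_out
        # ...and the gradient handed back to whatever module came before.
        return grad_out * self.w

class Tanh:
    def forward(self, x):
        self.y = math.tanh(x)
        return self.y

    def backward(self, grad_out):
        return grad_out * (1.0 - self.y ** 2)

class Sequential:
    """A chain of modules: forward in order, backward in reverse order."""
    def __init__(self, *modules):
        self.modules = modules

    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x

    def backward(self, grad_out):
        for m in reversed(self.modules):
            grad_out = m.backward(grad_out)
        return grad_out

net = Sequential(Linear(0.5, 0.1), Tanh(), Linear(-1.5, 0.0))
out = net.forward(2.0)
grad_in = net.backward(1.0)  # derivative of the output with respect to the input
```

Each module only needs to know its own forward and backward rule; the container takes care of the ordering, which is what makes it possible to wire modules together freely.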
link |
00:28:11.320
Eventually, we wanted to use that system to build production code for character recognition at Bell Labs.
link |
00:28:14.320
So we actually wrote a compiler for that Lisp interpreter. Patrice Simard, who is now at Microsoft, kind of did the bulk of it with Léon and me.
link |
00:28:22.320
And so we could write our system in Lisp and then compile to C, and then we'd have a self-contained, complete system that could kind of do the entire thing.
link |
00:28:33.320
Neither PyTorch nor TensorFlow can do this today.
link |
00:28:36.320
Yeah.
link |
00:28:37.320
Okay.
link |
00:28:38.320
It's coming.
link |
00:28:39.320
Yeah.
link |
00:28:40.320
I mean, there's something like that in PyTorch called, you know, TorchScript.
link |
00:28:44.320
And so, you know, we had to write a Lisp interpreter, we had to write a Lisp compiler, we had to invest a huge amount of effort to do this.
link |
00:28:50.320
And not everybody, if you don't completely believe in the concept, you're not going to invest the time to do this.
link |
00:28:56.320
Right.
link |
00:28:57.320
Now, at the time also, you know, or today, this would turn into Torch or PyTorch or TensorFlow or whatever.
link |
00:29:03.320
We'd put it in open source, everybody would use it and, you know, realize it's good.
link |
00:29:07.320
Back before 1995, working at AT&T, there's no way the lawyers would let you release anything in open source of this nature.
link |
00:29:17.320
And so we could not distribute our code, really.
link |
00:29:20.320
And on that point, and sorry to go on a million tangents, but on that point, I also read that there was almost, like, a patent on convolutional networks.
link |
00:29:29.320
Yes, there was.
link |
00:29:31.320
So that, first of all, I mean, just.
link |
00:29:35.320
There were two, actually.
link |
00:29:37.320
That ran out.
link |
00:29:39.320
Thankfully, in 2007.
link |
00:29:41.320
In 2007.
link |
00:29:44.320
What, can we, can we just talk about that first?
link |
00:29:48.320
I know you're at Facebook, but you're also at NYU.
link |
00:29:50.320
And what does it mean to patent ideas like these software ideas, essentially?
link |
00:29:58.320
Or what are mathematical ideas?
link |
00:30:01.320
Or what are they?
link |
00:30:03.320
Okay.
link |
00:30:04.320
So they're not mathematical ideas.
link |
00:30:05.320
So they are, you know, algorithms.
link |
00:30:07.320
And there was a period where the US patent office would allow the patent of software as long as it was embodied.
link |
00:30:15.320
The Europeans are very different.
link |
00:30:18.320
They don't, they don't quite accept that they have a different concept.
link |
00:30:21.320
But, you know, I don't, I no longer, I mean, I never actually strongly believed in this, but I don't believe in this kind of patent.
link |
00:30:28.320
Facebook basically doesn't believe in this kind of patent.
link |
00:30:33.320
Google files patents because they've been burned with Apple.
link |
00:30:39.320
And so now they do this for defensive purpose.
link |
00:30:41.320
But usually they say, we're not going to sue you if you infringe.
link |
00:30:44.320
Facebook has a, has a similar policy.
link |
00:30:47.320
They say, you know, we have a patent on certain things for defensive purpose.
link |
00:30:50.320
We're not going to sue you if you infringe, unless you sue us.
link |
00:30:54.320
So the, the industry does not believe in, in patents.
link |
00:30:59.320
They're there because of, you know, the legal landscape and, and, and various things.
link |
00:31:03.320
But, but I don't really believe in patents for this kind of stuff.
link |
00:31:07.320
Okay. So that's, that's a great thing.
link |
00:31:09.320
So I'll tell you a worse story.
link |
00:31:11.320
Yeah.
link |
00:31:12.320
So what happened was the first patent about convolutional nets was about kind of the early version of convolutional net that didn't have separate pooling layers.
link |
00:31:19.320
It had, you know, convolutional layers with stride more than one, if you want, right?
link |
00:31:24.320
And then there was a second one on convolutional nets with separate pooling layers, trained with backprop.
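The difference between the two designs mentioned here, a convolution with stride greater than one versus a stride-one convolution followed by a separate pooling layer, can be sketched in one dimension. This is an illustration only, not the patented systems: the function names are hypothetical, the "convolution" is a plain sliding dot product, and average pooling is an assumed choice of pooling operation.

```python
def conv1d(signal, kernel, stride=1):
    """Sliding dot product of kernel over signal, moving `stride` steps at a time."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def avg_pool1d(signal, size=2):
    """Average over non-overlapping windows of `size` values."""
    return [sum(signal[i:i + size]) / size
            for i in range(0, len(signal) - size + 1, size)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
k = [1, 0, -1]

# Design 1: the convolution itself subsamples (stride 2), no pooling layer.
strided = conv1d(x, k, stride=2)

# Design 2: stride-1 convolution, then a separate pooling layer subsamples.
pooled = avg_pool1d(conv1d(x, k, stride=1), size=2)
```

Both routes reduce the resolution by the same factor; they differ in whether the subsampling is folded into the convolution or done by a separate layer.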
link |
00:31:31.320
And they were filed in 89 and 1990 or something like this.
link |
00:31:35.320
At the time, the life, life of a patent was 17 years.
link |
00:31:39.320
So here's what happened over the next few years is that we started developing character recognition technology around convolutional nets.
link |
00:31:48.320
And in 1994, a check reading system was deployed in ATM machines.
link |
00:31:55.320
In 1995, it was for large check reading machines in back offices, et cetera.
link |
00:32:00.320
And those systems were developed by an engineering group that we were collaborating with at AT&T, and they were commercialized by NCR,
link |
00:32:08.320
which at the time was a subsidiary of AT&T.
link |
00:32:11.320
Now AT&T split up in 1996, early 1996.
link |
00:32:18.320
And the lawyers just looked at all the patents and they distributed the patents among the various companies.
link |
00:32:22.320
They gave the convolutional net patent to NCR because they were actually selling products that used it.
link |
00:32:28.320
But nobody at NCR had any idea what a convolutional net was.
link |
00:32:31.320
Yeah.
link |
00:32:32.320
Okay.
link |
00:32:33.320
So between 1996 and 2007, there's a whole period until 2002 where I didn't actually work on
link |
00:32:40.320
machine learning or convolutional net.
link |
00:32:42.320
I resumed working on this around 2002.
link |
00:32:45.320
And between 2002 and 2007, I was working on them, crossing my fingers that nobody at NCR would notice, and nobody noticed.
link |
00:32:51.320
Yeah.
link |
00:32:52.320
And I hope that this kind of somewhat, as you said, lawyers aside, relative openness of the community now will continue.
link |
00:33:02.320
It accelerates the entire progress of the industry.
link |
00:33:05.320
And the problems that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the other.
link |
00:33:17.320
It's that we don't have the technology to build these things we want to build.
link |
00:33:20.320
We want to build intelligent virtual assistants that have common sense.
link |
00:33:24.320
We don't have a monopoly on good ideas for this.
link |
00:33:26.320
We don't believe we do.
link |
00:33:27.320
Maybe others do believe they do, but we don't.
link |
00:33:30.320
Okay.
link |
00:33:31.320
If a startup tells you they have a secret to human level intelligence and common sense, don't believe them.
link |
00:33:37.320
They don't.
link |
00:33:38.320
And it's going to take the entire work of the world research community for a while to get to the point where you can go off and in each of those companies can start to build things on this.
link |
00:33:50.320
We're not there yet.
link |
00:33:51.320
Absolutely.
link |
00:33:52.320
And this calls to the gap between the space of ideas and the rigorous testing of those ideas of practical application that you often speak to.
link |
00:34:03.320
You've written advice saying, don't get fooled by people who claim to have a solution to artificial general intelligence who claim to have an AI system that works just like the human brain or who claim to have figured out how the brain works.
link |
00:34:17.320
Ask them what error rate they get on MNIST or ImageNet.
link |
00:34:23.320
This is a little dated, by the way.
link |
00:34:25.320
$2,000.
link |
00:34:26.320
I mean, five years.
link |
00:34:27.320
Who's counting?
link |
00:34:28.320
Okay.
link |
00:34:29.320
But I think your opinion is the MNIST and ImageNet.
link |
00:34:33.320
Yes, maybe dated.
link |
00:34:35.320
There may be new benchmarks, right?
link |
00:34:36.320
But I think that philosophy is one you still somewhat hold, that benchmarks and the practical testing, the practical application is where you really get to test the ideas.
link |
00:34:47.320
Well, it may not be completely practical.
link |
00:34:49.320
Like, for example, you know, it could be a toy data set, but it has to be some sort of task that the community as a whole has accepted as some sort of standard, you know, kind of benchmark if you want.
link |
00:35:00.320
It doesn't need to be real.
link |
00:35:01.320
So for example, many years ago here at FAIR, people, you know, Jason Weston and Antoine Bordes and a few others proposed the bAbI tasks, which were kind of a toy problem to test the ability of machines to reason, actually to access working memory and things like this.
link |
00:35:17.320
And it was very useful, even though it wasn't a real task.
link |
00:35:20.320
MNIST is kind of halfway a real task.
link |
00:35:23.320
So, you know, toy problems can be very useful.
link |
00:35:26.320
I guess that I was really struck by the fact that a lot of people, particularly a lot of people with money to invest would be fooled by people telling them, oh, we have, you know, the algorithm of the cortex and you should give us 50 million.
link |
00:35:39.320
Yes, absolutely.
link |
00:35:40.320
So there's a lot of people who try to take advantage of the hype for business reasons and so on.
link |
00:35:48.320
But let me sort of talk to this idea that new ideas, the ideas that push the field forward may not yet have a benchmark or it may be very difficult to establish a benchmark.
link |
00:36:00.320
I agree.
link |
00:36:01.320
That's part of the process.
link |
00:36:02.320
Establishing benchmarks is part of the process.
link |
00:36:04.320
So what are your thoughts about, so we have these benchmarks on around stuff we can do with images from classification to captioning to just every kind of information you can pull off from images and the surface level.
link |
00:36:18.320
There's audio data set.
link |
00:36:20.320
There's some video.
link |
00:36:22.320
What can we start natural language?
link |
00:36:25.320
What kind of stuff, what kind of benchmarks do you see that start creeping on to more something like intelligence, like reasoning, like maybe you don't like the term but AGI echoes of that kind of formulation.
link |
00:36:41.320
A lot of people are working on interactive environments in which you can train and test intelligent systems.
link |
00:36:48.320
So there, for example, you know, the classical paradigm of supervised learning is that you have a data set, you partition it into a training set, validation set, test set, and there's a clear protocol, right.
link |
00:37:02.320
But that assumes that the samples are statistically independent, you can exchange them, the order in which you see them shouldn't matter, you know, things like that.
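The classical protocol just described can be sketched as follows. This is a minimal illustration: the function name `split_dataset` and the 80/10/10 proportions are assumptions chosen for the example, not something stated in the conversation.

```python
import random

def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into train / validation / test partitions."""
    rng = random.Random(seed)
    shuffled = list(samples)
    # Shuffling is only legitimate if the samples are exchangeable,
    # i.e. the order in which they were collected carries no information.
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```

The shuffle is the step that quietly relies on independence: once an agent's own actions decide which sample comes next, there is no fixed pool of exchangeable samples to cut up this way.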
link |
00:37:13.320
But what if the answer you give determines the next sample you see, which is the case, for example, in robotics, right, your robot does something and then it gets exposed to a new room.
link |
00:37:23.320
And depending on where it goes, the room would be different.
link |
00:37:26.320
So that creates the exploration problem.
link |
00:37:30.320
So that also creates a dependency between samples, right? If you can only move in space, the next sample you're going to see is going to be probably in the same building, most likely.
link |
00:37:44.320
So all the assumptions about the validity of this training set, test set hypothesis break whenever a machine can take an action that has an influence on the world and on what it's going to see.
link |
00:37:56.320
So people are setting up artificial environments where that takes place, right, the robot runs around a 3D model of a house and can interact with objects and things like this.
link |
00:38:08.320
So you do robotics by simulation, you have those, you know, OpenAI Gym type things or MuJoCo kind of simulated robots, and you have games, you know, things like that.
link |
00:38:20.320
So that's where the field is going, really, this kind of environment.
link |
00:38:25.320
Now, back to the question of AGI, like, I don't like the term AGI, because it implies that human intelligence is general.
link |
00:38:35.320
And human intelligence is nothing like general, it's very, very specialized.
link |
00:38:40.320
We think it's general, we'd like to think of ourselves as having general intelligence, we don't, we're very specialized.
link |
00:38:45.320
We're only slightly more general.
link |
00:38:47.320
Why does it feel general?
link |
00:38:48.320
So you kind of the term general.
link |
00:38:51.320
I think what's impressive about humans is the ability to learn, as we were talking about learning, to learn in just so many different domains. It's perhaps not arbitrarily general, but you can learn in many domains and integrate that knowledge somehow.
link |
00:39:07.320
Okay.
link |
00:39:08.320
The knowledge persists.
link |
00:39:09.320
So let me take a very specific example.
link |
00:39:11.320
Yes.
link |
00:39:12.320
It's not an example.
link |
00:39:13.320
It's more like a quasi mathematical demonstration.
link |
00:39:16.320
So you have about one million fibers coming out of one of your eyes, okay, two million total, but let's, let's talk about just one of them.
link |
00:39:22.320
It's one million nerve fibers, your optical nerve.
link |
00:39:26.320
Let's imagine that they are binary, so they can be active or inactive, right?
link |
00:39:30.320
So the input to your visual cortex is one million bits.
link |
00:39:36.320
Now they're connected to your brain in a particular way, and your brain has connections that are kind of a little bit like a convolutional net: they're kind of local, you know, in space and things like this.
link |
00:39:47.320
Now imagine I play a trick on you.
link |
00:39:50.320
It's a pretty nasty trick, I admit.
link |
00:39:52.320
I cut your optical nerve and I put a device that makes a random permutation of all the nerve fibers.
link |
00:40:00.320
So now what comes to your brain is a fixed but random permutation of all the pixels.
link |
00:40:08.320
There's no way in hell that your visual cortex, even if I do this to you in infancy, will actually learn vision to the same level of quality that you can.
link |
00:40:19.320
Got it.
link |
00:40:20.320
And you're saying there's no way you've learned that?
link |
00:40:22.320
No, because now two pixels that are nearby in the world will end up in very different places in your visual cortex.
link |
00:40:28.320
And your neurons there have no connections with each other because they're only connected locally.
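The locality argument can be made concrete with a small experiment. This is a sketch under stated assumptions: a 32 by 32 grid stands in for the optic nerve's million fibers, the seed is arbitrary, and the function name `mean_neighbor_distance` is hypothetical.

```python
import random

def mean_neighbor_distance(side, perm):
    """Average distance, after applying perm, between pixels that were horizontal neighbors."""
    total, count = 0.0, 0
    for r in range(side):
        for c in range(side - 1):
            a = perm[r * side + c]        # new index of pixel (r, c)
            b = perm[r * side + c + 1]    # new index of its right-hand neighbor
            (ra, ca), (rb, cb) = divmod(a, side), divmod(b, side)
            total += ((ra - rb) ** 2 + (ca - cb) ** 2) ** 0.5
            count += 1
    return total / count

side = 32
identity = list(range(side * side))

# A fixed but random rewiring of every "fiber".
rng = random.Random(0)
shuffled = identity[:]
rng.shuffle(shuffled)

before = mean_neighbor_distance(side, identity)   # neighbors stay neighbors
after = mean_neighbor_distance(side, shuffled)    # neighbors land far apart
```

With the identity wiring, neighboring pixels are exactly one unit apart; after the permutation they are, on average, separated by a large fraction of the grid, so a detector that only looks at a small local patch never sees them together.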
link |
00:40:33.320
So this whole, our entire, the hardware is built in many ways to support.
link |
00:40:38.320
The locality of the real world?
link |
00:40:39.320
Yeah.
link |
00:40:40.320
Yes.
link |
00:40:41.320
That's specialization.
link |
00:40:42.320
Yeah, but it's still pretty damn impressive.
link |
00:40:44.320
So it's not perfect generalization.
link |
00:40:46.320
It's not even close.
link |
00:40:47.320
No, no.
link |
00:40:48.320
It's not that it's not even close.
link |
00:40:50.320
It's not at all.
link |
00:40:51.320
Yeah, it's not.
link |
00:40:52.320
So how many Boolean functions?
link |
00:40:54.320
Let's imagine you want to train your visual system to recognize particular patterns of those one million bits.
link |
00:41:03.320
So that's a Boolean function.
link |
00:41:05.320
Either the pattern is here or not here.
link |
00:41:07.320
It's a two way classification with one million binary inputs.
link |
00:41:13.320
How many such Boolean functions are there?
link |
00:41:16.320
You have two to the one million combinations of inputs.
link |
00:41:21.320
For each of those, you have an output bit.
link |
00:41:24.320
And so you have two to the two to the one million Boolean functions of this type.
link |
00:41:29.320
Okay.
link |
00:41:30.320
Which is an unimaginably large number.
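The counting argument can be checked by brute force for tiny input sizes. The sketch below is an illustration only; the function name `count_boolean_functions` is hypothetical. It enumerates every truth table for up to three input bits and then estimates how large the exponent alone becomes for one million bits.

```python
from itertools import product
import math

def count_boolean_functions(n):
    """Count the truth tables over n binary inputs by enumerating them."""
    num_input_patterns = 2 ** n   # one row of the truth table per input pattern
    # Each function assigns one output bit to every row.
    return sum(1 for _ in product((0, 1), repeat=num_input_patterns))

for n in range(4):
    assert count_boolean_functions(n) == 2 ** (2 ** n)

# For n = 1,000,000 the count is 2 ** (2 ** 1_000_000).
# Even the exponent, 2 ** 1_000_000, is far too large to write out:
exponent_digits = math.floor(1_000_000 * math.log10(2)) + 1
```

So the number of candidate functions is 2 raised to a power that itself has about three hundred thousand decimal digits.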
link |
00:41:33.320
How many of those functions can actually be computed by your visual cortex?
link |
00:41:37.320
And the answer is a tiny, tiny, tiny, tiny, tiny, tiny sliver.
link |
00:41:41.320
Like an enormously tiny sliver.
link |
00:41:43.320
Yeah.
link |
00:41:44.320
Yeah.
link |
00:41:45.320
So we are ridiculously specialized.
link |
00:41:48.320
Okay.
link |
00:41:51.320
That's an argument against the word general.
link |
00:41:54.320
I agree with your intuition, but I'm not sure it seems the brain is impressively capable of adjusting to things.
link |
00:42:09.320
It's because we can't imagine tasks that are outside of our comprehension.
link |
00:42:16.320
So we think we are general because we're general of all the things that we can apprehend.
link |
00:42:20.320
So yeah.
link |
00:42:21.320
But there is a huge world out there of things that we have no idea.
link |
00:42:24.320
We call that heat, by the way.
link |
00:42:26.320
Heat.
link |
00:42:27.320
Heat.
link |
00:42:28.320
So at least physicists call that heat or they call it entropy, which is kind of...
link |
00:42:33.320
You have a thing full of gas, right?
link |
00:42:39.320
Closed system of gas.
link |
00:42:40.320
Right?
link |
00:42:41.320
Closed or not closed.
link |
00:42:42.320
It has, you know, pressure, it has temperature, it has, you know, and you can write equations,
link |
00:42:51.320
PV = nRT, you know, things like that, right?
link |
00:42:55.320
When you reduce the volume, the temperature goes up, the pressure goes up, you know, things like that, right?
link |
00:43:00.320
For perfect gas, at least.
link |
00:43:02.320
Those are the things you can know about that system.
link |
00:43:05.320
And it's a tiny, tiny number of bits compared to the complete information of the state of the entire system.
link |
00:43:10.320
Because the state of the entire system will give you the position and momentum of every molecule of the gas.
link |
00:43:17.320
And what you don't know about it is the entropy and you interpret it as heat.
link |
00:43:23.320
The energy contained in that thing is what we call heat.
link |
00:43:27.320
Now, it's very possible that, in fact, there is some very strong structure in how those molecules are moving.
link |
00:43:34.320
It's just that they are in a way that we are just not wired to perceive.
link |
00:43:38.320
Yeah, we're ignorant of it.
link |
00:43:39.320
And there's an infinite amount of things we're not wired to perceive.
link |
00:43:44.320
Yeah.
link |
00:43:45.320
And you're right, that's a nice way to put it.
link |
00:43:47.320
We're general to all the things we can imagine, which is a very tiny subset of all the things that are possible.
link |
00:43:54.320
So it's like Kolmogorov complexity, or the Kolmogorov-Chaitin-Solomonoff complexity.
link |
00:43:58.320
Yeah.
link |
00:43:59.320
You know, every bit string or every integer is random, except for all the ones that you can actually write down.
link |
00:44:07.320
Yeah, okay, so beautiful, but, you know, so we can just call it artificial intelligence.
link |
00:44:15.320
We don't need to have a general.
link |
00:44:17.320
Or human level.
link |
00:44:18.320
Human level intelligence is good.
link |
00:44:20.320
You know, you'll start, anytime you touch human, it gets interesting because, you know, it's because we attach ourselves to human
link |
00:44:33.320
and it's difficult to define what human intelligence is.
link |
00:44:36.320
Nevertheless, my definition is maybe a damn impressive intelligence.
link |
00:44:42.320
Okay, damn impressive demonstration of intelligence, whatever.
link |
00:44:46.320
And so on that topic, most successes in deep learning have been in supervised learning.
link |
00:44:53.320
What is your view on unsupervised learning?
link |
00:44:57.320
Is there a hope to reduce involvement of human input and still have successful systems that have practical use?
link |
00:45:07.320
Yeah, I mean, there's definitely a hope.
link |
00:45:09.320
It's more than a hope, actually.
link |
00:45:11.320
It's, you know, mounting evidence for it.
link |
00:45:13.320
And that's basically all I do.
link |
00:45:15.320
Like the only thing I'm interested in at the moment is, I call it self-supervised learning, not unsupervised.
link |
00:45:20.320
Because unsupervised learning is a loaded term.
link |
00:45:25.320
People who know something about machine learning, you know, tell you, so you're doing clustering or PCA, which is not the case.
link |
00:45:31.320
And the wider public, you know, when you say unsupervised learning, oh my God, you know, machines are going to learn by themselves and without supervision.
link |
00:45:37.320
You know, they see this as...
link |
00:45:39.320
Where's the parents?
link |
00:45:41.320
Yeah, so I call it self-supervised learning because, in fact, the underlying algorithms that are used are the same algorithms as the supervised learning algorithms.
link |
00:45:49.320
Except that what we train them to do is not predict a particular set of variables, like the category of an image.
link |
00:45:59.320
And not to predict a set of variables that have been provided by human labelers.
link |
00:46:05.320
But what you're training the machine to do is basically reconstruct a piece of its input that is being...
link |
00:46:11.320
It's being masked out, essentially. You can think of it this way, right?
link |
00:46:15.320
So show a piece of video to a machine and ask it to predict what's going to happen next.
link |
00:46:20.320
And of course, after a while, you can show what happens and the machine will kind of train itself to do better at that task.
link |
00:46:28.320
You can do, like all the latest, most successful models in natural language processing, use self supervised learning.
link |
00:46:35.320
You know, sort of BERT-style systems, for example, right?
link |
00:46:38.320
You show it a window of a dozen words on a text corpus.
link |
00:46:43.320
You take out 15% of the words and then you train the machine to predict the words that are missing.
link |
00:46:51.320
That's self supervised learning. It's not predicting the future, it's just predicting things in the middle.
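The masking step described here can be sketched in a few lines. This is a simplified illustration, not the actual BERT preprocessing: the function name `mask_tokens`, the `[MASK]` placeholder string, and whitespace tokenization are assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_fraction=0.15, seed=0):
    """Hide a fraction of the tokens; return the corrupted input and the targets."""
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = corrupted[i]   # the label comes from the text itself
        corrupted[i] = MASK
    return corrupted, targets

sentence = "the cat sat on the mat and looked out of the window today".split()
corrupted, targets = mask_tokens(sentence)
```

No human labeler is involved: the training targets are simply the words that were removed, which is the sense in which the supervision comes from the data itself.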
link |
00:46:56.320
But you could have it predict the future. That's what language models do.
link |
00:46:59.320
So in an unsupervised way, you construct a model of language. Do you think...
link |
00:47:05.320
Or video or the physical world or whatever, right?
link |
00:47:09.320
How far do you think that can take us?
link |
00:47:12.320
Do you think BERT understands anything?
link |
00:47:17.320
To some level, it has, you know, a shallow understanding of text.
link |
00:47:23.320
But it needs to, I mean, to have kind of true human level intelligence.
link |
00:47:26.320
I think you need to ground language in reality.
link |
00:47:29.320
So some people are attempting to do this, right?
link |
00:47:32.320
Having systems that kind of have some visual representation of what is being talked about.
link |
00:47:37.320
Which is one reason you need those interactive environments, actually.
link |
00:47:40.320
But this is like a huge technical problem that is not solved.
link |
00:47:44.320
And that explains why self supervised learning works in the context of natural language.
link |
00:47:49.320
But it does not work, or at least not well, in the context of image recognition and video.
link |
00:47:55.320
Although it's making progress quickly.
link |
00:47:57.320
And the reason, that reason is the fact that it's much easier to represent uncertainty in the prediction.
link |
00:48:04.320
In the context of natural language than it is in the context of things like video and images.
link |
00:48:09.320
So for example, if I ask you to predict what words I'm missing, you know, 15% of the words that I've taken out.
link |
00:48:17.320
The possibilities are small.
link |
00:48:19.320
It's small, right? There are 100,000 words in the lexicon.
link |
00:48:22.320
And what the machine spits out is a big probability vector, right?
link |
00:48:27.320
It's a bunch of numbers between zero and one that sum to one.
link |
00:48:30.320
And we know how to do this with computers.
link |
00:48:33.320
So there, representing uncertainty in the prediction is relatively easy.
link |
00:48:37.320
And that's, in my opinion, why those techniques work for NLP.
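A probability vector over a vocabulary is commonly produced with a softmax over per-word scores. The sketch below is an illustration under that assumption; the four-word vocabulary and the scores are made up for the example.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into positive numbers that sum to one."""
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cat", "dog", "car", "tree"]          # toy stand-in for a 100,000-word lexicon
scores = [2.0, 1.0, -1.0, 0.5]                 # hypothetical scores for one masked position
probs = softmax(scores)
best_guess = vocab[probs.index(max(probs))]
```

Because the set of possible answers is finite, the model can spread its belief over every candidate word at once, which is exactly what is hard to do for a missing image patch or a future video frame.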
link |
00:48:42.320
For images, if you block a piece of an image and you ask the system to reconstruct that piece of the image,
link |
00:48:48.320
there are many possible answers that are all perfectly legit, right?
link |
00:48:54.320
And how do you represent that, this set of possible answers?
link |
00:48:58.320
You can't train a system to make one prediction.
link |
00:49:00.320
You can't train a neural net to say, here it is, that's the image.
link |
00:49:04.320
Because there's a whole set of things that are compatible with it.
link |
00:49:07.320
So how do you get the machine to represent not a single output, but a whole set of outputs?
link |
00:49:12.320
And, you know, similarly with video prediction, there's a lot of things that can happen in the future of video.
link |
00:49:20.320
You're looking at me right now. I'm not moving my head very much.
link |
00:49:22.320
But, you know, I might, you know, turn my head to the left or to the right.
link |
00:49:26.320
If you don't have a system that can predict this,
link |
00:49:30.320
and you train it with least squares to kind of minimize the error between the prediction and what I'm doing,
link |
00:49:34.320
what you get is a blurry image of myself in all possible future positions that I might be in.
link |
00:49:39.320
Which is not a good prediction.
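The blurriness has a simple cause: the single prediction that minimizes squared error is the average of all plausible outcomes. A minimal sketch, assuming a one-dimensional "head position" that is equally likely to end up at -1 (left) or +1 (right); the function names are hypothetical.

```python
def best_constant_under_squared_error(targets):
    """The single value minimizing mean squared error is the arithmetic mean."""
    return sum(targets) / len(targets)

def mse(prediction, targets):
    return sum((prediction - t) ** 2 for t in targets) / len(targets)

# Next-frame head position: half the time left (-1), half the time right (+1).
futures = [-1.0, +1.0] * 50

pred = best_constant_under_squared_error(futures)
```

Here `pred` sits halfway between the two outcomes, a position the head never actually takes, yet it scores a lower squared error than committing to either real outcome. Applied pixel by pixel to images, the same averaging produces the blur.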
link |
00:49:41.320
But so there might be other ways to do the self supervision, right?
link |
00:49:45.320
For visual scenes.
link |
00:49:47.320
Like what?
link |
00:49:49.320
I mean, if I knew I wouldn't tell you, I'd publish it first. I don't know.
link |
00:49:55.320
No, there might be.
link |
00:49:57.320
So, I mean, these are kind of, there might be artificial ways of like self play in games to where you can simulate part of the environment.
link |
00:50:05.320
Oh, that doesn't solve the problem. It's just a way of generating data.
link |
00:50:10.320
But because you have more of a control, that may mean you can control, yeah, it's a way to generate data.
link |
00:50:16.320
That's right. And because you can do huge amounts of data generation, that doesn't, you're right.
link |
00:50:21.320
Well, it creeps up on the problem from the side of data.
link |
00:50:26.320
I don't think that's the right way to creep up on the problem.
link |
00:50:28.320
It doesn't solve this problem of handling uncertainty in the world, right?
link |
00:50:31.320
So, if you have a machine learn a predictive model of the world in a game that is deterministic or quasi deterministic, it's easy, right?
link |
00:50:42.320
Just, you know, give a few frames of the game to a ConvNet, put a bunch of layers, and then have it generate the next few frames.
link |
00:50:49.320
And if the game is deterministic, it works fine.
link |
00:50:54.320
And that includes, you know, feeding the system with the action that your little character is going to take.
link |
00:51:02.320
The problem comes from the fact that the real world and most games are not entirely predictable.
link |
00:51:09.320
And so there you get those blurry predictions, and you can't do planning with blurry predictions.
link |
00:51:13.320
Right, so if you have a perfect model of the world, you can, in your head, run this model with a hypothesis for a sequence of actions,
link |
00:51:23.320
and you're going to predict the outcome of that sequence of actions.
link |
00:51:27.320
But if your model is imperfect, how can you plan?
link |
00:51:32.320
Yeah, it quickly explodes.
link |
00:51:34.320
What are your thoughts on an extension of this, which is a topic I'm super excited about?
link |
00:51:39.320
It's connected to something you were talking about in terms of robotics, is active learning.
link |
00:51:44.320
So, as opposed to sort of completely unsupervised or self supervised learning,
link |
00:51:50.320
you ask the system for human help for selecting parts you want annotated next.
link |
00:51:58.320
So if you think about a robot exploring a space, or a baby exploring a space,
link |
00:52:02.320
or a system exploring a data set, every once in a while asking for human input.
link |
00:52:08.320
Do you see value in that kind of work?
link |
00:52:12.320
I don't see transformative value.
link |
00:52:14.320
It's going to make things that we can already do more efficient, or they will learn slightly more efficiently,
link |
00:52:20.320
but it's not going to make machines sort of significantly more intelligent, I think.
link |
00:52:25.320
And by the way, there is no opposition, there is no conflict between self supervised learning, reinforcement learning,
link |
00:52:34.320
and supervised learning, or imitation learning, or active learning.
link |
00:52:38.320
I see self supervised learning as a preliminary to all of the above.
link |
00:52:43.320
Yes.
link |
00:52:44.320
So, the example I use very often is, how is it that, so if you use classical reinforcement learning,
link |
00:52:54.320
deep reinforcement learning, if you want.
link |
00:52:57.320
The best methods today, so called model free reinforcement learning, to learn to play Atari games,
link |
00:53:05.320
take about 80 hours of training to reach the level that any human can reach in about 15 minutes.
link |
00:53:11.320
They get better than humans, but it takes them a long time.
link |
00:53:17.320
AlphaStar, okay, the, you know, Oriol Vinyals and his team's system to play StarCraft,
link |
00:53:27.320
plays, you know, a single map, a single type of player,
link |
00:53:34.320
and can reach better than human level with about the equivalent of 200 years of training playing against itself.
link |
00:53:45.320
It's 200 years, right? It's not something that any human could ever do.
link |
00:53:50.320
I mean, I'm not sure what lesson to take away from that.
link |
00:53:52.320
Okay, now, take those algorithms, the best RL algorithms we have today, to train a car to drive itself.
link |
00:54:01.320
It would probably have to drive millions of hours, it will have to kill thousands of pedestrians,
link |
00:54:05.320
it will have to run into thousands of trees, it will have to run off cliffs,
link |
00:54:09.320
and it will have to run off cliffs multiple times before it figures out that it's a bad idea, first of all,
link |
00:54:15.320
and second of all, before it figures out how not to do it.
link |
00:54:18.320
And so, I mean, this type of learning obviously does not reflect the kind of learning that animals and humans do.
link |
00:54:24.320
There is something missing that's really, really important there.
link |
00:54:27.320
And my hypothesis, which I've been advocating for like five years now,
link |
00:54:31.320
is that we have predictive models of the world that include the ability to predict under uncertainty,
link |
00:54:39.320
and that's what allows us to not run off a cliff when we learn to drive.
link |
00:54:45.320
Most of us can learn to drive in about 20 or 30 hours of training without ever crashing or causing any accident.
link |
00:54:51.320
If we drive next to a cliff, we know that if we turn the wheel to the right,
link |
00:54:56.320
the car is going to run off the cliff and nothing good is going to come out of this,
link |
00:55:00.320
because we have a pretty good model of intuitive physics that tells us the car is going to fall.
link |
00:55:03.320
We know about gravity.
link |
00:55:05.320
Babies learn this around the age of eight or nine months, that objects don't float, they fall.
link |
00:55:12.320
And we have a pretty good idea of the effect of turning the wheel on the car,
link |
00:55:16.320
and we know we need to stay on the road.
link |
00:55:18.320
So there's a lot of things that we bring to the table, which is basically our predictive model of the world,
link |
00:55:23.320
and that model allows us to not do stupid things and to basically stay within the context of things we need to do.
link |
00:55:31.320
We still face unpredictable situations, and that's how we learn,
link |
00:55:35.320
but that allows us to learn really, really, really quickly.
link |
00:55:39.320
So that's called model based reinforcement learning.
link |
00:55:42.320
There's some imitation and supervised learning because we have a driving instructor that tells us occasionally what to do,
link |
00:55:48.320
but most of the learning is learning the model.
link |
00:55:52.320
Learning physics that we've done since we were babies.
link |
00:55:55.320
That's where almost all the learning...
link |
00:55:57.320
And the physics is somewhat transferable from...
link |
00:56:00.320
It's transferable from scene to scene.
link |
00:56:02.320
Stupid things are the same everywhere.
link |
00:56:05.320
Yeah. I mean, if you have an experience of the world,
link |
00:56:08.320
you don't need to be from a particularly intelligent species to know that if you spill water from a container,
link |
00:56:16.320
the rest is going to get wet.
link |
00:56:19.320
You might get wet.
link |
00:56:21.320
So cats know this, right?
link |
00:56:24.320
Yeah.
link |
00:56:25.320
So the main problem we need to solve is how do we learn models of the world?
link |
00:56:30.320
And that's what I'm interested in.
link |
00:56:31.320
That's what self supervised learning is all about.
link |
00:56:34.320
If you were to try to construct a benchmark for...
link |
00:56:39.320
Let's look at MNIST.
link |
00:56:41.320
I love that dataset.
link |
00:56:43.320
Do you think it's useful, interesting, slash possible to perform well on MNIST with just one example of each digit?
link |
00:56:53.320
And how would we solve that problem?
link |
00:56:58.320
The answer is probably yes.
link |
00:56:59.320
The question is what other type of learning are you allowed to do?
link |
00:57:03.320
So if what you're allowed to do is train on some gigantic dataset of labeled digits, that's called transfer learning.
link |
00:57:08.320
And we know that works.
link |
00:57:10.320
We do this at Facebook like in production, right?
link |
00:57:13.320
We train large convolutional nets to predict hashtags that people type on Instagram and we train on billions of images, literally billions.
link |
00:57:20.320
And then we chop off the last layer and fine tune on whatever task we want.
link |
00:57:24.320
That works really well.
link |
00:57:25.320
You can beat the ImageNet record with this.
link |
00:57:28.320
We actually open sourced the whole thing like a few weeks ago.
link |
00:57:31.320
Yeah, that's still pretty cool.
link |
00:57:33.320
But yeah, so what would be impressive and what's useful and impressive, what kind of transfer learning would be useful and impressive?
link |
00:57:40.320
Is it Wikipedia, that kind of thing?
link |
00:57:42.320
No, no.
link |
00:57:43.320
I don't think transfer learning is really where we should focus.
link |
00:57:46.320
We should try to have a kind of scenario for a benchmark where you have unlabeled data and it's a very large amount of unlabeled data.
link |
00:57:59.320
It could be video clips, it could be where you do frame prediction, it could be images where you could choose to mask a piece of it.
link |
00:58:10.320
It could be whatever, but they're unlabeled and you're not allowed to label them.
link |
00:58:15.320
So you do some training on this and then you train on a particular supervised task, ImageNet or MNIST.
link |
00:58:26.320
And you measure how your test error or validation error decreases as you increase the number of labeled training samples.
link |
00:58:35.320
And what you'd like to see is that your error decreases much faster than if you train from scratch, from random weights.
link |
00:58:47.320
So that to reach the same level of performance as a completely supervised, purely supervised system would reach, you would need way fewer samples.
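The benchmark described here comes down to comparing two learning curves: validation error as a function of the number of labeled samples, once from random weights and once after self supervised pre training. A minimal sketch of that comparison follows; the function name `labels_needed` and all of the error values are made up for illustration and are not measurements from any real system.

```python
def labels_needed(curve, target_error):
    """Smallest number of labeled samples whose error is at or below target_error.

    `curve` maps number of labeled samples -> validation error.
    Returns None if the target error is never reached on this curve.
    """
    for n_labels in sorted(curve):
        if curve[n_labels] <= target_error:
            return n_labels
    return None


# Hypothetical learning curves, invented for this example only.
from_scratch = {100: 0.40, 1_000: 0.25, 10_000: 0.12, 100_000: 0.06, 1_000_000: 0.03}
pretrained = {100: 0.15, 1_000: 0.07, 10_000: 0.04, 100_000: 0.03, 1_000_000: 0.025}

target = 0.05
print("labels needed from scratch:", labels_needed(from_scratch, target))
print("labels needed with pre training:", labels_needed(pretrained, target))
```

The quantity of interest is the ratio between the two answers: how many fewer labels the pre trained system needs to reach the same error.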
link |
00:58:56.320
So that's the crucial question because it will answer the question to people interested in medical image analysis.
link |
00:59:02.320
Okay, if I want to get a particular level of error rate for this task, I know I need a million samples, can I do self supervised pre training to reduce this to about 100 or something?
link |
00:59:17.320
And you think the answer there is self supervised pre training?
link |
00:59:20.320
Yeah, some form of it.
link |
00:59:24.320
Telling you active learning, but you disagree?
link |
00:59:27.320
No, it's not useless, it's just not going to lead to a quantum leap, it's just going to make things that we already do more efficient.
link |
00:59:33.320
So you're way smarter than me, I just disagree with you.
link |
00:59:36.320
But I don't have anything to back that, it's just intuition.
link |
00:59:40.320
So I've worked a lot with large scale data sets and there's something that might be magic in active learning.
link |
00:59:46.320
But okay, at least I said it publicly.
link |
00:59:49.320
At least I'm being an idiot publicly.
link |
00:59:52.320
Okay, it's not being an idiot, it's working with the data you have. I mean, certainly people are doing things like, okay, I have 3000 hours of imitation learning for a self driving car, but most of those are incredibly boring.
link |
01:00:05.320
What I'd like is to select 10% of them that are kind of the most informative and with just that, I would probably reach the same.
link |
01:00:12.320
So it's a weak form of active learning if you want.
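One way to picture this weak form of active learning is to score every example by how uncertain the current model is about it and keep only the top fraction. The sketch below uses predictive entropy as the "informativeness" score; that criterion, the function names, and the probabilities are all assumptions made for illustration, since the conversation does not say how the informative 10% would be chosen.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_informative(predictions, fraction=0.1):
    """Indices of the examples the model is least sure about, most uncertain first."""
    k = max(1, int(len(predictions) * fraction))
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for ten driving clips (two classes each).
preds = [
    [0.99, 0.01], [0.95, 0.05], [0.90, 0.10], [0.50, 0.50], [0.98, 0.02],
    [0.97, 0.03], [0.99, 0.01], [0.96, 0.04], [0.92, 0.08], [0.94, 0.06],
]

print(most_informative(preds))
```

Here the only clip kept is the one where the model is split 50/50; the confidently predicted, "boring" clips are dropped.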
link |
01:00:16.320
Yes, but there might be a much stronger version.
link |
01:00:20.320
That's right. And that's an open question if it exists.
link |
01:00:23.320
The question is how much stronger can you get?
link |
01:00:26.320
Elon Musk is confident, talked to him recently, he's confident that large scale data and deep learning can solve the autonomous driving problem.
link |
01:00:35.320
What are your thoughts on the limits and possibilities of deep learning in this space?
link |
01:00:40.320
It's obviously part of the solution.
link |
01:00:42.320
I mean, I don't think we'll ever have a self driving system, or at least not in the foreseeable future, that does not use deep learning.
link |
01:00:50.320
Now, how much of it?
link |
01:00:52.320
So in the history of sort of engineering, particularly sort of AI like systems, there's generally a first phase where everything is built by hand.
link |
01:01:03.320
Then there is a second phase, and that was the case for autonomous driving, you know, 20, 30 years ago.
link |
01:01:08.320
There's a phase where a little bit of learning is used, but there's a lot of engineering that's involved in kind of, you know, taking care of corner cases and putting limits, etc.
link |
01:01:18.320
Because the learning system is not perfect.
link |
01:01:20.320
And then as technology progresses, we end up relying more and more on learning.
link |
01:01:26.320
That's the history of character recognition, the history of speech recognition, now computer vision, natural language processing.
link |
01:01:31.320
And I think the same is going to happen with autonomous driving that currently the methods that are closest to providing some level of autonomy,
link |
01:01:43.320
some, you know, decent level of autonomy where you don't expect a driver to kind of do anything, is where you constrain the world.
link |
01:01:50.320
So you only run within, you know, 100 square kilometers or square miles in Phoenix, where the weather is nice and the roads are wide, which is what Waymo is doing.
link |
01:02:00.320
You completely over engineer the car with tons of lidars and sophisticated sensors that are too expensive for consumer cars, but they're fine if you just run a fleet.
link |
01:02:13.320
And you engineer the hell out of everything else, you map the entire world, so you have a complete 3D model of everything.
link |
01:02:20.320
So the only thing that the perception system has to take care of is moving objects and construction and sort of, you know, things that weren't in your map.
link |
01:02:30.320
And you can engineer a good, you know, SLAM system.
link |
01:02:33.320
So that's kind of the current approach that's closest to some level of autonomy, but I think eventually the long term solution is going to rely more and more on learning
link |
01:02:43.320
and possibly using a combination of self supervised learning and model based reinforcement or something like that.
link |
01:02:50.320
But ultimately learning will be not just at the core, but really the fundamental part of the system.
link |
01:02:57.320
Yeah, it already is, but it will become more and more.
link |
01:03:00.320
What do you think it takes to build a system with human level intelligence?
link |
01:03:04.320
You talked about the AI system in the movie Her being way out of reach, our current reach, this might be outdated as well, but
link |
01:03:12.320
is it still way out of reach?
link |
01:03:13.320
It's still way out of reach.
link |
01:03:15.320
What would it take to build Her?
link |
01:03:18.320
Do you think?
link |
01:03:19.320
So I can tell you the first two obstacles that we have to clear, but I don't know how many obstacles there are after this.
link |
01:03:24.320
So the image I usually use is that there is a bunch of mountains that we have to climb and we can see the first one, but we don't know if there are 50 mountains behind it or not.
link |
01:03:32.320
And this might be a good sort of metaphor for why AI researchers in the past have been overly optimistic about the result of AI.
link |
01:03:43.320
For example, Newell and Simon wrote the General Problem Solver and they called it the General Problem Solver.
link |
01:03:52.320
And of course, the first thing you realize is that all the problems you want to solve are exponential and so you can't actually use it for anything useful.
link |
01:03:59.320
Yeah, so yeah, all you see is the first peak.
link |
01:04:02.320
So what are the first couple of peaks for Her?
link |
01:04:05.320
So the first peak, which is precisely what I'm working on, is self supervision.
link |
01:04:09.320
How do we get machines to learn models of the world by observation, kind of like babies and like young animals?
link |
01:04:17.320
So we've been working with, you know, cognitive scientists.
link |
01:04:23.320
So there's Emmanuel Dupoux, who is at FAIR in Paris half time and is also a researcher at a French university.
link |
01:04:32.320
And he has this chart that shows at how many months of life baby humans learn different concepts.
link |
01:04:42.320
And you can measure this in various ways.
link |
01:04:46.320
So things like distinguishing animate objects from inanimate objects, you can tell the difference at age two, three months.
link |
01:04:56.320
Whether an object is going to stay stable or is going to fall, you know, about four months you can tell.
link |
01:05:03.320
You know, there are various things like this.
link |
01:05:05.320
And then things like gravity, the fact that objects are not supposed to float in the air but are supposed to fall, you learn this around the age of eight or nine months.
link |
01:05:13.320
So you look at a lot of eight month old babies, you give them a bunch of toys on their high chair.
link |
01:05:19.320
First thing they do is throw them on the ground and they look at them.
link |
01:05:22.320
It's because, you know, they're learning about, actively learning about gravity.
link |
01:05:27.320
So they're not trying to annoy you, but they need to do the experiment, right?
link |
01:05:33.320
So, you know, how do we get machines to learn like babies mostly by observation with a little bit of interaction
link |
01:05:39.320
and learning those models of the world because I think that's really a crucial piece of an intelligent autonomous system.
link |
01:05:46.320
So if you think about the architecture of an intelligent autonomous system, it needs to have a predictive model of the world.
link |
01:05:51.320
So something that says, here is the state of the world at time t, here is the state of the world at time t plus one if I take this action.
link |
01:05:57.320
And it's not a single answer.
link |
01:05:59.320
It can be a distribution.
link |
01:06:01.320
Yeah, well, we don't know how to represent distributions in high dimensional spaces.
link |
01:06:05.320
So it's got to be something weaker than that.
link |
01:06:07.320
With some representation of uncertainty.
link |
01:06:10.320
If you have that, then you can do what optimal control theorists call model predictive control,
link |
01:06:15.320
which means that you can run your model with a hypothesis for a sequence of actions and then see the result.
link |
01:06:21.320
Now what you need, the other thing you need is some sort of objective that you want to optimize.
link |
01:06:25.320
Am I reaching the goal of grabbing this object?
link |
01:06:28.320
Am I minimizing energy?
link |
01:06:30.320
Am I whatever, right?
link |
01:06:31.320
So there is some sort of objective that you have to minimize.
link |
01:06:34.320
And so in your head, if you have this model, you can figure out the sequence of actions that will optimize your objective.
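The planning loop described here, model predictive control, can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the one-dimensional state, the three actions, the deterministic `world_model`, and the distance-to-goal `cost`. A real system would need a learned model that handles uncertainty, which is exactly the open problem under discussion. The sketch only shows the mechanics: roll the model forward for each candidate action sequence, score the outcome with the objective, take the first action of the best sequence, then replan.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical actions: step left, stay, step right


def world_model(state, action):
    """Hypothetical deterministic model: the state at t+1 given state and action at t."""
    return state + action


def cost(state, goal):
    """Objective to minimize: distance from the goal."""
    return abs(state - goal)


def plan(state, goal, horizon=3):
    """Try every action sequence in the model and return the cheapest one."""
    best_sequence, best_cost = None, float("inf")
    for sequence in product(ACTIONS, repeat=horizon):
        s, total = state, 0
        for action in sequence:
            s = world_model(s, action)  # imagined step, not a real one
            total += cost(s, goal)
        if total < best_cost:
            best_sequence, best_cost = sequence, total
    return best_sequence


def run(state, goal, steps=5):
    """Execute only the first planned action each step, then replan."""
    for _ in range(steps):
        action = plan(state, goal)[0]
        state = world_model(state, action)
    return state


print(plan(0, 2))
print(run(0, 2))
```

Exhaustive search over sequences grows exponentially with the horizon, so this brute-force planner is only workable for tiny action sets and short horizons.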
link |
01:06:40.320
That objective is something that ultimately is rooted in your basal ganglia, at least in the human brain.
link |
01:06:46.320
That's what it is.
link |
01:06:47.320
Basal ganglia computes your level of contentment or miscontentment.
link |
01:06:52.320
I don't know if that's a word.
link |
01:06:53.320
Unhappiness, okay.
link |
01:06:55.320
Discontentment.
link |
01:06:57.320
Discontentment.
link |
01:06:58.320
And so your entire behavior is driven towards kind of minimizing that objective, which is maximizing your contentment computed by your basal ganglia.
link |
01:07:10.320
And what you have is an objective function, which is basically a predictor of what your basal ganglia is going to tell you.
link |
01:07:16.320
So you're not going to put your hand on fire because you know it's going to burn and you're going to get hurt.
link |
01:07:23.320
And you're predicting this because of your model of the world and your sort of predictor of this objective, right?
link |
01:07:29.320
So if you have those three components, you have four components, you have the hardwired contentment objective computer, if you want, calculator.
link |
01:07:43.320
And then you have the three components.
link |
01:07:44.320
One is the objective predictor, which basically predicts your level of contentment.
link |
01:07:48.320
One is the model of the world, and there's a third module I didn't mention, which is the module that will figure out the best course of action to optimize an objective given your model.
link |
01:08:01.320
Okay?
link |
01:08:02.320
Yeah.
link |
01:08:03.320
Call it a policy, policy network or something like that, right?
link |
01:08:08.320
Now, you need those three components to act autonomously intelligently, and you can be stupid in three different ways.
link |
01:08:15.320
You can be stupid because your model of the world is wrong.
link |
01:08:18.320
You can be stupid because your objective is not aligned with what you actually want to achieve.
link |
01:08:24.320
Okay?
link |
01:08:26.320
In humans, that would be a psychopath.
link |
01:08:29.320
And then the third thing, the third way you can be stupid is that you have the right model, you have the right objective, but you're unable to figure out a course of action to optimize your objective given your model.
link |
01:08:40.320
Right.
link |
01:08:41.320
Okay?
link |
01:08:43.320
Some people who are in charge of big countries actually have all three that are wrong.
link |
01:08:47.320
All right.
link |
01:08:50.320
Which countries?
link |
01:08:51.320
I don't know.
link |
01:08:52.320
Okay.
link |
01:08:53.320
So if we think about this agent, if we think about the movie Her, you've criticized the art project that is Sophia the Robot.
link |
01:09:04.320
And what that project essentially does is use our natural inclination to anthropomorphize things that look human and give them more.
link |
01:09:14.320
Do you think that could be used by AI systems like in the movie Her?
link |
01:09:20.320
So do you think that body is needed to create a feeling of intelligence?
link |
01:09:26.320
Well, if Sophia was just an art piece, I would have no problem with it, but it's presented as something else.
link |
01:09:32.320
Let me add that comment real quick.
link |
01:09:35.320
If creators of Sophia could change something about their marketing or behavior in general, what would it be?
link |
01:09:42.320
Just about everything.
link |
01:09:45.320
I mean, don't you think, here's a tough question.
link |
01:09:50.320
Let me, so I agree with you.
link |
01:09:52.320
So Sophia is not, the general public feels that Sophia can do way more than she actually can.
link |
01:09:59.320
That's right.
link |
01:10:00.320
And the people who created Sophia are not honestly publicly communicating, trying to teach the public.
link |
01:10:09.320
Right.
link |
01:10:10.320
But here's a tough question.
link |
01:10:13.320
Don't you think, in the same way, scientists in industry and research are taking advantage of the same misunderstanding in the public when they create AI companies or publish stuff?
link |
01:10:29.320
Some companies, yes.
link |
01:10:31.320
I mean, there is no sense of, there's no desire to delude.
link |
01:10:34.320
There's no desire to kind of overclaim what has been done.
link |
01:10:38.320
Right.
link |
01:10:39.320
You publish a paper on AI that has this result on ImageNet.
link |
01:10:42.320
It's pretty clear.
link |
01:10:43.320
I mean, it's not even interesting anymore.
link |
01:10:45.320
But I don't think there is that.
link |
01:10:48.320
I mean, the reviewers are generally not very forgiving of unsupported claims of this type.
link |
01:10:57.320
And, but there are certainly quite a few startups that have had a huge amount of hype around this that I find extremely damaging.
link |
01:11:05.320
And I've been calling it out when I've seen it.
link |
01:11:07.320
So, yeah, but to go back to your original question, like the necessity of embodiment, I think, I don't think embodiment is necessary.
link |
01:11:15.320
I think grounding is necessary.
link |
01:11:17.320
So I don't think we're going to get machines that really understand language without some level of grounding in the real world.
link |
01:11:22.320
And it's not clear to me that language is a high enough bandwidth medium to communicate how the real world works.
link |
01:11:29.320
I think for this...
link |
01:11:30.320
Can you talk about what grounding means to you?
link |
01:11:33.320
So grounding means that...
link |
01:11:34.320
So there is this classic problem of common sense reasoning, you know, the Winograd schema, right?
link |
01:11:41.320
And so I tell you the trophy doesn't fit in the suitcase because it's too big, or the trophy doesn't fit in the suitcase because it's too small.
link |
01:11:49.320
And the it in the first case refers to the trophy in the second case to the suitcase.
link |
01:11:53.320
And the reason you can figure this out is because you know what the trophy and the suitcase are, you know, one is supposed to fit in the other one,
link |
01:11:58.320
and you know the notion of size and a big object doesn't fit in a small object unless it's a TARDIS, you know, things like that, right?
link |
01:12:05.320
So you have this knowledge of how the world works, of geometry and things like that.
link |
01:12:11.320
I don't believe you can learn everything about the world by just being told in language how the world works.
link |
01:12:18.320
You need some low level perception of the world, you know, be it visual, touch, you know, whatever, but some higher bandwidth perception of the world.
link |
01:12:26.320
So by reading all the world's text, you still may not have enough information.
link |
01:12:31.320
That's right.
link |
01:12:32.320
There's a lot of things that just will never appear in text and that you can't really infer.
link |
01:12:37.320
So I think common sense will emerge from, you know, certainly a lot of language interaction,
link |
01:12:43.320
but also with watching videos or perhaps even interacting in virtual environments and possibly, you know, robot interacting in the real world.
link |
01:12:51.320
But I don't actually believe necessarily that this last one is absolutely necessary.
link |
01:12:55.320
But I think there's a need for some grounding.
link |
01:12:59.320
But the final product doesn't necessarily need to be embodied, you're saying?
link |
01:13:04.320
No.
link |
01:13:05.320
It just needs to have an awareness grounding.
link |
01:13:07.320
Right.
link |
01:13:08.320
It needs to know how the world works to have, you know, to not be frustrated, frustrating to talk to.
link |
01:13:16.320
And you talked about emotions being important.
link |
01:13:20.320
That's a whole other topic.
link |
01:13:22.320
Well, so, you know, I talked about this, the basal ganglia as the, you know, the thing that calculates your level of miscontentment, contentment.
link |
01:13:33.320
And then there is this other module that sort of tries to do a prediction of whether you're going to be content or not.
link |
01:13:38.320
That's the source of some emotion.
link |
01:13:40.320
So fear, for example, is an anticipation of bad things that can happen to you, right?
link |
01:13:47.320
You have this inkling that there is some chance that something really bad is going to happen to you and that creates fear.
link |
01:13:52.320
When you know for sure that something bad is going to happen to you, you kind of give up, right?
link |
01:13:56.320
It's not fear anymore.
link |
01:13:57.320
It's uncertainty that creates fear.
link |
01:13:59.320
So the punchline is we're not going to have autonomous intelligence without emotions.
link |
01:14:04.320
Okay.
link |
01:14:06.320
Whatever the heck emotions are.
link |
01:14:08.320
So you mentioned very practical things of fear, but there's a lot of other mess around it.
link |
01:14:13.320
But they are kind of the results of, you know, drives.
link |
01:14:16.320
Yeah.
link |
01:14:17.320
There's deeper biological stuff going on.
link |
01:14:19.320
And I've talked to a few folks on this.
link |
01:14:21.320
There's this fascinating stuff that ultimately connects to our brain.
link |
01:14:27.320
If we create an AGI system.
link |
01:14:30.320
Sorry.
link |
01:14:31.320
Human level intelligence.
link |
01:14:32.320
Human level intelligence system.
link |
01:14:34.320
And you get to ask her one question.
link |
01:14:37.320
What would that question be?
link |
01:14:40.320
You know, I think the first one we'll create will probably not be that smart.
link |
01:14:45.320
They'll be like a four year old.
link |
01:14:47.320
Okay.
link |
01:14:48.320
So you would have to ask her a question to know she's not that smart.
link |
01:14:53.320
Yeah.
link |
01:14:54.320
Well, what's a good question to ask, you know, to be impressed?
link |
01:14:57.320
What is the cause of wind?
link |
01:15:00.320
And if she answers, oh, it's because the leaves of the tree are moving and that creates wind.
link |
01:15:06.320
She's onto something.
link |
01:15:08.320
And if she says, that's a stupid question, she's really onto something.
link |
01:15:12.320
No.
link |
01:15:13.320
And then you tell her, actually, you know, here is the real thing.
link |
01:15:17.320
And she says, oh, yeah, that makes sense.
link |
01:15:20.320
So questions that, that reveal the ability to do common sense reasoning about the physical world.
link |
01:15:26.320
Yeah.
link |
01:15:27.320
And, you know, some of that is causal inference.
link |
01:15:29.320
Causal inference.
link |
01:15:31.320
Well, it was a huge honor.
link |
01:15:33.320
Congratulations on your Turing Award.
link |
01:15:35.320
Thank you so much for talking today.
link |
01:15:37.320
Thank you.
link |
01:15:38.320
Thank you.