
Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36



link |
00:00:00.000
The following is a conversation with Yann LeCun.
link |
00:00:03.080
He's considered to be one of the fathers of deep learning,
link |
00:00:06.320
which, if you've been hiding under a rock,
link |
00:00:09.040
is the recent revolution in AI that has captivated the world
link |
00:00:12.240
with the possibility of what machines can learn from data.
link |
00:00:16.160
He's a professor at New York University,
link |
00:00:18.520
a vice president and chief AI scientist at Facebook,
link |
00:00:21.720
and co-recipient of the Turing Award
link |
00:00:24.320
for his work on deep learning.
link |
00:00:26.240
He's probably best known as the founding father
link |
00:00:28.880
of convolutional neural networks,
link |
00:00:30.720
in particular their application
link |
00:00:32.480
to optical character recognition
link |
00:00:34.400
and the famed MNIST dataset.
link |
00:00:37.240
He is also an outspoken personality,
link |
00:00:40.100
unafraid to speak his mind in a distinctive French accent
link |
00:00:43.800
and explore provocative ideas,
link |
00:00:45.720
both in the rigorous medium of academic research
link |
00:00:48.360
and the somewhat less rigorous medium
link |
00:00:51.000
of Twitter and Facebook.
link |
00:00:52.800
This is the Artificial Intelligence Podcast.
link |
00:00:55.600
If you enjoy it, subscribe on YouTube,
link |
00:00:57.960
give it five stars on iTunes, support it on Patreon,
link |
00:01:00.960
or simply connect with me on Twitter at Lex Fridman,
link |
00:01:03.840
spelled F R I D M A N.
link |
00:01:06.840
And now, here's my conversation with Yann LeCun.
link |
00:01:11.720
You said that 2001: A Space Odyssey
link |
00:01:13.820
is one of your favorite movies.
link |
00:01:16.260
HAL 9000 decides to get rid of the astronauts
link |
00:01:20.360
for people who haven't seen the movie, spoiler alert,
link |
00:01:23.040
because he, it, she believes that the astronauts,
link |
00:01:29.200
they will interfere with the mission.
link |
00:01:31.600
Do you see HAL as flawed in some fundamental way
link |
00:01:34.720
or even evil, or did he do the right thing?
link |
00:01:38.440
Neither.
link |
00:01:39.320
There's no notion of evil in that context,
link |
00:01:43.240
other than the fact that people die,
link |
00:01:44.760
but it was an example of what people call
link |
00:01:48.720
value misalignment, right?
link |
00:01:50.120
You give an objective to a machine,
link |
00:01:52.120
and the machine strives to achieve this objective.
link |
00:01:55.560
And if you don't put any constraints on this objective,
link |
00:01:58.160
like don't kill people and don't do things like this,
link |
00:02:02.260
the machine, given the power, will do stupid things
link |
00:02:06.240
just to achieve this objective,
link |
00:02:08.000
or damaging things to achieve this objective.
link |
00:02:10.200
It's a little bit like, I mean, we're used to this
link |
00:02:12.440
in the context of human society.
link |
00:02:15.740
We put in place laws to prevent people
link |
00:02:20.740
from doing bad things, because spontaneously,
link |
00:02:22.920
they would do those bad things, right?
link |
00:02:24.800
So we have to shape their cost function,
link |
00:02:28.400
their objective function, if you want,
link |
00:02:29.500
through laws to kind of correct,
link |
00:02:31.520
and education, obviously, to sort of correct for those.
link |
00:02:36.120
So maybe just pushing a little further on that point,
link |
00:02:41.960
how, you know, there's a mission,
link |
00:02:44.360
there's this fuzziness around,
link |
00:02:46.400
the ambiguity around what the actual mission is,
link |
00:02:49.800
but, you know, do you think that there will be a time,
link |
00:02:55.120
from a utilitarian perspective,
link |
00:02:56.720
where an AI system, where it is not misalignment,
link |
00:02:59.660
where it is alignment, for the greater good of society,
link |
00:03:02.820
that an AI system will make decisions that are difficult?
link |
00:03:05.880
Well, that's the trick.
link |
00:03:06.800
I mean, eventually we'll have to figure out how to do this.
link |
00:03:10.800
And again, we're not starting from scratch,
link |
00:03:12.600
because we've been doing this with humans for millennia.
link |
00:03:16.440
So designing objective functions for people
link |
00:03:19.160
is something that we know how to do.
link |
00:03:20.880
And we don't do it by, you know, programming things,
link |
00:03:24.600
although the legal code is called code.
link |
00:03:29.060
So that tells you something.
link |
00:03:30.640
And it's actually the design of an objective function.
link |
00:03:33.040
That's really what legal code is, right?
link |
00:03:34.600
It tells you, here is what you can do,
link |
00:03:36.280
here is what you can't do.
link |
00:03:37.420
If you do it, you pay that much,
link |
00:03:39.000
that's an objective function.
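As a toy illustration of "a law is an objective function": a small Python sketch where breaking a rule simply adds a large cost to whatever the agent is optimizing. All names and numbers here are invented for illustration, not anything from the conversation.

```python
# Illustrative sketch: an objective with penalty terms attached to rule
# violations, the way a law attaches a cost to a prohibited action.
# Everything here is hypothetical.

def total_objective(task_reward, violations, penalty=1000.0):
    """Reward for achieving the mission, minus a large cost per violated rule."""
    return task_reward - penalty * sum(violations.values())

# A plan that earns a bit more task reward but breaks a rule scores worse
# once the penalty ("the law") is folded into the objective.
safe_plan  = total_objective(task_reward=10.0, violations={"harm_humans": 0})
risky_plan = total_objective(task_reward=12.0, violations={"harm_humans": 1})
print(safe_plan > risky_plan)  # True
```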
link |
00:03:41.680
So there is this idea somehow that it's a new thing
link |
00:03:44.600
for people to try to design objective functions
link |
00:03:46.600
that are aligned with the common good.
link |
00:03:47.940
But no, we've been writing laws for millennia
link |
00:03:49.880
and that's exactly what it is.
link |
00:03:52.080
So that's where, you know, the science of lawmaking
link |
00:03:57.120
and computer science will.
link |
00:04:00.560
Come together.
link |
00:04:01.400
Will come together.
link |
00:04:02.840
So there's nothing special about HAL or AI systems,
link |
00:04:06.800
it's just the continuation of tools used
link |
00:04:09.480
to make some of these difficult ethical judgments
link |
00:04:11.740
that laws make.
link |
00:04:13.020
Yeah, and we have systems like this already
link |
00:04:15.080
that make many decisions for ourselves in society
link |
00:04:19.960
that need to be designed in a way that they,
link |
00:04:22.600
like rules about things that sometimes have bad side effects
link |
00:04:27.480
and we have to be flexible enough about those rules
link |
00:04:29.600
so that they can be broken when it's obvious
link |
00:04:31.560
that they shouldn't be applied.
link |
00:04:34.000
So you don't see this on the camera here,
link |
00:04:35.640
but all the decoration in this room
link |
00:04:36.920
is all pictures from 2001: A Space Odyssey.
link |
00:04:41.360
Wow, is that by accident or is that by design?
link |
00:04:43.720
No, not by accident, it's by design.
link |
00:04:47.440
Oh, wow.
link |
00:04:48.480
So if you were to build HAL 10,000,
link |
00:04:52.560
so an improvement of HAL 9,000, what would you improve?
link |
00:04:57.120
Well, first of all, I wouldn't ask it to hold secrets
link |
00:05:00.680
and tell lies because that's really what breaks it
link |
00:05:03.440
in the end, that's the fact that it's asking itself
link |
00:05:06.680
questions about the purpose of the mission
link |
00:05:08.840
and it's, you know, pieces things together that it's heard,
link |
00:05:11.560
you know, all the secrecy of the preparation of the mission
link |
00:05:14.000
and the fact that it was the discovery
link |
00:05:16.440
on the lunar surface that really was kept secret
link |
00:05:19.560
and one part of HAL's memory knows this
link |
00:05:22.360
and the other part does not know it
link |
00:05:24.720
and is supposed to not tell anyone
link |
00:05:26.720
and that creates internal conflict.
link |
00:05:28.600
So you think there's never should be a set of things
link |
00:05:32.240
that an AI system should not be allowed,
link |
00:05:36.600
like a set of facts that should not be shared
link |
00:05:39.920
with the human operators?
link |
00:05:42.400
Well, I think, no, I think it should be a bit like
link |
00:05:46.600
in the design of autonomous AI systems,
link |
00:05:52.040
there should be the equivalent of, you know,
link |
00:05:54.280
the Hippocratic oath
link |
00:05:59.080
that doctors sign up to, right?
link |
00:06:02.640
So there's certain things, certain rules
link |
00:06:04.120
that you have to abide by and we can sort of hardwire this
link |
00:06:07.280
into our machines to kind of make sure they don't go.
link |
00:06:11.000
So I'm not, you know, an advocate of the three laws
link |
00:06:14.720
of robotics, you know, the Asimov kind of thing
link |
00:06:17.120
because I don't think it's practical,
link |
00:06:18.560
but, you know, some level of limits.
link |
00:06:23.240
But to be clear, these are not questions
link |
00:06:27.960
that are kind of really worth asking today
link |
00:06:32.040
because we just don't have the technology to do this.
link |
00:06:34.360
We don't have autonomous intelligent machines,
link |
00:06:36.440
we have intelligent machines.
link |
00:06:37.560
Some are intelligent machines that are very specialized,
link |
00:06:41.000
but they don't really sort of satisfy an objective.
link |
00:06:43.360
They're just, you know, kind of trained to do one thing.
link |
00:06:46.520
So until we have some idea for design
link |
00:06:50.000
of a full fledged autonomous intelligent system,
link |
00:06:53.360
asking the question of how we design this objective,
link |
00:06:55.680
I think is a little too abstract.
link |
00:06:58.600
It's a little too abstract.
link |
00:06:59.680
There's useful elements to it in that it helps us understand
link |
00:07:04.240
our own ethical codes, humans.
link |
00:07:07.960
So even just as a thought experiment,
link |
00:07:10.240
if you imagine that an AGI system is here today,
link |
00:07:14.280
how would we program it is a kind of nice thought experiment
link |
00:07:17.640
of constructing how should we have a law,
link |
00:07:21.880
have a system of laws for us humans.
link |
00:07:24.360
It's just a nice practical tool.
link |
00:07:26.800
And I think there's echoes of that idea too
link |
00:07:29.760
in the AI systems we have today
link |
00:07:32.160
that don't have to be that intelligent.
link |
00:07:33.960
Yeah.
link |
00:07:34.800
Like autonomous vehicles.
link |
00:07:35.640
These things start creeping in that are worth thinking about,
link |
00:07:39.200
but certainly they shouldn't be framed as HAL.
link |
00:07:42.600
Yeah.
link |
00:07:43.720
Looking back, what is the most,
link |
00:07:46.720
I'm sorry if it's a silly question,
link |
00:07:49.440
but what is the most beautiful
link |
00:07:51.440
or surprising idea in deep learning
link |
00:07:53.800
or AI in general that you've ever come across?
link |
00:07:56.320
Sort of personally, when you sat back
link |
00:08:00.040
and just had this kind of,
link |
00:08:01.960
oh, that's pretty cool moment.
link |
00:08:03.920
That's nice.
link |
00:08:04.760
That's surprising.
link |
00:08:05.600
I don't know if it's an idea
link |
00:08:06.560
rather than a sort of empirical fact.
link |
00:08:12.160
The fact that you can build gigantic neural nets,
link |
00:08:16.440
train them on relatively small amounts of data
link |
00:08:23.400
with stochastic gradient descent
link |
00:08:24.840
and that it actually works,
link |
00:08:26.920
breaks everything you read in every textbook, right?
link |
00:08:29.240
Every pre-deep-learning textbook told you,
link |
00:08:32.560
you need to have fewer parameters
link |
00:08:33.920
than you have data samples.
link |
00:08:37.080
If you have a non-convex objective function,
link |
00:08:38.760
you have no guarantee of convergence.
link |
00:08:40.680
All those things that you read in textbooks
link |
00:08:42.080
and they tell you to stay away from this
link |
00:08:43.640
and they're all wrong.
link |
00:08:45.120
The huge number of parameters, non-convex,
link |
00:08:48.080
and somehow, with an amount of data that is small
link |
00:08:50.320
relative to the number of parameters,
link |
00:08:53.480
it's able to learn anything.
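A minimal sketch of that empirical fact, assuming PyTorch: a network with far more parameters than training samples, a non-convex loss, plain SGD, and it still fits. The sizes and hyperparameters below are arbitrary illustrations.

```python
# Overparameterized net + stochastic gradient descent on a non-convex loss:
# the setting the pre-deep-learning textbooks warned against.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 10)            # only 64 training samples
y = torch.randn(64, 1)

model = nn.Sequential(             # tens of thousands of parameters
    nn.Linear(10, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
print(sum(p.numel() for p in model.parameters()))  # far more params than samples

opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(2000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()   # non-convex in the weights
    loss.backward()
    opt.step()
print(loss.item())                 # typically driven close to zero anyway
```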
link |
00:08:54.840
Right.
link |
00:08:55.680
Does that still surprise you today?
link |
00:08:57.520
Well, it was kind of obvious to me
link |
00:09:00.360
before I knew anything that this is a good idea.
link |
00:09:04.120
And then it became surprising that it worked
link |
00:09:06.040
because I started reading those textbooks.
link |
00:09:09.240
Okay.
link |
00:09:10.080
Okay.
link |
00:09:10.920
So can you talk through the intuition
link |
00:09:12.280
of why it was obvious to you if you remember?
link |
00:09:14.360
Well, okay.
link |
00:09:15.200
So the intuition was it's sort of like,
link |
00:09:17.360
those people in the late 19th century
link |
00:09:19.960
who proved that heavier than air flight was impossible.
link |
00:09:25.480
And of course you have birds, right?
link |
00:09:26.800
They do fly.
link |
00:09:28.280
And so on the face of it,
link |
00:09:30.400
it's obviously wrong as an empirical question, right?
link |
00:09:33.200
And so we have the same kind of thing
link |
00:09:34.640
that we know that the brain works.
link |
00:09:38.560
We don't know how, but we know it works.
link |
00:09:39.920
And we know it's a large network of neurons and interaction
link |
00:09:43.160
and that learning takes place by changing the connection.
link |
00:09:45.360
So kind of getting this level of inspiration
link |
00:09:48.000
without copying the details,
link |
00:09:49.320
but sort of trying to derive basic principles,
link |
00:09:52.520
and that kind of gives you a clue
link |
00:09:56.760
as to which direction to go.
link |
00:09:58.320
There's also the idea somehow that I've been convinced of
link |
00:10:01.120
since I was an undergrad that, even before,
link |
00:10:04.640
that intelligence is inseparable from learning.
link |
00:10:06.840
So the idea somehow that you can create
link |
00:10:10.000
an intelligent machine by basically programming,
link |
00:10:14.040
for me it was a non starter from the start.
link |
00:10:17.600
Every intelligent entity that we know about
link |
00:10:20.280
arrives at this intelligence through learning.
link |
00:10:24.960
So machine learning was a completely obvious path.
link |
00:10:29.960
Also because I'm lazy, so, you know, the idea of
link |
00:10:32.000
automating basically everything,
link |
00:10:35.160
and learning is the automation of intelligence.
link |
00:10:37.840
So do you think, so what is learning then?
link |
00:10:42.920
What falls under learning?
link |
00:10:44.520
Because do you think of reasoning as learning?
link |
00:10:48.240
Well, reasoning is certainly a consequence
link |
00:10:51.600
of learning as well, just like other functions of the brain.
link |
00:10:56.600
The big question about reasoning is,
link |
00:10:58.160
how do you make reasoning compatible
link |
00:11:00.680
with gradient based learning?
link |
00:11:02.720
Do you think neural networks can be made to reason?
link |
00:11:04.960
Yes, there is no question about that.
link |
00:11:07.080
Again, we have a good example, right?
link |
00:11:10.320
The question is how?
link |
00:11:11.680
So the question is how much prior structure
link |
00:11:14.040
do you have to put in the neural net
link |
00:11:15.360
so that something like human reasoning
link |
00:11:17.480
will emerge from it, you know, from learning?
link |
00:11:20.840
Another question is all of our kind of model
link |
00:11:24.600
of what reasoning is that are based on logic
link |
00:11:27.240
are discrete and are therefore incompatible
link |
00:11:31.120
with gradient based learning.
link |
00:11:32.720
And I'm a very strong believer
link |
00:11:34.120
in this idea of gradient based learning.
link |
00:11:35.840
I don't really believe in other types of learning
link |
00:11:39.280
that don't use kind of gradient information if you want.
link |
00:11:41.920
So you don't like discrete mathematics?
link |
00:11:43.400
You don't like anything discrete?
link |
00:11:45.000
Well, that's, it's not that I don't like it,
link |
00:11:46.920
it's just that it's incompatible with learning
link |
00:11:49.200
and I'm a big fan of learning, right?
link |
00:11:51.120
So in fact, that's perhaps one reason
link |
00:11:53.600
why deep learning has been kind of looked at
link |
00:11:57.040
with suspicion by a lot of computer scientists
link |
00:11:58.720
because the math is very different.
link |
00:11:59.920
The math that you use for deep learning,
link |
00:12:02.480
you know, it kind of has more to do with,
link |
00:12:05.040
you know, cybernetics, the kind of math you do
link |
00:12:08.280
in electrical engineering than the kind of math
link |
00:12:10.600
you do in computer science.
link |
00:12:12.240
And, you know, nothing in machine learning is exact, right?
link |
00:12:15.680
Computer science is all about sort of, you know,
link |
00:12:18.520
obviously compulsive attention to details of like,
link |
00:12:21.960
you know, every index has to be right.
link |
00:12:23.760
And you can prove that an algorithm is correct, right?
link |
00:12:26.760
Machine learning is the science of sloppiness, really.
link |
00:12:30.360
That's beautiful.
link |
00:12:32.920
So, okay, maybe let's feel around in the dark
link |
00:12:38.200
of what is a neural network that reasons
link |
00:12:41.400
or a system that works with continuous functions
link |
00:12:47.840
that's able to do, build knowledge,
link |
00:12:52.400
however we think about reasoning,
link |
00:12:54.280
build on previous knowledge, build on extra knowledge,
link |
00:12:57.880
create new knowledge,
link |
00:12:59.520
generalize outside of any training set ever built.
link |
00:13:03.100
What does that look like?
link |
00:13:04.560
If, yeah, maybe give inklings of thoughts
link |
00:13:08.780
of what that might look like.
link |
00:13:10.860
Yeah, I mean, yes and no.
link |
00:13:12.320
If I had precise ideas about this,
link |
00:13:14.220
I think, you know, we'd be building it right now.
link |
00:13:17.280
And there are people working on this
link |
00:13:19.120
whose main research interest is actually exactly that, right?
link |
00:13:22.240
So what you need to have is a working memory.
link |
00:13:25.320
So you need to have some device, if you want,
link |
00:13:29.940
some subsystem that can store a relatively large number
link |
00:13:34.600
of factual episodic information for, you know,
link |
00:13:39.080
a reasonable amount of time.
link |
00:13:40.920
So, you know, in the brain, for example,
link |
00:13:43.920
there are kind of three main types of memory.
link |
00:13:45.800
One is the sort of memory of the state of your cortex.
link |
00:13:53.760
And that sort of disappears within 20 seconds.
link |
00:13:55.920
You can't remember things for more than about 20 seconds
link |
00:13:58.280
or a minute if you don't have any other form of memory.
link |
00:14:02.440
The second type of memory, which is longer term,
link |
00:14:04.480
is still short term, is the hippocampus.
link |
00:14:06.200
So you can, you know, you came into this building,
link |
00:14:08.360
you remember where the exit is, where the elevators are.
link |
00:14:14.000
You have some map of that building
link |
00:14:15.560
that's stored in your hippocampus.
link |
00:14:17.520
You might remember something about what I said,
link |
00:14:20.240
you know, a few minutes ago.
link |
00:14:21.400
I forgot it all already.
link |
00:14:22.320
Of course, it's been erased, but, you know,
link |
00:14:24.420
but that would be in your hippocampus.
link |
00:14:27.360
And then the longer term memory is in the synapse,
link |
00:14:30.700
the synapses, right?
link |
00:14:32.880
So what you need if you want a system
link |
00:14:34.640
that's capable of reasoning
link |
00:14:35.600
is that you want a hippocampus-like thing, right?
link |
00:14:40.240
And that's what people have tried to do
link |
00:14:41.800
with memory networks and, you know,
link |
00:14:43.720
Neural Turing Machines and stuff like that, right?
link |
00:14:45.800
And now with transformers,
link |
00:14:47.200
which have sort of a memory in there,
link |
00:14:50.540
kind of self attention system.
link |
00:14:51.980
You can think of it this way.
link |
00:14:55.720
So that's one element you need.
link |
00:14:57.160
Another thing you need is some sort of network
link |
00:14:59.880
that can access this memory,
link |
00:15:03.240
get an information back and then kind of crunch on it
link |
00:15:08.160
and then do this iteratively multiple times
link |
00:15:10.920
because a chain of reasoning is a process
link |
00:15:15.860
by which you update your knowledge
link |
00:15:19.400
about the state of the world,
link |
00:15:20.400
about, you know, what's going to happen, et cetera.
link |
00:15:22.820
And that has to be this sort of
link |
00:15:25.440
recurrent operation basically.
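A rough sketch of those two ingredients in plain NumPy: an associative memory read out with an attention-style soft lookup, and a small recurrent update that queries it several times in a row. The shapes and the update rule are made up for illustration.

```python
# Ingredient 1: an associative memory you can query softly (attention-style).
# Ingredient 2: a network that reads from it repeatedly, refining a state.
import numpy as np

rng = np.random.default_rng(0)
d = 16
memory_keys = rng.normal(size=(100, d))    # 100 stored "facts"
memory_vals = rng.normal(size=(100, d))

def read(query):
    """Soft, differentiable lookup: attention weights over all memory slots."""
    scores = memory_keys @ query / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memory_vals

W = rng.normal(size=(2 * d, d)) * 0.1      # stand-in for a learned update network

state = rng.normal(size=d)                 # current "belief" about the world
for step in range(5):                      # chain of reasoning = repeated read + update
    retrieved = read(state)
    state = np.tanh(np.concatenate([state, retrieved]) @ W)
```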
link |
00:15:27.120
And you think that kind of,
link |
00:15:29.160
if we think about a transformer,
link |
00:15:31.120
so that seems to be too small
link |
00:15:32.640
to contain the knowledge that's,
link |
00:15:36.240
to represent the knowledge
link |
00:15:37.280
that's contained in Wikipedia, for example.
link |
00:15:39.260
Well, a transformer doesn't have this idea of recurrence.
link |
00:15:42.000
It's got a fixed number of layers
link |
00:15:43.120
and that's the number of steps that, you know,
link |
00:15:44.680
limits basically its representation.
link |
00:15:47.120
But recurrence would build on the knowledge somehow.
link |
00:15:51.240
I mean, it would evolve the knowledge
link |
00:15:54.760
and expand the amount of information perhaps
link |
00:15:58.080
or useful information within that knowledge.
link |
00:16:00.360
But is this something that just can emerge with size?
link |
00:16:04.800
Because it seems like everything we have now is too small.
link |
00:16:06.440
Not just, no, it's not clear.
link |
00:16:09.360
I mean, how you access and write
link |
00:16:11.160
into an associative memory in an efficient way.
link |
00:16:13.800
I mean, sort of the original memory network
link |
00:16:15.240
maybe had something like the right architecture,
link |
00:16:17.560
but if you try to scale up a memory network
link |
00:16:20.540
so that the memory contains all the Wikipedia,
link |
00:16:22.880
it doesn't quite work.
link |
00:16:24.040
Right.
link |
00:16:25.120
So there's a need for new ideas there, okay.
link |
00:16:28.680
But it's not the only form of reasoning.
link |
00:16:30.000
So there's another form of reasoning,
link |
00:16:31.400
which is very classical also
link |
00:16:34.160
in some types of AI.
link |
00:16:36.720
And it's based on, let's call it energy minimization.
link |
00:16:40.920
Okay, so you have some sort of objective,
link |
00:16:44.960
some energy function that represents
link |
00:16:47.200
the quality or the negative quality, okay.
link |
00:16:53.320
Energy goes up when things get bad
link |
00:16:54.740
and it goes down when things get good.
link |
00:16:57.320
So let's say you want to figure out,
link |
00:17:00.480
what gestures do I need to do
link |
00:17:03.960
to grab an object or walk out the door.
link |
00:17:08.200
If you have a good model of your own body,
link |
00:17:10.360
a good model of the environment,
link |
00:17:12.500
using this kind of energy minimization,
link |
00:17:14.360
you can do planning.
link |
00:17:16.920
And in optimal control,
link |
00:17:19.280
it's called model predictive control.
link |
00:17:22.140
You have a model of what's gonna happen in the world
link |
00:17:24.140
as a consequence of your actions.
link |
00:17:25.520
And that allows you to, by energy minimization,
link |
00:17:28.600
figure out the sequence of action
link |
00:17:29.800
that optimizes a particular objective function,
link |
00:17:32.080
which measures, minimizes the number of times
link |
00:17:34.160
you're gonna hit something
link |
00:17:35.000
and the energy you're gonna spend
link |
00:17:36.540
doing the gesture and et cetera.
link |
00:17:39.800
So that's a form of reasoning.
link |
00:17:42.440
Planning is a form of reasoning.
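A minimal sketch of planning as energy minimization, assuming PyTorch and a toy hand-written dynamics model (not anything from the conversation): optimize a sequence of actions by gradient descent on a cost that penalizes distance to a goal and the energy spent, which is the inner optimization of what he calls model predictive control.

```python
# Planning by minimizing an "energy" over an action sequence,
# using a differentiable (toy) model of what happens next.
import torch

T = 20                                            # planning horizon
goal = torch.tensor([1.0, 1.0])
actions = torch.zeros(T, 2, requires_grad=True)   # the plan we optimize

def rollout_cost(actions):
    pos = torch.zeros(2)
    cost = 0.0
    for a in actions:                    # predictive model: pos_{t+1} = pos_t + 0.1 * a_t
        pos = pos + 0.1 * a
        cost = cost + ((pos - goal) ** 2).sum()    # distance-to-goal "energy"
        cost = cost + 0.01 * (a ** 2).sum()        # energy spent on the gesture
    return cost

opt = torch.optim.SGD([actions], lr=0.1)
for it in range(200):
    opt.zero_grad()
    c = rollout_cost(actions)
    c.backward()
    opt.step()
print(c.item())   # the cost decreases as the action sequence is refined
```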
link |
00:17:43.520
And perhaps what led to the ability of humans to reason
link |
00:17:48.040
is the fact that, or species that appear before us
link |
00:17:53.480
had to do some sort of planning
link |
00:17:55.080
to be able to hunt and survive
link |
00:17:56.960
and survive the winter in particular.
link |
00:17:59.600
And so it's the same capacity that you need to have.
link |
00:18:03.360
So in your intuition is,
link |
00:18:07.600
if we look at expert systems
link |
00:18:09.520
and encoding knowledge as logic systems,
link |
00:18:13.240
as graphs, in this kind of way,
link |
00:18:16.720
is not a useful way to think about knowledge?
link |
00:18:20.280
Graphs, or logic representations, are a little brittle.
link |
00:18:23.960
So basically, variables that have values
link |
00:18:27.880
and then constraint between them
link |
00:18:29.280
that are represented by rules,
link |
00:18:31.300
is a little too rigid and too brittle, right?
link |
00:18:32.860
So some of the early efforts in that respect
link |
00:18:38.640
were to put probabilities on them.
link |
00:18:41.020
So a rule, if you have this and that symptom,
link |
00:18:44.560
you have this disease with that probability
link |
00:18:47.200
and you should prescribe that antibiotic
link |
00:18:49.400
with that probability, right?
link |
00:18:50.520
That's the MYCIN system from the 70s.
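A toy version of that kind of probabilistic rule, in the spirit of certainty factors rather than MYCIN's actual syntax; the symptoms, diseases, and numbers are invented for illustration.

```python
# IF these symptoms THEN this disease, with an attached confidence.
rules = [
    {"if": {"fever", "stiff_neck"}, "then": "meningitis", "confidence": 0.7},
    {"if": {"fever", "cough"},      "then": "flu",        "confidence": 0.6},
]

def diagnose(symptoms):
    # a rule fires if all of its required symptoms are present
    return [(r["then"], r["confidence"]) for r in rules if r["if"] <= symptoms]

print(diagnose({"fever", "stiff_neck", "headache"}))
# [('meningitis', 0.7)]
```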
link |
00:18:54.320
And that's what that branch of AI led to,
link |
00:18:58.520
Bayesian networks and graphical models
link |
00:19:00.320
and causal inference and variational methods.
link |
00:19:04.960
So there is certainly a lot of interesting
link |
00:19:10.240
work going on in this area.
link |
00:19:11.440
The main issue with this is knowledge acquisition.
link |
00:19:13.880
How do you reduce a bunch of data to a graph of this type?
link |
00:19:18.880
Yeah, it relies on the expert, on the human being,
link |
00:19:22.720
to encode, to add knowledge.
link |
00:19:24.960
And that's essentially impractical.
link |
00:19:27.120
Yeah, it's not scalable.
link |
00:19:29.480
That's a big question.
link |
00:19:30.320
The second question is,
link |
00:19:31.440
do you want to represent knowledge as symbols
link |
00:19:34.640
and do you want to manipulate them with logic?
link |
00:19:37.240
And again, that's incompatible with learning.
link |
00:19:39.320
So one suggestion, which Jeff Hinton
link |
00:19:43.160
has been advocating for many decades,
link |
00:19:45.080
is replace symbols by vectors.
link |
00:19:49.360
Think of it as pattern of activities
link |
00:19:50.960
in a bunch of neurons or units
link |
00:19:53.320
or whatever you want to call them.
link |
00:19:55.120
And replace logic by continuous functions.
link |
00:19:59.560
Okay, and that becomes now compatible.
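An illustrative sketch of that suggestion: symbols become vectors, and a hard logical test becomes a smooth, differentiable score. The embeddings and the "soft AND" below are placeholders invented for this example, not Hinton's actual formulation.

```python
# Symbols as vectors, logic as continuous functions you can backpropagate through.
import numpy as np

rng = np.random.default_rng(0)
embed = {sym: rng.normal(size=8) for sym in ["bird", "can_fly", "penguin"]}

def soft_match(a, b):
    """Continuous stand-in for 'a matches b': cosine similarity squashed to (0, 1)."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 / (1.0 + np.exp(-5.0 * cos))

def soft_and(p, q):
    return p * q          # differentiable stand-in for logical AND

score = soft_and(soft_match(embed["penguin"], embed["bird"]),
                 soft_match(embed["penguin"], embed["can_fly"]))
# 'score' is a number in (0, 1) with gradients, instead of a brittle True/False.
```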
link |
00:20:01.840
There's a very good set of ideas
link |
00:20:04.960
by, written in a paper about 10 years ago
link |
00:20:07.640
by Léon Bottou, who is here at Facebook.
link |
00:20:13.160
The title of the paper is,
link |
00:20:14.400
From Machine Learning to Machine Reasoning.
link |
00:20:15.840
And his idea is that a learning system
link |
00:20:19.480
should be able to manipulate objects
link |
00:20:20.880
that are in a space
link |
00:20:23.160
and then put the result back in the same space.
link |
00:20:24.920
So it's this idea of working memory, basically.
link |
00:20:28.400
And it's very enlightening.
link |
00:20:30.640
And in a sense, that might learn something
link |
00:20:33.720
like the simple expert systems.
link |
00:20:37.920
I mean, you can learn basic logic operations there.
link |
00:20:42.080
Yeah, quite possibly.
link |
00:20:43.400
There's a big debate on sort of how much prior structure
link |
00:20:46.680
you have to put in for this kind of stuff to emerge.
link |
00:20:49.080
That's the debate I have with Gary Marcus
link |
00:20:50.720
and people like that.
link |
00:20:51.560
Yeah, yeah, so, and the other person,
link |
00:20:55.040
so I just talked to Judea Pearl,
link |
00:20:57.520
from the causal inference world you mentioned.
link |
00:21:00.240
So his worry is that the current neural networks
link |
00:21:04.160
are not able to learn what causes
link |
00:21:09.600
what causal inference between things.
link |
00:21:12.760
So I think he's right and wrong about this.
link |
00:21:15.640
If he's talking about the sort of classic
link |
00:21:20.280
type of neural nets,
link |
00:21:21.320
people sort of didn't worry too much about this.
link |
00:21:23.800
But there's a lot of people now working on causal inference.
link |
00:21:26.160
And there's a paper that just came out last week
link |
00:21:27.840
by Léon Bottou, among others,
link |
00:21:29.160
David Lopez-Paz, and a bunch of other people,
link |
00:21:32.000
exactly on that problem of how do you kind of
link |
00:21:36.880
get a neural net to sort of pay attention
link |
00:21:39.400
to real causal relationships,
link |
00:21:41.600
which may also solve issues of bias in data
link |
00:21:46.600
and things like this, so.
link |
00:21:48.040
I'd like to read that paper
link |
00:21:49.200
because that ultimately the challenges
link |
00:21:51.960
also seems to fall back on the human expert
link |
00:21:56.920
to ultimately decide causality between things.
link |
00:22:01.880
People are not very good
link |
00:22:02.720
at establishing causality, first of all.
link |
00:22:04.800
So first of all, you talk to physicists
link |
00:22:06.560
and physicists actually don't believe in causality
link |
00:22:08.600
because look at all the basic laws of microphysics
link |
00:22:12.960
are time reversible, so there's no causality.
link |
00:22:15.480
The arrow of time is not real, yeah.
link |
00:22:17.120
It's as soon as you start looking at macroscopic systems
link |
00:22:20.440
where there is unpredictable randomness,
link |
00:22:22.800
where there is clearly an arrow of time,
link |
00:22:25.440
but it's a big mystery in physics, actually,
link |
00:22:27.320
how that emerges.
link |
00:22:29.160
Is it emergent or is it part of
link |
00:22:31.720
the fundamental fabric of reality?
link |
00:22:34.320
Or is it a bias of intelligent systems
link |
00:22:36.880
that because of the second law of thermodynamics,
link |
00:22:39.280
we perceive a particular arrow of time,
link |
00:22:41.440
but in fact, it's kind of arbitrary, right?
link |
00:22:45.120
So yeah, physicists, mathematicians,
link |
00:22:47.120
they don't care about, I mean,
link |
00:22:48.440
the math doesn't care about the flow of time.
link |
00:22:51.520
Well, certainly, microphysics doesn't.
link |
00:22:54.080
People themselves are not very good
link |
00:22:55.440
at establishing causal relationships.
link |
00:22:58.920
If you ask, I think it was in one of Seymour Papert's books
link |
00:23:02.760
on children learning.
link |
00:23:06.800
He studied with Jean Piaget.
link |
00:23:08.840
He's the guy who coauthored the book Perceptrons
link |
00:23:11.520
with Marvin Minsky that kind of killed
link |
00:23:12.960
the first wave of neural nets,
link |
00:23:14.080
but he was actually a learning person.
link |
00:23:17.200
He, in the sense of studying learning in humans
link |
00:23:21.040
and machines, that's why he got interested in Perceptron.
link |
00:23:24.160
And he wrote that if you ask a little kid
link |
00:23:29.280
about what is the cause of the wind,
link |
00:23:33.720
a lot of kids will say, they will think for a while
link |
00:23:35.840
and they'll say, oh, it's the branches in the trees,
link |
00:23:38.120
they move and that creates wind, right?
link |
00:23:40.120
So they get the causal relationship backwards.
link |
00:23:42.600
And it's because their understanding of the world
link |
00:23:44.520
and intuitive physics is not that great, right?
link |
00:23:46.280
I mean, these are like, you know, four or five year old kids.
link |
00:23:49.880
You know, it gets better,
link |
00:23:50.720
and then you understand that this, it can be, right?
link |
00:23:54.080
But there are many things which we can,
link |
00:23:57.440
because of our common sense understanding of things,
link |
00:24:00.920
what people call common sense,
link |
00:24:03.280
and our understanding of physics,
link |
00:24:05.000
we can, there's a lot of stuff
link |
00:24:07.640
that we can figure out causality.
link |
00:24:08.840
Even with diseases, we can figure out
link |
00:24:10.480
what's not causing what, often.
link |
00:24:14.520
There's a lot of mystery, of course,
link |
00:24:16.040
but the idea is that you should be able
link |
00:24:18.120
to encode that into systems,
link |
00:24:20.160
because it seems unlikely they'd be able
link |
00:24:21.400
to figure that out themselves.
link |
00:24:22.800
Well, whenever we can do intervention,
link |
00:24:24.480
but you know, all of humanity has been completely deluded
link |
00:24:27.400
for millennia, probably since its existence,
link |
00:24:30.400
about a very, very wrong causal relationship,
link |
00:24:33.420
where whatever you can explain, you attribute it to,
link |
00:24:35.720
you know, some deity, some divinity, right?
link |
00:24:39.240
And that's a cop out, that's a way of saying like,
link |
00:24:41.000
I don't know the cause, so you know, God did it, right?
link |
00:24:43.920
So you mentioned Marvin Minsky,
link |
00:24:46.240
and the irony of, you know,
link |
00:24:51.520
maybe causing the first AI winter.
link |
00:24:54.580
You were there in the 90s, you were there in the 80s,
link |
00:24:56.920
of course.
link |
00:24:58.120
In the 90s, why do you think people lost faith
link |
00:25:00.640
in deep learning, in the 90s, and found it again,
link |
00:25:04.000
a decade later, over a decade later?
link |
00:25:06.360
Yeah, it wasn't called deep learning yet,
link |
00:25:07.760
it was just called neural nets, but yeah,
link |
00:25:11.880
they lost interest.
link |
00:25:13.840
I mean, I think I would put that around 1995,
link |
00:25:16.840
at least the machine learning community,
link |
00:25:18.080
there was always a neural net community,
link |
00:25:19.660
but it became kind of disconnected
link |
00:25:23.760
from sort of mainstream machine learning, if you want.
link |
00:25:26.560
There were, it was basically electrical engineering
link |
00:25:30.960
that kept at it, and computer science gave up on neural nets.
link |
00:25:38.000
I don't know, you know, I was too close to it
link |
00:25:40.520
to really sort of analyze it with sort of an unbiased eye,
link |
00:25:46.960
if you want, but I would make a few guesses.
link |
00:25:50.760
So the first one is, at the time, neural nets were,
link |
00:25:55.760
it was very hard to make them work,
link |
00:25:57.880
in the sense that you would implement backprop
link |
00:26:02.400
in your favorite language, and that favorite language
link |
00:26:06.120
was not Python, it was not MATLAB,
link |
00:26:08.240
it was not any of those things,
link |
00:26:09.320
because they didn't exist, right?
link |
00:26:10.760
You had to write it in Fortran or C,
link |
00:26:13.320
or something like this, right?
link |
00:26:16.320
So you would experiment with it,
link |
00:26:18.680
you would probably make some very basic mistakes,
link |
00:26:21.200
like, you know, badly initialize your weights,
link |
00:26:23.240
make the network too small,
link |
00:26:24.200
because you read in the textbook, you know,
link |
00:26:25.520
you don't want too many parameters, right?
link |
00:26:27.640
And of course, you know, and you would train on XOR,
link |
00:26:29.280
because you didn't have any other data set to train on.
link |
00:26:32.000
And of course, you know, it works half the time.
link |
00:26:33.760
So you would say, I give up.
link |
00:26:36.280
Also, you would train it with batch gradient,
link |
00:26:37.680
which, you know, isn't that efficient.
link |
00:26:40.240
So there's a lot of, there's a bag of tricks
link |
00:26:42.680
that you had to know to make those things work,
link |
00:26:44.840
or you had to reinvent, and a lot of people just didn't,
link |
00:26:48.200
and they just couldn't make it work.
link |
00:26:51.320
So that's one thing.
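For concreteness, a sketch of that kind of experiment done with the tricks that were easy to miss at the time: reasonable weight initialization, a hidden layer that isn't too small, and per-sample (stochastic) updates rather than batch gradient, on XOR. Plain NumPy, illustrative only.

```python
# A small multilayer net trained on XOR with the 90s "bag of tricks".
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

W1 = rng.normal(scale=1.0, size=(2, 8))   # hidden layer not too small, sane init
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=8)
b2 = 0.0
lr = 0.1

for epoch in range(5000):
    for i in rng.permutation(4):          # stochastic updates, one sample at a time
        h = np.tanh(X[i] @ W1 + b1)
        out = h @ W2 + b2
        err = out - y[i]
        # backprop of the squared error on this single sample
        gh = err * W2 * (1 - h ** 2)
        W2 -= lr * err * h
        b2 -= lr * err
        W1 -= lr * np.outer(X[i], gh)
        b1 -= lr * gh

print(np.round([np.tanh(x @ W1 + b1) @ W2 + b2 for x in X], 2))  # typically ≈ [0, 1, 1, 0]
```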
link |
00:26:52.400
The investment in software platform
link |
00:26:54.720
to be able to kind of, you know, display things,
link |
00:26:58.120
figure out why things don't work,
link |
00:26:59.360
kind of get a good intuition for how to get them to work,
link |
00:27:02.120
have enough flexibility so you can create, you know,
link |
00:27:04.640
network architectures like convolutional nets
link |
00:27:06.240
and stuff like that.
link |
00:27:08.320
It was hard.
link |
00:27:09.160
I mean, you had to write everything from scratch.
link |
00:27:10.520
And again, you didn't have any Python
link |
00:27:11.840
or MATLAB or anything, right?
link |
00:27:14.280
I read that, sorry to interrupt,
link |
00:27:15.600
but I read that you wrote in Lisp
link |
00:27:17.680
the first versions of LeNet, the convolutional network,
link |
00:27:22.680
which by the way, one of my favorite languages.
link |
00:27:25.320
That's how I knew you were legit.
link |
00:27:27.560
Turing award, whatever.
link |
00:27:29.440
You programmed in Lisp, that's...
link |
00:27:30.760
It's still my favorite language,
link |
00:27:31.920
but it's not that we programmed in Lisp,
link |
00:27:34.880
it's that we had to write our Lisp interpreter, okay?
link |
00:27:38.000
Because it's not like we used one that existed.
link |
00:27:40.320
So we wrote a Lisp interpreter that we hooked up to,
link |
00:27:43.880
you know, a backend library that we wrote also
link |
00:27:46.640
for sort of neural net computation.
link |
00:27:48.440
And then after a few years around 1991,
link |
00:27:50.840
we invented this idea of basically having modules
link |
00:27:54.560
that know how to forward propagate
link |
00:27:56.160
and back propagate gradients,
link |
00:27:57.560
and then interconnecting those modules in a graph.
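A toy version of that idea, modules that each know how to forward-propagate activations and back-propagate gradients, chained into a graph; this is a sketch of the pattern (which modern autograd frameworks generalize), not the actual Bell Labs code.

```python
# Modules with forward() and backward(), composed into a graph (here, a chain).
import numpy as np

class Linear:
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_in, n_out) * 0.1
    def forward(self, x):
        self.x = x
        return x @ self.W
    def backward(self, grad_out, lr=0.01):
        grad_in = grad_out @ self.W.T
        self.W -= lr * np.outer(self.x, grad_out)   # SGD update on the way back
        return grad_in

class Tanh:
    def forward(self, x):
        self.y = np.tanh(x)
        return self.y
    def backward(self, grad_out, lr=0.01):
        return grad_out * (1.0 - self.y ** 2)

net = [Linear(4, 8), Tanh(), Linear(8, 1)]
x, target = np.random.randn(4), np.array([1.0])
for _ in range(100):
    h = x
    for m in net:                                   # forward pass, module by module
        h = m.forward(h)
    grad = 2.0 * (h - target)                       # d(MSE)/d(output)
    for m in reversed(net):                         # backward pass through the same graph
        grad = m.backward(grad)
```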
link |
00:28:01.480
Léon Bottou had made proposals on this,
link |
00:28:03.280
about this in the late eighties,
link |
00:28:04.720
and we were able to implement this using our Lisp system.
link |
00:28:08.200
Eventually we wanted to use that system
link |
00:28:09.800
to build production code for character recognition
link |
00:28:13.800
at Bell Labs.
link |
00:28:14.640
So we actually wrote a compiler for that Lisp interpreter
link |
00:28:16.760
so that Patrice Simard, who is now at Microsoft,
link |
00:28:19.280
kind of did the bulk of it with Leon and me.
link |
00:28:22.400
And so we could write our system in Lisp
link |
00:28:24.920
and then compile to C,
link |
00:28:26.520
and then we'll have a self contained complete system
link |
00:28:29.720
that could kind of do the entire thing.
link |
00:28:33.280
Neither PyTorch nor TensorFlow can do this today.
link |
00:28:36.080
Yeah, okay, it's coming.
link |
00:28:37.840
Yeah.
link |
00:28:40.080
I mean, there's something like that in PyTorch
link |
00:28:42.000
called TorchScript.
link |
00:28:44.520
And so, you know, we had to write our Lisp interpreter,
link |
00:28:46.840
we had to write our Lisp compiler,
link |
00:28:48.000
we had to invest a huge amount of effort to do this.
link |
00:28:50.840
And not everybody,
link |
00:28:52.320
if you don't completely believe in the concept,
link |
00:28:55.040
you're not going to invest the time to do this.
link |
00:28:57.040
Now at the time also, you know,
link |
00:28:59.160
or today, this would turn into Torch or PyTorch
link |
00:29:02.640
or TensorFlow or whatever,
link |
00:29:03.840
we'd put it in open source, everybody would use it
link |
00:29:05.720
and, you know, realize it's good.
link |
00:29:07.920
Back before 1995, working at AT&T,
link |
00:29:11.240
there's no way the lawyers would let you
link |
00:29:13.720
release anything in open source of this nature.
link |
00:29:17.680
And so we could not distribute our code really.
link |
00:29:20.600
And on that point,
link |
00:29:21.920
and sorry to go on a million tangents,
link |
00:29:23.520
but on that point, I also read that there was some,
link |
00:29:26.560
almost like a patent on convolutional neural networks
link |
00:29:30.000
at Bell Labs.
link |
00:29:32.000
So that, first of all, I mean, just.
link |
00:29:35.680
There's two actually.
link |
00:29:38.000
That ran out.
link |
00:29:39.840
Thankfully, in 2007.
link |
00:29:41.840
In 2007.
link |
00:29:42.680
So I'm gonna, what,
link |
00:29:46.800
can we just talk about that for a second?
link |
00:29:48.600
I know you're a Facebook, but you're also at NYU.
link |
00:29:51.200
And what does it mean to patent ideas
link |
00:29:55.520
like these software ideas, essentially?
link |
00:29:58.920
Or what are mathematical ideas?
link |
00:30:02.360
Or what are they?
link |
00:30:03.320
Okay, so they're not mathematical ideas.
link |
00:30:05.640
They are, you know, algorithms.
link |
00:30:07.600
And there was a period where the US Patent Office
link |
00:30:11.200
would allow the patent of software
link |
00:30:14.000
as long as it was embodied.
link |
00:30:16.280
The Europeans are very different.
link |
00:30:18.120
They don't quite accept that.
link |
00:30:20.320
They have a different concept.
link |
00:30:21.160
But, you know, I don't, I no longer,
link |
00:30:24.040
I mean, I never actually strongly believed in this,
link |
00:30:26.280
but I don't believe in this kind of patent.
link |
00:30:28.880
Facebook basically doesn't believe in this kind of patent.
link |
00:30:34.040
Google files patents because they've been burned by Apple.
link |
00:30:39.040
And so now they do this for defensive purpose,
link |
00:30:41.360
but usually they say,
link |
00:30:42.720
we're not gonna sue you if you infringe.
link |
00:30:44.760
Facebook has a similar policy.
link |
00:30:47.080
They say, you know, we file patents on certain things
link |
00:30:49.560
for defensive purpose.
link |
00:30:50.480
We're not gonna sue you if you infringe,
link |
00:30:52.080
unless you sue us.
link |
00:30:54.600
So the industry does not believe in patents.
link |
00:30:59.240
They are there because of, you know,
link |
00:31:00.720
the legal landscape and various things.
link |
00:31:03.280
But I don't really believe in patents
link |
00:31:06.280
for this kind of stuff.
link |
00:31:07.560
So that's a great thing.
link |
00:31:09.600
So I...
link |
00:31:10.440
I'll tell you a worse story, actually.
link |
00:31:11.800
So what happened was, the first patent on convolutional nets
link |
00:31:15.440
was about kind of the early version of convolutional net
link |
00:31:18.240
that didn't have separate pooling layers.
link |
00:31:19.960
It had convolutional layers
link |
00:31:22.880
with stride larger than one, if you want, right?
link |
00:31:25.240
And then there was a second one on convolutional nets
link |
00:31:28.440
with separate pooling layers, trained with backprop.
link |
00:31:31.720
And they were filed in 1989 and 1990
link |
00:31:35.280
or something like this.
link |
00:31:36.240
At the time, the life of a patent was 17 years.
link |
00:31:40.280
So here's what happened over the next few years
link |
00:31:42.080
is that we started developing character recognition
link |
00:31:45.480
technology around convolutional nets.
link |
00:31:48.640
And in 1994,
link |
00:31:52.200
a check reading system was deployed in ATM machines.
link |
00:31:56.160
In 1995, it was for large check reading machines
link |
00:31:59.040
in back offices, et cetera.
link |
00:32:00.520
And those systems were developed by an engineering group
link |
00:32:04.840
that we were collaborating with at AT&T.
link |
00:32:07.000
And they were commercialized by NCR,
link |
00:32:08.640
which at the time was a subsidiary of AT&T.
link |
00:32:11.640
Now AT&T split up in 1996,
link |
00:32:17.000
early 1996.
link |
00:32:18.640
And the lawyers just looked at all the patents
link |
00:32:20.440
and they distributed the patents among the various companies.
link |
00:32:23.000
They gave the convolutional net patent to NCR
link |
00:32:26.440
because they were actually selling products that used it.
link |
00:32:29.240
But nobody at NCR had any idea what a convolutional net was.
link |
00:32:32.320
Yeah.
link |
00:32:33.240
Okay.
link |
00:32:34.080
So between 1996 and 2007,
link |
00:32:38.080
so there's a whole period until 2002
link |
00:32:39.880
where I didn't actually work on machine learning
link |
00:32:42.040
or convolutional net.
link |
00:32:42.880
I resumed working on this around 2002.
link |
00:32:45.920
And between 2002 and 2007,
link |
00:32:47.520
I was working on them, crossing my finger
link |
00:32:49.560
that nobody at NCR would notice.
link |
00:32:51.240
Nobody noticed.
link |
00:32:52.080
Yeah, and I hope that this kind of somewhat,
link |
00:32:55.640
as you said, lawyers aside,
link |
00:32:58.320
relative openness of the community now will continue.
link |
00:33:02.920
It accelerates the entire progress of the industry.
link |
00:33:05.960
And the problems that Facebook and Google
link |
00:33:11.600
and others are facing today
link |
00:33:13.040
is not whether Facebook or Google or Microsoft or IBM
link |
00:33:16.000
or whoever is ahead of the other.
link |
00:33:18.080
It's that we don't have the technology
link |
00:33:19.680
to build the things we want to build.
link |
00:33:21.080
We want to build intelligent virtual assistants
link |
00:33:23.240
that have common sense.
link |
00:33:24.960
We don't have monopoly on good ideas for this.
link |
00:33:26.720
We don't believe we do.
link |
00:33:27.960
Maybe others believe they do, but we don't.
link |
00:33:30.440
Okay.
link |
00:33:31.320
If a startup tells you they have the secret
link |
00:33:33.840
to human level intelligence and common sense,
link |
00:33:36.880
don't believe them, they don't.
link |
00:33:38.240
And it's gonna take the entire work
link |
00:33:42.760
of the world research community for a while
link |
00:33:45.240
to get to the point where you can go off
link |
00:33:47.600
and each of those companies
link |
00:33:49.240
kind of start to build things on this.
link |
00:33:50.640
We're not there yet.
link |
00:33:51.760
Absolutely, and this speaks to the gap
link |
00:33:54.680
between the space of ideas
link |
00:33:57.000
and the rigorous testing of those ideas
link |
00:34:00.440
of practical application that you often speak to.
link |
00:34:03.560
You've written advice saying don't get fooled
link |
00:34:06.320
by people who claim to have a solution
link |
00:34:08.760
to artificial general intelligence,
link |
00:34:10.560
who claim to have an AI system
link |
00:34:11.960
that works just like the human brain
link |
00:34:14.280
or who claim to have figured out how the brain works.
link |
00:34:17.080
Ask them what the error rate they get
link |
00:34:20.960
on MNIST or ImageNet.
link |
00:34:23.120
So this is a little dated by the way.
link |
00:34:25.400
2000, I mean five years, who's counting?
link |
00:34:28.280
Okay, but I think your opinion is still,
link |
00:34:30.920
MNIST and ImageNet, yes, may be dated,
link |
00:34:34.920
there may be new benchmarks, right?
link |
00:34:36.360
But I think that philosophy is one you still
link |
00:34:39.360
somewhat hold, that benchmarks
link |
00:34:43.400
and the practical testing, the practical application
link |
00:34:45.760
is where you really get to test the ideas.
link |
00:34:48.000
Well, it may not be completely practical.
link |
00:34:49.840
Like for example, it could be a toy data set,
link |
00:34:52.480
but it has to be some sort of task
link |
00:34:54.880
that the community as a whole has accepted
link |
00:34:57.320
as some sort of standard kind of benchmark if you want.
link |
00:35:00.640
It doesn't need to be real.
link |
00:35:01.480
So for example, many years ago here at FAIR,
link |
00:35:05.400
people, Jason Weston and Antoine Bordes
link |
00:35:07.080
and a few others proposed the bAbI tasks,
link |
00:35:09.080
which were kind of a toy problem to test
link |
00:35:12.280
the ability of machines to reason actually
link |
00:35:14.360
to access working memory and things like this.
link |
00:35:16.960
And it was very useful even though it wasn't a real task.
link |
00:35:20.120
MNIST is kind of halfway real task.
link |
00:35:23.680
So toy problems can be very useful.
link |
00:35:26.040
It's just that I was really struck by the fact
link |
00:35:29.000
that a lot of people, particularly a lot of people
link |
00:35:31.160
with money to invest would be fooled by people telling them,
link |
00:35:34.380
oh, we have the algorithm of the cortex
link |
00:35:37.400
and you should give us 50 million.
link |
00:35:39.360
Yes, absolutely.
link |
00:35:40.200
So there's a lot of people who try to take advantage
link |
00:35:45.280
of the hype for business reasons and so on.
link |
00:35:48.240
But let me sort of talk to this idea
link |
00:35:50.800
that sort of new ideas, the ideas that push the field
link |
00:35:55.320
forward may not yet have a benchmark
link |
00:35:58.620
or it may be very difficult to establish a benchmark.
link |
00:36:00.880
I agree.
link |
00:36:01.720
That's part of the process.
link |
00:36:02.560
Establishing benchmarks is part of the process.
link |
00:36:04.600
So what are your thoughts about,
link |
00:36:07.300
so we have these benchmarks on around stuff we can do
link |
00:36:10.960
with images from classification to captioning
link |
00:36:14.920
to just every kind of information you can pull off
link |
00:36:16.940
from images and the surface level.
link |
00:36:18.880
There's audio data sets, there's some video.
link |
00:36:22.600
What can we start, natural language, what kind of stuff,
link |
00:36:27.480
what kind of benchmarks do you see that start creeping
link |
00:36:30.160
on to more something like intelligence, like reasoning,
link |
00:36:34.840
like maybe you don't like the term,
link |
00:36:37.440
but AGI echoes of that kind of formulation.
link |
00:36:41.520
A lot of people are working on interactive environments
link |
00:36:44.160
in which you can train and test intelligence systems.
link |
00:36:48.120
So there, for example, it's the classical paradigm
link |
00:36:54.840
of supervised learning is that you have a data set,
link |
00:36:57.960
you partition it into a training set, validation set,
link |
00:37:00.040
test set, and there's a clear protocol, right?
link |
00:37:03.040
But that assumes that the samples
link |
00:37:06.400
are statistically independent, you can exchange them,
link |
00:37:10.100
the order in which you see them shouldn't matter,
link |
00:37:12.240
things like that.
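The classical protocol he is describing, in a few lines of illustrative NumPy (arbitrary sizes); the whole procedure rests on the i.i.d. assumption that breaks as soon as the system's actions choose the next sample.

```python
# The standard supervised protocol: shuffle, split, evaluate on held-out data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

idx = rng.permutation(len(X))                  # exchangeable order: shuffling is harmless
train, val, test = np.split(idx, [700, 850])   # 70% / 15% / 15%

X_train, y_train = X[train], y[train]
X_val,   y_val   = X[val],   y[val]            # for model selection
X_test,  y_test  = X[test],  y[test]           # touched once, at the very end
```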
link |
00:37:13.480
But what if the answer you give determines
link |
00:37:16.020
the next sample you see, which is the case, for example,
link |
00:37:18.760
in robotics, right?
link |
00:37:19.600
Your robot does something and then it gets exposed
link |
00:37:22.480
to a new room, and depending on where it goes,
link |
00:37:25.120
the room would be different.
link |
00:37:26.000
So that creates the exploration problem.
link |
00:37:30.120
And that also creates a dependency
link |
00:37:34.280
between samples, right?
link |
00:37:35.480
You, if you move, if you can only move in space,
link |
00:37:39.640
the next sample you're gonna see is gonna be probably
link |
00:37:41.840
in the same building, most likely, right?
link |
00:37:44.080
So all the assumptions about the validity
link |
00:37:47.920
of this training set, test set hypothesis break.
link |
00:37:51.560
Whenever a machine can take an action
link |
00:37:53.120
that has an influence in the world,
link |
00:37:54.960
and it's what it's gonna see.
link |
00:37:56.400
So people are setting up artificial environments
link |
00:38:00.160
where that takes place, right?
link |
00:38:02.080
The robot runs around a 3D model of a house
link |
00:38:05.840
and can interact with objects and things like this.
link |
00:38:08.680
So you do robotics based simulation,
link |
00:38:10.380
you have those OpenAI Gym type things
link |
00:38:14.400
or MuJoCo kind of simulated robots
link |
00:38:18.800
and you have games, things like that.
link |
00:38:21.280
So that's where the field is going really,
link |
00:38:23.640
this kind of environment.
link |
00:38:25.760
Now, back to the question of AGI.
link |
00:38:28.600
I don't like the term AGI because it implies
link |
00:38:33.180
that human intelligence is general
link |
00:38:35.760
and human intelligence is nothing like general.
link |
00:38:38.360
It's very, very specialized.
link |
00:38:40.840
We think it's general.
link |
00:38:41.720
We'd like to think of ourselves
link |
00:38:42.760
as having general intelligence.
link |
00:38:43.840
We don't, we're very specialized.
link |
00:38:46.120
We're only slightly more general than.
link |
00:38:47.560
Why does it feel general?
link |
00:38:48.900
So you kind of, the term general.
link |
00:38:52.040
I think what's impressive about humans is ability to learn,
link |
00:38:56.320
as we were talking about learning,
link |
00:38:58.240
to learn in just so many different domains.
link |
00:39:01.280
It's perhaps not arbitrarily general,
link |
00:39:04.440
but just you can learn in many domains
link |
00:39:06.440
and integrate that knowledge somehow.
link |
00:39:08.240
Okay.
link |
00:39:09.080
The knowledge persists.
link |
00:39:09.920
So let me take a very specific example.
link |
00:39:11.640
Yes.
link |
00:39:12.480
It's not an example.
link |
00:39:13.300
It's more like a quasi mathematical demonstration.
link |
00:39:17.080
So you have about 1 million fibers
link |
00:39:18.520
coming out of one of your eyes.
link |
00:39:20.420
Okay, 2 million total,
link |
00:39:21.320
but let's talk about just one of them.
link |
00:39:23.440
It's 1 million nerve fibers, your optical nerve.
link |
00:39:27.160
Let's imagine that they are binary.
link |
00:39:28.800
So they can be active or inactive, right?
link |
00:39:30.640
So the input to your visual cortex is 1 million bits.
link |
00:39:34.060
Mm hmm.
link |
00:39:36.900
Now they're connected to your brain in a particular way,
link |
00:39:39.420
and your brain has connections
link |
00:39:41.940
that are kind of a little bit like a convolutional net,
link |
00:39:44.180
they're kind of local, you know, in space
link |
00:39:46.780
and things like this.
link |
00:39:47.940
Now, imagine I play a trick on you.
link |
00:39:50.980
It's a pretty nasty trick, I admit.
link |
00:39:53.060
I cut your optical nerve,
link |
00:39:55.720
and I put a device that makes a random perturbation
link |
00:39:58.500
of all the nerve fibers.
link |
00:40:01.100
So now what comes to your brain
link |
00:40:04.580
is a fixed but random permutation of all the pixels.
link |
00:40:09.160
There's no way in hell that your visual cortex,
link |
00:40:11.380
even if I do this to you in infancy,
link |
00:40:14.760
will actually learn vision
link |
00:40:16.500
to the same level of quality that you can.
link |
00:40:20.060
Got it, and you're saying there's no way you've learned that?
link |
00:40:22.700
No, because now two pixels that are nearby in the world
link |
00:40:25.620
will end up in very different places in your visual cortex,
link |
00:40:29.240
and your neurons there have no connections with each other
link |
00:40:31.620
because they're only connected locally.
link |
00:40:33.500
So this whole, our entire, the hardware is built
link |
00:40:36.660
in many ways to support?
link |
00:40:38.620
The locality of the real world.
link |
00:40:40.180
Yes, that's specialization.
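A small numerical illustration of the thought experiment (not from the conversation): one fixed random permutation of the pixels keeps all the information, but destroys the local correlations that locally connected hardware, like a ConvNet's small kernels, relies on.

```python
# Apply one fixed random permutation to an image's pixels: nearby pixels
# land far apart, so local 3x3 kernels no longer see correlated neighbors.
import numpy as np

rng = np.random.default_rng(0)
H = W = 28
perm = rng.permutation(H * W)            # fixed once, like the rewired optic nerve

def scramble(img):
    return img.reshape(-1)[perm].reshape(H, W)

img = np.zeros((H, W))
img[10:18, 10:18] = 1.0                  # a compact, local blob
scrambled = scramble(img)

def neighbor_corr(a):
    """Correlation between each pixel and its right-hand neighbor."""
    return np.corrcoef(a[:, :-1].ravel(), a[:, 1:].ravel())[0, 1]

print(neighbor_corr(img))        # high: nearby pixels are similar
print(neighbor_corr(scrambled))  # near zero: locality is gone
```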
link |
00:40:42.580
Yeah, but it's still pretty damn impressive,
link |
00:40:44.580
so it's not perfect generalization, it's not even close.
link |
00:40:46.980
No, no, it's not that it's not even close, it's not at all.
link |
00:40:50.960
Yeah, it's not, it's specialized, yeah.
link |
00:40:52.220
So how many Boolean functions?
link |
00:40:54.020
So let's imagine you want to train your visual system
link |
00:40:58.260
to recognize particular patterns of those one million bits.
link |
00:41:03.820
Okay, so that's a Boolean function, right?
link |
00:41:05.780
Either the pattern is here or not here,
link |
00:41:07.020
this is a two way classification
link |
00:41:09.200
with one million binary inputs.
link |
00:41:13.620
How many such Boolean functions are there?
link |
00:41:16.260
Okay, you have two to the one million
link |
00:41:19.940
combinations of inputs,
link |
00:41:21.180
for each of those you have an output bit,
link |
00:41:24.060
and so you have two to the two to the one million
link |
00:41:27.660
Boolean functions of this type, okay?
link |
00:41:30.060
Which is an unimaginably large number.
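A small sketch of the counting argument (illustrative code, not from the conversation): with n binary inputs there are 2^n input patterns, and each pattern can independently map to 0 or 1, so there are 2^(2^n) Boolean functions.

```python
def count_boolean_functions(n_inputs: int) -> int:
    """Number of distinct Boolean functions on n binary inputs: 2**(2**n)."""
    return 2 ** (2 ** n_inputs)

for n in range(1, 6):
    print(n, count_boolean_functions(n))
# Already at n = 5 there are 2**32 (about 4.3 billion) such functions; with the
# one million inputs of the optic-nerve example the count is 2**(2**1000000),
# far beyond anything a brain or a network could ever cover.
```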
link |
00:41:33.020
How many of those functions can actually be computed
link |
00:41:35.560
by your visual cortex?
link |
00:41:37.260
And the answer is a tiny, tiny, tiny, tiny, tiny, tiny sliver.
link |
00:41:41.460
Like an enormously tiny sliver.
link |
00:41:43.500
Yeah, yeah.
link |
00:41:44.980
So we are ridiculously specialized.
link |
00:41:48.860
Okay.
link |
00:41:49.700
But, okay, that's an argument against the word general.
link |
00:41:54.220
I think there's a, I agree with your intuition,
link |
00:41:59.180
but I'm not sure it's, it seems the brain is impressively
link |
00:42:06.900
capable of adjusting to things, so.
link |
00:42:09.660
It's because we can't imagine tasks
link |
00:42:13.420
that are outside of our comprehension, right?
link |
00:42:16.340
So we think we're general because we're general
link |
00:42:18.780
to all the things that we can apprehend.
link |
00:42:20.780
But there is a huge world out there
link |
00:42:23.020
of things that we have no idea.
link |
00:42:24.740
We call that heat, by the way.
link |
00:42:26.860
Heat.
link |
00:42:27.700
Heat.
link |
00:42:28.540
So, at least physicists call that heat,
link |
00:42:30.660
or they call it entropy, which is kind of.
link |
00:42:33.420
You have a thing full of gas, right?
link |
00:42:39.380
Closed system for gas.
link |
00:42:40.760
Right?
link |
00:42:41.780
Closed or not closed.
link |
00:42:42.660
It has pressure, it has temperature, it has, you know,
link |
00:42:47.660
and you can write equations, PV = nRT,
link |
00:42:50.660
you know, things like that, right?
link |
00:42:52.540
When you reduce the volume, the temperature goes up,
link |
00:42:54.900
the pressure goes up, you know, things like that, right?
link |
00:42:57.780
For perfect gas, at least.
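The equation being gestured at is the ideal gas law, PV = nRT. A minimal numeric sketch (the values below are just illustrative):

```python
R = 8.314  # ideal gas constant, J/(mol*K)

def pressure(n_moles: float, temperature_k: float, volume_m3: float) -> float:
    """Pressure of an ideal gas in pascals, from PV = nRT."""
    return n_moles * R * temperature_k / volume_m3

print(pressure(1.0, 300.0, 0.0224))  # ~1.1e5 Pa, roughly atmospheric pressure
print(pressure(1.0, 300.0, 0.0112))  # halve the volume at fixed T: pressure doubles
```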
link |
00:42:59.620
Those are the things you can know about that system.
link |
00:43:02.420
And it's a tiny, tiny number of bits
link |
00:43:04.580
compared to the complete information
link |
00:43:06.900
of the state of the entire system.
link |
00:43:08.340
Because the state of the entire system
link |
00:43:09.740
will give you the position and momentum
link |
00:43:11.260
of every molecule of the gas.
link |
00:43:14.660
And what you don't know about it is the entropy,
link |
00:43:17.660
and you interpret it as heat.
link |
00:43:20.620
The energy contained in that thing is what we call heat.
link |
00:43:24.700
Now, it's very possible that, in fact,
link |
00:43:28.740
there is some very strong structure
link |
00:43:30.220
in how those molecules are moving.
link |
00:43:31.620
It's just that they are in a way
link |
00:43:33.020
that we are just not wired to perceive.
link |
00:43:35.580
Yeah, we're ignorant to it.
link |
00:43:36.420
And there's an infinite amount of things
link |
00:43:40.500
we're not wired to perceive.
link |
00:43:41.820
And you're right, that's a nice way to put it.
link |
00:43:44.660
We're general to all the things we can imagine,
link |
00:43:47.620
which is a very tiny subset of all things that are possible.
link |
00:43:51.820
So it's like Kolmogorov complexity
link |
00:43:53.260
or the Kolmogorov Chaitin Solomonoff complexity.
link |
00:43:55.820
Yeah.
link |
00:43:56.660
You know, every bit string or every integer is random,
link |
00:44:02.220
except for all the ones that you can actually write down.
link |
00:44:05.220
Yeah.
link |
00:44:06.060
Yeah.
link |
00:44:06.900
Yeah.
link |
00:44:07.740
Yeah.
link |
00:44:08.580
Yeah.
link |
00:44:09.420
Yeah.
link |
00:44:10.260
Yeah, okay.
link |
00:44:12.180
So beautifully put.
link |
00:44:13.020
But, you know, so we can just call it artificial intelligence.
link |
00:44:15.460
We don't need to have a general.
link |
00:44:17.980
Or human level.
link |
00:44:18.820
Human level intelligence is good.
link |
00:44:20.900
You know, you'll start, anytime you touch human,
link |
00:44:24.700
it gets interesting because, you know,
link |
00:44:30.660
it's because we attach ourselves to human
link |
00:44:33.420
and it's difficult to define what human intelligence is.
link |
00:44:36.060
Yeah.
link |
00:44:37.220
Nevertheless, my definition is maybe damn impressive
link |
00:44:42.100
intelligence, okay?
link |
00:44:43.900
Damn impressive demonstration of intelligence, whatever.
link |
00:44:46.700
And so on that topic, most successes in deep learning
link |
00:44:51.420
have been in supervised learning.
link |
00:44:53.700
What is your view on unsupervised learning?
link |
00:44:57.860
Is there a hope to reduce involvement of human input
link |
00:45:03.180
and still have successful systems
link |
00:45:05.620
that have practical use?
link |
00:45:08.300
Yeah, I mean, there's definitely a hope.
link |
00:45:09.900
It's more than a hope, actually.
link |
00:45:11.180
There's mounting evidence for it.
link |
00:45:13.900
And that's basically all I do.
link |
00:45:16.020
Like, the only thing I'm interested in at the moment is,
link |
00:45:19.100
I call it self supervised learning, not unsupervised.
link |
00:45:21.260
Because unsupervised learning is a loaded term.
link |
00:45:25.700
People who know something about machine learning,
link |
00:45:27.900
you know, tell you, so you're doing clustering or PCA,
link |
00:45:30.620
which is not the case.
link |
00:45:31.580
And the wider public, you know,
link |
00:45:32.580
when you say unsupervised learning,
link |
00:45:33.620
oh my God, machines are gonna learn by themselves
link |
00:45:35.860
without supervision.
link |
00:45:37.300
You know, they see this as...
link |
00:45:39.660
Where's the parents?
link |
00:45:40.780
Yeah, so I call it self supervised learning
link |
00:45:42.900
because, in fact, the underlying algorithms that are used
link |
00:45:46.140
are the same algorithms as the supervised learning
link |
00:45:48.340
algorithms, except that what we train them to do
link |
00:45:52.300
is not predict a particular set of variables,
link |
00:45:55.540
like the category of an image,
link |
00:46:00.420
and not to predict a set of variables
link |
00:46:02.540
that have been provided by human labelers.
link |
00:46:06.380
But what you're training the machine to do
link |
00:46:07.380
is basically reconstruct a piece of its input
link |
00:46:10.300
that is being masked out, essentially.
link |
00:46:14.140
You can think of it this way, right?
link |
00:46:15.620
So show a piece of video to a machine
link |
00:46:18.780
and ask it to predict what's gonna happen next.
link |
00:46:20.940
And of course, after a while, you can show what happens
link |
00:46:23.780
and the machine will kind of train itself
link |
00:46:26.220
to do better at that task.
link |
00:46:28.820
You can do like all the latest, most successful models
link |
00:46:32.220
in natural language processing,
link |
00:46:33.260
use self supervised learning.
link |
00:46:36.220
You know, sort of BERT style systems, for example, right?
link |
00:46:38.660
You show it a window of a dozen words on a text corpus,
link |
00:46:43.500
you take out 15% of the words,
link |
00:46:46.300
and then you train the machine to predict the words
link |
00:46:49.900
that are missing. That's self supervised learning.
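A minimal sketch of that masking step (toy code written for this transcript, not BERT's actual preprocessing): take a window of words, hide about 15% of them, and train the model to recover exactly those hidden words.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Hide ~15% of the words; the hidden words become the training targets."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return inputs, targets

words = "the quick brown fox jumps over the lazy dog near the old bridge".split()
masked, targets = mask_tokens(words)
print(masked)    # the corrupted window the model sees
print(targets)   # what the model is trained to predict
```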
link |
00:46:52.820
It's not predicting the future,
link |
00:46:53.980
it's just predicting things in the middle,
link |
00:46:56.260
but you could have it predict the future,
link |
00:46:57.860
that's what language models do.
link |
00:46:59.500
So you construct, so in an unsupervised way,
link |
00:47:01.780
you construct a model of language.
link |
00:47:03.980
Do you think...
link |
00:47:05.060
Or video or the physical world or whatever, right?
link |
00:47:09.140
How far do you think that can take us?
link |
00:47:12.620
Do you think BERT understands anything?
link |
00:47:18.020
To some level, it has a shallow understanding of text,
link |
00:47:23.460
but it needs to, I mean,
link |
00:47:24.740
to have kind of true human level intelligence,
link |
00:47:26.820
I think you need to ground language in reality.
link |
00:47:29.220
So some people are attempting to do this, right?
link |
00:47:32.780
Having systems that kind of have some visual representation
link |
00:47:35.460
of what is being talked about,
link |
00:47:37.420
which is one reason you need
link |
00:47:38.580
those interactive environments actually.
link |
00:47:41.060
But this is like a huge technical problem
link |
00:47:43.300
that is not solved,
link |
00:47:45.060
and that explains why self supervised learning
link |
00:47:47.900
works in the context of natural language,
link |
00:47:49.980
but does not work in the context, or at least not well,
link |
00:47:52.740
in the context of image recognition and video,
link |
00:47:55.380
although it's making progress quickly.
link |
00:47:57.820
And the reason, that reason is the fact that
link |
00:48:01.820
it's much easier to represent uncertainty in the prediction
link |
00:48:05.300
in a context of natural language
link |
00:48:06.900
than it is in the context of things like video and images.
link |
00:48:10.100
So for example, if I ask you to predict
link |
00:48:12.940
what words are missing,
link |
00:48:14.140
15% of the words that I've taken out.
link |
00:48:17.700
The possibilities are small.
link |
00:48:19.140
That means... It's small, right?
link |
00:48:20.020
There are 100,000 words in the lexicon,
link |
00:48:23.340
and what the machine spits out
link |
00:48:24.820
is a big probability vector, right?
link |
00:48:27.620
It's a bunch of numbers between zero and one
link |
00:48:29.660
that sum to one.
link |
00:48:30.740
And we know how to do this with computers.
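A small sketch of what that big probability vector is (illustrative code, not any particular model): one score per word in the lexicon, pushed through a softmax so the entries lie between zero and one and sum to one.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Turn arbitrary scores into numbers between 0 and 1 that sum to 1."""
    z = scores - scores.max()      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

vocab_size = 100_000                     # roughly the lexicon size mentioned above
logits = np.random.randn(vocab_size)     # stand-in for the network's raw scores
probs = softmax(logits)
print(probs.shape, float(probs.sum()))   # (100000,) and a sum of 1, up to floating point
```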
link |
00:48:34.460
So there, representing uncertainty in the prediction
link |
00:48:36.940
is relatively easy, and that's, in my opinion,
link |
00:48:39.100
why those techniques work for NLP.
link |
00:48:42.460
For images, if you ask...
link |
00:48:45.460
If you block a piece of an image,
link |
00:48:46.900
and you ask the system,
link |
00:48:47.740
reconstruct that piece of the image,
link |
00:48:49.180
there are many possible answers.
link |
00:48:51.540
They are all perfectly legit, right?
link |
00:48:54.620
And how do you represent this set of possible answers?
link |
00:48:58.740
You can't train a system to make one prediction.
link |
00:49:00.900
You can't train a neural net to say,
link |
00:49:02.500
here it is, that's the image,
link |
00:49:04.620
because there's a whole set of things
link |
00:49:06.420
that are compatible with it.
link |
00:49:07.260
So how do you get the machine to represent
link |
00:49:08.740
not a single output, but a whole set of outputs?
link |
00:49:13.060
And similarly with video prediction,
link |
00:49:17.220
there's a lot of things that can happen
link |
00:49:19.220
in the future of video.
link |
00:49:20.100
You're looking at me right now.
link |
00:49:21.140
I'm not moving my head very much,
link |
00:49:22.740
but I might turn my head to the left or to the right.
link |
00:49:26.940
If you have a system that you train to predict this,
link |
00:49:30.420
and you train it with least squares
link |
00:49:31.740
to minimize the error between the prediction
link |
00:49:33.700
and what I'm doing,
link |
00:49:34.660
what you get is a blurry image of myself
link |
00:49:36.940
in all possible future positions that I might be in,
link |
00:49:39.660
which is not a good prediction.
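A one-dimensional toy version of that blurriness (an illustration made up for this transcript): if the future is either -1 or +1 with equal probability, the single prediction that minimizes squared error is their average, 0, which matches neither of the futures that can actually happen.

```python
import numpy as np

rng = np.random.default_rng(0)
outcomes = rng.choice([-1.0, 1.0], size=10_000)   # two equally likely sharp futures

candidates = np.linspace(-1.5, 1.5, 301)
errors = [np.mean((outcomes - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(errors))]
print(best)   # ~0.0: the least-squares prediction is the blurry average,
              # not either of the two sharp possibilities
```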
link |
00:49:41.780
So there might be other ways
link |
00:49:43.420
to do the self supervision for visual scenes.
link |
00:49:48.100
Like what?
link |
00:49:48.940
I mean, if I knew, I wouldn't tell you,
link |
00:49:52.740
publish it first, I don't know.
link |
00:49:55.620
No, there might be.
link |
00:49:57.540
So I mean, these are kind of,
link |
00:50:00.300
there might be artificial ways of like self play in games,
link |
00:50:03.260
the way you can simulate part of the environment.
link |
00:50:05.780
Oh, that doesn't solve the problem.
link |
00:50:06.820
It's just a way of generating data.
link |
00:50:10.420
But because you have more of a control,
link |
00:50:12.580
like maybe you can control,
link |
00:50:14.620
yeah, it's a way to generate data.
link |
00:50:16.100
That's right.
link |
00:50:16.940
And because you can do huge amounts of data generation,
link |
00:50:20.500
that doesn't, you're right.
link |
00:50:21.580
Well, it creeps up on the problem from the side of data,
link |
00:50:26.020
and you don't think that's the right way to creep up.
link |
00:50:27.700
It doesn't solve this problem
link |
00:50:28.980
of handling uncertainty in the world, right?
link |
00:50:30.980
So if you have a machine learn a predictive model
link |
00:50:35.260
of the world in a game that is deterministic
link |
00:50:38.180
or quasi deterministic, it's easy, right?
link |
00:50:42.540
Just give a few frames of the game to a ConvNet,
link |
00:50:45.940
put a bunch of layers,
link |
00:50:47.060
and then have it generate the next few frames of the game.
link |
00:50:49.660
And if the game is deterministic, it works fine.
link |
00:50:54.860
And that includes feeding the system with the action
link |
00:50:59.140
that your little character is gonna take.
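A minimal sketch of that setup (assuming PyTorch; the architecture, sizes, and names here are placeholders, not a reference implementation): stack a few past frames, feed in the chosen action, and train a small ConvNet with a simple regression loss to output the next frame of a deterministic game.

```python
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, n_past_frames=4, n_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(n_past_frames, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.action_embed = nn.Embedding(n_actions, 32)
        self.decoder = nn.Conv2d(32, 1, 3, padding=1)  # outputs the predicted next frame

    def forward(self, frames, action):
        h = self.encoder(frames)                          # (B, 32, H, W)
        a = self.action_embed(action)[:, :, None, None]   # (B, 32, 1, 1)
        return self.decoder(h + a)                        # (B, 1, H, W)

model = NextFramePredictor()
frames = torch.rand(8, 4, 64, 64)       # a batch of 4 past frames per example
action = torch.randint(0, 6, (8,))      # the character's chosen action
next_frame = torch.rand(8, 1, 64, 64)   # ground truth taken from the game engine
loss = nn.functional.mse_loss(model(frames, action), next_frame)
loss.backward()
```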
link |
00:51:03.060
The problem comes from the fact that the real world
link |
00:51:06.660
and most games are not entirely predictable.
link |
00:51:09.700
And so there you get those blurry predictions
link |
00:51:11.340
and you can't do planning with blurry predictions, right?
link |
00:51:14.500
So if you have a perfect model of the world,
link |
00:51:17.460
you can, in your head, run this model
link |
00:51:20.740
with a hypothesis for a sequence of actions,
link |
00:51:24.100
and you're going to predict the outcome
link |
00:51:25.380
of that sequence of actions.
link |
00:51:28.620
But if your model is imperfect, how can you plan?
link |
00:51:32.460
Yeah, it quickly explodes.
link |
00:51:34.820
What are your thoughts on the extension of this,
link |
00:51:37.300
which topic I'm super excited about,
link |
00:51:39.700
it's connected to something you were talking about
link |
00:51:41.380
in terms of robotics, is active learning.
link |
00:51:44.580
So as opposed to sort of completely unsupervised
link |
00:51:47.940
or self supervised learning,
link |
00:51:51.060
you ask the system for human help
link |
00:51:54.900
for selecting parts you want annotated next.
link |
00:51:58.100
So if you think about a robot exploring a space
link |
00:52:00.660
or a baby exploring a space
link |
00:52:02.420
or a system exploring a data set,
link |
00:52:05.260
every once in a while asking for human input,
link |
00:52:07.940
do you see value in that kind of work?
link |
00:52:12.180
I don't see transformative value.
link |
00:52:14.180
It's going to make things that we can already do
link |
00:52:18.180
more efficient or they will learn slightly more efficiently,
link |
00:52:20.780
but it's not going to make machines
link |
00:52:21.940
sort of significantly more intelligent.
link |
00:52:23.700
I think, and by the way, there is no opposition,
link |
00:52:29.340
there's no conflict between self supervised learning,
link |
00:52:34.620
reinforcement learning and supervised learning
link |
00:52:35.980
or imitation learning or active learning.
link |
00:52:39.060
I see self supervised learning
link |
00:52:40.500
as a preliminary to all of the above.
link |
00:52:43.820
Yes.
link |
00:52:44.660
So the example I use very often is how is it that,
link |
00:52:50.420
so if you use classical reinforcement learning,
link |
00:52:54.580
deep reinforcement learning, if you want,
link |
00:52:57.540
the best methods today,
link |
00:53:01.300
so called model free reinforcement learning
link |
00:53:03.100
to learn to play Atari games,
link |
00:53:04.660
take about 80 hours of training to reach the level
link |
00:53:07.100
that any human can reach in about 15 minutes.
link |
00:53:11.540
They get better than humans, but it takes them a long time.
link |
00:53:16.540
AlphaStar, okay, the, you know,
link |
00:53:20.420
Oriol Vinyals and his team,
link |
00:53:22.260
the system to play StarCraft plays,
link |
00:53:27.900
you know, a single map, a single type of player.
link |
00:53:32.900
A single player and can reach better than human level
link |
00:53:38.820
with about the equivalent of 200 years of training
link |
00:53:43.380
playing against itself.
link |
00:53:45.300
It's 200 years, right?
link |
00:53:46.420
It's not something that any human can ever do.
link |
00:53:50.100
I mean, I'm not sure what lesson to take away from that.
link |
00:53:52.340
Okay, now take those algorithms,
link |
00:53:54.820
the best algorithms we have today
link |
00:53:57.380
to train a car to drive itself.
link |
00:54:00.200
It would probably have to drive millions of hours.
link |
00:54:02.960
It will have to kill thousands of pedestrians.
link |
00:54:04.680
It will have to run into thousands of trees.
link |
00:54:06.480
It will have to run off cliffs.
link |
00:54:08.520
And it would have to run off a cliff multiple times
link |
00:54:10.560
before it figures out that it's a bad idea, first of all.
link |
00:54:14.040
And second of all, before it figures out how not to do it.
link |
00:54:17.520
And so, I mean, this type of learning obviously
link |
00:54:19.840
does not reflect the kind of learning
link |
00:54:21.360
that animals and humans do.
link |
00:54:23.200
There is something missing
link |
00:54:24.240
that's really, really important there.
link |
00:54:26.320
And my hypothesis, which I've been advocating
link |
00:54:28.600
for like five years now,
link |
00:54:30.400
is that we have predictive models of the world
link |
00:54:34.840
that include the ability to predict under uncertainty.
link |
00:54:38.520
And what allows us to not run off a cliff
link |
00:54:43.520
when we learn to drive,
link |
00:54:44.720
most of us can learn to drive in about 20 or 30 hours
link |
00:54:47.040
of training without ever crashing, causing any accident.
link |
00:54:50.960
And if we drive next to a cliff,
link |
00:54:53.280
we know that if we turn the wheel to the right,
link |
00:54:55.240
the car is gonna run off the cliff
link |
00:54:57.080
and nothing good is gonna come out of this.
link |
00:54:58.760
Because we have a pretty good model of intuitive physics
link |
00:55:00.600
that tells us the car is gonna fall.
link |
00:55:02.280
We know about gravity.
link |
00:55:04.200
Babies learn this around the age of eight or nine months
link |
00:55:07.120
that objects don't float, they fall.
link |
00:55:11.200
And we have a pretty good idea of the effect
link |
00:55:13.720
of turning the wheel on the car
link |
00:55:15.040
and we know we need to stay on the road.
link |
00:55:16.960
So there's a lot of things that we bring to the table,
link |
00:55:19.480
which is basically our predictive model of the world.
link |
00:55:22.400
And that model allows us to not do stupid things.
link |
00:55:25.840
And to basically stay within the context
link |
00:55:28.160
of things we need to do.
link |
00:55:29.960
We still face unpredictable situations
link |
00:55:32.520
and that's how we learn.
link |
00:55:34.040
But that allows us to learn really, really, really quickly.
link |
00:55:37.600
So that's called model based reinforcement learning.
link |
00:55:41.200
There's some imitation and supervised learning
link |
00:55:43.000
because we have a driving instructor
link |
00:55:44.840
that tells us occasionally what to do.
link |
00:55:47.000
But most of the learning is learning the model,
link |
00:55:52.080
learning physics that we've done since we were babies.
link |
00:55:55.080
That's where all, almost all the learning is.
link |
00:55:56.880
And the physics is somewhat transferable from,
link |
00:56:00.080
it's transferable from scene to scene.
link |
00:56:01.960
Stupid things are the same everywhere.
link |
00:56:04.320
Yeah, I mean, if you have experience of the world,
link |
00:56:07.720
you don't need to be from a particularly intelligent species
link |
00:56:11.400
to know that if you spill water from a container,
link |
00:56:16.520
the rest is gonna get wet.
link |
00:56:18.800
You might get wet.
link |
00:56:20.640
So cats know this, right?
link |
00:56:22.840
Yeah.
link |
00:56:23.680
Right, so the main problem we need to solve
link |
00:56:27.040
is how do we learn models of the world?
link |
00:56:29.920
That's what I'm interested in.
link |
00:56:31.280
That's what self supervised learning is all about.
link |
00:56:34.080
If you were to try to construct a benchmark for,
link |
00:56:39.400
let's look at MNIST.
link |
00:56:41.120
I love that data set.
link |
00:56:44.120
Do you think it's useful, interesting, slash possible
link |
00:56:48.040
to perform well on MNIST with just one example
link |
00:56:52.320
of each digit and how would we solve that problem?
link |
00:56:58.640
The answer is probably yes.
link |
00:56:59.560
The question is what other type of learning
link |
00:57:02.400
are you allowed to do?
link |
00:57:03.240
So if what you're allowed to do is train
link |
00:57:04.800
on some gigantic data set of labeled digits,
link |
00:57:07.360
that's called transfer learning.
link |
00:57:08.840
And we know that works, okay?
link |
00:57:11.680
We do this at Facebook, like in production, right?
link |
00:57:13.560
We train large convolutional nets to predict hashtags
link |
00:57:17.040
that people type on Instagram
link |
00:57:18.200
and we train on billions of images, literally billions.
link |
00:57:20.960
And then we chop off the last layer
link |
00:57:22.920
and fine tune on whatever task we want.
link |
00:57:24.920
That works really well.
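A hedged sketch of that recipe (assuming PyTorch/torchvision recent enough to accept the weights argument; this is generic code, not Facebook's actual hashtag pipeline): load a pretrained network, chop off the last layer, and fine-tune on the target task.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10                                   # whatever the downstream task needs
model = models.resnet50(weights="IMAGENET1K_V1")   # a pretrained net stands in for
                                                   # the hashtag-pretrained model
model.fc = nn.Linear(model.fc.in_features, num_classes)  # chop off and replace the last layer

# Optionally freeze the pretrained trunk and only train the new head at first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")
```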
link |
00:57:26.360
You can beat the ImageNet record with this.
link |
00:57:28.760
We actually open sourced the whole thing
link |
00:57:30.520
like a few weeks ago.
link |
00:57:31.800
Yeah, that's still pretty cool.
link |
00:57:33.320
But yeah, so what would be impressive?
link |
00:57:36.800
What's useful and impressive?
link |
00:57:38.160
What kind of transfer learning
link |
00:57:39.280
would be useful and impressive?
link |
00:57:40.320
Is it Wikipedia, that kind of thing?
link |
00:57:42.600
No, no, so I don't think transfer learning
link |
00:57:44.960
is really where we should focus.
link |
00:57:46.240
We should try to do,
link |
00:57:48.000
you know, have a kind of scenario for Benchmark
link |
00:57:51.200
where you have unlabeled data
link |
00:57:53.680
and you can, and it's a very large amount of unlabeled data.
link |
00:57:58.680
It could be video clips.
link |
00:58:00.640
It could be where you do, you know, frame prediction.
link |
00:58:03.680
It could be images where you could choose to,
link |
00:58:06.160
you know, mask a piece of it, could be whatever,
link |
00:58:10.680
but they're unlabeled and you're not allowed to label them.
link |
00:58:13.920
So you do some training on this,
link |
00:58:18.040
and then you train on a particular supervised task,
link |
00:58:24.720
ImageNet or MNIST,
link |
00:58:26.320
and you measure how your test error decreases
link |
00:58:30.200
or validation error decreases
link |
00:58:31.480
as you increase the number of labeled training samples.
link |
00:58:35.400
Okay, and what you'd like to see is that,
link |
00:58:40.400
you know, your error decreases much faster
link |
00:58:43.000
than if you train from scratch from random weights.
link |
00:58:46.560
So that to reach the same level of performance
link |
00:58:48.600
that a completely supervised, purely supervised system
link |
00:58:52.120
would reach, you would need way fewer samples.
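A sketch of that evaluation protocol (all names and numbers below are invented placeholders, not results): measure test error against the number of labeled examples, once starting from self-supervised pretrained weights and once from random weights, and compare how fast the two curves drop.

```python
def train_and_eval(n_labels: int, pretrained: bool) -> float:
    """Stand-in for a real pipeline: fine-tune a model on n_labels examples,
    starting either from self-supervised pretrained weights or from scratch,
    and return its test error. The curve below is fake and only shows the shape."""
    base = 0.2 if pretrained else 0.5
    return base / (1 + n_labels / 100)

for n_labels in (10, 100, 1_000, 10_000):
    scratch = train_and_eval(n_labels, pretrained=False)
    ssl = train_and_eval(n_labels, pretrained=True)
    print(f"{n_labels:>6} labels  from-scratch={scratch:.3f}  pretrained={ssl:.3f}")
# The benchmark question: how many labels does the pretrained model need to
# match the error the from-scratch model only reaches with far more labels?
```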
link |
00:58:54.440
So that's the crucial question
link |
00:58:55.760
because it will answer the question to like, you know,
link |
00:58:58.280
people interested in medical image analysis.
link |
00:59:01.000
Okay, you know, if I want to get to a particular level
link |
00:59:05.000
of error rate for this task,
link |
00:59:07.120
I know I need a million samples.
link |
00:59:10.480
Can I do, you know, self supervised pre training
link |
00:59:13.560
to reduce this to about 100 or something?
link |
00:59:15.800
And you think the answer there
link |
00:59:16.840
is self supervised pre training?
link |
00:59:18.960
Yeah, some form, some form of it.
link |
00:59:23.040
I'm telling you, active learning, but you disagree.
link |
00:59:26.600
No, it's not useless.
link |
00:59:28.440
It's just not gonna lead to a quantum leap.
link |
00:59:30.640
It's just gonna make things that we already do more efficient.
link |
00:59:32.200
So you're way smarter than me.
link |
00:59:33.720
I just disagree with you.
link |
00:59:35.160
But I don't have anything to back that.
link |
00:59:37.280
It's just intuition.
link |
00:59:38.760
So I worked a lot with large scale data sets
link |
00:59:40.760
and there's something that might be magic
link |
00:59:43.640
in active learning, but okay.
link |
00:59:45.840
And at least I said it publicly.
link |
00:59:48.560
At least I'm being an idiot publicly.
link |
00:59:50.520
Okay.
link |
00:59:51.360
It's not being an idiot.
link |
00:59:52.200
It's, you know, working with the data you have.
link |
00:59:54.080
I mean, I mean, certainly people are doing things like,
link |
00:59:56.360
okay, I have 3000 hours of, you know,
link |
00:59:59.160
imitation learning for self driving cars,
link |
01:00:01.280
but most of those are incredibly boring.
link |
01:00:03.280
What I'd like is to select, you know, 10% of them
link |
01:00:05.840
that are kind of the most informative.
link |
01:00:07.400
And with just that, I would probably reach the same.
link |
01:00:10.400
So it's a weak form of active learning if you want.
link |
01:00:14.280
Yes, but there might be a much stronger version.
link |
01:00:18.040
Yeah, that's right.
link |
01:00:18.880
That's what, and that's an open question, whether it exists.
link |
01:00:21.600
The question is how much stronger can you get?
link |
01:00:24.360
Elon Musk is confident.
link |
01:00:26.520
Talked to him recently.
link |
01:00:28.120
He's confident that large scale data and deep learning
link |
01:00:30.760
can solve the autonomous driving problem.
link |
01:00:33.560
What are your thoughts on the limits,
link |
01:00:36.280
possibilities of deep learning in this space?
link |
01:00:38.520
It's obviously part of the solution.
link |
01:00:40.880
I mean, I don't think we'll ever have a self driving system
link |
01:00:43.800
or at least not in the foreseeable future
link |
01:00:45.600
that does not use deep learning.
link |
01:00:47.240
Let me put it this way.
link |
01:00:48.360
Now, how much of it?
link |
01:00:49.600
So in the history of sort of engineering,
link |
01:00:54.040
particularly sort of AI like systems,
link |
01:00:58.320
there's generally a first phase where everything is built by hand.
link |
01:01:01.000
Then there is a second phase.
link |
01:01:02.120
And that was the case for autonomous driving 20, 30 years ago.
link |
01:01:06.400
There's a phase where there's a little bit of learning is used,
link |
01:01:09.160
but there's a lot of engineering that's involved in kind of
link |
01:01:12.800
taking care of corner cases and putting limits, et cetera,
link |
01:01:16.480
because the learning system is not perfect.
link |
01:01:18.200
And then as technology progresses,
link |
01:01:21.960
we end up relying more and more on learning.
link |
01:01:23.920
That's the history of character recognition,
link |
01:01:25.800
it's the history of science.
link |
01:01:27.120
Character recognition, it's the history of speech recognition,
link |
01:01:29.120
now computer vision, natural language processing.
link |
01:01:31.600
And I think the same is going to happen with autonomous driving
link |
01:01:36.160
that currently the methods that are closest
link |
01:01:40.720
to providing some level of autonomy,
link |
01:01:43.120
some decent level of autonomy
link |
01:01:44.960
where you don't expect a driver to kind of do anything
link |
01:01:48.560
is where you constrain the world.
link |
01:01:50.880
So you only run within 100 square kilometers
link |
01:01:53.760
or square miles in Phoenix where the weather is nice
link |
01:01:56.200
and the roads are wide, which is what Waymo is doing.
link |
01:02:00.240
You completely overengineer the car with tons of LIDARs
link |
01:02:04.480
and sophisticated sensors that are too expensive
link |
01:02:08.440
for consumer cars,
link |
01:02:09.280
but they're fine if you just run a fleet.
link |
01:02:13.040
And you engineer the hell out of everything else.
link |
01:02:16.400
You map the entire world.
link |
01:02:17.960
So you have complete 3D model of everything.
link |
01:02:20.360
So the only thing that the perception system
link |
01:02:22.160
has to take care of is moving objects
link |
01:02:24.160
and construction and sort of things that weren't in your map.
link |
01:02:30.880
And you can engineer a good SLAM system and all that stuff.
link |
01:02:34.160
So that's kind of the current approach
link |
01:02:35.840
that's closest to some level of autonomy.
link |
01:02:37.480
But I think eventually the longterm solution
link |
01:02:39.640
is going to rely more and more on learning
link |
01:02:43.400
and possibly using a combination
link |
01:02:45.000
of self supervised learning and model based reinforcement
link |
01:02:49.320
or something like that.
link |
01:02:50.840
But ultimately learning will be not just at the core,
link |
01:02:54.760
but really the fundamental part of the system.
link |
01:02:57.160
Yeah, it already is, but it will become more and more.
link |
01:03:00.360
What do you think it takes to build a system
link |
01:03:02.720
with human level intelligence?
link |
01:03:04.080
You talked about the AI system in the movie Her
link |
01:03:07.600
being way out of reach, our current reach.
link |
01:03:10.040
This might be outdated as well, but.
link |
01:03:12.360
It's still way out of reach.
link |
01:03:13.240
It's still way out of reach.
link |
01:03:15.800
What would it take to build Her?
link |
01:03:18.360
Do you think?
link |
01:03:19.720
So I can tell you the first two obstacles
link |
01:03:21.760
that we have to clear,
link |
01:03:22.880
but I don't know how many obstacles there are after this.
link |
01:03:24.880
So the image I usually use is that
link |
01:03:26.640
there is a bunch of mountains that we have to climb
link |
01:03:28.680
and we can see the first one,
link |
01:03:29.720
but we don't know if there are 50 mountains behind it or not.
link |
01:03:33.080
And this might be a good sort of metaphor
link |
01:03:34.960
for why AI researchers in the past
link |
01:03:38.400
have been overly optimistic about the result of AI.
link |
01:03:43.520
You know, for example,
link |
01:03:45.800
Newell and Simon wrote the General Problem Solver
link |
01:03:49.440
and they called it the general problem solver.
link |
01:03:51.440
General problem solver.
link |
01:03:52.960
And of course, the first thing you realize
link |
01:03:54.520
is that all the problems you want to solve are exponential.
link |
01:03:56.360
And so you can't actually use it for anything useful,
link |
01:03:59.160
but you know.
link |
01:04:00.080
Yeah, so yeah, all you see is the first peak.
link |
01:04:02.280
So in general, what are the first couple of peaks for Her?
link |
01:04:05.280
So the first peak, which is precisely what I'm working on
link |
01:04:08.000
is self supervised learning.
link |
01:04:10.280
How do we get machines to learn models of the world
link |
01:04:12.280
by observation, kind of like babies and like young animals?
link |
01:04:15.880
So we've been working with, you know, cognitive scientists.
link |
01:04:21.760
So this Emmanuel Dupoux, who's at FAIR in Paris,
link |
01:04:24.760
half time, and is also a researcher at a French university.
link |
01:04:30.640
And he has this chart that shows
link |
01:04:36.120
at how many months of life baby humans
link |
01:04:38.640
kind of learn different concepts.
link |
01:04:40.720
And you can measure this in sort of various ways.
link |
01:04:44.040
So things like distinguishing animate objects
link |
01:04:49.040
from inanimate objects,
link |
01:04:50.360
you can tell the difference at age two, three months.
link |
01:04:54.720
Whether an object is going to stay stable,
link |
01:04:56.360
is going to fall, you know,
link |
01:04:58.080
about four months, you can tell.
link |
01:05:00.760
You know, there are various things like this.
link |
01:05:02.400
And then things like gravity,
link |
01:05:04.240
the fact that objects are not supposed to float in the air,
link |
01:05:06.520
but are supposed to fall,
link |
01:05:07.880
you learn this around the age of eight or nine months.
link |
01:05:10.360
If you look at the data,
link |
01:05:11.960
eight or nine months, if you look at a lot of,
link |
01:05:14.600
you know, eight month old babies,
link |
01:05:15.880
you give them a bunch of toys on their high chair.
link |
01:05:19.040
First thing they do is they throw them on the ground
link |
01:05:20.560
and they look at them.
link |
01:05:21.720
It's because, you know, they're learning about,
link |
01:05:23.920
actively learning about gravity.
link |
01:05:26.120
Gravity, yeah.
link |
01:05:26.960
Okay, so they're not trying to annoy you,
link |
01:05:29.680
but they, you know, they need to do the experiment, right?
link |
01:05:32.480
Yeah.
link |
01:05:33.600
So, you know, how do we get machines to learn like babies,
link |
01:05:36.600
mostly by observation with a little bit of interaction
link |
01:05:39.240
and learning those models of the world?
link |
01:05:41.200
Because I think that's really a crucial piece
link |
01:05:43.720
of an intelligent autonomous system.
link |
01:05:46.360
So if you think about the architecture
link |
01:05:47.520
of an intelligent autonomous system,
link |
01:05:49.520
it needs to have a predictive model of the world.
link |
01:05:51.320
So something that says, here is a world at time T,
link |
01:05:54.080
here is a state of the world at time T plus one,
link |
01:05:55.520
if I take this action.
link |
01:05:57.560
And it's not a single answer, it can be a...
link |
01:05:59.680
Yeah, it can be a distribution, yeah.
link |
01:06:01.240
Yeah, well, but we don't know how to represent
link |
01:06:03.200
distributions in high dimensional continuous spaces.
link |
01:06:04.840
So it's gotta be something weaker than that, okay?
link |
01:06:07.200
But with some representation of uncertainty.
link |
01:06:09.760
If you have that, then you can do what optimal control
link |
01:06:12.440
theorists call model predictive control,
link |
01:06:14.360
which means that you can run your model
link |
01:06:16.360
with a hypothesis for a sequence of action
link |
01:06:18.800
and then see the result.
link |
01:06:20.840
Now, what you need, the other thing you need
link |
01:06:22.160
is some sort of objective that you want to optimize.
link |
01:06:24.920
Am I reaching the goal of grabbing this object?
link |
01:06:27.560
Am I minimizing energy?
link |
01:06:28.880
Am I whatever, right?
link |
01:06:30.040
So there is some sort of objective that you have to minimize.
link |
01:06:33.720
And so in your head, if you have this model,
link |
01:06:35.640
you can figure out the sequence of action
link |
01:06:37.080
that will optimize your objective.
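A toy sketch of model predictive control as described here (the world model and objective below are invented one-liners, not models of anything real): roll out candidate action sequences through your predictive model "in your head" and keep the sequence whose predicted end state scores best under the objective.

```python
import random

def world_model(state: float, action: float) -> float:
    """Toy predictive model: the action nudges the state."""
    return state + action

def objective(state: float) -> float:
    """Toy 'discontentment': distance from a goal state of 10.0."""
    return abs(state - 10.0)

def plan(state: float, horizon: int = 5, n_candidates: int = 200) -> list:
    """Random-shooting MPC: imagine many action sequences, keep the best one."""
    best_cost, best_plan = float("inf"), None
    for _ in range(n_candidates):
        actions = [random.uniform(-1, 1) for _ in range(horizon)]
        s = state
        for a in actions:              # imagined rollout, no interaction with the world
            s = world_model(s, a)
        cost = objective(s)
        if cost < best_cost:
            best_cost, best_plan = cost, actions
    return best_plan

print(plan(0.0)[:3])  # first few actions of the best imagined sequence
```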
link |
01:06:38.920
That objective is something that ultimately is rooted
link |
01:06:42.400
in your basal ganglia, at least in the human brain,
link |
01:06:44.960
that's what the basal ganglia
link |
01:06:47.040
computes, your level of contentment or miscontentment.
link |
01:06:50.600
I don't know if that's a word.
link |
01:06:52.360
Unhappiness, okay?
link |
01:06:53.680
Yeah, yeah.
link |
01:06:54.800
Discontentment.
link |
01:06:55.640
Discontentment, maybe.
link |
01:06:56.680
And so your entire behavior is driven towards
link |
01:07:01.720
kind of minimizing that objective,
link |
01:07:03.320
which is maximizing your contentment,
link |
01:07:05.720
computed by your basal ganglia.
link |
01:07:07.600
And what you have is an objective function,
link |
01:07:10.600
which is basically a predictor
link |
01:07:12.320
of what your basal ganglia is going to tell you.
link |
01:07:14.520
So you're not going to put your hand on fire
link |
01:07:16.600
because you know it's going to burn
link |
01:07:19.760
and you're going to get hurt.
link |
01:07:21.240
And you're predicting this because of your model
link |
01:07:23.160
of the world and your sort of predictor
link |
01:07:25.720
of this objective, right?
link |
01:07:27.560
So if you have those three components,
link |
01:07:31.160
you have four components,
link |
01:07:32.600
you have the hardwired objective,
link |
01:07:36.080
hardwired contentment objective computer,
link |
01:07:41.760
if you want, calculator.
link |
01:07:43.960
And then you have the three components.
link |
01:07:45.160
One is the objective predictor,
link |
01:07:46.760
which basically predicts your level of contentment.
link |
01:07:48.960
One is the model of the world.
link |
01:07:52.560
And there's a third module I didn't mention,
link |
01:07:54.120
which is the module that will figure out
link |
01:07:57.280
the best course of action to optimize an objective
link |
01:08:00.560
given your model, okay?
link |
01:08:03.480
Yeah.
link |
01:08:04.520
And you can call this a policy network
link |
01:08:07.240
or something like that, right?
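As a rough skeleton (the structure and names here are invented for illustration, not a published architecture), the components just listed can be written down like this: a hardwired objective, a learned predictor of it, a world model, and a policy that proposes actions.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AutonomousAgent:
    """Rough skeleton of the four components described in the conversation."""
    intrinsic_objective: Callable[[Any], float]  # hardwired calculator (basal ganglia)
    objective_predictor: Callable[[Any], float]  # learned critic: predicted (dis)contentment
    world_model: Callable[[Any, Any], Any]       # (state, action) -> predicted next state
    policy: Callable[[Any], Any]                 # state -> proposed action

    def act(self, state: Any) -> Any:
        action = self.policy(state)
        imagined = self.world_model(state, action)   # imagine the consequence
        # Veto actions predicted to increase discontentment (the hand-on-fire case).
        if self.objective_predictor(imagined) > self.objective_predictor(state):
            return None
        return action
```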
link |
01:08:09.400
Now, you need those three components
link |
01:08:11.720
to act autonomously intelligently.
link |
01:08:13.960
And you can be stupid in three different ways.
link |
01:08:16.120
You can be stupid because your model of the world is wrong.
link |
01:08:19.400
You can be stupid because your objective is not aligned
link |
01:08:22.520
with what you actually want to achieve, okay?
link |
01:08:27.000
In humans, that would be a psychopath.
link |
01:08:30.000
And then the third way you can be stupid
link |
01:08:33.640
is that you have the right model,
link |
01:08:34.960
you have the right objective,
link |
01:08:36.360
but you're unable to figure out a course of action
link |
01:08:38.840
to optimize your objective given your model.
link |
01:08:41.240
Okay.
link |
01:08:44.080
Some people who are in charge of big countries
link |
01:08:45.920
actually have all three that are wrong.
link |
01:08:47.760
All right.
link |
01:08:50.920
Which countries?
link |
01:08:51.760
I don't know.
link |
01:08:52.600
Okay, so if we think about this agent,
link |
01:08:55.960
if we think about the movie Her,
link |
01:08:58.000
you've criticized the art project
link |
01:09:02.920
that is Sophia the Robot.
link |
01:09:04.680
And what that project essentially does
link |
01:09:07.560
is uses our natural inclination to anthropomorphize
link |
01:09:11.720
things that look like humans and give them more.
link |
01:09:14.800
Do you think that could be used by AI systems
link |
01:09:17.720
like in the movie Her?
link |
01:09:21.320
So do you think that body is needed
link |
01:09:23.400
to create a feeling of intelligence?
link |
01:09:27.200
Well, if Sophia was just an art piece,
link |
01:09:29.320
I would have no problem with it,
link |
01:09:30.360
but it's presented as something else.
link |
01:09:33.040
Let me, on that comment real quick,
link |
01:09:35.280
if creators of Sophia could change something
link |
01:09:38.520
about their marketing or behavior in general,
link |
01:09:40.760
what would it be?
link |
01:09:41.600
What's?
link |
01:09:42.840
Just about everything.
link |
01:09:44.160
I mean, don't you think, here's a tough question.
link |
01:09:50.080
Let me, so I agree with you.
link |
01:09:51.680
So Sophia is not, the general public feels
link |
01:09:56.560
that Sophia can do way more than she actually can.
link |
01:09:59.320
That's right.
link |
01:10:00.200
And the people who created Sophia
link |
01:10:02.760
are not honestly publicly communicating,
link |
01:10:08.360
trying to teach the public.
link |
01:10:09.440
Right.
link |
01:10:10.280
But here's a tough question.
link |
01:10:13.280
Don't you think the same thing is true, that scientists
link |
01:10:19.800
in industry and research are taking advantage
link |
01:10:22.920
of the same misunderstanding in the public
link |
01:10:25.640
when they create AI companies or publish stuff?
link |
01:10:29.920
Some companies, yes.
link |
01:10:31.120
I mean, there is no sense of,
link |
01:10:33.160
there's no desire to delude.
link |
01:10:34.880
There's no desire to kind of over claim
link |
01:10:37.840
when something is done, right?
link |
01:10:38.840
You publish a paper on AI that has this result
link |
01:10:41.400
on ImageNet, it's pretty clear.
link |
01:10:43.080
I mean, it's not even interesting anymore,
link |
01:10:44.960
but I don't think there is that.
link |
01:10:49.240
I mean, the reviewers are generally not very forgiving
link |
01:10:52.880
of unsupported claims of this type.
link |
01:10:57.200
And, but there are certainly quite a few startups
link |
01:10:59.680
that have had a huge amount of hype around this
link |
01:11:02.680
that I find extremely damaging
link |
01:11:05.520
and I've been calling it out when I've seen it.
link |
01:11:08.080
So yeah, but to go back to your original question,
link |
01:11:10.280
like the necessity of embodiment,
link |
01:11:13.080
I think, I don't think embodiment is necessary.
link |
01:11:15.640
I think grounding is necessary.
link |
01:11:17.120
So I don't think we're gonna get machines
link |
01:11:18.960
that really understand language
link |
01:11:20.520
without some level of grounding in the real world.
link |
01:11:22.440
And it's not clear to me that language
link |
01:11:24.360
is a high enough bandwidth medium
link |
01:11:26.160
to communicate how the real world works.
link |
01:11:28.280
So I think for this.
link |
01:11:30.120
Can you talk to what grounding means?
link |
01:11:32.320
So grounding means that,
link |
01:11:34.040
so there is this classic problem of common sense reasoning,
link |
01:11:37.720
you know, the Winograd schema, right?
link |
01:11:41.000
And so I tell you the trophy doesn't fit in the suitcase
link |
01:11:44.960
because it's too big,
link |
01:11:46.360
or the trophy doesn't fit in the suitcase
link |
01:11:47.760
because it's too small.
link |
01:11:49.160
And the it in the first case refers to the trophy
link |
01:11:51.800
in the second case to the suitcase.
link |
01:11:53.640
And the reason you can figure this out
link |
01:11:55.160
is because you know where the trophy and the suitcase are,
link |
01:11:56.960
you know, one is supposed to fit in the other one
link |
01:11:58.640
and you know the notion of size
link |
01:12:00.560
and a big object doesn't fit in a small object,
link |
01:12:03.000
unless it's a Tardis, you know, things like that, right?
link |
01:12:05.280
So you have this knowledge of how the world works,
link |
01:12:08.640
of geometry and things like that.
link |
01:12:12.440
I don't believe you can learn everything about the world
link |
01:12:14.640
by just being told in language how the world works.
link |
01:12:18.000
I think you need some low level perception of the world,
link |
01:12:21.680
you know, be it visual, touch, you know, whatever,
link |
01:12:23.680
but some higher bandwidth perception of the world.
link |
01:12:26.760
By reading all the world's text,
link |
01:12:28.800
you still might not have enough information.
link |
01:12:31.160
That's right.
link |
01:12:32.520
There's a lot of things that just will never appear in text
link |
01:12:35.440
and that you can't really infer.
link |
01:12:37.000
So I think common sense will emerge from,
link |
01:12:41.440
you know, certainly a lot of language interaction,
link |
01:12:43.440
but also with watching videos
link |
01:12:45.640
or perhaps even interacting in virtual environments
link |
01:12:48.920
and possibly, you know, robot interacting in the real world.
link |
01:12:51.760
But I don't actually believe necessarily
link |
01:12:53.640
that this last one is absolutely necessary.
link |
01:12:56.000
But I think that there's a need for some grounding.
link |
01:13:00.240
But the final product
link |
01:13:01.880
doesn't necessarily need to be embodied, you're saying.
link |
01:13:04.840
No.
link |
01:13:05.680
It just needs to have an awareness, a grounding to.
link |
01:13:07.720
Right, but it needs to know how the world works
link |
01:13:11.120
to have, you know, to not be frustrating to talk to.
link |
01:13:15.840
And you talked about emotions being important.
link |
01:13:19.520
That's a whole nother topic.
link |
01:13:21.760
Well, so, you know, I talked about this,
link |
01:13:24.320
the basal ganglia as the thing
link |
01:13:29.600
that calculates your level of miscontentment.
link |
01:13:32.920
And then there is this other module
link |
01:13:34.640
that sort of tries to do a prediction
link |
01:13:36.640
of whether you're going to be content or not.
link |
01:13:38.520
That's the source of some emotion.
link |
01:13:40.240
So fear, for example, is an anticipation
link |
01:13:43.040
of bad things that can happen to you, right?
link |
01:13:47.440
You have this inkling that there is some chance
link |
01:13:49.240
that something really bad is going to happen to you
link |
01:13:50.880
and that creates fear.
link |
01:13:52.280
When you know for sure
link |
01:13:53.120
that something bad is going to happen to you,
link |
01:13:54.480
you kind of give up, right?
link |
01:13:55.960
It's not fear anymore.
link |
01:13:57.560
It's uncertainty that creates fear.
link |
01:13:59.480
So the punchline is,
link |
01:14:01.200
we're not going to have autonomous intelligence
link |
01:14:02.560
without emotions.
link |
01:14:07.040
Whatever the heck emotions are.
link |
01:14:08.880
So you mentioned very practical things of fear,
link |
01:14:11.080
but there's a lot of other mess around it.
link |
01:14:13.480
But they are kind of the results of, you know, drives.
link |
01:14:16.400
Yeah, there's deeper biological stuff going on.
link |
01:14:19.360
And I've talked to a few folks on this.
link |
01:14:21.440
There's fascinating stuff
link |
01:14:23.360
that ultimately connects to our brain.
link |
01:14:27.320
If we create an AGI system, sorry.
link |
01:14:30.880
Human level intelligence.
link |
01:14:31.720
Human level intelligence system.
link |
01:14:34.480
And you get to ask her one question.
link |
01:14:37.160
What would that question be?
link |
01:14:39.960
You know, I think the first one we'll create
link |
01:14:42.880
would probably not be that smart.
link |
01:14:45.520
They'd be like a four year old.
link |
01:14:47.040
Okay.
link |
01:14:47.880
So you would have to ask her a question
link |
01:14:50.040
to know she's not that smart.
link |
01:14:52.840
Yeah.
link |
01:14:54.520
Well, what's a good question to ask, you know,
link |
01:14:56.960
to be impressed.
link |
01:14:57.800
What is the cause of wind?
link |
01:15:01.040
And if she answers,
link |
01:15:02.240
oh, it's because the leaves of the tree are moving
link |
01:15:04.760
and that creates wind.
link |
01:15:06.520
She's onto something.
link |
01:15:08.760
And if she says that's a stupid question,
link |
01:15:11.840
she's really onto something.
link |
01:15:12.680
No, and then you tell her,
link |
01:15:14.440
actually, you know, here is the real thing.
link |
01:15:18.080
She says, oh yeah, that makes sense.
link |
01:15:20.520
So questions that reveal the ability
link |
01:15:24.480
to do common sense reasoning about the physical world.
link |
01:15:26.960
Yeah.
link |
01:15:27.800
And you'll sum it up with causal inference.
link |
01:15:30.120
Causal inference.
link |
01:15:31.200
Well, it was a huge honor.
link |
01:15:33.640
Congratulations on your Turing Award.
link |
01:15:35.720
Thank you so much for talking today.
link |
01:15:37.240
Thank you.
link |
01:15:38.080
Thank you for having me.