
Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258



link |
00:00:00.000
The following is a conversation with Yann LeCun,
link |
00:00:02.720
his second time on the podcast.
link |
00:00:04.560
He is the chief AI scientist at Meta, formerly Facebook,
link |
00:00:09.180
professor at NYU, Turing Award winner,
link |
00:00:13.080
one of the seminal figures in the history
link |
00:00:15.640
of machine learning and artificial intelligence,
link |
00:00:18.480
and someone who is brilliant and opinionated
link |
00:00:21.960
in the best kind of way.
link |
00:00:23.440
And so it was always fun to talk to him.
link |
00:00:26.000
This is the Lex Fridman Podcast.
link |
00:00:28.000
To support it, please check out our sponsors
link |
00:00:29.960
in the description.
link |
00:00:31.220
And now, here's my conversation with Yann LeCun.
link |
00:00:36.160
You cowrote the article,
link |
00:00:37.600
Self-Supervised Learning: The Dark Matter of Intelligence.
link |
00:00:40.900
Great title, by the way, with Ishan Misra.
link |
00:00:43.720
So let me ask, what is self supervised learning,
link |
00:00:46.640
and why is it the dark matter of intelligence?
link |
00:00:49.920
I'll start by the dark matter part.
link |
00:00:53.120
There is obviously a kind of learning
link |
00:00:55.680
that humans and animals are doing
link |
00:00:59.880
that we currently are not reproducing properly
link |
00:01:02.800
with machines or with AI, right?
link |
00:01:04.660
So the most popular approaches to machine learning today are,
link |
00:01:08.480
or paradigms, I should say,
link |
00:01:09.660
are supervised learning and reinforcement learning.
link |
00:01:12.720
And they are extremely inefficient.
link |
00:01:15.120
Supervised learning requires many samples
link |
00:01:17.620
for learning anything.
link |
00:01:19.760
And reinforcement learning requires a ridiculously large
link |
00:01:22.760
number of trial and errors for a system to learn anything.
link |
00:01:29.320
And that's why we don't have self driving cars.
link |
00:01:32.960
That was a big leap from one to the other.
link |
00:01:34.760
Okay, so that, to solve difficult problems,
link |
00:01:38.760
you have to have a lot of human annotation
link |
00:01:42.360
for supervised learning to work.
link |
00:01:44.080
And to solve those difficult problems
link |
00:01:45.520
with reinforcement learning,
link |
00:01:46.680
you have to have some way to maybe simulate that problem
link |
00:01:50.240
such that you can do that large scale kind of learning
link |
00:01:52.720
that reinforcement learning requires.
link |
00:01:54.420
Right, so how is it that most teenagers can learn
link |
00:01:58.320
to drive a car in about 20 hours of practice,
link |
00:02:02.280
whereas even with millions of hours of simulated practice,
link |
00:02:07.400
a self driving car can't actually learn
link |
00:02:09.220
to drive itself properly.
link |
00:02:12.120
And so obviously we're missing something, right?
link |
00:02:13.920
And it's quite obvious for a lot of people
link |
00:02:15.600
that the immediate response you get from many people is,
link |
00:02:19.760
well, humans use their background knowledge
link |
00:02:22.840
to learn faster, and they're right.
link |
00:02:25.820
Now, how was that background knowledge acquired?
link |
00:02:28.280
And that's the big question.
link |
00:02:30.080
So now you have to ask, how do babies
link |
00:02:34.040
in the first few months of life learn how the world works?
link |
00:02:37.120
Mostly by observation,
link |
00:02:38.240
because they can hardly act in the world.
link |
00:02:40.960
And they learn an enormous amount
link |
00:02:42.560
of background knowledge about the world
link |
00:02:43.840
that may be the basis of what we call common sense.
link |
00:02:47.960
This type of learning is not learning a task.
link |
00:02:51.280
It's not being reinforced for anything.
link |
00:02:53.680
It's just observing the world and figuring out how it works.
link |
00:02:58.400
Building world models, learning world models.
link |
00:03:01.240
How do we do this?
link |
00:03:02.120
And how do we reproduce this in machines?
link |
00:03:04.560
So self supervised learning is one instance
link |
00:03:09.520
or one attempt at trying to reproduce this kind of learning.
link |
00:03:13.120
Okay, so you're looking at just observation,
link |
00:03:16.400
so not even the interacting part of a child.
link |
00:03:18.720
It's just sitting there watching mom and dad walk around,
link |
00:03:21.600
pick up stuff, all of that.
link |
00:03:23.480
That's what we mean by background knowledge.
link |
00:03:25.520
Perhaps not even watching mom and dad,
link |
00:03:27.520
just watching the world go by.
link |
00:03:30.000
Just having eyes open or having eyes closed
link |
00:03:31.920
or the very act of opening and closing eyes
link |
00:03:34.480
that the world appears and disappears,
link |
00:03:36.280
all that basic information.
link |
00:03:39.120
And you're saying in order to learn to drive,
link |
00:03:43.160
like the reason humans are able to learn to drive quickly,
link |
00:03:45.840
some faster than others,
link |
00:03:47.360
is because of the background knowledge.
link |
00:03:48.680
They're able to watch cars operate in the world
link |
00:03:51.760
in the many years leading up to it,
link |
00:03:53.640
the physics of basic objects, all that kind of stuff.
link |
00:03:55.760
That's right.
link |
00:03:56.600
I mean, the basic physics of objects,
link |
00:03:57.440
you don't even need to know how a car works, right?
link |
00:04:00.880
Because that you can learn fairly quickly.
link |
00:04:02.500
I mean, the example I use very often
link |
00:04:03.840
is you're driving next to a cliff.
link |
00:04:06.680
And you know in advance because of your understanding
link |
00:04:10.560
of intuitive physics that if you turn the wheel
link |
00:04:13.200
to the right, the car will veer to the right,
link |
00:04:15.080
will run off the cliff, fall off the cliff,
link |
00:04:17.560
and nothing good will come out of this, right?
link |
00:04:20.400
But if you are a sort of tabula rasa
link |
00:04:23.760
reinforcement learning system
link |
00:04:25.100
that doesn't have a model of the world,
link |
00:04:28.160
you have to repeat falling off this cliff
link |
00:04:30.500
thousands of times before you figure out it's a bad idea.
link |
00:04:32.800
And then a few more thousand times
link |
00:04:34.560
before you figure out how to not do it.
link |
00:04:36.960
And then a few more million times
link |
00:04:38.480
before you figure out how to not do it
link |
00:04:39.800
in every situation you ever encounter.
link |
00:04:42.520
So self supervised learning still has to have
link |
00:04:45.800
some source of truth being told to it by somebody.
link |
00:04:50.560
So you have to figure out a way without human assistance
link |
00:04:54.560
or without significant amount of human assistance
link |
00:04:56.600
to get that truth from the world.
link |
00:04:59.100
So the mystery there is how much signal is there?
link |
00:05:03.980
How much truth is there that the world gives you?
link |
00:05:06.280
Whether it's the human world,
link |
00:05:08.160
like you watch YouTube or something like that,
link |
00:05:10.020
or it's the more natural world.
link |
00:05:12.960
So how much signal is there?
link |
00:05:14.920
So here's the trick.
link |
00:05:16.280
There is way more signal in sort of a self supervised
link |
00:05:20.120
setting than there is in either a supervised
link |
00:05:22.500
or reinforcement setting.
link |
00:05:24.520
And this goes back to my analogy of the cake.
link |
00:05:30.280
The cake as someone has called it,
link |
00:05:32.320
where when you try to figure out how much information
link |
00:05:36.000
you ask the machine to predict
link |
00:05:37.840
and how much feedback you give the machine at every trial,
link |
00:05:41.040
in reinforcement learning,
link |
00:05:41.880
you give the machine a single scalar.
link |
00:05:43.340
You tell the machine you did good, you did bad.
link |
00:05:45.400
And you only tell this to the machine once in a while.
link |
00:05:49.640
When I say you, it could be the universe
link |
00:05:51.440
telling the machine, right?
link |
00:05:54.120
But it's just one scalar.
link |
00:05:55.840
And so as a consequence,
link |
00:05:57.160
you cannot possibly learn something very complicated
link |
00:05:59.600
without many, many, many trials
link |
00:06:01.120
where you get many, many feedbacks of this type.
link |
00:06:04.760
Supervised learning, you give a few bits to the machine
link |
00:06:08.880
at every sample.
link |
00:06:11.280
Let's say you're training a system on recognizing images
link |
00:06:15.720
on ImageNet with 1000 categories,
link |
00:06:17.680
that's a little less than 10 bits of information per sample.
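To make that comparison concrete, here is a rough back-of-the-envelope sketch of the feedback per sample in each paradigm; the video figures are purely illustrative assumptions, not measurements.

```python
import math

# Rough information content of the feedback signal per sample, as discussed above.
# All numbers are illustrative assumptions, not measurements.

rl_feedback_bits = 1.0                  # an occasional scalar "good/bad": on the order of a bit
supervised_bits = math.log2(1000)       # an ImageNet-style label over 1000 classes: ~9.97 bits

# A short video continuation target carries far more, even heavily compressed:
# assume 30 frames of 64x64 grayscale at ~0.1 bits per pixel after compression.
video_prediction_bits = 30 * 64 * 64 * 0.1

print(f"reinforcement:   ~{rl_feedback_bits:.0f} bit per (occasional) feedback")
print(f"supervised:      ~{supervised_bits:.1f} bits per labeled sample")
print(f"self-supervised: ~{video_prediction_bits:.0f} bits per predicted clip")
```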
link |
00:06:22.180
But self supervised learning, here is the setting.
link |
00:06:24.640
Ideally, we don't know how to do this yet,
link |
00:06:26.360
but ideally you would show a machine a segment of video
link |
00:06:31.680
and then stop the video and ask the machine to predict
link |
00:06:34.200
what's going to happen next.
link |
00:06:37.640
And so we let the machine predict
link |
00:06:38.720
and then you let time go by
link |
00:06:41.400
and show the machine what actually happened
link |
00:06:44.340
and hope the machine will learn to do a better job
link |
00:06:47.920
at predicting next time around.
link |
00:06:49.400
There's a huge amount of information you give the machine
link |
00:06:51.580
because it's an entire video clip
link |
00:06:54.680
of the future after the video clip you fed it
link |
00:06:59.280
in the first place.
link |
00:07:00.280
So both for language and for vision, there's a subtle,
link |
00:07:05.120
seemingly trivial construction,
link |
00:07:06.920
but maybe that's representative
link |
00:07:08.520
of what is required to create intelligence,
link |
00:07:10.620
which is filling the gap.
link |
00:07:13.720
So it sounds dumb, but can you,
link |
00:07:19.760
it is possible you could solve all of intelligence
link |
00:07:22.080
in this way, just for both language,
link |
00:07:25.280
just give a sentence and continue it
link |
00:07:28.800
or give a sentence and there's a gap in it,
link |
00:07:32.080
some words blanked out and you fill in what words go there.
link |
00:07:35.720
For vision, you give a sequence of images
link |
00:07:39.200
and predict what's going to happen next,
link |
00:07:40.960
or you fill in what happened in between.
link |
00:07:43.840
Do you think it's possible that formulation alone
link |
00:07:48.600
as a signal for self supervised learning
link |
00:07:50.980
can solve intelligence for vision and language?
link |
00:07:53.640
I think that's the best shot at the moment.
link |
00:07:56.320
So whether this will take us all the way
link |
00:07:59.120
to human level intelligence or something,
link |
00:08:01.760
or just cat level intelligence is not clear,
link |
00:08:04.840
but among all the possible approaches
link |
00:08:07.340
that people have proposed, I think it's our best shot.
link |
00:08:09.520
So I think this idea of an intelligent system
link |
00:08:14.640
filling in the blanks, either predicting the future,
link |
00:08:18.880
inferring the past, filling in missing information,
link |
00:08:23.760
I'm currently filling the blank
link |
00:08:25.200
of what is behind your head
link |
00:08:26.680
and what your head looks like from the back,
link |
00:08:30.600
because I have basic knowledge about how humans are made.
link |
00:08:33.760
And I don't know what you're going to say,
link |
00:08:36.360
at which point you're going to speak,
link |
00:08:37.280
whether you're going to move your head this way or that way,
link |
00:08:38.960
which way you're going to look,
link |
00:08:40.280
but I know you're not going to just dematerialize
link |
00:08:42.080
and reappear three meters down the hall,
link |
00:08:46.280
because I know what's possible and what's impossible
link |
00:08:49.520
according to intuitive physics.
link |
00:08:51.160
You have a model of what's possible and what's impossible
link |
00:08:53.280
and then you'd be very surprised if it happens
link |
00:08:55.080
and then you'll have to reconstruct your model.
link |
00:08:57.840
Right, so that's the model of the world.
link |
00:08:59.600
It's what tells you, what fills in the blanks.
link |
00:09:02.240
So given your partial information about the state
link |
00:09:04.960
of the world, given by your perception,
link |
00:09:08.080
your model of the world fills in the missing information
link |
00:09:11.360
and that includes predicting the future,
link |
00:09:13.760
re predicting the past, filling in things
link |
00:09:16.880
you don't immediately perceive.
link |
00:09:18.400
And that doesn't have to be purely generic vision
link |
00:09:22.280
or visual information or generic language.
link |
00:09:24.340
You can go to specifics like predicting
link |
00:09:28.920
what control decision you make when you're driving
link |
00:09:31.120
in a lane, you have a sequence of images from a vehicle
link |
00:09:35.620
and then you have information if you record it on video
link |
00:09:39.640
where the car ended up going so you can go back in time
link |
00:09:43.680
and predict where the car went
link |
00:09:45.520
based on the visual information.
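As a sketch of that domain-specific setup, the following hypothetical PyTorch snippet trains a small network to predict the recorded steering from a stack of past frames, with the logged trajectory itself as the target; the architecture and all names are invented for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: predict a steering command from a short stack of dashcam frames,
# using the recorded trajectory as the target (no labeling beyond the driving log itself).

class SteeringPredictor(nn.Module):
    def __init__(self, n_frames=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * n_frames, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 1)        # predicted steering angle

    def forward(self, frames):              # frames: (batch, 3*n_frames, H, W)
        return self.head(self.encoder(frames))

model = SteeringPredictor()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# one training step on a dummy batch standing in for logged driving data
frames = torch.randn(8, 12, 96, 96)          # 8 clips of 4 RGB frames each
recorded_steering = torch.randn(8, 1)        # where the car actually went
loss = loss_fn(model(frames), recorded_steering)
opt.zero_grad(); loss.backward(); opt.step()
```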
link |
00:09:46.680
That's very specific, domain specific.
link |
00:09:49.440
Right, but the question is whether we can come up
link |
00:09:51.480
with sort of a generic method for training machines
link |
00:09:57.000
to do this kind of prediction or filling in the blanks.
link |
00:09:59.840
So right now, this type of approach has been unbelievably
link |
00:10:04.720
successful in the context of natural language processing.
link |
00:10:08.200
Every modern natural language processing system is pretrained
link |
00:10:10.440
in a self supervised manner to fill in the blanks.
link |
00:10:13.720
You show it a sequence of words, you remove 10% of them
link |
00:10:16.400
and then you train some gigantic neural net
link |
00:10:17.940
to predict the words that are missing.
link |
00:10:20.320
And once you've pre trained that network,
link |
00:10:22.760
you can use the internal representation learned by it
link |
00:10:26.600
as input to something that you train supervised
link |
00:10:30.480
or whatever.
link |
00:10:32.240
That's been incredibly successful.
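A minimal sketch of that fill-in-the-blank pretraining recipe, assuming a toy vocabulary and a tiny transformer encoder (all sizes and names are invented): mask a fraction of the tokens, train the network to predict the missing words, and keep the internal representations for a downstream supervised head.

```python
import torch
import torch.nn as nn

# Minimal sketch of fill-in-the-blank pretraining: mask ~10-15% of tokens, train the
# network to predict the missing words, then reuse its representations downstream.

VOCAB, MASK_ID, DIM = 30000, 0, 256

class TinyMaskedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_vocab = nn.Linear(DIM, VOCAB)   # a score for every word in the dictionary

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))    # contextual representations
        return self.to_vocab(h), h

model = TinyMaskedLM()
tokens = torch.randint(1, VOCAB, (4, 32))           # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.15              # hide ~15% of positions
corrupted = tokens.masked_fill(mask, MASK_ID)

logits, hidden = model(corrupted)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # predict only masked words
loss.backward()

# After pretraining, `hidden` (or a pooled version of it) can be fed to a small
# supervised classifier trained on far fewer labeled examples.
```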
link |
00:10:33.400
Not so successful in images, although it's making progress
link |
00:10:37.600
and it's based on sort of manual data augmentation.
link |
00:10:42.600
We can go into this later,
link |
00:10:43.560
but what has not been successful yet is training from video.
link |
00:10:47.200
So getting a machine to learn to represent
link |
00:10:49.440
the visual world, for example, by just watching video.
link |
00:10:52.800
Nobody has really succeeded in doing this.
link |
00:10:54.800
Okay, well, let's kind of give a high level overview.
link |
00:10:57.520
What's the difference in kind and in difficulty
link |
00:11:02.360
between vision and language?
link |
00:11:03.960
So you said people haven't been able to really
link |
00:11:08.280
kind of crack the problem of vision open
link |
00:11:10.480
in terms of self supervised learning,
link |
00:11:11.960
but that may not be necessarily
link |
00:11:13.800
because it's fundamentally more difficult.
link |
00:11:15.840
Maybe like when we're talking about achieving,
link |
00:11:18.720
like passing the Turing test in the full spirit
link |
00:11:22.320
of the Turing test in language might be harder than vision.
link |
00:11:24.920
That's not obvious.
link |
00:11:26.400
So in your view, which is harder
link |
00:11:29.440
or perhaps are they just the same problem?
link |
00:11:31.960
When the farther we get to solving each,
link |
00:11:34.840
the more we realize it's all the same thing.
link |
00:11:36.720
It's all the same cake.
link |
00:11:37.680
I think what I'm looking for are methods
link |
00:11:40.200
that make them look essentially like the same cake,
link |
00:11:43.600
but currently they're not.
link |
00:11:44.800
And the main issue with learning world models
link |
00:11:48.480
or learning predictive models is that the prediction
link |
00:11:53.120
is never a single thing
link |
00:11:55.880
because the world is not entirely predictable.
link |
00:11:59.240
It may be deterministic or stochastic.
link |
00:12:00.680
We can get into the philosophical discussion about it,
link |
00:12:02.960
but even if it's deterministic,
link |
00:12:05.280
it's not entirely predictable.
link |
00:12:07.440
And so if I play a short video clip
link |
00:12:11.760
and then I ask you to predict what's going to happen next,
link |
00:12:14.160
there's many, many plausible continuations
link |
00:12:16.360
for that video clip and the number of continuations grows
link |
00:12:20.520
with the interval of time that you're asking the system
link |
00:12:23.920
to make a prediction for.
link |
00:12:26.480
And so one big question with self supervised learning
link |
00:12:29.880
is how you represent this uncertainty,
link |
00:12:32.320
how you represent multiple discrete outcomes,
link |
00:12:35.200
how you represent a sort of continuum
link |
00:12:37.120
of possible outcomes, et cetera.
link |
00:12:40.400
And if you are sort of a classical machine learning person,
link |
00:12:45.200
you say, oh, you just represent a distribution, right?
link |
00:12:49.120
And that we know how to do when we're predicting words,
link |
00:12:52.560
missing words in the text,
link |
00:12:53.720
because you can have a neural net give a score
link |
00:12:56.840
for every word in the dictionary.
link |
00:12:58.640
It's a big list of numbers, maybe 100,000 or so.
link |
00:13:02.480
And you can turn them into a probability distribution
link |
00:13:05.280
that tells you when I say a sentence,
link |
00:13:09.880
the cat is chasing the blank in the kitchen.
link |
00:13:13.000
There are only a few words that make sense there.
link |
00:13:15.840
It could be a mouse or it could be a laser spot
link |
00:13:18.360
or something like that, right?
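As a toy illustration with made-up scores, turning per-word scores into a probability distribution over the blank looks something like this:

```python
import torch

# Toy illustration: a neural net emits one score per dictionary word for the blank in
# "the cat is chasing the ___ in the kitchen"; softmax turns the scores into probabilities.
words  = ["mouse", "laser", "ball", "bird", "refrigerator", "cloud"]
scores = torch.tensor([4.1, 3.2, 2.5, 1.8, -1.0, -3.0])   # made-up logits

probs = torch.softmax(scores, dim=0)
for w, p in sorted(zip(words, probs.tolist()), key=lambda x: -x[1]):
    print(f"{w:>12s}: {p:.3f}")
# In a real system the list has on the order of 100,000 entries, one per word.
```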
link |
00:13:21.560
And if I say the blank is chasing the blank in the Savannah,
link |
00:13:25.840
you also have a bunch of plausible options
link |
00:13:27.840
for those two words, right?
link |
00:13:30.960
Because you have kind of a underlying reality
link |
00:13:33.640
that you can refer to to sort of fill in those blanks.
link |
00:13:38.080
So you cannot say for sure in the Savannah,
link |
00:13:42.040
if it's a lion or a cheetah or whatever,
link |
00:13:44.480
you cannot know if it's a zebra or a gnu or whatever,
link |
00:13:49.560
wildebeest, the same thing.
link |
00:13:55.360
But you can represent the uncertainty
link |
00:13:56.840
by just a long list of numbers.
link |
00:13:58.520
Now, if I do the same thing with video,
link |
00:14:01.800
when I ask you to predict a video clip,
link |
00:14:04.360
it's not a discrete set of potential frames.
link |
00:14:07.400
You have to have some way of representing
link |
00:14:10.000
a sort of infinite number of plausible continuations
link |
00:14:13.520
of multiple frames in a high dimensional continuous space.
link |
00:14:17.480
And we just have no idea how to do this properly.
link |
00:14:20.520
Finite, high dimensional.
link |
00:14:22.880
So like you,
link |
00:14:23.720
It's finite high dimensional, yes.
link |
00:14:25.320
Just like the words,
link |
00:14:26.240
they try to get it down to a small finite set
link |
00:14:32.200
of like under a million, something like that.
link |
00:14:34.240
Something like that.
link |
00:14:35.080
I mean, it's kind of ridiculous that we're doing
link |
00:14:38.320
a distribution over every single possible word
link |
00:14:40.840
for language and it works.
link |
00:14:42.880
It feels like that's a really dumb way to do it.
link |
00:14:46.480
Like there seems to be like there should be
link |
00:14:49.720
some more compressed representation
link |
00:14:52.920
of the distribution of the words.
link |
00:14:55.040
You're right about that.
link |
00:14:56.120
And so do you have any interesting ideas
link |
00:14:58.880
about how to represent all of reality in a compressed way
link |
00:15:01.880
such that you can form a distribution over it?
link |
00:15:03.800
That's one of the big questions, how do you do that?
link |
00:15:06.200
Right, I mean, what's kind of another thing
link |
00:15:08.440
that really is stupid about, I shouldn't say stupid,
link |
00:15:13.080
but like simplistic about current approaches
link |
00:15:15.560
to self supervised learning in NLP in text
link |
00:15:19.360
is that not only do you represent
link |
00:15:21.920
a giant distribution over words,
link |
00:15:23.840
but for multiple words that are missing,
link |
00:15:25.680
those distributions are essentially independent
link |
00:15:27.680
of each other.
link |
00:15:30.200
And you don't pay too much of a price for this.
link |
00:15:33.040
So you can't, so the system in the sentence
link |
00:15:37.840
that I gave earlier, if it gives a certain probability
link |
00:15:41.280
for a lion and cheetah, and then a certain probability
link |
00:15:44.800
for gazelle, wildebeest and zebra,
link |
00:15:51.960
those two probabilities are independent of each other.
link |
00:15:55.960
And it's not the case that those things are independent.
link |
00:15:58.040
Lions actually attack like bigger animals than cheetahs.
link |
00:16:01.480
So there's a huge independence assumption in this process,
link |
00:16:05.960
which is not actually true.
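A small numerical illustration of that independence assumption, with made-up probabilities: the true joint distribution over the two blanks is correlated, but two separate per-blank distributions can only express the product of the marginals.

```python
import numpy as np

# Toy illustration for "the ___ is chasing the ___ in the Savannah".
# Rows: predator (lion, cheetah); columns: prey (wildebeest, gazelle). Made-up joint probabilities.
joint = np.array([[0.40, 0.10],    # lion goes after the bigger animal more often
                  [0.10, 0.40]])   # cheetah goes after the gazelle more often

p_predator = joint.sum(axis=1)     # marginal distribution over the first blank
p_prey     = joint.sum(axis=0)     # marginal distribution over the second blank

independent = np.outer(p_predator, p_prey)  # the best two separate softmaxes can express
print(joint)        # correlated: knowing the predator tells you about the prey
print(independent)  # correlation gone: every combination looks equally plausible
```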
link |
00:16:07.800
The reason for this is that we don't know
link |
00:16:09.880
how to represent properly distributions
link |
00:16:13.000
over combinatorial sequences of symbols,
link |
00:16:16.240
essentially because the number grows exponentially
link |
00:16:19.000
with the length of the symbols.
link |
00:16:21.320
And so we have to use tricks for this,
link |
00:16:22.760
but those tricks kind of get around it,
link |
00:16:26.400
like don't even deal with it.
link |
00:16:27.800
So the big question is would there be some sort
link |
00:16:31.760
of abstract latent representation of text
link |
00:16:35.640
that would say that when I switch
link |
00:16:40.680
lion for cheetah, I also have to switch zebra for gazelle?
link |
00:16:45.480
Yeah, so this independence assumption,
link |
00:16:48.720
let me throw some criticism at you that I often hear
link |
00:16:51.160
and see how you respond.
link |
00:16:52.920
So this kind of filling in the blanks is just statistics.
link |
00:16:56.000
You're not learning anything
link |
00:16:58.880
like the deep underlying concepts.
link |
00:17:01.600
You're just mimicking stuff from the past.
link |
00:17:05.640
You're not learning anything new such that you can use it
link |
00:17:08.560
to generalize about the world.
link |
00:17:11.960
Or okay, let me just say the crude version,
link |
00:17:14.120
which is just statistics.
link |
00:17:16.200
It's not intelligence.
link |
00:17:18.320
What do you have to say to that?
link |
00:17:19.640
What do you usually say to that
link |
00:17:20.880
if you kind of hear this kind of thing?
link |
00:17:22.640
I don't get into those discussions
link |
00:17:23.960
because they are kind of pointless.
link |
00:17:26.760
So first of all, it's quite possible
link |
00:17:28.760
that intelligence is just statistics.
link |
00:17:30.480
It's just statistics of a particular kind.
link |
00:17:32.760
Yes, this is the philosophical question.
link |
00:17:35.480
It's kind of is it possible
link |
00:17:38.400
that intelligence is just statistics?
link |
00:17:40.280
Yeah, but what kind of statistics?
link |
00:17:43.520
So if you are asking the question,
link |
00:17:47.160
are the models of the world that we learn,
link |
00:17:50.680
do they have some notion of causality?
link |
00:17:52.320
Yes.
link |
00:17:53.400
So if the criticism comes from people who say,
link |
00:17:57.200
current machine learning system don't care about causality,
link |
00:17:59.440
which by the way is wrong, I agree with them.
link |
00:18:04.600
Your model of the world should have your actions
link |
00:18:06.560
as one of the inputs.
link |
00:18:09.080
And that will drive you to learn causal models of the world
link |
00:18:11.400
where you know what intervention in the world
link |
00:18:15.080
will cause what result.
link |
00:18:16.720
Or you can do this by observation of other agents
link |
00:18:19.400
acting in the world and observing the effect.
link |
00:18:22.520
Other humans, for example.
link |
00:18:24.240
So I think at some level of description,
link |
00:18:28.440
intelligence is just statistics.
link |
00:18:31.680
But that doesn't mean you don't have models
link |
00:18:35.200
that have deep mechanistic explanation for what goes on.
link |
00:18:40.080
The question is how do you learn them?
link |
00:18:41.760
That's the question I'm interested in.
link |
00:18:44.440
Because a lot of people who actually voice their criticism
link |
00:18:49.360
say that those mechanistic models
link |
00:18:51.040
have to come from someplace else.
link |
00:18:52.640
They have to come from human designers,
link |
00:18:54.040
they have to come from I don't know what.
link |
00:18:56.200
And obviously we learn them.
link |
00:18:59.280
Or if we don't learn them as an individual,
link |
00:19:01.800
nature learned them for us using evolution.
link |
00:19:04.920
So regardless of what you think,
link |
00:19:07.160
those processes have been learned somehow.
link |
00:19:10.240
So if you look at the human brain,
link |
00:19:12.920
just like when we humans introspect
link |
00:19:14.640
about how the brain works,
link |
00:19:16.320
it seems like when we think about what is intelligence,
link |
00:19:20.240
we think about the high level stuff,
link |
00:19:22.440
like the models we've constructed,
link |
00:19:23.960
concepts like cognitive science,
link |
00:19:25.560
like concepts of memory and reasoning module,
link |
00:19:28.720
almost like these high level modules.
link |
00:19:32.360
Does this serve as a good analogy?
link |
00:19:35.400
Like are we ignoring the dark matter,
link |
00:19:40.720
the basic low level mechanisms?
link |
00:19:43.560
Just like we ignore the way the operating system works,
link |
00:19:45.800
we're just using the high level software.
link |
00:19:49.640
We're ignoring that at the low level,
link |
00:19:52.720
the neural network might be doing something like statistics.
link |
00:19:56.440
Like meaning, sorry to use this word
link |
00:19:59.120
probably incorrectly and crudely,
link |
00:20:00.560
but doing this kind of fill in the gap kind of learning
link |
00:20:03.320
and just kind of updating the model constantly
link |
00:20:05.720
in order to be able to support the raw sensory information
link |
00:20:09.240
to predict it and then adjust to the prediction
link |
00:20:11.360
when it's wrong.
link |
00:20:12.400
But like when we look at our brain at the high level,
link |
00:20:15.840
it feels like we're doing, like we're playing chess,
link |
00:20:18.320
like we're like playing with high level concepts
link |
00:20:22.240
and we're stitching them together
link |
00:20:23.680
and we're putting them into longterm memory.
link |
00:20:26.000
But really what's going underneath
link |
00:20:28.280
is something we're not able to introspect,
link |
00:20:30.160
which is this kind of simple, large neural network
link |
00:20:34.440
that's just filling in the gaps.
link |
00:20:36.000
Right, well, okay.
link |
00:20:37.120
So there's a lot of questions and a lot of answers there.
link |
00:20:39.760
Okay, so first of all,
link |
00:20:40.600
there's a whole school of thought in neuroscience,
link |
00:20:42.680
computational neuroscience in particular,
link |
00:20:45.240
that likes the idea of predictive coding,
link |
00:20:47.760
which is really related to the idea
link |
00:20:50.080
I was talking about in self supervised learning.
link |
00:20:52.040
So everything is about prediction.
link |
00:20:53.520
The essence of intelligence is the ability to predict
link |
00:20:56.320
and everything the brain does is trying to predict,
link |
00:20:59.920
predict everything from everything else.
link |
00:21:02.120
Okay, and that's really sort of the underlying principle,
link |
00:21:04.760
if you want, that self supervised learning
link |
00:21:07.800
is trying to kind of reproduce this idea of prediction
link |
00:21:10.640
as kind of an essential mechanism
link |
00:21:13.080
of task independent learning, if you want.
link |
00:21:16.320
The next step is what kind of intelligence
link |
00:21:19.320
are you interested in reproducing?
link |
00:21:21.120
And of course, we all think about trying to reproduce
link |
00:21:24.640
sort of high level cognitive processes in humans,
link |
00:21:28.320
but like with machines, we're not even at the level
link |
00:21:30.400
of even reproducing the learning processes in a cat brain.
link |
00:21:37.160
The most intelligent of our intelligent systems
link |
00:21:39.360
don't have as much common sense as a house cat.
link |
00:21:43.200
So how is it that cats learn?
link |
00:21:45.160
And cats don't do a whole lot of reasoning.
link |
00:21:47.920
They certainly have causal models.
link |
00:21:49.600
They certainly have, because many cats can figure out
link |
00:21:53.600
how they can act on the world to get what they want.
link |
00:21:56.600
They certainly have a fantastic model of intuitive physics,
link |
00:22:01.800
certainly the dynamics of their own bodies,
link |
00:22:04.560
but also of prey and things like that.
link |
00:22:06.880
So they're pretty smart.
link |
00:22:09.880
They only do this with about 800 million neurons.
link |
00:22:12.400
We are not anywhere close to reproducing this kind of thing.
link |
00:22:17.920
So to some extent, I could say,
link |
00:22:21.320
let's not even worry about like the high level cognition
link |
00:22:26.280
and kind of longterm planning and reasoning
link |
00:22:27.960
that humans can do until we figure out like,
link |
00:22:30.120
can we even reproduce what cats are doing?
link |
00:22:32.520
Now that said, this ability to learn world models,
link |
00:22:37.000
I think is the key to the possibility of learning machines
link |
00:22:41.560
that can also reason.
link |
00:22:43.160
So whenever I give a talk, I say there are three challenges
link |
00:22:45.640
the three main challenges in machine learning.
link |
00:22:47.320
The first one is getting machines to learn
link |
00:22:49.920
to represent the world
link |
00:22:51.800
and I'm proposing self supervised learning.
link |
00:22:54.840
The second is getting machines to reason
link |
00:22:58.000
in ways that are compatible
link |
00:22:59.240
with essentially gradient based learning
link |
00:23:01.640
because this is what deep learning is all about really.
link |
00:23:05.280
And the third one is something
link |
00:23:06.640
we have no idea how to solve,
link |
00:23:07.640
at least I have no idea how to solve
link |
00:23:09.480
is can we get machines to learn hierarchical representations
link |
00:23:14.360
of action plans?
link |
00:23:17.920
We know how to train them
link |
00:23:18.760
to learn hierarchical representations of perception
link |
00:23:22.200
with convolutional nets and things like that
link |
00:23:23.680
and transformers, but what about action plans?
link |
00:23:26.040
Can we get them to spontaneously learn
link |
00:23:28.280
good hierarchical representations of actions?
link |
00:23:30.480
Also gradient based.
link |
00:23:32.400
Yeah, all of that needs to be somewhat differentiable
link |
00:23:35.880
so that you can apply sort of gradient based learning,
link |
00:23:38.720
which is really what deep learning is about.
link |
00:23:42.080
So it's background, knowledge, ability to reason
link |
00:23:46.760
in a way that's differentiable
link |
00:23:50.520
that is somehow connected, deeply integrated
link |
00:23:53.840
with that background knowledge
link |
00:23:55.480
or builds on top of that background knowledge
link |
00:23:57.600
and then given that background knowledge
link |
00:23:59.120
be able to make hierarchical plans in the world.
link |
00:24:02.360
So if you take classical optimal control,
link |
00:24:05.480
there's something in classical optimal control
link |
00:24:07.000
called model predictive control.
link |
00:24:10.520
And it's been around since the early sixties.
link |
00:24:13.840
NASA uses that to compute trajectories of rockets.
link |
00:24:16.840
And the basic idea is that you have a predictive model
link |
00:24:20.600
of the rocket, let's say,
link |
00:24:21.840
or whatever system you intend to control,
link |
00:24:25.440
which given the state of the system at time T
link |
00:24:28.360
and given an action that you're taking on the system.
link |
00:24:31.640
So for a rocket, it would be the thrust
link |
00:24:33.520
and all the controls you can have,
link |
00:24:35.600
it gives you the state of the system
link |
00:24:37.280
at time T plus Delta T, right?
link |
00:24:38.800
So basically a differential equation, something like that.
link |
00:24:43.520
And if you have this model
link |
00:24:45.240
and you have this model in the form of some sort of neural net
link |
00:24:48.720
or some sort of a set of formula
link |
00:24:50.960
that you can back propagate gradient through,
link |
00:24:52.920
you can do what's called model predictive control
link |
00:24:55.240
or gradient based model predictive control.
link |
00:24:57.680
So you can unroll that model in time.
link |
00:25:02.680
You feed it a hypothesized sequence of actions.
link |
00:25:08.080
And then you have some objective function
link |
00:25:10.760
that measures how well at the end of the trajectory,
link |
00:25:13.240
the system has succeeded or matched what you wanted to do.
link |
00:25:17.240
If it's a robot arm,
link |
00:25:18.280
have you grasped the object you want to grasp?
link |
00:25:20.680
If it's a rocket, are you at the right place
link |
00:25:23.360
near the space station, things like that.
link |
00:25:26.120
And by back propagation through time,
link |
00:25:28.040
and again, this was invented in the 1960s,
link |
00:25:30.080
by optimal control theorists, you can figure out
link |
00:25:34.040
what is the optimal sequence of actions
link |
00:25:36.160
that will get my system to the best final state.
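Here is a minimal sketch of gradient-based model predictive control under those assumptions: a toy differentiable dynamics function stands in for the predictive model, and a hypothesized action sequence is improved by backpropagation through the unrolled model. Dynamics, cost, and horizon are all invented for illustration.

```python
import torch

# Minimal sketch of gradient-based model predictive control: unroll a differentiable
# model of the system over a hypothesized action sequence, score the final state,
# and improve the actions by backpropagation through time.

def dynamics(state, action):
    # hypothetical model: x_{t+1} = x_t + 0.1 * a_t (stand-in for the real predictive model)
    return state + 0.1 * action

goal = torch.tensor([1.0, 2.0])
actions = torch.zeros(20, 2, requires_grad=True)   # hypothesized sequence of actions
opt = torch.optim.SGD([actions], lr=0.5)

for _ in range(100):                               # the inner optimization is the "reasoning"
    state = torch.zeros(2)
    for a in actions:                              # unroll the model in time
        state = dynamics(state, a)
    cost = ((state - goal) ** 2).sum() + 1e-3 * (actions ** 2).sum()
    opt.zero_grad(); cost.backward(); opt.step()

# Receding-horizon version: execute only actions[0], observe the real outcome, then replan.
```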
link |
00:25:42.040
So that's a form of reasoning.
link |
00:25:44.560
It's basically planning.
link |
00:25:45.640
And a lot of planning systems in robotics
link |
00:25:48.160
are actually based on this.
link |
00:25:49.600
And you can think of this as a form of reasoning.
link |
00:25:53.160
So to take the example of the teenager driving a car,
link |
00:25:57.040
you have a pretty good dynamical model of the car.
link |
00:26:00.120
It doesn't need to be very accurate.
link |
00:26:01.280
But you know, again, that if you turn the wheel
link |
00:26:03.840
to the right and there is a cliff,
link |
00:26:05.080
you're gonna run off the cliff, right?
link |
00:26:06.520
You don't need to have a very accurate model
link |
00:26:08.000
to predict that.
link |
00:26:09.080
And you can run this in your mind
link |
00:26:10.640
and decide not to do it for that reason.
link |
00:26:13.080
Because you can predict in advance
link |
00:26:14.480
that the result is gonna be bad.
link |
00:26:15.600
So you can sort of imagine different scenarios
link |
00:26:17.960
and then employ or take the first step
link |
00:26:21.560
in the scenario that is most favorable
link |
00:26:23.360
and then repeat the process again.
link |
00:26:24.960
The scenario that is most favorable
link |
00:26:27.120
and then repeat the process of planning.
link |
00:26:28.480
That's called receding horizon model predictive control.
link |
00:26:31.280
So even all those things have names going back decades.
link |
00:26:36.480
And so in classical optimal control,
link |
00:26:40.680
the model of the world is not generally learned.
link |
00:26:44.360
Sometimes a few parameters you have to identify.
link |
00:26:46.240
That's called system identification.
link |
00:26:47.800
But generally, the model is mostly deterministic
link |
00:26:52.640
and mostly built by hand.
link |
00:26:53.920
So the question of AI,
link |
00:26:55.920
I think the big challenge of AI for the next decade
link |
00:26:58.760
is how do we get machines to learn predictive models
link |
00:27:01.120
of the world that deal with uncertainty
link |
00:27:03.720
and deal with the real world in all this complexity?
link |
00:27:05.840
So it's not just the trajectory of a rocket,
link |
00:27:08.160
which you can reduce to first principles.
link |
00:27:10.240
It's not even just the trajectory of a robot arm,
link |
00:27:13.040
which again, you can model by careful mathematics.
link |
00:27:16.320
But it's everything else,
link |
00:27:17.200
everything we observe in the world.
link |
00:27:18.880
People, behavior,
link |
00:27:20.120
physical systems that involve collective phenomena,
link |
00:27:25.800
like water or trees and branches in a tree or something
link |
00:27:31.880
or complex things that humans have no trouble
link |
00:27:36.680
developing abstract representations
link |
00:27:38.520
and predictive model for,
link |
00:27:39.840
but we still don't know how to do with machines.
link |
00:27:41.600
Where do you put in these three,
link |
00:27:43.880
maybe in the planning stages,
link |
00:27:46.180
the game theoretic nature of this world,
link |
00:27:50.660
where your actions not only respond
link |
00:27:52.980
to the dynamic nature of the world, the environment,
link |
00:27:55.540
but also affect it.
link |
00:27:57.500
So if there's other humans involved,
link |
00:27:59.860
is this point number four,
link |
00:28:02.220
or is it somehow integrated
link |
00:28:03.420
into the hierarchical representation of action
link |
00:28:05.820
in your view?
link |
00:28:06.660
I think it's integrated.
link |
00:28:07.500
It's just that now your model of the world has to deal with,
link |
00:28:11.580
it just makes it more complicated.
link |
00:28:13.100
The fact that humans are complicated
link |
00:28:15.600
and not easily predictable,
link |
00:28:17.220
that makes your model of the world much more complicated,
link |
00:28:19.860
that much more complicated.
link |
00:28:21.340
Well, there's a chess,
link |
00:28:22.380
I mean, I suppose chess is an analogy.
link |
00:28:25.300
So Monte Carlo tree search.
link |
00:28:28.860
There's a, I go, you go, I go, you go.
link |
00:28:32.040
Like Andrej Karpathy recently gave a talk at MIT
link |
00:28:35.580
about car doors.
link |
00:28:37.900
I think there's some machine learning too,
link |
00:28:39.280
but mostly car doors.
link |
00:28:40.780
And there's a dynamic nature to the car,
link |
00:28:43.340
like the person opening the door,
link |
00:28:44.700
checking, I mean, he wasn't talking about that.
link |
00:28:46.900
He was talking about the perception problem
link |
00:28:48.420
of what the ontology of what defines a car door,
link |
00:28:50.940
this big philosophical question.
link |
00:28:52.940
But to me, it was interesting
link |
00:28:54.060
because it's obvious that the person opening the car doors,
link |
00:28:57.300
they're trying to get out, like here in New York,
link |
00:28:59.580
trying to get out of the car.
link |
00:29:01.400
You slowing down is going to signal something.
link |
00:29:03.580
You speeding up is gonna signal something,
link |
00:29:05.380
and that's a dance.
link |
00:29:06.460
It's an asynchronous chess game.
link |
00:29:10.140
I don't know.
link |
00:29:10.980
So it feels like it's not just,
link |
00:29:16.900
I mean, I guess you can integrate all of them
link |
00:29:18.780
to one giant model, like the entirety
link |
00:29:21.300
of these little interactions.
link |
00:29:24.340
Because it's not as complicated as chess.
link |
00:29:25.740
It's just like a little dance.
link |
00:29:27.120
We do like a little dance together,
link |
00:29:28.800
and then we figure it out.
link |
00:29:29.980
Well, in some ways it's way more complicated than chess
link |
00:29:32.500
because it's continuous, it's uncertain
link |
00:29:36.020
in a continuous manner.
link |
00:29:38.220
It doesn't feel more complicated.
link |
00:29:39.860
But it doesn't feel more complicated
link |
00:29:41.060
because that's what we've evolved to solve.
link |
00:29:43.660
This is the kind of problem we've evolved to solve.
link |
00:29:45.480
And so we're good at it
link |
00:29:46.400
because nature has made us good at it.
link |
00:29:50.500
Nature has not made us good at chess.
link |
00:29:52.340
We completely suck at chess.
link |
00:29:55.700
In fact, that's why we designed it as a game,
link |
00:29:57.980
is to be challenging.
link |
00:30:00.340
And if there is something that recent progress
link |
00:30:02.580
in chess and Go has made us realize
link |
00:30:05.580
is that humans are really terrible at those things,
link |
00:30:07.900
like really bad.
link |
00:30:09.660
There was a story right before AlphaGo
link |
00:30:11.540
that the best Go players thought
link |
00:30:15.220
they were maybe two or three stones behind an ideal player
link |
00:30:18.520
that they would call God.
link |
00:30:20.700
In fact, no, they were like nine or 10 stones behind.
link |
00:30:23.700
I mean, we're just bad.
link |
00:30:25.340
So we're not good at,
link |
00:30:27.420
and it's because we have limited working memory.
link |
00:30:30.340
We're not very good at doing this tree exploration
link |
00:30:32.980
that computers are much better at doing than we are.
link |
00:30:36.780
But we are much better
link |
00:30:37.940
at learning differentiable models of the world.
link |
00:30:40.620
I mean, I said differentiable in a kind of,
link |
00:30:43.820
I should say not differentiable in the sense that
link |
00:30:46.420
we run backprop through it,
link |
00:30:47.480
but in the sense that our brain has some mechanism
link |
00:30:50.500
for estimating gradients of some kind.
link |
00:30:54.060
And that's what makes us efficient.
link |
00:30:56.540
So if you have an agent that consists of a model
link |
00:31:02.180
of the world, which in the human brain
link |
00:31:04.380
is basically the entire front half of your brain,
link |
00:31:08.340
an objective function,
link |
00:31:10.220
which in humans is a combination of two things.
link |
00:31:14.440
There is your sort of intrinsic motivation module,
link |
00:31:17.660
which is in the basal ganglia,
link |
00:31:19.140
the base of your brain.
link |
00:31:20.100
That's the thing that measures pain and hunger
link |
00:31:22.540
and things like that,
link |
00:31:23.360
like immediate feelings and emotions.
link |
00:31:28.020
And then there is the equivalent
link |
00:31:30.780
of what people in reinforcement learning call a critic,
link |
00:31:32.620
which is a sort of module that predicts ahead
link |
00:31:36.100
what the outcome of a situation will be.
link |
00:31:41.940
And so it's not a cost function,
link |
00:31:43.840
but it's sort of not an objective function,
link |
00:31:45.460
but it's sort of a train predictor
link |
00:31:49.020
of the ultimate objective function.
link |
00:31:50.980
And that also is differentiable.
link |
00:31:52.620
And so if all of this is differentiable,
link |
00:31:54.660
your cost function, your critic, your world model,
link |
00:31:59.660
then you can use gradient based type methods
link |
00:32:03.100
to do planning, to do reasoning, to do learning,
link |
00:32:05.820
to do all the things that we'd like
link |
00:32:08.140
an intelligent agent to do.
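A hypothetical sketch of how those pieces could compose, with stand-in modules: a world model imagines future states, a hardwired intrinsic cost scores them, a trainable critic estimates the longer-term outcome, and since everything is differentiable the action sequence can be planned by gradient descent.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the agent pieces described above: a world model, a hardwired
# intrinsic cost, and a trainable critic, composed into one differentiable objective.

world_model = nn.GRUCell(input_size=4, hidden_size=32)   # predicts the next latent state
critic      = nn.Linear(32, 1)                           # trained predictor of long-term cost

def intrinsic_cost(state):
    # stand-in for the basal-ganglia-like module: e.g. penalize "high energy" states
    return (state ** 2).mean()

actions = torch.zeros(10, 4, requires_grad=True)
opt = torch.optim.Adam([actions], lr=0.1)

for _ in range(50):                                      # planning by gradient descent
    h = torch.zeros(1, 32)
    total = 0.0
    for a in actions:
        h = world_model(a.unsqueeze(0), h)               # imagine the next state
        total = total + intrinsic_cost(h)
    total = total + critic(h).squeeze()                  # predicted future cost at the horizon
    opt.zero_grad(); total.backward(); opt.step()
```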
link |
00:32:11.840
And gradient based learning,
link |
00:32:14.180
like what's your intuition?
link |
00:32:15.340
That's probably at the core of what can solve intelligence.
link |
00:32:18.420
So you don't need like logic based reasoning in your view.
link |
00:32:25.620
I don't know how to make logic based reasoning
link |
00:32:27.260
compatible with efficient learning.
link |
00:32:31.020
Okay, I mean, there is a big question,
link |
00:32:32.300
perhaps a philosophical question.
link |
00:32:33.900
I mean, it's not that philosophical,
link |
00:32:35.220
but that we can ask is that all the learning algorithms
link |
00:32:40.020
we know from engineering and computer science
link |
00:32:43.300
proceed by optimizing some objective function.
link |
00:32:48.340
So one question we may ask is,
link |
00:32:51.780
does learning in the brain minimize an objective function?
link |
00:32:54.740
I mean, it could be a composite
link |
00:32:57.340
of multiple objective functions,
link |
00:32:58.500
but it's still an objective function.
link |
00:33:01.420
Second, if it does optimize an objective function,
link |
00:33:04.660
does it do it by some sort of gradient estimation?
link |
00:33:09.940
It doesn't need to be a back prop,
link |
00:33:10.860
but some way of estimating the gradient in an efficient manner
link |
00:33:14.820
whose complexity is on the same order of magnitude
link |
00:33:17.020
as actually running the inference.
link |
00:33:20.800
Because you can't afford to do things
link |
00:33:24.060
like perturbing a weight in your brain
link |
00:33:26.540
to figure out what the effect is.
link |
00:33:28.100
And then sort of, you can do sort of
link |
00:33:30.780
estimating gradient by perturbation.
link |
00:33:33.300
To me, it seems very implausible
link |
00:33:35.460
that the brain uses some sort of zeroth order black box
link |
00:33:41.060
gradient free optimization,
link |
00:33:43.000
because it's so much less efficient
link |
00:33:45.200
than gradient optimization.
link |
00:33:46.320
So it has to have a way of estimating gradient.
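A back-of-the-envelope comparison of the two ways of estimating gradients, with an assumed parameter count; only the scaling matters, not the exact numbers.

```python
# Back-of-the-envelope cost of one gradient estimate (illustrative only).
n_parameters = 1e11          # assumed order of magnitude of synapses in a small mammal brain
cost_forward = 1.0           # one "inference pass" as the unit of work

cost_backprop_like = 2 * cost_forward            # roughly one forward plus one backward pass
cost_perturbation  = n_parameters * cost_forward # one extra pass per perturbed weight

print(f"gradient-based estimate:   ~{cost_backprop_like:.0f} passes")
print(f"zeroth-order perturbation: ~{cost_perturbation:.0e} passes")
```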
link |
00:33:49.260
Is it possible that some kind of logic based reasoning
link |
00:33:52.780
emerges in pockets as a useful,
link |
00:33:55.400
like you said, if the brain is an objective function,
link |
00:33:58.100
maybe it's a mechanism for creating objective functions.
link |
00:34:01.300
It's a mechanism for creating knowledge bases, for example,
link |
00:34:06.520
that can then be queried.
link |
00:34:08.380
Like maybe it's like an efficient representation
link |
00:34:10.300
of knowledge that's learned in a gradient based way
link |
00:34:12.700
or something like that.
link |
00:34:13.780
Well, so I think there is a lot of different types
link |
00:34:15.980
of intelligence.
link |
00:34:17.340
So first of all, I think the type of logical reasoning
link |
00:34:19.700
that we think about, that is maybe stemming
link |
00:34:23.780
from sort of classical AI of the 1970s and 80s.
link |
00:34:29.080
I think humans use that relatively rarely
link |
00:34:33.020
and are not particularly good at it.
link |
00:34:34.740
But we judge each other based on our ability
link |
00:34:37.560
to solve those rare problems.
link |
00:34:40.620
It's called an IQ test.
link |
00:34:41.660
I don't think so.
link |
00:34:42.700
Like I'm not very good at chess.
link |
00:34:45.260
Yes, I'm judging you this whole time.
link |
00:34:47.420
Because, well, we actually.
link |
00:34:49.740
With your heritage, I'm sure you're good at chess.
link |
00:34:53.500
No, stereotypes.
link |
00:34:55.060
Not all stereotypes are true.
link |
00:34:58.020
Well, I'm terrible at chess.
link |
00:34:59.020
So, but I think perhaps another type of intelligence
link |
00:35:04.660
that I have is this ability of sort of building models
link |
00:35:08.980
of the world from reasoning, obviously,
link |
00:35:13.820
but also data.
link |
00:35:15.980
And those models generally are more kind of analogical.
link |
00:35:18.900
So it's reasoning by simulation,
link |
00:35:22.380
and by analogy, where you use one model
link |
00:35:25.120
to apply to a new situation.
link |
00:35:26.900
Even though you've never seen that situation,
link |
00:35:28.500
you can sort of connect it to a situation
link |
00:35:31.620
you've encountered before.
link |
00:35:33.500
And your reasoning is more akin
link |
00:35:36.700
to some sort of internal simulation.
link |
00:35:38.420
So you're kind of simulating what's happening
link |
00:35:41.140
when you're building, I don't know,
link |
00:35:42.240
a box out of wood or something, right?
link |
00:35:44.100
You can imagine in advance what would be the result
link |
00:35:47.460
of cutting the wood in this particular way.
link |
00:35:49.660
Are you going to use screws or nails or whatever?
link |
00:35:52.900
When you are interacting with someone,
link |
00:35:54.180
you also have a model of that person
link |
00:35:55.780
and sort of interact with that person,
link |
00:35:59.580
having this model in mind to kind of tell the person
link |
00:36:03.660
what you think is useful to them.
link |
00:36:05.280
So I think this ability to construct models of the world
link |
00:36:10.220
is basically the essence, the essence of intelligence.
link |
00:36:13.900
And the ability to use it then to plan actions
link |
00:36:18.220
that will fulfill a particular criterion,
link |
00:36:23.080
of course, is necessary as well.
link |
00:36:25.460
So I'm going to ask you a series of impossible questions
link |
00:36:27.740
as we keep asking, as I've been doing.
link |
00:36:30.180
So if that's the fundamental sort of dark matter
link |
00:36:33.460
of intelligence, this ability to form a background model,
link |
00:36:36.580
what's your intuition about how much knowledge is required?
link |
00:36:41.460
You know, I think dark matter,
link |
00:36:43.100
you could put a percentage on it
link |
00:36:45.980
of the composition of the universe
link |
00:36:50.060
and how much of it is dark matter,
link |
00:36:51.460
how much of it is dark energy,
link |
00:36:52.640
how much information do you think is required
link |
00:36:57.900
to be a house cat?
link |
00:36:59.920
So you have to be able to, when you see a box, go in it,
link |
00:37:02.900
when you see a human, compute the most evil action,
link |
00:37:06.220
if there's a thing that's near an edge,
link |
00:37:07.940
you knock it off, all of that,
link |
00:37:10.980
plus the extra stuff you mentioned,
link |
00:37:12.740
which is a great self awareness of the physics
link |
00:37:15.700
of your own body and the world.
link |
00:37:18.740
How much knowledge is required, do you think, to solve it?
link |
00:37:22.500
I don't even know how to measure an answer to that question.
link |
00:37:25.620
I'm not sure how to measure it,
link |
00:37:26.680
but whatever it is, it fits in about
link |
00:37:32.380
800 million neurons.
link |
00:37:33.900
What's in the representation?
link |
00:37:36.300
Everything, all knowledge, everything, right?
link |
00:37:40.100
You know, it's less than a billion.
link |
00:37:41.500
A dog is 2 billion, but a cat is less than 1 billion.
link |
00:37:45.500
And so multiply that by a thousand
link |
00:37:48.140
and you get the number of synapses.
link |
00:37:50.300
And I think almost all of it is learned
link |
00:37:52.780
through this, you know, a sort of self supervised learning,
link |
00:37:55.940
although, you know, I think a tiny sliver
link |
00:37:58.500
is learned through reinforcement learning
link |
00:37:59.900
and certainly very little through, you know,
link |
00:38:02.220
classical supervised learning,
link |
00:38:03.340
although it's not even clear how supervised learning
link |
00:38:05.180
actually works in the biological world.
link |
00:38:09.260
So I think almost all of it is self supervised learning,
link |
00:38:12.860
but it's driven by the sort of ingrained objective functions
link |
00:38:18.180
that a cat or a human have at the base of their brain,
link |
00:38:21.400
which kind of drives their behavior.
link |
00:38:24.880
So, you know, nature tells us you're hungry.
link |
00:38:29.480
It doesn't tell us how to feed ourselves.
link |
00:38:31.900
That's something that the rest of our brain
link |
00:38:33.500
has to figure out, right?
link |
00:38:35.780
What's interesting is there might be more
link |
00:38:37.940
like deeper objective functions
link |
00:38:39.660
underlying the whole thing.
link |
00:38:41.300
So hunger may be some kind of,
link |
00:38:44.500
now you go to like neurobiology,
link |
00:38:46.140
it might be just the brain trying to maintain homeostasis.
link |
00:38:52.460
So hunger is just one of the human perceivable symptoms
link |
00:38:58.020
of the brain being unhappy
link |
00:38:59.380
with the way things are currently.
link |
00:39:01.460
It could be just like one really dumb objective function
link |
00:39:04.140
at the core.
link |
00:39:04.980
But that's how behavior is driven.
link |
00:39:08.460
The fact that, you know, our basal ganglia
link |
00:39:12.360
drive us to do things that are different
link |
00:39:14.820
from say an orangutan or certainly a cat
link |
00:39:18.180
is what makes, you know, human nature
link |
00:39:20.060
versus orangutan nature versus cat nature.
link |
00:39:23.280
So for example, you know, our basal ganglia
link |
00:39:27.100
drives us to seek the company of other humans.
link |
00:39:32.220
And that's because nature has figured out
link |
00:39:34.540
that we need to be social animals for our species to survive.
link |
00:39:37.540
And it's true of many primates.
link |
00:39:41.300
It's not true of orangutans.
link |
00:39:42.620
Orangutans are solitary animals.
link |
00:39:44.900
They don't seek the company of others.
link |
00:39:46.900
In fact, they avoid them.
link |
00:39:49.300
In fact, they scream at them when they come too close
link |
00:39:51.060
because they're territorial.
link |
00:39:52.740
Because for their survival, you know,
link |
00:39:55.900
evolution has figured out that's the best thing.
link |
00:39:58.300
I mean, they're occasionally social, of course,
link |
00:40:00.040
for, you know, reproduction and stuff like that.
link |
00:40:03.500
But they're mostly solitary.
link |
00:40:05.920
So all of those behaviors are not part of intelligence.
link |
00:40:09.540
You know, people say,
link |
00:40:10.380
oh, you're never gonna have intelligent machines
link |
00:40:11.800
because, you know, human intelligence is social.
link |
00:40:13.940
But then you look at orangutans, you look at octopus.
link |
00:40:16.820
Octopus never know their parents.
link |
00:40:18.800
They barely interact with any other.
link |
00:40:20.500
And they get to be really smart in less than a year,
link |
00:40:23.900
in like half a year.
link |
00:40:26.040
You know, in a year, they're adults.
link |
00:40:27.620
In two years, they're dead.
link |
00:40:28.780
So there are things that we think, as humans,
link |
00:40:33.620
are intimately linked with intelligence,
link |
00:40:35.740
like social interaction, like language.
link |
00:40:39.760
We think, I think we give way too much importance
link |
00:40:42.860
to language as a substrate of intelligence as humans.
link |
00:40:46.780
Because we think our reasoning is so linked with language.
link |
00:40:49.840
So to solve the house cat intelligence problem,
link |
00:40:53.460
you think you could do it on a desert island.
link |
00:40:55.500
You could have, you could just have a cat sitting there
link |
00:41:00.360
looking at the waves, at the ocean waves,
link |
00:41:03.180
and figure a lot of it out.
link |
00:41:05.740
It needs to have sort of, you know,
link |
00:41:07.500
the right set of drives to kind of, you know,
link |
00:41:11.540
get it to do the thing and learn the appropriate things,
link |
00:41:13.980
right, but like for example, you know,
link |
00:41:17.660
baby humans are driven to learn to stand up and walk.
link |
00:41:22.660
You know, that's kind of, this desire is hardwired.
link |
00:41:26.020
How to do it precisely is not, that's learned.
link |
00:41:28.540
But the desire to walk, move around and stand up,
link |
00:41:32.840
that's sort of probably hardwired.
link |
00:41:35.940
But it's very simple to hardwire this kind of stuff.
link |
00:41:38.940
Oh, like the desire to, well, that's interesting.
link |
00:41:42.780
You're hardwired to want to walk.
link |
00:41:45.620
That's not, there's gotta be a deeper need for walking.
link |
00:41:50.460
I think it was probably socially imposed by society
link |
00:41:53.140
that you need to walk like all the other bipedals.
link |
00:41:55.580
No, like a lot of simple animals that, you know,
link |
00:41:58.420
will probably walk without ever watching
link |
00:42:01.040
any other members of the species.
link |
00:42:03.900
It seems like a scary thing to have to do
link |
00:42:06.820
because you suck at bipedal walking at first.
link |
00:42:09.280
It seems crawling is much safer, much more like,
link |
00:42:13.820
why are you in a hurry?
link |
00:42:15.700
Well, because you have this thing that drives you to do it,
link |
00:42:18.660
you know, which is sort of part of the sort of
link |
00:42:24.220
human development.
link |
00:42:25.060
Is that understood actually what?
link |
00:42:26.700
Not entirely, no.
link |
00:42:28.220
What's the reason you get on two feet?
link |
00:42:29.740
It's really hard.
link |
00:42:30.620
Like most animals don't get on two feet.
link |
00:42:32.780
Well, they get on four feet.
link |
00:42:33.980
You know, many mammals get on four feet.
link |
00:42:35.740
Yeah, they do. Very quickly.
link |
00:42:36.760
Some of them extremely quickly.
link |
00:42:38.500
But I don't, you know, like from the last time
link |
00:42:41.380
I've interacted with a table,
link |
00:42:42.620
that's a much more stable thing than two legs.
link |
00:42:44.940
It's just a really hard problem.
link |
00:42:46.420
Yeah, I mean, birds have figured it out with two feet.
link |
00:42:48.620
Well, technically we can go into ontology.
link |
00:42:52.020
They have four, I guess they have two feet.
link |
00:42:54.500
They have two feet.
link |
00:42:55.340
Chickens.
link |
00:42:56.380
You know, dinosaurs have two feet, many of them.
link |
00:42:58.860
Allegedly.
link |
00:43:01.560
I'm just now learning that T. rex was eating grass,
link |
00:43:04.340
not other animals.
link |
00:43:05.420
T. rex might've been a friendly pet.
link |
00:43:08.020
What do you think about,
link |
00:43:10.320
I don't know if you looked at the test
link |
00:43:13.500
for general intelligence that François Chollet put together.
link |
00:43:16.380
I don't know if you got a chance to look
link |
00:43:18.000
at that kind of thing.
link |
00:43:19.660
What's your intuition about how to solve
link |
00:43:21.860
like an IQ type of test?
link |
00:43:23.740
I don't know.
link |
00:43:24.580
I think it's so outside of my radar screen
link |
00:43:26.140
that it's not really relevant, I think, in the short term.
link |
00:43:30.740
Well, I guess one way to ask,
link |
00:43:33.100
another way, perhaps closer to what you work on, is like,
link |
00:43:37.780
how do you solve MNIST with very little example data?
link |
00:43:42.740
That's right.
link |
00:43:43.560
And that's the answer to this probably
link |
00:43:44.860
is self supervised learning.
link |
00:43:45.860
Just learn to represent images
link |
00:43:47.300
and then learning to recognize handwritten digits
link |
00:43:51.060
on top of this will only require a few samples.
link |
00:43:53.620
And we observe this in humans, right?
link |
00:43:55.460
You show a young child a picture book
link |
00:43:58.660
with a couple of pictures of an elephant and that's it.
link |
00:44:01.940
The child knows what an elephant is.
link |
00:44:03.900
And we see this today with practical systems
link |
00:44:06.700
that we train image recognition systems
link |
00:44:09.540
with enormous amounts of images,
link |
00:44:13.660
either completely self supervised
link |
00:44:15.740
or very weakly supervised.
link |
00:44:16.980
For example, you can train a neural net
link |
00:44:20.900
to predict whatever hashtag people type on Instagram, right?
link |
00:44:24.180
Then you can do this with billions of images
link |
00:44:25.780
because there's billions per day that are showing up.
link |
00:44:28.540
So the amount of training data there
link |
00:44:30.700
is essentially unlimited.
link |
00:44:32.340
And then you take the output representation,
link |
00:44:35.380
a couple of layers down from the outputs
link |
00:44:37.380
of what the system learned and feed this as input
link |
00:44:40.680
to a classifier for any object in the world that you want
link |
00:44:43.780
and it works pretty well.
link |
00:44:44.940
So that's transfer learning, okay?
link |
00:44:47.620
Or weakly supervised transfer learning.
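As a rough sketch of that transfer-learning recipe in code (the backbone here is a generic ResNet standing in for the hashtag-pretrained network; all names, dimensions, and hyperparameters are illustrative assumptions, not the actual Instagram model):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# A generic backbone standing in for the weakly supervised, hashtag-pretrained
# network described above (hypothetical; assume its weights were loaded from
# that pretraining rather than initialized randomly).
backbone = models.resnet50(weights=None)
backbone.fc = nn.Identity()              # chop off the original output layer

for p in backbone.parameters():          # freeze the pretrained representation
    p.requires_grad = False

# A new classifier for whatever objects you actually care about.
classifier = nn.Linear(2048, 1000)       # 2048-dim ResNet-50 features -> 1000 classes
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():                # features come from the frozen backbone
        feats = backbone(images)
    logits = classifier(feats)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```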
link |
00:44:51.340
People are making very, very fast progress
link |
00:44:53.460
using self supervised learning
link |
00:44:55.300
for this kind of scenario as well.
link |
00:44:58.580
And my guess is that that's gonna be the future.
link |
00:45:02.500
For self supervised learning,
link |
00:45:03.660
how much cleaning do you think is needed
link |
00:45:06.800
for filtering malicious signal or what's a better term?
link |
00:45:11.800
But like a lot of people use hashtags on Instagram
link |
00:45:16.760
to get like good SEO that doesn't fully represent
link |
00:45:21.200
the contents of the image.
link |
00:45:23.100
Like they'll put a picture of a cat
link |
00:45:24.520
and hashtag it with like science, awesome, fun.
link |
00:45:28.060
I don't know all kinds, why would you put science?
link |
00:45:31.200
That's not very good SEO.
link |
00:45:33.080
The way my colleagues who worked on this project
link |
00:45:34.960
at Facebook, now Meta AI, a few years ago dealt with this
link |
00:45:39.960
is that they only selected something like 17,000 tags
link |
00:45:43.760
that correspond to kind of physical things or situations,
link |
00:45:48.100
like that has some visual content.
link |
00:45:52.320
So you wouldn't have, like, #TBT or anything like that.
link |
00:45:57.120
Oh, so they keep a very select set of hashtags
link |
00:46:00.820
is what you're saying?
link |
00:46:01.660
Yeah.
link |
00:46:02.480
Okay.
link |
00:46:03.320
But it's still in the order of 10 to 20,000.
link |
00:46:06.080
So it's fairly large.
link |
00:46:07.960
Okay.
link |
00:46:09.040
Can you tell me about data augmentation?
link |
00:46:11.280
What the heck is data augmentation and how is it used
link |
00:46:14.760
maybe contrast of learning for video?
link |
00:46:19.080
What are some cool ideas here?
link |
00:46:20.880
Right, so data augmentation.
link |
00:46:22.120
I mean, first data augmentation is the idea
link |
00:46:24.520
of artificially increasing the size of your training set
link |
00:46:26.960
by distorting the images that you have
link |
00:46:30.020
in ways that don't change the nature of the image, right?
link |
00:46:32.360
So you do MNIST, you can do data augmentation on MNIST
link |
00:46:35.520
and people have done this since the 1990s, right?
link |
00:46:37.360
You take an MNIST digit and you shift it a little bit
link |
00:46:40.880
or you change the size or rotate it, skew it,
link |
00:46:45.800
you know, et cetera.
link |
00:46:47.000
Add noise.
link |
00:46:48.280
Add noise, et cetera.
link |
00:46:49.520
And it works better if you train a supervised classifier
link |
00:46:52.440
with augmented data, you're gonna get better results.
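A minimal sketch of those classic MNIST augmentations, assuming a PyTorch/torchvision setup (all parameter values are illustrative):

```python
import torch
from torchvision import datasets, transforms

# Small shifts, rescaling, rotation, skew, and added noise, as described above.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10,            # slight rotation
                            translate=(0.1, 0.1),  # shift by up to 10% of the image
                            scale=(0.9, 1.1),      # small size change
                            shear=5),              # slight skew
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.05 * torch.randn_like(x)).clamp(0.0, 1.0)),  # add noise
])

# A supervised classifier trained on this augmented stream generally does better
# than one trained on the raw digits alone.
train_set = datasets.MNIST(root="data", train=True, download=True, transform=augment)
```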
link |
00:46:55.600
Now it's become really interesting
link |
00:46:58.640
over the last couple of years
link |
00:47:00.400
because a lot of self supervised learning techniques
link |
00:47:04.160
to pre train vision systems are based on data augmentation.
link |
00:47:07.980
And the basic technique is originally inspired
link |
00:47:12.000
by techniques that I worked on in the early 90s
link |
00:47:15.840
and Jeff Hinton worked on also in the early 90s.
link |
00:47:17.720
They were sort of parallel work.
link |
00:47:20.040
I used to call this Siamese network.
link |
00:47:21.600
So basically you take two identical copies
link |
00:47:24.960
of the same network, they share the same weights
link |
00:47:27.720
and you show two different views of the same object.
link |
00:47:31.760
Either those two different views may have been obtained
link |
00:47:33.920
by data augmentation
link |
00:47:35.440
or maybe it's two different views of the same scene
link |
00:47:37.680
from a camera that you moved or at different times
link |
00:47:40.280
or something like that, right?
link |
00:47:41.400
Or two pictures of the same person, things like that.
link |
00:47:44.400
And then you train this neural net,
link |
00:47:46.480
those two identical copies of this neural net
link |
00:47:48.420
to produce an output representation, a vector
link |
00:47:52.460
in such a way that the representation for those two images
link |
00:47:56.560
are as close to each other as possible,
link |
00:47:58.880
as identical to each other as possible, right?
link |
00:48:00.840
Because you want the system
link |
00:48:02.040
to basically learn a function that will be invariant,
link |
00:48:06.120
that will not change, whose output will not change
link |
00:48:08.200
when you transform those inputs in those particular ways,
link |
00:48:12.480
right?
link |
00:48:14.080
So that's easy to do.
link |
00:48:15.680
What's complicated is how do you make sure
link |
00:48:17.720
that when you show two images that are different,
link |
00:48:19.520
the system will produce different things?
link |
00:48:21.960
Because if you don't have a specific provision for this,
link |
00:48:26.200
the system will just ignore the inputs when you train it,
link |
00:48:29.160
it will end up ignoring the input
link |
00:48:30.360
and just produce a constant vector
link |
00:48:31.740
that is the same for every input, right?
link |
00:48:33.680
That's called a collapse.
link |
00:48:35.200
Now, how do you avoid collapse?
link |
00:48:36.720
So there's two ideas.
link |
00:48:38.840
One idea that I proposed in the early 90s
link |
00:48:41.560
with my colleagues at Bell Labs,
link |
00:48:43.120
Jane Bromley and a couple of other people,
link |
00:48:46.280
which we now call contrastive learning,
link |
00:48:48.280
which is to have negative examples, right?
link |
00:48:50.020
So you have pairs of images that you know are different
link |
00:48:54.400
and you show them to the network and those two copies,
link |
00:48:57.480
and then you push the two output vectors away
link |
00:48:59.760
from each other and it will eventually guarantee
link |
00:49:02.200
that things that are semantically similar
link |
00:49:04.880
produce similar representations
link |
00:49:06.480
and things that are different
link |
00:49:07.320
produce different representations.
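A sketch of that contrastive, Siamese-style loss (not the exact formulation of the original papers; the margin value and the form of the `same` indicator are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    """z1, z2: embeddings from the two weight-sharing copies of the network.
    same: a 0/1 float tensor, 1 for a positive pair, 0 for a negative pair."""
    d = F.pairwise_distance(z1, z2)                  # distance between the two outputs
    pos = same * d.pow(2)                            # pull positive pairs together
    neg = (1.0 - same) * F.relu(margin - d).pow(2)   # push negative pairs at least `margin` apart
    return (pos + neg).mean()

# The two "copies" share weights simply by being the same module applied twice:
# z1, z2 = encoder(x1), encoder(x2)
# loss = contrastive_loss(z1, z2, same)
```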
link |
00:49:10.280
We actually came up with this idea
link |
00:49:11.440
for a project of doing signature verification.
link |
00:49:14.480
So we would collect signatures from,
link |
00:49:18.400
like multiple signatures on the same person
link |
00:49:20.160
and then train a neural net to produce the same representation
link |
00:49:23.280
and then force the system to produce different
link |
00:49:27.880
representation for different signatures.
link |
00:49:31.000
This was actually, the problem was proposed by people
link |
00:49:33.460
from what was a subsidiary of AT&T at the time called NCR.
link |
00:49:38.240
And they were interested in storing
link |
00:49:40.360
representation of the signature on the 80 bytes
link |
00:49:43.500
of the magnetic strip of a credit card.
link |
00:49:46.640
So we came up with this idea of having a neural net
link |
00:49:48.800
with 80 outputs that we would quantize on bytes
link |
00:49:52.280
so that we could encode the signature.
link |
00:49:53.840
And that encoding was then used to compare
link |
00:49:55.440
whether the signature matches or not.
link |
00:49:57.080
That's right.
link |
00:49:57.920
So then you would sign, you would run through the neural net
link |
00:50:00.640
and then you would compare the output vector
link |
00:50:02.400
to whatever is stored on your card.
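As a purely illustrative sketch of that encode-and-compare scheme (the quantization scale and matching threshold are made-up assumptions, not the original system):

```python
import numpy as np

def quantize_to_bytes(embedding):
    """Quantize an 80-dimensional network output to 80 bytes, assuming the
    outputs have been squashed into [-1, 1]."""
    q = np.clip((embedding + 1.0) * 127.5, 0, 255)
    return q.astype(np.uint8)

def signature_matches(stored_bytes, live_embedding, threshold=20.0):
    """Compare a freshly computed embedding to the 80 bytes stored on the card."""
    live_bytes = quantize_to_bytes(live_embedding)
    dist = np.linalg.norm(stored_bytes.astype(np.float32) - live_bytes.astype(np.float32))
    return dist < threshold  # threshold is an illustrative value
```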
link |
00:50:03.240
Did it actually work?
link |
00:50:04.640
It worked, but they ended up not using it.
link |
00:50:08.940
Because nobody cares actually.
link |
00:50:10.120
I mean, the American financial payment system
link |
00:50:13.800
is incredibly lax in that respect compared to Europe.
link |
00:50:17.560
Oh, with the signatures?
link |
00:50:18.960
What's the purpose of signatures anyway?
link |
00:50:20.520
This is very different.
link |
00:50:21.360
Nobody looks at them, nobody cares.
link |
00:50:23.280
It's, yeah.
link |
00:50:24.440
Yeah, no, so that's contrastive learning, right?
link |
00:50:27.840
So you need positive and negative pairs.
link |
00:50:29.440
And the problem with that is that,
link |
00:50:31.760
even though I had the original paper on this,
link |
00:50:34.760
I'm actually not very positive about it
link |
00:50:36.800
because it doesn't work in high dimension.
link |
00:50:38.640
If your representation is high dimensional,
link |
00:50:41.040
there's just too many ways for two things to be different.
link |
00:50:44.300
And so you would need lots and lots
link |
00:50:45.960
and lots of negative pairs.
link |
00:50:48.260
So there is a particular implementation of this,
link |
00:50:50.800
which is relatively recent from actually
link |
00:50:52.840
the Google Toronto group where, you know,
link |
00:50:56.040
Jeff Hinton is the senior member there.
link |
00:50:58.800
It's called SimCLR, S I M C L R.
link |
00:51:02.000
And it, you know, basically a particular way
link |
00:51:03.720
of implementing this idea of contrastive learning,
link |
00:51:06.760
the particular objective function.
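For reference, a compact sketch of that kind of in-batch contrastive objective (an NT-Xent-style loss; the temperature and the absence of a projection head are simplifications, not the published recipe):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: [N, D] embeddings of two augmented views of the same N images.
    Every other image in the batch serves as a negative."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D], unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # exclude self-similarity
    # the positive for row i is row i + n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```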
link |
00:51:08.600
Now, what I'm much more enthusiastic about these days
link |
00:51:13.160
is non contrastive methods.
link |
00:51:14.600
So other ways to guarantee that the representations
link |
00:51:19.600
would be different for different inputs.
link |
00:51:24.200
And it's actually based on an idea that Jeff Hinton
link |
00:51:28.320
proposed in the early nineties with his student
link |
00:51:30.360
at the time, Sue Becker.
link |
00:51:31.960
And it's based on the idea of maximizing
link |
00:51:33.440
the mutual information between the outputs
link |
00:51:35.000
of the two systems.
link |
00:51:36.200
You only show positive pairs.
link |
00:51:37.480
You only show pairs of images that you know
link |
00:51:39.160
are somewhat similar.
link |
00:51:41.640
And you train the two networks to be informative,
link |
00:51:44.200
but also to be as informative of each other as possible.
link |
00:51:48.880
So basically one representation has to be predictable
link |
00:51:51.400
from the other, essentially.
link |
00:51:54.520
And, you know, he proposed that idea,
link |
00:51:56.400
had, you know, a couple of papers in the early nineties,
link |
00:51:59.440
and then nothing was done about it for decades.
link |
00:52:02.280
And I kind of revived this idea together
link |
00:52:04.360
with my postdocs at FAIR,
link |
00:52:07.480
particularly a postdoc called Stéphane Deny,
link |
00:52:08.920
who is now a junior professor in Finland
link |
00:52:11.800
at Aalto University.
link |
00:52:13.240
We came up with something that we call Barlow Twins.
link |
00:52:18.240
And it's a particular way of maximizing
link |
00:52:20.520
the information content of a vector,
link |
00:52:24.240
you know, using some hypotheses.
link |
00:52:27.920
And we have kind of another version of it
link |
00:52:30.920
that's more recent now called VICReg, V I C R E G.
link |
00:52:33.480
That means Variance, Invariance, Covariance,
link |
00:52:35.960
Regularization.
link |
00:52:36.800
And it's the thing I'm the most excited about
link |
00:52:38.840
in machine learning in the last 15 years.
link |
00:52:40.600
I mean, I'm not, I'm really, really excited about this.
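A sketch of a VICReg-style objective, with the three terms spelled out (the loss weights and epsilon are illustrative, not necessarily the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """z1, z2: [N, D] embeddings of two views of the same batch of images."""
    n, d = z1.shape

    # Invariance: two views of the same image should map to the same point.
    inv = F.mse_loss(z1, z2)

    # Variance: keep the std of every embedding dimension above 1 (prevents collapse).
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # Covariance: decorrelate the embedding dimensions (off-diagonal terms -> 0).
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov1 = (z1c.t() @ z1c) / (n - 1)
    cov2 = (z2c.t() @ z2c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = off_diag(cov1).pow(2).sum() / d + off_diag(cov2).pow(2).sum() / d

    return inv_w * inv + var_w * var + cov_w * cov
```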
link |
00:52:43.360
What kind of data augmentation is useful
link |
00:52:46.400
for that noncontrastive learning method?
link |
00:52:49.280
Are we talking about, does that not matter that much?
link |
00:52:51.680
Or it seems like a very important part of the step.
link |
00:52:55.040
Yeah.
link |
00:52:55.880
How you generate the images that are similar,
link |
00:52:57.120
but sufficiently different.
link |
00:52:58.680
Yeah, that's right.
link |
00:52:59.520
It's an important step and it's also an annoying step
link |
00:53:01.440
because you need to have that knowledge
link |
00:53:02.840
of what data augmentation you can do
link |
00:53:05.840
that do not change the nature of the object.
link |
00:53:09.320
And so the standard scenario,
link |
00:53:12.280
which a lot of people working in this area are using
link |
00:53:14.520
is you use the type of distortion.
link |
00:53:18.720
So basically you do a geometric distortion.
link |
00:53:21.160
So one basically just shifts the image a little bit,
link |
00:53:23.360
it's called cropping.
link |
00:53:24.400
Another one kind of changes the scale a little bit.
link |
00:53:26.880
Another one kind of rotates it.
link |
00:53:28.240
Another one changes the colors.
link |
00:53:30.000
You can do a shift in color balance
link |
00:53:32.040
or something like that, saturation.
link |
00:53:34.880
Another one sort of blurs it.
link |
00:53:36.240
Another one adds noise.
link |
00:53:37.080
So you have like a catalog of kind of standard things
link |
00:53:40.040
and people try to use the same ones
link |
00:53:42.120
for different algorithms so that they can compare.
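A sketch of that standard augmentation catalog as a torchvision pipeline (the particular operations and parameter values are an illustrative subset, not a canonical recipe):

```python
from torchvision import transforms

ssl_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),   # crop / shift / rescale
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color balance, saturation
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=23)], p=0.5),     # blur
    transforms.ToTensor(),
])

# Each training image is distorted twice to produce the two "views"
# fed to the two copies of the network:
# view1, view2 = ssl_augment(img), ssl_augment(img)
```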
link |
00:53:44.960
But some algorithms, some self supervised algorithm
link |
00:53:47.200
actually can deal with much bigger,
link |
00:53:49.600
like more aggressive data augmentation and some don't.
link |
00:53:52.480
So that kind of makes the whole thing difficult.
link |
00:53:55.400
But that's the kind of distortions we're talking about.
link |
00:53:57.760
And so you train with those distortions
link |
00:54:02.520
and then you chop off the last layer, a couple layers
link |
00:54:07.400
of the network and you use the representation
link |
00:54:11.480
as input to a classifier.
link |
00:54:12.680
You train the classifier on ImageNet, let's say,
link |
00:54:16.680
or whatever, and measure the performance.
link |
00:54:19.600
And interestingly enough, the methods that are really good
link |
00:54:23.520
at eliminating the information that is irrelevant,
link |
00:54:25.960
which is the distortions between those images,
link |
00:54:29.200
do a good job at eliminating it.
link |
00:54:31.480
And as a consequence, you cannot use the representations
link |
00:54:36.480
in those systems for things like object detection
link |
00:54:39.080
and localization because that information is gone.
link |
00:54:41.480
So the type of data augmentation you need to do
link |
00:54:44.760
depends on the tasks you want eventually the system
link |
00:54:47.720
to solve and the type of data augmentation,
link |
00:54:50.680
standard data augmentation that we use today
link |
00:54:52.560
are only appropriate for object recognition
link |
00:54:54.720
or image classification.
link |
00:54:56.040
They're not appropriate for things like.
link |
00:54:57.760
Can you help me understand why it can't be used for localization?
link |
00:55:00.800
So you're saying it's just not good at the negative,
link |
00:55:03.760
like at classifying the negative,
link |
00:55:05.440
so that's why it can't be used for the localization?
link |
00:55:07.920
No, it's just that you train the system,
link |
00:55:10.360
you give it an image and then you give it the same image
link |
00:55:13.560
shifted and scaled and you tell it that's the same image.
link |
00:55:17.400
So the system basically is trained
link |
00:55:19.160
to eliminate the information about position and size.
link |
00:55:22.040
So now you want to use that to figure out
link |
00:55:26.200
where an object is and what size it is.
link |
00:55:27.760
Like a bounding box, like they'd be able to actually.
link |
00:55:30.040
Okay, it can still find the object in the image,
link |
00:55:34.160
it's just not very good at finding
link |
00:55:35.960
the exact boundaries of that object, interesting.
link |
00:55:38.960
Interesting, which that's an interesting
link |
00:55:42.040
sort of philosophical question,
link |
00:55:43.480
how important is object localization anyway?
link |
00:55:46.800
We're like obsessed by measuring image segmentation,
link |
00:55:51.240
obsessed by measuring perfectly knowing
link |
00:55:53.420
the boundaries of objects when arguably
link |
00:55:56.760
that's not that essential to understanding
link |
00:56:01.840
what are the contents of the scene.
link |
00:56:03.800
On the other hand, I think evolutionarily,
link |
00:56:05.880
the first vision systems in animals
link |
00:56:08.200
were basically all about localization,
link |
00:56:10.040
very little about recognition.
link |
00:56:12.480
And in the human brain, you have two separate pathways
link |
00:56:15.320
for recognizing the nature of a scene or an object
link |
00:56:20.880
and localizing objects.
link |
00:56:22.320
So you use the first pathway, called the ventral pathway,
link |
00:56:25.200
for telling what you're looking at.
link |
00:56:29.140
The other pathway, the dorsal pathway,
link |
00:56:30.560
is used for navigation, for grasping, for everything else.
link |
00:56:34.120
And basically a lot of the things you need for survival
link |
00:56:36.920
are localization and detection.
link |
00:56:41.880
Is similarity learning or contrastive learning,
link |
00:56:45.080
are these non contrastive methods
link |
00:56:46.520
the same as understanding something?
link |
00:56:48.880
Just because you know a distorted cat
link |
00:56:50.680
is the same as a non distorted cat,
link |
00:56:52.600
does that mean you understand what it means to be a cat?
link |
00:56:56.760
To some extent.
link |
00:56:57.600
I mean, it's a superficial understanding, obviously.
link |
00:57:00.120
But what is the ceiling of this method, do you think?
link |
00:57:02.360
Is this just one trick on the path
link |
00:57:05.120
to doing self supervised learning?
link |
00:57:07.320
Can we go really, really far?
link |
00:57:10.040
I think we can go really far.
link |
00:57:11.280
So if we figure out how to use techniques of that type,
link |
00:57:16.400
perhaps very different, but the same nature,
link |
00:57:19.480
to train a system from video to do video prediction,
link |
00:57:23.360
essentially, I think we'll have a path towards,
link |
00:57:30.440
I wouldn't say unlimited, but a path towards some level
link |
00:57:33.520
of physical common sense in machines.
link |
00:57:38.120
And I also think that that ability to learn
link |
00:57:44.440
how the world works from a sort of high throughput channel
link |
00:57:47.720
like vision is a necessary step towards
link |
00:57:53.520
sort of real artificial intelligence.
link |
00:57:55.560
In other words, I believe in grounded intelligence.
link |
00:57:58.080
I don't think we can train a machine
link |
00:57:59.920
to be intelligent purely from text.
link |
00:58:02.200
Because I think the amount of information about the world
link |
00:58:04.960
that's contained in text is tiny compared
link |
00:58:07.680
to what we need to know.
link |
00:58:11.600
So for example, and people have attempted to do this
link |
00:58:15.320
for 30 years, the Cyc project and things like that,
link |
00:58:18.920
basically kind of writing down all the facts that are known
link |
00:58:21.160
and hoping that some sort of common sense will emerge.
link |
00:58:25.240
I think it's basically hopeless.
link |
00:58:27.160
But let me take an example.
link |
00:58:28.320
You take an object, I describe a situation to you.
link |
00:58:31.280
I take an object, I put it on the table
link |
00:58:33.560
and I push the table.
link |
00:58:34.960
It's completely obvious to you that the object
link |
00:58:37.240
will be pushed with the table,
link |
00:58:39.240
because it's sitting on it.
link |
00:58:41.840
There's no text in the world, I believe, that explains this.
link |
00:58:45.040
And so if you train a machine as powerful as it could be,
link |
00:58:49.040
your GPT 5000 or whatever it is,
link |
00:58:53.920
it's never gonna learn about this.
link |
00:58:57.040
That information is just not present in any text.
link |
00:59:01.040
Well, the question, like with the Cyc project,
link |
00:59:03.280
the dream I think is to have like 10 million,
link |
00:59:08.000
say facts like that, that give you a headstart,
link |
00:59:13.000
like a parent guiding you.
link |
00:59:15.200
Now, we humans don't need a parent to tell us
link |
00:59:17.280
that the table will move, sorry,
link |
00:59:19.240
the smartphone will move with the table.
link |
00:59:21.440
But we get a lot of guidance in other ways.
link |
00:59:25.640
So it's possible that we can give it a quick shortcut.
link |
00:59:28.160
What about a cat?
link |
00:59:29.200
The cat knows that.
link |
00:59:30.800
No, but they evolved, so.
link |
00:59:33.120
No, they learn like us.
link |
00:59:35.840
Sorry, the physics of stuff?
link |
00:59:37.080
Yeah.
link |
00:59:38.480
Well, yeah, so you're saying it's,
link |
00:59:41.360
so you're putting a lot of intelligence
link |
00:59:45.080
onto the nurture side, not the nature.
link |
00:59:47.120
Yes.
link |
00:59:47.960
We seem to have, you know,
link |
00:59:50.000
there's a very inefficient arguably process of evolution
link |
00:59:53.640
that got us from bacteria to who we are today.
link |
00:59:57.840
Started at the bottom, now we're here.
link |
00:59:59.800
So the question is how, okay,
link |
01:00:04.240
the question is how fundamental is that,
link |
01:00:06.000
the nature of the whole hardware?
link |
01:00:08.400
And then is there any way to shortcut it
link |
01:00:11.680
if it's fundamental?
link |
01:00:12.520
If it's not, if it's most of intelligence,
link |
01:00:14.280
most of the cool stuff we've been talking about
link |
01:00:15.920
is mostly nurture, mostly trained.
link |
01:00:18.800
We figure it out by observing the world.
link |
01:00:20.680
We can form that big, beautiful, sexy background model
link |
01:00:24.760
that you're talking about just by sitting there.
link |
01:00:28.880
Then, okay, then you need to, then like maybe,
link |
01:00:34.800
it is all supervised learning all the way down.
link |
01:00:37.840
Self supervised learning, say.
link |
01:00:39.000
Whatever it is that makes, you know,
link |
01:00:41.360
human intelligence different from other animals,
link |
01:00:44.080
which, you know, a lot of people think is language
link |
01:00:46.320
and logical reasoning and this kind of stuff.
link |
01:00:48.720
It cannot be that complicated because it only popped up
link |
01:00:51.000
in the last million years.
link |
01:00:52.840
Yeah.
link |
01:00:54.320
And, you know, it only involves, you know,
link |
01:00:57.840
less than 1% of our genome might be,
link |
01:00:59.640
which is the difference between human genome
link |
01:01:01.200
and chimps or whatever.
link |
01:01:03.360
So it can't be that complicated.
link |
01:01:06.640
You know, it can't be that fundamental.
link |
01:01:08.040
I mean, most of the complicated stuff
link |
01:01:10.880
already exists in cats and dogs and, you know,
link |
01:01:13.640
certainly primates, nonhuman primates.
link |
01:01:17.120
Yeah, that little thing with humans
link |
01:01:18.640
might be just something about social interaction
link |
01:01:22.480
and ability to maintain ideas
link |
01:01:24.000
across like a collective of people.
link |
01:01:28.160
It sounds very dramatic and very impressive,
link |
01:01:30.840
but it probably isn't mechanistically speaking.
link |
01:01:33.400
It is, but we're not there yet.
link |
01:01:34.680
Like, you know, we have, I mean, this is number 634,
link |
01:01:39.480
you know, in the list of problems we have to solve.
link |
01:01:43.400
So basic physics of the world is number one.
link |
01:01:46.880
What do you, just a quick tangent on data augmentation.
link |
01:01:51.600
So a lot of it is hard coded versus learned.
link |
01:01:57.920
Do you have any intuition that maybe
link |
01:02:00.960
there could be some weird data augmentation,
link |
01:02:03.600
like generative type of data augmentation,
link |
01:02:06.200
like doing something weird to images,
link |
01:02:07.680
which then improves the similarity learning process?
link |
01:02:13.120
So not just kind of dumb, simple distortions,
link |
01:02:16.280
but by you shaking your head,
link |
01:02:18.120
just saying that even simple distortions are enough.
link |
01:02:20.880
I think, no, I think data augmentation
link |
01:02:22.800
is a temporary necessary evil.
link |
01:02:26.480
So what people are working on now is two things.
link |
01:02:28.880
One is the type of self supervised learning,
link |
01:02:32.960
like trying to translate the type of self supervised learning
link |
01:02:35.480
people use in language, translating these two images,
link |
01:02:38.680
which is basically a denoising autoencoder method, right?
link |
01:02:41.800
So you take an image, you block, you mask some parts of it,
link |
01:02:47.320
and then you train some giant neural net
link |
01:02:49.520
to reconstruct the parts that are missing.
link |
01:02:52.640
And until very recently,
link |
01:02:56.200
there was no working methods for that.
link |
01:02:59.160
All the autoencoder type methods for images
link |
01:03:01.600
weren't producing very good representation,
link |
01:03:03.720
but there's a paper now coming out of the FAIR group
link |
01:03:06.600
at Menlo Park that actually works very well.
link |
01:03:08.960
So that doesn't require data augmentation,
link |
01:03:12.120
that requires only masking, okay.
link |
01:03:15.000
Only masking for images, okay.
link |
01:03:18.640
Right, so you mask part of the image
link |
01:03:20.280
and you train a system, which in this case is a transformer
link |
01:03:24.560
because the transformer represents the image
link |
01:03:28.400
as non overlapping patches,
link |
01:03:30.880
so it's easy to mask patches and things like that.
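A minimal sketch of that masked-patch reconstruction setup (the `model` is a hypothetical transformer that sees the patches and the mask; the 75% masking ratio and the shared per-batch mask are simplifications):

```python
import torch

def patchify(imgs, p=16):
    """Split [B, C, H, W] images into non-overlapping p x p patches -> [B, N, C*p*p]."""
    b, c, h, w = imgs.shape
    x = imgs.unfold(2, p, p).unfold(3, p, p)            # [B, C, H/p, W/p, p, p]
    return x.permute(0, 2, 3, 1, 4, 5).reshape(b, (h // p) * (w // p), c * p * p)

def random_mask(n_patches, ratio=0.75, device="cpu"):
    """Pick a random subset of patches to hide."""
    noise = torch.rand(n_patches, device=device)
    return noise < torch.quantile(noise, ratio)         # True = masked

def masked_reconstruction_loss(model, imgs, p=16, ratio=0.75):
    patches = patchify(imgs, p)                         # [B, N, D]
    mask = random_mask(patches.shape[1], ratio, imgs.device)
    pred = model(patches, mask)                         # model reconstructs all patches
    return (pred - patches)[:, mask].pow(2).mean()      # loss only on the hidden ones
```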
link |
01:03:33.320
Okay, but then my question transfers to that problem,
link |
01:03:35.680
the masking, like why should the mask be square or rectangle?
link |
01:03:40.080
So it doesn't matter, like, you know,
link |
01:03:41.600
I think we're gonna come up probably in the future
link |
01:03:44.360
with sort of ways to mask that are kind of random,
link |
01:03:50.480
essentially, I mean, they are random already, but.
link |
01:03:52.920
No, no, but like something that's challenging,
link |
01:03:56.800
like optimally challenging.
link |
01:03:59.400
So like, I mean, maybe it's a metaphor that doesn't apply,
link |
01:04:02.440
but you're, it seems like there's a data augmentation
link |
01:04:06.400
or masking, there's an interactive element with it.
link |
01:04:09.880
Like you're almost like playing with an image.
link |
01:04:12.560
And like, it's like the way we play with an image
link |
01:04:14.720
in our minds.
link |
01:04:15.680
No, but it's like dropout.
link |
01:04:16.680
It's like Boltzmann machine training.
link |
01:04:18.160
You, you know, every time you see a percept,
link |
01:04:23.200
you also, you can perturb it in some way.
link |
01:04:26.840
And then the principle of the training procedure
link |
01:04:31.520
is to minimize the difference of the output
link |
01:04:33.600
or the representation between the clean version
link |
01:04:36.920
and the corrupted version, essentially, right?
link |
01:04:40.280
And you can do this in real time, right?
link |
01:04:42.000
So, you know, Boltzmann machines work like this, right?
link |
01:04:44.240
You show a percept, you tell the machine
link |
01:04:47.400
that's a good combination of activities
link |
01:04:49.840
or your input neurons.
link |
01:04:50.880
And then you either let them go their merry way
link |
01:04:56.560
without clamping them to values,
link |
01:04:58.960
or you only do this with a subset.
link |
01:05:01.120
And what you're doing is you're training the system
link |
01:05:03.520
so that the stable state of the entire network
link |
01:05:07.000
is the same regardless of whether it sees
link |
01:05:08.920
the entire input or whether it sees only part of it.
link |
01:05:12.880
You know, denoising autoencoder method
link |
01:05:14.360
is basically the same thing, right?
link |
01:05:15.880
You're training a system to reproduce the input,
link |
01:05:18.600
the complete inputs and filling the input
link |
01:05:20.480
and filling the blanks, regardless of which parts
link |
01:05:23.400
are missing, and that's really the underlying principle.
link |
01:05:26.280
And you could imagine sort of, even in the brain,
link |
01:05:28.320
some sort of neural principle where, you know,
link |
01:05:30.720
neurons kind of oscillate, right?
link |
01:05:32.800
So they take their activity and then temporarily
link |
01:05:35.520
they kind of shut off to, you know,
link |
01:05:38.040
force the rest of the system to basically reconstruct
link |
01:05:42.120
the input without their help, you know?
link |
01:05:44.800
And, I mean, you could imagine, you know,
link |
01:05:49.040
more or less biologically possible processes.
link |
01:05:51.040
Something like that.
link |
01:05:51.880
And I guess with this denoising autoencoder
link |
01:05:54.960
and masking and data augmentation,
link |
01:05:58.720
you don't have to worry about being super efficient.
link |
01:06:01.160
You could just do as much as you want
link |
01:06:03.960
and get better over time.
link |
01:06:06.160
Because I was thinking, like, you might want to be clever
link |
01:06:08.800
about the way you do all these procedures, you know,
link |
01:06:12.000
but that's only if it's somehow costly to do every iteration,
link |
01:06:16.720
but it's not really.
link |
01:06:17.960
Not really.
link |
01:06:19.280
Maybe.
link |
01:06:20.280
And then there is, you know,
link |
01:06:21.480
data augmentation without explicit data augmentation.
link |
01:06:24.160
This is data augmentation by waiting,
link |
01:06:25.600
which is, you know, the sort of video prediction.
link |
01:06:29.320
You're observing a video clip,
link |
01:06:31.480
observing the, you know, the continuation of that video clip.
link |
01:06:36.400
You try to learn a representation
link |
01:06:38.040
using dual joint embedding architectures
link |
01:06:40.240
in such a way that the representation of the future clip
link |
01:06:43.280
is easily predictable from the representation
link |
01:06:45.680
of the observed clip.
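A sketch of that joint-embedding predictive idea for video (this is not a published architecture; the encoder, the predictor shape, and the stop-gradient trick for avoiding collapse are all illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingPredictor(nn.Module):
    def __init__(self, encoder, dim=512):
        super().__init__()
        self.encoder = encoder                      # any video encoder (hypothetical)
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, observed_clip, future_clip):
        z_obs = self.encoder(observed_clip)         # representation of what was seen
        with torch.no_grad():                       # stop-gradient on the target branch,
            z_fut = self.encoder(future_clip)       # one simple way to avoid collapse
        pred = self.predictor(z_obs)
        return F.mse_loss(pred, z_fut)              # prediction happens in representation space
```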
link |
01:06:48.600
Do you think YouTube has enough raw data
link |
01:06:52.720
from which to learn how to be a cat?
link |
01:06:56.400
I think so.
link |
01:06:57.760
So the amount of data is not the constraint.
link |
01:07:01.200
No, it would require some selection, I think.
link |
01:07:04.120
Some selection?
link |
01:07:05.400
Some selection of, you know, maybe the right type of data.
link |
01:07:08.480
You need some.
link |
01:07:09.320
Don't go down the rabbit hole of just cat videos.
link |
01:07:11.400
You might need to watch some lectures or something.
link |
01:07:14.600
No, you wouldn't.
link |
01:07:15.720
How meta would that be
link |
01:07:17.480
if it like watches lectures about intelligence
link |
01:07:21.400
and then learns,
link |
01:07:22.240
watches your lectures in NYU
link |
01:07:24.320
and learns from that how to be intelligent?
link |
01:07:26.280
I don't think that would be enough.
link |
01:07:30.080
What's your, do you find multimodal learning interesting?
link |
01:07:33.240
We've been talking about visual language,
link |
01:07:35.080
like combining those together,
link |
01:07:36.440
maybe audio, all those kinds of things.
link |
01:07:38.120
There's a lot of things that I find interesting
link |
01:07:40.400
in the short term,
link |
01:07:41.240
but are not addressing the important problem
link |
01:07:44.080
that I think are really kind of the big challenges.
link |
01:07:46.600
So I think, you know, things like multitask learning,
link |
01:07:48.920
continual learning, you know, adversarial issues.
link |
01:07:54.360
I mean, those have great practical interests
link |
01:07:57.000
in the relatively short term, possibly,
link |
01:08:00.280
but I don't think they're fundamental.
link |
01:08:01.240
You know, active learning,
link |
01:08:02.600
even to some extent, reinforcement learning.
link |
01:08:04.360
I think those things will become either obsolete
link |
01:08:07.920
or useless or easy
link |
01:08:10.800
once we figured out how to do self supervised
link |
01:08:14.880
representation learning
link |
01:08:15.880
or learning predictive world models.
link |
01:08:19.280
And so I think that's what, you know,
link |
01:08:21.480
the entire community should be focusing on.
link |
01:08:24.400
At least people who are interested
link |
01:08:25.400
in sort of fundamental questions
link |
01:08:26.680
or, you know, really kind of pushing the envelope
link |
01:08:28.440
of AI towards the next stage.
link |
01:08:31.440
But of course, there's like a huge amount of,
link |
01:08:33.080
you know, very interesting work to do
link |
01:08:34.360
in sort of practical questions
link |
01:08:35.840
that have, you know, short term impact.
link |
01:08:38.000
Well, you know, it's difficult to talk about
link |
01:08:41.240
the temporal scale,
link |
01:08:42.200
because all of human civilization
link |
01:08:44.240
will eventually be destroyed
link |
01:08:45.400
because the sun will die out.
link |
01:08:48.520
And even if Elon Musk is successful
link |
01:08:50.280
in multi planetary colonization across the galaxy,
link |
01:08:54.520
eventually the entirety of it
link |
01:08:56.560
will just become giant black holes.
link |
01:08:58.920
And that's gonna take a while though.
link |
01:09:02.120
So, but what I'm saying is then that logic
link |
01:09:04.800
can be used to say it's all meaningless.
link |
01:09:07.360
I'm saying all that to say that multitask learning
link |
01:09:11.840
might be, you're calling it practical
link |
01:09:15.400
or pragmatic or whatever.
link |
01:09:17.280
That might be the thing that achieves something
link |
01:09:19.440
very akin to intelligence
link |
01:09:22.560
while we're trying to solve the more general problem
link |
01:09:26.880
of self supervised learning of background knowledge.
link |
01:09:29.400
So the reason I bring that up,
link |
01:09:30.640
maybe one way to ask that question.
link |
01:09:33.040
I've been very impressed
link |
01:09:34.000
by what the Tesla Autopilot team is doing.
link |
01:09:36.440
I don't know if you've gotten a chance to glance
link |
01:09:38.320
at this particular one example of multitask learning,
link |
01:09:42.080
where they're literally taking the problem,
link |
01:09:44.960
like, I don't know, Charles Darwin studying animals.
link |
01:09:48.880
They're studying the problem of driving
link |
01:09:51.600
and asking, okay, what are all the things
link |
01:09:53.320
you have to perceive?
link |
01:09:55.000
And the way they're solving it is one,
link |
01:09:57.800
there's an ontology where you're bringing that to the table.
link |
01:10:00.400
So you're formulating a bunch of different tasks.
link |
01:10:02.240
It's like over a hundred tasks or something like that
link |
01:10:04.240
that they're involved in driving.
link |
01:10:06.040
And then they're deploying it
link |
01:10:07.720
and then getting data back from people that run into trouble
link |
01:10:10.480
and they're trying to figure out, do we add tasks?
link |
01:10:12.680
Do we, like, we focus on each individual task separately?
link |
01:10:16.040
In fact, I would say,
link |
01:10:18.280
I would classify Andrej Karpathy's talk in two ways.
link |
01:10:20.680
So one was about doors
link |
01:10:22.360
and the other one about how much ImageNet sucks.
link |
01:10:24.720
He kept going back and forth on those two topics,
link |
01:10:28.560
which ImageNet sucks,
link |
01:10:30.000
meaning you can't just use a single benchmark.
link |
01:10:33.040
There's so, like, you have to have like a giant suite
link |
01:10:37.240
of benchmarks to understand how well your system actually works.
link |
01:10:39.880
Oh, I agree with him.
link |
01:10:40.720
I mean, he's a very sensible guy.
link |
01:10:43.880
Now, okay, it's very clear that if you're faced
link |
01:10:47.560
with an engineering problem that you need to solve
link |
01:10:50.480
in a relatively short time,
link |
01:10:51.920
particularly if you have Elon Musk breathing down your neck,
link |
01:10:55.880
you're going to have to take shortcuts, right?
link |
01:10:58.640
You might think about the fact that the right thing to do
link |
01:11:02.560
and the longterm solution involves, you know,
link |
01:11:04.520
some fancy self supervised learning,
link |
01:11:06.560
but you have, you know, Elon Musk breathing down your neck
link |
01:11:10.240
and, you know, this involves, you know, human lives.
link |
01:11:13.600
And so you have to basically just do
link |
01:11:17.320
the systematic engineering and, you know,
link |
01:11:22.000
fine tuning and refinements
link |
01:11:23.280
and trial and error and all that stuff.
link |
01:11:26.360
There's nothing wrong with that.
link |
01:11:27.400
That's called engineering.
link |
01:11:28.600
That's called, you know, putting technology out in the world.
link |
01:11:35.840
And you have to kind of ironclad it before you do this,
link |
01:11:39.880
you know, so much for, you know,
link |
01:11:44.520
grand ideas and principles.
link |
01:11:48.280
But, you know, I'm placing myself sort of, you know,
link |
01:11:50.720
some, you know, upstream of this, you know,
link |
01:11:54.480
quite a bit upstream of this.
link |
01:11:55.760
You're a Plato, think about platonic forms.
link |
01:11:58.240
You're not platonic because eventually
link |
01:12:01.320
I want that stuff to get used,
link |
01:12:03.120
but it's okay if it takes five or 10 years
link |
01:12:06.920
for the community to realize this is the right thing to do.
link |
01:12:09.720
I've done this before.
link |
01:12:11.280
It's been the case before that, you know,
link |
01:12:13.240
I've made that case.
link |
01:12:14.440
I mean, if you look back in the mid 2000, for example,
link |
01:12:17.760
and you ask yourself the question, okay,
link |
01:12:19.320
I want to recognize cars or faces or whatever,
link |
01:12:24.360
you know, I can use a convolutional net,
link |
01:12:25.560
or I can use sort of more conventional
link |
01:12:28.360
kind of computer vision techniques, you know,
link |
01:12:29.880
using interest point detectors or dense SIFT features
link |
01:12:33.760
and, you know, sticking an SVM on top.
link |
01:12:35.760
At that time, the datasets were so small
link |
01:12:37.800
that those methods that use more hand engineering
link |
01:12:41.920
worked better than ConvNets.
link |
01:12:43.560
It was just not enough data for ConvNets
link |
01:12:45.560
and ConvNets were a little slow with the kind of hardware
link |
01:12:48.880
that was available at the time.
link |
01:12:50.840
And there was a sea change when, basically,
link |
01:12:53.880
when, you know, datasets became bigger
link |
01:12:56.680
and GPUs became available.
link |
01:12:58.600
That's what, you know, two of the main factors
link |
01:13:02.960
that basically made people change their mind.
link |
01:13:07.880
And you can look at the history of,
link |
01:13:11.880
like, all sub branches of AI or pattern recognition.
link |
01:13:16.400
And there's a similar trajectory followed by techniques
link |
01:13:19.800
where people start by, you know, engineering the hell out of it.
link |
01:13:25.200
You know, be it optical character recognition,
link |
01:13:29.200
speech recognition, computer vision,
link |
01:13:31.760
like image recognition in general,
link |
01:13:34.280
natural language understanding, like, you know, translation,
link |
01:13:37.280
things like that, right?
link |
01:13:38.000
You start to engineer the hell out of it.
link |
01:13:41.040
You start to acquire all the knowledge,
link |
01:13:42.680
the prior knowledge you know about image formation,
link |
01:13:44.760
about, you know, the shape of characters,
link |
01:13:46.600
about, you know, morphological operations,
link |
01:13:49.560
about, like, feature extraction, Fourier transforms,
link |
01:13:52.400
you know, Zernike moments, you know, whatever, right?
link |
01:13:54.440
People have come up with thousands of ways
link |
01:13:56.280
of representing images
link |
01:13:57.680
so that they could be easily classified afterwards.
link |
01:14:01.600
Same for speech recognition, right?
link |
01:14:03.000
There is, you know, it took decades
link |
01:14:04.640
for people to figure out a good front end
link |
01:14:06.920
to preprocess speech signals
link |
01:14:09.680
so that, you know, all the information
link |
01:14:11.120
about what is being said is preserved,
link |
01:14:13.400
but most of the information
link |
01:14:14.440
about the identity of the speaker is gone.
link |
01:14:16.920
You know, cepstral coefficients or whatever, right?
link |
01:14:20.880
And same for text, right?
link |
01:14:23.400
You do named entity recognition and you parse
link |
01:14:26.440
and you do tagging of the parts of speech
link |
01:14:31.800
and, you know, you do this sort of tree representation
link |
01:14:34.480
of clauses and all that stuff, right?
link |
01:14:36.480
Before you can do anything.
link |
01:14:40.720
So that's how it starts, right?
link |
01:14:43.520
Just engineer the hell out of it.
link |
01:14:45.160
And then you start having data
link |
01:14:47.920
and maybe you have more powerful computers.
link |
01:14:50.160
Maybe you know something about statistical learning.
link |
01:14:52.400
So you start using machine learning
link |
01:14:53.640
and it's usually a small sliver
link |
01:14:54.840
on top of your kind of handcrafted system
link |
01:14:56.800
where, you know, you extract features by hand.
link |
01:14:59.560
Okay, and now, you know, nowadays the standard way
link |
01:15:02.280
of doing this is that you train the entire thing end to end
link |
01:15:04.320
with a deep learning system and it learns its own features
link |
01:15:06.720
and, you know, speech recognition systems nowadays
link |
01:15:10.920
or OCR systems are completely end to end.
link |
01:15:12.920
It's, you know, it's some giant neural net
link |
01:15:15.320
that takes raw waveforms
link |
01:15:17.920
and produces a sequence of characters coming out.
link |
01:15:20.440
And it's just a huge neural net, right?
link |
01:15:22.080
There's no, you know, Markov model,
link |
01:15:24.000
there's no language model that is explicit
link |
01:15:26.360
other than, you know, something that's ingrained
link |
01:15:28.600
in the sort of neural language model, if you want.
link |
01:15:30.960
Same for translation, same for all kinds of stuff.
link |
01:15:33.400
So you see this continuous evolution
link |
01:15:36.440
from, you know, less and less hand crafting
link |
01:15:40.440
and more and more learning.
link |
01:15:43.120
And I think, I mean, it's true in biology as well.
link |
01:15:50.680
So, I mean, we might disagree about this,
link |
01:15:52.880
maybe not, this one little piece at the end,
link |
01:15:56.860
you mentioned active learning.
link |
01:15:58.360
It feels like active learning,
link |
01:16:01.440
which is the selection of data
link |
01:16:02.880
and also the interactivity needs to be part
link |
01:16:05.600
of this giant neural network.
link |
01:16:06.800
You cannot just be an observer
link |
01:16:08.360
to do self supervised learning.
link |
01:16:09.720
You have to, well, I don't,
link |
01:16:12.200
self supervised learning is just a word,
link |
01:16:14.560
but I would, whatever this giant stack
link |
01:16:16.760
of a neural network that's automatically learning,
link |
01:16:19.640
it feels, my intuition is that you have to have a system,
link |
01:16:26.520
whether it's a physical robot or a digital robot,
link |
01:16:30.220
that's interacting with the world
link |
01:16:32.360
and doing so in a flawed way and improving over time
link |
01:16:35.960
in order to form the self supervised learning.
link |
01:16:41.520
Well, you can't just give it a giant sea of data.
link |
01:16:44.960
Okay, I agree and I disagree.
link |
01:16:47.120
I agree in the sense that I think, I agree in two ways.
link |
01:16:52.000
The first way I agree is that if you want,
link |
01:16:55.140
and you certainly need a causal model of the world
link |
01:16:57.480
that allows you to predict the consequences
link |
01:16:59.120
of your actions, to train that model,
link |
01:17:01.280
you need to take actions, right?
link |
01:17:02.760
You need to be able to act in a world
link |
01:17:04.600
and see the effect for you to be,
link |
01:17:07.040
to learn causal models of the world.
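A toy sketch of such an action-conditioned model (dimensions and architecture are illustrative; the point is only that training it requires (state, action, next state) triples, i.e., the consequences of your own actions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardModel(nn.Module):
    """Predict the next state (or next representation) from the current state
    and the action taken."""
    def __init__(self, state_dim=64, action_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Training: gather (state, action, next_state) by acting in the world, then
# loss = F.mse_loss(model(state, action), next_state)
```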
link |
01:17:08.560
So that's not obvious because you can observe others.
link |
01:17:11.560
You can observe others.
link |
01:17:12.400
And you can infer that they're similar to you
link |
01:17:14.720
and then you can learn from that.
link |
01:17:16.000
Yeah, but then you have to kind of hardwire that part,
link |
01:17:18.400
right, and then, you know, mirror neurons
link |
01:17:19.880
and all that stuff, right?
link |
01:17:20.720
So, and it's not clear to me
link |
01:17:23.280
how you would do this in a machine.
link |
01:17:24.440
So I think the action part would be necessary
link |
01:17:30.240
for having causal models of the world.
link |
01:17:32.620
The second reason it may be necessary,
link |
01:17:36.660
or at least more efficient,
link |
01:17:37.860
is that active learning basically, you know,
link |
01:17:41.700
goes for the jugular of what you don't know, right?
link |
01:17:44.900
There are, you know, obvious areas of uncertainty
link |
01:17:48.020
about your world and about how the world behaves.
link |
01:17:52.940
And you can resolve this uncertainty
link |
01:17:56.220
by systematic exploration of that part
link |
01:17:58.980
that you don't know.
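One simple flavor of that idea in code is uncertainty sampling: query or explore the inputs the current model is least sure about (the entropy criterion here is just one illustrative choice):

```python
import torch

def most_uncertain(model, unlabeled_batch, k=10):
    """Return indices of the k examples with the highest predictive entropy.
    `model` is any classifier returning logits (hypothetical)."""
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_batch), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(k).indices   # the examples worth labeling or exploring next
```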
link |
01:18:00.300
And if you know that you don't know,
link |
01:18:01.700
then, you know, it makes you curious.
link |
01:18:03.020
You kind of look into situations that,
link |
01:18:05.620
and, you know, across the animal world,
link |
01:18:09.260
different species have different levels of curiosity,
link |
01:18:12.900
right, depending on how they're built, right?
link |
01:18:15.100
So, you know, cats and rats are incredibly curious,
link |
01:18:18.780
dogs not so much, I mean, less.
link |
01:18:20.620
Yeah, so it could be useful
link |
01:18:22.140
to have that kind of curiosity.
link |
01:18:23.900
So it'd be useful,
link |
01:18:24.740
but curiosity just makes the process faster.
link |
01:18:26.980
It doesn't make the process exist.
link |
01:18:28.780
The, so what process, what learning process is it
link |
01:18:33.820
that active learning makes more efficient?
link |
01:18:37.780
And I'm asking that first question because,
link |
01:18:42.300
you know, we haven't answered that question yet.
link |
01:18:43.940
So, you know, I worry about active learning
link |
01:18:45.860
once this question is...
link |
01:18:47.300
So it's the more fundamental question to ask.
link |
01:18:49.940
And if active learning or interaction
link |
01:18:53.900
increases the efficiency of the learning,
link |
01:18:56.220
see, sometimes it becomes very different
link |
01:18:59.700
if the increase is several orders of magnitude, right?
link |
01:19:03.700
Like...
link |
01:19:04.540
That's true.
link |
01:19:05.380
But fundamentally it's still the same thing
link |
01:19:07.620
and building up the intuition about how to,
link |
01:19:10.700
in a self supervised way to construct background models,
link |
01:19:13.340
efficient or inefficient, is the core problem.
link |
01:19:18.180
What do you think about Yoshua Bengio's
link |
01:19:20.300
talking about consciousness
link |
01:19:22.380
and all of these kinds of concepts?
link |
01:19:24.060
Okay, I don't know what consciousness is, but...
link |
01:19:29.780
It's a good opener.
link |
01:19:31.500
And to some extent, a lot of the things
link |
01:19:33.100
that are said about consciousness
link |
01:19:34.860
remind me of the questions people were asking themselves
link |
01:19:38.260
in the 18th century or 17th century
link |
01:19:40.900
when they discovered that, you know, how the eye works
link |
01:19:44.620
and the fact that the image at the back of the eye
link |
01:19:46.620
was upside down, right?
link |
01:19:49.420
Because you have a lens.
link |
01:19:50.260
And so on your retina, the image that forms is an image
link |
01:19:54.140
of the world, but it's upside down.
link |
01:19:55.180
How is it that you see right side up?
link |
01:19:57.820
And, you know, with what we know today in science,
link |
01:20:00.100
you know, we realize this question doesn't make any sense
link |
01:20:03.500
or is kind of ridiculous in some way, right?
link |
01:20:05.980
So I think a lot of what is said about consciousness
link |
01:20:07.820
is of that nature.
link |
01:20:08.660
Now, that said, there is a lot of really smart people
link |
01:20:10.620
that for whom I have a lot of respect
link |
01:20:13.460
who are talking about this topic,
link |
01:20:14.700
people like David Chalmers, who is a colleague of mine at NYU.
link |
01:20:17.900
I have kind of an unorthodox, folk, speculative hypothesis
link |
01:20:28.140
about consciousness.
link |
01:20:29.180
So we're talking about the study of a world model.
link |
01:20:32.020
And I think, you know, our entire prefrontal cortex
link |
01:20:35.540
basically is the engine for a world model.
link |
01:20:40.820
But when we are attending at a particular situation,
link |
01:20:44.580
we're focused on that situation.
link |
01:20:46.060
We basically cannot attend to anything else.
link |
01:20:48.540
And that seems to suggest that we basically have
link |
01:20:53.540
only one world model engine in our prefrontal cortex.
link |
01:20:59.780
That engine is configurable to the situation at hand.
link |
01:21:02.620
So we are building a box out of wood,
link |
01:21:04.660
or we are driving down the highway, or playing chess.
link |
01:21:09.300
We basically have a single model of the world
link |
01:21:12.820
that we configure into the situation at hand,
link |
01:21:15.380
which is why we can only attend to one task at a time.
link |
01:21:19.220
Now, if there is a task that we do repeatedly,
link |
01:21:22.860
it goes from the sort of deliberate reasoning
link |
01:21:25.940
using model of the world and prediction
link |
01:21:27.420
and perhaps something like model predictive control,
link |
01:21:29.300
which I was talking about earlier,
link |
01:21:31.380
to something that is more subconscious
link |
01:21:33.340
that becomes automatic.
link |
01:21:34.380
So I don't know if you've ever played
link |
01:21:35.940
against a chess grandmaster.
link |
01:21:39.180
I get wiped out in 10 plies, right?
link |
01:21:43.820
And I have to think about my move for like 15 minutes.
link |
01:21:50.140
And the person in front of me, the grandmaster,
link |
01:21:52.620
would just react within seconds, right?
link |
01:21:56.540
He doesn't need to think about it.
link |
01:21:58.580
That's become part of the subconscious
link |
01:21:59.980
because it's basically just pattern recognition
link |
01:22:02.620
at this point.
link |
01:22:04.740
Same, the first few hours you drive a car,
link |
01:22:07.660
you are really attentive, you can't do anything else.
link |
01:22:09.660
And then after 20, 30 hours of practice, 50 hours,
link |
01:22:13.460
it becomes subconscious, you can talk to the person next to you,
link |
01:22:15.700
things like that, right?
link |
01:22:17.100
Unless the situation becomes unpredictable
link |
01:22:19.060
and then you have to stop talking.
link |
01:22:21.060
So that suggests you only have one model in your head.
link |
01:22:24.740
And it might suggest the idea that consciousness
link |
01:22:27.860
basically is the module that configures
link |
01:22:29.780
this world model of yours.
link |
01:22:31.980
You need to have some sort of executive kind of overseer
link |
01:22:36.540
that configures your world model for the situation at hand.
link |
01:22:40.620
And that leads to kind of the really curious concept
link |
01:22:43.780
that consciousness is not a consequence
link |
01:22:46.020
of the power of our minds,
link |
01:22:47.660
but of the limitation of our brains.
link |
01:22:49.940
That because we have only one world model,
link |
01:22:52.060
we have to be conscious.
link |
01:22:53.660
If we had as many world models
link |
01:22:55.220
as situations we encounter,
link |
01:22:58.540
then we could do all of them simultaneously
link |
01:23:00.740
and we wouldn't need this sort of executive control
link |
01:23:02.940
that we call consciousness.
link |
01:23:04.540
Yeah, interesting.
link |
01:23:05.380
And somehow maybe that executive controller,
link |
01:23:08.940
I mean, the hard problem of consciousness,
link |
01:23:10.980
there's some kind of chemicals in biology
link |
01:23:12.860
that's creating a feeling,
link |
01:23:15.020
like it feels to experience some of these things.
link |
01:23:18.780
That's kind of like the hard question is,
link |
01:23:22.460
what the heck is that and why is that useful?
link |
01:23:24.900
Maybe the more pragmatic question,
link |
01:23:26.180
why is it useful to feel like this is really you
link |
01:23:29.940
experiencing this versus just like information
link |
01:23:33.340
being processed?
link |
01:23:34.380
It could be just a very nice side effect
link |
01:23:39.020
of the way we evolved.
link |
01:23:41.820
That's just very useful to feel a sense of ownership
link |
01:23:48.620
to the decisions you make, to the perceptions you make,
link |
01:23:51.180
to the model you're trying to maintain.
link |
01:23:53.180
Like you own this thing and this is the only one you got
link |
01:23:56.260
and if you lose it, it's gonna really suck.
link |
01:23:58.420
And so you should really send the brain
link |
01:24:00.620
some signals about it.
link |
01:24:02.300
So what ideas do you believe might be true
link |
01:24:06.860
that most or at least many people disagree with?
link |
01:24:11.260
Let's say in the space of machine learning.
link |
01:24:13.740
Well, it depends who you talk about,
link |
01:24:14.940
but I think, so certainly there is a bunch of people
link |
01:24:20.100
who are nativists, right?
link |
01:24:21.100
Who think that a lot of the basic things about the world
link |
01:24:23.300
are kind of hardwired in our minds.
link |
01:24:26.420
Things like the world is three dimensional, for example,
link |
01:24:28.860
is that hardwired?
link |
01:24:30.420
Things like object permanence,
link |
01:24:32.660
is this something that we learn
link |
01:24:35.140
before the age of three months or so?
link |
01:24:37.500
Or are we born with it?
link |
01:24:39.340
And there are very wide disagreements
link |
01:24:42.380
among the cognitive scientists for this.
link |
01:24:46.580
I think those things are actually very simple to learn.
link |
01:24:50.580
Is it the case that the oriented edge detectors in V1
link |
01:24:54.220
are learned or are they hardwired?
link |
01:24:56.180
I think they are learned.
link |
01:24:57.260
They might be learned before birth
link |
01:24:58.580
because it's really easy to generate signals
link |
01:25:00.620
from the retina that actually will train edge detectors.
link |
01:25:04.620
And again, those are things that can be learned
link |
01:25:06.740
within minutes of opening your eyes, right?
link |
01:25:09.580
I mean, since the 1990s,
link |
01:25:12.660
we have algorithms that can learn oriented edge detectors
link |
01:25:15.460
completely unsupervised
link |
01:25:16.940
with the equivalent of a few minutes of real time.
link |
01:25:19.060
So those things have to be learned.
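As a rough illustration of the kind of unsupervised learning being described (not the specific 1990s algorithm mentioned), sparse dictionary learning on image patches tends to produce oriented, edge-like filters, in the spirit of Olshausen and Field's sparse coding work. The synthetic 1/f-noise "images" below are only a stand-in for natural scenes:

```python
# A toy sketch: sparse dictionary learning on image patches tends to recover
# oriented, edge-like filters (in the spirit of Olshausen & Field, 1996).
# Here the "images" are synthetic 1/f-filtered noise standing in for photos.
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)

def pink_noise_image(size=64):
    """Generate 1/f ('natural-statistics-like') noise as a stand-in for a photo."""
    f = np.fft.fftfreq(size)
    fx, fy = np.meshgrid(f, f)
    amp = 1.0 / np.maximum(np.hypot(fx, fy), 1.0 / size)   # 1/f amplitude spectrum
    phase = np.exp(2j * np.pi * rng.random((size, size)))
    img = np.real(np.fft.ifft2(amp * phase))
    return (img - img.mean()) / (img.std() + 1e-8)

# Collect small patches from many "images": the raw sensory signal.
patches = []
for _ in range(50):
    img = pink_noise_image()
    patches.append(extract_patches_2d(img, (8, 8), max_patches=200, random_state=0))
X = np.concatenate(patches).reshape(-1, 64)
X -= X.mean(axis=1, keepdims=True)          # remove patch-wise DC component

# Sparse coding: the learned dictionary atoms play the role of V1-like filters.
dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
dico.fit(X)
filters = dico.components_.reshape(32, 8, 8)  # many come out oriented / edge-like
print(filters.shape)
```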
link |
01:25:22.660
And there's also those MIT experiments
link |
01:25:24.580
where you kind of plug the optical nerve
link |
01:25:27.820
on the auditory cortex of a baby ferret, right?
link |
01:25:30.300
And that auditory cortex
link |
01:25:31.300
becomes a visual cortex essentially.
link |
01:25:33.420
So clearly there's learning taking place there.
link |
01:25:37.980
So I think a lot of what people think are so basic
link |
01:25:41.340
that they need to be hardwired,
link |
01:25:43.180
I think a lot of those things are learned
link |
01:25:44.420
because they are easy to learn.
link |
01:25:46.260
So you put a lot of value in the power of learning.
link |
01:25:49.980
What kind of things do you suspect might not be learned?
link |
01:25:53.340
Is there something that could not be learned?
link |
01:25:56.060
So your intrinsic drives are not learned.
link |
01:25:59.820
There are the things that make humans human
link |
01:26:03.460
or make cats different from dogs, right?
link |
01:26:07.460
It's the basic drives that are kind of hardwired
link |
01:26:10.060
in our basal ganglia.
link |
01:26:13.100
I mean, there are people who are working
link |
01:26:14.060
on this kind of stuff that's called intrinsic motivation
link |
01:26:16.380
in the context of reinforcement learning.
link |
01:26:18.220
So these are objective functions
link |
01:26:20.100
where the reward doesn't come from the external world.
link |
01:26:23.100
It's computed by your own brain.
link |
01:26:24.660
Your own brain computes whether you're happy or not, right?
link |
01:26:28.140
It measures your degree of comfort or discomfort.
link |
01:26:33.460
And because it's your brain computing this,
link |
01:26:36.100
presumably it knows also how to estimate
link |
01:26:37.780
gradients of this, right?
link |
01:26:38.780
So it's easier to learn when your objective is intrinsic.
link |
01:26:47.100
So that has to be hardwired.
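As a loose illustration of an intrinsic objective of this kind, here is a minimal sketch in which the "reward" is a discomfort score computed inside the agent, written as a differentiable function so that gradients with respect to the agent's own actions are available. The set-point and dynamics are invented for the example, not taken from any particular system:

```python
# Hedged sketch: an intrinsic cost the agent computes itself (a stand-in for
# "degree of comfort or discomfort"), written as a differentiable function so
# the agent can also get gradients of it with respect to its own actions.
import torch

def intrinsic_cost(internal_state: torch.Tensor) -> torch.Tensor:
    """'Discomfort' computed inside the agent: distance from a preferred set-point."""
    set_point = torch.zeros_like(internal_state)
    return torch.sum((internal_state - set_point) ** 2)

# Toy dynamics: the action nudges the internal state (e.g., hunger, temperature).
state = torch.tensor([2.0, -1.0])
action = torch.zeros(2, requires_grad=True)

cost = intrinsic_cost(state + action)    # no external reward: the brain scores itself
cost.backward()                          # being internal, the objective is differentiable
print(cost.item(), action.grad)          # the gradient says how to reduce discomfort
```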
link |
01:26:50.100
The critic that makes longterm prediction of the outcome,
link |
01:26:53.460
which is the eventual result of this, that's learned.
link |
01:26:57.860
And perception is learned
link |
01:26:59.060
and your model of the world is learned.
link |
01:27:01.260
But let me take an example of why the critic,
link |
01:27:04.260
I mean, an example of how the critic may be learned, right?
link |
01:27:06.860
If I come to you, I reach across the table
link |
01:27:11.220
and I pinch your arm, right?
link |
01:27:13.380
Complete surprise for you.
link |
01:27:15.060
You would not have expected this from me.
link |
01:27:16.260
I was expecting that the whole time, but yes, right.
link |
01:27:18.100
Let's say for the sake of the story, yes.
link |
01:27:20.420
So, okay, your basal ganglia is gonna light up
link |
01:27:24.980
because it's gonna hurt, right?
link |
01:27:28.500
And now your model of the world includes the fact that
link |
01:27:31.140
I may pinch you if I approach my...
link |
01:27:34.820
Don't trust humans.
link |
01:27:36.220
Right, my hand to your arm.
link |
01:27:37.860
So if I try again, you're gonna recoil.
link |
01:27:40.020
And that's your critic, your predictive,
link |
01:27:44.060
your predictor of your ultimate pain system
link |
01:27:50.660
that predicts that something bad is gonna happen
link |
01:27:52.380
and you recoil to avoid it.
link |
01:27:53.860
So even that can be learned.
link |
01:27:55.260
That is learned, definitely.
link |
01:27:56.700
This is what allows you also to define some goals, right?
link |
01:28:00.700
So the fact that you're a school child,
link |
01:28:04.540
you wake up in the morning and you go to school
link |
01:28:06.780
and it's not because you necessarily like waking up early
link |
01:28:12.060
and going to school,
link |
01:28:12.900
but you know that there is a long term objective
link |
01:28:14.620
you're trying to optimize.
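One way to make the learned-critic idea concrete is a toy temporal-difference sketch: a single hardwired pain signal on the "pinched" state is enough for the critic to learn a negative prediction for the preceding "hand approaching" cue, which is what would trigger the recoil. The states, rewards, and constants are purely illustrative:

```python
# Hedged sketch of a learned critic: after the hardwired "it hurts" signal
# fires, TD-style learning propagates that bad outcome back to the cue
# ("hand approaching"), so next time the prediction alone can trigger a recoil.
import numpy as np

states = ["calm", "hand_approaching", "pinched"]
value = {s: 0.0 for s in states}          # the critic: predicted future (dis)comfort
intrinsic_reward = {"calm": 0.0, "hand_approaching": 0.0, "pinched": -1.0}  # hardwired pain
alpha, gamma = 0.5, 0.9

episode = ["calm", "hand_approaching", "pinched"]
for _ in range(20):
    for s, s_next in zip(episode[:-1], episode[1:]):
        td_target = intrinsic_reward[s_next] + gamma * value[s_next]
        value[s] += alpha * (td_target - value[s])   # TD(0) update of the critic

print(value["hand_approaching"])  # now clearly negative: predicted pain -> recoil early
```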
link |
01:28:15.820
So Ernest Becker, I'm not sure if you're familiar with him,
link |
01:28:18.540
the philosopher, he wrote the book Denial of Death
link |
01:28:20.900
and his idea is that one of the core motivations
link |
01:28:23.420
of human beings is our terror of death, our fear of death.
link |
01:28:27.220
That's what makes us unique from cats.
link |
01:28:28.900
Cats are just surviving.
link |
01:28:30.500
They do not have a deep, like a cognizance introspection
link |
01:28:37.540
that over the horizon is the end.
link |
01:28:41.740
And then he says that, I mean,
link |
01:28:43.060
there's a terror management theory
link |
01:28:44.420
that just all these psychological experiments
link |
01:28:46.260
that show basically this idea
link |
01:28:50.020
that all of human civilization, everything we create
link |
01:28:54.380
is kind of trying to forget if even for a brief moment
link |
01:28:58.820
that we're going to die.
link |
01:29:00.660
When do you think humans understand
link |
01:29:03.780
that they're going to die?
link |
01:29:04.900
Is it learned early on also?
link |
01:29:07.580
I don't know at what point.
link |
01:29:11.260
I mean, it's a question like at what point
link |
01:29:13.460
do you realize that what death really is?
link |
01:29:16.420
And I think most people don't actually realize
link |
01:29:18.180
what death is, right?
link |
01:29:19.220
I mean, most people believe that you go to heaven
link |
01:29:20.940
or something, right?
link |
01:29:21.860
So to push back on that, what Ernest Becker says
link |
01:29:25.580
and Sheldon Solomon, all of those folks,
link |
01:29:29.300
and I find those ideas a little bit compelling
link |
01:29:31.620
is that there is moments in life, early in life,
link |
01:29:34.100
a lot of this fun happens early in life
link |
01:29:36.540
when you do deeply experience
link |
01:29:41.620
the terror of this realization.
link |
01:29:43.540
And all the things you think about about religion,
link |
01:29:45.980
all those kinds of things that we kind of think about
link |
01:29:48.420
more like teenage years and later,
link |
01:29:50.660
we're talking about way earlier.
link |
01:29:52.100
No, it was like seven or eight years,
link |
01:29:53.220
something like that, yeah.
link |
01:29:54.060
You realize, holy crap, this is like the mystery,
link |
01:29:59.660
the terror, like it's almost like you're a little prey,
link |
01:30:03.220
a little baby deer sitting in the darkness
link |
01:30:05.340
of the jungle or the woods looking all around you.
link |
01:30:08.060
There's darkness full of terror.
link |
01:30:09.540
I mean, that realization says, okay,
link |
01:30:12.140
I'm gonna go back in the comfort of my mind
link |
01:30:14.460
where there is a deep meaning,
link |
01:30:16.780
where there is maybe like pretend I'm immortal
link |
01:30:20.420
in however way, however kind of idea I can construct
link |
01:30:25.060
to help me understand that I'm immortal.
link |
01:30:27.180
Religion helps with that.
link |
01:30:28.660
You can delude yourself in all kinds of ways,
link |
01:30:31.440
like lose yourself in the busyness of each day,
link |
01:30:34.220
have little goals in mind, all those kinds of things
link |
01:30:36.380
to think that it's gonna go on forever.
link |
01:30:38.100
And you kind of know you're gonna die, yeah,
link |
01:30:40.740
and it's gonna be sad, but you don't really understand
link |
01:30:43.820
that you're going to die.
link |
01:30:45.140
And so that's their idea.
link |
01:30:46.460
And I find that compelling because it does seem
link |
01:30:49.940
to be a core unique aspect of human nature
link |
01:30:52.820
that we're able to think that we're going,
link |
01:30:55.180
we're able to really understand that this life is finite.
link |
01:30:59.540
That seems important.
link |
01:31:00.580
There's a bunch of different things there.
link |
01:31:02.260
So first of all, I don't think there is a qualitative
link |
01:31:04.300
difference between us and cats in that regard.
link |
01:31:07.520
I think the difference is that we just have a better
link |
01:31:10.180
ability to predict in the long term.
link |
01:31:14.740
And so we have a better understanding of how the world works.
link |
01:31:17.380
So we have better understanding of finiteness of life
link |
01:31:20.180
and things like that.
link |
01:31:21.020
So we have a better planning engine than cats?
link |
01:31:23.540
Yeah.
link |
01:31:24.500
Okay.
link |
01:31:25.340
But what's the motivation for planning that far?
link |
01:31:28.780
Well, I think it's just a side effect of the fact
link |
01:31:30.540
that we have just a better planning engine
link |
01:31:32.340
because it makes us, as I said,
link |
01:31:34.780
the essence of intelligence is the ability to predict.
link |
01:31:37.420
And so the, because we're smarter as a side effect,
link |
01:31:41.220
we also have this ability to kind of make predictions
link |
01:31:43.500
about our own future existence or lack thereof.
link |
01:31:47.580
Okay.
link |
01:31:48.500
You say religion helps with that.
link |
01:31:50.540
I think religion hurts actually.
link |
01:31:53.000
It makes people worry about like,
link |
01:31:55.000
what's going to happen after their death, et cetera.
link |
01:31:57.500
If you believe that, you just don't exist after death.
link |
01:32:00.820
Like, it solves completely the problem, at least.
link |
01:32:02.940
You're saying if you don't believe in God,
link |
01:32:04.940
you don't worry about what happens after death?
link |
01:32:07.220
Yeah.
link |
01:32:08.260
I don't know.
link |
01:32:09.100
You only worry about this life
link |
01:32:11.900
because that's the only one you have.
link |
01:32:14.220
I think it's, well, I don't know.
link |
01:32:16.140
If I were to say what Ernest Becker says,
link |
01:32:17.740
and obviously I agree with him more than not,
link |
01:32:22.140
is you do deeply worry.
link |
01:32:26.160
If you believe there's no God,
link |
01:32:27.900
there's still a deep worry of the mystery of it all.
link |
01:32:31.780
Like, how does that make any sense that it just ends?
link |
01:32:35.700
I don't think we can truly understand that this ride,
link |
01:32:39.740
I mean, so much of our life, the consciousness,
link |
01:32:41.900
the ego is invested in this being.
link |
01:32:46.220
And then...
link |
01:32:47.580
Science keeps bringing humanity down from its pedestal.
link |
01:32:51.560
And that's just another example of it.
link |
01:32:54.740
That's wonderful, but for us individual humans,
link |
01:32:57.820
we don't like to be brought down from a pedestal.
link |
01:33:00.300
You're saying like, but see, you're fine with it because,
link |
01:33:03.580
well, so what Ernest Becker would say is you're fine with it
link |
01:33:06.340
because there's just a more peaceful existence for you,
link |
01:33:08.580
but you're not really fine.
link |
01:33:09.580
You're hiding from it.
link |
01:33:10.820
In fact, some of the people that experience
link |
01:33:12.780
the deepest trauma earlier in life,
link |
01:33:16.700
they often, before they seek extensive therapy,
link |
01:33:19.580
will say that I'm fine.
link |
01:33:21.060
It's like when you talk to people who are truly angry,
link |
01:33:23.460
how are you doing, I'm fine.
link |
01:33:25.380
The question is, what's going on?
link |
01:33:27.780
Now I had a near death experience.
link |
01:33:29.140
I had a very bad motorbike accident when I was 17.
link |
01:33:33.580
So, but that didn't have any impact
link |
01:33:36.920
on my reflection on that topic.
link |
01:33:40.420
So I'm basically just playing a bit of devil's advocate,
link |
01:33:43.100
pushing back on wondering,
link |
01:33:45.820
is it truly possible to accept death?
link |
01:33:47.540
And the flip side, that's more interesting,
link |
01:33:49.340
I think for AI and robotics is how important
link |
01:33:53.060
is it to have this as one of the suite of motivations
link |
01:33:57.180
is to not just avoid falling off the roof
link |
01:34:03.320
or something like that, but ponder the end of the ride.
link |
01:34:10.180
If you listen to the stoics, it's a great motivator.
link |
01:34:14.820
It adds a sense of urgency.
link |
01:34:16.900
So maybe to truly fear death or be cognizant of it
link |
01:34:21.420
might give a deeper meaning and urgency to the moment
link |
01:34:26.460
to live fully.
link |
01:34:30.460
Maybe I don't disagree with that.
link |
01:34:32.220
I mean, I think what motivates me here
link |
01:34:34.280
is knowing more about human nature.
link |
01:34:38.980
I mean, I think human nature and human intelligence
link |
01:34:41.760
is a big mystery.
link |
01:34:42.600
It's a scientific mystery
link |
01:34:45.020
in addition to philosophical and et cetera,
link |
01:34:48.580
but I'm a true believer in science.
link |
01:34:50.700
So, and I do have kind of a belief
link |
01:34:56.180
that for complex systems like the brain and the mind,
link |
01:34:59.940
the way to understand it is to try to reproduce it
link |
01:35:04.460
with artifacts that you build
link |
01:35:07.060
because you know what's essential to it
link |
01:35:08.900
when you try to build it.
link |
01:35:10.180
The same way I've used this analogy before with you,
link |
01:35:12.660
I believe, the same way we only started
link |
01:35:15.780
to understand aerodynamics
link |
01:35:18.140
when we started building airplanes
link |
01:35:19.300
and that helped us understand how birds fly.
link |
01:35:22.380
So I think there's kind of a similar process here
link |
01:35:25.460
where we don't have a full theory of intelligence,
link |
01:35:29.660
but building intelligent artifacts
link |
01:35:31.760
will help us perhaps develop some underlying theory
link |
01:35:35.480
that encompasses not just artificial implements,
link |
01:35:39.380
but also human and biological intelligence in general.
link |
01:35:43.860
So you're an interesting person to ask this question
link |
01:35:46.080
about sort of all kinds of different other
link |
01:35:49.400
intelligent entities or intelligences.
link |
01:35:53.100
What are your thoughts about kind of like the Turing test
link |
01:35:56.300
or the Chinese room question?
link |
01:35:59.240
If we create an AI system that exhibits
link |
01:36:02.920
a lot of properties of intelligence and consciousness,
link |
01:36:07.520
how comfortable are you thinking of that entity
link |
01:36:10.220
as intelligent or conscious?
link |
01:36:12.340
So you're trying to build now systems
link |
01:36:14.580
that have intelligence and there's metrics
link |
01:36:16.420
about their performance, but that metric is external.
link |
01:36:22.740
So how are you, are you okay calling a thing intelligent
link |
01:36:26.420
or are you going to be like most humans
link |
01:36:29.020
and be once again unhappy to be brought down
link |
01:36:32.700
from a pedestal of consciousness slash intelligence?
link |
01:36:34.920
No, I'll be very happy to understand
link |
01:36:39.500
more about human nature, human mind and human intelligence
link |
01:36:45.540
through the construction of machines
link |
01:36:47.240
that have similar abilities.
link |
01:36:50.600
And if a consequence of this is to bring down humanity
link |
01:36:54.520
one notch from its already low pedestal,
link |
01:36:58.020
I'm just fine with it.
link |
01:36:59.140
That's just the reality of life.
link |
01:37:01.360
So I'm fine with that.
link |
01:37:02.460
Now you were asking me about things that,
link |
01:37:05.020
opinions I have that a lot of people may disagree with.
link |
01:37:07.940
I think if we think about the design
link |
01:37:12.780
of autonomous intelligence systems,
link |
01:37:14.300
so assuming that we are somewhat successful
link |
01:37:16.860
at some level of getting machines to learn models
link |
01:37:20.060
of the world, predictive models of the world,
link |
01:37:22.620
we build intrinsic motivation objective functions
link |
01:37:25.860
to drive the behavior of that system.
link |
01:37:28.340
The system also has perception modules
link |
01:37:30.100
that allows it to estimate the state of the world
link |
01:37:32.820
and then have some way of figuring out
link |
01:37:34.640
the sequence of actions that,
link |
01:37:36.180
to optimize a particular objective.
link |
01:37:39.300
If it has a critic of the type that I was describing before,
link |
01:37:42.740
the thing that makes you recoil your arm
link |
01:37:44.600
the second time I try to pinch you,
link |
01:37:48.620
intelligent autonomous machine will have emotions.
link |
01:37:51.700
I think emotions are an integral part
link |
01:37:54.060
of autonomous intelligence.
link |
01:37:56.400
If you have an intelligent system
link |
01:37:59.020
that is driven by intrinsic motivation, by objectives,
link |
01:38:03.160
if it has a critic that allows it to predict in advance
link |
01:38:07.680
whether the outcome of a situation is gonna be good or bad,
link |
01:38:11.040
is going to have emotions, it's gonna have fear.
link |
01:38:13.480
Yes.
link |
01:38:14.320
When it predicts that the outcome is gonna be bad
link |
01:38:18.180
and something to avoid is gonna have elation
link |
01:38:20.720
when it predicts it's gonna be good.
link |
01:38:24.280
If it has drives to relate with humans,
link |
01:38:28.680
in some ways the way humans have,
link |
01:38:30.660
it's gonna be social, right?
link |
01:38:34.460
And so it's gonna have emotions
link |
01:38:36.460
about attachment and things of that type.
link |
01:38:38.620
So I think the sort of sci fi thing
link |
01:38:44.700
where you see Commander Data,
link |
01:38:46.900
like having an emotion chip that you can turn off, right?
link |
01:38:50.100
I think that's ridiculous.
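A hedged sketch of the kind of loop being described: a perception module estimates the state, a world model rolls out candidate action sequences (model predictive control), and a critic scores the predicted outcomes; the critic predicting a bad outcome is what plays the role of "fear" here. Every function below is a toy stand-in, not a specific published system:

```python
# Toy model-predictive-control loop with a critic. The names and dynamics are
# invented for illustration; only the overall structure matches the discussion.
import numpy as np

rng = np.random.default_rng(0)

def perceive(observation):
    return observation                     # toy: the state is observed directly

def world_model(state, action):
    return state + action                  # toy predictive model of the world

def critic(state):
    return -np.sum(state ** 2)             # predicted long-term outcome (higher = better)

def plan(state, horizon=3, n_candidates=64):
    """Imagine candidate futures, keep the least 'frightening' one."""
    best_score, best_actions = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        s, score = state.copy(), 0.0
        for a in actions:
            s = world_model(s, a)
            score += critic(s)             # low score ~ "fear", high score ~ "elation"
        if score > best_score:
            best_score, best_actions = score, actions
    return best_actions[0], best_score

state = perceive(np.array([3.0, -2.0]))
action, predicted_outcome = plan(state)
print(action, predicted_outcome)
```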
link |
01:38:51.700
So, I mean, here's the difficult
link |
01:38:53.380
philosophical social question.
link |
01:38:57.820
Do you think there will be a time like a civil rights
link |
01:39:01.020
movement for robots where, okay, forget the movement,
link |
01:39:05.180
but a discussion like the Supreme Court
link |
01:39:09.740
that particular kinds of robots,
link |
01:39:12.880
you know, particular kinds of systems
link |
01:39:16.100
deserve the same rights as humans
link |
01:39:18.300
because they can suffer just as humans can,
link |
01:39:22.900
all those kinds of things.
link |
01:39:24.740
Well, perhaps, perhaps not.
link |
01:39:27.340
Like imagine that humans were,
link |
01:39:29.580
that you could, you know, die and be restored.
link |
01:39:33.740
Like, you know, you could be sort of, you know,
link |
01:39:35.500
be 3D reprinted and, you know,
link |
01:39:37.540
your brain could be reconstructed in its finest details.
link |
01:39:40.740
Our ideas of rights will change in that case.
link |
01:39:43.140
If you can always just,
link |
01:39:45.900
there's always a backup you could always restore.
link |
01:39:48.220
Maybe like the importance of murder
link |
01:39:50.260
will go down one notch.
link |
01:39:51.980
That's right.
link |
01:39:52.820
But also your desire to do dangerous things,
link |
01:39:57.580
like, you know, skydiving or, you know,
link |
01:40:03.300
or, you know, race car driving,
link |
01:40:05.660
you know, car racing or that kind of stuff,
link |
01:40:07.300
you know, would probably increase
link |
01:40:09.460
or, you know, aeroplanes, aerobatics
link |
01:40:11.140
or that kind of stuff, right?
link |
01:40:12.380
It would be fine to do a lot of those things
link |
01:40:14.180
or explore, you know, dangerous areas and things like that.
link |
01:40:17.500
It would kind of change your relationship.
link |
01:40:19.220
So now it's very likely that robots would be like that
link |
01:40:22.420
because, you know, they'll be based on perhaps technology
link |
01:40:27.060
that is somewhat similar to today's technology
link |
01:40:30.140
and you can always have a backup.
link |
01:40:32.260
So it's possible, I don't know if you like video games,
link |
01:40:35.700
but there's a game called Diablo and...
link |
01:40:39.340
Oh, my sons are huge fans of this.
link |
01:40:41.860
Yes.
link |
01:40:44.100
In fact, they made a game that's inspired by it.
link |
01:40:47.060
Awesome.
link |
01:40:47.900
Like built a game?
link |
01:40:49.260
My three sons have a game design studio between them, yeah.
link |
01:40:52.660
That's awesome.
link |
01:40:53.480
They came out with a game.
link |
01:40:54.320
They just came out with a game.
link |
01:40:55.160
Last year, no, this was last year,
link |
01:40:56.860
early last year, about a year ago.
link |
01:40:58.180
That's awesome.
link |
01:40:59.020
But so in Diablo, there's something called hardcore mode,
link |
01:41:02.020
which if you die, there's no, you're gone.
link |
01:41:05.480
Right.
link |
01:41:06.320
That's it.
link |
01:41:07.140
And so it's possible with AI systems
link |
01:41:10.620
for them to be able to operate successfully
link |
01:41:13.260
and for us to treat them in a certain way
link |
01:41:15.580
because they have to be integrated in human society,
link |
01:41:18.400
they have to be able to die, no copies allowed.
link |
01:41:22.020
In fact, copying is illegal.
link |
01:41:23.860
It's possible with humans as well,
link |
01:41:25.260
like cloning will be illegal, even when it's possible.
link |
01:41:28.580
But cloning is not copying, right?
link |
01:41:29.960
I mean, you don't reproduce the mind of the person
link |
01:41:33.060
and the experience.
link |
01:41:33.940
Right.
link |
01:41:34.760
It's just a delayed twin, so.
link |
01:41:36.420
But then it's, but we were talking about with computers
link |
01:41:39.060
that you will be able to copy.
link |
01:41:40.580
Right.
link |
01:41:41.420
You will be able to perfectly save,
link |
01:41:42.660
pickle the mind state.
link |
01:41:46.640
And it's possible that that will be illegal
link |
01:41:49.660
because that goes against,
link |
01:41:53.300
that will destroy the motivation of the system.
link |
01:41:55.980
Okay, so let's say you have a domestic robot, okay?
link |
01:42:00.240
Sometime in the future.
link |
01:42:01.380
Yes.
link |
01:42:02.460
And the domestic robot comes to you kind of
link |
01:42:06.100
somewhat pre trained, it can do a bunch of things,
link |
01:42:08.700
but it has a particular personality
link |
01:42:10.580
that makes it slightly different from the other robots
link |
01:42:12.300
because that makes them more interesting.
link |
01:42:14.220
And then because it's lived with you for five years,
link |
01:42:18.060
you've grown some attachment to it and vice versa,
link |
01:42:21.900
and it's learned a lot about you.
link |
01:42:24.380
Or maybe it's not a real household robot.
link |
01:42:25.900
Maybe it's a virtual assistant that lives in your,
link |
01:42:29.380
you know, augmented reality glasses or whatever, right?
link |
01:42:32.580
You know, the horror movie type thing, right?
link |
01:42:36.680
And that system to some extent,
link |
01:42:39.620
the intelligence in that system is a bit like your child
link |
01:42:43.900
or maybe your PhD student in the sense that
link |
01:42:47.100
there's a lot of you in that machine now, right?
link |
01:42:50.260
And so if it were a living thing,
link |
01:42:53.500
you would do this for free if you want, right?
link |
01:42:56.560
If it's your child, your child can, you know,
link |
01:42:58.400
then live his or her own life.
link |
01:43:01.580
And you know, the fact that they learn stuff from you
link |
01:43:04.020
doesn't mean that you have any ownership of it, right?
link |
01:43:06.540
But if it's a robot that you've trained,
link |
01:43:09.380
perhaps you have some intellectual property claim
link |
01:43:13.580
about.
link |
01:43:14.420
Oh, intellectual property.
link |
01:43:15.240
Oh, I thought you meant like a permanence value
link |
01:43:18.140
in the sense that part of you is in.
link |
01:43:20.180
Well, there is permanence value, right?
link |
01:43:21.700
So you would lose a lot if that robot were to be destroyed
link |
01:43:24.660
and you had no backup, you would lose a lot, right?
link |
01:43:26.660
You lose a lot of investment, you know,
link |
01:43:28.100
kind of like, you know, a person dying, you know,
link |
01:43:31.860
that a friend of yours dying
link |
01:43:34.300
or a coworker or something like that.
link |
01:43:38.480
But also you have like intellectual property rights
link |
01:43:42.340
in the sense that that system is fine tuned
link |
01:43:45.940
to your particular existence.
link |
01:43:47.340
So that's now a very unique instantiation
link |
01:43:49.860
of that original background model,
link |
01:43:51.980
whatever it was that arrived.
link |
01:43:54.260
And then there are issues of privacy, right?
link |
01:43:55.660
Because now imagine that that robot has its own kind
link |
01:44:00.000
of volition and decides to work for someone else.
link |
01:44:02.820
Or kind of, you know, thinks life with you
link |
01:44:06.020
is sort of untenable or whatever.
link |
01:44:07.880
Now, all the things that that system learned from you,
link |
01:44:14.760
you know, can you like, you know,
link |
01:44:16.880
delete all the personal information
link |
01:44:18.160
that that system knows about you?
link |
01:44:19.680
I mean, that would be kind of an ethical question.
link |
01:44:22.200
Like, you know, can you erase the mind
link |
01:44:24.760
of a intelligent robot to protect your privacy?
link |
01:44:30.040
You can't do this with humans.
link |
01:44:31.580
You can ask them to shut up,
link |
01:44:32.680
but that you don't have complete power over them.
link |
01:44:35.640
You can't erase humans, yeah, it's the problem
link |
01:44:38.040
with the relationships, you know, if you break up,
link |
01:44:40.120
you can't erase the other human.
link |
01:44:42.640
With robots, I think it will have to be the same thing
link |
01:44:44.960
with robots, that risk, that there has to be some risk
link |
01:44:52.420
to our interactions to truly experience them deeply,
link |
01:44:55.120
it feels like.
link |
01:44:56.140
So you have to be able to lose your robot friend
link |
01:44:59.600
and that robot friend to go tweeting
link |
01:45:01.680
about how much of an asshole you were.
link |
01:45:03.680
But then are you allowed to, you know,
link |
01:45:06.160
murder the robot to protect your private information
link |
01:45:08.760
if the robot decides to leave?
link |
01:45:09.960
I have this intuition that for robots with certain,
link |
01:45:14.520
like, it's almost like a regulation.
link |
01:45:16.820
If you declare your robot to be,
link |
01:45:19.240
let's call it sentient or something like that,
link |
01:45:20.960
like this robot is designed for human interaction,
link |
01:45:24.180
then you're not allowed to murder these robots.
link |
01:45:26.040
It's the same as murdering other humans.
link |
01:45:28.160
Well, but what about you do a backup of the robot
link |
01:45:30.280
that you preserve on a hard drive
link |
01:45:32.600
for the equivalent in the future?
link |
01:45:33.880
That might be illegal.
link |
01:45:34.720
It's like piracy is illegal.
link |
01:45:38.080
No, but it's your own robot, right?
link |
01:45:39.800
But you can't, you don't.
link |
01:45:41.640
But then you can wipe out his brain.
link |
01:45:45.040
So this robot doesn't know anything about you anymore,
link |
01:45:47.440
but you still have, technically it's still in existence
link |
01:45:50.440
because you backed it up.
link |
01:45:51.700
And then there'll be these great speeches
link |
01:45:53.560
at the Supreme Court by saying,
link |
01:45:55.480
oh, sure, you can erase the mind of the robot
link |
01:45:57.840
just like you can erase the mind of a human.
link |
01:46:00.060
We both can suffer.
link |
01:46:01.100
There'll be some epic like Obama type character
link |
01:46:03.360
with a speech that we,
link |
01:46:05.680
like the robots and the humans are the same.
link |
01:46:08.840
We can both suffer.
link |
01:46:09.880
We can both hope.
link |
01:46:11.380
We can both, all of those kinds of things,
link |
01:46:14.880
raise families, all that kind of stuff.
link |
01:46:17.280
It's interesting for these, just like you said,
link |
01:46:20.140
emotion seems to be a fascinatingly powerful aspect
link |
01:46:24.200
of human interaction, human robot interaction.
link |
01:46:27.360
And if they're able to exhibit emotions
link |
01:46:30.480
at the end of the day,
link |
01:46:31.800
that's probably going to have us deeply consider
link |
01:46:35.920
human rights, like what we value in humans,
link |
01:46:38.480
what we value in other animals.
link |
01:46:40.320
That's why robots and AI is great.
link |
01:46:42.120
It makes us ask really good questions.
link |
01:46:44.280
The hard questions, yeah.
link |
01:46:45.480
But you asked about the Chinese room type argument.
link |
01:46:49.560
Is it real?
link |
01:46:50.400
If it looks real.
link |
01:46:51.480
I think the Chinese room argument is a really good one.
link |
01:46:54.400
So.
link |
01:46:55.440
So for people who don't know what Chinese room is,
link |
01:46:58.440
you can, I don't even know how to formulate it well,
link |
01:47:00.740
but basically you can mimic the behavior
link |
01:47:04.620
of an intelligence system by just following
link |
01:47:06.760
a giant algorithm code book that tells you exactly
link |
01:47:10.680
how to respond in exactly each case.
link |
01:47:12.880
But is that really intelligent?
link |
01:47:14.700
It's like a giant lookup table.
link |
01:47:16.600
When this person says this, you answer this.
link |
01:47:18.580
When this person says this, you answer this.
link |
01:47:21.000
And if you understand how that works,
link |
01:47:24.320
you have this giant, nearly infinite lookup table.
link |
01:47:27.360
Is that really intelligence?
link |
01:47:28.600
Cause intelligence seems to be a mechanism
link |
01:47:31.280
that's much more interesting and complex
link |
01:47:33.440
than this lookup table.
link |
01:47:34.620
I don't think so.
link |
01:47:35.460
So the, I mean, the real question comes down to,
link |
01:47:38.960
do you think, you know, you can,
link |
01:47:42.080
you can mechanize intelligence in some way,
link |
01:47:44.320
even if that involves learning?
link |
01:47:47.560
And the answer is, of course, yes, there's no question.
link |
01:47:50.720
There's a second question then, which is,
link |
01:47:53.400
assuming you can reproduce intelligence
link |
01:47:56.560
in sort of different hardware than biological hardware,
link |
01:47:59.400
you know, like computers, can you, you know,
link |
01:48:04.440
match human intelligence in all the domains
link |
01:48:09.600
in which humans are intelligent?
link |
01:48:12.920
Is it possible, right?
link |
01:48:13.920
So that's the hypothesis of strong AI.
link |
01:48:17.040
The answer to this, in my opinion, is an unqualified yes.
link |
01:48:20.700
This will as well happen at some point.
link |
01:48:22.640
There's no question that machines at some point
link |
01:48:25.300
will become more intelligent than humans
link |
01:48:26.640
in all domains where humans are intelligent.
link |
01:48:28.640
This is not for tomorrow.
link |
01:48:30.200
It is going to take a long time,
link |
01:48:32.240
regardless of what, you know,
link |
01:48:34.800
Elon and others have claimed or believed.
link |
01:48:38.120
This is a lot harder than many of those guys think it is.
link |
01:48:43.480
And many of those guys who thought it was simpler than that
link |
01:48:45.800
years, you know, five years ago,
link |
01:48:47.480
now think it's hard because it's been five years
link |
01:48:49.920
and they realize it's going to take a lot longer.
link |
01:48:53.460
That includes a bunch of people at DeepMind, for example.
link |
01:48:55.200
But...
link |
01:48:56.160
Oh, interesting.
link |
01:48:57.000
I haven't actually touched base with the DeepMind folks,
link |
01:48:59.320
but some of them, Elon or Demis Hassabis.
link |
01:49:03.280
I mean, sometimes in your role,
link |
01:49:05.800
you have to kind of create deadlines
link |
01:49:08.780
that are nearer than farther away
link |
01:49:10.720
to kind of create an urgency.
link |
01:49:12.800
Because, you know, you have to believe the impossible
link |
01:49:14.600
is possible in order to accomplish it.
link |
01:49:16.200
And there's, of course, a flip side to that coin,
link |
01:49:18.520
but it's a weird, you can't be too cynical
link |
01:49:21.280
if you want to get something done.
link |
01:49:22.400
Absolutely.
link |
01:49:23.360
I agree with that.
link |
01:49:24.280
But, I mean, you have to inspire people, right?
link |
01:49:26.920
To work on sort of ambitious things.
link |
01:49:31.400
So, you know, it's certainly a lot harder than we believe,
link |
01:49:35.620
but there's no question in my mind that this will happen.
link |
01:49:38.200
And now, you know, people are kind of worried about
link |
01:49:40.300
what does that mean for humans?
link |
01:49:42.480
They are going to be brought down from their pedestal,
link |
01:49:45.160
you know, a bunch of notches with that.
link |
01:49:47.980
And, you know, is that going to be good or bad?
link |
01:49:51.740
I mean, it's just going to give more power, right?
link |
01:49:53.480
It's an amplifier for human intelligence, really.
link |
01:49:56.200
So, speaking of doing cool, ambitious things,
link |
01:49:59.720
FAIR, the Facebook AI research group,
link |
01:50:02.920
has recently celebrated its eighth birthday.
link |
01:50:05.520
Or, maybe you can correct me on that.
link |
01:50:08.640
Looking back, what has been the successes, the failures,
link |
01:50:12.400
the lessons learned from the eight years of FAIR?
link |
01:50:14.440
And maybe you can also give context of
link |
01:50:16.600
where does the newly minted meta AI fit into,
link |
01:50:21.320
how does it relate to FAIR?
link |
01:50:22.640
Right, so let me tell you a little bit
link |
01:50:23.800
about the organization of all this.
link |
01:50:26.760
Yeah, FAIR was created almost exactly eight years ago.
link |
01:50:30.060
It wasn't called FAIR yet.
link |
01:50:31.240
It took that name a few months later.
link |
01:50:34.680
And at the time I joined Facebook,
link |
01:50:37.760
there was a group called the AI group
link |
01:50:39.520
that had about 12 engineers and a few scientists,
link |
01:50:43.560
like, you know, 10 engineers and two scientists
link |
01:50:45.480
or something like that.
link |
01:50:47.080
I ran it for three and a half years as a director,
link |
01:50:50.680
you know, hired the first few scientists
link |
01:50:52.380
and kind of set up the culture and organized it,
link |
01:50:55.040
you know, explained to the Facebook leadership
link |
01:50:57.880
what fundamental research was about
link |
01:51:00.200
and how it can work within industry
link |
01:51:03.640
and how it needs to be open and everything.
link |
01:51:07.240
And I think it's been an unqualified success
link |
01:51:12.360
in the sense that FAIR has simultaneously produced,
link |
01:51:17.800
you know, top level research
link |
01:51:19.560
and advanced the science and the technology,
link |
01:51:21.640
provided tools, open source tools,
link |
01:51:23.480
like PyTorch and many others,
link |
01:51:26.680
but at the same time has had a direct
link |
01:51:29.880
or mostly indirect impact on Facebook at the time,
link |
01:51:34.680
now Meta, in the sense that a lot of systems
link |
01:51:38.580
that Meta is built around now are based
link |
01:51:43.600
on research projects that started at FAIR.
link |
01:51:48.360
And so if you were to take out, you know,
link |
01:51:49.640
deep learning out of Facebook services now
link |
01:51:52.840
and Meta more generally,
link |
01:51:55.140
I mean, the company would literally crumble.
link |
01:51:57.760
I mean, it's completely built around AI these days.
link |
01:52:01.480
And it's really essential to the operations.
link |
01:52:04.000
So what happened after three and a half years
link |
01:52:06.640
is that I changed role, I became chief scientist.
link |
01:52:10.200
So I'm not doing day to day management of FAIR anymore.
link |
01:52:14.880
I'm more of a kind of, you know,
link |
01:52:17.120
think about strategy and things like that.
link |
01:52:18.880
And I carry my, I conduct my own research.
link |
01:52:21.440
I have, you know, my own kind of research group
link |
01:52:23.320
working on self supervised learning and things like this,
link |
01:52:25.320
which I didn't have time to do when I was director.
link |
01:52:28.240
So now FAIR is run by Joelle Pineau and Antoine Bordes together
link |
01:52:34.720
because FAIR is kind of split in two now.
link |
01:52:36.360
There's something called FAIR Labs,
link |
01:52:37.860
which is sort of bottom up science driven research
link |
01:52:40.940
and FAIR Accel, which is slightly more organized
link |
01:52:43.460
for bigger projects that require a little more
link |
01:52:46.440
kind of focus and more engineering support
link |
01:52:49.040
and things like that.
link |
01:52:49.880
So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel.
link |
01:52:52.920
Where are they located?
link |
01:52:54.520
It's delocalized all over.
link |
01:52:58.000
So there's no question that the leadership of the company
link |
01:53:02.540
believes that this was a very worthwhile investment.
link |
01:53:06.560
And what that means is that it's there for the long run.
link |
01:53:12.840
Right?
link |
01:53:13.680
So if you want to talk in these terms, which I don't like,
link |
01:53:17.720
this is a business model, if you want,
link |
01:53:19.560
where FAIR, despite being a very fundamental research lab
link |
01:53:23.680
brings a lot of value to the company,
link |
01:53:25.320
either mostly indirectly through other groups.
link |
01:53:29.920
Now what happened three and a half years ago
link |
01:53:31.600
when I stepped down was also the creation of Facebook AI,
link |
01:53:34.640
which was basically a larger organization
link |
01:53:37.700
that covers FAIR, so FAIR is included in it,
link |
01:53:41.740
but also has other organizations
link |
01:53:43.880
that are focused on applied research
link |
01:53:47.840
or advanced development of AI technology
link |
01:53:51.220
that is more focused on the products of the company.
link |
01:53:54.680
So less emphasis on fundamental research.
link |
01:53:56.640
Less fundamental, but it's still research.
link |
01:53:58.220
I mean, there's a lot of papers coming out
link |
01:53:59.760
of those organizations and the people are awesome
link |
01:54:03.960
and wonderful to interact with.
link |
01:54:06.400
But it serves as kind of a way
link |
01:54:10.680
to kind of scale up if you want sort of AI technology,
link |
01:54:15.720
which, you know, may be very experimental
link |
01:54:17.600
and sort of lab prototypes into things that are usable.
link |
01:54:20.600
So FAIR is a subset of Meta AI.
link |
01:54:23.040
Will FAIR become like KFC?
link |
01:54:24.800
It'll just keep the F.
link |
01:54:26.520
Nobody cares what the F stands for.
link |
01:54:29.440
We'll know soon enough, probably by the end of 2021.
link |
01:54:35.600
I guess it's not a giant change, MAIR, FAIR.
link |
01:54:38.400
Well, MAIR doesn't sound too good,
link |
01:54:39.520
but the brand people are kind of deciding on this
link |
01:54:43.560
and they've been hesitating for a while now.
link |
01:54:45.860
And they tell us they're going to come up with an answer
link |
01:54:48.480
as to whether FAIR is going to change name
link |
01:54:50.440
or whether we're going to change just the meaning of the F.
link |
01:54:53.480
That's a good call.
link |
01:54:54.300
I would keep FAIR and change the meaning of the F.
link |
01:54:56.160
That would be my preference.
link |
01:54:57.600
I would turn the F into fundamental AI research.
link |
01:55:02.280
Oh, that's really good.
link |
01:55:03.120
Within Meta AI.
link |
01:55:04.280
So this would be meta FAIR,
link |
01:55:06.720
but people will call it FAIR, right?
link |
01:55:08.320
Yeah, exactly.
link |
01:55:09.320
I like it.
link |
01:55:10.160
And now Meta AI is part of the Reality Lab.
link |
01:55:16.680
So Meta now, the new Facebook is called Meta
link |
01:55:21.760
and it's kind of divided into Facebook, Instagram, WhatsApp
link |
01:55:30.400
and Reality Lab.
link |
01:55:32.920
And Reality Lab is about AR, VR, telepresence,
link |
01:55:37.920
communication technology and stuff like that.
link |
01:55:40.520
It's kind of the, you can think of it as the sort of,
link |
01:55:44.200
a combination of sort of new products
link |
01:55:47.920
and technology part of Meta.
link |
01:55:51.960
Is that where the touch sensing for robots,
link |
01:55:54.240
I saw that you were posting about that.
link |
01:55:56.120
Touch sensing for robot is part of FAIR actually.
link |
01:55:58.240
That's a FAIR project.
link |
01:55:59.080
Oh, it is.
link |
01:55:59.920
Okay, cool.
link |
01:56:00.740
Yeah, this is also the, no, but there is the other way,
link |
01:56:03.040
the haptic glove, right?
link |
01:56:05.680
Yes, that's more Reality Lab.
link |
01:56:07.640
That's Reality Lab research.
link |
01:56:10.760
Reality Lab research.
link |
01:56:11.960
By the way, the touch sensors are super interesting.
link |
01:56:14.400
Like integrating that modality
link |
01:56:16.120
into the whole sensing suite is very interesting.
link |
01:56:20.120
So what do you think about the Metaverse?
link |
01:56:23.680
What do you think about this whole kind of expansion
link |
01:56:27.820
of the view of the role of Facebook and Meta in the world?
link |
01:56:30.920
Well, Metaverse really should be thought of
link |
01:56:32.520
as the next step in the internet, right?
link |
01:56:35.360
Sort of trying to kind of make the experience
link |
01:56:41.760
more compelling of being connected
link |
01:56:46.280
either with other people or with content.
link |
01:56:49.520
And we are evolved and trained to evolve
link |
01:56:54.000
in 3D environments where we can see other people.
link |
01:56:58.680
We can talk to them when we're near them
link |
01:57:01.080
or an other viewer far away can't hear us,
link |
01:57:04.360
things like that, right?
link |
01:57:05.200
So there's a lot of social conventions
link |
01:57:08.080
that exist in the real world that we can try to transpose.
link |
01:57:10.800
Now, what is going to be eventually the,
link |
01:57:15.120
how compelling is it going to be?
link |
01:57:16.240
Like, is it going to be the case
link |
01:57:18.740
that people are going to be willing to do this
link |
01:57:21.300
if they have to wear a huge pair of goggles all day?
link |
01:57:24.600
Maybe not.
link |
01:57:26.400
But then again, if the experience
link |
01:57:27.480
is sufficiently compelling, maybe so.
link |
01:57:30.320
Or if the device that you have to wear
link |
01:57:32.200
is just basically a pair of glasses,
link |
01:57:34.560
and technology makes sufficient progress for that.
link |
01:57:38.400
AR is a much easier concept to grasp
link |
01:57:41.560
that you're going to have augmented reality glasses
link |
01:57:45.000
that basically contain some sort of virtual assistant
link |
01:57:48.640
that can help you in your daily lives.
link |
01:57:50.280
But at the same time with the AR,
link |
01:57:51.920
you have to contend with reality.
link |
01:57:53.480
With VR, you can completely detach yourself from reality.
link |
01:57:55.880
So it gives you freedom.
link |
01:57:57.200
It might be easier to design worlds in VR.
link |
01:58:00.360
Yeah, but you can imagine the metaverse
link |
01:58:02.900
being a mix, right?
link |
01:58:06.520
Or like, you can have objects that exist in the metaverse
link |
01:58:09.280
that pop up on top of the real world,
link |
01:58:11.200
or only exist in virtual reality.
link |
01:58:14.380
Okay, let me ask the hard question.
link |
01:58:17.080
Oh, because all of this was easy so far.
link |
01:58:18.520
This was easy.
link |
01:58:20.680
The Facebook, now Meta, the social network
link |
01:58:24.280
has been painted by the media as a net negative for society,
link |
01:58:28.280
even destructive and evil at times.
link |
01:58:30.840
You've pushed back against this, defending Facebook.
link |
01:58:34.080
Can you explain your defense?
link |
01:58:36.560
Yeah, so the description,
link |
01:58:38.640
the company that is being described in some media
link |
01:58:43.960
is not the company we know when we work inside.
link |
01:58:47.360
And it could be claimed that a lot of employees
link |
01:58:52.080
are uninformed about what really goes on in the company,
link |
01:58:54.600
but I'm a vice president.
link |
01:58:56.520
I mean, I have a pretty good vision of what goes on.
link |
01:58:58.920
I don't know everything, obviously.
link |
01:59:00.200
I'm not involved in everything,
link |
01:59:01.860
but certainly not in decision about content moderation
link |
01:59:05.320
or anything like this,
link |
01:59:06.160
but I have some decent vision of what goes on.
link |
01:59:10.160
And this evil that is being described, I just don't see it.
link |
01:59:13.660
And then I think there is an easy story to buy,
link |
01:59:18.200
which is that all the bad things in the world
link |
01:59:21.760
and the reason your friend believe crazy stuff,
link |
01:59:25.160
there's an easy scapegoat in social media in general,
link |
01:59:32.800
Facebook in particular.
link |
01:59:34.480
But you have to look at the data.
link |
01:59:35.720
Is it the case that Facebook, for example,
link |
01:59:40.080
polarizes people politically?
link |
01:59:42.720
Are there academic studies that show this?
link |
01:59:45.220
Is it the case that teenagers think less of themselves
link |
01:59:50.280
if they use Instagram more?
link |
01:59:52.160
Is it the case that people get more riled up
link |
01:59:57.280
against opposite sides in a debate or political opinion
link |
02:00:02.680
if they are more on Facebook or if they are less?
link |
02:00:05.720
And study after study show that none of this is true.
link |
02:00:10.880
This is independent studies by academic.
link |
02:00:12.400
They're not funded by Facebook or Meta.
link |
02:00:15.880
Study by Stanford, by some of my colleagues at NYU actually
link |
02:00:18.640
with whom I have no connection.
link |
02:00:20.140
There's a study recently, they paid people,
link |
02:00:24.980
I think it was in former Yugoslavia,
link |
02:00:29.940
I'm not exactly sure in what part,
link |
02:00:31.820
but they paid people to not use Facebook for a while
link |
02:00:34.380
in the period before the anniversary
link |
02:00:40.240
of the Srebrenica massacres.
link |
02:00:43.540
So people get riled up, like should we have a celebration?
link |
02:00:47.800
I mean, a memorial kind of celebration for it or not.
link |
02:00:51.120
So they paid a bunch of people
link |
02:00:52.540
to not use Facebook for a few weeks.
link |
02:00:56.260
And it turns out that those people ended up
link |
02:00:59.580
being more polarized than they were at the beginning
link |
02:01:02.660
and the people who were more on Facebook were less polarized.
link |
02:01:06.660
There's a study from Stanford of economists at Stanford
link |
02:01:10.460
that try to identify the causes
link |
02:01:12.660
of increasing polarization in the US.
link |
02:01:16.000
And it's been going on for 40 years
link |
02:01:17.820
before Mark Zuckerberg was born continuously.
link |
02:01:22.540
And so if there is a cause,
link |
02:01:25.620
it's not Facebook or social media.
link |
02:01:27.620
So you could say if social media just accelerated,
link |
02:01:29.580
but no, I mean, it's basically a continuous evolution
link |
02:01:33.060
by some measure of polarization in the US.
link |
02:01:35.820
And then you compare this with other countries
link |
02:01:37.660
like the West half of Germany
link |
02:01:41.460
because you can't go back 40 years on the East side
link |
02:01:44.700
or Denmark or other countries.
link |
02:01:47.380
And they use Facebook just as much
link |
02:01:49.460
and they're not getting more polarized,
link |
02:01:50.700
they're getting less polarized.
link |
02:01:52.040
So if you want to look for a causal relationship there,
link |
02:01:57.640
you can find a scapegoat, but you can't find a cause.
link |
02:01:59.840
Now, if you want to fix the problem,
link |
02:02:01.720
you have to find the right cause.
link |
02:02:03.180
And what riles me up is that people now are accusing Facebook
link |
02:02:07.720
of bad deeds that are done by others
link |
02:02:09.300
and those others, we're not doing anything about them.
link |
02:02:12.380
And by the way, those others include the owner
link |
02:02:14.820
of the Wall Street Journal
link |
02:02:15.660
in which all of those papers were published.
link |
02:02:17.700
So I should mention that I'm talking to Schrep,
link |
02:02:20.060
Mike Schrep on this podcast and also Mark Zuckerberg
link |
02:02:23.460
and probably these are conversations you can have with them
link |
02:02:26.340
because it's very interesting to me,
link |
02:02:27.620
even if Facebook has some measurable negative effect,
link |
02:02:31.900
you can't just consider that in isolation.
link |
02:02:33.780
You have to consider about all the positive ways
link |
02:02:35.940
that it connects us.
link |
02:02:36.820
So like every technology.
link |
02:02:38.140
It connects people, it's a question.
link |
02:02:39.660
You can't just say like there's an increase in division.
link |
02:02:43.880
Yes, probably Google search engine
link |
02:02:46.100
has created increase in division.
link |
02:02:47.900
But you have to consider about how much information
link |
02:02:49.900
are brought to the world.
link |
02:02:51.140
Like I'm sure Wikipedia created more division.
link |
02:02:53.700
If you just look at the division,
link |
02:02:55.340
we have to look at the full context of the world
link |
02:02:57.700
and whether they made a better world.
link |
02:02:59.100
And you have to.
link |
02:02:59.940
The printing press has created more division, right?
link |
02:03:01.660
Exactly.
link |
02:03:02.500
I mean, so when the printing press was invented,
link |
02:03:06.900
the first books that were printed were things like the Bible
link |
02:03:10.780
and that allowed people to read the Bible by themselves,
link |
02:03:13.780
not get the message uniquely from priests in Europe.
link |
02:03:17.400
And that created the Protestant movement
link |
02:03:20.340
and 200 years of religious persecution and wars.
link |
02:03:23.660
So that's a bad side effect of the printing press.
link |
02:03:26.180
Social networks aren't being nearly as bad
link |
02:03:28.500
as the printing press,
link |
02:03:29.320
but nobody would say the printing press was a bad idea.
link |
02:03:33.520
Yeah, a lot of it is perception
link |
02:03:35.100
and there's a lot of different incentives operating here.
link |
02:03:38.420
Maybe a quick comment,
link |
02:03:40.020
since you're one of the top leaders at Facebook
link |
02:03:42.700
and at Meta, sorry, that's in the tech space,
link |
02:03:46.760
I'm sure Facebook involves a lot of incredible
link |
02:03:49.700
technological challenges that need to be solved.
link |
02:03:52.900
A lot of it probably is in the computer infrastructure,
link |
02:03:55.000
the hardware, I mean, it's just a huge amount.
link |
02:03:58.920
Maybe can you give me context about how much of Schrep's life
link |
02:04:03.580
is AI and how much of it is low level compute?
link |
02:04:06.240
How much of it is flying all around doing business stuff?
link |
02:04:09.580
And the same with Mark Zuckerberg.
link |
02:04:12.000
They really focus on AI.
link |
02:04:13.740
I mean, certainly in the run up of the creation of FAIR
link |
02:04:19.520
and for at least a year after that, if not more,
link |
02:04:24.060
Mark was very, very much focused on AI
link |
02:04:26.700
and was spending quite a lot of effort on it.
link |
02:04:29.700
And that's his style.
link |
02:04:30.780
When he gets interested in something,
link |
02:04:32.060
he reads everything about it.
link |
02:04:34.100
He read some of my papers, for example, before I joined.
link |
02:04:39.620
And so he learned a lot about it.
link |
02:04:41.860
He said he liked notes.
link |
02:04:43.740
Right.
link |
02:04:46.460
And Schrep was really into it also.
link |
02:04:51.100
I mean, Schrep is really kind of,
link |
02:04:54.780
has something I've tried to preserve also
link |
02:04:57.940
despite my not so young age,
link |
02:05:00.180
which is a sense of wonder about science and technology.
link |
02:05:03.180
And he certainly has that.
link |
02:05:06.300
He's also a wonderful person.
link |
02:05:07.420
I mean, in terms of like as a manager,
link |
02:05:10.380
like dealing with people and everything.
link |
02:05:12.140
Mark also, actually.
link |
02:05:14.540
I mean, they're very human people.
link |
02:05:18.020
In the case of Mark, it's shockingly human
link |
02:05:20.600
given his trajectory.
link |
02:05:25.460
I mean, the personality of him that is painted in the press,
link |
02:05:28.100
it's just completely wrong.
link |
02:05:29.620
Yeah.
link |
02:05:30.460
But you have to know how to play the press.
link |
02:05:31.980
So that's, I put some of that responsibility on him too.
link |
02:05:36.220
You have to, it's like, you know,
link |
02:05:40.980
like the director, the conductor of an orchestra,
link |
02:05:44.300
you have to play the press and the public
link |
02:05:46.980
in a certain kind of way
link |
02:05:48.020
where you convey your true self to them.
link |
02:05:49.740
If there's a depth and kindness to it.
link |
02:05:51.060
It's hard.
link |
02:05:51.900
And he's probably not the best at it.
link |
02:05:53.740
So, yeah.
link |
02:05:56.460
You have to learn.
link |
02:05:57.700
And it's sad to see, and I'll talk to him about it,
link |
02:06:00.460
but Schrep is slowly stepping down.
link |
02:06:04.060
It's always sad to see folks sort of be there
link |
02:06:07.500
for a long time and slowly.
link |
02:06:09.420
I guess time is sad.
link |
02:06:11.220
I think he's done the thing he set out to do.
link |
02:06:14.780
And, you know, he's got, you know,
link |
02:06:19.700
family priorities and stuff like that.
link |
02:06:21.420
And I understand, you know, after 13 years or something.
link |
02:06:27.900
It's been a good run.
link |
02:06:28.900
Which in Silicon Valley is basically a lifetime.
link |
02:06:32.100
Yeah.
link |
02:06:32.940
You know, because, you know, it's dog years.
link |
02:06:35.000
So, NeurIPS, the conference just wrapped up.
link |
02:06:38.660
Let me just go back to something else.
link |
02:06:40.580
You posted that a paper you coauthored
link |
02:06:42.500
was rejected from NeurIPS.
link |
02:06:44.440
As you said, proudly, in quotes, rejected.
link |
02:06:48.020
It's a joke.
link |
02:06:48.940
Yeah, I know.
link |
02:06:49.760
So, can you describe this paper?
link |
02:06:53.260
And like, what was the idea in it?
link |
02:06:55.700
And also, maybe this is a good opportunity to ask
link |
02:06:59.060
what are the pros and cons, what works and what doesn't
link |
02:07:01.740
about the review process?
link |
02:07:03.620
Yeah, let me talk about the paper first.
link |
02:07:04.980
I'll talk about the review process afterwards.
link |
02:07:09.220
The paper is called VICReg.
link |
02:07:10.700
So, this is, I mentioned that before.
link |
02:07:12.540
Variance, invariance, covariance, regularization.
link |
02:07:14.900
And it's a technique, a noncontrastive learning technique
link |
02:07:18.260
for what I call joint embedding architecture.
link |
02:07:21.300
So, Siamese nets are an example
link |
02:07:23.380
of joint embedding architecture.
link |
02:07:24.860
So, joint embedding architecture is,
link |
02:07:29.220
let me back up a little bit, right?
link |
02:07:30.600
So, if you want to do self supervised learning,
link |
02:07:33.300
you can do it by prediction.
link |
02:07:36.440
So, let's say you want to train the system
link |
02:07:37.920
to predict video, right?
link |
02:07:38.760
You show it a video clip and you train the system
link |
02:07:42.500
to predict the next, the continuation of that video clip.
link |
02:07:45.040
Now, because you need to handle uncertainty,
link |
02:07:47.800
because there are many continuations that are plausible,
link |
02:07:51.580
you need to have, you need to handle this in some way.
link |
02:07:54.020
You need to have a way for the system
link |
02:07:56.660
to be able to produce multiple predictions.
link |
02:08:00.620
And the way, the only way I know to do this
link |
02:08:03.500
is through what's called a latent variable.
link |
02:08:05.420
So, you have some sort of hidden vector
link |
02:08:08.780
of a variable that you can vary over a set
link |
02:08:11.180
or draw from a distribution.
link |
02:08:12.580
And as you vary this vector over a set,
link |
02:08:14.500
the output, the prediction varies
link |
02:08:16.000
over a set of plausible predictions, okay?
link |
02:08:18.740
So, that's called,
link |
02:08:19.580
I call this a generative latent variable model.
link |
02:08:24.140
Got it.
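To make the latent-variable idea concrete, here is a minimal sketch, assuming PyTorch; the module, the dimensions, and the unit-Gaussian prior are illustrative choices, not a specific published model. Sampling the latent z several times for the same context produces several plausible predictions.

```python
# A minimal sketch of a generative latent-variable predictor (assumed PyTorch).
# Varying the latent z over draws from a prior yields a set of plausible
# continuations for the same observed context.
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    def __init__(self, ctx_dim=128, z_dim=16, out_dim=128):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(ctx_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, context, z):
        # context: encoding of the observed part of the clip
        # z: latent variable drawn from a prior distribution
        return self.decoder(torch.cat([context, z], dim=-1))

model = LatentPredictor()
context = torch.randn(1, 128)        # stand-in for an encoded video segment
for _ in range(3):                   # three draws of z -> three plausible continuations
    z = torch.randn(1, 16)           # sample the latent from a unit Gaussian prior
    prediction = model(context, z)
    print(prediction.shape)          # torch.Size([1, 128])
```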
link |
02:08:24.980
Okay, now there is an alternative to this,
link |
02:08:27.060
to handle uncertainty.
link |
02:08:28.700
And instead of directly predicting the next frames
link |
02:08:33.380
of the clip, you also run those through another neural net.
link |
02:08:41.080
So, you now have two neural nets,
link |
02:08:42.500
one that looks at the initial segment of the video clip,
link |
02:08:48.700
and another one that looks at the continuation
link |
02:08:51.260
during training, right?
link |
02:08:53.560
And what you're trying to do is learn a representation
link |
02:08:57.680
of those two video clips that is maximally informative
link |
02:09:00.780
about the video clips themselves,
link |
02:09:03.460
but is such that you can predict the representation
link |
02:09:07.180
of the second video clip
link |
02:09:08.580
from the representation of the first one easily, okay?
link |
02:09:12.340
And you can sort of formalize this
link |
02:09:13.580
in terms of maximizing mutual information
link |
02:09:15.340
and some stuff like that, but it doesn't matter.
link |
02:09:18.140
What you want is informative representations
link |
02:09:24.540
of the two video clips that are mutually predictable.
link |
02:09:28.460
What that means is that there's a lot of details
link |
02:09:30.900
in the second video clips that are irrelevant.
link |
02:09:36.500
Let's say a video clip consists in a camera panning
link |
02:09:40.500
the scene, there's gonna be a piece of that room
link |
02:09:43.340
that is gonna be revealed, and I can somewhat predict
link |
02:09:46.180
what that room is gonna look like,
link |
02:09:48.060
but I may not be able to predict the details
link |
02:09:50.220
of the texture of the ground
link |
02:09:52.300
and where the tiles are ending and stuff like that, right?
link |
02:09:54.500
So, those are irrelevant details
link |
02:09:56.360
that perhaps my representation will eliminate.
link |
02:09:59.620
And so, what I need is to train this second neural net
link |
02:10:03.680
in such a way that whenever the continuation video clip
link |
02:10:08.680
varies over all the plausible continuations,
link |
02:10:13.600
the representation doesn't change.
link |
02:10:15.600
Got it.
link |
02:10:16.440
So, it's the, yeah, yeah, got it.
link |
02:10:18.100
Over the space of the representations,
link |
02:10:20.860
doing the same kind of thing
link |
02:10:21.880
as you do with similarity learning.
link |
02:10:24.300
Right.
link |
02:10:25.680
So, these are two ways to handle multimodality
link |
02:10:28.840
in a prediction, right?
link |
02:10:29.680
In the first way, you parameterize the prediction
link |
02:10:32.280
with a latent variable,
link |
02:10:33.480
but you predict pixels essentially, right?
link |
02:10:35.800
In the second one, you don't predict pixels,
link |
02:10:38.400
you predict an abstract representation of pixels,
link |
02:10:40.720
and you guarantee that this abstract representation
link |
02:10:43.480
has as much information as possible about the input,
link |
02:10:46.200
but sort of, you know,
link |
02:10:47.080
drops all the stuff that you really can't predict,
link |
02:10:49.740
essentially.
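Here is a minimal sketch of the second approach, a joint embedding architecture, assuming PyTorch; the shared MLP encoder and the "stand-in features" are illustrative, not the actual networks discussed here. Two branches encode two related views, and a training criterion (like the VICReg-style loss sketched further below) is then applied to the resulting representations, with no pixels ever reconstructed.

```python
# A minimal sketch of a joint embedding (Siamese) architecture (assumed PyTorch).
import torch
import torch.nn as nn

encoder = nn.Sequential(              # shared weights -> a Siamese pair of branches
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128),
)

segment = torch.randn(32, 512)        # stand-in features for the observed segment
continuation = torch.randn(32, 512)   # stand-in features for its continuation

z_a = encoder(segment)                # representation of the first view
z_b = encoder(continuation)           # representation of the second view
# A criterion on z_a and z_b then makes the representations informative
# and mutually predictable, rather than predicting pixels directly.
```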
link |
02:10:52.120
I used to be a big fan of the first approach.
link |
02:10:53.880
And in fact, in this paper with Ishan Misra,
link |
02:10:55.880
this blog post, the Dark Matter of Intelligence,
link |
02:10:58.400
I was kind of advocating for this.
link |
02:10:59.760
And in the last year and a half,
link |
02:11:01.600
I've completely changed my mind.
link |
02:11:02.840
I'm now a big fan of the second one.
link |
02:11:04.640
And it's because of a small collection of algorithms
link |
02:11:10.000
that have been proposed over the last year and a half or so,
link |
02:11:13.680
two years, to do this, including VICReg,
link |
02:11:17.800
its predecessor called Barlow Twins,
link |
02:11:19.600
which I mentioned, a method from our friends at DeepMind
link |
02:11:23.560
called BYOL, and there's a bunch of others now
link |
02:11:28.500
that kind of work similarly.
link |
02:11:29.600
So, they're all based on this idea of joint embedding.
link |
02:11:32.600
Some of them have an explicit criterion
link |
02:11:34.660
that is an approximation of mutual information.
link |
02:11:36.640
Some others, like BYOL, work, but we don't really know why.
link |
02:11:39.400
And there's been like lots of theoretical papers
link |
02:11:41.240
about why BYOL works.
link |
02:11:42.360
No, it's not that, because we take it out
link |
02:11:43.940
and it still works, and blah, blah, blah.
link |
02:11:46.040
I mean, so there's like a big debate,
link |
02:11:47.800
but the important point is that we now have a collection
link |
02:11:51.540
of noncontrastive joint embedding methods,
link |
02:11:53.720
which I think is the best thing since sliced bread.
link |
02:11:56.400
So, I'm super excited about this
link |
02:11:58.320
because I think it's our best shot
link |
02:12:01.200
for techniques that would allow us
link |
02:12:02.720
to kind of build predictive world models.
link |
02:12:06.360
And at the same time,
link |
02:12:07.440
learn hierarchical representations of the world,
link |
02:12:09.920
where what matters about the world is preserved
link |
02:12:11.840
and what is irrelevant is eliminated.
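To give content to the "variance, invariance, covariance" name mentioned above, here is a compact sketch of a VICReg-style noncontrastive criterion, assuming PyTorch; the hinge target of 1 and the relative loss weights roughly follow the paper's recipe, but treat the exact coefficients and dimensions here as illustrative rather than a reference implementation. The variance term keeps each embedding dimension spread out across the batch, which is what prevents the collapsed solution that contrastive methods avoid with negative pairs.

```python
# A compact sketch of a VICReg-style noncontrastive criterion (assumed PyTorch).
# z_a and z_b are embeddings of two related views (e.g., two crops, or a clip
# and its continuation) from the two branches of a joint embedding architecture.
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance: the two representations should be predictable from each other.
    sim = F.mse_loss(z_a, z_b)

    # Variance: keep each dimension's std above 1 across the batch to avoid collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = torch.relu(1.0 - std_a).mean() + torch.relu(1.0 - std_b).mean()

    # Covariance: decorrelate dimensions so they carry non-redundant information.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d

    return sim_w * sim + var_w * var + cov_w * cov

# Example: embeddings from the two branches for a batch of 256 pairs.
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_loss(z_a, z_b)
```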
link |
02:12:14.440
And by the way, the representations,
link |
02:12:15.880
the before and after, is it in the space
link |
02:12:19.200
of a sequence of images, or is it for single images?
link |
02:12:22.320
It would be either for a single image or for a sequence.
link |
02:12:24.600
It doesn't have to be images.
link |
02:12:25.660
This could be applied to text.
link |
02:12:26.680
This could be applied to just about any signal.
link |
02:12:28.560
I'm looking for methods that are generally applicable
link |
02:12:32.960
that are not specific to one particular modality.
link |
02:12:36.200
It could be audio or whatever.
link |
02:12:37.640
Got it.
link |
02:12:38.460
So, what's the story behind this paper?
link |
02:12:40.120
This paper is describing one such method?
link |
02:12:43.480
It's this VICReg method.
link |
02:12:44.480
So, this is coauthored.
link |
02:12:45.720
The first author is a student called Adrien Bardes,
link |
02:12:49.280
who is a resident PhD student at FAIR Paris,
link |
02:12:52.680
who is coadvised by me and Jean Ponce,
link |
02:12:55.800
who is a professor at École Normale Supérieure,
link |
02:12:58.720
also a research director at INRIA.
link |
02:13:01.600
So, this is a wonderful program in France
link |
02:13:03.600
where PhD students can basically do their PhD in industry,
link |
02:13:06.640
and that's kind of what's happening here.
link |
02:13:10.440
And this paper is a follow-up on this Barlow Twins paper
link |
02:13:15.480
by my former postdoc, Stéphane Deny,
link |
02:13:18.360
with Li Jing and Jure Zbontar
link |
02:13:21.560
and a bunch of other people from FAIR.
link |
02:13:24.720
And one of the main criticisms from reviewers
link |
02:13:27.840
is that VICReg is not different enough from Barlow Twins.
link |
02:13:31.400
But, you know, my impression is that it's, you know,
link |
02:13:36.720
Barlow Twins with a few bugs fixed, essentially,
link |
02:13:39.880
and in the end, this is what people will use.
link |
02:13:43.200
Right, so.
link |
02:13:44.520
But, you know, I'm used to stuff
link |
02:13:47.080
that I submit being rejected for a while.
link |
02:13:49.040
So, it might be rejected and actually exceptionally well cited
link |
02:13:51.360
because people use it.
link |
02:13:52.280
Well, it's already cited like a bunch of times.
link |
02:13:54.360
So, I mean, the question is then to the deeper question
link |
02:13:57.600
about peer review and conferences.
link |
02:14:00.240
I mean, computer science is a field that's kind of unique
link |
02:14:02.600
in that conferences are highly prized.
link |
02:14:04.960
That's one.
link |
02:14:05.800
Right.
link |
02:14:06.640
And it's interesting because the peer review process there
link |
02:14:09.120
is similar, I suppose, to journals,
link |
02:14:11.080
but it's accelerated significantly.
link |
02:14:13.640
Well, not significantly, but it goes fast.
link |
02:14:16.560
And it's a nice way to get stuff out quickly,
link |
02:14:19.760
to peer review it quickly,
link |
02:14:20.800
go to present it quickly to the community.
link |
02:14:22.640
So, not quickly, but quicker.
link |
02:14:25.160
Yeah.
link |
02:14:26.000
But nevertheless, it has many of the same flaws
link |
02:14:27.840
of peer review,
link |
02:14:29.120
because it's a limited number of people look at it.
link |
02:14:31.520
There's bias and the following,
link |
02:14:32.800
like that if you want to do new ideas,
link |
02:14:35.600
you're going to get pushback.
link |
02:14:38.120
There's self interested people that kind of can infer
link |
02:14:42.120
who submitted it and kind of, you know,
link |
02:14:45.320
be cranky about it, all that kind of stuff.
link |
02:14:47.760
Yeah, I mean, there's a lot of social phenomena there.
link |
02:14:51.040
There's one social phenomenon, which is that
link |
02:14:53.200
because the field has been growing exponentially,
link |
02:14:56.760
the vast majority of people in the field
link |
02:14:58.560
are extremely junior.
link |
02:15:00.000
Yeah.
link |
02:15:00.840
So, as a consequence,
link |
02:15:01.920
and that's just a consequence of the field growing, right?
link |
02:15:04.880
So, as the number of, as the size of the field
link |
02:15:07.840
kind of starts saturating,
link |
02:15:08.920
you will have less of that problem
link |
02:15:11.440
of reviewers being very inexperienced.
link |
02:15:15.360
A consequence of this is that, you know, young reviewers,
link |
02:15:20.160
I mean, there's a phenomenon which is that
link |
02:15:22.840
reviewers try to make their life easy
link |
02:15:24.640
and to make their life easy when reviewing a paper
link |
02:15:27.440
is very simple.
link |
02:15:28.280
You just have to find a flaw in the paper, right?
link |
02:15:29.960
So, basically they see the task as finding flaws in papers
link |
02:15:34.480
and most papers have flaws, even the good ones.
link |
02:15:36.720
Yeah.
link |
02:15:38.160
So, it's easy to, you know, to do that.
link |
02:15:41.480
Your job is easier as a reviewer if you just focus on this.
link |
02:15:46.440
But what's important is like,
link |
02:15:49.640
is there a new idea in that paper
link |
02:15:51.520
that is likely to influence?
link |
02:15:54.120
It doesn't matter if the experiments are not that great,
link |
02:15:56.240
if the protocol is, you know, so-so, you know,
link |
02:16:00.680
things like that.
link |
02:16:01.520
As long as there is a worthy idea in it
link |
02:16:05.040
that will influence the way people think about the problem,
link |
02:16:09.200
even if they make it better, you know, eventually,
link |
02:16:11.160
I think that's really what makes a paper useful.
link |
02:16:15.480
And so, this combination of social phenomena
link |
02:16:19.520
creates a disease that has plagued, you know,
link |
02:16:24.200
other fields in the past, like speech recognition,
link |
02:16:26.680
where basically, you know, people chase numbers
link |
02:16:28.560
on benchmarks and it's much easier to get a paper accepted
link |
02:16:34.680
if it brings an incremental improvement
link |
02:16:37.040
on a sort of mainstream well accepted method or problem.
link |
02:16:44.160
And those are, to me, boring papers.
link |
02:16:46.040
I mean, they're not useless, right?
link |
02:16:47.880
Because industry, you know, thrives
link |
02:16:50.560
on those kinds of progress,
link |
02:16:52.400
but they're not the ones that I'm interested in,
link |
02:16:54.080
in terms of like new concepts and new ideas.
link |
02:16:55.680
So, papers that are really trying to strike
link |
02:16:59.320
kind of new advances generally don't make it.
link |
02:17:02.600
Now, thankfully we have arXiv.
link |
02:17:04.240
arXiv, exactly.
link |
02:17:05.320
And then there's open review type of situations
link |
02:17:08.160
where you, and then, I mean, Twitter's a kind of open review.
link |
02:17:11.680
I'm a huge believer that review should be done
link |
02:17:13.880
by thousands of people, not two people.
link |
02:17:15.720
I agree.
link |
02:17:16.760
And so arXiv, like do you see a future
link |
02:17:19.560
where a lot of really strong papers,
link |
02:17:21.240
it's already the present, but a growing future
link |
02:17:23.640
where it'll just be arXiv
link |
02:17:26.280
and you're presenting an ongoing continuous conference
link |
02:17:31.280
called Twitter slash the internet slash arXiv Sanity.
link |
02:17:35.560
Andrej just released a new version.
link |
02:17:38.040
So just not, you know, not being so elitist
link |
02:17:40.920
about this particular gating.
link |
02:17:43.440
It's not a question of being elitist or not.
link |
02:17:44.960
It's a question of being basically recommendation
link |
02:17:50.120
and sort of approvals for people who don't see themselves
link |
02:17:53.400
as having the ability to do so by themselves, right?
link |
02:17:55.880
And so it saves time, right?
link |
02:17:57.320
If you rely on other people's opinion
link |
02:18:00.000
and you trust those people or those groups
link |
02:18:03.760
to evaluate a paper for you, that saves you time
link |
02:18:09.960
because, you know, you don't have to like scrutinize
link |
02:18:12.680
the paper as much, you know, when it's brought to your attention.
link |
02:18:15.200
I mean, it's the whole idea of sort of, you know,
link |
02:18:16.680
collective recommender system, right?
link |
02:18:18.760
So I actually thought about this a lot, you know,
link |
02:18:22.360
about 10, 15 years ago,
link |
02:18:24.200
because there were discussions at NIPS
link |
02:18:27.080
and, you know, we were about to create ICLR
link |
02:18:30.040
with Yoshua Bengio.
link |
02:18:31.200
And so I wrote a document kind of describing
link |
02:18:34.880
a reviewing system, which basically was, you know,
link |
02:18:38.040
you post your paper on some repository,
link |
02:18:39.720
let's say archive or now could be open review.
link |
02:18:42.560
And then you can form a reviewing entity,
link |
02:18:46.240
which is equivalent to a reviewing board, you know,
link |
02:18:48.840
of a journal or program committee of a conference.
link |
02:18:53.960
You have to list the members.
link |
02:18:55.600
And then that group reviewing entity can choose
link |
02:19:00.000
to review a particular paper spontaneously or not.
link |
02:19:03.720
There is no exclusive relationship anymore
link |
02:19:05.600
between a paper and a venue or reviewing entity.
link |
02:19:09.200
Any reviewing entity can review any paper
link |
02:19:12.720
or may choose not to.
link |
02:19:15.000
And then, you know, given evaluation,
link |
02:19:16.640
it's not published, not published,
link |
02:19:17.920
it's just an evaluation and a comment,
link |
02:19:20.320
which would be public, signed by the reviewing entity.
link |
02:19:23.680
And if it's signed by a reviewing entity,
link |
02:19:25.880
you know, it's one of the members of reviewing entity.
link |
02:19:27.760
So if the reviewing entity is, you know,
link |
02:19:30.680
Lex Fridman's, you know, preferred papers, right?
link |
02:19:33.720
You know, it's Lex Fridman writing the review.
link |
02:19:35.640
Yes, so for me, that's a beautiful system, I think.
link |
02:19:40.920
But in addition to that,
link |
02:19:42.880
it feels like there should be a reputation system
link |
02:19:45.800
for the reviewers.
link |
02:19:47.480
For the reviewing entities,
link |
02:19:49.040
not the reviewers individually.
link |
02:19:50.280
The reviewing entities, sure.
link |
02:19:51.720
But even within that, the reviewers too,
link |
02:19:53.880
because there's another thing here.
link |
02:19:57.120
It's not just the reputation,
link |
02:19:59.360
it's an incentive for an individual person to do great.
link |
02:20:02.680
Right now, in the academic setting,
link |
02:20:05.040
the incentive is kind of internal,
link |
02:20:07.880
just wanting to do a good job.
link |
02:20:09.240
But honestly, that's not a strong enough incentive
link |
02:20:11.240
to do a really good job in reading a paper,
link |
02:20:13.720
in finding the beautiful amidst the mistakes and the flaws
link |
02:20:16.400
and all that kind of stuff.
link |
02:20:17.760
Like if you're the person that first discovered
link |
02:20:20.760
a powerful paper, and you get to be proud of that discovery,
link |
02:20:25.120
then that gives a huge incentive to you.
link |
02:20:27.520
That's a big part of my proposal, actually,
link |
02:20:29.280
where I describe that as, you know,
link |
02:20:31.280
if your evaluation of papers is predictive
link |
02:20:35.280
of future success, okay,
link |
02:20:37.560
then your reputation should go up as a reviewing entity.
link |
02:20:42.560
So yeah, exactly.
link |
02:20:43.760
I mean, I even had a master's student
link |
02:20:46.280
in library science
link |
02:20:49.560
and computer science actually kind of work out exactly
link |
02:20:52.480
how that should work with formulas and everything.
link |
02:20:55.160
So in terms of implementation,
link |
02:20:56.800
do you think that's something that's doable?
link |
02:20:58.640
I mean, I've been sort of, you know,
link |
02:20:59.720
talking about this to sort of various people
link |
02:21:02.080
like, you know, Andrew McCallum, who started OpenReview.
link |
02:21:05.960
And the reason why we picked OpenReview
link |
02:21:07.800
for ICLR initially,
link |
02:21:09.120
even though it was very early for them,
link |
02:21:11.440
is because my hope was that ICLR,
link |
02:21:14.320
it was eventually going to kind of
link |
02:21:16.760
inaugurate this type of system.
link |
02:21:18.600
So ICLR kept the idea of open reviews.
link |
02:21:22.240
So where the reviews are, you know,
link |
02:21:23.840
published with a paper, which I think is very useful,
link |
02:21:27.320
but in many ways that's kind of reverted
link |
02:21:29.800
to more of a conventional type of conference
link |
02:21:33.280
for everything else.
link |
02:21:34.120
And that, I mean, I don't run ICLR.
link |
02:21:37.800
I'm just the president of the foundation,
link |
02:21:41.200
but you know, people who run it
link |
02:21:44.120
should make decisions about how to run it.
link |
02:21:45.680
And I'm not going to tell them because they are volunteers
link |
02:21:48.560
and I'm really thankful that they do that.
link |
02:21:50.360
So, but I'm saddened by the fact
link |
02:21:53.040
that we're not being innovative enough.
link |
02:21:57.120
Yeah, me too.
link |
02:21:57.960
I hope that changes.
link |
02:21:59.640
Yeah.
link |
02:22:00.480
Cause the communication of science broadly,
link |
02:22:02.040
but the communication of computer science ideas
link |
02:22:05.440
is how you make those ideas have impact, I think.
link |
02:22:08.400
Yeah, and I think, you know, a lot of this is
link |
02:22:11.440
because people have in their mind kind of an objective,
link |
02:22:16.200
which is, you know, fairness for authors
link |
02:22:19.120
and the ability to count points basically
link |
02:22:22.600
and give credits accurately.
link |
02:22:24.880
But that comes at the expense of the progress of science.
link |
02:22:28.880
So to some extent,
link |
02:22:29.720
we're slowing down the progress of science.
link |
02:22:32.160
And are we actually achieving fairness?
link |
02:22:34.440
And we're not achieving fairness.
link |
02:22:35.920
You know, we still have biases.
link |
02:22:37.880
You know, we're doing, you know, a double blind review,
link |
02:22:39.840
but you know, the biases are still there.
link |
02:22:44.360
There are different kinds of biases.
link |
02:22:46.720
You write that the phenomenon of emergence,
link |
02:22:49.360
collective behavior exhibited by a large collection
link |
02:22:51.680
of simple elements in interaction
link |
02:22:54.280
is one of the things that got you
link |
02:22:55.760
into neural nets in the first place.
link |
02:22:57.760
I love cellular automata.
link |
02:22:59.120
I love simple interacting elements
link |
02:23:02.000
and the things that emerge from them.
link |
02:23:04.040
Do you think we understand how complex systems can emerge
link |
02:23:07.880
from such simple components that interact simply?
link |
02:23:11.080
No, we don't.
link |
02:23:12.320
It's a big mystery.
link |
02:23:13.160
Also, it's a mystery for physicists.
link |
02:23:14.480
It's a mystery for biologists.
link |
02:23:17.000
You know, how is it that the universe around us
link |
02:23:22.000
seems to be increasing in complexity and not decreasing?
link |
02:23:25.120
I mean, that is a kind of curious property of physics
link |
02:23:29.640
that despite the second law of thermodynamics,
link |
02:23:32.320
we seem to be, you know, evolution and learning
link |
02:23:35.960
and et cetera seems to be kind of at least locally
link |
02:23:40.640
to increase complexity and not decrease it.
link |
02:23:44.000
So perhaps the ultimate purpose of the universe
link |
02:23:46.520
is to just get more complex.
link |
02:23:49.040
Have these, I mean, small pockets of beautiful complexity.
link |
02:23:55.120
Does that, cellular automata,
link |
02:23:57.120
these kinds of emergence of complex systems
link |
02:23:59.680
give you some intuition or guide your understanding
link |
02:24:04.120
of machine learning systems and neural networks and so on?
link |
02:24:06.680
Or are these, for you right now, disparate concepts?
link |
02:24:09.440
Well, it got me into it.
link |
02:24:10.880
You know, I discovered the existence of the perceptron
link |
02:24:15.600
when I was a college student, you know, by reading a book
link |
02:24:19.280
and it was a debate between Chomsky and Piaget
link |
02:24:21.680
and Seymour Papert from MIT was kind of singing the praise
link |
02:24:25.920
of the perceptron in that book.
link |
02:24:27.400
And that was the first time I heard about a learning machine,
link |
02:24:29.760
right, so I started digging into the literature
link |
02:24:31.360
and I found those papers, those books,
link |
02:24:33.560
which were basically transcription of workshops
link |
02:24:37.120
or conferences from the fifties and sixties
link |
02:24:39.880
about self organizing systems.
link |
02:24:42.160
So there were, there was a series of conferences
link |
02:24:44.560
on self organizing systems and there's books on this.
link |
02:24:48.160
Some of them are, you can actually get them
link |
02:24:50.200
at the internet archive, you know, the digital version.
link |
02:24:55.120
And there are like fascinating articles in there by,
link |
02:24:58.280
there's a guy whose name has been largely forgotten,
link |
02:25:00.360
Heinz von Förster, he's a German physicist
link |
02:25:04.520
who immigrated to the US and worked
link |
02:25:07.240
on self organizing systems in the fifties.
link |
02:25:11.320
And in the sixties he created at University of Illinois
link |
02:25:13.800
at Urbana-Champaign, he created the Biological
link |
02:25:16.440
Computer Laboratory, BCL, which was all about neural nets.
link |
02:25:21.680
Unfortunately, that was kind of towards the end
link |
02:25:23.440
of the popularity of neural nets.
link |
02:25:24.920
So that lab never kind of thrived very much,
link |
02:25:27.760
but he wrote a bunch of papers about self organization
link |
02:25:30.360
and about the mystery of self organization.
link |
02:25:33.480
An example he has is you take, imagine you are in space,
link |
02:25:37.000
there's no gravity and you have a big box
link |
02:25:38.880
with magnets in it, okay.
link |
02:25:42.200
You know, kind of rectangular magnets
link |
02:25:43.920
with North Pole on one end, South Pole on the other end.
link |
02:25:46.880
You shake the box gently and the magnets will kind of stick
link |
02:25:49.640
to themselves and probably form like complex structure,
link |
02:25:53.480
you know, spontaneously.
link |
02:25:55.280
You know, that could be an example of self organization,
link |
02:25:57.120
but you know, you have lots of examples,
link |
02:25:58.400
neural nets are an example of self organization too,
link |
02:26:01.280
you know, in many respect.
link |
02:26:03.080
And it's a bit of a mystery, you know,
link |
02:26:05.960
how like what is possible with this, you know,
link |
02:26:09.520
pattern formation in physical systems, in chaotic system
link |
02:26:12.960
and things like that, you know, the emergence of life,
link |
02:26:16.120
you know, things like that.
link |
02:26:16.960
So, you know, how does that happen?
link |
02:26:19.560
So it's a big puzzle for physicists as well.
link |
02:26:22.600
It feels like understanding this,
link |
02:26:24.720
the mathematics of emergence
link |
02:26:27.920
in some constrained situations
link |
02:26:29.720
might help us create intelligence,
link |
02:26:32.120
like help us add a little spice to the systems
link |
02:26:36.040
because you seem to be able to in complex systems
link |
02:26:40.960
with emergence to be able to get a lot from little.
link |
02:26:44.600
And so that seems like a shortcut
link |
02:26:47.000
to get big leaps in performance, but...
link |
02:26:51.120
But there's a missing concept that we don't have.
link |
02:26:55.000
Yeah.
link |
02:26:55.840
And it's something also I've been fascinated by
link |
02:26:58.440
since my undergrad days,
link |
02:27:00.720
and it's how you measure complexity, right?
link |
02:27:03.880
So we don't actually have good ways of measuring,
link |
02:27:06.960
or at least we don't have good ways of interpreting
link |
02:27:09.840
the measures that we have at our disposal.
link |
02:27:11.920
Like how do you measure the complexity of something, right?
link |
02:27:14.480
So there's all those things, you know,
link |
02:27:15.680
like, you know, Kolmogorov, Chaitin, Solomonoff complexity
link |
02:27:18.560
of, you know, the length of the shortest program
link |
02:27:20.920
that would generate a bit string can be thought of
link |
02:27:23.320
as the complexity of that bit string, right?
link |
02:27:26.840
I've been fascinated by that concept.
link |
02:27:28.200
The problem with that is that
link |
02:27:30.160
that complexity is defined up to a constant,
link |
02:27:32.840
which can be very large.
link |
02:27:34.920
Right.
link |
02:27:35.760
There are similar concepts that are derived from,
link |
02:27:37.840
you know, Bayesian probability theory,
link |
02:27:42.280
where, you know, the complexity of something
link |
02:27:44.520
is the negative log of its probability, essentially, right?
link |
02:27:48.360
And you have a complete equivalence between the two things.
link |
02:27:51.120
And there you would think, you know,
link |
02:27:52.120
the probability is something that's well defined mathematically,
link |
02:27:55.160
which means complexity is well defined.
link |
02:27:57.200
But it's not true.
link |
02:27:58.040
You need to have a model of the distribution.
link |
02:28:01.720
You may need to have a prior
link |
02:28:02.800
if you're doing Bayesian inference.
link |
02:28:04.200
And the prior plays the same role
link |
02:28:05.720
as the choice of the computer
link |
02:28:07.040
with which you measure Kolmogorov complexity.
link |
02:28:09.480
And so every measure of complexity we have
link |
02:28:12.040
has some arbitrariness,
link |
02:28:15.440
you know, an additive constant,
link |
02:28:16.840
which can be arbitrarily large.
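The two caveats above can be written compactly; the symbols K_U, c_{U,V}, and C_P are introduced here just for illustration.

```latex
% Invariance theorem: switching the universal machine from U to V changes
% Kolmogorov complexity only by an additive constant that does not depend on x.
\[ K_U(x) \;\le\; K_V(x) + c_{U,V} \]
% Bayesian counterpart: complexity as negative log-probability under a model P;
% the choice of the prior P plays the same role as the choice of the machine.
\[ C_P(x) \;=\; -\log_2 P(x) \]
```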
link |
02:28:19.560
And so, you know, how can we come up with a good theory
link |
02:28:23.360
of how things become more complex
link |
02:28:24.640
if we don't have a good measure of complexity?
link |
02:28:26.080
Yeah, which we need for this.
link |
02:28:28.200
One way that people study this in the space of biology,
link |
02:28:32.240
the people that study the origin of life
link |
02:28:33.760
or try to recreate the life in the laboratory.
link |
02:28:37.120
And the more interesting one is the alien one,
link |
02:28:39.200
is when we go to other planets,
link |
02:28:41.320
how do we recognize this life?
link |
02:28:43.960
Because, you know, complexity, we associate complexity,
link |
02:28:46.800
maybe some level of mobility with life.
link |
02:28:50.000
You know, we have to be able to, like,
link |
02:28:51.680
have concrete algorithms for, like,
link |
02:28:57.200
measuring the level of complexity we see
link |
02:29:00.000
in order to know the difference between life and non life.
link |
02:29:02.760
And the problem is that complexity
link |
02:29:04.040
is in the eye of the beholder.
link |
02:29:05.440
So let me give you an example.
link |
02:29:07.480
If I give you an image of the MNIST digits, right,
link |
02:29:13.240
and I flip through MNIST digits,
link |
02:29:15.400
there is obviously some structure to it
link |
02:29:18.120
because local structure, you know,
link |
02:29:20.440
neighboring pixels are correlated
link |
02:29:23.200
across the entire data set.
link |
02:29:25.440
I imagine that I apply a random permutation
link |
02:29:30.440
to all the pixels, a fixed random permutation.
link |
02:29:33.920
Now I show you those images,
link |
02:29:35.360
they will look, you know, really disorganized to you,
link |
02:29:38.880
more complex.
link |
02:29:40.680
In fact, they're not more complex in absolute terms,
link |
02:29:42.880
they're exactly the same as originally, right?
link |
02:29:45.480
And if you knew what the permutation was,
link |
02:29:46.960
you know, you could undo the permutation.
link |
02:29:49.440
Now, imagine I give you special glasses
link |
02:29:52.360
that undo that permutation.
link |
02:29:54.120
Now, all of a sudden, what looked complicated
link |
02:29:56.160
becomes simple.
link |
02:29:57.000
Right.
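A small sketch of the permutation thought experiment, assuming NumPy; the random images stand in for flattened MNIST digits. The scrambled images contain exactly the same information as the originals, and the inverse permutation plays the role of the special glasses.

```python
# A small sketch of the fixed-permutation thought experiment (assumed NumPy).
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(16, 28 * 28))   # stand-in for flattened MNIST digits

perm = rng.permutation(28 * 28)                      # one fixed random permutation
scrambled = images[:, perm]                          # looks like noise to the naked eye

inverse = np.argsort(perm)                           # the "special glasses"
restored = scrambled[:, inverse]
assert np.array_equal(restored, images)              # nothing was lost, only rearranged
```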
link |
02:29:57.920
So if you have two, if you have, you know,
link |
02:30:00.400
humans on one end, and then another race of aliens
link |
02:30:03.280
that sees the universe with permutation glasses.
link |
02:30:05.440
Yeah, with the permutation glasses.
link |
02:30:06.600
Okay, what we perceive as simple to them
link |
02:30:09.800
is hopelessly complicated, it's probably heat.
link |
02:30:11.760
Yeah.
link |
02:30:12.600
Heat, yeah.
link |
02:30:13.440
Okay, and what they perceive as simple to us
link |
02:30:15.320
is random fluctuation, it's heat.
link |
02:30:18.480
Yeah.
link |
02:30:19.320
Yeah, it's truly in the eye of the beholder.
link |
02:30:22.760
Yeah.
link |
02:30:23.600
It depends what kind of glasses you're wearing.
link |
02:30:24.920
Right.
link |
02:30:25.760
It depends what kind of algorithm you're running
link |
02:30:26.840
in your perception system.
link |
02:30:28.360
So I don't think we'll have a theory of intelligence,
link |
02:30:31.080
self organization, evolution, things like this,
link |
02:30:34.320
until we have a good handle on a notion of complexity
link |
02:30:38.520
which we know is in the eye of the beholder.
link |
02:30:42.320
Yeah, it's sad to think that we might not be able
link |
02:30:44.400
to detect or interact with alien species
link |
02:30:47.600
because we're wearing different glasses.
link |
02:30:50.280
Because their notion of locality
link |
02:30:51.440
might be different from ours.
link |
02:30:52.400
Yeah, exactly.
link |
02:30:53.240
This actually connects with fascinating questions
link |
02:30:55.200
in physics at the moment, like modern physics,
link |
02:30:58.120
quantum physics, like, you know, questions about,
link |
02:31:00.240
like, you know, can we recover the information
link |
02:31:02.520
that's lost in a black hole and things like this, right?
link |
02:31:04.520
And that relies on notions of complexity,
link |
02:31:09.360
which, you know, I find this fascinating.
link |
02:31:11.640
Can you describe your personal quest
link |
02:31:13.360
to build an expressive electronic wind instrument, EWI?
link |
02:31:19.760
What is it?
link |
02:31:20.600
What does it take to build it?
link |
02:31:24.000
Well, I'm a tinker.
link |
02:31:25.080
I like building things.
link |
02:31:26.760
I like building things with combinations of electronics
link |
02:31:28.960
and, you know, mechanical stuff.
link |
02:31:32.400
You know, I have a bunch of different hobbies,
link |
02:31:34.120
but, you know, probably my first one was little,
link |
02:31:37.960
was building model airplanes and stuff like that.
link |
02:31:39.800
And I still do that to some extent.
link |
02:31:41.880
But also electronics, I taught myself electronics
link |
02:31:43.800
before I studied it.
link |
02:31:46.240
And the reason I taught myself electronics
link |
02:31:48.120
is because of music.
link |
02:31:49.600
My cousin was an aspiring electronic musician
link |
02:31:53.200
and he had an analog synthesizer.
link |
02:31:55.000
And I was, you know, basically modifying it for him
link |
02:31:58.000
and building sequencers and stuff like that, right, for him.
link |
02:32:00.280
I was in high school when I was doing this.
link |
02:32:02.640
That's the interesting, like, progressive rock, like 80s.
link |
02:32:06.040
Like, what's the greatest band of all time,
link |
02:32:08.000
according to Yann LeCun?
link |
02:32:09.520
Oh, man, there's too many of them.
link |
02:32:11.080
But, you know, it's a combination of, you know,
link |
02:32:16.360
Mahavishnu Orchestra, Weather Report,
link |
02:32:19.800
Yes, Genesis, you know, pre-Peter Gabriel,
link |
02:32:27.120
Gentle Giant, you know, things like that.
link |
02:32:29.120
Great.
link |
02:32:29.960
Okay, so this love of electronics
link |
02:32:32.280
and this love of music combined together.
link |
02:32:34.240
Right, so I was actually trained to play
link |
02:32:36.280
Baroque and Renaissance music and I played in an orchestra
link |
02:32:42.040
when I was in high school and first years of college.
link |
02:32:45.640
And I played the recorder, crumb horn,
link |
02:32:48.040
a little bit of oboe, you know, things like that.
link |
02:32:50.200
So I'm a wind instrument player.
link |
02:32:52.520
But I always wanted to play improvised music,
link |
02:32:54.080
even though I don't know anything about it.
link |
02:32:56.320
And the only way I figured, you know,
link |
02:32:58.760
short of like learning to play saxophone
link |
02:33:01.080
was to play electronic wind instruments.
link |
02:33:03.560
So they behave, you know, the fingering is similar
link |
02:33:05.680
to a saxophone, but, you know,
link |
02:33:07.640
you have wide variety of sound
link |
02:33:09.080
because you control the synthesizer with it.
link |
02:33:11.040
So I had a bunch of those, you know,
link |
02:33:13.120
going back to the late 80s from either Yamaha or Akai.
link |
02:33:18.880
They're both kind of the main manufacturers of those.
link |
02:33:22.520
So they're classics, you know,
link |
02:33:23.720
going back several decades.
link |
02:33:25.520
But I've never been completely satisfied with them
link |
02:33:27.680
because of lack of expressivity.
link |
02:33:31.120
And, you know, those things, you know,
link |
02:33:32.480
are somewhat expressive.
link |
02:33:33.400
I mean, they measure the breath pressure,
link |
02:33:34.760
they measure the lip pressure.
link |
02:33:36.520
And, you know, you have various parameters.
link |
02:33:39.800
You can vary with fingers,
link |
02:33:41.480
but they're not really as expressive
link |
02:33:44.800
as an acoustic instrument, right?
link |
02:33:47.040
You hear John Coltrane play two notes
link |
02:33:49.400
and you know it's John Coltrane,
link |
02:33:50.760
you know, it's got a unique sound.
link |
02:33:53.000
Or Miles Davis, right?
link |
02:33:54.280
You can hear it's Miles Davis playing the trumpet
link |
02:33:57.480
because the sound reflects their, you know,
link |
02:34:02.480
physiognomy, basically, the shape of the vocal tract
link |
02:34:07.600
kind of shapes the sound.
link |
02:34:09.200
So how do you do this with an electronic instrument?
link |
02:34:12.320
And I was, many years ago,
link |
02:34:13.920
I met a guy called David Wessel.
link |
02:34:15.640
He was a professor at Berkeley
link |
02:34:18.240
and created the Center for Music Technology there.
link |
02:34:23.000
And he was interested in that question.
link |
02:34:25.600
And so I kept kind of thinking about this for many years.
link |
02:34:28.120
And finally, because of COVID, you know, I was at home,
link |
02:34:31.040
I was in my workshop.
link |
02:34:32.600
My workshop serves also as my kind of Zoom room
link |
02:34:36.040
and home office.
link |
02:34:37.360
And this is in New Jersey?
link |
02:34:38.800
In New Jersey.
link |
02:34:39.640
And I started really being serious about, you know,
link |
02:34:43.600
building my own EWI instrument.
link |
02:34:45.800
What else is going on in that New Jersey workshop?
link |
02:34:48.160
Is there some crazy stuff you've built,
link |
02:34:50.880
like just, or like left on the workshop floor, left behind?
link |
02:34:55.200
A lot of crazy stuff is, you know,
link |
02:34:57.600
electronics built with microcontrollers of various kinds
link |
02:35:01.680
and, you know, weird flying contraptions.
link |
02:35:06.720
So you still love flying?
link |
02:35:08.720
It's a family disease.
link |
02:35:09.880
My dad got me into it when I was a kid.
link |
02:35:13.520
And he was building model airplanes when he was a kid.
link |
02:35:16.840
And he was a mechanical engineer.
link |
02:35:19.800
He taught himself electronics also.
link |
02:35:21.200
So he built his early radio control systems
link |
02:35:24.080
in the late 60s, early 70s.
link |
02:35:27.760
And so that's what got me into,
link |
02:35:29.640
I mean, he got me into kind of, you know,
link |
02:35:31.120
engineering and science and technology.
link |
02:35:33.040
Do you also have an interest in appreciation of flight
link |
02:35:36.120
in other forms, like with drones, quadcopters,
link |
02:35:38.320
or do you, is it model airplane, the thing that's?
link |
02:35:41.720
You know, before drones were, you know,
link |
02:35:45.240
kind of a consumer product, you know,
link |
02:35:49.240
I built my own, you know,
link |
02:35:50.280
with also building a microcontroller
link |
02:35:52.000
with gyroscopes and accelerometers for stabilization,
link |
02:35:56.240
writing the firmware for it, you know.
link |
02:35:57.760
And then when it became kind of a standard thing
link |
02:35:59.200
you could buy, it was boring, you know,
link |
02:36:00.320
I stopped doing it.
link |
02:36:01.160
It was not fun anymore.
link |
02:36:03.520
Yeah.
link |
02:36:04.720
You were doing it before it was cool.
link |
02:36:06.280
Yeah.
link |
02:36:07.120
What advice would you give to a young person today
link |
02:36:10.080
in high school and college
link |
02:36:11.360
that dreams of doing something big like Yann LeCun,
link |
02:36:15.960
like let's talk in the space of intelligence,
link |
02:36:18.960
dreams of having a chance to solve
link |
02:36:21.000
some fundamental problem in space of intelligence,
link |
02:36:23.960
both for their career and just in life,
link |
02:36:26.200
being somebody who was a part
link |
02:36:28.600
of creating something special?
link |
02:36:30.680
So try to get interested in big questions,
link |
02:36:35.400
things like, you know, what is intelligence?
link |
02:36:38.680
What is the universe made of?
link |
02:36:40.440
What's life all about?
link |
02:36:41.680
Things like that.
link |
02:36:45.040
Like even like crazy big questions,
link |
02:36:47.040
like what's time?
link |
02:36:49.040
Like nobody knows what time is.
link |
02:36:53.160
And then learn basic things,
link |
02:36:58.640
like basic methods, either from math,
link |
02:37:00.680
from physics or from engineering.
link |
02:37:03.280
Things that have a long shelf life.
link |
02:37:05.600
Like if you have a choice between,
link |
02:37:07.280
like, you know, learning, you know,
link |
02:37:10.160
mobile programming on iPhone
link |
02:37:12.600
or quantum mechanics, take quantum mechanics.
link |
02:37:16.880
Because you're gonna learn things
link |
02:37:18.480
that you have no idea exist.
link |
02:37:20.120
And you may not, you may never be a quantum physicist,
link |
02:37:25.320
but you will learn about path integrals.
link |
02:37:26.800
And path integrals are used everywhere.
link |
02:37:29.120
It's the same formula that you use
link |
02:37:30.280
for, you know, Bayesian integration and stuff like that.
link |
02:37:33.280
So the ideas, the little ideas within quantum mechanics,
link |
02:37:37.720
within some of these kind of more solidified fields
link |
02:37:41.440
will have a longer shelf life.
link |
02:37:42.920
You'll somehow use them indirectly in your work.
link |
02:37:46.920
Learn classical mechanics, like you'll learn
link |
02:37:48.640
about Lagrangian, for example,
link |
02:37:51.360
which is like a huge, hugely useful concept,
link |
02:37:55.000
you know, for all kinds of different things.
link |
02:37:57.320
Learn statistical physics, because all the math
link |
02:38:01.680
that comes out of, you know, for machine learning
link |
02:38:05.480
basically comes out of, was figured out
link |
02:38:07.280
by statistical physicists in the, you know,
link |
02:38:09.240
late 19th, early 20th century, right?
link |
02:38:10.960
So, and for some of them actually more recently
link |
02:38:14.320
for, by people like Giorgio Parisi,
link |
02:38:16.120
who just got the Nobel prize for the replica method,
link |
02:38:19.040
among other things, it's used for a lot of different things.
link |
02:38:23.200
You know, variational inference,
link |
02:38:25.560
that math comes from statistical physics.
link |
02:38:28.600
So a lot of those kind of, you know, basic courses,
link |
02:38:33.960
you know, if you do electrical engineering,
link |
02:38:36.240
you take signal processing,
link |
02:38:37.360
you'll learn about Fourier transforms.
link |
02:38:39.880
Again, something super useful is at the basis
link |
02:38:42.720
of things like graph neural nets,
link |
02:38:44.920
which is an entirely new sub area of, you know,
link |
02:38:49.400
AI machine learning, deep learning,
link |
02:38:50.680
which I think is super promising
link |
02:38:52.160
for all kinds of applications.
link |
02:38:54.360
Something very promising,
link |
02:38:55.240
if you're more interested in applications,
link |
02:38:56.680
is the applications of AI machine learning
link |
02:38:58.840
and deep learning to science,
link |
02:39:01.520
or to science that can help solve big problems
link |
02:39:05.120
in the world.
link |
02:39:05.960
I have colleagues at Meta, at Fair,
link |
02:39:09.240
who started this project called Open Catalyst,
link |
02:39:11.240
and it's an open project collaborative.
link |
02:39:14.560
And the idea is to use deep learning
link |
02:39:16.640
to help design new chemical compounds or materials
link |
02:39:21.960
that would facilitate the separation
link |
02:39:23.800
of hydrogen from oxygen.
link |
02:39:25.840
If you can efficiently separate oxygen from hydrogen
link |
02:39:29.080
with electricity, you solve climate change.
link |
02:39:33.520
It's as simple as that,
link |
02:39:34.480
because you cover, you know,
link |
02:39:37.640
some random desert with solar panels,
link |
02:39:40.800
and you have them work all day,
link |
02:39:42.560
produce hydrogen,
link |
02:39:43.480
and then you ship the hydrogen wherever it's needed.
link |
02:39:45.400
You don't need anything else.
link |
02:39:48.560
You know, you have controllable power
link |
02:39:53.440
that can be transported anywhere.
link |
02:39:55.640
So if we have a large scale,
link |
02:39:59.040
efficient energy storage technology,
link |
02:40:02.160
like producing hydrogen, we solve climate change.
link |
02:40:06.640
Here's another way to solve climate change,
link |
02:40:08.560
is figuring out how to make fusion work.
link |
02:40:10.480
Now, the problem with fusion
link |
02:40:11.520
is that you make a super hot plasma,
link |
02:40:13.640
and the plasma is unstable and you can't control it.
link |
02:40:16.240
Maybe with deep learning,
link |
02:40:17.080
you can find controllers that will stabilize plasma
link |
02:40:19.120
and make, you know, practical fusion reactors.
link |
02:40:21.640
I mean, that's very speculative,
link |
02:40:23.080
but, you know, it's worth trying,
link |
02:40:24.480
because, you know, the payoff is huge.
link |
02:40:28.280
There's a group at Google working on this,
link |
02:40:29.880
led by John Platt.
link |
02:40:31.160
So control, convert as many problems
link |
02:40:33.920
in science and physics and biology and chemistry
link |
02:40:36.800
into a learnable problem
link |
02:40:39.760
and see if a machine can learn it.
link |
02:40:41.560
Right, I mean, there's properties of, you know,
link |
02:40:43.880
complex materials that we don't understand
link |
02:40:46.280
from first principle, for example, right?
link |
02:40:48.520
So, you know, if we could design new, you know,
link |
02:40:53.040
new materials, we could make more efficient batteries.
link |
02:40:56.400
You know, we could make maybe faster electronics.
link |
02:40:58.800
We could, I mean, there's a lot of things we can imagine
link |
02:41:01.920
doing, or, you know, lighter materials
link |
02:41:04.480
for cars or airplanes or things like that.
link |
02:41:06.400
Maybe better fuel cells.
link |
02:41:07.600
I mean, there's all kinds of stuff we can imagine.
link |
02:41:09.520
If we had good fuel cells, hydrogen fuel cells,
link |
02:41:12.280
we could use them to power airplanes,
link |
02:41:13.640
and, you know, transportation wouldn't be, or cars,
link |
02:41:17.240
and we wouldn't have emission problem,
link |
02:41:20.280
CO2 emission problems for air transportation anymore.
link |
02:41:24.600
So there's a lot of those things, I think,
link |
02:41:26.880
where AI, you know, can be used.
link |
02:41:30.160
And this is not even talking about
link |
02:41:31.560
all the sort of medicine, biology,
link |
02:41:33.520
and everything like that, right?
link |
02:41:35.680
You know, like, you know, protein folding,
link |
02:41:37.840
you know, figuring out, like, how could you design
link |
02:41:40.040
your proteins that it sticks to another protein
link |
02:41:41.880
at a particular site, because that's how you design drugs
link |
02:41:44.040
in the end.
link |
02:41:46.280
So, you know, deep learning would be useful,
link |
02:41:47.600
although those are kind of, you know,
link |
02:41:49.280
would be sort of enormous progress
link |
02:41:51.120
if we could use it for that.
link |
02:41:53.360
Here's an example.
link |
02:41:54.320
If you take, this is like from recent material physics,
link |
02:41:58.280
you take a monoatomic layer of graphene, right?
link |
02:42:02.200
So it's just carbon on a hexagonal mesh,
link |
02:42:04.920
and you make this single atom thick.
link |
02:42:09.120
You put another one on top,
link |
02:42:10.360
you twist them by some magic number of degrees,
link |
02:42:13.080
three degrees or something.
link |
02:42:14.800
It becomes superconductor.
link |
02:42:16.760
Nobody has any idea why.
link |
02:42:18.240
Okay.
link |
02:42:20.800
I want to know how that was discovered,
link |
02:42:22.480
but that's the kind of thing that machine learning
link |
02:42:23.920
can actually discover, these kinds of things.
link |
02:42:25.800
Maybe not, but there is a hint, perhaps,
link |
02:42:28.960
that with machine learning, we would train a system
link |
02:42:31.720
to basically be a phenomenological model
link |
02:42:34.840
of some complex emergent phenomenon,
link |
02:42:37.240
which, you know, superconductivity is one of those,
link |
02:42:42.400
where, you know, this collective phenomenon
link |
02:42:44.760
is too difficult to describe from first principles
link |
02:42:46.920
with the current, you know,
link |
02:42:48.800
the usual sort of reductionist type method,
link |
02:42:51.920
but we could have deep learning systems
link |
02:42:54.960
that predict the properties of a system
link |
02:42:57.680
from a description of it after being trained
link |
02:42:59.880
with sufficiently many samples.
link |
02:43:04.880
This guy, Pascal Fua, at EPFL,
link |
02:43:06.680
he has a startup company that,
link |
02:43:09.800
where he basically trained a convolutional net,
link |
02:43:13.440
essentially, to predict the aerodynamic properties
link |
02:43:16.640
of solids, and you can generate as much data as you want
link |
02:43:19.640
by just running computational fluid dynamics, right?
link |
02:43:21.920
So you give, like, a wing, airfoil,
link |
02:43:27.800
or something, shape of some kind,
link |
02:43:29.800
and you run computational fluid dynamics,
link |
02:43:31.400
you get, as a result, the drag and, you know,
link |
02:43:36.160
lift and all that stuff, right?
link |
02:43:37.480
And you can generate lots of data,
link |
02:43:40.080
train a neural net to make those predictions,
link |
02:43:41.840
and now what you have is a differentiable model
link |
02:43:44.120
of, let's say, drag and lift
link |
02:43:47.000
as a function of the shape of that solid,
link |
02:43:48.680
and so you can do backprop and gradient descent,
link |
02:43:49.960
you can optimize the shape
link |
02:43:51.520
so you get the properties you want.
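A minimal sketch of that surrogate-then-optimize loop, assuming PyTorch; the tiny network, the eight shape parameters, and the drag-lift trade-off are placeholders standing in for a model trained on computational fluid dynamics data, not the actual system described here. The surrogate is frozen, and gradients flow through it to the shape parameters themselves.

```python
# A minimal sketch of shape optimization through a frozen surrogate (assumed PyTorch).
import torch
import torch.nn as nn

surrogate = nn.Sequential(              # pretend this was trained on CFD samples
    nn.Linear(8, 64), nn.Tanh(),
    nn.Linear(64, 2),                   # outputs: [drag, lift]
)
for p in surrogate.parameters():
    p.requires_grad_(False)             # freeze the surrogate; only the shape moves

shape = torch.zeros(1, 8, requires_grad=True)   # 8 illustrative airfoil parameters
opt = torch.optim.Adam([shape], lr=0.05)

for step in range(200):
    drag, lift = surrogate(shape)[0]
    objective = drag - 0.1 * lift       # minimize drag, reward lift (toy trade-off)
    opt.zero_grad()
    objective.backward()                # gradient w.r.t. the shape, through the surrogate
    opt.step()
```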
link |
02:43:54.880
Yeah, that's incredible.
link |
02:43:56.040
That's incredible, and on top of all that,
link |
02:43:58.280
probably you should read a little bit of literature
link |
02:44:01.480
and a little bit of history
link |
02:44:03.600
for inspiration and for wisdom,
link |
02:44:06.640
because after all, all of these technologies
link |
02:44:08.800
will have to work in the human world.
link |
02:44:10.280
Yes.
link |
02:44:11.120
And the human world is complicated.
link |
02:44:12.640
It is, sadly.
link |
02:44:15.080
Yann, this is an amazing conversation.
link |
02:44:18.440
I'm really honored that you would talk with me today.
link |
02:44:20.400
Thank you for all the amazing work you're doing
link |
02:44:22.240
at FAIR, at Meta, and thank you for being so passionate
link |
02:44:26.280
after all these years about everything
link |
02:44:28.120
that's going on, you're a beacon of hope
link |
02:44:29.960
for the machine learning community,
link |
02:44:31.600
and thank you so much for spending
link |
02:44:33.200
your valuable time with me today.
link |
02:44:34.480
That was awesome.
link |
02:44:35.320
Thanks for having me on.
link |
02:44:36.280
That was a pleasure.
link |
02:44:38.800
Thanks for listening to this conversation with Yann LeCun.
link |
02:44:41.440
To support this podcast,
link |
02:44:42.800
please check out our sponsors in the description.
link |
02:44:45.720
And now, let me leave you with some words
link |
02:44:47.840
from Isaac Asimov.
link |
02:44:50.640
Your assumptions are your windows on the world.
link |
02:44:53.760
Scrub them off every once in a while,
link |
02:44:56.040
or the light won't come in.
link |
02:44:58.760
Thank you for listening, and hope to see you next time.