
Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258



link |
00:00:00.000
The following is a conversation with Yann LeCun,
link |
00:00:02.720
his second time on the podcast.
link |
00:00:04.560
He is the chief AI scientist at Meta, formerly Facebook,
link |
00:00:09.160
professor at NYU, Turing Award winner,
link |
00:00:13.040
one of the seminal figures in the history
link |
00:00:15.600
of machine learning and artificial intelligence,
link |
00:00:18.480
and someone who is brilliant and opinionated
link |
00:00:21.960
in the best kind of way,
link |
00:00:23.400
and so is always fun to talk to.
link |
00:00:26.000
This is the Lex Fridman Podcast. To support it,
link |
00:00:28.840
please check out our sponsors in the description,
link |
00:00:31.240
and now here's my conversation with Yann LeCun.
link |
00:00:36.160
You cowrote the article,
link |
00:00:37.560
self supervised learning, the dark matter of intelligence.
link |
00:00:40.920
Great title, by the way, with Ishan Misra.
link |
00:00:43.720
So let me ask, what is self supervised learning
link |
00:00:46.640
and why is it the dark matter of intelligence?
link |
00:00:49.920
I'll start by the dark matter part.
link |
00:00:53.320
There is obviously a kind of learning
link |
00:00:55.680
that humans and animals are doing
link |
00:00:59.880
that we currently are not reproducing properly
link |
00:01:02.800
with machines or with AI, right?
link |
00:01:04.680
So the most popular approaches to machine learning today
link |
00:01:07.440
are, or paradigms I should say,
link |
00:01:09.680
are supervised learning and reinforcement learning,
link |
00:01:12.720
and they are extremely inefficient.
link |
00:01:15.120
Supervised learning requires many samples
link |
00:01:17.640
for learning anything,
link |
00:01:19.760
and reinforcement learning requires a ridiculously large
link |
00:01:22.760
number of trials and errors for a system to learn anything.
link |
00:01:29.320
And that's why we don't have self driving cars.
link |
00:01:32.960
That's a big leap from one to the other, okay?
link |
00:01:35.320
So to solve difficult problems,
link |
00:01:38.760
you have to have a lot of human annotation
link |
00:01:42.320
for supervised learning to work,
link |
00:01:44.080
and to solve those difficult problems
link |
00:01:45.520
with reinforcement learning,
link |
00:01:46.680
you have to have some way to maybe simulate that problem
link |
00:01:50.240
such that you can do that large scale kind of learning
link |
00:01:52.720
that reinforcement learning requires.
link |
00:01:54.440
Right, so how is it that most teenagers
link |
00:01:58.040
can learn to drive a car in about 20 hours of practice,
link |
00:02:02.280
whereas even with millions of hours of simulated practice,
link |
00:02:07.480
a self driving car can't actually learn
link |
00:02:09.200
to drive itself properly?
link |
00:02:12.120
And so obviously we're missing something, right?
link |
00:02:13.920
And it's quite obvious for a lot of people
link |
00:02:15.600
that the immediate response you get from many people
link |
00:02:19.520
is, well, humans use their background knowledge
link |
00:02:22.880
to learn faster, and they're right.
link |
00:02:25.840
Now, how was that background knowledge acquired?
link |
00:02:28.320
And that's the big question.
link |
00:02:30.120
So now you have to ask,
link |
00:02:32.440
how do babies in the first few months of life
link |
00:02:35.160
learn how the world works?
link |
00:02:37.160
Mostly by observation,
link |
00:02:38.280
because they can hardly act in the world.
link |
00:02:41.400
And they learn an enormous amount of background knowledge
link |
00:02:43.200
about the world that may be the basis
link |
00:02:46.040
of what we call common sense.
link |
00:02:48.160
This type of learning, it's not learning a task,
link |
00:02:51.280
it's not being reinforced for anything,
link |
00:02:53.720
it's just observing the world and figuring out how it works.
link |
00:02:58.400
Building world models, learning world models.
link |
00:03:01.240
How do we do this?
link |
00:03:02.120
And how do we reproduce this in machines?
link |
00:03:04.560
So self supervised learning is one instance
link |
00:03:09.520
or one attempt at trying to reproduce this kind of learning.
link |
00:03:13.120
Okay, so you're looking at just observation,
link |
00:03:16.400
so not even the interacting part of a child.
link |
00:03:18.720
It's just sitting there watching mom and dad walk around,
link |
00:03:21.600
pick up stuff, all of that.
link |
00:03:23.480
That's what we mean by background knowledge.
link |
00:03:25.520
Perhaps not even watching mom and dad,
link |
00:03:27.520
just watching the world go by.
link |
00:03:30.000
Just having eyes open or having eyes closed
link |
00:03:31.960
or the very act of opening and closing eyes
link |
00:03:34.480
that the world appears and disappears,
link |
00:03:36.280
all that basic information.
link |
00:03:39.120
And you're saying in order to learn to drive,
link |
00:03:43.160
like the reason humans are able to learn to drive quickly,
link |
00:03:45.840
some faster than others,
link |
00:03:47.360
is because of the background knowledge
link |
00:03:48.680
they were able to watch cars operate in the world
link |
00:03:51.760
in the many years leading up to it,
link |
00:03:53.640
the physics of basic objects and all that kind of stuff.
link |
00:03:55.760
That's right.
link |
00:03:56.600
I mean, the basic physics of objects,
link |
00:03:57.440
you don't even need to know how a car works, right?
link |
00:04:00.880
Because that you can learn fairly quickly.
link |
00:04:02.480
I mean, the example I use very often
link |
00:04:03.840
is you're driving next to a cliff.
link |
00:04:06.680
And you know in advance because of your understanding
link |
00:04:10.560
of intuitive physics that if you turn the wheel to the right,
link |
00:04:13.760
the car will veer to the right, we'll run off the cliff,
link |
00:04:16.760
fall off the cliff,
link |
00:04:17.600
and nothing good will come out of this, right?
link |
00:04:20.440
But if you are a sort of, you know,
link |
00:04:22.760
tabula rasa reinforcement learning system
link |
00:04:25.120
that doesn't have a model of the world,
link |
00:04:28.200
you have to repeat falling off this cliff thousands of times
link |
00:04:31.320
before you figure out it's a bad idea.
link |
00:04:32.800
And then a few more thousand times
link |
00:04:34.600
before you figure out how to not do it.
link |
00:04:37.040
And then a few more million times before you figure out
link |
00:04:39.240
how to not do it in every situation you ever encounter.
link |
00:04:42.560
So self supervised learning still has to have
link |
00:04:45.840
some source of truth being told to it by somebody.
link |
00:04:50.640
So you have to figure out a way without human assistance
link |
00:04:54.560
or without significant amount of human assistance
link |
00:04:56.640
to get that truth from the world.
link |
00:04:59.160
So the mystery there is how much signal is there,
link |
00:05:04.000
how much truth is there that the world gives you,
link |
00:05:06.320
whether it's the human world,
link |
00:05:08.200
like you watch YouTube or something like that,
link |
00:05:10.080
or it's the more natural world.
link |
00:05:12.960
So how much signal is there?
link |
00:05:14.920
So here's the trick.
link |
00:05:16.280
There is way more signal in sort of a self supervised setting
link |
00:05:20.600
than there is in either a supervised
link |
00:05:22.480
or reinforcement setting.
link |
00:05:24.520
And this is going back to my analogy of the cake,
link |
00:05:30.240
as someone has called it,
link |
00:05:32.320
where you try to figure out how much information
link |
00:05:36.000
you ask the machine to predict
link |
00:05:37.840
and how much feedback you give the machine at every trial,
link |
00:05:41.000
in reinforcement learning,
link |
00:05:41.880
you give the machine a single scalar,
link |
00:05:43.320
you tell the machine you did good, you did bad,
link |
00:05:45.400
and you only tell this to the machine once in a while.
link |
00:05:49.640
When I say you, it could be the universe
link |
00:05:51.440
telling the machine, right?
link |
00:05:54.120
But it's just one scalar.
link |
00:05:55.840
So as a consequence, you could not possibly learn
link |
00:05:58.480
something very complicated without many, many, many trials
link |
00:06:01.120
where you get many, many feedbacks of this type.
link |
00:06:04.760
In supervised learning, you give a few bits to the machine
link |
00:06:08.880
at every sample.
link |
00:06:11.280
Let's say you're training a system on,
link |
00:06:14.160
you know, recognizing images on ImageNet.
link |
00:06:16.320
There are 1,000 categories,
link |
00:06:17.680
that's a little less than 10 bits of information per sample.
link |
00:06:22.200
But in self supervised learning, here is the setting.
link |
00:06:23.840
We don't know how to do this yet,
link |
00:06:26.360
but ideally you would show a machine a segment of video
link |
00:06:31.680
and then stop the video and ask the machine
link |
00:06:33.760
to predict what's going to happen next.
link |
00:06:37.640
So you let the machine predict,
link |
00:06:38.720
and then you let time go by and show the machine
link |
00:06:42.840
what actually happened, and hope the machine will,
link |
00:06:46.000
you know, learn to do a better job
link |
00:06:47.920
at predicting next time around.
link |
00:06:49.400
There's a huge amount of information you give the machine,
link |
00:06:51.600
because it's an entire video clip of, you know,
link |
00:06:56.080
of the future after the video clip you fed it
link |
00:06:59.280
in the first place.
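A rough back-of-the-envelope version of this comparison, as a hedged sketch; the video numbers below are invented purely for illustration, and only the "a little less than 10 bits" for 1,000 ImageNet classes comes from the conversation:

```python
# Feedback per training sample, very roughly, for the three settings above.
import math

rl_feedback_bits = 1                         # a single scalar reward, given only once in a while
supervised_bits = math.log2(1000)            # one label out of 1,000 ImageNet classes
print(f"supervised: ~{supervised_bits:.2f} bits per sample")  # ~9.97, "a little less than 10"

# A predicted video clip: say 30 frames of 256x256 pixels at 1 bit per pixel
# (made-up numbers) is already on the order of millions of bits of feedback.
video_bits = 30 * 256 * 256
print(f"video clip: ~{video_bits:,} bits per sample (rough upper bound)")
```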
link |
00:07:00.280
So both for language and for vision,
link |
00:07:02.880
there's a subtle, seemingly trivial construction,
link |
00:07:06.880
but maybe that's representative
link |
00:07:08.480
of what is required to create intelligence,
link |
00:07:10.600
which is filling the gap.
link |
00:07:13.720
So...
link |
00:07:14.560
Filling the gaps.
link |
00:07:15.400
It sounds dumb, but can you?
link |
00:07:18.320
It is possible that you could solve
link |
00:07:21.120
all of intelligence in this way.
link |
00:07:23.000
Just, for both: for language, just give a sentence
link |
00:07:27.840
and continue it, or give a sentence,
link |
00:07:29.800
and there's a gap in it, some words blanked out,
link |
00:07:33.480
and you fill in what words go there.
link |
00:07:35.680
For vision, you give a sequence of images
link |
00:07:39.160
and predict what's going to happen next,
link |
00:07:40.920
or you fill in what happened in between.
link |
00:07:43.840
Do you think it's possible that formulation alone
link |
00:07:48.600
as a signal for self supervised learning
link |
00:07:50.960
can solve intelligence for vision and language?
link |
00:07:53.600
I think that's our best shot at the moment.
link |
00:07:56.280
So whether this will take us all the way
link |
00:07:59.120
to human level intelligence or something,
link |
00:08:01.760
or just cat level intelligence is not clear,
link |
00:08:04.840
but among all the possible approaches
link |
00:08:07.320
that people have proposed, I think is our best shot.
link |
00:08:09.480
So I think this idea of an intelligent system
link |
00:08:14.640
filling in the blanks, either predicting the future,
link |
00:08:18.840
inferring the past, filling in missing information.
link |
00:08:23.760
I'm currently filling the blank of what is behind your head
link |
00:08:26.640
and what your head looks like from the back,
link |
00:08:30.600
because I have a basic knowledge about how humans are made.
link |
00:08:33.760
And I don't know if you're gonna,
link |
00:08:35.600
what word you're gonna say, at which point you're gonna speak,
link |
00:08:37.280
whether you're gonna move your head this way or that way,
link |
00:08:38.960
which way you're gonna look,
link |
00:08:40.280
but I know you're not gonna just dematerialize
link |
00:08:42.080
and reappear three meters down the hall,
link |
00:08:46.280
because I know what's possible and what's impossible
link |
00:08:49.480
according to the physics.
link |
00:08:51.160
You have a model of what's possible, what's impossible,
link |
00:08:53.280
and then you'd be very surprised if it happens,
link |
00:08:55.120
and then you'll have to reconstruct your model.
link |
00:08:57.880
Right, so that's the model of the world.
link |
00:08:59.640
It's what tells you, what fills in the blanks.
link |
00:09:02.280
So given your partial information
link |
00:09:04.520
about the state of the world, given by your perception,
link |
00:09:08.120
your model of the world fills in the missing information,
link |
00:09:11.400
and that includes predicting the future,
link |
00:09:13.800
retrodicting the past, filling in things
link |
00:09:16.920
you don't immediately perceive.
link |
00:09:18.440
And that doesn't have to be purely generic vision
link |
00:09:22.320
or visual information or generic language.
link |
00:09:24.360
You can go to specifics like predicting
link |
00:09:28.960
what control decision you make
link |
00:09:30.320
when you're driving in a lane.
link |
00:09:31.640
You have a sequence of images from a vehicle,
link |
00:09:35.680
and then you have information, if you recorded on video,
link |
00:09:39.680
where the car ended up going,
link |
00:09:41.880
so you can go back in time and predict
link |
00:09:44.280
where the car went based on the visual information.
link |
00:09:46.720
That's very specific, domain specific.
link |
00:09:49.440
Right, but the question is whether we can come up
link |
00:09:51.520
with sort of a generic method for, you know,
link |
00:09:56.400
training machines to do this kind of prediction
link |
00:09:58.560
or filling in the blanks.
link |
00:09:59.880
So right now, this type of approach
link |
00:10:03.280
has been unbelievably successful
link |
00:10:05.640
in the context of natural language processing.
link |
00:10:08.240
Every modern natural language processing system
link |
00:10:09.760
is pre trained in a self supervised manner
link |
00:10:12.320
to fill in the blanks.
link |
00:10:13.240
So you show it a sequence of words,
link |
00:10:15.000
you remove 10% of them,
link |
00:10:16.440
and then you train some gigantic neural net
link |
00:10:17.960
to predict the words that are missing.
link |
00:10:19.880
And once you've pre trained that network,
link |
00:10:22.680
you can use the internal representation learned by it
link |
00:10:26.560
as input to, you know,
link |
00:10:28.600
something that you train supervised or whatever.
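A minimal sketch of the masking setup described here; the example sentence, the mask token, and the exact bookkeeping are assumptions for illustration, while a real system trains a large network on enormous corpora of such (corrupted input, hidden words) pairs:

```python
# Toy masked-word pretraining data: hide a fraction of tokens and keep the
# hidden words as prediction targets for the network.
import numpy as np

rng = np.random.default_rng(0)
tokens = "the cat is chasing the mouse in the kitchen today".split()
mask_fraction = 0.10                               # "you remove 10% of them"

n_mask = max(1, round(mask_fraction * len(tokens)))
masked_idx = set(rng.choice(len(tokens), size=n_mask, replace=False).tolist())

corrupted = ["[MASK]" if i in masked_idx else t for i, t in enumerate(tokens)]
targets = [tokens[i] for i in sorted(masked_idx)]

print("input:  ", " ".join(corrupted))             # sentence with blanks, fed to the model
print("targets:", targets)                         # the words the model must predict
```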
link |
00:10:32.160
That's been incredibly successful,
link |
00:10:33.320
not so successful in images,
link |
00:10:35.000
although it's making progress.
link |
00:10:37.520
And it's based on sort of manual data augmentation.
link |
00:10:42.520
We can go into this later,
link |
00:10:43.520
but what has not been successful yet is training from video.
link |
00:10:47.160
So getting a machine to learn,
link |
00:10:48.480
to represent the visual world, for example,
link |
00:10:51.520
by just watching video,
link |
00:10:52.800
nobody has really succeeded in doing this.
link |
00:10:54.800
Okay, well, let's kind of give a high level overview.
link |
00:10:57.520
What's the difference in kind and in difficulty
link |
00:11:02.360
between vision and language?
link |
00:11:03.960
So you said people haven't been able to really kind of crack
link |
00:11:08.840
the problem of vision open
link |
00:11:10.480
in terms of self supervised learning,
link |
00:11:11.960
but that may not be necessary
link |
00:11:13.800
because it's fundamentally more difficult.
link |
00:11:15.840
Maybe like when we're talking about achieving,
link |
00:11:18.680
like passing the Turing test in the full spirit
link |
00:11:22.280
of the Turing test in language
link |
00:11:23.800
might be harder than vision.
link |
00:11:24.880
That's not obvious.
link |
00:11:26.360
So in your view, which is harder
link |
00:11:29.400
or perhaps are they just the same problem?
link |
00:11:31.920
Where the farther we get toward solving each,
link |
00:11:34.800
the more we realize it's all the same thing.
link |
00:11:36.680
It's all the same cake.
link |
00:11:37.640
I think what I'm looking for are methods
link |
00:11:40.160
that make them look essentially like the same cake,
link |
00:11:43.560
but currently they're not.
link |
00:11:44.760
And the main issue with learning world models
link |
00:11:48.480
or learning predictive models
link |
00:11:50.160
is that the prediction is never a single thing
link |
00:11:55.880
because the world is not entirely predictable.
link |
00:11:59.200
It may be deterministic or stochastic.
link |
00:12:00.680
We can get into the philosophical discussion about it,
link |
00:12:02.960
but even if it's deterministic,
link |
00:12:05.280
it's not entirely predictable.
link |
00:12:07.440
And so if I play a short video clip
link |
00:12:11.760
and then I ask you to predict what's going to happen next,
link |
00:12:14.160
there's many, many plausible continuations
link |
00:12:16.360
for that video clip.
link |
00:12:18.320
And the number of continuations grows
link |
00:12:20.520
with the interval of time
link |
00:12:22.840
that you're asking the system to make a prediction for.
link |
00:12:26.480
And so one big question with self supervised learning
link |
00:12:29.800
is how you represent this uncertainty,
link |
00:12:32.320
how you represent multiple discrete outcomes,
link |
00:12:35.200
how you represent a continuum of possible outcomes, et cetera.
link |
00:12:40.200
And if you are sort of a classical machine learning person,
link |
00:12:45.000
you say, oh, you just represent a distribution, right?
link |
00:12:48.920
And that we know how to do
link |
00:12:51.000
when we're predicting words, missing words in the text
link |
00:12:53.480
because you can have a neural net,
link |
00:12:56.040
give a score for every word in the dictionary.
link |
00:12:58.400
It's a big list of numbers, maybe 100,000 or so.
link |
00:13:02.240
And you can turn them into a probability distribution
link |
00:13:04.600
that tells you when I say a sentence,
link |
00:13:07.400
the cat is chasing the blank in the kitchen.
link |
00:13:12.360
There are only a few words that make sense there.
link |
00:13:15.320
It could be a mouse or it could be a laser spot
link |
00:13:18.440
or something like that, right?
link |
00:13:21.640
And if I say the blank is chasing the blank in the savannah,
link |
00:13:25.880
you also have a bunch of plausible options
link |
00:13:27.880
for those two words, right?
link |
00:13:31.000
Because you have kind of an underlying reality
link |
00:13:33.720
you can refer to to sort of fill in those blanks.
link |
00:13:38.120
So you cannot say for sure in the savannah
link |
00:13:42.040
if it's a lion or a cheetah or whatever,
link |
00:13:44.480
you cannot know if it's a zebra or a gnu or whatever,
link |
00:13:49.560
wildebeest, the same thing.
link |
00:13:55.360
But you can represent the uncertainty
link |
00:13:56.840
by just a long list of numbers.
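A tiny illustration of that "long list of numbers": per-word scores turned into a probability distribution with a softmax. The five words and scores are made up; a real vocabulary would have on the order of 100,000 entries, as mentioned above.

```python
# Scores over a (tiny) vocabulary turned into a probability distribution via softmax.
import numpy as np

vocab  = ["mouse", "laser spot", "bird", "sofa", "zebra"]
logits = np.array([4.0, 3.0, 2.0, -1.0, -3.0])     # scores produced by some network

probs = np.exp(logits - logits.max())
probs /= probs.sum()
for word, p in zip(vocab, probs):
    print(f"P('{word}' | 'the cat is chasing the ___ in the kitchen') = {p:.3f}")
```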
link |
00:13:58.520
Now, if I do the same thing with video
link |
00:14:01.800
when I ask you to predict a video clip,
link |
00:14:04.360
it's not a discrete set of potential frames.
link |
00:14:07.400
You have to have some way of representing
link |
00:14:10.000
a sort of infinite number of plausible continuations
link |
00:14:13.560
of multiple frames in a high dimensional,
link |
00:14:16.240
continuous space.
link |
00:14:17.520
And we just have no idea how to do this properly.
link |
00:14:20.560
Finite, high dimensional.
link |
00:14:22.920
So like you...
link |
00:14:23.760
It's finite, high dimensional, yes.
link |
00:14:25.360
Just like the words,
link |
00:14:26.280
they try to get it down to a small finite set
link |
00:14:32.240
of like under a million, something like that.
link |
00:14:34.240
Something like that.
link |
00:14:35.080
I mean, it's kind of ridiculous
link |
00:14:36.040
that we're doing a distribution
link |
00:14:39.040
of every single possible word for language, and it works.
link |
00:14:42.920
It feels like that's a really dumb way to do it.
link |
00:14:46.520
Like there seems to be like there should be
link |
00:14:49.760
some more compressed representation
link |
00:14:52.960
of the distribution of the words.
link |
00:14:55.040
You're right about that.
link |
00:14:56.160
And so do you have any interesting ideas
link |
00:14:58.920
about how to represent all the reality in a compressed way
link |
00:15:01.880
such that you can form a distribution over?
link |
00:15:03.800
That's one of the big questions, you know,
link |
00:15:05.040
how do you do that?
link |
00:15:06.200
But I mean, what's kind of, you know,
link |
00:15:07.960
another thing that really is stupid about,
link |
00:15:12.120
I shouldn't say stupid,
link |
00:15:13.080
but like simplistic about current approaches
link |
00:15:15.560
to self supervision in NLP in text
link |
00:15:19.360
is that not only do you represent
link |
00:15:21.920
a giant distribution over words,
link |
00:15:23.800
but for multiple words that are missing,
link |
00:15:25.640
those distributions are essentially independent
link |
00:15:27.680
of each other.
link |
00:15:30.160
And, you know, you don't pay too much of a price for this.
link |
00:15:33.040
So you can't.
link |
00:15:34.560
So, you know, the system, you know,
link |
00:15:36.720
in the sentence that I gave earlier,
link |
00:15:39.640
if it gives a certain probability for lion and cheetah,
link |
00:15:43.600
and then a certain probability for, you know,
link |
00:15:46.240
gazelle, wildebeest and zebra,
link |
00:15:50.240
those two probabilities are independent of each other.
link |
00:15:55.960
And it's not the case that those things are independent.
link |
00:15:58.080
Lions actually attack like bigger animals than cheetahs.
link |
00:16:01.480
So, you know, there's a huge independence hypothesis
link |
00:16:04.560
in this process, which is not actually true.
link |
00:16:07.800
The reason for this is that we don't know
link |
00:16:09.920
how to represent properly distributions
link |
00:16:13.000
over combinatorial sequences of symbols, essentially,
link |
00:16:16.680
when they're, because the number grows exponentially
link |
00:16:19.000
with the length of the symbols.
link |
00:16:21.320
And so we have to use tricks for this,
link |
00:16:22.800
but those techniques can, you know, get around,
link |
00:16:26.400
like don't even deal with it.
link |
00:16:27.800
So the big question is, like, would there be
link |
00:16:31.360
some sort of abstract latent representation of text
link |
00:16:35.680
that would say that, you know, when I switch
link |
00:16:40.680
lion for cheetah, I also have to switch zebra for gazelle.
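A toy numeric version of this independence problem (all probabilities invented): a factorized model scores each blank separately, so it cannot encode the coupling between predator and prey, whereas a joint model can.

```python
# Two blanks in "the ___ is chasing the ___ in the savannah".
predator = {"lion": 0.5, "cheetah": 0.5}
prey     = {"zebra": 0.5, "gazelle": 0.5}

# Independence assumption: P(a, b) = P(a) * P(b) -- every pairing equally plausible.
for a, pa in predator.items():
    for b, pb in prey.items():
        print(f"independent P({a}, {b}) = {pa * pb:.2f}")

# A joint model could express "lions tend to go after the bigger prey":
joint = {("lion", "zebra"): 0.40, ("lion", "gazelle"): 0.10,
         ("cheetah", "zebra"): 0.10, ("cheetah", "gazelle"): 0.40}
for (a, b), p in joint.items():
    print(f"joint       P({a}, {b}) = {p:.2f}")
```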
link |
00:16:45.520
Yeah, so this independence assumption,
link |
00:16:48.760
let me throw some criticism at you
link |
00:16:50.280
that I often hear and see how you respond.
link |
00:16:52.960
So this kind of filling in the blanks is just statistics.
link |
00:16:56.040
You're not learning anything,
link |
00:16:58.920
like the deep underlying concepts.
link |
00:17:01.640
You're just mimicking stuff from the past.
link |
00:17:05.680
You're not learning anything new
link |
00:17:07.560
such that you can use it to generalize about the world.
link |
00:17:12.000
Or, okay, let me just say the crude version,
link |
00:17:14.160
which is just statistics.
link |
00:17:16.240
It's not intelligence.
link |
00:17:17.880
Oh, what do you have to say to that?
link |
00:17:19.640
What do you usually say to that
link |
00:17:20.880
if you kind of hear this kind of thing?
link |
00:17:22.640
I don't get into those discussions
link |
00:17:23.960
because they are kind of pointless.
link |
00:17:26.720
So first of all, it's quite possible
link |
00:17:28.720
that intelligence is just statistics.
link |
00:17:30.440
It's just statistics of a particular kind.
link |
00:17:32.720
Yes.
link |
00:17:33.640
Well, this is the philosophical question.
link |
00:17:35.280
Is it possible that intelligence is just statistics?
link |
00:17:40.280
Yeah.
link |
00:17:41.560
But what kind of statistics?
link |
00:17:43.520
So if you are asking the question,
link |
00:17:46.200
are the models of the world that we learn,
link |
00:17:50.640
do they have some notion of causality?
link |
00:17:52.320
Yes.
link |
00:17:53.400
So if the criticism comes from people who say,
link |
00:17:56.360
current machine learning systems don't care about causality,
link |
00:17:59.440
which by the way is wrong, I agree with that.
link |
00:18:03.600
Your model of the world should have your actions
link |
00:18:06.560
as one of the inputs,
link |
00:18:09.080
and that will drive you to learn causal models of the world
link |
00:18:11.400
where you know what intervention in the world
link |
00:18:15.080
will cause what result,
link |
00:18:16.720
or you can do this by observation
link |
00:18:18.000
of other agents acting in the world
link |
00:18:20.160
and observing the effect of other humans, for example.
link |
00:18:24.200
So I think at some level of description,
link |
00:18:28.400
intelligence is just statistics.
link |
00:18:31.640
But that doesn't mean you won't have models
link |
00:18:35.160
that have deep mechanistic explanation for what goes on.
link |
00:18:40.040
The question is how do you learn them?
link |
00:18:41.760
That's the question I'm interested in.
link |
00:18:44.400
Because a lot of people who actually voice their criticism
link |
00:18:49.320
say that those mechanistic models
link |
00:18:51.000
have to come from someplace else.
link |
00:18:52.640
They have to come from human designers.
link |
00:18:54.040
They have to come from, I don't know what.
link |
00:18:56.200
And obviously we learn them.
link |
00:18:59.280
Or if we don't learn them as an individual,
link |
00:19:01.800
nature learned them for us using evolution.
link |
00:19:04.920
So regardless of what you think,
link |
00:19:07.160
those processes have been learned somehow.
link |
00:19:10.240
So if you look at the human brain,
link |
00:19:12.960
just like when we humans introspect
link |
00:19:14.680
about how the brain works,
link |
00:19:16.360
it seems like when we think about what is intelligence,
link |
00:19:20.280
we think about the high level stuff,
link |
00:19:22.480
like the models we've constructed,
link |
00:19:24.000
concepts like cognitive science,
link |
00:19:25.600
like concepts of memory and reasoning module,
link |
00:19:28.760
almost like these high level modules.
link |
00:19:31.680
Does this serve as a good analogy?
link |
00:19:35.440
Like are we ignoring the dark matter,
link |
00:19:40.440
the basic low level mechanisms?
link |
00:19:43.560
Just like we ignore the way the operating system works,
link |
00:19:45.800
we're just using the high level software.
link |
00:19:49.680
We're ignoring that at the low level,
link |
00:19:52.760
the neural network might be doing something like statistics.
link |
00:19:56.480
Like me, sorry to use this word,
link |
00:19:59.080
probably incorrectly and crudely,
link |
00:20:00.600
but doing this kind of fill in the gap kind of learning
link |
00:20:03.320
and just kind of updating the model constantly
link |
00:20:05.720
in order to be able to support the raw sensory information,
link |
00:20:09.280
to predict it and adjust to the prediction when it's wrong.
link |
00:20:12.480
But like when we look at our brain at the high level,
link |
00:20:15.880
it feels like we're playing chess,
link |
00:20:18.400
like we're playing with high level concepts
link |
00:20:22.280
and we're stitching them together
link |
00:20:23.760
and we're putting them into long term memory,
link |
00:20:26.080
but really what's going underneath
link |
00:20:28.320
is something we're not able to introspect,
link |
00:20:30.240
which is this kind of simple, large neural network
link |
00:20:34.480
that's just filling in the gaps.
link |
00:20:36.080
Right, well, okay, so there's a lot of questions
link |
00:20:38.280
and a lot of answers there.
link |
00:20:39.800
Okay, so first of all,
link |
00:20:40.640
there's a whole school of thought in neuroscience,
link |
00:20:42.680
computational neuroscience in particular,
link |
00:20:45.240
that likes the idea of predictive coding,
link |
00:20:47.800
which is really related to the idea
link |
00:20:50.120
I was talking about in self supervised learning.
link |
00:20:52.080
So everything is about prediction.
link |
00:20:53.560
The essence of intelligence is the ability to predict
link |
00:20:56.360
and everything the brain does is trying to predict everything
link |
00:21:00.800
from everything else.
link |
00:21:02.160
Okay, and that's really sort of the underlying principle
link |
00:21:04.800
if you want that self supervised learning
link |
00:21:07.880
is trying to kind of reproduce this idea of prediction
link |
00:21:10.720
as kind of an essential mechanism
link |
00:21:13.160
of task independent learning if you want.
link |
00:21:16.400
The next step is what kind of intelligence
link |
00:21:19.400
are you interested in reproducing?
link |
00:21:21.200
And of course, we all think about trying to reproduce
link |
00:21:25.360
high level cognitive processes in humans,
link |
00:21:28.400
but like with machines, we're not even at the level
link |
00:21:30.480
of reproducing the learning processes in a cat brain.
link |
00:21:35.480
You know, the most intelligent of our intelligent systems
link |
00:21:38.240
don't have as much common sense as a house cat.
link |
00:21:42.160
So how is it that cats learn?
link |
00:21:44.160
And cats don't do a whole lot of reasoning.
link |
00:21:46.760
They certainly have causal models.
link |
00:21:48.480
They certainly have, because many cats can figure out
link |
00:21:52.480
like how they can act on the world to get what they want.
link |
00:21:55.240
They certainly have a fantastic model of intuitive physics,
link |
00:22:00.640
certainly of the dynamics of their own bodies,
link |
00:22:03.560
but also of prey and things like that, right?
link |
00:22:05.800
So they're pretty smart.
link |
00:22:08.760
They only do this with about 800 million neurons.
link |
00:22:12.560
We are not anywhere close to reproducing this kind of thing.
link |
00:22:16.560
So to some extent, I could say,
link |
00:22:20.040
let's not even worry about like the high level cognition
link |
00:22:25.040
and kind of long term planning and reasoning that humans
link |
00:22:27.160
can do until we figure out like,
link |
00:22:29.000
can we even reproduce what cats are doing?
link |
00:22:31.280
Now that said, this ability to learn world models,
link |
00:22:35.880
I think is the key to the possibility
link |
00:22:39.080
of learning machines that can also reason.
link |
00:22:41.920
So whenever I give a talk, I say there are three challenges
link |
00:22:44.680
the three main challenges in machine learning.
link |
00:22:46.320
The first one is, you know,
link |
00:22:48.160
getting machines to learn to represent the world.
link |
00:22:50.760
And I'm proposing self supervised learning.
link |
00:22:53.760
The second is getting machines to reason
link |
00:22:56.920
in ways that are compatible with essentially gradient based
link |
00:23:00.480
learning, because this is what deep learning is all about,
link |
00:23:02.480
really.
link |
00:23:04.480
And the third one is something we have no idea how to solve.
link |
00:23:06.480
At least I have no idea how to solve is,
link |
00:23:10.480
can we get machines to learn hierarchical representations
link |
00:23:13.480
of action plans, you know, like, you know,
link |
00:23:17.480
we know how to train them to learn hierarchical representations
link |
00:23:19.480
of perception, you know, with convolutional nets
link |
00:23:22.480
and things like that and transformers.
link |
00:23:24.480
But what about action plans?
link |
00:23:25.480
Can we get them to spontaneously learn good hierarchical
link |
00:23:28.480
representations of actions?
link |
00:23:30.480
Also gradient based.
link |
00:23:32.480
Yeah, all of that, you know, needs to be somewhat differentiable
link |
00:23:35.480
so that you can apply sort of gradient based learning,
link |
00:23:38.480
which is really what deep learning is about.
link |
00:23:42.480
So it's background knowledge, ability to reason in a way
link |
00:23:49.480
that's differentiable, that is somehow connected deeply
link |
00:23:52.480
integrated with that background knowledge or builds
link |
00:23:55.480
on top of that background knowledge.
link |
00:23:57.480
And then, given that background knowledge, be able to make
link |
00:23:59.480
hierarchical plans in the world.
link |
00:24:01.480
Right.
link |
00:24:02.480
So if you take classical optimal control,
link |
00:24:05.480
there's something in classical optimal control called
link |
00:24:08.480
model predictive control.
link |
00:24:10.480
And it's, you know, it's been around since the early 60s.
link |
00:24:13.480
NASA uses that to compute trajectories of rockets.
link |
00:24:16.480
And the basic idea is that you have a predictive model
link |
00:24:20.480
of the rocket, let's say, or whatever system you intend
link |
00:24:23.480
to control, which given the state of the system at time t
link |
00:24:27.480
and given an action that you're taking on the system,
link |
00:24:31.480
so for a rocket it would be thrust and, you know,
link |
00:24:33.480
all the controls you can have.
link |
00:24:35.480
It gives you the state of the system at time t plus delta t.
link |
00:24:38.480
Right.
link |
00:24:39.480
So basically differential equation, something like that.
link |
00:24:43.480
And if you have this model and you have this model in the form
link |
00:24:47.480
of some sort of neural net or some sort of set of formula that
link |
00:24:51.480
you can back propagate gradient through,
link |
00:24:53.480
you can do what's called model predictive control
link |
00:24:55.480
or gradient based model predictive control.
link |
00:24:58.480
So you can unroll that model in time.
link |
00:25:04.480
You feed it a hypothesized sequence of actions.
link |
00:25:10.480
And then you have some objective function that measures
link |
00:25:13.480
how well at the end of the trajectory of the system
link |
00:25:16.480
has succeeded or matched what you wanted to do.
link |
00:25:19.480
You know, is it a robot arm?
link |
00:25:21.480
Have you grasped the object you want to grasp?
link |
00:25:23.480
If it's a rocket, you know, are you at the right place
link |
00:25:26.480
near the space station?
link |
00:25:27.480
Things like that.
link |
00:25:28.480
And by back propagation through time, and again,
link |
00:25:31.480
this was invented in the 1960s by optimal control theorists,
link |
00:25:34.480
you can figure out what is the optimal sequence of actions
link |
00:25:38.480
that will, you know, get my system to the best final state.
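Here is a minimal, self-contained sketch of this kind of gradient-based model predictive control; the 1-D point-mass dynamics, horizon, and cost are toy assumptions (not anything from NASA or the conversation), but the structure is the one described: unroll the model, score the final state, backpropagate through time, and improve the action sequence.

```python
# Toy gradient-based planning: 1-D point mass, actions are accelerations.
# Predictive model: x' = x + dt*v,  v' = v + dt*a.
import numpy as np

DT, H = 0.1, 20            # time step and planning horizon
X_GOAL = 1.0               # desired final position
LAM = 1e-3                 # small penalty on control effort

def plan(x0=0.0, v0=0.0, iters=500, lr=0.5):
    a = np.zeros(H)                          # hypothesized sequence of actions
    for _ in range(iters):
        # Forward pass: unroll the model in time.
        x, v = np.empty(H + 1), np.empty(H + 1)
        x[0], v[0] = x0, v0
        for t in range(H):
            x[t + 1] = x[t] + DT * v[t]
            v[t + 1] = v[t] + DT * a[t]
        # Objective: (x_H - goal)^2 + LAM * sum(a^2).
        # Backward pass: backpropagation through time, gradients w.r.t. actions.
        gx, gv, ga = np.zeros(H + 1), np.zeros(H + 1), np.zeros(H)
        gx[H] = 2.0 * (x[H] - X_GOAL)
        for t in range(H - 1, -1, -1):
            gx[t] = gx[t + 1]
            gv[t] = gv[t + 1] + DT * gx[t + 1]
            ga[t] = DT * gv[t + 1] + 2.0 * LAM * a[t]
        a -= lr * ga                         # gradient step on the action sequence
    return a, x[H]

actions, final_x = plan()
print(f"planned final position: {final_x:.3f} (goal {X_GOAL})")
```

Re-running this from the updated state after executing only the first action is the receding-horizon variant mentioned a little further on.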
link |
00:25:44.480
So that's a form of reasoning.
link |
00:25:47.480
It's basically planning and a lot of planning systems
link |
00:25:50.480
in robotics are actually based on this.
link |
00:25:52.480
And you can think of this as a form of reasoning.
link |
00:25:56.480
So, you know, to take the example of the teenager driving
link |
00:25:59.480
a car again, you have a pretty good dynamical model of the car.
link |
00:26:02.480
It doesn't need to be very accurate.
link |
00:26:04.480
But you know, again, that if you turn the wheel to the right
link |
00:26:07.480
and there is a cliff, you're going to run off the cliff, right?
link |
00:26:09.480
You don't need to have a very accurate model to predict that.
link |
00:26:11.480
And you can run this in your mind and decide not to do it
link |
00:26:14.480
for that reason because you can predict in advance
link |
00:26:17.480
that the result is going to be bad.
link |
00:26:18.480
So you can sort of imagine different scenarios
link |
00:26:20.480
and then, you know, employ or take the first step
link |
00:26:24.480
in the scenario that is most favorable
link |
00:26:26.480
and then repeat the process of planning.
link |
00:26:27.480
That's called receding horizon model predictive control.
link |
00:26:30.480
So, you know, all those things have names, you know,
link |
00:26:32.480
going back, you know, decades.
link |
00:26:35.480
And so if you're not, you know, in classical optimal control,
link |
00:26:39.480
the model of the world is not generally learned.
link |
00:26:42.480
There's, you know, sometimes a few parameters
link |
00:26:44.480
you have to identify, that's called system identification.
link |
00:26:46.480
But generally, the model is mostly deterministic
link |
00:26:51.480
and mostly built by hand.
link |
00:26:52.480
So the big question of AI, I think the big challenge of AI
link |
00:26:56.480
for the next decade is how do we get machines
link |
00:26:59.480
to learn predictive models of the world
link |
00:27:01.480
that deal with uncertainty and deal with the real world
link |
00:27:04.480
in all this complexity.
link |
00:27:05.480
So it's not just trajectory of a rocket,
link |
00:27:07.480
which you can reduce to first principles.
link |
00:27:09.480
It's not even just a trajectory of a robot arm,
link |
00:27:12.480
which again, you can model by, you know, careful mathematics.
link |
00:27:15.480
But it's everything else, everything we observe in the world,
link |
00:27:17.480
you know, people, behavior, you know, physical systems
link |
00:27:22.480
that involve collective phenomena like water or, you know,
link |
00:27:27.480
trees and, you know, branches in a tree or something
link |
00:27:31.480
or like complex things that, you know, humans have no trouble
link |
00:27:35.480
developing abstract representations
link |
00:27:37.480
and predictive model for, but we still don't know
link |
00:27:39.480
how to do this with machines.
link |
00:27:40.480
Where do you put in these three maybe in the planning stages
link |
00:27:45.480
the game theoretic nature of this world,
link |
00:27:49.480
where your actions not only respond to the dynamic nature
link |
00:27:53.480
of the world, the environment, but also affect it.
link |
00:27:56.480
So if there's other humans involved, is this point number four
link |
00:28:01.480
or is it somehow integrated into the hierarchical representation
link |
00:28:04.480
of action in your view?
link |
00:28:05.480
I think it's integrated.
link |
00:28:07.480
It's just that now your model of the world has to deal with,
link |
00:28:10.480
you know, it just makes it more complicated, right?
link |
00:28:12.480
The fact that humans are complicated and not easily predictable,
link |
00:28:16.480
that makes your model of the world much more complicated,
link |
00:28:19.480
that much more complicated.
link |
00:28:20.480
Well, there's a chess, I mean, I suppose chess is an analogy.
link |
00:28:24.480
So Monte Carlo tree search.
link |
00:28:27.480
I mean, there is a, I go, you go, I go, you go.
link |
00:28:31.480
Like Andrej Karpathy recently gave a talk at MIT about car doors.
link |
00:28:37.480
I think there's some machine learning too, but mostly car doors.
link |
00:28:40.480
And there's a dynamic nature to the car, like the person opening
link |
00:28:44.480
the door checking.
link |
00:28:45.480
I mean, he wasn't talking about that.
link |
00:28:46.480
He was talking about the perception problem of what the,
link |
00:28:48.480
the ontology of what defines a car door,
link |
00:28:50.480
this big philosophical question.
link |
00:28:52.480
But to me, it was interesting because like it's obvious that
link |
00:28:55.480
the person opening the car doors, they're trying to get out like here
link |
00:28:58.480
in New York, trying to get out of the car.
link |
00:29:00.480
You're slowing down is going to signal something.
link |
00:29:03.480
You're speeding up is going to signal something.
link |
00:29:05.480
And that's a dance.
link |
00:29:06.480
It's an asynchronous chess game.
link |
00:29:09.480
I don't know.
link |
00:29:10.480
So it feels like it's not just, I mean, I guess you can integrate
link |
00:29:17.480
all of them to one giant model.
link |
00:29:19.480
Like the entirety of these little interactions,
link |
00:29:23.480
because it's not as complicated as chess.
link |
00:29:25.480
It's just like a little dance.
link |
00:29:26.480
We do like a little dance together.
link |
00:29:28.480
And then we figure it out.
link |
00:29:29.480
Well, in some ways it's way more complicated than chess because,
link |
00:29:33.480
because it's continuous.
link |
00:29:34.480
It's uncertain in a continuous manner.
link |
00:29:37.480
It doesn't feel more complicated, but it doesn't feel more complicated
link |
00:29:40.480
because that's what we've evolved to solve.
link |
00:29:43.480
This is the kind of problem we've evolved to solve.
link |
00:29:45.480
And so we're good at it because, you know, nature has made us good at it.
link |
00:29:49.480
Nature has not made us good at chess.
link |
00:29:52.480
We completely suck at chess.
link |
00:29:54.480
In fact, that's why we designed it as a, as a game is to be challenging.
link |
00:29:58.480
And if there is something that, you know,
link |
00:30:01.480
recent progress in chess and Go has made us realize is that humans
link |
00:30:06.480
are really terrible at those things.
link |
00:30:07.480
Like really bad.
link |
00:30:08.480
You know, there was a story, right?
link |
00:30:10.480
Before AlphaGo that, you know, the best Go players thought they were
link |
00:30:15.480
maybe two or three stones behind, you know,
link |
00:30:17.480
an ideal player that they would call God.
link |
00:30:19.480
In fact, no, there are like nine or 10 stones behind.
link |
00:30:23.480
I mean, we're just bad.
link |
00:30:24.480
So we're not good at, and it's because we have limited working memory.
link |
00:30:29.480
We, you know, we're not very good at like doing this tree exploration
link |
00:30:32.480
that, you know, computers are much better at doing than we are.
link |
00:30:36.480
But we are much better at learning differentiable models of the world.
link |
00:30:39.480
I mean, I said differentiable in the kind of, you know,
link |
00:30:43.480
I should say not differentiable in the sense that, you know,
link |
00:30:46.480
we can backprop through it, but in the sense that our brain has some
link |
00:30:49.480
mechanism for estimating gradients of some kind.
link |
00:30:53.480
And that's what, you know, makes us efficient.
link |
00:30:56.480
So if you have an agent that consists of a model of the world,
link |
00:31:02.480
which, you know, in the human brain is basically the entire front half
link |
00:31:05.480
of your brain, an objective function, which in humans is a combination
link |
00:31:13.480
of two things.
link |
00:31:14.480
There is your sort of intrinsic motivation module,
link |
00:31:17.480
which is in the basal ganglia, you know, the base of your brain.
link |
00:31:19.480
That's the thing that measures pain and hunger and things like that.
link |
00:31:22.480
Like immediate feelings and emotions.
link |
00:31:26.480
And then there is, you know, the equivalent of what people
link |
00:31:30.480
in reinforcement learning call a critic, which is a sort of module
link |
00:31:34.480
that predicts ahead what the outcome of a situation will be.
link |
00:31:41.480
And so it's not a cost function, it's not an objective
link |
00:31:44.480
function, but it's sort of a, you know, trained predictor
link |
00:31:48.480
of the ultimate objective function.
link |
00:31:50.480
And that also is differentiable.
link |
00:31:52.480
And so if all of this is differentiable, your cost function,
link |
00:31:55.480
your critic, your, you know, your world model, then you can use
link |
00:32:01.480
gradient based type methods to do planning, to do reasoning,
link |
00:32:04.480
to do learning, you know, to do all the things that you would like
link |
00:32:07.480
an intelligent agent to do.
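As a schematic sketch only (toy, quadratic stand-ins invented for illustration, not an actual brain model or any Meta system), here is the same gradient-based planning idea as the earlier control sketch, but with the objective split the way it is described here: an intrinsic cost accumulated along the trajectory plus a learned critic evaluating where the trajectory ends up.

```python
# Plan by gradient descent through a differentiable world model, intrinsic
# cost module, and critic (all toy stand-ins).
import numpy as np

DT, H, GOAL = 0.1, 10, 5.0

def world_model(s, a):       return s + DT * a           # toy differentiable dynamics
def intrinsic_cost_grad(s):  return 2.0 * (s - GOAL)     # grad of (s - GOAL)^2
def critic_grad(s):          return s - GOAL             # grad of 0.5*(s - GOAL)^2

def plan(s0=0.0, iters=300, lr=0.2):
    a = np.zeros(H)
    for _ in range(iters):
        s = np.empty(H + 1); s[0] = s0
        for t in range(H):                                # forward: unroll the world model
            s[t + 1] = world_model(s[t], a[t])
        gs, ga = np.zeros(H + 1), np.zeros(H)
        gs[H] = intrinsic_cost_grad(s[H]) + critic_grad(s[H])
        for t in range(H - 1, 0, -1):                     # backward through time
            ga[t] = DT * gs[t + 1]
            gs[t] = gs[t + 1] + intrinsic_cost_grad(s[t])
        ga[0] = DT * gs[1]
        a -= lr * ga
    return a

print("planned actions:", plan().round(2))
```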
link |
00:32:11.480
And gradient based learning, like what's your intuition?
link |
00:32:15.480
That's probably at the core of what can solve intelligence.
link |
00:32:18.480
So you don't need like a logic based reasoning in your view.
link |
00:32:25.480
I don't know how to make logic based reasoning compatible
link |
00:32:27.480
with efficient learning.
link |
00:32:29.480
Yeah.
link |
00:32:30.480
And okay, I mean, there is a big question, perhaps a
link |
00:32:32.480
philosophical question.
link |
00:32:33.480
I mean, it's not that philosophical, but that we can ask is,
link |
00:32:37.480
is that, you know, all the learning algorithms we know from
link |
00:32:40.480
engineering and computer science proceed by optimizing
link |
00:32:44.480
some objective function.
link |
00:32:46.480
Yeah.
link |
00:32:47.480
So one question we may ask is, is does learning in the brain
link |
00:32:52.480
minimize an objective function?
link |
00:32:54.480
I mean, it could be, you know, a composite of multiple
link |
00:32:57.480
objective functions, but it's still an objective function.
link |
00:33:00.480
Second, if it does optimize an objective function, does it do,
link |
00:33:05.480
does it do it by some sort of gradient estimation?
link |
00:33:09.480
You know, it doesn't need to be backprop, but, you know, some
link |
00:33:11.480
way of estimating the gradient in an efficient manner, whose
link |
00:33:14.480
complexity is on the same order of magnitude as, you know,
link |
00:33:17.480
actually running the inference.
link |
00:33:22.480
Because you can't afford to do things like, you know,
link |
00:33:24.480
perturbing a weight in your brain to figure out what the
link |
00:33:26.480
effect is, and then sort of, you know, you can do sort of
link |
00:33:30.480
estimating gradient by perturbation.
link |
00:33:32.480
It's, to me, it seems very implausible that the brain uses
link |
00:33:36.480
some sort of, you know, zeroth order, black box, gradient
link |
00:33:40.480
free optimization, because it's so much less efficient than
link |
00:33:44.480
gradient optimization.
link |
00:33:46.480
So it has to have a way of estimating gradient.
link |
00:33:48.480
Is it possible that some kind of logic based reasoning
link |
00:33:52.480
emerges in pockets as a useful, like you said, if the brain
link |
00:33:56.480
optimizes an objective function?
link |
00:33:58.480
Maybe it's a mechanism for creating objective functions.
link |
00:34:00.480
It's a mechanism for creating knowledge bases, for example,
link |
00:34:05.480
that can then be queried.
link |
00:34:07.480
Like maybe it's like an efficient representation of knowledge
link |
00:34:10.480
that's learned in a gradient based way or something like that.
link |
00:34:13.480
Well, so I think there is a lot of different types of
link |
00:34:15.480
intelligence.
link |
00:34:17.480
So first of all, I think the type of logical reasoning that
link |
00:34:19.480
we think about, that we are, you know, maybe stemming from,
link |
00:34:23.480
you know, sort of classical AI of the 1970s and 80s.
link |
00:34:28.480
I think humans use that relatively rarely and are not
link |
00:34:32.480
particularly good at it.
link |
00:34:34.480
But we judge each other based on our ability to solve those
link |
00:34:38.480
rare problems.
link |
00:34:40.480
It's called an IQ test.
link |
00:34:41.480
I don't think so.
link |
00:34:42.480
Like, I'm not very good at chess.
link |
00:34:44.480
Yes, I'm judging you this whole time, because, well, we
link |
00:34:48.480
actually...
link |
00:34:49.480
With your, you know, heritage, I'm sure you're good at chess.
link |
00:34:52.480
No, stereotypes.
link |
00:34:54.480
Not all stereotypes are true.
link |
00:34:57.480
Well, I'm terrible at chess.
link |
00:34:59.480
You know, but I think perhaps another type of intelligence
link |
00:35:04.480
that I have is this, you know, ability of sort of building
link |
00:35:08.480
models of the world from, you know, reasoning, obviously,
link |
00:35:13.480
but also data.
link |
00:35:15.480
And those models generally are more kind of analogical, right?
link |
00:35:18.480
So it's reasoning by simulation and by analogy, where you use
link |
00:35:24.480
one model to apply to a new situation, even though you've
link |
00:35:27.480
never seen that situation, you can sort of connect it to a situation
link |
00:35:31.480
you've encountered before.
link |
00:35:33.480
And your reasoning is more, you know, akin to some sort of
link |
00:35:37.480
internal simulation.
link |
00:35:38.480
So you're kind of simulating what's happening when you're
link |
00:35:41.480
building, I don't know, a box out of wood or something, right?
link |
00:35:43.480
You can imagine in advance, like, what will be the result of,
link |
00:35:47.480
you know, cutting the wood in this particular way?
link |
00:35:49.480
Are you going to use, you know, screws or nails or whatever?
link |
00:35:52.480
When you're interacting with someone, you also have a model
link |
00:35:55.480
in mind to kind of tell the person what you think is useful
link |
00:36:00.480
to them.
link |
00:36:01.480
So I think this ability to construct models of the world is
link |
00:36:06.480
basically the essence of intelligence, and the ability
link |
00:36:10.480
to use it then to plan actions that will fulfill a particular
link |
00:36:17.480
criterion, of course, is necessary as well.
link |
00:36:21.480
So I'm going to ask you a series of impossible questions as we
link |
00:36:26.480
have been doing.
link |
00:36:29.480
So if that's the fundamental sort of dark matter of
link |
00:36:32.480
intelligence, this ability to form a background model, what's
link |
00:36:36.480
your intuition about how much knowledge is required?
link |
00:36:40.480
You know, with dark matter, you can put a percentage on it
link |
00:36:45.480
of the composition of the universe and how much of it is dark
link |
00:36:50.480
matter, how much of it is dark energy. How much information
link |
00:36:55.480
do you think is required to be a house cat?
link |
00:36:59.480
So you have to be able to, when you see a box, go in it; when you
link |
00:37:02.480
see a human, compute the most evil action; if there's a thing
link |
00:37:06.480
that's near an edge, you knock it off, all of that.
link |
00:37:10.480
Plus the extra stuff you mentioned, which is a great
link |
00:37:13.480
self awareness of the physics of your own body and the world.
link |
00:37:18.480
How much knowledge is required, do you think, to solve it?
link |
00:37:21.480
I don't even know how to measure an answer to that question.
link |
00:37:25.480
I'm not sure how to measure it, but whatever it is, it fits
link |
00:37:27.480
in about 800 million neurons.
link |
00:37:33.480
The representation does.
link |
00:37:35.480
Everything, all knowledge, everything, right?
link |
00:37:39.480
There's less than a billion.
link |
00:37:41.480
A dog is 2 billion, but a cat is less than 1 billion.
link |
00:37:45.480
And so multiply that by 1000 and you get the number of synapses.
link |
00:37:49.480
And I think almost all of it is learned through a sort of
link |
00:37:54.480
self supervised learning.
link |
00:37:56.480
Although I think a tiny sliver is learned through reinforcement
link |
00:37:59.480
learning and certainly very little through classical
link |
00:38:02.480
supervised learning, although it's not even clear how
link |
00:38:04.480
supervised learning actually works in the biological world.
link |
00:38:08.480
So I think almost all of it is self supervised learning.
link |
00:38:12.480
But it's driven by the sort of ingrained objective functions
link |
00:38:17.480
that a cat or human have at the base of their brain,
link |
00:38:21.480
which kind of drives their behavior.
link |
00:38:24.480
So, you know, nature tells us, you're hungry.
link |
00:38:29.480
It doesn't tell us how to feed ourselves.
link |
00:38:31.480
That's something that the rest of our brain has to figure out, right?
link |
00:38:35.480
What's interesting is there might be more like
link |
00:38:38.480
deeper objective functions underlying the whole thing.
link |
00:38:41.480
So hunger may be some kind of...
link |
00:38:44.480
Now you go to like neurobiology.
link |
00:38:46.480
It might be just the brain trying to maintain homeostasis.
link |
00:38:52.480
So hunger is just one of the human perceivable symptoms
link |
00:38:58.480
of the brain being unhappy with the way things are currently.
link |
00:39:01.480
It could be just like one really dumb objective function at the core.
link |
00:39:05.480
But that's how behavior is driven.
link |
00:39:08.480
The fact that, you know, the basal ganglia
link |
00:39:12.480
drive us to do things that are different from, say, an orangutan
link |
00:39:16.480
or certainly a cat is what makes, you know, human nature
link |
00:39:20.480
versus orangutan nature versus cat nature.
link |
00:39:23.480
So for example, you know, our basal ganglia
link |
00:39:27.480
drives us to seek the company of other humans.
link |
00:39:32.480
And that's because nature has figured out that we need to be
link |
00:39:35.480
a social animal for our species to survive.
link |
00:39:38.480
And it's true of many primates.
link |
00:39:41.480
It's not true of orangutans.
link |
00:39:43.480
Orangutans are solitary animals.
link |
00:39:45.480
They don't seek the company of others.
link |
00:39:47.480
In fact, they avoid them.
link |
00:39:49.480
In fact, they scream at them when they come too close
link |
00:39:51.480
because they're territorial.
link |
00:39:53.480
Because for their survival, you know, evolution has figured out
link |
00:39:57.480
that's the best thing.
link |
00:39:59.480
I mean, they're occasionally social, of course, for, you know,
link |
00:40:02.480
reproduction and stuff like that, but they're mostly solitary.
link |
00:40:06.480
So all of those behaviors are not part of intelligence.
link |
00:40:09.480
You know, people say, oh, you're never going to have intelligent
link |
00:40:11.480
machines because, you know, human intelligence is social.
link |
00:40:13.480
But then you look at orangutans, you look at octopus.
link |
00:40:16.480
Octopuses never know their parents.
link |
00:40:18.480
They barely interact with any others.
link |
00:40:20.480
And they get to be really smart in less than a year
link |
00:40:23.480
in like half a year.
link |
00:40:25.480
You know, within a year they're adults, and in two years they're dead.
link |
00:40:28.480
So there are things that we think as humans are intimately linked
link |
00:40:34.480
with intelligence, like social interaction, like language.
link |
00:40:39.480
We think, I think we give way too much importance to language
link |
00:40:43.480
as a substrate of intelligence as humans.
link |
00:40:46.480
Because we think our reasoning is so linked with language.
link |
00:40:49.480
So for, to solve the house cat intelligence problem,
link |
00:40:53.480
you think you could do it on a desert island.
link |
00:40:55.480
You could have a cat sitting there looking at the waves,
link |
00:41:01.480
at the ocean waves and figure a lot of it out.
link |
00:41:05.480
It needs to have sort of, you know, the right set of drives
link |
00:41:09.480
to kind of, you know, get it to do the thing
link |
00:41:12.480
and learn the appropriate things, right?
link |
00:41:14.480
But like, for example, you know, baby humans are driven
link |
00:41:19.480
to learn to stand up and walk.
link |
00:41:22.480
Okay, that's kind of, this desire is hardwired.
link |
00:41:25.480
How to do it precisely is not, that's learned.
link |
00:41:28.480
But the desire to walk, move around and stand up,
link |
00:41:32.480
that's sort of hardwired.
link |
00:41:35.480
It's very simple to hardwire this kind of stuff.
link |
00:41:38.480
Oh, like the desire to, well, that's interesting.
link |
00:41:42.480
You're hardwired to want to walk.
link |
00:41:45.480
That's not a, there's got to be a deeper need for walking.
link |
00:41:50.480
I think it was probably socially imposed by society
link |
00:41:53.480
that you need to walk like all the other bipedal animals.
link |
00:41:55.480
No, like a lot of simple animals that, you know,
link |
00:41:58.480
would probably walk without ever watching any other members
link |
00:42:02.480
of the species.
link |
00:42:03.480
It seems like a scary thing to have to do
link |
00:42:06.480
because you suck at bipedal walking at first.
link |
00:42:09.480
It seems crawling is much safer, much more like,
link |
00:42:13.480
why are you in a hurry?
link |
00:42:15.480
Well, because you have this thing that drives you to do it,
link |
00:42:18.480
you know, which is sort of part of the sort of human development.
link |
00:42:24.480
Is that understood, actually, what drives it?
link |
00:42:26.480
Not entirely.
link |
00:42:27.480
No.
link |
00:42:28.480
What's the reason you get on two feet?
link |
00:42:29.480
It's really hard.
link |
00:42:30.480
Like most animals don't get on two feet.
link |
00:42:32.480
Well, they get on four feet.
link |
00:42:33.480
You know, many mammals get on four feet.
link |
00:42:35.480
Yeah, they do.
link |
00:42:36.480
Very quickly.
link |
00:42:37.480
Some of them extremely quickly.
link |
00:42:38.480
But I don't know, you know, like from the last time I've interacted
link |
00:42:41.480
with a table, that's much more stable than a thing
link |
00:42:43.480
on two legs.
link |
00:42:44.480
It's just a really hard problem.
link |
00:42:46.480
Yeah, I mean birds have figured it out with two feet.
link |
00:42:49.480
Well, technically, we can go into ontology.
link |
00:42:52.480
They have four.
link |
00:42:53.480
I guess they have two feet.
link |
00:42:54.480
They have two feet.
link |
00:42:55.480
Chickens.
link |
00:42:56.480
You know, dinosaurs have two feet.
link |
00:42:58.480
Many of them.
link |
00:42:59.480
Allegedly.
link |
00:43:01.480
I'm just now learning that T. rex was eating grass,
link |
00:43:04.480
not other animals.
link |
00:43:05.480
T. rex might have been a friendly pet.
link |
00:43:08.480
What do you think about, I don't know if you looked at the test
link |
00:43:13.480
for general intelligence that Francois Chollet put together?
link |
00:43:15.480
I don't know if you got a chance to look at that kind of thing.
link |
00:43:18.480
What's your intuition about how to solve like an IQ type of test?
link |
00:43:23.480
I don't know.
link |
00:43:24.480
I think it's so outside of my radar screen that it's not really relevant,
link |
00:43:28.480
I think, in the short term.
link |
00:43:30.480
Well, I guess another way to ask, perhaps closer to what
link |
00:43:36.480
your work is: how do you solve MNIST with very little example data?
link |
00:43:42.480
That's right.
link |
00:43:43.480
The answer to this probably is self supervised learning.
link |
00:43:45.480
Just learn to represent images and then learning to recognize
link |
00:43:49.480
handwritten digits on top of this will only require a few samples.
link |
00:43:53.480
We observe this in humans, right?
link |
00:43:55.480
You show a young child a picture book with a couple of pictures of an elephant
link |
00:44:00.480
and that's it.
link |
00:44:01.480
The child knows what an elephant is.
link |
00:44:03.480
We see this today with practical systems that we train image recognition
link |
00:44:08.480
systems with enormous amounts of images, either completely self
link |
00:44:14.480
supervised or very weakly supervised.
link |
00:44:16.480
For example, you can train a neural net to predict whatever hashtag
link |
00:44:21.480
people type on Instagram, right?
link |
00:44:23.480
Then you can do this with billions of images because there's billions
link |
00:44:25.480
per day that are showing up.
link |
00:44:27.480
So the amount of training data there is essentially unlimited.
link |
00:44:31.480
And then you take the output representation, a couple of layers
link |
00:44:35.480
down from the output of what the system learned and feed this as input
link |
00:44:40.480
to a classifier for any object in the world that you want.
link |
00:44:43.480
And it works pretty well.
link |
00:44:44.480
So that's transfer learning or weakly supervised transfer learning.
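As a rough sketch of what that recipe can look like in code, assuming a PyTorch ResNet trunk and a curated hashtag vocabulary; the names, sizes, and losses here are illustrative, not Meta's actual pipeline:

```python
# Weakly supervised pretraining on hashtag labels, then transfer (illustrative sketch).
import torch
import torch.nn as nn
from torchvision.models import resnet50

NUM_HASHTAGS = 17000          # roughly the size of the curated tag vocabulary mentioned above

# 1) Pretrain: predict hashtags from images, treated as multi-label classification.
backbone = resnet50()
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_HASHTAGS)
criterion = nn.BCEWithLogitsLoss()                       # hashtags are multi-label targets
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)

def pretrain_step(images, tag_targets):
    logits = backbone(images)                            # (batch, NUM_HASHTAGS)
    loss = criterion(logits, tag_targets.float())        # targets are 0/1 indicators per tag
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# 2) Transfer: chop off the hashtag head, freeze the trunk, train a small classifier on top.
backbone.fc = nn.Identity()                              # expose the 2048-d representation
for p in backbone.parameters():
    p.requires_grad = False
downstream_classifier = nn.Linear(2048, 1000)            # e.g. ImageNet-style classes

def downstream_features(images):
    with torch.no_grad():
        return backbone(images)                          # features fed to the classifier
```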
link |
00:44:50.480
People are making very, very fast progress using self supervised
link |
00:44:54.480
learning with this kind of scenario as well.
link |
00:44:57.480
And my guess is that that's going to be the future.
link |
00:45:02.480
For self supervised learning, how much cleaning do you think is needed
link |
00:45:06.480
for filtering malicious signal or what's the better term?
link |
00:45:12.480
But like a lot of people use hashtags on Instagram to get like good SEO
link |
00:45:19.480
that doesn't fully represent the contents of the image.
link |
00:45:22.480
Like they'll put a picture of a cat and hashtag it with like science,
link |
00:45:26.480
awesome, fun.
link |
00:45:27.480
I don't know.
link |
00:45:28.480
Why would you put science?
link |
00:45:30.480
That's not very good SEO.
link |
00:45:32.480
The way my colleagues who worked on this project at Facebook now,
link |
00:45:36.480
Meta, Meta AI, a few years ago, dealt with this is that they only
link |
00:45:41.480
selected something like 17,000 tags that correspond to kind of
link |
00:45:44.480
physical things or situations.
link |
00:45:47.480
Like, you know, that has some visual content.
link |
00:45:51.480
So, you know, you wouldn't have a hashtag like #tbt or anything like that.
link |
00:45:56.480
Also, they keep a very select set of hashtags.
link |
00:46:00.480
Is that what you're saying?
link |
00:46:01.480
Yeah.
link |
00:46:02.480
Okay.
link |
00:46:03.480
But it's still on the order of, you know, 10 to 20,000.
link |
00:46:06.480
So it's fairly large.
link |
00:46:08.480
Okay.
link |
00:46:09.480
Can you tell me about data augmentation?
link |
00:46:11.480
What the heck is data augmentation and how is it used?
link |
00:46:14.480
Maybe contrast of learning for video?
link |
00:46:18.480
What are some cool ideas here?
link |
00:46:20.480
Right.
link |
00:46:21.480
So data augmentation, I mean, first data augmentation, you know,
link |
00:46:23.480
is the idea of artificially increasing the size of your training set
link |
00:46:26.480
by distorting the images that you have in ways that don't change
link |
00:46:30.480
the nature of the image.
link |
00:46:31.480
Right.
link |
00:46:32.480
So you take, you do MNIST, you can do data augmentation on MNIST.
link |
00:46:35.480
And people have done this since the 1990s, right?
link |
00:46:37.480
You take a MNIST digit and you shift it a little bit or you change
link |
00:46:41.480
the size or rotate it, skew it, you know, et cetera.
link |
00:46:46.480
Add noise.
link |
00:46:47.480
Add noise, et cetera.
link |
00:46:49.480
And it works better if you train a supervised classifier with
link |
00:46:52.480
augmented data.
link |
00:46:53.480
You're going to get better results.
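A minimal sketch of that classic MNIST-style augmentation, shift, rescale, rotate, and add noise; the parameter values are just illustrative:

```python
# Classic data augmentation for MNIST digits (illustrative values).
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomAffine(degrees=10,             # small rotation / skew
                            translate=(0.1, 0.1),   # shift by up to 10% of the image
                            scale=(0.9, 1.1)),      # slightly change the size
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.05 * torch.randn_like(x)).clamp(0.0, 1.0)),  # add noise
])
# Plugged in as the dataset transform, every epoch sees slightly different digits:
# train_set = torchvision.datasets.MNIST("data", train=True, transform=augment, download=True)
```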
link |
00:46:55.480
Now it's become really interesting over the last couple of years
link |
00:47:00.480
because a lot of self supervised learning techniques to pre train
link |
00:47:05.480
vision systems are based on data augmentation.
link |
00:47:07.480
And the basic techniques is originally inspired by techniques
link |
00:47:13.480
that I worked on in the early 90s and Geoff Hinton worked on
link |
00:47:16.480
also in the early 90s.
link |
00:47:17.480
There was sort of parallel work.
link |
00:47:19.480
I used to call this a Siamese network.
link |
00:47:21.480
So basically you take two identical copies of the same network.
link |
00:47:25.480
They share the same weights and you show two different views of
link |
00:47:30.480
the same object.
link |
00:47:31.480
Either those two different views may have been obtained by
link |
00:47:33.480
data augmentation or maybe it's two different views of the same
link |
00:47:36.480
scene from a camera that you moved or at different times or
link |
00:47:39.480
something like that, right?
link |
00:47:41.480
Or two pictures of the same person, things like that.
link |
00:47:43.480
And then you train this neural net, those two identical copies
link |
00:47:47.480
of this neural net to produce an output representation, a vector
link |
00:47:51.480
in such a way that the representation for those two images
link |
00:47:55.480
are as close to each other as possible, as identical to each
link |
00:47:59.480
other as possible, right?
link |
00:48:00.480
Because you want the system to basically learn a function that
link |
00:48:04.480
will be invariant that will not change, whose output will not
link |
00:48:07.480
change when you transform those inputs in those particular
link |
00:48:11.480
ways, right?
link |
00:48:13.480
So that's easy to do.
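Here is a minimal sketch of that joint-embedding setup, assuming a toy encoder and a simple mean-squared-error notion of "as close as possible"; note that this invariance term on its own is exactly what collapses, which is the problem discussed next:

```python
# Siamese / joint-embedding sketch: one shared-weight encoder applied to two views.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(            # the *same* module (shared weights) encodes both views
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 128),
)

def invariance_loss(view_a, view_b):
    za = encoder(view_a)            # representation of the first view
    zb = encoder(view_b)            # representation of the second view, same weights
    return F.mse_loss(za, zb)       # pull the two representations together
```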
link |
00:48:15.480
What's complicated is how do you make sure that when you show
link |
00:48:18.480
two images that are different, the system will produce different
link |
00:48:20.480
things?
link |
00:48:21.480
Because if you don't have a specific provision for this, the
link |
00:48:26.480
system will just ignore the inputs.
link |
00:48:28.480
When you train it, it will end up ignoring the input and just
link |
00:48:30.480
produce a constant vector that is the same for every input,
link |
00:48:33.480
right?
link |
00:48:34.480
That's called a collapse.
link |
00:48:35.480
Now, how do you avoid collapse?
link |
00:48:36.480
So there's two ideas.
link |
00:48:38.480
One idea that I proposed in the early 90s with my colleagues at
link |
00:48:42.480
the lab, Jane Bromley and a couple other people, which we now
link |
00:48:47.480
call contrastive learning, which is to have negative examples,
link |
00:48:50.480
right?
link |
00:48:51.480
So you have pairs of images that you know are different.
link |
00:48:54.480
And you show them to the network and those two copies, and then
link |
00:48:58.480
you push the two output vectors away from each other.
link |
00:49:01.480
And it will eventually guarantee that things that are
link |
00:49:04.480
semantically similar produce similar representations and
link |
00:49:07.480
things that are different produce different representations.
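A hedged sketch of that pairwise contrastive recipe, with a margin-based loss in the spirit of that early work; the exact formulation and constants are illustrative:

```python
# Contrastive learning with positive and negative pairs (illustrative margin loss).
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_pair, margin=1.0):
    # z1, z2: (batch, dim) embeddings from the two shared-weight networks.
    # same_pair: (batch,) tensor of 1.0 for positive pairs, 0.0 for negative pairs.
    dist = F.pairwise_distance(z1, z2)                        # distance per pair
    pos = same_pair * dist.pow(2)                             # pull positives together
    neg = (1.0 - same_pair) * F.relu(margin - dist).pow(2)    # push negatives past the margin
    return (pos + neg).mean()
```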
link |
00:49:10.480
So we actually came up with this idea for a project of doing
link |
00:49:13.480
signature verification.
link |
00:49:14.480
So we would collect signatures, like multiple signatures from
link |
00:49:19.480
the same person and then train a neural net to produce the same
link |
00:49:22.480
representation.
link |
00:49:23.480
And then, you know, force the system to produce different
link |
00:49:27.480
representation from different signatures.
link |
00:49:30.480
This was actually the problem was proposed by people from what
link |
00:49:34.480
was a subsidiary of AT&T at the time called NCR.
link |
00:49:38.480
They were interested in storing representation of the signature
link |
00:49:41.480
in the 80 bytes of the magnetic stripe of a credit card.
link |
00:49:46.480
So we came up with this idea of having a neural net with 80
link |
00:49:49.480
outputs, you know, that we would quantize on bytes so that we
link |
00:49:52.480
could encode the signature.
link |
00:49:54.480
And that encoding was then used to compare whether the
link |
00:49:56.480
signature matches or not.
link |
00:49:57.480
That's right.
link |
00:49:58.480
So then you would, you know, sign, it would run through the
link |
00:50:00.480
neural net and then you would compare the output vector to
link |
00:50:02.480
whatever is stored on your card.
link |
00:50:03.480
Did it actually work?
link |
00:50:04.480
It worked, but they ended up not using it.
link |
00:50:06.480
Because nobody cares actually.
link |
00:50:09.480
I mean, the American, you know, financial payment system is
link |
00:50:13.480
incredibly lax in that respect compared to Europe.
link |
00:50:17.480
Oh, the signatures.
link |
00:50:18.480
What's the purpose of signatures anyway?
link |
00:50:20.480
Nobody looks at them.
link |
00:50:21.480
Nobody cares.
link |
00:50:22.480
Yeah.
link |
00:50:23.480
So that's contrastive learning, right?
link |
00:50:27.480
So you need positive and negative pairs.
link |
00:50:29.480
And the problem with that is that, you know, even though I had
link |
00:50:32.480
the original paper on this, I'm actually not very positive about
link |
00:50:35.480
it because it doesn't work in high dimension.
link |
00:50:38.480
If your representation is high dimensional, there's just too
link |
00:50:41.480
many ways for two things to be different.
link |
00:50:43.480
And so you would need lots and lots and lots of negative pairs.
link |
00:50:47.480
So there is a particular implementation of this, which is
link |
00:50:50.480
relatively recent from actually the Google Toronto group,
link |
00:50:54.480
where, you know, Geoff Hinton is the senior member there.
link |
00:50:58.480
It's called SimCLR, S I M C L R.
link |
00:51:01.480
And, you know, basically a particular way of implementing this
link |
00:51:04.480
idea of contrastive learning is a particular objective function.
link |
00:51:08.480
Now, what I'm much more enthusiastic about these days is
link |
00:51:12.480
non contrastive methods.
link |
00:51:14.480
So other ways to guarantee that the representations would be
link |
00:51:19.480
different for different inputs.
link |
00:51:23.480
And it's actually based on an idea that Geoff Hinton proposed
link |
00:51:28.480
in the early 90s with his student at the time, Sue Becker.
link |
00:51:31.480
And it's based on the idea of maximizing the mutual
link |
00:51:33.480
information between the outputs of the two systems.
link |
00:51:35.480
You only show positive pairs.
link |
00:51:37.480
You only show pairs of images that you know are somewhat similar.
link |
00:51:41.480
And you train the two networks to be informative,
link |
00:51:44.480
but also to be as informative of each other as possible.
link |
00:51:49.480
So basically one representation has to be predictable
link |
00:51:51.480
from the other, essentially.
link |
00:51:54.480
And, you know, he proposed that idea had, you know,
link |
00:51:58.480
a couple of papers in the early 90s, and then nothing was
link |
00:52:00.480
done about it for decades.
link |
00:52:03.480
And I kind of revived this idea together with my postdocs
link |
00:52:05.480
at FAIR, particularly a postdoc called Stéphane Deny,
link |
00:52:09.480
who is now a junior professor in Finland at Aalto
link |
00:52:12.480
University.
link |
00:52:14.480
We came up with something that we call Barlow Twins.
link |
00:52:18.480
And it's a particular way of maximizing the information content
link |
00:52:22.480
of a vector, you know, using some hypothesis.
link |
00:52:28.480
And we have kind of another version of it that's more recent
link |
00:52:32.480
now called VICReg, V I C R E G.
link |
00:52:34.480
That means variance invariance covariance regularization.
link |
00:52:37.480
And it's the thing I'm the most excited about in machine
link |
00:52:40.480
learning in the last 15 years.
link |
00:52:41.480
I mean, I'm really, really excited about this.
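As a compact sketch of the VICReg idea: an invariance term pulls the two embeddings together, a variance term keeps each dimension from collapsing, and a covariance term decorrelates the dimensions; the coefficients and details here are simplified relative to the actual paper:

```python
# Variance-Invariance-Covariance Regularization, simplified sketch.
import torch
import torch.nn.functional as F

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = za.shape
    invariance = F.mse_loss(za, zb)                  # two views of the same content should match

    def variance_term(z):
        std = torch.sqrt(z.var(dim=0) + eps)
        return F.relu(1.0 - std).mean()              # hinge: keep each dimension's std above 1
    variance = variance_term(za) + variance_term(zb)

    def covariance_term(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d             # penalize correlations between dimensions
    covariance = covariance_term(za) + covariance_term(zb)

    return sim_w * invariance + var_w * variance + cov_w * covariance
```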
link |
00:52:44.480
What kind of data augmentation is useful for that
link |
00:52:47.480
noncontrasting learning method?
link |
00:52:49.480
Are we talking about, does that not matter that much?
link |
00:52:52.480
Or it seems like a very important part of the step.
link |
00:52:55.480
Yeah.
link |
00:52:56.480
Do you generate the images that are similar but sufficiently
link |
00:52:58.480
different?
link |
00:52:59.480
Yeah, that's right.
link |
00:53:00.480
It's an important step.
link |
00:53:01.480
And it's also an annoying step because you need to have that
link |
00:53:03.480
knowledge of what data augmentations you can do that do not
link |
00:53:07.480
change the nature of the, of the object.
link |
00:53:10.480
And so the standard scenario, which, you know, a lot of people
link |
00:53:14.480
working in this area are using, is a particular set of distortions.
link |
00:53:19.480
So, so basically you do a geometric distortion.
link |
00:53:22.480
So one basically just shifts the image a little bit.
link |
00:53:24.480
It's called cropping.
link |
00:53:25.480
Another one kind of changes the scale a little bit.
link |
00:53:27.480
Another one kind of rotates it.
link |
00:53:29.480
Another one changes the colors.
link |
00:53:30.480
You know, you can do a shift in color balance or something like
link |
00:53:33.480
that.
link |
00:53:34.480
Saturation.
link |
00:53:35.480
Another one sort of blurs it.
link |
00:53:37.480
Another one has noise.
link |
00:53:38.480
So you have like a catalog of kind of standard things and people
link |
00:53:41.480
try to use the same ones for different algorithms so that they
link |
00:53:44.480
can compare.
link |
00:53:45.480
But some algorithms, some self supervised algorithms, actually can deal
link |
00:53:49.480
with much bigger, like more aggressive data augmentation and
link |
00:53:52.480
some don't.
link |
00:53:53.480
So that kind of makes the whole thing difficult.
link |
00:53:56.480
But that's the kind of distortions we're talking about.
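A sketch of that standard catalog of distortions as a two-view augmentation pipeline; the particular parameters are illustrative rather than those of any specific method:

```python
# Typical self-supervised augmentation catalog: crop/shift, rescale, flip, color, blur.
from torchvision import transforms

ssl_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),       # cropping = shift + rescale
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),            # color balance / saturation
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),                    # blur
    transforms.ToTensor(),
])

def two_views(image):
    # Each call applies a different random distortion, giving two views of the same image.
    return ssl_augment(image), ssl_augment(image)
```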
link |
00:53:58.480
And so you train with those distortions and then you chop
link |
00:54:05.480
off the last layer, a couple layers of the network and you use
link |
00:54:11.480
the representation as input to a classifier.
link |
00:54:13.480
You train the classifier on ImageNet, let's say, or whatever
link |
00:54:18.480
and measure the performance.
link |
00:54:20.480
And interestingly enough, the methods that are really good at
link |
00:54:24.480
eliminating the information that is irrelevant, which is the
link |
00:54:27.480
distortions between those images, do a good job at eliminating
link |
00:54:31.480
it.
link |
00:54:32.480
And as a consequence, you cannot use the representations in
link |
00:54:36.480
those systems for things like object detection and localization
link |
00:54:39.480
because that information is gone.
link |
00:54:42.480
So the type of data augmentation you need to do depends on the
link |
00:54:45.480
tasks you want eventually the system to solve.
link |
00:54:48.480
And the type of data augmentation, standard data augmentation
link |
00:54:51.480
that we use today are only appropriate for object recognition
link |
00:54:54.480
or image classification.
link |
00:54:55.480
They're not appropriate for things like.
link |
00:54:57.480
Can you help me out understand why the localization?
link |
00:55:00.480
So you're saying it's just not good at the negative, like at
link |
00:55:04.480
classifying the negative.
link |
00:55:05.480
So that's why it can't be used for the localization.
link |
00:55:07.480
No, it's just that you train the system, you know, you give
link |
00:55:11.480
it an image and then you give it the same image shifted and
link |
00:55:14.480
scaled and you tell it that's the same image.
link |
00:55:16.480
So the system basically is trained to eliminate the
link |
00:55:19.480
information about position and size.
link |
00:55:21.480
So now, and now you want to use that.
link |
00:55:24.480
Oh, like where an object is and what size.
link |
00:55:27.480
It's like a bounding box.
link |
00:55:28.480
They'd be able to actually.
link |
00:55:29.480
Okay.
link |
00:55:30.480
It can still find the object in the image.
link |
00:55:33.480
It's just not very good at finding the exact boundaries of
link |
00:55:36.480
that object.
link |
00:55:37.480
Interesting.
link |
00:55:38.480
Interesting.
link |
00:55:39.480
Which, you know, that's an interesting sort of philosophical
link |
00:55:42.480
question.
link |
00:55:43.480
How important is object localization anyway?
link |
00:55:46.480
We're like obsessed by measuring like image segmentation.
link |
00:55:50.480
Obsessed with perfectly knowing the boundaries of
link |
00:55:53.480
objects when arguably that's not that essential to understanding
link |
00:56:01.480
what are the contents of the scene.
link |
00:56:03.480
On the other hand, I think evolutionarily, the first vision
link |
00:56:06.480
systems in animals were basically all about localization,
link |
00:56:09.480
very little about recognition.
link |
00:56:11.480
And in the human brain, you have two separate pathways for
link |
00:56:15.480
recognizing the nature of a scene or an object and localizing
link |
00:56:21.480
objects.
link |
00:56:22.480
So you use the first pathway called a ventral pathway for,
link |
00:56:25.480
you know, telling what you're looking at.
link |
00:56:28.480
The other pathway, the dorsal pathway, is used for navigation,
link |
00:56:31.480
for grasping, for everything else.
link |
00:56:33.480
And, you know, basically a lot of the things you need for
link |
00:56:35.480
survival are localization and detection.
link |
00:56:40.480
Is similarity learning or contrastive learning or these
link |
00:56:45.480
noncontrastive methods the same as understanding something?
link |
00:56:48.480
Just because, you know, a distorted cat is the same as a
link |
00:56:51.480
non distorted cat.
link |
00:56:52.480
Does that mean you understand what it means to be a cat?
link |
00:56:56.480
To some extent.
link |
00:56:57.480
I mean, it's a superficial understanding, obviously.
link |
00:56:59.480
But like what is the ceiling of this method, do you think?
link |
00:57:02.480
Is this just one trick on the path to doing self
link |
00:57:06.480
supervised learning?
link |
00:57:07.480
Can we go really, really far?
link |
00:57:09.480
I think we can go really far.
link |
00:57:11.480
So if we figure out how to use techniques of that type, perhaps
link |
00:57:16.480
very different, but, you know, of the same nature to train a
link |
00:57:20.480
system from video to do video prediction, essentially, I think
link |
00:57:25.480
we'll have a path, you know, towards, you know, I wouldn't
link |
00:57:30.480
say unlimited, but a path towards some level of, you know,
link |
00:57:35.480
physical common sense in machines.
link |
00:57:37.480
And I also think that that ability to learn how the world
link |
00:57:44.480
works from a sort of high throughput channel like vision
link |
00:57:48.480
is a necessary step towards sort of real artificial intelligence.
link |
00:57:55.480
In other words, I believe in grounded intelligence.
link |
00:57:57.480
I don't think we can train a machine to be intelligent purely
link |
00:58:00.480
from text, because I think the amount of information about the
link |
00:58:04.480
world that's contained in text is tiny compared to what we need
link |
00:58:09.480
to know.
link |
00:58:10.480
So for example, let's, and I know people have attempted to do
link |
00:58:14.480
this for 30 years, right?
link |
00:58:16.480
The Cyc project and things like that, right?
link |
00:58:18.480
Of basically kind of writing down all the facts that are known
link |
00:58:21.480
and hoping that some sort of common sense will emerge.
link |
00:58:24.480
I think it's basically hopeless.
link |
00:58:26.480
But let me take an example.
link |
00:58:28.480
You take an object.
link |
00:58:29.480
I describe a situation to you.
link |
00:58:31.480
I take an object.
link |
00:58:32.480
I put it on the table and I push the table.
link |
00:58:34.480
It's completely obvious to you that the object will be pushed
link |
00:58:37.480
with the table, right?
link |
00:58:39.480
Because it's sitting on it.
link |
00:58:41.480
There's no text in the world, I believe, that explains this.
link |
00:58:44.480
And so if you train a machine as powerful as it could be,
link |
00:58:48.480
you know, your GPT 5000 or whatever it is,
link |
00:58:53.480
it's never going to learn about this.
link |
00:58:56.480
That information is just not present in any text.
link |
00:59:00.480
Well, the question, like with the Cyc project,
link |
00:59:03.480
the dream I think is to have like 10 million, say, facts like that
link |
00:59:10.480
that give you a head start, like a parent guiding you.
link |
00:59:15.480
Now we humans don't need a parent to tell us that the table will move.
link |
00:59:19.480
Sorry, the smartphone will move with the table.
link |
00:59:21.480
But we get a lot of guidance in other ways.
link |
00:59:25.480
So it's possible that we can give it a quick shortcut.
link |
00:59:28.480
What about a cat? The cat knows that.
link |
00:59:30.480
No, but they evolved.
link |
00:59:32.480
No, they learned like us.
link |
00:59:35.480
Sorry, the physics of stuff.
link |
00:59:38.480
Well, yeah, so you're saying it's,
link |
00:59:42.480
you're putting a lot of intelligence onto the nurture side, not the nature.
link |
00:59:47.480
We seem to have, you know, there's a very inefficient,
link |
00:59:51.480
arguably, process of evolution that got us from bacteria to who we are today.
link |
00:59:57.480
Started at the bottom. Now we're here.
link |
00:59:59.480
So the question is how, okay.
link |
01:00:03.480
So the question is how fundamental is that nature part, the whole hardware?
link |
01:00:08.480
And then is there any way to shortcut it if it's fundamental?
link |
01:00:12.480
If it's not, if most of intelligence, most of the cool stuff we've been talking about,
link |
01:00:15.480
is mostly nurture, mostly trained.
link |
01:00:18.480
We figure it out by observing the world.
link |
01:00:20.480
We can form that big, beautiful, sexy background model
link |
01:00:24.480
that you're talking about just by sitting there.
link |
01:00:28.480
Then, okay, then you need to, then like maybe,
link |
01:00:34.480
it is all supervised learning all the way down.
link |
01:00:37.480
Self supervised learning, say.
link |
01:00:39.480
Whatever it is that makes, you know, human intelligence different from other animals,
link |
01:00:43.480
which, you know, a lot of people think is language and logical reasoning and this kind of stuff.
link |
01:00:48.480
It cannot be that complicated because it only popped up in the last million years.
link |
01:00:52.480
And, you know, it only involves, you know, less than 1% of a genome, right?
link |
01:00:59.480
Which is the difference between human genome and chimps or whatever.
link |
01:01:03.480
So it can't be that complicated, you know, it can't be that fundamental.
link |
01:01:07.480
I mean, most of the really complicated stuff already exists in cats and dogs
link |
01:01:12.480
and, you know, certainly primates, non human primates.
link |
01:01:16.480
Yeah, that little thing with humans might be just something about social interaction
link |
01:01:22.480
and ability to maintain ideas across like a collective of people.
link |
01:01:27.480
It sounds very dramatic and very impressive, but it probably isn't mechanistically speaking.
link |
01:01:33.480
It is, but we're not there yet.
link |
01:01:34.480
Like, you know, we have, I mean, this is number 634, you know, in the list of problems we have to solve.
link |
01:01:42.480
So basic physics of the world is number one.
link |
01:01:46.480
What do you, just a quick tangent on data augmentation.
link |
01:01:51.480
So a lot of it is hard coded versus learned.
link |
01:01:57.480
Do you have any intuition that maybe there could be some weird data augmentation,
link |
01:02:03.480
like generative type of data augmentation, like doing something weird to images,
link |
01:02:07.480
which then improves the similarity learning process.
link |
01:02:12.480
So not just kind of dumb, simple distortions, but by you shaking your head,
link |
01:02:17.480
just saying that even simple distortions are enough.
link |
01:02:20.480
I think no, I think data augmentation is a temporary necessary evil.
link |
01:02:25.480
So what people are working on now is two things.
link |
01:02:28.480
One is the type of self supervised learning, like trying to translate the type of self supervised learning people
link |
01:02:35.480
use in language, translating these to images, which is basically a denoising autoencoder method.
link |
01:02:41.480
So you take an image, you block, you mask some parts of it,
link |
01:02:46.480
and then you train some giant neural net to reconstruct the parts that are missing.
link |
01:02:52.480
And until very recently, there was no working methods for that.
link |
01:02:58.480
All the auto encoder type methods for images weren't producing very good representation.
link |
01:03:03.480
But there's a paper now coming out of the FAIR group in Menlo Park that actually works very well.
link |
01:03:08.480
So that doesn't require data augmentation, that requires only masking.
link |
01:03:14.480
Only masking for images, okay.
link |
01:03:18.480
Right, so you mask part of the image and you train a system,
link |
01:03:21.480
which in this case is a transformer because the transformer represents the image as non overlapping patches,
link |
01:03:30.480
so it's easy to mask patches and things like that.
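A toy sketch of that masking idea: cut the image into non-overlapping patches, hide a random subset, and score reconstruction only on the hidden patches. The model below is an assumed transformer-like module mapping patch sequences to patch sequences, not the specific architecture of the paper mentioned:

```python
# Masked-patch reconstruction, toy sketch.
import torch

def patchify(images, patch=16):
    # images: (batch, 3, H, W) -> (batch, num_patches, 3 * patch * patch)
    b, c, h, w = images.shape
    x = images.unfold(2, patch, patch).unfold(3, patch, patch)   # (b, c, h/p, w/p, p, p)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)

def masked_reconstruction_loss(model, images, mask_ratio=0.75):
    patches = patchify(images)                                   # (b, n, d)
    b, n, _ = patches.shape
    mask = torch.rand(b, n, device=images.device) < mask_ratio   # True = hidden patch
    visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)       # zero out the hidden patches
    predicted = model(visible)                                   # assumed: (b, n, d) -> (b, n, d)
    return (predicted - patches)[mask].pow(2).mean()             # loss only on hidden patches
```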
link |
01:03:33.480
Okay, then my question transfers to that problem, then masking.
link |
01:03:36.480
Why should the mask be a square rectangle?
link |
01:03:39.480
So it doesn't matter.
link |
01:03:41.480
I think we're going to come up probably in the future with ways to mask that are kind of random, essentially.
link |
01:03:50.480
I mean, they are random already.
link |
01:03:52.480
No, but something that's challenging, optimally challenging.
link |
01:03:59.480
So maybe it's a metaphor that doesn't apply, but it seems like with data augmentation or masking,
link |
01:04:07.480
There's an interactive element with it.
link |
01:04:09.480
You're almost playing with an image and it's like the way we play with an image in our minds.
link |
01:04:15.480
No, but it's like dropout.
link |
01:04:16.480
It's like Boltzmann machine training.
link |
01:04:18.480
Every time you see a percept, you can perturb it in some way.
link |
01:04:26.480
And then the principle of the training procedure is to minimize the difference of the output or the representation
link |
01:04:34.480
between the clean version and the corrupted version, essentially, right?
link |
01:04:39.480
And you can do this in real time, right?
link |
01:04:41.480
So Boltzmann machines work like this, right?
link |
01:04:44.480
You show a percept, you tell the machine that's a good combination of activities of your input neurons.
link |
01:04:51.480
And then you either let them go their merry way without clamping them to values, or you only do this with a subset.
link |
01:05:01.480
And what you're doing is you're training the system so that the stable state of the entire network is the same
link |
01:05:08.480
regardless of whether it sees the entire input or whether it sees only part of it.
link |
01:05:13.480
You know, the denoising autoencoder method is basically the same thing, right?
link |
01:05:16.480
You're training a system to reproduce the input, the complete input and filling the blanks regardless of which parts are missing.
link |
01:05:23.480
And that's really the underlying principle.
link |
01:05:25.480
And you could imagine sort of even in the brain some sort of neural principle where, you know, neurons can oscillate, right?
link |
01:05:32.480
So they take their activity and then temporarily they kind of shut off to, you know,
link |
01:05:37.480
force the rest of the system to basically reconstruct the input without their help, you know?
link |
01:05:44.480
And I mean, you could imagine, you know, more or less biologically possible processes.
link |
01:05:50.480
Something like that.
link |
01:05:51.480
And I guess with this denoising autoencoder and masking and data augmentation, you don't have to worry about being super efficient.
link |
01:06:00.480
You can just do as much as you want and get better over time.
link |
01:06:05.480
Because I was thinking like you might want to be clever about the way you do all these procedures, you know?
link |
01:06:11.480
But that's only if it's somehow costly to do every iteration, and it's not really.
link |
01:06:17.480
Not really.
link |
01:06:19.480
And then there is, you know, data augmentation without explicit data augmentation.
link |
01:06:23.480
It's data augmentation by waiting, which is, you know, the sort of video prediction.
link |
01:06:28.480
You're observing a video clip, observing the, you know, the continuation of that video clip.
link |
01:06:35.480
You try to learn a representation using the joint embedding architectures in such a way that the representation of the future clip is easily predictable from the representation of the observed clip.
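A hedged sketch of that "augmentation by waiting" setup: a shared clip encoder plus a predictor trained so the representation of the future clip is predictable from the representation of the observed clip. The shapes are illustrative, and in practice a contrastive or VICReg-style anti-collapse term would be added on top:

```python
# Joint-embedding video prediction sketch: predict the future clip's representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

clip_encoder = nn.Sequential(          # shared encoder for both the observed and the future clip
    nn.Flatten(),
    nn.Linear(16 * 3 * 64 * 64, 512), nn.ReLU(),   # e.g. 16-frame 64x64 clips, flattened
    nn.Linear(512, 256),
)
predictor = nn.Sequential(             # maps the observed-clip representation to a prediction
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256),
)

def prediction_loss(observed_clip, future_clip):
    z_obs = clip_encoder(observed_clip)
    z_fut = clip_encoder(future_clip)
    z_pred = predictor(z_obs)
    # Stop-gradient on the target is one common anti-shortcut trick; only the predictive
    # part is shown here, without the anti-collapse term mentioned above.
    return F.mse_loss(z_pred, z_fut.detach())
```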
link |
01:06:47.480
Do you think YouTube has enough raw data from which to learn how to be a cat?
link |
01:06:55.480
I think so.
link |
01:06:57.480
So the amount of data is not the constraint?
link |
01:07:00.480
No, it would require some selection, I think.
link |
01:07:03.480
Some selection of, you know, maybe the right type of data.
link |
01:07:08.480
You need some...
link |
01:07:09.480
Don't go down the rabbit hole of just cat videos.
link |
01:07:11.480
You might need to watch some lectures or something.
link |
01:07:14.480
No, you...
link |
01:07:15.480
How meta would that be if it like watches lectures about intelligence and then learns, watches your lectures at NYU and learns from that how to be intelligent?
link |
01:07:26.480
I don't think that would be enough.
link |
01:07:29.480
What's your... do you find multimodal learning interesting?
link |
01:07:33.480
We've been talking about visual language, like combining those together, maybe audio, all those kinds of things.
link |
01:07:38.480
There's a lot of things that I find interesting in the short term, but are not addressing the important problem that I think are really kind of the big challenges.
link |
01:07:46.480
So I think, you know, things like multitask learning, continual learning, you know, adversarial issues.
link |
01:07:54.480
I mean, those have, you know, great practical interests in the relatively short term, possibly.
link |
01:08:00.480
But I don't think they're fundamental, you know, active learning, even to some extent reinforcement learning.
link |
01:08:04.480
I think those things will become either obsolete or useless or easy once we figure out how to do self supervised representation learning or learning predictive world models.
link |
01:08:19.480
So I think that's what, you know, the entire community should be focusing on.
link |
01:08:24.480
At least people are interested in sort of fundamental questions or, you know, really kind of pushing the envelope of AI towards the next stage.
link |
01:08:31.480
But of course, there's like a huge amount of, you know, very interesting work to do in sort of practical questions that have, you know, short term impact.
link |
01:08:38.480
Well, you know, it's difficult to talk about the temporal scale because all of human civilization will eventually be destroyed because the sun will die out.
link |
01:08:48.480
And even if Elon Musk is successful, multi planetary colonization across the galaxy, eventually the entirety of it will just become giant black holes.
link |
01:08:58.480
And that's going to take a while though.
link |
01:09:01.480
So, but what I'm saying is then that logic can be used to say it's all meaningless.
link |
01:09:06.480
I'm saying all that to say that multitask learning might be your song, you're calling it practical or pragmatic or whatever.
link |
01:09:16.480
That might be the thing that achieves something very kinder intelligence while we're trying to solve the more general problem of self supervised learning of background knowledge.
link |
01:09:28.480
So the reason I bring that up may be one way to ask that question.
link |
01:09:32.480
I've been very impressed by what Tesla autopilot team is doing.
link |
01:09:35.480
I don't know if you've gotten a chance to glance at this particular one example of multitask learning where they're literally taking the problem.
link |
01:09:44.480
Like, I don't know, how Charles Darwin started studying animals.
link |
01:09:48.480
They're studying the problem of driving and asking, okay, what are all the things you have to perceive?
link |
01:09:54.480
And the way they're solving it is one, there's an ontology where you're bringing that to the table.
link |
01:10:00.480
So you formulate a bunch of different tasks.
link |
01:10:02.480
It's like over a hundred tasks or something like that that are involved in driving.
link |
01:10:05.480
And then they're deploying it and then getting data back from people that run into trouble.
link |
01:10:10.480
And then trying to figure out, do we add tasks? Do we, like, focus on each individual task separately?
link |
01:10:15.480
In fact, half.
link |
01:10:17.480
So, I would classify Andrej Karpathy's talk in two ways.
link |
01:10:20.480
So one was about doors and the other one about how much ImageNet sucks.
link |
01:10:24.480
He kept going back and forth on those two topics. ImageNet sucks, meaning you can't just use a single benchmark.
link |
01:10:32.480
There's so like, you have to have like a giant suite of benchmarks to understand how well your system actually works.
link |
01:10:39.480
Oh, I agree with him. I mean, he's a very sensible guy.
link |
01:10:43.480
Now, okay, it's very clear that if you're faced with an engineering problem that you need to solve in a relatively short time,
link |
01:10:51.480
particularly if you have someone breathing down your neck, you're going to have to take shortcuts, right?
link |
01:10:57.480
You might think about the fact that the right thing to do and the long term solution involves, you know,
link |
01:11:04.480
some fancy software provisioning, but you have, you know, someone breathing down your neck.
link |
01:11:09.480
And, you know, this involves, you know, human lives.
link |
01:11:13.480
And so you have to basically just do the systematic engineering and, you know, fine tuning and refinements and trial and error and all that stuff.
link |
01:11:25.480
There's nothing wrong with that. That's, that's called engineering.
link |
01:11:28.480
That's called, you know, putting technology out in the, in the world.
link |
01:11:35.480
And you have to kind of ironclad it before, before you do this, you know, so much for, you know, grand ideas and principles.
link |
01:11:47.480
But, you know, I'm placing myself sort of, you know, some, you know, upstream of this, you know, quite a bit upstream of this.
link |
01:11:55.480
I don't think about platonic forms. I'm not purely platonic, because eventually I want the stuff to get used.
link |
01:12:02.480
But it's okay if it takes five or 10 years for the community to realize this is the right thing to do.
link |
01:12:08.480
I've done this before. It's been the case before that, you know, I've made that case.
link |
01:12:13.480
I mean, if you look back in the mid 2000, for example, and you ask yourself the question, okay, I want to recognize cars or faces or whatever.
link |
01:12:21.480
You know, I can use a convolutional net, or I can use more conventional kinds of computer vision techniques, you know, using interest point detectors or SIFT,
link |
01:12:32.480
dense features and, you know, sticking an SVM on top.
link |
01:12:35.480
At that time, the data sets were so small that those methods that used more hand engineering worked better than convnets.
link |
01:12:43.480
There was just not enough data for convnets, and convnets were a little slow with the kind of hardware that was available at the time.
link |
01:12:50.480
And there was a sea change when basically when, you know, data sets became bigger and GPUs became available.
link |
01:12:58.480
Those were, you know, the two main factors that basically made people change their mind.
link |
01:13:07.480
And you can, you can look at the history of like all sub branches of AI or pattern recognition.
link |
01:13:16.480
And there's a similar trajectory followed by techniques where people start by, you know, engineering the hell out of it.
link |
01:13:24.480
You know, be it optical character recognition, speech recognition, computer vision, like image recognition in general, natural language understanding,
link |
01:13:35.480
like, you know, translation, things like that, right, you start to engineer the hell out of it.
link |
01:13:40.480
You start to acquire all the knowledge, the prior knowledge, you know, about image formation, about, you know, the shape of characters,
link |
01:13:46.480
about, you know, morphological operations, about like feature extraction, Fourier transforms, you know, Zernike moments, you know, whatever, right.
link |
01:13:54.480
People have come up with thousands of ways of representing images so that they could be easily classified afterwards.
link |
01:14:01.480
Same for speech recognition, right. There is, you know, two decades for people to figure out a good front end to pre process a speech signal
link |
01:14:09.480
so that, you know, the information about what is being said is preserved, but most of the information about the identity of the speaker is gone.
link |
01:14:17.480
You know, cepstral coefficients or whatever, right.
link |
01:14:21.480
And same for text, right. You do named entity recognition and then you parse and you do tagging of the parts of speech and, you know,
link |
01:14:33.480
you do this sort of tree representation of clauses and all that stuff right before you can do anything.
link |
01:14:41.480
So that's how it starts, right. Just engineer the hell out of it.
link |
01:14:46.480
And then you start having data and maybe you have more powerful computers, maybe you know something about statistical learning.
link |
01:14:53.480
So you start using machine learning and it's usually a small sliver on top of your kind of handcrafted system where, you know, you extract features by hand.
link |
01:15:00.480
Okay, and now, you know, nowadays the standard way of doing this is that you train the entire thing end to end with a deep learning system
link |
01:15:06.480
and it learns its own features and, you know, speech recognition systems nowadays, OCR systems are completely end to end.
link |
01:15:14.480
It's, you know, it's some giant neural net that takes raw waveforms and produces a sequence of characters coming out.
link |
01:15:21.480
And it's just a huge neural net, right. There's no Markov model, there's no language model that is explicit other than, you know,
link |
01:15:28.480
something that's ingrained in the, in the sort of neural language model if you want.
link |
01:15:31.480
Same for translation, same for all kinds of stuff.
link |
01:15:34.480
So you see this continuous evolution from, you know, less and less hand crafting and more and more learning.
link |
01:15:44.480
And I think it's true in biology as well.
link |
01:15:50.480
So, I mean, we might disagree about this, maybe not in this one little piece at the end, you mentioned active learning.
link |
01:15:59.480
It feels like active learning, which is the selection of data and also the interactivity needs to be part of this giant neural network.
link |
01:16:06.480
You cannot just be an observer to do self supervised learning.
link |
01:16:09.480
You have to, well, self supervised learning is just a word, but I would, whatever this giant stack of a neural network that's automatically learning,
link |
01:16:19.480
it feels my intuition is that you have to have a system, whether it's a physical robot or a digital robot that's interacting with the world
link |
01:16:31.480
and doing so in a flawed way and improving over time in order to perform the self supervised learning.
link |
01:16:40.480
Well, you can't just give it a giant sea of data.
link |
01:16:44.480
Okay, I agree and I disagree.
link |
01:16:46.480
I agree in the sense that I agree in two ways.
link |
01:16:51.480
The first way I agree is that if you want and you certainly need a causal model of the world that allows you to predict the consequences of your actions,
link |
01:16:59.480
to train that model, you need to take actions, right?
link |
01:17:02.480
You need to be able to act in a world and see the effect for you to learn causal models of the world.
link |
01:17:08.480
That's not obvious because you can observe others.
link |
01:17:11.480
You can observe others.
link |
01:17:12.480
And you can infer that they're similar to you and then you can learn from that.
link |
01:17:15.480
Yeah, but then you have to kind of hardwire that part, and mirror neurons and all that stuff.
link |
01:17:20.480
And it's not clear to me how you would do this in a machine.
link |
01:17:24.480
So I think the action part would be necessary for having causal models of the world.
link |
01:17:32.480
The second reason it may be necessary or at least more efficient is that active learning basically goes for the jugular of what you don't know, right?
link |
01:17:44.480
There's obvious areas of uncertainty about your world and about how the world behaves.
link |
01:17:52.480
And you can resolve this uncertainty by systematic exploration of that part that you don't know.
link |
01:18:00.480
And if you know that you don't know, then it makes you curious.
link |
01:18:03.480
You kind of look into situations that...
link |
01:18:05.480
And across the animal world, different species have different levels of curiosity, right?
link |
01:18:13.480
Depending on how they're built, right?
link |
01:18:15.480
So, you know, cats and rats are incredibly curious.
link |
01:18:18.480
Dogs, not so much.
link |
01:18:19.480
I mean, less.
link |
01:18:20.480
Yeah.
link |
01:18:21.480
So it could be useful to have that kind of curiosity.
link |
01:18:23.480
So it'd be useful.
link |
01:18:24.480
But curiosity just makes the process faster.
link |
01:18:26.480
It doesn't make the process exist.
link |
01:18:29.480
So what process, what learning process is it that active learning makes more efficient?
link |
01:18:38.480
And I'm asking that first question.
link |
01:18:40.480
You know, we haven't answered that question yet.
link |
01:18:44.480
So, you know, I worry about active learning once this question is...
link |
01:18:48.480
So it's the more fundamental question to ask.
link |
01:18:50.480
And if active learning or interaction increases the efficiency of the learning...
link |
01:18:56.480
See, sometimes it becomes very different if the increase is several orders of magnitude, right?
link |
01:19:04.480
That's true.
link |
01:19:05.480
But fundamentally, it's still the same thing in building up the intuition about how to...
link |
01:19:10.480
In a self supervised way to construct background models, efficient or inefficient is the core problem.
link |
01:19:18.480
What do you think about Yoshua Bengio talking about consciousness and all of these kinds of concepts?
link |
01:19:24.480
Okay.
link |
01:19:25.480
I don't know what consciousness is, but...
link |
01:19:29.480
It's a good opener.
link |
01:19:31.480
And to some extent, a lot of the things that are said about consciousness remind me of the questions people were asking themselves
link |
01:19:38.480
in the 18th century or 17th century when they discovered that, you know, how the eye works
link |
01:19:44.480
and the fact that the image at the back of the eye was upside down, right?
link |
01:19:49.480
Because you have a lens.
link |
01:19:50.480
And so on your retina, the image that forms is an image of the world, but it's upside down.
link |
01:19:55.480
How is it that you see right side up?
link |
01:19:57.480
And, you know, with what we know today in science, you know, we realize this question doesn't make any sense
link |
01:20:03.480
or is kind of ridiculous in some way, right?
link |
01:20:06.480
So I think a lot of what is said about consciousness is of that nature.
link |
01:20:09.480
Now that said, there's a lot of really smart people that for whom I have a lot of respect who are talking about this topic,
link |
01:20:15.480
people like David Chalmers, who is a colleague of mine at NYU.
link |
01:20:19.480
I have kind of an unorthodox, folk, speculative hypothesis about consciousness.
link |
01:20:28.480
So we're talking about the study of a world model.
link |
01:20:31.480
And I think, you know, our entire prefrontal cortex basically is the engine for a world model.
link |
01:20:40.480
But when we are attending at a particular situation, we're focused on that situation.
link |
01:20:45.480
We basically cannot attend to anything else.
link |
01:20:48.480
And that seems to suggest that we basically have only one world model engine in our prefrontal cortex.
link |
01:20:59.480
That engine is configurable to the situation at hand.
link |
01:21:02.480
So whether we are building a box out of wood, or we are, you know, driving down the highway, or playing chess.
link |
01:21:09.480
We basically have a single model of the world that we configure into the situation at hand,
link |
01:21:15.480
which is why we can only attend to one task at a time.
link |
01:21:18.480
Now, if there is a task that we do repeatedly, it goes from the sort of deliberate reasoning using model of the world and prediction
link |
01:21:27.480
and perhaps something like model predictive control, which I was talking about earlier,
link |
01:21:31.480
to something that is more subconscious that becomes automatic.
link |
01:21:34.480
So I don't know if you've ever played against a chess grandmaster.
link |
01:21:38.480
You know, I get wiped out in, you know, 10, 10 plays, right?
link |
01:21:43.480
And, you know, I have to think about my move for, you know, like 15 minutes.
link |
01:21:49.480
And the person in front of me, the grandmaster, you know, would just like react within seconds, right?
link |
01:21:55.480
You know, he doesn't need to think about it.
link |
01:21:58.480
That's become part of subconscious because, you know, it's basically just pattern recognition at this point.
link |
01:22:04.480
Same, you know, the first few hours you drive a car, you're really attentive, you can't do anything else.
link |
01:22:09.480
And then after 20, 30 hours of practice, 50 hours, you know, it's subconscious.
link |
01:22:13.480
You can talk to the person next to you, you know, things like that, right?
link |
01:22:16.480
Unless the situation becomes unpredictable and then you have to stop talking.
link |
01:22:20.480
So that suggests you only have one model in your head.
link |
01:22:24.480
And it might suggest the idea that consciousness basically is the module that configures this world model of yours.
link |
01:22:31.480
You know, you need to have some sort of executive kind of overseer that configures your world model for the situation at hand.
link |
01:22:40.480
And that leads to kind of the really curious concept that consciousness is not a consequence of the power of our minds,
link |
01:22:47.480
but of the limitation of our brains.
link |
01:22:49.480
But because we have only one world model, we have to be conscious.
link |
01:22:53.480
If we had as many world models as there are situations we encounter,
link |
01:22:58.480
then we could do all of them simultaneously and we wouldn't need this sort of executive control that we call consciousness.
link |
01:23:04.480
Yeah, interesting. And somehow maybe that executive controller, I mean, the hard problem of consciousness,
link |
01:23:10.480
there's some kind of chemicals in biology that's creating a feeling like it feels to experience some of these things.
link |
01:23:18.480
That's kind of like the hard question is, what the heck is that? And why is that useful?
link |
01:23:24.480
Maybe the more pragmatic question, why is it useful to feel like this is really you experiencing this versus just like information being processed?
link |
01:23:35.480
It could be just a very nice side effect of the way we evolved.
link |
01:23:41.480
That's just very useful to feel a sense of ownership to the decisions you make, to the perceptions you make, to the model you're trying to maintain.
link |
01:23:52.480
Like you own this thing and it's the only one you got and if you lose it, it's going to really suck.
link |
01:23:57.480
And so you should really send the brain some signals about it.
link |
01:24:03.480
What ideas do you believe might be true that most or at least many people disagree with you with?
link |
01:24:10.480
Let's say in the space of machine learning.
link |
01:24:13.480
Well, it depends who you talk about, but I think, so certainly there is a bunch of people who are nativists who think that a lot of the basic things about the world are kind of hardwired in our minds.
link |
01:24:25.480
Things like the world is three dimensional, for example, is that hardwired?
link |
01:24:30.480
Things like object permanence, is it something that we learn before the age of three months or so?
link |
01:24:37.480
Or are we born with it?
link |
01:24:39.480
And there are very wide disagreements among the cognitive scientists for this.
link |
01:24:46.480
I think those things are actually very simple to learn.
link |
01:24:50.480
Is it the case that the oriented edge detectors in V1 are learned or are they hardwired?
link |
01:24:56.480
I think they are learned.
link |
01:24:57.480
They might be learned before birth because it's really easy to generate signals from the retina that actually will train edge detectors.
link |
01:25:04.480
And again, those are things that can be learned within minutes of opening your eyes.
link |
01:25:09.480
I mean, since the 1990s, we have algorithms that can learn oriented edge detectors completely unsupervised with the equivalent of a few minutes of real time.
link |
01:25:19.480
So those things have to be learned.
link |
01:25:22.480
And there are also those MIT experiments where you kind of plug the optic nerve into the auditory cortex of a baby ferret, right?
link |
01:25:30.480
And that auditory cortex becomes a visual cortex essentially.
link |
01:25:33.480
So clearly, there's learning taking place there.
link |
01:25:37.480
So I think a lot of what people think are so basic that they need to be hardwired, I think a lot of those things are learned because they are easy to learn.
link |
01:25:45.480
So you put a lot of value in the power of learning.
link |
01:25:49.480
What kind of things do you suspect might not be learned?
link |
01:25:52.480
Is there something that could not be learned?
link |
01:25:55.480
So your intrinsic drives are not learned.
link |
01:25:59.480
There are the things that make humans human or make cats different from dogs, right?
link |
01:26:07.480
It's the basic drives that are kind of hardwired in our basal ganglia.
link |
01:26:12.480
I mean, there are people who are working on this kind of stuff that's called intrinsic motivation in the context of reinforcement learning.
link |
01:26:17.480
So these are objective functions.
link |
01:26:19.480
Where the reward doesn't come from the external world, it's computed by your own brain.
link |
01:26:24.480
Your own brain computes whether you're happy or not, right?
link |
01:26:29.480
It measures your degree of comfort or discomfort.
link |
01:26:33.480
And because it's your brain computing this, presumably it knows also how to estimate gradients of this, right?
link |
01:26:40.480
So it's easier to learn when your objective is intrinsic.
link |
01:26:46.480
So that has to be hardwired.
link |
01:26:49.480
The critic that makes long term prediction of the outcome, which is the eventual result of this, that's learned.
link |
01:26:57.480
And perception is learned and your model of the world is learned.
link |
01:27:01.480
But let me take an example of why the critic, I mean, an example of how the critic might be learned, right?
link |
01:27:07.480
If I come to you, I reach across the table and I pinch your arm, right?
link |
01:27:13.480
Complete surprise for you.
link |
01:27:15.480
You would not have expected this from me.
link |
01:27:17.480
Yes, right, let's say for the sake of the story.
link |
01:27:21.480
Okay, your basal ganglia is going to light up because it's going to hurt, right?
link |
01:27:28.480
And now your model of the world includes the fact that I may pinch you if I approach my...
link |
01:27:34.480
Don't trust humans.
link |
01:27:36.480
Right, my hand to your arm.
link |
01:27:38.480
If I try again, you're going to recoil and that's your critic, your predictor of your ultimate pain system that predicts that something bad is going to happen and you recoil to avoid it.
link |
01:27:53.480
So even that can be learned.
link |
01:27:54.480
That is learned, definitely.
link |
01:27:56.480
This is what allows you also to define some goals, right?
link |
01:28:00.480
So the fact that you're a schoolchild who wakes up in the morning and goes to school, it's not because you necessarily like waking up early and going to school, but because you know that there is a long term objective you're trying to optimize.
link |
01:28:15.480
So Ernest Becker, I'm not sure if you're familiar with the philosopher who wrote the book Denial of Death and his idea is that one of the core motivations of human beings is our terror of death, our fear of death.
link |
01:28:26.480
That's what makes us unique from cats.
link |
01:28:28.480
Cats are just surviving.
link |
01:28:30.480
They do not have a deep, like cognizance introspection that over the horizon is the end.
link |
01:28:41.480
And he says that, I mean, there's a terror management theory that just all these psychological experiments that show basically this idea that all of human civilization, everything we create is kind of trying to forget.
link |
01:28:55.480
Even for a brief moment that we're going to die.
link |
01:28:59.480
When do you think humans understand that they're going to die?
link |
01:29:04.480
Is it learned early on also?
link |
01:29:07.480
I don't know at what point.
link |
01:29:10.480
I mean, it's a question like, at what point do you realize what death really is?
link |
01:29:15.480
And I think most people don't actually realize what death is, right?
link |
01:29:18.480
I mean, most people believe that you go to heaven or something, right?
link |
01:29:21.480
So to push back on that, what Ernest Becker says and Sheldon Solomon, all of those folks, and I find those ideas a little bit compelling is that there is moments in life, early in life.
link |
01:29:33.480
A lot of this fun happens early in life when you are, when you do deeply experience the terror of this realization and all the things you think about about religion, all those kinds of things that would kind of think about more like teenage years and later.
link |
01:29:50.480
We're talking about way earlier.
link |
01:29:52.480
No, it's like seven or eight years or something like that.
link |
01:29:54.480
You realize, holy crap, this is like the mystery, the terror.
link |
01:30:00.480
It's almost like you're a little prey, a little baby deer sitting in the darkness of the jungle of the woods, looking all around you.
link |
01:30:08.480
There's darkness full of terror.
link |
01:30:10.480
And that realization says, okay, I'm going to go back in the comfort of my mind where there is a deep meaning, where there is maybe like pretend I'm immortal in however way, however kind of idea I can construct to help me understand that I'm immortal.
link |
01:30:26.480
Religion helps with that.
link |
01:30:28.480
You can delude yourself in all kinds of ways, like lose yourself in the busyness of each day, have little goals in mind, all those kinds of things to think that it's going to go on forever.
link |
01:30:38.480
You kind of know you're going to die and it's going to be sad, but you don't really understand that you're going to die.
link |
01:30:44.480
And so that's their idea.
link |
01:30:46.480
And I find that compelling because it does seem to be a core unique aspect of human nature that we were able to really understand that this life is finite.
link |
01:30:58.480
That seems important.
link |
01:31:00.480
There's a bunch of different things there.
link |
01:31:02.480
So first of all, I don't think there is a qualitative difference between us and cats in that regard.
link |
01:31:06.480
I think the difference is that we just have a better ability to predict in the long term.
link |
01:31:14.480
And so we have a better understanding of how the world works.
link |
01:31:16.480
So we have better understanding of finiteness of life and things like that.
link |
01:31:20.480
So we have a better planning engine than cats?
link |
01:31:22.480
Yeah.
link |
01:31:24.480
But what's the motivation for planning that far?
link |
01:31:28.480
Well, I think it's just a side effect of the fact that we have just a better planning engine because it makes us, as I said, the essence of intelligence is the ability to predict.
link |
01:31:37.480
And so because we're smarter, as a side effect, we also have this ability to kind of make predictions about our own future existence or lack thereof.
link |
01:31:47.480
You say religion helps with that.
link |
01:31:50.480
I think religion hurts, actually.
link |
01:31:52.480
It makes people worry about what's going to happen after their death, et cetera.
link |
01:31:57.480
If you believe that you just don't exist after death, it solves completely the problem at least.
link |
01:32:02.480
You're saying if you don't believe in God, you don't worry about what happens after death?
link |
01:32:07.480
I don't know.
link |
01:32:09.480
You only worry about this life because that's the only one you have.
link |
01:32:14.480
Well, if I were to say what Ernest Becker says, and I actually agree with him more than not, is you do deeply worry.
link |
01:32:26.480
If you believe there's no God, there's still a deep worry of the mystery of it all.
link |
01:32:31.480
How does that make any sense?
link |
01:32:33.480
That it just ends.
link |
01:32:35.480
I don't think we can truly understand that this...
link |
01:32:39.480
I mean, so much of our life, the consciousness, the ego is invested in this being.
link |
01:32:46.480
Science keeps bringing humanity down from its pedestal.
link |
01:32:51.480
That's another example of it.
link |
01:32:54.480
That's wonderful, but for us individual humans, we don't like to be brought down from a pedestal.
link |
01:32:59.480
I'm fine with it.
link |
01:33:01.480
But see, you're fine with it because...
link |
01:33:03.480
Well, so what Ernest Becker would say is you're fine with it because that's just a more peaceful existence for you,
link |
01:33:08.480
but you're not really fine.
link |
01:33:10.480
In fact, some of the people that experienced the deepest trauma earlier in life,
link |
01:33:16.480
they often, before they seek extensive therapy, will say that I'm fine.
link |
01:33:20.480
It's like when you talk to people who are truly angry,
link |
01:33:23.480
they're like, how are you doing? I'm fine.
link |
01:33:25.480
The question is, what's going on?
link |
01:33:27.480
I had a near death experience.
link |
01:33:29.480
I had a very bad motorbike accident when I was 17.
link |
01:33:34.480
But that didn't have any impact on my reflection on that topic.
link |
01:33:40.480
So I'm basically just playing a bit of devil's advocate, pushing back on wondering,
link |
01:33:45.480
is it truly possible to accept death?
link |
01:33:47.480
And the flip side that's more interesting, I think, for AI and robotics, is how important is it to have this as one of the suite of motivations,
link |
01:33:57.480
is to not just avoid falling off the roof or something like that,
link |
01:34:04.480
but ponder the end of the ride.
link |
01:34:09.480
If you listen to the Stoics, it's a great motivator.
link |
01:34:14.480
It adds a sense of urgency.
link |
01:34:16.480
So maybe to truly fear death or be cognizant of it might give a deeper meaning and urgency to the moment to live fully.
link |
01:34:28.480
Maybe I don't disagree with that.
link |
01:34:31.480
I think what motivates me here is knowing more about human nature.
link |
01:34:38.480
I think human nature and human intelligence is a big mystery.
link |
01:34:42.480
It's a scientific mystery in addition to philosophical, et cetera.
link |
01:34:48.480
But I'm a true believer in science.
link |
01:34:53.480
And I do have a belief that for complex systems like the brain and the mind,
link |
01:34:59.480
the way to understand it is to try to reproduce it with artifacts that you build,
link |
01:35:06.480
because you know what's essential to it when you try to build it.
link |
01:35:09.480
You know, the same way I've used this analogy before with you, I believe,
link |
01:35:13.480
the same way we only started to understand aerodynamics when we started building airplanes,
link |
01:35:18.480
and that helped us understand how birds fly.
link |
01:35:21.480
So I think there's kind of a similar process here where we don't have a theory,
link |
01:35:27.480
a full theory of intelligence, but building intelligent artifacts will help us perhaps develop some underlying theory
link |
01:35:35.480
that encompasses not just artificial implements, but also human and biological intelligence in general.
link |
01:35:43.480
So you're an interesting person to ask this question about sort of all kinds of different other intelligent entities or intelligences.
link |
01:35:52.480
What are your thoughts about kind of like the touring or the Chinese room question?
link |
01:35:58.480
If we create an AI system that exhibits a lot of properties of intelligence and consciousness,
link |
01:36:07.480
how comfortable are you thinking of that entity as intelligent or conscious?
link |
01:36:12.480
So you're trying to build now systems that have intelligence and there's metrics about their performance,
link |
01:36:17.480
but that metric is external.
link |
01:36:22.480
So are you okay calling a thing intelligent or are you going to be like most humans
link |
01:36:28.480
and be once again unhappy to be brought down from a pedestal of consciousness or intelligence?
link |
01:36:34.480
No, I'll be very happy to understand more about human nature, human mind,
link |
01:36:43.480
and human intelligence through the construction of machines that have similar abilities.
link |
01:36:50.480
And if a consequence of this is to bring humanity down one more notch from its already low pedestal,
link |
01:36:57.480
I'm just fine with it. That's just a reality of life. So I'm fine with that.
link |
01:37:02.480
Now you were asking me about things that opinions I have that a lot of people may disagree with.
link |
01:37:07.480
I think if we think about the design of autonomous intelligence systems,
link |
01:37:14.480
assuming that we are somewhat successful at some level of getting machines to learn models of the world,
link |
01:37:21.480
we build intrinsic motivation objective functions to drive the behavior of that system.
link |
01:37:27.480
The system also has perception modules that allows it to estimate the state of the world
link |
01:37:32.480
and then have some way of figuring out a sequence of actions to optimize a particular objective.
link |
01:37:38.480
If it has a critic of the type I was describing before, the thing that makes you recoil your arm
link |
01:37:44.480
the second time I try to pinch you, an intelligent autonomous machine will have emotions.
link |
01:37:51.480
I think emotions are an integral part of autonomous intelligence.
link |
01:37:55.480
If you have an intelligent system that is driven by intrinsic motivation, by objectives,
link |
01:38:03.480
if it has a critic that allows it to predict in advance whether the outcome of a situation is going to be good or bad,
link |
01:38:10.480
it's going to have emotions. It's going to have fear when it predicts that the outcome is going to be bad
link |
01:38:17.480
and something to avoid; it's going to have elation when it predicts it's going to be good.
link |
01:38:23.480
If it has drives to relate with humans in some ways, the way humans have, it's going to be social.
link |
01:38:34.480
It's going to have emotions about attachment and things of that type.
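A toy sketch of that idea, with entirely hypothetical module names and toy dynamics: the objective is intrinsic, the critic predicts whether a planned action leads somewhere good or bad, and that prediction plays the role of fear or elation.

```python
# Toy sketch of an autonomous agent whose "emotions" are the critic's
# predictions about future intrinsic outcomes. Illustrative only.

def perceive(world):
    return world["state"]                    # perception module: estimate state

def world_model(state, action):
    return state + action                    # predict the next state (toy dynamics)

def critic(state):
    # Learned long-term predictor of intrinsic outcome (here a fixed toy rule:
    # states far from 0 are predicted to end badly).
    return -abs(state)

def choose_action(state, actions):
    emotions = {}
    best_action, best_value = None, float("-inf")
    for a in actions:
        predicted = critic(world_model(state, a))
        # The prediction itself is the "emotion" attached to that option.
        emotions[a] = "fear" if predicted < -1.0 else "elation"
        if predicted > best_value:
            best_action, best_value = a, predicted
    return best_action, emotions

state = perceive({"state": 2})
action, emotions = choose_action(state, actions=[-1, 0, +1])
print(action, emotions)   # picks the action whose predicted outcome is least bad
```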
link |
01:38:39.480
I think the sci fi thing where you see commander data having an emotion chip that you can turn off,
link |
01:38:49.480
I think that's ridiculous.
link |
01:38:51.480
Here's the difficult philosophical, social question.
link |
01:38:57.480
Do you think there will be a time, like a civil rights movement for robots where,
link |
01:39:03.480
okay, if we get the movement, but a discussion like the Supreme Court,
link |
01:39:09.480
that particular kinds of robots, particular kinds of systems deserve the same rights as humans
link |
01:39:18.480
because they can suffer just as humans can, all those kinds of things?
link |
01:39:24.480
Well, perhaps not.
link |
01:39:27.480
Imagine that humans were, that you could die and be restored.
link |
01:39:33.480
You could be 3D reprinted and your brain could be reconstructed in its finest details.
link |
01:39:40.480
Our ideas of rights will change in that case.
link |
01:39:43.480
You can always just, there's always a backup, you could always restore.
link |
01:39:48.480
Maybe the importance of murder will go down one notch.
link |
01:39:52.480
That's right.
link |
01:39:53.480
But also your desire to do dangerous things like skydiving or race car driving,
link |
01:40:06.480
car racing, all that kind of stuff would probably increase or airplane aerobatics or that kind of stuff.
link |
01:40:12.480
It would be fine to do a lot of those things or explore dangerous areas and things like that
link |
01:40:17.480
that would change your relationship.
link |
01:40:19.480
Now, it's very likely that robots would be like that because they'll be based on perhaps technology
link |
01:40:27.480
that is somewhat similar to this technology and you can always have a backup.
link |
01:40:32.480
It's possible.
link |
01:40:34.480
I don't know if you like video games, but there's a game called Diablo.
link |
01:40:39.480
My sons are huge fans of this.
link |
01:40:42.480
Yes.
link |
01:40:43.480
In fact, they made a game that's inspired by it.
link |
01:40:46.480
Awesome.
link |
01:40:47.480
They built a game.
link |
01:40:49.480
My three sons have a game design studio between them.
link |
01:40:52.480
That's awesome.
link |
01:40:53.480
They came out with a game last year.
link |
01:40:55.480
No, this was last year, about a year ago.
link |
01:40:58.480
That's awesome.
link |
01:40:59.480
But in Diablo, there's something called hardcore mode, which if you die, there's no, you're gone.
link |
01:41:05.480
Right.
link |
01:41:06.480
That's it.
link |
01:41:07.480
It's possible with AI systems for them to be able to operate successfully and for us to treat them in a certain way
link |
01:41:15.480
because they have to be integrated in human society, they have to be able to die.
link |
01:41:20.480
No copies allowed.
link |
01:41:21.480
In fact, copying is illegal.
link |
01:41:23.480
It's possible with humans as well, like cloning will be illegal, even when it's possible.
link |
01:41:28.480
But cloning is not copying, right?
link |
01:41:29.480
I mean, you don't reproduce the mind of the person and experience.
link |
01:41:33.480
Right.
link |
01:41:34.480
It's just a delayed twin.
link |
01:41:36.480
But then it's, but we were talking about with computers that you will be able to copy.
link |
01:41:40.480
Right.
link |
01:41:41.480
You'll be able to perfectly save, pickle the mind state.
link |
01:41:46.480
And it's possible that that will be illegal because that goes against, that will destroy the motivation of the system.
link |
01:41:55.480
Okay.
link |
01:41:56.480
So let's say you have a domestic robot sometime in the future.
link |
01:42:01.480
Yes.
link |
01:42:02.480
And the domestic robot comes to you kind of somewhat pre trained, it can do a bunch of things.
link |
01:42:08.480
Yes.
link |
01:42:09.480
But it has a particular personality that makes it slightly different from the other robots because that makes them more interesting.
link |
01:42:14.480
And then because it's lived with you for five years, you've grown some attachment to it and vice versa.
link |
01:42:22.480
And it's learned a lot about you.
link |
01:42:24.480
Or maybe it's not a household robot.
link |
01:42:26.480
Maybe it's a virtual assistant that lives in your augmented reality glasses or whatever, right?
link |
01:42:32.480
You know, the her movie type thing, right?
link |
01:42:36.480
And that system to some extent that the intelligence in that system is a bit like your child or maybe your PhD student in the sense that there's a lot of you in that machine now, right?
link |
01:42:49.480
Yeah.
link |
01:42:50.480
So if it were a living thing, you would do this for free if you want, right?
link |
01:42:56.480
If it's your child, your child can then live his or her own life.
link |
01:43:01.480
And you know, the fact that they learn stuff from you doesn't mean that you have any ownership of it, right?
link |
01:43:06.480
But if it's a robot that you've trained, perhaps you have some intellectual property claim.
link |
01:43:13.480
Intellectual property? Oh, I thought you meant like permanent value in the sense that's part of you in that way.
link |
01:43:20.480
Well, there is permanent value, right?
link |
01:43:21.480
So you would lose a lot if that robot were to be destroyed and you had no backup.
link |
01:43:25.480
You would lose a lot.
link |
01:43:26.480
You would lose a lot of investment.
link |
01:43:27.480
You know, kind of like a person dying, you know, that a friend of yours dying or a coworker or something like that.
link |
01:43:37.480
But also you have like intellectual property rights in the sense that that system is fine tuned to your particular existence.
link |
01:43:46.480
So that's now a very unique instantiation of that original background model, whatever it was that arrived.
link |
01:43:53.480
And then there are issues of privacy, right?
link |
01:43:55.480
Because now imagine that that robot has its own kind of volition and decides to work with someone else.
link |
01:44:01.480
Yes.
link |
01:44:02.480
Or kind of thinks life with you is sort of untenable or whatever.
link |
01:44:09.480
Now, all the things that that system learned from you, can you delete all the personal information that that system knows about you?
link |
01:44:20.480
I mean, that would be kind of an ethical question.
link |
01:44:22.480
Can you erase the mind of an intelligent robot to protect your privacy?
link |
01:44:29.480
You can't do this with humans.
link |
01:44:31.480
You can ask them to shut up, but that you don't have complete power over them.
link |
01:44:35.480
Can't erase humans.
link |
01:44:36.480
Yeah, that's the problem with relationships, you know: when you break up, you can't erase the other human.
link |
01:44:43.480
With robots, I think it will have to be the same thing, that there has to be some risk to our interactions to truly experience them deeply.
link |
01:44:55.480
It feels like so you have to be able to lose your robot friend.
link |
01:44:59.480
And that robot friend to go tweeting about how much of an asshole you are.
link |
01:45:03.480
But then are you allowed to, you know, murder the robot to protect your private information?
link |
01:45:08.480
Yeah, probably not.
link |
01:45:09.480
If the robot decides to leave.
link |
01:45:10.480
I have this intuition that for robots with with certain, like it's almost like regulation.
link |
01:45:16.480
If you declare your robot to be, let's call it sentient or something like that, like this, this robot is designed for human interaction.
link |
01:45:23.480
Then you're not allowed to murder these robots.
link |
01:45:25.480
It's the same as murdering other humans.
link |
01:45:27.480
Well, but what about if you do a backup of the robot that you preserve on a hard drive, or the equivalent in the future?
link |
01:45:33.480
That might be illegal.
link |
01:45:34.480
It's like piracy; piracy is illegal.
link |
01:45:37.480
But it's your own, it's your own robot, right?
link |
01:45:39.480
But you can't, you don't.
link |
01:45:41.480
But then you can wipe out its brain.
link |
01:45:44.480
So this robot doesn't know anything about you anymore, but technically it's still in existence because you backed it up.
link |
01:45:51.480
And then there'll be these great speeches at the Supreme Court by saying, Oh, sure, you can erase the mind of the robot, just like you can erase the mind of a human.
link |
01:45:59.480
We both can suffer.
link |
01:46:00.480
There'll be some epic, like, Obama-type character with a speech that the robots and the humans are the same.
link |
01:46:08.480
We can both suffer.
link |
01:46:09.480
We can both hope.
link |
01:46:10.480
We can both all of those, all those kinds of things, raise families, all that kind of stuff.
link |
01:46:16.480
It's interesting; for these, just like you said, emotion seems to be a fascinatingly powerful aspect of human-human interaction, human-robot interaction.
link |
01:46:26.480
And if they're able to exhibit emotions at the end of the day, that's probably going to have us deeply consider human rights, like what we value in humans, what we value in other animals.
link |
01:46:39.480
That's why robots and AI is great.
link |
01:46:41.480
It makes us ask really good questions.
link |
01:46:43.480
The hard questions.
link |
01:46:44.480
Yeah.
link |
01:46:45.480
I mean, you asked about the Chinese room type argument, you know, is it real?
link |
01:46:49.480
If it looks real?
link |
01:46:50.480
Yeah.
link |
01:46:51.480
I think the Chinese room argument is the ridiculous one.
link |
01:46:54.480
So, so for people that don't know Chinese room is you can, I don't even know how to formulate it well, but basically, you can mimic the behavior of an intelligent system by just following a giant algorithm code book that tells you exactly how to respond in exactly each case.
link |
01:47:12.480
But is that really intelligent?
link |
01:47:14.480
It's like a giant lookup table.
link |
01:47:16.480
When this person says this, you answer this.
link |
01:47:18.480
When this person says this, you answer this.
link |
01:47:20.480
And if you understand how that works, you have this giant nearly infinite lookup table.
link |
01:47:26.480
Is that really intelligence?
link |
01:47:28.480
Because intelligence seems to be a mechanism that's much more interesting and complex than this lookup table.
link |
01:47:34.480
I don't think so.
link |
01:47:35.480
I mean, the real question comes down to, do you think, you know, you can, you can mechanize intelligence in some way, even if that involves learning?
link |
01:47:47.480
And the answer is, of course, yes, there's no question.
link |
01:47:50.480
There's a second question then, which is, assuming you can reproduce intelligence in sort of different hardware than biological hardware, you know, like computers.
link |
01:48:02.480
Can you, you know, match human intelligence in all the domains in which humans are intelligent?
link |
01:48:12.480
Is it possible, right?
link |
01:48:14.480
So the hypothesis of strong AI.
link |
01:48:16.480
The answer to this, in my opinion, is an unqualified yes.
link |
01:48:20.480
This will as well happen at some point.
link |
01:48:22.480
There's no question that machines at some point will become more intelligent than humans in all domains where humans are intelligent.
link |
01:48:28.480
This is not for tomorrow, it's going to take a long time, regardless of what, you know, Elon and others have claimed or believed.
link |
01:48:37.480
This is a lot, a lot harder than many of those guys think it is.
link |
01:48:42.480
And many of those guys who thought it was simpler than that, you know, five years ago, now think it's hard, because it's been five years and they realize it's going to take a lot longer.
link |
01:48:53.480
That includes a bunch of people at DeepMind, for example.
link |
01:48:55.480
Oh, interesting. I haven't actually touched base with the DeepMind folks, but some of it, Elon or the MSSIs, I mean, sometimes in your role, you have to kind of create deadlines that are nearer than farther away to kind of create an urgency.
link |
01:49:12.480
Because, you know, you have to believe the impossible is possible in order to accomplish it.
link |
01:49:16.480
And there's, of course, a flip side to that coin, but it's a weird, you can't be too cynical if you want to get something done.
link |
01:49:22.480
Absolutely. I agree with that.
link |
01:49:24.480
I mean, you have to inspire people to work on sort of ambitious things.
link |
01:49:31.480
So, you know, it's certainly a lot harder than we believe, but there's no question in my mind that this will happen.
link |
01:49:38.480
And now, you know, people are kind of worried about what does that mean for humans.
link |
01:49:42.480
They are going to be brought down from their pedestal, you know, a bunch of notches with that and, you know, is that going to be good or bad?
link |
01:49:51.480
I mean, it's just going to give more power, right?
link |
01:49:53.480
And so, amplifiers for human intelligence, really.
link |
01:49:56.480
So, speaking of doing cool, ambitious things, FAIR, the Facebook AI Research Group, has recently celebrated its eighth birthday.
link |
01:50:05.480
Or maybe you can correct me on that.
link |
01:50:08.480
Looking back, what has been the successes, the failures, the lessons learned from the eight years of FAIR?
link |
01:50:14.480
And maybe you can also give context of where does the newly minted meta AI fit into how does it relate to FAIR?
link |
01:50:22.480
Right, so let me tell you a little bit about the organization of all this.
link |
01:50:26.480
Yeah, FAIR was created almost exactly eight years ago.
link |
01:50:29.480
It wasn't called FAIR yet.
link |
01:50:31.480
It took that name a few months later.
link |
01:50:34.480
And at the time, I joined Facebook.
link |
01:50:37.480
There was a group called the AI Group that had about 12 engineers and a few scientists, like, you know, 10 engineers and two scientists and something like that.
link |
01:50:46.480
I ran it for three and a half years as a director.
link |
01:50:50.480
I hired the first few scientists and kind of set up the culture and organized it, you know, explained to the Facebook leadership what fundamental research was about and how it can work within industry and how it needs to be open and everything.
link |
01:51:06.480
And I think it's been an unqualified success in the sense that FAIR has simultaneously produced, you know, top level research and advanced the science and the technology, provided tools, open source tools like PyTorch and many others.
link |
01:51:25.480
But at the same time, it has had a direct or mostly indirect impact on Facebook at the time, now Meta, in the sense that a lot of the systems that Meta is built around now are based on research projects that started at FAIR.
link |
01:51:47.480
And so if you were to take out, you know, deep learning out of Facebook services now, and Meta more generally, I mean, the company would literally crumble.
link |
01:51:57.480
I mean, it's completely built around AI these days.
link |
01:52:00.480
And it's really essential to the operations.
link |
01:52:03.480
So what happened after three and a half years is that I changed role.
link |
01:52:08.480
I became chief scientist.
link |
01:52:09.480
So I'm not doing day to day management of FAIR anymore.
link |
01:52:14.480
I'm more of a kind of, you know, think about strategy and things like that.
link |
01:52:18.480
And I carry my, I conduct my own research.
link |
01:52:21.480
I've, you know, my own kind of research group working on self supervised learning and things like this, which I didn't have time to do when I was director.
link |
01:52:27.480
So now FAIR is run by Joelle Pineau and Antoine Bordes together, because FAIR is kind of split now: there's something called FAIR Labs, which is sort of bottom-up, scientist-driven research, and FAIR Accel, which is slightly more organized for bigger projects that require a little
link |
01:52:45.480
more kind of focus and more engineering support and things like that.
link |
01:52:49.480
So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel.
link |
01:52:52.480
Where are they located?
link |
01:52:53.480
All over.
link |
01:52:54.480
It's delocalized all over.
link |
01:52:56.480
Okay.
link |
01:52:57.480
So there's no question that the leadership of the company believes that this was a very worthwhile investment.
link |
01:53:06.480
And what that means is that it's there for the long run.
link |
01:53:12.480
Right.
link |
01:53:13.480
So there is, if you want to talk in these terms, which I don't like, there's a business model, if you want, where FAIR, despite being a very fundamental research lab, brings a lot of value to the company.
link |
01:53:25.480
Mostly indirectly through other groups.
link |
01:53:29.480
Now what happened three and a half years ago when I stepped down was also the creation of Facebook AI, which was basically a larger organization that covers FAIR.
link |
01:53:40.480
So FAIR is included in it, but also has other organizations that are focused on applied research or advanced development of AI technology that is more focused on the products of the company.
link |
01:53:54.480
So less emphasis on fundamental research.
link |
01:53:56.480
Less fundamental.
link |
01:53:57.480
But it's still a research.
link |
01:53:58.480
I mean, there's a lot of papers coming out of those organizations and people are awesome and wonderful to interact with.
link |
01:54:05.480
But it serves as kind of a way to kind of scale up if you want AI technology, which may be very experimental and sort of lab prototypes into things that are usable.
link |
01:54:20.480
So FAIR is a subset of meta AI.
link |
01:54:22.480
Will FAIR become like KFC?
link |
01:54:24.480
It'll just keep the F.
link |
01:54:26.480
Nobody cares what the F stands for.
link |
01:54:28.480
We'll know soon enough, probably by the end of 2021.
link |
01:54:35.480
This is not a giant change, FAIR.
link |
01:54:38.480
Well, FAIR doesn't sound too good.
link |
01:54:39.480
But the brand people are kind of deciding on this and they've been hesitating for a while now and they tell us they're going to come up with an answer as to whether FAIR is going to change name.
link |
01:54:50.480
Or whether we're going to change just the meaning of the F.
link |
01:54:53.480
That's a good call.
link |
01:54:54.480
I'll keep FAIR and change the meaning of the F.
link |
01:54:56.480
That would be my preference.
link |
01:54:57.480
I would turn the F into fundamental.
link |
01:55:00.480
Oh, that's good.
link |
01:55:01.480
Fundamental AI research.
link |
01:55:02.480
Oh, that's really good.
link |
01:55:03.480
Within meta AI.
link |
01:55:04.480
So this would be meta FAIR.
link |
01:55:06.480
Yeah.
link |
01:55:07.480
But people will call it FAIR, right?
link |
01:55:08.480
Yeah, exactly.
link |
01:55:09.480
I like it.
link |
01:55:10.480
Meta AI is part of the reality lab.
link |
01:55:20.480
Meta now, the new Facebook right?
link |
01:55:21.480
It's called Meta and it's kind of divided into Facebook, Instagram, WhatsApp, and reality lab.
link |
01:55:32.480
Reality lab is about AR, VR, telepresence, communication technology and stuff like that.
link |
01:55:40.480
It's kind of the, you can think of it as the sort of a combination of sort of new products and technology part of meta.
link |
01:55:51.480
Is that where the touch sensing for robots?
link |
01:55:53.480
I saw that you were posting about that.
link |
01:55:55.480
Touch sensing for robots is part of FAIR, actually.
link |
01:55:57.480
That's a FAIR project.
link |
01:55:58.480
Oh, it is.
link |
01:55:59.480
Okay, cool.
link |
01:56:00.480
This is also the, no, but there is the other way, the haptic glove, right?
link |
01:56:05.480
Yes.
link |
01:56:06.480
That's more reality lab.
link |
01:56:08.480
That's reality lab research.
link |
01:56:10.480
Reality lab research.
link |
01:56:11.480
By the way, the touch sensors are super interesting.
link |
01:56:14.480
Like integrating that modality into the whole sensing suite is very interesting.
link |
01:56:21.480
So what do you think about the metaverse?
link |
01:56:23.480
What do you think about this whole kind of expansion of the view of the role of Facebook and meta in the world?
link |
01:56:30.480
Well, metaverse really should be thought of as the next step in the internet, right?
link |
01:56:35.480
Sort of trying to kind of make the experience more compelling of being connected either with other people or with content.
link |
01:56:48.480
And, you know, we are evolved and trained to evolve in, you know, 3D environments where, you know, we can see other people.
link |
01:56:58.480
We can talk to them when you're near them or, you know, and other people are far away can hear us, you know, things like that, right?
link |
01:57:05.480
So it's, there's a lot of social conventions that exist in the real world that we can try to transpose.
link |
01:57:10.480
Now, what is going to be eventually the, how compelling is it going to be?
link |
01:57:16.480
Is it going to be the case that people are going to be willing to do this if they have to wear, you know, a huge pair of goggles all day?
link |
01:57:24.480
Maybe not.
link |
01:57:25.480
But then again, if the experience is sufficiently compelling, maybe so.
link |
01:57:30.480
Or if the device that you have to wear is just basically a pair of glasses, you know, technology makes sufficient progress for that.
link |
01:57:36.480
You know, AR is a much easier concept to grasp that you're going to have, you know, augmented reality glasses that basically contain some sort of, you know, virtual assistant that can help you in your daily lives.
link |
01:57:49.480
But at the same time with the AR, you have to contend with reality.
link |
01:57:53.480
With VR, you can completely detach yourself from reality.
link |
01:57:55.480
So it gives you freedom.
link |
01:57:56.480
It might be easier to design worlds in VR.
link |
01:57:59.480
Yeah.
link |
01:58:00.480
But you can imagine, you know, the metaverse being a mix, right?
link |
01:58:05.480
Or like you can have objects that exist in the metaverse that, you know, pop up on top of the real world or only exist in virtual reality.
link |
01:58:13.480
Okay.
link |
01:58:14.480
Let me ask the hard question.
link |
01:58:16.480
Because all of this was easy.
link |
01:58:18.480
This was easy.
link |
01:58:19.480
The Facebook now meta, the social network has been painted by the media as net negative for society, even destructive and evil at times.
link |
01:58:30.480
You've pushed back against this defending Facebook.
link |
01:58:33.480
Can you explain your defense?
link |
01:58:36.480
Yeah.
link |
01:58:37.480
So the description, the company that is being described in the, in some media is not the company we know when we work inside.
link |
01:58:46.480
And, you know, it could be claimed that a lot of employees are uninformed about what really goes on in the company.
link |
01:58:54.480
But, you know, I'm a vice president.
link |
01:58:56.480
I mean, I have a pretty good vision of what goes on.
link |
01:58:58.480
You know, I don't know everything.
link |
01:58:59.480
Obviously, I'm not involved in everything, but certainly not in decision about like, you know, content moderation or anything like this.
link |
01:59:06.480
But I have some decent vision of what goes on.
link |
01:59:09.480
And this evil that is being described, I just don't see it.
link |
01:59:13.480
And then, you know, I think there is an easy story to buy, which is that, you know, all the bad things in the world and, you know, the reason your friend believes crazy stuff.
link |
01:59:26.480
You know, there's an easy scapegoat, right, in social media in general, Facebook in particular.
link |
01:59:33.480
But you have to look at the data.
link |
01:59:35.480
Like, is it the case that Facebook, for example, polarizes people politically?
link |
01:59:41.480
Are there academic studies that show this?
link |
01:59:44.480
Is it the case that, you know, teenagers think less of themselves if they use Instagram more?
link |
01:59:51.480
Is it the case that, you know, people get more riled up against, you know, opposite sides in a debate or political opinion if they are more on Facebook or if they are less?
link |
02:00:05.480
And study after study show that none of this is true.
link |
02:00:10.480
This is independent studies by academics.
link |
02:00:12.480
They're not funded by Facebook or Meta.
link |
02:00:14.480
You know, studies by Stanford, by some of my colleagues at NYU actually, with whom I have no connection.
link |
02:00:20.480
You know, there's a study recently.
link |
02:00:22.480
They paid people, I think it was in former Yugoslavia, I'm not exactly sure in what part.
link |
02:00:31.480
But they paid people to not use Facebook for a while in the period before the anniversary of the Srebrenica massacres, right?
link |
02:00:43.480
So, you know, people get riled up, like, you know, should we have a celebration?
link |
02:00:47.480
I mean, a memorial kind of celebration for it or not.
link |
02:00:50.480
So they paid a bunch of people to not use Facebook for a few weeks.
link |
02:00:55.480
And it turns out that those people ended up being more polarized than they were at the beginning and the people who were more on Facebook were less polarized.
link |
02:01:05.480
There's a study, you know, from Stanford of economists at Stanford that tried to identify the causes of increasing polarization in the U.S.
link |
02:01:15.480
And it's been going on for 40 years before, you know, Mark Zuckerberg was born continuously.
link |
02:01:22.480
And so if there is a cause, it's not Facebook or social media.
link |
02:01:27.480
So you could say if social media just accelerated.
link |
02:01:29.480
But no, I mean, it's basically a continuous evolution by some measure of polarization in the U.S.
link |
02:01:35.480
And then you compare this with other countries, like the western half of Germany, because you can't go back 40 years on the East side, or Denmark or other countries.
link |
02:01:46.480
And they use Facebook just as much.
link |
02:01:48.480
And they're not getting more polarized.
link |
02:01:50.480
They're getting less polarized.
link |
02:01:51.480
So if you want to look for, you know, a causal relationship there, you can find a scapegoat, but you can't find a cause.
link |
02:01:59.480
Now, if you want to fix the problem, you have to find the right cause.
link |
02:02:02.480
And what riles me up is that people now are accusing Facebook of bad deeds that are done by others.
link |
02:02:08.480
And those others, we're not doing anything about them.
link |
02:02:11.480
And by the way, those others include the owner of the Wall Street Journal in which all of those papers were published.
link |
02:02:17.480
So I should mention that I'm talking to Schrep, Mike Schroepfer, on this podcast, and also Mark Zuckerberg, and probably these are conversations you can have with them.
link |
02:02:25.480
Because it's very interesting to me, even if Facebook has some measurable negative effect,
link |
02:02:31.480
you can't just consider that in isolation.
link |
02:02:33.480
You have to consider about all the positive ways it connects us.
link |
02:02:36.480
So like every technology is a question.
link |
02:02:39.480
You can't just say like there's an increase in division.
link |
02:02:43.480
Yes, probably Google search engine has created increase in division.
link |
02:02:47.480
We have to consider about how much information are brought to the world.
link |
02:02:50.480
Like I'm sure Wikipedia created more division.
link |
02:02:53.480
If you just look at the division, we have to look at the full context of the world and they didn't make a better world.
link |
02:02:59.480
But the printing press has created more division.
link |
02:03:01.480
Exactly.
link |
02:03:02.480
So when the printing press was invented, the first books that were printed were things like the Bible.
link |
02:03:10.480
And that allowed people to read the Bible by themselves, not get the message uniquely from priests in Europe.
link |
02:03:17.480
And that created the Protestant movement and 200 years of religious persecution and wars.
link |
02:03:23.480
So that's a bad side effect of the printing press.
link |
02:03:25.480
Social networks aren't being nearly as bad as the printing press, but nobody would say the printing press was a bad idea.
link |
02:03:32.480
Yeah, a lot of it is perception and there's a lot of different incentives operating here.
link |
02:03:37.480
Maybe a quick comment.
link |
02:03:39.480
Since you're one of the top leaders at Facebook and at Meta, sorry, that's in the tech space.
link |
02:03:46.480
I'm sure Facebook involves a lot of incredible technological challenges that need to be solved.
link |
02:03:52.480
A lot of it probably is in the computer infrastructure, the hardware.
link |
02:03:55.480
I mean, it's just a huge amount.
link |
02:03:58.480
Maybe can you give me context about how much of Schrep's life is AI and how much of it is low level compute?
link |
02:04:05.480
How much of it is flying all around doing business stuff in the same way as Mark Zuckerberg?
link |
02:04:11.480
They really focus on AI.
link |
02:04:13.480
I mean, certainly in the run up of the creation of FAIR and for at least a year after that, if not more,
link |
02:04:23.480
Mark was very, very much focused on AI and was spending quite a lot of effort on it.
link |
02:04:29.480
And that's his style.
link |
02:04:30.480
When he gets interested in something, he reads everything about it.
link |
02:04:33.480
He read some of my papers, for example, before I joined.
link |
02:04:39.480
And so he learns a lot about it.
link |
02:04:44.480
And Schrep was really into it also.
link |
02:04:50.480
I mean, Shrap is really kind of has something I've tried to preserve also despite my not so young age,
link |
02:04:59.480
which is a sense of wonder about science and technology.
link |
02:05:02.480
And he certainly has that.
link |
02:05:05.480
He's also a wonderful person in terms of as a manager, dealing with people and everything.
link |
02:05:11.480
Mark also actually.
link |
02:05:13.480
I mean, they're very human people.
link |
02:05:17.480
In the case of Mark, it's shockingly human, given his trajectory.
link |
02:05:24.480
The personality of him that is presented in the press is just completely wrong.
link |
02:05:29.480
Yeah.
link |
02:05:30.480
But you have to know how to play the press.
link |
02:05:32.480
I put some of that responsibility on him, too.
link |
02:05:36.480
It's like the director, the conductor of an orchestra.
link |
02:05:44.480
You have to play the press and the public in a certain kind of way where you convey your true self to them.
link |
02:05:49.480
If there's a depth and kind of state.
link |
02:05:51.480
And he's probably not the best at it.
link |
02:05:54.480
You have to learn.
link |
02:05:57.480
And it's sad to see, I'll talk to him about it, but Schrep is slowly stepping down.
link |
02:06:03.480
It's always sad to see folks sort of be there for a long time and slowly.
link |
02:06:09.480
I guess time is sad.
link |
02:06:11.480
I think he's done the thing he set out to do and he's got family priorities and stuff like that.
link |
02:06:21.480
And I understand after 13 years or something.
link |
02:06:27.480
It's been a good run.
link |
02:06:28.480
Which in Silicon Valley is basically a lifetime, because it's dog years.
link |
02:06:34.480
So NeurIPS, the conference, just wrapped up.
link |
02:06:38.480
Let me just go back to something else.
link |
02:06:40.480
You posted that the paper you coauthored was rejected from NeurIPS.
link |
02:06:44.480
As you said, proudly in quotes rejected.
link |
02:06:47.480
Good joke.
link |
02:06:48.480
Yeah, I know.
link |
02:06:50.480
Can you describe this paper and like what was the idea in it?
link |
02:06:55.480
And also maybe this is a good opportunity to ask what are the pros and cons?
link |
02:07:00.480
What works and what doesn't about the review process?
link |
02:07:03.480
Yeah, let me talk about the paper first.
link |
02:07:04.480
I'll talk about the review process afterwards.
link |
02:07:08.480
The paper is called VICReg.
link |
02:07:10.480
So this is, I mentioned that before, variance-invariance-covariance regularization.
link |
02:07:14.480
And it's a non-contrastive learning technique for what I call joint embedding architectures.
link |
02:07:21.480
So Siamese nets are an example of joint embedding architecture.
link |
02:07:24.480
So joint embedding architecture is let me back up a little bit, right?
link |
02:07:30.480
So if you want to do self supervised learning, you can do it by prediction.
link |
02:07:35.480
So let's say you want to train a system to predict video, right?
link |
02:07:38.480
You show it a video clip and you train the system to predict the next, the continuation of that video clip.
link |
02:07:44.480
Now, because you need to handle uncertainty because there are many, you know, many continuations that are plausible.
link |
02:07:50.480
You need to have, you need to handle this in some way.
link |
02:07:53.480
You need to have a way for the system to be able to produce multiple predictions.
link |
02:07:59.480
And the way, the only way I know to do this is through what's called a latent variable.
link |
02:08:04.480
So you have some sort of hidden vector of a variable that you can vary over a set or draw from a distribution.
link |
02:08:11.480
And as you vary this vector over a set, the output, the prediction varies over a set of plausible predictions.
link |
02:08:17.480
Okay, so that's called, I call this a generative latent variable model.
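To give a rough feel for what "vary a latent variable to get multiple plausible predictions" means, here is a toy PyTorch-style sketch; the architecture, dimensions, and names are made up for illustration, not any actual model.

```python
# Toy generative latent-variable predictor: the same observed context, combined
# with different values of a latent vector z, yields different plausible
# predictions. Purely illustrative; shapes and names are assumptions.
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    def __init__(self, ctx_dim=16, z_dim=4, out_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + z_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, context, z):
        # Concatenate the observed context with the latent variable.
        return self.net(torch.cat([context, z], dim=-1))

model = LatentPredictor()
context = torch.randn(1, 16)          # representation of the observed clip
for _ in range(3):
    z = torch.randn(1, 4)             # sample a different latent each time
    prediction = model(context, z)    # a different plausible continuation
    print(prediction.shape)           # torch.Size([1, 16])
```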
link |
02:08:23.480
Okay, now there's an alternative to this to handle uncertainty.
link |
02:08:28.480
And instead of directly predicting the next frames of the clip, you also run those through another neural net.
link |
02:08:40.480
So you now have two neural nets, one that looks at the, you know, the initial segment of the video clip.
link |
02:08:48.480
And another one that looks at the continuation during training, right?
link |
02:08:53.480
And what you're trying to do is learn a representation of those two video clips that is maximally informative about the video clips themselves.
link |
02:09:03.480
But it's such that you can predict the representation of the second video clip from the representation of the first one easily.
link |
02:09:11.480
And you can sort of formalize this in terms of maximizing mutual information and some stuff like that, but it doesn't matter.
link |
02:09:17.480
What you want is informative, representative, you know, informative representations of the two video clips that are mutually predictable.
link |
02:09:27.480
What that means is that there's a lot of details in the second video clips that are irrelevant.
link |
02:09:33.480
You know, let's say a video clip consists in, you know, a camera panning the scene.
link |
02:09:41.480
There's going to be a piece of that room that is going to be revealed and I can somewhat predict what that room is going to look like.
link |
02:09:47.480
But I may not be able to predict the details of the texture of the ground and where the tiles are ending and stuff like that, right?
link |
02:09:54.480
So those are irrelevant details that perhaps my representation will eliminate.
link |
02:09:59.480
And so what I need is to train this second neural net in such a way that whenever the continuation video clip varies
link |
02:10:09.480
over all the plausible continuations, the representation doesn't change.
link |
02:10:15.480
Got it.
link |
02:10:17.480
Over the space of representations, doing the same kind of thing as you do with similarity learning.
link |
02:10:25.480
So these are two ways to handle multimodality in a prediction, right?
link |
02:10:29.480
In the first way, you parameterize the prediction with a latent variable, but you predict pixels essentially, right?
link |
02:10:35.480
In the second one, you don't predict pixels, you predict an abstract representation of pixels,
link |
02:10:40.480
and you guarantee that this abstract representation has as much information as possible about the input,
link |
02:10:45.480
but sort of, you know, drops all the stuff that you really can't predict essentially.
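To make the second, non-contrastive joint-embedding approach concrete, here is a compact sketch of a VICReg-style loss on two batches of embeddings from the two branches, with variance, invariance, and covariance terms; the coefficients, epsilon, and helper names are illustrative placeholders, not necessarily the paper's exact settings.

```python
# Sketch of a VICReg-style objective on two batches of embeddings z_a, z_b
# coming from the two branches of a joint-embedding (Siamese) network.
import torch
import torch.nn.functional as F

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance: the two views should map to similar embeddings.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance: keep each embedding dimension's std above 1 to prevent collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.relu(1.0 - std_a).mean() + torch.relu(1.0 - std_b).mean()

    # Covariance: decorrelate dimensions by penalizing off-diagonal covariance.
    def cov_penalty(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / d

    cov_loss = cov_penalty(z_a) + cov_penalty(z_b)
    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss

# Example: random embeddings standing in for the outputs of the two branches.
z_a, z_b = torch.randn(8, 16), torch.randn(8, 16)
print(vicreg_style_loss(z_a, z_b).item())
```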
link |
02:10:51.480
I used to be a big fan of the first approach.
link |
02:10:53.480
And in fact, in this paper with Ishan Misra, this blog post, the Dark Matter of Intelligence, I was kind of advocating for this.
link |
02:10:59.480
And in the last year and a half, I've completely changed my mind.
link |
02:11:02.480
I'm now a big fan of the second one.
link |
02:11:05.480
And it's because of a small collection of algorithms that have been proposed over the last year and a half or so, two years, to do this, including VICReg.
link |
02:11:17.480
Its predecessor, called Barlow Twins, which I mentioned.
link |
02:11:21.480
A method from our friends at DeepMind called BYOL.
link |
02:11:25.480
And there's a bunch of others now that kind of work similarly.
link |
02:11:29.480
So they're all based on this idea of joint embedding.
link |
02:11:32.480
Some of them have an explicit criterion.
link |
02:11:34.480
There is an approximation of mutual information.
link |
02:11:36.480
Some others, like BYOL, work, but we don't really know why.
link |
02:11:39.480
And there's been like lots of theoretical papers about why BYOL works.
link |
02:11:42.480
No, it's not that because we take it out and it still works.
link |
02:11:45.480
I mean, so there's like a big debate, but the important point is that we now have a collection of noncontrastive joint embedding methods,
link |
02:11:53.480
which I think is the best thing since sliced bread.
link |
02:11:56.480
So I'm super excited about this because I think it's our best shot for techniques that would allow us to kind of build predictive world models.
link |
02:12:06.480
And at the same time, learn hierarchical representations of the world where what matters about the world is preserved and what is irrelevant is eliminated.
link |
02:12:14.480
By the way, the representations that before and after is in the space in a sequence of images or is it for single images?
link |
02:12:22.480
It would be either for a single image for a sequence.
link |
02:12:24.480
It doesn't have to be images. This could be applied to text.
link |
02:12:26.480
This could be applied to just about any signal.
link |
02:12:28.480
I'm looking for methods that are generally applicable that are not specific to one particular modality.
link |
02:12:35.480
It could be audio or whatever.
link |
02:12:37.480
Got it. So what's the story behind this paper?
link |
02:12:39.480
This paper is describing one such method?
link |
02:12:43.480
It's this VICReg method.
link |
02:12:44.480
So the first author is a student called Adrien Bardes, who is a resident PhD student at FAIR Paris.
link |
02:12:54.480
He's coadvised by me and Jean Ponce, who is a professor at École Normale Supérieure and also a research director at INRIA.
link |
02:13:01.480
So this is a wonderful program in France where PhD students can basically do their PhD in industry.
link |
02:13:06.480
And that's kind of what's happening here.
link |
02:13:10.480
And this paper is a follow-up on the Barlow Twins paper by my former postdoc, Stéphane Deny,
link |
02:13:18.480
with Li Jing and Jure Zbontar and a bunch of other people from FAIR.
link |
02:13:24.480
And one of the main criticisms from reviewers is that VICReg is not different enough from Barlow Twins.
link |
02:13:31.480
But my impression is that it's Barlow Twins with a few bugs fixed, essentially.
link |
02:13:40.480
And in the end, this is what people reuse.
link |
02:13:43.480
Right.
link |
02:13:44.480
But I'm used to stuff that I submit being rejected for once.
link |
02:13:49.480
So it might be rejected and actually exceptionally well cited because people use it.
link |
02:13:52.480
Well, it's already cited a bunch of times.
link |
02:13:54.480
So I mean, the question is then to the deeper question about peer review and conferences.
link |
02:14:01.480
I mean, computer science as a field is kind of unique that the conference is highly prized.
link |
02:14:04.480
That's one.
link |
02:14:05.480
Right.
link |
02:14:06.480
And it's interesting because the peer review process there is similar, I suppose, to journals,
link |
02:14:10.480
but it's accelerated significantly.
link |
02:14:13.480
Well, not significantly, but it goes fast.
link |
02:14:16.480
And it's a nice way to get stuff out quickly, to peer review quickly, go to present it quickly to the community.
link |
02:14:22.480
So not quickly, but quicker.
link |
02:14:25.480
But nevertheless, it has many of the same flaws of peer review because it's a limited number of people look at it.
link |
02:14:31.480
There's bias and following.
link |
02:14:33.480
If you want to do new ideas, you're going to get pushed back.
link |
02:14:37.480
There's self interested people that kind of can infer who submitted it and kind of be cranky about it, all that kind of stuff.
link |
02:14:47.480
Yeah.
link |
02:14:48.480
I mean, there's a lot of social phenomena there.
link |
02:14:50.480
There's one social phenomenon, which is that because the field has been growing exponentially, the vast majority of people in the field are extremely junior.
link |
02:14:59.480
Yeah.
link |
02:15:00.480
So as a consequence, and that's just a consequence of the field growing, right?
link |
02:15:04.480
So as the number of, as the size of the field kind of starts saturating, you will have less of that problem of reviewers being very inexperienced.
link |
02:15:14.480
A consequence of this is that, you know, young reviewers, I mean, there's a phenomenon which is that reviewers try to make their life easy.
link |
02:15:24.480
And to make their life easy when reviewing a paper is very simple.
link |
02:15:27.480
You just have to find a flaw in the paper, right?
link |
02:15:29.480
So basically, they see that task as finding flaws in papers.
link |
02:15:34.480
And most papers have flaws, even the good ones.
link |
02:15:36.480
Yeah.
link |
02:15:37.480
So it's easy to, you know, to do that, your job is easier as a reviewer if you just focus on this.
link |
02:15:46.480
But what's important is, like, is there a new idea in that paper that is likely to influence?
link |
02:15:53.480
It doesn't matter if the experiments are not that great, if the protocol is, you know, so, so, you know, things like that.
link |
02:16:01.480
As long as there is a worthy idea in it that will influence the way people think about the problem.
link |
02:16:08.480
Even if they make it better, you know, eventually, I think that's, that's really what makes a paper useful.
link |
02:16:15.480
And so this combination of social phenomena creates a disease that has plagued, you know, other fields in the past like speech recognition,
link |
02:16:26.480
where basically, you know, people chase numbers on benchmarks.
link |
02:16:31.480
And it's much easier to get a paper accepted if it brings an incremental improvement on a sort of mainstream, well accepted method or problem.
link |
02:16:43.480
And those are, to me, boring papers.
link |
02:16:45.480
I mean, they're not useless, right?
link |
02:16:47.480
Because industry, you know, strives on those kind of progress.
link |
02:16:51.480
But they're not the one that I'm interested in, in terms of like new concepts and new ideas.
link |
02:16:55.480
So papers that are really trying to strike new advances generally don't make it.
link |
02:17:02.480
Now, thankfully, we have arXiv.
link |
02:17:04.480
ArXiv, exactly.
link |
02:17:05.480
And then there's open review type of situations.
link |
02:17:08.480
And then, I mean, Twitter is a kind of open review.
link |
02:17:11.480
I'm a huge believer that review should be done by thousands of people, not two people.
link |
02:17:15.480
I agree.
link |
02:17:16.480
And so arXiv, like, do you see a future where a lot of really strong papers, it's already the present,
link |
02:17:22.480
a growing future where it'll just be arXiv.
link |
02:17:25.480
And you're presenting in an ongoing, continuous conference called Twitter slash the internet slash arXiv-sanity.
link |
02:17:35.480
Andrej just released a new version.
link |
02:17:37.480
So just not, you know, not being so elitist about this particular gating.
link |
02:17:43.480
It's not a question of being elitist or not.
link |
02:17:45.480
It's a question of basically providing recommendations and seals of approval for people who don't see themselves as having the ability to evaluate papers by themselves.
link |
02:17:55.480
Right.
link |
02:17:56.480
So it saves time.
link |
02:17:57.480
Right.
link |
02:17:58.480
If you rely on other people's opinion, and you trust those people or those groups to evaluate a paper for you, that saves you time.
link |
02:18:09.480
Because, you know, you don't have to like scrutinize the paper as much, you know, is brought to your attention.
link |
02:18:14.480
I mean, it's the whole idea of sort of, you know, collective recommender system.
link |
02:18:18.480
So I actually thought about this a lot, you know, about 10, 15 years ago, because there were discussions at NIPS and, you know, we were about to create ICLR with Yoshua Bengio.
link |
02:18:30.480
And so I wrote a document kind of describing a reviewing system, which basically was, you know, you post your paper on some repository, let's say arXiv, or now it could be OpenReview.
link |
02:18:42.480
And then you can form a reviewing entity, which is equivalent to a reviewing board, you know, of a journal or program committee of a conference.
link |
02:18:53.480
You have to list the members.
link |
02:18:55.480
And then that group reviewing entity can choose to review a particular paper spontaneously or not.
link |
02:19:03.480
There is no exclusive relationship anymore between a paper and a venue or reviewing entity.
link |
02:19:08.480
Any reviewing entity can review any paper or may choose not to.
link |
02:19:14.480
And then, you know, give an evaluation.
link |
02:19:16.480
It's not a publish/don't publish decision.
link |
02:19:17.480
It's just an evaluation and a comment, which would be public signed by the reviewing entity.
link |
02:19:23.480
And if it's signed by the reviewing entity, you know, it's one of the members of reviewing entity.
link |
02:19:27.480
So if the reviewing entity is, you know, Lex Fridman's preferred papers, right, you know, it's Lex Fridman writing a review.
link |
02:19:35.480
Yes.
link |
02:19:36.480
But so for me, that's a beautiful system, I think.
link |
02:19:40.480
But in addition to that, it feels like there should be a reputation system for the reviewers.
link |
02:19:46.480
Absolutely.
link |
02:19:47.480
For the reviewing entities.
link |
02:19:48.480
Not the reviewers individually.
link |
02:19:49.480
The reviewing entities, sure.
link |
02:19:51.480
But even within that, the reviewers too, because there's another thing here.
link |
02:19:56.480
It's not just the reputation.
link |
02:19:58.480
It's an incentive for an individual person to do great work.
link |
02:20:02.480
Right now, in the academic setting, the incentive is kind of internal, just wanting to do a good job.
link |
02:20:09.480
But honestly, that's not a strong enough incentive to do a really good job at reading a paper and finding the beautiful amidst the mistakes and the flaws and all that kind of stuff.
link |
02:20:17.480
Right.
link |
02:20:18.480
Like if you're the person that first discovered a powerful paper and you get to be proud of that discovery, then that gives a huge incentive to you.
link |
02:20:27.480
That's a big part of my proposal, actually.
link |
02:20:29.480
I described it as: if your evaluation of papers is predictive of future success, then your reputation as a reviewing entity should go up.
link |
02:20:42.480
So yeah, exactly.
link |
02:20:44.480
I even had a master's student, in library science and computer science,
link |
02:20:50.480
who actually kind of worked out exactly how that should work, with formulas and everything.
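Not the actual formulas from that proposal or the student's thesis, but a minimal sketch, in Python, of the kind of update being described: a reviewing entity scores papers, and its reputation rises when those scores turn out to be predictive of later impact. Citation counts, the rank correlation, and the moving-average weighting are all illustrative assumptions.

    # Hypothetical sketch of a predictive-reputation update for reviewing entities.
    # Citation counts stand in for "future success"; the choices below are
    # assumptions for illustration, not the proposal's actual formulas.
    from statistics import mean

    def ranks(values):
        # Rank of each value (0 = smallest).
        order = sorted(range(len(values)), key=lambda i: values[i])
        out = [0] * len(values)
        for r, i in enumerate(order):
            out[i] = r
        return out

    def rank_correlation(scores, outcomes):
        # Crude Spearman-style correlation between review scores and later impact.
        rs, ro = ranks(scores), ranks(outcomes)
        ms, mo = mean(rs), mean(ro)
        cov = sum((a - ms) * (b - mo) for a, b in zip(rs, ro))
        ss = sum((a - ms) ** 2 for a in rs) ** 0.5
        so = sum((b - mo) ** 2 for b in ro) ** 0.5
        return cov / (ss * so) if ss and so else 0.0

    def update_reputation(reputation, scores, citations_later, lr=0.1):
        # Move reputation toward how predictive this batch of reviews turned out to be.
        return (1 - lr) * reputation + lr * rank_correlation(scores, citations_later)

    # Example: an entity whose high scores went to the papers later cited the most.
    rep = update_reputation(0.0, scores=[8, 3, 6, 2], citations_later=[120, 10, 45, 3])
    print(rep)  # reputation goes up because the ranking was predictive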
link |
02:20:55.480
So in terms of implementation, do you think that's something that's doable?
link |
02:20:58.480
I mean, I've been sort of talking about this to various people like Andrew McCallum who started OpenReview.
link |
02:21:05.480
And the reason why we picked OpenReview for ICLR initially, even though it was very early for them, is because my hope was that ICLR was eventually going to kind of inaugurate this type of system.
link |
02:21:18.480
So ICLR kept the idea of open reviews.
link |
02:21:21.480
Where reviews are published with the paper, which I think is very useful.
link |
02:21:26.480
But in many ways, it's kind of reverted to a more conventional type of conference for everything else.
link |
02:21:34.480
I mean, I don't run ICLR.
link |
02:21:37.480
I'm just the president of the foundation, but people who run it should make decisions about how to run it.
link |
02:21:45.480
And I'm not going to tell them, because they are volunteers and I'm really thankful that they do that.
link |
02:21:50.480
But I'm saddened by the fact that we're not being innovative enough.
link |
02:21:56.480
Yeah, me too.
link |
02:21:57.480
I hope that changes.
link |
02:21:58.480
Yeah, because the communication of science broadly, but especially the communication of computer science ideas, is how you make those ideas have impact, I think.
link |
02:22:08.480
Yeah.
link |
02:22:09.480
And I think a lot of this is because people have in their mind kind of an objective, which is fairness for authors.
link |
02:22:18.480
And the ability to count points, basically, and give credit accurately.
link |
02:22:24.480
But that comes at the expense of the progress of science.
link |
02:22:28.480
So to some extent, we're slowing down the progress of science.
link |
02:22:31.480
And are we actually achieving fairness?
link |
02:22:33.480
And we're not achieving fairness.
link |
02:22:35.480
We still have biases.
link |
02:22:37.480
We're doing a double blind review, but the biases are still there.
link |
02:22:44.480
There are different kinds of biases.
link |
02:22:46.480
You write that the phenomenon of emergence, collective behavior exhibited by a large collection of simple elements in interaction is one of the things that got you into neural nets in the first place.
link |
02:22:57.480
I love cellular automata.
link |
02:22:58.480
I love simple interacting elements and the things that emerge from them.
link |
02:23:03.480
Do you think we understand how complex systems can emerge from such simple components that interact simply?
link |
02:23:10.480
No, we don't.
link |
02:23:11.480
It's a big mystery.
link |
02:23:12.480
Also, it's a mystery for physicists, it's a mystery for biologists.
link |
02:23:17.480
How is it that the universe around us seems to be increasing in complexity and not decreasing?
link |
02:23:25.480
I mean, that is a kind of curious property of physics: despite the second law of thermodynamics, we seem to have evolution and learning.
link |
02:23:36.480
It seems, at least locally, to increase complexity and not decrease it.
link |
02:23:43.480
So perhaps the ultimate purpose of the universe is to just get more complex.
link |
02:23:48.480
I mean, small pockets of beautiful complexity.
link |
02:23:54.480
Do cellular automata, these kinds of emergence and complex systems, give you some intuition or guide your understanding of machine learning systems and neural networks and so on?
link |
02:24:06.480
Are these for you right now disparate concepts?
link |
02:24:09.480
Well, it got me into it.
link |
02:24:11.480
I discovered the existence of the perceptron when I was a college student by reading a book. It was a debate between Chomsky and Piaget, and Seymour Papert from MIT was kind of singing the praises of the perceptron in that book.
link |
02:24:27.480
That was the first time I heard about a learning machine, so I started digging into the literature and I found those books, which were basically transcriptions of workshops or conferences from the 50s and 60s about self organizing systems.
link |
02:24:42.480
So there was a series of conferences on self organizing systems and these books.
link |
02:24:48.480
Some of them, you can actually get at the Internet Archive, the digital version.
link |
02:24:55.480
There are like fascinating articles in there by, there's a guy whose name has been largely forgotten, Heinz von Förster.
link |
02:25:02.480
He's a German physicist who immigrated to the US and worked on self organizing systems in the 50s.
link |
02:25:11.480
And in the 60s, he created, at the University of Illinois at Urbana-Champaign, the Biological Computer Laboratory, BCL, which was all about neural nets.
link |
02:25:21.480
Unfortunately, that was kind of towards the end of the popularity of neural nets, so that lab never really thrived very much.
link |
02:25:27.480
But he wrote a bunch of papers about self organization and about the mystery of self organization.
link |
02:25:33.480
An example he has is, you take, imagine you are in space, there's no gravity.
link |
02:25:37.480
You have a big box with magnets in it, okay?
link |
02:25:41.480
You know, kind of rectangular magnets with north pole on one end, south pole on the other end.
link |
02:25:46.480
You shake the box gently and the magnets will kind of stick to themselves and probably form like complex structure, you know, spontaneously.
link |
02:25:54.480
You know, that could be an example of self organization, but, you know, you have lots of examples; neural nets are an example of self organization too, you know, in many respects.
link |
02:26:02.480
And it's a bit of a mystery, you know, what is possible with this, you know, pattern formation in physical systems, in chaotic systems and things like that.
link |
02:26:13.480
You know, the emergence of life, you know, things like that.
link |
02:26:16.480
So, you know, how does that happen?
link |
02:26:19.480
So it's a big puzzle for physicists as well.
link |
02:26:22.480
It feels like understanding this, the mathematics of emergence in some constrained situations might help us create intelligence.
link |
02:26:31.480
Like, help us add a little spice to the systems, because in complex systems with emergence, you seem
link |
02:26:41.480
to be able to get a lot from a little.
link |
02:26:44.480
And so that seems like a shortcut to get big leaps in performance.
link |
02:26:49.480
But there's a missing concept that we don't have.
link |
02:26:55.480
And it's something also I've been fascinated by since my undergrad days.
link |
02:27:00.480
And it's how you measure complexity.
link |
02:27:03.480
Right. So we don't actually have good ways of measuring, or at least we don't have good ways of interpreting the measures that we have at our disposal.
link |
02:27:11.480
Like how do we measure the complexity of something, right?
link |
02:27:14.480
So there are all those things, you know, like Kolmogorov, Chaitin, Solomonoff complexity: the length of the shortest program that would generate a bit string can be thought of as the complexity of that bit string.
link |
02:27:25.480
I've been fascinated by that concept.
link |
02:27:28.480
And the thing with that is that that complexity is defined up to a constant, which can be very large.
link |
02:27:36.480
There are similar concepts that are derived from, you know, Bayesian probability theory, where, you know, the complexity of something is the negative log of its probability, essentially, right?
link |
02:27:49.480
And you have a complete equivalence between the two things.
link |
02:27:52.480
And there you would think, you know, the probability is something that's well defined mathematically, which means complexity is well defined.
link |
02:27:58.480
But it's not true.
link |
02:27:59.480
You need to have a model of the distribution.
link |
02:28:02.480
You may need to have a prior if you're doing Bayesian inference.
link |
02:28:05.480
And the prior plays the same role as the choice of the computer with which you measure Kolmogorov complexity.
link |
02:28:10.480
And so every measure of complexity we have has some arbitrariness in it, you know, an additive constant, which can be arbitrarily large.
link |
02:28:20.480
And so, you know, how can we come up with a good theory of how things become more complex if we don't have a good measure of complexity?
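For reference, the standard statements behind "defined up to a constant", written out only to make the point concrete (textbook definitions, nothing specific to this conversation):

    K_U(x) = \min \{\, |p| : U(p) = x \,\}        % length of the shortest program producing x on machine U
    K_U(x) \le K_V(x) + c_{U,V}                   % invariance theorem: another reference machine V shifts K by at most a constant
    L_q(x) = -\log_2 q(x)                         % description length of x under a probabilistic model q

The constant c_{U,V} can be arbitrarily large, and in the Bayesian reading the choice of the model or prior q plays the role that the choice of the reference machine U plays for Kolmogorov complexity: the number you get is only meaningful relative to that choice.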
link |
02:28:26.480
Yeah, which we need. One way that people study this is in the space of biology, the people that study the origin of life or try to recreate life in the laboratory.
link |
02:28:37.480
And the more interesting one, the alien one, is when we go to other planets, how do we recognize life?
link |
02:28:44.480
Because, you know, we associate complexity, maybe some level of mobility, with life; we have to be able to have concrete algorithms for measuring the level of complexity we see in order to know the difference between life and nonlife.
link |
02:29:02.480
And the problem is that complexity is in the eye of the beholder.
link |
02:29:05.480
So, let me give you an example. If I give you an image of the MNIST digits, right, and I flip through MNIST digits, there is obviously some structure to it, because of local structure, you know: neighboring pixels are correlated across the entire dataset.
link |
02:29:25.480
Now, imagine that I apply a random permutation to all the pixels, a fixed random permutation. Now, I show you those images, they will look, you know, really disorganized to you, more complex.
link |
02:29:40.480
In fact, they're not more complex in absolute terms, they're exactly the same as originally, right? And if you knew what the permutation was, you know, you could undo the permutation.
link |
02:29:49.480
Now, imagine I give you special glasses that undo the permutation. Now, all of a sudden, what looked complicated becomes simple.
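A minimal sketch of that thought experiment in Python (NumPy only; random arrays stand in for MNIST here). The point is just that a fixed permutation and its inverse, the "glasses", lose no information:

    import numpy as np

    rng = np.random.default_rng(0)
    images = rng.random((100, 28 * 28))          # stand-in for flattened MNIST digits

    perm = rng.permutation(28 * 28)              # fixed random permutation of pixel positions
    scrambled = images[:, perm]                  # looks unstructured to our eyes

    glasses = np.argsort(perm)                   # the "glasses": the inverse permutation
    unscrambled = scrambled[:, glasses]

    assert np.array_equal(unscrambled, images)   # nothing was lost; only our view changed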
link |
02:29:57.480
Right.
link |
02:29:58.480
So, if you have two, if you have, you know, humans on one end, and then another race of aliens that sees the universe with permutation glasses.
link |
02:30:05.480
Yeah, with the permutation glasses.
link |
02:30:07.480
What we perceive as simple, to them is highly complicated; it's probably heat.
link |
02:30:12.480
Yeah, heat, yeah.
link |
02:30:13.480
Okay, and what they perceive as simple to us is random fluctuation, it's heat.
link |
02:30:19.480
Yeah.
link |
02:30:20.480
So,
link |
02:30:21.480
It's truly in the eye of the beholder; it depends on what kind of glasses you're wearing, what kind of algorithm you're running in your perception system.
link |
02:30:28.480
So, I don't think we'll have a theory of intelligence, self organization, evolution, things like that, until we have a good handle on a notion of complexity, which we know is in the eye of the beholder.
link |
02:30:42.480
Yeah, it's sad to think that we might not be able to detect or interact with alien species because we're wearing different glasses.
link |
02:30:50.480
Because the notion of locality might be different from ours.
link |
02:30:52.480
Yeah.
link |
02:30:53.480
This actually connects with fascinating questions in physics at the moment, like modern physics, quantum physics, like, you know, questions about, like, you know, can we recover the information that's lost in a black hole and things like this, right?
link |
02:31:05.480
And that relies on notions of complexity, which, you know, I find fascinating.
link |
02:31:11.480
Can you describe your personal quest to build an expressive electronic wind instrument, EWI?
link |
02:31:19.480
What is it?
link |
02:31:20.480
What does it take to build it?
link |
02:31:23.480
Well, I'm a tinkerer.
link |
02:31:24.480
I like building things.
link |
02:31:26.480
I like building things with combinations of electronics and, you know, mechanical stuff.
link |
02:31:31.480
You know, I have a bunch of different hobbies, but, you know, probably my first one, when I was little, was building model airplanes and stuff like that, and I still do that to some extent.
link |
02:31:41.480
But also electronics, I taught myself electronics before I studied it.
link |
02:31:45.480
And the reason I taught myself electronics is because of music.
link |
02:31:49.480
My cousin was an aspiring electronic musician, and then he had an analog synthesizer.
link |
02:31:54.480
And I was, you know, basically modifying it for him and building sequencers and stuff like that, right, for him.
link |
02:32:00.480
I was in high school when I was doing this.
link |
02:32:02.480
How was the interest in, like, progressive rock, like, the '80s?
link |
02:32:05.480
Like, what's the greatest band of all time, according to Yann LeCun?
link |
02:32:09.480
There's too many of them.
link |
02:32:11.480
But, you know, it's a combination of, you know, Mahavishnu Orchestra, Weather Report, Yes, Genesis, you know, pre-Peter Gabriel,
link |
02:32:26.480
Gentle Giant, you know, things like that.
link |
02:32:29.480
Okay, so this love of electronics and this love of music combined together.
link |
02:32:33.480
Right, so I was actually trained to play Baroque and Renaissance music,
link |
02:32:39.480
and I played in an orchestra when I was in high school and first years of college.
link |
02:32:45.480
And I played the recorder, crumhorn, a little bit of oboe, you know, things like that.
link |
02:32:49.480
So I'm a wind instrument player, but I always wanted to play improvised music,
link |
02:32:53.480
even though I don't know anything about it.
link |
02:32:55.480
And the only way I figured, you know, short of, like, learning to play saxophone
link |
02:33:00.480
was to play electronic wind instruments.
link |
02:33:03.480
So they behave, you know, the fingering is similar to a saxophone,
link |
02:33:06.480
but, you know, you have a wide variety of sound because you control the synthesizer with it.
link |
02:33:10.480
So I had a bunch of those, you know, going back to the late 80s,
link |
02:33:14.480
from either Yamaha or Akai, they're both kind of the main manufacturers of those.
link |
02:33:22.480
So they've been around, you know, going back several decades.
link |
02:33:25.480
But I've never been completely satisfied with them because of lack of expressivity.
link |
02:33:30.480
And, you know, those things, you know, are somewhat expressive.
link |
02:33:33.480
I mean, they measure the breath pressure, they measure the lip pressure,
link |
02:33:36.480
and, you know, you have various parameters you can vary with fingers,
link |
02:33:41.480
but they're not really as expressive as an acoustic instrument, right?
link |
02:33:46.480
You hear John Coltrane play two notes, and you know it's John Coltrane,
link |
02:33:50.480
you know, it's got a unique sound, or Miles Davis, right?
link |
02:33:54.480
You can hear it's Miles Davis playing the trumpet,
link |
02:33:57.480
because the sound reflects their, you know, physiognomy,
link |
02:34:04.480
basically the shape of the vocal tract kind of shapes the sound.
link |
02:34:09.480
So how do you do this with an electronic instrument?
link |
02:34:12.480
And I was, many years ago, I met a guy called David Wessel.
link |
02:34:16.480
He was a professor at Berkeley and created the center for, like, you know, music technology there.
link |
02:34:23.480
And he was interested in that question.
link |
02:34:25.480
And so I kept kind of thinking about this for many years.
link |
02:34:28.480
And finally, because of COVID, you know, I was at home.
link |
02:34:31.480
I was in my workshop.
link |
02:34:32.480
My workshop serves also as my kind of Zoom room and home office.
link |
02:34:37.480
This is in New Jersey?
link |
02:34:38.480
In New Jersey.
link |
02:34:39.480
And I started really being serious about, you know, building my own EWI.
link |
02:34:45.480
What else is going on in the New Jersey workshop?
link |
02:34:48.480
Is there some crazy stuff you built that just got left on the workshop floor, left behind?
link |
02:34:55.480
A lot of crazy stuff is, you know, electronics built with microcontrollers of various kinds,
link |
02:35:01.480
and, you know, weird flying contraptions.
link |
02:35:06.480
So you still love flying?
link |
02:35:08.480
It's a family disease.
link |
02:35:09.480
My dad got me into it when I was a kid.
link |
02:35:13.480
And he was building model airplanes when he was a kid.
link |
02:35:17.480
And he was a mechanical engineer.
link |
02:35:19.480
He taught himself electronics also.
link |
02:35:21.480
So he built his early radio control systems in the late 60s, early 70s.
link |
02:35:27.480
And so that's what got me into, I mean, he got me into kind of, you know, engineering and science and technology.
link |
02:35:33.480
Do you also have an interest in and appreciation of flight in other forms, like with drones, quadcopters?
link |
02:35:38.480
Or is it model airplanes?
link |
02:35:41.480
You know, I, you know, before drones were, you know, kind of a consumer product.
link |
02:35:48.480
You know, I built my own, you know, also building the microcontroller with gyroscopes and accelerometers for stabilization,
link |
02:35:56.480
writing the firmware for it, you know, and then when it became kind of a standard thing you could buy,
link |
02:35:59.480
it was boring, you know, I stopped doing it.
link |
02:36:01.480
It wasn't fun anymore.
link |
02:36:03.480
Yeah, you were doing it before it was cool.
link |
02:36:06.480
What advice would you give to a young person today in high school and college that dreams of doing something big,
link |
02:36:14.480
like Yann LeCun, like, let's talk in the space of intelligence,
link |
02:36:18.480
dreams of having a chance to solve some fundamental problem in the space of intelligence,
link |
02:36:23.480
both for their career and just in life, being somebody who was a part of creating something special.
link |
02:36:30.480
So try to get interested by big questions, things like, you know, what is intelligence,
link |
02:36:38.480
what is the universe made of, what's life all about, things like that.
link |
02:36:44.480
Like even like crazy big questions like what's time, like nobody knows what time is.
link |
02:36:52.480
And then learn basic things like basic methods, either from math, from physics or from engineering,
link |
02:37:02.480
things that have a long shelf life.
link |
02:37:04.480
Like if you have a choice between like, you know, learning, you know, mobile programming on iPhone or quantum mechanics,
link |
02:37:13.480
take quantum mechanics, because you're going to learn things that you have no idea exist.
link |
02:37:19.480
And you may not, you may never be a quantum physicist, but you will learn about path integrals.
link |
02:37:26.480
And path integrals are used everywhere.
link |
02:37:28.480
It's the same formula that you use for, you know, Bayesian integration and stuff like that.
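Loosely, the parallel being pointed at, written out only as an illustration (standard formulas): both are integrals over all configurations, weighted by the exponential of an action or of a negative energy / log-probability.

    Z = \int \mathcal{D}[x(t)]\; e^{\,i S[x]/\hbar}                                  % Feynman path integral: sum over all paths, weighted by the action
    p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta
                   = \int e^{-E(\theta)}\, d\theta, \quad E(\theta) = -\log\big(p(\mathcal{D} \mid \theta)\, p(\theta)\big)   % Bayesian marginal likelihood

After a Wick rotation to imaginary time, the path-integral weight becomes e^{-S[x]/\hbar}, which makes the correspondence with statistical-mechanics and Bayesian weights even more direct.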
link |
02:37:32.480
So the ideas, the little ideas within quantum mechanics or within some of these kind of more solidified fields will have a longer shelf life.
link |
02:37:42.480
You will somehow use them indirectly in your work.
link |
02:37:46.480
Learn classical mechanics, where you learn about Lagrangians, for example, which is a hugely useful concept,
link |
02:37:54.480
you know, for all kinds of different things.
link |
02:37:56.480
Learn statistical physics, because all the math, you know, for machine learning
link |
02:38:04.480
basically comes out of what was worked out by statistical physicists in the, you know, late 19th, early 20th century.
link |
02:38:10.480
And for some of them, actually, more recently, by people like Giorgio Parisi, who just got the Nobel Prize for the replica method,
link |
02:38:18.480
among other things, it's used for a lot of different things.
link |
02:38:22.480
You know, variational inference, that math comes from statistical physics.
link |
02:38:27.480
So a lot of those kind of, you know, basic courses, you know, if you do electrical engineering,
link |
02:38:35.480
you take signal processing, you'll learn about Fourier transforms.
link |
02:38:39.480
Again, something super useful that is at the basis of things like graph neural nets,
link |
02:38:44.480
which is an entirely new subarea of, you know, AI machine learning, deep learning,
link |
02:38:50.480
which I think is super promising for all kinds of applications.
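A minimal sketch of that connection in Python (NumPy only; the tiny four-node cycle graph and the low-pass filter are illustrative assumptions): on a graph, the eigenvectors of the Laplacian play the role of Fourier modes, and a spectral graph convolution is a pointwise filter in that basis, which is the starting point of many graph neural net formulations.

    import numpy as np

    A = np.array([[0, 1, 0, 1],                  # adjacency matrix of a 4-node cycle graph
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A               # combinatorial graph Laplacian

    evals, U = np.linalg.eigh(L)                 # Fourier basis = Laplacian eigenvectors

    x = np.array([1.0, 0.0, 0.0, 0.0])           # a signal on the nodes
    x_hat = U.T @ x                              # graph Fourier transform
    h = np.exp(-evals)                           # a low-pass filter in the spectral domain
    y = U @ (h * x_hat)                          # filtered signal back in the node domain
    print(y)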
link |
02:38:53.480
Something very promising, if you're more interested in applications,
link |
02:38:56.480
is the applications of AI machine learning and deep learning to science,
link |
02:39:00.480
or to science that can help solve big problems in the world.
link |
02:39:05.480
I have colleagues at Meta, at FAIR, who started this project called Open Catalyst.
link |
02:39:11.480
And it's an open, collaborative project.
link |
02:39:14.480
And the idea is to use deep learning to help design new chemical compounds or materials
link |
02:39:21.480
that would facilitate the separation of hydrogen from oxygen.
link |
02:39:25.480
If you can efficiently separate oxygen from hydrogen with electricity, you solve climate change.
link |
02:39:33.480
It's as simple as that, because you cover, you know, some random desert with solar panels,
link |
02:39:40.480
and you have them work all day, produce hydrogen, and then you ship the hydrogen wherever it's needed.
link |
02:39:45.480
You don't need anything else.
link |
02:39:48.480
You know, you have controllable power that can be transported anywhere.
link |
02:39:55.480
So if we have a large scale, efficient energy storage technology like producing hydrogen,
link |
02:40:04.480
we solve climate change.
link |
02:40:06.480
Here's another way to solve climate change, is figuring out how to make fusion work.
link |
02:40:10.480
Now, the problem with fusion is that you make a super hot plasma,
link |
02:40:13.480
and the plasma is unstable, and you can't control it.
link |
02:40:15.480
Maybe with deep learning, you can find controllers that would stabilize plasma
link |
02:40:18.480
and make, you know, practical fusion reactors.
link |
02:40:21.480
I mean, that's very speculative, but, you know, it's worth trying because, you know,
link |
02:40:26.480
the payoff is huge.
link |
02:40:28.480
There's a group at Google working on this led by John Platt.
link |
02:40:31.480
So control, convert as many problems in science and physics and biology and chemistry
link |
02:40:36.480
into a learnable problem and see if a machine can learn it?
link |
02:40:41.480
Right.
link |
02:40:42.480
I mean, there's properties of, you know, complex materials that we don't understand
link |
02:40:46.480
from first principle, for example.
link |
02:40:48.480
And so, you know, if we could design new, you know, new materials,
link |
02:40:54.480
we could make more efficient batteries.
link |
02:40:56.480
You know, we could make maybe faster electronics.
link |
02:40:59.480
There's a lot of things we can imagine doing or, you know, lighter materials
link |
02:41:04.480
for cars or airplanes and things like that, maybe better fuel cells.
link |
02:41:07.480
I mean, there's all kinds of stuff we can imagine.
link |
02:41:09.480
If we had good fuel cells, hydrogen fuel cells, we could use them to power airplanes
link |
02:41:13.480
and, you know, cars.
link |
02:41:17.480
We wouldn't have CO2 emission problems for air transportation anymore.
link |
02:41:24.480
So there's a lot of those things, I think, where AI, you know, can be used.
link |
02:41:29.480
And this is not even talking about all the sort of medicine biology and everything like that, right?
link |
02:41:35.480
You know, like protein folding, you know, figuring out, like, how can you design a protein
link |
02:41:40.480
so that it sticks to another protein at a particular site, because that's how you design drugs in the end.
link |
02:41:45.480
So, you know, deep learning would be useful for all of this.
link |
02:41:47.480
And those would, you know, represent sort of enormous progress if we could use it for that.
link |
02:41:52.480
Here's an example.
link |
02:41:53.480
If you take, this is like from recent material physics, you take a monoatomic layer of graphene, right?
link |
02:42:01.480
So it's just carbon on a hexagonal mesh, and it's a single atom thick.
link |
02:42:08.480
You put another one on top.
link |
02:42:09.480
You twist them by some magic number of degrees, three degrees or something.
link |
02:42:14.480
It becomes a superconductor.
link |
02:42:16.480
Nobody has any idea why.
link |
02:42:20.480
I want to know how that was discovered.
link |
02:42:22.480
But that's the kind of thing that machine learning can actually discover, these kinds of things.
link |
02:42:25.480
Well, maybe not.
link |
02:42:26.480
But there is a hint, perhaps, that with machine learning, we could train a system to basically be a phenomenological model
link |
02:42:34.480
of some complex emergent phenomenon, which, you know, superconductivity is one of those.
link |
02:42:41.480
Where, you know, this collective phenomenon is too difficult to describe from first principles
link |
02:42:46.480
with the current, you know, the usual sort of reductionist type method.
link |
02:42:51.480
But we could have deep learning systems that predict the properties of a system from a description of it
link |
02:42:58.480
after being trained with sufficiently many samples.
link |
02:43:02.480
This guy, Pascal Fua, at EPFL, he has a startup company
link |
02:43:09.480
where he basically trained a convolutional net, essentially, to predict the aerodynamic properties of solids.
link |
02:43:17.480
And you can generate as much data as you want by just running computational fluid dynamics, right?
link |
02:43:21.480
So you give it, like, a wing, an airfoil, a shape of some kind.
link |
02:43:29.480
And you run computational fluid dynamics, and you get, as a result, the drag and, you know, lift and all that stuff, right?
link |
02:43:37.480
And you can generate lots of data, train a neural net to make those predictions.
link |
02:43:41.480
And now what you have is a differentiable model of, let's say, drag and lift as a function of the shape of that solid.
link |
02:43:48.480
And so you can do backpropagation and gradient descent. You can optimize the shape so you get the properties you want.
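A minimal sketch of that loop in Python/PyTorch, with a toy quadratic standing in for the CFD solver; the network size, data, and optimizer settings are illustrative assumptions, not a description of Fua's actual system. Fit a surrogate on simulator data, freeze it, then run gradient descent on the shape parameters through it:

    import torch

    def fake_cfd(shape):                          # stand-in for a computational fluid dynamics solver
        return ((shape - 0.3) ** 2).sum(dim=-1, keepdim=True)

    shapes = torch.rand(1024, 8)                  # 8 shape parameters per design
    drag = fake_cfd(shapes)                       # "simulated" drag for each design

    surrogate = torch.nn.Sequential(              # differentiable surrogate model
        torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
    for _ in range(2000):                         # fit the surrogate to the simulator data
        opt.zero_grad()
        torch.nn.functional.mse_loss(surrogate(shapes), drag).backward()
        opt.step()

    for p in surrogate.parameters():              # freeze the surrogate
        p.requires_grad_(False)

    shape = torch.rand(1, 8, requires_grad=True)  # now optimize a design through the surrogate
    design_opt = torch.optim.Adam([shape], lr=1e-2)
    for _ in range(500):
        design_opt.zero_grad()
        surrogate(shape).sum().backward()         # predicted drag is the objective to minimize
        design_opt.step()
    print(shape.detach(), fake_cfd(shape).item())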
link |
02:43:54.480
Yeah, that's incredible. That's incredible.
link |
02:43:56.480
And on top of all that, probably you should read a little bit of literature and a little bit of history for inspiration and for wisdom.
link |
02:44:06.480
Because after all, all of these technologies will have to work in the human world.
link |
02:44:10.480
Yes.
link |
02:44:10.480
And the human world is complicated.
link |
02:44:12.480
It is, certainly.
link |
02:44:14.480
Yeah, and this is an amazing conversation. I'm really honored that you would talk with me today.
link |
02:44:20.480
Thank you for all the amazing work you're doing at FAIR at Meta.
link |
02:44:23.480
And thank you for being so passionate after all these years about everything that's going on.
link |
02:44:28.480
You're a beacon of hope for the machine learning community.
link |
02:44:31.480
And thank you so much for spending your valuable time with me today. That was awesome.
link |
02:44:35.480
Thank you for having me on. That was a pleasure.
link |
02:45:05.480
Thank you.