Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258
link |
The following is a conversation with Yann LeCun,
link |
his second time on the podcast.
link |
He is the chief AI scientist at Meta, formerly Facebook,
link |
professor at NYU, Turing Award winner,
link |
one of the seminal figures in the history
link |
of machine learning and artificial intelligence,
link |
and someone who is brilliant and opinionated
link |
in the best kind of way.
link |
And so it was always fun to talk to him.
link |
This is the Lex Fridman Podcast.
link |
To support it, please check out our sponsors
link |
in the description.
link |
And now, here's my conversation with Yann LeCun.
link |
You cowrote the article,
link |
Self-Supervised Learning: The Dark Matter of Intelligence.
link |
Great title, by the way, with Ishan Misra.
link |
So let me ask, what is self supervised learning,
link |
and why is it the dark matter of intelligence?
link |
I'll start by the dark matter part.
link |
There is obviously a kind of learning
link |
that humans and animals are doing
link |
that we currently are not reproducing properly
link |
with machines or with AI, right?
link |
So the most popular approaches to machine learning today are,
link |
or paradigms, I should say,
link |
are supervised learning and reinforcement learning.
link |
And they are extremely inefficient.
link |
Supervised learning requires many samples
link |
for learning anything.
link |
And reinforcement learning requires a ridiculously large
link |
number of trial and errors for a system to learn anything.
link |
And that's why we don't have self driving cars.
link |
That was a big leap from one to the other.
link |
Okay, so that, to solve difficult problems,
link |
you have to have a lot of human annotation
link |
for supervised learning to work.
link |
And to solve those difficult problems
link |
with reinforcement learning,
link |
you have to have some way to maybe simulate that problem
link |
such that you can do that large scale kind of learning
link |
that reinforcement learning requires.
link |
Right, so how is it that most teenagers can learn
link |
to drive a car in about 20 hours of practice,
link |
whereas even with millions of hours of simulated practice,
link |
a self driving car can't actually learn
link |
to drive itself properly.
link |
And so obviously we're missing something, right?
link |
And it's quite obvious for a lot of people
link |
that the immediate response you get from many people is,
link |
well, humans use their background knowledge
link |
to learn faster, and they're right.
link |
Now, how was that background knowledge acquired?
link |
And that's the big question.
link |
So now you have to ask, how do babies
link |
in the first few months of life learn how the world works?
link |
Mostly by observation,
link |
because they can hardly act in the world.
link |
And they learn an enormous amount
link |
of background knowledge about the world
link |
that may be the basis of what we call common sense.
link |
This type of learning is not learning a task.
link |
It's not being reinforced for anything.
link |
It's just observing the world and figuring out how it works.
link |
Building world models, learning world models.
link |
How do we do this?
link |
And how do we reproduce this in machines?
link |
So self supervised learning is one instance
link |
or one attempt at trying to reproduce this kind of learning.
link |
Okay, so you're looking at just observation,
link |
so not even the interacting part of a child.
link |
It's just sitting there watching mom and dad walk around,
link |
pick up stuff, all of that.
link |
That's what we mean by background knowledge.
link |
Perhaps not even watching mom and dad,
link |
just watching the world go by.
link |
Just having eyes open or having eyes closed
link |
or the very act of opening and closing eyes
link |
that the world appears and disappears,
link |
all that basic information.
link |
And you're saying in order to learn to drive,
link |
like the reason humans are able to learn to drive quickly,
link |
some faster than others,
link |
is because of the background knowledge.
link |
They're able to watch cars operate in the world
link |
in the many years leading up to it,
link |
the physics of basic objects, all that kind of stuff.
link |
I mean, the basic physics of objects,
link |
you don't even need to know how a car works, right?
link |
Because that you can learn fairly quickly.
link |
I mean, the example I use very often
link |
is you're driving next to a cliff.
link |
And you know in advance because of your understanding
link |
of intuitive physics that if you turn the wheel
link |
to the right, the car will veer to the right,
link |
will run off the cliff, fall off the cliff,
link |
and nothing good will come out of this, right?
link |
But if you are a sort of tabula rasa
link |
reinforcement learning system
link |
that doesn't have a model of the world,
link |
you have to repeat falling off this cliff
link |
thousands of times before you figure out it's a bad idea.
link |
And then a few more thousand times
link |
before you figure out how to not do it.
link |
And then a few more million times
link |
before you figure out how to not do it
link |
in every situation you ever encounter.
link |
So self supervised learning still has to have
link |
some source of truth being told to it by somebody.
link |
So you have to figure out a way without human assistance
link |
or without significant amount of human assistance
link |
to get that truth from the world.
link |
So the mystery there is how much signal is there?
link |
How much truth is there that the world gives you?
link |
Whether it's the human world,
link |
like you watch YouTube or something like that,
link |
or it's the more natural world.
link |
So how much signal is there?
link |
So here's the trick.
link |
There is way more signal in sort of a self supervised
link |
setting than there is in either a supervised
link |
or reinforcement setting.
link |
And this is going to my analogy of the cake.
link |
The cake as someone has called it,
link |
where when you try to figure out how much information
link |
you ask the machine to predict
link |
and how much feedback you give the machine at every trial,
link |
in reinforcement learning,
link |
you give the machine a single scalar.
link |
You tell the machine you did good, you did bad.
link |
And you only tell this to the machine once in a while.
link |
When I say you, it could be the universe
link |
telling the machine, right?
link |
But it's just one scalar.
link |
And so as a consequence,
link |
you cannot possibly learn something very complicated
link |
without many, many, many trials
link |
where you get many, many feedbacks of this type.
link |
In supervised learning, you give a few bits to the machine.
link |
Let's say you're training a system on recognizing images
link |
on ImageNet with 1000 categories,
link |
that's a little less than 10 bits of information per sample.
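As an aside, the arithmetic behind that figure is a one-liner; here is an illustrative computation, with 1,000 standing in for ImageNet's label set:

```python
import math

# One label out of 1000 classes carries at most log2(1000) bits of information.
bits_per_label = math.log2(1000)
print(f"{bits_per_label:.2f} bits")  # ~9.97, i.e. "a little less than 10 bits"
```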
link |
But self supervised learning, here is the setting.
link |
Ideally, we don't know how to do this yet,
link |
but ideally you would show a machine a segment of video
link |
and then stop the video and ask the machine to predict
link |
what's going to happen next.
link |
And so we let the machine predict
link |
and then you let time go by
link |
and show the machine what actually happened
link |
and hope the machine will learn to do a better job
link |
at predicting next time around.
link |
There's a huge amount of information you give the machine
link |
because it's an entire video clip
link |
of the future after the video clip you fed it
link |
in the first place.
link |
So both for language and for vision, there's a subtle,
link |
seemingly trivial construction,
link |
but maybe that's representative
link |
of what is required to create intelligence,
link |
which is filling the gap.
link |
So it sounds dumb, but can you,
link |
it is possible you could solve all of intelligence
link |
in this way, just for both language,
link |
just give a sentence and continue it
link |
or give a sentence and there's a gap in it,
link |
some words blanked out and you fill in what words go there.
link |
For vision, you give a sequence of images
link |
and predict what's going to happen next,
link |
or you fill in what happened in between.
link |
Do you think it's possible that formulation alone
link |
as a signal for self supervised learning
link |
can solve intelligence for vision and language?
link |
I think that's the best shot at the moment.
link |
So whether this will take us all the way
link |
to human level intelligence or something,
link |
or just cat level intelligence is not clear,
link |
but among all the possible approaches
link |
that people have proposed, I think it's our best shot.
link |
So I think this idea of an intelligent system
link |
filling in the blanks, either predicting the future,
link |
inferring the past, filling in missing information,
link |
I'm currently filling the blank
link |
of what is behind your head
link |
and what your head looks like from the back,
link |
because I have basic knowledge about how humans are made.
link |
And I don't know what you're going to say,
link |
at which point you're going to speak,
link |
whether you're going to move your head this way or that way,
link |
which way you're going to look,
link |
but I know you're not going to just dematerialize
link |
and reappear three meters down the hall,
link |
because I know what's possible and what's impossible
link |
according to intuitive physics.
link |
You have a model of what's possible and what's impossible
link |
and then you'd be very surprised if it happens
link |
and then you'll have to reconstruct your model.
link |
Right, so that's the model of the world.
link |
It's what tells you, what fills in the blanks.
link |
So given your partial information about the state
link |
of the world, given by your perception,
link |
your model of the world fills in the missing information
link |
and that includes predicting the future,
link |
re predicting the past, filling in things
link |
you don't immediately perceive.
link |
And that doesn't have to be purely generic vision
link |
or visual information or generic language.
link |
You can go to specifics like predicting
link |
what control decision you make when you're driving
link |
in a lane, you have a sequence of images from a vehicle
link |
and then you have information if you record it on video
link |
where the car ended up going so you can go back in time
link |
and predict where the car went
link |
based on the visual information.
link |
That's very specific, domain specific.
link |
Right, but the question is whether we can come up
link |
with sort of a generic method for training machines
link |
to do this kind of prediction or filling in the blanks.
link |
So right now, this type of approach has been unbelievably
link |
successful in the context of natural language processing.
link |
Every modern natural language processing system is pre trained
link |
in a self supervised manner to fill in the blanks.
link |
You show it a sequence of words, you remove 10% of them
link |
and then you train some gigantic neural net
link |
to predict the words that are missing.
link |
And once you've pre trained that network,
link |
you can use the internal representation learned by it
link |
as input to something that you train in a supervised way.
link |
That's been incredibly successful.
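As a rough sketch of the fill-in-the-blanks pre-training described here, in PyTorch; the vocabulary size, mask rate, and tiny transformer are illustrative assumptions rather than the recipe of any particular system:

```python
import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 30000, 256, 0  # toy sizes, purely for illustration

class MaskedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)  # a score for every word in the dictionary

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))  # internal representation
        return self.head(h), h                # h is what downstream supervised tasks reuse

model = MaskedLM()
tokens = torch.randint(1, VOCAB, (8, 32))      # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.10         # hide roughly 10% of the words
corrupted = tokens.masked_fill(mask, MASK_ID)

logits, hidden = model(corrupted)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # predict only the missing words
loss.backward()
```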
link |
Not so successful in images, although it's making progress
link |
and it's based on sort of manual data augmentation.
link |
We can go into this later,
link |
but what has not been successful yet is training from video.
link |
So getting a machine to learn to represent
link |
the visual world, for example, by just watching video.
link |
Nobody has really succeeded in doing this.
link |
Okay, well, let's kind of give a high level overview.
link |
What's the difference in kind and in difficulty
link |
between vision and language?
link |
So you said people haven't been able to really
link |
kind of crack the problem of vision open
link |
in terms of self supervised learning,
link |
but that may not be necessarily
link |
because it's fundamentally more difficult.
link |
Maybe like when we're talking about achieving,
link |
like passing the Turing test in the full spirit
link |
of the Turing test in language might be harder than vision.
link |
That's not obvious.
link |
So in your view, which is harder
link |
or perhaps are they just the same problem?
link |
Where the farther we get in solving each,
link |
the more we realize it's all the same thing.
link |
It's all the same cake.
link |
I think what I'm looking for are methods
link |
that make them look essentially like the same cake,
link |
but currently they're not.
link |
And the main issue with learning world models
link |
or learning predictive models is that the prediction
link |
is never a single thing
link |
because the world is not entirely predictable.
link |
It may be deterministic or stochastic.
link |
We can get into the philosophical discussion about it,
link |
but even if it's deterministic,
link |
it's not entirely predictable.
link |
And so if I play a short video clip
link |
and then I ask you to predict what's going to happen next,
link |
there's many, many plausible continuations
link |
for that video clip and the number of continuation grows
link |
with the interval of time that you're asking the system
link |
to make a prediction for.
link |
And so one big question with self supervised learning
link |
is how you represent this uncertainty,
link |
how you represent multiple discrete outcomes,
link |
how you represent a sort of continuum
link |
of possible outcomes, et cetera.
link |
And if you are sort of a classical machine learning person,
link |
you say, oh, you just represent a distribution, right?
link |
And that we know how to do when we're predicting words,
link |
missing words in the text,
link |
because you can have a neural net give a score
link |
for every word in the dictionary.
link |
It's a big list of numbers, maybe 100,000 or so.
link |
And you can turn them into a probability distribution
link |
that tells you when I say a sentence,
link |
the cat is chasing the blank in the kitchen.
link |
There are only a few words that make sense there.
link |
It could be a mouse or it could be a laser spot
link |
or something like that, right?
link |
And if I say the blank is chasing the blank in the Savannah,
link |
you also have a bunch of plausible options
link |
for those two words, right?
link |
Because you have kind of an underlying reality
link |
that you can refer to to sort of fill in those blanks.
link |
So you cannot say for sure in the Savannah,
link |
if it's a lion or a cheetah or whatever,
link |
you cannot know if it's a zebra or a gnu or whatever,
link |
wildebeest, the same thing.
link |
But you can represent the uncertainty
link |
by just a long list of numbers.
link |
Now, if I do the same thing with video,
link |
when I ask you to predict a video clip,
link |
it's not a discrete set of potential frames.
link |
You have to have somewhere representing
link |
a sort of infinite number of plausible continuations
link |
of multiple frames in a high dimensional continuous space.
link |
And we just have no idea how to do this properly.
link |
Finite, high dimensional.
link |
It's finite high dimensional, yes.
link |
Just like the words,
link |
they try to get it down to a small finite set
link |
of like under a million, something like that.
link |
Something like that.
link |
I mean, it's kind of ridiculous that we're doing
link |
a distribution over every single possible word
link |
for language and it works.
link |
It feels like that's a really dumb way to do it.
link |
Like there seems to be like there should be
link |
some more compressed representation
link |
of the distribution of the words.
link |
You're right about that.
link |
And so do you have any interesting ideas
link |
about how to represent all of reality in a compressed way
link |
such that you can form a distribution over it?
link |
That's one of the big questions, how do you do that?
link |
Right, I mean, what's kind of another thing
link |
that really is stupid about, I shouldn't say stupid,
link |
but like simplistic about current approaches
link |
to self supervised learning in NLP in text
link |
is that not only do you represent
link |
a giant distribution over words,
link |
but for multiple words that are missing,
link |
those distributions are essentially independent of each other.
link |
And you don't pay too much of a price for this.
link |
So you can't, so the system in the sentence
link |
that I gave earlier, if it gives a certain probability
link |
for a lion and cheetah, and then a certain probability
link |
for gazelle, wildebeest and zebra,
link |
those two probabilities are independent of each other.
link |
And it's not the case that those things are independent.
link |
Lions actually attack like bigger animals than cheetahs.
link |
So there's a huge independence assumption in this process,
link |
which is not actually true.
link |
The reason for this is that we don't know
link |
how to represent properly distributions
link |
over combinatorial sequences of symbols,
link |
essentially because the number grows exponentially
link |
with the length of the symbols.
link |
And so we have to use tricks for this,
link |
but those techniques kind of get around it,
link |
like don't even deal with it.
link |
So the big question is would there be some sort
link |
of abstract latent representation of text
link |
that would say that when I switch
link |
lion for cheetah, I also have to switch zebra for gazelle?
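To make the independence point concrete, here is a toy illustration (invented numbers, not the output of any model) of why one softmax per blank cannot encode the lion-zebra coupling:

```python
import numpy as np

predators = ["lion", "cheetah"]   # candidates for the first blank
prey = ["zebra", "gazelle"]       # candidates for the second blank

p_predator = np.array([0.5, 0.5])  # independent distribution over the first blank
p_prey = np.array([0.5, 0.5])      # independent distribution over the second blank

# With independent softmaxes, the only joint the model can express is the outer product:
joint = np.outer(p_predator, p_prey)
print(joint)  # every (predator, prey) pair gets 0.25; picking "lion" cannot
              # raise the probability of "zebra" specifically, even if it should.
```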
link |
Yeah, so this independence assumption,
link |
let me throw some criticism at you that I often hear
link |
and see how you respond.
link |
So this kind of filling in the blanks is just statistics.
link |
You're not learning anything
link |
like the deep underlying concepts.
link |
You're just mimicking stuff from the past.
link |
You're not learning anything new such that you can use it
link |
to generalize about the world.
link |
Or okay, let me just say the crude version,
link |
which is just statistics.
link |
It's not intelligence.
link |
What do you have to say to that?
link |
What do you usually say to that
link |
if you kind of hear this kind of thing?
link |
I don't get into those discussions
link |
because they are kind of pointless.
link |
So first of all, it's quite possible
link |
that intelligence is just statistics.
link |
It's just statistics of a particular kind.
link |
Yes, this is the philosophical question.
link |
It's kind of is it possible
link |
that intelligence is just statistics?
link |
Yeah, but what kind of statistics?
link |
So if you are asking the question,
link |
are the models of the world that we learn,
link |
do they have some notion of causality?
link |
So if the criticism comes from people who say,
link |
current machine learning system don't care about causality,
link |
which by the way is wrong, I agree with them.
link |
Your model of the world should have your actions
link |
as one of the inputs.
link |
And that will drive you to learn causal models of the world
link |
where you know what intervention in the world
link |
will cause what result.
link |
Or you can do this by observation of other agents
link |
acting in the world and observing the effect.
link |
Other humans, for example.
link |
So I think at some level of description,
link |
intelligence is just statistics.
link |
But that doesn't mean you don't have models
link |
that have deep mechanistic explanation for what goes on.
link |
The question is how do you learn them?
link |
That's the question I'm interested in.
link |
Because a lot of people who actually voice their criticism
link |
say that those mechanistic model
link |
have to come from someplace else.
link |
They have to come from human designers,
link |
they have to come from I don't know what.
link |
And obviously we learn them.
link |
Or if we don't learn them as an individual,
link |
nature learn them for us using evolution.
link |
So regardless of what you think,
link |
those processes have been learned somehow.
link |
So if you look at the human brain,
link |
just like when we humans introspect
link |
about how the brain works,
link |
it seems like when we think about what is intelligence,
link |
we think about the high level stuff,
link |
like the models we've constructed,
link |
concepts like cognitive science,
link |
like concepts of memory and reasoning module,
link |
almost like these high level modules.
link |
Does this serve as a good analogy?
link |
Like are we ignoring the dark matter,
link |
the basic low level mechanisms?
link |
Just like we ignore the way the operating system works,
link |
we're just using the high level software.
link |
We're ignoring that at the low level,
link |
the neural network might be doing something like statistics.
link |
Like meaning, sorry to use this word
link |
probably incorrectly and crudely,
link |
but doing this kind of fill in the gap kind of learning
link |
and just kind of updating the model constantly
link |
in order to be able to support the raw sensory information
link |
to predict it and then adjust to the prediction
link |
But like when we look at our brain at the high level,
link |
it feels like we're doing, like we're playing chess,
link |
like we're like playing with high level concepts
link |
and we're stitching them together
link |
and we're putting them into longterm memory.
link |
But really what's going underneath
link |
is something we're not able to introspect,
link |
which is this kind of simple, large neural network
link |
that's just filling in the gaps.
link |
Right, well, okay.
link |
So there's a lot of questions and a lot of answers there.
link |
Okay, so first of all,
link |
there's a whole school of thought in neuroscience,
link |
computational neuroscience in particular,
link |
that likes the idea of predictive coding,
link |
which is really related to the idea
link |
I was talking about in self supervised learning.
link |
So everything is about prediction.
link |
The essence of intelligence is the ability to predict
link |
and everything the brain does is trying to predict,
link |
predict everything from everything else.
link |
Okay, and that's really sort of the underlying principle,
link |
if you want, that self supervised learning
link |
is trying to kind of reproduce this idea of prediction
link |
as kind of an essential mechanism
link |
of task independent learning, if you want.
link |
The next step is what kind of intelligence
link |
are you interested in reproducing?
link |
And of course, we all think about trying to reproduce
link |
sort of high level cognitive processes in humans,
link |
but like with machines, we're not even at the level
link |
of even reproducing the learning processes in a cat brain.
link |
The most intelligent of our intelligent systems
link |
don't have as much common sense as a house cat.
link |
So how is it that cats learn?
link |
And cats don't do a whole lot of reasoning.
link |
They certainly have causal models.
link |
They certainly have, because many cats can figure out
link |
how they can act on the world to get what they want.
link |
They certainly have a fantastic model of intuitive physics,
link |
certainly the dynamics of their own bodies,
link |
but also of prey and things like that.
link |
So they're pretty smart.
link |
They only do this with about 800 million neurons.
link |
We are not anywhere close to reproducing this kind of thing.
link |
So to some extent, I could say,
link |
let's not even worry about like the high level cognition
link |
and kind of longterm planning and reasoning
link |
that humans can do until we figure out like,
link |
can we even reproduce what cats are doing?
link |
Now that said, this ability to learn world models,
link |
I think is the key to the possibility of learning machines
link |
that can also reason.
link |
So whenever I give a talk, I say there are three challenges,
link |
the three main challenges in machine learning.
link |
The first one is getting machines to learn
link |
to represent the world
link |
and I'm proposing self supervised learning.
link |
The second is getting machines to reason
link |
in ways that are compatible
link |
with essentially gradient based learning
link |
because this is what deep learning is all about really.
link |
And the third one is something
link |
we have no idea how to solve,
link |
at least I have no idea how to solve
link |
is, can we get machines to learn hierarchical representations of action plans?
link |
We know how to train them
link |
to learn hierarchical representations of perception
link |
with convolutional nets and things like that
link |
and transformers, but what about action plans?
link |
Can we get them to spontaneously learn
link |
good hierarchical representations of actions?
link |
Also gradient based.
link |
Yeah, all of that needs to be somewhat differentiable
link |
so that you can apply sort of gradient based learning,
link |
which is really what deep learning is about.
link |
So it's background knowledge, the ability to reason
link |
in a way that's differentiable
link |
that is somehow connected, deeply integrated
link |
with that background knowledge
link |
or builds on top of that background knowledge
link |
and then given that background knowledge
link |
be able to make hierarchical plans in the world.
link |
So if you take classical optimal control,
link |
there's something in classical optimal control
link |
called model predictive control.
link |
And it's been around since the early sixties.
link |
NASA uses that to compute trajectories of rockets.
link |
And the basic idea is that you have a predictive model
link |
of the rocket, let's say,
link |
or whatever system you intend to control,
link |
which given the state of the system at time T
link |
and given an action that you're taking on the system.
link |
So for a rocket, it'd be the thrust
link |
and all the controls you can have,
link |
it gives you the state of the system
link |
at time T plus Delta T, right?
link |
So basically a differential equation, something like that.
link |
And if you have this model
link |
and you have this model in the form of some sort of neural net
link |
or some sort of a set of formula
link |
that you can back propagate gradient through,
link |
you can do what's called model predictive control
link |
or gradient based model predictive control.
link |
So you can unroll that model in time.
link |
You feed it a hypothesized sequence of actions.
link |
And then you have some objective function
link |
that measures how well at the end of the trajectory,
link |
the system has succeeded or matched what you wanted to do.
link |
Is it a robot arm?
link |
Have you grasped the object you want to grasp?
link |
If it's a rocket, are you at the right place
link |
near the space station, things like that.
link |
And by back propagation through time,
link |
and again, this was invented in the 1960s,
link |
by optimal control theorists, you can figure out
link |
what is the optimal sequence of actions
link |
that will get my system to the best final state.
link |
So that's a form of reasoning.
link |
It's basically planning.
link |
And a lot of planning systems in robotics
link |
are actually based on this.
link |
And you can think of this as a form of reasoning.
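A minimal sketch of that kind of gradient-based model predictive control, with a made-up differentiable dynamics model standing in for the rocket or car:

```python
import torch

def dynamics(state, action):
    # Hypothetical differentiable world model: x_{t+1} = x_t + 0.1 * a_t
    return state + 0.1 * action

horizon, state_dim = 20, 2
goal = torch.tensor([1.0, -1.0])
actions = torch.zeros(horizon, state_dim, requires_grad=True)  # hypothesized action sequence
optimizer = torch.optim.SGD([actions], lr=0.5)

for step in range(200):
    optimizer.zero_grad()
    state = torch.zeros(state_dim)
    for t in range(horizon):              # unroll the model in time
        state = dynamics(state, actions[t])
    cost = ((state - goal) ** 2).sum()    # how far the final state is from what you wanted
    cost.backward()                       # back propagation through time
    optimizer.step()

# In receding horizon MPC you would execute only actions[0], observe the new state,
# and re-plan from there.
```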
link |
So to take the example of the teenager driving a car,
link |
you have a pretty good dynamical model of the car.
link |
It doesn't need to be very accurate.
link |
But you know, again, that if you turn the wheel
link |
to the right and there is a cliff,
link |
you're gonna run off the cliff, right?
link |
You don't need to have a very accurate model.
link |
And you can run this in your mind
link |
and decide not to do it for that reason.
link |
Because you can predict in advance
link |
that the result is gonna be bad.
link |
So you can sort of imagine different scenarios
link |
and then employ or take the first step
link |
in the scenario that is most favorable
link |
and then repeat the process again.
link |
The scenario that is most favorable
link |
and then repeat the process of planning.
link |
That's called receding horizon model predictive control.
link |
So even all those things have names going back decades.
link |
And so in classical optimal control,
link |
the model of the world is not generally learned.
link |
Sometimes a few parameters you have to identify.
link |
That's called system identification.
link |
But generally, the model is mostly deterministic
link |
and mostly built by hand.
link |
So the question of AI,
link |
I think the big challenge of AI for the next decade
link |
is how do we get machines to learn predictive models
link |
of the world that deal with uncertainty
link |
and deal with the real world in all this complexity?
link |
So it's not just the trajectory of a rocket,
link |
which you can reduce to first principles.
link |
It's not even just the trajectory of a robot arm,
link |
which again, you can model by careful mathematics.
link |
But it's everything else,
link |
everything we observe in the world:
link |
physical systems that involve collective phenomena,
link |
like water or trees and branches in a tree or something
link |
or complex things that humans have no trouble
link |
developing abstract representations
link |
and predictive model for,
link |
but we still don't know how to do with machines.
link |
Where do you put in these three,
link |
maybe in the planning stages,
link |
the game theoretic nature of this world,
link |
where your actions not only respond
link |
to the dynamic nature of the world, the environment,
link |
but also affect it.
link |
So if there's other humans involved,
link |
is this point number four,
link |
or is it somehow integrated
link |
into the hierarchical representation of action plans?
link |
I think it's integrated.
link |
It's just that now your model of the world has to deal with,
link |
it just makes it more complicated.
link |
The fact that humans are complicated
link |
and not easily predictable,
link |
that makes your model of the world much more complicated,
link |
that much more complicated.
link |
Well, there's a chess,
link |
I mean, I suppose chess is an analogy.
link |
So Monte Carlo tree search.
link |
There's a, I go, you go, I go, you go.
link |
Like Andrej Karpathy recently gave a talk at MIT
link |
I think there's some machine learning too,
link |
but mostly car doors.
link |
And there's a dynamic nature to the car,
link |
like the person opening the door,
link |
checking, I mean, he wasn't talking about that.
link |
He was talking about the perception problem
link |
of what the ontology of what defines a car door,
link |
this big philosophical question.
link |
But to me, it was interesting
link |
because it's obvious that the person opening the car doors,
link |
they're trying to get out, like here in New York,
link |
trying to get out of the car.
link |
You slowing down is going to signal something.
link |
You speeding up is gonna signal something,
link |
and that's a dance.
link |
It's an asynchronous chess game.
link |
So it feels like it's not just,
link |
I mean, I guess you can integrate all of them
link |
to one giant model, like the entirety
link |
of these little interactions.
link |
Because it's not as complicated as chess.
link |
It's just like a little dance.
link |
We do like a little dance together,
link |
and then we figure it out.
link |
Well, in some ways it's way more complicated than chess
link |
because it's continuous, it's uncertain
link |
in a continuous manner.
link |
It doesn't feel more complicated.
link |
But it doesn't feel more complicated
link |
because that's what we've evolved to solve.
link |
This is the kind of problem we've evolved to solve.
link |
And so we're good at it
link |
because nature has made us good at it.
link |
Nature has not made us good at chess.
link |
We completely suck at chess.
link |
In fact, that's why we designed it as a game,
link |
is to be challenging.
link |
And if there is something that recent progress
link |
in chess and Go has made us realize
link |
is that humans are really terrible at those things.
link |
There was a story right before AlphaGo
link |
that the best Go players thought
link |
they were maybe two or three stones behind an ideal player
link |
that they would call God.
link |
In fact, no, they are like nine or 10 stones behind.
link |
I mean, we're just bad.
link |
So we're not good at,
link |
and it's because we have limited working memory.
link |
We're not very good at doing this tree exploration
link |
that computers are much better at doing than we are.
link |
But we are much better
link |
at learning differentiable models of the world.
link |
I mean, I said differentiable in a kind of,
link |
I should say not differentiable in the sense that
link |
we run backprop through it,
link |
but in the sense that our brain has some mechanism
link |
for estimating gradients of some kind.
link |
And that's what makes us efficient.
link |
So if you have an agent that consists of a model
link |
of the world, which in the human brain
link |
is basically the entire front half of your brain,
link |
an objective function,
link |
which in humans is a combination of two things.
link |
There is your sort of intrinsic motivation module,
link |
which is in the basal ganglia,
link |
the base of your brain.
link |
That's the thing that measures pain and hunger
link |
and things like that,
link |
like immediate feelings and emotions.
link |
And then there is the equivalent
link |
of what people in reinforcement learning call a critic,
link |
which is a sort of module that predicts ahead
link |
what the outcome of a situation will be.
link |
And so it's not a cost function,
link |
but it's sort of not an objective function,
link |
but it's sort of a trained predictor
link |
of the ultimate objective function.
link |
And that also is differentiable.
link |
And so if all of this is differentiable,
link |
your cost function, your critic, your world model,
link |
then you can use gradient based type methods
link |
to do planning, to do reasoning, to do learning,
link |
to do all the things that we'd like
link |
an intelligent agent to do.
link |
And gradient based learning,
link |
like what's your intuition?
link |
That's probably at the core of what can solve intelligence.
link |
So you don't need like logic based reasoning in your view.
link |
I don't know how to make logic based reasoning
link |
compatible with efficient learning.
link |
Okay, I mean, there is a big question,
link |
perhaps a philosophical question.
link |
I mean, it's not that philosophical,
link |
but that we can ask is that all the learning algorithms
link |
we know from engineering and computer science
link |
proceed by optimizing some objective function.
link |
So one question we may ask is,
link |
does learning in the brain minimize an objective function?
link |
I mean, it could be a composite
link |
of multiple objective functions,
link |
but it's still an objective function.
link |
Second, if it does optimize an objective function,
link |
does it do it by some sort of gradient estimation?
link |
It doesn't need to be a back prop,
link |
but some way of estimating the gradient in an efficient manner
link |
whose complexity is on the same order of magnitude
link |
as actually running the inference.
link |
Because you can't afford to do things
link |
like perturbing a weight in your brain
link |
to figure out what the effect is.
link |
And then sort of, you can do sort of
link |
estimating gradient by perturbation.
link |
To me, it seems very implausible
link |
that the brain uses some sort of zeroth order black box
link |
gradient free optimization,
link |
because it's so much less efficient
link |
than gradient optimization.
link |
So it has to have a way of estimating gradient.
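A toy comparison of why perturbation-based, zeroth-order gradient estimation scales so badly; the loss function and sizes are arbitrary placeholders:

```python
import numpy as np

def loss(w, x):                 # stand-in for "running the inference"
    return np.sum((w @ x) ** 2)

w = np.random.randn(100, 100)   # 10,000 parameters
x = np.random.randn(100)
eps = 1e-6

# Zeroth-order estimate: one extra evaluation per parameter, ~10,000 passes here.
grad = np.zeros_like(w)
base = loss(w, x)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        w[i, j] += eps
        grad[i, j] = (loss(w, x) - base) / eps
        w[i, j] -= eps
# A gradient-based method recovers the same information for roughly the cost of
# one extra pass, which is why brute-force perturbation is so implausible at brain scale.
```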
link |
Is it possible that some kind of logic based reasoning
link |
emerges in pockets as a useful,
link |
like you said, if the brain is optimizing an objective function,
link |
maybe it's a mechanism for creating objective functions.
link |
It's a mechanism for creating knowledge bases, for example,
link |
that can then be queried.
link |
Like maybe it's like an efficient representation
link |
of knowledge that's learned in a gradient based way
link |
or something like that.
link |
Well, so I think there are a lot of different types of reasoning.
link |
So first of all, I think the type of logical reasoning
link |
that we think about, that is maybe stemming
link |
from sort of classical AI of the 1970s and 80s.
link |
I think humans use that relatively rarely
link |
and are not particularly good at it.
link |
But we judge each other based on our ability
link |
to solve those rare problems.
link |
It's called an IQ test.
link |
Like I'm not very good at chess.
link |
Yes, I'm judging you this whole time.
link |
Because, well, we actually.
link |
With your heritage, I'm sure you're good at chess.
link |
Not all stereotypes are true.
link |
Well, I'm terrible at chess.
link |
So, but I think perhaps another type of intelligence
link |
that I have is this ability of sort of building models
link |
of the world from reasoning, obviously.
link |
And those models generally are more kind of analogical.
link |
So it's reasoning by simulation,
link |
and by analogy, where you use one model
link |
to apply to a new situation.
link |
Even though you've never seen that situation,
link |
you can sort of connect it to a situation
link |
you've encountered before.
link |
And your reasoning is more akin
link |
to some sort of internal simulation.
link |
So you're kind of simulating what's happening
link |
when you're building, I don't know,
link |
a box out of wood or something, right?
link |
You can imagine in advance what would be the result
link |
of cutting the wood in this particular way.
link |
Are you going to use screws or nails or whatever?
link |
When you are interacting with someone,
link |
you also have a model of that person
link |
and sort of interact with that person,
link |
having this model in mind to kind of tell the person
link |
what you think is useful to them.
link |
So I think this ability to construct models of the world
link |
is basically the essence, the essence of intelligence.
link |
And the ability to use it then to plan actions
link |
that will fulfill a particular criterion,
link |
of course, is necessary as well.
link |
So I'm going to ask you a series of impossible questions
link |
as we keep asking, as I've been doing.
link |
So if that's the fundamental sort of dark matter
link |
of intelligence, this ability to form a background model,
link |
what's your intuition about how much knowledge is required?
link |
You know, I think dark matter,
link |
you could put a percentage on it
link |
of the composition of the universe
link |
and how much of it is dark matter,
link |
how much of it is dark energy,
link |
how much information do you think is required
link |
to be a house cat?
link |
So you have to be able to, when you see a box, go in it,
link |
when you see a human, compute the most evil action,
link |
if there's a thing that's near an edge,
link |
you knock it off, all of that,
link |
plus the extra stuff you mentioned,
link |
which is a great self awareness of the physics
link |
of your own body and the world.
link |
How much knowledge is required, do you think, to solve it?
link |
I don't even know how to measure an answer to that question.
link |
I'm not sure how to measure it,
link |
but whatever it is, it fits in about
link |
800 million neurons.
link |
What's the representation of?
link |
Everything, all knowledge, everything, right?
link |
You know, it's less than a billion.
link |
A dog is 2 billion, but a cat is less than 1 billion.
link |
And so multiply that by a thousand
link |
and you get the number of synapses.
link |
And I think almost all of it is learned
link |
through this, you know, a sort of self supervised learning,
link |
although, you know, I think a tiny sliver
link |
is learned through reinforcement learning
link |
and certainly very little through, you know,
link |
classical supervised learning,
link |
although it's not even clear how supervised learning
link |
actually works in the biological world.
link |
So I think almost all of it is self supervised learning,
link |
but it's driven by the sort of ingrained objective functions
link |
that a cat or a human have at the base of their brain,
link |
which kind of drives their behavior.
link |
So, you know, nature tells us you're hungry.
link |
It doesn't tell us how to feed ourselves.
link |
That's something that the rest of our brain
link |
has to figure out, right?
link |
What's interesting is there might be more
link |
like deeper objective functions
link |
underlying the whole thing.
link |
So hunger may be some kind of,
link |
now you go to like neurobiology,
link |
it might be just the brain trying to maintain homeostasis.
link |
So hunger is just one of the human perceivable symptoms
link |
of the brain being unhappy
link |
with the way things are currently.
link |
It could be just like one really dumb objective function.
link |
But that's how behavior is driven.
link |
The fact that, you know, our basal ganglia
link |
drive us to do things that are different
link |
from say an orangutan or certainly a cat
link |
is what makes, you know, human nature
link |
versus orangutan nature versus cat nature.
link |
So for example, you know, our basal ganglia
link |
drives us to seek the company of other humans.
link |
And that's because nature has figured out
link |
that we need to be social animals for our species to survive.
link |
And it's true of many primates.
link |
It's not true of orangutans.
link |
Orangutans are solitary animals.
link |
They don't seek the company of others.
link |
In fact, they avoid them.
link |
In fact, they scream at them when they come too close
link |
because they're territorial.
link |
Because for their survival, you know,
link |
evolution has figured out that's the best thing.
link |
I mean, they're occasionally social, of course,
link |
for, you know, reproduction and stuff like that.
link |
But they're mostly solitary.
link |
So all of those behaviors are not part of intelligence.
link |
You know, people say,
link |
oh, you're never gonna have intelligent machines
link |
because, you know, human intelligence is social.
link |
But then you look at orangutans, you look at octopus.
link |
Octopus never know their parents.
link |
They barely interact with any others.
link |
And they get to be really smart in less than a year,
link |
in like half a year.
link |
You know, in a year, they're adults.
link |
In two years, they're dead.
link |
So there are things that we think, as humans,
link |
are intimately linked with intelligence,
link |
like social interaction, like language.
link |
We think, I think we give way too much importance
link |
to language as a substrate of intelligence as humans.
link |
Because we think our reasoning is so linked with language.
link |
So to solve the house cat intelligence problem,
link |
you think you could do it on a desert island.
link |
You could have, you could just have a cat sitting there
link |
looking at the waves, at the ocean waves,
link |
and figure a lot of it out.
link |
It needs to have sort of, you know,
link |
the right set of drives to kind of, you know,
link |
get it to do the thing and learn the appropriate things,
link |
right, but like for example, you know,
link |
baby humans are driven to learn to stand up and walk.
link |
You know, that's kind of, this desire is hardwired.
link |
How to do it precisely is not, that's learned.
link |
But the desire to walk, move around and stand up,
link |
that's sort of probably hardwired.
link |
But it's very simple to hardwire this kind of stuff.
link |
Oh, like the desire to, well, that's interesting.
link |
You're hardwired to want to walk.
link |
That's not, there's gotta be a deeper need for walking.
link |
I think it was probably socially imposed by society
link |
that you need to walk like all the other bipedals.
link |
No, like a lot of simple animals that, you know,
link |
will probably walk without ever watching
link |
any other members of the species.
link |
It seems like a scary thing to have to do
link |
because you suck at bipedal walking at first.
link |
It seems crawling is much safer, much more like,
link |
why are you in a hurry?
link |
Well, because you have this thing that drives you to do it,
link |
you know, which is sort of part of the sort of
link |
human development.
link |
Is that understood, actually?
link |
What's the reason you get on two feet?
link |
Like most animals don't get on two feet.
link |
Well, they get on four feet.
link |
You know, many mammals get on four feet.
link |
Yeah, they do. Very quickly.
link |
Some of them extremely quickly.
link |
But I don't, you know, like from the last time
link |
I've interacted with a table,
link |
that's much more stable than a thing on two legs.
link |
It's just a really hard problem.
link |
Yeah, I mean, birds have figured it out with two feet.
link |
Well, technically we can go into ontology.
link |
They have four, I guess they have two feet.
link |
They have two feet.
link |
You know, dinosaurs have two feet, many of them.
link |
I'm just now learning that T. rex was eating grass,
link |
not other animals.
link |
T. rex might've been a friendly pet.
link |
What do you think about,
link |
I don't know if you looked at the test
link |
for general intelligence that François Chollet put together.
link |
I don't know if you got a chance to look
link |
at that kind of thing.
link |
What's your intuition about how to solve
link |
like an IQ type of test?
link |
I think it's so outside of my radar screen
link |
that it's not really relevant, I think, in the short term.
link |
Well, I guess one way to ask,
link |
another way, perhaps closer to what you work on, is like,
link |
how do you solve MNIST with very little example data?
link |
And the answer to this probably
link |
is self supervised learning.
link |
Just learn to represent images
link |
and then learning to recognize handwritten digits
link |
on top of this will only require a few samples.
link |
And we observe this in humans, right?
link |
You show a young child a picture book
link |
with a couple of pictures of an elephant and that's it.
link |
The child knows what an elephant is.
link |
And we see this today with practical systems
link |
that we train image recognition systems
link |
with enormous amounts of images,
link |
either completely self supervised
link |
or very weakly supervised.
link |
For example, you can train a neural net
link |
to predict whatever hashtag people type on Instagram, right?
link |
Then you can do this with billions of images
link |
because there's billions per day that are showing up.
link |
So the amount of training data there
link |
is essentially unlimited.
link |
And then you take the output representation,
link |
a couple of layers down from the outputs
link |
of what the system learned and feed this as input
link |
to a classifier for any object in the world that you want
link |
and it works pretty well.
link |
So that's transfer learning, okay?
link |
Or weakly supervised transfer learning.
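A hedged sketch of that transfer recipe: take a backbone pre-trained on some large weakly labeled or self-supervised source, freeze it, and train only a small classifier on its features; the resnet50 backbone, the feature dimension, and the ten-class task are assumptions for illustration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)   # in practice, load weights from large-scale pre-training here
backbone.fc = nn.Identity()         # take features a couple of layers from the output
for p in backbone.parameters():
    p.requires_grad = False         # keep the pre-trained representation fixed

num_classes = 10                    # whatever downstream objects you care about
probe = nn.Linear(2048, num_classes)  # the only part trained with labels
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

images = torch.randn(16, 3, 224, 224)           # stand-in for a small labeled dataset
labels = torch.randint(0, num_classes, (16,))

with torch.no_grad():
    features = backbone(images)
loss = nn.functional.cross_entropy(probe(features), labels)
loss.backward()
optimizer.step()
```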
link |
People are making very, very fast progress
link |
using self supervised learning
link |
for this kind of scenario as well.
link |
And my guess is that that's gonna be the future.
link |
For self supervised learning,
link |
how much cleaning do you think is needed
link |
for filtering malicious signal or what's a better term?
link |
But like a lot of people use hashtags on Instagram
link |
to get like good SEO that doesn't fully represent
link |
the contents of the image.
link |
Like they'll put a picture of a cat
link |
and hashtag it with like science, awesome, fun.
link |
I don't know all kinds, why would you put science?
link |
That's not very good SEO.
link |
The way my colleagues who worked on this project
link |
at Facebook, now Meta AI, a few years ago dealt with this
link |
is that they only selected something like 17,000 tags
link |
that correspond to kind of physical things or situations,
link |
like that has some visual content.
link |
So you wouldn't have like #TBT or anything like that.
link |
Oh, so they keep a very select set of hashtags
link |
is what you're saying?
link |
But it's still in the order of 10 to 20,000.
link |
So it's fairly large.
link |
Can you tell me about data augmentation?
link |
What the heck is data augmentation and how is it used
link |
maybe contrast of learning for video?
link |
What are some cool ideas here?
link |
Right, so data augmentation.
link |
I mean, first data augmentation is the idea
link |
of artificially increasing the size of your training set
link |
by distorting the images that you have
link |
in ways that don't change the nature of the image, right?
link |
So you do MNIST, you can do data augmentation on MNIST
link |
and people have done this since the 1990s, right?
link |
You take a MNIST digit and you shift it a little bit
link |
or you change the size or rotate it, skew it,
link |
you know, et cetera.
link |
Add noise, et cetera.
link |
And it works better if you train a supervised classifier
link |
with augmented data, you're gonna get better results.
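For concreteness, those classic MNIST-style augmentations might look like this with torchvision; the exact ranges are illustrative:

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomAffine(degrees=10,             # small rotations and skews
                            translate=(0.1, 0.1),   # shift the digit a little
                            scale=(0.9, 1.1),       # change its size a little
                            shear=5),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),  # add noise
])
# e.g. pass transform=augment to an MNIST dataset loader when training the classifier.
```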
link |
Now it's become really interesting
link |
over the last couple of years
link |
because a lot of self supervised learning techniques
link |
to pre train vision systems are based on data augmentation.
link |
And the basic technique is originally inspired
link |
by techniques that I worked on in the early 90s
link |
and Jeff Hinton worked on also in the early 90s.
link |
They were sort of parallel work.
link |
I used to call this a Siamese network.
link |
So basically you take two identical copies
link |
of the same network, they share the same weights
link |
and you show two different views of the same object.
link |
Either those two different views may have been obtained
link |
by data augmentation
link |
or maybe it's two different views of the same scene
link |
from a camera that you moved or at different times
link |
or something like that, right?
link |
Or two pictures of the same person, things like that.
link |
And then you train this neural net,
link |
those two identical copies of this neural net
link |
to produce an output representation, a vector
link |
in such a way that the representation for those two images
link |
are as close to each other as possible,
link |
as identical to each other as possible, right?
link |
Because you want the system
link |
to basically learn a function that will be invariant,
link |
that will not change, whose output will not change
link |
when you transform those inputs in those particular ways,
link |
So that's easy to do.
link |
What's complicated is how do you make sure
link |
that when you show two images that are different,
link |
the system will produce different things?
link |
Because if you don't have a specific provision for this,
link |
the system will just ignore the inputs when you train it,
link |
it will end up ignoring the input
link |
and just produce a constant vector
link |
that is the same for every input, right?
link |
That's called a collapse.
link |
Now, how do you avoid collapse?
link |
So there's two ideas.
link |
One idea that I proposed in the early 90s
link |
with my colleagues at Bell Labs,
link |
Jane Bromley and a couple other people,
link |
which we now call contrastive learning,
link |
which is to have negative examples, right?
link |
So you have pairs of images that you know are different
link |
and you show them to the network and those two copies,
link |
and then you push the two output vectors away
link |
from each other and it will eventually guarantee
link |
that things that are semantically similar
link |
produce similar representations
link |
and things that are different
link |
produce different representations.
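A minimal sketch of that contrastive Siamese setup, with a placeholder encoder and margin; it conveys the idea rather than reproducing the original formulation exactly:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(encoder, x1, x2, is_same, margin=1.0):
    z1, z2 = encoder(x1), encoder(x2)            # two identical copies, shared weights
    dist = F.pairwise_distance(z1, z2)
    positive = is_same * dist.pow(2)                           # pull similar pairs together
    negative = (1 - is_same) * F.relu(margin - dist).pow(2)    # push different pairs apart
    return (positive + negative).mean()

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 64))
x1, x2 = torch.randn(32, 1, 28, 28), torch.randn(32, 1, 28, 28)
is_same = torch.randint(0, 2, (32,)).float()     # 1 = same identity, 0 = different
loss = contrastive_loss(encoder, x1, x2, is_same)
loss.backward()
```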
link |
We actually came up with this idea
link |
for a project of doing signature verification.
link |
So we would collect signatures from,
link |
like multiple signatures from the same person
link |
and then train a neural net to produce the same representation
link |
and then force the system to produce different
link |
representation for different signatures.
link |
This was actually, the problem was proposed by people
link |
from what was a subsidiary of AT&T at the time called NCR.
link |
And they were interested in storing
link |
representation of the signature on the 80 bytes
link |
of the magnetic strip of a credit card.
link |
So we came up with this idea of having a neural net
link |
with 80 outputs that we would quantize on bytes
link |
so that we could encode the signature.
link |
And that encoding was then used to compare
link |
whether the signature matches or not.
link |
So then you would sign, you would run through the neural net
link |
and then you would compare the output vector
link |
to whatever is stored on your card.
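Purely to illustrate the 80-byte idea (the scaling and the acceptance threshold are invented), the comparison might look like this:

```python
import numpy as np

def to_bytes(embedding):
    # Quantize an 80-dimensional embedding in [-1, 1] to one byte per dimension.
    return np.clip((embedding + 1.0) * 127.5, 0, 255).astype(np.uint8)

stored = to_bytes(np.random.uniform(-1, 1, 80))   # written on the card's magnetic strip
fresh = to_bytes(np.random.uniform(-1, 1, 80))    # from the signature just produced

distance = np.linalg.norm(stored.astype(float) - fresh.astype(float))
print("match" if distance < 300 else "reject")    # threshold is a made-up number
```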
link |
Did it actually work?
link |
It worked, but they ended up not using it.
link |
Because nobody cares actually.
link |
I mean, the American financial payment system
link |
is incredibly lax in that respect compared to Europe.
link |
Oh, with the signatures?
link |
What's the purpose of signatures anyway?
link |
This is very different.
link |
Nobody looks at them, nobody cares.
link |
Yeah, no, so that's contrastive learning, right?
link |
So you need positive and negative pairs.
link |
And the problem with that is that,
link |
even though I had the original paper on this,
link |
I'm actually not very positive about it
link |
because it doesn't work in high dimension.
link |
If your representation is high dimensional,
link |
there's just too many ways for two things to be different.
link |
And so you would need lots and lots
link |
and lots of negative pairs.
link |
So there is a particular implementation of this,
link |
which is relatively recent from actually
link |
the Google Toronto group where, you know,
link |
Jeff Hinton is the senior member there.
link |
It's called SimCLR, S-I-M-C-L-R.
link |
And it's, you know, basically a particular way
link |
of implementing this idea of contrastive learning,
link |
the particular objective function.
link |
Now, what I'm much more enthusiastic about these days
link |
is non contrastive methods.
link |
So other ways to guarantee that the representations
link |
would be different for different inputs.
link |
And it's actually based on an idea that Jeff Hinton
link |
proposed in the early nineties with his student
link |
at the time, Sue Becker.
link |
And it's based on the idea of maximizing
link |
the mutual information between the outputs
link |
of the two systems.
link |
You only show positive pairs.
link |
You only show pairs of images that you know
link |
are somewhat similar.
link |
And you train the two networks to be informative,
link |
but also to be as informative of each other as possible.
link |
So basically one representation has to be predictable
link |
from the other, essentially.
link |
And, you know, he proposed that idea,
link |
had, you know, a couple of papers in the early nineties,
link |
and then nothing was done about it for decades.
link |
And I kind of revived this idea together
link |
with my postdocs at FAIR,
link |
particularly a postdoc called Stéphane Deny,
link |
who is now a junior professor in Finland
link |
at Aalto University.
link |
We came up with something that we call Barlow Twins.
link |
And it's a particular way of maximizing
link |
the information content of a vector,
link |
you know, using some hypotheses.
link |
And we have kind of another version of it
link |
that's more recent now called VICReg, V-I-C-R-E-G.
link |
That means Variance-Invariance-Covariance Regularization.
link |
And it's the thing I'm the most excited about
link |
in machine learning in the last 15 years.
link |
I mean, I'm not, I'm really, really excited about this.
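A compact sketch of a VICReg-style objective on two embeddings of two views of the same image; the loss weights here are placeholders rather than the published settings:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0):
    n, d = z1.shape
    invariance = F.mse_loss(z1, z2)               # the two views should map to the same point

    def variance(z):                              # keep every dimension of the vector "alive"
        std = torch.sqrt(z.var(dim=0) + 1e-4)
        return torch.mean(F.relu(1.0 - std))

    def covariance(z):                            # decorrelate dimensions to spread information
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d

    return (sim_w * invariance
            + var_w * (variance(z1) + variance(z2))
            + cov_w * (covariance(z1) + covariance(z2)))

z1 = torch.randn(128, 256, requires_grad=True)    # embeddings of view 1 of a batch
z2 = torch.randn(128, 256, requires_grad=True)    # embeddings of view 2 of the same images
loss = vicreg_loss(z1, z2)
loss.backward()
```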
link |
What kind of data augmentation is useful
link |
for that noncontrastive learning method?
link |
Are we talking about, does that not matter that much?
link |
Or it seems like a very important part of the step.
link |
How you generate the images that are similar,
link |
but sufficiently different.
link |
Yeah, that's right.
link |
It's an important step and it's also an annoying step
link |
because you need to have that knowledge
link |
of what data augmentation you can do
link |
that does not change the nature of the object.
link |
And so the standard scenario,
link |
which a lot of people working in this area are using
link |
is you use the type of distortion.
link |
So basically you do a geometric distortion.
link |
So one basically just shifts the image a little bit,
link |
it's called cropping.
link |
Another one kind of changes the scale a little bit.
link |
Another one kind of rotates it.
link |
Another one changes the colors.
link |
You can do a shift in color balance
link |
or something like that, saturation.
link |
Another one sort of blurs it.
link |
Another one adds noise.
link |
So you have like a catalog of kind of standard things
link |
and people try to use the same ones
link |
for different algorithms so that they can compare.
link |
But some algorithms, some self supervised algorithm
link |
actually can deal with much bigger,
link |
like more aggressive data augmentation and some don't.
link |
So that kind of makes the whole thing difficult.
link |
But that's the kind of distortions we're talking about.
link |
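As a rough illustration of that catalog of distortions, here is what a typical augmentation pipeline might look like in torchvision; the exact transforms, parameters, and probabilities vary from paper to paper, so the numbers here are placeholders.

    import torchvision.transforms as T

    ssl_augment = T.Compose([
        T.RandomResizedCrop(224),                                    # crop / shift / rescale
        T.RandomHorizontalFlip(),
        T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),   # color balance / saturation
        T.RandomGrayscale(p=0.2),
        T.RandomApply([T.GaussianBlur(kernel_size=23)], p=0.5),      # blur
        T.ToTensor(),
    ])

    # Two independent draws of the pipeline on the same image give one positive pair:
    # view1, view2 = ssl_augment(img), ssl_augment(img)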
And so you train with those distortions
link |
and then you chop off the last layer, a couple layers
link |
of the network and you use the representation
link |
as input to a classifier.
link |
You train the classifier on ImageNet, let's say,
link |
or whatever, and measure the performance.
link |
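That evaluation protocol is usually called a linear probe, and it looks roughly like this; backbone, feat_dim, num_classes, and train_loader are placeholders for whatever pretrained network and labeled dataset are being evaluated.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Freeze the pretrained backbone (with its last layers chopped off)
    # and train only a linear classifier on the frozen representation.
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    probe = nn.Linear(feat_dim, num_classes)     # e.g. 2048 -> 1000 for ImageNet
    opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)

    for images, labels in train_loader:          # labeled data, e.g. ImageNet
        with torch.no_grad():
            feats = backbone(images)             # frozen features
        loss = F.cross_entropy(probe(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()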
And interestingly enough, the methods that are really good
link |
at eliminating the information that is irrelevant,
link |
which is the distortions between those images,
link |
do a good job at eliminating it.
link |
And as a consequence, you cannot use the representations
link |
in those systems for things like object detection
link |
and localization because that information is gone.
link |
So the type of data augmentation you need to do
link |
depends on the tasks you want eventually the system
link |
to solve and the type of data augmentation,
link |
standard data augmentation that we use today
link |
are only appropriate for object recognition
link |
or image classification.
link |
They're not appropriate for things like.
link |
Can you help me understand why it can't do localization?
link |
So you're saying it's just not good at the negative,
link |
like at classifying the negative,
link |
so that's why it can't be used for the localization?
link |
No, it's just that you train the system,
link |
you give it an image and then you give it the same image
link |
shifted and scaled and you tell it that's the same image.
link |
So the system basically is trained
link |
to eliminate the information about position and size.
link |
So now you want to use that to figure out
link |
where an object is and what size it is.
link |
Like a bounding box, like they'd be able to actually.
link |
Okay, it can still find the object in the image,
link |
it's just not very good at finding
link |
the exact boundaries of that object, interesting.
link |
Interesting, which that's an interesting
link |
sort of philosophical question,
link |
how important is object localization anyway?
link |
We're like obsessed by measuring image segmentation,
link |
obsessed by measuring perfectly knowing
link |
the boundaries of objects when arguably
link |
that's not that essential to understanding
link |
what are the contents of the scene.
link |
On the other hand, I think evolutionarily,
link |
the first vision systems in animals
link |
were basically all about localization,
link |
very little about recognition.
link |
And in the human brain, you have two separate pathways
link |
for recognizing the nature of a scene or an object
link |
and localizing objects.
link |
So you use the first pathway, called the ventral pathway,
link |
for telling what you're looking at.
link |
The other pathway, the dorsal pathway,
link |
is used for navigation, for grasping, for everything else.
link |
And basically a lot of the things you need for survival
link |
are localization and detection.
link |
Is similarity learning or contrastive learning,
link |
are these non contrastive methods
link |
the same as understanding something?
link |
Just because you know a distorted cat
link |
is the same as a non distorted cat,
link |
does that mean you understand what it means to be a cat?
link |
I mean, it's a superficial understanding, obviously.
link |
But what is the ceiling of this method, do you think?
link |
Is this just one trick on the path
link |
to doing self supervised learning?
link |
Can we go really, really far?
link |
I think we can go really far.
link |
So if we figure out how to use techniques of that type,
link |
perhaps very different, but the same nature,
link |
to train a system from video to do video prediction,
link |
essentially, I think we'll have a path towards,
link |
I wouldn't say unlimited, but a path towards some level
link |
of physical common sense in machines.
link |
And I also think that that ability to learn
link |
how the world works from a sort of high throughput channel
link |
like vision is a necessary step towards
link |
sort of real artificial intelligence.
link |
In other words, I believe in grounded intelligence.
link |
I don't think we can train a machine
link |
to be intelligent purely from text.
link |
Because I think the amount of information about the world
link |
that's contained in text is tiny compared
link |
to what we need to know.
link |
So for example, and people have attempted to do this
link |
for 30 years, the Cyc project and things like that,
link |
basically kind of writing down all the facts that are known
link |
and hoping that some sort of common sense will emerge.
link |
I think it's basically hopeless.
link |
But let me take an example.
link |
You take an object, I describe a situation to you.
link |
I take an object, I put it on the table
link |
and I push the table.
link |
It's completely obvious to you that the object
link |
will be pushed with the table,
link |
because it's sitting on it.
link |
There's no text in the world, I believe, that explains this.
link |
And so if you train a machine as powerful as it could be,
link |
your GPT 5000 or whatever it is,
link |
it's never gonna learn about this.
link |
That information is just not present in any text.
link |
Well, the question, like with the Cyc project,
link |
the dream I think is to have like 10 million,
link |
say facts like that, that give you a headstart,
link |
like a parent guiding you.
link |
Now, we humans don't need a parent to tell us
link |
that the table will move, sorry,
link |
the smartphone will move with the table.
link |
But we get a lot of guidance in other ways.
link |
So it's possible that we can give it a quick shortcut.
link |
The cat knows that.
link |
No, but they evolved, so.
link |
No, they learn like us.
link |
Sorry, the physics of stuff?
link |
Well, yeah, so you're saying it's,
link |
so you're putting a lot of intelligence
link |
onto the nurture side, not the nature.
link |
We seem to have, you know,
link |
there's a very inefficient arguably process of evolution
link |
that got us from bacteria to who we are today.
link |
Started at the bottom, now we're here.
link |
So the question is how, okay,
link |
the question is how fundamental is that,
link |
the nature of the whole hardware?
link |
And then is there any way to shortcut it
link |
if it's fundamental?
link |
If it's not, if it's most of intelligence,
link |
most of the cool stuff we've been talking about
link |
is mostly nurture, mostly trained.
link |
We figure it out by observing the world.
link |
We can form that big, beautiful, sexy background model
link |
that you're talking about just by sitting there.
link |
Then, okay, then you need to, then like maybe,
link |
it is all supervised learning all the way down.
link |
Self supervised learning, say.
link |
Whatever it is that makes, you know,
link |
human intelligence different from other animals,
link |
which, you know, a lot of people think is language
link |
and logical reasoning and this kind of stuff.
link |
It cannot be that complicated because it only popped up
link |
in the last million years.
link |
And, you know, it only involves, you know,
link |
maybe less than 1% of our genome,
link |
which is the difference between the human genome
link |
and chimps or whatever.
link |
So it can't be that complicated.
link |
You know, it can't be that fundamental.
link |
I mean, most of the complicated stuff
link |
already exists in cats and dogs and, you know,
link |
certainly primates, nonhuman primates.
link |
Yeah, that little thing with humans
link |
might be just something about social interaction
link |
and ability to maintain ideas
link |
across like a collective of people.
link |
It sounds very dramatic and very impressive,
link |
but it probably isn't mechanistically speaking.
link |
It is, but we're not there yet.
link |
Like, you know, we have, I mean, this is number 634,
link |
you know, in the list of problems we have to solve.
link |
So basic physics of the world is number one.
link |
What do you, just a quick tangent on data augmentation.
link |
So a lot of it is hard coded versus learned.
link |
Do you have any intuition that maybe
link |
there could be some weird data augmentation,
link |
like generative type of data augmentation,
link |
like doing something weird to images,
link |
which then improves the similarity learning process?
link |
So not just kind of dumb, simple distortions,
link |
but by you shaking your head,
link |
just saying that even simple distortions are enough.
link |
I think, no, I think data augmentation
link |
is a temporary necessary evil.
link |
So what people are working on now is two things.
link |
One is the type of self supervised learning,
link |
like trying to translate the type of self supervised learning
link |
people use in language, translating this to images,
link |
which is basically a denoising autoencoder method, right?
link |
So you take an image, you block, you mask some parts of it,
link |
and then you train some giant neural net
link |
to reconstruct the parts that are missing.
link |
And until very recently,
link |
there was no working methods for that.
link |
All the autoencoder type methods for images
link |
weren't producing very good representation,
link |
but there's a paper now coming out of the FAIR group
link |
at Menlo Park that actually works very well.
link |
So that doesn't require data augmentation,
link |
that requires only masking, okay.
link |
Only masking for images, okay.
link |
Right, so you mask part of the image
link |
and you train a system, which in this case is a transformer
link |
because the transformer represents the image
link |
as non overlapping patches,
link |
so it's easy to mask patches and things like that.
link |
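A sketch of the random patch masking step such a method relies on (the actual FAIR paper's implementation differs in details); patches is assumed to already be the sequence of non-overlapping patch embeddings that the transformer sees.

    import torch

    def random_mask_patches(patches, mask_ratio=0.75):
        # patches: [N, L, D] non-overlapping patch embeddings of N images.
        n, l, d = patches.shape
        num_keep = int(l * (1 - mask_ratio))
        noise = torch.rand(n, l)                         # one random score per patch
        ids_shuffle = noise.argsort(dim=1)               # random permutation of patches
        ids_keep = ids_shuffle[:, :num_keep]
        visible = torch.gather(patches, 1,
                               ids_keep.unsqueeze(-1).expand(-1, -1, d))
        masked_ids = ids_shuffle[:, num_keep:]           # the decoder reconstructs these
        return visible, masked_ids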
Okay, but then my question transfers to that problem,
link |
the masking, like why should the mask be square or rectangle?
link |
So it doesn't matter, like, you know,
link |
I think we're gonna come up probably in the future
link |
with sort of ways to mask that are kind of random,
link |
essentially, I mean, they are random already, but.
link |
No, no, but like something that's challenging,
link |
like optimally challenging.
link |
So like, I mean, maybe it's a metaphor that doesn't apply,
link |
but you're, it seems like there's a data augmentation
link |
or masking, there's an interactive element with it.
link |
Like you're almost like playing with an image.
link |
And like, it's like the way we play with an image.
link |
No, but it's like dropout.
link |
It's like Boltzmann machine training.
link |
You, you know, every time you see a percept,
link |
you also, you can perturb it in some way.
link |
And then the principle of the training procedure
link |
is to minimize the difference of the output
link |
or the representation between the clean version
link |
and the corrupted version, essentially, right?
link |
And you can do this in real time, right?
link |
So, you know, Boltzmann machines work like this, right?
link |
You show a percept, you tell the machine
link |
that's a good combination of activities
link |
or your input neurons.
link |
And then you either let them go their merry way
link |
without clamping them to values,
link |
or you only do this with a subset.
link |
And what you're doing is you're training the system
link |
so that the stable state of the entire network
link |
is the same regardless of whether it sees
link |
the entire input or whether it sees only part of it.
link |
You know, denoising autoencoder method
link |
is basically the same thing, right?
link |
You're training a system to reproduce the input,
link |
the complete input, filling in the blanks,
link |
regardless of which parts
link |
are missing, and that's really the underlying principle.
link |
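Stated as code, the underlying principle is just this; model and corrupt are placeholders for whatever network and corruption process you choose.

    import torch.nn.functional as F

    def denoising_step(model, x, corrupt):
        # Corrupt the input (mask pieces, add noise, drop activations...),
        # then train the model to reproduce the clean, complete input.
        x_hat = model(corrupt(x))
        return F.mse_loss(x_hat, x)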
And you could imagine sort of, even in the brain,
link |
some sort of neural principle where, you know,
link |
neurons kind of oscillate, right?
link |
So they take their activity and then temporarily
link |
they kind of shut off to, you know,
link |
force the rest of the system to basically reconstruct
link |
the input without their help, you know?
link |
And, I mean, you could imagine, you know,
link |
more or less biologically possible processes.
link |
Something like that.
link |
And I guess with this denoising autoencoder
link |
and masking and data augmentation,
link |
you don't have to worry about being super efficient.
link |
You could just do as much as you want
link |
and get better over time.
link |
Because I was thinking, like, you might want to be clever
link |
about the way you do all these procedures, you know,
link |
but that's only, it's somehow costly to do every iteration,
link |
but it's not really.
link |
And then there is, you know,
link |
data augmentation without explicit data augmentation.
link |
There's data augmentation by waiting,
link |
which is, you know, the sort of video prediction.
link |
You're observing a video clip,
link |
observing the, you know, the continuation of that video clip.
link |
You try to learn a representation
link |
using dual joint embedding architectures
link |
in such a way that the representation of the future clip
link |
is easily predictable from the representation
link |
of the observed clip.
link |
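A minimal sketch of that joint-embedding prediction setup; encoder and predictor are hypothetical modules, and on its own this objective would still need a collapse-prevention term (contrastive, or VICReg-style) that is omitted here.

    import torch
    import torch.nn.functional as F

    def video_prediction_step(encoder, predictor, observed_clip, future_clip):
        z_obs = encoder(observed_clip)        # representation of what was seen
        with torch.no_grad():
            z_future = encoder(future_clip)   # target: representation of what came next
        z_pred = predictor(z_obs)             # predict the future in representation space
        return F.mse_loss(z_pred, z_future)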
Do you think YouTube has enough raw data
link |
from which to learn how to be a cat?
link |
So the amount of data is not the constraint.
link |
No, it would require some selection, I think.
link |
Some selection of, you know, maybe the right type of data.
link |
Don't go down the rabbit hole of just cat videos.
link |
You might need to watch some lectures or something.
link |
How meta would that be
link |
if it like watches lectures about intelligence
link |
watches your lectures at NYU
link |
and learns from that how to be intelligent?
link |
I don't think that would be enough.
link |
What's your, do you find multimodal learning interesting?
link |
We've been talking about visual language,
link |
like combining those together,
link |
maybe audio, all those kinds of things.
link |
There's a lot of things that I find interesting
link |
in the short term,
link |
but are not addressing the important problem
link |
that I think are really kind of the big challenges.
link |
So I think, you know, things like multitask learning,
link |
continual learning, you know, adversarial issues.
link |
I mean, those have great practical interests
link |
in the relatively short term, possibly,
link |
but I don't think they're fundamental.
link |
You know, active learning,
link |
even to some extent, reinforcement learning.
link |
I think those things will become either obsolete
link |
or useless or easy
link |
once we've figured out how to do self supervised
link |
representation learning
link |
or learning predictive world models.
link |
And so I think that's what, you know,
link |
the entire community should be focusing on.
link |
At least people who are interested
link |
in sort of fundamental questions
link |
or, you know, really kind of pushing the envelope
link |
of AI towards the next stage.
link |
But of course, there's like a huge amount of,
link |
you know, very interesting work to do
link |
in sort of practical questions
link |
that have, you know, short term impact.
link |
Well, you know, it's difficult to talk about
link |
the temporal scale,
link |
because all of human civilization
link |
will eventually be destroyed
link |
because the sun will die out.
link |
And even if Elon Musk is successful
link |
in multi planetary colonization across the galaxy,
link |
eventually the entirety of it
link |
will just become giant black holes.
link |
And that's gonna take a while though.
link |
So, but what I'm saying is then that logic
link |
can be used to say it's all meaningless.
link |
I'm saying all that to say that multitask learning
link |
might be, you're calling it practical
link |
or pragmatic or whatever.
link |
That might be the thing that achieves something
link |
very akin to intelligence
link |
while we're trying to solve the more general problem
link |
of self supervised learning of background knowledge.
link |
So the reason I bring that up,
link |
maybe one way to ask that question.
link |
I've been very impressed
link |
by what Tesla Autopilot team is doing.
link |
I don't know if you've gotten a chance to glance
link |
at this particular one example of multitask learning,
link |
where they're literally taking the problem,
link |
like, I don't know, Charles Darwin studying animals.
link |
They're studying the problem of driving
link |
and asking, okay, what are all the things
link |
you have to perceive?
link |
And the way they're solving it is one,
link |
there's an ontology where you're bringing that to the table.
link |
So you're formulating a bunch of different tasks.
link |
It's like over a hundred tasks or something like that
link |
that they're involved in driving.
link |
And then they're deploying it
link |
and then getting data back from people that run into trouble
link |
and they're trying to figure out, do we add tasks?
link |
Do we, like, we focus on each individual task separately?
link |
In fact, I would say,
link |
I would classify Andrej Karpathy's talk in two ways.
link |
So one was about doors
link |
and the other one about how much ImageNet sucks.
link |
He kept going back and forth on those two topics,
link |
which ImageNet sucks,
link |
meaning you can't just use a single benchmark.
link |
There's so, like, you have to have like a giant suite
link |
of benchmarks to understand how well your system actually works.
link |
Oh, I agree with him.
link |
I mean, he's a very sensible guy.
link |
Now, okay, it's very clear that if you're faced
link |
with an engineering problem that you need to solve
link |
in a relatively short time,
link |
particularly if you have Elon Musk breathing down your neck,
link |
you're going to have to take shortcuts, right?
link |
You might think about the fact that the right thing to do
link |
and the longterm solution involves, you know,
link |
some fancy self supervised learning,
link |
but you have, you know, Elon Musk breathing down your neck
link |
and, you know, this involves, you know, human lives.
link |
And so you have to basically just do
link |
the systematic engineering and, you know,
link |
fine tuning and refinements
link |
and trial and error and all that stuff.
link |
There's nothing wrong with that.
link |
That's called engineering.
link |
That's called, you know, putting technology out in the world.
link |
And you have to kind of ironclad it before you do this,
link |
you know, so much for, you know,
link |
grand ideas and principles.
link |
But, you know, I'm placing myself sort of, you know,
link |
some, you know, upstream of this, you know,
link |
quite a bit upstream of this.
link |
You're like Plato, thinking about platonic forms.
link |
You're not platonic because eventually
link |
I want that stuff to get used,
link |
but it's okay if it takes five or 10 years
link |
for the community to realize this is the right thing to do.
link |
I've done this before.
link |
It's been the case before that, you know,
link |
I've made that case.
link |
I mean, if you look back in the mid 2000s, for example,
link |
and you ask yourself the question, okay,
link |
I want to recognize cars or faces or whatever,
link |
you know, I can use convolutional net.
link |
So I can use sort of more conventional
link |
kind of computer vision techniques, you know,
link |
using interest point detectors or dense SIFT features
link |
and, you know, sticking an SVM on top.
link |
At that time, the datasets were so small
link |
that those methods that use more hand engineering
link |
worked better than ConvNets.
link |
It was just not enough data for ConvNets
link |
and ConvNets were a little slow with the kind of hardware
link |
that was available at the time.
link |
And there was a sea change when, basically,
link |
when, you know, datasets became bigger
link |
and GPUs became available.
link |
Those are, you know, two of the main factors
link |
that basically made people change their mind.
link |
And you can look at the history of,
link |
like, all sub branches of AI or pattern recognition.
link |
And there's a similar trajectory followed by techniques
link |
where people start by, you know, engineering the hell out of it.
link |
You know, be it optical character recognition,
link |
speech recognition, computer vision,
link |
like image recognition in general,
link |
natural language understanding, like, you know, translation,
link |
things like that, right?
link |
You start to engineer the hell out of it.
link |
You start to acquire all the knowledge,
link |
the prior knowledge you know about image formation,
link |
about, you know, the shape of characters,
link |
about, you know, morphological operations,
link |
about, like, feature extraction, Fourier transforms,
link |
you know, Zernike moments, you know, whatever, right?
link |
People have come up with thousands of ways
link |
of representing images
link |
so that they could be easily classified afterwards.
link |
Same for speech recognition, right?
link |
There is, you know, it took decades
link |
for people to figure out a good front end
link |
to preprocess speech signals
link |
so that, you know, all the information
link |
about what is being said is preserved,
link |
but most of the information
link |
about the identity of the speaker is gone.
link |
You know, cepstral coefficients or whatever, right?
link |
And same for text, right?
link |
You do named entity recognition and you parse
link |
and you do tagging of the parts of speech
link |
and, you know, you do this sort of tree representation
link |
of clauses and all that stuff, right?
link |
Before you can do anything.
link |
So that's how it starts, right?
link |
Just engineer the hell out of it.
link |
And then you start having data
link |
and maybe you have more powerful computers.
link |
Maybe you know something about statistical learning.
link |
So you start using machine learning
link |
and it's usually a small sliver
link |
on top of your kind of handcrafted system
link |
where, you know, you extract features by hand.
link |
Okay, and now, you know, nowadays the standard way
link |
of doing this is that you train the entire thing end to end
link |
with a deep learning system and it learns its own features
link |
and, you know, speech recognition systems nowadays
link |
or OCR systems are completely end to end.
link |
It's, you know, it's some giant neural net
link |
that takes raw waveforms
link |
and produces a sequence of characters coming out.
link |
And it's just a huge neural net, right?
link |
There's no, you know, Markov model,
link |
there's no language model that is explicit
link |
other than, you know, something that's ingrained
link |
in the sort of neural language model, if you want.
link |
Same for translation, same for all kinds of stuff.
link |
So you see this continuous evolution
link |
from, you know, less and less hand crafting
link |
and more and more learning.
link |
And I think, I mean, it's true in biology as well.
link |
So, I mean, we might disagree about this,
link |
maybe not, this one little piece at the end,
link |
you mentioned active learning.
link |
It feels like active learning,
link |
which is the selection of data
link |
and also the interactivity needs to be part
link |
of this giant neural network.
link |
You cannot just be an observer
link |
to do self supervised learning.
link |
You have to, well, I don't,
link |
self supervised learning is just a word,
link |
but I would, whatever this giant stack
link |
of a neural network that's automatically learning,
link |
it feels, my intuition is that you have to have a system,
link |
whether it's a physical robot or a digital robot,
link |
that's interacting with the world
link |
and doing so in a flawed way and improving over time
link |
in order to form the self supervised learning.
link |
Well, you can't just give it a giant sea of data.
link |
Okay, I agree and I disagree.
link |
I agree in the sense that I think, I agree in two ways.
link |
The first way I agree is that if you want,
link |
and you certainly need a causal model of the world
link |
that allows you to predict the consequences
link |
of your actions, to train that model,
link |
you need to take actions, right?
link |
You need to be able to act in a world
link |
and see the effect for you to be,
link |
to learn causal models of the world.
link |
So that's not obvious because you can observe others.
link |
You can observe others.
link |
And you can infer that they're similar to you
link |
and then you can learn from that.
link |
Yeah, but then you have to kind of hardwire that part,
link |
right, and then, you know, mirror neurons
link |
and all that stuff, right?
link |
So, and it's not clear to me
link |
how you would do this in a machine.
link |
So I think the action part would be necessary
link |
for having causal models of the world.
link |
The second reason it may be necessary,
link |
or at least more efficient,
link |
is that active learning basically, you know,
link |
goes for the jugular of what you don't know, right?
link |
Is, you know, obvious areas of uncertainty
link |
about your world and about how the world behaves.
link |
And you can resolve this uncertainty
link |
by systematic exploration of that part
link |
that you don't know.
link |
And if you know that you don't know,
link |
then, you know, it makes you curious.
link |
You kind of look into situations that,
link |
and, you know, across the animal world,
link |
different species have different levels of curiosity,
link |
right, depending on how they're built, right?
link |
So, you know, cats and rats are incredibly curious,
link |
dogs not so much, I mean, less.
link |
Yeah, so it could be useful
link |
to have that kind of curiosity.
link |
So it'd be useful,
link |
but curiosity just makes the process faster.
link |
It doesn't make the process exist.
link |
The, so what process, what learning process is it
link |
that active learning makes more efficient?
link |
And I'm asking that first question, you know,
link |
you know, we haven't answered that question yet.
link |
So, you know, I'll worry about active learning
link |
once this question is...
link |
So it's the more fundamental question to ask.
link |
And if active learning or interaction
link |
increases the efficiency of the learning,
link |
see, sometimes it becomes very different
link |
if the increase is several orders of magnitude, right?
link |
But fundamentally it's still the same thing
link |
and building up the intuition about how to,
link |
in a self supervised way to construct background models,
link |
efficient or inefficient, is the core problem.
link |
What do you think about Yoshua Bengio's
link |
talking about consciousness
link |
and all of these kinds of concepts?
link |
Okay, I don't know what consciousness is, but...
link |
It's a good opener.
link |
And to some extent, a lot of the things
link |
that are said about consciousness
link |
remind me of the questions people were asking themselves
link |
in the 18th century or 17th century
link |
when they discovered that, you know, how the eye works
link |
and the fact that the image at the back of the eye
link |
was upside down, right?
link |
Because you have a lens.
link |
And so on your retina, the image that forms is an image
link |
of the world, but it's upside down.
link |
How is it that you see right side up?
link |
And, you know, with what we know today in science,
link |
you know, we realize this question doesn't make any sense
link |
or is kind of ridiculous in some way, right?
link |
So I think a lot of what is said about consciousness
link |
is of that nature.
link |
Now, that said, there are a lot of really smart people
link |
for whom I have a lot of respect
link |
who are talking about this topic,
link |
people like David Chalmers, who is a colleague of mine at NYU.
link |
I have kind of an orthodox folk speculative hypothesis
link |
about consciousness.
link |
So we're talking about the study of a world model.
link |
And I think, you know, our entire prefrontal cortex
link |
basically is the engine for a world model.
link |
But when we are attending at a particular situation,
link |
we're focused on that situation.
link |
We basically cannot attend to anything else.
link |
And that seems to suggest that we basically have
link |
only one world model engine in our prefrontal cortex.
link |
That engine is configurable to the situation at hand.
link |
So we are building a box out of wood,
link |
or we are driving down the highway, or playing chess.
link |
We basically have a single model of the world
link |
that we configure into the situation at hand,
link |
which is why we can only attend to one task at a time.
link |
Now, if there is a task that we do repeatedly,
link |
it goes from the sort of deliberate reasoning
link |
using model of the world and prediction
link |
and perhaps something like model predictive control,
link |
which I was talking about earlier,
link |
to something that is more subconscious
link |
that becomes automatic.
link |
So I don't know if you've ever played
link |
against a chess grandmaster.
link |
I get wiped out in 10 plies, right?
link |
And I have to think about my move for like 15 minutes.
link |
And the person in front of me, the grandmaster,
link |
would just react within seconds, right?
link |
He doesn't need to think about it.
link |
That's become part of the subconscious
link |
because it's basically just pattern recognition.
link |
Same, the first few hours you drive a car,
link |
you are really attentive, you can't do anything else.
link |
And then after 20, 30 hours of practice, 50 hours,
link |
the subconscious, you can talk to the person next to you,
link |
things like that, right?
link |
Unless the situation becomes unpredictable
link |
and then you have to stop talking.
link |
So that suggests you only have one model in your head.
link |
And it might suggest the idea that consciousness
link |
basically is the module that configures
link |
this world model of yours.
link |
You need to have some sort of executive kind of overseer
link |
that configures your world model for the situation at hand.
link |
And that leads to kind of the really curious concept
link |
that consciousness is not a consequence
link |
of the power of our minds,
link |
but of the limitation of our brains.
link |
That because we have only one world model,
link |
we have to be conscious.
link |
If we had as many world models
link |
as situations we encounter,
link |
then we could do all of them simultaneously
link |
and we wouldn't need this sort of executive control
link |
that we call consciousness.
link |
Yeah, interesting.
link |
And somehow maybe that executive controller,
link |
I mean, the hard problem of consciousness,
link |
there's some kind of chemicals in biology
link |
that's creating a feeling,
link |
like it feels to experience some of these things.
link |
That's kind of like the hard question is,
link |
what the heck is that and why is that useful?
link |
Maybe the more pragmatic question,
link |
why is it useful to feel like this is really you
link |
experiencing this versus just, like, information processing?
link |
It could be just a very nice side effect
link |
of the way we evolved.
link |
That's just very useful to feel a sense of ownership
link |
to the decisions you make, to the perceptions you make,
link |
to the model you're trying to maintain.
link |
Like you own this thing and this is the only one you got
link |
and if you lose it, it's gonna really suck.
link |
And so you should really send the brain
link |
some signals about it.
link |
So what ideas do you believe might be true
link |
that most or at least many people disagree with?
link |
Let's say in the space of machine learning.
link |
Well, it depends who you talk about,
link |
but I think, so certainly there is a bunch of people
link |
who are nativists, right?
link |
Who think that a lot of the basic things about the world
link |
are kind of hardwired in our minds.
link |
Things like the world is three dimensional, for example,
link |
is that hardwired?
link |
Things like object permanence,
link |
is this something that we learn
link |
before the age of three months or so?
link |
Or are we born with it?
link |
And there are very wide disagreements
link |
among the cognitive scientists for this.
link |
I think those things are actually very simple to learn.
link |
Is it the case that the oriented edge detectors in V1
link |
are learned or are they hardwired?
link |
I think they are learned.
link |
They might be learned before birth
link |
because it's really easy to generate signals
link |
from the retina that actually will train edge detectors.
link |
And again, those are things that can be learned
link |
within minutes of opening your eyes, right?
link |
I mean, since the 1990s,
link |
we have algorithms that can learn oriented edge detectors
link |
completely unsupervised
link |
with the equivalent of a few minutes of real time.
link |
So those things have to be learned.
link |
And there's also those MIT experiments
link |
where you kind of plug the optic nerve
link |
onto the auditory cortex of a baby ferret, right?
link |
And that auditory cortex
link |
becomes a visual cortex essentially.
link |
So clearly there's learning taking place there.
link |
So I think a lot of what people think are so basic
link |
that they need to be hardwired,
link |
I think a lot of those things are learned
link |
because they are easy to learn.
link |
So you put a lot of value in the power of learning.
link |
What kind of things do you suspect might not be learned?
link |
Is there something that could not be learned?
link |
So your intrinsic drives are not learned.
link |
There are the things that make humans human
link |
or make cats different from dogs, right?
link |
It's the basic drives that are kind of hardwired
link |
in our basal ganglia.
link |
I mean, there are people who are working
link |
on this kind of stuff that's called intrinsic motivation
link |
in the context of reinforcement learning.
link |
So these are objective functions
link |
where the reward doesn't come from the external world.
link |
It's computed by your own brain.
link |
Your own brain computes whether you're happy or not, right?
link |
It measures your degree of comfort or discomfort.
link |
And because it's your brain computing this,
link |
presumably it knows also how to estimate
link |
gradients of this, right?
link |
So it's easier to learn when your objective is intrinsic.
link |
So that has to be hardwired.
link |
The critic that makes longterm prediction of the outcome,
link |
which is the eventual result of this, that's learned.
link |
And perception is learned
link |
and your model of the world is learned.
link |
But let me take an example of why the critic,
link |
I mean, an example of how the critic may be learned, right?
link |
If I come to you, I reach across the table
link |
and I pinch your arm, right?
link |
Complete surprise for you.
link |
You would not have expected this from me.
link |
I was expecting that the whole time, but yes, right.
link |
Let's say for the sake of the story, yes.
link |
So, okay, your basal ganglia is gonna light up
link |
because it's gonna hurt, right?
link |
And now your model of the world includes the fact that
link |
I may pinch you if I approach my...
link |
Don't trust humans.
link |
Right, my hand to your arm.
link |
So if I try again, you're gonna recoil.
link |
And that's your critic, your predictive,
link |
your predictor of your ultimate pain system
link |
that predicts that something bad is gonna happen
link |
and you recoil to avoid it.
link |
So even that can be learned.
link |
That is learned, definitely.
link |
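One common way such a learned critic can be trained is with a temporal-difference style update; LeCun does not specify the mechanism here, so this sketch, with its hypothetical critic and intrinsic_cost, is only an illustration of the idea.

    import torch
    import torch.nn.functional as F

    def critic_update(critic, state, intrinsic_cost, next_state, gamma=0.99):
        # The hardwired part supplies the immediate intrinsic cost (pain, discomfort);
        # the critic learns to predict the long-term outcome from the current state.
        with torch.no_grad():
            target = intrinsic_cost + gamma * critic(next_state)
        prediction = critic(state)
        return F.mse_loss(prediction, target)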
This is what allows you also to define some goals, right?
link |
So the fact that you're a school child,
link |
you wake up in the morning and you go to school
link |
and it's not because you necessarily like waking up early
link |
and going to school,
link |
but you know that there is a long term objective
link |
you're trying to optimize.
link |
So Ernest Becker, I'm not sure if you're familiar with him,
link |
the philosopher, he wrote the book Denial of Death
link |
and his idea is that one of the core motivations
link |
of human beings is our terror of death, our fear of death.
link |
That's what makes us unique from cats.
link |
Cats are just surviving.
link |
They do not have a deep, like, cognizance or introspection
link |
that over the horizon is the end.
link |
And then he says that, I mean,
link |
there's a terror management theory
link |
that just all these psychological experiments
link |
that show basically this idea
link |
that all of human civilization, everything we create
link |
is kind of trying to forget if even for a brief moment
link |
that we're going to die.
link |
When do you think humans understand
link |
that they're going to die?
link |
Is it learned early on also?
link |
I don't know at what point.
link |
I mean, it's a question like at what point
link |
do you realize that what death really is?
link |
And I think most people don't actually realize
link |
what death is, right?
link |
I mean, most people believe that you go to heaven
link |
or something, right?
link |
So to push back on that, what Ernest Becker says
link |
and Sheldon Solomon, all of those folks,
link |
and I find those ideas a little bit compelling
link |
is that there is moments in life, early in life,
link |
a lot of this fun happens early in life
link |
when you do deeply experience
link |
the terror of this realization.
link |
And all the things you think about about religion,
link |
all those kinds of things that we kind of think about
link |
more like teenage years and later,
link |
we're talking about way earlier.
link |
No, it was like seven or eight years old,
link |
something like that, yeah.
link |
You realize, holy crap, this is like the mystery,
link |
the terror, like it's almost like you're a little prey,
link |
a little baby deer sitting in the darkness
link |
of the jungle or the woods looking all around you.
link |
There's darkness full of terror.
link |
I mean, that realization says, okay,
link |
I'm gonna go back in the comfort of my mind
link |
where there is a deep meaning,
link |
where there is maybe like pretend I'm immortal
link |
in however way, however kind of idea I can construct
link |
to help me understand that I'm immortal.
link |
Religion helps with that.
link |
You can delude yourself in all kinds of ways,
link |
like lose yourself in the busyness of each day,
link |
have little goals in mind, all those kinds of things
link |
to think that it's gonna go on forever.
link |
And you kind of know you're gonna die, yeah,
link |
and it's gonna be sad, but you don't really understand
link |
that you're going to die.
link |
And so that's their idea.
link |
And I find that compelling because it does seem
link |
to be a core unique aspect of human nature
link |
that we're able to think that we're going,
link |
we're able to really understand that this life is finite.
link |
That seems important.
link |
There's a bunch of different things there.
link |
So first of all, I don't think there is a qualitative
link |
difference between us and cats in the term.
link |
I think the difference is that we just have a better
link |
long term ability to predict in the long term.
link |
And so we have a better understanding of how the world works.
link |
So we have better understanding of finiteness of life
link |
and things like that.
link |
So we have a better planning engine than cats?
link |
But what's the motivation for planning that far?
link |
Well, I think it's just a side effect of the fact
link |
that we have just a better planning engine
link |
because it makes us, as I said,
link |
the essence of intelligence is the ability to predict.
link |
And so the, because we're smarter as a side effect,
link |
we also have this ability to kind of make predictions
link |
about our own future existence or lack thereof.
link |
You say religion helps with that.
link |
I think religion hurts actually.
link |
It makes people worry about like,
link |
what's going to happen after their death, et cetera.
link |
If you believe that, you just don't exist after death.
link |
Like, it solves completely the problem, at least.
link |
You're saying if you don't believe in God,
link |
you don't worry about what happens after death?
link |
You only worry about this life
link |
because that's the only one you have.
link |
I think it's, well, I don't know.
link |
If I were to say what Ernest Becker says,
link |
and obviously I agree with him more than not,
link |
is you do deeply worry.
link |
If you believe there's no God,
link |
there's still a deep worry of the mystery of it all.
link |
Like, how does that make any sense that it just ends?
link |
I don't think we can truly understand that this ride,
link |
I mean, so much of our life, the consciousness,
link |
the ego is invested in this being.
link |
Science keeps bringing humanity down from its pedestal.
link |
And that's just another example of it.
link |
That's wonderful, but for us individual humans,
link |
we don't like to be brought down from a pedestal.
link |
You're saying like, but see, you're fine with it because,
link |
well, so what Ernest Becker would say is you're fine with it
link |
because there's just a more peaceful existence for you,
link |
but you're not really fine.
link |
You're hiding from it.
link |
In fact, some of the people that experience
link |
the deepest trauma earlier in life,
link |
they often, before they seek extensive therapy,
link |
will say that I'm fine.
link |
It's like when you talk to people who are truly angry,
link |
how are you doing, I'm fine.
link |
The question is, what's going on?
link |
Now I had a near death experience.
link |
I had a very bad motorbike accident when I was 17.
link |
So, but that didn't have any impact
link |
on my reflection on that topic.
link |
So I'm basically just playing a bit of devil's advocate,
link |
pushing back on wondering,
link |
is it truly possible to accept death?
link |
And the flip side, that's more interesting,
link |
I think for AI and robotics is how important
link |
is it to have this as one of the suite of motivations
link |
is to not just avoid falling off the roof
link |
or something like that, but ponder the end of the ride.
link |
If you listen to the stoics, it's a great motivator.
link |
It adds a sense of urgency.
link |
So maybe to truly fear death or be cognizant of it
link |
might give a deeper meaning and urgency to the moment.
link |
Maybe I don't disagree with that.
link |
I mean, I think what motivates me here
link |
is knowing more about human nature.
link |
I mean, I think human nature and human intelligence,
link |
it's a scientific mystery
link |
in addition to philosophical and et cetera,
link |
but I'm a true believer in science.
link |
So, and I do have kind of a belief
link |
that for complex systems like the brain and the mind,
link |
the way to understand it is to try to reproduce it
link |
with artifacts that you build
link |
because you know what's essential to it
link |
when you try to build it.
link |
The same way I've used this analogy before with you,
link |
I believe, the same way we only started
link |
to understand aerodynamics
link |
when we started building airplanes
link |
and that helped us understand how birds fly.
link |
So I think there's kind of a similar process here
link |
where we don't have a full theory of intelligence,
link |
but building intelligent artifacts
link |
will help us perhaps develop some underlying theory
link |
that encompasses not just artificial implementations,
link |
but also human and biological intelligence in general.
link |
So you're an interesting person to ask this question
link |
about sort of all kinds of different other
link |
intelligent entities or intelligences.
link |
What are your thoughts about kind of like the Turing test
link |
or the Chinese room question?
link |
If we create an AI system that exhibits
link |
a lot of properties of intelligence and consciousness,
link |
how comfortable are you thinking of that entity
link |
as intelligent or conscious?
link |
So you're trying to build now systems
link |
that have intelligence and there's metrics
link |
about their performance, but that metric is external.
link |
So how are you, are you okay calling a thing intelligent
link |
or are you going to be like most humans
link |
and be once again unhappy to be brought down
link |
from a pedestal of consciousness slash intelligence?
link |
No, I'll be very happy to understand
link |
more about human nature, human mind and human intelligence
link |
through the construction of machines
link |
that have similar abilities.
link |
And if a consequence of this is to bring down humanity
link |
one notch down from its already low pedestal,
link |
I'm just fine with it.
link |
That's just the reality of life.
link |
So I'm fine with that.
link |
Now you were asking me about things that,
link |
opinions I have that a lot of people may disagree with.
link |
I think if we think about the design
link |
of autonomous intelligence systems,
link |
so assuming that we are somewhat successful
link |
at some level of getting machines to learn models
link |
of the world, predictive models of the world,
link |
we build intrinsic motivation objective functions
link |
to drive the behavior of that system.
link |
The system also has perception modules
link |
that allows it to estimate the state of the world
link |
and then have some way of figuring out
link |
the sequence of actions that,
link |
to optimize a particular objective.
link |
If it has a critic of the type that I was describing before,
link |
the thing that makes you recoil your arm
link |
the second time I try to pinch you,
link |
an intelligent autonomous machine will have emotions.
link |
I think emotions are an integral part
link |
of autonomous intelligence.
link |
If you have an intelligent system
link |
that is driven by intrinsic motivation, by objectives,
link |
if it has a critic that allows it to predict in advance
link |
whether the outcome of a situation is gonna be good or bad,
link |
is going to have emotions, it's gonna have fear.
link |
When it predicts that the outcome is gonna be bad
link |
and something to avoid; it's gonna have elation
link |
when it predicts it's gonna be good.
link |
If it has drives to relate with humans,
link |
in some ways the way humans have,
link |
it's gonna be social, right?
link |
And so it's gonna have emotions
link |
about attachment and things of that type.
link |
So I think the sort of sci fi thing
link |
where you see Commander Data,
link |
like having an emotion chip that you can turn off, right?
link |
I think that's ridiculous.
link |
So, I mean, here's the difficult
link |
philosophical social question.
link |
Do you think there will be a time like a civil rights
link |
movement for robots where, okay, forget the movement,
link |
but a discussion like the Supreme Court
link |
that particular kinds of robots,
link |
you know, particular kinds of systems
link |
deserve the same rights as humans
link |
because they can suffer just as humans can,
link |
all those kinds of things.
link |
Well, perhaps, perhaps not.
link |
Like imagine that humans were,
link |
that you could, you know, die and be restored.
link |
Like, you know, you could be sort of, you know,
link |
be 3D reprinted and, you know,
link |
your brain could be reconstructed in its finest details.
link |
Our ideas of rights will change in that case.
link |
If you can always just,
link |
there's always a backup you could always restore.
link |
Maybe like the importance of murder
link |
will go down one notch.
link |
But also your desire to do dangerous things,
link |
like, you know, skydiving or, you know,
link |
or, you know, race car driving,
link |
you know, car racing or that kind of stuff,
link |
you know, would probably increase
link |
or, you know, aeroplanes, aerobatics
link |
or that kind of stuff, right?
link |
It would be fine to do a lot of those things
link |
or explore, you know, dangerous areas and things like that.
link |
It would kind of change your relationship.
link |
So now it's very likely that robots would be like that
link |
because, you know, they'll be based on perhaps technology
link |
that is somewhat similar to today's technology
link |
and you can always have a backup.
link |
So it's possible, I don't know if you like video games,
link |
but there's a game called Diablo and...
link |
Oh, my sons are huge fans of this.
link |
In fact, they made a game that's inspired by it.
link |
Like built a game?
link |
My three sons have a game design studio between them, yeah.
link |
They came out with a game.
link |
They just came out with a game.
link |
Last year, no, this was last year,
link |
early last year, about a year ago.
link |
But so in Diablo, there's something called hardcore mode,
link |
which if you die, there's no, you're gone.
link |
And so it's possible with AI systems
link |
for them to be able to operate successfully
link |
and for us to treat them in a certain way
link |
because they have to be integrated in human society,
link |
they have to be able to die, no copies allowed.
link |
In fact, copying is illegal.
link |
It's possible with humans as well,
link |
like cloning will be illegal, even when it's possible.
link |
But cloning is not copying, right?
link |
I mean, you don't reproduce the mind of the person
link |
and the experience.
link |
It's just a delayed twin, so.
link |
But then it's, but we were talking about with computers
link |
that you will be able to copy.
link |
You will be able to perfectly save,
link |
pickle the mind state.
link |
And it's possible that that will be illegal
link |
because that goes against,
link |
that will destroy the motivation of the system.
link |
Okay, so let's say you have a domestic robot, okay?
link |
Sometime in the future.
link |
And the domestic robot comes to you kind of
link |
somewhat pre trained, it can do a bunch of things,
link |
but it has a particular personality
link |
that makes it slightly different from the other robots
link |
because that makes them more interesting.
link |
And then because it's lived with you for five years,
link |
you've grown some attachment to it and vice versa,
link |
and it's learned a lot about you.
link |
Or maybe it's not a real household robot.
link |
Maybe it's a virtual assistant that lives in your,
link |
you know, augmented reality glasses or whatever, right?
link |
You know, the movie Her type thing, right?
link |
And that system to some extent,
link |
the intelligence in that system is a bit like your child
link |
or maybe your PhD student in the sense that
link |
there's a lot of you in that machine now, right?
link |
And so if it were a living thing,
link |
you would do this for free if you want, right?
link |
If it's your child, your child can, you know,
link |
then live his or her own life.
link |
And you know, the fact that they learn stuff from you
link |
doesn't mean that you have any ownership of it, right?
link |
But if it's a robot that you've trained,
link |
perhaps you have some intellectual property claim.
link |
Oh, intellectual property.
link |
Oh, I thought you meant like a permanence value
link |
in the sense that part of you is in.
link |
Well, there is permanence value, right?
link |
So you would lose a lot if that robot were to be destroyed
link |
and you had no backup, you would lose a lot, right?
link |
You lose a lot of investment, you know,
link |
kind of like, you know, a person dying, you know,
link |
that a friend of yours dying
link |
or a coworker or something like that.
link |
But also you have like intellectual property rights
link |
in the sense that that system is fine tuned
link |
to your particular existence.
link |
So that's now a very unique instantiation
link |
of that original background model,
link |
whatever it was that arrived.
link |
And then there are issues of privacy, right?
link |
Because now imagine that that robot has its own kind
link |
of volition and decides to work for someone else.
link |
Or kind of, you know, thinks life with you
link |
is sort of untenable or whatever.
link |
Now, all the things that that system learned from you,
link |
you know, can you like, you know,
link |
delete all the personal information
link |
that that system knows about you?
link |
I mean, that would be kind of an ethical question.
link |
Like, you know, can you erase the mind
link |
of a intelligent robot to protect your privacy?
link |
You can't do this with humans.
link |
You can ask them to shut up,
link |
but that you don't have complete power over them.
link |
You can't erase humans, yeah, it's the problem
link |
with the relationships, you know, if you break up,
link |
you can't erase the other human.
link |
With robots, I think it will have to be the same thing
link |
with robots, that risk, that there has to be some risk
link |
to our interactions to truly experience them deeply.
link |
So you have to be able to lose your robot friend
link |
and that robot friend to go tweeting
link |
about how much of an asshole you were.
link |
But then are you allowed to, you know,
link |
murder the robot to protect your private information
link |
if the robot decides to leave?
link |
I have this intuition that for robots with certain,
link |
like, it's almost like a regulation.
link |
If you declare your robot to be,
link |
let's call it sentient or something like that,
link |
like this robot is designed for human interaction,
link |
then you're not allowed to murder these robots.
link |
It's the same as murdering other humans.
link |
Well, but what about you do a backup of the robot
link |
that you preserve on a hard drive
link |
for the equivalent in the future?
link |
That might be illegal.
link |
It's like piracy is illegal.
link |
No, but it's your own robot, right?
link |
But you can't, you don't.
link |
But then you can wipe out his brain.
link |
So this robot doesn't know anything about you anymore,
link |
but you still have, technically it's still in existence
link |
because you backed it up.
link |
And then there'll be these great speeches
link |
at the Supreme Court by saying,
link |
oh, sure, you can erase the mind of the robot
link |
just like you can erase the mind of a human.
link |
We both can suffer.
link |
There'll be some epic like Obama type character
link |
with a speech that we,
link |
like the robots and the humans are the same.
link |
We can both suffer.
link |
We can both, all of those kinds of things,
link |
raise families, all that kind of stuff.
link |
It's interesting for these, just like you said,
link |
emotion seems to be a fascinatingly powerful aspect
link |
of human interaction, human robot interaction.
link |
And if they're able to exhibit emotions
link |
at the end of the day,
link |
that's probably going to have us deeply consider
link |
human rights, like what we value in humans,
link |
what we value in other animals.
link |
That's why robots and AI is great.
link |
It makes us ask really good questions.
link |
The hard questions, yeah.
link |
But you asked about the Chinese room type argument.
link |
I think the Chinese room argument is a really good one.
link |
So for people who don't know what Chinese room is,
link |
you can, I don't even know how to formulate it well,
link |
but basically you can mimic the behavior
link |
of an intelligence system by just following
link |
a giant algorithm code book that tells you exactly
link |
how to respond in exactly each case.
link |
But is that really intelligent?
link |
It's like a giant lookup table.
link |
When this person says this, you answer this.
link |
When this person says this, you answer this.
link |
And if you understand how that works,
link |
you have this giant, nearly infinite lookup table.
link |
Is that really intelligence?
link |
Cause intelligence seems to be a mechanism
link |
that's much more interesting and complex
link |
than this lookup table.
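To make the lookup-table picture concrete, here is a minimal sketch; the table entries are invented purely for illustration, not taken from any real system.

```python
# Minimal sketch of the "Chinese room" picture: behavior produced by a giant
# lookup table with no understanding behind it. Entries are invented examples.
rulebook = {
    "how are you?": "I am fine, thank you.",
    "what is intelligence?": "Consult rule 4,285,193 of the codebook.",
}

def rulebook_reply(utterance: str) -> str:
    """Answer by pure table lookup, with no model of what the words mean."""
    return rulebook.get(utterance.strip().lower(), "No rule found for that input.")

print(rulebook_reply("How are you?"))  # -> "I am fine, thank you."
```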
link |
So the, I mean, the real question comes down to,
link |
do you think, you know, you can,
link |
you can mechanize intelligence in some way,
link |
even if that involves learning?
link |
And the answer is, of course, yes, there's no question.
link |
There's a second question then, which is,
link |
assuming you can reproduce intelligence
link |
in sort of different hardware than biological hardware,
link |
you know, like computers, can you, you know,
link |
match human intelligence in all the domains
link |
in which humans are intelligent?
link |
Is it possible, right?
link |
So that's the hypothesis of strong AI.
link |
The answer to this, in my opinion, is an unqualified yes.
link |
This will certainly happen at some point.
link |
There's no question that machines at some point
link |
will become more intelligent than humans
link |
in all domains where humans are intelligent.
link |
This is not for tomorrow.
link |
It is going to take a long time,
link |
regardless of what, you know,
link |
Elon and others have claimed or believed.
link |
This is a lot harder than many of those guys think it is.
link |
And many of those guys who thought it was simpler than that
link |
years ago, you know, five years ago,
link |
now think it's hard because it's been five years
link |
and they realize it's going to take a lot longer.
link |
That includes a bunch of people at DeepMind, for example.
link |
I haven't actually touched base with the DeepMind folks,
link |
but some of them, Elon or Demis Hassabis.
link |
I mean, sometimes in your role,
link |
you have to kind of create deadlines
link |
that are nearer than farther away
link |
to kind of create an urgency.
link |
Because, you know, you have to believe the impossible
link |
is possible in order to accomplish it.
link |
And there's, of course, a flip side to that coin,
link |
but it's a weird, you can't be too cynical
link |
if you want to get something done.
link |
I agree with that.
link |
But, I mean, you have to inspire people, right?
link |
To work on sort of ambitious things.
link |
So, you know, it's certainly a lot harder than we believe,
link |
but there's no question in my mind that this will happen.
link |
And now, you know, people are kind of worried about
link |
what does that mean for humans?
link |
They are going to be brought down from their pedestal,
link |
you know, a bunch of notches with that.
link |
And, you know, is that going to be good or bad?
link |
I mean, it's just going to give more power, right?
link |
It's an amplifier for human intelligence, really.
link |
So, speaking of doing cool, ambitious things,
link |
FAIR, the Facebook AI research group,
link |
has recently celebrated its eighth birthday.
link |
Or, maybe you can correct me on that.
link |
Looking back, what has been the successes, the failures,
link |
the lessons learned from the eight years of FAIR?
link |
And maybe you can also give context of
link |
where does the newly minted meta AI fit into,
link |
how does it relate to FAIR?
link |
Right, so let me tell you a little bit
link |
about the organization of all this.
link |
Yeah, FAIR was created almost exactly eight years ago.
link |
It wasn't called FAIR yet.
link |
It took that name a few months later.
link |
And at the time I joined Facebook,
link |
there was a group called the AI group
link |
that had about 12 engineers and a few scientists,
link |
like, you know, 10 engineers and two scientists
link |
or something like that.
link |
I ran it for three and a half years as a director,
link |
you know, hired the first few scientists
link |
and kind of set up the culture and organized it,
link |
you know, explained to the Facebook leadership
link |
what fundamental research was about
link |
and how it can work within industry
link |
and how it needs to be open and everything.
link |
And I think it's been an unqualified success
link |
in the sense that FAIR has simultaneously produced,
link |
you know, top level research
link |
and advanced the science and the technology,
link |
provided tools, open source tools,
link |
like PyTorch and many others,
link |
but at the same time has had a direct
link |
or mostly indirect impact on Facebook at the time,
link |
now Meta, in the sense that a lot of systems
link |
that Meta is built around now are based
link |
on research projects that started at FAIR.
link |
And so if you were to take out, you know,
link |
deep learning out of Facebook services now
link |
and Meta more generally,
link |
I mean, the company would literally crumble.
link |
I mean, it's completely built around AI these days.
link |
And it's really essential to the operations.
link |
So what happened after three and a half years
link |
is that I changed role, I became chief scientist.
link |
So I'm not doing day to day management of FAIR anymore.
link |
I'm more of a kind of, you know,
link |
think about strategy and things like that.
link |
And I carry my, I conduct my own research.
link |
I have, you know, my own kind of research group
link |
working on self supervised learning and things like this,
link |
which I didn't have time to do when I was director.
link |
So now FAIR is run by Joelle Pineau and Antoine Bordes together
link |
because FAIR is kind of split in two now.
link |
There's something called FAIR Labs,
link |
which is sort of bottom up science driven research
link |
and FAIR Accel, which is slightly more organized
link |
for bigger projects that require a little more
link |
kind of focus and more engineering support
link |
and things like that.
link |
So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel.
link |
Where are they located?
link |
It's delocalized all over.
link |
So there's no question that the leadership of the company
link |
believes that this was a very worthwhile investment.
link |
And what that means is that it's there for the long run.
link |
So if you want to talk in these terms, which I don't like,
link |
this is a business model, if you want,
link |
where FAIR, despite being a very fundamental research lab
link |
brings a lot of value to the company,
link |
either directly or, mostly, indirectly through other groups.
link |
Now what happened three and a half years ago
link |
when I stepped down was also the creation of Facebook AI,
link |
which was basically a larger organization
link |
that covers FAIR, so FAIR is included in it,
link |
but also has other organizations
link |
that are focused on applied research
link |
or advanced development of AI technology
link |
that is more focused on the products of the company.
link |
So less emphasis on fundamental research.
link |
Less fundamental, but it's still research.
link |
I mean, there's a lot of papers coming out
link |
of those organizations and the people are awesome
link |
and wonderful to interact with.
link |
But it serves as kind of a way
link |
to kind of scale up if you want sort of AI technology,
link |
which, you know, may be very experimental
link |
and sort of lab prototypes into things that are usable.
link |
So FAIR is a subset of Meta AI.
link |
Will FAIR become like KFC?
link |
It'll just keep the F.
link |
Nobody cares what the F stands for.
link |
We'll know soon enough, probably by the end of 2021.
link |
I guess it's not a giant change, MAIR, FAIR.
link |
Well, MAIR doesn't sound too good,
link |
but the brand people are kind of deciding on this
link |
and they've been hesitating for a while now.
link |
And they tell us they're going to come up with an answer
link |
as to whether FAIR is going to change name
link |
or whether we're going to change just the meaning of the F.
link |
That's a good call.
link |
I would keep FAIR and change the meaning of the F.
link |
That would be my preference.
link |
I would turn the F into fundamental AI research.
link |
Oh, that's really good.
link |
So this would be meta FAIR,
link |
but people will call it FAIR, right?
link |
And now Meta AI is part of the Reality Lab.
link |
So Meta now, the new Facebook is called Meta
link |
and it's kind of divided into Facebook, Instagram, WhatsApp,
link |
and Reality Labs, which is about AR, VR, telepresence,
link |
communication technology and stuff like that.
link |
It's kind of the, you can think of it as the sort of,
link |
a combination of sort of new products
link |
and technology part of Meta.
link |
Is that where the touch sensing for robots,
link |
I saw that you were posting about that.
link |
Touch sensing for robots is part of FAIR, actually.
link |
That's a FAIR project.
link |
Yeah, this is also the, no, but there is the other way,
link |
the haptic glove, right?
link |
Yes, that's more Reality Lab.
link |
That's Reality Lab research.
link |
Reality Lab research.
link |
By the way, the touch sensors are super interesting.
link |
Like integrating that modality
link |
into the whole sensing suite is very interesting.
link |
So what do you think about the Metaverse?
link |
What do you think about this whole kind of expansion
link |
of the view of the role of Facebook and Meta in the world?
link |
Well, Metaverse really should be thought of
link |
as the next step in the internet, right?
link |
Sort of trying to kind of make the experience
link |
more compelling of being connected
link |
either with other people or with content.
link |
And we are evolved and trained to evolve
link |
in 3D environments where we can see other people.
link |
We can talk to them when we're near them,
link |
whereas others far away can't hear us,
link |
things like that, right?
link |
So there's a lot of social conventions
link |
that exist in the real world that we can try to transpose.
link |
Now, what is going to be eventually the,
link |
how compelling is it going to be?
link |
Like, is it going to be the case
link |
that people are going to be willing to do this
link |
if they have to wear a huge pair of goggles all day?
link |
But then again, if the experience
link |
is sufficiently compelling, maybe so.
link |
Or if the device that you have to wear
link |
is just basically a pair of glasses,
link |
and technology makes sufficient progress for that.
link |
AR is a much easier concept to grasp
link |
that you're going to have augmented reality glasses
link |
that basically contain some sort of virtual assistant
link |
that can help you in your daily lives.
link |
But at the same time with the AR,
link |
you have to contend with reality.
link |
With VR, you can completely detach yourself from reality.
link |
So it gives you freedom.
link |
It might be easier to design worlds in VR.
link |
Yeah, but you can imagine the metaverse
link |
being a mix, right?
link |
Or like, you can have objects that exist in the metaverse
link |
that pop up on top of the real world,
link |
or only exist in virtual reality.
link |
Okay, let me ask the hard question.
link |
Oh, because all of this was easy so far.
link |
The Facebook, now Meta, the social network
link |
has been painted by the media as a net negative for society,
link |
even destructive and evil at times.
link |
You've pushed back against this, defending Facebook.
link |
Can you explain your defense?
link |
Yeah, so the description,
link |
the company that is being described in some media
link |
is not the company we know when we work inside.
link |
And it could be claimed that a lot of employees
link |
are uninformed about what really goes on in the company,
link |
but I'm a vice president.
link |
I mean, I have a pretty good vision of what goes on.
link |
I don't know everything, obviously.
link |
I'm not involved in everything,
link |
but certainly not in decisions about content moderation
link |
or anything like this,
link |
but I have some decent vision of what goes on.
link |
And this evil that is being described, I just don't see it.
link |
And then I think there is an easy story to buy,
link |
which is that all the bad things in the world
link |
and the reason your friends believe crazy stuff,
link |
there's an easy scapegoat in social media in general,
link |
Facebook in particular.
link |
But you have to look at the data.
link |
Is it the case that Facebook, for example,
link |
polarizes people politically?
link |
Are there academic studies that show this?
link |
Is it the case that teenagers think less of themselves
link |
if they use Instagram more?
link |
Is it the case that people get more riled up
link |
against opposite sides in a debate or political opinion
link |
if they are more on Facebook or if they are less?
link |
And study after study shows that none of this is true.
link |
These are independent studies by academics.
link |
They're not funded by Facebook or Meta.
link |
Studies by Stanford, by some of my colleagues at NYU, actually,
link |
with whom I have no connection.
link |
There's a study recently, they paid people,
link |
I think it was in former Yugoslavia,
link |
I'm not exactly sure in what part,
link |
but they paid people to not use Facebook for a while
link |
in the period before the anniversary
link |
of the Srebrenica massacres.
link |
So people get riled up, like should we have a celebration?
link |
I mean, a memorial kind of celebration for it or not.
link |
So they paid a bunch of people
link |
to not use Facebook for a few weeks.
link |
And it turns out that those people ended up
link |
being more polarized than they were at the beginning
link |
and the people who were more on Facebook were less polarized.
link |
There's a study from economists at Stanford
link |
that try to identify the causes
link |
of increasing polarization in the US.
link |
And it's been going on continuously for 40 years,
link |
since before Mark Zuckerberg was born.
link |
And so if there is a cause,
link |
it's not Facebook or social media.
link |
So you could say maybe social media just accelerated it,
link |
but no, I mean, it's basically a continuous evolution
link |
by some measure of polarization in the US.
link |
And then you compare this with other countries
link |
like the western half of Germany,
link |
because you can go back 40 years there, unlike the East,
link |
or Denmark or other countries.
link |
And they use Facebook just as much
link |
and they're not getting more polarized,
link |
they're getting less polarized.
link |
So if you want to look for a causal relationship there,
link |
you can find a scapegoat, but you can't find a cause.
link |
Now, if you want to fix the problem,
link |
you have to find the right cause.
link |
And what riles me up is that people now are accusing Facebook
link |
of bad deeds that are done by others
link |
and we're not doing anything about those others.
link |
And by the way, those others include the owner
link |
of the Wall Street Journal
link |
in which all of those papers were published.
link |
So I should mention that I'm talking to Schrep,
link |
Mike Schroepfer, on this podcast, and also Mark Zuckerberg,
link |
and probably these are conversations you can have with them
link |
because it's very interesting to me,
link |
even if Facebook has some measurable negative effect,
link |
you can't just consider that in isolation.
link |
You have to consider about all the positive ways
link |
that it connects us.
link |
So like every technology.
link |
It connects people, it's a question.
link |
You can't just say like there's an increase in division.
link |
Yes, probably Google search engine
link |
has created increase in division.
link |
But you have to consider about how much information
link |
are brought to the world.
link |
Like I'm sure Wikipedia created more division.
link |
You can't just look at the division;
link |
we have to look at the full context of the world
link |
and whether they made a better world.
link |
The printing press has created more division, right?
link |
I mean, so when the printing press was invented,
link |
the first books that were printed were things like the Bible
link |
and that allowed people to read the Bible by themselves,
link |
not get the message uniquely from priests in Europe.
link |
And that created the Protestant movement
link |
and 200 years of religious persecution and wars.
link |
So that's a bad side effect of the printing press.
link |
Social networks aren't nearly as bad
link |
as the printing press,
link |
but nobody would say the printing press was a bad idea.
link |
Yeah, a lot of it is perception
link |
and there's a lot of different incentives operating here.
link |
Maybe a quick comment,
link |
since you're one of the top leaders at Facebook
link |
and at Meta, sorry, that's in the tech space,
link |
I'm sure Facebook involves a lot of incredible
link |
technological challenges that need to be solved.
link |
A lot of it probably is in the computer infrastructure,
link |
the hardware, I mean, it's just a huge amount.
link |
Maybe can you give me context about how much of Schrep's life
link |
is AI and how much of it is low level compute?
link |
How much of it is flying all around doing business stuff?
link |
And the same with Mark Zuckerberg.
link |
They really focus on AI.
link |
I mean, certainly in the run up of the creation of FAIR
link |
and for at least a year after that, if not more,
link |
Mark was very, very much focused on AI
link |
and was spending quite a lot of effort on it.
link |
And that's his style.
link |
When he gets interested in something,
link |
he reads everything about it.
link |
He read some of my papers, for example, before I joined.
link |
And so he learned a lot about it.
link |
He said he liked notes.
link |
And Schrep was really into it also.
link |
I mean, Schrep is really kind of,
link |
has something I've tried to preserve also
link |
despite my not so young age,
link |
which is a sense of wonder about science and technology.
link |
And he certainly has that.
link |
He's also a wonderful person.
link |
I mean, in terms of like as a manager,
link |
like dealing with people and everything.
link |
Mark also, actually.
link |
I mean, they're very human people.
link |
In the case of Mark, it's shockingly human
link |
given his trajectory.
link |
I mean, the personality of him that is painted in the press,
link |
it's just completely wrong.
link |
But you have to know how to play the press.
link |
So that's, I put some of that responsibility on him too.
link |
You have to, it's like, you know,
link |
like the director, the conductor of an orchestra,
link |
you have to play the press and the public
link |
in a certain kind of way
link |
where you convey your true self to them.
link |
If there's a depth and kindness to it.
link |
And he's probably not the best at it.
link |
You have to learn.
link |
And it's sad to see, and I'll talk to him about it,
link |
but Schrep is slowly stepping down.
link |
It's always sad to see folks sort of be there
link |
for a long time and slowly.
link |
I guess time is sad.
link |
I think he's done the thing he set out to do.
link |
And, you know, he's got, you know,
link |
family priorities and stuff like that.
link |
And I understand, you know, after 13 years or something.
link |
It's been a good run.
link |
Which in Silicon Valley is basically a lifetime.
link |
You know, because, you know, it's dog years.
link |
So, NeurIPS, the conference just wrapped up.
link |
Let me just go back to something else.
link |
You posted that a paper you coauthored
link |
was rejected from NeurIPS.
link |
As you said, proudly, in quotes, rejected.
link |
So, can you describe this paper?
link |
And like, what was the idea in it?
link |
And also, maybe this is a good opportunity to ask
link |
what are the pros and cons, what works and what doesn't
link |
about the review process?
link |
Yeah, let me talk about the paper first.
link |
I'll talk about the review process afterwards.
link |
The paper is called VICReg.
link |
So, this is, I mentioned that before.
link |
Variance-Invariance-Covariance Regularization.
link |
And it's a technique, a noncontrastive learning technique
link |
for what I call joint embedding architecture.
link |
So, Siamese nets are an example
link |
of joint embedding architecture.
link |
So, joint embedding architecture is,
link |
let me back up a little bit, right?
link |
So, if you want to do self supervised learning,
link |
you can do it by prediction.
link |
So, let's say you want to train the system
link |
to predict video, right?
link |
You show it a video clip and you train the system
link |
to predict the next, the continuation of that video clip.
link |
Now, because you need to handle uncertainty,
link |
because there are many continuations that are plausible,
link |
you need to have, you need to handle this in some way.
link |
You need to have a way for the system
link |
to be able to produce multiple predictions.
link |
And the way, the only way I know to do this
link |
is through what's called a latent variable.
link |
So, you have some sort of hidden vector,
link |
a variable that you can vary over a set
link |
or draw from a distribution.
link |
And as you vary this vector over a set,
link |
the output, the prediction varies
link |
over a set of plausible predictions, okay?
link |
So, that's called,
link |
I call this a generative latent variable model.
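As a rough sketch of what such a generative latent-variable predictor looks like, here is a toy PyTorch module; the architecture and dimensions are placeholders of my own, not the actual model being discussed.

```python
import torch
import torch.nn as nn

# Toy sketch of a generative latent-variable predictor: the prediction depends
# on the observed past AND a latent vector z; varying or sampling z sweeps the
# output over a set of plausible continuations. All sizes are made up.
class LatentVariablePredictor(nn.Module):
    def __init__(self, obs_dim=128, latent_dim=16, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),  # features of the predicted next segment
        )

    def forward(self, past_features, z):
        return self.net(torch.cat([past_features, z], dim=-1))

model = LatentVariablePredictor()
past = torch.randn(4, 128)   # encoded initial video segment (batch of 4)
z1 = torch.randn(4, 16)      # one latent draw -> one plausible continuation
z2 = torch.randn(4, 16)      # another draw -> a different plausible continuation
pred1, pred2 = model(past, z1), model(past, z2)
```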
link |
Okay, now there is an alternative to this,
link |
to handle uncertainty.
link |
And instead of directly predicting the next frames
link |
of the clip, you also run those through another neural net.
link |
So, you now have two neural nets,
link |
one that looks at the initial segment of the video clip,
link |
and another one that looks at the continuation
link |
during training, right?
link |
And what you're trying to do is learn a representation
link |
of those two video clips that is maximally informative
link |
about the video clips themselves,
link |
but is such that you can predict the representation
link |
of the second video clip
link |
from the representation of the first one easily, okay?
link |
And you can sort of formalize this
link |
in terms of maximizing mutual information
link |
and some stuff like that, but it doesn't matter.
link |
What you want is informative representations
link |
of the two video clips that are mutually predictable.
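A rough way to write that down, in notation of my own choosing rather than anything from a specific paper: with x the first clip, y the continuation, and s_x = f(x), s_y = g(y) their representations,

```latex
% Rough sketch: keep each representation informative about its input
% (mutual information I), while making s_y predictable from s_x.
\max_{f,\, g,\, \mathrm{Pred}} \; I(s_x; x) + I(s_y; y)
\quad \text{subject to} \quad s_y \approx \mathrm{Pred}(s_x),
\qquad s_x = f(x), \quad s_y = g(y).
```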
link |
What that means is that there's a lot of details
link |
in the second video clip that are irrelevant.
link |
Let's say a video clip consists of a camera panning
link |
the scene, there's gonna be a piece of that room
link |
that is gonna be revealed, and I can somewhat predict
link |
what that room is gonna look like,
link |
but I may not be able to predict the details
link |
of the texture of the ground
link |
and where the tiles are ending and stuff like that, right?
link |
So, those are irrelevant details
link |
that perhaps my representation will eliminate.
link |
And so, what I need is to train this second neural net
link |
in such a way that whenever the continuation video clip
link |
varies over all the plausible continuations,
link |
the representation doesn't change.
link |
So, it's the, yeah, yeah, got it.
link |
Over the space of the representations,
link |
doing the same kind of thing
link |
as you do with similarity learning.
link |
So, these are two ways to handle multimodality
link |
in a prediction, right?
link |
In the first way, you parameterize the prediction
link |
with a latent variable,
link |
but you predict pixels essentially, right?
link |
In the second one, you don't predict pixels,
link |
you predict an abstract representation of pixels,
link |
and you guarantee that this abstract representation
link |
has as much information as possible about the input,
link |
but sort of, you know,
link |
drops all the stuff that you really can't predict.
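Here is a minimal sketch of that second, joint-embedding route in PyTorch; the encoders, predictor, and feature sizes are placeholders chosen for illustration.

```python
import torch
import torch.nn as nn

# Sketch of a joint embedding architecture: two encoders map the past clip and
# its continuation to representations, and a predictor tries to predict the
# second representation from the first. Architectures and sizes are placeholders.
encode_past = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
encode_future = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
predictor = nn.Linear(64, 64)

past_clip = torch.randn(32, 128)      # features of the initial segment (batch of 32)
continuation = torch.randn(32, 128)   # features of the continuation

s_x = encode_past(past_clip)
s_y = encode_future(continuation)
prediction_loss = ((predictor(s_x) - s_y) ** 2).mean()
# Trained with this loss alone, both encoders can collapse to a constant output;
# that is what the regularized, noncontrastive criteria discussed next prevent.
```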
link |
I used to be a big fan of the first approach.
link |
And in fact, in this paper with Ishan Misra,
link |
this blog post, the Dark Matter of Intelligence,
link |
I was kind of advocating for this.
link |
And in the last year and a half,
link |
I've completely changed my mind.
link |
I'm now a big fan of the second one.
link |
And it's because of a small collection of algorithms
link |
that have been proposed over the last year and a half or so,
link |
two years, to do this, including VICReg,
link |
its predecessor called Barlow Twins,
link |
which I mentioned, a method from our friends at DeepMind
link |
called BYOL, and there's a bunch of others now
link |
that kind of work similarly.
link |
So, they're all based on this idea of joint embedding.
link |
Some of them have an explicit criterion
link |
that is an approximation of mutual information.
link |
Some others, like BYOL, work, but we don't really know why.
link |
And there's been like lots of theoretical papers
link |
about why BYOL works.
link |
No, it's not that, because we take it out
link |
and it still works, and blah, blah, blah.
link |
I mean, so there's like a big debate,
link |
but the important point is that we now have a collection
link |
of noncontrastive joint embedding methods,
link |
which I think is the best thing since sliced bread.
link |
So, I'm super excited about this
link |
because I think it's our best shot
link |
for techniques that would allow us
link |
to kind of build predictive world models.
link |
And at the same time,
link |
learn hierarchical representations of the world,
link |
where what matters about the world is preserved
link |
and what is irrelevant is eliminated.
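To make the variance-invariance-covariance idea concrete, here is a compact sketch of a VICReg-style loss; the coefficients and implementation details are illustrative choices of mine, not necessarily the paper's exact recipe.

```python
import torch

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Sketch of a VICReg-style objective on two batches of embeddings (n, d).

    invariance: embeddings of the two views should match;
    variance: each dimension should keep spread (above 1) across the batch;
    covariance: off-diagonal correlations between dimensions are penalized.
    Coefficients are illustrative, not necessarily the published values.
    """
    n, d = z_a.shape

    invariance = ((z_a - z_b) ** 2).mean()

    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    variance = torch.relu(1.0 - std_a).mean() + torch.relu(1.0 - std_b).mean()

    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)

    def off_diag_sq_sum(m):
        return (m ** 2).sum() - (torch.diagonal(m) ** 2).sum()

    covariance = off_diag_sq_sum(cov_a) / d + off_diag_sq_sum(cov_b) / d

    return sim_w * invariance + var_w * variance + cov_w * covariance

loss = vicreg_style_loss(torch.randn(256, 64), torch.randn(256, 64))
```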
link |
And by the way, the representations,
link |
the before and after, is in the space
link |
in a sequence of images, or is it for single images?
link |
It would be either for a single image or for a sequence.
link |
It doesn't have to be images.
link |
This could be applied to text.
link |
This could be applied to just about any signal.
link |
I'm looking for methods that are generally applicable
link |
that are not specific to one particular modality.
link |
It could be audio or whatever.
link |
So, what's the story behind this paper?
link |
This paper is describing one such method?
link |
It's this VICReg method.
link |
So, this is coauthored.
link |
The first author is a student called Adrien Bardes,
link |
who is a resident PhD student at FAIR Paris,
link |
who is coadvised by me and Jean Ponce,
link |
who is a professor at École Normale Supérieure,
link |
also a research director at INRIA.
link |
So, this is a wonderful program in France
link |
where PhD students can basically do their PhD in industry,
link |
and that's kind of what's happening here.
link |
And this paper is a follow-up on the Barlow Twins paper
link |
by my former postdoc, Stéphane Deny,
link |
with Li Jing and Jure Zbontar
link |
and a bunch of other people from FAIR.
link |
And one of the main criticisms from reviewers
link |
is that VICReg is not different enough from Barlow Twins.
link |
But, you know, my impression is that it's, you know,
link |
Barlow Twins with a few bugs fixed, essentially,
link |
and in the end, this is what people will use.
link |
But, you know, I'm used to stuff
link |
that I submit being rejected for a while.
link |
So, it might be rejected but end up exceptionally well cited
link |
because people use it.
link |
Well, it's already cited like a bunch of times.
link |
So, I mean, the question is then to the deeper question
link |
about peer review and conferences.
link |
I mean, computer science is a field that's kind of unique
link |
in that conferences are highly prized.
link |
And it's interesting because the peer review process there
link |
is similar, I suppose, to journals,
link |
but it's accelerated significantly.
link |
Well, not significantly, but it goes fast.
link |
And it's a nice way to get stuff out quickly,
link |
to peer review it quickly,
link |
go to present it quickly to the community.
link |
So, not quickly, but quicker.
link |
But nevertheless, it has many of the same flaws
link |
because only a limited number of people look at it.
link |
There's bias, and the following problem:
link |
if you want to push new ideas,
link |
you're going to get pushback.
link |
There's self interested people that kind of can infer
link |
who submitted it and kind of, you know,
link |
be cranky about it, all that kind of stuff.
link |
Yeah, I mean, there's a lot of social phenomena there.
link |
There's one social phenomenon, which is that
link |
because the field has been growing exponentially,
link |
the vast majority of people in the field
link |
are extremely junior.
link |
So, as a consequence,
link |
and that's just a consequence of the field growing, right?
link |
So, as the number of, as the size of the field
link |
kind of starts saturating,
link |
you will have less of that problem
link |
of reviewers being very inexperienced.
link |
A consequence of this is that, you know, young reviewers,
link |
I mean, there's a phenomenon which is that
link |
reviewers try to make their life easy
link |
and to make their life easy when reviewing a paper,
link |
you just have to find a flaw in the paper, right?
link |
So, basically they see the task as finding flaws in papers
link |
and most papers have flaws, even the good ones.
link |
So, it's easy to, you know, to do that.
link |
Your job is easier as a reviewer if you just focus on this.
link |
But what's important is like,
link |
is there a new idea in that paper
link |
that is likely to influence the field?
link |
It doesn't matter if the experiments are not that great,
link |
if the protocol is, you know, so-so.
link |
As long as there is a worthy idea in it
link |
that will influence the way people think about the problem,
link |
even if they make it better, you know, eventually,
link |
I think that's really what makes a paper useful.
link |
And so, this combination of social phenomena
link |
creates a disease that has plagued, you know,
link |
other fields in the past, like speech recognition,
link |
where basically, you know, people chase numbers
link |
on benchmarks and it's much easier to get a paper accepted
link |
if it brings an incremental improvement
link |
on a sort of mainstream well accepted method or problem.
link |
And those are, to me, boring papers.
link |
I mean, they're not useless, right?
link |
Because industry, you know, thrives
link |
on those kinds of progress,
link |
but they're not the ones that I'm interested in,
link |
in terms of like new concepts and new ideas.
link |
So, papers that are really trying to strike
link |
kind of new advances generally don't make it.
link |
Now, thankfully we have arXiv.
link |
And then there's open review type of situations
link |
where you, and then, I mean, Twitter's a kind of open review.
link |
I'm a huge believer that review should be done
link |
by thousands of people, not two people.
link |
And so arXiv, like do you see a future
link |
where a lot of really strong papers,
link |
it's already the present, but a growing future
link |
where it'll just be arXiv
link |
and you're presenting an ongoing continuous conference
link |
called Twitter slash the internet slash Arxiv Sanity.
link |
Andrej just released a new version.
link |
So just not, you know, not being so elitist
link |
about this particular gating.
link |
It's not a question of being elitist or not.
link |
It's a question of basically providing recommendations
link |
and sort of approvals for people who don't see themselves
link |
as having the ability to do so by themselves, right?
link |
And so it saves time, right?
link |
If you rely on other people's opinion
link |
and you trust those people or those groups
link |
to evaluate a paper for you, that saves you time
link |
because, you know, you don't have to, like, scrutinize
link |
as much every paper that is brought to your attention.
link |
I mean, it's the whole idea of sort of, you know,
link |
collective recommender system, right?
link |
So I actually thought about this a lot, you know,
link |
about 10, 15 years ago,
link |
because there were discussions at NIPS
link |
and, you know, we were about to create ICLR
link |
with Yoshua Bengio.
link |
And so I wrote a document kind of describing
link |
a reviewing system, which basically was, you know,
link |
you post your paper on some repository,
link |
let's say arXiv, or now it could be OpenReview.
link |
And then you can form a reviewing entity,
link |
which is equivalent to a reviewing board, you know,
link |
of a journal or program committee of a conference.
link |
You have to list the members.
link |
And then that group reviewing entity can choose
link |
to review a particular paper spontaneously or not.
link |
There is no exclusive relationship anymore
link |
between a paper and a venue or reviewing entity.
link |
Any reviewing entity can review any paper
link |
or may choose not to.
link |
And then, you know, they give an evaluation.
link |
It's not publish or not publish,
link |
it's just an evaluation and a comment,
link |
which would be public, signed by the reviewing entity.
link |
And if it's signed by a reviewing entity,
link |
you know, it's one of the members of the reviewing entity.
link |
So if the reviewing entity is, you know,
link |
Lex Friedman's, you know, preferred papers, right?
link |
You know, it's Lex Friedman writing the review.
link |
Yes, so for me, that's a beautiful system, I think.
link |
But in addition to that,
link |
it feels like there should be a reputation system
link |
for the reviewers.
link |
For the reviewing entities,
link |
not the reviewers individually.
link |
The reviewing entities, sure.
link |
But even within that, the reviewers too,
link |
because there's another thing here.
link |
It's not just the reputation,
link |
it's an incentive for an individual person to do a great job.
link |
Right now, in the academic setting,
link |
the incentive is kind of internal,
link |
just wanting to do a good job.
link |
But honestly, that's not a strong enough incentive
link |
to do a really good job in reading a paper,
link |
in finding the beautiful amidst the mistakes and the flaws
link |
and all that kind of stuff.
link |
Like if you're the person that first discovered
link |
a powerful paper, and you get to be proud of that discovery,
link |
then that gives a huge incentive to you.
link |
That's a big part of my proposal, actually,
link |
where I describe that as, you know,
link |
if your evaluation of papers is predictive
link |
of future success, okay,
link |
then your reputation should go up as a reviewing entity.
link |
I mean, I even had a master's student
link |
in library science
link |
and computer science actually kind of work out exactly
link |
how that should work with formulas and everything.
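The actual formulas from that proposal are not spelled out in this conversation, so purely as an illustration of the kind of update one could imagine, here is a hypothetical sketch; the names and numbers are mine.

```python
# Hypothetical illustration only: the real formulas from the proposal are not
# given here. The idea: a reviewing entity's reputation rises when its scores
# turn out to be predictive of a paper's later impact (e.g., citations).
def update_reputation(reputation, predicted_score, observed_impact, rate=0.1):
    """Nudge reputation toward the agreement between prediction and outcome.

    predicted_score and observed_impact are assumed normalized to [0, 1].
    """
    agreement = 1.0 - abs(predicted_score - observed_impact)
    return (1.0 - rate) * reputation + rate * agreement

rep = 0.5
rep = update_reputation(rep, predicted_score=0.9, observed_impact=0.8)
print(round(rep, 3))  # 0.54: slightly up, because the review was predictive
```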
link |
So in terms of implementation,
link |
do you think that's something that's doable?
link |
I mean, I've been sort of, you know,
link |
talking about this to sort of various people
link |
like, you know, Andrew McCallum, who started Open Review.
link |
And the reason why we picked Open Review
link |
for ICLR initially,
link |
even though it was very early for them,
link |
is because my hope was that ICLR,
link |
it was eventually going