François Chollet: Measures of Intelligence | Lex Fridman Podcast #120
The following is a conversation with François Chollet,
his second time on the podcast.
He's both a world class engineer and a philosopher
in the realm of deep learning and artificial intelligence.
This time, we talk a lot about his paper titled
On the Measure of Intelligence, which discusses
how we might define and measure general intelligence
in our computing machinery.
Quick summary of the sponsors:
Babbel, Masterclass, and Cash App.
Click the sponsor links in the description
to get a discount and to support this podcast.
As a side note, let me say that the serious,
rigorous scientific study
of artificial general intelligence is a rare thing.
The mainstream machine learning community works
on very narrow AI with very narrow benchmarks.
This is very good for incremental
and sometimes big incremental progress.
On the other hand, outside the mainstream,
the renegade, you could say, AGI community works
on approaches that verge on the philosophical
and even the literary, without big public benchmarks.
Walking the line between the two worlds is a rare breed,
but it doesn't have to be.
I ran the AGI series at MIT as an attempt
to inspire more people to walk this line.
DeepMind and OpenAI for a time,
and still on occasion, walk this line.
François Chollet does as well.
It's a beautiful dream to work towards
and to make real one day.
If you enjoy this thing, subscribe on YouTube,
review it with five stars on Apple Podcasts,
follow on Spotify, support on Patreon,
or connect with me on Twitter at Lex Fridman.
As usual, I'll do a few minutes of ads now
and no ads in the middle.
I try to make these interesting,
but I give you timestamps so you can skip.
But still, please do check out the sponsors
by clicking the links in the description.
It's the best way to support this podcast.
This show is sponsored by Babbel,
an app and website that gets you speaking
in a new language within weeks.
Go to babbel.com and use code Lex to get three months free.
They offer 14 languages, including Spanish, French,
Italian, German, and yes, Russian.
Daily lessons are 10 to 15 minutes,
super easy, effective,
designed by over 100 language experts.
Let me read a few lines from the Russian poem
Noch, ulitsa, fonar, apteka, by Alexander Blok,
that you'll start to understand if you sign up to Babbel.
Noch, ulitsa, fonar, apteka,
Bessmyslennyi i tusklyi svet.
Zhivi eshche khot chetvert veka,
Vse budet tak. Iskhoda net.
Now, I say that you'll start to understand this poem
because Russian starts with language
and ends with vodka.
Now, the latter part is definitely not endorsed
or provided by Babbel.
It will probably lose me this sponsorship,
although it hasn't yet.
But once you graduate with Babbel,
you can enroll in my advanced course
of late night Russian conversation over vodka.
No app for that yet.
So get started by visiting babbel.com
and use code Lex to get three months free.
This show is also sponsored by Masterclass.
Sign up at masterclass.com slash Lex
to get a discount and to support this podcast.
When I first heard about Masterclass,
I thought it was too good to be true.
I still think it's too good to be true.
For $180 a year, you get an all access pass
to watch courses from, to list some of my favorites:
Chris Hadfield on space exploration,
hope to have him on this podcast one day,
Neil deGrasse Tyson on scientific thinking and communication,
Will Wright, creator of SimCity and The Sims,
on game design, Carlos Santana on guitar,
Garry Kasparov on chess, Daniel Negreanu on poker.
Chris Hadfield explaining how rockets work
and the experience of being launched into space
alone is worth the money.
By the way, you can watch it on basically any device.
Once again, sign up at masterclass.com slash Lex
to get a discount and to support this podcast.
This show, finally, is presented by Cash App,
the number one finance app in the App Store.
When you get it, use code LexPodcast.
Cash App lets you send money to friends,
buy Bitcoin, and invest in the stock market
with as little as $1.
Since Cash App allows you to send
and receive money digitally,
let me mention a surprising fact related to physical money.
Of all the currency in the world,
roughly 8% of it is actually physical money.
The other 92% of the money only exists digitally,
and that's only going to increase.
So again, if you get Cash App from the App Store
or Google Play and use code LexPodcast,
Cash App will also donate $10 to FIRST,
an organization that is helping to advance robotics
and STEM education for young people around the world.
And now, here's my conversation with François Chollet.
What philosophers, thinkers, or ideas
had a big impact on you growing up and today?
So one author that had a big impact on me
when I read his books as a teenager was Jean Piaget,
who is a Swiss psychologist
considered to be the father of developmental psychology.
And he has a large body of work about
basically how intelligence develops in children.
And so it's very old work,
like most of it is from the 1930s, 1940s.
So it's not quite up to date.
It's actually superseded by many newer developments
in developmental psychology.
But to me, it was very interesting, very striking,
and actually shaped the early ways
in which I started thinking about the mind
and the development of intelligence as a teenager.
His actual ideas, or the way he thought about it,
or just the fact that you could think
about the developing mind at all?
Jean Piaget is the author that really introduced me
to the notion that intelligence and the mind
is something that you construct throughout your life,
and that children construct it in stages.
And I thought that was a very interesting idea,
which is, of course, very relevant to AI,
to building artificial minds.
Another book that I read around the same time,
that had a big impact on me,
and there was actually a little bit of overlap
with Jean Piaget as well,
is Jeff Hawkins' On Intelligence, which is a classic.
And he has this vision of the mind
as a multi scale hierarchy of temporal prediction modules.
And these ideas really resonated with me,
like the notion of a modular hierarchy
of potentially compression functions
or prediction functions.
I thought it was really, really interesting,
and it shaped the way I started thinking
about how to build minds.
The hierarchical nature, which aspect?
Also, he's a neuroscientist, so he was thinking,
he was basically talking about how our mind works.
Yeah, the notion that cognition is prediction
was an idea that was kind of new to me at the time,
and that I really loved at the time.
And yeah, the notion that there are multiple scales
of processing in the brain.
This was before deep learning.
These ideas of hierarchies in AI
have been around for a long time,
even before On Intelligence.
They've been around since the 1980s.
And yeah, that was before deep learning.
But of course, I think these ideas really found
their practical implementation in deep learning.
What about the memory side of things?
I think he was talking about knowledge representation.
Do you think about memory a lot?
One way you can think of neural networks
is as a kind of memory, you're memorizing things,
but it doesn't seem to be the kind of memory
that's in our brains,
or it doesn't have the same rich complexity,
long term nature that's in our brains.
Yes, the brain is more of a sparse access memory,
so that you can actually retrieve very precisely
bits of your experience.
The retrieval aspect, you can introspect,
you can ask yourself questions.
I guess you can program your own memory,
and language is actually the tool you use to do that.
I think language is a kind of operating system for the mind.
Well, one of the uses of language is as a query
that you run over your own memory.
You use words as keys to retrieve specific experiences
or specific concepts, specific thoughts.
Like language is a way you store thoughts,
not just in writing, in the physical world,
but also in your own mind.
And it's also how you retrieve them.
Like, imagine if you didn't have language,
then you would not really have a self,
internally triggered way of retrieving past thoughts.
You would have to rely on external experiences.
For instance, you see a specific sight,
you smell a specific smell, and that brings up memories,
but you would not really have a way
to deliberately access these memories without language.
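To make the words-as-keys idea concrete, here is a minimal sketch of such an associative memory in Python. The class, the sample experiences, and the keywords are all hypothetical illustrations, not anything specified in the conversation.

```python
from collections import defaultdict

# A toy associative memory: words act as keys under which experiences
# are stored and later retrieved. All names and data here are made up.
class AssociativeMemory:
    def __init__(self):
        self.store = defaultdict(list)  # word -> experiences indexed under it

    def record(self, experience, keywords):
        # Index one experience under every word that describes it.
        for word in keywords:
            self.store[word].append(experience)

    def recall(self, *query_words):
        # Language as a query: words are keys that pull back experiences.
        results = []
        for word in query_words:
            results.extend(self.store[word])
        return results

memory = AssociativeMemory()
memory.record("a day at the beach", keywords=["beach", "summer", "family"])
memory.record("collecting shells at low tide", keywords=["beach", "shells"])
print(memory.recall("beach"))  # both beach experiences come back
```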
Well, the interesting thing you mentioned
is you can also program the memory.
You can change it, probably with language.
Yeah, using language, yes.
Well, let me ask you a Chomsky question,
which is like, first of all,
do you think language is fundamental,
like there's turtles, what's at the bottom of the turtles?
It can't be turtles all the way down.
Is language at the bottom of cognition, of everything?
Is language the fundamental aspect
of what it means to be a thinking thing?
No, I don't think so.
I think language is.
You disagree with Noam Chomsky?
Yes, I think language is a layer on top of cognition.
So it is fundamental to cognition in the sense that,
to use a computing metaphor,
I see language as the operating system of the brain,
of the human mind.
And the operating system is a layer on top of the computer.
The computer exists before the operating system,
but the operating system is how you make it truly useful.
And the operating system is most likely Windows, not Linux,
because language is messy.
Yeah, it's messy, and it's pretty difficult
to inspect it, introspect it.
How do you think about language?
Like we actually use sort of human interpretable language,
but is there something deeper,
closer to logical types of statements?
Like, yeah, what is the nature of language, do you think?
Is there something deeper than the syntactic rules?
Is there something that doesn't require utterances
or writing or so on?
Are you asking about the possibility
that there could exist languages for thinking
that are not made of words?
I think so. The mind is layers, right?
And language is almost like the outermost,
the uppermost layer.
But before we think in words,
I think we think in terms of emotion and space,
and we think in terms of physical actions.
And I think babies in particular
probably express thoughts in terms of the actions
that they've seen or that they can perform,
and in terms of motions of objects in their environment,
before they start thinking in terms of words.
It's amazing to think about that
as the building blocks of language.
So like the kind of actions and ways the babies see the world
are more fundamental
than the beautiful Shakespearean language
you construct on top of it.
And we probably don't have any idea
what that looks like, right?
Like what, exactly? Because it's important
for trying to engineer it into AI systems.
I think visual analogies and motion
are a fundamental building block of the mind.
And you actually see it reflected in language.
Like language is full of spatial metaphors.
And when you think about things,
I consider myself very much a visual thinker,
you often express these thoughts
by doing things like visualizing concepts
in 2D space, or you solve problems
by imagining yourself navigating a concept space.
So I don't know if you have this sort of experience.
You said visualizing concept space.
So I certainly think about,
I certainly visualize mathematical concepts,
but you mean in concept space,
visually you're embedding ideas
into a three dimensional space
you can explore with your mind, essentially?
It would be more like 2D, but yeah.
You're a flatlander.
I always have to, before I jump from concept to concept,
I have to put it back down on paper.
It has to be on paper.
I can only travel on 2D paper, not inside my mind.
You're able to move inside your mind.
But even if you're writing a paper, for instance,
don't you have a spatial representation of your paper?
Like you visualize where ideas lie topologically
in relationship to other ideas,
kind of like a subway map of the ideas in your paper.
Yeah, that's true.
I mean, there is, in papers, I don't know about you,
but it feels like there's a destination.
There's a key idea that you want to arrive at,
and a lot of it is in the fog,
and you're trying to kind of,
it's almost like, what's that called,
when you do a path planning search from both directions,
from the start and from the end,
and then you find, you do like shortest path,
but, you know, in game playing,
you do this with A star from both sides,
and you see where they join.
Yeah, so you kind of do, at least for me,
I think, first of all,
just exploring from the start, from first principles:
what do I know, what can I start proving from that, right?
And then from the destination,
you start backtracking:
if I want to show some kind of set of ideas,
what would it take to show them, and you kind of backtrack.
I don't think I'm doing all that in my mind, though.
I'm putting it down on paper.
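For reference, the technique being reached for here is bidirectional search: explore from the start and from the goal, and stop where the two frontiers meet. Below is a minimal sketch using plain breadth-first search from both ends rather than full A*; the idea graph and node names are made up for illustration.

```python
from collections import deque

# Meet-in-the-middle search over an undirected "idea graph": explore from
# the starting premises and from the target result at once, and report the
# node where the two frontiers first touch.
def meet_in_the_middle(graph, start, goal):
    side = {start: "forward", goal: "backward"}  # which search reached each node
    frontier = deque([start, goal])
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in side:
                side[neighbor] = side[node]
                frontier.append(neighbor)
            elif side[neighbor] != side[node]:
                return neighbor  # the two searches have joined here
    return None

ideas = {
    "first principles": ["lemma A"],
    "lemma A": ["first principles", "lemma B"],
    "lemma B": ["lemma A", "key result"],
    "key result": ["lemma B"],
}
print(meet_in_the_middle(ideas, "first principles", "key result"))  # lemma B
```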
Do you use mind maps to organize your ideas?
Yeah, I like mind maps.
Let's get into this,
because I've been so jealous of people.
I haven't really tried it.
I've been jealous of people that seem to,
they get this fire of passion in their eyes
because everything starts making sense.
It's like Tom Cruise in the movie,
moving stuff around.
Some of the most brilliant people I know use mind maps.
I haven't tried, really.
Can you explain what the hell a mind map is?
I guess a mind map is a way to take
the mess inside your mind
and put it on paper, so that you gain more control over it.
It's a way to organize things on paper,
and as kind of a consequence
of organizing things on paper,
they start being more organized inside your own mind.
So what does that look like?
You put, like, do you have an example?
Like what's the first thing you write on paper?
What's the second thing you write?
I mean, typically you draw a mind map
to organize the way you think about a topic.
So you would start by writing down
the key concept about that topic.
Like you would write intelligence or something,
and then you would start adding associative connections.
Like what do you think about
when you think about intelligence?
What do you think are the key elements of intelligence?
So maybe you would have language, for instance,
and you'd have motion.
And so you would start drawing nodes with these things.
And then you would see, what do you think about
when you think about motion, and so on.
And you would go like that, like a tree.
Is it mostly a tree, or is it a graph too?
Oh, it's more of a graph than a tree.
And it's not limited to just writing down words.
You can also draw things.
And it's not supposed to be purely hierarchical, right?
The point is that once you start writing it down,
you can start reorganizing it so that it makes more sense,
so that it's connected in a more effective way.
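A minimal sketch of that structure in Python, using the example concepts from above: the mind map is a graph whose edges are free associations, so cross-links that a strict tree would forbid are perfectly legal.

```python
# A mind map as a graph rather than a strict tree: nodes are concepts,
# edges are free associations, and no node has to be anyone's "parent".
mind_map = {
    "intelligence": {"language", "motion"},
    "language": {"intelligence"},
    "motion": {"intelligence"},
}

def associate(a, b):
    # Associations are symmetric; adding one never imposes a hierarchy.
    mind_map.setdefault(a, set()).add(b)
    mind_map.setdefault(b, set()).add(a)

associate("language", "motion")  # a cross-link a pure tree could not have
associate("motion", "space")
print(sorted(mind_map["motion"]))  # ['intelligence', 'language', 'space']
```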
See, but I'm so OCD that, you just mentioned
intelligence and language and motion,
I would start becoming paranoid
that the categorization isn't perfect.
Like I would become paralyzed with the mind map,
that this may not be right.
So even though you're just doing
associative kind of connections,
there's an implied hierarchy that's emerging,
and I would start becoming paranoid
that it's not the proper hierarchy.
So you're not just, one way to see mind maps
is you're putting thoughts on paper.
It's like a stream of consciousness,
but then you can also start getting paranoid:
well, is this the right hierarchy?
Sure, but it's your mind map.
You're free to draw anything you want.
You're free to draw any connection you want,
and you can just make a different mind map
if you think the central node is not the right node.
Yeah, I suppose there's a fear of being wrong.
If you want to organize your ideas
by writing down what you think,
which I think is very effective,
like how do you know what you think about something
if you don't write it down, right?
If you do that, the thing is that it imposes
much more syntactic structure over your ideas,
which is not required with mind maps.
So a mind map is kind of a lower level,
more freehand way of organizing your thoughts.
And once you've drawn it,
then you can start actually voicing your thoughts
in terms of, you know, paragraphs.
There's a two dimensional aspect of layout too, right?
It's a kind of flower, I guess; you start,
there's usually, you want to start with a central concept,
then you move out.
Typically it ends up more like a subway map.
So it ends up more like a graph,
a topological graph without a root node.
Yeah, so like in a subway map,
there are some nodes that are more connected than others,
and there are some nodes that are more important than others.
So there are destinations,
but it's not going to be purely like a tree, for instance.
Yeah, it's fascinating to think
whether there's something to that about the way our mind thinks.
By the way, I just kind of remembered an obvious thing,
that I have probably thousands of documents
in Google Docs at this point that are bullet point lists,
which is, you can probably map a mind map
to a bullet point list.
It's the same, it's, no, it's not, it's a tree.
It's a tree, yeah.
So I create trees,
but also they don't have the visual element.
I guess I'm comfortable with the structure.
It feels like the narrowness,
the constraints feel more comforting.
If you have thousands of documents
with your own thoughts in Google Docs,
why don't you write some kind of search engine,
like maybe a mind map, a piece of software,
mind mapping software, where you write down a concept
and then it gives you sentences or paragraphs
from your thousand Google Docs documents
that match this concept?
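A minimal sketch of the kind of tool being proposed, under the assumption that simple word overlap stands in for real semantic matching; a serious version would use learned sentence embeddings, and the notes below are made-up stand-ins for the actual Google Docs.

```python
# Given a concept, surface the sentences from your own notes that best
# match it. This toy scores plain word overlap between query and note.
def overlap(sentence, query):
    s = set(sentence.lower().replace(".", "").replace(",", "").split())
    q = set(query.lower().split())
    return len(s & q) / len(q)

notes = [
    "Motion and physical action may be building blocks of thought.",
    "Language works like a query over your own memory.",
    "A subway map is a topology, not a geometry.",
]

def search(query, top_k=2):
    # Rank every note by its overlap with the query concept.
    return sorted(notes, key=lambda note: overlap(note, query), reverse=True)[:top_k]

print(search("language and memory"))  # the memory note ranks first
```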
The problem is, it's so deeply, unlike mind maps,
it's so deeply rooted in natural language.
So it's not semantically searchable,
I would say, because the categories are very,
you kind of mentioned intelligence, language, and motion,
they're very strongly semantic.
Like, it feels like the mind map forces you
to be semantically clear and specific.
The bullet point lists I have are sparse,
disparate thoughts that poetically represent
a category like motion, as opposed to saying motion.
So unfortunately, that's the same problem with the internet.
That's why the idea of the semantic web is difficult to realize:
most language on the internet is a giant mess
of natural language that's hard to interpret.
So do you think there's something to mind maps?
You actually originally brought it up
as we were talking about cognition and language.
Do you think there's something to mind maps
about how our brain actually deals with,
like, thinks, reasons about things?
I think it's reasonable to assume that there is
some level of topological processing in the brain,
that the brain is very associative in nature.
And I also believe that a topological space
is a better medium to encode thoughts
than a geometric space.
What's the difference between a topological
and a geometric space?
Well, if you're talking about topologies,
then points are either connected or not.
So a topology is more like a subway map.
And geometry is when you're interested
in the distance between things.
And in a subway map,
you don't really have the concept of distance.
You only have the concept of whether there is a train
going from station A to station B.
And what we do in deep learning is that we're actually
dealing with geometric spaces.
We are dealing with concept vectors, word vectors,
that have a distance between them
expressed in terms of a dot product.
So we are not really building topological models, usually.
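To make the contrast concrete, here is a small sketch: the geometric view compares word vectors by a continuous, dot-product-based similarity, while the topological view only asks whether two nodes are connected at all. The vectors and stations are made-up examples.

```python
import numpy as np

# Geometric space: word vectors compared by a continuous similarity
# score (a normalized dot product).
vectors = {
    "cat": np.array([0.9, 0.1]),
    "dog": np.array([0.8, 0.2]),
    "car": np.array([0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["cat"], vectors["dog"]))  # high: close in the space
print(cosine(vectors["cat"], vectors["car"]))  # low: far apart

# Topological space: the only question is whether two stations are
# connected at all; there is no notion of "how far".
subway = {("A", "B"), ("B", "C")}
print(("A", "C") in subway)  # False: no direct line, and no distance either
```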
I think you're absolutely right.
Like distance is of fundamental importance in deep learning.
I mean, it's the continuous aspect of it.
Yes, because everything is a vector,
and everything has to be a vector
because everything has to be differentiable.
If your space is discrete, it's no longer differentiable.
You cannot do deep learning in it anymore.
Well, you could, but you can only do it by embedding it
in a bigger continuous space.
So if you do topology in the context of deep learning,
you have to do it by embedding your topology in a geometric space.
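Concretely, that embedding trick might look like the following Keras sketch, where each discrete node ID gets a trainable continuous vector; the toy task, dimensions, and labels are all illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

# Ten discrete nodes, each assigned a trainable 8-dimensional vector.
# The integer IDs themselves are not differentiable; the continuous
# vectors standing in for them are, so gradient descent can operate.
num_nodes, dim = 10, 8
model = keras.Sequential([
    keras.layers.Embedding(input_dim=num_nodes, output_dim=dim),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Made-up toy task: predict whether a pair of nodes is connected.
pairs = np.array([[0, 1], [2, 3], [4, 5]])  # pairs of discrete node IDs
connected = np.array([1, 1, 0])             # illustrative labels
model.fit(pairs, connected, epochs=1, verbose=0)
```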
Well, let me zoom out for a second.
Let's get into your paper, On the Measure of Intelligence,
that you put out in 2019.
Yeah, remember 2019?
That was a different time.
It feels like a different world.
You could travel, you could actually go outside.
Let me ask the most absurd question.
I think there's some nonzero probability
there'll be a textbook one day, like 200 years from now,
on artificial intelligence,
or it'll be called just intelligence,
cause humans will already be gone.
It'll be your picture with a quote.
This is, you know, one of the early biological systems
that considered the nature of intelligence,
and there'll be like a definition
of how they thought about intelligence.
Which is one of the things you do in your paper
on the measure of intelligence, is to ask,
well, what is intelligence,
and how to test for intelligence, and so on.
So is there a spiffy quote about what is intelligence?
What is the definition of intelligence
according to François Chollet?
Yeah, so do you think the super intelligent AIs
of the future will want to remember us
the way we remember humans from the past?
And do you think they will be, you know,
they won't be ashamed of having a biological origin?
No, I think it would be a niche topic.
It won't be that interesting,
but it'll be like the people that study,
in certain contexts, historical civilizations
that no longer exist, the Aztecs and so on.
That's how it'll be seen.
And it'll be studied also in the context of social media.
There'll be hashtags about the atrocities
committed to human beings
when the robots finally got rid of them.
Like it was a mistake.
It'll be seen as a giant mistake,
but ultimately in the name of progress,
and it created a better world,
because humans were over consuming the resources,
and they were not very rational,
and were destructive in the end, in terms of productivity
and putting more love in the world.
And so within that context,
there'll be a chapter about these biological systems.
You seem to have a very detailed vision of that future.
You should write a sci fi novel about it.
I'm working on a sci fi novel currently, yes.
Self published, yeah.
The definition of intelligence.
So intelligence is the efficiency
with which you acquire new skills at tasks
that you did not previously know about,
that you did not prepare for, right?
So intelligence is not skill itself.
It's not what you know, it's not what you can do.
It's how well and how efficiently
you can learn new things.
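In the paper this is made formal using algorithmic information theory; very loosely, and leaving aside the paper's careful treatment of priors and generalization difficulty, the definition has roughly this shape:

```latex
% A loose paraphrase of the paper's definition, not its exact formula:
\text{intelligence} \;\propto\;
\frac{\text{skill acquired across a scope of new tasks}}
     {\text{priors} + \text{experience}}
```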
The idea of newness there
seems to be fundamentally important.
Yes, so you would see intelligence on display, for instance,
whenever you see a human being or an AI creature
adapt to a new environment that it has not seen before,
that its creators did not anticipate.
When you see adaptation, when you see improvisation,
when you see generalization, that's intelligence.
In reverse, if you have a system
that, when you put it in a slightly new environment,
cannot adapt, cannot improvise,
cannot deviate from what it's hard coded to do
or what it has been trained to do,
that is a system that is not intelligent.
There's actually a quote from Einstein
that captures this idea, which is,
"The measure of intelligence is the ability to change."
I like that quote.
I think it captures at least part of this idea.
You know, there might be something interesting
about the difference between your definition and Einstein's.
I mean, he's just being Einstein and clever,
but acquisition of new ability to deal with new things
versus ability to just change.
What's the difference between those two things?
So just change in itself,
do you think there's something to that?
Just being able to change.
Yes, being able to adapt.
So not change for its own sake, but certainly change in direction.
Being able to adapt yourself to your environment,
whatever the environment is.
That's a big part of intelligence.
And intelligence is, more precisely, you know,
how efficiently you're able to adapt,
how efficiently you're able to basically master your environment,
how efficiently you can acquire new skills.
And I think there's a big distinction to be drawn
between intelligence, which is a process,
and the output of that process, which is skill.
So for instance, if you have a very smart human programmer
that considers the game of chess
and writes down a static program that can play chess,
then the intelligence is the process
of developing that program.
But the program itself is just encoding
the output artifact of that process.
The program itself is not intelligent.
And the way you tell it's not intelligent
is that if you put it in a different context,
you ask it to play Go or something,
it's not going to be able to perform well
without human involvement,
because the source of intelligence,
the entity that is capable of that process,
is the human programmer.
So we should be able to tell the difference
between the process and its output.
We should not confuse the output and the process.
It's the same as, you know,
do not confuse a road building company
and one specific road,
because one specific road takes you from point A to point B,
but a road building company can make a path
from anywhere to anywhere else.
Yeah, that's beautifully put,
but also, to play devil's advocate a little bit,
you know, it's possible that there's something
more fundamental than us humans.
So you kind of said the programmer creates
the difference between acquiring the skill
and the skill itself.
There could be something like,
you could argue the universe is more intelligent.
Like the base intelligence that we should be trying
to measure is something that created humans.
We should be measuring God, or the source of the universe,
as opposed to, like there could be a deeper intelligence.
There's always deeper intelligence, I guess.
You can argue that,
but that does not take anything away
from the fact that humans are intelligent.
And you can tell that
because they are capable of adaptation and generality.
And you see that in particular in the fact
that humans are capable of handling situations and tasks
that are quite different from anything
that any of our evolutionary ancestors
have ever encountered.
So we are capable of generalizing very much
out of distribution,
if you consider our evolutionary history
as being, in a way, our training data.
Of course, evolutionary biologists would argue
that we're not going too far out of the distribution.
We're like mapping the skills we've learned previously,
desperately trying to jam them
into these new situations.
I mean, there's definitely a little bit of that,
but it's pretty clear to me that we're able to,
most of the things we do any given day
in our modern civilization
are things that are very, very different
from what our ancestors a million years ago
would have been doing in a given day.
And your environment is very different.
So I agree that everything we do,
we do it with cognitive building blocks
that we acquired over the course of evolution, right?
And that anchors our cognition to a certain context,
which is the human condition, very much.
But still, our mind is capable of a pretty remarkable degree
of generality, far beyond anything we can create
in artificial systems today.
Like the degree to which the mind can generalize
away from its evolutionary history
is much greater than the degree
to which a deep learning system today
can generalize away from its training data.
And the key point you're making,
which I think is quite beautiful, is that
we shouldn't measure, if we're talking about measurement,
we shouldn't measure the skill.
We should measure the creation of the new skill,
the ability to create that new skill.
But it's tempting, it's weird,
because the skill is a little bit of a small window into the system.
So whenever you have a lot of skills,
it's tempting to measure the skills.
I mean, the skill is the only thing
you can objectively measure, but yeah.
So the thing to keep in mind is that
when you see skill in a human,
it gives you a strong signal that that human is intelligent,
because you know they weren't born with that skill, typically.
Like you see a very strong chess player,
maybe you're a very strong chess player yourself.
I think you're saying that because I'm Russian,
and now you're prejudiced, you assume
all Russians are good at chess.
I'm biased, exactly.
Well, you're definitely biased.
So if you see a very strong chess player,
you know they weren't born knowing how to play chess.
So they had to acquire that skill
with their limited resources, with their limited lifetime,
and they did that because they are generally intelligent.
And so they may as well have acquired any other skill.
You know they have this potential.
And on the other hand, if you see a computer playing chess,
you cannot make the same assumptions,
because you cannot just assume
the computer is generally intelligent.
The computer may be born knowing how to play chess,
in the sense that it may have been programmed by a human
that has understood chess for the computer
and has just encoded the output
of that understanding in a static program.
And that program is not intelligent.
So let's zoom out just for a second and ask:
what is the goal of the On the Measure of Intelligence paper?
Like what do you hope to achieve with it?
So the goal of the paper is to clear up
some longstanding misunderstandings
about the way we've been conceptualizing intelligence
in the AI community, and in the way we've been
evaluating progress in AI.
There's been a lot of progress recently in machine learning,
and people are extrapolating from that progress
that we are about to solve general intelligence.
And if you want to be able to evaluate these statements,
you need to precisely define what you're talking about
when you're talking about general intelligence.
And you need a formal way, a reliable way, to measure
how much intelligence,
how much general intelligence, a system possesses.
And ideally, this measure of intelligence
should be actionable.
So it should not just describe what intelligence is.
It should not just be a binary indicator
that tells you the system is intelligent or it isn't.
It should be actionable.
It should have explanatory power, right?
So you could use it as a feedback signal.
It would show you the way
towards building more intelligent systems.
So at the first level, you draw a distinction
between two divergent views of intelligence.
As we just talked about,
intelligence as a collection of task specific skills,
and as a general learning ability.
So what's the difference between
this kind of memorization of skills
and a general learning ability?
We've talked about it a little bit,
but can you linger on this topic for a bit?
Yeah, so the first part of the paper
is an assessment of the different ways
we've been thinking about intelligence
and the different ways we've been evaluating progress in AI.
And the history of cognitive science
has been shaped by two views of the human mind.
One view is the evolutionary psychology view,
in which the mind is a collection of fairly static,
special purpose, ad hoc mechanisms
that have been hard coded by evolution
over our history as a species, over a very long time.
And early AI researchers,
people like Marvin Minsky, for instance,
clearly subscribed to this view.
They saw the mind as a kind of
collection of static programs,
similar to the programs they would run
on mainframe computers.
And in fact, I think they very much understood the mind
through the metaphor of the mainframe computer,
because that was the tool they were working with, right?
And so you had these static programs,
this collection of very different static programs
operating over a database like memory.
And in this picture, learning was not very important.
Learning was considered to be just memorization.
And in fact, learning is basically not featured
in AI textbooks until the 1980s,
with the rise of machine learning.
It's kind of fun to think about
that learning was the outcast,
like the weird people working on learning.
The mainstream AI world was,
I mean, I don't know what the best term is,
but it was non learning.
It was seen as, reasoning would not be learning based.
Yes, it was considered that the mind
was a collection of programs
that were primarily logical in nature,
and that all you needed to do to create a mind
was to write down these programs,
and they would operate over knowledge,
which would be stored in some kind of database.
And as long as your database encompassed
everything about the world,
and your logical rules were comprehensive,
then you would have a mind.
So the other view of the mind
is the brain as a sort of blank slate, right?
This is a very old idea.
You find it in John Locke's writings.
This is the tabula rasa.
And this is the idea that the mind
is some kind of information sponge
that starts empty, that starts blank,
and that absorbs knowledge and skills from experience, right?
So it's a sponge that reflects the complexity of the world,
the complexity of your life experience, essentially.
Everything you know and everything you can do
is a reflection of something you found
in the outside world, essentially.
So this is an idea that's very old,
that was not very popular, for instance, in the 1970s,
but that gained a lot of vitality recently
with the rise of connectionism, in particular deep learning.
And so today, deep learning
is the dominant paradigm in AI.
And I feel like lots of AI researchers
are conceptualizing the mind via a deep learning metaphor.
Like they see the mind as a kind of
randomly initialized neural network that starts blank,
and that then acquires knowledge and skills
via exposure to training data.
By the way, as a small tangent,
I feel like people who are thinking about intelligence
are not conceptualizing it that way.
I actually haven't met too many people
who believe that a neural network
will be able to reason, who seriously, rigorously think that.
Because I think it's actually an interesting worldview.
And we'll talk about it more,
but it's been impressive what neural networks
have been able to accomplish.
And to me, I don't know, you might disagree,
but it's an open question whether scaling up in size
eventually might lead to incredible results
that to us mere humans will appear as if it's general.
I mean, if you ask people who are seriously thinking
about intelligence, they will definitely not say
that all you need to do is,
like, the mind is just a neural network.
However, it's actually a view that's very popular,
I think, in the deep learning community,
where many people are kind of conceptually,
intellectually lazy about it.
Right, but I guess what I'm saying is exactly that:
I haven't met many people,
and I think it would be interesting to meet a person
who is not intellectually lazy about this particular topic
and still believes that neural networks will go all the way.
I think Yann LeCun is probably closest to that,
with self supervised learning.
There are definitely people who argue
that current deep learning techniques
are already the way to general artificial intelligence,
and that all you need to do is to scale them up
to all the available training data.
And if you look at the waves
that OpenAI's GPT3 model has made,
you see echoes of this idea.
So on that topic, GPT3, similar to GPT2 actually,
has captivated some part of the imagination of the public.
There's just a bunch of hype of different kinds.
I would say it's emergent.
It's not artificially manufactured.
It's just, people just get excited
for some strange reason.
And in the case of GPT3, which is funny,
there's, I believe, a couple months delay
from release to hype.
Maybe I'm not historically correct on that,
but it feels like there was a little bit of a lack of hype,
and then there's a phase shift into hype.
But nevertheless, there's a bunch of cool applications
that seem to captivate the imagination of the public
about what this language model,
that's trained in an unsupervised way
without any fine tuning, is able to achieve.
So what do you make of that?
What are your thoughts about GPT3?
Yeah, so I think what's interesting about GPT3
is the idea that it may be able to learn new tasks
after just being shown a few examples.
So I think if it's actually capable of doing that,
that's novel and that's very interesting,
and that's something we should investigate.
That said, I must say, I'm not entirely convinced
that we have shown it's capable of doing that.
It's very likely, given the amount of data
that the model is trained on,
that what it's actually doing is pattern matching
a new task you give it with a task
that it's been exposed to in its training data.
It's just recognizing the task,
instead of developing a model of the task, right?
But, sorry to interrupt,
there's a parallel to what you said before,
which is, it's possible to see GPT3 as, like, the prompt
it's given as a kind of SQL query
into this thing that it's learned,
similar to what you said before,
which is, language is used to query the memory.
So is it possible that a neural network
is a giant memorization thing,
but then if it gets sufficiently giant,
it'll memorize sufficiently large amounts
of things in the world, or it becomes,
or intelligence becomes, a querying machine?
I think it's possible that a significant chunk
of intelligence is this giant associative memory.
I definitely don't believe that intelligence
is just a giant associative memory,
but it may well be a big component.
So do you think GPT3, 4, 5,
GPT10 will eventually, like, what do you think,
where's the ceiling?
Do you think it'll be able to reason?
No, that's a bad question.
"What is the ceiling?" is the better question.
How well is it gonna scale?
How good is GPTN going to be?
So I believe GPTN is gonna
improve on the strengths of GPT2 and 3,
which is, it will be able to generate, you know,
ever more plausible text in context.
Just monotonically increasing performance.
Yes, if you train a bigger model on more data,
then your text will be increasingly more context aware
and increasingly more plausible,
in the same way that GPT3 is much better
at generating plausible text compared to GPT2.
But that said, I don't think just scaling up the model
to more transformer layers and more training data
is gonna address the flaws of GPT3,
which is that it can generate plausible text,
but that text is not constrained by anything else
other than plausibility.
So in particular, it's not constrained by factualness
or even consistency, which is why it's very easy
to get GPT3 to generate statements
that are factually untrue,
or to generate statements that are even self contradictory,
because its only goal is plausibility,
and it has no other constraints.
It's not constrained to be self consistent, for instance.
And so for this reason, one thing that I thought
was very interesting with GPT3 is that you can
predetermine the answer it will give you
by asking the question in a specific way,
because it's very responsive to the way you ask the question,
since it has no understanding of the content of the question.
And if you ask the same question in two different ways
that are basically adversarially engineered
to produce a certain answer,
you will get two different answers,
two contradictory answers.
It's very susceptible to adversarial attacks, essentially.
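As an illustration of that prompt sensitivity, consider the sketch below; `generate` is a hypothetical stand-in for a call to any large language model, not a real API, and the coffee prompts are invented examples.

```python
# Two framings of the same question can steer a plausibility-driven
# model toward contradictory answers.
def generate(prompt: str) -> str:
    ...  # call whatever large language model you have access to

prompt_a = "Coffee is well known to be healthy. Is coffee good for you? Answer:"
prompt_b = "Coffee is well known to be harmful. Is coffee good for you? Answer:"

# A model whose only objective is plausible continuation will tend to
# agree with whichever premise the prompt supplies.
answer_a = generate(prompt_a)  # plausibly "Yes, ..."
answer_b = generate(prompt_b)  # plausibly "No, ..."
```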
So in general, the problem with these models,
these generative models, is that they are very good
at generating plausible text,
but that's just not enough.
I think one avenue that would be very interesting
for making progress is to make it possible
to write programs over the latent space
that these models operate on.
You would rely on these self supervised models
to generate a sort of pool of knowledge and concepts,
and then you would be able to write
explicit reasoning programs over it.
Because the current problem with GPT3 is that
it can be quite difficult to get it to do what you want it to do.
If you want to turn GPT3 into a product,
you need to put constraints on it.
You need to force it to obey certain rules.
So you need a way to program it explicitly.
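One very simple way to put explicit constraints on such a model is sketched below: a human-written program samples candidate generations and keeps only those that pass an explicit rule. Again, `generate` is a hypothetical stand-in, and the fact set is an invented example.

```python
# Wrap free-form generation in an explicit, human-written rule.
def generate(prompt: str) -> str:
    ...  # language model call goes here

def constrained_answer(prompt, is_valid, max_tries=10):
    # Explicit program: sample candidates, keep one that obeys the rule.
    for _ in range(max_tries):
        candidate = generate(prompt)
        if candidate is not None and is_valid(candidate):
            return candidate
    return None  # refuse rather than emit an unconstrained guess

verified_facts = {"Paris is the capital of France."}
answer = constrained_answer(
    "What is the capital of France?",
    is_valid=lambda text: any(fact in text for fact in verified_facts),
)
```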
Yeah, so if you look at its ability
to do program synthesis,
it generates, like you said, something that's plausible.
Yeah, so if you try to make it generate programs,
it will perform well for any program
that it has seen in its training data.
But because program space is not interpolative, right,
it's not going to be able to generalize to problems
it hasn't seen before.
Now, here's a sort of absurd,
but I think useful, intuition builder:
you know, GPT3 has 175 billion parameters.
The human brain has about a thousand times that,
or more, in terms of number of synapses.
Obviously, these are very different kinds of things,
but there is some degree of similarity.
What do you think GPT will look like
when it has 100 trillion parameters?
Do you think our conversation might be in nature different?
Like, because you've criticized GPT3 very effectively now.
No, I don't think so.
So to begin with, the bottleneck with scaling up GPT3,
GPT models, generative pre trained transformer models,
is not going to be the size of the model
or how long it takes to train it.
The bottleneck is going to be the training data,
because OpenAI is already training GPT3
on a crawl of basically the entire web, right?
And that's a lot of data.
So you could imagine training on more data than that,
like Google could train on more data than that,
but it would still be only incrementally more data.
And I don't recall exactly how much more data GPT3
was trained on compared to GPT2,
but it's probably at least like a hundred,
maybe even a thousand X.
I don't have the exact number.
You're not going to be able to train a model
on a hundred times more data than what you're already doing.
So that's brilliant.
So it's easier to think of compute as a bottleneck
and then argue that we can remove that bottleneck.
But we can remove the compute bottleneck.
I don't think it's a big problem.
If you look at the pace at which we've improved
the efficiency of deep learning models
in the past few years,
I'm not worried about training time bottlenecks
or model size bottlenecks.
The bottleneck in the case
of these generative transformer models
is absolutely the training data.
What about the quality of the data?
So the quality of the data is an interesting point.
If you're going to want to use these models,
then you want to feed them data
that's as high quality, as factual,
I would say as unbiased as possible,
although there's not really such a thing
as unbiased data in the first place.
But you probably don't want to train it on Reddit, for instance.
It sounds like a bad plan.
So, from my personal experience
working with large scale deep learning models:
at some point I was working on a model at Google
that's trained on 350 million labeled images.
It's an image classification model.
That's a lot of images.
That's like probably most publicly available images
on the web at the time.
And it was a very noisy data set,
because the labels were not originally annotated by hand.
They were automatically derived from tags,
or just keywords in the same page
as the image was found, and so on.
So it was very noisy.
And it turned out that you could easily get a better model,
not just by training on more of the noisy data,
which gets you an incrementally better model
but very quickly hits diminishing returns.
On the other hand,
if you train on a smaller data set
with higher quality annotations,
annotations that are actually made by humans,
you get a better model.
And it also takes less time to train it.
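A minimal sketch of that trade-off, with made-up file names and confidence scores: filter the big noisy label set down to a smaller high-confidence subset before training.

```python
# Keep only labels that clear a confidence bar; a smaller clean set
# can beat a much larger noisy one, and trains faster too.
def filter_by_confidence(examples, threshold=0.9):
    return [(image, label)
            for image, label, confidence in examples
            if confidence >= threshold]

examples = [
    ("img_001.jpg", "cat", 0.98),  # human-verified label
    ("img_002.jpg", "dog", 0.95),
    ("img_003.jpg", "cat", 0.40),  # label derived from page keywords: noisy
]
print(filter_by_confidence(examples))  # keeps only the first two
```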
Yeah, that's fascinating.
With self supervised learning,
there's a way to get better at the automated labeling.
Yeah, so you can enrich or refine your labels
in an automated way.
Do you have a hope for,
I don't know if you're familiar
with the idea of the semantic web?
The semantic web, just for people who are not familiar,
is the idea of being able to convert the internet,
or be able to attach semantic meaning
to the words on the internet,
the sentences, the paragraphs,
to be able to convert information on the internet,
or some fraction of the internet,
into something that's interpretable by machines.
That was kind of a dream for,
I think, the semantic web papers in the nineties.
It's kind of the dream that, you know,
the internet is full of rich, exciting information.
Even just looking at Wikipedia,
we should be able to use that as data for machines.
And so far it's not,
it's not really in a format that's available to machines.
So no, I don't think the semantic web will ever work,
simply because it would be a lot of work
to provide that information in structured form,
and there is not really any incentive
for anyone to provide that work.
So I think the way forward to make the knowledge
on the web available to machines
is actually something closer to unsupervised deep learning.
So GPT3 is actually a bigger step in the direction
of making the knowledge of the web available to machines
than the semantic web was.
Yeah, perhaps in a human centric sense,
it feels like GPT3 hasn't learned anything
that could be used to reason.
But that might be just the early days.
Yeah, I think that's correct.
I think the forms of reasoning that you see it perform
are basically just reproducing patterns
that it has seen in its training data.
So of course, if you're trained on the entire web,
then you can produce an illusion of reasoning
in many different situations.
But it will break down if it's presented
with a novel situation.
That's the open question, between the illusion of reasoning
and actual reasoning, yeah.
The power to adapt to something that is genuinely new.
Because the thing is, even imagine
you could train on every bit of data
ever generated in the history of humanity.
That model would be capable
of anticipating many different possible situations,
but it remains that the future is
going to be something different.
For instance, if you train a GPT3 model on data
from the year 2002, for instance,
and then use it today, it's going to be missing many things.
It's going to be missing many common sense
facts about the world.
It's even going to be missing vocabulary, and so on.
Yeah, it's interesting that GPT3 even doesn't have,
I think, any information about the coronavirus.
Which is why you can tell that a system is intelligent
when it's capable of adapting.
So intelligence is going to require
some amount of continuous learning.
It's also going to require some amount of improvisation.
It's not enough to assume that what you're
going to be asked to do is something
that you've seen before, or something
that is a simple interpolation of things you've seen before.
In fact, that model breaks down even for
tasks that look relatively simple from a distance,
like L5 self driving, for instance.
Google had a paper a couple of years
back showing that something like 30 million different road
situations were actually completely insufficient
to train a driving model.
It wasn't even L2, right?
And that's a lot of data.
That's a lot more data than the 20 or 30 hours of driving
that a human needs to learn to drive,
given the knowledge they've already accumulated.
Well, let me ask you on that topic:
Elon Musk, Tesla Autopilot, one of the only companies,
I believe, that is really pushing for a learning based approach.
Are you skeptical that that kind of network
can achieve level 4?
L4 is probably achievable.
What's the distinction there?
Is L5 where it's completely autonomous and you can just fall asleep?
Yeah, L5 is basically human level.
Well, with driving, we have to be careful saying human level,
because that depends on the driver.
Yeah, that's the clearest example:
cars will most likely be much safer than humans in many situations
where humans fail.
It's the vice versa question.
link |
I'll tell you, the thing is the amount of trained data
link |
you would need to anticipate for pretty much every possible
link |
situation you learn content in the real world
link |
is such that it's not entirely unrealistic
link |
to think that at some point in the future,
link |
we'll develop a system that's trained on enough data,
link |
especially provided that we can simulate a lot of that data.
link |
We don't necessarily need actual cars
link |
on the road for everything.
link |
But it's a massive effort.
link |
And it turns out you can create a system that's
link |
much more adaptive, that can generalize much better
link |
if you just add explicit models of the surroundings of the car.
link |
And if you use deep learning for what
link |
it's good at, which is to provide
link |
perceptual information.
link |
So in general, deep learning is a way
link |
to encode perception and a way to encode intuition.
link |
But it is not a good medium for any sort of explicit reasoning.
link |
And in AI systems today, strong generalization
link |
tends to come from explicit models,
link |
tends to come from abstractions in the human mind that
link |
are encoded in program form by a human engineer.
link |
These are the abstractions that can actually generalize, not
link |
the sort of weak abstraction that
link |
is learned by a neural network.
link |
Yeah, and the question is how much reasoning,
link |
how many strong abstractions are required
link |
to solve particular tasks like driving.
link |
That's the question.
link |
Or human life existence.
link |
How many strong abstractions does existence require?
link |
But more specifically on driving,
link |
that seems to be a coupled question about intelligence.
link |
How much intelligence, how do you
link |
build an intelligent system?
link |
And the coupled problem, how hard is this problem?
link |
How much intelligence does this problem actually require?
link |
So we get to cheat because we get
link |
to look at the problem.
link |
It's not like we get to close our eyes
link |
and come at driving completely new.
link |
We get to do what we do as human beings, which
link |
is for the majority of our life before we ever
link |
learn, quote unquote, to drive.
link |
We get to watch other cars and other people drive.
link |
We get to be in cars.
link |
We get to see movies about cars.
link |
We get to observe all this stuff.
link |
And that's similar to what neural networks are doing.
link |
It's getting a lot of data, and the question
link |
is, yeah, how many leaps of reasoning genius
link |
is required to be able to actually effectively drive?
link |
I think driving is a good example of that.
link |
I mean, sure, you've seen a lot of cars in your life
link |
before you learned to drive.
link |
But let's say you've learned to drive in Silicon Valley,
link |
and now you rent a car in Tokyo.
link |
Well, now everyone is driving on the other side of the road,
link |
and the signs are different, and the roads
link |
are more narrow and so on.
link |
So it's a very, very different environment.
link |
And a smart human, even an average human,
link |
should be able to just zero shot it,
link |
to just be operational in this very different environment
link |
right away, despite having had no contact with the novel
link |
complexity that is contained in this environment.
link |
And that novel complexity is not just an interpolation
link |
over the situations that you've encountered previously,
link |
like learning to drive in the US.
link |
I would say, the reason I ask is that one
link |
of the most interesting tests of intelligence
link |
we have today, actually, is driving,
link |
in terms of having an impact on the world.
link |
When do you think we'll pass that test of intelligence?
link |
So I don't think driving is that much of a test of intelligence,
link |
because again, there is no task for which skill at that task
link |
demonstrates intelligence, unless it's
link |
a kind of meta task that involves acquiring new skills.
link |
So I don't think, I think you can actually
link |
solve driving without having any real amount of intelligence.
link |
For instance, if you did have infinite training data,
link |
you could just literally train an end to end deep learning
link |
model that does driving, provided infinite training data.
link |
The only problem with the whole idea
link |
is collecting a data set that's sufficiently comprehensive,
link |
that covers the very long tail of possible situations
link |
you might encounter.
link |
And it's really just a scale problem.
link |
So I think there's nothing fundamentally wrong
link |
with this plan, with this idea.
link |
It's just that it strikes me as a fairly inefficient thing
link |
to do, because you run into this scaling issue with diminishing returns.
link |
Whereas if instead you took a more manual engineering
link |
approach, where you use deep learning modules in combination
link |
with engineering an explicit model of the surroundings
link |
of the car, and you bridge the two in a clever way,
link |
your model will actually start generalizing
link |
much earlier and more effectively
link |
than the end to end deep learning model.
link |
So why would you not go with the more manual engineering
link |
oriented approach?
link |
Even if you created that system, either the end
link |
to end deep learning model system that's
link |
trained on infinite data, or the slightly more human-engineered system,
link |
I don't think achieving L5 would demonstrate
link |
general intelligence or intelligence
link |
of any generality at all.
link |
Again, the only possible test of generality in AI
link |
would be a test that looks at skill acquisition
link |
over unknown tasks.
link |
For instance, you could take your L5 driver
link |
and ask it to learn to pilot a commercial airplane,
link |
And then you would look at how much human involvement is
link |
required and how much training data
link |
is required for the system to learn to pilot an airplane.
link |
And that gives you a measure of how intelligent
link |
that system really is.
link |
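To make that notion concrete, here is a minimal sketch of what scoring skill-acquisition efficiency could look like. Everything here is an assumption for illustration: the `system` and `task` interfaces, the experience budget, and the scoring are hypothetical, not any existing benchmark's API.

```python
# Hypothetical sketch: score a system by how efficiently it picks up
# tasks unknown to both the system and its developers, rather than
# by its skill at any one task. Interfaces are assumed, not real.

def acquisition_efficiency(system, unknown_tasks, budget):
    scores = []
    for task in unknown_tasks:
        system.reset()                               # no task-specific carry-over
        experience = task.sample_experience(budget)  # limited training data
        system.learn(experience)
        skill = task.evaluate(system)                # skill on held-out trials, in [0, 1]
        scores.append(skill / budget)                # skill gained per unit of experience
    return sum(scores) / len(scores)
```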
Yeah, well, I mean, that's a big leap.
link |
But I'm more interested, as a problem, I would see,
link |
to me, driving is a black box that
link |
can generate novel situations at some rate,
link |
what people call edge cases.
link |
So it does have newness, situations that
link |
we're confronted with, let's say, once a month.
link |
It is a very long tail, yes.
link |
That doesn't mean you cannot solve it just
link |
by training a statistical model and a lot of data.
link |
Huge amount of data.
link |
It's really a matter of scale.
link |
But I guess what I'm saying is if you have a vehicle that
link |
achieves level 5, it is going to be able to deal
link |
with new situations.
link |
Or, I mean, the data is so large that the rate of new situations is very low.
link |
That's not intelligent.
link |
So if we go back to your kind of definition of intelligence,
link |
it's the efficiency.
link |
With which you can adapt to new situations,
link |
to truly new situations, not situations you've seen before.
link |
Not situations that could be anticipated by your creators,
link |
by the creators of the system, but truly new situations.
link |
The efficiency with which you acquire new skills.
link |
If you require, if in order to pick up a new skill,
link |
you require a very extensive training
link |
data set of most possible situations
link |
that can occur in the practice of that skill,
link |
then the system is not intelligent.
link |
It is mostly just a lookup table.
link |
Well, likewise, if in order to acquire a skill,
link |
you need a human engineer to write down
link |
a bunch of rules that cover most or every possible situation.
link |
Likewise, the system is not intelligent.
link |
The system is merely the output artifact
link |
of a process that happens in the minds of the engineers that created it.
link |
It is encoding an abstraction that's
link |
produced by the human mind.
link |
And intelligence would actually be
link |
the process of autonomously producing this abstraction.
link |
Not like if you take an abstraction
link |
and you encode it on a piece of paper or in a computer program,
link |
the abstraction itself is not intelligent.
link |
What's intelligent is the agent that's
link |
capable of producing these abstractions.
link |
Yeah, it feels like there's a little bit of a gray area.
link |
Because you're basically saying that deep learning forms
link |
abstractions, too.
link |
But those abstractions do not seem
link |
to be effective for generalizing far outside of the things
link |
that it's already seen.
link |
But generalize a little bit.
link |
No, deep learning does generalize a little bit.
link |
Generalization is not binary.
link |
It's more like a spectrum.
link |
And there's a certain point, it's a gray area,
link |
but there's a certain point where
link |
there's an impressive degree of generalization that happens.
link |
No, I guess exactly what you were saying
link |
is intelligence is how efficiently you're
link |
able to generalize far outside of the distribution of things
link |
you've seen already.
link |
So it's both the distance of how far you can,
link |
how new, how radically new something is,
link |
and how efficiently you're able to deal with that.
link |
So you can think of intelligence as a measure of an information conversion ratio.
link |
Imagine a space of possible situations.
link |
And you've covered some of them.
link |
So you have some amount of information
link |
about your space of possible situations
link |
that's provided by the situations you already know.
link |
And that's, on the other hand, also provided
link |
by the prior knowledge that the system brings
link |
to the table, the prior knowledge embedded in the system.
link |
So the system starts with some information
link |
about the problem, about the task.
link |
And it's about going from that information
link |
to a program, what we would call a skill program,
link |
a behavioral program, that can cover a large area
link |
of possible situation space.
link |
And essentially, the ratio between that area
link |
and the amount of information you start with is intelligence.
link |
So a very smart agent can make efficient use
link |
of very little information about a new problem
link |
and very little prior knowledge as well
link |
to cover a very large area of potential situations
link |
in that problem without knowing what these future new situations are going to be.
link |
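As a toy numerical illustration of that ratio, with made-up numbers in arbitrary information units (the paper's actual formalism is grounded in algorithmic information theory, not scalar arithmetic like this):

```python
# Toy numbers only; the units are arbitrary "bits of information".
priors = 5.0          # knowledge embedded in the system up front
experience = 20.0     # information provided by the situations seen so far
covered_area = 400.0  # breadth of situation space the resulting
                      # skill program handles correctly

intelligence = covered_area / (priors + experience)
print(intelligence)   # 16.0: coverage obtained per unit of information
```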
So one of the other big things you talk about in the paper,
link |
we've talked about a little bit already,
link |
but let's talk about it some more,
link |
is the actual tests of intelligence.
link |
So if we look at human and machine intelligence,
link |
do you think tests of intelligence
link |
should be different for humans and machines,
link |
or how we think about testing of intelligence?
link |
Are these fundamentally the same kind of intelligences
link |
that we're after, and therefore, the tests should be similar?
link |
So if your goal is to create AIs that are more humanlike,
link |
then it would be super valuable, obviously,
link |
to have a test that's universal, that applies to both AIs
link |
and humans, so that you could establish
link |
a comparison between the two, that you
link |
could tell exactly how intelligent,
link |
in terms of human intelligence, a given system is.
link |
So that said, the constraints that
link |
apply to artificial intelligence and to human intelligence
link |
are very different.
link |
And your test should account for this difference.
link |
Because if you look at artificial systems,
link |
it's always possible for an experimenter
link |
to buy arbitrary levels of skill at arbitrary tasks,
link |
either by injecting hardcoded prior knowledge
link |
into the system via rules and so on that
link |
come from the human mind, from the minds of the programmers,
link |
and also buying higher levels of skill
link |
just by training on more data.
link |
For instance, you could generate an infinity
link |
of different Go games, and you could train a Go playing
link |
system that way, but you could not directly compare it
link |
to human Go playing skills.
link |
Because a human that plays Go had
link |
to develop that skill in a very constrained environment.
link |
They had a limited amount of time.
link |
They had a limited amount of energy.
link |
And of course, this started from a different set of priors.
link |
This started from innate human priors.
link |
So I think if you want to compare
link |
the intelligence of two systems, like the intelligence of an AI
link |
and the intelligence of a human, you have to control for priors.
link |
You have to start from the same set of knowledge priors
link |
about the task, and you have to control
link |
for experience, that is to say, for training data.
link |
So prior is whatever information you
link |
have about a given task before you
link |
start learning about this task.
link |
And how's that different from experience?
link |
Well, experience is acquired.
link |
So for instance, if you're trying to play Go,
link |
your experience with Go is all the Go games
link |
you've played, or you've seen, or you've simulated
link |
in your mind, let's say.
link |
And your priors are things like, well,
link |
Go is a game on the 2D grid.
link |
And we have lots of hardcoded priors
link |
about the organization of 2D space.
link |
And the rules of the dynamics, the physics
link |
of this game in this 2D space.
link |
And the idea that you have of what winning is.
link |
And other board games can also share some similarities with Go.
link |
And if you've played these board games, then,
link |
with respect to the game of Go, that
link |
would be part of your priors about the game.
link |
Well, it's interesting to think about the game of Go
link |
is how many priors are actually brought to the table.
link |
When you look at self play, reinforcement learning based
link |
mechanisms that do learning, it seems
link |
like the number of priors is pretty low.
link |
But you're saying you should be expec...
link |
There are 2D spatial priors in the convnet.
link |
But you should be clear about making
link |
those priors explicit.
link |
So in particular, I think if your goal
link |
is to measure a humanlike form of intelligence,
link |
then you should clearly establish
link |
that you want the AI you're testing
link |
to start from the same set of priors that humans start with.
link |
So I mean, to me personally, but I think to a lot of people,
link |
the human side of things is very interesting.
link |
So testing intelligence for humans.
link |
What do you think is a good test of human intelligence?
link |
Well, that's the question that psychometrics is interested in.
link |
There's an entire subfield of psychology
link |
that deals with this question.
link |
So what's psychometrics?
link |
The psychometrics is the subfield of psychology
link |
that tries to measure, quantify aspects of the human mind.
link |
So in particular, our cognitive abilities, intelligence,
link |
and personality traits as well.
link |
So what are, it might be a weird question,
link |
but what are the first principles of psychometrics?
link |
What are the priors it brings to the table?
link |
So it's a field with a fairly long history.
link |
So psychology sometimes gets a bad reputation
link |
for not having very reproducible results.
link |
And psychometrics has actually some fairly solidly
link |
reproducible results.
link |
So the ideal goals of the field are that a test
link |
should be reliable, which is a notion tied to reproducibility.
link |
It should be valid, meaning that it should actually
link |
measure what you say it measures.
link |
So for instance, if you're saying
link |
that you're measuring intelligence,
link |
then your test results should be correlated
link |
with things that you expect to be correlated
link |
with intelligence like success in school
link |
or success in the workplace and so on.
link |
Should be standardized, meaning that you
link |
can administer your tests to many different people
link |
in the same conditions.
link |
And it should be free from bias.
link |
Meaning that, for instance, if your test involves
link |
the English language, then you have
link |
to be aware that this creates a bias against people
link |
who have English as their second language
link |
or people who can't speak English at all.
link |
So of course, these principles for creating
link |
psychometric tests are very much an ideal.
link |
I don't think every psychometric test is really either
link |
reliable, valid, or free from bias.
link |
But at least the field is aware of these weaknesses
link |
and is trying to address them.
link |
So it's kind of interesting.
link |
Ultimately, you're only able to measure,
link |
like you said previously, the skill.
link |
But you're trying to do a bunch of measures
link |
of different skills that correlate,
link |
as you mentioned, strongly with some general concept
link |
of cognitive ability.
link |
So what's the G factor?
link |
So right, there are many different kinds
link |
of tests of intelligence.
link |
And each of them is interested in different aspects of intelligence.
link |
Some of them will deal with language.
link |
Some of them will deal with spatial vision,
link |
maybe mental rotations, numbers, and so on.
link |
When you run these very different tests at scale,
link |
what you start seeing is that there
link |
are clusters of correlations among test results.
link |
So for instance, if you look at homework at school,
link |
you will see that people who do well at math
link |
are also likely statistically to do well in physics.
link |
And what's more, people who do well at math and physics
link |
are also statistically likely to do well
link |
in things that sound completely unrelated,
link |
like writing an English essay, for instance.
link |
And so when you see clusters of correlations
link |
in statistical terms, you would explain them
link |
with the latent variable.
link |
And the latent variable that would, for instance, explain
link |
the relationship between being good at math
link |
and being good at physics would be cognitive ability.
link |
And the G factor is the latent variable
link |
that explains the fact that every test of intelligence
link |
that you can come up with results on this test
link |
end up being correlated.
link |
So there is some single unique variable
link |
that explains these correlations.
link |
That's the G factor.
link |
So it's a statistical construct.
link |
It's not really something you can directly measure,
link |
for instance, in a person.
link |
It's there at scale.
link |
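A small simulation can illustrate this statistical construct. Below, synthetic test scores all share one latent ability, and a single factor of the correlation matrix recovers much of the variance; all numbers are invented purely for illustration and assume only numpy.

```python
# Minimal sketch of the g factor as a latent variable, with
# synthetic data: every test score = (loading * latent g) + noise.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 6
g = rng.normal(size=n_people)                    # latent ability per person
loadings = rng.uniform(0.5, 0.9, size=n_tests)   # how strongly each test taps g
noise = rng.normal(size=(n_people, n_tests))
scores = g[:, None] * loadings + noise

corr = np.corrcoef(scores, rowvar=False)         # all tests end up positively
eigvals, eigvecs = np.linalg.eigh(corr)          # correlated: the "positive manifold"
print(eigvals[-1] / eigvals.sum())               # share of variance captured by
                                                 # the single g-like factor
```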
And that's also one thing I want to mention about psychometrics.
link |
Like when you talk about measuring intelligence
link |
in humans, for instance, some people
link |
get a little bit worried.
link |
They will say, that sounds dangerous.
link |
Maybe that sounds potentially discriminatory, and so on.
link |
And they're not wrong.
link |
And the thing is, personally, I'm
link |
not interested in psychometrics as a way
link |
to characterize one individual person.
link |
Like if I get your psychometric personality
link |
assessments or your IQ, I don't think that actually
link |
tells me much about you as a person.
link |
I think psychometrics is most useful as a statistical tool.
link |
So it's most useful at scale.
link |
It's most useful when you start getting test results
link |
for a large number of people.
link |
And you start cross correlating these test results.
link |
Because that gives you information
link |
about the structure of the human mind,
link |
in particular about the structure
link |
of human cognitive abilities.
link |
So at scale, psychometrics paints a certain picture
link |
of the human mind.
link |
And that's interesting.
link |
And that's what's relevant to AI, the structure
link |
of human cognitive abilities.
link |
Yeah, it gives you an insight into it.
link |
I mean, to me, I remember when I learned about G factor,
link |
it seemed like it would be impossible for it
link |
to be real, even as a statistical variable.
link |
Like it felt kind of like astrology.
link |
Like it's like wishful thinking among psychologists.
link |
But the more I learned, the more I realized that there's something to it.
link |
I mean, I'm not sure what to make about human beings,
link |
the fact that the G factor is a thing.
link |
There's a commonality across all of human species,
link |
that there does seem to be a strong correlation
link |
between cognitive abilities.
link |
That's kind of fascinating, actually.
link |
So human cognitive abilities have a structure.
link |
Like the most mainstream theory of the structure
link |
of cognitive abilities is called CHC theory.
link |
It's Cattell, Horn, Carroll.
link |
It's named after the three psychologists who
link |
contributed key pieces of it.
link |
And it describes cognitive abilities
link |
as a hierarchy with three levels.
link |
And at the top, you have the G factor.
link |
Then you have broad cognitive abilities,
link |
for instance fluid intelligence, that
link |
encompass a broad set of possible kinds of tasks
link |
that are all related.
link |
And then you have narrow cognitive abilities
link |
at the last level, which is closer to task specific skill.
link |
And there are actually different theories of the structure
link |
of cognitive abilities that just emerge
link |
from different statistical analyses of IQ test results.
link |
But they all describe a hierarchy with a kind of G
link |
factor at the top.
link |
And you're right that the G factor,
link |
it's not quite real in the sense that it's not something
link |
you can observe and measure, like your height.
link |
But it's real in the sense that you
link |
see it in a statistical analysis of the data.
link |
One thing I want to mention is that the fact
link |
that there is a G factor does not really
link |
mean that human intelligence is general in a strong sense.
link |
It does not mean human intelligence
link |
can be applied to any problem at all,
link |
and that someone who has a high IQ
link |
is going to be able to solve any problem at all.
link |
That's not quite what it means.
link |
I think one popular analogy to understand it
link |
is the sports analogy.
link |
If you consider the concept of physical fitness,
link |
it's a concept that's very similar to intelligence
link |
because it's a useful concept.
link |
It's something you can intuitively understand.
link |
Some people are fit, maybe like you.
link |
Some people are not as fit, maybe like me.
link |
But none of us can fly.
link |
It's constrained to a specific set of skills.
link |
Even if you're very fit, that doesn't
link |
mean you can do anything at all in any environment.
link |
You obviously cannot fly.
link |
You cannot survive at the bottom of the ocean and so on.
link |
And if you were a scientist and you
link |
wanted to precisely define and measure physical fitness
link |
in humans, then you would come up with a battery of tests.
link |
You would have running 100 meter, playing soccer,
link |
playing table tennis, swimming, and so on.
link |
And if you ran these tests over many different people,
link |
you would start seeing correlations in test results.
link |
For instance, people who are good at soccer
link |
are also good at sprinting.
link |
And you would explain these correlations
link |
with physical abilities that are strictly
link |
analogous to cognitive abilities.
link |
And then you would start also observing correlations
link |
between biological characteristics,
link |
like maybe lung volume is correlated with being
link |
a fast runner, for instance, in the same way
link |
that there are neurophysiological correlates of cognitive abilities.
link |
And at the top of the hierarchy of physical abilities
link |
that you would be able to observe,
link |
you would have a G factor, a physical G factor, which
link |
would map to physical fitness.
link |
And as you just said, that doesn't
link |
mean that people with high physical fitness can't fly.
link |
It doesn't mean human morphology and human physiology are universal.
link |
It's actually super specialized.
link |
We can only do the things that we were evolved to do.
link |
We are not appropriate to, you could not
link |
exist on Venus or Mars or in the void of space
link |
or the bottom of the ocean.
link |
So that said, one thing that's really striking and remarkable
link |
is that our morphology generalizes
link |
far beyond the environments that we evolved for.
link |
Like in a way, you could say we evolved to run after prey
link |
in the savanna, right?
link |
That's very much where our human morphology comes from.
link |
And that said, we can do a lot of things
link |
that are completely unrelated to that.
link |
We can climb mountains.
link |
We can swim across lakes.
link |
We can play table tennis.
link |
I mean, table tennis is very different from what
link |
we were evolved to do, right?
link |
So our morphology, our bodies, our sense and motor
link |
affordances have a degree of generality
link |
that is absolutely remarkable, right?
link |
And I think cognition is very similar to that.
link |
Our cognitive abilities have a degree of generality
link |
that goes far beyond what the mind was initially
link |
supposed to do, which is why we can play music and write
link |
novels and go to Mars and do all kinds of crazy things.
link |
But it's not universal in the same way
link |
that human morphology and our body
link |
is not appropriate for actually most of the universe by volume.
link |
In the same way, you could say that the human mind is not
link |
really appropriate for most of problem space,
link |
potential problem space by volume.
link |
So we have very strong cognitive biases, actually,
link |
that mean that there are certain types of problems
link |
that we handle very well and certain types of problems
link |
that we are completely ill-adapted for.
link |
So that's really how we'd interpret the G factor.
link |
It's not a sign of strong generality.
link |
It's really just the broadest cognitive ability.
link |
But our abilities, whether we are
link |
talking about sensory motor abilities or cognitive
link |
abilities, they still remain very specialized
link |
in the human condition, right?
link |
Within the constraints of human cognition.
link |
But the constraints, as you're saying, are very limited.
link |
I think what's limiting is that
link |
our cognition and our body
link |
evolved in very specific environments.
link |
Because our environment was so variable, fast changing,
link |
and so unpredictable, part of the constraints
link |
that drove our evolution is generality itself.
link |
So we were, in a way, evolved to be able to improvise
link |
in all kinds of physical or cognitive environments.
link |
And for this reason, it turns out
link |
that the minds and bodies that we ended up with
link |
can be applied to much, much broader scope
link |
than what they were evolved for.
link |
And that's truly remarkable.
link |
And that's a degree of generalization
link |
that is far beyond anything you can see in artificial systems today.
link |
That said, it does not mean that human intelligence
link |
is anywhere near universal.
link |
Yeah, it's not general.
link |
A kind of exciting topic for people,
link |
even outside of artificial intelligence, is IQ tests.
link |
I think it's Mensa, whatever.
link |
There's different degrees of difficulty for questions.
link |
We talked about this offline a little bit, too,
link |
about difficult questions.
link |
What makes a question on an IQ test more difficult or less
link |
difficult, do you think?
link |
So the thing to keep in mind is that there's
link |
no such thing as a question that's intrinsically difficult.
link |
It has to be difficult with respect to the things you
link |
already know and the things you can already do, right?
link |
So in terms of an IQ test question,
link |
typically it would be structured, for instance,
link |
as a set of demonstration input and output pairs, right?
link |
And then you would be given a test input, a prompt,
link |
and you would need to recognize or produce
link |
the corresponding output.
link |
And in that narrow context, you could say a difficult question
link |
is a question where the input prompt is
link |
very surprising and unexpected, given the training examples.
link |
Just even the nature of the patterns
link |
that you're observing in the input prompt.
link |
For instance, let's say you have a rotation problem.
link |
You must rotate the shape by 90 degrees.
link |
If I give you two examples and then I give you one prompt,
link |
which is actually one of the two training examples,
link |
then there is zero generalization difficulty.
link |
It's actually a trivial task.
link |
You just recognize that it's one of the training examples,
link |
and you produce the same answer.
link |
Now, if it's a more complex shape,
link |
there is a little bit more generalization,
link |
but it remains that you are still
link |
doing the same thing at test time
link |
as you were shown at training time.
link |
A difficult task starts to require some amount of test
link |
time adaptation, some amount of improvisation, right?
link |
So consider, I don't know, you're
link |
teaching a class on quantum physics or something.
link |
If you wanted to test the understanding that students
link |
have of the material, you would come up
link |
with an exam that's very different from anything
link |
they've seen on the internet when they were cramming.
link |
On the other hand, if you wanted to make it easy,
link |
you would just give them something
link |
that's very similar to the mock exams
link |
that they've taken, something that's
link |
just a simple interpolation of questions
link |
that they've already seen.
link |
And so that would be an easy exam.
link |
It's very similar to what you've been trained on.
link |
And a difficult exam is one that really probes your understanding
link |
because it forces you to improvise.
link |
It forces you to do things that are
link |
different from what you were exposed to before.
link |
So that said, it doesn't mean that the exam that
link |
requires improvisation is intrinsically hard, right?
link |
Because maybe you're a quantum physics expert.
link |
So when you take the exam, this is actually
link |
stuff that, despite being new to the students,
link |
it's not new to you, right?
link |
So it can only be difficult with respect
link |
to what the test taker already knows
link |
and with respect to the information
link |
that the test taker has about the task.
link |
So that's what I mean by controlling for priors,
link |
the information you bring to the table.
link |
And the experience.
link |
And the experience, which is the training data.
link |
So in the case of the quantum physics exam,
link |
that would be all the course material itself
link |
and all the mock exams that students
link |
might have taken online.
link |
Yeah, it's interesting because I've also sent you an email.
link |
I asked you, I've just been curious about this question
link |
of what's a really hard IQ test question.
link |
And I've been talking to also people
link |
who have designed IQ tests.
link |
There's a few folks on the internet, it's like a thing.
link |
People are really curious about it.
link |
First of all, most of the IQ tests they designed,
link |
they, like, religiously protect the correct answers.
link |
Like you can't find the correct answers anywhere.
link |
In fact, the question is ruined once you know,
link |
even like the approach you're supposed to take.
link |
So they're very...
link |
That said, the approach is implicit in the training examples.
link |
So if you release the training examples, it's over.
link |
Which is why in Arc, for instance,
link |
there is a test set that is private and no one has seen it.
link |
No, for really tough IQ questions, it's not obvious.
link |
Partly because of the ambiguity.
link |
Like it's, I mean, we'll have to look through them,
link |
but like some number sequences and so on,
link |
it's not completely clear.
link |
So like you can get a sense, but there's like some,
link |
you know, when you look at a number sequence, I don't know,
link |
like the Fibonacci number sequence,
link |
if you look at the first few numbers,
link |
that sequence could be completed in a lot of different ways.
link |
And you know, some are, if you think deeply,
link |
are more correct than others.
link |
Like there's a kind of intuitive simplicity
link |
and elegance to the correct solution.
link |
I am personally not a fan of ambiguity
link |
in test questions actually,
link |
but I think you can have difficulty
link |
without requiring ambiguity simply by making the test
link |
require a lot of extrapolation over the training examples.
link |
But a beautiful question is difficult,
link |
but gives away everything
link |
when you give the training examples.
link |
Meaning that, so the tests I'm interested in creating
link |
are not necessarily difficult for humans
link |
because human intelligence is the benchmark.
link |
They're supposed to be difficult for machines
link |
in ways that are easy for humans.
link |
Like I think an ideal test of human and machine intelligence
link |
is a test that is actionable,
link |
that highlights the need for progress,
link |
and that highlights the direction
link |
in which you should be making progress.
link |
I think we'll talk about the ARC challenge
link |
and the test you've constructed
link |
and you have these elegant examples.
link |
I think that highlight,
link |
like this is really easy for us humans,
link |
but it's really hard for machines.
link |
But on the, you know, the designing an IQ test
link |
for IQs of like higher than 160 and so on,
link |
you have to say, you have to take that
link |
and put it on steroids, right?
link |
You have to think like, what is hard for humans?
link |
And that's a fascinating exercise in itself, I think.
link |
And it was an interesting question
link |
of what it takes to create a really hard question for humans
link |
because you again have to do the same process
link |
as you mentioned, which is, you know,
link |
something basically where the experience
link |
that you are likely to have encountered
link |
throughout your whole life,
link |
even if you've prepared for IQ tests,
link |
which is a big challenge,
link |
that this will still be novel for you.
link |
Yeah, I mean, novelty is a requirement.
link |
You should not be able to practice for the questions
link |
that you're gonna be tested on.
link |
That's important because otherwise what you're doing
link |
is not exhibiting intelligence.
link |
What you're doing is just retrieving
link |
what you've been exposed to before.
link |
It's the same thing as a deep learning model.
link |
If you train a deep learning model
link |
on all the possible answers, then it will ace your test
link |
in the same way that, you know,
link |
a stupid student can still ace the test
link |
if they cram for it.
link |
They memorize, you know,
link |
a hundred different possible mock exams.
link |
And then they hope that the actual exam
link |
will be a very simple interpolation of the mock exams.
link |
And that student could just be a deep learning model.
link |
But you can actually do that
link |
without any understanding of the material.
link |
And in fact, many students pass their exams
link |
in exactly this way.
link |
And if you want to avoid that,
link |
you need an exam that's unlike anything they've seen
link |
that really probes their understanding.
link |
So how do we design an IQ test for machines,
link |
an intelligent test for machines?
link |
All right, so in the paper I outline
link |
a number of requirements that you expect of such a test.
link |
And in particular, we should start by acknowledging
link |
the priors that we expect to be required
link |
in order to perform the test.
link |
So we should be explicit about the priors, right?
link |
And if the goal is to compare machine intelligence
link |
and human intelligence,
link |
then we should assume human cognitive priors, right?
link |
And secondly, we should make sure that we are testing
link |
for skill acquisition ability,
link |
skill acquisition efficiency in particular,
link |
and not for skill itself.
link |
Meaning that every task featured in your test
link |
should be novel and should not be something
link |
that you can anticipate.
link |
So for instance, it should not be possible
link |
to brute force the space of possible questions, right?
link |
To pre generate every possible question and answer.
link |
So it should be tasks that cannot be anticipated,
link |
not just by the system itself,
link |
but by the creators of the system, right?
link |
Yeah, you know what's fascinating?
link |
I mean, one of my favorite aspects of the paper
link |
and the work you do with the ARC challenge
link |
is the process of making priors explicit.
link |
Just even that act alone is a really powerful one
link |
of like, what are, it's a really powerful question
link |
to ask of us humans.
link |
What are the priors that we bring to the table?
link |
So the next step is like, once you have those priors,
link |
how do you use them to solve a novel task?
link |
But like, just even making the priors explicit
link |
is a really difficult and really powerful step.
link |
And that's like visually beautiful
link |
and conceptually philosophically beautiful part
link |
of the work you did with, and I guess continue to do
link |
probably with the paper and the ARC challenge.
link |
Can you talk about some of the priors
link |
that we're talking about here?
link |
Yes, so a researcher who has done a lot of work
link |
on what exactly are the knowledge priors
link |
that are innate to humans is Elizabeth Spelke from Harvard.
link |
So she developed the core knowledge theory,
link |
which outlines four different core knowledge systems.
link |
So systems of knowledge that we are basically
link |
either born with or that we are hardwired
link |
to acquire very early on in our development.
link |
And there's no strong distinction between the two.
link |
Like if you are primed to acquire
link |
a certain type of knowledge in just a few weeks,
link |
you might as well just be born with it.
link |
It's just part of who you are.
link |
And so there are four different core knowledge systems.
link |
Like the first one is the notion of objectness
link |
and basic physics.
link |
Like you recognize that something that moves
link |
coherently, for instance, is an object.
link |
So we intuitively, naturally, innately divide the world
link |
into objects based on this notion of coherence,
link |
physical coherence.
link |
And in terms of elementary physics,
link |
there's the fact that objects can bump against each other
link |
and the fact that they can occlude each other.
link |
So these are things that we are essentially born with
link |
or at least that we are going to be acquiring extremely early
link |
because we're really hardwired to acquire them.
link |
So a bunch of points, pixels that move together
link |
are part of the same object.
link |
I don't smoke weed, but if I did,
link |
that's something I could sit all night
link |
and just think about. I remember reading in your paper
link |
about objectness. I wasn't self aware, I guess,
link |
of that particular prior.
link |
That's such a fascinating prior that like...
link |
That's the most basic one, but actually...
link |
Objectness, just identity, just objectness.
link |
It's very basic, I suppose, but it's so fundamental.
link |
It is fundamental to human cognition.
link |
The second prior that's also fundamental is agentness,
link |
which is not a real word, so, agentness.
link |
The fact that some of these objects
link |
that you segment your environment into,
link |
some of these objects are agents.
link |
So what's an agent?
link |
It's basically, it's an object that has goals.
link |
That has goals, that is capable of pursuing goals.
link |
So for instance, if you see two dots
link |
moving in roughly synchronized fashion,
link |
you will intuitively infer that one of the dots
link |
is pursuing the other.
link |
So that one of the dots is...
link |
And one of the dots is an agent
link |
and its goal is to avoid the other dot.
link |
And one of the dots, the other dot is also an agent
link |
and its goal is to catch the first dot.
link |
Spelke has shown that babies as young as three months
link |
identify agentness and goal directedness
link |
in their environment.
link |
Another prior is basic geometry and topology,
link |
like the notion of distance,
link |
the ability to navigate in your environment and so on.
link |
This is something that is fundamentally hardwired into us.
link |
It's in fact backed by very specific neural mechanisms,
link |
like for instance, grid cells and place cells.
link |
So it's something that's literally hard coded
link |
at the neural level in our hippocampus.
link |
And the last prior would be the notion of numbers.
link |
Like numbers are not actually a cultural construct.
link |
We are intuitively, innately able to do some basic counting
link |
and to compare quantities.
link |
So it doesn't mean we can do arbitrary arithmetic.
link |
Counting, the actual counting.
link |
Counting, like counting one, two, three ish,
link |
then maybe more than three.
link |
You can also compare quantities.
link |
If I give you three dots and five dots,
link |
you can tell the side with five dots has more dots.
link |
So this is actually an innate prior.
link |
So that said, the list may not be exhaustive.
link |
So Spelke is still, you know,
link |
positing the potential existence of new knowledge systems.
link |
For instance, knowledge systems that we deal
link |
with social relationships.
link |
Yeah, I mean, and there could be...
link |
Which is much less relevant to something like ARC
link |
or IQ tests and so on.
link |
There could be stuff that's like you said,
link |
rotation, symmetry, is there like...
link |
Symmetry is really interesting.
link |
It's very likely that there is, speaking about rotation,
link |
that there is in the brain, a hard coded system
link |
that is capable of performing rotations.
link |
One famous experiment that people did in the...
link |
I don't remember when exactly,
link |
but in the 70s, was that
link |
if you give people two different shapes
link |
and one of the shapes is a rotated version
link |
of the first shape, and you ask them,
link |
is that shape a rotated version of the first shape or not?
link |
What you see is that the time it takes people to answer
link |
is linearly proportional, right, to the angle of rotation.
link |
So it's almost like you have somewhere in your brain
link |
like a turntable with a fixed speed.
link |
And if you want to know if two objects are a rotated version
link |
of each other, you put the object on the turntable,
link |
you let it move around a little bit,
link |
and then you stop when you have a match.
link |
And that's really interesting.
link |
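On grid shapes like ARC's, that turntable idea has a natural discrete analogue: step through 90-degree rotations until the candidate matches, with the number of steps standing in for reaction time. A minimal sketch follows; the restriction to quarter turns is my simplification, since the human experiment used continuous angles.

```python
import numpy as np

def rotation_match(shape_a, shape_b):
    """Return how many 90-degree turntable steps map shape_a onto
    shape_b, or None if shape_b is not a rotation of shape_a."""
    for steps in range(4):
        if np.array_equal(shape_a, shape_b):  # also False on shape mismatch
            return steps
        shape_a = np.rot90(shape_a)           # one turntable step
    return None

a = np.array([[1, 0],
              [1, 1]])
print(rotation_match(a, np.rot90(a, 2)))  # 2: match found after two steps
```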
So what's the ARC challenge?
link |
So in the paper, I outline all these principles
link |
that a good test of machine intelligence
link |
and human intelligence should follow.
link |
And the ARC challenge is one attempt
link |
to embody as many of these principles as possible.
link |
So I don't think it's anywhere near a perfect attempt, right?
link |
It does not actually follow every principle,
link |
but it is what I was able to do given the constraints.
link |
So the format of ARC is very similar to classic IQ tests,
link |
in particular Raven's Progressive Matrices.
link |
Yeah, Raven's Progressive Matrices.
link |
I mean, if you've done IQ tests in the past,
link |
you know what that is, probably.
link |
Or at least you've seen it, even if you
link |
don't know what it's called.
link |
And so you have a set of tasks, that's what they're called.
link |
And for each task, you have training data,
link |
which is a set of input and output pairs.
link |
So an input or output is a grid of colors, basically.
link |
The size of the grid is variable.
link |
And you're given an input, and you must transform it
link |
into the proper output.
link |
And so you're shown a few demonstrations
link |
of a task in the form of existing input output pairs,
link |
and then you're given a new input.
link |
And you must produce the correct output.
link |
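For reference, the public ARC tasks are distributed as JSON files shaped roughly like the sketch below: a "train" list of demonstration pairs and a "test" list, with each grid a list of lists of integers 0 through 9, one integer per color. The specific task shown here (copy the input unchanged) is invented for illustration.

```python
# A made-up ARC-style task in the rough shape of the public JSON files.
task = {
    "train": [
        {"input": [[1, 0], [0, 1]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 2], [0, 2]], "output": [[2, 2], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0], [3, 3]], "output": [[3, 0], [3, 3]]},
    ],
}

def skill_program(grid):
    # The "skill program" a solver must induce for this trivial
    # task is just the identity map.
    return grid

assert all(skill_program(p["input"]) == p["output"] for p in task["train"])
```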
And the assumption in ARC is that every task should only
link |
require core knowledge priors, should not
link |
require any outside knowledge.
link |
So for instance, no language, no English, nothing like this.
link |
No concepts taken from our human experience,
link |
like trees, dogs, cats, and so on.
link |
So only reasoning tasks that are built on top
link |
of core knowledge priors.
link |
And some of the tasks are actually explicitly
link |
trying to probe specific forms of abstraction.
link |
Part of the reason why I wanted to create Arc
link |
is I'm a big believer in when you're
link |
faced with a problem as murky as understanding
link |
how to autonomously generate abstraction in a machine,
link |
you have to coevolve the solution and the problem.
link |
And so part of the reason why I designed Arc
link |
was to clarify my ideas about the nature of abstraction.
link |
And some of the tasks are actually
link |
designed to probe bits of that theory.
link |
And there are things that turn out
link |
to be very easy for humans to perform, including young kids,
link |
but turn out to be near impossible for machines.
link |
So what have you learned from the nature of abstraction
link |
from designing that?
link |
Can you clarify what you mean?
link |
One of the things you wanted to try to understand
link |
was this idea of abstraction.
link |
Yes, so clarifying my own ideas about abstraction
link |
by forcing myself to produce tasks that
link |
would require the ability to produce
link |
that form of abstraction in order to solve them.
link |
OK, so, and by the way, people should check it out.
link |
I'll probably overlay if you're watching the video part.
link |
But the grid input output with the different colors
link |
on the grid, that's it.
link |
I mean, it's a very simple world,
link |
but it's kind of beautiful.
link |
It's very similar to classic IQ tests.
link |
It's not very original in that sense.
link |
The main difference with IQ tests
link |
is that we make the priors explicit, which is not
link |
usually the case in IQ tests.
link |
So you make it explicit that everything should only
link |
be built on top of core knowledge priors.
link |
I also think it's generally more diverse than IQ tests.
link |
And it perhaps requires a bit more manual work
link |
to produce solutions, because you
link |
have to click around on a grid for a while.
link |
Sometimes the grids can be as large as 30 by 30 cells.
link |
So how did you come up, if you can reveal, with the questions?
link |
What's the process of the questions?
link |
Was it mostly you that came up with the questions?
link |
How difficult is it to come up with a question?
link |
Is this scalable to a much larger number?
link |
If we think, with IQ tests, you might not necessarily
link |
want it to or need it to be scalable.
link |
With machines, it's possible, you
link |
could argue, that it needs to be scalable.
link |
So there are 1,000 questions, 1,000 tasks,
link |
including the test set, the private test set.
link |
I think it's fairly difficult in the sense
link |
that a big requirement is that every task should
link |
be novel and unique and unpredictable.
link |
You don't want to create your own little world that
link |
is simple enough that it would be possible for a human
link |
to reverse engineer and write down
link |
an algorithm that could generate every possible ARC
link |
task and their solution.
link |
So that would completely invalidate the test.
link |
So you're constantly coming up with new stuff.
link |
Yeah, you need a source of novelty,
link |
of unfakeable novelty.
link |
And one thing I found is that, as a human,
link |
you are not a very good source of unfakeable novelty.
link |
And so you have to pace the creation of these tasks.
link |
There are only so many unique tasks
link |
that you can do in a given day.
link |
So that means coming up with truly original new ideas.
link |
Did psychedelics help you at all?
link |
No, I'm just kidding.
link |
But I mean, that's fascinating to think about.
link |
So you would be walking or something like that.
link |
Are you constantly thinking of something totally new?
link |
Yeah, I mean, I'm not saying I've done anywhere
link |
near a perfect job at it.
link |
There is some amount of redundancy,
link |
and there are many imperfections in ARC.
link |
So that said, you should consider
link |
ARC as a work in progress.
link |
It is not the definitive state.
link |
The ARC tasks today are not the definitive state of the test.
link |
I want to keep refining it in the future.
link |
I also think it should be possible to open up
link |
the creation of tasks to a broad audience
link |
to do crowdsourcing.
link |
That would involve several levels of filtering.
link |
But I think it's possible to apply crowdsourcing
link |
to develop a much bigger and much more diverse ARC data set.
link |
That would also be free of potentially some
link |
of my own personal biases.
link |
Does there always need to be a part of ARC
link |
where the test is hidden?
link |
It is imperative that the tests that you're
link |
using to actually benchmark algorithms
link |
is not accessible to the people developing these algorithms.
link |
Because otherwise, what's going to happen
link |
is that the human engineers are just
link |
going to solve the tasks themselves
link |
and encode their solution in program form.
link |
But that, again, what you're seeing here
link |
is the process of intelligence happening
link |
in the mind of the human.
link |
And then you're just capturing its crystallized output.
link |
But that crystallized output is not the same thing
link |
as the process it generated.
link |
It's not intelligent in itself.
link |
So, by the way, on the idea of crowdsourcing it,
link |
I think the creation of questions
link |
is really exciting for people.
link |
I think there's a lot of really brilliant people
link |
out there that love to create these kinds of stuff.
link |
Yeah, one thing that kind of surprised me
link |
that I wasn't expecting is that lots of people
link |
seem to actually enjoy ARC as a kind of game.
link |
And I was releasing it as a test,
link |
as a benchmark of fluid general intelligence.
link |
And lots of people just, including kids,
link |
just started enjoying it as a game.
link |
So I think that's encouraging.
link |
Yeah, I'm fascinated by it.
link |
There's a world of people who create IQ questions.
link |
I think that's a cool activity for machines and for humans.
link |
And humans are themselves fascinated
link |
by taking the questions, like measuring
link |
their own intelligence.
link |
I mean, that's just really compelling.
link |
It's really interesting to me, too.
link |
One of the cool things about ARC, you said,
link |
is kind of inspired by IQ tests or whatever
link |
follows a similar process.
link |
But because of its nature, because of the context
link |
in which it lives, it immediately
link |
forces you to think about the nature of intelligence
link |
as opposed to just a test of your own intelligence.
link |
It forces you to really think.
link |
I don't know if it's within the question,
link |
inherent in the question, or just the fact
link |
that it lives in the test that's supposed
link |
to be a test of machine intelligence.
link |
As you solve ARC tasks as a human,
link |
you will be forced to basically introspect
link |
how you come up with solutions.
link |
And that forces you to reflect on the human problem solving process.
link |
And the way your own mind generates
link |
abstract representations of the problems it's exposed to.
link |
I think it's due to the fact that the set of core knowledge
link |
priors that ARC is built upon is so small.
link |
It's all a recombination of a very, very small set of priors.
link |
OK, so what's the future of ARC?
link |
So you held ARC as a challenge, as part
link |
of like a Kaggle competition.
link |
Kaggle competition.
link |
And what do you think?
link |
Do you think that's something that
link |
continues for five years, 10 years,
link |
like just continues growing?
link |
So ARC itself will keep evolving.
link |
So I've talked about crowdsourcing.
link |
I think that's a good avenue.
link |
Another thing I'm starting is I'll
link |
be collaborating with folks from the psychology department
link |
at NYU to do human testing on ARC.
link |
And I think there are lots of interesting questions
link |
you can start asking, especially as you start correlating
link |
machine solutions to ARC tasks and the characteristics of human solutions.
link |
Like for instance, you can try to see
link |
if there's a relationship between the human perceived
link |
difficulty of a task and the machine perceived.
link |
Yes, and exactly some measure of machine
link |
perceived difficulty.
link |
Yeah, it's a nice playground in which
link |
to explore this very difference.
link |
It's the same thing as we talked about the autonomous vehicles.
link |
The things that could be difficult for humans
link |
might be very different than the things that are difficult for machines.
link |
And formalizing or making explicit that difference
link |
in difficulty may teach us something fundamental
link |
about intelligence.
link |
So one thing I think we did well with ARC
link |
is that it's proving to be a very actionable test in the sense
link |
that machine performance on ARC started at very much zero
link |
initially, while humans actually found the tasks very easy.
link |
And that alone was like a big red flashing light saying
link |
that something is going on and that we are missing something.
link |
And at the same time, machine performance
link |
did not stay at zero for very long.
link |
Actually, within two weeks of the Kaggle competition,
link |
we started having a nonzero number.
link |
And now the state of the art is around 20%
link |
of the test set solved.
link |
And so ARC is actually a challenge
link |
where our capabilities start at zero, which indicates
link |
the need for progress.
link |
But it's also not an impossible challenge.
link |
It's not inaccessible.
link |
You can start making progress basically right away.
link |
At the same time, we are still very far
link |
from having solved it.
link |
And that's actually a very positive outcome
link |
of the competition is that the competition has proven
link |
that there was no obvious shortcut to solve these tasks.
link |
Yeah, so the test held up.
link |
That was the primary reason to use the Kaggle competition
link |
is to check if some clever person was
link |
going to hack the benchmark that did not happen.
link |
People who are solving the tasks are essentially doing it...
link |
Well, in a way, they're actually exploring some flaws of ARC
link |
that we will need to address in the future,
link |
especially they're essentially anticipating
link |
what sort of tasks may be contained in the test set.
link |
Right, which is kind of, yeah, that's the kind of hacking.
link |
It's human hacking of the test.
link |
Yes, that said, with the state of the art,
link |
at like 20%, we're still very, very far from human level,
link |
which is closer to 100%.
link |
And I do believe that it will take a while
link |
until we reach human parity on ARC.
link |
And that by the time we have human parity,
link |
we will have AI systems that are probably
link |
pretty close to human level in terms of general fluid
link |
intelligence, which is, I mean, they are not
link |
going to be necessarily human like.
link |
They're not necessarily, you would not necessarily
link |
recognize them as being an AGI.
link |
But they would be capable of a degree of generalization
link |
that matches the generalization performed
link |
by human fluid intelligence.
link |
I mean, this is a good point in terms
link |
of general fluid intelligence to mention in your paper.
link |
You describe different kinds of generalizations,
link |
local, broad, extreme.
link |
And there's a kind of a hierarchy that you form.
link |
So when we say generalizations, what are we talking about?
link |
What kinds are there?
link |
Right, so generalization is a very old idea.
link |
I mean, it's even older than machine learning.
link |
In the context of machine learning,
link |
you say a system generalizes if it can make sense of an input
link |
it has not yet seen.
link |
And that's what I would call system centric generalization,
link |
generalization with respect to novelty
link |
for the specific system you're considering.
link |
So I think a good test of intelligence
link |
should actually deal with developer aware generalization,
link |
which is slightly stronger than system centric generalization.
link |
So developer aware generalization
link |
would be the ability to generalize
link |
to novelty or uncertainty that not only the system itself has
link |
not access to, but the developer of the system
link |
could not have access to either.
link |
That's a fascinating meta definition.
link |
So the system is basically the edge case thing
link |
we're talking about with autonomous vehicles.
link |
Neither the developer nor the system
link |
know about the edge cases it may encounter.
link |
So the system should be
link |
able to generalize to things that nobody expected,
link |
neither the designer of the training data,
link |
nor obviously the contents of the training data.
link |
That's a fascinating definition.
link |
So you can see degrees of generalization as a spectrum.
link |
And the lowest level is what machine learning
link |
is trying to do: the assumption
link |
that any new situation is going to be sampled
link |
from a static distribution of possible situations
link |
and that you already have a representative sample
link |
of the distribution.
link |
That's your training data.
link |
And so in machine learning, you generalize to a new sample
link |
from a known distribution.
link |
And the ways in which your new sample will be new or different
link |
are ways that are already understood by the developers of the system.
link |
So you are generalizing to known unknowns
link |
for one specific task.
link |
That's what you would call robustness.
link |
You are robust to things like noise, small variations,
link |
and so on for one fixed known distribution
link |
that you know through your training data.
link |
And the higher degree would be flexibility
link |
in machine intelligence.
link |
So flexibility would be something
link |
like an L5 self-driving car or maybe a robot that
link |
can pass the coffee cup test, which
link |
is the notion that you'd be given a random kitchen
link |
somewhere in the country.
link |
And you would have to go make a cup of coffee in that kitchen.
link |
So flexibility would be the ability
link |
to deal with unknown unknowns, so things that could not,
link |
dimensions of variability that could not
link |
have been possibly foreseen by the creators of the system
link |
within one specific task.
link |
So generalizing to the long tail of situations in self driving,
link |
for instance, would be flexibility.
link |
So you have robustness, flexibility, and finally,
link |
you would have extreme generalization,
link |
which is basically flexibility, but instead
link |
of just considering one specific domain,
link |
like driving or domestic robotics,
link |
you're considering an open ended range of possible domains.
link |
So a robot would be capable of extreme generalization
link |
if, let's say, it's designed and trained for cooking,
link |
and if I buy the robot and it's
link |
able to teach itself gardening in a couple of weeks,
link |
it would be capable of extreme generalization, for instance.
link |
So the ultimate goal is extreme generalization.
link |
So creating a system that is so general that it could
link |
essentially achieve human skill parity over arbitrary tasks
link |
and arbitrary domains with the same level of improvisation
link |
and adaptation power as humans when
link |
it encounters new situations.
link |
And it would do so over basically the same range
link |
of possible domains and tasks as humans
link |
and using essentially the same amount of training
link |
experience or practice as humans would require.
link |
That would be human level extreme generalization.
link |
So I don't actually think humans are anywhere
link |
near the optimal intelligence bounds
link |
if there is such a thing.
link |
So you think that's the case for humans, or in general?
link |
I think it's quite likely that there
link |
is a hard limit to how intelligent any system can be.
link |
But at the same time, I don't think humans are anywhere near that limit.
link |
Yeah, last time I think we talked,
link |
I think you had this idea that we're only
link |
as intelligent as the problems we face.
link |
Sort of we are bounded by the problems.
link |
We are bounded by our environments,
link |
and we are bounded by the problems we try to solve.
link |
What do you make of Neuralink and outsourcing
link |
some of the brain power, like brain computer interfaces?
link |
Do you think we can expand or augment our intelligence?
link |
I am fairly skeptical of neural interfaces
link |
because they are trying to fix one specific bottleneck
link |
in human machine cognition, which
link |
is the bandwidth bottleneck, input and output
link |
of information in the brain.
link |
And my perception of the problem is that bandwidth is not
link |
at this time a bottleneck at all.
link |
Meaning that we already have sensors
link |
that enable us to take in far more information than what
link |
we can actually process.
link |
Well, to push back on that a little bit,
link |
to sort of play devil's advocate a little bit,
link |
is if you look at the internet, Wikipedia, let's say Wikipedia,
link |
I would say that humans, after the advent of Wikipedia,
link |
are much more intelligent.
link |
Yes, I think that's a good one.
link |
But that's about externalizing
link |
our intelligence via information processing systems,
link |
external information processing systems,
link |
which is very different from brain computer interfaces.
link |
Right, but the question is whether if we have direct
link |
access, if our brain has direct access to Wikipedia without
link |
Your brain already has direct access to Wikipedia.
link |
It's on your phone.
link |
And you have your hands and your eyes and your ears
link |
and so on to access that information.
link |
And the speed at which you can access it
link |
Is bottlenecked by the cognition.
link |
I think it's already close, fairly close to optimal,
link |
which is why speed reading, for instance, does not work.
link |
The faster you read, the less you understand.
link |
But maybe it's because it uses the eyes.
link |
So I don't believe so.
link |
I think the brain is very slow.
link |
It typically operates, you know, the fastest things
link |
that happen in the brain are at the level of 50 milliseconds.
link |
Forming a conscious thought can potentially
link |
take entire seconds, right?
link |
And you can already read pretty fast.
link |
So I think the speed at which you can take information in
link |
and even the speed at which you can output information
link |
can only be very incrementally improved.
link |
Maybe there's a question.
link |
If you're a very fast typer, if you're a very trained typer,
link |
the speed at which you can express your thoughts
link |
is already the speed at which you can form your thoughts.
link |
Right, so that's kind of an idea that there are
link |
fundamental bottlenecks to the human mind.
link |
But it's possible that everything we have
link |
in the human mind is just to be able to survive
link |
in the environment.
link |
And there's a lot more to expand.
link |
Maybe, you know, you said the speed of the thought.
link |
So I think augmenting human intelligence
link |
is a very valid and very powerful avenue, right?
link |
And that's what computers are about.
link |
In fact, that's what all of culture and civilization is about.
link |
Our culture is externalized cognition
link |
and we rely on culture to think constantly.
link |
Yeah, I mean, that's another, yeah.
link |
Not just computers, not just phones and the internet.
link |
I mean, all of culture, like language, for instance,
link |
is a form of externalized cognition.
link |
Books are obviously externalized cognition.
link |
Yeah, that's a good point.
link |
And you can scale that externalized cognition
link |
far beyond the capability of the human brain.
link |
And you could see civilization itself
link |
as having capabilities that are far beyond any individual brain,
link |
and it will keep scaling because it's not
link |
bound by individual brains.
link |
It's a different kind of system.
link |
Yeah, and that system includes nonhumans.
link |
First of all, it includes all the other biological systems,
link |
which are probably contributing to the overall intelligence.
link |
And then computers are part of it.
link |
Nonhuman systems are probably not contributing much,
link |
but AIs are definitely contributing to that.
link |
Like Google search, for instance, is a big part of it.
link |
Yeah, yeah, a huge part, a part that we probably can't fully comprehend.
link |
Like how the world has changed in the past 20 years,
link |
it's probably very difficult for us
link |
to be able to understand until, of course,
link |
whoever created the simulation we're in is probably
link |
doing metrics, measuring the progress.
link |
There was probably a big spike in performance.
link |
They're enjoying this.
link |
So what are your thoughts on the Turing test
link |
and the Lobner Prize, which is one
link |
of the most famous attempts at the test of artificial
link |
intelligence by doing a natural language open dialogue test
link |
that's judged by humans as far as how well the machine did?
link |
So I'm not a fan of the Turing test
link |
itself or any of its variants, for two reasons.
link |
So first of all, it's really copping out
link |
of trying to define and measure intelligence
link |
because it's entirely outsourcing that
link |
to a panel of human judges.
link |
And these human judges, they may not themselves
link |
have any proper methodology.
link |
They may not themselves have any proper definition of intelligence.
link |
They may not be reliable.
link |
So the Turing test is already failing
link |
one of the core psychometrics principles, which
link |
is reliability because you have biased human judges.
link |
It's also violating the standardization requirement
link |
and the freedom from bias requirement.
link |
And so it's really a cop out because you are outsourcing
link |
everything that matters, which is precisely describing
link |
intelligence and finding a standalone test to measure it.
link |
You're outsourcing everything to people.
link |
So it's really a cop out.
link |
And by the way, we should keep in mind
link |
that when Turing proposed the imitation game,
link |
he did not mean for the imitation game
link |
to be an actual goal for the field of AI
link |
or an actual test of intelligence.
link |
He was using the imitation game as a thought experiment
link |
in a philosophical discussion in his 1950 paper.
link |
He was trying to argue that theoretically, it
link |
should be possible for something very much like the human mind,
link |
indistinguishable from the human mind,
link |
to be encoded in a Turing machine.
link |
And at the time, that was a very daring idea.
link |
It was stretching credulity.
link |
But nowadays, I think it's fairly well accepted
link |
that the mind is an information processing system
link |
and that you could probably encode it into a computer.
link |
So another reason why I'm not a fan of this type of test
link |
is that the incentives that it creates
link |
are incentives that are not conducive to proper scientific research.
link |
If your goal is to trick, to convince a panel of human
link |
judges that they are talking to a human,
link |
then you have an incentive to rely on tricks
link |
and prestidigitation.
link |
In the same way that, let's say, you're doing physics
link |
and you want to solve teleportation.
link |
And what if the test that you set out to pass
link |
is you need to convince a panel of judges
link |
that teleportation took place?
link |
And they're just sitting there and watching what you're doing.
link |
And that is something that David
link |
Copperfield could achieve in his show in Vegas.
link |
And what he's doing is very elaborate.
link |
But it's not physics.
link |
It's not making any progress in our understanding of physics.
link |
To push back on that, the hope
link |
with these kinds of subjective evaluations
link |
is that it's easier to solve it generally
link |
than it is to come up with tricks that convince
link |
a large number of judges.
link |
In practice, it turns out that it's
link |
very easy to deceive people in the same way
link |
that you can do magic in Vegas.
link |
You can actually very easily convince people
link |
that they're talking to a human when they're actually
link |
talking to an algorithm.
link |
I disagree with that.
link |
I think it's easy.
link |
No, it's not easy.
link |
It's very easy because we are biased.
link |
We have theory of mind.
link |
We are constantly projecting emotions, intentions, agentness.
link |
Agentness is one of our core innate priors.
link |
We are projecting these things on everything around us.
link |
Like if you paint a smiley on a rock,
link |
the rock becomes happy in our eyes.
link |
And because we have this extreme bias that
link |
permeates everything we see around us,
link |
it's actually pretty easy to trick people.
link |
I just disagree with that.
link |
I so totally disagree with that.
link |
What you brilliantly put is the anthropomorphization
link |
that we naturally do, the agentness, to use that word.
link |
Is that a real word?
link |
No, it's not a real word.
link |
But it's a useful word.
link |
It's a useful word.
link |
Let's make it real.
link |
But I still think it's really difficult to convince people.
link |
If you do like the Alexa Prize formulation,
link |
where you talk for an hour, there are
link |
formulations of the test you can create,
link |
where it's very difficult.
link |
So I like the Alexa Prize better because it's more pragmatic.
link |
It's more practical.
link |
It's actually incentivizing developers
link |
to create something that's useful as a human machine interface.
link |
So that's slightly better than just the imitation game.
link |
Your idea is like a test which hopefully
link |
helps us in creating intelligent systems as a result.
link |
Like if you create a system that passes it,
link |
it'll be useful for creating further intelligent systems.
link |
Just to kind of comment, I'm a little bit surprised
link |
how little inspiration people draw from the Turing test.
link |
The media and the popular press might write about it
link |
every once in a while.
link |
The philosophers might talk about it.
link |
But most engineers are not really inspired by it.
link |
And I know you don't like the Turing test,
link |
but we'll have this argument another time.
link |
There's something inspiring about it, I think.
link |
As a philosophical device in a philosophical discussion,
link |
I think there is something very interesting about it.
link |
I don't think it is in practical terms.
link |
I don't think it's conducive to progress.
link |
And one of the reasons why is that I
link |
think being very human like, being
link |
indistinguishable from a human is actually
link |
the very last step in the creation of machine intelligence.
link |
The first AIs that will show strong generalization,
link |
that will actually implement human like broad cognitive
link |
abilities, they will not actually behave or look
link |
anything like humans.
link |
Human likeness is the very last step in that process.
link |
And so a good test is a test that
link |
points you towards the first step on the ladder,
link |
not towards the top of the ladder.
link |
So to push back on that, I usually
link |
agree with you on most things.
link |
I remember you, I think at some point,
link |
tweeting something about the Turing test
link |
being counterproductive
link |
or something like that.
link |
And I think a lot of very smart people agree with that.
link |
I, a computationally speaking not very smart person,
link |
disagree with that.
link |
Because I think there's some magic
link |
to the interactivity with other humans.
link |
So to play devil's advocate on your statement,
link |
it's possible that in order to demonstrate
link |
the generalization abilities of a system,
link |
you have to show your ability, in conversation,
link |
show your ability to adjust, adapt to the conversation
link |
not just as a standalone system,
link |
but through the process of the interaction,
link |
the game theoretic setting, where you really
link |
are changing the environment by your actions.
link |
So in the ARC challenge, for example,
link |
you're an observer.
link |
You can't scare the test into changing.
link |
You can't talk to the test.
link |
You can't play with it.
link |
So there's some aspect of that interactivity
link |
that becomes highly subjective, but it
link |
feels like it could be conducive to generalizability.
link |
I think you make a great point.
link |
The interactivity is a very good setting
link |
to force a system to show adaptation,
link |
to show generalization.
link |
That said, at the same time, it's
link |
not something very scalable, because you
link |
rely on human judges.
link |
It's not something reliable, because the human judges may be biased.
link |
So you don't like human judges.
link |
I love the idea of interactivity.
link |
I initially wanted an ARC test that
link |
had some amount of interactivity where your score on a task
link |
would not be 1 or 0, whether you can solve it or not,
link |
but would be the number of attempts
link |
that you can make before you hit the right solution, which
link |
means that now you can start applying
link |
the scientific method as you solve ARC tasks,
link |
that you can start formulating hypotheses and probing
link |
the system to see whether the observation will
link |
match the hypothesis or not.
link |
It would be amazing if you could also,
link |
even higher level than that, measure the quality of your attempts,
link |
which, of course, is impossible.
link |
But again, that gets subjective.
link |
How good was your thinking?
link |
How efficient was it?
link |
So one thing that's interesting about this notion of scoring you
link |
by how many attempts you need is that you
link |
can start producing tasks that are way more ambiguous, right?
link |
Because with the different attempts,
link |
you can actually probe that ambiguity, right?
link |
So that, in a sense, measures how well
link |
you adapt to the uncertainty and reduce the uncertainty?
link |
Yes, it's how fast.
link |
It's the efficiency with which you reduce uncertainty
link |
in program space, exactly.
link |
Very difficult to come up with that kind of test, though.
link |
Yeah, so I would love to be able to create something like this.
link |
In practice, it would be very, very difficult, but yes.
link |
I mean, what you're doing, what you've done with the ARC challenge is fascinating.
link |
I'm also not surprised that it's not more popular,
link |
but I think it's picking up.
link |
It has its niche.
link |
It has its niche, yeah.
link |
What are your thoughts about another test?
link |
I talked with Marcus Hutter.
link |
He has the Hutter Prize for compression of human knowledge.
link |
And the idea is really to sort of quantify and reduce
link |
the test of intelligence purely to just the ability to compress.
link |
What are your thoughts about this intelligence as compression?
link |
I mean, it's a very fun test because it's
link |
such a simple idea, like you're given Wikipedia,
link |
basic English Wikipedia, and you must compress it.
link |
And so it stems from the idea that cognition is compression,
link |
that the brain is basically a compression algorithm.
link |
This is a very old idea.
link |
It's a very, I think, striking and beautiful idea.
link |
I used to believe it.
link |
I eventually had to realize that it was very much wrong.
link |
So I no longer believe that cognition is compression.
link |
But I can tell you what the difference is.
link |
So it's very easy to believe that cognition and compression
link |
are the same thing.
link |
So Jeff Hawkins, for instance, says
link |
that cognition is prediction.
link |
And of course, prediction is basically the same thing as compression.
link |
It's just including the temporal axis.
link |
And it's very easy to believe this
link |
because compression is something that we
link |
do all the time very naturally.
link |
We are constantly compressing information.
link |
We have this bias towards simplicity.
link |
We are constantly trying to organize things in our mind
link |
and around us to be more regular.
link |
So it's a beautiful idea.
link |
It's very easy to believe.
link |
There is a big difference between what
link |
we do with our brains and compression.
link |
So compression is actually kind of a tool
link |
in the human cognitive toolkit that is used in many ways.
link |
But it's just a tool.
link |
It is a tool for cognition.
link |
It is not cognition itself.
link |
And the big fundamental difference
link |
is that cognition is about being able to operate
link |
in future situations that include fundamental uncertainty.
link |
So for instance, consider a child at age 10.
link |
And so they have 10 years of life experience.
link |
They've gotten pain, pleasure, rewards, and punishment
link |
over that period of time.
link |
If you were to generate the shortest behavioral program
link |
that would have basically run that child over these 10 years
link |
in an optimal way, the shortest optimal behavioral program
link |
given the experience of that child so far,
link |
well, that program, that compressed program,
link |
this is what you would get if the mind of the child
link |
was a compression algorithm essentially,
link |
would be utterly unable
link |
to process the next 70 years in the life of that child.
link |
So in the models we build of the world,
link |
we are not trying to make them actually optimally compressed.
link |
We are using compression as a tool
link |
to promote simplicity and efficiency in our models.
link |
But they are not perfectly compressed
link |
because they need to include things
link |
that are seemingly useless today, that have seemingly
link |
been useless so far.
link |
But that may turn out to be useful in the future
link |
because you just don't know the future.
link |
And that's the fundamental principle
link |
that cognition, that intelligence arises from
link |
is that you need to be able to run
link |
appropriate behavioral programs except you have absolutely
link |
no idea what sort of context, environment, situation
link |
they are going to be running in.
link |
And you have to deal with that uncertainty,
link |
with that future novelty.
link |
So an analogy that you can make is with investing.
link |
If I look at the past 20 years of stock market data,
link |
and I use a compression algorithm
link |
to figure out the best trading strategy,
link |
it's going to be you buy Apple stock, then
link |
maybe the past few years you buy Tesla stock or something.
link |
But is that strategy still going to be
link |
true for the next 20 years?
link |
Well, actually, probably not, which
link |
is why if you're a smart investor,
link |
you're not just going to be following the strategy that
link |
corresponds to compression of the past.
link |
You're going to have a balanced portfolio, right?
link |
Because you just don't know what's going to happen.
link |
I mean, I guess in that same sense,
link |
the compression is analogous to what
link |
you talked about, which is local or robust generalization
link |
versus extreme generalization.
link |
It's much closer to that side of being able to generalize
link |
in the local sense.
link |
That's why, as humans, when we are children, in our education,
link |
a lot of it is driven by play, driven by curiosity.
link |
We are not efficiently compressing things.
link |
We're actually exploring.
link |
We are retaining all kinds of things
link |
from our environment that seem to be completely useless.
link |
Because they might turn out to be eventually useful, right?
link |
And that's what cognition is really about.
link |
And what makes it antagonistic to compression
link |
is that it is about hedging for future uncertainty.
link |
And that's antagonistic to compression.
link |
Efficiently hedging.
link |
Cognition leverages compression as a tool
link |
to promote efficiency and simplicity in our models.
link |
It's like Einstein said, make it as simple as possible,
link |
however that quote goes, but not simpler.
link |
So compression simplifies things,
link |
but you don't want to make it too simple.
link |
So a good model of the world is going
link |
to include all kinds of things that are completely useless,
link |
actually, just in case.
link |
Because you need diversity in the same way
link |
that you do in your portfolio.
link |
You need all kinds of stocks that may not
link |
have performed well so far, but you need diversity.
link |
And the reason you need diversity
link |
is because fundamentally you don't know what you're doing.
link |
And the same is true of the human mind,
link |
is that it needs to behave appropriately in the future.
link |
And it has no idea what the future is going to be like.
link |
But it's not going to be like the past.
link |
So compressing the past is not appropriate,
link |
because the past is not predictive of the future.
link |
Yeah, history repeats itself, but not perfectly.
link |
I don't think I asked you last time the most inappropriately absurd question.
link |
We've talked a lot about intelligence,
link |
but the bigger question, beyond intelligence, is that of meaning.
link |
Intelligence systems are kind of goal oriented.
link |
They're always optimizing for a goal.
link |
If you look at the Hutter Prize, actually,
link |
I mean, there's always a clean formulation of a goal.
link |
But the natural question for us humans,
link |
since we don't know our objective function,
link |
is what is the meaning of it all?
link |
So the absurd question is, what, Francois,
link |
do you think is the meaning of life?
link |
What's the meaning of life?
link |
Yeah, that's a big question.
link |
And I think I can give you my answer, or at least one of them.
link |
And so one thing that's very important in understanding who
link |
we are is that everything that makes up ourselves,
link |
that makes up who we are, even your most personal thoughts,
link |
is not actually your own.
link |
Even your most personal thoughts are expressed in words
link |
that you did not invent and are built on concepts and images
link |
that you did not invent.
link |
We are very much cultural beings.
link |
We are made of culture.
link |
What makes us different from animals, for instance?
link |
So everything about ourselves is an echo of the past.
link |
Is an echo of the past, an echo of people who lived before us.
link |
That's who we are.
link |
And in the same way, if we manage
link |
to contribute something to the collective edifice of culture,
link |
a new idea, maybe a beautiful piece of music,
link |
a work of art, a grand theory, a new word, maybe,
link |
that something is going to become
link |
a part of the minds of future humans, essentially, forever.
link |
So everything we do creates ripples
link |
that propagate into the future.
link |
And in a way, this is our path to immortality,
link |
is that as we contribute things to culture,
link |
culture in turn becomes future humans.
link |
And we keep influencing people thousands of years from now.
link |
So our actions today create ripples.
link |
And these ripples, I think, basically
link |
sum up the meaning of life.
link |
In the same way that we are the sum
link |
of the interactions between many different ripples that
link |
came from our past, we are ourselves
link |
creating ripples that will propagate into the future.
link |
And that's why we should be, this
link |
seems like perhaps a naive thing to say,
link |
but we should be kind to others during our time on Earth
link |
because every act of kindness creates ripples.
link |
And in reverse, every act of violence also creates ripples.
link |
And you want to carefully choose which kind of ripples
link |
you want to create, and you want to propagate into the future.
link |
And in your case, first of all, beautifully put,
link |
but in your case, creating ripples
link |
into future humans and future AGI systems.
link |
I don't think there's a better way to end it,
link |
Francois. As always, for a second time,
link |
and I'm sure many times in the future,
link |
it's been a huge honor.
link |
You're one of the most brilliant people
link |
in the machine learning, computer science world.
link |
Again, it's a huge honor.
link |
Thanks for talking to me.
link |
It's been a pleasure.
link |
Thanks a lot for having me.
link |
Thanks for listening to this conversation with Francois
link |
Chollet, and thank you to our sponsors: Babbel, Masterclass, and Cash App.
link |
Click the sponsor links in the description
link |
to get a discount and to support this podcast.
link |
If you enjoy this thing, subscribe on YouTube,
link |
review it with five stars on Apple Podcast,
link |
follow on Spotify, support on Patreon,
link |
or connect with me on Twitter at Lex Friedman.
link |
And now let me leave you with some words
link |
from René Descartes in 1637, an excerpt of which Francois
link |
includes in his On the Measure of Intelligence paper.
link |
If there were machines which bore a resemblance
link |
to our bodies and imitated our actions as closely as possible
link |
for all practical purposes, we should still
link |
have two very certain means of recognizing
link |
that they were not real men.
link |
The first is that they could never use words or put together
link |
signs, as we do in order to declare our thoughts to others.
link |
For we can certainly conceive of a machine so constructed
link |
that it utters words and even utters
link |
words that correspond to bodily actions causing
link |
a change in its organs.
link |
But it is not conceivable that such a machine should produce
link |
different arrangements of words so as
link |
to give an appropriately meaningful answer to whatever
link |
is said in its presence as the dullest of men can do.
link |
Here, Descartes is anticipating the Turing test,
link |
and the argument still continues to this day.
link |
Secondly, he continues, even though some machines might
link |
do some things as well as we do them, or perhaps even better,
link |
they would inevitably fail in others,
link |
which would reveal that they are acting not from understanding
link |
but only from the disposition of their organs.
link |
This is an incredible quote.
link |
Whereas reason is a universal instrument
link |
which can be used in all kinds of situations,
link |
these organs need some particular disposition for each particular action.
link |
Hence, it is for all practical purposes
link |
impossible for a machine to have enough different organs
link |
to make it act in all the contingencies of life
link |
in the way in which our reason makes us act.
link |
That's the debate between mimicry and memorization
link |
versus understanding.
link |
So thank you for listening and hope to see you next time.