
François Chollet: Measures of Intelligence | Lex Fridman Podcast #120



link |
00:00:00.000
The following is a conversation with Francois Chollet,
link |
00:00:03.220
his second time on the podcast.
link |
00:00:05.260
He's both a world class engineer and a philosopher
link |
00:00:09.580
in the realm of deep learning and artificial intelligence.
link |
00:00:13.180
This time, we talk a lot about his paper titled
link |
00:00:16.200
On the Measure of Intelligence, which discusses
link |
00:00:19.040
how we might define and measure general intelligence
link |
00:00:22.440
in our computing machinery.
link |
00:00:24.640
Quick summary of the sponsors,
link |
00:00:26.420
Babbel, Masterclass, and Cash App.
link |
00:00:29.460
Click the sponsor links in the description
link |
00:00:31.240
to get a discount and to support this podcast.
link |
00:00:34.500
As a side note, let me say that the serious,
link |
00:00:36.880
rigorous scientific study
link |
00:00:38.720
of artificial general intelligence is a rare thing.
link |
00:00:42.220
The mainstream machine learning community works
link |
00:00:44.080
on very narrow AI with very narrow benchmarks.
link |
00:00:47.740
This is very good for incremental
link |
00:00:49.920
and sometimes big incremental progress.
link |
00:00:53.200
On the other hand, the outside-the-mainstream,
link |
00:00:56.060
renegade, you could say, AGI community works
link |
00:01:00.020
on approaches that verge on the philosophical
link |
00:01:03.000
and even the literary without big public benchmarks.
link |
00:01:07.300
Those who walk the line between the two worlds are a rare breed,
link |
00:01:10.640
but it doesn't have to be.
link |
00:01:12.360
I ran the AGI series at MIT as an attempt
link |
00:01:15.320
to inspire more people to walk this line.
link |
00:01:17.700
DeepMind and OpenAI for a time
link |
00:01:20.020
and still on occasion walk this line.
link |
00:01:23.180
Francois Chollet does as well.
link |
00:01:25.860
I hope to also.
link |
00:01:27.620
It's a beautiful dream to work towards
link |
00:01:29.880
and to make real one day.
link |
00:01:32.480
If you enjoy this thing, subscribe on YouTube,
link |
00:01:34.580
review it with five stars on Apple Podcasts,
link |
00:01:36.760
follow on Spotify, support on Patreon,
link |
00:01:39.020
or connect with me on Twitter at Lex Fridman.
link |
00:01:42.020
As usual, I'll do a few minutes of ads now
link |
00:01:44.240
and no ads in the middle.
link |
00:01:45.780
I try to make these interesting,
link |
00:01:47.440
but I give you timestamps so you can skip.
link |
00:01:50.620
But still, please do check out the sponsors
link |
00:01:52.660
by clicking the links in the description.
link |
00:01:54.580
It's the best way to support this podcast.
link |
00:01:57.900
This show is sponsored by Babbel,
link |
00:02:00.100
an app and website that gets you speaking
link |
00:02:02.460
in a new language within weeks.
link |
00:02:04.360
Go to babbel.com and use code Lex to get three months free.
link |
00:02:08.200
They offer 14 languages, including Spanish, French,
link |
00:02:11.460
Italian, German, and yes, Russian.
link |
00:02:15.220
Daily lessons are 10 to 15 minutes,
link |
00:02:17.340
super easy, effective,
link |
00:02:19.060
designed by over 100 language experts.
link |
00:02:22.240
Let me read a few lines from the Russian poem
link |
00:02:24.700
Noch, ulitsa, fonar, apteka, by Alexander Blok,
link |
00:02:29.020
that you'll start to understand if you sign up to Babbel.
link |
00:02:32.580
Noch, ulitsa, fonar, apteka,
link |
00:02:35.220
Bessmyslennyy i tusklyy svet,
link |
00:02:38.100
Zhivi eshche khot chetvert veka,
link |
00:02:41.140
Vse budet tak. Iskhoda net.
link |
00:02:44.700
Now, I say that you'll start to understand this poem
link |
00:02:48.500
because Russian starts with the language
link |
00:02:51.420
and ends with vodka.
link |
00:02:54.020
Now, the latter part is definitely not endorsed
link |
00:02:56.600
or provided by Babbel.
link |
00:02:58.020
It will probably lose me this sponsorship,
link |
00:03:00.340
although it hasn't yet.
link |
00:03:02.460
But once you graduate with Babbel,
link |
00:03:04.460
you can enroll in my advanced course
link |
00:03:06.120
of late night Russian conversation over vodka.
link |
00:03:09.200
No app for that yet.
link |
00:03:11.260
So get started by visiting babbel.com
link |
00:03:13.740
and use code Lex to get three months free.
link |
00:03:18.180
This show is also sponsored by Masterclass.
link |
00:03:20.980
Sign up at masterclass.com slash Lex
link |
00:03:23.380
to get a discount and to support this podcast.
link |
00:03:26.580
When I first heard about Masterclass,
link |
00:03:28.060
I thought it was too good to be true.
link |
00:03:29.980
I still think it's too good to be true.
link |
00:03:32.340
For $180 a year, you get an all access pass
link |
00:03:35.420
to watch courses from, to list some of my favorites.
link |
00:03:38.740
Chris Hadfield on space exploration,
link |
00:03:41.340
hope to have him on this podcast one day.
link |
00:03:43.500
Neil deGrasse Tyson on scientific thinking and communication,
link |
00:03:46.660
Neil too.
link |
00:03:47.900
Will Wright, creator of SimCity and The Sims,
link |
00:03:50.140
on game design, Carlos Santana on guitar,
link |
00:03:52.780
Garry Kasparov on chess, Daniel Negreanu on poker,
link |
00:03:55.980
and many more.
link |
00:03:57.240
Chris Hadfield explaining how rockets work
link |
00:03:59.700
and the experience of being launched into space
link |
00:04:01.740
alone is worth the money.
link |
00:04:03.300
By the way, you can watch it on basically any device.
link |
00:04:06.540
Once again, sign up at masterclass.com slash Lex
link |
00:04:09.380
to get a discount and to support this podcast.
link |
00:04:13.340
This show finally is presented by Cash App,
link |
00:04:16.460
the number one finance app in the App Store.
link |
00:04:18.720
When you get it, use code LexPodcast.
link |
00:04:21.220
Cash App lets you send money to friends,
link |
00:04:23.300
buy Bitcoin, and invest in the stock market
link |
00:04:25.460
with as little as $1.
link |
00:04:27.260
Since Cash App allows you to send
link |
00:04:28.980
and receive money digitally,
link |
00:04:30.540
let me mention a surprising fact related to physical money.
link |
00:04:33.860
Of all the currency in the world,
link |
00:04:35.700
roughly 8% of it is actually physical money.
link |
00:04:39.300
The other 92% of the money only exists digitally,
link |
00:04:42.820
and that's only going to increase.
link |
00:04:45.280
So again, if you get Cash App from the App Store
link |
00:04:47.400
or Google Play and use code LexPodcast,
link |
00:04:50.660
you get 10 bucks,
link |
00:04:51.740
and Cash App will also donate $10 to FIRST,
link |
00:04:54.420
an organization that is helping to advance robotics
link |
00:04:57.000
and STEM education for young people around the world.
link |
00:05:00.500
And now here's my conversation with Francois Chollet.
link |
00:05:05.060
What philosophers, thinkers, or ideas
link |
00:05:07.360
had a big impact on you growing up and today?
link |
00:05:10.700
So one author that had a big impact on me
link |
00:05:14.860
when I read his books as a teenager was Jean Piaget,
link |
00:05:18.820
who was a Swiss psychologist
link |
00:05:21.380
and is considered to be the father of developmental psychology.
link |
00:05:25.540
And he has a large body of work about
link |
00:05:28.700
basically how intelligence develops in children.
link |
00:05:33.380
And so it's very old work,
link |
00:05:35.500
like most of it is from the 1930s, 1940s.
link |
00:05:39.140
So it's not quite up to date.
link |
00:05:40.900
It's actually superseded by many newer developments
link |
00:05:43.820
in developmental psychology.
link |
00:05:45.660
But to me, it was very interesting, very striking,
link |
00:05:49.600
and actually shaped the early ways
link |
00:05:51.340
in which I started thinking about the mind
link |
00:05:53.820
and the development of intelligence as a teenager.
link |
00:05:56.220
His actual ideas or the way he thought about it
link |
00:05:58.460
or just the fact that you could think
link |
00:05:59.840
about the developing mind at all?
link |
00:06:01.600
I guess both.
link |
00:06:02.500
Jean Piaget is the author that really introduced me
link |
00:06:04.940
to the notion that intelligence and the mind
link |
00:06:07.980
is something that you construct throughout your life
link |
00:06:11.120
and that children construct it in stages.
link |
00:06:15.780
And I thought that was a very interesting idea,
link |
00:06:17.460
which is, of course, very relevant to AI,
link |
00:06:20.460
to building artificial minds.
link |
00:06:23.180
Another book that I read around the same time
link |
00:06:25.860
that had a big impact on me,
link |
00:06:28.900
and there was actually a little bit of overlap
link |
00:06:32.100
with Jean Piaget as well,
link |
00:06:32.980
and I read it around the same time,
link |
00:06:35.340
is Jeff Hawkins' On Intelligence, which is a classic.
link |
00:06:39.860
And he has this vision of the mind
link |
00:06:42.500
as a multi-scale hierarchy of temporal prediction modules.
link |
00:06:47.820
And these ideas really resonated with me,
link |
00:06:50.020
like the notion of a modular hierarchy
link |
00:06:55.440
of potentially compression functions
link |
00:07:00.100
or prediction functions.
link |
00:07:01.700
I thought it was really, really interesting,
link |
00:07:03.980
and it shaped the way I started thinking
link |
00:07:07.100
about how to build minds.
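As a rough illustration of that picture, here is a minimal toy sketch, invented here rather than taken from Hawkins' actual model: a two-level hierarchy of temporal prediction modules, where the upper module runs on a slower timescale and models the errors of the one below it. The "repeat last value" rule is a placeholder assumption.

```python
# Toy sketch (not Hawkins' model): a hierarchy of temporal prediction
# modules, with the upper level ticking on a slower timescale.
class Predictor:
    """Predicts its next input with a naive 'repeat last value' rule."""
    def __init__(self):
        self.last = 0.0

    def observe(self, x):
        error = x - self.last   # prediction error against the last value
        self.last = x
        return error

low, high = Predictor(), Predictor()
signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
for t, x in enumerate(signal):
    low_error = low.observe(x)
    if t % 2 == 0:              # the upper level runs half as often,
        high.observe(low_error) # modeling the lower level's errors
```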
link |
00:07:09.760
The hierarchical nature, which aspect?
link |
00:07:13.740
Also, he's a neuroscientist, so he was thinking actual,
link |
00:07:17.520
he was basically talking about how our mind works.
link |
00:07:20.580
Yeah, the notion that cognition is prediction
link |
00:07:23.260
was an idea that was kind of new to me at the time
link |
00:07:25.460
and that I really loved at the time.
link |
00:07:27.840
And yeah, and the notion that there are multiple scales
link |
00:07:31.900
of processing in the brain.
link |
00:07:35.320
The hierarchy.
link |
00:07:36.260
Yes.
link |
00:07:37.100
This was before deep learning.
link |
00:07:38.600
These ideas of hierarchies in AI
link |
00:07:41.140
have been around for a long time,
link |
00:07:43.180
even before On Intelligence.
link |
00:07:45.020
They've been around since the 1980s.
link |
00:07:48.980
And yeah, that was before deep learning.
link |
00:07:50.500
But of course, I think these ideas really found
link |
00:07:53.500
their practical implementation in deep learning.
link |
00:07:58.100
What about the memory side of things?
link |
00:07:59.740
I think he was talking about knowledge representation.
link |
00:08:02.860
Do you think about memory a lot?
link |
00:08:04.420
One way you can think of neural networks
link |
00:08:06.340
as a kind of memory, you're memorizing things,
link |
00:08:10.780
but it doesn't seem to be the kind of memory
link |
00:08:14.260
that's in our brains,
link |
00:08:16.880
or it doesn't have the same rich complexity,
link |
00:08:18.660
long term nature that's in our brains.
link |
00:08:20.660
Yes, the brain is more of a sparse access memory
link |
00:08:23.980
so that you can actually retrieve very precisely
link |
00:08:27.740
like bits of your experience.
link |
00:08:30.100
The retrieval aspect, you can like introspect,
link |
00:08:33.500
you can ask yourself questions.
link |
00:08:35.300
I guess you can program your own memory
link |
00:08:38.260
and language is actually the tool you use to do that.
link |
00:08:41.700
I think language is a kind of operating system for the mind
link |
00:08:46.360
and you use language.
link |
00:08:47.820
Well, one of the uses of language is as a query
link |
00:08:51.800
that you run over your own memory,
link |
00:08:53.860
you use words as keys to retrieve specific experiences
link |
00:08:57.940
or specific concepts, specific thoughts.
link |
00:09:00.140
Like language is a way you store thoughts,
link |
00:09:02.380
not just in writing, in the physical world,
link |
00:09:04.740
but also in your own mind.
link |
00:09:06.100
And it's also how you retrieve them.
link |
00:09:07.580
Like, imagine if you didn't have language,
link |
00:09:10.000
then you would have to,
link |
00:09:11.740
you would not really have a self directed,
link |
00:09:14.340
internally triggered way of retrieving past thoughts.
link |
00:09:18.620
You would have to rely on external experiences.
link |
00:09:21.300
For instance, you see a specific sight,
link |
00:09:24.020
you smell a specific smell and that brings up memories,
link |
00:09:26.780
but you would not really have a way
link |
00:09:28.700
to deliberately access these memories without language.
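A toy way to picture that idea (the dictionary and its entries are made up for illustration): words acting as keys into an associative memory, so recall can be triggered deliberately rather than only by external cues like sights and smells.

```python
# Toy illustration (entries invented): words as keys into an
# associative memory, enabling deliberate, internally triggered recall.
memories = {
    "smell of rain": "walking home from school in a storm",
    "chess": "learning openings from a grandparent",
    "subway": "getting lost on a first trip to a new city",
}

def recall(word):
    """Use a word as a query key over stored experiences."""
    return memories.get(word, "nothing comes to mind")

print(recall("chess"))  # -> "learning openings from a grandparent"
```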
link |
00:09:32.740
Well, the interesting thing you mentioned
link |
00:09:33.980
is you can also program the memory.
link |
00:09:37.420
You can change it probably with language.
link |
00:09:39.980
Yeah, using language, yes.
link |
00:09:41.500
Well, let me ask you a Chomsky question,
link |
00:09:44.100
which is like, first of all,
link |
00:09:45.980
do you think language is like fundamental,
link |
00:09:49.100
like there's turtles, what's at the bottom of the turtles?
link |
00:09:54.460
They don't go, it can't be turtles all the way down.
link |
00:09:57.260
Is language at the bottom of cognition of everything?
link |
00:10:00.260
Is like language, the fundamental aspect
link |
00:10:05.300
of like what it means to be a thinking thing?
link |
00:10:10.700
No, I don't think so.
link |
00:10:12.100
I think language is.
link |
00:10:12.940
You disagree with Noam Chomsky?
link |
00:10:14.620
Yes, I think language is a layer on top of cognition.
link |
00:10:17.900
So it is fundamental to cognition in the sense that
link |
00:10:21.740
to use a computing metaphor,
link |
00:10:23.380
I see language as the operating system of the brain,
link |
00:10:28.060
of the human mind.
link |
00:10:29.500
And the operating system is a layer on top of the computer.
link |
00:10:33.180
The computer exists before the operating system,
link |
00:10:36.140
but the operating system is how you make it truly useful.
link |
00:10:39.500
And the operating system is most likely Windows, not Linux,
link |
00:10:43.940
because language is messy.
link |
00:10:45.860
Yeah, it's messy and it's pretty difficult
link |
00:10:49.460
to inspect it, introspect it.
link |
00:10:53.140
How do you think about language?
link |
00:10:55.100
Like we use actually sort of human interpretable language,
link |
00:11:00.060
but is there something like a deeper,
link |
00:11:03.100
that's closer to like logical type of statements?
link |
00:11:08.860
Like, yeah, what is the nature of language, do you think?
link |
00:11:16.140
Like is there something deeper than like the syntactic rules
link |
00:11:18.540
we construct?
link |
00:11:19.380
Is there something that doesn't require utterances
link |
00:11:22.860
or writing or so on?
link |
00:11:25.580
Are you asking about the possibility
link |
00:11:27.460
that there could exist languages for thinking
link |
00:11:30.900
that are not made of words?
link |
00:11:32.820
Yeah.
link |
00:11:33.660
Yeah, I think so.
link |
00:11:34.500
I think, so the mind is layers, right?
link |
00:11:38.580
And language is almost like the outermost,
link |
00:11:41.780
the uppermost layer.
link |
00:11:44.620
But before we think in words,
link |
00:11:46.780
I think we think in terms of motion in space
link |
00:11:51.100
and we think in terms of physical actions.
link |
00:11:54.180
And I think babies in particular,
link |
00:11:56.860
probably express thoughts in terms of the actions
link |
00:12:01.380
that they've seen or that they can perform
link |
00:12:03.700
and in terms of motions of objects in their environment
link |
00:12:08.020
before they start thinking in terms of words.
link |
00:12:10.860
It's amazing to think about that
link |
00:12:13.900
as the building blocks of language.
link |
00:12:16.780
So like the kind of actions and ways the babies see the world
link |
00:12:21.820
as like more fundamental
link |
00:12:23.260
than the beautiful Shakespearean language
link |
00:12:26.220
you construct on top of it.
link |
00:12:28.620
And we probably don't have any idea
link |
00:12:30.500
what that looks like, right?
link |
00:12:31.700
Like what, because it's important
link |
00:12:34.020
for them trying to engineer it into AI systems.
link |
00:12:38.460
I think visual analogies and motion
link |
00:12:42.060
is a fundamental building block of the mind.
link |
00:12:45.380
And you actually see it reflected in language.
link |
00:12:48.540
Like language is full of spatial metaphors.
link |
00:12:51.820
And when you think about things,
link |
00:12:53.820
I consider myself very much as a visual thinker.
link |
00:12:57.380
You often express these thoughts
link |
00:13:01.140
by using things like visualizing concepts
link |
00:13:06.500
in 2D space or like you solve problems
link |
00:13:09.940
by imagining yourself navigating a concept space.
link |
00:13:14.940
So I don't know if you have this sort of experience.
link |
00:13:17.940
You said visualizing concept space.
link |
00:13:19.860
So like, so I certainly think about,
link |
00:13:24.820
I certainly visualize mathematical concepts,
link |
00:13:27.980
but you mean like in concept space,
link |
00:13:32.340
visually you're embedding ideas
link |
00:13:34.860
into a three dimensional space
link |
00:13:36.940
you can explore with your mind essentially?
link |
00:13:38.820
It would be more like 2D, but yeah.
link |
00:13:40.340
2D?
link |
00:13:41.180
Yeah.
link |
00:13:42.100
You're a flatlander.
link |
00:13:43.180
You're, okay.
link |
00:13:45.700
No, I do not.
link |
00:13:49.660
I always have to, before I jump from concept to concept,
link |
00:13:52.780
I have to put it back down on paper.
link |
00:13:57.100
It has to be on paper.
link |
00:13:58.060
I can only travel on 2D paper, not inside my mind.
link |
00:14:03.340
You're able to move inside your mind.
link |
00:14:05.340
But even if you're writing like a paper, for instance,
link |
00:14:07.900
don't you have like a spatial representation of your paper?
link |
00:14:11.020
Like you visualize where ideas lie topologically
link |
00:14:16.660
in relationship to other ideas,
link |
00:14:18.980
kind of like a subway map of the ideas in your paper.
link |
00:14:22.500
Yeah, that's true.
link |
00:14:23.380
I mean, there is, in papers, I don't know about you,
link |
00:14:27.900
but it feels like there's a destination.
link |
00:14:32.540
There's a key idea that you want to arrive at.
link |
00:14:36.220
And a lot of it is in the fog
link |
00:14:39.340
and you're trying to kind of,
link |
00:14:40.820
it's almost like, what's that called
link |
00:14:46.180
when you do a path planning search from both directions,
link |
00:14:49.900
from the start and from the end.
link |
00:14:52.700
And then you find, you do like shortest path,
link |
00:14:54.740
but like, you know, in game playing,
link |
00:14:57.380
you do this with like A star from both sides.
link |
00:15:01.020
And you see where they join.
link |
00:15:03.420
Yeah, so you kind of do, at least for me,
link |
00:15:05.740
I think like, first of all,
link |
00:15:07.100
just exploring from the start from like first principles,
link |
00:15:10.800
what do I know, what can I start proving from that, right?
link |
00:15:15.620
And then from the destination,
link |
00:15:18.060
if you start backtracking,
link |
00:15:20.460
like if I want to show some kind of sets of ideas,
link |
00:15:25.400
what would it take to show them and you kind of backtrack,
link |
00:15:28.300
but like, yeah,
link |
00:15:29.140
I don't think I'm doing all that in my mind though.
link |
00:15:31.260
Like I'm putting it down on paper.
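The technique being reached for here is usually called bidirectional search. A minimal sketch of the idea under discussion (the proof graph below is invented for illustration): search forward from first principles and backward from the goal, stopping where the two frontiers meet.

```python
from collections import deque

# Sketch of bidirectional search: expand forward from the premises and
# backward from the goal until the frontiers join (graph is made up).
graph = {
    "axioms": ["lemma1", "lemma2"],
    "lemma1": ["lemma3"],
    "lemma2": ["lemma3"],
    "lemma3": ["theorem"],
    "theorem": [],
}
reverse = {n: [] for n in graph}
for n, succs in graph.items():
    for s in succs:
        reverse[s].append(n)

def bidirectional_meet(start, goal):
    fwd, bwd = {start}, {goal}
    qf, qb = deque([start]), deque([goal])
    while qf and qb:
        n = qf.popleft()
        for s in graph[n]:
            if s in bwd:
                return s          # the two frontiers have joined
            if s not in fwd:
                fwd.add(s); qf.append(s)
        n = qb.popleft()
        for p in reverse[n]:
            if p in fwd:
                return p
            if p not in bwd:
                bwd.add(p); qb.append(p)
    return None

print(bidirectional_meet("axioms", "theorem"))  # e.g. "lemma3"
```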
link |
00:15:33.180
Do you use mind maps to organize your ideas?
link |
00:15:35.500
Yeah, I like mind maps.
link |
00:15:37.740
Let's get into this,
link |
00:15:38.580
because I've been so jealous of people.
link |
00:15:41.180
I haven't really tried it.
link |
00:15:42.120
I've been jealous of people that seem to like,
link |
00:15:45.500
they get like this fire of passion in their eyes
link |
00:15:48.140
because everything starts making sense.
link |
00:15:50.020
It's like Tom Cruise in the movie
link |
00:15:51.940
was like moving stuff around.
link |
00:15:53.820
Some of the most brilliant people I know use mind maps.
link |
00:15:55.900
I haven't tried really.
link |
00:15:57.660
Can you explain what the hell a mind map is?
link |
00:16:01.240
I guess a mind map is a way to take
link |
00:16:03.700
the mess inside your mind
link |
00:16:05.940
and just put it on paper so that you gain more control over it.
link |
00:16:10.020
It's a way to organize things on paper
link |
00:16:13.020
and as kind of like a consequence
link |
00:16:16.420
of organizing things on paper,
link |
00:16:17.940
they start being more organized inside your own mind.
link |
00:16:20.300
So what does that look like?
link |
00:16:21.540
You put, like, do you have an example?
link |
00:16:23.980
Like what's the first thing you write on paper?
link |
00:16:27.360
What's the second thing you write?
link |
00:16:28.980
I mean, typically you draw a mind map
link |
00:16:31.660
to organize the way you think about a topic.
link |
00:16:34.860
So you would start by writing down
link |
00:16:37.340
like the key concept about that topic.
link |
00:16:39.580
Like you would write intelligence or something,
link |
00:16:42.220
and then you would start adding associative connections.
link |
00:16:45.660
Like what do you think about
link |
00:16:46.860
when you think about intelligence?
link |
00:16:48.100
What do you think are the key elements of intelligence?
link |
00:16:50.460
So maybe you would have language, for instance,
link |
00:16:52.340
and you'd have motion.
link |
00:16:53.420
And so you would start drawing nodes with these things.
link |
00:16:55.460
And then you would see what do you think about
link |
00:16:57.220
when you think about motion and so on.
link |
00:16:59.140
And you would go like that, like a tree.
link |
00:17:00.620
Is it a tree mostly, or is it a graph too?
link |
00:17:05.660
Oh, it's more of a graph than a tree.
link |
00:17:07.980
And it's not limited to just writing down words.
link |
00:17:13.260
You can also draw things.
link |
00:17:15.940
And it's not supposed to be purely hierarchical, right?
link |
00:17:21.660
The point is that once you start writing it down,
link |
00:17:24.540
you can start reorganizing it so that it makes more sense,
link |
00:17:27.500
so that it's connected in a more effective way.
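To make that structure concrete, here is a toy sketch (concept names borrowed from the examples above) of a mind map held as an undirected graph with cross-links, rather than a strict tree:

```python
# Toy mind map as an undirected graph: a central concept, associative
# links, and cross-links, so it is not purely hierarchical.
mind_map = {}

def connect(a, b):
    mind_map.setdefault(a, set()).add(b)
    mind_map.setdefault(b, set()).add(a)  # links go both ways

connect("intelligence", "language")
connect("intelligence", "motion")
connect("language", "motion")  # cross-link: a graph, not a tree

print(sorted(mind_map["intelligence"]))  # ['language', 'motion']
```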
link |
00:17:29.940
See, but I'm so OCD that you just mentioned
link |
00:17:34.460
intelligence and language and motion.
link |
00:17:37.060
I would start becoming paranoid
link |
00:17:39.100
that the categorization isn't perfect.
link |
00:17:41.980
Like that I would become paralyzed with the mind map
link |
00:17:47.860
that like this may not be.
link |
00:17:49.660
So like the, even though you're just doing
link |
00:17:52.660
associative kind of connections,
link |
00:17:55.380
there's an implied hierarchy that's emerging.
link |
00:17:58.460
And I would start becoming paranoid
link |
00:17:59.900
that it's not the proper hierarchy.
link |
00:18:02.340
So you're not just, one way to see mind maps
link |
00:18:04.940
is you're putting thoughts on paper.
link |
00:18:07.060
It's like a stream of consciousness,
link |
00:18:10.580
but then you can also start getting paranoid.
link |
00:18:12.220
Well, is this the right hierarchy?
link |
00:18:15.140
Sure, but it's your mind map.
link |
00:18:17.780
You're free to draw anything you want.
link |
00:18:19.420
You're free to draw any connection you want.
link |
00:18:20.860
And you can just make a different mind map
link |
00:18:23.420
if you think the central node is not the right node.
link |
00:18:26.260
Yeah, I suppose there's a fear of being wrong.
link |
00:18:29.700
If you want to organize your ideas
link |
00:18:32.660
by writing down what you think,
link |
00:18:35.540
which I think is very effective.
link |
00:18:37.380
Like how do you know what you think about something
link |
00:18:40.140
if you don't write it down, right?
link |
00:18:42.940
If you do that in prose, the thing is that it imposes
link |
00:18:46.180
much more syntactic structure over your ideas,
link |
00:18:49.980
which is not required with mind maps.
link |
00:18:51.540
So mind map is kind of like a lower level,
link |
00:18:54.180
more freehand way of organizing your thoughts.
link |
00:18:57.900
And once you've drawn it,
link |
00:18:59.580
then you can start actually voicing your thoughts
link |
00:19:03.620
in terms of, you know, paragraphs.
link |
00:19:05.380
There's a two-dimensional aspect of layout too, right?
link |
00:19:08.780
Yeah.
link |
00:19:09.620
It's a kind of flower, I guess, you start.
link |
00:19:12.860
There's usually, you want to start with a central concept?
link |
00:19:15.820
Yes.
link |
00:19:16.660
Then you move out.
link |
00:19:17.500
Typically it ends up more like a subway map.
link |
00:19:19.140
So it ends up more like a graph,
link |
00:19:20.660
a topological graph without a root node.
link |
00:19:23.500
Yeah, so like in a subway map,
link |
00:19:25.020
there are some nodes that are more connected than others.
link |
00:19:27.300
And there are some nodes that are more important than others.
link |
00:19:30.940
So there are destinations,
link |
00:19:32.380
but it's not going to be purely like a tree, for instance.
link |
00:19:36.420
Yeah, it's fascinating to think that
link |
00:19:38.540
if there's something to that about the way our mind thinks.
link |
00:19:42.420
By the way, I just kind of remembered obvious thing
link |
00:19:45.820
that I have probably thousands of documents
link |
00:19:49.020
in Google Docs at this point that are bullet point lists,
link |
00:19:53.620
which is, you can probably map a mind map
link |
00:19:57.860
to a bullet point list.
link |
00:20:01.460
It's the same, it's a, no, it's not, it's a tree.
link |
00:20:05.060
It's a tree, yeah.
link |
00:20:06.220
So I create trees,
link |
00:20:07.900
but also they don't have the visual element.
link |
00:20:10.740
Like, I guess I'm comfortable with the structure.
link |
00:20:13.460
It feels like the narrowness,
link |
00:20:15.740
the constraints feel more comforting.
link |
00:20:18.260
If you have thousands of documents
link |
00:20:20.300
with your own thoughts in Google Docs,
link |
00:20:23.100
why don't you write some kind of search engine,
link |
00:20:26.580
like maybe a mind map, a piece of software,
link |
00:20:30.900
mind mapping software, where you write down a concept
link |
00:20:33.980
and then it gives you sentences or paragraphs
link |
00:20:37.500
from your thousand Google Docs document
link |
00:20:39.700
that match this concept.
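A minimal sketch of that suggestion (the notes below are placeholders): index the documents and retrieve the ones closest to a query concept. TF-IDF only matches surface words, which is exactly the limitation raised next; true semantic search would need learned embeddings instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Rough sketch: index personal notes, then retrieve the passages
# whose word statistics are closest to a query concept.
notes = [
    "thoughts on how children acquire language in stages",
    "the way objects move teaches babies about cause and effect",
    "bullet points about measuring general intelligence",
]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(notes)

def search(concept, top_k=1):
    q = vectorizer.transform([concept])
    scores = cosine_similarity(q, doc_vectors)[0]
    ranked = sorted(zip(scores, notes), reverse=True)
    return ranked[:top_k]

print(search("language"))
```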
link |
00:20:41.220
The problem is it's so deeply, unlike mind maps,
link |
00:20:45.300
it's so deeply rooted in natural language.
link |
00:20:48.460
So it's not, it's not semantically searchable,
link |
00:20:54.420
I would say, because the categories are very,
link |
00:20:57.220
you kind of mentioned intelligence, language, and motion.
link |
00:21:00.700
They're very strongly semantic.
link |
00:21:02.580
Like, it feels like the mind map forces you
link |
00:21:05.020
to be semantically clear and specific.
link |
00:21:09.780
The bullet points list I have are sparse,
link |
00:21:13.860
disparate thoughts that poetically represent
link |
00:21:20.340
a category like motion, as opposed to saying motion.
link |
00:21:25.260
So unfortunately, that's the same problem with the internet.
link |
00:21:28.980
That's why the idea of the semantic web is difficult to get to work.
link |
00:21:32.340
It's, most language on the internet is a giant mess
link |
00:21:37.980
of natural language that's hard to interpret, which,
link |
00:21:42.500
so do you think there's something to mind maps as,
link |
00:21:46.180
you actually originally brought it up
link |
00:21:48.100
as we were talking about kind of cognition and language.
link |
00:21:53.580
Do you think there's something to mind maps
link |
00:21:55.300
about how our brain actually deals with,
link |
00:21:58.100
like thinks and reasons about, things?
link |
00:22:01.740
It's possible.
link |
00:22:02.580
I think it's reasonable to assume that there is
link |
00:22:07.100
some level of topological processing in the brain,
link |
00:22:10.620
that the brain is very associative in nature.
link |
00:22:15.140
And I also believe that a topological space
link |
00:22:20.660
is a better medium to encode thoughts
link |
00:22:25.420
than a geometric space.
link |
00:22:27.540
So I think...
link |
00:22:28.380
What's the difference in a topological
link |
00:22:29.740
and a geometric space?
link |
00:22:31.060
Well, if you're talking about topologies,
link |
00:22:34.100
then points are either connected or not.
link |
00:22:36.220
So a topology is more like a subway map.
link |
00:22:38.660
And geometry is when you're interested
link |
00:22:41.660
in the distance between things.
link |
00:22:43.900
And in a subway map,
link |
00:22:44.740
you don't really have the concept of distance.
link |
00:22:46.340
You only have the concept of whether there is a train
link |
00:22:48.420
going from station A to station B.
link |
00:22:52.820
And what we do in deep learning is that we're actually
link |
00:22:55.620
dealing with geometric spaces.
link |
00:22:57.740
We are dealing with concept vectors, word vectors,
link |
00:23:01.540
that have a distance between them
link |
00:23:03.300
expressed in terms of a dot product.
link |
00:23:05.340
So we are not really building topological models usually.
link |
00:23:10.780
I think you're absolutely right.
link |
00:23:11.820
Like distance is of fundamental importance in deep learning.
link |
00:23:16.540
I mean, it's the continuous aspect of it.
link |
00:23:19.300
Yes, because everything is a vector
link |
00:23:21.180
and everything has to be a vector
link |
00:23:22.500
because everything has to be differentiable.
link |
00:23:24.500
If your space is discrete, it's no longer differentiable.
link |
00:23:26.860
You cannot do deep learning in it anymore.
link |
00:23:29.660
Well, you could, but you can only do it by embedding it
link |
00:23:32.420
in a bigger continuous space.
link |
00:23:35.620
So if you do topology in the context of deep learning,
link |
00:23:39.380
you have to do it by embedding your topology
link |
00:23:41.100
in the geometry.
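A small sketch of the contrast being drawn (the vectors are made-up toy values): in the topological view, only connectivity exists; in the geometric view that deep learning relies on, concepts are vectors with distances and dot products, which is what makes them differentiable.

```python
import numpy as np

# Topological view: stations are simply connected or not,
# like a subway map; there is no notion of distance.
subway = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print("B" in subway["A"])          # True: a train runs from A to B

# Geometric view (what deep learning uses): concepts are vectors,
# and what matters is the distance or dot product between them.
king, queen = np.array([0.8, 0.1]), np.array([0.7, 0.2])
distance = np.linalg.norm(king - queen)
similarity = king @ queen          # dot product, hence differentiable
print(distance, similarity)
```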
link |
00:23:42.820
Well, let me zoom out for a second.
link |
00:23:46.220
Let's get into your paper on the measure of intelligence
link |
00:23:50.180
that you put out in 2019.
link |
00:23:52.860
Yes.
link |
00:23:53.700
Okay.
link |
00:23:54.540
November.
link |
00:23:55.380
November.
link |
00:23:57.700
Yeah, remember 2019?
link |
00:23:59.420
That was a different time.
link |
00:24:01.100
Yeah, I remember.
link |
00:24:02.780
I still remember.
link |
00:24:06.500
It feels like a different world.
link |
00:24:09.620
You could travel, you could actually go outside
link |
00:24:12.620
and see friends.
link |
00:24:15.100
Yeah.
link |
00:24:16.260
Let me ask the most absurd question.
link |
00:24:18.940
I think there's some nonzero probability
link |
00:24:21.740
there'll be a textbook one day, like 200 years from now
link |
00:24:25.220
on artificial intelligence,
link |
00:24:27.740
or it'll be called like just intelligence
link |
00:24:30.660
cause humans will already be gone.
link |
00:24:32.460
It'll be your picture with a quote.
link |
00:24:35.220
This is, you know, one of the early biological systems
link |
00:24:39.020
to consider the nature of intelligence
link |
00:24:41.580
and there'll be like a definition
link |
00:24:43.180
of how they thought about intelligence.
link |
00:24:45.180
Which is one of the things you do in your paper
link |
00:24:46.860
on the measure of intelligence is to ask, like,
link |
00:24:51.060
well, what is intelligence
link |
00:24:52.620
and how to test for intelligence and so on.
link |
00:24:55.540
So is there a spiffy quote about what is intelligence?
link |
00:25:01.860
What is the definition of intelligence
link |
00:25:03.900
according to Francois Chollet?
link |
00:25:06.740
Yeah, so do you think the super intelligent AIs
link |
00:25:10.740
of the future will want to remember us
link |
00:25:13.900
the way we remember humans from the past?
link |
00:25:16.060
And do you think they will be, you know,
link |
00:25:18.500
they won't be ashamed of having a biological origin?
link |
00:25:22.340
No, I think it would be a niche topic.
link |
00:25:24.660
It won't be that interesting,
link |
00:25:25.820
but it'll be like the people that study
link |
00:25:29.420
in certain contexts like historical civilization
link |
00:25:33.100
that no longer exists, the Aztecs and so on.
link |
00:25:36.340
That's how it'll be seen.
link |
00:25:38.260
And it'll be studied also in the context of social media.
link |
00:25:42.340
There'll be hashtags about the atrocities
link |
00:25:46.700
committed against human beings
link |
00:25:49.340
when the robots finally got rid of them.
link |
00:25:52.500
Like it was a mistake.
link |
00:25:55.180
It'll be seen as a giant mistake,
link |
00:25:57.020
but ultimately in the name of progress
link |
00:26:00.060
and it created a better world
link |
00:26:01.540
because humans were over consuming the resources
link |
00:26:05.220
and they were not very rational
link |
00:26:07.260
and were destructive in the end in terms of productivity
link |
00:26:11.060
and putting more love in the world.
link |
00:26:13.820
And so within that context,
link |
00:26:15.300
there'll be a chapter about these biological systems.
link |
00:26:17.420
You seem to have a very detailed vision of that.
link |
00:26:20.380
You should write a sci fi novel about it.
link |
00:26:22.340
I'm working on a sci fi novel currently, yes.
link |
00:26:28.100
Self published, yeah.
link |
00:26:29.460
The definition of intelligence.
link |
00:26:30.740
So intelligence is the efficiency
link |
00:26:34.660
with which you acquire new skills at tasks
link |
00:26:39.380
that you did not previously know about,
link |
00:26:41.940
that you did not prepare for, right?
link |
00:26:44.700
So intelligence is not skill itself.
link |
00:26:47.780
It's not what you know, it's not what you can do.
link |
00:26:50.740
It's how well and how efficiently
link |
00:26:52.900
you can learn new things.
link |
00:26:54.580
New things.
link |
00:26:55.580
Yes.
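The paper itself formalizes this with algorithmic information theory; very loosely paraphrased (this is a simplification written here, not the paper's exact formula), the shape of the definition is:

```latex
% Loose paraphrase of the framing in "On the Measure of Intelligence"
% (not the paper's exact formula): skill-acquisition efficiency over a
% scope of tasks, discounted by the priors and experience consumed.
\[
\text{Intelligence} \;\propto\;
\operatorname*{avg}_{\text{tasks in scope}}
\frac{\text{skill attained} \times \text{generalization difficulty}}
     {\text{priors} + \text{experience}}
\]
```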
link |
00:26:56.420
The idea of newness there
link |
00:26:58.100
seems to be fundamentally important.
link |
00:27:01.180
Yes.
link |
00:27:02.020
So you would see intelligence on display, for instance.
link |
00:27:05.780
Whenever you see a human being or an AI creature
link |
00:27:09.980
adapt to a new environment that it has not seen before,
link |
00:27:13.900
that its creators did not anticipate.
link |
00:27:16.620
When you see adaptation, when you see improvisation,
link |
00:27:19.340
when you see generalization, that's intelligence.
link |
00:27:22.500
In reverse, if you have a system
link |
00:27:24.460
that when you put it in a slightly new environment,
link |
00:27:27.100
it cannot adapt, it cannot improvise,
link |
00:27:30.060
it cannot deviate from what it's hard coded to do
link |
00:27:33.380
or what it has been trained to do,
link |
00:27:38.700
that is a system that is not intelligent.
link |
00:27:41.060
There's actually a quote from Einstein
link |
00:27:43.580
that captures this idea, which is,
link |
00:27:46.780
the measure of intelligence is the ability to change.
link |
00:27:50.740
I like that quote.
link |
00:27:51.740
I think it captures at least part of this idea.
link |
00:27:54.940
You know, there might be something interesting
link |
00:27:56.460
about the difference between your definition and Einstein's.
link |
00:27:59.500
I mean, he's just being Einstein and clever,
link |
00:28:04.740
but acquisition of new abilities to deal with new things
link |
00:28:09.740
versus ability to just change.
link |
00:28:14.100
What's the difference between those two things?
link |
00:28:16.820
So just change in itself.
link |
00:28:19.260
Do you think there's something to that?
link |
00:28:21.300
Just being able to change.
link |
00:28:23.780
Yes, being able to adapt.
link |
00:28:25.540
So not change, but certainly change its direction.
link |
00:28:30.060
Being able to adapt yourself to your environment.
link |
00:28:34.420
Whatever the environment is.
link |
00:28:35.660
That's a big part of intelligence.
link |
00:28:37.460
And intelligence is more precisely, you know,
link |
00:28:40.020
how efficiently you're able to adapt,
link |
00:28:42.460
how efficiently you're able to basically master your environment,
link |
00:28:45.740
how efficiently you can acquire new skills.
link |
00:28:49.140
And I think there's a big distinction to be drawn
link |
00:28:52.300
between intelligence, which is a process,
link |
00:28:56.220
and the output of that process, which is skill.
link |
00:29:01.420
So for instance, if you have a very smart human programmer
link |
00:29:08.980
that considers the game of chess,
link |
00:29:10.780
and that writes down a static program that can play chess,
link |
00:29:16.180
then the intelligence is the process
link |
00:29:19.140
of developing that program.
link |
00:29:20.660
But the program itself is just encoding
link |
00:29:25.660
the output artifact of that process.
link |
00:29:28.100
The program itself is not intelligent.
link |
00:29:30.020
And the way you tell it's not intelligent
link |
00:29:31.860
is that if you put it in a different context,
link |
00:29:34.020
you ask it to play Go or something,
link |
00:29:36.060
it's not going to be able to perform well
link |
00:29:37.780
without human involvement,
link |
00:29:38.900
because the source of intelligence,
link |
00:29:41.100
the entity that is capable of that process
link |
00:29:43.140
is the human programmer.
link |
00:29:44.380
So we should be able to tell the difference
link |
00:29:47.940
between the process and its output.
link |
00:29:50.100
We should not confuse the output and the process.
link |
00:29:53.260
It's the same as, you know,
link |
00:29:54.860
do not confuse a road building company
link |
00:29:58.780
and one specific road,
link |
00:30:00.180
because one specific road takes you from point A to point B,
link |
00:30:03.460
but a road building company can take you from,
link |
00:30:06.180
can make a path from anywhere to anywhere else.
link |
00:30:08.980
Yeah, that's beautifully put,
link |
00:30:10.140
but it's also to play devil's advocate a little bit.
link |
00:30:15.460
You know, it's possible that there's something
link |
00:30:18.740
more fundamental than us humans.
link |
00:30:21.260
So you kind of said the programmer creates
link |
00:30:25.860
the difference between acquiring
link |
00:30:28.940
the skill and the skill itself.
link |
00:30:31.340
There could be something like,
link |
00:30:32.780
you could argue the universe is more intelligent.
link |
00:30:36.420
Like the base intelligence that we should be trying
link |
00:30:43.020
to measure is something that created humans.
link |
00:30:46.500
We should be measuring God or the source of the universe
link |
00:30:51.540
as opposed to, like there could be a deeper intelligence.
link |
00:30:55.140
Sure.
link |
00:30:55.980
There's always deeper intelligence, I guess.
link |
00:30:57.140
You can argue that,
link |
00:30:58.020
but that does not take anything away
link |
00:31:00.100
from the fact that humans are intelligent.
link |
00:31:01.900
And you can tell that
link |
00:31:03.260
because they are capable of adaptation and generality.
link |
00:31:07.020
Got it.
link |
00:31:07.860
And you see that in particular in the fact
link |
00:31:09.700
that humans are capable of handling situations and tasks
link |
00:31:16.780
that are quite different from anything
link |
00:31:19.780
that any of our evolutionary ancestors
link |
00:31:22.940
has ever encountered.
link |
00:31:24.540
So we are capable of generalizing very much
link |
00:31:27.140
out of distribution,
link |
00:31:28.100
if you consider our evolutionary history
link |
00:31:30.260
as being in a way our training data.
link |
00:31:33.260
Of course, evolutionary biologists would argue
link |
00:31:35.060
that we're not going too far out of the distribution.
link |
00:31:37.660
We're like mapping the skills we've learned previously,
link |
00:31:41.380
desperately trying to like jam them
link |
00:31:43.540
into like these new situations.
link |
00:31:47.060
I mean, there's definitely a little bit of that,
link |
00:31:49.460
but it's pretty clear to me that we're able to,
link |
00:31:53.660
most of the things we do any given day
link |
00:31:56.580
in our modern civilization
link |
00:31:58.060
are things that are very, very different
link |
00:32:00.860
from what our ancestors a million years ago
link |
00:32:03.900
would have been doing in a given day.
link |
00:32:05.900
And your environment is very different.
link |
00:32:07.540
So I agree that everything we do,
link |
00:32:12.180
we do it with cognitive building blocks
link |
00:32:14.220
that we acquired over the course of evolution, right?
link |
00:32:17.820
And that anchors our cognition to a certain context,
link |
00:32:22.180
which is the human condition very much.
link |
00:32:25.260
But still our mind is capable of a pretty remarkable degree
link |
00:32:29.500
of generality far beyond anything we can create
link |
00:32:32.700
in artificial systems today.
link |
00:32:34.100
Like the degree in which the mind can generalize
link |
00:32:37.740
from its evolutionary history,
link |
00:32:41.620
can generalize away from its evolutionary history
link |
00:32:43.940
is much greater than the degree
link |
00:32:46.500
to which a deep learning system today
link |
00:32:48.860
can generalize away from its training data.
link |
00:32:51.020
And like the key point you're making,
link |
00:32:52.380
which I think is quite beautiful is like,
link |
00:32:54.220
we shouldn't measure, if we're talking about measurement,
link |
00:32:58.660
we shouldn't measure the skill.
link |
00:33:01.620
We should measure like the creation of the new skill,
link |
00:33:04.340
the ability to create that new skill.
link |
00:33:06.780
But it's tempting, like it's weird
link |
00:33:10.940
because the skill is a little bit of a small window
link |
00:33:13.620
into the system.
link |
00:33:16.380
So whenever you have a lot of skills,
link |
00:33:19.420
it's tempting to measure the skills.
link |
00:33:21.900
I mean, the skill is the only thing you can objectively
link |
00:33:25.820
measure, but yeah.
link |
00:33:27.540
So the thing to keep in mind is that
link |
00:33:30.780
when you see skill in the human,
link |
00:33:35.060
it gives you a strong signal that that human is intelligent
link |
00:33:39.220
because you know they weren't born with that skill typically.
link |
00:33:42.740
Like you see a very strong chess player,
link |
00:33:45.220
maybe you're a very strong chess player yourself.
link |
00:33:47.540
I think you're saying that because I'm Russian
link |
00:33:51.020
and now you're prejudiced, you assume.
link |
00:33:53.860
All Russians are good at chess.
link |
00:33:54.700
I'm biased, exactly.
link |
00:33:55.540
I'm biased, yeah.
link |
00:33:56.900
Well, you're definitely biased.
link |
00:34:00.020
So if you see a very strong chess player,
link |
00:34:01.900
you know they weren't born knowing how to play chess.
link |
00:34:05.460
So they had to acquire that skill
link |
00:34:07.780
with their limited resources, with their limited lifetime.
link |
00:34:10.940
And they did that because they are generally intelligent.
link |
00:34:15.420
And so they may as well have acquired any other skill.
link |
00:34:18.980
You know they have this potential.
link |
00:34:21.180
And on the other hand, if you see a computer playing chess,
link |
00:34:25.700
you cannot make the same assumptions
link |
00:34:27.860
because you cannot just assume
link |
00:34:29.380
the computer is generally intelligent.
link |
00:34:30.860
The computer may be born knowing how to play chess
link |
00:34:35.300
in the sense that it may have been programmed by a human
link |
00:34:38.220
that has understood chess for the computer
link |
00:34:40.900
and that has just encoded the output
link |
00:34:44.180
of that understanding in a static program.
link |
00:34:46.020
And that program is not intelligent.
link |
00:34:49.420
So let's zoom out just for a second and say like,
link |
00:34:52.380
what is the goal of the measure of intelligence paper?
link |
00:34:57.460
Like what do you hope to achieve with it?
link |
00:34:59.020
So the goal of the paper is to clear up
link |
00:35:01.700
some longstanding misunderstandings
link |
00:35:04.580
about the way we've been conceptualizing intelligence
link |
00:35:08.380
in the AI community and in the way we've been
link |
00:35:12.500
evaluating progress in AI.
link |
00:35:16.780
There's been a lot of progress recently in machine learning
link |
00:35:19.060
and people are extrapolating from that progress
link |
00:35:22.140
that we are about to solve general intelligence.
link |
00:35:26.380
And if you want to be able to evaluate these statements,
link |
00:35:30.500
you need to precisely define what you're talking about
link |
00:35:33.820
when you're talking about general intelligence.
link |
00:35:35.580
And you need a formal way, a reliable way to measure
link |
00:35:40.580
how much intelligence,
link |
00:35:42.380
how much general intelligence a system possesses.
link |
00:35:45.900
And ideally this measure of intelligence
link |
00:35:48.420
should be actionable.
link |
00:35:50.260
So it should not just describe what intelligence is.
link |
00:35:54.620
It should not just be a binary indicator
link |
00:35:56.860
that tells you the system is intelligent or it isn't.
link |
00:36:01.620
It should be actionable.
link |
00:36:03.060
It should have explanatory power, right?
link |
00:36:05.740
So you could use it as a feedback signal.
link |
00:36:08.580
It would show you the way
link |
00:36:10.980
towards building more intelligent systems.
link |
00:36:13.100
So at the first level, you draw a distinction
link |
00:36:16.500
between two divergent views of intelligence.
link |
00:36:21.780
As we just talked about,
link |
00:36:22.860
intelligence as a collection of task-specific skills
link |
00:36:26.820
versus a general learning ability.
link |
00:36:29.900
So what's the difference between
link |
00:36:32.300
kind of this memorization of skills
link |
00:36:35.580
and a general learning ability?
link |
00:36:37.820
We've talked about it a little bit,
link |
00:36:39.580
but can you try to linger on this topic for a bit?
link |
00:36:43.060
Yeah, so the first part of the paper
link |
00:36:45.460
is an assessment of the different ways
link |
00:36:49.100
we've been thinking about intelligence
link |
00:36:50.500
and the different ways we've been evaluating progress in AI.
link |
00:36:54.540
And the history of cognitive sciences
link |
00:36:57.700
has been shaped by two views of the human mind.
link |
00:37:01.220
And one view is the evolutionary psychology view
link |
00:37:04.740
in which the mind is a collection of fairly static
link |
00:37:10.660
special purpose ad hoc mechanisms
link |
00:37:14.220
that have been hard coded by evolution
link |
00:37:17.620
over our history as a species for a very long time.
link |
00:37:22.500
And early AI researchers,
link |
00:37:27.940
people like Marvin Minsky, for instance,
link |
00:37:30.340
they clearly subscribed to this view.
link |
00:37:33.300
And they saw the mind as a kind of
link |
00:37:36.860
collection of static programs
link |
00:37:39.820
similar to the programs they would run
link |
00:37:42.140
on like mainframe computers.
link |
00:37:43.580
And in fact, I think they very much understood the mind
link |
00:37:48.060
through the metaphor of the mainframe computer
link |
00:37:50.540
because that was the tool they were working with, right?
link |
00:37:53.580
And so you had these static programs,
link |
00:37:55.100
this collection of very different static programs
link |
00:37:57.180
operating over a database-like memory.
link |
00:38:00.060
And in this picture, learning was not very important.
link |
00:38:03.580
Learning was considered to be just memorization.
link |
00:38:05.660
And in fact, learning is basically not featured
link |
00:38:10.380
in AI textbooks until the 1980s
link |
00:38:14.620
with the rise of machine learning.
link |
00:38:16.940
It's kind of fun to think about
link |
00:38:18.780
that learning was the outcast.
link |
00:38:21.500
Like the weird people working on learning,
link |
00:38:24.060
like the mainstream AI world was,
link |
00:38:28.100
I mean, I don't know what the best term is,
link |
00:38:31.780
but it's non learning.
link |
00:38:33.900
It was seen as like reasoning would not be learning based.
link |
00:38:37.940
Yes, it was considered that the mind
link |
00:38:40.620
was a collection of programs
link |
00:38:43.180
that were primarily logical in nature.
link |
00:38:46.620
And that's all you needed to do to create a mind
link |
00:38:49.140
was to write down these programs
link |
00:38:50.860
and they would operate over knowledge,
link |
00:38:52.860
which would be stored in some kind of database.
link |
00:38:55.100
And as long as your database would encompass,
link |
00:38:57.300
you know, everything about the world
link |
00:38:59.380
and your logical rules were comprehensive,
link |
00:39:03.340
then you would have a mind.
link |
00:39:04.940
So the other view of the mind
link |
00:39:06.420
is the brain as a sort of blank slate, right?
link |
00:39:11.940
This is a very old idea.
link |
00:39:13.180
You find it in John Locke's writings.
link |
00:39:16.140
This is the tabula rasa.
link |
00:39:19.220
And this is this idea that the mind
link |
00:39:21.140
is some kind of like information sponge
link |
00:39:23.340
that starts empty, that starts blank.
link |
00:39:27.340
And that absorbs knowledge and skills from experience, right?
link |
00:39:34.340
So it's a sponge that reflects the complexity of the world,
link |
00:39:38.700
the complexity of your life experience, essentially.
link |
00:39:41.780
That everything you know and everything you can do
link |
00:39:44.340
is a reflection of something you found
link |
00:39:47.740
in the outside world, essentially.
link |
00:39:49.580
So this is an idea that's very old.
link |
00:39:51.580
That was not very popular, for instance, in the 1970s.
link |
00:39:56.780
But that gained a lot of vitality recently
link |
00:39:58.820
with the rise of connectionism, in particular deep learning.
link |
00:40:02.300
And so today, deep learning
link |
00:40:03.780
is the dominant paradigm in AI.
link |
00:40:06.540
And I feel like lots of AI researchers
link |
00:40:10.420
are conceptualizing the mind via a deep learning metaphor.
link |
00:40:14.980
Like they see the mind as a kind of
link |
00:40:17.820
randomly initialized neural network that starts blank
link |
00:40:21.660
when you're born.
link |
00:40:22.500
And then that gets trained via exposure to training data,
link |
00:40:26.100
that acquires knowledge and skills
link |
00:40:27.740
via exposure to training data.
link |
00:40:29.220
By the way, it's a small tangent.
link |
00:40:32.700
I feel like people who are thinking about intelligence
link |
00:40:36.700
are not conceptualizing it that way.
link |
00:40:39.700
I actually haven't met too many people
link |
00:40:41.820
who believe that a neural network
link |
00:40:44.700
will be able to reason, who seriously think that rigorously.
link |
00:40:51.660
Because I think it's actually an interesting worldview.
link |
00:40:54.260
And we'll talk about it more,
link |
00:40:56.420
but it's been impressive what neural networks
link |
00:41:00.420
have been able to accomplish.
link |
00:41:02.100
And to me, I don't know, you might disagree,
link |
00:41:04.540
but it's an open question whether like scaling size
link |
00:41:09.820
eventually might lead to incredible results
link |
00:41:13.660
that to us mere humans will appear as if it's general.
link |
00:41:17.060
I mean, if you ask people who are seriously thinking
link |
00:41:19.860
about intelligence, they will definitely not say
link |
00:41:22.660
that all you need to do is,
link |
00:41:24.900
like the mind is just a neural network.
link |
00:41:27.420
However, it's actually a view that's very popular,
link |
00:41:30.420
I think, in the deep learning community
link |
00:41:31.780
that many people are kind of conceptually
link |
00:41:35.460
intellectually lazy about it.
link |
00:41:37.140
Right, but I guess what I'm saying is exactly that:
link |
00:41:40.500
it's, I mean, I haven't met many people
link |
00:41:44.740
and I think it would be interesting to meet a person
link |
00:41:47.740
who is not intellectually lazy about this particular topic
link |
00:41:50.260
and still believes that neural networks will go all the way.
link |
00:41:54.460
I think Yann LeCun is probably closest to that
link |
00:41:57.660
with self-supervised learning.
link |
00:41:57.660
There are definitely people who argue
link |
00:41:59.660
that current deep learning techniques
link |
00:42:03.100
are already the way to general artificial intelligence.
link |
00:42:06.860
And that all you need to do is to scale it up
link |
00:42:09.460
to all the available training data.
link |
00:42:12.780
And that's, if you look at the waves
link |
00:42:16.300
that OpenAI's GPT3 model has made,
link |
00:42:19.500
you see echoes of this idea.
link |
00:42:22.700
So on that topic, GPT3, similar to GPT2 actually,
link |
00:42:28.980
has captivated some part of the imagination of the public.
link |
00:42:33.060
There's just a bunch of hype of different kind.
link |
00:42:35.580
That's, I would say it's emergent.
link |
00:42:37.940
It's not artificially manufactured.
link |
00:42:39.820
It's just like people just get excited
link |
00:42:42.580
for some strange reason.
link |
00:42:43.780
And in the case of GPT3, which is funny,
link |
00:42:46.500
that there's, I believe, a couple months delay
link |
00:42:49.100
from release to hype.
link |
00:42:51.580
Maybe I'm not historically correct on that,
link |
00:42:56.780
but it feels like there was a little bit of a lack of hype
link |
00:43:01.260
and then there's a phase shift into hype.
link |
00:43:04.780
But nevertheless, there's a bunch of cool applications
link |
00:43:07.460
that seem to captivate the imagination of the public
link |
00:43:10.380
about what this language model
link |
00:43:12.140
that's trained in unsupervised way
link |
00:43:15.180
without any fine tuning is able to achieve.
link |
00:43:19.500
So what do you make of that?
link |
00:43:20.900
What are your thoughts about GPT3?
link |
00:43:22.940
Yeah, so I think what's interesting about GPT3
link |
00:43:25.700
is the idea that it may be able to learn new tasks
link |
00:43:31.180
after just being shown a few examples.
link |
00:43:33.580
So I think if it's actually capable of doing that,
link |
00:43:35.620
that's novel and that's very interesting
link |
00:43:37.580
and that's something we should investigate.
link |
00:43:39.900
That said, I must say, I'm not entirely convinced
link |
00:43:43.140
that we have shown it's capable of doing that.
link |
00:43:47.300
It's very likely, given the amount of data
link |
00:43:50.980
that the model is trained on,
link |
00:43:52.220
that what it's actually doing is pattern matching
link |
00:43:55.700
a new task you give it with a task
link |
00:43:58.060
that it's been exposed to in its training data.
link |
00:44:00.100
It's just recognizing the task
link |
00:44:01.620
instead of actually developing a model of the task, right?
link |
00:44:05.540
But there's, sorry to interrupt,
link |
00:44:07.660
there's a parallel as to what you said before,
link |
00:44:10.020
which is it's possible to see GPT3 as like the prompts
link |
00:44:14.620
it's given as a kind of SQL query
link |
00:44:17.780
into this thing that it's learned,
link |
00:44:19.580
similar to what you said before,
link |
00:44:20.860
which is language is used to query the memory.
link |
00:44:23.340
Yes.
link |
00:44:24.180
So is it possible that a neural network
link |
00:44:26.940
is a giant memorization thing,
link |
00:44:29.300
but then if it gets sufficiently giant,
link |
00:44:32.260
it'll memorize sufficiently large amounts
link |
00:44:35.100
of things in the world or it becomes,
link |
00:44:37.860
or intelligence becomes a querying machine?
link |
00:44:40.580
I think it's possible that a significant chunk
link |
00:44:44.180
of intelligence is this giant associative memory.
link |
00:44:48.740
I definitely don't believe that intelligence
link |
00:44:51.340
is just a giant associative memory,
link |
00:44:53.740
but it may well be a big component.
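A minimal sketch of the associative-memory picture being discussed, assuming toy random vectors: experiences are stored as key/value pairs, and a query just retrieves the value of the nearest key, with no reasoning involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five stored "experiences": random 8-dimensional keys with toy values.
keys = rng.normal(size=(5, 8))
values = np.arange(5)

def query(probe):
    """Return the stored value whose key is most similar to the probe."""
    sims = keys @ probe / (np.linalg.norm(keys, axis=1) * np.linalg.norm(probe))
    return values[np.argmax(sims)]

# A probe near key 3 retrieves value 3: pure lookup, not understanding.
probe = keys[3] + 0.05 * rng.normal(size=8)
print(query(probe))  # -> 3
```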
link |
00:44:57.660
So do you think GPT3, 4, 5,
link |
00:45:02.660
GPT10 will eventually, like, what do you think,
link |
00:45:07.140
where's the ceiling?
link |
00:45:08.340
Do you think you'll be able to reason?
link |
00:45:11.980
No, that's a bad question.
link |
00:45:14.620
Like, what is the ceiling is the better question.
link |
00:45:17.340
How well is it gonna scale?
link |
00:45:18.500
How good is GPTN going to be?
link |
00:45:21.180
Yeah.
link |
00:45:22.020
So I believe GPTN is gonna.
link |
00:45:25.420
GPTN.
link |
00:45:26.860
Is gonna improve on the strength of GPT2 and 3,
link |
00:45:30.940
which is it will be able to generate, you know,
link |
00:45:33.980
ever more plausible text in context.
link |
00:45:37.660
Just monotonically increasing performance.
link |
00:45:41.260
Yes, if you train a bigger model on more data,
link |
00:45:44.340
then your text will be increasingly more context aware
link |
00:45:49.340
and increasingly more plausible
link |
00:45:51.220
in the same way that GPT3 is much better
link |
00:45:54.700
at generating plausible text compared to GPT2.
link |
00:45:57.500
But that said, I don't think just scaling up the model
link |
00:46:01.940
to more transformer layers and more training data
link |
00:46:04.180
is gonna address the flaws of GPT3,
link |
00:46:07.020
which is that it can generate plausible text,
link |
00:46:09.900
but that text is not constrained by anything else
link |
00:46:13.620
other than plausibility.
link |
00:46:15.180
So in particular, it's not constrained by factualness
link |
00:46:19.180
or even consistency, which is why it's very easy
link |
00:46:21.820
to get GPT3 to generate statements
link |
00:46:23.860
that are factually untrue.
link |
00:46:26.260
Or to generate statements that are even self contradictory.
link |
00:46:29.580
Right?
link |
00:46:30.420
Because its only goal is plausibility,
link |
00:46:35.420
and it has no other constraints.
link |
00:46:37.620
It's not constrained to be self consistent, for instance.
link |
00:46:40.300
Right?
link |
00:46:41.140
And so for this reason, one thing that I thought
link |
00:46:43.540
was very interesting with GPT3 is that you can
link |
00:46:46.780
predetermine the answer it will give you
link |
00:46:49.780
by asking the question in a specific way,
link |
00:46:52.020
because it's very responsive to the way you ask the question.
link |
00:46:55.260
Since it has no understanding of the content of the question.
link |
00:47:00.260
Right.
link |
00:47:01.100
And if you ask the same question in two different ways
link |
00:47:05.620
that are basically adversarially engineered
link |
00:47:09.020
to produce certain answer,
link |
00:47:10.260
you will get two different answers,
link |
00:47:12.740
two contradictory answers.
link |
00:47:14.180
It's very susceptible to adversarial attacks, essentially.
link |
00:47:16.660
Potentially, yes.
link |
00:47:17.780
So in general, the problem with these models,
link |
00:47:20.820
these generative models, is that they are very good
link |
00:47:24.180
at generating plausible text,
link |
00:47:27.220
but that's just not enough.
link |
00:47:29.660
Right?
link |
00:47:33.620
I think one avenue that would be very interesting
link |
00:47:36.500
to make progress is to make it possible
link |
00:47:40.780
to write programs over the latent space
link |
00:47:43.860
that these models operate on.
link |
00:47:45.620
That you would rely on these self supervised models
link |
00:47:49.460
to generate a sort of pool of knowledge and concepts
link |
00:47:54.340
and common sense.
link |
00:47:55.260
And then you would be able to write
link |
00:47:57.180
explicit reasoning programs over it.
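A hedged sketch of that idea: a learned encoder provides fuzzy retrieval over a pool of knowledge, and a hand-written program does the explicit reasoning on top. The `encode` function here is a toy stand-in (character-trigram hashing), not a real self-supervised model, and the facts are made up.

```python
import numpy as np

def encode(text, dim=64):
    """Toy stand-in for a learned sentence encoder."""
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

facts = ["cats are mammals", "mammals are animals", "paris is in france"]
fact_vecs = np.stack([encode(f) for f in facts])

def retrieve(query_text, k=2):
    """The learned (here, faked) part: fuzzy lookup in latent space."""
    sims = fact_vecs @ encode(query_text)
    return [facts[i] for i in np.argsort(-sims)[:k]]

def is_a(x, y):
    """The explicit part: deterministically chain retrieved 'X are Y' facts."""
    hops = [f.split(" are ")[1] for f in retrieve(x + " are")
            if f.startswith(x + " are")]
    return bool(hops) and (hops[0] == y or is_a(hops[0], y))

print(is_a("cats", "animals"))  # True, via cats -> mammals -> animals
```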
link |
00:48:01.460
Because the current problem with GPT3 is that
link |
00:48:03.660
it can be quite difficult to get it to do what you want it to do.
link |
00:48:09.420
If you want to turn GPT3 into products,
link |
00:48:12.420
you need to put constraints on it.
link |
00:48:14.780
You need to force it to obey certain rules.
link |
00:48:19.500
So you need a way to program it explicitly.
link |
00:48:22.540
Yeah, so if you look at its ability
link |
00:48:24.220
to do program synthesis,
link |
00:48:26.140
it generates, like you said, something that's plausible.
link |
00:48:29.060
Yeah, so if you try to make it generate programs,
link |
00:48:32.580
it will perform well for any program
link |
00:48:35.940
that it has seen in its training data.
link |
00:48:38.700
But because program space is not interpolative, right?
link |
00:48:42.940
It's not going to be able to generalize to problems
link |
00:48:46.740
it hasn't seen before.
link |
00:48:48.700
Now, here's a sort of absurd,
link |
00:48:54.980
but I think useful, intuition builder:
link |
00:49:00.340
you know, GPT3 has 175 billion parameters.
link |
00:49:07.340
The human brain has about a thousand times that
link |
00:49:11.740
or more in terms of number of synapses.
link |
00:49:16.380
Do you think, obviously, very different kinds of things,
link |
00:49:21.180
but there is some degree of similarity.
link |
00:49:26.380
Do you think, what do you think GPT will look like
link |
00:49:30.700
when it has 100 trillion parameters?
link |
00:49:34.180
Do you think our conversation might be different in nature?
link |
00:49:39.100
Like, because you've criticized GPT3 very effectively now.
link |
00:49:42.940
Do you think?
link |
00:49:45.420
No, I don't think so.
link |
00:49:46.940
So to begin with, the bottleneck with scaling up GPT3,
link |
00:49:51.020
GPT models, generative pre trained transformer models,
link |
00:49:54.860
is not going to be the size of the model
link |
00:49:57.620
or how long it takes to train it.
link |
00:49:59.580
The bottleneck is going to be the training data
link |
00:50:01.860
because OpenAI is already training GPT3
link |
00:50:05.540
on a crawl of basically the entire web, right?
link |
00:50:08.860
And that's a lot of data.
link |
00:50:09.820
So you could imagine training on more data than that,
link |
00:50:12.140
like Google could train on more data than that,
link |
00:50:14.460
but it would still be only incrementally more data.
link |
00:50:17.500
And I don't recall exactly how much more data GPT3
link |
00:50:21.340
was trained on compared to GPT2,
link |
00:50:22.820
but it's probably at least like a hundred,
link |
00:50:25.100
maybe even a thousand X.
link |
00:50:26.620
I don't have the exact number.
link |
00:50:28.460
You're not going to be able to train a model
link |
00:50:30.140
on a hundred times more data than what you're already doing.
link |
00:50:34.180
So that's brilliant.
link |
00:50:35.300
So it's easier to think of compute as a bottleneck
link |
00:50:38.940
and then argue that we can remove that bottleneck.
link |
00:50:41.380
But we can remove the compute bottleneck.
link |
00:50:43.060
I don't think it's a big problem.
link |
00:50:44.580
If you look at the pace at which we've improved
link |
00:50:48.500
the efficiency of deep learning models
link |
00:50:51.340
in the past few years,
link |
00:50:54.060
I'm not worried about train time bottlenecks
link |
00:50:57.180
or model size bottlenecks.
link |
00:50:59.580
The bottleneck in the case
link |
00:51:01.140
of these generative transformer models
link |
00:51:03.420
is absolutely the training data.
link |
00:51:05.540
What about the quality of the data?
link |
00:51:07.740
So, yeah.
link |
00:51:08.580
So the quality of the data is an interesting point.
link |
00:51:10.900
The thing is,
link |
00:51:11.900
if you're going to want to use these models
link |
00:51:14.460
in real products,
link |
00:51:16.900
then you want to feed them data
link |
00:51:20.060
that's as high quality, as factual,
link |
00:51:23.460
I would say as unbiased as possible,
link |
00:51:25.620
though there's not really such a thing
link |
00:51:27.340
as unbiased data in the first place.
link |
00:51:30.500
But you probably don't want to train it on Reddit,
link |
00:51:34.020
for instance.
link |
00:51:34.860
It sounds like a bad plan.
link |
00:51:37.060
So from my personal experience,
link |
00:51:38.620
working with large scale deep learning models.
link |
00:51:42.740
So at some point I was working on a model at Google
link |
00:51:46.580
that's trained on 350 million labeled images.
link |
00:51:52.340
It's an image classification model.
link |
00:51:53.660
That's a lot of images.
link |
00:51:54.660
That's like probably most publicly available images
link |
00:51:58.140
on the web at the time.
link |
00:52:00.980
And it was a very noisy data set
link |
00:52:03.900
because the labels were not originally annotated by hand,
link |
00:52:07.820
by humans.
link |
00:52:08.660
They were automatically derived from like tags
link |
00:52:12.420
on social media,
link |
00:52:14.300
or just keywords in the same page
link |
00:52:16.820
as the image was found and so on.
link |
00:52:18.220
So it was very noisy.
link |
00:52:19.140
And it turned out that you could easily get a better model,
link |
00:52:25.340
not just by training,
link |
00:52:26.500
like if you train on more of the noisy data,
link |
00:52:29.980
you get an incrementally better model,
link |
00:52:31.540
but you very quickly hit diminishing returns.
link |
00:52:35.500
On the other hand,
link |
00:52:36.660
if you train on a smaller data set
link |
00:52:38.420
with higher quality annotations,
link |
00:52:40.020
annotations that are actually made by humans,
link |
00:52:45.380
you get a better model.
link |
00:52:47.340
And it also takes less time to train it.
link |
00:52:49.860
Yeah, that's fascinating.
link |
00:52:51.580
It's the self supervised learning.
link |
00:52:53.500
There's a way to get better at doing the automated labeling.
link |
00:52:58.780
Yeah, so you can enrich or refine your labels
link |
00:53:04.620
in an automated way.
link |
00:53:05.860
That's correct.
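One generic way to refine labels automatically is self-training: fit a model on the noisy labels, then relabel only the points it is confident about. A minimal sketch on synthetic data; this illustrates the general technique, not the actual Google pipeline discussed above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: two well-separated blobs, with 30% of the labels flipped
# to mimic noisy social-media tags.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y_true = np.array([0] * 200 + [1] * 200)
noisy = y_true.copy()
flip = rng.random(400) < 0.3
noisy[flip] = 1 - noisy[flip]

# Fit on noisy labels, then overwrite labels where the model is confident.
model = LogisticRegression().fit(X, noisy)
proba = model.predict_proba(X)
confident = proba.max(axis=1) > 0.9
refined = np.where(confident, proba.argmax(axis=1), noisy)

print("noisy label accuracy:  ", (noisy == y_true).mean())
print("refined label accuracy:", (refined == y_true).mean())
```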
link |
00:53:07.460
Do you have a hope for,
link |
00:53:08.700
I don't know if you're familiar
link |
00:53:09.580
with the idea of a semantic web.
link |
00:53:11.980
The semantic web, just for people who are not familiar,
link |
00:53:15.620
is the idea of being able to convert the internet,
link |
00:53:20.620
or be able to attach like semantic meaning
link |
00:53:25.700
to the words on the internet,
link |
00:53:27.940
the sentences, the paragraphs,
link |
00:53:29.780
to be able to convert information on the internet
link |
00:53:33.940
or some fraction of the internet
link |
00:53:35.660
into something that's interpretable by machines.
link |
00:53:39.140
That was kind of a dream for,
link |
00:53:44.260
I think the semantic web papers in the nineties,
link |
00:53:47.020
it's kind of the dream that, you know,
link |
00:53:49.740
the internet is full of rich, exciting information.
link |
00:53:52.340
Even just looking at Wikipedia,
link |
00:53:54.420
we should be able to use that as data for machines.
link |
00:53:57.780
And so far it's not,
link |
00:53:58.980
it's not really in a format that's available to machines.
link |
00:54:01.220
So no, I don't think the semantic web will ever work
link |
00:54:04.540
simply because it would be a lot of work, right?
link |
00:54:08.020
To provide that information in structured form.
link |
00:54:12.020
And there is not really any incentive
link |
00:54:13.820
for anyone to provide that work.
link |
00:54:16.340
So I think the way forward to make the knowledge
link |
00:54:21.180
on the web available to machines
link |
00:54:22.820
is actually something closer to unsupervised deep learning.
link |
00:54:29.140
So GPT3 is actually a bigger step in the direction
link |
00:54:32.220
of making the knowledge of the web available to machines
link |
00:54:34.940
than the semantic web was.
link |
00:54:36.660
Yeah, perhaps in a human centric sense,
link |
00:54:40.140
it feels like GPT3 hasn't learned anything
link |
00:54:47.300
that could be used to reason.
link |
00:54:50.340
But that might be just the early days.
link |
00:54:52.820
Yeah, I think that's correct.
link |
00:54:54.300
I think the forms of reasoning that you see it perform
link |
00:54:57.340
are basically just reproducing patterns
link |
00:55:00.660
that it has seen in its training data.
link |
00:55:02.380
So of course, if you're trained on the entire web,
link |
00:55:06.580
then you can produce an illusion of reasoning
link |
00:55:09.340
in many different situations.
link |
00:55:10.740
But it will break down if it's presented
link |
00:55:13.100
with a novel situation.
link |
00:55:15.260
That's the open question between the illusion of reasoning
link |
00:55:17.660
and actual reasoning, yeah.
link |
00:55:18.700
Yes.
link |
00:55:19.660
The power to adapt to something that is genuinely new.
link |
00:55:22.780
Because the thing is, imagine you could
link |
00:55:28.020
train on every bit of data
link |
00:55:31.100
ever generated in the history of humanity.
link |
00:55:35.500
That model would be capable
link |
00:55:38.540
of anticipating many different possible situations.
link |
00:55:43.220
But it remains that the future is
link |
00:55:45.660
going to be something different.
link |
00:55:48.940
For instance, if you train a GPT3 model on data
link |
00:55:52.940
from the year 2002, for instance,
link |
00:55:55.700
and then use it today, it's going to be missing many things.
link |
00:55:58.260
It's going to be missing many common sense
link |
00:56:00.740
facts about the world.
link |
00:56:02.620
It's even going to be missing vocabulary and so on.
link |
00:56:05.820
Yeah, it's interesting that GPT3 doesn't even have,
link |
00:56:09.580
I think, any information about the coronavirus.
link |
00:56:13.580
Yes.
link |
00:56:14.980
Which is why you can
link |
00:56:19.620
tell that a system is intelligent
link |
00:56:21.300
when it's capable of adapting.
link |
00:56:22.860
So intelligence is going to require
link |
00:56:25.580
some amount of continuous learning.
link |
00:56:28.140
It's also going to require some amount of improvisation.
link |
00:56:31.020
It's not enough to assume that what you're
link |
00:56:33.980
going to be asked to do is something
link |
00:56:36.780
that you've seen before, or something
link |
00:56:39.300
that is a simple interpolation of things you've seen before.
link |
00:56:42.700
Yeah.
link |
00:56:43.340
In fact, that model breaks down even for
link |
00:56:49.060
tasks that look relatively simple from a distance,
link |
00:56:52.300
like L5 self driving, for instance.
link |
00:56:55.660
Google had a paper a couple of years
link |
00:56:58.420
back showing that something like 30 million different road
link |
00:57:04.540
situations were actually completely insufficient
link |
00:57:07.180
to train a driving model.
link |
00:57:09.780
It wasn't even L2, right?
link |
00:57:11.740
And that's a lot of data.
link |
00:57:12.820
That's a lot more data than the 20 or 30 hours of driving
link |
00:57:16.940
that a human needs to learn to drive,
link |
00:57:19.580
given the knowledge they've already accumulated.
link |
00:57:21.900
Well, let me ask you on that topic.
link |
00:57:25.540
Elon Musk, Tesla Autopilot, one of the only companies,
link |
00:57:31.100
I believe, is really pushing for a learning based approach.
link |
00:57:34.660
Are you skeptical that that kind of network
link |
00:57:37.020
can achieve level 4?
link |
00:57:39.460
L4 is probably achievable.
link |
00:57:42.660
L5 probably not.
link |
00:57:44.420
What's the distinction there?
link |
00:57:45.860
Is L5 where you can completely just fall asleep?
link |
00:57:49.340
Yeah, L5 is basically human level.
link |
00:57:51.060
Well, with driving, we have to be careful saying human level,
link |
00:57:53.740
because that's the most of the drivers.
link |
00:57:57.180
Yeah, that's the clearest example of cars
link |
00:58:00.620
will most likely be much safer than humans in many situations
link |
00:58:05.020
where humans fail.
link |
00:58:06.540
It's the vice versa question.
link |
00:58:09.860
I'll tell you, the thing is, the amount of training data
link |
00:58:13.820
you would need to anticipate for pretty much every possible
link |
00:58:17.020
situation you'll encounter in the real world
link |
00:58:20.460
is such that it's not entirely unrealistic
link |
00:58:23.500
to think that at some point in the future,
link |
00:58:25.540
we'll develop a system that's trained on enough data,
link |
00:58:27.700
especially provided that we can simulate a lot of that data.
link |
00:58:32.340
We don't necessarily need actual cars
link |
00:58:34.500
on the road for everything.
link |
00:58:37.620
But it's a massive effort.
link |
00:58:39.780
And it turns out you can create a system that's
link |
00:58:42.100
much more adaptive, that can generalize much better
link |
00:58:45.180
if you just add explicit models of the surroundings
link |
00:58:52.060
of the car.
link |
00:58:53.580
And if you use deep learning for what
link |
00:58:55.180
it's good at, which is to provide
link |
00:58:57.460
perceptual information.
link |
00:58:59.500
So in general, deep learning is a way
link |
00:59:02.460
to encode perception and a way to encode intuition.
link |
00:59:05.740
But it is not a good medium for any sort of explicit reasoning.
link |
00:59:11.100
And in AI systems today, strong generalization
link |
00:59:15.940
tends to come from explicit models,
link |
00:59:21.020
tends to come from abstractions in the human mind that
link |
00:59:24.540
are encoded in program form by a human engineer.
link |
00:59:29.540
These are the abstractions you can actually generalize, not
link |
00:59:31.580
the sort of weak abstraction that
link |
00:59:33.380
is learned by a neural network.
link |
00:59:34.860
Yeah, and the question is how much reasoning,
link |
00:59:38.540
how much strong abstractions are required
link |
00:59:41.940
to solve particular tasks like driving.
link |
00:59:44.620
That's the question.
link |
00:59:46.540
Or human life existence.
link |
00:59:48.860
How much strong abstractions does existence require?
link |
00:59:53.340
But more specifically on driving,
link |
00:59:58.100
that seems to be a coupled question about intelligence.
link |
01:00:02.180
How much intelligence, how do you
link |
01:00:05.740
build an intelligent system?
link |
01:00:07.140
And the coupled problem, how hard is this problem?
link |
01:00:11.420
How much intelligence does this problem actually require?
link |
01:00:14.380
So we get to cheat because we get
link |
01:00:18.460
to look at the problem.
link |
01:00:20.700
It's not like we get to close our eyes
link |
01:00:22.860
and come to driving completely new.
link |
01:00:24.740
We get to do what we do as human beings, which
link |
01:00:27.020
is for the majority of our life before we ever
link |
01:00:31.100
learn, quote unquote, to drive.
link |
01:00:32.460
We get to watch other cars and other people drive.
link |
01:00:35.460
We get to be in cars.
link |
01:00:36.540
We get to watch.
link |
01:00:37.540
We get to see movies about cars.
link |
01:00:39.500
We get to observe all this stuff.
link |
01:00:42.700
And that's similar to what neural networks are doing.
link |
01:00:45.060
It's getting a lot of data, and the question
link |
01:00:50.340
is, yeah, how many leaps of reasoning genius
link |
01:00:55.740
is required to be able to actually effectively drive?
link |
01:00:59.420
I think driving is a good example of that.
link |
01:01:01.260
I mean, sure, you've seen a lot of cars in your life
link |
01:01:06.260
before you learned to drive.
link |
01:01:07.700
But let's say you've learned to drive in Silicon Valley,
link |
01:01:10.620
and now you rent a car in Tokyo.
link |
01:01:14.100
Well, now everyone is driving on the other side of the road,
link |
01:01:16.820
and the signs are different, and the roads
link |
01:01:19.220
are more narrow and so on.
link |
01:01:20.500
So it's a very, very different environment.
link |
01:01:22.660
And a smart human, even an average human,
link |
01:01:26.780
should be able to just zero shot it,
link |
01:01:29.300
to just be operational in this very different environment
link |
01:01:34.260
right away, despite having had no contact with the novel
link |
01:01:40.500
complexity that is contained in this environment.
link |
01:01:44.140
And that novel complexity is not just an interpolation
link |
01:01:49.780
over the situations that you've encountered previously,
link |
01:01:52.420
like learning to drive in the US.
link |
01:01:55.060
I would say the reason I ask is one
link |
01:01:57.300
of the most interesting tests of intelligence
link |
01:01:59.940
we have today, actually, which is driving,
link |
01:02:04.460
in terms of having an impact on the world.
link |
01:02:06.740
When do you think we'll pass that test of intelligence?
link |
01:02:09.900
So I don't think driving is that much of a test of intelligence,
link |
01:02:13.380
because again, there is no task for which skill at that task
link |
01:02:18.500
demonstrates intelligence, unless it's
link |
01:02:21.980
a kind of meta task that involves acquiring new skills.
link |
01:02:26.540
So I don't think, I think you can actually
link |
01:02:28.260
solve driving without having any real amount of intelligence.
link |
01:02:35.060
For instance, if you did have infinite training data,
link |
01:02:39.540
you could just literally train an end to end deep learning
link |
01:02:42.660
model that does driving, provided infinite training data.
link |
01:02:45.700
The only problem with the whole idea
link |
01:02:48.940
is collecting a data set that's sufficiently comprehensive,
link |
01:02:53.500
that covers the very long tail of possible situations
link |
01:02:56.380
you might encounter.
link |
01:02:57.260
And it's really just a scale problem.
link |
01:02:59.380
So I think there's nothing fundamentally wrong
link |
01:03:04.500
with this plan, with this idea.
link |
01:03:06.500
It's just that it strikes me as a fairly inefficient thing
link |
01:03:11.260
to do, because you run into this scaling issue with diminishing
link |
01:03:17.340
returns.
link |
01:03:17.860
Whereas if instead you took a more manual engineering
link |
01:03:21.980
approach, where you use deep learning modules in combination
link |
01:03:29.020
with engineering an explicit model of the surroundings
link |
01:03:33.220
of the car, and you bridge the two in a clever way,
link |
01:03:36.100
your model will actually start generalizing
link |
01:03:38.900
much earlier and more effectively
link |
01:03:40.900
than the end to end deep learning model.
link |
01:03:42.540
So why would you not go with the more manual engineering
link |
01:03:46.500
oriented approach?
link |
01:03:47.900
Even if you created that system, either the end
link |
01:03:50.620
to end deep learning model system that's
link |
01:03:52.500
trained on infinite data, or the slightly more human-engineered system,
link |
01:03:58.500
I don't think achieving L5 would demonstrate
link |
01:04:02.740
general intelligence or intelligence
link |
01:04:04.540
of any generality at all.
link |
01:04:05.740
Again, the only possible test of generality in AI
link |
01:04:10.580
would be a test that looks at skill acquisition
link |
01:04:12.740
over unknown tasks.
link |
01:04:14.500
For instance, you could take your L5 driver
link |
01:04:17.380
and ask it to learn to pilot a commercial airplane,
link |
01:04:21.540
for instance.
link |
01:04:22.420
And then you would look at how much human involvement is
link |
01:04:25.180
required and how much training data
link |
01:04:26.740
is required for the system to learn to pilot an airplane.
link |
01:04:29.860
And that gives you a measure of how intelligent
link |
01:04:35.020
that system really is.
link |
01:04:35.860
Yeah, well, I mean, that's a big leap.
link |
01:04:37.540
I get you.
link |
01:04:38.060
But I'm more interested, as a problem, I would see,
link |
01:04:42.820
to me, driving is a black box that
link |
01:04:47.380
can generate novel situations at some rate,
link |
01:04:51.180
what people call edge cases.
link |
01:04:53.500
So it does have newness that we keep being
link |
01:04:56.380
confronted with, let's say, once a month.
link |
01:04:59.460
It is a very long tail, yes.
link |
01:05:00.660
It's a long tail.
link |
01:05:01.460
That doesn't mean you cannot solve it just
link |
01:05:05.620
by training a statistical model and a lot of data.
link |
01:05:08.740
Huge amount of data.
link |
01:05:09.820
It's really a matter of scale.
link |
01:05:11.900
But I guess what I'm saying is if you have a vehicle that
link |
01:05:16.020
achieves level 5, it is going to be able to deal
link |
01:05:21.580
with new situations.
link |
01:05:23.980
Or, I mean, the data is so large that the rate of new situations
link |
01:05:30.860
is very low.
link |
01:05:32.100
Yes.
link |
01:05:33.140
That's not intelligent.
link |
01:05:34.220
So if we go back to your kind of definition of intelligence,
link |
01:05:37.780
it's the efficiency.
link |
01:05:39.460
With which you can adapt to new situations,
link |
01:05:42.380
to truly new situations, not situations you've seen before.
link |
01:05:45.700
Not situations that could be anticipated by your creators,
link |
01:05:48.460
by the creators of the system, but truly new situations.
link |
01:05:51.740
The efficiency with which you acquire new skills.
link |
01:05:54.940
If you require, if in order to pick up a new skill,
link |
01:05:58.260
you require a very extensive training
link |
01:06:03.180
data set of most possible situations
link |
01:06:05.900
that can occur in the practice of that skill,
link |
01:06:08.940
then the system is not intelligent.
link |
01:06:10.620
It is mostly just a lookup table.
link |
01:06:15.060
Yeah.
link |
01:06:16.140
Well, likewise, if in order to acquire a skill,
link |
01:06:20.100
you need a human engineer to write down
link |
01:06:23.300
a bunch of rules that cover most or every possible situation.
link |
01:06:26.940
Likewise, the system is not intelligent.
link |
01:06:29.620
The system is merely the output artifact
link |
01:06:33.100
of a process that happens in the minds of the engineers that
link |
01:06:39.300
are creating it.
link |
01:06:40.820
It is encoding an abstraction that's
link |
01:06:44.700
produced by the human mind.
link |
01:06:46.420
And intelligence would actually be
link |
01:06:51.500
the process of autonomously producing this abstraction.
link |
01:06:56.260
Yeah.
link |
01:06:57.180
Not like if you take an abstraction
link |
01:06:59.260
and you encode it on a piece of paper or in a computer program,
link |
01:07:02.900
the abstraction itself is not intelligent.
link |
01:07:05.940
What's intelligent is the agent that's
link |
01:07:09.220
capable of producing these abstractions.
link |
01:07:11.780
Yeah, it feels like there's a little bit of a gray area.
link |
01:07:16.500
Because you're basically saying that deep learning forms
link |
01:07:18.860
abstractions, too.
link |
01:07:21.500
But those abstractions do not seem
link |
01:07:24.660
to be effective for generalizing far outside of the things
link |
01:07:29.140
that it's already seen.
link |
01:07:30.100
But they generalize a little bit.
link |
01:07:31.620
Yeah, absolutely.
link |
01:07:32.620
No, deep learning does generalize a little bit.
link |
01:07:34.820
Generalization is not binary.
link |
01:07:36.980
It's more like a spectrum.
link |
01:07:38.140
Yeah.
link |
01:07:38.740
And there's a certain point, it's a gray area,
link |
01:07:40.860
but there's a certain point where
link |
01:07:42.500
there's an impressive degree of generalization that happens.
link |
01:07:47.340
No, I guess exactly what you were saying
link |
01:07:50.420
is intelligence is how efficiently you're
link |
01:07:56.420
able to generalize far outside of the distribution of things
link |
01:08:02.300
you've seen already.
link |
01:08:03.260
Yes.
link |
01:08:03.780
So it's both the distance of how far you can,
link |
01:08:07.180
how new, how radically new something is,
link |
01:08:10.180
and how efficiently you're able to deal with that.
link |
01:08:12.740
So you can think of intelligence as a measure of an information
link |
01:08:17.420
conversion ratio.
link |
01:08:19.140
Imagine a space of possible situations.
link |
01:08:23.420
And you've covered some of them.
link |
01:08:27.860
So you have some amount of information
link |
01:08:30.180
about your space of possible situations
link |
01:08:32.020
that's provided by the situations you already know.
link |
01:08:34.420
And that's, on the other hand, also provided
link |
01:08:36.540
by the prior knowledge that the system brings
link |
01:08:40.420
to the table, the prior knowledge embedded
link |
01:08:42.340
in the system.
link |
01:08:43.660
So the system starts with some information
link |
01:08:46.420
about the problem, about the task.
link |
01:08:48.860
And it's about going from that information
link |
01:08:52.500
to a program, what we would call a skill program,
link |
01:08:55.340
a behavioral program, that can cover a large area
link |
01:08:58.860
of possible situation space.
link |
01:09:01.660
And essentially, the ratio between that area
link |
01:09:04.100
and the amount of information you start with is intelligence.
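As a toy numeric illustration of that ratio (the paper formalizes this with algorithmic information theory; the numbers below are invented):

```python
def conversion_ratio(situations_covered, prior_bits, experience_bits):
    """Higher ratio = more coverage squeezed out of less information."""
    return situations_covered / (prior_bits + experience_bits)

# A lookup-table-like system needs to have seen almost everything...
lookup_table = conversion_ratio(situations_covered=1_000_000,
                                prior_bits=100, experience_bits=900_000)
# ...while a smart agent covers the same space from a few examples.
smart_agent = conversion_ratio(situations_covered=1_000_000,
                               prior_bits=100, experience_bits=1_000)
print(lookup_table, smart_agent)  # the smart agent's ratio is ~800x higher
```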
link |
01:09:09.740
So a very smart agent can make efficient use
link |
01:09:14.180
of very little information about a new problem
link |
01:09:17.580
and very little prior knowledge as well
link |
01:09:19.580
to cover a very large area of potential situations
link |
01:09:23.380
in that problem without knowing what these future new situations
link |
01:09:28.500
are going to be.
link |
01:09:31.140
So one of the other big things you talk about in the paper,
link |
01:09:34.540
we've talked about a little bit already,
link |
01:09:36.300
but let's talk about it some more,
link |
01:09:37.860
is the actual tests of intelligence.
link |
01:09:41.020
So if we look at human and machine intelligence,
link |
01:09:45.980
do you think tests of intelligence
link |
01:09:48.100
should be different for humans and machines,
link |
01:09:50.340
or how we think about testing of intelligence?
link |
01:09:54.420
Are these fundamentally the same kind of intelligences
link |
01:09:59.740
that we're after, and therefore, the tests should be similar?
link |
01:10:03.780
So if your goal is to create AIs that are more humanlike,
link |
01:10:10.540
then it would be super valuable, obviously,
link |
01:10:12.540
to have a test that's universal, that applies to both AIs
link |
01:10:18.500
and humans, so that you could establish
link |
01:10:20.820
a comparison between the two, that you
link |
01:10:23.260
could tell exactly how intelligent,
link |
01:10:27.340
in terms of human intelligence, a given system is.
link |
01:10:30.420
So that said, the constraints that
link |
01:10:34.260
apply to artificial intelligence and to human intelligence
link |
01:10:37.620
are very different.
link |
01:10:39.340
And your test should account for this difference.
link |
01:10:44.860
Because if you look at artificial systems,
link |
01:10:47.140
it's always possible for an experimenter
link |
01:10:50.420
to buy arbitrary levels of skill at arbitrary tasks,
link |
01:10:55.580
either by injecting hardcoded prior knowledge
link |
01:11:01.100
into the system via rules and so on that
link |
01:11:05.660
come from the human mind, from the minds of the programmers,
link |
01:11:08.660
and also buying higher levels of skill
link |
01:11:12.980
just by training on more data.
link |
01:11:15.620
For instance, you could generate an infinity
link |
01:11:17.860
of different Go games, and you could train a Go playing
link |
01:11:21.660
system that way, but you could not directly compare it
link |
01:11:26.820
to human Go playing skills.
link |
01:11:28.620
Because a human that plays Go had
link |
01:11:31.100
to develop that skill in a very constrained environment.
link |
01:11:34.660
They had a limited amount of time.
link |
01:11:36.580
They had a limited amount of energy.
link |
01:11:38.940
And of course, this started from a different set of priors.
link |
01:11:42.620
This started from innate human priors.
link |
01:11:48.540
So I think if you want to compare
link |
01:11:49.860
the intelligence of two systems, like the intelligence of an AI
link |
01:11:53.260
and the intelligence of a human, you have to control for priors.
link |
01:11:59.780
You have to start from the same set of knowledge priors
link |
01:12:04.500
about the task, and you have to control
link |
01:12:06.940
for experience, that is to say, for training data.
link |
01:12:11.140
So what are priors?
link |
01:12:14.980
So prior is whatever information you
link |
01:12:18.340
have about a given task before you
link |
01:12:21.020
start learning about this task.
link |
01:12:23.100
And how's that different from experience?
link |
01:12:25.780
Well, experience is acquired.
link |
01:12:28.020
So for instance, if you're trying to play Go,
link |
01:12:31.100
your experience with Go is all the Go games
link |
01:12:33.900
you've played, or you've seen, or you've simulated
link |
01:12:37.060
in your mind, let's say.
link |
01:12:38.500
And your priors are things like, well,
link |
01:12:42.740
Go is a game on a 2D grid.
link |
01:12:45.860
And we have lots of hardcoded priors
link |
01:12:48.780
about the organization of 2D space.
link |
01:12:53.180
And the rules of the dynamics, the physics
link |
01:12:58.340
of this game in this 2D space?
link |
01:12:59.980
Yes.
link |
01:13:00.580
And the idea that you have of what winning is.
link |
01:13:04.300
Yes, exactly.
link |
01:13:05.580
And other board games can also share some similarities with Go.
link |
01:13:09.660
And if you've played these board games, then,
link |
01:13:12.060
with respect to the game of Go, that
link |
01:13:13.860
would be part of your priors about the game.
link |
01:13:16.300
Well, what's interesting to think about with the game of Go
link |
01:13:18.500
is how many priors are actually brought to the table.
link |
01:13:22.620
When you look at self play, reinforcement learning based
link |
01:13:27.500
mechanisms that do learning, it seems
link |
01:13:29.300
like the number of priors is pretty low.
link |
01:13:31.020
Yes.
link |
01:13:31.380
But you're saying you should be expec...
link |
01:13:32.980
There are 2D spatial priors in the convnet.
link |
01:13:35.700
Right.
link |
01:13:36.460
But you should be clear about making
link |
01:13:39.020
those priors explicit.
link |
01:13:40.460
Yes.
link |
01:13:41.820
So in particular, I think if your goal
link |
01:13:44.060
is to measure a humanlike form of intelligence,
link |
01:13:47.700
then you should clearly establish
link |
01:13:49.700
that you want the AI you're testing
link |
01:13:52.820
to start from the same set of priors that humans start with.
link |
01:13:57.500
Right.
link |
01:13:58.820
So I mean, to me personally, but I think to a lot of people,
link |
01:14:02.740
the human side of things is very interesting.
link |
01:14:05.300
So testing intelligence for humans.
link |
01:14:08.020
What do you think is a good test of human intelligence?
link |
01:14:14.420
Well, that's the question that psychometrics is interested in.
link |
01:14:19.820
There's an entire subfield of psychology
link |
01:14:22.420
that deals with this question.
link |
01:14:23.860
So what's psychometrics?
link |
01:14:25.180
The psychometrics is the subfield of psychology
link |
01:14:27.980
that tries to measure, quantify aspects of the human mind.
link |
01:14:33.940
So in particular, our cognitive abilities, intelligence,
link |
01:14:36.940
and personality traits as well.
link |
01:14:39.660
So what are, it might be a weird question,
link |
01:14:43.620
but what are the first principles of psychometrics
link |
01:14:49.700
that it operates on?
link |
01:14:52.100
What are the priors it brings to the table?
link |
01:14:55.340
So it's a field with a fairly long history.
link |
01:15:01.940
So psychology sometimes gets a bad reputation
link |
01:15:05.500
for not having very reproducible results.
link |
01:15:09.020
And psychometrics has actually some fairly solidly
link |
01:15:12.420
reproducible results.
link |
01:15:14.180
So the ideal goals of the field are that a test
link |
01:15:17.980
should be reliable, which is a notion tied to reproducibility.
link |
01:15:23.060
It should be valid, meaning that it should actually
link |
01:15:26.540
measure what you say it measures.
link |
01:15:30.860
So for instance, if you're saying
link |
01:15:32.780
that you're measuring intelligence,
link |
01:15:34.140
then your test results should be correlated
link |
01:15:36.620
with things that you expect to be correlated
link |
01:15:39.140
with intelligence like success in school
link |
01:15:41.500
or success in the workplace and so on.
link |
01:15:43.580
Should be standardized, meaning that you
link |
01:15:46.540
can administer your tests to many different people
link |
01:15:48.980
in the same conditions.
link |
01:15:50.780
And it should be free from bias.
link |
01:15:52.860
Meaning that, for instance, if your test involves
link |
01:15:57.140
the English language, then you have
link |
01:15:59.100
to be aware that this creates a bias against people
link |
01:16:02.500
who have English as their second language
link |
01:16:04.340
or people who can't speak English at all.
link |
01:16:07.300
So of course, these principles for creating
link |
01:16:10.100
psychometric tests are very much an ideal.
link |
01:16:13.420
I don't think every psychometric test is really either
link |
01:16:17.540
reliable, valid, or free from bias.
link |
01:16:22.060
But at least the field is aware of these weaknesses
link |
01:16:25.740
and is trying to address them.
link |
01:16:27.380
So it's kind of interesting.
link |
01:16:30.100
Ultimately, you're only able to measure,
link |
01:16:31.820
like you said previously, the skill.
link |
01:16:34.420
But you're trying to do a bunch of measures
link |
01:16:36.420
of different skills that correlate,
link |
01:16:38.820
as you mentioned, strongly with some general concept
link |
01:16:41.780
of cognitive ability.
link |
01:16:43.340
Yes, yes.
link |
01:16:44.060
So what's the G factor?
link |
01:16:46.620
So right, there are many different kinds
link |
01:16:48.140
of tests of intelligence.
link |
01:16:50.620
And each of them is interested in different aspects
link |
01:16:55.340
of intelligence.
link |
01:16:56.060
Some of them will deal with language.
link |
01:16:57.580
Some of them will deal with spatial vision,
link |
01:17:00.940
maybe mental rotations, numbers, and so on.
link |
01:17:04.420
When you run these very different tests at scale,
link |
01:17:08.580
what you start seeing is that there
link |
01:17:10.940
are clusters of correlations among test results.
link |
01:17:14.220
So for instance, if you look at homework at school,
link |
01:17:19.300
you will see that people who do well at math
link |
01:17:21.780
are also likely statistically to do well in physics.
link |
01:17:25.500
And what's more, people who do well at math and physics
link |
01:17:30.060
are also statistically likely to do well
link |
01:17:32.620
in things that sound completely unrelated,
link |
01:17:35.580
like writing an English essay, for instance.
link |
01:17:38.420
And so when you see clusters of correlations
link |
01:17:42.700
in statistical terms, you would explain them
link |
01:17:46.140
with a latent variable.
link |
01:17:47.540
And the latent variable that would, for instance, explain
link |
01:17:51.100
the relationship between being good at math
link |
01:17:53.020
and being good at physics would be cognitive ability.
link |
01:17:57.020
And the G factor is the latent variable
link |
01:18:00.780
that explains the fact that every test of intelligence
link |
01:18:05.540
that you can come up with, results on these tests
link |
01:18:09.340
end up being correlated.
link |
01:18:10.540
So there is some single unique variable
link |
01:18:16.180
that explains these correlations.
link |
01:18:17.820
That's the G factor.
link |
01:18:18.820
So it's a statistical construct.
link |
01:18:20.380
It's not really something you can directly measure,
link |
01:18:23.060
for instance, in a person.
link |
01:18:25.540
But it's there.
link |
01:18:26.540
But it's there.
link |
01:18:27.220
It's there.
link |
01:18:27.740
It's there at scale.
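To see how such a latent factor shows up in practice, here is a small synthetic sketch: simulate test scores driven by one shared ability plus noise, then extract the first principal component of the correlation matrix. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 6

g = rng.normal(size=n_people)                   # latent general ability
loadings = rng.uniform(0.5, 0.9, size=n_tests)  # how much each test taps g
scores = np.outer(g, loadings) + 0.6 * rng.normal(size=(n_people, n_tests))

corr = np.corrcoef(scores, rowvar=False)        # all-positive correlations
eigvals, eigvecs = np.linalg.eigh(corr)         # ascending eigenvalues

# One dominant factor explains most of the shared variance, and the
# recovered factor scores track the true latent ability closely.
print("share of variance on first factor:", eigvals[-1] / n_tests)
print("correlation of estimated g with true g:",
      abs(np.corrcoef(scores @ eigvecs[:, -1], g)[0, 1]))
```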
link |
01:18:28.740
And that's also one thing I want to mention about psychometrics.
link |
01:18:33.460
Like when you talk about measuring intelligence
link |
01:18:36.620
in humans, for instance, some people
link |
01:18:38.660
get a little bit worried.
link |
01:18:40.060
They will say, that sounds dangerous.
link |
01:18:41.940
Maybe that sounds potentially discriminatory, and so on.
link |
01:18:44.340
And they're not wrong.
link |
01:18:46.460
And the thing is, personally, I'm
link |
01:18:48.220
not interested in psychometrics as a way
link |
01:18:51.100
to characterize one individual person.
link |
01:18:54.740
Like if I get your psychometric personality
link |
01:18:59.180
assessments or your IQ, I don't think that actually
link |
01:19:01.780
tells me much about you as a person.
link |
01:19:05.020
I think psychometrics is most useful as a statistical tool.
link |
01:19:10.300
So it's most useful at scale.
link |
01:19:12.500
It's most useful when you start getting test results
link |
01:19:15.420
for a large number of people.
link |
01:19:17.420
And you start cross correlating these test results.
link |
01:19:20.580
Because that gives you information
link |
01:19:23.620
about the structure of the human mind,
link |
01:19:26.420
in particular about the structure
link |
01:19:28.300
of human cognitive abilities.
link |
01:19:29.780
So at scale, psychometrics paints a certain picture
link |
01:19:34.860
of the human mind.
link |
01:19:35.620
And that's interesting.
link |
01:19:37.220
And that's what's relevant to AI, the structure
link |
01:19:39.540
of human cognitive abilities.
link |
01:19:41.060
Yeah, it gives you an insight into it.
link |
01:19:42.860
I mean, to me, I remember when I learned about G factor,
link |
01:19:45.820
it seemed like it would be impossible for it
link |
01:19:52.820
to be real, even as a statistical variable.
link |
01:19:55.500
Like it felt kind of like astrology.
link |
01:19:59.020
Like it's like wishful thinking among psychologists.
link |
01:20:01.980
But the more I learned, I realized that there's some.
link |
01:20:05.420
I mean, I'm not sure what to make of human beings,
link |
01:20:07.620
the fact that the G factor is a thing.
link |
01:20:10.580
There's a commonality across all of human species,
link |
01:20:13.260
that there does seem to be a strong correlation
link |
01:20:15.340
between cognitive abilities.
link |
01:20:17.140
That's kind of fascinating, actually.
link |
01:20:19.140
So human cognitive abilities have a structure.
link |
01:20:22.780
Like the most mainstream theory of the structure
link |
01:20:25.380
of cognitive abilities is called CHC theory.
link |
01:20:28.780
It's Cattell, Horn, Carroll.
link |
01:20:30.660
It's named after the three psychologists who
link |
01:20:33.180
contributed key pieces of it.
link |
01:20:35.340
And it describes cognitive abilities
link |
01:20:38.620
as a hierarchy with three levels.
link |
01:20:41.060
And at the top, you have the G factor.
link |
01:20:43.140
Then you have broad cognitive abilities,
link |
01:20:46.140
for instance fluid intelligence, that
link |
01:20:49.340
encompass a broad set of possible kinds of tasks
link |
01:20:54.940
that are all related.
link |
01:20:57.100
And then you have narrow cognitive abilities
link |
01:20:59.900
at the last level, which is closer to task specific skill.
link |
01:21:04.340
And there are actually different theories of the structure
link |
01:21:09.100
of cognitive abilities that just emerge
link |
01:21:10.700
from different statistical analysis of IQ test results.
link |
01:21:14.500
But they all describe a hierarchy with a kind of G
link |
01:21:18.500
factor at the top.
link |
01:21:21.140
And you're right that the G factor,
link |
01:21:23.740
it's not quite real in the sense that it's not something
link |
01:21:27.620
you can observe and measure, like your height,
link |
01:21:29.660
for instance.
link |
01:21:30.340
But it's real in the sense that you
link |
01:21:32.940
see it in a statistical analysis of the data.
link |
01:21:37.780
One thing I want to mention is that the fact
link |
01:21:39.700
that there is a G factor does not really
link |
01:21:41.540
mean that human intelligence is general in a strong sense.
link |
01:21:45.740
It does not mean human intelligence
link |
01:21:47.220
can be applied to any problem at all,
link |
01:21:50.340
and that someone who has a high IQ
link |
01:21:52.140
is going to be able to solve any problem at all.
link |
01:21:54.100
That's not quite what it means.
link |
01:21:55.260
I think one popular analogy to understand it
link |
01:22:00.420
is the sports analogy.
link |
01:22:03.340
If you consider the concept of physical fitness,
link |
01:22:06.660
it's a concept that's very similar to intelligence
link |
01:22:09.220
because it's a useful concept.
link |
01:22:11.340
It's something you can intuitively understand.
link |
01:22:14.460
Some people are fit, maybe like you.
link |
01:22:17.620
Some people are not as fit, maybe like me.
link |
01:22:20.540
But none of us can fly.
link |
01:22:22.980
Absolutely.
link |
01:22:23.700
It's constrained to a specific set of skills.
link |
01:22:25.460
Even if you're very fit, that doesn't
link |
01:22:27.060
mean you can do anything at all in any environment.
link |
01:22:31.020
You obviously cannot fly.
link |
01:22:32.420
You cannot survive at the bottom of the ocean and so on.
link |
01:22:36.020
And if you were a scientist and you
link |
01:22:38.540
wanted to precisely define and measure physical fitness
link |
01:22:42.820
in humans, then you would come up with a battery of tests.
link |
01:22:47.500
You would have running the 100 meters, playing soccer,
link |
01:22:51.580
playing table tennis, swimming, and so on.
link |
01:22:54.260
And if you ran these tests over many different people,
link |
01:22:58.420
you would start seeing correlations in test results.
link |
01:23:01.220
For instance, people who are good at soccer
link |
01:23:03.020
are also good at sprinting.
link |
01:23:05.620
And you would explain these correlations
link |
01:23:08.580
with physical abilities that are strictly
link |
01:23:11.660
analogous to cognitive abilities.
link |
01:23:14.020
And then you would start also observing correlations
link |
01:23:17.060
between biological characteristics,
link |
01:23:21.220
like maybe lung volume is correlated with being
link |
01:23:24.900
a fast runner, for instance, in the same way
link |
01:23:27.820
that there are neurophysical correlates of cognitive
link |
01:23:32.500
abilities.
link |
01:23:33.940
And at the top of the hierarchy of physical abilities
link |
01:23:38.620
that you would be able to observe,
link |
01:23:39.980
you would have a G factor, a physical G factor, which
link |
01:23:43.340
would map to physical fitness.
link |
01:23:45.740
And as you just said, that doesn't
link |
01:23:47.980
mean that people with high physical fitness can't fly.
link |
01:23:51.340
It doesn't mean human morphology and human physiology
link |
01:23:54.500
is universal.
link |
01:23:55.660
It's actually super specialized.
link |
01:23:57.860
We can only do the things that we were evolved to do.
link |
01:24:04.100
We are not adapted to anything else; you could not
link |
01:24:08.340
exist on Venus or Mars or in the void of space
link |
01:24:11.100
or the bottom of the ocean.
link |
01:24:12.460
So that said, one thing that's really striking and remarkable
link |
01:24:17.740
is that our morphology generalizes
link |
01:24:23.060
far beyond the environments that we evolved for.
link |
01:24:27.260
Like in a way, you could say we evolved to run after prey
link |
01:24:31.180
in the savanna, right?
link |
01:24:32.900
That's very much where our human morphology comes from.
link |
01:24:36.820
And that said, we can do a lot of things
link |
01:24:40.220
that are completely unrelated to that.
link |
01:24:42.980
We can climb mountains.
link |
01:24:44.260
We can swim across lakes.
link |
01:24:47.260
We can play table tennis.
link |
01:24:48.980
I mean, table tennis is very different from what
link |
01:24:51.060
we were evolved to do, right?
link |
01:24:53.100
So our morphology, our bodies, our sensorimotor
link |
01:24:56.300
affordances have a degree of generality
link |
01:24:59.500
that is absolutely remarkable, right?
link |
01:25:02.180
And I think cognition is very similar to that.
link |
01:25:05.300
Our cognitive abilities have a degree of generality
link |
01:25:08.260
that goes far beyond what the mind was initially
link |
01:25:11.180
supposed to do, which is why we can play music and write
link |
01:25:14.540
novels and go to Mars and do all kinds of crazy things.
link |
01:25:18.580
But it's not universal in the same way
link |
01:25:20.780
that human morphology and our body
link |
01:25:23.420
is not appropriate for actually most of the universe by volume.
link |
01:25:27.500
In the same way, you could say that the human mind is not
link |
01:25:29.940
really appropriate for most of problem space,
link |
01:25:32.620
potential problem space by volume.
link |
01:25:35.460
So we have very strong cognitive biases, actually,
link |
01:25:39.660
that mean that there are certain types of problems
link |
01:25:42.620
that we handle very well and certain types of problems
link |
01:25:45.380
that we are completely unadapted to.
link |
01:25:48.260
So that's really how we'd interpret the G factor.
link |
01:25:52.420
It's not a sign of strong generality.
link |
01:25:56.340
It's really just the broadest cognitive ability.
link |
01:26:01.020
But our abilities, whether we are
link |
01:26:03.020
talking about sensory motor abilities or cognitive
link |
01:26:05.820
abilities, they still remain very specialized
link |
01:26:09.460
in the human condition, right?
link |
01:26:12.420
Within the constraints of the human cognition,
link |
01:26:16.300
they're general.
link |
01:26:18.300
Yes, absolutely.
link |
01:26:19.500
But the constraints, as you're saying, are very limited.
link |
01:26:22.140
I think what's limiting.
link |
01:26:23.860
So our cognition and our body
link |
01:26:26.980
evolved in very specific environments.
link |
01:26:29.420
Because our environment was so variable, fast changing,
link |
01:26:32.740
and so unpredictable, part of the constraints
link |
01:26:35.740
that drove our evolution is generality itself.
link |
01:26:39.540
So we were, in a way, evolved to be able to improvise
link |
01:26:42.780
in all kinds of physical or cognitive environments.
link |
01:26:47.540
And for this reason, it turns out
link |
01:26:49.900
that the minds and bodies that we ended up with
link |
01:26:55.060
can be applied to much, much broader scope
link |
01:26:58.020
than what they were evolved for.
link |
01:27:00.060
And that's truly remarkable.
link |
01:27:01.740
And that's a degree of generalization
link |
01:27:03.940
that is far beyond anything you can see in artificial systems
link |
01:27:07.540
today.
link |
01:27:10.300
That said, it does not mean that human intelligence
link |
01:27:14.500
is anywhere near universal.
link |
01:27:16.380
Yeah, it's not general.
link |
01:27:18.900
A kind of exciting topic for people,
link |
01:27:21.140
even outside of artificial intelligence, is IQ tests.
link |
01:27:27.580
I think it's Mensa, whatever.
link |
01:27:29.220
There's different degrees of difficulty for questions.
link |
01:27:32.420
We talked about this offline a little bit, too,
link |
01:27:34.700
about difficult questions.
link |
01:27:37.500
What makes a question on an IQ test more difficult or less
link |
01:27:42.300
difficult, do you think?
link |
01:27:43.700
So the thing to keep in mind is that there's
link |
01:27:46.500
no such thing as a question that's intrinsically difficult.
link |
01:27:51.540
It has to be difficult with respect to the things you
link |
01:27:54.580
already know and the things you can already do, right?
link |
01:27:58.540
So in terms of an IQ test question,
link |
01:28:02.740
typically it would be structured, for instance,
link |
01:28:05.980
as a set of demonstration input and output pairs, right?
link |
01:28:11.860
And then you would be given a test input, a prompt,
link |
01:28:15.420
and you would need to recognize or produce
link |
01:28:18.700
the corresponding output.
link |
01:28:20.300
And in that narrow context, you could say a difficult question
link |
01:28:26.060
is a question where the input prompt is
link |
01:28:31.580
very surprising and unexpected, given the training examples.
link |
01:28:36.540
Just even the nature of the patterns
link |
01:28:38.340
that you're observing in the input prompt.
link |
01:28:40.180
For instance, let's say you have a rotation problem.
link |
01:28:43.260
You must rotate the shape by 90 degrees.
link |
01:28:46.660
If I give you two examples and then I give you one prompt,
link |
01:28:50.500
which is actually one of the two training examples,
link |
01:28:53.020
then there is zero generalization difficulty
link |
01:28:55.700
for the task.
link |
01:28:56.380
It's actually a trivial task.
link |
01:28:57.500
You just recognize that it's one of the training examples,
link |
01:29:00.780
and you produce the same answer.
link |
01:29:02.300
Now, if it's a more complex shape,
link |
01:29:05.580
there is a little bit more generalization,
link |
01:29:07.700
but it remains that you are still
link |
01:29:09.860
doing the same thing at test time,
link |
01:29:12.060
as you were being demonstrated at training time.
link |
01:29:15.060
A difficult task starts to require some amount of test
link |
01:29:20.300
time adaptation, some amount of improvisation, right?
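A minimal sketch of that spectrum on an ARC-style rotation task: when the test prompt is literally one of the demonstrations, recognition (a lookup) suffices; a novel shape only yields to whoever abstracted the underlying rule. Grids and rule are illustrative.

```python
import numpy as np

train_inputs = [np.array([[1, 0], [0, 0]]), np.array([[1, 1], [0, 0]])]
train_outputs = [np.rot90(x, k=-1) for x in train_inputs]  # demonstrations

def solve(test_input):
    # Zero generalization difficulty: the prompt IS a training example,
    # so recognizing it and replaying the stored answer is enough.
    for x, y in zip(train_inputs, train_outputs):
        if np.array_equal(test_input, x):
            return y
    # Nonzero difficulty: a novel grid requires having abstracted the
    # rotate-90-degrees rule from the demonstrations.
    return np.rot90(test_input, k=-1)

print(solve(train_inputs[0]))             # trivial: seen at training time
print(solve(np.array([[0, 1], [1, 1]])))  # requires the abstraction
```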
link |
01:29:25.100
So consider, I don't know, you're
link |
01:29:29.260
teaching a class on quantum physics or something.
link |
01:29:34.020
If you wanted to test the understanding that students
link |
01:29:40.460
have of the material, you would come up
link |
01:29:42.220
with an exam that's very different from anything
link |
01:29:47.740
they've seen on the internet when they were cramming.
link |
01:29:51.940
On the other hand, if you wanted to make it easy,
link |
01:29:54.780
you would just give them something
link |
01:29:56.340
that's very similar to the mock exams
link |
01:30:00.420
that they've taken, something that's
link |
01:30:03.060
just a simple interpolation of questions
link |
01:30:05.220
that they've already seen.
link |
01:30:07.260
And so that would be an easy exam.
link |
01:30:09.220
It's very similar to what you've been trained on.
link |
01:30:11.940
And a difficult exam is one that really probes your understanding
link |
01:30:15.460
because it forces you to improvise.
link |
01:30:18.980
It forces you to do things that are
link |
01:30:22.180
different from what you were exposed to before.
link |
01:30:24.780
So that said, it doesn't mean that the exam that
link |
01:30:29.100
requires improvisation is intrinsically hard, right?
link |
01:30:32.700
Because maybe you're a quantum physics expert.
link |
01:30:35.820
So when you take the exam, this is actually
link |
01:30:37.780
stuff that, despite being new to the students,
link |
01:30:40.300
it's not new to you, right?
link |
01:30:42.900
So it can only be difficult with respect
link |
01:30:46.020
to what the test taker already knows
link |
01:30:49.380
and with respect to the information
link |
01:30:51.780
that the test taker has about the task.
link |
01:30:54.700
So that's what I mean by controlling for priors,
link |
01:30:57.860
the information you bring to the table.
link |
01:30:59.900
And the experience.
link |
01:31:00.660
And the experience, which is the training data.
link |
01:31:02.660
So in the case of the quantum physics exam,
link |
01:31:05.580
that would be all the course material itself
link |
01:31:09.740
and all the mock exams that students
link |
01:31:11.500
might have taken online.
link |
01:31:12.820
Yeah, it's interesting because I've also sent you an email.
link |
01:31:17.700
I asked you, I've been interested in just this curious question
link |
01:31:21.820
of what's a really hard IQ test question.
link |
01:31:27.300
And I've been talking to also people
link |
01:31:30.580
who have designed IQ tests.
link |
01:31:32.540
There's a few folks on the internet, it's like a thing.
link |
01:31:34.420
People are really curious about it.
link |
01:31:36.180
First of all, most of the IQ tests they designed,
link |
01:31:39.460
they religiously protect the correct answers.
link |
01:31:45.620
Like you can't find the correct answers anywhere.
link |
01:31:48.380
In fact, the question is ruined once you know,
link |
01:31:50.620
even like the approach you're supposed to take.
link |
01:31:53.700
So they're very...
link |
01:31:54.540
That said, the approach is implicit in the training examples.
link |
01:31:58.420
So if you release the training examples, it's over.
link |
01:32:02.740
Which is why in ARC, for instance,
link |
01:32:04.980
there is a test set that is private and no one has seen it.
link |
01:32:09.140
No, for really tough IQ questions, it's not obvious.
link |
01:32:13.580
It's not obvious because of the ambiguity.
link |
01:32:17.100
Like it's, I mean, we'll have to look through them,
link |
01:32:20.780
but like some number sequences and so on,
link |
01:32:22.860
it's not completely clear.
link |
01:32:25.060
So like you can get a sense, but there's like some,
link |
01:32:30.540
you know, when you look at a number sequence, I don't know,
link |
01:32:36.140
like your Fibonacci number sequence,
link |
01:32:37.620
if you look at the first few numbers,
link |
01:32:39.580
that sequence could be completed in a lot of different ways.
link |
01:32:42.980
And you know, some are, if you think deeply,
link |
01:32:45.620
are more correct than others.
link |
01:32:46.900
Like there's a kind of intuitive simplicity
link |
01:32:51.300
and elegance to the correct solution.
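
As a concrete example of that ambiguity, a short Python sketch: the prefix 1, 1, 2, 3, 5 is continued as 8 by the Fibonacci rule, but the unique degree-4 polynomial through the same five points continues it as 11. Both rules fit the given data perfectly; the Fibonacci rule is simply the more elegant one.

prefix = [1, 1, 2, 3, 5]

# Rule A: Fibonacci, each term is the sum of the previous two.
fib_next = prefix[-1] + prefix[-2]  # 8

# Rule B: continue the unique degree-4 polynomial through the five points,
# using Newton's forward differences.
def next_by_polynomial(seq):
    rows = [list(seq)]
    while len(rows[-1]) > 1:
        last = rows[-1]
        rows.append([b - a for a, b in zip(last, last[1:])])
    value = rows[-1][-1]          # the deepest difference stays constant
    for row in reversed(rows[:-1]):
        value = row[-1] + value   # propagate back up to the next term
    return value

poly_next = next_by_polynomial(prefix)  # 11
assert (fib_next, poly_next) == (8, 11)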
link |
01:32:53.020
Yes.
link |
01:32:53.860
I am personally not a fan of ambiguity
link |
01:32:56.420
in test questions actually,
link |
01:32:58.660
but I think you can have difficulty
link |
01:33:01.140
without requiring ambiguity simply by making the test
link |
01:33:05.620
require a lot of extrapolation over the training examples.
link |
01:33:09.500
But the beautiful question is difficult,
link |
01:33:13.340
but gives away everything
link |
01:33:14.500
when you give the training example.
link |
01:33:17.180
Basically, yes.
link |
01:33:18.460
Meaning that, so the tests I'm interested in creating
link |
01:33:24.020
are not necessarily difficult for humans
link |
01:33:27.740
because human intelligence is the benchmark.
link |
01:33:31.580
They're supposed to be difficult for machines
link |
01:33:34.380
in ways that are easy for humans.
link |
01:33:36.300
Like I think an ideal test of human and machine intelligence
link |
01:33:40.820
is a test that is actionable,
link |
01:33:44.380
that highlights the need for progress,
link |
01:33:48.260
and that highlights the direction
link |
01:33:50.060
in which you should be making progress.
link |
01:33:51.500
I think we'll talk about the ARC challenge
link |
01:33:54.340
and the test you've constructed
link |
01:33:55.580
and you have these elegant examples.
link |
01:33:58.100
I think that highlight,
link |
01:33:59.180
like this is really easy for us humans,
link |
01:34:01.820
but it's really hard for machines.
link |
01:34:04.580
But on the, you know, the designing an IQ test
link |
01:34:09.220
for IQs of like higher than 160 and so on,
link |
01:34:13.380
you have to say, you have to take that
link |
01:34:15.220
and put it on steroids, right?
link |
01:34:16.500
You have to think like, what is hard for humans?
link |
01:34:19.540
And that's a fascinating exercise in itself, I think.
link |
01:34:25.940
And it was an interesting question
link |
01:34:27.740
of what it takes to create a really hard question for humans
link |
01:34:32.300
because you again have to do the same process
link |
01:34:36.340
as you mentioned, which is, you know,
link |
01:34:39.900
something basically where, despite the experience
link |
01:34:45.100
that you are likely to have encountered
link |
01:34:46.900
throughout your whole life,
link |
01:34:48.740
even if you've prepared for IQ tests,
link |
01:34:51.780
which is a big challenge,
link |
01:34:53.380
this will still be novel for you.
link |
01:34:55.820
Yeah, I mean, novelty is a requirement.
link |
01:34:58.900
You should not be able to practice for the questions
link |
01:35:02.100
that you're gonna be tested on.
link |
01:35:03.780
That's important because otherwise what you're doing
link |
01:35:06.700
is not exhibiting intelligence.
link |
01:35:08.180
What you're doing is just retrieving
link |
01:35:10.900
what you've been exposed to before.
link |
01:35:12.380
It's the same thing as a deep learning model.
link |
01:35:14.500
If you train a deep learning model
link |
01:35:15.900
on all the possible answers, then it will ace your test
link |
01:35:20.100
in the same way that, you know,
link |
01:35:24.420
a stupid student can still ace the test
link |
01:35:28.100
if they cram for it.
link |
01:35:30.140
They memorize, you know,
link |
01:35:32.500
a hundred different possible mock exams.
link |
01:35:34.980
And then they hope that the actual exam
link |
01:35:37.180
will be a very simple interpolation of the mock exams.
link |
01:35:41.180
And that student could just be a deep learning model
link |
01:35:43.180
at that point.
link |
01:35:44.020
But you can actually do that
link |
01:35:45.900
without any understanding of the material.
link |
01:35:48.180
And in fact, many students pass their exams
link |
01:35:50.540
in exactly this way.
link |
01:35:51.940
And if you want to avoid that,
link |
01:35:53.140
you need an exam that's unlike anything they've seen
link |
01:35:56.660
that really probes their understanding.
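
To make the cramming point concrete, a toy Python sketch (the questions and answers are made up): a pure lookup table passes any exam interpolated from the mocks and collapses the moment the exam moves beyond them.

# Hypothetical mock-exam bank that the student memorized verbatim.
mock_exams = {
    "What is the energy of a photon of frequency f?": "E = h * f",
    "What is the ground-state energy of hydrogen?": "-13.6 eV",
}

def crammer(question):
    # No understanding, only retrieval.
    return mock_exams.get(question, "no idea")

# Aces anything lifted straight from the mocks:
assert crammer("What is the ground-state energy of hydrogen?") == "-13.6 eV"
# Fails as soon as the exam probes beyond the memorized set:
assert crammer("What is the energy of a photon of wavelength L?") == "no idea"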
link |
01:36:00.020
So how do we design an IQ test for machines,
link |
01:36:05.020
an intelligent test for machines?
link |
01:36:07.860
All right, so in the paper I outline
link |
01:36:10.300
a number of requirements that you expect of such a test.
link |
01:36:14.780
And in particular, we should start by acknowledging
link |
01:36:19.620
the priors that we expect to be required
link |
01:36:23.300
in order to perform the test.
link |
01:36:25.260
So we should be explicit about the priors, right?
link |
01:36:28.100
And if the goal is to compare machine intelligence
link |
01:36:31.780
and human intelligence,
link |
01:36:32.740
then we should assume human cognitive priors, right?
link |
01:36:36.980
And secondly, we should make sure that we are testing
link |
01:36:42.020
for skill acquisition ability,
link |
01:36:44.820
skill acquisition efficiency in particular,
link |
01:36:46.740
and not for skill itself.
link |
01:36:48.580
Meaning that every task featured in your test
link |
01:36:51.860
should be novel and should not be something
link |
01:36:54.420
that you can anticipate.
link |
01:36:55.980
So for instance, it should not be possible
link |
01:36:57.980
to brute force the space of possible questions, right?
link |
01:37:02.860
To pre-generate every possible question and answer.
link |
01:37:06.940
So it should be tasks that cannot be anticipated,
link |
01:37:10.660
not just by the system itself,
link |
01:37:12.460
but by the creators of the system, right?
link |
01:37:15.940
Yeah, you know what's fascinating?
link |
01:37:17.660
I mean, one of my favorite aspects of the paper
link |
01:37:20.820
and the work you do with the ARC challenge
link |
01:37:22.860
is the process of making priors explicit.
link |
01:37:28.940
Just even that act alone is a really powerful one
link |
01:37:33.420
of like, what are, it's a really powerful question
link |
01:37:39.260
asked of us humans.
link |
01:37:40.500
What are the priors that we bring to the table?
link |
01:37:44.180
So the next step is like, once you have those priors,
link |
01:37:46.900
how do you use them to solve a novel task?
link |
01:37:50.060
But like, just even making the priors explicit
link |
01:37:52.940
is a really difficult and really powerful step.
link |
01:37:56.140
And that's like visually beautiful
link |
01:37:58.940
and conceptually philosophically beautiful part
link |
01:38:01.340
of the work you did with, and I guess continue to do
link |
01:38:06.020
probably with the paper and the ARC challenge.
link |
01:38:08.460
Can you talk about some of the priors
link |
01:38:10.740
that we're talking about here?
link |
01:38:12.380
Yes, so a researcher who has done a lot of work
link |
01:38:15.380
on what exactly are the knowledge priors
link |
01:38:19.460
that are innate to humans is Elizabeth Spelke from Harvard.
link |
01:38:26.500
So she developed the core knowledge theory,
link |
01:38:30.580
which outlines four different core knowledge systems.
link |
01:38:36.500
So systems of knowledge that we are basically
link |
01:38:39.180
either born with or that we are hardwired
link |
01:38:43.660
to acquire very early on in our development.
link |
01:38:47.180
And there's no strong distinction between the two.
link |
01:38:52.060
Like if you are primed to acquire
link |
01:38:57.060
a certain type of knowledge in just a few weeks,
link |
01:39:01.220
you might as well just be born with it.
link |
01:39:03.500
It's just part of who you are.
link |
01:39:06.460
And so there are four different core knowledge systems.
link |
01:39:09.500
Like the first one is the notion of objectness
link |
01:39:13.460
and basic physics.
link |
01:39:16.340
Like you recognize that something that moves
link |
01:39:20.700
coherently, for instance, is an object.
link |
01:39:23.220
So we intuitively, naturally, innately divide the world
link |
01:39:28.260
into objects based on this notion of coherence,
link |
01:39:31.260
physical coherence.
link |
01:39:32.740
And in terms of elementary physics,
link |
01:39:34.700
there's the fact that objects can bump against each other
link |
01:39:41.620
and the fact that they can occlude each other.
link |
01:39:44.460
So these are things that we are essentially born with
link |
01:39:48.300
or at least that we are going to be acquiring extremely early
link |
01:39:52.500
because we're really hardwired to acquire them.
link |
01:39:55.620
So a bunch of points, pixels that move together
link |
01:39:59.940
are probably part of the same object.
link |
01:40:02.820
Yes.
link |
01:40:07.660
I don't smoke weed, but if I did,
link |
01:40:11.260
that's something I could sit all night
link |
01:40:13.100
and just think about, remember what I read in your paper,
link |
01:40:15.700
just objectness, I wasn't self aware, I guess,
link |
01:40:19.700
of that particular prior.
link |
01:40:23.180
That's such a fascinating prior that like...
link |
01:40:28.500
That's the most basic one, but actually...
link |
01:40:30.940
Objectness, just identity, just objectness.
link |
01:40:34.420
It's very basic, I suppose, but it's so fundamental.
link |
01:40:39.060
It is fundamental to human cognition.
link |
01:40:41.380
Yeah.
link |
01:40:42.220
The second prior that's also fundamental is agentness,
link |
01:40:46.660
which is not a real word, a real word, so agentness.
link |
01:40:50.740
The fact that some of these objects
link |
01:40:53.340
that you segment your environment into,
link |
01:40:56.540
some of these objects are agents.
link |
01:40:58.940
So what's an agent?
link |
01:41:00.300
It's basically, it's an object that has goals.
link |
01:41:05.380
That has what?
link |
01:41:06.340
That has goals, that is capable of pursuing goals.
link |
01:41:09.420
So for instance, if you see two dots
link |
01:41:12.580
moving in roughly synchronized fashion,
link |
01:41:16.300
you will intuitively infer that one of the dots
link |
01:41:19.820
is pursuing the other.
link |
01:41:21.620
So that one of the dots is...
link |
01:41:24.980
And one of the dots is an agent
link |
01:41:27.380
and its goal is to avoid the other dot.
link |
01:41:29.460
And one of the dots, the other dot is also an agent
link |
01:41:32.740
and its goal is to catch the first dot.
link |
01:41:35.860
Spelke has shown that babies as young as three months
link |
01:41:40.540
identify agentness and goal directedness
link |
01:41:45.220
in their environment.
link |
01:41:46.420
Another prior is basic geometry and topology,
link |
01:41:52.140
like the notion of distance,
link |
01:41:53.660
the ability to navigate in your environment and so on.
link |
01:41:57.620
This is something that is fundamentally hardwired
link |
01:42:01.380
into our brain.
link |
01:42:02.700
It's in fact backed by very specific neural mechanisms,
link |
01:42:07.100
like for instance, grid cells and place cells.
link |
01:42:10.820
So it's something that's literally hard coded
link |
01:42:15.260
at the neural level in our hippocampus.
link |
01:42:19.940
And the last prior would be the notion of numbers.
link |
01:42:23.580
Like numbers are not actually a cultural construct.
link |
01:42:26.460
We are intuitively, innately able to do some basic counting
link |
01:42:31.460
and to compare quantities.
link |
01:42:34.100
So it doesn't mean we can do arbitrary arithmetic.
link |
01:42:37.660
Counting, the actual counting.
link |
01:42:39.020
Counting, like counting one, two, three ish,
link |
01:42:41.500
then maybe more than three.
link |
01:42:43.700
You can also compare quantities.
link |
01:42:45.140
If I give you three dots and five dots,
link |
01:42:48.580
you can tell the side with five dots has more dots.
link |
01:42:52.500
So this is actually an innate prior.
link |
01:42:56.580
So that said, the list may not be exhaustive.
link |
01:43:00.020
So Spelke is still, you know,
link |
01:43:02.580
positing the potential existence of new knowledge systems.
link |
01:43:08.500
For instance, knowledge systems that deal
link |
01:43:12.100
with social relationships.
link |
01:43:15.940
Yeah, I mean, and there could be...
link |
01:43:17.700
Which is much less relevant to something like ARC
link |
01:43:22.060
or IQ test and so on.
link |
01:43:22.900
Right.
link |
01:43:23.740
There could be stuff that's like you said,
link |
01:43:26.740
rotation, symmetry, is there like...
link |
01:43:29.020
Symmetry is really interesting.
link |
01:43:31.060
It's very likely that there is, speaking about rotation,
link |
01:43:34.380
that there is in the brain, a hard coded system
link |
01:43:38.900
that is capable of performing rotations.
link |
01:43:42.060
One famous experiment that people did in the...
link |
01:43:45.660
I don't remember when exactly,
link |
01:43:48.180
but in the 70s was that people found that
link |
01:43:53.180
if you asked people, if you give them two different shapes
link |
01:43:57.580
and one of the shapes is a rotated version
link |
01:44:01.420
of the first shape, and you ask them,
link |
01:44:03.340
is that shape a rotated version of the first shape or not?
link |
01:44:07.060
What you see is that the time it takes people to answer
link |
01:44:11.140
is linearly proportional, right, to the angle of rotation.
link |
01:44:16.140
So it's almost like you have somewhere in your brain
link |
01:44:19.660
like a turntable with a fixed speed.
link |
01:44:24.020
And if you want to know if two objects are a rotated version
link |
01:44:28.620
of each other, you put the object on the turntable,
link |
01:44:31.700
you let it move around a little bit,
link |
01:44:34.740
and then you stop when you have a match.
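
A minimal model of that "turntable" result, with made-up constants (the classic studies are usually attributed to Shepard and Metzler in 1971): response time is a fixed overhead plus the angle divided by a constant rotation speed, hence linear in the angle.

# Hypothetical constants: 400 ms of fixed overhead, 60 degrees per second.
BASE_MS = 400.0
DEGREES_PER_SECOND = 60.0

def predicted_response_ms(angle_degrees):
    # Linear in the rotation angle, matching the experimental finding.
    return BASE_MS + 1000.0 * angle_degrees / DEGREES_PER_SECOND

for angle in (0, 60, 120, 180):
    print(angle, predicted_response_ms(angle))  # 400, 1400, 2400, 3400 ms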
link |
01:44:37.580
And that's really interesting.
link |
01:44:40.140
So what's the ARC challenge?
link |
01:44:42.740
So in the paper, I outline all these principles
link |
01:44:47.380
that a good test of machine intelligence
link |
01:44:50.140
and human intelligence should follow.
link |
01:44:51.940
And the ARC challenge is one attempt
link |
01:44:55.300
to embody as many of these principles as possible.
link |
01:44:58.540
So I don't think it's anywhere near a perfect attempt, right?
link |
01:45:03.780
It does not actually follow every principle,
link |
01:45:06.060
but it is what I was able to do given the constraints.
link |
01:45:10.700
So the format of ARC is very similar to classic IQ tests,
link |
01:45:15.540
in particular Raven's Progressive Matrices.
link |
01:45:18.020
Raven's?
link |
01:45:18.980
Yeah, Raven's Progressive Matrices.
link |
01:45:20.580
I mean, if you've done IQ tests in the past,
link |
01:45:22.820
you know what that is, probably.
link |
01:45:24.220
Or at least you've seen it, even if you
link |
01:45:25.620
don't know what it's called.
link |
01:45:26.980
And so you have a set of tasks, that's what they're called.
link |
01:45:32.300
And for each task, you have training data,
link |
01:45:37.180
which is a set of input and output pairs.
link |
01:45:40.260
So an input or an output is a grid of colors, basically.
link |
01:45:48.500
The size of the grid is variable.
link |
01:45:51.380
And you're given an input, and you must transform it
link |
01:45:56.100
into the proper output.
link |
01:45:59.020
And so you're shown a few demonstrations
link |
01:46:02.060
of a task in the form of existing input output pairs,
link |
01:46:05.100
and then you're given a new input.
link |
01:46:06.860
And you must produce the correct output.
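
For reference, a task in the public ARC repository is stored as JSON along roughly these lines (the toy grids below encode a made-up "shift the pixel down one row" rule, not a real ARC task); each cell is an integer color code, and grid sizes vary from task to task:

import json

task = {
    "train": [  # demonstration input/output pairs
        {"input": [[0, 1], [0, 0]], "output": [[0, 0], [0, 1]]},
        {"input": [[2, 0], [0, 0]], "output": [[0, 0], [2, 0]]},
    ],
    "test": [   # the solver must produce [[0, 0], [0, 4]] here
        {"input": [[0, 4], [0, 0]]},
    ],
}
print(json.dumps(task, indent=2))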
link |
01:46:12.620
And the assumption in ARC is that every task should only
link |
01:46:22.860
require core knowledge priors, should not
link |
01:46:27.660
require any outside knowledge.
link |
01:46:30.460
So for instance, no language, no English, nothing like this.
link |
01:46:36.900
No concepts taken from our human experience,
link |
01:46:41.540
like trees, dogs, cats, and so on.
link |
01:46:44.340
So only reasoning tasks that are built on top
link |
01:46:49.700
of core knowledge priors.
link |
01:46:52.060
And some of the tasks are actually explicitly
link |
01:46:56.260
trying to probe specific forms of abstraction.
link |
01:47:02.220
Part of the reason why I wanted to create ARC
link |
01:47:05.500
is I'm a big believer in when you're
link |
01:47:11.740
faced with a problem as murky as understanding
link |
01:47:18.340
how to autonomously generate abstraction in a machine,
link |
01:47:22.380
you have to coevolve the solution and the problem.
link |
01:47:27.180
And so part of the reason why I designed ARC
link |
01:47:29.380
was to clarify my ideas about the nature of abstraction.
link |
01:47:34.660
And some of the tasks are actually
link |
01:47:36.220
designed to probe bits of that theory.
link |
01:47:39.900
And there are things that turn out
link |
01:47:42.340
to be very easy for humans to perform, including young kids,
link |
01:47:46.740
but turn out to be near impossible for machines.
link |
01:47:50.500
So what have you learned from the nature of abstraction
link |
01:47:53.780
from designing that?
link |
01:47:58.380
Can you clarify what you mean?
link |
01:47:59.620
One of the things you wanted to try to understand
link |
01:48:02.300
was this idea of abstraction.
link |
01:48:06.020
Yes, so clarifying my own ideas about abstraction
link |
01:48:10.380
by forcing myself to produce tasks that
link |
01:48:13.700
would require the ability to produce
link |
01:48:17.020
that form of abstraction in order to solve them.
link |
01:48:19.900
Got it.
link |
01:48:20.860
OK, so, and by the way, people should check out.
link |
01:48:24.060
I'll probably overlay if you're watching the video part.
link |
01:48:26.380
But the grid input output with the different colors
link |
01:48:32.180
on the grid, that's it.
link |
01:48:34.340
I mean, it's a very simple world,
link |
01:48:36.300
but it's kind of beautiful.
link |
01:48:37.460
It's very similar to classic IQ tests.
link |
01:48:39.740
It's not very original in that sense.
link |
01:48:41.620
The main difference with IQ tests
link |
01:48:43.260
is that we make the priors explicit, which is not
link |
01:48:46.860
usually the case in IQ tests.
link |
01:48:48.580
So you make it explicit that everything should only
link |
01:48:50.820
be built on top of core knowledge priors.
link |
01:48:53.860
I also think it's generally more diverse than IQ tests
link |
01:48:58.620
in general.
link |
01:49:00.300
And it perhaps requires a bit more manual work
link |
01:49:03.820
to produce solutions, because you
link |
01:49:05.460
have to click around on a grid for a while.
link |
01:49:08.500
Sometimes the grids can be as large as 30 by 30 cells.
link |
01:49:12.020
So how did you come up, if you can reveal, with the questions?
link |
01:49:18.020
What's the process of creating the questions?
link |
01:49:19.580
Was it mostly you that came up with the questions?
link |
01:49:23.380
How difficult is it to come up with a question?
link |
01:49:25.780
Is this scalable to a much larger number?
link |
01:49:30.700
If we think, with IQ tests, you might not necessarily
link |
01:49:33.740
want it to or need it to be scalable.
link |
01:49:36.460
With machines, it's possible, you
link |
01:49:39.580
could argue, that it needs to be scalable.
link |
01:49:41.620
So there are 1,000 questions, 1,000 tasks,
link |
01:49:46.500
including the test set, the private test set.
link |
01:49:49.140
I think it's fairly difficult in the sense
link |
01:49:51.060
that a big requirement is that every task should
link |
01:49:54.500
be novel and unique and unpredictable.
link |
01:50:00.140
You don't want to create your own little world that
link |
01:50:04.460
is simple enough that it would be possible for a human
link |
01:50:08.860
to reverse engineer and write down
link |
01:50:12.580
an algorithm that could generate every possible ARC
link |
01:50:15.940
task and its solution.
link |
01:50:17.060
So that would completely invalidate the test.
link |
01:50:19.340
So you're constantly coming up with new stuff.
link |
01:50:21.700
Yeah, you need a source of novelty,
link |
01:50:24.820
of unfakeable novelty.
link |
01:50:27.860
And one thing I found is that, as a human,
link |
01:50:32.020
you are not a very good source of unfakeable novelty.
link |
01:50:36.460
And so you have to pace the creation of these tasks
link |
01:50:40.580
quite a bit.
link |
01:50:41.100
There are only so many unique tasks
link |
01:50:42.980
that you can do in a given day.
link |
01:50:45.580
So that means coming up with truly original new ideas.
link |
01:50:49.860
Did psychedelics help you at all?
link |
01:50:52.380
No, I'm just kidding.
link |
01:50:53.780
But I mean, that's fascinating to think about.
link |
01:50:55.820
So you would be walking or something like that.
link |
01:50:58.780
Are you constantly thinking of something totally new?
link |
01:51:02.860
Yes.
link |
01:51:06.020
This is hard.
link |
01:51:06.980
This is hard.
link |
01:51:07.620
Yeah, I mean, I'm not saying I've done anywhere
link |
01:51:10.980
near a perfect job at it.
link |
01:51:12.380
There is some amount of redundancy,
link |
01:51:14.540
and there are many imperfections in ARC.
link |
01:51:16.740
So that said, you should consider
link |
01:51:18.540
ARC as a work in progress.
link |
01:51:19.820
It is not the definitive state.
link |
01:51:25.180
The ARC tasks today are not the definitive state of the test.
link |
01:51:29.300
I want to keep refining it in the future.
link |
01:51:32.780
I also think it should be possible to open up
link |
01:51:36.180
the creation of tasks to a broad audience
link |
01:51:38.660
to do crowdsourcing.
link |
01:51:40.860
That would involve several levels of filtering,
link |
01:51:43.180
obviously.
link |
01:51:44.140
But I think it's possible to apply crowdsourcing
link |
01:51:46.260
to develop a much bigger and much more diverse ARC data set.
link |
01:51:51.140
That would also be free of potentially some
link |
01:51:54.020
of my own personal biases.
link |
01:51:56.700
Does there always need to be a part of ARC
link |
01:51:59.220
where the test is hidden?
link |
01:52:02.900
Yes, absolutely.
link |
01:52:04.140
It is imperative that the test that you're
link |
01:52:08.900
using to actually benchmark algorithms
link |
01:52:11.900
is not accessible to the people developing these algorithms.
link |
01:52:15.220
Because otherwise, what's going to happen
link |
01:52:16.860
is that the human engineers are just
link |
01:52:19.100
going to solve the tasks themselves
link |
01:52:21.820
and encode their solution in program form.
link |
01:52:24.820
But that, again, what you're seeing here
link |
01:52:27.420
is the process of intelligence happening
link |
01:52:30.100
in the mind of the human.
link |
01:52:31.180
And then you're just capturing its crystallized output.
link |
01:52:35.460
But that crystallized output is not the same thing
link |
01:52:38.260
as the process that generated it.
link |
01:52:40.020
It's not intelligent in itself.
link |
01:52:41.340
So what, by the way, the idea of crowdsourcing it
link |
01:52:43.980
is fascinating.
link |
01:52:45.860
I think the creation of questions
link |
01:52:49.860
is really exciting for people.
link |
01:52:51.460
I think there's a lot of really brilliant people
link |
01:52:53.980
out there that love to create these kinds of stuff.
link |
01:52:56.220
Yeah, one thing that kind of surprised me
link |
01:52:59.060
that I wasn't expecting is that lots of people
link |
01:53:01.620
seem to actually enjoy ARC as a kind of game.
link |
01:53:05.980
And I was releasing it as a test,
link |
01:53:08.820
as a benchmark of fluid general intelligence.
link |
01:53:14.100
And lots of people just, including kids,
link |
01:53:17.100
just started enjoying it as a game.
link |
01:53:18.900
So I think that's encouraging.
link |
01:53:20.980
Yeah, I'm fascinated by it.
link |
01:53:22.300
There's a world of people who create IQ questions.
link |
01:53:25.940
I think that's a cool activity for machines and for humans.
link |
01:53:32.660
And humans are themselves fascinated
link |
01:53:35.420
by taking the questions, like measuring
link |
01:53:40.220
their own intelligence.
link |
01:53:42.300
I mean, that's just really compelling.
link |
01:53:44.420
It's really interesting to me, too.
link |
01:53:47.020
One of the cool things about ARC, you said,
link |
01:53:48.740
is kind of inspired by IQ tests or whatever
link |
01:53:51.620
follows a similar process.
link |
01:53:53.460
But because of its nature, because of the context
link |
01:53:56.060
in which it lives, it immediately
link |
01:53:59.020
forces you to think about the nature of intelligence
link |
01:54:01.660
as opposed to just the test of your own.
link |
01:54:04.220
It forces you to really think.
link |
01:54:06.020
I don't know if it's within the question,
link |
01:54:09.900
inherent in the question, or just the fact
link |
01:54:11.860
that it lives in the test that's supposed
link |
01:54:13.780
to be a test of machine intelligence.
link |
01:54:15.340
Absolutely.
link |
01:54:15.900
As you solve ARC tasks as a human,
link |
01:54:20.660
you will be forced to basically introspect
link |
01:54:24.700
how you come up with solutions.
link |
01:54:27.060
And that forces you to reflect on the human problem solving
link |
01:54:32.660
process.
link |
01:54:33.820
And the way your own mind generates
link |
01:54:38.780
abstract representations of the problems it's exposed to.
link |
01:54:44.780
I think it's due to the fact that the set of core knowledge
link |
01:54:48.860
priors that ARC is built upon is so small.
link |
01:54:52.460
It's all a recombination of a very, very small set
link |
01:54:58.660
of assumptions.
link |
01:55:00.460
OK, so what's the future of ARC?
link |
01:55:02.900
So you held ARC as a challenge, as part
link |
01:55:05.420
of like a Kaggle competition.
link |
01:55:06.700
Yes.
link |
01:55:07.180
Kaggle competition.
link |
01:55:08.420
And what do you think?
link |
01:55:11.860
Do you think that's something that
link |
01:55:13.060
continues for five years, 10 years,
link |
01:55:16.060
like just continues growing?
link |
01:55:17.820
Yes, absolutely.
link |
01:55:18.940
So ARC itself will keep evolving.
link |
01:55:21.340
So I've talked about crowdsourcing.
link |
01:55:22.780
I think that's a good avenue.
link |
01:55:26.180
Another thing I'm starting is I'll
link |
01:55:29.340
be collaborating with folks from the psychology department
link |
01:55:32.700
at NYU to do human testing on ARC.
link |
01:55:36.660
And I think there are lots of interesting questions
link |
01:55:38.940
you can start asking, especially as you start correlating
link |
01:55:43.940
machine solutions to ARC tasks and the human characteristics
link |
01:55:49.420
of solutions.
link |
01:55:50.060
Like for instance, you can try to see
link |
01:55:52.020
if there's a relationship between the human perceived
link |
01:55:55.660
difficulty of a task and the machine perceived.
link |
01:55:59.420
Yes, and exactly some measure of machine
link |
01:56:01.940
perceived difficulty.
link |
01:56:02.780
Yeah, it's a nice playground in which
link |
01:56:04.900
to explore this very difference.
link |
01:56:06.340
It's the same thing as what we talked about with autonomous vehicles.
link |
01:56:09.260
The things that could be difficult for humans
link |
01:56:10.900
might be very different than the things that are difficult for machines.
link |
01:56:13.100
And formalizing or making explicit that difference
link |
01:56:17.300
in difficulty may teach us something fundamental
link |
01:56:21.020
about intelligence.
link |
01:56:22.340
So one thing I think we did well with ARC
link |
01:56:26.420
is that it's proving to be a very actionable test in the sense
link |
01:56:33.060
that machine performance on ARC started at very much zero
link |
01:56:37.700
initially, while humans actually found the tasks very easy.
link |
01:56:43.340
And that alone was like a big red flashing light saying
link |
01:56:48.180
that something is going on and that we are missing something.
link |
01:56:52.380
And at the same time, machine performance
link |
01:56:55.420
did not stay at zero for very long.
link |
01:56:57.660
Actually, within two weeks of the Kaggle competition,
link |
01:57:00.260
we started having a nonzero number.
link |
01:57:03.220
And now the state of the art is around 20%
link |
01:57:06.460
of the test set solved.
link |
01:57:10.260
And so ARC is actually a challenge
link |
01:57:12.500
where our capabilities start at zero, which indicates
link |
01:57:16.860
the need for progress.
link |
01:57:18.180
But it's also not an impossible challenge.
link |
01:57:20.580
It's not inaccessible.
link |
01:57:21.500
You can start making progress basically right away.
link |
01:57:25.260
At the same time, we are still very far
link |
01:57:28.380
from having solved it.
link |
01:57:29.420
And that's actually a very positive outcome
link |
01:57:32.820
of the competition is that the competition has proven
link |
01:57:35.900
that there was no obvious shortcut to solve these tasks.
link |
01:57:41.740
Yeah, so the test held up.
link |
01:57:43.180
Yeah, exactly.
link |
01:57:44.340
That was the primary reason to use the Kaggle competition
link |
01:57:46.900
is to check if some clever person was
link |
01:57:51.540
going to hack the benchmark. That did not happen.
link |
01:57:56.380
People who are solving the task are essentially doing it.
link |
01:58:01.060
Well, in a way, they're actually exploring some flaws of ARC
link |
01:58:05.580
that we will need to address in the future,
link |
01:58:07.380
especially they're essentially anticipating
link |
01:58:09.900
what sort of tasks may be contained in the test set.
link |
01:58:13.780
Right, which is kind of, yeah, that's the kind of hacking.
link |
01:58:18.460
It's human hacking of the test.
link |
01:58:20.180
Yes, that said, with the state of the art,
link |
01:58:23.380
at like 20%, we're still very, very far from human level,
link |
01:58:28.220
which is closer to 100%.
link |
01:58:30.940
And I do believe that it will take a while
link |
01:58:35.540
until we reach human parity on ARC.
link |
01:58:40.500
And that by the time we have human parity,
link |
01:58:43.540
we will have AI systems that are probably
link |
01:58:47.020
pretty close to human level in terms of general fluid
link |
01:58:50.740
intelligence, which is, I mean, they are not
link |
01:58:53.260
going to be necessarily human like.
link |
01:58:54.940
They're not necessarily, you would not necessarily
link |
01:58:58.780
recognize them as being an AGI.
link |
01:59:01.860
But they would be capable of a degree of generalization
link |
01:59:06.860
that matches the generalization performed
link |
01:59:09.820
by human fluid intelligence.
link |
01:59:11.300
Sure.
link |
01:59:11.860
I mean, this is a good point in terms
link |
01:59:13.380
of general fluid intelligence to mention in your paper.
link |
01:59:17.700
You describe different kinds of generalizations,
link |
01:59:21.060
local, broad, extreme.
link |
01:59:23.460
And there's a kind of a hierarchy that you form.
link |
01:59:25.660
So when we say generalizations, what are we talking about?
link |
01:59:31.820
What kinds are there?
link |
01:59:33.180
Right, so generalization is a very old idea.
link |
01:59:37.020
I mean, it's even older than machine learning.
link |
01:59:39.420
In the context of machine learning,
link |
01:59:40.980
you say a system generalizes if it can make sense of an input
link |
01:59:47.140
it has not yet seen.
link |
01:59:49.580
And that's what I would call system centric generalization,
link |
01:59:54.940
generalization with respect to novelty
link |
02:00:00.380
for the specific system you're considering.
link |
02:00:02.980
So I think a good test of intelligence
link |
02:00:05.060
should actually deal with developer aware generalization,
link |
02:00:09.900
which is slightly stronger than system centric generalization.
link |
02:00:13.500
So developer aware generalization
link |
02:00:16.020
would be the ability to generalize
link |
02:00:19.860
to novelty or uncertainty that not only the system itself has
link |
02:00:24.220
no access to, but the developer of the system
link |
02:00:26.660
could not have access to either.
link |
02:00:29.380
That's a fascinating meta definition.
link |
02:00:32.380
So this is basically the edge case thing
link |
02:00:37.700
we're talking about with autonomous vehicles.
link |
02:00:39.780
Neither the developer nor the system
link |
02:00:41.620
know about the edge cases it may encounter.
link |
02:00:44.420
So the system should be
link |
02:00:47.020
able to generalize to the thing that nobody expected,
link |
02:00:51.660
neither the designer of the training data,
link |
02:00:54.860
nor obviously the contents of the training data.
link |
02:00:59.060
That's a fascinating definition.
link |
02:01:00.580
So you can see degrees of generalization as a spectrum.
link |
02:01:04.540
And the lowest level, what machine learning
link |
02:01:08.060
is trying to do, is the assumption
link |
02:01:10.780
that any new situation is going to be sampled
link |
02:01:15.220
from a static distribution of possible situations
link |
02:01:18.340
and that you already have a representative sample
link |
02:01:21.500
of the distribution.
link |
02:01:22.420
That's your training data.
link |
02:01:23.860
And so in machine learning, you generalize to a new sample
link |
02:01:26.700
from a known distribution.
link |
02:01:28.780
And the ways in which your new sample will be new or different
link |
02:01:34.020
are ways that are already understood by the developers
link |
02:01:38.140
of the system.
link |
02:01:39.420
So you are generalizing to known unknowns
link |
02:01:43.020
for one specific task.
link |
02:01:45.100
That's what you would call robustness.
link |
02:01:47.500
You are robust to things like noise, small variations,
link |
02:01:50.180
and so on for one fixed known distribution
link |
02:01:56.620
that you know through your training data.
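
A hedged sketch of that lowest rung, in plain Python: train and test sets are drawn from one fixed, known distribution, so the only novelty a new sample carries is sampling noise, and a simple classifier fit on the training data holds up on the test data.

import random

random.seed(0)

def sample(n):
    # One fixed, known distribution: class 1 centered at +1.0,
    # class 0 centered at -1.0, plus Gaussian noise.
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = (1.0 if label else -1.0) + random.gauss(0.0, 0.5)
        data.append((x, label))
    return data

train, test = sample(1000), sample(1000)

# Fit a threshold at the midpoint of the two class means.
mean1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
mean0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
threshold = (mean0 + mean1) / 2.0

# High accuracy, because the test's novelty is only sampling noise,
# never a new kind of situation.
accuracy = sum((x > threshold) == bool(y) for x, y in test) / len(test)
print(round(accuracy, 3))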
link |
02:01:59.300
And the higher degree would be flexibility
link |
02:02:05.060
in machine intelligence.
link |
02:02:06.380
So flexibility would be something
link |
02:02:08.620
like an L5 self driving car or maybe a robot that
link |
02:02:12.500
can pass the coffee cup test, which
link |
02:02:16.820
is the notion that you'd be given a random kitchen
link |
02:02:21.460
somewhere in the country.
link |
02:02:22.460
And you would have to go make a cup of coffee in that kitchen.
link |
02:02:28.460
So flexibility would be the ability
link |
02:02:30.820
to deal with unknown unknowns, so things that could not,
link |
02:02:35.300
dimensions of variability that could not
link |
02:02:37.180
have been possibly foreseen by the creators of the system
link |
02:02:41.100
within one specific task.
link |
02:02:42.860
So generalizing to the long tail of situations in self driving,
link |
02:02:47.020
for instance, would be flexibility.
link |
02:02:48.540
So you have robustness, flexibility, and finally,
link |
02:02:51.700
you would have extreme generalization,
link |
02:02:53.700
which is basically flexibility, but instead
link |
02:02:57.740
of just considering one specific domain,
link |
02:03:01.180
like driving or domestic robotics,
link |
02:03:03.340
you're considering an open ended range of possible domains.
link |
02:03:07.740
So a robot would be capable of extreme generalization
link |
02:03:12.620
if, let's say, it's designed and trained for cooking,
link |
02:03:18.060
for instance.
link |
02:03:19.820
And if I buy the robot and if it's
link |
02:03:24.580
able to teach itself gardening in a couple of weeks,
link |
02:03:28.780
it would be capable of extreme generalization, for instance.
link |
02:03:32.300
So the ultimate goal is extreme generalization.
link |
02:03:34.300
Yes.
link |
02:03:34.820
So creating a system that is so general that it could
link |
02:03:40.020
essentially achieve human skill parity over arbitrary tasks
link |
02:03:46.140
and arbitrary domains with the same level of improvisation
link |
02:03:50.820
and adaptation power as humans when
link |
02:03:53.740
it encounters new situations.
link |
02:03:55.380
And it would do so over basically the same range
link |
02:03:59.780
of possible domains and tasks as humans
link |
02:04:02.780
and using essentially the same amount of training
link |
02:04:05.500
experience of practice as humans would require.
link |
02:04:07.860
That would be human level extreme generalization.
link |
02:04:10.900
So I don't actually think humans are anywhere
link |
02:04:14.620
near the optimal intelligence bounds
link |
02:04:19.580
if there is such a thing.
link |
02:04:21.300
So, you think, for humans or in general?
link |
02:04:23.820
In general.
link |
02:04:25.140
I think it's quite likely that there
link |
02:04:26.780
is a hard limit to how intelligent any system can be.
link |
02:04:33.860
But at the same time, I don't think humans are anywhere
link |
02:04:35.980
near that limit.
link |
02:04:39.180
Yeah, last time I think we talked,
link |
02:04:40.780
I think you had this idea that we're only
link |
02:04:43.820
as intelligent as the problems we face.
link |
02:04:46.580
Sort of we are bounded by the problems.
link |
02:04:51.300
In a way, yes.
link |
02:04:51.940
We are bounded by our environments,
link |
02:04:55.100
and we are bounded by the problems we try to solve.
link |
02:04:58.100
Yeah.
link |
02:04:59.220
Yeah.
link |
02:04:59.700
What do you make of Neuralink and outsourcing
link |
02:05:03.820
some of the brain power, like brain computer interfaces?
link |
02:05:07.140
Do you think we can expand or augment our intelligence?
link |
02:05:13.460
I am fairly skeptical of neural interfaces
link |
02:05:18.340
because they are trying to fix one specific bottleneck
link |
02:05:23.780
in human machine cognition, which
link |
02:05:26.380
is the bandwidth bottleneck, input and output
link |
02:05:29.700
of information in the brain.
link |
02:05:31.820
And my perception of the problem is that bandwidth is not
link |
02:05:37.820
at this time a bottleneck at all.
link |
02:05:41.140
Meaning that we already have sensors
link |
02:05:43.580
that enable us to take in far more information than what
link |
02:05:48.300
we can actually process.
link |
02:05:50.420
Well, to push back on that a little bit,
link |
02:05:53.260
to sort of play devil's advocate a little bit,
link |
02:05:55.420
is if you look at the internet, Wikipedia, let's say Wikipedia,
link |
02:05:58.980
I would say that humans, after the advent of Wikipedia,
link |
02:06:03.300
are much more intelligent.
link |
02:06:05.860
Yes, I think that's a good one.
link |
02:06:07.820
But that's also not about, that's about externalizing
link |
02:06:14.180
our intelligence via information processing systems,
link |
02:06:18.140
external information processing systems,
link |
02:06:19.740
which is very different from brain computer interfaces.
link |
02:06:23.780
Right, but the question is whether if we have direct
link |
02:06:27.980
access, if our brain has direct access to Wikipedia without
link |
02:06:31.940
Your brain already has direct access to Wikipedia.
link |
02:06:34.540
It's on your phone.
link |
02:06:35.900
And you have your hands and your eyes and your ears
link |
02:06:39.380
and so on to access that information.
link |
02:06:42.140
And the speed at which you can access it
link |
02:06:44.340
Is bottlenecked by the cognition.
link |
02:06:45.700
I think it's already close, fairly close to optimal,
link |
02:06:49.620
which is why speed reading, for instance, does not work.
link |
02:06:53.340
The faster you read, the less you understand.
link |
02:06:55.980
But maybe it's because it uses the eyes.
link |
02:06:58.420
So maybe.
link |
02:07:00.540
So I don't believe so.
link |
02:07:01.460
I think the brain is very slow.
link |
02:07:04.620
It typically operates, you know, the fastest things
link |
02:07:07.860
that happen in the brain are at the level of 50 milliseconds.
link |
02:07:11.420
Forming a conscious thought can potentially
link |
02:07:14.580
take entire seconds, right?
link |
02:07:16.740
And you can already read pretty fast.
link |
02:07:19.220
So I think the speed at which you can take information in
link |
02:07:23.460
and even the speed at which you can output information
link |
02:07:26.460
can only be very incrementally improved.
link |
02:07:29.900
Maybe there's a question.
link |
02:07:31.100
If you're a very fast typer, if you're a very trained typer,
link |
02:07:34.380
the speed at which you can express your thoughts
link |
02:07:36.660
is already the speed at which you can form your thoughts.
link |
02:07:40.500
Right, so that's kind of an idea that there are
link |
02:07:44.540
fundamental bottlenecks to the human mind.
link |
02:07:47.020
But it's possible that everything we have
link |
02:07:50.260
in the human mind is just to be able to survive
link |
02:07:53.140
in the environment.
link |
02:07:54.420
And there's a lot more to expand.
link |
02:07:58.300
Maybe, you know, you said the speed of the thought.
link |
02:08:02.420
So I think augmenting human intelligence
link |
02:08:06.780
is a very valid and very powerful avenue, right?
link |
02:08:09.900
And that's what computers are about.
link |
02:08:12.260
In fact, that's what all of culture and civilization
link |
02:08:15.900
is about.
link |
02:08:16.740
Our culture is externalized cognition
link |
02:08:20.620
and we rely on culture to think constantly.
link |
02:08:23.740
Yeah, I mean, that's another, yeah.
link |
02:08:26.620
Not just computers, not just phones and the internet.
link |
02:08:29.140
I mean, all of culture, like language, for instance,
link |
02:08:32.460
is a form of externalized cognition.
link |
02:08:34.020
Books are obviously externalized cognition.
link |
02:08:37.460
Yeah, that's a good point.
link |
02:08:38.580
And you can scale that externalized cognition
link |
02:08:42.060
far beyond the capability of the human brain.
link |
02:08:45.180
And you could see civilization itself
link |
02:08:48.900
has capabilities that are far beyond any individual brain
link |
02:08:54.260
and will keep scaling it because it's not
link |
02:08:55.940
bound by individual brains.
link |
02:08:59.140
It's a different kind of system.
link |
02:09:01.340
Yeah, and that system includes nonhumans.
link |
02:09:06.260
First of all, it includes all the other biological systems,
link |
02:09:08.700
which are probably contributing to the overall intelligence
link |
02:09:11.660
of the organism.
link |
02:09:12.900
And then computers are part of it.
link |
02:09:14.460
Nonhuman systems are probably not contributing much,
link |
02:09:16.860
but AIs are definitely contributing to that.
link |
02:09:19.700
Like Google search, for instance, is a big part of it.
link |
02:09:24.260
Yeah, yeah, a huge part, a part that we can't probably
link |
02:09:29.660
introspect.
link |
02:09:31.060
Like how the world has changed in the past 20 years,
link |
02:09:33.780
it's probably very difficult for us
link |
02:09:35.220
to be able to understand until, of course,
link |
02:09:38.620
whoever created the simulation we're in is probably
link |
02:09:41.740
doing metrics, measuring the progress.
link |
02:09:44.940
There was probably a big spike in performance.
link |
02:09:48.340
They're enjoying this.
link |
02:09:51.580
So what are your thoughts on the Turing test
link |
02:09:56.020
and the Lobner Prize, which is one
link |
02:10:00.340
of the most famous attempts at the test of artificial
link |
02:10:05.700
intelligence by doing a natural language open dialogue test
link |
02:10:11.740
that's judged by humans as far as how well the machine did?
link |
02:10:18.860
So I'm not a fan of the Turing test
link |
02:10:21.460
itself, or any of its variants, for two reasons.
link |
02:10:25.940
So first of all, it's really copping out
link |
02:10:34.140
of trying to define and measure intelligence
link |
02:10:37.660
because it's entirely outsourcing that
link |
02:10:40.620
to a panel of human judges.
link |
02:10:43.380
And these human judges, they may not themselves
link |
02:10:47.420
have any proper methodology.
link |
02:10:49.700
They may not themselves have any proper definition
link |
02:10:52.660
of intelligence.
link |
02:10:53.620
They may not be reliable.
link |
02:10:54.780
So the Turing test is already failing
link |
02:10:57.260
one of the core psychometrics principles, which
link |
02:10:59.620
is reliability because you have biased human judges.
link |
02:11:04.620
It's also violating the standardization requirement
link |
02:11:07.900
and the freedom from bias requirement.
link |
02:11:10.140
And so it's really a cop out, because you are outsourcing
link |
02:11:13.900
everything that matters, which is precisely describing
link |
02:11:17.380
intelligence and finding a standalone test to measure it.
link |
02:11:22.180
You're outsourcing everything to people.
link |
02:11:25.260
So it's really a cop out.
link |
02:11:26.340
And by the way, we should keep in mind
link |
02:11:28.860
that when Turing proposed the imitation game,
link |
02:11:33.940
he did not mean for the imitation game
link |
02:11:36.780
to be an actual goal for the field of AI
link |
02:11:40.700
or an actual test of intelligence.
link |
02:11:42.460
He was using the imitation game as a thought experiment
link |
02:11:48.780
in a philosophical discussion in his 1950 paper.
link |
02:11:53.580
He was trying to argue that theoretically, it
link |
02:11:58.820
should be possible for something very much like the human mind,
link |
02:12:04.220
indistinguishable from the human mind,
link |
02:12:06.100
to be encoded in a Turing machine.
link |
02:12:08.060
And at the time, that was a very daring idea.
link |
02:12:14.540
It was stretching credulity.
link |
02:12:16.580
But nowadays, I think it's fairly well accepted
link |
02:12:20.140
that the mind is an information processing system
link |
02:12:22.660
and that you could probably encode it into a computer.
link |
02:12:25.420
So another reason why I'm not a fan of this type of test
link |
02:12:29.380
is that the incentives that it creates
link |
02:12:34.220
are incentives that are not conducive to proper scientific
link |
02:12:39.740
research.
link |
02:12:40.780
If your goal is to trick, to convince a panel of human
link |
02:12:45.700
judges that they are talking to a human,
link |
02:12:48.460
then you have an incentive to rely on tricks
link |
02:12:53.420
and prestidigitation.
link |
02:12:56.500
In the same way that, let's say, you're doing physics
link |
02:12:59.180
and you want to solve teleportation.
link |
02:13:01.500
And what if the test that you set out to pass
link |
02:13:04.660
is you need to convince a panel of judges
link |
02:13:07.460
that teleportation took place?
link |
02:13:09.500
And they're just sitting there and watching what you're doing.
link |
02:13:12.580
And that is something that David
link |
02:13:17.540
Copperfield could achieve in his show in Vegas.
link |
02:13:22.780
And what he's doing is very elaborate.
link |
02:13:25.260
But it's not physics.
link |
02:13:29.180
It's not making any progress in our understanding
link |
02:13:31.740
of the universe.
link |
02:13:32.620
To push back on that, it's possible.
link |
02:13:34.780
The hope with these kinds of subjective evaluations
link |
02:13:39.020
is that it's easier to solve it generally
link |
02:13:41.940
than it is to come up with tricks that convince
link |
02:13:45.420
a large number of judges.
link |
02:13:46.620
That's the hope.
link |
02:13:47.340
In practice, it turns out that it's
link |
02:13:49.300
very easy to deceive people in the same way
link |
02:13:51.500
that you can do magic in Vegas.
link |
02:13:54.380
You can actually very easily convince people
link |
02:13:57.300
that they're talking to a human when they're actually
link |
02:13:59.500
talking to an algorithm.
link |
02:14:00.740
I just disagree.
link |
02:14:01.740
I disagree with that.
link |
02:14:02.660
I think it's easy.
link |
02:14:03.620
I would push.
link |
02:14:05.100
No, it's not easy.
link |
02:14:07.340
It's doable.
link |
02:14:08.300
It's very easy because we are biased.
link |
02:14:12.260
We have theory of mind.
link |
02:14:13.860
We are constantly projecting emotions, intentions, agentness.
link |
02:14:21.020
Agentness is one of our core innate priors.
link |
02:14:24.260
We are projecting these things on everything around us.
link |
02:14:26.820
Like if you paint a smiley on a rock,
link |
02:14:31.260
the rock becomes happy in our eyes.
link |
02:14:33.420
And because we have this extreme bias that
link |
02:14:36.540
permeates everything we see around us,
link |
02:14:39.740
it's actually pretty easy to trick people.
link |
02:14:41.780
I just disagree with that.
link |
02:14:44.300
I so totally disagree with that.
link |
02:14:45.820
You brilliantly put it, the anthropomorphization
link |
02:14:50.500
that we naturally do, the agentness, that word.
link |
02:14:53.140
Is that a real word?
link |
02:14:53.980
No, it's not a real word.
link |
02:14:55.500
I like it.
link |
02:14:56.020
But it's a useful word.
link |
02:14:57.780
It's a useful word.
link |
02:14:58.620
Let's make it real.
link |
02:14:59.660
It's a huge help.
link |
02:15:01.020
But I still think it's really difficult to convince people.
link |
02:15:04.900
If you do like the Alexa Prize formulation,
link |
02:15:07.940
where you talk for an hour, there's
link |
02:15:10.420
formulations of the test you can create,
link |
02:15:12.460
where it's very difficult.
link |
02:15:13.780
So I like the Alexa Prize better because it's more pragmatic.
link |
02:15:18.100
It's more practical.
link |
02:15:19.540
It's actually incentivizing developers
link |
02:15:22.100
to create something that's useful as a human machine
link |
02:15:27.860
interface.
link |
02:15:29.300
So that's slightly better than just the imitation game.
link |
02:15:31.780
So I like it.
link |
02:15:34.100
Your idea is like a test which hopefully
link |
02:15:36.980
helps us in creating intelligent systems as a result.
link |
02:15:39.620
Like if you create a system that passes it,
link |
02:15:41.700
it'll be useful for creating further intelligent systems.
link |
02:15:44.740
Yes, at least.
link |
02:15:46.100
Yeah.
link |
02:15:47.620
Just to kind of comment, I'm a little bit surprised
link |
02:15:51.740
how little inspiration people draw from the Turing test
link |
02:15:55.660
today.
link |
02:15:57.180
The media and the popular press might write about it
link |
02:15:59.420
every once in a while.
link |
02:16:00.900
The philosophers might talk about it.
link |
02:16:03.500
But most engineers are not really inspired by it.
link |
02:16:07.020
And I know you don't like the Turing test,
link |
02:16:11.340
but we'll have this argument another time.
link |
02:16:15.060
There's something inspiring about it, I think.
link |
02:16:18.620
As a philosophical device in a philosophical discussion,
link |
02:16:21.740
I think there is something very interesting about it.
link |
02:16:23.780
I don't think it is in practical terms.
link |
02:16:26.220
I don't think it's conducive to progress.
link |
02:16:29.060
And one of the reasons why is that I
link |
02:16:32.540
think being very human like, being
link |
02:16:35.300
indistinguishable from a human is actually
link |
02:16:37.540
the very last step in the creation of machine
link |
02:16:40.460
intelligence.
link |
02:16:41.020
That the first AIs that will show strong generalization
link |
02:16:46.820
that will actually implement human like broad cognitive
link |
02:16:52.500
abilities, they will not actually behave or look
link |
02:16:54.980
anything like humans.
link |
02:16:58.500
Human likeness is the very last step in that process.
link |
02:17:01.700
And so a good test is a test that
link |
02:17:03.780
points you towards the first step on the ladder,
link |
02:17:07.060
not towards the top of the ladder.
link |
02:17:08.900
So to push back on that, I usually
link |
02:17:11.980
agree with you on most things.
link |
02:17:13.460
I remember you, I think at some point,
link |
02:17:15.060
tweeting something about the Turing test
link |
02:17:17.100
being counterproductive
link |
02:17:19.020
or something like that.
link |
02:17:20.340
And I think a lot of very smart people agree with that.
link |
02:17:23.220
I, computationally speaking a not very smart person,
link |
02:17:31.460
disagree with that.
link |
02:17:32.300
Because I think there's some magic
link |
02:17:33.820
to the interactivity with other humans.
link |
02:17:36.900
So to play devil's advocate on your statement,
link |
02:17:39.620
it's possible that in order to demonstrate
link |
02:17:42.780
the generalization abilities of a system,
link |
02:17:45.540
you have to show your ability, in conversation,
link |
02:17:49.940
show your ability to adjust, adapt to the conversation
link |
02:17:55.380
through not just like as a standalone system,
link |
02:17:58.380
but through the process of like the interaction,
link |
02:18:01.380
the game theoretic, where you really
link |
02:18:05.700
are changing the environment by your actions.
link |
02:18:09.180
So in the ARC challenge, for example,
link |
02:18:11.660
you're an observer.
link |
02:18:12.820
You can't scare the test into changing.
link |
02:18:17.460
You can't talk to the test.
link |
02:18:19.380
You can't play with it.
link |
02:18:21.260
So there's some aspect of that interactivity
link |
02:18:24.300
that becomes highly subjective, but it
link |
02:18:26.140
feels like it could be conducive to generalizability.
link |
02:18:29.620
I think you make a great point.
link |
02:18:31.060
The interactivity is a very good setting
link |
02:18:33.580
to force a system to show adaptation,
link |
02:18:36.060
to show generalization.
link |
02:18:39.300
That said, at the same time, it's
link |
02:18:42.620
not something very scalable, because you
link |
02:18:44.860
rely on human judges.
link |
02:18:46.100
It's not something reliable, because the human judges may
link |
02:18:48.700
not be...
link |
02:18:49.420
So you don't like human judges.
link |
02:18:50.940
Basically, yes.
link |
02:18:51.860
And I think so.
link |
02:18:52.540
I love the idea of interactivity.
link |
02:18:56.140
I initially wanted an ARC test that
link |
02:18:59.620
had some amount of interactivity where your score on a task
link |
02:19:02.820
would not be 1 or 0, whether you can solve it or not,
link |
02:19:05.380
but would be the number of attempts
link |
02:19:11.580
that you need before you hit the right solution, which
link |
02:19:14.740
means that now you can start applying
link |
02:19:16.900
the scientific method as you solve ARC tasks,
link |
02:19:19.860
that you can start formulating hypotheses and probing
link |
02:19:23.780
the system to see whether the observation will
link |
02:19:27.300
match the hypothesis or not.
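
To make that concrete, here is a minimal Python sketch of attempt-based scoring, in which each failed attempt is feedback the solver could use to refine its next hypothesis. The task format and the candidate programs are hypothetical stand-ins, not the actual ARC protocol.

def score_by_attempts(task, hypotheses, max_attempts=10):
    # Score a solver by how many attempts it needs, not by pass/fail.
    # Each failed attempt is an observation the solver could use to
    # refine its next hypothesis, mimicking the scientific method.
    for attempt, program in enumerate(hypotheses[:max_attempts], start=1):
        if program(task["input"]) == task["output"]:
            return attempt  # lower is better
    return None  # unsolved within the attempt budget

# Toy usage: two hypotheses about a list-transformation task.
task = {"input": [1, 2, 3], "output": [2, 4, 6]}
hypotheses = [
    lambda xs: [x + 1 for x in xs],  # hypothesis 1: increment each value
    lambda xs: [x * 2 for x in xs],  # hypothesis 2: double each value
]
print(score_by_attempts(task, hypotheses))  # prints 2: solved on the second attempt
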
link |
02:19:28.660
It would be amazing if you could also,
link |
02:19:30.700
at an even higher level than that, measure the quality of your attempts,
link |
02:19:35.500
which, of course, is impossible.
link |
02:19:36.780
But again, that gets subjective.
link |
02:19:38.540
How good was your thinking?
link |
02:19:41.620
How efficient was it?
link |
02:19:43.900
So one thing that's interesting about this notion of scoring you
link |
02:19:48.380
by how many attempts you need is that you
link |
02:19:50.500
can start producing tasks that are way more ambiguous, right?
link |
02:19:55.220
Right.
link |
02:19:56.500
Because with the different attempts,
link |
02:19:59.700
you can actually probe that ambiguity, right?
link |
02:20:03.300
Right.
link |
02:20:04.140
So that's, in a sense, measuring how well
link |
02:20:08.220
you can adapt to the uncertainty and reduce it.
link |
02:20:15.700
Yes, it's how fast.
link |
02:20:18.260
It's the efficiency with which you reduce uncertainty
link |
02:20:21.180
in program space, exactly.
link |
02:20:22.940
Very difficult to come up with that kind of test, though.
link |
02:20:24.940
Yeah, so I would love to be able to create something like this.
link |
02:20:28.340
In practice, it would be very, very difficult, but yes.
link |
02:20:33.140
I mean, what you're doing, what you've done with the ARC challenge
link |
02:20:36.140
is brilliant.
link |
02:20:37.620
I'm also not surprised that it's not more popular,
link |
02:20:40.940
but I think it's picking up.
link |
02:20:42.140
It has its niche.
link |
02:20:42.860
It has its niche, yeah.
link |
02:20:44.100
Yeah.
link |
02:20:44.900
What are your thoughts about another test?
link |
02:20:47.100
I talked with Marcus Hutter.
link |
02:20:48.940
He has the Hutter Prize for compression of human knowledge.
link |
02:20:51.660
And the idea is really to sort of quantify and reduce
link |
02:20:55.620
the test of intelligence purely to just the ability
link |
02:20:58.260
to compress.
link |
02:20:59.580
What are your thoughts about this idea of intelligence as compression?
link |
02:21:04.660
I mean, it's a very fun test because it's
link |
02:21:07.980
such a simple idea, like you're given Wikipedia,
link |
02:21:12.220
basic English Wikipedia, and you must compress it.
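
As a rough sketch of that setup in Python: the score is essentially how small you can make a fixed corpus (the real prize also counts the size of the decompressor itself). Here zlib stands in as a weak off-the-shelf baseline, and the corpus path is a hypothetical placeholder.

import zlib

# Hypothetical local path to a Wikipedia excerpt; the actual benchmark
# uses a fixed corpus and also counts the decompressor's size.
with open("enwik_sample.txt", "rb") as f:
    text = f.read()

compressed = zlib.compress(text, level=9)  # zlib's strongest setting
print(f"{len(text)} bytes -> {len(compressed)} bytes "
      f"(ratio {len(compressed) / len(text):.3f})")
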
link |
02:21:15.500
And so it stems from the idea that cognition is compression,
link |
02:21:21.140
that the brain is basically a compression algorithm.
link |
02:21:24.020
This is a very old idea.
link |
02:21:25.620
It's a very, I think, striking and beautiful idea.
link |
02:21:30.540
I used to believe it.
link |
02:21:32.740
I eventually had to realize that it was very much
link |
02:21:36.140
a flawed idea.
link |
02:21:36.900
So I no longer believe that cognition is compression.
link |
02:21:41.420
But I can tell you what the difference is.
link |
02:21:44.620
So it's very easy to believe that cognition and compression
link |
02:21:48.820
are the same thing.
link |
02:21:51.660
So Jeff Hawkins, for instance, says
link |
02:21:53.220
that cognition is prediction.
link |
02:21:54.780
And of course, prediction is basically the same thing
link |
02:21:57.740
as compression.
link |
02:21:58.700
It's just including the temporal axis.
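
One way to see that equivalence, sketched below in Python: under an ideal (arithmetic-style) code, a symbol the model predicted with probability p costs -log2(p) bits, so a better predictor directly yields a shorter encoding. The probabilities here are made up for illustration.

import math

def code_length_bits(symbol_probs):
    # Ideal total code length for a sequence, given the probability the
    # predictor assigned to each symbol that actually occurred.
    return sum(-math.log2(p) for p in symbol_probs)

# The same four observed symbols under a sharp predictor versus
# no predictive power at all (uniform over a 4-symbol alphabet):
sharp = [0.9, 0.8, 0.9, 0.7]
uniform = [0.25] * 4
print(f"sharp: {code_length_bits(sharp):.2f} bits, "
      f"uniform: {code_length_bits(uniform):.2f} bits")  # ~1.14 vs 8.00
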
link |
02:22:03.580
And it's very easy to believe this
link |
02:22:05.060
because compression is something that we
link |
02:22:06.900
do all the time very naturally.
link |
02:22:09.020
We are constantly compressing information.
link |
02:22:15.660
We have this bias towards simplicity.
link |
02:22:17.940
We are constantly trying to organize things in our mind
link |
02:22:21.060
and around us to be more regular.
link |
02:22:24.460
So it's a beautiful idea.
link |
02:22:26.860
It's very easy to believe.
link |
02:22:28.620
There is a big difference between what
link |
02:22:31.580
we do with our brains and compression.
link |
02:22:33.980
So compression is actually kind of a tool
link |
02:22:38.220
in the human cognitive toolkit that is used in many ways.
link |
02:22:42.060
But it's just a tool.
link |
02:22:44.540
It is a tool for cognition.
link |
02:22:45.940
It is not cognition itself.
link |
02:22:47.620
And the big fundamental difference
link |
02:22:50.020
is that cognition is about being able to operate
link |
02:22:55.340
in future situations that include fundamental uncertainty
link |
02:23:00.740
and novelty.
link |
02:23:02.140
So for instance, consider a child at age 10.
link |
02:23:06.860
And so they have 10 years of life experience.
link |
02:23:10.100
They've gotten pain, pleasure, rewards, and punishment
link |
02:23:14.260
over that period of time.
link |
02:23:16.500
If you were to generate the shortest behavioral program
link |
02:23:21.980
that would have basically run that child over these 10 years
link |
02:23:26.740
in an optimal way, the shortest optimal behavioral program
link |
02:23:32.220
given the experience of that child so far,
link |
02:23:34.820
well, that program, that compressed program,
link |
02:23:37.540
this is what you would get if the mind of the child
link |
02:23:39.940
was a compression algorithm essentially,
link |
02:23:42.740
would be utterly unable and inappropriate
link |
02:23:48.100
to process the next 70 years in the life of that child.
link |
02:23:54.380
So in the models we build of the world,
link |
02:23:59.020
we are not trying to make them actually optimally compressed.
link |
02:24:03.220
We are using compression as a tool
link |
02:24:06.660
to promote simplicity and efficiency in our models.
link |
02:24:10.060
But they are not perfectly compressed
link |
02:24:12.060
because they need to include things
link |
02:24:15.300
that are seemingly useless today, that have seemingly
link |
02:24:18.540
been useless so far.
link |
02:24:20.140
But that may turn out to be useful in the future
link |
02:24:24.140
because you just don't know the future.
link |
02:24:25.900
And that's the fundamental principle
link |
02:24:28.740
that cognition, that intelligence arises from
link |
02:24:31.260
is that you need to be able to run
link |
02:24:33.780
appropriate behavioral programs except you have absolutely
link |
02:24:36.660
no idea what sort of context, environment, situation
link |
02:24:40.940
they are going to be running in.
link |
02:24:42.260
And you have to deal with that uncertainty,
link |
02:24:45.020
with that future novelty.
link |
02:24:46.580
So an analogy that you can make is with investing,
link |
02:24:52.500
for instance.
link |
02:24:54.460
If I look at the past 20 years of stock market data,
link |
02:24:59.540
and I use a compression algorithm
link |
02:25:01.860
to figure out the best trading strategy,
link |
02:25:04.420
it's going to be you buy Apple stock, then
link |
02:25:06.660
maybe the past few years you buy Tesla stock or something.
link |
02:25:10.420
But is that strategy still going to be
link |
02:25:13.300
true for the next 20 years?
link |
02:25:14.660
Well, actually, probably not, which
link |
02:25:17.980
is why if you're a smart investor,
link |
02:25:21.060
you're not just going to be following the strategy that
link |
02:25:26.340
corresponds to compression of the past.
link |
02:25:28.980
You're
link |
02:25:31.660
going to have a balanced portfolio, right?
link |
02:25:34.860
Because you just don't know what's going to happen.
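
A toy Python sketch of that analogy, with made-up numbers: the strategy that best compresses the past (all-in on the single best past performer) breaks under a regime change, while an equal-weight portfolio hedges against it.

# Made-up annual returns purely for illustration.
past = {"A": 0.15, "B": 0.02, "C": 0.04}
future = {"A": -0.10, "B": 0.06, "C": 0.05}  # the regime shifts

best_past = max(past, key=past.get)          # 'compressing' the past: all-in on A
all_in = future[best_past]
hedged = sum(future.values()) / len(future)  # equal-weight portfolio

print(f"all-in on {best_past}: {all_in:+.2%}")  # -10.00%
print(f"equal-weight hedge: {hedged:+.2%}")     # about +0.33%
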
link |
02:25:38.180
I mean, I guess in that same sense,
link |
02:25:40.460
the compression is analogous to what
link |
02:25:42.540
you talked about, which is local or robust generalization
link |
02:25:45.900
versus extreme generalization.
link |
02:25:47.820
It's much closer to that side of being able to generalize
link |
02:25:52.420
in the local sense.
link |
02:25:53.420
That's why as humans, when we are children, in our education,
link |
02:25:59.980
a lot of it is driven by play, driven by curiosity.
link |
02:26:04.180
We are not efficiently compressing things.
link |
02:26:07.900
We're actually exploring.
link |
02:26:09.620
We are retaining all kinds of things
link |
02:26:16.620
from our environment that seem to be completely useless.
link |
02:26:19.660
Because they might turn out to be eventually useful, right?
link |
02:26:24.380
And that's what cognition is really about.
link |
02:26:26.940
And what makes it antagonistic to compression
link |
02:26:29.300
is that it is about hedging for future uncertainty.
link |
02:26:33.980
And that's antagonistic to compression.
link |
02:26:35.860
Yes.
link |
02:26:36.580
Efficiently hedging.
link |
02:26:38.500
Cognition leverages compression as a tool
link |
02:26:41.660
to promote efficiency and simplicity in our models.
link |
02:26:47.420
It's like Einstein said, make it simpler, but not,
link |
02:26:52.260
however that quote goes, not too simple.
link |
02:26:54.940
So compression simplifies things,
link |
02:26:57.700
but you don't want to make it too simple.
link |
02:27:00.100
Yes.
link |
02:27:00.740
So a good model of the world is going
link |
02:27:03.100
to include all kinds of things that are completely useless,
link |
02:27:06.100
actually, just in case.
link |
02:27:08.500
Because you need diversity in the same way
link |
02:27:10.020
that you do in your portfolio.
link |
02:27:11.140
You need all kinds of stocks that may not
link |
02:27:13.340
have performed well so far, but you need diversity.
link |
02:27:15.580
And the reason you need diversity
link |
02:27:16.980
is because fundamentally you don't know what you're doing.
link |
02:27:19.660
And the same is true of the human mind,
link |
02:27:22.020
is that it needs to behave appropriately in the future.
link |
02:27:26.860
And it has no idea what the future is going to be like.
link |
02:27:29.860
But it's not going to be like the past.
link |
02:27:31.460
So compressing the past is not appropriate,
link |
02:27:33.620
because the past is not predictive of the future.
link |
02:27:40.500
Yeah, history repeats itself, but not perfectly.
link |
02:27:44.740
I don't think I asked you last time the most inappropriately
link |
02:27:48.980
absurd question.
link |
02:27:51.180
We've talked a lot about intelligence,
link |
02:27:54.420
but the bigger question beyond intelligence is that of meaning.
link |
02:28:00.860
Intelligent systems are kind of goal-oriented.
link |
02:28:02.980
They're always optimizing for a goal.
link |
02:28:05.380
If you look at the Hutter Prize, actually,
link |
02:28:07.620
I mean, there's always a clean formulation of a goal.
link |
02:28:10.860
But the natural question for us humans,
link |
02:28:14.220
since we don't know our objective function,
link |
02:28:16.020
is what is the meaning of it all?
link |
02:28:18.460
So the absurd question is, what, Francois,
link |
02:28:22.980
do you think is the meaning of life?
link |
02:28:25.660
What's the meaning of life?
link |
02:28:26.820
Yeah, that's a big question.
link |
02:28:28.180
And I think I can give you my answer, at least one
link |
02:28:33.220
of my answers.
link |
02:28:34.540
And so one thing that's very important in understanding who
link |
02:28:42.220
we are is that everything that makes up ourselves,
link |
02:28:48.380
that makes up who we are, even your most personal thoughts,
link |
02:28:53.740
is not actually your own.
link |
02:28:55.700
Even your most personal thoughts are expressed in words
link |
02:29:00.060
that you did not invent and are built on concepts and images
link |
02:29:04.940
that you did not invent.
link |
02:29:06.900
We are very much cultural beings.
link |
02:29:10.940
We are made of culture.
link |
02:29:12.860
What makes us different from animals, for instance?
link |
02:29:16.660
So everything about ourselves is an echo of the past.
link |
02:29:22.860
An echo of people who lived before us.
link |
02:29:29.900
That's who we are.
link |
02:29:31.420
And in the same way, if we manage
link |
02:29:35.300
to contribute something to the collective edifice of culture,
link |
02:29:41.780
a new idea, maybe a beautiful piece of music,
link |
02:29:44.580
a work of art, a grand theory, a new word, maybe,
link |
02:29:51.260
that something is going to become
link |
02:29:54.380
a part of the minds of future humans, essentially, forever.
link |
02:30:00.300
So everything we do creates ripples
link |
02:30:03.980
that propagate into the future.
link |
02:30:06.020
And in a way, this is our path to immortality,
link |
02:30:11.900
is that as we contribute things to culture,
link |
02:30:17.580
culture in turn becomes future humans.
link |
02:30:21.420
And we keep influencing people thousands of years from now.
link |
02:30:27.660
So our actions today create ripples.
link |
02:30:30.740
And these ripples, I think, basically
link |
02:30:35.140
sum up the meaning of life.
link |
02:30:37.620
In the same way that we are the sum
link |
02:30:42.540
of the interactions between many different ripples that
link |
02:30:45.500
came from our past, we are ourselves
link |
02:30:48.100
creating ripples that will propagate into the future.
link |
02:30:50.700
And that's why we should be, this
link |
02:30:53.460
seems like perhaps a naive thing to say,
link |
02:30:56.060
but we should be kind to others during our time on Earth
link |
02:31:02.060
because every act of kindness creates ripples.
link |
02:31:05.660
And in reverse, every act of violence also creates ripples.
link |
02:31:09.380
And you want to carefully choose which kind of ripples
link |
02:31:13.260
you want to create, and you want to propagate into the future.
link |
02:31:16.460
And in your case, first of all, beautifully put,
link |
02:31:19.020
but in your case, creating ripples
link |
02:31:21.380
into future humans and future AGI systems.
link |
02:31:27.780
Yes.
link |
02:31:28.500
It's fascinating.
link |
02:31:29.500
Our successors.
link |
02:31:30.420
I don't think there's a better way to end it,
link |
02:31:34.500
Francois, as always, for a second time.
link |
02:31:37.180
And I'm sure many times in the future,
link |
02:31:39.340
it's been a huge honor.
link |
02:31:40.820
You're one of the most brilliant people
link |
02:31:43.380
in the machine learning, computer science world.
link |
02:31:47.500
Again, it's a huge honor.
link |
02:31:48.700
Thanks for talking to me.
link |
02:31:49.460
It's been a pleasure.
link |
02:31:50.540
Thanks a lot for having me.
link |
02:31:51.980
We appreciate it.
link |
02:31:53.900
Thanks for listening to this conversation with Francois
link |
02:31:56.220
Chollet, and thank you to our sponsors, Babbel, Masterclass,
link |
02:32:00.340
and Cash App.
link |
02:32:01.660
Click the sponsor links in the description
link |
02:32:03.900
to get a discount and to support this podcast.
link |
02:32:06.820
If you enjoy this thing, subscribe on YouTube,
link |
02:32:09.060
review it with five stars on Apple Podcast,
link |
02:32:11.340
follow on Spotify, support on Patreon,
link |
02:32:14.060
or connect with me on Twitter at Lex Friedman.
link |
02:32:17.780
And now let me leave you with some words
link |
02:32:19.420
from René Descartes in 1668, an excerpt of which Francois
link |
02:32:24.380
includes in his On the Measure of Intelligence paper.
link |
02:32:27.780
If there were machines which bore a resemblance
link |
02:32:30.300
to our bodies and imitated our actions as closely as possible
link |
02:32:34.420
for all practical purposes, we should still
link |
02:32:36.980
have two very certain means of recognizing
link |
02:32:40.020
that they were not real men.
link |
02:32:42.220
The first is that they could never use words or put together
link |
02:32:45.300
signs, as we do in order to declare our thoughts to others.
link |
02:32:49.820
For we can certainly conceive of a machine so constructed
link |
02:32:53.420
that it utters words and even utters
link |
02:32:55.540
words that correspond to bodily actions causing
link |
02:32:57.940
a change in its organs.
link |
02:32:59.580
But it is not conceivable that such a machine should produce
link |
02:33:03.380
different arrangements of words so as
link |
02:33:05.420
to give an appropriately meaningful answer to whatever
link |
02:33:08.620
is said in its presence as the dullest of men can do.
link |
02:33:12.780
Here, Descartes is anticipating the Turing test,
link |
02:33:15.460
and the argument still continues to this day.
link |
02:33:18.780
Secondly, he continues, even though some machines might
link |
02:33:22.140
do some things as well as we do them, or perhaps even better,
link |
02:33:26.580
they would inevitably fail in others,
link |
02:33:29.100
which would reveal that they are acting not from understanding
link |
02:33:32.420
but only from the disposition of their organs.
link |
02:33:36.780
This is an incredible quote.
link |
02:33:39.860
Whereas reason is a universal instrument
link |
02:33:43.220
which can be used in all kinds of situations,
link |
02:33:46.580
these organs need some particular disposition for each particular action.
link |
02:33:49.060
Hence, it is for all practical purposes
link |
02:33:51.220
impossible for a machine to have enough different organs
link |
02:33:54.300
to make it act in all the contingencies of life
link |
02:33:57.780
in the way in which our reason makes us act.
link |
02:34:01.340
That's the debate between mimicry and memorization
link |
02:34:05.060
versus understanding.
link |
02:34:07.220
So thank you for listening and hope to see you next time.