
François Chollet: Measures of Intelligence | Lex Fridman Podcast #120



link |
00:00:00.000
The following is a conversation with Francois Chollet, his second time on the podcast.
link |
00:00:05.520
He's both a world class engineer and a philosopher in the realm of deep learning and artificial
link |
00:00:11.720
intelligence.
link |
00:00:13.320
This time, we talk a lot about his paper titled On the Measure of Intelligence, that discusses
link |
00:00:19.000
how we may define and measure general intelligence in our computing machinery.
link |
00:00:24.840
Quick summary of the sponsors Babel, Masterclass, and Cash App.
link |
00:00:29.640
Take the sponsor links in the description to get a discount and to support this podcast.
link |
00:00:34.560
As a side note, let me say that the serious, rigorous, scientific study of artificial general
link |
00:00:39.640
intelligence is a rare thing.
link |
00:00:42.280
The mainstream machine learning community works on very narrow AI with very narrow benchmarks.
link |
00:00:47.840
This is very good for incremental and sometimes big incremental progress.
link |
00:00:53.280
On the other hand, the outside the mainstream, renegade, you could say, AGI community, works
link |
00:00:59.920
on approaches that verge on the philosophical and even the literary without big public benchmarks.
link |
00:01:07.560
Walking the line between the two worlds is a rare breed, but it doesn't have to be.
link |
00:01:12.320
I ran the AGI series at MIT as an attempt to inspire more people to walk this line.
link |
00:01:17.840
DeepMind and OpenAI, for a time and still on occasion, walk this line.
link |
00:01:23.400
Francois Chollet does as well.
link |
00:01:25.840
I hope to also.
link |
00:01:27.720
It's a beautiful dream to work towards and to make real one day.
link |
00:01:31.600
If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcast, follow
link |
00:01:36.960
on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
link |
00:01:42.080
As usual, I'll do a few minutes of ads now and no ads in the middle.
link |
00:01:45.480
I try to make these interesting, but I give you time stamps so you can skip.
link |
00:01:50.840
But still, please do check out the sponsors by clicking the links in the description.
link |
00:01:54.720
It's the best way to support this podcast.
link |
00:01:58.040
This show is sponsored by Babel, an app and website that gets you speaking in a new language
link |
00:02:03.200
within weeks.
link |
00:02:04.200
Go to babel.com and use code LEX to get three months free.
link |
00:02:08.280
They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian.
link |
00:02:15.360
Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language
link |
00:02:21.000
experts.
link |
00:02:22.000
Let me read a few lines from the Russian poem Noch, Ulitsa, Fonar, Apteka by Alexander
link |
00:02:27.880
Blok that you'll start to understand if you sign up to Babel.
link |
00:02:32.720
Noch, ulitsa, fonar, apteka, bessmyslennyi i tusklyi svet. Zhivi yeshchyo khot chetvert veka, vsyo budet
link |
00:02:41.600
tak. Iskhoda net.
link |
00:02:44.800
Now I say that you'll start to understand this poem because Russian starts with a language
link |
00:02:51.360
and ends with a vodka.
link |
00:02:54.120
Now the latter part is definitely not endorsed or provided by Babel and will probably lose
link |
00:02:58.760
me this sponsorship, although it hasn't yet.
link |
00:03:02.600
But once you graduate with Babel, you can enroll in my advanced course of late night
link |
00:03:06.720
Russian conversation over vodka.
link |
00:03:09.320
No app for that yet.
link |
00:03:11.400
So get started by visiting babel.com and use code LEX to get 3 months free.
link |
00:03:18.280
This show is also sponsored by Masterclass.
link |
00:03:21.200
Sign up at masterclass.com slash lex to get a discount and to support this podcast.
link |
00:03:26.080
When I first heard about Masterclass, I thought it was too good to be true.
link |
00:03:30.000
I still think it's too good to be true.
link |
00:03:32.440
For $180 a year, you get an all access pass to watch courses from, to list some of my favorites,
link |
00:03:38.760
Chris Hadfield on Space Exploration, hope to have him on this podcast one day, Neil
link |
00:03:43.640
deGrasse Tyson on Scientific Thinking and Communication, Neil too, Will Wright, creator of SimCity
link |
00:03:49.440
and The Sims, on Game Design, Carlos Santana on Guitar, Garry Kasparov on Chess, Daniel Negreanu on
link |
00:03:55.440
Poker and many more.
link |
00:03:57.280
Chris Hadfield explaining how rockets work and the experience of being launched into
link |
00:04:01.240
space alone is worth the money.
link |
00:04:03.320
By the way, you can watch it on basically any device.
link |
00:04:06.640
Once again, sign up at masterclass.com slash lex to get a discount and to support this podcast.
link |
00:04:13.400
This show finally is presented by Cash App, the number one finance app in the App Store.
link |
00:04:18.800
When you get it, use code LEX Podcast, Cash App lets you send money to friends, buy bitcoin
link |
00:04:24.240
and invest in the stock market with as little as $1.
link |
00:04:27.360
Since Cash App allows you to send and receive money digitally, let me mention a surprising
link |
00:04:31.640
fact related to physical money.
link |
00:04:33.920
Of all the currency in the world, roughly 8% of it is actually physical money.
link |
00:04:39.360
The other 92% of the money only exists digitally and that's only going to increase.
link |
00:04:45.360
So again, if you get Cash App from the App Store, Google Play and use code LEX Podcast,
link |
00:04:50.760
you get $10 and Cash App will also donate $10 to FIRST, an organization that is helping
link |
00:04:55.720
to advance robotics and STEM education for young people around the world.
link |
00:05:00.600
And now, here's my conversation with François Chollet.
link |
00:05:05.200
What philosophers, thinkers or ideas had a big impact on you growing up and today?
link |
00:05:10.760
So one author that had a big impact on me when I read his books as a teenager was Jean
link |
00:05:17.840
Piaget, a Swiss psychologist who is considered to be the father of developmental psychology,
link |
00:05:25.400
and he has a large body of work about basically how intelligence develops in children.
link |
00:05:33.640
And so it's very old work, like most of it is from the 1930s, 1940s, so it's not quite
link |
00:05:39.760
up to date.
link |
00:05:40.760
It's actually superseded by many newer developments in developmental psychology.
link |
00:05:45.760
But to me, it was very, very interesting, very striking and actually shaped the early
link |
00:05:50.920
ways in which I started thinking about the mind and the development of intelligence as
link |
00:05:55.480
a teenager.
link |
00:05:56.480
His actual ideas or the way he thought about it or just the fact that you could think about
link |
00:06:00.120
the developing mind at all?
link |
00:06:01.520
I guess both.
link |
00:06:02.520
Jean Piaget is the author that's really introduced me to the notion that intelligence and the
link |
00:06:07.520
mind is something that you construct throughout your life and that the children construct
link |
00:06:14.280
it in stages.
link |
00:06:15.840
And I thought that was a very interesting idea, which is, of course, very relevant to AI,
link |
00:06:20.520
to building artificial minds.
link |
00:06:23.440
Another book that I read around the same time that had a big impact on me, and there was
link |
00:06:30.520
actually a little bit of overlap with Jean Piaget as well, and I read it around the same
link |
00:06:33.920
time, is Jeff Hawkins' On Intelligence, which is a classic.
link |
00:06:40.040
And he has this vision of the mind as a multiscale hierarchy of temporal prediction modules.
link |
00:06:48.040
And these ideas really resonated with me, the notion of a modular hierarchy of compression
link |
00:06:59.440
functions or prediction functions.
link |
00:07:01.680
I thought it was really, really interesting, and it shaped the way I started thinking about
link |
00:07:07.240
how to build minds.
link |
00:07:10.360
The hierarchical nature, which aspect?
link |
00:07:14.000
Also he's a neuroscientist.
link |
00:07:15.000
He was thinking, he was basically talking about how our mind works.
link |
00:07:20.360
Yeah, the notion that cognition is prediction was an idea that was kind of new to me at
link |
00:07:25.080
the time, and that I really loved at the time.
link |
00:07:28.160
And yeah, the notion that there are multiple scales of processing in the brain.
link |
00:07:35.400
The hierarchy.
link |
00:07:36.400
Yes.
link |
00:07:37.400
This is before deep learning.
link |
00:07:38.400
These ideas of hierarchies in AI have been around for a long time, even before On Intelligence.
link |
00:07:44.520
I mean, they've been around since the 1980s.
link |
00:07:49.120
And yeah, that was before deep learning.
link |
00:07:50.560
But of course, I think these ideas really found their practical implementation in deep
link |
00:07:56.760
learning.
link |
00:07:57.760
What about the memory side of things?
link |
00:07:59.680
I think he was talking about knowledge representation.
link |
00:08:02.240
Do you think about memory a lot?
link |
00:08:04.560
One way you can think of neural networks is as a kind of memory, you're memorizing things,
link |
00:08:10.880
but it doesn't seem to be the kind of memory that's in our brains, or it doesn't have the
link |
00:08:17.440
same rich complexity, long term nature that's in our brains.
link |
00:08:20.880
Yes.
link |
00:08:21.880
The brain is more of a sparse access memory so that you can actually retrieve very precisely
link |
00:08:27.680
like bits of your experience.
link |
00:08:30.280
The retrieval aspect, you can like introspect, you can ask yourself questions, I guess.
link |
00:08:36.560
You can program your own memory, and language is actually the tool you use to do that.
link |
00:08:41.680
I think language is a kind of operating system for the mind.
link |
00:08:46.480
And you use language, well, one of the uses of language is as a query that you run over
link |
00:08:52.920
your own memory, use words as keys to retrieve specific experiences or specific concepts,
link |
00:08:59.080
specific thoughts.
link |
00:09:00.080
Like language is a way you store thoughts, not just in writing, in the physical world,
link |
00:09:04.680
but also in your own mind.
link |
00:09:06.280
And it's also how you retrieve them, like imagine if you didn't have language, then
link |
00:09:10.200
you would have to, you would not really have a self internally triggered way of retrieving
link |
00:09:17.760
past thoughts.
link |
00:09:18.760
You would have to rely on external experiences.
link |
00:09:21.360
For instance, you see a specific sight, you smell a specific smell, and that brings up
link |
00:09:25.800
memories, but you would not really have a way to deliberately access these memories
link |
00:09:31.800
without language.
link |
00:09:32.800
Well, the interesting thing you mentioned is you can also program the memory.
link |
00:09:37.520
You can change it probably with language.
link |
00:09:39.840
Yeah, using language.
link |
00:09:41.320
Yes.
link |
00:09:42.320
Well, let me ask you a Chomsky question, which is like, first of all, do you think language
link |
00:09:46.760
is like fundamental, like there's turtles, what's at the bottom of the turtles?
link |
00:09:54.520
It can't be turtles all the way down. Is language at the bottom of cognition
link |
00:09:59.240
of everything? Is language the fundamental aspect of, like, what it means to be a thinking
link |
00:10:09.200
thing.
link |
00:10:10.200
No, I don't think so.
link |
00:10:12.040
I think language is...
link |
00:10:13.040
You disagree with Noam Chomsky?
link |
00:10:14.640
Yes.
link |
00:10:15.640
Language is a layer on top of cognition.
link |
00:10:18.040
So it is fundamental to cognition in the sense that to use a computing metaphor, I see language
link |
00:10:24.560
as the operating system of the brain, of the human mind.
link |
00:10:29.120
Yeah.
link |
00:10:30.120
And the operating system is a layer on top of the computer.
link |
00:10:33.280
The computer exists before the operating system, but the operating system is how you
link |
00:10:37.560
make it truly useful.
link |
00:10:39.680
And the operating system is most likely Windows, not Linux, because language is messy.
link |
00:10:45.840
Yeah, it's messy and it's pretty difficult to inspect it, introspect it.
link |
00:10:53.280
How do you think about language?
link |
00:10:56.280
We use actually sort of human interpretable language, but is there something deeper that's
link |
00:11:03.200
closer to like logical type of statements?
link |
00:11:08.880
Yeah, what is the nature of language, do you think?
link |
00:11:16.320
Is there something deeper than like the syntactic rules we construct?
link |
00:11:19.200
Is there something that doesn't require utterances or writing or so on?
link |
00:11:25.640
Now you're asking about the possibility that there could exist languages for thinking that
link |
00:11:31.000
are not made of words?
link |
00:11:32.920
Yeah, I think so.
link |
00:11:35.480
So the mind is layers, right?
link |
00:11:38.680
And language is almost like the uttermost, the uppermost layer.
link |
00:11:44.760
But before we think in words, I think we think in terms of emotion in space and we think
link |
00:11:51.560
in terms of physical actions.
link |
00:11:53.520
And I think babies in particular probably express their thoughts in terms of the actions
link |
00:12:00.920
that they've seen or that they can perform and in terms of the motions of objects in
link |
00:12:07.320
the environment before they start thinking in terms of words.
link |
00:12:10.560
It's amazing to think about that as the building blocks of language, so like the kind of actions
link |
00:12:18.400
and ways the babies see the world as like more fundamental than the beautiful Shakespearean
link |
00:12:25.800
language you construct on top of it.
link |
00:12:28.840
And we probably don't have any idea what that looks like, right?
link |
00:12:33.320
Because it's important when trying to engineer it into AI systems.
link |
00:12:38.480
I think visual analogies and motion is a fundamental building block of the mind and you actually
link |
00:12:46.840
see it reflected in language, like language is full of spatial metaphors.
link |
00:12:51.960
And when you think about things, I consider myself very much as a visual thinker.
link |
00:12:57.840
You often express thoughts by doing things like visualizing concepts in space, or
link |
00:13:08.280
like you solve problems by imagining yourself, navigating a concept space.
link |
00:13:15.040
I don't know if you have this sort of experience.
link |
00:13:18.080
You said visualizing concept space.
link |
00:13:20.960
So I certainly think about, I certainly visualize mathematical concepts, but you mean like in
link |
00:13:30.040
concept space, visually, you're embedding ideas into a three dimensional space you can explore
link |
00:13:37.360
with your mind essentially.
link |
00:13:38.360
It would be more like 2D, but yeah.
link |
00:13:40.600
2D?
link |
00:13:41.600
Yeah.
link |
00:13:42.600
You're a flatlander.
link |
00:13:43.600
Okay.
link |
00:13:45.600
No, I do not.
link |
00:13:49.640
I always have to, before I jump from concept to concept, I have to put it back down.
link |
00:13:57.120
It has to be on paper.
link |
00:13:58.120
I can only travel on 2D paper, not inside my mind.
link |
00:14:03.480
You're able to move inside your mind.
link |
00:14:05.080
And even if you're writing like a paper, for instance, don't you have like a spatial representation
link |
00:14:10.040
of your paper?
link |
00:14:12.160
Like you visualize where ideas lie topologically in relationship to other ideas, kind of like
link |
00:14:19.360
a sort of map of the ideas in your paper.
link |
00:14:22.560
Yeah.
link |
00:14:23.560
That's true.
link |
00:14:24.560
I mean, there is, in papers, I don't know about you, but there feels like there's a destination.
link |
00:14:33.240
There's a key idea that you want to get to, and a lot of it is in the fog, and you're
link |
00:14:39.560
trying to kind of...
link |
00:14:40.920
It's almost like...
link |
00:14:45.520
What's that called when you do a path planning search from both directions from the start
link |
00:14:50.440
and from the end?
link |
00:14:52.840
And then you find, you do like shortest path, but like, in game playing, you do this with
link |
00:14:58.080
like A star from both sides.
link |
00:15:00.720
And you see where they join.
link |
00:15:03.240
Yeah.
link |
00:15:04.240
So you kind of do, at least for me, I think like, first of all, just exploring from the
link |
00:15:08.160
start from like, first principles, what do I know?
link |
00:15:12.520
What can I start proving from that, right?
link |
00:15:15.760
And then from the destination, I do the backtracking, like, if I want to show some
link |
00:15:23.840
kind of sets of ideas, what would it take to show them and kind of backtrack?
link |
00:15:27.880
But like, yeah, I don't think I'm doing all that in my mind though, like putting it down
link |
00:15:32.160
on paper.
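To make the bidirectional search idea above concrete, here is a minimal Python sketch of a bidirectional breadth-first search over an undirected graph of ideas, meeting in the middle as described; the graph, node names, and function name are purely illustrative and not something discussed on the show.

    from collections import deque

    def bidirectional_search(graph, start, goal):
        # Search from the start ("first principles") and from the goal (the key
        # idea) at the same time, stopping where the two frontiers meet.
        # `graph` is an undirected adjacency map: node -> iterable of neighbors.
        if start == goal:
            return [start]
        parents_fwd, parents_bwd = {start: None}, {goal: None}
        frontier_fwd, frontier_bwd = deque([start]), deque([goal])

        def expand(frontier, parents, other_parents):
            node = frontier.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in parents:
                    parents[nxt] = node
                    frontier.append(nxt)
                    if nxt in other_parents:  # the frontiers have joined
                        return nxt
            return None

        while frontier_fwd and frontier_bwd:
            meet = expand(frontier_fwd, parents_fwd, parents_bwd) or \
                   expand(frontier_bwd, parents_bwd, parents_fwd)
            if meet:
                # Stitch the two half-paths together: start -> meet -> goal.
                path, node = [], meet
                while node is not None:
                    path.append(node)
                    node = parents_fwd[node]
                path.reverse()
                node = parents_bwd[meet]
                while node is not None:
                    path.append(node)
                    node = parents_bwd[node]
                return path
        return None

    # e.g. bidirectional_search({"A": ["B"], "B": ["A", "C"], "C": ["B"]}, "A", "C")
    # returns ["A", "B", "C"]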
link |
00:15:33.160
Do you use mind maps to organize your ideas?
link |
00:15:35.360
No.
link |
00:15:36.360
Yeah.
link |
00:15:37.360
Yeah.
link |
00:15:38.360
Let's get into this because it's, I've been so jealous of people, I haven't really tried
link |
00:15:42.040
it.
link |
00:15:43.040
I've been jealous of people that seem to like, they get like this fire of passion in their
link |
00:15:47.520
eyes because everything starts making sense.
link |
00:15:50.080
It's like Tom Cruise in the movie, like, moving stuff around. Some of the most brilliant
link |
00:15:54.520
people I know use mind maps.
link |
00:15:55.880
I haven't tried really.
link |
00:15:57.240
Can you explain what the hell a mind map is?
link |
00:16:00.880
I guess a mind map is a way to take kind of like the mess inside your mind and just
link |
00:16:07.240
put it on paper so that you gain more control over it.
link |
00:16:10.160
It's the way to organize things on paper and as kind of like a consequence of organizing
link |
00:16:17.160
things on paper, it starts being more organized inside your mind.
link |
00:16:20.280
So what does that look like?
link |
00:16:21.600
You put, like, do you have an example, like, what do you, what's the first thing you write
link |
00:16:26.360
on paper?
link |
00:16:27.360
What's the second thing you write?
link |
00:16:28.760
I mean, typically, you draw a mind map to organize the way you think about the topic.
link |
00:16:34.720
So you would start by writing down like the key concept about that topic, like you would
link |
00:16:39.880
write intelligence or something.
link |
00:16:42.320
And then you would start adding associative connections.
link |
00:16:45.680
Like what do you think about when you think about intelligence?
link |
00:16:48.120
What do you think are the key elements of intelligence?
link |
00:16:50.480
So maybe you would have language, for instance, and you would have motion.
link |
00:16:53.440
And so you would start drawing nodes with these things.
link |
00:16:55.520
And then you would see what do you think about when you think about motion and so on.
link |
00:16:59.160
And you would go like that, like a tree.
link |
00:17:00.960
Is it a tree, or mostly a tree, or is it a graph, too?
link |
00:17:05.720
Oh, it's more of a graph than a tree.
link |
00:17:09.840
And it's not limited to just writing down words.
link |
00:17:13.280
You can also draw things.
link |
00:17:16.200
And it's not supposed to be purely hierarchical, right?
link |
00:17:19.720
Like you can, the point is that you can start, once you start writing it down, you can start
link |
00:17:24.960
reorganizing it so that it makes more sense, so that it's connected in a more effective
link |
00:17:29.520
way.
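For what it's worth, the kind of structure being described here, a central concept plus free-form associative connections that you can keep reorganizing, maps naturally onto a plain graph rather than a tree. Below is a minimal, purely illustrative Python sketch; the class and the example concepts (intelligence, language, motion) are just the ones mentioned in the conversation, nothing more.

    from collections import defaultdict

    class MindMap:
        # Nodes are concepts; edges are associative connections.
        # Unlike a tree, there is no root and no enforced hierarchy.
        def __init__(self):
            self.edges = defaultdict(set)

        def connect(self, a, b):
            # Associations are symmetric, so store the edge both ways.
            self.edges[a].add(b)
            self.edges[b].add(a)

        def neighbors(self, concept):
            return sorted(self.edges[concept])

    mm = MindMap()
    mm.connect("intelligence", "language")
    mm.connect("intelligence", "motion")
    mm.connect("language", "motion")  # cross-links are fine; it need not stay a tree
    print(mm.neighbors("intelligence"))  # ['language', 'motion']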
link |
00:17:30.520
See, but I'm so OCD. Like, you just mentioned intelligence and language and motion.
link |
00:17:37.000
I'll start becoming paranoid that the categorization isn't perfect.
link |
00:17:42.120
Like, I would become paralyzed with the mind map, that this may not be right. So,
link |
00:17:50.080
even though you're just doing associative kinds of connections, there's an implied hierarchy
link |
00:17:56.960
that's emerging.
link |
00:17:58.600
And I would start becoming paranoid that it's not the proper hierarchy.
link |
00:18:01.880
So one way to see mind maps is you're just putting thoughts on paper.
link |
00:18:07.200
It's like a stream of consciousness.
link |
00:18:10.680
But then you can also start getting paranoid.
link |
00:18:12.400
Well, whether it's the right hierarchy, sure, but it's a mind map, it's your mind
link |
00:18:17.520
map, you're free to draw anything you want, you're free to draw any connection you want.
link |
00:18:20.920
And you can just make a different mind map if you think the central node is not the right
link |
00:18:25.400
node.
link |
00:18:26.400
Yeah, I suppose there's a fear of being wrong.
link |
00:18:30.000
If you want to organize your ideas by writing down what you think, which I think is very
link |
00:18:36.840
effective.
link |
00:18:37.840
Like, how do you know what you think about something if you don't write it down, right?
link |
00:18:43.000
If you do that, the thing is that it imposes much more syntactic structure over your ideas,
link |
00:18:50.040
which is not required with a mind map.
link |
00:18:51.600
So a mind map is kind of like a lower level, more freehand way of organizing your thoughts.
link |
00:18:58.000
And once you've drawn it, then you can start actually voicing your thoughts in terms of,
link |
00:19:04.360
you know, paragraphs.
link |
00:19:05.360
It's a two dimensional aspect of layout too, right?
link |
00:19:08.400
Yeah.
link |
00:19:09.400
And it's a kind of flower, I guess, you start, there's usually, you want to start with a
link |
00:19:14.240
central concept?
link |
00:19:15.240
Yes.
link |
00:19:16.240
And you move out.
link |
00:19:17.240
Typically, it ends up more like a subway map, so it ends up more like a graph, a topological
link |
00:19:21.240
graph.
link |
00:19:22.240
Without a root node.
link |
00:19:23.240
Yeah, so like in a subway map, there are some nodes that are more connected than others
link |
00:19:27.160
and there are some nodes that are more important than others, right?
link |
00:19:30.360
So there are destinations, but it's not going to be purely like a tree, for instance.
link |
00:19:36.160
Yeah, it's fascinating to think whether there's something to that about the way
link |
00:19:41.080
our mind thinks.
link |
00:19:42.080
By the way, I just kind of remembered an obvious thing, that I have probably thousands of documents
link |
00:19:48.960
in Google Docs at this point that are bullet point lists, which is, you can probably map
link |
00:19:56.400
a mind map to a bullet point list.
link |
00:20:01.560
It's the same.
link |
00:20:02.560
It's a, no, it's not.
link |
00:20:03.560
It's a tree.
link |
00:20:04.560
It's a tree.
link |
00:20:05.560
Yeah.
link |
00:20:06.560
So I create trees, but also they don't have the visual element.
link |
00:20:10.880
Like I guess I'm comfortable with the structure.
link |
00:20:13.520
It feels like the narrowness, the constraints feel more comforting.
link |
00:20:18.320
If you have thousands of documents with your own thoughts in Google Docs, why don't you
link |
00:20:23.800
write some kind of search engine, like maybe a mind map, a piece of software, a mind mapping
link |
00:20:31.360
software where you write down a concept and then it gives you sentences or paragraphs
link |
00:20:37.440
from your thousands of Google Docs documents that match this concept.
link |
00:20:41.200
The problem is it's so deeply unlike mind maps, it's so deeply rooted in natural language.
link |
00:20:48.600
So it's not, it's not semantically searchable, I would say, because the categories are very,
link |
00:20:57.280
you kind of mention intelligence, language and motion, they're very strong, semantic,
link |
00:21:02.640
like it feels like the mind map forces you to be semantically clear and specific.
link |
00:21:09.800
The bullet point lists I have are sparse, disparate thoughts that poetically represent
link |
00:21:20.240
a category like motion as opposed to saying motion.
link |
00:21:25.360
So unfortunately, that's the same problem with the internet, that's why the idea of
link |
00:21:29.800
semantic web is difficult to get.
link |
00:21:32.480
Most language on the internet is a giant mess of natural language that's hard to interpret.
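As a rough illustration of the search-engine idea suggested a moment ago, here is a toy Python sketch that scores paragraphs against a query concept using bag-of-words cosine similarity. Everything in it is hypothetical (the notes, the function names), and, as the exchange above points out, this kind of lexical matching would miss thoughts that only poetically represent a concept; a real version would need sentence embeddings or some other semantic model.

    import re
    from collections import Counter
    from math import sqrt

    def bag_of_words(text):
        return Counter(re.findall(r"[a-z']+", text.lower()))

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def search(paragraphs, concept, top_k=3):
        query = bag_of_words(concept)
        scored = sorted(((cosine(bag_of_words(p), query), p) for p in paragraphs), reverse=True)
        return [p for score, p in scored[:top_k] if score > 0]

    notes = ["Intelligence is the efficiency of skill acquisition.",
             "Bodies in motion tend to stay in motion."]
    print(search(notes, "motion"))  # only the second note matches lexically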
link |
00:21:42.600
So do you think there's something to mind maps as, you actually originally brought it
link |
00:21:47.320
up as when we're talking about kind of cognition and language, do you think there's something
link |
00:21:54.360
to mind maps about how our brain actually deals with, like, thinks, reasons about things?
link |
00:22:01.840
It's possible.
link |
00:22:02.840
I think it's reasonable to assume that there is some level of topological processing in
link |
00:22:10.080
the brain, that the brain is very associative in nature.
link |
00:22:15.480
And I also believe that a topological space is a better medium to encode thoughts than
link |
00:22:25.520
a geometric space.
link |
00:22:27.640
So I think...
link |
00:22:28.640
What's the difference between a topological and a geometric space?
link |
00:22:31.240
Well, if you're talking about topologies, then points are either connected or not.
link |
00:22:36.280
So the topology is more like a subway map.
link |
00:22:39.000
And geometry is when you're interested in the distance between things.
link |
00:22:44.000
In subway maps, you don't really have the concept of distance, you only have the concept
link |
00:22:47.280
of whether there is a train going from station A to station B.
link |
00:22:53.120
And what we do in deep learning is that we're actually dealing with geometric spaces, we
link |
00:22:57.800
are dealing with concept vectors, word vectors, that have a distance defined between them, a dot
link |
00:23:03.800
product.
link |
00:23:06.680
We are not really building topological models, usually.
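To make the contrast concrete, here is a small, purely illustrative Python sketch: in the geometric view, concepts are vectors and what matters is a distance or dot product between them; in the topological, subway-map view, the only question is whether two nodes are connected at all. The toy vectors and the tiny graph are made up for illustration.

    import numpy as np

    # Geometric view: concepts as vectors with distances / dot products.
    word_vectors = {
        "train":   np.array([0.9, 0.1, 0.0]),
        "station": np.array([0.8, 0.2, 0.1]),
        "poetry":  np.array([0.0, 0.1, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(word_vectors["train"], word_vectors["station"]))  # close to 1
    print(cosine(word_vectors["train"], word_vectors["poetry"]))   # close to 0

    # Topological view: a subway map only says which stations are linked,
    # with no notion of how far apart they are.
    subway = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    print("B" in subway["A"])  # True: directly connected
    print("C" in subway["A"])  # False: not directly connected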
link |
00:23:10.720
I think you're absolutely right.
link |
00:23:12.480
Distance is of fundamental importance in deep learning.
link |
00:23:16.480
I mean, it's the continuous aspect of it.
link |
00:23:20.120
Because everything is a vector, and everything has to be a vector because everything has
link |
00:23:23.120
to be differentiable.
link |
00:23:24.120
If your space is discrete, it's no longer differentiable, you cannot do deep learning
link |
00:23:27.760
in it anymore.
link |
00:23:28.760
Well, you could, but you could only do it by embedding it in a bigger, continuous space.
link |
00:23:35.720
So if you do topology in the context of deep learning, you have to do it by embedding your
link |
00:23:40.520
topology in a geometry.
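A minimal sketch of what that embedding looks like in practice, using Keras: discrete symbols such as word ids are mapped to points in a continuous vector space by an embedding layer, so that everything downstream stays differentiable. The dimensions and the example ids below are arbitrary, chosen just for illustration.

    import tensorflow as tf

    # Discrete ids are not differentiable, so each one is assigned a learnable
    # point in a continuous vector space; gradient descent then shapes the
    # geometry of that space.
    vocab_size, embed_dim = 1000, 64
    embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

    token_ids = tf.constant([[3, 17, 42]])   # a discrete sequence of symbol ids
    vectors = embedding(token_ids)           # continuous tensor of shape (1, 3, 64)
    print(vectors.shape)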
link |
00:23:42.320
Well, let me zoom out for a second.
link |
00:23:46.360
Let's get into your paper, On the Measure of Intelligence. Did you put it out in 2019?
link |
00:23:52.520
Yes.
link |
00:23:53.520
Okay.
link |
00:23:54.520
November.
link |
00:23:55.520
November.
link |
00:23:56.520
Yeah.
link |
00:23:57.520
Remember 2019?
link |
00:23:58.520
That was a different time.
link |
00:24:01.080
Yeah.
link |
00:24:02.080
I remember.
link |
00:24:03.080
I still remember.
link |
00:24:06.640
It feels like a different world.
link |
00:24:09.680
You could travel, or you could actually go outside and see friends.
link |
00:24:14.560
Yeah.
link |
00:24:15.560
Let me ask the most absurd question.
link |
00:24:18.960
I think there's some nonzero probability there'll be a textbook one day, like 200 years from
link |
00:24:24.320
now on artificial intelligence, or it'll be called like just intelligence because humans
link |
00:24:31.080
will already be gone, and it'll be your picture with a quote.
link |
00:24:34.680
This is, you know, one of the early biological systems that considered the nature of intelligence,
link |
00:24:41.840
and there'll be like a definition of how they thought about intelligence, which is one of
link |
00:24:45.680
the things you do in your paper on the measure of intelligence is to ask, like, well, what is
link |
00:24:51.800
intelligence and how to test for intelligence and so on.
link |
00:24:55.680
So is there a spiffy quote about what is intelligence?
link |
00:25:02.040
What is the definition of intelligence, according to François Chollet?
link |
00:25:05.760
Yeah.
link |
00:25:06.760
So do you think the superintelligent AIs of the future will want to remember us?
link |
00:25:14.000
The way we remember humans from the past, and do you think they won't be ashamed of
link |
00:25:19.480
having a biological origin?
link |
00:25:21.480
No, I think it'll be a niche topic.
link |
00:25:24.760
It won't be that interesting, but it'll be like the people that study in certain contexts
link |
00:25:30.960
like historical civilizations that no longer exist, the Aztecs and so on.
link |
00:25:36.680
That's how it'll be seen, and it'll be studied also in the context of social media.
link |
00:25:42.400
There will be hashtags about the atrocities committed against human beings when the robots
link |
00:25:51.280
finally got rid of them.
link |
00:25:54.400
It was a mistake.
link |
00:25:55.400
It'll be seen as a giant mistake, but ultimately in the name of progress, and it created a
link |
00:26:00.880
better world because humans were over consuming the resources and they were not very rational
link |
00:26:07.120
and were destructive in the end, in terms of productivity, and putting more love in
link |
00:26:12.560
the world.
link |
00:26:13.880
And so within that context, there'll be a chapter about these biological systems.
link |
00:26:17.320
You seem to have a very detailed vision of that future.
link |
00:26:20.360
You should write a sci fi novel about it.
link |
00:26:22.800
I'm working on a sci fi novel currently, yes.
link |
00:26:27.080
Yeah, so.
link |
00:26:28.080
Self published, yeah.
link |
00:26:29.440
The definition of intelligence, so intelligence is the efficiency with which you acquire new
link |
00:26:37.560
skills at tasks that you did not previously know about, that you did not prepare for,
link |
00:26:43.800
right?
link |
00:26:44.800
So intelligence is not skill itself.
link |
00:26:47.920
It's not what you know, it's not what you can do.
link |
00:26:50.880
It's how well and how efficiently you can learn new things.
link |
00:26:54.760
New things.
link |
00:26:55.760
Yes.
link |
00:26:56.760
The idea of newness there seems to be fundamentally important.
link |
00:27:00.360
Yes.
link |
00:27:01.600
So you would see intelligence on display, for instance, whenever you see a human being
link |
00:27:08.240
or an AI creature adapt to a new environment that it has not seen before, that its creators
link |
00:27:14.800
did not anticipate.
link |
00:27:16.920
When you see adaptation, when you see improvisation, when you see generalization, that's intelligence.
link |
00:27:22.560
In reverse, if you have a system that when you put it in a slightly new environment,
link |
00:27:27.200
it cannot adapt, it cannot improvise, it cannot deviate from what it's hardcoded to do or
link |
00:27:34.920
what it has been trained to do, that is a system that is not intelligent.
link |
00:27:40.560
There's actually a quote from Einstein that captures this idea, which is, the measure
link |
00:27:47.200
of intelligence is the ability to change.
link |
00:27:50.800
I like that quote.
link |
00:27:51.800
I think it captures at least part of this idea.
link |
00:27:54.520
You know, there might be something interesting about the difference between your definition
link |
00:27:58.520
of Einstein's.
link |
00:27:59.520
I mean, he's just being Einstein and clever, but acquisition of new ability to deal with
link |
00:28:12.040
new things versus ability to just change.
link |
00:28:17.520
What's the difference between those two things?
link |
00:28:20.080
To just change in itself.
link |
00:28:22.400
Do you think there's something to that?
link |
00:28:24.640
Just being able to change.
link |
00:28:26.520
Yes, being able to adapt.
link |
00:28:28.600
So not change, but certainly a change in direction.
link |
00:28:33.360
Being able to adapt yourself to your environment.
link |
00:28:37.760
Whatever the environment is.
link |
00:28:38.760
That's a big part of intelligence, yes.
link |
00:28:41.720
Intelligence is most precisely how efficiently you're able to adapt, how efficiently you're
link |
00:28:46.520
able to basically master your environment, how efficiently you can acquire new skills.
link |
00:28:52.760
And I think there's a big distinction to be drawn between intelligence, which is a process
link |
00:28:59.520
and the output of that process, which is skill.
link |
00:29:04.920
So for instance, if you have a very smart human programmer that considers the game of
link |
00:29:09.960
chess and that writes down a static program that can play chess.
link |
00:29:16.480
Then the intelligence is the process of developing that program.
link |
00:29:20.680
But the program itself is just encoding the output artifact of that process.
link |
00:29:28.080
The program itself is not intelligent.
link |
00:29:30.120
And the way you tell it's not intelligent is that if you put it in a different context,
link |
00:29:34.000
you ask it to play Go or something, it's not going to be able to perform well without
link |
00:29:38.000
human involvement because the source of intelligence, the entity that is capable of that process
link |
00:29:43.080
is the human programmer.
link |
00:29:44.440
So we should be able to tell a difference between the process and its output.
link |
00:29:50.120
We should not confuse the output and the process.
link |
00:29:53.320
It's the same as do not confuse a road building company and one specific road.
link |
00:30:00.320
Because one specific road takes you from point A to point B. But a road building company
link |
00:30:04.840
can take you from, can make a path from anywhere to anywhere else.
link |
00:30:08.800
Yeah, that's beautifully put, but it's also to play devil's advocate a little bit.
link |
00:30:17.120
It's possible that there is something more fundamental than us humans.
link |
00:30:21.360
So you said the programmer creates, the difference between the acquiring of the skill and the skill
link |
00:30:30.040
itself.
link |
00:30:31.040
There could be something like you could argue the universe is more intelligent, like the
link |
00:30:37.440
deep, the base intelligence that we should be trying to measure is something that created
link |
00:30:44.520
humans. We should be measuring God or the source of the universe. As opposed to, like, there could
link |
00:30:53.560
be a deeper intelligence, there's always deeper intelligence.
link |
00:30:57.120
You can argue that, but that does not take anything away from the fact that humans are
link |
00:31:01.120
intelligent and you can tell that because they are capable of adaptation and generality.
link |
00:31:08.000
And you see that in particular in the fact that humans are capable of handling situations
link |
00:31:15.920
and tasks that are quite different from anything that any of our evolutionary ancestors has
link |
00:31:23.000
ever encountered.
link |
00:31:24.000
So we are capable of generalizing very much out of distribution if you consider our evolutionary
link |
00:31:29.760
history as being, in a way, our training data.
link |
00:31:32.480
Of course evolutionary biologists would argue that we're not going too far out of the distribution.
link |
00:31:37.800
We're like mapping the skills we've learned previously, desperately trying to like jam
link |
00:31:43.240
them into like these new situations.
link |
00:31:46.120
I mean, there's definitely a little bit of that, but it's pretty clear to me that we're
link |
00:31:51.320
able to, you know, most of the things we do any given day in our modern civilization are
link |
00:31:58.240
things that are very, very different from what our ancestors a million years ago would
link |
00:32:04.040
have been doing in a given day.
link |
00:32:06.080
And our environment is very different.
link |
00:32:07.680
So I agree that everything we do, we do it with cognitive building blocks that we acquire
link |
00:32:15.840
over the course of evolution, and that anchors our cognition to certain contexts, which is
link |
00:32:22.840
the human condition very much.
link |
00:32:25.400
But still, our mind is capable of a pretty remarkable degree of generality far beyond
link |
00:32:31.200
anything we can create in artificial systems today, like the degree in which the mind can
link |
00:32:37.080
generalize from its evolutionary history, can generalize away from its evolutionary history
link |
00:32:44.000
is much greater than the degree to which a deep learning system today can generalize
link |
00:32:49.440
away from its training data.
link |
00:32:51.240
And the key point you're making, which I think is quite beautiful, is we shouldn't measure,
link |
00:32:57.120
if we talk about measurement, we shouldn't measure the skill.
link |
00:33:01.720
We should measure the creation of the new skill, the ability to create that new skill.
link |
00:33:06.880
But it's tempting.
link |
00:33:10.560
It's weird because the skill is a little bit of a small window into the system.
link |
00:33:16.520
So whenever you have a lot of skills, it's tempting to measure the skills.
link |
00:33:21.280
Yes.
link |
00:33:22.280
I mean, the skill is the only thing you can objectively measure.
link |
00:33:26.240
But yeah, so the thing to keep in mind is that when you see skill in the human, it gives
link |
00:33:36.680
you a strong signal that this human is intelligent, because you know they weren't born with that
link |
00:33:41.360
skill typically.
link |
00:33:42.360
Like, you see a very strong chess player, maybe you're a very strong chess player yourself.
link |
00:33:47.520
I think you're saying that because I'm Russian and now you're prejudiced, you assume all
link |
00:33:54.000
of us play chess to some degree.
link |
00:33:55.000
I'm biased.
link |
00:33:56.000
Exactly.
link |
00:33:57.000
Well, you're dead.
link |
00:33:58.000
Bias.
link |
00:33:59.000
So if you see a very strong chess player, you know they weren't born knowing how to
link |
00:34:04.760
play chess.
link |
00:34:05.760
So they had to acquire that skill with their limited resources, with their limited lifetime.
link |
00:34:11.280
And you know they did that because they are generally intelligent.
link |
00:34:15.520
And so they may as well have acquired any other skill.
link |
00:34:19.080
You know they have this potential.
link |
00:34:21.400
And on the other hand, if you see a computer playing chess, you cannot make the same assumptions
link |
00:34:27.840
because you cannot just assume the computer is generally intelligent.
link |
00:34:30.920
The computer may be born knowing how to play chess in the sense that it may have been programmed
link |
00:34:37.320
by a human that has understood chess for the computer and that has just encoded the output
link |
00:34:44.080
of that understanding in a static program.
link |
00:34:46.120
And that program is not intelligent.
link |
00:34:49.480
So let's zoom out just for a second and say like, what is the goal of the on the measure
link |
00:34:55.680
of intelligence paper?
link |
00:34:57.120
Like, what do you hope to achieve with it?
link |
00:34:59.160
So the goal of the paper is to clear up some longstanding misunderstandings about the way
link |
00:35:05.240
we've been conceptualizing intelligence in the AI community and in the way we've been
link |
00:35:12.440
evaluating progress in AI.
link |
00:35:16.880
There's been a lot of progress recently in machine learning and people are extrapolating
link |
00:35:21.200
from that progress that we are about to solve general intelligence.
link |
00:35:26.680
And if you want to be able to evaluate these statements, you need to precisely define what
link |
00:35:32.800
you're talking about when you're talking about general intelligence.
link |
00:35:35.680
And you need a formal way, a reliable way to measure how much intelligence, how much
link |
00:35:43.200
general intelligence a system possesses.
link |
00:35:46.600
And ideally this measure of intelligence should be actionable.
link |
00:35:50.840
So it should not just describe what intelligence is, it should not just be a binary indicator
link |
00:35:57.200
that tells you the system is intelligent or it isn't.
link |
00:36:02.200
It should be actionable, it should have explanatory power, right?
link |
00:36:06.280
So you could use it as a feedback signal.
link |
00:36:09.120
It would show you the way towards building more intelligent systems.
link |
00:36:13.640
So at the first level, you draw a distinction between two divergent views of intelligence.
link |
00:36:22.400
As we just talked about: intelligence as a collection of task specific skills versus a general
link |
00:36:29.080
learning ability.
link |
00:36:30.440
So what's the difference between this memorization of skills and a general learning ability?
link |
00:36:38.360
We've talked about it a little bit, but can you try to linger on this topic for a bit?
link |
00:36:43.080
Yes, so the first part of the paper is an assessment of the different ways we've been
link |
00:36:49.280
thinking about intelligence and the different ways we've been evaluating progress in AI.
link |
00:36:54.800
And the history of cognitive sciences has been shaped by two views of the human mind.
link |
00:37:01.400
And one view is the evolutionary psychology view in which the mind is a collection of
link |
00:37:09.120
fairly static, special purpose ad hoc mechanisms that have been hard coded by evolution over
link |
00:37:17.800
our history as a species over a very long time.
link |
00:37:22.880
And early AI researchers, people like Marvin Minsky, for instance, they clearly subscribed
link |
00:37:31.920
to this view.
link |
00:37:33.720
And they saw the mind as a kind of collection of static programs similar to the programs
link |
00:37:41.120
they would run on like mainframe computers.
link |
00:37:43.720
And in fact, I think they very much understood the mind through the metaphor of the mainframe
link |
00:37:49.880
computer because that was the tool they were working with.
link |
00:37:53.720
And so you had these static programs, this collection of very different static programs
link |
00:37:57.120
operating over a database like memory.
link |
00:38:00.200
And in this picture, learning was not very important.
link |
00:38:03.840
Learning was considered to be just memorization.
link |
00:38:05.760
And in fact, learning is basically not featured in AI textbooks until the 1980s with the rise
link |
00:38:15.480
of machine learning.
link |
00:38:16.480
It's kind of fun to think about that learning was the outcast, like the weird people working
link |
00:38:23.400
on learning.
link |
00:38:24.400
Like the mainstream AI world was, I mean, I don't know what the best term is, but it's
link |
00:38:32.000
non learning.
link |
00:38:34.320
It was seen as like reasoning would not be learning based.
link |
00:38:37.960
Yes, it was considered that the mind was a collection of programs that were primarily
link |
00:38:45.440
logical in nature.
link |
00:38:46.760
And that's all you needed to do to create a mind was to write down these programs.
link |
00:38:51.000
And they would operate over knowledge, which would be stored in some kind of database.
link |
00:38:55.280
And as long as your database would encompass everything about the world and your logical
link |
00:39:00.280
rules were comprehensive, then you would have a mind.
link |
00:39:04.560
So the other view of the mind is the brain as sort of blank slate.
link |
00:39:10.880
Right?
link |
00:39:11.880
This is a very old idea.
link |
00:39:13.240
You find it in John Locke's writings.
link |
00:39:16.200
This is the tabula rasa.
link |
00:39:19.560
And this is this idea that the mind is some kind of like information sponge that starts
link |
00:39:24.800
empty, that starts blank, and that absorbs knowledge and skills from experience, right?
link |
00:39:35.320
So it's a sponge that reflects the complexity of the world, the complexity of your life
link |
00:39:41.280
experience, essentially, that everything you know and everything you can do is a reflection
link |
00:39:47.080
of something you found in the outside world, essentially.
link |
00:39:50.520
So this is an idea that's very old, that was not very popular, for instance, in the 1970s.
link |
00:39:58.320
But that gained a lot of vitality recently with the rise of connectionism in particular
link |
00:40:02.720
deep learning.
link |
00:40:04.360
And so today, deep learning is the dominant paradigm in AI.
link |
00:40:08.560
And I feel like lots of AI researchers are conceptualizing the mind via a deep learning
link |
00:40:15.920
metaphor, like they see the mind as a kind of randomly initialized neural network that
link |
00:40:22.120
starts blank when you're born, and then that gets trained via exposure to training data
link |
00:40:27.680
that acquires knowledge and skills via exposure to training data.
link |
00:40:30.960
By the way, it's a small tangent.
link |
00:40:34.200
I feel like people who are thinking about intelligence are not conceptualizing it that
link |
00:40:40.080
way.
link |
00:40:41.280
I actually haven't met too many people who believe that a neural network will be able
link |
00:40:46.880
to reason, who seriously think that, rigorously, because I think it's an actually interesting
link |
00:40:53.080
worldview.
link |
00:40:54.080
And we'll talk about it more, but it's been impressive what neural networks have been
link |
00:41:00.600
able to accomplish.
link |
00:41:01.600
And to me, I don't know, you might disagree, but it's an open question whether
link |
00:41:08.120
scaling size eventually might lead to incredible results that, to us mere humans, will appear as if
link |
00:41:15.840
it's general.
link |
00:41:16.840
I mean, if you ask people who are seriously thinking about intelligence, they will definitely
link |
00:41:22.000
not say that all you need to do is like the mind is just a neural network.
link |
00:41:27.120
However, it's actually a view that's very popular, I think, in the deep learning community,
link |
00:41:31.920
that many people are kind of conceptually intellectually lazy about it.
link |
00:41:38.480
But I guess what I'm saying exactly right is I haven't met many people, and I think
link |
00:41:45.040
it would be interesting to meet a person who is not intellectually lazy about this particular
link |
00:41:49.680
topic and still believes that neural networks will go all the way.
link |
00:41:54.400
I think Yann LeCun is probably closest to that, with self supervised learning, who argues that current deep
link |
00:42:02.120
learning techniques are already the way to general artificial intelligence and that all
link |
00:42:07.200
you need to do is to scale it up to all the available training data.
link |
00:42:13.040
And if you look at the waves that OpenAI's GPT-3 model has made, you see echoes
link |
00:42:21.240
of this idea.
link |
00:42:22.760
So on that topic, GPT three, similar to GPT two actually, has captivated some part of
link |
00:42:30.880
the imagination of the public.
link |
00:42:33.160
There's just a bunch of hype of a different kind that, I would say, is emergent.
link |
00:42:38.080
It's not artificially manufactured.
link |
00:42:39.880
It's just like people just get excited for some strange reason.
link |
00:42:44.080
In the case of GPT three, which is funny, that there's, I believe a couple of months
link |
00:42:48.520
delay from release to hype, maybe I'm not historically correct on that, but it feels
link |
00:42:57.200
like there was a little bit of a lack of hype and then there's a phase shift into hype.
link |
00:43:04.960
But nevertheless, there's a bunch of cool applications that seem to captivate the imagination
link |
00:43:09.800
of the public about what this language model that's trained in unsupervised way without
link |
00:43:16.280
any fine tuning is able to achieve.
link |
00:43:19.720
So what do you make of that?
link |
00:43:21.040
What are your thoughts about GPT three?
link |
00:43:22.720
Yeah.
link |
00:43:23.720
So I think what's interesting about GPT three is the idea that it may be able to learn new
link |
00:43:29.200
tasks after just being shown a few examples.
link |
00:43:33.760
So I think if it's actually capable of doing that, that's novel and that's very interesting
link |
00:43:37.600
and that's something we should investigate.
link |
00:43:40.080
That said, I must say, I'm not entirely convinced that we have shown it's capable of doing that
link |
00:43:46.240
but it's very likely given the amount of data that the model is trained on that what it's
link |
00:43:52.680
actually doing is pattern matching a new task you give it with a task that it's been exposed
link |
00:43:59.360
to in its training data.
link |
00:44:00.360
It's just recognizing the task instead of just developing a model of the task.
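For readers unfamiliar with the setup being discussed: in this few-shot mode, the task is specified only through a handful of examples placed in the prompt, and the model is asked to continue the pattern. The prompt below is a generic Python illustration, with no particular API assumed; the open question raised here is whether the model has actually modeled the task or merely matched it to similar patterns in its training data.

    # A few-shot prompt: the examples define the task, the model completes it.
    prompt = (
        "Translate English to French.\n"
        "sea otter => loutre de mer\n"
        "cheese => fromage\n"
        "peppermint => "
    )
    # A GPT-style model is then asked to generate the continuation of `prompt`.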
link |
00:44:05.640
But there's, sorry to interrupt, there's a parallel to what you said before, which
link |
00:44:10.160
is, it's possible to see GPT three, like the prompts it's given, as a kind of SQL query
link |
00:44:17.640
into this thing that it's learned similar to what you said before, which is language
link |
00:44:21.560
is used to query the memory.
link |
00:44:23.960
So is it possible that the neural network is a giant memorization thing, but then if it
link |
00:44:30.840
gets sufficiently giant, it'll memorize sufficiently large amounts of things in the world that intelligence
link |
00:44:38.480
becomes a querying machine.
link |
00:44:40.520
I think it's possible that a significant chunk of intelligence is this giant associative
link |
00:44:46.720
memory.
link |
00:44:48.720
I definitely don't believe that intelligence is just a giant associative memory, but it
link |
00:44:53.920
may well be a big component.
link |
00:44:57.760
So do you think GPT three, four, five, GPT 10 will eventually like what do you think?
link |
00:45:07.480
Where's the ceiling?
link |
00:45:08.480
Do you think they'll be able to reason?
link |
00:45:11.080
No, that's a bad question.
link |
00:45:14.720
Like what is the ceiling is the better question?
link |
00:45:16.760
Well, what is going to scale?
link |
00:45:18.680
How good is GPT N going to be?
link |
00:45:22.160
So I believe GPT N is going to improve on the strength of GPT two and three, which is it
link |
00:45:31.920
will be able to generate, you know, ever more plausible text in context.
link |
00:45:37.800
Just monotonically better performance.
link |
00:45:40.360
Yes, if you train a bigger model on more data, then your text will be increasingly more context
link |
00:45:48.680
aware and increasingly more plausible in the same way that GPT three is much better at
link |
00:45:54.840
generating plausible texts compared to GPT two.
link |
00:45:59.120
So that said, I don't think just scaling up the model to more transformer layers and
link |
00:46:04.920
more training data is going to address the flaws of GPT three, which is that it can generate
link |
00:46:10.320
plausible texts, but that text is not constrained by anything else other than plausibility.
link |
00:46:16.760
So in particular, it's not constrained by factualness or even consistency, which is
link |
00:46:22.320
why it's very easy to get GPT three to generate statements that are factually untrue or to
link |
00:46:28.360
generate statements that are even self contradictory, right?
link |
00:46:32.400
Because it's only goal is plausibility, and it has no other constraints.
link |
00:46:39.200
It's not constrained to be self consistent, right?
link |
00:46:42.600
And so for this reason, one thing that I thought was very interesting with GPT three
link |
00:46:46.640
is that you can put in its mind the answer it will give you by asking the question in
link |
00:46:52.520
a specific way, because it's very responsive to the way you ask the question since it has
link |
00:46:57.280
no understanding of the content of the question.
link |
00:47:03.640
And if you ask the same question in two different ways that are basically adversarially engineered
link |
00:47:10.520
to produce a certain answer, you will get two different answers, two contradictory answers.
link |
00:47:15.720
It's very susceptible to adversarial attacks essentially.
link |
00:47:18.200
Potentially, yes.
link |
00:47:19.520
So in general, the problem with these models, these generative models is that they are very
link |
00:47:25.160
good at generating plausible texts, but that's just not enough, right?
link |
00:47:32.200
You need, I think one avenue that would be very interesting to make progress is to make
link |
00:47:38.400
it possible to write programs over the latent space that these models operate on, that you
link |
00:47:46.000
would rely on these self supervised models to generate a sort of, like, pool of knowledge
link |
00:47:53.480
and concepts and common sense, and then you would be able to write explicit reasoning
link |
00:47:59.600
programs over it.
link |
00:48:01.680
Because the current problem with GPT three is that it can be quite difficult to get
link |
00:48:06.640
it to do what you want to do.
link |
00:48:09.480
If you want to turn GPT three into products, you need to put constraints on it.
link |
00:48:14.960
You need to force it to obey certain rules.
link |
00:48:19.120
So you need a way to program it explicitly.
link |
00:48:22.040
Yeah, so if you look at its ability to do program synthesis, it generates, like you
link |
00:48:27.080
said, something that's plausible.
link |
00:48:28.760
Yeah, so if you try to make it generate programs, it will perform well for any program that
link |
00:48:36.480
it has seen in its training data, but because program space is not interpolative, right?
link |
00:48:44.520
It's not going to be able to generalize to problems it hasn't seen before.
link |
00:48:48.840
Now that's currently, do you think sort of an absurd, but I think useful, I guess, intuition
link |
00:48:58.400
builder is, you know, the GPT three has 175 billion parameters.
link |
00:49:07.720
The human brain has about 1000 times that or more in terms of number of synapses.
link |
00:49:16.480
Do you think, obviously, very different kinds of things, but there is some degree of similarity.
link |
00:49:26.560
Do you think, what do you think GPT will look like when it has 100 trillion parameters?
link |
00:49:34.520
You think our conversation might be in nature different, like, because you've criticized
link |
00:49:40.640
GPT three very effectively now.
link |
00:49:43.040
Do you think?
link |
00:49:44.440
No, I don't think so.
link |
00:49:47.080
So to begin with, the bottleneck with scaling up GPT three, GPT models, generative pretrained
link |
00:49:53.400
transformer models, is not going to be the size of the model or how long it takes to
link |
00:49:59.000
train it.
link |
00:50:00.000
The bottleneck is going to be the training data because OpenAI is already training GPT
link |
00:50:05.040
three on a crawl of basically the entire web, right?
link |
00:50:08.960
And that's a lot of data.
link |
00:50:09.960
So you could imagine training on more data than that, like Google could train on more
link |
00:50:13.360
data than that, but it would still be only incrementally more data.
link |
00:50:17.760
And I don't recall exactly how much more data GPT three was trained on compared to GPT
link |
00:50:22.800
two, but it's probably at least like 100x, maybe even 1000x, I don't have the exact number.
link |
00:50:28.600
You're not going to be able to train a model on 100x more data than what you're already
link |
00:50:33.360
doing.
link |
00:50:34.360
So that's brilliant.
link |
00:50:35.360
So it's easier to think of compute as a bottleneck and then arguing that we can remove that
link |
00:50:40.760
bottleneck.
link |
00:50:41.760
We can remove the compute bottleneck, I don't think it's a big problem.
link |
00:50:44.720
If you look at the pace at which we've improved the efficiency of deep learning models in
link |
00:50:51.960
the past few years, I'm not worried about training time bottlenecks or model size bottlenecks.
link |
00:51:00.120
The bottleneck in the case of these generative transformer models is absolutely the training
link |
00:51:04.800
data.
link |
00:51:05.800
What about the quality of the data?
link |
00:51:07.800
So yeah.
link |
00:51:08.800
So the quality of the data is an interesting point.
link |
00:51:10.960
The thing is, if you're going to want to use these models in real products, then you want
link |
00:51:18.600
to feed them data that's as high quality, as factual, and I would say as unbiased as possible,
link |
00:51:25.720
but there's not really such a thing as unbiased data in the first place.
link |
00:51:30.640
But you probably don't want to train it on Reddit, for instance.
link |
00:51:35.000
Sounds like a bad plan.
link |
00:51:37.200
So from my personal experience, working with large scale deep learning models.
link |
00:51:42.840
So at some point, I was working on a model at Google that's trained on like 350 million
link |
00:51:50.640
labeled images.
link |
00:51:51.640
It's an image classification model.
link |
00:51:53.800
That's a lot of images.
link |
00:51:54.800
That's like probably most publicly available images on the web at the time.
link |
00:52:01.240
And it was a very noisy data set because the labels were not originally annotated by hand
link |
00:52:07.720
by humans.
link |
00:52:08.720
They were automatically derived from tags on social media or just keywords in the same
link |
00:52:16.320
page as the image was found and so on.
link |
00:52:18.240
So it was very noisy.
link |
00:52:19.240
And it turned out that you could easily get a better model, not just by training.
link |
00:52:26.360
Like if you train on more of the noisy data, you get an incrementally better model, but
link |
00:52:31.600
you very quickly hit diminishing returns.
link |
00:52:35.560
On the other hand, if you train on a smaller data set with higher quality annotations,
link |
00:52:39.960
annotations that are actually made by humans, you get a better model.
link |
00:52:46.800
And it also takes less time to train it.
link |
00:52:49.600
Yeah, that's fascinating.
link |
00:52:51.640
It's the self supervised learning.
link |
00:52:53.320
If there's a way to get better at doing the automated labeling.
link |
00:52:58.040
Yeah, so you can enrich or refine your labels in an automated way.
link |
00:53:05.960
That's correct.
link |
00:53:06.960
Do you have a hope for, I don't know if you're familiar with the idea of a semantic web.
link |
00:53:12.200
The semantic web, just for people who are not familiar, is the idea of being able
link |
00:53:18.640
to convert the internet, or be able to attach semantic meaning to the words on the internet,
link |
00:53:27.920
The sentences, the paragraphs, to be able to convert information on the internet or
link |
00:53:33.960
some fraction of the internet into something that's interpretable by machines.
link |
00:53:38.280
That was kind of a dream for, I think the semantic web papers in the 90s, it's kind
link |
00:53:48.240
of the dream that the internet is full of rich, exciting information.
link |
00:53:52.520
Even just looking at Wikipedia, we should be able to use that as data for machines.
link |
00:53:57.880
And so far.
link |
00:53:58.880
Information is not in a format that's available to machines.
link |
00:54:01.280
So no, I don't think the semantic web will ever work simply because it would be a lot
link |
00:54:06.640
of work to provide that information in a structured form.
link |
00:54:12.240
And there is not really any incentive for anyone to provide that work.
link |
00:54:16.440
So I think the way forward to make the knowledge on the web available to machines is actually
link |
00:54:24.200
something closer to unsupervised deep learning.
link |
00:54:27.200
Yeah.
link |
00:54:28.200
So GPT-3 is actually a bigger step in the direction of making the knowledge of the web available
link |
00:54:33.880
to machines than the semantic web was.
link |
00:54:36.720
Yeah.
link |
00:54:37.720
In a human centric sense, it feels like GPT-3 hasn't learned anything that could be used
link |
00:54:48.560
to reason.
link |
00:54:50.600
But that might be just the early days.
link |
00:54:52.840
Yeah.
link |
00:54:53.840
I think that's correct.
link |
00:54:54.840
I think the forms of reasoning that you see it perform are basically just reproducing
link |
00:55:00.160
patterns that it has seen in its training data.
link |
00:55:02.560
So of course, if you're trained on the entire web, then you can produce an illusion of reasoning
link |
00:55:09.360
in many different situations, but it will break down if it's presented with a novel
link |
00:55:13.880
situation.
link |
00:55:14.880
That's the open question between the illusion of reasoning and actual reasoning, yeah.
link |
00:55:19.160
Yes.
link |
00:55:20.160
The power to adapt to something that is genuinely new.
link |
00:55:22.960
Because the thing is, even if you imagine you could train on every bit of data ever
link |
00:55:31.320
generated in the history of humanity.
link |
00:55:35.600
It remains, that model would be capable of anticipating many different possible situations,
link |
00:55:43.280
but it remains that the future is going to be something different.
link |
00:55:47.560
Like, for instance, if you train a GPT-3 model on data from the year 2002, for instance,
link |
00:55:55.920
and then you use it today, it's going to be missing many things, it's going to be missing
link |
00:55:58.920
many common sense facts about the world.
link |
00:56:02.880
It's even going to be missing vocabulary and so on.
link |
00:56:05.600
Yeah, it's interesting that GPT-3 doesn't even have, I think, any information about the coronavirus.
link |
00:56:13.040
Yes.
link |
00:56:15.120
Which is why, you can tell that a system is intelligent when it's capable
link |
00:56:21.920
of adapting.
link |
00:56:22.920
So intelligence is going to require some amount of continuous learning.
link |
00:56:27.480
It's also going to require some amount of improvisation.
link |
00:56:31.280
It's not enough to assume that what you're going to be asked to do is something that
link |
00:56:36.840
you've seen before, or something that is a simple interpolation of things you've seen
link |
00:56:41.200
before.
link |
00:56:42.200
Yeah.
link |
00:56:43.200
In fact, that model breaks down even for tasks that look relatively simple from
link |
00:56:51.520
a distance, like L5 self driving, for instance.
link |
00:56:55.120
Google had a paper a couple of years back showing that something like 30 million different
link |
00:57:04.200
road situations were actually completely insufficient to train a driving model.
link |
00:57:09.920
It wasn't even L2, right?
link |
00:57:11.920
And that's a lot of data.
link |
00:57:12.920
That's a lot more data than the 20 or 30 hours of driving that a human needs to learn to
link |
00:57:19.080
drive given the knowledge they've already accumulated.
link |
00:57:22.240
Well, let me ask you on that topic, Elon Musk, Tesla autopilot, one of the only companies,
link |
00:57:31.120
I believe, is really pushing for a learning based approach.
link |
00:57:35.200
You're skeptical that that kind of network can achieve level four?
link |
00:57:39.680
L4 is probably achievable, L5 is probably not.
link |
00:57:44.640
What's the distinction there?
link |
00:57:46.080
Is L5 is completely, you can just fall asleep?
link |
00:57:49.080
Yeah, L5 is basically human level.
link |
00:57:51.360
Well, you have to be careful saying human level, because that varies across
link |
00:57:55.440
kinds of drivers.
link |
00:57:56.440
Yeah, the clearest example is that cars will most likely be much safer than humans
link |
00:58:03.200
in many situations where humans fail.
link |
00:58:06.800
It's the vice versa question.
link |
00:58:09.080
I'll tell you, the thing is, the amount of training data you would need to anticipate for
link |
00:58:15.520
pretty much every possible situation you'll encounter in the real world is such that
link |
00:58:21.280
it's not entirely unrealistic to think that at some point in the future we'll develop
link |
00:58:25.960
a system that's trained on enough data, especially provided that we can simulate a lot of that
link |
00:58:31.600
data.
link |
00:58:32.600
We don't necessarily need actual cars on the road for everything, but it's a massive effort.
link |
00:58:40.040
And it turns out you can create a system that's much more adaptive, that can generalize much
link |
00:58:44.720
better if you just add explicit models of the surroundings of the car.
link |
00:58:53.760
And if you use deep learning for what it's good at, which is to provide perceptual information.
link |
00:58:59.560
So in general, deep learning is a way to encode perception and a way to encode intuition,
link |
00:59:05.080
but it is not a good medium for any sort of explicit reasoning.
link |
00:59:11.600
And in AI systems today, strong generalization tends to come from explicit models, tends to
link |
00:59:21.320
come from abstractions in the human mind that are encoded in program form by a human engineer.
link |
00:59:28.400
These are the abstractions that can actually generalize, not the sort of weak abstraction
link |
00:59:33.200
that is learned by a neural network.
link |
00:59:35.280
And the question is how much reasoning, how much strong abstraction is required to solve
link |
00:59:42.600
particular tasks like driving?
link |
00:59:45.880
That's the question.
link |
00:59:46.880
Or human life, existence, how much strong abstraction does existence require, but more
link |
00:59:53.800
specifically on driving?
link |
00:59:57.240
That seems to be a coupled question about intelligence:
link |
01:00:03.920
like, how do you build an intelligent system?
link |
01:00:07.560
And the coupled problem, how hard is this problem?
link |
01:00:11.520
How much intelligence does this problem actually require?
link |
01:00:14.520
So we're, we get to cheat, right?
link |
01:00:18.160
Because we get to look at the problem.
link |
01:00:20.280
Like, it's not like we close our eyes and come to driving completely new.
link |
01:00:24.840
We get to do what we do as human beings, which is for the majority of our life, before we
link |
01:00:30.720
ever learn quote unquote to drive, you get to watch other cars and other people drive.
link |
01:00:35.440
We get to be in cars, we get to watch, we get to go and see movies about cars.
link |
01:00:39.480
We get to, you know, we get to observe all this stuff.
link |
01:00:42.800
And that's similar to what neural networks are doing.
link |
01:00:44.840
It's getting a lot of data.
link |
01:00:47.400
And the question is, yeah, how much is, how many leaps of reasoning genius is required
link |
01:00:57.680
to be able to actually effectively drive?
link |
01:00:59.360
Oh, it's for example, driving, I mean, sure, you've seen a lot of cars in your life before
link |
01:01:06.440
you learn to drive.
link |
01:01:07.840
But let's say you've learned to drive in Silicon Valley and now you rent a car in Tokyo.
link |
01:01:14.320
Well now everyone is driving on the other side of the road.
link |
01:01:17.040
And the signs are different and the roads are more narrow and so on.
link |
01:01:20.400
So it's a very, very different environment.
link |
01:01:22.960
And a smart human, even an average human should be able to just zero shot it to just be operational
link |
01:01:31.480
in this, in this very different environment right away, despite having had no contact
link |
01:01:38.640
with the novel complexity that is contained in this environment, right?
link |
01:01:44.480
And that novel complexity is not just an interpolation over the situations that you've encountered
link |
01:01:51.680
previously, like learning to drive in the US, right?
link |
01:01:55.040
I would say the reason I ask is one of the most interesting tests of intelligence we
link |
01:01:59.960
have today, actually, which is driving, in terms of having an impact on the world.
link |
01:02:06.560
Like when do you think we'll pass that test of intelligence?
link |
01:02:09.920
So I don't think driving is that much of a test of intelligence because again, there
link |
01:02:15.000
is no task for which skill at that task demonstrates intelligence, unless it's a kind of meta task
link |
01:02:23.040
that involves acquiring new skills.
link |
01:02:26.640
So I think you can actually solve driving without having any, any real
link |
01:02:34.120
amount of intelligence.
link |
01:02:35.120
For instance, if you really did have infinite training data, you could just literally train
link |
01:02:41.600
an end to end deep learning model that does driving, provided infinite training data.
link |
01:02:45.800
The only problem with the whole idea is collecting a dataset that's sufficiently comprehensive
link |
01:02:53.400
that covers the very long tail of possible situations you might encounter.
link |
01:02:57.440
And it's really just a scale problem.
link |
01:02:59.400
So I think there's nothing fundamentally wrong with this plan, with this idea.
link |
01:03:06.600
It's just that it strikes me as a fairly inefficient thing to do because you run into this scaling
link |
01:03:15.960
issue with diminishing returns, whereas if instead you took a more manual engineering
link |
01:03:21.880
approach where you use deep learning modules in combination with engineering an explicit
link |
01:03:31.720
model of the surroundings of the car, and you bridge the two in a clever way.
link |
01:03:36.280
Your model will actually start generalizing much earlier and more effectively than the
link |
01:03:41.160
end to end deep learning model.
link |
01:03:42.560
So why would you not go with the more manual engineering oriented approach?
link |
01:03:47.800
Like even if you created that system, either the end to end deep learning model system
link |
01:03:52.240
with infinite data, or the slightly more human-engineered system.
link |
01:03:58.520
I don't think achieving L5 would demonstrate general intelligence or intelligence of any
link |
01:04:04.840
generality at all, again, the only possible test of generality in AI would be a test that
link |
01:04:11.280
looks at skill acquisition over unknown tasks.
link |
01:04:14.000
For instance, you could take your L5 driver and ask it to learn to pilot a commercial
link |
01:04:21.080
airplane, for instance, and then you would look at how much human involvement is required
link |
01:04:25.720
and how much training data is required for the system to learn to pilot an airplane.
link |
01:04:30.120
And that gives you a measure of how intelligent that system really is.
link |
01:04:35.920
Yeah, I mean, that's a big leap.
link |
01:04:37.480
I get you, but I'm more interested in it as a problem.
link |
01:04:42.000
I would see, to me, driving is a black box that can generate novel situations at some
link |
01:04:49.720
rate, what people call edge cases.
link |
01:04:53.520
So it does have newness that we keep being confronted with, let's say once a month.
link |
01:04:59.200
It is a very long tail.
link |
01:05:00.640
Yes.
link |
01:05:01.640
It's a long tail.
link |
01:05:02.640
That doesn't mean you cannot solve it just by training a skill model on a lot of data.
link |
01:05:09.000
Huge amount of data.
link |
01:05:10.000
It's really a matter of scale.
link |
01:05:12.040
But I guess what I'm saying is if you have a vehicle that achieves level five, it is
link |
01:05:18.040
going to be able to deal with new situations.
link |
01:05:24.240
Or I mean, the data is so large that the rate of new situations is very low.
link |
01:05:33.360
That's not intelligent.
link |
01:05:34.360
So if we go back to your definition of intelligence, it's the efficiency with which you can adapt
link |
01:05:41.120
to new situations, to truly new situations, not situations you've seen before, not situations
link |
01:05:46.320
that could be anticipated by your creators, by the creators of the system, but true new
link |
01:05:51.040
situations.
link |
01:05:52.040
The efficiency with which you acquire new skills.
link |
01:05:55.080
If you require, in order to pick up a new skill, you require a very extensive training
link |
01:06:03.040
dataset of most possible situations that can occur in the practice of that skill, then
link |
01:06:09.160
the system is not intelligent.
link |
01:06:10.600
It is mostly just a lookup table.
link |
01:06:15.240
Yeah.
link |
01:06:16.240
Well.
link |
01:06:17.240
So if, in order to acquire a skill, you need a human engineer to write down a bunch of
link |
01:06:23.760
rules that cover most or every possible situation, likewise, the system is not intelligent.
link |
01:06:29.640
The system is merely the output artifact of a process that happens in the minds of the
link |
01:06:38.480
engineers that are creating it.
link |
01:06:41.000
It is encoding an abstraction that's produced by the human mind, and intelligence would
link |
01:06:48.000
actually be the process of producing, of autonomously producing this abstraction.
link |
01:06:55.840
Yeah.
link |
01:06:56.840
Not like, if you take an abstraction and you encode it on a piece of paper or in a computer
link |
01:07:02.080
program, the abstraction itself is not intelligent.
link |
01:07:06.160
What's intelligent is the agent that's capable of producing these abstractions, right?
link |
01:07:11.640
Yeah.
link |
01:07:12.640
It feels like there's a little bit of a gray area, like, because you're basically saying
link |
01:07:17.600
that deep learning forms abstractions too, but those abstractions do not seem to be effective
link |
01:07:25.520
for generalizing far outside of the things that you've already seen, but generalize a
link |
01:07:30.680
little bit.
link |
01:07:31.680
Yeah.
link |
01:07:32.680
Absolutely.
link |
01:07:33.680
No, deep learning does generalize a little bit.
link |
01:07:34.680
But generalization is not a binary, it's more like a spectrum.
link |
01:07:38.280
Yeah.
link |
01:07:39.280
And there's a certain point, it's a gray area, but there's a certain point where there's
link |
01:07:42.680
an impressive degree of generalization that happens.
link |
01:07:46.720
No, I guess exactly what you were saying is intelligence is how efficiently you're able
link |
01:07:56.920
to generalize far outside of the distribution of things you've seen already.
link |
01:08:03.240
Yes.
link |
01:08:04.240
It's just like the distance of how far you can, like, how new, how radically new something
link |
01:08:09.520
is and how efficiently you're able to deal with that.
link |
01:08:13.240
You can think of intelligence as a measure of an information conversion ratio.
link |
01:08:18.960
Like, imagine a space of possible situations, and you've covered some of them, so you have
link |
01:08:28.160
some amount of information about your space of possible situations that's provided by
link |
01:08:32.720
the situations you already know, and that's, on the other hand, also provided by the prior
link |
01:08:38.200
knowledge that the system brings to the table or the prior knowledge that's embedded in
link |
01:08:42.480
the system.
link |
01:08:43.720
So the system starts with some information, right, about the problem, about the task.
link |
01:08:49.240
And it's about going from that information to a program, what you would call a skill
link |
01:08:54.880
program, a behavioral program that can cover a large area of possible situation space.
link |
01:09:02.000
And essentially, the ratio between that area and the amount of information you start with
link |
01:09:06.720
is intelligence.
link |
01:09:09.840
So a very smart agent can make efficient use of very little information about a new problem
link |
01:09:17.440
and very little prior knowledge as well to cover a very large area of potential situations
link |
01:09:23.200
in that problem, without knowing what these future new situations are going to be.
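To make the conversion-ratio intuition concrete, here is a toy sketch in Python. It is only an illustration of the idea described above, not the formal definition from the paper; the names and the bit counts are hypothetical placeholders.

```python
# Toy illustration of intelligence as an "information conversion ratio":
# how much of situation space a learned skill program covers, divided by
# how much information (priors + experience) was consumed to get there.

def coverage(skill_program, situation_space):
    """Fraction of possible situations the skill program handles at all."""
    handled = sum(1 for s in situation_space if skill_program(s) is not None)
    return handled / len(situation_space)

def conversion_ratio(skill_program, situation_space, priors_bits, experience_bits):
    """Toy score: area of situation space covered per bit of information used."""
    return coverage(skill_program, situation_space) / (priors_bits + experience_bits)

# Two agents reach the same coverage, but agent A consumed far less information,
# so under this toy metric agent A comes out as the "smarter" one.
situations = list(range(1000))
agent_a = lambda s: s  # imagine this behavior was learned from very few examples
agent_b = lambda s: s  # imagine this behavior required a huge training set
print(conversion_ratio(agent_a, situations, priors_bits=100, experience_bits=10))
print(conversion_ratio(agent_b, situations, priors_bits=100, experience_bits=10_000))
```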
link |
01:09:31.200
So one of the other big things you talk about in the paper, we've talked about it a little
link |
01:09:35.520
bit already, but let's talk about it some more as the actual tests of intelligence.
link |
01:09:41.040
So if we look at like human and machine intelligence, do you think tests of intelligence should
link |
01:09:48.160
be different for humans and machines, or how we think about testing of intelligence?
link |
01:09:54.840
Is it fundamentally the same kind of intelligence that we're after, and therefore the tests
link |
01:10:01.440
should be similar?
link |
01:10:03.760
So if your goal is to create AIs that are more human like, then it will be super valuable,
link |
01:10:11.960
obviously, to have a test that's universal, that applies to both AIs and humans, so that
link |
01:10:19.640
you could establish a comparison between the two, so that you could tell exactly how intelligent,
link |
01:10:27.440
in terms of human intelligence, a given system is.
link |
01:10:30.520
So that said, the constraints that apply to artificial intelligence and to human intelligence
link |
01:10:37.600
are very different, and your test should account for this difference.
link |
01:10:45.240
Because if you look at artificial systems, it's always possible for an experimenter to
link |
01:10:50.560
buy arbitrary levels of skill at arbitrary tasks, either by injecting a hard coded prior
link |
01:10:59.840
knowledge into the system via rules and so on that come from the human mind, from the
link |
01:11:07.120
minds of the programmers, and also buying higher levels of skill just by training on
link |
01:11:14.200
more data, for instance, you could generate an infinity of different Go games, and you
link |
01:11:19.760
could train a Go playing system that way, but you could not directly compare it to human
link |
01:11:27.520
Go playing skills, because a human that plays Go had to develop that skill in a very constrained
link |
01:11:34.000
environment.
link |
01:11:35.000
They had a limited amount of time, they had a limited amount of energy, and of course,
link |
01:11:40.520
they started from a different set of priors, they started from innate human priors.
link |
01:11:48.720
So I think if you want to compare the intelligence of two systems, like the intelligence of an
link |
01:11:52.440
AI and the intelligence of a human, you have to control for priors.
link |
01:11:59.920
You have to start from the same set of knowledge priors about the task, and you have to control
link |
01:12:06.880
for experience, that is to say, for training data.
link |
01:12:11.280
So what are priors?
link |
01:12:15.160
So priors is whatever information you have about a given task before you start learning
link |
01:12:21.600
about this task.
link |
01:12:23.440
And how is that different from experience?
link |
01:12:25.960
Well experience is acquired, right.
link |
01:12:28.160
For instance, if you're trying to play Go, your experience with Go is all the Go games
link |
01:12:33.840
you've played or you've seen or you've simulated in your mind, let's say.
link |
01:12:39.000
And your priors are things like, well, Go is a game on a 2D grid, and we have lots of
link |
01:12:46.720
hard coded priors about the organization of 2D space.
link |
01:12:53.440
And the rules of how the dynamics, the physics of this game work in this 2D space.
link |
01:13:00.880
And the idea that you have of what winning is.
link |
01:13:04.000
Yes, exactly.
link |
01:13:05.680
And other board games can also share some similarities with Go, and if you've played
link |
01:13:10.320
these board games, then with respect to the game of Go, that would be part of your priors
link |
01:13:15.080
about the game.
link |
01:13:16.080
Well, what's interesting to think about with the game of Go is how many priors are actually
link |
01:13:19.880
brought to the table.
link |
01:13:22.760
When you look at self play, reinforcement learning based mechanisms that do learning,
link |
01:13:28.960
it seems like the number of priors is pretty low.
link |
01:13:31.080
Yes.
link |
01:13:32.080
But you're saying you should be...
link |
01:13:33.080
There are the 2D spatial priors in the convnet.
link |
01:13:35.840
Right.
link |
01:13:36.840
But you should be clear about making those priors explicit.
link |
01:13:40.640
Yes.
link |
01:13:41.640
So in part, I think if your goal is to measure a human-like form of intelligence, then you
link |
01:13:48.080
should clearly establish that you want the AI you're testing to start from the same set
link |
01:13:55.280
of priors that humans start with.
link |
01:13:58.920
So, I mean, to me personally, but I think to a lot of people, the human side of things
link |
01:14:03.680
is very interesting.
link |
01:14:05.400
So testing intelligence for humans, what do you think is a good test of human intelligence?
link |
01:14:13.480
Well, that's a question that psychometrics is interested in.
link |
01:14:19.240
What is?
link |
01:14:20.240
There's an entire subfield of psychology that deals with this question.
link |
01:14:23.840
So what's Psychometrics?
link |
01:14:25.200
Psychometrics is the subfield of psychology that tries to measure, quantify aspects of
link |
01:14:33.120
the human mind.
link |
01:14:34.120
So in particular, cognitive abilities, intelligence, and personality traits as well.
link |
01:14:39.800
So, it might be a weird question, but what are the first principles of psychometrics
link |
01:14:49.640
that it operates on? You know, what are the priors it brings to the table?
link |
01:14:55.480
So it's a field with a fairly long history.
link |
01:15:01.400
It's, so you know, psychology sometimes gets a bad reputation for not having very reproducible
link |
01:15:08.880
results, and psychometrics actually has some fairly solidly reproducible results.
link |
01:15:14.240
So the ideal goals of the field is, you know, tests should be reliable, which is a notion
link |
01:15:21.320
tied to reproducibility.
link |
01:15:23.240
It should be valid, meaning that it should actually measure what you say it measures.
link |
01:15:30.960
So for instance, if you're saying that you're measuring intelligence, then your test results
link |
01:15:35.720
should be correlated with things that you expect to be correlated with intelligence like success
link |
01:15:40.880
in school or success in the workplace and so on, should be standardized, meaning that
link |
01:15:46.400
you can administer your test to many different people in the same conditions, and it should be
link |
01:15:51.280
free from bias, meaning that for instance, if your, if your test involves the English
link |
01:15:57.520
language, then you have to be aware that this creates a bias against people who have English
link |
01:16:03.120
as their second language or people who can't speak English at all.
link |
01:16:07.360
So of course, these, these principles for creating psychometric tests are very much
link |
01:16:12.760
an ideal.
link |
01:16:13.760
I don't think every psychometric test is, is really either reliable, valid, or free
link |
01:16:21.400
from bias, but at least the field is aware of these weaknesses and is trying to address
link |
01:16:26.960
them.
link |
01:16:27.960
So it's kind of interesting, ultimately, you're only able to measure like you said previously
link |
01:16:32.800
the skill, but you're trying to do a bunch of measures of different skills that correlate.
link |
01:16:38.160
Yes.
link |
01:16:39.160
That correlate strongly with some general concept of cognitive ability.
link |
01:16:43.120
Yes, yes.
link |
01:16:44.120
So what's the G factor?
link |
01:16:46.760
So right, there are many different kinds of tests, tests of intelligence and each of them
link |
01:16:52.920
is interested in different aspects of intelligence.
link |
01:16:55.880
You know, some of them will deal with language, some of them will deal with spatial vision,
link |
01:17:01.080
maybe mental rotations, numbers and so on.
link |
01:17:04.680
When you run these very different tests at scale, what you start seeing is that there
link |
01:17:10.920
are clusters of correlations among test results.
link |
01:17:13.720
So for instance, if you look at homework at school, you will see that people who do well
link |
01:17:21.000
at math are also likely statistically to do well in physics.
link |
01:17:25.720
And what's more, people who do well at math and physics are also statistically
link |
01:17:31.520
likely to do well in things that sound completely unrelated, like writing in English, for instance.
link |
01:17:38.760
And so when you see clusters of correlations in statistical terms, you would explain them
link |
01:17:46.040
with a latent variable.
link |
01:17:47.840
And the latent variable that would, for instance, explain the relationship between being good
link |
01:17:52.560
at math and being good at physics would be cognitive ability, right?
link |
01:17:57.480
And the G factor is the latent variable that explains the fact that every test of intelligence
link |
01:18:05.560
that you can come up with results on this test end up being correlated.
link |
01:18:10.600
So there is some single, unique variable that explains these correlations, that's the G factor.
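A rough numerical sketch of this latent-variable point: if a single simulated factor drives scores on several made-up tests, every pairwise correlation comes out positive and one factor accounts for most of the shared variance. This uses simulated data only and is not a claim about any real psychometric dataset.

```python
# Simulate test scores driven by one latent factor, then look at the correlations.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 5000, 6

g = rng.normal(size=n_people)                    # latent "general ability"
loadings = rng.uniform(0.5, 0.9, size=n_tests)   # how strongly each test taps it
noise = rng.normal(size=(n_people, n_tests))
scores = g[:, None] * loadings + noise           # observed scores on each test

corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))                         # all off-diagonal entries positive

eigvals = np.linalg.eigvalsh(corr)[::-1]         # first factor as a crude stand-in for g
print("share of variance on first factor:", round(eigvals[0] / eigvals.sum(), 2))
```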
link |
01:18:19.000
So it's a statistical construct.
link |
01:18:20.440
It's not really something you can directly measure, for instance, in a person.
link |
01:18:25.680
But it's there.
link |
01:18:26.680
But it's there.
link |
01:18:27.680
It's there.
link |
01:18:28.680
It's there at scale.
link |
01:18:29.680
And that's also one thing I want to mention about psychometrics.
link |
01:18:33.080
Like, you know, when you talk about measuring intelligence in humans, for instance, some
link |
01:18:38.280
people get a little bit worried, they will say, you know, that sounds dangerous, maybe
link |
01:18:42.120
that sounds potentially discriminatory and so on.
link |
01:18:44.560
And they are not wrong.
link |
01:18:46.840
And the thing is, personally, I'm not interested in psychometrics as a way to characterize one
link |
01:18:53.240
individual person, like if I get your psychometric personality assessment or your IQ, I don't
link |
01:19:01.200
think that actually tells me much about you as a person.
link |
01:19:05.000
I think psychometrics is most useful as a statistical tool.
link |
01:19:10.360
So it's most useful at scale.
link |
01:19:12.680
It's most useful when you start getting test results for a large number of people and you
link |
01:19:17.640
start cross correlating these test results, because that gives you information about the
link |
01:19:23.960
structure of the human mind, in particular about the structure of human cognitive abilities.
link |
01:19:29.920
So at scale, psychometrics paints a certain picture of the human mind.
link |
01:19:35.720
And that's interesting.
link |
01:19:37.400
And that's what's relevant to AI, the structure of human cognitive abilities.
link |
01:19:40.760
Yeah, it gives you an insight into, I mean, to me, I remember when I learned about the g factor,
link |
01:19:45.960
it seemed like it would be impossible for it to be real, even as a statistical variable.
link |
01:19:56.520
It felt kind of like astrology, like wishful thinking by psychologists.
link |
01:20:02.280
But the more I learned, I realized that there's some, I mean, I'm not sure what to make about
link |
01:20:06.920
human beings, the fact that the g factor is a thing, that there's a commonality across all
link |
01:20:11.880
of human species, that there does seem to be a strong correlation between cognitive abilities.
link |
01:20:17.000
That's kind of fascinating.
link |
01:20:18.440
Yeah.
link |
01:20:19.080
So human cognitive abilities have a structure, like the most mainstream theory of the structure
link |
01:20:25.240
of cognitive abilities is called CHC theory.
link |
01:20:28.680
It's Cattell-Horn-Carroll, it's named after the three psychologists who contributed key pieces of it.
link |
01:20:34.440
And it describes cognitive abilities as a hierarchy with three levels.
link |
01:20:40.360
And at the top, you have the g factor, then you have broad cognitive abilities, for instance,
link |
01:20:45.720
fluid intelligence, that encompass a broad set of possible kinds of tasks that are all related.
link |
01:20:55.960
And then you have narrow cognitive abilities at the last level, which is closer to task specific
link |
01:21:02.440
skill. And there are actually different theories of the structure of cognitive abilities.
link |
01:21:09.800
They just emerge from different statistical analysis of IQ test results.
link |
01:21:14.280
But they all describe a hierarchy with a kind of g factor at the top.
link |
01:21:21.000
And you're right that the g factor is, it's not quite real in the sense that it's not
link |
01:21:27.000
something you can observe and measure, like your height, for instance.
link |
01:21:30.120
But it's real in the sense that you see it in a statistical analysis of the data.
link |
01:21:37.640
One thing I want to mention is that the fact that there is a g factor does not really mean that
link |
01:21:41.960
human intelligence is general in a strong sense, does not mean human intelligence can be applied
link |
01:21:48.120
to any problem at all and that someone who has a high IQ is going to be able to solve any problem
link |
01:21:53.240
at all. That's not quite what it means, I think.
link |
01:21:55.400
One popular analogy to understand it is the sports analogy. If you consider the concept
link |
01:22:04.680
of physical fitness, it's a concept that's very similar to intelligence because it's a useful
link |
01:22:10.440
concept. It's something you can intuitively understand. Some people are fit, maybe like you,
link |
01:22:17.560
some people are not as fit, maybe like me. But none of us can fly.
link |
01:22:21.640
Absolutely. Even if you're very fit, that doesn't mean you can do anything at all in
link |
01:22:30.040
any environment. You obviously cannot fly, you cannot survive at the bottom of the ocean and so
link |
01:22:35.160
on. And if you were a scientist and you wanted to precisely define and measure physical fitness
link |
01:22:42.760
in humans, then you would come up with a battery of tests, like you would have running 100 meter,
link |
01:22:49.560
playing soccer, playing table tennis, swimming, and so on. And if you ran these tests over many
link |
01:22:57.640
different people, you would start seeing correlations in test results. For instance,
link |
01:23:01.720
people who are good at soccer are also good at sprinting. And you would explain these correlations
link |
01:23:08.520
with physical abilities that are strictly analogous to cognitive abilities. And then you would
link |
01:23:14.360
start also observing correlations between biological characteristics, like maybe lung
link |
01:23:22.040
volume is correlated with being a fast runner, for instance. In the same way that there are
link |
01:23:29.080
neurophysiological correlates of cognitive abilities. And at the top of the hierarchy of physical
link |
01:23:38.200
abilities that you would be able to observe, you would have a g factor, a physical g factor,
link |
01:23:42.920
which would map to physical fitness. And as you just said, that doesn't mean that people with
link |
01:23:49.720
high physical fitness can fly. It doesn't mean human morphology and human physiology is universal.
link |
01:23:55.560
It's actually super specialized. We can only do the things that we evolved to do. We are not
link |
01:24:04.600
appropriate to... You could not exist on Venus or Mars or in the void of space or the bottom of the
link |
01:24:11.640
ocean. So that said, one thing that's really striking and remarkable is that our morphology
link |
01:24:22.120
generalizes far beyond the environments that we evolved for. Like in a way, you could say we evolved
link |
01:24:29.240
to run after prey in the savanna, right? That's very much where our human morphology comes from.
link |
01:24:36.120
And that said, we can do a lot of things that are completely unrelated to that. We can climb
link |
01:24:43.560
mountains. We can swim across lakes. We can play table tennis. I mean, table tennis is very different
link |
01:24:50.600
from what we were evolved to do, right? So our morphology, our bodies, our sensorimotor
link |
01:24:56.280
affordances have a degree of generality that is absolutely remarkable, right? And I think cognition
link |
01:25:03.480
is very similar to that. Our cognitive abilities have a degree of generality that goes far beyond
link |
01:25:09.720
what the mind was initially supposed to do, which is why we can play music and write novels and go
link |
01:25:15.720
to Mars and do all kinds of crazy things. But it's not universal in the same way that human
link |
01:25:21.080
morphology and our body is not appropriate for actually most of the universe by volume,
link |
01:25:27.640
in the same way you could say that the human mind is not really appropriate for most of
link |
01:25:31.480
problem space, potential problem space by volume. So we have very strong cognitive biases,
link |
01:25:39.080
actually. That means that there are certain types of problems that we handle very well and certain
link |
01:25:44.520
types of problems that we are completely inadapted for. So that's really how we interpret
link |
01:25:51.560
the g factor. It's not a sign of strong generality. It's really just the broadest cognitive ability.
link |
01:25:59.960
But our abilities, whether we are talking about sensory motor abilities or cognitive abilities,
link |
01:26:06.200
they still, they remain very specialized in the human condition, right?
link |
01:26:12.600
Within the constraints of the human cognition, they're general.
link |
01:26:18.280
Yes, absolutely.
link |
01:26:19.240
But the constraints, as you're saying, are very limited.
link |
01:26:21.480
I think what's limiting.
link |
01:26:23.480
So our cognition and our body evolved in very specific environments,
link |
01:26:29.400
because our environment was so variable, fast changing and so unpredictable.
link |
01:26:34.440
Part of the constraints that drove our evolution is generality itself. So we were in a way evolved
link |
01:26:40.680
to be able to improvise in all kinds of physical or cognitive environments, right?
link |
01:26:47.560
And for this reason, it turns out that the minds and bodies that we ended up with
link |
01:26:53.960
can be applied to much, much broader scope than what they were evolved for, right?
link |
01:26:59.960
And that's truly remarkable. And that goes, that's a degree of generalization that is far beyond
link |
01:27:05.400
anything you can see in artificial systems today, right?
link |
01:27:10.280
That said, it does not mean that human intelligence is anywhere near universal.
link |
01:27:16.280
Yeah, it's not general. You know, it's a kind of exciting topic for people even outside of
link |
01:27:22.120
artificial intelligence IQ tests. I think it's Mensa, whatever, there's different degrees of
link |
01:27:30.280
difficulty for questions. We talked about this offline a little bit too about sort of difficult
link |
01:27:35.720
questions. What makes a question on an IQ test more difficult or less difficult, do you think?
link |
01:27:43.720
So the thing to keep in mind is that there's no such thing as a question that's intrinsically
link |
01:27:50.440
difficult. It has to be difficult with respect to the things you already know, and the things you
link |
01:27:56.280
can already do, right? So in terms of an IQ test question, typically it would be structured,
link |
01:28:05.240
for instance, as a set of demonstration input and output pairs, right? And then you would be given
link |
01:28:13.320
a test input, a prompt, and you would need to recognize or produce the corresponding output.
link |
01:28:19.720
And in that narrow context, you could say a difficult question is a question where
link |
01:28:29.560
the input prompt is very surprising and unexpected given the training examples.
link |
01:28:36.360
Just even the nature of the patterns that you're observing in the input prompt.
link |
01:28:40.040
For instance, let's say you have a rotation problem. You must rotate the shape by 90 degrees.
link |
01:28:45.800
If I give you two examples, and then I give you one prompt, which is actually one of the two
link |
01:28:51.720
training examples, then there is zero generalization difficulty for the task. It's actually a trivial
link |
01:28:57.080
task. You just recognize that it's one of the training examples and you produce the same answer.
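A minimal sketch of that point about zero generalization difficulty: a pure lookup "solver" that only memorizes the demonstration pairs aces any prompt that is literally a training example, and has nothing to say about a genuinely new one. The toy task and names here are hypothetical.

```python
# A memorizing "solver": no generalization, just a lookup table over the demos.
# Hypothetical toy task: map a shape to its version rotated by 90 degrees,
# with shapes encoded as (label, angle) tuples.

def memorizing_solver(train_pairs):
    table = dict(train_pairs)
    return lambda prompt: table.get(prompt)  # None means "never seen this prompt"

train = [(("A", 0), ("A", 90)), (("B", 0), ("B", 90))]
solve = memorizing_solver(train)

print(solve(("A", 0)))   # ('A', 90) -- prompt is a training example, trivially easy
print(solve(("C", 0)))   # None      -- genuinely new prompt, the lookup table fails
```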
link |
01:29:02.280
Now, if it's a more complex shape, there is a little bit more generalization, but it remains
link |
01:29:08.280
that you are still doing the same thing at test time as you were being demonstrated at
link |
01:29:14.120
training time. A difficult task does require some amount of test time adaptation, some amount of
link |
01:29:23.320
improvisation, right? So consider, I don't know, you're teaching a class on quantum physics or
link |
01:29:30.920
something. If you wanted to kind of test the understanding that students have of the material,
link |
01:29:41.400
you would come up with an exam that's very different from anything they've seen,
link |
01:29:48.760
like on the Internet when they were cramming. On the other hand, if you wanted to make it easy,
link |
01:29:54.680
you would just give them something that's very similar to the mock exams that they've taken,
link |
01:30:02.360
something that's just a simple interpolation of questions that they've already seen.
link |
01:30:06.520
And so that would be an easy exam. It's very similar to what you've been trained on. And a
link |
01:30:12.200
difficult exam is one that really probes your understanding because it forces you to improvise.
link |
01:30:18.920
It forces you to do things that are different from what you were exposed to before. So that said,
link |
01:30:27.080
it doesn't mean that the exam that requires improvisation is intrinsically hard, right?
link |
01:30:32.600
Because maybe you're a quantum physics expert. So when you take the exam, this is actually stuff
link |
01:30:38.040
that despite being new to the students, it's not new to you, right? So it can only be difficult
link |
01:30:44.600
with respect to what the test taker already knows, and with respect to the information that
link |
01:30:51.880
the test taker has about the task. So that's what I mean by controlling for priors what you,
link |
01:30:58.360
the information you bring to the table. And the experience, which is the training data. So in the
link |
01:31:03.480
case of the quantum physics exam, that would be all the course material itself and all the mock
link |
01:31:10.360
exams that students might have taken online. Yeah, it's interesting because I've also,
link |
01:31:16.200
I sent you an email and asked you about, like, I've had this curious question of,
link |
01:31:22.040
you know, what's a really hard IQ test question? And I've been talking to also people who have
link |
01:31:30.680
designed IQ tests, there's a few folks on the internet. It's like a thing. People are really curious
link |
01:31:35.720
about it. First of all, most of the IQ tests they design, they like religiously protect
link |
01:31:43.880
the correct answers. Like you can't find the correct answers anywhere. In fact, the question
link |
01:31:49.160
is ruined once you know, even like the approach you're supposed to take. So they're very
link |
01:31:54.160
that said, the approach is implicit in the training examples. So if you release the training
link |
01:31:59.320
examples, it's over. Well, which is why in ARC, for instance, there's a test set that is private
link |
01:32:06.840
and no one has seen it. No, for really tough IQ questions, it's not obvious. It's because of
link |
01:32:15.160
the ambiguity. Like it's, I mean, we'll have to look through them, but like some number sequences
link |
01:32:22.280
and so on, it's not completely clear. So like, you can get a sense, but there's like some,
link |
01:32:30.440
you know, when you look at a number sequence, I don't know,
link |
01:32:35.960
like your Fibonacci number sequence, if you look at the first few numbers, that sequence
link |
01:32:39.960
could be completed in a lot of different ways. And, you know, some are, if you think deeply,
link |
01:32:45.480
are more correct than others. Like there's a kind of intuitive simplicity and elegance
link |
01:32:51.800
to the correct solution. Yes, I am personally not a fan of ambiguity in test questions,
link |
01:32:57.880
actually. But I think you can have difficulty without requiring ambiguity simply by making the
link |
01:33:04.280
test require a lot of extrapolation over the training examples. But the beautiful question
link |
01:33:12.440
is difficult, but gives away everything when you give the training example.
link |
01:33:17.080
Basically, yes. Meaning that, so the tests I'm interested in creating are not necessarily
link |
01:33:24.920
difficult for humans, because human intelligence is the benchmark. They're supposed to be difficult
link |
01:33:32.760
for machines in ways that are easy for humans. Like I think an ideal test of human and machine
link |
01:33:39.160
intelligence is a test that is actionable, that highlights the need for progress, and that
link |
01:33:47.800
highlights the direction in which you should be making progress. I think we'll talk about
link |
01:33:52.760
the ARC challenge and the test you've constructed, and you have these elegant examples. I think
link |
01:33:57.800
that highlight, like this is really easy for us humans, but it's really hard for machines.
link |
01:34:03.720
But on designing an IQ test for IQs higher than, like, 160 and so on,
link |
01:34:12.680
you have to say, you have to take that and put it on steroids, right? You have to think like,
link |
01:34:16.520
what is hard for humans? And that's a fascinating exercise in itself, I think.
link |
01:34:22.840
And it was an interesting question of what it takes to create a really hard question for humans,
link |
01:34:29.400
because you again have to do the same process as you mentioned, which is something basically
link |
01:34:41.000
where the experience that you're likely to have encountered throughout your whole life,
link |
01:34:46.200
even if you've prepared for IQ tests, which is a big challenge, that this will still be novel for
link |
01:34:52.200
you. Yeah, I mean, novelty is a requirement. You should not be able to practice for the questions
link |
01:34:59.160
that you're going to be tested on. That's important. Because otherwise, what you're doing
link |
01:35:03.640
is not exhibiting intelligence, what you're doing is just retrieving what you've been exposed to before.
link |
01:35:09.560
It's the same thing as a deep learning model. If you train a deep learning model on
link |
01:35:13.640
all the possible answers, then it will ace your test. In the same way that a stupid student can
link |
01:35:23.720
still ace the test, if they cram for it, they memorize 100 different possible mock exams.
link |
01:35:32.040
And then they hope that the actual exam will be a very simple interpolation of the mock exams.
link |
01:35:38.120
And that student could just be a deep learning model at that point.
link |
01:35:41.080
But you can actually do that
link |
01:35:45.720
without any understanding of the material. And in fact, many students pass the exams in exactly
link |
01:35:51.160
this way. And if you want to avoid that, you need an exam that's unlike anything they've seen,
link |
01:35:56.520
that really probes their understanding. So how do we design an IQ test for machines?
link |
01:36:04.360
All right, so in the paper, I outline a number of requirements that you expect of such a test.
link |
01:36:15.000
And in particular, we should start by acknowledging the priors that we expect to be required
link |
01:36:23.320
in order to perform the test. So we should be explicit about the priors.
link |
01:36:28.440
And if the goal is to compare machine intelligence and human intelligence,
link |
01:36:32.680
then we should assume human cognitive priors. And secondly, we should make sure that we are testing
link |
01:36:42.040
for skill acquisition ability, skill acquisition efficiency in particular, and not for skill
link |
01:36:47.720
itself, meaning that every task featured in your test should be novel and should not be
link |
01:36:54.040
something that you can anticipate. So for instance, it should not be possible to
link |
01:36:58.200
brute force the space of possible questions to pregenerate every possible question and answer.
link |
01:37:06.840
So it should be tasks that cannot be anticipated, not just by the system itself,
link |
01:37:12.360
but by the creators of the system. Yeah, you know what's fascinating? I mean,
link |
01:37:17.880
one of my favorite aspects of the paper and the work you do, the ARC challenge, is the process
link |
01:37:25.320
of making priors explicit. Just even that act alone is a really powerful one of like, what are,
link |
01:37:35.720
it's a really powerful question, ask of us humans, what are the priors that we bring to the table?
link |
01:37:44.120
So the next step is like, once you have those priors, how do you use them to solve a novel
link |
01:37:49.640
task? But like just even making the priors explicit is a really difficult and really powerful step.
link |
01:37:56.200
And that's like visually beautiful and conceptually philosophically beautiful part of the work you
link |
01:38:02.440
did with, and I guess continue to do probably with the paper and the ARC challenge. Can you
link |
01:38:08.920
talk about some of the priors that we're talking about here? Yes. So a researcher who has done a lot
link |
01:38:14.760
of work on what exactly are the knowledge priors that are innate to humans is Elizabeth Spelke
link |
01:38:24.360
from Harvard. So she developed the core knowledge theory, which outlines four different core
link |
01:38:34.440
knowledge systems. So systems of knowledge that we are basically either born with or that we are
link |
01:38:41.560
hardwired to acquire very early on in our development. And there's no strong distinction
link |
01:38:51.240
between the two. Like if you are primed to acquire a certain type of knowledge, in just a few weeks,
link |
01:39:01.080
you might as well just be born with it. It's just part of who you are. And so there are four
link |
01:39:07.560
different core knowledge systems. Like the first one is the notion of objectness and basic physics.
link |
01:39:16.280
Like you recognize that something that moves coherently, for instance, is an object. So we
link |
01:39:24.200
intuitively naturally, innately divide the world into objects based on this notion of
link |
01:39:30.440
coherence, physical coherence. And in terms of elementary physics, there's the fact that objects
link |
01:39:38.360
can bump against each other and the fact that they can occlude each other. So these are things that
link |
01:39:45.880
we are essentially born with or at least that we are going to be acquiring extremely early because
link |
01:39:52.680
really hardwired to acquire them. So a bunch of points, pixels that move together
link |
01:40:01.000
are part of the same object. Yes. I mean, I don't smoke weed, but if I did,
link |
01:40:11.160
that's something I could sit all night and just think about. I remember reading in your paper
link |
01:40:15.560
just objectness. I wasn't self aware of that particular prior. That's such a fascinating
link |
01:40:26.040
prior. That's the most basic one. Objectness, just identity, objectness. It's very basic,
link |
01:40:35.720
I suppose, but it's so fundamental. It is fundamental to human cognition.
link |
01:40:40.680
Yeah. And the second prior that's also fundamental is agentness, which is not a real word,
link |
01:40:48.040
a real word, but, so, agentness. The fact that some of these objects that you segment your
link |
01:40:55.240
environment into, some of these objects are agents. So what's an agent? Basically, it's
link |
01:41:02.040
an object that has goals. That has what? That has goals. That is capable of
link |
01:41:08.680
pursuing goals. So for instance, if you see two dots moving in a roughly synchronized fashion,
link |
01:41:16.200
you will intuitively infer that one of the dots is pursuing the other. So one of the dots is,
link |
01:41:25.160
and one of the dots is an agent, and its goal is to avoid the other dot. And one of the dots,
link |
01:41:30.120
the other dot, is also an agent, and its goal is to catch the first dot. Spelke has shown that
link |
01:41:37.560
babies as young as three months identify agentness and goal directedness in their environment.
link |
01:41:46.280
Another prior is basic geometry and topology, like the notion of distance,
link |
01:41:53.560
the ability to navigate in your environment, and so on. This is something that is fundamentally
link |
01:41:59.480
hardwired into our brain. It's in fact backed by very specific neural mechanisms, like for instance,
link |
01:42:08.440
grid cells and place cells. So it's something that's literally hardcoded at the neural level
link |
01:42:17.800
in our hippocampus. And the last prior would be the notion of numbers, like numbers are not
link |
01:42:24.760
actually a cultural construct. We are intuitively, innately able to do some basic counting and to
link |
01:42:32.840
compare quantities. So it doesn't mean we can do arbitrary arithmetic. Counting, the actual counting.
link |
01:42:40.120
Like counting one, two, three, then maybe more than three. You can also compare quantities if I give
link |
01:42:46.280
you three dots and five dots, you can tell the side with five dots has more dots. So this is
link |
01:42:53.720
actually an innate prior. So that said, the list may not be exhaustive. So Spelke is still
link |
01:43:03.720
pursuing the potential existence of new knowledge systems, for instance,
link |
01:43:11.160
knowledge systems that deal with social relationships.
link |
01:43:15.480
Yeah. Which is much less relevant to something like ARC or IQ test.
link |
01:43:24.040
Right. There could be stuff that's, like you said, rotation or symmetry. It's really interesting.
link |
01:43:31.000
It's very likely that there is, speaking about rotation, that there is in the brain
link |
01:43:37.560
a hardcoded system that is capable of performing rotations.
link |
01:43:40.520
One famous experiment that people did in the, I don't remember who it was exactly, but in the
link |
01:43:48.520
70s was that people found that if you asked people, if you give them two different shapes,
link |
01:43:57.560
and one of the shapes is a rotated version of the first shape, and you ask them,
link |
01:44:03.240
is that shape a rotated version of the first shape or not? What you see is that the time it
link |
01:44:09.400
takes people to answer is linearly proportional, right, to the angle of rotation. So it's almost
link |
01:44:16.760
like you have it somewhere in your brain, like a turntable with a fixed speed. And if you want to
link |
01:44:24.760
know if two objects are rotated versions of each other, you put the object on the turntable,
link |
01:44:32.040
you let it move around a little bit, and then you stop when you have a match.
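A back-of-the-envelope version of that finding, assuming a constant-speed "mental turntable": response time grows linearly with the rotation angle. The specific numbers below are made up purely for illustration.

```python
# Constant-speed mental rotation model: time = baseline + angle / rotation speed.
# The baseline and speed values are invented; only the linear shape is the point.

def predicted_response_time(angle_deg, base_ms=500.0, deg_per_ms=0.06):
    """Predicted reaction time for deciding two shapes are rotated copies."""
    return base_ms + angle_deg / deg_per_ms

for angle in (0, 45, 90, 180):
    print(angle, "deg ->", round(predicted_response_time(angle)), "ms")
```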
link |
01:44:36.760
And that's really interesting. So what's the ARC challenge?
link |
01:44:42.680
So in the paper, I outlined all these principles that a good test of machine
link |
01:44:49.560
intelligence and human intelligence should follow. And the arc challenge is one attempt
link |
01:44:55.160
to embody as many of these principles as possible. So I don't think it's anywhere near
link |
01:45:00.440
a perfect attempt, right? It does not actually follow every principle, but it is
link |
01:45:07.560
what I was able to do given the constraints. So the format of ARC is very similar to classic
link |
01:45:14.760
IQ tests, in particular Raven's Progressive Matrices. Yeah, Raven's Progressive Matrices.
link |
01:45:20.440
I mean, if you've done IQ tests in the past, you know what that is probably,
link |
01:45:24.040
at least you've seen it, even if you don't know what it's called. And so you have a set of tasks,
link |
01:45:31.240
that's what they're called. And for each task, you have training data, which is a set of input
link |
01:45:38.840
and output pairs. So an input or output is a grid of colors, basically. The size of the
link |
01:45:46.600
grid is variable. And you're given an input and you
link |
01:45:54.920
must transform it into the proper output, right? And so you're shown a few demonstrations
link |
01:46:01.960
of a task in the form of existing input output pairs, and then you're given a new input,
link |
01:46:06.840
and you must provide, you must produce the correct output. And the assumption
link |
01:46:18.840
in ARC is that every task should only require core knowledge priors and should not require any
link |
01:46:28.600
outside knowledge. So for instance, no language, no English, nothing like this, no concepts taken
link |
01:46:38.520
from our human experience, like trees, dogs, cats, and so on. So only reasoning tasks
link |
01:46:48.600
that are built on top of core knowledge priors. And some of the tasks are actually explicitly
link |
01:46:56.120
trying to probe specific forms of abstraction, right? Part of the reason why I wanted to create arc
link |
01:47:05.480
is I'm a big believer in, you know, when you're faced with a problem as murky as
link |
01:47:17.240
understanding how to autonomously generate abstraction in a machine,
link |
01:47:21.560
you have to coevolve the solution and the problem. And so part of the reason why I designed ARC
link |
01:47:28.520
was to clarify my ideas about the nature of abstraction, right? And some of the tasks are
link |
01:47:35.160
actually designed to probe bits of that theory. And there are things that turned out to be
link |
01:47:41.800
very easy for humans to perform, including young kids, right? But turned out to be
link |
01:47:47.000
near impossible for machines. So what have you learned about the nature of abstraction
link |
01:47:54.280
from designing that? Can you clarify what you mean? One of the things you wanted to try to
link |
01:48:00.920
understand was this idea of abstraction? Yes. So clarifying my own ideas about abstraction by
link |
01:48:10.520
forcing myself to produce tasks that would require the ability to produce that form of
link |
01:48:17.480
abstraction in order to solve them. Got it. Okay. So, and by the way, just to, I mean,
link |
01:48:23.000
people should check out, I'll probably overlay if you're watching the video part, but the grid input
link |
01:48:28.360
output with the different colors on the grid. That's it. It's a very simple world.
link |
01:48:35.560
But it's kind of beautiful. It's very similar to classic IQ tests. Like, it's not very original
link |
01:48:40.840
in that sense. The main difference with IQ tests is that we make the priors explicit, which is not
link |
01:48:46.680
usually the case in IQ tests. So we make it explicit that everything should only be built
link |
01:48:51.560
out of core knowledge priors. I also think it's generally more diverse than IQ tests in general.
link |
01:48:59.000
And it's, it perhaps requires a bit more manual work to produce solutions because you have to
link |
01:49:05.960
click around on a grid for a while. Sometimes the grids can be as large as 30 by 30 cells.
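For readers who want something concrete, the public ARC data is distributed as JSON, with a list of train demonstration pairs and test pairs per task, each grid being a list of rows of integer color codes from 0 to 9. The toy task below is invented purely for illustration (the transformation is just "recolor 1 to 2"); real ARC tasks are far less trivial, and evaluation is exact match on the predicted output grid.

    import json

    # A toy ARC-style task, invented for illustration. Real tasks use grids
    # up to 30x30 and colors 0-9, and are much harder than this one.
    toy_task = {
        "train": [
            {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
            {"input": [[1, 1, 0]],      "output": [[2, 2, 0]]},
        ],
        "test": [
            {"input": [[0, 0, 1]],      "output": [[0, 0, 2]]},
        ],
    }

    def solve(grid):
        # A hand-written solution to the toy task above: recolor every 1 to 2.
        return [[2 if cell == 1 else cell for cell in row] for row in grid]

    def score_task(task, solver):
        # Exact-match scoring: the predicted grid must equal the target exactly.
        return all(solver(pair["input"]) == pair["output"] for pair in task["test"])

    if __name__ == "__main__":
        print(json.dumps(toy_task["train"][0]))  # what one demonstration pair looks like
        print("solved:", score_task(toy_task, solve))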
link |
01:49:11.880
So how did you come up, if you can reveal, with the questions? Like, what's the process
link |
01:49:18.760
of the questions? Was it mostly you that came up with the questions? What,
link |
01:49:23.240
how difficult is it to come up with a question? Like, is this scalable to a much larger number?
link |
01:49:30.600
If we think, you know, with IQ tests, you might not necessarily want it to or need it to be scalable
link |
01:49:36.280
with machines. It's possible you could argue that it needs to be scalable.
link |
01:49:41.480
So there are a thousand questions, a thousand tasks, including the test set, the private test set.
link |
01:49:48.200
I think it's fairly difficult in the sense that a big requirement is that every task should be
link |
01:49:55.320
novel and unique and unpredictable, right? Like you don't want to create your own little world
link |
01:50:04.040
that is simple enough that it would be possible for a human to reverse engineer
link |
01:50:10.920
and write down an algorithm that could generate every possible arc task and their solution.
link |
01:50:16.840
So in a sense, that would completely invalidate the test. So you're constantly coming up with new
link |
01:50:21.000
stuff. Yeah, you need a source of novelty, of unfakeable novelty. And one thing I found is that
link |
01:50:30.120
as a human, you are not a very good source of unfakeable novelty. And so you have to
link |
01:50:38.040
pace the creation of these tasks quite a bit. There are only so many unique tasks that you
link |
01:50:43.080
can do in a given day. So that means coming up with truly original new ideas. Did psychedelics
link |
01:50:51.080
help you at all? No, it's okay. But I mean, that's fascinating to think about. So you would be like
link |
01:50:56.840
walking or something like that. Are you constantly thinking of something totally new?
link |
01:51:02.840
Yes. I mean, this is hard. This is hard. Yeah, I mean, I'm not saying I've done
link |
01:51:10.440
anywhere near perfect job at it. There is some amount of redundancy, and there are many imperfections
link |
01:51:15.880
in arc. So that said, you should consider arc as a work in progress. It is not the definitive
link |
01:51:22.600
state; the ARC tasks today are not the definitive state of the test. I want to keep
link |
01:51:29.960
refining it. In the future, I also think it should be possible to open up the creation of tasks
link |
01:51:37.560
to broad audience to do crowdsourcing. That would involve several levels of filtering, obviously.
link |
01:51:43.960
But I think it's possible to apply crowdsourcing to develop a much bigger and much more diverse
link |
01:51:50.120
arc data set that would also be free of potentially some of my own personal biases.
link |
01:51:56.440
Does there always need to be a part of ARC, the test set, that is hidden?
link |
01:52:02.120
Yes, absolutely. It is imperative that the test that you're using to actually benchmark algorithms
link |
01:52:11.880
is not accessible to the people developing these algorithms. Because otherwise, what's
link |
01:52:16.040
going to happen is that the human engineers are just going to solve the tasks themselves
link |
01:52:21.720
and encode their solution in program form. But that again, what you're seeing here is
link |
01:52:28.120
the process of intelligence happening in the mind of the human. And then you're just capturing
link |
01:52:33.880
its crystallized output. But that crystallized output is not the same thing as the process
link |
01:52:38.920
that generated it. It's not intelligence. So by the way, the idea of crowdsourcing it is fascinating.
link |
01:52:46.040
I think the creation of questions is really exciting for people. I think there's a lot
link |
01:52:52.840
of really brilliant people out there that love to create these kinds of stuff.
link |
01:52:56.200
Yeah. One thing that surprised me that I wasn't expecting is that lots of people seem to actually
link |
01:53:03.080
enjoy arc as a kind of game. And I was really seeing it as a test, as a benchmark of fluid
link |
01:53:12.600
general intelligence. And lots of people, including kids, are just enjoying it as a game. So I think
link |
01:53:19.080
that's encouraging. Yeah, I'm fascinated by it. There's a world of people who create IQ questions.
link |
01:53:24.440
I think that's a cool activity for machines and for humans. And humans are themselves fascinated
link |
01:53:35.320
by taking the questions, measuring their own intelligence. That's just really compelling.
link |
01:53:44.280
It's really interesting to me too. It helps. One of the cool things about arc, you said,
link |
01:53:48.600
it's kind of inspired by IQ tests or whatever. It follows a similar process. But because of its
link |
01:53:54.520
nature, because of the context in which it lives, it immediately forces you to think about the nature
link |
01:54:00.680
of intelligence as opposed to just the test of your own. It forces you to really think. I don't
link |
01:54:06.120
know if it's within the question, inherent in the question, or just the fact that it lives
link |
01:54:12.520
in the test that's supposed to be a test of machine intelligence. Absolutely. As you solve
link |
01:54:17.720
arc tasks as a human, you will be forced to basically introspect how you come up with solutions,
link |
01:54:26.920
and that forces you to reflect on the human problem solving process and the way your own mind
link |
01:54:36.200
generates abstract representations of the problems it's exposed to. I think it's due to the fact that
link |
01:54:46.760
the set of core knowledge priors that arc is built upon is so small. It's all a recombination of a
link |
01:54:55.080
very, very small set of assumptions. Okay. So what's the future of arc? So you held arc as a
link |
01:55:04.440
challenge as part of a Kaggle competition. Yes, a Kaggle competition. And what do you think? Do
link |
01:55:12.200
you think that's something that continues for five years, 10 years, just continues growing?
link |
01:55:17.720
Yes, absolutely. So arc itself will keep evolving. So I've talked about crowd sourcing,
link |
01:55:22.600
I think that's a good avenue. Another thing I'm starting is I'll be collaborating with
link |
01:55:30.200
folks from the psychology department at NYU to do human testing on arc. And I think there are
link |
01:55:37.480
lots of interesting questions you can start asking, especially as you start coordinating machine
link |
01:55:44.280
solutions to arc tasks and the human characteristics of solutions. Like for instance, you can try to
link |
01:55:51.720
see if there's a relationship between the human perceived difficulty of a task and the machine
link |
01:55:58.840
perceived. Yes, and exactly some measure of machine perceived difficulty. Yeah, it's a nice
link |
01:56:03.480
playground in which to explore this very difference. It's the same thing as we talked
link |
01:56:07.320
about the autonomous vehicles. The things that could be difficult for humans might be very
link |
01:56:11.240
different than the things that are difficult for machines. And formalizing or making explicit that difference
link |
01:56:17.160
in difficulty may teach us something fundamental about intelligence.
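One hedged sketch of the kind of comparison being described: given per-task human solve rates and per-task machine solve rates, a rank correlation tells you how well the two notions of difficulty track each other. The numbers below are hypothetical, not measurements from any actual study.

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical per-task solve rates (one entry per task); purely illustrative.
    human_solve_rate   = np.array([0.95, 0.80, 0.90, 0.40, 0.65, 0.30])
    machine_solve_rate = np.array([0.60, 0.10, 0.35, 0.00, 0.05, 0.00])

    # A high rank correlation means humans and machines find the same tasks hard;
    # a low or negative one points at tasks that are easy for one and hard for
    # the other, which is exactly the gap worth studying.
    rho, p_value = spearmanr(human_solve_rate, machine_solve_rate)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")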
link |
01:56:22.040
So one thing I think we did well with arc is that it's proving to be a very
link |
01:56:29.400
actionable test in the sense that machine performance on arc started at very much zero
link |
01:56:37.720
initially, while humans found actually the task very easy. And that alone was like a big red
link |
01:56:47.000
flashing light saying that something is going on and that we are missing something. And at the
link |
01:56:52.920
same time, machine performance did not stay at zero for very long actually within two weeks
link |
01:56:58.920
of the Kaggle competition, we started having a non zero number. And now the state of the art is
link |
01:57:04.600
around 20% of the test set solved. And so arc is actually a challenge where our capabilities
link |
01:57:14.920
start at zero, which indicates the need for progress. But it's also not an impossible
link |
01:57:20.120
challenge. It's not inaccessible. You can start making progress basically right away. At the
link |
01:57:25.560
same time, we are still very far from having solved it. And that's actually a very positive outcome
link |
01:57:32.680
of the competition is that the competition has proven that there was no obvious shortcut to
link |
01:57:40.280
solve these tasks. Right. Yeah, so the test held up. Yeah, exactly. That was the primary reason
link |
01:57:45.560
to do the Kaggle competition is to check if some, you know, clever person was going to
link |
01:57:51.880
hack the benchmark. And that did not happen, right? Like people who are solving the task are
link |
01:57:57.880
essentially doing it. Well, in a way, they're actually exploring some flaws of arc that we
link |
01:58:05.720
will need to address in the future, especially they're essentially anticipating what sort of tasks
link |
01:58:11.480
may be contained in the test set, right? Right. Which is kind of, yeah, that's the kind of hacking.
link |
01:58:18.280
It's human hacking of the test. Yes. That said, you know, with the state of the art, that's like
link |
01:58:24.120
a 20%, versus very, very far from human level, which is closer to 100 percent. And so I do
link |
01:58:31.720
believe that, you know, it will take a while until we reach human parity on ARC. And
link |
01:58:40.600
that by the time we have human parity, we will have AI systems that are probably pretty close to
link |
01:58:47.640
human level in terms of general fluid intelligence, which is, I mean, they're not going to be
link |
01:58:53.560
necessarily human like, they're not necessarily, you would not necessarily recognize them as,
link |
01:58:59.560
you know, being an AI. But they would be capable of a degree of generalization that matches the
link |
01:59:07.800
generalization performed by human fluid intelligence. Sure. I mean, this is a good point
link |
01:59:12.920
in terms of general fluid intelligence to mention in your paper, you describe different kinds of
link |
01:59:18.760
generalizations, local, broad, extreme, and there's a kind of hierarchy that you form. So when we say
link |
01:59:27.240
generalization, what are we talking about? What kinds are there? Right. So generalization is
link |
01:59:36.040
a very old idea. I mean, it's even older than machine learning. In the context of machine learning,
link |
01:59:40.840
you say a system generalizes if it can make sense of an input it has not yet seen.
link |
01:59:49.480
And that's what I would call system centric generalization: generalization
link |
01:59:56.840
with respect to novelty for the specific system you're considering. So I think a good test of
link |
02:00:04.360
intelligence should actually deal with developer aware generalization, which is slightly stronger
link |
02:00:11.800
than system centric generalization. So developer aware generalization
link |
02:00:15.880
would be the ability to generalize to novelty or uncertainty that not only the system itself
link |
02:00:23.800
has no access to, but the developer of the system could not have access to either.
link |
02:00:28.360
Yeah. That's a fascinating, that's a fascinating meta definition. So like the system is, it's
link |
02:00:36.280
basically the edge case thing we're talking about with autonomous vehicles, neither the developer
link |
02:00:40.680
nor the system know about the edge cases. So the system should be able to generalize to the
link |
02:00:47.720
thing that nobody expected, neither the designer of the training data, nor obviously
link |
02:00:56.440
the contents of the training data. That's a fascinating definition.
link |
02:01:00.280
So you can see degrees of generalization as a spectrum.
link |
02:01:04.360
And the lowest level is what machine learning is trying to do is the assumption that
link |
02:01:11.960
any new situation is going to be sampled from a static distribution of possible situations.
link |
02:01:18.200
And that you already have a representative sample of the distribution that's your training data.
link |
02:01:23.720
And so in machine learning, you generalize to a new sample from a known distribution.
link |
02:01:28.680
And the ways in which your new sample will be new or different are ways that are already understood
link |
02:01:36.760
by the developers of the system. So you are generalizing to known unknowns for one specific task.
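A toy sketch of that assumption and of what breaks it, assuming nothing beyond NumPy: a model fit on samples from one distribution does fine on new samples from the same distribution, and degrades badly on a shifted distribution its developer never anticipated. All the data here is synthetic and illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def true_function(x):
        return np.sin(x)

    # "Training distribution": x uniform on [0, 1]. Fit a straight line to noisy samples.
    x_train = rng.uniform(0.0, 1.0, 200)
    y_train = true_function(x_train) + rng.normal(0.0, 0.05, 200)
    slope, intercept = np.polyfit(x_train, y_train, 1)

    def model(x):
        return slope * x + intercept

    def mse(x):
        return float(np.mean((model(x) - true_function(x)) ** 2))

    # New samples from the SAME distribution: the ways they differ from the
    # training data are already anticipated, and the model holds up.
    print("same distribution MSE:   ", round(mse(rng.uniform(0.0, 1.0, 200)), 4))

    # Samples from a SHIFTED distribution nobody planned for: the model breaks,
    # because it only ever captured the training slice of the world.
    print("shifted distribution MSE:", round(mse(rng.uniform(3.0, 4.0, 200)), 4))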
link |
02:01:45.000
That's what you would call robustness. You are robust to things like noise,
link |
02:01:49.160
small variations and so on. For one fixed known distribution that you know through your training
link |
02:01:58.440
data. And a higher degree would be flexibility in machine intelligence. So flexibility would be
link |
02:02:08.040
something like an L5 self driving car, or maybe a robot that can pass
link |
02:02:15.080
the coffee cup test, which is the notion that you would be given a random kitchen
link |
02:02:21.400
somewhere in the country and you would have to go make a cup of coffee in that kitchen.
link |
02:02:28.360
So flexibility would be the ability to deal with unknown unknowns. So things that could not,
link |
02:02:35.240
dimensions of variability that could not have been possibly foreseen by the creators of the system
link |
02:02:41.000
within one specific task. So generalizing to the long tail of situations in self driving,
link |
02:02:46.920
for instance, would be flexibility. So you have robustness, flexibility. And finally,
link |
02:02:51.560
we'd have extreme generalization, which is basically flexibility, but instead of just
link |
02:02:58.040
considering one specific domain like driving or domestic robotics, you're considering an
link |
02:03:04.360
open ended range of possible domains. So a robot would be capable of extreme generalization if
link |
02:03:13.400
let's say it's designed and trained for cooking, for instance. And if I buy the robot,
link |
02:03:22.360
and if it's able to teach itself gardening in a couple of weeks, it would be capable
link |
02:03:29.960
of extreme generalization, for instance. So the ultimate goal is extreme generalization. Yes.
link |
02:03:34.760
So creating a system that is so general that it could essentially achieve human skill parity over
link |
02:03:44.920
arbitrary tasks and arbitrary domains with the same level of improvisation and adaptation power
link |
02:03:52.120
as humans when it encounters new situations. And it would do so over basically the same range
link |
02:03:58.920
of possible domains and tasks as humans, and using essentially the same amount of training
link |
02:04:04.840
experience of practice as humans would require. That would be human level
link |
02:04:08.760
of extreme generalization. So I don't actually think humans are anywhere near the
link |
02:04:16.760
optimal intelligence bound, if there is such a thing. For humans, or in general?
link |
02:04:22.600
In general. I think it's quite likely that there is a hard limit to how intelligent
link |
02:04:32.760
any system can be. But at the same time, I don't think humans are anywhere near that limit.
link |
02:04:39.080
Yeah, last time I think we talked, I think you had this idea that we're only as intelligent as
link |
02:04:44.600
the problems we face. We are bounded by the problem. In a way, yes. We are bounded by our
link |
02:04:53.960
environments and we are bounded by the problems we try to solve. Yeah. What do you make of Neuralink
link |
02:05:01.000
and outsourcing some of the brain power, like brain computer interfaces? Do you think we can expand
link |
02:05:08.280
our, augment our intelligence? I am fairly skeptical of Neuralink interfaces because
link |
02:05:19.240
they're trying to fix one specific bottleneck in human machine cognition, which is the bandwidth
link |
02:05:28.040
bottleneck, the input and output of information in the brain. And my perception of the problem is that
link |
02:05:35.640
bandwidth is not, at this time, a bottleneck at all, meaning that we already have senses that
link |
02:05:43.000
enable us to take in far more information than what we can actually process. Well, to push back
link |
02:05:50.760
on that a little bit, to sort of play devil's advocate a little bit, is if you look at the
link |
02:05:55.880
internet, the Wikipedia, let's say Wikipedia, I would say that humans, after the advent of Wikipedia,
link |
02:06:01.960
are much more intelligent. Yes. I think that's a good one. But that's also not about, that's about
link |
02:06:12.040
externalizing our intelligence via information processing systems, external information
link |
02:06:18.840
processing systems, which is very different from brain computer interfaces. Right. But the question
link |
02:06:24.920
is whether, if our brain has direct access to Wikipedia, would our brain... Your brain
link |
02:06:32.200
already has direct access to Wikipedia. It's on your phone, and you have your hands and your eyes
link |
02:06:38.360
and your ears and so on to access that information and the speed at which you can access it.
link |
02:06:44.280
Is bottlenecked by the cognition? I think it's already close, fairly close to optimal, which is
link |
02:06:49.800
why speed reading, for instance, does not work. The faster you read, the less you understand.
link |
02:06:55.880
But maybe it's because it uses the eyes. So maybe, so I don't believe so. I think, you know,
link |
02:07:01.800
the brain is very slow. It operates, you know, the fastest things that happen in the
link |
02:07:08.200
brain are at the level of 50 milliseconds; forming a conscious thought can potentially take entire
link |
02:07:15.320
seconds. Right. And you can already read pretty fast. So I think the speed at which you can
link |
02:07:21.800
take information in and even the speed at which you can output information can only be very
link |
02:07:28.520
incrementally improved. Maybe if you're a very fast typist, if you're a very trained typist,
link |
02:07:34.280
the speed at which you can express your thoughts is already the speed at which you can form your
link |
02:07:39.160
thoughts. Right. So that's kind of an idea that there are fundamental bottlenecks to the human
link |
02:07:46.680
mind. But it's possible that everything we have in the human mind is just to be able to survive
link |
02:07:53.080
in the environment. And there's a lot more to expand. Maybe, you know, you said the speed of the
link |
02:08:01.720
thought. So yeah, I think augmenting human intelligence is a very valid and very powerful
link |
02:08:09.000
avenue. Right. And that's what computers are about. In fact, that's what, you know, all of
link |
02:08:14.040
culture and civilization is about. Culture is externalized cognition. And we rely
link |
02:08:21.400
on culture to think constantly. Yeah. Yeah. I mean, that's not just
link |
02:08:27.160
computers, not just forums on the internet. I mean, all of culture, like language, for instance,
link |
02:08:32.360
is a form of externalized cognition. Books are obviously externalized cognition.
link |
02:08:37.400
Yeah, that's great. And you can scale that externalized cognition, you know, far beyond
link |
02:08:42.680
the capability of the human brain. And you could see, you know, civilization itself,
link |
02:08:50.760
it has capabilities that are far beyond any individual brain and we keep scaling it because
link |
02:08:55.400
it's not bound by individual brains. It's a different kind of system. Yeah. And that
link |
02:09:02.600
system includes non humans. First of all, it includes all the other biological systems,
link |
02:09:08.520
which are probably contributing to the overall intelligence of the organism.
link |
02:09:12.840
And then computers are part of it, non human systems, probably not contributing much, but
link |
02:09:17.400
AI is definitely contributing to that. Like Google search, for instance, part of it. Yeah. Yeah.
link |
02:09:27.160
A huge part, a part that we can't probably introspect. Like how the world has changed in
link |
02:09:32.280
the past 20 years. It's probably very difficult for us to be able to understand until, of course,
link |
02:09:38.440
whoever created the simulation we're in is probably keeping metrics measuring the progress.
link |
02:09:43.720
Yes. There was probably a big spike in performance. They're enjoying, they're enjoying this.
link |
02:09:51.560
So what are your thoughts on the Turing test and the Loebner Prize, which is the,
link |
02:09:59.240
you know, one of the most famous attempts at the test of human intelligence, sorry,
link |
02:10:04.760
of artificial intelligence, by doing a natural language open dialogue test that's
link |
02:10:14.360
judged by humans as far as how well the machine did.
link |
02:10:18.760
So I'm not a fan of the Turing test itself or any of its variants for two reasons.
link |
02:10:24.600
So first of all, it's really copping out of trying to define and measure intelligence because it's
link |
02:10:38.120
entirely outsourcing that to a panel of human judges. And these human judges, they may not
link |
02:10:46.680
themselves have any proper methodology. They may not themselves have any proper definition of
link |
02:10:52.760
intelligence. They may not be reliable. So the Turing test is already failing one of the core
link |
02:10:57.960
psychometric principles, which is reliability, because you have biased human judges. It's also
link |
02:11:05.000
violating the standardization requirement and the freedom from bias requirement. And so it's
link |
02:11:10.840
really a cop out because you are outsourcing everything that matters, which is precisely
link |
02:11:16.120
describing intelligence and finding a standalone test to measure it. You are outsourcing everything
link |
02:11:23.320
to people. So it's really a cop out. And by the way, we should keep in mind that when Turing
link |
02:11:31.080
proposed the imitation game, he did not mean for the imitation game to be an actual goal for
link |
02:11:39.640
the field of AI, an actual test of intelligence. He was using the imitation game as a thought
link |
02:11:47.560
experiment in a philosophical discussion in his 1950 paper. He was trying to argue that theoretically,
link |
02:11:58.520
it should be possible for something very much like the human mind indistinguishable from the
link |
02:12:05.480
human mind to be encoded in a Turing machine. And at the time, that was a very daring idea.
link |
02:12:14.440
It was stretching credulity. But nowadays, I think it's fairly well accepted that the mind is an
link |
02:12:21.320
information processing system and that you could probably encode it into a computer. So another
link |
02:12:26.280
reason why I'm not a fan of this type of test is that the incentives that it creates are incentives
link |
02:12:35.400
that are not conducive to proper scientific research. If your goal is to convince a panel of
link |
02:12:45.160
human judges that they're talking to a human, then you have an incentive to rely on tricks and
link |
02:12:54.040
prestidigitation in the same way that, let's say, you're doing physics and you want to solve
link |
02:13:00.040
teleportation. And what if the test that you set out to pass is you need to convince a panel of
link |
02:13:06.760
judges that teleportation took place and they're just sitting there and watching what you're doing.
link |
02:13:12.520
And that is something that David Copperfield could achieve in his show
link |
02:13:20.200
at Vegas. And what he's doing is very elaborate. But it's not actually, it's not physics. It's
link |
02:13:29.320
not making any progress in our understanding of the universe. To push back on that, it's possible,
link |
02:13:34.680
that's the hope with these kinds of subjective evaluations is that it's easier to solve it
link |
02:13:41.000
generally than it is to come up with tricks that convince a large number of judges. That's the hope.
link |
02:13:47.240
In practice, it turns out that it's very easy to deceive people in the same way that you can do
link |
02:13:52.680
magic in Vegas. You can actually very easily convince people that they're talking to a human
link |
02:13:58.440
when they're actually talking to an algorithm. I disagree with that. I think it's easy.
link |
02:14:05.160
It's not easy. It's doable. It's very easy, because... I wouldn't say it's very easy, though.
link |
02:14:10.680
We are biased. We have theory of mind. We are constantly projecting emotions, intentions,
link |
02:14:19.960
agentness. Agentness is one of our core innate priors. We are projecting these things on everything
link |
02:14:26.200
around us. If you paint a smiley on a rock, the rock becomes happy in our eyes. Because
link |
02:14:33.960
we have this extreme bias that permeates everything we see around us, it's actually pretty easy to
link |
02:14:40.920
trick people. I disagree with that. I totally disagree with that. You brilliantly put the
link |
02:14:49.320
anthropomorphization that we naturally do, the agentness of that word. Is that a real word?
link |
02:14:53.880
No, it's not a real word. I like it. But it's a good word. It's a useful word.
link |
02:14:57.640
It's a useful word. Let's make it real. It's a huge help. But I still think it's really difficult
link |
02:15:02.520
to convince. If you do like the Alexa Prize formulation where you talk for an hour,
link |
02:15:09.880
like there's formulations of the test you can create where it's very difficult.
link |
02:15:13.640
So I like the Alexa Prize better because it's more pragmatic. It's more practical.
link |
02:15:18.920
It's actually incentivizing developers to create something that's useful as a human
link |
02:15:27.160
machine interface. So that's slightly better than just the imitation game.
link |
02:15:31.560
So your idea is like a test which hopefully will help us in creating intelligent systems
link |
02:15:38.920
as a result. If you create a system that passes it, it'll be useful for creating
link |
02:15:43.240
further intelligent systems. Yes, at least. I'm a little bit surprised
link |
02:15:51.720
how little inspiration people draw from the Turing test today. The media and the popular
link |
02:15:58.280
press might write about it every once in a while. The philosophers might talk about it.
link |
02:16:03.400
But most engineers are not really inspired by it. I know you don't like the Turing test,
link |
02:16:10.680
but we'll have this argument another time. There's something inspiring about it,
link |
02:16:16.920
I think. As a philosophical device in a philosophical discussion,
link |
02:16:21.640
I think there is something very interesting about it. I don't think it is in practical terms.
link |
02:16:26.040
I don't think it's conducive to progress. And one of the reasons why is that I think
link |
02:16:33.320
being very human like, being indistinguishable from a human, is actually the very last step
link |
02:16:38.520
in the creation of machine intelligence. The first AIs that will show strong
link |
02:16:44.520
generalization that will actually implement human like broad cognitive abilities.
link |
02:16:53.000
They will not actually look anything like humans.
link |
02:16:58.360
Human likeness is the very last step in that process. And so a good test is a test that
link |
02:17:03.720
points you towards the first step on the ladder, not towards the top of the ladder, right?
link |
02:17:08.760
So to push back on that, I usually agree with you on most things. I remember you,
link |
02:17:14.040
I think, at some point tweeting something about the Turing test being counterproductive or
link |
02:17:19.000
something like that. And I think a lot of very smart people agree with that. I, a computationally
link |
02:17:29.080
speaking not very smart person, disagree with that because I think there's some magic to the
link |
02:17:33.880
interactivity, the interactivity with other humans. So to play devil's advocate on your
link |
02:17:39.000
statement, it's possible that in order to demonstrate the generalization abilities of a system,
link |
02:17:45.400
you have to show, in conversation, your ability to adjust, adapt to the conversation
link |
02:17:54.680
through not just like as a standalone system, but through the process of like the interaction
link |
02:18:01.160
that game theoretic aspect, where you really are changing the environment by your actions. So
link |
02:18:09.560
in the ARC challenge, for example, you're an observer, you can't steer the test
link |
02:18:15.400
into changing, you can't talk to the test, you can't play with it. So there's some aspect
link |
02:18:22.360
of that interactivity that becomes highly subjective, but it feels like it could be conducive
link |
02:18:28.200
to... Yeah, I think you make a great point. The interactivity is a very good setting to force
link |
02:18:33.960
a system to show adaptation, to show generalization. That said, at the same time, it's not
link |
02:18:42.680
something very scalable because you rely on human judges. It's not something reliable because the
link |
02:18:47.720
human judges may not... You don't like human judges, basically. Yes. And so, I love the idea
link |
02:18:53.880
of interactivity. I initially wanted ARC to have some amount of interactivity, where your
link |
02:19:01.640
score on a task would not be one or zero if you can solve it or not, but would be the number
link |
02:19:09.240
of attempts that you can make before you hit the right solution, which means that now you can start
link |
02:19:16.280
applying the scientific method as you solve ARC tasks, in that you can start formulating hypotheses
link |
02:19:22.280
and probing the system to see whether the observation will match the hypothesis
link |
02:19:27.960
or not. It would be amazing if you could also even higher level than that, measure the quality of
link |
02:19:33.960
your attempts, which of course is impossible. But again, that gets subjective. How good was
link |
02:19:39.080
your thinking? How efficient was it? So one thing that's interesting about this notion of scoring
link |
02:19:47.880
you by how many attempts you need is that you can start producing tasks that are way more ambiguous,
link |
02:19:53.640
right? Because with the different attempts, you can actually probe that
link |
02:20:01.240
ambiguity, right? Right. So that's, in a sense, how well can you adapt to the uncertainty
link |
02:20:12.200
and reduce the uncertainty? Yes. It's how fast, it's the efficiency with which you reduce uncertainty
link |
02:20:21.080
in program space? Exactly. Very difficult to come up with that kind of test though.
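A minimal sketch of the attempt-based scoring idea just described: instead of a 0 or 1 score per task, a solved task is worth more the fewer guesses it took, which rewards efficient hypothesis testing. The linear decay below is an arbitrary illustrative choice, not part of the actual ARC benchmark.

    def attempt_based_score(num_attempts: int, max_attempts: int = 10) -> float:
        # 1 attempt -> 1.0, decaying linearly toward zero at max_attempts;
        # an unsolved task (or too many attempts) scores 0.
        if num_attempts < 1 or num_attempts > max_attempts:
            return 0.0
        return 1.0 - (num_attempts - 1) / max_attempts

    if __name__ == "__main__":
        for n in (1, 2, 5, 10, 11):
            print(n, "attempt(s) ->", attempt_based_score(n))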
link |
02:20:24.840
Yeah. So I would love to be able to create something like this. In practice, it would be very,
link |
02:20:30.520
very difficult. But yes. What you're doing, what you've done with the ARC challenge is brilliant.
link |
02:20:37.400
I'm surprised that it's not more popular, but I think it's picking up.
link |
02:20:42.040
It has its niche. It has its niche. Yeah. What are your thoughts about another test that I talked
link |
02:20:47.320
about with Marcus Hutter? He has the Hutter Prize for compression of human knowledge, and the idea is
link |
02:20:52.920
really sort of quantify and reduce the test of intelligence purely to just the ability to
link |
02:20:58.280
compress. What's your thoughts about this intelligence as compression? I mean, it's a very
link |
02:21:06.440
fun test because it's such a simple idea. Like you're given Wikipedia, basically English Wikipedia,
link |
02:21:13.720
and you must compress it. And so it stems from the idea that cognition is compression,
link |
02:21:21.080
that the brain is basically a compression algorithm. This is a very old idea. It's a very, I think,
link |
02:21:27.480
striking and beautiful idea. I used to believe it. I eventually had to realize that it was,
link |
02:21:34.920
it was very much a flawed idea. So I no longer believe that cognition is compression.
link |
02:21:40.920
So, but I can tell you what's the difference. So it's very easy to believe that cognition
link |
02:21:47.960
and compression are the same thing because, so Jeff Hawkins, for instance, says that
link |
02:21:53.480
cognition is prediction. And of course, prediction is basically the same thing as compression, right?
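A brief aside on that equivalence: a probabilistic predictor assigns each next symbol a probability p, and an ideal entropy coder can encode that symbol in about -log2(p) bits, so better prediction literally means shorter codes. The sketch below measures the bits per character a smoothed character-bigram predictor would need on a toy string; the text and the smoothing choices are arbitrary and purely illustrative.

    import math
    from collections import Counter

    def bigram_bits_per_char(text: str, alphabet_size: int = 256) -> float:
        # Average -log2 p(next char | previous char) under a Laplace-smoothed
        # character bigram model trained on the same text. This is the code
        # length per character an ideal entropy coder driven by that predictor
        # would achieve (ignoring the cost of transmitting the model itself).
        pair_counts = Counter(zip(text, text[1:]))
        context_counts = Counter(text[:-1])
        total_bits = 0.0
        for prev, nxt in zip(text, text[1:]):
            p = (pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + alphabet_size)
            total_bits += -math.log2(p)
        return total_bits / (len(text) - 1)

    if __name__ == "__main__":
        sample = "the quick brown fox jumps over the lazy dog " * 20
        print(f"bigram predictor:  {bigram_bits_per_char(sample):.2f} bits per character")
        print(f"uniform predictor: {math.log2(256):.2f} bits per character")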
link |
02:21:58.600
It's just including the temporal axis. And it's very easy to believe this because compression
link |
02:22:05.960
is something that we do all the time very naturally. We are constantly compressing information. We are
link |
02:22:14.280
constantly trying, we have this bias towards simplicity. We're constantly trying to organize
link |
02:22:20.040
things in our mind and around us to be more regular, right? So it's a beautiful idea. It's
link |
02:22:26.920
very easy to believe. There is a big difference between what we do with our brains and compression.
link |
02:22:33.880
So compression is actually kind of a tool in the human cognitive tool kit that is used in many
link |
02:22:41.400
ways. But it's just a tool. It is not, it is a tool for cognition. It is not cognition itself.
link |
02:22:47.640
And the big fundamental difference is that cognition is about being able to operate in
link |
02:22:55.560
future situations that include fundamental uncertainty and novelty. So for instance,
link |
02:23:03.720
consider a child at age 10. And so they have 10 years of life experience. They've gotten,
link |
02:23:10.840
you know, pain, pleasure, rewards, and punishments over that period of time. If you were to generate
link |
02:23:18.760
the shortest behavioral program that would have basically run that child over these 10 years
link |
02:23:26.680
in an optimal way, right? The shortest optimal behavioral program given the experience of that
link |
02:23:33.640
child so far. Well, that program, that compressed program, this is what you would get if the mind
link |
02:23:39.400
of the child was a compression algorithm, essentially, would be utterly inappropriate to process
link |
02:23:48.920
the next 70 years in the life of that child. So in the models we build of the world, we are not
link |
02:23:59.400
trying to make them actually optimally compressed. We are using compression as a tool to promote
link |
02:24:07.160
simplicity and efficiency in our models. But they are not perfectly compressed because they need to
link |
02:24:13.080
include things that are seemingly useless today, that have seemingly been useless so far. But that
link |
02:24:20.600
may turn out to be useful in the future because you just don't know the future. And that's the
link |
02:24:26.680
fundamental principle that cognition, that intelligence arises from is that you need to be
link |
02:24:32.920
able to run appropriate behavioral programs, except you have absolutely no idea what sort of
link |
02:24:38.440
context, environment, and situation they're going to be running in. And you have to deal with that,
link |
02:24:43.640
with that uncertainty, with that future novelty. So an analogy that you can make is with investing,
link |
02:24:52.440
for instance. If I look at the past 20 years of stock market data, and I use a compression
link |
02:25:01.000
algorithm to figure out the best trading strategy, it's going to be you buy Apple stock, then maybe
link |
02:25:06.680
the past few years you buy Tesla stock or something. But is that strategy still going to be true for
link |
02:25:13.560
the next 20 years? Well, actually, probably not. Which is why if you're a smart investor, you're
link |
02:25:21.320
not just going to be following the strategy that corresponds to compression of the past.
link |
02:25:28.120
You're going to have a balanced portfolio, right? Because you
link |
02:25:35.320
just don't know what's going to happen. I mean, I guess in that same sense, the compression is
link |
02:25:41.240
analogous to what you talked about, which is like local or robust generalization versus extreme
link |
02:25:46.840
generalization. It's much closer to that side of being able to generalize in the local sense.
link |
02:25:53.320
That's why as humans, when we are children, in our education, a lot of it is driven by play,
link |
02:26:01.480
driven by curiosity, we are not efficiently compressing things. We're actually exploring.
link |
02:26:09.640
We are retaining all kinds of things from our environment that seem to be completely useless,
link |
02:26:19.560
because they might turn out to be eventually useful. That's what cognition is really about,
link |
02:26:26.760
and what makes it antagonistic to compression is that it is about hedging for future uncertainty.
link |
02:26:38.360
Cognition leverages compression as a tool to promote efficiency.
link |
02:26:43.720
So in that sense, in our models. It's like Einstein said, make it as simple as possible,
link |
02:26:50.760
but not simpler, or however that quote goes. So compression
link |
02:26:56.760
simplifies things, but you don't want to make it too simple.
link |
02:27:00.040
Yes. So a good model of the world is going to include all kinds of things that are completely
link |
02:27:05.400
useless, actually, just in case. Because you need diversity in the same way that in your portfolio,
link |
02:27:11.000
you need all kinds of stocks that may not have performed well so far, but you need
link |
02:27:14.840
diversity. And the reason you need diversity is because, fundamentally, you don't know what
link |
02:27:18.680
you're doing. And the same is true of the human mind, is that it needs to behave appropriately
link |
02:27:26.040
in the future. And it has no idea what the future is going to be like. But it's not going to be
link |
02:27:30.520
like the past. So compressing the past is not appropriate, because the past is not
link |
02:27:35.160
predictive of the future. Yeah. History repeats itself, but not perfectly.
link |
02:27:44.600
I don't think I asked you last time the most inappropriately absurd question.
link |
02:27:51.080
We've talked a lot about intelligence, but the bigger question from intelligence is of meaning.
link |
02:27:58.920
You know, intelligent systems are kind of goal oriented. They're always optimizing for a goal.
link |
02:28:03.400
You look at the Hutter Prize, actually. I mean, there's always a clean formulation of a goal.
link |
02:28:08.840
But the natural question for us humans, since we don't know our objective function, is what is
link |
02:28:15.000
the meaning of it all? So the absurd question is, what, Francois Chollet, do you think is the meaning
link |
02:28:22.840
of life? What's the meaning of life? Yeah, that's a big question.
link |
02:28:27.080
And I think I can give you my answer, at least one of my answers.
link |
02:28:37.960
And so you know, the one thing that's very important in understanding who we are is that
link |
02:28:47.720
everything that makes up ourselves, that makes up who we are,
link |
02:28:52.440
even your most personal thoughts is not actually your own. Even your most personal thoughts
link |
02:29:01.640
are expressed in words that you did not invent and are built on concepts and images that you did
link |
02:29:09.080
not invent. We are very much cultural beings. We are made of culture. What makes us different
link |
02:29:17.720
from animals, for instance. So everything about ourselves is an echo of the past, an echo of
link |
02:29:26.440
people who lived before us. That's who we are. And in the same way, if we manage to contribute
link |
02:29:35.960
something to the collective edifice of culture, a new idea, maybe a beautiful piece of music,
link |
02:29:44.360
a work of art, a grand theory, and new words, maybe, that something is going to become a part
link |
02:29:55.800
of the minds of future humans, essentially forever. So everything we do creates ripples
link |
02:30:03.800
that propagate into the future. And in a way, this is our path to immortality,
link |
02:30:11.800
is that as we contribute things to culture, culture in turn becomes future humans. And
link |
02:30:22.280
we keep influencing people thousands of years from now. So our actions today create ripples.
link |
02:30:30.680
And these ripples, I think, basically sum up the meaning of life. Like in the same way that we are
link |
02:30:38.680
the sum of the interactions between many different ripples that came from our past,
link |
02:30:47.080
we are ourselves creating ripples that will propagate into the future. And that's why
link |
02:30:52.360
we should be, and this seems like perhaps a naive thing to say, but we should be kind to others
link |
02:30:59.240
during our time on Earth, because every act of kindness creates ripples. And in reverse,
link |
02:31:06.680
every act of violence also creates ripples. And you want to carefully choose which kind of ripples
link |
02:31:13.160
you want to create, and you want to propagate into the future. And in your case, first of all,
link |
02:31:18.280
beautifully put, but in your case, creating ripples into future humans and future AGI systems.
link |
02:31:27.800
Yes. It's fascinating. Our successors.
link |
02:31:30.280
I don't think there's a better way to end it. Francois, as always, for a second time, and I'm
link |
02:31:37.640
sure many times in the future, it's been a huge honor. You're one of the most brilliant people
link |
02:31:43.240
in the machine learning and computer science world. Again, it's a huge honor. Thanks for talking
link |
02:31:49.080
to me. It's been a pleasure. Thanks a lot for having me. We appreciate it. Thanks for listening
link |
02:31:54.520
to this conversation with Francois Chollet. And thank you to our sponsors, Babel, Masterclass,
link |
02:32:00.280
and Cash App. Click the sponsor links in the description to get a discount and to support
link |
02:32:05.400
this podcast. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple
link |
02:32:10.200
Podcast, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Freedman.
link |
02:32:17.640
And now let me leave you with some words from Rene Descartes in 1668, an excerpt of which
link |
02:32:23.720
Francois includes in his On the Measure of Intelligence paper. If there were machines
link |
02:32:28.680
which bore a resemblance to our bodies and imitated our actions as closely as possible
link |
02:32:34.280
for all practical purposes, we should still have two very certain means of recognizing
link |
02:32:39.800
that they were not real men. The first is that they could never use words or put together
link |
02:32:45.240
signs as we do in order to declare our thoughts to others. For we can certainly conceive of a
link |
02:32:51.480
machine so constructed that it utters words and even utters words that correspond to bodily
link |
02:32:57.000
actions causing a change in its organs. But it is not conceivable that such a machine should
link |
02:33:02.760
produce different arrangements of words so as to give an appropriately meaningful answer
link |
02:33:07.880
to whatever is said in its presence as the dullest of men can do. Here Descartes is anticipating
link |
02:33:13.960
the Turing test, and the argument still continues to this day. Secondly, he continues,
link |
02:33:20.840
even though some machines might do some things as well as we do them, or perhaps even better,
link |
02:33:26.360
they would inevitably fail in others, which would reveal that they are acting not from
link |
02:33:31.160
understanding but only from the disposition of their organs. This is an incredible quote.
link |
02:33:38.600
For whereas reason is a universal instrument which can be used in all kinds of situations,
link |
02:33:46.440
these organs need some particular disposition for each particular action. Hence it is for all practical purposes impossible
link |
02:33:51.960
for a machine to have enough different organs to make it act in all the contingencies of life
link |
02:33:57.720
in the way in which our reason makes us act. That's the debate between mimicry, memorization,
link |
02:34:04.920
versus understanding. So, thank you for listening and hope to see you next time.