
François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38



link |
00:00:00.000
The following is a conversation with François Chollet.
link |
00:00:03.720
He's the creator of Keras, which is an open source deep learning
link |
00:00:07.300
library that is designed to enable fast, user friendly
link |
00:00:10.560
experimentation with deep neural networks.
link |
00:00:13.600
It serves as an interface to several deep learning libraries,
link |
00:00:16.680
the most popular of which is TensorFlow.
link |
00:00:19.040
And it was integrated into the TensorFlow main code base
link |
00:00:22.600
a while ago.
link |
00:00:24.120
Meaning, if you want to create, train, and use
link |
00:00:27.560
neural networks, probably the easiest and most popular option
link |
00:00:31.040
is to use Keras inside TensorFlow.
link |
00:00:34.840
Aside from creating an exceptionally useful and popular
link |
00:00:37.760
library, François is also a world class AI researcher
link |
00:00:41.920
and software engineer at Google.
link |
00:00:44.560
And he's definitely an outspoken, if not controversial,
link |
00:00:48.080
personality in the AI world, especially
link |
00:00:51.480
in the realm of ideas around the future
link |
00:00:53.720
of artificial intelligence.
link |
00:00:55.920
This is the Artificial Intelligence Podcast.
link |
00:00:58.600
If you enjoy it, subscribe on YouTube,
link |
00:01:01.000
give us five stars on iTunes, support on Patreon,
link |
00:01:04.160
or simply connect with me on Twitter
link |
00:01:06.080
at Lex Fridman, spelled F R I D M A N.
link |
00:01:09.960
And now, here's my conversation with François Chollet.
link |
00:01:14.880
You're known for not sugarcoating your opinions
link |
00:01:17.320
and speaking your mind about ideas in AI, especially
link |
00:01:19.640
on Twitter.
link |
00:01:21.120
That's one of my favorite Twitter accounts.
link |
00:01:22.800
So what's one of the more controversial ideas
link |
00:01:26.360
you've expressed online and gotten some heat for?
link |
00:01:30.440
How do you pick?
link |
00:01:33.080
How do I pick?
link |
00:01:33.920
Yeah, no, I think if you go through the trouble of maintaining
link |
00:01:38.280
a Twitter account, you might as well speak your mind.
link |
00:01:41.880
Otherwise, what's even the point of having a Twitter account,
link |
00:01:44.640
like getting a nice car and just leaving it in the garage?
link |
00:01:48.600
Yeah, so that's one thing for which
link |
00:01:50.360
I got a lot of pushback.
link |
00:01:53.640
Perhaps that time, I wrote something
link |
00:01:56.720
about the idea of intelligence explosion.
link |
00:02:00.960
And I was questioning the idea and the reasoning behind this
link |
00:02:05.680
idea.
link |
00:02:06.880
And I got a lot of pushback on that.
link |
00:02:09.720
I got a lot of flak for it.
link |
00:02:11.840
So yeah, so intelligence explosion, I'm sure you're familiar
link |
00:02:14.360
with the idea, but it's the idea
link |
00:02:15.800
that if you were to build general AI problem
link |
00:02:21.360
solving algorithms, well, the problem of building such an AI,
link |
00:02:27.600
that itself is a problem that could be solved by your AI.
link |
00:02:30.640
And maybe it could be solved better than what humans can do.
link |
00:02:33.840
So your AI could start tweaking its own algorithm,
link |
00:02:36.920
could start making a better version of itself.
link |
00:02:39.640
And so on, iteratively, in a recursive fashion,
link |
00:02:43.320
and so you would end up with an AI
link |
00:02:47.360
with exponentially increasing intelligence.
link |
00:02:50.920
And I was basically questioning this idea.
link |
00:02:55.880
First of all, because the notion of intelligence explosion
link |
00:02:59.080
uses an implicit definition of intelligence
link |
00:03:02.240
that doesn't sound quite right to me.
link |
00:03:05.400
It considers intelligence as a property of a brain
link |
00:03:11.200
that you can consider in isolation,
link |
00:03:13.680
like the height of a building, for instance.
link |
00:03:16.640
But that's not really what intelligence is.
link |
00:03:19.040
Intelligence emerges from the interaction
link |
00:03:22.200
between a brain, a body, like embodied intelligence,
link |
00:03:26.720
and an environment.
link |
00:03:28.320
And if you're missing one of these pieces,
link |
00:03:30.720
then you cannot really define intelligence anymore.
link |
00:03:33.840
So just tweaking a brain to make it smarter and smarter
link |
00:03:36.800
doesn't actually make any sense to me.
link |
00:03:39.120
So first of all, you're crushing the dreams of many people.
link |
00:03:43.000
So let's look at Sam Harris.
link |
00:03:46.000
Actually, a lot of physicists, Max Tegmark,
link |
00:03:48.680
people who think the universe is an information processing
link |
00:03:53.600
system.
link |
00:03:54.640
Our brain is kind of an information processing system.
link |
00:03:57.680
So what's the theoretical limit?
link |
00:04:00.040
It doesn't make sense that there should be some,
link |
00:04:04.840
it seems naive to think that our own brain is somehow
link |
00:04:08.080
the limit of the capabilities of this information,
link |
00:04:11.600
I'm playing devil's advocate here,
link |
00:04:13.600
of this information processing system.
link |
00:04:15.600
And then if you just scale it, if you're
link |
00:04:18.040
able to build something that's on par with the brain,
link |
00:04:20.880
you just, the process that builds it just continues
link |
00:04:24.000
and it will improve exponentially.
link |
00:04:26.360
So that's the logic that's used actually
link |
00:04:30.120
by almost everybody that is worried
link |
00:04:33.920
about super human intelligence.
link |
00:04:36.880
Yeah, so you're trying to make, so most people
link |
00:04:39.800
who are skeptical of that are kind of like,
link |
00:04:42.320
this doesn't, their thought process,
link |
00:04:44.360
this doesn't feel right.
link |
00:04:46.520
Like that's for me as well.
link |
00:04:47.680
So I'm more like, it doesn't, the whole thing is shrouded
link |
00:04:52.320
in mystery where you can't really say anything concrete,
link |
00:04:55.840
but you could say this doesn't feel right.
link |
00:04:57.880
This doesn't feel like that's how the brain works.
link |
00:05:00.680
And you're trying to, with your blog post
link |
00:05:02.400
and now making it a little more explicit.
link |
00:05:05.680
So one idea is that the brain doesn't
link |
00:05:10.280
exist alone, it exists within the environment.
link |
00:05:13.840
So you can't exponentially, you would have to somehow
link |
00:05:17.520
exponentially improve the environment
link |
00:05:19.360
and the brain together, almost, in order
link |
00:05:22.280
to create something that's much smarter
link |
00:05:26.280
in some kind of, of course we don't have
link |
00:05:29.120
a definition of intelligence.
link |
00:05:30.560
That's correct, that's correct.
link |
00:05:31.880
I don't think, if you look at very smart people
link |
00:05:34.560
today, even humans, not even talking about AIs,
link |
00:05:37.840
I don't think their brain and the performance
link |
00:05:40.000
of their brain is the bottleneck
link |
00:05:42.520
to their expressed intelligence, to their achievements.
link |
00:05:47.160
You cannot just tweak one part of this system,
link |
00:05:50.480
like of this brain, body, environment system
link |
00:05:53.360
and expect the capabilities, like what emerges
link |
00:05:56.480
out of this system to just, you know,
link |
00:05:59.000
explode exponentially.
link |
00:06:00.800
Because anytime you improve one part of a system
link |
00:06:04.720
with many interdependencies like this,
link |
00:06:07.280
there's a new bottleneck that arises, right?
link |
00:06:09.520
And I don't think even today for very smart people,
link |
00:06:12.280
their brain is not the bottleneck
link |
00:06:14.960
to the sort of problems they can solve, right?
link |
00:06:17.560
In fact, many very smart people today, you know,
link |
00:06:21.480
they're not actually solving any big scientific problems.
link |
00:06:23.760
They're not Einstein.
link |
00:06:24.800
They're like Einstein, but, you know,
link |
00:06:26.560
in the patent clerk days.
link |
00:06:28.280
Like Einstein became Einstein
link |
00:06:31.920
because this was a meeting of a genius
link |
00:06:36.080
with a big problem at the right time, right?
link |
00:06:39.480
But maybe this meeting could have never happened
link |
00:06:42.480
and then Einstein would have just been a patent clerk, right?
link |
00:06:44.960
And in fact, many people today are probably like
link |
00:06:49.760
genius level smart, but you wouldn't know
link |
00:06:52.240
because they're not really expressing any of that.
link |
00:06:54.800
Well, that's brilliant. So we can think of the world, earth,
link |
00:06:58.520
but also the universe as just, as a space of problems.
link |
00:07:02.720
So all of these problems and tasks are roaming it
link |
00:07:05.160
of various difficulty.
link |
00:07:06.880
And there's agents, creatures like ourselves
link |
00:07:10.120
and animals and so on that are also roaming it.
link |
00:07:13.360
And then you get coupled with a problem
link |
00:07:16.480
and then you solve it.
link |
00:07:17.640
But without that coupling,
link |
00:07:19.880
you can't demonstrate your quote unquote intelligence.
link |
00:07:22.560
Yeah, exactly. Intelligence is the meeting of
link |
00:07:25.440
great problem solving capabilities with a great problem.
link |
00:07:28.760
And if you don't have the problem,
link |
00:07:30.560
you don't really express any intelligence.
link |
00:07:32.280
All you're left with is potential intelligence,
link |
00:07:34.760
like the performance of your brain or, you know,
link |
00:07:36.920
how high your IQ is, which in itself is just a number, right?
link |
00:07:42.080
So you mentioned problem solving capacity.
link |
00:07:46.520
Yeah.
link |
00:07:47.360
What do you think of as problem solving capacity?
link |
00:07:51.040
What, can you try to define intelligence?
link |
00:07:56.680
Like, what does it mean to be more or less intelligent?
link |
00:08:00.040
Is it completely coupled to a particular problem?
link |
00:08:03.040
Or is there something a little bit more universal?
link |
00:08:05.760
Yeah, I do believe all intelligence
link |
00:08:07.480
is specialized intelligence.
link |
00:08:09.120
Even human intelligence has some degree of generality.
link |
00:08:12.280
Well, all intelligent systems have some degree of generality,
link |
00:08:15.400
but they're always specialized in one category of problems.
link |
00:08:19.480
So the human intelligence is specialized
link |
00:08:21.920
in the human experience and that shows at various levels,
link |
00:08:25.560
that shows in some prior knowledge,
link |
00:08:29.320
that's innate, that we have at birth,
link |
00:08:32.040
knowledge about things like agents,
link |
00:08:35.360
goal driven behavior, visual priors about what makes an object,
link |
00:08:40.440
priors about time, and so on.
link |
00:08:43.520
That shows also in the way we learn,
link |
00:08:45.360
for instance, it's very easy for us to pick up language,
link |
00:08:48.920
it's very, very easy for us to learn certain things
link |
00:08:52.080
because we are basically hard coded to learn them.
link |
00:08:54.920
And we are specialized in solving certain kinds of problems
link |
00:08:58.280
and we are quite useless when it comes to other kinds of problems.
link |
00:09:01.440
For instance, we are not really designed
link |
00:09:06.160
to handle very long term problems.
link |
00:09:08.800
We have no capability of seeing the very long term.
link |
00:09:12.840
We don't have very much working memory, you know?
link |
00:09:17.840
So how do you think about long term?
link |
00:09:19.960
Do you think long term planning,
link |
00:09:21.280
we're talking about scale of years, millennia,
link |
00:09:24.760
what do you mean by long term, we're not very good?
link |
00:09:27.960
Well, human intelligence is specialized in the human experience
link |
00:09:30.600
and human experience is very short, like one lifetime is short.
link |
00:09:34.120
Even within one lifetime, we have a very hard time envisioning,
link |
00:09:38.600
you know, things on a scale of years.
link |
00:09:41.080
Like it's very difficult to project yourself at the scale of five,
link |
00:09:43.920
at the scale of 10 years and so on.
link |
00:09:46.080
Right. We can solve only fairly narrowly scoped problems.
link |
00:09:49.960
So when it comes to solving bigger problems, larger scale problems,
link |
00:09:53.720
we are not actually doing it on an individual level.
link |
00:09:56.320
So it's not actually our brain doing it.
link |
00:09:59.240
We have this thing called civilization, right?
link |
00:10:03.040
Which is itself a sort of problem solving system,
link |
00:10:06.600
a sort of artificial intelligence system, right?
link |
00:10:10.000
And it's not running on one brain, it's running on a network of brains.
link |
00:10:14.080
In fact, it's running on much more than a network of brains.
link |
00:10:16.760
It's running on a lot of infrastructure, like books and computers
link |
00:10:21.960
and the internet and human institutions and so on.
link |
00:10:25.760
And that is capable of handling problems on a much greater scale
link |
00:10:31.640
than any individual human.
link |
00:10:33.720
If you look at computer science, for instance,
link |
00:10:37.560
that's an institution that solves problems and it is super human, right?
link |
00:10:42.480
It operates on a greater scale, it can solve much bigger problems
link |
00:10:46.840
than an individual human could.
link |
00:10:49.040
And science itself, science as a system, as an institution,
link |
00:10:52.120
is a kind of artificially intelligent problem solving algorithm
link |
00:10:57.640
that is super human.
link |
00:10:59.360
Yeah, computer science is like a theorem prover
link |
00:11:06.080
at a scale of thousands, maybe hundreds of thousands of human beings.
link |
00:11:10.360
At that scale, what do you think is an intelligent agent?
link |
00:11:14.640
So there's us humans at the individual level.
link |
00:11:18.280
There are millions, maybe billions of bacteria on our skin.
link |
00:11:23.880
There is, that's at the smaller scale.
link |
00:11:26.400
You can even go to the particle level, as systems that behave,
link |
00:11:31.880
you can say, intelligently in some ways.
link |
00:11:35.400
And then you can look at the Earth as a single organism.
link |
00:11:37.840
You can look at our galaxy and even the universe as a single organism.
link |
00:11:42.160
Do you think, how do you think about scale and defining intelligent systems?
link |
00:11:46.320
And we're here at Google, there are millions of devices doing computation
link |
00:11:51.840
in a distributed way.
link |
00:11:53.400
How do you think about intelligence versus scale?
link |
00:11:55.880
You can always characterize anything as a system.
link |
00:12:00.640
I think people who talk about things like intelligence explosion
link |
00:12:05.320
tend to focus on one agent is basically one brain,
link |
00:12:08.760
like one brain considered in isolation, like a brain in a jar
link |
00:12:11.920
that's controlling a body in a very top to bottom kind of fashion.
link |
00:12:16.280
And that body is pursuing goals in an environment.
link |
00:12:19.480
So it's a very hierarchical view.
link |
00:12:20.720
You have the brain at the top of the pyramid,
link |
00:12:22.840
then you have the body just plainly receiving orders,
link |
00:12:25.960
then the body is manipulating objects in an environment and so on.
link |
00:12:28.920
So everything is subordinate to this one thing, this epicenter,
link |
00:12:33.680
which is the brain.
link |
00:12:34.760
But in real life, intelligent agents don't really work like this.
link |
00:12:39.240
There is no strong delimitation between the brain and the body to start with.
link |
00:12:43.400
You have to look not just at the brain, but at the nervous system.
link |
00:12:46.520
But then the nervous system and the body are not really two separate entities.
link |
00:12:50.760
So you have to look at an entire animal as one agent.
link |
00:12:53.960
But then you start realizing as you observe an animal over any length of time
link |
00:13:00.200
that a lot of the intelligence of an animal is actually externalized.
link |
00:13:04.600
That's especially true for humans.
link |
00:13:06.240
A lot of our intelligence is externalized.
link |
00:13:08.880
When you write down some notes, there is externalized intelligence.
link |
00:13:11.960
When you write a computer program, you are externalizing cognition.
link |
00:13:16.000
So it's externalized in books.
link |
00:13:17.320
It's externalized in computers, the internet, in other humans.
link |
00:13:23.040
It's externalized in language and so on.
link |
00:13:25.400
So there is no hard delimitation of what makes an intelligent agent.
link |
00:13:32.640
It's all about context.
link |
00:13:34.920
OK, but AlphaGo is better at Go than the best human player.
link |
00:13:42.440
There's levels of skill here.
link |
00:13:44.960
So do you think there is such a concept as an intelligence explosion
link |
00:13:52.680
in a specific task?
link |
00:13:54.720
And then, well, yeah, do you think it's possible to have a category of tasks
link |
00:14:00.080
on which you do have something like an exponential growth of ability
link |
00:14:05.000
to solve that particular problem?
link |
00:14:07.400
I think if you consider a specific vertical, it's probably possible to some extent.
link |
00:14:15.280
I also don't think we have to speculate about it
link |
00:14:18.320
because we have real world examples of recursively self improving
link |
00:14:24.760
intelligent systems.
link |
00:14:26.880
For instance, science is a problem solving system, a knowledge generation system,
link |
00:14:32.560
like a system that experiences the world in some sense
link |
00:14:36.240
and then gradually understands it and can act on it.
link |
00:14:40.120
And that system is superhuman and it is clearly recursively self improving
link |
00:14:45.560
because science fits into technology.
link |
00:14:47.520
Technology can be used to build better tools, better computers,
link |
00:14:51.120
better instrumentation and so on, which in turn can make science faster.
link |
00:14:56.720
So science is probably the closest thing we have today
link |
00:15:00.520
to a recursively self improving superhuman AI.
link |
00:15:04.720
And you can just observe, is science, is scientific progress today exploding,
link |
00:15:10.280
which itself is an interesting question.
link |
00:15:12.760
You can use that as a basis to try to understand what
link |
00:15:15.800
will happen with a superhuman AI that has science like behavior.
link |
00:15:20.960
Let me linger on it a little bit more.
link |
00:15:23.320
What is your intuition why an intelligence explosion is not possible?
link |
00:15:28.520
Like taking the scientific, all the semi scientific revolutions.
link |
00:15:34.400
Why can't we slightly accelerate that process?
link |
00:15:38.080
So you can absolutely accelerate any problem solving process.
link |
00:15:43.160
So recursively, recursive self improvement is absolutely a real thing.
link |
00:15:48.640
But what happens with a recursively self improving system
link |
00:15:51.880
is typically not explosion because no system exists in isolation.
link |
00:15:56.480
And so tweaking one part of the system means that suddenly another part of the system
link |
00:16:00.840
becomes a bottleneck.
link |
00:16:02.120
And if you look at science, for instance, which is clearly a recursively self improving,
link |
00:16:06.760
clearly a problem solving system, scientific progress is not actually exploding.
link |
00:16:11.960
If you look at science, what you see is the picture of a system that is consuming
link |
00:16:17.840
an exponentially increasing amount of resources.
link |
00:16:20.440
But it's having a linear output in terms of scientific progress.
link |
00:16:26.000
And maybe that will seem like a very strong claim.
link |
00:16:28.960
Many people are actually saying that scientific progress is exponential.
link |
00:16:34.520
But when they're claiming this, they're actually looking at indicators of resource
link |
00:16:40.000
consumption by science.
link |
00:16:43.080
For instance, the number of papers being published, the number of patents being
link |
00:16:49.200
filed, and so on, which are just completely correlated with how many people are working
link |
00:16:55.760
on science today.
link |
00:16:57.640
So it's actually an indicator of resource consumption.
link |
00:17:00.720
But what you should look at is the output is progress in terms of the knowledge that
link |
00:17:06.760
science generates in terms of the scope and significance of the problems that we solve.
link |
00:17:12.840
And some people have actually been trying to measure that.
link |
00:17:16.920
Like Michael Nielsen, for instance, he had a very nice paper, I think that was last
link |
00:17:22.800
year about it.
link |
00:17:25.280
So his approach to measure scientific progress was to look at the timeline of scientific
link |
00:17:32.760
discoveries over the past 100, 150 years.
link |
00:17:37.400
And for each major discovery, ask a panel of experts to rate the significance of the
link |
00:17:46.120
discovery.
link |
00:17:47.120
And if the output of science as an institution were exponential, you would expect the temporal
link |
00:17:54.440
density of significance to go up exponentially, maybe because there's a faster rate of discoveries,
link |
00:18:01.080
maybe because the discoveries are increasingly more important.
link |
00:18:05.120
And what actually happens if you plot this temporal density of significance measured
link |
00:18:10.360
in this way, is that you see very much a flat graph.
link |
00:18:14.600
You see a flat graph across all disciplines, across physics, biology, medicine and so on.
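For illustration, here is a minimal sketch of the kind of measurement being described: bin expert-rated discoveries by decade and sum the ratings. The function name and the (year, significance) pairs below are hypothetical placeholders, not Nielsen's actual data.

```python
from collections import defaultdict

def temporal_density(discoveries, bin_years=10):
    """Sum panel-rated significance per time bin (e.g., per decade)."""
    totals = defaultdict(float)
    for year, significance in discoveries:
        bin_start = (year // bin_years) * bin_years
        totals[bin_start] += significance
    return dict(sorted(totals.items()))

# Hypothetical placeholder ratings, just to show the shape of the computation.
sample = [(1905, 9.5), (1915, 9.0), (1953, 8.5), (1964, 7.0), (1998, 6.5)]
print(temporal_density(sample))
```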
link |
00:18:20.040
And it actually makes a lot of sense if you think about it, because think about the progress
link |
00:18:24.400
of physics 110 years ago.
link |
00:18:28.120
It was a time of crazy change.
link |
00:18:30.240
Think about the progress of technology 170 years ago, when we started replacing horses,
link |
00:18:36.640
with cars, when we started having electricity and so on.
link |
00:18:40.080
It was a time of incredible change.
link |
00:18:41.640
And today is also a time of very, very fast change.
link |
00:18:44.800
But it would be an unfair characterization to say that today, technology and science
link |
00:18:50.480
are moving way faster than they did 50 years ago or 100 years ago.
link |
00:18:54.600
And if you do try to rigorously plot the temporal density of the significance, you do see very
link |
00:19:08.800
flat curves and you can check out the paper that Michael Nielsen had about this idea.
link |
00:19:16.240
And so the way I interpret it is as you make progress in a given field or in a given subfield
link |
00:19:25.280
of science, it becomes exponentially more difficult to make further progress, like the
link |
00:19:30.640
very first person to work on information theory.
link |
00:19:35.120
If you enter a new field and it's still the very early years, there's a lot of low hanging
link |
00:19:40.320
fruit you can pick.
link |
00:19:42.200
But the next generation of researchers is going to have to dig much harder, actually,
link |
00:19:48.240
to make smaller discoveries, probably a larger number of smaller discoveries.
link |
00:19:52.800
And to achieve the same amount of impact, you're going to need a much greater head count.
link |
00:19:57.640
And that's exactly the picture you're seeing with science, is that the number of scientists
link |
00:20:02.840
and engineers is, in fact, increasing exponentially.
link |
00:20:06.680
The amount of computational resources that are available to science is increasing exponentially
link |
00:20:11.520
and so on.
link |
00:20:12.520
So the resource consumption of science is exponential, but the output in terms of progress,
link |
00:20:18.240
in terms of significance, is linear.
link |
00:20:21.160
And the reason why is because, even though science is recursively self improving, meaning
link |
00:20:26.200
that scientific progress turns into technological progress, which in turn helps science.
link |
00:20:33.000
If you look at computers, for instance, they are products of science, and computers are tremendously
link |
00:20:39.240
useful in speeding up science.
link |
00:20:41.600
The internet, same thing.
link |
00:20:42.600
The internet is a technology that's made possible by very recent scientific advances.
link |
00:20:47.680
And itself, because it enables scientists to network, to communicate, to exchange papers
link |
00:20:53.960
and ideas much faster, it is a way to speed up scientific progress.
link |
00:20:57.480
So even though you're looking at a recursively self improving system, it is consuming exponentially
link |
00:21:02.800
more resources to produce the same amount of problem solving, in fact.
link |
00:21:09.240
So that's a fascinating way to paint it.
link |
00:21:11.200
And certainly that holds for the deep learning community, right?
link |
00:21:14.960
If you look at the temporal, what did you call it?
link |
00:21:18.040
The temporal density of significant ideas.
link |
00:21:21.260
If you look at in deep learning, I think, I'd have to think about that, but if you really
link |
00:21:27.440
look at significant ideas in deep learning, they might even be decreasing.
link |
00:21:32.480
So I do believe the per paper significance is decreasing.
link |
00:21:39.720
But the number of papers today is still exponentially increasing.
link |
00:21:43.480
So I think if you look at an aggregate, my guess is that you would see a linear progress.
link |
00:21:49.480
If you were to sum the significance of all papers, you would see a roughly linear progress.
link |
00:21:58.720
And in my opinion, it is not a coincidence that you're seeing linear progress in science
link |
00:22:05.680
despite exponential resource consumption.
link |
00:22:07.640
I think the resource consumption is dynamically adjusting itself to maintain linear progress
link |
00:22:15.840
because we as a community expect linear progress, meaning that if we start investing less and
link |
00:22:21.360
seeing less progress, it means that suddenly there are some lower hanging fruits that become
link |
00:22:26.160
available and someone's going to step up and pick them.
link |
00:22:31.320
So it's very much like a market for discoveries and ideas.
link |
00:22:37.200
But there's another fundamental part which you're highlighting, which is a hypothesis:
link |
00:22:41.640
in science, or the space of ideas, any one path you travel down, it gets exponentially
link |
00:22:49.440
more difficult to develop new ideas.
link |
00:22:54.800
And your sense is that's going to hold across our mysterious universe.
link |
00:23:01.080
Yes.
link |
00:23:02.080
Well, exponential progress triggers exponential friction so that if you tweak one part of
link |
00:23:06.800
the system, suddenly some other part becomes a bottleneck.
link |
00:23:10.200
For instance, let's say we develop some device that measures its own acceleration and then
link |
00:23:17.440
it has some engine and it outputs even more acceleration in proportion of its own acceleration
link |
00:23:22.240
and you drop it somewhere.
link |
00:23:23.240
It's not going to reach infinite speed because it exists in a certain context.
link |
00:23:29.120
So the air around it is going to generate friction and it's going to block it at some
link |
00:23:32.960
top speed.
link |
00:23:34.440
And even if you were to consider a broader context and lift the bottleneck there, like
link |
00:23:39.880
the bottleneck of friction, then some other part of the system would start stepping in
link |
00:23:46.200
and creating exponential friction, maybe the speed of light or whatever.
link |
00:23:50.040
And this definitely holds true when you look at the problem solving algorithm that is being
link |
00:23:55.400
run by science as an institution, science as a system.
link |
00:23:59.780
As you make more and more progress, despite having this recursive self improvement component,
link |
00:24:06.880
you are encountering exponential friction, like the more researchers you have working
link |
00:24:11.840
on different ideas, the more overhead you have in terms of communication across researchers.
link |
00:24:18.200
If you look at, you were mentioning quantum mechanics, right?
link |
00:24:23.160
Well if you want to start making significant discoveries today, significant progress in
link |
00:24:28.480
quantum mechanics, there is an amount of knowledge you have to ingest, which is huge.
link |
00:24:34.200
But there is a very large overhead to even start to contribute, there is a large amount
link |
00:24:40.000
of overhead to synchronize across researchers and so on.
link |
00:24:44.240
And of course, the significant practical experiments are going to require exponentially
link |
00:24:50.720
expensive equipment because the easier ones have already been run, right?
link |
00:24:57.920
So your sense is, there is no way of escaping this kind of friction with artificial intelligence
link |
00:25:08.520
systems.
link |
00:25:09.520
Yeah, no, I think science is a very good way to model what would happen with a superhuman
link |
00:25:15.360
recursively self improving AI.
link |
00:25:17.880
That's my intuition.
link |
00:25:20.960
It's not like a mathematical proof of anything, that's not my point, like I'm not trying
link |
00:25:26.680
to prove anything, I'm just trying to make an argument to question the narrative of intelligence
link |
00:25:31.440
explosion, which is quite a dominant narrative and you do get a lot of pushback if you go
link |
00:25:35.600
against it.
link |
00:25:36.920
Because so for many people, right, AI is not just a subfield of computer science, it's
link |
00:25:43.280
more like a belief system, like this belief that the world is headed towards an event,
link |
00:25:49.560
the singularity, past which, you know, AI will become, will go exponential very much
link |
00:25:58.000
and the world will be transformed and humans will become obsolete.
link |
00:26:02.160
And if you go against this narrative, because it is not really a scientific argument but
link |
00:26:07.880
more of a belief system, it is part of the identity of many people.
link |
00:26:12.240
If you go against this narrative, it's like you're attacking the identity of people who
link |
00:26:15.680
believe in it.
link |
00:26:16.680
It's almost like saying God doesn't exist or something, so you do get a lot of pushback
link |
00:26:22.880
if you try to question these ideas.
link |
00:26:25.200
First of all, I believe most people, they might not be as eloquent or explicit as you're
link |
00:26:29.880
being, but most people in computer science, or most people who actually have built anything
link |
00:26:34.400
that you could call AI, quote unquote, would agree with you.
link |
00:26:39.160
They might not be describing it in the same kind of way, it's more, so the pushback you're
link |
00:26:43.880
getting is from people who get attached to the narrative from, not from a place of science,
link |
00:26:51.120
but from a place of imagination.
link |
00:26:53.520
That's correct.
link |
00:26:54.520
That's correct.
link |
00:26:55.520
So why do you think that's so appealing?
link |
00:26:57.240
Because the usual dreams that people have when you create a superintelligence system
link |
00:27:03.880
past the singularity, that what people imagine is somehow always destructive.
link |
00:27:09.520
Do you have, if you were put on your psychology hat, what's, why is it so?
link |
00:27:13.760
Why is it so appealing to imagine the ways that all of human civilization will be destroyed?
link |
00:27:20.200
I think it's a good story.
link |
00:27:22.200
You know, it's a good story.
link |
00:27:23.200
And very interestingly, it mirrors religious stories, right, religious mythology.
link |
00:27:30.680
If you look at the mythology of most civilizations, it's about the world being headed towards
link |
00:27:36.960
some final events in which the world will be destroyed and some new world order will
link |
00:27:42.240
arise that will be mostly spiritual, like the apocalypse followed by a paradise, probably.
link |
00:27:49.640
It's a very appealing story on a fundamental level.
link |
00:27:52.880
And we all need stories.
link |
00:27:54.640
We all need stories to structure the way we see the world, especially at timescales
link |
00:27:59.920
that are beyond our ability to make predictions.
link |
00:28:04.600
So on a more serious non exponential explosion question, do you think there will be a time
link |
00:28:14.920
when we'll create something like human level intelligence or intelligence systems that
link |
00:28:21.880
will make you sit back and be just surprised at damn how smart this thing is?
link |
00:28:28.720
That doesn't require exponential growth or an exponential improvement.
link |
00:28:32.360
But what's your sense of the timeline and so on, that you'll be really surprised at
link |
00:28:39.840
certain capabilities?
link |
00:28:40.840
And we'll talk about limitations and deep learning, so do you think in your lifetime
link |
00:28:44.360
you'll be really damn surprised?
link |
00:28:46.760
Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually.
link |
00:28:53.960
That was before we had assessed exactly what deep learning could do and could not do and
link |
00:28:57.880
it felt like a time of immense potential.
link |
00:29:00.680
And then we started narrowing it down.
link |
00:29:03.120
But I was very surprised, so I would say it has already happened.
link |
00:29:07.240
Was there a moment, there must have been a day in there where your surprise was almost
link |
00:29:13.640
bordering on the belief of the narrative that we just discussed?
link |
00:29:19.640
Was there a moment, because you've written quite eloquently about the limits of deep
link |
00:29:23.200
learning, was there a moment that you thought that maybe deep learning is limitless?
link |
00:29:28.600
No, I don't think I've ever believed this.
link |
00:29:32.520
What was really shocking is that it worked.
link |
00:29:35.120
It worked at all, yeah.
link |
00:29:37.800
But there's a big jump between being able to do really good computer vision and human
link |
00:29:43.880
level intelligence.
link |
00:29:45.040
So I don't think at any point I was under the impression that the results we got in computer
link |
00:29:50.840
vision meant that we were very close to human level intelligence.
link |
00:29:54.040
I don't think we're very close to human level intelligence.
link |
00:29:56.000
I do believe that there's no reason why we won't achieve it at some point.
link |
00:30:01.720
I also believe that the problem with talking about human level intelligence is that implicitly
link |
00:30:10.280
you're considering an axis of intelligence with different levels.
link |
00:30:13.920
But that's not really how intelligence works.
link |
00:30:17.200
Intelligence is very multidimensional.
link |
00:30:19.600
And so there's the question of capabilities, but there's also the question of being human
link |
00:30:24.440
like, and it's two very different things, like you can build potentially very advanced
link |
00:30:29.640
intelligent agents that are not human like at all.
link |
00:30:32.760
And you can also build very human like agents.
link |
00:30:35.320
And these are two very different things, right?
link |
00:30:37.920
Right.
link |
00:30:38.920
Let's go from the philosophical to the practical.
link |
00:30:42.360
Can you give me a history of Keras and all the major deep learning frameworks that you
link |
00:30:46.560
kind of remember in relation to Keras and in general, TensorFlow, Theano, the old days.
link |
00:30:51.600
Can you give a brief overview, Wikipedia style history, and your role in it before we return
link |
00:30:57.440
to AGI discussions?
link |
00:30:58.840
Yeah, that's a broad topic.
link |
00:31:00.840
So I started working on Keras, it wasn't named Keras at the time, I actually picked the
link |
00:31:06.800
name like just the day I was going to release it.
link |
00:31:09.920
So I started working on it in February 2015.
link |
00:31:15.040
And so at the time, there weren't too many people working on deep learning, maybe like
link |
00:31:18.440
fewer than 10,000, the software tooling was not really developed.
link |
00:31:25.480
So the main deep learning library was Caffe, which was mostly C++.
link |
00:31:30.960
Why do you say Caffe was the main one?
link |
00:31:33.040
Caffe was vastly more popular than Theano in late 2014, early 2015.
link |
00:31:39.120
Caffe was the one library that everyone was using for computer vision.
link |
00:31:43.480
And computer vision was the most popular problem.
link |
00:31:46.240
Absolutely.
link |
00:31:47.240
Like, convnets were like the subfield of deep learning that everyone was working on.
link |
00:31:53.280
So myself, so in late 2014, I was actually interested in RNNs, in recurrent neural networks,
link |
00:32:01.840
which was a very niche topic at the time, right, it really took off around 2016.
link |
00:32:08.800
And so I was looking for good tools.
link |
00:32:11.520
I had used Torch 7, I had used Theano, used Theano a lot in Kaggle competitions, I had
link |
00:32:19.480
used Caffe.
link |
00:32:21.240
And there was no like good solution for RNNs at the time, like there was no reusable open
link |
00:32:27.880
source implementation of an LSTM, for instance.
link |
00:32:30.280
So I decided to build my own.
link |
00:32:33.200
And at first, the pitch for that was it was going to be mostly around LSTM recurrent neural
link |
00:32:39.600
networks.
link |
00:32:40.600
So in Python, an important decision at the time that was kind of nonobvious is that the
link |
00:32:46.000
models would be defined via Python code, which was kind of like going against the mainstream
link |
00:32:54.520
at the time, because Caffe, PyLearn2 and so on, like all the big libraries were actually
link |
00:33:00.320
going with the approach of static configuration files in YAML to define models.
link |
00:33:05.840
So some libraries were using code to define models like Torch 7, obviously, but that was
link |
00:33:10.560
not Python.
link |
00:33:11.560
Lasagne was like a Theano based very early library that was, I think, developed,
link |
00:33:17.840
I don't remember exactly.
link |
00:33:18.840
Probably late 2014.
link |
00:33:19.840
It's Python as well.
link |
00:33:20.840
It's Python as well.
link |
00:33:21.840
It was like on top of Theano.
link |
00:33:25.040
And so I started working on something and the value proposition at the time was that not
link |
00:33:32.760
only did it have what I think was the first reusable open source implementation of LSTM, you could
link |
00:33:40.920
combine RNNs and convnets with the same library, which was not really possible before.
link |
00:33:47.080
Like Caffe was only doing convnets.
link |
00:33:50.760
And it was kind of easy to use.
link |
00:33:52.880
Because so before I was using Theano, I was actually using scikit-learn.
link |
00:33:55.760
And I loved scikit-learn for its usability.
link |
00:33:58.480
So I drew a lot of inspiration from scikit-learn when I made Keras.
link |
00:34:02.440
It's almost like scikit-learn for neural networks.
link |
00:34:05.680
The fit function.
link |
00:34:06.680
Exactly.
link |
00:34:07.680
The fit function.
link |
00:34:08.680
Like reducing a complex training loop to a single function call.
link |
00:34:13.000
And of course, some people will say, this is hiding a lot of details, but that's exactly
link |
00:34:17.480
the point.
link |
00:34:18.480
The magic is the point.
link |
00:34:20.360
So it's magical, but in a good way, it's magical in the sense that it's delightful.
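As a rough illustration of the workflow being described, here is a minimal sketch using the present-day tf.keras API; the data, layer sizes, and hyperparameters are arbitrary placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 1,000 samples with 20 features and binary labels.
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

# The model is defined in code, scikit-learn style: build, compile, fit.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The "magic" in question: a full training loop reduced to one function call.
model.fit(x, y, epochs=5, batch_size=32)
```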
link |
00:34:25.280
I'm actually quite surprised.
link |
00:34:27.600
I didn't know that it was born out of desire to implement RNNs and LSTMs.
link |
00:34:31.920
It was.
link |
00:34:32.920
That's fascinating.
link |
00:34:33.920
So you were actually one of the first people to really try to attempt to get the major
link |
00:34:39.160
architecture together.
link |
00:34:41.160
And it's also interesting, I mean, you realize that that was a design decision at all is
link |
00:34:45.160
defining the model and code.
link |
00:34:47.480
Just I'm putting myself in your shoes, whether the YAML, especially if Cafe was the most
link |
00:34:52.320
popular.
link |
00:34:53.320
It was the most popular by far.
link |
00:34:54.760
If I were, yeah, I didn't like the YAML thing, but it makes more sense
link |
00:35:01.880
that you would put in a configuration file the definition of a model.
link |
00:35:05.760
That's an interesting gutsy move to stick with defining it in code.
link |
00:35:10.160
Just if you look back, other libraries, we're doing it as well, but it was definitely the
link |
00:35:14.800
more niche option.
link |
00:35:16.200
Yeah.
link |
00:35:17.200
Okay.
link |
00:35:18.200
So Keras, and then?
link |
00:35:19.200
So I released Keras in March, 2015, and it got users pretty much from the start.
link |
00:35:24.220
So the deep learning community was very, very small at the time.
link |
00:35:27.480
Lots of people were starting to be interested in LSTMs.
link |
00:35:30.640
So it was released at the right time because it was offering an easy to use LSTM
link |
00:35:34.760
implementation.
link |
00:35:35.760
Exactly at the time where lots of people started to be intrigued by the capabilities of RNNs
link |
00:35:40.840
for NLP.
link |
00:35:42.340
So it grew from there.
link |
00:35:47.000
Then I joined Google about six months later, and that was actually completely unrelated
link |
00:35:53.760
to Keras.
link |
00:35:54.760
I actually joined a research team working on image classification, mostly like computer
link |
00:36:00.720
vision.
link |
00:36:01.720
So I was doing computer vision research at Google initially.
link |
00:36:03.840
And immediately when I joined Google, I was exposed to the early internal version of TensorFlow.
link |
00:36:11.440
And the way it appeared to me at the time, and it was definitely the way it was at the
link |
00:36:15.400
time, is that this was an improved version of Theano.
link |
00:36:20.880
So I immediately knew I had to port Keras to this new TensorFlow thing.
link |
00:36:27.040
And I was actually very busy as a new Googler.
link |
00:36:31.760
So I had not time to work on that.
link |
00:36:34.600
But then in November, I think it was November 2015, TensorFlow got released.
link |
00:36:41.360
And it was kind of like my wake up call that, hey, I had to actually go and make it happen.
link |
00:36:47.440
So in December, I ported Keras to run on TensorFlow, but it was not exactly a port.
link |
00:36:53.360
It was more like a refactoring where I was abstracting away all the backend functionality
link |
00:36:59.360
into one module so that the same code base could run on top of multiple backends.
link |
00:37:05.200
So on top of TensorFlow or Theano.
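As a toy sketch of the refactoring idea being described, all backend-specific calls can be funneled through one module exposing a common interface, so the layer code above it never changes. The module layout and function names here are hypothetical, not the actual Keras internals.

```python
# backend.py (hypothetical): one module that hides which engine is in use.
import os

_BACKEND = os.environ.get("MY_BACKEND", "tensorflow")

if _BACKEND == "tensorflow":
    import tensorflow as tf

    def matmul(a, b):
        return tf.matmul(a, b)

    def relu(x):
        return tf.nn.relu(x)
else:
    import numpy as np  # stand-in for a second backend such as Theano

    def matmul(a, b):
        return np.matmul(a, b)

    def relu(x):
        return np.maximum(x, 0)

# Layer code is written once against matmul/relu and runs on either backend.
```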
link |
00:37:07.560
And for the next year, Theano stayed as the default option, it was easier to use, it was
link |
00:37:21.000
much faster, especially when it came to RNNs.
link |
00:37:23.440
But eventually, TensorFlow overtook it.
link |
00:37:27.560
And TensorFlow, the early TensorFlow, had similar architectural decisions as Theano.
link |
00:37:34.000
So it was a natural transition.
link |
00:37:38.360
So what, I mean, at that point Keras was still a side, almost solo project, right?
link |
00:37:45.360
Yeah, so it was not my job assignment, it was not.
link |
00:37:50.280
I was doing it on the side.
link |
00:37:52.360
And even though it grew to have a lot of users for a deep learning library at the time, like
link |
00:37:57.840
throughout 2016, I wasn't doing it as my main job.
link |
00:38:02.560
So things started changing in, I think it must have been maybe October 2016, so one year
link |
00:38:10.680
later.
link |
00:38:11.680
So Rajat, who was the lead on TensorFlow, basically showed up one day in our building
link |
00:38:18.440
where I was doing like, so I was doing research and things like, so I did a lot of computer
link |
00:38:23.040
vision research, also collaborations with Christian Szegedy and deep learning for theorem
link |
00:38:29.040
proving, that was a really interesting research topic.
link |
00:38:34.720
And so Rajat was saying, hey, we saw Keras, we like it, we saw that you're at Google, why
link |
00:38:42.600
don't you come over for like a quarter and work with us?
link |
00:38:46.960
And I was like, yeah, that sounds like a great opportunity, let's do it.
link |
00:38:50.560
And so I started working on integrating the Keras API into TensorFlow more tightly.
link |
00:38:57.520
So what followed up is a sort of temporary TensorFlow only version of Keras that was
link |
00:39:06.000
in TensorFlow.contrib for a while, and finally moved to TensorFlow Core.
link |
00:39:12.560
And I've never actually gotten back to my old team doing research.
link |
00:39:17.320
Well, it's kind of funny that somebody like you who dreams of or at least sees the power
link |
00:39:27.360
of AI systems that reason and Theraim Proving will talk about has also created a system
link |
00:39:33.800
that makes the most basic kind of Lego building that is deep learning, super accessible, super
link |
00:39:41.600
easy, so beautifully so.
link |
00:39:43.840
It's a funny irony that you're both, you're responsible for both things.
link |
00:39:50.280
So TensorFlow 2.0 is kind of, there's a sprint, I don't know how long it'll take, but there's
link |
00:39:55.360
a sprint towards the finish.
link |
00:39:57.080
What do you look, what are you working on these days?
link |
00:40:01.120
What are you excited about?
link |
00:40:02.120
What are you excited about in 2.0?
link |
00:40:05.040
Eager execution, there's so many things that just make it a lot easier to work.
link |
00:40:09.880
What are you excited about?
link |
00:40:11.640
And what's also really hard?
link |
00:40:13.800
What are the problems you have to kind of solve?
link |
00:40:15.880
So I've spent the past year and a half working on TensorFlow 2.0 and it's been a long journey.
link |
00:40:22.880
I'm actually extremely excited about it.
link |
00:40:25.040
I think it's a great product.
link |
00:40:26.560
It's a delightful product compared to TensorFlow 1.0.
link |
00:40:29.440
We've made huge progress.
link |
00:40:32.800
So on the Keras side, what I'm really excited about is that, so previously Keras has been
link |
00:40:40.640
this very easy to use high level interface to do deep learning, but if you wanted to,
link |
00:40:50.880
if you wanted a lot of flexibility, the Keras framework was probably not the optimal way
link |
00:40:57.760
to do things compared to just writing everything from scratch.
link |
00:41:02.160
So in some way, the framework was getting in the way.
link |
00:41:05.040
And in TensorFlow 2.0, you don't have this at all, actually.
link |
00:41:08.280
You have the usability of the high level interface, but you have the flexibility of this lower
link |
00:41:13.600
level interface, and you have this spectrum of workflows where you can get more or less
link |
00:41:20.520
usability and flexibility, the tradeoffs, depending on your needs.
link |
00:41:26.960
You can write everything from scratch and you get a lot of help doing so by subclassing
link |
00:41:33.800
models and writing some train loops using eager execution.
link |
00:41:38.520
It's very flexible.
link |
00:41:39.520
It's very easy to debug.
link |
00:41:40.520
It's very powerful.
link |
00:41:42.400
But all of this integrates seamlessly with higher level features up to the classic Keras
link |
00:41:48.600
workflows, which are very scikit-learn-like and ideal for a data scientist, machine learning
link |
00:41:56.440
engineer type of profile.
link |
00:41:58.320
So now you can have the same framework offering the same set of APIs that enable a spectrum
link |
00:42:04.320
of workflows that are lower level, more or less high level, that are suitable for profiles
link |
00:42:11.000
ranging from researchers to data scientists and everything in between.
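Here is a compressed sketch of the lower-level end of that spectrum, using standard TensorFlow 2.0 APIs: a subclassed model plus a hand-written training loop under eager execution. The model, data, and step count are arbitrary placeholders.

```python
import numpy as np
import tensorflow as tf

class TinyModel(tf.keras.Model):
    # Subclassing gives full control over what the model computes.
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(32, activation="relu")
        self.out = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.out(self.hidden(inputs))

model = TinyModel()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")

# A custom training loop: easy to step through and debug under eager execution,
# yet the same model still plugs into the high-level Keras workflows.
for step in range(10):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```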
link |
00:42:15.400
Yeah.
link |
00:42:16.400
So that's super exciting.
link |
00:42:17.400
I mean, it's not just that.
link |
00:42:18.600
It's connected to all kinds of tooling.
link |
00:42:21.560
You can go on mobile, you can go with TensorFlow Lite, you can go in the cloud or serving
link |
00:42:26.760
and so on, it all is connected together.
link |
00:42:29.240
Some of the best software written ever is often done by one person, sometimes two.
link |
00:42:37.440
So within Google, you're now seeing sort of Keras having to be integrated in TensorFlow.
link |
00:42:42.920
I'm sure it has a ton of engineers working on it.
link |
00:42:46.520
So I'm sure there are a lot of tricky design decisions to be made.
link |
00:42:52.320
How does that process usually happen?
link |
00:42:54.600
At least your perspective, what are the debates like?
link |
00:43:00.800
Is there a lot of thinking considering different options and so on?
link |
00:43:07.160
Yes.
link |
00:43:08.160
So a lot of the time I spend at Google is actually on design discussions, writing design
link |
00:43:17.920
docs, participating in design review meetings and so on.
link |
00:43:22.200
This is as important as actually writing the code.
link |
00:43:25.520
So there's a lot of thought and a lot of care that is taken in coming up with these decisions
link |
00:43:34.080
and taking into account all of our users because TensorFlow has this extremely diverse user
link |
00:43:39.920
base.
link |
00:43:40.920
It's not like just one user segment where everyone has the same needs.
link |
00:43:45.560
We have small scale production users, large scale production users.
link |
00:43:49.640
We have startups, we have researchers, it's all over the place, and we have to cater to
link |
00:43:56.520
all of their needs.
link |
00:43:57.520
If I just look at the standard debates of C++ or Python, there's some heated debates.
link |
00:44:04.160
Do you have those at Google?
link |
00:44:05.680
I mean, they're not heated in terms of emotionally, but there's probably multiple ways to do it,
link |
00:44:10.560
right?
link |
00:44:11.560
So how do you arrive through those design meetings at the best way to do it, especially in deep
link |
00:44:16.080
learning where the field is evolving as you're doing it?
link |
00:44:21.960
Is there some magic to it?
link |
00:44:23.440
Is there some magic to the process?
link |
00:44:25.240
I don't know if there's magic to the process, but there definitely is a process.
link |
00:44:30.800
So making design decisions is about satisfying a set of constraints, but also trying to do
link |
00:44:37.240
so in the simplest way possible because this is what can be maintained, this is what can
link |
00:44:42.720
be expanded in the future.
link |
00:44:45.080
So you don't want to naively satisfy the constraints by just, you know, for each capability you
link |
00:44:51.200
need available, you're going to come up with one argument in your API and so on.
link |
00:44:54.760
You want to design APIs that are modular and hierarchical so that they have an API surface
link |
00:45:03.920
that is as small as possible, right?
link |
00:45:07.520
And you want this modular hierarchical architecture to reflect the way that domain experts think
link |
00:45:14.800
about the problem because as a domain expert, when you're reading about a new API, you're
link |
00:45:19.960
reading a tutorial or some docs, pages, you already have a way that you're thinking about
link |
00:45:27.120
the problem.
link |
00:45:28.120
You already have certain concepts in mind and you're thinking about how they relate together
link |
00:45:35.600
and when you're reading docs, you're trying to build as quickly as possible a mapping
link |
00:45:41.280
between the concepts featured in your API and the concepts in your mind so you're trying
link |
00:45:47.240
to map your mental model as a domain expert to the way things work in the API.
link |
00:45:53.720
So you need an API and an underlying implementation that are reflecting the way people think about
link |
00:45:59.320
these things.
link |
00:46:00.320
So you're minimizing the time it takes to do the mapping?
link |
00:46:02.960
Yes.
link |
00:46:03.960
Minimizing the time, the cognitive load there is in ingesting this new knowledge about your
link |
00:46:10.000
API.
link |
00:46:11.000
An API should not be self referential or referring to implementation details, it should only
link |
00:46:16.080
be referring to domain specific concepts that people already understand.
link |
00:46:22.360
Brilliant.
link |
00:46:24.560
So what's the future of Keras and TensorFlow look like?
link |
00:46:27.640
What does TensorFlow 3.0 look like?
link |
00:46:30.680
So that's kind of too far in the future for me to answer, especially since I'm not even
link |
00:46:36.440
the one making these decisions.
link |
00:46:39.480
But so from my perspective, which is just one perspective among many different perspectives
link |
00:46:44.840
on the TensorFlow team, I'm really excited by developing even higher level APIs, higher
link |
00:46:52.600
level than Keras.
link |
00:46:53.600
I'm really excited by hyperparameter tuning, by automated machine learning, AutoML.
link |
00:47:01.040
I think the future is not just defining a model like you were assembling Lego blocks
link |
00:47:07.480
and then calling fit on it, it's more like an automagical model that would just look
link |
00:47:14.280
at your data and optimize the objective you're after.
link |
00:47:19.120
So that's what I'm looking into.
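As a very rough sketch of the direction being described, here is a naive hand-rolled random search over a couple of hyperparameters; real AutoML and hyperparameter-tuning systems are far more sophisticated, and the search space and data below are placeholders.

```python
import random
import numpy as np
import tensorflow as tf

# Placeholder data.
x = np.random.rand(500, 10).astype("float32")
y = np.random.randint(0, 2, size=(500,))

def build_model(units, lr):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Naive search: sample configurations, train briefly, keep the best one.
best_acc, best_config = 0.0, None
for _ in range(5):
    config = {"units": random.choice([16, 32, 64]),
              "lr": random.choice([1e-2, 1e-3, 1e-4])}
    model = build_model(**config)
    history = model.fit(x, y, epochs=3, validation_split=0.2, verbose=0)
    acc = history.history["val_accuracy"][-1]
    if acc > best_acc:
        best_acc, best_config = acc, config

print(best_config, best_acc)
```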
link |
00:47:22.440
Yes.
link |
00:47:23.440
So you put the baby into a room with the problem and come back a few hours later with a fully
link |
00:47:30.120
solved problem.
link |
00:47:31.120
Exactly.
link |
00:47:32.120
It's not like a box of Legos, it's more like the combination of a kid that's really good
link |
00:47:36.520
at Legos, and a box of Legos, and it's just building the thing on its own.
link |
00:47:41.560
Very nice.
link |
00:47:42.760
So that's an exciting feature.
link |
00:47:44.080
I think there's a huge amount of applications and revolutions to be had under the constraints
link |
00:47:50.680
of the discussion we previously had.
link |
00:47:52.800
But what do you think are the current limits of deep learning?
link |
00:47:57.520
If we look specifically at these function approximators that tries to generalize from
link |
00:48:05.200
data?
link |
00:48:06.200
If you've talked about local versus extreme generalization, you mentioned that neural
link |
00:48:11.800
networks don't generalize well and humans do, so there's this gap.
link |
00:48:17.840
And you've also mentioned that extreme generalization requires something like reasoning to fill those
link |
00:48:22.840
gaps.
link |
00:48:24.040
So how can we start trying to build systems like that?
link |
00:48:27.120
Right.
link |
00:48:28.120
Yes.
link |
00:48:29.120
So this is by design, right?
link |
00:48:30.640
And deep learning models are huge, parametric models, differentiable, so continuous, that
link |
00:48:39.600
go from an input space to an output space.
link |
00:48:42.840
And they're trained with gradient descent, so they're trained pretty much point by point.
link |
00:48:46.560
They're learning a continuous geometric morphing from an input vector space to an output vector
link |
00:48:53.560
space, right?
link |
00:48:55.640
And because this is done point by point, a deep neural network can only make sense of
link |
00:49:02.920
points in experience space that are very close to things that it has already seen in training
link |
00:49:08.160
data.
link |
00:49:09.160
At best, it can do interpolation across points.
link |
00:49:14.040
But that means in order to train your network, you need a dense sampling of the input cross
link |
00:49:20.560
output space, almost a point by point sampling, which can be very expensive if you're dealing
link |
00:49:27.040
with complex real world problems like autonomous driving, for instance, or robotics.
link |
00:49:33.760
It's doable if you're looking at the subset of the visual space.
link |
00:49:37.240
But even then, it's still fairly expensive, you still need millions of examples.
link |
00:49:41.200
And it's only going to be able to make sense of things that are very close to what it's
link |
00:49:45.600
seen before.
link |
00:49:47.000
And in contrast to that, well, of course, you have human intelligence, but even if you're
link |
00:49:50.720
not looking at human intelligence, you can look at very simple rules, algorithms.
link |
00:49:56.840
If you have a symbolic rule, it can actually apply to a very, very large set of inputs
link |
00:50:03.080
because it is abstract.
link |
00:50:04.920
It is not obtained by doing a point by point mapping, right?
link |
00:50:10.760
For instance, if you try to learn a sorting algorithm using a deep neural network, well,
link |
00:50:15.640
you're very much limited to learning point by point what the sorted representation of
link |
00:50:21.800
this specific list is like.
link |
00:50:24.520
But instead, you could have a very, very simple sorting algorithm written in a few lines.
link |
00:50:32.120
Maybe it's just two nested loops.
link |
00:50:35.720
And it can process any list at all because it is abstract, because it is a set of rules.
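As a concrete illustration of that contrast, the abstract version of sorting really can be just two nested loops; a sketch in Python:

    def bubble_sort(items):
        # Two nested loops: an abstract rule that applies to any list,
        # not a point by point mapping learned from examples.
        items = list(items)
        for i in range(len(items)):
            for j in range(len(items) - 1 - i):
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
        return items

    print(bubble_sort([3, 1, 2]))        # works on this list...
    print(bubble_sort([9.5, -4, 0, 7]))  # ...and on any other list, seen or unseen

The same few lines handle lists of any length and contents, with no training data at all.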
link |
00:50:42.320
So deep learning is really like point by point geometric morphings, morphings trained with
link |
00:50:47.440
gradient descent.
link |
00:50:48.880
And meanwhile, abstract rules can generalize much better.
link |
00:50:54.200
And I think the future is really to combine the two.
link |
00:50:56.400
So how do we, do you think, combine the two?
link |
00:50:59.720
How do we combine good point by point functions with programs, which is what the symbolic AI
link |
00:51:08.040
type systems use?
link |
00:51:09.040
Yeah.
link |
00:51:10.040
At which level does the combination happen?
link |
00:51:11.600
I mean, obviously, we're jumping into the realm of where there's no good answers.
link |
00:51:17.480
It's just kind of ideas and intuitions and so on.
link |
00:51:20.120
Yeah.
link |
00:51:21.120
Well, if you look at the really successful AI systems today, I think there are already
link |
00:51:25.200
hybrid systems that are combining symbolic AI with deep learning.
link |
00:51:29.600
For instance, successful robotics systems are already mostly model based, rule based
link |
00:51:36.120
things like planning algorithms and so on.
link |
00:51:39.560
At the same time, they're using deep learning as perception modules.
link |
00:51:44.320
Sometimes they're using deep learning as a way to inject fuzzy intuition into a rule
link |
00:51:49.120
based process.
link |
00:51:51.000
If you look at a system like a self driving car, it's not just one big end to end neural
link |
00:51:56.720
network that wouldn't work at all, precisely because in order to train that, you would
link |
00:52:00.920
need a dense sampling of experience space when it comes to driving, which is completely
link |
00:52:06.960
unrealistic, obviously.
link |
00:52:08.480
Instead, the self driving car is mostly symbolic, it's software, it's programmed by hand.
link |
00:52:18.560
It's mostly based on explicit models, in this case, mostly 3D models of the environment
link |
00:52:25.760
around the car, but it's interfacing with the real world, using deep learning modules.
link |
00:52:31.600
The deep learning there serves as a way to convert the raw sensory information to something
link |
00:52:36.480
usable by symbolic systems.
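A toy sketch of that division of labor, with entirely made-up names and thresholds, just to show the shape of such a hybrid system: a learned perception model produces a symbolic world state, and hand-written rules decide what to do with it.

    def perceive(detections):
        # 'detections' stands in for the output of a trained neural network
        # (an object detector) applied to a raw camera frame.
        return {
            "pedestrian_ahead": any(d["label"] == "person" for d in detections),
            "lane_offset_m": next((d["offset_m"] for d in detections if d["label"] == "lane"), 0.0),
        }

    def plan(state):
        # Hand-written, rule-based decision logic over the symbolic world state.
        if state["pedestrian_ahead"]:
            return {"throttle": 0.0, "brake": 1.0, "steer": 0.0}
        return {"throttle": 0.3, "brake": 0.0, "steer": -0.5 * state["lane_offset_m"]}

    state = perceive([{"label": "lane", "offset_m": 0.4}])
    print(plan(state))  # gentle steer back toward the lane center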
link |
00:52:38.600
Okay, well, let's linger on that a little more.
link |
00:52:42.440
So dense sampling from input to output, you said it's obviously very difficult.
link |
00:52:48.400
Is it possible?
link |
00:52:49.400
In the case of self driving, you mean?
link |
00:52:51.960
Let's say self driving, right?
link |
00:52:53.240
Self driving for many people.
link |
00:52:57.760
Let's not even talk about self driving, let's talk about steering, so staying inside the
link |
00:53:03.320
lane.
link |
00:53:05.320
It's definitely a problem you can solve with an end to end deep learning model, but that's
link |
00:53:09.200
like one small subset.
link |
00:53:10.200
Hold on a second, I don't know how you're jumping from the extreme so easily, because
link |
00:53:14.600
I disagree with you on that.
link |
00:53:17.800
I think, well, it's not obvious to me that you can solve lane following.
link |
00:53:23.240
No, it's not obvious, I think it's doable.
link |
00:53:25.720
I think in general, there is no hard limitations to what you can learn with a deep neural network,
link |
00:53:33.800
as long as the search space is rich enough, is flexible enough, and as long as you have
link |
00:53:42.160
this dense sampling of the input cross output space, the problem is that this dense sampling
link |
00:53:47.640
could mean anything from 10,000 examples to trillions and trillions.
link |
00:53:52.920
So that's my question.
link |
00:53:54.440
So what's your intuition?
link |
00:53:56.360
And if you could just give it a chance and think what kind of problems can be solved
link |
00:54:01.800
by getting huge amounts of data and thereby creating a dense mapping.
link |
00:54:08.080
So let's think about natural language dialogue, the Turing test.
link |
00:54:14.040
Do you think the Turing test can be solved with a neural network alone?
link |
00:54:20.080
Well, the Turing test is all about tricking people into believing they're talking to a
link |
00:54:26.480
human.
link |
00:54:27.480
It's actually very difficult because it's more about exploiting human perception and
link |
00:54:35.720
not so much about intelligence.
link |
00:54:37.680
There's a big difference between mimicking intelligent behavior and actual intelligent
link |
00:54:41.520
behavior.
link |
00:54:42.520
So, okay, let's look at maybe the Alexa prize and so on, the different formulations of the
link |
00:54:46.680
natural language conversation that are less about mimicking and more about maintaining
link |
00:54:51.720
a fun conversation that lasts for 20 minutes.
link |
00:54:54.920
It's a little less about mimicking and that's more about, I mean, it's still mimicking,
link |
00:54:59.240
but it's more about being able to carry forward a conversation with all the tangents that
link |
00:55:03.200
happen in dialogue and so on.
link |
00:55:05.120
Do you think that problem is learnable with this kind of neural network that does the
link |
00:55:12.480
point to point mapping?
link |
00:55:14.600
So I think it would be very, very challenging to do this with deep learning.
link |
00:55:17.800
I don't think it's out of the question either.
link |
00:55:21.480
I wouldn't rule it out.
link |
00:55:23.440
The space of problems that can be solved with a large neural network.
link |
00:55:27.080
What's your sense about the space of those problems?
link |
00:55:31.280
Useful problems for us.
link |
00:55:32.680
In theory, it's infinite.
link |
00:55:33.960
You can solve any problem.
link |
00:55:36.320
In practice, while deep learning is a great fit for perception problems, in general, any
link |
00:55:45.400
problem which is not naturally amenable to explicit handcrafted rules or rules that you can generate
link |
00:55:52.120
by exhaustive search over some program space.
link |
00:55:56.160
So perception, artificial intuition, as long as you have a sufficient training data set.
link |
00:56:03.400
And that's the question.
link |
00:56:04.400
I mean, perception, there's interpretation and understanding of the scene, which seems
link |
00:56:08.800
to be outside the reach of current perception systems.
link |
00:56:13.040
So do you think larger networks will be able to start to understand the
link |
00:56:19.240
physics of the scene, the three dimensional structure and relationships of objects in
link |
00:56:23.960
the scene, and so on?
link |
00:56:25.720
Or really, that's where symbolic AI has to step in?
link |
00:56:28.880
Well, it's always possible to solve these problems with deep learning, it's just extremely
link |
00:56:37.680
inefficient.
link |
00:56:38.680
An explicit rule based abstract model would be a far better, more compressed
link |
00:56:45.240
representation of physics than learning just this mapping between in this situation, this
link |
00:56:50.280
thing happens.
link |
00:56:51.280
If you change the situation slightly, then this other thing happens and so on.
link |
00:56:54.520
Do you think it's possible to automatically generate the programs that would require that
link |
00:57:00.840
kind of reasoning?
link |
00:57:01.840
Or does it have to, so where expert systems fail, there's so many facts about the world
link |
00:57:07.120
had to be hand coded in.
link |
00:57:08.640
Do you think it's possible to learn those logical statements that are true about the
link |
00:57:15.360
world and their relationships?
link |
00:57:17.120
I mean, that's kind of what theorem proving at a basic level is trying to do, right?
link |
00:57:22.640
Yeah, except it's much harder to formulate statements about the world compared to formulating
link |
00:57:28.360
mathematical statements.
link |
00:57:30.680
Statements about the world tend to be subjective.
link |
00:57:34.320
So can you learn rule based models?
link |
00:57:39.320
Yes.
link |
00:57:40.320
Yes, definitely.
link |
00:57:41.320
That's the field of program synthesis.
link |
00:57:43.720
However, today we just don't really know how to do it.
link |
00:57:48.080
So it's very much a graph search or tree search problem.
link |
00:57:52.640
And so we are limited to the sort of tree search and graph search algorithms that we have
link |
00:57:58.080
today.
link |
00:57:59.080
Personally, I think genetic algorithms are very promising.
link |
00:58:02.080
So it's almost like genetic programming.
link |
00:58:04.640
Genetic programming, exactly.
link |
00:58:05.760
Can you discuss the field of program synthesis, like what, how many people are working and
link |
00:58:12.200
thinking about it?
link |
00:58:13.840
What, where we are in the history of program synthesis and what are your hopes for it?
link |
00:58:20.360
Well, if you compare it to deep learning, this is like the 90s.
link |
00:58:24.760
So meaning that we already have existing solutions.
link |
00:58:29.320
We are starting to have some basic understanding of what this is about.
link |
00:58:35.720
But it's still a field that is in its infancy.
link |
00:58:38.120
There are very few people working on it.
link |
00:58:40.560
There are very few real world applications.
link |
00:58:44.520
So the one real world application I'm aware of is Flash Fill in Excel.
link |
00:58:51.960
It's a way to automatically learn very simple programs to format cells in an Excel spreadsheet
link |
00:58:58.240
from a few examples.
link |
00:58:59.840
For instance, learning a way to format a date, things like that.
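To give a flavor of what learning a small program from a few examples means, here is a toy enumeration over a tiny, made-up domain-specific language of string transforms; Flash Fill itself uses a far richer DSL and a much smarter search, so this is only an illustrative sketch.

    from itertools import product

    # A tiny DSL: each primitive is a string-to-string function.
    PRIMITIVES = {
        "lower": str.lower,
        "upper": str.upper,
        "strip": str.strip,
        "first_word": lambda s: s.split()[0] if s.split() else s,
    }

    def synthesize(examples, max_depth=2):
        # Exhaustive search over compositions of primitives consistent with all examples.
        for depth in range(1, max_depth + 1):
            for names in product(PRIMITIVES, repeat=depth):
                def program(s, names=names):
                    for n in names:
                        s = PRIMITIVES[n](s)
                    return s
                if all(program(inp) == out for inp, out in examples):
                    return names  # the discovered program, as a pipeline of rules
        return None

    # Two input/output examples are enough to pin down a program here.
    print(synthesize([("  Ada Lovelace ", "ada"), (" Alan Turing", "alan")]))
    # -> a pipeline such as ('lower', 'first_word')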
link |
00:59:02.840
Oh, that's fascinating.
link |
00:59:03.840
Yeah.
link |
00:59:04.840
You know, okay, that's a fascinating topic.
link |
00:59:06.280
I was wondering when I provide a few samples to Excel, what it's able to figure out, like
link |
00:59:12.880
just giving it a few dates, what are you able to figure out from the pattern I just gave
link |
00:59:18.280
you?
link |
00:59:19.280
That's a fascinating question.
link |
00:59:20.280
It's fascinating whether that's learnable patterns and you're saying they're working
link |
00:59:24.240
on that.
link |
00:59:25.240
Yeah.
link |
00:59:26.240
How big is the toolbox currently?
link |
00:59:27.240
Yeah.
link |
00:59:28.240
Are we completely in the dark?
link |
00:59:29.240
So you said the 90s.
link |
00:59:30.240
In terms of program synthesis?
link |
00:59:32.240
No.
link |
00:59:33.240
So I would say, so maybe 90s is even too optimistic because by the 90s, you know, we already understood
link |
00:59:40.520
backprop.
link |
00:59:41.520
We already understood, you know, the engine of deep learning, even though we couldn't
link |
00:59:44.720
really see its potential yet. Today, I don't think we have found the engine of program synthesis.
link |
00:59:50.440
So we're in the winter before backprop.
link |
00:59:52.960
Yeah.
link |
00:59:53.960
In a way, yes.
link |
00:59:55.760
So I do believe program synthesis, in general, discrete search over rule based models is going
link |
01:00:02.400
to be a cornerstone of AI research in the next century, right?
link |
01:00:06.960
And that doesn't mean we're going to drop deep learning.
link |
01:00:10.240
Deep learning is immensely useful.
link |
01:00:11.960
Like being able to learn this is a very flexible, adaptable, parametric models, that's actually
link |
01:00:19.480
immensely useful.
link |
01:00:20.480
Like all it's doing is pattern recognition, but being good at pattern recognition, given
link |
01:00:24.960
lots of data is just extremely powerful.
link |
01:00:27.880
So we are still going to be working on deep learning and we're going to be working on
link |
01:00:31.000
program synthesis.
link |
01:00:32.000
We're going to be combining the two in increasingly automated ways.
link |
01:00:36.520
So let's talk a little bit about data.
link |
01:00:38.640
You've tweeted that about 10,000 deep learning papers have been written about hard coding
link |
01:00:46.120
priors about a specific task into a neural network architecture, and that it works better than
link |
01:00:50.280
a lack of a prior.
link |
01:00:52.760
By summarizing all these efforts, they put a name to an architecture, but really what
link |
01:00:57.480
they're doing is hard coding some priors that improve the performance of the system.
link |
01:01:01.680
But to get straight to the point, it's probably true.
link |
01:01:07.000
So you say that you can always buy performance, buy in quotes performance by either training
link |
01:01:12.080
on more data, better data, or by injecting task information into the architecture or the
link |
01:01:17.520
preprocessing.
link |
01:01:18.520
However, this is not informative about the generalization power of the technique used, the fundamental
link |
01:01:22.720
ability to generalize.
link |
01:01:23.720
Do you think we can go far by coming up with better methods for this kind of cheating,
link |
01:01:30.040
for better methods of large scale annotation of data, so building better priors?
link |
01:01:35.320
If you've made it, it's not cheating anymore.
link |
01:01:37.400
Right.
link |
01:01:38.400
I'm joking about the cheating, but large scale, so basically I'm asking about something
link |
01:01:46.480
that hasn't, from my perspective, been researched too much: exponential improvement in annotation
link |
01:01:54.300
of data.
link |
01:01:56.800
You often think about...
link |
01:01:58.120
I think it's actually been researched quite a bit.
link |
01:02:00.880
You just don't see publications about it, because people who publish papers are going
link |
01:02:06.120
to publish about known benchmarks, sometimes they're going to create a new benchmark.
link |
01:02:10.000
People who actually have real world large scale deep learning problems, they're going to spend
link |
01:02:14.360
a lot of resources into data annotation and good data annotation pipelines, but you don't
link |
01:02:18.800
see any papers about it.
link |
01:02:19.800
That's interesting.
link |
01:02:20.800
Do you think there are certain resources, but do you think there's innovation happening?
link |
01:02:24.600
Oh, yeah.
link |
01:02:25.920
To clarify the point in the tweet, machine learning in general is the science of generalization.
link |
01:02:33.960
You want to generate knowledge that can be reused across different datasets, across different
link |
01:02:41.080
tasks.
link |
01:02:42.680
If instead you're looking at one dataset, and then you are hard coding knowledge about
link |
01:02:49.320
this task into your architecture, this is no more useful than training a network and
link |
01:02:55.920
then saying, oh, I found these weight values perform well.
link |
01:03:03.160
David Ha, I don't know if you know David, he had a paper the other day about weight
link |
01:03:08.720
agnostic neural networks, and this is very interesting paper because it really illustrates
link |
01:03:13.840
the fact that an architecture, even without weight, an architecture is a knowledge about
link |
01:03:20.800
a task.
link |
01:03:21.800
It encodes knowledge.
link |
01:03:24.280
When it comes to architectures that are handcrafted by researchers, in some cases, it is very,
link |
01:03:31.560
very clear that all they are doing is artificially reencoding the template that corresponds
link |
01:03:39.400
to the proper way to solve the task, encoded in a given dataset.
link |
01:03:45.240
For instance, if you've looked at the bAbI dataset, which is about natural language
link |
01:03:52.120
question answering, it is generated by an algorithm.
link |
01:03:55.800
These are question answer pairs that are generated by an algorithm.
link |
01:03:59.320
The algorithm is following a certain template.
link |
01:04:01.680
Turns out, if you craft a network that literally encodes this template, you can solve this
link |
01:04:06.760
dataset with nearly 100% accuracy, but that doesn't actually tell you anything about how
link |
01:04:13.160
to solve question answering in general, which is the point.
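One way to see the claim that an architecture encodes knowledge is in code: a convolutional layer hardcodes the prior that the same local pattern can appear anywhere in an image, while a plain dense layer assumes nothing about spatial structure. A minimal Keras comparison, with arbitrary shapes and layer sizes:

    from tensorflow import keras

    # Dense model: no spatial prior, every pixel-to-unit connection is learned from scratch.
    dense_model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

    # Convolutional model: weight sharing and locality hardcode translation invariance.
    conv_model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ])

    dense_model.summary()
    conv_model.summary()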
link |
01:04:17.760
The question, just to linger on it, is whether it's from the data side or from the size of
link |
01:04:21.560
the network.
link |
01:04:22.560
I don't know if you've read the blog post by Rich Sutton, The Bitter Lesson, where he
link |
01:04:27.960
says the biggest lesson that we can read from 70 years of AI research is that general methods
link |
01:04:33.480
that leverage computation are ultimately the most effective.
link |
01:04:38.120
As opposed to figuring out methods that can generalize effectively, do you think we can
link |
01:04:45.520
get pretty far by just having something that leverages computation and the improvement of
link |
01:04:50.720
computation?
link |
01:04:51.720
Yes.
link |
01:04:52.720
I think Rich is making a very good point, which is that a lot of these papers, which
link |
01:04:56.880
are actually all about manually hard coding prior knowledge about a task into some system,
link |
01:05:03.760
doesn't have to be deeply architected into some system, right?
link |
01:05:08.720
These papers are not actually making any impact.
link |
01:05:11.560
Instead, what's making really long term impact is very simple, very general systems that
link |
01:05:18.680
are really agnostic to all these tricks, because these tricks do not generalize.
link |
01:05:23.560
And of course, the one general and simple thing that you should focus on is that which
link |
01:05:31.680
leverages computation, because computation, the availability of large scale computation
link |
01:05:37.360
has been increasing exponentially, following Moore's law.
link |
01:05:40.720
So if your algorithm is all about exploiting this, then your algorithm is suddenly exponentially
link |
01:05:46.160
improving, right?
link |
01:05:47.640
So I think Rich is definitely right.
link |
01:05:51.800
However, he's right about the past 70 years, he's like assessing the past 70 years.
link |
01:05:59.520
I am not sure that this assessment will still hold true for the next 70 years.
link |
01:06:05.440
It might, to some extent, I suspect it will not, because the truth of his assessment is
link |
01:06:12.040
a function of the context, right, in which this research took place.
link |
01:06:17.040
And the context is changing, like Moore's law might not be applicable anymore, for instance,
link |
01:06:22.560
in the future.
link |
01:06:24.080
And I do believe that when you tweak one aspect of a system, when you exploit one aspect
link |
01:06:32.320
of a system, some other aspect starts becoming the bottleneck.
link |
01:06:36.680
Let's say you have unlimited computation, well, then data is the bottleneck.
link |
01:06:41.640
And I think we are already starting to be in a regime where our systems are so large
link |
01:06:46.560
in scale and so data hungry that data today, and the quality of data, and the scale of
link |
01:06:50.960
data is the bottleneck.
link |
01:06:53.280
And in this environment, the bitter lesson from Rich is not going to be true anymore, right?
link |
01:07:00.960
So I think we are going to move from a focus on scale of computation to a focus
link |
01:07:08.000
on data efficiency.
link |
01:07:10.080
Data efficiency.
link |
01:07:11.080
So that's getting to the question of symbolic AI.
link |
01:07:13.240
But to linger on the deep learning approaches, do you have hope for either unsupervised learning
link |
01:07:19.120
or reinforcement learning, which are ways of being more data efficient in terms of the
link |
01:07:28.280
amount of data they need that require human annotation?
link |
01:07:31.720
So unsupervised learning and reinforcement learning are frameworks for learning, but
link |
01:07:36.320
they are not like any specific technique.
link |
01:07:39.080
So usually when people say reinforcement learning, what they really mean is deep reinforcement
link |
01:07:42.800
learning, which is like one approach which is actually very questionable.
link |
01:07:47.440
The question I was asking was unsupervised learning with deep neural networks and deep
link |
01:07:53.440
reinforcement learning.
link |
01:07:54.440
Well, these are not really data efficient because you're still leveraging these huge
link |
01:07:58.840
parametric models, trained point by point with gradient descent.
link |
01:08:03.760
It is more efficient in terms of the number of annotations, the density of annotations
link |
01:08:09.000
you need.
link |
01:08:10.000
The idea being to learn the latent space around which the data is organized and then map the
link |
01:08:16.680
sparse annotations into it.
link |
01:08:18.960
And sure, I mean, that's clearly a very good idea.
link |
01:08:23.640
It's not really a topic I would be working on, but it's clearly a good idea.
link |
01:08:27.960
So it would get us to solve some problems that...
link |
01:08:32.040
It will get us to incremental improvements in labeled data efficiency.
link |
01:08:38.280
Do you have concerns about short term or long term threats from AI, from artificial intelligence?
link |
01:08:46.640
Yes, definitely to some extent.
link |
01:08:50.720
And what's the shape of those concerns?
link |
01:08:52.360
This is actually something I've briefly written about.
link |
01:08:57.200
But the capabilities of deep learning technology can be used in many ways that are concerning
link |
01:09:06.160
from mass surveillance with things like facial recognition, in general, tracking lots of
link |
01:09:13.920
data about everyone and then being able to make sense of this data, to do identification,
link |
01:09:20.040
to do prediction.
link |
01:09:22.520
That's concerning.
link |
01:09:23.520
That's something that's being very aggressively pursued by totalitarian states like China.
link |
01:09:31.680
One thing I am very much concerned about is that our lives are increasingly online, are
link |
01:09:40.760
increasingly digital, made of information, made of information consumption and information
link |
01:09:45.960
production, our digital footprint, I would say.
link |
01:09:52.160
And if you absorb all of this data and you are in control of where you consume information,
link |
01:10:01.200
social networks and so on, recommendation engines, then you can build a sort of reinforcement
link |
01:10:10.160
loop for human behavior.
link |
01:10:13.920
You can observe the state of your mind at time t.
link |
01:10:18.440
You can predict how you would react to different pieces of content, how to get you to move
link |
01:10:25.040
your mind in a certain direction, then you can feed the specific piece of content that
link |
01:10:33.280
would move you in a specific direction.
link |
01:10:35.920
And you can do this at scale in terms of doing it continuously in real time.
link |
01:10:45.000
You can also do it at scale in terms of scaling this to many, many people, to entire populations.
link |
01:10:50.560
So potentially, artificial intelligence, even in its current state, if you combine it with
link |
01:10:57.800
the internet, with the fact that we have all of our lives are moving to digital devices
link |
01:11:04.120
and digital information consumption and creation, what you get is the possibility to achieve
link |
01:11:11.800
mass manipulation of behavior and mass psychological control.
link |
01:11:16.960
And this is a very real possibility.
link |
01:11:18.360
Yeah, so you're talking about any kind of recommender system.
link |
01:11:22.240
Let's look at the YouTube algorithm, Facebook, anything that recommends content you should
link |
01:11:28.160
watch next, and it's fascinating to think that there's some aspects of human behavior
link |
01:11:35.480
that you can frame as a problem of, does this person hold Republican beliefs or Democratic beliefs?
link |
01:11:45.520
And it's a trivial, that's an objective function, and you can optimize and you can measure and
link |
01:11:52.720
you can turn everybody into a Republican or everybody into a Democrat.
link |
01:11:55.720
Absolutely, yeah.
link |
01:11:56.720
I do believe it's true.
link |
01:11:57.960
So the human mind is very...
link |
01:12:02.520
If you look at the human mind as a kind of computer program, it has a very large exploit
link |
01:12:06.760
surface, right?
link |
01:12:07.760
It has many, many vulnerabilities.
link |
01:12:08.760
Exploit surfaces, yeah.
link |
01:12:09.760
Where you can control it, for instance, when it comes to your political beliefs, this is
link |
01:12:16.920
very much tied to your identity.
link |
01:12:19.360
So for instance, if I'm in control of your news feed on your favorite social media platforms,
link |
01:12:26.080
this is actually where you're getting your news from.
link |
01:12:29.680
And of course, I can choose to only show you news that will make you see the world in a
link |
01:12:35.560
specific way, right?
link |
01:12:37.200
But I can also create incentives for you to post about some political beliefs.
link |
01:12:44.720
And then when I get you to express a statement, if it's a statement that me as a controller,
link |
01:12:52.720
I want to reinforce.
link |
01:12:53.720
I can just show it to people who will agree and they will like it.
link |
01:12:57.080
And that will reinforce the statement in your mind.
link |
01:12:59.400
If this is a statement I want you to, this is a belief I want you to abandon, I can,
link |
01:13:06.280
on the other hand, show it to opponents, right, who will attack you.
link |
01:13:10.800
And because they attack you at the very least, next time you will think twice about posting
link |
01:13:16.440
it.
link |
01:13:17.440
But maybe you will even, you know, stop believing this because you got pushed back, right?
link |
01:13:22.920
So there are many ways in which social media platforms can potentially control your opinions.
link |
01:13:30.560
And today, all of these things are already being controlled by algorithms.
link |
01:13:38.320
These algorithms do not have any explicit political goal today.
link |
01:13:43.080
Well, potentially they could, like if some totalitarian government takes over, you know,
link |
01:13:50.960
social media platforms and decides that, you know, now we're going to use this not just
link |
01:13:55.280
for mass surveillance, but also for mass opinion control and behavior control, very bad things
link |
01:13:59.960
could happen.
link |
01:14:02.000
But what's really fascinating and actually quite concerning is that even without an
link |
01:14:08.680
explicit intent to manipulate, you're already seeing very dangerous dynamics in terms of
link |
01:14:15.480
how these content recommendation algorithms behave.
link |
01:14:19.960
Because right now, the goal, the objective function of these algorithms is to maximize
link |
01:14:26.920
engagement, right, which seems fairly innocuous at first, right?
link |
01:14:32.600
However, it is not because content that will maximally engage people, you know, get people
link |
01:14:40.400
to react in an emotional way, get people to click on something.
link |
01:14:44.480
It is very often content that, you know, is not healthy to the public discourse.
link |
01:14:54.480
For instance, fake news are far more likely to get you to click on them than real news,
link |
01:15:01.560
simply because they are not constrained to reality.
link |
01:15:07.080
So they can be as outrageous, as surprising as good stories as you want, because they
link |
01:15:14.120
are artificial, right?
link |
01:15:15.120
Yeah.
link |
01:15:16.120
To me, that's an exciting world because so much good can come.
link |
01:15:19.640
So there's an opportunity to educate people.
link |
01:15:24.680
You can balance people's worldview with other ideas.
link |
01:15:31.200
So there's so many objective functions.
link |
01:15:33.880
The space of objective functions that create better civilizations is large, arguably infinite.
link |
01:15:41.080
But there's also a large space that creates division and destruction, civil war, a lot
link |
01:15:51.720
of bad stuff.
link |
01:15:53.360
And the worry is, naturally, probably that space is bigger, first of all.
link |
01:15:59.480
And if we don't explicitly think about what kind of effects are going to be observed from
link |
01:16:06.920
different objective functions, then we're going to get into trouble.
link |
01:16:10.280
Because the question is, how do we get into rooms and have discussions?
link |
01:16:16.400
So inside Google, inside Facebook, inside Twitter, and think about, okay, how can we
link |
01:16:22.200
drive up engagement and at the same time create a good society?
link |
01:16:28.240
Is it even possible to have that kind of philosophical discussion?
link |
01:16:31.760
I think you can definitely try.
link |
01:16:33.200
So from my perspective, I would feel rather uncomfortable with companies that are in control
link |
01:16:40.160
of these new algorithms, with them making explicit decisions to manipulate people's opinions
link |
01:16:49.760
or behaviors, even if the intent is good, because that's a very totalitarian mindset.
link |
01:16:55.360
So instead, what I would like to see, and it's probably never going to happen because it's not super
link |
01:16:59.840
realistic, but that's actually something I really care about.
link |
01:17:02.560
I would like all these algorithms to present configuration settings to their users, so
link |
01:17:10.680
that the users can actually make the decision about how they want to be impacted by these
link |
01:17:17.960
information recommendation, content recommendation algorithms.
link |
01:17:22.080
For instance, as a user of something like YouTube or Twitter, maybe I want to maximize
link |
01:17:27.120
learning about a specific topic.
link |
01:17:30.480
So I want the algorithm to feed my curiosity, which is in itself a very interesting problem.
link |
01:17:38.720
So instead of maximizing my engagement, it will maximize how fast and how much I'm learning,
link |
01:17:44.840
and it will also take into account the accuracy, hopefully, of the information I'm learning.
link |
01:17:50.880
So yeah, the user should be able to determine exactly how these algorithms are affecting
link |
01:17:57.800
their lives.
link |
01:17:58.800
I don't want actually any entity making decisions about in which direction they're going to
link |
01:18:08.240
try to manipulate me.
link |
01:18:09.480
I want technology.
link |
01:18:11.840
So AI, these algorithms are increasingly going to be our interface to a world that is increasingly
link |
01:18:18.520
made of information.
link |
01:18:20.280
And I want everyone to be in control of this interface, to interface with the world on
link |
01:18:27.440
their own terms.
link |
01:18:29.160
So if someone wants these algorithms to serve their own personal growth goals, they should
link |
01:18:38.040
be able to configure these algorithms in such a way.
link |
01:18:41.920
Yeah, but so I know it's painful to have explicit decisions, but there are underlying explicit
link |
01:18:50.400
decisions, which touch on some of the most beautiful fundamental philosophy that we have before
link |
01:18:57.240
us, which is personal growth.
link |
01:19:01.200
If I want to watch videos from which I can learn, what does that mean?
link |
01:19:08.080
So if I have a checkbox that wants to emphasize learning, there's still an algorithm with
link |
01:19:13.600
explicit decisions in it that would promote learning.
link |
01:19:18.000
What does that mean for me?
link |
01:19:19.000
Like, for example, I've watched a documentary on Flat Earth theory, I guess.
link |
01:19:25.440
It was very, like, I learned a lot.
link |
01:19:28.200
I'm really glad I watched it.
link |
01:19:29.880
It was a friend recommended it to me, because I don't have such an allergic reaction to
link |
01:19:35.480
crazy people as my fellow colleagues do.
link |
01:19:37.800
But it was very eye opening, and for others, it might not be.
link |
01:19:42.320
For others, they might just get turned off, same as with the Republican and Democrat example.
link |
01:19:47.640
And it's a non trivial problem.
link |
01:19:50.480
And first of all, if it's done well, I don't think it's something that wouldn't happen
link |
01:19:56.440
that the YouTube wouldn't be promoting or Twitter wouldn't be.
link |
01:20:00.160
It's just a really difficult problem.
link |
01:20:02.400
How do we do it, how do we give people control?
link |
01:20:05.080
Well, it's mostly an interface design problem.
link |
01:20:09.000
The way I see it, you want to create technology that's like a mentor or a coach or an assistant
link |
01:20:16.280
so that it's not your boss, right, you are in control of it.
link |
01:20:22.680
You are telling it what to do for you.
link |
01:20:25.920
And if you feel like it's manipulating you, it's not actually, it's not actually doing
link |
01:20:30.760
what you want.
link |
01:20:31.920
You should be able to switch to a different algorithm, you know.
link |
01:20:35.040
So that fine tune control, you kind of learn, you're trusting the human collaboration.
link |
01:20:39.720
I mean, that's how I see autonomous vehicles, too, is giving as much information as possible
link |
01:20:44.440
and you learn that dance yourself.
link |
01:20:46.560
Yeah, Adobe, I don't know if you use Adobe product for like Photoshop.
link |
01:20:51.040
Yeah, they're trying to see if they can inject YouTube into their interface, but basically
link |
01:20:56.600
to show you all these videos, because everybody's confused about what to
link |
01:21:01.920
do with features.
link |
01:21:03.360
So it basically teaches people by linking, in that way, it's an assistant that uses
link |
01:21:09.720
videos as a basic element of information.
link |
01:21:12.960
Okay, so what practically should people do to try to fight against abuses of
link |
01:21:23.080
these algorithms or algorithms that manipulate us?
link |
01:21:26.880
Honestly, it's a very, very difficult problem because to start with, there is very little
link |
01:21:31.080
public awareness of these issues.
link |
01:21:34.120
Very few people would think that, you know, anything wrong with their new algorithm, even
link |
01:21:39.960
though there is actually something wrong already, which is that it's trying to maximize engagement
link |
01:21:44.440
most of the time, which has very negative side effects, right?
link |
01:21:50.000
So ideally, the very first thing is to stop trying to purely maximize engagement or trying
link |
01:21:59.760
to propagate content based on popularity, and instead take into account the goals
link |
01:22:11.000
and the profiles of each user.
link |
01:22:13.640
So you will, you will be, one example is, for instance, when I look at topic recommendations
link |
01:22:20.200
on Twitter, it's like, you know, they have this news tab with recommendations.
link |
01:22:25.640
That's always the worst garbage because it's content that appeals to the smallest common
link |
01:22:33.480
denominator to all Twitter users because they're trying to optimize, they're purely
link |
01:22:37.560
trying to optimize popularity, they're purely trying to optimize engagement, but that's
link |
01:22:41.680
not what I want.
link |
01:22:43.080
So they should put me in control of some setting so that I define what's the objective function
link |
01:22:50.440
that Twitter is going to be following to show me this content.
link |
01:22:54.280
And honestly, so this is all about interface design, and we are not, it's not realistic
link |
01:22:59.320
to give users control of a bunch of knobs that define an algorithm, instead, we should
link |
01:23:04.760
purely put them in charge of defining the objective function, like let the user tell
link |
01:23:11.200
us what they want to achieve, how they want this algorithm to impact their lives.
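A toy sketch of what letting the user define the objective function might look like for a feed ranking system; the items, scores, and weights are all hypothetical and only meant to show the shape of the idea.

    # Each candidate item carries scores a platform could plausibly estimate.
    candidates = [
        {"title": "Outrage bait",     "engagement": 0.9, "learning": 0.1, "accuracy": 0.3},
        {"title": "Calculus lecture", "engagement": 0.4, "learning": 0.9, "accuracy": 0.95},
        {"title": "Breaking rumor",   "engagement": 0.8, "learning": 0.2, "accuracy": 0.4},
    ]

    def rank(items, weights):
        # The user, not the platform, chooses the objective by setting the weights.
        def score(item):
            return sum(weights.get(k, 0.0) * item[k] for k in ("engagement", "learning", "accuracy"))
        return sorted(items, key=score, reverse=True)

    # A user who wants the feed to maximize learning and accuracy, not engagement:
    my_objective = {"engagement": 0.0, "learning": 0.7, "accuracy": 0.3}
    for item in rank(candidates, my_objective):
        print(item["title"])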
link |
01:23:15.320
So do you think it is that or do they provide individual article by article reward structure
link |
01:23:20.200
where you give a signal, I'm glad I saw this or I'm glad I didn't?
link |
01:23:24.760
So like a Spotify type feedback mechanism, it works to some extent, I'm kind of skeptical
link |
01:23:31.520
about it because the only way the algorithm, the algorithm will attempt to relate your choices
link |
01:23:38.920
with the choices of everyone else, which might, you know, if you have an average profile that
link |
01:23:45.040
works fine, I'm sure Spotify accommodations work fine if you just like mainstream stuff.
link |
01:23:49.680
But if you don't, it can be, it's not optimal at all, actually.
link |
01:23:54.040
It'll be an inefficient search for the part of the Spotify world that represents you.
link |
01:24:00.880
So it's a tough problem, but do note that even a feedback system like what Spotify has
link |
01:24:09.000
does not give me control over what the algorithm is trying to optimize for.
link |
01:24:15.680
Well, public awareness, which is what we're doing now, is a good place to start.
link |
01:24:21.440
Do you have concerns about long term existential threats of artificial intelligence?
link |
01:24:27.760
Well, as I was saying, our world is increasingly made of information, AI algorithms are increasingly
link |
01:24:34.800
going to be our interface to this world of information, and somebody will be in control
link |
01:24:40.280
of these algorithms, and that can put us in all kinds of bad situations, right?
link |
01:24:46.000
It has risks.
link |
01:24:48.120
It has risks coming from potentially large companies wanting to optimize their own goals,
link |
01:24:55.000
maybe profit, maybe something else, also from governments who might want to use these algorithms
link |
01:25:01.760
as a means of control of the entire population.
link |
01:25:04.720
Do you think there's existential threat that could arise from that?
link |
01:25:07.560
So existential threat, so maybe you're referring to the singularity narrative where robots
link |
01:25:15.840
just take over?
link |
01:25:16.840
Well, I don't mean Terminator robots, and I don't believe it has to be a singularity.
link |
01:25:22.040
We're just talking about, just like you said, the algorithm controlling masses of populations,
link |
01:25:30.000
the existential threat being that we hurt ourselves much like a nuclear war would hurt ourselves,
link |
01:25:37.840
that kind of thing.
link |
01:25:38.840
I don't think that requires a singularity, that requires a loss of control over AI algorithms.
link |
01:25:44.600
So I do agree there are concerning trends.
link |
01:25:47.920
Honestly, I wouldn't want to make any long term predictions.
link |
01:25:53.600
I don't think today we really have the capability to see what the dangers of AI are going to
link |
01:25:59.560
be in 50 years, in 100 years.
link |
01:26:02.240
I do see that we are already faced with concrete and present dangers surrounding the negative
link |
01:26:11.480
side effects of content recommendation systems, of newsfeed algorithms, concerning algorithmic
link |
01:26:17.280
bias as well.
link |
01:26:19.520
So we are delegating more and more decision processes to algorithms.
link |
01:26:26.000
Some of these algorithms are handcrafted, some are learned from data.
link |
01:26:30.160
But we are delegating control.
link |
01:26:34.040
Sometimes it's a good thing, sometimes not so much.
link |
01:26:37.240
And there is in general very little supervision of this process.
link |
01:26:41.720
So we are still in this period of very fast change, even chaos, where society is restructuring
link |
01:26:50.160
itself, turning into an information society, which itself is turning into an increasingly
link |
01:26:56.160
automated information processing society.
link |
01:26:59.240
And well, yeah, I think the best we can do today is try to raise awareness around some
link |
01:27:05.760
of these issues.
link |
01:27:06.760
And I think we are actually making good progress if you look at algorithmic bias, for instance.
link |
01:27:13.000
Three years ago, even two years ago, very, very few people were talking about it.
link |
01:27:17.240
And now all the big companies are talking about it, often not in a very serious way,
link |
01:27:22.400
but at least it is part of the public discourse.
link |
01:27:24.600
You see people in Congress talking about it.
link |
01:27:27.360
And it all started from raising awareness.
link |
01:27:32.840
So in terms of the alignment problem, trying to teach, as we allow algorithms, even just recommender
link |
01:27:40.200
systems on Twitter, to encode human values and morals, decisions that touch on ethics.
link |
01:27:50.280
How hard do you think that problem is?
link |
01:27:52.640
How do we have loss functions in neural networks that have some component, some fuzzy components
link |
01:27:59.800
of human morals?
link |
01:28:01.280
Well, I think this is really all about objective function engineering, which is probably going
link |
01:28:07.400
to be increasingly a topic of concern in the future.
link |
01:28:10.680
Like for now, we are just using very naive loss functions because the hard part is not
link |
01:28:16.160
actually what you're trying to minimize, it's everything else.
link |
01:28:19.240
But as the everything else is going to be increasingly automated, we're going to be
link |
01:28:25.280
focusing our human attention on increasingly high level components, like what's actually
link |
01:28:30.920
driving the whole learning system, like the objective function.
link |
01:28:34.040
So loss function engineering is going to be, loss function engineer is probably going to
link |
01:28:38.360
be a job title in the future.
link |
01:28:40.760
And then the tooling you're creating with Keras essentially takes care of all the details
link |
01:28:46.200
underneath and basically the human expert is needed for exactly that.
link |
01:28:52.960
Keras is the interface between the data you're collecting and the business goals.
link |
01:28:59.240
And your job as an engineer is going to be to express your business goals and your understanding
link |
01:29:04.280
of your business or your product, your system as a kind of loss function or a kind of set
link |
01:29:10.440
of constraints.
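As a small illustration of expressing a goal or constraint as a loss function, here is a custom Keras loss that adds a penalty term on top of a standard error; the constraint itself is invented for the example.

    import tensorflow as tf
    from tensorflow import keras

    def constrained_loss(y_true, y_pred):
        # Standard objective: mean squared error on the prediction.
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        # Hypothetical business constraint: penalize predictions above 1.0
        # (imagine "never recommend more inventory than can actually be shipped").
        over_budget = tf.reduce_mean(tf.nn.relu(y_pred - 1.0))
        return mse + 10.0 * over_budget

    model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss=constrained_loss)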
link |
01:29:11.440
Does the possibility of creating an AGI system excite you or scare you or bore you?
link |
01:29:19.560
So intelligence can never really be general, you know, at best it can have some degree
link |
01:29:23.600
of generality, like human intelligence.
link |
01:29:26.600
And it also always has some specialization in the same way that human intelligence is
link |
01:29:30.720
specialized in a certain category of problems, is specialized in the human experience.
link |
01:29:35.680
And when people talk about AGI, I'm never quite sure if they're talking about very,
link |
01:29:41.440
very smart AI, so smart that it's even smarter than humans, or they're talking about human
link |
01:29:46.200
like intelligence, because these are different things.
link |
01:29:49.880
Let's say, presumably I'm impressing you today with my humanness.
link |
01:29:54.840
So imagine that I was in fact a robot.
link |
01:29:59.400
So what does that mean?
link |
01:30:02.400
I'm impressing you with natural language processing.
link |
01:30:05.160
Maybe if you weren't able to see me, maybe this is a phone call.
link |
01:30:08.320
That kind of system.
link |
01:30:09.320
Okay.
link |
01:30:10.320
So companion.
link |
01:30:11.320
So that's very much about building human like AI.
link |
01:30:15.200
And you're asking me, you know, is this an exciting perspective?
link |
01:30:18.200
Yes.
link |
01:30:19.200
I think so, yes.
link |
01:30:21.960
Not so much because of what artificial human like intelligence could do, but, you know,
link |
01:30:29.640
from an intellectual perspective, I think if you could build truly human like intelligence,
link |
01:30:34.240
that means you could actually understand human intelligence, which is fascinating, right?
link |
01:30:40.160
Human like intelligence is going to require emotions, it's going to require consciousness,
link |
01:30:44.480
which is not things that would normally be required by an intelligent system.
link |
01:30:48.640
If you look at, you know, we were mentioning earlier like science as a superhuman problem
link |
01:30:55.560
solving agent or system, it does not have consciousness, it doesn't have emotions.
link |
01:31:02.240
In general, so emotions, I see consciousness as being on the same spectrum as emotions.
link |
01:31:07.760
It is a component of the subjective experience that is meant very much to guide behavior
link |
01:31:17.560
generation, right, it's meant to guide your behavior.
link |
01:31:20.880
In general, human intelligence and animal intelligence has evolved for the purpose of
link |
01:31:27.080
behavior generation, right, including in a social context.
link |
01:31:30.760
So that's why we actually need emotions.
link |
01:31:32.600
That's why we need consciousness.
link |
01:31:35.080
An artificial intelligence system developed in a different context may well never need
link |
01:31:39.280
them, may well never be conscious like science.
link |
01:31:43.280
But on that point, I would argue it's possible to imagine that there's echoes of consciousness
link |
01:31:50.160
in science when viewed as an organism, that science itself is conscious.
link |
01:31:55.640
So I mean, how would you go about testing this hypothesis?
link |
01:31:59.320
How do you probe the subjective experience of an abstract system like science?
link |
01:32:07.240
Well, the problem of probing any subjective experience is that it's impossible, because I'm not science, I'm
link |
01:32:12.280
outside of science.
link |
01:32:13.280
So I can't probe another entity's subjective experience, any more than the bacteria on my skin could probe mine.
link |
01:32:20.720
Yeah, like, I can ask you questions about your subjective experience and you can answer me.
link |
01:32:25.360
And that's how I know you're conscious.
link |
01:32:27.720
Yes, but that's because we speak the same language.
link |
01:32:32.080
Yeah, perhaps we have to speak the language of science and we have to ask it.
link |
01:32:35.800
Honestly, I don't think consciousness, just like the emotions of pain and pleasure, is
link |
01:32:41.120
something that inevitably arises from any sort of sufficiently intelligent information
link |
01:32:47.120
processing.
link |
01:32:48.120
It is a feature of the mind and if you've not implemented it explicitly, it is not there.
link |
01:32:54.080
So you think it's an emergent feature of a particular architecture.
link |
01:32:59.120
So do you think?
link |
01:33:00.120
It's a feature in the same sense.
link |
01:33:02.080
So again, the subjective experience is all about guiding behavior.
link |
01:33:09.800
If the problems you're trying to solve don't really involve embedded agents, maybe in a
link |
01:33:15.560
social context, generating behavior and pursuing goals like this.
link |
01:33:19.800
And if you look at science, that's not really what's happening, even though it is, it is
link |
01:33:23.280
a form of artificial intelligence in the sense that it is solving
link |
01:33:29.600
problems, it is accumulating knowledge, accumulating solutions and so on.
link |
01:33:35.240
So if you're not explicitly implementing a subjective experience, implementing certain
link |
01:33:41.120
emotions and implementing consciousness, it's not going to just spontaneously emerge.
link |
01:33:47.120
Yeah.
link |
01:33:48.360
But so for a system like human like intelligent system that has consciousness, do you think
link |
01:33:53.640
it needs to have a body?
link |
01:33:55.240
Yes, definitely.
link |
01:33:56.240
I mean, it doesn't have to be a physical body, right?
link |
01:33:59.920
And there's not that much difference between a realistic simulation in the real world.
link |
01:34:03.680
So there has to be something you have to preserve kind of thing.
link |
01:34:06.560
Yes.
link |
01:34:07.560
But human like intelligence can only arise in a human like context.
link |
01:34:12.400
Intelligence needs other humans.
link |
01:34:13.400
You need other humans in order for you to demonstrate that you have human like intelligence, essentially.
link |
01:34:20.480
So what kind of tests and demonstration would be sufficient for you to demonstrate human
link |
01:34:29.240
like intelligence?
link |
01:34:30.480
Yeah.
link |
01:34:31.480
And just out of curiosity, you talked about in terms of theorem proving and program synthesis,
link |
01:34:37.080
I think you've written about that there's no good benchmarks for this.
link |
01:34:40.480
Yeah.
link |
01:34:41.480
That's one of the problems.
link |
01:34:42.480
So let's talk programs, program synthesis.
link |
01:34:46.560
So what do you imagine is a good, I think it's related questions for human like intelligence
link |
01:34:51.440
and for program synthesis.
link |
01:34:53.720
What's a good benchmark for either or both?
link |
01:34:56.160
Right.
link |
01:34:57.160
So I mean, you're actually asking two questions.
link |
01:34:59.400
Which is one is about quantifying intelligence and comparing the intelligence of an artificial
link |
01:35:06.520
system to the intelligence of a human.
link |
01:35:08.800
And the other is about the degree to which this intelligence is human like.
link |
01:35:13.520
It's actually two different questions.
link |
01:35:16.800
So if you look, you mentioned earlier the Turing test.
link |
01:35:19.320
Right.
link |
01:35:20.320
Well, I actually don't like the Turing test because it's very lazy.
link |
01:35:24.080
It's all about completely bypassing the problem of defining and measuring intelligence.
link |
01:35:28.960
And instead delegating to a human judge or a panel of human judges.
link |
01:35:34.400
So it's a total cop out, right?
link |
01:35:38.400
If you want to measure how human like an agent is, I think you have to make it interact
link |
01:35:45.640
with other humans.
link |
01:35:47.920
Maybe it's not necessarily a good idea to have these other humans be the judges.
link |
01:35:54.120
Maybe you should just observe its behavior and compare it to what a human would actually have done.
link |
01:36:00.800
When it comes to measuring how smart, how clever an agent is and comparing that to the
link |
01:36:09.160
degree of human intelligence.
link |
01:36:11.240
So we're already talking about two things, right?
link |
01:36:13.680
The degree, kind of like the magnitude of an intelligence and its direction, right?
link |
01:36:20.600
Like the norm of a vector and its direction.
link |
01:36:23.560
And the direction is like human likeness.
link |
01:36:27.200
And the magnitude, the norm is intelligence.
link |
01:36:32.880
You could call it intelligence, right?
link |
01:36:34.280
So the direction, in your sense, the space of directions that are human like is very narrow.
link |
01:36:42.440
So how would you measure the magnitude of intelligence in a system in a way that
link |
01:36:49.880
also enables you to compare it to that of a human.
link |
01:36:54.960
Well, if you look at different benchmarks for intelligence today, they're all too focused
link |
01:37:02.000
on skill at a given task.
link |
01:37:04.480
That's skill at playing chess, skill at playing Go, skill at playing Dota.
link |
01:37:11.080
And I think that's not the right way to go about it because you can always beat the human
link |
01:37:17.560
at one specific task.
link |
01:37:20.240
The reason why our skill at playing Go or at juggling or anything is impressive is because
link |
01:37:25.320
we are expressing this skill within a certain set of constraints.
link |
01:37:29.480
If you remove the constraints, the constraints that we have one lifetime, that we have this
link |
01:37:33.840
body and so on, if you remove the context, if you have unlimited training data, if you
link |
01:37:40.120
can have access to, you know, for instance, if you look at juggling, if you have no restriction
link |
01:37:44.840
on the hardware, then achieving arbitrary levels of skill is not very interesting and
link |
01:37:50.040
says nothing about the amount of intelligence you've achieved.
link |
01:37:53.960
So if you want to measure intelligence, you need to rigorously define what intelligence
link |
01:37:59.320
is, which in itself, you know, it's a very challenging problem.
link |
01:38:04.360
And do you think that's possible?
link |
01:38:05.960
To define intelligence?
link |
01:38:06.960
Yes, absolutely.
link |
01:38:07.960
I mean, you can provide, many people have provided, you know, some definition.
link |
01:38:11.680
I have my own definition.
link |
01:38:13.640
Where does your definition begin if it doesn't end?
link |
01:38:16.520
Well, I think intelligence is essentially the efficiency with which you turn experience
link |
01:38:25.560
into generalizable programs.
link |
01:38:29.960
So what that means is it's the efficiency with which you turn a sampling of experience
link |
01:38:35.280
space into the ability to process a larger chunk of experience space.
link |
01:38:46.200
So measuring skill across many, many different tasks can be one proxy
link |
01:38:53.480
for measuring intelligence.
link |
01:38:54.680
But if you want to only measure skill, you should control for two things.
link |
01:38:58.880
You should control for the amount of experience that your system has and the priors that your
link |
01:39:07.920
system has.
link |
01:39:08.920
But if you control, if you look at two agents and you give them the same priors and you
link |
01:39:14.120
give them the same amount of experience, there is one of the agents that is going to learn
link |
01:39:21.480
programs, representation, something, a model that will perform well on the larger chunk
link |
01:39:27.720
of experience space than the other.
link |
01:39:29.760
And that is the smarter agent.
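A very rough operational sketch of that comparison, not a formal definition: hold the experience budget (and, ideally, the priors) fixed, let each agent learn from it, and compare how much of the wider experience space the learned program handles.

    def generalization_score(train_fn, experience, heldout_tasks):
        # 'experience' is a fixed budget of samples, identical for every agent.
        program = train_fn(experience)
        # Score: fraction of unseen situations the learned program handles correctly.
        return sum(program(x) == y for x, y in heldout_tasks) / len(heldout_tasks)

    def compare_agents(agents, experience, heldout_tasks):
        # With experience (and priors) controlled, the higher scorer is, under this
        # definition, the more intelligent agent on this slice of experience space.
        scores = {name: generalization_score(fn, experience, heldout_tasks)
                  for name, fn in agents.items()}
        return max(scores, key=scores.get), scores

    # Toy usage: learning the parity of an integer from four examples.
    experience = [(0, 0), (1, 1), (2, 0), (3, 1)]
    heldout = [(n, n % 2) for n in range(4, 1000)]
    agents = {
        "memorizer": lambda data: (lambda x: dict(data).get(x, 0)),  # point by point lookup
        "rule_learner": lambda data: (lambda x: x % 2),              # abstracts a general rule
    }
    print(compare_agents(agents, experience, heldout))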
link |
01:39:31.920
Yeah.
link |
01:39:32.920
So if you fix the experience, which one generates better programs, better meaning more generalizable,
link |
01:39:39.920
that's really interesting.
link |
01:39:40.920
That's a very nice, clean definition of...
link |
01:39:42.760
By the way, in this definition, it is already very obvious that intelligence has to be specialized
link |
01:39:49.560
because you're talking about experience space and you're talking about segments of experience
link |
01:39:53.600
space.
link |
01:39:54.600
You're talking about priors and you're talking about experience, all of these things define
link |
01:39:59.680
the context in which intelligence emerges.
link |
01:40:04.840
And you can never look at the totality of experience space.
link |
01:40:10.040
So intelligence has to be specialized.
link |
01:40:12.520
But it can be sufficiently large, the experience space, even though specialized, there's a certain
link |
01:40:16.760
point when the experience space is large enough to where it might as well be general.
link |
01:40:22.200
It feels general.
link |
01:40:23.200
It looks general.
link |
01:40:24.200
I mean, it's very relative.
link |
01:40:25.200
For instance, many people would say human intelligence is general.
link |
01:40:29.560
In fact, it is quite specialized.
link |
01:40:32.960
We can definitely build systems that start from the same innate priors as what humans
link |
01:40:37.960
have at birth because we already understand fairly well what sort of priors we have as
link |
01:40:43.720
humans.
link |
01:40:44.720
Like many people have worked on this problem, most notably Elizabeth Spelke from Harvard,
link |
01:40:50.680
and if you know her, she's worked a lot on what she calls core knowledge.
link |
01:40:56.240
And it is very much about trying to determine and describe what priors we are born with.
link |
01:41:02.560
Like language skills and so on and all that kind of stuff.
link |
01:41:06.080
Exactly.
link |
01:41:07.080
So we have some pretty good understanding of what priors we are born with.
link |
01:41:11.520
So we could...
link |
01:41:13.960
So I've actually been working on a benchmark for the past couple of years, on and off.
link |
01:41:18.720
I hope to be able to release it at some point.
link |
01:41:21.440
The idea is to measure the intelligence of systems by controlling for priors, controlling
link |
01:41:29.120
for the amount of experience, and by assuming the same priors as what humans are born with
link |
01:41:34.840
so that you can actually compare these scores to human intelligence and you can actually
link |
01:41:40.160
have humans pass the same test in a way that's fair.
link |
01:41:44.440
And so importantly, such a benchmark should be such that any amount of practicing does
link |
01:41:54.720
not increase your score.
link |
01:41:56.800
So try to picture a game where no matter how much you play this game, it does not change
link |
01:42:04.120
your skill at the game.
link |
01:42:05.400
Can you picture that?
link |
01:42:08.600
As a person who deeply appreciates practice, I cannot actually...
link |
01:42:14.840
There's actually a very simple trick.
link |
01:42:19.040
So in order to come up with a task, so the only thing you can measure is skill at a task.
link |
01:42:24.760
All tasks are going to involve priors.
link |
01:42:28.280
The trick is to know what they are and to describe that.
link |
01:42:32.480
And then you make sure that this is the same set of priors as what humans start with.
link |
01:42:36.040
So you create a task that assumes these priors, that exactly documents these priors, so that
link |
01:42:41.080
the priors are made explicit and there are no other priors involved.
link |
01:42:44.720
And then you generate a certain number of samples in experience space for this task.
link |
01:42:52.240
And this, for one task, assuming that the task is new for the agent passing it, that's
link |
01:42:59.480
one test of this definition of intelligence that we set up.
link |
01:43:07.560
And now you can scale that to many different tasks, that each task should be new to the
link |
01:43:12.360
agent passing it.
link |
01:43:13.360
And also should be human interpretable and understandable, so that you can actually have
link |
01:43:18.680
a human pass the same test and then you can compare the score of your machine and the score
link |
01:43:21.960
of your human.
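(Again as an illustrative aside: a hedged sketch of what a benchmark harness following this protocol could look like. Each task documents its assumed priors explicitly, supplies a small experience sample, and is scored on held-out queries, so a human or a machine can take the same test. The Task structure, its field names, and the toy "reverse the sequence" task are my own assumptions, not the benchmark he describes working on.)

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class Task:
    """One benchmark task: its assumed priors are documented explicitly,
    it comes with a small experience sample, and it is meant to be novel
    to whoever takes the test, human or machine."""
    name: str
    assumed_priors: List[str]               # e.g. ["objectness", "basic geometry"]
    demonstrations: List[Tuple[Any, Any]]   # the limited experience sample provided
    test_pairs: List[Tuple[Any, Any]]       # held-out queries used for scoring

def score_solver(solver: Callable[[List[Tuple[Any, Any]], Any], Any],
                 tasks: List[Task]) -> float:
    """Average per-task accuracy; a human can play the role of `solver` too."""
    per_task = []
    for task in tasks:
        hits = sum(solver(task.demonstrations, x) == y for x, y in task.test_pairs)
        per_task.append(hits / len(task.test_pairs))
    return sum(per_task) / len(per_task)

# Toy usage: a single task whose hidden rule is "reverse the sequence".
toy_task = Task(
    name="reverse-sequence",
    assumed_priors=["sequences", "order"],
    demonstrations=[([1, 2, 3], [3, 2, 1]), (["a", "b"], ["b", "a"])],
    test_pairs=[([4, 5, 6, 7], [7, 6, 5, 4])],
)

def toy_solver(demos, query):
    # A stand-in solver that happens to induce the right program for this task.
    return list(reversed(query))

print(score_solver(toy_solver, [toy_task]))  # -> 1.0
```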
link |
01:43:22.960
Which could be a lot.
link |
01:43:23.960
It could even start with a task like MNIST, just as long as you start with the same set of
link |
01:43:28.580
priors.
link |
01:43:29.580
Yeah, so the problem with MNIST is that humans are already trained to recognize digits.
link |
01:43:35.880
But let's say we're considering objects that are not digits, some completely arbitrary patterns.
link |
01:43:44.240
Well, humans already come with visual priors about how to process that.
link |
01:43:50.120
So in order to make the game fair, you would have to isolate these priors and describe
link |
01:43:55.760
them and then express them as computational rules.
link |
01:43:58.720
Having worked a lot with vision science people, that's exceptionally difficult. A lot of progress
link |
01:44:03.760
has been made, there have been a lot of good tests, basically reducing all of human
link |
01:44:07.720
vision into some good priors.
link |
01:44:09.360
We're probably still far away from doing that perfectly, but as a start for a benchmark, that's an
link |
01:44:14.640
exciting possibility.
link |
01:44:15.640
Yeah, so Elizabeth Spelke actually lists objectness as one of the core knowledge priors.
link |
01:44:25.320
Objectness.
link |
01:44:26.320
Cool.
link |
01:44:27.320
Objectness.
link |
01:44:28.320
Yeah.
link |
01:44:29.320
So we have priors about objectness, like about the visual space, about time, about agents,
link |
01:44:33.000
about goal oriented behavior.
link |
01:44:34.600
We have many different priors, but what's interesting is that, sure, we have this pretty
link |
01:44:42.680
diverse and rich set of priors, but it's also not that diverse, right?
link |
01:44:48.520
We are not born into this world with a ton of knowledge about the world.
link |
01:44:52.560
There is only a small set of core knowledge, right?
link |
01:44:59.240
Yeah.
link |
01:45:00.240
So it feels to us humans that that set is not that large,
link |
01:45:07.120
but just even the nature of time that we kind of integrate pretty effectively through all
link |
01:45:11.920
of our perception, all of our reasoning. Do you have a sense of
link |
01:45:17.680
how easy it is to encode those priors?
link |
01:45:19.880
Maybe it requires building a universe, and then the human brain in order to encode those
link |
01:45:26.000
priors.
link |
01:45:27.000
Or do you have a hope that it can be listed out, like, axiomatically?
link |
01:45:30.680
I don't think so.
link |
01:45:31.680
So you have to keep in mind that any knowledge about the world that we are born with is something
link |
01:45:36.480
that has to have been encoded into our DNA by evolution at some point.
link |
01:45:43.280
And DNA is a very, very low bandwidth medium, like it's extremely long and expensive to
link |
01:45:50.720
encode anything into DNA, because first of all, you need some sort of evolutionary pressure
link |
01:45:57.120
to guide this writing process.
link |
01:45:59.400
And then, you know, the higher level the information you're trying to write, the longer it's going
link |
01:46:05.720
to take, and the thing in the environment that you're trying to encode knowledge about has
link |
01:46:13.960
to be stable over this duration.
link |
01:46:17.240
So you can only encode into DNA things that constitute an evolutionary advantage.
link |
01:46:22.840
So this is actually a very small subset of all possible knowledge about the world.
link |
01:46:27.120
You can only encode things that are stable, that are true over very, very long periods
link |
01:46:33.360
of time, typically millions of years.
link |
01:46:35.480
For instance, we might have some visual prior about the shape of snakes, right?
link |
01:46:40.520
But what makes a face?
link |
01:46:43.800
What's the difference between a face and a non-face?
link |
01:46:46.440
But consider this interesting question.
link |
01:46:49.840
Do we have any innate sense of the visual difference between a male face and a female
link |
01:46:57.800
face?
link |
01:46:58.800
What do you think?
link |
01:46:59.800
For a human, I mean.
link |
01:47:01.320
I would have to look back into evolutionary history when the genders emerged.
link |
01:47:05.920
But yeah, I mean, the faces of humans are quite different from the faces of great
link |
01:47:11.280
apes, right?
link |
01:47:14.000
Yeah.
link |
01:47:15.000
That's interesting.
link |
01:47:16.000
But yeah.
link |
01:47:17.000
You couldn't tell the face of a female chimpanzee from the face of a male chimpanzee, probably.
link |
01:47:23.200
Yeah.
link |
01:47:24.200
And I don't think most humans have evolved that ability.
link |
01:47:26.720
We do have innate knowledge of what makes a face, but it's actually impossible for us
link |
01:47:33.160
to have any DNA encoding knowledge of the difference between a female human face and
link |
01:47:39.200
a male human face.
link |
01:47:40.680
Because that knowledge, that information, came into the world actually very recently.
link |
01:47:50.800
If you look at the slowness of the process of encoding knowledge into DNA.
link |
01:47:56.920
Yeah.
link |
01:47:57.920
So that's interesting.
link |
01:47:58.920
That's a really powerful argument, that DNA is low bandwidth and it takes a long time
link |
01:48:01.640
to encode, which naturally creates a very efficient encoding.
link |
01:48:05.480
But one important consequence of this is that, so yes, we are born into this world with a
link |
01:48:12.400
bunch of knowledge, sometimes very high level knowledge about the world like the rough shape
link |
01:48:17.440
of a snake, or the rough shape of a face.
link |
01:48:20.800
But importantly, because this knowledge takes so long to write, almost all of this innate
link |
01:48:27.040
knowledge is shared with our cousins, with great apes, right?
link |
01:48:33.360
So it is not actually this innate knowledge that makes us special.
link |
01:48:37.600
But to throw it right back at you from earlier on in our discussion, that encoding
link |
01:48:44.120
might also include the entirety of the environment of Earth.
link |
01:48:50.600
To sum up, it can include things that are important to survival and reproduction,
link |
01:48:56.520
for which there is some evolutionary pressure, and things that are stable, constant over
link |
01:49:01.840
very, very, very long time periods.
link |
01:49:05.240
And honestly, it's not that much information.
link |
01:49:07.440
Besides the bandwidth constraints and the constraints of the writing process, there are
link |
01:49:15.600
also memory constraints. Like, the part of DNA that deals with the human brain is
link |
01:49:22.600
actually fairly small.
link |
01:49:23.600
It's like, you know, on the order of megabytes, right?
link |
01:49:26.360
There's not that much high level knowledge about the world you can encode.
link |
01:49:31.880
That's quite brilliant, and hopeful for the benchmark you're referring to, of encoding priors.
link |
01:49:39.400
I actually look forward to, I'm skeptical that you can do it in the next couple of years,
link |
01:49:43.680
but hopefully...
link |
01:49:44.680
I've been working on it.
link |
01:49:45.960
So honestly, it's a very simple benchmark and it's not like a big breakthrough or anything.
link |
01:49:50.120
It's more like a fun side project, right?
link |
01:49:53.920
So is ImageNet.
link |
01:49:56.720
These fun side projects could launch entire groups of efforts towards creating reasoning
link |
01:50:04.120
systems and so on.
link |
01:50:05.120
And I think...
link |
01:50:06.120
Yeah, that's the goal.
link |
01:50:07.120
It's trying to measure strong generalization, to measure the strength of abstraction in
link |
01:50:12.160
our minds, well, in our minds and in artificially intelligent agents.
link |
01:50:17.160
And if there's anything true about this science organism, it's that its individual cells love competition.
link |
01:50:24.960
So benchmarks encourage competition.
link |
01:50:27.000
So that's an exciting possibility.
link |
01:50:29.680
If you...
link |
01:50:30.680
Do you think an AI winter is coming and how do we prevent it?
link |
01:50:35.720
Not really.
link |
01:50:36.720
So an AI winter is something that would occur when there's a big mismatch between how we
link |
01:50:42.160
are selling the capabilities of AI and the actual capabilities of AI.
link |
01:50:47.560
And today, deep learning is creating a lot of value and it will keep creating a lot of
link |
01:50:52.000
value in the sense that these models are applicable to a very wide range of problems that are
link |
01:50:59.360
relevant even today.
link |
01:51:00.360
And we are only just getting started with applying algorithms to every problem they
link |
01:51:05.320
could be solving.
link |
01:51:06.520
So deep learning will keep creating a lot of value for the time being.
link |
01:51:10.440
What's concerning, however, is that there's a lot of hype around deep learning and around
link |
01:51:16.000
AI.
link |
01:51:17.000
A lot of people are overselling the capabilities of these systems, not just the capabilities
link |
01:51:22.840
but also overselling the fact that they might be more or less brain-like, giving a kind
link |
01:51:31.520
of mystical aspect to these technologies, and also overselling the pace of progress,
link |
01:51:40.480
which might look fast in the sense that we have this exponentially increasing number
link |
01:51:46.000
of papers.
link |
01:51:48.080
But again, that's just a simple consequence of the fact that we have ever more people
link |
01:51:53.000
coming into the field.
link |
01:51:54.000
It doesn't mean the progress is actually exponentially fast.
link |
01:51:58.000
Like, let's say you're trying to raise money for your startup or your research lab.
link |
01:52:02.960
You might want to tell, you know, a grandiose story to investors about how deep learning
link |
01:52:09.120
is just like the brain and how it can solve all these incredible problems like self driving
link |
01:52:14.240
and robotics and so on, and maybe you can tell them that the field is progressing so fast
link |
01:52:19.040
and we're going to have AI within 15 years or even 10 years, and none of this is true.
link |
01:52:27.000
And every time you're like saying these things and an investor or, you know, a decision maker
link |
01:52:33.320
believes them, well, this is like the equivalent of taking on credit card debt, but for trust.
link |
01:52:43.400
And maybe this will, you know, this will be what enables you to raise a lot of money,
link |
01:52:50.920
but ultimately you are creating damage, you are damaging the field.
link |
01:52:55.160
That's the concern, that debt, that's what happened with the other AI winters.
link |
01:53:01.240
You actually tweeted about this with autonomous vehicles, right?
link |
01:53:04.440
Almost every single company now has promised that they will have fully autonomous
link |
01:53:08.960
vehicles by 2021, 2022.
link |
01:53:12.000
That's a good example of the consequences of overhyping the capabilities of AI and the
link |
01:53:18.280
pace of progress.
link |
01:53:19.280
So because I work, especially a lot recently, in this area, I have a deep concern about what
link |
01:53:25.160
happens when all of these companies, after having each invested billions, have a meeting and
link |
01:53:30.480
say: how much do we actually have? First of all, do we have an autonomous vehicle?
link |
01:53:33.720
The answer will definitely be no.
link |
01:53:36.360
And second will be, wait a minute, we've invested one, two, three, four billion dollars
link |
01:53:40.680
into this and we made no profit.
link |
01:53:43.400
And the reaction to that may be going very hard in other directions that might impact
link |
01:53:49.280
you and even other industries.
link |
01:53:50.840
And that's what we call an AI winter, is when there is a backlash where no one believes
link |
01:53:55.320
any of these promises anymore because they've turned out to be big lies the first time around.
link |
01:54:00.600
And this will definitely happen to some extent for autonomous vehicles because the public
link |
01:54:06.120
and decision makers have been convinced, around 2015, they've been convinced by these
link |
01:54:13.440
people who were trying to raise money for their startups and so on, that L5 driving was coming
link |
01:54:19.600
in maybe 2016, maybe 2017, maybe 2018.
link |
01:54:23.120
Now in 2019, we're still waiting for it.
link |
01:54:28.040
And so I don't believe we are going to have a full on AI winter because we have these
link |
01:54:32.880
technologies that are producing a tremendous amount of real value, but there is also too
link |
01:54:39.480
much hype.
link |
01:54:40.480
So there will be some backlash, especially there will be backlash.
link |
01:54:45.240
So some startups are trying to sell the dream of AGI and the fact that AGI is going to create
link |
01:54:53.080
infinite value.
link |
01:54:54.080
AGI is like a free lunch, like if you can develop an AI system that passes a certain threshold
link |
01:55:01.240
of IQ or something, then suddenly you have infinite value.
link |
01:55:06.440
And well, there are actually lots of investors buying into this idea.
link |
01:55:11.640
And they will wait maybe 10, 15 years and nothing will happen.
link |
01:55:18.920
And the next time around, well, maybe there will be a new generation of investors, no
link |
01:55:22.800
one will care.
link |
01:55:24.040
Human memory is very short after all.
link |
01:55:27.160
I don't know about you, but because I've spoken about AGI sometimes poetically, I get a lot
link |
01:55:34.440
of emails from people, usually like large manifestos, where they say to me
link |
01:55:42.360
that they have created an AGI system or they know how to do it and there's a long write
link |
01:55:48.320
up of how to do it.
link |
01:55:49.320
I get a lot of these emails.
link |
01:55:51.400
They feel a little bit like they're generated by an AI system actually, but there's usually
link |
01:55:57.840
no backup.
link |
01:55:58.840
Maybe that's the recursively self-improving AI: you have a transformer generating crank
link |
01:56:04.920
papers about AGI.
link |
01:56:06.880
So the question is, because you have such a good radar for crank
link |
01:56:12.160
papers, how do we know they're not onto something?
link |
01:56:16.960
How do I, so when you start to talk about AGI or anything like the reasoning benchmarks
link |
01:56:24.320
and so on, so something that doesn't have a benchmark, it's really difficult to know.
link |
01:56:28.720
I mean, I talked to Jeff Hawkins who's really looking at neuroscience approaches to how,
link |
01:56:35.480
and there are echoes of really interesting ideas, at least in Jeff's case,
link |
01:56:41.800
which he's showing.
link |
01:56:43.520
How do you usually think about this?
link |
01:56:45.840
Like preventing yourself from being too narrow-minded and elitist about deep learning.
link |
01:56:52.920
It has to work on these particular benchmarks, otherwise it's trash.
link |
01:56:57.040
Well, the thing is intelligence does not exist in the abstract.
link |
01:57:05.880
Intelligence has to be applied.
link |
01:57:07.440
So if you don't have a benchmark, if you don't have an improvement on some benchmark, maybe
link |
01:57:11.040
it's a new benchmark.
link |
01:57:12.680
Maybe it's not something we've been looking at before, but you do need a problem that
link |
01:57:16.760
you're trying to solve.
link |
01:57:17.760
You're not going to come up with a solution without a problem.
link |
01:57:21.040
So, general intelligence, I mean, you've clearly highlighted generalization.
link |
01:57:26.760
If you want to claim that you have an intelligence system, it should come with a benchmark.
link |
01:57:31.320
It should, yes, it should display capabilities of some kind.
link |
01:57:35.960
It should show that it can create some form of value, even if it's a very artificial form
link |
01:57:41.920
of value.
link |
01:57:43.160
And that's also the reason why you don't actually need to care about telling which papers have
link |
01:57:48.840
actually some hidden potential and which do not.
link |
01:57:53.520
Because if there is a new technique, it's actually creating value.
link |
01:57:58.880
This is going to be brought to light very quickly because it's actually making a difference.
link |
01:58:02.640
So it's a difference between something that is ineffectual and something that is actually
link |
01:58:08.240
useful.
link |
01:58:09.240
And ultimately, usefulness is our guide, not just in this field, but if you look at science
link |
01:58:14.120
in general, maybe there are many, many people over the years that have had some really interesting
link |
01:58:19.560
theories of everything, but they were just completely useless.
link |
01:58:23.120
And you don't actually need to tell the interesting theories from the useless theories.
link |
01:58:28.240
All you need is to see, you know, is this actually having an effect on something else?
link |
01:58:34.120
You know, is this actually useful?
link |
01:58:35.600
Is this making an impact or not?
link |
01:58:37.960
That's beautifully put. I mean, the same applies to quantum mechanics, to string theory, to
link |
01:58:42.480
the holographic principle.
link |
01:58:43.480
We are doing deep learning because it works.
link |
01:58:46.080
Before it started working, people considered people working on neural networks as cranks
link |
01:58:52.720
very much.
link |
01:58:53.720
Like, you know, no one was working on this anymore.
link |
01:58:56.560
And now it's working, which is what makes it valuable.
link |
01:58:59.400
It's not about being right, it's about being effective.
link |
01:59:02.840
And nevertheless, the individual entities of this scientific mechanism, just like Yoshua
link |
01:59:08.120
Bengio or Yann LeCun, they, while being called cranks, stuck with it, right?
link |
01:59:13.160
And so, we as individual agents, even if everyone's laughing at us, should stick with it.
link |
01:59:19.080
If you believe you have something, you should stick with it and see it through.
link |
01:59:23.840
That's a beautiful, inspirational message to end on.
link |
01:59:25.920
Francois, thank you so much for talking today.
link |
01:59:27.800
That was amazing.
link |
01:59:28.800
Thank you.