François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38



link |
00:00:00.000
The following is a conversation with Francois Chollet.
link |
00:00:03.720
He's the creator of Keras,
link |
00:00:05.760
which is an open source deep learning library
link |
00:00:08.080
that is designed to enable fast, user friendly experimentation
link |
00:00:11.480
with deep neural networks.
link |
00:00:13.600
It serves as an interface to several deep learning libraries,
link |
00:00:16.680
the most popular of which is TensorFlow,
link |
00:00:19.040
and it was integrated into the TensorFlow main code base
link |
00:00:22.600
a while ago.
link |
00:00:24.080
Meaning, if you want to create, train,
link |
00:00:27.000
and use neural networks,
link |
00:00:28.640
probably the easiest and most popular option
link |
00:00:31.040
is to use Keras inside TensorFlow.
link |
00:00:34.840
Aside from creating an exceptionally useful
link |
00:00:37.240
and popular library,
link |
00:00:38.680
Francois is also a world class AI researcher
link |
00:00:41.920
and software engineer at Google.
link |
00:00:44.560
And he's definitely an outspoken,
link |
00:00:46.960
if not controversial personality in the AI world,
link |
00:00:50.560
especially in the realm of ideas
link |
00:00:52.920
around the future of artificial intelligence.
link |
00:00:55.920
This is the Artificial Intelligence Podcast.
link |
00:00:58.600
If you enjoy it, subscribe on YouTube,
link |
00:01:01.000
give it five stars on iTunes,
link |
00:01:02.760
support it on Patreon,
link |
00:01:04.160
or simply connect with me on Twitter
link |
00:01:06.120
at Lex Fridman, spelled F R I D M A N.
link |
00:01:09.960
And now, here's my conversation with Francois Chollet.
link |
00:01:14.880
You're known for not sugarcoating your opinions
link |
00:01:17.320
and speaking your mind about ideas in AI,
link |
00:01:19.160
especially on Twitter.
link |
00:01:21.160
It's one of my favorite Twitter accounts.
link |
00:01:22.760
So what's one of the more controversial ideas
link |
00:01:26.320
you've expressed online and gotten some heat for?
link |
00:01:30.440
How do you pick?
link |
00:01:33.080
How do I pick?
link |
00:01:33.920
Yeah, no, I think if you go through the trouble
link |
00:01:36.880
of maintaining a Twitter account,
link |
00:01:39.640
you might as well speak your mind, you know?
link |
00:01:41.840
Otherwise, what's even the point of having a Twitter account?
link |
00:01:44.600
It's like having a nice car
link |
00:01:45.480
and just leaving it in the garage.
link |
00:01:48.600
Yeah, so what's one thing for which I got
link |
00:01:50.840
a lot of pushback?
link |
00:01:53.600
Perhaps, you know, that time I wrote something
link |
00:01:56.640
about the idea of intelligence explosion,
link |
00:02:00.920
and I was questioning the idea
link |
00:02:04.520
and the reasoning behind this idea.
link |
00:02:06.840
And I got a lot of pushback on that.
link |
00:02:09.640
I got a lot of flak for it.
link |
00:02:11.840
So yeah, so intelligence explosion,
link |
00:02:13.600
I'm sure you're familiar with the idea,
link |
00:02:14.960
but it's the idea that if you were to build
link |
00:02:18.800
general AI problem solving algorithms,
link |
00:02:22.920
well, the problem of building such an AI,
link |
00:02:27.480
that itself is a problem that could be solved by your AI,
link |
00:02:30.520
and maybe it could be solved better
link |
00:02:31.880
than what humans can do.
link |
00:02:33.760
So your AI could start tweaking its own algorithm,
link |
00:02:36.840
could start making a better version of itself,
link |
00:02:39.520
and so on iteratively in a recursive fashion.
link |
00:02:43.240
And so you would end up with an AI
link |
00:02:47.320
with exponentially increasing intelligence.
link |
00:02:50.080
That's right.
link |
00:02:50.920
And I was basically questioning this idea,
link |
00:02:55.880
first of all, because the notion of intelligence explosion
link |
00:02:59.040
uses an implicit definition of intelligence
link |
00:03:02.200
that doesn't sound quite right to me.
link |
00:03:05.360
It considers intelligence as a property of a brain
link |
00:03:11.200
that you can consider in isolation,
link |
00:03:13.680
like the height of a building, for instance.
link |
00:03:16.640
But that's not really what intelligence is.
link |
00:03:19.040
Intelligence emerges from the interaction
link |
00:03:22.200
between a brain, a body,
link |
00:03:25.240
like embodied intelligence, and an environment.
link |
00:03:28.320
And if you're missing one of these pieces,
link |
00:03:30.720
then you cannot really define intelligence anymore.
link |
00:03:33.800
So just tweaking a brain to make it smarter and smarter
link |
00:03:36.800
doesn't actually make any sense to me.
link |
00:03:39.120
So first of all,
link |
00:03:39.960
you're crushing the dreams of many people, right?
link |
00:03:43.000
So there's a, let's look at like Sam Harris.
link |
00:03:46.000
Actually, a lot of physicists, Max Tegmark,
link |
00:03:48.680
people who think the universe
link |
00:03:52.120
is an information processing system,
link |
00:03:54.640
our brain is kind of an information processing system.
link |
00:03:57.680
So what's the theoretical limit?
link |
00:03:59.400
Like, it doesn't make sense that there should be some,
link |
00:04:04.800
it seems naive to think that our own brain
link |
00:04:07.520
is somehow the limit of the capabilities
link |
00:04:10.000
of this information system.
link |
00:04:11.600
I'm playing devil's advocate here.
link |
00:04:13.600
This information processing system.
link |
00:04:15.600
And then if you just scale it,
link |
00:04:17.760
if you're able to build something
link |
00:04:19.360
that's on par with the brain,
link |
00:04:20.920
you just, the process that builds it just continues
link |
00:04:24.040
and it'll improve exponentially.
link |
00:04:26.400
So that's the logic that's used actually
link |
00:04:30.160
by almost everybody
link |
00:04:32.560
that is worried about super human intelligence.
link |
00:04:36.920
So you're trying to make,
link |
00:04:39.120
so most people who are skeptical of that
link |
00:04:40.960
are kind of like, this doesn't,
link |
00:04:43.000
their thought process, this doesn't feel right.
link |
00:04:46.520
Like that's for me as well.
link |
00:04:47.680
So I'm more like, it doesn't,
link |
00:04:51.440
the whole thing is shrouded in mystery
link |
00:04:52.800
where you can't really say anything concrete,
link |
00:04:55.840
but you could say this doesn't feel right.
link |
00:04:57.880
This doesn't feel like that's how the brain works.
link |
00:05:00.640
And you're trying to with your blog posts
link |
00:05:02.400
and now making it a little more explicit.
link |
00:05:05.680
So one idea is that the brain doesn't exist alone.
link |
00:05:10.680
It exists within the environment.
link |
00:05:13.200
So you can't exponentially,
link |
00:05:15.680
you would have to somehow exponentially improve
link |
00:05:18.000
the environment and the brain together almost.
link |
00:05:20.920
Yeah, in order to create something that's much smarter
link |
00:05:25.960
in some kind of,
link |
00:05:27.840
of course we don't have a definition of intelligence.
link |
00:05:29.960
That's correct, that's correct.
link |
00:05:31.280
I don't think so. If you look at very smart people today,
link |
00:05:34.280
even humans, not even talking about AIs.
link |
00:05:37.280
I don't think their brain
link |
00:05:38.640
and the performance of their brain is the bottleneck
link |
00:05:41.960
to their expressed intelligence, to their achievements.
link |
00:05:46.600
You cannot just tweak one part of this system,
link |
00:05:49.960
like of this brain, body, environment system
link |
00:05:52.840
and expect that capabilities like what emerges
link |
00:05:55.960
out of this system to just explode exponentially.
link |
00:06:00.280
Because anytime you improve one part of a system
link |
00:06:04.200
with many interdependencies like this,
link |
00:06:06.760
there's a new bottleneck that arises, right?
link |
00:06:09.520
And I don't think even today for very smart people,
link |
00:06:12.280
their brain is not the bottleneck
link |
00:06:15.000
to the sort of problems they can solve, right?
link |
00:06:17.560
In fact, many very smart people today,
link |
00:06:20.760
you know, they are not actually solving
link |
00:06:22.520
any big scientific problems, they're not Einstein.
link |
00:06:24.800
They're like Einstein, but you know, the patent clerk days.
link |
00:06:29.800
Like Einstein became Einstein
link |
00:06:31.920
because this was a meeting of a genius
link |
00:06:36.080
with a big problem at the right time, right?
link |
00:06:39.480
But maybe this meeting could have never happened
link |
00:06:42.480
and then Einstein would have just been a patent clerk, right?
link |
00:06:44.960
And in fact, many people today are probably like
link |
00:06:49.760
genius level smart, but you wouldn't know
link |
00:06:52.240
because they're not really expressing any of that.
link |
00:06:54.800
Wow, that's brilliant.
link |
00:06:55.640
So we can think of the world, Earth,
link |
00:06:58.520
but also the universe as just a space of problems.
link |
00:07:02.720
So all these problems and tasks are roaming it
link |
00:07:05.160
of various difficulty.
link |
00:07:06.880
And there's agents, creatures like ourselves
link |
00:07:10.120
and animals and so on that are also roaming it.
link |
00:07:13.360
And then you get coupled with a problem
link |
00:07:16.480
and then you solve it.
link |
00:07:17.640
But without that coupling,
link |
00:07:19.880
you can't demonstrate your quote unquote intelligence.
link |
00:07:22.560
Exactly, intelligence is the meeting
link |
00:07:24.480
of great problem solving capabilities
link |
00:07:27.480
with a great problem.
link |
00:07:28.760
And if you don't have the problem,
link |
00:07:30.560
you don't really express any intelligence.
link |
00:07:32.280
All you're left with is potential intelligence,
link |
00:07:34.760
like the performance of your brain
link |
00:07:36.240
or how high your IQ is,
link |
00:07:38.680
which in itself is just a number, right?
link |
00:07:42.080
So you mentioned problem solving capacity.
link |
00:07:46.520
Yeah.
link |
00:07:47.360
What do you think of as problem solving capacity?
link |
00:07:51.800
Can you try to define intelligence?
link |
00:07:56.640
Like what does it mean to be more or less intelligent?
link |
00:08:00.000
Is it completely coupled to a particular problem
link |
00:08:03.000
or is there something a little bit more universal?
link |
00:08:05.720
Yeah, I do believe all intelligence
link |
00:08:07.440
is specialized intelligence.
link |
00:08:09.080
Even human intelligence has some degree of generality.
link |
00:08:12.200
Well, all intelligent systems have some degree of generality
link |
00:08:15.320
but they're always specialized in one category of problems.
link |
00:08:19.400
So the human intelligence is specialized
link |
00:08:21.880
in the human experience.
link |
00:08:23.560
And that shows at various levels,
link |
00:08:25.560
that shows in some prior knowledge that's innate
link |
00:08:30.200
that we have at birth.
link |
00:08:32.040
Knowledge about things like agents,
link |
00:08:35.360
goal driven behavior, visual priors
link |
00:08:38.080
about what makes an object, priors about time and so on.
link |
00:08:43.520
That shows also in the way we learn.
link |
00:08:45.360
For instance, it's very, very easy for us
link |
00:08:47.160
to pick up language.
link |
00:08:49.560
It's very, very easy for us to learn certain things
link |
00:08:52.080
because we are basically hard coded to learn them.
link |
00:08:54.920
And we are specialized in solving certain kinds of problem
link |
00:08:58.280
and we are quite useless
link |
00:08:59.720
when it comes to other kinds of problems.
link |
00:09:01.440
For instance, we are not really designed
link |
00:09:06.160
to handle very long term problems.
link |
00:09:08.800
We have no capability of seeing the very long term.
link |
00:09:12.880
We don't have very much working memory.
link |
00:09:18.000
So how do you think about long term?
link |
00:09:20.080
Do you think long term planning,
link |
00:09:21.360
are we talking about scale of years, millennia?
link |
00:09:24.880
What do you mean by long term?
link |
00:09:26.400
We're not very good.
link |
00:09:28.120
Well, human intelligence is specialized
link |
00:09:29.760
in the human experience.
link |
00:09:30.720
And human experience is very short.
link |
00:09:32.800
One lifetime is short.
link |
00:09:34.240
Even within one lifetime,
link |
00:09:35.880
we have a very hard time envisioning things
link |
00:09:40.000
on a scale of years.
link |
00:09:41.360
It's very difficult to project yourself
link |
00:09:43.240
at a scale of five years, at a scale of 10 years and so on.
link |
00:09:46.960
We can solve only fairly narrowly scoped problems.
link |
00:09:50.000
So when it comes to solving bigger problems,
link |
00:09:52.320
larger scale problems,
link |
00:09:53.760
we are not actually doing it on an individual level.
link |
00:09:56.360
So it's not actually our brain doing it.
link |
00:09:59.280
We have this thing called civilization, right?
link |
00:10:03.040
Which is itself a sort of problem solving system,
link |
00:10:06.600
a sort of artificially intelligent system, right?
link |
00:10:10.000
And it's not running on one brain,
link |
00:10:12.120
it's running on a network of brains.
link |
00:10:14.080
In fact, it's running on much more
link |
00:10:15.640
than a network of brains.
link |
00:10:16.760
It's running on a lot of infrastructure,
link |
00:10:20.080
like books and computers and the internet
link |
00:10:23.040
and human institutions and so on.
link |
00:10:25.800
And that is capable of handling problems
link |
00:10:30.240
on a much greater scale than any individual human.
link |
00:10:33.760
If you look at computer science, for instance,
link |
00:10:37.600
that's an institution that solves problems
link |
00:10:39.840
and it is superhuman, right?
link |
00:10:42.560
It operates on a greater scale.
link |
00:10:44.200
It can solve much bigger problems
link |
00:10:46.880
than an individual human could.
link |
00:10:49.080
And science itself, science as a system, as an institution,
link |
00:10:52.160
is a kind of artificially intelligent problem solving
link |
00:10:57.120
algorithm that is superhuman.
link |
00:10:59.360
Yeah, it's, at least computer science
link |
00:11:02.800
is like a theorem prover at a scale of thousands,
link |
00:11:07.720
maybe hundreds of thousands of human beings.
link |
00:11:10.400
At that scale, what do you think is an intelligent agent?
link |
00:11:14.680
So there's us humans at the individual level,
link |
00:11:18.280
there are millions, maybe billions of bacteria on our skin.
link |
00:11:23.880
There is, that's at the smaller scale.
link |
00:11:26.400
You can even go to the particle level
link |
00:11:29.160
as systems that behave,
link |
00:11:31.840
you can say intelligently in some ways.
link |
00:11:35.440
And then you can look at the earth as a single organism,
link |
00:11:37.840
you can look at our galaxy
link |
00:11:39.200
and even the universe as a single organism.
link |
00:11:42.160
Do you think, how do you think about scale
link |
00:11:44.680
in defining intelligent systems?
link |
00:11:46.280
And we're here at Google, there are millions of devices
link |
00:11:50.440
doing computation just in a distributed way.
link |
00:11:53.360
How do you think about intelligence versus scale?
link |
00:11:55.880
You can always characterize anything as a system.
link |
00:12:00.640
I think people who talk about things
link |
00:12:03.600
like intelligence explosion,
link |
00:12:05.320
tend to focus on one agent, which is basically one brain,
link |
00:12:08.760
like one brain considered in isolation,
link |
00:12:10.960
like a brain in a jar that's controlling a body
link |
00:12:13.200
in a very like top to bottom kind of fashion.
link |
00:12:16.280
And that body is pursuing goals into an environment.
link |
00:12:19.480
So it's a very hierarchical view.
link |
00:12:20.720
You have the brain at the top of the pyramid,
link |
00:12:22.880
then you have the body just plainly receiving orders.
link |
00:12:25.960
And then the body is manipulating objects
link |
00:12:27.640
in the environment and so on.
link |
00:12:28.920
So everything is subordinate to this one thing,
link |
00:12:32.920
this epicenter, which is the brain.
link |
00:12:34.720
But in real life, intelligent agents
link |
00:12:37.120
don't really work like this, right?
link |
00:12:39.240
There is no strong delimitation
link |
00:12:40.920
between the brain and the body to start with.
link |
00:12:43.400
You have to look not just at the brain,
link |
00:12:45.000
but at the nervous system.
link |
00:12:46.560
But then the nervous system and the body
link |
00:12:48.840
are not really two separate entities.
link |
00:12:50.760
So you have to look at an entire animal as one agent.
link |
00:12:53.960
But then you start realizing as you observe an animal
link |
00:12:57.000
over any length of time,
link |
00:13:00.200
that a lot of the intelligence of an animal
link |
00:13:03.160
is actually externalized.
link |
00:13:04.600
That's especially true for humans.
link |
00:13:06.240
A lot of our intelligence is externalized.
link |
00:13:08.880
When you write down some notes,
link |
00:13:10.360
that is externalized intelligence.
link |
00:13:11.960
When you write a computer program,
link |
00:13:14.000
you are externalizing cognition.
link |
00:13:16.000
So it's externalized in books, it's externalized in computers,
link |
00:13:19.720
the internet, in other humans.
link |
00:13:23.080
It's externalized in language and so on.
link |
00:13:25.400
So there is no hard delimitation
link |
00:13:30.480
of what makes an intelligent agent.
link |
00:13:32.640
It's all about context.
link |
00:13:34.960
Okay, but AlphaGo is better at Go
link |
00:13:38.800
than the best human player.
link |
00:13:42.520
There's levels of skill here.
link |
00:13:45.000
So do you think there's such an ability,
link |
00:13:48.600
such a concept as intelligence explosion
link |
00:13:52.800
in a specific task?
link |
00:13:54.760
And then, well, yeah.
link |
00:13:57.360
Do you think it's possible to have a category of tasks
link |
00:14:00.120
on which you do have something
link |
00:14:02.080
like an exponential growth of ability
link |
00:14:05.040
to solve that particular problem?
link |
00:14:07.440
I think if you consider a specific vertical,
link |
00:14:10.320
it's probably possible to some extent.
link |
00:14:15.320
I also don't think we have to speculate about it
link |
00:14:18.320
because we have real world examples
link |
00:14:22.280
of recursively self improving intelligent systems, right?
link |
00:14:26.920
So for instance, science is a problem solving system,
link |
00:14:30.920
a knowledge generation system,
link |
00:14:32.600
like a system that experiences the world in some sense
link |
00:14:36.240
and then gradually understands it and can act on it.
link |
00:14:40.160
And that system is superhuman
link |
00:14:42.120
and it is clearly recursively self improving
link |
00:14:45.600
because science feeds into technology.
link |
00:14:47.560
Technology can be used to build better tools,
link |
00:14:50.200
better computers, better instrumentation and so on,
link |
00:14:52.880
which in turn can make science faster, right?
link |
00:14:56.720
So science is probably the closest thing we have today
link |
00:15:00.560
to a recursively self improving superhuman AI.
link |
00:15:04.760
And you can just observe is science,
link |
00:15:08.520
is scientific progress exploding,
link |
00:15:10.320
which itself is an interesting question.
link |
00:15:12.800
You can use that as a basis to try to understand
link |
00:15:15.560
what will happen with a superhuman AI
link |
00:15:17.920
that has a science like behavior.
link |
00:15:21.000
Let me linger on it a little bit more.
link |
00:15:23.320
What is your intuition why an intelligence explosion
link |
00:15:27.600
is not possible?
link |
00:15:28.560
Like taking the scientific,
link |
00:15:30.920
all the scientific revolutions,
link |
00:15:33.240
why can't we slightly accelerate that process?
link |
00:15:38.080
So you can absolutely accelerate
link |
00:15:41.200
any problem solving process.
link |
00:15:43.120
So recursive self improvement
link |
00:15:46.720
is absolutely a real thing.
link |
00:15:48.640
But what happens with a recursively self improving system
link |
00:15:51.880
is typically not explosion
link |
00:15:53.680
because no system exists in isolation.
link |
00:15:56.520
And so tweaking one part of the system
link |
00:15:58.640
means that suddenly another part of the system
link |
00:16:00.880
becomes a bottleneck.
link |
00:16:02.200
And if you look at science, for instance,
link |
00:16:03.800
which is clearly a recursively self improving,
link |
00:16:06.800
clearly a problem solving system,
link |
00:16:09.040
scientific progress is not actually exploding.
link |
00:16:12.000
If you look at science,
link |
00:16:13.520
what you see is the picture of a system
link |
00:16:16.480
that is consuming an exponentially increasing
link |
00:16:19.240
amount of resources,
link |
00:16:20.520
but it's having a linear output
link |
00:16:23.960
in terms of scientific progress.
link |
00:16:26.000
And maybe that will seem like a very strong claim.
link |
00:16:28.960
Many people are actually saying that,
link |
00:16:31.160
scientific progress is exponential,
link |
00:16:34.560
but when they're claiming this,
link |
00:16:36.120
they're actually looking at indicators
link |
00:16:38.400
of resource consumption by science.
link |
00:16:43.080
For instance, the number of papers being published,
link |
00:16:47.560
the number of patents being filed and so on,
link |
00:16:49.960
which are just completely correlated
link |
00:16:53.600
with how many people are working on science today.
link |
00:16:58.480
So it's actually an indicator of resource consumption,
link |
00:17:00.640
but what you should look at is the output,
link |
00:17:03.200
is progress in terms of the knowledge
link |
00:17:06.680
that science generates,
link |
00:17:08.040
in terms of the scope and significance
link |
00:17:10.640
of the problems that we solve.
link |
00:17:12.520
And some people have actually been trying to measure that.
link |
00:17:16.720
Like Michael Nielsen, for instance,
link |
00:17:20.160
he had a very nice paper,
link |
00:17:21.920
I think that was last year about it.
link |
00:17:25.200
So his approach to measure scientific progress
link |
00:17:28.360
was to look at the timeline of scientific discoveries
link |
00:17:33.720
over the past, you know, 100, 150 years.
link |
00:17:37.160
And for each major discovery,
link |
00:17:41.360
ask a panel of experts to rate
link |
00:17:44.360
the significance of the discovery.
link |
00:17:46.760
And if the output of science as an institution
link |
00:17:49.600
were exponential,
link |
00:17:50.440
you would expect the temporal density of significance
link |
00:17:56.600
to go up exponentially.
link |
00:17:58.160
Maybe because there's a faster rate of discoveries,
link |
00:18:00.960
maybe because the discoveries are, you know,
link |
00:18:02.960
increasingly more important.
link |
00:18:04.920
And what actually happens
link |
00:18:06.800
if you plot this temporal density of significance
link |
00:18:10.040
measured in this way,
link |
00:18:11.320
is that you see very much a flat graph.
link |
00:18:14.520
You see a flat graph across all disciplines,
link |
00:18:16.600
across physics, biology, medicine, and so on.
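To make the measurement concrete, here is a minimal sketch of the kind of metric being described, using invented toy data rather than Nielsen's actual expert ratings: bin each discovery's expert-rated significance by decade and sum it, giving the temporal density of significance per decade.

```python
from collections import defaultdict

# Hypothetical (year, expert-rated significance) pairs, purely illustrative.
discoveries = [
    (1905, 9.5), (1915, 9.0), (1925, 8.5), (1953, 9.0),
    (1964, 7.5), (1971, 6.0), (1983, 6.5), (1998, 7.0), (2012, 7.0),
]

# Temporal density of significance: total rated significance per decade.
density = defaultdict(float)
for year, significance in discoveries:
    density[(year // 10) * 10] += significance

for decade in sorted(density):
    print(f"{decade}s: total significance {density[decade]:.1f}")
```

If the output of science were exponential, these per-decade totals would grow exponentially; the claim here is that, measured this way, the curve comes out roughly flat.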
link |
00:18:19.720
And it actually makes a lot of sense
link |
00:18:22.480
if you think about it,
link |
00:18:23.320
because think about the progress of physics
link |
00:18:26.000
110 years ago, right?
link |
00:18:28.000
It was a time of crazy change.
link |
00:18:30.080
Think about the progress of technology,
link |
00:18:31.960
you know, 170 years ago,
link |
00:18:34.360
when we started having, you know,
link |
00:18:35.400
replacing horses with cars,
link |
00:18:37.560
when we started having electricity and so on.
link |
00:18:40.000
It was a time of incredible change.
link |
00:18:41.520
And today is also a time of very, very fast change,
link |
00:18:44.600
but it would be an unfair characterization
link |
00:18:48.040
to say that today technology and science
link |
00:18:50.560
are moving way faster than they did 50 years ago
link |
00:18:52.920
or 100 years ago.
link |
00:18:54.360
And if you do try to rigorously plot
link |
00:18:59.520
the temporal density of the significance,
link |
00:19:04.880
yeah, of significance, sorry,
link |
00:19:07.320
you do see very flat curves.
link |
00:19:09.720
And you can check out the paper
link |
00:19:12.040
that Michael Nielsen had about this idea.
link |
00:19:16.000
And so the way I interpret it is,
link |
00:19:20.000
as you make progress in a given field,
link |
00:19:24.160
or in a given subfield of science,
link |
00:19:26.120
it becomes exponentially more difficult
link |
00:19:28.680
to make further progress.
link |
00:19:30.440
Like the very first person to work on information theory.
link |
00:19:35.000
If you enter a new field,
link |
00:19:36.440
and it's still the very early years,
link |
00:19:37.920
there's a lot of low hanging fruit you can pick.
link |
00:19:41.160
That's right, yeah.
link |
00:19:42.000
But the next generation of researchers
link |
00:19:43.960
is gonna have to dig much harder, actually,
link |
00:19:48.160
to make smaller discoveries,
link |
00:19:50.640
probably larger number of smaller discoveries,
link |
00:19:52.640
and to achieve the same amount of impact,
link |
00:19:54.640
you're gonna need a much greater head count.
link |
00:19:57.480
And that's exactly the picture you're seeing with science,
link |
00:20:00.040
that the number of scientists and engineers
link |
00:20:03.760
is in fact increasing exponentially.
link |
00:20:06.520
The amount of computational resources
link |
00:20:08.400
that are available to science
link |
00:20:10.040
is increasing exponentially and so on.
link |
00:20:11.880
So the resource consumption of science is exponential,
link |
00:20:15.560
but the output in terms of progress,
link |
00:20:18.200
in terms of significance, is linear.
link |
00:20:21.000
And the reason why is because,
link |
00:20:23.120
and even though science is recursively self improving,
link |
00:20:26.000
meaning that scientific progress
link |
00:20:28.440
turns into technological progress,
link |
00:20:30.240
which in turn helps science.
link |
00:20:32.960
If you look at computers, for instance,
link |
00:20:35.280
they are products of science, and computers
link |
00:20:38.480
are tremendously useful in speeding up science.
link |
00:20:41.560
The internet, same thing, the internet is a technology
link |
00:20:43.840
that's made possible by very recent scientific advances.
link |
00:20:47.480
And itself, because it enables scientists to network,
link |
00:20:52.400
to communicate, to exchange papers and ideas much faster,
link |
00:20:55.520
it is a way to speed up scientific progress.
link |
00:20:57.440
So even though you're looking
link |
00:20:58.440
at a recursively self improving system,
link |
00:21:01.440
it is consuming exponentially more resources
link |
00:21:04.080
to produce the same amount of problem solving, very much.
link |
00:21:09.200
So that's a fascinating way to paint it,
link |
00:21:11.080
and certainly that holds for the deep learning community.
link |
00:21:14.960
If you look at the temporal, what did you call it,
link |
00:21:18.120
the temporal density of significant ideas,
link |
00:21:21.240
if you look at in deep learning,
link |
00:21:24.840
I think, I'd have to think about that,
link |
00:21:26.960
but if you really look at significant ideas
link |
00:21:29.040
in deep learning, they might even be decreasing.
link |
00:21:32.400
So I do believe the per paper significance is decreasing,
link |
00:21:39.600
but the amount of papers
link |
00:21:41.240
is still today exponentially increasing.
link |
00:21:43.440
So I think if you look at an aggregate,
link |
00:21:45.880
my guess is that you would see a linear progress.
link |
00:21:48.840
If you were to sum the significance of all papers,
link |
00:21:56.120
you would see roughly linear progress.
link |
00:21:58.640
And in my opinion, it is not a coincidence
link |
00:22:03.880
that you're seeing linear progress in science
link |
00:22:05.800
despite exponential resource consumption.
link |
00:22:07.720
I think the resource consumption
link |
00:22:10.280
is dynamically adjusting itself to maintain linear progress
link |
00:22:15.880
because we as a community expect linear progress,
link |
00:22:18.560
meaning that if we start investing less
link |
00:22:21.240
and seeing less progress, it means that suddenly
link |
00:22:23.600
there are some lower hanging fruits that become available
link |
00:22:26.800
and someone's gonna step up and pick them, right?
link |
00:22:31.280
So it's very much like a market for discoveries and ideas.
link |
00:22:36.920
But there's another fundamental part
link |
00:22:38.720
which you're highlighting, which is a hypothesis
link |
00:22:41.800
as science or like the space of ideas,
link |
00:22:45.160
any one path you travel down,
link |
00:22:48.160
it gets exponentially more difficult
link |
00:22:51.080
to get a new way to develop new ideas.
link |
00:22:54.720
And your sense is that's gonna hold
link |
00:22:57.640
across our mysterious universe.
link |
00:23:01.520
Yes, well, exponential progress
link |
00:23:03.360
triggers exponential friction.
link |
00:23:05.480
So that if you tweak one part of the system,
link |
00:23:07.440
suddenly some other part becomes a bottleneck, right?
link |
00:23:10.680
For instance, let's say you develop some device
link |
00:23:14.880
that measures its own acceleration
link |
00:23:17.160
and then it has some engine
link |
00:23:18.720
and it outputs even more acceleration
link |
00:23:20.800
in proportion of its own acceleration
link |
00:23:22.360
and you drop it somewhere,
link |
00:23:23.320
it's not gonna reach infinite speed
link |
00:23:25.240
because it exists in a certain context.
link |
00:23:29.080
So the air around it is gonna generate friction
link |
00:23:31.000
and it's gonna block it at some top speed.
link |
00:23:34.320
And even if you were to consider the broader context
link |
00:23:37.480
and lift the bottleneck there,
link |
00:23:39.840
like the bottleneck of friction,
link |
00:23:43.120
then some other part of the system
link |
00:23:45.120
would start stepping in and creating exponential friction,
link |
00:23:48.120
maybe the speed of light or whatever.
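As a toy illustration of that thought experiment, here is a sketch with made-up constants: the device's acceleration grows in proportion to its current speed (the self-reinforcing part), while drag grows with the square of the speed (the friction part), so the speed settles at a plateau rather than exploding.

```python
# Toy simulation with invented constants: self-amplifying thrust vs. quadratic drag.
gain = 0.5    # thrust proportional to current speed (the recursive feedback)
drag = 0.01   # friction growing with speed squared (the bottleneck)
dt = 0.01     # integration time step

speed = 1.0
for _ in range(20_000):
    accel = gain * speed - drag * speed ** 2
    speed += accel * dt

# The feedback loop stalls where gain * v equals drag * v**2, i.e. v = gain / drag = 50.
print(f"terminal speed ~ {speed:.1f}")
```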
link |
00:23:49.920
And this definitely holds true
link |
00:23:51.920
when you look at the problem solving algorithm
link |
00:23:54.960
that is being run by science as an institution,
link |
00:23:58.160
science as a system.
link |
00:23:59.720
As you make more and more progress,
link |
00:24:01.720
despite having this recursive self improvement component,
link |
00:24:06.760
you are encountering exponential friction.
link |
00:24:09.840
The more researchers you have working on different ideas,
link |
00:24:13.480
the more overhead you have
link |
00:24:14.880
in terms of communication across researchers.
link |
00:24:18.040
If you look at, you were mentioning quantum mechanics, right?
link |
00:24:22.920
Well, if you want to start making significant discoveries
link |
00:24:26.880
today, significant progress in quantum mechanics,
link |
00:24:29.680
there is an amount of knowledge you have to ingest,
link |
00:24:33.000
which is huge.
link |
00:24:34.080
So there's a very large overhead
link |
00:24:36.520
to even start to contribute.
link |
00:24:39.240
There's a large amount of overhead
link |
00:24:40.680
to synchronize across researchers and so on.
link |
00:24:44.040
And of course, the significant practical experiments
link |
00:24:48.600
are going to require exponentially expensive equipment
link |
00:24:52.160
because the easier ones have already been run, right?
link |
00:24:56.480
So in your senses, there's no way escaping,
link |
00:25:00.480
there's no way of escaping this kind of friction
link |
00:25:04.480
with artificial intelligence systems.
link |
00:25:08.600
Yeah, no, I think science is a very good way
link |
00:25:11.520
to model what would happen with a superhuman
link |
00:25:14.280
recursively self improving AI.
link |
00:25:16.440
That's your sense, I mean, the...
link |
00:25:18.240
That's my intuition.
link |
00:25:19.680
It's not like a mathematical proof of anything.
link |
00:25:23.400
That's not my point.
link |
00:25:24.400
Like, I'm not trying to prove anything.
link |
00:25:26.600
I'm just trying to make an argument
link |
00:25:27.920
to question the narrative of intelligence explosion,
link |
00:25:31.160
which is quite a dominant narrative.
link |
00:25:32.880
And you do get a lot of pushback if you go against it.
link |
00:25:35.840
Because, so for many people, right,
link |
00:25:39.320
AI is not just a subfield of computer science.
link |
00:25:42.200
It's more like a belief system.
link |
00:25:44.120
Like this belief that the world is headed towards an event,
link |
00:25:48.640
the singularity, past which, you know, AI will become...
link |
00:25:55.040
will go exponential very much,
link |
00:25:57.080
and the world will be transformed,
link |
00:25:58.600
and humans will become obsolete.
link |
00:26:00.840
And if you go against this narrative,
link |
00:26:03.880
because it is not really a scientific argument,
link |
00:26:06.920
but more of a belief system,
link |
00:26:08.880
it is part of the identity of many people.
link |
00:26:11.240
If you go against this narrative,
link |
00:26:12.600
it's like you're attacking the identity
link |
00:26:14.400
of people who believe in it.
link |
00:26:15.560
It's almost like saying God doesn't exist,
link |
00:26:17.640
or something.
link |
00:26:19.000
So you do get a lot of pushback
link |
00:26:21.880
if you try to question these ideas.
link |
00:26:24.040
First of all, I believe most people,
link |
00:26:26.520
they might not be as eloquent or explicit as you're being,
link |
00:26:29.240
but most people in computer science
link |
00:26:30.920
or most people who actually have built
link |
00:26:33.000
anything that you could call AI, quote, unquote,
link |
00:26:36.360
would agree with you.
link |
00:26:38.080
They might not be describing it in the same kind of way.
link |
00:26:40.560
It's more, so the pushback you're getting
link |
00:26:43.960
is from people who get attached to the narrative
link |
00:26:48.080
from, not from a place of science,
link |
00:26:51.000
but from a place of imagination.
link |
00:26:53.400
That's correct, that's correct.
link |
00:26:54.760
So why do you think that's so appealing?
link |
00:26:56.920
Because the usual dreams that people have
link |
00:27:02.120
when you create a superintelligence system
link |
00:27:03.960
past the singularity,
link |
00:27:05.120
that what people imagine is somehow always destructive.
link |
00:27:09.440
Do you have, if you were put on your psychology hat,
link |
00:27:12.240
what's, why is it so appealing to imagine
link |
00:27:17.400
the ways that all of human civilization will be destroyed?
link |
00:27:20.760
I think it's a good story.
link |
00:27:22.080
You know, it's a good story.
link |
00:27:23.120
And very interestingly, it mirrors religious stories,
link |
00:27:28.160
right, religious mythology.
link |
00:27:30.560
If you look at the mythology of most civilizations,
link |
00:27:34.360
it's about the world being headed towards some final events
link |
00:27:38.280
in which the world will be destroyed
link |
00:27:40.480
and some new world order will arise
link |
00:27:42.800
that will be mostly spiritual,
link |
00:27:44.920
like the apocalypse followed by a paradise probably, right?
link |
00:27:49.400
It's a very appealing story on a fundamental level.
link |
00:27:52.600
And we all need stories.
link |
00:27:54.560
We all need stories to structure the way we see the world,
link |
00:27:58.160
especially at timescales
link |
00:27:59.960
that are beyond our ability to make predictions, right?
link |
00:28:04.520
So on a more serious non exponential explosion,
link |
00:28:08.840
question, do you think there will be a time
link |
00:28:15.000
when we'll create something like human level intelligence
link |
00:28:19.800
or intelligent systems that will make you sit back
link |
00:28:23.800
and be just surprised at damn how smart this thing is?
link |
00:28:28.520
That doesn't require exponential growth
link |
00:28:30.160
or an exponential improvement,
link |
00:28:32.120
but what's your sense of the timeline and so on
link |
00:28:35.600
that you'll be really surprised at certain capabilities?
link |
00:28:41.080
And we'll talk about limitations and deep learning.
link |
00:28:42.560
So do you think in your lifetime,
link |
00:28:44.480
you'll be really damn surprised?
link |
00:28:46.600
Around 2013, 2014, I was many times surprised
link |
00:28:51.440
by the capabilities of deep learning actually.
link |
00:28:53.960
That was before we had assessed exactly
link |
00:28:55.920
what deep learning could do and could not do.
link |
00:28:57.880
And it felt like a time of immense potential.
link |
00:29:00.600
And then we started narrowing it down,
link |
00:29:03.080
but I was very surprised.
link |
00:29:04.360
I would say it has already happened.
link |
00:29:07.120
Was there a moment, there must've been a day in there
link |
00:29:10.800
where your surprise was almost bordering
link |
00:29:14.360
on the belief of the narrative that we just discussed.
link |
00:29:19.440
Was there a moment,
link |
00:29:20.800
because you've written quite eloquently
link |
00:29:22.400
about the limits of deep learning,
link |
00:29:23.960
was there a moment that you thought
link |
00:29:25.760
that maybe deep learning is limitless?
link |
00:29:30.000
No, I don't think I've ever believed this.
link |
00:29:32.400
What was really shocking is that it worked.
link |
00:29:35.560
It worked at all, yeah.
link |
00:29:37.640
But there's a big jump between being able
link |
00:29:40.520
to do really good computer vision
link |
00:29:43.400
and human level intelligence.
link |
00:29:44.920
So I don't think at any point I was under the impression
link |
00:29:49.520
that the results we got in computer vision
link |
00:29:51.280
meant that we were very close to human level intelligence.
link |
00:29:54.080
I don't think we're very close to human level intelligence.
link |
00:29:56.040
I do believe that there's no reason
link |
00:29:58.520
why we won't achieve it at some point.
link |
00:30:01.760
I also believe that the problem
link |
00:30:06.400
with talking about human level intelligence
link |
00:30:08.560
is that implicitly you're considering
link |
00:30:11.240
like an axis of intelligence with different levels,
link |
00:30:14.360
but that's not really how intelligence works.
link |
00:30:16.720
Intelligence is very multi dimensional.
link |
00:30:19.480
And so there's the question of capabilities,
link |
00:30:22.480
but there's also the question of being human like,
link |
00:30:25.560
and it's two very different things.
link |
00:30:27.040
Like you can build potentially
link |
00:30:28.280
very advanced intelligent agents
link |
00:30:30.640
that are not human like at all.
link |
00:30:32.640
And you can also build very human like agents.
link |
00:30:35.240
And these are two very different things, right?
link |
00:30:37.840
Right.
link |
00:30:38.760
Let's go from the philosophical to the practical.
link |
00:30:42.240
Can you give me a history of Keras
link |
00:30:44.240
and all the major deep learning frameworks
link |
00:30:46.440
that you kind of remember in relation to Keras
link |
00:30:48.480
and in general, TensorFlow, Theano, the old days.
link |
00:30:52.040
Can you give a brief overview Wikipedia style history
link |
00:30:55.400
and your role in it before we return to AGI discussions?
link |
00:30:59.120
Yeah, that's a broad topic.
link |
00:31:00.720
So I started working on Keras.
link |
00:31:04.920
It wasn't called Keras at the time.
link |
00:31:06.240
I actually picked the name like
link |
00:31:08.320
just the day I was going to release it.
link |
00:31:10.200
So I started working on it in February, 2015.
link |
00:31:14.800
And so at the time there weren't too many people
link |
00:31:17.240
working on deep learning, maybe like fewer than 10,000.
link |
00:31:20.320
The software tooling was not really developed.
link |
00:31:25.320
So the main deep learning library was Caffe,
link |
00:31:28.800
which was mostly C++.
link |
00:31:30.840
Why do you say Caffe was the main one?
link |
00:31:32.760
Caffe was vastly more popular than Theano
link |
00:31:36.000
in late 2014, early 2015.
link |
00:31:38.920
Caffe was the one library that everyone was using
link |
00:31:42.400
for computer vision.
link |
00:31:43.400
And computer vision was the most popular problem
link |
00:31:46.120
in deep learning at the time.
link |
00:31:46.960
Absolutely.
link |
00:31:47.800
Like ConvNets was like the subfield of deep learning
link |
00:31:50.440
that everyone was working on.
link |
00:31:53.160
So myself, so in late 2014,
link |
00:31:57.680
I was actually interested in RNNs,
link |
00:32:00.600
in recurrent neural networks,
link |
00:32:01.760
which was a very niche topic at the time, right?
link |
00:32:05.800
It really took off around 2016.
link |
00:32:08.640
And so I was looking for good tools.
link |
00:32:11.080
I had used Torch 7, I had used Theano,
link |
00:32:14.800
used Theano a lot in Kaggle competitions.
link |
00:32:19.320
I had used Caffe.
link |
00:32:20.840
And there was no like good solution for RNNs at the time.
link |
00:32:25.840
Like there was no reusable open source implementation
link |
00:32:28.640
of an LSTM, for instance.
link |
00:32:30.000
So I decided to build my own.
link |
00:32:32.920
And at first, the pitch for that was,
link |
00:32:35.440
it was gonna be mostly around LSTM recurrent neural networks.
link |
00:32:39.960
It was gonna be in Python.
link |
00:32:42.280
An important decision at the time
link |
00:32:44.280
that was kind of not obvious
link |
00:32:45.440
is that the models would be defined via Python code,
link |
00:32:50.360
which was kind of like going against the mainstream
link |
00:32:54.400
at the time because Caffe, PyLearn2, and so on,
link |
00:32:58.000
like all the big libraries were actually going
link |
00:33:00.600
with the approach of setting configuration files
link |
00:33:03.520
in YAML to define models.
link |
00:33:05.560
So some libraries were using code to define models,
link |
00:33:08.840
like Torch 7, obviously, but that was not Python.
link |
00:33:12.280
Lasagne was like a Theano based very early library
link |
00:33:16.680
that was, I think, developed, I don't remember exactly,
link |
00:33:18.640
probably late 2014.
link |
00:33:20.240
It's Python as well.
link |
00:33:21.200
It's Python as well.
link |
00:33:22.040
It was like on top of Theano.
link |
00:33:24.320
And so I started working on something
link |
00:33:29.480
and the value proposition at the time was that
link |
00:33:32.520
not only was it what I think was the first
link |
00:33:36.240
reusable open source implementation of LSTM,
link |
00:33:40.400
you could combine RNNs and convnets
link |
00:33:44.440
with the same library,
link |
00:33:45.440
which was not really possible before,
link |
00:33:46.920
like Caffe was only doing convnets.
link |
00:33:50.440
And it was kind of easy to use
link |
00:33:52.560
because, so before I was using Theano,
link |
00:33:54.440
I was actually using scikit-learn
link |
00:33:55.680
and I loved scikit-learn for its usability.
link |
00:33:58.320
So I drew a lot of inspiration from scikit-learn
link |
00:34:01.560
when I made Keras.
link |
00:34:02.400
It's almost like scikit-learn for neural networks.
link |
00:34:05.600
The fit function.
link |
00:34:06.680
Exactly, the fit function,
link |
00:34:07.960
like reducing a complex training loop
link |
00:34:10.800
to a single function call, right?
link |
00:34:12.880
And of course, some people will say,
link |
00:34:14.880
this is hiding a lot of details,
link |
00:34:16.320
but that's exactly the point, right?
link |
00:34:18.680
The magic is the point.
link |
00:34:20.280
So it's magical, but in a good way.
link |
00:34:22.680
It's magical in the sense that it's delightful.
link |
00:34:24.960
Yeah, yeah.
link |
00:34:26.160
I'm actually quite surprised.
link |
00:34:27.640
I didn't know that it was born out of desire
link |
00:34:29.600
to implement RNNs and LSTMs.
link |
00:34:32.480
It was.
link |
00:34:33.320
That's fascinating.
link |
00:34:34.160
So you were actually one of the first people
link |
00:34:36.040
to really try to attempt
link |
00:34:37.960
to get the major architectures together.
link |
00:34:41.000
And it's also interesting.
link |
00:34:42.760
You made me realize that that was a design decision at all
link |
00:34:45.160
is defining the model and code.
link |
00:34:47.360
Just, I'm putting myself in your shoes,
link |
00:34:49.920
whether the YAML, especially if Caffe was the most popular.
link |
00:34:53.200
It was the most popular by far.
link |
00:34:54.720
If I was, if I were, yeah, I don't,
link |
00:34:58.480
I didn't like the YAML thing,
link |
00:34:59.560
but it makes more sense that you will put
link |
00:35:02.840
in a configuration file, the definition of a model.
link |
00:35:05.720
That's an interesting gutsy move
link |
00:35:07.200
to stick with defining it in code.
link |
00:35:10.040
Just if you look back.
link |
00:35:11.600
Other libraries were doing it as well,
link |
00:35:13.480
but it was definitely the more niche option.
link |
00:35:16.320
Yeah.
link |
00:35:17.160
Okay, Keras and then.
link |
00:35:18.360
So I released Keras in March, 2015,
link |
00:35:21.520
and it got users pretty much from the start.
link |
00:35:24.160
So the deep learning community was very, very small
link |
00:35:25.800
at the time.
link |
00:35:27.240
Lots of people were starting to be interested in LSTM.
link |
00:35:30.600
So it was released at the right time
link |
00:35:32.440
because it was offering an easy to use LSTM implementation.
link |
00:35:35.560
Exactly at the time where lots of people started
link |
00:35:37.680
to be intrigued by the capabilities of RNN, RNNs for NLP.
link |
00:35:42.280
So it grew from there.
link |
00:35:43.920
Then I joined Google about six months later,
link |
00:35:51.480
and that was actually completely unrelated to Keras.
link |
00:35:54.920
So I actually joined a research team
link |
00:35:57.080
working on image classification,
link |
00:35:59.520
mostly like computer vision.
link |
00:36:00.680
So I was doing computer vision research
link |
00:36:02.320
at Google initially.
link |
00:36:03.640
And immediately when I joined Google,
link |
00:36:05.520
I was exposed to the early internal version of TensorFlow.
link |
00:36:10.520
And the way it appeared to me at the time,
link |
00:36:13.920
and it was definitely the way it was at the time
link |
00:36:15.720
is that this was an improved version of Theano.
link |
00:36:20.760
So I immediately knew I had to port Keras
link |
00:36:24.720
to this new TensorFlow thing.
link |
00:36:26.800
And I was actually very busy as a Noogler,
link |
00:36:29.800
as a new Googler.
link |
00:36:31.600
So I had no time to work on that.
link |
00:36:34.520
But then in November, I think it was November, 2015,
link |
00:36:38.680
TensorFlow got released.
link |
00:36:41.240
And it was kind of like my wake up call
link |
00:36:44.560
that, hey, I had to actually go and make it happen.
link |
00:36:47.320
So in December, I ported Keras to run on top of TensorFlow,
link |
00:36:52.200
but it was not exactly a port.
link |
00:36:53.320
It was more like a refactoring
link |
00:36:55.280
where I was abstracting away
link |
00:36:57.920
all the backend functionality into one module
link |
00:37:00.480
so that the same code base
link |
00:37:02.320
could run on top of multiple backends.
link |
00:37:05.080
So on top of TensorFlow or Theano.
link |
00:37:07.440
And for the next year,
link |
00:37:09.760
Theano stayed as the default option.
link |
00:37:15.400
It was easier to use, somewhat less buggy.
link |
00:37:20.640
It was much faster, especially when it came to RNNs.
link |
00:37:23.360
But eventually, TensorFlow overtook it.
link |
00:37:27.480
And TensorFlow, the early TensorFlow,
link |
00:37:30.200
has similar architectural decisions as Theano, right?
link |
00:37:33.960
So it was a natural transition.
link |
00:37:37.440
Yeah, absolutely.
link |
00:37:38.320
So what, I mean, at that point Keras was still a side,
link |
00:37:42.960
almost fun project, right?
link |
00:37:45.280
Yeah, so it was not my job assignment.
link |
00:37:49.040
It was not.
link |
00:37:50.360
I was doing it on the side.
link |
00:37:52.240
And even though it grew to have a lot of users
link |
00:37:55.840
for a deep learning library at the time, like, through 2016,
link |
00:37:59.600
but I wasn't doing it as my main job.
link |
00:38:02.480
So things started changing in,
link |
00:38:04.760
I think it must have been maybe October, 2016.
link |
00:38:10.200
So one year later.
link |
00:38:12.360
So Rajat, who was the lead on TensorFlow,
link |
00:38:15.240
basically showed up one day in our building
link |
00:38:19.240
where I was doing like,
link |
00:38:20.080
so I was doing research and things like,
link |
00:38:21.640
so I did a lot of computer vision research,
link |
00:38:24.640
also collaborations with Christian Szegedy
link |
00:38:27.560
and deep learning for theorem proving.
link |
00:38:29.640
It was a really interesting research topic.
link |
00:38:34.520
And so Rajat was saying,
link |
00:38:37.640
hey, we saw Keras, we like it.
link |
00:38:41.040
We saw that you're at Google.
link |
00:38:42.440
Why don't you come over for like a quarter
link |
00:38:45.280
and work with us?
link |
00:38:47.280
And I was like, yeah, that sounds like a great opportunity.
link |
00:38:49.240
Let's do it.
link |
00:38:50.400
And so I started working on integrating the Keras API
link |
00:38:55.720
into TensorFlow more tightly.
link |
00:38:57.320
So what followed up is a sort of like temporary
link |
00:39:02.640
TensorFlow only version of Keras
link |
00:39:05.480
that was in tf.contrib for a while.
link |
00:39:09.320
And finally moved to TensorFlow Core.
link |
00:39:12.200
And I've never actually gotten back
link |
00:39:15.360
to my old team doing research.
link |
00:39:17.600
Well, it's kind of funny that somebody like you
link |
00:39:22.320
who dreams of, or at least sees the power of AI systems
link |
00:39:28.960
that reason and theorem proving we'll talk about
link |
00:39:31.680
has also created a system that makes the most basic
link |
00:39:36.520
kind of Lego building that is deep learning
link |
00:39:40.400
super accessible, super easy.
link |
00:39:42.640
So beautifully so.
link |
00:39:43.800
It's a funny irony that you're both,
link |
00:39:47.720
you're responsible for both things,
link |
00:39:49.120
but so TensorFlow 2.0 is kind of, there's a sprint.
link |
00:39:54.000
I don't know how long it'll take,
link |
00:39:55.080
but there's a sprint towards the finish.
link |
00:39:56.960
What do you look, what are you working on these days?
link |
00:40:01.040
What are you excited about?
link |
00:40:02.160
What are you excited about in 2.0?
link |
00:40:04.280
I mean, eager execution.
link |
00:40:05.760
There's so many things that just make it a lot easier
link |
00:40:08.440
to work.
link |
00:40:09.760
What are you excited about and what's also really hard?
link |
00:40:13.640
What are the problems you have to kind of solve?
link |
00:40:15.800
So I've spent the past year and a half working on
link |
00:40:19.080
TensorFlow 2.0 and it's been a long journey.
link |
00:40:22.920
I'm actually extremely excited about it.
link |
00:40:25.080
I think it's a great product.
link |
00:40:26.440
It's a delightful product compared to TensorFlow 1.0.
link |
00:40:29.360
We've made huge progress.
link |
00:40:32.640
So on the Keras side, what I'm really excited about is that,
link |
00:40:37.400
so previously Keras has been this very easy to use
link |
00:40:42.400
high level interface to do deep learning.
link |
00:40:45.840
But if you wanted to,
link |
00:40:50.520
if you wanted a lot of flexibility,
link |
00:40:53.040
the Keras framework was probably not the optimal way
link |
00:40:57.520
to do things compared to just writing everything
link |
00:40:59.760
from scratch.
link |
00:41:01.800
So in some way, the framework was getting in the way.
link |
00:41:04.680
And in TensorFlow 2.0, you don't have this at all, actually.
link |
00:41:07.960
You have the usability of the high level interface,
link |
00:41:11.040
but you have the flexibility of this lower level interface.
link |
00:41:14.480
And you have this spectrum of workflows
link |
00:41:16.800
where you can get more or less usability
link |
00:41:21.560
and flexibility trade offs depending on your needs, right?
link |
00:41:26.640
You can write everything from scratch
link |
00:41:29.680
and you get a lot of help doing so
link |
00:41:32.320
by subclassing models and writing some train loops
link |
00:41:36.400
using eager execution.
link |
00:41:38.200
It's very flexible, it's very easy to debug,
link |
00:41:40.160
it's very powerful.
link |
00:41:42.280
But all of this integrates seamlessly
link |
00:41:45.000
with higher level features up to the classic Keras workflows,
link |
00:41:49.440
which are very scikit learn like
link |
00:41:51.560
and are ideal for a data scientist,
link |
00:41:56.040
machine learning engineer type of profile.
link |
00:41:58.240
So now you can have the same framework
link |
00:42:00.840
offering the same set of APIs
link |
00:42:02.880
that enable a spectrum of workflows
link |
00:42:05.000
that are more or less low level, more or less high level
link |
00:42:08.560
that are suitable for profiles ranging from researchers
link |
00:42:13.520
to data scientists and everything in between.
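As a rough sketch of the lower-level end of that spectrum (the model, data, and hyperparameters are placeholders chosen for the example): subclassing keras.Model and writing the training loop by hand with eager execution and GradientTape, which is the flexible workflow being described.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Flexible end of the spectrum: subclass keras.Model...
class TinyClassifier(keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = keras.layers.Dense(32, activation="relu")
        self.out = keras.layers.Dense(1)

    def call(self, inputs):
        return self.out(self.hidden(inputs))

model = TinyClassifier()
optimizer = keras.optimizers.Adam(1e-3)
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)

x = np.random.rand(256, 16).astype("float32")                 # toy features
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")  # toy labels
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

# ...and write the training loop yourself, step by step,
# with eager execution making each step easy to inspect and debug.
for epoch in range(2):
    for batch_x, batch_y in dataset:
        with tf.GradientTape() as tape:
            logits = model(batch_x, training=True)
            loss = loss_fn(batch_y, logits)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(f"epoch {epoch}: loss {float(loss):.4f}")
```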
link |
00:42:15.560
Yeah, so that's super exciting.
link |
00:42:16.960
I mean, it's not just that,
link |
00:42:18.400
it's connected to all kinds of tooling.
link |
00:42:21.680
You can go on mobile, you can go with TensorFlow Lite,
link |
00:42:24.520
you can go in the cloud or serving and so on.
link |
00:42:27.240
It all is connected together.
link |
00:42:28.960
Now some of the best software written ever
link |
00:42:31.880
is often done by one person, sometimes two.
link |
00:42:36.880
So with Google, you're now seeing sort of Keras
link |
00:42:40.800
having to be integrated in TensorFlow,
link |
00:42:42.840
which I'm sure has a ton of engineers working on it.
link |
00:42:46.800
And there's, I'm sure a lot of tricky design decisions
link |
00:42:51.040
to be made.
link |
00:42:52.200
How does that process usually happen
link |
00:42:54.440
from at least your perspective?
link |
00:42:56.800
What are the debates like?
link |
00:43:00.720
Is there a lot of thinking,
link |
00:43:04.200
considering different options and so on?
link |
00:43:06.880
Yes.
link |
00:43:08.160
So a lot of the time I spend at Google
link |
00:43:12.640
is actually spent on design discussions, right?
link |
00:43:17.280
Writing design docs, participating in design review meetings
link |
00:43:20.480
and so on.
link |
00:43:22.080
This is as important as actually writing code.
link |
00:43:25.240
Right.
link |
00:43:26.080
So there's a lot of thought, there's a lot of thought
link |
00:43:28.120
and a lot of care that is taken
link |
00:43:32.280
in coming up with these decisions
link |
00:43:34.160
and taking into account all of our users
link |
00:43:37.160
because TensorFlow has this extremely diverse user base,
link |
00:43:40.680
right?
link |
00:43:41.520
It's not like just one user segment
link |
00:43:43.120
where everyone has the same needs.
link |
00:43:45.480
We have small scale production users,
link |
00:43:47.640
large scale production users.
link |
00:43:49.520
We have startups, we have researchers,
link |
00:43:53.720
you know, it's all over the place.
link |
00:43:55.080
And we have to cater to all of their needs.
link |
00:43:57.560
If I just look at the standard debates
link |
00:44:00.040
of C++ or Python, there's some heated debates.
link |
00:44:04.000
Do you have those at Google?
link |
00:44:06.000
I mean, they're not heated in terms of emotionally,
link |
00:44:08.080
but there's probably multiple ways to do it, right?
link |
00:44:10.800
So how do you arrive through those design meetings
link |
00:44:14.040
at the best way to do it?
link |
00:44:15.440
Especially in deep learning where the field is evolving
link |
00:44:19.280
as you're doing it.
link |
00:44:21.880
Is there some magic to it?
link |
00:44:23.600
Is there some magic to the process?
link |
00:44:26.240
I don't know if there's magic to the process,
link |
00:44:28.280
but there definitely is a process.
link |
00:44:30.640
So making design decisions
link |
00:44:33.760
is about satisfying a set of constraints,
link |
00:44:36.080
but also trying to do so in the simplest way possible,
link |
00:44:39.920
because this is what can be maintained,
link |
00:44:42.240
this is what can be expanded in the future.
link |
00:44:44.920
So you don't want to naively satisfy the constraints
link |
00:44:49.120
by just, you know, for each capability you need available,
link |
00:44:51.880
you're gonna come up with one argument in your API
link |
00:44:53.960
and so on.
link |
00:44:54.800
You want to design APIs that are modular and hierarchical
link |
00:45:00.640
so that they have an API surface
link |
00:45:04.080
that is as small as possible, right?
link |
00:45:07.040
And you want this modular hierarchical architecture
link |
00:45:11.640
to reflect the way that domain experts
link |
00:45:14.560
think about the problem.
link |
00:45:16.400
Because as a domain expert,
link |
00:45:17.880
when you are reading about a new API,
link |
00:45:19.840
you're reading a tutorial or some docs pages,
link |
00:45:24.760
you already have a way that you're thinking about the problem.
link |
00:45:28.200
You already have like certain concepts in mind
link |
00:45:32.320
and you're thinking about how they relate together.
link |
00:45:35.680
And when you're reading docs,
link |
00:45:37.200
you're trying to build as quickly as possible
link |
00:45:40.280
a mapping between the concepts featured in your API
link |
00:45:45.280
and the concepts in your mind.
link |
00:45:46.800
So you're trying to map your mental model
link |
00:45:48.880
as a domain expert to the way things work in the API.
link |
00:45:53.600
So you need an API and an underlying implementation
link |
00:45:57.040
that are reflecting the way people think about these things.
link |
00:46:00.120
So it's about minimizing the time it takes to do the mapping.
link |
00:46:02.880
Yes, minimizing the time,
link |
00:46:04.680
the cognitive load there is
link |
00:46:06.560
in ingesting this new knowledge about your API.
link |
00:46:10.920
An API should not be self referential
link |
00:46:13.160
or referring to implementation details.
link |
00:46:15.520
It should only be referring to domain specific concepts
link |
00:46:19.160
that people already understand.
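As a toy illustration of that principle, consider a sketch in which the public names map onto concepts a domain expert already has (models, layers, optimizers, losses) and compose hierarchically; everything here is invented for illustration, it is not any real library's API.

```python
class Layer:
    """A domain concept the user already understands."""
    pass

class Dense(Layer):
    def __init__(self, units):
        self.units = units  # expressed in domain terms, not implementation details

class Model:
    def __init__(self, layers):
        self.layers = layers  # a model is composed of layers: hierarchy

    def compile(self, optimizer, loss):
        # the user speaks in terms of optimizers and losses,
        # never in terms of graph nodes, sessions, or memory buffers
        self.optimizer, self.loss = optimizer, loss

model = Model([Dense(64), Dense(10)])
model.compile(optimizer="sgd", loss="crossentropy")
```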
link |
00:46:23.240
Brilliant.
link |
00:46:24.480
So what's the future of Keras and TensorFlow look like?
link |
00:46:27.560
What does TensorFlow 3.0 look like?
link |
00:46:30.600
So that's kind of too far in the future for me to answer,
link |
00:46:33.720
especially since I'm not even the one making these decisions.
link |
00:46:37.800
Okay.
link |
00:46:39.080
But so from my perspective,
link |
00:46:41.240
which is just one perspective
link |
00:46:43.200
among many different perspectives on the TensorFlow team,
link |
00:46:47.200
I'm really excited by developing even higher level APIs,
link |
00:46:52.360
higher level than Keras.
link |
00:46:53.560
I'm really excited by hyperparameter tuning,
link |
00:46:56.480
by automated machine learning, AutoML.
link |
00:47:01.120
I think the future is not just, you know,
link |
00:47:03.200
defining a model like you were assembling Lego blocks
link |
00:47:07.600
and then calling fit on it.
link |
00:47:09.200
It's more like an automagical model
link |
00:47:13.680
that would just look at your data
link |
00:47:16.080
and optimize the objective you're after, right?
link |
00:47:19.040
So that's what I'm looking into.
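One concrete form this already takes today can be sketched with the KerasTuner library; the search space, objective, and data names below are placeholder assumptions, not a description of what is actually being built.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # The search space over architectures and hyperparameters.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))  # placeholder data
# best_model = tuner.get_best_models(num_models=1)[0]
```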
link |
00:47:23.040
Yeah, so you put the baby into a room with the problem
link |
00:47:26.480
and come back a few hours later
link |
00:47:28.760
with a fully solved problem.
link |
00:47:30.960
Exactly, it's not like a box of Legos.
link |
00:47:33.560
It's more like the combination of a kid
link |
00:47:35.920
that's really good at Legos and a box of Legos.
link |
00:47:38.800
It's just building the thing on its own.
link |
00:47:41.520
Very nice.
link |
00:47:42.680
So that's an exciting future.
link |
00:47:44.160
I think there's a huge amount of applications
link |
00:47:46.080
and revolutions to be had
link |
00:47:49.920
under the constraints of the discussion we previously had.
link |
00:47:52.640
But what do you think of the current limits of deep learning?
link |
00:47:57.480
If we look specifically at these function approximators
link |
00:48:03.840
that try to generalize from data.
link |
00:48:06.160
You've talked about local versus extreme generalization.
link |
00:48:11.120
You mentioned that neural networks don't generalize well
link |
00:48:13.280
and humans do.
link |
00:48:14.560
So there's this gap.
link |
00:48:17.640
And you've also mentioned that extreme generalization
link |
00:48:20.840
requires something like reasoning to fill those gaps.
link |
00:48:23.960
So how can we start trying to build systems like that?
link |
00:48:27.560
Right, yeah, so this is by design, right?
link |
00:48:30.600
Deep learning models are like huge parametric models,
link |
00:48:37.080
differentiable, so continuous,
link |
00:48:39.280
that go from an input space to an output space.
link |
00:48:42.680
And they're trained with gradient descent.
link |
00:48:44.120
So they're trained pretty much point by point.
link |
00:48:47.160
They are learning a continuous geometric morphing
link |
00:48:50.520
from an input vector space to an output vector space.
link |
00:48:55.320
And because this is done point by point,
link |
00:48:58.960
a deep neural network can only make sense
link |
00:49:02.200
of points in experience space that are very close
link |
00:49:05.880
to things that it has already seen in training data.
link |
00:49:08.520
At best, it can do interpolation across points.
link |
00:49:13.840
But that means in order to train your network,
link |
00:49:17.360
you need a dense sampling of the input cross output space,
link |
00:49:22.880
almost a point by point sampling,
link |
00:49:25.240
which can be very expensive if you're dealing
link |
00:49:27.160
with complex real world problems,
link |
00:49:29.320
like autonomous driving, for instance, or robotics.
link |
00:49:33.240
It's doable if you're looking at the subset
link |
00:49:36.000
of the visual space.
link |
00:49:37.120
But even then, it's still fairly expensive.
link |
00:49:38.800
You still need millions of examples.
link |
00:49:40.920
And it's only going to be able to make sense of things
link |
00:49:44.240
that are very close to what it has seen before.
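A minimal sketch of that limitation, under the assumption of a toy one dimensional regression task: a small network trained on a dense sampling of [-1, 1] interpolates reasonably inside that range and fails far outside it.

```python
import numpy as np
import tensorflow as tf

# Dense sampling of the input space on [-1, 1], target is the "true" mapping x^2.
x_train = np.random.uniform(-1.0, 1.0, size=(10000, 1)).astype("float32")
y_train = x_train ** 2

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, verbose=0)

print(model.predict(np.array([[0.5]])))   # roughly 0.25: interpolation works
print(model.predict(np.array([[10.0]])))  # nowhere near 100: extrapolation fails
```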
link |
00:49:46.880
And in contrast to that, well, of course,
link |
00:49:49.160
you have human intelligence.
link |
00:49:50.160
But even if you're not looking at human intelligence,
link |
00:49:53.240
you can look at very simple rules, algorithms.
link |
00:49:56.800
If you have a symbolic rule,
link |
00:49:58.080
it can actually apply to a very, very large set of inputs
link |
00:50:03.120
because it is abstract.
link |
00:50:04.880
It is not obtained by doing a point by point mapping.
link |
00:50:10.720
For instance, if you try to learn a sorting algorithm
link |
00:50:14.000
using a deep neural network,
link |
00:50:15.520
well, you're very much limited to learning point by point
link |
00:50:20.080
what the sorted representation of this specific list is like.
link |
00:50:24.360
But instead, you could have a very, very simple
link |
00:50:29.400
sorting algorithm written in a few lines.
link |
00:50:31.920
Maybe it's just two nested loops.
link |
00:50:35.560
And it can process any list at all because it is abstract,
link |
00:50:41.040
because it is a set of rules.
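For concreteness, one way to write the simple, abstract sorting routine described above, just two nested loops that work on any list with no training data at all:

```python
def bubble_sort(xs):
    # An abstract rule, not a point by point mapping: it applies to any list.
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

print(bubble_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```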
link |
00:50:42.240
So deep learning is really like point by point
link |
00:50:45.160
geometric morphings, trained with gradient descent.
link |
00:50:48.640
And meanwhile, abstract rules can generalize much better.
link |
00:50:53.640
And I think the future is we need to combine the two.
link |
00:50:56.160
So how do we, do you think, combine the two?
link |
00:50:59.160
How do we combine good point by point functions
link |
00:51:03.040
with programs, which is what symbolic AI type systems do?
link |
00:51:08.920
At which level does the combination happen?
link |
00:51:11.600
I mean, obviously we're jumping into the realm
link |
00:51:14.680
of where there's no good answers.
link |
00:51:16.880
It's just kind of ideas and intuitions and so on.
link |
00:51:20.280
Well, if you look at the really successful AI systems
link |
00:51:23.080
today, I think they are already hybrid systems
link |
00:51:26.320
that are combining symbolic AI with deep learning.
link |
00:51:29.520
For instance, successful robotics systems
link |
00:51:32.520
are already mostly model based, rule based,
link |
00:51:37.400
things like planning algorithms and so on.
link |
00:51:39.400
At the same time, they're using deep learning
link |
00:51:42.200
as perception modules.
link |
00:51:43.840
Sometimes they're using deep learning as a way
link |
00:51:46.000
to inject fuzzy intuition into a rule based process.
link |
00:51:50.920
If you look at the system like in a self driving car,
link |
00:51:54.560
it's not just one big end to end neural network.
link |
00:51:57.240
You know, that wouldn't work at all.
link |
00:51:59.000
Precisely because in order to train that,
link |
00:52:00.760
you would need a dense sampling of experience base
link |
00:52:05.160
when it comes to driving,
link |
00:52:06.200
which is completely unrealistic, obviously.
link |
00:52:08.880
Instead, the self driving car is mostly
link |
00:52:13.920
symbolic, you know, it's software, it's programmed by hand.
link |
00:52:18.360
So it's mostly based on explicit models.
link |
00:52:21.640
In this case, mostly 3D models of the environment
link |
00:52:25.840
around the car, but it's interfacing with the real world
link |
00:52:29.520
using deep learning modules, right?
link |
00:52:31.440
So the deep learning there serves as a way
link |
00:52:33.440
to convert the raw sensory information
link |
00:52:36.080
to something usable by symbolic systems.
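A schematic sketch of that hybrid structure, with every name and rule invented purely for illustration: a learned perception module turns raw sensor input into symbols, and a hand written, rule based planner acts on those symbols.

```python
def perceive(camera_frame, detector):
    """Deep learning module: raw sensory input -> symbolic detections."""
    # e.g. [{"class": "pedestrian", "distance_m": 12.0}, ...]
    return detector.predict(camera_frame)

def plan(detections, speed_mps):
    """Symbolic module: explicit, hand written rules over the detections."""
    for obj in detections:
        if obj["class"] == "pedestrian" and obj["distance_m"] < speed_mps * 2.0:
            return "brake"
    return "keep_lane"
```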
link |
00:52:39.760
Okay, well, let's linger on that a little more.
link |
00:52:42.400
So dense sampling from input to output.
link |
00:52:45.440
You said it's obviously very difficult.
link |
00:52:48.240
Is it possible?
link |
00:52:50.120
In the case of self driving, you mean?
link |
00:52:51.800
Let's say self driving, right?
link |
00:52:53.040
Self driving for many people,
link |
00:52:57.560
let's not even talk about self driving,
link |
00:52:59.520
let's talk about steering, so staying inside the lane.
link |
00:53:05.040
Lane following, yeah, it's definitely a problem
link |
00:53:07.080
you can solve with an end to end deep learning model,
link |
00:53:08.880
but that's like one small subset.
link |
00:53:10.600
Hold on a second.
link |
00:53:11.440
Yeah, I don't know why you're jumping
link |
00:53:12.760
from the extreme so easily,
link |
00:53:14.480
because I disagree with you on that.
link |
00:53:16.280
I think, well, it's not obvious to me
link |
00:53:21.000
that you can solve lane following.
link |
00:53:23.400
No, it's not obvious, I think it's doable.
link |
00:53:25.840
I think in general, there is no hard limitations
link |
00:53:31.200
to what you can learn with a deep neural network,
link |
00:53:33.680
as long as the search space is rich enough,
link |
00:53:40.320
is flexible enough, and as long as you have
link |
00:53:42.240
this dense sampling of the input cross output space.
link |
00:53:45.360
The problem is that this dense sampling
link |
00:53:47.720
could mean anything from 10,000 examples
link |
00:53:51.120
to like trillions and trillions.
link |
00:53:52.840
So that's my question.
link |
00:53:54.360
So what's your intuition?
link |
00:53:56.200
And if you could just give it a chance
link |
00:53:58.720
and think what kind of problems can be solved
link |
00:54:01.880
by getting a huge amounts of data
link |
00:54:04.240
and thereby creating a dense mapping.
link |
00:54:08.000
So let's think about natural language dialogue,
link |
00:54:12.480
the Turing test.
link |
00:54:14.000
Do you think the Turing test can be solved
link |
00:54:17.000
with a neural network alone?
link |
00:54:21.120
Well, the Turing test is all about tricking people
link |
00:54:24.440
into believing they're talking to a human.
link |
00:54:26.880
And I don't think that's actually very difficult
link |
00:54:29.040
because it's more about exploiting human perception
link |
00:54:35.600
and not so much about intelligence.
link |
00:54:37.520
There's a big difference between mimicking
link |
00:54:39.680
intelligent behavior and actual intelligent behavior.
link |
00:54:42.080
So, okay, let's look at maybe the Alexa prize and so on.
link |
00:54:45.360
The different formulations of the natural language
link |
00:54:47.480
conversation that are less about mimicking
link |
00:54:50.520
and more about maintaining a fun conversation
link |
00:54:52.800
that lasts for 20 minutes.
link |
00:54:54.720
That's a little less about mimicking
link |
00:54:56.200
and that's more about, I mean, it's still mimicking,
link |
00:54:59.080
but it's more about being able to carry forward
link |
00:55:01.440
a conversation with all the tangents that happen
link |
00:55:03.640
in dialogue and so on.
link |
00:55:05.080
Do you think that problem is learnable
link |
00:55:08.320
with a neural network that does the point to point mapping?
link |
00:55:14.520
So I think it would be very, very challenging
link |
00:55:16.280
to do this with deep learning.
link |
00:55:17.800
I don't think it's out of the question either.
link |
00:55:21.480
I wouldn't rule it out.
link |
00:55:23.240
The space of problems that can be solved
link |
00:55:25.400
with a large neural network.
link |
00:55:26.920
What's your sense about the space of those problems?
link |
00:55:30.080
So useful problems for us.
link |
00:55:32.560
In theory, it's infinite, right?
link |
00:55:34.800
You can solve any problem.
link |
00:55:36.200
In practice, well, deep learning is a great fit
link |
00:55:39.800
for perception problems.
link |
00:55:41.800
In general, any problem which is not naturally amenable
link |
00:55:47.640
to explicit handcrafted rules or rules that you can generate
link |
00:55:52.200
by exhaustive search over some program space.
link |
00:55:56.080
So perception, artificial intuition,
link |
00:55:59.320
as long as you have a sufficient training dataset.
link |
00:56:03.240
And that's the question, I mean, perception,
link |
00:56:05.360
there's interpretation and understanding of the scene,
link |
00:56:08.400
which seems to be outside the reach
link |
00:56:10.280
of current perception systems.
link |
00:56:12.960
So do you think larger networks will be able
link |
00:56:15.920
to start to understand the physics
link |
00:56:18.280
and the physics of the scene,
link |
00:56:21.080
the three dimensional structure and relationships
link |
00:56:23.400
of objects in the scene and so on?
link |
00:56:25.560
Or really that's where symbolic AI has to step in?
link |
00:56:28.320
Well, it's always possible to solve these problems
link |
00:56:34.480
with deep learning.
link |
00:56:36.800
It's just extremely inefficient.
link |
00:56:38.560
A model would be an explicit rule based abstract model
link |
00:56:42.000
would be a far better, more compressed
link |
00:56:45.240
representation of physics.
link |
00:56:46.840
Then learning just this mapping between
link |
00:56:49.080
in this situation, this thing happens.
link |
00:56:50.960
If you change the situation slightly,
link |
00:56:52.720
then this other thing happens and so on.
link |
00:56:54.760
Do you think it's possible to automatically generate
link |
00:56:57.440
the programs that would require that kind of reasoning?
link |
00:57:02.200
Or does it have to, so the way the expert systems fail,
link |
00:57:05.360
there's so many facts about the world
link |
00:57:07.120
had to be hand coded in.
link |
00:57:08.960
Do you think it's possible to learn those logical statements
link |
00:57:14.600
that are true about the world and their relationships?
link |
00:57:18.200
Do you think, I mean, that's kind of what theorem proving
link |
00:57:20.360
at a basic level is trying to do, right?
link |
00:57:22.680
Yeah, except it's much harder to formulate statements
link |
00:57:26.160
about the world compared to formulating
link |
00:57:28.480
mathematical statements.
link |
00:57:30.320
Statements about the world tend to be subjective.
link |
00:57:34.200
So can you learn rule based models?
link |
00:57:39.600
Yes, definitely.
link |
00:57:40.920
That's the field of program synthesis.
link |
00:57:43.640
However, today we just don't really know how to do it.
link |
00:57:48.040
So it's very much a graph search or tree search problem.
link |
00:57:52.400
And so we are limited to the sort of tree search and graph
link |
00:57:56.800
search algorithms that we have today.
link |
00:57:58.560
Personally, I think genetic algorithms are very promising.
link |
00:58:02.760
So almost like genetic programming.
link |
00:58:04.360
Genetic programming, exactly.
link |
00:58:05.560
Can you discuss the field of program synthesis?
link |
00:58:08.840
Like how many people are working and thinking about it?
link |
00:58:14.560
Where we are in the history of program synthesis
link |
00:58:17.960
and what are your hopes for it?
link |
00:58:20.720
Well, if it were deep learning, this is like the 90s.
link |
00:58:24.600
So meaning that we already have existing solutions.
link |
00:58:29.120
We are starting to have some basic understanding
link |
00:58:34.280
of what this is about.
link |
00:58:35.480
But it's still a field that is in its infancy.
link |
00:58:38.000
There are very few people working on it.
link |
00:58:40.440
There are very few real world applications.
link |
00:58:44.480
So the one real world application I'm aware of
link |
00:58:47.640
is Flash Fill in Excel.
link |
00:58:51.680
It's a way to automatically learn very simple programs
link |
00:58:55.080
to format cells in an Excel spreadsheet
link |
00:58:58.200
from a few examples.
link |
00:59:00.240
For instance, learning a way to format a date, things like that.
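A toy sketch of the Flash Fill idea: enumerate a tiny, invented space of string programs and keep the first one that is consistent with the user's examples. The real system searches a far richer program space; this only illustrates the search-from-examples pattern.

```python
def make_programs():
    # A tiny, hand picked DSL of string transformations (purely illustrative).
    yield ("upper", str.upper)
    yield ("lower", str.lower)
    yield ("first_word", lambda s: s.split()[0])
    yield ("initials", lambda s: ".".join(w[0].upper() for w in s.split()) + ".")

def synthesize(examples):
    # Return the first program consistent with every (input, output) example.
    for name, prog in make_programs():
        if all(prog(inp) == out for inp, out in examples):
            return name, prog
    return None

examples = [("ada lovelace", "A.L."), ("alan turing", "A.T.")]
name, prog = synthesize(examples)
print(name, prog("grace hopper"))  # initials G.H.
```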
link |
00:59:02.800
Oh, that's fascinating.
link |
00:59:03.680
Yeah.
link |
00:59:04.560
You know, OK, that's a fascinating topic.
link |
00:59:06.280
I always wonder when I provide a few samples to Excel,
link |
00:59:10.480
what it's able to figure out.
link |
00:59:12.600
Like just giving it a few dates, what
link |
00:59:15.960
are you able to figure out from the pattern I just gave you?
link |
00:59:18.480
That's a fascinating question.
link |
00:59:19.760
And it's fascinating whether that's learnable patterns.
link |
00:59:23.320
And you're saying they're working on that.
link |
00:59:25.520
How big is the toolbox currently?
link |
00:59:28.200
Are we completely in the dark?
link |
00:59:29.520
So if you said the 90s.
link |
00:59:30.440
In terms of program synthesis?
link |
00:59:31.720
No.
link |
00:59:32.360
So I would say, so maybe 90s is even too optimistic.
link |
00:59:37.720
Because by the 90s, we already understood back prop.
link |
00:59:41.080
We already understood the engine of deep learning,
link |
00:59:43.960
even though we couldn't really see its potential quite.
link |
00:59:47.280
Today, I don't think we have found
link |
00:59:48.520
the engine of program synthesis.
link |
00:59:50.400
So we're in the winter before back prop.
link |
00:59:52.880
Yeah.
link |
00:59:54.160
In a way, yes.
link |
00:59:55.720
So I do believe program synthesis and general discrete search
link |
01:00:00.120
over rule based models is going to be
link |
01:00:02.760
a cornerstone of AI research in the next century.
link |
01:00:06.640
And that doesn't mean we are going to drop deep learning.
link |
01:00:10.200
Deep learning is immensely useful.
link |
01:00:11.880
Like, being able to learn is a very flexible, adaptable,
link |
01:00:17.200
parametric model.
link |
01:00:18.120
So, you've got to understand, that's actually immensely useful.
link |
01:00:20.720
All it's doing is pattern recognition.
link |
01:00:23.040
But being good at pattern recognition, given lots of data,
link |
01:00:25.640
is just extremely powerful.
link |
01:00:27.920
So we are still going to be working on deep learning.
link |
01:00:30.320
We are going to be working on program synthesis.
link |
01:00:31.840
We are going to be combining the two in increasingly automated
link |
01:00:34.680
ways.
link |
01:00:36.400
So let's talk a little bit about data.
link |
01:00:38.520
You've tweeted, about 10,000 deep learning papers
link |
01:00:44.600
have been written about hard coding priors
link |
01:00:47.080
about a specific task in a neural network architecture
link |
01:00:49.600
works better than a lack of a prior.
link |
01:00:52.440
Basically, summarizing all these efforts,
link |
01:00:55.120
they put a name to an architecture.
link |
01:00:56.920
But really, what they're doing is hard coding some priors
link |
01:00:59.280
that improve the performance of the system.
link |
01:01:01.560
Which gets straight to the point and is probably true.
link |
01:01:06.880
So you say that you can always buy performance by,
link |
01:01:09.800
in quotes, either training on more data,
link |
01:01:12.920
better data, or by injecting task information
link |
01:01:15.480
into the architecture or the preprocessing.
link |
01:01:18.400
However, this isn't informative about the generalization power
link |
01:01:21.280
of the technique used, the fundamental ability
link |
01:01:23.080
to generalize.
link |
01:01:24.200
Do you think we can go far by coming up
link |
01:01:26.800
with better methods for this kind of cheating,
link |
01:01:29.920
for better methods of large scale annotation of data?
link |
01:01:33.520
So building better priors.
link |
01:01:34.960
If you automate it, it's not cheating anymore.
link |
01:01:37.280
Right.
link |
01:01:38.360
I'm joking about the cheating, but large scale.
link |
01:01:41.600
So basically, I'm asking about something
link |
01:01:46.560
that hasn't, from my perspective,
link |
01:01:48.280
been researched too much is exponential improvement
link |
01:01:53.360
in annotation of data.
link |
01:01:55.960
Do you often think about?
link |
01:01:58.120
I think it's actually been researched quite a bit.
link |
01:02:00.840
You just don't see publications about it.
link |
01:02:02.720
Because people who publish papers
link |
01:02:05.840
are going to publish about known benchmarks.
link |
01:02:07.920
Sometimes they're going to release a new benchmark.
link |
01:02:09.800
People who actually have real world large scale
link |
01:02:12.200
deep learning problems, they're going
link |
01:02:13.880
to spend a lot of resources into data annotation
link |
01:02:16.960
and good data annotation pipelines,
link |
01:02:18.400
but you don't see any papers about it.
link |
01:02:19.640
That's interesting.
link |
01:02:20.400
So do you think, certainly resources,
link |
01:02:22.720
but do you think there's innovation happening?
link |
01:02:24.840
Oh, yeah.
link |
01:02:25.880
To clarify the point in the tweet.
link |
01:02:28.880
So machine learning in general is
link |
01:02:31.160
the science of generalization.
link |
01:02:33.840
You want to generate knowledge that
link |
01:02:37.800
can be reused across different data sets,
link |
01:02:40.440
across different tasks.
link |
01:02:42.000
And if instead you're looking at one data set
link |
01:02:45.280
and then you are hard coding knowledge about this task
link |
01:02:50.000
into your architecture, this is no more useful
link |
01:02:54.040
than training a network and then saying, oh, I
link |
01:02:56.760
found these weight values perform well.
link |
01:03:01.920
So David Ha, I don't know if you know David,
link |
01:03:05.680
he had a paper the other day about weight
link |
01:03:08.760
agnostic neural networks.
link |
01:03:10.400
And this is a very interesting paper
link |
01:03:12.120
because it really illustrates the fact
link |
01:03:14.400
that an architecture, even without weights,
link |
01:03:17.400
an architecture is knowledge about a task.
link |
01:03:21.360
It encodes knowledge.
link |
01:03:23.640
And when it comes to architectures
link |
01:03:25.840
that are handcrafted by researchers, in some cases,
link |
01:03:30.440
it is very, very clear that all they are doing
link |
01:03:34.160
is artificially reencoding the template that
link |
01:03:38.880
corresponds to the proper way to solve the task encoding
link |
01:03:44.400
a given data set.
link |
01:03:45.200
For instance, I know if you looked
link |
01:03:48.120
at the bAbI dataset, which is about natural language
link |
01:03:52.280
question answering, it is generated by an algorithm.
link |
01:03:55.520
So these are question answer pairs
link |
01:03:57.680
that are generated by an algorithm.
link |
01:03:59.280
The algorithm is solving a certain template.
link |
01:04:01.520
Turns out, if you craft a network that
link |
01:04:04.400
literally encodes this template, you
link |
01:04:06.360
can solve this data set with nearly 100% accuracy.
link |
01:04:09.640
But that doesn't actually tell you
link |
01:04:11.160
anything about how to solve question answering
link |
01:04:14.640
in general, which is the point.
link |
01:04:17.680
The question is just to linger on it,
link |
01:04:19.400
whether it's from the data side or from the size
link |
01:04:21.560
of the network.
link |
01:04:23.280
I don't know if you've read the blog post by Rich Sutton,
link |
01:04:25.920
The Bitter Lesson, where he says,
link |
01:04:28.400
the biggest lesson that we can read from 70 years of AI
link |
01:04:31.480
research is that general methods that leverage computation
link |
01:04:34.720
are ultimately the most effective.
link |
01:04:37.160
So as opposed to figuring out methods
link |
01:04:39.720
that can generalize effectively, do you
link |
01:04:41.840
think we can get pretty far by just having something
link |
01:04:47.720
that leverages computation and the improvement of computation?
link |
01:04:51.520
Yeah, so I think Rich is making a very good point, which
link |
01:04:54.960
is that a lot of these papers, which are actually
link |
01:04:57.560
all about manually hardcoding prior knowledge about a task
link |
01:05:02.800
into some system, it doesn't have
link |
01:05:04.720
to be deep learning architecture, but into some system.
link |
01:05:08.600
These papers are not actually making any impact.
link |
01:05:11.920
Instead, what's making really long term impact
link |
01:05:14.800
is very simple, very general systems
link |
01:05:18.520
that are really agnostic to all these tricks.
link |
01:05:21.280
Because these tricks do not generalize.
link |
01:05:23.320
And of course, the one general and simple thing
link |
01:05:27.480
that you should focus on is that which leverages computation.
link |
01:05:33.160
Because computation, the availability
link |
01:05:36.200
of large scale computation has been increasing exponentially
link |
01:05:39.400
following Moore's law.
link |
01:05:40.560
So if your algorithm is all about exploiting this,
link |
01:05:44.080
then your algorithm is suddenly exponentially improving.
link |
01:05:47.440
So I think Rich is definitely right.
link |
01:05:52.400
However, he's right about the past 70 years.
link |
01:05:57.120
He's like assessing the past 70 years.
link |
01:05:59.440
I am not sure that this assessment will still
link |
01:06:02.360
hold true for the next 70 years.
link |
01:06:04.880
It might to some extent.
link |
01:06:07.160
I suspect it will not.
link |
01:06:08.560
Because the truth of his assessment
link |
01:06:11.560
is a function of the context in which this research took place.
link |
01:06:16.800
And the context is changing.
link |
01:06:18.600
Moore's law might not be applicable anymore,
link |
01:06:21.440
for instance, in the future.
link |
01:06:23.760
And I do believe that when you tweak one aspect of a system,
link |
01:06:31.200
when you exploit one aspect of a system,
link |
01:06:32.920
some other aspect starts becoming the bottleneck.
link |
01:06:36.480
Let's say you have unlimited computation.
link |
01:06:38.800
Well, then data is the bottleneck.
link |
01:06:41.440
And I think we are already starting
link |
01:06:43.560
to be in a regime where our systems are
link |
01:06:45.720
so large in scale and so data hungry
link |
01:06:48.120
that data today and the quality of data
link |
01:06:50.360
and the scale of data is the bottleneck.
link |
01:06:53.040
And in this environment, the bitter lesson from Rich
link |
01:06:58.160
is not going to be true anymore.
link |
01:07:00.800
So I think we are going to move from a focus
link |
01:07:03.960
on a computation scale to focus on data efficiency.
link |
01:07:09.840
Data efficiency.
link |
01:07:10.720
So that's getting to the question of symbolic AI.
link |
01:07:13.120
But to linger on the deep learning approaches,
link |
01:07:16.280
do you have hope for either unsupervised learning
link |
01:07:19.240
or reinforcement learning, which are
link |
01:07:23.280
ways of being more data efficient in terms
link |
01:07:28.120
of the amount of data they need that required human annotation?
link |
01:07:31.560
So unsupervised learning and reinforcement learning
link |
01:07:34.280
are frameworks for learning, but they are not
link |
01:07:36.640
like any specific technique.
link |
01:07:39.000
So usually when people say reinforcement learning,
link |
01:07:41.200
what they really mean is deep reinforcement learning,
link |
01:07:43.320
which is like one approach which is actually very questionable.
link |
01:07:47.440
The question I was asking was unsupervised learning
link |
01:07:50.920
with deep neural networks and deep reinforcement learning.
link |
01:07:54.680
Well, these are not really data efficient
link |
01:07:56.840
because you're still leveraging these huge parametric models
link |
01:08:00.520
point by point with gradient descent.
link |
01:08:03.720
It is more efficient in terms of the number of annotations,
link |
01:08:08.000
the density of annotations you need.
link |
01:08:09.520
So the idea being to learn the latent space around which
link |
01:08:13.840
the data is organized and then map the sparse annotations
link |
01:08:17.960
into it.
link |
01:08:18.760
And sure, I mean, that's clearly a very good idea.
link |
01:08:23.560
It's not really a topic I would be working on,
link |
01:08:26.080
but it's clearly a good idea.
link |
01:08:28.040
So it would get us to solve some problems that?
link |
01:08:31.760
It will get us to incremental improvements
link |
01:08:34.880
in labeled data efficiency.
link |
01:08:38.240
Do you have concerns about short term or long term threats
link |
01:08:43.520
from AI, from artificial intelligence?
link |
01:08:47.800
Yes, definitely to some extent.
link |
01:08:50.640
And what's the shape of those concerns?
link |
01:08:52.800
This is actually something I've briefly written about.
link |
01:08:56.880
But the capabilities of deep learning technology
link |
01:09:02.680
can be used in many ways that are
link |
01:09:05.200
concerning from mass surveillance with things
link |
01:09:09.760
like facial recognition.
link |
01:09:11.880
In general, tracking lots of data about everyone
link |
01:09:15.440
and then being able to making sense of this data
link |
01:09:18.920
to do identification, to do prediction.
link |
01:09:22.240
That's concerning.
link |
01:09:23.160
That's something that's being very aggressively pursued
link |
01:09:26.560
by totalitarian states like China.
link |
01:09:31.440
One thing I am very much concerned about
link |
01:09:34.000
is that our lives are increasingly online,
link |
01:09:40.640
are increasingly digital, made of information,
link |
01:09:43.280
made of information consumption and information production,
link |
01:09:48.080
our digital footprint, I would say.
link |
01:09:51.800
And if you absorb all of this data
link |
01:09:56.280
and you are in control of where you consume information,
link |
01:10:01.440
social networks and so on, recommendation engines,
link |
01:10:06.960
then you can build a sort of reinforcement
link |
01:10:10.200
loop for human behavior.
link |
01:10:13.760
You can observe the state of your mind at time t.
link |
01:10:18.360
You can predict how you would react
link |
01:10:21.080
to different pieces of content, how
link |
01:10:23.800
to get you to move your mind in a certain direction.
link |
01:10:27.000
And then you can feed you the specific piece of content
link |
01:10:33.160
that would move you in a specific direction.
link |
01:10:35.680
And you can do this at scale in terms
link |
01:10:41.800
of doing it continuously in real time.
link |
01:10:44.960
You can also do it at scale in terms
link |
01:10:46.440
of scaling this to many, many people, to entire populations.
link |
01:10:50.480
So potentially, artificial intelligence,
link |
01:10:53.840
even in its current state, if you combine it
link |
01:10:57.440
with the internet, with the fact that all of our lives
link |
01:11:01.760
are moving to digital devices and digital information
link |
01:11:05.120
consumption and creation, what you get
link |
01:11:08.720
is the possibility to achieve mass manipulation of behavior
link |
01:11:14.480
and mass psychological control.
link |
01:11:16.840
And this is a very real possibility.
link |
01:11:18.520
Yeah, so you're talking about any kind of recommender system.
link |
01:11:22.080
Let's look at the YouTube algorithm, Facebook,
link |
01:11:26.160
anything that recommends content you should watch next.
link |
01:11:29.720
And it's fascinating to think that there's
link |
01:11:32.960
some aspects of human behavior that you can frame as a problem of,
link |
01:11:41.120
does this person hold Republican beliefs or Democratic beliefs?
link |
01:11:45.400
And this is a trivial, that's an objective function.
link |
01:11:50.240
And you can optimize, and you can measure,
link |
01:11:52.600
and you can turn everybody into a Republican
link |
01:11:54.360
or everybody into a Democrat.
link |
01:11:56.080
I do believe it's true.
link |
01:11:57.840
So the human mind is very, if you look at the human mind
link |
01:12:03.680
as a kind of computer program, it
link |
01:12:05.320
has a very large exploit surface.
link |
01:12:07.560
It has many, many vulnerabilities.
link |
01:12:09.360
Exploit surfaces, yeah.
link |
01:12:10.840
Ways you can control it.
link |
01:12:13.520
For instance, when it comes to your political beliefs,
link |
01:12:16.680
this is very much tied to your identity.
link |
01:12:19.400
So for instance, if I'm in control of your news feed
link |
01:12:23.040
on your favorite social media platforms,
link |
01:12:26.000
this is actually where you're getting your news from.
link |
01:12:29.360
And of course, I can choose to only show you
link |
01:12:32.960
news that will make you see the world in a specific way.
link |
01:12:37.120
But I can also create incentives for you
link |
01:12:41.920
to post about some political beliefs.
link |
01:12:44.720
And then when I get you to express a statement,
link |
01:12:47.960
if it's a statement that me as the controller,
link |
01:12:51.840
I want to reinforce.
link |
01:12:53.800
I can just show it to people who will agree,
link |
01:12:55.560
and they will like it.
link |
01:12:56.880
And that will reinforce the statement in your mind.
link |
01:12:59.280
If this is a statement I want you to,
link |
01:13:02.760
this is a belief I want you to abandon,
link |
01:13:05.320
I can, on the other hand, show it to opponents.
link |
01:13:09.600
We'll attack you.
link |
01:13:10.640
And because they attack you, at the very least,
link |
01:13:12.840
next time you will think twice about posting it.
link |
01:13:16.840
But maybe you will even start believing this
link |
01:13:20.280
because you got pushback.
link |
01:13:22.840
So there are many ways in which social media platforms
link |
01:13:28.440
can potentially control your opinions.
link |
01:13:30.520
And today, so all of these things
link |
01:13:35.040
are already being controlled by AI algorithms.
link |
01:13:38.240
These algorithms do not have any explicit political goal
link |
01:13:41.880
today.
link |
01:13:42.880
Well, potentially they could, like if some totalitarian
link |
01:13:48.680
government takes over social media platforms
link |
01:13:52.720
and decides that now we are going to use this not just
link |
01:13:55.360
for mass surveillance, but also for mass opinion control
link |
01:13:58.040
and behavior control.
link |
01:13:59.360
Very bad things could happen.
link |
01:14:01.840
But what's really fascinating and actually quite concerning
link |
01:14:06.480
is that even without an explicit intent to manipulate,
link |
01:14:11.280
you're already seeing very dangerous dynamics
link |
01:14:14.760
in terms of how these content recommendation
link |
01:14:18.160
algorithms behave.
link |
01:14:19.800
Because right now, the goal, the objective function
link |
01:14:24.920
of these algorithms is to maximize engagement,
link |
01:14:28.640
which seems fairly innocuous at first.
link |
01:14:32.520
However, it is not because content
link |
01:14:36.480
that will maximally engage people, get people to react
link |
01:14:42.000
in an emotional way, get people to click on something.
link |
01:14:44.720
It is very often content that is not
link |
01:14:52.200
healthy to the public discourse.
link |
01:14:54.400
For instance, fake news are far more
link |
01:14:58.200
likely to get you to click on them than real news
link |
01:15:01.320
simply because they are not constrained to reality.
link |
01:15:06.960
So they can be as outrageous, as surprising,
link |
01:15:11.360
as good stories as you want because they're artificial.
link |
01:15:15.880
To me, that's an exciting world because so much good
link |
01:15:18.880
can come.
link |
01:15:19.560
So there's an opportunity to educate people.
link |
01:15:24.520
You can balance people's worldview with other ideas.
link |
01:15:31.200
So there's so many objective functions.
link |
01:15:33.800
The space of objective functions that
link |
01:15:35.840
create better civilizations is large, arguably infinite.
link |
01:15:40.720
But there's also a large space that
link |
01:15:43.720
creates division and destruction, civil war,
link |
01:15:51.480
a lot of bad stuff.
link |
01:15:53.160
And the worry is, naturally, probably that space
link |
01:15:56.920
is bigger, first of all.
link |
01:15:59.160
And if we don't explicitly think about what kind of effects
link |
01:16:04.920
are going to be observed from different objective functions,
link |
01:16:08.320
then we're going to get into trouble.
link |
01:16:10.160
But the question is, how do we get into rooms
link |
01:16:14.480
and have discussions, so inside Google, inside Facebook,
link |
01:16:18.560
inside Twitter, and think about, OK,
link |
01:16:21.840
how can we drive up engagement and, at the same time,
link |
01:16:24.840
create a good society?
link |
01:16:28.200
Is it even possible to have that kind
link |
01:16:29.560
of philosophical discussion?
link |
01:16:31.720
I think you can definitely try.
link |
01:16:33.080
So from my perspective, I would feel rather uncomfortable
link |
01:16:37.280
with companies that are in control of these newsfeed
link |
01:16:41.560
algorithms, with them making explicit decisions
link |
01:16:47.120
to manipulate people's opinions or behaviors,
link |
01:16:50.440
even if the intent is good, because that's
link |
01:16:53.480
a very totalitarian mindset.
link |
01:16:55.200
So instead, what I would like to see
link |
01:16:57.440
is probably never going to happen,
link |
01:16:58.880
because it's not super realistic,
link |
01:17:00.360
but that's actually something I really care about.
link |
01:17:02.520
I would like all these algorithms
link |
01:17:06.280
to present configuration settings to their users,
link |
01:17:10.560
so that the users can actually make the decision about how
link |
01:17:14.600
they want to be impacted by these information
link |
01:17:19.000
recommendation, content recommendation algorithms.
link |
01:17:21.960
For instance, as a user of something
link |
01:17:24.240
like YouTube or Twitter, maybe I want
link |
01:17:26.520
to maximize learning about a specific topic.
link |
01:17:30.280
So I want the algorithm to feed my curiosity,
link |
01:17:36.800
which is in itself a very interesting problem.
link |
01:17:38.760
So instead of maximizing my engagement,
link |
01:17:41.200
it will maximize how fast and how much I'm learning.
link |
01:17:44.600
And it will also take into account the accuracy,
link |
01:17:47.360
hopefully, of the information I'm learning.
link |
01:17:50.680
So yeah, the user should be able to determine exactly
link |
01:17:55.680
how these algorithms are affecting their lives.
link |
01:17:58.560
I don't want actually any entity making decisions
link |
01:18:03.520
about in which direction they're going to try to manipulate me.
link |
01:18:09.480
I want technology.
link |
01:18:11.680
So AI, these algorithms are increasingly
link |
01:18:14.280
going to be our interface to a world that is increasingly
link |
01:18:18.560
made of information.
link |
01:18:19.960
And I want everyone to be in control of this interface,
link |
01:18:25.840
to interface with the world on their own terms.
link |
01:18:29.160
So if someone wants these algorithms
link |
01:18:32.840
to serve their own personal growth goals,
link |
01:18:37.640
they should be able to configure these algorithms
link |
01:18:40.640
in such a way.
link |
01:18:41.800
Yeah, but so I know it's painful to have explicit decisions.
link |
01:18:46.680
But there is underlying explicit decisions,
link |
01:18:51.080
which is some of the most beautiful fundamental
link |
01:18:53.360
philosophy that we have before us,
link |
01:18:57.400
which is personal growth.
link |
01:19:01.120
If I want to watch videos from which I can learn,
link |
01:19:05.680
what does that mean?
link |
01:19:08.080
So if I have a checkbox that wants to emphasize learning,
link |
01:19:11.800
there's still an algorithm with explicit decisions in it
link |
01:19:15.480
that would promote learning.
link |
01:19:17.800
What does that mean for me?
link |
01:19:19.200
For example, I've watched a documentary on flat Earth
link |
01:19:22.800
theory, I guess.
link |
01:19:27.280
I learned a lot.
link |
01:19:28.240
I'm really glad I watched it.
link |
01:19:29.800
It was a friend recommended it to me.
link |
01:19:32.560
Because I don't have such an allergic reaction to crazy
link |
01:19:35.800
people, as my fellow colleagues do.
link |
01:19:37.640
But it was very eye opening.
link |
01:19:40.360
And for others, it might not be.
link |
01:19:42.120
For others, they might just get turned off by that, same
link |
01:19:45.560
with Republican and Democrat.
link |
01:19:47.160
And it's a non trivial problem.
link |
01:19:50.200
And first of all, if it's done well,
link |
01:19:52.880
I don't think it's something that wouldn't happen,
link |
01:19:56.560
that YouTube wouldn't be promoting,
link |
01:19:59.280
or Twitter wouldn't be.
link |
01:20:00.200
It's just a really difficult problem,
link |
01:20:02.280
how to give people control.
link |
01:20:05.520
Well, it's mostly an interface design problem.
link |
01:20:08.960
The way I see it, you want to create technology
link |
01:20:11.080
that's like a mentor, or a coach, or an assistant,
link |
01:20:16.400
so that it's not your boss.
link |
01:20:20.520
You are in control of it.
link |
01:20:22.560
You are telling it what to do for you.
link |
01:20:25.760
And if you feel like it's manipulating you,
link |
01:20:27.840
it's not actually doing what you want.
link |
01:20:31.760
You should be able to switch to a different algorithm.
link |
01:20:34.920
So that's fine tune control.
link |
01:20:36.440
You kind of learn that you're trusting
link |
01:20:38.840
the human collaboration.
link |
01:20:40.080
I mean, that's how I see autonomous vehicles too,
link |
01:20:41.920
is giving as much information as possible,
link |
01:20:44.480
and you learn that dance yourself.
link |
01:20:47.240
Yeah, Adobe, I don't know if you use Adobe product
link |
01:20:50.280
for like Photoshop.
link |
01:20:52.280
They're trying to see if they can inject YouTube
link |
01:20:55.040
into their interface, basically allowing it
link |
01:20:57.120
to show you all these videos,
link |
01:20:59.840
because everybody's confused about what to do with the features.
link |
01:21:03.320
So basically teach people by linking to,
link |
01:21:07.120
in that way, it's an assistant that uses videos
link |
01:21:10.280
as a basic element of information.
link |
01:21:13.440
Okay, so what practically should people do
link |
01:21:18.240
to try to fight against abuses of these algorithms,
link |
01:21:24.000
or algorithms that manipulate us?
link |
01:21:27.400
Honestly, it's a very, very difficult problem,
link |
01:21:29.280
because to start with, there is very little public awareness
link |
01:21:32.800
of these issues.
link |
01:21:35.040
Very few people would think there's anything wrong
link |
01:21:38.520
with the newsfeed algorithm,
link |
01:21:39.720
even though there is actually something wrong already,
link |
01:21:42.040
which is that it's trying to maximize engagement
link |
01:21:44.480
most of the time, which has very negative side effects.
link |
01:21:49.880
So ideally, so the very first thing is to stop
link |
01:21:56.160
trying to purely maximize engagement,
link |
01:21:59.560
try to propagate content based on popularity, right?
link |
01:22:06.560
Instead, take into account the goals
link |
01:22:11.040
and the profiles of each user.
link |
01:22:13.560
So you will be, one example is, for instance,
link |
01:22:16.920
when I look at topic recommendations on Twitter,
link |
01:22:20.800
it's like, you know, they have this news tab
link |
01:22:24.480
with these recommendations.
link |
01:22:25.480
It's always the worst coverage,
link |
01:22:28.480
because it's content that appeals
link |
01:22:30.360
to the lowest common denominator
link |
01:22:34.080
to all Twitter users, because they're trying to optimize.
link |
01:22:37.080
They're purely trying to optimize popularity.
link |
01:22:39.040
They're purely trying to optimize engagement.
link |
01:22:41.320
But that's not what I want.
link |
01:22:42.960
So they should put me in control of some setting
link |
01:22:46.080
so that I define what's the objective function
link |
01:22:50.360
that Twitter is going to be following
link |
01:22:52.200
to show me this content.
link |
01:22:54.120
And honestly, so this is all about interface design.
link |
01:22:57.360
And we are not, it's not realistic
link |
01:22:59.440
to give users control of a bunch of knobs
link |
01:23:01.760
that define algorithm.
link |
01:23:03.400
Instead, we should purely put them in charge
link |
01:23:06.760
of defining the objective function.
link |
01:23:09.400
Like, let the user tell us what they want to achieve,
link |
01:23:13.240
how they want this algorithm to impact their lives.
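A toy sketch of what putting the user in charge of the objective function could look like; the scoring functions and item fields are invented for illustration, and the ranking code stays the same regardless of which objective the user picks.

```python
def score(item, objective):
    # The user-chosen objective determines what gets optimized.
    if objective == "engagement":
        return item["predicted_clicks"]
    if objective == "learning":
        return item["novelty"] * item["estimated_accuracy"]
    raise ValueError(f"unknown objective: {objective}")

def rank(items, user_settings):
    # Same ranking machinery, different objective per user.
    return sorted(items,
                  key=lambda it: score(it, user_settings["objective"]),
                  reverse=True)
```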
link |
01:23:15.280
So do you think it is that,
link |
01:23:16.680
or do they provide individual article by article
link |
01:23:19.360
reward structure where you give a signal,
link |
01:23:21.600
I'm glad I saw this, or I'm glad I didn't?
link |
01:23:24.720
So like a Spotify type feedback mechanism,
link |
01:23:28.480
it works to some extent.
link |
01:23:30.680
I'm kind of skeptical about it
link |
01:23:32.000
because the only way the algorithm,
link |
01:23:34.880
the algorithm will attempt to relate your choices
link |
01:23:39.120
with the choices of everyone else,
link |
01:23:41.040
which might, you know, if you have an average profile
link |
01:23:45.000
that works fine, I'm sure Spotify recommendations work fine
link |
01:23:47.880
if you just like mainstream stuff.
link |
01:23:49.560
If you don't, it can be, it's not optimal at all actually.
link |
01:23:53.960
It will be an inefficient search
link |
01:23:56.040
for the part of the Spotify world that represents you.
link |
01:24:00.800
So it's a tough problem,
link |
01:24:02.960
but do note that even a feedback system
link |
01:24:07.960
like what Spotify has does not give me control
link |
01:24:10.880
over what the algorithm is trying to optimize for.
link |
01:24:16.320
Well, public awareness, which is what we're doing now,
link |
01:24:19.360
is a good place to start.
link |
01:24:21.360
Do you have concerns about longterm existential threats
link |
01:24:25.960
of artificial intelligence?
link |
01:24:28.280
Well, as I was saying,
link |
01:24:31.040
our world is increasingly made of information.
link |
01:24:33.360
AI algorithms are increasingly going to be our interface
link |
01:24:36.240
to this world of information,
link |
01:24:37.880
and somebody will be in control of these algorithms.
link |
01:24:41.480
And that can put us in all kinds of bad situations, right?
link |
01:24:45.920
It has risks.
link |
01:24:46.880
It has risks coming from potentially large companies
link |
01:24:50.840
wanting to optimize their own goals,
link |
01:24:53.760
maybe profit, maybe something else.
link |
01:24:55.960
Also from governments who might want to use these algorithms
link |
01:25:00.720
as a means of control of the population.
link |
01:25:03.520
Do you think there's existential threat
link |
01:25:05.000
that could arise from that?
link |
01:25:06.320
So existential threat.
link |
01:25:09.120
So maybe you're referring to the singularity narrative
link |
01:25:13.240
where robots just take over.
link |
01:25:15.560
Well, no, I'm not talking about Terminator robots,
link |
01:25:18.320
and I don't believe it has to be a singularity.
link |
01:25:21.000
We're just talking to, just like you said,
link |
01:25:24.800
the algorithm controlling masses of populations.
link |
01:25:28.920
The existential threat being,
link |
01:25:32.640
that we hurt ourselves, much like a nuclear war would hurt us.
link |
01:25:36.760
That kind of thing.
link |
01:25:37.600
I don't think that requires a singularity.
link |
01:25:39.480
That requires a loss of control over AI algorithm.
link |
01:25:42.560
Yes.
link |
01:25:43.560
So I do agree there are concerning trends.
link |
01:25:47.000
Honestly, I wouldn't want to make any longterm predictions.
link |
01:25:52.960
I don't think today we really have the capability
link |
01:25:56.000
to see what the dangers of AI
link |
01:25:58.560
are going to be in 50 years, in 100 years.
link |
01:26:01.360
I do see that we are already faced
link |
01:26:04.800
with concrete and present dangers
link |
01:26:08.840
surrounding the negative side effects
link |
01:26:11.560
of content recommendation systems, of newsfeed algorithms
link |
01:26:14.960
concerning algorithmic bias as well.
link |
01:26:18.640
So we are delegating more and more
link |
01:26:22.240
decision processes to algorithms.
link |
01:26:25.080
Some of these algorithms are handcrafted,
link |
01:26:26.760
some are learned from data,
link |
01:26:29.360
but we are delegating control.
link |
01:26:32.920
Sometimes it's a good thing, sometimes not so much.
link |
01:26:36.280
And there is in general very little supervision
link |
01:26:39.480
of this process, right?
link |
01:26:41.000
So we are still in this period of very fast change,
link |
01:26:45.400
even chaos, where society is restructuring itself,
link |
01:26:50.920
turning into an information society,
link |
01:26:53.160
which itself is turning into
link |
01:26:54.520
an increasingly automated information processing society.
link |
01:26:58.360
And well, yeah, I think the best we can do today
link |
01:27:02.520
is try to raise awareness around some of these issues.
link |
01:27:06.040
And I think we're actually making good progress.
link |
01:27:07.680
If you look at algorithmic bias, for instance,
link |
01:27:12.760
three years ago, even two years ago,
link |
01:27:14.760
very, very few people were talking about it.
link |
01:27:17.040
And now all the big companies are talking about it.
link |
01:27:20.320
They are often not in a very serious way,
link |
01:27:22.360
but at least it is part of the public discourse.
link |
01:27:24.560
You see people in Congress talking about it.
link |
01:27:27.080
And it all started from raising awareness.
link |
01:27:31.960
Right.
link |
01:27:32.800
So in terms of alignment problem,
link |
01:27:36.080
trying to teach them, as we allow algorithms,
link |
01:27:39.400
just even recommender systems on Twitter,
link |
01:27:43.640
encoding human values and morals,
link |
01:27:48.280
decisions that touch on ethics,
link |
01:27:50.200
how hard do you think that problem is?
link |
01:27:52.600
How do we have loss functions in neural networks
link |
01:27:57.240
that have some component,
link |
01:27:58.640
some fuzzy components of human morals?
link |
01:28:01.080
Well, I think this is really all about objective function engineering,
link |
01:28:06.080
which is probably going to be increasingly a topic of concern in the future.
link |
01:28:10.520
Like for now, we're just using very naive loss functions
link |
01:28:14.640
because the hard part is not actually what you're trying to minimize.
link |
01:28:17.760
It's everything else.
link |
01:28:19.040
But as the everything else is going to be increasingly automated,
link |
01:28:22.840
we're going to be focusing our human attention
link |
01:28:27.040
on increasingly high level components,
link |
01:28:30.240
like what's actually driving the whole learning system,
link |
01:28:32.680
like the objective function.
link |
01:28:33.960
So loss function engineering is going to be,
link |
01:28:36.920
loss function engineer is probably going to be a job title in the future.
link |
01:28:40.640
And then the tooling you're creating with Keras essentially
link |
01:28:44.520
takes care of all the details underneath.
link |
01:28:47.040
And basically the human expert is needed for exactly that.
link |
01:28:52.720
That's the idea.
link |
01:28:53.920
Keras is the interface between the data you're collecting
link |
01:28:57.640
and the business goals.
link |
01:28:59.080
And your job as an engineer is going to be to express your business goals
link |
01:29:03.480
and your understanding of your business or your product,
link |
01:29:06.720
your system as a kind of loss function or a kind of set of constraints.
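A small sketch of what that division of labor looks like in Keras terms: the engineer encodes a preference as a custom loss, here an invented asymmetric penalty on under-prediction, and hands the rest to the framework.

```python
import tensorflow as tf

def asymmetric_loss(y_true, y_pred):
    # Invented business preference: under-predicting is 3x as costly
    # as over-predicting.
    err = y_true - y_pred
    return tf.reduce_mean(tf.where(err > 0, 3.0 * tf.square(err), tf.square(err)))

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss=asymmetric_loss)
# model.fit(x, y, epochs=10)  # placeholder data; the framework handles the rest
```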
link |
01:29:11.840
Does the possibility of creating an AGI system excite you or scare you or bore you?
link |
01:29:19.480
So intelligence can never really be general.
link |
01:29:22.080
You know, at best it can have some degree of generality like human intelligence.
link |
01:29:26.400
It also always has some specialization in the same way that human intelligence
link |
01:29:30.640
is specialized in a certain category of problems,
link |
01:29:33.440
is specialized in the human experience.
link |
01:29:35.440
And when people talk about AGI,
link |
01:29:37.280
I'm never quite sure if they're talking about very, very smart AI,
link |
01:29:42.520
so smart that it's even smarter than humans,
link |
01:29:45.080
or they're talking about human like intelligence,
link |
01:29:48.000
because these are different things.
link |
01:29:49.680
Let's say, presumably I'm impressing you today with my humanness.
link |
01:29:54.760
So imagine that I was in fact a robot.
link |
01:29:59.240
So what does that mean?
link |
01:30:01.920
That I'm impressing you with natural language processing.
link |
01:30:04.920
Maybe if you weren't able to see me, maybe this is a phone call.
link |
01:30:07.840
So that kind of system.
link |
01:30:10.000
Companion.
link |
01:30:11.120
So that's very much about building human like AI.
link |
01:30:15.040
And you're asking me, you know, is this an exciting perspective?
link |
01:30:18.200
Yes.
link |
01:30:19.440
I think so, yes.
link |
01:30:21.760
Not so much because of what artificial human like intelligence could do,
link |
01:30:28.000
but, you know, from an intellectual perspective,
link |
01:30:30.880
I think if you could build truly human like intelligence,
link |
01:30:34.120
that means you could actually understand human intelligence,
link |
01:30:37.240
which is fascinating, right?
link |
01:30:39.880
Human like intelligence is going to require emotions.
link |
01:30:42.680
It's going to require consciousness,
link |
01:30:44.400
which are not things that would normally be required by an intelligent system.
link |
01:30:49.720
If you look at, you know, we were mentioning earlier like science
link |
01:30:53.160
as a superhuman problem solving agent or system,
link |
01:30:59.600
it does not have consciousness, it doesn't have emotions.
link |
01:31:02.120
In general, so emotions,
link |
01:31:04.320
I see consciousness as being on the same spectrum as emotions.
link |
01:31:07.640
It is a component of the subjective experience
link |
01:31:12.280
that is meant very much to guide behavior generation, right?
link |
01:31:18.800
It's meant to guide your behavior.
link |
01:31:20.800
In general, human intelligence and animal intelligence
link |
01:31:24.520
has evolved for the purpose of behavior generation, right?
link |
01:31:29.280
Including in a social context.
link |
01:31:30.680
So that's why we actually need emotions.
link |
01:31:32.480
That's why we need consciousness.
link |
01:31:34.920
An artificial intelligence system developed in a different context
link |
01:31:38.360
may well never need them, may well never be conscious like science.
link |
01:31:42.800
Well, on that point, I would argue it's possible to imagine
link |
01:31:47.960
that there's echoes of consciousness in science
link |
01:31:51.480
when viewed as an organism, that science is conscious.
link |
01:31:55.480
So, I mean, how would you go about testing this hypothesis?
link |
01:31:59.160
How do you probe the subjective experience of an abstract system like science?
link |
01:32:07.000
Well, probing any subjective experience is impossible
link |
01:32:10.400
because I'm not science, I'm Lex.
link |
01:32:13.200
So I can't probe another entity, any more than I can probe the bacteria on my skin.
link |
01:32:20.520
You're Lex, I can ask you questions about your subjective experience
link |
01:32:24.160
and you can answer me, and that's how I know you're conscious.
link |
01:32:28.440
Yes, but that's because we speak the same language.
link |
01:32:31.840
Perhaps we would have to speak the language of science in order to ask it.
link |
01:32:35.520
Honestly, I think consciousness, just like emotions of pain and pleasure,
link |
01:32:40.320
is not something that inevitably arises
link |
01:32:44.160
from any sort of sufficiently intelligent information processing.
link |
01:32:47.920
It is a feature of the mind, and if you've not implemented it explicitly, it is not there.
link |
01:32:53.920
So you think it's an emergent feature of a particular architecture.
link |
01:32:58.960
So do you think...
link |
01:33:00.320
It's a feature in the same sense.
link |
01:33:02.000
So, again, the subjective experience is all about guiding behavior.
link |
01:33:08.240
If the problems you're trying to solve don't really involve an embodied agent,
link |
01:33:15.120
maybe in a social context, generating behavior and pursuing goals like this.
link |
01:33:19.520
And if you look at science, that's not really what's happening.
link |
01:33:22.160
Even though it is, it is a form of artificial intelligence,
link |
01:33:27.920
in the sense that it is solving problems, it is accumulating knowledge,
link |
01:33:31.920
accumulating solutions and so on.
link |
01:33:35.040
So if you're not explicitly implementing a subjective experience,
link |
01:33:39.440
implementing certain emotions and implementing consciousness,
link |
01:33:44.000
it's not going to just spontaneously emerge.
link |
01:33:47.360
Yeah.
link |
01:33:48.080
But so for a human like intelligence system that has consciousness,
link |
01:33:53.200
do you think it needs to have a body?
link |
01:33:55.840
Yes, definitely.
link |
01:33:56.720
I mean, it doesn't have to be a physical body, right?
link |
01:33:59.600
And there's not that much difference between a realistic simulation and the real world.
link |
01:34:03.440
So there has to be something you have to preserve kind of thing.
link |
01:34:06.400
Yes, but human like intelligence can only arise in a human like context.
link |
01:34:11.840
Intelligence needs other humans in order for you to demonstrate
link |
01:34:16.800
that you have human like intelligence, essentially.
link |
01:34:19.040
Yes.
link |
01:34:20.320
So what kind of tests and demonstration would be sufficient for you
link |
01:34:28.080
to demonstrate human like intelligence?
link |
01:34:30.960
Yeah.
link |
01:34:31.360
Just out of curiosity, you've talked about in terms of theorem proving
link |
01:34:35.600
and program synthesis, I think you've written
link |
01:34:38.000
that there are no good benchmarks for this.
link |
01:34:40.480
Yeah.
link |
01:34:40.720
That's one of the problems.
link |
01:34:42.000
So let's talk program synthesis.
link |
01:34:46.320
So what do you imagine is a good...
link |
01:34:48.800
I think they're related questions, for human like intelligence
link |
01:34:51.360
and for program synthesis.
link |
01:34:53.360
What's a good benchmark for either or both?
link |
01:34:56.080
Right.
link |
01:34:56.480
So I mean, you're actually asking two questions,
link |
01:34:59.200
one is about quantifying intelligence
link |
01:35:02.480
and comparing the intelligence of an artificial system
link |
01:35:06.880
to the intelligence of a human.
link |
01:35:08.480
And the other is about the degree to which this intelligence is human like.
link |
01:35:13.440
It's actually two different questions.
link |
01:35:16.560
So you mentioned earlier the Turing test.
link |
01:35:19.680
Well, I actually don't like the Turing test because it's very lazy.
link |
01:35:23.200
It's all about completely bypassing the problem of defining and measuring intelligence
link |
01:35:28.720
and instead delegating to a human judge or a panel of human judges.
link |
01:35:34.160
So it's a total copout, right?
link |
01:35:38.160
If you want to measure how human like an agent is,
link |
01:35:43.760
I think you have to make it interact with other humans.
link |
01:35:47.600
Maybe it's not necessarily a good idea to have these other humans be the judges.
link |
01:35:53.760
Maybe you should just observe behavior and compare it to what a human would actually have done.
link |
01:36:00.560
When it comes to measuring how smart, how clever an agent is
link |
01:36:05.120
and comparing that to the degree of human intelligence.
link |
01:36:11.120
So we're already talking about two things, right?
link |
01:36:13.520
The degree, kind of like the magnitude of an intelligence and its direction, right?
link |
01:36:20.320
Like the norm of a vector and its direction.
link |
01:36:23.280
And the direction is like human likeness and the magnitude, the norm is intelligence.
link |
01:36:32.720
You could call it intelligence, right?
link |
01:36:34.080
So the direction, in your sense, the space of directions that are human like, is very narrow.
link |
01:36:41.040
Yeah.
link |
01:36:42.240
So how would you measure the magnitude of intelligence in a system
link |
01:36:48.880
in a way that also enables you to compare it to that of a human?
link |
01:36:54.640
Well, if you look at different benchmarks for intelligence today,
link |
01:36:59.200
they're all too focused on skill at a given task.
link |
01:37:04.160
Like skill at playing chess, skill at playing Go, skill at playing Dota.
link |
01:37:10.720
And I think that's not the right way to go about it because you can always
link |
01:37:15.600
beat a human at one specific task.
link |
01:37:19.200
The reason why our skill at playing Go or juggling or anything is impressive
link |
01:37:23.920
is because we are expressing this skill within a certain set of constraints.
link |
01:37:28.400
If you remove the constraints, the constraints that we have one lifetime,
link |
01:37:32.320
that we have this body and so on, if you remove the context,
link |
01:37:36.080
if you have unlimited training data, if you have access to, you know,
link |
01:37:40.480
for instance, if you look at juggling, if you have no restriction on the hardware,
link |
01:37:44.640
then achieving arbitrary levels of skill is not very interesting
link |
01:37:48.400
and says nothing about the amount of intelligence you've achieved.
link |
01:37:52.400
So if you want to measure intelligence, you need to rigorously define what
link |
01:37:57.440
intelligence is, which in itself, you know, it's a very challenging problem.
link |
01:38:02.960
And do you think that's possible?
link |
01:38:04.320
To define intelligence? Yes, absolutely.
link |
01:38:06.000
I mean, you can provide, many people have provided, you know, some definition.
link |
01:38:10.560
I have my own definition.
link |
01:38:12.000
Where does your definition begin?
link |
01:38:13.440
Where does your definition begin if it doesn't end?
link |
01:38:16.240
Well, I think intelligence is essentially the efficiency
link |
01:38:22.320
with which you turn experience into generalizable programs.
link |
01:38:29.760
So what that means is it's the efficiency with which
link |
01:38:32.800
you turn a sampling of experience space into
link |
01:38:36.720
the ability to process a larger chunk of experience space.
link |
01:38:46.000
So measuring skill can be one proxy across many different tasks,
link |
01:38:52.560
can be one proxy for measuring intelligence.
link |
01:38:54.480
But if you want to only measure skill, you should control for two things.
link |
01:38:58.720
You should control for the amount of experience that your system has
link |
01:39:04.960
and the priors that your system has.
link |
01:39:08.080
But if you look at two agents and you give them the same priors
link |
01:39:13.120
and you give them the same amount of experience,
link |
01:39:16.160
there is one of the agents that is going to learn programs,
link |
01:39:21.360
representations, something, a model that will perform well
link |
01:39:25.440
on the larger chunk of experience space than the other.
link |
01:39:28.720
And that is the smarter agent.
link |
01:39:30.960
Yeah. So if you fix the experience, which one generates better programs,
link |
01:39:37.680
better meaning more generalizable.
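
To make that definition a little more concrete, here is a toy sketch. It is purely illustrative and not Chollet's formal measure: the function, the agents, and the scoring are all assumptions. The idea is to hold the experience fixed, let two agents turn it into different programs, and score each program by how well it handles a much larger slice of experience space than it was fitted on.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)                      # the "experience space" to model
x_train = rng.uniform(-1.0, 1.0, size=20)    # the same small experience given to both agents
y_train = f(x_train)
x_test = rng.uniform(-4.0, 4.0, size=1000)   # a larger chunk of experience space
y_test = f(x_test)

def fit_poly(degree):
    """An 'agent' that turns the fixed experience into a polynomial program."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return lambda x: np.polyval(coeffs, x)

def generalization_score(program):
    """Higher is better: negative mean squared error on unseen experience."""
    return -np.mean((program(x_test) - y_test) ** 2)

agent_a = fit_poly(degree=3)    # low-capacity program
agent_b = fit_poly(degree=15)   # high-capacity program, prone to memorizing the sample
print("agent A:", generalization_score(agent_a))
print("agent B:", generalization_score(agent_b))
```

Both agents see the same experience and (implicitly) the same priors; the one whose learned program scores better on the wider test region is the more intelligent one by this definition.
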
link |
01:39:39.520
That's really interesting.
link |
01:39:40.560
That's a very nice, clean definition of...
link |
01:39:42.400
Oh, by the way, in this definition, it is already very obvious
link |
01:39:47.280
that intelligence has to be specialized
link |
01:39:49.440
because you're talking about experience space
link |
01:39:51.680
and you're talking about segments of experience space.
link |
01:39:54.080
You're talking about priors and you're talking about experience.
link |
01:39:57.200
All of these things define the context in which intelligence emerges.
link |
01:40:04.480
And you can never look at the totality of experience space, right?
link |
01:40:09.760
So intelligence has to be specialized.
link |
01:40:12.160
But it can be sufficiently large, the experience space,
link |
01:40:14.960
even though it's specialized.
link |
01:40:16.080
There's a certain point when the experience space is large enough
link |
01:40:19.120
to where it might as well be general.
link |
01:40:22.000
It feels general. It looks general.
link |
01:40:23.920
Sure. I mean, it's very relative.
link |
01:40:25.680
Like, for instance, many people would say human intelligence is general.
link |
01:40:29.360
In fact, it is quite specialized.
link |
01:40:32.800
We can definitely build systems that start from the same innate priors
link |
01:40:37.120
as what humans have at birth.
link |
01:40:39.120
Because we already understand fairly well
link |
01:40:42.320
what sort of priors we have as humans.
link |
01:40:44.480
Like many people have worked on this problem.
link |
01:40:46.800
Most notably, Elizabeth Spelke from Harvard.
link |
01:40:51.040
I don't know if you know her.
link |
01:40:52.240
She's worked a lot on what she calls core knowledge.
link |
01:40:56.000
And it is very much about trying to determine and describe
link |
01:41:00.640
what priors we are born with.
link |
01:41:02.320
Like language skills and so on, all that kind of stuff.
link |
01:41:04.720
Exactly.
link |
01:41:06.880
So we have some pretty good understanding of what priors we are born with.
link |
01:41:11.440
So we could...
link |
01:41:13.760
So I've actually been working on a benchmark for the past couple years,
link |
01:41:17.760
you know, on and off.
link |
01:41:18.640
I hope to be able to release it at some point.
link |
01:41:20.480
That's exciting.
link |
01:41:21.760
The idea is to measure the intelligence of systems
link |
01:41:26.800
by controlling for priors,
link |
01:41:28.640
controlling for the amount of experience,
link |
01:41:30.480
and by assuming the same priors as what humans are born with.
link |
01:41:34.800
So that you can actually compare these scores to human intelligence.
link |
01:41:39.520
You can actually have humans pass the same test in a way that's fair.
link |
01:41:43.280
Yeah. And so importantly, such a benchmark should be such that any amount
link |
01:41:52.960
of practicing does not increase your score.
link |
01:41:56.480
So try to picture a game where no matter how much you play this game,
link |
01:42:01.600
that does not change your skill at the game.
link |
01:42:05.040
Can you picture that?
link |
01:42:05.920
As a person who deeply appreciates practice, I cannot actually.
link |
01:42:11.040
There's actually a very simple trick.
link |
01:42:16.560
So in order to come up with a task,
link |
01:42:19.440
so the only thing you can measure is skill at the task.
link |
01:42:21.760
Yes.
link |
01:42:22.320
All tasks are going to involve priors.
link |
01:42:24.800
Yes.
link |
01:42:25.600
The trick is to know what they are and to describe that.
link |
01:42:29.920
And then you make sure that this is the same set of priors as what humans start with.
link |
01:42:33.760
So you create a task that assumes these priors, that exactly documents these priors,
link |
01:42:38.560
so that the priors are made explicit and there are no other priors involved.
link |
01:42:42.240
And then you generate a certain number of samples in experience space for this task, right?
link |
01:42:49.840
And this, for one task, assuming that the task is new for the agent passing it,
link |
01:42:56.320
that's one test of this definition of intelligence that we set up.
link |
01:43:04.320
And now you can scale that to many different tasks,
link |
01:43:06.880
that each task should be new to the agent passing it, right?
link |
01:43:11.360
And also it should be human interpretable and understandable
link |
01:43:14.480
so that you can actually have a human pass the same test.
link |
01:43:16.880
And then you can compare the score of your machine and the score of your human.
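
A hypothetical sketch of that evaluation protocol. The names, the Task structure, and the toy "hidden offset" tasks below are all assumptions for illustration, not the actual benchmark being described: each task documents the priors it assumes, is meant to be novel to whoever takes it, and the same scoring loop works for a machine agent or for a human answering by hand.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    name: str
    assumed_priors: List[str]   # priors made explicit, e.g. ["objectness", "numbers"]
    train_pairs: list           # the small experience sample the agent is given
    test_pairs: list            # held-out queries used for scoring

def score(agent: Callable, tasks: List[Task]) -> float:
    """Fraction of held-out queries solved across tasks the agent has never seen before."""
    correct = total = 0
    for task in tasks:
        predict = agent(task.train_pairs)   # the agent builds a program from the sample
        for query, answer in task.test_pairs:
            correct += int(predict(query) == answer)
            total += 1
    return correct / total

# Toy example: each task is "apply a hidden integer offset"; the agent must
# infer the offset from the training pairs alone.
tasks = [
    Task("offset+2", ["numbers"], [(1, 3), (4, 6)], [(10, 12), (7, 9)]),
    Task("offset-5", ["numbers"], [(9, 4), (6, 1)], [(20, 15), (3, -2)]),
]

def offset_agent(train_pairs):
    delta = train_pairs[0][1] - train_pairs[0][0]
    return lambda x: x + delta

print("machine score:", score(offset_agent, tasks))  # a human could be scored by the same loop
```

Because the priors are listed explicitly on each task and every task is new to the test taker, the resulting score is comparable across machines and humans, which is the property the conversation is after.
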
link |
01:43:19.760
Which could be a lot of stuff.
link |
01:43:20.720
You could even start with a task like MNIST.
link |
01:43:23.040
Just as long as you start with the same set of priors.
link |
01:43:28.800
So the problem with MNIST is humans are already trained to recognize digits, right?
link |
01:43:35.600
But let's say we're considering objects that are not digits,
link |
01:43:42.400
some completely arbitrary patterns.
link |
01:43:44.480
Well, humans already come with visual priors about how to process that.
link |
01:43:48.880
So in order to make the game fair, you would have to isolate these priors
link |
01:43:54.080
and describe them and then express them as computational rules.
link |
01:43:57.280
Having worked a lot with vision science people, that's exceptionally difficult.
link |
01:44:01.680
A lot of progress has been made.
link |
01:44:03.120
There have been a lot of good attempts at basically reducing all of human vision into some good priors.
link |
01:44:08.640
We're still probably far away from doing that perfectly,
link |
01:44:10.960
but as a start for a benchmark, that's an exciting possibility.
link |
01:44:14.640
Yeah, so Elizabeth Spelke actually lists objectness as one of the core knowledge priors.
link |
01:44:24.800
Objectness, cool.
link |
01:44:25.920
Objectness, yeah.
link |
01:44:27.440
So we have priors about objectness, like about the visual space, about time,
link |
01:44:31.520
about agents, about goal oriented behavior.
link |
01:44:35.280
We have many different priors, but what's interesting is that,
link |
01:44:39.280
sure, we have this pretty diverse and rich set of priors,
link |
01:44:43.920
but it's also not that diverse, right?
link |
01:44:46.880
We are not born into this world with a ton of knowledge about the world,
link |
01:44:50.800
but only with a small set of core knowledge.
link |
01:44:58.640
Yeah, sorry, to us humans it feels like that set is not that large.
link |
01:45:05.040
But just even the nature of time that we kind of integrate pretty effectively
link |
01:45:09.600
through all of our perception, all of our reasoning,
link |
01:45:12.640
maybe, you know, do you have a sense of how easy it is to encode those priors?
link |
01:45:17.680
Maybe it requires building a universe and then the human brain in order to encode those priors.
link |
01:45:25.440
Or do you have a hope that they can be listed axiomatically?
link |
01:45:28.640
I don't think so.
link |
01:45:29.280
So you have to keep in mind that any knowledge about the world that we are
link |
01:45:33.040
born with is something that has to have been encoded into our DNA by evolution at some point.
link |
01:45:41.120
Right.
link |
01:45:41.440
And DNA is a very, very low bandwidth medium.
link |
01:45:46.000
Like it's extremely long and expensive to encode anything into DNA because first of all,
link |
01:45:52.560
you need some sort of evolutionary pressure to guide this writing process.
link |
01:45:57.440
And then, you know, the higher level the information you're trying to write, the longer it's going to take.
link |
01:46:04.480
And the thing in the environment that you're trying to encode knowledge about has to be stable
link |
01:46:13.520
over this duration.
link |
01:46:15.280
So you can only encode into DNA things that constitute an evolutionary advantage.
link |
01:46:20.960
So this is actually a very small subset of all possible knowledge about the world.
link |
01:46:25.280
You can only encode things that are stable, that are true, over very, very long periods of time,
link |
01:46:32.080
typically millions of years.
link |
01:46:33.680
For instance, we might have some visual prior about the shape of snakes, right?
link |
01:46:38.720
About what makes a face, what's the difference between a face and a non-face?
link |
01:46:44.560
But consider this interesting question.
link |
01:46:48.080
Do we have any innate sense of the visual difference between a male face and a female face?
link |
01:46:56.640
What do you think?
link |
01:46:58.640
For a human, I mean.
link |
01:46:59.840
I would have to look back into evolutionary history when the genders emerged.
link |
01:47:04.000
But yeah, most...
link |
01:47:06.240
I mean, the faces of humans are quite different from the faces of great apes.
link |
01:47:10.640
Great apes, right?
link |
01:47:12.880
Yeah.
link |
01:47:13.600
That's interesting.
link |
01:47:14.800
Yeah, you couldn't tell the face of a female chimpanzee from the face of a male chimpanzee,
link |
01:47:22.800
probably.
link |
01:47:23.440
Yeah, and I don't think most humans have that ability.
link |
01:47:26.160
So we do have innate knowledge of what makes a face, but it's actually impossible for us to
link |
01:47:33.280
have any DNA encoded knowledge of the difference between a female human face and a male human face
link |
01:47:40.320
because that knowledge, that information came into the world actually very recently.
link |
01:47:50.560
If you look at the slowness of the process of encoding knowledge into DNA.
link |
01:47:56.400
Yeah, so that's interesting.
link |
01:47:57.360
That's a really powerful argument, that DNA is a low bandwidth medium and it takes a long time to encode.
link |
01:48:02.800
That naturally creates a very efficient encoding.
link |
01:48:05.200
But one important consequence of this is that, so yes, we are born into this world with a bunch of
link |
01:48:12.800
knowledge, sometimes high level knowledge about the world, like the shape, the rough shape of a
link |
01:48:17.600
snake, of the rough shape of a face.
link |
01:48:20.480
But importantly, because this knowledge takes so long to write, almost all of this innate
link |
01:48:26.960
knowledge is shared with our cousins, with great apes, right?
link |
01:48:32.080
So it is not actually this innate knowledge that makes us special.
link |
01:48:36.320
But to throw it right back at you from earlier on in our discussion, it's that encoding
link |
01:48:42.960
might also include the entirety of the environment of Earth.
link |
01:48:49.360
To some extent.
link |
01:48:49.920
So it can include things that are important to survival and reproduction, so for which there is
link |
01:48:56.480
some evolutionary pressure, and things that are stable, constant over very, very, very long time
link |
01:49:02.880
periods.
link |
01:49:04.160
And honestly, it's not that much information.
link |
01:49:06.320
There's also, besides the bandwidth constraint and the constraints of the writing process,
link |
01:49:14.400
there are also memory constraints. Like the part of DNA that deals with the human brain,
link |
01:49:21.440
it's actually fairly small.
link |
01:49:22.640
It's like, you know, on the order of megabytes, right?
link |
01:49:25.520
There's not that much high level knowledge about the world you can encode.
link |
01:49:31.600
That's quite brilliant and hopeful for the benchmark you're referring to, of encoding
link |
01:49:38.880
priors.
link |
01:49:39.360
I actually look forward to it. I'm skeptical whether you can do it in the next couple of
link |
01:49:43.120
years, but hopefully.
link |
01:49:45.040
I've been working on it.
link |
01:49:45.760
So honestly, it's a very simple benchmark, and it's not like a big breakthrough or anything.
link |
01:49:49.920
It's more like a fun side project, right?
link |
01:49:53.200
But these fun side projects, so was ImageNet.
link |
01:49:56.480
These fun side projects could launch entire groups of efforts towards creating reasoning
link |
01:50:04.080
systems and so on.
link |
01:50:04.960
And I think...
link |
01:50:05.440
Yeah, that's the goal.
link |
01:50:06.160
It's trying to measure strong generalization, to measure the strength of abstraction in
link |
01:50:12.080
our minds, well, in our minds and in artificial intelligence agents.
link |
01:50:16.960
And if there's anything true about this science organism, it's that its individual cells love competition.
link |
01:50:24.800
And benchmarks encourage competition.
link |
01:50:26.800
So that's an exciting possibility.
link |
01:50:29.520
So do you think an AI winter is coming?
link |
01:50:33.520
And how do we prevent it?
link |
01:50:35.440
Not really.
link |
01:50:36.080
So an AI winter is something that would occur when there's a big mismatch between how we
link |
01:50:42.160
are selling the capabilities of AI and the actual capabilities of AI.
link |
01:50:47.280
And today, deep learning is creating a lot of value.
link |
01:50:50.560
And it will keep creating a lot of value in the sense that these models are applicable
link |
01:50:56.240
to a very wide range of problems that are relevant today.
link |
01:51:00.000
And we are only just getting started with applying these algorithms to every problem
link |
01:51:05.120
they could be solving.
link |
01:51:06.320
So deep learning will keep creating a lot of value for the time being.
link |
01:51:10.160
What's concerning, however, is that there's a lot of hype around deep learning and around
link |
01:51:15.920
AI.
link |
01:51:16.240
There are lots of people overselling the capabilities of these systems, not just
link |
01:51:22.000
the capabilities, but also overselling the fact that they might be more or less, you
link |
01:51:27.760
know, brain like, giving a kind of mystical aspect to these technologies, and also
link |
01:51:36.640
overselling the pace of progress, which, you know, might look fast in the sense that
link |
01:51:43.840
we have this exponentially increasing number of papers.
link |
01:51:47.760
But again, that's just a simple consequence of the fact that we have ever more people
link |
01:51:52.960
coming into the field.
link |
01:51:54.400
It doesn't mean the progress is actually exponentially fast.
link |
01:51:58.640
Let's say you're trying to raise money for your startup or your research lab.
link |
01:52:02.720
You might want to tell, you know, a grandiose story to investors about how deep learning
link |
01:52:09.120
is just like the brain and how it can solve all these incredible problems like self driving
link |
01:52:14.240
and robotics and so on.
link |
01:52:15.760
And maybe you can tell them that the field is progressing so fast and we are going to
link |
01:52:19.440
have AGI within 15 years or even 10 years.
link |
01:52:23.040
And none of this is true.
link |
01:52:25.920
And every time you're like saying these things and an investor or, you know, a decision maker
link |
01:52:32.800
believes them, well, this is like the equivalent of taking on credit card debt, but for trust,
link |
01:52:41.680
right?
link |
01:52:42.480
And maybe this will, you know, this will be what enables you to raise a lot of money,
link |
01:52:50.160
but ultimately you are creating damage, you are damaging the field.
link |
01:52:54.320
So that's the concern, that debt. That's what happened with the other AI winters.
link |
01:53:00.160
You actually tweeted about this with autonomous vehicles, right?
link |
01:53:04.160
Almost every single company now has promised that they will have full autonomous
link |
01:53:08.960
vehicles by 2021, 2022.
link |
01:53:11.760
That's a good example of the consequences of over hyping the capabilities of AI and
link |
01:53:18.080
the pace of progress.
link |
01:53:19.280
So because I've worked a lot recently in this area, I have a deep concern about what
link |
01:53:25.200
happens when all of these companies, after they've invested billions, have a meeting and
link |
01:53:30.400
say, how much do we actually, first of all, do we have an autonomous vehicle?
link |
01:53:33.600
The answer will definitely be no.
link |
01:53:35.840
And second will be, wait a minute, we've invested one, two, three, four billion dollars
link |
01:53:40.560
into this and we made no profit.
link |
01:53:43.120
And the reaction to that may be going very hard in other directions that might impact
link |
01:53:49.200
even other industries.
link |
01:53:50.400
And that's what we call an AI winter: when there is backlash where no one believes any
link |
01:53:55.520
of these promises anymore because they've turned out to be big lies the first time
link |
01:53:59.360
around.
link |
01:54:00.240
And this will definitely happen to some extent for autonomous vehicles because the public
link |
01:54:06.000
and decision makers have been convinced that around 2015, they've been convinced by these
link |
01:54:13.360
people who are trying to raise money for their startups and so on, that L5 driving was coming
link |
01:54:19.600
in maybe 2016, maybe 2017, maybe 2018.
link |
01:54:22.880
Now we're in 2019, we're still waiting for it.
link |
01:54:27.600
And so I don't believe we are going to have a full on AI winter because we have these
link |
01:54:32.800
technologies that are producing a tremendous amount of real value.
link |
01:54:37.680
But there is also too much hype.
link |
01:54:39.920
So there will be some backlash, especially there will be backlash.
link |
01:54:44.960
So some startups are trying to sell the dream of AGI and the fact that AGI is going to create
link |
01:54:53.040
infinite value.
link |
01:54:53.760
Like AGI is like a free lunch.
link |
01:54:55.680
Like if you can develop an AI system that passes a certain threshold of IQ or something,
link |
01:55:02.800
then suddenly you have infinite value.
link |
01:55:04.400
And well, there are actually lots of investors buying into this idea and they will wait maybe
link |
01:55:14.160
10, 15 years and nothing will happen.
link |
01:55:17.760
And the next time around, well, maybe there will be a new generation of investors.
link |
01:55:22.560
No one will care.
link |
01:55:24.800
Human memory is fairly short after all.
link |
01:55:27.280
I don't know about you, but because I've spoken about AGI sometimes poetically, I get a lot
link |
01:55:34.320
of emails from people giving me, usually, like, large manifestos where they say
link |
01:55:42.000
to me that they have created an AGI system or they know how to do it.
link |
01:55:47.200
And there's a long write up of how to do it.
link |
01:55:48.880
I get a lot of these emails, yeah.
link |
01:55:50.560
They feel a little bit like they're generated by an AI system actually, but there's usually
link |
01:55:57.760
no diagram, you have a transformer generating crank papers about AGI.
link |
01:56:06.640
So the question is about, because you've been such a good, you have a good radar for crank
link |
01:56:12.160
papers, how do we know they're not onto something?
link |
01:56:16.720
How do I, so when you start to talk about AGI or anything like the reasoning benchmarks
link |
01:56:24.240
and so on, so something that doesn't have a benchmark, it's really difficult to know.
link |
01:56:29.120
I mean, I talked to Jeff Hawkins, who's really looking at neuroscience approaches to how,
link |
01:56:35.200
and there's some, there's echoes of really interesting ideas in at least Jeff's case,
link |
01:56:41.520
which he's showing.
link |
01:56:43.280
How do you usually think about this?
link |
01:56:46.640
Like preventing yourself from being too narrow minded and elitist about deep learning, it
link |
01:56:52.880
has to work on these particular benchmarks, otherwise it's trash.
link |
01:56:56.720
Well, you know, the thing is, intelligence does not exist in the abstract.
link |
01:57:05.280
Intelligence has to be applied.
link |
01:57:07.200
So you do need a benchmark. If you have an improvement in some benchmark, maybe it's
link |
01:57:11.040
a new benchmark, right?
link |
01:57:12.400
Maybe it's not something we've been looking at before, but you do need a problem that
link |
01:57:16.640
you're trying to solve.
link |
01:57:17.360
You're not going to come up with a solution without a problem.
link |
01:57:20.000
So for general intelligence, I mean, you've clearly highlighted generalization.
link |
01:57:26.320
If you want to claim that you have an intelligence system, it should come with a benchmark.
link |
01:57:31.200
It should, yes, it should display capabilities of some kind.
link |
01:57:35.760
It should show that it can create some form of value, even if it's a very artificial form
link |
01:57:41.840
of value.
link |
01:57:42.800
And that's also the reason why you don't actually need to care about telling which papers have
link |
01:57:48.800
actually some hidden potential and which do not.
link |
01:57:53.120
Because if there is a new technique that's actually creating value, this is going to
link |
01:57:59.200
be brought to light very quickly because it's actually making a difference.
link |
01:58:02.480
So it's the difference between something that is ineffectual and something that is actually
link |
01:58:08.160
useful.
link |
01:58:08.800
And ultimately usefulness is our guide, not just in this field, but if you look at science
link |
01:58:14.080
in general, maybe there are many, many people over the years that have had some really interesting
link |
01:58:19.440
theories of everything, but they were just completely useless.
link |
01:58:22.800
And you don't actually need to tell the interesting theories from the useless theories.
link |
01:58:28.000
All you need is to see, is this actually having an effect on something else?
link |
01:58:34.080
Is this actually useful?
link |
01:58:35.360
Is this making an impact or not?
link |
01:58:37.600
That's beautifully put.
link |
01:58:38.640
I mean, the same applies to quantum mechanics, to string theory, to the holographic principle.
link |
01:58:43.680
We are doing deep learning because it works.
link |
01:58:46.960
Before it started working, people considered people working on neural networks as cranks
link |
01:58:52.720
very much.
link |
01:58:54.560
No one was working on this anymore.
link |
01:58:56.320
And now it's working, which is what makes it valuable.
link |
01:58:59.120
It's not about being right.
link |
01:59:01.120
It's about being effective.
link |
01:59:02.560
And nevertheless, the individual entities of this scientific mechanism, just like Yoshua
link |
01:59:08.080
Bengio or Yann LeCun, they, while being called cranks, stuck with it.
link |
01:59:12.480
Right?
link |
01:59:12.880
Yeah.
link |
01:59:13.280
And so us individual agents, even if everyone's laughing at us, just stick with it.
link |
01:59:18.880
If you believe you have something, you should stick with it and see it through.
link |
01:59:23.520
That's a beautiful inspirational message to end on.
link |
01:59:25.920
Francois, thank you so much for talking today.
link |
01:59:27.600
That was amazing.
link |
01:59:28.640
Thank you.