François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38

The following is a conversation with François Chollet. He's the creator of Keras, which is an open source deep learning library that is designed to enable fast, user friendly experimentation with deep neural networks. It serves as an interface to several deep learning libraries, the most popular of which is TensorFlow, and it was integrated into the TensorFlow main code base. Meaning, if you want to create, train, and use neural networks, probably the easiest and most popular option is to use Keras inside TensorFlow.

Aside from creating an exceptionally useful and popular library, François is also a world class AI researcher and software engineer at Google. And he's definitely an outspoken, if not controversial, personality in the AI world, especially in the realm of ideas around the future of artificial intelligence.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N. And now, here's my conversation with François Chollet.
You're known for not sugarcoating your opinions and speaking your mind about ideas in AI, especially on Twitter. It's one of my favorite Twitter accounts. So what's one of the more controversial ideas you've expressed online and gotten some heat for?
Yeah, no, I think if you go through the trouble of maintaining a Twitter account, you might as well speak your mind, you know? Otherwise, what's even the point of having a Twitter account? It's like having a nice car and just leaving it in the garage.

So what's one thing for which I got a lot of pushback? Perhaps, you know, that time I wrote something about the idea of intelligence explosion, and I was questioning the idea and the reasoning behind this idea. And I got a lot of pushback on that. I got a lot of flak for it.

So yeah, so intelligence explosion, I'm sure you're familiar with the idea, but it's the idea that if you were to build general AI problem solving algorithms, well, the problem of building such an AI, that itself is a problem that could be solved by your AI, and maybe it could be solved better than what humans can do. So your AI could start tweaking its own algorithm, could start making a better version of itself, and so on, iteratively, in a recursive fashion. And so you would end up with an AI with exponentially increasing intelligence.

And I was basically questioning this idea, first of all, because the notion of intelligence explosion uses an implicit definition of intelligence that doesn't sound quite right to me. It considers intelligence as a property of a brain that you can consider in isolation, like the height of a building, for instance. But that's not really what intelligence is. Intelligence emerges from the interaction between a brain, a body, like embodied intelligence, and an environment. And if you're missing one of these pieces, then you cannot really define intelligence anymore. So just tweaking a brain to make it smarter and smarter doesn't actually make any sense to me.
You're crushing the dreams of many people, right? So let's look at, like, Sam Harris, actually a lot of physicists, Max Tegmark, people who think the universe is an information processing system, our brain is kind of an information processing system. So what's the theoretical limit? It seems naive to think that our own brain is somehow the limit of the capabilities of this information processing system. I'm playing devil's advocate here. And then if you just scale it, if you're able to build something that's on par with the brain, the process that builds it just continues and it'll improve exponentially. So that's the logic that's used, actually, by almost everybody that is worried about superhuman intelligence.

So most people who are skeptical of that are kind of like, their thought process is, this doesn't feel right. That's true for me as well. The whole thing is shrouded in mystery, where you can't really say anything concrete, but you could say, this doesn't feel right, this doesn't feel like that's how the brain works. And you're trying, with your blog posts, to make it a little more explicit. So one idea is that the brain doesn't exist alone, it exists within the environment. So you would have to somehow exponentially improve the environment and the brain together, almost.
Yeah. In order to create something that's much smarter. Of course, we don't have a definition of intelligence.

That's correct, that's correct. If you look at very smart people today, even humans, not even talking about AIs, I don't think their brain and the performance of their brain is the bottleneck to their expressed intelligence, to their achievements. You cannot just tweak one part of this system, like of this brain-body-environment system, and expect the capabilities, like what emerges out of this system, to just explode exponentially. Because anytime you improve one part of a system with many interdependencies like this, there's a new bottleneck that arises, right? And I don't think even today, for very smart people, their brain is the bottleneck to the sort of problems they can solve, right?

In fact, many very smart people today, you know, they are not actually solving any big scientific problems. They're not Einstein. They're like Einstein, but, you know, in the patent clerk days. Like, Einstein became Einstein because this was a meeting of a genius with a big problem at the right time, right? But maybe this meeting could have never happened, and then Einstein would have just been a patent clerk, right? And in fact, many people today are probably like genius-level smart, but you wouldn't know, because they're not really expressing any of that.
Wow, that's brilliant. So we can think of the world, Earth, but also the universe, as just a space of problems. So all these problems and tasks of various difficulty are roaming it, and there's agents, creatures like ourselves and animals and so on, that are also roaming it. And then you get coupled with a problem, and then you solve it. But without that coupling, you can't demonstrate your quote-unquote intelligence.
Exactly. Intelligence is the meeting of great problem solving capabilities with a great problem. And if you don't have the problem, you don't really express any intelligence. All you're left with is potential intelligence, like the performance of your brain or how high your IQ is, which in itself is just a number, right?
So you mentioned problem solving capacity. What do you think of as problem solving capacity? Can you try to define intelligence? Like, what does it mean to be more or less intelligent? Is it completely coupled to a particular problem, or is there something a little bit more universal?
Yeah, I do believe all intelligence is specialized intelligence. Even human intelligence has some degree of generality. Well, all intelligent systems have some degree of generality, but they're always specialized in one category of problems. So human intelligence is specialized in the human experience. And that shows at various levels. That shows in some prior knowledge that's innate, that we have at birth: knowledge about things like agents, goal-driven behavior, visual priors about what makes an object, priors about time, and so on. That shows also in the way we learn. For instance, it's very, very easy for us to pick up language. It's very, very easy for us to learn certain things because we are basically hard-coded to learn them. And we are specialized in solving certain kinds of problems, and we are quite useless when it comes to other kinds of problems. For instance, we are not really designed to handle very long-term problems. We have no capability of seeing the very long term. We don't have very much working memory.
So how do you think about long term? Do you think long-term planning, are we talking about scale of years, millennia? What do you mean by long term?
We're not very good. Well, human intelligence is specialized in the human experience, and human experience is very short. One lifetime is short. Even within one lifetime, we have a very hard time envisioning things on a scale of years. It's very difficult to project yourself at a scale of five years, at a scale of ten years, and so on. We can solve only fairly narrowly scoped problems. So when it comes to solving bigger problems, larger-scale problems, we are not actually doing it on an individual level. So it's not actually our brain doing it.

We have this thing called civilization, right? Which is itself a sort of problem solving system, a sort of artificially intelligent system, right? And it's not running on one brain, it's running on a network of brains. In fact, it's running on much more than a network of brains. It's running on a lot of infrastructure, like books and computers and the internet and human institutions and so on. And that is capable of handling problems on a much greater scale than any individual human. If you look at computer science, for instance, that's an institution that solves problems, and it is superhuman, right? It operates on a greater scale. It can solve much bigger problems than an individual human could. And science itself, science as a system, as an institution, is a kind of artificially intelligent problem solving algorithm that is superhuman.
Yeah, at least computer science is like a theorem prover at a scale of thousands, maybe hundreds of thousands of human beings. At that scale, what do you think is an intelligent agent? So there's us humans at the individual level. There are millions, maybe billions of bacteria on our skin. That's at the smaller scale. You can even go to the particle level, as systems that behave, you can say, intelligently in some ways. And then you can look at the Earth as a single organism, you can look at our galaxy and even the universe as a single organism. How do you think about scale in defining intelligent systems? And we're here at Google; there are millions of devices doing computation in just a distributed way. How do you think about intelligence versus scale?
You can always characterize anything as a system. I think people who talk about things like intelligence explosion tend to focus on: one agent is basically one brain, like one brain considered in isolation, like a brain in a jar that's controlling a body in a very top-to-bottom kind of fashion, and that body is pursuing goals in an environment. So it's a very hierarchical view. You have the brain at the top of the pyramid, then you have the body just plainly receiving orders, and then the body is manipulating objects in the environment and so on. So everything is subordinate to this one thing, this epicenter, which is the brain.

But in real life, intelligent agents don't really work like this, right? There is no strong delimitation between the brain and the body to start with. You have to look not just at the brain, but at the nervous system. But then, the nervous system and the body are not really two separate entities. So you have to look at an entire animal as one agent. But then you start realizing, as you observe an animal over any length of time, that a lot of the intelligence of an animal is actually externalized. That's especially true for humans. A lot of our intelligence is externalized. When you write down some notes, that is externalized intelligence. When you write a computer program, you are externalizing cognition. So it's externalized in books, it's externalized in computers, the internet, in other humans. It's externalized in language, and so on. So there is no hard delimitation of what makes an intelligent agent. It's all about context.
Okay, but AlphaGo is better at Go than the best human player. There are levels of skill here. So do you think there's such a concept as an intelligence explosion in a specific task? And then, well, yeah, do you think it's possible to have a category of tasks on which you do have something like an exponential growth of ability to solve that particular problem?
I think if you consider a specific vertical, it's probably possible to some extent. I also don't think we have to speculate about it, because we have real-world examples of recursively self-improving intelligent systems, right? So for instance, science is a problem solving system, a knowledge generation system, like a system that experiences the world in some sense and then gradually understands it and can act on it. And that system is superhuman, and it is clearly recursively self-improving, because science feeds into technology. Technology can be used to build better tools, better computers, better instrumentation, and so on, which in turn can make science faster, right? So science is probably the closest thing we have today to a recursively self-improving superhuman AI. And you can just observe: is science, is scientific progress, today exploding? Which itself is an interesting question. You can use that as a basis to try to understand what will happen with a superhuman AI that has a science-like behavior.
Let me linger on it a little bit more. What is your intuition why an intelligence explosion is not possible? Like, taking the scientific, all the semi-scientific revolutions, why can't we slightly accelerate that process?
So you can absolutely accelerate any problem solving process. So recursive self-improvement is absolutely a real thing. But what happens with a recursively self-improving system is typically not explosion, because no system exists in isolation. And so tweaking one part of the system means that suddenly another part of the system becomes a bottleneck. And if you look at science, for instance, which is clearly a recursively self-improving, clearly a problem solving system, scientific progress is not actually exploding. If you look at science, what you see is the picture of a system that is consuming an exponentially increasing amount of resources, but it's having a linear output in terms of scientific progress.

And maybe that will seem like a very strong claim. Many people are actually saying that scientific progress is exponential, but when they're claiming this, they're actually looking at indicators of resource consumption by science. For instance, the number of papers being published, the number of patents being filed, and so on, which are just completely correlated with how many people are working on science today. So it's actually an indicator of resource consumption. But what you should look at is the output: progress in terms of the knowledge that science generates, in terms of the scope and significance of the problems that we solve.
And some people have actually been trying to measure that. Like Michael Nielsen, for instance. He had a very nice paper, I think that was last year, about it. So his approach to measuring scientific progress was to look at the timeline of scientific discoveries over the past, you know, 100, 150 years, and for each major discovery, ask a panel of experts to rate the significance of the discovery. And if the output of science as an institution were exponential, you would expect the temporal density of significance to go up exponentially: maybe because there's a faster rate of discoveries, maybe because the discoveries are, you know, increasingly more important. And what actually happens, if you plot this temporal density of significance measured in this way, is that you see very much a flat graph. You see a flat graph across all disciplines, across physics, biology, medicine, and so on.
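The measurement described here can be sketched as a toy computation. The discovery list and significance scores below are invented for illustration, not data from Nielsen's paper:

```python
# Toy sketch of the measurement described above: take a list of major
# discoveries, each rated for significance by a hypothetical expert panel,
# and compute the "temporal density of significance" per decade.
# All entries and scores here are made up for illustration.

discoveries = [
    (1905, 9.5),  # (year, panel-rated significance on a 0-10 scale)
    (1915, 9.0),
    (1928, 8.0),
    (1953, 8.5),
    (1964, 7.0),
    (1983, 7.5),
    (1998, 7.0),
]

def temporal_density(discoveries, start, end, bucket=10):
    """Sum rated significance per `bucket`-year window from start to end."""
    density = {}
    for decade in range(start, end, bucket):
        total = sum(s for year, s in discoveries
                    if decade <= year < decade + bucket)
        density[decade] = total
    return density

density = temporal_density(discoveries, 1900, 2000)
```

If this per-decade density stays roughly flat while the number of working scientists grows exponentially, that is exactly the "exponential input, linear output" picture being described.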
And it actually makes a lot of sense if you think about it, because think about the progress of physics 110 years ago, right? It was a time of crazy change. Think about the progress of technology, you know, 170 years ago, when we started, you know, replacing horses with cars, when we started having electricity and so on. It was a time of incredible change. And today is also a time of very, very fast change. But it would be an unfair characterization to say that today, technology and science are moving way faster than they did 50 years ago. And if you do try to rigorously plot the temporal density of significance, you do see very flat curves. And you can check out the paper that Michael Nielsen had about this idea.

And so the way I interpret it is: as you make progress in a given field, or in a given subfield of science, it becomes exponentially more difficult to make further progress.
Like the very first person to work on information theory. If you enter a new field, and it's still the very early years, there's a lot of low-hanging fruit you can pick.

That's right, yeah. But the next generation of researchers is gonna have to dig much harder, actually, to make smaller discoveries, probably a larger number of smaller discoveries. And to achieve the same amount of impact, you're gonna need a much greater head count.
And that's exactly the picture you're seeing with science: the number of scientists and engineers is in fact increasing exponentially, the amount of computational resources that are available to science is increasing exponentially, and so on. So the resource consumption of science is exponential, but the output in terms of progress, in terms of significance, is linear. And the reason why is because, even though science is recursively self-improving, meaning that scientific progress turns into technological progress, which in turn helps science. If you look at computers, for instance, they are products of science, and computers are tremendously useful in speeding up science. The internet, same thing. The internet is a technology that's made possible by very recent scientific advances, and itself, because it enables scientists to network, to communicate, to exchange papers and ideas much faster, it is a way to speed up scientific progress. So even though you're looking at a recursively self-improving system, it is consuming exponentially more resources to produce the same amount of problem solving, very much.
So that's a fascinating way to paint it, and certainly that holds for the deep learning community. If you look at the temporal, what did you call it, the temporal density of significant ideas, if you look at that in deep learning, I think, I'd have to think about that, but if you really look at significant ideas in deep learning, they might even be decreasing.
So I do believe the per-paper significance is decreasing, but the amount of papers is still today exponentially increasing. So I think, if you look at it in aggregate, my guess is that you would see linear progress. If you were to sum the significance of all papers, you would see roughly linear progress. And in my opinion, it is not a coincidence that you're seeing linear progress in science despite exponential resource consumption. I think the resource consumption is dynamically adjusting itself to maintain linear progress, because we as a community expect linear progress, meaning that if we start investing less and seeing less progress, it means that suddenly there are some lower-hanging fruits that become available, and someone's gonna step up and pick them, right? So it's very much like a market for discoveries and ideas.
But there's another fundamental part which you're highlighting, which is a hypothesis about science, or like the space of ideas: any one path you travel down, it gets exponentially more difficult to develop new ideas. And your sense is that's gonna hold across our mysterious universe.
Yes. Well, exponential progress triggers exponential friction, so that if you tweak one part of the system, suddenly some other part becomes a bottleneck, right? For instance, let's say you develop some device that measures its own acceleration, and then it has some engine, and it outputs even more acceleration in proportion to its own acceleration, and you drop it somewhere. It's not gonna reach infinite speed, because it exists in a certain context: the air around it is gonna generate friction, and it's gonna block it at some top speed. And even if you were to consider the broader context and lift the bottleneck there, like the bottleneck of friction, then some other part of the system would start stepping in and creating exponential friction, maybe the speed of light or whatever.
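The device analogy can be sketched numerically. In this toy simulation the self-amplifying engine is simplified to a constant thrust, since the drag term is what creates the ceiling; all constants are invented for illustration:

```python
# Toy numerical version of the device analogy above: an engine pushes,
# quadratic air drag pushes back, and the speed settles at a finite
# terminal value instead of exploding.

def terminal_speed_sim(thrust=10.0, drag=0.1, dt=0.01, steps=20000):
    """Integrate dv/dt = thrust - drag * v**2 with simple Euler steps."""
    v = 0.0
    for _ in range(steps):
        accel = thrust - drag * v * v  # engine push minus air friction
        v += accel * dt
    return v

top_speed = terminal_speed_sim()
# Analytically, the terminal speed is sqrt(thrust / drag) = sqrt(100) = 10:
# past that point, friction exactly cancels the engine's push.
```

Note that quadrupling the thrust only doubles the top speed, since sqrt(40 / 0.1) = 20: the harder the system pushes, the harder the context pushes back, which is the "exponential friction" point.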
And this definitely holds true when you look at the problem solving algorithm that is being run by science as an institution, science as a system. As you make more and more progress, despite having this recursive self-improvement component, you are encountering exponential friction. The more researchers you have working on different ideas, the more overhead you have in terms of communication across researchers. If you look at, you were mentioning quantum mechanics, right? Well, if you want to start making significant discoveries today, significant progress in quantum mechanics, there is a large amount of knowledge you have to ingest. So there's a very large overhead to even start to contribute. There's a large amount of overhead to synchronize across researchers and so on. And of course, the significant practical experiments are going to require exponentially expensive equipment, because the easier ones have already been run, right?
So in your sense, there's no way of escaping this kind of friction with artificial intelligence systems.
Yeah, no, I think science is a very good way to model what would happen with a superhuman, recursively self-improving AI.
That's your sense, I mean, the...

That's my intuition. It's not like a mathematical proof of anything. That's not my point. Like, I'm not trying to prove anything. I'm just trying to make an argument to question the narrative of intelligence explosion, which is quite a dominant narrative, and you do get a lot of pushback if you go against it. Because, for many people, right, AI is not just a subfield of computer science. It's more like a belief system: this belief that the world is headed towards an event, the singularity, past which, you know, AI will go exponential very much, and the world will be transformed, and humans will become obsolete. And if you go against this narrative, because it is not really a scientific argument but more of a belief system, and it is part of the identity of many people, if you go against this narrative, it's like you're attacking the identity of the people who believe in it. It's almost like saying God doesn't exist. So you do get a lot of pushback if you try to question these ideas.
First of all, I believe most people, they might not be as eloquent or explicit as you're being, but most people in computer science, or most people who actually have built anything that you could call AI, quote unquote, would agree with you. They might not be describing it in the same kind of way. So the pushback you're getting is from people who get attached to the narrative not from a place of science, but from a place of imagination.

That's correct, that's correct.
So why do you think that's so appealing? Because in the usual dreams that people have, when you create a superintelligent system past the singularity, what people imagine is somehow always destructive. If you were to put on your psychology hat, why is it so appealing to imagine the ways that all of human civilization will be destroyed?
I think it's a good story. You know, it's a good story. And very interestingly, it mirrors religious stories, right, religious mythology. If you look at the mythology of most civilizations, it's about the world being headed towards some final event in which the world will be destroyed and some new world order will arise that will be mostly spiritual, like the apocalypse followed by a paradise, probably, right? It's a very appealing story on a fundamental level, and we all need stories. We all need stories to structure the way we see the world, especially at timescales that are beyond our ability to make predictions, right?
So on a more serious, non-exponential-explosion question: do you think there will be a time when we'll create something like human-level intelligence, or intelligent systems that will make you sit back and be just surprised at, damn, how smart this thing is? That doesn't require exponential growth or an exponential improvement. But what's your sense of the timeline, and so on, that you'll be really surprised at certain capabilities? And we'll talk about limitations of deep learning. So do you think in your lifetime you'll be really damn surprised?
Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually. That was before we had assessed exactly what deep learning could do and could not do, and it felt like a time of immense potential. And then we started narrowing it down, but I was very surprised. So I would say it has already happened.
Was there a moment, there must've been a day in there, where your surprise was almost bordering on the belief of the narrative that we just discussed. Was there a moment, because you've written quite eloquently about the limits of deep learning, was there a moment that you thought that maybe deep learning is limitless?
No, I don't think I've ever believed this. What was really shocking is that it worked.

It worked at all, yeah.

But there's a big jump between being able to do really good computer vision and human-level intelligence. So I don't think at any point I was under the impression that the results we got in computer vision meant that we were very close to human-level intelligence. I don't think we're very close to human-level intelligence. I do believe that there's no reason why we won't achieve it at some point. I also believe that there's a problem with talking about human-level intelligence: that, implicitly, you're considering like an axis of intelligence with different levels, but that's not really how intelligence works. Intelligence is very multi-dimensional. And so there's the question of capabilities, but there's also the question of being human-like, and these are two very different things. Like, you can build potentially very advanced intelligent agents that are not human-like at all, and you can also build very human-like agents. And these are two very different things, right?
Let's go from the philosophical to the practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember in relation to Keras, and in general, TensorFlow, Theano, the old days? Can you give a brief, Wikipedia-style history and your role in it, before we return to AGI discussions?
Yeah, that's a broad topic.
link |
So I started working on Keras.
link |
It was the name Keras at the time.
link |
I actually picked the name like
link |
just the day I was going to release it.
link |
So I started working on it in February, 2015.
link |
And so at the time there weren't too many people
link |
working on deep learning, maybe like fewer than 10,000.
link |
The software tooling was not really developed.
link |
So the main deep learning library was Caffe,
link |
which was mostly C++.
link |
Why do you say Caffe was the main one?
link |
Caffe was vastly more popular than Theano
link |
in late 2014, early 2015.
link |
Caffe was the one library that everyone was using
link |
for computer vision.
link |
And computer vision was the most popular problem
link |
in deep learning at the time.
link |
Like ConvNets was like the subfield of deep learning
link |
that everyone was working on.
link |
So myself, so in late 2014,
link |
I was actually interested in RNNs,
link |
in recurrent neural networks,
link |
which was a very niche topic at the time, right?
link |
It really took off around 2016.
link |
And so I was looking for good tools.
link |
I had used Torch 7, I had used Theano,
link |
used Theano a lot in Kaggle competitions.
link |
And there was no like good solution for RNNs at the time.
link |
Like there was no reusable open source implementation
link |
of an LSTM, for instance.
link |
So I decided to build my own.
link |
And at first, the pitch for that was,
link |
it was gonna be mostly around LSTM recurrent neural networks.
link |
It was gonna be in Python.
link |
An important decision at the time
link |
that was kind of not obvious
link |
is that the models would be defined via Python code,
link |
which was kind of like going against the mainstream
link |
at the time because Caffe, PyLearn2, and so on,
link |
like all the big libraries were actually going
link |
with the approach of setting configuration files
link |
in YAML to define models.
link |
So some libraries were using code to define models,
link |
like Torch 7, obviously, but that was not Python.
link |
Lasagne was like a Theano based very early library
link |
that was, I think, developed, I don't remember exactly,
link |
probably late 2014.
link |
It's Python as well?
link |
It's Python as well.
link |
It was like on top of Theano.
link |
And so I started working on something
link |
and the value proposition at the time was that
link |
not only was it, I think, the first
link |
reusable open source implementation of LSTM,
link |
you could combine RNNs and ConvNets
link |
with the same library,
link |
which was not really possible before,
link |
like Caffe was only doing ConvNets.
link |
And it was kind of easy to use
link |
because, so before I was using Theano,
link |
I was actually using scikit-learn
link |
and I loved scikit-learn for its usability.
link |
So I drew a lot of inspiration from scikit-learn
link |
when I made Keras.
link |
It's almost like scikit-learn for neural networks.
link |
Exactly, the fit function,
link |
like reducing a complex training loop
link |
to a single function call, right?
link |
And of course, some people will say,
link |
this is hiding a lot of details,
link |
but that's exactly the point, right?
link |
The magic is the point.
link |
So it's magical, but in a good way.
link |
It's magical in the sense that it's delightful.
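The fit idea he describes can be sketched in a few lines. This is a toy illustration only, not the real Keras API: a hypothetical TinyLinearModel whose fit method hides the entire gradient descent training loop behind a single call.

```python
# Toy sketch of the fit() design idea (hypothetical class, not Keras):
# the whole training loop is hidden behind one method call.
class TinyLinearModel:
    def __init__(self, lr=0.01):
        self.w = 0.0  # slope
        self.b = 0.0  # intercept
        self.lr = lr  # learning rate

    def fit(self, xs, ys, epochs=500):
        # The "complex training loop" lives here, out of the user's way.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.w * x + self.b - y
                # One stochastic gradient descent step on squared error.
                self.w -= self.lr * err * x
                self.b -= self.lr * err
        return self

    def predict(self, xs):
        return [self.w * x + self.b for x in xs]

# One call, scikit-learn style, instead of a hand-written loop.
model = TinyLinearModel().fit([0, 1, 2, 3], [1, 3, 5, 7])
```

The "magic" is just encapsulation: the loop is still there, it is simply no longer the user's problem.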
link |
I'm actually quite surprised.
link |
I didn't know that it was born out of desire
link |
to implement RNNs and LSTMs.
link |
That's fascinating.
link |
So you were actually one of the first people
link |
to really try to attempt
link |
to get the major architectures together.
link |
And it's also interesting.
link |
You made me realize that that was a design decision at all
link |
is defining the model in code.
link |
Just, I'm putting myself in your shoes,
link |
versus the YAML, especially since Caffe was the most popular.
link |
It was the most popular by far.
link |
If I were, yeah,
link |
I didn't like the YAML thing,
link |
but it makes more sense that you would put
link |
the definition of a model in a configuration file.
link |
That's an interesting gutsy move
link |
to stick with defining it in code.
link |
Just if you look back.
link |
Other libraries were doing it as well,
link |
but it was definitely the more niche option.
link |
Okay, Keras and then.
link |
So I released Keras in March, 2015,
link |
and it got users pretty much from the start.
link |
So the deep learning community was very, very small.
link |
Lots of people were starting to be interested in LSTM.
link |
So it was released at the right time
link |
because it was offering an easy to use LSTM implementation.
link |
Exactly at the time where lots of people started
link |
to be intrigued by the capabilities of RNNs for NLP.
link |
So it grew from there.
link |
Then I joined Google about six months later,
link |
and that was actually completely unrelated to Keras.
link |
So I actually joined a research team
link |
working on image classification,
link |
mostly like computer vision.
link |
So I was doing computer vision research
link |
at Google initially.
link |
And immediately when I joined Google,
link |
I was exposed to the early internal version of TensorFlow.
link |
And the way it appeared to me at the time,
link |
and it was definitely the way it was at the time
link |
is that this was an improved version of Theano.
link |
So I immediately knew I had to port Keras
link |
to this new TensorFlow thing.
link |
And I was actually very busy as a Noogler,
link |
So I had no time to work on that.
link |
But then in November, I think it was November, 2015,
link |
TensorFlow got released.
link |
And it was kind of like my wake up call
link |
that, hey, I had to actually go and make it happen.
link |
So in December, I ported Keras to run on top of TensorFlow,
link |
but it was not exactly a port.
link |
It was more like a refactoring
link |
where I was abstracting away
link |
all the backend functionality into one module
link |
so that the same code base
link |
could run on top of multiple backends.
link |
So on top of TensorFlow or Theano.
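The refactoring he describes can be sketched as follows. This is a hedged illustration of the pattern only, not the actual Keras internals, and all the names are invented: model code talks to a single backend interface, so swapping one backend for another never touches model definitions.

```python
# Sketch of the multi-backend pattern: model code calls one backend
# interface, so swapping backends (here, two toy stand-ins for
# TensorFlow and Theano) never touches the model code itself.
class SimpleBackend:
    def add(self, a, b):
        return a + b

    def mul(self, a, b):
        return a * b

class TracingBackend(SimpleBackend):
    """Same ops, but records every call (a stand-in for a second backend)."""
    def __init__(self):
        self.calls = []

    def add(self, a, b):
        self.calls.append("add")
        return super().add(a, b)

    def mul(self, a, b):
        self.calls.append("mul")
        return super().mul(a, b)

_BACKEND = SimpleBackend()

def set_backend(backend):
    # One switch point for the whole library.
    global _BACKEND
    _BACKEND = backend

def dense(x, w, b):
    # "Model code": expressed purely in terms of the backend interface.
    return _BACKEND.add(_BACKEND.mul(x, w), b)
```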
link |
And for the next year,
link |
Theano stayed as the default option.
link |
It was easier to use, somewhat less buggy.
link |
It was much faster, especially when it came to RNNs.
link |
But eventually, TensorFlow overtook it.
link |
And TensorFlow, the early TensorFlow,
link |
has similar architectural decisions as Theano, right?
link |
So it was a natural transition.
link |
So, I mean, at that point Keras is still a side,
link |
almost fun project, right?
link |
Yeah, so it was not my job assignment.
link |
I was doing it on the side.
link |
And even though it grew to have a lot of users
link |
for a deep learning library at the time, like through 2016,
link |
I wasn't doing it as my main job.
link |
So things started changing in,
link |
I think it must have been maybe October, 2016.
link |
So one year later.
link |
So Rajat, who was the lead on TensorFlow,
link |
basically showed up one day in our building
link |
where I was doing like,
link |
so I was doing research and things like,
link |
so I did a lot of computer vision research,
link |
also collaborations with Christian Szegedy
link |
and deep learning for theorem proving.
link |
It was a really interesting research topic.
link |
And so Rajat was saying,
link |
hey, we saw Keras, we like it.
link |
We saw that you're at Google.
link |
Why don't you come over for like a quarter?
link |
And I was like, yeah, that sounds like a great opportunity.
link |
And so I started working on integrating the Keras API
link |
into TensorFlow more tightly.
link |
So what followed was a sort of like temporary
link |
TensorFlow only version of Keras
link |
that was in tf.contrib for a while.
link |
And finally moved to TensorFlow Core.
link |
And I've never actually gotten back
link |
to my old team doing research.
link |
Well, it's kind of funny that somebody like you
link |
who dreams of, or at least sees the power of AI systems
link |
that reason and theorem proving we'll talk about
link |
has also created a system that makes the most basic
link |
kind of Lego building that is deep learning
link |
super accessible, super easy.
link |
So beautifully so.
link |
It's a funny irony that you're both,
link |
you're responsible for both things,
link |
but so TensorFlow 2.0 is kind of, there's a sprint.
link |
I don't know how long it'll take,
link |
but there's a sprint towards the finish.
link |
What are you working on these days?
link |
What are you excited about?
link |
What are you excited about in 2.0?
link |
I mean, eager execution.
link |
There are so many things that just make it a lot easier.
link |
What are you excited about and what's also really hard?
link |
What are the problems you have to kind of solve?
link |
So I've spent the past year and a half working on
link |
TensorFlow 2.0 and it's been a long journey.
link |
I'm actually extremely excited about it.
link |
I think it's a great product.
link |
It's a delightful product compared to TensorFlow 1.0.
link |
We've made huge progress.
link |
So on the Keras side, what I'm really excited about is that,
link |
so previously Keras has been this very easy to use
link |
high level interface to do deep learning.
link |
But if you wanted to,
link |
if you wanted a lot of flexibility,
link |
the Keras framework was probably not the optimal way
link |
to do things compared to just writing everything from scratch.
link |
So in some way, the framework was getting in the way.
link |
And in TensorFlow 2.0, you don't have this at all, actually.
link |
You have the usability of the high level interface,
link |
but you have the flexibility of this lower level interface.
link |
And you have this spectrum of workflows
link |
where you can get more or less usability
link |
and flexibility trade offs depending on your needs, right?
link |
You can write everything from scratch
link |
and you get a lot of help doing so
link |
by subclassing models and writing your own training loops
link |
using eager execution.
link |
It's very flexible, it's very easy to debug,
link |
it's very powerful.
link |
But all of this integrates seamlessly
link |
with higher level features up to the classic Keras workflows,
link |
which are very scikit learn like
link |
and are ideal for a data scientist,
link |
machine learning engineer type of profile.
link |
So now you can have the same framework
link |
offering the same set of APIs
link |
that enable a spectrum of workflows
link |
that are more or less low level, more or less high level
link |
that are suitable for profiles ranging from researchers
link |
to data scientists and everything in between.
link |
Yeah, so that's super exciting.
link |
I mean, it's not just that,
link |
it's connected to all kinds of tooling.
link |
You can go on mobile, you can go with TensorFlow Lite,
link |
you can go in the cloud or serving and so on.
link |
It all is connected together.
link |
Now some of the best software ever written
link |
is often done by one person, sometimes two.
link |
So with a Google, you're now seeing sort of Keras
link |
having to be integrated in TensorFlow,
link |
which I'm sure has a ton of engineers working on it.
link |
And there are, I'm sure, a lot of tricky design decisions to be made.
link |
How does that process usually happen
link |
from at least your perspective?
link |
What are the debates like?
link |
Is there a lot of thinking,
link |
considering different options and so on?
link |
So a lot of the time I spend at Google
link |
is actually spent discussing design, right?
link |
Writing design docs, participating in design review meetings.
link |
This is as important as actually writing code.
link |
So there's a lot of thought
link |
and a lot of care that is taken
link |
in coming up with these decisions
link |
and taking into account all of our users
link |
because TensorFlow has this extremely diverse user base,
link |
It's not like just one user segment
link |
where everyone has the same needs.
link |
We have small scale production users,
link |
large scale production users.
link |
We have startups, we have researchers,
link |
you know, it's all over the place.
link |
And we have to cater to all of their needs.
link |
If I just look at the standard debates
link |
of C++ or Python, there's some heated debates.
link |
Do you have those at Google?
link |
I mean, they're not heated in terms of emotionally,
link |
but there's probably multiple ways to do it, right?
link |
So how do you arrive through those design meetings
link |
at the best way to do it?
link |
Especially in deep learning where the field is evolving
link |
as you're doing it.
link |
Is there some magic to it?
link |
Is there some magic to the process?
link |
I don't know if there's magic to the process,
link |
but there definitely is a process.
link |
So making design decisions
link |
is about satisfying a set of constraints,
link |
but also trying to do so in the simplest way possible,
link |
because this is what can be maintained,
link |
this is what can be expanded in the future.
link |
So you don't want to naively satisfy the constraints
link |
by just, you know, for each capability you need available,
link |
you're gonna come up with one argument in your API.
link |
You want to design APIs that are modular and hierarchical
link |
so that they have an API surface
link |
that is as small as possible, right?
link |
And you want this modular hierarchical architecture
link |
to reflect the way that domain experts
link |
think about the problem.
link |
Because as a domain expert,
link |
when you are reading about a new API,
link |
you're reading a tutorial or some docs pages,
link |
you already have a way that you're thinking about the problem.
link |
You already have like certain concepts in mind
link |
and you're thinking about how they relate together.
link |
And when you're reading docs,
link |
you're trying to build as quickly as possible
link |
a mapping between the concepts featured in your API
link |
and the concepts in your mind.
link |
So you're trying to map your mental model
link |
as a domain expert to the way things work in the API.
link |
So you need an API and an underlying implementation
link |
that are reflecting the way people think about these things.
link |
So you're minimizing the time it takes to do the mapping.
link |
Yes, minimizing the time,
link |
and the cognitive load involved
link |
in ingesting this new knowledge about your API.
link |
An API should not be self referential
link |
or referring to implementation details.
link |
It should only be referring to domain specific concepts
link |
that people already understand.
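One way to picture that "small surface, modular, matches the expert's mental model" principle is the contrast below. The API is invented purely for illustration, not TensorFlow's: instead of a train function growing one new argument per capability, the "how weights are updated" concept gets its own small object, mirroring how practitioners already think about optimizers.

```python
# Hypothetical API, for illustration only: capabilities live in modular
# objects that match domain concepts, keeping the API surface small.
class Optimizer:
    def __init__(self, lr):
        self.lr = lr

    def step(self, w, grad):
        # Plain gradient descent update.
        return w - self.lr * grad

class Momentum(Optimizer):
    """A new capability is a new subclass, not a new train() argument."""
    def __init__(self, lr, beta=0.9):
        super().__init__(lr)
        self.beta = beta
        self.velocity = 0.0

    def step(self, w, grad):
        self.velocity = self.beta * self.velocity + grad
        return w - self.lr * self.velocity

def train(w, grads, optimizer):
    # train()'s surface stays fixed no matter how many optimizers exist.
    for g in grads:
        w = optimizer.step(w, g)
    return w
```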
link |
So what's the future of Keras and TensorFlow look like?
link |
What does TensorFlow 3.0 look like?
link |
So that's kind of too far in the future for me to answer,
link |
especially since I'm not even the one making these decisions.
link |
But so from my perspective,
link |
which is just one perspective
link |
among many different perspectives on the TensorFlow team,
link |
I'm really excited by developing even higher level APIs,
link |
higher level than Keras.
link |
I'm really excited by hyperparameter tuning,
link |
by automated machine learning, AutoML.
link |
I think the future is not just, you know,
link |
defining a model like you were assembling Lego blocks
link |
and then calling fit on it.
link |
It's more like an automagical model
link |
that would just look at your data
link |
and optimize the objective you're after, right?
link |
So that's what I'm looking into.
link |
Yeah, so you put the baby into a room with the problem
link |
and come back a few hours later
link |
with a fully solved problem.
link |
Exactly, it's not like a box of Legos.
link |
It's more like the combination of a kid
link |
that's really good at Legos and a box of Legos.
link |
It's just building the thing on its own.
link |
So that's an exciting future.
link |
I think there's a huge amount of applications
link |
and revolutions to be had
link |
under the constraints of the discussion we previously had.
link |
But what do you think of the current limits of deep learning?
link |
If we look specifically at these function approximators
link |
that try to generalize from data.
link |
You've talked about local versus extreme generalization.
link |
You mentioned that neural networks don't generalize well.
link |
So there's this gap.
link |
And you've also mentioned that extreme generalization
link |
requires something like reasoning to fill those gaps.
link |
So how can we start trying to build systems like that?
link |
Right, yeah, so this is by design, right?
link |
Deep learning models are like huge parametric models,
link |
differentiable, so continuous,
link |
that go from an input space to an output space.
link |
And they're trained with gradient descent.
link |
So they're trained pretty much point by point.
link |
They are learning a continuous geometric morphing
link |
from an input vector space to an output vector space.
link |
And because this is done point by point,
link |
a deep neural network can only make sense
link |
of points in experience space that are very close
link |
to things that it has already seen in the training data.
link |
At best, it can do interpolation across points.
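The point-by-point limitation is easy to demonstrate with a toy learner that only memorizes and interpolates. This is a caricature of the geometric-morphing view, not an actual neural network: near its training points it does fine, but it has no notion of the underlying rule (here y = 2x) outside the sampled region.

```python
# A toy "point-by-point" learner: memorize training points, interpolate
# linearly between neighbors, and clamp outside the sampled region.
def fit_pointwise(xs, ys):
    pts = sorted(zip(xs, ys))

    def predict(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                # Linear interpolation between the two nearest samples.
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        # Outside the sampled region: no rule to fall back on, just clamp.
        return pts[0][1] if x < pts[0][0] else pts[-1][1]

    return predict

# Trained on y = 2x sampled at four points.
f = fit_pointwise([0, 1, 2, 3], [0, 2, 4, 6])
```

Inside the sampled region f(1.5) comes out right; at x = 10 the true rule says 20, but the learner can only return the nearest thing it has seen.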
link |
But that means in order to train your network,
link |
you need a dense sampling of the input cross output space,
link |
almost a point by point sampling,
link |
which can be very expensive if you're dealing
link |
with complex real world problems,
link |
like autonomous driving, for instance, or robotics.
link |
It's doable if you're looking at the subset
link |
of the visual space.
link |
But even then, it's still fairly expensive.
link |
You still need millions of examples.
link |
And it's only going to be able to make sense of things
link |
that are very close to what it has seen before.
link |
And in contrast to that, well, of course,
link |
you have human intelligence.
link |
But even if you're not looking at human intelligence,
link |
you can look at very simple rules, algorithms.
link |
If you have a symbolic rule,
link |
it can actually apply to a very, very large set of inputs
link |
because it is abstract.
link |
It is not obtained by doing a point by point mapping.
link |
For instance, if you try to learn a sorting algorithm
link |
using a deep neural network,
link |
well, you're very much limited to learning point by point
link |
what the sorted representation of this specific list is like.
link |
But instead, you could have a very, very simple
link |
sorting algorithm written in a few lines.
link |
Maybe it's just two nested loops.
link |
And it can process any list at all because it is abstract,
link |
because it is a set of rules.
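The contrast he draws is concrete: a sorting rule written in a few lines handles any list at all, with no training data. A minimal version with just the two nested loops he mentions:

```python
# An abstract rule: two nested loops that sort ANY list, rather than a
# point-by-point mapping memorized from examples.
def bubble_sort(items):
    items = list(items)  # don't mutate the caller's list
    n = len(items)
    for i in range(n):
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items
```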
link |
So deep learning is really like point by point
link |
geometric morphings, trained with gradient descent.
link |
And meanwhile, abstract rules can generalize much better.
link |
And I think the future is we need to combine the two.
link |
So how do we, do you think, combine the two?
link |
How do we combine good point by point functions
link |
with programs, which is what symbolic AI type systems do?
link |
At which level does the combination happen?
link |
I mean, obviously we're jumping into the realm
link |
of where there's no good answers.
link |
It's just kind of ideas and intuitions and so on.
link |
Well, if you look at the really successful AI systems
link |
today, I think they are already hybrid systems
link |
that are combining symbolic AI with deep learning.
link |
For instance, successful robotics systems
link |
are already mostly model based, rule based,
link |
things like planning algorithms and so on.
link |
At the same time, they're using deep learning
link |
as perception modules.
link |
Sometimes they're using deep learning as a way
link |
to inject fuzzy intuition into a rule based process.
link |
If you look at the system like in a self driving car,
link |
it's not just one big end to end neural network.
link |
You know, that wouldn't work at all.
link |
Precisely because in order to train that,
link |
you would need a dense sampling of the experience space
link |
when it comes to driving,
link |
which is completely unrealistic, obviously.
link |
Instead, the self driving car is mostly
link |
symbolic, you know, it's software, it's programmed by hand.
link |
So it's mostly based on explicit models.
link |
In this case, mostly 3D models of the environment
link |
around the car, but it's interfacing with the real world
link |
using deep learning modules, right?
link |
So the deep learning there serves as a way
link |
to convert the raw sensory information
link |
to something usable by symbolic systems.
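That hybrid pattern can be caricatured in a few lines. Everything here is a stub invented for illustration: the "perception module" below is stand-in arithmetic, not a trained network, and the rule-based planner is a single hand-written condition.

```python
# Hedged sketch of the hybrid architecture described above:
# a (stubbed) learned perception module turns raw sensor data into
# symbolic facts, and hand-coded rules act on those facts.
def perceive(raw_frame):
    # Stand-in for a trained deep network: raw signal -> symbolic facts.
    # (The threshold is arbitrary, purely for the sketch.)
    return {"obstacle_ahead": sum(raw_frame) > 10}

def plan(facts):
    # Explicit, hand-written rules operating on the symbolic output.
    return "brake" if facts["obstacle_ahead"] else "cruise"

action = plan(perceive([5, 4, 3]))
```

The interface between the two halves is the dictionary of facts: deep learning on one side of it, symbolic rules on the other.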
link |
Okay, well, let's linger on that a little more.
link |
So dense sampling from input to output.
link |
You said it's obviously very difficult.
link |
In the case of self driving, you mean?
link |
Let's say self driving, right?
link |
Self driving for many people,
link |
let's not even talk about self driving,
link |
let's talk about steering, so staying inside the lane.
link |
Lane following, yeah, it's definitely a problem
link |
you can solve with an end to end deep learning model,
link |
but that's like one small subset.
link |
Yeah, I don't know why you're jumping
link |
from the extreme so easily,
link |
because I disagree with you on that.
link |
I think, well, it's not obvious to me
link |
that you can solve lane following.
link |
No, it's not obvious, I think it's doable.
link |
I think in general, there is no hard limitations
link |
to what you can learn with a deep neural network,
link |
as long as the search space is rich enough,
link |
is flexible enough, and as long as you have
link |
this dense sampling of the input cross output space.
link |
The problem is that this dense sampling
link |
could mean anything from 10,000 examples
link |
to like trillions and trillions.
link |
So that's my question.
link |
So what's your intuition?
link |
And if you could just give it a chance
link |
and think what kind of problems can be solved
link |
by getting a huge amounts of data
link |
and thereby creating a dense mapping.
link |
So let's think about natural language dialogue.
link |
Do you think the Turing test can be solved
link |
with a neural network alone?
link |
Well, the Turing test is all about tricking people
link |
into believing they're talking to a human.
link |
And I don't think that's actually very difficult
link |
because it's more about exploiting human perception
link |
and not so much about intelligence.
link |
There's a big difference between mimicking
link |
intelligent behavior and actual intelligent behavior.
link |
So, okay, let's look at maybe the Alexa prize and so on.
link |
The different formulations of the natural language
link |
conversation that are less about mimicking
link |
and more about maintaining a fun conversation
link |
that lasts for 20 minutes.
link |
That's a little less about mimicking
link |
and that's more about, I mean, it's still mimicking,
link |
but it's more about being able to carry forward
link |
a conversation with all the tangents that happen
link |
in dialogue and so on.
link |
Do you think that problem is learnable
link |
with a neural network that does the point to point mapping?
link |
So I think it would be very, very challenging
link |
to do this with deep learning.
link |
I don't think it's out of the question either.
link |
I wouldn't rule it out.
link |
The space of problems that can be solved
link |
with a large neural network.
link |
What's your sense about the space of those problems?
link |
So useful problems for us.
link |
In theory, it's infinite, right?
link |
You can solve any problem.
link |
In practice, well, deep learning is a great fit
link |
for perception problems.
link |
In general, any problem which is not naturally amenable
link |
to explicit handcrafted rules or rules that you can generate
link |
by exhaustive search over some program space.
link |
So perception, artificial intuition,
link |
as long as you have a sufficient training dataset.
link |
And that's the question, I mean, perception,
link |
there's interpretation and understanding of the scene,
link |
which seems to be outside the reach
link |
of current perception systems.
link |
So do you think larger networks will be able
link |
to start to understand the physics
link |
and the physics of the scene,
link |
the three dimensional structure and relationships
link |
of objects in the scene and so on?
link |
Or really that's where symbolic AI has to step in?
link |
Well, it's always possible to solve these problems
link |
with deep learning.
link |
It's just extremely inefficient.
link |
An explicit rule based abstract model
link |
would be a far better, more compressed
link |
representation of physics.
link |
Than learning just this mapping between
link |
in this situation, this thing happens.
link |
If you change the situation slightly,
link |
then this other thing happens and so on.
link |
Do you think it's possible to automatically generate
link |
the programs that would require that kind of reasoning?
link |
Or does it have to, so the way the expert systems fail,
link |
there's so many facts about the world
link |
had to be hand coded in.
link |
Do you think it's possible to learn those logical statements
link |
that are true about the world and their relationships?
link |
Do you think, I mean, that's kind of what theorem proving
link |
at a basic level is trying to do, right?
link |
Yeah, except it's much harder to formulate statements
link |
about the world compared to formulating
link |
mathematical statements.
link |
Statements about the world tend to be subjective.
link |
So can you learn rule based models?
link |
That's the field of program synthesis.
link |
However, today we just don't really know how to do it.
link |
So it's very much a graph search or tree search problem.
link |
And so we are limited to the sort of tree search and graph
link |
search algorithms that we have today.
link |
Personally, I think genetic algorithms are very promising.
link |
So almost like genetic programming.
link |
Genetic programming, exactly.
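A minimal sketch of what genetic programming over rule-based models looks like. The expression language, fitness function, and selection scheme below are all toy assumptions: real GP systems use crossover and mutation operators rather than refilling the population with fresh random trees.

```python
import random

# Toy genetic programming: evolve tiny expression trees (nested tuples)
# to fit the target function f(x) = x*x + x.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_tree(depth=2):
    # A tree is either a leaf ("x" or a small constant) or an op node.
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", 1, 2])
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Lower is better: squared error against the target on a few points.
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-3, 4))

def evolve(generations=200, pop_size=50):
    random.seed(0)  # deterministic for the sketch
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        if fitness(pop[0]) == 0:
            break
        # Keep the fittest half, refill with fresh random trees
        # (a crude stand-in for mutation, to keep the sketch short).
        pop = pop[: pop_size // 2] + [
            random_tree() for _ in range(pop_size - pop_size // 2)
        ]
    return min(pop, key=fitness)

best = evolve()
```

The search discovers an exact program such as ("add", ("mul", "x", "x"), "x"), i.e. x*x + x, purely by scoring candidates against examples.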
link |
Can you discuss the field of program synthesis?
link |
Like how many people are working and thinking about it?
link |
Where we are in the history of program synthesis
link |
and what are your hopes for it?
link |
Well, compared to deep learning, this is like the 90s.
link |
So meaning that we already have existing solutions.
link |
We are starting to have some basic understanding
link |
of what this is about.
link |
But it's still a field that is in its infancy.
link |
There are very few people working on it.
link |
There are very few real world applications.
link |
So the one real world application I'm aware of
link |
is Flash Fill in Excel.
link |
It's a way to automatically learn very simple programs
link |
to format cells in an Excel spreadsheet
link |
from a few examples.
link |
For instance, learning a way to format a date, things like that.
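The idea behind this kind of example-driven synthesis can be sketched as a search over a space of candidate programs for one consistent with the user's examples. The tiny DSL below is invented for illustration and has nothing to do with the actual Flash Fill algorithm:

```python
# Toy example-driven synthesis: search a small, hypothetical space of
# string programs for one that matches all given input-output examples.
PROGRAMS = {
    "upper": str.upper,
    "lower": str.lower,
    "first3": lambda s: s[:3],
    "swap_date": lambda s: "/".join(reversed(s.split("/"))),
}

def synthesize(examples):
    # Return the name of the first program consistent with every example.
    for name, prog in PROGRAMS.items():
        if all(prog(inp) == out for inp, out in examples):
            return name
    return None

# "Learn" a date reformatting program from two examples.
learned = synthesize([("2019/08/14", "14/08/2019"), ("01/02/03", "03/02/01")])
```

Real systems search enormously richer program spaces, but the principle is the same: examples in, a program out.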
link |
Oh, that's fascinating.
link |
You know, OK, that's a fascinating topic.
link |
I always wonder when I provide a few samples to Excel,
link |
what it's able to figure out.
link |
Like just giving it a few dates, what
link |
are you able to figure out from the pattern I just gave you?
link |
That's a fascinating question.
link |
And it's fascinating whether that's learnable patterns.
link |
And you're saying they're working on that.
link |
How big is the toolbox currently?
link |
Are we completely in the dark?
link |
So if you said the 90s.
link |
In terms of program synthesis?
link |
So I would say, so maybe 90s is even too optimistic.
link |
Because by the 90s, we already understood back prop.
link |
We already understood the engine of deep learning,
link |
even though we couldn't really see its potential quite yet.
link |
Today, I don't think we have found
link |
the engine of program synthesis.
link |
So we're in the winter before back prop.
link |
So I do believe program synthesis and general discrete search
link |
over rule based models is going to be
link |
a cornerstone of AI research in the next century.
link |
And that doesn't mean we are going to drop deep learning.
link |
Deep learning is immensely useful.
link |
Like, being able to learn a very flexible, adaptable,
link |
parametric model with gradient descent, that's actually immensely useful.
link |
All it's doing is pattern recognition.
link |
But being good at pattern recognition, given lots of data,
link |
is just extremely powerful.
link |
So we are still going to be working on deep learning.
link |
We are going to be working on program synthesis.
link |
We are going to be combining the two in increasingly automated ways.
link |
So let's talk a little bit about data.
link |
You've tweeted, about 10,000 deep learning papers
link |
have been written about hard coding priors
link |
about a specific task in a neural network architecture
link |
works better than a lack of a prior.
link |
Basically, summarizing all these efforts,
link |
they put a name to an architecture.
link |
But really, what they're doing is hard coding some priors
link |
that improve the performance of the system.
link |
But which gets straight to the point is probably true.
link |
So you say that you can always buy performance,
link |
in quotes, by either training on more data,
link |
better data, or by injecting task information
link |
into the architecture or the preprocessing.
link |
However, this isn't informative about the generalization power
link |
of the techniques used, the fundamental ability to generalize.
link |
Do you think we can go far by coming up
link |
with better methods for this kind of cheating,
link |
for better methods of large scale annotation of data?
link |
So building better priors.
link |
If you automate it, it's not cheating anymore.
link |
I'm joking about the cheating, but large scale.
link |
So basically, I'm asking about something
link |
that hasn't, from my perspective,
link |
been researched too much: exponential improvement
link |
in annotation of data.
link |
Do you often think about it?
link |
I think it's actually been researched quite a bit.
link |
You just don't see publications about it.
link |
Because people who publish papers
link |
are going to publish about known benchmarks.
link |
Sometimes they're going to release a new benchmark.
link |
People who actually have real world large scale
link |
depending on problems, they're going
link |
to spend a lot of resources into data annotation
link |
and good data annotation pipelines,
link |
but you don't see any papers about it.
link |
That's interesting.
link |
So do you think, certainly resources,
link |
but do you think there's innovation happening?
link |
To clarify the point in the tweet.
link |
So machine learning in general is
link |
the science of generalization.
link |
You want to generate knowledge that
link |
can be reused across different data sets,
link |
across different tasks.
link |
And if instead you're looking at one data set
link |
and then you are hard coding knowledge about this task
link |
into your architecture, this is no more useful
link |
than training a network and then saying, oh, I
link |
found these weight values perform well.
link |
So David Ha, I don't know if you know David,
link |
he had a paper the other day about weight
link |
agnostic neural networks.
link |
And this is a very interesting paper
link |
because it really illustrates the fact
link |
that an architecture, even without weights,
link |
an architecture is knowledge about a task.
link |
It encodes knowledge.
link |
And when it comes to architectures
link |
that are handcrafted by researchers, in some cases,
link |
it is very, very clear that all they are doing
link |
is artificially reencoding the template that
link |
corresponds to the proper way to solve the task.
link |
For instance, I don't know if you've looked
link |
at the bAbI dataset, which is about natural language
link |
question answering, it is generated by an algorithm.
link |
So these are question-answer pairs
link |
that are generated by an algorithm.
link |
The algorithm is solving a certain template.
link |
Turns out, if you craft a network that
link |
literally encodes this template, you
link |
can solve this data set with nearly 100% accuracy.
link |
But that doesn't actually tell you
link |
anything about how to solve question answering
link |
in general, which is the point.
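The point about template-encoding can be made concrete with a toy sketch. The "dataset" below is generated by one fixed template, loosely in the spirit of bAbI; the names, places, and template are invented for illustration, not the real data. A "model" that hard-codes the template scores perfectly while learning nothing about question answering:

```python
import random

def generate_templated_qa(n, seed=0):
    # Toy question/answer pairs produced by one fixed template,
    # loosely in the spirit of the bAbI tasks (invented data).
    rng = random.Random(seed)
    people = ["Mary", "John", "Sandra"]
    places = ["kitchen", "garden", "office"]
    data = []
    for _ in range(n):
        person, place = rng.choice(people), rng.choice(places)
        data.append((f"{person} went to the {place}.",
                     f"Where is {person}?",
                     place))
    return data

def template_model(story, question):
    # An "architecture" that hard-codes the generating template:
    # the answer is always the last word of the story.
    # It encodes knowledge about the task, but learns nothing,
    # and it says nothing about question answering in general.
    return story.rstrip(".").split()[-1]

data = generate_templated_qa(100)
accuracy = sum(template_model(s, q) == a for s, q, a in data) / len(data)
print(accuracy)  # 1.0 on the templated set
```

Perfect accuracy here is exactly the trap being described: the knowledge lives in the hard-coded "architecture", so the score transfers to nothing.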
link |
The question is just to linger on it,
link |
whether it's from the data side or from the size of the model.
link |
I don't know if you've read the blog post by Rich Sutton,
link |
The Bitter Lesson, where he says,
link |
the biggest lesson that we can read from 70 years of AI
link |
research is that general methods that leverage computation
link |
are ultimately the most effective.
link |
So as opposed to figuring out methods
link |
that can generalize effectively, do you
link |
think we can get pretty far by just having something
link |
that leverages computation and the improvement of computation?
link |
Yeah, so I think Rich is making a very good point, which
link |
is that a lot of these papers, which are actually
link |
all about manually hardcoding prior knowledge about a task
link |
into some system, it doesn't have
link |
to be deep learning architecture, but into some system.
link |
These papers are not actually making any impact.
link |
Instead, what's making really long term impact
link |
is very simple, very general systems
link |
that are really agnostic to all these tricks.
link |
Because these tricks do not generalize.
link |
And of course, the one general and simple thing
link |
that you should focus on is that which leverages computation.
link |
Because computation, the availability
link |
of large scale computation has been increasing exponentially
link |
following Moore's law.
link |
So if your algorithm is all about exploiting this,
link |
then your algorithm is suddenly exponentially improving.
link |
So I think Rich is definitely right.
link |
However, he's right about the past 70 years.
link |
He's like assessing the past 70 years.
link |
I am not sure that this assessment will still
link |
hold true for the next 70 years.
link |
It might to some extent.
link |
I suspect it will not.
link |
Because the truth of his assessment
link |
is a function of the context in which this research took place.
link |
And the context is changing.
link |
Moore's law might not be applicable anymore,
link |
for instance, in the future.
link |
And I do believe that when you tweak one aspect of a system,
link |
when you exploit one aspect of a system,
link |
some other aspect starts becoming the bottleneck.
link |
Let's say you have unlimited computation.
link |
Well, then data is the bottleneck.
link |
And I think we are already starting
link |
to be in a regime where our systems are
link |
so large in scale and so data hungry
link |
that data today and the quality of data
link |
and the scale of data is the bottleneck.
link |
And in this environment, the bitter lesson from Rich
link |
is not going to be true anymore.
link |
So I think we are going to move from a focus
link |
on a computation scale to focus on data efficiency.
link |
So that's getting to the question of symbolic AI.
link |
But to linger on the deep learning approaches,
link |
do you have hope for either unsupervised learning
link |
or reinforcement learning, which are
link |
ways of being more data efficient in terms
link |
of the amount of data they need that requires human annotation?
link |
So unsupervised learning and reinforcement learning
link |
are frameworks for learning, but they are not
link |
like any specific technique.
link |
So usually when people say reinforcement learning,
link |
what they really mean is deep reinforcement learning,
link |
which is like one approach which is actually very questionable.
link |
The question I was asking was unsupervised learning
link |
with deep neural networks and deep reinforcement learning.
link |
Well, these are not really data efficient
link |
because you're still leveraging these huge parametric models
link |
point by point with gradient descent.
link |
It is more efficient in terms of the number of annotations,
link |
the density of annotations you need.
link |
So the idea being to learn the latent space around which
link |
the data is organized and then map the sparse annotations onto it.
link |
And sure, I mean, that's clearly a very good idea.
link |
It's not really a topic I would be working on,
link |
but it's clearly a good idea.
link |
So it would get us to solve some problems?
link |
It will get us to incremental improvements
link |
in labeled data efficiency.
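A minimal sketch of the idea just described, assuming nothing beyond NumPy: learn a latent space from unlabeled data (here plain PCA via SVD, standing in for a learned representation), then map a handful of sparse annotations onto it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled data: two well-separated clusters in 20 dimensions.
n = 200
X = np.vstack([rng.normal(0.0, 1.0, (n, 20)),
               rng.normal(4.0, 1.0, (n, 20))])
y = np.array([0] * n + [1] * n)  # ground truth, mostly unseen

# Step 1: learn a latent space from the unlabeled data alone
# (PCA via SVD, a stand-in for any unsupervised representation).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T  # 2-D latent codes

# Step 2: map sparse annotations onto the latent space:
# label just one example per class, then classify everything
# else by its nearest labeled point in latent space.
labeled = [0, n]
pred = np.array([y[labeled[np.argmin([np.linalg.norm(z - Z[i])
                                      for i in labeled])]]
                 for z in Z])
accuracy = (pred == y).mean()
print(accuracy)  # high accuracy from only two annotations
```

The gain is exactly the "incremental improvement in labeled data efficiency" mentioned above: two annotations instead of four hundred, with the structure coming from the unlabeled data.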
link |
Do you have concerns about short term or long term threats
link |
from AI, from artificial intelligence?
link |
Yes, definitely to some extent.
link |
And what's the shape of those concerns?
link |
This is actually something I've briefly written about.
link |
But the capabilities of deep learning technology
link |
can be used in many ways that are
link |
concerning from mass surveillance with things
link |
like facial recognition.
link |
In general, tracking lots of data about everyone
link |
and then being able to make sense of this data
link |
to do identification, to do prediction.
link |
That's concerning.
link |
That's something that's being very aggressively pursued
link |
by totalitarian states like China.
link |
One thing I am very much concerned about
link |
is that our lives are increasingly online,
link |
are increasingly digital, made of information,
link |
made of information consumption and information production,
link |
our digital footprint, I would say.
link |
And if you absorb all of this data
link |
and you are in control of where you consume information,
link |
social networks and so on, recommendation engines,
link |
then you can build a sort of reinforcement
link |
loop for human behavior.
link |
You can observe the state of your mind at time t.
link |
You can predict how you would react
link |
to different pieces of content, how
link |
to get you to move your mind in a certain direction.
link |
And then you can feed you the specific piece of content
link |
that would move you in a specific direction.
link |
And you can do this at scale in terms
link |
of doing it continuously in real time.
link |
You can also do it at scale in terms
link |
of scaling this to many, many people, to entire populations.
link |
So potentially, artificial intelligence,
link |
even in its current state, if you combine it
link |
with the internet, with the fact that all of our lives
link |
are moving to digital devices and digital information
link |
consumption and creation, what you get
link |
is the possibility to achieve mass manipulation of behavior
link |
and mass psychological control.
link |
And this is a very real possibility.
link |
Yeah, so you're talking about any kind of recommender system.
link |
Let's look at the YouTube algorithm, Facebook,
link |
anything that recommends content you should watch next.
link |
And it's fascinating to think that there's
link |
some aspects of human behavior that you can say a problem of,
link |
does this person hold Republican beliefs or Democratic beliefs?
link |
And that's a trivial objective function.
link |
And you can optimize, and you can measure,
link |
and you can turn everybody into a Republican
link |
or everybody into a Democrat.
link |
I do believe it's true.
link |
So the human mind is very, if you look at the human mind
link |
as a kind of computer program, it
link |
has a very large exploit surface.
link |
It has many, many vulnerabilities.
link |
Exploit surfaces, yeah.
link |
Ways you can control it.
link |
For instance, when it comes to your political beliefs,
link |
this is very much tied to your identity.
link |
So for instance, if I'm in control of your news feed
link |
on your favorite social media platforms,
link |
this is actually where you're getting your news from.
link |
And of course, I can choose to only show you
link |
news that will make you see the world in a specific way.
link |
But I can also create incentives for you
link |
to post about some political beliefs.
link |
And then when I get you to express a statement,
link |
if it's a statement that me as the controller,
link |
I want to reinforce.
link |
I can just show it to people who will agree,
link |
and they will like it.
link |
And that will reinforce the statement in your mind.
link |
If it's a belief
link |
I want you to abandon,
link |
I can, on the other hand, show it to opponents.
link |
And because they attack you, at the very least,
link |
next time you will think twice about posting it.
link |
But maybe you will even stop believing it
link |
because you got pushback.
link |
So there are many ways in which social media platforms
link |
can potentially control your opinions.
link |
And today, so all of these things
link |
are already being controlled by AI algorithms.
link |
These algorithms do not have any explicit political goal today.
link |
Well, potentially they could, like if some totalitarian
link |
government takes over social media platforms
link |
and decides that now we are going to use this not just
link |
for mass surveillance, but also for mass opinion control
link |
and behavior control.
link |
Very bad things could happen.
link |
But what's really fascinating and actually quite concerning
link |
is that even without an explicit intent to manipulate,
link |
you're already seeing very dangerous dynamics
link |
in terms of how these content recommendation
link |
algorithms behave.
link |
Because right now, the goal, the objective function
link |
of these algorithms is to maximize engagement,
link |
which seems fairly innocuous at first.
link |
However, it is not because content
link |
that will maximally engage people, get people to react
link |
in an emotional way, get people to click on something.
link |
It is very often content that is not
link |
healthy to the public discourse.
link |
For instance, fake news are far more
link |
likely to get you to click on them than real news
link |
simply because they are not constrained to reality.
link |
So they can be as outrageous, as surprising,
link |
as good stories as you want because they're artificial.
link |
To me, that's an exciting world because so much good can come from it.
link |
So there's an opportunity to educate people.
link |
You can balance people's worldview with other ideas.
link |
So there's so many objective functions.
link |
The space of objective functions that
link |
create better civilizations is large, arguably infinite.
link |
But there's also a large space that
link |
creates division and destruction, civil war,
link |
a lot of bad stuff.
link |
And the worry is, naturally, probably that space
link |
is bigger, first of all.
link |
And if we don't explicitly think about what kind of effects
link |
are going to be observed from different objective functions,
link |
then we're going to get into trouble.
link |
But the question is, how do we get into rooms
link |
and have discussions, so inside Google, inside Facebook,
link |
inside Twitter, and think about, OK,
link |
how can we drive up engagement and, at the same time,
link |
create a good society?
link |
Is it even possible to have that kind
link |
of philosophical discussion?
link |
I think you can definitely try.
link |
So from my perspective, I would feel rather uncomfortable
link |
with companies that are in control of these
link |
newsfeed algorithms, with them making explicit decisions
link |
to manipulate people's opinions or behaviors,
link |
even if the intent is good, because that's
link |
a very totalitarian mindset.
link |
So instead, what I would like to see
link |
is probably never going to happen,
link |
because it's not super realistic,
link |
but that's actually something I really care about.
link |
I would like all these algorithms
link |
to present configuration settings to their users,
link |
so that the users can actually make the decision about how
link |
they want to be impacted by these information
link |
recommendation, content recommendation algorithms.
link |
For instance, as a user of something
link |
like YouTube or Twitter, maybe I want
link |
to maximize learning about a specific topic.
link |
So I want the algorithm to feed my curiosity,
link |
which is in itself a very interesting problem.
link |
So instead of maximizing my engagement,
link |
it will maximize how fast and how much I'm learning.
link |
And it will also take into account the accuracy,
link |
hopefully, of the information I'm learning.
link |
So yeah, the user should be able to determine exactly
link |
how these algorithms are affecting their lives.
link |
I don't want actually any entity making decisions
link |
about in which direction they're going to try to manipulate me.
link |
I want technology.
link |
So AI, these algorithms are increasingly
link |
going to be our interface to a world that is increasingly
link |
made of information.
link |
And I want everyone to be in control of this interface,
link |
to interface with the world on their own terms.
link |
So if someone wants these algorithms
link |
to serve their own personal growth goals,
link |
they should be able to configure these algorithms to do so.
link |
Yeah, but so I know it's painful to have explicit decisions.
link |
But there are underlying explicit decisions,
link |
which is some of the most beautiful fundamental
link |
philosophy that we have before us,
link |
which is personal growth.
link |
If I want to watch videos from which I can learn,
link |
what does that mean?
link |
So if I have a checkbox that wants to emphasize learning,
link |
there's still an algorithm with explicit decisions in it
link |
that would promote learning.
link |
What does that mean for me?
link |
For example, I've watched a documentary on flat Earth
link |
I'm really glad I watched it.
link |
A friend recommended it to me.
link |
Because I don't have such an allergic reaction to crazy
link |
people, as my fellow colleagues do.
link |
But it was very eye opening.
link |
And for others, it might not be.
link |
For others, they might just get turned off by that, same
link |
with Republican and Democrat.
link |
And it's a non trivial problem.
link |
And first of all, if it's done well,
link |
I don't think it's something that wouldn't happen,
link |
that YouTube wouldn't be promoting,
link |
or Twitter wouldn't be.
link |
It's just a really difficult problem,
link |
how to give people control.
link |
Well, it's mostly an interface design problem.
link |
The way I see it, you want to create technology
link |
that's like a mentor, or a coach, or an assistant,
link |
so that it's not your boss.
link |
You are in control of it.
link |
You are telling it what to do for you.
link |
And if you feel like it's manipulating you,
link |
it's not actually doing what you want.
link |
You should be able to switch to a different algorithm.
link |
So that's fine tune control.
link |
You kind of learn that you're trusting
link |
the human collaboration.
link |
I mean, that's how I see autonomous vehicles too,
link |
is giving as much information as possible,
link |
and you learn that dance yourself.
link |
Yeah, Adobe, I don't know if you use Adobe products
link |
like Photoshop.
link |
They're trying to see if they can inject YouTube
link |
into their interface, basically allowing it
link |
to show you all these videos,
link |
since everybody's confused about what to do with the features.
link |
So basically teach people by linking to,
link |
in that way, it's an assistant that uses videos
link |
as a basic element of information.
link |
Okay, so what practically should people do
link |
to try to fight against abuses of these algorithms,
link |
or algorithms that manipulate us?
link |
Honestly, it's a very, very difficult problem,
link |
because to start with, there is very little public awareness
link |
Very few people would think there's anything wrong
link |
with the newsfeed algorithm,
link |
even though there is actually something wrong already,
link |
which is that it's trying to maximize engagement
link |
most of the time, which has very negative side effects.
link |
So ideally, the very first thing is to stop
link |
trying to purely maximize engagement,
link |
to stop propagating content based purely on popularity, right?
link |
Instead, take into account the goals
link |
and the profiles of each user.
link |
So you will be, one example is, for instance,
link |
when I look at topic recommendations on Twitter,
link |
it's like, you know, they have this news tab
link |
with these recommendations.
link |
It's always the worst coverage,
link |
because it's content that appeals
link |
to the lowest common denominator
link |
to all Twitter users, because they're trying to optimize.
link |
They're purely trying to optimize popularity.
link |
They're purely trying to optimize engagement.
link |
But that's not what I want.
link |
So they should put me in control of some setting
link |
so that I define what's the objective function
link |
that Twitter is going to be following
link |
to show me this content.
link |
And honestly, so this is all about interface design.
link |
And we are not, it's not realistic
link |
to give users control of a bunch of knobs
link |
that define the algorithm.
link |
Instead, we should purely put them in charge
link |
of defining the objective function.
link |
Like, let the user tell us what they want to achieve,
link |
how they want this algorithm to impact their lives.
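What putting the user in charge of the objective function might look like, as a deliberately simplified sketch. All item fields, scores, and objective names here are invented for illustration; no real platform's ranker works this way:

```python
# Invented catalog: each item carries illustrative scores.
ITEMS = [
    {"title": "Outrage clip",      "engagement": 0.9, "learning": 0.10, "accuracy": 0.30},
    {"title": "Intro to calculus", "engagement": 0.4, "learning": 0.90, "accuracy": 0.95},
    {"title": "Celebrity gossip",  "engagement": 0.8, "learning": 0.05, "accuracy": 0.50},
]

# The objective function is a user-facing setting, not a fixed
# engagement maximizer baked into the platform.
OBJECTIVES = {
    "engagement": lambda item: item["engagement"],
    # "Feed my curiosity": value learning, weighted by accuracy.
    "learning": lambda item: item["learning"] * item["accuracy"],
}

def rank(items, objective):
    # Sort the feed by whichever objective the user selected.
    return sorted(items, key=OBJECTIVES[objective], reverse=True)

print(rank(ITEMS, "engagement")[0]["title"])  # Outrage clip
print(rank(ITEMS, "learning")[0]["title"])    # Intro to calculus
```

The same items and the same ranking machinery produce opposite feeds; the only thing that changed is the objective function, which is the knob being argued for here.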
link |
So do you think it is that,
link |
or do they provide individual article by article
link |
reward structure where you give a signal,
link |
I'm glad I saw this, or I'm glad I didn't?
link |
So like a Spotify type feedback mechanism,
link |
it works to some extent.
link |
I'm kind of skeptical about it
link |
because the only thing the algorithm will do
link |
is attempt to relate your choices
link |
with the choices of everyone else,
link |
which might, you know, if you have an average profile
link |
that works fine, I'm sure Spotify recommendations work fine
link |
if you just like mainstream stuff.
link |
If you don't, it can be, it's not optimal at all actually.
link |
It'll be an inefficient search
link |
for the part of the Spotify world that represents you.
link |
So it's a tough problem,
link |
but do note that even a feedback system
link |
like what Spotify has does not give me control
link |
over what the algorithm is trying to optimize for.
link |
Well, public awareness, which is what we're doing now,
link |
is a good place to start.
link |
Do you have concerns about long term existential threats
link |
of artificial intelligence?
link |
Well, as I was saying,
link |
our world is increasingly made of information.
link |
AI algorithms are increasingly going to be our interface
link |
to this world of information,
link |
and somebody will be in control of these algorithms.
link |
And that could put us in a bad situation, right?
link |
It has risks coming from potentially large companies
link |
wanting to optimize their own goals,
link |
maybe profit, maybe something else.
link |
Also from governments who might want to use these algorithms
link |
as a means of control of the population.
link |
Do you think there's existential threat
link |
that could arise from that?
link |
So existential threat.
link |
So maybe you're referring to the singularity narrative
link |
where robots just take over.
link |
Well, I don't mean Terminator robots,
link |
and I don't believe it has to be a singularity.
link |
We're just talking about, just like you said,
link |
the algorithm controlling masses of populations.
link |
The existential threat being that we
link |
hurt ourselves much like a nuclear war would hurt ourselves.
link |
That kind of thing.
link |
I don't think that requires a singularity.
link |
It requires a loss of control over AI algorithms.
link |
So I do agree there are concerning trends.
link |
Honestly, I wouldn't want to make any long term predictions.
link |
I don't think today we really have the capability
link |
to see what the dangers of AI
link |
are going to be in 50 years, in 100 years.
link |
I do see that we are already faced
link |
with concrete and present dangers
link |
surrounding the negative side effects
link |
of content recommendation systems, of newsfeed algorithms,
link |
concerning algorithmic bias as well.
link |
So we are delegating more and more
link |
decision processes to algorithms.
link |
Some of these algorithms are handcrafted,
link |
some are learned from data,
link |
but we are delegating control.
link |
Sometimes it's a good thing, sometimes not so much.
link |
And there is in general very little supervision
link |
of this process, right?
link |
So we are still in this period of very fast change,
link |
even chaos, where society is restructuring itself,
link |
turning into an information society,
link |
which itself is turning into
link |
an increasingly automated information processing society.
link |
And well, yeah, I think the best we can do today
link |
is try to raise awareness around some of these issues.
link |
And I think we're actually making good progress.
link |
If you look at algorithmic bias, for instance,
link |
three years ago, even two years ago,
link |
very, very few people were talking about it.
link |
And now all the big companies are talking about it.
link |
They are often not talking about it in a very serious way,
link |
but at least it is part of the public discourse.
link |
You see people in Congress talking about it.
link |
And it all started from raising awareness.
link |
So in terms of alignment problem,
link |
trying to teach, as we deploy algorithms,
link |
just even recommender systems on Twitter,
link |
encoding human values and morals,
link |
decisions that touch on ethics,
link |
how hard do you think that problem is?
link |
How do we have loss functions in neural networks
link |
that have some component,
link |
some fuzzy components of human morals?
link |
Well, I think this is really all about objective function engineering,
link |
which is probably going to be increasingly a topic of concern in the future.
link |
Like for now, we're just using very naive loss functions
link |
because the hard part is not actually what you're trying to minimize.
link |
It's everything else.
link |
But as the everything else is going to be increasingly automated,
link |
we're going to be focusing our human attention
link |
on increasingly high level components,
link |
like what's actually driving the whole learning system,
link |
like the objective function.
link |
So loss function engineering is going to be,
link |
loss function engineer is probably going to be a job title in the future.
link |
And then the tooling you're creating with Keras essentially
link |
takes care of all the details underneath.
link |
And basically the human expert is needed for exactly that.
link |
Keras is the interface between the data you're collecting
link |
and the business goals.
link |
And your job as an engineer is going to be to express your business goals
link |
and your understanding of your business or your product,
link |
your system as a kind of loss function or a kind of set of constraints.
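A rough sketch of what "expressing your business goals as a loss function or a set of constraints" can mean in practice. This is framework-agnostic NumPy, not the Keras API, and the constraint term (a group-gap penalty) is an invented example of the kind of thing a "loss function engineer" might add:

```python
import numpy as np

def composite_loss(y_true, y_pred, group, weight=1.0):
    # Task objective: plain mean squared error.
    task = np.mean((y_true - y_pred) ** 2)
    # Constraint: penalize a gap in average prediction between two
    # groups of users (an invented stand-in for a business or
    # ethics constraint the training process should respect).
    gap = abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
    return task + weight * gap

y_true = np.array([1.0, 0.0, 1.0, 0.0])
group = np.array([0, 0, 1, 1])

fair = np.array([0.9, 0.1, 0.9, 0.1])    # equal group means
skewed = np.array([0.9, 0.1, 0.6, 0.0])  # group 1 pushed down

# The constraint term makes the skewed predictions strictly worse.
print(composite_loss(y_true, fair, group) <
      composite_loss(y_true, skewed, group))  # True
```

The engineering judgment lives entirely in the second term and its weight; everything below the loss is the automated part of the pipeline.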
link |
Does the possibility of creating an AGI system excite you or scare you or bore you?
link |
So intelligence can never really be general.
link |
You know, at best it can have some degree of generality like human intelligence.
link |
It also always has some specialization in the same way that human intelligence
link |
is specialized in a certain category of problems,
link |
is specialized in the human experience.
link |
And when people talk about AGI,
link |
I'm never quite sure if they're talking about very, very smart AI,
link |
so smart that it's even smarter than humans,
link |
or they're talking about human like intelligence,
link |
because these are different things.
link |
Let's say, presumably I'm impressing you today with my humanness.
link |
So imagine that I was in fact a robot.
link |
So what does that mean?
link |
That I'm impressing you with natural language processing.
link |
Maybe if you weren't able to see me, maybe this is a phone call.
link |
So that kind of system.
link |
So that's very much about building human like AI.
link |
And you're asking me, you know, is this an exciting perspective?
link |
Not so much because of what artificial human like intelligence could do,
link |
but, you know, from an intellectual perspective,
link |
I think if you could build truly human like intelligence,
link |
that means you could actually understand human intelligence,
link |
which is fascinating, right?
link |
Human like intelligence is going to require emotions.
link |
It's going to require consciousness,
link |
which are not things that would normally be required by an intelligent system.
link |
If you look at, you know, we were mentioning earlier like science
link |
as a superhuman problem solving agent or system,
link |
it does not have consciousness, it doesn't have emotions.
link |
In general, so emotions,
link |
I see consciousness as being on the same spectrum as emotions.
link |
It is a component of the subjective experience
link |
that is meant very much to guide behavior generation, right?
link |
It's meant to guide your behavior.
link |
In general, human intelligence and animal intelligence
link |
has evolved for the purpose of behavior generation, right?
link |
Including in a social context.
link |
So that's why we actually need emotions.
link |
That's why we need consciousness.
link |
An artificial intelligence system developed in a different context
link |
may well never need them, may well never be conscious like science.
link |
Well, on that point, I would argue it's possible to imagine
link |
that there's echoes of consciousness in science
link |
when viewed as an organism, that science is consciousness.
link |
So, I mean, how would you go about testing this hypothesis?
link |
How do you probe the subjective experience of an abstract system like science?
link |
Well, the problem is that probing any subjective experience is impossible
link |
because I'm not science, I'm Lex.
link |
So I can't probe another entity any more than I can probe the bacteria on my skin.
link |
You're Lex, I can ask you questions about your subjective experience
link |
and you can answer me, and that's how I know you're conscious.
link |
Yes, but that's because we speak the same language.
link |
Perhaps we would have to speak the language of science in order to ask it.
link |
Honestly, I don't think consciousness, just like emotions of pain and pleasure,
link |
is something that inevitably arises
link |
from any sort of sufficiently intelligent information processing.
link |
It is a feature of the mind, and if you've not implemented it explicitly, it is not there.
link |
So you think it's an emergent feature of a particular architecture.
link |
So do you think...
link |
It's a feature in the same sense.
link |
So, again, the subjective experience is all about guiding behavior.
link |
If the problems you're trying to solve don't really involve an embodied agent,
link |
maybe in a social context, generating behavior and pursuing goals like this, then you don't need it.
link |
And if you look at science, that's not really what's happening.
link |
Even though it is a form of artificial intelligence,
link |
in the sense that it is solving problems, it is accumulating knowledge,
link |
accumulating solutions and so on.
link |
So if you're not explicitly implementing a subjective experience,
link |
implementing certain emotions and implementing consciousness,
link |
it's not going to just spontaneously emerge.
link |
But so for a system like, human like intelligence system that has consciousness,
link |
do you think it needs to have a body?
link |
I mean, it doesn't have to be a physical body, right?
link |
And there's not that much difference between a realistic simulation and the real world.
link |
So there has to be something you have to preserve kind of thing.
link |
Yes, but human like intelligence can only arise in a human like context.
link |
Intelligence needs other humans in order for you to demonstrate
link |
that you have human like intelligence, essentially.
link |
So what kind of tests and demonstration would be sufficient for you
link |
to demonstrate human like intelligence?
link |
Just out of curiosity, you've talked about in terms of theorem proving
link |
and program synthesis, I think you've written about
link |
that there's no good benchmarks for this.
link |
That's one of the problems.
link |
So let's talk program synthesis.
link |
So what do you imagine is a good...
link |
I think it's related questions for human like intelligence
link |
and for program synthesis.
link |
What's a good benchmark for either or both?
link |
So I mean, you're actually asking two questions,
link |
which is one is about quantifying intelligence
link |
and comparing the intelligence of an artificial system
link |
to the intelligence of a human.
link |
And the other is about the degree to which this intelligence is human like.
link |
It's actually two different questions.
link |
So you mentioned earlier the Turing test.
link |
Well, I actually don't like the Turing test because it's very lazy.
link |
It's all about completely bypassing the problem of defining and measuring intelligence
link |
and instead delegating to a human judge or a panel of human judges.
link |
So it's a total copout, right?
link |
If you want to measure how human like an agent is,
link |
I think you have to make it interact with other humans.
link |
Maybe it's not necessarily a good idea to have these other humans be the judges.
link |
Maybe you should just observe behavior and compare it to what a human would actually have done.
link |
When it comes to measuring how smart, how clever an agent is
link |
and comparing that to the degree of human intelligence.
link |
So we're already talking about two things, right?
link |
The degree, kind of like the magnitude of an intelligence and its direction, right?
link |
Like the norm of a vector and its direction.
link |
And the direction is like human likeness and the magnitude, the norm is intelligence.
link |
You could call it intelligence, right?
link |
So the direction, in your sense, the space of directions that are human like, is very narrow.
link |
So how would you measure the magnitude of intelligence in a system
link |
in a way that also enables you to compare it to that of a human?
link |
Well, if you look at different benchmarks for intelligence today,
link |
they're all too focused on skill at a given task.
link |
Like skill at playing chess, skill at playing Go, skill at playing Dota.
link |
And I think that's not the right way to go about it because you can always
link |
beat a human at one specific task.
link |
The reason why our skill at playing Go or juggling or anything is impressive
link |
is because we are expressing this skill within a certain set of constraints.
link |
If you remove the constraints, the constraints that we have one lifetime,
link |
that we have this body and so on, if you remove the context,
link |
if you have unlimited training data, if you can have access to, you know,
link |
for instance, if you look at juggling, if you have no restriction on the hardware,
link |
then achieving arbitrary levels of skill is not very interesting
link |
and says nothing about the amount of intelligence you've achieved.
link |
So if you want to measure intelligence, you need to rigorously define what
link |
intelligence is, which in itself, you know, it's a very challenging problem.
link |
And do you think that's possible?
link |
To define intelligence? Yes, absolutely.
link |
I mean, you can provide, many people have provided, you know, some definition.
link |
I have my own definition.
link |
Where does your definition begin, if it doesn't end?
link |
Well, I think intelligence is essentially the efficiency
link |
with which you turn experience into generalizable programs.
link |
So what that means is it's the efficiency with which
link |
you turn a sampling of experience space into
link |
the ability to process a larger chunk of experience space.
link |
So measuring skill across many different tasks
link |
can be one proxy for measuring intelligence.
link |
But if you want to only measure skill, you should control for two things.
link |
You should control for the amount of experience that your system has
link |
and the priors that your system has.
link |
But if you look at two agents and you give them the same priors
link |
and you give them the same amount of experience,
link |
there is one of the agents that is going to learn programs,
link |
representations, something, a model that will perform well
link |
on the larger chunk of experience space than the other.
link |
And that is the smarter agent.
link |
Yeah. So if you fix the experience, the question is which agent generates better programs,
link |
better meaning more generalizable.
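This comparison can be made concrete: fix the priors and the amount of experience, then score which agent's learned program covers more of the held-out experience space. A toy sketch in Python; everything below is an illustrative assumption, not part of the conversation:

```python
import random

def generalization_score(learner, train, heldout):
    # Train on a fixed, small sample of experience space, then
    # score how much of the *held-out* space the learned program handles.
    model = learner(train)
    return sum(model(x) == y for x, y in heldout) / len(heldout)

# Toy experience space: the underlying task is y = 2x + 1 over integers.
space = [(x, 2 * x + 1) for x in range(-50, 51)]
random.seed(0)
train = random.sample(space, 5)              # same experience for both agents
heldout = [p for p in space if p not in train]

def memorizer(train):
    # Agent A: memorizes its experience, so it cannot generalize.
    table = dict(train)
    return lambda x: table.get(x)

def line_fitter(train):
    # Agent B: induces a line from two points, a more generalizable program.
    (x0, y0), (x1, y1) = train[0], train[1]
    a = (y1 - y0) // (x1 - x0)
    b = y0 - a * x0
    return lambda x: a * x + b

score_a = generalization_score(memorizer, train, heldout)   # 0.0
score_b = generalization_score(line_fitter, train, heldout) # 1.0
```

Both agents get identical priors (the code they run on) and identical experience (the same five samples); only the second turns that experience into a program that handles the rest of the space.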
link |
That's really interesting.
link |
That's a very nice, clean definition of...
link |
Oh, by the way, in this definition, it is already very obvious
link |
that intelligence has to be specialized
link |
because you're talking about experience space
link |
and you're talking about segments of experience space.
link |
You're talking about priors and you're talking about experience.
link |
All of these things define the context in which intelligence emerges.
link |
And you can never look at the totality of experience space, right?
link |
So intelligence has to be specialized.
link |
But it can be sufficiently large, the experience space,
link |
even though it's specialized.
link |
There's a certain point when the experience space is large enough
link |
to where it might as well be general.
link |
It feels general. It looks general.
link |
Sure. I mean, it's very relative.
link |
Like, for instance, many people would say human intelligence is general.
link |
In fact, it is quite specialized.
link |
We can definitely build systems that start from the same innate priors
link |
as what humans have at birth.
link |
Because we already understand fairly well
link |
what sort of priors we have as humans.
link |
Like many people have worked on this problem.
link |
Most notably, Elizabeth Spelke from Harvard.
link |
I don't know if you know her.
link |
She's worked a lot on what she calls core knowledge.
link |
And it is very much about trying to determine and describe
link |
what priors we are born with.
link |
Like language skills and so on, all that kind of stuff.
link |
So we have some pretty good understanding of what priors we are born with.
link |
So I've actually been working on a benchmark for the past couple years,
link |
you know, on and off.
link |
I hope to be able to release it at some point.
link |
The idea is to measure the intelligence of systems
link |
by controlling for priors,
link |
controlling for amount of experience,
link |
and by assuming the same priors as what humans are born with.
link |
So that you can actually compare these scores to human intelligence.
link |
You can actually have humans pass the same test in a way that's fair.
link |
Yeah. And so importantly, such a benchmark should be such that any amount
link |
of practicing does not increase your score.
link |
So try to picture a game where no matter how much you play this game,
link |
that does not change your skill at the game.
link |
Can you picture that?
link |
As a person who deeply appreciates practice, I cannot actually.
link |
There's actually a very simple trick.
link |
So in order to come up with a task,
link |
so the only thing you can measure is skill at the task.
link |
All tasks are going to involve priors.
link |
The trick is to know what they are and to describe that.
link |
And then you make sure that this is the same set of priors as what humans start with.
link |
So you create a task that assumes these priors, that exactly documents these priors,
link |
so that the priors are made explicit and there are no other priors involved.
link |
And then you generate a certain number of samples in experience space for this task, right?
link |
And this, for one task, assuming that the task is new for the agent passing it,
link |
that's one test of this definition of intelligence that we set up.
link |
And now you can scale that to many different tasks,
link |
that each task should be new to the agent passing it, right?
link |
And also it should be human interpretable and understandable
link |
so that you can actually have a human pass the same test.
link |
And then you can compare the score of your machine and the score of your human.
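The protocol sketched here (document the priors, make every task new to the agent, give only a few demonstration samples, and score the machine and a human on the same tasks) can be written down as a tiny evaluation harness. The task format and agent below are hypothetical illustrations, not the actual benchmark:

```python
def evaluate(agent, tasks):
    # Score an agent over tasks it has never seen before. Each task has a
    # few 'train' demonstration pairs and one held-out 'test' pair, so
    # practicing cannot raise the score: every task is encountered once.
    solved = 0
    for task in tasks:
        program = agent.learn(task["train"])          # few-shot induction
        test_input, expected = task["test"]
        if program(test_input) == expected:
            solved += 1
    return solved / len(tasks)

class InductionAgent:
    # Toy agent whose single documented prior is "the demos show the same
    # elementwise transformation"; it induces a constant offset.
    def learn(self, demos):
        x, y = demos[0]
        offset = y[0] - x[0]
        return lambda inp: [v + offset for v in inp]

tasks = [
    {"train": [([1, 2], [3, 4]), ([5], [7])], "test": ([9, 9], [11, 11])},
    {"train": [([0], [5]), ([2, 2], [7, 7])], "test": ([1], [6])},
]
print(evaluate(InductionAgent(), tasks))  # 1.0: both novel toy tasks solved
```

Because a human could read the same demonstration pairs and solve the same test inputs, the resulting score is directly comparable between machine and human, which is the point of the protocol.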
link |
Which could be a lot of stuff.
link |
You could even start with a task like MNIST.
link |
Just as long as you start with the same set of priors.
link |
So the problem with MNIST is that humans are already trained to recognize digits, right?
link |
But let's say we're considering objects that are not digits,
link |
some completely arbitrary patterns.
link |
Well, humans already come with visual priors about how to process that.
link |
So in order to make the game fair, you would have to isolate these priors
link |
and describe them and then express them as computational rules.
link |
Having worked a lot with vision science people, that's exceptionally difficult.
link |
A lot of progress has been made.
link |
There have been a lot of good attempts at basically reducing all of human vision to some core priors.
link |
We're probably still far away from doing that perfectly,
link |
but as a start for a benchmark, that's an exciting possibility.
link |
Yeah, so Elizabeth Spelke actually lists objectness as one of the core knowledge priors.
link |
So we have priors about objectness, like about the visual space, about time,
link |
about agents, about goal oriented behavior.
link |
We have many different priors, but what's interesting is that,
link |
sure, we have this pretty diverse and rich set of priors,
link |
but it's also not that diverse, right?
link |
We are not born into this world with a ton of knowledge about the world,
link |
but only with a small set of core knowledge.
link |
Yeah, sorry, do you have a sense of how it feels to us humans that that set is not that large?
link |
But just even the nature of time that we kind of integrate pretty effectively
link |
through all of our perception, all of our reasoning,
link |
maybe how, you know, do you have a sense of how easy it is to encode those priors?
link |
Maybe it requires building a universe and then the human brain in order to encode those priors.
link |
Or do you have hope that it can be listed axiomatically?
link |
So you have to keep in mind that any knowledge about the world that we are
link |
born with is something that has to have been encoded into our DNA by evolution at some point.
link |
And DNA is a very, very low bandwidth medium.
link |
Like it's extremely long and expensive to encode anything into DNA because first of all,
link |
you need some sort of evolutionary pressure to guide this writing process.
link |
And then, you know, the higher level of information you're trying to write, the longer it's going to take.
link |
And the thing in the environment that you're trying to encode knowledge about has to be stable
link |
over this duration.
link |
So you can only encode into DNA things that constitute an evolutionary advantage.
link |
So this is actually a very small subset of all possible knowledge about the world.
link |
You can only encode things that are stable, that are true, over very, very long periods of time,
link |
typically millions of years.
link |
For instance, we might have some visual prior about the shape of snakes, right?
link |
About what makes a face, what's the difference between a face and a non face?
link |
But consider this interesting question.
link |
Do we have any innate sense of the visual difference between a male face and a female face?
link |
What do you think?
link |
For a human, I mean.
link |
I would have to look back into evolutionary history when the genders emerged.
link |
I mean, the faces of humans are quite different from the faces of great apes.
link |
Great apes, right?
link |
That's interesting.
link |
Yeah, you couldn't tell the face of a female chimpanzee from the face of a male chimpanzee.
link |
Yeah, and I don't think most humans would have that ability.
link |
So we do have innate knowledge of what makes a face, but it's actually impossible for us to
link |
have any DNA encoded knowledge of the difference between a female human face and a male human face
link |
because that knowledge, that information came up into the world actually very recently.
link |
If you look at the slowness of the process of encoding knowledge into DNA.
link |
Yeah, so that's interesting.
link |
That's a really powerful argument that DNA is low bandwidth and it takes a long time to encode.
link |
That naturally creates a very efficient encoding.
link |
But one important consequence of this is that, so yes, we are born into this world with a bunch of
link |
knowledge, sometimes high level knowledge about the world, like the rough shape of a
link |
snake, or the rough shape of a face.
link |
But importantly, because this knowledge takes so long to write, almost all of this innate
link |
knowledge is shared with our cousins, with great apes, right?
link |
So it is not actually this innate knowledge that makes us special.
link |
But to throw it right back at you from earlier in our discussion, it's that encoding
link |
might also include the entirety of the environment of Earth.
link |
So it can include things that are important to survival and reproduction, so for which there is
link |
some evolutionary pressure, and things that are stable, constant over very, very, very long time periods.
link |
And honestly, it's not that much information.
link |
There's also, besides the bandwidth constraint and the constraints of the writing process,
link |
there's also memory constraints, like DNA, the part of DNA that deals with the human brain,
link |
it's actually fairly small.
link |
It's like, you know, on the order of megabytes, right?
link |
There's not that much high level knowledge about the world you can encode.
link |
That's quite brilliant and hopeful for the benchmark you're referring to, of encoding priors.
link |
I actually look forward to it. I'm skeptical whether you can do it in the next couple of
link |
years, but hopefully.
link |
I've been working on it.
link |
So honestly, it's a very simple benchmark, and it's not like a big breakthrough or anything.
link |
It's more like a fun side project, right?
link |
But these fun side projects, well, so was ImageNet.
link |
These fun side projects could launch entire groups of efforts towards creating reasoning
link |
systems and so on.
link |
Yeah, that's the goal.
link |
It's trying to measure strong generalization, to measure the strength of abstraction in
link |
our minds, well, in our minds and in artificially intelligent agents.
link |
And if there's anything true about this science organism, it's that its individual cells love competition.
link |
And benchmarks encourage competition.
link |
So that's an exciting possibility.
link |
So, do you think an AI winter is coming?
link |
And how do we prevent it?
link |
So an AI winter is something that would occur when there's a big mismatch between how we
link |
are selling the capabilities of AI and the actual capabilities of AI.
link |
And today, deep learning is creating a lot of value.
link |
And it will keep creating a lot of value in the sense that these models are applicable
link |
to a very wide range of problems that are relevant today.
link |
And we are only just getting started with applying these algorithms to every problem
link |
they could be solving.
link |
So deep learning will keep creating a lot of value for the time being.
link |
What's concerning, however, is that there's a lot of hype around deep learning and around AI.
link |
There are lots of people overselling the capabilities of these systems, not just
link |
the capabilities, but also overselling the fact that they might be more or less, you
link |
know, brain like, giving a kind of mystical aspect to these technologies, and also
link |
overselling the pace of progress, which, you know, it might look fast in the sense that
link |
we have this exponentially increasing number of papers.
link |
But again, that's just a simple consequence of the fact that we have ever more people
link |
coming into the field.
link |
It doesn't mean the progress is actually exponentially fast.
link |
Let's say you're trying to raise money for your startup or your research lab.
link |
You might want to tell, you know, a grandiose story to investors about how deep learning
link |
is just like the brain and how it can solve all these incredible problems like self driving
link |
and robotics and so on.
link |
And maybe you can tell them that the field is progressing so fast and we are going to
link |
have AGI within 15 years or even 10 years.
link |
And none of this is true.
link |
And every time you're like saying these things and an investor or, you know, a decision maker
link |
believes them, well, this is like the equivalent of taking on credit card debt, but for trust.
link |
And maybe this will, you know, this will be what enables you to raise a lot of money,
link |
but ultimately you are creating damage, you are damaging the field.
link |
So that's the concern, that debt. That's what happened with the other AI winters.
link |
You actually tweeted about this with autonomous vehicles, right?
link |
Almost every single company now has promised that they will have full autonomous
link |
vehicles by 2021, 2022.
link |
That's a good example of the consequences of over hyping the capabilities of AI and
link |
the pace of progress.
link |
Because I've been working a lot recently in this area, I have a deep concern about what
link |
happens when all of these companies after I've invested billions have a meeting and
link |
say, how much do we actually, first of all, do we have an autonomous vehicle?
link |
The answer will definitely be no.
link |
And second will be, wait a minute, we've invested one, two, three, four billion dollars
link |
into this and we made no profit.
link |
And the reaction to that may be going very hard in other directions that might impact
link |
even other industries.
link |
And that's what we call an AI winter: when there is backlash, when no one believes any
link |
of these promises anymore because they've turned out to be big lies the first time around.
link |
And this will definitely happen to some extent for autonomous vehicles because the public
link |
and decision makers were convinced, around 2015, by these
link |
people who were trying to raise money for their startups and so on, that L5 driving was coming
link |
in maybe 2016, maybe 2017, maybe 2018.
link |
Now we're in 2019, we're still waiting for it.
link |
And so I don't believe we are going to have a full on AI winter because we have these
link |
technologies that are producing a tremendous amount of real value.
link |
But there is also too much hype.
link |
So there will be some backlash.
link |
So some startups are trying to sell the dream of AGI and the fact that AGI is going to create infinite value.
link |
Like AGI is like a free lunch.
link |
Like if you can develop an AI system that passes a certain threshold of IQ or something,
link |
then suddenly you have infinite value.
link |
And well, there are actually lots of investors buying into this idea and they will wait maybe
link |
10, 15 years and nothing will happen.
link |
And the next time around, well, maybe there will be a new generation of investors.
link |
Human memory is fairly short after all.
link |
I don't know about you, but because I've spoken about AGI sometimes poetically, I get a lot
link |
of emails from people sending me usually large manifestos, where they say
link |
to me that they have created an AGI system or they know how to do it.
link |
And there's a long write up of how to do it.
link |
I get a lot of these emails, yeah.
link |
They feel a little bit like they're generated by an AI system actually, but there's usually
link |
no diagram, just a transformer generating crank papers about AGI.
link |
So the question is, because you have a good radar for crank
link |
papers, how do we know they're not onto something?
link |
How do I know? So when you start to talk about AGI, or anything like the reasoning benchmarks
link |
and so on, something that doesn't have a benchmark, it's really difficult to know.
link |
I mean, I talked to Jeff Hawkins, who's really looking at neuroscience approaches,
link |
and there are echoes of really interesting ideas, in at least Jeff's case,
link |
which he's showing.
link |
How do you usually think about this?
link |
Like preventing yourself from being too narrow minded and elitist about deep learning, thinking it
link |
has to work on these particular benchmarks, otherwise it's trash.
link |
Well, you know, the thing is, intelligence does not exist in the abstract.
link |
Intelligence has to be applied.
link |
So even if you don't have an existing benchmark, an improvement has to show up on some benchmark, maybe it's
link |
a new benchmark, right?
link |
Maybe it's not something we've been looking at before, but you do need a problem that
link |
you're trying to solve.
link |
You're not going to come up with a solution without a problem.
link |
So general intelligence, I mean, you've clearly highlighted generalization.
link |
If you want to claim that you have an intelligence system, it should come with a benchmark.
link |
It should, yes, it should display capabilities of some kind.
link |
It should show that it can create some form of value, even if it's a very artificial form of value.
link |
And that's also the reason why you don't actually need to care about telling which papers have
link |
actually some hidden potential and which do not.
link |
Because if there is a new technique that's actually creating value, this is going to
link |
be brought to light very quickly because it's actually making a difference.
link |
So it's the difference between something that is ineffectual and something that is actually useful.
link |
And ultimately usefulness is our guide, not just in this field, but if you look at science
link |
in general, maybe there are many, many people over the years that have had some really interesting
link |
theories of everything, but they were just completely useless.
link |
And you don't actually need to tell the interesting theories from the useless theories.
link |
All you need is to see, is this actually having an effect on something else?
link |
Is this actually useful?
link |
Is this making an impact or not?
link |
That's beautifully put.
link |
I mean, the same applies to quantum mechanics, to string theory, to the holographic principle.
link |
We are doing deep learning because it works.
link |
Before it started working, people considered people working on neural networks as cranks
link |
No one was working on this anymore.
link |
And now it's working, which is what makes it valuable.
link |
It's not about being right.
link |
It's about being effective.
link |
And nevertheless, the individual entities of this scientific mechanism, just like Yoshua
link |
Bengio or Yann LeCun, they, while being called cranks, stuck with it.
link |
And so us individual agents, even if everyone's laughing at us, just stick with it.
link |
If you believe you have something, you should stick with it and see it through.
link |
That's a beautiful inspirational message to end on.
link |
Francois, thank you so much for talking today.