François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38
The following is a conversation with François Chollet. He's the creator of Keras, which is an open source deep learning library that is designed to enable fast, user-friendly experimentation with deep neural networks. It serves as an interface to several deep learning libraries, the most popular of which is TensorFlow. And it was integrated into the TensorFlow main code base, meaning that if you want to create, train, and use neural networks, probably the easiest and most popular option is to use Keras inside TensorFlow. Aside from creating an exceptionally useful and popular library, François is also a world-class AI researcher and software engineer at Google. And he's definitely an outspoken, if not controversial, personality in the AI world, especially in the realm of ideas around the future of artificial intelligence.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give us five stars on iTunes, support on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N.
And now, here's my conversation with François Chollet.
You're known for not sugarcoating your opinions and speaking your mind about ideas in AI, especially on Twitter. That's one of my favorite Twitter accounts. So what's one of the more controversial ideas you've expressed online and gotten some heat for?
Yeah, no, I think if you go through the trouble of maintaining a Twitter account, you might as well speak your mind. Otherwise, what's even the point of having a Twitter account? It's like getting a nice car and just leaving it in the garage. Yeah, so one thing for which I got a lot of pushback was the time I wrote something about the idea of intelligence explosion. I was questioning the idea and the reasoning behind it, and I got a lot of pushback on that. I got a lot of flak for it.
So yeah, intelligence explosion, I'm sure you're familiar with the idea, but it's the idea that if you were to build a general AI problem-solving algorithm, well, the problem of building such an AI is itself a problem that could be solved by your AI, and maybe it could be solved better than what humans can do. So your AI could start tweaking its own algorithm, could start making a better version of itself, and so on, iteratively, in a recursive fashion, and so you would end up with an AI with exponentially increasing intelligence.
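The recursive argument described above can be written down as a toy model. The sketch assumes, purely for illustration, that each generation of the AI improves its successor by a fixed fraction `gain` of its own capability; the function names and parameters are hypothetical. Under that assumption, capability compounds geometrically, which is the "exponentially increasing intelligence" the argument predicts.

```python
import math


def naive_self_improvement(generations, start=1.0, gain=0.1):
    """Toy model of the intelligence-explosion argument (an illustrative assumption):
    each generation adds gain * (its own capability) to build its successor."""
    levels = [start]
    for _ in range(generations):
        current = levels[-1]
        # The improvement is proportional to current capability, with nothing else limiting it.
        levels.append(current + gain * current)
    return levels


def closed_form(generations, start=1.0, gain=0.1):
    # Compounding a fixed fraction every generation is geometric (exponential) growth.
    return start * (1 + gain) ** generations


levels = naive_self_improvement(50)
assert math.isclose(levels[-1], closed_form(50))
```

Everything interesting is packed into the assumption that improvement is proportional to current capability and that nothing else constrains it; that premise is exactly what Chollet goes on to question.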
And I was basically questioning this idea. First of all, because the notion of intelligence explosion uses an implicit definition of intelligence that doesn't sound quite right to me. It considers intelligence as a property of a brain that you can consider in isolation, like the height of a building, for instance. But that's not really what intelligence is. Intelligence emerges from the interaction between a brain, a body (as in embodied intelligence), and an environment. And if you're missing one of these pieces, then you cannot really define intelligence anymore. So just tweaking a brain to make it smarter and smarter doesn't actually make any sense to me.
So first of all, you're crushing the dreams of many people. So let's look at Sam Harris. Actually, a lot of physicists, Max Tegmark, people who think the universe is an information processing system. Our brain is kind of an information processing system. So what's the theoretical limit? It doesn't make sense that there should be some limit. It seems naive to think that our own brain is somehow the limit of the capabilities of this information processing system. I'm playing devil's advocate here. And then if you just scale it, if you're able to build something that's on par with the brain, the process that builds it just continues, and it will improve exponentially. So that's the logic that's used actually by almost everybody that is worried about superhuman intelligence.
Yeah, so most people who are skeptical of that, their thought process is kind of like: this doesn't feel right. That's the case for me as well. The whole thing is shrouded in mystery, where you can't really say anything concrete, but you could say this doesn't feel right. This doesn't feel like that's how the brain works. And with your blog post, and now, you're trying to make it a little more explicit. So one idea is that the brain doesn't exist alone; it exists within the environment. So you can't exponentially improve the brain alone. You would have to somehow exponentially improve the environment and the brain together, almost, in order to create something that's much smarter in some kind of... Of course, we don't have a definition of intelligence.
That's correct, that's correct. If you look at very smart people today, even humans, not even talking about AIs, I don't think their brain and the performance of their brain is the bottleneck to their expressed intelligence, to their achievements. You cannot just tweak one part of this system, of this brain, body, environment system, and expect the capabilities, what emerges out of this system, to just, you know, explode exponentially. Because anytime you improve one part of a system with many interdependencies like this, a new bottleneck arises, right?
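One way to make the bottleneck point concrete is a toy model in which the system's expressed capability is capped by its weakest component, in the spirit of Liebig's law of the minimum. That `min` rule, the three component names, and the numbers are illustrative assumptions, not something stated in the conversation. Improving only the "brain" compounds geometrically, yet the system's output stops growing as soon as the brain overtakes the body and the environment.

```python
def system_capability(brain, body, environment):
    # Assumption: what the whole system can express is limited by its weakest part.
    return min(brain, body, environment)


def improve_brain_only(generations, brain=1.0, body=5.0, environment=5.0, rate=0.5):
    """Recursively improve one component and record the system-level capability."""
    history = []
    for _ in range(generations):
        brain *= 1 + rate  # the brain alone keeps compounding
        history.append(system_capability(brain, body, environment))
    return history


history = improve_brain_only(10)
print(history)  # [1.5, 2.25, 3.375, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
```

A real brain, body, environment system is not a simple `min` of three numbers; the sketch only shows how a compounding part can stop mattering once some other part becomes the constraint.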
And I don't think, even today, for very smart people, that their brain is the bottleneck to the sort of problems they can solve, right? In fact, many very smart people today, you know, are not actually solving any big scientific problems. They're not Einstein. They're like Einstein in the patent clerk days. Einstein became Einstein because this was a meeting of a genius with a big problem at the right time, right? But maybe this meeting could have never happened, and then Einstein would have just been a patent clerk, right? And in fact, many people today are probably genius-level smart, but you wouldn't know, because they're not really expressing any of that.
Well, that's brilliant. So we can think of the world, Earth, but also the universe, as a space of problems. All of these problems and tasks of various difficulty are roaming it. And there are agents, creatures like ourselves and animals and so on, that are also roaming it. And then you get coupled with a problem, and then you solve it. But without that coupling, you can't demonstrate your quote unquote intelligence.
Yeah, exactly. Intelligence is the meeting of great problem-solving capabilities with a great problem. And if you don't have the problem, you don't really express any intelligence. All you're left with is potential intelligence, like the performance of your brain or, you know, how high your IQ is, which in itself is just a number, right?
So you mentioned problem-solving capacity. What do you think of as problem-solving capacity? Can you try to define intelligence? Like, what does it mean to be more or less intelligent? Is it completely coupled to a particular problem, or is there something a little bit more universal?
Yeah, I do believe all intelligence is specialized intelligence. Even human intelligence has some degree of generality. Well, all intelligent systems have some degree of generality, but they're always specialized in one category of problems. So human intelligence is specialized in the human experience, and that shows at various levels. It shows in some prior knowledge that's innate, that we have at birth: knowledge about things like agents, goal-driven behavior, visual priors about what makes an object, priors about time, and so on. It shows also in the way we learn. For instance, it's very easy for us to pick up language. It's very, very easy for us to learn certain things because we are basically hard-coded to learn them. And we are specialized in solving certain kinds of problems, and we are quite useless when it comes to other kinds of problems. For instance, we are not really designed to handle very long-term problems. We have no capability of seeing the very long term. We don't have very much working memory, you know?
So how do you think about long term? Do you mean long-term planning? Are we talking about a scale of years, millennia? What do you mean by long term, where we're not very good?
Well, human intelligence is specialized in the human experience, and human experience is very short; one lifetime is short. Even within one lifetime, we have a very hard time envisioning, you know, things on a scale of years. It's very difficult to project yourself at the scale of five years, at the scale of ten years, and so on. Right. We can solve only fairly narrowly scoped problems. So when it comes to solving bigger problems, larger-scale problems, we are not actually doing it on an individual level. So it's not actually our brain doing it. We have this thing called civilization, right? Which is itself a sort of problem-solving system, a sort of artificial intelligence system, right? And it's not running on one brain; it's running on a network of brains. In fact, it's running on much more than a network of brains. It's running on a lot of infrastructure, like books and computers and the internet and human institutions and so on. And that is capable of handling problems on a much greater scale than any individual human. If you look at computer science, for instance, that's an institution that solves problems, and it is superhuman, right? It operates on a greater scale; it can solve much bigger problems than an individual human could. And science itself, science as a system, as an institution, is a kind of artificially intelligent problem-solving algorithm that is superhuman.
Yeah, computer science is like a theorem prover at the scale of thousands, maybe hundreds of thousands, of human beings. At that scale, what do you think is an intelligent agent? So there's us humans at the individual level. There are millions, maybe billions, of bacteria on our skin. That's at the smaller scale. You can even go to the particle level, as systems that behave, you could say, intelligently in some ways. And then you can look at the Earth as a single organism. You can look at our galaxy, and even the universe, as a single organism. How do you think about scale in defining intelligent systems? And we're here at Google; there are millions of devices doing computation in a distributed way. How do you think about intelligence versus scale?
You can always characterize anything as a system. I think people who talk about things like intelligence explosion tend to focus on one agent as basically one brain, one brain considered in isolation, like a brain in a jar that's controlling a body in a very top-to-bottom kind of fashion. And that body is pursuing goals in an environment. So it's a very hierarchical view. You have the brain at the top of the pyramid, then you have the body just plainly receiving orders, then the body is manipulating objects in an environment, and so on. So everything is subordinate to this one thing, this epicenter, which is the brain. But in real life, intelligent agents don't really work like this. There is no strong delimitation between the brain and the body to start with. You have to look not just at the brain, but at the nervous system. But then the nervous system and the body are not really two separate entities. So you have to look at an entire animal as one agent. But then you start realizing, as you observe an animal over any length of time, that a lot of the intelligence of an animal is actually externalized. That's especially true for humans. A lot of our intelligence is externalized. When you write down some notes, that's externalized intelligence. When you write a computer program, you are externalizing cognition. So it's externalized in books. It's externalized in computers, the internet, in other humans. It's externalized in language, and so on. So there is no hard delimitation of what makes an intelligent agent. It's all about context.
OK, but AlphaGo is better at Go than the best human player. There are levels of skill here. So do you think there is such a concept as an intelligence explosion in a specific task? Do you think it's possible to have a category of tasks on which you do have something like an exponential growth of ability to solve that particular problem?
I think if you consider a specific vertical, it's probably possible to some extent. I also don't think we have to speculate about it, because we have real-world examples of recursively self-improving intelligent systems. For instance, science is a problem-solving system, a knowledge generation system, a system that experiences the world in some sense and then gradually understands it and can act on it. And that system is superhuman, and it is clearly recursively self-improving, because science feeds into technology. Technology can be used to build better tools, better computers, better instrumentation, and so on, which in turn can make science faster. So science is probably the closest thing we have today to a recursively self-improving superhuman AI. And you can just observe: is scientific progress today exploding? Which itself is an interesting question. You can use that as a basis to try to understand what will happen with a superhuman AI that has science-like behavior.
Let me linger on it a little bit more. What is your intuition for why an intelligence explosion is not possible? Take all the scientific revolutions. Why can't we slightly accelerate that process?
So you can absolutely accelerate any problem-solving process. Recursive self-improvement is absolutely a real thing. But what happens with a recursively self-improving system is typically not explosion, because no system exists in isolation. And so tweaking one part of the system means that suddenly another part of the system becomes a bottleneck. And if you look at science, for instance, which is clearly a recursively self-improving, clearly a problem-solving system, scientific progress is not actually exploding. If you look at science, what you see is the picture of a system that is consuming an exponentially increasing amount of resources, but it's having a linear output in terms of scientific progress. And maybe that will seem like a very strong claim. Many people are actually saying that scientific progress is exponential, but when they're claiming this, they're actually looking at indicators of resource consumption by science: for instance, the number of papers being published, the number of patents being filed, and so on, which are just completely correlated with how many people are working on science. So it's actually an indicator of resource consumption. But what you should look at is the output, progress in terms of the knowledge that science generates, in terms of the scope and significance of the problems that we solve. And some people have actually been trying to measure that.
Michael Nielsen, for instance, had a very nice paper. His approach to measuring scientific progress was to look at the timeline of scientific discoveries over the past 100, 150 years, and for each major discovery, ask a panel of experts to rate the significance of the discovery. And if the output of science as an institution were exponential, you would expect the temporal density of significance to go up exponentially, maybe because there's a faster rate of discoveries, maybe because the discoveries are increasingly more important. And what actually happens if you plot this temporal density of significance measured in this way is that you see very much a flat graph. You see a flat graph across all disciplines, across physics, biology, medicine, and so on. And it actually makes a lot of sense if you think about it. Think about the progress of physics 110 years ago. It was a time of crazy change. Think about the progress of technology 170 years ago, when we started replacing horses with cars, when we started having electricity, and so on. It was a time of incredible change. And today is also a time of very, very fast change, but it would be an unfair characterization to say that today, technology and science are moving way faster than they did 50 years ago or 100 years ago. And if you do try to rigorously plot the temporal density of significance, you do see very flat curves, and you can check out the paper that Michael Nielsen had about this idea.
And so the way I interpret it is: as you make progress in a given field or in a given subfield of science, it becomes exponentially more difficult to make further progress. Think of the very first person to work on information theory. If you enter a new field and it's still the very early years, there's a lot of low-hanging fruit you can pick. But the next generation of researchers is going to have to dig much harder, actually, to make smaller discoveries, probably a larger number of smaller discoveries. And to achieve the same amount of impact, you're going to need a much greater head count. And that's exactly the picture you're seeing with science: the number of scientists and engineers is, in fact, increasing exponentially. The amount of computational resources available to science is increasing exponentially. So the resource consumption of science is exponential, but the output in terms of progress, in terms of significance, is linear.
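The claim that exponential resources can buy only linear progress can be checked with a toy calculation. It assumes, for illustration only, that each successive discovery costs a fixed factor `growth` more than the previous one (the low-hanging fruit gets picked first), and that the yearly research budget grows by that same factor. The function and its parameters are hypothetical, not taken from Nielsen's paper. Under these assumptions an exponentially growing budget buys exactly one discovery per year.

```python
def discoveries_per_year(years, growth=2.0, first_cost=1.0, first_budget=1.0):
    """Return (budget, cumulative discoveries) for each year of a toy research economy."""
    pool = 0.0              # unspent resources carried over between years
    next_cost = first_cost  # cost of the next, harder discovery
    made = 0
    records = []
    for year in range(years):
        budget = first_budget * growth ** year  # exponential resource consumption
        pool += budget
        # Spend on discoveries in order; each one is `growth` times harder than the last.
        while pool >= next_cost:
            pool -= next_cost
            made += 1
            next_cost *= growth
        records.append((budget, made))
    return records


for year, (budget, made) in enumerate(discoveries_per_year(6)):
    print(f"year {year}: budget {budget:>5.0f}, cumulative discoveries {made}")
```

The budget column doubles every year while the discovery count rises by one. The linear output follows from the assumption that difficulty grows at the same rate as resources, which is the interpretation Chollet offers rather than a measured fact.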
And that holds even though science is recursively self-improving, meaning that scientific progress turns into technological progress, which in turn helps science. If you look at computers, for instance, they are products of science, and computers are tremendously useful in speeding up science. The internet, same thing. The internet is a technology that's made possible by very recent scientific advances. And because it enables scientists to network, to communicate, to exchange papers and ideas much faster, it is itself a way to speed up scientific progress. So even though you're looking at a recursively self-improving system, it is consuming exponentially more resources to produce the same amount of problem solving, in fact.
So that's a fascinating way to paint it, and certainly that holds for the deep learning community, right? If you look at the temporal, what did you call it? The temporal density of significant ideas. If you look at deep learning, I think, I'd have to think about that, but if you really look at significant ideas in deep learning, they might even be decreasing.
So I do believe the per-paper significance is decreasing, but the number of papers is still, today, exponentially increasing. So I think if you look at an aggregate, my guess is that you would see a linear progress. If you were to sum the significance of all papers, you would see roughly linear progress. And in my opinion, it is not a coincidence that you're seeing linear progress in science despite exponential resource consumption. I think the resource consumption is dynamically adjusting itself to maintain linear progress, because we as a community expect linear progress, meaning that if we start investing less and seeing less progress, suddenly there are some lower-hanging fruits that become available, and someone's going to step up and pick them. So it's very much like a market for discoveries and ideas.
But there's another fundamental part which you're highlighting, which is a hypothesis: in science, or in the space of ideas, whatever path you travel down, it gets exponentially more difficult to develop new ideas. And your sense is that's going to hold across our mysterious universe.
Well, exponential progress triggers exponential friction, so that if you tweak one part of the system, suddenly some other part becomes a bottleneck. For instance, let's say you develop some device that measures its own acceleration, and then it has some engine, and it outputs even more acceleration in proportion to its own acceleration, and you drop it somewhere. It's not going to reach infinite speed, because it exists in a certain context. So the air around it is going to generate friction, and it's going to block it at some top speed. And even if you were to consider a broader context and lift the bottleneck there, like the bottleneck of friction, then some other part of the system would start stepping in and creating exponential friction, maybe the speed of light or whatever.
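The runaway-device thought experiment can be simulated in a few lines. As a simplifying assumption, the engine's push is modelled as proportional to the current speed (a positive feedback loop that on its own gives exponential growth), and air drag as proportional to the square of the speed; `gain`, `drag`, and the Euler time step are made-up values. Without drag the speed explodes, while with drag it levels off at the terminal speed `gain / drag`.

```python
def simulate_speed(gain=0.5, drag=0.01, v0=1.0, dt=0.01, steps=20_000):
    """Forward-Euler integration of dv/dt = gain * v - drag * v**2."""
    v = v0
    for _ in range(steps):
        acceleration = gain * v - drag * v * v  # self-amplifying push minus air friction
        v += acceleration * dt
    return v


with_air = simulate_speed()        # settles near gain / drag = 50
no_air = simulate_speed(drag=0.0)  # pure positive feedback: grows without bound
print(with_air, no_air)
```

The choice of a quadratic drag term is just the textbook model of air resistance; any friction term that grows faster than the feedback term produces the same qualitative ceiling.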
And this definitely holds true when you look at the problem-solving algorithm that is being run by science as an institution, science as a system. As you make more and more progress, despite having this recursive self-improvement component, you are encountering exponential friction. The more researchers you have working on different ideas, the more overhead you have in terms of communication across researchers. You were mentioning quantum mechanics, right? Well, if you want to start making significant discoveries today, significant progress in quantum mechanics, there is an amount of knowledge you have to ingest which is huge. So there is a very large overhead to even start to contribute, and there is a large amount of overhead to synchronize across researchers, and so on. And of course, the significant practical experiments are going to require exponentially expensive equipment, because the easier ones have already been run, right?
So in your sense, there is no way of escaping this kind of friction with artificial intelligence?
Yeah, no, I think science is a very good way to model what would happen with a superhuman, recursively self-improving AI. That's my intuition. It's not a mathematical proof of anything; that's not my point. I'm not trying to prove anything. I'm just trying to make an argument to question the narrative of intelligence explosion, which is quite a dominant narrative, and you do get a lot of pushback if you go against it. Because for many people, right, AI is not just a subfield of computer science; it's more like a belief system, this belief that the world is headed towards an event, the singularity, past which, you know, AI will go exponential, and the world will be transformed, and humans will become obsolete. And because it is not really a scientific argument but more of a belief system, it is part of the identity of many people. If you go against this narrative, it's like you're attacking the identity of the people who believe in it. It's almost like saying God doesn't exist or something. So you do get a lot of pushback if you try to question these ideas.
First of all, I believe most people, they might not be as eloquent or explicit as you're being, but most people in computer science, or most people who have actually built anything that you could call AI, quote unquote, would agree with you. They might not be describing it in the same kind of way. So the pushback you're getting is from people who get attached to the narrative not from a place of science, but from a place of imagination. So why do you think that's so appealing? Because the usual dreams that people have when you create a superintelligent system past the singularity, what people imagine, is somehow always destructive. If you put on your psychology hat, why is it so appealing to imagine the ways that all of human civilization will be destroyed?
I think it's a good story. You know, it's a good story. And very interestingly, it mirrors religious stories, right, religious mythology. If you look at the mythology of most civilizations, it's about the world being headed towards some final event in which the world will be destroyed and some new world order will arise that will be mostly spiritual, like the apocalypse followed by a paradise, probably. It's a very appealing story on a fundamental level. And we all need stories. We all need stories to structure the way we see the world, especially at timescales that are beyond our ability to make predictions.
So on a more serious, non-exponential-explosion question: do you think there will be a time when we'll create something like human-level intelligence, or intelligent systems that will make you sit back and be just surprised at how damn smart this thing is? That doesn't require exponential growth or an exponential improvement. But what's your sense of the timeline and so on, that you'll be really surprised at certain capabilities? And we'll talk about limitations in deep learning. So do you think in your lifetime you'll be really damn surprised?
Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually. That was before we had assessed exactly what deep learning could do and could not do, and it felt like a time of immense potential. And then we started narrowing it down. But I was very surprised, so I would say it has already happened.
Was there a moment, there must have been a day in there, where your surprise was almost bordering on belief in the narrative that we just discussed? Was there a moment, because you've written quite eloquently about the limits of deep learning, when you thought that maybe deep learning is limitless?
No, I don't think I've ever believed this. What was really shocking is that it worked.
It worked at all, yeah.
But there's a big jump between being able to do really good computer vision and human-level intelligence. So I don't think at any point I was under the impression that the results we got in computer vision meant that we were very close to human-level intelligence. I don't think we're very close to human-level intelligence. I do believe that there's no reason why we won't achieve it at some point. I also believe that the problem with talking about human-level intelligence is that implicitly you're considering an axis of intelligence with different levels, but that's not really how intelligence works. Intelligence is very multidimensional. And so there's the question of capabilities, but there's also the question of being human-like, and those are two very different things. You can potentially build very advanced intelligent agents that are not human-like at all, and you can also build very human-like agents. And these are two very different things, right?
Let's go from the philosophical to the practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember in relation to Keras, and in general, TensorFlow, Theano, the old days? Can you give a brief overview, a Wikipedia-style history, and your role in it, before we return to AGI discussions?
Yeah, that's a broad topic. So I started working on Keras. It wasn't named Keras at the time; I actually picked the name just the day I was going to release it. So I started working on it in February 2015. And at the time, there weren't too many people working on deep learning, maybe fewer than 10,000, and the software tooling was not really developed. So the main deep learning library was Caffe, which was mostly C++.
Why do you say Caffe was the main one?
Caffe was vastly more popular than Theano in late 2014, early 2015. Caffe was the one library that everyone was using for computer vision, and computer vision was the most popular problem. Convnets were the subfield of deep learning that everyone was working on. So myself, in late 2014, I was actually interested in RNNs, in recurrent neural networks, which were a very niche topic at the time, right? They really took off around 2016. And so I was looking for good tools. I had used Torch 7, I had used Theano, I had used Theano a lot in Kaggle competitions. And there was no good solution for RNNs at the time; there was no reusable open source implementation of an LSTM, for instance. So I decided to build my own.
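For readers who have not seen one, here is a minimal sketch of a single LSTM time step in plain Python, using the standard input, forget, and output gate equations. It only illustrates what an LSTM layer computes; it is not Keras's implementation, and the helper names, the gate-dictionary layout, and the small random initialization are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lstm_step(x, h, c, params):
    """One LSTM step. params maps each gate name to (W, U, b):
    input weights, recurrent weights, and bias."""
    pre = {
        gate: [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
        for gate, (W, U, b) in params.items()
    }
    i = [sigmoid(z) for z in pre["input"]]        # how much new information to write
    f = [sigmoid(z) for z in pre["forget"]]       # how much old cell state to keep
    o = [sigmoid(z) for z in pre["output"]]       # how much of the cell to expose
    g = [math.tanh(z) for z in pre["candidate"]]  # candidate values to write
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(input_size, hidden_size, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
        for gate in ("input", "forget", "output", "candidate")
    }


def run_lstm(sequence, hidden_size, params):
    h = [0.0] * hidden_size  # hidden state
    c = [0.0] * hidden_size  # cell state
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c


params = init_params(input_size=3, hidden_size=4)
h, c = run_lstm([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], hidden_size=4, params=params)
```

The forget gate multiplying the previous cell state is what lets information persist across many time steps, which is the property that made LSTMs the recurrent architecture of choice at the time.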
And at first, the pitch for that was that it was going to be mostly around LSTM recurrent neural networks. It was in Python. An important decision at the time, one that was kind of non-obvious, is that the models would be defined via Python code, which was kind of going against the mainstream at the time, because Caffe, Pylearn2, and so on, all the big libraries, were actually going with the approach of static configuration files in YAML to define models.
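To illustrate the difference between the two approaches, here is a toy contrast in plain Python. The layer dictionaries stand in for what a static model configuration file might deserialize to; the spec format and the `build_mlp` helper are hypothetical and are not the actual schema used by Caffe, Pylearn2, or Keras. The point is that a model defined in code can use ordinary loops and parameters, whereas a static file has to spell out every layer.

```python
# What a static configuration file might deserialize to: every layer is written out.
STATIC_CONFIG = [
    {"type": "dense", "units": 64, "activation": "relu"},
    {"type": "dense", "units": 64, "activation": "relu"},
    {"type": "dense", "units": 10, "activation": "softmax"},
]


def build_mlp(depth, width=64, num_classes=10):
    """Define the same kind of model in code: depth and width are just parameters."""
    layers = [{"type": "dense", "units": width, "activation": "relu"} for _ in range(depth)]
    layers.append({"type": "dense", "units": num_classes, "activation": "softmax"})
    return layers


assert build_mlp(depth=2) == STATIC_CONFIG  # same model, generated instead of spelled out
deeper = build_mlp(depth=8)                 # changing the architecture is a one-argument edit
print(len(deeper))
```

In Keras itself this style shows up as models assembled from Python objects such as `keras.Sequential` and `keras.layers.Dense`, so the full language is available while building the model.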
So some libraries were using code to define models, like Torch 7, obviously, but that was not Python. Lasagne was a very early Theano-based library that was developed, I think, I don't remember exactly, probably in late 2014.
It's Python as well.
It's Python as well. It was on top of Theano.
And so I started working on something and the value proposition at the time was that not
link |
only that what I think was the first reusable open source implementation of LSTM, you could
link |
combine RNNs and covenants with the same library, which is not really possible before.
link |
Like Cafe was only doing covenants.
link |
And it was kind of easy to use.
link |
Because so before I was using Theano, I was actually using Psykitlin.
link |
And I loved Psykitlin for its usability.
link |
So I drew a lot of inspiration from Psykitlin when I met Keras.
link |
It's almost like Psykitlin for neural networks.
link |
Like reducing a complex string loop to a single function call.
link |
And of course, some people will say, this is hiding a lot of details, but that's exactly
link |
The magic is the point.
link |
So it's magical, but in a good way, it's magical in the sense that it's delightful.
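A minimal pure-Python sketch of what "reducing a complex training loop to a single function call" looks like from the caller's side. This is not Keras code: the `LinearRegressor` class, its learning rate, and the epoch count are hypothetical choices for illustration. The gradient-descent loop lives inside `fit()`, which returns `self` in the scikit-learn style.

```python
class LinearRegressor:
    """Hypothetical one-feature linear model y = w * x + b, trained by gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.w = 0.0
        self.b = 0.0

    def predict(self, xs):
        return [self.w * x + self.b for x in xs]

    def fit(self, xs, ys, epochs=500):
        # The whole training loop is hidden here; the caller only sees fit().
        n = len(xs)
        for _ in range(epochs):
            preds = self.predict(xs)
            # Gradients of the mean squared error with respect to w and b.
            grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
            grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
            self.w -= self.learning_rate * grad_w
            self.b -= self.learning_rate * grad_b
        return self  # scikit-learn convention: fit() returns the estimator


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
model = LinearRegressor().fit(xs, ys)  # one call instead of a hand-written loop
print(round(model.w, 2), round(model.b, 2))
```

The trade-off mentioned above is visible here: the caller cannot easily change what happens inside each step, which is the "hiding details" criticism, but for the common case one line is enough.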
I'm actually quite surprised.
I didn't know that it was born out of a desire to implement RNNs and LSTMs.
That's fascinating.
So you were actually one of the first people to really attempt to get the major architectures together.
And it's also interesting, I mean, you realize that was a design decision at all, defining the model in code.
Just putting myself in your shoes, whether to use YAML, especially if Caffe was the most popular...
It was the most popular by far.
If I were... yeah, I didn't like the YAML thing, but it makes more sense that you would put the definition of a model in a configuration file.
That's an interesting, gutsy move, to stick with defining it in code.
Just, if you look back, other libraries were doing it as well, but it was definitely the more niche option.
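To make the design choice concrete, here is a hedged sketch contrasting a static, declarative model description (the config-file style described above) with a model assembled in ordinary Python. The `Dense` class, the config schema, and `build_in_code` are all hypothetical; the point is only that code-defined models get loops, functions, and conditionals for free, whereas a config format needs an interpreter for every construct it supports.

```python
class Dense:
    """Hypothetical layer placeholder; only records its width."""

    def __init__(self, units):
        self.units = units

    def __repr__(self):
        return f"Dense({self.units})"


# Style 1: a declarative description, the kind of data a YAML file would hold.
CONFIG = {"layers": [{"type": "Dense", "units": 64},
                     {"type": "Dense", "units": 64},
                     {"type": "Dense", "units": 10}]}

LAYER_TYPES = {"Dense": Dense}


def build_from_config(config):
    # An interpreter must map every config entry to an object.
    return [LAYER_TYPES[spec["type"]](spec["units"]) for spec in config["layers"]]


# Style 2: the model is just Python, so ordinary control flow is available.
def build_in_code(depth=2, width=64, num_classes=10):
    layers = [Dense(width) for _ in range(depth)]
    layers.append(Dense(num_classes))
    return layers


print(build_from_config(CONFIG))
print(build_in_code())
```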
Keras and then Keras.
So I released Keras in March 2015, and it got users pretty much from the start.
The deep learning community was very, very small at the time.
Lots of people were starting to be interested in LSTMs.
So it was released at the right time, because it was offering an easy-to-use LSTM implementation exactly at the time when lots of people started to be intrigued by the capabilities of RNNs.
So it grew from there.
Then I joined Google about six months later, and that was actually completely unrelated to Keras.
I actually joined a research team working on image classification, mostly computer vision.
So I was doing computer vision research at Google initially.
And immediately when I joined Google, I was exposed to the early internal version of TensorFlow.
And the way it appeared to me at the time, and it was definitely the way it was at the time, is that this was an improved version of Theano.
So I immediately knew I had to port Keras to this new TensorFlow thing.
But I was actually very busy as a new Googler, so I had no time to work on that.
But then in November, I think it was November 2015, TensorFlow got released.
And it was kind of my wake-up call: hey, I had to actually go and make it happen.
So in December, I ported Keras to run on TensorFlow, but it was not exactly a port.
It was more like a refactoring where I was abstracting away all the backend functionality into one module, so that the same code base could run on top of multiple backends.
So on top of TensorFlow or Theano.
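The refactoring described above (all engine-specific operations behind one backend module, with layer code written only against that module) can be sketched in plain Python. Both backends here are toy, hypothetical stand-ins, not the real Theano or TensorFlow backends, and `set_backend` is an illustrative name rather than an actual Keras function.

```python
class ListBackend:
    """Toy backend that represents tensors as nested Python lists."""

    def matmul(self, a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

    def relu(self, a):
        return [[max(0.0, v) for v in row] for row in a]


class TupleBackend:
    """A second, independent implementation of the same interface."""

    def matmul(self, a, b):
        cols = tuple(zip(*b))
        return tuple(tuple(sum(x * y for x, y in zip(row, col)) for col in cols) for row in a)

    def relu(self, a):
        return tuple(tuple(v if v > 0 else 0.0 for v in row) for row in a)


_BACKENDS = {"lists": ListBackend(), "tuples": TupleBackend()}
backend = _BACKENDS["lists"]  # the single switch point


def set_backend(name):
    global backend
    backend = _BACKENDS[name]


def dense_relu(inputs, kernel):
    # Layer code only talks to `backend`, never to a specific engine.
    return backend.relu(backend.matmul(inputs, kernel))


x = [[1.0, -2.0]]
w = [[1.0, 0.5], [1.0, -0.5]]
print(dense_relu(x, w))  # list backend
set_backend("tuples")
print(dense_relu(x, w))  # same layer code, different backend
```

Because the layer function never imports an engine directly, adding a third backend means implementing `matmul` and `relu` once, without touching any layer code.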
And for the next year, Theano stayed as the default option: it was easier to use, and it was much faster, especially when it came to RNNs.
But eventually, TensorFlow overtook it.
And the early TensorFlow had similar architectural decisions to Theano, so it was a natural transition.
So, I mean, Keras was still kind of a side project, right?
Yeah, so it was not my job assignment.
I was doing it on the side.
And even though it grew to have a lot of users for a deep learning library at the time, through 2016, I wasn't doing it as my main job.
So things started changing in, I think it must have been maybe October 2016, so one year later.
So Rajat, who was the lead on TensorFlow, basically showed up one day in our building, where I was doing research.
I did a lot of computer vision research, and also collaborations with Christian Szegedy on deep learning for theorem proving; that was a really interesting research topic.
And so Rajat was saying, hey, we saw Keras, we like it, we saw that you're at Google, why don't you come over for a quarter and work with us?
And I was like, yeah, that sounds like a great opportunity, let's do it.
And so I started working on integrating the Keras API into TensorFlow more tightly.
So what followed was a sort of temporary TensorFlow-only version of Keras that was in tf.contrib for a while, and finally moved to TensorFlow core.
And I never actually got back to my old team doing research.
Well, it's kind of funny that somebody like you, who dreams of, or at least sees the power of, AI systems that reason, and theorem proving we'll talk about, has also created a system that makes the most basic kind of Lego building that is deep learning super accessible, super easy, so beautifully so.
It's a funny irony that you're responsible for both things.
So TensorFlow 2.0 is kind of... there's a sprint, I don't know how long it'll take, but there's a sprint towards the finish.
What are you working on these days?
What are you excited about?
What are you excited about in 2.0?
Eager execution; there are so many things that just make it a lot easier to work with.
What are you excited about?
And what's also really hard?
What are the problems you have to kind of solve?
So I've spent the past year and a half working on TensorFlow 2.0, and it's been a long journey.
I'm actually extremely excited about it.
I think it's a great product.
It's a delightful product compared to TensorFlow 1.0.
We've made huge progress.
So on the Keras side, what I'm really excited about is this: previously, Keras has been this very easy-to-use, high-level interface to do deep learning, but if you wanted a lot of flexibility, the Keras framework was probably not the optimal way to do things compared to just writing everything from scratch.
So in some way, the framework was getting in the way.
And in TensorFlow 2.0, you don't have this at all, actually.
You have the usability of the high-level interface, but you have the flexibility of the lower-level interface, and you have this spectrum of workflows where you can trade off usability and flexibility depending on your needs.
You can write everything from scratch, and you get a lot of help doing so by subclassing models and writing your own training loops using eager execution.
It's very flexible.
It's very easy to debug.
It's very powerful.
But all of this integrates seamlessly with higher-level features, up to the classic Keras workflows, which are very scikit-learn-like and ideal for a data scientist or machine learning engineer type of profile.
So now you can have the same framework offering the same set of APIs that enable a spectrum of workflows, more or less low-level, more or less high-level, suitable for profiles ranging from researchers to data scientists and everything in between.
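A pure-Python mimic of the "spectrum of workflows" idea: one subclassed model can be driven either by a hand-written loop, which is flexible and easy to inspect step by step, or by a one-line `fit()`, because `fit()` is built from the same `train_step()`. The `Model` and `Scale` classes, `gradients`, and `train_step` are hypothetical names in this sketch; they are not the tf.keras API, and gradients are written by hand instead of using automatic differentiation.

```python
class Model:
    learning_rate = 0.05

    def __call__(self, x):
        raise NotImplementedError  # subclasses define the forward pass

    def gradients(self, x, y):
        raise NotImplementedError  # subclasses return {param_name: gradient}

    def train_step(self, x, y):
        # One gradient-descent update, shared by both workflows.
        for name, g in self.gradients(x, y).items():
            setattr(self, name, getattr(self, name) - self.learning_rate * g)

    def fit(self, data, epochs=100):
        # High-level workflow: the loop is hidden behind a single call.
        for _ in range(epochs):
            for x, y in data:
                self.train_step(x, y)
        return self


class Scale(Model):
    """Subclassed model y = a * x with a hand-written squared-error gradient."""

    def __init__(self):
        self.a = 0.0

    def __call__(self, x):
        return self.a * x

    def gradients(self, x, y):
        return {"a": 2 * (self(x) - y) * x}


data = [(1.0, 3.0), (2.0, 6.0)]  # generated by y = 3x

# Low-level workflow: write the loop yourself; anything can be inspected between steps.
low = Scale()
for epoch in range(100):
    for x, y in data:
        low.train_step(x, y)

# High-level workflow: same model class, one call.
high = Scale().fit(data, epochs=100)
print(round(low.a, 3), round(high.a, 3))
```

Both paths execute the same updates in the same order, so they end with identical parameters; the choice between them is about how much control the user wants.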
So that's super exciting.
I mean, it's not just that.
It's connected to all kinds of tooling.
You can go on mobile, you can go with TensorFlow Lite, you can go in the cloud, or serving, and so on; it's all connected together.
Some of the best software ever written is often done by one person, sometimes two.
So at Google, you're now seeing Keras having to be integrated into TensorFlow, which I'm sure has a ton of engineers working on it.
So I'm sure there are a lot of tricky design decisions to be made.
How does that process usually happen?
At least from your perspective, what are the debates like?
Is there a lot of thinking, considering different options and so on?
So a lot of the time I spend at Google is actually spent in design discussions: writing design docs, participating in design review meetings, and so on.
This is as important as actually writing code.
So there's a lot of thought and a lot of care taken in coming up with these decisions and taking into account all of our users, because TensorFlow has this extremely diverse user base.
It's not like just one user segment where everyone has the same needs.
We have small-scale production users, large-scale production users.
We have startups, we have researchers; it's all over the place, and we have to cater to all of their needs.
If I just look at the standard debates of C++ or Python, there are some heated debates.
Do you have those at Google?
I mean, they're not heated in terms of emotionally, but there are probably multiple ways to do it, right?
So how do you arrive, through those design meetings, at the best way to do it, especially in deep learning, where the field is evolving as you're doing it?
Is there some magic to it?
Is there some magic to the process?
I don't know if there's magic to the process, but there definitely is a process.
So making design decisions is about satisfying a set of constraints, but also trying to do so in the simplest way possible, because this is what can be maintained, this is what can be expanded in the future.
So you don't want to naively satisfy the constraints by, you know, adding one argument to your API for each capability you need, and so on.
You want to design APIs that are modular and hierarchical, so that they have an API surface that is as small as possible, right?
And you want this modular, hierarchical architecture to reflect the way that domain experts think about the problem, because as a domain expert, when you're reading about a new API, reading a tutorial or some docs pages, you already have a way that you're thinking about the problem.
You already have certain concepts in mind, and you're thinking about how they relate together.
And when you're reading docs, you're trying to build, as quickly as possible, a mapping between the concepts featured in the API and the concepts in your mind; you're trying to map your mental model as a domain expert to the way things work in the API.
So you need an API and an underlying implementation that reflect the way people think about...
So, minimizing the time it takes to do the mapping?
Minimizing the time, the cognitive load there is in ingesting this new knowledge about your API.
An API should not be self-referential or refer to implementation details; it should only refer to domain-specific concepts that people already understand.
So what does the future of Keras and TensorFlow look like?
What does TensorFlow 3.0 look like?
So that's kind of too far in the future for me to answer, especially since I'm not even the one making these decisions.
But from my perspective, which is just one perspective among many different perspectives on the TensorFlow team, I'm really excited by developing even higher-level APIs, higher...
I'm really excited by hyperparameter tuning, by automated machine learning, AutoML.
I think the future is not just defining a model as if you were assembling Lego blocks and then calling fit on it; it's more like an automagical model that would just look at your data and optimize the objective you're after.
So that's what I'm looking into.
So you put the baby into a room with the problem and come back a few hours later with a fully solved problem.
It's not like a box of Legos; it's more like the combination of a kid that's really good at Legos and a box of Legos, just building the thing on the spot.
So that's an exciting future.
I think there's a huge amount of applications and revolutions to be had under the constraints of the discussion we previously had.
But what do you think are the current limits of deep learning?
If we look specifically at these function approximators that try to generalize from data...
You've talked about local versus extreme generalization; you mentioned that neural networks don't generalize well and humans do, so there's this gap.
And you've also mentioned that extreme generalization requires something like reasoning to fill those gaps.
So how can we start trying to build systems like that?
So this is by design, right?
Deep learning models are huge parametric models, differentiable, so continuous, that go from an input space to an output space.
And they're trained with gradient descent, so they're trained pretty much point by point.
They're learning a continuous geometric morphing from an input vector space to an output vector space.
And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things that it has already seen in its training data.
At best, it can do interpolation across points.
But that means in order to train your network, you need a dense sampling of the input-cross-output space, almost a point-by-point sampling, which can be very expensive if you're dealing with complex real-world problems like autonomous driving, for instance, or robotics.
It's doable if you're looking at a subset of the visual space.
But even then, it's still fairly expensive; you still need millions of examples.
And it's only going to be able to make sense of things that are very close to what it has seen before.
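A toy illustration of "at best, it can do interpolation across points". This is deliberately not a neural network: piecewise-linear interpolation over densely sampled points of the assumed target f(x) = x² on [0, 1] stands in for a purely local learner. Inside the sampled range it is close; outside it there is nothing to interpolate between, so it falls back to the nearest seen value (the same clamping `np.interp` does by default), and the answer is badly wrong.

```python
import bisect

train_x = [i / 10 for i in range(11)]  # dense samples 0.0, 0.1, ..., 1.0
train_y = [x * x for x in train_x]


def interpolate(x):
    # Outside the sampled range, clamp to the nearest seen point.
    if x <= train_x[0]:
        return train_y[0]
    if x >= train_x[-1]:
        return train_y[-1]
    i = bisect.bisect_right(train_x, x)
    x0, x1 = train_x[i - 1], train_x[i]
    y0, y1 = train_y[i - 1], train_y[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


for x in (0.55, 3.0):
    print(x, "predicted:", round(interpolate(x), 3), "true:", x * x)
```

The symbolic rule `x * x` needs no samples at all and is exact at 3.0; the sample-based model would need dense data out to 3.0 to get there.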
And in contrast to that, well, of course, you have human intelligence, but even if you're not looking at human intelligence, you can look at very simple rules, algorithms.
If you have a symbolic rule, it can actually apply to a very, very large set of inputs because it is abstract.
It is not obtained by doing a point-by-point mapping, right?
For instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning, point by point, what the sorted representation of this specific list is like.
But instead, you could have a very, very simple sorting algorithm written in a few lines.
Maybe it's just two nested loops.
And it can process any list at all, because it is abstract, because it is a set of rules.
So deep learning is really like point-by-point geometric morphings, trained with gradient descent.
And meanwhile, abstract rules can generalize much better.
And I think the future is really to combine the two.
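A sketch contrasting the two regimes in the sorting example. `memorized_sort` is a deliberate caricature of point-by-point learning (a lookup table over seen inputs; a real network would interpolate rather than look up, but it is likewise tied to its training distribution). `nested_loop_sort` is the few-line rule: a bubble sort, which is literally two nested loops, and it handles any list, including ones never seen before.

```python
training_pairs = {
    (3, 1, 2): [1, 2, 3],
    (2, 1): [1, 2],
    (5, 4, 6): [4, 5, 6],
}


def memorized_sort(xs):
    # Point-by-point mapping: only answers for inputs seen during "training".
    return training_pairs.get(tuple(xs))  # None for unseen lists


def nested_loop_sort(xs):
    xs = list(xs)  # work on a copy
    n = len(xs)
    for i in range(n):
        for j in range(n - 1 - i):  # the largest remaining item bubbles to the end
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


print(memorized_sort([3, 1, 2]), memorized_sort([9, 7, 8, 0]))  # [1, 2, 3] None
print(nested_loop_sort([9, 7, 8, 0]))  # [0, 7, 8, 9]
```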
So how do we, do you think, combine the two?
How do we combine good point-by-point functions with programs, which is what symbolic AI...
At which levels does the combination happen?
I mean, obviously, we're jumping into the realm where there are no good answers.
It's just kind of ideas and intuitions and so on.
Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning.
For instance, successful robotics systems are already mostly model-based, rule-based: things like planning algorithms and so on.
At the same time, they're using deep learning as perception modules.
Sometimes they're using deep learning as a way to inject fuzzy intuition into a rule-based system.
If you look at a system like a self-driving car, it's not just one big end-to-end neural network; that wouldn't work at all, precisely because in order to train that, you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic, obviously.
Instead, the self-driving car is mostly symbolic: it's software, it's programmed by hand.
It's mostly based on explicit models, in this case mostly 3D models of the environment around the car, but it's interfacing with the real world using deep learning modules.
The deep learning there serves as a way to convert the raw sensory information to something usable by symbolic systems.
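A minimal sketch of the hybrid structure described above, under obvious simplifying assumptions. `perceive` is a hypothetical stand-in for a learned perception module (a distance threshold here instead of a trained network) that turns raw sensor readings into symbols, and `plan` is hand-written rule-based logic that only ever sees those symbols. Nothing here reflects how any real self-driving stack is built.

```python
def perceive(ranges_m):
    """Map raw (left, center, right) range readings to symbolic facts."""
    # In a real system this would be a trained network; a threshold stands in for it.
    return {zone: distance < 10.0
            for zone, distance in zip(("left", "center", "right"), ranges_m)}


def plan(obstacles):
    """Explicit, inspectable rules over symbols; no training data needed."""
    if not obstacles["center"]:
        return "keep_lane"
    if not obstacles["left"]:
        return "change_left"
    if not obstacles["right"]:
        return "change_right"
    return "brake"


for reading in ([50.0, 40.0, 60.0], [50.0, 5.0, 60.0], [4.0, 5.0, 3.0]):
    print(reading, "->", plan(perceive(reading)))
```

The split mirrors the argument: only the perception step needs dense data, while the decision logic generalizes to any combination of symbols because it is a set of rules.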
Okay, well, let's linger on that a little more.
So dense sampling from input to output, you said it's obviously very difficult.
In the case of self-driving, you mean?
Let's say self-driving, right?
Self-driving, for many people...
Let's not even talk about self-driving; let's talk about steering, so staying inside the lane.
It's definitely a problem you can solve with an end-to-end deep learning model, but that's like one small subset.
Hold on a second, I don't know how you're jumping from the extreme so easily, because I disagree with you on that.
I think, well, it's not obvious to me that you can solve lane following.
No, it's not obvious; I think it's doable.
I think in general, there are no hard limitations to what you can learn with a deep neural network, as long as the search space is rich enough, flexible enough, and as long as you have this dense sampling of the input-cross-output space; the problem is that this dense sampling could mean anything from 10,000 examples to trillions and trillions.
So that's my question.
So what's your intuition?
If you could just give it a chance and think: what kind of problems can be solved by getting huge amounts of data and thereby creating a dense mapping?
So let's think about natural language dialogue, the Turing test.
Do you think the Turing test can be solved with a neural network alone?
Well, the Turing test is all about tricking people into believing they're talking to a human.
It's actually very difficult, because it's more about exploiting human perception and not so much about intelligence.
There's a big difference between mimicking intelligent behavior and actual intelligent behavior.
So, okay, let's look at maybe the Alexa Prize and so on, the different formulations of natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes.
It's a little less about mimicking and more about, I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that happen in dialogue and so on.
Do you think that problem is learnable with this kind of neural network that does the point-to-point mapping?
So I think it would be very, very challenging to do this with deep learning.
I don't think it's out of the question either.
I wouldn't rule it out.
The space of problems that can be solved with a large neural network:
what's your sense about the space of those problems?
Useful problems for us.
In theory, it's infinite.
You can solve any problem.
In practice, deep learning is a great fit for perception problems; in general, any problem which is not naturally amenable to explicit handcrafted rules, or to rules that you can generate by exhaustive search over some program space.
So perception, artificial intuition, as long as you have a sufficient training dataset.
And that's the question.
I mean, with perception, there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems.
So do you think larger networks will be able to start to understand the physics of the scene, the three-dimensional structure and relationships of objects in the scene, and so on?
Or really, is that where symbolic AI has to step in?
Well, it's always possible to solve these problems with deep learning; it's just extremely inefficient.
An explicit, rule-based, abstract model would be a far better, more compressed representation of physics than learning just this mapping between "in this situation, this thing happens; if you change the situation slightly, then this other thing happens," and so on.
Do you think it's possible to automatically generate the programs that would require that kind of reasoning?
Or does it have to... so where expert systems failed, so many facts about the world had to be hand-coded in.
Do you think it's possible to learn those logical statements that are true about the world and their relationships?
I mean, that's kind of what theorem proving at a basic level is trying to do, right?
Yeah, except it's much harder to formulate statements about the world compared to formulating mathematical statements.
Statements about the world tend to be subjective.
So can you learn rule-based models?
That's the field of program synthesis.
However, today we just don't really know how to do it.
So it's very much a graph search or tree search problem.
And so we are limited to the sort of tree search and graph search algorithms that we have.
Personally, I think genetic algorithms are very promising.
So it's almost like genetic programming.
Genetic programming, exactly.
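A toy genetic-programming loop, offered only as an illustration of discrete search over programs. Candidates are small arithmetic expression trees, scored against input/output examples, and evolved by subtree mutation with elitism (no crossover, to keep it short). The grammar, the hidden target `x*x + 1`, the population size, and the generation count are all hypothetical choices; with elitism the best error can never increase from one generation to the next, but whether the exact target is found depends on the seed.

```python
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
LEAVES = ["x", 1, 2]
EXAMPLES = [(x, x * x + 1) for x in range(-3, 4)]  # hidden target: x*x + 1


def random_tree(rng, depth=3):
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def error(tree):
    return sum(abs(evaluate(tree, x) - y) for x, y in EXAMPLES)


def mutate(rng, tree, depth=3):
    # Replace a random subtree with a freshly generated one (depth stays bounded).
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, depth - 1), right)
    return (op, left, mutate(rng, right, depth - 1))


def evolve(generations=60, pop_size=200, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        if history[-1] == 0:
            break  # found a program consistent with every example
        survivors = population[: pop_size // 5]  # elitism: the best are kept unchanged
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    population.sort(key=error)
    return population[0], history


best, history = evolve()
print(best, "error:", error(best))
```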
Can you discuss the field of program synthesis: how many people are working on it and thinking about it?
Where are we in the history of program synthesis, and what are your hopes for it?
Well, if it were deep learning, this is like the 90s.
Meaning that we already have existing solutions.
We are starting to have some basic understanding of what this is about.
But it's still a field that is in its infancy.
There are very few people working on it.
There are very few real-world applications.
So the one real-world application I'm aware of is Flash Fill in Excel.
It's a way to automatically learn very simple programs to format cells in an Excel spreadsheet from a few examples.
For instance, learning a way to format a date, things like that.
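A toy enumerative program synthesizer in the spirit of, but far simpler than, Flash Fill (whose actual algorithm is not described here). The DSL is hypothetical: a program is a field order plus a separator, applied to an ISO-style `YYYY-MM-DD` input. The synthesizer exhaustively searches that program space and returns the first program consistent with every example.

```python
from itertools import permutations

FIELDS = ("year", "month", "day")
SEPARATORS = ("/", "-", ".", " ")


def parse(iso_date):
    year, month, day = iso_date.split("-")
    return {"year": year, "month": month, "day": day}


def run(program, iso_date):
    order, sep = program
    parts = parse(iso_date)
    return sep.join(parts[f] for f in order)


def synthesize(examples):
    # Exhaustive search: every field order times every separator (6 * 4 programs).
    for order in permutations(FIELDS):
        for sep in SEPARATORS:
            program = (order, sep)
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None  # no program in this DSL explains the examples


program = synthesize([("2019-07-16", "16/07/2019")])  # learned from one example
print(program)
print(run(program, "2015-03-27"))  # applied to a new cell: 27/03/2015
```

With an ambiguous example such as `2019-01-01 -> 01/01/2019`, several programs are consistent and this search simply returns the first one it enumerates, which is why choosing among consistent programs matters in practice.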
Oh, that's fascinating.
You know, okay, that's a fascinating topic.
I was wondering, when I provide a few samples to Excel, what it's able to figure out.
Like, just giving it a few dates, what is it able to figure out from the pattern I just gave it?
That's a fascinating question.
It's fascinating whether those are learnable patterns, and you're saying they're working on it.
How big is the toolbox currently?
Are we completely in the dark?
So if you said the 90s...
In terms of program synthesis?
So I would say maybe the 90s is even too optimistic, because by the 90s, you know, we already understood backprop.
We already understood, you know, the engine of deep learning, even though we couldn't really see its potential quite yet. Today, I don't think we have found the engine of program synthesis.
So we're in the winter before backprop.
So I do believe program synthesis, in general discrete search over rule-based models, is going to be a cornerstone of AI research in the next century, right?
And that doesn't mean we're going to drop deep learning.
Deep learning is immensely useful.
Being able to learn these very flexible, adaptable parametric models, that's actually...
All it's doing is pattern recognition, but being good at pattern recognition, given lots of data, is just extremely powerful.
So we are still going to be working on deep learning, and we're going to be working on program synthesis.
We're going to be combining the two in increasingly automated ways.
So let's talk a little bit about data.
You've tweeted that about 10,000 deep learning papers have been written about hard-coding priors about a specific task into a neural network architecture, which works better than a lack of a prior.
Basically summarizing all these efforts: they put a name to an architecture, but really what they're doing is hard-coding some priors that improve the performance of the system.
But to get straight to the point, it's probably true.
So you say that you can always "buy" performance, buy in quotes, by either training on more data, better data, or by injecting task information into the architecture...
However, this isn't informative about the generalization power of the techniques used, their fundamental ability to generalize.
Do you think we can go far by coming up with better methods for this kind of cheating, better methods of large-scale annotation of data, so building better priors?
If you made it, it's not cheating anymore.
I'm joking about the cheating, but large scale... so basically I'm asking about something that, from my perspective, hasn't been researched too much: exponential improvement in annotation...
You often think about...
I think it's actually been researched quite a bit.
You just don't see publications about it, because people who publish papers are going to publish about known benchmarks; sometimes they're going to release a new benchmark.
People who actually have real-world, large-scale deep learning problems are going to spend a lot of resources on data annotation and good data annotation pipelines, but you don't see any papers about it.
That's interesting.
Do you think there are certain resources... but do you think there's innovation happening?
To clarify the point in the tweet: machine learning in general is the science of generalization.
You want to generate knowledge that can be reused across different datasets, across different tasks.
If instead you're looking at one dataset and then you are hard-coding knowledge about this task into your architecture, this is no more useful than training a network and then saying, oh, I found these weight values perform well.
David Ha, I don't know if you know David, had a paper the other day about weight agnostic neural networks, and this is a very interesting paper because it really illustrates the fact that an architecture, even without weights, is knowledge about a task.
It encodes knowledge.
When it comes to architectures that are handcrafted by researchers, in some cases it is very, very clear that all they are doing is artificially re-encoding the template that corresponds to the proper way to solve the task encoded in a given dataset.
For instance, if you've looked at the bAbI dataset, which is about natural language question answering, it is generated by an algorithm.
These are question-answer pairs that are generated by an algorithm.
The algorithm is following a certain template.
Turns out, if you craft a network that literally encodes this template, you can solve this dataset with nearly 100% accuracy, but that doesn't actually tell you anything about how to solve question answering in general, which is the point.
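An illustration of the bAbI point using a hypothetical generator that is only loosely modeled on the idea (it is not the real bAbI code or data). Stories come from one fixed template, and a "solver" that hard-codes that same template scores 100% on them, yet a sentence phrased outside the template immediately produces a nonsense answer.

```python
import random

PEOPLE = ["Mary", "John", "Sandra", "Daniel"]
PLACES = ["kitchen", "garden", "office", "hallway"]


def generate_story(rng, num_moves=4):
    # Template: "<person> went to the <place>." lines, then "Where is <person>?"
    moves = [(rng.choice(PEOPLE), rng.choice(PLACES)) for _ in range(num_moves)]
    person = rng.choice([p for p, _ in moves])
    answer = [place for p, place in moves if p == person][-1]
    story = [f"{p} went to the {place}." for p, place in moves]
    return story, f"Where is {person}?", answer


def template_solver(story, question):
    # Hard-codes the generator's template: find the last sentence about the person
    # and read off its final word. Phrasing outside the template breaks it.
    person = question.split()[2].rstrip("?")
    for sentence in reversed(story):
        if sentence.startswith(person + " "):
            return sentence.split()[-1].rstrip(".")
    return None


rng = random.Random(0)
dataset = [generate_story(rng) for _ in range(1000)]
accuracy = sum(template_solver(s, q) == a for s, q, a in dataset) / len(dataset)
print(f"accuracy on template data: {accuracy:.0%}")
print(template_solver(["Mary is in the garden now."], "Where is Mary?"))  # "now"
```

The perfect score measures how faithfully the solver copies the generator, not any ability to answer questions in general.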
The question, just to linger on it, whether it's from the data side or from the size of...
I don't know if you've read the blog post by Rich Sutton, "The Bitter Lesson," where he says the biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective.
As opposed to figuring out methods that can generalize effectively, do you think we can get pretty far by just having something that leverages computation and the improvement of...
I think Rich is making a very good point, which is that a lot of these papers, which are actually all about manually hard-coding prior knowledge about a task into some system (it doesn't have to be a deep learning architecture, just some system), right?
These papers are not actually making any impact.
Instead, what's making really long-term impact is very simple, very general systems that are really agnostic to all these tricks, because these tricks do not generalize.
And of course, the one general and simple thing that you should focus on is that which leverages computation, because the availability of large-scale computation has been increasing exponentially, following Moore's law.
So if your algorithm is all about exploiting this, then your algorithm is suddenly exponentially improving.
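As a back-of-the-envelope illustration of why "exponentially" matters in this argument, assume, as a rough reading of Moore's law, that available compute doubles every $T = 2$ years. The growth factor over the 70 years Sutton looks back on is then:

```latex
\text{growth}(t) = 2^{t/T},
\qquad
T = 2~\text{years},\; t = 70~\text{years}
\;\Longrightarrow\;
2^{70/2} = 2^{35} \approx 3.4 \times 10^{10}.
```

The doubling period is an assumption made for the arithmetic, not a figure given in the conversation.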
So I think Rich is definitely right.
However, he's right about the past 70 years; he's assessing the past 70 years.
I am not sure that this assessment will still hold true for the next 70 years.
It might, to some extent; I suspect it will not, because the truth of his assessment is a function of the context in which this research took place, right?
And the context is changing; Moore's law might not be applicable anymore, for instance.
And I do believe that when you tweak one aspect of a system, when you exploit one aspect of a system, some other aspect starts becoming the bottleneck.
Let's say you have unlimited computation; well, then data is the bottleneck.
And I think we are already starting to be in a regime where our systems are so large in scale and so data-hungry that data today, the quality of data and the scale of data, is the bottleneck.
And in this environment, the bitter lesson from Rich is not going to be true anymore, right?
So I think we are going to move from a focus on computation scale to a focus on data efficiency.
So that's getting to the question of symbolic AI.
But to linger on the deep learning approaches, do you have hope for either unsupervised learning or reinforcement learning, which are ways of being more data-efficient in terms of the amount of data they need that requires human annotation?
So unsupervised learning and reinforcement learning are frameworks for learning, but they are not any specific technique.
So usually when people say reinforcement learning, what they really mean is deep reinforcement learning, which is one approach, and which is actually very questionable.
The question I was asking was about unsupervised learning with deep neural networks and deep reinforcement learning.
Well, these are not really data-efficient, because you're still leveraging these huge parametric models, point by point, with gradient descent.
It is more efficient in terms of the number of annotations, the density of annotations you need.
The idea being to learn the latent space around which the data is organized and then map the sparse annotations into it.
And sure, I mean, that's clearly a very good idea.
It's not really a topic I would be working on, but it's clearly a good idea.
link |
So it would get us to solve some problems that...
link |
It will get us to incremental improvements in labeled data efficiency.
Do you have concerns about short-term or long-term threats from AI, from artificial intelligence?

Yes, definitely, to some extent.

And what's the shape of those concerns?

This is actually something I've briefly written about. The capabilities of deep learning technology can be used in many ways that are concerning, from mass surveillance with things like facial recognition to, in general, tracking lots of data about everyone and then being able to make sense of this data to do identification. That's concerning. That's something that's being very aggressively pursued by totalitarian states like China. One thing I am very much concerned about is that our lives are increasingly online, increasingly digital, made of information: made of information consumption and information production, our digital footprint, I would say. And if you absorb all of this data and you are in control of where you consume information, social networks and so on, recommendation engines, then you can build a sort of reinforcement loop for human behavior. You can observe the state of your mind at time t. You can predict how you would react to different pieces of content, how to get you to move your mind in a certain direction, and then you can feed you the specific piece of content that would move you in that direction. And you can do this at scale in terms of doing it continuously, in real time. You can also do it at scale in terms of scaling this to many, many people, to entire populations. So potentially, artificial intelligence, even in its current state, if you combine it with the internet, with the fact that all of our lives are moving to digital devices and digital information consumption and creation, gives you the possibility to achieve mass manipulation of behavior and mass psychological control. And this is a very real possibility.

Yeah, so you're talking about any kind of recommender system. Let's look at the YouTube algorithm, Facebook, anything that recommends content you should watch next. And it's fascinating to think that there are some aspects of human behavior that you can frame as a problem: does this person hold Republican beliefs or Democratic beliefs? And it's trivial: that's an objective function, and you can optimize, and you can measure, and you can turn everybody into a Republican or everybody into a Democrat.

I do believe it's true. So the human mind is very... If you look at the human mind as a kind of computer program, it has a very large exploit... It has many, many vulnerabilities.

Exploit surfaces, yeah.

Ways in which you can control it. For instance, when it comes to your political beliefs, this is very much tied to your identity. So for instance, if I'm in control of your news feed on your favorite social media platforms, this is actually where you're getting your news from. And of course, I can choose to only show you news that will make you see the world in a specific way, right? But I can also create incentives for you to post about some political beliefs. And then when I get you to express a statement, if it's a statement that I, as the controller, want to reinforce, I can just show it to people who will agree, and they will like it. And that will reinforce the statement in your mind. If this is a belief I want you to abandon, I can, on the other hand, show it to opponents, right, who will attack you. And because they attack you, at the very least, next time you will think twice about posting it. But maybe you will even, you know, stop believing this because you got pushback, right? So there are many ways in which social media platforms can potentially control your opinions. And today, all of these things are already being controlled by algorithms. These algorithms do not have any explicit political goal today. Well, potentially they could: if some totalitarian government takes over, you know, social media platforms and decides that now we're going to use this not just for mass surveillance but also for mass opinion control and behavior control, very bad things could happen. But what's really fascinating, and actually quite concerning, is that even without an explicit intent to manipulate, you're already seeing very dangerous dynamics in terms of how these content recommendation algorithms behave. Because right now, the goal, the objective function of these algorithms, is to maximize engagement, right, which seems fairly innocuous at first. However, it is not, because content that will maximally engage people, get people to react in an emotional way, get people to click on something, is very often content that is not healthy for the public discourse. For instance, fake news is far more likely to get you to click on it than real news, simply because it is not constrained to reality. So it can be as outrageous, as surprising, as good a story as you want, because it is artificial, right?
To me, that's an exciting world, because so much good can come. So there's an opportunity to educate people. You can balance people's worldview with other ideas. So there are so many objective functions. The space of objective functions that create better civilizations is large, arguably infinite. But there's also a large space that creates division and destruction, civil war. And the worry is, naturally, that space is probably bigger, first of all. And if we don't explicitly think about what kinds of effects are going to be observed from different objective functions, then we're going to get into trouble. Because the question is, how do we get into rooms and have discussions, inside Google, inside Facebook, inside Twitter, and think about, okay, how can we drive up engagement and at the same time create a good society? Is it even possible to have that kind of philosophical discussion?

I think you can definitely try. So from my perspective, I would feel rather uncomfortable with companies that are in control of these news feed algorithms making explicit decisions to manipulate people's opinions or behaviors, even if the intent is good, because that's a very totalitarian mindset. So instead, what I would like to see, and it's probably never going to happen because it's not super realistic, but it's actually something I really care about: I would like all these algorithms to present configuration settings to their users, so that the users can actually make the decision about how they want to be impacted by these information recommendation, content recommendation algorithms. For instance, as a user of something like YouTube or Twitter, maybe I want to maximize learning about a specific topic. So I want the algorithm to feed my curiosity, which is in itself a very interesting problem. So instead of maximizing my engagement, it will maximize how fast and how much I'm learning, and it will also take into account the accuracy, hopefully, of the information I'm learning. So yeah, users should be able to determine exactly how these algorithms are affecting their lives. I don't actually want any entity making decisions about in which direction they're going to try to manipulate me. I want technology... So AI, these algorithms are increasingly going to be our interface to a world that is increasingly made of information. And I want everyone to be in control of this interface, to interface with the world on their own terms. So if someone wants these algorithms to serve their own personal growth goals, they should be able to configure these algorithms in such a way.
Yeah, but I know it's painful to have explicit decisions, but there are underlying explicit decisions, which touch on some of the most beautiful, fundamental philosophy that we have before us, which is personal growth. If I want to watch videos from which I can learn, what does that mean? So if I have a checkbox that says emphasize learning, there's still an algorithm with explicit decisions in it that would promote learning. What does that mean for me? Like, for example, I've watched a documentary on Flat Earth theory, I guess. It was very... I learned a lot. I'm really glad I watched it. A friend recommended it to me, because I don't have such an allergic reaction to crazy people as my fellow colleagues do. But it was very eye-opening, and for others it might not be. Others might just get turned off by it, and the same goes for Republicans and Democrats. And it's a nontrivial problem. And first of all, if it's done well, I don't think it's something that YouTube or Twitter wouldn't want to promote. It's just a really difficult problem. How do we give people control?

Well, it's mostly an interface design problem. The way I see it, you want to create technology that's like a mentor or a coach or an assistant, so that it's not your boss, right? You are in control of it. You are telling it what to do for you. And if you feel like it's manipulating you, it's not actually doing what you want. You should be able to switch to a different algorithm, you know.

So that fine-tuned control, you kind of learn; you're trusting the human collaboration. I mean, that's how I see autonomous vehicles, too: giving as much information as possible, and you learn that dance yourself. Yeah, Adobe, I don't know if you use Adobe products, like Photoshop. They're trying to see if they can inject YouTube into their interface, basically to show you all these videos, because everybody's confused about what to do. So basically they teach people by linking to videos; in that way, it's an assistant that uses videos as a basic element of information.
Okay, so what, practically, should people do to try to fight against abuses of these algorithms, or algorithms that manipulate us?

Honestly, it's a very, very difficult problem, because to start with, there is very little public awareness of these issues. Very few people would think there's anything wrong with their news feed algorithm, even though there is actually something wrong already, which is that it's trying to maximize engagement most of the time, which has very negative side effects, right? So ideally, the very first thing is to stop trying to purely maximize engagement, to stop trying to propagate content based on popularity, right, and instead take into account the goals and the profiles of each user. One example is, for instance, when I look at topic recommendations on Twitter; you know, they have this news tab with recommendations. That's always the worst garbage, because it's content that appeals to the lowest common denominator of all Twitter users, because they're purely trying to optimize popularity, purely trying to optimize engagement, but that's not what I want. So they should put me in control of some setting so that I define the objective function that Twitter is going to be following to show me this content. And honestly, this is all about interface design. It's not realistic to give users control of a bunch of knobs that define an algorithm; instead, we should put them in charge of defining the objective function. Like, let the user tell us what they want to achieve, how they want this algorithm to impact their lives.
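To make "let the user define the objective function" concrete, here is a minimal Python sketch, assuming a recommender has already estimated a few per-item signals. The `Item` fields, the `rank` helper, and the weights are all hypothetical illustrations, not any platform's API; the point is only that the user supplies what is optimized (which signals matter, and how much) instead of a bunch of algorithm knobs.

```python
from dataclasses import dataclass


# Hypothetical per-item signals; a real system would estimate these with
# learned models. Field names are illustrative only.
@dataclass
class Item:
    title: str
    predicted_engagement: float  # e.g. estimated click probability, 0..1
    predicted_learning: float    # e.g. estimated knowledge gain, 0..1
    estimated_accuracy: float    # e.g. source reliability score, 0..1


def rank(items, objective):
    """Sort items by a user-chosen linear objective.

    `objective` maps signal names to weights, so the user defines *what*
    is optimized rather than tuning the ranking algorithm's internals.
    """
    def score(item):
        return sum(weight * getattr(item, name) for name, weight in objective.items())

    return sorted(items, key=score, reverse=True)


items = [
    Item("Outrage clip", 0.9, 0.1, 0.3),
    Item("Lecture on topology", 0.4, 0.9, 0.95),
    Item("Explainer video", 0.6, 0.6, 0.8),
]

engagement_only = {"predicted_engagement": 1.0}  # today's default
learning_focused = {"predicted_learning": 0.7, "estimated_accuracy": 0.3}  # user's choice

print([i.title for i in rank(items, engagement_only)])
# → ['Outrage clip', 'Explainer video', 'Lecture on topology']
print([i.title for i in rank(items, learning_focused)])
# → ['Lecture on topology', 'Explainer video', 'Outrage clip']
```

The same candidate pool produces opposite orderings depending only on the objective the user picked; a linear weighting is just the simplest choice for illustration.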
So do you think it is that, or do they provide an article-by-article reward structure where you give a signal, "I'm glad I saw this" or "I'm glad I didn't"?

So, like a Spotify-type feedback mechanism. It works to some extent. I'm kind of skeptical about it, because the algorithm will attempt to relate your choices with the choices of everyone else. If you have an average profile, that works fine; I'm sure Spotify recommendations work fine if you just like mainstream stuff. But if you don't, it's not optimal at all, actually.

It'll be an inefficient search for the part of the Spotify world that represents you.

So it's a tough problem, but do note that even a feedback system like what Spotify has does not give me control over what the algorithm is trying to optimize for.

Well, public awareness, which is what we're doing now, is a good place to start. Do you have concerns about long-term existential threats of artificial intelligence?

Well, as I was saying, our world is increasingly made of information, AI algorithms are increasingly going to be our interface to this world of information, and somebody will be in control of these algorithms, and that can put us in all kinds of bad situations, right? There are risks coming from potentially large companies wanting to optimize their own goals, maybe profit, maybe something else, and also from governments who might want to use these algorithms as a means of controlling the entire population.

Do you think there's an existential threat that could arise from that?

So, existential threat... So maybe you're referring to the singularity narrative, where robots...

Well, I don't mean Terminator robots, and I don't believe it has to be a singularity. We're just talking about, just like you said, the algorithm controlling masses of populations, the existential threat being that we hurt ourselves, much like a nuclear war would hurt ourselves, that kind of thing. I don't think that requires a singularity; that requires a loss of control over AI algorithms.

So I do agree there are concerning trends. Honestly, I wouldn't want to make any long-term predictions. I don't think today we really have the capability to see what the dangers of AI are going to be in 50 years, in 100 years. I do see that we are already faced with concrete and present dangers surrounding the negative side effects of content recommendation systems, of news feed algorithms, concerning algorithmic bias. So we are delegating more and more decision processes to algorithms. Some of these algorithms are handcrafted, some are learned from data, but we are delegating control. Sometimes it's a good thing, sometimes not so much. And there is, in general, very little supervision of this process. So we are still in this period of very fast change, even chaos, where society is restructuring itself, turning into an information society, which itself is turning into an increasingly automated information processing society. And well, yeah, I think the best we can do today is try to raise awareness around some of these issues. And I think we are actually making good progress if you look at algorithmic bias, for instance. Three years ago, even two years ago, very, very few people were talking about it. And now all the big companies are talking about it, often not in a very serious way, but at least it is part of the public discourse. You see people in Congress talking about it. And it all started from raising awareness.
So in terms of the alignment problem, trying to teach AI algorithms, even just recommender systems on Twitter, to encode human values and morals, decisions that touch on ethics: how hard do you think that problem is? How do we have loss functions in neural networks that have some component, some fuzzy components...

Well, I think this is really all about objective function engineering, which is probably going to be increasingly a topic of concern in the future. For now, we are just using very naive loss functions, because the hard part is not actually what you're trying to minimize, it's everything else. But as the everything else is going to be increasingly automated, we're going to be focusing our human attention on increasingly high-level components, like what's actually driving the whole learning system, like the objective function. So loss function engineering is going to be... loss function engineer is probably going to be a job title in the future.
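As one small illustration of what "objective function engineering" can mean, here is a minimal sketch in plain Python, assuming a hypothetical business rule that missing a positive case (say, a fraudulent transaction) costs five times as much as a false alarm. The function name and cost values are assumptions. In Keras, a callable of the form `loss(y_true, y_pred)` written with tensor ops can be passed as `loss=` to `model.compile`; this version uses plain floats so it runs without any framework.

```python
import math


def cost_weighted_bce(y_true, y_pred, fn_cost=5.0, fp_cost=1.0, eps=1e-7):
    """Binary cross-entropy where each error type carries a business cost.

    fn_cost scales the penalty on positive examples (missed positives are
    expensive); fp_cost scales the penalty on negatives (false alarms).
    Both default values are hypothetical.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(fn_cost * t * math.log(p) + fp_cost * (1 - t) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
predictions = [0.2, 0.3, 0.9, 0.1]  # the first positive is badly missed
print(cost_weighted_bce(labels, predictions))            # business-weighted loss
print(cost_weighted_bce(labels, predictions, 1.0, 1.0))  # ordinary binary cross-entropy
```

With both costs set to 1.0 this reduces to ordinary binary cross-entropy; raising `fn_cost` makes the missed positive dominate the loss, which is how a stated business priority ends up steering training.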
And then the tooling you're creating with Keras essentially takes care of all the details underneath, and basically the human expert is needed for exactly that. Keras is the interface between the data you're collecting and the business goals.

And your job as an engineer is going to be to express your business goals and your understanding of your business, or your product, your system, as a kind of loss function or a kind of set...

Does the possibility of creating an AGI system excite you or scare you or bore you?

So intelligence can never really be general. You know, at best it can have some degree of generality, like human intelligence. And it also always has some specialization, in the same way that human intelligence is specialized in a certain category of problems, is specialized in the human experience. And when people talk about AGI, I'm never quite sure if they're talking about very, very smart AI, so smart that it's even smarter than humans, or about human-like intelligence, because these are different things.

Let's say, presumably, I'm impressing you today with my humanness. So imagine that I was in fact a robot. So what does that mean? I'm impressing you with natural language processing. Maybe if you weren't able to see me, maybe this is a phone call. That kind of system.

So that's very much about building human-like AI. And you're asking me, you know, is this an exciting perspective? Not so much because of what artificial human-like intelligence could do, but from an intellectual perspective: I think if you could build truly human-like intelligence, that means you could actually understand human intelligence, which is fascinating, right? Human-like intelligence is going to require emotions, it's going to require consciousness, which are not things that would normally be required by an intelligent system. If you look at, you know, science, which we were mentioning earlier as a superhuman problem-solving agent or system, it does not have consciousness, it doesn't have emotions. In general, I see consciousness as being on the same spectrum as emotions. It is a component of the subjective experience that is meant very much to guide behavior generation, right? It's meant to guide your behavior. In general, human intelligence and animal intelligence have evolved for the purpose of behavior generation, including in a social context. So that's why we actually need emotions. That's why we need consciousness. An artificial intelligence system developed in a different context may well never need them, may well never be conscious, like science.

But on that point, I would argue it's possible to imagine that there are echoes of consciousness in science when viewed as an organism, that science is conscious.

So, I mean, how would you go about testing this hypothesis? How do you probe the subjective experience of an abstract system like science?

Well, probing any subjective experience is impossible, because I'm not science. So I can't probe another entity's, any more than I can probe that of the bacteria on my skin.

You, Lex, I can ask questions about your subjective experience, and you can answer me. And that's how I know you're conscious.

Yes, but that's because we speak the same language. Perhaps we have to speak the language of science, and we have to ask it.

Honestly, I think consciousness, just like emotions of pain and pleasure, is not something that inevitably arises from any sort of sufficiently intelligent information processing. It is a feature of the mind, and if you've not implemented it explicitly, it is not there.

So you think it's an emergent feature of a particular architecture?

It's a feature in the same sense. So again, the subjective experience is all about guiding behavior. If the problems you're trying to solve don't really involve embodied agents, maybe in a social context, generating behavior and pursuing goals like this... And if you look at science, that's not really what's happening, even though it is a form of artificial intelligence, in the sense that it is solving problems, it is accumulating knowledge, accumulating solutions, and so on. So if you're not explicitly implementing a subjective experience, implementing certain emotions and implementing consciousness, it's not going to just spontaneously emerge.

But for a human-like intelligent system that has consciousness, do you think it needs to have a body?

I mean, it doesn't have to be a physical body, right? And there's not that much difference between a realistic simulation and the real world. So there has to be something you have to preserve, kind of thing. But human-like intelligence can only arise in a human-like context. Intelligence needs to be situated. You need other humans in order for you to demonstrate that you have human-like intelligence, essentially.
So what kind of tests and demonstrations would be sufficient for you to demonstrate human-like intelligence? And just out of curiosity, you talked about theorem proving and program synthesis; I think you've written that there are no good benchmarks for this. That's one of the problems. So let's talk programs, program synthesis. What do you imagine is a good... I think these are related questions, for human-like intelligence and for program synthesis. What's a good benchmark for either or both?

So, I mean, you're actually asking two questions. One is about quantifying intelligence and comparing the intelligence of an artificial system to the intelligence of a human. And the other is about the degree to which this intelligence is human-like. It's actually two different questions. So, you mentioned earlier the Turing test. Well, I actually don't like the Turing test, because it's very lazy. It's all about completely bypassing the problem of defining and measuring intelligence, and instead delegating to a human judge or a panel of human judges. So it's a total cop-out, right? If you want to measure how human-like an agent is, I think you have to make it interact with other humans. Maybe it's not necessarily a good idea to have these other humans be the judges. Maybe you should just observe behavior and compare it to what a human would actually have done. When it comes to measuring how smart, how clever an agent is, and comparing that to the degree of human intelligence... So we're already talking about two things, right? The degree, kind of like the magnitude, of an intelligence, and its direction, right? Like the norm of a vector and its direction. And the direction is like human-likeness, and the magnitude, the norm, is intelligence. You could call it intelligence, right?

So the direction, in your sense: the space of directions that are human-like is very narrow. So how would you measure the magnitude of intelligence in a system, in a way that also enables you to compare it to that of a human?

Well, if you look at different benchmarks for intelligence today, they're all too focused on skill at a given task. That's skill at playing chess, skill at playing Go, skill at playing Dota. And I think that's not the right way to go about it, because you can always beat a human at one specific task. The reason why our skill at playing Go or at juggling or anything is impressive is because we are expressing this skill within a certain set of constraints. If you remove the constraints, the constraints that we have one lifetime, that we have this body and so on, if you remove the context, if you have unlimited training data, if you can have access to, you know, for instance, if you look at juggling, if you have no restriction on the hardware, then achieving arbitrary levels of skill is not very interesting and says nothing about the amount of intelligence you've achieved. So if you want to measure intelligence, you need to rigorously define what intelligence is, which in itself, you know, is a very challenging problem.

And do you think that's possible? To define intelligence?

I mean, you can provide... many people have provided, you know, some definition. I have my own definition.

Where does your definition begin, if it doesn't end?

Well, I think intelligence is essentially the efficiency with which you turn experience into generalizable programs. So what that means is it's the efficiency with which you turn a sampling of experience space into the ability to process a larger chunk of experience space. So measuring skill at many, many different tasks can be one proxy for measuring intelligence. But if skill is all you measure, you should control for two things: the amount of experience that your system has and the priors that your system has. If you look at two agents and you give them the same priors and the same amount of experience, one of the agents is going to learn programs, representations, something, a model that will perform well on a larger chunk of experience space than the other. And that is the smarter agent.
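A toy Python sketch can make "same experience, more generalizable program" concrete. Everything here is an illustrative assumption, not the benchmark Chollet describes: the hidden rule, the two agents, and the scoring function are invented for the example. Both agents see the same five samples; the score is the fraction of a wider range of inputs each one handles correctly. Note that the toy controls only for experience: the line fitter brings a stronger built-in prior (that the data is linear), which is exactly the kind of prior a fair comparison would also need to account for.

```python
def hidden_rule(x):
    return 2 * x + 1  # the regularity the agents never see directly


experience = [(x, hidden_rule(x)) for x in range(5)]  # identical for both agents
wider_space = range(100)  # a larger chunk of "experience space"


class Memorizer:
    """Stores exactly what it has seen; cannot answer anything else."""

    def fit(self, samples):
        self.table = dict(samples)
        return self

    def predict(self, x):
        return self.table.get(x)  # None for unseen inputs


class LineFitter:
    """Fits y = a*x + b by ordinary least squares: a tiny 'program'."""

    def fit(self, samples):
        n = len(samples)
        mean_x = sum(x for x, _ in samples) / n
        mean_y = sum(y for _, y in samples) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
        var = sum((x - mean_x) ** 2 for x, _ in samples)
        self.a = cov / var
        self.b = mean_y - self.a * mean_x
        return self

    def predict(self, x):
        return self.a * x + self.b


def generalization(agent):
    """Fraction of the wider space the trained agent gets right."""
    correct = 0
    for x in wider_space:
        guess = agent.predict(x)
        if guess is not None and abs(guess - hidden_rule(x)) < 1e-9:
            correct += 1
    return correct / len(wider_space)


for agent in (Memorizer(), LineFitter()):
    print(type(agent).__name__, generalization(agent.fit(experience)))
# → Memorizer 0.05
# → LineFitter 1.0
```

Under the definition above, the line fitter is the "smarter" of the two on this toy task because it turned identical experience into a program that covers far more of the input space.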
link |
So if you fix the experience, which generate better programs, better meaning, more generalizable,
link |
that's really interesting.
link |
That's a very nice, clean definition of...
link |
By the way, in this definition, it is already very obvious that intelligence has to be specialized
link |
because you're talking about experience space and you're talking about segments of experience
link |
You're talking about priors and you're talking about experience, all of these things define
link |
the context in which intelligence emerges.
link |
And you can never look at the totality of experience space.
link |
So intelligence has to be specialized.
link |
But it can be sufficiently large, the experience space, even though specialized is a certain
link |
point when the experience space is large enough to where it might as well be general.
link |
I mean, it's very relative.
link |
For instance, many people would say human intelligence is general.
link |
In fact, it is quite specialized.
link |
We can definitely build systems that start from the same innate priors as what humans
link |
have at birth because we already understand fairly well what sort of priors we have as
link |
Like many people have worked on this problem, most notably, Elzebeth Spelke from Harvard,
link |
and if you know her, she's worked a lot on what she calls a core knowledge.
link |
And it is very much about trying to determine and describe what priors we are born with.
link |
Like language skills and so on and all that kind of stuff.
link |
So we have some pretty good understanding of what priors we are born with.
link |
So I've actually been working on a benchmark for the past couple of years, on and off.
link |
I hope to be able to release it at some point.
link |
The idea is to measure the intelligence of systems by considering for priors, considering
link |
for amount of experience, and by assuming the same priors as what humans are born with
link |
so that you can actually compare these scores to human intelligence and you can actually
link |
have humans pass the same test in a way that's fair.
link |
And so importantly, such a benchmark should be such that any amount of practicing does
link |
not increase your score.
link |
So try to picture a game where no matter how much you play this game, it does not change
link |
your skill at the game.
link |
Can you picture that?
link |
As a person who deeply appreciates practice, I cannot actually...
link |
There's actually a very simple trick.
link |
So in order to come up with a task, so the only thing you can measure is skill at a task.
link |
All tasks are going to involve priors.
link |
The trick is to know what they are and to describe that.
link |
And then you make sure that this is the same set of priors as what humans start with.
link |
So you create a task that assumes these priors, that exactly documents these priors, so that
link |
the priors are made explicit and there are no other priors involved.
link |
And then you generate a certain number of samples in experience space for this task.
link |
And this, for one task, assuming that the task is new for the agent passing it, that's
link |
one test of this definition of intelligence that we set up.
link |
And now you can scale that to many different tasks, that each task should be new to the
link |
And also should be human interpretable and understandable, so that you can actually have
link |
a human pass the same test and then you can compare the score of your machine and the score
link |
Which could be a lot.
link |
It could even start a task like MNIST, just as long as you start with the same set of
link |
Yeah, so the problem with MNIST, humans are already trained to recognize digits.
link |
But let's say we're considering objects that are not digits, some complete arbitrary patterns.
link |
Well, humans already come with visual priors about how to process that.
link |
So in order to make the game fair, you would have to isolate these priors and describe
link |
them and then express them as computational rules.
Having worked a lot with vision science people, I know that's exceptionally difficult: basically reducing all of human vision to some good priors. A lot of progress has been made, and there have been a lot of good tests, but we're still probably far away from doing that perfectly. As a start for a benchmark, though, that's an exciting possibility.
Yeah, so Elizabeth Spelke actually lists objectness as one of the core knowledge priors. We have priors about objectness, about the visual space, about time, about agents, about goal-oriented behavior. We have many different priors, but what's interesting is that, sure, we have this pretty diverse and rich set of priors, but it's also not that diverse, right? We are not born into this world with a ton of knowledge about the world. There is only a small set of core knowledge.
Do you have a sense of how... To us humans it feels like that set is not that large, but take even just the nature of time, which we integrate pretty effectively through all of our perception and all of our reasoning. Do you have a sense of how easy it is to encode those priors? Maybe it requires building a universe, and then the human brain, in order to encode those priors. Or do you have a hope that they can be listed axiomatically?
You have to keep in mind that any knowledge about the world that we are born with is something that must have been encoded into our DNA by evolution at some point. And DNA is a very, very low-bandwidth medium. It is extremely slow and expensive to encode anything into DNA, because first of all, you need some sort of evolutionary pressure to guide this writing process. And then, the higher-level the information you're trying to write, the longer it's going to take, and the thing in the environment that you're trying to encode knowledge about has to be stable over this whole duration. So you can only encode into DNA things that constitute an evolutionary advantage, which is actually a very small subset of all possible knowledge about the world. You can only encode things that are stable, that hold true over very, very long periods of time, typically millions of years. For instance, we might have some visual prior about the shape of snakes, right? Or about what makes a face: what's the difference between a face and a non-face? But consider this interesting question. Do we have any innate sense of the visual difference between a male face and a female face? What do you think? For a human, I mean.
I would have to look back into evolutionary history to see when the genders emerged. But yeah, I mean, the faces of humans are quite different from the faces of great apes, right?
That's interesting. You probably couldn't tell the face of a female chimpanzee from the face of a male chimpanzee. And I don't think most humans have that ability. We do have innate knowledge of what makes a face, but it's actually impossible for us to have any DNA-encoded knowledge of the difference between a female human face and a male human face, because that knowledge, that information, came into the world very recently, given how slow the process of encoding knowledge into DNA is.
So that's interesting. That's a really powerful argument: DNA is low bandwidth and takes a long time to encode into, which naturally creates a very efficient encoding.
But one important consequence of this is that, yes, we are born into this world with a bunch of knowledge, sometimes very high-level knowledge about the world, like the rough shape of a snake or the rough shape of a face. But importantly, because this knowledge takes so long to write, almost all of this innate knowledge is shared with our cousins, the great apes. So it is not actually this innate knowledge that makes us special.
But to throw it right back at you from earlier in our discussion, that encoding might also include the entirety of the environment of Earth.
To some extent. It can include things that are important to survival and reproduction, for which there is some evolutionary pressure, and things that are stable, constant over very, very long time periods. And honestly, it's not that much information. Besides the bandwidth constraints and the constraints of the writing process, there are also memory constraints: the part of DNA that deals with the human brain is actually fairly small, on the order of megabytes. There's not that much high-level knowledge about the world you can encode in that.
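The storage side of this argument can be made concrete with a back-of-the-envelope calculation. Each position in DNA holds one of four bases, so it carries at most log2(4) = 2 bits. The sketch below assumes a haploid human genome of roughly 3.1 billion base pairs (a rounded figure) and computes an upper bound for the whole genome; the "order of megabytes" figure in the conversation refers to the much smaller brain-related portion, whose size is not computed here.

```python
import math

# Four possible bases (A, C, G, T) per position -> log2(4) = 2 bits at most.
BITS_PER_BASE = math.log2(4)


def max_bytes(base_pairs: int) -> float:
    """Upper bound on the information a stretch of DNA can store, in bytes."""
    return base_pairs * BITS_PER_BASE / 8


# Approximate size of the haploid human genome (an assumed round figure).
GENOME_BASE_PAIRS = 3_100_000_000

genome_mb = max_bytes(GENOME_BASE_PAIRS) / 1e6
print(f"whole genome: at most ~{genome_mb:.0f} MB")  # whole genome: at most ~775 MB
```

This is only a ceiling on raw storage capacity; it says nothing about how much of that capacity actually encodes priors about the world.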
That's quite brilliant, and hopeful for the benchmark you're referring to, of encoding priors. I actually look forward to it. I'm skeptical that you can do it in the next couple of years...
I've been working on it. Honestly, it's a very simple benchmark, and it's not like a big breakthrough or anything. It's more like a fun side project, right?
These fun side projects could launch entire groups of efforts towards creating reasoning systems and so on.
Yeah, that's the goal. It's trying to measure strong generalization, to measure the strength of abstraction in our minds, well, in our minds and in artificially intelligent agents.
And if there's anything true about this scientific organism, it's that its individual cells love competition. So benchmarks encourage competition, and that's an exciting possibility. Do you think an AI winter is coming, and how do we prevent it?
An AI winter is something that would occur when there's a big mismatch between how we are selling the capabilities of AI and the actual capabilities of AI. Today, deep learning is creating a lot of value, and it will keep creating a lot of value, in the sense that these models are applicable to a very wide range of problems that are... And we are only just getting started with applying these algorithms to every problem they... So deep learning will keep creating a lot of value for the time being. What's concerning, however, is that there's a lot of hype around deep learning and around AI. A lot of people are overselling the capabilities of these systems, and not just the capabilities: they are also overselling the idea that these systems might be more or less brain-like, giving a kind of mystical aspect to these technologies, and they are overselling the pace of progress, which might look fast in the sense that we have this exponentially increasing number of papers. But again, that's just a simple consequence of the fact that we have ever more people coming into the field. It doesn't mean progress is actually exponentially fast.
Let's say you're trying to raise money for your startup or your research lab. You might want to tell a grandiose story to investors about how deep learning is just like the brain and how it can solve all these incredible problems, like self-driving and robotics, and maybe you tell them that the field is progressing so fast that we're going to have AGI within 15 years or even 10 years. None of this is true. And every time you say these things and an investor or a decision maker believes them, it's the equivalent of taking on credit card debt, but for trust. Maybe this will be what enables you to raise a lot of money, but ultimately you are creating damage: you are damaging the field.
That's the concern: that debt. That's what happened with the previous AI winters. You actually tweeted about this with autonomous vehicles, right? Almost every single company has now promised that they will have fully autonomous vehicles by 2021 or 2022.
That's a good example of the consequences of overhyping the capabilities of AI and the pace of progress.
Because I've worked a lot in this area, especially recently, I have a deep concern about what happens when all of these companies, after having invested billions, have a meeting and ask, first of all, do we have an autonomous vehicle? The answer will definitely be no. And the second question will be: wait a minute, we've invested one, two, three, four billion dollars into this and made no profit. And the reaction to that may be to go very hard in other directions, which might impact even other industries.
And that's what we call an AI winter: a backlash where no one believes any of these promises anymore, because they turned out to be big lies the first time around. This will definitely happen to some extent for autonomous vehicles, because around 2015 the public and decision makers were convinced, by people trying to raise money for their startups and so on, that L5 driving was coming in maybe 2016, maybe 2017, maybe 2018. Now, in 2019, we're still waiting for it. So I don't believe we are going to have a full-on AI winter, because we have these technologies that are producing a tremendous amount of real value, but there is also too much hype. So there will be some backlash, especially there will be backlash... Some startups are trying to sell the dream of AGI, and the idea that AGI is going to create infinite value. AGI is like a free lunch: if you can develop an AI system that passes a certain threshold of IQ or something, then suddenly you have infinite value. And there are actually lots of investors buying into this idea. They will wait maybe 10 or 15 years, and nothing will happen. And the next time around, well, maybe there will be a new generation of investors, no... Human memory is very short, after all.
I don't know about you, but because I've spoken about AGI, sometimes poetically, I get a lot of emails from people, usually large manifestos, saying that they have created an AGI system or that they know how to do it, with a long write-up of how to do it. I get a lot of these emails. They feel a little bit like they were generated by an AI system, actually, but there's usually...
Maybe that's recursively self-improving AI: you have a transformer generating crank papers about AGI.
So the question is, because you have such a good radar for crank papers: how do we know they're not onto something? When you start to talk about AGI, or anything like the reasoning benchmarks and so on, something that doesn't have a benchmark, it's really difficult to know. I mean, I talked to Jeff Hawkins, who's really looking at neuroscience approaches to how... and there are echoes of really interesting ideas, at least in Jeff's case, which he's showing. How do you usually think about this? Like, how do you prevent yourself from being too narrow-minded and elitist about deep learning, as in: it has to work on these particular benchmarks, otherwise it's trash?
Well, the thing is, intelligence does not exist in the abstract. Intelligence has to be applied. So if you don't have a benchmark, if you don't have an improvement on some benchmark... maybe it's a new benchmark. Maybe it's not something we've been looking at before, but you do need a problem that you're trying to solve. You're not going to come up with a solution without a problem.
So, general intelligence. I mean, you've clearly highlighted generalization. If you want to claim that you have an intelligent system, it should come with a benchmark.
It should, yes. It should display capabilities of some kind. It should show that it can create some form of value, even if it's a very artificial form of value. And that's also the reason why you don't actually need to care about telling which papers have some hidden potential and which do not. Because if there is a new technique that's actually creating value, this is going to be brought to light very quickly, because it's actually making a difference. It's the difference between something that is ineffectual and something that is actually useful. And ultimately, usefulness is our guide, not just in this field, but in science in general. Maybe there have been many, many people over the years who had some really interesting theories of everything, but they were just completely useless. And you don't actually need to tell the interesting theories from the useless ones. All you need is to ask: is this actually having an effect on something else? Is this actually useful? Is this making an impact or not?
That's beautifully put. I mean, the same applies to quantum mechanics, to string theory, to the holographic principle.
We are doing deep learning because it works. Before it started working, people considered those working on neural networks to be cranks. No one was working on this anymore. And now it's working, which is what makes it valuable. It's not about being right, it's about being effective.
And nevertheless, the individual entities of this scientific mechanism, like Yoshua Bengio or Yann LeCun, stuck with it while being called cranks, right? And so we individual agents, even if everyone's laughing at us, should stick with it.
If you believe you have something, you should stick with it and see it through.
That's a beautiful, inspirational message to end on. Francois, thank you so much for talking today.