Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11

The following is a conversation with Jürgen Schmidhuber. He's the co-director of the Swiss AI Lab IDSIA and a co-creator of long short-term memory networks. LSTMs are used in billions of devices today for speech recognition, translation, and much more. Over 30 years, he has proposed a lot of interesting, out-of-the-box ideas on meta-learning, adversarial networks, computer vision, and even a formal theory of, quote, creativity, curiosity, and fun.

This conversation is part of the MIT course on artificial general intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Jürgen Schmidhuber.
Early on, you dreamed of AI systems that self-improve recursively. When was that dream born?

When I was a baby. No, that's not true. When I was a teenager.

And what was the catalyst for that birth? What was the thing that first inspired you?

When I was a boy, I was thinking about what to do in my life, and then I thought the most exciting thing is to solve the riddles of the universe, and that means you have to become a physicist. However, then I realized that there's something even grander: you can try to build a machine that isn't really a machine any longer, one that learns to become a much better physicist than I could ever hope to be. And that's how I thought maybe I can multiply my tiny little bit of creativity into infinity.

But ultimately that creativity will be multiplied to understand the universe around us. That's the curiosity for that mystery that drove you.

Yes. So if you can build a machine that learns to solve more and more complex problems and to become a more and more general problem solver, then you basically have solved all the problems, at least all the solvable problems.

So what does the mechanism for that kind of general solver look like? Obviously we don't quite yet have one, or know how to build one, but we have ideas, and you have had several ideas about it throughout your career. So how do you think about that mechanism?
So in the '80s, I thought about how to build this machine that learns to solve all these problems that I cannot solve myself. And I thought it is clear that it has to be a machine that not only learns to solve this problem here and this problem here, but also has to learn to improve the learning algorithm itself. So it has to have the learning algorithm in a representation that allows it to inspect it and modify it, such that it can come up with a better learning algorithm. So I call that meta-learning, learning to learn; and recursive self-improvement, that is really the pinnacle of that, where you not only learn how to improve on this problem and on that one, but you also improve the way the machine improves, and you also improve the way it improves the way it improves itself. And that was my 1987 diploma thesis, which was all about that hierarchy of meta-learners that have no computational limits except for the well-known limits that Gödel identified in 1931 and for the limits of physics.
In recent years, meta-learning has gained popularity in a specific kind of form. You've talked about how that's not really meta-learning with neural networks, that's more basic transfer learning. Can you talk about the difference between the big, general meta-learning and the more narrow sense of meta-learning, the way it's used today, the way it's talked about today?

Let's take the example of a deep neural network that has learned to classify images, and maybe you have trained that network on 100 different databases of images. And now a new database comes along, and you want to quickly learn the new thing as well. So one simple way of doing that is: you take the network, which already knows 100 types of databases, and then you just take the top layer of that and retrain it using the new labeled data that you have in the new image database. And then it turns out that it really, really quickly can learn that too, one-shot basically, because from the first 100 data sets it has already learned so much about computer vision that it can reuse that, and that is then almost good enough to solve the new task, except you need a little bit of adjustment on the top. So that is transfer learning, and it has been done in principle for many decades; people have done similar things for decades.
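The retraining recipe described above (keep the pretrained lower layers fixed, retrain only the top layer on the new labels) can be sketched in plain Python; the feature function and the toy dataset below are illustrative inventions, not anything from the conversation:

```python
# Transfer-learning sketch: a frozen "pretrained" feature extractor plus
# a freshly trained linear top layer. Only the top-layer weights change.

def features(x):
    # Stand-in for the frozen lower layers trained on the first 100 datasets.
    return [x, x * x, 1.0]

def train_top_layer(data, epochs=200, lr=0.05):
    w = [0.0, 0.0, 0.0]  # top-layer weights, the only trainable part
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            pred = sum(wi * fi for wi, fi in zip(w, f))
            err = pred - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# "New database": a simple labeled task, y = 2x + 1.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w = train_top_layer(data)
```

Because the features are already useful for the new task, a few passes of gradient descent on the top layer alone are enough to fit it.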
Meta-learning, though, is about having the learning algorithm itself open to introspection by the system that is using it, and also open to modification, such that the learning system has an opportunity to modify any part of the learning algorithm, and then evaluate the consequences of that modification, and then learn from that to create a better learning algorithm, and so on, recursively. So that's a very different animal, where you are opening the space of possible learning algorithms to the learning system itself.
Right. So, like in the 2004 paper, you described Gödel machines, programs that rewrite themselves, right? Philosophically, and even in your paper mathematically, these are really compelling ideas. But practically, do you see these self-referential programs being successful in the near term, having an impact where they sort of demonstrate to the world that this direction is a good one to pursue?

Yes. We had these two different types of fundamental research on how to build a universal problem solver: one basically exploiting proof search and things like that, which you need to come up with asymptotically optimal, theoretically optimal self-improvers and problem solvers. However, one has to admit that through this proof search comes in an additive constant, an overhead, an additive overhead that vanishes in comparison to what you have to do to solve large problems. However, for many of the small problems that we want to solve in our everyday life, we cannot ignore this constant overhead, and that's why we have also been doing other things, non-universal things, such as recurrent neural networks, which are trained by gradient descent and local search techniques, which aren't universal at all, which aren't provably optimal at all, unlike the other stuff that we did, but which are much more practical, as long as we only want to solve the small problems that we are typically trying to solve in this environment here.
So the universal problem solvers, like the Gödel machine, but also Marcus Hutter's fastest way of solving all possible problems, which he developed around 2002 in my lab, are associated with these constant overheads for proof search, which guarantees that the thing that you're doing is optimal.

For example, there is this fastest way of solving all problems with a computable solution, which is due to Marcus Hutter. To explain what's going on there, let's take traveling salesman problems. With traveling salesman problems, you have a number of cities, and you try to find the shortest path through all these cities without visiting any city twice. Nobody knows the fastest way of solving traveling salesman problems, TSPs, but let's assume there is a method of solving them within N to the 5 operations, where N is the number of cities. Then the universal method of Marcus is going to solve the same traveling salesman problem also within N to the 5 steps, plus O of 1, plus a constant number of steps that you need for the proof searcher, which you need to show that this particular class of problems, the traveling salesman problems, can be solved within a certain time bound, within order N to the 5 steps, basically. And this additive constant doesn't depend on N, which means that as N is getting larger and larger, as you have more and more cities, the constant overhead pales in comparison. And that means that almost all large problems are solved in the best possible way. Already today, we have a universal problem solver like that. However, it's not practical, because the constant overhead is so large that it matters for the small kinds of problems that we want to solve in this little biosphere.
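The asymptotic claim is easy to check numerically. In this toy sketch (the constant is an arbitrary number of my choosing, just to make the point), the proof-search overhead dominates for small N and vanishes relative to N to the 5 as N grows:

```python
# Toy model of the bound: total time = N**5 + C, where C is the huge,
# N-independent proof-search overhead. C's value here is arbitrary.
C = 10**15

def total_steps(n: int) -> int:
    return n**5 + C

# Fraction of the work that is pure overhead, for growing problem sizes.
fractions = {n: C / total_steps(n) for n in (10, 100, 1000, 10**4)}
```

For 10 cities essentially all the time is overhead; for 10,000 cities the overhead is a vanishing fraction, which is the sense in which almost all large problems are solved in the best possible way.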
By the way, when you say small, you're talking about things that fall within the constraints of our computational systems. So they can seem quite large to us mere humans, right?

That's right, yeah. So they seem large, and even unsolvable in a practical sense today, but they are still small compared to almost all problems, because almost all problems are large problems, which are much larger than any constant.
Do you find it useful, as a person who has dreamed of creating a general learning system, has worked on creating one, and has had a lot of interesting ideas there, to think about P versus NP, this formalization of how hard problems are, this kind of worst-case-analysis type of thinking? Do you find that useful, or is it just a set of mathematical techniques to give you intuition about what's good and bad?

So, P versus NP, that's super interesting from a theoretical point of view. And in fact, as you are thinking about that problem, you can also get inspiration for better practical problem solvers. On the other hand, we have to admit that, at the moment, the best practical problem solvers for all kinds of problems that we are now solving through what is called AI are not of the kind that is inspired by these questions. There we are using general-purpose computers, such as recurrent neural networks, but we have a search technique which is just local search, gradient descent, to try to find a program that is running on these recurrent networks such that it can solve some interesting problems, such as speech recognition or machine translation and something like that. And there is very little theory behind the best solutions that we have at the moment that can do that.
Do you think that needs to change? Do you think that will change? Or can we create general intelligent systems without ever really proving that the system is intelligent in some kind of mathematical way, solving machine translation perfectly or something like that, within some kind of syntactic definition of a language? Or can we just be super impressed by the thing working extremely well, and that's sufficient?

There's an old saying, and I don't know who brought it up first, which says: there's nothing more practical than a good theory. And a good theory of problem solving under limited resources, like here in this universe or on this little planet, has to take into account these limited resources. And so, probably, there is lacking a theory, related to what we already have, these asymptotically optimal problem solvers, which tells us what we need in addition to that to come up with a practically optimal problem solver. So I believe we will have something like that, and maybe just a few little tiny twists are necessary to change what we already have, to come up with that as well. As long as we don't have that, we admit that we are taking suboptimal ways: recurrent neural networks and long short-term memory are equipped with local search techniques, and we are happy that it works better than any competing method, but that doesn't mean that we think we are done.
You've said that an AGI system will ultimately be a simple one; a general intelligence system will ultimately be a simple one. Maybe a pseudocode of a few lines will be able to describe it. Can you talk through your intuition behind this idea, why you feel that at its core intelligence is a simple algorithm?

Experience tells us that the stuff that works best is really simple. So the asymptotically optimal ways of solving problems, if you look at them, they're just a few lines of code, it's really true. Although they have these amazing properties, they're just a few lines of code. Then the most promising and most useful practical things, which maybe don't have this proof of optimality associated with them, are also just a few lines of code. The most successful recurrent neural networks, you can write down in five lines of pseudocode.
That's a beautiful, almost poetic idea. But what you're describing there, the lines of pseudocode, are sitting on top of layers and layers of abstractions, in a sense. So you're saying at the very top it'll be a beautifully written sort of algorithm. But do you think there are many layers of abstraction we have to first learn to construct?

Yeah, of course. We are building on all these great abstractions that people have invented over the millennia, such as matrix multiplications and real numbers and basic arithmetic and calculus and derivatives of error functions and stuff like that. So without that language, which greatly simplifies our way of thinking about these problems, we couldn't do anything. So in that sense, as always, we are standing on the shoulders of the giants who, in the past, simplified the problem of problem solving so much that now we have a chance to do the final step. So the final step will be a simple one.
If we take a step back through all of human civilization and just the universe in general, how do you think about evolution? What if creating a universe is required to achieve this final step? What if going through the very painful and inefficient process of evolution is needed to come up with this set of abstractions that ultimately lead to intelligence? Do you think there's a shortcut, or do you think we have to create something like our universe in order to create something like human-level intelligence?

So far, the only example we have is this one, this universe in which we are living.

Do you think we can do better?

Maybe not, but we are part of this whole process. So, apparently, it might be the case that the code that runs the universe is really, really simple. Everything points to that possibility, because gravity and other basic forces are really simple laws that can be easily described, also in just a few lines of code, basically. And then there are these other events, the apparently random events in the history of the universe, which, as far as we know at the moment, don't have a compact code. But who knows? Maybe somebody in the near future is going to figure out the pseudorandom generator which is computing whether the measurement of that spin-up-or-down thing here is going to be positive or negative.

Underlying quantum mechanics. Do you ultimately think quantum mechanics is a pseudorandom number generator, so it's all deterministic and there's no randomness in our universe? Does God play dice?
So a couple of years ago, a famous quantum physicist, Anton Zeilinger, wrote an essay in Nature, and it started more or less like that: one of the fundamental insights of the 20th century was that the universe is fundamentally random on the quantum level, and that whenever you measure spin up or down or something like that, a new bit of information enters the history of the universe. And while I was reading that, I was already typing the response, and they had to publish it, because I was right that there is no evidence, no physical evidence, for that. So there's an alternative explanation, where everything that we consider random is actually pseudorandom, such as the decimal expansion of pi, 3.141 and so on, which looks random but isn't. So pi is interesting, because every sequence of three digits appears roughly one in a thousand times, and every sequence of five digits appears roughly one in 100,000 times, which is what you would expect if it was random. But there's a very short algorithm, a short program, that computes all of that. So it's extremely compressible.
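That "very short algorithm" is concrete: a Machin-formula generator for pi's digits fits in about a dozen lines. This is a standard textbook construction, not something from the conversation; the number of guard digits is a pragmatic choice of mine:

```python
def pi_digits(n: int) -> str:
    """First n decimal digits of pi, e.g. "3141592653...", via Machin's
    formula pi = 16*arctan(1/5) - 4*arctan(1/239) in scaled integers."""
    one = 10 ** (n + 10)  # scale factor with 10 guard digits

    def arctan_inv(x: int) -> int:
        # arctan(1/x) * one, summed from the alternating Taylor series.
        total = term = one // x
        k, x2, sign = 3, x * x, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    pi = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi)[:n]
```

A program of a few hundred bytes thus generates digit statistics that look uniform out to any length you care to test, which is exactly the sense in which pi looks random yet is extremely compressible.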
And who knows? Maybe tomorrow somebody, some grad student at CERN, goes back over all these data points, beta decay and whatever, and figures out: oh, it's the second billion digits of pi, or something like that. We don't have any fundamental reason at the moment to believe that this is truly random and not just a deterministic video game. If it was a deterministic video game, it would be much more beautiful, because beauty is simplicity. And many of the basic laws of the universe, like gravity and the other basic forces, are very simple, so very short programs can explain what these are doing. And it would be awful and ugly, the universe would be ugly, the history of the universe would be ugly, if for the extra things, the seemingly random data points that we get all the time, we really needed a huge number of extra bits to describe all this extra information. So as long as we don't have evidence that there is no short program that computes the entire history of the entire universe, we are, as scientists, compelled to look further for that shortest program.
Your intuition says there exists a program that can backtrack to the creation of the universe, so it can give the shortest path to the creation of the universe.

Including all the entanglement things and all the spin-up-and-down measurements that have taken place since 13.8 billion years ago. So we don't have a proof that it is random, and we don't have a proof that it is compressible to a short program. But as long as we don't have that proof, we are obliged as scientists to keep looking for that simple explanation.
So you've said that simplicity is beautiful, or beauty is simple. But you also work on curiosity and discovery, the romantic notion of randomness, of serendipity, of being surprised by the things around you. In our poetic notion of reality, we think that as humans we require randomness. So you don't find randomness beautiful; you find simple determinism beautiful?

Because the explanation becomes shorter. A universe that is compressible to a short program is much more elegant and much more beautiful than another one, which needs an almost infinite number of bits to be described. As far as we know, many things that are happening in this universe are really simple, in terms of short programs that compute gravity and the interaction between elementary particles and so on. So all of that seems to be very, very simple. Every electron seems to reuse the same subprogram all the time, as it is interacting with other elementary particles. If we now required an extra oracle injecting new bits of information all the time for these extra things which are currently not understood, such as beta decay, then the whole description length of the data that we can observe of the history of the universe would become much longer, and therefore uglier. Again, simplicity is elegant and beautiful.
You've also said that the history of science is a history of compression progress. As we build up abstractions, you've talked about this idea of compression. How do you see the history of science, the history of humanity, our civilization, and life on Earth as some kind of path towards greater and greater compression? What do you mean by that? How do you think about that?

Indeed, the history of science is a history of compression progress. What does that mean? Hundreds of years ago, there was an astronomer whose name was Kepler, and he looked at the data points that he got by watching planets move. And then he had all these data points, and suddenly it turned out that he could greatly compress the data by predicting it through an ellipse. So it turns out that all these data points are more or less on ellipses around the sun. And another guy came along, whose name was Newton, and before him Hooke, and they said: the same thing that is making these planets move like that is what makes the apples fall down, and it also holds for stones and for all kinds of other objects. And suddenly, many, many of these observations became much more compressible, because as long as you can predict the next thing, given what you have seen so far, you can compress it, and you don't have to store that data extra. This is called predictive coding.
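Predictive coding is easy to illustrate with a falling object (a toy example of mine): under a constant-acceleration predictor, every new position is predicted exactly from the previous two, so the residuals that would have to be stored are all essentially zero:

```python
# Positions of a body falling under gravity, sampled once per time step.
g = 9.81
positions = [0.5 * g * t * t for t in range(10)]

def predict(prev2, prev1):
    # Constant acceleration implies: next = 2*prev1 - prev2 + g
    # (the second difference of 0.5*g*t**2 is exactly g).
    return 2 * prev1 - prev2 + g

# Predictive coding stores only prediction residuals, which are ~0 here.
residuals = [positions[i] - predict(positions[i - 2], positions[i - 1])
             for i in range(2, len(positions))]
```

Instead of storing every position (or pixel), a receiver that knows the predictor can reconstruct the whole trajectory from the first two samples plus near-zero residuals.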
And then there was still something wrong with that theory of the universe: you had deviations from the predictions of the theory. And 300 years later, another guy came along whose name was Einstein, and he was able to explain away all these deviations from the predictions of the old theory through a new theory, which was called the general theory of relativity. At first glance it looks a little bit more complicated, and you have to warp space and time, but you can phrase it within one single sentence, which is: no matter how fast you accelerate and how hard you decelerate, and no matter what the gravity is in your local frame of reference, light speed always looks the same. And from that you can calculate all the consequences. So it's a very simple thing, and it allows you to further compress all the observations, because suddenly there are hardly any deviations any longer that you can measure from the predictions of this new theory.
So all of science is a history of compression progress. You never arrive immediately at the shortest explanation of the data, but you're making progress. Whenever you are making progress, you have an insight. You see: oh, first I needed so many bits of information to describe the data, to describe my falling apples, my video of falling apples; so many pixels had to be stored. But then suddenly I realize: no, there is a very simple way of predicting the third frame in the video from the first two. And maybe not every little detail can be predicted, but more or less most of these orange blobs that are coming down accelerate in the same way, which means that I can greatly compress the video. And the amount of compression progress, that is the depth of the insight that you have at that moment. That's the fun that you have, the scientific fun, the fun in that discovery. And we can build artificial systems that do the same thing. They measure the depth of their insights as they are looking at the data, which is coming in through their own experiments, and we give them a reward, an intrinsic reward, in proportion to this depth of insight. And since they are trying to maximize the rewards they get, they are suddenly motivated to come up with new action sequences, with new experiments, that have the property that the data coming in as a consequence of these experiments allows them to learn something, to see a pattern in there which they hadn't seen yet before.
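A crude proxy for such an intrinsic reward (my sketch; the formal version in Schmidhuber's papers rewards the improvement of the agent's compressor over time, which this collapses into a single context-savings measure) can be built from an off-the-shelf compressor standing in for the learned world model:

```python
import zlib

def intrinsic_reward(history: bytes, observation: bytes) -> int:
    # Bytes needed to describe the observation with no model at all:
    alone = len(zlib.compress(observation))
    # Extra bytes needed once the agent's history provides context:
    with_history = (len(zlib.compress(history + observation))
                    - len(zlib.compress(history)))
    # Positive when experience makes the new data cheaper to describe,
    # i.e. when the observation contains a learnable regularity.
    return alone - with_history

# Data matching a pattern the agent has already absorbed is cheap to
# describe given its history, so the proxy reward is positive:
reward = intrinsic_reward(b"up down " * 500, b"up down " * 40)
```

The agent would then prefer experiments whose outcomes score high on this measure: neither already-boring noise-free repetition of what it stores verbatim, nor incompressible noise, but regularities it can still extract.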
So there is this idea of PowerPlay that you've described: training a general problem solver in this kind of way of looking for the simplest unsolved problem. Can you describe that idea a little further?

It's another very simple idea. So, normally, what you do in computer science is: you have some guy who gives you a problem, and then there is a huge search space of potential solution candidates, and you somehow try them out, and you have more or less sophisticated ways of moving around in that search space until you finally find a solution which you consider satisfactory. That's what most of computer science is about. PowerPlay just goes one little step further and says: let's not only search for solutions to a given problem, but let's search for pairs of problems and their solutions, where the system itself has the opportunity to phrase its own problems. So we are suddenly looking at pairs of problems and their solutions, or modifications of the problem solver that is supposed to generate a solution to that new problem. And this additional degree of freedom allows us to build curious systems that are like scientists, in the sense that they not only try to find answers to existing questions; no, they are also free to pose their own questions. So if you want to build an artificial scientist, you have to give it that freedom, and PowerPlay is exactly doing that.
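A deliberately tiny sketch of that search (entirely my toy formulation: "problems" are integers and a solver is just a skill threshold) shows PowerPlay's three criteria: invent the easiest problem the current solver fails, modify the solver so it succeeds, and never forget a previous solution:

```python
def can_solve(skill: int, problem: int) -> bool:
    # Toy stand-in: a solver of the given skill handles problems up to it.
    return problem <= skill

def powerplay_step(skill, solved, candidates):
    # 1. Phrase the easiest problem beyond the current horizon.
    frontier = [p for p in sorted(candidates) if not can_solve(skill, p)]
    if not frontier:
        return skill, solved
    target = frontier[0]
    # 2. Minimal solver modification that masters the new problem.
    new_skill = target
    # 3. No forgetting: every previously solved problem must still work.
    assert all(can_solve(new_skill, p) for p in solved)
    return new_skill, solved | {target}

skill, solved = 0, set()
for _ in range(5):
    skill, solved = powerplay_step(skill, solved, [3, 1, 4, 1, 5, 9, 2, 6])
```

In the real system, both the invented problem and the solver modification are searched over jointly, and the no-forgetting check is the expensive part; here it holds trivially because skill only grows.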
So that's a dimension of freedom that's important to have. But how hard do you think it is? How multidimensional and difficult is the space of coming up with your own questions? It's one of the things that, as human beings, we consider to be what makes us special; the intelligence that makes us special is that brilliant insight that can create something totally new.

So now let's look at the extreme case. Let's look at the set of all possible problems that you can formally describe, which is infinite. Which should be the next problem that a scientist, or PowerPlay, is going to solve? Well, it should be the easiest problem that goes beyond what you already know. So it should be the simplest problem that the current problem solver, which can already solve 100 problems, cannot yet solve just by generalizing. So it has to be new; it has to require a modification of the problem solver, such that the new problem solver can solve this new thing, but the old problem solver cannot do it. And in addition to that, we have to make sure that the problem solver doesn't forget any of the previous solutions. And so, by definition, PowerPlay is always trying to search, in the set of pairs of problems and problem solver modifications, for a combination that minimizes the time to achieve these criteria. So it's always trying to find the problem which is easiest to add to the repertoire.
So, just like grad students and academics and researchers can spend their whole career stuck in a local minimum, trying to come up with interesting questions but ultimately doing very little: do you think it's easy, in this approach of looking for the simplest unsolvable problem, to get stuck in a local minimum, never really discovering something new, never really jumping outside of the 100 problems that you've already solved in a genuinely creative way?

No, because that's the nature of PowerPlay: it's always trying to break its current generalization abilities by coming up with a new problem which is beyond the current horizon, just shifting the horizon of knowledge a little bit out there, breaking the existing rules such that the new thing becomes solvable, but wasn't solvable by the old thing. So it's like adding a new axiom, like what Gödel did when he came up with these new sentences, new theorems, that didn't have a proof in the formal system, which means you can add them to the repertoire, hoping that they are not going to damage the consistency of the whole thing.
link |
So in the paper with the amazing title,
link |
Formal Theory of Creativity, Fun and Intrinsic
link |
Motivation, you talk about discovery as intrinsic
link |
reward, so if you view humans as intelligent
link |
agents, what do you think is the purpose and
link |
meaning of life for us humans?
link |
You've talked about this discovery, do you see
link |
humans as an instance of power play, agents?
link |
Humans are curious and that means they behave
link |
like scientists, not only the official scientists
link |
but even the babies behave like scientists and
link |
they play around with their toys to figure out
link |
how the world works and how it is responding to
link |
their actions and that's how they learn about
link |
gravity and everything.
link |
In 1990 we had the first systems like that which
link |
would just try to play around with the environment
link |
and come up with situations that go beyond what
link |
they knew at that time and then get a reward for
link |
creating these situations and then becoming more
link |
general problem solvers and being able to understand
link |
more of the world.
link |
I think in principle that curiosity strategy or
link |
more sophisticated versions of what I just
link |
described, they are what we have built in as well
link |
because evolution discovered that's a good way of
link |
exploring the unknown world and a guy who explores
link |
the unknown world has a higher chance of solving
link |
the mystery that he needs to survive in this world.
link |
On the other hand, those guys who were too curious
link |
they were weeded out as well.
link |
Evolution found a certain trade off.
link |
Apparently in our society there is a certain
link |
percentage of extremely explorative guys and it
link |
doesn't matter if they die because many of the
link |
others are more conservative.
link |
It would be surprising to me if that principle of
link |
artificial curiosity wouldn't be present in almost
link |
exactly the same form here.
link |
You are a bit of a musician and an artist.
link |
Continuing on this topic of creativity, what do you
link |
think is the role of creativity and intelligence?
link |
So you've kind of implied that it's essential for
link |
intelligence if you think of intelligence as a
link |
problem solving system, as ability to solve problems.
link |
But do you think it's essential, this idea of
link |
We never have a program, a sub program that is
link |
called creativity or something.
link |
It's just a side effect of what our problem solvers
link |
do. They are searching a space of problems, a space
link |
of candidates, of solution candidates until they
link |
hopefully find a solution to a given problem.
link |
But then there are these two types of creativity
link |
and both of them are now present in our machines.
link |
The first one has been around for a long time,
link |
which is human gives problem to machine, machine
link |
tries to find a solution to that.
link |
And this has been happening for many decades and
link |
for many decades machines have found creative
link |
solutions to interesting problems where humans were
link |
not aware of these particularly creative solutions
link |
but then appreciated that the machine found that.
link |
The second is the pure creativity.
link |
What I just mentioned, I would
link |
call the applied creativity, like applied art where
link |
somebody tells you now make a nice picture of this
link |
Pope and you will get money for that.
link |
So here is the artist and he makes a convincing
link |
picture of the Pope and the Pope likes it and gives him money for it.
link |
And then there is the pure creativity which is
link |
more like the power play and the artificial
link |
curiosity thing where you have the freedom to
link |
select your own problem.
link |
Like a scientist who defines his own question
link |
to study and so that is the pure creativity if you
link |
will, as opposed to the applied creativity.
link |
And in that distinction there is almost echoes of
link |
narrow AI versus general AI.
link |
So this kind of constrained painting of a Pope
link |
seems like the approaches of what people are
link |
calling narrow AI and pure creativity seems to be,
link |
maybe I am just biased as a human but it seems to
link |
be an essential element of human level intelligence.
link |
Is that what you are implying?
link |
If you zoom back a little bit and you just look
link |
at a general problem solving machine which is
link |
trying to solve arbitrary problems then this
link |
machine will figure out in the course of solving
link |
problems that it is good to be curious.
link |
So all of what I said just now about this prewired
link |
curiosity and this will to invent new problems
link |
that the system doesn't know how to solve yet
link |
should be just a byproduct of the general search.
link |
However, apparently evolution has built it into
link |
us because it turned out to be so successful,
link |
a prewiring, a bias, a very successful exploratory
link |
bias that we are born with.
link |
And you have also said that consciousness in the
link |
same kind of way may be a byproduct of problem solving.
link |
Do you find this an interesting byproduct?
link |
Do you think it is a useful byproduct?
link |
What are your thoughts on consciousness in general?
link |
Or is it simply a byproduct of greater and greater
link |
capabilities of problem solving that is similar
link |
to creativity in that sense?
link |
We never have a procedure called consciousness in our machines.
link |
However, we get as side effects of what these
link |
machines are doing things that seem to be closely
link |
related to what people call consciousness.
link |
So for example, already in 1990 we had simple
link |
systems which were basically recurrent networks
link |
and therefore universal computers trying to map
link |
incoming data into actions that lead to success.
link |
Maximizing reward in a given environment,
link |
always finding the charging station in time
link |
whenever the battery is low and negative signals
link |
are coming from the battery, without bumping against
link |
painful obstacles on the way.
link |
So complicated things but very easily motivated.
link |
And then we give these little guys a separate
link |
recurrent neural network which is just predicting
link |
what's happening if I do that and that.
link |
What will happen as a consequence of these
link |
actions that I'm executing.
link |
And it's just trained on the long and long history
link |
of interactions with the world.
link |
So it becomes a predictive model of the world
link |
And therefore also a compressor of the observations
link |
of the world because whatever you can predict
link |
you don't have to store extra.
link |
So compression is a side effect of prediction.
link |
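The point that "whatever you can predict you don't have to store extra" is the classic predictive-coding idea. A minimal sketch (my own toy example, not code from Schmidhuber's systems): a predictor that guesses "next value equals the previous one" lets us store only its errors, and a predictable signal encodes to mostly zeros, which are cheap to store.

```python
# Prediction as compression: store only what the predictor gets wrong.

def residual_encode(signal):
    residuals = [signal[0]]               # store the first value as-is
    for prev, cur in zip(signal, signal[1:]):
        residuals.append(cur - prev)      # store only prediction errors
    return residuals

def residual_decode(residuals):
    signal = [residuals[0]]
    for r in residuals[1:]:
        signal.append(signal[-1] + r)     # re-run the predictor to invert
    return signal

data = [5, 5, 5, 6, 6, 6, 6, 7]
encoded = residual_encode(data)
print(encoded)                            # [5, 0, 0, 1, 0, 0, 0, 1]
assert residual_decode(encoded) == data   # lossless
```

A better predictor would leave even smaller residuals, which is exactly the sense in which compression is a side effect of prediction.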
And how does this recurrent network compress?
link |
Well, it's inventing little subprograms, little
link |
subnetworks that stand for everything that
link |
frequently appears in the environment like
link |
bottles and microphones and faces, maybe lots of
link |
faces in my environment so I'm learning to create
link |
something like a prototype face and a new face
link |
comes along and all I have to encode are the
link |
deviations from the prototype.
link |
So it's compressing all the time the stuff that
link |
frequently appears.
link |
There's one thing that appears all the time that
link |
is present all the time when the agent is
link |
interacting with its environment, which is the agent itself.
link |
But just for data compression reasons it is
link |
extremely natural for this recurrent network to
link |
come up with little subnetworks that stand for
link |
the properties of the agents, the hand, the other
link |
actuators and all the stuff that you need to
link |
better encode the data which is influenced by
link |
the actions of the agent.
link |
So there just as a side effect of data compression
link |
during problem solving you have internal self symbols.
link |
Now you can use this model of the world to plan
link |
your future and that's what we also have done
link |
So the recurrent network which is the controller
link |
which is trying to maximize reward can use this
link |
model network of the world, this predictive model of
link |
the world to plan ahead and say let's not do this
link |
action sequence, let's do this action sequence
link |
instead because it leads to more predicted
link |
And whenever it is waking up these little
link |
subnetworks that stand for itself then it is
link |
thinking about itself and it is exploring mentally the
link |
consequences of its own actions and now you tell
link |
me what is still missing.
link |
Missing the next, the gap to consciousness.
link |
That's a really beautiful idea that if life is
link |
a collection of data and life is a process of
link |
compressing that data to act efficiently in that
link |
data you yourself appear very often.
link |
So it's useful to form compressions of yourself
link |
and it's a really beautiful formulation of
link |
consciousness as a necessary side effect.
link |
It's actually quite compelling to me.
link |
You've described RNNs, developed LSTMs, long
link |
short term memory networks that are a type of
link |
recurrent neural networks that have gotten a lot
link |
of success recently.
link |
So these are networks that model the temporal
link |
aspects in the data, temporal patterns in the
link |
data, and you've called them the deepest of the neural networks.
link |
So what do you think is the value of depth in
link |
the models that we use to learn?
link |
Since you mentioned the long short term memory
link |
and the LSTM I have to mention the names of the
link |
brilliant students who made that possible.
link |
First of all my first student ever Sepp Hochreiter
link |
who had fundamental insights already in his diploma thesis.
link |
Then Felix Gers, who made additional important contributions.
link |
Alex Graves is a guy from Scotland who is mostly
link |
responsible for this CTC algorithm which is now
link |
often used to train the LSTM to do the speech
link |
recognition on all the Google Android phones and
link |
whatever and Siri and so on.
link |
So without these guys, I would be nowhere.
link |
It's a lot of incredible work.
link |
What is now the depth?
link |
What is the importance of depth?
link |
Well most problems in the real world are deep in
link |
the sense that the current input doesn't tell you
link |
all you need to know about the environment.
link |
So instead you have to have a memory of what
link |
happened in the past and often important parts of
link |
that memory are dated.
link |
They are pretty old.
link |
So when you're doing speech recognition for
link |
example and somebody says 11 then that's about
link |
half a second or something like that which means
link |
it's already 50 time steps.
link |
And another guy or the same guy says 7.
link |
So the ending is the same even but now the
link |
system has to see the distinction between 7 and
link |
11 and the only way it can see the difference is
link |
it has to store that 50 steps ago there was an
link |
S or an L, 11 or 7.
link |
So there you have already a problem of depth 50
link |
because for each time step you have something
link |
like a virtual layer in the expanded unrolled
link |
version of this recurrent network which is doing
link |
the speech recognition.
link |
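The seven/eleven example can be caricatured in code. The one-symbol-per-frame encoding below is my own illustration, not real speech features; it just shows why the distinguishing information arrives at the very first frame and must survive roughly 50 unrolled steps.

```python
# "Seven" vs "eleven": the endings sound alike, so only the earliest
# frames distinguish them, ~50 time steps before the word ends.

def pad(word, length=50):
    return list(word) + ['_'] * (length - len(word))  # '_' = silence

def classify(frames):
    memory = None
    for t, frame in enumerate(frames):    # one "virtual layer" per frame
        if t == 0:
            memory = frame                # must survive ~50 steps of depth
        # later frames look near-identical for both words ('...even', silence)
    return 'eleven' if memory == 'e' else 'seven'

print(classify(pad('eleven')))  # eleven
print(classify(pad('seven')))   # seven
```

A recurrent network has no such hand-written `memory` variable; it must learn to carry that bit through all 50 virtual layers, which is the depth problem being described.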
So these long time lags, they translate into depth.
link |
And most problems in this world are such that
link |
you really have to look far back in time to
link |
understand what is the problem and to solve it.
link |
But just like with LSTMs you don't necessarily
link |
need to when you look back in time remember every
link |
aspect you just need to remember the important
link |
The network has to learn to put the important
link |
stuff into memory and to ignore the unimportant
link |
But in that sense deeper and deeper is better
link |
or is there a limitation?
link |
I mean LSTM is one of the great examples of
link |
architectures that do something beyond just
link |
deeper and deeper networks.
link |
There's clever mechanisms for filtering data,
link |
for remembering and forgetting.
link |
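The "remembering and forgetting" mechanisms mentioned here are the LSTM's gates. A minimal single-unit sketch with hand-picked gate settings (real LSTMs learn these weights) shows how a nearly-open forget gate preserves a value across 50 steps, while an ungated leaky unit loses it almost immediately:

```python
import math

# Minimal LSTM-style memory cell with hand-picked, not learned, gates.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, c):
    i = sigmoid(10.0 * x - 5.0)  # input gate: opens only when x = 1
    f = sigmoid(5.0)             # forget gate ~0.993: keep old memory
    g = math.tanh(2.0 * x)       # candidate content to store
    return f * c + i * g         # gated memory update

c = 0.0
for x in [1] + [0] * 49:         # one important frame, then 49 of silence
    c = lstm_step(x, c)

# For contrast, an ungated leaky unit with decay 0.5:
c_naive = 1.0
for _ in range(49):
    c_naive *= 0.5

print(round(c, 3))       # ~0.689: the gated cell still remembers
print(c_naive < 1e-10)   # True: the ungated unit has forgotten
```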
So do you think that kind of thinking is essential?
link |
If you think about LSTMs as a leap, a big leap
link |
forward over traditional vanilla RNNs, what do
link |
you think is the next leap within this context?
link |
So LSTM is a very clever improvement but LSTM
link |
still don't have the same kind of ability to see
link |
far back in the past as us humans do.
link |
The credit assignment problem reaches way back,
link |
not just 50 time steps or 100 or 1000 but
link |
millions and billions.
link |
It's not clear what are the practical limits of
link |
the LSTM when it comes to looking back.
link |
Already in 2006 I think we had examples where
link |
it not only looked back tens of thousands of
link |
steps but really millions of steps.
link |
And Juan Pérez Ortiz in my lab I think was the
link |
first author of a paper where we really, was it
link |
2006 or something, had examples where it learned
link |
to look back for more than 10 million steps.
link |
So for most problems of speech recognition it's
link |
not necessary to look that far back but there
link |
are examples where it does.
link |
Now the looking back thing, that's rather easy
link |
because there is only one past but there are
link |
many possible futures and so a reinforcement
link |
learning system which is trying to maximize its
link |
future expected reward and doesn't know yet which
link |
of these many possible futures should I select
link |
given this one single past is facing problems
link |
that the LSTM by itself cannot solve.
link |
So the LSTM is good for coming up with a compact
link |
representation of the history and observations
link |
and actions so far but now how do you plan in an
link |
efficient and good way among all these, how do
link |
you select one of these many possible action
link |
sequences that a reinforcement learning system
link |
has to consider to maximize reward in this environment?
link |
We have this basic setup where you have one
link |
recurrent network which gets in the video and
link |
the speech and whatever and it's executing
link |
actions and it's trying to maximize reward so
link |
there is no teacher who tells it what to do at
link |
which point in time.
link |
And then there's the other network which is
link |
just predicting what's going to happen if I do
link |
that and that and that could be an LSTM network
link |
and it learns to look back all the way to make
link |
better predictions of the next time step.
link |
So essentially although it's predicting only the
link |
next time step it is motivated to learn to put
link |
into memory something that happened maybe a
link |
million steps ago because it's important to
link |
memorize that if you want to predict that at the
link |
next time step, the next event.
link |
Now how can a model of the world like that, a
link |
predictive model of the world be used by the controller?
link |
Let's call them the controller and the model.
link |
How can the model be used by the controller to
link |
efficiently select among these many possible
link |
The naive way we had about 30 years ago was
link |
let's just use the model of the world as a stand
link |
in, as a simulation of the world and millisecond
link |
by millisecond we plan the future and that means
link |
we have to roll it out really in detail and it
link |
will work only if the model is really good and
link |
it will still be inefficient because we have to
link |
look at all these possible futures and there are so many of them.
link |
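The naive planning scheme described above, using the model as a stand-in simulator and rolling out every action sequence in detail, can be sketched as exhaustive search. The one-dimensional toy world and its reward are my own assumptions for illustration:

```python
import itertools

# Naive model-based planning: roll out each action sequence through M.

def model(state, action):
    """A (here perfect) predictive model of a toy world on a line."""
    return state + (1 if action == 'right' else -1)

def rollout_return(state, actions):
    for a in actions:
        state = model(state, a)
    return 1.0 if state == 3 else 0.0   # reward only at position 3

def plan(state, horizon=3):
    # Exhaustive rollout: cost grows as 2**horizon, which is exactly
    # why this detailed step-by-step scheme is inefficient.
    return max(itertools.product(('left', 'right'), repeat=horizon),
               key=lambda seq: rollout_return(state, seq))

print(plan(0))  # ('right', 'right', 'right')
```

The exponential blow-up in the number of futures is what motivates the later controller-model scheme, where the controller learns which parts of the model are worth querying.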
So instead what we do now since 2015 in our CM
link |
systems, controller model systems, we give the
link |
controller the opportunity to learn by itself how
link |
to use the potentially relevant parts of the M,
link |
of the model network to solve new problems more
link |
And if it wants to, it can learn to ignore the M
link |
and sometimes it's a good idea to ignore the M
link |
because it's really bad, it's a bad predictor in
link |
this particular situation of life where the
link |
controller is currently trying to maximize reward.
link |
However, it can also learn to address and exploit
link |
some of the subprograms that came about in the
link |
model network through compressing the data by
link |
So it now has an opportunity to reuse that code,
link |
the algorithmic information in the model network
link |
to reduce its own search space such that it can
link |
solve a new problem more quickly than without the model network.
link |
So you're ultimately optimistic and excited about
link |
the power of RL, of reinforcement learning in the
link |
context of real systems.
link |
So you see RL as potentially having a huge impact,
link |
beyond just the M part, which is often developed with
link |
supervised learning methods.
link |
You see RL as promising for problems of self driving cars
link |
or any kind of applied robotics.
link |
That's the correct interesting direction for
link |
research in your view?
link |
We have a company called NNAISENSE which has applied
link |
reinforcement learning to little Audis which learn
link |
to park without a teacher.
link |
The same principles were used of course.
link |
So these little Audis, they are small, maybe like
link |
that, so much smaller than the real Audis.
link |
But they have all the sensors that you find in the
link |
You find the cameras, the LIDAR sensors.
link |
They go up to 120 kilometers an hour if they want to.
link |
And they have pain sensors basically and they don't
link |
want to bump against obstacles and other Audis and
link |
so they must learn like little babies to park.
link |
Take the raw vision input and translate that into
link |
actions that lead to successful parking behavior
link |
which is a rewarding thing.
link |
And yes, they learn that.
link |
So we have examples like that, and it's only the beginning.
link |
This is just the tip of the iceberg and I believe the
link |
next wave of AI is going to be all about that.
link |
So at the moment, the current wave of AI is about
link |
passive pattern observation and prediction and that's
link |
what you have on your smartphone and what the major
link |
companies on the Pacific Rim are using to sell you
link |
ads to do marketing.
link |
That's the current sort of profit in AI and that's
link |
only one or two percent of the world economy.
link |
Which is big enough to make these companies pretty
link |
much the most valuable companies in the world.
link |
But there's a much, much bigger fraction of the
link |
economy going to be affected by the next wave which
link |
is really about machines that shape the data through
link |
their own actions.
link |
Do you think simulation is ultimately the biggest
link |
way that those methods will be successful in the next
link |
We're not talking about 100 years from now.
link |
We're talking about sort of the near term impact of RL.
link |
Do you think really good simulation is required or
link |
is there other techniques like imitation learning,
link |
observing other humans operating in the real world?
link |
Where do you think the success will come from?
link |
So at the moment, we have a tendency of using physics
link |
simulations to learn behavior from machines that
link |
learn to solve problems that humans also do not know how to solve.
link |
However, this is not the future because the future is
link |
in what little babies do.
link |
They don't use a physics engine to simulate the world.
link |
No, they learn a predictive model of the world which
link |
maybe sometimes is wrong in many ways but captures
link |
all kinds of important abstract high level predictions
link |
which are really important to be successful.
link |
And that was the future 30 years ago when we
link |
started that type of research but it's still the future
link |
and now we know much better how to go there to move
link |
forward and to really make working systems based on
link |
that where you have a learning model of the world,
link |
a model of the world that learns to predict what's
link |
going to happen if I do that and that.
link |
And then the controller uses that model to more
link |
quickly learn successful action sequences.
link |
And then of course always this curiosity thing.
link |
In the beginning, the model is stupid so the
link |
controller should be motivated to come up with
link |
experiments with action sequences that lead to data
link |
that improve the model.
link |
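The curiosity signal described here, paying the controller for experiences that improve the model, can be sketched as intrinsic reward equal to the model's drop in prediction error. This is a simplified stand-in; Schmidhuber's formal versions measure compression or learning progress.

```python
# Intrinsic reward as model improvement (toy running-average model).

class AverageModel:
    """Predicts a sensor value with a running average."""
    def __init__(self):
        self.estimate, self.n = 0.0, 0

    def error(self, observation):
        return abs(observation - self.estimate)

    def update(self, observation):
        before = self.error(observation)   # surprise before learning
        self.n += 1
        self.estimate += (observation - self.estimate) / self.n
        after = self.error(observation)    # surprise after learning
        return before - after              # intrinsic (curiosity) reward

m = AverageModel()
r_novel = m.update(10.0)    # novel observation: large improvement
r_boring = m.update(10.0)   # already predicted: zero improvement
print(r_novel, r_boring)    # 10.0 0.0
```

A curious controller maximizing this signal seeks out data it cannot yet predict but can learn to predict, and ignores both the already-understood and the hopelessly noisy.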
Do you think improving the model, constructing an
link |
understanding of the world in this way, is the right direction?
link |
The now popular approaches that have been successful
link |
are grounded in ideas of neural networks.
link |
But in the 80s with expert systems, there's
link |
symbolic AI approaches which to us humans are more
link |
intuitive in the sense that it makes sense that you
link |
build up knowledge in this knowledge representation.
link |
What kind of lessons can we draw into our current
link |
approaches from expert systems from symbolic AI?
link |
So I became aware of all of that in the 80s and
link |
back then logic programming was a huge thing.
link |
Was it inspiring to you yourself?
link |
Did you find it compelling?
link |
Because a lot of your work was not so much in that area.
link |
It was more in the learning systems.
link |
Yes and no, but we did all of that.
link |
So my first publication ever actually was 1987,
link |
was the implementation of a
link |
genetic programming system in Prolog.
link |
So Prolog, that's what you learn back then which is
link |
a logic programming language and the Japanese,
link |
they had this huge fifth generation AI project
link |
which was mostly about logic programming back then.
link |
Although neural networks existed and were well
link |
known back then and deep learning has existed since
link |
1965, since this guy in the Ukraine,
link |
Ivakhnenko, started it.
link |
But the Japanese and many other people,
link |
they focused really on this logic programming and I
link |
was influenced to the extent that I said,
link |
okay, let's take these biologically inspired
link |
algorithms, like the evolution of programs,
link |
and implement that in the language which I know,
link |
which was Prolog, for example, back then.
link |
And then in many ways this came back later because
link |
the Gödel machine, for example,
link |
has a proof searcher on board and without that it
link |
would not be optimal.
link |
Well, Marcus Hutter's universal algorithm for
link |
solving all well defined problems has a proof
link |
searcher on board so that's very much logic programming.
link |
Without that it would not be asymptotically optimal.
link |
But then on the other hand,
link |
because we are very pragmatic guys also,
link |
we focused on recurrent neural networks and
link |
suboptimal stuff such as gradient based search in
link |
program space rather than provably optimal things.
link |
The logic programming certainly has a usefulness
link |
when you're trying to construct something provably
link |
optimal or provably good or something like that.
link |
But is it useful for practical problems?
link |
It's really useful for theorem proving.
link |
The best theorem provers today are not neural networks.
link |
No, they are logic programming systems and they
link |
are much better theorem provers than most math
link |
students in the first or second semester.
link |
But for reasoning, for playing games of Go or chess
link |
or for robots, autonomous vehicles that operate in
link |
the real world or object manipulation,
link |
you think learning.
link |
Yeah, as long as the problems have little to do
link |
with theorem proving themselves,
link |
then as long as that is not the case,
link |
you just want to have better pattern recognition.
link |
So to build a self driving car,
link |
you want to have better pattern recognition and
link |
pedestrian recognition and all these things.
link |
You want to minimize the number of false positives,
link |
which is currently slowing down self driving cars
link |
All of that has very little to do with logic programming.
link |
What are you most excited about in terms of
link |
directions of artificial intelligence at this moment
link |
in the next few years in your own research
link |
and in the broader community?
link |
So I think in the not so distant future,
link |
we will have for the first time little robots
link |
that learn like kids.
link |
I will be able to say to the robot,
link |
look here robot, we are going to assemble a smartphone.
link |
Let's take this slab of plastic and the screwdriver
link |
and let's screw in the screw like that.
link |
Not like that, like that.
link |
Not like that, like that.
link |
And I don't have a data glove or something.
link |
He will see me and he will hear me
link |
and he will try to do something with his own actuators,
link |
which will be really different from mine,
link |
but he will understand the difference
link |
and will learn to imitate me,
link |
but not in the supervised way
link |
where a teacher is giving target signals
link |
for all his muscles all the time.
link |
No, by doing this high level imitation
link |
where he first has to learn to imitate me
link |
and then to interpret these additional noises
link |
coming from my mouth as
link |
helpful signals to do that better.
link |
And then it will by itself come up with faster ways
link |
and more efficient ways of doing the same thing.
link |
And finally I stop his learning algorithm
link |
and make a million copies and sell it.
link |
And so at the moment this is not possible,
link |
but we already see how we are going to get there.
link |
And you can imagine to the extent
link |
that this works economically and cheaply,
link |
it's going to change everything.
link |
Almost all of production is going to be affected by that.
link |
And a much bigger wave,
link |
a much bigger AI wave is coming
link |
than the one that we are currently witnessing,
link |
which is mostly about passive pattern recognition
link |
on your smartphone.
link |
This is about active machines that shape data
link |
through the actions they are executing
link |
and they learn to do that in a good way.
link |
So many of the traditional industries
link |
are going to be affected by that.
link |
All the companies that are building machines
link |
will equip these machines with cameras
link |
and other sensors and they are going to learn
link |
to solve all kinds of problems
link |
through interaction with humans,
link |
but also a lot on their own
link |
to improve what they already can do.
link |
And lots of old economy is going to be affected by that.
link |
And in recent years I have seen that old economy
link |
is actually waking up and realizing that this is the case.
link |
Are you optimistic about that future?
link |
Are you concerned?
link |
There is a lot of people concerned in the near term
link |
about the transformation of the nature of work,
link |
the kind of ideas that you just suggested
link |
would have a significant impact
link |
of what kind of things could be automated.
link |
Are you optimistic about that future?
link |
Are you nervous about that future?
link |
And looking a little bit farther into the future,
link |
there are people like Elon Musk, Stuart Russell,
link |
concerned about the existential threats of that future.
link |
So in the near term, job loss,
link |
in the long term existential threat,
link |
are these concerns to you or are you ultimately optimistic?
link |
So let's first address the near future.
link |
We have had predictions of job losses for many decades.
link |
For example, when industrial robots came along,
link |
many people predicted that lots of jobs are going to get lost.
link |
And in a sense, they were right,
link |
because back then there were car factories
link |
and hundreds of people in these factories assembled cars,
link |
and today the same car factories have hundreds of robots
link |
and maybe three guys watching the robots.
link |
On the other hand, those countries that have lots of robots per capita,
link |
Japan, Korea, Germany, Switzerland,
link |
and a couple of other countries,
link |
they have really low unemployment rates.
link |
Somehow, all kinds of new jobs were created.
link |
Back then, nobody anticipated those jobs.
link |
And decades ago, I always said,
link |
it's really easy to say which jobs are going to get lost,
link |
but it's really hard to predict the new ones.
link |
200 years ago, who would have predicted all these people
link |
making money as YouTube bloggers, for example?
link |
200 years ago, 60% of all people used to work in agriculture.
link |
But still, only, I don't know, 5% unemployment.
link |
Lots of new jobs were created, and Homo Ludens, the playing man,
link |
is inventing new jobs all the time.
link |
Most of these jobs are not existentially necessary
link |
for the survival of our species.
link |
There are only very few existentially necessary jobs,
link |
such as farming and building houses and warming up the houses,
link |
but less than 10% of the population is doing that.
link |
And most of these newly invented jobs are about
link |
interacting with other people in new ways,
link |
through new media and so on,
link |
getting new types of kudos and forms of likes and whatever,
link |
and even making money through that.
link |
So, Homo Ludens, the playing man, doesn't want to be unemployed,
link |
and that's why he's inventing new jobs all the time.
link |
And he keeps considering these jobs as really important
link |
and is investing a lot of energy and hours of work into those new jobs.
link |
That's quite beautifully put.
link |
We're really nervous about the future because we can't predict
link |
what kind of new jobs will be created.
link |
But you're ultimately optimistic that we humans are so restless
link |
that we create and give meaning to newer and newer jobs,
link |
totally new, things that get likes on Facebook
link |
or whatever the social platform is.
link |
So what about long term existential threat of AI,
link |
where our whole civilization may be swallowed up
link |
by these ultra super intelligent systems?
link |
Maybe it's not going to be swallowed up,
link |
but I'd be surprised if we humans were the last step
link |
in the evolution of the universe.
link |
You've actually had this beautiful comment somewhere that I've seen
link |
saying, quite insightfully, that artificial general intelligence systems,
link |
just like us humans, will likely not want to interact with humans,
link |
they'll just interact amongst themselves.
link |
Just like ants interact amongst themselves
link |
and only tangentially interact with humans.
link |
And it's quite an interesting idea that once we create AGI,
link |
they will lose interest in humans and compete for their own Facebook likes
link |
and their own social platforms.
link |
So within that quite elegant idea, how do we know in a hypothetical sense
link |
that there aren't already intelligent systems out there?
link |
How do you think broadly of general intelligence greater than us?
link |
How do we know it's out there?
link |
How do we know it's around us?
link |
And could it already be?
link |
I'd be surprised if within the next few decades or something like that,
link |
we won't have AIs that are truly smart in every single way
link |
and better problem solvers in almost every single important way.
link |
And I'd be surprised if they wouldn't realize what we have realized a long time ago,
link |
which is that almost all physical resources are not here in this biosphere,
link |
but further out, the rest of the solar system gets 2 billion times more solar energy
link |
than our little planet.
link |
There's lots of material out there that you can use to build robots
link |
and self replicating robot factories and all this stuff.
link |
And they are going to do that and they will be scientists and curious
link |
and they will explore what they can do.
link |
And in the beginning, they will be fascinated by life
link |
and by their own origins in our civilization.
link |
They will want to understand that completely, just like people today
link |
would like to understand how life works and also the history of our own existence
link |
and civilization, but then also the physical laws that created all of that.
link |
So in the beginning, they will be fascinated by life.
link |
Once they understand it, they lose interest.
link |
Like anybody who loses interest in things he understands.
link |
And then, as you said, the most interesting sources of information for them
link |
will be others of their own kind.
link |
So at least in the long run, there seems to be some sort of protection
link |
through lack of interest on the other side.
link |
And now it seems also clear, as far as we understand physics,
link |
you need matter and energy to compute and to build more robots and infrastructure
link |
for AI civilizations and AI ecologies consisting of trillions of different types of AIs.
link |
And so it seems inconceivable to me that this thing is not going to expand.
link |
Some AI ecology not controlled by one AI, but trillions of different types of AIs
link |
competing in all kinds of quickly evolving and disappearing ecological niches
link |
in ways that we cannot fathom at the moment.
link |
But it's going to expand, limited by light speed and physics,
link |
but it's going to expand and now we realize that the universe is still young.
link |
It's only 13.8 billion years old and it's going to be a thousand times older than that.
link |
So there's plenty of time to conquer the entire universe
link |
and to fill it with intelligence and senders and receivers
link |
such that AIs can travel the way they are traveling in our labs today,
link |
which is by radio from sender to receiver.
link |
And let's call the current age of the universe one eon.
link |
Now it will take just a few eons from now and the entire visible universe
link |
is going to be full of that stuff.
link |
And let's look ahead to a time when the universe is going to be 1000 times older than it is now.
link |
They will look back and they will say, look, almost immediately after the Big Bang,
link |
only a few eons later, the entire universe started to become intelligent.
link |
Now to your question, how do we see whether anything like that has already happened
link |
or is already in a more advanced stage in some other part of the universe, of the visible universe?
link |
We are trying to look out there, and nothing like that has happened so far. Or is that true?
link |
Do you think we would recognize it?
link |
How do we know it's not among us?
link |
How do we know planets aren't in themselves intelligent beings?
link |
How do we know ants seen as a collective are not much greater intelligence than our own?
link |
These kinds of ideas.
link |
When I was a boy, I was thinking about these things
link |
and I thought, maybe it has already happened.
link |
Because back then I knew, I learned from popular physics books,
link |
that the large scale structure of the universe is not homogeneous.
link |
You have these clusters of galaxies and then in between there are these huge empty spaces.
link |
And I thought, maybe they aren't really empty.
link |
It's just that in the middle of that, some AI civilization already has expanded
link |
and then has covered a bubble of a billion light years diameter
link |
and is using all the energy of all the stars within that bubble for its own unfathomable purposes.
link |
And so it already has happened and we just fail to interpret the signs.
link |
And then I learned that gravity by itself explains the large scale structure of the universe,
link |
so that wasn't a convincing explanation.
link |
And then I thought, maybe it's the dark matter.
link |
Because as far as we know today, 80% of the measurable matter is invisible.
link |
And we know that because otherwise our galaxy or other galaxies would fall apart.
link |
They are rotating too quickly.
link |
And then the idea was, maybe all of these AI civilizations that are already out there,
link |
they are just invisible because they are really efficient in using the energies of their own local systems
link |
and that's why they appear dark to us.
link |
But this is also not a convincing explanation because then the question becomes,
link |
why are there still any visible stars left in our own galaxy, which also must have a lot of dark matter?
link |
So that is also not a convincing thing.
link |
And today, I like to think it's quite plausible that maybe we are the first,
link |
at least in our local light cone within the few hundreds of millions of light years that we can reliably observe.
link |
Is that exciting to you that we might be the first?
link |
And it would make us much more important because if we mess it up through a nuclear war,
link |
then maybe this will have an effect on the development of the entire universe.
link |
So let's not mess it up.
link |
Let's not mess it up.
link |
Jürgen, thank you so much for talking today. I really appreciate it.