Jürgen Schmidhuber: Gödel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11
The following is a conversation with Jürgen Schmidhuber. He's the co-director of the Swiss AI lab IDSIA and a co-creator of long short-term memory networks. LSTMs are used in billions of devices today for speech recognition, translation, and much more. Over 30 years, he has proposed a lot of interesting, out-of-the-box ideas on meta-learning, adversarial networks, computer vision, and even a formal theory of, quote, creativity, curiosity, and fun.

This conversation is part of the MIT course on artificial general intelligence and the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F R I D. And now, here's my conversation with Jürgen Schmidhuber.
Early on, you dreamed of AI systems that self-improve recursively. When was that dream born?

When I was a baby. No, that's not true. When I was a teenager.

And what was the catalyst for that birth? What was the thing that first inspired you?
When I was a boy, I was thinking about what to do in my life, and then I thought the most exciting thing is to solve the riddles of the universe. And that means you have to become a physicist. However, then I realized that there's something even grander: you can try to build a machine that isn't really a machine any longer, that learns to become a much better physicist than I could ever hope to be. And that's how I thought maybe I can multiply my tiny little bit of creativity into infinity.

But ultimately, that creativity will be multiplied to understand the universe around us. That's the curiosity for that mystery that drove you.
Yes. So if you can build a machine that learns to solve more and more complex problems and more and more general problems, then you have basically solved all the problems, at least all the solvable problems.
So how do you think about it? What does the mechanism for that kind of general solver look like? Obviously, we don't quite yet have one, or know how to build one, but we have ideas, and you have had several ideas about it throughout your career. So how do you think about that mechanism?
So in the 80s, I thought about how to build this machine that learns to solve all these problems that I cannot solve myself. And I thought it is clear: it has to be a machine that not only learns to solve this problem here and this problem here, but it also has to learn to improve the learning algorithm itself. So it has to have the learning algorithm in a representation that allows it to inspect it and modify it, so that it can come up with a better learning algorithm. So I called that meta-learning, learning to learn, and recursive self-improvement. That is really the pinnacle of that, where you not only learn how to improve on this problem and on that one, but you also improve the way the machine improves, and you also improve the way it improves the way it improves itself. And that was my 1987 diploma thesis, which was all about that hierarchy of meta-learners that have no computational limits except for the well-known limits that Gödel identified in 1931 and for the limits of physics.
In recent years, meta-learning has gained popularity in a specific kind of form. You've talked about how that's not really meta-learning with neural networks, that's more basic transfer learning. Can you talk about the difference between the big, general meta-learning and the more narrow sense of meta-learning the way it's used and talked about today?
Let's take the example of a deep neural network that has learned to classify images. And maybe you have trained that network on 100 different databases of images. And now a new database comes along, and you want to quickly learn the new thing as well. So one simple way of doing that is: you take the network, which already knows 100 types of databases, and then you just take the top layer of that and retrain it using the new labeled data that you have in the new image database. And then it turns out that it really, really quickly can learn that too, one-shot basically, because from the first 100 data sets it has already learned so much about computer vision that it can reuse that, and that is then almost good enough to solve the new task, except you need a little bit of adjustment on the top. So that is transfer learning, and it has been done, in principle, for many decades. People have done similar things for decades.
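The freeze-and-retrain recipe he describes can be sketched concretely. This is a hypothetical toy, not his setup: the "pretrained network" is reduced to a fixed random feature map standing in for frozen lower layers, and only a new logistic-regression top layer is trained on the new task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen lower layers of a network pretrained on the
# first 100 image databases: a fixed random projection plus ReLU.
# (Illustrative only; a real system would reuse learned features.)
W_frozen = rng.normal(size=(2, 16))

def features(x):
    return np.maximum(x @ W_frozen, 0.0)   # frozen, never retrained

# "New database": a tiny 2D binary classification task.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

F = features(X)                  # compute frozen features once
w = np.zeros(F.shape[1])         # only this new top layer is trained
b = 0.0
for _ in range(1000):            # gradient descent on the top layer alone
    z = np.clip(F @ w + b, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))           # sigmoid predictions
    g = p - y                              # gradient of the log-loss
    w -= 0.5 * F.T @ g / len(y)
    b -= 0.5 * g.mean()

accuracy = ((p > 0.5) == (y > 0.5)).mean()
print(f"accuracy on the new task: {accuracy:.2f}")
```

Only `w` and `b` are ever updated, which is exactly why the adaptation is so cheap: the expensive representation is reused as-is.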
Meta-learning, true meta-learning, is about having the learning algorithm itself open to introspection by the system that is using it, and also open to modification, such that the learning system has an opportunity to modify any part of the learning algorithm, and then evaluate the consequences of that modification, and then learn from that to create a better learning algorithm, and so on, recursively. So that's a very different animal, where you are opening the space of possible learning algorithms to the learning system itself.
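One way to picture "opening the learning algorithm to modification" is a deliberately tiny caricature in which the only modifiable piece of the algorithm is its step size, and the meta-level keeps a modification only after evaluating its consequences. Everything here (the quadratic task, the mutation scheme) is a made-up stand-in, not his actual construction.

```python
import random

random.seed(0)

def run_inner(lr, steps=20):
    """Inner learner: gradient descent on f(x) = (x - 3)^2.
    `lr` is the piece of the learning algorithm that the
    meta-level is allowed to inspect and modify."""
    x = 0.0
    for _ in range(steps):
        x -= lr * 2 * (x - 3)      # one gradient step
    return (x - 3) ** 2            # final loss

# Meta-loop: propose a modification of the learning algorithm,
# evaluate its consequences, keep it only if it helps.
lr = 0.01
best = run_inner(lr)
for _ in range(100):
    candidate = lr * random.uniform(0.5, 2.0)   # mutate the algorithm
    loss = run_inner(candidate)                  # evaluate consequences
    if loss < best:                              # keep only improvements
        lr, best = candidate, loss

print(f"meta-learned lr={lr:.3f}, final inner loss={best:.2e}")
```

The point of the sketch is the division of labor: the inner loop solves a problem, while the outer loop improves the problem-solving procedure itself.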
Right. So, like in the 2004 paper, you described Gödel machines, programs that rewrite themselves, right? Philosophically, and even in your paper mathematically, these are really compelling ideas. But practically, do you see these self-referential programs being successful in the near term, having an impact that sort of demonstrates to the world that this direction is a good one to pursue?
Yes, we had these two different types of fundamental research on how to build a universal problem solver: one basically exploiting proof search and things like that, which you need to come up with asymptotically optimal, theoretically optimal self-improvers and problem solvers. However, one has to admit that through this proof search comes in an additive constant, an additive overhead, that vanishes in comparison to what you have to do to solve large problems. However, for many of the small problems that we want to solve in our everyday life, we cannot ignore this constant overhead. And that's why we have also been doing other things, non-universal things, such as recurrent neural networks, which are trained by gradient descent and local search techniques, which aren't universal at all, which aren't provably optimal at all, like the other stuff that we did, but which are much more practical, as long as we only want to solve the small problems that we are typically trying to solve in this environment here.
So the universal problem solvers, like the Gödel machine, but also Marcus Hutter's fastest way of solving all possible problems, which he developed around 2002 in my lab, are associated with these constant overheads for proof search, which guarantees that the thing that you're doing is optimal. For example, there is this fastest way of solving all problems with a computable solution, which is due to Marcus Hutter. To explain what's going on there, let's take traveling salesman problems. With traveling salesman problems, you have a number of cities, N cities, and you try to find the shortest path through all these cities without visiting any city twice. Nobody knows the fastest way of solving traveling salesman problems, TSPs, but let's assume there is a method of solving them within N to the fifth operations, where N is the number of cities. Then the universal method of Marcus is going to solve the same traveling salesman problem also within N to the fifth steps, plus O of one, plus a constant number of steps that you need for the proof searcher, which you need to show that this particular class of problems, the traveling salesman problems, can be solved within a certain time bound, within order N to the fifth steps, basically. And this additive constant doesn't depend on N, which means that as N is getting larger and larger, as you have more and more cities, the constant overhead pales in comparison. And that means that almost all large problems are already solved in the best possible way today.
We already have a universal problem solver like that. However, it's not practical, because the constant overhead is so large for the small kinds of problems that we want to solve in this little biosphere.

By the way, when you say small, you're talking about things that fall within the constraints of our computational systems. So they can seem quite large to us mere humans.

That's right, yeah. So they seem large, and even unsolvable in a practical sense, today. But they are still small compared to almost all problems, because almost all problems are large problems, which are much larger than any constant.
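His point that the additive constant dominates small problems but vanishes asymptotically can be made concrete with toy numbers. The overhead value below is purely illustrative, a stand-in for the astronomical proof-search cost, and N**5 stands for the assumed cost of the problem-specific method.

```python
# `C` is a made-up stand-in for the additive proof-search overhead of
# the universal solver; n**5 is the assumed problem-specific cost.
C = 10**30

def overhead_fraction(n):
    """Fraction of the universal solver's n**5 + C cost spent on overhead."""
    return C / (n**5 + C)

for n in (10, 10**6, 10**8):
    print(f"n={n:>10}: overhead is {overhead_fraction(n):.2%} of total cost")
```

For everyday problem sizes the overhead is essentially all of the cost, while for astronomically large N it is a vanishing fraction, which is exactly why the asymptotic optimality is real but not yet practical.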
Do you find it useful, as a person who has dreamed of creating a general learning system, has worked on creating one, and has come up with a lot of interesting ideas there, to think about P versus NP, this formalization of how hard problems are, this kind of worst-case-analysis type of thinking? Do you find that useful? Or is it just a set of mathematical techniques to give you intuition about what's good and bad?
So P versus NP, that's super interesting from a theoretical point of view. And in fact, as you are thinking about that problem, you can also get inspiration for better practical problem solvers. On the other hand, we have to admit that at the moment, the best practical problem solvers for all kinds of problems that we are now solving through what is called AI are not of the kind that is inspired by these questions. There we are using general-purpose computers, such as recurrent neural networks, but we have a search technique which is just local search, gradient descent, to try to find a program that is running on these recurrent networks such that it can solve some interesting problems, such as speech recognition or machine translation and something like that. And there is very little theory behind the best solutions that we have at the moment.
Do you think that needs to change? Do you think that will change? Or can we create general intelligence systems without ever really proving that that system is intelligent in some kind of mathematical way, solving machine translation perfectly or something like that, within some kind of syntactic definition of a language? Or can we just be super impressed by the thing working extremely well, and that's sufficient?
There's an old saying, and I don't know who brought it up first, which says: there's nothing more practical than a good theory. And a good theory of problem solving under limited resources, like here in this universe or on this little planet, has to take into account these limited resources. So probably we are lacking a theory, related to what we already have, these asymptotically optimal problem solvers, which tells us what we need in addition to come up with a practically optimal problem solver. So I believe we will have something like that, and maybe just a few little tiny twists are necessary to change what we already have to come up with that as well. As long as we don't have that, we admit that we are taking suboptimal ways: recurrent neural networks and long short-term memory are equipped with local search techniques, and we are happy that it works better than any competing methods, but that doesn't mean that we think we are done.
You've said that an AGI system will ultimately be a simple one: a general intelligence system will ultimately be a simple one, maybe a pseudocode of a few lines will be able to describe it. Can you talk through your intuition behind this idea, why you feel that at its core, intelligence is a simple algorithm?
Experience tells us that the stuff that works best is also simple. So the asymptotically optimal ways of solving problems, if you look at them, are just a few lines of code, it's really true. Although they have these amazing properties, they are just a few lines of code. Then the most promising and most useful practical things maybe don't have this proof of optimality associated with them; however, they are also just a few lines of code. The most successful recurrent neural networks, you can write them down in five lines of pseudocode.
That's a beautiful, almost poetic idea. But what you're describing there is that the lines of pseudocode are sitting on top of layers and layers of abstractions. So you're saying at the very top it'll be a beautifully written sort of algorithm, but do you think there are many layers of abstractions we have to first learn to construct?
We are building on all these great abstractions that people have invented over the millennia, such as matrix multiplications and real numbers and basic arithmetic and calculus and derivatives of error functions and stuff like that. So without that language, which greatly simplifies our way of thinking about these problems, we couldn't do anything. So in that sense, as always, we are standing on the shoulders of the giants who, in the past, simplified the problem of problem solving so much that now we have a chance to do the final step. So the final step will be a simple one.
If we take a step back, through all of human civilization and just the universe in general, how do you think about evolution? What if creating a universe is required to achieve this final step? What if going through the very painful and inefficient process of evolution is needed to come up with this set of abstractions that ultimately lead to intelligence? Do you think there's a shortcut, or do you think we have to create something like our universe in order to create something like human-level intelligence?
So far, the only example we have is this one, this universe in which we are living.

You think you can do better?
Maybe not, but we are part of this whole process. So, apparently, it might be the case that the code that runs the universe is really, really simple. Everything points to that possibility, because gravity and other basic forces are really simple laws that can be easily described, also in just a few lines of code, basically. And then there are these other events, the apparently random events in the history of the universe, which, as far as we know at the moment, don't have a compact code. But who knows: maybe somebody in the near future is going to figure out the pseudorandom generator which is computing whether the measurement of that spin-up-or-down thing here is going to be positive or negative.
Underlying quantum mechanics. Do you ultimately think quantum mechanics is a pseudorandom number generator? So it's all deterministic, there's no randomness in our universe. Does God play dice?
So a couple of years ago, a famous quantum physicist, Anton Zeilinger, wrote an essay in Nature, and it started more or less like that: one of the fundamental insights of the 20th century was that the universe is fundamentally random on the quantum level, and that whenever you measure spin up or down or something like that, a new bit of information enters the history of the universe. And while I was reading that, I was already typing the response, and they had to publish it, because I was right that there is no evidence, no physical evidence, for that.
So there's an alternative explanation, where everything that we consider random is actually pseudorandom, such as the decimal expansion of pi, 3.141 and so on, which looks random but isn't. So pi is interesting: every sequence of three digits appears roughly one in a thousand times, and every five-digit sequence appears roughly one in a hundred thousand times, as you would expect if it were random. But there's a very short algorithm, a short program, that computes all of that. So it's extremely compressible.
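The compressibility claim is easy to check: a few lines really do generate the whole digit stream. Below, Machin's formula (a standard identity, not something from the conversation) computes pi with Python's decimal module, and a digit count shows the roughly uniform statistics he mentions at the single-digit level.

```python
from decimal import Decimal, getcontext
from collections import Counter

def arctan_recip(x, digits):
    """arctan(1/x) via its Taylor series, to about `digits` decimal places."""
    eps = Decimal(10) ** -(digits + 5)
    power = Decimal(1) / x          # (1/x)^(2k+1), starting at k = 0
    total, k, sign = power, 1, -1
    while power > eps:
        power /= x * x
        total += sign * power / (2 * k + 1)
        k, sign = k + 1, -sign
    return total

def pi_digits(digits):
    """Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    getcontext().prec = digits + 10          # work with guard digits
    pi = 16 * arctan_recip(Decimal(5), digits) \
         - 4 * arctan_recip(Decimal(239), digits)
    return str(pi)[: digits + 2]             # "3." plus `digits` digits

# The stream looks statistically random, yet it all comes from the few
# lines above: each decimal digit occurs roughly one tenth of the time.
digs = pi_digits(2000)[2:]
counts = Counter(digs)
print(pi_digits(30))
print({d: counts[d] for d in "0123456789"})
```

Checking the three-digit-sequence statistic he cites would need far more digits (each specific triple is expected only once per thousand positions), but the principle is the same: enormous apparent randomness from a tiny program.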
And who knows, maybe tomorrow some grad student at CERN goes back over all these data points, beta decay and whatever, and figures out: oh, it's the second billion digits of pi, or something like that. We don't have any fundamental reason at the moment to believe that this is truly random and not just a deterministic video game. If it were a deterministic video game, it would be much more beautiful, because beauty is simplicity. And many of the basic laws of the universe, like gravity and the other basic forces, are very simple, so very short programs can explain what these are doing. And it would be awful and ugly, the universe would be ugly, the history of the universe would be ugly, if, for the extra things, the seemingly random data points that we get all the time, we really needed a huge number of extra bits to describe all this extra information. So as long as we don't have evidence that there is no short program that computes the entire history of the entire universe, we are, as scientists, compelled to look further for that shortest program.
Your intuition says there exists a program that can backtrack to the creation of the universe. So it can take the shortest path to the creation of the universe.

Yes, including all the entanglement things and all the spin-up-and-down measurements that have taken place since 13.8 billion years ago.
So we don't have a proof that it is random, and we don't have a proof that it is compressible to a short program. But as long as we don't have that proof, we are obliged as scientists to keep looking for that simple explanation.
So you said simplicity is beautiful, or beauty is simple. But you also work on curiosity, discovery, the romantic notion of randomness, of serendipity, of being surprised by things around you. In our poetic notion of reality, we as humans think we require randomness. So you don't find randomness beautiful; you find simple determinism beautiful.
Yes, because the explanation becomes shorter. A universe that is compressible to a short program is much more elegant and much more beautiful than one which needs an almost infinite number of bits to be described. As far as we know, many things that are happening in this universe are really simple, in terms of short programs that compute gravity and the interaction between elementary particles and so on. So all of that seems to be very, very simple. Every electron seems to reuse the same subprogram all the time, as it is interacting with other elementary particles. If we now require an extra oracle injecting new bits of information all the time for these extra things which are currently not understood, such as beta decay, then the whole description length of the data that we can observe, of the history of the universe, would become much longer, and therefore uglier. Again, simplicity is elegant and beautiful. All the history of science is a history of compression progress.
So you've described how we build up abstractions, and you've talked about the idea of compression. How do you see the history of science, the history of humanity, our civilization, and life on Earth as some kind of path towards greater and greater compression? What do you mean by that? How do you think about that?
Indeed, the history of science is a history of compression progress. What does that mean? Hundreds of years ago, there was an astronomer whose name was Kepler, and he looked at the data points that he got by watching planets move. And he had all these data points, and suddenly it turned out that he could greatly compress the data by predicting it through an ellipse law: it turns out that all these data points are more or less on ellipses around the sun. And another guy came along, whose name was Newton, and before him Hooke, and they said: the same thing that is making these planets move like that is what makes the apples fall down. And it also holds for stones and for all kinds of other objects. And suddenly, many, many of these observations became much more compressible, because as long as you can predict the next thing, given what you have seen so far, you can compress it; you don't have to store that data extra. This is called predictive coding.
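Predictive coding as described here, storing only what the predictions miss, can be sketched on a toy falling-apple "video": positions quadratic in time stand in for frames, and each frame is predicted by linear extrapolation from the previous two. The residual stream is nearly constant, so it compresses far better than the raw data. The numbers and encoding are illustrative stand-ins.

```python
import zlib

# Toy "video": positions of a falling apple sampled at t = 0, 1, 2, ...
# (quadratic in t, like constant-acceleration motion).
frames = [round(0.5 * 9.81 * t * t) for t in range(1000)]

# Predictive coding: predict each frame by linear extrapolation from the
# previous two, and store only the prediction errors (residuals).
residuals = frames[:2] + [
    frames[t] - (2 * frames[t - 1] - frames[t - 2])
    for t in range(2, len(frames))
]

def stored_size(values):
    """Bytes needed after generic compression of the value stream."""
    return len(zlib.compress(",".join(map(str, values)).encode()))

raw_size = stored_size(frames)
coded_size = stored_size(residuals)
print(raw_size, coded_size)   # the residual stream compresses much better
```

The gap between the two sizes is exactly his "compression progress": the predictor turned a long description into a short one.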
And then there was still something wrong with that theory of the universe; you had deviations from the predictions of the theory. And 300 years later, another guy came along, whose name was Einstein, and he was able to explain away all these deviations from the predictions of the old theory through a new theory, which was called the general theory of relativity, which at first glance looks a little bit more complicated, and you have to warp space and time, but you can phrase it within one single sentence, which is: no matter how fast you accelerate and how hard you decelerate, and no matter what the gravity is in your local framework, light speed always looks the same. And from that you can calculate all the consequences. So it's a very simple thing, and it allows you to further compress all the observations, because suddenly there are hardly any deviations any longer that you can measure from the predictions of this new theory.
So the history of science is a history of compression progress. You never arrive immediately at the shortest explanation of the data, but you're making progress. Whenever you are making progress, you have an insight. You see: oh, first I needed so many bits of information to describe the data, to describe my falling apples, my video of falling apples, so many pixels had to be stored. But then suddenly I realize: no, there is a very simple way of predicting the third frame in the video from the first two. And maybe not every little detail can be predicted, but more or less, most of these orange blobs that are coming down move in the same way, which means that I can greatly compress the video. And the amount of compression progress, that is the depth of the insight that you have at that moment. That's the fun that you have, the scientific fun, the fun in that discovery. And we can build artificial systems that do the same thing. They measure the depth of their insights as they are looking at the data through their own experiments, and we give them a reward, an intrinsic reward, in proportion to this depth of insight. And since they are trying to maximize the rewards they get, they are suddenly motivated to come up with new action sequences, with new experiments that have the property that the data coming in as a consequence of these experiments lets them learn something, see a pattern in there which they hadn't seen yet before.
So there's this idea of PowerPlay that you've described: training a general problem solver in this kind of way of looking for the unsolved problems. Can you describe that idea a little further?
It's another very simple idea. Normally, what you do in computer science is: some guy gives you a problem, and then there is a huge search space of potential solution candidates, and you somehow try them out, and you have more or less sophisticated ways of moving around in that search space until you finally find a solution which you consider satisfactory. That's what most of computer science is about. PowerPlay just goes one little step further and says: let's not only search for solutions to a given problem, but let's search through pairs of problems and their solutions, where the system itself has the opportunity to phrase its own problem. So we are suddenly looking at pairs of problems and their solutions, or modifications of the problem solver that is supposed to generate a solution to that new problem. And this additional degree of freedom allows us to build curious systems that are like scientists, in the sense that they not only try to find answers to existing questions; no, they are also free to pose their own questions. So if you want to build an artificial scientist, you have to give it that freedom, and PowerPlay is exactly doing that.
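A cartoon of the PowerPlay loop, with everything collapsed to the simplest possible stand-ins: problems are target numbers, the "solver" is a single capability bound, and a (problem, solver-modification) pair is accepted only if the new problem is gained without forgetting the repertoire. This is illustrative only, not the formal framework.

```python
# Toy PowerPlay loop: search pairs (new problem, solver modification),
# preferring the *easiest* problem the current solver cannot handle yet,
# and never forgetting previously solved problems.

def solves(capability, problem):
    return problem <= capability

capability = 0            # the entire "problem solver" in this cartoon
repertoire = []           # problems the solver has been verified to solve
candidate_problems = [5, 1, 9, 2, 7, 3]

for _ in range(10):
    unsolved = sorted(p for p in candidate_problems
                      if not solves(capability, p))
    if not unsolved:
        break
    new_problem = unsolved[0]            # easiest problem beyond the horizon
    new_capability = new_problem         # proposed solver modification
    # Accept the pair only if nothing in the repertoire is forgotten.
    if all(solves(new_capability, p) for p in repertoire):
        capability = new_capability
        repertoire.append(new_problem)

print(repertoire)   # problems added easiest-first: [1, 2, 3, 5, 7, 9]
```

The two hallmarks of the scheme survive even at this scale: the system invents its own next problem (the easiest one it cannot yet solve), and every accepted modification is checked against the whole repertoire.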
So that's a dimension of freedom that's important to have. But how hard do you think it is, how multidimensional and difficult is the space of coming up with your own questions? It's one of the things that we as human beings consider to make us special; the intelligence that makes us special is that brilliant insight that can create something totally new.
Yes. So now let's look at the extreme case. Let's look at the set of all possible problems that you can formally describe, which is infinite. Which should be the next problem that a scientist or PowerPlay is going to solve? Well, it should be the easiest problem that goes beyond what you already know. So it should be the simplest problem that the current problem solver, which can already solve 100 problems, cannot solve yet by just generalizing. So it has to be new. It has to require a modification of the problem solver, such that the new problem solver can solve this new thing, but the old problem solver cannot do it. And in addition to that, we have to make sure that the problem solver doesn't forget any of the previous solutions. So, by definition, PowerPlay is always trying to search in the set of pairs of problems and problem solver modifications for a combination that minimizes the time to achieve these criteria. PowerPlay is trying to find the problem which is easiest to add to the repertoire.
So, just like grad students and academics and researchers can spend their whole career stuck in a local minimum, trying to come up with interesting questions but ultimately doing very little: do you think it's easy, in this approach of looking for the simplest unsolved problem, to get stuck in a local minimum, never really discovering something new, never really jumping outside of the hundred problems that you've already solved in a genuinely creative way?
No, because it's the nature of PowerPlay that it's always trying to break its current generalization abilities by coming up with a new problem which is beyond the current horizon: just shifting the horizon of knowledge a little bit out there, breaking the existing rules, such that the new thing becomes solvable but wasn't solvable by the old thing. So it's like adding a new axiom, like what Gödel did when he came up with these new sentences, new theorems, that didn't have a proof in the formal system, which means you can add them to the repertoire, hoping that they are not going to damage the consistency of the whole thing.
So in the paper with the amazing title, Formal Theory of Creativity, Fun, and Intrinsic Motivation, you talk about discovery as intrinsic reward. So if you view humans as intelligent agents, what do you think is the purpose and meaning of life for us humans? You've talked about this discovery. Do you see humans as an instance of PowerPlay agents?
Yeah, so humans are curious, and that means they behave like scientists. Not only the official scientists, but even the babies behave like scientists, and they play around with their toys to figure out how the world works and how it is responding to their actions. And that's how they learn about gravity and everything. And in 1990, we had the first systems like that, which would just try to play around with the environment and come up with situations that go beyond what they knew at that time, and then get a reward for creating these situations, and then becoming more general problem solvers and being able to understand more of the world.
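The 1990-style curiosity signal he describes, reward in proportion to the improvement of the agent's own world model, can be sketched with a toy linear model. The hidden "world", the model class, and the learning rate are all made-up stand-ins.

```python
# Curiosity as compression progress: the intrinsic reward is the
# *improvement* of the world-model's prediction error, not the error
# itself. Entirely illustrative.

def world(x):
    return 2 * x + 1        # hidden regularity the agent can discover

# Linear world model y = a*x + b, learned online.
a, b, lr = 0.0, 0.0, 0.005
rewards = []
for step in range(200):
    x = step % 10
    y = world(x)
    error_before = (y - (a * x + b)) ** 2
    # One learning step on the model.
    grad = 2 * ((a * x + b) - y)
    a -= lr * grad * x
    b -= lr * grad
    error_after = (y - (a * x + b)) ** 2
    # Intrinsic reward = prediction-error improvement on this observation.
    rewards.append(error_before - error_after)

print(sum(rewards[:20]), sum(rewards[-20:]))
```

Early on, rewards are large because the model is improving rapidly; once the regularity is learned, the rewards dry up, which is exactly the "boredom" that pushes such an agent toward new experiments.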
So I think, in principle, that curiosity strategy, or more sophisticated versions of what I just described, is what we have built in as well, because evolution discovered that it's a good way of exploring the unknown world, and a guy who explores the unknown world has a higher chance of solving problems that he needs to survive in this world. On the other hand, those guys who were too curious, they were weeded out as well. So you have to find the trade-off, and evolution found a certain trade-off. Apparently, in our society, there is a certain percentage of extremely explorative guys, and it doesn't matter if they die, because many of the others are more conservative. And so, yeah, it would be surprising to me if that principle of artificial curiosity weren't present, in almost exactly the same form, here in our brains.
link |
So you're a bit of a musician and an artist.
link |
So continuing on this topic of creativity,
link |
what do you think is the role of creativity in intelligence?
link |
So you've kind of implied that it's essential for intelligence,
link |
if you think of intelligence as a problem solving system,
link |
as the ability to solve problems.
link |
But do you think it's essential, this idea of creativity?
link |
We never have a subprogram that is called creativity or something.
link |
It's just a side effect of what our problem solvers always do.
link |
They are searching a space of candidates, of solution candidates,
link |
until they hopefully find a solution to a given problem.
link |
But then there are these two types of creativity
link |
and both of them are now present in our machines.
link |
The first one has been around for a long time,
link |
which is human gives problem to machine.
link |
Machine tries to find a solution to that.
link |
And this has been happening for many decades.
link |
And for many decades, machines have found creative solutions
link |
to interesting problems where humans were not aware
link |
of these particularly creative solutions,
link |
but then appreciated that the machine found that.
link |
The second is the pure creativity.
link |
What I just mentioned, I would call the applied creativity,
link |
like applied art, where somebody tells you,
link |
now make a nice picture of this pope,
link |
and you will get money for that.
link |
So here is the artist and he makes a convincing picture of the pope
link |
and the pope likes it and gives him the money.
link |
And then there is the pure creativity,
link |
which is more like PowerPlay and the artificial curiosity thing,
link |
where you have the freedom to select your own problem,
link |
like a scientist who defines his own question to study.
link |
And so that is the pure creativity, if you will,
link |
as opposed to the applied creativity, which serves another.
link |
In that distinction, there's almost echoes of narrow AI versus general AI.
link |
So this kind of constrained painting of a pope seems like
link |
the approach of what people are calling narrow AI.
link |
And pure creativity seems to be,
link |
maybe I'm just biased as a human,
link |
but it seems to be an essential element of human level intelligence.
link |
Is that what you're implying?
link |
If you zoom back a little bit and you just look at a general problem solving machine,
link |
which is trying to solve arbitrary problems,
link |
then this machine will figure out in the course of solving problems
link |
that it's good to be curious.
link |
So all of what I said just now about this prewired curiosity
link |
and this will to invent new problems that the system doesn't know how to solve yet,
link |
should be just a byproduct of the general search.
link |
However, apparently evolution has built it into us
link |
because it turned out to be so successful, a prewiring, a bias,
link |
a very successful exploratory bias that we are born with.
link |
And you've also said that consciousness in the same kind of way
link |
may be a byproduct of problem solving.
link |
Do you find this an interesting byproduct?
link |
Do you think it's a useful byproduct?
link |
What are your thoughts on consciousness in general?
link |
Or is it simply a byproduct of greater and greater capabilities of problem solving
link |
that's similar to creativity in that sense?
link |
We never have a procedure called consciousness in our machines.
link |
However, we get side effects of what these machines are doing,
link |
things that seem to be closely related to what people call consciousness.
link |
So for example, already in 1990 we had simple systems
link |
which were basically recurrent networks and therefore universal computers
link |
trying to map incoming data into actions that lead to success.
link |
Maximizing reward in a given environment: whenever the battery is low
link |
and negative signals are coming from the battery,
link |
always find the charging station in time without bumping against painful obstacles on the way.
link |
So complicated things but very easily motivated.
link |
And then we give these little guys a separate recurrent network
link |
which is just predicting what's happening if I do that and that.
link |
What will happen as a consequence of these actions that I'm executing
link |
and it's just trained on the long and long history of interactions with the world.
link |
So it becomes a predictive model of the world basically.
link |
And therefore also a compressor of the observations of the world
link |
because whatever you can predict, you don't have to store extra.
link |
So compression is a side effect of prediction.
link |
And how does this recurrent network compress?
link |
Well, it's inventing little subprograms, little subnetworks
link |
that stand for everything that frequently appears in the environment.
link |
Like bottles and microphones and faces, maybe lots of faces in my environment.
link |
So I'm learning to create something like a prototype face
link |
and a new face comes along and all I have to encode are the deviations from the prototype.
link |
So it's compressing all the time the stuff that frequently appears.
link |
There's one thing that appears all the time
link |
that is present all the time when the agent is interacting with its environment,
link |
which is the agent itself.
link |
So just for data compression reasons,
link |
it is extremely natural for this recurrent network
link |
to come up with little subnetworks that stand for the properties of the agents,
link |
the hand, the other actuators,
link |
and all the stuff that you need to better encode the data,
link |
which is influenced by the actions of the agent.
link |
So there, just as a side effect of data compression during problem solving,
link |
you have internal self models.
link |
Now you can use this model of the world to plan your future.
link |
And that's what we also have done since 1990.
link |
So the recurrent network, which is the controller,
link |
which is trying to maximize reward,
link |
can use this model network of the world,
link |
this predictive model of the world
link |
to plan ahead and say, let's not do this action sequence.
link |
Let's do this action sequence instead
link |
because it leads to more predicted rewards.
link |
And whenever it's waking up these little subnetworks that stand for itself,
link |
then it's thinking about itself.
link |
And it's exploring mentally the consequences of its own actions.
link |
And now you tell me, what is still missing?
link |
Missing the gap to consciousness.
link |
There isn't. That's a really beautiful idea that, you know,
link |
if life is a collection of data
link |
and life is a process of compressing that data to act efficiently.
link |
In that data, you yourself appear very often.
link |
So it's useful to form compressions of yourself.
link |
And it's a really beautiful formulation of what consciousness is,
link |
is a necessary side effect.
link |
It's actually quite compelling to me.
link |
You've described RNNs, and developed LSTMs, long short term memory networks.
link |
They're a type of recurrent neural networks.
link |
They've gotten a lot of success recently.
link |
So these are networks that model the temporal aspects in the data,
link |
temporal patterns in the data.
link |
And you've called them the deepest of the neural networks, right?
link |
What do you think is the value of depth in the models that we use to learn?
link |
Yeah, since you mentioned the long short term memory and the LSTM,
link |
I have to mention the names of the brilliant students who made that possible.
link |
Yes, of course, of course.
link |
First of all, my first student ever, Sepp Hochreiter,
link |
who had fundamental insights already in his diploma thesis.
link |
Then Felix Gers, who had additional important contributions.
link |
Alex Graves is a guy from Scotland who is mostly responsible for this CTC algorithm,
link |
which is now often used to train the LSTM to do the speech recognition
link |
on all the Google Android phones and whatever, and Siri and so on.
link |
So these guys, without these guys, I would be nothing.
link |
It's a lot of incredible work.
link |
What is now the depth?
link |
What is the importance of depth?
link |
Well, most problems in the real world are deep
link |
in the sense that the current input doesn't tell you all you need to know
link |
about the environment.
link |
So instead, you have to have a memory of what happened in the past
link |
and often important parts of that memory are dated.
link |
They are pretty old.
link |
So when you're doing speech recognition, for example,
link |
and somebody says 11,
link |
then that's about half a second or something like that,
link |
which means it's already 50 time steps.
link |
And another guy or the same guy says 7.
link |
So the ending is the same, "even."
link |
But now the system has to see the distinction between 7 and 11,
link |
and the only way it can see the difference is it has to store
link |
that 50 steps ago there was an "s" or an "el," seven or eleven.
link |
So there you have already a problem of depth 50,
link |
because for each time step you have something like a virtual layer
link |
and the expanded, unrolled version of this recurrent network
link |
which is doing the speech recognition.
link |
So these long time lags, they translate into problem depth.
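The way time steps become virtual layers can be seen in a minimal unrolled recurrent network. The scalar weights below are arbitrary toy values, not a trained system:

```python
import math

# Unrolling a minimal recurrent net: ONE set of shared weights applied
# once per time step, so a 50-step input behaves like a 50-layer network.
# That is the sense in which long time lags translate into problem depth.
# The scalar weights are arbitrary toy values, not a trained system.

W_IN, W_REC = 0.5, 0.9

def rnn_forward(inputs):
    h = 0.0
    for x in inputs:                 # each time step = one "virtual layer"
        h = math.tanh(W_IN * x + W_REC * h)
    return h

# Two inputs differing only at the first step, like the "s" of seven
# versus the "el" of eleven, followed by 49 identical "...even" steps.
seven_like = [1.0] + [0.2] * 49
eleven_like = [-1.0] + [0.2] * 49

gap = abs(rnn_forward(seven_like) - rnn_forward(eleven_like))
print(gap)
# With |W_REC| < 1 the first-step difference shrinks at every one of the
# 50 virtual layers, so the two sequences end up nearly indistinguishable:
# the vanishing effect that LSTM's gating was designed to overcome.
```

Running this shows the final states of the two sequences agreeing to many decimal places, which is exactly why a plain RNN struggles to carry the seven/eleven distinction across 50 steps.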
link |
And most problems in this world are such that you really
link |
have to look far back in time to understand what the problem is.
link |
But just like with LSTMs, you don't necessarily need to,
link |
when you look back in time, remember every aspect.
link |
You just need to remember the important aspects.
link |
The network has to learn to put the important stuff into memory
link |
and to ignore the unimportant noise.
link |
But in that sense, deeper and deeper is better?
link |
Or is there a limitation?
link |
I mean LSTM is one of the great examples of architectures
link |
that do something beyond just deeper and deeper networks.
link |
There's clever mechanisms for filtering data for remembering and forgetting.
link |
So do you think that kind of thinking is necessary?
link |
If you think about LSTMs as a leap, a big leap forward
link |
over traditional vanilla RNNs, what do you think is the next leap
link |
within this context?
link |
So LSTM is a very clever improvement, but LSTMs still don't
link |
have the same kind of ability to see far back in the past
link |
as humans do, with credit assignment reaching way back,
link |
not just 50 time steps or 100 or 1,000, but millions and billions.
link |
It's not clear what are the practical limits of the LSTM
link |
when it comes to looking back.
link |
Already in 2006, I think, we had examples where it not only
link |
looked back tens of thousands of steps, but really millions of steps.
link |
And Juan Antonio Pérez Ortiz in my lab, I think, was the first author of a paper
link |
where we really, was it 2006 or something, had examples where it
link |
learned to look back for more than 10 million steps.
link |
So for most problems of speech recognition, it's not
link |
necessary to look that far back, but there are examples where it does.
link |
Now, the looking back thing, that's rather easy because there is only
link |
one past, but there are many possible futures.
link |
And so a reinforcement learning system, which is trying to maximize
link |
its future expected reward and doesn't know yet which of these
link |
many possible futures should I select, given this one single past,
link |
is facing problems that the LSTM by itself cannot solve.
link |
So the LSTM is good for coming up with a compact representation
link |
of the history so far, of the history and of observations and actions so far.
link |
But now, how do you plan in an efficient and good way among all these,
link |
how do you select one of these many possible action sequences
link |
that a reinforcement learning system has to consider to maximize
link |
reward in this unknown future.
link |
So again, we have this basic setup where you have one recurrent network,
link |
which gets in the video and the speech and whatever, and it's
link |
executing the actions and it's trying to maximize reward.
link |
So there is no teacher who tells it what to do at which point in time.
link |
And then there's the other network, which is just predicting
link |
what's going to happen if I do that and that.
link |
And that could be an LSTM network, and it learns to look back
link |
all the way to make better predictions of the next time step.
link |
So essentially, although it's predicting only the next time step,
link |
it is motivated to learn to put into memory something that happened
link |
maybe a million steps ago because it's important to memorize that
link |
if you want to predict that at the next time step, the next event.
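The model network's training signal, pure next-step prediction over the interaction history, can be sketched with a toy linear model in place of the LSTM. The dynamics and all names below are hypothetical illustration, not the actual system:

```python
# The model network M trains purely on next-step prediction over the
# history of interactions: given (observation, action), predict the next
# observation. A toy linear model stands in for the LSTM here, and the
# dynamics next_obs = obs + 2 * action are a hypothetical example.

def train_model(history, lr=0.01, epochs=2000):
    """history: list of (obs, action, next_obs) transitions."""
    w_obs, w_act, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for obs, act, nxt in history:
            pred = w_obs * obs + w_act * act + b
            err = pred - nxt
            # stochastic gradient step on the squared prediction error
            w_obs -= lr * err * obs
            w_act -= lr * err * act
            b -= lr * err
    return w_obs, w_act, b

# Interaction history generated by the hypothetical dynamics.
history = [(o, a, o + 2 * a) for o in range(5) for a in (-1, 0, 1)]
w_obs, w_act, b = train_model(history)
print(round(w_obs, 2), round(w_act, 2), round(b, 2))  # should approach 1 2 0
```

No teacher labels the actions; the only target is the world's own next observation, which is what makes this training signal so cheap to obtain.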
link |
Now, how can a model of the world like that,
link |
a predictive model of the world, be used by the first guy?
link |
Let's call them the controller and the model.
link |
How can the model be used by the controller to efficiently select
link |
among these many possible futures?
link |
The naive way we had about 30 years ago was
link |
let's just use the model of the world as a stand in,
link |
as a simulation of the world.
link |
And millisecond by millisecond we plan the future
link |
and that means we have to roll it out really in detail
link |
and it will work only if the model is really good
link |
and it will still be inefficient
link |
because we have to look at all these possible futures
link |
and there are so many of them.
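The naive planning scheme just described, rolling every candidate action sequence out in the model in detail, might be sketched like this. The dynamics and reward are hypothetical toy stand-ins:

```python
import itertools

# Naive model-based planning as described above: use the learned model M
# as a stand-in simulation of the world, roll out every candidate action
# sequence in detail, and keep the one with the highest predicted
# cumulative reward. The dynamics and reward below are hypothetical toy
# stand-ins: state is a position, and predicted reward peaks at 10.

def model(state, action):
    """Predictive model M: maps (state, action) to (next_state, reward)."""
    next_state = state + action           # actions are -1, 0, +1 steps
    reward = -abs(10 - next_state)        # closer to 10 = more reward
    return next_state, reward

def plan(state, horizon=5):
    """Exhaustively roll out all 3**horizon possible futures in the model."""
    best_seq, best_return = None, float("-inf")
    for seq in itertools.product([-1, 0, 1], repeat=horizon):
        s, ret = state, 0.0
        for a in seq:                     # step-by-step rollout in the model
            s, r = model(s, a)
            ret += r
        if ret > best_return:
            best_seq, best_return = list(seq), ret
    return best_seq

print(plan(7))  # heads toward 10, then stays: [1, 1, 1, 0, 0]
```

The inefficiency being pointed out is visible in the 3**horizon enumeration: the number of possible futures explodes with the horizon, which is why the later controller-model systems instead let the controller learn which parts of the model are worth querying.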
link |
So instead, what we do now since 2015 in our CM systems,
link |
controller model systems, we give the controller the opportunity
link |
to learn by itself how to use the potentially relevant parts
link |
of the model network to solve new problems more quickly.
link |
And if it wants to, it can learn to ignore the M
link |
and sometimes it's a good idea to ignore the M
link |
because it's really bad, it's a bad predictor
link |
in this particular situation of life
link |
where the controller is currently trying to maximize reward.
link |
However, it can also learn to address and exploit
link |
some of the subprograms that came about in the model network
link |
through compressing the data by predicting it.
link |
So it now has an opportunity to reuse that code,
link |
the algorithmic information in the model network
link |
to reduce its own search space,
link |
search that it can solve a new problem more quickly
link |
than without the model.
link |
So you're ultimately optimistic and excited
link |
about the power of reinforcement learning
link |
in the context of real systems.
link |
So you see RL as a potential having a huge impact
link |
beyond just, sort of, the M part, which is often developed
link |
with supervised learning methods.
link |
You see RL, for problems of self driving cars
link |
or any kind of applied robotics,
link |
as the correct, interesting direction of research?
link |
We have a company called NNAISENSE,
link |
which has applied reinforcement learning to little Audis.
link |
Which learn to park without a teacher.
link |
The same principles were used, of course.
link |
So these little Audis, they are small, maybe like that,
link |
so much smaller than the real Audis.
link |
But they have all the sensors that you find in the real Audis.
link |
You find the cameras, the LIDAR sensors.
link |
They go up to 120 kilometers an hour if they want to.
link |
And they have pain sensors, basically.
link |
And they don't want to bump against obstacles and other Audis.
link |
And so they must learn like little babies to park.
link |
Take the raw vision input and translate that into actions
link |
that lead to successful parking behavior,
link |
which is a rewarding thing.
link |
And yes, they learn that.
link |
So we have examples like that.
link |
And it's only in the beginning.
link |
This is just a tip of the iceberg.
link |
And I believe the next wave of AI is going to be all about that.
link |
So at the moment, the current wave of AI is about
link |
passive pattern observation and prediction.
link |
And that's what you have on your smartphone
link |
and what the major companies on the Pacific Rim are using
link |
to sell you ads to do marketing.
link |
That's the current sort of profit in AI.
link |
And that's only one or two percent of the world economy,
link |
which is big enough to make these companies
link |
pretty much the most valuable companies in the world.
link |
But there's a much, much bigger fraction of the economy
link |
going to be affected by the next wave,
link |
which is really about machines that shape the data
link |
through their own actions.
link |
Do you think simulation is ultimately the biggest way
link |
that those methods will be successful in the next 10, 20 years?
link |
We're not talking about 100 years from now.
link |
We're talking about sort of the near term impact of RL.
link |
Do you think really good simulation is required?
link |
Or is there other techniques like imitation learning,
link |
observing other humans operating in the real world?
link |
Where do you think this success will come from?
link |
So at the moment we have a tendency of using
link |
physics simulations to learn behavior for machines
link |
that learn to solve problems that humans also do not know how to solve.
link |
However, this is not the future,
link |
because the future is in what little babies do.
link |
They don't use a physics engine to simulate the world.
link |
They learn a predictive model of the world,
link |
which maybe sometimes is wrong in many ways,
link |
but captures all kinds of important abstract high level predictions
link |
which are really important to be successful.
link |
And that's what was the future 30 years ago
link |
when we started that type of research,
link |
but it's still the future,
link |
and now we know much better how to move forward
link |
and to really make working systems based on that,
link |
where you have a learning model of the world,
link |
a model of the world that learns to predict what's going to happen
link |
if I do that and that,
link |
and then the controller uses that model
link |
to more quickly learn successful action sequences.
link |
And then of course always this curiosity thing,
link |
in the beginning the model is stupid,
link |
so the controller should be motivated
link |
to come up with experiments, with action sequences
link |
that lead to data that improve the model.
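One common way to formalize this curiosity signal, in the spirit of the description above, is to reward the controller for the model's learning progress. The running-mean predictor below is a deliberately crude, hypothetical stand-in for the recurrent world model:

```python
# Artificial curiosity in the spirit described above: the controller
# receives an intrinsic reward equal to the model's learning progress,
# i.e. how much prediction error shrinks on the data an experiment
# produced. The running-mean predictor is a deliberately crude stand-in
# for the recurrent world model.

class RunningMeanModel:
    """Predicts every observation as the mean of everything seen so far."""
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def error(self, obs):
        return (obs - self.mean) ** 2
    def update(self, obs):
        self.n += 1
        self.mean += (obs - self.mean) / self.n

def intrinsic_reward(model, obs):
    before = model.error(obs)   # surprise before learning
    model.update(obs)
    after = model.error(obs)    # surprise after learning
    return before - after       # reward = improvement of the predictor

model = RunningMeanModel()
model.update(5.0)                      # give the model some history
boring = [5.0] * 5                     # perfectly predictable data
novel = [5.0, 9.0, 1.0]                # surprising but learnable data
print([round(intrinsic_reward(model, x), 2) for x in boring])  # all 0.0
print([round(intrinsic_reward(model, x), 2) for x in novel])
# Already-predictable data earns no intrinsic reward, so a
# reward-maximizing controller is pushed toward experiments whose
# outcomes still improve the model.
```

This is why a stupid early model is not a problem: a stupid model means large potential learning progress, so the controller is driven precisely toward the experiments that fix it.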
link |
Do you think improving the model,
link |
constructing an understanding of the world in this connectionist way,
link |
as the now popular, successful approaches
link |
grounded in neural networks do, is the right way?
link |
In the 80s, with expert systems, there were symbolic AI approaches,
link |
which to us humans are more intuitive
link |
in the sense that it makes sense that you build up knowledge
link |
in this knowledge representation.
link |
What kind of lessons can we draw into our current approaches
link |
from expert systems, from symbolic AI?
link |
So I became aware of all of that in the 80s
link |
and back then logic programming was a huge thing.
link |
Was it inspiring to yourself that you find it compelling
link |
that a lot of your work was not so much in that realm,
link |
is more in the learning systems?
link |
Yes and no, but we did all of that.
link |
So my first publication ever actually was 1987,
link |
was the implementation of a genetic algorithm,
link |
a genetic programming system, in Prolog.
link |
So Prolog, that's what you learn back then,
link |
which is a logic programming language,
link |
and the Japanese, they had this huge fifth generation AI project,
link |
which was mostly about logic programming back then,
link |
although neural networks existed and were well known back then,
link |
and deep learning has existed since 1965,
link |
since this guy in Ukraine, Ivakhnenko, started it,
link |
but the Japanese and many other people,
link |
they focused really on this logic programming,
link |
and I was influenced to the extent that I said,
link |
okay, let's take these biologically inspired algorithms
link |
like evolution, programs,
link |
and implement that in the language which I know,
link |
which was Prolog, for example, back then.
link |
And then in many ways this came back later,
link |
because the Gödel machine, for example,
link |
has a proof searcher on board,
link |
and without that it would not be optimal.
link |
Well, Markus Hutter's universal algorithm
link |
for solving all well defined problems
link |
has a proof search on board,
link |
so that's very much logic programming.
link |
Without that it would not be asymptotically optimal.
link |
But then on the other hand, because we are very pragmatic guys also,
link |
we focused on recurrent neural networks
link |
and suboptimal stuff such as gradient based search
link |
in program space rather than provably optimal things.
link |
So logic programming certainly has a usefulness
link |
when you're trying to construct something provably optimal
link |
or provably good or something like that,
link |
but is it useful for practical problems?
link |
It's really useful for theorem proving.
link |
The best theorem provers today are not neural networks.
link |
No, they are logic programming systems
link |
that are much better theorem provers than most math students
link |
in the first or second semester.
link |
But for reasoning, for playing games of Go, or chess,
link |
or for robots, autonomous vehicles that operate in the real world,
link |
or object manipulation, you think learning...
link |
Yeah, as long as the problems have little to do
link |
with theorem proving themselves,
link |
then as long as that is not the case,
link |
you just want to have better pattern recognition.
link |
So to build a self driving car, you want to have better pattern recognition
link |
and pedestrian recognition and all these things,
link |
and you want to minimize the number of false positives,
link |
which is currently slowing down self driving cars in many ways.
link |
And all of that has very little to do with logic programming.
link |
What are you most excited about in terms of directions
link |
of artificial intelligence at this moment in the next few years,
link |
in your own research and in the broader community?
link |
So I think in the not so distant future,
link |
we will have for the first time little robots that learn like kids.
link |
And I will be able to say to the robot,
link |
look here robot, we are going to assemble a smartphone.
link |
Let's take this slab of plastic and the screwdriver
link |
and let's screw in the screw like that.
link |
No, not like that, like that.
link |
Not like that, like that.
link |
And I don't have a data glove or something.
link |
He will see me and he will hear me
link |
and he will try to do something with his own actuators,
link |
which will be really different from mine,
link |
but he will understand the difference
link |
and will learn to imitate me but not in the supervised way
link |
where a teacher is giving target signals for all his muscles all the time.
link |
No, by doing this high level imitation
link |
where he first has to learn to imitate me
link |
and to interpret these additional noises coming from my mouth
link |
as helpful signals to do that pattern.
link |
And then it will by itself come up with faster ways
link |
and more efficient ways of doing the same thing.
link |
And finally, I stop his learning algorithm
link |
and make a million copies and sell it.
link |
And so at the moment this is not possible,
link |
but we already see how we are going to get there.
link |
And you can imagine to the extent that this works economically and cheaply,
link |
it's going to change everything.
link |
Almost all our production is going to be affected by that.
link |
And a much bigger wave,
link |
a much bigger AI wave is coming
link |
than the one that we are currently witnessing,
link |
which is mostly about passive pattern recognition on your smartphone.
link |
This is about active machines that shape data through the actions they are executing
link |
and they learn to do that in a good way.
link |
So many of the traditional industries are going to be affected by that.
link |
All the companies that are building machines
link |
will equip these machines with cameras and other sensors
link |
and they are going to learn to solve all kinds of problems.
link |
Through interaction with humans, but also a lot on their own
link |
to improve what they already can do.
link |
And lots of old economy is going to be affected by that.
link |
And in recent years I have seen that old economy is actually waking up
link |
and realizing that this is the case.
link |
Are you optimistic about that future? Are you concerned?
link |
There's a lot of people concerned in the near term about the transformation
link |
of the nature of work.
link |
The kind of ideas that you just suggested
link |
would have a significant impact on what kind of things could be automated.
link |
Are you optimistic about that future?
link |
Are you nervous about that future?
link |
And looking a little bit farther into the future, there's people like Elon Musk
link |
and Stuart Russell concerned about the existential threats of that future.
link |
So in the near term, job loss in the long term existential threat,
link |
are these concerns to you or are you ultimately optimistic?
link |
So let's first address the near future.
link |
We have had predictions of job losses for many decades.
link |
For example, when industrial robots came along,
link |
many people predicted that lots of jobs are going to get lost.
link |
And in a sense, they were right,
link |
because back then there were car factories
link |
and hundreds of people in these factories assembled cars.
link |
And today the same car factories have hundreds of robots
link |
and maybe three guys watching the robots.
link |
On the other hand, those countries that have lots of robots per capita,
link |
Japan, Korea, Germany, Switzerland, a couple of other countries,
link |
they have really low unemployment rates.
link |
Somehow all kinds of new jobs were created.
link |
Back then nobody anticipated those jobs.
link |
And decades ago, I already said,
link |
it's really easy to say which jobs are going to get lost,
link |
but it's really hard to predict the new ones.
link |
30 years ago, who would have predicted all these people
link |
making money as YouTube bloggers, for example?
link |
200 years ago, 60% of all people used to work in agriculture.
link |
But still, only, I don't know, 5% unemployment.
link |
Lots of new jobs were created.
link |
And Homo Ludens, the playing man,
link |
is inventing new jobs all the time.
link |
Most of these jobs are not existentially necessary
link |
for the survival of our species.
link |
There are only very few existentially necessary jobs
link |
such as farming and building houses and warming up the houses,
link |
but less than 10% of the population is doing that.
link |
And most of these newly invented jobs are about interacting with other people
link |
in new ways, through new media and so on,
link |
getting new types of kudos and forms of likes and whatever,
link |
and even making money through that.
link |
So, Homo Ludens, the playing man, doesn't want to be unemployed,
link |
and that's why he's inventing new jobs all the time.
link |
And he keeps considering these jobs as really important
link |
and is investing a lot of energy and hours of work into those new jobs.
link |
That's quite beautifully put.
link |
We're really nervous about the future
link |
because we can't predict what kind of new jobs will be created.
link |
But you're ultimately optimistic that we humans are so restless
link |
that we create and give meaning to newer and newer jobs,
link |
telling you things that get likes on Facebook
link |
or whatever the social platform is.
link |
So, what about long term existential threat of AI
link |
where our whole civilization may be swallowed up
link |
by this ultra super intelligent systems?
link |
Maybe it's not going to be swallowed up,
link |
but I'd be surprised if we humans were the last step
link |
in the evolution of the universe.
link |
You've actually had this beautiful comment somewhere
link |
that I've seen, quite insightful, saying that artificial
link |
intelligence systems
link |
just like us humans will likely not want to interact with humans.
link |
They'll just interact amongst themselves,
link |
just like ants interact amongst themselves
link |
and only tangentially interact with humans.
link |
And it's quite an interesting idea that once we create AGI
link |
it will lose interest in humans
link |
and will compete for their own Facebook likes
link |
and their own social platforms.
link |
So, within that quite elegant idea,
link |
how do we know in a hypothetical sense
link |
that there's not already intelligent systems out there?
link |
How do you think broadly of general intelligence
link |
greater than us, how do we know it's out there?
link |
How do we know it's around us and could it already be?
link |
I'd be surprised if within the next few decades
link |
or something like that we won't have AIs
link |
that are truly smart in every single way
link |
and better problem solvers in almost every single important way.
link |
And I'd be surprised if they wouldn't realize
link |
what we have realized a long time ago,
link |
which is that almost all physical resources are not here
link |
in this biosphere, but out there in the rest of the solar system,
link |
which gets two billion times more solar energy than our little planet.
link |
There's lots of material out there that you can use
link |
to build robots and self replicating robot factories and all this stuff.
link |
And they are going to do that.
link |
And they will be scientists and curious
link |
and they will explore what they can do.
link |
And in the beginning they will be fascinated by life
link |
and by their own origins in our civilization.
link |
They will want to understand that completely,
link |
just like people today would like to understand how life works
link |
and also the history of our own existence and civilization
link |
and also the physical laws that created all of them.
link |
So in the beginning they will be fascinated by life
link |
once they understand it, they lose interest,
link |
like anybody who loses interest in things he understands.
link |
And then, as you said,
link |
the most interesting sources of information for them
link |
will be others of their own kind.
link |
So, at least in the long run,
link |
there seems to be some sort of protection
link |
through lack of interest on the other side.
link |
And now it seems also clear, as far as we understand physics,
link |
you need matter and energy to compute
link |
and to build more robots and infrastructure
link |
and more AI civilization and AI ecologies
link |
consisting of trillions of different types of AIs.
link |
And so it seems inconceivable to me
link |
that this thing is not going to expand.
link |
Some AI ecology not controlled by one AI
link |
but trillions of different types of AIs competing
link |
in all kinds of quickly evolving
link |
and disappearing ecological niches
link |
in ways that we cannot fathom at the moment.
link |
But it's going to expand,
link |
limited by light speed and physics,
link |
but it's going to expand and now we realize
link |
that the universe is still young.
link |
It's only 13.8 billion years old
link |
and it's going to be a thousand times older than that.
link |
So there's plenty of time
link |
to conquer the entire universe
link |
and to fill it with intelligence
link |
and send us in receivers such that
link |
AI's can travel the way they are traveling
link |
in our labs today,
link |
which is by radio from sender to receiver.
link |
And let's call the current age of the universe one eon.
link |
Now, just a few eons from now,
link |
the entire visible universe
link |
is going to be full of that stuff.
link |
And let's look ahead to a time
link |
when the universe is going to be
link |
one thousand times older than it is now.
link |
They will look back and they will say,
link |
look almost immediately after the Big Bang,
link |
only a few eons later,
link |
the entire universe started to become intelligent.
link |
Now to your question,
link |
how do we see whether anything like that
link |
has already happened or is already in a more advanced stage
link |
in some other part of the universe,
link |
of the visible universe?
link |
We are trying to look out there
link |
and nothing like that has happened so far.
link |
Do you think we would recognize it?
link |
How do we know it's not among us?
link |
How do we know planets aren't in themselves intelligent beings?
link |
How do we know ants, seen as a collective,
link |
are not a much greater intelligence than our own?
link |
These kinds of ideas.
link |
When I was a boy, I was thinking about these things
link |
and I thought, hmm, maybe it has already happened.
link |
Because back then I knew,
link |
I learned from popular physics books,
link |
that the structure, the large scale structure of the universe
link |
is not homogeneous.
link |
And you have these clusters of galaxies
link |
and then in between there are these huge empty spaces.
link |
And I thought, hmm, maybe they aren't really empty.
link |
It's just that in the middle of that
link |
some AI civilization already has expanded
link |
and then has covered a bubble of a billion light years in size
link |
using all the energy of all the stars within that bubble
link |
for its own unfathomable practices.
link |
And so it has already happened and we just failed to interpret the signs.
link |
But then I learned that gravity by itself
link |
explains the large scale structure of the universe
link |
and so this is not a convincing explanation.
link |
And then I thought maybe it's the dark matter
link |
because as far as we know today
link |
80% of the measurable matter is invisible.
link |
And we know that because otherwise our galaxy
link |
or other galaxies would fall apart.
link |
They are rotating too quickly.
link |
And then the idea was maybe all of these
link |
AI civilizations that are already out there,
link |
they are just invisible
link |
because they are really efficient in using the energies
link |
of their own local systems
link |
and that's why they appear dark to us.
link |
But this is also not a convincing explanation
link |
because then the question becomes
link |
why are there still any visible stars left in our own galaxy
link |
which also must have a lot of dark matter.
link |
So that is also not a convincing thing.
link |
And today I like to think it's quite plausible
link |
that maybe we are the first, at least in our local light cone
link |
within the few hundreds of millions of light years
link |
that we can reliably observe.
link |
Is that exciting to you?
link |
That we might be the first?
link |
It would make us much more important
link |
because if we mess it up through a nuclear war
link |
then maybe this will have an effect
link |
on the development of the entire universe.
link |
So let's not mess it up.
link |
Let's not mess it up.
link |
Jürgen, thank you so much for talking today.
link |
I really appreciate it.