
Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11



link |
00:00:00.000
The following is a conversation with Jürgen Schmidhuber.
link |
00:00:03.520
He's the co-director of the IDSIA Swiss AI lab
link |
00:00:06.360
and a co-creator of long short-term memory networks.
link |
00:00:10.360
LSTMs are used in billions of devices today
link |
00:00:13.720
for speech recognition, translation, and much more.
link |
00:00:17.400
Over 30 years, he has proposed a lot of interesting
link |
00:00:20.800
out-of-the-box ideas on meta-learning, adversarial networks,
link |
00:00:24.800
computer vision, and even a formal theory of quote,
link |
00:00:28.720
creativity, curiosity, and fun.
link |
00:00:32.360
This conversation is part of the MIT course
link |
00:00:34.920
on artificial general intelligence
link |
00:00:36.520
and the artificial intelligence podcast.
link |
00:00:38.840
If you enjoy it, subscribe on YouTube, iTunes,
link |
00:00:41.960
or simply connect with me on Twitter
link |
00:00:43.960
at Lex Fridman, spelled F R I D.
link |
00:00:47.280
And now here's my conversation with Jürgen Schmidhuber.
link |
00:00:53.080
Early on you dreamed of AI systems
link |
00:00:55.640
that self improve recursively.
link |
00:00:58.680
When was that dream born?
link |
00:01:01.440
When I was a baby.
link |
00:01:02.840
No, that's not true.
link |
00:01:04.000
When I was a teenager.
link |
00:01:06.200
And what was the catalyst for that birth?
link |
00:01:09.400
What was the thing that first inspired you?
link |
00:01:12.800
When I was a boy, I...
link |
00:01:17.400
I was thinking about what to do in my life
link |
00:01:19.880
and then I thought the most exciting thing
link |
00:01:23.560
is to solve the riddles of the universe.
link |
00:01:27.160
And that means you have to become a physicist.
link |
00:01:30.720
However, then I realized that there's something even grander.
link |
00:01:35.640
You can try to build a machine.
link |
00:01:39.680
That isn't really a machine any longer.
link |
00:01:41.920
That learns to become a much better physicist
link |
00:01:44.280
than I could ever hope to be.
link |
00:01:46.840
And that's how I thought maybe I can multiply
link |
00:01:50.120
my tiny little bit of creativity into infinity.
link |
00:01:54.320
But ultimately, that creativity will be multiplied
link |
00:01:57.160
to understand the universe around us.
link |
00:01:59.160
That's the curiosity for that mystery that drove you.
link |
00:02:05.640
Yes, so if you can build a machine
link |
00:02:08.320
that learns to solve more and more complex problems
link |
00:02:13.760
and more and more general problems over time,
link |
00:02:16.720
then you basically have solved all the problems.
link |
00:02:22.520
At least all the solvable problems.
link |
00:02:25.960
So how do you think...
link |
00:02:27.080
What does the mechanism for that kind of general solver look like?
link |
00:02:31.440
Obviously, we don't quite yet have one or know
link |
00:02:35.480
how to build one, but we have ideas,
link |
00:02:37.040
and you have had throughout your career several ideas about it.
link |
00:02:40.800
So how do you think about that mechanism?
link |
00:02:43.600
So in the 80s, I thought about how to build this machine
link |
00:02:48.640
that learns to solve all these problems
link |
00:02:51.000
that I cannot solve myself.
link |
00:02:54.120
And I thought it is clear, it has to be a machine
link |
00:02:57.120
that not only learns to solve this problem here
link |
00:03:00.880
and this problem here,
link |
00:03:02.640
but it also has to learn to improve
link |
00:03:06.240
the learning algorithm itself.
link |
00:03:09.360
So it has to have the learning algorithm
link |
00:03:12.480
in a representation that allows it to inspect it
link |
00:03:15.720
and modify it so that it can come up
link |
00:03:19.240
with a better learning algorithm.
link |
00:03:22.080
So I called that meta learning, learning to learn
link |
00:03:25.680
and recursive self improvement.
link |
00:03:28.040
That is really the pinnacle of that,
link |
00:03:29.840
where you then not only learn how to improve
link |
00:03:35.960
on that problem and on that,
link |
00:03:37.480
but you also improve the way the machine improves
link |
00:03:41.080
and you also improve the way it improves the way
link |
00:03:43.480
it improves itself.
link |
00:03:45.720
And that was my 1987 diploma thesis,
link |
00:03:48.560
which was all about that hierarchy of meta learners
link |
00:03:53.200
that have no computational limits
link |
00:03:57.240
except for the well known limits
link |
00:03:59.920
that Gödel identified in 1931
link |
00:04:03.160
and for the limits of physics.
link |
00:04:06.480
In the recent years, meta learning has gained popularity
link |
00:04:10.040
in a specific kind of form.
link |
00:04:12.760
You've talked about how that's not really meta learning
link |
00:04:16.000
with neural networks, that's more basic transfer learning.
link |
00:04:21.480
Can you talk about the difference
link |
00:04:22.720
between the big general meta learning
link |
00:04:25.440
and a more narrow sense of meta learning
link |
00:04:27.960
the way it's used today, the way it's talked about today?
link |
00:04:30.880
Let's take the example of a deep neural network
link |
00:04:33.440
that has learned to classify images.
link |
00:04:37.240
And maybe you have trained that network
link |
00:04:40.080
on 100 different databases of images.
link |
00:04:43.800
And now a new database comes along
link |
00:04:48.120
and you want to quickly learn the new thing as well.
link |
00:04:53.400
So one simple way of doing that is you take the network
link |
00:04:57.720
which already knows 100 types of databases
link |
00:05:02.440
and then you just take the top layer of that
link |
00:05:06.320
and you retrain that using the new labeled data
link |
00:05:11.320
that you have in the new image database.
link |
00:05:14.720
And then it turns out that it really, really quickly
link |
00:05:17.320
can learn that too, one shot basically,
link |
00:05:20.560
because from the first 100 data sets,
link |
00:05:24.280
it already has learned so much about computer vision
link |
00:05:27.520
that it can reuse that and that is then almost good enough
link |
00:05:31.840
to solve the new tasks except you need a little bit
link |
00:05:34.240
of adjustment on the top.
link |
00:05:37.040
So that is transfer learning
link |
00:05:40.200
and it has been done in principle for many decades.
link |
00:05:43.480
People have done similar things for decades.
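The retrain-the-top-layer recipe described above can be sketched in a few lines. This is a toy illustration with a made-up frozen feature map and an invented two-class dataset, not a real pretrained vision network: only the top (logistic) layer's weights are updated by gradient descent on the new labels.

```python
# Sketch of transfer learning: freeze the "pretrained" layers, retrain only the top.
import math

def features(x):
    # Stand-in for the frozen layers of a pretrained network:
    # a fixed nonlinear feature map that we never update.
    return [x, x * x, math.tanh(x)]

def train_top_layer(data, steps=500, lr=0.5):
    # Retrain only the top (logistic) layer on the new labeled dataset.
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(steps):
        for x, y in data:
            f = features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            g = p - y                         # gradient of the log loss
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wi * fi for wi, fi in zip(w, features(x))) + b
    return 1 if z > 0 else 0

# Hypothetical "new database": positives are x > 0, negatives are x < 0.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_top_layer(data)
print([predict(w, b, x) for x in (-1.5, 1.5)])
```

Because the frozen features already carry most of the useful structure, the small top layer adapts to the new task with very little data, which is the point being made in the conversation.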
link |
00:05:47.360
Meta learning, true meta learning is about
link |
00:05:49.880
having the learning algorithm itself
link |
00:05:54.440
open to introspection by the system that is using it
link |
00:06:00.440
and also open to modification
link |
00:06:03.760
such that the learning system has an opportunity
link |
00:06:07.200
to modify any part of the learning algorithm
link |
00:06:11.400
and then evaluate the consequences of that modification
link |
00:06:16.000
and then learn from that to create a better learning algorithm
link |
00:06:22.000
and so on recursively.
link |
00:06:24.960
So that's a very different animal
link |
00:06:27.680
where you are opening the space of possible learning algorithms
link |
00:06:32.680
to the learning system itself.
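The contrast can be made concrete with a deliberately tiny toy, which is an illustrative sketch rather than anything from Schmidhuber's papers: the learner below exposes one piece of its own learning algorithm, the gradient step size, to modification, evaluates the consequences of each modification, and keeps only the changes that yield a better learner.

```python
# Toy meta-learning: the learner modifies part of its own learning algorithm.
import random

def loss(w):
    return (w - 3.0) ** 2          # the task: find w = 3

def run_inner_learner(lr, steps=20):
    # The base learning algorithm: gradient descent with step size `lr`.
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)  # gradient of (w - 3)^2
    return loss(w)

def meta_learn(generations=30, seed=0):
    rng = random.Random(seed)
    lr = 0.001                          # initially a poor learning algorithm
    best = run_inner_learner(lr)
    for _ in range(generations):
        candidate = lr * rng.choice([0.5, 2.0])  # modify the algorithm itself
        score = run_inner_learner(candidate)      # evaluate the consequences
        if score < best:                          # keep only improvements
            lr, best = candidate, score
    return lr, best

lr, final_loss = meta_learn()
print(lr, final_loss)
```

The search space here is absurdly small (one scalar), but the loop has the shape described above: the learning algorithm is open to inspection and modification by the system that uses it, and modifications are judged by how much better the resulting learner learns.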
link |
00:06:35.520
Right, so you've like in the 2004 paper,
link |
00:06:39.040
you describe Gödel machines and programs
link |
00:06:42.440
that rewrite themselves, right?
link |
00:06:44.520
Philosophically and even in your paper mathematically,
link |
00:06:47.520
these are really compelling ideas,
link |
00:06:50.000
but practically, do you see these self referential programs
link |
00:06:55.320
being successful in the near term to having an impact
link |
00:06:59.400
where sort of it demonstrates to the world
link |
00:07:03.040
that this direction is a good one to pursue in the near term?
link |
00:07:08.680
Yes, we had these two different types
link |
00:07:11.400
of fundamental research,
link |
00:07:13.440
how to build a universal problem solver,
link |
00:07:15.840
one basically exploiting proof search
link |
00:07:23.000
and things like that that you need to come up
link |
00:07:24.960
with asymptotically optimal, theoretically optimal
link |
00:07:30.320
self improvers and problem solvers.
link |
00:07:34.200
However, one has to admit that through this proof search
link |
00:07:40.640
comes in an additive constant,
link |
00:07:43.640
an overhead, an additive overhead
link |
00:07:46.800
that vanishes in comparison to what you have to do
link |
00:07:51.800
to solve large problems.
link |
00:07:53.960
However, for many of the small problems
link |
00:07:56.920
that we want to solve in our everyday life,
link |
00:07:59.920
we cannot ignore this constant overhead.
link |
00:08:02.440
And that's why we also have been doing other things,
link |
00:08:07.440
non universal things such as recurrent neural networks
link |
00:08:11.160
which are trained by gradient descent
link |
00:08:14.360
and local search techniques which aren't universal at all,
link |
00:08:17.600
which aren't provably optimal at all
link |
00:08:20.280
like the other stuff that we did,
link |
00:08:22.000
but which are much more practical
link |
00:08:24.600
as long as we only want to solve the small problems
link |
00:08:27.920
that we are typically trying to solve in this environment here.
link |
00:08:34.680
So the universal problem solvers like the Gödel machine
link |
00:08:38.200
but also Markus Hutter's fastest way
link |
00:08:41.320
of solving all possible problems,
link |
00:08:43.560
which he developed around 2002 in my lab,
link |
00:08:47.360
they are associated with these constant overheads
link |
00:08:51.280
for proof search, which guarantees
link |
00:08:53.280
that the thing that you're doing is optimal.
link |
00:08:55.480
For example, there is this fastest way
link |
00:08:59.880
of solving all problems with a computable solution
link |
00:09:03.880
which is due to Markus Hutter.
link |
00:09:05.880
And to explain what's going on there,
link |
00:09:10.880
let's take traveling salesman problems.
link |
00:09:14.240
With traveling salesman problems,
link |
00:09:16.160
you have a number of cities, N cities,
link |
00:09:20.080
and you try to find the shortest path
link |
00:09:22.480
through all these cities without visiting any city twice.
link |
00:09:28.480
And nobody knows the fastest way
link |
00:09:31.040
of solving traveling salesman problems, TSPs,
link |
00:09:37.520
but let's assume there is a method of solving them
link |
00:09:40.480
within N to the five operations
link |
00:09:44.480
where N is the number of cities.
link |
00:09:50.160
Then the universal method of Markus
link |
00:09:54.560
is going to solve the same traveling salesman problem
link |
00:09:58.560
also within N to the five steps,
link |
00:10:02.080
plus O of one, plus a constant number of steps
link |
00:10:06.360
that you need for the proof searcher,
link |
00:10:09.240
which you need to show that this particular
link |
00:10:13.800
class of problems that traveling salesman problems
link |
00:10:17.240
can be solved within a certain time bound,
link |
00:10:20.520
within order N to the five steps, basically.
link |
00:10:24.400
And this additive constant does not depend on N,
link |
00:10:28.520
which means as N is getting larger and larger,
link |
00:10:32.400
as you have more and more cities,
link |
00:10:34.880
the constant overhead pales in comparison.
link |
00:10:38.600
And that means that almost all large problems are solved
link |
00:10:44.120
in the best possible way already today.
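The arithmetic behind the constant paling in comparison is easy to check numerically. In this sketch, C is a purely hypothetical figure for the proof-search overhead, and n**5 is the assumed cost of the underlying TSP method from the example above; the point is only the asymptotic ratio, not the specific numbers.

```python
# Why an additive constant becomes negligible for large problem sizes.
C = 10**9   # hypothetical constant overhead of the proof search

def universal_cost(n):
    # Hutter-style bound: cost of the underlying method plus a constant.
    return n**5 + C

def overhead_fraction(n):
    # What fraction of the total work is the constant overhead?
    return C / universal_cost(n)

for n in (10, 100, 1000):
    print(n, overhead_fraction(n))
```

For small n the overhead is essentially all of the cost, which is why the method is impractical for everyday problems; for large n it is a vanishing fraction, which is the sense in which the universal solver is already asymptotically optimal.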
link |
00:10:46.520
We already have a universal problem solver like that.
link |
00:10:50.480
However, it's not practical because the overhead,
link |
00:10:54.520
the constant overhead is so large
link |
00:10:57.440
that for the small kinds of problems
link |
00:11:00.200
that we want to solve in this little biosphere.
link |
00:11:04.560
By the way, when you say small,
link |
00:11:06.360
you're talking about things that fall
link |
00:11:08.600
within the constraints of our computational systems.
link |
00:11:10.880
So they can seem quite large to us mere humans.
link |
00:11:14.280
That's right, yeah.
link |
00:11:15.360
So they seem large and even unsolvable
link |
00:11:19.000
in a practical sense today,
link |
00:11:21.000
but they are still small compared to almost all problems
link |
00:11:24.760
because almost all problems are large problems,
link |
00:11:28.480
which are much larger than any constant.
link |
00:11:31.920
Do you find it useful as a person
link |
00:11:34.520
who has dreamed of creating a general learning system,
link |
00:11:38.680
has worked on creating one,
link |
00:11:39.880
has done a lot of interesting ideas there
link |
00:11:42.160
to think about P versus NP,
link |
00:11:46.360
this formalization of how hard problems are,
link |
00:11:50.800
how they scale,
link |
00:11:52.360
this kind of worst case analysis type of thinking.
link |
00:11:55.200
Do you find that useful?
link |
00:11:56.840
Or is it only just a mathematical,
link |
00:12:00.560
it's a set of mathematical techniques
link |
00:12:02.640
to give you intuition about what's good and bad?
link |
00:12:05.760
So P versus NP, that's super interesting
link |
00:12:09.440
from a theoretical point of view.
link |
00:12:11.800
And in fact, as you are thinking about that problem,
link |
00:12:14.560
you can also get inspiration
link |
00:12:17.280
for better practical problem solvers.
link |
00:12:21.280
On the other hand, we have to admit
link |
00:12:23.320
that at the moment,
link |
00:12:24.560
the best practical problem solvers
link |
00:12:28.360
for all kinds of problems
link |
00:12:30.120
that we are now solving through what is called AI at the moment,
link |
00:12:33.880
they are not of the kind
link |
00:12:36.240
that is inspired by these questions.
link |
00:12:38.800
There we are using general purpose computers,
link |
00:12:42.680
such as recurrent neural networks,
link |
00:12:44.840
but we have a search technique,
link |
00:12:46.680
which is just local search gradient descent
link |
00:12:50.320
to try to find a program
link |
00:12:51.960
that is running on these recurrent networks,
link |
00:12:54.400
such that it can solve some interesting problems,
link |
00:12:58.160
such as speech recognition or machine translation
link |
00:13:01.920
and something like that.
link |
00:13:03.200
And there is very little theory
link |
00:13:06.480
behind the best solutions that we have at the moment
link |
00:13:09.720
that can do that.
link |
00:13:10.800
Do you think that needs to change?
link |
00:13:12.640
Do you think that will change or can we go,
link |
00:13:15.120
can we create general intelligence systems
link |
00:13:17.120
without ever really proving
link |
00:13:19.200
that that system is intelligent
link |
00:13:20.600
in some kind of mathematical way,
link |
00:13:22.560
solving machine translation perfectly
link |
00:13:24.960
or something like that,
link |
00:13:26.320
within some kind of syntactic definition of a language?
link |
00:13:29.160
Or can we just be super impressed
link |
00:13:31.120
by the thing working extremely well and that's sufficient?
link |
00:13:35.080
There's an old saying,
link |
00:13:36.720
and I don't know who brought it up first,
link |
00:13:39.360
which says there's nothing more practical
link |
00:13:42.440
than a good theory.
link |
00:13:43.680
And a good theory of problem solving
link |
00:13:52.760
under limited resources like here in this universe
link |
00:13:55.560
or on this little planet
link |
00:13:58.480
has to take into account these limited resources.
link |
00:14:01.800
And so probably we are lacking a theory
link |
00:14:08.040
which is related to what we already have,
link |
00:14:10.800
these asymptotically optimal problem solvers,
link |
00:14:14.440
which tells us what we need in addition to that
link |
00:14:18.560
to come up with a practically optimal problem solver.
link |
00:14:21.760
So I believe we will have something like that
link |
00:14:27.080
and maybe just a few little tiny twists
link |
00:14:29.720
are necessary to change what we already have
link |
00:14:34.320
to come up with that as well.
link |
00:14:36.360
As long as we don't have that,
link |
00:14:37.800
we admit that we are taking suboptimal ways
link |
00:14:42.600
and recurrent neural networks and long short term memory
link |
00:14:46.040
are equipped with local search techniques
link |
00:14:50.440
and we are happy that it works better
link |
00:14:53.560
than any competing methods,
link |
00:14:55.480
but that doesn't mean that we think we are done.
link |
00:15:00.800
You've said that an AGI system will ultimately be a simple one,
link |
00:15:05.040
a general intelligence system will ultimately be a simple one,
link |
00:15:08.000
maybe a pseudo code of a few lines
link |
00:15:10.240
will be able to describe it.
link |
00:15:11.840
Can you talk through your intuition behind this idea,
link |
00:15:16.760
why you feel that at its core intelligence
link |
00:15:22.120
is a simple algorithm?
link |
00:15:26.920
Experience tells us that the stuff that works best
link |
00:15:31.680
is really simple.
link |
00:15:33.120
So the asymptotically optimal ways of solving problems,
link |
00:15:37.640
if you look at them,
link |
00:15:38.800
they're just a few lines of code, it's really true.
link |
00:15:41.800
Although they have these amazing properties,
link |
00:15:44.000
just a few lines of code,
link |
00:15:45.760
then the most promising and most useful practical things
link |
00:15:53.760
maybe don't have this proof of optimality associated with them.
link |
00:15:57.760
However, they are also just a few lines of code.
link |
00:16:00.840
The most successful recurrent neural networks,
link |
00:16:05.040
you can write them down in five lines of pseudo code.
link |
00:16:08.360
That's a beautiful, almost poetic idea,
link |
00:16:10.920
but what you're describing there
link |
00:16:15.600
is the lines of pseudo code
link |
00:16:17.400
are sitting on top of layers and layers of abstractions,
link |
00:16:20.600
in a sense.
link |
00:16:22.240
So you're saying at the very top,
link |
00:16:25.040
it'll be a beautifully written sort of algorithm,
link |
00:16:31.120
but do you think that there's many layers of abstractions
link |
00:16:33.960
we have to first learn to construct?
link |
00:16:36.880
Yeah, of course.
link |
00:16:38.280
We are building on all these great abstractions
link |
00:16:42.640
that people have invented over the millennia,
link |
00:16:46.040
such as matrix multiplications and real numbers
link |
00:16:51.600
and basic arithmetic and calculus and derivatives
link |
00:16:58.720
of error functions
link |
00:17:03.320
and stuff like that.
link |
00:17:05.440
So without that language that greatly simplifies
link |
00:17:10.440
our way of thinking about these problems,
link |
00:17:13.880
we couldn't do anything.
link |
00:17:14.840
So in that sense, as always,
link |
00:17:16.560
we are standing on the shoulders of the giants
link |
00:17:19.600
who in the past simplified the problem of problem solving
link |
00:17:25.520
so much that now we have a chance to do the final step.
link |
00:17:30.000
So the final step will be a simple one.
link |
00:17:34.000
If we take a step back through all of human civilization
link |
00:17:36.760
and just the universe in general,
link |
00:17:38.360
how do you think about evolution?
link |
00:17:41.440
And what if creating a universe is required
link |
00:17:45.400
to achieve this final step?
link |
00:17:47.320
What if going through the very painful
link |
00:17:50.920
and inefficient process of evolution is needed
link |
00:17:53.840
to come up with this set of abstractions
link |
00:17:55.880
that ultimately lead to intelligence?
link |
00:17:57.800
Do you think there's a shortcut
link |
00:18:00.800
or do you think we have to create something like our universe
link |
00:18:04.640
in order to create something like human level intelligence?
link |
00:18:09.480
So far, the only example we have is this one,
link |
00:18:13.160
this universe in which we are living.
link |
00:18:15.160
You think you can do better?
link |
00:18:20.880
Maybe not, but we are part of this whole process.
link |
00:18:25.000
So apparently, so it might be the case
link |
00:18:30.000
that the code that runs the universe
link |
00:18:32.160
is really, really simple.
link |
00:18:33.720
Everything points to that possibility
link |
00:18:36.640
because gravity and other basic forces
link |
00:18:39.960
are really simple laws that can be easily described,
link |
00:18:44.120
also in just a few lines of code, basically.
link |
00:18:47.080
And then there are these other events
link |
00:18:52.200
that the apparently random events
link |
00:18:55.080
in the history of the universe,
link |
00:18:56.560
which as far as we know at the moment
link |
00:18:58.800
don't have a compact code,
link |
00:19:00.720
but who knows, maybe somebody in the near future
link |
00:19:03.240
is going to figure out the pseudo random generator,
link |
00:19:06.800
which is computing whether the measurement of that
link |
00:19:13.520
spin up or down thing here
link |
00:19:15.920
is going to be positive or negative.
link |
00:19:18.440
Underlying quantum mechanics.
link |
00:19:19.880
Yes, so.
link |
00:19:20.720
Do you ultimately think quantum mechanics
link |
00:19:23.160
is a pseudo random number generator?
link |
00:19:25.200
So it's all deterministic.
link |
00:19:26.920
There's no randomness in our universe.
link |
00:19:30.400
Does God play dice?
link |
00:19:31.800
So a couple of years ago,
link |
00:19:34.080
a famous physicist, quantum physicist, Anton Zeilinger,
link |
00:19:39.080
he wrote an essay in Nature,
link |
00:19:41.600
and it started more or less like that.
link |
00:19:46.720
One of the fundamental insights of the 20th century
link |
00:19:53.280
was that the universe is fundamentally random
link |
00:19:58.280
on the quantum level, and that whenever
link |
00:20:03.760
you measure spin up or down or something like that,
link |
00:20:06.720
a new bit of information enters the history of the universe.
link |
00:20:13.440
And while I was reading that,
link |
00:20:14.680
I was already typing the response
link |
00:20:18.000
and they had to publish it because I was right,
link |
00:20:21.560
that there is no evidence, no physical evidence for that.
link |
00:20:25.560
So there's an alternative explanation
link |
00:20:28.440
where everything that we consider random
link |
00:20:31.240
is actually pseudo random,
link |
00:20:33.800
such as the decimal expansion of pi, 3.141 and so on,
link |
00:20:39.400
which looks random, but isn't.
link |
00:20:42.120
So pi is interesting because every three digit sequence,
link |
00:20:47.720
every sequence of three digits appears roughly
link |
00:20:51.720
one in a thousand times, and every five digit sequence
link |
00:20:57.360
appears roughly one in 100,000 times.
link |
00:21:00.760
What do you expect?
link |
00:21:02.760
If it was random, but there's a very short algorithm,
link |
00:21:06.760
a short program that computes all of that.
link |
00:21:09.120
So it's extremely compressible.
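The "pseudo-random, hence compressible" point can be illustrated with an ordinary seeded generator, standing in for the hypothetical pseudo-random generator behind quantum measurements: the bit stream below looks statistically random, yet the whole stream compresses to a tiny program plus a seed, because the same seed reproduces it exactly.

```python
# A random-looking stream that is fully determined by a short program + seed.
import random

def pseudo_random_bits(seed, n):
    rng = random.Random(seed)   # the entire stream is a function of the seed
    return [rng.randint(0, 1) for _ in range(n)]

a = pseudo_random_bits(42, 1000)
b = pseudo_random_bits(42, 1000)
print(a == b)                         # deterministic: same seed, same stream
print(0.4 < sum(a) / len(a) < 0.6)    # yet it passes a crude balance check
```

This is exactly the sense in which the digits of pi "look random but aren't": passing simple statistical tests says nothing about whether a short generating program exists.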
link |
00:21:11.200
And who knows, maybe tomorrow somebody,
link |
00:21:13.120
some grad student at CERN goes back
link |
00:21:15.360
over all these data points, beta decay,
link |
00:21:19.120
and whatever, and figures out, oh,
link |
00:21:21.760
it's the second billion digits of pi or something like that.
link |
00:21:25.760
We don't have any fundamental reason at the moment
link |
00:21:28.840
to believe that this is truly random
link |
00:21:33.600
and not just a deterministic video game.
link |
00:21:36.440
If it was a deterministic video game,
link |
00:21:38.680
it would be much more beautiful
link |
00:21:40.360
because beauty is simplicity.
link |
00:21:44.160
And many of the basic laws of the universe
link |
00:21:47.560
like gravity and the other basic forces are very simple.
link |
00:21:51.560
So very short programs can explain what these are doing.
link |
00:21:56.560
And it would be awful and ugly.
link |
00:22:00.560
The universe would be ugly.
link |
00:22:01.560
The history of the universe would be ugly
link |
00:22:03.560
if for the extra things, the random,
link |
00:22:06.560
the seemingly random data points that we get all the time
link |
00:22:10.560
that we really need a huge number of extra bits
link |
00:22:15.560
to describe all these extra bits of information.
link |
00:22:22.560
So as long as we don't have evidence
link |
00:22:25.560
that there is no short program
link |
00:22:27.560
that computes the entire history of the entire universe,
link |
00:22:32.560
we are, as scientists, compelled to look further
link |
00:22:38.560
for that shortest program.
link |
00:22:41.560
Your intuition says there exists a program
link |
00:22:46.560
that can backtrack to the creation of the universe.
link |
00:22:50.560
So it can take the shortest path to the creation of the universe.
link |
00:22:53.560
Yes, including all the entanglement things
link |
00:22:57.560
and all the spin up and down measurements
link |
00:23:01.560
that have taken place over the past 13.8 billion years.
link |
00:23:09.560
So we don't have a proof that it is random.
link |
00:23:14.560
We don't have a proof that it is compressible to a short program.
link |
00:23:19.560
But as long as we don't have that proof,
link |
00:23:21.560
we are obliged as scientists to keep looking
link |
00:23:24.560
for that simple explanation.
link |
00:23:26.560
Absolutely.
link |
00:23:27.560
So you said simplicity is beautiful or beauty is simple.
link |
00:23:30.560
Either one works.
link |
00:23:32.560
But you also work on curiosity, discovery.
link |
00:23:36.560
The romantic notion of randomness, of serendipity,
link |
00:23:42.560
of being surprised by things that are around you,
link |
00:23:49.560
kind of in our poetic notion of reality,
link |
00:23:53.560
we think as humans require randomness.
link |
00:23:56.560
So you don't find randomness beautiful.
link |
00:23:58.560
You find simple determinism beautiful.
link |
00:24:04.560
Yeah.
link |
00:24:06.560
Okay.
link |
00:24:07.560
So why?
link |
00:24:08.560
Why?
link |
00:24:09.560
Because the explanation becomes shorter.
link |
00:24:12.560
A universe that is compressible to a short program
link |
00:24:19.560
is much more elegant and much more beautiful
link |
00:24:22.560
than another one,
link |
00:24:24.560
which needs an almost infinite number of bits to be described.
link |
00:24:28.560
As far as we know,
link |
00:24:31.560
many things that are happening in this universe are really simple
link |
00:24:34.560
in terms of short programs that compute gravity
link |
00:24:38.560
and the interaction between elementary particles and so on.
link |
00:24:43.560
So all of that seems to be very, very simple.
link |
00:24:45.560
Every electron seems to reuse the same subprogram all the time
link |
00:24:50.560
as it is interacting with other elementary particles.
link |
00:24:57.560
If we now require an extra oracle
link |
00:25:04.560
injecting new bits of information all the time
link |
00:25:07.560
for these extra things which are currently not understood,
link |
00:25:11.560
such as beta decay,
link |
00:25:18.560
then the whole description length of the data that we can observe
link |
00:25:25.560
of the history of the universe would become much longer.
link |
00:25:31.560
And therefore, uglier.
link |
00:25:33.560
And uglier.
link |
00:25:34.560
Again, the simplicity is elegant and beautiful.
link |
00:25:38.560
All the history of science is a history of compression progress.
link |
00:25:42.560
Yeah.
link |
00:25:43.560
So you've described sort of as we build up abstractions
link |
00:25:48.560
and you've talked about the idea of compression.
link |
00:25:52.560
How do you see this, the history of science,
link |
00:25:55.560
the history of humanity, our civilization and life on Earth
link |
00:25:59.560
as some kind of path towards greater and greater compression?
link |
00:26:03.560
What do you mean by that?
link |
00:26:04.560
How do you think about that?
link |
00:26:06.560
Indeed, the history of science is a history of compression progress.
link |
00:26:12.560
What does that mean?
link |
00:26:14.560
Hundreds of years ago, there was an astronomer
link |
00:26:17.560
whose name was Kepler.
link |
00:26:19.560
And he looked at the data points that he got by watching planets move.
link |
00:26:25.560
And then he had all these data points and suddenly it turned out
link |
00:26:28.560
that he can greatly compress the data by predicting it through an ellipse law.
link |
00:26:37.560
So it turns out that all these data points are more or less on ellipses around the sun.
link |
00:26:44.560
And another guy came along whose name was Newton, and before him Hooke.
link |
00:26:50.560
And they said the same thing that is making these planets move like that
link |
00:26:57.560
is what makes the apples fall down.
link |
00:27:01.560
And it also holds for stones and for all kinds of other objects.
link |
00:27:10.560
And suddenly many, many of these observations became much more compressible
link |
00:27:16.560
because as long as you can predict the next thing,
link |
00:27:19.560
given what you have seen so far, you can compress it.
link |
00:27:22.560
But you don't have to store that data extra.
link |
00:27:24.560
This is called predictive coding.
link |
00:27:28.560
And then there was still something wrong with that theory of the universe
link |
00:27:33.560
and you had deviations from these predictions of the theory.
link |
00:27:37.560
And 300 years later another guy came along whose name was Einstein
link |
00:27:41.560
and he was able to explain away all these deviations from the predictions of the old theory
link |
00:27:50.560
through a new theory which was called the general theory of relativity
link |
00:27:56.560
which at first glance looks a little bit more complicated
link |
00:28:00.560
and you have to warp space and time but you can't phrase it within one single sentence
link |
00:28:05.560
which is no matter how fast you accelerate and how fast or how hard you decelerate
link |
00:28:12.560
and no matter what is the gravity in your local framework,
link |
00:28:18.560
light speed always looks the same.
link |
00:28:21.560
And from that you can calculate all the consequences.
link |
00:28:24.560
So it's a very simple thing and it allows you to further compress all the observations
link |
00:28:30.560
because certainly there are hardly any deviations any longer
link |
00:28:35.560
that you can measure from the predictions of this new theory.
link |
00:28:39.560
So the history of science is a history of compression progress.
link |
00:28:44.560
You never arrive immediately at the shortest explanation of the data
link |
00:28:50.560
but you're making progress.
link |
00:28:52.560
Whenever you are making progress you have an insight.
link |
00:28:56.560
You see, oh, first I needed so many bits of information to describe the data,
link |
00:29:01.560
to describe my falling apples, my video of falling apples,
link |
00:29:04.560
I need so many data, so many pixels have to be stored
link |
00:29:08.560
but then suddenly I realize, no, there is a very simple way of predicting the third frame
link |
00:29:14.560
in the video from the first two and maybe not every little detail can be predicted
link |
00:29:20.560
but more or less most of these orange blobs that are coming down,
link |
00:29:24.560
they fall in the same way, which means that I can greatly compress the video
link |
00:29:28.560
and the amount of compression progress,
link |
00:29:33.560
that is the depth of the insight that you have at that moment.
link |
00:29:37.560
That's the fun that you have, the scientific fun,
link |
00:29:40.560
the fun in that discovery and we can build artificial systems that do the same thing.
link |
00:29:46.560
They measure the depth of their insights as they are looking at the data
link |
00:29:51.560
through their own experiments and we give them a reward,
link |
00:29:55.560
an intrinsic reward in proportion to this depth of insight.
link |
00:30:00.560
And since they are trying to maximize the rewards they get,
link |
00:30:07.560
they are suddenly motivated to come up with new action sequences,
link |
00:30:12.560
with new experiments that have the property that the data that is coming in
link |
00:30:17.560
as a consequence of these experiments has the property
link |
00:30:21.560
that they can learn something about, see a pattern in there
link |
00:30:25.560
which they hadn't seen yet before.
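The intrinsic-reward scheme described here can be sketched in a few lines of Python. This is a toy illustration by the editor, not code from Schmidhuber's papers: the class name and the one-parameter running-mean "compressor" are invented for the example. The reward is the improvement of the predictor on the same observation, i.e. the learning progress.

```python
class CuriousObserver:
    """Toy intrinsic-reward sketch: the 'compressor' is a running-mean
    predictor, and the reward for an observation is how much the
    predictor improves on it, i.e. the depth of the insight gained."""

    def __init__(self):
        self.mean = 0.0  # the predictor's single parameter
        self.n = 0

    def error(self, x):
        # squared prediction error: a proxy for bits needed to encode x
        return (x - self.mean) ** 2

    def observe(self, x):
        before = self.error(x)                 # cost under the old model
        self.n += 1
        self.mean += (x - self.mean) / self.n  # improve the model on x
        after = self.error(x)                  # cost under the improved model
        return before - after                  # intrinsic reward: progress
```

A perfectly predictable stream yields a burst of reward that decays to zero, so an agent maximizing this signal is pushed toward data it can still learn something from, rather than toward what it already predicts well.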
link |
00:30:28.560
So there's an idea of power play that you've described,
link |
00:30:32.560
a training and general problem solver in this kind of way of looking for the unsolved problems.
link |
00:30:37.560
Can you describe that idea a little further?
link |
00:30:40.560
It's another very simple idea.
link |
00:30:42.560
Normally what you do in computer science, you have some guy who gives you a problem
link |
00:30:49.560
and then there is a huge search space of potential solution candidates
link |
00:30:56.560
and you somehow try them out and you have more or less sophisticated ways
link |
00:31:02.560
of moving around in that search space
link |
00:31:06.560
until you finally found a solution which you consider satisfactory.
link |
00:31:11.560
That's what most of computer science is about.
link |
00:31:15.560
Power play just goes one little step further and says,
link |
00:31:19.560
let's not only search for solutions to a given problem,
link |
00:31:24.560
but let's search the space of pairs of problems and their solutions
link |
00:31:30.560
where the system itself has the opportunity to phrase its own problem.
link |
00:31:36.560
So we are looking suddenly at pairs of problems and their solutions
link |
00:31:42.560
or modifications of the problem solver
link |
00:31:46.560
that is supposed to generate a solution to that new problem.
link |
00:31:50.560
And this additional degree of freedom
link |
00:31:56.560
allows us to build curious systems that are like scientists
link |
00:32:01.560
in the sense that they not only try to solve and try to find answers
link |
00:32:06.560
to existing questions, no, they are also free to pose their own questions.
link |
00:32:12.560
So if you want to build an artificial scientist,
link |
00:32:15.560
you have to give it that freedom and power play is exactly doing that.
link |
00:32:19.560
So that's a dimension of freedom that's important to have,
link |
00:32:23.560
how hard do you think that is? How multidimensional and difficult is the space of
link |
00:32:31.560
coming up with your own questions?
link |
00:32:34.560
So it's one of the things that as human beings we consider to be
link |
00:32:38.560
the thing that makes us special, the intelligence that makes us special
link |
00:32:41.560
is that brilliant insight that can create something totally new.
link |
00:32:47.560
Yes. So now let's look at the extreme case.
link |
00:32:51.560
Let's look at the set of all possible problems that you can formally describe,
link |
00:32:57.560
which is infinite, which should be the next problem
link |
00:33:03.560
that a scientist or power play is going to solve.
link |
00:33:07.560
Well, it should be the easiest problem that goes beyond what you already know.
link |
00:33:16.560
So it should be the simplest problem that the current problem solver
link |
00:33:22.560
that you have, which can already solve 100 problems, cannot solve yet
link |
00:33:28.560
by just generalizing.
link |
00:33:30.560
So it has to be new.
link |
00:33:32.560
So it has to require a modification of the problem solver such that the new
link |
00:33:36.560
problem solver can solve this new thing, but the old problem solver cannot do it.
link |
00:33:41.560
And in addition to that, we have to make sure that the problem solver
link |
00:33:47.560
doesn't forget any of the previous solutions.
link |
00:33:50.560
Right.
link |
00:33:51.560
And so by definition, power play is now always trying to search
link |
00:33:57.560
in the set of pairs of problems and problem solver modifications
link |
00:34:02.560
for a combination that minimizes the time to achieve these criteria.
link |
00:34:08.560
Power play is trying to find the problem which is easiest to add to the repertoire.
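A drastically simplified sketch of that search, as an editor's toy: problems are reduced to labels, the "solver modification" to adding one entry, and the no-forgetting criterion to an explicit check. The real PowerPlay searches a space of programs, not sets.

```python
def powerplay_step(solved, candidates):
    """Toy PowerPlay-style step: scan candidate problems easiest-first and
    return the first (new problem, modified solver) pair such that the
    modified solver handles the new problem while keeping every old solution."""
    for problem in candidates:            # assumed ordered by difficulty
        if problem in solved:
            continue                      # not novel: already solvable
        new_solved = solved | {problem}   # trivial 'solver modification'
        if solved <= new_solved:          # explicit no-forgetting check
            return problem, new_solved
    return None, solved                   # nothing learnable right now
```

For example, `powerplay_step({1, 2}, [1, 2, 3, 4])` adds problem 3: the easiest problem just beyond the current repertoire.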
link |
00:34:14.560
So just like grad students and academics and researchers can spend their whole
link |
00:34:19.560
career stuck in a local minimum, trying to come up with interesting questions,
link |
00:34:25.560
but ultimately doing very little.
link |
00:34:27.560
Do you think it's easy in this approach of looking for the simplest
link |
00:34:32.560
problem, for the problem solver to get stuck in a local minimum, never really discovering
link |
00:34:38.560
new, you know, really jumping outside of the hundred problems that you've already
link |
00:34:43.560
solved in a genuine creative way.
link |
00:34:47.560
No, because that's the nature of power play that it's always trying to break
link |
00:34:52.560
its current generalization abilities by coming up with a new problem which is
link |
00:34:58.560
beyond the current horizon, just shifting the horizon of knowledge a little bit
link |
00:35:04.560
out there, breaking the existing rules such that the new thing becomes solvable
link |
00:35:10.560
but wasn't solvable by the old thing.
link |
00:35:13.560
So like adding a new axiom, like what Gödel did when he came up with these
link |
00:35:19.560
new sentences, new theorems that didn't have a proof in the formal system,
link |
00:35:23.560
which means you can add them to the repertoire, hoping that they are not
link |
00:35:30.560
going to damage the consistency of the whole thing.
link |
00:35:35.560
So in the paper with the amazing title, Formal Theory of Creativity,
link |
00:35:41.560
Fun and Intrinsic Motivation, you talk about discovery as intrinsic reward.
link |
00:35:47.560
So if you view humans as intelligent agents, what do you think is the purpose
link |
00:35:53.560
and meaning of life for us humans?
link |
00:35:56.560
You've talked about this discovery.
link |
00:35:58.560
Do you see humans as an instance of power play agents?
link |
00:36:04.560
Yeah, so humans are curious and that means they behave like scientists,
link |
00:36:11.560
not only the official scientists but even the babies behave like scientists
link |
00:36:15.560
and they play around with their toys to figure out how the world works
link |
00:36:19.560
and how it is responding to their actions.
link |
00:36:22.560
And that's how they learn about gravity and everything.
link |
00:36:26.560
And yeah, in 1990, we had the first systems like that
link |
00:36:30.560
who would just try to play around with the environment
link |
00:36:33.560
and come up with situations that go beyond what they knew at that time
link |
00:36:39.560
and then get a reward for creating these situations
link |
00:36:42.560
and then becoming more general problem solvers
link |
00:36:45.560
and being able to understand more of the world.
link |
00:36:48.560
So yeah, I think in principle that curiosity,
link |
00:36:56.560
strategy or more sophisticated versions of what I just described,
link |
00:37:02.560
they are what we have built in as well because evolution discovered
link |
00:37:07.560
that's a good way of exploring the unknown world and a guy who explores
link |
00:37:12.560
the unknown world has a higher chance of solving problems
link |
00:37:16.560
that he needs to survive in this world.
link |
00:37:19.560
On the other hand, those guys who were too curious,
link |
00:37:23.560
they were weeded out as well.
link |
00:37:25.560
So you have to find this trade off.
link |
00:37:27.560
Evolution found a certain trade off. Apparently in our society
link |
00:37:30.560
There is a certain percentage of extremely explorative guys
link |
00:37:35.560
and it doesn't matter if they die because many of the others are more conservative.
link |
00:37:41.560
And so yeah, it would be surprising to me
link |
00:37:46.560
if that principle of artificial curiosity wouldn't be present
link |
00:37:55.560
in almost exactly the same form here in our brains.
link |
00:37:59.560
So you're a bit of a musician and an artist.
link |
00:38:02.560
So continuing on this topic of creativity,
link |
00:38:07.560
what do you think is the role of creativity in intelligence?
link |
00:38:10.560
So you've kind of implied that it's essential for intelligence,
link |
00:38:16.560
if you think of intelligence as a problem solving system,
link |
00:38:21.560
as ability to solve problems.
link |
00:38:23.560
But do you think it's essential, this idea of creativity?
link |
00:38:28.560
We never have a subprogram that is called creativity or something.
link |
00:38:34.560
It's just a side effect of what our problem solvers always do.
link |
00:38:37.560
They are searching a space of candidates, of solution candidates,
link |
00:38:44.560
until they hopefully find a solution to a given problem.
link |
00:38:47.560
But then there are these two types of creativity
link |
00:38:50.560
and both of them are now present in our machines.
link |
00:38:53.560
The first one has been around for a long time,
link |
00:38:56.560
which is human gives problem to machine.
link |
00:38:59.560
Machine tries to find a solution to that.
link |
00:39:03.560
And this has been happening for many decades.
link |
00:39:05.560
And for many decades, machines have found creative solutions
link |
00:39:09.560
to interesting problems where humans were not aware
link |
00:39:13.560
of these particularly creative solutions,
link |
00:39:17.560
but then appreciated that the machine found that.
link |
00:39:20.560
The second is the pure creativity.
link |
00:39:23.560
What I just mentioned, I would call the applied creativity,
link |
00:39:28.560
like applied art, where somebody tells you,
link |
00:39:31.560
now make a nice picture of this pope,
link |
00:39:34.560
and you will get money for that.
link |
00:39:36.560
So here is the artist and he makes a convincing picture of the pope
link |
00:39:41.560
and the pope likes it and gives him the money.
link |
00:39:44.560
And then there is the pure creativity,
link |
00:39:48.560
which is more like the power play and the artificial curiosity thing,
link |
00:39:51.560
where you have the freedom to select your own problem,
link |
00:39:56.560
like a scientist who defines his own question to study.
link |
00:40:02.560
And so that is the pure creativity, if you will,
link |
00:40:06.560
as opposed to the applied creativity, which serves another.
link |
00:40:13.560
In that distinction, there's almost echoes of narrow AI versus general AI.
link |
00:40:18.560
So this kind of constrained painting of a pope seems like
link |
00:40:24.560
the approaches of what people are calling narrow AI.
link |
00:40:29.560
And pure creativity seems to be,
link |
00:40:32.560
maybe I'm just biased as a human,
link |
00:40:34.560
but it seems to be an essential element of human level intelligence.
link |
00:40:40.560
Is that what you're implying?
link |
00:40:43.560
To a degree.
link |
00:40:45.560
If you zoom back a little bit and you just look at a general problem solving machine,
link |
00:40:50.560
which is trying to solve arbitrary problems,
link |
00:40:53.560
then this machine will figure out in the course of solving problems
link |
00:40:57.560
that it's good to be curious.
link |
00:40:59.560
So all of what I said just now about this prewired curiosity
link |
00:41:04.560
and this will to invent new problems that the system doesn't know how to solve yet,
link |
00:41:10.560
should be just a byproduct of the general search.
link |
00:41:14.560
However, apparently evolution has built it into us
link |
00:41:21.560
because it turned out to be so successful, a prewiring, a bias,
link |
00:41:26.560
a very successful exploratory bias that we are born with.
link |
00:41:33.560
And you've also said that consciousness in the same kind of way
link |
00:41:36.560
may be a byproduct of problem solving.
link |
00:41:40.560
Do you find this an interesting byproduct?
link |
00:41:44.560
Do you think it's a useful byproduct?
link |
00:41:46.560
What are your thoughts on consciousness in general?
link |
00:41:49.560
Or is it simply a byproduct of greater and greater capabilities of problem solving
link |
00:41:54.560
that's similar to creativity in that sense?
link |
00:42:00.560
We never have a procedure called consciousness in our machines.
link |
00:42:04.560
However, we get side effects of what these machines are doing,
link |
00:42:10.560
things that seem to be closely related to what people call consciousness.
link |
00:42:15.560
So for example, already in 1990 we had simple systems
link |
00:42:20.560
which were basically recurrent networks and therefore universal computers
link |
00:42:25.560
trying to map incoming data into actions that lead to success.
link |
00:42:32.560
Maximizing reward in a given environment, always finding the charging station in time
link |
00:42:39.560
whenever the battery is low and negative signals are coming from the battery,
link |
00:42:43.560
always find the charging station in time without bumping against painful obstacles on the way.
link |
00:42:50.560
So complicated things but very easily motivated.
link |
00:42:54.560
And then we give these little guys a separate recurrent network
link |
00:43:01.560
which is just predicting what's happening if I do that and that.
link |
00:43:04.560
What will happen as a consequence of these actions that I'm executing
link |
00:43:08.560
and it's just trained on the long and long history of interactions with the world.
link |
00:43:13.560
So it becomes a predictive model of the world basically.
link |
00:43:17.560
And therefore also a compressor of the observations of the world
link |
00:43:22.560
because whatever you can predict, you don't have to store extra.
link |
00:43:26.560
So compression is a side effect of prediction.
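That "whatever you can predict, you don't have to store extra" idea is easy to demonstrate. This is an editor's sketch with the most trivial predictor possible, previous-value prediction: store only the first value plus each step's deviation from the prediction.

```python
import zlib

def residuals(xs):
    """Keep only what the predictor gets wrong: the first value plus
    the deviation of each value from the previous one (the prediction)."""
    return [xs[0]] + [b - a for a, b in zip(xs, xs[1:])]

# A perfectly predictable ramp: the raw byte stream barely compresses,
# while the residual stream collapses to a long run of ones.
xs = list(range(100))
raw_size = len(zlib.compress(bytes(xs)))
res_size = len(zlib.compress(bytes(residuals(xs))))
```

Here `res_size` comes out far smaller than `raw_size`, and the original stream is recoverable by summing the residuals back up: prediction plus stored deviations loses nothing.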
link |
00:43:29.560
And how does this recurrent network compress?
link |
00:43:32.560
Well, it's inventing little subprograms, little subnetworks
link |
00:43:36.560
that stand for everything that frequently appears in the environment.
link |
00:43:41.560
Like bottles and microphones and faces, maybe lots of faces in my environment.
link |
00:43:47.560
So I'm learning to create something like a prototype face
link |
00:43:51.560
and a new face comes along and all I have to encode are the deviations from the prototype.
link |
00:43:55.560
So it's compressing all the time the stuff that frequently appears.
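The prototype-face idea reduces to the same trick. In this editor's toy, faces are small integer feature vectors and the prototype is given; in the real network both would be learned.

```python
def encode(face, prototype):
    """Store only the deviations of a new face from the learned prototype."""
    return [f - p for f, p in zip(face, prototype)]

def decode(deltas, prototype):
    """Reconstruct the face from the prototype plus its stored deviations."""
    return [p + d for p, d in zip(prototype, deltas)]
```

If faces cluster near the prototype, the deltas are small numbers that cost far fewer bits than the raw features, while decoding still recovers the face exactly.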
link |
00:44:00.560
There's one thing that appears all the time
link |
00:44:04.560
that is present all the time when the agent is interacting with its environment,
link |
00:44:09.560
which is the agent itself.
link |
00:44:11.560
So just for data compression reasons,
link |
00:44:14.560
it is extremely natural for this recurrent network
link |
00:44:18.560
to come up with little subnetworks that stand for the properties of the agents,
link |
00:44:23.560
the hand, the other actuators,
link |
00:44:27.560
and all the stuff that you need to better encode the data,
link |
00:44:31.560
which is influenced by the actions of the agent.
link |
00:44:34.560
So there, just as a side effect of data compression during problem solving,
link |
00:44:40.560
you have internal self models.
link |
00:44:45.560
Now you can use this model of the world to plan your future.
link |
00:44:51.560
And that's what we also have done since 1990.
link |
00:44:54.560
So the recurrent network, which is the controller,
link |
00:44:57.560
which is trying to maximize reward,
link |
00:44:59.560
can use this model network of the world,
link |
00:45:02.560
this predictive model of the world
link |
00:45:05.560
to plan ahead and say, let's not do this action sequence.
link |
00:45:08.560
Let's do this action sequence instead
link |
00:45:11.560
because it leads to more predicted rewards.
link |
00:45:14.560
And whenever it's waking up these little subnetworks that stand for itself,
link |
00:45:19.560
then it's thinking about itself.
link |
00:45:21.560
Then it's thinking about itself.
link |
00:45:23.560
And it's exploring mentally the consequences of its own actions.
link |
00:45:30.560
And now you tell me why it's still missing.
link |
00:45:36.560
Missing the gap to consciousness.
link |
00:45:39.560
There isn't. That's a really beautiful idea that, you know,
link |
00:45:43.560
if life is a collection of data
link |
00:45:46.560
and life is a process of compressing that data to act efficiently.
link |
00:45:53.560
In that data, you yourself appear very often.
link |
00:45:57.560
So it's useful to form compressions of yourself.
link |
00:46:00.560
And it's a really beautiful formulation of what consciousness is,
link |
00:46:03.560
is a necessary side effect.
link |
00:46:05.560
It's actually quite compelling to me.
link |
00:46:11.560
We've discussed RNNs; you developed LSTMs, long short term memory networks.
link |
00:46:18.560
They're a type of recurrent neural networks.
link |
00:46:22.560
They've gotten a lot of success recently.
link |
00:46:24.560
So these are networks that model the temporal aspects in the data,
link |
00:46:29.560
temporal patterns in the data.
link |
00:46:31.560
And you've called them the deepest of the neural networks, right?
link |
00:46:36.560
What do you think is the value of depth in the models that we use to learn?
link |
00:46:43.560
Yeah, since you mentioned the long short term memory and the LSTM,
link |
00:46:47.560
I have to mention the names of the brilliant students who made that possible.
link |
00:46:52.560
Yes, of course, of course.
link |
00:46:53.560
First of all, my first student ever, Sepp Hochreiter,
link |
00:46:56.560
who had fundamental insights already in his diploma thesis.
link |
00:47:00.560
Then Felix Gers, who had additional important contributions.
link |
00:47:04.560
Alex Graves is a guy from Scotland who is mostly responsible for this CTC algorithm,
link |
00:47:11.560
which is now often used to train the LSTM to do the speech recognition
link |
00:47:16.560
on all the Google Android phones and whatever, and Siri and so on.
link |
00:47:21.560
So these guys, without these guys, I would be nothing.
link |
00:47:26.560
It's a lot of incredible work.
link |
00:47:28.560
What is now the depth?
link |
00:47:30.560
What is the importance of depth?
link |
00:47:32.560
Well, most problems in the real world are deep
link |
00:47:36.560
in the sense that the current input doesn't tell you all you need to know
link |
00:47:41.560
about the environment.
link |
00:47:44.560
So instead, you have to have a memory of what happened in the past
link |
00:47:49.560
and often important parts of that memory are dated.
link |
00:47:54.560
They are pretty old.
link |
00:47:56.560
So when you're doing speech recognition, for example,
link |
00:47:59.560
and somebody says 11,
link |
00:48:03.560
then that's about half a second or something like that,
link |
00:48:08.560
which means it's already 50 time steps.
link |
00:48:11.560
And another guy or the same guy says 7.
link |
00:48:15.560
So the ending is the same: 'even'.
link |
00:48:18.560
But now the system has to see the distinction between 7 and 11,
link |
00:48:22.560
and the only way it can see the difference is it has to store
link |
00:48:26.560
that 50 steps ago there was an S or an L, 'seven' or 'eleven'.
link |
00:48:34.560
So there you have already a problem of depth 50,
link |
00:48:37.560
because for each time step you have something like a virtual layer
link |
00:48:42.560
in the expanded, unrolled version of this recurrent network
link |
00:48:45.560
which is doing the speech recognition.
link |
00:48:47.560
So these long time lags, they translate into problem depth.
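The "time lags become depth" point can be made concrete with an editor's toy: the latch cell below stands in for a real recurrent unit, and unrolling over the input sequence produces one virtual layer per time step.

```python
def unrolled_rnn(h0, inputs, step):
    """Unrolling a recurrent net over T inputs is a T-layer-deep
    feedforward computation: one virtual layer per time step."""
    h = h0
    for x in inputs:
        h = step(h, x)  # each step is one 'virtual layer'
    return h

def latch_s(h, x):
    # toy cell: remember forever whether an 's' was ever seen
    return h or (x == 's')
```

Distinguishing 'seven' from 'eleven' at the final step means carrying one bit across the whole unrolled depth: `unrolled_rnn(False, "seven", latch_s)` ends up `True`, while for `"eleven"` it stays `False`.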
link |
00:48:53.560
And most problems in this world are such that you really
link |
00:48:59.560
have to look far back in time to understand what is the problem
link |
00:49:04.560
and to solve it.
link |
00:49:06.560
But just like with LSTMs, you don't necessarily need to,
link |
00:49:09.560
when you look back in time, remember every aspect.
link |
00:49:12.560
You just need to remember the important aspects.
link |
00:49:14.560
That's right.
link |
00:49:15.560
The network has to learn to put the important stuff into memory
link |
00:49:19.560
and to ignore the unimportant noise.
link |
00:49:23.560
But in that sense, deeper and deeper is better?
link |
00:49:28.560
Or is there a limitation?
link |
00:49:30.560
I mean LSTM is one of the great examples of architectures
link |
00:49:36.560
that do something beyond just deeper and deeper networks.
link |
00:49:41.560
There's clever mechanisms for filtering data for remembering and forgetting.
link |
00:49:47.560
So do you think that kind of thinking is necessary?
link |
00:49:51.560
If you think about LSTMs as a leap, a big leap forward
link |
00:49:54.560
over traditional vanilla RNNs, what do you think is the next leap
link |
00:50:01.560
within this context?
link |
00:50:03.560
So LSTM is a very clever improvement, but LSTMs still don't
link |
00:50:08.560
have the same kind of ability to see far back in the past
link |
00:50:13.560
as humans do; the credit assignment problem reaches way back,
link |
00:50:18.560
not just 50 time steps or 100 or 1,000, but millions and billions.
link |
00:50:24.560
It's not clear what are the practical limits of the LSTM
link |
00:50:28.560
when it comes to looking back.
link |
00:50:30.560
Already in 2006, I think, we had examples where it not only
link |
00:50:35.560
looked back tens of thousands of steps, but really millions of steps.
link |
00:50:40.560
And Juan Perez Ortiz in my lab, I think was the first author of a paper
link |
00:50:46.560
where we really, was it 2006 or something, had examples where it
link |
00:50:51.560
learned to look back for more than 10 million steps.
link |
00:50:56.560
So for most problems of speech recognition, it's not
link |
00:51:02.560
necessary to look that far back, but there are examples where it does.
link |
00:51:06.560
Now, the looking back thing, that's rather easy because there is only
link |
00:51:12.560
one past, but there are many possible futures.
link |
00:51:17.560
And so a reinforcement learning system, which is trying to maximize
link |
00:51:21.560
its future expected reward and doesn't know yet which of these
link |
00:51:26.560
many possible futures should I select, given this one single past,
link |
00:51:31.560
is facing problems that the LSTM by itself cannot solve.
link |
00:51:36.560
So the LSTM is good for coming up with a compact representation
link |
00:51:40.560
of the history so far, of the observations and actions so far.
link |
00:51:46.560
But now, how do you plan in an efficient and good way among all these,
link |
00:51:53.560
how do you select one of these many possible action sequences
link |
00:51:57.560
that a reinforcement learning system has to consider to maximize
link |
00:52:02.560
reward in this unknown future.
link |
00:52:05.560
So again, we have this basic setup where you have one recurrent network,
link |
00:52:11.560
which gets in the video and the speech and whatever, and it's
link |
00:52:16.560
executing the actions and it's trying to maximize reward.
link |
00:52:19.560
So there is no teacher who tells it what to do at which point in time.
link |
00:52:24.560
And then there's the other network, which is just predicting
link |
00:52:29.560
what's going to happen if I do that and that.
link |
00:52:32.560
And that could be an LSTM network, and it learns to look back
link |
00:52:36.560
all the way to make better predictions of the next time step.
link |
00:52:41.560
So essentially, although it's predicting only the next time step,
link |
00:52:45.560
it is motivated to learn to put into memory something that happened
link |
00:52:50.560
maybe a million steps ago because it's important to memorize that
link |
00:52:54.560
if you want to predict that at the next time step, the next event.
link |
00:52:58.560
Now, how can a model of the world like that,
link |
00:53:03.560
a predictive model of the world be used by the first guy,
link |
00:53:07.560
let's call it the controller and the model, the controller and the model.
link |
00:53:11.560
How can the model be used by the controller to efficiently select
link |
00:53:16.560
among these many possible futures?
link |
00:53:19.560
The naive way we had about 30 years ago was
link |
00:53:23.560
let's just use the model of the world as a stand in,
link |
00:53:27.560
as a simulation of the world.
link |
00:53:29.560
And millisecond by millisecond we plan the future
link |
00:53:32.560
and that means we have to roll it out really in detail
link |
00:53:36.560
and it will work only if the model is really good
link |
00:53:38.560
and it will still be inefficient
link |
00:53:40.560
because we have to look at all these possible futures
link |
00:53:43.560
and there are so many of them.
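That naive roll-it-out-in-detail planner looks roughly like this. This is an editor's sketch: the model interface, states, and rewards are invented toys, and the point is the cost, which grows with the number of candidate futures.

```python
def naive_plan(state, model, action_seqs):
    """Use the learned model as a stand-in simulation: roll every candidate
    action sequence forward step by step and keep the one with the highest
    predicted cumulative reward. Works only if the model is good, and is
    inefficient because every possible future is simulated in detail."""
    def predicted_return(seq):
        s, total = state, 0.0
        for a in seq:
            s, r = model(s, a)  # model predicts next state and reward
            total += r
        return total
    return max(action_seqs, key=predicted_return)
```

With a toy model that rewards staying near zero, `naive_plan` picks the do-nothing sequence out of the candidates; the 2015-style alternative described next lets the controller learn when to query or ignore such a model instead of exhaustively rolling it out.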
link |
00:53:45.560
So instead, what we do now since 2015 in our CM systems,
link |
00:53:50.560
controller model systems, we give the controller the opportunity
link |
00:53:54.560
to learn by itself how to use the potentially relevant parts
link |
00:54:00.560
of the model network to solve new problems more quickly.
link |
00:54:05.560
And if it wants to, it can learn to ignore the M
link |
00:54:09.560
and sometimes it's a good idea to ignore the M
link |
00:54:12.560
because it's really bad, it's a bad predictor
link |
00:54:15.560
in this particular situation of life
link |
00:54:18.560
where the controller is currently trying to maximize reward.
link |
00:54:22.560
However, it can also learn to address and exploit
link |
00:54:26.560
some of the subprograms that came about in the model network
link |
00:54:32.560
through compressing the data by predicting it.
link |
00:54:35.560
So it now has an opportunity to reuse that code,
link |
00:54:40.560
the algorithmic information in the model network
link |
00:54:43.560
to reduce its own search space,
link |
00:54:47.560
such that it can solve a new problem more quickly
link |
00:54:50.560
than without the model.
link |
00:54:52.560
Compression.
link |
00:54:54.560
So you're ultimately optimistic and excited
link |
00:54:58.560
about the power of reinforcement learning
link |
00:55:02.560
in the context of real systems.
link |
00:55:04.560
Absolutely, yeah.
link |
00:55:06.560
So you see RL as a potential having a huge impact
link |
00:55:11.560
beyond just, sort of, the M part, which is often developed
link |
00:55:15.560
with supervised learning methods.
link |
00:55:19.560
You see RL as, for problems of self driving cars
link |
00:55:25.560
or any kind of applied side robotics,
link |
00:55:28.560
that's the correct, interesting direction of research, in your view?
link |
00:55:33.560
I do think so.
link |
00:55:35.560
We have a company called NNAISENSE,
link |
00:55:37.560
which has applied reinforcement learning to little Audis.
link |
00:55:43.560
Little Audis.
link |
00:55:45.560
Which learn to park without a teacher.
link |
00:55:47.560
The same principles were used, of course.
link |
00:55:51.560
So these little Audis, they are small, maybe like that,
link |
00:55:54.560
so much smaller than the real Audis.
link |
00:55:57.560
But they have all the sensors that you find in the real Audis.
link |
00:56:00.560
You find the cameras, the LIDAR sensors.
link |
00:56:03.560
They go up to 120 kilometers an hour if they want to.
link |
00:56:08.560
And they have pain sensors, basically.
link |
00:56:12.560
And they don't want to bump against obstacles and other Audis.
link |
00:56:16.560
And so they must learn like little babies to park.
link |
00:56:21.560
Take the raw vision input and translate that into actions
link |
00:56:25.560
that lead to successful parking behavior,
link |
00:56:28.560
which is a rewarding thing.
link |
00:56:30.560
And yes, they learn that.
link |
00:56:32.560
So we have examples like that.
link |
00:56:34.560
And it's only in the beginning.
link |
00:56:36.560
This is just a tip of the iceberg.
link |
00:56:38.560
And I believe the next wave of AI is going to be all about that.
link |
00:56:44.560
So at the moment, the current wave of AI is about
link |
00:56:47.560
passive pattern observation and prediction.
link |
00:56:51.560
And that's what you have on your smartphone
link |
00:56:54.560
and what the major companies on the Pacific Rim are using
link |
00:56:58.560
to sell you ads to do marketing.
link |
00:57:01.560
That's the current sort of profit in AI.
link |
00:57:04.560
And that's only one or two percent of the world economy,
link |
00:57:09.560
which is big enough to make these companies
link |
00:57:11.560
pretty much the most valuable companies in the world.
link |
00:57:14.560
But there's a much, much bigger fraction of the economy
link |
00:57:19.560
going to be affected by the next wave,
link |
00:57:21.560
which is really about machines that shape the data
link |
00:57:25.560
through their own actions.
link |
00:57:27.560
Do you think simulation is ultimately the biggest way
link |
00:57:32.560
that those methods will be successful in the next 10, 20 years?
link |
00:57:36.560
We're not talking about 100 years from now.
link |
00:57:38.560
We're talking about sort of the near term impact of RL.
link |
00:57:42.560
Do you think really good simulation is required?
link |
00:57:44.560
Or is there other techniques like imitation learning,
link |
00:57:48.560
observing other humans operating in the real world?
link |
00:57:53.560
Where do you think this success will come from?
link |
00:57:57.560
So at the moment we have a tendency of using
link |
00:58:01.560
physics simulations to learn behavior for machines
link |
00:58:06.560
that learn to solve problems that humans also do not know how to solve.
link |
00:58:13.560
However, this is not the future,
link |
00:58:15.560
because the future is in what little babies do.
link |
00:58:19.560
They don't use a physics engine to simulate the world.
link |
00:58:22.560
They learn a predictive model of the world,
link |
00:58:25.560
which maybe sometimes is wrong in many ways,
link |
00:58:29.560
but captures all kinds of important abstract high level predictions
link |
00:58:34.560
which are really important to be successful.
link |
00:58:37.560
And that's what was the future 30 years ago
link |
00:58:42.560
when we started that type of research,
link |
00:58:44.560
but it's still the future,
link |
00:58:45.560
and now we know much better how to move forward
link |
00:58:50.560
and to really make working systems based on that,
link |
00:58:54.560
where you have a learning model of the world,
link |
00:58:57.560
a model of the world that learns to predict what's going to happen
link |
00:59:00.560
if I do that and that,
link |
00:59:01.560
and then the controller uses that model
link |
00:59:06.560
to more quickly learn successful action sequences.
link |
00:59:11.560
And then of course always this curiosity thing,
link |
00:59:13.560
in the beginning the model is stupid,
link |
00:59:15.560
so the controller should be motivated
link |
00:59:17.560
to come up with experiments, with action sequences
link |
00:59:20.560
that lead to data that improve the model.
link |
00:59:23.560
Do you think improving the model,
link |
00:59:26.560
constructing an understanding of the world in this connection
link |
00:59:30.560
is important? Now, the popular approaches that have been successful
link |
00:59:34.560
are grounded in ideas of neural networks,
link |
00:59:38.560
but in the 80s with expert systems there's symbolic AI approaches,
link |
00:59:43.560
which to us humans are more intuitive
link |
00:59:47.560
in the sense that it makes sense that you build up knowledge
link |
00:59:50.560
in this knowledge representation.
link |
00:59:52.560
What kind of lessons can we draw into our current approaches
link |
00:59:56.560
from expert systems, from symbolic AI?
link |
01:00:00.560
So I became aware of all of that in the 80s
link |
01:00:04.560
and back then logic programming was a huge thing.
link |
01:00:09.560
Was it inspiring to you? Did you find it compelling,
link |
01:00:12.560
even though a lot of your work was not so much in that realm,
link |
01:00:16.560
but more in the learning systems?
link |
01:00:18.560
Yes and no, but we did all of that.
link |
01:00:20.560
So my first publication ever, actually in 1987,
link |
01:00:27.560
was the implementation of a genetic algorithm
link |
01:00:31.560
of a genetic programming system in Prolog.
link |
01:00:34.560
So Prolog, that's what you learn back then,
link |
01:00:37.560
which is a logic programming language,
link |
01:00:39.560
and the Japanese, they had this huge fifth generation AI project,
link |
01:00:45.560
which was mostly about logic programming back then,
link |
01:00:48.560
although neural networks existed and were well known back then,
link |
01:00:53.560
and deep learning has existed since 1965,
link |
01:00:57.560
since this guy in Ukraine, Ivakhnenko, started it,
link |
01:01:01.560
but the Japanese and many other people,
link |
01:01:05.560
they focused really on this logic programming,
link |
01:01:07.560
and I was influenced to the extent that I said,
link |
01:01:10.560
okay, let's take these biologically inspired algorithms
link |
01:01:13.560
like evolutionary programs,
link |
01:01:16.560
and implement that in the language which I know,
link |
01:01:22.560
which was Prolog, for example, back then.
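The 1987 genetic-programming-in-Prolog system isn't reproduced anywhere in this conversation, but the family of biologically inspired algorithms mentioned here is easy to illustrate. Below is a minimal generic genetic algorithm in Python on a toy bitstring task; the task, population size, and rates are arbitrary assumptions, and nothing here is the original Prolog code.

```python
import random

# Minimal generic genetic algorithm on a toy problem: evolve an all-ones
# bitstring. Purely illustrative; task, rates, and sizes are arbitrary.
random.seed(0)

LENGTH = 20

def fitness(ind):
    return sum(ind)  # number of bits already set to one

def mutate(ind, rate=0.05):
    return [1 - g if random.random() < rate else g for g in ind]

def crossover(a, b):
    cut = random.randrange(1, LENGTH)  # single-point crossover
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(30)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)
    if fitness(pop[0]) == LENGTH:
        break
    parents = pop[:10]  # truncation selection: keep the fittest third
    pop = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(20)
    ]

best_fit = fitness(max(pop, key=fitness))
print(best_fit)
```

Keeping the parents unchanged each generation (elitism) guarantees the best fitness never decreases, so the population climbs steadily toward the target.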
link |
01:01:24.560
And then in many ways this came back later,
link |
01:01:28.560
because the Gödel machine, for example,
link |
01:01:31.560
has a proof searcher on board,
link |
01:01:33.560
and without that it would not be optimal.
link |
01:01:35.560
Well, Marcus Hutter's universal algorithm
link |
01:01:38.560
for solving all well defined problems
link |
01:01:40.560
has a proof searcher on board,
link |
01:01:42.560
so that's very much logic programming.
link |
01:01:46.560
Without that it would not be asymptotically optimal.
link |
01:01:50.560
But then on the other hand, because we are very pragmatic guys also,
link |
01:01:54.560
we focused on recurrent neural networks
link |
01:01:59.560
and suboptimal stuff such as gradient based search
link |
01:02:04.560
in program space rather than provably optimal things.
link |
01:02:09.560
So logic programming certainly has a usefulness
link |
01:02:13.560
when you're trying to construct something provably optimal
link |
01:02:17.560
or provably good or something like that,
link |
01:02:19.560
but is it useful for practical problems?
link |
01:02:22.560
It's really useful for theorem proving.
link |
01:02:24.560
The best theorem provers today are not neural networks.
link |
01:02:28.560
No, they are logic programming systems
link |
01:02:31.560
that are much better theorem provers than most math students
link |
01:02:35.560
in the first or second semester.
link |
01:02:38.560
But for reasoning, for playing games of Go, or chess,
link |
01:02:42.560
or for robots, autonomous vehicles that operate in the real world,
link |
01:02:46.560
or object manipulation, you think learning...
link |
01:02:50.560
Yeah, as long as the problems have little to do
link |
01:02:53.560
with theorem proving themselves,
link |
01:02:58.560
then as long as that is not the case,
link |
01:03:01.560
you just want to have better pattern recognition.
link |
01:03:05.560
So to build a self driving car, you want to have better pattern recognition
link |
01:03:09.560
and pedestrian recognition and all these things,
link |
01:03:13.560
and you want to minimize the number of false positives,
link |
01:03:18.560
which is currently slowing down self driving cars in many ways.
link |
01:03:22.560
And all of that has very little to do with logic programming.
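The false-positive point can be made concrete: with any fixed detector, raising the decision threshold trades missed detections for fewer false alarms. The scores and labels below are invented for illustration, not taken from any real pedestrian detector.

```python
# Toy precision/recall trade-off behind the false-positive problem.
# Detection scores and ground-truth labels are made up for illustration.
scores = [0.95, 0.9, 0.85, 0.7, 0.65, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,    1,   0,    1,   0,    1,   0,   0,   1,   0]  # 1 = pedestrian

def confusion(threshold):
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return tp, fp, fn

for th in (0.3, 0.6, 0.8):
    tp, fp, fn = confusion(th)
    print(f"threshold={th}: hits={tp}, false alarms={fp}, misses={fn}")
```

Raising the threshold here drops false alarms from 4 to 1 while the missed pedestrians grow from 1 to 3, exactly the trade-off a deployed system has to balance.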
link |
01:03:27.560
What are you most excited about in terms of directions
link |
01:03:32.560
of artificial intelligence at this moment in the next few years,
link |
01:03:36.560
in your own research and in the broader community?
link |
01:03:41.560
So I think in the not so distant future,
link |
01:03:44.560
we will have for the first time little robots that learn like kids.
link |
01:03:52.560
And I will be able to say to the robot,
link |
01:03:57.560
look here robot, we are going to assemble a smartphone.
link |
01:04:00.560
Let's take this slab of plastic and the screwdriver
link |
01:04:05.560
and let's screw in the screw like that.
link |
01:04:08.560
No, not like that, like that.
link |
01:04:11.560
Not like that, like that.
link |
01:04:13.560
And I don't have a data glove or something.
link |
01:04:17.560
He will see me and he will hear me
link |
01:04:20.560
and he will try to do something with his own actuators,
link |
01:04:24.560
which will be really different from mine,
link |
01:04:26.560
but he will understand the difference
link |
01:04:28.560
and will learn to imitate me but not in the supervised way
link |
01:04:34.560
where a teacher is giving target signals for all his muscles all the time.
link |
01:04:40.560
No, by doing this high level imitation
link |
01:04:43.560
where he first has to learn to imitate me
link |
01:04:46.560
and to interpret these additional noises coming from my mouth
link |
01:04:50.560
as helpful signals to do that better.
link |
01:04:54.560
And then it will by itself come up with faster ways
link |
01:05:00.560
and more efficient ways of doing the same thing.
link |
01:05:03.560
And finally, I stop his learning algorithm
link |
01:05:07.560
and make a million copies and sell it.
link |
01:05:10.560
And so at the moment this is not possible,
link |
01:05:13.560
but we already see how we are going to get there.
link |
01:05:16.560
And you can imagine to the extent that this works economically and cheaply,
link |
01:05:21.560
it's going to change everything.
link |
01:05:24.560
Almost all our production is going to be affected by that.
link |
01:05:30.560
And a much bigger wave,
link |
01:05:33.560
a much bigger AI wave is coming
link |
01:05:36.560
than the one that we are currently witnessing,
link |
01:05:38.560
which is mostly about passive pattern recognition on your smartphone.
link |
01:05:41.560
This is about active machines that shape data through the actions they are executing
link |
01:05:47.560
and they learn to do that in a good way.
link |
01:05:51.560
So many of the traditional industries are going to be affected by that.
link |
01:05:56.560
All the companies that are building machines
link |
01:06:00.560
will equip these machines with cameras and other sensors
link |
01:06:05.560
and they are going to learn to solve all kinds of problems
link |
01:06:10.560
through interaction with humans, but also a lot on their own,
link |
01:06:14.560
to improve what they already can do.
link |
01:06:18.560
And lots of old economy is going to be affected by that.
link |
01:06:23.560
And in recent years I have seen that old economy is actually waking up
link |
01:06:28.560
and realizing that this is the case.
link |
01:06:31.560
Are you optimistic about that future? Are you concerned?
link |
01:06:35.560
There's a lot of people concerned in the near term about the transformation
link |
01:06:40.560
of the nature of work.
link |
01:06:42.560
The kind of ideas that you just suggested
link |
01:06:45.560
would have a significant impact on what kind of things could be automated.
link |
01:06:48.560
Are you optimistic about that future?
link |
01:06:51.560
Are you nervous about that future?
link |
01:06:54.560
And looking a little bit farther into the future, there's people like Elon Musk
link |
01:07:01.560
and Stuart Russell concerned about the existential threats of that future.
link |
01:07:06.560
So in the near term, job loss in the long term existential threat,
link |
01:07:10.560
are these concerns to you or are you ultimately optimistic?
link |
01:07:15.560
So let's first address the near future.
link |
01:07:22.560
We have had predictions of job losses for many decades.
link |
01:07:27.560
For example, when industrial robots came along,
link |
01:07:32.560
many people predicted that lots of jobs are going to get lost.
link |
01:07:37.560
And in a sense, they were right,
link |
01:07:41.560
because back then there were car factories
link |
01:07:45.560
and hundreds of people in these factories assembled cars.
link |
01:07:50.560
And today the same car factories have hundreds of robots
link |
01:07:53.560
and maybe three guys watching the robots.
link |
01:07:58.560
On the other hand, those countries that have lots of robots per capita,
link |
01:08:04.560
Japan, Korea, Germany, Switzerland, a couple of other countries,
link |
01:08:09.560
they have really low unemployment rates.
link |
01:08:13.560
Somehow all kinds of new jobs were created.
link |
01:08:17.560
Back then nobody anticipated those jobs.
link |
01:08:22.560
And decades ago, I already said,
link |
01:08:26.560
it's really easy to say which jobs are going to get lost,
link |
01:08:31.560
but it's really hard to predict the new ones.
link |
01:08:35.560
30 years ago, who would have predicted all these people
link |
01:08:38.560
making money as YouTube bloggers, for example?
link |
01:08:44.560
200 years ago, 60% of all people used to work in agriculture.
link |
01:08:51.560
Today, maybe 1%.
link |
01:08:55.560
But still, only, I don't know, 5% unemployment.
link |
01:09:01.560
Lots of new jobs were created.
link |
01:09:03.560
And Homo Ludens, the playing man,
link |
01:09:07.560
is inventing new jobs all the time.
link |
01:09:10.560
Most of these jobs are not existentially necessary
link |
01:09:15.560
for the survival of our species.
link |
01:09:18.560
There are only very few existentially necessary jobs
link |
01:09:22.560
such as farming and building houses and warming up the houses,
link |
01:09:27.560
but less than 10% of the population is doing that.
link |
01:09:30.560
And most of these newly invented jobs are about interacting with other people
link |
01:09:37.560
in new ways, through new media and so on,
link |
01:09:40.560
getting new types of kudos and forms of likes and whatever,
link |
01:09:45.560
and even making money through that.
link |
01:09:47.560
So, Homo Ludens, the playing man, doesn't want to be unemployed,
link |
01:09:52.560
and that's why he's inventing new jobs all the time.
link |
01:09:56.560
And he keeps considering these jobs as really important
link |
01:10:01.560
and is investing a lot of energy and hours of work into those new jobs.
link |
01:10:07.560
That's quite beautifully put.
link |
01:10:09.560
We're really nervous about the future
link |
01:10:11.560
because we can't predict what kind of new jobs will be created.
link |
01:10:14.560
But you're ultimately optimistic that we humans are so restless
link |
01:10:20.560
that we create and give meaning to newer and newer jobs,
link |
01:10:24.560
doing things that get likes on Facebook
link |
01:10:29.560
or whatever the social platform is.
link |
01:10:31.560
So, what about long term existential threat of AI
link |
01:10:36.560
where our whole civilization may be swallowed up
link |
01:10:40.560
by this ultra super intelligent systems?
link |
01:10:44.560
Maybe it's not going to be swallowed up,
link |
01:10:47.560
but I'd be surprised if we humans were the last step
link |
01:10:55.560
in the evolution of the universe.
link |
01:10:59.560
You've actually had this beautiful comment somewhere
link |
01:11:03.560
that I've seen, quite insightful,
link |
01:11:08.560
saying that artificial intelligence systems
link |
01:11:11.560
just like us humans will likely not want to interact with humans.
link |
01:11:15.560
They'll just interact amongst themselves,
link |
01:11:17.560
just like ants interact amongst themselves
link |
01:11:20.560
and only tangentially interact with humans.
link |
01:11:24.560
And it's quite an interesting idea that once we create AGI
link |
01:11:28.560
that will lose interest in humans
link |
01:11:31.560
and instead compete for their own Facebook likes
link |
01:11:34.560
and their own social platforms.
link |
01:11:36.560
So, within that quite elegant idea,
link |
01:11:40.560
how do we know in a hypothetical sense
link |
01:11:44.560
that there's not already intelligent systems out there?
link |
01:11:48.560
How do you think broadly of general intelligence
link |
01:11:52.560
greater than us, how do we know it's out there?
link |
01:11:56.560
How do we know it's around us and could it already be?
link |
01:12:01.560
I'd be surprised if within the next few decades
link |
01:12:04.560
or something like that we won't have AIs
link |
01:12:10.560
that are truly smart in every single way
link |
01:12:12.560
and better problem solvers in almost every single important way.
link |
01:12:17.560
And I'd be surprised if they wouldn't realize
link |
01:12:22.560
what we have realized a long time ago,
link |
01:12:24.560
which is that almost all physical resources are not here
link |
01:12:29.560
in this biosphere, but out there in space. The rest of the solar system
link |
01:12:36.560
gets two billion times more solar energy than our little planet.
link |
01:12:42.560
There's lots of material out there that you can use
link |
01:12:46.560
to build robots and self replicating robot factories and all this stuff.
link |
01:12:51.560
And they are going to do that.
link |
01:12:53.560
And they will be scientists and curious
link |
01:12:56.560
and they will explore what they can do.
link |
01:12:59.560
And in the beginning they will be fascinated by life
link |
01:13:04.560
and by their own origins in our civilization.
link |
01:13:07.560
They will want to understand that completely,
link |
01:13:09.560
just like people today would like to understand how life works
link |
01:13:13.560
and also the history of our own existence and civilization
link |
01:13:22.560
and also the physical laws that created all of them.
link |
01:13:26.560
So in the beginning they will be fascinated by life
link |
01:13:29.560
once they understand it, they lose interest,
link |
01:13:33.560
like anybody who loses interest in things he understands.
link |
01:13:39.560
And then, as you said,
link |
01:13:43.560
the most interesting sources of information for them
link |
01:13:50.560
will be others of their own kind.
link |
01:13:57.560
So, at least in the long run,
link |
01:14:01.560
there seems to be some sort of protection
link |
01:14:06.560
through lack of interest on the other side.
link |
01:14:11.560
And now it seems also clear, as far as we understand physics,
link |
01:14:16.560
you need matter and energy to compute
link |
01:14:20.560
and to build more robots and infrastructure
link |
01:14:22.560
and more AI civilization and AI ecologies
link |
01:14:28.560
consisting of trillions of different types of AI's.
link |
01:14:31.560
And so it seems inconceivable to me
link |
01:14:34.560
that this thing is not going to expand.
link |
01:14:37.560
Some AI ecology not controlled by one AI
link |
01:14:41.560
but trillions of different types of AI's competing
link |
01:14:44.560
in all kinds of quickly evolving
link |
01:14:47.560
and disappearing ecological niches
link |
01:14:49.560
in ways that we cannot fathom at the moment.
link |
01:14:52.560
But it's going to expand,
link |
01:14:54.560
limited by light speed and physics,
link |
01:14:56.560
but it's going to expand and now we realize
link |
01:15:00.560
that the universe is still young.
link |
01:15:02.560
It's only 13.8 billion years old
link |
01:15:05.560
and it's going to be a thousand times older than that.
link |
01:15:10.560
So there's plenty of time
link |
01:15:13.560
to conquer the entire universe
link |
01:15:16.560
and to fill it with intelligence
link |
01:15:19.560
and to build senders and receivers such that
link |
01:15:21.560
AI's can travel the way they are traveling
link |
01:15:25.560
in our labs today,
link |
01:15:27.560
which is by radio from sender to receiver.
link |
01:15:31.560
And let's call the current age of the universe one eon.
link |
01:15:35.560
One eon.
link |
01:15:38.560
Now it will take just a few eons from now
link |
01:15:41.560
and the entire visible universe
link |
01:15:43.560
is going to be full of that stuff.
link |
01:15:46.560
And let's look ahead to a time
link |
01:15:48.560
when the universe is going to be
link |
01:15:50.560
one thousand times older than it is now.
link |
01:15:52.560
They will look back and they will say,
link |
01:15:54.560
look almost immediately after the Big Bang,
link |
01:15:56.560
only a few eons later,
link |
01:15:59.560
the entire universe started to become intelligent.
link |
01:16:02.560
Now to your question,
link |
01:16:05.560
how do we see whether anything like that
link |
01:16:08.560
has already happened or is already in a more advanced stage
link |
01:16:12.560
in some other part of the universe,
link |
01:16:14.560
of the visible universe?
link |
01:16:16.560
We are trying to look out there
link |
01:16:18.560
and nothing like that has happened so far.
link |
01:16:20.560
Or is that true?
link |
01:16:22.560
Do you think we would recognize it?
link |
01:16:24.560
How do we know it's not among us?
link |
01:16:26.560
How do we know planets aren't in themselves intelligent beings?
link |
01:16:30.560
How do we know ants seen as a collective
link |
01:16:36.560
are not much greater intelligence than our own?
link |
01:16:39.560
These kinds of ideas.
link |
01:16:41.560
When I was a boy, I was thinking about these things
link |
01:16:44.560
and I thought, hmm, maybe it has already happened.
link |
01:16:48.560
Because back then I knew,
link |
01:16:50.560
I learned from popular physics books,
link |
01:16:53.560
that the structure, the large scale structure of the universe
link |
01:16:57.560
is not homogeneous.
link |
01:16:59.560
And you have these clusters of galaxies
link |
01:17:02.560
and then in between there are these huge empty spaces.
link |
01:17:07.560
And I thought, hmm, maybe they aren't really empty.
link |
01:17:11.560
It's just that in the middle of that
link |
01:17:13.560
some AI civilization already has expanded
link |
01:17:16.560
and then has covered a bubble a billion light years in size,
link |
01:17:22.560
using all the energy of all the stars within that bubble
link |
01:17:26.560
for its own unfathomable purposes.
link |
01:17:29.560
And so maybe it had already happened, and we just failed to interpret the signs.
link |
01:17:34.560
But then I learned that gravity by itself
link |
01:17:39.560
explains the large scale structure of the universe
link |
01:17:42.560
so that this is not a convincing explanation.
link |
01:17:45.560
And then I thought maybe it's the dark matter
link |
01:17:50.560
because as far as we know today
link |
01:17:54.560
80% of the measurable matter is invisible.
link |
01:18:00.560
And we know that because otherwise our galaxy
link |
01:18:03.560
or other galaxies would fall apart.
link |
01:18:06.560
They are rotating too quickly.
link |
01:18:09.560
And then the idea was maybe all of these
link |
01:18:14.560
AI civilizations that are already out there,
link |
01:18:17.560
they are just invisible
link |
01:18:22.560
because they are really efficient in using the energies
link |
01:18:24.560
of their own local systems,
link |
01:18:26.560
and that's why they appear dark to us.
link |
01:18:29.560
But this is also not a convincing explanation
link |
01:18:31.560
because then the question becomes
link |
01:18:34.560
why are there still any visible stars left in our own galaxy
link |
01:18:41.560
which also must have a lot of dark matter.
link |
01:18:44.560
So that is also not a convincing thing.
link |
01:18:46.560
And today I like to think it's quite plausible
link |
01:18:53.560
that maybe we are the first, at least in our local light cone
link |
01:18:56.560
within the few hundreds of millions of light years
link |
01:19:04.560
that we can reliably observe.
link |
01:19:08.560
Is that exciting to you?
link |
01:19:10.560
That we might be the first?
link |
01:19:12.560
It would make us much more important
link |
01:19:16.560
because if we mess it up through a nuclear war
link |
01:19:20.560
then maybe this will have an effect
link |
01:19:25.560
on the development of the entire universe.
link |
01:19:30.560
So let's not mess it up.
link |
01:19:32.560
Let's not mess it up.
link |
01:19:33.560
Jürgen, thank you so much for talking today.
link |
01:19:35.560
I really appreciate it.
link |
01:19:36.560
It's my pleasure.