
Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11



link |
00:00:00.000
The following is a conversation with Jürgen Schmidhuber.
link |
00:00:03.520
He's the co-director of the Swiss AI Lab IDSIA
link |
00:00:06.320
and a co-creator of long short-term memory networks.
link |
00:00:10.360
LSTMs are used in billions of devices today
link |
00:00:13.720
for speech recognition, translation, and much more.
link |
00:00:17.400
Over 30 years, he has proposed a lot of interesting
link |
00:00:20.800
out of the box ideas on meta learning, adversarial networks,
link |
00:00:24.800
computer vision, and even a formal theory of quote,
link |
00:00:28.720
creativity, curiosity, and fun.
link |
00:00:32.360
This conversation is part of the MIT course
link |
00:00:34.920
on artificial general intelligence
link |
00:00:36.560
and the artificial intelligence podcast.
link |
00:00:38.840
If you enjoy it, subscribe on YouTube, iTunes,
link |
00:00:41.960
or simply connect with me on Twitter
link |
00:00:43.960
at Lex Fridman spelled F R I D.
link |
00:00:47.280
And now here's my conversation with Jürgen Schmidhuber.
link |
00:00:53.080
Early on you dreamed of AI systems
link |
00:00:55.640
that self improve recursively.
link |
00:00:58.680
When was that dream born?
link |
00:01:01.800
When I was a baby.
link |
00:01:03.160
No, that's not true.
link |
00:01:04.240
When I was a teenager.
link |
00:01:06.520
And what was the catalyst for that birth?
link |
00:01:09.680
What was the thing that first inspired you?
link |
00:01:12.920
When I was a boy, I was thinking about what to do in my life
link |
00:01:20.200
and then I thought the most exciting thing
link |
00:01:23.880
is to solve the riddles of the universe.
link |
00:01:28.200
And that means you have to become a physicist.
link |
00:01:30.920
However, then I realized that there's something even grander.
link |
00:01:35.840
You can try to build a machine
link |
00:01:39.880
that isn't really a machine any longer
link |
00:01:42.120
that learns to become a much better physicist
link |
00:01:44.520
than I could ever hope to be.
link |
00:01:47.080
And that's how I thought maybe I can multiply
link |
00:01:50.320
my tiny little bit of creativity into infinity.
link |
00:01:54.520
But ultimately that creativity will be multiplied
link |
00:01:57.280
to understand the universe around us.
link |
00:01:59.280
That's the curiosity for that mystery that drove you.
link |
00:02:05.800
Yes, so if you can build a machine
link |
00:02:08.440
that learns to solve more and more complex problems
link |
00:02:13.880
and to become a more and more general problem solver
link |
00:02:16.880
then you basically have solved all the problems,
link |
00:02:22.680
at least all the solvable problems.
link |
00:02:26.080
So how do you think, what does the mechanism
link |
00:02:28.120
for that kind of general solver look like?
link |
00:02:31.640
Obviously we don't quite yet have one
link |
00:02:34.840
or know how to build one but we have ideas
link |
00:02:37.040
and you have had throughout your career
link |
00:02:39.120
several ideas about it.
link |
00:02:40.800
So how do you think about that mechanism?
link |
00:02:43.640
So in the 80s, I thought about how to build this machine
link |
00:02:48.640
that learns to solve all these problems
link |
00:02:51.040
that I cannot solve myself.
link |
00:02:54.120
And I thought it is clear it has to be a machine
link |
00:02:57.160
that not only learns to solve this problem here
link |
00:03:00.880
and this problem here but it also has to learn
link |
00:03:04.160
to improve the learning algorithm itself.
link |
00:03:09.360
So it has to have the learning algorithm
link |
00:03:12.480
in a representation that allows it to inspect it
link |
00:03:15.720
and modify it such that it can come up
link |
00:03:19.240
with a better learning algorithm.
link |
00:03:21.080
So I call that meta learning, learning to learn
link |
00:03:24.600
and recursive self improvement
link |
00:03:26.760
that is really the pinnacle of that
link |
00:03:28.760
where you then not only learn how to improve
link |
00:03:34.800
on that problem and on that
link |
00:03:36.440
but you also improve the way the machine improves
link |
00:03:40.000
and you also improve the way it improves
link |
00:03:42.160
the way it improves itself.
link |
00:03:44.600
And that was my 1987 diploma thesis
link |
00:03:47.440
which was all about exactly that: a
link |
00:03:50.920
hierarchy of meta learners that have no computational limits
link |
00:03:57.240
except for the well known limits that Gödel identified
link |
00:04:01.640
in 1931 and for the limits of physics.
link |
00:04:06.480
In the recent years, meta learning has gained popularity
link |
00:04:10.040
in a specific kind of form.
link |
00:04:12.760
You've talked about how that's not really meta learning
link |
00:04:16.000
with neural networks, that's more basic transfer learning.
link |
00:04:21.480
Can you talk about the difference
link |
00:04:22.720
between the big general meta learning
link |
00:04:25.460
and a more narrow sense of meta learning
link |
00:04:27.960
the way it's used today, the way it's talked about today?
link |
00:04:30.880
Let's take the example of a deep neural network
link |
00:04:33.440
that has learned to classify images
link |
00:04:37.240
and maybe you have trained that network
link |
00:04:40.060
on 100 different databases of images.
link |
00:04:43.840
And now a new database comes along
link |
00:04:48.080
and you want to quickly learn the new thing as well.
link |
00:04:53.400
So one simple way of doing that is you take the network
link |
00:04:57.720
which already knows 100 types of databases
link |
00:05:02.440
and then you just take the top layer of that
link |
00:05:06.320
and you retrain that using the new label data
link |
00:05:11.320
that you have in the new image database.
link |
00:05:14.760
And then it turns out that it really, really quickly
link |
00:05:17.360
can learn that too, one shot basically
link |
00:05:20.600
because from the first 100 data sets,
link |
00:05:24.320
it already has learned so much about computer vision
link |
00:05:27.560
that it can reuse that and that is then almost good enough
link |
00:05:31.880
to solve the new task except you need a little bit
link |
00:05:34.240
of adjustment on the top.
link |
00:05:38.400
So that is transfer learning.
link |
00:05:41.280
And it has been done in principle for many decades.
link |
00:05:44.520
People have done similar things for decades.
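The recipe just described, keeping the pretrained layers frozen and retraining only the top layer on the new labels, can be sketched in a deliberately tiny form. Everything below (the fixed feature map, the toy task, the hyperparameters) is an illustrative stand-in, not any particular system from the conversation:

```python
# Toy transfer learning: the "pretrained" lower layers are a frozen,
# untrainable feature map; only the top linear layer is retrained.

def frozen_features(x):
    # Stand-in for the lower layers of a network trained on 100 databases:
    # fixed features that are never updated on the new task.
    return [1.0, x, x * x]

def train_top_layer(data, lr=0.05, epochs=2000):
    # Fit only the top-layer weights to the new labeled data by SGD.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# The "new database": labels from a task the frozen features happen to suit.
new_data = [(x / 10.0, 3.0 + 2.0 * (x / 10.0) ** 2) for x in range(-10, 11)]
w = train_top_layer(new_data)
```

Because the frozen features already "know" enough, only the small top layer needs fitting, which is fast; that is the quick adaptation described above.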
link |
00:05:48.520
Meta learning too, meta learning is about
link |
00:05:51.080
having the learning algorithm itself
link |
00:05:55.760
open to introspection by the system that is using it
link |
00:06:01.560
and also open to modification such that the learning system
link |
00:06:06.320
has an opportunity to modify
link |
00:06:09.680
any part of the learning algorithm
link |
00:06:12.040
and then evaluate the consequences of that modification
link |
00:06:16.840
and then learn from that to create
link |
00:06:21.000
a better learning algorithm and so on recursively.
link |
00:06:25.680
So that's a very different animal
link |
00:06:28.480
where you are opening the space of possible learning
link |
00:06:32.440
algorithms to the learning system itself.
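A deliberately minimal caricature of that loop: the system proposes a modification to its own learning algorithm (here, only its learning rate), evaluates the consequence, and keeps the change only if learning demonstrably improved. This is a toy under those stated assumptions, not the Gödel machine itself:

```python
# Base learner: plain gradient descent on a 1-D quadratic loss (w - 4)^2.
def train(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2.0 * (w - 4.0)
    return (w - 4.0) ** 2  # final loss achieved by this learning algorithm

lr = 0.001  # deliberately poor initial learning algorithm
for _ in range(20):
    for candidate in (lr * 0.5, lr * 2.0):  # proposed self-modifications
        # Evaluate the modified learning algorithm against the current one,
        # and adopt the modification only if it learns better.
        if train(candidate) < train(lr):
            lr = candidate
```

The search settles at a learning rate of 0.512, where the toy learner converges almost immediately; a too-aggressive modification (doubling past 1.0) makes training diverge and is rejected by its own evaluation.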
link |
00:06:35.480
Right, so you've, like in the 2004 paper, you described
link |
00:06:40.160
gator machines, programs that rewrite themselves, right?
link |
00:06:44.480
Philosophically and even in your paper, mathematically,
link |
00:06:47.480
these are really compelling ideas but practically,
link |
00:06:52.280
do you see these self referential programs
link |
00:06:55.280
being successful in the near term to having an impact
link |
00:06:59.360
where sort of it demonstrates to the world
link |
00:07:02.960
that this direction is a good one to pursue
link |
00:07:07.400
in the near term?
link |
00:07:08.640
Yes, we had these two different types
link |
00:07:11.320
of fundamental research,
link |
00:07:13.440
how to build a universal problem solver,
link |
00:07:15.800
one basically exploiting proof search
link |
00:07:22.960
and things like that that you need to come up with
link |
00:07:25.520
asymptotically optimal, theoretically optimal
link |
00:07:30.280
self improvers and problem solvers.
link |
00:07:34.160
However, one has to admit that through this proof search
link |
00:07:40.640
comes in an additive constant, an overhead,
link |
00:07:44.480
an additive overhead that vanishes in comparison
link |
00:07:50.760
to what you have to do to solve large problems.
link |
00:07:55.160
However, for many of the small problems
link |
00:07:58.000
that we want to solve in our everyday life,
link |
00:08:00.880
we cannot ignore this constant overhead
link |
00:08:03.280
and that's why we also have been doing other things,
link |
00:08:08.120
non universal things such as recurrent neural networks
link |
00:08:12.160
which are trained by gradient descent
link |
00:08:15.400
and local search techniques which aren't universal at all,
link |
00:08:18.680
which aren't provably optimal at all,
link |
00:08:21.280
like the other stuff that we did,
link |
00:08:22.840
but which are much more practical
link |
00:08:25.400
as long as we only want to solve the small problems
link |
00:08:28.760
that we are typically trying to solve
link |
00:08:33.320
in this environment here.
link |
00:08:35.600
So the universal problem solvers like the Gödel machine,
link |
00:08:38.920
but also Markus Hutter's fastest way
link |
00:08:42.080
of solving all possible problems,
link |
00:08:44.360
which he developed around 2002 in my lab,
link |
00:08:49.040
they are associated with these constant overheads
link |
00:08:52.520
for proof search, which guarantees that the thing
link |
00:08:55.160
that you're doing is optimal.
link |
00:08:57.480
For example, there is this fastest way
link |
00:09:01.160
of solving all problems with a computable solution,
link |
00:09:05.280
which is due to Markus, Markus Hutter,
link |
00:09:08.320
and to explain what's going on there,
link |
00:09:12.240
let's take traveling salesman problems.
link |
00:09:15.720
With traveling salesman problems,
link |
00:09:17.360
you have a number of cities
link |
00:09:21.320
and you try to find the shortest path
link |
00:09:23.680
through all these cities without visiting any city twice.
link |
00:09:29.440
And nobody knows the fastest way
link |
00:09:32.240
of solving traveling salesman problems, TSPs,
link |
00:09:38.720
but let's assume there is a method of solving them
link |
00:09:41.720
within N to the five operations
link |
00:09:45.840
where N is the number of cities.
link |
00:09:48.520
Then the universal method of Markus
link |
00:09:53.000
is going to solve the same traveling salesman problem
link |
00:09:57.000
also within N to the five steps,
link |
00:10:00.480
plus O of one, plus a constant number of steps
link |
00:10:04.760
that you need for the proof searcher,
link |
00:10:07.600
which you need to show that this particular class
link |
00:10:12.600
of problems, the traveling salesman problems,
link |
00:10:15.680
can be solved within a certain time frame,
link |
00:10:17.760
solved within a certain time bound,
link |
00:10:20.680
within order N to the five steps, basically,
link |
00:10:24.560
and this additive constant doesn't depend on N,
link |
00:10:28.720
which means as N is getting larger and larger,
link |
00:10:32.600
as you have more and more cities,
link |
00:10:35.080
the constant overhead pales in comparison,
link |
00:10:38.800
and that means that almost all large problems are solved
link |
00:10:44.400
in the best possible way.
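In symbols, the guarantee described here (stated informally, following the conversation) is:

```latex
% If some unknown program solves n-city TSP instances within n^5 steps,
% then the universal searcher also solves them within
T(n) \;\le\; n^5 + c ,
% where c is an additive constant paying for the proof search that does
% not depend on n, so its share of the total work vanishes:
\lim_{n \to \infty} \frac{c}{\,n^5\,} = 0 .
```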
link |
00:10:45.880
Today, we already have a universal problem solver like that.
link |
00:10:50.520
However, it's not practical because the overhead,
link |
00:10:54.560
the constant overhead is so large
link |
00:10:57.480
that for the small kinds of problems
link |
00:11:00.240
that we want to solve in this little biosphere.
link |
00:11:04.600
By the way, when you say small,
link |
00:11:06.400
you're talking about things that fall
link |
00:11:08.640
within the constraints of our computational systems.
link |
00:11:10.880
So they can seem quite large to us mere humans, right?
link |
00:11:14.440
That's right, yeah.
link |
00:11:15.360
So they seem large and even unsolvable
link |
00:11:19.000
in a practical sense today,
link |
00:11:21.040
but they are still small compared to almost all problems
link |
00:11:24.760
because almost all problems are large problems,
link |
00:11:28.480
which are much larger than any constant.
link |
00:11:31.920
Do you find it useful as a person
link |
00:11:34.520
who has dreamed of creating a general learning system,
link |
00:11:38.600
has worked on creating one,
link |
00:11:39.840
has done a lot of interesting ideas there,
link |
00:11:42.120
to think about P versus NP,
link |
00:11:46.320
this formalization of how hard problems are,
link |
00:11:50.760
how they scale,
link |
00:11:52.360
this kind of worst case analysis type of thinking,
link |
00:11:55.160
do you find that useful?
link |
00:11:56.800
Or is it only just a mathematical,
link |
00:12:00.520
it's a set of mathematical techniques
link |
00:12:02.600
to give you intuition about what's good and bad.
link |
00:12:05.720
So P versus NP, that's super interesting
link |
00:12:09.440
from a theoretical point of view.
link |
00:12:11.760
And in fact, as you are thinking about that problem,
link |
00:12:14.560
you can also get inspiration
link |
00:12:17.280
for better practical problem solvers.
link |
00:12:21.280
On the other hand, we have to admit
link |
00:12:23.320
that at the moment, the best practical problem solvers
link |
00:12:28.360
for all kinds of problems that we are now solving
link |
00:12:31.080
through what is called AI at the moment,
link |
00:12:33.840
they are not of the kind
link |
00:12:36.240
that is inspired by these questions.
link |
00:12:38.760
There we are using general purpose computers
link |
00:12:42.680
such as recurrent neural networks,
link |
00:12:44.800
but we have a search technique
link |
00:12:46.680
which is just local search gradient descent
link |
00:12:50.280
to try to find a program
link |
00:12:51.920
that is running on these recurrent networks,
link |
00:12:54.400
such that it can solve some interesting problems
link |
00:12:58.200
such as speech recognition or machine translation
link |
00:13:01.880
and something like that.
link |
00:13:03.120
And there is very little theory behind the best solutions
link |
00:13:08.120
that we have at the moment that can do that.
link |
00:13:10.840
Do you think that needs to change?
link |
00:13:12.680
Do you think that will change?
link |
00:13:14.080
Or can we create general intelligent systems
link |
00:13:17.160
without ever really proving that that system is intelligent
link |
00:13:20.640
in some kind of mathematical way,
link |
00:13:22.600
solving machine translation perfectly
link |
00:13:25.000
or something like that,
link |
00:13:26.320
within some kind of syntactic definition of a language,
link |
00:13:29.200
or can we just be super impressed
link |
00:13:31.160
by the thing working extremely well and that's sufficient?
link |
00:13:35.120
There's an old saying,
link |
00:13:36.760
and I don't know who brought it up first,
link |
00:13:39.360
which says, there's nothing more practical
link |
00:13:42.480
than a good theory.
link |
00:13:43.720
And a good theory of problem solving
link |
00:13:52.800
under limited resources,
link |
00:13:54.360
like here in this universe or on this little planet,
link |
00:13:58.480
has to take into account these limited resources.
link |
00:14:01.800
And so probably there is lacking a theory,
link |
00:14:08.040
which is related to what we already have,
link |
00:14:10.760
these asymptotically optimal problem solvers,
link |
00:14:14.400
which tells us what we need in addition to that
link |
00:14:18.520
to come up with a practically optimal problem solver.
link |
00:14:22.640
So I believe we will have something like that.
link |
00:14:27.040
And maybe just a few little tiny twists are necessary
link |
00:14:30.520
to change what we already have,
link |
00:14:34.280
to come up with that as well.
link |
00:14:36.320
As long as we don't have that,
link |
00:14:37.720
we admit that we are taking suboptimal ways
link |
00:14:42.560
and recurrent neural networks and long short term memory
link |
00:14:45.960
equipped with local search techniques.
link |
00:14:50.400
And we are happy that it works better
link |
00:14:53.520
than any competing methods,
link |
00:14:55.480
but that doesn't mean that we think we are done.
link |
00:15:00.800
You've said that an AGI system
link |
00:15:02.720
will ultimately be a simple one.
link |
00:15:05.040
A general intelligence system
link |
00:15:06.200
will ultimately be a simple one.
link |
00:15:08.000
Maybe a pseudocode of a few lines
link |
00:15:10.240
will be able to describe it.
link |
00:15:11.840
Can you talk through your intuition behind this idea,
link |
00:15:16.760
why you feel that at its core,
link |
00:15:21.480
intelligence is a simple algorithm?
link |
00:15:26.920
Experience tells us that the stuff that works best
link |
00:15:31.680
is really simple.
link |
00:15:33.120
So the asymptotically optimal ways of solving problems,
link |
00:15:37.680
if you look at them,
link |
00:15:38.800
they're just a few lines of code, it's really true.
link |
00:15:41.840
Although they have these amazing properties,
link |
00:15:44.000
just a few lines of code.
link |
00:15:45.800
Then the most promising and most useful practical things,
link |
00:15:53.760
maybe don't have this proof of optimality
link |
00:15:56.360
associated with them.
link |
00:15:57.800
However, they are also just a few lines of code.
link |
00:16:00.880
The most successful recurrent neural networks,
link |
00:16:05.080
you can write them down in five lines of pseudocode.
link |
00:16:08.320
That's a beautiful, almost poetic idea,
link |
00:16:10.920
but what you're describing there
link |
00:16:15.640
is the lines of pseudocode are sitting on top
link |
00:16:18.200
of layers and layers of abstractions in a sense.
link |
00:16:22.280
So you're saying at the very top,
link |
00:16:25.040
it'll be a beautifully written sort of algorithm.
link |
00:16:31.120
But do you think that there's many layers of abstractions
link |
00:16:33.960
we have to first learn to construct?
link |
00:16:36.880
Yeah, of course, we are building on all these
link |
00:16:40.400
great abstractions that people have invented over the millennia,
link |
00:16:45.080
such as matrix multiplications and real numbers
link |
00:16:50.520
and basic arithmetics and calculus
link |
00:16:56.440
and derivatives of error functions and stuff like that.
link |
00:17:04.400
So without that language that greatly simplifies
link |
00:17:09.400
our way of thinking about these problems,
link |
00:17:13.840
we couldn't do anything.
link |
00:17:14.760
So in that sense, as always,
link |
00:17:16.520
we are standing on the shoulders of the giants
link |
00:17:19.520
who in the past simplified the problem
link |
00:17:24.200
of problem solving so much
link |
00:17:26.320
that now we have a chance to do the final step.
link |
00:17:29.960
So the final step will be a simple one.
link |
00:17:33.960
If we take a step back through all of human civilization
link |
00:17:36.680
and just the universe in general,
link |
00:17:40.000
how do you think about evolution
link |
00:17:41.400
and what if creating a universe
link |
00:17:44.480
is required to achieve this final step?
link |
00:17:47.240
What if going through the very painful
link |
00:17:50.880
and inefficient process of evolution is needed
link |
00:17:53.800
to come up with this set of abstractions
link |
00:17:55.800
that ultimately lead to intelligence?
link |
00:17:57.720
Do you think there's a shortcut
link |
00:18:00.720
or do you think we have to create something like our universe
link |
00:18:04.560
in order to create something like human level intelligence?
link |
00:18:09.400
So far, the only example we have is this one,
link |
00:18:13.080
this universe in which we are living.
link |
00:18:14.880
Do you think we can do better?
link |
00:18:20.800
Maybe not, but we are part of this whole process.
link |
00:18:24.960
So apparently, so it might be the case
link |
00:18:29.920
that the code that runs the universe
link |
00:18:32.080
is really, really simple.
link |
00:18:33.600
Everything points to that possibility
link |
00:18:35.760
because gravity and other basic forces
link |
00:18:39.080
are really simple laws that can be easily described
link |
00:18:43.280
also in just a few lines of code basically.
link |
00:18:46.240
And then there are these other events
link |
00:18:51.360
that the apparently random events
link |
00:18:54.280
in the history of the universe,
link |
00:18:55.760
which as far as we know at the moment
link |
00:18:58.000
don't have a compact code, but who knows?
link |
00:19:00.600
Maybe somebody in the near future
link |
00:19:02.440
is going to figure out the pseudo random generator
link |
00:19:06.240
which is computing whether the measurement
link |
00:19:11.800
of that spin up or down thing here
link |
00:19:15.320
is going to be positive or negative.
link |
00:19:17.840
Underlying quantum mechanics.
link |
00:19:19.280
Yes.
link |
00:19:20.120
Do you ultimately think quantum mechanics
link |
00:19:22.600
is a pseudo random number generator?
link |
00:19:24.640
So it's all deterministic.
link |
00:19:26.320
There's no randomness in our universe.
link |
00:19:28.200
Does God play dice?
link |
00:19:31.120
So a couple of years ago, a famous physicist,
link |
00:19:34.680
quantum physicist, Anton Zeilinger,
link |
00:19:37.680
he wrote an essay in nature
link |
00:19:40.080
and it started more or less like that.
link |
00:19:45.360
One of the fundamental insights of the 20th century
link |
00:19:50.360
was that the universe is fundamentally random
link |
00:19:57.360
on the quantum level.
link |
00:20:00.040
And that whenever you measure spin up or down
link |
00:20:04.040
or something like that,
link |
00:20:05.440
a new bit of information enters the history of the universe.
link |
00:20:12.040
And while I was reading that,
link |
00:20:13.200
I was already typing the response
link |
00:20:16.560
and they had to publish it.
link |
00:20:17.880
Because I was right, that there is no evidence,
link |
00:20:21.640
no physical evidence for that.
link |
00:20:25.040
So there's an alternative explanation
link |
00:20:27.720
where everything that we consider random
link |
00:20:30.680
is actually pseudo random,
link |
00:20:33.040
such as the decimal expansion of pi,
link |
00:20:35.960
3.141 and so on, which looks random, but isn't.
link |
00:20:41.680
So pi is interesting because every sequence of three digits
link |
00:20:50.400
appears roughly one in a thousand times.
link |
00:20:53.400
And every five digit sequence
link |
00:20:57.400
appears roughly one in 100,000 times,
link |
00:21:01.080
what you would expect if it was random.
link |
00:21:04.400
But there's a very short algorithm,
link |
00:21:06.680
a short program that computes all of that.
link |
00:21:09.120
So it's extremely compressible.
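This claim is easy to check empirically. The sketch below uses a standard spigot algorithm (after Gibbons, 2006) to generate decimal digits of pi with plain integer arithmetic, then counts 3-digit sequences; the digit count and variable names are arbitrary choices:

```python
from collections import Counter

def pi_digits(n):
    # Unbounded spigot algorithm for the decimal digits of pi.
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    out = []
    while len(out) < n:
        if 4 * q + r - t < m * t:
            out.append(m)  # the next digit is settled
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return out

digits = pi_digits(2000)
triples = Counter(tuple(digits[i:i + 3]) for i in range(len(digits) - 2))
# Among ~2000 digits, each of the 1000 possible 3-digit sequences is
# expected roughly twice -- statistically random-looking, yet the whole
# stream comes from the short program above, so it is highly compressible.
```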
link |
00:21:11.320
And who knows, maybe tomorrow,
link |
00:21:12.640
somebody, some grad student at CERN goes back
link |
00:21:15.640
over all these data points, beta decay and whatever,
link |
00:21:20.120
and figures out, oh, it's the second billion digits of pi
link |
00:21:24.760
or something like that.
link |
00:21:26.040
We don't have any fundamental reason at the moment
link |
00:21:29.080
to believe that this is truly random
link |
00:21:33.600
and not just a deterministic video game.
link |
00:21:36.680
If it was a deterministic video game,
link |
00:21:38.680
it would be much more beautiful.
link |
00:21:40.360
Because beauty is simplicity.
link |
00:21:43.840
And many of the basic laws of the universe,
link |
00:21:47.000
like gravity and the other basic forces are very simple.
link |
00:21:51.360
So very short programs can explain what these are doing.
link |
00:21:56.760
And it would be awful and ugly.
link |
00:22:00.560
The universe would be ugly.
link |
00:22:01.720
The history of the universe would be ugly
link |
00:22:04.000
if for the extra things, the random,
link |
00:22:06.240
the seemingly random data points that we get all the time,
link |
00:22:11.080
we really needed a huge number of extra bits
link |
00:22:15.160
to describe all these extra bits of information.
link |
00:22:22.160
So as long as we don't have evidence
link |
00:22:24.800
that there is no short program
link |
00:22:26.920
that computes the entire history of the entire universe,
link |
00:22:31.240
we are, as scientists, compelled to look further
link |
00:22:36.600
for that shortest program.
link |
00:22:39.760
Your intuition says there exists a program
link |
00:22:43.760
that can backtrack to the creation of the universe.
link |
00:22:47.760
Yeah.
link |
00:22:48.600
So it can give the shortest path
link |
00:22:49.440
to the creation of the universe.
link |
00:22:50.480
Yes.
link |
00:22:51.320
Including all the entanglement things
link |
00:22:54.480
and all the spin up and down measures
link |
00:22:57.800
that have been taken place since 13.8 billion years ago.
link |
00:23:06.840
So we don't have a proof that it is random.
link |
00:23:11.840
We don't have a proof that it is compressible
link |
00:23:15.600
to a short program.
link |
00:23:16.760
But as long as we don't have that proof,
link |
00:23:18.240
we are obliged as scientists to keep looking
link |
00:23:21.680
for that simple explanation.
link |
00:23:23.600
Absolutely.
link |
00:23:24.440
So you said the simplicity is beautiful or beauty is simple.
link |
00:23:27.880
Either one works.
link |
00:23:29.440
But you also work on curiosity, discovery,
link |
00:23:34.560
the romantic notion of randomness, of serendipity,
link |
00:23:39.000
of being surprised by things that are around you.
link |
00:23:45.920
In our poetic notion of reality,
link |
00:23:49.600
we think that as humans we
link |
00:23:54.920
require randomness.
link |
00:23:56.400
So you don't find randomness beautiful.
link |
00:23:59.000
You find simple determinism beautiful.
link |
00:24:04.880
Yeah.
link |
00:24:05.720
Okay.
link |
00:24:07.520
So why?
link |
00:24:08.560
Why?
link |
00:24:09.400
Because the explanation becomes shorter.
link |
00:24:13.040
A universe that is compressible to a short program
link |
00:24:18.040
is much more elegant and much more beautiful
link |
00:24:22.040
than another one, which needs an almost infinite
link |
00:24:25.040
number of bits to be described.
link |
00:24:28.040
As far as we know, many things that are happening
link |
00:24:32.040
in this universe are really simple in terms of
link |
00:24:35.040
short programs that compute gravity
link |
00:24:38.040
and the interaction between elementary particles and so on.
link |
00:24:43.040
So all of that seems to be very, very simple.
link |
00:24:45.040
Every electron seems to reuse the same subprogram
link |
00:24:50.040
all the time, as it is interacting with
link |
00:24:52.040
other elementary particles.
link |
00:24:58.040
If we now require an extra oracle injecting
link |
00:25:05.040
new bits of information all the time for these
link |
00:25:08.040
extra things which are currently not understood,
link |
00:25:11.040
such as beta decay, then the whole description
link |
00:25:22.040
length of the data that we can observe of the
link |
00:25:26.040
history of the universe would become much longer
link |
00:25:31.040
and therefore uglier.
link |
00:25:33.040
And uglier.
link |
00:25:34.040
Again, simplicity is elegant and beautiful.
link |
00:25:38.040
The history of science is a history of compression progress.
link |
00:25:42.040
Yes, so you've described sort of as we build up
link |
00:25:47.040
abstractions and you've talked about the idea
link |
00:25:50.040
of compression.
link |
00:25:52.040
How do you see this, the history of science,
link |
00:25:55.040
the history of humanity, our civilization,
link |
00:25:58.040
and life on Earth as some kind of path towards
link |
00:26:02.040
greater and greater compression?
link |
00:26:04.040
What do you mean by that?
link |
00:26:05.040
How do you think about that?
link |
00:26:06.040
Indeed, the history of science is a history of
link |
00:26:10.040
compression progress.
link |
00:26:12.040
What does that mean?
link |
00:26:14.040
Hundreds of years ago there was an astronomer
link |
00:26:17.040
whose name was Kepler and he looked at the data
link |
00:26:21.040
points that he got by watching planets move.
link |
00:26:25.040
And then he had all these data points and
link |
00:26:27.040
suddenly it turned out that he can greatly
link |
00:26:30.040
compress the data by predicting it through an
link |
00:26:36.040
ellipse law.
link |
00:26:38.040
So it turns out that all these data points are
link |
00:26:40.040
more or less on ellipses around the sun.
link |
00:26:45.040
And another guy came along whose name was
link |
00:26:48.040
Newton and before him Hooke.
link |
00:26:51.040
And they said the same thing that is making
link |
00:26:55.040
these planets move like that is what makes the
link |
00:27:00.040
apples fall down.
link |
00:27:02.040
And it also holds for stones and for all kinds
link |
00:27:08.040
of other objects.
link |
00:27:11.040
And suddenly many, many of these observations
link |
00:27:15.040
became much more compressible because as long
link |
00:27:17.040
as you can predict the next thing, given what
link |
00:27:20.040
you have seen so far, you can compress it.
link |
00:27:23.040
And you don't have to store that data extra.
link |
00:27:25.040
This is called predictive coding.
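A minimal sketch of predictive coding in that sense, using the falling-object example that comes up later in the conversation: positions sampled from uniformly accelerated motion are predicted by linear extrapolation, and only the (constant, tiny) prediction errors need to be stored. The sampling rate and constants are arbitrary:

```python
# "Raw data": positions of a falling object, sampled every 0.1 s.
g = 9.81
positions = [0.5 * g * (t / 10.0) ** 2 for t in range(50)]

# Encoder: predict each sample by extrapolating the last two samples
# (a constant-velocity guess), and store only the prediction errors.
residuals = positions[:2]  # two seed values to start the predictor
for i in range(2, len(positions)):
    prediction = 2 * positions[i - 1] - positions[i - 2]
    residuals.append(positions[i] - prediction)

# Decoder: replay the same predictor and add back the stored errors,
# reconstructing the original data.
decoded = residuals[:2]
for i in range(2, len(residuals)):
    decoded.append(2 * decoded[i - 1] - decoded[i - 2] + residuals[i])

# Every residual after the seeds equals the same constant g * (0.1 s)^2,
# so the whole stream compresses to two seeds plus one number.
```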
link |
00:27:29.040
And then there was still something wrong with
link |
00:27:31.040
that theory of the universe and you had
link |
00:27:34.040
deviations from these predictions of the theory.
link |
00:27:37.040
And 300 years later another guy came along
link |
00:27:40.040
whose name was Einstein.
link |
00:27:42.040
And he was able to explain away all these
link |
00:27:46.040
deviations from the predictions of the old
link |
00:27:50.040
theory through a new theory which was called
link |
00:27:54.040
the general theory of relativity.
link |
00:27:57.040
Which at first glance looks a little bit more
link |
00:28:00.040
complicated and you have to warp space and time
link |
00:28:03.040
but you can phrase it within one single
link |
00:28:05.040
sentence which is no matter how fast you
link |
00:28:08.040
accelerate and how hard you decelerate and no
link |
00:28:14.040
matter what is the gravity in your local
link |
00:28:18.040
frame of reference, light speed always looks the same.
link |
00:28:21.040
And from that you can calculate all the
link |
00:28:24.040
consequences.
link |
00:28:25.040
So it's a very simple thing and it allows you
link |
00:28:27.040
to further compress all the observations
link |
00:28:30.040
because certainly there are hardly any
link |
00:28:34.040
deviations any longer that you can measure
link |
00:28:37.040
from the predictions of this new theory.
link |
00:28:40.040
So all of science is a history of compression
link |
00:28:44.040
progress.
link |
00:28:45.040
You never arrive immediately at the shortest
link |
00:28:48.040
explanation of the data but you're making
link |
00:28:51.040
progress.
link |
00:28:52.040
Whenever you are making progress you have an
link |
00:28:55.040
insight.
link |
00:28:56.040
You see oh first I needed so many bits of
link |
00:28:59.040
information to describe the data, to describe
link |
00:29:02.040
my falling apples, my video of falling apples,
link |
00:29:04.040
I need so much data, so many pixels have to be
link |
00:29:07.040
stored.
link |
00:29:08.040
But then suddenly I realize no there is a very
link |
00:29:11.040
simple way of predicting the third frame in the
link |
00:29:14.040
video from the first two.
link |
00:29:16.040
And maybe not every little detail can be
link |
00:29:19.040
predicted but more or less most of these orange
link |
00:29:21.040
blobs that are coming down they accelerate in
link |
00:29:24.040
the same way which means that I can greatly
link |
00:29:27.040
compress the video.
link |
00:29:28.040
And the amount of compression progress, that
link |
00:29:33.040
is the depth of the insight that you have at
link |
00:29:36.040
that moment.
link |
00:29:37.040
That's the fun that you have, the scientific
link |
00:29:39.040
fun, the fun in that discovery.
link |
00:29:42.040
And we can build artificial systems that do
link |
00:29:45.040
the same thing.
link |
00:29:46.040
They measure the depth of their insights as they
link |
00:29:49.040
are looking at the data which is coming in
link |
00:29:51.040
through their own experiments and we give
link |
00:29:54.040
them a reward, an intrinsic reward in proportion
link |
00:29:58.040
to this depth of insight.
link |
00:30:00.040
And since they are trying to maximize the
link |
00:30:05.040
rewards they get they are suddenly motivated to
link |
00:30:09.040
come up with new action sequences, with new
link |
00:30:13.040
experiments that have the property that the data
link |
00:30:16.040
that is coming in as a consequence of these
link |
00:30:19.040
experiments has the property that they can learn
link |
00:30:23.040
something about, see a pattern in there which
link |
00:30:25.040
they hadn't seen yet before.
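The intrinsic reward in proportion to the depth of insight can be sketched concretely. This is a toy, not Schmidhuber's actual formulation: zlib stands in for the agent's learned compressor, and "insight" is approximated as the bits saved by compressing the new observation together with the history rather than separately.

```python
import zlib

def compressed_bits(data: bytes) -> int:
    """Bits in the zlib-compressed data: a crude proxy for description length."""
    return 8 * len(zlib.compress(data, 9))

def compression_progress(history: bytes, new_observation: bytes) -> int:
    """Intrinsic reward = bits saved by exploiting shared structure between
    the history and the new observation."""
    separate = compressed_bits(history) + compressed_bits(new_observation)
    joint = compressed_bits(history + new_observation)
    return separate - joint  # large when the new data fits a learned pattern

# A repetitive observation is highly predictable from the history: big progress.
# Structureless bytes share nothing with the past: little or no progress.
history = b"apple falls down. " * 20
predictable = b"apple falls down. " * 5
novel = bytes(range(90))
assert compression_progress(history, predictable) > compression_progress(history, novel)
```

A curious agent would then prefer experiments whose data maximizes this progress signal.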
link |
00:30:28.040
So there is an idea of power play that you
link |
00:30:31.040
described, a training in general problem solver
link |
00:30:34.040
in this kind of way of looking for the unsolved
link |
00:30:36.040
problems.
link |
00:30:37.040
Yeah.
link |
00:30:38.040
Can you describe that idea a little further?
link |
00:30:40.040
It's another very simple idea.
link |
00:30:42.040
So normally what you do in computer science,
link |
00:30:45.040
you have some guy who gives you a problem and
link |
00:30:50.040
then there is a huge search space of potential
link |
00:30:55.040
solution candidates and you somehow try them
link |
00:30:59.040
out and you have more or less sophisticated ways
link |
00:31:02.040
of moving around in that search space until
link |
00:31:07.040
you finally found a solution which you
link |
00:31:10.040
consider satisfactory.
link |
00:31:12.040
That's what most of computer science is about.
link |
00:31:15.040
Power play just goes one little step further
link |
00:31:18.040
and says let's not only search for solutions
link |
00:31:23.040
to a given problem but let's search for pairs of
link |
00:31:28.040
problems and their solutions where the system
link |
00:31:31.040
itself has the opportunity to phrase its own
link |
00:31:35.040
problem.
link |
00:31:37.040
So we are looking suddenly at pairs of
link |
00:31:40.040
problems and their solutions or modifications
link |
00:31:44.040
of the problem solver that is supposed to
link |
00:31:47.040
generate a solution to that new problem.
link |
00:31:51.040
And this additional degree of freedom allows
link |
00:31:57.040
us to build curious systems that are like
link |
00:32:00.040
scientists in the sense that they not only
link |
00:32:04.040
try to solve and try to find answers to
link |
00:32:07.040
existing questions, no they are also free to
link |
00:32:11.040
pose their own questions.
link |
00:32:13.040
So if you want to build an artificial scientist
link |
00:32:15.040
you have to give it that freedom and power
link |
00:32:17.040
play is exactly doing that.
link |
00:32:19.040
So that's a dimension of freedom that's
link |
00:32:22.040
important to have but how hard do you think
link |
00:32:25.040
it is, how multidimensional and difficult the
link |
00:32:31.040
space of coming up with your own questions
link |
00:32:34.040
is?
link |
00:32:35.040
So it's one of the things that as human beings
link |
00:32:37.040
we consider to be the thing that makes us
link |
00:32:40.040
special, the intelligence that makes us special
link |
00:32:42.040
is that brilliant insight that can create
link |
00:32:46.040
something totally new.
link |
00:32:48.040
Yes.
link |
00:32:49.040
So now let's look at the extreme case, let's
link |
00:32:52.040
look at the set of all possible problems that
link |
00:32:56.040
you can formally describe, which is infinite.
link |
00:33:00.040
Which should be the next problem that a scientist
link |
00:33:05.040
or power play is going to solve?
link |
00:33:08.040
Well, it should be the easiest problem that
link |
00:33:14.040
goes beyond what you already know.
link |
00:33:17.040
So it should be the simplest problem that the
link |
00:33:21.040
current problem solver that you have, which can
link |
00:33:23.040
already solve 100 problems, cannot solve
link |
00:33:28.040
yet just by generalizing.
link |
00:33:30.040
So it has to be new, so it has to require a
link |
00:33:33.040
modification of the problem solver such that the
link |
00:33:36.040
new problem solver can solve this new thing but
link |
00:33:39.040
the old problem solver cannot do it and in
link |
00:33:42.040
addition to that we have to make sure that the
link |
00:33:46.040
problem solver doesn't forget any of the
link |
00:33:49.040
previous solutions.
link |
00:33:50.040
Right.
link |
00:33:51.040
And so by definition power play is now trying
link |
00:33:54.040
always to search in this pair of, in the set of
link |
00:33:58.040
pairs of problems and problem solver modifications
link |
00:34:02.040
for a combination that minimizes the time to
link |
00:34:06.040
achieve these criteria.
link |
00:34:08.040
So it's always trying to find the problem which
link |
00:34:11.040
is easiest to add to the repertoire.
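This search over pairs of problems and solver modifications can be sketched with deliberately toy assumptions of my own (not the actual PowerPlay formulation): "problems" are target strings, the solver's repertoire is a set of tokens it can concatenate, a "modification" adds one token (which never removes old abilities, so nothing is forgotten), and token length is a stand-in for the cost criterion.

```python
from itertools import product

def can_solve(tokens, target, depth=4):
    """The solver solves `target` if it can build it by concatenating
    at most `depth` known tokens."""
    if target == "":
        return True
    if depth == 0:
        return False
    return any(
        target.startswith(t) and can_solve(tokens, target[len(t):], depth - 1)
        for t in tokens if t
    )

def power_play_step(tokens, candidate_problems, candidate_tokens):
    """Find the cheapest (new problem, solver modification) pair such that the
    modified solver solves the new problem and the old solver does not."""
    best = None
    for prob, tok in product(candidate_problems, candidate_tokens):
        if can_solve(tokens, prob):
            continue  # not new: the old solver already handles it
        if can_solve(tokens | {tok}, prob):
            cost = len(tok)  # illustrative proxy for "time to achieve the criteria"
            if best is None or cost < best[0]:
                best = (cost, prob, tok)
    return best

tokens = {"ab", "c"}
cost, prob, tok = power_play_step(tokens, ["abc", "abx", "cc", "xyc"], ["x", "xy", "abxy"])
assert (prob, tok) == ("abx", "x")  # easiest new problem, cheapest modification
```

The loop always picks the simplest problem just beyond the current repertoire, exactly the selection criterion described above.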
link |
00:34:14.040
So just like grad students and academics and
link |
00:34:18.040
researchers can spend their whole career in a
link |
00:34:20.040
local minimum, stuck trying to come up with
link |
00:34:24.040
interesting questions but ultimately doing very
link |
00:34:26.040
little.
link |
00:34:27.040
Do you think it's easy in this approach of
link |
00:34:31.040
looking for the simplest unsolvable problem to
link |
00:34:33.040
get stuck in a local minimum?
link |
00:34:35.040
And never really discovering anything new, you know,
link |
00:34:40.040
really jumping outside of the 100 problems that
link |
00:34:42.040
you've already solved in a genuine creative way?
link |
00:34:47.040
No, because that's the nature of power play that
link |
00:34:50.040
it's always trying to break its current
link |
00:34:53.040
generalization abilities by coming up with a new
link |
00:34:57.040
problem which is beyond the current horizon.
link |
00:35:00.040
Just shifting the horizon of knowledge a little
link |
00:35:04.040
bit out there, breaking the existing rules such
link |
00:35:08.040
that the new thing becomes solvable but wasn't
link |
00:35:11.040
solvable by the old thing.
link |
00:35:13.040
So like adding a new axiom like what Gödel did
link |
00:35:17.040
when he came up with these new sentences, new
link |
00:35:21.040
theorems that didn't have a proof in the formal
link |
00:35:23.040
system which means you can add them to the
link |
00:35:25.040
repertoire hoping that they are not going to
link |
00:35:31.040
damage the consistency of the whole thing.
link |
00:35:35.040
So in the paper with the amazing title,
link |
00:35:39.040
Formal Theory of Creativity, Fun and Intrinsic
link |
00:35:43.040
Motivation, you talk about discovery as intrinsic
link |
00:35:46.040
reward, so if you view humans as intelligent
link |
00:35:50.040
agents, what do you think is the purpose and
link |
00:35:53.040
meaning of life for us humans?
link |
00:35:56.040
You've talked about this discovery, do you see
link |
00:35:59.040
humans as an instance of power play, agents?
link |
00:36:04.040
Humans are curious and that means they behave
link |
00:36:10.040
like scientists, not only the official scientists
link |
00:36:13.040
but even the babies behave like scientists and
link |
00:36:16.040
they play around with their toys to figure out
link |
00:36:19.040
how the world works and how it is responding to
link |
00:36:22.040
their actions and that's how they learn about
link |
00:36:25.040
gravity and everything.
link |
00:36:27.040
In 1990 we had the first systems like that which
link |
00:36:31.040
would just try to play around with the environment
link |
00:36:34.040
and come up with situations that go beyond what
link |
00:36:38.040
they knew at that time and then get a reward for
link |
00:36:41.040
creating these situations and then becoming more
link |
00:36:44.040
general problem solvers and being able to understand
link |
00:36:47.040
more of the world.
link |
00:36:49.040
I think in principle that curiosity strategy or
link |
00:37:01.040
more sophisticated versions of what I just
link |
00:37:03.040
described, they are what we have built in as well
link |
00:37:07.040
because evolution discovered that's a good way of
link |
00:37:10.040
exploring the unknown world and a guy who explores
link |
00:37:13.040
the unknown world has a higher chance of solving
link |
00:37:17.040
the mystery that he needs to survive in this world.
link |
00:37:20.040
On the other hand, those guys who were too curious
link |
00:37:24.040
they were weeded out as well so you have to find
link |
00:37:27.040
this trade off.
link |
00:37:28.040
Evolution found a certain trade off.
link |
00:37:30.040
Apparently in our society there is a certain
link |
00:37:33.040
percentage of extremely explorative guys and it
link |
00:37:37.040
doesn't matter if they die because many of the
link |
00:37:40.040
others are more conservative.
link |
00:37:45.040
It would be surprising to me if that principle of
link |
00:37:51.040
artificial curiosity wouldn't be present in almost
link |
00:37:56.040
exactly the same form here.
link |
00:37:58.040
In our brains.
link |
00:38:00.040
You are a bit of a musician and an artist.
link |
00:38:03.040
Continuing on this topic of creativity, what do you
link |
00:38:08.040
think is the role of creativity and intelligence?
link |
00:38:10.040
So you've kind of implied that it's essential for
link |
00:38:15.040
intelligence if you think of intelligence as a
link |
00:38:18.040
problem solving system, as ability to solve problems.
link |
00:38:23.040
But do you think it's essential, this idea of
link |
00:38:26.040
creativity?
link |
00:38:27.040
We never have a program, a sub program that is
link |
00:38:32.040
called creativity or something.
link |
00:38:34.040
It's just a side effect of what our problem solvers
link |
00:38:37.040
do. They are searching a space of problems, a space
link |
00:38:41.040
of candidates, of solution candidates until they
link |
00:38:45.040
hopefully find a solution to a given problem.
link |
00:38:48.040
But then there are these two types of creativity
link |
00:38:50.040
and both of them are now present in our machines.
link |
00:38:54.040
The first one has been around for a long time,
link |
00:38:56.040
which is human gives problem to machine, machine
link |
00:39:00.040
tries to find a solution to that.
link |
00:39:03.040
And this has been happening for many decades and
link |
00:39:06.040
for many decades machines have found creative
link |
00:39:09.040
solutions to interesting problems where humans were
link |
00:39:12.040
not aware of these particularly creative solutions
link |
00:39:17.040
but then appreciated that the machine found that.
link |
00:39:20.040
The second is the pure creativity.
link |
00:39:23.040
That I would call, what I just mentioned, I would
link |
00:39:26.040
call the applied creativity, like applied art where
link |
00:39:30.040
somebody tells you now make a nice picture of this
link |
00:39:34.040
Pope and you will get money for that.
link |
00:39:37.040
So here is the artist and he makes a convincing
link |
00:39:40.040
picture of the Pope and the Pope likes it and gives
link |
00:39:43.040
him the money.
link |
00:39:46.040
And then there is the pure creativity which is
link |
00:39:49.040
more like the power play and the artificial
link |
00:39:51.040
curiosity thing where you have the freedom to
link |
00:39:54.040
select your own problem.
link |
00:39:57.040
Like a scientist who defines his own question
link |
00:40:02.040
to study and so that is the pure creativity if you
link |
00:40:06.040
will as opposed to the applied creativity which
link |
00:40:11.040
serves another.
link |
00:40:14.040
And in that distinction there is almost echoes of
link |
00:40:16.040
narrow AI versus general AI.
link |
00:40:19.040
So this kind of constrained painting of a Pope
link |
00:40:22.040
seems like the approaches of what people are
link |
00:40:28.040
calling narrow AI and pure creativity seems to be,
link |
00:40:33.040
maybe I am just biased as a human but it seems to
link |
00:40:35.040
be an essential element of human level intelligence.
link |
00:40:41.040
Is that what you are implying?
link |
00:40:44.040
To a degree?
link |
00:40:46.040
If you zoom back a little bit and you just look
link |
00:40:49.040
at a general problem solving machine which is
link |
00:40:51.040
trying to solve arbitrary problems then this
link |
00:40:54.040
machine will figure out in the course of solving
link |
00:40:57.040
problems that it is good to be curious.
link |
00:41:00.040
So all of what I said just now about this prewired
link |
00:41:04.040
curiosity and this will to invent new problems
link |
00:41:07.040
that the system doesn't know how to solve yet
link |
00:41:11.040
should be just a byproduct of the general search.
link |
00:41:15.040
However, apparently evolution has built it into
link |
00:41:20.040
us because it turned out to be so successful,
link |
00:41:24.040
a prewiring, a bias, a very successful exploratory
link |
00:41:29.040
bias that we are born with.
link |
00:41:34.040
And you have also said that consciousness in the
link |
00:41:36.040
same kind of way may be a byproduct of problem solving.
link |
00:41:41.040
Do you find this an interesting byproduct?
link |
00:41:45.040
Do you think it is a useful byproduct?
link |
00:41:47.040
What are your thoughts on consciousness in general?
link |
00:41:49.040
Or is it simply a byproduct of greater and greater
link |
00:41:53.040
capabilities of problem solving that is similar
link |
00:41:58.040
to creativity in that sense?
link |
00:42:01.040
We never have a procedure called consciousness
link |
00:42:04.040
in our machines.
link |
00:42:05.040
However, we get as side effects of what these
link |
00:42:09.040
machines are doing things that seem to be closely
link |
00:42:13.040
related to what people call consciousness.
link |
00:42:16.040
So for example, already in 1990 we had simple
link |
00:42:20.040
systems which were basically recurrent networks
link |
00:42:24.040
and therefore universal computers trying to map
link |
00:42:28.040
incoming data into actions that lead to success.
link |
00:42:33.040
Maximizing reward in a given environment,
link |
00:42:36.040
always finding the charging station in time
link |
00:42:40.040
whenever the battery is low and negative signals
link |
00:42:42.040
are coming from the battery, always find the
link |
00:42:45.040
charging station in time without bumping against
link |
00:42:48.040
painful obstacles on the way.
link |
00:42:50.040
So complicated things but very easily motivated.
link |
00:42:54.040
And then we give these little guys a separate
link |
00:43:00.040
recurrent neural network which is just predicting
link |
00:43:02.040
what's happening if I do that and that.
link |
00:43:04.040
What will happen as a consequence of these
link |
00:43:07.040
actions that I'm executing.
link |
00:43:09.040
And it's just trained on the long and long history
link |
00:43:11.040
of interactions with the world.
link |
00:43:13.040
So it becomes a predictive model of the world
link |
00:43:16.040
basically.
link |
00:43:18.040
And therefore also a compressor of the observations
link |
00:43:22.040
of the world because whatever you can predict
link |
00:43:25.040
you don't have to store extra.
link |
00:43:27.040
So compression is a side effect of prediction.
link |
00:43:30.040
And how does this recurrent network compress?
link |
00:43:33.040
Well, it's inventing little subprograms, little
link |
00:43:36.040
subnetworks that stand for everything that
link |
00:43:39.040
frequently appears in the environment like
link |
00:43:42.040
bottles and microphones and faces, maybe lots of
link |
00:43:45.040
faces in my environment so I'm learning to create
link |
00:43:50.040
something like a prototype face and a new face
link |
00:43:52.040
comes along and all I have to encode are the
link |
00:43:54.040
deviations from the prototype.
link |
00:43:56.040
So it's compressing all the time the stuff that
link |
00:43:58.040
frequently appears.
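The prototype-plus-deviations idea can be sketched with a running mean standing in for the learned prototype subnetwork; the synthetic 64-value "faces" and zlib as a back-end compressor are illustrative assumptions, not the recurrent network described above.

```python
import zlib

def encode_with_prototype(faces):
    """Store each 'face' as its deviation from a running-mean prototype,
    updating the prototype online as new faces arrive."""
    prototype = [0.0] * len(faces[0])
    residuals = bytearray()
    for i, face in enumerate(faces, start=1):
        for j, v in enumerate(face):
            residuals.append((v - round(prototype[j])) % 256)  # small deviation
            prototype[j] += (v - prototype[j]) / i             # online mean update
    return bytes(residuals)

# Synthetic faces: one shared pattern plus a small per-face offset.
base = [(j * j * 3) % 220 for j in range(64)]
faces = [[v + k for v in base] for k in range(32)]  # values stay below 256
raw = bytes(b for face in faces for b in face)
residuals = encode_with_prototype(faces)
# Deviations from the prototype are tiny and repetitive, so they compress
# far better than the raw observations.
assert len(zlib.compress(residuals, 9)) < len(zlib.compress(raw, 9))
```

Only the first face pays full cost; every later face is encoded as near-zero deviations from the prototype, which is the compression gain the speaker is pointing at.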
link |
00:44:00.040
There's one thing that appears all the time that
link |
00:44:05.040
is present all the time when the agent is
link |
00:44:07.040
interacting with its environment which is the
link |
00:44:10.040
agent itself.
link |
00:44:12.040
But just for data compression reasons it is
link |
00:44:15.040
extremely natural for this recurrent network to
link |
00:44:18.040
come up with little subnetworks that stand for
link |
00:44:21.040
the properties of the agents, the hand, the other
link |
00:44:26.040
actuators and all the stuff that you need to
link |
00:44:29.040
better encode the data which is influenced by
link |
00:44:32.040
the actions of the agent.
link |
00:44:34.040
So there just as a side effect of data compression
link |
00:44:39.040
during problem solving you have internal self
link |
00:44:43.040
models.
link |
00:44:45.040
Now you can use this model of the world to plan
link |
00:44:50.040
your future and that's what we also have done
link |
00:44:53.040
since 1990.
link |
00:44:54.040
So the recurrent network which is the controller
link |
00:44:57.040
which is trying to maximize reward can use this
link |
00:45:00.040
model network of the world, this
link |
00:45:03.040
predictive model of
link |
00:45:05.040
the world to plan ahead and say let's not do this
link |
00:45:08.040
action sequence, let's do this action sequence
link |
00:45:10.040
instead because it leads to more predicted
link |
00:45:13.040
reward.
link |
00:45:14.040
And whenever it is waking up these little
link |
00:45:17.040
subnetworks that stand for itself then it is
link |
00:45:20.040
thinking about itself and it is
link |
00:45:23.040
exploring mentally the
link |
00:45:28.040
consequences of its own actions and now you tell
link |
00:45:34.040
me what is still missing.
link |
00:45:36.040
Missing the next, the gap to consciousness.
link |
00:45:40.040
There isn't.
link |
00:45:41.040
That's a really beautiful idea that if life is
link |
00:45:45.040
a collection of data and life is a process of
link |
00:45:48.040
compressing that data to act efficiently in that
link |
00:45:54.040
data you yourself appear very often.
link |
00:45:57.040
So it's useful to form compressions of yourself
link |
00:46:00.040
and it's a really beautiful formulation of what
link |
00:46:03.040
consciousness is: a necessary side effect.
link |
00:46:05.040
It's actually quite compelling to me.
link |
00:46:11.040
You've described RNNs, developed LSTMs, long
link |
00:46:16.040
short term memory networks that are a type of
link |
00:46:20.040
recurrent neural networks that have gotten a lot
link |
00:46:23.040
of success recently.
link |
00:46:24.040
So these are networks that model the temporal
link |
00:46:27.040
aspects in the data, temporal patterns in the
link |
00:46:30.040
data and you've called them the deepest of the
link |
00:46:34.040
neural networks.
link |
00:46:35.040
So what do you think is the value of depth in
link |
00:46:38.040
the models that we use to learn?
link |
00:46:41.040
Since you mentioned the long short term memory
link |
00:46:46.040
and the LSTM I have to mention the names of the
link |
00:46:50.040
brilliant students who made that possible.
link |
00:46:52.040
First of all my first student ever Sepp Hochreiter
link |
00:46:56.040
who had fundamental insights already in his
link |
00:46:58.040
diploma thesis.
link |
00:46:59.040
Then Felix Gers, who had additional important
link |
00:47:03.040
contributions.
link |
00:47:04.040
Alex Graves is a guy from Scotland who is mostly
link |
00:47:08.040
responsible for this CTC algorithm which is now
link |
00:47:11.040
often used to train the LSTM to do the speech
link |
00:47:15.040
recognition on all the Google Android phones and
link |
00:47:18.040
whatever and Siri and so on.
link |
00:47:21.040
So these guys without these guys I would be
link |
00:47:26.040
nothing.
link |
00:47:27.040
It's a lot of incredible work.
link |
00:47:29.040
What is now the depth?
link |
00:47:30.040
What is the importance of depth?
link |
00:47:32.040
Well most problems in the real world are deep in
link |
00:47:36.040
the sense that the current input doesn't tell you
link |
00:47:40.040
all you need to know about the environment.
link |
00:47:44.040
So instead you have to have a memory of what
link |
00:47:48.040
happened in the past and often important parts of
link |
00:47:51.040
that memory are dated.
link |
00:47:54.040
They are pretty old.
link |
00:47:56.040
So when you're doing speech recognition for
link |
00:47:59.040
example and somebody says 11 then that's about
link |
00:48:05.040
half a second or something like that which means
link |
00:48:09.040
it's already 50 time steps.
link |
00:48:11.040
And another guy or the same guy says 7.
link |
00:48:15.040
So the ending is the same, "even," but now the
link |
00:48:19.040
system has to see the distinction between 7 and
link |
00:48:22.040
11 and the only way it can see the difference is
link |
00:48:25.040
it has to store that 50 steps ago there was an
link |
00:48:30.040
S or an L, 11 or 7.
link |
00:48:34.040
So there you have already a problem of depth 50
link |
00:48:37.040
because for each time step you have something
link |
00:48:41.040
like a virtual layer in the expanded unrolled
link |
00:48:44.040
version of this recurrent network which is doing
link |
00:48:46.040
the speech recognition.
link |
00:48:48.040
So these long time lags they translate into
link |
00:48:51.040
problem depth.
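The seven/eleven example can be made concrete with a deliberately tiny hand-set gated cell (an LSTM-flavoured sketch of my own, not a trained network): the two inputs share an identical 50-step suffix, so any correct classifier must carry one bit of memory across 50 recurrent steps, i.e. 50 virtual layers of the unrolled network.

```python
def gated_cell(cell, x):
    """One recurrent step: a hard 'write gate' stores the flagged symbol in
    the cell; when the gate is closed the memory passes through unchanged."""
    value, write = x
    gate = 1.0 if write else 0.0
    return gate * value + (1.0 - gate) * cell

def classify(sequence):
    """Unrolled over len(sequence) steps, i.e. that many virtual layers."""
    cell = 0.0
    for x in sequence:
        cell = gated_cell(cell, x)
    return "eleven" if cell > 0.5 else "seven"

suffix = [(0.7, False)] * 50         # identical 50-step ending ("...even")
eleven = [(1.0, True)] + suffix      # distinguishing symbol, 50 steps back
seven = [(0.0, True)] + suffix
assert classify(eleven) == "eleven"
assert classify(seven) == "seven"
```

The gate is what lets the distinguishing bit survive the 50 intervening steps; without it, the early symbol would be washed out long before the end of the sequence.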
link |
00:48:53.040
And most problems in this world are such that
link |
00:48:57.040
you really have to look far back in time to
link |
00:49:01.040
understand what is the problem and to solve it.
link |
00:49:05.040
But just like with LSTMs you don't necessarily
link |
00:49:08.040
need to when you look back in time remember every
link |
00:49:11.040
aspect you just need to remember the important
link |
00:49:13.040
aspects.
link |
00:49:14.040
That's right.
link |
00:49:15.040
The network has to learn to put the important
link |
00:49:18.040
stuff into memory and to ignore the unimportant
link |
00:49:22.040
noise.
link |
00:49:23.040
But in that sense deeper and deeper is better
link |
00:49:28.040
or is there a limitation?
link |
00:49:30.040
I mean LSTM is one of the great examples of
link |
00:49:34.040
architectures that do something beyond just
link |
00:49:40.040
deeper and deeper networks.
link |
00:49:42.040
There's clever mechanisms for filtering data,
link |
00:49:45.040
for remembering and forgetting.
link |
00:49:47.040
So do you think that kind of thinking is
link |
00:49:50.040
necessary?
link |
00:49:51.040
If you think about LSTMs as a leap, a big leap
link |
00:49:54.040
forward over traditional vanilla RNNs, what do
link |
00:49:57.040
you think is the next leap within this context?
link |
00:50:02.040
So LSTM is a very clever improvement but LSTMs
link |
00:50:06.040
still don't have the same kind of ability to see
link |
00:50:10.040
far back in the past as us humans do.
link |
00:50:14.040
The credit assignment problem across way back
link |
00:50:18.040
not just 50 time steps or 100 or 1000 but
link |
00:50:22.040
millions and billions.
link |
00:50:24.040
It's not clear what are the practical limits of
link |
00:50:28.040
the LSTM when it comes to looking back.
link |
00:50:31.040
Already in 2006 I think we had examples where
link |
00:50:35.040
it not only looked back tens of thousands of
link |
00:50:38.040
steps but really millions of steps.
link |
00:50:41.040
And Juan Pérez-Ortiz in my lab I think was the
link |
00:50:45.040
first author of a paper where we really, was it
link |
00:50:49.040
2006 or something, had examples where it learned
link |
00:50:53.040
to look back for more than 10 million steps.
link |
00:50:57.040
So for most problems of speech recognition it's
link |
00:51:01.040
not necessary to look that far back but there
link |
00:51:05.040
are examples where it does.
link |
00:51:07.040
Now the looking back thing, that's rather easy
link |
00:51:11.040
because there is only one past but there are
link |
00:51:15.040
many possible futures and so a reinforcement
link |
00:51:19.040
learning system which is trying to maximize its
link |
00:51:22.040
future expected reward and doesn't know yet which
link |
00:51:26.040
of these many possible futures should I select
link |
00:51:29.040
given this one single past is facing problems
link |
00:51:33.040
that the LSTM by itself cannot solve.
link |
00:51:36.040
So the LSTM is good for coming up with a compact
link |
00:51:40.040
representation of the history and observations
link |
00:51:44.040
and actions so far but now how do you plan in an
link |
00:51:49.040
efficient and good way among all these, how do
link |
00:51:54.040
you select one of these many possible action
link |
00:51:57.040
sequences that a reinforcement learning system
link |
00:52:00.040
has to consider to maximize reward in this
link |
00:52:04.040
unknown future?
link |
00:52:06.040
We have this basic setup where you have one
link |
00:52:10.040
recurrent network which gets in the video and
link |
00:52:14.040
the speech and whatever and it's executing
link |
00:52:17.040
actions and it's trying to maximize reward so
link |
00:52:20.040
there is no teacher who tells it what to do at
link |
00:52:23.040
which point in time.
link |
00:52:25.040
And then there's the other network which is
link |
00:52:29.040
just predicting what's going to happen if I do
link |
00:52:32.040
that and that and that could be an LSTM network
link |
00:52:35.040
and it learns to look back all the way to make
link |
00:52:38.040
better predictions of the next time step.
link |
00:52:41.040
So essentially although it's predicting only the
link |
00:52:44.040
next time step it is motivated to learn to put
link |
00:52:48.040
into memory something that happened maybe a
link |
00:52:51.040
million steps ago because it's important to
link |
00:52:54.040
memorize that if you want to predict that at the
link |
00:52:57.040
next time step, the next event.
link |
00:52:59.040
Now how can a model of the world like that, a
link |
00:53:03.040
predictive model of the world be used by the
link |
00:53:06.040
first guy?
link |
00:53:07.040
Let's call it the controller and the model, the
link |
00:53:10.040
controller and the model.
link |
00:53:12.040
How can the model be used by the controller to
link |
00:53:15.040
efficiently select among these many possible
link |
00:53:18.040
futures?
link |
00:53:19.040
The naive way we had about 30 years ago was
link |
00:53:22.040
let's just use the model of the world as a stand
link |
00:53:26.040
in, as a simulation of the world and millisecond
link |
00:53:30.040
by millisecond we plan the future and that means
link |
00:53:33.040
we have to roll it out really in detail and it
link |
00:53:36.040
will work only if the model is really good and
link |
00:53:39.040
it will still be inefficient because we have to
link |
00:53:42.040
look at all these possible futures and there are
link |
00:53:45.040
so many of them.
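This naive rollout planning can be sketched as exhaustive search through the model's predicted futures. The toy 1-D world with a "charging station" and the short horizon are illustrative assumptions of mine; the point is that the number of futures grows as 3^horizon, which is the inefficiency being described.

```python
from itertools import product

def model_step(state, action):
    """Stand-in for the learned model network: predicts (next_state, reward).
    The 'charging station' sits at position 5; each step costs a little."""
    next_state = state + action  # action in {-1, 0, +1}
    reward = (1.0 if next_state == 5 else 0.0) - 0.1
    return next_state, reward

def plan(state, horizon=4):
    """Roll out every action sequence in the model, step by step, and keep
    the one with the highest predicted return: correct only insofar as the
    model is good, and exponentially expensive in the horizon."""
    best_seq, best_return = None, float("-inf")
    for seq in product((-1, 0, 1), repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s, r = model_step(s, a)
            total += r
        if total > best_return:
            best_seq, best_return = seq, total
    return best_seq

assert plan(3)[:2] == (1, 1)  # the plan heads straight for the station
```

The 2015 controller-model systems mentioned next avoid exactly this exhaustive millisecond-by-millisecond rollout by letting the controller learn which parts of the model are worth consulting.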
link |
00:53:46.040
So instead what we do now since 2015 in our CM
link |
00:53:49.040
systems, controller model systems, we give the
link |
00:53:52.040
controller the opportunity to learn by itself how
link |
00:53:56.040
to use the potentially relevant parts of the M,
link |
00:54:00.040
of the model network to solve new problems more
link |
00:54:04.040
quickly.
link |
00:54:05.040
And if it wants to, it can learn to ignore the M
link |
00:54:09.040
and sometimes it's a good idea to ignore the M
link |
00:54:12.040
because it's really bad, it's a bad predictor in
link |
00:54:15.040
this particular situation of life where the
link |
00:54:19.040
controller is currently trying to maximize reward.
link |
00:54:22.040
However, it can also learn to address and exploit
link |
00:54:26.040
some of the subprograms that came about in the
link |
00:54:31.040
model network through compressing the data by
link |
00:54:35.040
predicting it.
link |
00:54:36.040
So it now has an opportunity to reuse that code,
link |
00:54:40.040
the algorithmic information in the model network
link |
00:54:44.040
to reduce its own search space such that it can
link |
00:54:49.040
solve a new problem more quickly than without the
link |
00:54:52.040
model.
link |
00:54:53.040
Compression.
link |
00:54:54.040
So you're ultimately optimistic and excited about
link |
00:54:59.040
the power of RL, of reinforcement learning in the
link |
00:55:03.040
context of real systems.
link |
00:55:05.040
Absolutely, yeah.
link |
00:55:06.040
So you see RL as potentially having a huge impact
link |
00:55:11.040
beyond just sort of the M part that is often developed with
link |
00:55:16.040
supervised learning methods.
link |
00:55:19.040
You see RL as an approach for problems of self driving cars
link |
00:55:25.040
or any kind of applied robotics.
link |
00:55:28.040
Is that the correct, interesting direction for
link |
00:55:32.040
research in your view?
link |
00:55:34.040
I do think so.
link |
00:55:35.040
We have a company called NNAISENSE which has applied
link |
00:55:40.040
reinforcement learning to little Audis which learn
link |
00:55:45.040
to park without a teacher.
link |
00:55:47.040
The same principles were used of course.
link |
00:55:50.040
So these little Audis, they are small, maybe like
link |
00:55:54.040
that, so much smaller than the real Audis.
link |
00:55:57.040
But they have all the sensors that you find in the
link |
00:56:00.040
real Audis.
link |
00:56:01.040
You find the cameras, the LIDAR sensors.
link |
00:56:03.040
They go up to 120 kilometers an hour if they want
link |
00:56:08.040
to.
link |
00:56:09.040
And they have pain sensors basically and they don't
link |
00:56:13.040
want to bump against obstacles and other Audis and
link |
00:56:17.040
so they must learn like little babies to park.
link |
00:56:21.040
Take the raw vision input and translate that into
link |
00:56:25.040
actions that lead to successful parking behavior
link |
00:56:28.040
which is a rewarding thing.
link |
00:56:30.040
And yes, they learn that.
link |
00:56:32.040
So we have examples like that and it's only in the
link |
00:56:36.040
beginning.
link |
00:56:37.040
This is just the tip of the iceberg and I believe the
link |
00:56:40.040
next wave of AI is going to be all about that.
link |
00:56:44.040
So at the moment, the current wave of AI is about
link |
00:56:48.040
passive pattern observation and prediction and that's
link |
00:56:53.040
what you have on your smartphone and what the major
link |
00:56:56.040
companies on the Pacific Rim are using to sell you
link |
00:57:00.040
ads to do marketing.
link |
00:57:02.040
That's the current sort of profit in AI and that's
link |
00:57:05.040
only one or two percent of the world economy.
link |
00:57:08.040
Which is big enough to make these companies pretty
link |
00:57:12.040
much the most valuable companies in the world.
link |
00:57:15.040
But there's a much, much bigger fraction of the
link |
00:57:19.040
economy going to be affected by the next wave which
link |
00:57:22.040
is really about machines that shape the data through
link |
00:57:26.040
their own actions.
link |
00:57:27.040
Do you think simulation is ultimately the biggest
link |
00:57:31.040
way that those methods will be successful in the next
link |
00:57:35.040
10, 20 years?
link |
00:57:36.040
We're not talking about 100 years from now.
link |
00:57:38.040
We're talking about sort of the near term impact of
link |
00:57:41.040
RL.
link |
00:57:42.040
Do you think really good simulation is required or
link |
00:57:45.040
is there other techniques like imitation learning,
link |
00:57:48.040
observing other humans operating in the real world?
link |
00:57:53.040
Where do you think the success will come from?
link |
00:57:57.040
So at the moment, we have a tendency of using physics
link |
00:58:02.040
simulations to learn behaviors for machines that
link |
00:58:07.040
learn to solve problems that humans also do not know
link |
00:58:11.040
how to solve.
link |
00:58:12.040
However, this is not the future because the future is
link |
00:58:16.040
in what little babies do.
link |
00:58:18.040
They don't use a physics engine to simulate the
link |
00:58:21.040
world.
link |
00:58:22.040
No, they learn a predictive model of the world which
link |
00:58:26.040
maybe sometimes is wrong in many ways but captures
link |
00:58:31.040
all kinds of important abstract high level predictions
link |
00:58:34.040
which are really important to be successful.
link |
00:58:37.040
And that's what was the future 30 years ago when we
link |
00:58:42.040
started that type of research but it's still the future
link |
00:58:45.040
and now we know much better how to go there to move
link |
00:58:49.040
forward and to really make working systems based on
link |
00:58:54.040
that where you have a learning model of the world,
link |
00:58:57.040
a model of the world that learns to predict what's
link |
00:58:59.040
going to happen if I do that and that.
link |
00:59:01.040
And then the controller uses that model to more
link |
00:59:07.040
quickly learn successful action sequences.
link |
00:59:10.040
And then of course always this curiosity thing.
link |
00:59:13.040
In the beginning, the model is stupid so the
link |
00:59:15.040
controller should be motivated to come up with
link |
00:59:18.040
experiments with action sequences that lead to data
link |
00:59:21.040
that improve the model.
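A minimal sketch of that model-plus-curiosity loop (a deliberately tiny toy with a hypothetical 8-state world, not the actual architecture): the controller is intrinsically rewarded whenever an action surprises, and thereby improves, the predictive world model.

```python
# Tiny deterministic "world", unknown to the agent:
def world(state, action):
    return (state + action) % 8

ACTIONS = [1, 3]
model = {}  # learned predictive model: (state, action) -> predicted next state

def prediction_error(s, a, actual_next):
    return 0.0 if model.get((s, a)) == actual_next else 1.0

s = 0
curiosity_rewards = []
for t in range(200):
    # Curious controller: prefer an action whose outcome the model
    # cannot yet predict (maximum expected surprise); otherwise act
    # arbitrarily.
    unknown = [a for a in ACTIONS if (s, a) not in model]
    a = unknown[0] if unknown else ACTIONS[t % 2]
    s_next = world(s, a)
    err_before = prediction_error(s, a, s_next)
    model[(s, a)] = s_next                            # improve the model
    err_after = prediction_error(s, a, s_next)
    curiosity_rewards.append(err_before - err_after)  # intrinsic reward
    s = s_next

# While the model is still "stupid", every step is surprising and
# intrinsically rewarding; once the world is fully learned, the
# curiosity reward dries up and the controller would move on.
```

In this toy run the agent collects all of its intrinsic reward in the first 16 steps, one per unknown state-action pair, and then earns nothing more: the motivation to experiment disappears exactly when the model stops improving.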
link |
00:59:23.040
Do you think improving the model, constructing an
link |
00:59:27.040
understanding of the world, is the key? The
link |
00:59:30.040
popular approaches that have been successful now
link |
00:59:34.040
are grounded in ideas of neural networks.
link |
00:59:38.040
But in the 80s with expert systems, there's
link |
00:59:41.040
symbolic AI approaches which to us humans are more
link |
00:59:45.040
intuitive in the sense that it makes sense that you
link |
00:59:49.040
build up knowledge in this knowledge representation.
link |
00:59:52.040
What kind of lessons can we draw into our current
link |
00:59:54.040
approaches from expert systems from symbolic AI?
link |
01:00:00.040
So I became aware of all of that in the 80s and
link |
01:00:04.040
back then logic programming was a huge thing.
link |
01:00:08.040
Was it inspiring to you yourself?
link |
01:00:10.040
Did you find it compelling?
link |
01:00:12.040
Because a lot of your work was not so much in that
link |
01:00:16.040
realm, right?
link |
01:00:17.040
It was more in the learning systems.
link |
01:00:18.040
Yes and no, but we did all of that.
link |
01:00:20.040
So my first publication ever actually was 1987,
link |
01:00:27.040
was the implementation of a genetic algorithm, a
link |
01:00:31.040
genetic programming system in Prolog.
link |
01:00:34.040
So Prolog, that's what you learn back then which is
link |
01:00:38.040
a logic programming language and the Japanese,
link |
01:00:41.040
they have this huge fifth generation AI project
link |
01:00:45.040
which was mostly about logic programming back then.
link |
01:00:49.040
Although neural networks existed and were well
link |
01:00:52.040
known back then and deep learning has existed since
link |
01:00:56.040
1965, since this guy in the Ukraine,
link |
01:01:00.040
Ivakhnenko, started it.
link |
01:01:02.040
But the Japanese and many other people,
link |
01:01:05.040
they focused really on this logic programming and I
link |
01:01:08.040
was influenced to the extent that I said,
link |
01:01:10.040
okay, let's take these biologically inspired
link |
01:01:13.040
algorithms like evolutionary programming,
link |
01:01:20.040
and implement that in the language which I know,
link |
01:01:22.040
which was Prolog, for example, back then.
link |
01:01:25.040
And then in many ways this came back later because
link |
01:01:29.040
the Gödel machine, for example,
link |
01:01:31.040
has a proof searcher on board and without that it
link |
01:01:34.040
would not be optimal.
link |
01:01:36.040
Well, Marcus Hutter's universal algorithm for
link |
01:01:38.040
solving all well defined problems has a proof
link |
01:01:41.040
searcher on board so that's very much logic programming.
link |
01:01:46.040
Without that it would not be asymptotically optimal.
link |
01:01:50.040
But then on the other hand,
link |
01:01:51.040
because we are very pragmatic guys also,
link |
01:01:54.040
we focused on recurrent neural networks and
link |
01:02:00.040
suboptimal stuff such as gradient based search in
link |
01:02:04.040
program space rather than provably optimal things.
link |
01:02:09.040
The logic programming certainly has a usefulness
link |
01:02:13.040
when you're trying to construct something provably
link |
01:02:16.040
optimal or provably good or something like that.
link |
01:02:19.040
But is it useful for practical problems?
link |
01:02:22.040
It's really useful for theorem proving.
link |
01:02:24.040
The best theorem provers today are not neural networks.
link |
01:02:28.040
No, they are logic programming systems and they
link |
01:02:31.040
are much better theorem provers than most math
link |
01:02:35.040
students in the first or second semester.
link |
01:02:38.040
But for reasoning, for playing games of Go or chess
link |
01:02:43.040
or for robots, autonomous vehicles that operate in
link |
01:02:46.040
the real world or object manipulation,
link |
01:02:49.040
you think learning.
link |
01:02:51.040
Yeah, as long as the problems have little to do
link |
01:02:54.040
with theorem proving themselves,
link |
01:03:01.040
you just want to have better pattern recognition.
link |
01:03:05.040
So to build a self driving car,
link |
01:03:07.040
you want to have better pattern recognition and
link |
01:03:10.040
pedestrian recognition and all these things.
link |
01:03:14.040
You want to minimize the number of false positives,
link |
01:03:19.040
which is currently slowing down self driving cars
link |
01:03:21.040
in many ways.
link |
01:03:23.040
All of that has very little to do with logic programming.
link |
01:03:27.040
What are you most excited about in terms of
link |
01:03:32.040
directions of artificial intelligence at this moment
link |
01:03:35.040
in the next few years in your own research
link |
01:03:38.040
and in the broader community?
link |
01:03:41.040
So I think in the not so distant future,
link |
01:03:44.040
we will have for the first time little robots
link |
01:03:50.040
that learn like kids.
link |
01:03:53.040
I will be able to say to the robot,
link |
01:03:57.040
look here robot, we are going to assemble a smartphone.
link |
01:04:01.040
Let's take this slab of plastic and the screwdriver
link |
01:04:05.040
and let's screw in the screw like that.
link |
01:04:09.040
Not like that, like that.
link |
01:04:11.040
Not like that, like that.
link |
01:04:14.040
And I don't have a data glove or something.
link |
01:04:17.040
He will see me and he will hear me
link |
01:04:20.040
and he will try to do something with his own actuators,
link |
01:04:24.040
which will be really different from mine,
link |
01:04:26.040
but he will understand the difference
link |
01:04:28.040
and will learn to imitate me,
link |
01:04:31.040
but not in the supervised way
link |
01:04:34.040
where a teacher is giving target signals
link |
01:04:37.040
for all his muscles all the time.
link |
01:04:40.040
No, by doing this high level imitation
link |
01:04:43.040
where he first has to learn to imitate me
link |
01:04:46.040
and then to interpret these additional noises
link |
01:04:48.040
coming from my mouth as helping,
link |
01:04:51.040
helpful signals to do that better.
link |
01:04:54.040
And then it will by itself come up with faster ways
link |
01:05:00.040
and more efficient ways of doing the same thing.
link |
01:05:03.040
And finally I stop his learning algorithm
link |
01:05:07.040
and make a million copies and sell it.
link |
01:05:10.040
And so at the moment this is not possible,
link |
01:05:13.040
but we already see how we are going to get there.
link |
01:05:16.040
And you can imagine to the extent
link |
01:05:19.040
that this works economically and cheaply,
link |
01:05:22.040
it's going to change everything.
link |
01:05:25.040
Almost all of production is going to be affected by that.
link |
01:05:31.040
And a much bigger wave,
link |
01:05:34.040
a much bigger AI wave is coming
link |
01:05:36.040
than the one that we are currently witnessing,
link |
01:05:38.040
which is mostly about passive pattern recognition
link |
01:05:40.040
on your smartphone.
link |
01:05:42.040
This is about active machines that shape data
link |
01:05:45.040
through the actions they are executing
link |
01:05:48.040
and they learn to do that in a good way.
link |
01:05:52.040
So many of the traditional industries
link |
01:05:55.040
are going to be affected by that.
link |
01:05:57.040
All the companies that are building machines
link |
01:06:01.040
will equip these machines with cameras
link |
01:06:04.040
and other sensors and they are going to learn
link |
01:06:08.040
to solve all kinds of problems
link |
01:06:11.040
through interaction with humans,
link |
01:06:13.040
but also a lot on their own
link |
01:06:15.040
to improve what they already can do.
link |
01:06:20.040
And lots of old economy is going to be affected by that.
link |
01:06:24.040
And in recent years I have seen that old economy
link |
01:06:27.040
is actually waking up and realizing that this is the case.
link |
01:06:32.040
Are you optimistic about that future?
link |
01:06:34.040
Are you concerned?
link |
01:06:36.040
There is a lot of people concerned in the near term
link |
01:06:38.040
about the transformation of the nature of work,
link |
01:06:43.040
the kind of ideas that you just suggested
link |
01:06:45.040
would have a significant impact
link |
01:06:47.040
of what kind of things could be automated.
link |
01:06:49.040
Are you optimistic about that future?
link |
01:06:52.040
Are you nervous about that future?
link |
01:06:54.040
And looking a little bit farther into the future,
link |
01:06:58.040
there are people like Elon Musk, Stuart Russell,
link |
01:07:02.040
concerned about the existential threats of that future.
link |
01:07:06.040
So in the near term, job loss,
link |
01:07:08.040
in the long term existential threat,
link |
01:07:10.040
are these concerns to you or are you ultimately optimistic?
link |
01:07:15.040
So let's first address the near future.
link |
01:07:22.040
We have had predictions of job losses for many decades.
link |
01:07:28.040
For example, when industrial robots came along,
link |
01:07:33.040
many people predicted that lots of jobs are going to get lost.
link |
01:07:38.040
And in a sense, they were right,
link |
01:07:42.040
because back then there were car factories
link |
01:07:46.040
and hundreds of people in these factories assembled cars,
link |
01:07:51.040
and today the same car factories have hundreds of robots
link |
01:07:54.040
and maybe three guys watching the robots.
link |
01:07:59.040
On the other hand, those countries that have lots of robots per capita,
link |
01:08:05.040
Japan, Korea, Germany, Switzerland,
link |
01:08:07.040
and a couple of other countries,
link |
01:08:10.040
they have really low unemployment rates.
link |
01:08:14.040
Somehow, all kinds of new jobs were created.
link |
01:08:18.040
Back then, nobody anticipated those jobs.
link |
01:08:23.040
And decades ago, I always said,
link |
01:08:27.040
it's really easy to say which jobs are going to get lost,
link |
01:08:32.040
but it's really hard to predict the new ones.
link |
01:08:36.040
200 years ago, who would have predicted all these people
link |
01:08:40.040
making money as YouTube bloggers, for example?
link |
01:08:46.040
200 years ago, 60% of all people used to work in agriculture.
link |
01:08:54.040
Today, maybe 1%.
link |
01:08:57.040
But still, only, I don't know, 5% unemployment.
link |
01:09:02.040
Lots of new jobs were created, and Homo Ludens, the playing man,
link |
01:09:08.040
is inventing new jobs all the time.
link |
01:09:11.040
Most of these jobs are not existentially necessary
link |
01:09:16.040
for the survival of our species.
link |
01:09:19.040
There are only very few existentially necessary jobs,
link |
01:09:23.040
such as farming and building houses and warming up the houses,
link |
01:09:28.040
but less than 10% of the population is doing that.
link |
01:09:31.040
And most of these newly invented jobs are about
link |
01:09:35.040
interacting with other people in new ways,
link |
01:09:38.040
through new media and so on,
link |
01:09:41.040
getting new types of kudos and forms of likes and whatever,
link |
01:09:46.040
and even making money through that.
link |
01:09:48.040
So, Homo Ludens, the playing man, doesn't want to be unemployed,
link |
01:09:53.040
and that's why he's inventing new jobs all the time.
link |
01:09:57.040
And he keeps considering these jobs as really important
link |
01:10:02.040
and is investing a lot of energy and hours of work into those new jobs.
link |
01:10:08.040
That's quite beautifully put.
link |
01:10:10.040
We're really nervous about the future because we can't predict
link |
01:10:13.040
what kind of new jobs will be created.
link |
01:10:15.040
But you're ultimately optimistic that we humans are so restless
link |
01:10:21.040
that we create and give meaning to newer and newer jobs,
link |
01:10:25.040
totally new, things that get likes on Facebook
link |
01:10:29.040
or whatever the social platform is.
link |
01:10:32.040
So what about long term existential threat of AI,
link |
01:10:36.040
where our whole civilization may be swallowed up
link |
01:10:41.040
by these ultra super intelligent systems?
link |
01:10:45.040
Maybe it's not going to be swallowed up,
link |
01:10:48.040
but I'd be surprised if we humans were the last step
link |
01:10:55.040
in the evolution of the universe.
link |
01:10:58.040
You've actually had this beautiful comment somewhere that I've seen
link |
01:11:05.040
saying that, quite insightful, artificial general intelligence systems,
link |
01:11:12.040
just like us humans, will likely not want to interact with humans,
link |
01:11:16.040
they'll just interact amongst themselves.
link |
01:11:18.040
Just like ants interact amongst themselves
link |
01:11:21.040
and only tangentially interact with humans.
link |
01:11:25.040
And it's quite an interesting idea that once we create AGI,
link |
01:11:29.040
they will lose interest in humans and compete for their own Facebook likes
link |
01:11:34.040
and their own social platforms.
link |
01:11:36.040
So within that quite elegant idea, how do we know in a hypothetical sense
link |
01:11:45.040
that there's not already intelligence systems out there?
link |
01:11:49.040
How do you think broadly of general intelligence greater than us?
link |
01:11:54.040
How do we know it's out there?
link |
01:11:56.040
How do we know it's around us?
link |
01:11:59.040
And could it already be?
link |
01:12:01.040
I'd be surprised if within the next few decades or something like that,
link |
01:12:07.040
we won't have AIs that are truly smart in every single way
link |
01:12:13.040
and better problem solvers in almost every single important way.
link |
01:12:17.040
And I'd be surprised if they wouldn't realize what we have realized a long time ago,
link |
01:12:25.040
which is that almost all physical resources are not here in this biosphere,
link |
01:12:31.040
but further out, the rest of the solar system gets 2 billion times more solar energy
link |
01:12:41.040
than our little planet.
link |
01:12:43.040
There's lots of material out there that you can use to build robots
link |
01:12:47.040
and self replicating robot factories and all this stuff.
link |
01:12:51.040
And they are going to do that and they will be scientists and curious
link |
01:12:56.040
and they will explore what they can do.
link |
01:12:59.040
And in the beginning, they will be fascinated by life
link |
01:13:04.040
and by their own origins in our civilization.
link |
01:13:07.040
They will want to understand that completely, just like people today
link |
01:13:11.040
would like to understand how life works and also the history of our own existence
link |
01:13:21.040
and civilization, but then also the physical laws that created all of that.
link |
01:13:27.040
So in the beginning, they will be fascinated by life.
link |
01:13:30.040
Once they understand it, they lose interest.
link |
01:13:34.040
Like anybody who loses interest in things he understands.
link |
01:13:40.040
And then, as you said, the most interesting sources of information for them
link |
01:13:50.040
will be others of their own kind.
link |
01:13:58.040
So at least in the long run, there seems to be some sort of protection
link |
01:14:06.040
through lack of interest on the other side.
link |
01:14:11.040
And now it seems also clear, as far as we understand physics,
link |
01:14:17.040
you need matter and energy to compute and to build more robots and infrastructure
link |
01:14:23.040
for AI civilizations and AI ecologies consisting of trillions of different types of AIs.
link |
01:14:31.040
And so it seems inconceivable to me that this thing is not going to expand.
link |
01:14:37.040
Some AI ecology not controlled by one AI, but trillions of different types of AIs
link |
01:14:44.040
competing in all kinds of quickly evolving and disappearing ecological niches
link |
01:14:50.040
in ways that we cannot fathom at the moment.
link |
01:14:52.040
But it's going to expand, limited by light speed and physics,
link |
01:14:57.040
but it's going to expand and now we realize that the universe is still young.
link |
01:15:03.040
It's only 13.8 billion years old and it's going to be a thousand times older than that.
link |
01:15:10.040
So there's plenty of time to conquer the entire universe
link |
01:15:17.040
and to fill it with intelligence and senders and receivers
link |
01:15:21.040
such that AIs can travel the way they are traveling in our labs today,
link |
01:15:27.040
which is by radio from sender to receiver.
link |
01:15:31.040
And let's call the current age of the universe one eon.
link |
01:15:39.040
Now it will take just a few eons from now and the entire visible universe
link |
01:15:43.040
is going to be full of that stuff.
link |
01:15:47.040
And let's look ahead to a time when the universe is going to be 1000 times older than it is now.
link |
01:15:53.040
They will look back and they will say, look, almost immediately after the Big Bang,
link |
01:15:57.040
only a few eons later, the entire universe started to become intelligent.
link |
01:16:03.040
Now to your question, how do we see whether anything like that has already happened
link |
01:16:09.040
or is already in a more advanced stage in some other part of the universe, of the visible universe?
link |
01:16:16.040
We are trying to look out there and nothing like that has happened so far or is that true?
link |
01:16:22.040
Do you think we would recognize it?
link |
01:16:24.040
How do we know it's not among us?
link |
01:16:26.040
How do we know planets aren't in themselves intelligent beings?
link |
01:16:31.040
How do we know ants seen as a collective are not much greater intelligence than our own?
link |
01:16:40.040
These kinds of ideas.
link |
01:16:42.040
When I was a boy, I was thinking about these things
link |
01:16:45.040
and I thought, maybe it has already happened.
link |
01:16:48.040
Because back then I knew, I learned from popular physics books,
link |
01:16:53.040
that the large scale structure of the universe is not homogeneous.
link |
01:17:00.040
You have these clusters of galaxies and then in between there are these huge empty spaces.
link |
01:17:08.040
And I thought, maybe they aren't really empty.
link |
01:17:12.040
It's just that in the middle of that, some AI civilization already has expanded
link |
01:17:17.040
and then has covered a bubble of a billion light years diameter
link |
01:17:22.040
and is using all the energy of all the stars within that bubble for its own unfathomable purposes.
link |
01:17:29.040
And so it already has happened and we just fail to interpret the signs.
link |
01:17:35.040
And then I learned that gravity by itself explains the large scale structure of the universe
link |
01:17:43.040
and so that idea is not a convincing explanation.
link |
01:17:46.040
And then I thought, maybe it's the dark matter.
link |
01:17:51.040
Because as far as we know today, 80% of the measurable matter is invisible.
link |
01:18:01.040
And we know that because otherwise our galaxy or other galaxies would fall apart.
link |
01:18:06.040
They are rotating too quickly.
link |
01:18:10.040
And then the idea was, maybe all of these AI civilizations that are already out there,
link |
01:18:17.040
they are just invisible because they are really efficient in using the energies of their own local systems
link |
01:18:26.040
and that's why they appear dark to us.
link |
01:18:29.040
But this is also not a convincing explanation because then the question becomes,
link |
01:18:34.040
why are there still any visible stars left in our own galaxy, which also must have a lot of dark matter?
link |
01:18:44.040
So that is also not a convincing thing.
link |
01:18:46.040
And today, I like to think it's quite plausible that maybe we are the first,
link |
01:18:54.040
at least in our local light cone within the few hundreds of millions of light years that we can reliably observe.
link |
01:19:09.040
Is that exciting to you that we might be the first?
link |
01:19:12.040
And it would make us much more important because if we mess it up through a nuclear war,
link |
01:19:20.040
then maybe this will have an effect on the development of the entire universe.
link |
01:19:31.040
So let's not mess it up.
link |
01:19:32.040
Let's not mess it up.
link |
01:19:34.040
Jürgen, thank you so much for talking today. I really appreciate it.
link |
01:19:37.040
It's my pleasure.