
Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI | Lex Fridman Podcast #75



link |
00:00:00.000
The following is a conversation with Marcus Hutter,
link |
00:00:03.480
senior research scientist at Google DeepMind.
link |
00:00:06.680
Throughout his career of research,
link |
00:00:08.360
including with Jürgen Schmidhuber and Shane Legg,
link |
00:00:11.760
he has proposed a lot of interesting ideas
link |
00:00:13.960
in and around the field of artificial general
link |
00:00:16.360
intelligence, including the development of AIXI,
link |
00:00:20.140
spelled A-I-X-I, a model which is a mathematical approach to AGI
link |
00:00:25.360
that incorporates ideas of Kolmogorov complexity,
link |
00:00:28.880
Solomonoff induction, and reinforcement learning.
link |
00:00:33.080
In 2006, Marcus launched the 50,000 Euro Hutter Prize
link |
00:00:38.200
for lossless compression of human knowledge.
link |
00:00:41.200
The idea behind this prize is that the ability
link |
00:00:43.720
to compress well is closely related to intelligence.
link |
00:00:47.900
This, to me, is a profound idea.
link |
00:00:51.260
Specifically, if you can compress the first 100
link |
00:00:54.000
megabytes or 1 gigabyte of Wikipedia
link |
00:00:56.520
better than your predecessors, your compressor
link |
00:00:59.000
likely has to also be smarter.
link |
00:01:02.200
The intention of this prize is to encourage
link |
00:01:04.240
the development of intelligent compressors as a path to AGI.
link |
00:01:09.640
In conjunction with his podcast release just a few days ago,
link |
00:01:13.280
Marcus announced a 10x increase in several aspects
link |
00:01:16.520
of this prize, including the money, to 500,000 Euros.
link |
00:01:22.680
The better your compressor works relative to the previous
link |
00:01:25.240
winners, the higher fraction of that prize money
link |
00:01:27.680
is awarded to you.
link |
00:01:29.440
You can learn more about it if you Google simply Hutter Prize.
link |
00:01:35.080
I'm a big fan of benchmarks for developing AI systems,
link |
00:01:38.240
and the Hutter Prize may indeed be
link |
00:01:39.960
one that will spark some good ideas for approaches that
link |
00:01:43.240
will make progress on the path of developing AGI systems.
link |
00:01:47.880
This is the Artificial Intelligence Podcast.
link |
00:01:50.520
If you enjoy it, subscribe on YouTube,
link |
00:01:52.720
give it five stars on Apple Podcast,
link |
00:01:54.720
support it on Patreon, or simply connect with me on Twitter
link |
00:01:58.040
at Lex Friedman, spelled F R I D M A N.
link |
00:02:02.640
As usual, I'll do one or two minutes of ads
link |
00:02:04.840
now and never any ads in the middle
link |
00:02:06.960
that can break the flow of the conversation.
link |
00:02:09.240
I hope that works for you and doesn't
link |
00:02:11.040
hurt the listening experience.
link |
00:02:13.240
This show is presented by Cash App, the number one finance
link |
00:02:16.400
app in the App Store.
link |
00:02:17.800
When you get it, use code LEX PODCAST.
link |
00:02:21.240
Cash App lets you send money to friends,
link |
00:02:23.520
buy Bitcoin, and invest in the stock market
link |
00:02:26.040
with as little as $1.
link |
00:02:27.920
Broker services are provided by Cash App Investing,
link |
00:02:30.920
a subsidiary of Square, a member SIPC.
link |
00:02:34.960
Since Cash App allows you to send and receive money
link |
00:02:37.400
digitally, peer to peer, security
link |
00:02:39.920
in all digital transactions is very important.
link |
00:02:42.800
Let me mention the PCI data security standard
link |
00:02:45.840
that Cash App is compliant with.
link |
00:02:48.080
I'm a big fan of standards for safety and security.
link |
00:02:52.080
PCI DSS is a good example of that,
link |
00:02:55.080
where a bunch of competitors got together
link |
00:02:57.200
and agreed that there needs to be
link |
00:02:59.000
a global standard around the security of transactions.
link |
00:03:02.520
Now, we just need to do the same for autonomous vehicles
link |
00:03:06.040
and AI systems in general.
link |
00:03:08.880
So again, if you get Cash App from the App Store or Google
link |
00:03:11.920
Play and use the code LEX PODCAST, you'll get $10.
link |
00:03:16.400
And Cash App will also donate $10 to FIRST,
link |
00:03:19.240
one of my favorite organizations that
link |
00:03:21.380
is helping to advance robotics and STEM education
link |
00:03:24.520
for young people around the world.
link |
00:03:27.680
And now, here's my conversation with Marcus Hutter.
link |
00:03:32.600
Do you think of the universe as a computer
link |
00:03:34.480
or maybe an information processing system?
link |
00:03:37.020
Let's go with a big question first.
link |
00:03:39.080
Okay, with a big question first.
link |
00:03:41.560
I think it's a very interesting hypothesis or idea.
link |
00:03:45.240
And I have a background in physics,
link |
00:03:47.960
so I know a little bit about physical theories,
link |
00:03:50.800
the standard model of particle physics
link |
00:03:52.440
and general relativity theory.
link |
00:03:54.440
And they are amazing and describe virtually everything
link |
00:03:57.200
in the universe.
link |
00:03:58.040
And they're all in a sense, computable theories.
link |
00:03:59.780
I mean, they're very hard to compute.
link |
00:04:01.800
And it's very elegant, simple theories,
link |
00:04:04.360
which describe virtually everything in the universe.
link |
00:04:07.260
So there's a strong indication that somehow
link |
00:04:12.400
the universe is computable, but it's a plausible hypothesis.
link |
00:04:17.400
So what do you think, just like you said, general relativity,
link |
00:04:21.200
quantum field theory, why do you think that
link |
00:04:23.680
the laws of physics are so nice and beautiful
link |
00:04:26.560
and simple and compressible?
link |
00:04:29.000
Do you think our universe was designed,
link |
00:04:32.800
is naturally this way?
link |
00:04:34.240
Are we just focusing on the parts
link |
00:04:36.760
that are especially compressible?
link |
00:04:39.560
Do human minds just enjoy something about that simplicity?
link |
00:04:42.780
And in fact, there's other things
link |
00:04:44.880
that are not so compressible.
link |
00:04:46.760
I strongly believe and I'm pretty convinced
link |
00:04:49.440
that the universe is inherently beautiful, elegant
link |
00:04:52.560
and simple and described by these equations.
link |
00:04:55.520
And we're not just picking that.
link |
00:04:57.640
I mean, if there were some phenomena
link |
00:05:00.040
which cannot be neatly described,
link |
00:05:02.680
scientists would still try to describe that.
link |
00:05:04.640
And there's biology, which is more messy,
link |
00:05:06.720
but we understand that it's an emergent phenomena
link |
00:05:09.280
and it's complex systems,
link |
00:05:11.000
but they still follow the same rules
link |
00:05:12.720
of quantum electrodynamics.
link |
00:05:14.640
All of chemistry follows that and we know that.
link |
00:05:16.560
I mean, we cannot compute everything
link |
00:05:18.120
because we have limited computational resources.
link |
00:05:20.280
No, I think it's not a bias of the humans,
link |
00:05:22.040
but it's objectively simple.
link |
00:05:23.960
I mean, of course, you never know,
link |
00:05:25.640
maybe there's some corners very far out in the universe
link |
00:05:28.280
or super, super tiny below the nucleus of atoms
link |
00:05:32.960
or parallel universes which are not nice and simple,
link |
00:05:38.200
but there's no evidence for that.
link |
00:05:40.520
And we should apply Occam's razor
link |
00:05:42.200
and choose the simplest theory consistent with it.
link |
00:05:45.120
But also it's a little bit self referential.
link |
00:05:48.000
So maybe a quick pause.
link |
00:05:49.440
What is Occam's razor?
link |
00:05:50.960
So Occam's razor says that you should not multiply entities
link |
00:05:55.520
beyond necessity, which sort of,
link |
00:05:58.040
if you translate it to proper English means,
link |
00:06:01.360
and in the scientific context means
link |
00:06:03.400
that if you have two theories or hypotheses or models,
link |
00:06:06.400
which equally well describe the phenomenon,
link |
00:06:09.760
your study or the data,
link |
00:06:11.520
you should choose the more simple one.
link |
00:06:13.920
So that's just the principle or sort of,
link |
00:06:16.640
that's not like a provable law, perhaps.
link |
00:06:20.040
Perhaps we'll kind of discuss it and think about it,
link |
00:06:23.480
but what's the intuition of why the simpler answer
link |
00:06:28.080
is the one that is likely to be more correct descriptor
link |
00:06:33.280
of whatever we're talking about?
link |
00:06:35.080
I believe that Occam's razor
link |
00:06:36.560
is probably the most important principle in science.
link |
00:06:40.240
I mean, of course we need logical deduction
link |
00:06:42.040
and we do experimental design,
link |
00:06:44.560
but science is about finding, understanding the world,
link |
00:06:49.880
finding models of the world.
link |
00:06:51.480
And we can come up with crazy complex models,
link |
00:06:53.720
which explain everything but predict nothing.
link |
00:06:56.040
But the simple model seem to have predictive power
link |
00:07:00.240
and it's a valid question why?
link |
00:07:03.160
And there are two answers to that.
link |
00:07:06.000
You can just accept it.
link |
00:07:07.240
That is the principle of science and we use this principle
link |
00:07:10.800
and it seems to be successful.
link |
00:07:12.840
We don't know why, but it just happens to be.
link |
00:07:15.920
Or you can try, find another principle
link |
00:07:18.560
which explains Occam's razor.
link |
00:07:21.120
And if we start with the assumption
link |
00:07:24.120
that the world is governed by simple rules,
link |
00:07:27.600
then there's a bias towards simplicity
link |
00:07:31.400
and applying Occam's razor is the mechanism
link |
00:07:36.200
to finding these rules.
link |
00:07:37.120
And actually in a more quantitative sense,
link |
00:07:39.080
and we come back to that later in terms of Solomonoff induction,
link |
00:07:41.760
you can rigorously prove that.
link |
00:07:43.080
You can assume that the world is simple,
link |
00:07:45.680
then Occam's razor is the best you can do
link |
00:07:47.800
in a certain sense.
link |
00:07:49.080
So I apologize for the romanticized question,
link |
00:07:51.720
but why do you think, outside of its effectiveness,
link |
00:07:56.320
why do you think we find simplicity
link |
00:07:58.440
so appealing as human beings?
link |
00:08:00.000
Why does E equals MC squared seem so beautiful to us humans?
link |
00:08:05.000
I guess mostly, in general, many things
link |
00:08:08.480
can be explained by an evolutionary argument.
link |
00:08:12.000
And there's some artifacts in humans
link |
00:08:14.240
which are just artifacts and not evolutionary necessary.
link |
00:08:18.240
But with this beauty and simplicity,
link |
00:08:21.120
it's, I believe, at least the core is about,
link |
00:08:28.160
like science, finding regularities in the world,
link |
00:08:31.520
understanding the world, which is necessary for survival.
link |
00:08:35.120
If I look at a bush and I just see noise,
link |
00:08:39.480
and there is a tiger and it eats me, then I'm dead.
link |
00:08:42.080
But if I try to find a pattern,
link |
00:08:44.000
and we know that humans are prone to find more patterns
link |
00:08:49.360
in data than there are, like the Mars face
link |
00:08:53.160
and all these things, but this bias
link |
00:08:55.680
towards finding patterns, even if there are none,
link |
00:08:58.240
but, I mean, it's best, of course, if they are real, yeah,
link |
00:09:01.360
helps us for survival.
link |
00:09:04.040
Yeah, that's fascinating.
link |
00:09:04.880
I haven't thought really about the,
link |
00:09:07.240
I thought I just loved science,
link |
00:09:08.840
but indeed, in terms of just for survival purposes,
link |
00:09:13.600
there is an evolutionary argument
link |
00:09:15.920
for why we find the work of Einstein so beautiful.
link |
00:09:21.760
Maybe a quick small tangent.
link |
00:09:24.080
Could you describe what
link |
00:09:26.040
Solomonoff induction is?
link |
00:09:28.400
Yeah, so that's a theory which I claim,
link |
00:09:32.680
and Solomonoff sort of claimed a long time ago,
link |
00:09:35.440
that this solves the big philosophical problem of induction.
link |
00:09:39.800
And I believe the claim is essentially true.
link |
00:09:42.760
And what it does is the following.
link |
00:09:44.800
So, okay, for the picky listener,
link |
00:09:49.640
induction can be interpreted narrowly and widely.
link |
00:09:53.560
Narrow means inferring models from data.
link |
00:09:58.800
And widely means also then using these models
link |
00:10:01.240
for doing predictions,
link |
00:10:02.320
so predictions also part of the induction.
link |
00:10:04.760
So I'm a little bit sloppy sort of with the terminology,
link |
00:10:07.680
and maybe that comes from Ray Solomonoff, you know,
link |
00:10:10.880
being sloppy, maybe I shouldn't say that.
link |
00:10:12.800
He can't complain anymore.
link |
00:10:15.640
So let me explain a little bit this theory in simple terms.
link |
00:10:20.240
So assume you have a data sequence,
link |
00:10:21.960
make it very simple, the simplest one say 1, 1, 1, 1, 1,
link |
00:10:24.800
and say you see 100 ones, what do you think comes next?
link |
00:10:28.840
The natural answer, I'm gonna speed up a little bit,
link |
00:10:30.560
the natural answer is of course, you know, one, okay?
link |
00:10:33.640
And the question is why, okay?
link |
00:10:36.040
Well, we see a pattern there, yeah, okay,
link |
00:10:38.920
there's a one and we repeat it.
link |
00:10:40.720
And why should it suddenly after 100 ones be different?
link |
00:10:43.440
So what we're looking for is simple explanations or models
link |
00:10:47.040
for the data we have.
link |
00:10:48.640
And now the question is,
link |
00:10:49.800
a model has to be presented in a certain language,
link |
00:10:53.400
in which language do we use?
link |
00:10:55.440
In science, we want formal languages,
link |
00:10:57.480
and we can use mathematics,
link |
00:10:58.840
or we can use programs on a computer.
link |
00:11:01.920
So abstractly on a Turing machine, for instance,
link |
00:11:04.480
or it can be a general purpose computer.
link |
00:11:06.320
So, and there are of course, lots of models of,
link |
00:11:09.320
you can say maybe it's 100 ones and then 100 zeros
link |
00:11:11.880
and 100 ones, that's a model, right?
link |
00:11:13.320
But there are simpler models; there's the model 'print 1 in a loop',
link |
00:11:17.240
and it also explains the data.
link |
00:11:19.840
And if you push that to the extreme,
link |
00:11:23.120
you are looking for the shortest program,
link |
00:11:25.320
which if you run this program reproduces the data you have,
link |
00:11:29.400
it will not stop, it will continue naturally.
link |
00:11:32.280
And this you take for your prediction.
link |
00:11:34.600
And on the sequence of ones, it's very plausible, right?
link |
00:11:37.040
That 'print 1 in a loop' is the shortest program.
link |
00:11:39.400
We can give some more complex examples
link |
00:11:41.480
like one, two, three, four, five.
link |
00:11:43.760
What comes next?
link |
00:11:44.600
The short program is again, you know,
link |
00:11:46.240
a counter, and so that is roughly speaking
link |
00:11:50.160
how Solomonoff induction works.
link |
00:11:53.160
The extra twist is that it can also deal with noisy data.
link |
00:11:56.360
So if you have, for instance, a coin flip,
link |
00:11:58.680
say a biased coin, which comes up head with 60% probability,
link |
00:12:03.320
then it will predict, it will learn and figure this out,
link |
00:12:06.520
and after a while it predicts, oh, the next coin flip
link |
00:12:09.480
will be head with probability 60%.
link |
00:12:11.400
So it's the stochastic version of that.
link |
00:12:13.480
But the goal is, the dream is always the search
link |
00:12:16.440
for the short program.
link |
00:12:17.520
Yes, yeah.
link |
00:12:18.360
Well, in Solomonoff induction, precisely what you do is,
link |
00:12:21.000
so you combine, so looking for the shortest program
link |
00:12:24.840
is like applying Occam's razor,
link |
00:12:26.520
like looking for the simplest theory.
link |
00:12:28.480
There's also Epicurus' principle, which says,
link |
00:12:31.160
if you have multiple hypotheses,
link |
00:12:32.720
which equally well describe your data,
link |
00:12:34.440
don't discard any of them, keep all of them around,
link |
00:12:36.520
you never know.
link |
00:12:37.920
And you can put that together and say,
link |
00:12:39.680
okay, I have a bias towards simplicity,
link |
00:12:42.080
but don't rule out the larger models.
link |
00:12:44.280
And technically what we do is,
link |
00:12:46.360
we weigh the shorter models higher
link |
00:12:49.880
and the longer models lower.
link |
00:12:52.040
And you use Bayesian techniques, you have a prior,
link |
00:12:55.280
and which is precisely two to the minus
link |
00:12:59.520
the complexity of the program.
link |
00:13:01.840
And you weigh all these hypotheses and take this mixture,
link |
00:13:04.440
and then you get also the stochasticity in.
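Written out as a formula (a sketch in the standard notation, assuming a universal prefix Turing machine U and writing ℓ(p) for the length of program p in bits), the mixture described here is Solomonoff's universal prior:

```latex
% Solomonoff's universal mixture: every program consistent with the data
% contributes, weighted by 2^(-length); prediction is by conditioning.
M(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},
\qquad
M(x_{t+1} \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\,x_{t+1})}{M(x_{1:t})}
```

Here U(p) = x* means program p outputs a string that starts with x, so short programs dominate the sum (Occam) while no consistent program is thrown away (Epicurus).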
link |
00:13:06.840
Yeah, like many of your ideas,
link |
00:13:08.200
that's just a beautiful idea of weighing based
link |
00:13:10.560
on the simplicity of the program.
link |
00:13:12.280
I love that, that seems to me
link |
00:13:15.480
maybe a very human centric concept.
link |
00:13:17.200
It seems to be a very appealing way
link |
00:13:19.440
of discovering good programs in this world.
link |
00:13:24.600
You've used the term compression quite a bit.
link |
00:13:27.760
I think it's a beautiful idea.
link |
00:13:30.240
Sort of, we just talked about simplicity
link |
00:13:32.600
and maybe science or just all of our intellectual pursuits
link |
00:13:37.280
is basically the attempt to compress the complexity
link |
00:13:41.040
all around us into something simple.
link |
00:13:43.080
So what does this word mean to you, compression?
link |
00:13:49.920
I essentially have already explained it.
link |
00:13:51.560
So it compression means for me,
link |
00:13:53.960
finding short programs for the data
link |
00:13:58.400
or the phenomenon at hand.
link |
00:13:59.760
You could interpret it more widely,
link |
00:14:01.640
finding simple theories,
link |
00:14:03.960
which can be mathematical theories
link |
00:14:05.440
or maybe even informal, like just in words.
link |
00:14:09.040
Compression means finding short descriptions,
link |
00:14:11.920
explanations, programs for the data.
link |
00:14:14.880
Do you see science as a kind of our human attempt
link |
00:14:20.320
at compression, so we're speaking more generally,
link |
00:14:23.040
because when you say programs,
link |
00:14:24.920
you're kind of zooming in on a particular sort of
link |
00:14:26.800
almost like a computer science,
link |
00:14:28.080
artificial intelligence focus,
link |
00:14:30.200
but do you see all of human endeavor
link |
00:14:31.920
as a kind of compression?
link |
00:14:34.360
Well, at least all of science,
link |
00:14:35.560
I see as an endeavor of compression,
link |
00:14:37.600
not all of humanity, maybe.
link |
00:14:39.680
And well, there are also some other aspects of science
link |
00:14:42.160
like experimental design, right?
link |
00:14:43.600
I mean, we create experiments specifically
link |
00:14:47.440
to get extra knowledge.
link |
00:14:48.720
And that isn't part of the decision making process,
link |
00:14:53.320
but once we have the data,
link |
00:14:55.400
to understand the data is essentially compression.
link |
00:14:58.160
So I don't see any difference between compression,
link |
00:15:00.800
understanding, and prediction.
link |
00:15:05.960
So we're jumping around topics a little bit,
link |
00:15:07.960
but returning back to simplicity,
link |
00:15:10.480
a fascinating concept of Kolmogorov complexity.
link |
00:15:14.320
So in your sense, do most objects
link |
00:15:17.120
in our mathematical universe
link |
00:15:19.680
have high Kolmogorov complexity?
link |
00:15:21.960
And maybe what is, first of all,
link |
00:15:24.080
what is Kolmogorov complexity?
link |
00:15:25.960
Okay, Kolmogorov complexity is a notion
link |
00:15:28.400
of simplicity or complexity,
link |
00:15:31.160
and it takes the compression view to the extreme.
link |
00:15:35.960
So I explained before that if you have some data sequence,
link |
00:15:39.680
just think about a file in a computer
link |
00:15:41.720
and that's sort of, you know, just a string of bits.
link |
00:15:45.120
And if you, and we have data compressors,
link |
00:15:49.440
like we compress big files into zip files
link |
00:15:52.040
with certain compressors.
link |
00:15:53.720
And you can also produce self-extracting archives.
link |
00:15:56.360
That means as an executable,
link |
00:15:58.000
if you run it, it reproduces your original file
link |
00:16:00.760
without needing an extra decompressor.
link |
00:16:02.880
It's just a decompressor plus the archive together in one.
link |
00:16:06.240
And now there are better and worse compressors,
link |
00:16:08.840
and you can ask, what is the ultimate compressor?
link |
00:16:11.120
So what is the shortest possible self-extracting archive
link |
00:16:14.880
you could produce for a certain data set here,
link |
00:16:17.920
which reproduces the data set.
link |
00:16:19.560
And the length of this is called the Kolmogorov complexity.
link |
00:16:23.320
And arguably that is the information content
link |
00:16:26.680
in the data set.
link |
00:16:27.960
I mean, if the data set is very redundant or very boring,
link |
00:16:30.480
you can compress it very well.
link |
00:16:31.760
So the information content should be low
link |
00:16:34.760
and you know, it is low according to this definition.
link |
00:16:36.920
So it's the length of the shortest program
link |
00:16:39.720
that summarizes the data?
link |
00:16:41.040
Yes.
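In symbols, a sketch of the standard definition, again relative to a fixed universal Turing machine U:

```latex
% Kolmogorov complexity of a string x:
% the length of the shortest program that outputs x and halts.
K(x) \;=\; \min_{p} \{\, \ell(p) \;:\; U(p) = x \,\}
```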
link |
00:16:42.040
And what's your sense of our sort of universe
link |
00:16:46.280
when we think about the different objects in our universe
link |
00:16:51.360
that we try to describe, concepts or whatever, at every level,
link |
00:16:55.440
do they have high or low Kolmogorov complexity?
link |
00:16:58.320
So what's the hope?
link |
00:17:00.280
Do we have a lot of hope
link |
00:17:01.400
and be able to summarize much of our world?
link |
00:17:05.680
That's a tricky and difficult question.
link |
00:17:08.520
So as I said before, I believe that the whole universe
link |
00:17:13.560
based on the evidence we have is very simple.
link |
00:17:16.760
So it has a very short description.
link |
00:17:19.240
Sorry, to linger on that, the whole universe,
link |
00:17:23.200
what does that mean?
link |
00:17:24.040
You mean at the very basic fundamental level
link |
00:17:26.720
in order to create the universe?
link |
00:17:28.560
Yes, yeah.
link |
00:17:29.400
So you need a very short program and you run it.
link |
00:17:32.960
To get the thing going.
link |
00:17:34.040
To get the thing going
link |
00:17:35.040
and then it will reproduce our universe.
link |
00:17:37.480
There's a problem with noise.
link |
00:17:39.320
We can come back to that later possibly.
link |
00:17:42.080
Is noise a problem or is it a bug or a feature?
link |
00:17:46.240
I would say it makes our life as a scientist
link |
00:17:49.440
really, really much harder.
link |
00:17:52.160
I mean, think about without noise,
link |
00:17:53.480
we wouldn't need all of the statistics.
link |
00:17:55.920
But then maybe we wouldn't feel like there's a free will.
link |
00:17:58.840
Maybe we need that for the...
link |
00:18:01.360
This is an illusion that noise can give you free will.
link |
00:18:04.000
At least in that way, it's a feature.
link |
00:18:06.640
But also, if you don't have noise,
link |
00:18:09.000
you have chaotic phenomena,
link |
00:18:10.720
which are effectively like noise.
link |
00:18:12.720
So we can't get away without statistics even then.
link |
00:18:15.680
I mean, think about rolling a dice
link |
00:18:17.520
and forget about quantum mechanics
link |
00:18:19.200
and you know exactly how you throw it.
link |
00:18:21.160
But I mean, it's still so hard to compute the trajectory
link |
00:18:24.000
that effectively it is best to model it
link |
00:18:26.400
as coming out with a number,
link |
00:18:30.080
with probability one over six.
link |
00:18:33.040
But from this set of philosophical
link |
00:18:36.320
Kolmogorov complexity perspective,
link |
00:18:38.080
if we didn't have noise,
link |
00:18:39.880
then arguably you could describe the whole universe
link |
00:18:43.160
with the standard model plus general relativity.
link |
00:18:47.400
I mean, we don't have a theory of everything yet,
link |
00:18:49.600
but sort of assuming we are close to it or have it.
link |
00:18:52.200
Plus the initial conditions, which may hopefully be simple.
link |
00:18:55.400
And then you just run it
link |
00:18:56.600
and then you would reproduce the universe.
link |
00:18:59.040
But that's spoiled by noise or by chaotic systems
link |
00:19:03.520
or by initial conditions, which may be complex.
link |
00:19:06.280
So now if we don't take the whole universe,
link |
00:19:09.680
but just a subset, just take planet Earth.
link |
00:19:13.720
Planet Earth cannot be compressed
link |
00:19:15.600
into a couple of equations.
link |
00:19:17.520
This is a hugely complex system.
link |
00:19:19.200
So interesting.
link |
00:19:20.040
So when you look at the window,
link |
00:19:21.640
like the whole thing might be simple,
link |
00:19:23.000
but when you just take a small window, then...
link |
00:19:26.080
It may become complex and that may be counterintuitive,
link |
00:19:28.760
but there's a very nice analogy.
link |
00:19:31.720
The book, the library of all books.
link |
00:19:34.240
So imagine you have a normal library with interesting books
link |
00:19:36.960
and you go there, great, lots of information
link |
00:19:39.320
and quite complex.
link |
00:19:41.960
So now I create a library which contains all possible books,
link |
00:19:45.000
say of 500 pages.
link |
00:19:46.800
So the first book just has A, A, A, A, A over all the pages.
link |
00:19:49.680
The next book A, A, A and ends with B and so on.
link |
00:19:52.240
I create this library of all books.
link |
00:19:54.200
I can write a super short program which creates this library.
link |
00:19:57.280
So this library which has all books
link |
00:19:59.000
has zero information content.
link |
00:20:01.280
And you take a subset of this library
link |
00:20:02.880
and suddenly you have a lot of information in there.
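A toy, runnable version of the library analogy (shrunk to eight-character "books" over a two-letter alphabet so it finishes instantly; the names here are just illustrative):

```python
from itertools import product

ALPHABET = "AB"      # toy alphabet instead of real text
BOOK_LEN = 8         # toy "books" of 8 characters instead of 500 pages

def library_of_all_books():
    """A tiny program that generates the entire library, book by book."""
    for chars in product(ALPHABET, repeat=BOOK_LEN):
        yield "".join(chars)

# The generator above stays a handful of lines no matter how large the
# library is, so the library as a whole carries almost no information.
# Picking out one particular book is where the information sits:
one_book = "ABBABAAB"
print(sum(1 for _ in library_of_all_books()))   # 256 books from a few lines of code
print(one_book in library_of_all_books())       # True, but naming it costs 8 characters
```

The generator's length is constant however big the library gets, while specifying any particular subset of it costs as many bits as the subset itself.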
link |
00:20:05.320
So that's fascinating.
link |
00:20:06.680
I think one of the most beautiful object,
link |
00:20:08.320
mathematical objects that at least today
link |
00:20:10.440
seems to be understudied or under talked about
link |
00:20:12.520
is cellular automata.
link |
00:20:14.920
What lessons do you draw from sort of the game of life
link |
00:20:18.560
for cellular automata where you start with the simple rules
link |
00:20:20.800
just like you're describing with the universe
link |
00:20:22.840
and somehow complexity emerges.
link |
00:20:26.280
Do you feel like you have an intuitive grasp
link |
00:20:30.400
on the fascinating behavior of such systems
link |
00:20:34.120
where like you said, some chaotic behavior could happen,
link |
00:20:37.560
some complexity could emerge,
link |
00:20:39.560
some could die out, and some form very rigid structures.
link |
00:20:43.680
Do you have a sense about cellular automata
link |
00:20:46.760
that somehow transfers maybe
link |
00:20:48.200
to the bigger questions of our universe?
link |
00:20:50.960
Yeah, the cellular automata
link |
00:20:51.960
and especially the Conway's game of life
link |
00:20:54.240
is really great because these rules are so simple.
link |
00:20:56.240
You can explain it to every child
link |
00:20:57.720
and even by hand you can simulate a little bit
link |
00:21:00.280
and you see these beautiful patterns emerge
link |
00:21:04.040
and people have proven that it's even Turing complete.
link |
00:21:06.800
You can not only use a computer to simulate the Game of Life,
link |
00:21:09.840
but you can also use game of life to simulate any computer.
link |
00:21:13.480
That is truly amazing.
link |
00:21:16.520
And it's the prime example probably to demonstrate
link |
00:21:21.240
that very simple rules can lead to very rich phenomena.
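For reference, the complete rule set really does fit in a few lines; here is a minimal sketch in Python that steps a set of live cells (the glider used in the demo is one of the standard patterns):

```python
from collections import Counter

def life_step(live):
    """One step of Conway's Game of Life; `live` is a set of (x, y) cells."""
    # Count live neighbours of every cell adjacent to a live cell.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next step if it has 3 live neighbours,
    # or 2 live neighbours and was already alive.
    return {c for c, n in neighbour_counts.items() if n == 3 or (n == 2 and c in live)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = life_step(glider)
print(sorted(glider))   # the same glider shape, shifted one cell diagonally
```

Everything people have built inside the Game of Life, up to Turing-complete computers, runs on nothing more than this neighbour-counting rule.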
link |
00:21:25.040
And people sometimes,
link |
00:21:26.840
how is chemistry and biology so rich?
link |
00:21:29.720
I mean, this can't be based on simple rules.
link |
00:21:32.400
But no, we know quantum electrodynamics
link |
00:21:34.520
describes all of chemistry.
link |
00:21:36.360
And we come later back to that.
link |
00:21:38.960
I claim intelligence can be explained
link |
00:21:40.960
or described in one single equation.
link |
00:21:43.000
This very rich phenomenon.
link |
00:21:45.720
You asked also about whether I understand this phenomenon
link |
00:21:49.880
and the answer is probably not.
link |
00:21:54.280
And there's this saying,
link |
00:21:55.560
you never understand really things,
link |
00:21:56.800
you just get used to them.
link |
00:21:58.360
And I think I got pretty used to cellular automata.
link |
00:22:03.600
So you believe that you understand
link |
00:22:05.440
now why this phenomenon happens.
link |
00:22:07.120
But I give you a different example.
link |
00:22:09.240
I didn't play too much with Conway's game of life
link |
00:22:11.760
but a little bit more with fractals
link |
00:22:15.000
and with the Mandelbrot set and these beautiful patterns,
link |
00:22:18.480
just look up the Mandelbrot set.
link |
00:22:21.000
And well, when the computers were really slow
link |
00:22:23.280
and I just had a black and white monitor
link |
00:22:25.280
and programmed my own programs in assembler too.
link |
00:22:29.040
Assembler, wow.
link |
00:22:30.920
Wow, you're legit.
link |
00:22:33.720
To get these fractals on the screen
link |
00:22:35.480
and I was mesmerized. And much later,
link |
00:22:37.320
So I returned to this every couple of years
link |
00:22:40.080
and then I tried to understand what is going on.
link |
00:22:42.800
And you can understand a little bit.
link |
00:22:44.800
So I tried to derive the locations,
link |
00:22:48.720
there are these circles and the apple shape
link |
00:22:53.520
and then you have smaller Mandelbrot sets
link |
00:22:57.360
recursively in this set.
link |
00:22:59.000
And there's a way to mathematically
link |
00:23:01.720
by solving high order polynomials
link |
00:23:03.480
to figure out where these centers are
link |
00:23:05.640
and what size they are approximately.
link |
00:23:08.080
And by sort of mathematically approaching this problem,
link |
00:23:12.560
you slowly get a feeling of why things are like they are
link |
00:23:18.080
and that sort of is, you know,
link |
00:23:21.960
a first step to understanding why these rich phenomena arise.
link |
00:23:24.880
Do you think it's possible, what's your intuition?
link |
00:23:27.200
Do you think it's possible to reverse engineer
link |
00:23:28.880
and find the short program that generated these fractals
link |
00:23:33.320
sort of by looking at the fractals?
link |
00:23:36.400
Well, in principle, yes, yeah.
link |
00:23:38.840
So, I mean, in principle, what you can do is
link |
00:23:42.000
you take, you know, any data set, you know,
link |
00:23:43.480
you take these fractals or you take whatever your data set,
link |
00:23:46.480
whatever you have, say a picture of Conway's Game of Life,
link |
00:23:51.000
and you run through all programs.
link |
00:23:53.200
You take programs of size one, two, three, four,
link |
00:23:55.280
and run them all in parallel
link |
00:23:57.080
in so called dovetailing fashion,
link |
00:23:59.080
give them computational resources,
link |
00:24:01.320
the first one 50%, the second one half of that, and so on,
link |
00:24:03.880
and let them run, wait until they halt,
link |
00:24:06.960
give an output, compare it to your data
link |
00:24:09.120
and if some of these programs produce the correct data,
link |
00:24:12.360
then you stop and then you have already some program.
link |
00:24:14.480
It may be a long program because it's faster
link |
00:24:16.680
and then you continue and you get shorter
link |
00:24:18.760
and shorter programs until you eventually
link |
00:24:20.760
find the shortest program.
link |
00:24:22.520
The interesting thing, you can never know
link |
00:24:24.040
whether it's the shortest program
link |
00:24:25.520
because there could be an even shorter program
link |
00:24:27.440
which is just even slower and you just have to wait here.
link |
00:24:32.200
But asymptotically and actually after a finite time,
link |
00:24:35.000
you have the shortest program.
link |
00:24:36.480
So this is a theoretical but completely impractical way
link |
00:24:40.440
of finding the underlying structure in every data set
link |
00:24:47.440
and that is what Solomonoff induction does
link |
00:24:49.040
and Kolmogorov complexity.
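The scheduling idea can be shown with a runnable toy. Here the "programs" are a hand-picked list of Python generators rather than a true enumeration of all programs, so this is only a sketch of the dovetailing schedule, not of Solomonoff induction itself:

```python
import itertools

def search_dovetail(programs, target):
    """
    Toy dovetailed search: `programs` is a list of generator functions,
    shortest/simplest first. Each round, program i is advanced
    2**(n-1-i) steps, so earlier ("shorter") programs get exponentially
    more resources and a non-halting program can never block the search.
    Returns the index of the first program whose output starts with `target`.
    """
    n = len(programs)
    gens = [p() for p in programs]
    outputs = [[] for _ in programs]
    for _round in itertools.count():
        for i, g in enumerate(gens):
            for _ in range(2 ** (n - 1 - i)):
                outputs[i].append(next(g))
            if outputs[i][:len(target)] == target:
                return i

# Hand-picked toy "programs", none of which ever halt:
def ones():            # 1, 1, 1, ...  (the "print 1 in a loop" program)
    while True:
        yield 1

def alternating():     # 1, 0, 1, 0, ...
    x = 1
    while True:
        yield x
        x = 1 - x

def counter():         # 1, 2, 3, ...
    k = 1
    while True:
        yield k
        k += 1

data = [1, 2, 3, 4, 5]
print(search_dovetail([ones, alternating, counter], data))  # -> 2 (the counter)
```

Because shorter candidates get exponentially more steps per round, a non-halting candidate can never stall the enumeration, which is what makes the search well defined in the limit.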
link |
00:24:50.680
In practice, of course, we have to approach the problem
link |
00:24:52.680
more intelligently.
link |
00:24:53.760
And then if you take resource limitations into account,
link |
00:24:58.760
there's, for instance, a field of pseudo random numbers
link |
00:25:01.760
and these are deterministic sequences,
link |
00:25:06.760
but no algorithm which is fast,
link |
00:25:09.120
fast means runs in polynomial time,
link |
00:25:10.800
can detect that it's actually deterministic.
link |
00:25:13.800
So we can produce interesting,
link |
00:25:16.040
I mean, random numbers maybe not that interesting,
link |
00:25:17.680
but just an example.
link |
00:25:18.520
We can produce complex looking data
link |
00:25:22.480
and we can then prove that no fast algorithm
link |
00:25:25.280
can detect the underlying pattern.
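A concrete illustration (a sketch only; the provable-indistinguishability statement applies to cryptographic generators under standard hardness assumptions, not to this exact snippet): the whole bit sequence below is determined by a short seed, so its Kolmogorov complexity is tiny, yet efficiently telling it apart from fair coin flips is believed to be infeasible.

```python
import hashlib

def pseudo_random_bits(seed: bytes, n_bits: int) -> str:
    """
    Deterministic bit string from a short seed (SHA-256 in counter mode).
    The whole output is fixed by this function plus the seed, so it is
    highly compressible, yet it passes ordinary statistical tests.
    """
    out = ""
    counter = 0
    while len(out) < n_bits:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        out += "".join(format(byte, "08b") for byte in block)
        counter += 1
    return out[:n_bits]

sample = pseudo_random_bits(b"short seed", 256)
print(sample)
print(sample.count("1"))  # close to 128, as a truly random string would give
```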
link |
00:25:27.440
Which is, unfortunately, that's a big challenge
link |
00:25:34.240
for our search for simple programs
link |
00:25:35.920
in the space of artificial intelligence, perhaps.
link |
00:25:38.440
Yes, it definitely is for artificial intelligence
link |
00:25:40.480
and it's quite surprising that it's, I can't say easy.
link |
00:25:44.520
I mean, physicists worked really hard to find these theories,
link |
00:25:48.240
but apparently it was possible for human minds
link |
00:25:51.920
to find these simple rules in the universe.
link |
00:25:54.040
It could have been different, right?
link |
00:25:59.200
It could have been different.
link |
00:26:00.200
It's awe inspiring.
link |
00:26:04.720
So let me ask another absurdly big question.
link |
00:26:09.120
What is intelligence in your view?
link |
00:26:13.280
So I have, of course, a definition.
link |
00:26:17.080
I wasn't sure what you're going to say
link |
00:26:18.240
because you could have just as easily said,
link |
00:26:20.000
I have no clue.
link |
00:26:21.520
Which many people would say,
link |
00:26:23.360
but I'm not modest in this question.
link |
00:26:26.680
So the informal version,
link |
00:26:31.440
which I worked out together with Shane Legg,
link |
00:26:33.120
who cofounded DeepMind,
link |
00:26:35.520
is that intelligence measures an agent's ability
link |
00:26:38.720
to perform well in a wide range of environments.
link |
00:26:42.880
So that doesn't sound very impressive.
link |
00:26:45.800
And these words have been very carefully chosen
link |
00:26:49.560
and there is a mathematical theory behind that
link |
00:26:52.960
and we come back to that later.
link |
00:26:54.920
And if you look at this definition by itself,
link |
00:26:59.640
it seems like, yeah, okay,
link |
00:27:01.160
but it seems a lot of things are missing.
link |
00:27:03.400
But if you think it through,
link |
00:27:05.920
then you realize that most,
link |
00:27:08.760
and I claim all of the other traits,
link |
00:27:10.680
at least of rational intelligence,
link |
00:27:12.600
which we usually associate with intelligence,
link |
00:27:14.440
are emergent phenomena from this definition.
link |
00:27:17.960
Like creativity, memorization, planning, knowledge.
link |
00:27:22.160
You all need that in order to perform well
link |
00:27:25.000
in a wide range of environments.
link |
00:27:27.400
So you don't have to explicitly mention
link |
00:27:29.000
that in a definition.
link |
00:27:29.960
Interesting.
link |
00:27:30.800
So yeah, so the consciousness, abstract reasoning,
link |
00:27:34.040
all these kinds of things are just emergent phenomena
link |
00:27:36.200
that help you towards...
link |
00:27:40.640
can you say the definition again?
link |
00:27:41.880
So multiple environments.
link |
00:27:44.160
Did you mention the word goals?
link |
00:27:45.880
No, but we have an alternative definition.
link |
00:27:47.760
Instead of performing well,
link |
00:27:48.800
you can just replace it by goals.
link |
00:27:50.160
So intelligence measures an agent's ability
link |
00:27:53.280
to achieve goals in a wide range of environments.
link |
00:27:55.680
That's more or less equal.
link |
00:27:56.520
But interesting,
link |
00:27:57.360
because in there, there's an injection of the word goals.
link |
00:27:59.680
So we want to specify there should be a goal.
link |
00:28:03.160
Yeah, but perform well is sort of,
link |
00:28:04.800
what does it mean?
link |
00:28:05.760
It's the same problem.
link |
00:28:06.640
Yeah.
link |
00:28:07.760
There's a little bit gray area,
link |
00:28:09.240
but it's much closer to something that could be formalized.
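The formalized counterpart, from Legg and Hutter's "Universal Intelligence" work, looks roughly like this (a sketch omitting the technical conditions on the environment class E and on reward discounting):

```latex
% Universal intelligence of an agent/policy \pi:
% expected total reward V in each computable environment \mu,
% weighted by that environment's simplicity 2^{-K(\mu)}.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
```

"Performing well in a wide range of environments" becomes: high expected reward in every computable environment, with simpler environments counting more.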
link |
00:28:14.080
In your view, are humans,
link |
00:28:16.320
where do humans fit into that definition?
link |
00:28:18.320
Are they general intelligence systems
link |
00:28:21.920
that are able to perform in,
link |
00:28:24.120
like how good are they at fulfilling that definition
link |
00:28:27.840
at performing well in multiple environments?
link |
00:28:31.200
Yeah, that's a big question.
link |
00:28:32.760
I mean, the humans are performing best among all species.
link |
00:28:37.640
We know of, yeah.
link |
00:28:40.680
Depends.
link |
00:28:41.520
You could say that trees and plants are doing a better job.
link |
00:28:44.440
They'll probably outlast us.
link |
00:28:46.280
Yeah, but they are in a much more narrow environment, right?
link |
00:28:49.400
I mean, you just have a little bit of air pollutions
link |
00:28:51.680
and these trees die and we can adapt, right?
link |
00:28:54.040
We build houses, we build filters,
link |
00:28:55.440
we do geoengineering.
link |
00:28:59.480
So the multiple environment part.
link |
00:29:01.040
Yeah, that is very important, yeah.
link |
00:29:02.600
So that distinguishes narrow intelligence
link |
00:29:04.640
from wide intelligence, also in the AI research.
link |
00:29:08.400
So let me ask the Alan Turing question.
link |
00:29:12.080
Can machines think?
link |
00:29:14.160
Can machines be intelligent?
link |
00:29:15.880
So in your view, I have to kind of ask,
link |
00:29:19.560
the answer is probably yes,
link |
00:29:20.560
but I want to kind of hear what your thoughts on it.
link |
00:29:24.360
Can machines be made to fulfill this definition
link |
00:29:27.720
of intelligence, to achieve intelligence?
link |
00:29:30.760
Well, we are sort of getting there
link |
00:29:33.000
and on a small scale, we are already there.
link |
00:29:36.720
The wide range of environments are missing,
link |
00:29:38.960
but we have self driving cars,
link |
00:29:40.320
we have programs which play Go and chess,
link |
00:29:42.720
we have speech recognition.
link |
00:29:44.440
So that's pretty amazing,
link |
00:29:45.480
but these are narrow environments.
link |
00:29:49.560
But if you look at AlphaZero,
link |
00:29:51.000
that was also developed by DeepMind.
link |
00:29:53.720
I mean, got famous with AlphaGo
link |
00:29:55.400
and then came AlphaZero a year later.
link |
00:29:57.720
That was truly amazing.
link |
00:29:59.280
So reinforcement learning algorithm,
link |
00:30:01.800
which is able just by self play,
link |
00:30:04.440
to play chess and then also Go.
link |
00:30:08.560
And I mean, yes, they're both games,
link |
00:30:10.120
but they're quite different games.
link |
00:30:11.400
And you don't feed them the rules of the game.
link |
00:30:15.120
And the most remarkable thing,
link |
00:30:16.720
which is still a mystery to me,
link |
00:30:18.080
that usually for any decent chess program,
link |
00:30:21.040
I don't know much about Go,
link |
00:30:22.800
you need opening books and end game tables and so on too.
link |
00:30:26.960
And nothing in there, nothing was put in there.
link |
00:30:29.680
Especially with AlphaZero,
link |
00:30:31.360
the self playing mechanism starting from scratch,
link |
00:30:33.520
being able to learn actually new strategies is...
link |
00:30:39.040
Yeah, it rediscovered all these famous openings
link |
00:30:43.040
within four hours by itself.
link |
00:30:46.280
What I was really happy about,
link |
00:30:47.480
I'm a terrible chess player, but I like the Queen's Gambit.
link |
00:30:50.200
And AlphaZero figured out that this is the best opening.
link |
00:30:53.160
Finally, somebody proved you correct.
link |
00:30:59.920
So yes, to answer your question,
link |
00:31:01.680
yes, I believe that general intelligence is possible.
link |
00:31:05.040
And it also, I mean, it depends how you define it.
link |
00:31:08.280
Do you say AGI with general intelligence,
link |
00:31:11.520
artificial intelligence,
link |
00:31:13.600
only refers to if you achieve human level
link |
00:31:16.120
or a subhuman level, but quite broad,
link |
00:31:18.600
is it also general intelligence?
link |
00:31:19.960
So we have to distinguish,
link |
00:31:20.920
or it's only super human intelligence,
link |
00:31:23.360
general artificial intelligence.
link |
00:31:25.120
Is there a test in your mind,
link |
00:31:26.680
like the Turing test for natural language
link |
00:31:28.680
or some other test that would impress the heck out of you
link |
00:31:32.000
that would kind of cross the line of your sense
link |
00:31:36.960
of intelligence within the framework that you said?
link |
00:31:39.840
Well, the Turing test has been criticized a lot,
link |
00:31:42.960
but I think it's not as bad as some people think.
link |
00:31:45.880
And some people think it's too strong.
link |
00:31:47.680
So it tests not just for a system to be intelligent,
link |
00:31:52.120
but it also has to fake being human, to deceive,
link |
00:31:56.960
which is much harder.
link |
00:31:58.960
And on the other hand, they say it's too weak
link |
00:32:01.160
because it just maybe fakes emotions
link |
00:32:05.640
or intelligent behavior.
link |
00:32:07.680
It's not real.
link |
00:32:09.400
But I don't think that's the problem or a big problem.
link |
00:32:11.960
So if you would pass the Turing test,
link |
00:32:15.720
so a conversation over terminal with a bot for an hour,
link |
00:32:20.600
or maybe a day or so,
link |
00:32:21.760
and you can fool a human into not knowing
link |
00:32:25.080
whether this is a human or not,
link |
00:32:26.120
so that's the Turing test,
link |
00:32:27.720
I would be truly impressed.
link |
00:32:30.240
And we have this annual competition, the Loebner Prize.
link |
00:32:34.360
And I mean, it started with ELIZA,
link |
00:32:35.960
that was the first conversational program.
link |
00:32:38.200
And what is it called?
link |
00:32:40.200
The Japanese Mitsuku, or so.
link |
00:32:41.760
That's the winner of the last couple of years.
link |
00:32:44.680
And well.
link |
00:32:45.520
Quite impressive.
link |
00:32:46.360
Yeah, it's quite impressive.
link |
00:32:47.200
And then Google has developed Meena, right?
link |
00:32:50.240
Just recently, that's an open domain conversational bot,
link |
00:32:55.200
just a couple of weeks ago, I think.
link |
00:32:57.560
Yeah, I kind of like the metric
link |
00:32:58.760
that sort of the Alexa Prize has proposed.
link |
00:33:01.680
I mean, maybe it's obvious to you.
link |
00:33:02.880
It wasn't to me of setting sort of a length
link |
00:33:06.400
of a conversation.
link |
00:33:07.720
Like you want the bot to be sufficiently interesting
link |
00:33:10.920
that you would want to keep talking to it
link |
00:33:12.360
for like 20 minutes.
link |
00:33:13.640
And that's a surprisingly effective aggregate metric,
link |
00:33:19.520
because really, like nobody has the patience
link |
00:33:24.960
to be able to talk to a bot that's not interesting
link |
00:33:27.720
and intelligent and witty,
link |
00:33:29.000
and is able to go on to different tangents, jump domains,
link |
00:33:32.960
be able to say something interesting
link |
00:33:35.360
to maintain your attention.
link |
00:33:36.680
And maybe many humans will also fail this test.
link |
00:33:39.040
That's the thing, unfortunately:
link |
00:33:42.840
just like with autonomous vehicles, with chatbots,
link |
00:33:45.400
we also set a bar that's way too high to reach.
link |
00:33:48.200
I said, you know, the Turing test is not as bad
link |
00:33:50.000
as some people believe,
link |
00:33:51.160
but what is really not useful about the Turing test,
link |
00:33:55.920
it gives us no guidance
link |
00:33:58.160
how to develop these systems in the first place.
link |
00:34:00.560
Of course, you know, we can develop them by trial and error
link |
00:34:02.960
and, you know, do whatever and then run the test
link |
00:34:05.400
and see whether it works or not.
link |
00:34:06.880
But a mathematical definition of intelligence
link |
00:34:12.320
gives us, you know, an objective,
link |
00:34:16.200
which we can then analyze by theoretical tools
link |
00:34:19.520
or computational tools, and, you know,
link |
00:34:22.480
maybe even prove how close we are.
link |
00:34:25.160
And we will come back to that later with the AIXI model.
link |
00:34:28.760
So, I mentioned the compression, right?
link |
00:34:31.280
So in natural language processing,
link |
00:34:33.320
they have achieved amazing results.
link |
00:34:36.760
And one way to test this, of course,
link |
00:34:38.760
you know, take the system, you train it,
link |
00:34:40.280
and then you see how well it performs on the task.
link |
00:34:43.200
But a lot of performance measurement
link |
00:34:47.520
is done by so called perplexity,
link |
00:34:49.040
which is essentially the same as complexity
link |
00:34:51.920
or compression length.
link |
00:34:53.240
So the NLP community develops new systems
link |
00:34:55.920
and then they measure the compression length
link |
00:34:57.520
and then they have rankings,
link |
00:35:01.280
because there's a strong correlation
link |
00:35:02.800
between compressing well,
link |
00:35:04.640
and then the system's performing well at the task at hand.
link |
00:35:07.560
It's not perfect, but it's good enough
link |
00:35:09.840
for them as an intermediate aim.
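The link being pointed at, stated as a rough formula: a model that assigns probability p to the data can code it in about minus log2 p bits, so per-token compressed length and perplexity are two views of the same number (leaderboards differ only in normalization details):

```latex
% Per-token cross-entropy H (in bits), perplexity, and compressed length
% for a model p evaluated on tokens x_1, ..., x_T.
H \;=\; -\frac{1}{T} \sum_{t=1}^{T} \log_2 p(x_t \mid x_{<t}),
\qquad
\mathrm{perplexity} \;=\; 2^{H},
\qquad
\text{compressed length} \;\approx\; T \cdot H \ \text{bits}
```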
link |
00:35:14.640
So you mean a measure,
link |
00:35:16.040
so this is kind of almost returning
link |
00:35:18.400
to Kolmogorov complexity.
link |
00:35:19.800
So you're saying good compression
link |
00:35:22.520
usually means good intelligence.
link |
00:35:24.960
Yes.
link |
00:35:27.040
So you mentioned you're one of the only people
link |
00:35:31.120
who dared boldly to try to formalize
link |
00:35:36.280
the idea of artificial general intelligence,
link |
00:35:38.720
to have a mathematical framework for intelligence,
link |
00:35:42.840
just like as we mentioned,
link |
00:35:45.000
termed AIXI, A, I, X, I.
link |
00:35:49.200
So let me ask the basic question.
link |
00:35:51.760
What is AIXI?
link |
00:35:54.760
Okay, so let me first say what it stands for because...
link |
00:35:57.960
What it stands for, actually,
link |
00:35:58.880
that's probably the more basic question.
link |
00:36:00.360
What it...
link |
00:36:01.640
The first question is usually how it's pronounced,
link |
00:36:04.400
but finally I put it on the website how it's pronounced
link |
00:36:07.240
and you figured it out.
link |
00:36:10.520
The name comes from AI, artificial intelligence,
link |
00:36:13.280
and the XI is the Greek letter xi,
link |
00:36:16.400
which is used for Solomonoff's distribution
link |
00:36:19.680
for quite stupid reasons,
link |
00:36:22.000
which I'm not willing to repeat here in front of camera.
link |
00:36:24.800
Sure.
link |
00:36:27.040
So it just happened to be more or less arbitrary.
link |
00:36:29.840
I chose the Xi.
link |
00:36:31.600
But it also has nice other interpretations.
link |
00:36:34.680
So there are actions and perceptions in this model.
link |
00:36:38.360
An agent has actions and perceptions over time.
link |
00:36:42.000
So this is A index I, X index I.
link |
00:36:44.680
So there's the action at time I
link |
00:36:46.120
and then followed by perception at time I.
link |
00:36:49.040
Yeah, we'll go with that.
link |
00:36:50.440
I'll edit out the first part.
link |
00:36:52.320
I'm just kidding.
link |
00:36:53.320
I have some more interpretations.
link |
00:36:55.120
So at some point, maybe five years ago or 10 years ago,
link |
00:36:59.280
I discovered in Barcelona, it was on a big church
link |
00:37:04.720
there was some text engraved in stone,
link |
00:37:08.480
and the word 'així' appeared there a couple of times.
link |
00:37:11.480
I was very surprised and happy about that.
link |
00:37:16.960
And I looked it up.
link |
00:37:17.800
So it is a Catalan language
link |
00:37:19.440
and it means, with some interpretation, 'that's it,'
link |
00:37:22.280
that's the right thing to do.
link |
00:37:23.320
Yeah, eureka.
link |
00:37:24.800
Oh, so it's almost like destined somehow.
link |
00:37:27.920
It came to you in a dream.
link |
00:37:32.080
And similar, there's a Chinese word, Aixi,
link |
00:37:34.280
also written like Aixi, if you transcribe that to Pinyin.
link |
00:37:37.480
And the final one is that it's AI crossed with induction
link |
00:37:41.120
because that is, and that's going more to the content now.
link |
00:37:44.680
So good old fashioned AI is more about planning
link |
00:37:47.400
in a known, deterministic world,
link |
00:37:48.760
and induction is more about often IID data
link |
00:37:51.800
and inferring models.
link |
00:37:53.000
And essentially what this Aixi model does
link |
00:37:54.880
is combining these two.
link |
00:37:56.160
And I actually also recently, I think heard that
link |
00:37:59.480
in Japanese AI means love.
link |
00:38:02.280
So if you can combine XI somehow with that,
link |
00:38:06.720
I think we can, there might be some interesting ideas there.
link |
00:38:10.320
So Aixi, let's then take the next step.
link |
00:38:12.640
Can you maybe talk at the big level
link |
00:38:16.560
of what is this mathematical framework?
link |
00:38:19.480
Yeah, so it consists essentially of two parts.
link |
00:38:22.560
One is the learning and induction and prediction part.
link |
00:38:26.520
And the other one is the planning part.
link |
00:38:28.680
So let's come first to the learning,
link |
00:38:31.200
induction, prediction part,
link |
00:38:32.840
which essentially I explained already before.
link |
00:38:35.640
So what we need for any agent to act well
link |
00:38:40.680
is that it can somehow predict what happens.
link |
00:38:43.480
I mean, if you have no idea what your actions do,
link |
00:38:47.080
how can you decide which actions are good or not?
link |
00:38:48.920
So you need to have some model of what effect your actions have.
link |
00:38:52.840
So what you do is you have some experience,
link |
00:38:56.160
you build models like scientists of your experience,
link |
00:38:59.360
then you hope these models are roughly correct,
link |
00:39:01.400
and then you use these models for prediction.
link |
00:39:03.480
And the model is, sorry to interrupt,
link |
00:39:05.200
and the model is based on your perception of the world,
link |
00:39:08.360
how your actions will affect that world.
link |
00:39:10.480
That's not...
link |
00:39:12.080
So how do you think about a model?
link |
00:39:12.920
That's not the important part,
link |
00:39:14.280
but it is technically important,
link |
00:39:16.000
but at this stage we can just think about predicting,
link |
00:39:18.240
let's say, stock market data, weather data,
link |
00:39:20.760
or IQ sequences, one, two, three, four, five,
link |
00:39:23.240
what comes next, yeah?
link |
00:39:24.520
So of course our actions affect what we're doing,
link |
00:39:28.680
but I'll come back to that in a second.
link |
00:39:30.240
So, and I'll keep just interrupting.
link |
00:39:32.160
So just to draw a line between prediction and planning,
link |
00:39:37.000
what do you mean by prediction in this way?
link |
00:39:40.880
It's trying to predict the environment
link |
00:39:43.640
without your long term action in the environment?
link |
00:39:47.280
What is prediction?
link |
00:39:49.480
Okay, if you want to put the actions in now,
link |
00:39:51.160
okay, then let's put it in now, yeah?
link |
00:39:53.680
So...
link |
00:39:54.720
We don't have to put them now.
link |
00:39:55.560
Yeah, yeah.
link |
00:39:56.400
Scratch it, scratch it, dumb question, okay.
link |
00:39:58.360
So the simplest form of prediction is
link |
00:40:01.280
that you just have data which you passively observe,
link |
00:40:04.840
and you want to predict what happens
link |
00:40:06.160
without interfering, as I said,
link |
00:40:08.960
weather forecasting, stock market, IQ sequences,
link |
00:40:12.120
or just anything, okay?
link |
00:40:16.240
And Solomonoff's theory of induction is based on compression,
link |
00:40:18.920
so you look for the shortest program
link |
00:40:20.400
which describes your data sequence,
link |
00:40:22.240
and then you take this program, run it,
link |
00:40:24.440
it reproduces your data sequence by definition,
link |
00:40:26.920
and then you let it continue running,
link |
00:40:29.000
and then it will produce some predictions,
link |
00:40:30.880
and you can rigorously prove that for any prediction task,
link |
00:40:37.160
this is essentially the best possible predictor.
link |
00:40:40.040
Of course, if there's a prediction task,
link |
00:40:43.680
or a task which is unpredictable,
link |
00:40:45.080
like, you know, you have fair coin flips.
link |
00:40:46.720
Yeah, I cannot predict the next fair coin flip.
link |
00:40:48.160
What Solomonoff induction does is say,
link |
00:40:49.160
okay, next head is probably 50%.
link |
00:40:51.640
It's the best you can do.
link |
00:40:52.600
So if something is unpredictable,
link |
00:40:54.080
Solomonoff induction will also not magically predict it.
link |
00:40:56.600
But if there is some pattern and predictability,
link |
00:40:59.640
then Solomonoff induction will figure that out eventually,
link |
00:41:03.760
and not just eventually, but rather quickly,
link |
00:41:06.040
and you can prove convergence rates,
link |
00:41:10.640
whatever your data is.
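The kind of bound meant here, sketched for binary sequences sampled from any computable distribution mu (the constant is the one usually quoted for Solomonoff's result; see Hutter's book for the precise statement):

```latex
% Total expected squared prediction error of the universal predictor M
% is finite and bounded by the complexity of the true environment \mu.
\sum_{t=1}^{\infty} \mathbb{E}_\mu\!\left[
  \big( M(x_t{=}1 \mid x_{<t}) - \mu(x_t{=}1 \mid x_{<t}) \big)^2
\right]
\;\le\; \tfrac{\ln 2}{2}\, K(\mu)
```

A finite total error means the predictions converge to the true probabilities quickly, for every computable source.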
link |
00:41:11.720
So there's pure magic in a sense.
link |
00:41:14.760
What's the catch?
link |
00:41:15.600
Well, the catch is that it's not computable,
link |
00:41:17.040
and we come back to that later.
link |
00:41:18.200
You cannot just implement it
link |
00:41:19.720
even with Google resources here,
link |
00:41:21.160
and run it and predict the stock market and become rich.
link |
00:41:24.000
I mean, Ray Solomonoff already tried it at the time.
link |
00:41:28.160
But so the basic task is you're in the environment,
link |
00:41:31.680
and you're interacting with the environment
link |
00:41:33.200
to try to learn to model that environment,
link |
00:41:35.400
and the model is in the space of all these programs,
link |
00:41:38.760
and your goal is to get a bunch of programs that are simple.
link |
00:41:41.360
Yeah, so let's go to the actions now.
link |
00:41:44.040
But actually, good that you asked.
link |
00:41:45.080
Usually I skip this part,
link |
00:41:46.400
although there is also a minor contribution which I did,
link |
00:41:48.760
so the action part,
link |
00:41:49.720
but I usually sort of just jump to the decision part.
link |
00:41:51.800
So let me explain the action part now.
link |
00:41:53.400
Thanks for asking.
link |
00:41:55.440
So you have to modify it a little bit
link |
00:41:58.760
by now not just predicting a sequence
link |
00:42:01.080
which just comes to you,
link |
00:42:03.240
but you have an observation, then you act somehow,
link |
00:42:06.760
and then you want to predict the next observation
link |
00:42:09.120
based on the past observation and your action.
link |
00:42:11.920
Then you take the next action.
link |
00:42:14.680
You don't care about predicting it because you're doing it.
link |
00:42:17.240
Then you get the next observation,
link |
00:42:19.040
and you want, well, before you get it,
link |
00:42:20.680
you want to predict it, again,
link |
00:42:21.880
based on your past action and observation sequence.
link |
00:42:24.880
You just condition extra on your actions.
link |
00:42:28.720
There's an interesting alternative
link |
00:42:30.520
that you also try to predict your own actions.
link |
00:42:35.600
If you want.
link |
00:42:36.600
In the past or the future?
link |
00:42:37.960
In your future actions.
link |
00:42:39.720
That's interesting.
link |
00:42:40.560
Yeah. Wait, let me wrap.
link |
00:42:43.480
I think my brain just broke.
link |
00:42:45.800
We should maybe discuss that later
link |
00:42:47.440
after I've explained the AIXI model.
link |
00:42:48.760
That's an interesting variation.
link |
00:42:50.160
But that is a really interesting variation,
link |
00:42:52.080
and a quick comment.
link |
00:42:53.080
I don't know if you want to insert that in here,
link |
00:42:55.440
but you're looking at the, in terms of observations,
link |
00:42:59.200
you're looking at the entire, the big history,
link |
00:43:01.640
the long history of the observations.
link |
00:43:03.320
Exactly. That's very important.
link |
00:43:04.440
The whole history from birth sort of of the agent,
link |
00:43:07.520
and we can come back to that.
link |
00:43:09.080
And also why this is important.
link |
00:43:10.840
Often, you know, in RL, you have MDPs,
link |
00:43:13.560
Markov decision processes, which are much more limiting.
link |
00:43:15.840
Okay. So now we can predict conditioned on actions.
link |
00:43:19.880
So even if you influence the environment,
link |
00:43:21.600
but prediction is not all we want to do, right?
link |
00:43:24.120
We also want to act really in the world.
link |
00:43:26.960
And the question is how to choose the actions.
link |
00:43:29.120
And we don't want to greedily choose the actions,
link |
00:43:33.320
you know, just, you know, what is best in the next time step.
link |
00:43:36.480
And we first, I should say, you know, what is, you know,
link |
00:43:38.360
how do we measure performance?
link |
00:43:39.960
So we measure performance by giving the agent reward.
link |
00:43:43.360
That's the so called reinforcement learning framework.
link |
00:43:45.640
So every time step, you can give it a positive reward
link |
00:43:48.560
or negative reward, or maybe no reward.
link |
00:43:50.320
It could be very scarce, right?
link |
00:43:51.880
Like if you play chess, just at the end of the game,
link |
00:43:54.160
you give plus one for winning or minus one for losing.
link |
00:43:56.920
So in the AIXI framework, that's completely sufficient.
link |
00:43:59.240
So occasionally you give a reward signal
link |
00:44:01.440
and you ask the agent to maximize reward,
link |
00:44:04.040
but not greedily sort of, you know, the next one, next one,
link |
00:44:06.400
because that's very bad in the long run if you're greedy.
link |
00:44:10.040
So, but over the lifetime of the agent.
link |
00:44:12.440
So let's assume the agent lives for M time steps,
link |
00:44:14.600
or say dies in sort of a hundred years sharp.
link |
00:44:16.920
That's just, you know, the simplest model to explain.
link |
00:44:19.720
So it looks at the future reward sum
link |
00:44:22.120
and ask what is my action sequence,
link |
00:44:24.840
or actually more precisely my policy,
link |
00:44:26.920
which leads in expectation, because I don't know the world,
link |
00:44:32.160
to the maximum reward sum.
link |
00:44:34.120
Let me give you an analogy.
link |
00:44:36.120
In chess, for instance,
link |
00:44:38.240
we know how to play optimally in theory.
link |
00:44:40.320
It's just a minimax strategy.
link |
00:44:42.160
I play the move which seems best to me
link |
00:44:44.400
under the assumption that the opponent plays the move
link |
00:44:46.840
which is best for him.
link |
00:44:48.600
So best for him, so worst for me, under the assumption that
link |
00:44:52.240
I again play the best move.
link |
00:44:54.040
And then you have this expectimax tree
link |
00:44:55.960
to the end of the game, and then you back propagate,
link |
00:44:58.880
and then you get the best possible move.
link |
00:45:00.760
So that is the optimal strategy,
link |
00:45:02.160
which von Neumann already figured out a long time ago,
link |
00:45:06.200
for playing adversarial games.
link |
00:45:09.000
Luckily, or maybe unluckily for the theory,
link |
00:45:11.640
it becomes harder.
link |
00:45:12.480
The world is not always adversarial.
link |
00:45:14.960
So it can be, if there are other humans,
link |
00:45:17.240
even cooperative, or nature is usually,
link |
00:45:20.120
I mean, the dead nature is stochastic, you know,
link |
00:45:22.720
things just happen randomly, or don't care about you.
link |
00:45:26.840
So what you have to take into account is the noise,
link |
00:45:29.440
and not necessarily adversariality.
link |
00:45:30.760
So you replace the minimum on the opponent's side
link |
00:45:34.040
by an expectation,
link |
00:45:36.040
which is general enough to include also adversarial cases.
link |
00:45:40.080
So now instead of a minimax strategy,
link |
00:45:41.600
you have an expectimax strategy.
link |
00:45:43.840
So far, so good.
link |
00:45:44.680
So that is well known.
link |
00:45:45.520
It's called sequential decision theory.
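As a minimal sketch of the expectimax recursion just described, assuming a toy environment whose transition probabilities are known; the states, actions and rewards below are invented for illustration:

```python
# Minimal expectimax sketch for the sequential decision setting just
# described: maximize over own actions, take an expectation (instead of
# a minimum) over the environment's stochastic response. The environment
# model is a hypothetical toy: (state, action) -> [(prob, reward, next_state)].
TOY_MODEL = {
    ("s0", "safe"):    [(1.0, 1.0, "s0")],
    ("s0", "risky"):   [(0.5, 3.0, "s0"), (0.5, 0.0, "dead")],
    ("dead", "safe"):  [(1.0, 0.0, "dead")],
    ("dead", "risky"): [(1.0, 0.0, "dead")],
}
ACTIONS = ("safe", "risky")

def value(state, horizon):
    """Best achievable expected reward sum within `horizon` steps."""
    if horizon == 0:
        return 0.0
    return max(
        sum(p * (r + value(nxt, horizon - 1)) for p, r, nxt in TOY_MODEL[(state, a)])
        for a in ACTIONS)

def best_action(state, horizon):
    return max(ACTIONS, key=lambda a: sum(
        p * (r + value(nxt, horizon - 1)) for p, r, nxt in TOY_MODEL[(state, a)]))

print(best_action("s0", horizon=1))   # "risky": highest immediate expectation
print(best_action("s0", horizon=10))  # "safe": the risky action can trap you
```

The point of the toy numbers is that the greedy one-step choice and the long-horizon choice differ: looking further ahead changes which action comes out best.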
link |
00:45:48.040
But the question is,
link |
00:45:49.480
on which probability distribution do you base that?
link |
00:45:52.480
If I have the true probability distribution,
link |
00:45:55.400
like say I play backgammon, right?
link |
00:45:56.960
There's dice, and there's certain randomness involved.
link |
00:45:59.360
Yeah, I can calculate probabilities
link |
00:46:00.960
and feed it into the expectimax,
link |
00:46:02.640
or the sequential decision tree,
link |
00:46:04.160
come up with the optimal decision if I have enough compute.
link |
00:46:07.160
But for the real world, we don't know that, you know,
link |
00:46:09.760
what is the probability the driver in front of me brakes?
link |
00:46:13.960
I don't know.
link |
00:46:14.920
So depends on all kinds of things,
link |
00:46:16.920
and especially new situations, I don't know.
link |
00:46:19.640
So this is this unknown thing about prediction,
link |
00:46:22.520
and there's where Solomonov comes in.
link |
00:46:24.240
So what you do is in sequential decision tree,
link |
00:46:26.360
you just replace the true distribution,
link |
00:46:28.680
which we don't know, by this universal distribution.
link |
00:46:32.960
I didn't explicitly talk about it,
link |
00:46:34.640
but this is used for universal prediction
link |
00:46:36.800
and plug it into the sequential decision tree mechanism.
link |
00:46:40.280
And then you get the best of both worlds.
link |
00:46:42.680
You have a long term planning agent,
link |
00:46:45.560
but it doesn't need to know anything about the world
link |
00:46:48.080
because the Solomonov induction part learns.
link |
00:46:51.640
Can you explicitly try to describe
link |
00:46:54.720
the universal distribution
link |
00:46:56.080
and how Solomonov induction plays a role here?
link |
00:46:59.680
I'm trying to understand.
link |
00:47:00.760
So what it does is, in the simplest case,
link |
00:47:03.840
I said, take the shortest program, describing your data,
link |
00:47:06.600
run it, have a prediction which would be deterministic.
link |
00:47:09.040
Yes. Okay.
link |
00:47:10.760
But you should not just take the shortest program,
link |
00:47:13.160
but also consider the longer ones,
link |
00:47:15.320
but give it lower a priori probability.
link |
00:47:18.480
So in the Bayesian framework, you say a priori,
link |
00:47:22.400
any distribution, which is a model or a stochastic program,
link |
00:47:29.360
has a certain a priori probability,
link |
00:47:30.760
which is two to the minus the length of this program,
link |
00:47:33.320
and I could explain why two to the minus length.
link |
00:47:35.520
So longer programs are punished a priori.
link |
00:47:39.760
And then you multiply it
link |
00:47:41.360
with the so called likelihood function,
link |
00:47:43.840
which is, as the name suggests,
link |
00:47:46.720
is how likely is this model given the data at hand.
link |
00:47:51.000
So if you have a very wrong model,
link |
00:47:53.240
it's very unlikely that this model is true.
link |
00:47:55.000
And so it is very small number.
link |
00:47:56.760
So even if the model is simple, it gets penalized by that.
link |
00:48:00.320
And what you do is then you take just the sum,
link |
00:48:02.480
or this is the average over it.
link |
00:48:04.440
And this gives you a probability distribution.
link |
00:48:07.600
So it's universal distribution or Solomonov distribution.
link |
00:48:10.480
So it's weighted by the simplicity of the program
link |
00:48:13.160
and the likelihood.
link |
00:48:14.120
Yes.
link |
00:48:15.320
It's kind of a nice idea.
link |
00:48:17.280
Yeah.
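In symbols, the construction just described (prior two to the minus length, times likelihood, summed over models) is usually written roughly as follows, with $\nu$ ranging over the candidate models or stochastic programs and $\ell(\nu)$ the length of a shortest program for $\nu$:

$$
\xi(x_{1:n}) \;=\; \sum_{\nu} 2^{-\ell(\nu)}\,\nu(x_{1:n}),
\qquad
\xi(x_{n+1}\mid x_{1:n}) \;=\; \frac{\xi(x_{1:n+1})}{\xi(x_{1:n})},
$$

so a model's posterior weight is proportional to its prior $2^{-\ell(\nu)}$ times its likelihood $\nu(x_{1:n})$, exactly the two factors mentioned above.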
link |
00:48:18.120
So okay, and then you said you're planning N or M,
link |
00:48:23.280
I forgot the letter, steps into the future.
link |
00:48:25.960
So how difficult is that problem?
link |
00:48:28.320
What's involved there?
link |
00:48:29.520
Okay, so basic optimization problem.
link |
00:48:31.320
What are we talking about?
link |
00:48:32.160
Yeah, so you have a planning problem up to horizon M,
link |
00:48:34.920
and that's exponential time in the horizon M,
link |
00:48:38.040
which is, I mean, it's computable, but intractable.
link |
00:48:41.760
I mean, even for chess, it's already intractable
link |
00:48:43.520
to do that exactly.
link |
00:48:44.360
And you know, for Go.
link |
00:48:45.440
But it could also be a discounted kind of framework where.
link |
00:48:48.680
Yeah, so having a hard horizon, you know, at 100 years,
link |
00:48:52.960
it's just for simplicity of discussing the model
link |
00:48:55.800
and also sometimes the math is simple.
link |
00:48:58.960
But there are lots of variations,
link |
00:49:00.000
it's actually quite an interesting parameter.
link |
00:49:03.360
There's nothing really problematic about it,
link |
00:49:07.240
but it's very interesting.
link |
00:49:08.240
So for instance, you think, no,
link |
00:49:09.280
let's let the parameter M tend to infinity, right?
link |
00:49:12.880
You want an agent which lives forever, right?
link |
00:49:15.840
If you do it normally, you have two problems.
link |
00:49:17.480
First, the mathematics breaks down
link |
00:49:19.160
because you have an infinite reward sum,
link |
00:49:21.360
which may give infinity,
link |
00:49:22.720
and getting reward 0.1 every time step is infinity,
link |
00:49:25.560
and giving reward one every time step is infinity,
link |
00:49:27.600
so equally good.
link |
00:49:29.480
Not really what we want.
link |
00:49:31.080
The other problem is that if you have an infinite life,
link |
00:49:35.760
you can be lazy for as long as you want for 10 years
link |
00:49:38.560
and then catch up with the same expected reward.
link |
00:49:41.400
And think about yourself or maybe some friends or so.
link |
00:49:47.240
If they knew they lived forever, why work hard now?
link |
00:49:51.440
Just enjoy your life and then catch up later.
link |
00:49:54.240
So that's another problem with infinite horizon.
link |
00:49:56.600
And you mentioned, yes, we can go to discounting,
link |
00:49:59.760
but then the standard discounting
link |
00:50:01.200
is so called geometric discounting.
link |
00:50:03.080
So a dollar today is about worth
link |
00:50:05.400
as much as $1.05 tomorrow.
link |
00:50:08.320
So if you do the so called geometric discounting,
link |
00:50:10.320
you have introduced an effective horizon.
link |
00:50:12.960
So the agent is now motivated to look ahead
link |
00:50:15.960
a certain amount of time effectively.
link |
00:50:18.360
It's like a moving horizon.
link |
00:50:20.600
And for any fixed effective horizon,
link |
00:50:23.840
there is a problem to solve,
link |
00:50:26.520
which requires a larger horizon.
link |
00:50:28.080
So if I look ahead five time steps,
link |
00:50:30.440
I'm a terrible chess player, right?
link |
00:50:32.440
I'll need to look ahead longer.
link |
00:50:34.560
If I play go, I probably have to look ahead even longer.
link |
00:50:36.720
So for every problem, for every horizon,
link |
00:50:40.280
there is a problem which this horizon cannot solve.
link |
00:50:43.800
But I introduced the so called near harmonic horizon,
link |
00:50:46.960
which goes down with one over T
link |
00:50:48.360
rather than exponential in T,
link |
00:50:49.960
which produces an agent,
link |
00:50:51.600
which effectively looks into the future
link |
00:50:53.880
proportional to its age.
link |
00:50:55.200
So if it's five years old, it plans for five years.
link |
00:50:57.360
If it's 100 years old, it then plans for 100 years.
link |
00:51:00.440
And it's a little bit similar to humans too, right?
link |
00:51:02.480
I mean, children don't plan ahead very long,
link |
00:51:04.320
but when we get adult, we plan ahead longer.
link |
00:51:07.080
Maybe when we get very old,
link |
00:51:08.560
I mean, we know that we don't live forever.
link |
00:51:10.360
Maybe then our horizon shrinks again.
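A quick numerical sanity check of this claim, assuming $\gamma_t \propto 1/t^2$ as one concrete "near-harmonic" choice (the exact form used in the literature differs in details); the effective horizon here is just the number of future steps needed to cover half of the remaining discount mass:

```python
import numpy as np

def effective_horizon(discounts, age, mass=0.5):
    """Smallest h such that the discount weight on the next h steps
    covers `mass` of all remaining discount weight from `age` onward."""
    tail = discounts[age:]
    cum = np.cumsum(tail)
    return int(np.searchsorted(cum, mass * cum[-1])) + 1

T = 200_000                      # crude truncation of the infinite sums
t = np.arange(1, T + 1)
geometric = 0.99 ** t            # standard geometric discounting
near_harmonic = 1.0 / t ** 2     # one concrete "near-harmonic" choice

for age in (10, 100, 1_000, 10_000):
    print(f"age {age:>6}: geometric horizon ~{effective_horizon(geometric, age):>4}, "
          f"near-harmonic horizon ~{effective_horizon(near_harmonic, age):>6}")
# The geometric horizon stays near ~69 steps regardless of age, while the
# near-harmonic horizon grows roughly in proportion to the agent's age.
```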
link |
00:51:12.840
So that's really interesting.
link |
00:51:16.040
So adjusting the horizon,
link |
00:51:18.120
is there some mathematical benefit of that?
link |
00:51:20.680
Or is it just a nice,
link |
00:51:22.960
I mean, intuitively, empirically,
link |
00:51:25.560
it would probably be a good idea
link |
00:51:26.560
to sort of push the horizon back,
link |
00:51:27.960
extend the horizon as you experience more of the world.
link |
00:51:33.480
But is there some mathematical conclusions here
link |
00:51:35.840
that are beneficial?
link |
00:51:37.240
With Solomonov induction, or the prediction part,
link |
00:51:38.920
we have extremely strong, not finite time,
link |
00:51:42.320
but finite data results.
link |
00:51:44.760
So you have so and so much data,
link |
00:51:46.000
then you lose so and so much.
link |
00:51:47.160
So it's a, the theory is really great.
link |
00:51:49.400
With the AIXI model, with the planning part,
link |
00:51:51.920
many results are only asymptotic, which, well, this is...
link |
00:51:56.800
What does asymptotic mean?
link |
00:51:57.640
Asymptotic means you can prove, for instance,
link |
00:51:59.920
that in the long run, if the agent, you know,
link |
00:52:02.360
acts long enough, then, you know,
link |
00:52:04.160
it performs optimal or some nice thing happens.
link |
00:52:06.400
So, but you don't know how fast it converges.
link |
00:52:09.480
So it may converge fast,
link |
00:52:10.880
but we're just not able to prove it
link |
00:52:12.280
because of a difficult problem.
link |
00:52:13.760
Or maybe there's a bug in the model
link |
00:52:17.320
so that it's really that slow.
link |
00:52:19.520
So that is what asymptotic means,
link |
00:52:21.800
sort of eventually, but we don't know how fast.
link |
00:52:24.680
And if I give the agent a fixed horizon M,
link |
00:52:28.920
then I cannot prove asymptotic results, right?
link |
00:52:32.240
So I mean, sort of if it dies in a hundred years,
link |
00:52:35.040
then in a hundred years it's over, I cannot say eventually.
link |
00:52:37.840
So this is the advantage of the discounting
link |
00:52:40.600
that I can prove asymptotic results.
link |
00:52:42.760
So just to clarify, so I, okay, I made,
link |
00:52:46.960
I've built up a model, we're now in the moment of,
link |
00:52:51.720
I have this way of looking several steps ahead.
link |
00:52:55.360
How do I pick what action I will take?
link |
00:52:58.880
It's like with the playing chess, right?
link |
00:53:00.720
You do this minimax.
link |
00:53:02.320
In this case here, do expectimax based on the Solomonov
link |
00:53:05.240
distribution, you propagate back,
link |
00:53:09.000
and then, voila, an action falls out,
link |
00:53:12.080
the action which maximizes the future expected reward
link |
00:53:15.480
on the Solomonov distribution,
link |
00:53:16.800
and then you just take this action.
link |
00:53:18.240
And then repeat.
link |
00:53:19.640
And then you get a new observation,
link |
00:53:20.960
and you feed it in this action observation,
link |
00:53:22.640
then you repeat.
link |
00:53:23.480
And the reward, so on.
link |
00:53:24.880
Yeah, so the reward too, yeah.
link |
00:53:26.760
And then maybe you can even predict your own action.
link |
00:53:29.080
I love that idea.
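Putting the loop just described into one expression: up to notational details, the action choice is usually written roughly as below, with $m$ the horizon, $a$, $o$, $r$ the actions, observations and rewards, and the $2^{-\ell(q)}$ weights coming from the universal (Solomonov) distribution over programs $q$ run on a universal machine $U$:

$$
a_t \;=\; \arg\max_{a_t}\sum_{o_t r_t}\;\cdots\;\max_{a_m}\sum_{o_m r_m}
\big[r_t+\cdots+r_m\big]
\sum_{q\,:\,U(q,\,a_{1:m})\,=\,o_{1:m}r_{1:m}} 2^{-\ell(q)},
$$

that is, an expectimax over future percepts weighted by the universal distribution, maximized over future actions out to the horizon.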
link |
00:53:29.960
But okay, this big framework,
link |
00:53:33.160
what is it, I mean,
link |
00:53:36.560
it's kind of a beautiful mathematical framework
link |
00:53:38.840
to think about artificial general intelligence.
link |
00:53:41.880
What can you, what does it help you intuit
link |
00:53:45.800
about how to build such systems?
link |
00:53:49.080
Or maybe from another perspective,
link |
00:53:51.720
what does it help us in understanding AGI?
link |
00:53:56.720
So when I started in the field,
link |
00:54:00.440
I was always interested in two things.
link |
00:54:01.800
One was AGI, the name didn't exist then,
link |
00:54:05.800
what's called general AI or strong AI,
link |
00:54:09.200
and the physics theory of everything.
link |
00:54:10.800
So I switched back and forth between computer science
link |
00:54:13.120
and physics quite often.
link |
00:54:14.680
You said the theory of everything.
link |
00:54:15.960
The theory of everything, yeah.
link |
00:54:17.360
Those are basically the two biggest problems
link |
00:54:19.240
before all of humanity.
link |
00:54:21.360
Yeah, I can explain if you wanted some later time,
link |
00:54:28.480
why I'm interested in these two questions.
link |
00:54:29.960
Can I ask you in a small tangent,
link |
00:54:32.080
if it was one to be solved,
link |
00:54:37.120
which one would you,
link |
00:54:38.600
if an apple fell on your head
link |
00:54:41.800
and there was a brilliant insight
link |
00:54:43.280
and you could arrive at the solution to one,
link |
00:54:46.360
would it be AGI or the theory of everything?
link |
00:54:49.200
Definitely AGI, because once the AGI problem is solved,
link |
00:54:51.800
I can ask the AGI to solve the other problem for me.
link |
00:54:56.520
Yeah, brilliant input.
link |
00:54:57.720
Okay, so as you were saying about it.
link |
00:55:01.200
Okay, so, and the reason why I didn't settle,
link |
00:55:04.960
I mean, this thought about,
link |
00:55:07.400
once you have solved AGI, it solves all kinds of other,
link |
00:55:09.960
not just the theory of everything problem,
link |
00:55:11.240
but all kinds of more useful problems to humanity
link |
00:55:14.160
is very appealing to many people.
link |
00:55:16.280
And I had this thought also,
link |
00:55:18.240
but I was quite disappointed with the state of the art
link |
00:55:23.960
of the field of AI.
link |
00:55:25.440
There was some theory about logical reasoning,
link |
00:55:28.160
but I was never convinced that this will fly.
link |
00:55:30.600
And then there was this more heuristic approaches
link |
00:55:33.320
with neural networks and I didn't like these heuristics.
link |
00:55:37.480
So, and also I didn't have any good idea myself.
link |
00:55:42.120
So that's the reason why I toggled back and forth
link |
00:55:44.240
quite some while and even worked four and a half years
link |
00:55:46.360
in a company developing software,
link |
00:55:48.240
something completely unrelated.
link |
00:55:49.680
But then I had this idea about the AIXI model.
link |
00:55:52.800
And so what it gives you, it gives you a gold standard.
link |
00:55:57.760
So I have proven that this is the most intelligent agent
link |
00:56:02.360
which anybody could build, in quotation marks,
link |
00:56:06.840
because it's just mathematical
link |
00:56:08.200
and you need infinite compute.
link |
00:56:11.160
But this is the limit and this is completely specified.
link |
00:56:14.920
It's not just a framework and every year,
link |
00:56:19.280
tens of frameworks are developed,
link |
00:56:21.200
which are just skeletons and then pieces are missing.
link |
00:56:23.920
And usually these missing pieces,
link |
00:56:25.360
turn out to be really, really difficult.
link |
00:56:27.360
And so this is completely and uniquely defined
link |
00:56:31.080
and we can analyze that mathematically.
link |
00:56:33.480
And we've also developed some approximations.
link |
00:56:37.320
I can talk about that a little bit later.
link |
00:56:40.280
That would be sort of the top down approach,
link |
00:56:41.800
like, say, von Neumann's minimax theory,
link |
00:56:44.240
that's the theoretical optimal play of games.
link |
00:56:47.240
And now we need to approximate it,
link |
00:56:48.800
put heuristics in, prune the tree, blah, blah, blah,
link |
00:56:51.040
and so on.
link |
00:56:51.880
So we can do that also with the AIXI model,
link |
00:56:53.200
but for general AI.
link |
00:56:55.440
It can also inspire those,
link |
00:56:57.640
and most researchers go bottom up, right?
link |
00:57:00.840
They have the systems,
link |
00:57:01.680
they try to make it more general, more intelligent.
link |
00:57:04.160
It can inspire in which direction to go.
link |
00:57:08.120
What do you mean by that?
link |
00:57:09.120
So if you have some choice to make, right?
link |
00:57:11.200
So how should I evaluate my system
link |
00:57:13.120
if I can't do cross validation?
link |
00:57:15.400
How should I do my learning
link |
00:57:18.040
if my standard regularization doesn't work well?
link |
00:57:21.480
So the answer is always this,
link |
00:57:22.520
we have a system which does everything, that's AIXI.
link |
00:57:25.000
It's just completely in the ivory tower,
link |
00:57:27.760
completely useless from a practical point of view.
link |
00:57:30.600
But you can look at it and see,
link |
00:57:31.920
ah, yeah, maybe I can take some aspects.
link |
00:57:34.920
And instead of Kolmogorov complexity,
link |
00:57:36.520
you just take some compressors
link |
00:57:38.160
which have been developed so far.
link |
00:57:39.960
And for the planning, well, we have UCT,
link |
00:57:42.120
which has also been used in Go.
link |
00:57:45.240
And at least it's inspired me a lot
link |
00:57:50.040
to have this formal definition.
link |
00:57:54.160
And if you look at other fields,
link |
00:57:55.800
like I always come back to physics
link |
00:57:57.720
because I have a physics background,
link |
00:57:58.960
think about the phenomenon of energy.
link |
00:58:00.680
That was for a long time a mysterious concept.
link |
00:58:03.160
And at some point it was completely formalized.
link |
00:58:05.880
And that really helped a lot.
link |
00:58:08.160
And you can point out a lot of these things
link |
00:58:10.720
which were first mysterious and vague,
link |
00:58:12.960
and then they have been rigorously formalized.
link |
00:58:15.160
Speed and acceleration have been confused, right?
link |
00:58:18.240
Until it was formally defined,
link |
00:58:19.680
yeah, there was a time like this.
link |
00:58:21.040
And people often who don't have any background,
link |
00:58:25.080
still confuse it.
link |
00:58:28.280
And this AIXI model or the intelligence definitions,
link |
00:58:31.920
which is sort of the dual to it,
link |
00:58:33.160
we come back to that later,
link |
00:58:34.640
formalizes the notion of intelligence
link |
00:58:37.160
uniquely and rigorously.
link |
00:58:38.880
So in a sense, it serves as kind of the light
link |
00:58:41.640
at the end of the tunnel.
link |
00:58:43.000
So for, I mean, there's a million questions
link |
00:58:46.800
I could ask here.
link |
00:58:47.720
So maybe kind of, okay,
link |
00:58:50.280
let's feel around in the dark a little bit.
link |
00:58:52.080
So there's been, here at DeepMind
link |
00:58:54.720
but in general, a lot of breakthrough ideas,
link |
00:58:56.960
just like we've been saying around reinforcement learning.
link |
00:58:59.480
So how do you see the progress
link |
00:59:02.080
in reinforcement learning as different?
link |
00:59:04.440
Like which subset of AIXI does it occupy?
link |
00:59:08.080
The current, like you said,
link |
00:59:10.600
maybe the Markov assumption is made quite often
link |
00:59:14.520
in reinforcement learning.
link |
00:59:16.280
There's other assumptions made
link |
00:59:20.240
in order to make the system work.
link |
00:59:21.560
What do you see as the difference or connection
link |
00:59:24.200
between reinforcement learning and AIXI?
link |
00:59:26.800
And so the major difference is that
link |
00:59:30.560
essentially all other approaches,
link |
00:59:33.280
they make stronger assumptions.
link |
00:59:35.600
So in reinforcement learning, the Markov assumption
link |
00:59:38.320
is that the next state or next observation
link |
00:59:41.520
only depends on the previous observation
link |
00:59:43.360
and not the whole history,
link |
00:59:45.240
which makes, of course, the mathematics much easier
link |
00:59:47.560
rather than dealing with histories.
link |
00:59:49.800
Of course, they profit from it also,
link |
00:59:51.600
because then you have algorithms
link |
00:59:53.080
that run on current computers
link |
00:59:54.320
and do something practically useful.
link |
00:59:56.640
But for general AI, all the assumptions
link |
00:59:59.680
which are made by other approaches,
link |
01:00:01.720
we know already now they are limiting.
link |
01:00:04.040
So, for instance, usually you need
link |
01:00:07.760
an ergodicity assumption in the MDP frameworks
link |
01:00:09.840
in order to learn.
link |
01:00:10.680
Ergodicity essentially means that you can recover
link |
01:00:13.800
from your mistakes and that there are no traps
link |
01:00:15.800
in the environment.
link |
01:00:17.400
And if you make this assumption,
link |
01:00:19.040
then essentially you can go back to a previous state,
link |
01:00:22.040
go there a couple of times and then learn
link |
01:00:24.320
what statistics and what the state is like,
link |
01:00:29.040
and then in the long run perform well in this state.
link |
01:00:32.520
But there are no fundamental problems.
link |
01:00:35.200
But in real life, we know there can be one single action.
link |
01:00:38.480
One second of being inattentive while driving a car fast
link |
01:00:43.920
can ruin the rest of my life.
link |
01:00:45.240
I can become quadriplegic or whatever.
link |
01:00:47.800
So, and there's no recovery anymore.
link |
01:00:49.680
So, the real world is not ergodic, I always say.
link |
01:00:52.160
There are traps and there are situations
link |
01:00:53.920
which you cannot recover from.
link |
01:00:55.760
And very little theory has been developed for this case.
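For contrast, the two modelling assumptions being compared here can be written side by side in the usual notation, with $o$ the observations (or states), $a$ the actions, and the conditioning on the right running over the whole interaction history:

$$
\underbrace{P\big(o_{t+1}\mid o_t, a_t\big)}_{\text{Markov / MDP assumption}}
\qquad\text{versus}\qquad
\underbrace{P\big(o_{t+1}\mid o_1 a_1 \ldots o_t a_t\big)}_{\text{history-based, as in AIXI}}.
$$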
link |
01:01:00.760
What about, what do you see in the context of AIXI
link |
01:01:05.760
as the role of exploration?
link |
01:01:07.960
Sort of, you mentioned in the real world
link |
01:01:13.440
you can get into trouble when we make the wrong decisions
link |
01:01:16.120
and really pay for it.
link |
01:01:17.480
But exploration seems to be fundamentally important
link |
01:01:20.480
for learning about this world, for gaining new knowledge.
link |
01:01:23.760
So, is exploration baked in?
link |
01:01:27.360
Another way to ask it,
link |
01:01:29.680
what are the parameters of AIXI
link |
01:01:34.360
that can be controlled?
link |
01:01:36.200
Yeah, I say the good thing is that there are no parameters
link |
01:01:38.880
to control.
link |
01:01:40.200
Some other people attach knobs to control.
link |
01:01:43.120
And you can do that.
link |
01:01:44.120
I mean, you can modify AIXI so that you have some knobs
link |
01:01:46.880
to play with if you want to.
link |
01:01:48.800
But the exploration is directly baked in.
link |
01:01:53.640
And that comes from the Bayesian learning
link |
01:01:56.960
and the longterm planning.
link |
01:01:58.680
So these together already imply exploration.
link |
01:02:04.200
You can nicely and explicitly prove that
link |
01:02:08.280
for simple problems like so called bandit problems,
link |
01:02:13.560
where you say, to give a real world example,
link |
01:02:18.000
say you have two medical treatments, A and B,
link |
01:02:20.200
you don't know the effectiveness,
link |
01:02:21.560
you try A a little bit, B a little bit,
link |
01:02:23.360
but you don't want to harm too many patients.
link |
01:02:25.760
So you have to sort of trade off exploring and exploiting.
link |
01:02:31.720
And at some point you want to exploit,
link |
01:02:31.720
and you can do the mathematics
link |
01:02:34.080
and figure out the optimal strategy.
link |
01:02:38.040
They talk about Bayesian agents,
link |
01:02:39.120
there are also non-Bayesian agents,
link |
01:02:41.120
but it shows that this Bayesian framework
link |
01:02:44.240
by taking a prior over possible worlds,
link |
01:02:47.400
doing the Bayesian mixture,
link |
01:02:48.440
then the Bayes optimal decision with longterm planning
link |
01:02:50.640
that is important,
link |
01:02:52.320
automatically implies exploration,
link |
01:02:55.880
also to the proper extent,
link |
01:02:57.600
not too much exploration and not too little.
link |
01:02:59.680
This is in very simple settings.
link |
01:03:01.520
In the AIXI model, I was also able to prove
link |
01:03:04.400
a self-optimizing theorem
link |
01:03:06.160
or asymptotic optimality theorems,
link |
01:03:07.720
although they're only asymptotic, not finite time bounds.
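As a small illustration of the two-treatment bandit example, here is a sketch using Thompson sampling, a cheap Bayesian stand-in for the full Bayes-optimal long-horizon planner discussed here; it is not that planner, but it shows the same qualitative point, namely that exploration falls out of posterior uncertainty rather than from an explicit exploration knob. The success probabilities below are made up.

```python
import random

# Two hypothetical treatments with unknown success probabilities.
TRUE_SUCCESS = {"A": 0.55, "B": 0.70}

# Beta(1, 1) prior over each treatment's success rate: [alpha, beta].
posterior = {arm: [1, 1] for arm in TRUE_SUCCESS}

picks = {"A": 0, "B": 0}
for patient in range(2_000):
    # Sample a plausible success rate for each arm from its posterior
    # and treat with the arm whose sample is highest (Thompson sampling).
    sampled = {arm: random.betavariate(a, b) for arm, (a, b) in posterior.items()}
    arm = max(sampled, key=sampled.get)
    picks[arm] += 1
    success = random.random() < TRUE_SUCCESS[arm]
    posterior[arm][0 if success else 1] += 1

print(picks)   # the better treatment B ends up used for most patients,
               # yet A was still tried enough early on to rule it out
```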
link |
01:03:10.480
So it seems like the longterm planning is really important,
link |
01:03:13.120
but the longterm part of the planning is really important.
link |
01:03:15.720
And also, I mean, maybe a quick tangent,
link |
01:03:18.920
how important do you think is removing
link |
01:03:21.360
the Markov assumption and looking at the full history?
link |
01:03:25.320
Sort of intuitively, of course, it's important,
link |
01:03:28.040
but is it like fundamentally transformative
link |
01:03:30.960
to the entirety of the problem?
link |
01:03:33.400
What's your sense of it?
link |
01:03:34.320
Like, cause we all, we make that assumption quite often.
link |
01:03:37.800
It's just throwing away the past.
link |
01:03:40.000
No, I think it's absolutely crucial.
link |
01:03:42.960
The question is whether there's a way to deal with it
link |
01:03:47.240
in a more heuristic and still sufficiently well way.
link |
01:03:52.360
So I have to come up with an example on the fly,
link |
01:03:55.480
but you have some key event in your life,
link |
01:03:59.360
long time ago in some city or something,
link |
01:04:02.080
you realized that's a really dangerous street or whatever.
link |
01:04:05.360
And you want to remember that forever,
link |
01:04:08.000
in case you come back there.
link |
01:04:09.760
Kind of a selective kind of memory.
link |
01:04:11.520
So you remember all the important events in the past,
link |
01:04:15.160
but somehow selecting the important ones is...
link |
01:04:17.480
That's very hard.
link |
01:04:18.600
And I'm not concerned about just storing the whole history.
link |
01:04:21.720
Just, you can calculate: human life, say 30 or 100 years,
link |
01:04:26.640
doesn't matter, right?
link |
01:04:28.600
How much data comes in through the vision system
link |
01:04:31.800
and the auditory system, you compress it a little bit,
link |
01:04:35.200
in this case, lossily and store it.
link |
01:04:37.560
We soon have the means of just storing it.
link |
01:04:40.520
But you still need the selection for the planning part
link |
01:04:44.920
and the compression for the understanding part.
link |
01:04:47.280
The raw storage I'm really not concerned about.
link |
01:04:50.000
And I think we should just store,
link |
01:04:52.240
if you develop an agent,
link |
01:04:54.600
preferably just store all the interaction history.
link |
01:04:59.400
And then you build of course models on top of it
link |
01:05:02.240
and you compress it and you are selective,
link |
01:05:04.960
but occasionally you go back to the old data
link |
01:05:08.120
and reanalyze it based on your new experience you have.
link |
01:05:12.000
Sometimes you are in school,
link |
01:05:13.840
you learn all these things you think is totally useless
link |
01:05:16.800
and much later you realize,
link |
01:05:18.200
oh, they were not so useless as you thought.
link |
01:05:21.600
I'm looking at you, linear algebra.
link |
01:05:24.080
Right.
link |
01:05:25.160
So maybe let me ask about objective functions
link |
01:05:27.720
because the rewards seem to be an important part.
link |
01:05:33.440
The rewards are kind of given to the system.
link |
01:05:38.200
For a lot of people,
link |
01:05:39.560
the specification of the objective function
link |
01:05:46.600
is a key part of intelligence.
link |
01:05:48.440
The agent itself figuring out what is important.
link |
01:05:52.920
What do you think about that?
link |
01:05:54.640
Is it possible within the AIXI framework
link |
01:05:58.560
for the system itself to discover the reward
link |
01:06:01.880
based on which you should operate?
link |
01:06:05.440
Okay, that will be a long answer.
link |
01:06:07.080
So, and that is a very interesting question.
link |
01:06:10.800
And I'm asked a lot about this question,
link |
01:06:13.360
where do the rewards come from?
link |
01:06:15.600
And that depends.
link |
01:06:17.760
So, and then I give you now a couple of answers.
link |
01:06:21.320
So if you want to build agents, now let's start simple.
link |
01:06:26.320
So let's assume we want to build an agent
link |
01:06:28.680
based on the AIXI model, which performs a particular task.
link |
01:06:33.200
Let's start with something super simple,
link |
01:06:34.720
like, I mean, super simple, like playing chess,
link |
01:06:37.320
or go or something, yeah.
link |
01:06:38.840
Then you just, the reward is winning the game is plus one,
link |
01:06:42.480
losing the game is minus one, done.
link |
01:06:45.280
You apply this agent.
link |
01:06:46.360
If you have enough compute, you let it self play
link |
01:06:49.080
and it will learn the rules of the game,
link |
01:06:50.840
will play perfect chess after some while, problem solved.
link |
01:06:54.320
Okay, so if you have more complicated problems,
link |
01:06:59.520
then you may believe that you have the right reward,
link |
01:07:03.640
but it's not.
link |
01:07:04.840
So a nice, cute example is the elevator control
link |
01:07:08.400
that is also in Rich Sutton's book,
link |
01:07:10.400
which is a great book, by the way.
link |
01:07:13.600
So you control the elevator and you think,
link |
01:07:15.640
well, maybe the reward should be coupled
link |
01:07:17.760
to how long people wait in front of the elevator.
link |
01:07:20.200
Long wait is bad.
link |
01:07:21.840
You program it and you do it.
link |
01:07:23.680
And what happens is the elevator eagerly picks up
link |
01:07:25.840
all the people, but never drops them off.
link |
01:07:28.040
So then you realize, oh, maybe the time in the elevator
link |
01:07:33.120
also counts, so you minimize the sum, yeah?
link |
01:07:36.280
And the elevator does that, but never picks up the people
link |
01:07:39.000
in the 10th floor and the top floor
link |
01:07:40.400
because in expectation, it's not worth it.
link |
01:07:42.320
Just let them stay.
link |
01:07:43.240
Yeah.
link |
01:07:44.080
Yeah.
link |
01:07:44.920
Yeah.
link |
01:07:45.760
So even in apparently simple problems,
link |
01:07:49.600
you can make mistakes, yeah?
link |
01:07:51.240
And that's what in more serious contexts
link |
01:07:55.240
AGI safety researchers consider.
link |
01:07:58.000
So now let's go back to general agents.
link |
01:08:00.640
So assume you want to build an agent,
link |
01:08:02.360
which is generally useful to humans, yeah?
link |
01:08:05.080
So you have a household robot, yeah?
link |
01:08:07.440
And it should do all kinds of tasks.
link |
01:08:09.840
So in this case, the human should give the reward
link |
01:08:13.440
on the fly.
link |
01:08:14.440
I mean, maybe it's pre trained in the factory
link |
01:08:16.200
and that there's some sort of internal reward
link |
01:08:18.040
for the battery level or whatever, yeah?
link |
01:08:19.920
But so it does the dishes badly, you punish the robot,
link |
01:08:24.160
it does it good, you reward the robot
link |
01:08:25.680
and then train it to a new task, yeah, like a child, right?
link |
01:08:28.440
So you need the human in the loop.
link |
01:08:31.160
If you want a system, which is useful to the human.
link |
01:08:34.520
And as long as these agents stay subhuman level,
link |
01:08:39.360
that should work reasonably well,
link |
01:08:41.080
apart from these examples.
link |
01:08:43.040
It becomes critical if they reach human level.
link |
01:08:45.840
It's like with children, small children,
link |
01:08:47.200
you have reasonably well under control,
link |
01:08:48.800
they become older, the reward technique
link |
01:08:51.400
doesn't work so well anymore.
link |
01:08:54.160
So then finally, so this would be agents,
link |
01:08:58.600
which are just, you could say slaves to the humans, yeah?
link |
01:09:01.800
So if you are more ambitious and just say,
link |
01:09:03.960
we want to build a new species of intelligent beings,
link |
01:09:08.080
we put them on a new planet
link |
01:09:09.360
and we want them to develop this planet or whatever.
link |
01:09:12.080
So we don't give them any reward.
link |
01:09:15.360
So what could we do?
link |
01:09:16.920
And you could try to come up with some reward functions
link |
01:09:21.080
like it should maintain itself, the robot,
link |
01:09:23.400
it should maybe multiply, build more robots, right?
link |
01:09:28.000
And maybe all kinds of things which you find useful,
link |
01:09:33.000
but that's pretty hard, right?
link |
01:09:34.800
What does self maintenance mean?
link |
01:09:36.640
What does it mean to build a copy?
link |
01:09:38.120
Should it be exact copy, an approximate copy?
link |
01:09:40.680
And so that's really hard,
link |
01:09:42.040
but Laurent also at DeepMind developed a beautiful model.
link |
01:09:48.800
So he just took the AIXI model
link |
01:09:50.560
and coupled the rewards to information gain.
link |
01:09:54.960
So he said the reward is proportional
link |
01:09:57.840
to how much the agent had learned about the world.
link |
01:10:00.720
And you can rigorously, formally, uniquely define that
link |
01:10:03.320
in terms of KL divergences, okay?
link |
01:10:05.840
So if you put that in, you get a completely autonomous agent.
link |
01:10:09.880
And actually, interestingly, for this agent,
link |
01:10:11.680
we can prove much stronger result
link |
01:10:13.120
than for the general agent, which is also nice.
link |
01:10:16.000
And if you let this agent loose,
link |
01:10:18.080
it will be in a sense, the optimal scientist.
link |
01:10:20.000
It is absolutely curious to learn as much as possible
link |
01:10:22.920
about the world.
link |
01:10:24.120
And of course, it will also have
link |
01:10:25.720
a lot of instrumental goals, right?
link |
01:10:27.160
In order to learn, it needs to at least survive, right?
link |
01:10:29.560
A dead agent is not good for anything.
link |
01:10:31.520
So it needs to have self preservation.
link |
01:10:33.960
And if it builds small helpers, acquiring more information,
link |
01:10:38.000
it will do that, yeah?
link |
01:10:39.120
If exploration, space exploration or whatever is necessary,
link |
01:10:43.680
right, to gather information and develop it.
link |
01:10:45.920
So it has a lot of instrumental goals
link |
01:10:48.200
following from this information gain.
link |
01:10:51.000
And this agent is completely autonomous of us.
link |
01:10:53.760
No rewards necessary anymore.
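A minimal numerical illustration of the information-gain idea, measuring, as one common formalization, how far each observation moves the agent's posterior; this toy coin-bias model is invented for illustration and is not Laurent's actual construction.

```python
import math

# Discrete set of hypotheses about a coin's bias, with a uniform prior.
BIASES = [0.1, 0.3, 0.5, 0.7, 0.9]
belief = [1 / len(BIASES)] * len(BIASES)

def update(belief, heads):
    """Bayes update of the belief after seeing one coin flip."""
    post = [w * (b if heads else 1 - b) for w, b in zip(belief, BIASES)]
    z = sum(post)
    return [p / z for p in post]

def info_gain(old, new):
    """KL(new || old): how much the observation moved the belief."""
    return sum(n * math.log(n / o) for n, o in zip(new, old) if n > 0)

flips = [True, True, False, True, True, True]   # hypothetical data
for heads in flips:
    new_belief = update(belief, heads)
    print(f"flip={'H' if heads else 'T'}  information-gain reward = "
          f"{info_gain(belief, new_belief):.4f}")
    belief = new_belief
# Flips that surprise the current belief yield large rewards; as the
# belief settles, typical flips teach less and the reward shrinks.
```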
link |
01:10:55.640
Yeah, of course, it could find a way
link |
01:10:57.560
to game the concept of information
link |
01:10:59.600
and get stuck in that library
link |
01:11:04.080
that you mentioned beforehand
link |
01:11:05.720
with a very large number of books.
link |
01:11:08.600
The first agent had this problem.
link |
01:11:10.680
It would get stuck in front of an old TV screen,
link |
01:11:13.640
which just had white noise.
link |
01:11:14.960
Yeah, white noise, yeah.
link |
01:11:16.480
But the second version can deal with at least stochasticity.
link |
01:11:21.360
Well.
link |
01:11:22.200
Yeah, what about curiosity?
link |
01:11:23.680
This kind of word, curiosity, creativity,
link |
01:11:27.920
is that kind of the reward function being
link |
01:11:30.880
of getting new information?
link |
01:11:31.920
Is that similar to idea of kind of injecting exploration
link |
01:11:39.000
for its own sake inside the reward function?
link |
01:11:41.880
Do you find this at all appealing, interesting?
link |
01:11:44.880
I think that's a nice definition.
link |
01:11:46.320
Curiosity is rewards.
link |
01:11:48.600
Sorry, curiosity is exploration for its own sake.
link |
01:11:54.800
Yeah, I would accept that.
link |
01:11:57.120
But most curiosity, well, in humans,
link |
01:11:59.920
and especially in children,
link |
01:12:01.240
is not just for its own sake,
link |
01:12:03.040
but for actually learning about the environment
link |
01:12:05.960
and for behaving better.
link |
01:12:08.440
So I think most curiosity is tied in the end
link |
01:12:13.120
towards performing better.
link |
01:12:14.840
Well, okay, so if intelligence systems
link |
01:12:17.680
need to have this reward function,
link |
01:12:19.760
let me, you're an intelligence system,
link |
01:12:23.680
currently passing the Turing test quite effectively.
link |
01:12:26.600
What's the reward function
link |
01:12:30.240
of our human intelligence existence?
link |
01:12:33.920
What's the reward function
link |
01:12:35.160
that Marcus Hutter is operating under?
link |
01:12:37.720
Okay, to the first question,
link |
01:12:39.760
the biological reward function is to survive and to spread,
link |
01:12:44.480
and very few humans sort of are able to overcome
link |
01:12:48.200
this biological reward function.
link |
01:12:50.920
But we live in a very nice world
link |
01:12:54.200
where we have lots of spare time
link |
01:12:56.240
and can still survive and spread,
link |
01:12:57.640
so we can develop arbitrary other interests,
link |
01:13:01.920
which is quite interesting.
link |
01:13:03.280
On top of that.
link |
01:13:04.400
On top of that, yeah.
link |
01:13:06.160
But the survival and spreading sort of is,
link |
01:13:09.120
I would say, the goal or the reward function of humans,
link |
01:13:13.160
so that's the core one.
link |
01:13:15.360
I like how you avoided answering the second question,
link |
01:13:17.480
which a good intelligence system would.
link |
01:13:19.760
So my.
link |
01:13:20.880
That is, your own meaning of life and the reward function.
link |
01:13:24.320
My own meaning of life and reward function
link |
01:13:26.960
is to find an AGI to build it.
link |
01:13:31.200
Beautifully put.
link |
01:13:32.040
Okay, let's dissect AIXI even further.
link |
01:13:34.280
So one of the assumptions is kind of infinity
link |
01:13:37.960
keeps creeping up everywhere,
link |
01:13:39.680
which, what are your thoughts
link |
01:13:44.960
on kind of bounded rationality
link |
01:13:46.920
and sort of the nature of our existence
link |
01:13:50.040
and intelligence systems is that we're operating
link |
01:13:52.000
always under constraints, under limited time,
link |
01:13:55.680
limited resources.
link |
01:13:57.640
How does that, how do you think about that
link |
01:13:59.480
within the AIXI framework,
link |
01:14:01.600
within trying to create an AGI system
link |
01:14:04.480
that operates under these constraints?
link |
01:14:06.760
Yeah, that is one of the criticisms about AIXI,
link |
01:14:09.200
that it ignores computation completely.
link |
01:14:11.320
And some people believe that intelligence
link |
01:14:13.800
is inherently tied to bounded resources.
link |
01:14:19.520
What do you think on this one point?
link |
01:14:21.160
Do you think it's,
link |
01:14:22.480
do you think the bounded resources
link |
01:14:23.920
are fundamental to intelligence?
link |
01:14:27.840
I would say that an intelligence notion,
link |
01:14:31.160
which ignores computational limits is extremely useful.
link |
01:14:35.520
A good intelligence notion,
link |
01:14:37.120
which includes these resources would be even more useful,
link |
01:14:40.720
but we don't have that yet.
link |
01:14:43.280
And if you look at other fields outside of computer science,
link |
01:14:48.480
computational aspects never play a fundamental role.
link |
01:14:52.240
You develop biological models for cells,
link |
01:14:54.880
something in physics, these theories,
link |
01:14:56.680
I mean, become more and more crazy
link |
01:14:58.160
and harder and harder to compute.
link |
01:15:00.320
Well, in the end, of course,
link |
01:15:01.440
we need to do something with this model,
link |
01:15:02.960
but this is more a nuisance than a feature.
link |
01:15:05.520
And I'm sometimes wondering if artificial intelligence
link |
01:15:10.040
would not sit in a computer science department,
link |
01:15:12.080
but in a philosophy department,
link |
01:15:14.040
then this computational focus
link |
01:15:16.120
would be probably significantly less.
link |
01:15:18.400
I mean, think about the induction problem
link |
01:15:19.720
is more in the philosophy department.
link |
01:15:22.080
There's virtually no paper that cares about
link |
01:15:24.480
how long it takes to compute the answer.
link |
01:15:26.440
That is completely secondary.
link |
01:15:28.320
Of course, once we have figured out the first problem,
link |
01:15:31.680
so intelligence without computational resources,
link |
01:15:35.840
then the next and very good question is,
link |
01:15:39.400
could we improve it by including computational resources,
link |
01:15:42.480
but nobody was able to do that so far
link |
01:15:45.520
in an even halfway satisfactory manner.
link |
01:15:49.240
I like that, that in the long run,
link |
01:15:51.600
the right department to belong to is philosophy.
link |
01:15:55.160
That's actually quite a deep idea,
link |
01:15:58.680
or even to at least to think about
link |
01:16:01.440
big picture philosophical questions,
link |
01:16:03.680
big picture questions,
link |
01:16:05.280
even in the computer science department.
link |
01:16:07.400
But you've mentioned approximation.
link |
01:16:10.000
Sort of, there's a lot of infinity,
link |
01:16:12.160
a lot of huge resources needed.
link |
01:16:13.920
Are there approximations to AIXI,
link |
01:16:16.280
within the AIXI framework, that are useful?
link |
01:16:19.800
Yeah, we have developed a couple of approximations.
link |
01:16:23.120
And what we do there is that
link |
01:16:27.280
the Solomonov induction part,
link |
01:16:29.840
which was find the shortest program describing your data,
link |
01:16:33.640
we just replace it by standard data compressors.
link |
01:16:36.640
And the better compressors get,
link |
01:16:39.240
the better this part will become.
link |
01:16:41.680
We focus on a particular compressor
link |
01:16:43.400
called context tree weighting,
link |
01:16:44.560
which is pretty amazing, not so well known.
link |
01:16:48.520
It has beautiful theoretical properties,
link |
01:16:50.120
also works reasonably well in practice.
link |
01:16:52.240
So we use that for the approximation of the induction
link |
01:16:55.160
and the learning and the prediction part.
link |
01:16:58.160
And for the planning part,
link |
01:17:01.680
we essentially just took the ideas from computer Go
link |
01:17:05.560
from 2006.
link |
01:17:07.320
It was Csaba Szepesvári, also now at DeepMind,
link |
01:17:11.320
who developed the so called UCT algorithm,
link |
01:17:14.600
upper confidence bound for trees algorithm
link |
01:17:17.440
on top of the Monte Carlo tree search.
link |
01:17:19.040
So we approximate this planning part by sampling.
link |
01:17:23.200
And it's successful on some small toy problems.
link |
01:17:29.280
We don't want to lose the generality, right?
link |
01:17:33.480
And that's sort of the handicap, right?
link |
01:17:34.920
If you want to be general, you have to give up something.
link |
01:17:38.840
So, but this single agent was able to play small games
link |
01:17:41.960
like Kuhn poker and Tic Tac Toe and even Pacman
link |
01:17:49.160
in the same architecture, no change.
link |
01:17:52.040
The agent doesn't know the rules of the game,
link |
01:17:54.880
really nothing, and learns all by itself by playing
link |
01:17:57.640
with these environments.
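At the heart of UCT is the UCB1 score used to pick which branch of the search tree to sample next; a minimal version of that rule looks roughly like this (the constant and the node statistics below are illustrative):

```python
import math

def uct_select(stats, c=1.4):
    """Pick the child action with the highest UCB1 score.

    `stats` maps each action to (total_reward, visit_count); actions that
    have never been tried get priority so everything is sampled at least once.
    """
    total_visits = sum(n for _, n in stats.values())
    def score(action):
        value, visits = stats[action]
        if visits == 0:
            return float("inf")
        return value / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(stats, key=score)

# Hypothetical node statistics after a few Monte Carlo rollouts.
node_stats = {"left": (3.0, 10), "right": (2.0, 4), "stay": (0.0, 0)}
print(uct_select(node_stats))   # "stay" first, since it is still untried
```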
link |
01:17:59.920
So Jürgen Schmidhuber proposed something called
link |
01:18:03.800
Gödel Machines, which is a self improving program
link |
01:18:06.920
that rewrites its own code.
link |
01:18:10.800
Sort of mathematically, philosophically,
link |
01:18:12.800
what's the relationship in your eyes,
link |
01:18:15.080
if you're familiar with it,
link |
01:18:16.160
between AIXI and the Gödel Machines?
link |
01:18:18.400
Yeah, familiar with it.
link |
01:18:19.720
He developed it while I was in his lab.
link |
01:18:22.320
Yeah, so the Gödel Machine, to explain it briefly,
link |
01:18:27.080
you give it a task.
link |
01:18:28.920
It could be a simple task as, you know,
link |
01:18:30.400
finding prime factors in numbers, right?
link |
01:18:32.480
You can formally write it down.
link |
01:18:33.840
There's a very slow algorithm to do that.
link |
01:18:35.280
Just try all the factors, yeah.
link |
01:18:37.520
Or play chess, right?
link |
01:18:39.240
Optimally, you write the algorithm to minimax
link |
01:18:41.200
to the end of the game.
link |
01:18:42.080
So you write down what the Gödel Machine should do.
link |
01:18:45.360
Then it will take part of its resources to run this program
link |
01:18:50.720
and other part of its resources to improve this program.
link |
01:18:54.000
And when it finds an improved version,
link |
01:18:56.880
which provably computes the same answer.
link |
01:19:00.680
So that's the key part, yeah.
link |
01:19:02.320
It needs to prove by itself that this change of program
link |
01:19:05.680
still satisfies the original specification.
link |
01:19:08.960
And if it does so, then it replaces the original program
link |
01:19:11.680
by the improved program.
link |
01:19:13.120
And by definition, it does the same job,
link |
01:19:15.120
but just faster, okay?
link |
01:19:17.080
And then, you know, it improves it over and over.
link |
01:19:19.160
And it's developed in a way that all parts
link |
01:19:24.560
of this Gödel Machine can self improve,
link |
01:19:26.720
but it stays provably consistent
link |
01:19:29.160
with the original specification.
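Schematically, the control loop just described looks something like the sketch below; this is only an illustration of the structure, not Schmidhuber's actual construction, and the proof search, which is the genuinely hard part, is stubbed out with a placeholder:

```python
# Schematic sketch of the self-improvement loop just described; this is
# NOT Schmidhuber's actual construction. In a real Gödel Machine,
# `find_provably_better_version` would search for a machine-checkable
# proof that a rewrite computes the same answers as the specification,
# only faster; here it is a stub.

def slow_factor(n):
    """Initial solver: trial division, provably correct but slow."""
    return next(d for d in range(2, n + 1) if n % d == 0)

def find_provably_better_version(current_solver):
    """Placeholder for the proof searcher (the genuinely hard part)."""
    return None   # no proof found in this toy sketch

solver = slow_factor
for step in range(10):
    # Spend part of the budget solving the given task...
    print(solver(2_000_003 * 7))
    # ...and part of it searching for a provably equivalent, faster rewrite.
    better = find_provably_better_version(solver)
    if better is not None:
        solver = better   # swap in the improvement; behaviour unchanged by proof
```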
link |
01:19:31.760
So from this perspective, it has nothing to do with AIXI.
link |
01:19:36.080
But if you would now put AIXI as the starting axioms in,
link |
01:19:40.520
it would run AIXI, but you know, that takes forever.
link |
01:19:44.800
But then if it finds a provable speed up of AIXI,
link |
01:19:48.480
it would replace it by this and this and this.
link |
01:19:50.960
And maybe eventually it comes up with a model
link |
01:19:52.840
which is still the AIXI model.
link |
01:19:54.480
It cannot be, I mean, just for the knowledgeable reader,
link |
01:19:59.600
AIXI is incomputable, and one can prove that therefore
link |
01:20:03.200
there cannot be a computable exact algorithm computing it.
link |
01:20:08.640
There needs to be some approximations
link |
01:20:10.360
and this is not dealt with by the Gödel Machine.
link |
01:20:11.960
So you have to do something about it.
link |
01:20:13.200
But there's the AIXItl model, which is finitely computable,
link |
01:20:15.680
which we could put in.
link |
01:20:16.520
Which part of AIXI is noncomputable?
link |
01:20:19.240
The Solomonov induction part.
link |
01:20:20.760
The induction, okay, so.
link |
01:20:22.240
But there are ways of getting computable approximations
link |
01:20:26.320
of the AIXI model, so then it's at least computable.
link |
01:20:30.000
It is still way beyond any resources anybody will ever have,
link |
01:20:33.680
but then the Gödel Machine could sort of improve it
link |
01:20:35.840
further and further in an exact way.
link |
01:20:37.720
So is it theoretically possible
link |
01:20:41.160
that the Gödel Machine process could improve?
link |
01:20:45.120
Isn't AIXI already optimal?
link |
01:20:51.800
It is optimal in terms of the reward collected
link |
01:20:56.760
over its interaction cycles,
link |
01:20:59.360
but it takes infinite time to produce one action.
link |
01:21:03.440
And the world continues whether you want it or not.
link |
01:21:07.120
So the model is assuming you had an oracle,
link |
01:21:09.720
which solved this problem,
link |
01:21:11.200
and then in the next 100 milliseconds
link |
01:21:12.920
or the reaction time you need gives the answer,
link |
01:21:15.360
then AIXI is optimal.
link |
01:21:18.200
It's optimal also in the sense of learning efficiency
link |
01:21:21.440
and data efficiency, but not in terms of computation time.
link |
01:21:25.600
And then the Gödel Machine in theory,
link |
01:21:27.560
but probably not provably could make it go faster.
link |
01:21:31.000
Yes.
link |
01:21:31.840
Okay, interesting.
link |
01:21:34.520
Those two components are super interesting.
link |
01:21:36.640
The sort of the perfect intelligence combined
link |
01:21:39.960
with self improvement,
link |
01:21:44.120
sort of provable self improvement
link |
01:21:45.600
since you're always getting the correct answer
link |
01:21:48.760
and you're improving.
link |
01:21:50.360
Beautiful ideas.
link |
01:21:51.400
Okay, so you've also mentioned that, in the chase
link |
01:21:55.120
of solving this reward,
link |
01:21:59.840
sort of optimizing for the goal,
link |
01:22:02.960
interesting human things could emerge.
link |
01:22:04.960
So is there a place for consciousness within AIXI?
link |
01:22:10.880
Where does, maybe you can comment,
link |
01:22:13.480
because I suppose we humans are just another instantiation
link |
01:22:17.440
of AIXI agents and we seem to have consciousness.
link |
01:22:20.880
You say humans are an instantiation of an AIXI agent?
link |
01:22:23.400
Yes.
link |
01:22:24.240
Well, that would be amazing,
link |
01:22:25.280
but I think that's not true even for the smartest
link |
01:22:27.880
and most rational humans.
link |
01:22:29.000
I think maybe we are very crude approximations.
link |
01:22:32.920
Interesting.
link |
01:22:33.760
I mean, I tend to believe, again, I'm Russian,
link |
01:22:35.720
so I tend to believe our flaws are part of the optimal.
link |
01:22:41.160
So we tend to laugh off and criticize our flaws
link |
01:22:45.640
and I tend to think that that's actually close
link |
01:22:49.240
to an optimal behavior.
link |
01:22:50.680
Well, some flaws, if you think more carefully about it,
link |
01:22:53.760
are actually not flaws, yeah,
link |
01:22:54.960
but I think there are still enough flaws.
link |
01:22:58.920
I don't know.
link |
01:23:00.000
It's unclear.
link |
01:23:00.840
As a student of history,
link |
01:23:01.880
I think all the suffering that we've endured
link |
01:23:05.240
as a civilization,
link |
01:23:06.760
it's possible that that's the optimal amount of suffering
link |
01:23:10.200
we need to endure to minimize longterm suffering.
link |
01:23:15.000
That's your Russian background, I think.
link |
01:23:17.280
That's the Russian.
link |
01:23:18.120
Whether humans are or are not instantiations of an AIXI agent,
link |
01:23:21.840
do you think there's a consciousness
link |
01:23:23.920
of something that could emerge
link |
01:23:25.640
in a computational form or framework like AIXI?
link |
01:23:29.720
Let me also ask you a question.
link |
01:23:31.720
Do you think I'm conscious?
link |
01:23:36.800
Yeah, that's a good question.
link |
01:23:38.200
That tie is confusing me, but I think so.
link |
01:23:44.360
You think that makes me unconscious
link |
01:23:45.720
because it strangles me or?
link |
01:23:47.160
If an agent were to solve the imitation game
link |
01:23:49.720
posed by Turing,
link |
01:23:50.600
I think it would be dressed similarly to you.
link |
01:23:53.400
Because there's a kind of flamboyant,
link |
01:23:56.800
interesting, complex behavior pattern
link |
01:24:01.040
that sells that you're human and you're conscious.
link |
01:24:04.440
But why do you ask?
link |
01:24:06.080
Was it a yes or was it a no?
link |
01:24:07.880
Yes, I think you're conscious, yes.
link |
01:24:12.640
So, and you explained sort of somehow why,
link |
01:24:16.080
but you infer that from my behavior, right?
link |
01:24:18.760
You can never be sure about that.
link |
01:24:20.680
And I think the same thing will happen
link |
01:24:23.280
with any intelligent agent we develop
link |
01:24:26.760
if it behaves in a way sufficiently close to humans
link |
01:24:31.000
or maybe even not humans.
link |
01:24:32.080
I mean, maybe a dog is also sometimes
link |
01:24:34.240
a little bit self conscious, right?
link |
01:24:35.720
So if it behaves in a way
link |
01:24:38.800
to which we typically attribute consciousness,
link |
01:24:41.160
we would attribute consciousness
link |
01:24:42.720
to these intelligent systems.
link |
01:24:44.320
And to AIs probably in particular.
link |
01:24:47.240
that of course doesn't answer the question
link |
01:24:48.800
whether it's really conscious.
link |
01:24:50.800
And that's the big hard problem of consciousness.
link |
01:24:53.680
Maybe I'm a zombie.
link |
01:24:55.680
I mean, not the movie zombie, but the philosophical zombie.
link |
01:24:59.320
Is to you the display of consciousness
link |
01:25:02.600
close enough to consciousness
link |
01:25:05.000
from a perspective of AGI
link |
01:25:06.720
that the distinction of the hard problem of consciousness
link |
01:25:09.800
is not an interesting one?
link |
01:25:11.320
I think we don't have to worry
link |
01:25:12.480
about the consciousness problem,
link |
01:25:13.920
especially the hard problem for developing AGI.
link |
01:25:16.840
I think, you know, we progress.
link |
01:25:20.200
At some point we have solved all the technical problems
link |
01:25:23.120
and this system will behave intelligently
link |
01:25:25.440
and then superintelligently.
link |
01:25:26.520
And this consciousness will emerge.
link |
01:25:30.160
I mean, definitely it will display behavior
link |
01:25:32.480
which we will interpret as conscious.
link |
01:25:35.040
And then it's a philosophical question.
link |
01:25:38.120
Did this consciousness really emerge
link |
01:25:39.840
or is it a zombie which just, you know, fakes everything?
link |
01:25:43.680
We still don't have to figure that out.
link |
01:25:45.200
Although it may be interesting,
link |
01:25:47.480
at least from a philosophical point of view,
link |
01:25:48.920
it's very interesting,
link |
01:25:49.840
but it may also be sort of practically interesting.
link |
01:25:53.160
You know, there's some people saying,
link |
01:25:54.280
if it's just faking consciousness and feelings,
link |
01:25:56.200
you know, then we don't need to be concerned about,
link |
01:25:58.280
you know, rights.
link |
01:25:59.160
But if it's real conscious and has feelings,
link |
01:26:01.600
then we need to be concerned, yeah.
link |
01:26:05.840
I can't wait till the day
link |
01:26:07.560
where AI systems exhibit consciousness
link |
01:26:10.640
because it'll truly raise some of the hardest ethical questions
link |
01:26:14.520
of what we do with that.
link |
01:26:15.640
It is rather easy to build systems
link |
01:26:18.880
to which people ascribe consciousness.
link |
01:26:21.120
And I give you an analogy.
link |
01:26:22.600
I mean, remember, maybe it was before you were born,
link |
01:26:25.320
the Tamagotchi?
link |
01:26:26.760
Yeah.
link |
01:26:27.880
I was freaking born.
link |
01:26:28.760
How dare you, sir?
link |
01:26:30.960
Why, that's the, you're young, right?
link |
01:26:33.240
Yes, that's good.
link |
01:26:34.080
Thank you, thank you very much.
link |
01:26:36.200
But I was also in the Soviet Union.
link |
01:26:37.560
We didn't have any of those fun things.
link |
01:26:41.240
But you have heard about this Tamagotchi,
link |
01:26:42.680
which was, you know, really, really primitive,
link |
01:26:44.600
actually, for the time it was,
link |
01:26:46.920
and, you know, you could raise, you know, this,
link |
01:26:48.840
and kids got so attached to it
link |
01:26:51.640
and, you know, didn't want to let it die
link |
01:26:53.600
and probably, if we would have asked, you know,
link |
01:26:56.920
the children, do you think this Tamagotchi is conscious?
link |
01:26:59.520
They would have said yes.
link |
01:27:00.360
Half of them would have said yes, I would guess.
link |
01:27:01.600
I think that's kind of a beautiful thing, actually,
link |
01:27:04.720
because that consciousness, ascribing consciousness,
link |
01:27:08.640
seems to create a deeper connection.
link |
01:27:10.440
Yeah.
link |
01:27:11.280
Which is a powerful thing.
link |
01:27:12.600
But we'll have to be careful on the ethics side of that.
link |
01:27:15.880
Well, let me ask about the AGI community broadly.
link |
01:27:18.440
You kind of represent some of the most serious work on AGI,
link |
01:27:22.600
at least in earlier years,
link |
01:27:24.280
and DeepMind represents serious work on AGI these days.
link |
01:27:29.280
But why, in your sense, is the AGI community so small
link |
01:27:34.080
or has been so small until maybe DeepMind came along?
link |
01:27:38.120
Like, why aren't more people seriously working
link |
01:27:41.680
on human level and superhuman level intelligence
link |
01:27:45.840
from a formal perspective?
link |
01:27:48.240
Okay, from a formal perspective,
link |
01:27:49.680
that's sort of an extra point.
link |
01:27:53.640
So I think there are a couple of reasons.
link |
01:27:54.960
I mean, AI came in waves, right?
link |
01:27:56.680
You know, AI winters and AI summers,
link |
01:27:58.520
and then there were big promises which were not fulfilled,
link |
01:28:01.520
and people got disappointed.
link |
01:28:05.760
And then narrow AI, solving particular problems
link |
01:28:11.480
which seemed to require intelligence,
link |
01:28:14.040
was always to some extent successful,
link |
01:28:17.000
and there were improvements, small steps.
link |
01:28:19.480
And if you build something which is useful for society
link |
01:28:24.240
or industrially useful, then there's a lot of funding.
link |
01:28:26.600
So I guess it was in part the money,
link |
01:28:29.960
which drives people to develop a specific system
link |
01:28:34.200
solving specific tasks.
link |
01:28:36.240
But you would think that, at least in university,
link |
01:28:39.680
you should be able to do ivory tower research.
link |
01:28:43.680
And that was probably better a long time ago,
link |
01:28:46.000
but even nowadays, there's quite some pressure
link |
01:28:48.280
of doing applied research or translational research,
link |
01:28:52.240
and it's harder to get grants as a theorist.
link |
01:28:56.640
So that also drives people away.
link |
01:28:59.920
It's maybe also harder
link |
01:29:01.520
attacking the general intelligence problem.
link |
01:29:03.120
So I think enough people, I mean, maybe a small number
link |
01:29:05.880
were still interested in formalizing intelligence
link |
01:29:09.560
and thinking of general intelligence,
link |
01:29:12.880
but not much came up, right?
link |
01:29:17.560
Well, not much great stuff came up.
link |
01:29:19.880
So what do you think,
link |
01:29:21.360
we talked about the formal picture, the big light
link |
01:29:24.840
at the end of the tunnel,
link |
01:29:26.160
but from the engineering perspective,
link |
01:29:27.600
what do you think it takes to build an AGI system?
link |
01:29:30.360
Is that, and I don't know if that's a stupid question
link |
01:29:33.920
or a distinct question
link |
01:29:35.120
from everything we've been talking about with AIXI,
link |
01:29:37.160
but what do you see as the steps that are necessary to take
link |
01:29:41.040
to start to try to build something?
link |
01:29:43.040
So you want a blueprint now,
link |
01:29:44.360
and then you go off and do it?
link |
01:29:46.360
That's the whole point of this conversation,
link |
01:29:48.040
trying to squeeze that in there.
link |
01:29:49.800
Now, is there, I mean, what's your intuition?
link |
01:29:51.560
Is it in the robotics space
link |
01:29:53.960
or something that has a body and tries to explore the world?
link |
01:29:56.800
Is it in the reinforcement learning space,
link |
01:29:58.960
like the efforts with AlphaZero and AlphaStar
link |
01:30:01.000
that are kind of exploring how you can solve it through
link |
01:30:04.360
simulation in the gaming world?
link |
01:30:06.720
Is there stuff in sort of all the transformer work
link |
01:30:11.440
and natural language processing,
link |
01:30:13.200
sort of maybe attacking the open domain dialogue?
link |
01:30:15.800
Like, where do you see the promising pathways?
link |
01:30:21.560
Let me pick the embodiment maybe.
link |
01:30:24.520
So embodiment is important, yes and no.
link |
01:30:33.160
I don't believe that we need a physical robot
link |
01:30:38.600
walking or rolling around, interacting with the real world
link |
01:30:42.960
in order to achieve AGI.
link |
01:30:45.080
And I think it's more of a distraction probably
link |
01:30:50.600
than helpful, it's sort of confusing the body with the mind.
link |
01:30:54.560
For industrial applications or near term applications,
link |
01:30:58.920
of course we need robots for all kinds of things,
link |
01:31:01.200
but for solving the big problem, at least at this stage,
link |
01:31:06.240
I think it's not necessary.
link |
01:31:08.120
But the answer is also yes,
link |
01:31:10.080
that I think the most promising approach
link |
01:31:13.240
is that you have an agent
link |
01:31:15.280
and that can be a virtual agent in a computer
link |
01:31:18.480
interacting with an environment,
link |
01:31:20.120
possibly a 3D simulated environment
link |
01:31:22.560
like in many computer games.
link |
01:31:25.320
And you train and let the agent learn,
link |
01:31:29.760
even if you don't intend to later put it sort of,
link |
01:31:33.120
this algorithm in a robot brain
link |
01:31:35.560
and instead leave it forever in the virtual reality,
link |
01:31:38.560
getting experience in an,
link |
01:31:40.520
albeit just simulated, 3D world,
link |
01:31:45.400
is possibly, and I say possibly,
link |
01:31:47.960
important to understand things
link |
01:31:51.600
on a similar level as humans do,
link |
01:31:55.120
especially if the agent or primarily if the agent
link |
01:31:58.560
needs to interact with the humans.
link |
01:32:00.320
If you talk about objects on top of each other in space
link |
01:32:02.960
and flying and cars and so on,
link |
01:32:04.760
and the agent has no experience
link |
01:32:06.400
with even virtual 3D worlds,
link |
01:32:09.560
it's probably hard to grasp.
link |
01:32:12.320
So if you develop an abstract agent,
link |
01:32:14.520
say we take the mathematical path
link |
01:32:16.720
and we just want to build an agent
link |
01:32:18.320
which can prove theorems
link |
01:32:19.480
and becomes a better and better mathematician,
link |
01:32:21.760
then this agent needs to be able to reason
link |
01:32:24.520
in very abstract spaces
link |
01:32:25.960
and then maybe sort of putting it into 3D environments,
link |
01:32:28.920
simulated or not is even harmful.
link |
01:32:30.480
It should sort of, you put it in, I don't know,
link |
01:32:33.400
an environment which it creates itself or so.
link |
01:32:36.680
It seems like you have an interesting, rich,
link |
01:32:38.760
complex trajectory through life
link |
01:32:40.680
in terms of your journey of ideas.
link |
01:32:42.680
So it's interesting to ask what books,
link |
01:32:45.760
technical, fiction, philosophical,
link |
01:32:49.080
books, ideas, or people have had a transformative effect on you.
link |
01:32:52.680
Books are most interesting
link |
01:32:53.800
because maybe people could also read those books
link |
01:32:57.280
and see if they could be inspired as well.
link |
01:33:00.120
Yeah, luckily you asked about books and not a singular book.
link |
01:33:03.520
It's very hard if I try to pin down one book.
link |
01:33:08.120
And I can do that at the end.
link |
01:33:10.520
So the most,
link |
01:33:14.200
the books which were most transformative for me
link |
01:33:16.360
or which I can most highly recommend
link |
01:33:19.600
to people interested in AI.
link |
01:33:21.920
Both perhaps.
link |
01:33:22.880
Yeah, yeah, both, both, yeah, yeah.
link |
01:33:25.440
I would always start with Russell and Norvig,
link |
01:33:28.560
Artificial Intelligence, A Modern Approach.
link |
01:33:30.880
That's the AI Bible.
link |
01:33:33.400
It's an amazing book.
link |
01:33:35.000
It's very broad.
link |
01:33:36.320
It covers all approaches to AI.
link |
01:33:38.800
And even if you focused on one approach,
link |
01:33:40.840
I think that is the minimum you should know
link |
01:33:42.520
about the other approaches out there.
link |
01:33:44.600
So that should be your first book.
link |
01:33:46.200
Fourth edition should be coming out soon.
link |
01:33:48.320
Oh, okay, interesting.
link |
01:33:50.040
There's a deep learning chapter now,
link |
01:33:51.480
so there must be.
link |
01:33:53.080
Written by Ian Goodfellow, okay.
link |
01:33:55.560
And then the next book I would recommend,
link |
01:33:59.680
the reinforcement learning book by Sutton and Barto.
link |
01:34:02.920
That's a beautiful book.
link |
01:34:04.440
If there's any problem with the book,
link |
01:34:06.920
it makes RL feel and look much easier than it actually is.
link |
01:34:12.920
It's a very gentle book.
link |
01:34:14.800
It's very nice to read, with exercises to do.
link |
01:34:16.760
You can very quickly get some RL systems to run.
link |
01:34:19.520
You know, very toy problems, but it's a lot of fun.
link |
01:34:22.520
And in a couple of days you feel you know what RL is about,
link |
01:34:28.120
but it's much harder than the book.
link |
01:34:30.560
Yeah.
link |
01:34:31.400
Oh, come on now, it's an awesome book.
link |
01:34:34.840
Yeah, it is, yeah.
link |
01:34:36.240
And maybe, I mean, there's so many books out there.
link |
01:34:41.480
If you like the information theoretic approach,
link |
01:34:43.440
then there's the Kolmogorov Complexity book by Li and Vitányi,
link |
01:34:46.760
but probably, you know, some short article is enough.
link |
01:34:50.800
You don't need to read a whole book,
link |
01:34:52.120
but it's a great book.
link |
01:34:54.440
And if you have to mention one all time favorite book,
link |
01:34:59.440
it's of different flavor, that's a book
link |
01:35:01.880
which is used in the International Baccalaureate
link |
01:35:04.800
for high school students in several countries.
link |
01:35:08.560
That's from Nicholas Alchin, Theory of Knowledge,
link |
01:35:12.520
second edition or first, not the third, please.
link |
01:35:16.120
The third one, they took out all the fun.
link |
01:35:18.480
Okay.
link |
01:35:20.240
So this asks all the interesting,
link |
01:35:25.240
or to me, interesting philosophical questions
link |
01:35:27.200
about how we acquire knowledge from all perspectives,
link |
01:35:30.040
from math, from art, from physics,
link |
01:35:33.400
and asks, how can we know anything?
link |
01:35:36.240
And the book is called Theory of Knowledge.
link |
01:35:38.040
So is this almost like a philosophical exploration
link |
01:35:40.720
of how we get knowledge from anything?
link |
01:35:43.160
Yes, yeah, I mean, can religion tell us, you know,
link |
01:35:45.160
something about the world?
link |
01:35:46.200
Can science tell us something about the world?
link |
01:35:48.080
Can mathematics, or is it just playing with symbols?
link |
01:35:51.920
And, you know, these are open-ended questions.
link |
01:35:54.400
And, I mean, it's for high school students,
link |
01:35:56.240
so they have resources from The Hitchhiker's Guide
link |
01:35:58.320
to the Galaxy and from Star Wars
link |
01:35:59.960
and The Chicken Crossed the Road, yeah.
link |
01:36:01.800
And it's fun to read, but it's also quite deep.
link |
01:36:07.600
If you could live one day of your life over again,
link |
01:36:11.480
because it made you truly happy,
link |
01:36:12.840
Or maybe like we said with the books,
link |
01:36:14.440
it was truly transformative.
link |
01:36:16.240
What day, what moment would you choose
link |
01:36:19.120
Does something pop into your mind?
link |
01:36:22.080
Does it need to be a day in the past,
link |
01:36:23.480
or can it be a day in the future?
link |
01:36:25.920
Well, spacetime is an emergent phenomenon,
link |
01:36:27.960
so it's all the same anyway.
link |
01:36:30.400
Okay.
link |
01:36:32.040
Okay, from the past.
link |
01:36:34.280
You're really good at saying from the future, I love it.
link |
01:36:36.800
No, I will tell you from the future, okay.
link |
01:36:39.120
So from the past, I would say
link |
01:36:41.480
when I discovered my AIXI model.
link |
01:36:43.800
I mean, it was not in one day,
link |
01:36:45.160
but it was one moment where I realized
link |
01:36:48.880
Kolmogorov complexity, and I didn't even know that it existed,
link |
01:36:53.200
but I discovered sort of this compression idea
link |
01:36:55.800
myself, but immediately I knew I couldn't be the first one,
link |
01:36:58.120
but I had this idea.
link |
01:37:00.240
And then I knew about sequential decision theory,
link |
01:37:02.200
and I knew if I put it together, this is the right thing.
link |
01:37:06.360
And yeah, still when I think back about this moment,
link |
01:37:09.680
I'm super excited about it.
link |
01:37:12.400
Were there any more details and context to that moment?
link |
01:37:16.320
Did an apple fall on your head?
link |
01:37:20.120
So it was like, if you look at Ian Goodfellow
link |
01:37:21.960
talking about GANs, there was beer involved.
link |
01:37:25.920
Is there some more context of what sparked your thought,
link |
01:37:30.200
or was it just?
link |
01:37:31.200
No, it was much more mundane.
link |
01:37:32.960
So I worked in this company.
link |
01:37:34.560
So in this sense, the four and a half years
link |
01:37:36.160
was not completely wasted.
link |
01:37:39.320
And I worked on an image interpolation problem,
link |
01:37:43.720
and I developed some quite neat new interpolation techniques
link |
01:37:48.480
and they got patented, which happens quite often.
link |
01:37:52.240
I sort of went overboard and thought,
link |
01:37:54.360
yeah, that's pretty good, but it's not the best.
link |
01:37:56.240
So what is the best possible way of doing interpolation?
link |
01:37:59.800
And then I thought, yeah, you want the simplest picture,
link |
01:38:03.200
which, if you coarse grain it,
link |
01:38:04.760
recovers your original picture.
link |
01:38:06.560
And then I thought about the simplicity concept
link |
01:38:08.880
more in quantitative terms,
link |
01:38:11.280
and then everything developed.
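To make the "simplest picture that coarse-grains back to the original" idea concrete, here is a minimal, purely illustrative Python sketch, not Marcus's patented technique: the candidate upscalings are supplied by the caller, zlib-compressed size stands in as a very crude proxy for Kolmogorov complexity, and the winner is the candidate that both coarse-grains back to the small image and has the shortest description.

    import zlib
    import numpy as np

    def coarse_grain(img, factor=2):
        # Average non-overlapping factor x factor blocks (the downscale check).
        h, w = img.shape
        return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

    def description_length(img):
        # Crude stand-in for Kolmogorov complexity: size of the compressed pixel bytes.
        return len(zlib.compress(np.asarray(img, dtype=np.uint8).tobytes()))

    def simplest_interpolation(small, candidates, factor=2, tol=1.0):
        # Among candidates that coarse-grain back to `small`,
        # pick the one with the shortest description length.
        best, best_len = None, float("inf")
        for cand in candidates:
            if np.allclose(coarse_grain(cand, factor), small, atol=tol):
                dl = description_length(cand)
                if dl < best_len:
                    best, best_len = cand, dl
        return best

    # Toy usage: the nearest-neighbour upscale always coarse-grains back exactly.
    small = np.array([[10.0, 200.0], [60.0, 120.0]])
    candidates = [np.kron(small, np.ones((2, 2)))]
    print(simplest_interpolation(small, candidates))

In a real interpolation method the candidate set and the complexity measure would be far more sophisticated; the point here is only the selection principle: among all pictures consistent with the low-resolution data, prefer the one with the shortest description.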
link |
01:38:15.040
And somehow the full beautiful mix
link |
01:38:17.120
of also being a physicist
link |
01:38:18.920
and thinking about the big picture of it,
link |
01:38:20.600
then led you to probably think big with AIXI.
link |
01:38:24.120
So as a physicist, I was probably trained
link |
01:38:26.200
not to always think in computational terms,
link |
01:38:28.440
just ignore that and think about
link |
01:38:30.840
the fundamental properties, which you want to have.
link |
01:38:34.000
So what about if you could relive one day in the future?
link |
01:38:36.920
What would that be?
link |
01:38:39.880
When I solve the AGI problem.
link |
01:38:43.320
In practice, so in theory,
link |
01:38:45.120
I have solved it with the AIXI model, but in practice.
link |
01:38:48.680
And then I ask it the first question.
link |
01:38:50.720
What would be the first question?
link |
01:38:53.200
What's the meaning of life?
link |
01:38:55.680
I don't think there's a better way to end it.
link |
01:38:58.400
Thank you so much for talking today.
link |
01:38:59.240
It's a huge honor to finally meet you.
link |
01:39:01.360
Yeah, thank you too.
link |
01:39:02.200
It was a pleasure of mine too.
link |
01:39:33.160
And now let me leave you with some words of wisdom
link |
01:39:35.760
from Albert Einstein.
link |
01:39:38.000
The measure of intelligence is the ability to change.
link |
01:39:42.040
Thank you for listening and hope to see you next time.