
Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI | Lex Fridman Podcast #75



link |
00:00:00.000
The following is a conversation with Marcus Hutter, Senior Research Scientist at Google DeepMind.
link |
00:00:06.000
Throughout his career of research, including with Jürgen Schmidhuber and Shane Legg,
link |
00:00:11.000
he has proposed a lot of interesting ideas in and around the field of artificial general intelligence,
link |
00:00:17.000
including the development of the AIXI, spelled A I X I, model,
link |
00:00:22.000
which is a mathematical approach to AGI that incorporates ideas of
link |
00:00:27.000
Kolmogorov complexity, Solomonoff induction, and reinforcement learning.
link |
00:00:33.000
In 2006, Marcus launched the 50,000 Euro Hutter Prize for Lossless Compression of Human Knowledge.
link |
00:00:41.000
The idea behind this prize is that the ability to compress well is closely related to intelligence.
link |
00:00:48.000
This, to me, is a profound idea.
link |
00:00:51.000
Specifically, if you can compress the first 100 megabytes or 1 gigabyte of Wikipedia better than your predecessors,
link |
00:00:58.000
your compressor likely has to also be smarter.
link |
00:01:02.000
The intention of this prize is to encourage the development of intelligent compressors as a path to AGI.
link |
00:01:09.000
In conjunction with his podcast release just a few days ago,
link |
00:01:13.000
Marcus announced a 10x increase in several aspects of this prize, including the money,
link |
00:01:19.000
to 500,000 Euros.
link |
00:01:22.000
The better your compressor works, relative to the previous winners, the higher fraction of that prize money is awarded to you.
link |
00:01:29.000
You can learn more about it if you Google simply Hutter Prize.
link |
00:01:34.000
I'm a big fan of benchmarks for developing AI systems,
link |
00:01:38.000
and the Hutter Prize may indeed be one that will spark some good ideas for approaches
link |
00:01:43.000
that will make progress on the path of developing AGI systems.
link |
00:01:47.000
This is the Artificial Intelligence Podcast.
link |
00:01:50.000
If you enjoy it, subscribe on YouTube, give it 5 stars on Apple Podcasts,
link |
00:01:54.000
support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N.
link |
00:02:02.000
As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation.
link |
00:02:09.000
I hope that works for you and doesn't hurt the listening experience.
link |
00:02:13.000
This show is presented by Cash App, the number one finance app in the App Store.
link |
00:02:17.000
When you get it, use code LEX Podcast.
link |
00:02:21.000
Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1.
link |
00:02:27.000
Brokerage services are provided by Cash App Investing, a subsidiary of Square, and member SIPC.
link |
00:02:34.000
Since Cash App allows you to send and receive money digitally, peer to peer, security in all digital transactions is very important.
link |
00:02:42.000
Let me mention the PCI Data Security Standard that Cash App is compliant with.
link |
00:02:48.000
I'm a big fan of standards for safety and security.
link |
00:02:52.000
PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions.
link |
00:03:02.000
Now, we just need to do the same for autonomous vehicles and AI systems in general.
link |
00:03:08.000
So again, if you get Cash App from the App Store or Google Play and use the code LEX Podcast, you'll get $10.
link |
00:03:16.000
And Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world.
link |
00:03:27.000
And now, here's my conversation with Marcus Hutter.
link |
00:03:32.000
Do you think of the universe as a computer or maybe an information processing system?
link |
00:03:37.000
Let's go with a big question first.
link |
00:03:39.000
Okay, with a big question first.
link |
00:03:41.000
I think it's a very interesting hypothesis or idea.
link |
00:03:45.000
And I have a background in physics.
link |
00:03:48.000
So I know a little bit about physical theories, the standard model of particle physics and general relativity theory.
link |
00:03:54.000
And they are amazing and describe virtually everything in the universe.
link |
00:03:58.000
And they're all, in a sense, computable theories.
link |
00:04:00.000
I mean, they're very hard to compute.
link |
00:04:02.000
And they're very elegant, simple theories which describe virtually everything in the universe.
link |
00:04:07.000
So there's a strong indication that somehow the universe is computable.
link |
00:04:15.000
But it's a plausible hypothesis.
link |
00:04:17.000
So what do you think, just like you said, general relativity, quantum field theory,
link |
00:04:22.000
why do you think the laws of physics are so nice and beautiful and simple and compressible?
link |
00:04:28.000
Do you think our universe was designed, or is naturally this way?
link |
00:04:34.000
Are we just focusing on the parts that are especially compressible?
link |
00:04:39.000
Do human minds just enjoy something about that simplicity?
link |
00:04:43.000
And in fact, there's other things that are not so compressible.
link |
00:04:46.000
No, I strongly believe and I'm pretty convinced that the universe is inherently beautiful, elegant and simple
link |
00:04:53.000
and described by these equations.
link |
00:04:55.000
And we're not just picking that.
link |
00:04:57.000
I mean, if there were some phenomena which cannot be neatly described, scientists would try to describe them.
link |
00:05:04.000
And there's biology, which is more messy, but we understand that it's an emergent phenomenon.
link |
00:05:09.000
And they're complex systems, but they still follow the same rules of quantum electrodynamics.
link |
00:05:14.000
All of chemistry follows that and we know that.
link |
00:05:16.000
I mean, we cannot compute everything because we have limited computational resources.
link |
00:05:20.000
No, I think it's not a bias of the humans, but it's objectively simple.
link |
00:05:24.000
I mean, of course, you never know, maybe there's some corners very far out in the universe
link |
00:05:28.000
or super, super tiny below the nucleus of atoms or parallel universes
link |
00:05:36.000
which are not nice and simple, but there's no evidence for that.
link |
00:05:40.000
And we should apply Occam's razor and choose the simplest theory consistent with it.
link |
00:05:45.000
But although it's a little bit self referential.
link |
00:05:48.000
So maybe a quick pause.
link |
00:05:49.000
What is Occam's razor?
link |
00:05:51.000
So Occam's razor says that you should not multiply entities beyond necessity,
link |
00:05:57.000
which sort of if you translate it to proper English means and, you know,
link |
00:06:02.000
in the scientific context means that if you have two theories or hypotheses or models
link |
00:06:06.000
which equally well describe the phenomenon you're studying or the data,
link |
00:06:11.000
you should choose the more simple one.
link |
00:06:13.000
So that's just the principle?
link |
00:06:15.000
Yes.
link |
00:06:16.000
Sort of that's not like a provable law perhaps.
link |
00:06:20.000
We'll kind of discuss it and think about it.
link |
00:06:23.000
But what's the intuition of why the simpler answer is the one that is likely
link |
00:06:30.000
to be more correct descriptor of whatever we're talking about?
link |
00:06:34.000
I believe that Occam's razor is probably the most important principle in science.
link |
00:06:40.000
I mean, of course, we need logical deduction and we do experimental design.
link |
00:06:44.000
But science is about understanding the world, finding models of the world.
link |
00:06:51.000
And we can come up with crazy complex models which, you know,
link |
00:06:54.000
explain everything but predict nothing.
link |
00:06:56.000
But the simple models seem to have predictive power, and it's a valid question.
link |
00:07:02.000
Why?
link |
00:07:03.000
And there are two answers to that. One is you can just accept it:
link |
00:07:07.000
That is the principle of science and we use this principle and it seems to be successful.
link |
00:07:12.000
We don't know why, but it just happens to be.
link |
00:07:15.000
Or you can try, you know, to find another principle which explains Occam's razor.
link |
00:07:21.000
And if we start with the assumption that the world is governed by simple rules,
link |
00:07:27.000
then there's a bias towards simplicity.
link |
00:07:31.000
And applying Occam's razor is the mechanism to finding these rules.
link |
00:07:37.000
And actually, in a more quantitative sense, and we come back to that later
link |
00:07:40.000
in terms of Solomonoff induction, you can rigorously prove that if you assume
link |
00:07:44.000
that the world is simple, then Occam's razor is the best you can do in a certain sense.
link |
00:07:49.000
So I apologize for the romanticized question, but why do you think outside of its effectiveness,
link |
00:07:56.000
why do we, do you think we find simplicity so appealing as human beings?
link |
00:08:00.000
Why does it just, why does E equals MC squared seem so beautiful to us humans?
link |
00:08:08.000
I guess mostly, in general, many things can be explained by an evolutionary argument.
link |
00:08:15.000
And, you know, there's some artifacts in humans which, you know, are just artifacts
link |
00:08:19.000
and not evolutionarily necessary.
link |
00:08:21.000
But with this beauty and simplicity, it's, I believe, at least the core is about,
link |
00:08:31.000
like science, finding regularities in the world, understanding the world,
link |
00:08:36.000
which is necessary for survival, right?
link |
00:08:38.000
You know, if I look at a bush, right, and I just see noise and there is a tiger, right,
link |
00:08:44.000
and eats me, then I'm dead.
link |
00:08:45.000
But if I try to find a pattern, and we know that humans are prone to find more patterns
link |
00:08:52.000
in data than there are, you know, like the Mars face and all these things,
link |
00:08:57.000
but this bias towards finding patterns, even if they are not there, but I mean,
link |
00:09:02.000
it's best, of course, if they are, yeah, helps us for survival.
link |
00:09:06.000
Yeah, that's fascinating.
link |
00:09:07.000
I haven't really thought about that. I thought I just loved science,
link |
00:09:11.000
but indeed, purely in terms of survival purposes,
link |
00:09:16.000
There is an evolutionary argument for why we find the work of Einstein so beautiful.
link |
00:09:24.000
Maybe a quick small tangent.
link |
00:09:26.000
Could you describe what Solomonoff induction is?
link |
00:09:30.000
Yeah, so that's a theory which I claim, and Ray Solomonoff sort of claimed a long time ago,
link |
00:09:37.000
that this solves the big philosophical problem of induction.
link |
00:09:42.000
And I believe the claim is essentially true.
link |
00:09:45.000
And what he does is the following.
link |
00:09:47.000
So, okay, for the picky listener, induction can be interpreted narrowly and widely. Narrowly
link |
00:09:56.000
means inferring models from data, and widely means also then using these models
link |
00:10:03.000
for doing predictions, so prediction is also part of the induction.
link |
00:10:06.000
So I'm a little sloppy sort of with the terminology, and maybe that comes from Ray Solomonoff,
link |
00:10:12.000
you know, being sloppy, maybe I shouldn't say that.
link |
00:10:15.000
He can't complain anymore.
link |
00:10:17.000
So let me explain a little bit this theory in simple terms.
link |
00:10:22.000
So assume you have a data sequence, make it very simple.
link |
00:10:25.000
The simplest one, say one, one, one, one, one, and you've seen a hundred ones.
link |
00:10:28.000
What do you think comes next?
link |
00:10:30.000
The natural answer I'm going to speed up a little bit.
link |
00:10:32.000
The natural answer is of course, you know, one.
link |
00:10:35.000
Okay.
link |
00:10:36.000
And the question is why?
link |
00:10:37.000
Okay.
link |
00:10:38.000
Well, we see a pattern there.
link |
00:10:40.000
Yeah.
link |
00:10:41.000
Okay.
link |
00:10:42.000
There's a one and we repeat it.
link |
00:10:43.000
And why should it suddenly after a hundred ones be different?
link |
00:10:45.000
So what we're looking for is simple explanations or models for the data we have.
link |
00:10:50.000
And now the question is, a model has to be presented in a certain language. Which language should be used in science?
link |
00:10:58.000
We want formal languages and we can use mathematics or we can use programs on a computer.
link |
00:11:03.000
So abstractly on a Turing machine, for instance, or it can be a general purpose computer.
link |
00:11:08.000
So there are of course lots of models. You can say maybe it's a hundred ones and then a hundred zeros and a hundred ones.
link |
00:11:14.000
That's a model, right?
link |
00:11:15.000
But there are simpler models.
link |
00:11:17.000
There's the model: print one in a loop.
link |
00:11:19.000
That also explains the data.
link |
00:11:21.000
And if you push that to the extreme, you are looking for the shortest program, which if you run this program reproduces the data you have, it will not stop.
link |
00:11:32.000
It will continue naturally.
link |
00:11:34.000
And this you take for your prediction.
link |
00:11:36.000
And on the sequence of ones, it's very plausible, right?
link |
00:11:39.000
That print one loop is the shortest program.
link |
00:11:41.000
We can give some more complex examples like one, two, three, four, five.
link |
00:11:45.000
What comes next?
link |
00:11:46.000
The shortest program is again, you know, a counter.
link |
00:11:48.000
And so that is, roughly speaking, how Solomonoff induction works.
link |
00:11:53.000
The extra twist is that it can also deal with noisy data.
link |
00:11:57.000
So if you have, for instance, a coin flip, say a biased coin, which comes up head with 60 percent probability, then it will predict.
link |
00:12:06.000
It will learn and figure this out.
link |
00:12:08.000
And after a while, it predicts, oh, the next coin flip will be heads with probability 60 percent.
link |
00:12:13.000
So it's the stochastic version of that.
link |
00:12:15.000
The goal, the dream, is always the search for the short program.
link |
00:12:18.000
Yes.
link |
00:12:19.000
Yeah.
link |
00:12:20.000
Well, in Solomonoff induction, precisely what you do is you combine.
link |
00:12:23.000
So looking for the shortest program is like applying Occam's razor, like looking for the simplest theory.
link |
00:12:29.000
There's also Epicurus' principle, which says if you have multiple hypotheses which equally well describe your data, don't discard any of them.
link |
00:12:36.000
Keep all of them around.
link |
00:12:37.000
You never know.
link |
00:12:38.000
And you can put that together and say, okay, I have a bias towards simplicity, but I don't rule out the larger models.
link |
00:12:45.000
And technically what we do is we weigh the shorter models higher and the longer models lower.
link |
00:12:52.000
And you use Bayesian techniques.
link |
00:12:54.000
You have a prior, which is precisely two to the minus the complexity of the program.
link |
00:13:02.000
And you weigh all these hypotheses, take this mixture, and then you also get this stochasticity in.
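A minimal sketch of that weighting scheme, with a tiny hand-picked hypothesis class standing in for the space of all programs (the description lengths and coin models below are illustrative assumptions, not Hutter's actual construction):

```python
# Toy Bayesian mixture with a simplicity prior: each hypothesis gets an a priori
# weight of 2^(-description length in bits), then is reweighted by its likelihood.
hypotheses = [
    {"name": "always 1",  "length_bits": 3, "p_one": 1.0},
    {"name": "fair coin", "length_bits": 5, "p_one": 0.5},
    {"name": "60% ones",  "length_bits": 9, "p_one": 0.6},
]

def posterior(bits, hyps):
    """Posterior weight of each hypothesis: prior 2^(-length) times likelihood."""
    weights = []
    for h in hyps:
        prior = 2.0 ** -h["length_bits"]
        likelihood = 1.0
        for b in bits:
            likelihood *= h["p_one"] if b == 1 else 1.0 - h["p_one"]
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

def predict_next_one(bits, hyps):
    """Mixture prediction: average the models' predictions by their posteriors."""
    return sum(p * h["p_one"] for p, h in zip(posterior(bits, hyps), hyps))

print(predict_next_one([1] * 100, hypotheses))                       # essentially 1.0
print(predict_next_one([1, 1, 0, 1, 0, 1, 1, 1, 0, 1], hypotheses))  # about 0.51: the simpler fair-coin model still dominates on ten bits
```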
link |
00:13:07.000
Yeah.
link |
00:13:08.000
Like many of your ideas, that's just a beautiful idea of weighing based on the simplicity of the program.
link |
00:13:13.000
I love that.
link |
00:13:14.000
That seems to me maybe a very human centric concept seems to be a very appealing way of discovering good programs in this world.
link |
00:13:24.000
You've used the term compression quite a bit.
link |
00:13:28.000
I think it's a beautiful idea.
link |
00:13:30.000
Sort of, we just talked about simplicity and maybe science or just all of our intellectual pursuits is basically the attempt to compress the complexity all around us into something simple.
link |
00:13:43.000
So what does this word mean to you?
link |
00:13:48.000
Compression.
link |
00:13:49.000
I essentially have already explained it.
link |
00:13:52.000
So compression means for me finding short programs for the data or the phenomenon at hand.
link |
00:14:00.000
You could interpret it more widely as finding simple theories which can be mathematical theories or maybe even informal like just in words.
link |
00:14:09.000
Compression means finding short descriptions explanations programs for the data.
link |
00:14:15.000
Do you see science as a kind of our human attempt at compression?
link |
00:14:22.000
So we're speaking more generally, because when you say programs, you're kind of zooming in on a particular, sort of almost computer science, artificial intelligence focus.
link |
00:14:30.000
But do you see all of human endeavor as a kind of compression?
link |
00:14:34.000
Well, at least all of science I see as an endeavor of compression, not all of humanity, maybe.
link |
00:14:40.000
And well, there are also some other aspects of science like experimental design, right?
link |
00:14:45.000
I mean, we create experiments specifically to get extra knowledge and this is, that isn't part of the decision making process.
link |
00:14:53.000
But once we have the data to understand the data is essentially compression.
link |
00:14:59.000
So I don't see any difference between compression, understanding and prediction.
link |
00:15:06.000
So we're jumping around topics a little bit, but returning back to simplicity, a fascinating concept of Kolmogorov complexity.
link |
00:15:14.000
So in your sense, do most objects in our mathematical universe have high Kolmogorov complexity? And maybe, first of all, what is Kolmogorov complexity?
link |
00:15:26.000
Okay, Kolmogorov complexity is a notion of simplicity or complexity.
link |
00:15:31.000
And it takes the compression view to the extreme.
link |
00:15:36.000
So I explained before that if you have some data sequence, just think about a file on a computer, essentially just a string of bits.
link |
00:15:45.000
And if you, and we have data compressors, like we compress big files into zip files with certain compressors.
link |
00:15:53.000
And you can also produce self extracting archives, that means as an executable, if you run it, it reproduces your original file without needing an extra decompressor.
link |
00:16:02.000
It's just a decompressor plus the archive together in one.
link |
00:16:06.000
And now they're better and worse compressors and you can ask what is the ultimate compressor.
link |
00:16:11.000
So what is the shortest possible self extracting archive you could produce for a certain data set, which reproduces the data set? And the length of this is called the Kolmogorov complexity.
link |
00:16:22.000
And arguably, that is the information content in the data set.
link |
00:16:27.000
I mean, if the data set is very redundant or very boring, you can compress it very well.
link |
00:16:31.000
So the information content should be low.
link |
00:16:34.000
And, you know, it is low according to this definition.
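A computable stand-in for that idea, using an off-the-shelf compressor (which only gives an upper bound on the true, uncomputable Kolmogorov complexity; exact byte counts depend on the compressor):

```python
import os
import zlib

boring = b"A" * 100_000          # highly redundant data
noise = os.urandom(100_000)      # incompressible with overwhelming probability

# zlib only approximates from above: tiny output for the redundant file,
# essentially no compression for the random one.
print(len(zlib.compress(boring)))   # on the order of a hundred bytes
print(len(zlib.compress(noise)))    # slightly over 100,000 bytes
```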
link |
00:16:36.000
So the length of the shortest program that summarizes the data.
link |
00:16:40.000
Yes.
link |
00:16:41.000
And what's your sense of our universe when we think about the different objects in our universe that we try concepts or whatever at every level.
link |
00:16:55.000
Do they have high or low Kolmogorov complexity?
link |
00:16:58.000
So what's the hope?
link |
00:17:00.000
Do we have a lot of hope in being able to summarize much of our world?
link |
00:17:05.000
That's a tricky and difficult question.
link |
00:17:08.000
So as I said before, I believe that the whole universe based on the evidence we have is very simple.
link |
00:17:16.000
So it has a very short description.
link |
00:17:18.000
So to linger on that, the whole universe, what does that mean?
link |
00:17:24.000
Do you mean at the very basic fundamental level in order to create the universe?
link |
00:17:28.000
Yes.
link |
00:17:29.000
Yeah.
link |
00:17:30.000
So you need a very short program, and you run it to get the thing going, and then it will reproduce our universe.
link |
00:17:37.000
There's a problem with noise.
link |
00:17:39.000
We can come back to that later, possibly.
link |
00:17:42.000
Is noise a problem or is it a bug or a feature?
link |
00:17:46.000
I would say it makes our life as a scientist really, really much harder.
link |
00:17:52.000
I mean, think about it without noise, we wouldn't need all of the statistics.
link |
00:17:56.000
But then maybe we wouldn't feel like there's free will.
link |
00:17:59.000
Maybe we need that for the...
link |
00:18:01.000
This is an illusion that noise can give you free will.
link |
00:18:05.000
At least in that way, it's a feature.
link |
00:18:06.000
But also, if you don't have noise, you have chaotic phenomena which are effectively like noise.
link |
00:18:12.000
So we can't get away with statistics even then.
link |
00:18:15.000
I mean, think about rolling a dice and forget about quantum mechanics and you know exactly how you throw it.
link |
00:18:21.000
But I mean, it's still so hard to compute the trajectory that effectively it is best to model it as coming out with a number, with probability 1 over 6.
link |
00:18:31.000
But from this sort of philosophical...
link |
00:18:36.000
complexity perspective, if we didn't have noise, then arguably you could describe the whole universe as the standard model plus general relativity.
link |
00:18:47.000
I mean, we don't have a theory of everything yet, but sort of assuming we are close to it or have it.
link |
00:18:52.000
Plus the initial conditions which may hopefully be simple.
link |
00:18:55.000
And then you just run it and then you would reproduce the universe.
link |
00:18:58.000
But that's spoiled by noise or by chaotic systems or by initial conditions which may be complex.
link |
00:19:06.000
So now, if we don't take the whole universe but just a subset, you know, just take planet Earth.
link |
00:19:13.000
Planet Earth cannot be compressed into a couple of equations.
link |
00:19:17.000
This is a hugely complex system.
link |
00:19:19.000
So interesting.
link |
00:19:20.000
So when you look at the window, like the whole thing might be simple, but when you just take a small window, then...
link |
00:19:26.000
It may become complex and that may be counterintuitive, but there's a very nice analogy.
link |
00:19:31.000
The book, the library of all books.
link |
00:19:34.000
So imagine you have a normal library with interesting books and you go there, great.
link |
00:19:38.000
Lots of information and huge, quite complex.
link |
00:19:41.000
So now I create a library which contains all possible books, say, of 500 pages.
link |
00:19:47.000
So the first book just has AAAA over all the pages.
link |
00:19:50.000
The next book AAAA and ends with B.
link |
00:19:52.000
And so on.
link |
00:19:53.000
I create this library of all books.
link |
00:19:55.000
It's a short program which creates this library.
link |
00:19:57.000
So this library which has all books has zero information content.
link |
00:20:01.000
And you take a subset of this library and suddenly you have a lot of information in there.
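A toy version of that library-of-all-books argument, shrunk to a two-letter alphabet and four-character "books" so it actually runs (the sizes are arbitrary illustrative choices):

```python
from itertools import product

ALPHABET = "AB"
BOOK_LENGTH = 4

def all_books():
    """The whole 'library of all books' is generated by this tiny program."""
    for chars in product(ALPHABET, repeat=BOOK_LENGTH):
        yield "".join(chars)

library = list(all_books())        # 2**4 = 16 books; the generator is a few lines,
                                   # so the full library carries almost no information
favourites = ["ABBA", "BAAB"]      # naming a particular subset is where information
                                   # enters: you must specify which books you picked
print(len(library), favourites)
```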
link |
00:20:05.000
So that's fascinating.
link |
00:20:06.000
I think one of the most beautiful mathematical objects that at least today seems to be understudied or under talked about is cellular automata.
link |
00:20:15.000
What lessons do you draw from sort of the Game of Life or cellular automata,
link |
00:20:19.000
Where you start with the simple rules just like you're describing with the universe and somehow complexity emerges.
link |
00:20:26.000
Do you feel like you have an intuitive grasp on the behavior, the fascinating behavior of such systems where some, like you said, some chaotic behavior could happen.
link |
00:20:37.000
Some complexity could emerge.
link |
00:20:39.000
Some, it could die out in some very rigid structures.
link |
00:20:43.000
Do you have a sense about cellular automata that somehow transfers maybe to the bigger questions of our universe?
link |
00:20:51.000
The cellular automata, and especially Conway's Game of Life, are really great because the rule is so simple.
link |
00:20:56.000
You can explain it to every child and even by hand you can simulate a little bit.
link |
00:21:00.000
And you see these beautiful patterns emerge, and people have proven that it's even Turing complete.
link |
00:21:07.000
You cannot just use a computer to simulate the Game of Life, but you can also use the Game of Life to simulate any computer.
link |
00:21:13.000
That is truly amazing.
link |
00:21:16.000
And it's the prime example probably to demonstrate that very simple rules can lead to very rich phenomena.
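The rule he is referring to really is only a few lines long; a minimal sketch of one update step (the wrap-around edges and the glider demo are my own simplifying choices):

```python
def life_step(grid):
    """One step of Conway's Game of Life on a 2D list of 0/1 cells, edges wrapping."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # A dead cell with exactly 3 neighbours is born; a live cell with 2 or 3 survives.
            nxt[r][c] = 1 if neighbours == 3 or (grid[r][c] == 1 and neighbours == 2) else 0
    return nxt

# A glider on a 6x6 grid; iterating life_step makes it crawl diagonally forever.
glider = [[0] * 6 for _ in range(6)]
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    glider[r][c] = 1
print(life_step(glider))
```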
link |
00:21:25.000
And people, you know, sometimes ask, you know, how can, how is chemistry and biology so rich?
link |
00:21:30.000
I mean, this can't be based on simple rules.
link |
00:21:32.000
But no, we know quantum electrodynamics describes all of chemistry and we come later back to that.
link |
00:21:39.000
I claim intelligence can be explained or described in one single equation, this very rich phenomenon.
link |
00:21:45.000
You asked also about whether, you know, I understand this phenomenon, and probably not.
link |
00:21:53.000
And there's this saying that you never really understand things, you just get used to them.
link |
00:21:58.000
And I think I'm pretty used to cellular automata.
link |
00:22:03.000
So you believe that you understand now why this phenomenon happens.
link |
00:22:07.000
But I give you a different example.
link |
00:22:09.000
I didn't play too much with Conway's Game of Life, but a little bit more with fractals and with the Mandelbrot set.
link |
00:22:16.000
And it's beautiful, you know, the patterns. Just look at the Mandelbrot set.
link |
00:22:20.000
And, well, when the computers were really slow and I just had a black and white monitor and programmed my own programs in Assembler too.
link |
00:22:29.000
Assembler, wow.
link |
00:22:31.000
Wow, you're legit.
link |
00:22:33.000
To get these fractals on the screen, and I was mesmerized, and much later.
link |
00:22:37.000
So I returned to this, you know, every couple of years and then I tried to understand what is going on and you can understand a little bit.
link |
00:22:45.000
I tried to derive the locations, you know, there are these circles and the apple shape.
link |
00:22:53.000
And then you have smaller Mandelbrot sets recursively in this set.
link |
00:22:59.000
And there's a way to mathematically by solving high order polynomials to figure out where these centers are and what size they are approximately.
link |
00:23:08.000
And by sort of mathematically approaching this problem, you slowly get a feeling of why things are like they are.
link |
00:23:18.000
And that sort of is, you know, a first step to understanding why this rich phenomenon appears.
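The program behind those pictures is short; a minimal escape-time sketch (resolution, window, and iteration cap are arbitrary choices):

```python
def mandelbrot_ascii(width=60, height=24, max_iter=40):
    """Coarse ASCII rendering of the Mandelbrot set via the escape-time test."""
    for row in range(height):
        line = ""
        for col in range(width):
            # Map the character grid onto a window of the complex plane.
            c = complex(-2.2 + 3.0 * col / width, -1.2 + 2.4 * row / height)
            z = 0j
            for _ in range(max_iter):
                z = z * z + c
                if abs(z) > 2:      # escaped: c is outside the set
                    line += " "
                    break
            else:
                line += "#"         # never escaped within max_iter: treat as inside
        print(line)

mandelbrot_ascii()
```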
link |
00:23:25.000
Do you think it's possible? What's your intuition?
link |
00:23:27.000
Do you think it's possible to reverse engineer and find the short program that generated these fractals by looking at the fractals?
link |
00:23:36.000
Well, in principle, yes.
link |
00:23:38.000
So, I mean, in principle, what you can do is you take, you know, any data set, you know, you take these fractals or you take whatever your data set, whatever you have.
link |
00:23:47.000
It's a picture of Converse Game of Life.
link |
00:23:50.000
And you run through all programs: you take programs of size one, two, three, four, and you run them all in parallel in so-called dovetailing fashion.
link |
00:23:58.000
Give them computational resources, the first one 50 percent, the second one half of the remaining, and so on, and let them run.
link |
00:24:05.000
Wait until they halt, give an output, compare it to your data.
link |
00:24:09.000
And if some of these programs produce the correct data, then you stop and then you have already some program.
link |
00:24:14.000
It may be a long program because it's faster.
link |
00:24:16.000
And then you continue and you get shorter and shorter programs until you eventually find the shortest program.
link |
00:24:22.000
The interesting thing you can never know whether it's the shortest program because there could be an even shorter program which is just even slower.
link |
00:24:30.000
And you just have to wait. But asymptotically, and actually after a finite time, you have the shortest program.
link |
00:24:36.000
So this is a theoretical but completely impractical way of finding the underlying structure in every data set, and that is what Solomonoff induction does and what Kolmogorov complexity is about.
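A cartoon of that search, with three hand-written generators standing in for the enumeration of all programs (real dovetailing interleaves every program with geometrically shrinking time shares, which is what makes it computable but hopelessly impractical; the bit lengths here are made up):

```python
from itertools import islice

def ones():                 # "print 1 in a loop"
    while True:
        yield 1

def counter():              # 1, 2, 3, 4, ...
    n = 1
    while True:
        yield n
        n += 1

def alternating():          # 1, 0, 1, 0, ...
    while True:
        yield 1
        yield 0

# (assumed description length in bits, name, generator)
programs = [(8, "print 1 in a loop", ones),
            (12, "counter", counter),
            (10, "alternate 1,0", alternating)]

def shortest_explanation(data, programs):
    """Keep the shortest candidate whose output prefix reproduces the data."""
    best = None
    for length, name, prog in programs:
        if list(islice(prog(), len(data))) == list(data):
            if best is None or length < best[0]:
                best = (length, name)
    return best

print(shortest_explanation([1, 1, 1, 1, 1], programs))   # (8, 'print 1 in a loop')
print(shortest_explanation([1, 2, 3, 4, 5], programs))   # (12, 'counter')
```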
link |
00:24:50.000
In practice, of course, we have to approach the problem more intelligently.
link |
00:24:53.000
And then if you take resource limitations into account, there's, for instance, the field of pseudo random numbers.
link |
00:25:03.000
And these are random numbers.
link |
00:25:05.000
So these are deterministic sequences, but no algorithm which is fast, where fast means runs in polynomial time, can detect that they're actually deterministic.
link |
00:25:15.000
So we can produce interesting, I mean, random numbers, maybe not that interesting, but just an example.
link |
00:25:20.000
We can produce complex looking data and we can then prove that no fast algorithm can detect the underlying pattern.
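A small illustration of such a deterministic sequence (a plain linear congruential generator; the hardness claim he makes is about cryptographic pseudorandom generators, which an LCG is not, so this only shows the "deterministic but random-looking" part):

```python
def lcg_bits(seed, n, a=1664525, c=1013904223, m=2**32):
    """Fully deterministic sequence that looks like coin flips to the naked eye."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append((x >> 16) & 1)   # take one bit from the middle of the state
    return out

print(lcg_bits(seed=42, n=32))
```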
link |
00:25:31.000
Which is unfortunate. That's a big challenge for our search for simple programs in the space of artificial intelligence, perhaps.
link |
00:25:42.000
Yes, it definitely is, also in artificial intelligence, and it's quite surprising that, I can't say it's easy.
link |
00:25:48.000
I mean, physicists worked really hard to find these theories, but apparently it was possible for human minds to find these simple rules in the universe.
link |
00:25:57.000
It could have been different, right?
link |
00:25:59.000
It could have been different.
link |
00:26:00.000
It's awe inspiring.
link |
00:26:04.000
So let me ask another absurdly big question.
link |
00:26:08.000
What is intelligence in your view?
link |
00:26:13.000
So I have, of course, a definition.
link |
00:26:17.000
I wasn't sure what you're going to say, because you could have just as easy said, I have no clue.
link |
00:26:21.000
Which many people would say, but I'm not modest in this question.
link |
00:26:27.000
So the informal version, which I worked out together with Shane Legg, who co-founded DeepMind, is that intelligence measures an agent's ability to perform well in a wide range of environments.
link |
00:26:43.000
So that doesn't sound very impressive.
link |
00:26:47.000
But these words have been very carefully chosen and there is a mathematical theory behind that.
link |
00:26:53.000
And we come back to that later.
link |
00:26:55.000
And if you look at this definition by itself, it seems like, yeah, okay, but it seems a lot of things are missing.
link |
00:27:03.000
But if you think it through, then you realize that most and I claim all of the other traits, at least of rational intelligence, which we usually associate with intelligence, are emergent phenomena from this definition.
link |
00:27:18.000
Like, you know, creativity, memorization, planning, knowledge.
link |
00:27:22.000
You need all of that in order to perform well in a wide range of environments.
link |
00:27:27.000
So you don't have to explicitly mention that in a definition.
link |
00:27:30.000
Interesting.
link |
00:27:31.000
So yeah, so consciousness, abstract reasoning, all these kinds of things are just emergent phenomena that help you towards... can you say the definition again?
link |
00:27:42.000
So multiple environments.
link |
00:27:44.000
Did you mention the word goals?
link |
00:27:46.000
No, but we have an alternative definition instead of performing well, you can just replace it by goals.
link |
00:27:51.000
So intelligence measures an agent's ability to achieve goals in a wide range of environments.
link |
00:27:56.000
That's more or less equal.
link |
00:27:57.000
Interesting, because in there there's an injection of the word goals, so we want to specify there should be a goal.
link |
00:28:03.000
Yeah, but perform well is sort of what does it mean?
link |
00:28:06.000
It's the same problem.
link |
00:28:07.000
Yeah.
link |
00:28:08.000
There's a little bit gray area, but it's much closer to something that could be formalized.
link |
00:28:13.000
In your view, are humans, where do humans fit into that definition?
link |
00:28:18.000
Are they general intelligence systems that are able to perform in like, how good are they at fulfilling that definition at performing well in multiple environments?
link |
00:28:31.000
Yeah, that's a big question.
link |
00:28:33.000
I mean, the humans are performing best among all species.
link |
00:28:37.000
Species we know, we know of.
link |
00:28:40.000
Depends.
link |
00:28:41.000
You could say that trees and plants are doing a better job.
link |
00:28:44.000
They'll probably outlast us.
link |
00:28:46.000
Yeah, but they're in a much more narrow environment, right?
link |
00:28:49.000
I mean, you just have a little bit of air pollution and these trees die, and we can adapt, right?
link |
00:28:54.000
We build houses, we build filters, we do geoengineering.
link |
00:28:59.000
So the multiple environment part.
link |
00:29:01.000
Yeah, that is very important.
link |
00:29:02.000
Yeah.
link |
00:29:03.000
So they distinguish narrow intelligence from wide intelligence, also in the AI research.
link |
00:29:08.000
So let me ask the Alan Turing question: can machines think, can machines be intelligent?
link |
00:29:16.000
So in your view, I have to kind of ask, the answer is probably yes, but I want to kind of hear with your thoughts on it.
link |
00:29:24.000
Can machines be made to fulfill this definition of intelligence, to achieve intelligence?
link |
00:29:30.000
Well, we are sort of getting there and, you know, on a small scale, we are already there.
link |
00:29:36.000
The wide range of environments are missing.
link |
00:29:39.000
But we have self driving cars, we have programs to play go and chess, we have speech recognition.
link |
00:29:44.000
So it's pretty amazing, but you can, you know, these are narrow environments.
link |
00:29:49.000
But if you look at AlphaZero, that was also developed by DeepMind.
link |
00:29:54.000
I mean, got famous with AlphaGo and then came AlphaZero a year later.
link |
00:29:57.000
That was truly amazing.
link |
00:29:59.000
So it's a reinforcement learning algorithm, which is able, just by self play, to play chess and then also Go.
link |
00:30:08.000
And I mean, yes, they're both games, but they're quite different games.
link |
00:30:11.000
And, you know, you don't feed them the rules of the game.
link |
00:30:15.000
And the most remarkable thing, which is still a mystery to me that usually for any decent chess program, I don't know much about Go,
link |
00:30:22.000
you need opening books and endgame tables and so on. And nothing was put in there.
link |
00:30:29.000
Especially with AlphaZero, the self play mechanism starting from scratch, being able to learn actually new strategies.
link |
00:30:38.000
Yeah, it really discovered, you know, all these famous openings within four hours by itself.
link |
00:30:46.000
What I was really happy about, I'm a terrible chess player, but I like the Queen's Gambit.
link |
00:30:50.000
And AlphaZero figured out that this is the best opening.
link |
00:30:54.000
Finally, somebody proved you correct.
link |
00:30:59.000
So yes, to answer your question, yes, I believe that general intelligence is possible.
link |
00:31:04.000
And it also depends how you define it.
link |
00:31:08.000
Do you say AGI, artificial general intelligence, only refers to achieving human level? Or a sub human level,
link |
00:31:17.000
but quite broad, is that also general intelligence, or do we have to distinguish, or is it only super human intelligence that is general artificial intelligence?
link |
00:31:25.000
Is there a test in your mind like the Turing test for natural language or some other test that would impress the heck out of you
link |
00:31:32.000
that would kind of cross the line of your sense of intelligence within the framework that you said?
link |
00:31:40.000
Well, the Turing test has been criticized a lot, but I think it's not as bad as some people think.
link |
00:31:46.000
Some people think it's too strong, so it tests not just for a system to be intelligent,
link |
00:31:52.000
but it also has to fake human deception.
link |
00:31:56.000
Deception, which is much harder.
link |
00:31:59.000
And on the other hand, they say it's too weak because it just maybe fakes emotions or intelligent behavior.
link |
00:32:07.000
It's not real, but I don't think that's the problem or big problem.
link |
00:32:12.000
So if you would pass the Turing test, so a conversation over a terminal with a bot for an hour,
link |
00:32:20.000
or maybe a day or so, and you can fool a human into not knowing whether this is a human or not,
link |
00:32:26.000
so that's the Turing test, I would be truly impressed.
link |
00:32:30.000
And we have these annual competitions, the Loebner Prize.
link |
00:32:34.000
And I mean, it started with Eliza, that was the first conversational program.
link |
00:32:38.000
And what is it called in Japanese, Mitsuku or so, that's the winner of the last couple of years.
link |
00:32:44.000
It's quite impressive.
link |
00:32:46.000
Yeah, it's quite impressive.
link |
00:32:47.000
And then Google has developed Meena, right?
link |
00:32:50.000
Just recently, that's an open domain conversational bot, just a couple of weeks ago, I think.
link |
00:32:57.000
Yeah, I kind of like the metric that sort of the Alexa Prize has proposed.
link |
00:33:01.000
I mean, maybe it's obvious to you, it wasn't to me of setting sort of a length of a conversation.
link |
00:33:07.000
You want the bot to be sufficiently interesting that you'd want to keep talking to it for like 20 minutes.
link |
00:33:13.000
And that's a surprisingly effective aggregate metric.
link |
00:33:19.000
Because nobody has the patience to be able to talk to a bot that's not interesting
link |
00:33:27.000
and intelligent and witty and is able to go into different tangents, jump domains,
link |
00:33:32.000
be able to say something interesting to maintain your attention.
link |
00:33:36.000
Maybe many humans will also fail this test.
link |
00:33:39.000
Unfortunately, we set, just like with autonomous vehicles with chatbots,
link |
00:33:45.000
we also set a bar that's way too hard to reach.
link |
00:33:48.000
I said the Turing test is not as bad as some people believe.
link |
00:33:51.000
But what is really not useful about the Turing test, it gives us no guidance
link |
00:33:57.000
how to develop these systems in the first place.
link |
00:34:00.000
Of course, we can develop them by trial and error and do whatever and then run the test
link |
00:34:05.000
and see whether it works or not.
link |
00:34:07.000
But a mathematical definition of intelligence gives us an objective
link |
00:34:16.000
which we can then analyze by theoretical tools or computational
link |
00:34:21.000
and maybe even prove how close we are.
link |
00:34:25.000
And we will come back to that later with the AIXI model.
link |
00:34:29.000
I mentioned the compression, right?
link |
00:34:31.000
So in language processing, they have achieved amazing results.
link |
00:34:36.000
And one way to test this, of course, you take the system, you train it
link |
00:34:40.000
and then you see how well it performs on the task.
link |
00:34:43.000
But a lot of performance measurement is done by so called perplexity,
link |
00:34:49.000
which is essentially the same as complexity or compression length.
link |
00:34:53.000
So the NLP community develops new systems and then they measure the compression length
link |
00:34:57.000
and then they have rankings and leaderboards, because there's a strong correlation
link |
00:35:02.000
between compressing well and then the system performing well at the task at hand.
link |
00:35:07.000
It's not perfect, but it's good enough for them as an intermediate aim.
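The link he mentions can be made concrete: the log loss a language model assigns to text equals the code length (in bits) an ideal compressor built from that model would need, and perplexity is two to the power of the per-symbol code length. A small sketch, with a made-up uniform character model standing in for a real one:

```python
import math

def bits_and_perplexity(text, prob_next):
    """prob_next(prefix, ch) -> model probability of the next character.
    Returns (total code length in bits, perplexity = 2**(bits per character))."""
    total_bits = sum(-math.log2(prob_next(text[:i], ch)) for i, ch in enumerate(text))
    return total_bits, 2 ** (total_bits / len(text))

uniform_27 = lambda prefix, ch: 1.0 / 27      # hypothetical toy model over 27 symbols
print(bits_and_perplexity("hello world", uniform_27))  # ~4.75 bits/char, perplexity 27.0
```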
link |
00:35:14.000
So you mean measure... so this is kind of almost returning to the Kolmogorov complexity.
link |
00:35:19.000
So you're saying good compression usually means good intelligence.
link |
00:35:24.000
Yes.
link |
00:35:26.000
So you mentioned you're one of the only people who dared boldly
link |
00:35:33.000
to try to formalize the idea of artificial general intelligence,
link |
00:35:38.000
to have a mathematical framework for intelligence,
link |
00:35:42.000
just like as we mentioned, termed AIXI, A I X I.
link |
00:35:49.000
So let me ask the basic question, what is AIXI?
link |
00:35:54.000
Okay, so let me first say what it stands for.
link |
00:35:58.000
What it stands for, actually, that's probably the more basic question.
link |
00:36:01.000
The first question is usually how it's pronounced,
link |
00:36:04.000
but finally I put it on the website, how it's pronounced.
link |
00:36:07.000
You figured it out.
link |
00:36:10.000
The name comes from AI, artificial intelligence,
link |
00:36:13.000
and the X I is the Greek letter XI,
link |
00:36:16.000
which I used for Solomonoff's distribution, for quite stupid reasons,
link |
00:36:22.000
which I'm not willing to repeat here in front of camera.
link |
00:36:27.000
So it just happened to be more or less arbitrary, I chose the XI.
link |
00:36:31.000
But it also has nice other interpretations.
link |
00:36:35.000
So there are actions and perceptions in this model,
link |
00:36:38.000
where an agent has actions and perceptions, and over time.
link |
00:36:42.000
So this is A index I, X index I.
link |
00:36:45.000
So there's an action at time I, and then followed by a perception at time I.
link |
00:36:49.000
We'll go with that. I'll edit out the first part.
link |
00:36:52.000
I'm just kidding.
link |
00:36:53.000
I have some more interpretations.
link |
00:36:55.000
So at some point, maybe five years ago or 10 years ago,
link |
00:36:59.000
I discovered in Barcelona, it was on a big church.
link |
00:37:04.000
There was a stone engraved, some text,
link |
00:37:08.000
and the word AIXI appeared there a couple of times.
link |
00:37:12.000
I was very surprised and happy about that.
link |
00:37:17.000
And I looked it up, so it is Catalan language,
link |
00:37:20.000
and it means with some interpretation,
link |
00:37:22.000
that's it, that's the right thing to do.
link |
00:37:25.000
So it's almost like destined, somehow came to you in a dream.
link |
00:37:32.000
And similarly, there's a Chinese word, aixi, also written like AIXI,
link |
00:37:35.000
if you transcribe it to Pinyin.
link |
00:37:37.000
And the final one is that is AI, crossed with induction,
link |
00:37:41.000
because that is, and it's going more to the content now.
link |
00:37:44.000
So good old fashioned AI is more about planning
link |
00:37:47.000
and known deterministic world,
link |
00:37:49.000
and induction is more about, often, you know,
link |
00:37:51.000
IID data and inferring models,
link |
00:37:53.000
and essentially what this AIXI model does is combine these two.
link |
00:37:57.000
And I actually also recently, I think,
link |
00:37:59.000
heard that in Japanese, AI means love.
link |
00:38:02.000
So if you can combine XI somehow with that,
link |
00:38:06.000
I think we can, there might be some interesting ideas there.
link |
00:38:10.000
So AIXI, let's then take the next step.
link |
00:38:13.000
So maybe talk at the big level of what is this mathematical framework.
link |
00:38:20.000
Yeah, so it consists essentially of two parts.
link |
00:38:23.000
One is the learning and induction and prediction part,
link |
00:38:27.000
and the other one is the planning part.
link |
00:38:29.000
So let's come first to the learning induction prediction part,
link |
00:38:33.000
which essentially I explained already before.
link |
00:38:36.000
So what we need for any agent to act well
link |
00:38:41.000
is that it can somehow predict what happens.
link |
00:38:44.000
I mean, if you have no idea what your actions do,
link |
00:38:47.000
how can you decide which actions are good or not?
link |
00:38:49.000
So you need to have some model of what effect your actions have.
link |
00:38:53.000
So what you do is you have some experience.
link |
00:38:56.000
You build models like scientists, you know, of your experience.
link |
00:38:59.000
Then you hope these models are roughly correct,
link |
00:39:01.000
and then you use these models for prediction.
link |
00:39:04.000
And a model is, sorry, to interrupt,
link |
00:39:06.000
and a model is based on your perception of the world,
link |
00:39:08.000
how your actions will affect that world.
link |
00:39:10.000
That's not the important part.
link |
00:39:14.000
It is technically important,
link |
00:39:16.000
but at this stage we can just think about predicting,
link |
00:39:18.000
say, stock market data,
link |
00:39:20.000
weather data or IQ sequences,
link |
00:39:22.000
one, two, three, four, five, what comes next, yeah?
link |
00:39:24.000
So of course our actions affect what we're doing,
link |
00:39:28.000
but I'll come back to that in a second.
link |
00:39:30.000
And I'll keep just interrupting.
link |
00:39:32.000
So just to draw a line between prediction and planning,
link |
00:39:36.000
what do you mean by prediction in this way?
link |
00:39:40.000
It's trying to predict the environment
link |
00:39:43.000
without your long term action in the environment.
link |
00:39:46.000
What is prediction?
link |
00:39:48.000
Okay, if you want to put the actions in now,
link |
00:39:50.000
okay, then let's put it in now, yeah?
link |
00:39:53.000
We don't have to put them now.
link |
00:39:55.000
Scratch that, dumb question.
link |
00:39:57.000
Okay, so the simplest form of prediction is
link |
00:40:00.000
that you just have data which you passively observe,
link |
00:40:04.000
and you want to predict what happens without interfering.
link |
00:40:08.000
As I said, weather forecasting, stock market, IQ sequences,
link |
00:40:12.000
or just anything, okay?
link |
00:40:16.000
And Solomonoff's theory of induction is based on compression,
link |
00:40:19.000
so you look for the shortest program
link |
00:40:21.000
which describes your data sequence,
link |
00:40:23.000
and then you take this program, run it,
link |
00:40:25.000
which reproduces your data sequence by definition,
link |
00:40:27.000
and then you let it continue running,
link |
00:40:29.000
and then it will produce some predictions,
link |
00:40:31.000
and you can rigorously prove that for any prediction task,
link |
00:40:37.000
this is essentially the best possible predictor.
link |
00:40:40.000
Of course, if there's a prediction task,
link |
00:40:43.000
or a task which is unpredictable, like, you know,
link |
00:40:46.000
you have fair coin flips, yeah?
link |
00:40:48.000
I cannot predict the next fair coin flip.
link |
00:40:50.000
What Solomonoff does is say, okay, the next head has probability 50 percent.
link |
00:40:52.000
It's the best you can do.
link |
00:40:54.000
So if something is unpredictable, Solomonoff will also not
link |
00:40:56.000
magically predict it, but if there is some pattern
link |
00:40:59.000
or probability, then Solomonoff induction
link |
00:41:01.000
will figure that out eventually,
link |
00:41:04.000
and not just eventually, but rather quickly,
link |
00:41:06.000
and you can have proven convergence rates,
link |
00:41:10.000
whatever your data is.
link |
00:41:12.000
So there's pure magic in a sense.
link |
00:41:15.000
What's the catch?
link |
00:41:16.000
Well, the catch is that it's not computable,
link |
00:41:17.000
and we come back to that later.
link |
00:41:19.000
You cannot just implement it,
link |
00:41:20.000
even with Google resources here,
link |
00:41:22.000
and run it and, you know, predict the stock market
link |
00:41:24.000
and become rich.
link |
00:41:25.000
I mean, if...
link |
00:41:26.000
You know, try it at the time.
link |
00:41:29.000
So the basic task is you're in the environment,
link |
00:41:31.000
and you're interacting with the environment
link |
00:41:33.000
to try to learn a model of that environment,
link |
00:41:35.000
and the model is in the space of all these programs,
link |
00:41:38.000
and your goal is to get a bunch of programs that are simple.
link |
00:41:41.000
And so let's go to the actions now.
link |
00:41:44.000
But actually, good that you asked.
link |
00:41:45.000
Usually, I skipped this part,
link |
00:41:46.000
although there is also a minor contribution,
link |
00:41:48.000
which I did, so the action part,
link |
00:41:49.000
but I usually sort of just jump to the decision part.
link |
00:41:51.000
So let me explain the action part now.
link |
00:41:53.000
Thanks for asking.
link |
00:41:55.000
So you have to modify it a little bit
link |
00:41:58.000
by now not just predicting a sequence
link |
00:42:01.000
which just comes to you,
link |
00:42:03.000
but you have an observation, then you act somehow,
link |
00:42:06.000
and then you want to predict the next observation
link |
00:42:09.000
based on the past observation and your action.
link |
00:42:12.000
Then you take the next action.
link |
00:42:14.000
You don't care about predicting it because you're doing it.
link |
00:42:17.000
And then you get the next observation,
link |
00:42:19.000
and you want...
link |
00:42:20.000
Well, before you get it, you want to predict it again
link |
00:42:22.000
based on your past action and observation sequence.
link |
00:42:24.000
You just condition extra on your actions.
link |
00:42:28.000
There's an interesting alternative
link |
00:42:30.000
that you also try to predict your own actions.
link |
00:42:35.000
If you want...
link |
00:42:36.000
In the past or the future?
link |
00:42:38.000
Your future actions.
link |
00:42:39.000
That's interesting.
link |
00:42:41.000
Wait, let me wrap.
link |
00:42:43.000
I think my brain just broke.
link |
00:42:45.000
We should maybe discuss that later
link |
00:42:47.000
after I've explained the AIXI model.
link |
00:42:48.000
That's an interesting variation.
link |
00:42:50.000
But that is a really interesting variation.
link |
00:42:52.000
And a quick comment.
link |
00:42:54.000
I don't know if you want to insert that in here,
link |
00:42:56.000
but you're looking at the...
link |
00:42:58.000
In terms of observations,
link |
00:43:00.000
you're looking at the entire big history,
link |
00:43:02.000
the long history of the observations.
link |
00:43:04.000
That's very important, the whole history
link |
00:43:06.000
from birth sort of of the agent.
link |
00:43:08.000
And we can come back to that also while this is important here.
link |
00:43:11.000
Often, you know, in RL, you have MDPs,
link |
00:43:14.000
Markov decision processes, which are much more limiting.
link |
00:43:16.000
Okay, so now we can predict conditioned on actions.
link |
00:43:20.000
So even if they influence the environment.
link |
00:43:22.000
But prediction is not all we want to do, right?
link |
00:43:24.000
We also want to act really in the world.
link |
00:43:26.000
And the question is how to choose the actions.
link |
00:43:29.000
And we don't want to greedily choose the actions.
link |
00:43:32.000
You know, just, you know, what is best in the next time step.
link |
00:43:36.000
And we first, I should say, you know,
link |
00:43:38.000
what is, you know, how do we measure performance?
link |
00:43:40.000
So we measure performance by giving the agent reward.
link |
00:43:43.000
That's the so called reinforcement learning framework.
link |
00:43:45.000
So every time step, you can give it a positive reward
link |
00:43:48.000
or negative reward or maybe no reward.
link |
00:43:50.000
It could be a very scarce, right?
link |
00:43:52.000
Like if you play chess just at the end of the game,
link |
00:43:54.000
you give plus one for winning or minus one for losing.
link |
00:43:57.000
So in the AIXI framework, that's completely sufficient.
link |
00:43:59.000
So occasionally you give a reward signal
link |
00:44:01.000
and you ask the agent to maximize reward,
link |
00:44:04.000
but not greedily sort of, you know, the next one,
link |
00:44:06.000
next one because that's very bad in the long run
link |
00:44:08.000
if you're greedy.
link |
00:44:10.000
So, but over the lifetime of the agent.
link |
00:44:12.000
So let's assume the agent lives for M time steps.
link |
00:44:14.000
Let's say it dies in sort of 100 years sharp.
link |
00:44:17.000
That's just, you know, the simplest model to explain.
link |
00:44:19.000
So it looks at the future reward sum and ask,
link |
00:44:23.000
what is my action sequence?
link |
00:44:25.000
Well, actually more precisely my policy,
link |
00:44:27.000
which leads in expectation because I don't know the world
link |
00:44:31.000
to the maximum reward sum.
link |
00:44:34.000
Let me give you an analogy.
link |
00:44:36.000
In chess, for instance, we know how to play optimally in theory.
link |
00:44:40.000
It's just a mini max strategy.
link |
00:44:42.000
I play the move which seems best to me under the assumption
link |
00:44:45.000
that the opponent plays the move which is best for him.
link |
00:44:48.000
So best, so worst for me under the assumption that he,
link |
00:44:51.000
I play again the best move.
link |
00:44:54.000
And then you have this expectimax tree to the end of the game.
link |
00:44:57.000
And then you back propagate and then you get the best possible move.
link |
00:45:00.000
So that is the optimal strategy,
link |
00:45:02.000
which von Neumann already figured out a long time ago
link |
00:45:05.000
for playing adversarial games.
link |
00:45:08.000
Luckily, or maybe unluckily for the theory,
link |
00:45:11.000
it becomes harder: the world is not always adversarial.
link |
00:45:14.000
So it can be, if the others are humans, even cooperative,
link |
00:45:18.000
or nature is usually, I mean the dead nature is stochastic.
link |
00:45:22.000
Things just happen randomly or don't care about you.
link |
00:45:26.000
So what you have to take into account is the noise
link |
00:45:29.000
and not necessarily adversariality.
link |
00:45:31.000
So you replace the minimum on the opponent's side
link |
00:45:34.000
by an expectation, which is general enough to include
link |
00:45:37.000
also adversarial cases.
link |
00:45:40.000
So now instead of a mini max strategy,
link |
00:45:42.000
you have an expectimax strategy.
link |
00:45:44.000
So far, so good.
link |
00:45:45.000
So that is well known.
link |
00:45:46.000
It's called sequential decision theory.
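A tiny sketch of the expectimax recursion he is describing, on a hand-built tree where chance nodes (the environment) replace the adversary's min (the tree and numbers are made up for illustration):

```python
def expectimax(node):
    """node is a terminal reward (number), ('max', [children]) for the agent's
    choice, or ('chance', [(prob, child), ...]) for the stochastic environment."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectimax(child) for child in children)
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in children)
    raise ValueError(kind)

# The agent picks an action; the environment then responds stochastically.
tree = ("max", [
    ("chance", [(0.5, 10), (0.5, -2)]),   # action A: expected reward 4.0
    ("chance", [(0.9, 3), (0.1, 8)]),     # action B: expected reward 3.5
])
print(expectimax(tree))                    # 4.0
```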
link |
00:45:48.000
But the question is on which probability distribution
link |
00:45:51.000
do you base that?
link |
00:45:53.000
If I have the true probability distribution,
link |
00:45:55.000
like say I play backgammon, right?
link |
00:45:57.000
There's dice and there's certain randomness involved.
link |
00:46:00.000
I can calculate probabilities and feed it in the expected max
link |
00:46:03.000
or the sequential decision tree come up with the optimal decision
link |
00:46:06.000
if I have enough compute.
link |
00:46:08.000
But for the real world, we don't know that.
link |
00:46:10.000
What is the probability the driver in front of me brakes?
link |
00:46:14.000
I don't know.
link |
00:46:15.000
So it depends on all kinds of things
link |
00:46:17.000
and especially new situations.
link |
00:46:19.000
I don't know.
link |
00:46:20.000
So this is this unknown thing about prediction
link |
00:46:23.000
and that's where Solomonoff comes in.
link |
00:46:25.000
So what you do is in sequential decision tree,
link |
00:46:27.000
you just replace the true distribution,
link |
00:46:29.000
which we don't know by this universal distribution.
link |
00:46:33.000
I didn't explicitly talk about it,
link |
00:46:35.000
but this is used for universal prediction
link |
00:46:37.000
and you plug it into the sequential decision tree mechanism.
link |
00:46:40.000
And then you get the best of both worlds.
link |
00:46:42.000
You have a long term planning agent,
link |
00:46:45.000
but it doesn't need to know anything about the world
link |
00:46:48.000
because the Solomonoff induction part learns.
link |
00:46:51.000
Can you explicitly try to describe the universal distribution
link |
00:46:56.000
and how Solomonoff induction plays a role here?
link |
00:47:00.000
I'm trying to understand.
link |
00:47:01.000
So what he does is, in the simplest case,
link |
00:47:04.000
he said take the shortest program describing your data, run it,
link |
00:47:07.000
have a prediction which would be deterministic.
link |
00:47:09.000
Yes.
link |
00:47:10.000
Okay.
link |
00:47:11.000
But you should not just take the shortest program,
link |
00:47:13.000
but also consider the longer ones,
link |
00:47:15.000
but give them lower a priori probability.
link |
00:47:18.000
So in the Bayesian framework,
link |
00:47:20.000
you say a priori, any distribution,
link |
00:47:25.000
which is a model or a stochastic program,
link |
00:47:29.000
has a certain a priori probability,
link |
00:47:31.000
which is two to the minus the length of this program,
link |
00:47:34.000
and why two to the minus the length, I could explain.
link |
00:47:36.000
So longer programs are punished, a priori.
link |
00:47:40.000
And then you multiply it with the so called likelihood function,
link |
00:47:44.000
which is, as the name suggests,
link |
00:47:47.000
how likely this model is given the data at hand.
link |
00:47:51.000
So if you have a very wrong model,
link |
00:47:53.000
it's very unlikely that this model is true.
link |
00:47:55.000
And so it is a very small number.
link |
00:47:57.000
So even if the model is simple, it gets penalized by that.
link |
00:48:00.000
And what you do is then you take just the sum,
link |
00:48:02.000
or this weighted average, over it.
link |
00:48:04.000
And this gives you a probability distribution.
link |
00:48:07.000
So it's a universal distribution,
link |
00:48:09.000
also called the Solomonoff distribution.
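As a rough numerical sketch of this mixture idea (not actual Solomonoff induction, which sums over all programs and is incomputable), here are three hand-made candidate "models" of a bit string, each weighted by a two-to-the-minus-length prior times its likelihood; the description lengths and probabilities are invented for illustration.

```python
# Minimal sketch of the Bayesian mixture behind the universal distribution.
data = [1, 1, 1, 1, 1, 0, 1, 1]

# Each model: (made-up description length in bits, probability it assigns to a '1').
models = {
    "always_one": (5, 1.0),     # short program, but cannot explain the 0
    "biased_coin": (12, 0.85),  # longer program, fits the data well
    "fair_coin": (8, 0.5),      # medium length, mediocre fit
}

def likelihood(p_one, bits):
    """Probability the model assigns to the observed sequence."""
    prob = 1.0
    for b in bits:
        prob *= p_one if b == 1 else (1.0 - p_one)
    return prob

# Posterior weight of each model: prior 2^(-length) times likelihood.
weights = {name: 2.0 ** (-length) * likelihood(p, data)
           for name, (length, p) in models.items()}
total = sum(weights.values())

# Mixture prediction for the next bit: weighted average of each model's prediction.
next_bit_prob = sum(w * models[name][1] for name, w in weights.items()) / total
print(next_bit_prob)
```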
link |
00:48:10.000
So it's weighted by the simplicity of the program
link |
00:48:13.000
and the likelihood.
link |
00:48:14.000
Yes.
link |
00:48:15.000
It's kind of a nice idea.
link |
00:48:17.000
Yeah.
link |
00:48:18.000
So okay.
link |
00:48:19.000
And then you said there's,
link |
00:48:21.000
you're playing N or M, I forgot the letter,
link |
00:48:24.000
steps into the future.
link |
00:48:26.000
So how difficult is that problem?
link |
00:48:28.000
What's involved there?
link |
00:48:29.000
Okay.
link |
00:48:30.000
It's a basic optimization problem.
link |
00:48:31.000
What are we talking about?
link |
00:48:32.000
Yeah.
link |
00:48:33.000
So you have a planning problem up to horizon M
link |
00:48:35.000
and that's exponential time in the horizon M,
link |
00:48:38.000
which is, I mean, it's computable, but intractable.
link |
00:48:41.000
I mean, even for chess, it's already intractable
link |
00:48:43.000
to do that exactly.
link |
00:48:44.000
And, you know, for Go...
link |
00:48:45.000
But it could be also a discounted kind of framework.
link |
00:48:48.000
Yeah.
link |
00:48:49.000
So having a hard horizon, you know,
link |
00:48:52.000
at 100 years, is just for simplicity
link |
00:48:54.000
of discussing the model, and also sometimes the math is simpler.
link |
00:48:58.000
But there are lots of variations.
link |
00:49:00.000
Actually, it's quite an interesting parameter.
link |
00:49:02.000
There's nothing really problematic about it,
link |
00:49:07.000
but it's very interesting.
link |
00:49:08.000
So for instance, you think, no,
link |
00:49:10.000
let's let the parameter M tend to infinity.
link |
00:49:12.000
Right.
link |
00:49:13.000
You want an agent which lives forever.
link |
00:49:15.000
Right.
link |
00:49:16.000
If you do it now, you have two problems.
link |
00:49:17.000
First, the mathematics breaks down because you have an infinite
link |
00:49:20.000
reward sum, which may give infinity, and getting reward 0.1
link |
00:49:24.000
every time step gives infinity, and getting reward 1 every time step
link |
00:49:27.000
also gives infinity.
link |
00:49:28.000
So they're equally good.
link |
00:49:29.000
Not really what we want.
link |
00:49:31.000
The other problem is that if you have an infinite life,
link |
00:49:35.000
you can be lazy for as long as you want for 10 years
link |
00:49:38.000
and then catch up with the same expected reward.
link |
00:49:41.000
And, you know, think about yourself or, you know,
link |
00:49:44.000
or maybe, you know, some friends or so.
link |
00:49:46.000
Um, if they knew they lived forever, you know,
link |
00:49:50.000
why work hard now?
link |
00:49:51.000
You know, just enjoy your life, you know,
link |
00:49:53.000
and then catch up later.
link |
00:49:54.000
So that's another problem with the infinite horizon.
link |
00:49:56.000
And you mentioned, yes, we can go to discounting.
link |
00:49:59.000
But then the standard discounting is so called geometric discounting.
link |
00:50:02.000
So a dollar today is worth about as much as, you know,
link |
00:50:06.000
$1.05 tomorrow.
link |
00:50:08.000
So if you do the so called geometric discounting,
link |
00:50:10.000
you have introduced an effective horizon.
link |
00:50:12.000
So, um, the agent is now motivated to look ahead a certain amount
link |
00:50:16.000
of time effectively.
link |
00:50:18.000
It's like a moving horizon.
link |
00:50:20.000
And for any fixed effective horizon, there is a problem
link |
00:50:25.000
to solve which requires a larger horizon.
link |
00:50:28.000
So if I look ahead, you know, five time steps,
link |
00:50:30.000
I'm a terrible chess player, right?
link |
00:50:32.000
I need to look ahead longer.
link |
00:50:34.000
If I play go, I probably have to look ahead even longer.
link |
00:50:36.000
So for every horizon,
link |
00:50:40.000
there is a problem which this horizon cannot solve.
link |
00:50:43.000
But I introduced the so called near harmonic horizon,
link |
00:50:46.000
which goes down with one over t rather than exponentially in t,
link |
00:50:49.000
which produces an agent which effectively looks into the future,
link |
00:50:53.000
proportionally to its age.
link |
00:50:55.000
So if it's five years old, it plans for five years.
link |
00:50:57.000
If it's a hundred years old, it then plans for a hundred years.
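A small numerical sketch of the difference between the two discount schemes, under an illustrative definition of "effective horizon" (the number of future steps needed to cover half the remaining discounted weight); the exponent just above 1 and the truncation are assumptions made for this example, not the exact construction in the AIXI literature.

```python
# Sketch: compare effective horizons under geometric vs ~1/t discounting.
# "Effective horizon" here: steps needed to cover 50% of the remaining
# discounted weight -- an illustrative choice, not a precise definition.

def effective_horizon(weights, fraction=0.5):
    total = sum(weights)
    acc = 0.0
    for k, w in enumerate(weights, start=1):
        acc += w
        if acc >= fraction * total:
            return k
    return len(weights)

T = 100_000  # truncate the infinite sums at a large finite time

for age in (5, 100, 1000):
    # Geometric: weight gamma^k for looking k steps ahead -- independent of age.
    gamma = 0.95
    geo = [gamma ** k for k in range(1, T)]
    # Near-harmonic: weight ~ 1/t at absolute time t = age + k, exponent just
    # above 1 so the sum converges; older agents spread weight over more steps.
    har = [1.0 / (age + k) ** 1.1 for k in range(1, T)]
    print(age, effective_horizon(geo), effective_horizon(har))
```

The geometric column stays fixed regardless of age, while the near-harmonic column grows as the agent gets older, which is the behavior described above.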
link |
00:50:59.000
Interesting.
link |
00:51:00.000
And it's a little bit similar to humans too, right?
link |
00:51:02.000
I mean, children don't plan ahead very long,
link |
00:51:04.000
but when we become adults, we plan ahead longer.
link |
00:51:07.000
Maybe when we get very old, I mean, we know that we don't live forever.
link |
00:51:10.000
You know, maybe then our horizon shrinks again.
link |
00:51:13.000
So that's really interesting.
link |
00:51:16.000
So adjusting the horizon, is there some mathematical benefit
link |
00:51:19.000
to that, or is it just nice? I mean, intuitively, empirically,
link |
00:51:25.000
it will probably be a good idea to sort of push the horizon back,
link |
00:51:28.000
to extend the horizon as you experience more of the world.
link |
00:51:33.000
But is there some mathematical conclusions here that are beneficial?
link |
00:51:37.000
With Solomonoff, with the actual sort of prediction part,
link |
00:51:39.000
we have extremely strong finite time, or rather finite data, results.
link |
00:51:44.000
So if you have so and so much data, then you lose so and so much.
link |
00:51:47.000
So that part is really great.
link |
00:51:49.000
With the AIXI model, with the planning part,
link |
00:51:51.000
many results are only asymptotic, which, well...
link |
00:51:56.000
What is asymptotic?
link |
00:51:58.000
Asymptotic means you can prove, for instance, that in the long run,
link |
00:52:01.000
if the agent, you know, acts long enough, then, you know,
link |
00:52:04.000
it performs optimally, or some nice thing happens.
link |
00:52:06.000
So, but you don't know how fast it converges, yeah?
link |
00:52:09.000
So it may converge fast, but we're just not able to prove it
link |
00:52:12.000
because it's a difficult problem.
link |
00:52:14.000
Maybe there's a bug in the model so that it's really that slow.
link |
00:52:19.000
Yeah.
link |
00:52:20.000
So that is what asymptotic means, sort of, eventually,
link |
00:52:23.000
but we don't know how fast.
link |
00:52:25.000
And if I give the agent a fixed horizon M, yeah,
link |
00:52:29.000
then I cannot prove asymptotic results, right?
link |
00:52:32.000
So, I mean, sort of, if it dies in 100 years,
link |
00:52:35.000
then in 100 years it's over.
link |
00:52:37.000
I cannot say eventually.
link |
00:52:38.000
So this is the advantage of the discounting
link |
00:52:40.000
that I can prove asymptotic results.
link |
00:52:43.000
So, just to clarify: okay,
link |
00:52:47.000
I've built up a model.
link |
00:52:49.000
Well, now in the moment, I have this way of looking several steps ahead.
link |
00:52:55.000
How do I pick what action I will take?
link |
00:52:58.000
It's like with playing chess, right?
link |
00:53:01.000
You do this minimax.
link |
00:53:02.000
In this case here, you do expectimax based on the Solomonoff distribution.
link |
00:53:06.000
You propagate back, and then an action falls out:
link |
00:53:12.000
the action which maximizes the future expected reward
link |
00:53:15.000
under the Solomonoff distribution, and then you just take this action.
link |
00:53:18.000
And then repeat.
link |
00:53:19.000
And then you get a new observation, and you feed in this action and
link |
00:53:22.000
observation, then you repeat.
link |
00:53:23.000
And the reward, so on.
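Putting the loop just described into a minimal sketch: plan by expectimax under a predictive distribution, act, observe, append to the history, and repeat. The `predict`, `plan`, and `environment_step` helpers below are hypothetical placeholders; a real system would use (an approximation of) the Solomonoff mixture and a much deeper search.

```python
from typing import List, Tuple

History = List[Tuple[int, int, float]]   # (action, observation, reward) triples

def predict(history: History, action: int) -> List[Tuple[int, float, float]]:
    """Hypothetical stand-in for the universal mixture: returns a list of
    (observation, reward, probability) outcomes for taking `action`."""
    return [(0, 0.0, 0.5), (1, 1.0, 0.5)]

def plan(history: History, actions: List[int], horizon: int) -> int:
    """Expectimax to a small horizon: pick the action maximizing expected
    future reward under the predictive distribution."""
    def value(h: History, depth: int) -> float:
        if depth == 0:
            return 0.0
        return max(
            sum(p * (r + value(h + [(a, o, r)], depth - 1))
                for o, r, p in predict(h, a))
            for a in actions
        )
    return max(actions,
               key=lambda a: sum(p * (r + value(history + [(a, o, r)], horizon - 1))
                                 for o, r, p in predict(history, a)))

def environment_step(action: int) -> Tuple[int, float]:
    """Hypothetical environment: returns (observation, reward)."""
    return (action, float(action))

history: History = []
for cycle in range(5):
    a = plan(history, actions=[0, 1], horizon=3)   # expectimax under the mixture
    o, r = environment_step(a)                      # act, then observe and get reward
    history.append((a, o, r))                       # feed everything back into the history
print(history)
```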
link |
00:53:24.000
Yeah.
link |
00:53:25.000
So you're enrolled too, yeah.
link |
00:53:26.000
And then maybe you can even predict your own action.
link |
00:53:29.000
I love the idea.
link |
00:53:30.000
But, okay, this big framework, what is it?
link |
00:53:34.000
I mean, it's kind of a beautiful mathematical framework
link |
00:53:38.000
to think about artificial general intelligence.
link |
00:53:41.000
What does it help you intuit about how to build such systems?
link |
00:53:49.000
Or maybe from another perspective, what does it help us in understanding AGI?
link |
00:53:56.000
So when I started in the field, I was always interested in two things.
link |
00:54:02.000
One was, you know, AGI.
link |
00:54:04.000
The name didn't exist then.
link |
00:54:06.000
It was called general AI or strong AI. And the other was the physics theory of everything.
link |
00:54:11.000
So I switched back and forth between computer science and physics quite often.
link |
00:54:14.000
You said the theory of everything.
link |
00:54:16.000
The theory of everything.
link |
00:54:17.000
Those are basically the biggest problems before all of humanity.
link |
00:54:23.000
Yeah, I can explain if you wanted some later time,
link |
00:54:28.000
why I'm interested in these two questions.
link |
00:54:30.000
Can I ask you, on a small tangent?
link |
00:54:33.000
If only one were to be solved, which one would you pick?
link |
00:54:38.000
If an apple fell on your head and there was a brilliant insight,
link |
00:54:43.000
and you could arrive at the solution to one, would it be AGI or the theory of everything?
link |
00:54:49.000
Definitely AGI, because once the AGI problem is solved,
link |
00:54:52.000
I can ask the AGI to solve the other problem for me.
link |
00:54:56.000
Yeah, brilliantly put.
link |
00:54:58.000
Okay, so as you were saying about it.
link |
00:55:01.000
Okay, so, the reason why I didn't settle:
link |
00:55:05.000
I mean, this thought that, you know, once you have solved AGI,
link |
00:55:08.000
it solves all kinds of other problems, not just the theory of everything problem,
link |
00:55:11.000
but all kinds of problems more useful to humanity, is very appealing to many people.
link |
00:55:16.000
And, you know, I had this thought also,
link |
00:55:18.000
but I was quite disappointed with the state of the art of the field of AI.
link |
00:55:25.000
There was some theory, you know, about logical reasoning,
link |
00:55:28.000
but I was never convinced that this will fly.
link |
00:55:30.000
And then there were these more heuristic approaches with neural networks,
link |
00:55:34.000
and I didn't like these heuristics.
link |
00:55:37.000
So, and also I didn't have any good idea myself.
link |
00:55:41.000
So that's the reason why I toggled back and forth quite some while
link |
00:55:45.000
and even worked for four and a half years in a company developing software
link |
00:55:48.000
for something completely unrelated.
link |
00:55:50.000
But then I had this idea about the AIXI model.
link |
00:55:53.000
And so what it gives you, it gives you a gold standard.
link |
00:55:58.000
So I have proven that this is the most intelligent agent
link |
00:56:02.000
which anybody could build, in quotation marks,
link |
00:56:07.000
because it's just mathematical and you need infinite compute.
link |
00:56:11.000
But this is the limit.
link |
00:56:13.000
And this is completely specified.
link |
00:56:15.000
It's not just a framework.
link |
00:56:17.000
You know, every year, tens of frameworks are developed with just skeletons
link |
00:56:22.000
and then pieces are missing.
link |
00:56:24.000
And usually these missing pieces, you know, turn out to be really, really difficult.
link |
00:56:27.000
And so this is completely and uniquely defined.
link |
00:56:31.000
And we can analyze that mathematically.
link |
00:56:33.000
And we've also developed some approximations.
link |
00:56:37.000
I can talk about that a little bit later.
link |
00:56:40.000
That would be sort of the top down approach, like say von Neumann's minimax theory,
link |
00:56:44.000
that's the theoretical optimal play of games.
link |
00:56:47.000
And now we need to approximate it, put heuristics in, prune the tree, blah, blah, blah, and so on.
link |
00:56:51.000
So we can do that also with the AIXI model, but for general AI.
link |
00:56:55.000
It can also inspire those, and most researchers go bottom up, right?
link |
00:57:01.000
They have their systems and try to make them more general, more intelligent.
link |
00:57:04.000
It can inspire in which direction to go.
link |
00:57:07.000
What do you mean by that?
link |
00:57:09.000
So if you have some choice to make, right?
link |
00:57:11.000
Like, how should I evaluate my system if I can't do cross validation?
link |
00:57:15.000
How should I do my learning if my standard regularization doesn't work well?
link |
00:57:21.000
So the answer is always this: we have a system which does everything, and that's AIXI.
link |
00:57:25.000
It's just completely in the ivory tower, completely useless from a practical point of view.
link |
00:57:30.000
But you can look at it and see, ah, yeah, maybe I can take some aspects.
link |
00:57:35.000
And instead of Kolmogorov complexity, you just take some compressor which has been developed so far.
link |
00:57:40.000
And for the planning, well, we have UCT, which has also been used in Go.
link |
00:57:45.000
And at least it's inspired me a lot to have this formal definition.
link |
00:57:54.000
And if you look at other fields, you know, like I always come back to physics because I have a physics background.
link |
00:57:59.000
Think about the phenomenon of energy, which was for a long time a mysterious concept.
link |
00:58:03.000
And at some point it was completely formalized and that really helped a lot.
link |
00:58:08.000
And you can point out a lot of these things which were first mysterious and vague.
link |
00:58:13.000
And then they have been rigorously formalized.
link |
00:58:15.000
Speed and acceleration had been confused, right, until they were formally defined.
link |
00:58:20.000
There was a time like this.
link |
00:58:21.000
And people who don't have any background, you know, often still confuse it.
link |
00:58:27.000
So this AIXI model, or the intelligence definition, which is sort of the dual to it,
link |
00:58:33.000
we come back to that later, formalizes the notion of intelligence uniquely and rigorously.
link |
00:58:39.000
So, in a sense, it serves as kind of the light at the end of the tunnel.
link |
00:58:43.000
Yes, yeah.
link |
00:58:45.000
So, I mean, there's a million questions I could ask here.
link |
00:58:48.000
So, maybe kind of, okay, let's feel around in the dark a little bit.
link |
00:58:52.000
So, there have been, here at DeepMind but in general, a lot of breakthrough ideas,
link |
00:58:57.000
just like we've been saying around reinforcement learning.
link |
00:58:59.000
So, how do you see the progress in reinforcement learning as different?
link |
00:59:04.000
Like, which subset of IXI does it occupy the current?
link |
00:59:09.000
Like you said, maybe the Markov assumption is made quite often in reinforcement learning.
link |
00:59:16.000
There's other assumptions made in order to make the system work.
link |
00:59:21.000
What do you see as the difference and connection between reinforcement learning and AIXI?
link |
00:59:26.000
So, the major difference is that essentially all other approaches,
link |
00:59:33.000
they make stronger assumptions.
link |
00:59:35.000
So, in reinforcement learning, the Markov assumption is that the next state or next observation
link |
00:59:41.000
only depends on the previous observation and not the whole history,
link |
00:59:45.000
which makes, of course, the mathematics much easier rather than dealing with histories.
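A tiny illustration of what the Markov assumption buys and loses: in the made-up sequence below the next symbol is ambiguous given only the previous observation, but deterministic given two symbols of history.

```python
from collections import defaultdict, Counter

# Made-up sequence with the pattern 0,0,1 repeating. After a single 0 the next
# symbol is genuinely ambiguous, but given the last *two* symbols it is
# deterministic -- an order-1 Markov model cannot capture that.

seq = [0, 0, 1] * 20

def empirical_predictor(order):
    """Count, for each length-`order` context, what symbol follows it."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts

for order in (1, 2):
    counts = empirical_predictor(order)
    print(f"order {order}:")
    for ctx, c in sorted(counts.items()):
        total = sum(c.values())
        print("  context", ctx, {sym: round(n / total, 2) for sym, n in c.items()})
# order 1: after context (0,) the model can only say "0 or 1, roughly 50/50"
# order 2: every context predicts the next symbol with probability 1
```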
link |
00:59:49.000
Of course, they profit from it also because then you have algorithms that run on current computers
link |
00:59:54.000
and do something practically useful.
link |
00:59:56.000
But for general AI, all the assumptions which are made by other approaches,
link |
01:00:01.000
we know already now they are limiting.
link |
01:00:04.000
So, for instance, usually you need an ergodicity assumption in the MDP framework in order to learn.
link |
01:00:11.000
Ergodicity essentially means that you can recover from your mistakes
link |
01:00:15.000
and that there are no traps in the environment.
link |
01:00:17.000
And if you make this assumption, then essentially you can go back to a previous state,
link |
01:00:22.000
go there a couple of times, and then learn the statistics and what the state is like,
link |
01:00:29.000
And then in the long run perform well in this state.
link |
01:00:33.000
But there are no fundamental problems.
link |
01:00:35.000
But in real life, we know it can be one single action:
link |
01:00:38.000
One second of being inattentive while driving a car fast can ruin the rest of my life.
link |
01:00:45.000
I can become quadriplegic or whatever.
link |
01:00:48.000
So, there's no recovery anymore.
link |
01:00:50.000
So, the real world is not ergodic, I always say.
link |
01:00:52.000
There are traps, and there are situations you cannot recover from.
link |
01:00:56.000
And very little theory has been developed for this case.
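A minimal made-up example of a non-ergodic environment: a two-state world with a trap from which no action recovers, so a single exploratory mistake is permanent.

```python
# Tiny made-up MDP illustrating a non-ergodic environment: from the "trap"
# state no action can ever return the agent to the useful part of the world.

transitions = {
    # state: {action: next_state}
    "start": {"safe": "start", "risky": "trap"},
    "trap":  {"safe": "trap",  "risky": "trap"},   # no way back: not ergodic
}
rewards = {"start": 1.0, "trap": 0.0}

def rollout(policy, steps=10):
    state, total = "start", 0.0
    for _ in range(steps):
        state = transitions[state][policy(state)]
        total += rewards[state]
    return total

print(rollout(lambda s: "safe"))    # 10.0
print(rollout(lambda s: "risky"))   # 0.0 -- one bad action ruins everything after it
```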
link |
01:01:02.000
What about...
link |
01:01:05.000
What do you see, in the context of AIXI, as the role of exploration?
link |
01:01:10.000
Sort of...
link |
01:01:13.000
You mentioned in the real world, you can get into trouble when we make the wrong decisions and really pay for it.
link |
01:01:19.000
But exploration seems to be fundamentally important for learning about this world, for gaining new knowledge.
link |
01:01:25.000
So, is exploration baked in?
link |
01:01:29.000
Another way to ask it: what are the parameters of AIXI that can be controlled?
link |
01:01:36.000
Yeah, I'd say the good thing is that there are no parameters to control.
link |
01:01:40.000
Some other people try knobs to control and you can do that.
link |
01:01:44.000
I mean, you can modify AIXI so that you have some knobs to play with if you want to.
link |
01:01:48.000
But the exploration is directly baked in.
link |
01:01:53.000
And that comes from the Bayesian learning and the long term planning.
link |
01:01:58.000
So, these together already imply exploration.
link |
01:02:04.000
You can nicely and explicitly prove that for simple problems like so called bandit problems,
link |
01:02:13.000
where you say to give a real world example, say you have two medical treatments, A and B,
link |
01:02:20.000
you don't know the effectiveness, you try A a little bit, B a little bit,
link |
01:02:23.000
but you don't want to harm too many patients.
link |
01:02:26.000
So, you have to sort of trade off exploring and exploiting, and at some point you want to exploit.
link |
01:02:32.000
And you can do the mathematics and figure out the optimal strategy.
link |
01:02:38.000
There are so-called Bayesian agents; there are also non-Bayesian agents.
link |
01:02:41.000
But it shows that this Bayesian framework, by taking a prior over possible worlds,
link |
01:02:47.000
doing the Bayesian mixture, then the Bayes-optimal decision with long term planning,
link |
01:02:51.000
that is important, automatically implies exploration also to the proper extent.
link |
01:02:58.000
Not too much exploration and not too little.
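As a small illustration of Bayesian exploration in the two-treatment bandit just described, here is a sketch using Thompson sampling over Beta posteriors; note this is a standard Bayesian heuristic standing in for the full Bayes-optimal long-term planner Hutter refers to, and the success probabilities are invented.

```python
import random

# Two "treatments" with unknown success probabilities (hidden from the agent).
true_success = {"A": 0.6, "B": 0.4}

# Beta(1,1) priors over each treatment's success rate: [alpha, beta] counts.
posterior = {"A": [1, 1], "B": [1, 1]}

random.seed(0)
for patient in range(200):
    # Thompson sampling: draw a plausible success rate from each posterior and
    # treat the patient with the arm that looks best under that draw. Early on
    # the posteriors are wide, so both arms get tried (exploration); as data
    # accumulates, the better arm dominates (exploitation).
    draws = {arm: random.betavariate(a, b) for arm, (a, b) in posterior.items()}
    arm = max(draws, key=draws.get)
    success = random.random() < true_success[arm]
    posterior[arm][0 if success else 1] += 1

print(posterior)   # most trials end up on the better treatment A
```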
link |
01:03:00.000
In these very simple settings.
link |
01:03:02.000
For the AIXI model, I was also able to prove a self-optimizing theorem,
link |
01:03:06.000
or asymptotic optimality theorem, although only asymptotically, not with finite time bounds.
link |
01:03:10.000
So, it seems like the long term planning is really important,
link |
01:03:13.000
but the long term part of the planning is really important.
link |
01:03:16.000
So, maybe a quick tangent.
link |
01:03:19.000
How important do you think is removing the Markov assumption and looking at the full history?
link |
01:03:25.000
Intuitively, of course, it's important, but is it fundamentally transformative
link |
01:03:31.000
to the entirety of the problem?
link |
01:03:33.000
What's your sense of it?
link |
01:03:35.000
Because we make that assumption quite often, just throwing away the past.
link |
01:03:40.000
I think it's absolutely crucial.
link |
01:03:43.000
The question is whether there's a way to deal with it
link |
01:03:47.000
in a more heuristic and still sufficiently well way.
link |
01:03:52.000
So, I have to come up with an example on the fly,
link |
01:03:56.000
but you have some key event in your life a long time ago,
link |
01:04:01.000
in some city or something, you realize it's a really dangerous street or whatever, right?
link |
01:04:05.000
And you want to remember that forever, right, in case you come back there.
link |
01:04:10.000
Kind of a selective kind of memory.
link |
01:04:12.000
You remember all the important events in the past,
link |
01:04:15.000
but somehow selecting the importance is...
link |
01:04:17.000
That's very hard, yeah.
link |
01:04:19.000
And I'm not concerned about just storing the whole history.
link |
01:04:22.000
You can calculate, for a human life of, say, 30 or 100 years, it doesn't matter, right,
link |
01:04:28.000
how much data comes in through the vision system and the auditory system.
link |
01:04:33.000
You compress it a little bit, in this case, lossily, and store it.
link |
01:04:37.000
We will soon have the means of just storing it.
link |
01:04:40.000
But you still need the selection for the planning part
link |
01:04:45.000
and the compression for the understanding part.
link |
01:04:47.000
The raw storage I'm really not concerned about.
link |
01:04:50.000
And I think we should just store, if you develop an agent,
link |
01:04:54.000
preferably just store all the interaction history.
link |
01:04:59.000
And then you build, of course, models on top of it and you compress it
link |
01:05:03.000
and you are selective, but occasionally you go back to the old data
link |
01:05:08.000
and reanalyze it based on your new experience you have.
link |
01:05:12.000
Sometimes you are in school, you learn all these things
link |
01:05:15.000
you think are totally useless, and much later you realize,
link |
01:05:18.000
oh, they were not as useless as you thought.
link |
01:05:22.000
I'm looking at you linear algebra.
link |
01:05:24.000
Right.
link |
01:05:25.000
So maybe let me ask about objective functions, because the rewards...
link |
01:05:30.000
It seems to be an important part.
link |
01:05:33.000
The rewards are kind of given to the system.
link |
01:05:37.000
For a lot of people, the specification of the objective function
link |
01:05:45.000
is a key part of intelligence.
link |
01:05:48.000
The agent itself figuring out what is important.
link |
01:05:52.000
What do you think about that?
link |
01:05:54.000
Is it possible within the AIXI framework to discover for yourself
link |
01:06:00.000
the reward based on which you should operate?
link |
01:06:05.000
Okay, that will be a long answer.
link |
01:06:08.000
And that is a very interesting question and I'm asked a lot about this question.
link |
01:06:14.000
Where do the rewards come from?
link |
01:06:16.000
And that depends.
link |
01:06:19.000
And I'll give you now a couple of answers.
link |
01:06:22.000
So if we want to build agents, now let's start simple.
link |
01:06:27.000
So let's assume we want to build an agent based on the AIXI model
link |
01:06:31.000
which performs a particular task.
link |
01:06:34.000
Let's start with something super simple, like playing chess or go or something.
link |
01:06:39.000
Then the reward is winning the game is plus one, losing the game is minus one.
link |
01:06:44.000
Done.
link |
01:06:45.000
You apply this agent, and if you have enough compute, you let it self-play,
link |
01:06:49.000
and it will learn the rules of the game, will play perfect chess.
link |
01:06:53.000
After some while, problem solved.
link |
01:06:55.000
So if you have more complicated problems, then you may believe
link |
01:07:03.000
that you have the right reward, but you don't.
link |
01:07:05.000
So a nice cute example is elevator control that is also in Rich Sutton's book,
link |
01:07:11.000
which is a great book, by the way.
link |
01:07:13.000
So you control the elevator and you think, well, maybe the reward should be
link |
01:07:18.000
coupled to how long people wait in front of the elevator.
link |
01:07:20.000
You know, long wait is bad.
link |
01:07:22.000
You program it and you do it.
link |
01:07:24.000
And what happens is the elevator eagerly picks up all the people but never drops them off.
link |
01:07:29.000
So then you realize that maybe the time in the elevator also counts.
link |
01:07:34.000
So you minimize the sum.
link |
01:07:36.000
And the elevator does that, but never picks up the people in the 10th floor
link |
01:07:40.000
and the top floor because in expectation, it's not worth it.
link |
01:07:43.000
Just let them stay.
link |
01:07:45.000
So even in apparently simple problems, you can make mistakes.
link |
01:07:51.000
And that's what in more serious context, say, AGI safety researchers consider.
link |
01:07:58.000
So now let's go back to general agents.
link |
01:08:01.000
So assume we want to build an agent which is generally useful to humans.
link |
01:08:05.000
Yes, we have a household robot here and it should do all kinds of tasks.
link |
01:08:10.000
So in this case, the human should give the reward on the fly.
link |
01:08:15.000
I mean, maybe it's pre trained in the factory and that there's some sort of internal reward
link |
01:08:18.000
for, you know, the battery level or whatever.
link |
01:08:20.000
Yeah, but so, you know, it does the dishes
link |
01:08:23.000
badly, you punish the robot; it does it well, you reward the robot; and then you train it on a new task.
link |
01:08:28.000
Like a child, right?
link |
01:08:29.000
So you need the human in the loop if you want a system which is useful to the human.
link |
01:08:35.000
And as long as this agent stays sub human level, that should work reasonably well.
link |
01:08:41.000
Apart from, you know, these examples, it becomes critical if they get to, you know, human level.
link |
01:08:46.000
It's like with children: small children you have reasonably well under control.
link |
01:08:49.000
When they become older,
link |
01:08:51.000
the reward technique doesn't work so well anymore.
link |
01:08:54.000
So then finally, this would be agents which are just, you could say, slaves to the humans.
link |
01:09:01.000
Yeah.
link |
01:09:02.000
So if you are more ambitious and say we want to build a new species of intelligent beings,
link |
01:09:08.000
we put them on a new planet and we want them to develop this planet or whatever.
link |
01:09:12.000
So we don't give them any reward.
link |
01:09:15.000
So what could we do?
link |
01:09:17.000
And you could try to, you know, come up with some reward functions like, you know,
link |
01:09:22.000
it should maintain itself, the robot, it should maybe multiply, build more robots, right?
link |
01:09:28.000
And, you know, maybe all kinds of things that you find useful, but that's pretty hard, right?
link |
01:09:34.000
You know, what does self maintenance mean?
link |
01:09:36.000
You know, what does it mean to build a copy?
link |
01:09:38.000
Should it be exact copy or an approximate copy?
link |
01:09:40.000
And so that's really hard.
link |
01:09:42.000
But Laurent Orseau, also at DeepMind, developed a beautiful model.
link |
01:09:48.000
So he just took the AIXI model and coupled the rewards to information gain.
link |
01:09:54.000
So he said the reward is proportional to how much the agent had learned about the world.
link |
01:10:00.000
And you can rigorously, formally, uniquely define that, in terms of the Kullback-Leibler divergence.
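A minimal sketch of this "reward = information gain" idea: the reward is how far the Bayesian posterior over world models moves away from the prior, measured by a KL divergence. The two-model world and the likelihood numbers are invented for illustration, and this is a simplification of the actual knowledge-seeking-agent construction.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def posterior(prior, likelihoods):
    unnorm = [pr * li for pr, li in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

prior = [0.5, 0.5]                    # two competing world models

# Informative observation: the models disagree about it, so seeing it teaches a lot.
post_informative = posterior(prior, likelihoods=[0.9, 0.1])
# Noise-like observation: both models assign it the same probability, so the
# posterior does not move and the information-gain reward is zero, once the
# model class can actually represent the noise.
post_noise = posterior(prior, likelihoods=[0.5, 0.5])

print("reward (informative):", kl(post_informative, prior))
print("reward (noise):      ", kl(post_noise, prior))
```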
link |
01:10:05.000
Okay.
link |
01:10:06.000
So if you put that in, you get a completely autonomous agent.
link |
01:10:09.000
And actually, interestingly, for this agent we can prove much stronger results than for the general agent, which is also nice.
link |
01:10:15.000
And if you let this agent loose, it will be, in a sense, the optimal scientist.
link |
01:10:20.000
It is absolutely curious to learn as much as possible about the world.
link |
01:10:24.000
And of course, it will also have a lot of instrumental goals, right?
link |
01:10:27.000
In order to learn, it needs to at least survive, right?
link |
01:10:30.000
A dead agent is not good for anything.
link |
01:10:32.000
So it needs to have self preservation.
link |
01:10:34.000
And if it builds small helpers acquiring more information, it will do that.
link |
01:10:38.000
Yeah.
link |
01:10:39.000
If exploration, space exploration or whatever, is necessary, right,
link |
01:10:44.000
to gather information, it will do that.
link |
01:10:46.000
So it has a lot of instrumental goals following from this information gain.
link |
01:10:51.000
And this agent is completely autonomous of us.
link |
01:10:53.000
No rewards necessary anymore.
link |
01:10:55.000
Yeah.
link |
01:10:56.000
Of course, it could find a way to game the concept of information and get stuck in that library that you mentioned beforehand,
link |
01:11:05.000
with a very large number of books.
link |
01:11:08.000
The first agent had this problem.
link |
01:11:10.000
It would get stuck in front of an old TV screen, which just had white noise.
link |
01:11:15.000
Yeah, white noise.
link |
01:11:16.000
But the second version can deal with at least stochasticity.
link |
01:11:20.000
Well, yeah.
link |
01:11:22.000
What about curiosity?
link |
01:11:23.000
This kind of word, curiosity, creativity.
link |
01:11:27.000
Is that kind of the reward function being about getting new information?
link |
01:11:32.000
Is that similar to idea of kind of injecting exploration for its own sake inside the reward function?
link |
01:11:42.000
Do you find this at all appealing?
link |
01:11:43.000
Interesting.
link |
01:11:44.000
I think that's a nice definition.
link |
01:11:46.000
Curiosity is a reward.
link |
01:11:48.000
Sorry.
link |
01:11:49.000
Curiosity is exploration for its own sake.
link |
01:11:54.000
Yeah.
link |
01:11:55.000
I would accept that.
link |
01:11:57.000
But most curiosity, well, in humans and especially in children, yeah, is not just for its own sake, but for actually learning about the environment and for behaving better.
link |
01:12:08.000
So I think most curiosity is tied, in the end, to performing better.
link |
01:12:14.000
Well, okay.
link |
01:12:15.000
So if intelligent systems need to have this reward function, let me ask: you're an intelligent system, currently passing the Turing test quite effectively.
link |
01:12:26.000
What's the reward function of our human intelligence existence?
link |
01:12:33.000
What's the reward function that Marcus Hutter is operating under?
link |
01:12:37.000
Okay.
link |
01:12:38.000
To the first question, the biological reward function is to survive and to spread.
link |
01:12:44.000
And very few humans sort of are able to overcome this biological reward function.
link |
01:12:50.000
But we live in a very nice world where we have lots of spare time and can still survive and spread.
link |
01:12:58.000
So we can develop arbitrary other interests, which is quite interesting.
link |
01:13:03.000
On top of that.
link |
01:13:04.000
On top of that.
link |
01:13:05.000
Yeah.
link |
01:13:06.000
But survival and spreading is, I would say, the goal or the reward function of humans, the core one.
link |
01:13:15.000
I like how you avoided answering the second question, which a good intelligence system would.
link |
01:13:19.000
Your own meaning of life and a reward function.
link |
01:13:24.000
My own meaning of life and reward function is to find AGI and to build it.
link |
01:13:31.000
Beautifully put.
link |
01:13:32.000
Okay.
link |
01:13:33.000
Let's dissect AIXI even further.
link |
01:13:34.000
So one of the assumptions, kind of, is that infinity keeps creeping up everywhere.
link |
01:13:42.000
What are your thoughts on bounded rationality, and on the fact that, sort of, the nature of our existence and of intelligent systems is that we're always operating under constraints, under, you know, limited time, limited resources?
link |
01:13:57.000
How do you think about that within the AIXI framework, in trying to create an AGI system that operates under these constraints?
link |
01:14:06.000
Yeah, that is one of the criticisms of AIXI, that it ignores computation completely, and some people believe that intelligence is inherently tied to bounded resources.
link |
01:14:19.000
What do you think on this one point?
link |
01:14:21.000
Do you think bounded resources are fundamental to intelligence?
link |
01:14:27.000
I would say that an intelligence notion which ignores computational limits is extremely useful.
link |
01:14:35.000
A good intelligence notion which includes these resources would be even more useful, but we don't have that yet.
link |
01:14:43.000
And so look at other fields outside of computer science.
link |
01:14:48.000
Computational aspects never play a fundamental role.
link |
01:14:52.000
You develop biological models for cells, something in physics, these theories, I mean, become more and more crazy and harder and harder to compute.
link |
01:15:00.000
Well, in the end, of course, we need to do something with these models, but that's more a nuisance than a feature.
link |
01:15:05.000
And I'm sometimes wondering if artificial intelligence would not sit in a computer science department, but in a philosophy department, then this computational focus would be probably significantly less.
link |
01:15:18.000
I mean, think about it: the induction problem is more in the philosophy department.
link |
01:15:22.000
There's virtually no paper that cares about, you know, how long it takes to compute the answer.
link |
01:15:26.000
That is completely secondary.
link |
01:15:28.000
Of course, once we have figured out the first problem, so intelligence without computational resources, then the next and very good question is,
link |
01:15:39.000
could we improve it by including computational resources? But nobody was able to do that so far in an even halfway satisfactory manner.
link |
01:15:49.000
I like that.
link |
01:15:50.000
So in the long run, the right department to belong to is philosophy.
link |
01:15:55.000
That's actually quite a deep idea, or at least to think about big picture philosophical questions, even in the computer science department.
link |
01:16:07.000
But you've mentioned approximation, sort of, there's a lot of infinity, a lot of huge resources needed.
link |
01:16:14.000
Are there approximations to AIXI, within the AIXI framework, that are useful?
link |
01:16:19.000
Yeah, we have developed a couple of approximations.
link |
01:16:22.000
And what we do there is that the Solomonoff induction part, which was, you know, find the shortest program describing your data, we just replace by standard data compressors.
link |
01:16:36.000
And the better compressors get, the better this part will become.
link |
01:16:41.000
We focus on a particular compressor called Context Tree Weighting, which is pretty amazing, not so well known.
link |
01:16:48.000
It has beautiful theoretical properties, also works reasonably well in practice.
link |
01:16:52.000
So we use that for the approximation of the induction and the learning and the prediction part.
link |
01:16:57.000
And for the planning part, we essentially just took the ideas from computer Go from 2006.
link |
01:17:07.000
It was Csaba Szepesvári, also now at DeepMind, who developed the so called UCT algorithm, the upper confidence bound for trees algorithm, on top of Monte Carlo tree search.
link |
01:17:19.000
So we approximate this planning part by sampling.
link |
01:17:23.000
And it's successful on some small toy problems.
link |
01:17:29.000
We don't want to lose the generality, right?
link |
01:17:33.000
And that's sort of the handicap, right?
link |
01:17:35.000
If you want to be general, you have to give up something.
link |
01:17:39.000
But this single agent was able to play, you know, small games like Kuhn poker and tic-tac-toe and even Pac-Man.
link |
01:17:49.000
And the same architecture, no change; the agent doesn't know the rules of the game, really nothing at all, and learns by itself, by playing with these environments.
link |
01:17:59.000
So Jürgen Schmidhuber proposed something called the Gödel machine, which is a self improving program that rewrites its own code.
link |
01:18:09.000
Sort of mathematically or philosophically, what's the relationship in your eyes, if you're familiar with it, between AIXI and the Gödel machine?
link |
01:18:18.000
Yeah, familiar with it. He developed it while I was in his lab.
link |
01:18:22.000
Yeah, so the Gödel machine, to explain it briefly: you give it a task.
link |
01:18:28.000
It could be as simple a task as, you know, finding prime factors of numbers, right?
link |
01:18:32.000
You can formally write it down. There's a very slow algorithm to do that.
link |
01:18:35.000
Just try all the factors. Yeah.
link |
01:18:37.000
Or play chess, right?
link |
01:18:39.000
Optimally: you write the minimax algorithm to the end of the game.
link |
01:18:42.000
So you write down what the Gödel machine should do.
link |
01:18:46.000
Then it will take part of its resources to run this program and another part of its resources to improve this program.
link |
01:18:54.000
And when it finds an improved version, which provably computes the same answer...
link |
01:19:01.000
So that's the key part. Yeah, it needs to prove by itself that this changed program still satisfies the original specification.
link |
01:19:09.000
And if it does so, then it replaces the original program by the improved program. And by definition, it does the same job, but just faster.
link |
01:19:16.000
Okay. And then, you know, it improves it over and over.
link |
01:19:19.000
And it's developed in a way that all parts of this Gödel machine can self improve, but it stays provably consistent with the original specification.
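A conceptual sketch of that loop: run the current solver, search for a candidate rewrite, and swap it in only if it is shown equivalent. Here the "proof" is just a toy spot-check of a known identity, whereas the real Gödel machine searches formal proofs about its own code.

```python
# Conceptual sketch of the Gödel machine idea, heavily simplified.

def slow_sum(n):
    """Initial, provably correct but slow solver: sum 1..n by looping."""
    return sum(range(1, n + 1))

def fast_sum(n):
    """Candidate rewrite: closed-form Gauss formula."""
    return n * (n + 1) // 2

def provably_equivalent(f, g):
    """Toy 'proof': in reality the machine must derive, within its own proof
    system, that g computes the same function as f for all inputs. Here we
    only spot-check a few inputs, which is NOT a proof, just an illustration."""
    return all(f(n) == g(n) for n in range(0, 1000))

current_solver = slow_sum
candidate = fast_sum
if provably_equivalent(current_solver, candidate):
    current_solver = candidate     # self-modification: replace the program
print(current_solver(10**6))       # same answer as before, computed faster
```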
link |
01:19:31.000
So from this perspective, it has nothing to do with AIXI.
link |
01:19:36.000
But if you would now put AIXI in as the starting axioms, it would run AIXI.
link |
01:19:42.000
But, you know, that takes forever.
link |
01:19:45.000
But then if it finds a provable speed up of AIXI, it would replace it by that, and again and again, and maybe eventually it comes up with a model which is still the AIXI model.
link |
01:19:55.000
I mean, just for the knowledgeable reader, AIXI is incomputable, and I can prove that; therefore there cannot be a computable exact algorithm.
link |
01:20:08.000
There then need to be some approximations,
link |
01:20:10.000
and this is not dealt with by the Gödel machine.
link |
01:20:12.000
So you have to do something about it.
link |
01:20:13.000
But there's the AIXItl model, which is finitely computable, which we could put in. Which part of AIXI is non-computable?
link |
01:20:19.000
The Solomonoff induction part.
link |
01:20:21.000
But there are ways of getting computable approximations of the AIXI model.
link |
01:20:27.000
So then it's at least computable.
link |
01:20:29.000
It is still way beyond any resources anybody will ever have.
link |
01:20:33.000
But then the Gödel machine could sort of improve it further and further in an exact way.
link |
01:20:37.000
So it is theoretically possible that the Gödel machine process could improve it. But isn't AIXI already optimal?
link |
01:20:51.000
It is optimal in terms of the reward collected over its interaction cycles, but it takes infinite time to produce one action.
link |
01:21:03.000
And the world continues whether you want it or not.
link |
01:21:07.000
So assuming the model had an oracle which solved this problem and then, in the next 100 milliseconds or whatever reaction time you need, gives the answer, then AIXI is optimal.
link |
01:21:17.000
It's optimal in the sense of data, also in learning efficiency and data efficiency, but not in terms of computation time.
link |
01:21:26.000
And then the Gödel machine, in theory, but probably not provably, could make it go faster.
link |
01:21:31.000
Those two components are super interesting.
link |
01:21:37.000
The perfect intelligence combined with self improvement.
link |
01:21:44.000
Sort of provable self improvement, in the sense that you're always getting the correct answer and you're improving.
link |
01:21:50.000
Beautiful ideas.
link |
01:21:52.000
You also mentioned that, in the chase of solving this reward, sort of optimizing for the goal,
link |
01:22:03.000
interesting human things could emerge.
link |
01:22:05.000
So is there a place for consciousness within AIXI?
link |
01:22:10.000
Maybe you can comment, because I suppose we humans are just another instantiation of AIXI agents, and we seem to have consciousness.
link |
01:22:21.000
You say humans are an instantiation of an AIXI agent.
link |
01:22:23.000
Yes.
link |
01:22:24.000
That would be amazing, but I think that's not true even for the smartest and most rational humans.
link |
01:22:29.000
I think maybe we are very crude approximations.
link |
01:22:33.000
Interesting.
link |
01:22:34.000
I mean, I tend to believe, again, I'm Russian, so I tend to believe our flaws are part of the optimal.
link |
01:22:41.000
So we tend to laugh off and criticize our flaws and I tend to think that that's actually close to an optimal behavior.
link |
01:22:50.000
Well, some flaws, if you think more carefully about it, are actually not flaws, but I think there are still enough flaws.
link |
01:22:58.000
I don't know.
link |
01:22:59.000
It's unclear.
link |
01:23:00.000
As a student of history, I think all the suffering that we've endured as a civilization, it's possible that that's the optimal amount of suffering we need to endure to minimize long term suffering.
link |
01:23:14.000
That's your Russian background.
link |
01:23:16.000
That's the Russian. Whether or not humans are instantiations of an AIXI agent,
link |
01:23:21.000
do you think consciousness is something that could emerge in a computational form or framework like AIXI?
link |
01:23:29.000
Let me also ask you a question.
link |
01:23:31.000
Do you think I'm conscious?
link |
01:23:33.000
That's a good question.
link |
01:23:38.000
That tie is confusing me, but I think so.
link |
01:23:44.000
You think that makes me unconscious because it strangles me?
link |
01:23:47.000
If an agent were to solve the imitation game posed by Turing, I think it would be dressed similarly to you.
link |
01:23:53.000
Because there's a kind of flamboyant, interesting, complex behavior pattern that sells that you're human and you're conscious.
link |
01:24:04.000
But why do you ask?
link |
01:24:06.000
Was it a yes or was it a no?
link |
01:24:08.000
Yes, I think you're conscious, yes.
link |
01:24:12.000
And you explain somehow why, but you infer that from my behavior.
link |
01:24:18.000
You can never be sure about that.
link |
01:24:20.000
And I think the same thing will happen with any intelligent agent we develop if it behaves in a way sufficiently close to humans.
link |
01:24:31.000
Or maybe if not humans, maybe a dog is also sometimes a little bit self conscious.
link |
01:24:36.000
So if it behaves in a way where we typically attribute consciousness, we would attribute consciousness to these intelligent systems.
link |
01:24:44.000
And AIXI probably in particular. That, of course, doesn't answer the question whether it's really conscious.
link |
01:24:50.000
And that's the big hard problem of consciousness.
link |
01:24:53.000
Maybe I'm a zombie.
link |
01:24:55.000
I mean, not the movie zombie, but the philosophical zombie.
link |
01:24:59.000
Is, to you, the display of consciousness close enough to consciousness, from the perspective of AGI, that the distinction of the hard problem of consciousness is not an interesting one?
link |
01:25:11.000
I think we don't have to worry about the consciousness problem, especially the hard problem, for developing AGI.
link |
01:25:17.000
I think, you know, we progress, and at some point we will have solved all the technical problems, and this system will behave intelligently and then super intelligently.
link |
01:25:26.000
And this consciousness will emerge.
link |
01:25:30.000
I mean, definitely it will display behavior, which we will interpret as conscious.
link |
01:25:35.000
And then it's a philosophical question.
link |
01:25:38.000
Did this consciousness really emerge?
link |
01:25:40.000
Or is it a zombie which just, you know, fakes everything?
link |
01:25:43.000
We still don't have to figure that out.
link |
01:25:45.000
Although it may be interesting, at least from a philosophical point of view, it's very interesting, but it may also be sort of practically interesting.
link |
01:25:53.000
You know, there's some people saying, you know, if it's just faking consciousness and feelings, you know, then we don't need to be concerned about, you know, rights.
link |
01:25:59.000
But if it's real conscious and has feelings, then we need to be concerned.
link |
01:26:06.000
I can't wait till the day where AI systems exhibit consciousness because it'll truly be some of the hardest ethical questions of what we do with that.
link |
01:26:16.000
It is rather easy to build systems to which people ascribe consciousness.
link |
01:26:21.000
And I give you an analogy.
link |
01:26:23.000
I mean, remember, maybe it was before you were born, the Tamagotchi.
link |
01:26:27.000
How dare you, sir.
link |
01:26:31.000
You're young, right?
link |
01:26:33.000
Yes, that's good. Thank you. Thank you very much.
link |
01:26:36.000
But I was also in the Soviet Union. We didn't have any of those fun things.
link |
01:26:41.000
But you have heard about this Tamagotchi, which was, you know, really, really primitive.
link |
01:26:45.000
Actually, for the time it was, you know, you could raise, you know, this.
link |
01:26:49.000
And kids got so attached to it and, you know, didn't want to let it die.
link |
01:26:53.000
And probably if we would have asked, you know, the children,
link |
01:26:57.000
do you think this Tamagotchi is conscious?
link |
01:26:59.000
They would have said yes.
link |
01:27:01.000
I think that's kind of a beautiful thing, actually, because that consciousness, ascribing consciousness seems to create a deeper connection,
link |
01:27:10.000
which is a powerful thing. But we have to be careful on the ethics side of that.
link |
01:27:15.000
Well, let me ask about the AGI community broadly. You kind of represent some of the most serious work on AGI,
link |
01:27:23.000
at least earlier, and DeepMind represents serious work on AGI these days.
link |
01:27:29.000
But why in your sense is the AGI community so small or has been so small until maybe DeepMind came along?
link |
01:27:38.000
Like why aren't more people seriously working on human level and superhuman level intelligence from a formal perspective?
link |
01:27:48.000
Okay, from a formal perspective, that's sort of, you know, an extra point.
link |
01:27:53.000
So I think there are a couple of reasons. I mean, AI came in waves, right?
link |
01:27:56.000
You know, AI winters and AI summers, and then there were big promises which were not fulfilled.
link |
01:28:01.000
And people got disappointed. And that narrow AI, solving particular problems,
link |
01:28:11.000
which seemed to require intelligence, was always to some extent successful and there were improvements, small steps.
link |
01:28:19.000
And if you build something which is, you know, useful for society or industrially useful, then there's a lot of funding.
link |
01:28:26.000
So I guess it was in parts the money, which drives people to develop specific systems, solving specific tasks.
link |
01:28:36.000
But you would think that, you know, at least in university, you should be able to do ivory tower research.
link |
01:28:43.000
And that was probably better a long time ago, but even nowadays, there's quite some pressure of doing applied research or translational research.
link |
01:28:52.000
And, you know, it's harder to get grants as a theorist.
link |
01:28:56.000
So that also drives people away. It's maybe also harder, attacking the general intelligence problem.
link |
01:29:03.000
So I think enough people, I mean, maybe a small number, were still interested in formalizing intelligence and thinking about general intelligence.
link |
01:29:13.000
But, you know, not much came up, right? Or not much great stuff came up.
link |
01:29:19.000
So what do you think? We talked about the formal big light at the end of the tunnel.
link |
01:29:25.000
But from the engineering perspective, what do you think it takes to build an AGI system?
link |
01:29:30.000
I don't know if that's a stupid question, or a distinct question from everything we've been talking about with AIXI.
link |
01:29:37.000
But what do you see as the steps that are necessary to take to start to try to build something?
link |
01:29:43.000
So you want a blueprint now and then you go off and do it?
link |
01:29:46.000
The whole point of this conversation, trying to squeeze that in there.
link |
01:29:49.000
Now, is there, I mean, what's your intuition? Is it in the robotics space or something that has a body and tries to explore the world?
link |
01:29:56.000
Is it in the reinforcement learning space, like the efforts with AlphaZero and AlphaStar, where they're kind of exploring how you can solve it through simulation in the gaming world?
link |
01:30:06.000
Is it in sort of all the transformer work in natural language processing, maybe attacking open domain dialogue?
link |
01:30:16.000
Where do you see the promising pathways?
link |
01:30:19.000
Let me pick the embodiment, maybe.
link |
01:30:24.000
So embodiment is important, yes and no.
link |
01:30:32.000
I don't believe that we need a physical robot walking or rolling around, interacting with the real world in order to achieve AGI.
link |
01:30:44.000
And I think it's more of a distraction, probably, than helpful.
link |
01:30:51.000
It's sort of confusing the body with the mind.
link |
01:30:54.000
For industrial applications or near term applications, of course, we need robots for all kinds of things.
link |
01:31:01.000
But for solving the big problem, at least at this stage, I think it's not necessary.
link |
01:31:08.000
But the answer is also yes, in that I think the most promising approach is that you have an agent,
link |
01:31:15.000
and that can be a virtual agent in a computer interacting with an environment, possibly a 3D simulated environment like in many computer games.
link |
01:31:25.000
And you train and learn the agent.
link |
01:31:29.000
Even if you don't intend to later put this algorithm, sort of, you know, in a robot brain, and you leave it forever in virtual reality,
link |
01:31:38.000
getting experience in a, although just simulated, 3D world is possibly, and I say possibly, important to understand things on a similar level as humans do.
link |
01:31:54.000
Especially if the agent or primarily if the agent wants, needs to interact with the humans, right?
link |
01:32:00.000
You know, if you talk about objects on top of each other in space and flying in cars and so on, and the agent has no experience with even virtual 3D worlds, it's probably hard to grasp.
link |
01:32:12.000
So if we develop an abstract agent, say we take the mathematical path and we just want to build an agent which can prove theorems and becomes a better and better mathematician,
link |
01:32:21.000
then this agent needs to be able to reason in very abstract spaces, and then maybe sort of putting it into a 3D environment, a simulated world, is even harmful.
link |
01:32:30.000
You should sort of put it in, I don't know, an environment which it creates itself, or so.
link |
01:32:36.000
It seems like you have an interesting, rich complex trajectory through life in terms of your journey of ideas.
link |
01:32:42.000
So it's interesting to ask what books, technical fiction, philosophical books, ideas, people had a transformative effect.
link |
01:32:52.000
Books are most interesting because maybe people could also read those books and see if they could be inspired as well.
link |
01:32:59.000
Yeah, luckily you asked about books and not a singular book.
link |
01:33:03.000
It's very hard; I tried to pin down one book, and I can't do that, in the end.
link |
01:33:10.000
So, the books which were most transformative for me, or which I can most highly recommend to people interested in AI...
link |
01:33:22.000
Both perhaps.
link |
01:33:23.000
Yeah, yeah, both.
link |
01:33:25.000
I would always start with Russell and Norvig, Artificial Intelligence: A Modern Approach.
link |
01:33:31.000
That's the AI Bible.
link |
01:33:33.000
It's an amazing book.
link |
01:33:35.000
It's very broad and covers all approaches to AI and even if you focus on one approach, I think that is the minimum you should know about the other approaches out there.
link |
01:33:44.000
So that should be your first book.
link |
01:33:46.000
Fourth edition should be coming out soon.
link |
01:33:48.000
Oh, okay, interesting.
link |
01:33:50.000
There's a deep learning chapter now so there must be.
link |
01:33:53.000
Written by Ian Goodfellow.
link |
01:33:55.000
Okay.
link |
01:33:56.000
And then the next book I would recommend is the reinforcement learning book by Sutton and Barto.
link |
01:34:02.000
That's a beautiful book.
link |
01:34:05.000
If there's any problem with the book, it makes RL feel and look much easier than it actually is.
link |
01:34:13.000
It's a very gentle book.
link |
01:34:15.000
It's very nice to read, with the exercises.
link |
01:34:17.000
You can very quickly, you know, get some RL systems to run, you know, on very toy problems, and it's a lot of fun.
link |
01:34:23.000
And in a couple of days, you feel, you know, you know what RL is about, but it's much harder than the book makes it seem.
link |
01:34:33.000
Come on now.
link |
01:34:34.000
It's an awesome book.
link |
01:34:35.000
Yeah, no, it is.
link |
01:34:36.000
Yeah.
link |
01:34:37.000
And maybe, I mean, there's so many books out there.
link |
01:34:41.000
If you like the information theoretic approach, then there's the Kolmogorov complexity book by Li and Vitányi, but probably, you know, some short article is enough.
link |
01:34:50.000
You don't need to read the whole book, but it's a great book.
link |
01:34:54.000
And if I have to mention one all time favorite book, it's of a different flavor.
link |
01:35:01.000
That's a book which is used in the international baccalaureate for high school students in several countries.
link |
01:35:09.000
That's by Nicholas Alchin, Theory of Knowledge, second edition, or first, but not the third, please.
link |
01:35:16.000
In the third one, they took out all the fun.
link |
01:35:19.000
Okay.
link |
01:35:20.000
So it asks all the interesting, or to me interesting, philosophical questions about how we acquire knowledge, from all perspectives, you know, from math, from art, from physics, and asks: how can we know anything?
link |
01:35:36.000
And the book is called Theory of Knowledge.
link |
01:35:38.000
Which is almost like a philosophical exploration of how we get knowledge about anything.
link |
01:35:43.000
Yes, yeah.
link |
01:35:44.000
I mean, can religion tell us, you know, something about the world?
link |
01:35:46.000
Can science tell us something about the world?
link |
01:35:48.000
Can mathematics, or is it just playing with symbols?
link |
01:35:52.000
And, you know, these are open-ended questions.
link |
01:35:54.000
And I mean, it's for high school students, so they have references from The Hitchhiker's Guide to the Galaxy, and from Star Wars, and the chicken crossing the road, yeah.
link |
01:36:02.000
And it's fun to read, but it's also quite deep.
link |
01:36:07.000
If you could live one day of your life over again, because it made you truly happy, or maybe, like we said with the books, it was truly transformative, what day, what moment would you choose? Does something pop into your mind?
link |
01:36:21.000
Does it need to be a day in the past, or can it be a day in the future?
link |
01:36:25.000
Well, spacetime is an emergent phenomenon, so it's all the same anyway.
link |
01:36:30.000
Okay.
link |
01:36:31.000
Okay, from the past.
link |
01:36:33.000
You're really going to say from the future? I love it.
link |
01:36:36.000
No, I will tell you from the future.
link |
01:36:38.000
Okay, from the past.
link |
01:36:39.000
So from the past, I would say, when I discovered my AIXI model. I mean, it was not in one day, but there was one moment where I realized the idea of Kolmogorov complexity, and I didn't even know that it existed.
link |
01:36:53.000
I discovered sort of this compression idea myself, and immediately I knew I can't be the first one, but I had this idea.
link |
01:37:00.000
And then I knew about sequential decision theory, and I knew if I put it together, this is the right thing.
link |
01:37:06.000
And yeah, still when I think back about this moment, I'm super excited about it.
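For readers curious what "putting the compression idea together with sequential decision theory" looks like once formalized, the resulting AIXI action rule can be written roughly as below. The notation is a simplified paraphrase of Hutter's published definition, not something stated in this conversation:

```latex
% Rough rendering of the AIXI action rule (planning horizon m, current cycle k).
% U is a universal Turing machine, q ranges over programs interpreted as
% candidate environments, \ell(q) is the length of program q, and a, o, r
% denote actions, observations, and rewards.
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_k + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_{1:m} r_{1:m}} 2^{-\ell(q)}
```

Shorter, more compressible programs q that explain the interaction history get exponentially more weight, which is exactly where the compression idea meets the expectimax planning of sequential decision theory.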
link |
01:37:12.000
Was there any more details in context at that moment?
link |
01:37:16.000
Did an apple fall on your head?
link |
01:37:18.000
If you look at Ian Goodfellow talking about GANs, there was beer involved.
link |
01:37:25.000
Is there some more context of what sparked your thought or was it just?
link |
01:37:31.000
No, it was much more mundane.
link |
01:37:33.000
So I worked in this company.
link |
01:37:34.000
So in this sense, the four and a half years were not completely wasted.
link |
01:37:38.000
So and I worked on an image interpolation problem.
link |
01:37:43.000
And I developed some quite neat new interpolation techniques, and they got patented.
link |
01:37:49.000
And then, you know, as happens quite often, I went sort of overboard and thought, you know, yeah, that's pretty good.
link |
01:37:55.000
But it's not the best.
link |
01:37:56.000
So what is the best possible way of doing interpolation?
link |
01:37:59.000
And then I thought, yeah, you want the simplest picture which, if you coarsen it, recovers your original picture.
link |
01:38:06.000
And then I thought about the simplicity concept more in quantitative terms.
link |
01:38:11.000
And yeah, then everything developed.
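To make the "simplest picture that coarsens back to the original" idea concrete, here is a minimal sketch, not Hutter's actual patented interpolation method: among candidate upscalings that reproduce the original when coarsened again, it prefers the "simplest" one, using compressed size as a crude, computable stand-in for Kolmogorov complexity. All function names are illustrative.

```python
# Minimal sketch: pick the "simplest" 2x upscaling that is still consistent
# with the original image under coarsening. Compressed size is used as a
# crude, computable proxy for Kolmogorov complexity.
import zlib
import numpy as np

def coarsen(img):
    """Downsample by 2x via block averaging over 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def complexity(img):
    """Crude complexity proxy: length of the zlib-compressed pixel bytes."""
    return len(zlib.compress(np.clip(img, 0, 255).astype(np.uint8).tobytes(), 9))

def upscale_nearest(img):
    """2x upscaling by pixel repetition; coarsening it recovers img exactly."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1).astype(float)

def upscale_smooth(img):
    """2x upscaling with a small blur, then corrected per 2x2 block so that
    coarsening still recovers the original exactly."""
    up = upscale_nearest(img)
    blurred = (up + np.roll(up, 1, axis=0) + np.roll(up, 1, axis=1)
               + np.roll(np.roll(up, 1, axis=0), 1, axis=1)) / 4.0
    correction = np.repeat(np.repeat(img - coarsen(blurred), 2, axis=0), 2, axis=1)
    return blurred + correction

original = np.random.randint(0, 256, size=(8, 8)).astype(float)
candidates = [upscale_nearest(original), upscale_smooth(original)]

# Keep only candidates consistent with the original under coarsening,
# then pick the one with the smallest compressed description.
consistent = [c for c in candidates if np.allclose(coarsen(c), original)]
best = min(consistent, key=complexity)
print("chose candidate with compressed size", complexity(best))
```

The consistency check plays the role of "recovering the original picture," while compressed size stands in for simplicity; a real interpolation system would of course search a much richer space of candidates.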
link |
01:38:14.000
And somehow the full beautiful mix of also being a physicist and thinking about the big picture of it then led you, probably, to AIXI.
link |
01:38:23.000
Yeah.
link |
01:38:24.000
So as a physicist, I was probably trained not to always think in computational terms.
link |
01:38:28.000
You know, just ignore that and think about the fundamental properties which you want to have.
link |
01:38:33.000
So what about if you could relive one day in the future?
link |
01:38:36.000
What would that be?
link |
01:38:39.000
When I solve the AGI problem.
link |
01:38:43.000
In practice.
link |
01:38:44.000
In practice.
link |
01:38:45.000
So in theory, I have solved it with the AIXI model, but in practice.
link |
01:38:48.000
And then I asked the first question.
link |
01:38:50.000
What would be the first question?
link |
01:38:53.000
What's the meaning of life?
link |
01:38:55.000
I don't think there's a better way to end it.
link |
01:38:58.000
Thank you so much for talking today.
link |
01:38:59.000
It's a huge honor to finally meet you.
link |
01:39:01.000
Yeah.
link |
01:39:02.000
Thank you too.
link |
01:39:03.000
It was a pleasure of mine, too.
link |
01:39:04.000
Thanks for listening to this conversation with Marcus Hutter.
link |
01:39:07.000
And thank you to our presenting sponsor, Cash App.
link |
01:39:10.000
Download it.
link |
01:39:11.000
Use code LEX Podcast.
link |
01:39:12.000
You'll get $10 and $10 will go to FIRST, an organization that inspires and educates young minds to become science and technology innovators of tomorrow.
link |
01:39:22.000
If you enjoy this podcast, subscribe on YouTube.
link |
01:39:25.000
Give it five stars on Apple Podcast.
link |
01:39:27.000
Support it on Patreon or simply connect with me on Twitter at Lex Fridman.
link |
01:39:33.000
And now let me leave you with some words of wisdom from Albert Einstein.
link |
01:39:38.000
The measure of intelligence is the ability to change.
link |
01:39:43.000
Thank you for listening and hope to see you next time.