Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI | Lex Fridman Podcast #75
The following is a conversation with Marcus Hutter, senior research scientist at Google DeepMind. Throughout his career of research, including with Jürgen Schmidhuber and Shane Legg, he has proposed a lot of interesting ideas in and around the field of artificial general intelligence, including the development of the AIXI model (spelled A-I-X-I), which is a mathematical approach to AGI that incorporates ideas of Kolmogorov complexity, Solomonoff induction, and reinforcement learning.

In 2006, Marcus launched the 50,000 euro Hutter Prize for lossless compression of human knowledge. The idea behind this prize is that the ability to compress well is closely related to intelligence. This, to me, is a profound idea. Specifically, if you can compress the first 100 megabytes or 1 gigabyte of Wikipedia better than your predecessors, your compressor likely has to also be smarter. The intention of this prize is to encourage the development of intelligent compressors as a path to AGI. In conjunction with this podcast's release just a few days ago, Marcus announced a 10x increase in several aspects of this prize, including the money, to 500,000 euros. The better your compressor works relative to the previous winners, the higher the fraction of that prize money that is awarded to you. You can learn more about it if you simply Google "Hutter Prize". I'm a big fan of benchmarks for developing AI systems, and the Hutter Prize may indeed be one that will spark some good ideas for approaches that will make progress on the path of developing AGI systems.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEX PODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. Since Cash App allows you to send and receive money digitally, peer to peer, security in all digital transactions is very important. Let me mention the PCI Data Security Standard that Cash App is compliant with. I'm a big fan of standards for safety and security. PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions. Now we just need to do the same for autonomous vehicles and AI systems in general. So again, if you get Cash App from the App Store or Google Play and use the code LEX PODCAST, you'll get $10, and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world.

And now, here's my conversation with Marcus Hutter.
Do you think of the universe as a computer, or maybe an information processing system? Let's go with a big question first.

Okay, with a big question first. I think it's a very interesting hypothesis or idea. I have a background in physics, so I know a little bit about physical theories, the standard model of particle physics and general relativity theory, and they are amazing. They're all, in a sense, computable theories. I mean, they're very hard to compute, but they're very elegant, simple theories which describe virtually everything in the universe. So there's a strong indication that somehow the universe is computable; it's a plausible hypothesis.

So, just like you said, general relativity, quantum field theory: why do you think the laws of physics are so nice and beautiful and simple and compressible? Do you think our universe was designed, or is it naturally this way? Are we just focusing on the parts that are especially compressible? Do human minds just enjoy something about that simplicity, while in fact there are other things that are not so compressible?

I strongly believe, and I'm pretty convinced, that the universe is inherently beautiful, elegant, and simple, and described by these equations. We're not just picking that. I mean, if there were some phenomena which could not be neatly described, scientists would try to describe them anyway. There's biology, which is messier, but we understand that it's an emergent phenomenon, and these are complex systems, but they still follow the same rules of quantum electrodynamics. All of chemistry follows that, and we know that. I mean, we cannot compute everything, because we have limited computational resources. No, I think it's not a bias of the humans; it's objectively simple. Of course, you never know: maybe there are some corners very far out in the universe, or super, super tiny below the nuclei of atoms, or parallel universes, which are not nice and simple, but there's no evidence for that, and we should apply Occam's razor and choose the simplest theory consistent with the evidence. But also, that's a little bit self-referential.

So maybe a quick pause: what is Occam's razor?
So Occam's razor says that you should not multiply entities beyond necessity, which, if you translate it into proper English, in the scientific context means that if you have two theories, hypotheses, or models which equally well describe the phenomenon you study, or the data, you should choose the simpler one.

So that's just a principle; that's not like a provable law, perhaps. Perhaps we'll kind of discuss it and think about it, but what's the intuition for why the simpler answer is the one that is likely to be the more correct descriptor of whatever we're talking about?

I believe that Occam's razor is probably the most important principle in science. I mean, of course we also use logical deduction and we do experimental design, but science is about finding and understanding the world, finding models of the world, and we can come up with crazy complex models which explain everything but predict nothing. But the simple models seem to have predictive power, and it's a valid question why. There are two answers to that. You can just accept it: that is the principle of science, we use this principle, and it seems to be successful. We don't know why, but it just happens to be. Or you can try to find another principle which explains Occam's razor. And if we start with the assumption that the world is governed by simple rules, then there's a bias towards simplicity, and applying Occam's razor is the mechanism for finding these rules. And actually, in a more quantitative sense, and we come back to that later in terms of Solomonoff induction, you can rigorously prove that: if you assume that the world is simple, then Occam's razor is the best you can do, in a certain sense.
So I apologize for the romanticized question, but why do you think, outside of its effectiveness, we find simplicity so appealing as human beings? Why does E = mc² seem so beautiful to us humans?

I guess mostly, in general, many things can be explained by an evolutionary argument. There are some artifacts in humans which are just artifacts and not evolutionarily necessary. But with this beauty and simplicity, I believe at least the core is about, like science, finding regularities in the world, understanding the world, which is necessary for survival. If I look at a bush and I just see noise, and there is a tiger and it eats me, then I'm dead. But if I try to find a pattern... and we know that humans are prone to finding more patterns in data than there are, like the Mars face and all these things. This bias towards finding patterns, even if there are none (though it's best, of course, if they are real), helps us survive.

Yeah, that's fascinating. I haven't really thought about that; I thought I just loved science. But indeed, in terms of pure survival purposes, there is an evolutionary argument for why we find the work of Einstein so beautiful.

Maybe a quick small tangent: could you describe what Solomonoff induction is?
Yeah, so that's a theory which I claim, and Mr. Solomonoff claimed a long time ago, solves the big philosophical problem of induction, and I believe the claim is essentially true. And what it does is the following. Okay, for the picky listener: induction can be interpreted narrowly and widely. Narrowly means inferring models from data; widely means also then using these models for doing predictions, so prediction is also part of induction. So I'm a little bit sloppy with the terminology, and maybe that comes from Ray Solomonoff being sloppy. Maybe I shouldn't say that; he can't complain anymore.

So let me explain this theory in simple terms. Assume you have a data sequence; make it very simple, the simplest one, say 1, 1, 1, 1, 1, and you see 100 ones. What do you think comes next? The natural answer (I'm going to speed up a little bit) is, of course, one. And the question is why. Well, we see a pattern there: there's a one, and we repeat it, and why should it suddenly be different after 100 ones? So what we are looking for are simple explanations, or models, for the data we have. And now the question is: a model has to be presented in a certain language. Which language do we use? In science, we want formal languages, and we can use mathematics, or we can use programs on a computer, so abstractly on a Turing machine, for instance, or on a general-purpose computer. And there are, of course, lots of models. You can say maybe it's 100 ones, then 100 zeros, and then 100 ones again; that's a model, right? But there are simpler models. There's the model "print 1 in a loop", and it also explains the data. And if you push that to the extreme, you are looking for the shortest program which, if you run it, reproduces the data you have. It will not stop; it will continue naturally, and this continuation you take as your prediction. And on the sequence of ones, it's very plausible that "print 1 in a loop" is the shortest program. We can give more complex examples, like 1, 2, 3, 4, 5; the shortest program is, again, a counter. So that is, roughly speaking, how Solomonoff induction works.

The extra twist is that it can also deal with noisy data. So if you have, for instance, a coin flip, say a biased coin which comes up heads with 60% probability, then it will learn and figure this out, and after a while it predicts: oh, the next coin flip will be heads with probability 60%. So it's the stochastic version of that.
But the goal, the dream, is always the search for the short program?

Well, in Solomonoff induction, precisely, what you do is combine the two. Looking for the shortest program is like applying Occam's razor, looking for the simplest theory. There's also Epicurus' principle, which says: if you have multiple hypotheses which equally well describe your data, don't discard any of them; keep all of them around. You can put these together and say: okay, I have a bias towards simplicity, but I don't rule out the larger models. Technically, what we do is weight the shorter models higher and the longer models lower. You use Bayesian techniques: you have a prior, which is precisely two to the minus the complexity of the program, you weight all these hypotheses, and you take this mixture. And then you also get the stochasticity in.
Yeah, like many of your ideas, that's just a beautiful idea, weighting based on the simplicity of the program. I love that. That seems to me maybe a very human-centric concept, and it seems to be a very appealing way of discovering good programs in this world. You've used the term compression quite a bit. I think it's a beautiful idea. We just talked about simplicity, and maybe science, or just all of our intellectual pursuits, is basically the attempt to compress the complexity all around us into something simple. So what does this word, compression, mean to you?

I essentially have already explained it. Compression means, for me, finding short programs for the data or the phenomenon at hand. You could interpret it more widely: finding simple theories, which can be mathematical theories or maybe even informal, just in words. Compression means finding short descriptions, explanations, or programs for the data.
Do you see science as a kind of human attempt at compression? I'm speaking more generally, because when you say programs, you're kind of zooming in on a particular, almost computer science, artificial intelligence focus. But do you see all of human endeavor as a kind of compression?

Well, at least all of science I see as an endeavor of compression; not all of humanity, maybe. And there are also some other aspects of science, like experimental design, right? I mean, we create experiments specifically to get extra knowledge, and that isn't part of the decision-making process. But once we have the data, to understand the data is essentially compression. So I don't see any difference between compression, understanding, and prediction.
So we're jumping around topics a little bit, but returning to simplicity: a fascinating concept is Kolmogorov complexity. In your sense, do most objects in our mathematical universe have high Kolmogorov complexity? And maybe, first of all, what is Kolmogorov complexity?

Okay, Kolmogorov complexity is a notion of simplicity, or complexity, and it takes the compression view to the extreme. I explained before that if you have some data sequence, just think about a file on a computer, which is ultimately just a string of bits. And we have data compressors: we compress big files into zip files with certain compressors. You can also produce self-extracting archives. That means an executable which, if you run it, reproduces your original file without needing an extra decompressor; it's just the decompressor plus the archive together in one. Now, there are better and worse compressors, and you can ask: what is the ultimate compressor? So what is the shortest possible self-extracting archive you could produce for a certain data set, which reproduces the data set? And the length of this is called the Kolmogorov complexity. And arguably, that is the information content of the data set. I mean, if the data set is very redundant or very boring, you can compress it very well, so the information content should be low, and, you know, it is low according to this definition.

So it's the length of the shortest program that summarizes the data?
And what's your sense of our universe, when we think about the different objects in it, the concepts we try to describe at every level? Do they have high or low Kolmogorov complexity? What's the hope? Do we have a lot of hope in being able to summarize much of our world?

That's a tricky and difficult question. As I said before, I believe that the whole universe, based on the evidence we have, is very simple, so it has a very short description.

Sorry, to linger on that: the whole universe? What does that mean? You mean at the very basic, fundamental level, in order to create the universe?

Yes, you need a very short program, and you run it...

To get the thing going?

To get the thing going, and then it will reproduce our universe. There's a problem with noise; we can come back to that later, possibly.
Is noise a problem, or is it a bug or a feature?

I would say it makes our life as scientists really, really much harder. I mean, think about it: without noise, we wouldn't need all of statistics.

But then maybe we wouldn't feel like there's free will. Maybe we need that...

That's an illusion, that noise can give you free will.

At least in that way, it's a feature.

But also, if you don't have noise, you have chaotic phenomena, which are effectively like noise, so we can't get away from statistics even then. I mean, think about rolling a die, and forget about quantum mechanics, and assume you know exactly how you throw it. It's still so hard to compute the trajectory that effectively it is best to model it as coming up with each number with probability one over six. But from this philosophical Kolmogorov complexity perspective, if we didn't have noise, then arguably you could describe the whole universe with the standard model plus general relativity. I mean, we don't have a theory of everything yet, but assume we are close to it or have it, plus the initial conditions, which may hopefully be simple. Then you just run it, and you would reproduce the universe. But that's spoiled by noise, or by chaotic systems, or by initial conditions, which may be complex.
So now, if we don't take the whole universe but just a subset, just take planet Earth: planet Earth cannot be compressed into a couple of equations. It is a hugely complex system.

So when you look out the window, the whole thing might be simple, but when you just take a small window, then...

It may become complex, and that may be counterintuitive, but there's a very nice analogy: the library of all books. Imagine you have a normal library, with interesting books, and you go there: great, lots of information, and quite complex. Now I create a library which contains all possible books. So the first book just has "AAAA..." over all the pages, the next book is all A's and ends with a B, and so on. I create this library of all books, and I can write a super short program which creates this library. So this library, which has all books, has zero information content. But you take a subset of this library, and suddenly you have a lot of information in there.
That's fascinating. I think one of the most beautiful mathematical objects, which at least today seems to be understudied or under-talked-about, is cellular automata. What lessons do you draw from the Game of Life, or cellular automata in general, where you start with simple rules, just like you're describing with the universe, and somehow complexity emerges? Do you feel like you have an intuitive grasp of the fascinating behavior of such systems, where, like you said, some chaotic behavior could happen, some complexity could emerge, some could die out, and some very rigid structures form? Do you have a sense about cellular automata that somehow transfers to the bigger questions of our universe?
Yeah, cellular automata, and especially Conway's Game of Life, are really great, because the rules are so simple. You can explain them to any child, and even by hand you can simulate a little bit, and you see these beautiful patterns emerge. And people have proven that it's even Turing complete: you cannot just use a computer to simulate the Game of Life, you can also use the Game of Life to simulate any computer. That is truly amazing, and it's probably the prime example for demonstrating that very simple rules can lead to very rich phenomena.
And people sometimes ask: how are chemistry and biology so rich? This can't be based on simple rules. But no, we know quantum electrodynamics describes all of chemistry. And, we come back to that later, I claim intelligence can be explained, or described, in one single equation, and it is this very rich phenomenon. You asked also whether I understand this phenomenon, and the answer is probably not. There's this saying: you never really understand things, you just get used to them. And I think I've gotten pretty used to cellular automata.

So you believe that you understand now why this phenomenon happens?
Well, let me give you a different example. I didn't play too much with Conway's Game of Life, but a little bit more with fractals and with the Mandelbrot set, these beautiful patterns; just look up "Mandelbrot set". Back when computers were really slow and I just had a black-and-white monitor, I programmed my own routines, in assembler, too...

Wow, you're legit.

...to get these fractals on the screen, and I was mesmerized. Much later, I returned to this every couple of years and tried to understand what is going on, and you can understand a little bit. So I tried to derive the locations: there are these circles and the apple shape, and then you have smaller Mandelbrot sets recursively within the set. And there's a way, mathematically, by solving high-order polynomials, to figure out where the centers of these are and approximately what size they are. And by mathematically approaching this problem, you slowly get a feeling for why things are the way they are, and that is sort of a first step towards understanding why this phenomenon is so rich.
What's your intuition: do you think it's possible to reverse-engineer and find the short program that generated these fractals, just by looking at the fractals?

Well, in principle, yes. In principle, what you can do is take any data set, these fractals or whatever you have, say a picture of Conway's Game of Life, and run through all programs. You take programs of size one, two, three, four, and run them all in parallel in so-called dovetailing fashion. Give them computational resources: the first one 50%, the second one half of the remaining resources, and so on. Let them run, wait until they halt and give an output, and compare it to your data. If one of these programs produces the correct data, then you stop, and you already have some program. It may be a long program that just happens to be fast, so you continue, and you get shorter and shorter programs until you eventually find the shortest program. The interesting thing is that you can never know whether it's the shortest program, because there could be an even shorter program which is just slower, and you would have to wait longer. But asymptotically, and actually after finite time, you have the shortest program. So this is a theoretical, but completely impractical, way of finding the underlying structure in every data set, and that is what Solomonoff induction does, and Kolmogorov complexity. In practice, of course, we have to approach the problem more intelligently.
And then, if you take resource limitations into account, there is, for instance, the field of pseudo-random numbers. These are deterministic sequences, but no fast algorithm (fast meaning it runs in polynomial time) can detect that they are actually deterministic. So we can produce interesting... I mean, random numbers are maybe not that interesting, but just as an example: we can produce complex-looking data and then prove that no fast algorithm can detect the underlying pattern.
Which, unfortunately, is a big challenge for our search for simple programs in the space of artificial intelligence, perhaps.

Yes, it definitely is for artificial intelligence, and it's quite surprising that, well, I can't say it was easy. I mean, physicists worked really hard to find these theories, but apparently it was possible for human minds to find these simple rules in the universe. It could have been different, right?

It could have been different. It's awe-inspiring.
So let me ask another absurdly big question: what is intelligence, in your view?

So I have, of course, a definition.

I wasn't sure what you were going to say, because you could have just as easily said... which many people would say.

But I'm not modest in this question. So the informal version, which I worked out together with Shane Legg, who co-founded DeepMind, is that intelligence measures an agent's ability to perform well in a wide range of environments. So that doesn't sound very impressive, but these words have been very carefully chosen, and there is a mathematical theory behind that, and we come back to that later. And if you look at this definition by itself, it seems like, yeah, okay, a lot of things are missing. But if you think it through, then you realize that most, and I claim all, of the other traits, at least of rational intelligence, which we usually associate with intelligence, are emergent phenomena from this definition: creativity, memorization, planning, knowledge. You need all of that in order to perform well in a wide range of environments, so you don't have to explicitly mention it in the definition.

So consciousness, abstract reasoning, all these kinds of things are just emergent phenomena that help you towards... can you say the definition again?

Performing well in multiple environments.
Did you mention the word "goals"?

No, but we have an alternative definition. Instead of "performing well", you can just replace it by "goals": intelligence measures an agent's ability to achieve goals in a wide range of environments. That's more or less equal.

Because in there, there's an injection of the word "goals": we want to specify that there should be a goal.

Yeah, but what does "perform well" even mean? It's the same problem. There's a little bit of a gray area, but it's much closer to something that could be formalized.
In your view, where do humans fit into that definition? Are they general intelligence systems? How good are they at fulfilling that definition, at performing well in multiple environments?

Yeah, that's a big question. I mean, humans are performing best among all species.

You could say that trees and plants are doing a better job; they'll probably outlast us.

Yeah, but they are in a much more narrow environment, right? I mean, you just have a little bit of air pollution and these trees die, while we can adapt, right? We build houses, we build filters, we do geoengineering.

So the multiple-environments part...

Yeah, that is very important. So that distinguishes narrow intelligence from wide intelligence, also in AI research.
link |
So let me ask the Allentourian question.
link |
Can machines think?
link |
Can machines be intelligent?
link |
So in your view, I have to kind of ask,
link |
the answer is probably yes,
link |
but I want to kind of hear what your thoughts on it.
link |
Can machines be made to fulfill this definition
link |
of intelligence, to achieve intelligence?
link |
Well, we are sort of getting there
link |
and on a small scale, we are already there.
link |
The wide range of environments are missing,
link |
but we have self driving cars,
link |
we have programs which play Go and chess,
link |
we have speech recognition.
link |
So that's pretty amazing,
link |
but these are narrow environments.
link |
But if you look at AlphaZero,
link |
that was also developed by DeepMind.
link |
I mean, got famous with AlphaGo
link |
and then came AlphaZero a year later.
link |
That was truly amazing.
link |
So reinforcement learning algorithm,
link |
which is able just by self play,
link |
to play chess and then also Go.
link |
And I mean, yes, they're both games,
link |
but they're quite different games.
link |
And you didn't don't feed them the rules of the game.
link |
And the most remarkable thing,
link |
which is still a mystery to me,
link |
that usually for any decent chess program,
link |
I don't know much about Go,
link |
you need opening books and end game tables and so on too.
link |
And nothing in there, nothing was put in there.
link |
Especially with AlphaZero,
link |
the self playing mechanism starting from scratch,
link |
being able to learn actually new strategies is...
link |
Yeah, it rediscovered all these famous openings
link |
within four hours by itself.
link |
What I was really happy about,
link |
I'm a terrible chess player, but I like the Queen's Gambit.
link |
And AlphaZero figured out that this is the best opening.
link |
Finally, somebody proved you correct.
link |
So yes, to answer your question,
link |
yes, I believe that general intelligence is possible.
link |
And it also, I mean, it depends how you define it.
link |
Do you say AGI with general intelligence,
link |
artificial intelligence,
link |
only refers to if you achieve human level
link |
or a subhuman level, but quite broad,
link |
is it also general intelligence?
link |
So we have to distinguish,
link |
or it's only super human intelligence,
link |
general artificial intelligence.
link |
Is there a test in your mind,
link |
like the Turing test for natural language
link |
or some other test that would impress the heck out of you
link |
that would kind of cross the line of your sense
link |
of intelligence within the framework that you said?
link |
Well, the Turing test has been criticized a lot,
link |
but I think it's not as bad as some people think.
link |
And some people think it's too strong.
link |
So it tests not just for a system to be intelligent,
link |
but it also has to fake being human, to deceive,
link |
which is much harder.
link |
And on the other hand, they say it's too weak
link |
because it just maybe fakes emotions
link |
or intelligent behavior.
link |
But I don't think that's the problem or a big problem.
link |
So if you would pass the Turing test,
link |
so a conversation over terminal with a bot for an hour,
link |
or maybe a day or so,
link |
and you can fool a human into not knowing
link |
whether this is a human or not,
link |
so that's the Turing test,
link |
I would be truly impressed.
link |
And we have this annual competition, the Loebner Prize.
link |
And I mean, it started with ELIZA,
link |
that was the first conversational program.
link |
And what is it called?
link |
The Japanese Mitsuku, or so.
link |
That's the winner of the last couple of years.
link |
Yeah, it's quite impressive.
link |
And then Google has developed Meena, right?
link |
Just recently, that's an open domain conversational bot,
link |
just a couple of weeks ago, I think.
link |
Yeah, I kind of like the metric
link |
that sort of the Alexa Prize has proposed.
link |
I mean, maybe it's obvious to you.
link |
It wasn't to me of setting sort of a length
link |
of a conversation.
link |
Like you want the bot to be sufficiently interesting
link |
that you would want to keep talking to it
link |
for like 20 minutes.
link |
And that's a surprisingly effective metric in aggregate,
link |
because really, like nobody has the patience
link |
to be able to talk to a bot that's not interesting
link |
and intelligent and witty,
link |
and is able to go on to different tangents, jump domains,
link |
be able to say something interesting
link |
to maintain your attention.
link |
And maybe many humans will also fail this test.
link |
That's the thing, unfortunately:
link |
just like with autonomous vehicles, with chatbots,
link |
we also set a bar that's way too high to reach.
link |
I said, you know, the Turing test is not as bad
link |
as some people believe,
link |
but what is really not useful about the Turing test,
link |
it gives us no guidance
link |
how to develop these systems in the first place.
link |
Of course, you know, we can develop them by trial and error
link |
and, you know, do whatever and then run the test
link |
and see whether it works or not.
link |
But a mathematical definition of intelligence
link |
gives us, you know, an objective,
link |
which we can then analyze by theoretical tools
link |
or computational, and, you know,
link |
maybe even prove how close we are.
link |
And we will come back to that later with the AIXI model.
link |
So, I mentioned the compression, right?
link |
So in natural language processing,
link |
they have achieved amazing results.
link |
And one way to test this, of course,
link |
you know, take the system, you train it,
link |
and then you see how well it performs on the task.
link |
But a lot of performance measurement
link |
is done by so called perplexity,
link |
which is essentially the same as complexity
link |
or compression length.
link |
So the NLP community develops new systems
link |
and then they measure the compression length
link |
and then they have rankings and leaderboards,
link |
because there's a strong correlation
link |
between compressing well,
link |
and then the system's performing well at the task at hand.
link |
It's not perfect, but it's good enough
link |
for them as an intermediate aim.
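The perplexity and compression-length relation just described can be made concrete in a small sketch (the function names here are illustrative, not an existing API):

```python
import math

def compression_length_bits(probs):
    # Ideal code length in bits (as with arithmetic coding) for a sequence,
    # given the model's predicted probability of each token that occurred.
    return sum(-math.log2(p) for p in probs)

def perplexity(probs):
    # Perplexity is 2 to the (average bits per token): a model that
    # compresses better, i.e. uses fewer bits per token, has lower perplexity.
    return 2 ** (compression_length_bits(probs) / len(probs))

# A model that assigns probability 1/2 to every observed token needs
# exactly 1 bit per token, i.e. perplexity 2.
assert compression_length_bits([0.5] * 10) == 10.0
assert perplexity([0.5] * 10) == 2.0
```

So ranking language models by perplexity is, up to this transformation, ranking them by how short a code they assign to the corpus.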
link |
So you mean a measure,
link |
so this is kind of almost returning
link |
to the notion of Kolmogorov complexity.
link |
So you're saying good compression
link |
usually means good intelligence.
link |
So you mentioned you're one of the only people
link |
who dared boldly to try to formalize
link |
the idea of artificial general intelligence,
link |
to have a mathematical framework for intelligence,
link |
just like as we mentioned,
link |
termed AIXI, A, I, X, I.
link |
So let me ask the basic question.
link |
Okay, so let me first say what it stands for because...
link |
What it stands for, actually,
link |
that's probably the more basic question.
link |
The first question is usually how it's pronounced,
link |
but finally I put it on the website how it's pronounced
link |
and you figured it out.
link |
The name comes from AI, artificial intelligence,
link |
and the X, I, is the Greek letter Xi,
link |
which is used for Solomonoff's distribution
link |
for quite stupid reasons,
link |
which I'm not willing to repeat here in front of camera.
link |
So it just happened to be more or less arbitrary.
link |
But it also has nice other interpretations.
link |
So there are actions and perceptions in this model.
link |
An agent has actions and perceptions over time.
link |
So this is A index I, X index I.
link |
So there's the action at time I
link |
and then followed by perception at time I.
link |
Yeah, we'll go with that.
link |
I'll edit out the first part.
link |
I have some more interpretations.
link |
So at some point, maybe five years ago or 10 years ago,
link |
I discovered in Barcelona, it was on a big church
link |
that was in stone engraved, some text,
link |
and the word Aixi appeared there a couple of times.
link |
I was very surprised and happy about that.
link |
And I looked it up.
link |
So it is a Catalan language
link |
and it means with some interpretation of that's it,
link |
that's the right thing to do.
link |
Oh, so it's almost like destined somehow.
link |
It came to you in a dream.
link |
And similar, there's a Chinese word, Aixi,
link |
also written like Aixi, if you transcribe that to Pinyin.
link |
And the final one is that it's AI crossed with induction
link |
because that is, and that's going more to the content now.
link |
So good old fashioned AI is more about planning
link |
and known deterministic world
link |
and induction is more about often IID data
link |
and inferring models.
link |
And essentially what this Aixi model does
link |
is combining these two.
link |
And I actually also recently, I think heard that
link |
in Japanese AI means love.
link |
So if you can combine XI somehow with that,
link |
I think we can, there might be some interesting ideas there.
link |
So Aixi, let's then take the next step.
link |
Can you maybe talk at the big level
link |
of what is this mathematical framework?
link |
Yeah, so it consists essentially of two parts.
link |
One is the learning and induction and prediction part.
link |
And the other one is the planning part.
link |
So let's come first to the learning,
link |
induction, prediction part,
link |
which essentially I explained already before.
link |
So what we need for any agent to act well
link |
is that it can somehow predict what happens.
link |
I mean, if you have no idea what your actions do,
link |
how can you decide which actions are good or not?
link |
So you need to have some model of what effect your actions have.
link |
So what you do is you have some experience,
link |
you build models like scientists of your experience,
link |
then you hope these models are roughly correct,
link |
and then you use these models for prediction.
link |
And the model is, sorry to interrupt,
link |
and the model is based on your perception of the world,
link |
how your actions will affect that world.
link |
So how do you think about a model?
link |
That's not the important part,
link |
but it is technically important,
link |
but at this stage we can just think about predicting,
link |
let's say, stock market data, weather data,
link |
or IQ sequences, one, two, three, four, five,
link |
what comes next, yeah?
link |
So of course our actions affect what we're doing,
link |
but I'll come back to that in a second.
link |
So, and I'll keep just interrupting.
link |
So just to draw a line between prediction and planning,
link |
what do you mean by prediction in this way?
link |
It's trying to predict the environment
link |
without your long term action in the environment?
link |
What is prediction?
link |
Okay, if you want to put the actions in now,
link |
okay, then let's put it in now, yeah?
link |
We don't have to put them now.
link |
Scratch it, scratch it, dumb question, okay.
link |
So the simplest form of prediction is
link |
that you just have data which you passively observe,
link |
and you want to predict what happens
link |
without interfering, as I said,
link |
weather forecasting, stock market, IQ sequences,
link |
or just anything, okay?
link |
And Solomonoff's theory of induction is based on compression,
link |
so you look for the shortest program
link |
which describes your data sequence,
link |
and then you take this program, run it,
link |
it reproduces your data sequence by definition,
link |
and then you let it continue running,
link |
and then it will produce some predictions,
link |
and you can rigorously prove that for any prediction task,
link |
this is essentially the best possible predictor.
link |
Of course, if there's a prediction task,
link |
or a task which is unpredictable,
link |
like, you know, you have fair coin flips.
link |
Yeah, I cannot predict the next fair coin flip.
link |
What Solomonoff induction does is say,
link |
okay, the next head is probably 50%.
link |
It's the best you can do.
link |
So if something is unpredictable,
link |
Solomonoff will also not magically predict it.
link |
But if there is some pattern and predictability,
link |
then Solomonoff induction will figure that out eventually,
link |
and not just eventually, but rather quickly,
link |
and you can prove convergence rates,
link |
whatever your data is.
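A toy version of this "shortest program" prediction, restricted to the tiny class of periodic "programs" (an illustrative assumption made here for brevity; real Solomonoff induction ranges over all programs and is incomputable):

```python
def shortest_periodic_program(seq):
    # Toy stand-in for "the shortest program describing the data": among
    # programs of the form "repeat this block forever", pick the shortest
    # block consistent with the observed sequence.
    for period in range(1, len(seq) + 1):
        block = seq[:period]
        if all(seq[i] == block[i % period] for i in range(len(seq))):
            return block
    return seq

def predict_next(seq):
    # Run the shortest program past the end of the data: its next output
    # is the prediction.
    block = shortest_periodic_program(seq)
    return block[len(seq) % len(block)]

# The pattern 1,2,3,1,2,3,... is described by the short "program" [1,2,3];
# letting it continue running predicts the next symbol.
assert shortest_periodic_program([1, 2, 3, 1, 2, 3, 1]) == [1, 2, 3]
assert predict_next([1, 2, 3, 1, 2, 3, 1]) == 2
```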
link |
So there's pure magic in a sense.
link |
Well, the catch is that it's not computable,
link |
and we come back to that later.
link |
You cannot just implement it
link |
even with Google resources here,
link |
and run it and predict the stock market and become rich.
link |
I mean, Ray Solomonoff already tried it at the time.
link |
But so the basic task is you're in the environment,
link |
and you're interacting with the environment
link |
to try to learn to model that environment,
link |
and the model is in the space of all these programs,
link |
and your goal is to get a bunch of programs that are simple.
link |
Yeah, so let's go to the actions now.
link |
But actually, good that you asked.
link |
Usually I skip this part,
link |
although there is also a minor contribution which I did,
link |
so the action part,
link |
but I usually sort of just jump to the decision part.
link |
So let me explain the action part now.
link |
Thanks for asking.
link |
So you have to modify it a little bit
link |
by now not just predicting a sequence
link |
which just comes to you,
link |
but you have an observation, then you act somehow,
link |
and then you want to predict the next observation
link |
based on the past observation and your action.
link |
Then you take the next action.
link |
You don't care about predicting it because you're doing it.
link |
Then you get the next observation,
link |
and you want, well, before you get it,
link |
you want to predict it, again,
link |
based on your past action and observation sequence.
link |
You just condition extra on your actions.
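A crude counting version of this action-conditioned prediction (the function and names are illustrative assumptions; Solomonoff induction would weight whole programs instead of counting):

```python
from collections import defaultdict

def predict_observation(history, action):
    # "Condition extra on your actions": predict the next observation as
    # the one that most often followed this action in the history so far.
    counts = defaultdict(lambda: defaultdict(int))
    for a, o in history:
        counts[a][o] += 1
    seen = counts[action]
    return max(seen, key=seen.get) if seen else None

# After bumping into a wall twice when going left, the predictor expects
# a wall the next time it goes left.
history = [("left", "wall"), ("right", "open"), ("left", "wall")]
assert predict_observation(history, "left") == "wall"
assert predict_observation(history, "right") == "open"
```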
link |
There's an interesting alternative
link |
that you also try to predict your own actions.
link |
In the past or the future?
link |
In your future actions.
link |
That's interesting.
link |
Yeah. Wait, let me wrap.
link |
I think my brain just broke.
link |
We should maybe discuss that later
link |
after I've explained the AIXI model.
link |
That's an interesting variation.
link |
But that is a really interesting variation,
link |
and a quick comment.
link |
I don't know if you want to insert that in here,
link |
but you're looking at the, in terms of observations,
link |
you're looking at the entire, the big history,
link |
the long history of the observations.
link |
Exactly. That's very important.
link |
The whole history from birth sort of of the agent,
link |
and we can come back to that.
link |
And also why this is important.
link |
Often, you know, in RL, you have MDPs,
link |
Markov decision processes, which are much more limiting.
link |
Okay. So now we can predict conditioned on actions.
link |
So even if you influence environment,
link |
but prediction is not all we want to do, right?
link |
We also want to act really in the world.
link |
And the question is how to choose the actions.
link |
And we don't want to greedily choose the actions,
link |
you know, just, you know, what is best in the next time step.
link |
And we first, I should say, you know, what is, you know,
link |
how do we measure performance?
link |
So we measure performance by giving the agent reward.
link |
That's the so called reinforcement learning framework.
link |
So every time step, you can give it a positive reward
link |
or negative reward, or maybe no reward.
link |
It could be a very scarce, right?
link |
Like if you play chess, just at the end of the game,
link |
you give plus one for winning or minus one for losing.
link |
So in the AIXI framework, that's completely sufficient.
link |
So occasionally you give a reward signal
link |
and you ask the agent to maximize reward,
link |
but not greedily sort of, you know, the next one, next one,
link |
because that's very bad in the long run if you're greedy.
link |
So, but over the lifetime of the agent.
link |
So let's assume the agent lives for M time steps,
link |
or say dies in sort of a hundred years sharp.
link |
That's just, you know, the simplest model to explain.
link |
So it looks at the future reward sum
link |
and ask what is my action sequence,
link |
or actually more precisely my policy,
link |
which leads in expectation, because I don't know the world,
link |
to the maximum reward sum.
link |
Let me give you an analogy.
link |
In chess, for instance,
link |
we know how to play optimally in theory.
link |
It's just a mini max strategy.
link |
I play the move which seems best to me
link |
under the assumption that the opponent plays the move
link |
which is best for him.
link |
So best for him, so worst for me, under the assumption that
link |
I again play the best move.
link |
And then you have this expectimax tree
link |
to the end of the game, and then you back propagate,
link |
and then you get the best possible move.
link |
So that is the optimal strategy,
link |
which von Neumann already figured out a long time ago,
link |
for playing adversarial games.
link |
Luckily, or maybe unluckily for the theory,
link |
it becomes harder.
link |
The world is not always adversarial.
link |
So it can be, if there are other humans,
link |
even cooperative, or nature is usually,
link |
I mean, dead nature is stochastic, you know,
link |
things just happen randomly, or don't care about you.
link |
So what you have to take into account is the noise,
link |
and not necessarily adversariality.
link |
So you replace the minimum on the opponent's side
link |
by an expectation,
link |
which is general enough to include also adversarial cases.
link |
So now instead of a mini max strategy,
link |
you have an expected max strategy.
link |
So that is well known.
link |
It's called sequential decision theory.
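The expectimax recursion of sequential decision theory can be sketched as follows (the toy environment and names are assumptions for illustration):

```python
def expectimax(state, depth, actions, outcomes, is_terminal):
    # Sequential decision theory: pick the action maximizing the expected
    # sum of future rewards, where the expectation replaces the opponent's
    # minimum and is taken over the environment's stochastic responses.
    if depth == 0 or is_terminal(state):
        return 0.0
    return max(
        sum(p * (r + expectimax(s2, depth - 1, actions, outcomes, is_terminal))
            for p, s2, r in outcomes(state, a))
        for a in actions(state))

# Tiny example: "safe" pays 1 for sure; "risky" pays 3 with probability 1/2.
actions = lambda s: ["safe", "risky"]
outcomes = lambda s, a: ([(1.0, s, 1.0)] if a == "safe"
                         else [(0.5, s, 3.0), (0.5, s, 0.0)])
terminal = lambda s: False
assert expectimax(0, 1, actions, outcomes, terminal) == 1.5  # risky: 0.5 * 3
assert expectimax(0, 2, actions, outcomes, terminal) == 3.0  # two risky steps
```

With a deterministic adversary supplying the worst outcome with probability one, this reduces to the minimax case described above.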
link |
But the question is,
link |
on which probability distribution do you base that?
link |
If I have the true probability distribution,
link |
like say I play backgammon, right?
link |
There's dice, and there's certain randomness involved.
link |
Yeah, I can calculate probabilities
link |
and feed it in the expected max,
link |
or the sequential decision tree,
link |
come up with the optimal decision if I have enough compute.
link |
But for the real world, we don't know that, you know,
link |
what is the probability the driver in front of me brakes?
link |
So depends on all kinds of things,
link |
and especially new situations, I don't know.
link |
So this is this unknown thing about prediction,
link |
and that's where Solomonoff comes in.
link |
So what you do is in sequential decision tree,
link |
you just replace the true distribution,
link |
which we don't know, by this universal distribution.
link |
I didn't explicitly talk about it,
link |
but this is used for universal prediction
link |
and plug it into the sequential decision tree mechanism.
link |
And then you get the best of both worlds.
link |
You have a long term planning agent,
link |
but it doesn't need to know anything about the world
link |
because the Solomonoff induction part learns.
link |
Can you explicitly try to describe
link |
the universal distribution
link |
and how Solomonoff induction plays a role here?
link |
I'm trying to understand.
link |
So what it does is, in the simplest case,
link |
I said, take the shortest program describing your data,
link |
run it, have a prediction which would be deterministic.
link |
But you should not just take the shortest program,
link |
but also consider the longer ones,
link |
but give them lower a priori probability.
link |
So in the Bayesian framework, you say a priori,
link |
any distribution, which is a model or a stochastic program,
link |
has a certain a priori probability,
link |
which is two to the minus length of this program,
link |
and why two to the minus length, you know, I could explain.
link |
So longer programs are punished a priori.
link |
And then you multiply it
link |
with the so called likelihood function,
link |
which is, as the name suggests,
link |
is how likely is this model given the data at hand.
link |
So if you have a very wrong model,
link |
it's very unlikely that this model is true.
link |
And so it is very small number.
link |
So even if the model is simple, it gets penalized by that.
link |
And what you do is then you take just the sum,
link |
or this is the average over it.
link |
And this gives you a probability distribution.
link |
So this is the universal distribution, or Solomonoff distribution.
link |
So it's weighed by the simplicity of the program
link |
and the likelihood.
link |
It's kind of a nice idea.
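The prior-times-likelihood mixture just described can be sketched over a tiny hypothesis class (the class, the assigned "program lengths", and the function names are all assumptions made purely for illustration; the real prior is over all programs):

```python
import math

# A few Bernoulli models standing in for "all programs", each with an
# assumed description length in bits.
hypotheses = [
    (0.5, 1),   # (bias, program length): the simplest model
    (0.9, 3),
    (0.1, 3),
]

def posterior(data):
    # Prior 2^-length times likelihood, normalized: Bayes' rule with the
    # Solomonoff-style simplicity prior.
    weights = []
    for bias, length in hypotheses:
        prior = 2.0 ** -length
        likelihood = math.prod(bias if x == 1 else 1 - bias for x in data)
        weights.append(prior * likelihood)
    z = sum(weights)
    return [w / z for w in weights]

def predict_one(data):
    # Mixture prediction: probability that the next bit is 1.
    return sum(w * bias for w, (bias, _) in zip(posterior(data), hypotheses))

# After seeing many 1s, the mixture shifts toward the bias-0.9 model even
# though it has a longer description: the likelihood overwhelms the prior.
assert 0.85 < predict_one([1] * 20) < 0.91
```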
link |
So okay, and then you said you're playing N or M,
link |
I forgot the letter, steps into the future.
link |
So how difficult is that problem?
link |
What's involved there?
link |
Okay, so basic optimization problem.
link |
What are we talking about?
link |
Yeah, so you have a planning problem up to horizon M,
link |
and that's exponential time in the horizon M,
link |
which is, I mean, it's computable, but intractable.
link |
I mean, even for chess, it's already intractable
link |
to do that exactly.
link |
And, you know, for Go.
link |
But it could be also discounted kind of framework where.
link |
Yeah, so having a hard horizon, you know, at 100 years,
link |
it's just for simplicity of discussing the model
link |
and also sometimes the math is simple.
link |
But there are lots of variations,
link |
it's actually quite an interesting parameter.
link |
There's nothing really problematic about it,
link |
but it's very interesting.
link |
So for instance, you think, no,
link |
let's let the parameter M tend to infinity, right?
link |
You want an agent which lives forever, right?
link |
If you do it normally, you have two problems.
link |
First, the mathematics breaks down
link |
because you have an infinite reward sum,
link |
which may give infinity,
link |
and getting reward 0.1 every time step gives infinity,
link |
and getting reward one every time step also gives infinity,
link |
so you cannot distinguish them, which is not really what we want.
link |
Other problem is that if you have an infinite life,
link |
you can be lazy for as long as you want for 10 years
link |
and then catch up with the same expected reward.
link |
And think about yourself or maybe some friends or so.
link |
If they knew they lived forever, why work hard now?
link |
Just enjoy your life and then catch up later.
link |
So that's another problem with infinite horizon.
link |
And you mentioned, yes, we can go to discounting,
link |
but then the standard discounting
link |
is so called geometric discounting.
link |
So a dollar today is about worth
link |
as much as $1.05 tomorrow.
link |
So if you do the so called geometric discounting,
link |
you have introduced an effective horizon.
link |
So the agent is now motivated to look ahead
link |
a certain amount of time effectively.
link |
It's like a moving horizon.
link |
And for any fixed effective horizon,
link |
there is a problem to solve,
link |
which requires a larger horizon.
link |
So if I look ahead five time steps,
link |
I'm a terrible chess player, right?
link |
I'll need to look ahead longer.
link |
If I play go, I probably have to look ahead even longer.
link |
So for every problem, for every horizon,
link |
there is a problem which this horizon cannot solve.
link |
But I introduced the so called near harmonic discounting,
link |
which goes down with one over T
link |
rather than exponential in T,
link |
which produces an agent,
link |
which effectively looks into the future
link |
proportional to its age.
link |
So if it's five years old, it plans for five years.
link |
If it's 100 years old, it then plans for 100 years.
link |
And it's a little bit similar to humans too, right?
link |
I mean, children don't plan ahead very long,
link |
but when we become adults, we plan ahead longer.
link |
Maybe when we get very old,
link |
I mean, we know that we don't live forever.
link |
Maybe then our horizon shrinks again.
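The contrast between the two discount schemes can be checked numerically (the half-weight cutoff and all names are assumptions made for this sketch):

```python
def effective_horizon(discount, age, total=100_000):
    # How far ahead the agent must look, at a given age, to cover half of
    # all remaining discounted weight: a simple proxy for the "effective
    # horizon" induced by a discount function.
    tail = sum(discount(t) for t in range(age, total))
    acc = 0.0
    for h, t in enumerate(range(age, total), start=1):
        acc += discount(t)
        if acc >= tail / 2:
            return h

geometric = lambda t: 0.95 ** t          # dollar-today-vs-tomorrow discount
harmonic = lambda t: 1.0 / (t + 1) ** 2  # a near-harmonic discount

g10, g100 = effective_horizon(geometric, 10), effective_horizon(geometric, 100)
h10, h100 = effective_horizon(harmonic, 10), effective_horizon(harmonic, 100)
assert g10 == g100      # geometric: the moving horizon is independent of age
assert h100 > 5 * h10   # near-harmonic: the horizon grows roughly with age
```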
link |
So that's really interesting.
link |
So adjusting the horizon,
link |
is there some mathematical benefit of that?
link |
Or is it just a nice,
link |
I mean, intuitively, empirically,
link |
it would probably be a good idea
link |
to sort of push the horizon back,
link |
extend the horizon as you experience more of the world.
link |
But is there some mathematical conclusions here
link |
that are beneficial?
link |
With Solomonoff induction, or the prediction part,
link |
we have extremely strong finite time,
link |
finite data results.
link |
So you have so and so much data,
link |
then you lose so and so much.
link |
So it's a, the theory is really great.
link |
With the AIXI model, with the planning part,
link |
many results are only asymptotic, which, well, this is...
link |
What does asymptotic mean?
link |
Asymptotic means you can prove, for instance,
link |
that in the long run, if the agent, you know,
link |
acts long enough, then, you know,
link |
it performs optimal or some nice thing happens.
link |
So, but you don't know how fast it converges.
link |
So it may converge fast,
link |
but we're just not able to prove it
link |
because of a difficult problem.
link |
Or maybe there's a bug in the model
link |
so that it's really that slow.
link |
So that is what asymptotic means,
link |
sort of eventually, but we don't know how fast.
link |
And if I give the agent a fixed horizon M,
link |
then I cannot prove asymptotic results, right?
link |
So I mean, sort of if it dies in a hundred years,
link |
then in a hundred years it's over, I cannot say eventually.
link |
So this is the advantage of the discounting
link |
that I can prove asymptotic results.
link |
So just to clarify, so I, okay, I made,
link |
I've built up a model, we're now in the moment of,
link |
I have this way of looking several steps ahead.
link |
How do I pick what action I will take?
link |
It's like with playing chess, right?
link |
You do this minimax.
link |
In this case here, you do expectimax based on the Solomonoff
link |
distribution, you propagate back,
link |
and then, voila, an action falls out,
link |
the action which maximizes the future expected reward
link |
on the Solomonoff distribution,
link |
and then you just take this action.
link |
And then you get a new observation,
link |
and you feed in this action, observation,
link |
and the reward, and so on.
link |
Yeah, and the reward too, yeah.
link |
And then maybe you can even predict your own action.
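The whole interaction cycle just described can be written as a hypothetical skeleton (none of these names come from an actual AIXI implementation; `plan` stands for expectimax over the mixture and `mixture_update` for the Bayesian update):

```python
def aixi_like_loop(environment, mixture_update, plan, horizon, steps):
    # Plan with expectimax over the current mixture, act, observe, then
    # fold the new (action, observation, reward) triple into the history.
    history, total_reward = [], 0.0
    for _ in range(steps):
        action = plan(history, horizon)            # expectimax over the mixture
        observation, reward = environment(action)  # the world responds
        history.append((action, observation, reward))
        mixture_update(history)                    # Bayesian update of beliefs
        total_reward += reward
    return total_reward

# Degenerate example: a fixed "plan" and an environment paying 1 for action 0.
env = lambda a: (0, 1.0 if a == 0 else 0.0)
plan = lambda hist, h: 0
update = lambda hist: None
assert aixi_like_loop(env, update, plan, horizon=5, steps=3) == 3.0
```

Note that `history` is the whole action-observation-reward sequence from the start, not just the last state, matching the no-Markov-assumption point above.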
link |
But okay, this big framework,
link |
what is it, I mean,
link |
it's kind of a beautiful mathematical framework
link |
to think about artificial general intelligence.
link |
What can you, what does it help you intuit
link |
about how to build such systems?
link |
Or maybe from another perspective,
link |
what does it help us in understanding AGI?
link |
So when I started in the field,
link |
I was always interested in two things.
link |
One was AGI, the name didn't exist then,
link |
what's called general AI or strong AI,
link |
and the physics theory of everything.
link |
So I switched back and forth between computer science
link |
and physics quite often.
link |
You said the theory of everything.
link |
The theory of everything, yeah.
link |
Those are basically the two biggest problems
link |
before all of humanity.
link |
Yeah, I can explain, if you want, at some later time,
link |
why I'm interested in these two questions.
link |
Can I ask you in a small tangent,
link |
if it was one to be solved,
link |
which one would you,
link |
if an apple fell on your head
link |
and there was a brilliant insight
link |
and you could arrive at the solution to one,
link |
would it be AGI or the theory of everything?
link |
Definitely AGI, because once the AGI problem is solved,
link |
I can ask the AGI to solve the other problem for me.
link |
Yeah, brilliant answer.
link |
Okay, so as you were saying about it.
link |
Okay, so, and the reason why I didn't settle,
link |
I mean, this thought about,
link |
once you have solved AGI, it solves all kinds of other,
link |
not just the theory of everything problem,
link |
but all kinds of more useful problems to humanity
link |
is very appealing to many people.
link |
And I had this thought also,
link |
but I was quite disappointed with the state of the art
link |
of the field of AI.
link |
There was some theory about logical reasoning,
link |
but I was never convinced that this will fly.
link |
And then there was this more heuristic approaches
link |
with neural networks and I didn't like these heuristics.
link |
So, and also I didn't have any good idea myself.
link |
So that's the reason why I toggled back and forth
link |
quite some while and even worked four and a half years
link |
in a company developing software,
link |
something completely unrelated.
link |
But then I had this idea about the ICSE model.
link |
And so what it gives you, it gives you a gold standard.
link |
So I have proven that this is the most intelligent agent
link |
which anybody could build in quotation mark,
link |
because it's just mathematical
link |
and you need infinite compute.
link |
But this is the limit and this is completely specified.
link |
It's not just a framework and every year,
link |
tens of frameworks are developed,
link |
which are just skeletons and then pieces are missing.
link |
And usually these missing pieces,
link |
turn out to be really, really difficult.
link |
And so this is completely and uniquely defined
link |
and we can analyze that mathematically.
link |
And we've also developed some approximations.
link |
I can talk about that a little bit later.
link |
That would be sort of the top down approach,
link |
like, say, von Neumann's minimax theory,
link |
that's the theoretical optimal play of games.
link |
And now we need to approximate it,
link |
put heuristics in, prune the tree, blah, blah, blah,
link |
So we can do that also with the AIXI model,
link |
but for general AI.
link |
It can also inspire those,
link |
and most researchers go bottom up, right?
link |
They have the systems,
link |
they try to make it more general, more intelligent.
link |
It can inspire in which direction to go.
link |
What do you mean by that?
link |
So if you have some choice to make, right?
link |
So how should I evaluate my system
link |
if I can't do cross validation?
link |
How should I do my learning
link |
if my standard regularization doesn't work well?
link |
So the answer is always this,
link |
we have a system which does everything, that's AIXI.
link |
It's just completely in the ivory tower,
link |
completely useless from a practical point of view.
link |
But you can look at it and see,
link |
ah, yeah, maybe I can take some aspects.
link |
And instead of Kolmogorov complexity,
link |
you just take some compressors,
link |
which have been developed so far.
link |
And for the planning, well, we have UCT,
link |
which has also been used in Go.
link |
And at least it's inspired me a lot
link |
to have this formal definition.
link |
And if you look at other fields,
link |
like I always come back to physics
link |
because I have a physics background,
link |
think about the phenomenon of energy.
link |
That was long time a mysterious concept.
link |
And at some point it was completely formalized.
link |
And that really helped a lot.
link |
And you can point out a lot of these things
link |
which were first mysterious and vague,
link |
and then they have been rigorously formalized.
link |
Speed and acceleration have been confused, right?
link |
Until it was formally defined,
link |
yeah, there was a time like this.
link |
And people who don't have any background often still confuse them.
link |
And this AIXI model, or the intelligence definitions,
link |
which is sort of the dual to it,
link |
we come back to that later,
link |
formalizes the notion of intelligence
link |
uniquely and rigorously.
link |
So in a sense, it serves as kind of the light
link |
at the end of the tunnel.
link |
So, I mean, there's a million questions.
link |
So maybe, kind of, okay,
link |
let's feel around in the dark a little bit.
link |
So there's been here a deep mind,
link |
but in general, been a lot of breakthrough ideas,
link |
just like we've been saying around reinforcement learning.
link |
So how do you see the progress
link |
in reinforcement learning is different?
link |
Like which subset of AIXI does it occupy?
link |
The current, like you said,
link |
maybe the Markov assumption is made quite often
link |
in reinforcement learning.
link |
There's other assumptions made
link |
in order to make the system work.
link |
What do you see as the difference connection
link |
between reinforcement learning and AIXI?
link |
And so the major difference is that
link |
essentially all other approaches,
link |
they make stronger assumptions.
link |
So in reinforcement learning, the Markov assumption
link |
is that the next state or next observation
link |
only depends on the previous observation
link |
and not the whole history,
link |
which makes, of course, the mathematics much easier
link |
rather than dealing with histories.
link |
Of course, they profit from it also,
link |
because then you have algorithms
link |
that run on current computers
link |
and do something practically useful.
link |
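To make this contrast concrete, here is a minimal sketch (my own construction, not from the conversation) of why a policy that sees only the current observation can fail where a history-based, AIXI-style policy succeeds:

```python
# Toy T-maze: the first observation is a cue ('L' or 'R') telling the
# agent which way to turn at a junction reached after a blank corridor.

def run(policy, cue):
    # observation sequence: cue, then featureless corridor, then junction
    history = [cue, 'corridor', 'corridor', 'junction']
    action = policy(history)
    return 1 if action == cue else 0  # reward 1 for turning toward the cue

def markov_policy(history):
    # sees only the current observation ('junction'), which carries
    # no information about the cue, so it can only guess
    return 'L'

def history_policy(history):
    # a history-based agent conditions on the whole interaction history
    return history[0]

assert run(history_policy, 'L') == 1 and run(history_policy, 'R') == 1
assert run(markov_policy, 'L') == 1 and run(markov_policy, 'R') == 0
```

The Markov policy is right only when its fixed guess happens to match the cue; the history-based policy is always right.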
But for general AI, all the assumptions
link |
which are made by other approaches,
link |
we know already now they are limiting.
link |
So, for instance, usually you need
link |
an ergodicity assumption in the MDP framework
link |
in order to learn.
link |
Ergodicity essentially means that you can recover
link |
from your mistakes and that there are no traps
link |
in the environment.
link |
And if you make this assumption,
link |
then essentially you can go back to a previous state,
link |
go there a couple of times and then learn
link |
the statistics and what the state is like,
link |
and then in the long run perform well in this state.
link |
But there are no fundamental problems.
link |
But in real life, we know there can be one single action.
link |
One second of being inattentive while driving a car fast
link |
can ruin the rest of my life.
link |
I can become quadriplegic or whatever.
link |
So, and there's no recovery anymore.
link |
So, the real world is not ergodic, I always say.
link |
There are traps and there are situations
link |
from which you cannot recover.
link |
And very little theory has been developed for this case.
link |
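A minimal sketch of what non-ergodicity means (my illustration, not Hutter's): one action leads into a trap state from which no sequence of actions returns.

```python
# Minimal non-ergodic MDP: from 'start', action 'risky' leads to a 'trap'
# state that no action can leave, so mistakes are unrecoverable.
TRANSITIONS = {
    ('start', 'safe'):  'start',
    ('start', 'risky'): 'trap',
    ('trap',  'safe'):  'trap',   # every action from the trap...
    ('trap',  'risky'): 'trap',   # ...leads back to the trap
}

def reachable(state, steps=10):
    """States reachable from `state` within `steps` actions."""
    seen, frontier = {state}, {state}
    for _ in range(steps):
        frontier = {TRANSITIONS[(s, a)]
                    for s in frontier for a in ('safe', 'risky')}
        seen |= frontier
    return seen

# From 'start' everything is reachable; from the trap you can never return.
assert reachable('start') == {'start', 'trap'}
assert reachable('trap') == {'trap'}
```

In an ergodic environment every state would remain reachable from every other, which is exactly what the trap breaks.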
What about, what do you see in the context of AIXI
link |
as the role of exploration?
link |
Sort of, you mentioned in the real world
link |
you can get into trouble when we make the wrong decisions
link |
and really pay for it.
link |
But exploration seems to be fundamentally important
link |
for learning about this world, for gaining new knowledge.
link |
So, is exploration baked in?
link |
Another way to ask it:
link |
what are the parameters of AIXI
link |
that can be controlled?
link |
Yeah, I'd say the good thing is that there are no parameters to tune.
link |
Other approaches attach knobs to control.
link |
And you can do that.
link |
I mean, you can modify IECSIA so that you have some knobs
link |
to play with if you want to.
link |
But the exploration is directly baked in.
link |
And that comes from the Bayesian learning
link |
and the longterm planning.
link |
So these together already imply exploration.
link |
You can nicely and explicitly prove that
link |
for simple problems like so called bandit problems,
link |
where you say, to give a real world example,
link |
say you have two medical treatments, A and B,
link |
you don't know the effectiveness,
link |
you try A a little bit, B a little bit,
link |
but you don't want to harm too many patients.
link |
So you have to sort of trade off exploring and exploiting.
link |
And at some point,
link |
and you can do the mathematics
link |
and figure out the optimal strategy.
link |
People talk about Bayesian agents;
link |
there are also non-Bayesian agents,
link |
but it shows that this Bayesian framework
link |
by taking a prior over possible worlds,
link |
doing the Bayesian mixture,
link |
then the Bayes optimal decision with longterm planning
link |
that is important,
link |
automatically implies exploration,
link |
also to the proper extent,
link |
not too much exploration and not too little.
link |
These are very simple settings.
link |
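Thompson sampling is one concrete Bayesian bandit scheme, not AIXI itself, but it illustrates the point Hutter is making: keep a posterior over the two treatments' success rates, and exploration falls out of acting on posterior samples. The success rates below are made up for illustration.

```python
import random

# Two treatments with unknown success rates; Thompson sampling keeps a
# Beta posterior per arm, and exploration emerges from posterior sampling.
random.seed(0)
TRUE_RATES = {'A': 0.7, 'B': 0.4}        # hidden from the agent
posterior = {'A': [1, 1], 'B': [1, 1]}   # Beta(successes+1, failures+1)

pulls = {'A': 0, 'B': 0}
for _ in range(2000):
    # sample a plausible success rate for each arm from its posterior
    sampled = {arm: random.betavariate(*posterior[arm]) for arm in posterior}
    arm = max(sampled, key=sampled.get)   # act greedily on the sample
    reward = random.random() < TRUE_RATES[arm]
    posterior[arm][0 if reward else 1] += 1
    pulls[arm] += 1

# The better treatment ends up tried far more often, but B is not ignored:
# early on, B's wide posterior makes it worth trying.
print(pulls)
```

No explicit exploration bonus appears anywhere; the posterior uncertainty alone drives trying both treatments "to the proper extent."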
For the AIXI model, I was also able to prove
link |
self-optimizing theorems
link |
or asymptotic optimality theorems,
link |
although they're only asymptotic, not finite time bounds.
link |
So it seems like the longterm planning,
link |
the longterm part of the planning, is really important.
link |
And also, I mean, maybe a quick tangent,
link |
how important do you think is removing
link |
the Markov assumption and looking at the full history?
link |
Sort of intuitively, of course, it's important,
link |
but is it like fundamentally transformative
link |
to the entirety of the problem?
link |
What's your sense of it?
link |
Like, cause we all, we make that assumption quite often.
link |
It's just throwing away the past.
link |
No, I think it's absolutely crucial.
link |
The question is whether there's a way to deal with it
link |
in a more heuristic and still sufficiently well way.
link |
So I have to come up with an example on the fly,
link |
but you have some key event in your life,
link |
long time ago in some city or something,
link |
you realized that's a really dangerous street or whatever.
link |
And you want to remember that forever,
link |
in case you come back there.
link |
Kind of a selective kind of memory.
link |
So you remember all the important events in the past,
link |
but somehow selecting the important ones is the hard part.
link |
And I'm not concerned about just storing the whole history.
link |
Just, you can calculate: human life, say 30 or 100 years,
link |
doesn't matter, right?
link |
How much data comes in through the vision system
link |
and the auditory system, you compress it a little bit,
link |
in this case, lossily and store it.
link |
We will soon have the means of just storing it.
link |
But you still need the selection for the planning part
link |
and the compression for the understanding part.
link |
The raw storage I'm really not concerned about.
link |
And I think we should just store,
link |
if you develop an agent,
link |
preferably just store all the interaction history.
link |
And then you build of course models on top of it
link |
and you compress it and you are selective,
link |
but occasionally you go back to the old data
link |
and reanalyze it based on your new experience you have.
link |
Sometimes you are in school,
link |
you learn all these things you think is totally useless
link |
and much later you realize,
link |
oh, they were not so useless as you thought.
link |
I'm looking at you, linear algebra.
link |
So maybe let me ask about objective functions
link |
because the rewards seem to be an important part.
link |
The rewards are kind of given to the system.
link |
For a lot of people,
link |
the specification of the objective function
link |
is a key part of intelligence.
link |
The agent itself figuring out what is important.
link |
What do you think about that?
link |
Is it possible within the AIXI framework
link |
for the agent itself to discover the reward
link |
based on which it should operate?
link |
Okay, that will be a long answer.
link |
So, and that is a very interesting question.
link |
And I'm asked a lot about this question,
link |
where do the rewards come from?
link |
So, and then I give you now a couple of answers.
link |
So if you want to build agents, now let's start simple.
link |
So let's assume we want to build an agent
link |
based on the AIXI model, which performs a particular task.
link |
Let's start with something super simple,
link |
like, I mean, super simple, like playing chess,
link |
or go or something, yeah.
link |
Then you just, the reward is winning the game is plus one,
link |
losing the game is minus one, done.
link |
You apply this agent.
link |
If you have enough compute, you let it self play
link |
and it will learn the rules of the game,
link |
will play perfect chess after some while, problem solved.
link |
Okay, so if you have more complicated problems,
link |
then you may believe that you have the right reward, but it's easy to get it wrong.
link |
So a nice, cute example is the elevator control
link |
that is also in Rich Sutton's book,
link |
which is a great book, by the way.
link |
So you control the elevator and you think,
link |
well, maybe the reward should be coupled
link |
to how long people wait in front of the elevator.
link |
You program it and you do it.
link |
And what happens is the elevator eagerly picks up
link |
all the people, but never drops them off.
link |
So then you realize, oh, maybe the time in the elevator
link |
also counts, so you minimize the sum, yeah?
link |
And the elevator does that, but never picks up the people
link |
on the 10th floor or the top floor
link |
because in expectation, it's not worth it.
link |
Just let them stay.
link |
So even in apparently simple problems,
link |
you can make mistakes, yeah?
link |
And that's what in more serious contexts
link |
AGI safety researchers consider.
link |
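The elevator anecdote can be sketched numerically (a toy of my own, with made-up arrival numbers): under a reward that only penalizes waiting outside, a policy that never drops anyone off scores just as well as one that does.

```python
# Reward-misspecification sketch: if the elevator is rewarded only for
# minimising time spent waiting *outside*, never dropping anyone off
# looks just as good as serving people properly.

def episode(policy_drops_off, steps=100):
    waiting_outside = 5   # people initially at the lobby
    inside = 0
    reward = 0
    for _ in range(steps):
        inside += waiting_outside   # elevator eagerly picks everyone up
        waiting_outside = 0
        if policy_drops_off:
            inside = 0              # doors open, passengers delivered
        waiting_outside = 1         # one new person arrives each step
        reward -= waiting_outside   # penalty only for people outside
    return reward, inside

good_reward, stuck_good = episode(policy_drops_off=True)
bad_reward, stuck_bad = episode(policy_drops_off=False)

# Both policies score identically on the naive objective...
assert good_reward == bad_reward
# ...even though one leaves everybody trapped in the car.
assert stuck_good == 0 and stuck_bad > 0
```

The naive objective simply cannot distinguish the two behaviors, which is why adding time-in-elevator to the penalty is the natural next patch, and why that patch fails in its own way.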
So now let's go back to general agents.
link |
So assume you want to build an agent,
link |
which is generally useful to humans, yeah?
link |
So you have a household robot, yeah?
link |
And it should do all kinds of tasks.
link |
So in this case, the human should give the reward
link |
I mean, maybe it's pretrained in the factory
link |
and there's some sort of internal reward
link |
for the battery level or whatever, yeah?
link |
But so it does the dishes badly, you punish the robot,
link |
it does it good, you reward the robot
link |
and then train it to a new task, yeah, like a child, right?
link |
So you need the human in the loop.
link |
If you want a system, which is useful to the human.
link |
And as long as these agents stay subhuman level,
link |
that should work reasonably well,
link |
apart from these examples.
link |
It becomes critical once they reach a human level.
link |
It's like with children, small children,
link |
whom you have reasonably well under control;
link |
as they become older, the reward technique
link |
doesn't work so well anymore.
link |
So then finally, so this would be agents,
link |
which are just, you could say slaves to the humans, yeah?
link |
So if you are more ambitious and just say,
link |
we want to build a new species of intelligent beings,
link |
we put them on a new planet
link |
and we want them to develop this planet or whatever.
link |
So we don't give them any reward.
link |
So what could we do?
link |
And you could try to come up with some reward functions
link |
like it should maintain itself, the robot,
link |
it should maybe multiply, build more robots, right?
link |
And maybe all kinds of things which you find useful,
link |
but that's pretty hard, right?
link |
What does self maintenance mean?
link |
What does it mean to build a copy?
link |
Should it be exact copy, an approximate copy?
link |
And so that's really hard,
link |
but Laurent Orseau, also at DeepMind, developed a beautiful model.
link |
He just took the AIXI model
link |
and coupled the rewards to information gain.
link |
So he said the reward is proportional
link |
to how much the agent had learned about the world.
link |
And you can rigorously, formally, uniquely define that
link |
in terms of KL divergences, okay?
link |
So if you put that in, you get a completely autonomous agent.
link |
And actually, interestingly, for this agent,
link |
we can prove much stronger result
link |
than for the general agent, which is also nice.
link |
And if you let this agent loose,
link |
it will be in a sense, the optimal scientist.
link |
It is absolutely curious to learn as much as possible about the world.
link |
And of course, it will also have
link |
a lot of instrumental goals, right?
link |
In order to learn, it needs to at least survive, right?
link |
A dead agent is not good for anything.
link |
So it needs to have self preservation.
link |
And if it builds small helpers, acquiring more information,
link |
it will do that, yeah?
link |
If exploration, space exploration or whatever, is necessary
link |
to gather information, it will pursue that.
link |
So it has a lot of instrumental goals
link |
falling on this information gain.
link |
And this agent is completely autonomous of us.
link |
No rewards necessary anymore.
link |
Yeah, of course, it could find a way
link |
to game the concept of information
link |
and get stuck in that library
link |
that you mentioned beforehand
link |
with a very large number of books.
link |
The first agent had this problem.
link |
It would get stuck in front of an old TV screen,
link |
which has just had white noise.
link |
Yeah, white noise, yeah.
link |
But the second version can at least deal with stochasticity.
link |
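One way to sketch the information-gain idea (my simplification; Orseau and Hutter's knowledge-seeking agents define this rigorously): reward each observation by the KL divergence between the posterior and the prior over a set of hypotheses. Once the posterior has settled, a pure-noise source yields vanishing gain, which is why the stochasticity-aware version escapes the white-noise TV.

```python
import math

# Knowledge-seeking sketch: the agent holds a posterior over coin biases
# and gets "reward" equal to the information gained per observation
# (KL divergence between posterior after and before, in nats).

HYPOTHESES = [0.1, 0.3, 0.5, 0.7, 0.9]   # candidate biases

def update(posterior, heads):
    likes = [p if heads else 1 - p for p in HYPOTHESES]
    unnorm = [w * l for w, l in zip(posterior, likes)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def info_gain(posterior, heads):
    new = update(posterior, heads)
    return sum(n * math.log(n / o)
               for n, o in zip(new, posterior) if n > 0)

post = [0.2] * 5                          # uniform prior
gains = []
flips = [True, False, True, True, False, True, False, False] * 25
for heads in flips:                       # balanced "white noise" stream
    gains.append(info_gain(post, heads))
    post = update(post, heads)

# Early observations are informative; once the posterior has settled on
# the fair-coin hypothesis, further noise yields almost no gain.
assert gains[0] > 10 * gains[-1]
```

A deterministic first-generation information measure would keep "enjoying" unpredictable noise; measuring gain against a probabilistic posterior makes the noise boring once it is modeled.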
Yeah, what about curiosity?
link |
This kind of word, curiosity, creativity,
link |
is that kind of the reward function being
link |
of getting new information?
link |
Is that similar to idea of kind of injecting exploration
link |
for its own sake inside the reward function?
link |
Do you find this at all appealing, interesting?
link |
I think that's a nice definition.
link |
Curiosity is rewards.
link |
Sorry, curiosity is exploration for its own sake.
link |
Yeah, I would accept that.
link |
But most curiosity, well, in humans,
link |
and especially in children,
link |
is not just for its own sake,
link |
but for actually learning about the environment
link |
and for behaving better.
link |
So I think most curiosity is tied in the end
link |
towards performing better.
link |
Well, okay, so if intelligence systems
link |
need to have this reward function,
link |
let me, you're an intelligence system,
link |
currently passing the Turing test quite effectively.
link |
What's the reward function
link |
of our human intelligence existence?
link |
What's the reward function
link |
that Marcus Hutter is operating under?
link |
Okay, to the first question,
link |
the biological reward function is to survive and to spread,
link |
and very few humans sort of are able to overcome
link |
this biological reward function.
link |
But we live in a very nice world
link |
where we have lots of spare time
link |
and can still survive and spread,
link |
so we can develop arbitrary other interests,
link |
which is quite interesting.
link |
On top of that, yeah.
link |
But the survival and spreading sort of is,
link |
I would say, the goal or the reward function of humans,
link |
so that the core one.
link |
I like how you avoided answering the second question,
link |
which a good intelligence system would.
link |
That your own meaning of life and the reward function.
link |
My own meaning of life and reward function
link |
is to find an AGI to build it.
link |
Okay, let's dissect AIXI even further.
link |
So one of the assumptions is kind of infinity
link |
keeps creeping up everywhere,
link |
which, what are your thoughts
link |
on kind of bounded rationality
link |
and sort of the nature of our existence
link |
and intelligence systems is that we're operating
link |
always under constraints, under limited time,
link |
limited resources.
link |
How does that, how do you think about that
link |
within the AIXI framework,
link |
within trying to create an AGI system
link |
that operates under these constraints?
link |
Yeah, that is one of the criticisms about AIXI,
link |
that it ignores computation completely.
link |
And some people believe that intelligence
link |
is inherently tied to bounded resources.
link |
What do you think on this one point?
link |
Do you think it's,
link |
do you think the bounded resources
link |
are fundamental to intelligence?
link |
I would say that an intelligence notion,
link |
which ignores computational limits is extremely useful.
link |
A good intelligence notion,
link |
which includes these resources would be even more useful,
link |
but we don't have that yet.
link |
And so look at other fields outside of computer science,
link |
computational aspects never play a fundamental role.
link |
You develop biological models for cells,
link |
something in physics, these theories,
link |
I mean, become more and more crazy
link |
and harder and harder to compute.
link |
Well, in the end, of course,
link |
we need to do something with this model,
link |
but this is more a nuisance than a feature.
link |
And I'm sometimes wondering if artificial intelligence
link |
would not sit in a computer science department,
link |
but in a philosophy department,
link |
then this computational focus
link |
would be probably significantly less.
link |
I mean, think about the induction problem
link |
is more in the philosophy department.
link |
There's virtually no paper that cares about
link |
how long it takes to compute the answer.
link |
That is completely secondary.
link |
Of course, once we have figured out the first problem,
link |
so intelligence without computational resources,
link |
then the next and very good question is,
link |
could we improve it by including computational resources,
link |
but nobody was able to do that so far
link |
in an even halfway satisfactory manner.
link |
I like that, that in the long run,
link |
the right department to belong to is philosophy.
link |
That's actually quite a deep idea,
link |
or even to at least to think about
link |
big picture philosophical questions,
link |
big picture questions,
link |
even in the computer science department.
link |
But you've mentioned approximation.
link |
Sort of, there's a lot of infinity,
link |
a lot of huge resources needed.
link |
Are there approximations to AIXI
link |
within the AIXI framework that are useful?
link |
Yeah, we have developed a couple of approximations.
link |
And what we do there is that
link |
the Solomonoff induction part,
link |
which was find the shortest program describing your data,
link |
we just replace it by standard data compressors.
link |
And the better compressors get,
link |
the better this part will become.
link |
We focus on a particular compressor
link |
called context tree weighting,
link |
which is pretty amazing, not so well known.
link |
It has beautiful theoretical properties,
link |
also works reasonably well in practice.
link |
So we use that for the approximation of the induction
link |
and the learning and the prediction part.
link |
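Context tree weighting mixes many simple estimators over tree contexts; the workhorse at each node is the Krichevsky-Trofimov (KT) estimator, which on its own is easy to sketch (my illustration, not the full CTW algorithm):

```python
# Krichevsky-Trofimov estimator: the sequential binary predictor used at
# every node of context tree weighting. P(next=1) = (ones + 1/2) / (n + 1).

def kt_prob_one(zeros, ones):
    return (ones + 0.5) / (zeros + ones + 1.0)

def kt_sequence_prob(bits):
    """Probability KT assigns to an entire bit sequence."""
    zeros = ones = 0
    prob = 1.0
    for b in bits:
        p1 = kt_prob_one(zeros, ones)
        prob *= p1 if b else (1 - p1)
        if b:
            ones += 1
        else:
            zeros += 1
    return prob

# A predictable, heavily biased sequence gets far more probability mass
# (i.e. compresses better) than an alternating one of the same length.
biased = [1] * 20
alternating = [0, 1] * 10
assert kt_sequence_prob(biased) > kt_sequence_prob(alternating)
```

CTW then Bayes-mixes KT estimators over all context trees up to a depth bound, which is the computable stand-in for Solomonoff's mixture over all programs.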
And for the planning part,
link |
we essentially just took the ideas from computer Go.
link |
It was Csaba Szepesvári, also now at DeepMind,
link |
who developed the so called UCT algorithm,
link |
upper confidence bound for trees algorithm
link |
on top of the Monte Carlo tree search.
link |
So we approximate this planning part by sampling.
link |
And it's successful on some small toy problems.
link |
We don't want to lose the generality, right?
link |
And that's sort of the handicap, right?
link |
If you want to be general, you have to give up something.
link |
So, but this single agent was able to play small games
link |
like Kuhn poker and Tic Tac Toe and even Pacman
link |
in the same architecture, no change.
link |
The agent doesn't know the rules of the game,
link |
really nothing at all, and learns by itself
link |
by playing with these environments.
link |
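The UCT selection rule mentioned above can be sketched in a few lines (my simplified version; the visit counts below are illustrative): each child of a search node is scored by its mean value plus an exploration bonus that shrinks with visits.

```python
import math

# UCB selection rule at the heart of UCT / Monte Carlo tree search:
# pick the child maximising mean value plus an exploration bonus.

def ucb_select(children, c=1.4):
    """children: list of dicts with 'visits' and 'total_value'."""
    parent_visits = sum(ch['visits'] for ch in children)

    def score(ch):
        if ch['visits'] == 0:
            return float('inf')          # try unvisited children first
        mean = ch['total_value'] / ch['visits']
        bonus = c * math.sqrt(math.log(parent_visits) / ch['visits'])
        return mean + bonus

    return max(range(len(children)), key=lambda i: score(children[i]))

# A rarely tried child with decent value can beat a heavily exploited one,
# because its exploration bonus is still large.
children = [
    {'visits': 100, 'total_value': 60.0},  # mean 0.6, small bonus
    {'visits': 4,   'total_value': 2.0},   # mean 0.5, large bonus
]
assert ucb_select(children) == 1
```

In the MC-AIXI-CTW approximation, this sampling-based tree search stands in for AIXI's intractable exact expectimax over futures.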
So Jürgen Schmidhuber proposed something called
link |
Gödel machines, which is a self improving program
link |
that rewrites its own code.
link |
Sort of mathematically, philosophically,
link |
what's the relationship in your eyes,
link |
if you're familiar with it,
link |
between AIXI and the Gödel machines?
link |
Yeah, familiar with it.
link |
He developed it while I was in his lab.
link |
Yeah, so the Gödel machine, to explain it briefly,
link |
you give it a task.
link |
It could be a simple task as, you know,
link |
finding prime factors in numbers, right?
link |
You can formally write it down.
link |
There's a very slow algorithm to do that.
link |
Just try all the factors, yeah.
link |
Or play chess, right?
link |
Optimally, you write the algorithm to minimax
link |
to the end of the game.
link |
So you write down what the Gödel machine should do.
link |
Then it will take part of its resources to run this program
link |
and other part of its resources to improve this program.
link |
And when it finds an improved version,
link |
which provably computes the same answer.
link |
So that's the key part, yeah.
link |
It needs to prove by itself that this change of program
link |
still satisfies the original specification.
link |
And if it does so, then it replaces the original program
link |
by the improved program.
link |
And by definition, it does the same job,
link |
but just faster, okay?
link |
And then, you know, it proves over it and over it.
link |
And it's developed in a way that all parts
link |
of this Gödel machine can self improve,
link |
but it stays provably consistent
link |
with the original specification.
link |
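The prove-then-replace loop can be caricatured as follows (very much a toy of my own: a real Gödel machine requires a formal proof of equivalence, whereas this sketch exhaustively checks a tiny finite domain):

```python
# Toy Godel-machine flavour: keep a current program, and swap in a
# candidate only after verifying it agrees with the specification on the
# whole (finite) domain. Exhaustive checking stands in for a formal proof.

def slow_is_prime(n):
    # trial division by every smaller number: provably correct, slow
    return n > 1 and all(n % d for d in range(2, n))

def fast_is_prime(n):
    # candidate improvement: trial division only up to sqrt(n)
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

DOMAIN = range(0, 200)

def verified_equivalent(f, g):
    return all(f(n) == g(n) for n in DOMAIN)

current = slow_is_prime
for candidate in [fast_is_prime]:
    if verified_equivalent(current, candidate):
        current = candidate          # safe self-replacement

assert current is fast_is_prime
assert current(97) and not current(91)
```

The essential invariant is that replacement only happens after verification, so the machine's behavior on the specification never degrades, only its speed improves.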
So from this perspective, it has nothing to do with AIXI.
link |
But if you would now put AIXI in as the starting axioms,
link |
it would run AIXI, but you know, that takes forever.
link |
But then, if it finds a provable speedup of AIXI,
link |
it would replace it by this, and this, and this.
link |
And maybe eventually it comes up with a model
link |
which is still the AIXI model.
link |
It cannot be, I mean, just for the knowledgeable reader,
link |
AIXI is incomputable, and one can prove that therefore
link |
there cannot be a computable exact algorithm.
link |
There needs to be some approximations
link |
and this is not dealt with by the Gödel machine.
link |
So you have to do something about it.
link |
But there's the AIXItl model, which is finitely computable,
link |
which we could put in.
link |
Which part of AIXI is noncomputable?
link |
The Solomonoff induction part.
link |
The induction, okay, so.
link |
But there are ways of getting computable approximations
link |
of the AIXI model, so then it's at least computable.
link |
It is still way beyond any resources anybody will ever have,
link |
but then the Gödel machine could sort of improve it
link |
further and further in an exact way.
link |
So is it theoretically possible
link |
that the Gödel machine process could improve it?
link |
Isn't AIXI already optimal?
link |
It is optimal in terms of the reward collected
link |
over its interaction cycles,
link |
but it takes infinite time to produce one action.
link |
And the world continues whether you want it or not.
link |
So the model is assuming you had an oracle,
link |
which solved this problem,
link |
and then in the next 100 milliseconds
link |
or the reaction time you need gives the answer,
link |
then AIXI is optimal.
link |
It's optimal also in the sense of learning efficiency
link |
and data efficiency, but not in terms of computation time.
link |
And then the Gödel machine in theory,
link |
but probably not provably could make it go faster.
link |
Okay, interesting.
link |
Those two components are super interesting.
link |
The sort of the perfect intelligence combined
link |
with self improvement,
link |
sort of provable self improvement
link |
since you're always getting the correct answer
link |
and you're improving.
link |
Okay, so you've also mentioned that different kinds
link |
of things in the chase of solving this reward,
link |
sort of optimizing for the goal,
link |
interesting human things could emerge.
link |
So is there a place for consciousness within AIXI?
link |
Where does, maybe you can comment,
link |
because I suppose we humans are just another instantiation
link |
of AIXI agents and we seem to have consciousness.
link |
You say humans are an instantiation of an AIXI agent?
link |
Well, that would be amazing,
link |
but I think that's not true even for the smartest
link |
and most rational humans.
link |
I think maybe we are very crude approximations.
link |
I mean, I tend to believe, again, I'm Russian,
link |
so I tend to believe our flaws are part of the optimal.
link |
So we tend to laugh off and criticize our flaws
link |
and I tend to think that that's actually close
link |
to an optimal behavior.
link |
Well, some flaws, if you think more carefully about it,
link |
are actually not flaws, yeah,
link |
but I think there are still enough flaws.
link |
As a student of history,
link |
I think all the suffering that we've endured
link |
as a civilization,
link |
it's possible that that's the optimal amount of suffering
link |
we need to endure to minimize longterm suffering.
link |
That's your Russian background, I think.
link |
That's the Russian.
link |
Whether or not humans are instantiations of an AIXI agent,
link |
do you think there's a consciousness
link |
of something that could emerge
link |
in a computational form or framework like AIXI?
link |
Let me also ask you a question.
link |
Do you think I'm conscious?
link |
Yeah, that's a good question.
link |
That tie is confusing me, but I think so.
link |
You think that makes me unconscious
link |
because it strangles me or?
link |
If an agent were to solve the imitation game,
link |
I think it would be dressed similarly to you.
link |
That because there's a kind of flamboyant,
link |
interesting, complex behavior pattern
link |
that signals that you're human and you're conscious.
link |
But why do you ask?
link |
Was it a yes or was it a no?
link |
Yes, I think you're conscious, yes.
link |
So, and you explained sort of somehow why,
link |
but you infer that from my behavior, right?
link |
You can never be sure about that.
link |
And I think the same thing will happen
link |
with any intelligent agent we develop
link |
if it behaves in a way sufficiently close to humans
link |
or maybe even not humans.
link |
I mean, maybe a dog is also sometimes
link |
a little bit self conscious, right?
link |
So if it behaves in a way
link |
where we attribute typically consciousness,
link |
we would attribute consciousness
link |
to these intelligent systems.
link |
And probably to AIXI agents in particular.
link |
That of course doesn't answer the question
link |
whether it's really conscious.
link |
And that's the big hard problem of consciousness.
link |
Maybe I'm a zombie.
link |
I mean, not the movie zombie, but the philosophical zombie.
link |
Is to you the display of consciousness
link |
close enough to consciousness
link |
from a perspective of AGI
link |
that the distinction of the hard problem of consciousness
link |
is not an interesting one?
link |
I think we don't have to worry
link |
about the consciousness problem,
link |
especially the hard problem for developing AGI.
link |
I think, you know, we progress.
link |
At some point we have solved all the technical problems
link |
and this system will behave intelligent
link |
and then super intelligent.
link |
And this consciousness will emerge.
link |
I mean, definitely it will display behavior
link |
which we will interpret as conscious.
link |
And then it's a philosophical question.
link |
Did this consciousness really emerge
link |
or is it a zombie which just, you know, fakes everything?
link |
We still don't have to figure that out.
link |
Although it may be interesting,
link |
at least from a philosophical point of view,
link |
it's very interesting,
link |
but it may also be sort of practically interesting.
link |
You know, there's some people saying,
link |
if it's just faking consciousness and feelings,
link |
you know, then we don't need to be concerned about it.
link |
But if it's real conscious and has feelings,
link |
then we need to be concerned, yeah.
link |
I can't wait till the day
link |
where AI systems exhibit consciousness
link |
because it'll truly be some of the hardest ethical questions
link |
of what we do with that.
link |
It is rather easy to build systems
link |
to which people ascribe consciousness.
link |
And I give you an analogy.
link |
I mean, remember, maybe it was before you were born,
link |
How dare you, sir?
link |
Well, that's the thing, you're young, right?
link |
Thank you, thank you very much.
link |
But I was also in the Soviet Union.
link |
We didn't have any of those fun things.
link |
But you have heard about this Tamagotchi,
link |
which was, you know, really, really primitive,
link |
actually, for the time it was,
link |
and, you know, you could raise, you know, this,
link |
and kids got so attached to it
link |
and, you know, didn't want to let it die
link |
and probably, if we would have asked, you know,
link |
the children, do you think this Tamagotchi is conscious?
link |
They would have said yes.
link |
Half of them would have said yes, I would guess.
link |
I think that's kind of a beautiful thing, actually,
link |
because that consciousness, ascribing consciousness,
link |
seems to create a deeper connection.
link |
Which is a powerful thing.
link |
But we'll have to be careful on the ethics side of that.
link |
Well, let me ask about the AGI community broadly.
link |
You kind of represent some of the most serious work on AGI,
link |
at least in earlier years,
link |
and DeepMind represents serious work on AGI these days.
link |
But why, in your sense, is the AGI community so small
link |
or has been so small until maybe DeepMind came along?
link |
Like, why aren't more people seriously working
link |
on human level and superhuman level intelligence
link |
from a formal perspective?
link |
Okay, from a formal perspective,
link |
that's sort of an extra point.
link |
So I think there are a couple of reasons.
link |
I mean, AI came in waves, right?
link |
You know, AI winters and AI summers,
link |
and then there were big promises which were not fulfilled,
link |
and people got disappointed.
link |
And that narrow AI solving particular problems,
link |
which seemed to require intelligence,
link |
was always to some extent successful,
link |
and there were improvements, small steps.
link |
And if you build something which is useful for society
link |
or industrially useful, then there's a lot of funding.
link |
So I guess it was in parts the money,
link |
which drives people to develop a specific system
link |
solving specific tasks.
link |
But you would think that, at least in university,
link |
you should be able to do ivory tower research.
link |
And that was probably better a long time ago,
link |
but even nowadays, there's quite some pressure
link |
of doing applied research or translational research,
link |
and it's harder to get grants as a theorist.
link |
So that also drives people away.
link |
It's maybe also harder
link |
attacking the general intelligence problem.
link |
So I think enough people, I mean, maybe a small number
link |
were still interested in formalizing intelligence
link |
and thinking of general intelligence,
link |
but not much came up, right?
link |
Well, not much great stuff came up.
link |
So what do you think,
link |
we talked about the formalism as the big light
link |
at the end of the tunnel,
link |
but from the engineering perspective,
link |
what do you think it takes to build an AGI system?
link |
Is that, and I don't know if that's a stupid question
link |
or a distinct question
link |
from everything we've been talking about with AIXI,
link |
but what do you see as the steps that are necessary to take
link |
to start to try to build something?
link |
So you want a blueprint now,
link |
and then you go off and do it?
link |
That's the whole point of this conversation,
link |
trying to squeeze that in there.
link |
Now, is there, I mean, what's your intuition?
link |
Is it in the robotics space
link |
or something that has a body and tries to explore the world?
link |
Is it in the reinforcement learning space,
link |
like the efforts with AlphaZero and AlphaStar
link |
that are kind of exploring how you can solve it through
link |
in the simulation in the gaming world?
link |
Is there stuff in sort of all the transformer work
link |
and natural language processing,
link |
sort of maybe attacking the open domain dialogue?
link |
Like, where do you see promising pathways?
link |
Let me pick the embodiment maybe.
link |
So embodiment is important, yes and no.
link |
I don't believe that we need a physical robot
link |
walking or rolling around, interacting with the real world
link |
in order to achieve AGI.
link |
And I think it's more of a distraction probably
link |
than helpful, it's sort of confusing the body with the mind.
link |
For industrial applications or near term applications,
link |
of course we need robots for all kinds of things,
link |
but for solving the big problem, at least at this stage,
link |
I think it's not necessary.
link |
But the answer is also yes,
link |
that I think the most promising approach
link |
is that you have an agent
link |
and that can be a virtual agent in a computer
link |
interacting with an environment,
link |
possibly a 3D simulated environment
link |
like in many computer games.
link |
And you train and learn the agent,
link |
even if you don't intend to later put
link |
this algorithm in a robot brain
link |
and leave it forever in the virtual reality.
link |
Getting experience in an,
link |
although just simulated, 3D world
link |
is possibly, and I say possibly,
link |
important to understand things
link |
on a similar level as humans do,
link |
especially, or primarily, if the agent
link |
needs to interact with humans.
link |
If you talk about objects on top of each other in space
link |
and flying and cars and so on,
link |
and the agent has no experience
link |
with even virtual 3D worlds,
link |
it's probably hard to grasp.
link |
So if you develop an abstract agent,
link |
say we take the mathematical path
link |
and we just want to build an agent
link |
which can prove theorems
link |
and becomes a better and better mathematician,
link |
then this agent needs to be able to reason
link |
in very abstract spaces
link |
and then maybe sort of putting it into 3D environments,
link |
simulated or not, is even harmful.
link |
It should sort of, you put it in, I don't know,
link |
an environment which it creates itself or so.
link |
It seems like you have an interesting, rich,
link |
complex trajectory through life
link |
in terms of your journey of ideas.
link |
So it's interesting to ask what books,
link |
technical, fiction, philosophical,
link |
ideas, or people had a transformative effect on you.
link |
Books are most interesting
link |
because maybe people could also read those books
link |
and see if they could be inspired as well.
link |
Yeah, luckily you asked about books and not a singular book.
link |
It's very hard to pin down one book,
link |
but I can try to do that at the end.
link |
Let me start with the books which were most transformative for me
link |
or which I can most highly recommend
link |
to people interested in AI.
link |
Yeah, yeah, both, both, yeah, yeah.
link |
I would always start with Russell and Norvig,
link |
Artificial Intelligence, A Modern Approach.
link |
That's the AI Bible.
link |
It's an amazing book.
link |
It covers all approaches to AI.
link |
And even if you focused on one approach,
link |
I think that is the minimum you should know
link |
about the other approaches out there.
link |
So that should be your first book.
link |
Fourth edition should be coming out soon.
link |
Oh, okay, interesting.
link |
There's a deep learning chapter now,
link |
written by Ian Goodfellow, okay.
link |
And then the next book I would recommend,
link |
The Reinforcement Learning book by Sutton and Barto.
link |
That's a beautiful book.
link |
If there's any problem with the book,
link |
it makes RL feel and look much easier than it actually is.
link |
It's a very gentle book.
link |
It's very nice to read, with exercises to do.
link |
You can very quickly get some RL systems to run.
link |
You know, very toy problems, but it's a lot of fun.
link |
And in a couple of days you feel you know what RL is about,
link |
but it's much harder than the book.
link |
Oh, come on now, it's an awesome book.
link |
Yeah, it is, yeah.
link |
And maybe, I mean, there's so many books out there.
link |
If you like the information theoretic approach,
link |
then there's Kolmogorov Complexity by Li and Vitányi,
link |
but probably, you know, some short article is enough.
link |
You don't need to read a whole book,
link |
but it's a great book.
link |
And if you have to mention one all time favorite book,
link |
it's of different flavor, that's a book
link |
which is used in the International Baccalaureate
link |
for high school students in several countries.
link |
That's from Nicholas Alchin, Theory of Knowledge,
link |
second edition or first, not the third, please.
link |
The third one, they took out all the fun.
link |
So this asks all the interesting,
link |
or to me, interesting philosophical questions
link |
about how we acquire knowledge from all perspectives,
link |
from math, from art, from physics,
link |
and ask how can we know anything?
link |
And the book is called Theory of Knowledge.
link |
So is this almost like a philosophical exploration
link |
of how we get knowledge about anything?
link |
Yes, yeah, I mean, can religion tell us, you know,
link |
something about the world?
link |
Can science tell us something about the world?
link |
Can mathematics, or is it just playing with symbols?
link |
And, you know, it's open ended questions.
link |
And, I mean, it's for high school students,
link |
so they have then resources from Hitchhiker's Guide
link |
to the Galaxy and from Star Wars
link |
and The Chicken Crossed the Road, yeah.
link |
And it's fun to read, but it's also quite deep.
link |
If you could live one day of your life over again,
link |
maybe because it made you truly happy,
link |
Or maybe like we said with the books,
link |
it was truly transformative.
link |
What day, what moment would you choose?
link |
Does something pop into your mind?
link |
Does it need to be a day in the past,
link |
or can it be a day in the future?
link |
Well, space time is an emergent phenomenon,
link |
so it's all the same anyway.
link |
Okay, from the past.
link |
You're really going to say from the future? I love it.
link |
No, I will tell you from the future, okay.
link |
So from the past, I would say
link |
when I discovered the AIXI model.
link |
I mean, it was not in one day,
link |
but it was one moment where I realized
link |
the idea of Kolmogorov complexity. I didn't even know that it existed;
link |
I discovered sort of this compression idea
link |
myself, but immediately I knew I can't be the first one
link |
to have had this idea.
link |
And then I knew about sequential decision theory,
link |
and I knew if I put it together, this is the right thing.
link |
And yeah, still when I think back about this moment,
link |
I'm super excited about it.
link |
Were there any more details and context to that moment?
link |
Did an apple fall on your head?
link |
So it was like, if you look at Ian Goodfellow
link |
talking about GANs, there was beer involved.
link |
Is there some more context of what sparked your thought?
link |
No, it was much more mundane.
link |
So I worked in this company.
link |
So in this sense, the four and a half years
link |
was not completely wasted.
link |
And I worked on an image interpolation problem,
link |
and I developed some quite neat new interpolation techniques,
link |
and they got patented, which happens quite often.
link |
I went sort of overboard and thought about,
link |
yeah, that's pretty good, but it's not the best.
link |
So what is the best possible way of doing interpolation?
link |
And then I thought, yeah, you want the simplest picture
link |
which, if you coarse grain it,
link |
recovers your original picture.
link |
And then I thought about the simplicity concept
link |
more in quantitative terms,
link |
and then everything developed.
link |
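[Editor's note: the "simplest picture that coarse-grains back to the original" idea Hutter describes can be sketched as a minimal-description-length search. This is a toy illustration, not his actual patented technique: the 1D signal, the brute-force search, and the use of zlib output length as a computable stand-in for Kolmogorov complexity are all assumptions for the sake of the example.]

```python
import itertools
import zlib

def coarse_grain(img):
    # Average adjacent pairs to recover the low-resolution signal.
    return [(img[i] + img[i + 1]) // 2 for i in range(0, len(img), 2)]

def complexity(img):
    # Compressed length as a rough, computable proxy for
    # Kolmogorov complexity (which itself is incomputable).
    return len(zlib.compress(bytes(img)))

def simplest_upscale(low, values=(0, 128, 255)):
    # Among candidate high-res signals that coarse-grain back to the
    # original, pick the one with the smallest description length.
    best, best_cost = None, None
    for cand in itertools.product(values, repeat=2 * len(low)):
        cand = list(cand)
        if coarse_grain(cand) != low:
            continue  # must recover the original picture exactly
        cost = complexity(cand)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best

print(simplest_upscale([128, 128]))  # the simplest consistent upscaling
```

The brute-force search only works for toy sizes, of course; the point is the selection criterion, which connects interpolation to the compression view of intelligence discussed in the introduction.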
And somehow that beautiful mix
link |
of also being a physicist
link |
and thinking about the big picture of it
link |
then led you to probably think big with AIXI.
link |
So as a physicist, I was probably trained
link |
not to always think in computational terms,
link |
just ignore that and think about
link |
the fundamental properties, which you want to have.
link |
So what about if you could relive one day in the future?
link |
What would that be?
link |
When I solve the AGI problem.
link |
In practice. So in theory,
link |
I have solved it with the AIXI model, but in practice.
link |
And then I ask the first question.
link |
What would be the first question?
link |
What's the meaning of life?
link |
I don't think there's a better way to end it.
link |
Thank you so much for talking today.
link |
It's a huge honor to finally meet you.
link |
Yeah, thank you too.
link |
The pleasure was mine too.
link |
And now let me leave you with some words of wisdom
link |
from Albert Einstein.
link |
The measure of intelligence is the ability to change.
link |
Thank you for listening and hope to see you next time.