## Marcus Hutter: Universal Artificial Intelligence, AIXI, and AGI | Lex Fridman Podcast #75

The following is a conversation with Marcus Hutter, senior research scientist at Google DeepMind. Throughout his career of research, including with Jürgen Schmidhuber and Shane Legg, he has proposed a lot of interesting ideas in and around the field of artificial general intelligence, including the development of the AIXI model, spelled A-I-X-I, which is a mathematical approach to AGI that incorporates ideas of Kolmogorov complexity, Solomonoff induction, and reinforcement learning.

In 2006, Marcus launched the 50,000 euro Hutter Prize for lossless compression of human knowledge. The idea behind this prize is that the ability to compress well is closely related to intelligence. This, to me, is a profound idea. Specifically, if you can compress the first 100 megabytes or 1 gigabyte of Wikipedia better than your predecessors, your compressor likely has to also be smarter. The intention of this prize is to encourage the development of intelligent compressors as a path to AGI.
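As a rough illustration of the benchmark's mechanics (not the actual contest tooling, which measures the size of a self-extracting archive of a Wikipedia dump), here is how one might compare two off-the-shelf standard-library compressors on the same text:

```python
import lzma
import zlib

# Toy stand-in for a Wikipedia excerpt; the real benchmark uses enwik9.
data = ("Compression is closely related to intelligence. " * 200).encode()

# Two off-the-shelf compressors standing in for competing prize entries.
sizes = {
    "raw": len(data),
    "zlib": len(zlib.compress(data, 9)),
    "lzma": len(lzma.compress(data)),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

A better compressor produces a smaller archive for the same input; in the prize, the improvement over the previous winner is what gets rewarded.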

In conjunction with this episode's release just a few days ago, Marcus announced a 10x increase in several aspects of this prize, including the money, to 500,000 euros. The better your compressor works relative to the previous winners, the higher the fraction of that prize money that is awarded to you. You can learn more about it if you simply Google "Hutter Prize." I'm a big fan of benchmarks for developing AI systems, and the Hutter Prize may indeed be one that will spark some good ideas for approaches that will make progress on the path of developing AGI systems.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEX PODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. Since Cash App allows you to send and receive money digitally, peer to peer, security in all digital transactions is very important. Let me mention the PCI Data Security Standard that Cash App is compliant with. I'm a big fan of standards for safety and security. PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions. Now we just need to do the same for autonomous vehicles and AI systems in general. So again, if you get Cash App from the App Store or Google Play and use the code LEX PODCAST, you'll get $10, and Cash App will also donate $10 to FIRST, one of my favorite organizations, which is helping to advance robotics and STEM education for young people around the world.

And now, here's my conversation with Marcus Hutter.

Do you think of the universe as a computer, or maybe an information processing system? Let's go with a big question first.

Okay, with a big question first. I think it's a very interesting hypothesis or idea. And I have a background in physics, so I know a little bit about physical theories, the standard model of particle physics and general relativity theory. And they are amazing, and they're all, in a sense, computable theories. I mean, they're very hard to compute, but they're very elegant, simple theories which describe virtually everything in the universe. So there's a strong indication that somehow the universe is computable. But it's a plausible hypothesis.

So, just like you said, general relativity, quantum field theory: why do you think the laws of physics are so nice and beautiful and simple and compressible? Do you think our universe was designed, or is naturally this way? Are we just focusing on the parts that are especially compressible? Do human minds just enjoy something about that simplicity, while in fact there are other things that are not so compressible?

I strongly believe, and I'm pretty convinced, that the universe is inherently beautiful, elegant, and simple, and described by these equations. And we're not just picking that. I mean, if there were some phenomena which could not be neatly described, scientists would try to describe those, too. And there's biology, which is more messy, but we understand that it's an emergent phenomenon of complex systems which still follow the same rules of quantum electrodynamics. All of chemistry follows that, and we know that. I mean, we cannot compute everything because we have limited computational resources. No, I think it's not a bias of the humans; it's objectively simple. I mean, of course, you never know: maybe there are some corners very far out in the universe, or super, super tiny below the nucleus of atoms, or parallel universes, which are not nice and simple, but there's no evidence for that. And we should apply Occam's razor and choose the simplest theory consistent with it. But also, that's a little bit self-referential.

So maybe a quick pause: what is Occam's razor?

Occam's razor says that you should not multiply entities beyond necessity, which, if you translate it to proper English in the scientific context, means that if you have two theories, hypotheses, or models which equally well describe the phenomenon you study, or the data, you should choose the simpler one.

So that's just a principle, not a provable law, perhaps. Perhaps we'll discuss it and think about it, but what's the intuition for why the simpler answer is the one that is likely to be the more correct descriptor of whatever we're talking about?

I believe that Occam's razor is probably the most important principle in science. I mean, of course we need logical deduction and we do experimental design, but science is about understanding the world, finding models of the world. And we can come up with crazy, complex models which explain everything but predict nothing. But the simple models seem to have predictive power, and it's a valid question why. And there are two answers to that. You can just accept it: that is the principle of science, we use this principle, and it seems to be successful. We don't know why, but it just happens to be. Or you can try to find another principle which explains Occam's razor. And if we start with the assumption that the world is governed by simple rules, then there's a bias towards simplicity, and applying Occam's razor is the mechanism for finding these rules. And actually, in a more quantitative sense, and we come back to that later in terms of Solomonoff induction, you can rigorously prove that: if you assume that the world is simple, then Occam's razor is the best you can do, in a certain sense.

So I apologize for the romanticized question, but why do you think, outside of its effectiveness, we find simplicity so appealing as human beings? Why does E equals mc squared seem so beautiful to us humans?

I guess, in general, many things can be explained by an evolutionary argument. And there are some artifacts in humans which are just artifacts and not evolutionarily necessary. But with this beauty and simplicity, I believe at least the core is about, like science, finding regularities in the world, understanding the world, which is necessary for survival. If I look at a bush and I just see noise, and there is a tiger and it eats me, then I'm dead. But it helps my survival to try to find a pattern. And we know that humans are prone to find more patterns in data than there actually are, like the Mars face and all these things. But this bias towards finding patterns, even if there are none (it's best, of course, if they are real), helps us survive.

Yeah, that's fascinating. I haven't really thought about that; I thought I just loved science. But indeed, in terms of pure survival purposes, there is an evolutionary argument for why we find the work of Einstein so beautiful.

Maybe a quick small tangent: could you describe what Solomonoff induction is?

Yeah. So that's a theory which I claim, and Ray Solomonoff claimed a long time ago, solves the big philosophical problem of induction. And I believe the claim is essentially true. And what it does is the following. Okay, for the picky listener: induction can be interpreted narrowly and widely. Narrowly means inferring models from data. Widely means also then using these models for making predictions, so prediction is also part of induction. So I'm a little bit sloppy with the terminology, and maybe that comes from Ray Solomonoff being sloppy. Maybe I shouldn't say that. He can't complain anymore. So let me explain this theory in simple terms.

Assume you have a data sequence. Make it very simple, the simplest one, say 1, 1, 1, 1, 1, and you've seen 100 ones. What do you think comes next? The natural answer (I'm going to speed up a little bit) is, of course, one. And the question is why. Well, we see a pattern there: there's a one, and we repeat it. And why should it suddenly, after 100 ones, be different? So what we're looking for are simple explanations, or models, for the data we have. And now the question is: a model has to be presented in a certain language. Which language do we use? In science, we want formal languages. We can use mathematics, or we can use programs on a computer, so abstractly on a Turing machine, for instance, or on a general-purpose computer. And there are, of course, lots of models. You can say maybe it's 100 ones and then 100 zeros and then 100 ones; that's a model, right? But there are simpler models. There's the model "print one in a loop," and it also explains the data. And if you push that to the extreme, you are looking for the shortest program which, if you run it, reproduces the data you have. It will not stop; it will continue naturally. And this you take for your prediction. And on the sequence of ones, it's very plausible that "print one in a loop" is the shortest program. We can give more complex examples, like 1, 2, 3, 4, 5: the shortest program is, again, a counter. And that is, roughly speaking, how Solomonoff induction works.

The extra twist is that it can also deal with noisy data. So if you have, for instance, a coin flip, say a biased coin which comes up heads with 60% probability, then it will learn and figure this out, and after a while it predicts, oh, the next coin flip will be heads with probability 60%. So it's the stochastic version of that.

But the goal, the dream, is always the search for the short program.

Well, in Solomonoff induction, precisely what you do is combine the two. Looking for the shortest program is like applying Occam's razor, looking for the simplest theory. There's also Epicurus' principle, which says: if you have multiple hypotheses which equally well describe your data, don't discard any of them; keep all of them around. And you can put that together and say, okay, I have a bias towards simplicity, but I don't rule out the larger models. And technically what we do is weigh the shorter models higher and the longer models lower. You use Bayesian techniques: you have a prior, which is precisely two to the minus the complexity of the program, and you weigh all these hypotheses, take this mixture, and then you also get the stochasticity in.
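That weighting scheme can be sketched in a few lines, assuming a tiny hand-picked hypothesis class of biased coins in place of the class of all programs. Real Solomonoff induction mixes over every computable model and is incomputable; the `desc_len` values below are made-up stand-ins for program lengths:

```python
from fractions import Fraction

# Toy hypothesis class standing in for "all programs": biased-coin models.
# `desc_len` is a made-up description length in bits; real Solomonoff
# induction would use the length of the shortest program for each model.
hypotheses = [
    {"p_head": Fraction(1, 2), "desc_len": 1},
    {"p_head": Fraction(6, 10), "desc_len": 4},
    {"p_head": Fraction(9, 10), "desc_len": 4},
]

# Prior: 2^(-description length), the bias towards simplicity.
weights = [Fraction(1, 2 ** h["desc_len"]) for h in hypotheses]

data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # 1 = heads; empirically 80% heads

# Bayesian update: multiply each weight by the likelihood of each symbol.
for x in data:
    for i, h in enumerate(hypotheses):
        weights[i] *= h["p_head"] if x == 1 else 1 - h["p_head"]

# Mixture prediction for the next symbol being heads.
total = sum(weights)
p_next_head = sum(w * h["p_head"] for w, h in zip(weights, hypotheses)) / total
print(float(p_next_head))
```

The prediction lands between the simple fair-coin model and the more complex heavily biased ones: no hypothesis is discarded (Epicurus), but simpler ones start with exponentially more weight (Occam).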

Yeah, like many of your ideas, that's just a beautiful idea: weighing based on the simplicity of the program. I love that. That seems to me maybe a very human-centric concept, but it seems to be a very appealing way of discovering good programs in this world. You've used the term compression quite a bit. I think it's a beautiful idea. We just talked about simplicity, and maybe science, or just all of our intellectual pursuits, is basically the attempt to compress the complexity all around us into something simple. So what does this word mean to you, compression?

Essentially, I have already explained it. Compression means, for me, finding short programs for the data or the phenomenon at hand. You could interpret it more widely: finding simple theories, which can be mathematical theories, or maybe even informal ones, just in words. Compression means finding short descriptions, explanations, programs for the data.

Do you see science as a kind of human attempt at compression? We're speaking more generally now, because when you say programs, you're kind of zooming in on a particular, almost computer-science, artificial-intelligence focus. But do you see all of human endeavor as a kind of compression?

Well, at least all of science I see as an endeavor of compression; not all of humanity, maybe. And there are also some other aspects of science, like experimental design. I mean, we create experiments specifically to get extra knowledge, and that isn't part of the compression process. But once we have the data, understanding the data is essentially compression. So I don't see any difference between compression, understanding, and prediction.

So we're jumping around topics a little bit, but returning to simplicity: the fascinating concept of Kolmogorov complexity. In your sense, do most objects in our mathematical universe have high Kolmogorov complexity? And, first of all, what is Kolmogorov complexity?

Okay, Kolmogorov complexity is a notion of simplicity, or complexity, and it takes the compression view to the extreme. I explained before that if you have some data sequence (just think about a file on a computer, which is just a string of bits), we have data compressors: we compress big files into zip files with certain compressors. And you can also produce self-extracting archives. That means an executable which, if you run it, reproduces your original file without needing an extra decompressor; it's a decompressor plus the archive together in one. Now, there are better and worse compressors, and you can ask: what is the ultimate compressor? What is the shortest possible self-extracting archive you could produce for a certain data set, which reproduces the data set? And the length of this is called the Kolmogorov complexity. And arguably that is the information content of the data set. I mean, if the data set is very redundant or very boring, you can compress it very well, so the information content should be low, and it is low according to this definition.

So it's the length of the shortest program that summarizes the data?
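Since that shortest program is incomputable, any real compressor only gives an upper bound on the Kolmogorov complexity; a minimal sketch of that idea, using zlib as the stand-in compressor (and ignoring the fixed size of the decompressor itself):

```python
import os
import zlib

def complexity_upper_bound(data: bytes) -> int:
    """Upper-bound a string's Kolmogorov complexity with a real compressor.

    K(data) itself is incomputable; any lossless compressor only gives an
    upper bound, here the compressed size in bytes.
    """
    return len(zlib.compress(data, 9))

boring = b"1" * 10_000        # highly redundant, like "print one in a loop"
random_ish = os.urandom(10_000)  # incompressible with overwhelming probability

print(complexity_upper_bound(boring))      # tiny: the data is very regular
print(complexity_upper_bound(random_ish))  # roughly the raw size: no pattern
```

The redundant string compresses to a few dozen bytes, while random bytes barely compress at all, which is exactly the "information content" reading of the definition.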

What's your sense of our universe, when we think about the different objects in it, concepts at every level: do they have high or low Kolmogorov complexity? What's the hope? Do we have much hope of being able to summarize much of our world?

That's a tricky and difficult question. As I said before, I believe that the whole universe, based on the evidence we have, is very simple. So it has a very short description.

Sorry, to linger on that: the whole universe, what does that mean? You mean at the very basic, fundamental level, in order to create the universe?

Yes. You need a very short program, and you run it...

To get the thing going.

To get the thing going, and then it will reproduce our universe. There's a problem with noise. We can come back to that later, possibly.

Is noise a problem, or is it a bug or a feature?

I would say it makes our life as scientists really, really much harder. I mean, think about it: without noise, we wouldn't need all of statistics.

But then maybe we wouldn't feel like there's a free will. Maybe we need that for the...

That's an illusion, that noise can give you free will.

At least in that way, it's a feature.

But also, if you don't have noise, you have chaotic phenomena, which are effectively like noise, so we cannot escape statistics even then. I mean, think about rolling a die. Forget about quantum mechanics, and assume you know exactly how you throw it. It's still so hard to compute the trajectory that effectively it is best to model it as coming up with each number with probability one over six.

But from this philosophical Kolmogorov complexity perspective, if we didn't have noise, then arguably you could describe the whole universe with the standard model plus general relativity. I mean, we don't have a theory of everything yet, but assuming we are close to it or have it, plus the initial conditions, which may hopefully be simple, then you just run it and you would reproduce the universe. But that's spoiled by noise, or by chaotic systems, or by initial conditions, which may be complex.

So now, if we don't take the whole universe but just a subset, just take planet Earth: planet Earth cannot be compressed into a couple of equations. It is a hugely complex system.

So while the whole thing might be simple, when you just take a small window on it, then...

It may become complex, and that may be counterintuitive, but there's a very nice analogy: the library of all books. Imagine you have a normal library with interesting books, and you go there: great, lots of information, quite complex. Now I create a library which contains all possible books. The first book just has "A A A A A" on all its pages; the next book is all A's and ends with a B; and so on. I create this library of all books, and I can write a super short program which creates it. So this library, which has all books, has zero information content. But take a subset of this library, and suddenly you have a lot of information in there.

That's fascinating. I think one of the most beautiful mathematical objects that at least today seems to be understudied, or under-talked-about, is cellular automata. What lessons do you draw from, say, the Game of Life for cellular automata, where you start with simple rules, just like you're describing with the universe, and somehow complexity emerges? Do you feel like you have an intuitive grasp on the fascinating behavior of such systems, where, like you said, some chaotic behavior could happen, some complexity could emerge, some could die out, and some form very rigid structures? Do you have a sense about cellular automata that somehow transfers to the bigger questions of our universe?

Yeah. Cellular automata, and especially Conway's Game of Life, are really great, because the rules are so simple. You can explain them to every child, and even by hand you can simulate them a little bit, and you see these beautiful patterns emerge. And people have proven that it's even Turing complete. You cannot just use a computer to simulate the Game of Life; you can also use the Game of Life to simulate any computer. That is truly amazing. And it's probably the prime example to demonstrate that very simple rules can lead to very rich phenomena.
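The rules he calls simple enough to explain to a child really do fit in a few lines; a minimal sketch of one update step (the standard rule: a dead cell with exactly 3 live neighbours is born, a live cell with 2 or 3 survives, everything else dies, here on a finite grid with dead borders):

```python
def life_step(grid):
    """One step of Conway's Game of Life on a list-of-lists grid of 0/1."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the up-to-8 live neighbours of cell (r, c).
            n = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            # Birth on 3 neighbours; survival on 2 or 3.
            nxt[r][c] = 1 if (n == 3 or (grid[r][c] == 1 and n == 2)) else 0
    return nxt

# A "blinker": a line of three live cells oscillates with period two.
blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(life_step(blinker))
```

Running the step twice returns the blinker to its starting state, the smallest example of the oscillating patterns he describes.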

And people sometimes ask, how can chemistry and biology be so rich? This can't be based on simple rules. But no, we know quantum electrodynamics describes all of chemistry. And we come back to that later: I claim intelligence can be explained, or described, in one single equation, despite being this very rich phenomenon. You asked also whether I understand this phenomenon, and the answer is probably not. There's this saying: you never really understand things, you just get used to them. And I think I've gotten pretty used to cellular automata, so you believe you understand why these phenomena happen.

But let me give you a different example. I didn't play too much with Conway's Game of Life, but a little bit more with fractals and with the Mandelbrot set and these beautiful patterns; just look up the Mandelbrot set. That was back when computers were really slow and I just had a black-and-white monitor, and I programmed my own programs, in assembler, too...

Wow, you're legit.

...to get these fractals on the screen, and I was mesmerized. Much later, I returned to this every couple of years and tried to understand what is going on, and you can understand it a little bit. So I tried to derive the locations of these circles and the apple shape, and of the smaller Mandelbrot sets recursively contained in the set. And there's a way, mathematically, by solving high-order polynomials, to figure out where these centers are and approximately what size they are. And by mathematically approaching this problem, you slowly get a feeling for why things are as they are, and that sort of is a first step to understanding why this is such a rich phenomenon.
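The images he programmed in assembler come from a very short iteration; a minimal escape-time sketch (the Mandelbrot set is the set of complex numbers c for which iterating z := z*z + c from zero stays bounded):

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Iterations until z -> z*z + c escapes |z| > 2; max_iter if it never does.

    Points that never escape are (approximately) in the Mandelbrot set;
    the escape count is what classic fractal programs map to colors.
    """
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2:
            return i
        z = z * z + c
    return max_iter

# Crude ASCII rendering over the region [-2, 1] x [-1, 1].
for row in range(21):
    y = 1 - row * 0.1
    line = ""
    for col in range(61):
        x = -2 + col * 0.05
        line += "#" if escape_time(complex(x, y)) == 100 else " "
    print(line)
```

Even this crude rendering shows the main cardioid (the "apple shape") and the circular bulbs whose centers he mentions deriving by solving polynomials.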

Do you think it's possible, what's your intuition, to reverse engineer and find the short program that generated these fractals, sort of by looking at the fractals?

Well, in principle, yes. In principle, what you can do is take any data set, these fractals or whatever data you have, say a picture of Conway's Game of Life, and run through all programs. You take programs of size one, two, three, four, and you run them all in parallel in so-called dovetailing fashion. You give them computational resources: the first one gets 50% of the resources, the second one half of the remaining resources, and so on. You let them run, wait until they halt and give an output, and compare it to your data. If some of these programs produce the correct data, you stop, and then you already have some program. It may be a long program which is fast; then you continue, and you get shorter and shorter programs until you eventually find the shortest program. The interesting thing is that you can never know whether it's the shortest program, because there could be an even shorter program which is just slower, and you would have to keep waiting. But asymptotically, and actually after finite time, you have the shortest program. So this is a theoretical, but completely impractical, way of finding the underlying structure in every data set, and that is what Solomonoff induction does, and Kolmogorov complexity.
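The dovetailing schedule he outlines can be sketched with toy stand-ins: here Python generators play the role of enumerated programs, and program i gets roughly a 2^(-i) share of the steps each phase. This is only an illustration of the scheduling idea, not a real program enumeration (which would be infinite, with new programs joining each phase):

```python
from itertools import count

def make_programs():
    """Toy enumeration of 'programs', shortest first; generators stand in
    for Turing machines. Each yields one output character per step."""
    def const_a():          # the "print one in a loop" analogue
        while True:
            yield "a"
    def looper():           # a program that computes forever, no output
        while True:
            yield ""
    def ab_repeater():
        while True:
            yield "a"
            yield "b"
    return [const_a, looper, ab_repeater]

def dovetail(target: str):
    """Run all programs interleaved, earlier (shorter) ones getting
    exponentially more steps; return the index of the first whose
    output starts with `target`."""
    progs = [(i, p(), []) for i, p in enumerate(make_programs())]
    for phase in count(1):
        for i, gen, out in progs:
            for _ in range(max(1, 2 ** (phase - i))):
                out.append(next(gen))
            if "".join(out)[: len(target)] == target:
                return i

print(dovetail("abab"))
```

Note that non-halting programs like `looper` never block the search, which is the whole point of dovetailing: no single program is allowed to monopolize the computation.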

In practice, of course, we have to approach the problem more intelligently. And if you take resource limitations into account, there is, for instance, the field of pseudo-random numbers. These are deterministic sequences, but no fast algorithm (fast meaning it runs in polynomial time) can detect that they're actually deterministic. So we can produce interesting... I mean, random numbers are maybe not that interesting, but just as an example: we can produce complex-looking data and then prove that no fast algorithm can detect the underlying pattern.

Which, unfortunately, is a big challenge for our search for simple programs in the space of artificial intelligence, perhaps.

Yes, it definitely is for artificial intelligence, and it's quite surprising that, well, I can't say it's easy. I mean, physicists worked really hard to find these theories. But apparently, it was possible for human minds to find these simple rules in the universe. It could have been different, right?

It could have been different. It's awe-inspiring.

So let me ask another absurdly big question: what is intelligence, in your view?

So I have, of course, a definition.

I wasn't sure what you were going to say, because you could have just as easily said, "I have no idea."

Which many people would say. But I'm not modest in this question. So the informal version, which I worked out together with Shane Legg, who co-founded DeepMind, is that intelligence measures an agent's ability to perform well in a wide range of environments. That doesn't sound very impressive, but these words have been very carefully chosen, and there is a mathematical theory behind that, and we come back to that later. And if you look at this definition by itself, it seems like, yeah, okay, a lot of things are missing. But if you think it through, you realize that most, and I claim all, of the other traits, at least of rational intelligence, which we usually associate with intelligence, are emergent phenomena from this definition: like creativity, memorization, planning, knowledge. You need all of that in order to perform well in a wide range of environments, so you don't have to explicitly mention it in the definition.

So consciousness, abstract reasoning, all these kinds of things are just emergent phenomena that help you towards... Can you say the definition again?

So: performing well in multiple environments.

Did you mention the word goals?

No, but we have an alternative definition. Instead of performing well, you can just replace it with goals. So intelligence measures an agent's ability to achieve goals in a wide range of environments. That's more or less equivalent.

But there's an injection of the word "goals" in there, so we want to specify that there should be a goal.

Yeah, but what does "perform well" mean? It's the same problem. There's a little bit of a gray area, but it's much closer to something that could be formalized.
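The formalization behind this definition is the Legg-Hutter universal intelligence measure: a simplicity-weighted average of the agent's value over all computable environments,

```latex
% Universal intelligence of a policy \pi (Legg & Hutter):
% V^{\pi}_{\mu} is the expected total reward of \pi in environment \mu,
% K(\mu) is the Kolmogorov complexity of \mu,
% E is the class of computable environments.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Simple environments get exponentially more weight, so Occam's razor is built directly into the definition of intelligence.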

link |

In your view, are humans,

link |

where do humans fit into that definition?

link |

Are they general intelligence systems

link |

that are able to perform in,

link |

like how good are they at fulfilling that definition

link |

at performing well in multiple environments?

link |

Yeah, that's a big question.

link |

I mean, the humans are performing best among all species.

link |

You could say that trees and plants are doing a better job.

link |

They'll probably outlast us.

link |

Yeah, but they are in a much more narrow environment, right?

link |

I mean, you just have a little bit of air pollutions

link |

and these trees die and we can adapt, right?

link |

We build houses, we build filters,

link |

we do geoengineering.

link |

So the multiple environment part.

link |

Yeah, that is very important, yeah.

link |

So that distinguish narrow intelligence

link |

from wide intelligence, also in the AI research.

link |

So let me ask the Allentourian question.

link |

Can machines think?

link |

Can machines be intelligent?

link |

So in your view, I have to kind of ask,

link |

the answer is probably yes,

link |

but I want to kind of hear what your thoughts on it.

link |

Can machines be made to fulfill this definition

link |

of intelligence, to achieve intelligence?

link |

Well, we are sort of getting there

link |

and on a small scale, we are already there.

link |

The wide range of environments are missing,

link |

but we have self driving cars,

link |

we have programs which play Go and chess,

link |

we have speech recognition.

link |

So that's pretty amazing,

link |

but these are narrow environments.

link |

But if you look at AlphaZero,

link |

that was also developed by DeepMind.

link |

I mean, got famous with AlphaGo

link |

and then came AlphaZero a year later.

link |

That was truly amazing.

link |

So reinforcement learning algorithm,

link |

which is able just by self play,

link |

to play chess and then also Go.

link |

And I mean, yes, they're both games,

link |

but they're quite different games.

link |

And you don't feed them the rules of the game.

link |

And the most remarkable thing,

link |

which is still a mystery to me,

link |

that usually for any decent chess program,

link |

I don't know much about Go,

link |

you need opening books and end game tables and so on too.

link |

And nothing in there, nothing was put in there.

link |

Especially with AlphaZero,

link |

the self playing mechanism starting from scratch,

link |

being able to learn actually new strategies is...

link |

Yeah, it rediscovered all these famous openings

link |

within four hours by itself.

link |

What I was really happy about,

link |

I'm a terrible chess player, but I like the Queen's Gambit.

link |

And AlphaZero figured out that this is the best opening.

link |

Finally, somebody proved you correct.

link |

So yes, to answer your question,

link |

yes, I believe that general intelligence is possible.

link |

And it also, I mean, it depends how you define it.

link |

Do you say AGI, artificial general intelligence,

link |

only refers to achieving human level?

link |

Or if it's subhuman level, but quite broad,

link |

is it also general intelligence?

link |

So we have to distinguish,

link |

or is it only superhuman intelligence

link |

that counts as artificial general intelligence?

link |

Is there a test in your mind,

link |

like the Turing test for natural language

link |

or some other test that would impress the heck out of you

link |

that would kind of cross the line of your sense

link |

of intelligence within the framework that you said?

link |

Well, the Turing test has been criticized a lot,

link |

but I think it's not as bad as some people think.

link |

And some people think it's too strong.

link |

So it tests not just whether a system is intelligent,

link |

but the system also has to fake being human,

link |

which is much harder.

link |

And on the other hand, they say it's too weak

link |

because it just maybe fakes emotions

link |

or intelligent behavior.

link |

But I don't think that's the problem or a big problem.

link |

So if you would pass the Turing test,

link |

so a conversation over terminal with a bot for an hour,

link |

or maybe a day or so,

link |

and you can fool a human into not knowing

link |

whether this is a human or not,

link |

so that's the Turing test,

link |

I would be truly impressed.

link |

And we have this annual competition, the Loebner Prize.

link |

And I mean, it started with ELIZA,

link |

that was the first conversational program.

link |

And what is it called?

link |

The Japanese Mitsuku, or so.

link |

That's the winner of the last couple of years.

link |

Yeah, it's quite impressive.

link |

And then Google has developed Meena, right?

link |

Just recently, that's an open domain conversational bot,

link |

just a couple of weeks ago, I think.

link |

Yeah, I kind of like the metric

link |

that sort of the Alexa Prize has proposed.

link |

I mean, maybe it's obvious to you.

link |

It wasn't to me of setting sort of a length

link |

of a conversation.

link |

Like you want the bot to be sufficiently interesting

link |

that you would want to keep talking to it

link |

for like 20 minutes.

link |

And that's a surprisingly effective aggregate metric,

link |

because really, like nobody has the patience

link |

to be able to talk to a bot that's not interesting

link |

and intelligent and witty,

link |

and is able to go on to different tangents, jump domains,

link |

be able to say something interesting

link |

to maintain your attention.

link |

And maybe many humans will also fail this test.

link |

That's where, unfortunately,

link |

just like with autonomous vehicles, with chatbots,

link |

we also set a bar that's way too high to reach.

link |

I said, you know, the Turing test is not as bad

link |

as some people believe,

link |

but what is really not useful about the Turing test,

link |

it gives us no guidance

link |

how to develop these systems in the first place.

link |

Of course, you know, we can develop them by trial and error

link |

and, you know, do whatever and then run the test

link |

and see whether it works or not.

link |

But a mathematical definition of intelligence

link |

gives us, you know, an objective,

link |

which we can then analyze by theoretical tools

link |

or computational, and, you know,

link |

maybe even prove how close we are.

link |

And we will come back to that later with the AIXI model.

link |

So, I mentioned the compression, right?

link |

So in natural language processing,

link |

they have achieved amazing results.

link |

And one way to test this, of course,

link |

you know, take the system, you train it,

link |

and then you see how well it performs on the task.

link |

But a lot of performance measurement

link |

is done by so called perplexity,

link |

which is essentially the same as complexity

link |

or compression length.

link |

So the NLP community develops new systems

link |

and then they measure the compression length

link |

and then they have rankings and leaderboards

link |

because there's a strong correlation

link |

between compressing well,

link |

and then the system's performing well at the task at hand.

link |

It's not perfect, but it's good enough

link |

for them as an intermediate aim.
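The perplexity-compression link mentioned here can be made concrete: a model that assigns probability p to an observed token needs -log2(p) bits for it under an ideal arithmetic coder, and perplexity is 2 raised to the average bits per token. A minimal sketch (the token probabilities are invented for illustration):

```python
import math

def compression_bits(probs):
    """Total ideal code length in bits for the observed tokens,
    charging -log2(p) bits per token, as an arithmetic coder would."""
    return sum(-math.log2(p) for p in probs)

def perplexity(probs):
    """Perplexity = 2 ** (average bits per token)."""
    return 2 ** (compression_bits(probs) / len(probs))

# Probabilities a hypothetical language model assigned to each token
# of a four-token test sequence (illustrative numbers only).
token_probs = [0.5, 0.25, 0.25, 0.5]

bits = compression_bits(token_probs)   # 1 + 2 + 2 + 1 = 6 bits
ppl = perplexity(token_probs)          # 2 ** 1.5, about 2.83
```

A better model assigns higher probabilities to what actually occurs, which lowers both the code length and the perplexity at once.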

link |

So you mean a measure,

link |

so this is kind of almost returning

link |

to Kolmogorov complexity.

link |

So you're saying good compression

link |

usually means good intelligence.

link |

So you mentioned you're one of the only people

link |

who dared boldly to try to formalize

link |

the idea of artificial general intelligence,

link |

to have a mathematical framework for intelligence,

link |

just like as we mentioned,

link |

termed AIXI, A, I, X, I.

link |

So let me ask the basic question.

link |

Okay, so let me first say what it stands for because...

link |

What it stands for, actually,

link |

that's probably the more basic question.

link |

The first question is usually how it's pronounced,

link |

but finally I put it on the website how it's pronounced

link |

and you figured it out.

link |

The name comes from AI, artificial intelligence,

link |

and the X, I, is the Greek letter Xi,

link |

which is used for Solomonoff's distribution

link |

for quite stupid reasons,

link |

which I'm not willing to repeat here in front of camera.

link |

So it just happened to be more or less arbitrary.

link |

But it also has nice other interpretations.

link |

So there are actions and perceptions in this model.

link |

An agent has actions and perceptions over time.

link |

So this is A index I, X index I.

link |

So there's the action at time I

link |

and then followed by perception at time I.

link |

Yeah, we'll go with that.

link |

I'll edit out the first part.

link |

I have some more interpretations.

link |

So at some point, maybe five years ago or 10 years ago,

link |

I discovered in Barcelona, it was on a big church

link |

that was in stone engraved, some text,

link |

and the word Aixi appeared there a couple of times.

link |

I was very surprised and happy about that.

link |

And I looked it up.

link |

So it is a Catalan language

link |

and it means, with some interpretation, that's it,

link |

that's the right thing to do.

link |

Oh, so it's almost like destined somehow.

link |

It came to you in a dream.

link |

And similar, there's a Chinese word, Aixi,

link |

also written like Aixi, if you transcribe that to Pinyin.

link |

And the final one is that it's AI crossed with induction

link |

because that is, and that's going more to the content now.

link |

So good old fashioned AI is more about planning

link |

and known deterministic world

link |

and induction is more about often IID data

link |

and inferring models.

link |

And essentially what this Aixi model does

link |

is combining these two.

link |

And I actually also recently, I think heard that

link |

in Japanese AI means love.

link |

So if you can combine XI somehow with that,

link |

I think we can, there might be some interesting ideas there.

link |

So Aixi, let's then take the next step.

link |

Can you maybe talk at the big level

link |

of what is this mathematical framework?

link |

Yeah, so it consists essentially of two parts.

link |

One is the learning and induction and prediction part.

link |

And the other one is the planning part.

link |

So let's come first to the learning,

link |

induction, prediction part,

link |

which essentially I explained already before.

link |

So what we need for any agent to act well

link |

is that it can somehow predict what happens.

link |

I mean, if you have no idea what your actions do,

link |

how can you decide which actions are good or not?

link |

So you need to have some model of what effect your actions have.

link |

So what you do is you have some experience,

link |

you build models like scientists of your experience,

link |

then you hope these models are roughly correct,

link |

and then you use these models for prediction.

link |

And the model is, sorry to interrupt,

link |

and the model is based on your perception of the world,

link |

how your actions will affect that world.

link |

So how do you think about a model?

link |

That's not the important part,

link |

but it is technically important,

link |

but at this stage we can just think about predicting,

link |

let's say, stock market data, weather data,

link |

or IQ sequences, one, two, three, four, five,

link |

what comes next, yeah?

link |

So of course our actions affect what we're doing,

link |

but I'll come back to that in a second.

link |

So, and I'll keep just interrupting.

link |

So just to draw a line between prediction and planning,

link |

what do you mean by prediction in this way?

link |

It's trying to predict the environment

link |

without your long term action in the environment?

link |

What is prediction?

link |

Okay, if you want to put the actions in now,

link |

okay, then let's put it in now, yeah?

link |

We don't have to put them now.

link |

Scratch it, scratch it, dumb question, okay.

link |

So the simplest form of prediction is

link |

that you just have data which you passively observe,

link |

and you want to predict what happens

link |

without interfering, as I said,

link |

weather forecasting, stock market, IQ sequences,

link |

or just anything, okay?

link |

And Solomonoff's theory of induction is based on compression,

link |

so you look for the shortest program

link |

which describes your data sequence,

link |

and then you take this program, run it,

link |

it reproduces your data sequence by definition,

link |

and then you let it continue running,

link |

and then it will produce some predictions,

link |

and you can rigorously prove that for any prediction task,

link |

this is essentially the best possible predictor.

link |

Of course, if there's a prediction task,

link |

or a task which is unpredictable,

link |

like, you know, you have fair coin flips.

link |

Yeah, I cannot predict the next fair coin flip.

link |

What Solomonoff does is say,

link |

okay, next head is probably 50%.

link |

It's the best you can do.

link |

So if something is unpredictable,

link |

Solomonoff will also not magically predict it.

link |

But if there is some pattern and predictability,

link |

then Solomonoff induction will figure that out eventually,

link |

and not just eventually, but rather quickly,

link |

and you can prove convergence rates,

link |

whatever your data is.
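True Solomonoff induction mixes over all programs and is incomputable, but the convergence behavior described can be illustrated with a toy Bayesian mixture over a tiny hand-picked hypothesis class standing in for "all programs" (the hypotheses and uniform prior are my own illustrative choices):

```python
# Toy stand-in for Solomonoff induction: a Bayesian mixture over a
# tiny hypothesis class. Each hypothesis reports P(next bit = 1).

def h_ones(history):  return 0.999   # "the sequence is (almost) all 1s"
def h_zeros(history): return 0.001   # "the sequence is (almost) all 0s"
def h_coin(history):  return 0.5     # "fair coin flips"

hypotheses = [h_ones, h_zeros, h_coin]
weights = [1 / 3] * 3                # stand-in for the 2**-length prior

def mixture_p1(history):
    """Mixture probability that the next bit is 1."""
    return sum(w * h(history) for w, h in zip(weights, hypotheses))

history = []
for bit in [1, 1, 1, 1, 1, 1]:       # data with an obvious pattern
    # Bayesian update: weight *= likelihood of the observed bit.
    likes = [h(history) if bit else 1 - h(history) for h in hypotheses]
    weights = [w * l for w, l in zip(weights, likes)]
    total = sum(weights)
    weights = [w / total for w in weights]
    history.append(bit)

final_p1 = mixture_p1(history)       # close to 1 after six 1s
```

On truly random bits, no hypothesis beats the fair-coin one in the long run, so the mixture's prediction stays near 50%, just as described.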

link |

So there's pure magic in a sense.

link |

Well, the catch is that it's not computable,

link |

and we come back to that later.

link |

You cannot just implement it

link |

even with Google resources here,

link |

and run it and predict the stock market and become rich.

link |

I mean, Ray Solomonoff already tried it at the time.

link |

But so the basic task is you're in the environment,

link |

and you're interacting with the environment

link |

to try to learn to model that environment,

link |

and the model is in the space of all these programs,

link |

and your goal is to get a bunch of programs that are simple.

link |

Yeah, so let's go to the actions now.

link |

But actually, good that you asked.

link |

Usually I skip this part,

link |

although there is also a minor contribution which I did,

link |

so the action part,

link |

but I usually sort of just jump to the decision part.

link |

So let me explain the action part now.

link |

Thanks for asking.

link |

So you have to modify it a little bit

link |

by now not just predicting a sequence

link |

which just comes to you,

link |

but you have an observation, then you act somehow,

link |

and then you want to predict the next observation

link |

based on the past observation and your action.

link |

Then you take the next action.

link |

You don't care about predicting it because you're doing it.

link |

Then you get the next observation,

link |

and you want, well, before you get it,

link |

you want to predict it, again,

link |

based on your past action and observation sequence.

link |

You just condition extra on your actions.
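The action-conditional setup just described can be sketched as a plain interaction loop in which the predictor always sees the entire action-observation history; the predictor and environment below are trivial stand-ins of my own:

```python
# Sketch of the interaction protocol: act, observe, and predict the
# next observation conditioned on the full history plus the action.

def predict(history, action):
    """Hypothetical predictor: guess the next observation given the
    full history and the chosen action. Stand-in rule: repeat the
    last observation (0 if there is no history yet)."""
    return history[-1][1] if history else 0

def environment(history, action):
    """Toy environment: the observation simply echoes the action."""
    return action

history = []                       # lifelong history of (action, obs)
for t in range(5):
    action = t % 2                 # some fixed policy, arbitrary here
    guess = predict(history, action)
    observation = environment(history, action)
    history.append((action, observation))
```

The point of the sketch is only the data flow: nothing is a state, everything is conditioned on the growing history.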

link |

There's an interesting alternative

link |

that you also try to predict your own actions.

link |

In the past or the future?

link |

In your future actions.

link |

That's interesting.

link |

Yeah. Wait, let me wrap my head around that.

link |

I think my brain just broke.

link |

We should maybe discuss that later

link |

after I've explained the AIXI model.

link |

That's an interesting variation.

link |

But that is a really interesting variation,

link |

and a quick comment.

link |

I don't know if you want to insert that in here,

link |

but you're looking at the, in terms of observations,

link |

you're looking at the entire, the big history,

link |

the long history of the observations.

link |

Exactly. That's very important.

link |

The whole history from birth sort of of the agent,

link |

and we can come back to that.

link |

And also why this is important.

link |

Often, you know, in RL, you have MDPs,

link |

Markov decision processes, which are much more limiting.

link |

Okay. So now we can predict conditioned on actions.

link |

So even if you influence environment,

link |

but prediction is not all we want to do, right?

link |

We also want to act really in the world.

link |

And the question is how to choose the actions.

link |

And we don't want to greedily choose the actions,

link |

you know, just, you know, what is best in the next time step.

link |

And we first, I should say, you know, what is, you know,

link |

how do we measure performance?

link |

So we measure performance by giving the agent reward.

link |

That's the so called reinforcement learning framework.

link |

So every time step, you can give it a positive reward

link |

or negative reward, or maybe no reward.

link |

It could be very scarce, right?

link |

Like if you play chess, just at the end of the game,

link |

you give plus one for winning or minus one for losing.

link |

So in the AIXI framework, that's completely sufficient.

link |

So occasionally you give a reward signal

link |

and you ask the agent to maximize reward,

link |

but not greedily sort of, you know, the next one, next one,

link |

because that's very bad in the long run if you're greedy.

link |

So, but over the lifetime of the agent.

link |

So let's assume the agent lives for M time steps,

link |

or say dies in sort of a hundred years sharp.

link |

That's just, you know, the simplest model to explain.

link |

So it looks at the future reward sum

link |

and asks what is my action sequence,

link |

or actually more precisely my policy,

link |

which leads in expectation, because I don't know the world,

link |

to the maximum reward sum.

link |

Let me give you an analogy.

link |

In chess, for instance,

link |

we know how to play optimally in theory.

link |

It's just a minimax strategy.

link |

I play the move which seems best to me

link |

under the assumption that the opponent plays the move

link |

which is best for him.

link |

So best for him, so worst for me,

link |

under the assumption that I again play the best move.

link |

And then you have this expectimax tree

link |

to the end of the game, and then you back propagate,

link |

and then you get the best possible move.

link |

So that is the optimal strategy,

link |

which von Neumann already figured out a long time ago,

link |

for playing adversarial games.

link |

Luckily, or maybe unluckily for the theory,

link |

it becomes harder.

link |

The world is not always adversarial.

link |

So it can be, if there are other humans,

link |

even cooperative, or nature is usually,

link |

I mean, the dead nature is stochastic, you know,

link |

things just happen randomly, or don't care about you.

link |

So what you have to take into account is the noise,

link |

and not necessarily adversariality.

link |

So you replace the minimum on the opponent's side

link |

by an expectation,

link |

which is general enough to include also adversarial cases.

link |

So now instead of a minimax strategy,

link |

you have an expectimax strategy.
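The move from minimax to expectimax described here can be shown on a tiny explicit game tree: the agent still maximizes over its own actions, but environment nodes are averaged instead of minimized (the tree values are made up):

```python
# Minimal expectimax: max over the agent's actions, expectation over
# chance/environment outcomes, evaluated by recursion on the tree.

def expectimax(node):
    """node is either a number (leaf reward), ('max', children) for an
    agent choice, or ('chance', [(prob, child), ...]) for the
    environment's stochastic response."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    if kind == 'chance':
        return sum(p * expectimax(c) for p, c in children)
    raise ValueError(kind)

# The agent picks an action; the environment responds stochastically.
tree = ('max', [
    ('chance', [(0.5, 10), (0.5, 0)]),   # action A: expected value 5
    ('chance', [(0.9, 4), (0.1, 14)]),   # action B: expected value 5
    ('chance', [(0.5, 2), (0.5, 6)]),    # action C: expected value 4
])
value = expectimax(tree)
```

Replacing 'chance' with a min node over an opponent recovers plain minimax, which is why the expectation is strictly more general.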

link |

So that is well known.

link |

It's called sequential decision theory.

link |

But the question is,

link |

on which probability distribution do you base that?

link |

If I have the true probability distribution,

link |

like say I play backgammon, right?

link |

There's dice, and there's certain randomness involved.

link |

Yeah, I can calculate probabilities

link |

and feed it in the expected max,

link |

or the sequential decision tree,

link |

come up with the optimal decision if I have enough compute.

link |

But for the real world, we don't know that, you know,

link |

what is the probability the driver in front of me brakes?

link |

So depends on all kinds of things,

link |

and especially new situations, I don't know.

link |

So this is this unknown thing about prediction,

link |

and that's where Solomonoff comes in.

link |

So what you do is in sequential decision tree,

link |

you just replace the true distribution,

link |

which we don't know, by this universal distribution.

link |

I didn't explicitly talk about it,

link |

but this is used for universal prediction

link |

and plug it into the sequential decision tree mechanism.

link |

And then you get the best of both worlds.

link |

You have a long term planning agent,

link |

but it doesn't need to know anything about the world

link |

because the Solomonoff induction part learns.
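Putting the two parts together gives, roughly in Hutter's notation, the AIXI action rule, with horizon m, universal machine U, and ℓ(q) the length of program q:

```latex
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_t + \cdots + r_m \,\bigr]
\sum_{q \,:\; U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum over programs q is the universal distribution over observation-reward histories, and the alternating max and sum over actions and percepts is the expectimax recursion.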

link |

Can you explicitly try to describe

link |

the universal distribution

link |

and how Solomonoff induction plays a role here?

link |

I'm trying to understand.

link |

So what it does is, in the simplest case,

link |

I said, take the shortest program, describing your data,

link |

run it, have a prediction which would be deterministic.

link |

But you should not just take the shortest program,

link |

but also consider the longer ones,

link |

but give them a lower a priori probability.

link |

So in the Bayesian framework, you say a priori,

link |

any distribution, which is a model or a stochastic program,

link |

has a certain a priori probability,

link |

which is two to the minus length of the program.

link |

And why two to the minus length? I could explain that.

link |

So longer programs are punished a priori.

link |

And then you multiply it

link |

with the so called likelihood function,

link |

which is, as the name suggests,

link |

is how likely is this model given the data at hand.

link |

So if you have a very wrong model,

link |

it's very unlikely that this model is true.

link |

And so it is very small number.

link |

So even if the model is simple, it gets penalized by that.

link |

And what you do is then you take just the sum,

link |

so it's a weighted average over all models.

link |

And this gives you a probability distribution.

link |

So it's the universal distribution, or Solomonoff distribution.

link |

So it's weighted by the simplicity of the program

link |

and the likelihood.
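The prior-times-likelihood weighting can be sketched numerically: each candidate model gets prior 2 to the minus its length, so every extra description bit costs a factor of two, which a better fit to the data must pay for. The model names, lengths, and likelihoods below are purely illustrative:

```python
# Sketch of the universal prior's weighting: prior 2**(-length)
# multiplied by the likelihood of the data under each model.

models = {
    # name: (description length in bits, likelihood of the data)
    'short_model': (5,  0.10),   # simple, only a decent fit
    'long_model':  (50, 0.80),   # better fit, far more complex
}

posterior = {name: 2.0 ** -length * like
             for name, (length, like) in models.items()}
total = sum(posterior.values())
posterior = {name: w / total for name, w in posterior.items()}

# The long model's 45 extra bits cost a factor of 2**45, which its
# better fit cannot make up for, so the short model dominates.
best = max(posterior, key=posterior.get)
```

With more data the likelihoods eventually dominate the prior, so a genuinely better complex model does win in the end.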

link |

It's kind of a nice idea.

link |

So okay, and then you said you're planning N or M,

link |

I forgot the letter, steps into the future.

link |

So how difficult is that problem?

link |

What's involved there?

link |

Okay, so basic optimization problem.

link |

What are we talking about?

link |

Yeah, so you have a planning problem up to horizon M,

link |

and that's exponential time in the horizon M,

link |

which is, I mean, it's computable, but intractable.

link |

I mean, even for chess, it's already intractable

link |

to do that exactly.

link |

And, you know, for Go.

link |

But it could also be a discounted kind of framework, where...

link |

Yeah, so having a hard horizon, you know, at 100 years,

link |

it's just for simplicity of discussing the model

link |

and also sometimes the math is simple.

link |

But there are lots of variations,

link |

it's actually quite an interesting parameter.

link |

There's nothing really problematic about it,

link |

but it's very interesting.

link |

So for instance, you think, no,

link |

let's let the parameter M tend to infinity, right?

link |

You want an agent which lives forever, right?

link |

If you do it normally, you have two problems.

link |

First, the mathematics breaks down

link |

because you have an infinite reward sum,

link |

which may give infinity,

link |

and getting reward 0.1 every time step gives infinity,

link |

and getting reward one every time step also gives infinity,

link |

Not really what we want.

link |

Other problem is that if you have an infinite life,

link |

you can be lazy for as long as you want for 10 years

link |

and then catch up with the same expected reward.

link |

And think about yourself or maybe some friends or so.

link |

If they knew they lived forever, why work hard now?

link |

Just enjoy your life and then catch up later.

link |

So that's another problem with infinite horizon.

link |

And you mentioned, yes, we can go to discounting,

link |

but then the standard discounting

link |

is so called geometric discounting.

link |

So a dollar today is about worth

link |

as much as $1.05 tomorrow.

link |

So if you do the so called geometric discounting,

link |

you have introduced an effective horizon.

link |

So the agent is now motivated to look ahead

link |

a certain amount of time effectively.

link |

It's like a moving horizon.

link |

And for any fixed effective horizon,

link |

there is a problem to solve,

link |

which requires a larger horizon.

link |

So if I look ahead five time steps,

link |

I'm a terrible chess player, right?

link |

I'll need to look ahead longer.

link |

If I play go, I probably have to look ahead even longer.

link |

So for every problem, for every horizon,

link |

there is a problem which this horizon cannot solve.

link |

But I introduced the so called near harmonic horizon,

link |

which goes down with one over T

link |

rather than exponential in T,

link |

which produces an agent,

link |

which effectively looks into the future

link |

proportional to its age.

link |

So if it's five years old, it plans for five years.

link |

If it's 100 years old, it then plans for 100 years.

link |

And it's a little bit similar to humans too, right?

link |

I mean, children don't plan ahead very long,

link |

but when we become adults, we plan further ahead.

link |

Maybe when we get very old,

link |

I mean, we know that we don't live forever.

link |

Maybe then our horizon shrinks again.
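A rough numerical sketch of the two discounting schemes: geometric weights give a fixed effective look-ahead regardless of the agent's age, while a power-law weight in the spirit of the near-harmonic discount (falling off roughly like one over t squared) gives a look-ahead that grows with age. The effective-horizon definition here is my own crude simplification, not Hutter's:

```python
# Compare a geometric discount with a power-law one by asking: from
# the agent's current age, how many steps ahead contain half of the
# remaining (truncated) discounted weight?

def effective_horizon(discount, t_now, window=100_000, threshold=0.5):
    """Steps until `threshold` of the remaining weight is accumulated."""
    weights = [discount(t) for t in range(t_now, t_now + window)]
    total = sum(weights)
    acc, k = 0.0, 0
    while acc < threshold * total:
        acc += weights[k]
        k += 1
    return k

geometric = lambda t: 0.95 ** t            # fixed effective horizon
power_law = lambda t: 1.0 / (t + 1) ** 2   # horizon grows with age

h_young = effective_horizon(power_law, t_now=10)
h_old   = effective_horizon(power_law, t_now=1000)   # much further
g_young = effective_horizon(geometric, t_now=10)
g_old   = effective_horizon(geometric, t_now=1000)   # same as young
```

The geometric horizon comes out the same at every age, while the power-law horizon scales up roughly in proportion to the agent's age, matching the "plans for its age" behavior described.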

link |

So that's really interesting.

link |

So adjusting the horizon,

link |

is there some mathematical benefit of that?

link |

Or is it just a nice,

link |

I mean, intuitively, empirically,

link |

it would probably be a good idea

link |

to sort of push the horizon back,

link |

extend the horizon as you experience more of the world.

link |

But is there some mathematical conclusions here

link |

that are beneficial?

link |

With Solomonoff induction, the prediction part,

link |

we have extremely strong finite time,

link |

that is, finite data results.

link |

So you have so and so much data,

link |

then you lose so and so much.

link |

So the theory is really great.

link |

With the AIXI model, with the planning part,

link |

many results are only asymptotic, which, well, this is...

link |

What does asymptotic mean?

link |

Asymptotic means you can prove, for instance,

link |

that in the long run, if the agent, you know,

link |

acts long enough, then, you know,

link |

it performs optimally or some nice thing happens.

link |

So, but you don't know how fast it converges.

link |

So it may converge fast,

link |

but we're just not able to prove it

link |

because of a difficult problem.

link |

Or maybe there's a bug in the model

link |

so that it's really that slow.

link |

So that is what asymptotic means,

link |

sort of eventually, but we don't know how fast.

link |

And if I give the agent a fixed horizon M,

link |

then I cannot prove asymptotic results, right?

link |

So I mean, sort of if it dies in a hundred years,

link |

then in a hundred years it's over, I cannot say eventually.

link |

So this is the advantage of the discounting

link |

that I can prove asymptotic results.

link |

So just to clarify, okay, I've built up a model,

link |

and we're now at the moment where

link |

I have this way of looking several steps ahead.

link |

How do I pick what action I will take?

link |

It's like with the playing chess, right?

link |

You do this minimax.

link |

In this case here, you do expectimax based on the Solomonoff

link |

distribution, you propagate back,

link |

and then, voila, an action falls out,

link |

the action which maximizes the future expected reward

link |

on the Solomonoff distribution,

link |

and then you just take this action.

link |

And then you get a new observation,

link |

and you feed it in this action observation,

link |

And the reward, so on.

link |

Yeah, so the reward too, yeah.

link |

And then maybe you can even predict your own action.

link |

But okay, this big framework,

link |

what is it, I mean,

link |

it's kind of a beautiful mathematical framework

link |

to think about artificial general intelligence.

link |

What can you, what does it help you intuit

link |

about how to build such systems?

link |

Or maybe from another perspective,

link |

what does it help us in understanding AGI?

link |

So when I started in the field,

link |

I was always interested in two things.

link |

One was AGI, the name didn't exist then,

link |

it was called general AI or strong AI,

link |

and the physics theory of everything.

link |

So I switched back and forth between computer science

link |

and physics quite often.

link |

You said the theory of everything.

link |

The theory of everything, yeah.

link |

Those are basically the two biggest problems

link |

before all of humanity.

link |

Yeah, I can explain if you wanted some later time,

link |

why I'm interested in these two questions.

link |

Can I ask you in a small tangent,

link |

if it was one to be solved,

link |

which one would you,

link |

if an apple fell on your head

link |

and there was a brilliant insight

link |

and you could arrive at the solution to one,

link |

would it be AGI or the theory of everything?

link |

Definitely AGI, because once the AGI problem is solved,

link |

I can ask the AGI to solve the other problem for me.

link |

Yeah, brilliantly put.

link |

Okay, so as you were saying about it.

link |

Okay, so, and the reason why I didn't settle,

link |

I mean, this thought about,

link |

once you have solved AGI, it solves all kinds of other,

link |

not just the theory of everything problem,

link |

but all kinds of more useful problems to humanity

link |

is very appealing to many people.

link |

And I had this thought also,

link |

but I was quite disappointed with the state of the art

link |

of the field of AI.

link |

There was some theory about logical reasoning,

link |

but I was never convinced that this will fly.

link |

And then there was this more heuristic approaches

link |

with neural networks and I didn't like these heuristics.

link |

So, and also I didn't have any good idea myself.

link |

So that's the reason why I toggled back and forth

link |

quite some while and even worked four and a half years

link |

in a company developing software,

link |

something completely unrelated.

link |

But then I had this idea about the AIXI model.

link |

And so what it gives you, it gives you a gold standard.

link |

So I have proven that this is the most intelligent agents

link |

which anybody could build, in quotation marks,

link |

because it's just mathematical

link |

and you need infinite compute.

link |

But this is the limit and this is completely specified.

link |

It's not just a framework and every year,

link |

tens of frameworks are developed,

link |

which are just skeletons and then pieces are missing.

link |

And usually these missing pieces,

link |

turn out to be really, really difficult.

link |

And so this is completely and uniquely defined

link |

and we can analyze that mathematically.

link |

And we've also developed some approximations.

link |

I can talk about that a little bit later.

link |

That would be sort of the top down approach,

link |

like, say, von Neumann's minimax theory,

link |

that's the theoretical optimal play of games.

link |

And now we need to approximate it,

link |

put heuristics in, prune the tree, blah, blah, blah,

link |

So we can do that also with the AIXI model,

link |

but for general AI.

link |

It can also inspire those,

link |

and most researchers go bottom up, right?

link |

They have the systems,

link |

they try to make it more general, more intelligent.

link |

It can inspire in which direction to go.

link |

What do you mean by that?

link |

So if you have some choice to make, right?

link |

So how should I evaluate my system

link |

if I can't do cross validation?

link |

How should I do my learning

link |

if my standard regularization doesn't work well?

link |

So the answer is always this,

link |

we have a system which does everything, that's AIXI.

link |

It's just completely in the ivory tower,

link |

completely useless from a practical point of view.

link |

But you can look at it and see,

link |

ah, yeah, maybe I can take some aspects.

link |

And instead of Kolmogorov complexity,

link |

you just take some compressors,

link |

which have been developed so far.

link |

And for the planning, well, we have UCT,

link |

which has also been used in Go.

link |

And at least it's inspired me a lot

link |

to have this formal definition.

link |

And if you look at other fields,

link |

like I always come back to physics

link |

because I have a physics background,

link |

think about the phenomenon of energy.

link |

That was for a long time a mysterious concept.

link |

And at some point it was completely formalized.

link |

And that really helped a lot.

link |

And you can point out a lot of these things

link |

which were first mysterious and vague,

link |

and then they have been rigorously formalized.

link |

Speed and acceleration have been confused, right?

link |

Until it was formally defined,

link |

yeah, there was a time like this.

link |

And people who don't have any background often still confuse them.

link |

And this AIXI model, or the intelligence definition,

link |

which is sort of the dual to it,

link |

we come back to that later,

link |

formalizes the notion of intelligence

link |

uniquely and rigorously.

link |

So in a sense, it serves as kind of the light

link |

at the end of the tunnel.

link |

So, I mean, there's a million questions.

link |

So maybe kind of, okay,

link |

let's feel around in the dark a little bit.

link |

So there's been, here at DeepMind,

link |

but in general, a lot of breakthrough ideas,

link |

just like we've been saying around reinforcement learning.

link |

So how do you see the progress

link |

in reinforcement learning as different?

link |

Like which subset of AIXI does it occupy?

link |

The current, like you said,

link |

maybe the Markov assumption is made quite often

link |

in reinforcement learning.

link |

There's other assumptions made

link |

in order to make the system work.

link |

What do you see as the difference or connection

link |

between reinforcement learning and AIXI?

link |

And so the major difference is that

link |

essentially all other approaches,

link |

they make stronger assumptions.

link |

So in reinforcement learning, the Markov assumption

link |

is that the next state or next observation

link |

only depends on the previous observation

link |

and not the whole history,

link |

which makes, of course, the mathematics much easier

link |

rather than dealing with histories.
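To make the contrast concrete, here is a toy sketch of my own, not code from the conversation: a sequence whose next symbol is determined by the last two observations, where an online predictor conditioning only on the previous symbol fails while one using a little more history succeeds.

```python
from collections import defaultdict

def gen_sequence(n):
    # 0,0,1,1,0,0,1,1,...: the next symbol is determined by the last TWO
    # symbols, but is maximally ambiguous given only the previous one.
    return [(t // 2) % 2 for t in range(n)]

def online_accuracy(seq, order):
    # Predict the historically most frequent successor of the last `order`
    # symbols, updating counts as the sequence streams in.
    counts = defaultdict(lambda: defaultdict(int))
    correct = 0
    for t in range(order, len(seq)):
        ctx = tuple(seq[t - order:t])
        pred = max((0, 1), key=lambda s: counts[ctx][s])
        correct += pred == seq[t]
        counts[ctx][seq[t]] += 1
    return correct / (len(seq) - order)

seq = gen_sequence(400)
acc_markov = online_accuracy(seq, order=1)   # Markov: previous symbol only
acc_history = online_accuracy(seq, order=2)  # slightly longer history
```

The Markov predictor stays far below chance-plus on this pattern, while the history-based one is almost perfect after a few steps.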

link |

Of course, they profit from it also,

link |

because then you have algorithms

link |

that run on current computers

link |

and do something practically useful.

link |

But for general AI, all the assumptions

link |

which are made by other approaches,

link |

we know already now they are limiting.

link |

So, for instance, usually you need

link |

an ergodicity assumption in the MDP frameworks

link |

in order to learn.

link |

Ergodicity essentially means that you can recover

link |

from your mistakes and that there are no traps

link |

in the environment.

link |

And if you make this assumption,

link |

then essentially you can go back to a previous state,

link |

go there a couple of times and then learn

link |

the statistics and what the state is like,

link |

and then in the long run perform well in this state.

link |

So there are no fundamental problems.

link |

But in real life, we know there can be one single action.

link |

One second of being inattentive while driving a car fast

link |

can ruin the rest of my life.

link |

I can become quadriplegic or whatever.

link |

So, and there's no recovery anymore.

link |

So, the real world is not ergodic, I always say.

link |

There are traps and there are situations

link |

which you cannot recover from.

link |

And very little theory has been developed for this case.

link |

What about, what do you see in the context of AIXI

link |

as the role of exploration?

link |

Sort of, you mentioned in the real world

link |

you can get into trouble when we make the wrong decisions

link |

and really pay for it.

link |

But exploration seems to be fundamentally important

link |

for learning about this world, for gaining new knowledge.

link |

So, is exploration baked in?

link |

Another way to ask it:

link |

what are the parameters of AIXI

link |

that can be controlled?

link |

Yeah, I'd say the good thing is that there are no parameters.

link |

Some other approaches have knobs to control.

link |

And you can do that.

link |

I mean, you can modify AIXI so that you have some knobs

link |

to play with if you want to.

link |

But the exploration is directly baked in.

link |

And that comes from the Bayesian learning

link |

and the long-term planning.

link |

So these together already imply exploration.

link |

You can nicely and explicitly prove that

link |

for simple problems like so called bandit problems,

link |

where you say, to give a real world example,

link |

say you have two medical treatments, A and B,

link |

you don't know the effectiveness,

link |

you try A a little bit, B a little bit,

link |

but you don't want to harm too many patients.

link |

So you have to sort of trade off exploring

link |

and exploiting.

link |

And you can do the mathematics

link |

and figure out the optimal strategy.

link |

We talk about Bayesian agents here;

link |

there are also non-Bayesian agents,

link |

but it shows that this Bayesian framework

link |

by taking a prior over possible worlds,

link |

doing the Bayesian mixture,

link |

then the Bayes-optimal decision with long-term planning

link |

that is important,

link |

automatically implies exploration,

link |

also to the proper extent,

link |

not too much exploration and not too little.
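The two-treatments example can be sketched with Thompson sampling. This is my illustration of how Bayesian posteriors yield exploration on their own; the Bayes-optimal strategy being referred to here plans over the posterior rather than merely sampling from it:

```python
import random

def thompson_bandit(p_true, steps, seed=0):
    rng = random.Random(seed)
    alpha, beta, pulls = [1, 1], [1, 1], [0, 0]   # Beta(1,1) prior per arm
    for _ in range(steps):
        # Sample a plausible effectiveness for each treatment from its
        # posterior, then give the treatment that currently looks best.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in (0, 1)]
        arm = sampled.index(max(sampled))
        pulls[arm] += 1
        if rng.random() < p_true[arm]:            # treatment succeeded
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return pulls

# Treatment B (success rate 0.7) soon dominates A (0.3), yet A is still
# tried occasionally: exploration emerges without an explicit knob.
pulls = thompson_bandit([0.3, 0.7], steps=2000)
```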

link |

That is in very simple settings.

link |

For the AIXI model, I was also able to prove

link |

a self-optimizing theorem

link |

or asymptotic optimality theorems,

link |

although they're only asymptotic, not finite-time bounds.

link |

So it seems like planning is really important,

link |

but especially the long-term part of the planning.

link |

And also, I mean, maybe a quick tangent,

link |

how important do you think is removing

link |

the Markov assumption and looking at the full history?

link |

Sort of intuitively, of course, it's important,

link |

but is it like fundamentally transformative

link |

to the entirety of the problem?

link |

What's your sense of it?

link |

Like, because we all make that assumption quite often.

link |

It's just throwing away the past.

link |

No, I think it's absolutely crucial.

link |

The question is whether there's a way to deal with it

link |

in a more heuristic but still sufficiently effective way.

link |

So I have to come up with an example on the fly,

link |

but you have some key event in your life,

link |

long time ago in some city or something,

link |

you realized that's a really dangerous street or whatever.

link |

And you want to remember that forever,

link |

in case you come back there.

link |

Kind of a selective kind of memory.

link |

So you remember all the important events in the past,

link |

but somehow selecting the important ones is the question.

link |

And I'm not concerned about just storing the whole history.

link |

Just, you can calculate: human life, say 30 or 100 years,

link |

doesn't matter, right?

link |

How much data comes in through the vision system

link |

and the auditory system, you compress it a little bit,

link |

in this case, lossily and store it.

link |

We soon have the means of just storing it.

link |

But you still need the selection for the planning part

link |

and the compression for the understanding part.

link |

The raw storage I'm really not concerned about.

link |

And I think we should just store,

link |

if you develop an agent,

link |

preferably just store all the interaction history.

link |

And then you build of course models on top of it

link |

and you compress it and you are selective,

link |

but occasionally you go back to the old data

link |

and reanalyze it based on your new experience you have.

link |

Sometimes you are in school,

link |

you learn all these things you think is totally useless

link |

and much later you realize,

link |

oh, they were not so useless as you thought.

link |

I'm looking at you, linear algebra.

link |

So maybe let me ask about objective functions

link |

because the rewards seem to be an important part.

link |

The rewards are kind of given to the system.

link |

For a lot of people,

link |

the specification of the objective function

link |

is a key part of intelligence.

link |

The agent itself figuring out what is important.

link |

What do you think about that?

link |

Is it possible within the AIXI framework

link |

to discover the reward yourself,

link |

based on which you should operate?

link |

Okay, that will be a long answer.

link |

So, and that is a very interesting question.

link |

And I'm asked a lot about this question,

link |

where do the rewards come from?

link |

So, and then I give you now a couple of answers.

link |

So if you want to build agents, now let's start simple.

link |

So let's assume we want to build an agent

link |

based on the AIXI model, which performs a particular task.

link |

Let's start with something super simple,

link |

like, I mean, super simple, like playing chess,

link |

or go or something, yeah.

link |

Then you just say the reward for winning the game is plus one,

link |

losing the game is minus one, done.

link |

You apply this agent.

link |

If you have enough compute, you let it self-play

link |

and it will learn the rules of the game,

link |

will play perfect chess after some while, problem solved.

link |

Okay, so if you have more complicated problems,

link |

then you may believe that you have the right reward.

link |

So a nice, cute example is the elevator control

link |

that is also in Rich Sutton's book,

link |

which is a great book, by the way.

link |

So you control the elevator and you think,

link |

well, maybe the reward should be coupled

link |

to how long people wait in front of the elevator.

link |

You program it and you do it.

link |

And what happens is the elevator eagerly picks up

link |

all the people, but never drops them off.

link |

So then you realize, oh, maybe the time in the elevator

link |

also counts, so you minimize the sum, yeah?

link |

And the elevator does that, but never picks up the people

link |

on the 10th floor and the top floor

link |

because in expectation, it's not worth it.

link |

Just let them stay.

link |

So even in apparently simple problems,

link |

you can make mistakes, yeah?

link |

And that's what in more serious contexts

link |

AGI safety researchers consider.
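The elevator story fits in a few toy lines. The numbers here are my invention, not from Sutton's book; they just show that under a wait-only objective, the degenerate "pick everyone up, never drop them off" policy scores best.

```python
# Cost functions for an elevator controller (lower is better).
def cost_wait_only(wait_times, ride_times):
    return sum(wait_times)                     # only lobby waiting counts

def cost_wait_plus_ride(wait_times, ride_times):
    return sum(wait_times) + sum(ride_times)   # time in the cab counts too

# (waiting seconds per person, riding seconds per person)
normal = ([30, 40, 20], [60, 50, 70])          # ordinary service
hoard = ([0, 0, 0], [10_000] * 3)              # picks up instantly, never stops

wait_only_prefers_hoarding = cost_wait_only(*hoard) < cost_wait_only(*normal)
fixed_prefers_normal = cost_wait_plus_ride(*normal) < cost_wait_plus_ride(*hoard)
```

Adding ride time fixes this particular failure, but as the conversation notes, the patched objective then fails differently on rarely visited floors.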

link |

So now let's go back to general agents.

link |

So assume you want to build an agent,

link |

which is generally useful to humans, yeah?

link |

So you have a household robot, yeah?

link |

And it should do all kinds of tasks.

link |

So in this case, the human should give the reward.

link |

I mean, maybe it's pre-trained in the factory

link |

and there's some sort of internal reward

link |

for the battery level or whatever, yeah?

link |

So if it does the dishes badly, you punish the robot,

link |

if it does it well, you reward the robot,

link |

and then train it to a new task, yeah, like a child, right?

link |

So you need the human in the loop,

link |

if you want a system which is useful to the human.

link |

And as long as these agents stay subhuman level,

link |

that should work reasonably well,

link |

apart from these examples.

link |

It becomes critical if they reach a human level.

link |

It's like with children, small children,

link |

you have them reasonably well under control,

link |

but when they become older, the reward technique

link |

doesn't work so well anymore.

link |

So then finally, so this would be agents,

link |

which are just, you could say slaves to the humans, yeah?

link |

So if you are more ambitious and just say,

link |

we want to build a new species of intelligent beings,

link |

we put them on a new planet

link |

and we want them to develop this planet or whatever.

link |

So we don't give them any reward.

link |

So what could we do?

link |

And you could try to come up with some reward functions

link |

like it should maintain itself, the robot,

link |

it should maybe multiply, build more robots, right?

link |

And maybe all kinds of things which you find useful,

link |

but that's pretty hard, right?

link |

What does self maintenance mean?

link |

What does it mean to build a copy?

link |

Should it be exact copy, an approximate copy?

link |

And so that's really hard,

link |

but Laurent, also at DeepMind, developed a beautiful model.

link |

So he just took the AIXI model

link |

and coupled the rewards to information gain.

link |

So he said the reward is proportional

link |

to how much the agent had learned about the world.

link |

And you can rigorously, formally, uniquely define that

link |

in terms of KL divergences, okay?

link |

So if you put that in, you get a completely autonomous agent.
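A sketch of the information-gain reward in the simplest Bayesian setting. This is my formulation for illustration, not Laurent's actual construction: the reward for an observation is the KL divergence from the posterior over candidate environments to the prior, i.e. how much the agent learned about which world it is in.

```python
import math

def info_gain_reward(prior, likelihoods):
    """prior: {env: P(env)}; likelihoods: {env: P(obs | env)}."""
    evidence = sum(prior[e] * likelihoods[e] for e in prior)
    posterior = {e: prior[e] * likelihoods[e] / evidence for e in prior}
    # KL(posterior || prior): the information the observation carried.
    gain = sum(p * math.log(p / prior[e]) for e, p in posterior.items() if p > 0)
    return posterior, gain

prior = {"fair coin": 0.5, "biased coin": 0.5}
# Seeing heads: P(heads | fair) = 0.5, P(heads | biased) = 0.9.
post, reward = info_gain_reward(prior, {"fair coin": 0.5, "biased coin": 0.9})
# An observation equally likely under every hypothesis teaches nothing:
_, no_reward = info_gain_reward(prior, {"fair coin": 0.5, "biased coin": 0.5})
```

Pure noise yields zero gain, which hints at why the refined agent no longer gets stuck on the white-noise TV screen mentioned below.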

link |

And actually, interestingly, for this agent,

link |

we can prove much stronger results

link |

than for the general agent, which is also nice.

link |

And if you let this agent loose,

link |

it will be in a sense, the optimal scientist.

link |

It is absolutely curious to learn as much as possible

link |

And of course, it will also have

link |

a lot of instrumental goals, right?

link |

In order to learn, it needs to at least survive, right?

link |

A dead agent is not good for anything.

link |

So it needs to have self preservation.

link |

And if it builds small helpers, acquiring more information,

link |

it will do that, yeah?

link |

If exploration, space exploration or whatever is necessary,

link |

right, to gather information, it will develop it.

link |

So it has a lot of instrumental goals

link |

following from this information gain.

link |

And this agent is completely autonomous of us.

link |

No rewards necessary anymore.

link |

Yeah, of course, it could find a way

link |

to game the concept of information

link |

and get stuck in that library

link |

that you mentioned beforehand

link |

with a very large number of books.

link |

The first agent had this problem.

link |

It would get stuck in front of an old TV screen,

link |

which just showed white noise.

link |

Yeah, white noise, yeah.

link |

But the second version can deal with at least stochasticity.

link |

Yeah, what about curiosity?

link |

This kind of word, curiosity, creativity,

link |

is that kind of the reward function being

link |

about getting new information?

link |

Is that similar to idea of kind of injecting exploration

link |

for its own sake inside the reward function?

link |

Do you find this at all appealing, interesting?

link |

I think that's a nice definition.

link |

Curiosity is rewards.

link |

Sorry, curiosity is exploration for its own sake.

link |

Yeah, I would accept that.

link |

But most curiosity, well, in humans,

link |

and especially in children,

link |

is not just for its own sake,

link |

but for actually learning about the environment

link |

and for behaving better.

link |

So I think most curiosity is tied in the end

link |

towards performing better.

link |

Well, okay, so if intelligent systems

link |

need to have this reward function,

link |

let me, you're an intelligence system,

link |

currently passing the Turing test quite effectively.

link |

What's the reward function

link |

of our human intelligence existence?

link |

What's the reward function

link |

that Marcus Hutter is operating under?

link |

Okay, to the first question,

link |

the biological reward function is to survive and to spread,

link |

and very few humans sort of are able to overcome

link |

this biological reward function.

link |

But we live in a very nice world

link |

where we have lots of spare time

link |

and can still survive and spread,

link |

so we can develop arbitrary other interests,

link |

which is quite interesting.

link |

On top of that, yeah.

link |

But the survival and spreading sort of is,

link |

I would say, the goal or the reward function of humans,

link |

so that's the core one.

link |

I like how you avoided answering the second question,

link |

which a good intelligence system would.

link |

That is, your own meaning of life and reward function.

link |

My own meaning of life and reward function

link |

is to find an AGI and to build it.

link |

Okay, let's dissect AIXI even further.

link |

So one of the assumptions is kind of infinity

link |

keeps creeping up everywhere,

link |

which, what are your thoughts

link |

on kind of bounded rationality

link |

and sort of the nature of our existence

link |

and intelligence systems is that we're operating

link |

always under constraints, under limited time,

link |

limited resources.

link |

How does that, how do you think about that

link |

within the AIXI framework,

link |

within trying to create an AGI system

link |

that operates under these constraints?

link |

Yeah, that is one of the criticisms about AIXI,

link |

that it ignores computation completely.

link |

And some people believe that intelligence

link |

is inherently tied to bounded resources.

link |

What do you think on this one point?

link |

Do you think it's,

link |

do you think the bounded resources

link |

are fundamental to intelligence?

link |

I would say that an intelligence notion,

link |

which ignores computational limits is extremely useful.

link |

A good intelligence notion,

link |

which includes these resources would be even more useful,

link |

but we don't have that yet.

link |

If you look at other fields outside of computer science,

link |

computational aspects never play a fundamental role.

link |

You develop biological models for cells,

link |

something in physics, these theories,

link |

I mean, become more and more crazy

link |

and harder and harder to compute.

link |

Well, in the end, of course,

link |

we need to do something with this model,

link |

but this is more a nuisance than a feature.

link |

And I'm sometimes wondering if artificial intelligence

link |

would not sit in a computer science department,

link |

but in a philosophy department,

link |

then this computational focus

link |

would be probably significantly less.

link |

I mean, think about the induction problem

link |

is more in the philosophy department.

link |

There's virtually no paper that cares about

link |

how long it takes to compute the answer.

link |

That is completely secondary.

link |

Of course, once we have figured out the first problem,

link |

so intelligence without computational resources,

link |

then the next and very good question is,

link |

could we improve it by including computational resources,

link |

but nobody was able to do that so far

link |

in an even halfway satisfactory manner.

link |

I like that, that in the long run,

link |

the right department to belong to is philosophy.

link |

That's actually quite a deep idea,

link |

or even to at least to think about

link |

big picture philosophical questions,

link |

big picture questions,

link |

even in the computer science department.

link |

But you've mentioned approximation.

link |

Sort of, there's a lot of infinity,

link |

a lot of huge resources needed.

link |

Are there approximations to AIXI

link |

within the AIXI framework that are useful?

link |

Yeah, we have developed a couple of approximations.

link |

And what we do there is that

link |

the Solomonoff induction part,

link |

which was: find the shortest program describing your data,

link |

we just replace it by standard data compressors.

link |

And the better compressors get,

link |

the better this part will become.
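A toy version of that swap, mine for illustration, and using zlib rather than the context tree weighting compressor the real approximation (MC-AIXI-CTW) uses: score each continuation by how well the history plus that continuation compresses, and predict the least surprising one.

```python
import zlib

def compressed_len(data: bytes) -> int:
    # Length in bytes after maximum-effort zlib compression.
    return len(zlib.compress(data, 9))

def predict_next(history: bytes, alphabet: bytes = b"01") -> int:
    # Shorter compressed length stands in for higher probability,
    # echoing "find the short program describing your data".
    return min(alphabet, key=lambda sym: compressed_len(history + bytes([sym])))

history = b"01" * 200          # a highly regular observation stream
prediction = predict_next(history)
```

The better the plug-in compressor, the better this induction stand-in becomes, which is exactly the swap described above.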

link |

We focus on a particular compressor

link |

called context tree weighting,

link |

which is pretty amazing, not so well known.

link |

It has beautiful theoretical properties,

link |

also works reasonably well in practice.

link |

So we use that for the approximation of the induction

link |

and the learning and the prediction part.

link |

And for the planning part,

link |

we essentially just took the ideas from computer Go.

link |

It was Csaba Szepesvári, also now at DeepMind,

link |

who developed the so-called UCT algorithm,

link |

upper confidence bound for trees algorithm

link |

on top of the Monte Carlo tree search.

link |

So we approximate this planning part by sampling.
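A minimal sketch of the UCB1 rule that UCT applies at every tree node. This is the standard textbook formula with the usual exploration constant, not the actual implementation discussed here:

```python
import math

def ucb1_choice(values, visits, c=math.sqrt(2)):
    """values[i]: mean reward of child i; visits[i]: times it was tried."""
    total = sum(visits)
    def score(i):
        if visits[i] == 0:
            return float("inf")            # untried children come first
        # Mean value plus an exploration bonus that shrinks with visits.
        return values[i] + c * math.sqrt(math.log(total) / visits[i])
    return max(range(len(values)), key=score)

# A rarely tried child outranks a well-explored one with a higher mean:
exploratory = ucb1_choice(values=[0.9, 0.5], visits=[1000, 2])
greedy = ucb1_choice(values=[0.9, 0.5], visits=[1000, 1000])
```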

link |

And it's successful on some small toy problems.

link |

We don't want to lose the generality, right?

link |

And that's sort of the handicap, right?

link |

If you want to be general, you have to give up something.

link |

So, but this single agent was able to play small games

link |

like Kuhn poker and Tic Tac Toe and even Pac-Man

link |

in the same architecture, no change.

link |

The agent doesn't know the rules of the game,

link |

really nothing, and it learns all by itself by playing

link |

with these environments.

link |

So Jürgen Schmidhuber proposed something called

link |

Gödel machines, which is a self-improving program

link |

that rewrites its own code.

link |

Sort of mathematically, philosophically,

link |

what's the relationship in your eyes,

link |

if you're familiar with it,

link |

between AIXI and the Gödel machine?

link |

Yeah, familiar with it.

link |

He developed it while I was in his lab.

link |

Yeah, so the Gödel machine, to explain it briefly,

link |

you give it a task.

link |

It could be a simple task as, you know,

link |

finding prime factors of numbers, right?

link |

You can formally write it down.

link |

There's a very slow algorithm to do that.

link |

Just try all the factors, yeah.

link |

Or play chess optimally, right?

link |

You write the minimax algorithm

link |

to the end of the game.

link |

So you write down what the Gödel machine should do.

link |

Then it will take part of its resources to run this program

link |

and other part of its resources to improve this program.

link |

And when it finds an improved version,

link |

which provably computes the same answer.

link |

So that's the key part, yeah.

link |

It needs to prove by itself that this change of program

link |

still satisfies the original specification.

link |

And if it does so, then it replaces the original program

link |

by the improved program.

link |

And by definition, it does the same job,

link |

but just faster, okay?

link |

And then, you know, it does this over and over.

link |

And it's developed in a way that all parts

link |

of this Gödel machine can self-improve,

link |

but it stays provably consistent

link |

with the original specification.
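A very loose sketch of that loop, mine rather than Schmidhuber's: a finite test domain stands in for the proof step, which is far weaker than the Gödel machine's actual requirement of a machine-checked proof, but it shows the swap-only-if-verified-equivalent structure.

```python
def slow_smallest_factor(n):
    # Naive specification: try every divisor from 2 upward.
    for d in range(2, n + 1):
        if n % d == 0:
            return d

def fast_smallest_factor(n):
    # Candidate improvement: stop at sqrt(n), skip even divisors.
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n            # no divisor up to sqrt(n): n is prime

def verified_swap(current, candidate, spec_domain):
    # Stand-in for the proof: the candidate must agree with the current
    # program on the whole (finite) specification domain before it may
    # replace it; otherwise the machine keeps running the old code.
    if all(candidate(n) == current(n) for n in spec_domain):
        return candidate
    return current

program = verified_swap(slow_smallest_factor, fast_smallest_factor,
                        range(2, 500))
```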

link |

So from this perspective, it has nothing to do with AIXI.

link |

But if you would now put AIXI as the starting axioms in,

link |

it would run AIXI, but you know, that takes forever.

link |

But then if it finds a provable speedup of AIXI,

link |

it would replace it by this and this and this.

link |

And maybe eventually it comes up with a model

link |

which is still the AIXI model.

link |

It cannot be, I mean, just for the knowledgeable reader,

link |

AIXI is incomputable, and one can prove that therefore

link |

there cannot be a computable exact algorithm.

link |

There needs to be some approximations

link |

and this is not dealt with by the Gödel machine.

link |

So you have to do something about it.

link |

But there's the AIXItl model, which is finitely computable,

link |

which we could put in.

link |

Which part of AIXI is noncomputable?

link |

The Solomonoff induction part.

link |

The induction, okay, so.

link |

But there is ways of getting computable approximations

link |

of the AIXI model, so then it's at least computable.

link |

It is still way beyond any resources anybody will ever have,

link |

but then the Gödel machine could sort of improve it

link |

further and further in an exact way.

link |

So is it theoretically possible

link |

that the Gödel machine process could improve it?

link |

Isn't AIXI already optimal?

link |

It is optimal in terms of the reward collected

link |

over its interaction cycles,

link |

but it takes infinite time to produce one action.

link |

And the world continues whether you want it or not.

link |

So the model is assuming you had an oracle,

link |

which solved this problem,

link |

and then in the next 100 milliseconds

link |

or the reaction time you need gives the answer,

link |

then AIXI is optimal.

link |

It's also optimal in the sense of learning efficiency

link |

and data efficiency, but not in terms of computation time.

link |

And then the Gödel machine, in theory,

link |

but probably not provably, could make it go faster.

link |

Okay, interesting.

link |

Those two components are super interesting.

link |

The sort of the perfect intelligence combined

link |

with self improvement,

link |

sort of provable self improvement

link |

since you're always getting the correct answer

link |

and you're improving.

link |

Okay, so you've also mentioned that different kinds

link |

of things in the chase of optimizing this reward,

link |

sort of optimizing for the goal,

link |

interesting human things could emerge.

link |

So is there a place for consciousness within AIXI?

link |

Where does, maybe you can comment,

link |

because I suppose we humans are just another instantiation

link |

of AIXI agents and we seem to have consciousness.

link |

You say humans are an instantiation of an AIXI agent?

link |

Well, that would be amazing,

link |

but I think that's not true even for the smartest

link |

and most rational humans.

link |

I think maybe we are very crude approximations.

link |

I mean, I tend to believe, again, I'm Russian,

link |

so I tend to believe our flaws are part of the optimal.

link |

So we tend to laugh off and criticize our flaws

link |

and I tend to think that that's actually close

link |

to an optimal behavior.

link |

Well, some flaws, if you think more carefully about it,

link |

are actually not flaws, yeah,

link |

but I think there are still enough flaws.

link |

As a student of history,

link |

I think all the suffering that we've endured

link |

as a civilization,

link |

it's possible that that's the optimal amount of suffering

link |

we need to endure to minimize longterm suffering.

link |

That's your Russian background, I think.

link |

That's the Russian.

link |

Whether or not humans are instantiations of an AIXI agent,

link |

do you think there's a consciousness

link |

of something that could emerge

link |

in a computational form or framework like AIXI?

link |

Let me also ask you a question.

link |

Do you think I'm conscious?

link |

Yeah, that's a good question.

link |

That tie is confusing me, but I think so.

link |

You think that makes me unconscious

link |

because it strangles me or?

link |

If an agent were to solve the imitation game

link |

I think it would be dressed similarly to you.

link |

That because there's a kind of flamboyant,

link |

interesting, complex behavior pattern

link |

that sells that you're human and you're conscious.

link |

But why do you ask?

link |

Was it a yes or was it a no?

link |

Yes, I think you're conscious, yes.

link |

So, and you explained sort of somehow why,

link |

but you infer that from my behavior, right?

link |

You can never be sure about that.

link |

And I think the same thing will happen

link |

with any intelligent agent we develop

link |

if it behaves in a way sufficiently close to humans

link |

or maybe even not humans.

link |

I mean, maybe a dog is also sometimes

link |

a little bit self conscious, right?

link |

So if it behaves in a way

link |

where we attribute typically consciousness,

link |

we would attribute consciousness

link |

to these intelligent systems.

link |

And AIXI probably in particular.

link |

That of course doesn't answer the question

link |

whether it's really conscious.

link |

And that's the big hard problem of consciousness.

link |

Maybe I'm a zombie.

link |

I mean, not the movie zombie, but the philosophical zombie.

link |

Is, to you, the display of consciousness

link |

close enough to consciousness

link |

from a perspective of AGI

link |

that the distinction of the hard problem of consciousness

link |

is not an interesting one?

link |

I think we don't have to worry

link |

about the consciousness problem,

link |

especially the hard problem for developing AGI.

link |

I think, you know, we progress.

link |

At some point we have solved all the technical problems

link |

and this system will behave intelligent

link |

and then super intelligent.

link |

And this consciousness will emerge.

link |

I mean, definitely it will display behavior

link |

which we will interpret as conscious.

link |

And then it's a philosophical question.

link |

Did this consciousness really emerge

link |

or is it a zombie which just, you know, fakes everything?

link |

We still don't have to figure that out.

link |

Although it may be interesting,

link |

at least from a philosophical point of view,

link |

it's very interesting,

link |

but it may also be sort of practically interesting.

link |

You know, there's some people saying,

link |

if it's just faking consciousness and feelings,

link |

you know, then we don't need to be concerned about it.

link |

But if it's really conscious and has feelings,

link |

then we need to be concerned, yeah.

link |

I can't wait till the day

link |

where AI systems exhibit consciousness

link |

because it'll truly be some of the hardest ethical questions

link |

of what we do with that.

link |

It is rather easy to build systems

link |

to which people ascribe consciousness.

link |

And I give you an analogy.

link |

I mean, remember, maybe it was before you were born,

link |

How dare you, sir?

link |

Why, that's the, you're young, right?

link |

Thank you, thank you very much.

link |

But I was also in the Soviet Union.

link |

We didn't have any of those fun things.

link |

But you have heard about this Tamagotchi,

link |

which was, you know, really, really primitive,

link |

actually, for the time it was,

link |

and, you know, you could raise, you know, this,

link |

and kids got so attached to it

link |

and, you know, didn't want to let it die

link |

and probably, if we would have asked, you know,

link |

the children, do you think this Tamagotchi is conscious?

link |

They would have said yes.

link |

Half of them would have said yes, I would guess.

link |

I think that's kind of a beautiful thing, actually,

link |

because that consciousness, ascribing consciousness,

link |

seems to create a deeper connection.

link |

Which is a powerful thing.

link |

But we'll have to be careful on the ethics side of that.

link |

Well, let me ask about the AGI community broadly.

link |

You kind of represent some of the most serious work on AGI,

link |

at least earlier on,

link |

and DeepMind represents serious work on AGI these days.

link |

But why, in your sense, is the AGI community so small

link |

or has been so small until maybe DeepMind came along?

link |

Like, why aren't more people seriously working

link |

on human level and superhuman level intelligence

link |

from a formal perspective?

link |

Okay, from a formal perspective,

link |

that's sort of an extra point.

link |

So I think there are a couple of reasons.

link |

I mean, AI came in waves, right?

link |

You know, AI winters and AI summers,

link |

and then there were big promises which were not fulfilled,

link |

and people got disappointed.

link |

And that narrow AI solving particular problems,

link |

which seemed to require intelligence,

link |

was always to some extent successful,

link |

and there were improvements, small steps.

link |

And if you build something which is useful for society

link |

or industrially useful, then there's a lot of funding.

link |

So I guess it was in part the money,

link |

which drives people to develop a specific system

link |

solving specific tasks.

link |

But you would think that, at least in university,

link |

you should be able to do ivory tower research.

link |

And that was probably better a long time ago,

link |

but even nowadays, there's quite some pressure

link |

of doing applied research or translational research,

link |

and it's harder to get grants as a theorist.

link |

So that also drives people away.

link |

It's maybe also harder

link |

to attack the general intelligence problem.

link |

So I think some people, I mean, maybe a small number,

link |

were still interested in formalizing intelligence

link |

and thinking of general intelligence,

link |

but not much came up, right?

link |

Well, not much great stuff came up.

link |

So what do you think,

link |

we talked about the formal big light

link |

at the end of the tunnel,

link |

but from the engineering perspective,

link |

what do you think it takes to build an AGI system?

link |

Is that, and I don't know if that's a stupid question

link |

or a distinct question

link |

from everything we've been talking about with AIXI,

link |

but what do you see as the steps that are necessary to take

link |

to start to try to build something?

link |

So you want a blueprint now,

link |

and then you go off and do it?

link |

That's the whole point of this conversation,

link |

trying to squeeze that in there.

link |

Now, is there, I mean, what's your intuition?

link |

Is it in the robotics space

link |

or something that has a body and tries to explore the world?

link |

Is it in the reinforcement learning space,

link |

like the efforts with AlphaZero and AlphaStar

link |

that are kind of exploring how you can solve it through

link |

simulation in the gaming world?

link |

Is there stuff in sort of all the transformer work

link |

and natural language processing,

link |

sort of maybe attacking the open domain dialogue?

link |

Like, where do you see promising pathways?

link |

Let me pick the embodiment maybe.

link |

So embodiment is important, yes and no.

link |

I don't believe that we need a physical robot

link |

walking or rolling around, interacting with the real world

link |

in order to achieve AGI.

link |

And I think it's more of a distraction probably

link |

than helpful, it's sort of confusing the body with the mind.

link |

For industrial applications or near term applications,

link |

of course we need robots for all kinds of things,

link |

but for solving the big problem, at least at this stage,

link |

I think it's not necessary.

link |

But the answer is also yes,

link |

that I think the most promising approach

link |

is that you have an agent

link |

and that can be a virtual agent in a computer

link |

interacting with an environment,

link |

possibly a 3D simulated environment

link |

like in many computer games.

link |

And you train and learn the agent,

link |

even if you don't intend to later put it sort of,

link |

this algorithm in a robot brain

link |

but leave it forever in virtual reality,

link |

getting experience in a,

link |

albeit just simulated, 3D world,

link |

is possibly, and I say possibly,

link |

important to understand things

link |

on a similar level as humans do,

link |

especially if the agent or primarily if the agent

link |

needs to interact with the humans.

link |

If you talk about objects on top of each other in space

link |

and flying and cars and so on,

link |

and the agent has no experience

link |

with even virtual 3D worlds,

link |

it's probably hard to grasp.

link |

So if you develop an abstract agent,

link |

say we take the mathematical path

link |

and we just want to build an agent

link |

which can prove theorems

link |

and becomes a better and better mathematician,

link |

then this agent needs to be able to reason

link |

in very abstract spaces

link |

and then maybe sort of putting it into 3D environments,

link |

simulated or not is even harmful.

link |

It should sort of, you put it in, I don't know,

link |

an environment which it creates itself or so.

link |

It seems like you have an interesting, rich,

link |

complex trajectory through life

link |

in terms of your journey of ideas.

link |

So it's interesting to ask what books,

link |

technical, fiction, philosophical,

link |

ideas, or people had a transformative effect on you.

link |

Books are most interesting

link |

because maybe people could also read those books

link |

and see if they could be inspired as well.

link |

Yeah, luckily you asked for books and not a singular book.

link |

It's very hard if I try to pin down one book.

link |

And I can do that at the end.

link |

But let me start with the books which were most transformative for me

link |

or which I can most highly recommend

link |

to people interested in AI.

link |

Yeah, yeah, both, both, yeah, yeah.

link |

I would always start with Russell and Norvig,

link |

Artificial Intelligence, A Modern Approach.

link |

That's the AI Bible.

link |

It's an amazing book.

link |

It covers all approaches to AI.

link |

And even if you focused on one approach,

link |

I think that is the minimum you should know

link |

about the other approaches out there.

link |

So that should be your first book.

link |

Fourth edition should be coming out soon.

link |

Oh, okay, interesting.

link |

There's a deep learning chapter now,

link |

written by Ian Goodfellow, okay.

link |

And then the next book I would recommend,

link |

the Reinforcement Learning book by Sutton and Barto.

link |

That's a beautiful book.

link |

If there's any problem with the book,

link |

it makes RL feel and look much easier than it actually is.

link |

It's a very gentle book.

link |

It's very nice to read, with exercises to do.

link |

You can very quickly get some RL systems to run.

link |

You know, very toy problems, but it's a lot of fun.

link |

And in a couple of days you feel you know what RL is about,

link |

but it's much harder than the book.
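
As an illustration of the kind of toy RL system the book gets you running in a couple of days, here is a minimal sketch, assuming nothing beyond standard tabular Q-learning on a made-up five-state corridor (the environment and all constants here are hypothetical, chosen only for the demo):

```python
import random

# Minimal tabular Q-learning on a toy five-state corridor:
# states 0..4, actions 0 (left) and 1 (right), reward 1 for reaching state 4.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor

def step(state, action):
    """Deterministic corridor dynamics; the episode ends at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                   # episodes with a random behavior policy
    state, done = 0, False
    while not done:
        action = rng.randrange(2)      # Q-learning is off-policy, so pure
        nxt, reward, done = step(state, action)   # exploration is fine here
        # Temporal-difference update toward reward + discounted best next value.
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

# The greedy policy should come out as "always go right" in states 0..3.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

Exactly the point made above applies: this runs in seconds on a toy problem, but real RL problems are much harder than the sketch suggests.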

link |

Oh, come on now, it's an awesome book.

link |

Yeah, it is, yeah.

link |

And maybe, I mean, there's so many books out there.

link |

If you like the information theoretic approach,

link |

then there's the Kolmogorov Complexity book by Li and Vitányi,

link |

but probably, you know, some short article is enough.

link |

You don't need to read a whole book,

link |

but it's a great book.

link |

And if you have to mention one all time favorite book,

link |

it's of different flavor, that's a book

link |

which is used in the International Baccalaureate

link |

for high school students in several countries.

link |

That's from Nicholas Alchin, Theory of Knowledge,

link |

second edition or first, not the third, please.

link |

The third one, they took out all the fun.

link |

So this asks all the interesting,

link |

or to me, interesting philosophical questions

link |

about how we acquire knowledge from all perspectives,

link |

from math, from art, from physics,

link |

and ask how can we know anything?

link |

And the book is called Theory of Knowledge.

link |

So is this almost like a philosophical exploration

link |

of how we get knowledge from anything?

link |

Yes, yeah, I mean, can religion tell us, you know,

link |

something about the world?

link |

Can science tell us something about the world?

link |

Can mathematics, or is it just playing with symbols?

link |

And, you know, these are open ended questions.

link |

And, I mean, it's for high school students,

link |

so it draws on references from Hitchhiker's Guide

link |

to the Galaxy and from Star Wars

link |

and The Chicken Crossed the Road, yeah.

link |

And it's fun to read, but it's also quite deep.

link |

If you could live one day of your life over again,

link |

because it made you truly happy?

link |

Or maybe like we said with the books,

link |

it was truly transformative.

link |

What day, what moment would you choose?

link |

Does something pop into your mind?

link |

Does it need to be a day in the past,

link |

or can it be a day in the future?

link |

Well, spacetime is an emergent phenomenon,

link |

so it's all the same anyway.

link |

Okay, from the past.

link |

You really want to say from the future, I love it.

link |

No, I will tell you from the future, okay.

link |

So from the past, I would say

link |

when I discovered my AIXI model.

link |

I mean, it was not in one day,

link |

but it was one moment where I arrived at

link |

Kolmogorov complexity, and I didn't even know that it existed,

link |

but I discovered sort of this compression idea

link |

myself, but immediately I knew I can't be the first one,

link |

but I had this idea.

link |

And then I knew about sequential decision theory,

link |

and I knew if I put it together, this is the right thing.
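
The combination he describes, a compression-based (Solomonoff) world model plugged into sequential decision theory, is the AIXI action rule from his papers. At each cycle $k$ the agent chooses

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\left[ r_k + \cdots + r_m \right]
\sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

where $U$ is a universal Turing machine and the inner sum runs over every program $q$ of length $\ell(q)$ that reproduces the interaction history, so shorter, more compressed explanations of the world dominate the expected reward.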

link |

And yeah, still when I think back about this moment,

link |

I'm super excited about it.

link |

Were there any more details and context to that moment?

link |

Did an apple fall on your head?

link |

So it was like, if you look at Ian Goodfellow

link |

talking about GANs, there was beer involved.

link |

Is there some more context to what sparked your thought?

link |

No, it was much more mundane.

link |

So I worked in this company.

link |

So in this sense, the four and a half years

link |

were not completely wasted.

link |

And I worked on an image interpolation problem,

link |

and I developed some quite neat new interpolation techniques,

link |

and they got patented, which happens quite often.

link |

I went sort of overboard and thought,

link |

yeah, that's pretty good, but it's not the best.

link |

So what is the best possible way of doing interpolation?

link |

And then I thought, yeah, you want the simplest picture,

link |

which, if you coarse grain it,

link |

recovers your original picture.

link |

And then I thought about the simplicity concept

link |

more in quantitative terms,

link |

and then everything developed.
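
The principle he describes, among all fine-grained pictures consistent with the coarse one, prefer the simplest, can be sketched with compressed length as a crude stand-in for simplicity. This is a toy 1D illustration of the idea, not the patented technique:

```python
import random
import zlib

def coarse_grain(fine):
    """The 'camera': average each adjacent pair of fine values."""
    return [(fine[i] + fine[i + 1]) // 2 for i in range(0, len(fine), 2)]

def complexity(fine):
    """Crude stand-in for Kolmogorov complexity: compressed length."""
    return len(zlib.compress(bytes(fine)))

coarse = [2] * 32 + [6] * 32          # the original low-resolution signal

# Two candidate upsamplings, both of which coarse-grain back to the original:
flat = [c for c in coarse for _ in range(2)]   # simplest: repeat each value
rng = random.Random(0)
noisy = []
for c in coarse:                               # same pair averages, but with
    d = rng.randint(-2, 2)                     # spurious invented detail
    noisy.extend([c + d, c - d])

# The simplicity principle picks the consistent candidate that compresses best.
best = min([flat, noisy], key=complexity)
```

Both candidates satisfy the consistency constraint, but the flat one has a much shorter description, so the simplicity criterion selects it over the noisy one.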

link |

And somehow that beautiful mix

link |

of also being a physicist

link |

and thinking about the big picture of it,

link |

then led you to think big with AIXI.

link |

So as a physicist, I was probably trained

link |

not to always think in computational terms,

link |

but to just ignore that and think about

link |

the fundamental properties, which you want to have.

link |

So what about if you could relive one day in the future?

link |

What would that be?

link |

When I solve the AGI problem.

link |

In practice. So in theory,

link |

I have solved it with the AIXI model, but in practice.

link |

And then I ask the first question.

link |

What would be the first question?

link |

What's the meaning of life?

link |

I don't think there's a better way to end it.

link |

Thank you so much for talking today.

link |

It's a huge honor to finally meet you.

link |

Yeah, thank you too.

link |

It was a pleasure of mine too.

link |

And now let me leave you with some words of wisdom

link |

from Albert Einstein.

link |

The measure of intelligence is the ability to change.

link |

Thank you for listening and hope to see you next time.