
Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence | Lex Fridman Podcast #71



link |
00:00:00.000
The following is a conversation with Vladimir Vapnik.
link |
00:00:03.200
Part two, the second time we spoke on the podcast.
link |
00:00:07.280
He's the co-inventor of support vector machines,
link |
00:00:09.760
support vector clustering, VC theory,
link |
00:00:12.080
and many foundational ideas in statistical learning.
link |
00:00:14.960
He was born in the Soviet Union,
link |
00:00:17.280
worked at the Institute of Control Sciences in Moscow,
link |
00:00:20.240
then in the US, worked at AT&T, NEC Labs,
link |
00:00:24.640
Facebook AI Research,
link |
00:00:26.080
and now is a professor at Columbia University.
link |
00:00:29.360
His work has been cited over 200,000 times.
link |
00:00:32.880
The first time we spoke on the podcast was just over a year ago,
link |
00:00:36.400
one of the early episodes.
link |
00:00:38.880
This time, we spoke after a lecture he gave titled
link |
00:00:41.760
Complete Statistical Theory of Learning,
link |
00:00:44.240
as part of the MIT series of lectures on deep learning and AI that I organized.
link |
00:00:50.000
I'll release the video of the lecture in the next few days.
link |
00:00:53.520
This podcast and lecture are independent from each other,
link |
00:00:56.720
so you don't need one to understand the other.
link |
00:00:59.280
The lecture is quite technical and math heavy,
link |
00:01:02.960
so if you do watch both,
link |
00:01:04.240
I recommend listening to this podcast first,
link |
00:01:06.640
since the podcast is probably a bit more accessible.
link |
00:01:11.120
This is the Artificial Intelligence Podcast.
link |
00:01:13.760
If you enjoy it, subscribe on YouTube,
link |
00:01:15.920
give it five stars on Apple Podcasts,
link |
00:01:17.760
support it on Patreon,
link |
00:01:18.960
or simply connect with me on Twitter,
link |
00:01:20.960
at Lex Fridman, spelled F R I D M A N.
link |
00:01:23.920
As usual, I'll do one or two minutes of ads now,
link |
00:01:27.040
and never any ads in the middle that can break the flow of the conversation.
link |
00:01:30.560
I hope that works for you,
link |
00:01:32.080
and doesn't hurt the listening experience.
link |
00:01:35.440
This show is presented by Cash App,
link |
00:01:37.280
the number one finance app in the App Store.
link |
00:01:39.520
When you get it, use code LexPodcast.
link |
00:01:42.480
Cash App lets you send money to friends, buy Bitcoin,
link |
00:01:45.920
and invest in the stock market with as little as $1.
link |
00:01:48.640
Brokerage services are provided by Cash App Investing,
link |
00:01:51.680
a subsidiary of Square, and member SIPC.
link |
00:01:56.240
Since Cash App allows you to send and receive money digitally peer to peer,
link |
00:02:00.320
and security in all digital transactions is very important,
link |
00:02:03.440
let me mention the PCI Data Security Standard,
link |
00:02:06.560
PCI DSS Level 1, that Cash App is compliant with.
link |
00:02:12.000
I'm a big fan of standards for safety and security,
link |
00:02:15.120
and PCI DSS is a good example of that,
link |
00:02:18.720
where a bunch of competitors got together
link |
00:02:20.800
and agreed that there needs to be a global standard
link |
00:02:23.360
around the security of transactions.
link |
00:02:25.680
Now, we just need to do the same for autonomous vehicles
link |
00:02:28.720
and AI systems in general.
link |
00:02:31.200
So again, if you get Cash App from the App Store,
link |
00:02:33.760
or Google Play, and use the code LexPodcast,
link |
00:02:37.120
you get $10, and Cash App will also donate $10 to FIRST,
link |
00:02:41.040
one of my favorite organizations that is helping to advance
link |
00:02:44.800
robotics and STEM education for young people around the world.
link |
00:02:48.000
And now, here's my conversation with Vladimir Vapnik.
link |
00:02:55.200
You and I talked about Alan Turing yesterday, a little bit,
link |
00:02:59.680
and that he, as the father of artificial intelligence,
link |
00:03:02.640
may have instilled in our field an ethic of engineering,
link |
00:03:05.600
and not science, seeking more to build intelligence
link |
00:03:09.440
rather than to understand it.
link |
00:03:12.000
What do you think is the difference between these two paths
link |
00:03:14.560
of engineering intelligence and the science of intelligence?
link |
00:03:20.960
It's a completely different story.
link |
00:03:23.520
Engineering is imitation of human activity.
link |
00:03:28.320
You have to make a device which behaves as human behavior,
link |
00:03:35.200
have all the functions of human.
link |
00:03:38.960
It does not matter how you do it.
link |
00:03:40.640
But to understand what is intelligence about is quite a different problem.
link |
00:03:48.880
So, I think, I believe that it's somehow related to predicate we talked yesterday about,
link |
00:03:57.840
because look at Vladimir Propp's idea.
link |
00:04:04.640
He just found 31 predicates.
link |
00:04:12.560
He called it units, which can explain human behavior,
link |
00:04:19.040
at least in Russian tales.
link |
00:04:20.640
He looked at Russian tales and derived from that,
link |
00:04:24.800
and then people realized that it's more wide than in Russian tales.
link |
00:04:29.440
It is in TV, in movie serials, and so on and so on.
link |
00:04:33.840
You're talking about Vladimir Propp, who in 1928 published a book,
link |
00:04:39.920
Morphology of the Folktale, describing 31 predicates that have this kind of sequential
link |
00:04:48.720
structure that a lot of the stories and narratives follow in Russian folklore and other content.
link |
00:04:54.880
We'll talk about it.
link |
00:04:55.920
I'd like to talk about predicates in a focused way,
link |
00:04:59.040
but if you allow me to stay zoomed out on our friend Alan Turing, and he inspired a generation
link |
00:05:07.440
with the imitation game.
link |
00:05:12.800
If we can linger on that a little bit longer, do you think learning to imitate
link |
00:05:20.400
intelligence can get us closer to understanding intelligence?
link |
00:05:24.800
So, why do you think imitation is so far from understanding?
link |
00:05:32.560
I think that it is different because you have different goals.
link |
00:05:37.440
So, your goal is to create something, something useful, and that is great.
link |
00:05:45.760
And you can see how much things was done, and I believe that it will be done even more.
link |
00:05:52.000
It's self driving cars and also this business.
link |
00:05:56.640
It is great, and it was inspired by Turing's vision, but understanding is very difficult.
link |
00:06:05.280
It's more or less philosophical category.
link |
00:06:08.160
What means understanding the world?
link |
00:06:11.040
I believe in a scheme which starts from Plato, that there exists a world of ideas.
link |
00:06:18.160
I believe that intelligence, it is a world of ideas, but it is a world of pure ideas.
link |
00:06:25.040
And when you combine them with reality things, it creates, as in my case,
link |
00:06:34.560
invariants, which are very specific, and that's, I believe, the combination of ideas
link |
00:06:45.040
in a way of constructing invariants is intelligence.
link |
00:06:49.680
But first of all, predicate.
link |
00:06:53.280
If you know predicate, and hopefully then, not too much predicate exists.
link |
00:07:00.640
For example, 31 predicate for human behavior, it is not a lot.
link |
00:07:06.000
Vladimir Propp used 31, you can even call it predicate, 31 predicate to describe stories,
link |
00:07:16.320
narratives.
link |
00:07:17.440
So you think human behavior, how much of human behavior, how much of our world, our universe,
link |
00:07:24.320
all the things that matter in our existence can be summarized in predicates of the kind
link |
00:07:30.640
that Propp was working with?
link |
00:07:32.480
I think that we have a lot of form of behavior.
link |
00:07:37.680
But I think that predicate is much less, because even in these examples, which I gave
link |
00:07:43.440
you yesterday, you saw that predicate can be, one predicate can construct many different
link |
00:07:55.440
invariants, depending on your data; they're applied to different data, and they give different
link |
00:08:02.640
invariants.
link |
00:08:04.160
So, but pure ideas, maybe not so much.
link |
00:08:08.480
Not so many.
link |
00:08:09.760
I don't know about that.
link |
00:08:11.280
But my guess, I hope, that's why challenge about digit recognition, how much you need.
link |
00:08:19.440
I think we'll talk about computer vision and 2D images a little bit and your challenge.
link |
00:08:24.640
That's exactly about intelligence.
link |
00:08:28.720
That's exactly about, no, that hopes to be exactly about the spirit of intelligence
link |
00:08:35.280
in the simplest possible way.
link |
00:08:42.560
Well, there's an open question whether starting at the MNIST digit recognition
link |
00:08:48.480
is a step towards intelligence or it's an entirely different thing.
link |
00:08:51.680
I think that to beat records using 100, 200 times less examples, you need intelligence.
link |
00:09:00.560
You need intelligence.
link |
00:09:01.360
So, let's, because you use this term and it'll be nice.
link |
00:09:05.280
I'd like to ask simple, maybe even dumb questions.
link |
00:09:09.760
Let's start with a predicate.
link |
00:09:12.720
In terms of terms and how you think about it, what is a predicate?
link |
00:09:15.840
I don't know. I have a feeling, formally, they exist.
link |
00:09:22.800
But I believe that predicate for 2D images, one of them is symmetry.
link |
00:09:32.080
Hold on a second.
link |
00:09:32.800
Sorry, sorry to interrupt and put you back.
link |
00:09:36.320
At the simplest level, we're not even, we're not being profound currently.
link |
00:09:40.560
A predicate is a statement of something that is true.
link |
00:09:43.600
Yes.
link |
00:09:46.480
Do you think of predicates as somehow probabilistic in nature or is this binary,
link |
00:09:54.480
this is truly constraints of logical statements about the world?
link |
00:10:00.000
In my definition, the simplest predicate is function.
link |
00:10:04.080
Function and you can use this function to make inner product that is predicate.
link |
00:10:10.160
What's the input and what's the output of the function?
link |
00:10:13.920
Input is x, something which is input in reality.
link |
00:10:18.560
Say, if you consider digit recognition, it's pixel space, input.
link |
00:10:24.880
But it is function which in pixel space, but it can be any function from pixel space.
link |
00:10:34.240
And you choose, and I believe that there are several functions,
link |
00:10:41.520
which is important for understanding of images.
link |
00:10:46.320
One of them is symmetry.
link |
00:10:48.160
It's not so simple construction as I described with literarity, with all this stuff.
link |
00:10:54.960
But another, I believe, I don't know how many, is how well structured is picture.
link |
00:11:02.160
Structurized?
link |
00:11:04.000
Yeah.
link |
00:11:04.640
What do you mean by structurized?
link |
00:11:06.800
It is formal definition.
link |
00:11:08.960
Say, something heavy on the left corner, not so heavy in the middle and so on.
link |
00:11:16.960
You describe in general concept of what you assume.
link |
00:11:21.760
Concepts, some kind of universal concepts.
link |
00:11:25.120
Yeah, but I don't know how to formalize this.
link |
00:11:29.120
Do you?
link |
00:11:29.760
So this is the thing.
link |
00:11:30.720
There's a million ways we can talk about this.
link |
00:11:32.720
I'll keep bringing it up.
link |
00:11:33.920
But we humans have such concepts when we look at digits.
link |
00:11:40.640
But it's hard to put them, just like you're saying now, it's hard to put them into words.
link |
00:11:44.640
You know, that is example.
link |
00:11:47.680
When critics in music trying to describe music, they use predicate and not too many predicate.
link |
00:11:59.120
But in different combination, but they have some special words for describing music.
link |
00:12:08.240
And the same should be for images.
link |
00:12:12.560
But maybe there are critics who understand essence of what this image is about.
link |
00:12:19.280
Do you think there exists critics who can summarize the essence of images, human beings?
link |
00:12:29.280
I hope so, yes, but that explicitly state them on paper.
link |
00:12:37.360
The fundamental question I'm asking is, do you think there exists a small set of predicates
link |
00:12:46.080
that will summarize images? It feels to our mind, like it does, that the concept of what
link |
00:12:52.880
makes a two and a three and a four.
link |
00:12:55.840
No, no, no.
link |
00:12:56.480
It's not on this level.
link |
00:13:00.720
What it should not describe, two, three, four.
link |
00:13:04.880
It describes some construction which allows you to create invariants.
link |
00:13:10.560
And invariants, sorry to stick on this, but terminology.
link |
00:13:15.360
Invariants, it is, it is property of your image.
link |
00:13:24.720
Say, I can say, looking at my image, it is more or less symmetric and I can give you
link |
00:13:32.800
value of symmetry, say, level of symmetry using this function which I gave yesterday.
link |
00:13:43.360
And you can describe that your image has these characteristics.
link |
00:13:51.520
Exactly in the way how musical critics describe music.
link |
00:13:56.400
So, but this is invariant applied to specific data, to specific music, to something.
link |
00:14:07.520
I strongly believe in this Plato's idea that there exists world of predicate and world of
link |
00:14:16.000
reality and predicate and reality is somehow connected and you have to do that.
link |
00:14:21.920
Let's talk about Plato a little bit. So, you draw a line from Plato to Hegel to Wigner to today.
link |
00:14:29.360
Yes.
link |
00:14:30.000
So, Plato has forms, the theory of forms.
link |
00:14:35.360
There's a world of ideas and a world of things as you talk about and there's a connection.
link |
00:14:40.320
And presumably the world of ideas is very small and the world of things is arbitrarily big.
link |
00:14:47.840
But they're all what Plato calls them like, it's a shadow.
link |
00:14:52.480
The real world is a shadow from the world of forms.
link |
00:14:54.800
Yeah, you have projection.
link |
00:14:56.640
Projection.
link |
00:14:57.280
Of world of idea.
link |
00:14:59.120
Yeah, very poetic.
link |
00:15:00.560
In reality, you can realize this projection using invariants because it is projection
link |
00:15:09.200
on specific examples, which creates specific features of specific objects.
link |
00:15:14.720
So, the essence of intelligence is while only being able to observe the world of things,
link |
00:15:24.640
try to come up with a world of ideas.
link |
00:15:26.880
Exactly.
link |
00:15:27.840
Like in this music story, intelligent musical critics knows this all this world and have
link |
00:15:33.360
a feeling about what.
link |
00:15:34.640
I feel like that's a contradiction, intelligent music critics.
link |
00:15:38.800
But I think music is to be enjoyed in all its forms.
link |
00:15:47.520
The notion of critic, like a food critic.
link |
00:15:49.840
No, I don't want touching emotion.
link |
00:15:52.160
That's an interesting question.
link |
00:15:53.520
There's emotion, there's certain elements of the human psychology, of the human experience
link |
00:16:00.080
which seem to almost contradict intelligence and reason.
link |
00:16:04.560
Emotion, like emotion, like fear, like love, all of those things, are those not connected
link |
00:16:12.720
in any way to the space of ideas?
link |
00:16:16.320
Yes, I don't know.
link |
00:16:19.600
I just want to be concentrated on very simple story, on digit recognition.
link |
00:16:27.760
So, you don't think you have to love and fear death in order to recognize digits?
link |
00:16:31.680
I don't know, because it's so complicated.
link |
00:16:36.080
It involves a lot of stuff which I never consider.
link |
00:16:40.800
But I know about digit recognition.
link |
00:16:44.000
And I know that for digit recognition, to get the records from small number of observations,
link |
00:16:56.480
you need predicate, but not special predicate for this problem.
link |
00:17:03.440
But universal predicate, which understand the world of images.
link |
00:17:08.240
Of visual information.
link |
00:17:09.760
Visual, yes.
link |
00:17:11.280
But on the first step, they understand the world of handwritten digits or characters or something simple.
link |
00:17:21.440
So, like you said, symmetry is an interesting one.
link |
00:17:23.840
That's what I think one of the predicates related to symmetry, the level of symmetry.
link |
00:17:30.800
Okay, degree of symmetry.
link |
00:17:32.000
So, you think symmetry at the bottom is a universal notion and there's
link |
00:17:39.360
degrees of a single kind of symmetry, or is there many kinds of symmetries?
link |
00:17:44.000
Many kinds of symmetries.
link |
00:17:45.840
There is a symmetry, anti symmetry, say letter S.
link |
00:17:50.320
So, it has vertical anti symmetry.
link |
00:17:58.320
And it could be diagonal symmetry, vertical symmetry.
link |
00:18:02.560
So, when you cut vertically the letter S.
link |
00:18:07.680
Yeah, then the upper part and lower part in different directions.
link |
00:18:15.200
Yeah, inverted along the y axis.
link |
00:18:18.320
Yeah.
link |
00:18:18.800
But that's just like one example of symmetry, right?
link |
00:18:21.120
Isn't there like...
link |
00:18:21.840
All right, but there is a degree of symmetry.
link |
00:18:26.240
If you play all this derivative stuff to do tangent distance,
link |
00:18:34.160
but whatever I describe, you can have a degree of symmetry.
link |
00:18:40.400
And that is what describing reason of image.
link |
00:18:45.120
It is the same as you will describe this image, saying about digits.
link |
00:18:54.800
It has anti symmetry, digits three, symmetric, more or less look for symmetry.
link |
00:19:03.440
Do you think such concepts like symmetry, predicates like symmetry,
link |
00:19:08.480
is it a hierarchical set of concepts?
link |
00:19:12.880
Or are these independent, distinct predicates that we want to discover some set of?
link |
00:19:21.920
There is idea of symmetry.
link |
00:19:24.160
And you can...
link |
00:19:25.200
This idea of symmetry make very general, like degree of symmetry.
link |
00:19:35.120
If degree of symmetry can be zero, no symmetry at all.
link |
00:19:38.640
Or degree of symmetry, say, more or less symmetrical, but you have one of these descriptions.
link |
00:19:48.160
And symmetry can be different, as I told, horizontal, vertical, diagonal,
link |
00:19:53.600
and anti symmetry is also concept of symmetry.
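The "degree of symmetry" described here can be illustrated with a small sketch. The exact function from the lecture is not given in this conversation, so the score below (a normalized correlation between an image and its mirror) is only an assumed stand-in, and the names `symmetry_degree` and `diagonal_symmetry_degree` are hypothetical.

```python
import numpy as np

def symmetry_degree(image, axis=1):
    """Normalized correlation between an image and its mirror along `axis`.

    Assumed illustrative score, not the function from the lecture.
    Returns a value in [-1, 1]: 1.0 for a perfectly mirror-symmetric image,
    -1.0 for a perfectly anti-symmetric one, values in between otherwise.
    axis=1 mirrors left-right, axis=0 mirrors top-bottom.
    """
    img = np.asarray(image, dtype=float)
    mirrored = np.flip(img, axis=axis)
    energy = np.sum(img * img)
    if energy == 0.0:
        return 0.0  # blank image: scored as 0 by convention
    return float(np.sum(img * mirrored) / energy)

def diagonal_symmetry_degree(image):
    """Same score, mirroring across the main diagonal (square images only)."""
    img = np.asarray(image, dtype=float)
    energy = np.sum(img * img)
    if energy == 0.0:
        return 0.0
    return float(np.sum(img * img.T) / energy)
```

A single number per image and per kind of symmetry (horizontal, vertical, diagonal) is what makes it possible to average the property over a training set later on.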
link |
00:19:58.960
What about shape in general?
link |
00:20:00.880
I mean, symmetry is a fascinating notion, but...
link |
00:20:03.760
No, no, I'm talking about digit.
link |
00:20:06.320
I would like to concentrate on all...
link |
00:20:08.800
I would like to know predicate for digit recognition.
link |
00:20:12.080
Yes, but symmetry is not enough for digit recognition, right?
link |
00:20:16.800
It is not necessarily for digit recognition.
link |
00:20:19.920
It helps to create invariant, which you can use when you will have examples for digit recognition.
link |
00:20:36.240
When you have problem of digit recognition, you have examples of the first class or second class.
link |
00:20:41.520
Plus, you know that there exists concept of symmetry.
link |
00:20:45.680
And you apply, when you're looking for decision rule, you will apply concept of symmetry,
link |
00:20:55.200
of this level of symmetry, which you estimate from me.
link |
00:21:00.000
So let's talk. Everything comes from weak convergence.
link |
00:21:06.480
What is convergence?
link |
00:21:07.680
What is weak convergence?
link |
00:21:09.120
What is strong convergence?
link |
00:21:11.360
I'm sorry, I'm going to do this to you.
link |
00:21:13.200
What are we converging from and to?
link |
00:21:16.000
You're converging...
link |
00:21:18.160
You would like to have a function.
link |
00:21:20.400
The function, which, say, indicator function, which indicate your digit 5, for example.
link |
00:21:29.840
A classification task.
link |
00:21:31.360
Let's talk only about classification.
link |
00:21:33.520
So classification means you will say, whether this is a 5 or not,
link |
00:21:38.480
or say which of the 10 digits it is.
link |
00:21:40.480
All right, all right.
link |
00:21:42.000
I would like to have these functions.
link |
00:21:46.480
Then I have some examples.
link |
00:21:52.000
I can consider property of these examples.
link |
00:22:00.960
Say symmetry.
link |
00:22:02.480
And I can measure level of symmetry for every digit.
link |
00:22:07.840
And then I can take average from my training data.
link |
00:22:15.680
And I will consider only functions of conditional probability,
link |
00:22:23.840
which I'm looking for my decision rule, which applying to digits
link |
00:22:36.320
will give me the same average as I observe on training data.
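One way to read this selection rule is as a check on a candidate decision rule: the predicate, averaged against the rule's predicted probabilities, should match the predicate averaged against the observed labels. The sketch below assumes binary labels (1 for "is the digit", 0 otherwise); the helper names `invariant_gap` and `is_admissible` and the tolerance `tol` are hypothetical, chosen only for illustration.

```python
import numpy as np

def invariant_gap(psi_values, labels, predicted_probs):
    """Difference between the predicate average under the model and under the data.

    psi_values:      predicate value psi(x_i) for each training example
    labels:          observed labels y_i in {0, 1}
    predicted_probs: candidate rule's estimate of P(y = 1 | x_i)
    """
    psi = np.asarray(psi_values, dtype=float)
    y = np.asarray(labels, dtype=float)
    p = np.asarray(predicted_probs, dtype=float)
    return float(np.mean(psi * p) - np.mean(psi * y))

def is_admissible(psi_values, labels, predicted_probs, tol=1e-2):
    """A candidate rule is kept only if it reproduces the training average."""
    return abs(invariant_gap(psi_values, labels, predicted_probs)) <= tol
```

For example, with predicate values `[0.9, 0.1, 0.8, 0.2]` and labels `[1, 0, 1, 0]`, a rule that predicts the labels exactly passes the check, while a rule that predicts 0.5 everywhere does not.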
link |
00:22:40.480
So actually, this is different level of description of what you want.
link |
00:22:48.400
You want not just, you show not one digit, you show this predicate, show general property
link |
00:22:59.680
of all digits, which you have in mind.
link |
00:23:03.520
If you have in mind digits 3, it gives you property of digits 3.
link |
00:23:09.520
And you select as admissible set of function, only function, which keeps this property.
link |
00:23:16.880
You will not consider other functions.
link |
00:23:20.720
So you're immediately looking for smaller subset of function.
link |
00:23:24.800
That's what you mean by admissible functions.
link |
00:23:26.640
You look at admissible function, exactly.
link |
00:23:28.320
Which is still a pretty large for the number 3.
link |
00:23:32.480
It is pretty large, but if you have one predicate.
link |
00:23:35.600
But according to, there is a strong and weak convergence.
link |
00:23:42.640
Strong convergence is convergence in functions.
link |
00:23:46.240
You're looking at one function, and you're looking at another function.
link |
00:23:51.760
And square difference from them should be small.
link |
00:23:57.920
If you take difference in any points, make a square, make an integral, and it should be small.
link |
00:24:05.520
That is convergence in functions.
link |
00:24:07.840
Suppose you have some function, any function.
link |
00:24:11.200
So I would say, I say that some function converge to this function.
link |
00:24:17.760
If integral from square difference between them is small.
link |
00:24:22.720
That's the definition of strong convergence.
link |
00:24:24.640
That definition of strong convergence.
link |
00:24:25.600
Two functions, the integral of the squared difference is small.
link |
00:24:28.800
It is convergence in functions.
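Written out, the definition of strong convergence given here would read roughly as follows. The sequence notation f_n is an assumption added for illustration; f_0 is the target function named later in the conversation.

```latex
\[
  \lim_{n \to \infty} \int \bigl( f_n(x) - f_0(x) \bigr)^2 \, dx = 0
\]
```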
link |
00:24:32.160
But you have different convergence in functionals.
link |
00:24:36.560
You take any function, you take some function phi,
link |
00:24:41.040
and take inner product.
link |
00:24:42.480
This function is f function, f0 function, which you want to find.
link |
00:24:50.240
And that gives you some value.
link |
00:24:51.760
So you say that set of functions converge in inner product to this function.
link |
00:25:02.960
If this value of inner product converge to value f0, that is for one phi.
link |
00:25:12.400
But weak convergence requires that it converge for any function of Hilbert space.
link |
00:25:19.440
If it converge for any function of Hilbert space, then you would say that this is weak convergence.
link |
00:25:26.960
You can think that when you take integral, that is property, integral property of function.
link |
00:25:34.560
For example, if you will take sine or cosine, it is coefficient of, say, Fourier expansion.
link |
00:25:42.960
So if it converge for all coefficients of Fourier expansion, so under some condition,
link |
00:25:54.160
it converge to function you're looking for.
link |
00:25:58.000
But weak convergence means any property.
link |
00:26:02.640
Convergence not point wise, but integral property of function.
link |
00:26:08.320
So weak convergence means integral property of functions.
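As a sketch of the definition just described, weak convergence asks that every inner product with a test function phi converges, rather than the functions themselves. The sequence f_n and the choice of L_2 as the Hilbert space are assumptions made for illustration.

```latex
\[
  \lim_{n \to \infty} \langle \phi, f_n \rangle = \langle \phi, f_0 \rangle
  \quad \text{for all } \phi \in L_2,
  \qquad \text{where } \langle \phi, f \rangle = \int \phi(x)\, f(x) \, dx .
\]
```

Taking phi(x) = sin(kx) or phi(x) = cos(kx) turns each inner product into a Fourier coefficient, which is the example mentioned above. A predicate, in this reading, is one particular choice of phi.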
link |
00:26:13.680
When I'm talking about predicate, I would like to formulate which integral properties
link |
00:26:23.120
I would like to have for convergence.
link |
00:26:27.840
So, and if I will take one predicate, it's a function which measures a property,
link |
00:26:35.200
if I will use one predicate and say I will consider only function
link |
00:26:43.760
which give me the same value as this predicate, I selecting set of functions
link |
00:26:51.920
from functions which is admissible in the sense that function which I looking for
link |
00:26:58.800
in this set of functions because I checking in training data, it gives the same.
link |
00:27:08.640
Yeah, so it always has to be connected to the training data in terms of...
link |
00:27:12.560
Yeah, but property, you can know independent on training data and this guy Propp.
link |
00:27:21.120
Yeah.
link |
00:27:21.760
So there is formal property, 31 property and...
link |
00:27:25.200
For fairy tale, Russian fairy tale.
link |
00:27:27.040
Yeah, but Russian fairy tale is not so interesting.
link |
00:27:30.480
More interesting that people apply this to movies, to theater, to different things and
link |
00:27:38.640
the same works, they're universal.
link |
00:27:41.920
Well, so I would argue that there's a little bit of a difference between
link |
00:27:47.600
the kinds of things that were applied to which are essentially stories and digit recognition.
link |
00:27:54.160
It is the same story.
link |
00:27:55.200
You're saying digits, there's a story within the digit.
link |
00:27:59.600
Yeah, so but my point is why I hope that it possible to beat record using not 60,000
link |
00:28:11.360
but say 100 times less because instead you will give predicate and you will select your decision
link |
00:28:21.040
not from wide set of functions, but from set of functions which keep this predicate.
link |
00:28:27.920
But predicate is not related just to digit recognition.
link |
00:28:32.720
Right, so...
link |
00:28:33.760
Like in Plato's case.
link |
00:28:37.600
Do you think it's possible to automatically discover the predicates?
link |
00:28:42.080
So you basically said that the essence of intelligence is the discovery of good predicates.
link |
00:28:48.800
Yeah, now the natural question is
link |
00:28:54.960
you know that's what Einstein was good at doing in physics.
link |
00:28:58.880
Can we make machines do these kinds of discovery of good predicates?
link |
00:29:04.320
Or is this ultimately a human endeavor?
link |
00:29:07.600
That's I don't know.
link |
00:29:08.400
I don't think that machine can do because according to theory about weak convergence any function
link |
00:29:19.760
from Hilbert space can be predicate.
link |
00:29:23.120
So you have infinite number of predicate, and a priori you don't know which predicate is good
link |
00:29:31.840
and which not. But what Propp showed, and why people call it a breakthrough, is that there are not too many
link |
00:29:43.520
predicate which cover most of situation happened in the world.
link |
00:29:51.200
So there's a sea of predicates, and only a small amount are useful for the kinds of
link |
00:29:58.320
things that happen in the world.
link |
00:29:59.680
I think that I would say only small part of predicate very useful.
link |
00:30:08.640
Useful all of them.
link |
00:30:11.280
Only very few are what we should let's call them good predicates.
link |
00:30:15.360
Very good predicates.
link |
00:30:16.560
Very good predicates.
link |
00:30:18.160
So can we linger on it?
link |
00:30:20.720
What's your intuition?
link |
00:30:21.760
Why is it hard for a machine to discover good predicates?
link |
00:30:26.800
I even in my talk described how to do predicate, how to find new predicate.
link |
00:30:32.560
I'm not sure that it is very good.
link |
00:30:34.880
What did you propose in your talk?
link |
00:30:36.560
No, in my talk I gave example for diabetes.
link |
00:30:43.600
When we achieve some percent so then we're looking for area where some sort of predicate
link |
00:30:52.240
which I formulate does not keeps invariant.
link |
00:31:03.040
So if it doesn't keep, I retrain on my data; I select only functions which keep this invariant
link |
00:31:10.960
and when I did it I improve my performance.
link |
00:31:14.320
I can looking for this predicate.
link |
00:31:16.400
I know technically how to do that, and you can of course do it using machine, but I'm not
link |
00:31:26.720
sure that we will construct the smartest predicate.
link |
00:31:30.800
Well, this is the, allow me to linger on it, because that's the essence, that's the challenge,
link |
00:31:36.160
that is artificial, that's the human-level intelligence that we seek, is the discovery of
link |
00:31:41.680
these good predicates. You've talked about deep learning as a way to, the predicates they use
link |
00:31:49.280
and the functions are mediocre.
link |
00:31:52.800
We can find better ones.
link |
00:31:54.960
Let's talk about deep learning.
link |
00:31:57.280
Sure let's do it.
link |
00:31:58.000
I know only Yann LeCun's convolutional network, and what else I don't know, and it's a very
link |
00:32:06.560
simple convolution.
link |
00:32:07.840
There's not much else to know.
link |
00:32:09.040
Left and right yes I can do it like that one is one predicate it is convolution is a single
link |
00:32:15.920
predicate it's single it's single predicate yes but you know exactly you take the derivative
link |
00:32:25.280
for translation and predicate should be kept.
link |
00:32:30.960
So that's a single predicate but humans discovered that one or at least
link |
00:32:34.560
not that is a risk not too many predicates and that is big story because Yann did it 25
link |
00:32:42.880
years ago and nothing so clear was added to deep network and then I don't understand
link |
00:32:54.880
why we should talk about deep network instead of talking about piecewise linear functions
link |
00:33:01.120
which keep this predicate.
link |
00:33:02.720
Well the you know a counter argument is that maybe the amount of predicates necessary
link |
00:33:11.040
to solve general intelligence say in space of images doing efficient recognition of
link |
00:33:19.280
handwritten digits is very small and so we shouldn't be so obsessed about finding
link |
00:33:25.600
we'll find other good predicates like convolution for example you know there there has been other
link |
00:33:32.960
advancements, like if you look at the work with attention, there's attention mechanisms
link |
00:33:39.360
especially used in natural language, focusing the network's ability to
link |
00:33:45.120
learn at which part of the input to look at.
link |
00:33:47.520
The thing is there's other things besides predicates that are important for the actual
link |
00:33:53.280
engineering mechanism of showing how much you can really do given such these predicates.
link |
00:34:02.000
I mean that's essentially the work of deep learning is constructing architectures
link |
00:34:07.040
that are able to be given the training data to be able to converge towards
link |
00:34:17.440
a function that can approximate can generalize well.
link |
00:34:21.040
It's an engineering problem.
link |
00:34:24.320
Yeah I understand but let's talk not on emotional level but on a mathematical level.
link |
00:34:31.760
You have set of piecewise linear functions.
link |
00:34:36.320
It is all possible neural networks.
link |
00:34:41.840
It's just piecewise linear functions.
link |
00:34:43.840
There's many many pieces.
link |
00:34:45.280
Large number of piecewise linear functions.
link |
00:34:47.520
Exactly but very large.
link |
00:34:49.280
Very large but it's still simpler than say convolution than reproducing
link |
00:34:57.440
kernel Hilbert space which have a Hilbert set of functions.
link |
00:35:00.720
What's Hilbert space?
link |
00:35:02.800
It's space with infinite number of coordinates a function for expansion something like that.
link |
00:35:11.680
So it's much richer so and when I talking about closed form solution I talking about
link |
00:35:18.960
this set of function not piecewise linear set which is particular case.
link |
00:35:29.440
It is small part.
link |
00:35:30.720
So neural networks is a small part of the space of functions you're talking about.
link |
00:35:34.960
Small small say small set of functions.
link |
00:35:40.400
But it is fine.
link |
00:35:41.920
It is fine.
link |
00:35:42.640
I don't want to to discuss the small or big take advantage.
link |
00:35:47.760
So you have some set of functions.
link |
00:35:50.800
So now when you're trying to create architecture you would like to create admissible set of functions
link |
00:35:58.960
all your tricks to use not all functions but some subset of this set of functions.
link |
00:36:07.120
Say when you're introducing convolutional net it is way to make this subset useful for you.
link |
00:36:15.200
But for my point of view convolutional it is something you want to keep some invariance
link |
00:36:24.640
say translation invariance.
link |
00:36:27.920
But now if you understand this and you cannot explain on the level of ideas what neural network does
link |
00:36:39.280
you should agree that it is much better to have a set of functions and they say this set of functions
link |
00:36:49.440
should be admissible it must keep this invariant this invariant and that invariant.
link |
00:36:55.120
You know that as soon as you incorporate new invariant set of function becomes smaller
link |
00:37:00.560
and smaller and smaller.
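The shrinking effect described here can be sketched on a toy problem. This is a minimal illustration, not the method from the lecture: the data points, the three predicates, and the helper name `keeps_invariant` are all hypothetical, and "keeping an invariant" is modeled (as an assumption) as requiring that the predicate-weighted sum of a function's outputs equals the predicate-weighted sum of the labels.

```python
import itertools

# Toy data: four 1-D points with binary labels (hypothetical, for illustration).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, 0, 0, 1]

# Candidate set: every possible binary labelling of the four points (2**4 = 16).
candidates = list(itertools.product([0, 1], repeat=len(xs)))


def keeps_invariant(f_values, psi, tol=1e-9):
    """Assumed form of an invariant: sum psi(x_i) f(x_i) == sum psi(x_i) y_i."""
    lhs = sum(psi(x) * f for x, f in zip(xs, f_values))
    rhs = sum(psi(x) * y for x, y in zip(xs, ys))
    return abs(lhs - rhs) <= tol


# Three hypothetical predicates: constant, first moment, second moment.
predicates = [
    lambda x: 1.0,
    lambda x: x,
    lambda x: x * x,
]

admissible = candidates
sizes = [len(admissible)]
for psi in predicates:
    # Each new invariant can only remove functions from the admissible set.
    admissible = [f for f in admissible if keeps_invariant(f, psi)]
    sizes.append(len(admissible))

print(sizes)       # size of the admissible set after each predicate
print(admissible)  # functions that keep all three invariants
```

In this toy setup the set goes from 16 candidates to 6, then 2, then 1, so each added predicate leaves fewer functions to choose from.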
link |
00:37:02.000
But all the invariants are specified by you the human.
link |
00:37:05.200
Yeah but what I am hope that there is a standard predicate like Propp show
link |
00:37:15.440
that what that's what I want to find for digit recognition if we start it is completely new
link |
00:37:22.400
area what is intelligence about on the level starting from Plato's idea what is world of ideas.
link |
00:37:30.400
So and I believe that it's not too many.
link |
00:37:35.520
Yeah but you know it is amusing that mathematician doing something in neural network
link |
00:37:42.160
in general function but people from literature from art they use this all the time that's right
link |
00:37:50.000
invariance saying say it is great how how people describe music we should learn from that
link |
00:37:58.080
and something on this level but so why Vladimir Propp who was just theoretical who studied
link |
00:38:09.680
theoretical literature he found that.
link |
00:38:13.120
You know what let me throw that right back at you because there's a little bit of a
link |
00:38:17.200
that's less mathematical and more emotional philosophical Vladimir Propp I mean he wasn't
link |
00:38:23.840
doing math no and you just said another emotional statement which is you believe that this
link |
00:38:32.480
Plato world of ideas is small.
link |
00:38:35.760
I hope I hope do you do what's your intuition though if we can linger on it.
link |
00:38:43.120
You know it is not just small or big I know exactly then when I introducing
link |
00:38:54.880
some predicate I decrease set of functions but my goal to decrease set of function much
link |
00:39:03.600
by as much as possible by as much as possible good predicate which does this
link |
00:39:10.160
then I should choose next predicate which does this decrease set as much as possible
link |
00:39:17.120
so set of good predicate it is such that they decrease this amount of admissible function.
link |
00:39:27.680
So if each good predicate significantly reduces the set of admissible functions then
link |
00:39:32.640
there naturally should not be that many yeah predicates.
link |
00:39:35.440
No but but if you reduce very well the VC dimension of the function of admissible set
link |
00:39:44.800
of function is small and you need not too much training data to do well.
link |
00:39:52.880
And VC dimension by the way is some measure of capacity of this set of functions.
link |
00:39:57.600
Right how roughly speaking how many functions in this set so you're decreasing decreasing
link |
00:40:03.840
and it makes it easy for you to find function you're looking for.
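The link between a small VC dimension and needing less training data can be made concrete with one classical form of the VC generalization bound for classification. The symbols are notation assumed for this sketch and do not come from the conversation: h is the VC dimension of the admissible set, ℓ is the number of training examples, R is the expected error, R_emp is the training error, and the bound holds with probability at least 1 − η for every function in the set.

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}
```

When predicates shrink the admissible set so that h is small relative to ℓ, the square-root term is small even for modest ℓ, which is the sense in which fewer examples are needed.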
link |
00:40:10.160
So the most important part to create good admissible set of functions and it probably
link |
00:40:16.400
there are many ways but the good predicate is such that that can do that.
link |
00:40:25.760
So let's for this duck you should know a little bit about duck because what are the
link |
00:40:32.320
what are the three fundamental laws of ducks?
link |
00:40:35.280
Looks like a duck swims like a duck and quacks like a duck.
link |
00:40:38.240
You should know something about ducks to be able to.
link |
00:40:41.040
Not necessarily looks like say horse it's also good.
link |
00:40:46.400
So it's not it generalizes from ducks.
link |
00:40:49.760
And talk like and make sound like horse something and run like horse and moves like horse.
link |
00:40:57.280
It is general it is general predicate that this applied to duck but for duck you can say
link |
00:41:07.120
play chess like duck.
link |
00:41:09.760
You cannot say play chess.
link |
00:41:11.440
Why not?
link |
00:41:12.480
So you're saying you can but that would not be a good.
link |
00:41:15.680
No you will not reduce a lot of.
link |
00:41:18.080
You will not do yeah you would not reduce the set of functions.
link |
00:41:21.600
So you get the story is formal story mathematical story is that you can use any function you want
link |
00:41:29.120
like the predicate but some of them are good some of them are not because some of them reduce a
link |
00:41:34.880
lot of functions to admissible set of some of them.
link |
00:41:39.680
But the question is I'll probably keep asking this question but how do we find such
link |
00:41:45.600
what's your intuition?
link |
00:41:46.480
So for handwritten recognition how do we find the answer to your challenge?
link |
00:41:52.480
Yeah yeah I understand it like that.
link |
00:41:55.760
I understand what to find what it means a new predicate.
link |
00:42:01.680
Yeah like guy who understand music can say this word which he described when he listened to music.
link |
00:42:09.440
He understand music he use not too many different or you can do like Propp.
link |
00:42:15.440
You can make collection what he talking about music about this about that it's not too many
link |
00:42:22.000
different situations he described.
link |
00:42:24.880
Because we mentioned Vladimir Propp a bunch.
link |
00:42:26.800
Let me just mention there's a there's a sequence of 31 structural notions that are common in stories
link |
00:42:36.800
and I think he called units units and I think they resonate.
link |
00:42:40.400
I mean it starts just to give an example with absentation a member of the hero's community
link |
00:42:45.920
or family leaves the security of the home environment then it goes to the interdiction
link |
00:42:50.880
a forbidding edict or command is passed upon the hero don't go there don't do this the hero's
link |
00:42:57.040
warned against some action.
link |
00:42:58.640
Then step three violation of interdiction you know break the rules break out on
link |
00:43:06.480
your own then reconnaissance the villain makes an effort to attain knowledge needed to fulfill
link |
00:43:12.080
their plot so on it goes on like this ends ends in a wedding number 31 happily ever after.
link |
00:43:20.480
No he he he just gave description of all situations he understands this world.
link |
00:43:28.000
Of folk tales.
link |
00:43:29.200
Yeah not folk stories and this story is not in just folk tales the stories in in detective
link |
00:43:38.640
serials as well and probably in our lives we probably live read this and then they they
link |
00:43:46.560
wrote that this predicate is good for different situation from movie from for movie for theater.
link |
00:43:57.760
By the way there's also criticism right there's an other way to interpret narratives from
link |
00:44:07.600
Claude Lévi-Strauss I think I don't I I'm not in this business I know I know it's
link |
00:44:13.280
theoretical literature but it's looking at paradigms it's always the
link |
00:44:18.480
discussion yeah yeah but at least there is a unit it's not too many units that can describe
link |
00:44:26.400
but that's probably gives another unit or another way exactly another another set of units another
link |
00:44:34.560
set of predicates it does not matter but they exist probably my my question is whether given those
link |
00:44:44.480
units whether without our human brains to interpret these units they would still hold as much power
link |
00:44:52.320
as they have meaning are those units enough when we give them to the alien species let me ask you
link |
00:45:00.240
do you understand digit recognize digit images no I don't understand no no no uh when you can
link |
00:45:09.040
recognize this digit images it means that you understand yes you understand characters you
link |
00:45:16.400
understand no no no no I I it's the it's the imitation versus understanding question because
link |
00:45:26.560
I don't understand the mechanism by which I am not talking about I'm talking about predicates
link |
00:45:32.640
you understand that it involves symmetry maybe structure maybe something else I cannot formulate
link |
00:45:38.640
I just was able to find symmetries so I guess symmetries that's really good so this is a good
link |
00:45:45.680
line I feel like I understand the basic elements of what makes a good handwriting recognition system
link |
00:45:53.280
my own like symmetry connects with me it seems like that's a very powerful predicate my question is
link |
00:46:00.480
is there a lot more going on that we're not able to introspect maybe I need to be able to understand
link |
00:46:08.720
a huge amount in the world of ideas uh thousands of predicates millions of predicates in order to
link |
00:46:18.880
do handwriting recognition I don't think so so both your hope and your intuition are such
link |
00:46:26.800
that let me explain you're using digits you're using examples as well theory says that if you
link |
00:46:36.560
will use all possible functions from Hilbert space all possible predicate you don't need training data
link |
00:46:48.880
you just will have admissible set of function which contain one function yes so the tradeoff
link |
00:46:58.240
is when you're not using all predicates you're only using a few good predicates you need to
link |
00:47:03.440
have some training data yes the more the the more good predicates you have the less training
link |
00:47:09.040
data exactly that is intelligence still okay I'm going to keep asking the same dumb question
link |
00:47:17.360
handwritten recognition to solve the challenge you kind of propose a challenge that says we
link |
00:47:21.920
should be able to get state of the art MNIST error rates by using very few 60 maybe fewer
link |
00:47:30.080
examples per digit what kind of predicates do you think that is the challenge so people who will
link |
00:47:38.640
solve this problem they will answer they will answer do you think they'll be able to answer it
link |
00:47:44.640
in a human explainable way they just need to write function that's it but so can that function
link |
00:47:52.480
be written I guess by an automated reasoning system whether we're talking about a neural
link |
00:48:00.400
network learning a particular function or another mechanism no narrow I'm not against
link |
00:48:07.280
neural network I'm against admissible set of function which create neural network you did it
link |
00:48:14.400
by hand you don't you don't do it by invariance by predicate by by by reason but neural networks
link |
00:48:25.280
can then reverse do the reverse step of helping you find a function just the task of a neural
link |
00:48:32.640
network is is to find a disentangled representation for example what they call is to find that one
link |
00:48:40.400
predicate function that's really captures some kind of essence one not the entire essence but
link |
00:48:46.960
one very useful essence of this particular visual space do you think that's possible like
link |
00:48:55.280
listen I'm grasping hoping there's an automated way to find good predicates right so the question
link |
00:49:00.960
is what are the mechanisms of finding good predicates ideas that you think we should pursue
link |
00:49:07.840
a young grad student listening right now I gave example so find situation where
link |
00:49:18.800
predicate which you're suggesting don't create invariant
link |
00:49:27.120
it's like in physics find situation where existing theory cannot explain it
link |
00:49:34.080
so you're finding contradictions find contradiction and then remove this contradiction
link |
00:49:39.280
but in my case what means contradiction you find function which if you will use this function
link |
00:49:46.640
you you're not keeping invariance so really the process of discovering contradictions yeah
link |
00:50:01.920
it is like in physics find situation where you have contradiction for one of the property
link |
00:50:13.040
for one of the predicate then include this predicate making invariance and solve again
link |
00:50:19.680
this problem now you don't have contradiction but it is not
link |
00:50:24.720
the best way probably I don't know to looking for predicate that's just one way okay that no no
link |
00:50:34.240
it is brute force way the brute force way what about the ideas of some what big umbrella term
link |
00:50:43.200
of symbolic AI there's what in 80s with expert systems sort of logic reasoning based systems
link |
00:50:51.040
is there hope there to find some through sort of deductive reasoning to find good predicates
link |
00:51:05.360
I don't think so I think that just logic is not enough it's kind of a compelling notion though
link |
00:51:14.240
you know that when smart people sit in a room and reason through things it seems compelling
link |
00:51:20.240
and making our machines do the same is also compelling so everything is very simple
link |
00:51:29.280
when you have infinite number of predicate you can choose the the function you want you have
link |
00:51:38.880
invariance and you can choose the function you want but you have to have a not too many invariance
link |
00:51:51.680
to solve the problem so you have from infinite number of function to select finite number
link |
00:52:02.720
and hopefully small finite number of functions which is good enough to extract small set of admissible
link |
00:52:15.680
functions so they will be admissible it's for sure because every function just decrease set of
link |
00:52:23.280
function and leaving it admissible but it will be small but why do you think logic
link |
00:52:29.120
based systems can't help intuition not because you you should know reality you should
link |
00:52:38.000
know life this guy like Propp he knows something and he tried to to put in invariant his understanding
link |
00:52:49.200
that's the human yeah but see you're you're putting too much value into
link |
00:52:53.440
you Vladimir Propp's knowing something no it is my story is that what means you know life
link |
00:53:04.320
what it means you know common sense no no you know something common sense it is some rules
link |
00:53:13.280
you think so common sense is simply rules common sense is every it's
link |
00:53:19.840
it's mortality it's no it's it's fear of death it's love it's spirituality it's a happiness and
link |
00:53:29.840
sadness all of it is tied up into understanding gravity which is what we think of as common sense
link |
00:53:36.720
I'm not ready to discuss so what I want to discuss understand digit understand
link |
00:53:44.240
digit recognition anytime I bring up love and death you you bring it back to digit recognition
link |
00:53:50.160
I don't like it no you know it is doable because there is a challenge yeah which I see how to solve
link |
00:53:58.000
it if I will have a student concentrating this work I will suggest something to solve you mean
link |
00:54:05.040
handwritten recognition yeah it's a beautifully simple elegant and yet I think that I know
link |
00:54:11.680
invariants which will solve this you do I think so yes but it is not universal it is maybe I want
link |
00:54:22.080
some universal invariants which are good not only for digit recognition for image understanding
link |
00:54:30.560
so let me ask how hard do you think is 2d image understanding
link |
00:54:36.800
so if we we can kind of intuit handwritten recognition how big of a step leap journey is it
link |
00:54:47.600
from that if I gave you good if I solved your challenge for handwriting recognition how long
link |
00:54:53.920
would my journey then be from that to understanding more general natural images immediate you will
link |
00:55:00.480
understand this as soon as you will make a record because it is not for free as soon as you will
link |
00:55:09.440
create several invariants which will help you to get the same performance that the best neural
link |
00:55:21.120
net did using 100 times maybe more than 100 times less examples you have to have something smart
link |
00:55:30.080
to do that and you're saying that it is invariant it is predicate because you should put some idea
link |
00:55:37.440
how to do that but okay let me just pause maybe it's a trivial point maybe not but handwritten
link |
00:55:45.360
recognition feels like a 2d two dimensional problem and it seems like how much complicated is the fact
link |
00:55:55.280
that most images are projection of a three dimensional world onto a 2d plane it feels like
link |
00:56:04.000
for a three dimensional world we still we need to start understanding common sense in order to
link |
00:56:09.600
understand an image it's no longer visual shape and symmetry it's having to start to understand
link |
00:56:19.120
concepts of it understand life yeah yes yes you're you're you're talking that there are
link |
00:56:25.840
different invariant different predicates yeah and potentially much larger number you know
link |
00:56:33.200
maybe but let's start from simple okay but you said that you know I cannot think yes about things
link |
00:56:41.280
which I don't understand this I understand but I'm sure that I don't understand everything there
link |
00:56:47.600
yeah yeah that's the difference say do as simple as possible but not simpler and that is exact case
link |
00:56:56.400
with handwritten with handwritten yeah but never that's the difference between you and I I I uh
link |
00:57:04.800
I welcome and enjoy thinking about things I completely don't understand because to me it's
link |
00:57:10.720
a natural extension without having solved handwritten recognition to wonder how how difficult is the
link |
00:57:21.200
the the next step of understanding 2d 3d images because ultimately while the science of intelligence
link |
00:57:29.120
is fascinating it's also fascinating to see how that maps to the engineering of intelligence
link |
00:57:34.560
and recognizing handwritten digits is not doesn't help you it might it may not help you with the
link |
00:57:44.000
problem of general intelligence we don't know it'll help you a little bit we don't know it's unclear
link |
00:57:49.360
it's unclear yeah it might very much but I would like to make a remark yes I start not from very
link |
00:57:55.280
primitive problem make a challenge problem I start with very general problem with Plato
link |
00:58:07.520
so you understand and and it comes from Plato to digit recognition so so you basically took
link |
00:58:15.280
Plato and the world of forms and ideas and mapped and projected it into the
link |
00:58:22.800
clearest simplest formulation of that big world you know I would say that I did not understand
link |
00:58:29.920
Plato until recently and until I consider weak convergence and then predicate and then oh this
link |
00:58:42.160
is what Plato told me so linger on that like why how do you think about this world of ideas and
link |
00:58:50.160
world of things in Plato no it is metaphor it is it's the metaphor for sure it's a compelling
link |
00:58:56.400
it's a poetic and a beautiful yeah but what can you but it is a way how you you you should try to
link |
00:59:03.760
understand how to attack ideas in the world so from my point of view it is very clear but it is line
link |
00:59:14.080
all the time people looking for that say Plato and Hegel whatever reasonable it exists whatever
link |
00:59:24.560
exists it is reasonable I don't know what he have in mind reasonable right this philosophers again
link |
00:59:31.360
no no no no no no no it is it is next stop of Wigner that what you might understand something
link |
00:59:38.960
of reality it is the same Plato line and then it comes suddenly to Vladimir Propp
link |
00:59:48.000
look 31 ideas 31 units and describes everything there's abstractions ideas that represent
link |
00:59:59.040
our world and we should always try to reach into that yeah but but you should make a projection
link |
01:00:06.080
on reality but understanding is it is abstract ideas you have in your mind several abstract ideas
link |
01:00:15.680
which you can apply to reality and reality in this case sort of if you look at machine learning is
link |
01:00:20.880
data example data data okay let me let me put put this on you because I'm an emotional creature
link |
01:00:28.160
I'm not a mathematical creature like you I find compelling the idea forget this the space the sea
link |
01:00:35.200
of functions there's also a sea of data in the world and I find compelling that there might be
link |
01:00:42.080
like you said teacher small examples of data that are most useful for discovering good whether it's
link |
01:00:53.280
predicates or good functions that the selection of data may be a powerful journey a useful
link |
01:01:01.200
mechanism you know coming up with a mechanism for selecting good data might be useful too
link |
01:01:07.440
do you find this idea of finding the right data set interesting at all or do you kind of take
link |
01:01:14.960
the data set as a given I think that it is you know my scheme is very simple you have huge set of
link |
01:01:23.840
functions if you will apply and you have not too many data right if you will pick up function
link |
01:01:34.000
which describes this data you will do not very well you will randomly pick up yeah you will
link |
01:01:42.720
overfit here yeah it will be overfitting so you should decrease set of function from which
link |
01:01:50.560
you're picking up one so you should go somehow to admissible set of function and this what about
link |
01:02:00.960
weak convergence so but from another point of view to to make admissible set of function
link |
01:02:12.960
you need just indeed just function which you will take in inner product which you will measure
link |
01:02:22.800
property of your function
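The inner-product idea mentioned here can be written out as a short formula. This is a reconstruction under stated assumptions rather than a quote from the lecture: ψ stands for a predicate, (x_i, y_i) for the ℓ training pairs, and f for a candidate function; all of these symbols are introduced for this sketch.

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
  \;=\;
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i
```

A function f is treated as admissible with respect to ψ when the property measured by ψ on the function's outputs matches the same property measured on the labels. Each additional predicate adds one more such equality, which is how the admissible set gets smaller.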
link |
01:02:27.360
and that is how it works no I get it I get I understand it but do you the reality is but let
link |
01:02:35.280
let this let let's think about examples you have huge set of function if you have several examples
link |
01:02:44.560
if you just trying to keep take function which satisfies these examples you still will overfit
link |
01:02:56.400
you need decrease you need admissible set of function absolutely but what say you have
link |
01:03:02.240
more data than functions so sort of consider the I mean maybe not more data than functions
link |
01:03:09.600
because that's impossible impossible but what I was trying to be poetic for a second I mean
link |
01:03:15.280
you have a huge amount of data a huge amount of examples but amount of function can be even
link |
01:03:22.640
bigger I understand every single there's always there's always a bigger boat full
link |
01:03:27.680
Hilbert space I got you but okay but you don't you don't find the world of data to be an interesting
link |
01:03:37.520
optimization space like the the optimization should be in the space of functions
link |
01:03:44.880
creating admissible set of fun admissible set of function no you know even from the classical
link |
01:03:51.440
this is sorry from structural risk minimization you should or you should organize function in the way
link |
01:04:02.160
that they will be useful for you right and that is yeah but the the way you're thinking about
link |
01:04:11.600
useful is you're given a small set small small set of function which contain function by looking yeah
link |
01:04:21.840
but as looking for based on the empirical set of small examples yeah but that is another story I
link |
01:04:29.600
don't touch it because I I believe I believe that this small examples it's not too small
link |
01:04:36.560
so 60 per class that law of large numbers works I don't need uniform law the story is that in
link |
01:04:44.480
statistics there are two laws law of large numbers uniform law of large numbers so I want to be in
link |
01:04:52.240
situation where I use law of large numbers no but not uniform law of large numbers right so 60 is
link |
01:04:59.600
law of large it's like enough I hope no it's still need some evaluation some bounds so that's it
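The two laws being contrasted can be stated side by side. The notation is assumed for this sketch: x_1, ..., x_ℓ are independent samples, g is a single fixed function such as a predicate, and F is a whole set of functions.

```latex
% Law of large numbers: one fixed function g
\lim_{\ell \to \infty}
P\left( \left| \frac{1}{\ell}\sum_{i=1}^{\ell} g(x_i) - \mathbb{E}\, g(x) \right| > \varepsilon \right) = 0

% Uniform law of large numbers: simultaneously over all f in F
\lim_{\ell \to \infty}
P\left( \sup_{f \in F} \left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) - \mathbb{E}\, f(x) \right| > \varepsilon \right) = 0
```

The first statement only asks that the average of one chosen function be close to its expectation, which is a much weaker requirement than the second, where the average must be close for every function in the set at once.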
link |
01:05:07.760
but idea is the following that if you trust that say this average gives you something close to
link |
01:05:20.080
expectations so you can talk about that about this predicate and that is basis of human intelligence
link |
01:05:29.920
right good predicates is the discovery of good predicates is the basis of no no it is
link |
01:05:35.200
discovery of your of your understanding world of your methodology of a distance of understanding
link |
01:05:42.960
world because you have several functions which you will apply to reality can you say that again so
link |
01:05:53.440
your you have several functions yeah predicate but they abstract yes then you will apply them to
link |
01:06:02.000
reality to your data and you will create in this way predicate which is useful for your task
link |
01:06:09.200
but predicate are not related specifically to your task to this your task it is abstract functions
link |
01:06:19.920
which being applying applied to many tasks that you might be interested in it may be many tasks I
link |
01:06:26.400
don't know or different tasks well they should be many tasks right yeah it is like like in Propp case
link |
01:06:34.560
yes it was for fairy tales but it's happened everywhere okay so we talked about images a
link |
01:06:41.680
little bit but can we talk about Noam Chomsky for a second I don't know him personally well
link |
01:06:55.520
not personally I don't know his ideas his ideas well let me just say do you think language human
link |
01:07:01.280
language is essential to expressing ideas as Noam Chomsky believes so like language is at the core
link |
01:07:09.920
of our formation of predicates the human language for me language and all the story of language
link |
01:07:18.480
is very complicated I don't understand this and I am not I thought about nobody I'm not ready to
link |
01:07:27.360
work on that because it's so huge it is not for me and I believe not for our century
link |
01:07:35.840
the 21st century not for 21st century so you should learn something a lot of stuff
link |
01:07:42.080
from simple tasks like digit recognition so you think okay you think digit recognition
link |
01:07:48.080
2d image what how would you more abstractly define it digit recognition it's 2d image
link |
01:07:59.600
symbol recognition essentially I mean I like I'm trying to get a sense sort of thinking
link |
01:08:08.560
about it now having worked with MNIST forever how how small of a subset is this of the general
link |
01:08:16.640
vision recognition problem and the general intelligence problem is it yeah is it a giant
link |
01:08:25.840
subset is it not and how far away is language you know let me refer to Einstein take the simplest
link |
01:08:35.760
problem as simple as possible but not simpler and this is challenge is simple problem
link |
01:08:42.720
but it's simple by idea but not simple to to get it when you will do this you will find
link |
01:08:54.560
some predicate which helps it but yeah I mean with Einstein you can you look at general relativity
link |
01:09:03.840
but that doesn't help you with quantum mechanics that's another story you you don't have any
link |
01:09:09.840
universal instrument yes so I'm trying to wonder if uh which space we're in whether the whether
link |
01:09:18.240
handwritten recognition is like general relativity and then language is like quantum mechanics so
link |
01:09:23.120
you're still gonna have to do a lot of mess to to universalize it but uh I'm trying to see
link |
01:09:32.720
one so what's your intuition why handwritten recognition is easier than language
link |
01:09:41.840
just I think a lot of people would agree with that but if you could elucidate sort of the
link |
01:09:47.840
the intuition of why I don't know no I don't think in this direction I just think in
link |
01:09:57.360
direction that this is problem which if we will solve it well
link |
01:10:07.680
we will create
link |
01:10:12.480
some abstract understanding of images maybe not all images I would like to talk to guys who
link |
01:10:21.360
are doing in real images in Columbia University what kind of images unreal so real images real
link |
01:10:29.280
images yeah what their idea is there a predicate what can be predicated I still symmetry will play
link |
01:10:38.080
role in real life images in any real life images 2d images let's talk about 2d images because
link |
01:10:46.560
that's what we know in neural network was created for 2d images so the people I know in
link |
01:10:57.440
vision science for example the people study human vision yeah that they usually go to the world of
link |
01:11:03.840
symbols and like handwritten recognition but not really it's other kinds of symbols to study
link |
01:11:09.200
our visual perception system as far as I know not much predicate type of thinking is understood
link |
01:11:15.600
about our vision system they did not think in this direction they don't yeah they but how do
link |
01:11:21.040
you even begin to think in that direction that says so I would like to discuss with them yeah
link |
01:11:27.600
because if we will be able to show that it is what working
link |
01:11:35.440
and theoretical scheme it's not so bad so the the unfortunate so if we compare to language
link |
01:11:43.200
language has like letters finite set of letters and a finite set of ways you can put together
link |
01:11:49.360
those letters so it feels more amenable to kind of analysis with natural images there is so many
link |
01:11:57.120
pixels no no no letter language is much much more complicated it's involved a lot of different stuff
link |
01:12:06.880
it's not just understanding of very simple class of tasks I would like to see lists of tasks
link |
01:12:18.080
where language involved yes so there's a there's a lot of nice benchmarks now on
link |
01:12:23.040
in natural language processing from the very trivial like understanding the elements of a
link |
01:12:29.120
sentence to question answering to more much more complicated where you talk about open domain dialogue
link |
01:12:34.960
the natural question is with handwritten recognition is really the first step yeah of
link |
01:12:41.840
understanding visual information right but not but but even our records show that we go in wrong
link |
01:12:52.800
direction because we need 60 000 digits so even this first step so forget about talking about the
link |
01:12:59.840
full journey this first step should be taken in the right direction no no in wrong direction because
link |
01:13:05.040
60 000 is unacceptable no I'm saying it should be taken in the in the right direction because 60 000
link |
01:13:11.920
is not acceptable it is you can talk it's great we have half percent of error and hopefully the step
link |
01:13:20.160
from doing hand recognition using very few examples the step towards what babies do when they crawl and
link |
01:13:27.520
understand their physical environment I know you don't know about babies if you will do from very
link |
01:13:33.760
small examples yeah you will find principles that will be different from what we're using now
link |
01:13:44.320
and theoretically it's more or less clear that means that you will use weak convergence not just
link |
01:13:52.640
strong convergence do you think these principles will naturally be human interpretable
link |
01:14:01.440
oh yeah so like when we will be able to explain them and have a nice presentation to show what
link |
01:14:06.240
those principles are or are they going to be very kind of abstract kinds of functions
link |
01:14:14.320
for example I talked yesterday about symmetry yes and I gave very simple examples the same will
link |
01:14:20.880
be laying there you gave like a predicate of a basic for for symmetries yes for different symmetries and
link |
01:14:27.040
you have a degree of symmetry that is important not just symmetry exists or doesn't exist a degree
link |
01:14:35.440
of symmetry yeah for handwritten recognition no it's not for handwritten it's for any images
link |
01:14:44.960
but I would like apply to handwritten right it's in theory it's more general okay okay
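One way to picture a "degree of symmetry" for a 2D image is the sketch below. It is only an illustration: the function name `symmetry_degree`, the mirror-difference score, and the `axis` parameter are assumptions made here, not the definition used in the lecture. It assumes non-negative pixel values and returns 1.0 for a perfectly mirror-symmetric image and lower values as the image departs from its mirror.

```python
def symmetry_degree(image, axis="vertical"):
    """Score in [0, 1] for non-negative pixels; 1.0 means image == its mirror.

    Hypothetical illustration of a 'degree of symmetry' predicate,
    not a definition taken from the conversation.
    """
    if axis == "vertical":
        # Reflect each row left-to-right (mirror about the vertical axis).
        mirrored = [row[::-1] for row in image]
    elif axis == "horizontal":
        # Reverse the row order (mirror about the horizontal axis).
        mirrored = image[::-1]
    else:
        raise ValueError("axis must be 'vertical' or 'horizontal'")

    diff = 0.0   # total absolute difference between image and mirror
    total = 0.0  # normalizer: total intensity of image plus mirror
    for row, mrow in zip(image, mirrored):
        for a, b in zip(row, mrow):
            diff += abs(a - b)
            total += abs(a) + abs(b)

    if total == 0:
        return 1.0  # an all-zero image is trivially symmetric
    return 1.0 - diff / total


if __name__ == "__main__":
    symmetric = [[0, 1, 0],
                 [1, 1, 1]]
    lopsided = [[1, 0, 0],
                [1, 0, 0]]
    print(symmetry_degree(symmetric))  # mirror-symmetric image
    print(symmetry_degree(lopsided))   # all mass on one side
```

A graded score like this, rather than a yes/no test, matches the point that the degree of symmetry matters and not only whether symmetry exists.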
link |
01:14:55.040
so a lot of things we've been talking about falls we've been talking about philosophy a little bit
link |
01:15:01.680
but also about mathematics and statistics a lot of it falls into this idea a universal idea of
link |
01:15:08.960
statistical theory of learning what is the most beautiful and sort of powerful or essential
link |
01:15:17.520
idea you've come across even just for yourself personally in in the world of statistics or
link |
01:15:23.040
a statistical theory of learning probably uniform convergence which we did with Alexey Chervonenkis
link |
01:15:31.840
can you describe uniform convergence you have law of large numbers
link |
01:15:40.000
so for any function expectation of function average of function converges to expectation
link |
01:15:47.920
but if you have set of functions for any function it is true but it should converge simultaneously
link |
01:15:55.440
for all set of functions and for for learning you need uniform convergence just convergence
link |
01:16:07.520
is not enough because when you pick up one which gives minimum you can pick up one function which
link |
01:16:21.200
does not converge in and it will give you the best answer for for this function
link |
01:16:31.360
so you need the uniform convergence to guarantee learning so learning does not
link |
01:16:37.200
rely on trivial law of large numbers it relies on uniform convergence
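In standard notation the distinction can be sketched as follows. The symbols are assumptions introduced here for illustration: $z_1,\dots,z_\ell$ are i.i.d. observations, $Q(z,\alpha)$ is a loss function indexed by $\alpha$, and $\Lambda$ is the set of admissible parameters.

```latex
% Law of large numbers: holds for each single, fixed function Q(z, \alpha)
\frac{1}{\ell}\sum_{i=1}^{\ell} Q(z_i,\alpha)
  \;\xrightarrow[\ell\to\infty]{P}\; \mathbb{E}\,Q(z,\alpha)
  \qquad \text{for any fixed } \alpha \in \Lambda

% Uniform convergence: the same statement simultaneously over the whole set
\sup_{\alpha\in\Lambda}
  \left|\frac{1}{\ell}\sum_{i=1}^{\ell} Q(z_i,\alpha) - \mathbb{E}\,Q(z,\alpha)\right|
  \;\xrightarrow[\ell\to\infty]{P}\; 0
```

The second statement is the one needed when the function is chosen by minimizing the empirical average, because the minimizer depends on the sample and is not fixed in advance.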
link |
01:16:43.040
but idea of weak convergence exists in statistics for a long time but
link |
01:16:55.280
it is interesting that as I think about myself how stupid I was for 50 years I did not see weak
link |
01:17:06.240
convergence I work on strong convergence but now I think that most powerful is weak convergence
link |
01:17:15.120
because it makes admissible set of functions and even in old proverbs when people try to
link |
01:17:23.600
understand recognition about duck looks like a duck and so on they use weak convergence
link |
01:17:31.520
people in language they understand this but when we're trying to create artificial
link |
01:17:39.840
intelligence we want to invent it in different way we just consider strong convergence
link |
01:17:48.960
arguments so reducing a set of admissible functions you think there should be
link |
01:17:56.720
effort put into understanding the properties of weak convergence
link |
01:18:01.120
you know in classical mathematics in Hilbert space there are only two ways two
link |
01:18:09.520
forms of convergence strong and weak now we can use both that means that we did everything
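For reference, the two modes of convergence in a Hilbert space $H$ with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$ are usually written as below. The sequence $f_n$, the limit $f$, and the test function $g$ are notation assumed here.

```latex
% Strong convergence: convergence in norm
f_n \to f \quad\Longleftrightarrow\quad \lim_{n\to\infty} \|f_n - f\| = 0

% Weak convergence: convergence of inner products against every test function
f_n \rightharpoonup f \quad\Longleftrightarrow\quad
  \lim_{n\to\infty} \langle f_n, g\rangle = \langle f, g\rangle
  \quad \text{for all } g \in H
```

Strong convergence implies weak convergence. In the discussion here, the test functions $g$ play the role that predicates play: each one checks a single property of the candidate function.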
link |
01:18:19.360
and it so happened that when we use Hilbert space which is very rich space space of continuous
link |
01:18:31.120
functions which are square integrable so we can apply weak and strong convergence for learning
link |
01:18:41.440
and have closed form solution so for computationally simple for me it is sign that it is right way
link |
01:18:52.160
because you don't need any heuristic yes whatever you want but now the only what left
link |
01:19:02.400
it is concept of what is predicate but it is not statistics by the way I like
link |
01:19:08.640
the fact that you think the heuristics are a mess that should be removed from the system
link |
01:19:14.720
so closed form solution is the ultimate no it so happened that when you're using
link |
01:19:22.320
right instrument you have closed form solution
link |
01:19:25.200
do you think intelligence human level intelligence when we create it will
link |
01:19:37.600
will have something like a closed form solution you know I now I'm looking on
link |
01:19:44.480
bounds which I gave bounds for convergence and when I looking for bounds I thinking
link |
01:19:56.000
what is the most appropriate kernel for this bound would be so you know that in say
link |
01:20:03.920
all our businesses we use radial basis function but looking on the bound I think that I start
link |
01:20:14.720
to understand that maybe we need to make corrections to radial basis function to be closer
link |
01:20:22.320
to work better for these bounds so I'm again trying to understand what type of kernel
link |
01:20:33.760
have best approximation no approximation best fit to this bound
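For readers unfamiliar with the kernel mentioned here, a minimal sketch of the standard Gaussian radial basis function kernel follows. The function name `rbf_kernel` and the width parameter `gamma` are assumptions for illustration. The corrections to the kernel that are hinted at in the conversation are not specified there, so none are shown.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    x and y are equal-length sequences of numbers; gamma > 0 controls
    how quickly similarity decays with distance (an assumed parameter).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Squared Euclidean distance between the two points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical points
    print(rbf_kernel([0.0], [1.0], gamma=1.0))  # points one unit apart
```

The kernel depends only on the distance between the two points, which is why it is called radial.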
link |
01:20:43.280
sure so there's a there's a lot of interesting work that could be done in discovering better
link |
01:20:47.200
functions than radial basis functions for for yeah but but it still comes from you you're
link |
01:20:56.560
you're looking to math and trying to understand what from your own mind looking at the yeah but
link |
01:21:02.640
I don't know then I trying to understand what what will be good for that yeah but to me there's
link |
01:21:12.240
still a beauty again maybe I'm a descendant of Alan Turing to heuristics to me ultimately
link |
01:21:19.760
intelligence will be a mess of heuristics and no that's the engineering answer I guess
link |
01:21:26.160
absolutely when when you're doing say self driving cars the great guy who will do this
link |
01:21:34.080
it does not matter what theory behind that who has a better feeling have to apply but by the way
link |
01:21:46.640
it is the same story about predicates because you cannot create rule for situation is much more
link |
01:21:55.200
than you have rule for that but maybe you can have more abstract rule than it will be less
link |
01:22:06.880
this rule it is the same story about ideas and and ideas applied to specific cases
link |
01:22:16.400
but still you should you cannot avoid this yes of course but you should still reach for the ideas
link |
01:22:21.440
to understand the science yeah let me kind of ask do you think neural networks or functions
link |
01:22:30.800
can be made to reason sort of what do you think we've been talking about intelligence but this
link |
01:22:37.840
idea of reasoning there's a there's an element of sequentially disassembling interpreting
link |
01:22:45.200
the the images so when you think of handwritten recognition we kind of think that there will
link |
01:22:54.240
be a single there's an input and output there's not a recurrence yeah what do you think about
link |
01:23:02.320
sort of the idea of recurrence of going back to memory and thinking through this sort of
link |
01:23:07.360
sequentially mangling the different representations over and over until you arrive at a conclusion
link |
01:23:19.840
or is ultimately all that can be wrapped up into a function
link |
01:23:23.280
no you you're suggesting that let us use this type of algorithm when I starting thinking I
link |
01:23:31.120
first of all starting to understand what I want can I write down what I want and
link |
01:23:41.840
then I trying to formalize and when I do that I think I have to solve this problem
link |
01:23:48.960
and till now I did not see a situation where you need recurrence
link |
01:24:03.680
but do you observe human beings yeah do you try to it's the imitation question right it seems
link |
01:24:12.640
that human beings reason this kind of sequentially sort of does that inspire in you a thought that
link |
01:24:22.800
we need to add that into our intelligent systems you're saying okay I mean you've kind of answered
link |
01:24:34.000
saying until now I haven't seen a need for it and so because of that you don't see a reason to think
link |
01:24:40.240
about it you know most of things I don't understand in reasoning in human it is for me too complicated
link |
01:24:52.640
for me the most difficult part is to ask questions good questions how it works how
link |
01:25:04.080
people asking questions I don't know this you said that machine learning is not only about
link |
01:25:13.840
technical things speaking of questions but it's also about philosophy so what role does philosophy
link |
01:25:22.160
play in machine learning we talked about Plato but generally thinking in this philosophical way
link |
01:25:29.600
does it have how does philosophy and math fit together in your mind
link |
01:25:36.480
first ideas and then their implementation it's like predicate like
link |
01:25:44.000
say admissible set of functions it comes together everything because
link |
01:25:51.760
the first iteration of theory was done 50 years ago it all that this is so everything there
link |
01:26:02.080
if you have data and your set of functions does not have a big capacity
link |
01:26:13.440
so low VC dimension you can do that you can make structural risk minimization control capacity
link |
01:26:19.360
but you were not able to make admissible set of functions good now I suddenly realized that
link |
01:26:32.560
we did not use another idea of convergence which we can
link |
01:26:39.280
everything comes together but those are mathematical notions philosophy plays a role of simply
link |
01:26:45.920
saying that we should be swimming in the space of ideas let's talk what is philosophy philosophy
link |
01:26:54.720
means understanding of life so understanding of life say people like Plato they understand
link |
01:27:04.320
on very high abstract level of life so and whatever I doing just implementation of my
link |
01:27:14.240
understanding of life but every new step it is very difficult for example
link |
01:27:26.000
to find this idea that we need weak convergence
link |
01:27:35.520
was not simple for me
link |
01:27:38.160
so that required thinking about life a little bit hard to hard to trace but
link |
01:27:47.040
there was some thought process you know I working guys thinking about the same problem for
link |
01:27:53.920
50 years somehow and again and again and again I trying to be honest and that is a very important
link |
01:28:02.560
not to be very enthusiastic yeah but concentrate on whatever we were not able to achieve for example
link |
01:28:11.920
and understand why and now I understand that because I believe in math I believe that
link |
01:28:19.440
in Wigner's idea but now when I see that there are only two ways of convergence and we're using both
link |
01:28:32.880
that means that we must do as well as people doing but now exactly in philosophy and what we
link |
01:28:43.920
know about predicate what we how we understand life can we describe as a predicate I thought about that
link |
01:28:54.560
and that is more or less obvious level of symmetry but next I have a feeling it's something about
link |
01:29:06.960
structures but I don't know how to formulate how to measure measure of structure and all this stuff
link |
01:29:16.080
and the guy who will solve this challenge problem then when we were looking how he did it
link |
01:29:26.880
probably just only symmetry is not enough but something like symmetry will be there
link |
01:29:33.520
that's absolutely symmetry will be there and level of symmetry will be there
link |
01:29:40.560
and level of symmetry anti symmetry diagonal vertical and I I even don't know how you can
link |
01:29:48.240
use in different direction idea of symmetry it was very general but it will be there
link |
01:29:53.040
I think that people very sensitive to idea of symmetry but there are several ideas like symmetry
link |
01:30:04.720
as I would like to learn but you cannot learn just thinking about that you should do challenging
link |
01:30:13.280
problems and then analyze them why why it was we were able to solve them and then we will see
link |
01:30:20.960
it very simple things it's not easy to find
link |
01:30:27.760
even with talking about this every time about your your I was surprised I I try to understand
link |
01:30:36.160
is people describe in language strong convergence mechanism for learning
link |
01:30:42.960
I did not see I don't know but weak convergence this duck story and stories like that when you
link |
01:30:51.840
will explain to kid you will use weak convergence argument it looks like a duck it does like a duck does
link |
01:31:00.880
but when you try to formalize you just ignoring this why why 50 years from start of machine
link |
01:31:09.760
learning and that's the role of philosophy I think I think that maybe I don't know
link |
01:31:18.160
maybe this is theory also we should blame for that because empirical risk minimization
link |
01:31:26.000
and all this stuff and if you read now textbooks they just about bound about empirical risk
link |
01:31:33.520
minimization they don't looking for another problem like admissible set but on the topic of life
link |
01:31:44.960
perhaps you could talk in Russian for a little bit what's your favorite memory from childhood
link |
01:31:52.880
what is your favorite memory from childhood music how about can you try to answer in Russian music
link |
01:32:04.720
it was very cool when such music classical music what is your favorite
link |
01:32:12.400
there were different composers at first it was Vivaldi in general I was surprised that it was possible and then
link |
01:32:23.760
when I understood Bach I was absolutely shocked by the way, I think that there are predicates
link |
01:32:34.400
like structures in Bach, but of course, because there is just a sense of structure here and I don't think
link |
01:32:43.520
that different elements of life are strongly divided in the sense of predicates all the way to the structure
link |
01:32:54.960
of the structure in human relations to the structure here is how to find these here is a high level of
link |
01:33:03.120
predicates in Bach and in life now that we're talking about Bach let's switch back to English
link |
01:33:15.520
because I like Beethoven and Chopin so Chopin it's another music story but Bach if we talk about
link |
01:33:23.040
predicates Bach probably has the most sort of well defined predicates that underlie you know it is
link |
01:33:32.080
very interesting to read what critics writing about Bach which words they're using they're trying to
link |
01:33:41.040
describe predicates and then Chopin it is very different vocabulary very different predicates
link |
01:33:55.040
and I think that if you will make a collection of that so maybe from this you can describe
link |
01:34:05.200
predicate for digit recognition well from Bach and Chopin no no no not from Bach and Chopin
link |
01:34:12.400
from the critic interpretation of the music yeah when they're trying to explain you music
link |
01:34:18.240
what they use as they use they describe high level ideas of of Plato's ideas what behind
link |
01:34:27.520
this music that's brilliant so art is not self explanatory in some sense so you have to try to
link |
01:34:35.920
convert it into ideas it is ill-posed problem when when you go from ideas to to the representation
link |
01:34:45.040
it is easy way but when you're trying to go back it is ill-posed problem but nevertheless
link |
01:34:52.800
I believe that when you're looking from that even from art you will be able to find predicate for
link |
01:35:00.880
digit recognition that's such a fascinating and powerful notion do you ponder your own mortality
link |
01:35:10.240
do you think about it do you fear it do you draw insight from it
link |
01:35:16.640
about mortality no yeah are you afraid of death
link |
01:35:25.760
not too much not too much it is pity that I will not be able to do something which I
link |
01:35:33.920
I think I have a feeling to do that for example I will be very happy to work with
link |
01:35:46.080
guys theoretician from music to write this collection of description what what how they
link |
01:35:53.840
describe music how they use what predicate and from art as well then take what is in common
link |
01:36:02.000
and try to understand predicate which is absolute for everything and then use that for visual
link |
01:36:09.680
recognition and see if there is a connection yeah exactly oh there's still time we got time
link |
01:36:18.480
you got time it it it it takes years and years and years yeah it's a long way well see you've got
link |
01:36:27.520
the patient mathematician's mind I think it could be done very quickly and very
link |
01:36:33.440
beautifully I think it's a really elegant idea yeah but also some of many yeah you know the the
link |
01:36:38.960
most time it is not to make this collection to understand what is the common to think about
link |
01:36:47.040
that once again and again and again and again but I think sometimes especially just when you
link |
01:36:53.840
say this idea now even just putting together the collection and looking at the different
link |
01:37:02.160
sets of data language trying to interpret music criticize music and images I think there will
link |
01:37:09.200
be sparks of ideas that will come of course again and again you'll come up with better ideas but
link |
01:37:13.520
even just that notion is a beautiful notion I even have some example
link |
01:37:18.960
so I have a friend
link |
01:37:25.120
who was a specialist in Russian poetry she is a professor of Russian poetry she did not write
link |
01:37:38.240
poems but she knows a lot of stuff she makes
link |
01:37:44.880
a book several books and one of them is a collection of Russian poetry
link |
01:37:54.560
she has images of Russian poetry she collected all images of Russian poetry and I asked her to do
link |
01:38:02.640
following you have MNIST digit recognition and we get 100 digits or maybe less than 100 I don't
link |
01:38:15.520
remember maybe 50 digits and try from a poetical point of view describe every image you see using
link |
01:38:25.520
only words of images of Russian poetry and she did it and then we tried to
link |
01:38:40.960
I call it learning using privileged information I call it privileged information
link |
01:38:45.760
you have two languages one language is just image of digit and another language
link |
01:38:54.080
poetic description of this image and this is privileged information
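The learning using privileged information (LUPI) setting described here is commonly written as below. The notation is an assumption for illustration: $x_i$ is the digit image, $x_i^{*}$ the poetic description, $y_i$ the label, and $\ell$ the number of training examples.

```latex
% Training: triplets, with the privileged description x^* available
(x_1, x_1^{*}, y_1),\; \dots,\; (x_\ell, x_\ell^{*}, y_\ell),
  \qquad x_i \in X,\;\; x_i^{*} \in X^{*},\;\; y_i \in \{-1, +1\}

% Testing: only the ordinary input x is available
y = f(x), \qquad x \in X
```

The privileged description is used only while training. The learned rule $f$ must classify from the image alone.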
link |
01:39:02.160
and there is an algorithm when you're working using privileged information you're doing well
link |
01:39:06.800
better much better so so there's something there something there and there is a
link |
01:39:13.440
at NEC she unfortunately died the collection of digits and poetic descriptions of these digits
link |
01:39:29.040
yeah there's some something there in that poetic description but I think that
link |
01:39:35.440
there are abstract ideas on the Plato level of ideas yeah that they're there that could be
link |
01:39:42.400
discovered and music seems to be a good entry point but as soon as we start this is this challenge
link |
01:39:49.360
problem the challenge from I listen it immediately connected to to all this stuff especially with
link |
01:39:55.920
your talk and this podcast and I'll do whatever I can to advertise it's such a clean beautiful
link |
01:40:01.280
Einstein like formulation of the challenge before us right let me ask another absurd question
link |
01:40:09.040
and we talked about mortality we talked about philosophy of life what do you think is the
link |
01:40:15.280
meaning of life what's the predicate for mysterious existence here on earth
link |
01:40:23.520
I don't know it's very interesting how we have in Russia I don't know you know the guy Strugatsky
link |
01:40:41.760
they are writing fiction they're thinking about human what what's going on and they have idea
link |
01:40:55.360
that there are developing two types of people common people and very smart people they just
link |
01:41:05.440
started and these two branches of people will go in different direction very soon so that's what
link |
01:41:14.320
they're thinking about that so the purpose of life is to create two two paths two human societies
link |
01:41:25.920
yes simple people and more complicated which do you like best the simple people or the complicated
link |
01:41:33.120
ones I don't know that is just his fantasy but you know every week we have a guy who is just
link |
01:41:46.320
writer and also a theoretician of literature and he explained how he understands
link |
01:41:55.200
literature and human relationship how he sees life and I understood that I'm just small kid
link |
01:42:06.720
comparing to him he is very smart guy in understanding life he knows this predicate he
link |
01:42:15.920
knows big blocks of life I am amazed every time when I listen to him and he just talking about
link |
01:42:26.160
literature and I think that I was surprised so the managers in big companies most of them are
link |
01:42:42.720
guys who study English language and English literature so why because they understand life
link |
01:42:54.720
they understand models and among them maybe many talented critics
link |
01:43:02.320
did just analyzing this and this is big science like property this is blocks
link |
01:43:13.200
that's very smart it amazes me that you are and continue to be humbled by the brilliance of
link |
01:43:22.320
others I'm very modest about myself I see so smart guys around well let me be immodest for you
link |
01:43:30.880
you're one of the greatest mathematicians statisticians of our time it's truly an honor
link |
01:43:36.800
thank you for talking and let's talk it is not yeah I know my limits let's let's talk again when
link |
01:43:47.680
your challenge is taken on and solved by grad student especially when maybe music will be involved
link |
01:43:58.720
Vladimir thank you so much thank you very much thanks for listening to this conversation with
link |
01:44:04.240
Vladimir Vapnik and thank you to our presenting sponsor Cash App download it use code LexPodcast
link |
01:44:11.280
you'll get ten dollars and ten dollars will go to FIRST an organization that inspires and educates
link |
01:44:16.320
young minds to become science and technology innovators of tomorrow if you enjoy this podcast
link |
01:44:22.000
subscribe on YouTube give it five stars on Apple Podcasts support it on Patreon
link |
01:44:26.640
or simply connect with me on Twitter at lexfridman and now let me leave you with some words
link |
01:44:33.360
from Vladimir Vapnik when solving a problem of interest do not solve a more general problem
link |
01:44:39.920
as an intermediate step thank you for listening I hope to see you next time