
Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence | Lex Fridman Podcast #71



link |
00:00:00.000
The following is a conversation with Vladimir Vapnik, part two, the second
link |
00:00:05.280
time we spoke on the podcast.
link |
00:00:07.280
He's the coinventor of support vector machines, support vector clustering, VC
link |
00:00:11.440
theory, and many foundational ideas in statistical learning.
link |
00:00:14.960
He was born in the Soviet Union, worked at the Institute of Control Sciences
link |
00:00:19.320
in Moscow, then in the US, worked at AT&T, NEC labs, Facebook AI research,
link |
00:00:26.080
and now is a professor at Columbia University.
link |
00:00:28.640
His work has been cited over 200,000 times.
link |
00:00:32.320
The first time we spoke on the podcast was just over a year
link |
00:00:35.040
ago, one of the early episodes.
link |
00:00:38.240
This time we spoke after a lecture he gave titled complete statistical theory
link |
00:00:42.760
of learning as part of the MIT series of lectures on deep learning
link |
00:00:46.720
and AI that I organized.
link |
00:00:49.520
I'll release the video of the lecture in the next few days.
link |
00:00:53.040
This podcast and lecture are independent from each other, so you don't need
link |
00:00:56.840
one to understand the other.
link |
00:00:59.000
The lecture is quite technical and math heavy, so if you do watch both, I
link |
00:01:04.040
recommend listening to this podcast first, since the podcast is
link |
00:01:07.320
probably a bit more accessible.
link |
00:01:10.800
This is the artificial intelligence podcast.
link |
00:01:13.560
If you enjoy it, subscribe on YouTube, give it five stars on Apple podcasts,
link |
00:01:17.520
support it on Patreon, or simply connect with me on Twitter
link |
00:01:20.720
at Lex Fridman, spelled F R I D M A N.
link |
00:01:23.680
As usual, I'll do one or two minutes of ads now and never any ads in
link |
00:01:27.800
the middle that can break the flow of the conversation.
link |
00:01:30.440
I hope that works for you and doesn't hurt the listening experience.
link |
00:01:35.080
This show is presented by Cash App, the number one finance app in the app store.
link |
00:01:39.480
When you get it, use code LexPodcast.
link |
00:01:42.760
Cash App lets you send money to friends, buy Bitcoin, and invest in the
link |
00:01:46.440
stock market with as little as $1.
link |
00:01:48.680
Broker services are provided by Cash App Investing, a subsidiary of Square
link |
00:01:52.720
and member SIPC, since Cash App allows you to send and receive money
link |
00:01:57.520
digitally, peer to peer, and security in all digital transactions is very important.
link |
00:02:02.400
Let me mention the PCI data security standard, PCI DSS level one,
link |
00:02:07.760
that Cash App is compliant with.
link |
00:02:10.920
I'm a big fan of standards for safety and security and PCI DSS is a good
link |
00:02:16.480
example of that, where a bunch of competitors got together and agreed
link |
00:02:20.480
that there needs to be a global standard around the security of transactions.
link |
00:02:24.480
Now we just need to do the same for autonomous vehicles
link |
00:02:27.480
and AI systems in general.
link |
00:02:30.000
So again, if you get Cash App from the app store or Google Play and use the code
link |
00:02:34.200
LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, one of my
link |
00:02:40.240
favorite organizations that is helping to advance robotics and STEM education
link |
00:02:45.040
for young people around the world.
link |
00:02:46.800
And now here's my conversation with Vladimir Vapnik.
link |
00:02:52.440
You and I talked about Alan Turing yesterday a little bit and that he, as the
link |
00:02:58.040
father of artificial intelligence, may have instilled in our field, an ethic
link |
00:03:02.080
of engineering and not science, seeking more to build intelligence
link |
00:03:06.560
rather than to understand it.
link |
00:03:09.160
What do you think is the difference between these two paths of engineering
link |
00:03:13.760
intelligence and the science of intelligence?
link |
00:03:18.120
It's a completely different story.
link |
00:03:20.520
Engineering is imitation of human activity.
link |
00:03:25.320
You have to make a device which behaves as humans behave, have all the functions
link |
00:03:34.640
of humans.
link |
00:03:36.120
It doesn't matter how you do it,
link |
00:03:41.360
but to understand what is intelligence about, it's quite a different problem.
link |
00:03:48.920
So I think, I believe that it's somehow related to the predicate we talked
link |
00:03:55.160
yesterday about, because look at the Vladimir Propp's idea.
link |
00:04:04.760
He just found 31 here, predicates, he called it units, which can explain
link |
00:04:17.600
human behavior, at least in Russian tales.
link |
00:04:20.760
You look at Russian tales and derive from that.
link |
00:04:24.840
And then people realize that it's more wide than in Russian tales.
link |
00:04:29.480
It is in TV, in movie serials and so on and so on.
link |
00:04:33.720
So you're talking about Vladimir Propp, who in 1928 published a book,
link |
00:04:39.960
Morphology of the Folktale, describing 31 predicates that have this kind of
link |
00:04:46.400
sequential structure that a lot of the stories, narratives follow in Russian
link |
00:04:53.320
folklore and in other contexts.
link |
00:04:54.960
We'll talk about it.
link |
00:04:56.040
I'd like to talk about predicates in a focused way, but let me, if you allow
link |
00:05:00.400
me to stay zoomed out on our friend, Alan Turing, and, you know, he inspired
link |
00:05:06.600
a generation with the imitation game.
link |
00:05:10.080
Yes.
link |
00:05:11.560
Do you think if we can linger on that a little bit longer, do you think we can
link |
00:05:17.480
learn, do you think learning to imitate intelligence can get us closer to the
link |
00:05:22.960
science, to understanding intelligence?
link |
00:05:24.920
So why do you think imitation is so far from understanding?
link |
00:05:32.200
I think that it is different between you have different goals.
link |
00:05:37.000
So your goal is to create something, something useful.
link |
00:05:43.080
Yeah.
link |
00:05:43.560
And that is great.
link |
00:05:45.400
And you can see how much things was done and I believe that it will be done even
link |
00:05:51.240
more, it's self driving cars and also the business, it is great.
link |
00:05:57.920
And it was inspired by Turing's vision.
link |
00:06:02.640
But understanding is very difficult.
link |
00:06:05.000
It's more or less philosophical category.
link |
00:06:07.840
What means understand the world?
link |
00:06:10.800
I believe in scheme which starts from Plato, that there exists world of ideas.
link |
00:06:18.040
I believe that intelligence, it is world of ideas, but it is world of pure ideas.
link |
00:06:24.840
And when you combine them with reality things, it creates, as in my case,
link |
00:06:34.400
invariants, which is very specific.
link |
00:06:37.520
And that's, I believe, the combination of ideas in way to constructing invariants.
link |
00:06:47.320
Constructing invariant is intelligence.
link |
00:06:49.760
But first of all, predicate, if you know, predicate and hopefully
link |
00:06:56.080
then not too much predicate exists.
link |
00:07:00.760
For example, 31 predicate for human behavior, it is not a lot.
link |
00:07:06.040
Vladimir Propp used 31, you can even call them predicate, 31
link |
00:07:12.720
predicates to describe stories, narratives.
link |
00:07:17.640
Do you think human behavior, how much of human behavior, how much of our
link |
00:07:22.560
world, our universe, all the things that matter in our existence can be
link |
00:07:28.000
summarized in predicates of the kind that Propp was working with?
link |
00:07:32.600
I think that we have a lot of form of behavior, but I think that
link |
00:07:38.760
predicate is much less because even in this example, which I gave you
link |
00:07:43.840
yesterday, you saw that predicate can be, one predicate can construct many
link |
00:07:55.000
different invariants depending on your data.
link |
00:07:59.360
They're applying to different data and they give different invariants.
link |
00:08:04.200
So, but pure ideas, maybe not so much.
link |
00:08:08.600
Not so many.
link |
00:08:09.880
I don't know about that, but my guess, I hope that's why challenge
link |
00:08:15.000
about digit recognition, how much you need.
link |
00:08:19.600
I think we'll talk about computer vision and 2D images a little bit
link |
00:08:23.560
in your challenge.
link |
00:08:24.800
That's exactly about intelligence.
link |
00:08:26.720
That's exactly, that's exactly about, no, that hopes to be exactly about
link |
00:08:33.880
the spirit of intelligence in the simplest possible way.
link |
00:08:37.160
Yeah, absolutely you should start the simplest way, otherwise you
link |
00:08:40.760
will not be able to do it.
link |
00:08:42.320
Well, there's an open question whether starting at the MNIST digit
link |
00:08:46.680
recognition is a step towards intelligence or it's an entirely different thing.
link |
00:08:52.320
I think that to beat records using say 100, 200 times less examples,
link |
00:08:59.360
you need intelligence.
link |
00:09:00.360
You need intelligence.
link |
00:09:01.200
So let's, because you use this term and it would be nice, I'd like to
link |
00:09:05.800
ask simple, maybe even dumb questions.
link |
00:09:09.640
Let's start with a predicate.
link |
00:09:12.520
In terms of terms and how you think about it, what is a predicate?
link |
00:09:17.160
I don't know.
link |
00:09:18.520
I have a feeling formally they exist, but I believe that predicate for
link |
00:09:26.440
2D images, one of them is symmetry.
link |
00:09:31.960
Hold on a second.
link |
00:09:32.560
Sorry.
link |
00:09:32.960
Sorry, sorry to interrupt and pull you back.
link |
00:09:36.440
At the simplest level, we're not even, we're not being profound currently.
link |
00:09:40.680
A predicate is a statement of something that is true.
link |
00:09:44.880
Yes.
link |
00:09:46.600
Do you think of predicates as somehow probabilistic in nature or is this binary?
link |
00:09:54.640
This is truly constraints of logical statements about the world.
link |
00:09:59.840
In my definition, the simplest predicate is function.
link |
00:10:03.800
Function, and you can use this function to make inner product that is predicate.
link |
00:10:10.480
What's the input and what's the output of the function?
link |
00:10:13.600
Input is X, something which is input in reality.
link |
00:10:18.240
Say if you consider digit recognition, it pixel space input, but it is
link |
00:10:25.440
function which in pixel space, but it can be any function from pixel space and you
link |
00:10:36.240
choose, and I believe that there are several functions which is important for
link |
00:10:43.160
understanding of images.
link |
00:10:46.400
One of them is symmetry.
link |
00:10:48.240
It's not so simple construction as I described with the derivative, with all
link |
00:10:53.720
this stuff, but another, I believe, I don't know how many, is how well
link |
00:10:59.600
structurized is picture.
link |
00:11:03.240
Structurized?
link |
00:11:04.280
Yeah.
link |
00:11:04.840
What do you mean by structurized?
link |
00:11:06.960
It is formal definition.
link |
00:11:09.040
Say something heavy on the left corner, not so heavy in the middle and so on.
link |
00:11:17.040
You describe in general concept of what you assume.
link |
00:11:21.840
Concepts, some kind of universal concepts.
link |
00:11:25.200
Yeah, but I don't know how to formalize this.
link |
00:11:29.160
Do you?
link |
00:11:29.840
So this is the thing.
link |
00:11:31.560
There's a million ways we can talk about this.
link |
00:11:33.600
I'll keep bringing it up, but we humans have such concepts when we look at
link |
00:11:40.000
digits, but it's hard to put them, just like you're saying now, it's
link |
00:11:44.000
hard to put them into words.
link |
00:11:45.480
You know, that is example, when critics in music, trying to describe music,
link |
00:11:55.440
they use predicate and not too many predicate, but in different combination,
link |
00:12:02.600
but they have some special words for describing music and the same
link |
00:12:10.440
should be for images, but maybe there are critics who understand essence
link |
00:12:16.920
of what this image is about.
link |
00:12:20.960
Do you think there exists critics who can summarize the essence of
link |
00:12:26.960
images, human beings?
link |
00:12:29.120
I hope so, yes, but that...
link |
00:12:32.440
Explicitly state them on paper.
link |
00:12:34.520
The fundamental question I'm asking is, do you think there exists a small
link |
00:12:41.840
set of predicates that will summarize images?
link |
00:12:45.040
It feels to our mind, like it does, that the concept of what makes a two
link |
00:12:50.840
and a three and a four...
link |
00:12:53.000
No, no, no, it's not on this level.
link |
00:12:58.040
It should not describe two, three, four.
link |
00:13:01.240
It describes some construction, which allow you to create invariants.
link |
00:13:08.040
And invariants, sorry to stick on this, but terminology.
link |
00:13:12.360
Invariant, it is property of your image.
link |
00:13:21.040
Say, I can say,
link |
00:13:27.760
Looking on my image, it is more or less symmetric, and I can give you value
link |
00:13:33.360
of symmetry, say, level of symmetry, using this function which I gave
link |
00:13:40.560
yesterday. And you can describe that your image has these characteristics
link |
00:13:51.560
exactly in the way how musical critics describe music.
link |
00:13:56.640
So, but this is invariant applied to specific data, to specific music,
link |
00:14:05.400
to something.
link |
00:14:07.640
I strongly believe in these Plato ideas, that there exists world of predicate
link |
00:14:14.960
and world of reality, and predicate and reality is somehow connected,
link |
00:14:20.160
and you have to know that.
link |
00:14:22.400
Let's talk about Plato a little bit.
link |
00:14:23.960
So you draw a line from Plato, to Hegel, to Wigner, to today.
link |
00:14:30.120
So Plato has forms, the theory of forms.
link |
00:14:35.440
So there's a world of ideas and a world of things, as you talk about,
link |
00:14:39.400
and there's a connection.
link |
00:14:40.400
And presumably the world of ideas is very small, and the world of things
link |
00:14:45.720
is arbitrarily big, but they're all what Plato calls them like, it's a shadow.
link |
00:14:52.520
The real world is a shadow from the world of forms.
link |
00:14:55.040
Yeah, you have projection of a world of ideas.
link |
00:14:58.840
Yeah, very poetic.
link |
00:15:00.640
In reality, you can realize this projection using these invariants
link |
00:15:07.040
because it is projection on specific examples, which create specific features
link |
00:15:13.600
of specific objects.
link |
00:15:14.840
So the essence of intelligence is while only being able to observe
link |
00:15:22.920
the world of things, try to come up with a world of ideas.
link |
00:15:26.720
Exactly.
link |
00:15:27.720
Like in this music story, intelligent musical critics knows all these words
link |
00:15:33.040
and have a feeling about what they mean.
link |
00:15:34.840
I feel like that's a contradiction, intelligent music critics.
link |
00:15:38.800
But I think music is to be enjoyed in all its forms.
link |
00:15:47.280
The notion of critic, like a food critic.
link |
00:15:49.840
No, I don't want touch emotion.
link |
00:15:51.800
That's an interesting question.
link |
00:15:53.440
Does emotion...
link |
00:15:54.640
There's certain elements of the human psychology, of the human experience,
link |
00:15:59.240
which seem to almost contradict intelligence and reason.
link |
00:16:04.720
Like emotion, like fear, like love, all of those things,
link |
00:16:11.160
are those not connected in any way to the space of ideas?
link |
00:16:16.520
That I don't know.
link |
00:16:18.560
I just want to concentrate on very simple story, on digit recognition.
link |
00:16:27.720
So you don't think you have to love and fear death in order to recognize digits?
link |
00:16:31.800
I don't know.
link |
00:16:33.600
Because it's so complicated.
link |
00:16:36.560
It involves a lot of stuff which I never considered.
link |
00:16:41.200
But I know about digit recognition.
link |
00:16:44.720
And I know that for digit recognition,
link |
00:16:50.360
to get records from small number of observations, you need predicate.
link |
00:16:59.040
But not special predicate for this problem.
link |
00:17:03.240
But universal predicate, which understand world of images.
link |
00:17:08.480
Of visual information.
link |
00:17:09.920
Visual, yes.
link |
00:17:11.120
But on the first step, they understand, say, world of handwritten digits,
link |
00:17:18.440
or characters, or something simple.
link |
00:17:21.400
So like you said, symmetry is an interesting one.
link |
00:17:23.800
No, that's what I think one of the predicate is related to symmetry.
link |
00:17:28.720
The level of symmetry.
link |
00:17:30.720
Okay, degree of symmetry.
link |
00:17:32.120
So you think symmetry at the bottom is a universal notion,
link |
00:17:37.200
and there's degrees of a single kind of symmetry,
link |
00:17:41.480
or is there many kinds of symmetries?
link |
00:17:44.160
Many kinds of symmetries.
link |
00:17:46.000
There is a symmetry, antisymmetry, say, letter S.
link |
00:17:52.360
So it has vertical antisymmetry.
link |
00:17:58.400
And it could be diagonal symmetry, vertical symmetry.
link |
00:18:02.640
So when you cut vertically the letter S...
link |
00:18:07.760
Yeah, then the upper part and lower part in different directions.
link |
00:18:16.600
Inverted, along the Y axis.
link |
00:18:18.920
But that's just like one example of symmetry, right?
link |
00:18:21.240
Isn't there like...
link |
00:18:21.960
Right, but there is a degree of symmetry.
link |
00:18:26.320
If you apply all this derivative stuff, to do tangent distance,
link |
00:18:35.040
whatever I describe, you can have a degree of symmetry.
link |
00:18:40.480
And that is what describing reason of image.
link |
00:18:45.920
It is the same as you will describe this image.
link |
00:18:53.200
Think about digit S, it has antisymmetry.
link |
00:18:57.920
Digit three is symmetric.
link |
00:19:00.880
More or less, look for symmetry.
link |
00:19:04.480
Do you think such concepts like symmetry,
link |
00:19:07.840
predicates like symmetry, is it a hierarchical set of concepts?
link |
00:19:14.360
Or are these independent, distinct predicates
link |
00:19:20.080
that we want to discover as some set of...
link |
00:19:23.600
No, there is an idea of symmetry.
link |
00:19:25.960
And you can, this idea of symmetry, make very general.
link |
00:19:34.880
Like degree of symmetry.
link |
00:19:37.120
If degree of symmetry can be zero, no symmetry at all.
link |
00:19:40.680
Or degree of symmetry, say, more or less symmetrical.
link |
00:19:46.960
But you have one of these descriptions.
link |
00:19:50.480
And symmetry can be different.
link |
00:19:52.480
As I told, horizontal, vertical, diagonal,
link |
00:19:56.320
and antisymmetry is also concept of symmetry.
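
To make the "degree of symmetry" idea concrete, here is a minimal Python sketch. It is not the tangent-distance construction mentioned in the conversation; it swaps in a simpler, assumed measure: the normalized correlation between a mean-centered image and its mirrored or rotated copy. The function names and the toy 3x3 images are hypothetical. A score near 1 suggests symmetry under the chosen transform, a score near -1 suggests antisymmetry, and a score near 0 suggests neither.

```python
# Illustrative only: a correlation-based stand-in for a "degree of symmetry"
# predicate. All names here are hypothetical, not from the lecture.

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]

def rotate_180(img):
    """Rotate the image by half a turn (both flips combined)."""
    return flip_vertical(flip_horizontal(img))

def symmetry_degree(img, transform):
    """Correlation in [-1, 1] between the centered image and its transform."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    a = [p - mean for p in flat]
    b = [p - mean for row in transform(img) for p in row]
    norm = sum(x * x for x in a)  # the transform only permutes pixels
    if norm == 0:
        return 1.0  # a constant image is trivially symmetric
    return sum(x * y for x, y in zip(a, b)) / norm

# Toy 3x3 "images": a cross and a crude S-like shape.
CROSS = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
S_LIKE = [[0, 1, 1],
          [0, 1, 0],
          [1, 1, 0]]

if __name__ == "__main__":
    for name, t in [("horizontal", flip_horizontal),
                    ("vertical", flip_vertical),
                    ("rotate 180", rotate_180)]:
        print(name, symmetry_degree(CROSS, t), symmetry_degree(S_LIKE, t))
```

In this toy setup the S-like shape scores low under a plain mirror flip but scores 1 under the half-turn rotation, which is one way to read the remark that the letter S has a different kind of symmetry from a digit like three.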
link |
00:20:01.400
What about shape in general?
link |
00:20:03.320
I mean, symmetry is a fascinating notion, but...
link |
00:20:06.920
No, no, I'm talking about digit.
link |
00:20:08.600
I would like to concentrate on all I would like to know,
link |
00:20:12.440
predicate for digit recognition.
link |
00:20:14.440
Yes, but symmetry is not enough for digit recognition, right?
link |
00:20:19.360
It is not necessarily for digit recognition.
link |
00:20:22.520
It helps to create invariant, which you can use
link |
00:20:30.640
when you will have examples for digit recognition.
link |
00:20:35.000
You have regular problem of digit recognition.
link |
00:20:38.240
You have examples of the first class or second class.
link |
00:20:41.600
Plus, you know that there exists concept of symmetry.
link |
00:20:45.840
And you apply, when you're looking for decision rule,
link |
00:20:50.400
you will apply concept of symmetry,
link |
00:20:55.400
of this level of symmetry, which you estimate from...
link |
00:21:00.120
So let's talk.
link |
00:21:01.680
Everything comes from weak convergence.
link |
00:21:06.600
What is convergence?
link |
00:21:07.840
What is weak convergence?
link |
00:21:09.280
What is strong convergence?
link |
00:21:11.360
I'm sorry, I'm gonna do this to you.
link |
00:21:13.360
What are we converging from and to?
link |
00:21:16.120
You're converging, you would like to have a function.
link |
00:21:20.480
The function which, say, indicator function,
link |
00:21:23.600
which indicate your digit five, for example.
link |
00:21:29.920
A classification task.
link |
00:21:31.480
Let's talk only about classification.
link |
00:21:33.640
So classification means you will say
link |
00:21:36.840
whether this is a five or not,
link |
00:21:38.560
or say which of the 10 digits it is.
link |
00:21:40.600
Right, right.
link |
00:21:42.160
I would like to have these functions.
link |
00:21:46.560
Then, I have some examples.
link |
00:21:56.040
I can consider property of these examples.
link |
00:22:01.120
Say, symmetry.
link |
00:22:02.720
And I can measure level of symmetry for every digit.
link |
00:22:08.040
And then I can take average from my training data.
link |
00:22:16.680
And I will consider only functions
link |
00:22:20.920
of conditional probability,
link |
00:22:24.000
which I'm looking for my decision rule.
link |
00:22:27.280
Which applying to digits will give me the same average
link |
00:22:38.360
as I observe on training data.
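
One way to read the statement above: given a predicate psi, a candidate decision function f is admissible only if the average of psi(x) * f(x) over the training inputs matches the average of psi(x) * y over the training labels. The Python sketch below checks that condition on toy data. The exact form of the condition, the tolerance, the helper names, and the one-dimensional data are all assumptions made for illustration.

```python
# Illustrative only: checking whether a candidate function "keeps" the
# invariant defined by a predicate psi on training data. Names are hypothetical.

def predicate_average(psi, xs, values):
    """Average of psi(x) * value over the training set."""
    return sum(psi(x) * v for x, v in zip(xs, values)) / len(xs)

def keeps_invariant(f, psi, xs, ys, tol=0.05):
    """True if f reproduces the predicate average observed on the labels."""
    observed = predicate_average(psi, xs, ys)
    predicted = predicate_average(psi, xs, [f(x) for x in xs])
    return abs(observed - predicted) <= tol

# Toy data: the label is 1 when x is positive, 0 otherwise.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def psi(x):
    """A very simple predicate: the input itself (a first-moment property)."""
    return x

def step_rule(x):
    """Candidate that agrees with the labels."""
    return 1.0 if x > 0 else 0.0

def constant_rule(x):
    """Candidate that ignores the input."""
    return 0.5

if __name__ == "__main__":
    print(keeps_invariant(step_rule, psi, xs, ys))      # kept in the admissible set
    print(keeps_invariant(constant_rule, psi, xs, ys))  # rejected
```

Each additional predicate adds another such condition, so the admissible set of functions shrinks before any fitting to individual examples takes place.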
link |
00:22:41.960
So, actually, this is different level
link |
00:22:45.360
of description of what you want.
link |
00:22:48.480
You want not just, you show not one digit.
link |
00:22:54.080
You show, this predicate, show general property
link |
00:22:59.840
of all digits which you have in mind.
link |
00:23:03.720
If you have in mind digit three,
link |
00:23:06.080
it gives you property of digit three.
link |
00:23:10.360
And you select as admissible set of function,
link |
00:23:13.560
only function, which keeps this property.
link |
00:23:16.960
You will not consider other functions.
link |
00:23:20.760
So, you immediately looking for smaller subset of function.
link |
00:23:24.920
That's what you mean by admissible functions.
link |
00:23:27.000
Admissible function, exactly.
link |
00:23:28.400
Which is still a pretty large,
link |
00:23:30.920
for the number three, is a large.
link |
00:23:32.920
It is pretty large, but if you have one predicate.
link |
00:23:36.600
But according to, there is a strong and weak convergence.
link |
00:23:42.760
Strong convergence is convergence in function.
link |
00:23:46.360
You're looking for the function on one function,
link |
00:23:49.200
and you're looking for another function.
link |
00:23:51.880
And square difference from them should be small.
link |
00:23:59.240
If you take difference in any points,
link |
00:24:01.880
make a square, make an integral, and it should be small.
link |
00:24:05.640
That is convergence in function.
link |
00:24:08.040
Suppose you have some function, any function.
link |
00:24:11.280
So, I would say, I say that some function
link |
00:24:15.400
converge to this function.
link |
00:24:17.880
If integral from square difference between them is small.
link |
00:24:22.880
That's the definition of strong convergence.
link |
00:24:24.760
That definition of strong convergence.
link |
00:24:25.760
Two functions, the integral, the difference, is small.
link |
00:24:28.920
Yeah, it is convergence in functions.
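
Written out, the strong convergence described here can be sketched as follows. The notation is assumed for illustration: f_ell is the estimate built from ell examples and f_0 is the function being sought.

```latex
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, dx = 0
```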
link |
00:24:31.160
Yeah.
link |
00:24:32.280
But you have different convergence in functionals.
link |
00:24:36.720
You take any function, you take some function, phi,
link |
00:24:41.160
and take inner product, this function, this f function.
link |
00:24:46.040
f0 function, which you want to find.
link |
00:24:50.360
And that gives you some value.
link |
00:24:52.960
So, you say that set of functions converge
link |
00:24:59.960
in inner product to this function,
link |
00:25:03.040
if this value of inner product converge to value f0.
link |
00:25:10.400
That is for one phi.
link |
00:25:12.480
But weak convergence requires that it converge for any
link |
00:25:16.320
function of Hilbert space.
link |
00:25:20.680
If it converge for any function of Hilbert space,
link |
00:25:24.200
then you will say that this is weak convergence.
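
In the same assumed notation (f_ell for the sequence of estimates, f_0 for the target, phi for the function used in the inner product), the weak convergence described here can be sketched as a condition that must hold for every phi in the Hilbert space, not just one:

```latex
\lim_{\ell \to \infty} \int \phi(x)\, f_\ell(x)\, dx
  = \int \phi(x)\, f_0(x)\, dx
  \qquad \text{for all } \phi \in L_2
```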
link |
00:25:28.240
You can think that when you take integral,
link |
00:25:32.200
that is integral property of function.
link |
00:25:35.920
For example, if you will take sine or cosine,
link |
00:25:39.120
it is coefficient of, say, Fourier expansion.
link |
00:25:45.480
So, if it converge for all coefficients of Fourier
link |
00:25:51.440
expansion, so under some condition,
link |
00:25:54.240
it converge to function you're looking for.
link |
00:25:58.080
But weak convergence means any property.
link |
00:26:02.800
Convergence not point wise, but integral property
link |
00:26:07.640
of function.
link |
00:26:09.480
So, weak convergence means integral property of functions.
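
A standard textbook example helps separate the two notions: on the interval [0, 2*pi], the functions sin(n*x) converge weakly to the zero function as n grows, but not strongly. The Python sketch below checks this numerically for a single test function phi(x) = x, which is an arbitrary choice; one test function illustrates the behavior but does not prove weak convergence. The helper names and the midpoint integration rule are assumptions.

```python
# Illustrative only: sin(n*x) on [0, 2*pi] has shrinking inner products with a
# fixed test function, while its squared distance to zero stays near pi.
import math

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def inner_product_with_phi(n):
    """Inner product of phi(x) = x with sin(n*x) on [0, 2*pi]."""
    return integrate(lambda x: x * math.sin(n * x), 0.0, 2.0 * math.pi)

def squared_distance_to_zero(n):
    """Integral of sin(n*x)**2 on [0, 2*pi], the strong-convergence quantity."""
    return integrate(lambda x: math.sin(n * x) ** 2, 0.0, 2.0 * math.pi)

if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, inner_product_with_phi(n), squared_distance_to_zero(n))
```

Analytically the inner product equals -2*pi/n, which shrinks toward zero, while the squared distance equals pi for every positive integer n, so the sequence never gets close to zero in the strong sense.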
link |
00:26:13.800
When I'm talking about predicate,
link |
00:26:16.040
I would like to formulate which integral properties
link |
00:26:23.200
I would like to have for convergence.
link |
00:26:27.840
So, and if I will take one predicate, function,
link |
00:26:33.440
which I measure property, if I will use one predicate
link |
00:26:39.600
and say, I will consider only function which give me
link |
00:26:44.840
the same value as this predicate,
link |
00:26:47.840
I selecting set of functions from functions
link |
00:26:53.440
which is admissible in the sense that function which I'm
link |
00:26:58.000
looking for in this set of functions
link |
00:27:01.000
because I checking in training data, it gives the same.
link |
00:27:08.760
Yeah, so it always has to be connected to the training
link |
00:27:10.960
data in terms of?
link |
00:27:12.600
Yeah, but property, you can know independent on training data.
link |
00:27:18.720
And this guy, Propp, says that there is formal property,
link |
00:27:24.000
31 property.
link |
00:27:25.360
A fairy tale, a Russian fairy tale.
link |
00:27:27.640
But Russian fairy tale is not so interesting.
link |
00:27:30.560
More interesting that people apply this to movies,
link |
00:27:34.880
to theater, to different things.
link |
00:27:38.000
And the same works, they're universal.
link |
00:27:41.960
Well, so I would argue that there's
link |
00:27:44.400
a little bit of a difference between the kinds of things
link |
00:27:48.520
that were applied to which are essentially stories
link |
00:27:51.480
and digit recognition.
link |
00:27:54.240
It is the same story.
link |
00:27:55.880
You're saying digits, there's a story within the digit.
link |
00:27:59.600
Yeah.
link |
00:28:00.360
And so but my point is why I hope
link |
00:28:04.640
that it possible to beat record using not 60,000,
link |
00:28:11.440
but say 100 times less.
link |
00:28:13.800
Because instead, you will give predicates.
link |
00:28:17.840
And you will select your decision
link |
00:28:21.040
not from wide set of functions, but from set of functions
link |
00:28:25.680
which keeps this predicates.
link |
00:28:28.040
But predicate is not related just to digit recognition.
link |
00:28:32.760
Right.
link |
00:28:33.800
Like in Plato's case.
link |
00:28:37.640
Do you think it's possible to automatically discover
link |
00:28:40.800
the predicates?
link |
00:28:42.120
So you basically said that the essence of intelligence
link |
00:28:46.520
is the discovery of good predicates.
link |
00:28:49.560
Yeah.
link |
00:28:51.240
Now, the natural question is that's
link |
00:28:55.800
what Einstein was good at doing in physics.
link |
00:28:59.040
Can we make machines do these kinds
link |
00:29:02.400
of discovery of good predicates?
link |
00:29:04.480
Or is this ultimately a human endeavor?
link |
00:29:07.720
That I don't know.
link |
00:29:09.080
I don't think that machine can do.
link |
00:29:11.400
Because according to theory about weak convergence,
link |
00:29:18.840
any function from Hilbert space can be predicate.
link |
00:29:23.120
So you have infinite number of predicates a priori.
link |
00:29:27.560
And before, you don't know which predicate is good and which.
link |
00:29:32.800
But what Propp showed, and why people call it breakthrough,
link |
00:29:39.880
that there is not too many predicate
link |
00:29:44.600
which cover most of situation happened in the world.
link |
00:29:48.600
Right.
link |
00:29:51.280
So there's a sea of predicates.
link |
00:29:54.200
And most of the only a small amount
link |
00:29:57.240
are useful for the kinds of things
link |
00:29:58.800
that happen in the world.
link |
00:30:01.240
I think that I would say only small part of predicate
link |
00:30:07.120
very useful.
link |
00:30:08.680
Useful all of them.
link |
00:30:11.280
Only very few are what we should let's call them
link |
00:30:14.360
good predicates.
link |
00:30:15.440
Very good predicates.
link |
00:30:16.640
Very good predicates.
link |
00:30:18.160
So can we linger on it?
link |
00:30:20.720
What's your intuition?
link |
00:30:21.680
Why is it hard for a machine to discover good predicates?
link |
00:30:27.520
Even in my talk described how to do predicate.
link |
00:30:30.680
How to find new predicate.
link |
00:30:32.640
I'm not sure that it is very good.
link |
00:30:34.960
What did you propose in your talk?
link |
00:30:36.600
No.
link |
00:30:37.160
In my talk, I gave example for diabetes.
link |
00:30:42.360
Diabetes, yeah.
link |
00:30:43.720
When we achieve some percent.
link |
00:30:46.160
So then we're looking for area where
link |
00:30:50.760
some sort of predicate, which I formulate,
link |
00:30:54.760
does not keeps invariant.
link |
00:31:03.120
So if it doesn't keep, I retrain my data.
link |
00:31:06.920
I select only function which keeps this invariant.
link |
00:31:11.080
And when I did it, I improved my performance.
link |
00:31:14.400
I can looking for this predicate.
link |
00:31:16.400
I know technically how to do that.
link |
00:31:19.440
And you can, of course, do it using machine.
link |
00:31:25.560
But I'm not sure that we will construct the smartest
link |
00:31:29.560
predicate.
link |
00:31:30.920
But this is the, allow me to linger on it.
link |
00:31:34.120
Because that's the essence.
link |
00:31:35.280
That's the challenge.
link |
00:31:36.240
That is artificial.
link |
00:31:37.600
That's the human level intelligence
link |
00:31:40.320
that we seek is the discovery of these good predicates.
link |
00:31:43.720
You've talked about deep learning as a way to,
link |
00:31:47.560
the predicates they use and the functions are mediocre.
link |
00:31:52.960
You can find better ones.
link |
00:31:55.000
Let's talk about deep learning.
link |
00:31:57.280
Sure, let's do it.
link |
00:31:58.360
I know only Yann LeCun's convolutional network.
link |
00:32:04.200
And what else?
link |
00:32:05.160
I don't know.
link |
00:32:05.920
And it's a very simple convolution.
link |
00:32:07.960
There's not much else to know.
link |
00:32:09.120
To pixel left and right.
link |
00:32:10.400
I can do it like that with one predicate.
link |
00:32:14.600
Convolution is a single predicate.
link |
00:32:16.640
It's single.
link |
00:32:17.600
It's single predicate.
link |
00:32:21.120
Yes, but that's it.
link |
00:32:22.680
You know exactly.
link |
00:32:23.680
You take the derivative for translation and predicate.
link |
00:32:28.320
This should be kept.
link |
00:32:31.040
So that's a single predicate.
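One way to read "this should be kept" under translation is that a convolution commutes with shifts of its input. The sketch below checks that property on a one-dimensional circular signal; the signal, the kernel, and the function names are hypothetical, and the circular boundary is an assumption made to keep the check exact.

```python
def circular_conv(signal, kernel):
    """Circular 1-D convolution: out[i] = sum_j kernel[j] * signal[i - j]."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]

def shift(signal, k):
    """Translate the signal k positions to the right, wrapping around."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]

signal = [0, 1, 3, 2, 0, 0, 0, 0]
kernel = [1, -1]  # simple difference filter (hypothetical choice)

# Shifting before or after the convolution gives the same result.
shift_then_conv = circular_conv(shift(signal, 3), kernel)
conv_then_shift = shift(circular_conv(signal, kernel), 3)

# Pooling over all positions turns this into a quantity that ignores the shift.
pooled_original = max(circular_conv(signal, kernel))
pooled_shifted = max(shift_then_conv)
```

Strictly speaking the convolution itself is translation equivariant; the invariance appears only after pooling over positions.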
link |
00:32:32.440
But humans discovered that one.
link |
00:32:34.200
Or at least.
link |
00:32:35.760
Not it.
link |
00:32:36.240
That is a risk.
link |
00:32:37.120
Not too many predicates.
link |
00:32:38.960
And that is big story because Yann did it 25 years ago
link |
00:32:43.720
and nothing so clear was added to deep network.
link |
00:32:50.160
And then I don't understand why we
link |
00:32:55.400
should talk about deep network instead of talking
link |
00:32:58.400
about piecewise linear functions which keeps this predicate.
link |
00:33:02.840
Well, a counter argument is that maybe the amount
link |
00:33:08.720
of predicates necessary to solve general intelligence,
link |
00:33:14.480
say in the space of images, doing
link |
00:33:16.720
efficient recognition of handwritten digits
link |
00:33:20.640
is very small.
link |
00:33:22.400
And so we shouldn't be so obsessed about finding.
link |
00:33:26.840
We'll find other good predicates like convolution, for example.
link |
00:33:30.720
There has been other advancements
link |
00:33:33.880
like if you look at the work with attention,
link |
00:33:37.400
there's attention mechanisms especially used
link |
00:33:40.720
in natural language focusing the network's ability
link |
00:33:44.160
to learn at which part of the input to look at.
link |
00:33:47.640
The thing is, there's other things besides predicates
link |
00:33:51.000
that are important for the actual engineering mechanism
link |
00:33:55.280
of showing how much you can really
link |
00:33:57.240
do given these predicates.
link |
00:34:02.120
I mean, that's essentially the work of deep learning
link |
00:34:04.360
is constructing architectures that are able to be,
link |
00:34:09.000
given the training data, to be able to converge
link |
00:34:13.720
towards a function that can generalize well.
link |
00:34:22.920
It's an engineering problem.
link |
00:34:24.400
Yeah, I understand.
link |
00:34:26.120
But let's talk not on emotional level,
link |
00:34:29.840
but on a mathematical level.
link |
00:34:31.920
You have set of piecewise linear functions.
link |
00:34:36.480
It is all possible neural networks.
link |
00:34:42.040
It's just piecewise linear functions.
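The claim that a neural network is a piecewise linear function holds when the activations are themselves piecewise linear. The sketch below assumes ReLU units, which the conversation does not name, and uses hypothetical weights; it checks that the output is linear between two neighboring kinks and bends across one.

```python
def relu(z):
    return z if z > 0 else 0.0

def tiny_net(x):
    # One hidden layer with three ReLU units; the weights are hypothetical.
    # The units switch on or off at x = 0, x = 1 and x = 2.
    return 1.0 * relu(x) - 2.0 * relu(x - 1.0) + 0.5 * relu(2.0 - x)

def passes_midpoint_check(f, left, right, tol=1e-12):
    """True when f at the midpoint equals the average of f at the endpoints."""
    mid = 0.5 * (left + right)
    return abs(f(mid) - 0.5 * (f(left) + f(right))) < tol

# Between the kinks at 0 and 1 the network behaves like a straight line.
inside_one_piece = passes_midpoint_check(tiny_net, 0.0, 1.0)

# An interval that straddles the kink at 1 fails the same check.
across_a_kink = passes_midpoint_check(tiny_net, 0.5, 1.5)
```

Adding units or layers adds more kinks, which is the "many, many pieces" mentioned next.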
link |
00:34:44.040
It's many, many pieces.
link |
00:34:45.360
Large number of piecewise linear functions.
link |
00:34:47.640
Exactly.
link |
00:34:48.640
Very large.
link |
00:34:49.440
Very large.
link |
00:34:50.160
Almost feels like too large.
link |
00:34:51.800
It's still simpler than, say, convolution,
link |
00:34:56.160
than reproducing kernel Hilbert space, which
link |
00:34:59.040
have a Hilbert set of functions.
link |
00:35:00.920
What's Hilbert space?
link |
00:35:02.960
It's space with infinite number of coordinates,
link |
00:35:07.040
say, or Fourier expansion, something like that.
link |
00:35:11.840
So it's much richer.
link |
00:35:14.760
And when I'm talking about closed form solution,
link |
00:35:17.520
I'm talking about this set of function,
link |
00:35:20.760
not piecewise linear set, which is particular case of it
link |
00:35:29.760
is small part.
link |
00:35:31.000
So neural networks is a small part
link |
00:35:32.960
of the space of functions you're talking about.
link |
00:35:35.960
Say, small set of functions.
link |
00:35:39.160
Let me take that.
link |
00:35:40.600
But it is fine.
link |
00:35:42.080
It is fine.
link |
00:35:42.760
I don't want to discuss the small or big.
link |
00:35:46.560
You take advantage.
link |
00:35:47.920
So you have some set of functions.
link |
00:35:51.040
So now, when you're trying to create architecture,
link |
00:35:55.320
you would like to create admissible set of functions,
link |
00:35:58.800
which all your tricks to use not all functions,
link |
00:36:03.280
but some subset of this set of functions.
link |
00:36:07.200
Say, when you're introducing convolutional net,
link |
00:36:10.040
it is way to make this subset useful for you.
link |
00:36:16.440
But from my point of view, convolutional,
link |
00:36:19.760
it is something you want to keep some invariants,
link |
00:36:24.800
say, translation invariants.
link |
00:36:27.920
But now, if you understand this and you cannot explain
link |
00:36:35.440
on the level of ideas what neural network does,
link |
00:36:41.240
you should agree that it is much better
link |
00:36:44.360
to have a set of functions.
link |
00:36:46.640
And they say, this set of functions should be admissible.
link |
00:36:51.040
It must keep this invariant, this invariant,
link |
00:36:53.640
and that invariant.
link |
00:36:55.200
You know that as soon as you incorporate
link |
00:36:58.240
new invariant, set of function becomes smaller and smaller
link |
00:37:01.160
and smaller.
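The shrinking of the admissible set can be made concrete on a toy problem. The sketch below is an illustration under stated assumptions: the candidate functions are interval rules on six integer points, the invariant is "average of psi(x) * f(x) equals average of psi(x) * y", and every name in it is hypothetical.

```python
from itertools import combinations

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

def make_interval_rule(a, b):
    """f(x) = 1 when a <= x < b, else 0."""
    return lambda x: 1 if a <= x < b else 0

# All 21 interval rules with integer endpoints 0 <= a < b <= 6.
candidates = {(a, b): make_interval_rule(a, b)
              for a, b in combinations(range(7), 2)}

def keeps_invariant(f, psi, tol=1e-9):
    n = len(xs)
    lhs = sum(psi(x) * f(x) for x in xs) / n
    rhs = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(lhs - rhs) <= tol

# Two predicates: psi(x) = 1 and psi(x) = x.
predicates = [lambda x: 1.0, lambda x: float(x)]

admissible = dict(candidates)
sizes = [len(admissible)]
for psi in predicates:
    admissible = {k: f for k, f in admissible.items() if keeps_invariant(f, psi)}
    sizes.append(len(admissible))
```

On this data the set goes from 21 rules to 4 after the first invariant and to the single rule on [3, 6) after the second.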
link |
00:37:02.080
But all the invariants are specified by you, the human.
link |
00:37:06.640
Yeah, but what I hope that there is a standard predicate,
link |
00:37:12.400
like Propp showed, that's what I want
link |
00:37:17.520
to find for digit recognition.
link |
00:37:19.560
If we start, it is completely new area,
link |
00:37:22.920
what is intelligence about on the level,
link |
00:37:25.800
starting from Plato's idea, what is world of ideas.
link |
00:37:32.600
And I believe that is not too many.
link |
00:37:36.640
But it is amusing that mathematicians doing something,
link |
00:37:40.680
a neural network in general function,
link |
00:37:44.000
but people from literature, from art, they use this all
link |
00:37:48.720
the time.
link |
00:37:49.400
That's right.
link |
00:37:50.040
Invariants saying, it is great how people describe music.
link |
00:37:57.000
We should learn from that.
link |
00:37:58.720
And something on this level.
link |
00:38:02.000
But so why Vladimir Propp, who was just theoretical,
link |
00:38:09.200
who studied theoretical literature, he found that.
link |
00:38:12.960
You know what?
link |
00:38:13.720
Let me throw that right back at you,
link |
00:38:15.200
because there's a little bit of a,
link |
00:38:17.280
that's less mathematical and more emotional, philosophical,
link |
00:38:21.000
Vladimir Propp.
link |
00:38:22.680
I mean, he wasn't doing math.
link |
00:38:24.920
No.
link |
00:38:26.840
And you just said another emotional statement,
link |
00:38:30.160
which is you believe that this Plato world of ideas is small.
link |
00:38:35.760
I hope.
link |
00:38:36.920
I hope.
link |
00:38:38.680
Do you, what's your intuition, though?
link |
00:38:42.160
If we can linger on it.
link |
00:38:44.600
You know, it is not just small or big.
link |
00:38:48.520
I know exactly.
link |
00:38:50.520
Then when I introducing some predicate,
link |
00:38:56.880
I decrease set of functions.
link |
00:38:59.760
But my goal to decrease set of function much.
link |
00:39:04.040
By as much as possible.
link |
00:39:05.000
By as much as possible.
link |
00:39:07.480
Good predicate, which does this, then
link |
00:39:11.400
I should choose next predicate, which decrease set
link |
00:39:15.560
as much as possible.
link |
00:39:17.320
So set of good predicate, it is such
link |
00:39:21.400
that they decrease this amount of admissible function.
link |
00:39:27.880
So if each good predicate significantly
link |
00:39:30.520
reduces the set of admissible functions,
link |
00:39:32.640
then there naturally should not be that many good predicates.
link |
00:39:35.560
No, but if you reduce very well the VC dimension
link |
00:39:43.040
of the function, of admissible set of function, it's small.
link |
00:39:46.760
And you need not too much training data to do well.
link |
00:39:52.960
And VC dimension, by the way, is some measure of capacity
link |
00:39:56.760
of this set of functions.
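The link between VC dimension and the amount of training data can be illustrated with the classical VC generalization bound, which the conversation does not state. In that bound the gap between true and empirical risk is at most sqrt((h * (ln(2n / h) + 1) - ln(eta / 4)) / n) with probability 1 - eta, where h is the VC dimension and n the number of examples. The function name below is hypothetical.

```python
import math

def vc_confidence_term(h, n, eta=0.05):
    """Confidence term of the classical VC bound (meaningful when h < n)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

# Same sample size, smaller VC dimension: the bound tightens.
loose = vc_confidence_term(h=100, n=1000)
tight = vc_confidence_term(h=10, n=1000)

# Same VC dimension, more data: the bound also tightens.
tighter_with_data = vc_confidence_term(h=10, n=10000)
```

Reducing h by adding invariants and increasing n by collecting data both shrink the same term, which is why a smaller admissible set needs less training data.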
link |
00:39:57.720
Right.
link |
00:39:59.400
Roughly speaking, how many function in this set.
link |
00:40:01.960
So you're decreasing, decreasing.
link |
00:40:03.880
And it makes easy for you to find function
link |
00:40:08.160
you're looking for.
link |
00:40:10.200
But the most important part, to create good admissible set
link |
00:40:14.480
of functions.
link |
00:40:15.680
And it probably, there are many ways.
link |
00:40:18.800
But the good predicates such that they can do that.
link |
00:40:25.880
So for this duck, you should know a little bit about duck.
link |
00:40:30.520
Because what are the three fundamental laws of ducks?
link |
00:40:35.280
Looks like a duck, swims like a duck, and quacks like a duck.
link |
00:40:38.360
You should know something about ducks to be able to.
link |
00:40:41.160
Not necessarily.
link |
00:40:42.480
Looks like, say, horse.
link |
00:40:44.920
It's also good.
link |
00:40:46.520
So it's not, it generalizes from ducks.
link |
00:40:49.840
And talk like, and make sound like horse or something.
link |
00:40:54.280
And run like horse, and moves like horse.
link |
00:40:57.320
It is general, it is general predicate
link |
00:41:02.000
that this applied to duck.
link |
00:41:04.560
But for duck, you can say, play chess like duck.
link |
00:41:09.800
You cannot say play chess like duck.
link |
00:41:11.520
Why not?
link |
00:41:12.600
So you're saying you can, but that would not be a good.
link |
00:41:15.680
No, you will not reduce a lot of functions.
link |
00:41:18.160
You would not do, yeah, you would not
link |
00:41:19.760
reduce the set of functions.
link |
00:41:21.600
So you can, the story is formal story, mathematical story.
link |
00:41:26.760
Is that you can use any function you want as a predicate.
link |
00:41:31.120
But some of them are good, some of them are not,
link |
00:41:33.160
because some of them reduce a lot of functions
link |
00:41:36.880
to admissible set, and some of them don't.
link |
00:41:39.720
But the question is, and I'll probably
link |
00:41:41.440
keep asking this question, but how do we find such,
link |
00:41:45.680
what's your intuition?
link |
00:41:47.360
Handwritten recognition.
link |
00:41:49.400
How do we find the answer to your challenge?
link |
00:41:52.600
Yeah, I understand it like that.
link |
00:41:55.840
I understand what.
link |
00:41:57.800
What defined?
link |
00:41:59.160
What it means, I knew predicate.
link |
00:42:01.680
Yeah.
link |
00:42:02.720
Like guy who understand music can say this word,
link |
00:42:06.160
which he described when he listened to music.
link |
00:42:09.520
He understand music.
link |
00:42:11.600
He use not too many different, oh, you can do like Propp.
link |
00:42:15.480
You can make collection.
link |
00:42:17.280
What he talking about music, about this, about that.
link |
00:42:20.920
It's not too many different situation he described.
link |
00:42:24.960
Because we mentioned Vladimir Propp a bunch.
link |
00:42:26.920
Let me just mention, there's a sequence of 31
link |
00:42:33.640
structural notions that are common in stories.
link |
00:42:36.880
And I think.
link |
00:42:37.720
You call it units.
link |
00:42:38.560
Units.
link |
00:42:39.400
And I think they resonate.
link |
00:42:40.480
I mean, it starts just to give an example,
link |
00:42:43.600
absentation, a member of the hero's community
link |
00:42:46.040
or family leaves the security of the home environment.
link |
00:42:48.920
Then it goes to the interdiction,
link |
00:42:51.040
a forbidding edict or command is passed upon the hero.
link |
00:42:54.520
Don't go there.
link |
00:42:55.360
Don't do this.
link |
00:42:56.640
The hero is warned against some action.
link |
00:42:58.680
Then step three, violation of interdiction.
link |
00:43:05.280
Break the rules, break out on your own.
link |
00:43:07.600
Then reconnaissance.
link |
00:43:09.200
The villain makes an effort to attain knowledge,
link |
00:43:11.400
needed to fulfill their plan, and so on.
link |
00:43:13.160
It goes on like this, ends in a wedding, number 31.
link |
00:43:19.480
Happily ever after.
link |
00:43:20.640
No, he just gave description of all situations.
link |
00:43:26.000
He understands this world.
link |
00:43:28.160
Of folktales.
link |
00:43:29.280
Yeah, not folktales, but stories.
link |
00:43:33.160
And these stories not in just folktales.
link |
00:43:36.560
These stories in detective serials as well.
link |
00:43:40.880
And probably in our lives.
link |
00:43:42.200
We probably live.
link |
00:43:43.760
Read this.
link |
00:43:45.040
And then they wrote that this predicate is good
link |
00:43:52.040
for different situation.
link |
00:43:54.800
From movie, for theater.
link |
00:43:57.920
By the way, there's also criticism, right?
link |
00:44:00.640
There's another way to interpret narratives
link |
00:44:03.840
from Claude Lévi-Strauss.
link |
00:44:09.880
I don't know.
link |
00:44:10.880
I am not in this business.
link |
00:44:12.520
No, I know, it's theoretical literature,
link |
00:44:14.360
but it's looking at paradigms behind things.
link |
00:44:15.840
It's always the discussion, yeah.
link |
00:44:20.120
But at least there is units.
link |
00:44:23.800
It's not too many units that can describe.
link |
00:44:27.160
But this guy probably gives another units.
link |
00:44:30.840
Or another way of...
link |
00:44:31.680
Exactly, another set of units.
link |
00:44:34.400
Another set of predicates.
link |
00:44:35.920
It doesn't matter how.
link |
00:44:37.560
But they exist.
link |
00:44:40.120
Probably.
link |
00:44:40.960
My question is, whether given those units,
link |
00:44:46.240
whether without our human brains to interpret these units,
link |
00:44:50.360
they would still hold as much power as they have.
link |
00:44:53.480
Meaning, are those units enough
link |
00:44:56.200
when we give them to an alien species?
link |
00:44:58.840
Let me ask you.
link |
00:45:00.320
Do you understand digit images?
link |
00:45:06.280
No, I don't understand.
link |
00:45:07.600
No, no, no.
link |
00:45:08.640
When you can recognize these digit images,
link |
00:45:11.160
it means that you understand.
link |
00:45:13.320
Yes, exactly.
link |
00:45:14.160
You understand characters, you understand...
link |
00:45:17.280
No, no, no, no.
link |
00:45:22.720
It's the imitation versus understanding question,
link |
00:45:25.480
because I don't understand the mechanism
link |
00:45:28.360
by which I understand.
link |
00:45:29.200
No, no, no.
link |
00:45:30.040
I'm not talking about, I'm talking about predicates.
link |
00:45:32.760
You understand that it involves symmetry,
link |
00:45:35.120
maybe structure, maybe something else.
link |
00:45:37.400
I cannot formulate.
link |
00:45:38.720
I just was able to find symmetries, degree of symmetries.
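A "degree of symmetry" can be written down as a function of the image. The sketch below is one hypothetical choice, the fraction of pixels that agree with their left-right mirror image, applied to two made-up 4x4 binary images; it is not the predicate used in the talk.

```python
def horizontal_symmetry(image):
    """Fraction of pixels equal to their mirror image across the vertical axis."""
    matches = 0
    total = 0
    for row in image:
        for pixel, mirrored in zip(row, reversed(row)):
            matches += pixel == mirrored
            total += 1
    return matches / total

# A crude "0": identical to its mirror image.
zero_like = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

# A crude "7": a top bar plus a diagonal stroke, so much less symmetric.
seven_like = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]

symmetry_zero = horizontal_symmetry(zero_like)
symmetry_seven = horizontal_symmetry(seven_like)
```

A function like this can serve as a predicate psi(x) in an invariant, since it assigns one number to every image.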
link |
00:45:43.640
That's really good.
link |
00:45:44.480
So this is a good line.
link |
00:45:47.200
I feel like I understand the basic elements
link |
00:45:50.560
of what makes a good handwritten recognition system on my own.
link |
00:45:54.280
Like symmetry connects with me.
link |
00:45:56.440
It seems like that's a very powerful predicate.
link |
00:45:59.120
My question is, is there a lot more going on
link |
00:46:02.400
that we're not able to introspect?
link |
00:46:04.480
Maybe I need to be able to understand
link |
00:46:09.600
a huge amount in the world of ideas,
link |
00:46:14.520
thousands of predicates, millions of predicates
link |
00:46:18.400
in order to do handwritten recognition.
link |
00:46:20.600
I don't think so.
link |
00:46:23.200
So both your hope and your intuition
link |
00:46:26.560
are such that very few predicates are enough.
link |
00:46:28.960
You're using digits, you're using examples as well.
link |
00:46:33.480
Theory says that if you will use all possible functions
link |
00:46:43.480
from Hilbert space, all possible predicate,
link |
00:46:46.360
you don't need training data.
link |
00:46:49.000
You just will have admissible set of function
link |
00:46:53.840
which contain one function.
link |
00:46:56.060
Yes.
link |
00:46:57.160
So the trade off is when you're not using all predicates,
link |
00:47:01.160
you're only using a few good predicates
link |
00:47:03.040
you need to have some training data.
link |
00:47:05.000
Yes, exactly.
link |
00:47:06.800
The more good predicates you have,
link |
00:47:08.440
the less training data you need.
link |
00:47:09.680
Exactly.
link |
00:47:10.960
That is intelligence.
link |
00:47:13.280
Still, okay, I'm gonna keep asking the same dumb question,
link |
00:47:17.400
handwritten recognition to solve the challenge.
link |
00:47:20.200
You kind of propose a challenge that says
link |
00:47:21.920
we should be able to get state of the art MNIST error rates
link |
00:47:27.100
by using very few, 60, maybe fewer examples per digit.
link |
00:47:31.480
What kind of predicates do you think it will look like?
link |
00:47:35.920
That is the challenge.
link |
00:47:37.520
So people who will solve this problem,
link |
00:47:39.760
they will answer.
link |
00:47:41.480
Do you think they'll be able to answer it
link |
00:47:44.720
in a human explainable way?
link |
00:47:47.800
They just need to write function, that's it.
link |
00:47:50.760
But so can that function be written, I guess,
link |
00:47:54.280
by an automated reasoning system?
link |
00:47:58.680
Whether we're talking about a neural network
link |
00:48:01.080
learning a particular function or another mechanism?
link |
00:48:05.040
No, I'm not against neural network.
link |
00:48:08.520
I'm against admissible set of function
link |
00:48:11.600
which create neural network.
link |
00:48:13.720
You did it by hand.
link |
00:48:16.360
You don't do it by invariance, by predicate, by reason.
link |
00:48:24.600
But neural networks can then reverse,
link |
00:48:26.400
do the reverse step of helping you find a function
link |
00:48:29.840
that just, the task of a neural network
link |
00:48:33.600
is to find a disentangled representation, for example,
link |
00:48:38.160
that they call, is to find that one predicate function
link |
00:48:42.120
that really captures some kind of essence.
link |
00:48:45.180
One, not the entire essence, but one very useful essence
link |
00:48:48.600
of this particular visual space.
link |
00:48:52.640
Do you think that's possible?
link |
00:48:53.840
Listen, I'm grasping, hoping there's an automated way
link |
00:48:58.620
to find good predicates, right?
link |
00:49:00.300
So the question is what are the mechanisms
link |
00:49:03.000
of finding good predicates, ideas
link |
00:49:05.760
that you think we should pursue?
link |
00:49:08.040
A young grad student listening right now.
link |
00:49:11.240
I gave example.
link |
00:49:13.360
So find situation where predicate which you're suggesting
link |
00:49:23.480
don't create invariant.
link |
00:49:24.980
It's like in physics.
link |
00:49:28.820
Find situation where existing theory cannot explain it.
link |
00:49:37.180
Find situation where the existing theory
link |
00:49:39.420
can't explain it.
link |
00:49:40.260
So you're finding contradictions.
link |
00:49:42.780
Find contradiction, and then remove this contradiction.
link |
00:49:46.140
But in my case, what means contradiction,
link |
00:49:48.940
you find function which, if you will use this function,
link |
00:49:53.500
you're not keeping invariants.
link |
00:49:56.900
This is really the process of discovering contradictions.
link |
00:50:01.300
Yeah.
link |
00:50:04.060
It is like in physics.
link |
00:50:05.900
Find situation where you have contradiction
link |
00:50:09.800
for one of the property, for one of the predicate.
link |
00:50:15.500
Then include this predicate, making invariants,
link |
00:50:19.020
and solve again this problem.
link |
00:50:20.460
Now you don't have contradiction.
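The loop described here (find a predicate that is contradicted, include it as an invariant, solve again) can be sketched on a toy problem. Everything below is hypothetical: the candidate functions are interval rules on six points, the pool holds three hand-picked predicates, and "solving again" is simplified to filtering the candidate set and taking the first survivor.

```python
from itertools import combinations

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

def interval_rule(a, b):
    return lambda x: 1 if a <= x < b else 0

def gap(f, psi):
    """Sample average of psi(x) * (f(x) - y); zero when the invariant is kept."""
    return sum(psi(x) * (f(x) - y) for x, y in zip(xs, ys)) / len(xs)

pool = {
    "one": lambda x: 1.0,
    "x": lambda x: float(x),
    "x_squared": lambda x: float(x * x),
}
admissible = {(a, b): interval_rule(a, b)
              for a, b in combinations(range(7), 2)}
chosen = []

while True:
    # "Solve": take the first function still admissible.
    key, model = next(iter(admissible.items()))
    # Find the predicate with the largest contradiction for this function.
    name, worst = max(((n, abs(gap(model, p))) for n, p in pool.items()),
                      key=lambda item: item[1])
    if worst < 1e-9:
        break  # no contradiction left among the pooled predicates
    # Include that predicate as an invariant and solve again.
    chosen.append(name)
    psi = pool[name]
    admissible = {k: f for k, f in admissible.items() if abs(gap(f, psi)) < 1e-9}
```

The rule that generated the labels always survives the filtering, so the admissible set never becomes empty in this toy setting.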
link |
00:50:22.100
But it is not the best way, probably, I don't know,
link |
00:50:30.380
to looking for predicate.
link |
00:50:31.980
That's just one way, okay.
link |
00:50:33.580
That, no, no, it is brute force way.
link |
00:50:35.900
The brute force way.
link |
00:50:37.300
What about the ideas of what,
link |
00:50:42.300
big umbrella term of symbolic AI?
link |
00:50:45.660
There's what in the 80s with expert systems,
link |
00:50:48.540
sort of logic reasoning based systems.
link |
00:50:52.380
Is there hope there to find some,
link |
00:50:57.020
through sort of deductive reasoning,
link |
00:51:00.500
to find good predicates?
link |
00:51:05.540
I don't think so.
link |
00:51:08.980
I think that just logic is not enough.
link |
00:51:12.020
It's kind of a compelling notion, though.
link |
00:51:14.420
You know, that when smart people sit in a room
link |
00:51:17.620
and reason through things, it seems compelling.
link |
00:51:20.360
And making our machines do the same is also compelling.
link |
00:51:24.940
So, everything is very simple.
link |
00:51:29.420
When you have infinite number of predicate,
link |
00:51:34.100
you can choose the function you want.
link |
00:51:38.580
You have invariants and you can choose the function you want.
link |
00:51:41.660
But you have to have not too many invariants
link |
00:51:51.880
to solve the problem.
link |
00:51:56.200
So, and have from infinite number of function
link |
00:51:59.940
to select finite number
link |
00:52:04.120
and hopefully small number of functions,
link |
00:52:08.460
which is good enough to extract small set
link |
00:52:14.920
of admissible functions.
link |
00:52:17.920
So, they will be admissible, it's for sure,
link |
00:52:19.840
because every function just decrease set of function
link |
00:52:23.880
and leaving it admissible.
link |
00:52:25.680
But it will be small.
link |
00:52:27.720
But why do you think logic based systems don't,
link |
00:52:32.560
can't help, intuition, not?
link |
00:52:35.280
Because you should know reality.
link |
00:52:37.800
You should know life.
link |
00:52:39.480
This guy like Propp, he knows something.
link |
00:52:44.280
And he tried to put in invariant his understanding.
link |
00:52:49.400
That's the human, yeah, but see,
link |
00:52:51.600
you're putting too much value into Vladimir Propp
link |
00:52:56.480
knowing something.
link |
00:52:57.920
No, it is, in the story, what means you know life?
link |
00:53:04.420
What it means?
link |
00:53:05.400
You know common sense.
link |
00:53:07.040
No, no, you know something.
link |
00:53:10.400
Common sense, it is some rules.
link |
00:53:13.440
You think so?
link |
00:53:14.800
Common sense is simply rules?
link |
00:53:17.180
Common sense is every, it's mortality,
link |
00:53:21.800
it's fear of death, it's love, it's spirituality,
link |
00:53:27.880
it's happiness and sadness.
link |
00:53:30.840
All of it is tied up into understanding gravity,
link |
00:53:34.420
which is what we think of as common sense.
link |
00:53:36.840
I don't really need to discuss so wide.
link |
00:53:39.840
I want to discuss, understand digit recognition.
link |
00:53:45.440
Anytime I bring up love and death,
link |
00:53:47.640
you bring it back to digit recognition, I like it.
link |
00:53:51.160
No, you know, it is doable because there is a challenge.
link |
00:53:55.200
Yeah.
link |
00:53:56.040
Which I see how to solve it.
link |
00:53:59.260
If I will have a student concentrate on this work,
link |
00:54:02.520
I will suggest something to solve.
link |
00:54:04.800
You mean handwritten recognition?
link |
00:54:07.000
Yeah, it's a beautifully simple, elegant, and yet.
link |
00:54:10.800
I think that I know invariants which will solve this.
link |
00:54:13.440
You do?
link |
00:54:14.280
I think so, yes.
link |
00:54:15.920
But it is not universal, it is maybe,
link |
00:54:21.600
I want some universal invariants
link |
00:54:24.160
which are good not only for digit recognition,
link |
00:54:27.360
for image understanding.
link |
00:54:28.760
So let me ask, how hard do you think
link |
00:54:34.160
is 2D image understanding?
link |
00:54:38.360
So if we, we can kind of intuit handwritten recognition.
link |
00:54:43.800
How big of a step, leap, journey is it from that?
link |
00:54:49.160
If I gave you good, if I solved your challenge
link |
00:54:51.920
for handwritten recognition,
link |
00:54:53.600
how long would my journey then be from that
link |
00:54:56.480
to understanding more general, natural images?
link |
00:54:59.360
Immediately, you will understand this
link |
00:55:01.920
as soon as you will make a record.
link |
00:55:05.400
Because it is not for free.
link |
00:55:07.720
As soon as you will create several invariants
link |
00:55:13.000
which will help you to get the same performance
link |
00:55:20.120
that the best neural net did using 100,
link |
00:55:23.880
there might be more than 100 times less examples,
link |
00:55:27.760
you have to have something smart to do that.
link |
00:55:31.220
And you're saying?
link |
00:55:32.220
That is invariant, it is predicate.
link |
00:55:35.160
Because you should put some idea how to do that.
link |
00:55:39.420
But okay, let me just pause.
link |
00:55:42.380
Maybe it's a trivial point, maybe not.
link |
00:55:44.520
But handwritten recognition feels like a 2D,
link |
00:55:48.840
two dimensional problem.
link |
00:55:50.440
And it seems like how much complicated is the fact
link |
00:55:55.360
that most images are projection of a three dimensional world
link |
00:56:00.400
onto a 2D plane.
link |
00:56:03.100
It feels like for a three dimensional world,
link |
00:56:05.880
we need to start understanding common sense
link |
00:56:08.660
in order to understand an image.
link |
00:56:11.960
It's no longer visual shape and symmetry.
link |
00:56:17.480
It's having to start to understand concepts
link |
00:56:19.920
of, understand life.
link |
00:56:22.120
Yeah, you're talking that there are different invariant,
link |
00:56:27.320
different predicate, yeah.
link |
00:56:28.920
And potentially much larger number.
link |
00:56:32.480
You know, maybe, but let's start from simple.
link |
00:56:36.360
Yeah, but you said that it would be immediate.
link |
00:56:38.200
No, you know, I cannot think about things
link |
00:56:41.360
which I don't understand.
link |
00:56:43.280
This I understand, but I'm sure that I don't understand
link |
00:56:46.920
everything there.
link |
00:56:48.440
Yeah, that's the difference.
link |
00:56:50.440
Do as simple as possible, but not simpler.
link |
00:56:54.360
And that is exact case.
link |
00:56:56.520
With handwritten.
link |
00:56:57.440
With handwritten.
link |
00:56:58.940
Yeah, but that's the difference between you and I.
link |
00:57:04.880
I welcome and enjoy thinking about things
link |
00:57:07.920
I completely don't understand.
link |
00:57:09.880
Because to me, it's a natural extension
link |
00:57:12.380
without having solved handwritten recognition
link |
00:57:15.140
to wonder how difficult is the next step
link |
00:57:23.280
of understanding 2D, 3D images.
link |
00:57:25.680
Because ultimately, while the science of intelligence
link |
00:57:29.240
is fascinating, it's also fascinating to see
link |
00:57:31.680
how that maps to the engineering of intelligence.
link |
00:57:34.680
And recognizing handwritten digits is not,
link |
00:57:39.280
doesn't help you, it might, it may not help you
link |
00:57:43.080
with the problem of general intelligence.
link |
00:57:46.560
We don't know.
link |
00:57:47.400
It'll help you a little bit.
link |
00:57:48.240
We don't know how much.
link |
00:57:49.080
It's unclear.
link |
00:57:49.900
It's unclear.
link |
00:57:50.740
Yeah.
link |
00:57:51.580
It might very much.
link |
00:57:52.400
But I would like to make a remark.
link |
00:57:53.240
Yes.
link |
00:57:54.080
I start not from very primitive problem,
link |
00:57:58.760
make a challenge problem.
link |
00:58:03.120
I start with very general problem, with Plato.
link |
00:58:07.640
So you understand, and it comes from Plato
link |
00:58:10.640
to digit recognition.
link |
00:58:14.000
So you basically took Plato and the world
link |
00:58:18.120
of forms and ideas and mapped and projected
link |
00:58:22.080
into the clearest, simplest formulation
link |
00:58:25.380
of that big world.
link |
00:58:26.820
You know, I would say that I did not understand Plato
link |
00:58:31.560
until recently, and until I consider
link |
00:58:36.560
the convergence and then predicate,
link |
00:58:40.800
and then, oh, this is what Plato told.
link |
00:58:45.520
So.
link |
00:58:46.360
Can you linger on that?
link |
00:58:47.180
Like why, how do you think about this world of ideas
link |
00:58:50.200
and world of things in Plato?
link |
00:58:52.880
No, it is metaphor.
link |
00:58:54.160
It is.
link |
00:58:55.000
It's a metaphor, for sure.
link |
00:58:55.840
Yeah.
link |
00:58:56.680
It's a compelling, it's a poetic
link |
00:58:57.500
and a beautiful metaphor.
link |
00:58:58.340
Yeah, yeah, yeah.
link |
00:58:59.180
But what, can you?
link |
00:59:00.560
But it is a way how you should try to understand
link |
00:59:04.960
how to talk ideas in the world.
link |
00:59:07.880
So from my point of view,
link |
00:59:11.240
it is very clear, but it is a line.
link |
00:59:14.900
All the time, people looking for that.
link |
00:59:17.520
Say, Plato, then Hegel, whatever reasonable it exists,
link |
00:59:24.320
whatever exists, it is reasonable.
link |
00:59:26.700
I don't know what he have in mind reasonable.
link |
00:59:30.240
Right, these philosophers again,
link |
00:59:31.600
their words. No, no, no, no, no, no, no.
link |
00:59:33.320
It is next step, of Wigner.
link |
00:59:37.120
That mathematics understand something of reality.
link |
00:59:40.760
It is the same Plato line.
link |
00:59:43.440
And then it comes suddenly to Vladimir Propp.
link |
00:59:48.160
Look, 31 ideas, 31 units, and this corrects everything.
link |
00:59:54.320
There's abstractions, ideas that represent our world.
link |
00:59:59.320
Our world, and we should always try to reach into that.
link |
01:00:03.320
Yeah, but you should make a projection on reality.
link |
01:00:07.520
But understanding is, it is abstract ideas.
link |
01:00:11.820
You have in your mind several abstract ideas
link |
01:00:15.880
which you can apply to reality.
link |
01:00:17.760
And reality in this case,
link |
01:00:19.160
so if you look at machine learning as data.
link |
01:00:21.400
This example, data.
link |
01:00:22.720
Data.
link |
01:00:24.080
Okay, let me put this on you
link |
01:00:26.280
because I'm an emotional creature.
link |
01:00:28.320
I'm not a mathematical creature like you.
link |
01:00:30.800
I find compelling the idea,
link |
01:00:33.400
forget the space, the sea of functions.
link |
01:00:36.680
There's also a sea of data in the world.
link |
01:00:39.520
And I find compelling that there might be,
link |
01:00:42.320
like you said, teacher,
link |
01:00:44.640
small examples of data that are most useful
link |
01:00:49.240
for discovering good,
link |
01:00:53.000
whether it's predicates or good functions,
link |
01:00:55.560
that the selection of data may be a powerful journey,
link |
01:01:00.320
a useful, you know, coming up with a mechanism
link |
01:01:03.760
for selecting good data might be useful too.
link |
01:01:07.480
Do you find this idea of finding the right data set
link |
01:01:12.440
interesting at all?
link |
01:01:14.000
Or do you kind of take the data set as a given?
link |
01:01:17.760
I think that it is, you know, my theme is very simple.
link |
01:01:22.680
You have huge set of functions.
link |
01:01:25.900
If you will apply, and you have not too many data,
link |
01:01:31.500
if you pick up function which describes this data,
link |
01:01:37.560
you will do not very well.
link |
01:01:41.200
You will.
link |
01:01:42.040
Like randomly pick up.
link |
01:01:42.860
Yeah, you will overfit.
link |
01:01:43.700
Yeah, it will be overfitting.
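The overfitting argument can be illustrated with a toy comparison between an unrestricted set of functions and a small one. The data below is hypothetical and constructed so that the true rule is a threshold; the lookup table stands in for "any function that describes this data".

```python
train = [(0, 0), (2, 0), (4, 0), (6, 1), (8, 1), (10, 1)]
held_out = [(1, 0), (3, 0), (5, 1), (7, 1), (9, 1)]

# Unrestricted set: a lookup table can reproduce any labeling of the training points.
table = dict(train)
memorizer = lambda x: table.get(x, 0)  # unseen inputs fall back to 0 (arbitrary)

# Restricted set: threshold rules f_t(x) = 1 if x >= t, for t = 0..11.
def threshold_rule(t):
    return lambda x: 1 if x >= t else 0

def errors(f, data):
    return sum(f(x) != y for x, y in data)

# Pick the threshold with the fewest training errors (first one on ties).
best_t = min(range(12), key=lambda t: errors(threshold_rule(t), train))

memorizer_train_errors = errors(memorizer, train)
memorizer_held_out_errors = errors(memorizer, held_out)
threshold_held_out_errors = errors(threshold_rule(best_t), held_out)
```

Both functions describe the training examples perfectly, but only the one chosen from the small set carries over to the held-out points.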
link |
01:01:46.380
So you should decrease set of function
link |
01:01:50.160
from which you're picking up one.
link |
01:01:53.640
So you should go somehow to admissible set of function.
link |
01:01:59.560
And this is what weak convergence is about.
link |
01:02:03.800
So, but from another point of view,
link |
01:02:08.040
to make admissible set of function,
link |
01:02:13.200
you need just a DG, just function
link |
01:02:15.320
which you will take in inner product,
link |
01:02:19.400
which you will measure property of your function.
link |
01:02:27.440
And that is how it works.
link |
01:02:31.200
No, I get it, I get it, I understand it,
link |
01:02:32.720
but do you, the reality is.
link |
01:02:34.960
But let's think about examples.
link |
01:02:40.040
You have huge set of function,
link |
01:02:41.880
and you have several examples.
link |
01:02:44.640
If you just trying to keep, take function
link |
01:02:50.360
which satisfies these examples, you still will overfit.
link |
01:02:56.620
You need decrease, you need admissible set of function.
link |
01:02:59.320
Absolutely, but what, say you have more data than functions.
link |
01:03:06.120
So sort of consider the, I mean,
link |
01:03:08.280
maybe not more data than functions,
link |
01:03:09.760
because that's impossible.
link |
01:03:12.040
But what, I was trying to be poetic for a second.
link |
01:03:15.120
I mean, you have a huge amount of data,
link |
01:03:17.200
a huge amount of examples.
link |
01:03:19.840
But amount of function can be even bigger.
link |
01:03:22.440
It can get bigger, I understand.
link |
01:03:24.320
Everything is.
link |
01:03:25.520
There's always a bigger boat.
link |
01:03:27.560
Full Hilbert space.
link |
01:03:29.200
I got you, but okay.
link |
01:03:31.800
But you don't find the world of data
link |
01:03:35.800
to be an interesting optimization space.
link |
01:03:38.720
Like the optimization should be in the space of functions.
link |
01:03:45.040
Creating admissible set of functions.
link |
01:03:47.080
Admissible set of functions.
link |
01:03:48.120
No, you know, even from the classical VC theory,
link |
01:03:54.480
from structural risk minimization,
link |
01:03:56.400
you should organize function in the way
link |
01:04:02.240
that they will be useful for you.
link |
01:04:06.560
Right.
link |
01:04:07.560
And that is admissible set.
link |
01:04:10.280
The way you're thinking about useful
link |
01:04:13.560
is you're given a small set of examples.
link |
01:04:17.000
Useful small, small set of function
link |
01:04:19.040
which contain function I'm looking for.
link |
01:04:21.800
Yeah, but looking for based on
link |
01:04:25.320
the empirical set of small examples.
link |
01:04:27.640
Yeah, but that is another story.
link |
01:04:29.640
I don't touch it.
link |
01:04:31.160
Because I believe that this small examples
link |
01:04:35.720
is not too small.
link |
01:04:37.400
Say 60 per class.
link |
01:04:39.200
Law of large numbers works.
link |
01:04:41.360
I don't need uniform law.
link |
01:04:43.400
The story is that in statistics there are two law.
link |
01:04:46.740
Law of large numbers and uniform law of large numbers.
link |
01:04:51.120
So I want to be in situation where I use
link |
01:04:54.760
law of large numbers but not uniform law of large numbers.
link |
01:04:58.280
Right, so 60 is law of large, it's large enough.
link |
01:05:01.440
I hope, no, it still need some evaluations,
link |
01:05:05.640
some bounds.
link |
01:05:07.880
But the idea is the following that
link |
01:05:11.560
if you trust that
link |
01:05:15.580
say this average gives you something close to expectations
link |
01:05:21.080
so you can talk about that, about this predicate.
link |
01:05:26.240
And that is basis of human intelligence.
link |
01:05:30.720
Good predicates is the,
link |
01:05:32.280
the discovery of good predicates is the basis of human intelligence.
link |
01:05:34.880
It is discoverer of your understanding world.
link |
01:05:39.880
Of your methodology of understanding world.
link |
01:05:45.280
Because you have several function
link |
01:05:47.240
which you will apply to reality.
link |
01:05:51.200
Can you say that again?
link |
01:05:52.480
So you're...
link |
01:05:54.440
You have several functions predicate.
link |
01:05:58.680
But they're abstract.
link |
01:06:00.240
Yes.
link |
01:06:01.080
Then you will apply them to reality, to your data.
link |
01:06:04.360
And you will create in this way predicate.
link |
01:06:07.400
Which is useful for your task.
link |
01:06:11.420
But predicate are not related specifically to your task.
link |
01:06:16.840
To this your task.
link |
01:06:17.840
It is abstract functions.
link |
01:06:20.080
Which being applying, applied to...
link |
01:06:23.240
Many tasks that you might be interested in.
link |
01:06:25.280
It might be many tasks, I don't know.
link |
01:06:27.640
Or...
link |
01:06:28.640
Different tasks.
link |
01:06:29.960
Well they should be many tasks, right?
link |
01:06:31.640
I believe like, like in Propp's case.
link |
01:06:35.680
It was for fairytales, but it's happened everywhere.
link |
01:06:40.080
Okay, so we talked about images a little bit.
link |
01:06:42.160
But, can we talk about Noam Chomsky for a second?
link |
01:06:49.800
No, I believe I...
link |
01:06:52.280
I don't know him very well.
link |
01:06:54.240
Personally, well...
link |
01:06:55.680
Not personally, I don't know.
link |
01:06:57.040
His ideas.
link |
01:06:57.880
His ideas.
link |
01:06:58.720
Well let me just say,
link |
01:06:59.840
do you think language, human language,
link |
01:07:02.360
is essential to expressing ideas?
link |
01:07:05.760
As Noam Chomsky believes.
link |
01:07:08.320
So like, language is at the core
link |
01:07:10.080
of our formation of predicates.
link |
01:07:13.800
The human language.
link |
01:07:14.960
For me, language and all the story of language
link |
01:07:18.560
is very complicated.
link |
01:07:20.720
I don't understand this.
link |
01:07:22.920
And I am not...
link |
01:07:24.080
I thought about...
link |
01:07:25.680
Nobody does.
link |
01:07:26.520
I am not ready to work on that.
link |
01:07:28.260
Because it's so huge.
link |
01:07:30.720
It is not for me, and I believe not for our century.
link |
01:07:35.880
The 21st century.
link |
01:07:37.280
Not for 21st century.
link |
01:07:39.440
You should learn something, a lot of stuff,
link |
01:07:42.160
from simple task like digit recognition.
link |
01:07:45.040
So you think, okay, you think digit recognition,
link |
01:07:49.200
2D image, how would you more abstractly define
link |
01:07:55.120
digit recognition?
link |
01:07:56.440
It's 2D image, symbol recognition, essentially.
link |
01:08:03.760
I mean, I'm trying to get a sense,
link |
01:08:08.080
sort of thinking about it now,
link |
01:08:09.680
having worked with MNIST forever,
link |
01:08:12.880
how small of a subset is this
link |
01:08:16.040
of the general vision recognition problem
link |
01:08:18.560
and the general intelligence problem?
link |
01:08:21.580
Is it...
link |
01:08:24.360
Yeah.
link |
01:08:25.200
Is it a giant subset?
link |
01:08:26.360
Is it not?
link |
01:08:27.840
And how far away is language?
link |
01:08:30.200
You know, let me refer to Einstein.
link |
01:08:34.600
Take the simplest problem, as simple as possible,
link |
01:08:38.280
but not simpler.
link |
01:08:39.800
And this is challenge, this simple problem.
link |
01:08:44.280
But it's simple by idea, but not simple to get it.
link |
01:08:50.360
When you will do this, you will find some predicate,
link |
01:08:55.360
which helps it a bit.
link |
01:08:57.160
Well, yeah, I mean, with Einstein, you can,
link |
01:09:01.320
you look at general relativity,
link |
01:09:04.120
but that doesn't help you with quantum mechanics.
link |
01:09:07.280
That's another story.
link |
01:09:08.760
You don't have any universal instrument.
link |
01:09:11.840
Yes, so I'm trying to wonder which space we're in,
link |
01:09:16.520
whether handwritten recognition is like general relativity,
link |
01:09:21.120
and then language is like quantum mechanics.
link |
01:09:23.160
So you're still gonna have to do a lot of mess
link |
01:09:27.000
to universalize it.
link |
01:09:28.720
But I'm trying to see,
link |
01:09:35.120
so what's your intuition why handwritten recognition
link |
01:09:39.160
is easier than language?
link |
01:09:42.020
Just, I think a lot of people would agree with that,
link |
01:09:45.320
but if you could elucidate sort of the intuition of why.
link |
01:09:50.200
I don't know, no, I don't think in this direction.
link |
01:09:56.460
I just think in directions that this is problem,
link |
01:10:00.880
which if we will solve it well,
link |
01:10:07.760
we will create some abstract understanding of images.
link |
01:10:18.040
Maybe not all images.
link |
01:10:19.680
I would like to talk to guys who doing in real images
link |
01:10:24.000
in Columbia University.
link |
01:10:26.280
What kind of images, unreal?
link |
01:10:28.400
Real images.
link |
01:10:29.240
Real images.
link |
01:10:30.060
Yeah, what they're ready, is there a predicate,
link |
01:10:33.400
what can be predicate?
link |
01:10:35.160
I think still symmetry will play role in real life images,
link |
01:10:40.960
in any real life images, 2D images.
link |
01:10:43.920
Let's talk about 2D images.
link |
01:10:46.320
Because that's what we know.
link |
01:10:52.520
A neural network was created for 2D images.
link |
01:10:55.880
So the people I know in vision science, for example,
link |
01:10:58.680
the people who study human vision,
link |
01:11:01.000
that they usually go to the world of symbols
link |
01:11:04.520
and like handwritten recognition,
link |
01:11:06.360
but not really, it's other kinds of symbols
link |
01:11:08.480
to study our visual perception system.
link |
01:11:11.560
As far as I know, not much predicate type of thinking
link |
01:11:15.160
is understood about our vision system.
link |
01:11:17.640
They did not think in this direction.
link |
01:11:19.400
They don't, yeah, but how do you even begin
link |
01:11:21.720
to think in that direction?
link |
01:11:23.480
That's a, I would like to discuss with them.
link |
01:11:26.920
Yeah.
link |
01:11:27.760
Because if we will be able to show that it is what working,
link |
01:11:35.600
and theoretical scheme, it's not so bad.
link |
01:11:40.360
So the unfortunate, so if we compare to language,
link |
01:11:43.360
language is like letters, finite set of letters,
link |
01:11:46.520
and a finite set of ways you can put together those letters.
link |
01:11:50.480
So it feels more amenable to kind of analysis.
link |
01:11:53.720
With natural images, there is so many pixels.
link |
01:11:58.680
No, no, no, letter, language is much, much more complicated.
link |
01:12:03.680
It's involved a lot of different stuff.
link |
01:12:08.040
It's not just understanding of very simple class of tasks.
link |
01:12:15.280
I would like to see list of task with language involved.
link |
01:12:19.960
Yes, so there's a lot of nice benchmarks now
link |
01:12:23.200
in natural language processing from the very trivial,
link |
01:12:27.400
like understanding the elements of a sentence,
link |
01:12:30.200
to question answering, to much more complicated
link |
01:12:33.040
where you talk about open domain dialogue.
link |
01:12:36.120
The natural question is, with handwritten recognition,
link |
01:12:39.240
is really the first step of understanding
link |
01:12:42.960
visual information.
link |
01:12:44.600
Right.
link |
01:12:46.440
But even our records show that we go in the wrong direction
link |
01:12:54.160
because we need 60,000 digits.
link |
01:12:56.600
So even this first step, so forget about talking
link |
01:12:59.680
about the full journey, this first step
link |
01:13:01.880
should be taking in the right direction.
link |
01:13:03.280
No, no, wrong direction because 60,000 is unacceptable.
link |
01:13:07.160
No, I'm saying it should be taken in the right direction
link |
01:13:11.000
because 60,000 is not acceptable.
link |
01:13:13.640
If you can talk, it's great, we have half percent of error.
link |
01:13:18.440
And hopefully the step from doing handwritten recognition
link |
01:13:22.720
using very few examples, the step towards what babies do
link |
01:13:26.760
when they crawl and understand their physical environment.
link |
01:13:30.160
I know you don't know about babies.
link |
01:13:31.720
If you will do from very small examples,
link |
01:13:36.040
you will find principles which are different
link |
01:13:40.520
from what we're using now.
link |
01:13:44.440
And so it's more or less clear.
link |
01:13:48.320
That means that you will use weak convergence,
link |
01:13:52.240
not just strong convergence.
link |
01:13:54.440
Do you think these principles
link |
01:13:58.440
will naturally be human interpretable?
link |
01:14:01.640
Oh, yeah.
link |
01:14:02.560
So like when we'll be able to explain them
link |
01:14:04.480
and have a nice presentation to show
link |
01:14:06.240
what those principles are, or are they very,
link |
01:14:10.760
going to be very kind of abstract kinds of functions?
link |
01:14:14.440
For example, I talked yesterday about symmetry.
link |
01:14:17.640
Yes.
link |
01:14:18.680
And I gave very simple examples.
link |
01:14:20.440
The same will be like that.
link |
01:14:22.000
You gave like a predicate of a basic for?
link |
01:14:24.680
For symmetries.
link |
01:14:25.760
Yes, for different symmetries and you have for?
link |
01:14:29.520
Degree of symmetries, that is important.
link |
01:14:31.840
Not just symmetry.
link |
01:14:33.680
Existence doesn't exist, degree of symmetry.
link |
01:14:38.360
Yeah, for handwritten recognition.
link |
01:14:41.320
No, it's not for handwritten, it's for any images.
link |
01:14:45.160
But I would like apply to handwritten.
link |
01:14:47.720
Right, in theory it's more general, okay, okay.
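One way to picture a "degree of symmetry" rather than a yes/no symmetry test is the small sketch below. It is only an illustration under stated assumptions: the function name `vertical_symmetry_degree`, the normalization, and the toy images are hypothetical and are not taken from the lecture or from any specific predicate Vapnik defined. The score is 1.0 for an image that equals its left-right mirror and falls toward 0.0 as the two differ.

```python
def vertical_symmetry_degree(image):
    """Return a score in [0, 1]: 1.0 means the image equals its
    left-right mirror, 0.0 means the image and mirror never overlap.

    `image` is a list of rows of numeric pixel intensities.
    This normalization is an illustrative assumption, not a
    formula from the conversation.
    """
    diff = 0.0   # total absolute difference between image and mirror
    total = 0.0  # total intensity mass, used to normalize the score
    for row in image:
        mirrored = row[::-1]
        for a, b in zip(row, mirrored):
            diff += abs(a - b)
            total += abs(a) + abs(b)
    if total == 0:
        return 1.0  # an empty image is trivially symmetric
    return 1.0 - diff / total


# Hypothetical toy images (not real digit data).
symmetric = [[0, 1, 1, 0],
             [1, 0, 0, 1],
             [0, 1, 1, 0]]
one_sided = [[1, 1, 0, 0],
             [1, 0, 0, 0],
             [1, 1, 0, 0]]
partial = [[1, 1, 1, 0]]

print(vertical_symmetry_degree(symmetric))
print(vertical_symmetry_degree(one_sided))
print(vertical_symmetry_degree(partial))
```

A graded score like this could, in principle, be averaged over the examples of a class and compared against what a candidate function predicts, which is the role the conversation assigns to predicates. Other axes (horizontal, diagonal) would need their own mirror operation.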
link |
01:14:55.280
So a lot of the things we've been talking about
link |
01:14:58.160
falls, we've been talking about philosophy a little bit,
link |
01:15:01.800
but also about mathematics and statistics.
link |
01:15:05.480
A lot of it falls into this idea,
link |
01:15:08.040
a universal idea of statistical theory of learning.
link |
01:15:11.760
What is the most beautiful and sort of powerful
link |
01:15:16.760
or essential idea you've come across,
link |
01:15:19.080
even just for yourself personally in the world
link |
01:15:22.040
of statistics or statistical theory of learning?
link |
01:15:25.440
Probably uniform convergence, which we did
link |
01:15:29.480
with Alexey Chervonenkis.
link |
01:15:33.000
Can you describe uniform convergence?
link |
01:15:36.080
You have law of large numbers.
link |
01:15:40.080
So for any function, expectation of function,
link |
01:15:44.480
average of function converged to expectation.
link |
01:15:48.120
But if you have set of functions,
link |
01:15:50.520
for any function it is true.
link |
01:15:52.340
But it should converge simultaneously
link |
01:15:55.580
for all set of functions.
link |
01:15:59.020
And for learning, you need uniform convergence.
link |
01:16:06.700
Just convergence is not enough.
link |
01:16:11.220
Because when you pick up one which gives minimum,
link |
01:16:16.660
you can pick up one function which does not converge
link |
01:16:21.660
and it will give you the best answer for this function.
link |
01:16:31.460
So you need uniform convergence to guarantee learning.
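The gap between the ordinary law of large numbers and the uniform law can be seen in a small simulation. This is a minimal sketch under assumptions of its own: labels are fair coin flips, every "function" predicts by coin flip too, so every function has true error 0.5. The names `empirical_errors`, `single_error`, and `best_error`, and the choice of 60 samples and 5000 functions, are illustrative only.

```python
import random


def empirical_errors(n_samples, n_functions, seed=0):
    """Empirical error of many random classifiers on one random sample.

    Each classifier guesses labels by coin flip, so its true error
    is exactly 0.5 regardless of the sample.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    errors = []
    for _ in range(n_functions):
        preds = [rng.randint(0, 1) for _ in range(n_samples)]
        mistakes = sum(p != y for p, y in zip(preds, labels))
        errors.append(mistakes / n_samples)
    return errors


errors = empirical_errors(n_samples=60, n_functions=5000)

# One function fixed in advance: its average is close to 0.5
# (ordinary law of large numbers).
single_error = errors[0]

# The function chosen because it minimizes empirical error: its
# empirical error is far below its true error of 0.5.
best_error = min(errors)

print("fixed function:", single_error)
print("empirical minimizer:", best_error)
```

The minimizer looks good on the sample only because it was selected from a large set, which is why a guarantee for each function separately is not enough and the deviation has to be controlled simultaneously over the whole set.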
link |
01:16:34.900
So learning does not rely on trivial law of large numbers,
link |
01:16:40.220
it relies on uniform law.
link |
01:16:42.940
But idea of convergence exists in statistics for a long time.
link |
01:16:51.940
But it is interesting that as I think about myself,
link |
01:17:02.140
how stupid I was 50 years, I did not see weak convergence.
link |
01:17:08.160
I work on strong convergence.
link |
01:17:10.940
But now I think that most powerful is weak convergence.
link |
01:17:15.260
Because it makes admissible set of functions.
link |
01:17:18.860
And even in old proverbs,
link |
01:17:22.720
when people try to understand recognition about duck,
link |
01:17:28.300
looks like a duck and so on, they use weak convergence.
link |
01:17:32.400
People in language, they understand this.
link |
01:17:34.600
But when we're trying to create artificial intelligence,
link |
01:17:42.260
we went in different way.
link |
01:17:46.220
We just consider strong convergence arguments.
link |
01:17:50.540
So reducing the set of admissible functions,
link |
01:17:52.740
you think there should be effort put into understanding
link |
01:17:58.780
the properties of weak convergence?
link |
01:18:01.260
You know, in classical mathematics, in Hilbert space,
link |
01:18:07.260
there are only two ways,
link |
01:18:08.820
two form of convergence, strong and weak.
link |
01:18:14.180
Now we can use both.
link |
01:18:16.900
That means that we did everything.
link |
01:18:21.180
And it so happened that when we use Hilbert space,
link |
01:18:26.180
which is very rich space, space of continuous functions,
link |
01:18:34.780
which has integral and square.
link |
01:18:38.020
So we can apply weak and strong convergence for learning
link |
01:18:42.420
and have closed form solution.
link |
01:18:45.140
So for computationally simple.
link |
01:18:47.660
For me, it is sign that it is right way.
link |
01:18:51.080
Because you don't need any heuristic here,
link |
01:18:55.740
just do whatever you want.
link |
01:18:59.620
But now the only what left is this concept
link |
01:19:03.380
of what is predicate, but it is not statistics.
link |
01:19:08.020
By the way, I like the fact that you think that heuristics
link |
01:19:11.660
are a mess that should be removed from the system.
link |
01:19:14.900
So closed form solution is the ultimate goal.
link |
01:19:18.460
No, it so happened that when you're using right instrument,
link |
01:19:23.980
you have closed form solution.
link |
01:19:28.500
Do you think intelligence, human level intelligence,
link |
01:19:32.780
when we create it,
link |
01:19:37.660
will have something like a closed form solution?
link |
01:19:42.360
You know, now I'm looking on bounds,
link |
01:19:46.380
which I gave bounds for convergence.
link |
01:19:51.220
And when I'm looking for bounds,
link |
01:19:53.900
I'm thinking what is the most appropriate kernel
link |
01:19:59.620
for this bound would be.
link |
01:20:02.500
So we know that in say,
link |
01:20:05.960
all our businesses, we use radial basis function.
link |
01:20:11.460
But looking on the bound,
link |
01:20:13.220
I think that I start to understand that maybe
link |
01:20:17.140
we need to make corrections to radial basis function
link |
01:20:21.140
to be closer to work better for this bounds.
link |
01:20:28.440
So I'm again trying to understand what type of kernel
link |
01:20:33.940
have best approximation,
link |
01:20:37.580
best fit to this bound.
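For readers unfamiliar with the radial basis function kernel mentioned here, the standard Gaussian form is k(x, y) = exp(-gamma * ||x - y||^2). The sketch below shows only that textbook form; the parameter name `gamma` and its default are conventional choices, and the corrections to the kernel that the speaker is considering are not specified in the conversation, so nothing here attempts them.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian radial basis function kernel between two vectors.

    Returns exp(-gamma * squared Euclidean distance). The value is
    1.0 when x == y and decays toward 0.0 as the points move apart.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical points give the maximum similarity of 1.0.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))

# More distant points give smaller values; gamma sets how fast.
print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=5.0))
```

The kernel depends only on the distance between the two points, which is what "radial" refers to.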
link |
01:20:43.420
Sure, so there's a lot of interesting work
link |
01:20:45.580
that could be done in discovering better functions
link |
01:20:47.780
than radial basis functions for bounds you find.
link |
01:20:53.160
It still comes from,
link |
01:20:55.860
you're looking to math and trying to understand what.
link |
01:21:00.220
From your own mind, looking at the, I don't know.
link |
01:21:03.540
Then I'm trying to understand what will be good for that.
link |
01:21:11.260
Yeah, but to me, there's still a beauty.
link |
01:21:14.020
Again, maybe I'm a descendant of Alan Turing to heuristics.
link |
01:21:17.980
To me, ultimately, intelligence will be a mess of heuristics.
link |
01:21:23.620
And that's the engineering answer, I guess.
link |
01:21:26.300
Absolutely.
link |
01:21:27.460
When you're doing say, self driving cars,
link |
01:21:31.060
the great guy who will do this.
link |
01:21:35.020
It doesn't matter what theory behind that.
link |
01:21:40.640
Who has a better feeling how to apply it.
link |
01:21:43.800
But by the way, it is the same story about predicates.
link |
01:21:50.400
Because you cannot create rule for,
link |
01:21:53.880
situation is much more than you have rule for that.
link |
01:21:56.660
But maybe you can have more abstract rule
link |
01:22:04.780
than it will be less literal.
link |
01:22:08.780
It is the same story about ideas
link |
01:22:10.820
and ideas applied to specific cases.
link |
01:22:16.500
But still you should reach.
link |
01:22:17.340
You cannot avoid this.
link |
01:22:18.900
Yes, of course.
link |
01:22:19.740
But you should still reach for the ideas
link |
01:22:21.620
to understand the science.
link |
01:22:22.940
Okay, let me kind of ask, do you think neural networks
link |
01:22:27.980
or functions can be made to reason?
link |
01:22:34.100
So what do you think, we've been talking about intelligence,
link |
01:22:37.100
but this idea of reasoning,
link |
01:22:39.620
there's an element of sequentially disassembling,
link |
01:22:44.500
interpreting the images.
link |
01:22:48.380
So when you think of handwritten recognition, we kind of think
link |
01:22:54.100
that there'll be a single, there's an input and output.
link |
01:22:56.940
There's not a recurrence.
link |
01:23:01.060
What do you think about sort of the idea of recurrence,
link |
01:23:04.440
of going back to memory and thinking through this
link |
01:23:06.860
sort of sequentially mangling the different representations
link |
01:23:11.860
over and over until you arrive at a conclusion?
link |
01:23:20.100
Or is ultimately all that can be wrapped up into a function?
link |
01:23:23.460
No, you're suggesting that let us use this type of algorithm.
link |
01:23:29.860
When I started thinking, I first of all,
link |
01:23:33.300
starting to understand what I want.
link |
01:23:36.580
Can I write down what I want?
link |
01:23:39.560
And then I'm trying to formalize.
link |
01:23:45.020
And when I do that, I think I have to solve this problem.
link |
01:23:52.120
And till now I did not see a situation where you need recurrence.
link |
01:24:04.280
But do you observe human beings?
link |
01:24:07.840
Yeah.
link |
01:24:08.680
You try to, it's the imitation question, right?
link |
01:24:12.400
It seems that human beings reason
link |
01:24:14.880
this kind of sequentially sort of,
link |
01:24:20.680
does that inspire in you a thought that we need to add that
link |
01:24:24.120
into our intelligence systems?
link |
01:24:30.760
You're saying, okay, I mean, you've kind of answered saying
link |
01:24:34.440
until now I haven't seen a need for it.
link |
01:24:37.040
And so because of that, you don't see a reason
link |
01:24:40.080
to think about it.
link |
01:24:41.740
You know, most of things I don't understand.
link |
01:24:45.880
In reasoning in human, it is for me too complicated.
link |
01:24:52.740
For me, the most difficult part is to ask questions,
link |
01:25:01.160
to good questions, how it works,
link |
01:25:03.900
how people asking questions, I don't know this.
link |
01:25:11.720
You said that machine learning is not only
link |
01:25:13.640
about technical things, speaking of questions,
link |
01:25:16.480
but it's also about philosophy.
link |
01:25:19.720
So what role does philosophy play in machine learning?
link |
01:25:23.480
We talked about Plato, but generally thinking
link |
01:25:28.240
in this philosophical way, does it have,
link |
01:25:32.480
how does philosophy and math fit together in your mind?
link |
01:25:36.640
First ideas and then their implementation.
link |
01:25:39.520
It's like predicate, like say admissible set of functions.
link |
01:25:48.940
It comes together, everything.
link |
01:25:51.500
Because the first iteration of theory was done 50 years ago.
link |
01:25:58.360
I told that, this is theory.
link |
01:26:00.380
So everything's there, if you have data you can,
link |
01:26:04.080
and your set of function has not big capacity.
link |
01:26:13.600
So low VC dimension, you can do that.
link |
01:26:15.760
You can make structural risk minimization, control capacity.
link |
01:26:21.140
But you was not able to make admissible set of function good.
link |
01:26:26.140
Now when suddenly realize that we did not use
link |
01:26:33.680
another idea of convergence, which we can,
link |
01:26:39.480
everything comes together.
link |
01:26:41.480
But those are mathematical notions.
link |
01:26:43.320
Philosophy plays a role of simply saying
link |
01:26:48.000
that we should be swimming in the space of ideas.
link |
01:26:52.080
Let's talk what is philosophy.
link |
01:26:54.320
Philosophy means understanding of life.
link |
01:26:58.080
So understanding of life, say people like Plato,
link |
01:27:03.480
they understand on very high abstract level of life.
link |
01:27:07.640
So, and whatever I doing,
link |
01:27:12.040
just implementation of my understanding of life.
link |
01:27:16.740
But every new step, it is very difficult.
link |
01:27:21.400
For example, to find this idea
link |
01:27:28.880
that we need weak convergence was not simple for me.
link |
01:27:40.600
So that required thinking about life a little bit.
link |
01:27:44.260
Hard to trace, but there was some thought process.
link |
01:27:48.840
I'm working, I'm thinking about the same problem
link |
01:27:52.960
for 50 years or more, and again, and again, and again.
link |
01:28:00.020
I'm trying to be honest and that is very important.
link |
01:28:02.680
Not to be very enthusiastic, but concentrate
link |
01:28:06.320
on whatever we was not able to achieve, for example.
link |
01:28:12.040
And understand why.
link |
01:28:13.360
And now I understand that because I believe in math,
link |
01:28:18.920
I believe that in Wigner's idea.
link |
01:28:23.740
But now when I see that there are only two way
link |
01:28:28.720
of convergence and we're using both,
link |
01:28:32.960
that means that we must do as well as people doing.
link |
01:28:37.960
But now, exactly in philosophy
link |
01:28:42.880
and what we know about predicate,
link |
01:28:45.760
how we understand life, can we describe as a predicate.
link |
01:28:51.400
I thought about that and that is more or less obvious
link |
01:28:57.840
level of symmetry.
link |
01:29:00.760
But next, I have a feeling,
link |
01:29:05.100
it's something about structures.
link |
01:29:09.540
But I don't know how to formulate,
link |
01:29:11.820
how to measure measure of structure and all this stuff.
link |
01:29:16.180
And the guy who will solve this challenge problem,
link |
01:29:22.220
then when we were looking how he did it,
link |
01:29:27.060
probably just only symmetry is not enough.
link |
01:29:30.340
But something like symmetry will be there.
link |
01:29:33.980
Structure will be there.
link |
01:29:34.820
Oh yeah, absolutely.
link |
01:29:35.640
Symmetry will be there and level of symmetry will be there.
link |
01:29:40.760
And level of symmetry, antisymmetry, diagonal, vertical.
link |
01:29:44.740
And I even don't know how you can use
link |
01:29:48.780
in different direction idea of symmetry, it's very general.
link |
01:29:52.300
But it will be there.
link |
01:29:54.940
I think that people very sensitive to idea of symmetry.
link |
01:29:58.600
But there are several ideas like symmetry.
link |
01:30:04.900
As I would like to learn.
link |
01:30:07.020
But you cannot learn just thinking about that.
link |
01:30:11.820
You should do challenging problems
link |
01:30:14.100
and then analyze them, why it was able to solve them.
link |
01:30:20.240
And then you will see.
link |
01:30:22.740
Very simple things, it's not easy to find.
link |
01:30:25.420
But even with talking about this every time.
link |
01:30:32.900
I was surprised, I tried to understand.
link |
01:30:36.340
These people describe in language
link |
01:30:40.120
strong convergence mechanism for learning.
link |
01:30:44.460
I did not see, I don't know.
link |
01:30:46.660
But weak convergence, this duck story
link |
01:30:50.100
and story like that when you will explain to kid,
link |
01:30:54.700
you will use weak convergence argument.
link |
01:30:57.620
It looks like a duck, it does like a duck does.
link |
01:31:00.900
But when you try to formalize, you're just ignoring this.
link |
01:31:05.820
Why, why 50 years from start of machine learning?
link |
01:31:10.140
And that's the role of philosophy, thinking about life.
link |
01:31:12.420
I think that maybe, I don't know.
link |
01:31:18.300
Maybe this is theory also, we should blame for that
link |
01:31:22.780
because empirical risk minimization and all this stuff.
link |
01:31:27.100
And if you read now textbooks,
link |
01:31:30.660
they just about bound about empirical risk minimization.
link |
01:31:34.420
They don't looking for another problem like admissible set.
link |
01:31:41.820
But on the topic of life, perhaps we,
link |
01:31:47.340
you could talk in Russian for a little bit.
link |
01:31:50.020
What's your favorite memory from childhood?
link |
01:31:53.180
What's your favorite memory from childhood?
link |
01:31:56.740
Oh, music.
link |
01:31:59.500
How about, can you try to answer in Russian?
link |
01:32:02.700
Music?
link |
01:32:04.980
It was very cool when...
link |
01:32:08.100
What kind of music?
link |
01:32:09.980
Classic music.
link |
01:32:11.860
What's your favorite?
link |
01:32:13.340
Well, different composers.
link |
01:32:15.900
At first, it was Vivaldi, I was surprised that it was possible.
link |
01:32:23.500
And then when I understood Bach, I was absolutely shocked.
link |
01:32:29.020
By the way, from him I think that there is a predicate,
link |
01:32:35.180
like a structure.
link |
01:32:36.740
In Bach?
link |
01:32:37.580
Well, of course.
link |
01:32:38.420
Because you can just feel the structure.
link |
01:32:42.700
And I don't think that different elements of life
link |
01:32:49.020
are very much divided, in the sense of predicates.
link |
01:32:53.020
Everywhere structure, in painting structure,
link |
01:32:56.900
in human relations structure.
link |
01:32:59.820
Here's how to find these high level predicates, it's...
link |
01:33:05.540
In Bach and in life, everything is connected.
link |
01:33:08.460
Now that we're talking about Bach,
link |
01:33:14.100
let's switch back to English,
link |
01:33:15.700
because I like Beethoven and Chopin, so...
link |
01:33:18.580
Well, Chopin, it's another amusing story.
link |
01:33:21.300
But Bach, if we talk about predicates,
link |
01:33:23.940
Bach probably has the most sort of
link |
01:33:29.300
well defined predicates that underlie it.
link |
01:33:31.860
It is very interesting to read what critics
link |
01:33:36.860
are writing about Bach, which words they're using.
link |
01:33:40.460
They're trying to describe predicates.
link |
01:33:43.500
And then Chopin, it is very different vocabulary,
link |
01:33:52.100
very different predicates.
link |
01:33:55.140
And I think that if you will make collection of that,
link |
01:34:02.700
so maybe from this you can describe predicate
link |
01:34:05.860
for digit recognition as well.
link |
01:34:08.780
From Bach and Chopin.
link |
01:34:10.460
No, no, no, not from Bach and Chopin.
link |
01:34:12.540
From the critic interpretation of the music, yeah.
link |
01:34:15.260
When they're trying to explain you music, what they use.
link |
01:34:22.300
As they use, they describe high level ideas
link |
01:34:25.260
of Plato's ideas, what behind this music.
link |
01:34:28.900
That's brilliant.
link |
01:34:29.740
So art is not self explanatory in some sense.
link |
01:34:34.740
So you have to try to convert it into ideas.
link |
01:34:39.060
It is ill-posed problems.
link |
01:34:40.940
When you go from ideas to the representation,
link |
01:34:46.060
it is easy way.
link |
01:34:47.580
But when you're trying to go back, it is ill-posed problems.
link |
01:34:51.420
But nevertheless, I believe that when you're looking
link |
01:34:55.660
from that, even from art, you will be able to find
link |
01:35:00.340
predicates for digit recognition.
link |
01:35:02.100
That's such a fascinating and powerful notion.
link |
01:35:08.500
Do you ponder your own mortality?
link |
01:35:11.660
Do you think about it?
link |
01:35:12.540
Do you fear it?
link |
01:35:13.660
Do you draw insight from it?
link |
01:35:16.820
About mortality, no, yeah.
link |
01:35:21.540
Are you afraid of death?
link |
01:35:25.860
Not too much, not too much.
link |
01:35:29.660
It is pity that I will not be able to do something
link |
01:35:33.700
which I think I have a feeling to do that.
link |
01:35:39.460
For example, I will be very happy to work with guys
link |
01:35:48.020
theoreticians from music, to write this collection
link |
01:35:52.060
of description, how they describe music,
link |
01:35:55.060
how they use that predicate, and from art as well.
link |
01:36:00.140
Then take what is in common and try to understand
link |
01:36:04.580
predicate which is absolute for everything.
link |
01:36:08.660
And then use that for visual recognition
link |
01:36:10.460
and see if there is a connection.
link |
01:36:12.180
Yeah, exactly.
link |
01:36:13.540
Ah, there's still time.
link |
01:36:14.660
We got time.
link |
01:36:16.980
Ha ha ha ha.
link |
01:36:18.660
Yeah.
link |
01:36:19.500
We got time.
link |
01:36:20.340
It takes years and years and years.
link |
01:36:24.100
Yes, yeah, it's a long way.
link |
01:36:26.460
Well, see, you've got the patient mathematician's mind.
link |
01:36:30.900
I think it could be done very quickly and very beautifully.
link |
01:36:34.060
I think it's a really elegant idea.
link |
01:36:35.820
Yeah, but also.
link |
01:36:36.940
Some of many.
link |
01:36:37.780
Yeah, you know, the most time,
link |
01:36:40.580
it is not to make this collection, but to understand
link |
01:36:45.280
what is in common, to think about that once again
link |
01:36:48.700
and again and again.
link |
01:36:49.540
Again and again and again, but I think sometimes,
link |
01:36:52.660
especially just when you say this idea now,
link |
01:36:55.700
even just putting together the collection
link |
01:36:58.780
and looking at the different sets of data,
link |
01:37:03.300
language, trying to interpret music,
link |
01:37:05.520
criticize music, and images,
link |
01:37:08.740
I think there'll be sparks of ideas that'll come.
link |
01:37:10.940
Of course, again and again, you'll come up with better ideas,
link |
01:37:13.420
but even just that notion is a beautiful notion.
link |
01:37:16.940
I even have some example.
link |
01:37:19.340
Yes, so I have friend
link |
01:37:25.200
who was specialist in Russian poetry.
link |
01:37:30.960
She is professor of Russian poetry.
link |
01:37:35.320
She did not write poems,
link |
01:37:39.400
but she know a lot of stuff.
link |
01:37:43.340
She make book, several books,
link |
01:37:48.080
and one of them is a collection of Russian poetry.
link |
01:37:54.680
She have images of Russian poetry.
link |
01:37:57.140
She collect all images of Russian poetry.
link |
01:38:00.720
And I ask her to do following.
link |
01:38:05.420
You have MNIST, digit recognition,
link |
01:38:09.720
and we get 100 digits,
link |
01:38:13.520
or maybe less than 100.
link |
01:38:15.280
I don't remember, maybe 50 digits.
link |
01:38:18.920
And try from poetical point of view,
link |
01:38:21.680
describe every image which she see,
link |
01:38:25.260
using only words of images of Russian poetry.
link |
01:38:31.320
And she did it.
link |
01:38:34.280
And then we tried to,
link |
01:38:41.140
I call it learning using privileged information.
link |
01:38:43.600
I call it privileged information.
link |
01:38:45.920
You have two languages.
link |
01:38:48.040
One language is just image of digit,
link |
01:38:53.140
and another language, poetic description of this image.
link |
01:38:57.760
And this is privileged information.
link |
01:39:02.360
And there is an algorithm when you're working
link |
01:39:04.520
using privileged information, you're doing better.
link |
01:39:08.320
Much better, so.
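
The setup described here, where each training example comes with an extra description that is not available at test time, can be sketched in a few lines. This is only a minimal illustration on synthetic data and not the algorithm used in the experiment being discussed: it swaps in a simple teacher-student scheme, where a teacher trained on the privileged description produces soft labels for a student that sees only the "image" features. All names (`fit_logistic`, `train_with_privileged`, `mix`) and the data itself are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, targets, lr=0.5, steps=500):
    """Gradient descent on cross-entropy; targets may be soft (in [0, 1])."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ w) - targets) / len(targets)
        w -= lr * grad
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return sigmoid(Xb @ w)

def train_with_privileged(X, X_star, y, mix=0.5):
    """X: ordinary features; X_star: privileged features (training only)."""
    teacher = fit_logistic(X_star, y)            # teacher sees the description
    soft = predict_proba(teacher, X_star)        # teacher's soft labels
    targets = mix * y + (1.0 - mix) * soft       # blend hard and soft labels
    student = fit_logistic(X, targets)           # student sees only X
    return student, soft

# Synthetic stand-in data (an assumption, not the digit/poetry collection).
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n).astype(float)
X_star = (2 * y - 1)[:, None] + 0.3 * rng.normal(size=(n, 1))  # clean "description"
X = (2 * y - 1)[:, None] + 2.0 * rng.normal(size=(n, 2))       # noisy "image"

student, soft = train_with_privileged(X, X_star, y)
pred = (predict_proba(student, X) >= 0.5).astype(float)
print("training accuracy of student:", (pred == y).mean())
```

At test time only `predict_proba(student, X_new)` is needed, so the privileged description never has to exist for new examples.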
link |
01:39:10.400
So there's something there.
link |
01:39:11.560
Something there.
link |
01:39:12.880
And there is a, in NEC,
link |
01:39:16.980
she unfortunately died.
link |
01:39:20.880
The collection of digits
link |
01:39:24.840
and poetic descriptions of these digits.
link |
01:39:29.160
Yeah.
link |
01:39:30.000
So there's something there in that poetic description.
link |
01:39:32.920
But I think that there are abstract ideas
link |
01:39:38.320
on the Plato level of ideas.
link |
01:39:40.680
Yeah, that they're there.
link |
01:39:42.000
That could be discovered.
link |
01:39:43.120
And music seems to be a good entry point.
link |
01:39:45.000
But as soon as we start with this challenge problem.
link |
01:39:50.340
The challenge problem.
link |
01:39:51.180
Listen.
link |
01:39:52.020
It immediately connected to all this stuff.
link |
01:39:55.400
Especially with your talk and this podcast,
link |
01:39:58.060
and I'll do whatever I can to advertise it.
link |
01:40:00.120
It's such a clean, beautiful Einstein-like formulation
link |
01:40:03.280
of the challenge before us.
link |
01:40:05.240
Right.
link |
01:40:06.060
Let me ask another absurd question.
link |
01:40:09.520
We talked about mortality.
link |
01:40:12.800
We talked about philosophy of life.
link |
01:40:14.640
What do you think is the meaning of life?
link |
01:40:17.560
What's the predicate for mysterious existence here on earth?
link |
01:40:29.620
I don't know.
link |
01:40:33.620
It's very interesting how we have,
link |
01:40:37.640
in Russia, I don't know if you know the guys Strugatsky.
link |
01:40:43.100
They are writing fiction.
link |
01:40:46.320
They're thinking about human, what's going on.
link |
01:40:51.680
And they have idea that there are developing
link |
01:41:00.560
two type of people, common people and very smart people.
link |
01:41:05.120
They just started.
link |
01:41:06.080
And these two branches of people will go
link |
01:41:10.420
in different direction very soon.
link |
01:41:13.540
So that's what they're thinking about that.
link |
01:41:18.980
So the purpose of life is to create two paths.
link |
01:41:23.800
Two paths.
link |
01:41:24.640
Of human societies.
link |
01:41:25.940
Yes.
link |
01:41:27.020
Simple people and more complicated people.
link |
01:41:29.980
Which do you like best?
link |
01:41:31.540
The simple people or the complicated ones?
link |
01:41:34.500
I don't know, it is just his fantasy,
link |
01:41:38.260
but you know, every week we have guy
link |
01:41:41.700
who is just a writer and also a theorist of literature.
link |
01:41:51.820
And he explain how he understand literature
link |
01:41:56.600
and human relationship.
link |
01:41:58.800
How he see life.
link |
01:42:00.340
And I understood that I'm just small kid
link |
01:42:06.920
comparing to him.
link |
01:42:09.500
He's very smart guy in understanding life.
link |
01:42:13.880
He knows this predicate.
link |
01:42:15.640
He knows big blocks of life.
link |
01:42:19.760
I am amazed every time when I listen to him.
link |
01:42:24.800
And he just talking about literature.
link |
01:42:27.400
And I think that I was surprised.
link |
01:42:33.200
So the managers in big companies,
link |
01:42:41.460
most of them are guys who study English language
link |
01:42:48.760
and English literature.
link |
01:42:51.120
So why?
link |
01:42:52.520
Because they understand life.
link |
01:42:54.820
They understand models.
link |
01:42:57.040
And among them,
link |
01:42:58.800
maybe many talented critics just analyzing this.
link |
01:43:06.680
And this is big science like property.
link |
01:43:10.520
This is blocks.
link |
01:43:13.360
That's very smart.
link |
01:43:17.480
It amazes me that you are and continue to be humbled
link |
01:43:21.520
by the brilliance of others.
link |
01:43:22.960
I'm very modest about myself.
link |
01:43:25.540
I see so smart guys around.
link |
01:43:28.960
Well, let me be immodest for you.
link |
01:43:31.720
You're one of the greatest mathematicians,
link |
01:43:33.920
statisticians of our time.
link |
01:43:35.820
It's truly an honor.
link |
01:43:36.960
Thank you for talking again.
link |
01:43:38.600
And let's talk.
link |
01:43:41.240
It is not.
link |
01:43:43.440
I know my limits.
link |
01:43:45.720
Let's talk again when your challenge is taken on
link |
01:43:49.120
and solved by a grad student.
link |
01:43:51.880
Especially when they use it.
link |
01:43:55.200
It happens.
link |
01:43:57.200
Maybe music will be involved.
link |
01:43:58.880
Vladimir, thank you so much.
link |
01:43:59.880
It's been an honor. Thank you very much.
link |
01:44:02.580
Thanks for listening to this conversation
link |
01:44:04.200
with Vladimir Vapnik.
link |
01:44:05.480
And thank you to our presenting sponsor, Cash App.
link |
01:44:08.760
Download it, use code LexPodcast.
link |
01:44:11.440
You'll get $10 and $10 will go to FIRST,
link |
01:44:14.320
an organization that inspires and educates young minds
link |
01:44:17.040
to become science and technology innovators of tomorrow.
link |
01:44:20.760
If you enjoy this podcast, subscribe on YouTube,
link |
01:44:23.480
give us five stars on Apple Podcasts,
link |
01:44:25.320
support it on Patreon,
link |
01:44:26.840
or simply connect with me on Twitter at Lex Fridman.
link |
01:44:31.360
And now, let me leave you with some words
link |
01:44:33.480
from Vladimir Vapnik.
link |
01:44:35.580
When solving a problem of interest,
link |
01:44:37.760
do not solve a more general problem
link |
01:44:40.080
as an intermediate step.
link |
01:44:43.040
Thank you for listening.
link |
01:44:44.360
I hope to see you next time.