Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence | Lex Fridman Podcast #71
The following is a conversation with Vladimir Vapnik, part two, the second time we spoke on the podcast. He's the coinventor of support vector machines, support vector clustering, VC theory, and many foundational ideas in statistical learning. He was born in the Soviet Union, worked at the Institute of Control Sciences in Moscow, then in the US, worked at AT&T, NEC Labs, Facebook AI Research, and now is a professor at Columbia University. His work has been cited over 200,000 times.
The first time we spoke on the podcast was just over a year ago, one of the early episodes. This time we spoke after a lecture he gave titled "Complete Statistical Theory of Learning" as part of the MIT series of lectures on deep learning and AI that I organized. I'll release the video of the lecture in the next few days. This podcast and lecture are independent from each other, so you don't need one to understand the other. The lecture is quite technical and math heavy, so if you do watch both, I recommend listening to this podcast first, since the podcast is probably a bit more accessible.
This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.
This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Broker services are provided by Cash App Investing, a subsidiary of Square and member SIPC. Since Cash App allows you to send and receive money digitally, peer to peer, and security in all digital transactions is very important, let me mention the PCI data security standard, PCI DSS level one, that Cash App is compliant with. I'm a big fan of standards for safety and security, and PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions. Now we just need to do the same for autonomous vehicles and AI systems in general. So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world.
And now here's my conversation with Vladimir Vapnik.
You and I talked about Alan Turing yesterday a little bit, and that he, as the father of artificial intelligence, may have instilled in our field an ethic of engineering and not science, seeking more to build intelligence rather than to understand it. What do you think is the difference between these two paths of engineering intelligence and the science of intelligence?
It's a completely different story. Engineering is imitation of human activity. You have to make a device which behaves as humans behave, has all the functions. It doesn't matter how you do it. But to understand what intelligence is about, it's quite a different problem. So I think, I believe that it's somehow related to the predicate we talked yesterday about, because look at Vladimir Propp's idea. He just found 31 predicates, he called them units, which can explain human behavior, at least in Russian tales. You look at Russian tales and derive from that. And then people realized that it's more wide than Russian tales. It is in TV, in movie serials and so on and so on.
So you're talking about Vladimir Propp, who in 1928 published a book, Morphology of the Folktale, describing 31 predicates that have this kind of sequential structure that a lot of the stories, narratives follow in Russian folklore and in other contexts. We'll talk about it. I'd like to talk about predicates in a focused way, but let me, if you allow me, stay zoomed out on our friend Alan Turing. He inspired a generation with the imitation game. If we can linger on that a little bit longer, do you think learning to imitate intelligence can get us closer to the science, to understanding intelligence?
So why do you think imitation is so far from understanding?
I think that it is different because you have different goals. So your goal is to create something, something useful. And that is great. And you can see how much was done, and I believe that it will be done even more: self-driving cars and also the business, it is great. And it was inspired by Turing's vision. But understanding is very difficult. It's more or less philosophical category. What means understand the world? I believe in scheme which starts from Plato, that there exists world of ideas. I believe that intelligence, it is world of ideas, but it is world of pure ideas. And when you combine them with reality things, it creates, as in my case, invariants, which is very specific. And that's, I believe, the combination of ideas in way to constructing invariants. Constructing invariant is intelligence. But first of all, predicate. If you know predicate, and hopefully not too many predicates exist. For example, 31 predicates for human behavior, it is not a lot.
Vladimir Propp used 31, you can even call them predicates, 31 predicates to describe stories, narratives. How much of human behavior, how much of our world, our universe, all the things that matter in our existence, can be summarized in predicates of the kind that Propp was working with?
I think that we have a lot of forms of behavior, but I think that predicates are much fewer, because even in this example which I gave you yesterday, you saw that one predicate can construct many different invariants depending on your data. They're applied to different data and they give different invariants. So, but pure ideas, maybe not so many. I don't know about that, but my guess, I hope, that's why the challenge about digit recognition, how much you need.
I think we'll talk about computer vision and 2D images a little bit in your challenge.
That's exactly about intelligence.
That's exactly about, no, that hopes to be exactly about the spirit of intelligence in the simplest possible way.
Yeah, absolutely, you should start the simplest way, otherwise you will not be able to do it.
Well, there's an open question whether starting at the MNIST digit recognition is a step towards intelligence or it's an entirely different thing.
I think that to beat records using, say, 100, 200 times fewer examples, you need intelligence.
You need intelligence.
So let's, because you use this term and it would be nice, I'd like to ask simple, maybe even dumb questions. Let's start with a predicate. In terms of terms and how you think about it, what is a predicate?
I have a feeling formally they exist, but I believe that predicate for 2D images, one of them is symmetry.
Sorry, sorry to interrupt and pull you back. At the simplest level, we're not even, we're not being profound currently. A predicate is a statement of something that is true. Do you think of predicates as somehow probabilistic in nature, or is this binary? Is this truly constraints of logical statements about the world?
In my definition, the simplest predicate is function. Function, and you can use this function to make inner product, that is predicate.
What's the input and what's the output of the function?
Input is X, something which is input in reality. Say, if you consider digit recognition, it is pixel space input. It is function in pixel space, but it can be any function from pixel space, and you choose. And I believe that there are several functions which are important for understanding of images. One of them is symmetry. It's not so simple construction, as I described, with the derivative, with all this stuff. But another, I believe, I don't know how many, is how well structurized is picture.
What do you mean by structurized?
It is formal definition. Say something heavy on the left corner, not so heavy in the middle and so on. You describe in general concept of what you assume.
Concepts, some kind of universal concepts.
Yeah, but I don't know how to formalize this.
So this is the thing. There's a million ways we can talk about this. I'll keep bringing it up, but we humans have such concepts when we look at digits, but it's hard to put them, just like you're saying now, it's hard to put them into words.
You know, that is example: when critics in music are trying to describe music, they use predicates, and not too many predicates, but in different combinations. They have some special words for describing music, and the same should be for images. Maybe there are critics who understand essence of what this image is about.
Do you think there exist critics who can summarize the essence of images, human beings?
I hope so, yes, but that...
Explicitly state them on paper. The fundamental question I'm asking is, do you think there exists a small set of predicates that will summarize images? It feels to our mind like it does, that the concept of what makes a two and a three and a four...
No, no, no, it's not on this level. It should not describe two, three, four. It describes some construction which allows you to create invariants.
And invariants, sorry to stick on this, but terminology.
Invariant, it is property of your image. Say, I can say, looking on my image, it is more or less symmetric, and I can give you value of symmetry, say, level of symmetry, using this function which I gave yesterday. And you can describe that your image has these characteristics exactly in the way how musical critics describe music. So, but this is invariant applied to specific data, to specific music. I strongly believe in this Plato's idea that there exists world of predicates and world of reality, and predicate and reality are somehow connected, and you have to know that.
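
To make "level of symmetry" concrete, here is a minimal Python sketch of one possible symmetry score for a small image. It is an illustration only: the scoring formula, the names `mirror` and `symmetry_degree`, and the tiny 3x3 test images are assumptions made for this example. It is not the function Vapnik refers to from his lecture, which he describes elsewhere in the conversation as a more involved construction using derivatives.

```python
def mirror(image, left_right=True):
    """Mirror a 2-D list of pixel values left-right or top-bottom."""
    if left_right:
        return [row[::-1] for row in image]
    return [row[:] for row in image[::-1]]


def symmetry_degree(image, anti=False):
    """Score in [0, 1]; 1.0 means the image equals its transformed copy.

    anti=False: compare with the left-right mirror image.
    anti=True:  mirror left-right and top-bottom (a 180-degree rotation),
                the kind of "antisymmetry" a letter S has.
    """
    other = mirror(image, left_right=True)
    if anti:
        other = mirror(other, left_right=False)
    energy = sum(p * p for row in image for p in row)
    if energy == 0:
        return 1.0  # a blank image is trivially symmetric
    # Squared pixel distance between the image and its transformed copy.
    diff = sum((p - q) ** 2
               for row, other_row in zip(image, other)
               for p, q in zip(row, other_row))
    return 1.0 - min(diff / (2.0 * energy), 1.0)


# Hypothetical 3x3 "images", made up for this example.
letter_t = [[1, 1, 1],
            [0, 1, 0],
            [0, 1, 0]]
letter_s = [[0, 1, 1],
            [0, 1, 0],
            [1, 1, 0]]

print(symmetry_degree(letter_t))             # fully mirror-symmetric
print(symmetry_degree(letter_s))             # only partly mirror-symmetric
print(symmetry_degree(letter_s, anti=True))  # symmetric under rotation
```

The point of the sketch is only that a predicate of this kind is a plain function of the pixels that returns a number, so the same function can be evaluated on any image, and different data give different values.
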
Let's talk about Plato a little bit. So you draw a line from Plato, to Hegel, to Wigner, to today. So Plato has forms, the theory of forms. So there's a world of ideas and a world of things, as you talk about, and there's a connection. And presumably the world of ideas is very small, and the world of things is arbitrarily big, but they're all, what Plato calls them, like a shadow. The real world is a shadow from the world of forms.
Yeah, you have projection of a world of ideas.
Yeah, very poetic.
In reality, you can realize this projection using these invariants, because it is projection for specific examples, which creates specific features of specific objects.
So the essence of intelligence is, while only being able to observe the world of things, try to come up with a world of ideas.
Like in this music story, intelligent musical critics know all these words and have a feeling about what they mean.
I feel like that's a contradiction, intelligent music critics. But I think music is to be enjoyed in all its forms. The notion of critic, like a food critic.
No, I don't want touch emotion.
That's an interesting question. There's certain elements of the human psychology, of the human experience, which seem to almost contradict intelligence and reason. Like emotion, like fear, like love, all of those things, are those not connected in any way to the space of ideas?
That I don't know. I just want to concentrate on very simple story, on digit recognition.
So you don't think you have to love and fear death in order to recognize digits?
Because it's so complicated. It involves a lot of stuff which I never considered. But I know about digit recognition. And I know that for digit recognition, to get records from small number of observations, you need predicate. But not special predicate for this problem. But universal predicate, which understand world of images.
Of visual information.
But on the first step, they understand, say, world of handwritten digits, or characters, or something simple.
So like you said, symmetry is an interesting one.
No, that's what I think, one of the predicates is related to symmetry. The level of symmetry.
Okay, degree of symmetry. So you think symmetry at the bottom is a universal notion, and there's degrees of a single kind of symmetry, or are there many kinds of symmetries?
Many kinds of symmetries. There is a symmetry, antisymmetry, say, letter S. So it has vertical antisymmetry. And it could be diagonal symmetry, vertical symmetry.
So when you cut vertically the letter S...
Yeah, then the upper part and lower part in different directions.
Inverted, along the Y axis. But that's just like one example of symmetry, right? Isn't there like...
Right, but there is a degree of symmetry. If you play all this iterative stuff to do tangent distance, whatever I describe, you can have a degree of symmetry. And that is what describing reason of image. It is the same as you will describe this image. Think about digit S, it has antisymmetry. Digit three is symmetric. More or less, look for symmetry.
Do you think such concepts like symmetry, predicates like symmetry, is it a hierarchical set of concepts? Or are these independent, distinct predicates that we want to discover as some set of...
No, there is an idea of symmetry. And you can make this idea of symmetry very general. Like degree of symmetry. Degree of symmetry can be zero, no symmetry at all. Or degree of symmetry, say, more or less symmetrical. But you have one of these descriptions. And symmetry can be different. As I told, horizontal, vertical, diagonal, and antisymmetry is also concept of symmetry.
What about shape in general? I mean, symmetry is a fascinating notion, but...
No, no, I'm talking about digit. I would like to concentrate on all I would like to know, predicate for digit recognition.
Yes, but symmetry is not enough for digit recognition, right?
It is not necessarily for digit recognition. It helps to create invariant, which you can use when you will have examples for digit recognition. You have regular problem of digit recognition. You have examples of the first class or second class. Plus, you know that there exists concept of symmetry. And when you're looking for decision rule, you will apply concept of symmetry, of this level of symmetry, which you estimate from... Everything comes from weak convergence.
What is convergence? What is weak convergence? What is strong convergence? I'm sorry, I'm gonna do this to you. What are we converging from and to?
You're converging, you would like to have a function. The function which, say, indicator function, which indicates your digit five, for example.
A classification task.
Let's talk only about classification.
So classification means you will say whether this is a five or not, or say which of the 10 digits it is.
I would like to have these functions. Then, I have some examples. I can consider property of these examples. And I can measure level of symmetry for every digit. And then I can take average from my training data. And I will consider only functions of conditional probability, which I'm looking for as my decision rule, which applying to digits will give me the same average as I observe on training data. So, actually, this is different level of description of what you want. You show not one digit. This predicate shows general property of all digits which you have in mind. If you have in mind digit three, it gives you property of digit three. And you select as admissible set of functions only functions which keep this property. You will not consider other functions. So, you are immediately looking for smaller subset of functions.
That's what you mean by admissible functions.
Admissible function, exactly.
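
The selection rule described here can be written down in a few lines. What follows is a rough Python sketch under stated assumptions, not Vapnik's implementation: a candidate rule is taken to be a function returning an estimate of P(y = 1 | x), the predicate `psi` is any numeric function of an example, and a rule counts as admissible when the predicate-weighted average of its outputs matches the predicate-weighted average of the training labels within a tolerance `tol`. The names, the tolerance, and the toy data are all hypothetical.

```python
def invariant_gap(psi, prob, xs, ys):
    """Absolute gap between the two sides of the invariant

        (1/n) * sum psi(x_i) * prob(x_i)   vs   (1/n) * sum psi(x_i) * y_i

    psi  : predicate, maps an example to a number (e.g. a symmetry score)
    prob : candidate rule, maps an example to an estimate of P(y = 1 | x)
    """
    n = len(xs)
    predicted = sum(psi(x) * prob(x) for x in xs) / n
    observed = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(predicted - observed)


def is_admissible(psi, prob, xs, ys, tol=0.05):
    """A rule is kept only if it reproduces the training-data average."""
    return invariant_gap(psi, prob, xs, ys) <= tol


# Toy one-dimensional data, made up for this example.
xs = [0.1, 0.2, 0.8, 0.9]
ys = [0, 0, 1, 1]

psi = lambda x: x                            # hypothetical predicate
rule_a = lambda x: 1.0 if x > 0.5 else 0.0   # separates the two classes
rule_b = lambda x: 0.5                       # ignores the input entirely

print(is_admissible(psi, rule_a, xs, ys))  # rule_a keeps the invariant
print(is_admissible(psi, rule_b, xs, ys))  # rule_b does not
```

Each additional predicate adds one more such check, so the set of rules that pass all of them can only stay the same or shrink.
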
Which is still a pretty large, for the number three, is a large.
It is pretty large, but if you have one predicate. But according to, there is a strong and weak convergence. Strong convergence is convergence in function. You're looking for the function on one function, and you're looking for another function. And square difference from them should be small. If you take difference in any points, make a square, make an integral, and it should be small. That is convergence in function. Suppose you have some function, any function. So, I say that some function converges to this function if integral from square difference between them is small.
That's the definition of strong convergence.
That definition of strong convergence.
Two functions, the integral, the difference, is small.
Yeah, it is convergence in functions. But you have different convergence, in functionals. You take any function, you take some function, phi, and take inner product of this function with this f function, f0 function, which you want to find. And that gives you some value. So, you say that set of functions converges in inner product to this function if the value of its inner product with phi converges to the value for f0. That is for one phi. But weak convergence requires that it converges for any function of Hilbert space. If it converges for any function of Hilbert space, then you will say that this is weak convergence. You can think that when you take integral, that is integral property of function. For example, if you will take sine or cosine, it is coefficient of, say, Fourier expansion. So, if it converges for all coefficients of Fourier expansion, so under some condition, it converges to function you're looking for. But weak convergence means any property. Convergence not pointwise, but integral property of function. So, weak convergence means integral property of functions.
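
For reference, the two notions can be stated compactly. The notation below is an assumption added for readability: $f_\ell$ is a sequence of candidate functions, $f_0$ is the function being sought, $\varphi$ is an arbitrary square-integrable function, and the inner product is the usual integral inner product of $L_2$.

```latex
% Strong convergence (convergence in functions), in L_2:
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, dx = 0 .

% Weak convergence (convergence in functionals): for every \varphi in L_2,
\lim_{\ell \to \infty} \int \varphi(x) \, f_\ell(x) \, dx
  = \int \varphi(x) \, f_0(x) \, dx .
```

Strong convergence implies weak convergence by the Cauchy-Schwarz inequality, but not the other way around. A standard example is $f_\ell(x) = \sin(\ell x)$ on $[0, 2\pi]$: its inner product with any fixed $\varphi$ tends to zero, yet $\int_0^{2\pi} \sin^2(\ell x)\,dx = \pi$ for every $\ell$, so the sequence does not converge strongly to zero.
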
When I'm talking about predicate, I would like to formulate which integral properties I would like to have for convergence. So, if I will take one predicate, function with which I measure property, if I will use one predicate and say, I will consider only functions which give me the same value as this predicate, I am selecting set of functions from functions which is admissible in the sense that function which I'm looking for is in this set of functions, because I am checking on training data, it gives the same.
Yeah, so it always has to be connected to the training data?
Yeah, but property, you can know independent on training data. And this guy, Propp, says that there is formal property...
A fairy tale, a Russian fairy tale.
But Russian fairy tale is not so interesting. More interesting that people apply this to movies, to theater, to different things. And the same works, they're universal.
Well, so I would argue that there's a little bit of a difference between the kinds of things that were applied to, which are essentially stories, and digit recognition.
It is the same story.
You're saying digits, there's a story within the digit.
And so my point is why I hope that it is possible to beat record using not 60,000, but say 100 times fewer. Because instead, you will give predicates. And you will select your decision not from wide set of functions, but from set of functions which keep these predicates. But predicate is not related just to digit recognition.
Like in Plato's case.
Do you think it's possible to automatically discover the predicates? So you basically said that the essence of intelligence is the discovery of good predicates. Now, the natural question is, that's what Einstein was good at doing in physics. Can we make machines do these kinds of discovery of good predicates? Or is this ultimately a human endeavor?
That I don't know. I don't think that machine can do. Because according to theory about weak convergence, any function from Hilbert space can be predicate. So you have infinite number of predicates in upper. And before, you don't know which predicate is good and which. But whatever Propp showed, and why people call it breakthrough, is that there are not too many predicates which cover most of situations happening in the world.
So there's a sea of predicates. And only a small amount are useful for the kinds of things that happen in the world.
I think that I would say only small part of predicate
Useful all of them.
Only very few are what we should let's call them
Very good predicates.
Very good predicates.
So can we linger on it? What's your intuition? Why is it hard for a machine to discover good predicates?
Even in my talk, I described how to do predicate, how to find new predicate. I'm not sure that it is very good.
What did you propose in your talk?
In my talk, I gave example for diabetes. When we achieve some percent, then we're looking for area where some sort of predicate, which I formulate, does not keep invariant. So if it doesn't keep, I retrain on my data. I select only functions which keep this invariant. And when I did it, I improved my performance. I can look for this predicate. I know technically how to do that. And you can, of course, do it using machine. But I'm not sure that we will construct the smartest predicate.
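
One way to read the procedure described here is as a search over candidate predicates for the one whose invariant the current model breaks the most. The Python sketch below illustrates that reading only. The conversation does not spell out how the "area" is searched, so the indicator-style predicates, the `(age, bmi)` features, the model outputs, and the labels are all hypothetical values invented for this example.

```python
def invariant_violation(psi, probs, xs, ys):
    """|(1/n) * sum psi(x_i) * (p_i - y_i)| for current model outputs p_i."""
    n = len(xs)
    total = sum(psi(x) * (p - y) for x, p, y in zip(xs, probs, ys))
    return abs(total) / n


def most_violated(predicates, probs, xs, ys):
    """Return (name, violation) for the candidate predicate whose
    invariant the current model keeps least well."""
    scored = {name: invariant_violation(psi, probs, xs, ys)
              for name, psi in predicates.items()}
    worst = max(scored, key=scored.get)
    return worst, scored[worst]


# Hypothetical patients as (age, bmi), with made-up labels and model outputs.
xs = [(30, 22), (45, 31), (60, 35), (50, 24)]
ys = [0, 1, 1, 0]
probs = [0.2, 0.6, 0.6, 0.6]

# Candidate predicates: indicators of regions of the input space.
predicates = {
    "all": lambda x: 1,
    "age_over_40": lambda x: 1 if x[0] > 40 else 0,
    "high_bmi": lambda x: 1 if x[1] > 30 else 0,
}

name, violation = most_violated(predicates, probs, xs, ys)
print(name, round(violation, 3))
```

In this reading, the selected predicate would then be added as a constraint and the model retrained so that only functions keeping the new invariant are considered. Comparing raw violations is only meaningful here because all candidates are 0/1 indicators; predicates on different scales would need some normalisation.
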
But this is the, allow me to linger on it. Because that's the essence. That's the challenge. That is artificial. That's the human level intelligence that we seek, is the discovery of these good predicates. You've talked about deep learning as a way to, the predicates they use and the functions are mediocre. You can find better ones.
Let's talk about deep learning.
Sure, let's do it.
I know only Yann LeCun's convolutional network. And it's a very simple convolution.
There's not much else to know.
To pixel left and right. I can do it like that with one predicate.
Convolution is a single predicate.
It's single predicate.
Yes, but that's it.
You take the derivative for translation and predicate. This should be kept.
So that's a single predicate. But humans discovered that one.
Not too many predicates. And that is big story, because Yann did it 25 years ago and nothing so clear was added to deep network. And then I don't understand why we should talk about deep network instead of talking about piecewise linear functions which keep this predicate.
Well, a counter argument is that maybe the amount of predicates necessary to solve general intelligence, say in the space of images, doing efficient recognition of handwritten digits
And so we shouldn't be so obsessed about finding. We'll find other good predicates like convolution, for example. There have been other advancements. If you look at the work with attention, there are attention mechanisms, especially used in natural language, focusing the network's ability to learn at which part of the input to look at. The thing is, there's other things besides predicates that are important for the actual engineering mechanism of showing how much you can really do given these predicates. I mean, that's essentially the work of deep learning: constructing architectures that are able, given the training data, to converge towards a function that can generalize well. It's an engineering problem.
Yeah, I understand. But let's talk not on emotional level, but on a mathematical level. You have set of piecewise linear functions. It is all possible neural networks. It's just piecewise linear functions. It's many, many pieces.
Large number of piecewise linear functions.
Almost feels like too large.
It's still simpler than, say, convolution, than reproducing kernel Hilbert space, which have a Hilbert set of functions.
What's Hilbert space?
It's space with infinite number of coordinates, say, or function for expansion, something like that. So it's much richer. And when I'm talking about closed form solution, I'm talking about this set of functions, not piecewise linear set, which is particular case of it
So neural networks is a small part of the space of functions you're talking about.
Say, small set of functions. I don't want to discuss the small or big. You take advantage.
So you have some set of functions. So now, when you're trying to create architecture, you would like to create admissible set of functions, with all your tricks to use not all functions, but some subset of this set of functions. Say, when you're introducing convolutional net, it is way to make this subset useful for you. But from my point of view, convolutional, it is something where you want to keep some invariants, say, translation invariants. But now, if you understand this and you cannot explain on the level of ideas what neural network does, you should agree that it is much better to have a set of functions. And then say, this set of functions should be admissible. It must keep this invariant, this invariant, and that invariant. You know that as soon as you incorporate new invariant, set of functions becomes smaller and smaller
But all the invariants are specified by you, the human.
Yeah, but what I hope is that there is a standard predicate, like Propp showed. That's what I want to find for digit recognition. If we start, it is completely new area, what is intelligence about on the level, starting from Plato's idea, what is world of ideas. And I believe that is not too many. But it is amusing that mathematicians doing something, a neural network in general function, but people from literature, from art, they use this all
Invariants saying, it is great how people describe music. We should learn from that. And something on this level. But so why Vladimir Propp, who was just theoretical, who studied theoretical literature, he found that.
Let me throw that right back at you, because there's a little bit of a, that's less mathematical and more emotional, philosophical,
I mean, he wasn't doing math. And you just said another emotional statement, which is you believe that this Plato world of ideas is small. What's your intuition, though? If we can linger on it.
You know, it is not just small or big. When I am introducing some predicate, I decrease set of functions. But my goal is to decrease set of functions much.
By as much as possible.
By as much as possible. Good predicate, which does this, then I should choose next predicate, which decreases set as much as possible. So set of good predicates, it is such that they decrease this amount of admissible functions.
So if each good predicate significantly reduces the set of admissible functions, then there naturally should not be that many good predicates.
No, but if you reduce very well, the VC dimension of the admissible set of functions is small. And you need not too much training data to do well. And VC dimension, by the way, is some measure of capacity of this set of functions. Roughly speaking, how many functions in this set. So you're decreasing, decreasing. And it makes it easy for you to find function you're looking for. But the most important part, to create good admissible set
And it probably, there are many ways. But the good predicates such that they can do that.
So for this duck, you should know a little bit about duck.
link |
Because what are the three fundamental laws of ducks?
link |
Looks like a duck, swims like a duck, and quacks like a duck.
link |
You should know something about ducks to be able to.
link |
Looks like, say, horse.
link |
So it's not, it generalizes from ducks.
link |
And talk like, and make sound like horse or something.
link |
And run like horse, and moves like horse.
link |
It is general, it is general predicate
link |
that this applied to duck.
link |
But for duck, you can say, play chess like duck.
link |
You cannot say play chess like duck.
link |
So you're saying you can, but that would not be a good.
link |
No, you will not reduce a lot of functions.
link |
You would not do, yeah, you would not
link |
reduce the set of functions.
link |
So you can, the story is formal story, mathematical story.
link |
Is that you can use any function you want as a predicate.
link |
But some of them are good, some of them are not,
link |
because some of them reduce a lot of functions
link |
to admissible set, and some of them do not.
link |
But the question is, and I'll probably
link |
keep asking this question, but how do we find such,
link |
what's your intuition?
link |
Handwritten recognition.
link |
How do we find the answer to your challenge?
link |
Yeah, I understand it like that.
link |
I understand what.
link |
What it means, I knew predicate.
link |
Like guy who understand music can say this word,
link |
which he described when he listened to music.
link |
He understand music.
link |
He use not too many different, oh, you can do like Propp.
link |
You can make collection.
link |
What he talking about music, about this, about that.
link |
It's not too many different situation he described.
link |
Because we mentioned Vladimir Propp a bunch.
link |
Let me just mention, there's a sequence of 31
link |
structural notions that are common in stories.
link |
You call it units.
link |
And I think they resonate.
link |
I mean, it starts just to give an example,
link |
absentation, a member of the hero's community
link |
or family leaves the security of the home environment.
link |
Then it goes to the interdiction,
link |
a forbidding edict or command is passed upon the hero.
link |
The hero is warned against some action.
link |
Then step three, violation of interdiction.
link |
Break the rules, break out on your own.
link |
Then reconnaissance.
link |
The villain makes an effort to attain knowledge,
link |
needed to fulfill their plan, and so on.
link |
It goes on like this, ends in a wedding, number 31.
link |
Happily ever after.
link |
No, he just gave description of all situations.
link |
He understands this world.
link |
Yeah, not folktales, but stories.
link |
And these stories not in just folktales.
link |
These stories in detective serials as well.
link |
And probably in our lives.
link |
And then they wrote that this predicate is good
link |
for different situation.
link |
From movie, for theater.
link |
By the way, there's also criticism, right?
link |
There's another way to interpret narratives
link |
from Claude Lévi-Strauss.
link |
I am not in this business.
link |
No, I know, it's theoretical literature,
link |
but it's looking at paradigms behind things.
link |
It's always the discussion, yeah.
link |
But at least there is units.
link |
It's not too many units that can describe.
link |
But this guy probably gives another units.
link |
Or another way of...
link |
Exactly, another set of units.
link |
Another set of predicates.
link |
It doesn't matter how.
link |
My question is, whether given those units,
link |
whether without our human brains to interpret these units,
link |
they would still hold as much power as they have.
link |
Meaning, are those units enough
link |
when we give them to an alien species?
link |
Do you understand digit images?
link |
No, I don't understand.
link |
When you can recognize these digit images,
link |
it means that you understand.
link |
You understand characters, you understand...
link |
It's the imitation versus understanding question,
link |
because I don't understand the mechanism
link |
by which I understand.
link |
I'm not talking about, I'm talking about predicates.
link |
You understand that it involves symmetry,
link |
maybe structure, maybe something else.
link |
I cannot formulate.
link |
I just was able to find symmetries, degree of symmetries.
link |
That's really good.
link |
So this is a good line.
link |
I feel like I understand the basic elements
link |
of what makes a good handwritten recognition system on my own.
link |
Like symmetry connects with me.
link |
It seems like that's a very powerful predicate.
link |
My question is, is there a lot more going on
link |
that we're not able to introspect?
link |
Maybe I need to be able to understand
link |
a huge amount in the world of ideas,
link |
thousands of predicates, millions of predicates
link |
in order to do handwritten recognition.
link |
So both your hope and your intuition
link |
are such that very few predicates are enough.
link |
You're using digits, you're using examples as well.
link |
Theory says that if you will use all possible functions
link |
from Hilbert space, all possible predicate,
link |
you don't need training data.
link |
You just will have admissible set of function
link |
which contain one function.
link |
So the trade off is when you're not using all predicates,
link |
you're only using a few good predicates
link |
you need to have some training data.
link |
The more good predicates you have,
link |
the less training data you need.
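The trade off between predicates and training data can be sketched on a toy problem. The code below assumes (this form is not stated in the conversation) that each predicate psi imposes an invariant of the type "the average of psi(x) * f(x) over the data equals the average of psi(x) * y", and counts how many candidate functions survive as predicates are added. The data, the three predicates, and the helper name `satisfies` are all hypothetical.

```python
from itertools import product

# Toy data: six inputs with binary labels (hypothetical example).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 1, 0, 1, 1, 0]

# Candidate set: every binary labeling of the six inputs (2**6 = 64 functions),
# each represented by the tuple of its outputs on xs.
candidates = list(product([0, 1], repeat=len(xs)))

def satisfies(f_values, predicate, tol=1e-9):
    """Check the assumed invariant: mean(psi(x) * f(x)) == mean(psi(x) * y)."""
    n = len(xs)
    lhs = sum(predicate(x) * f for x, f in zip(xs, f_values)) / n
    rhs = sum(predicate(x) * y for x, y in zip(xs, ys)) / n
    return abs(lhs - rhs) <= tol

predicates = [
    lambda x: 1.0,           # preserve the frequency of label 1
    lambda x: float(x),      # preserve the first moment of label 1
    lambda x: float(x * x),  # preserve the second moment of label 1
]

admissible = candidates
sizes = [len(admissible)]
for psi in predicates:
    admissible = [f for f in admissible if satisfies(f, psi)]
    sizes.append(len(admissible))

print(sizes)  # [64, 20, 3, 1]
```

Each added predicate shrinks the admissible set, and with three of them only one function is left, so nothing remains to be chosen from the examples. This mirrors, in miniature, the limiting case described above where enough predicates leave an admissible set containing one function.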
link |
That is intelligence.
link |
Still, okay, I'm gonna keep asking the same dumb question,
link |
handwritten recognition to solve the challenge.
link |
You kind of propose a challenge that says
link |
we should be able to get state of the art MNIST error rates
link |
by using very few, 60, maybe fewer examples per digit.
link |
What kind of predicates do you think it will look like?
link |
That is the challenge.
link |
So people who will solve this problem,
link |
Do you think they'll be able to answer it
link |
in a human explainable way?
link |
They just need to write function, that's it.
link |
But so can that function be written, I guess,
link |
by an automated reasoning system?
link |
Whether we're talking about a neural network
link |
learning a particular function or another mechanism?
link |
No, I'm not against neural network.
link |
I'm against admissible set of function
link |
which create neural network.
link |
You did it by hand.
link |
You don't do it by invariance, by predicate, by reason.
link |
But neural networks can then reverse,
link |
do the reverse step of helping you find a function
link |
that just, the task of a neural network
link |
is to find a disentangled representation, for example,
link |
that they call, is to find that one predicate function
link |
that's really capture some kind of essence.
link |
One, not the entire essence, but one very useful essence
link |
of this particular visual space.
link |
Do you think that's possible?
link |
Listen, I'm grasping, hoping there's an automated way
link |
to find good predicates, right?
link |
So the question is what are the mechanisms
link |
of finding good predicates, ideas
link |
that you think we should pursue?
link |
A young grad student listening right now.
link |
So find situation where predicate which you're suggesting
link |
don't create invariant.
link |
It's like in physics.
link |
Find situation where existing theory cannot explain it.
link |
Find situation where the existing theory
link |
So you're finding contradictions.
link |
Find contradiction, and then remove this contradiction.
link |
But in my case, what means contradiction,
link |
you find function which, if you will use this function,
link |
you're not keeping invariants.
link |
This is really the process of discovering contradictions.
link |
It is like in physics.
link |
Find situation where you have contradiction
link |
for one of the property, for one of the predicate.
link |
Then include this predicate, making invariants,
link |
and solve again this problem.
link |
Now you don't have contradiction.
link |
But it is not the best way, probably, I don't know,
link |
to looking for predicate.
link |
That's just one way, okay.
link |
That, no, no, it is brute force way.
link |
The brute force way.
link |
What about the ideas of what,
link |
big umbrella term of symbolic AI?
link |
There's what in the 80s with expert systems,
link |
sort of logic reasoning based systems.
link |
Is there hope there to find some,
link |
through sort of deductive reasoning,
link |
to find good predicates?
link |
I think that just logic is not enough.
link |
It's kind of a compelling notion, though.
link |
You know, that when smart people sit in a room
link |
and reason through things, it seems compelling.
link |
And making our machines do the same is also compelling.
link |
So, everything is very simple.
link |
When you have infinite number of predicate,
link |
you can choose the function you want.
link |
You have invariants and you can choose the function you want.
link |
But you have to have not too many invariants
link |
to solve the problem.
link |
So, and have from infinite number of function
link |
to select finite number
link |
and hopefully small number of functions,
link |
which is good enough to extract small set
link |
of admissible functions.
link |
So, they will be admissible, it's for sure,
link |
because every function just decrease set of function
link |
and leaving it admissible.
link |
But it will be small.
link |
But why do you think logic based systems don't,
link |
can't help, intuition, not?
link |
Because you should know reality.
link |
You should know life.
link |
This guy like Propp, he knows something.
link |
And he tried to put in invariant his understanding.
link |
That's the human, yeah, but see,
link |
you're putting too much value into Vladimir Propp
link |
knowing something.
link |
No, it is, in the story, what means you know life?
link |
You know common sense.
link |
No, no, you know something.
link |
Common sense, it is some rules.
link |
Common sense is simply rules?
link |
Common sense is every, it's mortality,
link |
it's fear of death, it's love, it's spirituality,
link |
it's happiness and sadness.
link |
All of it is tied up into understanding gravity,
link |
which is what we think of as common sense.
link |
I don't really need to discuss so wide.
link |
I want to discuss, understand digit recognition.
link |
Anytime I bring up love and death,
link |
you bring it back to digit recognition, I like it.
link |
No, you know, it is doable because there is a challenge.
link |
Which I see how to solve it.
link |
If I will have a student concentrate on this work,
link |
I will suggest something to solve.
link |
You mean handwritten recognition?
link |
Yeah, it's a beautifully simple, elegant, and yet.
link |
I think that I know invariants which will solve this.
link |
But it is not universal, it is maybe,
link |
I want some universal invariants
link |
which are good not only for digit recognition,
link |
for image understanding.
link |
So let me ask, how hard do you think
link |
is 2D image understanding?
link |
So if we, we can kind of intuit handwritten recognition.
link |
How big of a step, leap, journey is it from that?
link |
If I gave you good, if I solved your challenge
link |
for handwritten recognition,
link |
how long would my journey then be from that
link |
to understanding more general, natural images?
link |
Immediately, you will understand this
link |
as soon as you will make a record.
link |
Because it is not for free.
link |
As soon as you will create several invariants
link |
which will help you to get the same performance
link |
that the best neural net did using 100,
link |
there might be more than 100 times less examples,
link |
you have to have something smart to do that.
link |
And you're saying?
link |
That is invariant, it is predicate.
link |
Because you should put some idea how to do that.
link |
But okay, let me just pause.
link |
Maybe it's a trivial point, maybe not.
link |
But handwritten recognition feels like a 2D,
link |
two dimensional problem.
link |
And it seems like how much complicated is the fact
link |
that most images are projection of a three dimensional world
link |
It feels like for a three dimensional world,
link |
we need to start understanding common sense
link |
in order to understand an image.
link |
It's no longer visual shape and symmetry.
link |
It's having to start to understand concepts
link |
of, understand life.
link |
Yeah, you're talking that there are different invariant,
link |
different predicate, yeah.
link |
And potentially much larger number.
link |
You know, maybe, but let's start from simple.
link |
Yeah, but you said that it would be immediate.
link |
No, you know, I cannot think about things
link |
which I don't understand.
link |
This I understand, but I'm sure that I don't understand
link |
Yeah, that's the difference.
link |
Do as simple as possible, but not simpler.
link |
And that is exact case.
link |
Yeah, but that's the difference between you and I.
link |
I welcome and enjoy thinking about things
link |
I completely don't understand.
link |
Because to me, it's a natural extension
link |
without having solved handwritten recognition
link |
to wonder how difficult is the next step
link |
of understanding 2D, 3D images.
link |
Because ultimately, while the science of intelligence
link |
is fascinating, it's also fascinating to see
link |
how that maps to the engineering of intelligence.
link |
And recognizing handwritten digits is not,
link |
doesn't help you, it might, it may not help you
link |
with the problem of general intelligence.
link |
It'll help you a little bit.
link |
We don't know how much.
link |
It might very much.
link |
But I would like to make a remark.
link |
I start not from very primitive problem,
link |
make a challenge problem.
link |
I start with very general problem, with Plato.
link |
So you understand, and it comes from Plato
link |
to digit recognition.
link |
So you basically took Plato and the world
link |
of forms and ideas and mapped and projected
link |
into the clearest, simplest formulation
link |
of that big world.
link |
You know, I would say that I did not understand Plato
link |
until recently, and until I consider
link |
the convergence and then predicate,
link |
and then, oh, this is what Plato told.
link |
Can you linger on that?
link |
Like why, how do you think about this world of ideas
link |
and world of things in Plato?
link |
No, it is metaphor.
link |
It's a metaphor, for sure.
link |
It's a compelling, it's a poetic
link |
and a beautiful metaphor.
link |
But what, can you?
link |
But it is a way how you should try to understand
link |
how to talk ideas in the world.
link |
So from my point of view,
link |
it is very clear, but it is a line.
link |
All the time, people looking for that.
link |
Say, Plato, then Hegel: whatever reasonable, it exists,
link |
whatever exists, it is reasonable.
link |
I don't know what he have in mind reasonable.
link |
Right, these philosophers again,
link |
their words. No, no, no, no, no, no, no.
link |
It is next step, Wigner.
link |
That mathematics understand something of reality.
link |
It is the same Plato line.
link |
And then it comes suddenly to Vladimir Propp.
link |
Look, 31 ideas, 31 units, and this corrects everything.
link |
There's abstractions, ideas that represent our world.
link |
Our world, and we should always try to reach into that.
link |
Yeah, but you should make a projection on reality.
link |
But understanding is, it is abstract ideas.
link |
You have in your mind several abstract ideas
link |
which you can apply to reality.
link |
And reality in this case,
link |
so if you look at machine learning as data.
link |
This example, data.
link |
Okay, let me put this on you
link |
because I'm an emotional creature.
link |
I'm not a mathematical creature like you.
link |
I find compelling the idea,
link |
forget the space, the sea of functions.
link |
There's also a sea of data in the world.
link |
And I find compelling that there might be,
link |
like you said, teacher,
link |
small examples of data that are most useful
link |
for discovering good,
link |
whether it's predicates or good functions,
link |
that the selection of data may be a powerful journey,
link |
a useful, you know, coming up with a mechanism
link |
for selecting good data might be useful too.
link |
Do you find this idea of finding the right data set
link |
interesting at all?
link |
Or do you kind of take the data set as a given?
link |
I think that it is, you know, my theme is very simple.
link |
You have huge set of functions.
link |
If you will apply, and you have not too many data,
link |
if you pick up function which describes this data,
link |
you will do not very well.
link |
Like randomly pick up.
link |
Yeah, you will overfit.
link |
Yeah, it will be overfitting.
link |
So you should decrease set of function
link |
from which you're picking up one.
link |
So you should go somehow to admissible set of function.
link |
And this is what weak convergence is about.
link |
So, but from another point of view,
link |
to make admissible set of function,
link |
you need just a DG, just function
link |
which you will take in inner product,
link |
which you will measure property of your function.
link |
And that is how it works.
link |
No, I get it, I get it, I understand it,
link |
but do you, the reality is.
link |
But let's think about examples.
link |
You have huge set of function,
link |
and you have several examples.
link |
If you just trying to keep, take function
link |
which satisfies these examples, you still will overfit.
link |
You need decrease, you need admissible set of function.
link |
Absolutely, but what, say you have more data than functions.
link |
So sort of consider the, I mean,
link |
maybe not more data than functions,
link |
because that's impossible.
link |
But what, I was trying to be poetic for a second.
link |
I mean, you have a huge amount of data,
link |
a huge amount of examples.
link |
But amount of function can be even bigger.
link |
It can get bigger, I understand.
link |
There's always a bigger boat.
link |
Full Hilbert space.
link |
I got you, but okay.
link |
But you don't find the world of data
link |
to be an interesting optimization space.
link |
Like the optimization should be in the space of functions.
link |
Creating admissible set of functions.
link |
Admissible set of functions.
link |
No, you know, even from the classical VC theory,
link |
from structural risk minimization,
link |
you should organize function in the way
link |
that they will be useful for you.
link |
And that is admissible set.
link |
The way you're thinking about useful
link |
is you're given a small set of examples.
link |
Useful small, small set of function
link |
which contain function I'm looking for.
link |
Yeah, but looking for based on
link |
the empirical set of small examples.
link |
Yeah, but that is another story.
link |
Because I believe that this small examples
link |
Law of large numbers works.
link |
I don't need uniform law.
link |
The story is that in statistics there are two law.
link |
Law of large numbers and uniform law of large numbers.
link |
So I want to be in situation where I use
link |
law of large numbers but not uniform law of large numbers.
link |
Right, so 60 is law of large, it's large enough.
link |
I hope, no, it still need some evaluations,
link |
But the idea is the following that
link |
say this average gives you something close to expectations
link |
so you can talk about that, about this predicate.
link |
And that is basis of human intelligence.
link |
Good predicates is the,
link |
the discovery of good predicates is the basis of human intelligence.
link |
It is discoverer of your understanding world.
link |
Of your methodology of understanding world.
link |
Because you have several function
link |
which you will apply to reality.
link |
Can you say that again?
link |
You have several functions predicate.
link |
But they're abstract.
link |
Then you will apply them to reality, to your data.
link |
And you will create in this way predicate.
link |
Which is useful for your task.
link |
But predicate are not related specifically to your task.
link |
To this your task.
link |
It is abstract functions.
link |
Which being applying, applied to...
link |
Many tasks that you might be interested in.
link |
It might be many tasks, I don't know.
link |
Well they should be many tasks, right?
link |
I believe like, like in Propp's case.
link |
It was for fairytales, but it's happened everywhere.
link |
Okay, so we talked about images a little bit.
link |
But, can we talk about Noam Chomsky for a second?
link |
No, I believe I...
link |
I don't know him very well.
link |
Personally, well...
link |
Not personally, I don't know.
link |
Well let me just say,
link |
do you think language, human language,
link |
is essential to expressing ideas?
link |
As Noam Chomsky believes.
link |
So like, language is at the core
link |
of our formation of predicates.
link |
The human language.
link |
For me, language and all the story of language
link |
is very complicated.
link |
I don't understand this.
link |
I thought about...
link |
I am not ready to work on that.
link |
Because it's so huge.
link |
It is not for me, and I believe not for our century.
link |
Not for 21st century.
link |
You should learn something, a lot of stuff,
link |
from simple task like digit recognition.
link |
So you think, okay, you think digit recognition,
link |
2D image, how would you more abstractly define
link |
digit recognition?
link |
It's 2D image, symbol recognition, essentially.
link |
I mean, I'm trying to get a sense,
link |
sort of thinking about it now,
link |
having worked with MNIST forever,
link |
how small of a subset is this
link |
of the general vision recognition problem
link |
and the general intelligence problem?
link |
Is it a giant subset?
link |
And how far away is language?
link |
You know, let me refer to Einstein.
link |
Take the simplest problem, as simple as possible,
link |
And this is challenge, this simple problem.
link |
But it's simple by idea, but not simple to get it.
link |
When you will do this, you will find some predicate,
link |
which helps it a bit.
link |
Well, yeah, I mean, with Einstein, you can,
link |
you look at general relativity,
link |
but that doesn't help you with quantum mechanics.
link |
That's another story.
link |
You don't have any universal instrument.
link |
Yes, so I'm trying to wonder which space we're in,
link |
whether handwritten recognition is like general relativity,
link |
and then language is like quantum mechanics.
link |
So you're still gonna have to do a lot of math
link |
to universalize it.
link |
But I'm trying to see,
link |
so what's your intuition why handwritten recognition
link |
is easier than language?
link |
Just, I think a lot of people would agree with that,
link |
but if you could elucidate sort of the intuition of why.
link |
I don't know, no, I don't think in this direction.
link |
I just think in directions that this is problem,
link |
which if we will solve it well,
link |
we will create some abstract understanding of images.
link |
Maybe not all images.
link |
I would like to talk to guys who doing in real images
link |
in Columbia University.
link |
What kind of images, unreal?
link |
Yeah, what they're ready, is there a predicate,
link |
what can be predicate?
link |
I think symmetry will still play role in real life images,
link |
in any real life images, 2D images.
link |
Let's talk about 2D images.
link |
Because that's what we know.
link |
A neural network was created for 2D images.
link |
So the people I know in vision science, for example,
link |
the people who study human vision,
link |
that they usually go to the world of symbols
link |
and like handwritten recognition,
link |
but not really, it's other kinds of symbols
link |
to study our visual perception system.
link |
As far as I know, not much predicate type of thinking
link |
is understood about our vision system.
link |
They did not think in this direction.
link |
They don't, yeah, but how do you even begin
link |
to think in that direction?
link |
That's a, I would like to discuss with them.
link |
Because if we will be able to show that it is what working,
link |
and theoretical scheme, it's not so bad.
link |
So the unfortunate, so if we compare to language,
link |
language is like letters, finite set of letters,
link |
and a finite set of ways you can put together those letters.
link |
So it feels more amenable to kind of analysis.
link |
With natural images, there is so many pixels.
link |
No, no, no, letter, language is much, much more complicated.
link |
It's involved a lot of different stuff.
link |
It's not just understanding of very simple class of tasks.
link |
I would like to see list of task with language involved.
link |
Yes, so there's a lot of nice benchmarks now
link |
in natural language processing from the very trivial,
link |
like understanding the elements of a sentence,
link |
to question answering, to much more complicated
link |
where you talk about open domain dialogue.
link |
The natural question is, with handwritten recognition,
link |
is really the first step of understanding
link |
visual information.
link |
But even our records show that we go in the wrong direction
link |
because we need 60,000 digits.
link |
So even this first step, so forget about talking
link |
about the full journey, this first step
link |
should be taking in the right direction.
link |
No, no, wrong direction because 60,000 is unacceptable.
link |
No, I'm saying it should be taken in the right direction
link |
because 60,000 is not acceptable.
link |
If you can talk, it's great, we have half percent of error.
link |
And hopefully the step from doing handwritten recognition
link |
using very few examples, the step towards what babies do
link |
when they crawl and understand their physical environment.
link |
I know you don't know about babies.
link |
If you will do from very small examples,
link |
you will find principles which are different
link |
from what we're using now.
link |
And so it's more or less clear.
link |
That means that you will use weak convergence,
link |
not just strong convergence.
link |
Do you think these principles
link |
will naturally be human interpretable?
link |
So like when we'll be able to explain them
link |
and have a nice presentation to show
link |
what those principles are, or are they very,
link |
going to be very kind of abstract kinds of functions?
link |
For example, I talked yesterday about symmetry.
link |
And I gave very simple examples.
link |
The same will be like that.
link |
You gave like a predicate of a basic for?
link |
Yes, for different symmetries and you have for?
link |
Degree of symmetries, that is important.
link |
Not just symmetry.
link |
Existence doesn't exist, degree of symmetry.
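To make "degree of symmetry" concrete, here is one possible way to score it for a small grayscale image. This is only an illustrative guess at what such a predicate could look like, not the definition used in the lecture; the function name `horizontal_symmetry_degree`, the scoring rule, and the assumption that pixel intensities lie in [0, 1] are all hypothetical.

```python
def horizontal_symmetry_degree(image):
    """Score in [0, 1]; 1.0 means the image equals its left-right mirror.

    Assumes `image` is a list of rows with pixel intensities in [0, 1].
    """
    total_difference = 0.0
    pixel_count = 0
    for row in image:
        mirrored = row[::-1]
        for original, reflected in zip(row, mirrored):
            total_difference += abs(original - reflected)
            pixel_count += 1
    return 1.0 - total_difference / pixel_count

symmetric = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
lopsided = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]

print(horizontal_symmetry_degree(symmetric))
print(horizontal_symmetry_degree(lopsided))
```

The point of returning a number rather than a yes/no answer is that the score can be averaged over examples, so it describes how symmetric a class of images tends to be rather than whether symmetry exists.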
link |
Yeah, for handwritten recognition.
link |
No, it's not for handwritten, it's for any images.
link |
But I would like apply to handwritten.
link |
Right, in theory it's more general, okay, okay.
link |
So a lot of the things we've been talking about
link |
falls, we've been talking about philosophy a little bit,
link |
but also about mathematics and statistics.
link |
A lot of it falls into this idea,
link |
a universal idea of statistical theory of learning.
link |
What is the most beautiful and sort of powerful
link |
or essential idea you've come across,
link |
even just for yourself personally in the world
link |
of statistics or statistical theory of learning?
link |
Probably uniform convergence, which we did
link |
with Alexey Chervonenkis.
link |
Can you describe uniform convergence?
link |
You have law of large numbers.
link |
So for any function, expectation of function,
link |
average of function converged to expectation.
link |
But if you have set of functions,
link |
for any function it is true.
link |
But it should converge simultaneously
link |
for all set of functions.
link |
And for learning, you need uniform convergence.
link |
Just convergence is not enough.
link |
Because when you pick up one which gives minimum,
link |
you can pick up one function which does not converge
link |
and it will give you the best answer for this function.
link |
So you need uniform convergence to guarantee learning.
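The argument above can be checked with a small simulation. In this sketch every "function" is a coin-flip guesser whose true error is exactly 0.5, so the law of large numbers holds for each one individually; the helper name `best_empirical_error`, the sample size of 200, and the set size of 5000 are arbitrary choices for illustration.

```python
import random

def best_empirical_error(num_functions, num_samples, rng):
    """Smallest training error among functions that all have true error 0.5."""
    best = 1.0
    for _ in range(num_functions):
        # A coin-flip guesser is wrong on each example with probability 0.5.
        mistakes = sum(rng.random() < 0.5 for _ in range(num_samples))
        best = min(best, mistakes / num_samples)
    return best

rng = random.Random(0)
single = best_empirical_error(1, 200, rng)     # one fixed function
many = best_empirical_error(5000, 200, rng)    # pick the minimum over a large set

print(single, many)
```

For a single function the training error stays near 0.5, but the minimum over thousands of functions lands well below 0.5 even though no function in the set is better than chance. Picking the empirical minimizer is therefore only trustworthy when the averages converge for the whole set at once.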
link |
So learning does not rely on trivial law of large numbers,
link |
it relies on uniform law.
link |
But idea of convergence exists in statistics for a long time.
link |
But it is interesting that as I think about myself,
link |
how stupid I was 50 years, I did not see weak convergence.
link |
I work on strong convergence.
link |
But now I think that most powerful is weak convergence.
link |
Because it makes admissible set of functions.
link |
And even in old proverbs,
link |
when people try to understand recognition about dog law,
link |
looks like a dog and so on, they use weak convergence.
link |
People in language, they understand this.
link |
But when we're trying to create artificial intelligence,
link |
we went in a different way.
link |
We just consider strong convergence arguments.
link |
So reducing the set of admissible functions,
link |
you think there should be effort put into understanding
link |
the properties of weak convergence?
link |
You know, in classical mathematics, in Hilbert space,
link |
there are only two ways,
link |
two form of convergence, strong and weak.
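For reference, the two modes of convergence in a Hilbert space $H$ can be written as follows. These are the standard textbook definitions, with $f_n$ a sequence of functions, $f$ its limit, and $\psi$ an arbitrary element of $H$ (playing the role of a predicate in this conversation); the notation is chosen here and does not come from the transcript.

```latex
\begin{align*}
\text{strong convergence:} \quad & \lim_{n \to \infty} \| f_n - f \| = 0, \\
\text{weak convergence:}   \quad & \lim_{n \to \infty} \langle f_n - f, \psi \rangle = 0
  \quad \text{for all } \psi \in H .
\end{align*}
```

Strong convergence implies weak convergence, since $|\langle f_n - f, \psi \rangle| \le \| f_n - f \| \, \| \psi \|$ by the Cauchy-Schwarz inequality, but the converse does not hold in general.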
link |
Now we can use both.
link |
That means that we did everything.
link |
And it so happened that when we use Hilbert space,
link |
which is very rich space, space of continuous functions,
link |
which are square integrable.
link |
So we can apply weak and strong convergence for learning
link |
and have closed form solution.
link |
So for computationally simple.
link |
For me, it is sign that it is right way.
link |
Because you don't need any heuristic here,
link |
just do whatever you want.
link |
But now the only what left is this concept
link |
of what is predicate, but it is not statistics.
link |
By the way, I like the fact that you think that heuristics
link |
are a mess that should be removed from the system.
link |
So closed form solution is the ultimate goal.
link |
No, it so happened that when you're using right instrument,
link |
you have closed form solution.
link |
Do you think intelligence, human level intelligence,
link |
when we create it,
link |
will have something like a closed form solution?
link |
You know, now I'm looking on bounds,
link |
which I gave bounds for convergence.
link |
And when I'm looking for bounds,
link |
I'm thinking what is the most appropriate kernel
link |
for this bound would be.
link |
So we know that in say,
link |
all our businesses, we use radial basis function.
link |
But looking on the bound,
link |
I think that I start to understand that maybe
link |
we need to make corrections to radial basis function
link |
to be closer to work better for this bounds.
link |
So I'm again trying to understand what type of kernel
link |
have best approximation,
link |
best fit to this bound.
link |
Sure, so there's a lot of interesting work
link |
that could be done in discovering better functions
link |
than radial basis functions for bounds you find.
link |
It still comes from,
link |
you're looking to math and trying to understand what.
link |
From your own mind, looking at the, I don't know.
link |
Then I'm trying to understand what will be good for that.
link |
Yeah, but to me, there's still a beauty.
link |
Again, maybe I'm a descendant of Alan Turing to heuristics.
link |
To me, ultimately, intelligence will be a mess of heuristics.
link |
And that's the engineering answer, I guess.
link |
When you're doing say, self driving cars,
link |
the great guy who will do this.
link |
It doesn't matter what theory behind that.
link |
Who has a better feeling how to apply it.
link |
But by the way, it is the same story about predicates.
link |
Because you cannot create rule for,
link |
situation is much more than you have rule for that.
link |
But maybe you can have more abstract rule
link |
than it will be less literal.
link |
It is the same story about ideas
link |
and ideas applied to specific cases.
link |
But still you should reach.
link |
You cannot avoid this.
link |
But you should still reach for the ideas
link |
to understand the science.
link |
Okay, let me kind of ask, do you think neural networks
link |
or functions can be made to reason?
link |
So what do you think, we've been talking about intelligence,
link |
but this idea of reasoning,
link |
there's an element of sequentially disassembling,
link |
interpreting the images.
link |
So when you think of handwritten recognition, we kind of think
link |
that there'll be a single, there's an input and output.
link |
There's not a recurrence.
link |
What do you think about sort of the idea of recurrence,
link |
of going back to memory and thinking through this
link |
sort of sequentially mangling the different representations
link |
over and over until you arrive at a conclusion?
link |
Or is ultimately all that can be wrapped up into a function?
link |
No, you're suggesting that let us use this type of algorithm.
link |
When I started thinking, I first of all,
link |
starting to understand what I want.
link |
Can I write down what I want?
link |
And then I'm trying to formalize.
link |
And when I do that, I think I have to solve this problem.
link |
And till now I did not see a situation where you need recurrence.
link |
But do you observe human beings?
link |
You try to, it's the imitation question, right?
link |
It seems that human beings reason
link |
this kind of sequentially sort of,
link |
does that inspire in you a thought that we need to add that
link |
into our intelligence systems?
link |
You're saying, okay, I mean, you've kind of answered saying
link |
until now I haven't seen a need for it.
link |
And so because of that, you don't see a reason
link |
to think about it.
link |
You know, most of things I don't understand.
link |
In reasoning in human, it is for me too complicated.
link |
For me, the most difficult part is to ask questions,
link |
to good questions, how it works,
link |
how people asking questions, I don't know this.
link |
You said that machine learning is not only
link |
about technical things, speaking of questions,
link |
but it's also about philosophy.
link |
So what role does philosophy play in machine learning?
link |
We talked about Plato, but generally thinking
link |
in this philosophical way, does it have,
link |
how does philosophy and math fit together in your mind?
link |
First ideas and then their implementation.
link |
It's like predicate, like say admissible set of functions.
link |
It comes together, everything.
link |
Because the first iteration of theory was done 50 years ago.
link |
I told that, this is theory.
link |
So everything's there, if you have data you can,
link |
and your set of function has not big capacity.
link |
So low VC dimension, you can do that.
link |
You can make structural risk minimization, control capacity.
link |
But you was not able to make admissible set of function good.
link |
Now when suddenly realize that we did not use
link |
another idea of convergence, which we can,
link |
everything comes together.
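The capacity control described in this answer can be illustrated with one classical form of the VC bound for classification: with probability at least 1 - eta, the expected risk is at most the empirical risk plus sqrt((h (ln(2n/h) + 1) - ln(eta/4)) / n), where h is the VC dimension and n the sample size. The sketch below is only an illustration under that assumption; the function names and the candidate tuples are hypothetical, and it does not cover the admissible-set or weak-convergence ideas discussed here.

```python
import math


def vc_confidence(h, n, eta=0.05):
    """Capacity term of the classical VC bound (VC dimension h, n samples)."""
    if not 0 < h <= n:
        raise ValueError("require 0 < h <= n")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def select_by_srm(candidates, n, eta=0.05):
    """Pick the candidate minimizing empirical risk + capacity term.

    candidates: list of (name, vc_dimension, empirical_risk) tuples,
    one per element of a nested structure of function sets.
    """
    return min(candidates, key=lambda c: c[2] + vc_confidence(c[1], n, eta))
```

Structural risk minimization in this sense trades off fit against capacity: a richer function set lowers the empirical risk but raises the capacity term, and the selected set is the one where the sum is smallest.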
link |
But those are mathematical notions.
link |
Philosophy plays a role of simply saying
link |
that we should be swimming in the space of ideas.
link |
Let's talk what is philosophy.
link |
Philosophy means understanding of life.
link |
So understanding of life, say people like Plato,
link |
they understand on very high abstract level of life.
link |
So, and whatever I doing,
link |
just implementation of my understanding of life.
link |
But every new step, it is very difficult.
link |
For example, to find this idea
link |
that we need weak convergence was not simple for me.
link |
So that required thinking about life a little bit.
link |
Hard to trace, but there was some thought process.
link |
I'm working, I'm thinking about the same problem
link |
for 50 years or more, and again, and again, and again.
link |
I'm trying to be honest and that is very important.
link |
Not to be very enthusiastic, but concentrate
link |
on whatever we was not able to achieve, for example.
link |
And understand why.
link |
And now I understand that because I believe in math,
link |
I believe that in Wigner's idea.
link |
But now when I see that there are only two way
link |
of convergence and we're using both,
link |
that means that we must do as well as people doing.
link |
But now, exactly in philosophy
link |
and what we know about predicate,
link |
how we understand life, can we describe as a predicate.
link |
I thought about that and that is more or less obvious
link |
level of symmetry.
link |
But next, I have a feeling,
link |
it's something about structures.
link |
But I don't know how to formulate,
link |
how to measure measure of structure and all this stuff.
link |
And the guy who will solve this challenge problem,
link |
then when we were looking how he did it,
link |
probably just only symmetry is not enough.
link |
But something like symmetry will be there.
link |
Structure will be there.
link |
Oh yeah, absolutely.
link |
Symmetry will be there and level of symmetry will be there.
link |
And level of symmetry, antisymmetry, diagonal, vertical.
link |
And I even don't know how you can use
link |
in different direction idea of symmetry, it's very general.
link |
But it will be there.
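As a rough illustration of what a symmetry predicate on a digit image might compute, the sketch below scores how closely a grayscale image matches its own mirror image. This is an assumption about one possible formalization, not Vapnik's definition; the function names and the normalization are hypothetical, and how such a score would enter a learning algorithm is not shown.

```python
def _mirror_score(image, mirrored):
    """Score in [0, 1]; 1.0 means image and mirrored are identical."""
    diff = 0.0
    total = 0.0
    for row, mrow in zip(image, mirrored):
        for a, b in zip(row, mrow):
            diff += abs(a - b)
            total += abs(a) + abs(b)
    # An all-zero image is treated as perfectly symmetric.
    return 1.0 if total == 0 else 1.0 - diff / total


def left_right_symmetry(image):
    """Compare the image with its left-right mirror (each row reversed)."""
    return _mirror_score(image, [row[::-1] for row in image])


def top_bottom_symmetry(image):
    """Compare the image with its top-bottom mirror (row order reversed)."""
    return _mirror_score(image, image[::-1])
```

Here `image` is a list of equal-length rows of pixel intensities. Different mirror operations (diagonal, antisymmetric, and so on) would give the different "levels of symmetry" mentioned above.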
link |
I think that people very sensitive to idea of symmetry.
link |
But there are several ideas like symmetry.
link |
As I would like to learn.
link |
But you cannot learn just thinking about that.
link |
You should do challenging problems
link |
and then analyze them, why it was able to solve them.
link |
And then you will see.
link |
Very simple things, it's not easy to find.
link |
But even with talking about this every time.
link |
I was surprised, I tried to understand.
link |
These people describe in language
link |
strong convergence mechanism for learning.
link |
I did not see, I don't know.
link |
But weak convergence, this duck story
link |
and story like that when you will explain to kid,
link |
you will use weak convergence argument.
link |
It looks like a duck, it does like a duck does.
link |
But when you try to formalize, you're just ignoring this.
link |
Why, why 50 years from start of machine learning?
link |
And that's the role of philosophy, thinking about life.
link |
I think that maybe, I don't know.
link |
Maybe this is theory also, we should blame for that
link |
because empirical risk minimization and all this stuff.
link |
And if you read now textbooks,
link |
they just about bound about empirical risk minimization.
link |
They don't looking for another problem like admissible set.
link |
But on the topic of life, perhaps we,
link |
you could talk in Russian for a little bit.
link |
What's your favorite memory from childhood?
link |
How about, can you try to answer in Russian?
link |
It was very cool when...
link |
What kind of music?
link |
What's your favorite?
link |
Well, different composers.
link |
At first, it was Vivaldi, I was surprised that it was possible.
link |
And then when I understood Bach, I was absolutely shocked.
link |
By the way, from him I think that there is a predicate,
link |
Because you can just feel the structure.
link |
And I don't think that different elements of life
link |
are very much divided, in the sense of predicates.
link |
Everywhere structure, in painting structure,
link |
in human relations structure.
link |
Here's how to find these high level predicates, it's...
link |
In Bach and in life, everything is connected.
link |
Now that we're talking about Bach,
link |
let's switch back to English,
link |
because I like Beethoven and Chopin, so...
link |
Well, Chopin, it's another amusing story.
link |
But Bach, if we talk about predicates,
link |
Bach probably has the most sort of
link |
well defined predicates that underlie it.
link |
It is very interesting to read what critics
link |
are writing about Bach, which words they're using.
link |
They're trying to describe predicates.
link |
And then Chopin, it is very different vocabulary,
link |
very different predicates.
link |
And I think that if you will make collection of that,
link |
so maybe from this you can describe predicate
link |
for digit recognition as well.
link |
From Bach and Chopin.
link |
No, no, no, not from Bach and Chopin.
link |
From the critic interpretation of the music, yeah.
link |
When they're trying to explain you music, what they use.
link |
As they use, they describe high level ideas
link |
of Plato's ideas, what behind this music.
link |
So art is not self explanatory in some sense.
link |
So you have to try to convert it into ideas.
link |
It is ill-posed problems.
link |
When you go from ideas to the representation,
link |
But when you're trying to go back, it is ill-posed problems.
link |
But nevertheless, I believe that when you're looking
link |
from that, even from art, you will be able to find
link |
predicates for digit recognition.
link |
That's such a fascinating and powerful notion.
link |
Do you ponder your own mortality?
link |
Do you think about it?
link |
Do you draw insight from it?
link |
About mortality, no, yeah.
link |
Are you afraid of death?
link |
Not too much, not too much.
link |
It is pity that I will not be able to do something
link |
which I think I have a feeling to do that.
link |
For example, I will be very happy to work with guys
link |
theoretician from music to write this collection
link |
of description, how they describe music,
link |
how they use that predicate, and from art as well.
link |
Then take what is in common and try to understand
link |
predicate which is absolute for everything.
link |
And then use that for visual recognition
link |
and see if there is a connection.
link |
Ah, there's still time.
link |
It take years and years and years.
link |
Yes, yeah, it's a long way.
link |
Well, see, you've got the patient mathematician's mind.
link |
I think it could be done very quickly and very beautifully.
link |
I think it's a really elegant idea.
link |
Yeah, you know, the most time,
link |
it is not to make this collection to understand
link |
what is the common to think about that once again
link |
and again and again.
link |
Again and again and again, but I think sometimes,
link |
especially just when you say this idea now,
link |
even just putting together the collection
link |
and looking at the different sets of data,
link |
language, trying to interpret music,
link |
criticize music, and images,
link |
I think there'll be sparks of ideas that'll come.
link |
Of course, again and again, you'll come up with better ideas,
link |
but even just that notion is a beautiful notion.
link |
I even have some example.
link |
Yes, so I have friend
link |
who was specialist in Russian poetry.
link |
She is professor of Russian poetry.
link |
She did not write poems,
link |
but she know a lot of stuff.
link |
She make book, several books,
link |
and one of them is a collection of Russian poetry.
link |
She have images of Russian poetry.
link |
She collect all images of Russian poetry.
link |
And I ask her to do following.
link |
You have MNIST, digit recognition,
link |
and we get 100 digits,
link |
or maybe less than 100.
link |
I don't remember, maybe 50 digits.
link |
And try from poetical point of view,
link |
describe every image which she see,
link |
using only words of images of Russian poetry.
link |
And then we tried to,
link |
I call it learning using privileged information.
link |
I call it privileged information.
link |
You have on two languages.
link |
One language is just image of digit,
link |
and another language, poetic description of this image.
link |
And this is privileged information.
link |
And there is an algorithm when you're working
link |
using privileged information, you're doing better.
link |
So there's something there.
link |
And there is a, in NEC,
link |
she unfortunately died.
link |
The collection of digits
link |
in poetic descriptions of these digits.
link |
So there's something there in that poetic description.
link |
But I think that there is a abstract ideas
link |
on the Plato level of ideas.
link |
Yeah, that they're there.
link |
That could be discovered.
link |
And music seems to be a good entry point.
link |
But as soon as we start with this challenge problem.
link |
The challenge problem.
link |
It immediately connected to all this stuff.
link |
Especially with your talk and this podcast,
link |
and I'll do whatever I can to advertise it.
link |
It's such a clean, beautiful Einstein like formulation
link |
of the challenge before us.
link |
Let me ask another absurd question.
link |
We talked about mortality.
link |
We talked about philosophy of life.
link |
What do you think is the meaning of life?
link |
What's the predicate for mysterious existence here on earth?
link |
It's very interesting how we have,
link |
in Russia, I don't know if you know the guy Strugatsky.
link |
They are writing fiction.
link |
They're thinking about human, what's going on.
link |
And they have idea that there are developing
link |
two type of people, common people and very smart people.
link |
They just started.
link |
And these two branches of people will go
link |
in different direction very soon.
link |
So that's what they're thinking about that.
link |
So the purpose of life is to create two paths.
link |
Of human societies.
link |
Simple people and more complicated people.
link |
Which do you like best?
link |
The simple people or the complicated ones?
link |
I don't know that it is just his fantasy,
link |
but you know, every week we have guy
link |
who is just a writer and also a theorist of literature.
link |
And he explain how he understand literature
link |
and human relationship.
link |
And I understood that I'm just small kids
link |
He's very smart guy in understanding life.
link |
He knows this predicate.
link |
He knows big blocks of life.
link |
I am amazed every time when I listen to him.
link |
And he just talking about literature.
link |
And I think that I was surprised.
link |
So the managers in big companies,
link |
most of them are guys who study English language
link |
and English literature.
link |
Because they understand life.
link |
They understand models.
link |
Maybe many talented critics just analyzing this.
link |
And this is big science like property.
link |
That's very smart.
link |
It amazes me that you are and continue to be humbled
link |
by the brilliance of others.
link |
I'm very modest about myself.
link |
I see so smart guys around.
link |
Well, let me be immodest for you.
link |
You're one of the greatest mathematicians,
link |
statisticians of our time.
link |
It's truly an honor.
link |
Thank you for talking again.
link |
Let's talk again when your challenge is taken on
link |
and solved by grad student.
link |
Especially when they use it.
link |
Maybe music will be involved.
link |
Vladimir, thank you so much.
link |
It's been an honor. Thank you very much.
link |
Thanks for listening to this conversation
link |
with Vladimir Vapnik.
link |
And thank you to our presenting sponsor, Cash App.
link |
Download it, use code LexPodcast.
link |
You'll get $10 and $10 will go to FIRST,
link |
an organization that inspires and educates young minds
link |
to become science and technology innovators of tomorrow.
link |
If you enjoy this podcast, subscribe on YouTube,
link |
give us five stars on Apple Podcast,
link |
support it on Patreon,
link |
or simply connect with me on Twitter at Lex Fridman.
link |
And now, let me leave you with some words
link |
from Vladimir Vapnik.
link |
When solving a problem of interest,
link |
do not solve a more general problem
link |
as an intermediate step.
link |
Thank you for listening.
link |
I hope to see you next time.