Vladimir Vapnik: Statistical Learning | Lex Fridman Podcast #5
The following is a conversation with Vladimir Vapnik.
He's the co-inventor of Support Vector Machines,
Support Vector Clustering, VC theory,
and many foundational ideas in statistical learning.
He was born in the Soviet Union and worked
at the Institute of Control Sciences in Moscow.
Then in the United States, he worked at AT&T, NEC Labs,
Facebook Research, and now as a professor at Columbia.
His work has been cited over 170,000 times.
He has some very interesting ideas
about artificial intelligence and the nature of learning,
especially on the limits of our current approaches
and the open problems in the field.
This conversation is part of the MIT course
on artificial general intelligence
and the Artificial Intelligence Podcast.
If you enjoy it, please subscribe on YouTube
or rate it on iTunes or your podcast provider of choice
or simply connect with me on Twitter
or other social networks at Lex Fridman, spelled F-R-I-D.
And now here's my conversation with Vladimir Vapnik.
Einstein famously said that God doesn't play dice.
You have studied the world through the eyes of statistics.
So let me ask you, in terms of the nature of reality,
the fundamental nature of reality, does God play dice?
We don't know some factors, and because we
don't know some factors, which could be important,
it looks like God plays dice, but we should describe it.
In philosophy, they distinguish between two positions:
the position of instrumentalism, where
you're creating a theory for prediction,
and the position of realism, where you're
trying to understand God's thought.
Can you describe instrumentalism and realism?
For example, if you have some mechanical laws, what is that?
Is it a law which is true always and everywhere?
Or is it a law which allows you to predict
the position of a moving element? What do you believe?
Do you believe that it is God's law, that God created the world,
which obeys this physical law,
or is it just a law for predictions?
And which one is instrumentalism?
If you believe that this is the law of God, and it's always
true everywhere, that means that you're a realist.
So you're trying to really understand God's thought.
So do you see the world as an instrumentalist?
You know, I'm working with some models,
models of machine learning.
So in this model, we consider a setting,
and we try to solve, resolve the setting,
to solve the problem.
And you can do it in two different ways.
From the point of view of instrumentalists,
and that's what everybody does now,
they say that the goal of machine learning
is to find the rule for classification.
That is true, but it is an instrument for prediction.
But I can say the goal of machine learning
is to learn about conditional probability.
So how God plays dice, and does He play?
What is the probability for one?
What is the probability for another in a given situation?
But for prediction, I don't need this.
But for understanding, I need conditional probability.
So let me just step back a little bit first to talk about...
You mentioned, which I read last night,
parts of the 1960 paper by Eugene Wigner,
"The Unreasonable Effectiveness of Mathematics
in the Natural Sciences."
Such a beautiful paper, by the way.
It made me feel, to be honest, to confess my own work
in the past few years on deep learning, heavily applied,
it made me feel that I was missing out
on some of the beauty of nature in the way
that math can uncover.
So let me just step away from the poetry of that for a second.
How do you see the role of math in your life?
Where does it sit?
And does math for you have limits of what it can describe?
Some people say that math is the language which God uses.
So I believe in that.
Speak to God or use God?
So I believe that this article about the unreasonable
effectiveness of math says that if you're
looking into mathematical structures,
they know something about reality.
And most scientists from natural science,
they're looking at equations and trying to understand reality.
So the same in machine learning.
If you very carefully look at all the equations
which define conditional probability,
you can understand something about reality, more
than from your fantasy.
So math can reveal the simple underlying principles
of reality, perhaps.
You know, what does simple mean?
It is very hard to discover them.
But then when you discover them and look at them,
you see how beautiful they are.
And it is surprising why people did not see that before.
You're looking at the equations and deriving it from the equations.
For example, I talked yesterday about the least squares method.
And people had a lot of fantasy about how to improve the least squares method.
But if you're going step by step by solving some equations,
you suddenly will get some term which,
after thinking, you understand describes the
position of the observation points.
In the least squares method, we throw out a lot of information.
We don't look at the positions of the observation points.
We're looking only at residuals.
But when you understood that, that's a very simple idea.
But it's not too simple to understand.
And you can derive this just from equations.
So some simple algebra, a few steps
will take you to something surprising
that, when you think about it, you understand.
And that is proof that human intuition is not so rich
and very primitive.
And it does not see very simple situations.
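To make the least-squares point concrete, here is a minimal Python sketch. The ordinary least-squares loss is a function of the residuals alone, so two datasets with identical residuals but observation points in different places receive exactly the same score. The second function weights pairs of residuals by a matrix built from the positions x_i. The particular weighting V_ij = 1 - max(x_i, x_j) for x in [0, 1] is a hypothetical choice made here only for illustration. It is not a formula given in the conversation.

```python
from typing import Sequence


def squared_residual_loss(residuals: Sequence[float]) -> float:
    """Ordinary least-squares loss: depends only on the residuals y_i - f(x_i)."""
    return sum(r * r for r in residuals)


def position_aware_loss(residuals: Sequence[float], xs: Sequence[float]) -> float:
    """Quadratic form r^T V r whose weights depend on where the points x_i lie.

    Hypothetical weighting for x in [0, 1]: V[i][j] = 1 - max(x_i, x_j),
    i.e. min(1 - x_i, 1 - x_j), which keeps the form non-negative.
    """
    n = len(residuals)
    total = 0.0
    for i in range(n):
        for j in range(n):
            v_ij = 1.0 - max(xs[i], xs[j])
            total += residuals[i] * v_ij * residuals[j]
    return total


residuals = [1.0, -1.0, 0.5]
xs_a = [0.1, 0.5, 0.9]  # observation points laid out one way
xs_b = [0.9, 0.5, 0.1]  # same residuals, mirrored positions

print(squared_residual_loss(residuals))  # cannot tell the two layouts apart
print(position_aware_loss(residuals, xs_a))
print(position_aware_loss(residuals, xs_b))
```

Both layouts give the same squared loss, while the position-aware form separates them. Any real method would still have to justify its choice of V.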
So let me take a step back in general.
But what about human ingenuity, as opposed to intuition?
Moments of brilliance.
So do you have to be so hard on human intuition?
Are there moments of brilliance in human intuition
that can leap ahead of math, and then the math will catch up?
I think that the best human intuition
is in putting in axioms.
And then it is technical:
see where the axioms take you.
But you must take the axioms correctly,
and the axioms are polished over generations of scientists.
And this is integral wisdom.
So that's beautifully put.
But if you maybe look at, when you think of Einstein
and special relativity, what is the role of imagination
coming first there, in the moment of discovery of an idea?
So there is obviously a mix of math
and out-of-the-box imagination there.
That I don't know.
Whatever I did, I exclude any imagination.
Because whatever I saw in machine learning that
comes from imagination, like features, like deep learning,
is not relevant to the problem.
When you're looking very carefully
at mathematical equations, you're
deriving a very simple theory, which goes far beyond
whatever people can imagine.
Because it is not good fantasy.
It is just interpretation.
It is just fantasy.
But it is not what you need.
You don't need any imagination to derive, say,
the main principle of machine learning.
When you think about learning and intelligence,
maybe thinking about the human brain
and trying to describe mathematically the process of learning
that is something like what happens in the human brain,
do you think we have the tools currently?
Do you think we will ever have the tools
to try to describe that process of learning?
It is not a description of what's going on.
It is interpretation.
It is your interpretation.
Your vision can be wrong.
You know, when a guy invented the microscope,
Leeuwenhoek, for the first time, only he had this instrument,
and he kept the microscope secret.
But he wrote reports to the London Academy of Science.
In his report, when he looked into the blood,
he looked everywhere, in the water, in the blood,
but he described blood like a fight between queen and king.
So he saw blood cells, red cells,
and he imagined that they were armies fighting each other.
And it was his interpretation of the situation.
And he sent this report to the Academy of Science.
They looked very carefully because they believed
that he was right, that he saw something.
But he gave the wrong interpretation.
And I believe the same can happen with the brain.
Because the most important part, you know,
I believe in human language.
In some proverbs, there is so much wisdom.
For example, people say that better than 1,000 days
of diligent study is one day with a great teacher.
But if I ask you what the teacher does, nobody knows.
And that is intelligence.
And what we know from history, and now from math
and machine learning, is that a teacher can do a lot.
So what, from a mathematical point of view,
is the great teacher?
That's an awful question.
Now, what we can say is what a teacher can do:
he can introduce some invariants, some predicates
for creating invariants.
Because the teacher knows reality and can describe
from this reality a predicate, an invariant.
And he knows that when you use an invariant,
you can decrease the number of observations 100 times.
But maybe try to pull that apart a little bit.
I think you mentioned a piano teacher saying to the student,
"Play like a butterfly."
I played piano, I played guitar for a long time.
Yeah, maybe it's romantic, poetic.
But it feels like there's a lot of truth in that statement.
There is a lot of instruction in that statement.
And so can you pull that apart?
The language itself may not contain this information.
It's not blah, blah, blah.
It does not blah, blah, blah, yeah.
It affects your playing.
But it's not the language.
So what is the information being exchanged there?
What is the nature of the information?
What is the representation of that information?
I believe that it is a sort of predicate.
That is exactly what intelligence in machine learning is.
Because the rest is just mathematical technique.
I think that what was discovered recently
is that there are two mechanisms of learning.
One is called the strong convergence mechanism
and the other the weak convergence mechanism.
Before, people used only one kind of convergence.
In the weak convergence mechanism, you can use a predicate.
That's what "play like a butterfly" is.
And it will immediately affect your playing.
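The two mechanisms named here are presumably the standard notions of convergence in a Hilbert space of functions. Assuming that reading, the definitions are written out below. The symbols are our notation, not the speaker's: f_ℓ is the function learned from ℓ observations, f is the target, and φ is a test function.

```latex
\text{strong convergence:}\quad \lim_{\ell \to \infty} \lVert f_\ell - f \rVert = 0,
\qquad
\text{weak convergence:}\quad \lim_{\ell \to \infty} \langle f_\ell, \varphi \rangle = \langle f, \varphi \rangle \ \text{ for every } \varphi .
```

Strong convergence implies weak convergence, because |⟨f_ℓ − f, φ⟩| ≤ ‖f_ℓ − f‖ ‖φ‖ by the Cauchy-Schwarz inequality. Loosely, a predicate such as "play like a butterfly" can be read as one particular φ on which the learner is asked to agree with the target.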
You know, there is an English proverb:
"If it looks like a duck, swims like a duck,
and quacks like a duck, then it is probably a duck."
But this is exactly about predicates.
"Looks like a duck": what does it mean?
So you saw many ducks; that is your training data.
So you have an integral description of how ducks look.
Yeah, the visual characteristics of a duck.
Yeah, but you want...
And you have a model for recognizing ducks.
So you would like the theoretical description
from the model to coincide with the empirical description which
you saw in the training data.
So "looks like a duck" is general.
But what about "swims like a duck"?
You should know that ducks swim.
You can say "plays chess like a duck," OK?
A duck doesn't play chess.
And it is a completely legal predicate, but it is useless.
So how can a teacher recognize a predicate that is not useless?
So up to now, we don't use this kind of predicate
in existing machine learning.
And you think that's not so useful?
So why do we need billions of data points?
In this English proverb, they use only three predicates:
looks like a duck, swims like a duck, and quacks like a duck.
So you can't deny the fact that "swims like a duck"
and "quacks like a duck" have humor in them, have ambiguity.
Let's talk about "swims like a duck."
It does not say "jumps like a duck."
Because it's not relevant.
But that means that you know ducks, you know different birds,
and you derive from this that it is relevant to say "swims like a duck."
So underneath, in order for us to understand "swims like a duck,"
it feels like we need to know millions of other little pieces
that we pick up along the way.
You don't think so?
There doesn't need to be this knowledge base?
Those statements carry some rich information
that helps us understand the essence of a duck.
How far are we from integrating predicates?
You know, when you consider the complete theory
of machine learning, what it does:
you have a lot of functions.
And then you're saying, "it looks like a duck."
You see your training data.
From the training data, you recognize how the expected duck should look.
Then you remove all functions which do not look the way you think
they should look from the training data.
So you decrease the set of functions from which you pick up one.
Then you give a second predicate.
And then, again, decrease the set of functions.
And after that, you pick up the best function you can find.
It is standard machine learning.
So why do you need not too many examples?
Because your predicates are very good?
That means that the predicates are very good.
Because every predicate is invented
to decrease the admissible set of functions.
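The shrink-then-pick procedure just described can be written schematically as follows. This is a sketch in our own notation, since the conversation gives no formulas: F_0 is the initial set of functions, ψ_j is the j-th predicate, L is a loss function, and ℓ is the number of training pairs (x_i, y_i).

```latex
\mathcal{F}_0 \supseteq \mathcal{F}_1 \supseteq \dots \supseteq \mathcal{F}_k,
\qquad
\mathcal{F}_j = \{\, f \in \mathcal{F}_{j-1} : f \text{ is consistent with predicate } \psi_j \text{ on the training data} \,\},

f^{*} = \arg\min_{f \in \mathcal{F}_k} \frac{1}{\ell} \sum_{i=1}^{\ell} L\bigl(y_i, f(x_i)\bigr).
```

Each predicate can only remove functions. A good predicate is therefore one that discards many candidates while keeping a good function in the final set F_k.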
So you talk about admissible sets of functions,
and you talk about good functions.
So what makes a good function?
So an admissible set of functions is a set of functions
which has small capacity, or small diversity,
small VC dimension, for example, and which contains a good function.
So by the way, for people who don't know,
VC: you're the V in the VC.
So how would you describe to a lay person what VC theory is?
How would you describe VC?
So when you have a machine, a machine
capable of picking up one function from the admissible set of functions,
the set of admissible functions can be big.
It can contain all continuous functions, and that is useless.
You don't have so many examples to pick up a function.
But it can be small.
Small, we call it capacity, but maybe it is better to call it diversity.
So there are not very different functions in the set.
It is an infinite set of functions, but not very diverse.
So it has a small VC dimension.
When the VC dimension is small, you need a small amount
of observations.
So the goal is to create an admissible set of functions
which has a small VC dimension and contains a good function.
Then you will be able to pick up the function
using a small amount of observations.
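As a rough numerical illustration of why a small VC dimension means fewer observations, the sketch below evaluates the confidence term of the commonly quoted VC bound for 0-1 loss. With probability at least 1 - η, R(f) ≤ R_emp(f) + sqrt((h(ln(2n/h) + 1) - ln(η/4)) / n), where h is the VC dimension and n is the number of observations. The bound is loose in practice. The target gap eps = 0.1 and η = 0.05 are arbitrary choices, and the numbers only show how the required sample size scales with h.

```python
import math


def vc_confidence(h: int, n: int, eta: float = 0.05) -> float:
    """Confidence term of the VC bound for 0-1 loss (h = VC dimension, n = sample size)."""
    if n < h:
        raise ValueError("the bound is only meaningful for n >= h")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def samples_needed(h: int, eps: float, eta: float = 0.05) -> int:
    """Smallest n >= h whose confidence term is at most eps.

    The term decreases in n once n >= h, so doubling followed by
    binary search finds the smallest such n.
    """
    if vc_confidence(h, h, eta) <= eps:
        return h
    lo, hi = h, 2 * h
    while vc_confidence(h, hi, eta) > eps:
        lo, hi = hi, 2 * hi
    # invariant: confidence(lo) > eps >= confidence(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if vc_confidence(h, mid, eta) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


for h in (5, 50, 500):
    print(f"VC dimension {h:>3}: about {samples_needed(h, eps=0.1):,} observations")
```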
So that is the task of learning:
it is creating a set of admissible functions
that has a small VC dimension,
and then you figure out a clever way of picking up.
No, that is the goal of learning, which I formulated yesterday.
Statistical learning theory is not
involved in creating the admissible set of functions.
In classical learning theory, everywhere, 100% in textbooks,
the admissible set of functions is given.
But this is science about nothing,
because the most difficult problem is
to create the admissible set of functions: given, say,
a lot of functions, a continuum set of functions,
create an admissible set of functions,
which means that it has finite VC dimension,
small VC dimension, and contains a good function.
So this was out of consideration.
So what's the process of doing that?
I mean, it's fascinating.
What is the process of creating this admissible set of functions?
That is through invariants.
That's invariance.
Can you describe invariance?
Yeah, you're looking at properties of the training data.
And properties means that you have some function,
and you just count what is the average value of the function
on the training data.
You have a model, and what is the expectation
of this function on the model.
And they should coincide.
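Here is a minimal Python sketch of the invariant just described, under our own simplifying assumptions. It uses a single hypothetical feature x in [0, 1], labels y in {0, 1} (duck or not), and a toy candidate set of threshold rules. For a predicate ψ, a candidate f satisfies the invariant when the training-set average of ψ(x)f(x) matches the average of ψ(x)y within a tolerance. Candidates that violate any invariant are removed before the usual pick-the-best step. Using the classifier output as the "model" side is a simplification; the speaker frames the model in terms of conditional probability. The predicate that is 1 everywhere plays the role of a general "looks like" statistic. The predicate that is 0 everywhere mimics "jumps like a duck": every candidate satisfies it, so it removes nothing.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[float], float]


def predict(t: float, x: float) -> int:
    """Candidate function f_t: label x as a duck (1) when it reaches threshold t."""
    return 1 if x >= t else 0


def invariant_gap(t: float, psi: Predicate, xs: Sequence[float], ys: Sequence[int]) -> float:
    """|mean psi(x_i) f_t(x_i) - mean psi(x_i) y_i| over the training data."""
    n = len(xs)
    model_side = sum(psi(x) * predict(t, x) for x in xs) / n
    data_side = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(model_side - data_side)


def admissible(candidates: Sequence[float], predicates: Sequence[Predicate],
               xs: Sequence[float], ys: Sequence[int], tol: float) -> List[float]:
    """Keep only the thresholds that satisfy every invariant within tol."""
    return [t for t in candidates
            if all(invariant_gap(t, psi, xs, ys) <= tol for psi in predicates)]


def training_error(t: float, xs: Sequence[float], ys: Sequence[int]) -> float:
    return sum(predict(t, x) != y for x, y in zip(xs, ys)) / len(xs)


def looks_like(x: float) -> float:
    """General statistic: weights every point equally (overall rate of ducks)."""
    return 1.0


def jumps_like(x: float) -> float:
    """'Jumps like a duck': true for no point, so it carries no information."""
    return 0.0


xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]
candidates = [0.05 + 0.1 * i for i in range(10)]  # thresholds 0.05, 0.15, ..., 0.95

kept = admissible(candidates, [looks_like], xs, ys, tol=0.01)
useless = admissible(candidates, [jumps_like], xs, ys, tol=0.01)
best = min(kept, key=lambda t: training_error(t, xs, ys))

print(len(candidates), len(kept), len(useless))
print(round(best, 2), training_error(best, xs, ys))
```

In this toy run, the general predicate cuts ten candidates down to three, all of which fit the data. The "jumps like a duck" predicate leaves the set untouched.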
So the problem is about how to pick up functions.
It can be any function.
In fact, it is true for all functions.
But when I'm talking about, say,
a duck does not jump, so you don't ask the question "jumps like a duck."
Because it is trivial: it does not jump,
so it doesn't help you to recognize a duck.
But you know something about which question to ask
when you're asking "swims like a duck."
But "looks like a duck" is a general situation.
"Looks like," say, a guy who has this illness, this disease,
it is legal, so there is a general type of predicate,
"looks like," and a special type of predicate,
which is related to this specific problem.
And that is the intelligence part of all this business.
And that is where teachers come in.
Incorporating those specialized predicates.
What do you think about deep learning, as neural networks,
these arbitrary architectures, as helping accomplish some of the tasks
you're thinking about? Their effectiveness or lack thereof:
what are the weaknesses and what are the possible strengths?
You know, I think that this is fantasy.
Everything like deep learning, like features.
Let me give you this example.
One of the greatest books is Churchill's book about the history of the Second World War.
And he starts this book describing that in old times, when a war was over,
the great kings gathered together,
almost all of them were relatives,
and they discussed what should be done, how to create peace.
And they came to an agreement.
And when the First World War happened, the general public came into power.
And they were so greedy that they robbed Germany.
And it was clear to everybody that it was not peace.
That peace would last only 20 years, because they were not professionals.
It's the same that I see in machine learning.
There are mathematicians who are looking at the problem from a very deep point of view,
a mathematical point of view.
And there are computer scientists who mostly do not know mathematics.
They just have an interpretation of that.
And they invented a lot of blah, blah, blah interpretations like deep learning.
Why do you need deep learning?
Mathematics does not know deep learning.
Mathematics does not know neurons.
It is just a function.
If you like to say piecewise linear function, say that,
and do it in the class of piecewise linear functions.
But they invent something.
And then they try to prove the advantage of that through interpretations,
which are mostly wrong.
And when that is not enough, they appeal to the brain,
which they know nothing about.
Nobody knows what's going on in the brain.
So I think that it is more reliable to look at the math.
This is a mathematical problem.
Do your best to solve this problem.
Try to understand that there is not only one way of convergence,
which is the strong way of convergence.
There is a weak way of convergence, which requires predicates.
And if you go through all this stuff,
you will see that you don't need deep learning.
Even more, I would say, there is one theorem,
which is called the representer theorem.
It says that the optimal solution of the mathematical problem
which describes learning is a shallow network, not deep learning.
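The representer theorem is usually stated for regularized risk minimization in a reproducing kernel Hilbert space. It says the minimizer can be written as f(x) = Σ_i α_i K(x_i, x): a weighted sum of one kernel "unit" per training point, which is the sense in which the solution looks like a shallow, one-hidden-layer network. The sketch below assumes the simplest case, kernel ridge regression with a Gaussian kernel, where α = (K + λI)⁻¹y. The regularization constant lam absorbs any normalization. The kernel width gamma, the value of lam, and the data are arbitrary choices for illustration. Plain Gaussian elimination keeps the sketch dependency-free.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: float, b: float, gamma: float) -> float:
    return math.exp(-gamma * (a - b) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_kernel_ridge(xs: Sequence[float], ys: Sequence[float],
                     gamma: float = 1.0, lam: float = 1e-6) -> List[float]:
    """Return the coefficients alpha = (K + lam * I)^{-1} y."""
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, list(ys))


def kernel_predict(x: float, xs: Sequence[float], alpha: Sequence[float],
                   gamma: float = 1.0) -> float:
    """f(x) = sum_i alpha_i K(x_i, x): one 'hidden unit' per training point."""
    return sum(a * gaussian_kernel(xi, x, gamma) for a, xi in zip(alpha, xs))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
alpha = fit_kernel_ridge(xs, ys)
print([round(kernel_predict(x, xs, alpha), 3) for x in xs])
print(round(kernel_predict(1.5, xs, alpha), 3))
```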
A shallow network, yeah.
The ultimate problem is there.
So in the end, what you're saying is exactly right.
The question is, you see no value in throwing something on the table,
playing with it, not math.
It's like with neural networks, where you said throwing something in the bucket,
or the biological example of looking at kings and queens,
or the cells under the microscope.
You don't see value in imagining the cells or kings and queens
and using that as inspiration and imagination
for where the math will eventually lead you.
You think that interpretation basically deceives you in a way that's not productive.
I think that if you're trying to analyze this business of learning,
and especially the discussion about deep learning,
it is a discussion about interpretation.
It's a discussion about things, about what you can say about things.
That's right, but aren't you surprised by the beauty of it?
Not mathematical beauty, but the fact that it works at all.
Or are you criticizing that very beauty,
our human desire to interpret,
to find our silly interpretations in these constructs?
Let me ask you this.
Are you surprised?
Does it inspire you?
How do you feel about the success of a system like AlphaGo
at beating the game of Go?
Using neural networks to estimate the quality of a board
and the quality of the board?
That is your interpretation, "quality of the board."
It's not our interpretation.
The fact is, a neural network system, it doesn't matter,
a learning system that we don't mathematically understand
beats the best human player.
It does something that was thought impossible.
That means that it's not a very difficult problem.
So we've empirically discovered that this is not a very difficult problem.
Maybe. I can't argue.
Even more, I would say
that if they use deep learning,
it is not the most effective way of learning.
Usually, when people use deep learning,
they're using zillions of training data.
But you don't need this.
I described the challenge:
can we do some problems with deep learning methods,
with deep nets, using 100 times less training data?
Even more, some problems deep learning cannot solve,
because it's not necessarily a good admissible set.
They create an admissible set of functions:
a deep architecture means creating an admissible set of functions.
But you cannot say that you're creating a good admissible set of functions.
It's your fantasy.
It does not come from math.
But it is possible to create an admissible set of functions,
because you have your training data.
Actually, for mathematicians, when you consider invariants,
you need to use the law of large numbers.
When you're doing training in existing algorithms,
you need the uniform law of large numbers,
which is much more difficult.
VC dimension and all this stuff.
Nevertheless, if you use both weak and strong ways of convergence,
you can decrease the amount of training data a lot.
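A small simulation can illustrate the difference between the two laws mentioned here. For one fixed function, here the indicator 1[x ≤ 0.5], the ordinary law of large numbers only needs its sample average to approach its expectation. The uniform law asks for this to hold simultaneously for every function in a class. For the class of indicators 1[x ≤ t], the worst-case deviation over t (the Kolmogorov-Smirnov statistic) is always at least as large as the deviation for any single t. That class has VC dimension 1, so here the uniform law does hold; richer classes are where the VC dimension starts to bite. The uniform distribution on [0, 1], the grid of thresholds, the seed, and the sample sizes are our assumptions.

```python
import bisect
import random
from typing import Tuple


def deviations(n: int, seed: int = 0, grid: int = 1000) -> Tuple[float, float]:
    """Return (|F_n(0.5) - 0.5|, max over grid thresholds t of |F_n(t) - t|).

    F_n is the empirical CDF of n draws from the uniform distribution on [0, 1],
    whose true CDF is F(t) = t.
    """
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))

    def empirical_cdf(t: float) -> float:
        # fraction of sample points <= t, via binary search on the sorted sample
        return bisect.bisect_right(xs, t) / n

    single = abs(empirical_cdf(0.5) - 0.5)                 # one fixed function
    uniform = max(abs(empirical_cdf(k / grid) - k / grid)  # whole class of indicators
                  for k in range(grid + 1))
    return single, uniform


for n in (100, 1_000, 10_000):
    single, uniform = deviations(n)
    print(f"n={n:>6}: fixed t=0.5 -> {single:.4f}   sup over t -> {uniform:.4f}")
```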
You could do it with the three, "swims like a duck" and "quacks like a duck."
Let's step back and think about human intelligence in general.
Clearly, that has evolved in a nonmathematical way.
As far as we know, God, or whoever,
didn't come up with a model and place admissible functions in our brain.
It kind of evolved.
Maybe you have a view on this.
Alan Turing in the 50s, in his paper, asked and rejected the question,
"Can machines think?"
It's not a very useful question.
But can you briefly entertain this useless question?
Can machines think?
So talk about intelligence and your view of it.
I don't know that.
I know that Turing described imitation.
If a computer can imitate a human being, let's call it intelligent.
And he understood that it is not a thinking computer.
He completely understood what he was doing.
But he set up a problem of imitation.
So now we understand that the problem is not in imitation.
I'm not sure that intelligence is just inside of us.
It may also be outside of us.
I have several observations.
So when I prove some theorem, it's a very difficult theorem.
But in a couple of years, in several places, people prove the same theorem.
Say, Sauer's lemma: after us, it was done.
Then another guy proved the same theorem.
In the history of science, it happens all the time.
For example, geometry.
It happened simultaneously.
First it was done by Lobachevsky, and then Gauss and Bolyai and other guys.
It happened simultaneously, within a 10-year period.
And I saw a lot of examples like that.
And many mathematicians think that when they develop something,
they develop something in general which affects everybody.
So maybe our model that intelligence is only inside of us is incorrect.
It's our interpretation.
Maybe there exists some connection with world intelligence.
You're almost like plugging in into...
...and contributing to this...
Into a big network.
...into a big, maybe neural network.
On the flip side of that, maybe you can comment on big O complexity
and how you see classifying algorithms by worst-case running time
in relation to their input.
So that way of thinking about functions.
Do you think P equals NP?
Do you think that's an interesting question?
Yeah, it is an interesting question.
But let me talk about complexity and about the worst-case scenario.
There is a mathematical setting.
When I came to the United States in 1990,
people did not know this theory.
They did not know statistical learning theory.
So in Russia it was published in two monographs,
but in America they didn't know.
Then they learned.
And somebody told me that it is a worst-case theory,
and that they would create a real-case theory,
but till now they did not.
Because it is a mathematical tool.
You can do only what you can do using mathematics,
which has a clear understanding and clear description.
And for this reason we introduced complexity.
Because actually it is diversity; I like this term more.
With VC dimension you can prove some theorems.
But we also created a theory for the case when you know the probability measure.
And that is the best case which can happen.
It is entropy theory.
So from a mathematical point of view,
you know the best possible case and the worst possible case.
You can derive different models in the middle.
But it's not so interesting.
You think the edges are interesting?
The edges are interesting.
Because it is not so easy to get a good bound, an exact bound.
There are not many cases where you have one.
The bound is not exact.
But there are interesting principles which the math discovers.
Do you think it's interesting because it's challenging
and reveals interesting principles that allow you to get those bounds?
Or do you think it's interesting because it's actually very useful
for understanding the essence of a function, of an algorithm?
So it's like me judging your life as a human being
by the worst thing you did and the best thing you did
versus all the stuff in the middle.
It seems not productive.
I don't think so, because you cannot describe the situation in the middle,
or it will not be general.
So you can describe the edge cases.
And it is clear it has some model.
But you cannot describe a model for every new case.
So you will never be accurate when you're using a model.
But from a statistical point of view,
the way you've studied functions and the nature of learning
and the world, don't you think that the real world has a very long tail,
that the edge cases are very far away from the mean,
the stuff in the middle, or no?
I don't know that.
I think that from my point of view,
if you use formal statistics, the uniform law of large numbers,
if you use this invariance business,
you will need just the law of large numbers.
And there's a huge difference between the uniform law of large numbers
and the law of large numbers.
Can you describe that a little more?
Or should we just take it to...
No, for example, when I'm talking about the duck,
I gave three predicates and it was enough.
But if you try to distinguish it formally,
you will need a lot of observations.
And so that means that the information "looks like a duck"
contains a lot of bits of information,
formal bits of information.
So we don't know how many bits of information
things from artificial intelligence contain.
And that is the subject of analysis.
Till now, all this business,
I don't like how people consider artificial intelligence.
They consider it as some code which imitates the activity of a human being.
It is not science.
It is applications.
You would like to imitate God.
It is very useful, and we have good problems.
But you need to learn something more:
how people can develop predicates,
"swims like a duck,"
or "play like a butterfly," or something like that.
Not what the teacher tells you, but how it came into his mind.
How he chose this image.
That is the problem of intelligence.
That is the problem of intelligence.
And you see that as connected to the problem of learning?
Because you immediately give this predicate,
a specific predicate, "swims like a duck"
or "quacks like a duck."
It was chosen somehow.
So what is the line of work, would you say?
If you were to formulate it as a set of open problems
that will take us there:
"play like a butterfly."
We will get a system to be able to...
Let's separate two stories.
One is the mathematical story:
that if you have a predicate, you can do something.
And the other story is that you have to get the predicate.
It is the intelligence problem.
And people have not even started understanding intelligence.
Because to understand intelligence, first of all,
try to understand what teachers are doing.
How a teacher teaches.
Why one teacher is better than another one.
So you think we really even haven't started on the journey
link |
of generating the predicate?
link |
We don't understand. We don't even understand that this problem exists. Because, did you hear?
No, I just know the name.
I want to understand why one teacher is better than another, and how the teacher affects the student. It is not because he is repeating the problem which is in the textbook. He makes some remarks. He makes some philosophy of reasoning.
Yeah, that's a beautiful... So it is a formulation of a question that is the open problem. Why is one teacher better than another? What does he do better? Why, at every level? How do they get better? What does it mean to be better?
From whatever model I have, one teacher can give a very good predicate. One teacher can say "swims like a duck," and another can say "jumps like a dog." And "jumps like a dog" carries zero information.
So what is the most exciting problem in statistical learning you've ever worked on, or are working on now?
I just finished this invariants story, and I'm happy that... I believe that it is the ultimate learning story. At least I can show that there is no other mechanism, only two mechanisms. But they separate the statistical part from the intelligent part. And I know nothing about the intelligent part. And if we knew this intelligent part, it would help us a lot in teaching, in learning.
You don't know it when you see it?
So, for example, in my talk, the last slide was the challenge. Say you have the MNIST digit recognition problem, and deep learning claims that it did it very well, say 99.5% correct answers. But they use 60,000 observations. Can you do the same, but incorporating invariants? What does it mean? You know digits one, two, three. Just looking at them, explain to me which invariants I should keep to use 100 examples, or say 100 times fewer examples, to do the same job.
That last slide, unfortunately your talk ended quickly, but the last slide was a powerful open challenge and a formulation of the essence here.
That is the exact problem of intelligence. Because when machine learning started, it was developed by mathematicians, and they immediately recognized that we use much more training data than humans need. But now again we have come to the same story. That is the problem of learning. It is not like in deep learning, where they use zillions of training data. Because maybe zillions are not enough without good invariants; maybe you will never collect that number of observations. But now it is a question for intelligence, because the statistical part is ready. As soon as you supply us with a predicate, we can do a good job with a small amount of observations. And the very first challenge is digit recognition: you know digits, so please tell me the invariants. I think about that. I can say that for digit 3, I would introduce the concept of horizontal symmetry. The digit 3 has more horizontal symmetry than, say, digit 2 or something like that. But as soon as I get the idea of horizontal symmetry, I can mathematically invent a lot of measures of horizontal symmetry, or vertical symmetry, or diagonal symmetry, whatever, if I have the idea of symmetry. Looking at digits, I see that it is a metapredicate, which is not a shape. It is something like symmetry, like how dark the whole picture is, something like that, which can give rise to a predicate.
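Vapnik gives no formula here. As one illustration of the kind of "measure of horizontal symmetry" he says can be invented once the idea of symmetry exists, the sketch below scores a binary digit image by the fraction of ink pixels that are still ink after reflecting the image top to bottom. The 0/1 grid representation, the function names, and the tiny 5x3 digit bitmaps are hypothetical choices made only for illustration; they are not from the talk or from MNIST.

```python
# A minimal sketch, assuming a digit image is a list of rows of 0/1 pixels.
# These symmetry measures are hypothetical illustrations, not Vapnik's definitions.

def horizontal_symmetry(img):
    """Fraction of ink pixels whose reflection across the horizontal
    midline is also ink (1.0 means perfectly symmetric top to bottom)."""
    reflected = img[::-1]  # reversing row order reflects across the horizontal axis
    ink = matched = 0
    for row, mirror_row in zip(img, reflected):
        for pixel, mirror_pixel in zip(row, mirror_row):
            if pixel:
                ink += 1
                if mirror_pixel:
                    matched += 1
    return matched / ink if ink else 0.0


def vertical_symmetry(img):
    """Same measure, reflecting left to right instead of top to bottom."""
    transposed = [list(col) for col in zip(*img)]  # columns become rows
    return horizontal_symmetry(transposed)


# Toy 5x3 bitmaps (hypothetical, hand-drawn for this example).
THREE = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
TWO = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]

if __name__ == "__main__":
    print(horizontal_symmetry(THREE))  # 1.0
    print(horizontal_symmetry(TWO))    # 9/11, about 0.818
```

On these toy bitmaps the 3 scores higher than the 2 on horizontal symmetry, in line with the remark about the digit 3. Changing only the reflection axis, as `vertical_symmetry` does, yields another measure from the same idea, which is the point being made: one abstract concept generates many concrete predicates.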
You think such a predicate could arise out of something that is not general? Meaning, it feels like for me to be able to understand the difference between a 2 and a 3, I would need to have had a childhood of 10 to 15 years playing with kids, going to school, being yelled at by parents, all of that: walking, jumping, looking at ducks. And only then would I be able to generate the right predicate for telling the difference between a 2 and a 3. Or do you think there is a more efficient way?
I know for sure that you must know something more than digits.
That's a powerful statement.
But maybe there are several languages of description of these elements of digits. So I'm talking about symmetry, about some properties of geometry; I'm talking about something abstract. But this is a problem of intelligence. So in one of our articles, it is trivial to show that every example can carry not more than one bit of information in reality. Because when you show an example and you say "this is a one," you can remove the functions which do not say "one." The best strategy, if you can do it perfectly, removes half of the functions. But when you use one predicate, like "looks like a duck," you can remove many more functions than half. And that means that it contains a lot of bits of information from a formal point of view. But when you have a general picture of what you want to recognize, a general picture of the world, can you invent this predicate? And that predicate carries a lot of information.
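The "one bit per example" argument can be made concrete on a toy function class. The sketch below is a hypothetical illustration, not the construction from the article Vapnik mentions: it enumerates all 64 Boolean functions on six points, measures information as log2 of how much the set of surviving candidate functions shrinks, and compares one labeled example with one invariant-style predicate. The mirror-symmetry predicate and every name in the code are assumptions chosen to echo the symmetry discussion.

```python
# A minimal sketch on a hypothetical toy function class; not from the source article.
from itertools import product
from math import log2

N_POINTS = 6  # toy input space {0, 1, ..., 5}

# Every Boolean function on the input space, stored as its tuple of outputs.
ALL_FUNCTIONS = list(product((0, 1), repeat=N_POINTS))  # 2**6 = 64 functions


def bits_gained(before, after):
    """Information gained by shrinking the candidate set, in bits."""
    return log2(len(before) / len(after))


def guaranteed_bits_from_example(functions, x):
    """Bits an example at x is guaranteed to yield, whichever label it has.
    The worse label keeps the larger group, which is always at least half."""
    ones = sum(f[x] for f in functions)
    survivors = max(ones, len(functions) - ones)
    return log2(len(functions) / survivors)


def keep_consistent_with_example(functions, x, y):
    """Remove every function that does not output label y at point x."""
    return [f for f in functions if f[x] == y]


def keep_consistent_with_predicate(functions, predicate):
    """Remove every function that violates the predicate (an invariant)."""
    return [f for f in functions if predicate(f)]


def mirror_symmetric(f):
    """Hypothetical invariant: f(x) == f(N_POINTS - 1 - x) for every x."""
    return all(f[x] == f[N_POINTS - 1 - x] for x in range(N_POINTS))


if __name__ == "__main__":
    after_example = keep_consistent_with_example(ALL_FUNCTIONS, x=2, y=1)
    after_predicate = keep_consistent_with_predicate(ALL_FUNCTIONS, mirror_symmetric)
    print(len(after_example), bits_gained(ALL_FUNCTIONS, after_example))      # 32 1.0
    print(len(after_predicate), bits_gained(ALL_FUNCTIONS, after_predicate))  # 8 3.0
    print(max(guaranteed_bits_from_example(ALL_FUNCTIONS, x) for x in range(N_POINTS)))  # 1.0
```

The single labeled example removes exactly half of the 64 functions (1 bit), and no single example is guaranteed to do better, while the symmetry predicate alone removes 56 of them (3 bits). With more points the gap widens: on 2k points, mirror symmetry fixes half of the outputs and is worth k bits.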
Beautifully put. Maybe it's just me, but in all the math you show in your work, which is some of the most profound mathematical work in the field of learning, AI, and just math in general, I hear a lot of poetry and philosophy. You really kind of talk about the philosophy of science. There's a poetry and music to a lot of the work you're doing and the way you're thinking about it. So where does that come from? Do you escape to poetry? Do you escape to music?
I think that there exists ground truth.
There exists ground truth?
Yeah, and that can be seen everywhere. The smart guys, philosophers, sometimes I am surprised how deeply they see. Sometimes I see that some of them are completely off the subject. But the ground truth I see in music.
Music is the ground truth?
And in poetry, many poets believe they take dictation.
So what piece of music, as a piece of empirical evidence, gave you a sense that they are touching something in the ground truth?
The structure, the math of music. Because when you're listening to Bach, you see this structure: very clear, very classic, very simple. And when you have axioms in geometry, you have the same feeling as with Bach. And in poetry, sometimes you see the same.
And if you look back at your childhood, you grew up in Russia, you maybe were born as a researcher in Russia, you developed as a researcher in Russia, you came to the United States and worked in a few places. What were some of your happiest moments as a researcher? Some of the most profound moments, not in terms of their impact on society, but in terms of their impact on how damn good you feel that day, and you remember that moment.
You know, every time you find something, every simple thing. But my general feeling is that most of the time I was wrong. You should go again and again and again and try to be honest in front of yourself. Not to make interpretations, but try to understand that it is related to the ground truth, that it is not my blah, blah, blah interpretation or something like that.
But you're allowed to get excited at the possibility of discovery. You have to double check it, but...
No, but how is it related to the ground truth? Is it just temporary, or is it forever? You know, you always have a feeling when you find something. So, 20 years ago, when we discovered statistical learning, nobody believed it, except for one guy, Dudley from MIT. And then in 20 years it became fashion. And the same with support vector machines, that is, kernel machines.
So with support vector machines and learning theory, when you were working on it, you had a sense of the profundity of it, that this seems to be right, it seems to be powerful.
Right, absolutely, immediately. I recognized that it would last forever. And now, when I found this invariants story, I have a feeling that it is complete, because I have proved that there are no different mechanisms. Some cosmetic improvements you can make, but in terms of invariants, you need both invariants and statistical learning, and they should work together. But also, I'm happy that we can formulate what intelligence is from that, and separate it from the technical part. And that is completely different.
Well, Vladimir, thank you so much for talking today.
Thank you very much.