
Vladimir Vapnik: Statistical Learning | Lex Fridman Podcast #5



link |
00:00:00.000
The following is a conversation with Vladimir Vapnik.
link |
00:00:03.040
He's the co-inventor of support vector machines,
link |
00:00:05.280
Support Vector Clustering, VC Theory,
link |
00:00:07.920
and many foundational ideas in statistical learning.
link |
00:00:11.200
He was born in the Soviet Union and worked
link |
00:00:13.640
at the Institute of Control Sciences in Moscow.
link |
00:00:16.320
Then in the United States, he worked at AT&T, NEC Labs,
link |
00:00:20.640
Facebook Research, and now as a professor at Columbia
link |
00:00:24.280
University.
link |
00:00:25.960
His work has been cited over 170,000 times.
link |
00:00:30.320
He has some very interesting ideas
link |
00:00:31.840
about artificial intelligence and the nature of learning,
link |
00:00:34.800
especially on the limits of our current approaches
link |
00:00:37.600
and the open problems in the field.
link |
00:00:40.440
This conversation is part of MIT course
link |
00:00:42.520
on artificial general intelligence
link |
00:00:44.440
and the Artificial Intelligence Podcast.
link |
00:00:46.840
If you enjoy it, please subscribe on YouTube
link |
00:00:49.600
or rate it on iTunes or your podcast provider of choice
link |
00:00:53.040
or simply connect with me on Twitter
link |
00:00:55.320
or other social networks at Lex Fridman, spelled F R I D.
link |
00:01:00.200
And now here's my conversation with Vladimir Vapnik.
link |
00:01:04.800
Einstein famously said that God doesn't play dice.
link |
00:01:08.840
Yeah.
link |
00:01:10.000
You have studied the world through the eyes of statistics.
link |
00:01:12.880
So let me ask you, in terms of the nature of reality,
link |
00:01:17.320
fundamental nature of reality, does God play dice?
link |
00:01:21.360
We don't know some factors, and because we
link |
00:01:26.200
don't know some factors, which could be important,
link |
00:01:30.520
it looks like God plays dice, but we should describe it.
link |
00:01:38.000
In philosophy, they distinguish between two positions,
link |
00:01:42.080
positions of instrumentalism, where
link |
00:01:45.480
you're creating theory for prediction
link |
00:01:48.720
and position of realism, where you're
link |
00:01:51.400
trying to understand God's thought.
link |
00:01:54.640
Can you describe instrumentalism and realism
link |
00:01:56.800
a little bit?
link |
00:01:58.400
For example, if you have some mechanical laws, what is that?
link |
00:02:06.320
Is it law which is true always and everywhere?
link |
00:02:11.480
Or it is law which allows you to predict
link |
00:02:14.880
the position of a moving element? What do you believe?
link |
00:02:22.920
You believe that it is God's law, that God created the world,
link |
00:02:28.480
which obeys this physical law,
link |
00:02:33.160
or it is just law for predictions?
link |
00:02:36.240
And which one is instrumentalism?
link |
00:02:38.400
For predictions.
link |
00:02:39.880
If you believe that this is law of God, and it's always
link |
00:02:45.400
true everywhere, that means that you're a realist.
link |
00:02:50.040
So you're trying to really understand God's thought.
link |
00:02:55.480
So the way you see the world as an instrumentalist?
link |
00:03:00.040
You know, I'm working for some models,
link |
00:03:03.240
model of machine learning.
link |
00:03:06.960
So in this model, we consider a setting,
link |
00:03:12.760
and we try to solve, resolve the setting,
link |
00:03:16.440
to solve the problem.
link |
00:03:18.240
And you can do it in two different ways,
link |
00:03:20.760
from the point of view of instrumentalists.
link |
00:03:23.840
And that's what everybody does now,
link |
00:03:27.120
because they say that the goal of machine learning
link |
00:03:31.560
is to find the rule for classification.
link |
00:03:36.800
That is true, but it is an instrument for prediction.
link |
00:03:40.920
But I can say the goal of machine learning
link |
00:03:46.160
is to learn about conditional probability.
link |
00:03:50.040
So how does God play dice, and does He play?
link |
00:03:54.440
What is probability for one?
link |
00:03:55.960
What is probability for another given situation?
link |
00:03:59.960
But for prediction, I don't need this.
link |
00:04:02.600
I need the rule.
link |
00:04:04.240
But for understanding, I need conditional probability.
link |
00:04:08.480
So let me just step back a little bit first to talk about.
link |
00:04:11.800
You mentioned, which I read last night,
link |
00:04:13.960
the parts of the 1960 paper by Eugene Wigner,
link |
00:04:21.280
unreasonable effectiveness of mathematics
link |
00:04:23.520
in the natural sciences.
link |
00:04:24.880
Such a beautiful paper, by the way.
link |
00:04:29.400
It made me feel, to be honest, to confess my own work
link |
00:04:34.480
in the past few years on deep learning, heavily applied.
link |
00:04:38.400
It made me feel that I was missing out
link |
00:04:40.320
on some of the beauty of nature in the way
link |
00:04:43.960
that math can uncover.
link |
00:04:45.560
So let me just step away from the poetry of that for a second.
link |
00:04:50.360
How do you see the role of math in your life?
link |
00:04:53.040
Is it a tool?
link |
00:04:54.080
Is it poetry?
link |
00:04:55.840
Where does it sit?
link |
00:04:56.960
And does math for you have limits of what it can describe?
link |
00:05:01.400
Some people say that math is a language which uses God.
link |
00:05:08.280
So I believe in that.
link |
00:05:10.280
Speak to God or use God.
link |
00:05:12.000
Or use God.
link |
00:05:12.760
Use God.
link |
00:05:14.080
Yeah.
link |
00:05:15.560
So I believe that this article about unreasonable
link |
00:05:25.680
effectiveness of math is that if you're
link |
00:05:29.960
looking in mathematical structures,
link |
00:05:33.960
they know something about reality.
link |
00:05:37.720
And most scientists from the natural sciences,
link |
00:05:42.480
they're looking at equations and trying to understand reality.
link |
00:05:48.440
So the same in machine learning.
link |
00:05:51.280
If you try very carefully to look at all the equations
link |
00:05:57.560
which define conditional probability,
link |
00:06:00.640
you can understand something about reality more
link |
00:06:05.680
than from your fantasy.
link |
00:06:08.160
So math can reveal the simple underlying principles
link |
00:06:12.480
of reality, perhaps.
link |
00:06:13.880
You know, what means simple?
link |
00:06:16.880
It is very hard to discover them.
link |
00:06:20.320
But then when you discover them and look at them,
link |
00:06:23.800
you see how beautiful they are.
link |
00:06:27.440
And it is surprising why people did not see that before.
link |
00:06:33.560
You're looking on equation and derive it from equations.
link |
00:06:37.480
For example, I talked yesterday about the least squares method.
link |
00:06:43.360
And people had a lot of fantasy about how to improve the least squares method.
link |
00:06:48.120
But if you're going step by step by solving some equations,
link |
00:06:52.360
you suddenly will get some term which,
link |
00:06:57.680
after thinking, you understand that it described
link |
00:07:01.040
position of observation point.
link |
00:07:04.360
In the least squares method, we throw out a lot of information.
link |
00:07:08.240
We don't look at the positions of the points of observation.
link |
00:07:11.760
We're looking only on residuals.
link |
00:07:14.600
But when you understood that, that's a very simple idea.
link |
00:07:19.400
But it's not too simple to understand.
link |
00:07:22.320
And you can derive this just from equations.
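
A minimal sketch of the least squares discussion above, in Python: ordinary least squares via the normal equations, together with the hat-matrix leverages, a standard quantity that depends only on the positions of the observation points rather than on the residuals. Whether this is exactly the term Vapnik derives is an assumption; the sketch only illustrates the distinction being drawn.

import numpy as np

def least_squares_with_leverage(X, y):
    # ordinary least squares via the normal equations
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y          # fitted coefficients
    residuals = y - X @ beta          # all that classical least squares looks at
    H = X @ XtX_inv @ X.T             # hat (projection) matrix
    leverages = np.diag(H)            # depends only on positions of the observation points
    return beta, residuals, leverages

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=20)
beta, res, lev = least_squares_with_leverage(X, y)
print(beta, lev[:3])
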
link |
00:07:25.680
So some simple algebra, a few steps
link |
00:07:28.120
will take you to something surprising
link |
00:07:31.040
that when you think about, you understand.
link |
00:07:34.360
And that is proof that human intuition is not so rich
link |
00:07:41.120
and is very primitive.
link |
00:07:42.640
And it does not see very simple situations.
link |
00:07:48.520
So let me take a step back in general.
link |
00:07:51.760
Yes, right?
link |
00:07:54.480
But what about human intuition and ingenuity?
link |
00:08:01.600
Moments of brilliance.
link |
00:08:02.960
So do you have to be so hard on human intuition?
link |
00:08:09.480
Are there moments of brilliance in human intuition?
link |
00:08:11.840
that can leap ahead of math, and then the math will catch up?
link |
00:08:17.520
I don't think so.
link |
00:08:19.400
I think that the best human intuition,
link |
00:08:23.560
is put into the axioms.
link |
00:08:26.440
And then it is technical.
link |
00:08:28.600
See where the axioms take you.
link |
00:08:31.880
But they have to take the axioms correctly,
link |
00:08:34.920
and the axioms are polished during generations of scientists.
link |
00:08:41.400
And this is integral wisdom.
link |
00:08:45.040
So that's beautifully put.
link |
00:08:47.480
But if you maybe look at when you think of Einstein
link |
00:08:54.040
and special relativity, what is the role of imagination
link |
00:08:58.960
coming first there in the moment of discovery of an idea?
link |
00:09:04.480
So there is obviously a mix of math
link |
00:09:06.440
and out of the box imagination there.
link |
00:09:10.800
That I don't know.
link |
00:09:12.600
Whatever I did, I exclude any imagination.
link |
00:09:18.080
Because whatever I saw in machine learning that
link |
00:09:21.080
come from imagination, like features, like deep learning,
link |
00:09:26.440
they are not relevant to the problem.
link |
00:09:29.320
When you're looking very carefully
link |
00:09:31.960
for mathematical equations, you're
link |
00:09:34.280
deriving a very simple theory, which goes far
link |
00:09:38.000
beyond, theoretically, whatever people can imagine.
link |
00:09:42.040
Because it is not good fantasy.
link |
00:09:44.760
It is just interpretation.
link |
00:09:46.720
It is just fantasy.
link |
00:09:48.000
But it is not what you need.
link |
00:09:51.320
You don't need any imagination to derive, say,
link |
00:09:56.960
main principle of machine learning.
link |
00:10:00.040
When you think about learning and intelligence,
link |
00:10:02.760
maybe thinking about the human brain
link |
00:10:04.560
and trying to describe mathematically the process of learning
link |
00:10:09.200
that is something like what happens in the human brain,
link |
00:10:13.160
do you think we have the tools currently?
link |
00:10:17.200
Do you think we will ever have the tools
link |
00:10:19.000
to try to describe that process of learning?
link |
00:10:22.680
It is not description of what's going on.
link |
00:10:25.800
It is interpretation.
link |
00:10:27.360
It is your interpretation.
link |
00:10:29.400
Your vision can be wrong.
link |
00:10:32.080
You know, when the guy invented the microscope,
link |
00:10:36.160
Leeuwenhoek, for the first time, only he had this instrument,
link |
00:10:40.560
and nobody else; he kept the microscope secret.
link |
00:10:45.440
But he wrote reports in London Academy of Science.
link |
00:10:49.080
In his report, when he looked into the blood,
link |
00:10:52.040
he looked everywhere, on the water, on the blood,
link |
00:10:54.480
on the spin.
link |
00:10:56.320
But he described blood like a fight between a queen and a king.
link |
00:11:04.040
So he saw blood cells, red cells,
link |
00:11:08.120
and he imagined that it was armies fighting each other.
link |
00:11:12.400
And it was his interpretation of situation.
link |
00:11:16.960
And he sent this report in Academy of Science.
link |
00:11:19.760
They very carefully looked because they believed
link |
00:11:22.640
that he is right, he saw something.
link |
00:11:25.160
But he gave wrong interpretation.
link |
00:11:28.240
And I believe the same can happen with brain.
link |
00:11:32.280
Because the most important part, you know,
link |
00:11:35.280
I believe in human language.
link |
00:11:38.840
In some proverbs, there is so much wisdom.
link |
00:11:43.000
For example, people say that one day with a great teacher
link |
00:11:50.240
is better than 1,000 days of diligent study.
link |
00:11:53.960
But if I ask you what the teacher does, nobody knows.
link |
00:11:59.480
And that is intelligence.
link |
00:12:01.400
And what we know from history, and now from math
link |
00:12:07.320
and machine learning, is that a teacher can do a lot.
link |
00:12:12.080
So what, from a mathematical point of view,
link |
00:12:14.400
is the great teacher?
link |
00:12:16.080
I don't know.
link |
00:12:17.240
That's an awful question.
link |
00:12:18.880
Now, what we can say is what a teacher can do:
link |
00:12:25.120
he can introduce some invariants, some predicates
link |
00:12:29.440
for creating invariants.
link |
00:12:32.280
How does he do it?
link |
00:12:33.520
I don't know.
link |
00:12:34.080
Because the teacher knows reality and can describe
link |
00:12:37.560
from this reality a predicate, an invariant.
link |
00:12:41.200
But he knows that when you're using an invariant,
link |
00:12:43.480
he can decrease the number of observations 100 times.
link |
00:12:47.960
But maybe try to pull that apart a little bit.
link |
00:12:52.960
I think you mentioned a piano teacher saying to the student,
link |
00:12:58.120
play like a butterfly.
link |
00:12:59.880
I played piano, I played guitar for a long time.
link |
00:13:03.720
Yeah, maybe it's romantic, poetic.
link |
00:13:09.800
But it feels like there's a lot of truth in that statement.
link |
00:13:13.160
There is a lot of instruction in that statement.
link |
00:13:15.440
And so can you pull that apart?
link |
00:13:17.320
What is that?
link |
00:13:19.760
The language itself may not contain this information.
link |
00:13:22.520
It's not blah, blah, blah.
link |
00:13:24.160
It does not blah, blah, blah, yeah.
link |
00:13:25.640
It affects you.
link |
00:13:26.960
It's what?
link |
00:13:27.600
It affects you.
link |
00:13:28.600
It affects your playing.
link |
00:13:29.800
Yes, it does.
link |
00:13:30.640
But it's not the language.
link |
00:13:33.640
It feels like what is the information being exchanged there?
link |
00:13:38.000
What is the nature of information?
link |
00:13:39.760
What is the representation of that information?
link |
00:13:41.880
I believe that it is sort of predicate.
link |
00:13:44.000
But I don't know.
link |
00:13:45.400
That is exactly what intelligence in machine learning
link |
00:13:48.880
should be.
link |
00:13:50.080
Because the rest is just mathematical technique.
link |
00:13:53.200
I think that what was discovered recently
link |
00:13:57.920
is that there are two mechanisms of learning.
link |
00:14:03.280
One is called the strong convergence mechanism
link |
00:14:06.040
and weak convergence mechanism.
link |
00:14:08.560
Before, people use only one convergence.
link |
00:14:11.200
In weak convergence mechanism, you can use predicate.
link |
00:14:15.840
That's what 'play like a butterfly' is.
link |
00:14:19.360
And it will immediately affect your playing.
link |
00:14:23.640
You know, there is English proverb.
link |
00:14:26.360
Great.
link |
00:14:27.320
If it looks like a duck, swims like a duck,
link |
00:14:31.680
and quacks like a duck, then it is probably a duck.
link |
00:14:35.200
Yes.
link |
00:14:36.240
But this is exactly about predicates.
link |
00:14:40.400
Looks like a duck: what does it mean?
link |
00:14:42.920
So you saw many ducks; that is your training data.
link |
00:14:46.720
So you have a description of how ducks look in general.
link |
00:14:56.480
Yeah, the visual characteristics of a duck.
link |
00:14:59.360
Yeah, but you won't.
link |
00:15:00.840
And you have a model for the recognition of ducks.
link |
00:15:04.200
So you would like that theoretical description
link |
00:15:07.880
from the model to coincide with the empirical description, which
link |
00:15:12.720
you saw in the training data.
link |
00:15:14.520
So about looks like a duck, it is general.
link |
00:15:18.440
But what about swims like a duck?
link |
00:15:21.480
You should know that duck swims.
link |
00:15:23.560
You can say it plays chess like a duck, OK?
link |
00:15:26.960
Duck doesn't play chess.
link |
00:15:28.880
And it is completely legal predicate, but it is useless.
link |
00:15:35.560
So the teacher can recognize which predicates are not useless.
link |
00:15:41.040
So up to now, we don't use this predicate
link |
00:15:44.640
in existing machine learning.
link |
00:15:46.680
And you think that's not so useful?
link |
00:15:47.200
So why do we need billions of data?
link |
00:15:50.600
But in this English proverb, they use only three predicates.
link |
00:15:55.560
Looks like a duck, swims like a duck, and quack like a duck.
link |
00:15:59.080
So you can't deny the fact that swims like a duck
link |
00:16:02.040
and quacks like a duck has humor in it, has ambiguity.
link |
00:16:08.520
Let's talk about swim like a duck.
link |
00:16:12.600
It does not say jumps like a duck.
link |
00:16:16.520
Why?
link |
00:16:17.680
Because it's not relevant.
link |
00:16:20.760
But that means that you know ducks, you know different birds,
link |
00:16:25.880
you know animals.
link |
00:16:27.600
And you derive from this that it is relevant to say swim like a duck.
link |
00:16:32.440
So underneath, in order for us to understand swims like a duck,
link |
00:16:36.680
it feels like we need to know millions of other little pieces
link |
00:16:41.200
of information.
link |
00:16:43.000
We pick up along the way.
link |
00:16:44.280
You don't think so.
link |
00:16:45.120
There doesn't need to be this knowledge base.
link |
00:16:48.480
Those statements carry some rich information
link |
00:16:52.600
that helps us understand the essence of a duck.
link |
00:16:57.280
How far are we from integrating predicates?
link |
00:17:01.920
You know, when you consider the complete theory of
link |
00:17:06.000
machine learning, what it does is this:
link |
00:17:09.320
you have a lot of functions.
link |
00:17:12.400
And then you say: it looks like a duck.
link |
00:17:17.480
You see your training data.
link |
00:17:20.720
From the training data, you recognize how the expected duck should look.
link |
00:17:31.040
Then you remove all functions which do not look like you think
link |
00:17:37.640
it should look from training data.
link |
00:17:40.080
So you decrease the set of functions from which you pick up one.
link |
00:17:45.800
Then you give a second predicate.
link |
00:17:48.320
And then, again, decrease the set of functions.
link |
00:17:51.840
And after that, you pick up the best function you can find.
link |
00:17:55.800
It is standard machine learning.
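
A toy sketch of the procedure just described, with hypothetical data, predicates, and candidate functions (none of this is Vapnik's own construction): keep only the candidate functions consistent with each predicate, then pick the best of what remains on the training data.

import numpy as np

# hypothetical training data: two features per example, duck / not-duck labels
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]])
y = np.array([1, 1, 0, 0])

# a pool of candidate functions: simple threshold rules on one feature
candidates = [(lambda x, j=j, t=t: (x[:, j] > t).astype(int))
              for j in range(2) for t in np.linspace(0.0, 1.0, 11)]

looks_like = X[:, 0]   # hypothetical "looks like a duck" score per example
swims_like = X[:, 1]   # hypothetical "swims like a duck" score per example

def consistent(f, predicate_values, tol=0.25):
    # keep f if the average predicate value over the points f calls "duck"
    # roughly matches its average over the true ducks in the training data
    pred = f(X)
    if pred.sum() == 0:
        return False
    return abs(predicate_values[pred == 1].mean()
               - predicate_values[y == 1].mean()) < tol

admissible = [f for f in candidates if consistent(f, looks_like)]   # first predicate
admissible = [f for f in admissible if consistent(f, swims_like)]   # second predicate

best = min(admissible, key=lambda f: np.mean(f(X) != y))            # pick the best remaining
print(len(candidates), "->", len(admissible), "training error:", np.mean(best(X) != y))
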
link |
00:17:58.120
So why do you not need too many examples?
link |
00:18:03.280
Because your predicates aren't very good, or you're not.
link |
00:18:06.600
That means that the predicates are very good.
link |
00:18:09.200
Because every predicate is invented
link |
00:18:12.520
to decrease the admissible set of functions.
link |
00:18:17.720
So you talk about admissible set of functions,
link |
00:18:20.320
and you talk about good functions.
link |
00:18:22.440
So what makes a good function?
link |
00:18:24.280
So an admissible set of functions is a set of functions
link |
00:18:28.600
which has small capacity, or small diversity,
link |
00:18:32.760
small VC dimension, for example, which contains a good function.
link |
00:18:36.960
So by the way, for people who don't know,
link |
00:18:38.760
VC, you're the V in the VC.
link |
00:18:42.440
So how would you describe to a lay person what VC theory is?
link |
00:18:50.440
How would you describe VC?
link |
00:18:51.440
So when you have a machine, so a machine
link |
00:18:56.480
capable of picking up one function from the admissible set
link |
00:19:00.240
of function.
link |
00:19:02.520
But the set of admissible functions can be big.
link |
00:19:07.640
It can contain all continuous functions, and that's useless.
link |
00:19:11.600
You don't have so many examples to pick up function.
link |
00:19:15.280
But it can be small.
link |
00:19:17.280
Small, we call it capacity, but maybe better called diversity.
link |
00:19:24.560
So the functions in the set are not very different;
link |
00:19:27.160
it is an infinite set of functions, but not very diverse.
link |
00:19:31.280
So it is small VC dimension.
link |
00:19:34.280
When VC dimension is small, you need small amount
link |
00:19:39.360
of training data.
link |
00:19:41.760
So the goal is to create admissible set of functions
link |
00:19:47.360
which has small VC dimension and contains a good function.
link |
00:19:53.200
Then you will be able to pick up the function
link |
00:19:58.160
using small amount of observations.
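
One commonly quoted form of the VC bound makes this concrete: with probability at least 1 - eta, the true error exceeds the training error by at most a confidence term that grows with the VC dimension h and shrinks with the number of observations n. A small sketch, assuming that textbook form of the bound:

from math import log, sqrt

def vc_confidence_term(h, n, eta=0.05):
    # confidence term of the classical VC bound:
    # R(f) <= R_emp(f) + sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n)
    return sqrt((h * (log(2 * n / h) + 1) - log(eta / 4)) / n)

# a small admissible set (small h) reaches the same guarantee
# with far fewer observations than a very diverse one (large h)
for h in (10, 1000):
    for n in (1_000, 100_000):
        print(f"h={h:5d}  n={n:7d}  gap <= {vc_confidence_term(h, n):.3f}")
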
link |
00:20:02.400
So that is the task of learning.
link |
00:20:06.760
It is creating a set of admissible functions
link |
00:20:11.360
that has a small VC dimension.
link |
00:20:13.120
And then you've figured out a clever way of picking up.
link |
00:20:17.320
No, that is goal of learning, which I formulated yesterday.
link |
00:20:22.440
Statistical learning theory does not
link |
00:20:25.760
involve creating the admissible set of functions.
link |
00:20:30.360
In classical learning theory, everywhere, in 100% of textbooks,
link |
00:20:35.520
the admissible set of functions is given.
link |
00:20:39.200
But this is science about nothing,
link |
00:20:41.760
because the most difficult problem
link |
00:20:44.040
is to create the admissible set of functions: given, say,
link |
00:20:50.120
a lot of functions, continuum set of functions,
link |
00:20:53.080
create admissible set of functions,
link |
00:20:54.960
that means that it has finite VC dimension,
link |
00:20:58.760
small VC dimension, and contain good function.
link |
00:21:02.280
So this was out of consideration.
link |
00:21:05.280
So what's the process of doing that?
link |
00:21:07.240
I mean, it's fascinating.
link |
00:21:08.240
What is the process of creating this admissible set of functions?
link |
00:21:13.200
That is invariant.
link |
00:21:14.920
That's invariance.
link |
00:21:15.760
Can you describe invariance?
link |
00:21:17.280
Yeah, you're looking at properties of the training data.
link |
00:21:22.440
And properties means that you have some function,
link |
00:21:30.120
and you just count what is the average value of function
link |
00:21:36.520
on training data.
link |
00:21:38.960
You have a model, and what is the expectation
link |
00:21:43.040
of this function on the model.
link |
00:21:44.960
And they should coincide.
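
One way to write that coincidence condition, using notation assumed here rather than taken from the conversation: for a predicate psi, the model f and the l training pairs (x_i, y_i) should satisfy

\[
\frac{1}{\ell}\sum_{i=1}^{\ell}\psi(x_i)\,f(x_i)\;\approx\;\frac{1}{\ell}\sum_{i=1}^{\ell}\psi(x_i)\,y_i .
\]

Each such equation is one invariant; it cuts down the admissible set of functions without requiring more observations.
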
link |
00:21:46.720
So the problem is about how to pick up functions.
link |
00:21:51.800
It can be any function.
link |
00:21:53.200
In fact, it is true for all functions.
link |
00:21:59.280
But because, when I'm talking about, say,
link |
00:22:05.000
a duck, a duck does not jump, so you don't ask the question, jumps like a duck.
link |
00:22:09.920
Because it is trivial; it does not jump,
link |
00:22:13.360
so it doesn't help you to recognize a duck.
link |
00:22:15.560
But you know something, which question to ask,
link |
00:22:19.000
when you're asking whether it swims like a duck.
link |
00:22:23.840
But looks like a duck, it is general situation.
link |
00:22:26.840
Looks like, say, a guy who has this illness, this disease;
link |
00:22:34.440
it is legal, so there is a general type of predicate
link |
00:22:42.280
looks like, and special type of predicate,
link |
00:22:46.440
which related to this specific problem.
link |
00:22:50.040
And that is intelligence part of all this business.
link |
00:22:53.440
And that is where teachers come in.
link |
00:22:55.440
Incorporating those specialized predicates.
link |
00:22:58.440
What do you think about deep learning as neural networks,
link |
00:23:04.840
these arbitrary architectures as helping accomplish some of the tasks
link |
00:23:11.440
you're thinking about, their effectiveness or lack thereof,
link |
00:23:14.440
what are the weaknesses and what are the possible strengths?
link |
00:23:19.440
You know, I think that this is fantasy.
link |
00:23:22.440
Everything like deep learning, like features.
link |
00:23:28.440
Let me give you this example.
link |
00:23:32.440
One of the greatest books is Churchill's book about the history of the Second World War.
link |
00:23:38.440
And he starts this book by describing that in old times, when a war was over,
link |
00:23:47.440
so the great kings, they gathered together,
link |
00:23:54.440
almost all of them were relatives,
link |
00:23:57.440
and they discussed what should be done, how to create peace.
link |
00:24:02.440
And they came to agreement.
link |
00:24:04.440
And when the First World War happened, the general public came to power.
link |
00:24:13.440
And they were so greedy that they robbed Germany.
link |
00:24:17.440
And it was clear for everybody that it is not peace.
link |
00:24:21.440
That peace will last only 20 years, because they were not professionals.
link |
00:24:28.440
It's the same I see in machine learning.
link |
00:24:31.440
There are mathematicians who are looking for the problem from a very deep point of view,
link |
00:24:37.440
a mathematical point of view.
link |
00:24:39.440
And there are computer scientists who mostly do not know mathematics.
link |
00:24:45.440
They just have interpretation of that.
link |
00:24:48.440
And they invented a lot of blah, blah, blah interpretations like deep learning.
link |
00:24:53.440
Why you need deep learning?
link |
00:24:55.440
Mathematics does not know deep learning.
link |
00:24:57.440
Mathematics does not know neurons.
link |
00:25:00.440
It is just function.
link |
00:25:02.440
If you like to say piecewise linear function, say that,
link |
00:25:06.440
and do it in class of piecewise linear function.
link |
00:25:10.440
But they invent something.
link |
00:25:12.440
And then they try to prove the advantage of that through interpretations,
link |
00:25:20.440
which are mostly wrong.
link |
00:25:22.440
And when that is not enough, they appeal to the brain,
link |
00:25:25.440
which they know nothing about.
link |
00:25:27.440
Nobody knows what's going on in the brain.
link |
00:25:29.440
So I think it is more reliable to look at the math.
link |
00:25:34.440
This is a mathematical problem.
link |
00:25:36.440
Do your best to solve this problem.
link |
00:25:38.440
Try to understand that there is not only one way of convergence,
link |
00:25:43.440
which is strong way of convergence.
link |
00:25:45.440
There is a weak way of convergence, which requires predicate.
link |
00:25:49.440
And if you will go through all this stuff,
link |
00:25:52.440
you will see that you don't need deep learning.
link |
00:25:55.440
Even more, I would say there is one theorem,
link |
00:26:00.440
which is called the representer theorem.
link |
00:26:02.440
It says that the optimal solution of the mathematical problem
link |
00:26:10.440
which describes learning is a shallow network, not deep learning.
link |
00:26:20.440
And a shallow network, yeah.
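
A minimal sketch of what such a shallow solution looks like, assuming kernel ridge regression as a stand-in for the learning problem; for this setting the representer theorem gives the minimizer as a single weighted sum of kernels centered on the training points.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between row sets A and B
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    # solve (K + lam*I) alpha = y; the solution is f(x) = sum_i alpha_i k(x_i, x):
    # one "shallow" layer of kernels over the training points, nothing deeper
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
f = fit_kernel_ridge(X, y)
print(f(np.array([[0.0], [1.5]])))
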
link |
00:26:22.440
The ultimate problem is there.
link |
00:26:24.440
Absolutely.
link |
00:26:25.440
So in the end, what you're saying is exactly right.
link |
00:26:29.440
The question is, you see no value in throwing something on the table,
link |
00:26:35.440
playing with it, not math.
link |
00:26:38.440
It's like with a neural network, where you're throwing something in the bucket,
link |
00:26:41.440
or the biological example and looking at kings and queens
link |
00:26:45.440
or the cells or the microscope.
link |
00:26:47.440
You don't see value in imagining the cells or kings and queens
link |
00:26:52.440
and using that as inspiration and imagination
link |
00:26:56.440
for where the math will eventually lead you.
link |
00:26:59.440
You think that interpretation basically deceives you in a way that's not productive.
link |
00:27:06.440
I think that if you're trying to analyze this business of learning
link |
00:27:14.440
and especially discussion about deep learning,
link |
00:27:18.440
it is discussion about interpretation.
link |
00:27:21.440
It's discussion about things, about what you can say about things.
link |
00:27:26.440
That's right, but aren't you surprised by the beauty of it?
link |
00:27:29.440
Not mathematical beauty, but the fact that it works at all.
link |
00:27:36.440
Or are you criticizing that very beauty,
link |
00:27:39.440
our human desire to interpret,
link |
00:27:45.440
to find our silly interpretations in these constructs?
link |
00:27:49.440
Let me ask you this.
link |
00:27:51.440
Are you surprised?
link |
00:27:55.440
Does it inspire you?
link |
00:27:57.440
How do you feel about the success of a system like AlphaGo
link |
00:28:00.440
at beating the game of Go?
link |
00:28:03.440
Using neural networks to estimate the quality of a board
link |
00:28:09.440
and the quality of the board?
link |
00:28:11.440
That is your interpretation quality of the board.
link |
00:28:14.440
Yes.
link |
00:28:17.440
It's not our interpretation.
link |
00:28:20.440
The fact is, a neural network system doesn't matter.
link |
00:28:23.440
A learning system that we don't mathematically understand
link |
00:28:27.440
that beats the best human player.
link |
00:28:29.440
It does something that was thought impossible.
link |
00:28:31.440
That means that it's not a very difficult problem.
link |
00:28:35.440
We've empirically discovered that this is not a very difficult problem.
link |
00:28:41.440
That's true.
link |
00:28:43.440
Maybe I can't argue.
link |
00:28:49.440
Even more, I would say,
link |
00:28:52.440
that if they use deep learning,
link |
00:28:54.440
it is not the most effective way of learning theory.
link |
00:28:59.440
Usually, when people use deep learning,
link |
00:29:03.440
they're using zillions of training data.
link |
00:29:09.440
But you don't need this.
link |
00:29:13.440
I describe the challenge.
link |
00:29:15.440
Can we do some problems, which the deep learning method does
link |
00:29:22.440
with a deep net, using 100 times less training data?
link |
00:29:27.440
Even more, some problems deep learning cannot solve
link |
00:29:33.440
because it's not necessary.
link |
00:29:37.440
They create admissible set of functions.
link |
00:29:40.440
Deep architecture means to create admissible set of functions.
link |
00:29:45.440
You cannot say that you're creating good admissible set of functions.
link |
00:29:49.440
It's your fantasy.
link |
00:29:52.440
It does not come from math.
link |
00:29:54.440
But it is possible to create admissible set of functions
link |
00:29:58.440
because you have your training data.
link |
00:30:01.440
Actually, for mathematicians, when you consider invariants,
link |
00:30:08.440
you need to use law of large numbers.
link |
00:30:11.440
When you're making training in existing algorithm,
link |
00:30:17.440
you need uniform law of large numbers,
link |
00:30:20.440
which is much more difficult.
link |
00:30:22.440
VC dimension and all this stuff.
link |
00:30:24.440
Nevertheless, if you use both weak and strong way of convergence,
link |
00:30:32.440
you can decrease the amount of training data a lot.
link |
00:30:34.440
You could do the three: looks like a duck, swims like a duck, and quacks like a duck.
link |
00:30:39.440
Let's step back and think about human intelligence in general.
link |
00:30:47.440
Clearly, that has evolved in a nonmathematical way.
link |
00:30:52.440
As far as we know, God, or whoever,
link |
00:31:00.440
didn't come up with a model in place in our brain of admissible functions.
link |
00:31:05.440
It kind of evolved.
link |
00:31:06.440
I don't know.
link |
00:31:07.440
Maybe you have a view on this.
link |
00:31:08.440
Alan Turing in the 50s in his paper asked and rejected the question,
link |
00:31:15.440
can machines think?
link |
00:31:16.440
It's not a very useful question.
link |
00:31:18.440
But can you briefly entertain this useless question?
link |
00:31:23.440
Can machines think?
link |
00:31:25.440
So talk about intelligence and your view of it.
link |
00:31:28.440
I don't know that.
link |
00:31:29.440
I know that Turing described imitation.
link |
00:31:34.440
If computer can imitate human being, let's call it intelligent.
link |
00:31:41.440
And he understands that it is not thinking computer.
link |
00:31:45.440
Yes.
link |
00:31:46.440
He completely understands what he's doing.
link |
00:31:49.440
But he's set up a problem of imitation.
link |
00:31:53.440
So now we understand that the problem is not in imitation.
link |
00:31:57.440
I'm not sure that intelligence is just inside of us.
link |
00:32:04.440
It may be also outside of us.
link |
00:32:06.440
I have several observations.
link |
00:32:09.440
So when I prove some theorem, it's a very difficult theorem.
link |
00:32:15.440
But in a couple of years, in several places, people proved the same theorem.
link |
00:32:22.440
Say, the Sauer lemma: after us it was done.
link |
00:32:26.440
Then another guy proved the same theorem.
link |
00:32:29.440
In the history of science, it's happened all the time.
link |
00:32:32.440
For example, geometry.
link |
00:32:35.440
It's happened simultaneously.
link |
00:32:37.440
First Lobachevsky did it, and then Gauss and Bolyai and other guys.
link |
00:32:43.440
It happened simultaneously in 10 years period of time.
link |
00:32:48.440
And I saw a lot of examples like that.
link |
00:32:51.440
And many mathematicians think that when they develop something,
link |
00:32:56.440
they develop something in general which affects everybody.
link |
00:33:01.440
So maybe our model that intelligence is only inside of us is incorrect.
link |
00:33:07.440
It's our interpretation.
link |
00:33:09.440
Maybe there exists some connection with world intelligence.
link |
00:33:15.440
I don't know.
link |
00:33:16.440
You're almost like plugging in into...
link |
00:33:19.440
Yeah, exactly.
link |
00:33:20.440
...and contributing to this...
link |
00:33:22.440
Into a big network.
link |
00:33:23.440
...into a big, maybe in your own network.
link |
00:33:26.440
No, no, no.
link |
00:33:27.440
On the flip side of that, maybe you can comment on big O complexity
link |
00:33:34.440
and how you see classifying algorithms by worst case running time
link |
00:33:40.440
in relation to their input.
link |
00:33:42.440
So that way of thinking about functions.
link |
00:33:45.440
Do you think P equals NP?
link |
00:33:47.440
Do you think that's an interesting question?
link |
00:33:49.440
Yeah, it is an interesting question.
link |
00:33:51.440
But let me talk about complexity and about worst case scenario.
link |
00:34:01.440
There is a mathematical setting.
link |
00:34:03.440
When I came to the United States in 1990,
link |
00:34:07.440
people did not know this theory.
link |
00:34:09.440
They did not know statistical learning theory.
link |
00:34:12.440
So in Russia it was published in two monographs,
link |
00:34:17.440
but in America they didn't know.
link |
00:34:19.440
Then they learned.
link |
00:34:22.440
And somebody told me that this is worst case theory,
link |
00:34:25.440
and they will create real case theory,
link |
00:34:27.440
but till now it did not.
link |
00:34:30.440
Because it is a mathematical tool.
link |
00:34:33.440
You can do only what you can do using mathematics,
link |
00:34:38.440
which has a clear understanding and clear description.
link |
00:34:45.440
And for this reason we introduced complexity.
link |
00:34:52.440
And we need this.
link |
00:34:54.440
Because actually it is diversity; I like this one more.
link |
00:35:01.440
With this dimension, you can prove some theorems.
link |
00:35:04.440
But we also create theory for case when you know probability measure.
link |
00:35:12.440
And that is the best case which can happen.
link |
00:35:14.440
It is entropy theory.
link |
00:35:17.440
So from a mathematical point of view,
link |
00:35:20.440
you know the best possible case and the worst possible case.
link |
00:35:25.440
You can derive different models in the middle.
link |
00:35:28.440
But it's not so interesting.
link |
00:35:30.440
You think the edges are interesting?
link |
00:35:33.440
The edges are interesting.
link |
00:35:35.440
Because it is not so easy to get a good bound, exact bound.
link |
00:35:44.440
It's not many cases where you have.
link |
00:35:47.440
The bound is not exact.
link |
00:35:49.440
But there are interesting principles which the math discovers.
link |
00:35:54.440
Do you think it's interesting because it's challenging
link |
00:35:57.440
and reveals interesting principles that allow you to get those bounds?
link |
00:36:02.440
Or do you think it's interesting because it's actually very useful
link |
00:36:05.440
for understanding the essence of a function of an algorithm?
link |
00:36:10.440
So it's like me judging your life as a human being
link |
00:36:15.440
by the worst thing you did and the best thing you did
link |
00:36:19.440
versus all the stuff in the middle.
link |
00:36:21.440
It seems not productive.
link |
00:36:25.440
I don't think so because you cannot describe situation in the middle.
link |
00:36:31.440
Or it will be not general.
link |
00:36:34.440
So you can describe edge cases.
link |
00:36:38.440
And it is clear it has some model.
link |
00:36:41.440
But you cannot describe model for every new case.
link |
00:36:47.440
So you will be never accurate when you're using model.
link |
00:36:53.440
But from a statistical point of view,
link |
00:36:55.440
the way you've studied functions and the nature of learning
link |
00:37:00.440
and the world, don't you think that the real world has a very long tail
link |
00:37:07.440
that the edge cases are very far away from the mean,
link |
00:37:13.440
the stuff in the middle, or no?
link |
00:37:19.440
I don't know that.
link |
00:37:21.440
I think that from my point of view,
link |
00:37:29.440
if you will use formal statistic, uniform law of large numbers,
link |
00:37:39.440
if you will use this invariance business,
link |
00:37:47.440
you will need just law of large numbers.
link |
00:37:51.440
And there's a huge difference between uniform law of large numbers
link |
00:37:55.440
and the plain law of large numbers.
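
In symbols (notation assumed here): the plain law of large numbers is a statement about one fixed function, while the uniform law must hold over the whole admissible set at once, which is where the VC dimension enters.

\[
\frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) \;\longrightarrow\; \mathbb{E}\,f(x)
\qquad \text{(law of large numbers, one fixed } f\text{)},
\]
\[
\sup_{f \in \mathcal{F}}\left|\frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i) - \mathbb{E}\,f(x)\right| \;\longrightarrow\; 0
\qquad \text{(uniform law over } \mathcal{F}\text{; holds when } \mathcal{F}\text{ has finite VC dimension)}.
\]
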
link |
00:37:57.440
Can you describe that a little more?
link |
00:37:59.440
Or should we just take it to...
link |
00:38:01.440
No, for example, when I'm talking about duck,
link |
00:38:05.440
I gave three predicates and it was enough.
link |
00:38:09.440
But if you try to do it by formal distinction,
link |
00:38:14.440
you will need a lot of observations.
link |
00:38:17.440
I got you.
link |
00:38:19.440
And so that means that information about looks like a duck
link |
00:38:24.440
contain a lot of bits of information,
link |
00:38:27.440
formal bits of information.
link |
00:38:29.440
So we don't know how many bits of information
link |
00:38:35.440
are contained in things from artificial intelligence.
link |
00:38:39.440
And that is the subject of analysis.
link |
00:38:42.440
Till now, in this whole business,
link |
00:38:47.440
I don't like how people consider artificial intelligence.
link |
00:38:54.440
They consider it as some code which imitates the activity of a human being.
link |
00:39:00.440
It is not science.
link |
00:39:02.440
It is applications.
link |
00:39:04.440
You would like to imitate God.
link |
00:39:06.440
It is very useful, and we have a good problem.
link |
00:39:09.440
But you need to learn something more.
link |
00:39:15.440
How can people develop predicates,
link |
00:39:23.440
swims like a duck,
link |
00:39:25.440
or play like butterfly or something like that.
link |
00:39:28.440
But the teacher does not tell you how it came into his mind,
link |
00:39:33.440
how he chose this image.
link |
00:39:36.440
That is problem of intelligence.
link |
00:39:39.440
That is the problem of intelligence.
link |
00:39:41.440
And you see that connected to the problem of learning?
link |
00:39:44.440
Absolutely.
link |
00:39:45.440
Because you immediately give this predicate
link |
00:39:48.440
like specific predicate, swims like a duck,
link |
00:39:52.440
or quack like a duck.
link |
00:39:54.440
It was chosen somehow.
link |
00:39:57.440
So what is the line of work, would you say?
link |
00:40:00.440
If you were to formulate as a set of open problems,
link |
00:40:05.440
that will take us there.
link |
00:40:07.440
Play like a butterfly.
link |
00:40:09.440
We will get a system to be able to...
link |
00:40:11.440
Let's separate two stories.
link |
00:40:13.440
One mathematical story.
link |
00:40:15.440
That if you have predicate, you can do something.
link |
00:40:19.440
And another story is how you get the predicate.
link |
00:40:22.440
It is intelligence problem.
link |
00:40:26.440
And people have not even started understanding intelligence.
link |
00:40:31.440
Because to understand intelligence, first of all,
link |
00:40:34.440
try to understand what teachers are doing.
link |
00:40:37.440
How teacher teach.
link |
00:40:40.440
Why is one teacher better than another one?
link |
00:40:43.440
Yeah.
link |
00:40:44.440
So you think we really even haven't started on the journey
link |
00:40:48.440
of generating the predicate?
link |
00:40:50.440
No.
link |
00:40:51.440
We don't understand.
link |
00:40:52.440
We even don't understand that this problem exists.
link |
00:40:56.440
Because did you hear?
link |
00:40:58.440
You do.
link |
00:40:59.440
No, I just know name.
link |
00:41:02.440
I want to understand why one teacher is better than another.
link |
00:41:07.440
And how the teacher affects the student.
link |
00:41:12.440
It is not because he is repeating the problem which is in the textbook.
link |
00:41:17.440
Yes.
link |
00:41:18.440
He make some remarks.
link |
00:41:20.440
He make some philosophy of reasoning.
link |
00:41:23.440
Yeah, that's a beautiful...
link |
00:41:24.440
So it is a formulation of a question that is the open problem.
link |
00:41:31.440
Why is one teacher better than another?
link |
00:41:33.440
Right.
link |
00:41:34.440
What he does better.
link |
00:41:37.440
Yeah.
link |
00:41:38.440
What...
link |
00:41:39.440
Why at every level?
link |
00:41:42.440
How do they get better?
link |
00:41:44.440
What does it mean to be better?
link |
00:41:47.440
The whole...
link |
00:41:49.440
Yeah.
link |
00:41:50.440
From whatever model I have.
link |
00:41:53.440
One teacher can give a very good predicate.
link |
00:41:56.440
One teacher can say swims like a dog.
link |
00:42:00.440
And another can say jump like a dog.
link |
00:42:03.440
And jump like a dog
link |
00:42:05.440
carries zero information.
link |
00:42:07.440
Yeah.
link |
00:42:08.440
So what is the most exciting problem in statistical learning you've ever worked on?
link |
00:42:13.440
Or are working on now?
link |
00:42:16.440
I just finished this invariant story.
link |
00:42:22.440
And I'm happy that...
link |
00:42:24.440
I believe that it is ultimate learning story.
link |
00:42:30.440
At least I can show that there is no other mechanism; there are only two mechanisms.
link |
00:42:37.440
But they separate statistical part from intelligent part.
link |
00:42:44.440
And I know nothing about intelligent part.
link |
00:42:48.440
And if we will know this intelligent part,
link |
00:42:52.440
so it will help us a lot in teaching, in learning.
link |
00:42:59.440
Will we know it when we see it?
link |
00:43:02.440
So for example, in my talk, the last slide was the challenge.
link |
00:43:06.440
So you have, say, the MNIST digit recognition problem.
link |
00:43:11.440
And deep learning claims that they did it very well.
link |
00:43:16.440
Say 99.5% of correct answers.
link |
00:43:21.440
But they use 60,000 observations.
link |
00:43:24.440
Can you do the same?
link |
00:43:26.440
100 times less.
link |
00:43:29.440
But incorporating invariants.
link |
00:43:31.440
What it means, you know, digit one, two, three.
link |
00:43:34.440
Yeah.
link |
00:43:35.440
Just looking at that.
link |
00:43:37.440
Explain to me which invariant I should keep.
link |
00:43:40.440
To use 100 examples.
link |
00:43:43.440
Or say 100 times less examples to do the same job.
link |
00:43:48.440
Yeah.
link |
00:43:49.440
That last slide, unfortunately, your talk ended quickly.
link |
00:43:55.440
The last slide was a powerful open challenge
link |
00:43:59.440
and a formulation of the essence here.
link |
00:44:02.440
That is the exact problem of intelligence.
link |
00:44:06.440
Because everybody, when machine learning started,
link |
00:44:12.440
it was developed by mathematicians,
link |
00:44:15.440
they immediately recognized that we use much more
link |
00:44:19.440
training data than humans needed.
link |
00:44:22.440
But now again, we came to the same story.
link |
00:44:25.440
We have to decrease it.
link |
00:44:27.440
That is the problem of learning.
link |
00:44:30.440
It is not like in deep learning,
link |
00:44:32.440
they use zillions of training data.
link |
00:44:35.440
Because maybe zillions are not enough
link |
00:44:38.440
if you have a good invariance.
link |
00:44:44.440
Maybe you'll never collect some number of observations.
link |
00:44:49.440
But now it is a question to intelligence.
link |
00:44:53.440
How to do that.
link |
00:44:55.440
Because statistical part is ready.
link |
00:44:58.440
As soon as you supply us with predicate,
link |
00:45:02.440
we can do good job with small amount of observations.
link |
00:45:06.440
And the very first challenge will be digit recognition.
link |
00:45:10.440
And you know digits.
link |
00:45:12.440
And please tell me invariance.
link |
00:45:15.440
I think about that.
link |
00:45:16.440
I can say for digit 3, I would introduce
link |
00:45:20.440
concept of horizontal symmetry.
link |
00:45:24.440
So the digit 3 has horizontal symmetry
link |
00:45:29.440
more than say digit 2 or something like that.
link |
00:45:33.440
But as soon as I get the idea of horizontal symmetry,
link |
00:45:37.440
I can mathematically invent a lot of
link |
00:45:40.440
measure of horizontal symmetry
link |
00:45:43.440
or vertical symmetry, or diagonal symmetry,
link |
00:45:46.440
whatever, if I have the idea of symmetry.
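
A small sketch of one such measure, assuming a digit is given as a 2-D grayscale array; the specific formula is illustrative, not the one Vapnik has in mind. It scores how well the image matches its own top-bottom mirror.

import numpy as np

def horizontal_symmetry(img):
    # degree of symmetry about a horizontal axis, in [0, 1]:
    # 1 means the image equals its top-bottom mirror (like an idealized '3')
    img = img.astype(float)
    flipped = np.flipud(img)                 # mirror across the horizontal axis
    overlap = np.minimum(img, flipped).sum()
    total = np.maximum(img, flipped).sum()
    return overlap / total if total > 0 else 1.0

# hypothetical usage: a '3'-like glyph should score higher than a '2'-like one
three = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 1],
                  [0, 1, 1]])
print(horizontal_symmetry(three))            # 1.0: the glyph mirrors onto itself
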
link |
00:45:49.440
But what else?
link |
00:45:52.440
Looking at a digit, I see that there is a metapredicate,
link |
00:46:00.440
which is not shape.
link |
00:46:04.440
It is something like symmetry,
link |
00:46:07.440
like how dark is whole picture, something like that.
link |
00:46:12.440
Which can itself give rise to a predicate.
link |
00:46:15.440
You think such a predicate could rise
link |
00:46:18.440
out of something that is not general.
link |
00:46:26.440
Meaning it feels like for me to be able to
link |
00:46:31.440
understand the difference between a 2 and a 3,
link |
00:46:34.440
I would need to have had a childhood
link |
00:46:39.440
of 10 to 15 years playing with kids,
link |
00:46:45.440
going to school, being yelled at by parents.
link |
00:46:50.440
All of that, walking, jumping, looking at ducks.
link |
00:46:55.440
And now then I would be able to generate
link |
00:46:58.440
the right predicate for telling the difference
link |
00:47:01.440
between 2 and a 3.
link |
00:47:03.440
Or do you think there is a more efficient way?
link |
00:47:06.440
I know for sure that you must know
link |
00:47:10.440
something more than digits.
link |
00:47:12.440
That's a powerful statement.
link |
00:47:15.440
But maybe there are several languages
link |
00:47:19.440
of description, these elements of digits.
link |
00:47:24.440
So I'm talking about symmetry,
link |
00:47:27.440
about some properties of geometry,
link |
00:47:30.440
I'm talking about something abstract.
link |
00:47:33.440
But this is a problem of intelligence.
link |
00:47:38.440
So in one of our articles, it is trivial to show
link |
00:47:42.440
that every example can carry
link |
00:47:46.440
not more than one bit of information, in reality.
link |
00:47:49.440
Because when you show example
link |
00:47:54.440
and you say this is one, you can remove, say,
link |
00:47:59.440
a function which does not tell you one, say,
link |
00:48:03.440
the best strategy, if you can do it perfectly,
link |
00:48:06.440
is to remove half of the functions.
link |
00:48:09.440
But when you use one predicate, which looks like a duck,
link |
00:48:14.440
you can remove much more functions than half.
link |
00:48:18.440
And that means that it contains
link |
00:48:20.440
a lot of bits of information from a formal point of view.
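
In this counting picture (notation assumed here), the information carried by one step is the log of how much the admissible set shrinks; halving it is one bit, while a predicate that keeps only a small fraction carries many more:

\[
\text{bits} \;=\; \log_2\frac{|\mathcal{F}_{\text{before}}|}{|\mathcal{F}_{\text{after}}|}:
\quad \text{keeping } 1/2 \text{ of the functions} \Rightarrow 1 \text{ bit},
\quad \text{keeping } 1/1000 \Rightarrow \log_2 1000 \approx 10 \text{ bits}.
\]
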
link |
00:48:25.440
But when you have a general picture
link |
00:48:31.440
of what you want to recognize,
link |
00:48:33.440
a general picture of the world,
link |
00:48:36.440
can you invent this predicate?
link |
00:48:40.440
And that predicate carries a lot of information.
link |
00:48:46.440
Beautifully put, maybe just me,
link |
00:48:49.440
but in all the math you show, in your work,
link |
00:48:53.440
which is some of the most profound mathematical work
link |
00:48:57.440
in the field of learning AI and just math in general.
link |
00:49:01.440
I hear a lot of poetry and philosophy.
link |
00:49:04.440
You really kind of talk about philosophy of science.
link |
00:49:09.440
There's a poetry and music to a lot of the work you're doing
link |
00:49:12.440
and the way you're thinking about it.
link |
00:49:14.440
So where does that come from?
link |
00:49:16.440
Do you escape to poetry? Do you escape to music?
link |
00:49:20.440
I think that there exists ground truth.
link |
00:49:24.440
There exists ground truth?
link |
00:49:26.440
Yeah, and that can be seen everywhere.
link |
00:49:30.440
The smart guy, philosopher,
link |
00:49:32.440
sometimes I am surprised how deep they see.
link |
00:49:38.440
Sometimes I see that some of them are completely out of subject.
link |
00:49:45.440
But the ground truth I see in music.
link |
00:49:50.440
Music is the ground truth?
link |
00:49:52.440
Yeah.
link |
00:49:53.440
And in poetry, many poets, they believe they take dictation.
link |
00:50:01.440
So what piece of music,
link |
00:50:06.440
as a piece of empirical evidence,
link |
00:50:08.440
gave you a sense that they are touching something in the ground truth?
link |
00:50:14.440
It is structure.
link |
00:50:16.440
The structure with the math of music.
link |
00:50:18.440
Because when you're listening to Bach,
link |
00:50:20.440
you see this structure.
link |
00:50:22.440
Very clear, very classic, very simple.
link |
00:50:25.440
And the same in math, when you have axioms in geometry,
link |
00:50:31.440
you have the same feeling.
link |
00:50:33.440
And in poetry, sometimes you see the same.
link |
00:50:36.440
And if you look back at your childhood,
link |
00:50:40.440
you grew up in Russia,
link |
00:50:42.440
you maybe were born as a researcher in Russia,
link |
00:50:46.440
you developed as a researcher in Russia,
link |
00:50:48.440
you came to the United States in a few places.
link |
00:50:51.440
If you look back,
link |
00:50:53.440
what were some of your happiest moments as a researcher?
link |
00:50:59.440
Some of the most profound moments.
link |
00:51:02.440
Not in terms of their impact on society,
link |
00:51:06.440
but in terms of their impact on how damn good you feel that day,
link |
00:51:12.440
and you remember that moment.
link |
00:51:15.440
You know, every time when you found something,
link |
00:51:20.440
it is great.
link |
00:51:22.440
It's a life.
link |
00:51:24.440
Every simple thing.
link |
00:51:26.440
But my general feeling is that most of my time I was wrong.
link |
00:51:32.440
You should go again and again and again
link |
00:51:35.440
and try to be honest in front of yourself.
link |
00:51:39.440
Not to make interpretation,
link |
00:51:41.440
but try to understand that it's related to ground truth.
link |
00:51:46.440
It is not my blah, blah, blah interpretation or something like that.
link |
00:51:52.440
But you're allowed to get excited at the possibility of discovery.
link |
00:51:57.440
You have to double check it, but...
link |
00:52:00.440
No, but how is it related to the ground truth:
link |
00:52:04.440
is it just temporary, or is it forever?
link |
00:52:10.440
You know, you always have a feeling
link |
00:52:13.440
when you found something,
link |
00:52:17.440
how big is that?
link |
00:52:19.440
So, 20 years ago, when we discovered statistical learning,
link |
00:52:23.440
so nobody believed.
link |
00:52:25.440
Except for one guy, Dudley from MIT.
link |
00:52:31.440
And then in 20 years, it became fashion.
link |
00:52:36.440
And the same with support vector machines.
link |
00:52:39.440
That's kernel machines.
link |
00:52:41.440
So with support vector machines and learning theory,
link |
00:52:44.440
when you were working on it,
link |
00:52:48.440
you had a sense that you had a sense of the profundity of it,
link |
00:52:55.440
how this seems to be right.
link |
00:52:59.440
It seems to be powerful.
link |
00:53:01.440
Right, absolutely, immediately.
link |
00:53:04.440
I recognize that it will last forever.
link |
00:53:08.440
And now, when I found this invariance story,
link |
00:53:17.440
I have a feeling that it is complete.
link |
00:53:21.440
Because I have proved that there are no different mechanisms.
link |
00:53:25.440
Some say cosmetic improvement you can do,
link |
00:53:30.440
but in terms of invariance,
link |
00:53:34.440
you need both invariance and statistical learning
link |
00:53:38.440
and they should work together.
link |
00:53:41.440
But also, I'm happy that we can formulate
link |
00:53:47.440
what is intelligence from that
link |
00:53:51.440
and to separate from technical part.
link |
00:53:54.440
And that is completely different.
link |
00:53:56.440
Absolutely.
link |
00:53:58.440
Well, Vladimir, thank you so much for talking today.
link |
00:54:00.440
Thank you.
link |
00:54:01.440
Thank you very much.