What is the role of imagination coming first there in the moment of discovery of an idea?

You don't need any imagination to derive the main principle of machine learning.

the process of learning, that is something like what happens in the human brain.

Do you think we will ever have the tools to try to describe that process of learning?

So, he saw blood cells, red cells, and he imagined that it is army fighting each other.

For example, people say that better than a thousand days of diligent study is one day with a great teacher.

But we know from history and now from math and machine learning that teacher can do a lot.

I don't know because teacher knows reality and can describe from this reality a predicate, invariants.

But he knows that when you're using invariant, you can decrease number of observations hundred times.

I think you mentioned like a piano teacher saying to the student, play like a butterfly.

Yeah, maybe it's romantic, poetic, but it feels like there's a lot of truth in that statement.

I think that what was discovered recently is that there are two mechanisms of learning.

If it looks like a duck, swims like a duck, and quack like a duck, then it is probably duck.

So, you would like so that theoretical description from model coincide with empirical description,

So, you can't deny the fact that swims like a duck and quacks like a duck has humor in it, has ambiguity.

So, underneath, in order for us to understand swims like a duck, it feels like we need to know millions of other little pieces of information.

There doesn't need to be this knowledge base; those statements carry some rich information that helps us understand the essence of duck.

Then you remove all functions which does not look like you think it should look from training data.

That means that predicates are very good because every predicate is invented to decrease admissible set of function.
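
A minimal sketch of how a predicate can shrink a set of candidate functions. Everything here is an illustrative assumption rather than something stated in the conversation: the toy data, the threshold classifiers, the two predicates, the tolerance, and the helper names `make_threshold`, `satisfies_invariant`, and `filter_admissible`. A candidate is kept only if, for each predicate, the predicate-weighted average of its outputs on the training points agrees with the predicate-weighted average of the labels.

```python
def make_threshold(t, sign):
    # Hypothetical candidate: a classifier on the real line that predicts 1
    # on one side of the threshold t (right side if sign > 0, left otherwise).
    return lambda x: int(x > t) if sign > 0 else int(x <= t)


def satisfies_invariant(f, xs, ys, predicate, tol):
    # Compare the predicate-weighted average of the function's outputs
    # with the predicate-weighted average of the observed labels.
    n = len(xs)
    from_function = sum(predicate(x) * f(x) for x in xs) / n
    from_labels = sum(predicate(x) * y for x, y in zip(xs, ys)) / n
    return abs(from_function - from_labels) <= tol


def filter_admissible(candidates, xs, ys, predicates, tol=0.05):
    # Keep only the candidates that agree with the data on every predicate.
    return {
        key: f
        for key, f in candidates.items()
        if all(satisfies_invariant(f, xs, ys, p, tol) for p in predicates)
    }


# Toy training data (assumed for illustration).
xs = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]

# 22 candidate functions: 11 thresholds, each in two orientations.
candidates = {
    (t / 10, sign): make_threshold(t / 10, sign)
    for t in range(11)
    for sign in (1, -1)
}

# Two assumed predicates: a constant one and the coordinate itself.
predicates = [lambda x: 1.0, lambda x: x]

after_first = filter_admissible(candidates, xs, ys, predicates[:1])
admissible = filter_admissible(candidates, xs, ys, predicates)

print(len(candidates), len(after_first), len(admissible))
```

In this toy setup each added predicate removes more candidates, and the two survivors are the thresholds that separate the classes in the right orientation.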

So, you talk about admissible set of functions and you talk about good functions.

So, admissible set of function is set of function which has small capacity or small diversity, small VC dimension, for example.

So, machine capable to pick up one function from the admissible set of function.

So, the goal is to create admissible set of functions which has small VC dimension and contains good function.

Then you will be able to pick up the function using small amount of observations.
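
For reference, the link between small VC dimension and a small number of observations is usually written as a bound of the following form. This is the textbook statement, not a quotation from the conversation, and the symbols are assumed notation: $h$ is the VC dimension of the admissible set, $\ell$ the number of observations, $R(f)$ the expected error, $R_{\mathrm{emp}}(f)$ the training error, and $1-\eta$ the confidence level.

```latex
% Holds with probability at least 1 - \eta, simultaneously for every f in the admissible set.
R(f) \;\le\; R_{\mathrm{emp}}(f)
      + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}
```

The second term shrinks as $h/\ell$ shrinks, so a set with smaller $h$ needs fewer observations for the same guarantee.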

Is creating a set of admissible functions that has a small VC dimension and then you figure out a clever way of picking up?

Statistical learning theory is not involved in creating admissible set of function.

In classical learning theory, everywhere, 100% in textbook, the set of function, admissible set of function is given.

But this is science about nothing because the most difficult problem to create admissible set of functions

given, say, a lot of functions, continuum set of function, create admissible set of functions.

That means that it has finite VC dimension, small VC dimension and contain good function.

Yeah, you're looking of properties of training data and properties means that you have some function

You have model and what is expectation of this function on the model and they should coincide.
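
One way to write this coincidence down, as a sketch in assumed notation: $\psi$ is the predicate, $f$ the model, and $(x_i, y_i)$, $i = 1, \dots, \ell$, the training pairs. None of these symbols appear in the conversation itself.

```latex
% Predicate-weighted average under the model (left) versus in the training data (right).
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
\;\approx\;
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i
```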

But because when we're talking, say, duck does not jumping, so you don't ask question jump like a duck

But you know something, which question to ask and you're asking it seems like a duck,

So, there is a general type of predicate looks like and special type of predicate,

And that is intelligence part of all this business and that where teacher is involved.

What do you think about deep learning as neural networks, these arbitrary architectures

You know, I think that this is fantasy, everything which like deep learning, like features.

And he started this book describing that in old time when war is over, so the great kings,

they gathered together, almost all of them were relatives, and they discussed what should

And it was clear for everybody that it is not peace, that peace will last only 20 years

There are mathematicians who are looking for the problem from a very deep point of view,

If you like to say piecewise linear function, say that and do in class of piecewise linear

And then they try to prove advantage of that through interpretations, which mostly wrong.

And when it's not enough, they appeal to brain, which they know nothing about that.

Try to understand that there is not only one way of convergence, which is strong way of

And if you will go through all this stuff, you will see that you don't need deep learning.

It says that optimal solution of mathematical problem, which is described learning is on

The question is you have no value for throwing something on the table, playing with it, not

It's like a neural network where you said throwing something in the bucket or the biological

You don't see value in imagining the cells or kings and queens and using that as inspiration

You think that interpretation basically deceives you in a way that's not productive.

I think that if you're trying to analyze this business of learning and especially discussion

about deep learning, it is discussion about interpretation, not about things, about what

So not mathematical beauty, but the fact that it works at all or are you criticizing that

very beauty, our human desire to interpret, to find our silly interpretations in these

How do you feel about the success of a system like AlphaGo at beating the game of Go?

Using neural networks to estimate the quality of a board and the quality of the position.

The fact is a neural network system, it doesn't matter, a learning system that we don't I

think mathematically understand that well, beats the best human player, does something

So you empirically, we've empirically discovered that this is not a very difficult

So even more I would say that if they use deep learning, it is not the most effective

And usually when people use deep learning, they're using zillions of training data.

So I describe challenge, can we do some problems which do well deep learning method, this deep

Even more, some problems deep learning cannot solve because it's not necessary they create

But it is possible to create admissible set of functions because you have your training

That actually for mathematicians, when you consider invariant, you need to use law of

When you're making training in existing algorithm, you need uniform law of large numbers, which

But nevertheless, if you use both weak and strong way of convergence, you can decrease

It wasn't, as far as we know, God or whoever didn't come up with a model and place in our

So Alan Turing in the 50s, in his paper, asked and rejected the question, can machines think?

It's not a very useful question, but can you briefly entertain this useful, useless question?

So when I prove some theorem, it's very difficult theorem, in couple of years, in several places,

people prove the same theorem, say, Sauer Lemma, after us was done, then another guys

For example, geometry, it's happened simultaneously, first it did Lobachevsky and then Gauss and

Bolyai and another guys, and it's approximately in 10 times period, 10 years period of time.

And many mathematicians think that when they develop something, they develop something

On the flip side of that, maybe you can comment on big O complexity and how you see classifying

So that way of thinking about functions, do you think p equals np, do you think that's

When I came to United States in 1990, people did not know, they did not know statistical

So in Russia, it was published two monographs, our monographs, but in America they didn't

Then they learned and somebody told me that it is worst case theory and they will create

So from mathematical point of view, you know the best possible case and the worst possible

The edges are interesting because it is not so easy to get good bound, exact bound.

Do you think it's interesting because it's challenging and reveals interesting principles

Or do you think it's interesting because it's actually very useful for understanding the

So it's like me judging your life as a human being by the worst thing you did and the best

So you can describe edge cases and it is clear it has some model, but you cannot describe

But from a statistical point of view, the way you've studied functions and the nature

of learning in the world, don't you think that the real world has a very long tail?

That the edge cases are very far away from the mean, the stuff in the middle or no?

I think that from my point of view, if you will use formal statistic, you need uniform

If you will use this invariance business, you will need just law of large numbers.

And there's this huge difference between uniform law of large numbers and law of large numbers.
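
A small simulation can make the gap concrete. This is a sketch under stated assumptions, not a construction from the conversation: the labels are fair coin flips, each "function" in the hypothetical class is a fixed random labeling of the sample, and `deviations` is a made-up helper name. Because the labels are pure noise, every function has true error 0.5, so any gap between empirical error and 0.5 is deviation. The law of large numbers controls that gap for one fixed function; the uniform law has to control the largest gap over the whole class.

```python
import random


def deviations(n_samples, n_functions, seed=0):
    rng = random.Random(seed)
    # Fair-coin labels: every classifier has true error exactly 0.5.
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    gaps = []
    for _ in range(n_functions):
        # One hypothetical function, represented by its predictions on the sample.
        preds = [rng.randint(0, 1) for _ in range(n_samples)]
        emp_err = sum(p != y for p, y in zip(preds, labels)) / n_samples
        gaps.append(abs(emp_err - 0.5))
    # Gap for a single fixed function, and the worst gap over the class.
    return gaps[0], max(gaps)


single, worst = deviations(n_samples=200, n_functions=500)
print(f"one function: {single:.3f}, worst over class: {worst:.3f}")
```

The worst-case gap can never be smaller than the single-function gap, and it grows as the class gets richer, which is why training by minimizing error over a class needs many more observations than checking one average.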

For example, when I'm talking about duck, I give three predicates and that was enough.

But if you will try to do formal distinguish, you will need a lot of observations.

So that means that information about looks like a duck contain a lot of bit of information,

So we don't know that how much bit of information contain things from artificial and from intelligence.

Till now, all business, I don't like how people consider artificial intelligence.

How people try to do, how people can develop, say, predicates seems like a duck or play

That is the problem of intelligence and you see that connected to the problem of learning?

Because you immediately give this predicate like specific predicate seems like a duck

If you were to formulate as a set of open problems, that will take us there, to play

It is intelligence problem and people even did not start to understand intelligence.

Because to understand intelligence, first of all, try to understand what teachers do.

And so you think we really even haven't started on the journey of generating the predicates?

I want to understand why one teacher is better than another and how teacher affects student.

So what is the most exciting problem in statistical learning you've ever worked on or are working

But they separate statistical part from intelligent part and I know nothing about intelligent

And if you will know this intelligent part, so it will help us a lot in teaching, in learning.

So you have say MNIST digit recognition problem and deep learning claims that they did it

But looking on that, explain to me which invariant I should keep to use hundred examples or say

Yeah, that last slide, unfortunately your talk ended quickly, but that last slide was

Because everybody, when machine learning started and it was developed by mathematicians, they

It is not like in deep learning, they use zillions of training data because maybe zillions

Because statistical part is ready, as soon as you supply us with predicate, we can do

I think about that, I can say for digit three, I would introduce concept of horizontal symmetry.

So the digit three has horizontal symmetry, say more than, say, digit two or something

But as soon as I get the idea of horizontal symmetry, I can mathematically invent a lot

of measure of horizontal symmetry, or then vertical symmetry, or diagonal symmetry, whatever,

I think on digit I see that it is meta predicate, which is not shape, it is something like symmetry,

like how dark is whole picture, something like that, which can give rise to a predicate.
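
As a rough illustration of turning "symmetry" or "how dark is the picture" into numbers, here is a minimal sketch. The tiny 5x4 glyphs and the function names `horizontal_symmetry`, `vertical_symmetry`, and `ink_fraction` are assumptions made for this example; they are one possible choice among the many measures that could be invented, not the measures used in any published work.

```python
def horizontal_symmetry(img):
    # Fraction of pixels that match the top-to-bottom mirror image
    # (symmetry about the horizontal axis). 1.0 means perfectly symmetric.
    mirrored = img[::-1]
    total = sum(len(row) for row in img)
    agree = sum(a == b for row, m in zip(img, mirrored) for a, b in zip(row, m))
    return agree / total


def vertical_symmetry(img):
    # Same idea for the left-to-right mirror image.
    mirrored = [row[::-1] for row in img]
    total = sum(len(row) for row in img)
    agree = sum(a == b for row, m in zip(img, mirrored) for a, b in zip(row, m))
    return agree / total


def ink_fraction(img):
    # "How dark is the whole picture": share of pixels that are inked.
    total = sum(len(row) for row in img)
    return sum(sum(row) for row in img) / total


# Toy binary glyphs (assumed for illustration, 1 = ink).
THREE = [
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
TWO = [
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

for name, glyph in (("three", THREE), ("two", TWO)):
    print(name, horizontal_symmetry(glyph), vertical_symmetry(glyph), ink_fraction(glyph))
```

On these toy glyphs the three scores higher on horizontal symmetry than the two, which is the kind of difference such a predicate is meant to capture.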

You think such a predicate could rise out of something that is not general, meaning

it feels like for me to be able to understand the difference between two and three, I would

need to have had a childhood of 10 to 15 years playing with kids, going to school, being

yelled at by parents, all of that, walking, jumping, looking at ducks, and then I would be able

to generate the right predicate for telling the difference between two and a three.

So I'm talking about symmetry, about some properties of geometry, I'm talking about

So in one of our articles, it is trivial to show that every example can carry not more

Because when you show example and you say this is one, you can remove, say, a function

But when you use one predicate, which looks like a duck, you can remove much more functions

And that means that it contains a lot of bits of information from formal point of view.

But when you have a general picture of what you want to recognize and general picture

Maybe just me, but in all the math you show, in your work, which is some of the most profound

mathematical work in the field of learning AI and just math in general, I hear a lot

There's a poetry and music to a lot of the work you're doing and the way you're thinking

So what piece of music as a piece of empirical evidence gave you a sense that they are touching

Very clear, very classic, very simple, and the same in math when you have axioms in geometry,

And if you look back at your childhood, you grew up in Russia, you maybe were born as

a researcher in Russia, you've developed as a researcher in Russia, you've come to United

If you look back, what was some of your happiest moments as a researcher, some of the most

profound moments, not in terms of their impact on society, but in terms of their impact on

You know, every time when you found something, it is great in the life, every simple things.

You should go again and again and again and try to be honest in front of yourself, not

to make interpretation, but try to understand that it's related to ground truth, it is not

No, but how it's related to another ground truth, is it just temporary or it is for forever?

So 20 years ago when we discovered statistical learning theory, nobody believed, except for

one guy, Dudley from MIT, and then in 20 years it became fashion, and the same with support

So with support vector machines and learning theory, when you were working on it, you had

a sense, you had a sense of the profundity of it, how this seems to be right, this seems

I recognized that it will last forever, and now when I found this invariant story, I have

a feeling that it is complete learning, because I have proof that there are no different mechanisms.

You can have some cosmetic improvement you can do, but in terms of invariants, you need

But also I'm happy that we can formulate what is intelligence from that, and to separate