what is the role of imagination coming first there in the moment of discovery of an idea?

You don't need any imagination to derive the main principle of machine learning.

the process of learning, that is something like what happens in the human brain.

Do you think we will ever have the tools to try to describe that process of learning?

So, he saw blood cells, red cells, and he imagined that it is army fighting each other.

For example, people say that one day with a great teacher is better than a thousand days of diligent study.

But we know from history and now from math and machine learning that teacher can do a lot.

I don't know because teacher knows reality and can describe from this reality a predicate, invariants.

But he knows that when you're using an invariant, you can decrease the number of observations a hundred times.

I think you mentioned like a piano teacher saying to the student, play like a butterfly.

Yeah, maybe it's romantic, poetic, but it feels like there's a lot of truth in that statement.

I think that what was discovered recently is that there are two mechanisms of learning.

If it looks like a duck, swims like a duck, and quacks like a duck, then it is probably a duck.

So, you would like the theoretical description from the model to coincide with the empirical description,

So, you can't deny the fact that swims like a duck and quacks like a duck has humor in it, has ambiguity.

So, underneath, in order for us to understand swims like a duck, it feels like we need to know millions of other little pieces of information.

There doesn't need to be this knowledge base; those statements carry some rich information that helps us understand the essence of a duck.

Then you remove all functions which do not look like you think they should look from the training data.

That means that predicates are very good because every predicate is invented to decrease the admissible set of functions.

So, you talk about admissible set of functions and you talk about good functions.

So, the admissible set of functions is a set of functions which has small capacity or small diversity, small VC dimension, for example.

So, the machine is capable of picking up one function from the admissible set of functions.

So, the goal is to create an admissible set of functions which has small VC dimension and contains a good function.

Then you will be able to pick up the function using small amount of observations.
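
The idea in this exchange can be illustrated with a minimal Python sketch. It assumes a hypothetical admissible set of four threshold functions and three made-up observations, and it uses empirical risk minimization as the selection rule; none of these specifics come from the conversation.

```python
def make_threshold(t):
    """Return a classifier that outputs 1 when x >= t and 0 otherwise."""
    return lambda x: 1 if x >= t else 0


# Hypothetical admissible set: only four candidate functions (small capacity).
admissible = {t: make_threshold(t) for t in (0.0, 1.0, 2.0, 3.0)}


def empirical_risk(f, xs, ys):
    """Fraction of observations that f labels incorrectly."""
    return sum(f(x) != y for x, y in zip(xs, ys)) / len(xs)


def pick_function(candidates, xs, ys):
    """Return the key of the candidate with the lowest empirical risk."""
    return min(candidates, key=lambda t: empirical_risk(candidates[t], xs, ys))


# Three made-up observations are enough to single out one candidate here.
xs = [0.5, 1.5, 2.5]
ys = [0, 0, 1]
best_t = pick_function(admissible, xs, ys)
```

With a set this small, a handful of observations already leaves only one function with zero training error; a much larger set would need more data to narrow down.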

Is creating a set of admissible functions that has a small VC dimension and then you figure out a clever way of picking up?

Statistical learning theory is not involved in creating the admissible set of functions.

In classical learning theory, everywhere, 100% in textbooks, the set of functions, the admissible set of functions, is given.

But this is science about nothing because the most difficult problem is to create the admissible set of functions

given, say, a lot of functions, a continuum set of functions, create an admissible set of functions.

That means that it has finite VC dimension, small VC dimension, and contains a good function.

Yeah, you're looking at properties of training data and properties means that you have some function

You have model and what is expectation of this function on the model and they should coincide.
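
One way to read "they should coincide" is as a check that the average of a predicate weighted by the model's outputs matches the same average weighted by the observed labels. The sketch below assumes that reading; the predicate, the toy data, and the function name `invariant_gap` are hypothetical and not taken from the conversation.

```python
def invariant_gap(predicate, model, xs, ys):
    """Absolute difference between the predicate's average weighted by the
    model's outputs and its average weighted by the observed labels."""
    n = len(xs)
    model_side = sum(predicate(x) * model(x) for x in xs) / n
    data_side = sum(predicate(x) * y for x, y in zip(xs, ys)) / n
    return abs(model_side - data_side)


# Toy data (assumption): four points with binary labels.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]


def predicate(x):
    """Hypothetical predicate: the raw input value."""
    return x


def good_model(x):
    """Agrees with every label above, so both averages match."""
    return 1.0 if x >= 1.5 else 0.0


def bad_model(x):
    """Always predicts 1, so its predicate average is too large."""
    return 1.0


gap_good = invariant_gap(predicate, good_model, xs, ys)
gap_bad = invariant_gap(predicate, bad_model, xs, ys)
```

A function whose gap is large for a chosen predicate would be dropped from the admissible set; a function with zero gap stays in.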

But because when we're talking, say, a duck does not jump, so you don't ask the question "jumps like a duck"

But you know something, which question to ask, and you're asking, it swims like a duck,

So, there is a general type of predicate looks like and special type of predicate,

And that is the intelligence part of all this business and that is where the teacher is involved.

What do you think about deep learning as neural networks, these arbitrary architectures

You know, I think that this is fantasy, everything which like deep learning, like features.

And he started this book describing that in old time when war is over, so the great kings,

they gathered together, almost all of them were relatives, and they discussed what should

And it was clear for everybody that it is not peace, that peace will last only 20 years

There are mathematicians who are looking for the problem from a very deep point of view,

If you like to say piecewise linear function, say that and do in class of piecewise linear

And then they try to prove advantage of that through interpretations, which are mostly wrong.

And when it's not enough, they appeal to the brain, which they know nothing about.

Try to understand that there is not only one way of convergence, which is strong way of

And if you will go through all this stuff, you will see that you don't need deep learning.

It says that optimal solution of mathematical problem, which describes learning, is on

The question is you have no value for throwing something on the table, playing with it, not

It's like a neural network where you said throwing something in the bucket or the biological

You don't see value in imagining the cells or kings and queens and using that as inspiration

You think that interpretation basically deceives you in a way that's not productive.

I think that if you're trying to analyze this business of learning and especially discussion

about deep learning, it is discussion about interpretation, not about things, about what

So not mathematical beauty, but the fact that it works at all or are you criticizing that

very beauty, our human desire to interpret, to find our silly interpretations in these

How do you feel about the success of a system like AlphaGo at beating the game of Go?

Using neural networks to estimate the quality of a board and the quality of the position.

The fact is a neural network system, it doesn't matter, a learning system that we don't I

think mathematically understand that well, beats the best human player, does something

So you empirically, we've empirically discovered that this is not a very difficult

So even more I would say that if they use deep learning, it is not the most effective

And usually when people use deep learning, they're using zillions of training data.

So I describe challenge, can we do some problems which do well deep learning method, this deep

Even more, some problems deep learning cannot solve because it's not necessary they create

But it is possible to create admissible set of functions because you have your training

That actually for mathematicians, when you consider invariants, you need to use law of

When you're making training in existing algorithm, you need uniform law of large numbers, which

But nevertheless, if you use both weak and strong way of convergence, you can decrease

It wasn't, as far as we know, God or whoever didn't come up with a model and place in our

So Alan Turing in the 50s, in his paper, asked and rejected the question, can machines think?

It's not a very useful question, but can you briefly entertain this useful, useless question?

So when I prove some theorem, it's very difficult theorem, in couple of years, in several places,

people prove the same theorem, say, Sauer lemma, after ours was done, then other guys

For example, geometry, it's happened simultaneously, first it did Lobachevsky and then Gauss and

Bolyai and other guys, and it's approximately in 10 times period, 10 years period of time.

And many mathematicians think that when they develop something, they develop something

On the flip side of that, maybe you can comment on big O complexity and how you see classifying

So that way of thinking about functions, do you think P equals NP, do you think that's

When I came to United States in 1990, people did not know, they did not know statistical

So in Russia, it was published two monographs, our monographs, but in America they didn't

Then they learned and somebody told me that it is worst case theory and they will create

So from mathematical point of view, you know the best possible case and the worst possible

The edges are interesting because it is not so easy to get good bound, exact bound.

Do you think it's interesting because it's challenging and reveals interesting principles

Or do you think it's interesting because it's actually very useful for understanding the

So it's like me judging your life as a human being by the worst thing you did and the best

So you can describe edge cases and it is clear it has some model, but you cannot describe

But from a statistical point of view, the way you've studied functions and the nature

of learning in the world, don't you think that the real world has a very long tail?

That the edge cases are very far away from the mean, the stuff in the middle or no?

I think that from my point of view, if you will use formal statistic, you need uniform

If you will use this invariance business, you will need just law of large numbers.

And there's this huge difference between uniform law of large numbers and law of large numbers.

For example, when I'm talking about duck, I give three predicates and that was enough.

But if you will try to do formal distinguish, you will need a lot of observations.

So that means that information about looks like a duck contains a lot of bits of information,

So we don't know that how much bit of information contain things from artificial and from intelligence.

Till now, all business, I don't like how people consider artificial intelligence.

How people try to do, how people can develop, say, predicates swims like a duck or play

That is the problem of intelligence and you see that connected to the problem of learning?

Because you immediately give this predicate like specific predicate swims like a duck

If you were to formulate as a set of open problems, that will take us there, to play

It is intelligence problem and people even did not start to understand intelligence.

Because to understand intelligence, first of all, try to understand what teachers do.

And so you think we really even haven't started on the journey of generating the predicates?

I want to understand why one teacher is better than another and how the teacher affects the student.

So what is the most exciting problem in statistical learning you've ever worked on or are working

But they separate statistical part from intelligent part and I know nothing about intelligent

And if you will know this intelligent part, so it will help us a lot in teaching, in learning.

So you have, say, MNIST digit recognition problem and deep learning claims that they did it

But looking on that, explain to me which invariant I should keep to use hundred examples or say

Yeah, that last slide, unfortunately your talk ended quickly, but that last slide was

Because everybody, when machine learning started and it was developed by mathematicians, they

It is not like in deep learning, they use zillions of training data because maybe zillions

Because statistical part is ready, as soon as you supply us with predicate, we can do

I think about that, I can say for digit three, I would introduce concept of horizontal symmetry.
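
As a rough illustration of how such a concept might be turned into a number, the sketch below scores how closely the top half of an image mirrors its bottom half. It assumes that "horizontal symmetry" means reflection across the horizontal axis and that pixel values lie in [0, 1]; the function name and the two tiny 5x3 glyphs are hypothetical, and this is only one of many possible measures.

```python
def horizontal_symmetry(image):
    """Score in [0, 1]; 1.0 means the image equals its top-to-bottom mirror.

    `image` is a list of rows, each a list of pixel values in [0, 1].
    """
    flipped = image[::-1]  # reflect across the horizontal axis
    total_diff = sum(
        abs(a - b)
        for row, mirrored_row in zip(image, flipped)
        for a, b in zip(row, mirrored_row)
    )
    n_pixels = len(image) * len(image[0])
    return 1.0 - total_diff / n_pixels


# Hypothetical 5x3 glyphs standing in for handwritten digits.
three_like = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
two_like = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]

score_three = horizontal_symmetry(three_like)
score_two = horizontal_symmetry(two_like)
```

On these toy glyphs the three-like shape scores higher than the two-like shape, which is the kind of difference a symmetry predicate is meant to capture.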

So the digit three has horizontal symmetry, say more than, say, digit two or something

But as soon as I get the idea of horizontal symmetry, I can mathematically invent a lot

of measure of horizontal symmetry, or then vertical symmetry, or diagonal symmetry, whatever,

I think on digit I see that it is meta predicate, which is not shape, it is something like symmetry,

like how dark is whole picture, something like that, which can give rise to a predicate.

You think such a predicate could rise out of something that is not general, meaning

it feels like for me to be able to understand the difference between two and three, I would

need to have had a childhood of 10 to 15 years playing with kids, going to school, being

yelled at by parents, all of that, walking, jumping, looking at ducks, and then I would be able

to generate the right predicate for telling the difference between a two and a three.

So I'm talking about symmetry, about some properties of geometry, I'm talking about

So in one of our articles, it is trivial to show that every example can carry not more

Because when you show example and you say this is one, you can remove, say, a function

But when you use one predicate, which looks like a duck, you can remove much more functions

And that means that it contains a lot of bits of information from formal point of view.

But when you have a general picture of what you want to recognize and general picture

Maybe just me, but in all the math you show, in your work, which is some of the most profound

mathematical work in the field of learning AI and just math in general, I hear a lot

There's a poetry and music to a lot of the work you're doing and the way you're thinking

So what piece of music as a piece of empirical evidence gave you a sense that they are touching

Very clear, very classic, very simple, and the same in math when you have axioms in geometry,

And if you look back at your childhood, you grew up in Russia, you maybe were born as

a researcher in Russia, you've developed as a researcher in Russia, you came to the United

If you look back, what were some of your happiest moments as a researcher, some of the most

profound moments, not in terms of their impact on society, but in terms of their impact on

You know, every time when you found something, it is great in the life, every simple things.

You should go again and again and again and try to be honest in front of yourself, not

to make interpretation, but try to understand that it's related to ground truth, it is not

No, but how it's related to another ground truth, is it just temporary or it is for forever?

So 20 years ago when we discovered statistical learning theory, nobody believed, except for

one guy, Dudley from MIT, and then in 20 years it became fashion, and the same with support

So with support vector machines and learning theory, when you were working on it, you had

a sense, you had a sense of the profundity of it, how this seems to be right, this seems

I recognized that it will last forever, and now when I found this invariant story, I have

a feeling that it is complete learning, because I have proof that there are no different mechanisms.

You can have some cosmetic improvement you can do, but in terms of invariants, you need

But also I'm happy that we can formulate what is intelligence from that, and to separate