
Yoshua Bengio: Deep Learning | Lex Fridman Podcast #4



link |
00:00:00.000
What difference between biological neural networks and artificial neural networks
link |
00:00:04.320
is most mysterious, captivating, and profound for you?
link |
00:00:11.120
First of all, there's so much we don't know about biological neural networks,
link |
00:00:15.280
and that's very mysterious and captivating because maybe it holds the key to improving
link |
00:00:21.840
artificial neural networks. One of the things I studied recently is something
link |
00:00:29.680
that we don't know how biological neural networks do, but that would be really useful for artificial ones:
link |
00:00:37.120
the ability to do credit assignment through very long time spans. There are things that
link |
00:00:46.560
we can in principle do with artificial neural nets, but it's not very convenient and it's
link |
00:00:50.400
not biologically plausible. And this mismatch, I think this kind of mismatch
link |
00:00:55.920
may be an interesting thing to study to, A, understand better how brains might do these
link |
00:01:02.560
things because we don't have good corresponding theories with artificial neural nets, and B,
link |
00:01:09.200
maybe provide new ideas that we could explore about things that brains do differently and that
link |
00:01:18.320
we could incorporate in artificial neural nets. So let's break credit assignment up a little bit.
link |
00:01:23.680
Yes. So what, it's a beautifully technical term, but it could incorporate so many things. So is it
link |
00:01:30.320
more on the RNN memory side, thinking like that, or is it something about knowledge, building
link |
00:01:37.760
up common sense knowledge over time? Or is it more in the reinforcement learning sense that you're
link |
00:01:44.800
picking up rewards over time for a particular, to achieve a certain kind of goal? So I was thinking
link |
00:01:50.080
more about the first two meanings whereby we store all kinds of memories, episodic memories
link |
00:01:59.440
in our brain, which we can access later in order to help us both infer causes of things that we
link |
00:02:10.560
are observing now and assign credit to decisions or interpretations we came up with a while ago
link |
00:02:20.640
when those memories were stored. And then we can change the way we would have reacted or interpreted
link |
00:02:29.280
things in the past, and now that's credit assignment used for learning.
link |
00:02:33.760
So in which way do you think artificial neural networks, the current LSTM, the current architectures
link |
00:02:43.600
are not able to capture the, presumably you're thinking of very long term?
link |
00:02:50.320
Yes. So the current nets are doing a fairly good job for sequences with dozens or
link |
00:02:58.560
say hundreds of time steps. And then it gets harder and harder, depending on what you have
link |
00:03:04.960
to remember and so on, as you consider longer durations. Whereas humans seem to be able to
link |
00:03:12.480
do credit assignment through essentially arbitrary times, like I could remember something I did last
link |
00:03:16.960
year. And then now because I see some new evidence, I'm going to change my mind about the way I was
link |
00:03:23.840
thinking last year. And hopefully not make the same mistake again.
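As a rough illustration of the long-span credit assignment problem being described, the probe below (an illustrative toy setup, not something from the conversation; the vanilla RNN, sequence length, and dimensions are arbitrary placeholders) measures how much gradient from a loss at the final time step reaches inputs far in the past.

```python
# Toy probe: how much credit does the final-step loss assign to inputs far in the past?
import torch
import torch.nn as nn

torch.manual_seed(0)
T, batch, dim = 200, 1, 32
rnn = nn.RNN(input_size=dim, hidden_size=dim, nonlinearity="tanh")  # vanilla RNN
readout = nn.Linear(dim, 1)

x = torch.randn(T, batch, dim, requires_grad=True)
outputs, _ = rnn(x)
loss = readout(outputs[-1]).pow(2).mean()   # loss depends only on the last time step
loss.backward()

# Gradient norm w.r.t. the input at each time step = credit assigned to that step.
credit = x.grad.flatten(1).norm(dim=1)
for t in [0, 50, 100, 150, 199]:
    print(f"input step {t:3d}: gradient norm {credit[t].item():.3e}")
# The norms typically shrink sharply as we look further back, which is the practical
# obstacle to assigning credit over very long durations with these architectures.
```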
link |
00:03:30.720
I think a big part of that is probably forgetting. You're only remembering the really important
link |
00:03:36.080
things. It's very efficient forgetting.
link |
00:03:40.000
Yes. So there's a selection of what we remember. And I think there are really cool connections to
link |
00:03:46.160
higher level cognition here regarding consciousness, deciding and emotions,
link |
00:03:52.080
so deciding what comes to consciousness and what gets stored in memory, which are not trivial either.
link |
00:04:00.720
So you've been at the forefront there all along, showing some of the amazing things that neural
link |
00:04:07.120
networks, deep neural networks can do in the field of artificial intelligence, just broadly
link |
00:04:12.640
in all kinds of applications. But we can talk about that forever. But what, in your view,
link |
00:04:19.120
because we're thinking towards the future, is the weakest aspect of the way deep neural networks
link |
00:04:23.920
represent the world? What, in your view, is missing?
link |
00:04:29.200
So current state of the art neural nets trained on large quantities of images or texts
link |
00:04:38.240
have some level of understanding of, you know, what explains those data sets, but it's very
link |
00:04:45.360
basic, it's very low level. And it's not nearly as robust and abstract and general
link |
00:04:54.160
as our understanding. Okay, so that doesn't tell us how to fix things. But I think it encourages
link |
00:05:02.400
us to think about how we can maybe train our neural nets differently, so that they would
link |
00:05:14.240
focus, for example, on causal explanation, something that we don't do currently with neural
link |
00:05:20.400
net training. Also, one thing I'll talk about in my talk this afternoon is the fact that
link |
00:05:27.440
instead of learning separately from images and videos on one hand and from texts on the other
link |
00:05:33.680
hand, we need to do a better job of jointly learning about language and about the world
link |
00:05:42.000
to which it refers. So that, you know, both sides can help each other. We need to have good world
link |
00:05:50.160
models in our neural nets for them to really understand sentences, which talk about what's
link |
00:05:57.360
going on in the world. And I think we need language input to help provide clues about
link |
00:06:06.400
what high level concepts like semantic concepts should be represented at the top levels of our
link |
00:06:13.600
neural nets. In fact, there is evidence that the purely unsupervised learning of representations
link |
00:06:21.920
doesn't give rise to high level representations that are as powerful as the ones we're getting
link |
00:06:28.960
from supervised learning. And so the clues we're getting just with the labels, not even sentences,
link |
00:06:35.680
are already very, very high level. And I think that's a very important thing to keep in mind.
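One way to make the earlier point about jointly learning from images and text concrete is a contrastive objective over paired images and sentences, so that each modality supervises the other. The sketch below is illustrative only; the feature dimensions, the JointEmbedder module, and the random stand-in data are assumptions, not a method described here.

```python
# Illustrative joint image-text objective: matching pairs should embed close together.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedder(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, shared_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)   # image features -> shared space
        self.txt_proj = nn.Linear(txt_dim, shared_dim)   # sentence features -> shared space

    def forward(self, img_feats, txt_feats):
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        sims = z_img @ z_txt.t() / 0.07                  # similarity of every image to every sentence
        targets = torch.arange(len(img_feats))           # the i-th image matches the i-th sentence
        # Each modality provides the training signal for the other.
        return (F.cross_entropy(sims, targets) + F.cross_entropy(sims.t(), targets)) / 2

# Stand-ins for a batch of paired image and caption feature vectors.
model = JointEmbedder()
loss = model(torch.randn(8, 512), torch.randn(8, 300))
loss.backward()
```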
link |
00:06:42.400
It's already very powerful. Do you think that's an architecture challenge or is it a data set challenge?
link |
00:06:49.520
Neither. I'm tempted to just end it there. Can you elaborate slightly?
link |
00:07:02.880
Of course, data sets and architectures are something you want to always play with. But
link |
00:07:06.800
I think the crucial thing is more the training objectives, the training frameworks. For example,
link |
00:07:13.040
going from passive observation of data to more active agents, which
link |
00:07:22.320
learn by intervening in the world, the relationships between causes and effects,
link |
00:07:27.280
the sort of objective functions, which could be important to allow the highest level explanations
link |
00:07:36.640
to rise from the learning, which I don't think we have now, the kinds of objective functions,
link |
00:07:43.840
which could be used to reward exploration, the right kind of exploration. So these kinds of
link |
00:07:50.400
questions are neither in the data set nor in the architecture, but more in how we learn,
link |
00:07:57.200
under what objectives and so on. Yeah, I've heard you mention in several contexts, the idea of sort
link |
00:08:04.240
of the way children learn, they interact with objects in the world. And it seems fascinating
link |
00:08:08.880
because in some sense, except with some cases in reinforcement learning, that idea
link |
00:08:15.520
is not part of the learning process in artificial neural networks. So it's almost like,
link |
00:08:21.360
do you envision something like an objective function saying, you know what, if you
link |
00:08:29.680
poke this object in this kind of way, it would be really helpful for me to further learn.
link |
00:08:36.400
Right, right.
link |
00:08:37.040
Sort of almost guiding some aspect of the learning.
link |
00:08:40.320
Right, right, right. So I was talking to Rebecca Saxe just a few minutes ago,
link |
00:08:43.600
and she was talking about lots and lots of evidence that infants seem to clearly pick
link |
00:08:52.960
what interests them in a directed way. And so they're not passive learners, they focus their
link |
00:09:03.040
attention on aspects of the world, which are most interesting, surprising in a non trivial way.
link |
00:09:10.480
That makes them change their theories of the world.
link |
00:09:16.000
So that's a fascinating view of the future progress. But on a more maybe boring question,
link |
00:09:26.080
do you think going deeper and larger, so do you think just increasing the size of the things that
link |
00:09:33.760
have been increasing a lot in the past few years, is going to be a big thing?
link |
00:09:38.800
I think increasing the size of the things that have been increasing a lot in the past few years
link |
00:09:44.320
will also make significant progress. So some of the representational issues that you mentioned,
link |
00:09:51.840
they're kind of shallow, in some sense.
link |
00:09:54.880
Oh, shallow in the sense of abstraction.
link |
00:09:58.400
In the sense of abstraction, they're not getting some...
link |
00:10:00.800
I don't think that having more depth in the network, in the sense of instead of 100 layers
link |
00:10:06.880
you're going to have more layers, is going to be enough. I don't think so. Is that obvious to you?
link |
00:10:11.680
Yes. What is clear to me is that engineers and companies and labs and grad students will continue
link |
00:10:19.200
to tune architectures and explore all kinds of tweaks to make the current state of the art
link |
00:10:25.600
ever so slightly better. But I don't think that's going to be nearly enough. I think we need
link |
00:10:31.440
changes in the way that we're considering learning to achieve the goal that these learners actually
link |
00:10:39.920
understand in a deep way the environment in which they are, you know, observing and acting.
link |
00:10:46.640
But I guess I was trying to ask a question that's more interesting than just more layers.
link |
00:10:53.200
It's basically, once you figure out a way to learn through interacting, how many parameters
link |
00:11:00.800
it takes to store that information. So I think our brain is quite a bit bigger than most neural networks.
link |
00:11:07.760
Right, right. Oh, I see what you mean. Oh, I'm with you there. So I agree that in order to
link |
00:11:14.240
build neural nets with the kind of broad knowledge of the world that typical adult humans have,
link |
00:11:20.960
probably the kind of computing power we have now is going to be insufficient.
link |
00:11:25.600
So the good news is there are hardware companies building neural net chips. And so
link |
00:11:30.320
it's going to get better. However, the good news in a way, which is also a bad news,
link |
00:11:37.520
is that even our state of the art, deep learning methods fail to learn models that understand
link |
00:11:46.960
even very simple environments, like some grid worlds that we have built.
link |
00:11:52.000
Even these fairly simple environments, I mean, of course, if you train them with enough examples,
link |
00:11:56.080
eventually they get it. But it's just like, instead of what humans might need just
link |
00:12:03.440
dozens of examples, these things will need millions for very, very, very simple tasks.
link |
00:12:10.000
And so I think there's an opportunity for academics who don't have the kind of computing
link |
00:12:16.640
power that, say, Google has to do really important and exciting research to advance
link |
00:12:23.440
the state of the art in training frameworks, learning models, agent learning in even simple
link |
00:12:30.960
environments that are synthetic, that seem trivial, but yet current machine learning fails on.
link |
00:12:38.240
We talked about priors and common sense knowledge. It seems like
link |
00:12:43.760
we humans take a lot of knowledge for granted. So what's your view of these priors of forming
link |
00:12:52.160
this broad view of the world, this accumulation of information and how we can teach neural networks
link |
00:12:58.880
or learning systems to pick that knowledge up? So knowledge, for a while, the artificial
link |
00:13:05.520
intelligence was maybe in the 80s, like there was a time when knowledge representation, knowledge
link |
00:13:14.320
acquisition, expert systems, I mean, the symbolic AI was a view, was an interesting problem set to
link |
00:13:22.240
solve and it was kind of put on hold a little bit, it seems like. Because it doesn't work.
link |
00:13:27.680
It doesn't work, that's right. But the goals of that remain important.
link |
00:13:34.960
Yes. Remain important. And how do you think those goals can be addressed?
link |
00:13:39.760
Right. So first of all, I believe that one reason why the classical expert systems approach failed
link |
00:13:48.400
is because a lot of the knowledge we have, so you talked about common sense intuition,
link |
00:13:56.320
there's a lot of knowledge like this, which is not consciously accessible.
link |
00:14:01.680
There are lots of decisions we're taking that we can't really explain, even if sometimes we make
link |
00:14:05.440
up a story. And that knowledge is also necessary for machines to take good decisions. And that
link |
00:14:15.600
knowledge is hard to codify in expert systems, rule based systems and classical AI formalism.
link |
00:14:22.960
And there are other issues, of course, with the old AI, like not really good ways of handling
link |
00:14:29.520
uncertainty, and, I would say, something more subtle, which we understand better now but I think still
link |
00:14:37.040
isn't enough in the minds of people. There's something really powerful that comes from
link |
00:14:43.920
distributed representations, the thing that really makes neural nets work so well.
link |
00:14:49.280
And it's hard to replicate that kind of power in a symbolic world. The knowledge in expert systems
link |
00:14:58.640
and so on is nicely decomposed into like a bunch of rules. Whereas if you think about a neural net,
link |
00:15:04.960
it's the opposite. You have this big blob of parameters which work intensely together to
link |
00:15:10.960
represent everything the network knows. And it's not sufficiently factorized. It's not
link |
00:15:16.960
sufficiently factorized. And so I think this is one of the weaknesses of current neural nets,
link |
00:15:24.240
that we have to take lessons from classical AI in order to bring in another kind of compositionality,
link |
00:15:32.320
which is common in language, for example, and in these rules, but that isn't so native to neural
link |
00:15:38.800
nets. And on that line of thinking, disentangled representations. Yes. So let me connect with
link |
00:15:48.400
disentangled representations, if you might, if you don't mind. So for many years, I've thought,
link |
00:15:55.280
and I still believe that it's really important that we come up with learning algorithms,
link |
00:16:00.560
either unsupervised or supervised or reinforcement, whatever, that build representations
link |
00:16:06.400
in which the important factors, hopefully causal factors are nicely separated and easy to pick up
link |
00:16:13.360
from the representation. So that's the idea of disentangled representations. It says transform
link |
00:16:18.480
the data into a space where everything becomes easy. We can maybe just learn with linear models
link |
00:16:25.120
about the things we care about. And I still think this is important, but I think this is missing out
link |
00:16:30.960
on a very important ingredient, which classical AI systems can remind us of.
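The "transform the data into a space where everything becomes easy" idea can be sketched as an encoder plus a plain linear readout of the factors of interest. This is only a schematic with stand-in dimensions and random data; how to train the encoder so the factors actually come out disentangled is exactly the open question.

```python
# Sketch: disentangle first, then learn the quantities of interest with a linear model.
import torch
import torch.nn as nn

encoder = nn.Sequential(                 # raw pixels -> (hopefully) disentangled representation
    nn.Flatten(),
    nn.Linear(64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 10),                  # ideally one latent dimension per causal factor
)
linear_probe = nn.Linear(10, 3)          # e.g. position, size, colour of an object

images = torch.randn(32, 1, 64, 64)      # stand-in batch of images
factors = torch.randn(32, 3)             # stand-in ground-truth factors

z = encoder(images)
loss = nn.functional.mse_loss(linear_probe(z), factors)
loss.backward()
# If z were well disentangled, this linear probe would be all the extra machinery
# needed to pick up the factors; learning such an encoder is the hard part.
```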
link |
00:16:38.080
So let's say we have these disentangled representations. You still need to learn about
link |
00:16:43.440
the relationships between the variables, those high level semantic variables. They're not going
link |
00:16:47.200
to be independent. I mean, this is like too much of an assumption. They're going to have some
link |
00:16:52.000
interesting relationships that allow us to predict things in the future, to explain what happened
link |
00:16:56.320
in the past. The kind of knowledge about those relationships in a classical AI system
link |
00:17:01.600
is encoded in the rules. Like a rule is just like a little piece of knowledge that says,
link |
00:17:06.000
oh, I have these two, three, four variables that are linked in this interesting way,
link |
00:17:10.960
then I can say something about one or two of them given a couple of others, right?
link |
00:17:14.800
In addition to disentangling the elements of the representation, which are like the variables
link |
00:17:22.160
in a rule based system, you also need to disentangle the mechanisms that relate those
link |
00:17:31.840
variables to each other. So like the rules. So the rules are neatly separated. Like each rule is,
link |
00:17:37.200
you know, living on its own. And when I change a rule because I'm learning, it doesn't need to
link |
00:17:43.360
break other rules. Whereas current neural nets, for example, are very sensitive to what's called
link |
00:17:48.720
catastrophic forgetting, where after I've learned some things and then I learn new things,
link |
00:17:54.080
they can destroy the old things that I had learned, right? If the knowledge was better
link |
00:17:59.280
factorized and separated, disentangled, then you would avoid a lot of that.
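A tiny, self-contained illustration of the catastrophic forgetting just mentioned (toy tasks invented for the example): the same entangled parameters are reused for a second task, and performance on the first task typically collapses.

```python
# Train on task A, then on task B, and watch task A accuracy degrade.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def make_task(center):
    x = torch.randn(512, 2) + center            # each task lives in its own region
    y = (x[:, 0] > center[0]).long()            # and has its own labeling rule
    return x, y

task_a = make_task(torch.tensor([0.0, 0.0]))
task_b = make_task(torch.tensor([5.0, 5.0]))

def train(x, y, steps=200):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

def accuracy(x, y):
    return (net(x).argmax(dim=1) == y).float().mean().item()

train(*task_a)
print("task A accuracy after learning A:", accuracy(*task_a))   # high
train(*task_b)
print("task A accuracy after learning B:", accuracy(*task_a))   # typically much lower
```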
link |
00:18:06.560
Now, you can't do this in the sensory domain.
link |
00:18:10.320
What do you mean by sensory domain?
link |
00:18:13.120
Like in pixel space. But my idea is that when you project the data in the right semantic space,
link |
00:18:18.640
it becomes possible to now represent this extra knowledge beyond the transformation from inputs
link |
00:18:25.040
to representations, which is how representations act on each other and predict the future and so on
link |
00:18:31.120
in a way that can be neatly disentangled. So now it's the rules that are disentangled from each
link |
00:18:37.680
other and not just the variables that are disentangled from each other.
link |
00:18:40.400
And you draw a distinction between semantic space and pixel, like does there need to be
link |
00:18:45.200
an architectural difference?
link |
00:18:46.560
Well, yeah. So there's the sensory space, like pixels, where everything is entangled.
link |
00:18:52.080
The information, like the variables are completely interdependent in very complicated ways.
link |
00:18:58.160
And also computation, like it's not just the variables, it's also how they are related to
link |
00:19:03.520
each other is all intertwined. But I'm hypothesizing that in the right high level
link |
00:19:11.280
representation space, both the variables and how they relate to each other can be
link |
00:19:16.720
disentangled. And that will provide a lot of generalization power.
link |
00:19:20.800
Generalization power.
link |
00:19:22.240
Yes.
link |
00:19:22.720
The distribution of the test set is assumed to be the same as the distribution of the training set.
link |
00:19:29.280
Right. This is where current machine learning is too weak. It doesn't tell us anything,
link |
00:19:35.600
is not able to tell us anything about how our neural nets, say, are going to generalize to
link |
00:19:40.080
a new distribution. And, you know, people may think, well, but there's nothing we can say
link |
00:19:45.120
if we don't know what the new distribution will be. The truth is humans are able to generalize
link |
00:19:50.880
to new distributions.
link |
00:19:52.560
Yeah. How are we able to do that?
link |
00:19:54.000
Yeah. Because there is something, these new distributions, even though they could look
link |
00:19:57.920
very different from the training distributions, they have things in common. So let me give you
link |
00:20:02.240
a concrete example. You read a science fiction novel. The science fiction novel, maybe, you
link |
00:20:07.920
know, brings you in some other planet where things look very different on the surface,
link |
00:20:15.200
but it's still the same laws of physics. And so you can read the book and you understand
link |
00:20:20.000
what's going on. So the distribution is very different. But because you can transport
link |
00:20:27.360
a lot of the knowledge you had from Earth about the underlying cause and effect relationships
link |
00:20:33.120
and physical mechanisms and all that, and maybe even social interactions, you can now
link |
00:20:38.720
make sense of what is going on on this planet where, like, visually, for example,
link |
00:20:42.160
things are totally different.
link |
00:20:45.280
Taking that analogy further and distorting it, let's enter a science fiction world of,
link |
00:20:50.800
say, 2001: A Space Odyssey, with HAL, which is probably one of my favorite AI movies.
link |
00:20:59.840
Me too.
link |
00:21:00.480
And then there's another one that a lot of people love that may be a little bit outside
link |
00:21:05.360
of the AI community is Ex Machina. I don't know if you've seen it.
link |
00:21:10.000
Yes. Yes.
link |
00:21:11.600
By the way, what are your views on that movie? Are you able to enjoy it?
link |
00:21:16.000
There are things I like and things I hate.
link |
00:21:21.120
So you could talk about that in the context of a question I want to ask, which is, there's
link |
00:21:26.800
quite a large community of people from different backgrounds, often outside of AI, who are concerned
link |
00:21:32.800
about existential threat of artificial intelligence. You've seen this community
link |
00:21:37.600
develop over time. You've seen you have a perspective. So what do you think is the best
link |
00:21:42.160
way to talk about AI safety, to think about it, to have discourse about it within AI community
link |
00:21:48.320
and outside and grounded in the fact that Ex Machina is one of the main sources of information
link |
00:21:54.560
for the general public about AI?
link |
00:21:56.560
So I think you're putting it right. There's a big difference between the sort of discussion
link |
00:22:02.240
we ought to have within the AI community and the sort of discussion that really matters
link |
00:22:07.600
in the general public. So I think the picture of Terminator and AI loose and killing people
link |
00:22:17.120
and super intelligence that's going to destroy us, whatever we try, isn't really so useful
link |
00:22:24.560
for the public discussion. Because for the public discussion, the things I believe really
link |
00:22:30.000
matter are the short term and medium term, very likely negative impacts of AI on society,
link |
00:22:37.200
whether it's from security, like, you know, big brother scenarios with face recognition
link |
00:22:43.280
or killer robots, or the impact on the job market, or concentration of power and discrimination,
link |
00:22:50.000
all kinds of social issues, which could actually, some of them could really threaten democracy,
link |
00:22:57.760
for example.
link |
00:22:58.800
Just to clarify, when you said killer robots, you mean autonomous weapon systems.
link |
00:23:04.000
Yes, that's right.
link |
00:23:06.320
So I think these short and medium term concerns should be important parts of the public debate.
link |
00:23:13.040
Now, existential risk, for me is a very unlikely consideration, but still worth academic investigation
link |
00:23:24.640
in the same way that you could say, should we study what could happen if a meteorite, you
link |
00:23:30.080
know, came to earth and destroyed it. So I think it's very unlikely that this is going
link |
00:23:33.920
to happen in a reasonable future. The sort of scenario of an AI getting loose
link |
00:23:43.040
goes against my understanding of at least current machine learning and current neural
link |
00:23:46.560
nets and so on. It's not plausible to me. But of course, I don't have a crystal ball
link |
00:23:51.120
and who knows what AI will be in 50 years from now. So I think it is worth that scientists
link |
00:23:55.520
study those problems. It's just not a pressing question as far as I'm concerned.
link |
00:23:59.680
So before I continue down that line, I have a few questions there. But what do you like
link |
00:24:05.840
and not like about Ex Machina as a movie? Because I actually watched it for the second
link |
00:24:09.840
time and enjoyed it. I hated it the first time, and I enjoyed it quite a bit more the
link |
00:24:15.600
second time when I sort of learned to accept certain pieces of it, see it as a concept
link |
00:24:23.440
movie. What was your experience? What were your thoughts?
link |
00:24:26.320
So the negative is the picture it paints of science is totally wrong. Science in general
link |
00:24:36.080
and AI in particular. Science is not happening in some hidden place by some, you know, really
link |
00:24:44.160
smart guy, one person. This is totally unrealistic. This is not how it happens. Even a team of
link |
00:24:52.160
people in some isolated place will not make it. Science moves by small steps, thanks to
link |
00:24:59.840
the collaboration and community of a large number of people interacting. And all the
link |
00:25:10.480
scientists who are experts in their field kind of know what is going on, even in the industrial
link |
00:25:14.560
labs. Information flows and leaks and so on. And the spirit of it is very different
link |
00:25:21.920
from the way science is painted in this movie.
link |
00:25:25.600
Yeah, let me ask on that point. It's been the case to this point that kind of even if
link |
00:25:32.400
the research happens inside Google or Facebook, inside companies, it still kind of comes out,
link |
00:25:36.800
ideas come out. Do you think that will always be the case with AI? Is it possible to bottle
link |
00:25:41.680
ideas to the point where there's a set of breakthroughs that go completely undiscovered
link |
00:25:47.360
by the general research community? Do you think that's even possible?
link |
00:25:52.240
It's possible, but it's unlikely. It's not how it is done now. It's not how I can foresee
link |
00:25:59.520
it in the foreseeable future. But of course, I don't have a crystal ball, and science is not
link |
00:26:09.520
a crystal ball either. And so who knows? This is science fiction after all.
link |
00:26:14.960
I think it's ominous that the lights went off during that discussion.
link |
00:26:21.440
So the problem, again, there's one thing is the movie and you could imagine all kinds
link |
00:26:25.320
of science fiction. The problem for me, maybe similar to the question about existential
link |
00:26:30.320
risk, is that this kind of movie paints such a wrong picture of what is the actual science
link |
00:26:39.440
and how it's going on that it can have unfortunate effects on people's understanding of current
link |
00:26:45.640
science. And so that's kind of sad.
link |
00:26:50.800
There's an important principle in research, which is diversity. So in other words, research
link |
00:26:58.440
is exploration. Research is exploration in the space of ideas. And different people will
link |
00:27:03.720
focus on different directions. And this is not just good, it's essential. So I'm totally
link |
00:27:09.520
fine with people exploring directions that are contrary to mine or look orthogonal to
link |
00:27:16.440
mine. I am more than fine. I think it's important. I and my friends don't claim we have universal
link |
00:27:24.920
truth about what will, especially about what will happen in the future. Now that being
link |
00:27:29.560
said, we have our intuitions and then we act accordingly according to where we think we
link |
00:27:36.560
can be most useful and where society has the most to gain or to lose. We should have those
link |
00:27:42.480
debates and not end up in a society where there's only one voice and one way of thinking
link |
00:27:49.800
and research money is spread out.
link |
00:27:53.520
So disagreement is a sign of good research, good science.
link |
00:27:59.040
Yes.
link |
00:28:00.040
The idea of bias in the human sense of bias. How do you think about instilling in machine
link |
00:28:08.600
learning something that's aligned with human values in terms of bias? We intuitively as
link |
00:28:15.240
human beings have a concept of what bias means, of what fundamental respect for other human
link |
00:28:21.160
beings means. But how do we instill that into machine learning systems, do you think?
link |
00:28:26.760
So I think there are short term things that are already happening and then there are long
link |
00:28:32.360
term things that we need to do. In the short term, there are techniques that have been
link |
00:28:38.360
proposed and I think will continue to be improved and maybe alternatives will come up to take
link |
00:28:44.200
data sets in which we know there is bias, we can measure it. Pretty much any data set
link |
00:28:50.120
where humans are being observed taking decisions will have some sort of bias, discrimination
link |
00:28:55.520
against particular groups and so on.
link |
00:28:59.000
And we can use machine learning techniques to try to build predictors, classifiers that
link |
00:29:04.240
are going to be less biased. We can do it, for example, using adversarial methods to
link |
00:29:11.600
make our systems less sensitive to these variables we should not be sensitive to.
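A sketch of one common way to implement the adversarial idea just described, using a gradient-reversal layer; the dimensions, data, and heads below are placeholders rather than a production recipe. An auxiliary head tries to recover the protected attribute from the representation, and the reversed gradient pushes the encoder to remove that information.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, flips the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU())
task_head = nn.Linear(64, 2)     # the decision we actually care about (e.g. approve / deny)
adversary = nn.Linear(64, 2)     # tries to predict the protected attribute from the representation

x = torch.randn(128, 20)                     # stand-in features
y_task = torch.randint(0, 2, (128,))         # stand-in task labels
y_protected = torch.randint(0, 2, (128,))    # stand-in protected attribute

z = encoder(x)
loss_task = nn.functional.cross_entropy(task_head(z), y_task)
loss_adv = nn.functional.cross_entropy(adversary(GradReverse.apply(z)), y_protected)
(loss_task + loss_adv).backward()
# The adversary's weights get a normal gradient (it learns to detect the attribute),
# while the reversed gradient reaching the encoder removes that information from z.
```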
link |
00:29:18.360
So these are clear, well defined ways of trying to address the problem. Maybe they have weaknesses
link |
00:29:23.520
and more research is needed and so on. But I think in fact they are sufficiently mature
link |
00:29:28.840
that governments should start regulating companies where it matters, say like insurance companies,
link |
00:29:35.240
so that they use those techniques. Because those techniques will probably reduce the
link |
00:29:40.480
bias but at a cost. For example, maybe their predictions will be less accurate and so companies
link |
00:29:46.440
will not do it until you force them.
link |
00:29:48.560
All right, so this is short term. Long term, I'm really interested in thinking how we can
link |
00:29:56.040
instill moral values into computers. Obviously, this is not something we'll achieve in the
link |
00:30:01.560
next five or 10 years. How can we, you know, there's already work in detecting emotions,
link |
00:30:08.120
for example, in images, in sounds, in texts, and also studying how different agents interacting
link |
00:30:19.880
in different ways may correspond to patterns of, say, injustice, which could trigger anger.
link |
00:30:28.200
So these are things we can do in the medium term and eventually train computers to model,
link |
00:30:37.840
for example, how humans react emotionally. I would say the simplest thing is unfair situations
link |
00:30:46.960
which trigger anger. This is one of the most basic emotions that we share with other animals.
link |
00:30:52.680
I think it's quite feasible within the next few years that we can build systems that can
link |
00:30:57.160
detect these kinds of things to the extent, unfortunately, that they understand enough
link |
00:31:01.980
about the world around us, which is a long time away. But maybe we can initially do this
link |
00:31:08.240
in virtual environments. So you can imagine a video game where agents interact in some
link |
00:31:14.840
ways and then some situations trigger an emotion. I think we could train machines to detect
link |
00:31:21.640
those situations and predict that the particular emotion will likely be felt if a human was
link |
00:31:27.400
playing one of the characters.
link |
00:31:29.460
You have shown excitement and done a lot of excellent work with unsupervised learning.
link |
00:31:35.720
But there's been a lot of success on the supervised learning side.
link |
00:31:39.840
Yes, yes.
link |
00:31:40.840
And one of the things I'm really passionate about is how humans and robots work together.
link |
00:31:46.680
And in the context of supervised learning, that means the process of annotation. Do you
link |
00:31:52.800
think about the problem of annotation put in a more interesting way as humans teaching
link |
00:32:00.080
machines?
link |
00:32:01.080
Yes.
link |
00:32:02.080
Is there?
link |
00:32:03.080
Yes. I think it's an important subject. Reducing it to annotation may be useful for somebody
link |
00:32:09.560
building a system tomorrow. But longer term, the process of teaching, I think, is something
link |
00:32:16.300
that deserves a lot more attention from the machine learning community. So there are people
link |
00:32:19.960
who have coined the term machine teaching. So what are good strategies for teaching a
link |
00:32:24.560
learning agent? And can we design and train a system that is going to be a good teacher?
link |
00:32:33.160
So in my group, we have a project called BabyAI, or the BabyAI game, where there is a game or scenario
link |
00:32:42.200
where there's a learning agent and a teaching agent. Presumably, the teaching agent would
link |
00:32:48.480
eventually be a human. But we're not there yet. And the role of the teacher is to use
link |
00:32:57.960
its knowledge of the environment, which it can acquire using whatever way brute force
link |
00:33:04.840
to help the learner learn as quickly as possible. So the learner is going to try to learn by
link |
00:33:10.760
itself, maybe using some exploration and whatever. But the teacher can choose, can have an influence
link |
00:33:19.920
on the interaction with the learner, so as to guide the learner, maybe teach it the things
link |
00:33:27.160
that the learner has most trouble with, or just at the boundary between what it knows
link |
00:33:30.840
and doesn't know, and so on.
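A minimal caricature of such a teacher/learner loop (an invented toy framing, not the actual project described): the teacher knows the ground truth for the whole environment and keeps handing the learner the examples it is currently most uncertain about, i.e. the ones near the boundary of what it knows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pool_x = torch.randn(1000, 2)                        # the environment the teacher knows well
pool_y = (pool_x[:, 0] + pool_x[:, 1] > 0).long()    # ground-truth rule, known to the teacher

learner = nn.Linear(2, 2)
opt = torch.optim.SGD(learner.parameters(), lr=0.5)

def teacher_pick(k=16):
    with torch.no_grad():
        probs = learner(pool_x).softmax(dim=1)
        uncertainty = 1.0 - probs.max(dim=1).values  # highest where the learner is least sure
    idx = uncertainty.topk(k).indices
    return pool_x[idx], pool_y[idx]

for step in range(50):
    x, y = teacher_pick()                            # the teacher curates the curriculum
    opt.zero_grad()
    nn.functional.cross_entropy(learner(x), y).backward()
    opt.step()

with torch.no_grad():
    acc = (learner(pool_x).argmax(dim=1) == pool_y).float().mean().item()
print(f"learner accuracy on the full pool: {acc:.2f}")
```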
link |
00:33:36.180
So there's a tradition of these kinds of ideas from other fields, like tutoring systems, for example, in AI. And of course, people in the humanities
link |
00:33:45.320
have been thinking about these questions. But I think it's time that machine learning
link |
00:33:48.240
people look at this, because in the future, we'll have more and more human machine interaction
link |
00:33:55.440
with the human in the loop. And I think understanding how to make this work better, all the problems
link |
00:34:01.040
around that are very interesting and not sufficiently addressed. You've done a lot of work with
link |
00:34:06.160
language, too. What aspect of the traditionally formulated Turing test, a test of natural
link |
00:34:14.000
language understanding and generation, in your eyes is the most difficult?
link |
00:34:19.520
What in your eyes is the hardest part of conversation to solve for machines? So I would say it's
link |
00:34:25.640
everything having to do with the non linguistic knowledge, which implicitly you need in order
link |
00:34:32.300
to make sense of sentences, things like the Winograd schema. So these sentences that are
link |
00:34:37.680
semantically ambiguous. In other words, you need to understand enough about the world
link |
00:34:43.720
in order to really interpret properly those sentences. I think these are interesting challenges
link |
00:34:49.280
for machine learning, because they point in the direction of building systems that both
link |
00:34:57.300
understand how the world works and the causal relationships in the world and associate that
link |
00:35:03.760
knowledge with how to express it in language, either for reading or writing.
link |
00:35:12.080
You speak French?
link |
00:35:13.080
Yes, it's my mother tongue.
link |
00:35:14.760
It's one of the romance languages. Do you think passing the Turing test and all the
link |
00:35:20.400
underlying challenges we just mentioned depend on language? Do you think it might be easier
link |
00:35:24.320
in French than it is in English, or is independent of language?
link |
00:35:28.920
I think it's independent of language. I would like to build systems that can use the same
link |
00:35:37.600
principles, the same learning mechanisms to learn from human agents, whatever their language.
link |
00:35:46.720
Well, certainly us humans can talk more beautifully and smoothly in poetry. I'm Russian originally, and
link |
00:35:53.560
I know poetry in Russian is maybe easier to convey complex ideas than it is in English.
link |
00:36:02.600
But maybe I'm showing my bias and some people could say that about French. But of course,
link |
00:36:09.480
the goal ultimately is that our human brain is able to utilize any of those languages
link |
00:36:15.880
to use them as tools to convey meaning.
link |
00:36:18.280
Yeah, of course, there are differences between languages, and maybe some are slightly better
link |
00:36:22.040
at some things, but in the grand scheme of things, where we're trying to understand how
link |
00:36:26.120
the brain works and language and so on, I think these differences are minute.
link |
00:36:32.040
So you've lived perhaps through an AI winter of sorts?
link |
00:36:38.880
Yes.
link |
00:36:39.920
How did you stay warm and continue your research?
link |
00:36:44.740
Stay warm with friends.
link |
00:36:45.740
With friends. Okay, so it's important to have friends. And what have you learned from the
link |
00:36:51.160
experience?
link |
00:36:53.600
Listen to your inner voice. Don't, you know, be trying to just please the crowds and the
link |
00:37:02.040
fashion. And if you have a strong intuition about something that is not contradicted by
link |
00:37:10.320
actual evidence, go for it. I mean, it could be contradicted by people.
link |
00:37:17.280
Not your own instinct, based on everything you've learned?
link |
00:37:20.600
Of course, you have to adapt your beliefs when your experiments contradict those beliefs.
link |
00:37:28.320
But you have to stick to your beliefs otherwise. It's what allowed me to go through those years.
link |
00:37:35.000
It's what allowed me to persist in directions that, you know, took time, whatever other
link |
00:37:42.040
people think, took time to mature and bring fruits.
link |
00:37:48.040
So history of AI is marked with these, of course, it's marked with technical breakthroughs,
link |
00:37:54.520
but it's also marked with these seminal events that capture the imagination of the community.
link |
00:38:00.980
Most recent, I would say, AlphaGo beating the world champion human Go player was one
link |
00:38:06.400
of those moments. What do you think the next such moment might be?
link |
00:38:12.360
Okay, so first of all, I think that these so called seminal events are overrated. As
link |
00:38:22.600
I said, science really moves by small steps. Now what happens is you make one more small
link |
00:38:30.200
step and it's like the drop that, you know, that fills the bucket and then you have drastic
link |
00:38:39.480
consequences because now you're able to do something you were not able to do before.
link |
00:38:43.920
Or now, say, the cost of building some device or solving a problem becomes cheaper than
link |
00:38:49.720
what existed and you have a new market that opens up, right? So especially in the world
link |
00:38:53.900
of commerce and applications, the impact of a small scientific progress could be huge.
link |
00:39:03.760
But in the science itself, I think it's very, very gradual.
link |
00:39:07.800
And where are these steps being taken now? So there's unsupervised learning.
link |
00:39:13.160
So if I look at one trend that I like in my community, so for example, at Mila, my institute,
link |
00:39:23.380
what are the two hottest topics? GANs and reinforcement learning. Even though in Montreal
link |
00:39:31.840
in particular, reinforcement learning was something pretty much absent just two or three
link |
00:39:37.020
years ago. So there's really a big interest from students and there's a big interest from
link |
00:39:44.280
people like me. So I would say this is something where we're going to see more progress, even
link |
00:39:51.560
though it hasn't yet provided much in terms of actual industrial fallout. Like even though
link |
00:39:58.680
there's AlphaGo, there's no, like Google is not making money on this right now. But I
link |
00:40:03.360
think over the long term, this is really, really important for many reasons.
link |
00:40:08.960
So in other words, I would say reinforcement learning, or maybe more generally agent learning,
link |
00:40:13.840
because it doesn't have to be with rewards. It could be in all kinds of ways that an agent
link |
00:40:17.520
is learning about its environment.
link |
00:40:20.720
Now reinforcement learning you're excited about, do you think GANs could provide something,
link |
00:40:28.840
at the moment? Well, GANs or other generative models, I believe, will be crucial ingredients
link |
00:40:38.880
in building agents that can understand the world. A lot of the successes in reinforcement
link |
00:40:45.480
learning in the past have been with policy gradient, where you just learn a policy, you
link |
00:40:51.160
don't actually learn a model of the world. But there are lots of issues with that. And
link |
00:40:55.760
we don't know how to do model based RL right now. But I think this is where we have to
link |
00:41:00.880
go in order to build models that can generalize faster and better like to new distributions
link |
00:41:09.340
that capture, to some extent at least, the underlying causal mechanisms in the world.
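To make the contrast concrete, here is a bare-bones policy-gradient (REINFORCE) loop on an invented one-step environment; the network sizes, reward rule, and hyperparameters are all illustrative assumptions. The agent improves its policy directly from sampled rewards and never builds an explicit model of how the environment works, which is exactly the limitation being pointed at.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def environment(state, action):
    # Hidden reward rule; the agent never represents these dynamics explicitly.
    return 1.0 if action == int(state.sum() > 0) else 0.0

for episode in range(500):
    state = torch.randn(4)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    reward = environment(state, action.item())
    # REINFORCE: increase the log-probability of actions in proportion to the reward.
    loss = -dist.log_prob(action) * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
# A model-based alternative would additionally learn a predictive model of the
# environment and plan with it, which is what could help generalize to new distributions.
```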
link |
00:41:16.120
Last question. What made you fall in love with artificial intelligence? If you look
link |
00:41:21.480
back, what was the first moment in your life when you were fascinated by either the human
link |
00:41:28.880
mind or the artificial mind?
link |
00:41:31.360
You know, when I was an adolescent, I was reading a lot. And then I started reading
link |
00:41:35.520
science fiction.
link |
00:41:36.520
There you go.
link |
00:41:37.520
That's it. That's where I got hooked. And then, you know, I had one of the first personal
link |
00:41:46.520
computers and I got hooked on programming. And so it just, you know,
link |
00:41:52.680
Start with fiction and then make it a reality.
link |
00:41:54.800
That's right.
link |
00:41:55.800
Yoshua, thank you so much for talking to me.
link |
00:41:57.560
My pleasure.