Ian Goodfellow: Generative Adversarial Networks (GANs) | Lex Fridman Podcast #19

The following is a conversation with Ian Goodfellow. He's the author of the popular textbook on deep learning, simply titled Deep Learning. He coined the term Generative Adversarial Networks, otherwise known as GANs, and with his 2014 paper is responsible for launching the incredible growth of research and innovation in this subfield. He got his BS and MS at Stanford and his PhD at the University of Montreal with Yoshua Bengio and Aaron Courville. He has held several research positions, including at OpenAI and Google Brain, and is now at Apple as the Director of Machine Learning. This recording happened while Ian was still at Google Brain, but we don't talk about anything specific to Google or any other organization. This conversation is part of the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube or iTunes, or simply connect with me on Twitter at Lex Fridman. And now, here's my conversation with Ian Goodfellow.

You open your popular deep learning book with a Russian doll type diagram that shows deep learning is a subset of representation learning, which in turn is a subset of machine learning, and finally a subset of AI. So this kind of implies that there may be limits to deep learning in the context of AI. So what do you think are the current limits of deep learning, and are those limits something that we can overcome with time?

Yeah, I think one of the biggest limitations of deep learning is that right now it requires really a lot of data, especially labeled data. There are some unsupervised and semi-supervised learning algorithms that can reduce the amount of labeled data you need, but they still require a lot of unlabeled data. Reinforcement learning algorithms don't need labels, but they need really a lot of experiences. As human beings, we don't learn to play Pong by failing at Pong 2 million times. So just getting the generalization ability better is one of the most important bottlenecks in the capability of the technology today. And then I guess I'd also say deep learning is like a component of a bigger system. So far, nobody is really proposing to have only what you'd call deep learning as the entire ingredient of intelligence. You use deep learning as submodules of other systems. AlphaGo, for example, has a deep learning model that estimates the value function. Most reinforcement learning algorithms have a deep learning module that estimates which action to take next, but you might have other components. So you're basically building a function estimator.

Do you think it's possible... You said nobody's kind of been thinking about this so far, but do you think neural networks could be made to reason in the way symbolic systems did in the 80s and 90s, to do more, to create something more like programs as opposed to functions?

Yeah, I think we already see that a little bit. I already kind of think of neural nets as a kind of program. I think of deep learning as basically learning programs that have more than one step. So if you draw a flow chart, or if you draw a TensorFlow graph describing your machine learning model, I think of the depth of that graph as describing the number of steps that run in sequence, and the width of that graph as the number of steps that run in parallel. Now it's been long enough that we've had deep learning working that it's a little bit silly to even discuss shallow learning anymore. But back when I first got involved in AI, when we used machine learning, we were usually learning things like support vector machines. You could have a lot of input features to the model and you could multiply each feature by a different weight. All those multiplications were done in parallel to each other; there wasn't a lot done in series. I think what we got with deep learning was really the ability to have steps of a program that run in sequence. And I think that we've actually started to see that what's important with deep learning is more the fact that we have a multi-step program rather than the fact that we've learned a representation. If you look at things like ResNets, for example, they take one particular kind of representation and they update it several times. Back when deep learning first really took off in the academic world in 2006, when Geoff Hinton showed that you could train deep belief networks, everybody who was interested in the idea thought of it as each layer learning a different level of abstraction: the first layer trained on images learns something like edges, the second layer learns corners, and eventually you get these kind of grandmother cell units that recognize specific objects. Today I think most people think of it more as a computer program where, as you add more layers, you can do more updates before you output your final number. But I don't think anybody believes that layer 150 of the ResNet is a grandmother cell and layer 100 is contours or something like that.

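The depth-versus-width picture can be sketched in a few lines of plain Python. This is an illustration under assumed toy numbers, not code from the conversation: `shallow_predict` (a hypothetical name) scores an input the way a linear model or linear SVM does, with every feature multiplied by its own weight in a single parallel step, while `residual_predict` keeps one state vector and refines it through several sequential blocks, loosely in the spirit of a ResNet. The weights are arbitrary, untrained values.

```python
import math


def shallow_predict(x, w, b):
    # One learned step: each feature is multiplied by its own weight
    # (all multiplications independent, i.e. "in parallel"), then summed.
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def residual_block(h, W):
    # One sequential step: compute an update from the current state and
    # add it back, so the representation is refined rather than replaced.
    update = [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in W]
    return [hi + ui for hi, ui in zip(h, update)]


def residual_predict(x, blocks, w_out, b_out):
    # Depth = number of blocks applied one after another;
    # width = number of units updated in parallel within a block.
    h = list(x)
    for W in blocks:
        h = residual_block(h, W)
    return shallow_predict(h, w_out, b_out)


x = [1.0, -0.5]
w_out, b_out = [0.3, 0.8], 0.1
blocks = [[[0.5, -0.2], [0.1, 0.4]]] * 3  # three sequential refinement steps
print(shallow_predict(x, w_out, b_out))
print(residual_predict(x, blocks, w_out, b_out))
```

Adding entries to `blocks` adds sequential steps (depth); lengthening the state vector adds parallel units (width). With an empty `blocks` list the residual model collapses back to the shallow one.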
Okay, so you're not thinking of it as a singular representation that keeps building. You think of it as a program, almost like a state, where the representation is a state of understanding.

Yeah, I think of it as a program that makes several updates and arrives at better and better understandings, but it's not replacing the representation at each step. And in some sense, that's a little bit like reasoning. It's not reasoning in the form of deduction, but it's reasoning in the form of taking a thought and refining it and refining it carefully until it's good enough to use.

So do you think, and I hope you don't mind that we'll jump philosophical every once in a while, do you think of cognition, human cognition, or even consciousness as simply a result of this kind of sequential representation learning? Do you think that can emerge?

Cognition, yes, I think so. Consciousness, it's really hard to even define what we mean by that. I guess consciousness is often defined as things like having self-awareness, and that's relatively easy to turn into something actionable for a computer scientist to reason about. People also define consciousness in terms of having qualitative states of experience, like qualia, and there are all these philosophical problems, like could you imagine a zombie who does all the same information processing as a human, but doesn't really have the qualitative experiences? That sort of thing, I have no idea how to formalize or turn into a scientific question. I don't know how you could run an experiment to tell whether a person is a zombie or not. And similarly, I don't know how you could run an experiment to tell whether an advanced AI system had become conscious in the sense of qualia or not.

But in the more practical sense, like almost like self-attention, do you think consciousness and cognition can, in an impressive way, emerge from current types of architectures that we think of as learning?

Or if you think of consciousness in terms of self-awareness and just making plans based on the fact that the agent itself exists in the world, reinforcement learning algorithms are already more or less forced to model the agent's effect on the environment. So that more limited version of consciousness is already something that we get limited versions of with reinforcement learning algorithms, if they're trained well.

But you say limited, so the big question really is how you jump from limited to human level, right? And whether it's possible. Even just building common sense reasoning seems to be exceptionally difficult. So if we scale things up, if we get much better on supervised learning, if we get better at labeling, if we get bigger data sets and more compute, do you think we'll start to see really impressive things that go from limited to something, echoes of human-level cognition?

I'm optimistic about what can happen just with more computation and more data. I do think it'll be important to get the right kind of data. Today, most of the machine learning systems we train are mostly trained on one type of data for each model. But with the human brain, we get all of our different senses, and we have many different experiences like riding a bike, driving a car, talking to people, reading. I think when we get that kind of integrated data set, working with a machine learning model that can actually close the loop and interact, we may find that algorithms not so different from what we have today learn really interesting things when you scale them up a lot and train them on a large amount of multimodal data.

So multimodal is really interesting, but within, like your work on adversarial examples: selecting within one mode of data, getting better at choosing the difficult cases that are most useful to learn from.

Oh yeah, like could we get a whole lot of mileage out of designing a model that's resistant to adversarial examples or something like that?

Right, that's the question.

My thinking on that has evolved a lot over the last few years. When I first started to really invest in studying adversarial examples, I was thinking of it mostly as: adversarial examples reveal a big problem with machine learning, and we would like to close the gap between how machine learning models respond to adversarial examples and how humans respond. After studying the problem more, I still think that adversarial examples are important. I think of them now more as a security liability than as an issue that necessarily shows there's something uniquely wrong with machine learning as opposed to humans.

Also, do you see them as a tool to improve the performance of the system? Not on the security side, but literally just accuracy.

I do see them as a kind of tool on that side, but maybe not quite as much as I used to think. We've started to find that there's a trade-off between accuracy on adversarial examples and accuracy on clean examples. Back in 2014, when I did the first adversarially trained classifier that showed resistance to some kinds of adversarial examples, it also got better at the clean data on MNIST. And that's something we've replicated several times on MNIST: when we train against weak adversarial examples, MNIST classifiers get more accurate. So far that hasn't really held up on other data sets, and it hasn't held up when we train against stronger adversaries. It seems like when you confront a really strong adversary, you tend to have to give something up.

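The adversarial-training idea can be illustrated on a toy problem. This plain-Python sketch is not the 2014 MNIST experiment; it assumes a logistic-regression classifier, two made-up Gaussian clusters, and the fast gradient sign method (FGSM), which nudges every input feature by `eps` in the direction that increases the loss, as the way adversarial examples are generated. All function names and hyperparameters are hypothetical choices made so the example runs.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    # For logistic regression with cross-entropy loss, d(loss)/dx = (p - y) * w.
    # FGSM moves every feature by eps in the direction that increases the loss.
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, eps=0.0, lr=0.1, epochs=200, seed=0):
    # eps == 0: ordinary training. eps > 0: each step also trains on an FGSM
    # example crafted against the current parameters (adversarial training).
    rng = random.Random(seed)
    data = list(data)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            batch = [(x, y)] + ([(fgsm(w, b, x, y, eps), y)] if eps > 0 else [])
            for xb, yb in batch:
                err = predict(w, b, xb) - yb  # d(loss)/d(logit)
                w = [wi - lr * err * xi for wi, xi in zip(w, xb)]
                b -= lr * err
    return w, b


def accuracy(w, b, data, eps=0.0):
    # eps > 0 evaluates on FGSM-perturbed copies of each example.
    correct = 0
    for x, y in data:
        xa = fgsm(w, b, x, y, eps) if eps > 0 else x
        correct += (predict(w, b, xa) >= 0.5) == (y == 1)
    return correct / len(data)


def make_blobs(n=100, seed=1):
    # Two made-up Gaussian clusters centred at (-1, -1) and (1, 1).
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        c = 1.0 if y else -1.0
        data.append(([rng.gauss(c, 0.7), rng.gauss(c, 0.7)], y))
    return data


data = make_blobs()
for eps_train in (0.0, 0.5):
    w, b = train(data, eps=eps_train)
    print(eps_train, accuracy(w, b, data), accuracy(w, b, data, eps=0.5))
```

For a linear model like this one, FGSM always lowers the correct-class logit by `eps` times the L1 norm of the weights, so accuracy on perturbed inputs can never exceed clean accuracy. A toy this small is not expected to reproduce the trade-off Ian describes, which shows up with richer models and stronger adversaries.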
But it's such a compelling idea, because it feels like that's how we humans learn: through the difficult cases. We try to think of what we would screw up, and then we make sure we fix that.

It's also how a lot of branches of engineering work: you do a worst-case analysis and make sure that your system will work in the worst case. And then that guarantees that it'll work in all of the messy average cases that happen when you go out into a really randomized world.

Yeah, with driving, with autonomous vehicles, there seems to be a desire to just think adversarially, to try to figure out how to mess up the system. And if you can be robust to all those difficult cases, then that's a hand-wavy, empirical way to show your system is safe.

Today, most adversarial example research isn't really focused on a particular use case, but there are a lot of different use cases where you'd like to make sure that the adversary can't interfere with the operation of your system. For example, if you have an algorithm making trades for you, people go to a lot of effort to obfuscate their algorithm. That's partly to protect their IP, because you don't want to research and develop a profitable trading algorithm and then have somebody else capture the gains. But it's also at least partly because you don't want people to make adversarial examples that fool your algorithm into making bad trades. Or I guess one area that's been popular in the academic literature is speech recognition. If you use speech recognition to hear an audio waveform and then turn that into a command that a phone executes for you, you don't want a malicious adversary to be able to produce audio that gets interpreted as malicious commands, especially if a human in the room doesn't realize that something like that is happening.

In speech recognition, has there been much success in being able to create adversarial examples that fool the system?

I guess the first work that I'm aware of is a paper called Hidden Voice Commands that came out in 2016, I believe. They were able to show that they could make sounds that are not understandable by a human but are recognized as the target phrase that the attacker wants the phone to recognize. Since then, things have gotten a little bit better on the attacker's side and worse on the defender's side. It's become possible to make sounds that sound like normal speech but are actually interpreted as a different sentence than the human hears. The level of perceptibility of the adversarial perturbation is still kind of high. When you listen to the recording, it sounds like there's some noise in the background, just like rustling sounds. But those rustling sounds are actually the adversarial perturbation that makes the phone hear a completely different sentence.

Yeah, that's so fascinating. Peter Norvig mentioned that you're writing the deep learning chapter for the fourth edition of the Artificial Intelligence: A Modern Approach book. So how do you even begin summarizing the field of deep learning in a chapter?

Well, in my case, I waited like a year before I actually wrote anything. Even having written a full-length textbook before, it's still pretty intimidating to try to start writing just one chapter that covers everything. One thing that helped me make that plan was actually the experience of having written the full book before and then watching how the field changed after the book came out. I've realized there are a lot of topics that were maybe extraneous in the first book, and just seeing what stood the test of a few years of being published, and what seems a little bit less important to have included now, helped me pare down the topics I wanted to cover for the book. It's also really nice now that the field has kind of stabilized to the point where some core ideas from the 1980s are still used today. When I first started studying machine learning, almost everything from the 1980s had been rejected, and now some of it has come back. So that stuff that's really stood the test of time is what I focused on putting into the book. There are also, I guess, two different philosophies about how you might write a book. One philosophy is you try to write a reference that covers everything. The other philosophy is you try to provide a high-level summary that gives people the language to understand a field and tells them what the most important concepts are. The first deep learning book that I wrote with Yoshua and Aaron was somewhere between the two philosophies, in that it's trying to be both a reference and an introductory guide. Writing this chapter for Russell and Norvig's book, I was able to focus more on just a concise introduction of the key concepts and the language you need to read about them more. In a lot of cases, I actually just wrote paragraphs that said, here's a rapidly evolving area that you should pay attention to. It's pointless to try to tell you what the latest and best version of a learning-to-learn model is. I can point you to a paper that's recent right now, but there isn't a whole lot of reason to delve into exactly what's going on with the latest learning-to-learn approach or the latest module produced by a learning-to-learn algorithm. You should know that learning to learn is a thing and that it may very well be the source of the latest and greatest convolutional net or recurrent net module that you would want to use in your latest project. But there isn't a lot of point in trying to summarize exactly which architecture and which learning approach got to which level of performance.

So you maybe focus more on the basics of the methodology, so from backpropagation to feedforward to recurrent neural networks, convolutional, that kind of thing? So if I were to ask you... I remember I took an algorithms and data structures course, and the professor asked, what is an algorithm? And then yelled at everybody, in a good way, that nobody was answering it correctly. It was a graduate course, and everybody knew what an algorithm was, but they weren't able to answer it well. So let me ask you in that same spirit, what is deep learning?

I would say deep learning is any kind of machine learning that involves learning parameters of more than one consecutive step. I mean, shallow learning is things where you learn a lot of operations that happen in parallel. You might have a system that makes multiple steps, like you might have hand-designed feature extractors, but really only one step is learned. Deep learning is anything where you have multiple operations in sequence, and that includes the things that are really popular today, like convolutional networks and recurrent networks. But it also includes some of the things that have died out, like Boltzmann machines, where we weren't using backpropagation. Today, I hear a lot of people define deep learning as gradient descent applied to these differentiable functions. And I think that's a legitimate usage of the term; it's just different from the way that I use the term myself.

So what's an example of deep learning that is not gradient descent on differentiable functions? I mean, not specifically perhaps, but more looking into the future, what's your thought about that space of approaches?

Yeah, so I tend to think of machine learning algorithms as decomposed into really three different pieces. There's the model, which can be something like a neural net or a Boltzmann machine or a recurrent model. That basically just describes how you take data and parameters, and what function you use to make a prediction given the data and the parameters. Another piece of the learning algorithm is the optimization algorithm. Not every algorithm can really be described in terms of optimization, but what's the algorithm for updating the parameters, or updating whatever the state of the network is? And then the last part is the data set: how do you actually represent the world as it comes into your machine learning system? So I think of deep learning as telling us something about what the model looks like. And basically, to qualify as deep, I say that it just has to have multiple layers. That can be multiple steps in a feedforward differentiable computation. That can be multiple layers in a graphical model. There are a lot of ways that you could satisfy me that something has multiple steps that are each parameterized separately. I think of gradient descent as being all about that other piece, the how-do-you-actually-update-the-parameters piece. So you could imagine having a deep model like a convolutional net and training it with something like evolution or a genetic algorithm, and I would say that still qualifies as deep learning. And then in terms of models that aren't necessarily differentiable, I guess Boltzmann machines are probably the main example of something where you can't really take a derivative and use that for the learning process. But you can still argue that the model has many steps of processing that it applies when you run inference in the model.

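Ian's three-way split into model, optimization algorithm, and data can be made concrete with a deep model that is trained without any gradients. In this plain-Python sketch the choices are assumptions for illustration: the model is a tiny two-layer tanh network, the data set is XOR, and the optimizer is a simple (1+1) evolution strategy (mutate every parameter with Gaussian noise and keep the child only if its loss is no worse), standing in for "something like evolution or a genetic algorithm".

```python
import math
import random

# Piece 3, the data set: XOR inputs and targets.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
N_HIDDEN = 3
N_PARAMS = 4 * N_HIDDEN + 1  # 3 per hidden unit (2 weights + bias) + output layer


def model(params, x):
    # Piece 1, the model: two parameterized layers applied in sequence.
    hidden = []
    for j in range(N_HIDDEN):
        w1, w2, bias = params[3 * j: 3 * j + 3]
        hidden.append(math.tanh(w1 * x[0] + w2 * x[1] + bias))
    out_w = params[3 * N_HIDDEN: 4 * N_HIDDEN]
    out_b = params[4 * N_HIDDEN]
    return sum(wj * hj for wj, hj in zip(out_w, hidden)) + out_b


def loss(params):
    return sum((model(params, x) - y) ** 2 for x, y in DATA) / len(DATA)


def evolve(steps=2000, sigma=0.1, seed=0):
    # Piece 2, the optimizer: a (1+1) evolution strategy. It only evaluates
    # the loss and never computes a derivative.
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 0.5) for _ in range(N_PARAMS)]
    parent_loss = start_loss = loss(parent)
    for _ in range(steps):
        child = [p + rng.gauss(0.0, sigma) for p in parent]
        child_loss = loss(child)
        if child_loss <= parent_loss:  # keep the mutation if it is no worse
            parent, parent_loss = child, child_loss
    return parent, start_loss, parent_loss


params, before, after = evolve()
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

Swapping `evolve` for gradient descent would change only the optimization piece; the model would be exactly as deep either way, which is the sense in which Ian would still call it deep learning.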
So it's the steps of processing that's key. So Geoff Hinton suggests that we need to throw away backpropagation and start all over. What do you think about that? What could an alternative direction of training neural networks look like?

I don't know that backpropagation is gonna go away entirely. Most of the time when we decide that a machine learning algorithm isn't on the critical path to research for improving AI, the algorithm doesn't die; it just becomes used for some specialized set of things. A lot of algorithms like logistic regression don't seem that exciting to AI researchers who are working on things like speech recognition or autonomous cars today. But there's still a lot of use for logistic regression in things like analyzing really noisy data in medicine and finance, or making really rapid predictions in really time-limited contexts. So I think backpropagation and gradient descent are around to stay, but they may not end up being everything that we need to get to real human-level or superhuman AI.

Are you optimistic about us discovering... Backpropagation has been around for a few decades. So are you optimistic about us as a community being able to discover something better?

I think we likely will find something that works better. You could imagine things like having stacks of models where some of the lower-level models predict parameters of the higher-level models. And so at the top level, you're not learning in terms of literally calculating gradients, but just predicting how different values will perform. You can kind of see that already in some areas like Bayesian optimization, where you have a Gaussian process that predicts how well different parameter values will perform. We already use those kinds of algorithms for things like hyperparameter optimization. And in general, we know a lot of things other than backprop that work really well for specific problems.

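Bayesian optimization, as described here, predicts how well a parameter value will perform instead of following a gradient. Below is a minimal plain-Python sketch of only the Gaussian-process part, under assumed choices: a zero-mean GP with a squared-exponential kernel, a fixed length scale and small noise term, and three made-up (hyperparameter, score) observations. A complete Bayesian-optimization loop would add an acquisition function, such as expected improvement, to pick the next value to try.

```python
import math


def rbf(a, b, length=1.0):
    # Squared-exponential kernel on scalar inputs.
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, y):
    # Gaussian elimination with partial pivoting for a small dense system A x = y.
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, length=1.0, noise=1e-6):
    # Posterior mean and variance of a zero-mean GP at x_new.
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x_new, a, length) for a in xs]
    mean = sum(k * al for k, al in zip(k_star, solve(K, ys)))
    v = solve(K, k_star)
    var = rbf(x_new, x_new, length) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)


# Made-up observations: log10(learning rate) -> validation accuracy.
xs = [-4.0, -3.0, -1.0]
ys = [0.70, 0.85, 0.60]
for x_new in (-3.5, -2.0, 0.0):
    mean, var = gp_posterior(xs, ys, x_new)
    print(f"log10(lr)={x_new:+.1f}: predicted score {mean:.3f} ± {math.sqrt(var):.3f}")
```

Because the prior mean is zero, predictions far from the observations drift toward 0 with variance near 1; a practical version would typically center the scores and fit the kernel hyperparameters rather than fixing them.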
The main thing we haven't found is a way of taking one of these other non-backprop-based algorithms and having it really advance the state of the art on an AI-level problem. But I wouldn't be surprised if eventually we find some way of customizing one of these algorithms, even the ones that already exist, not necessarily new ones, to do something really interesting at the level of cognition or the level of... I think one system that we really don't have working quite right yet is short-term memory. We have things like LSTMs, which are called long short-term memory, but they still don't do quite what a human does with short-term memory. Gradient descent, to learn a specific fact, has to do multiple steps on that fact. If I tell you the meeting today is at 3 p.m., I don't need to say over and over again, it's at 3 p.m., it's at 3 p.m., it's at 3 p.m., for you to do a gradient step on each one. You just hear it once and you remember it. There's been some work on things like self-attention and attention-like mechanisms, like the Neural Turing Machine, that can write to memory cells and update themselves with facts like that right away. But I don't think we've really nailed it yet. And that's one area where I'd imagine that new optimization algorithms, or different ways of applying existing optimization algorithms, could give us a way of just lightning-fast updating the state of a machine learning system to contain a specific fact like that without needing to have it presented over and over and over again.

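The "3 p.m." example can be made concrete with a linear associative memory: a matrix `W` that should map a key vector `k` (the question) to a value vector `v` (the fact). This plain-Python sketch uses hypothetical toy encodings. With an ordinary small learning rate, gradient descent on the squared error has to see the same fact hundreds of times; with the learning rate set to `1 / (k · k)`, a single gradient step stores it exactly. It only illustrates one "different way of applying an existing optimization algorithm" and is not a method proposed in the conversation.

```python
def matvec(W, k):
    return [sum(wij * kj for wij, kj in zip(row, k)) for row in W]


def sgd_step(W, k, v, lr):
    # Gradient of 0.5 * ||W k - v||^2 with respect to W is (W k - v) k^T.
    err = [a - b for a, b in zip(matvec(W, k), v)]
    return [[wij - lr * ei * kj for wij, kj in zip(row, k)]
            for row, ei in zip(W, err)]


def steps_to_learn(k, v, lr, tol=1e-3, max_steps=10_000):
    # How many times must the same fact be presented before W k is close to v?
    # Each step multiplies the error W k - v by (1 - lr * (k . k)).
    W = [[0.0] * len(k) for _ in v]
    for step in range(1, max_steps + 1):
        W = sgd_step(W, k, v, lr)
        if max(abs(a - b) for a, b in zip(matvec(W, k), v)) < tol:
            return step
    return max_steps


k = [1.0, 0.0, 1.0]  # hypothetical encoding of "when is the meeting?"
v = [3.0, 0.0]       # hypothetical encoding of "3 p.m."
k_dot_k = sum(x * x for x in k)
print(steps_to_learn(k, v, lr=0.01))           # many repetitions needed
print(steps_to_learn(k, v, lr=1.0 / k_dot_k))  # 1: one step stores the fact
```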
So some of the success of symbolic systems in the 80s is that they were able to assemble these kinds of facts better. But there's a lot of expert input required, and it's very limited in that sense. Do you ever look back to that as something that we'll have to return to eventually? Sort of dust off the book from the shelf and think about how we build knowledge representation, knowledge bases. Like, will we have to use graph searches?

Graph searches, right. And like first-order logic and entailment and things like that.

That kind of thing, yeah, exactly.

In my particular line of work, which has mostly been machine learning security and also generative modeling, I haven't usually found myself moving in that direction. For generative models, I could see a little bit of... it could be useful if you had something like a differentiable knowledge base, or some other kind of knowledge base where it's possible for some of our fuzzier machine learning algorithms to interact with a knowledge base.

I mean, a neural network is kind of like that. It's a differentiable knowledge base of sorts.

If we had a really easy way of giving feedback to machine learning models, that would clearly help a lot with generative models. And so you could imagine one way of getting there would be to get a lot better at natural language processing. But another way of getting there would be to take some kind of knowledge base and figure out a way for it to actually interact with a neural network.

Being able to have a chat with a neural network.

So one thing in generative models we see a lot today is you'll get things like faces that are not symmetrical, like people that have two eyes that are different colors. I mean, there are people with eyes that are different colors in real life, but not nearly as many of them as you tend to see in machine-learning-generated data. So if you had a knowledge base that could contain the fact that people's faces are generally approximately symmetric, and eye color is especially likely to be the same on both sides, being able to just inject that hint into the machine learning model, without it having to discover that itself after studying a lot of data, would be a really useful feature. I could see a lot of ways of getting there without bringing back some of the 1980s technology, but I also see some ways that you could imagine extending the 1980s technology to play nice with neural nets and have it help get there.

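One hypothetical way to inject the hint that faces are roughly left-right symmetric, without any knowledge base, is to score generated images with a mirror-symmetry penalty that a training objective could add to its loss. The function names and the use as a regularizer are assumptions for illustration, not something proposed in the conversation; images are represented here as lists of rows of pixel values.

```python
def mirror(image):
    # Flip each row left to right.
    return [row[::-1] for row in image]


def symmetry_penalty(image):
    # Mean squared difference between an image and its mirror image:
    # 0.0 for a perfectly left-right symmetric image, larger otherwise.
    diffs = [(a - b) ** 2
             for row, flipped in zip(image, mirror(image))
             for a, b in zip(row, flipped)]
    return sum(diffs) / len(diffs)


# Toy 1 x 4 "face" rows: pixel values stand in for eye colour at each position.
matching_eyes = [[0.0, 0.8, 0.8, 0.0]]
mismatched_eyes = [[0.0, 0.8, 0.2, 0.0]]
print(symmetry_penalty(matching_eyes))    # 0.0
print(symmetry_penalty(mismatched_eyes))  # positive
```

A real face model would need the face to be centered and aligned for this to be meaningful, and a hard symmetry penalty would also punish the genuine asymmetries Ian mentions, so in practice it could only act as a soft hint.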
So you talked about the story of you coming up with the idea of GANs at a bar with some friends. You were arguing that this, you know, GANs would work, generative adversarial networks, and the others didn't think so. Then you went home at midnight, coded it up, and it worked. So if I was a friend of yours at the bar, I would also have doubts. It's a really nice idea, but I'm very skeptical that it would work. What was the basis of their skepticism? What was the basis of your intuition for why it should work?

I don't want to be someone who goes around promoting alcohol for the purposes of science, but I do actually think that drinking helped a little bit. When your inhibitions are lowered, you're more willing to try out things that you wouldn't try out otherwise. So I have noticed in general that I'm less prone to shooting down some of my own ideas when I have had a little bit to drink. I think if I had had that idea at lunchtime, I probably would have thought, it's hard enough to train one neural net, you can't train a second neural net in the inner loop of the outer neural net. That was basically my friends' objection: that trying to train two neural nets at the same time would be too hard.

So it was more about the training process. Whereas my skepticism would be, you know, I'm sure you could train it, but the thing it would converge to would not be able to generate anything reasonable, any kind of reasonable realism.

Yeah, so part of what all of us were thinking about when we had this conversation was deep Boltzmann machines, which a lot of us in the lab, including me, were big fans of at the time. They involved two separate processes running at the same time. One of them is called the positive phase, where you load data into the model and tell the model to make the data more likely. The other one is called the negative phase, where you draw samples from the model and tell the model to make those samples less likely. In a deep Boltzmann machine, it's not trivial to generate a sample. You have to actually run an iterative process that gets better and better samples, coming closer and closer to the distribution the model represents. So during the training process, you're always running these two systems at the same time: one that's updating the parameters of the model, and another one that's trying to generate samples from the model. And they worked really well on things like MNIST, but a lot of us in the lab, including me, had tried to get deep Boltzmann machines to scale past MNIST to things like generating color photos, and we just couldn't get the two processes to stay synchronized.

So when I had the idea for GANs,
link |
a lot of people thought that the discriminator
link |
would have more or less the same problem
link |
as the negative phase in the Bolton machine,
link |
that trying to train the discriminator in the inner loop,
link |
you just couldn't get it to keep up
link |
with the generator in the outer loop,
link |
and that would prevent it from converging
link |
to anything useful.
link |
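The positive and negative phases can be illustrated with a much smaller relative of the deep Boltzmann machine: a fully visible Boltzmann machine over units that take the values -1 and +1, with no hidden layers. This is a simplified sketch, not how deep Boltzmann machines were actually trained. The positive phase measures pairwise correlations on the data, the negative phase measures them on samples from persistent Gibbs chains that must keep pace with the changing parameters, and each weight moves by the difference. The function names, learning rate, step count, and number of chains are illustrative assumptions.

```python
import math
import random


def gibbs_sweep(state, W, b, rng):
    """Resample every unit once, in order, given the current values of the others.

    Units take values in {-1, +1}; the energy is
    E(s) = -sum_{i<j} W[i][j] * s_i * s_j - sum_i b[i] * s_i, with W symmetric.
    """
    n = len(state)
    for i in range(n):
        # Local field on unit i from its bias and all other units.
        field = b[i] + sum(W[i][j] * state[j] for j in range(n) if j != i)
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(s_i = +1 | rest)
        state[i] = 1 if rng.random() < p_plus else -1
    return state


def correlations(samples, n):
    """Average of s_i * s_j over a list of samples."""
    C = [[0.0] * n for _ in range(n)]
    for s in samples:
        for i in range(n):
            for j in range(n):
                C[i][j] += s[i] * s[j] / len(samples)
    return C


def train(data, steps=200, lr=0.05, chains=20, seed=0):
    rng = random.Random(seed)
    n = len(data[0])
    W = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Persistent chains for the negative phase (hypothetical settings).
    fantasy = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(chains)]
    # Positive phase: statistics of the data (constant here, since the model is fully visible).
    pos = correlations(data, n)
    pos_mean = [sum(s[i] for s in data) / len(data) for i in range(n)]
    for _ in range(steps):
        # Negative phase: advance the chains one sweep under the current parameters.
        fantasy = [gibbs_sweep(s, W, b, rng) for s in fantasy]
        neg = correlations(fantasy, n)
        neg_mean = [sum(s[i] for s in fantasy) / chains for i in range(n)]
        # Raise the probability of the data, lower that of the model's own samples.
        for i in range(n):
            b[i] += lr * (pos_mean[i] - neg_mean[i])
            for j in range(n):
                if i != j:
                    W[i][j] += lr * (pos[i][j] - neg[i][j])
    return W, b


# Toy data: units 0 and 1 always agree; unit 2 is unrelated to them.
data = [[1, 1, 1], [1, 1, -1], [-1, -1, 1], [-1, -1, -1]]
W, b = train(data)
print(round(W[0][1], 2), round(W[0][2], 2))
```

Because the negative-phase chains take only one Gibbs sweep per parameter update, they can lag behind the model; that lag is a small-scale version of the synchronization problem Ian describes.
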
Yeah, I share that intuition. But it turns out not to be the case.

A lot of the time with machine learning algorithms, it's really hard to predict ahead of time how well they'll actually perform. You have to just run the experiment and see what happens. And I would say that even today I don't have one factor I can put my finger on and say, this is why GANs worked for photo generation and deep Boltzmann machines don't. There are a lot of theory papers showing that under some theoretical settings the GAN algorithm does actually converge, but those settings are restricted enough that they don't necessarily explain the whole picture in terms of all the results that we see in practice.

So taking a step back, can you, in the same way as we talked about deep learning, tell me what generative adversarial networks are?

Yeah, so generative adversarial networks are a particular kind of generative model. A generative model is a machine learning model that can train on some set of data. Say you have a collection of photos of cats and you want to generate more photos of cats, or you want to estimate a probability distribution over cats, so you can ask how likely it is that some new image is a photo of a cat. GANs are one way of doing this. Some generative models are good at creating new data. Other generative models are good at estimating that density function and telling you how likely particular pieces of data are to come from the same distribution as the training data. GANs are more focused on generating samples rather than estimating the density function. There are some kinds of GANs, like Flow-GAN, that can do both, but mostly GANs are about generating samples, generating new photos of cats that look realistic.

And they do that completely from scratch. It's analogous to human imagination. When a GAN creates a new image of a cat, it's using a neural network to produce a cat that has not existed before. It isn't doing something like compositing photos together. You're not literally taking the eye off of one cat and the ear off of another cat. It's more of this digestive process where the neural net trains on a lot of data, comes up with some representation of the probability distribution, and generates entirely new cats.

There are a lot of different ways of building a generative model. What's specific to GANs is that we have a two-player game in the game-theoretic sense. And as the players in this game compete, one of them becomes able to generate realistic data. The first player is called the generator. It produces output data, such as images. At the start of the learning process, it'll just produce completely random images. The other player is called the discriminator. The discriminator takes images as input and guesses whether they're real or fake. You train it on real data, so photos that come from your training set, actual photos of cats, and you train it to say that those are real. You also train it on images that come from the generator network, and you train it to say that those are fake. As the two players compete in this game, the discriminator tries to become better at recognizing whether images are real or fake, and the generator becomes better at fooling the discriminator into thinking that its outputs are real. You can analyze this through the language of game theory and find that there's a Nash equilibrium where the generator has captured the correct probability distribution. So in the cat example, it makes perfectly realistic cat photos, and the discriminator is unable to do better than random guessing, because all the samples coming from both the data and the generator look equally likely to have come from either source.

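The equilibrium described here can be checked on a toy discrete space. This sketch assumes the minimax value function from the original GAN formulation, V(D, G) = E_data[log D(x)] + E_G[log(1 - D(x))]. For a fixed generator, the best-responding discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)); when the generator matches the data, D* outputs 1/2 everywhere and V equals -log 4, which is the "no better than random guessing" condition. The three-category distributions are made up for illustration.

```python
import math


def optimal_discriminator(p_data, p_g):
    """Best response to a fixed generator: D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return {x: p_data[x] / (p_data[x] + p_g[x]) for x in p_data if p_data[x] + p_g[x] > 0}


def value(p_data, p_g, D):
    """V(D, G) = E_{x~data}[log D(x)] + E_{x~G}[log(1 - D(x))] on a discrete space."""
    v = 0.0
    for x in D:
        if p_data[x] > 0:
            v += p_data[x] * math.log(D[x])
        if p_g[x] > 0:
            v += p_g[x] * math.log(1.0 - D[x])
    return v


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions on the same keys."""
    m = {x: 0.5 * (p[x] + q[x]) for x in p}

    def kl_to_m(a):
        return sum(a[x] * math.log(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


p_data = {"cat": 0.5, "dog": 0.3, "car": 0.2}
bad_g = {"cat": 0.1, "dog": 0.1, "car": 0.8}  # over-produces cars
perfect_g = dict(p_data)  # matches the data exactly

D_perfect = optimal_discriminator(p_data, perfect_g)
print(D_perfect)  # {'cat': 0.5, 'dog': 0.5, 'car': 0.5}
print(value(p_data, perfect_g, D_perfect), -math.log(4))
```

For any generator, V(D*, G) = -log 4 + 2 JSD(p_data, p_g), so the generator's best move against an optimal discriminator is to drive the Jensen-Shannon divergence to zero.
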
So do you ever sit back and does it just blow your mind that this thing works, that it's able to estimate that density function well enough to generate realistic images? Do you ever sit back and think, how does this even work? This is quite incredible, especially where GANs have gone in terms of realism.

Yeah, and not just to flatter my own work, but generative models, all of them, have this property that if they really did what we ask them to do, they would do nothing but memorize the training data. For models that are based on maximizing the likelihood, the way that you obtain the maximum likelihood for a specific training set is to assign all of your probability mass to the training examples and nowhere else. For GANs, the game is played using a training set, so the way that you become unbeatable in the game is to literally memorize training examples.

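The memorization point for likelihood-based models can be checked numerically: on a fixed training set, the empirical distribution (all probability mass on the training examples, in proportion to their counts) scores a higher log-likelihood than a smoothed model that reserves mass for unseen points. The tiny dataset and the add-one smoothing are illustrative assumptions, not anything from the conversation.

```python
import math
from collections import Counter


def log_likelihood(model, data):
    """Sum of log-probabilities the model assigns to the training examples."""
    return sum(math.log(model[x]) for x in data)


def empirical(data):
    """All mass on the training examples, in proportion to how often each appears."""
    counts = Counter(data)
    return {x: c / len(data) for x, c in counts.items()}


def smoothed(data, support, alpha=1.0):
    """Add-alpha smoothing: reserves some mass for points never seen in training."""
    counts = Counter(data)
    total = len(data) + alpha * len(support)
    return {x: (counts[x] + alpha) / total for x in support}


train = ["a", "a", "b", "c"]
support = ["a", "b", "c", "d", "e"]  # "d" and "e" are plausible but unseen
memorizer = empirical(train)
generalizer = smoothed(train, support)
print(log_likelihood(memorizer, train) > log_likelihood(generalizer, train))  # True
print(memorizer.get("d", 0.0))  # 0.0: no mass at all on new data
```

The model that generalizes is penalized by the training objective itself, which is why the fact that generative models produce good novel samples is surprising.
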
One of my former interns, Vaishnavh Nagarajan, wrote a paper showing that it's actually hard for the generator to memorize the training data, hard in a statistical learning theory sense: you can actually give reasons for why it would require quite a lot of learning steps and a lot of observations of different latent variables before you could memorize the training data.

That still doesn't really explain why, when you produce samples that are new, you get compelling images rather than just garbage that's different from the training set. And I don't think we really have a good answer for that, especially if you think about how many possible images are out there and how few images the generative model sees during training. It seems just unreasonable that generative models create new images as well as they do, especially considering that we're basically training them to memorize rather than generalize.

I think part of the answer is a paper called Deep Image Prior, where they show that you can take a convolutional net and you don't even need to learn its parameters at all; you just use the model architecture, and it's already useful for things like inpainting images. I think that shows us that the convolutional network architecture captures something really important about the structure of images, and we don't need to actually use learning to capture all the information coming out of the convolutional net.

That would imply that it would be much harder to make generative models in other domains. So far, we're able to make reasonable speech models and things like that, but to be honest, we haven't actually explored a whole lot of different data sets. We don't, for example, see a lot of deep learning models of biology data sets, where you have lots of microarrays measuring the amount of different enzymes and things like that. So we may find that some of the progress we've seen for images and speech turns out to rely heavily on the model architecture, and that we were able to do what we did for vision by trying to reverse engineer the human visual system. And maybe it'll turn out that we can't just use that same trick for arbitrary kinds of data.

Right, so there's an aspect of the human vision system, the hardware of it, that, without learning, without cognition, just makes it really effective at detecting the patterns we see in the visual world. Yeah, that's really interesting. In a big, quick overview, in your view, what types of GANs are there, and what other generative models besides GANs are there?

Yeah, so it's maybe a little bit easier to start with what kinds of generative models there are other than GANs. Most generative models are likelihood based: to train them, you have a model that tells you how much probability it assigns to a particular example, and you just maximize the probability assigned to all the training examples. It turns out that it's hard to design a model that can create really complicated images or really complicated audio waveforms and still have it be possible to estimate the likelihood function from a computational point of view. For most interesting models that you would just write down intuitively, it turns out that it's almost impossible to calculate the amount of probability they assign to a particular point. So there are a few different schools of generative models in the likelihood family.

One approach is to very carefully design the model so that it is computationally tractable to measure the density it assigns to a particular point. So there are things like autoregressive models, like PixelCNN, which basically break down the probability distribution into a product over every single feature. So for an image, you estimate the probability of each pixel given all of the pixels that came before it. There are tricks where, if you want to measure the density function, you can actually calculate the density for all these pixels more or less in parallel. Generating the image still tends to require you to go one pixel at a time, and that can be very slow. But there are, again, tricks for doing this in a hierarchical pattern where you can keep the runtime under control.

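The autoregressive factorization p(x) = product over i of p(x_i | x_<i) can be sketched on a tiny binary "image" flattened into a sequence. The conditional below is a hypothetical hand-written logistic rule standing in for a PixelCNN; only the structure is the point: evaluating the density needs each conditional once, and those terms could all be computed in parallel given x, while sampling must go one pixel at a time.

```python
import itertools
import math
import random


def cond_prob_one(prefix):
    """Hypothetical P(x_i = 1 | x_<i): each pixel tends to copy the one before it."""
    if not prefix:
        return 0.5
    logit = 2.0 if prefix[-1] == 1 else -2.0
    return 1.0 / (1.0 + math.exp(-logit))


def log_prob(x):
    """log p(x) = sum_i log p(x_i | x_<i)."""
    total = 0.0
    for i, xi in enumerate(x):
        # Each term only needs the observed prefix, so all terms could run in parallel.
        p1 = cond_prob_one(x[:i])
        total += math.log(p1 if xi == 1 else 1.0 - p1)
    return total


def sample(n, rng):
    """Draw pixels one at a time; pixel i must wait for pixels 0..i-1."""
    x = []
    for _ in range(n):
        x.append(1 if rng.random() < cond_prob_one(x) else 0)
    return x


# The factorized probabilities of all 2**4 binary "images" sum to one.
total = sum(math.exp(log_prob(list(bits))) for bits in itertools.product([0, 1], repeat=4))
print(round(total, 6))  # 1.0
print(sample(4, random.Random(0)))
```

Normalization comes for free from the chain rule of probability, which is what makes the likelihood tractable for this family.
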
Is the quality of the images it generates, putting runtime aside, pretty good?

They're reasonable, yeah. I would say a lot of the best results are from GANs these days, but it can be hard to tell how much of that is based on who's studying which type of algorithm, if that makes sense.

The amount of effort invested in a particular one.

Yeah, or like the kind of expertise. A lot of people who've traditionally been excited about graphics or art and things like that have gotten interested in GANs. And to some extent, it's hard to tell: are GANs doing better because they have a lot of graphics and art experts behind them, because they're more computationally efficient, or because they prioritize the realism of samples over the accuracy of the density function? I think all of those are potentially valid explanations, and it's hard to tell.

So can you give a brief history of GANs from 2014? Was your paper in 2013?

Yeah, so a few highlights. In the first paper, we just showed that GANs basically work. If you look back at the samples we had then, they look terrible now. On the CIFAR-10 data set, you can't even recognize objects in them.

Your paper, sorry, you used CIFAR-10?

We used MNIST, which is little handwritten digits. We used the Toronto Face Database, which is small grayscale photos of faces. We did have recognizable faces; my colleague Bing Xu put together the first GAN face model for that paper. We also had the CIFAR-10 data set, which is things like very small 32 by 32 pixel images of cars and cats and dogs. For that, we didn't get recognizable objects, but all the deep learning people back then were really used to looking at these failed samples and kind of reading them like tea leaves. And people who were used to reading the tea leaves recognized that our tea leaves at least looked different. Maybe not necessarily better, but there was something unusual about them, and that got a lot of us excited.

One of the next really big steps was LAPGAN by Emily Denton and Soumith Chintala at Facebook AI Research, where they actually got really good high-resolution photos working with GANs for the first time. They had a complicated system where they generated the image starting at low res and then scaling up to high res, but they were able to get it to work. And then in 2015, I believe later that same year, Alec Radford, Soumith Chintala, and Luke Metz published the DCGAN paper, which stands for deep convolutional GAN. It's kind of a non-unique name, because these days basically all GANs, and even some before that, were deep and convolutional, but they just kind of picked a name for a really great recipe where, using only one model instead of a multi-step process, they were able to actually generate realistic images of faces and things like that.

That was sort of like the beginning of the Cambrian explosion of GANs. Like once you had animals that had a backbone, you suddenly got lots of different versions of fish and four-legged animals and things like that. So DCGAN became kind of the backbone for many different models that came out. It's used as a baseline even still.

And so from there, I would say there's a lot you can say about how just the quality of standard image generation GANs has increased, but what's maybe more interesting on an intellectual level is how the things you can use GANs for have also changed. One thing is that you can use them to learn classifiers without having to have class labels for every example in your training set. That's called semi-supervised learning. My colleague at OpenAI, Tim Salimans, who's at Brain now, wrote a paper called Improved Techniques for Training GANs. I'm a coauthor on this paper, but I can't claim any credit for this particular part. One thing he showed in the paper is that you can take the GAN discriminator and use it as a classifier that actually tells you this image is a cat, this image is a dog, this image is a car, this image is a truck, and so on; not just to say whether the image is real or fake, but, if it is real, to say specifically what kind of object it is. And he found that you can train these classifiers with far fewer labeled examples than traditional classifiers.

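A sketch of how a classifier's outputs can double as a discriminator, following the K-class formulation described in Improved Techniques for Training GANs as we understand it: the discriminator produces K class logits for real data and the extra "fake" logit is fixed at 0, so P(real | x) = Z(x) / (Z(x) + 1) with Z(x) = sum over k of exp(l_k(x)). Only the probability bookkeeping is shown; the logits are made-up numbers rather than the output of a trained network, and `ssgan_outputs` is a hypothetical helper name.

```python
import math


def ssgan_outputs(logits):
    """Map K real-class logits to (P(real | x), P(class | x, real)).

    The (K+1)-th "fake" logit is fixed at 0, so P(real) = Z / (Z + 1) with Z = sum(exp(l_k)).
    """
    m = max(logits + [0.0])  # shift by the largest logit (fake logit included) for stability
    exps = [math.exp(l - m) for l in logits]
    fake = math.exp(0.0 - m)
    z = sum(exps)
    p_real = z / (z + fake)  # the shift cancels, leaving Z / (Z + 1)
    class_probs = [e / z for e in exps]  # softmax over the K real classes
    return p_real, class_probs


logits = [2.0, 0.5, -1.0]  # made-up scores for "cat", "dog", "car"
p_real, class_probs = ssgan_outputs(logits)
print(round(p_real, 3), [round(p, 3) for p in class_probs])
```

In this setup, labeled images train the class probabilities with ordinary cross-entropy, while unlabeled real images and generator samples train only P(real), which is how a small number of labels can go a long way.
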
So if you supervise based not just on your discrimination ability but also on your ability to classify, you're going to converge much faster to being effective at being a discriminator.

So for example, for the MNIST dataset, you want to look at an image of a handwritten digit and say whether it's a zero, a one, a two, and so on. Getting down to less than 1% error required around 60,000 labeled examples until maybe about 2014 or so. In 2016, with this semi-supervised GAN project, Tim was able to get below 1% error using only 100 labeled examples. So that was about a 600x decrease in the number of labels that he needed. He's still using more images than that, but he doesn't need to have each of them labeled as this one's a one, this one's a two, this one's a zero, and so on.

Then, for GANs to be able to generate recognizable objects, so objects from a particular class, you still need labeled data, because you need to know what it means to be a particular class, cat or dog. How do you think we can move away from that?

Yeah, some researchers at Brain Zurich actually just released a really great paper on semi-supervised GANs, where their goal isn't to classify, it's to make recognizable objects despite not having a lot of labeled data. They were working off of DeepMind's BigGAN project, and they showed that they can match the performance of BigGAN using only 10%, I believe, of the labels. BigGAN was trained on the ImageNet data set, which is about 1.2 million images, and had all of them labeled. This latest project from Brain Zurich shows that they're able to get away with only having about 10% of the images labeled. And they do that essentially using a clustering algorithm, where the discriminator learns to assign the objects to groups. This understanding that objects can be grouped into similar types then helps it form more realistic ideas of what should be appearing in the image, because it knows that every image it creates has to come from one of these archetypal groups rather than just being some arbitrary image.

If you train a GAN with no class labels, you tend to get things that look sort of like grass or water or brick or dirt, but without necessarily a lot going on in them. And I think that's partly because if you look at a large ImageNet image, the object doesn't necessarily occupy the whole image. And so you learn to create realistic sets of pixels, but you don't necessarily learn that the object is the star of the show and you want it to be in every image you make.

Yeah, I've heard you talk about the horse-to-zebra CycleGAN mapping and how it turns out, again thought-provokingly, that horses are usually on grass and zebras are usually on drier terrain. So when you're doing that kind of generation, you're going to end up generating greener horses or whatever, so those are connected together. You're not able to segment, to generate in a segmented way. So are there other types of games you've come across in your mind that neural networks can play with each other to be able to solve problems?

Yeah, the one that I spend most of my time on is in security. You can model most interactions as a game where there are attackers trying to break your system and you're the defender trying to build a resilient system. There's also domain adversarial learning, which is an approach to domain adaptation that looks really a lot like GANs. The authors had the idea before the GAN paper came out; their paper came out a little bit later, and they're very nice and cited the GAN paper, but I know that they actually had the idea before it came out.

Domain adaptation is when you want to train a machine learning model in one setting, called a domain, and then deploy it in another domain later. And you would like it to perform well in the new domain, even though the new domain is different from the one it was trained on. So for example, you might want to train on a really clean image data set like ImageNet, but then deploy on users' phones, where the user is taking pictures in the dark, pictures while moving quickly, and just pictures that aren't really centered or composed all that well. When you take a normal machine learning model, it often degrades really badly when you move to the new domain, because the data looks so different from what the model was trained on. Domain adaptation algorithms try to smooth out that gap, and the domain adversarial approach is based on training a feature extractor where the features have the same statistics regardless of which domain you extracted them from.

So in the domain adversarial game, you have one player that's a feature extractor and another player that's a domain recognizer. The domain recognizer wants to look at the output of the feature extractor and guess which of the two domains the features came from. So it's a lot like the real versus fake discriminator in GANs. And the feature extractor you can think of as loosely analogous to the generator in GANs, except what it's trying to do here is both fool the domain recognizer into not knowing which domain the data came from and also extract features that are good for classification. So at the end of the day, in the cases where it works out, you can actually get features that work about the same in both domains. Sometimes this has a drawback where, in order to make things work the same in both domains, it just gets worse at the first one. But there are a lot of cases where it actually works out well on both.

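One common way to implement this game is a gradient reversal step, as in Ganin and Lempitsky's domain-adversarial training; whether the authors Ian mentions used exactly this formulation is an assumption. The sketch below makes strong simplifications: a scalar linear feature f = w . x, a logistic domain recognizer on f, and hand-derived gradients. The recognizer descends its loss; the feature extractor receives the same gradient multiplied by -lam, so it moves to make the domains harder to tell apart. The task loss on labeled source data is omitted to isolate the adversarial term, and all names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_grads(w, a, b, batch):
    """Cross-entropy of a logistic domain recognizer applied to features f = w . x.

    batch holds (x, domain) pairs, domain 0 for source and 1 for target.
    Returns (loss, dL/dw, dL/da, dL/db), averaged over the batch.
    """
    loss, gw, ga, gb = 0.0, [0.0] * len(w), 0.0, 0.0
    for x, d in batch:
        f = sum(wi * xi for wi, xi in zip(w, x))
        p = sigmoid(a * f + b)  # recognizer's P(target | features)
        loss -= d * math.log(p) + (1 - d) * math.log(1.0 - p)
        err = p - d  # derivative of the loss with respect to the logit a*f + b
        ga += err * f
        gb += err
        for k in range(len(w)):
            gw[k] += err * a * x[k]
    n = len(batch)
    return loss / n, [g / n for g in gw], ga / n, gb / n


def adversarial_step(w, a, b, batch, lr=0.1, lam=1.0):
    _, gw, ga, gb = domain_loss_grads(w, a, b, batch)
    # Domain recognizer: ordinary gradient descent on its own loss.
    a_new, b_new = a - lr * ga, b - lr * gb
    # Feature extractor: the reversed (-lam) gradient, i.e. ascent on the recognizer's loss.
    w_new = [wi + lr * lam * g for wi, g in zip(w, gw)]
    return w_new, a_new, b_new


# Inputs are (label_signal, domain_cue); only target-domain inputs have domain_cue = 1.
batch = [((1.0, 0.0), 0), ((-1.0, 0.0), 0), ((1.0, 1.0), 1), ((-1.0, 1.0), 1)]
w, a, b = [1.0, 1.0], 1.0, 0.0  # the feature currently leaks the domain cue
w_new, a_new, b_new = adversarial_step(w, a, b, batch)
print(w_new[1] < w[1])  # True: the feature extractor shrinks its weight on the cue
```

In a full setup the reversal sits inside an autodiff graph and the feature extractor also minimizes a classification loss on labeled source data. Only a single step is shown because, over longer runs, the two players keep chasing each other and the dynamics depend heavily on the learning rates and lam.
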
So do you think of GANs as being useful in the context of data augmentation?

Yeah, one thing you could hope for with GANs is this: you could imagine I've got a limited training set and I'd like to make more training data to train something else, like a classifier. You could train the GAN on the training set and then create more data, and then maybe the classifier would perform better on the test set after training on this bigger GAN-generated data set. So that's the simplest version of something you might hope would work. I've never heard of that particular approach working, but I think there are some closely related things that could work in the future, and some that actually already have worked.

If we think a little bit about what we'd be hoping for if we use the GAN to make more training data, we're hoping that the GAN will generalize to new examples better than the classifier would have generalized if it was trained on the same data. And I don't know of any reason to believe that the GAN would generalize better than the classifier would, but what we might hope for is that the GAN could generalize differently from a specific classifier. So one thing I think is worth trying, that I haven't personally tried but someone could try, is: what if you trained a whole lot of different generative models on the same training set, created samples from all of them, and then trained a classifier on that? Each of the generative models might generalize in a slightly different way. They might capture many different axes of variation that one individual model wouldn't, and then the classifier can capture all of those ideas by training on all of their data. So it'd be a little bit like making an ensemble of classifiers. And I think that...

Ensemble of GANs, in a way.

I think that could generalize better. The other thing that GANs are really good for is not necessarily generating new data that's exactly like what you already have, but generating new data that has different properties from the data you already had.

One thing that you can do is create differentially private data. Suppose that you have something like medical records, and you don't want to train a classifier on the medical records and then publish the classifier, because someone might be able to reverse engineer some of the medical records you trained on. There's a paper from Casey Greene's lab that shows how you can train a GAN using differential privacy, and then the samples from the GAN still have the same differential privacy guarantees as the parameters of the GAN. So you can make fake patient data for other researchers to use, and they can do almost anything they want with that data because it doesn't come from real people. And the differential privacy mechanism gives you clear guarantees on how much the original people's data has been protected.

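One widely used way to train a network with differential privacy is DP-SGD: clip each example's gradient to a maximum norm C, sum, add Gaussian noise with standard deviation sigma * C, and average. This sketch assumes that mechanism purely for illustration; the Greene lab paper's exact method may differ, and the privacy accounting that turns `noise_multiplier` into an (epsilon, delta) guarantee is omitted. The reason samples inherit the guarantee is the post-processing property: anything computed from differentially private parameters, including generated records, is covered by the same bound.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """DP-SGD style aggregate: clip each example, sum, add N(0, (sigma*C)^2) noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for k, v in enumerate(clip(g, max_norm)):
            total[k] += v
    noisy = [t + rng.gauss(0.0, noise_multiplier * max_norm) for t in total]
    return [t / len(per_example_grads) for t in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example gradients with norms 5.0 and 0.5
noiseless = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=random.Random(0))
noisy = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))
print(noiseless, noisy)
```

Clipping bounds how much any single patient's record can move the parameters, and the noise hides whether that record was present at all; the noise multiplier of 1.1 is an arbitrary illustrative value.
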
That's really interesting, actually. I haven't heard you talk about that before. In terms of fairness, I've seen your talk from AAAI. How can adversarial machine learning help models be more fair with respect to sensitive variables?

Yeah, so there's a paper from Amos Storkey's lab about how to learn machine learning models that are incapable of using specific variables. Say, for example, you wanted to make predictions that are not affected by gender. It isn't enough to just leave gender out of the input to the model; you can often infer gender from a lot of other characteristics. Say that you have the person's name, but you're not told their gender. Well, if their name is Ian, they're kind of obviously a man. So what you'd like to do is make a machine learning model that can still take in a lot of different attributes and make a really accurate, informed prediction, but be confident that it isn't reverse engineering gender or another sensitive variable internally. You can do that using something very similar to the domain adversarial approach, where you have one player that's a feature extractor and another player that's a feature analyzer, and you want to make sure that the feature analyzer is not able to guess the value of the sensitive variable that you're trying to keep private.

Right, yeah, I love this approach. So with the features, you're not able to infer the sensitive variables. That's quite brilliant and simple, actually.

Another way I think that GANs in particular could be used for fairness would be to make something like a CycleGAN, where you can take data from one domain and convert it into another. We've seen CycleGAN turning horses into zebras. We've seen other unsupervised GANs made by Ming-Yu Liu doing things like turning day photos into night photos. I think for fairness, you could imagine taking records for people in one group, transforming them into analogous people in another group, and testing to see if they're treated equitably across those two groups. There are a lot of things that would be hard to get right to make sure that the conversion process itself is fair, and I don't think it's anywhere near something that we could actually use yet. But if you could design that conversion process very carefully, it might give you a way of doing audits where you say: what if we took people from this group and converted them into equivalent people in another group; does the system actually treat them how it ought to?

That's also really interesting. You know, in the popular press and in general, in our imagination, you think, well, GANs are able to generate data, and you start to think about deepfakes, or being able to maliciously generate data that fakes the identity of other people. Is this something of a concern to you? If you look 10, 20 years into the future, is that something that pops up in your work, in the work of the community that's working on generative models?

I'm a lot less concerned about 20 years from now than about the next few years. I think there'll be a kind of bumpy cultural transition as people encounter this idea that there can be very realistic videos and audio that aren't real. I think 20 years from now, people will mostly understand that you shouldn't believe something is real just because you saw a video of it. People will expect to see that it's been cryptographically signed, or have some other mechanism to make them believe that the content is real. There are already people working on this. There's a startup called Truepic that provides a lot of mechanisms for authenticating that an image is real. They're maybe not quite up to having a state actor try to evade their verification techniques, but it's something that people are already working on, and I think we'll get it right eventually.

So you think authentication will eventually win out, being able to authenticate that this is real, as opposed to GANs just getting better and better, or generative models being able to get better and better, to where the nature of what is real is normal.

I don't think we'll ever be able to look at the pixels of a photo and tell you for sure whether it's real or not. And I think it would actually be somewhat dangerous to rely on that approach too much. If you make a really good fake detector, and then someone's able to fool your fake detector, and your fake detector says this image is not fake, then it's even more credible than if you'd never made a fake detector in the first place.

What I do think we'll get to is systems that we can kind of use behind the scenes to make estimates of what's going on, but maybe not use in court for a definitive analysis. I also think we will likely get better authentication systems where, imagine that every phone cryptographically signs everything that comes out of it. You wouldn't be able to conclusively tell that an image was real, but you would be able to tell that somebody who knew the appropriate private key for this phone was actually able to sign this image and upload it to this server at this timestamp. So you could imagine maybe you make phones that have the private keys embedded in hardware. If a state security agency really wants to infiltrate the company, they could probably plant a private key of their choice, or break open the chip and learn the private key, or something like that. But it would make it a lot harder for an adversary with fewer resources to fake things.

For most of us it would be okay.
link |
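The phone-signing idea can be sketched as follows. This is a toy: it assumes textbook RSA with two well-known Mersenne primes hard-coded so that it runs on the Python standard library alone, which makes it completely insecure. A real device would use a vetted signature scheme such as Ed25519 or ECDSA, with the private key held in secure hardware. The names `sign_capture` and `verify_capture` are hypothetical. As Goodfellow says, a valid signature only shows that someone holding the private key signed these exact bytes and this timestamp. It does not show that the scene is real.

```python
import hashlib

# Toy textbook-RSA key pair (NOT secure, illustration only): two public Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; a real phone would keep its key in hardware


def _digest(image: bytes, timestamp: int) -> int:
    """Hash the image together with its capture timestamp, reduced mod N (toy only)."""
    h = hashlib.sha256(image + timestamp.to_bytes(8, "big")).digest()
    return int.from_bytes(h, "big") % N


def sign_capture(image: bytes, timestamp: int) -> int:
    """Hypothetical on-device step: sign (image, timestamp) with the private exponent D."""
    return pow(_digest(image, timestamp), D, N)


def verify_capture(image: bytes, timestamp: int, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == _digest(image, timestamp)


if __name__ == "__main__":
    photo = b"example image bytes"
    sig = sign_capture(photo, 1_555_000_000)
    print(verify_capture(photo, 1_555_000_000, sig))         # True
    print(verify_capture(photo + b"!", 1_555_000_000, sig))  # False: the bytes changed
```

Binding the timestamp into the signed digest is what supports the claim "signed this image at this timestamp." Changing either the pixels or the time invalidates the signature.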
So you mentioned the beer and the bar and the new ideas. You were able to come up with this new idea pretty quickly and implement it pretty quickly. Do you think there are still many such groundbreaking ideas in deep learning that could be developed so quickly?

Yeah, I do think that there are a lot of ideas that can be developed really quickly. GANs were probably a little bit of an outlier on the whole, like, one-hour timescale. But just in terms of, like, low-resource ideas where you do something really different on the algorithm scale and get a big payback, I think it's not as likely that you'll see that in things like core machine learning technologies, like a better classifier or a better reinforcement learning algorithm or a better generative model. If I had the GAN idea today, it would be a lot harder to prove that it was useful than it was back in 2014, because I would need to get it running on something like ImageNet or CelebA at high resolution. You know, those take a while to train. You couldn't train it in an hour and know that it was something really new and exciting. Back in 2014, training on MNIST was enough. But there are other areas of machine learning where I think a new idea could actually be developed really quickly with low resources.

What's your intuition about what areas of machine learning are ripe for this?

Yeah, so I think fairness and interpretability are areas where we just really don't have any idea how anything should be done yet. Like, for interpretability, I don't think we even have the right definitions. And even just defining a really useful concept, without needing to run any experiments, could have a huge impact on the field. We've seen that, for example, in differential privacy: Cynthia Dwork and her collaborators made this technical definition of privacy where before a lot of things were really mushy. And then with that definition, you could actually design randomized algorithms for accessing databases and guarantee that they preserved individual people's privacy in, like, a mathematical, quantitative sense. Right now, we all talk a lot about how interpretable different machine learning algorithms are, but it's really just people's opinion. And everybody probably has a different idea of what interpretability means in their head. If we could define some concept related to interpretability that's actually measurable, that would be a huge leap forward, even without a new algorithm that increases that quantity. And also, once we had the definition of differential privacy, it was fast to get the algorithms that guaranteed it. So you could imagine that once we have definitions of good concepts in interpretability, we might be able to provide algorithms with interpretability guarantees quickly too.

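For reference, here is the standard form of the definition mentioned above. A randomized algorithm M is ε-differentially private if, for every pair of databases D and D′ that differ in one person's record, and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. One well-known algorithm that meets it for a counting query is the Laplace mechanism. One person can change a count by at most 1, so adding Laplace noise with scale 1/ε is enough. The sketch below assumes that mechanism. The function names and the sample data are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """epsilon-DP count: the query has sensitivity 1, so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 29, 62, 57, 38]  # hypothetical database
    # "How many people are over 40?" The true answer is 3; only a noisy value is released.
    print(private_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng))
```

Smaller ε means more noise and a stronger guarantee. Each individual answer is noisy, but averages over many queries stay close to the truth, which is the mathematical, quantitative trade-off the definition makes explicit.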
So what do you think it takes to build a system with human-level intelligence, as we quickly venture into the philosophical? So artificial general intelligence, what do you think it takes?

I think that it definitely takes better environments than we currently have for training agents, if we want them to have a really wide diversity of experiences. I also think it's gonna take really a lot of computation. It's hard to imagine exactly how much.

So you're optimistic about simulation, simulating a variety of environments, as the path forward?

I think it's a necessary ingredient. Yeah, I don't think that we're going to get to artificial general intelligence by training on fixed data sets or by thinking really hard about the problem. I think that the agent really needs to interact and have a variety of experiences within the same lifespan. And today we have many different models that can each do one thing. And we tend to train them on one data set or one RL environment. Sometimes there are actually papers about getting one set of parameters to perform well in many different RL environments. But we don't really have anything like an agent that goes seamlessly from one type of experience to another and really integrates all the different things that it does over the course of its life. When we do see multi-agent environments, or rather, multi-environment agents, they tend to be similar environments. Like, all of them are playing an action-based video game. We don't really have an agent that goes from playing a video game to, like, reading the Wall Street Journal to predicting how effective a molecule will be as a drug, or something like that.

What do you think is a good test for intelligence? There have been a lot of benchmarks, starting with natural conversation being a good benchmark for intelligence. What would a system have to accomplish for Ian Goodfellow to sit back and be really damn impressed?

Something that doesn't take a lot of glue from human engineers. So imagine that instead of having to go to the CIFAR website and download CIFAR-10 and then write a Python script to parse it and all that, you could just point an agent at the CIFAR-10 problem, and it downloads and extracts the data and trains a model and starts giving you predictions. I feel like something that doesn't need to have every step of the pipeline assembled for it definitely understands what it's doing.

Is AutoML moving in that direction, or are you thinking way even bigger?

AutoML has mostly been moving toward: once we've built all the glue, can the machine learning system design the architecture really well? And so I'm more saying, like, if something knows how to preprocess the data so that it successfully accomplishes the task, then it would be very hard to argue that it doesn't truly understand the task in some fundamental sense. And I don't necessarily know that that's, like, the philosophical definition of intelligence, but that's something that would be really cool to build, that would be really useful, and that would impress me and convince me that we've made a step forward.

So you give it, like, the URL for Wikipedia, and then the next day expect it to be able to solve CIFAR-10.

Or, like, you type in a paragraph explaining what you want it to do, and it figures out what web searches it should run and downloads all the necessary ingredients.

So you have a very clear, calm way of speaking, no ums, easy to edit. I've seen comments where both you and I have been identified as potentially being robots. If you had to prove to the world that you are indeed human, how would you do it?

I can understand thinking that I'm a robot. It's the flip side of the Turing test, I think.

Yeah, yeah, the prove-you're-human test. Intellectually, so you have to... Is there something that's truly unique in your mind? Does it go back to just natural language again? Just being able to talk your way out of it?

Proving that I'm not a robot with today's technology? Yeah, that's pretty straightforward. Like, my conversation today hasn't veered off into talking about the stock market or something because of my training data. But I guess more generally, trying to prove that something is real from the content alone is incredibly hard. That's one of the main things I've gotten out of my GAN research: you can simulate almost anything. And so you have to really step back to a separate channel to prove that something is real. So, like, I guess I should have had myself stamped on a blockchain when I was born or something, but I didn't do that. So according to my own research methodology, there's just no way to know at this point.

So, last question: what problem stands out for you that you're really excited about tackling in the near future?

So I think resistance to adversarial examples, figuring out how to make machine learning secure against an adversary who wants to interfere with and control it, is one of the most important things researchers today could solve.

In all domains? Image, language, driving, everything?

I guess I'm most concerned about domains we haven't really encountered yet. Like, imagine 20 years from now, when we're using advanced AIs to do things we haven't even thought of yet. Like, if you had asked people what the important problems in the security of phones were, I don't think we would have anticipated that we'd be using them for nearly as many things as we're using them for today. I think it's gonna be like that with AI: you can kind of try to speculate about where it's going, but the business opportunities that end up taking off will be hard to predict ahead of time. What you can predict ahead of time is that, for almost anything you can do with machine learning, you would like to make sure that people can't get it to do what they want rather than what you want just by showing it a funny QR code or a funny input pattern.

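A "funny input pattern" can be small and still effective. The sketch below illustrates this with the fast gradient sign method from Goodfellow, Shlens, and Szegedy's paper on adversarial examples. It applies the method to a hypothetical, hand-weighted logistic-regression classifier rather than a deep network, and the weights, input, and ε are made up for illustration. For the logistic loss on a linear model, the gradient with respect to the input has the sign of −y·w, so each feature moves by at most ε in whichever direction hurts the correct class.

```python
import math


def predict_proba(w, b, x):
    """Probability of class +1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, x, y, epsilon):
    """Fast gradient sign method for logistic loss on a linear model (labels y in {+1, -1}).

    d/dx log(1 + exp(-y * (w.x + b))) = -y * sigmoid(-y * (w.x + b)) * w, whose sign is
    sign(-y * w_i) regardless of b, so each feature is shifted by exactly epsilon.
    """
    def sign(v):
        return (v > 0) - (v < 0)
    return [xi + epsilon * sign(-y * wi) for wi, xi in zip(w, x)]


if __name__ == "__main__":
    w = [0.5, -0.3, 0.8, -0.6, 0.4, 0.7, -0.2, 0.9, -0.5, 0.3]  # hypothetical weights
    b = 0.0
    x = [0.2] * 10  # a clean input the model assigns to class +1
    x_adv = fgsm(w, x, y=1, epsilon=0.1)
    print(round(predict_proba(w, b, x), 3), round(predict_proba(w, b, x_adv), 3))
```

Here the clean input scores about 0.6 for class +1. After shifting every feature by 0.1, the score drops below 0.5, so the predicted label flips. Each change is small, but they all push the same way, and their effects add up across dimensions.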
And you think that the set of methodologies to do that can be bigger than any one domain?

Yeah, like one methodology, not a specific methodology, but like a category of solutions that I'm excited about today, is making dynamic models that change every time they make a prediction. So right now we tend to train models, and then after they're trained, we freeze them, and we just use the same rule to classify everything that comes in from then on. That's really a sitting duck from a security point of view. If you always output the same answer for the same input, then people can just run inputs through until they find a mistake that benefits them. And then they use the same mistake over and over and over again. I think having a model that updates its predictions, so that it's harder to predict what you're gonna get, will make it harder for an adversary to really take control of the system and make it do what they want it to do.

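Here is one simple way a model's answer can stop being a fixed function of its input. The sketch uses a classifier that draws fresh random noise for its weights on every query. It is only one member of the broad category described above, not a specific method proposed in the conversation, and the class names, weights, and noise scale are hypothetical. It shows the contrast Goodfellow draws: a frozen model gives an attacker a mistake they can reuse indefinitely, while a per-query randomized model may answer the same input differently.

```python
import random


class FrozenLinear:
    """A trained-then-frozen linear classifier: same input, same answer, every time."""

    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0


class NoisyLinear(FrozenLinear):
    """Hypothetical randomized variant: perturbs its weights afresh for each prediction."""

    def __init__(self, weights, bias, noise_scale, seed=None):
        super().__init__(weights, bias)
        self.noise_scale = noise_scale
        self.rng = random.Random(seed)

    def predict(self, x):
        noisy = [w + self.rng.gauss(0.0, self.noise_scale) for w in self.weights]
        score = sum(w * xi for w, xi in zip(noisy, x)) + self.bias
        return 1 if score > 0 else 0


if __name__ == "__main__":
    borderline = [1.0, -1.0]  # an input close to the decision boundary
    frozen = FrozenLinear([0.5, 0.45], 0.0)
    noisy = NoisyLinear([0.5, 0.45], 0.0, noise_scale=0.2, seed=0)
    print({frozen.predict(borderline) for _ in range(100)})  # {1}: always the same label
    print({noisy.predict(borderline) for _ in range(100)})   # typically both labels appear
```

Randomness alone is not a proven defense. An attacker can still average over many queries, so treat this as an illustration of the "sitting duck" contrast rather than a security recipe.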
Yeah, models that maintain a bit of a sense of mystery about them, because they always keep changing. Ian, thanks so much for talking today, it was awesome.

Thank you for coming in, it's great to see you.