Ian Goodfellow: Generative Adversarial Networks (GANs) | Lex Fridman Podcast #19

The following is a conversation with Ian Goodfellow. He's the author of the popular textbook on deep learning simply titled Deep Learning. He coined the term generative adversarial networks, otherwise known as GANs, and with his 2014 paper is responsible for launching the incredible growth of research and innovation in this subfield of deep learning. He got his BS and MS at Stanford and his PhD at the University of Montreal with Yoshua Bengio and Aaron Courville. He held several research positions, including at OpenAI and Google Brain, and is now at Apple as the director of machine learning. This recording happened while Ian was still at Google Brain, but we don't talk about anything specific to Google or any other organization.

This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube or iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now here's my conversation with Ian Goodfellow.

You open your popular deep learning book with a Russian doll type diagram that shows deep learning is a subset of representation learning, which in turn is a subset of machine learning and finally a subset of AI. So this kind of implies that there may be limits to deep learning in the context of AI. So what do you think are the current limits of deep learning, and are those limits something that we can overcome with time?

Yeah, I think one of the biggest limitations of deep learning is that right now it requires really a lot of data, especially labeled data. There are some unsupervised and semi-supervised learning algorithms that can reduce the amount of labeled data you need, but they still require a lot of unlabeled data. Reinforcement learning algorithms don't need labels, but they need really a lot of experiences. As human beings, we don't learn to play Pong by failing at Pong two million times. So just getting the generalization ability better is one of the most important bottlenecks in the capability of the technology today.

And then I guess I'd also say deep learning is like a component of a bigger system. So far, nobody is really proposing to have only what you'd call deep learning as the entire ingredient of intelligence. You use deep learning as submodules of other systems: AlphaGo has a deep learning model that estimates the value function, and most reinforcement learning algorithms have a deep learning module that estimates which action to take next, but you might have other components.

So you're basically building a function estimator. Do you think it's possible? You said nobody's kind of been thinking about this so far, but do you think neural networks could be made to reason in the way symbolic systems did in the '80s and '90s, to do more, to create something more like programs as opposed to functions?

Yeah, I think we already see that a little bit. I already kind of think of neural nets as a kind of program. I think of deep learning as basically learning programs that have more than one step. So if you draw a flowchart, or if you draw a TensorFlow graph describing your machine learning model, I think of the depth of that graph as describing the number of steps that run in sequence, and the width of that graph as the number of steps that run in parallel.

Now it's been long enough that we've had deep learning working that it's a little bit silly to even discuss shallow learning anymore, but back when I first got involved in AI, when we used machine learning, we were usually learning things like support vector machines. You could have a lot of input features to the model, and you could multiply each feature by a different weight. All those multiplications were done in parallel to each other, and there wasn't a lot done in series. I think what we got with deep learning was really the ability to have steps of a program that run in sequence.

And I think that we've actually started to see that what's important with deep learning is more the fact that we have a multi-step program rather than the fact that we've learned a representation. If you look at things like ResNets, for example, they take one particular kind of representation and they update it several times. Back when deep learning first really took off in the academic world in 2006, when Geoff Hinton showed that you could train deep belief networks, everybody who was interested in the idea thought of it as each layer learning a different level of abstraction: the first layer trained on images learns something like edges, the second layer learns corners, and eventually you get these kind of grandmother cell units that recognize specific objects. Today, I think most people think of it more as a computer program where, as you add more layers, you can do more updates before you output your final number. But I don't think anybody believes that layer 150 of the ResNet is a grandmother cell and layer 100 is contours or something like that.

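The "depth as sequential steps" view can be made concrete with a small sketch (not something from the conversation): a shallow, SVM-style linear score whose feature-times-weight products are all independent of each other, next to a ResNet-style stack that applies the update h ← h + tanh(Wh + b) several times to the same representation. The function names, the tanh update, and the toy weights are illustrative assumptions, not an actual ResNet block.

```python
import math


def shallow_score(x, w):
    # Shallow model (e.g. a linear SVM score): every feature-times-weight
    # product is independent of the others, so they could all run in parallel.
    return sum(xi * wi for xi, wi in zip(x, w))


def residual_step(h, W, b):
    # One residual update h + tanh(W h + b): the representation keeps its
    # shape, and the step refines it rather than replacing it.
    return [
        hi + math.tanh(sum(W[i][j] * h[j] for j in range(len(h))) + b[i])
        for i, hi in enumerate(h)
    ]


def residual_program(x, layers):
    # Depth = number of sequential steps; each step reads the previous state.
    h = list(x)
    states = [h]
    for W, b in layers:
        h = residual_step(h, W, b)
        states.append(h)
    return states


if __name__ == "__main__":
    x = [1.0, -0.5]
    print("shallow:", shallow_score(x, [0.3, 0.8]))
    layers = [([[0.1, -0.2], [0.3, 0.1]], [0.0, 0.05])] * 3  # three steps
    for t, h in enumerate(residual_program(x, layers)):
        print("step", t, [round(v, 3) for v in h])
```

Each printed state is the same-sized representation after one more sequential update, which is the sense in which adding layers means "more updates before you output your final number."
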
Okay, so you're not thinking of it as a singular representation that keeps building. You think of it as a program, sort of almost like a state. The representation is a state of understanding.

Yeah, I think of it as a program that makes several updates and arrives at better and better understandings, but it's not replacing the representation at each step. And in some sense, that's a little bit like reasoning. It's not reasoning in the form of deduction, but it's reasoning in the form of taking a thought and refining it and refining it carefully until it's good enough to use.

So do you think... and I hope you don't mind, we'll jump philosophical every once in a while. Do you think of, you know, cognition, human cognition, or even consciousness as simply a result of this kind of sequential representation learning? Do you think that can emerge?

Cognition, yes, I think so. Consciousness, it's really hard to even define what we mean by that. I guess consciousness is often defined as things like having self-awareness, and that's relatively easy to turn into something actionable for a computer scientist. People also define consciousness in terms of having qualitative states of experience, like qualia. There are all these philosophical problems, like: could you imagine a zombie who does all the same information processing as a human, but doesn't really have the qualitative experiences? That sort of thing, I have no idea how to formalize or turn into a scientific question. I don't know how you could run an experiment to tell whether a person is a zombie or not. And similarly, I don't know how you could run an experiment to tell whether an advanced AI system had become conscious in the sense of qualia or not.

But in the more practical sense, like almost like self-attention, you think consciousness and cognition can, in an impressive way, emerge from current types of architectures that we think of as deep learning?

Or if you think of consciousness in terms of self-awareness and just making plans based on the fact that the agent itself exists in the world, reinforcement learning algorithms are already more or less forced to model the agent's effect on the environment. So that more limited version of consciousness is already something that we get limited versions of with reinforcement learning algorithms, if they're trained well.

But you say limited. So the big question really is how you jump from limited to human level, right? And whether it's possible. Even just building common sense reasoning seems to be exceptionally difficult. So if we scale things up, if we get much better on supervised learning, if we get better at labeling, if we get bigger datasets, more compute, do you think we'll start to see really impressive things that go from limited to something that echoes human-level cognition?

I'm optimistic about what can happen just with more computation and more data. I do think it'll be important to get the right kind of data. Today, most of the machine learning systems we train are mostly trained on one type of data for each model. But with the human brain, we get all of our different senses, and we have many different experiences like riding a bike, driving a car, talking to people, reading. I think when we get that kind of integrated dataset working with a machine learning model that can actually close the loop and interact, we may find that algorithms not so different from what we have today learn really interesting things when you scale them up a lot and train them on a large amount of multimodal data.

So multimodal is really interesting, but within, like your work on adversarial examples: selecting, within one mode of data, the difficult cases that are most useful to learn from.

Oh, yeah, like could we get a whole lot of mileage out of designing a model that's resistant to adversarial examples, or something like that?

Right, that's the question.

My thinking on that has evolved a lot over the last few years. When I first started to really invest in studying adversarial examples, I was thinking of it mostly as: adversarial examples reveal a big problem with machine learning, and we would like to close the gap between how machine learning models respond to adversarial examples and how humans respond. After studying the problem more, I still think that adversarial examples are important, but I think of them now more as a security liability than as an issue that necessarily shows there's something uniquely wrong with machine learning as opposed to humans.

Also, do you see them as a tool to improve the performance of the system? Not on the security side, but literally just accuracy.

I do see them as a kind of tool on that side, but maybe not quite as much as I used to think. We've started to find that there's a tradeoff between accuracy on adversarial examples and accuracy on clean examples. Back in 2014, when I did the first adversarially trained classifier that showed resistance to some kinds of adversarial examples, it also got better at the clean data on MNIST. And that's something we've replicated several times on MNIST: when we train against weak adversarial examples, MNIST classifiers get more accurate. So far that hasn't really held up on other datasets, and it hasn't held up when we train against stronger adversaries. It seems like when you confront a really strong adversary, you tend to have to give something up.

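For readers who haven't seen adversarial training, here is a minimal sketch under stated assumptions: a logistic-regression classifier on a hypothetical 2-D toy dataset, adversarial examples produced with the fast gradient sign method (FGSM), and a loss that mixes clean and adversarial terms with weight alpha. The dataset, eps, learning rate, and alpha = 0.5 are illustrative choices, not the MNIST setup Ian describes, and a linear model is far weaker than the networks used in that work.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def grad(w, b, x, y):
    # Gradient of the logistic loss with respect to (w, b).
    g = predict(w, b, x) - y
    return [g * xi for xi in x], g


def fgsm(w, b, x, y, eps):
    # For logistic loss, d(loss)/dx_i = (p - y) * w_i; FGSM moves each
    # input feature by eps in the sign direction of that gradient.
    g = predict(w, b, x) - y
    return [xi + eps * sign(g * wi) for xi, wi in zip(x, w)]


def adversarial_train(data, epochs=20, lr=0.1, eps=0.1, alpha=0.5, seed=0):
    # SGD on alpha * loss(x) + (1 - alpha) * loss(fgsm(x)).
    rng = random.Random(seed)
    data = list(data)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)  # attack the current model
            gw_c, gb_c = grad(w, b, x, y)
            gw_a, gb_a = grad(w, b, x_adv, y)
            w = [wi - lr * (alpha * c + (1 - alpha) * a)
                 for wi, c, a in zip(w, gw_c, gw_a)]
            b -= lr * (alpha * gb_c + (1 - alpha) * gb_a)
    return w, b


def make_blobs(n=100, seed=0):
    # Hypothetical toy data: two Gaussian clusters with labels 0 and 1.
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        center = 2.0 if y == 1 else -2.0
        data.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], y))
    return data


def accuracy(w, b, data):
    return sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in data) / len(data)


if __name__ == "__main__":
    data = make_blobs()
    w, b = adversarial_train(data)
    adv = [(fgsm(w, b, x, y, 0.5), y) for x, y in data]
    print("clean accuracy:", accuracy(w, b, data))
    print("accuracy under eps=0.5 FGSM:", accuracy(w, b, adv))
```

Raising eps during training is one way to try the "stronger adversary" side of the tradeoff on this toy problem, though a two-cluster dataset will not reproduce the MNIST or large-dataset behavior discussed above.
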
Interesting, but it's such a compelling idea because it feels like that's how we humans learn to do the difficult cases. We try to think of what we would screw up, and then we make sure we fix that.

It's also like that in a lot of branches of engineering: you do a worst-case analysis and make sure that your system will work in the worst case. And then that guarantees that it'll work in all of the messy average cases that happen when you go out into a really randomized world.

Yeah, with driving, with autonomous vehicles, there seems to be a desire to just look for, to think adversarially, to try to figure out how to mess up the system. And if you can be robust to all those difficult cases, then you can... it's a hand-wavy empirical way to show your system is safe.

Today, most adversarial example research isn't really focused on a particular use case, but there are a lot of different use cases where you'd like to make sure that the adversary can't interfere with the operation of your system. If you have an algorithm making trades for you, people go to a lot of effort to obfuscate their algorithm. That's partly to protect their IP, because you don't want to research and develop a profitable trading algorithm and then have somebody else capture the gains. But it's at least partly because you don't want people to make adversarial examples that fool your algorithm into making bad trades.

Or I guess one area that's been popular in the academic literature is speech recognition. If you use speech recognition to hear an audio waveform and then turn that into a command that a phone executes for you, you don't want a malicious adversary to be able to produce audio that gets interpreted as malicious commands, especially if a human in the room doesn't realize that something like that is happening.

In speech recognition, has there been much success in being able to create adversarial examples that fool the system?

I guess the first work that I'm aware of is a paper called Hidden Voice Commands that came out in 2016, I believe. They were able to show that they could make sounds that are not understandable by a human but are recognized as the target phrase that the attacker wants the phone to recognize. Since then, things have gotten a little bit better on the attacker side and worse on the defender side. It's become possible to make sounds that sound like normal speech but are actually interpreted as a different sentence than the human hears. The level of perceptibility of the adversarial perturbation is still kind of high. When you listen to the recording, it sounds like there's some noise in the background, just like rustling sounds. But those rustling sounds are actually the adversarial perturbation that makes the phone hear a completely different sentence.

Yeah, that's so fascinating. Peter Norvig mentioned that you're writing the deep learning chapter for the fourth edition of Artificial Intelligence: A Modern Approach. So how do you even begin summarizing the field of deep learning in a chapter?

Well, in my case, I waited like a year before I actually wrote anything. Even having written a full-length textbook before, it's still pretty intimidating to try to start writing just one chapter that covers everything. One thing that helped me make that plan was actually the experience of having written the full book before and then watching how the field changed after the book came out. I realized there were a lot of topics that were maybe extraneous in the first book, and just seeing what stood the test of a few years of being published, and what seems a little bit less important to have included now, helped me pare down the topics I wanted to cover for the book.

It's also really nice now that the field has kind of stabilized to the point where some core ideas from the 1980s are still used today. When I first started studying machine learning, almost everything from the 1980s had been rejected, and now some of it has come back. So the stuff that's really stood the test of time is what I focused on putting into the book.

There are also, I guess, two different philosophies about how you might write a book. One philosophy is you try to write a reference that covers everything. The other philosophy is you try to provide a high-level summary that gives people the language to understand a field and tells them what the most important concepts are. The first deep learning book that I wrote with Yoshua and Aaron was somewhere between the two philosophies, in that it's trying to be both a reference and an introductory guide. Writing this chapter for Russell and Norvig's book, I was able to focus more on just a concise introduction of the key concepts and the language you need to read about them more.

In a lot of cases, I actually just wrote paragraphs that said, here's a rapidly evolving area that you should pay attention to. It's pointless to try to tell you what the latest and best version of a learning-to-learn model is. I can point you to a paper that's recent right now, but there isn't a whole lot of reason to delve into exactly what's going on with the latest learning-to-learn approach or the latest module produced by a learning-to-learn algorithm. You should know that learning to learn is a thing and that it may very well be the source of the latest and greatest convolutional net or recurrent net module that you would want to use in your latest project. But there isn't a lot of point in trying to summarize exactly which architecture and which learning approach got to which level of performance.

So you maybe focus more on the basics of the methodology, from backpropagation to feedforward to recurrent networks, convolutional, that kind of thing.

So if I were to ask you... I remember I took an algorithms and data structures course. I remember the professor asked, what is an algorithm? And he yelled at everybody, in a good way, that nobody was answering it correctly. It was a graduate course. Everybody knew what an algorithm was, but they weren't able to answer it well. So let me ask you, in that same spirit, what is deep learning?

I would say deep learning is any kind of machine learning that involves learning parameters of more than one consecutive step. Shallow learning is things where you learn a lot of operations that happen in parallel. You might have a system that makes multiple steps, like you might have hand-designed feature extractors, but really only one step is learned. Deep learning is anything where you have multiple operations in sequence. And that includes the things that are really popular today, like convolutional networks and recurrent networks, but it also includes some of the things that have died out, like Boltzmann machines, where we weren't using backpropagation. Today, I hear a lot of people define deep learning as gradient descent applied to these differentiable functions, and I think that's a legitimate usage of the term. It's just different from the way that I use the term myself.

So what's an example of deep learning that is not gradient descent and differentiable functions? In your... I mean, not specifically perhaps, but more even looking into the future. What's your thought about that space of approaches?

Yeah, so I tend to think of machine learning algorithms as decomposed into really three different pieces. There's the model, which can be something like a neural net or a Boltzmann machine or a recurrent model. That basically just describes how you take data and parameters, and what function you use to make a prediction given the data and the parameters. Another piece of the learning algorithm is the optimization algorithm, or, since not every algorithm can really be described in terms of optimization, the algorithm for updating the parameters or updating whatever the state of the network is. And then the last part is the dataset: how do you actually represent the world as it comes into your machine learning system?

So I think of deep learning as telling us something about what the model looks like. And basically, to qualify as deep, I say that it just has to have multiple layers. That can be multiple steps in a feedforward differentiable computation. That can be multiple layers in a graphical model. There are a lot of ways that you could satisfy me that something has multiple steps that are each parameterized separately. I think of gradient descent as being all about the "how do you actually update the parameters" piece. So you could imagine having a deep model like a convolutional net and training it with something like evolution or a genetic algorithm, and I would say that still qualifies as deep learning. And then in terms of models that aren't necessarily differentiable, I guess Boltzmann machines are probably the main example of something where you can't really take a derivative and use that for the learning process. But you can still argue that the model has many steps of processing that it applies when you run inference in the model.

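Ian's point that "deep" describes the model while gradient descent is only one choice of optimizer can be illustrated by keeping a small two-layer tanh network fixed and swapping in a derivative-free optimizer. The sketch below assumes a (1 + lambda) evolution strategy and the XOR dataset; the network size, mutation scale, and generation count are arbitrary illustrative choices rather than anything from the conversation.

```python
import math
import random

# Data: XOR, which a single linear layer cannot fit.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
HIDDEN = 4
N_PARAMS = 2 * HIDDEN + HIDDEN + HIDDEN + 1  # W1, b1, W2, b2


def forward(params, x):
    # Model: two sequential, separately parameterized steps (a "deep" model).
    W1 = params[: 2 * HIDDEN]
    b1 = params[2 * HIDDEN: 3 * HIDDEN]
    W2 = params[3 * HIDDEN: 4 * HIDDEN]
    b2 = params[4 * HIDDEN]
    h = [math.tanh(W1[2 * j] * x[0] + W1[2 * j + 1] * x[1] + b1[j])
         for j in range(HIDDEN)]
    return sum(wj * hj for wj, hj in zip(W2, h)) + b2


def loss(params, data=DATA):
    # Mean squared error over the dataset.
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)


def evolve(generations=300, offspring=20, sigma=0.1, seed=0):
    # Optimizer: a (1 + lambda) evolution strategy. No derivatives are used;
    # the parent is replaced only when the best mutant has lower loss.
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 0.5) for _ in range(N_PARAMS)]
    parent_loss = loss(parent)
    history = [parent_loss]
    for _ in range(generations):
        children = [[p + rng.gauss(0.0, sigma) for p in parent]
                    for _ in range(offspring)]
        best = min(children, key=loss)
        best_loss = loss(best)
        if best_loss < parent_loss:
            parent, parent_loss = best, best_loss
        history.append(parent_loss)
    return parent, history


if __name__ == "__main__":
    params, history = evolve()
    print("loss: start %.4f, end %.4f" % (history[0], history[-1]))
    for x, y in DATA:
        print(x, y, round(forward(params, x), 2))
```

Swapping `evolve` for a gradient-descent loop would change only the optimization piece; `forward` (the model) and `DATA` (the dataset) stay as they are, which mirrors the three-way decomposition described above.
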
So it's the steps of processing that are key. So Geoff Hinton suggests that we need to throw away backpropagation and start all over. What do you think about that? What could an alternative direction of training neural networks look like?

I don't know that backpropagation is going to go away entirely. Most of the time when we decide that a machine learning algorithm isn't on the critical path to research for improving AI, the algorithm doesn't die; it just becomes used for some specialized set of things. A lot of algorithms like logistic regression don't seem that exciting to AI researchers who are working on things like speech recognition or autonomous cars today, but there's still a lot of use for logistic regression in things like analyzing really noisy data in medicine and finance, or making really rapid predictions in really time-limited contexts. So I think backpropagation and gradient descent are around to stay, but they may not end up being everything that we need to get to real human-level or superhuman AI.

Are you optimistic about us discovering... You know, backpropagation has been around for a few decades. So are you optimistic about us as a community being able to discover something better?

I think we likely will find something that works better. You could imagine things like having stacks of models where some of the lower-level models predict parameters of the higher-level models, and so at the top level, you're not learning in terms of literally calculating gradients, but just predicting how different values will perform. You can kind of see that already in some areas like Bayesian optimization, where you have a Gaussian process that predicts how well different parameter values will perform. We already use those kinds of algorithms for things like hyperparameter optimization. And in general, we know a lot of things other than backprop that work really well for specific problems. The main thing we haven't found is a way of taking one of these other non-backprop-based algorithms and having it really advance the state of the art on an AI-level problem. But I wouldn't be surprised if eventually we find that some of these algorithms, even the ones that already exist, not even necessarily a new one, we might find some way of customizing one of these algorithms to do something really interesting at the level of cognition or the level of...

I think one system that we really don't have working quite right yet is short-term memory. We have things like LSTMs; they're called long short-term memory. They still don't do quite what a human does with short-term memory. Gradient descent, to learn a specific fact, has to do multiple steps on that fact. Like if I tell you the meeting today is at 3pm, I don't need to say over and over again, "It's at 3pm, it's at 3pm, it's at 3pm, it's at 3pm," for you to do a gradient step on each one. You just hear it once and you remember it. There's been some work on things like self-attention and attention-like mechanisms, like the neural Turing machine, that can write to memory cells and update themselves with facts like that right away. But I don't think we've really nailed it yet. And that's one area where I'd imagine that new optimization algorithms, or different ways of applying existing optimization algorithms, could give us a way of just lightning-fast updating the state of a machine learning system to contain a specific fact like that, without needing to have it presented over and over and over again.

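The 3pm example can be turned into a toy comparison. Under the assumptions below, a linear associative memory trained by gradient descent needs dozens of repeated presentations of the same fact before its recall is accurate, whereas a slot-style memory stores the fact with one write. The key encoding, learning rate, tolerance, and the `OneShotMemory` class are hypothetical simplifications, not how LSTMs or the neural Turing machine actually read and write memory.

```python
def presentations_needed(key, target, lr=0.1, tol=1e-3, max_steps=10_000):
    # Linear associative memory: recall = w . key, trained by gradient
    # descent on squared error. Each update = hearing the fact once more.
    w = [0.0] * len(key)
    for step in range(max_steps):
        recall = sum(wi * ki for wi, ki in zip(w, key))
        err = recall - target
        if abs(err) < tol:
            return step  # number of gradient steps taken so far
        w = [wi - lr * err * ki for wi, ki in zip(w, key)]
    return max_steps


class OneShotMemory:
    # Hypothetical slot memory: a single write makes the fact retrievable.
    def __init__(self):
        self.slots = {}

    def write(self, key, value):
        self.slots[key] = value

    def read(self, key):
        return self.slots.get(key)


if __name__ == "__main__":
    # "The meeting today is at 3pm", encoded as a target of 15 (i.e. 15:00).
    print("gradient steps:", presentations_needed([1.0, 0.0, 1.0], 15.0))
    memory = OneShotMemory()
    memory.write("meeting_today", "3pm")
    print("one write, then recall:", memory.read("meeting_today"))
```

A larger learning rate reduces the step count here, but pushing it too far makes the updates overshoot, so plain gradient descent cannot simply be turned into one-shot storage by tuning it.
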
So some of the success of symbolic systems in the '80s is that they were able to assemble these kinds of facts better. But there's a lot of expert input required, and it's very limited in that sense. Do you ever look back to that as something that we'll have to return to eventually, sort of dust off the book from the shelf and think about how we build knowledge representation? Like, will we have to use graph searches?

Graph searches, right. And like first-order logic and entailment and things like that.

That kind of thing, yeah, exactly.

In my particular line of work, which has mostly been machine learning security and also generative modeling, I haven't usually found myself moving in that direction. For generative models, I could see a little bit of... it could be useful if you had something like a differentiable knowledge base, or some other kind of knowledge base where it's possible for some of our fuzzier machine learning algorithms to interact with a knowledge base.

I mean, a neural network is kind of like that. It's a differentiable knowledge base of sorts.

But if we had a really easy way of giving feedback to machine learning models, that would clearly help a lot with generative models. And so you could imagine one way of getting there would be to get a lot better at natural language processing. But another way of getting there would be to take some kind of knowledge base and figure out a way for it to actually interact with a neural network.

Being able to have a chat with a neural network.

So one thing in generative models we see a lot today is that you'll get things like faces that are not symmetrical, like people that have two eyes that are different colors. And I mean, there are people with eyes that are different colors in real life, but not nearly as many of them as you tend to see in machine-learning-generated data. So if you had a knowledge base that could contain the fact that people's faces are generally approximately symmetric, and that eye color is especially likely to be the same on both sides, being able to just inject that hint into the machine learning model, without it having to discover that itself after studying a lot of data, would be a really useful feature. I could see a lot of ways of getting there without bringing back some of the 1980s technology, but I also see some ways that you could imagine extending the 1980s technology to play nice with neural nets and help get there.

So you talked about the story of you coming up with the idea of GANs at a bar with some friends. You were arguing that GANs, generative adversarial networks, would work, and the others didn't think so. Then you went home at midnight, coded it up, and it worked. So if I was a friend of yours at the bar, I would also have doubts. It's a really nice idea, but I'm very skeptical that it would work. What was the basis of their skepticism? What was the basis of your intuition for why it should work?

I don't wanna be someone who goes around promoting alcohol for the purposes of science, but in this case, I do actually think that drinking helped a little bit. When your inhibitions are lowered, you're more willing to try out things that you wouldn't try out otherwise. So I have noticed in general that I'm less prone to shooting down some of my own ideas when I have had a little bit to drink. I think if I had had that idea at lunchtime, I probably would have thought, "It's hard enough to train one neural net. You can't train a second neural net in the inner loop of the outer neural net." That was basically my friends' objection: that trying to train two neural nets at the same time would be too hard.

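The "inner loop" structure behind that objection can be sketched in a few lines. This is a toy, hand-derived version under stated assumptions, not the 2014 GAN code: the "real" data are 1-D samples from a Gaussian, the generator is the affine map G(z) = a*z + b, the discriminator is D(x) = sigmoid(w1*x + w2*x^2 + c), the discriminator takes k steps per generator step, and the generator uses the non-saturating loss -log D(G(z)). All parameter names and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def disc(x, w1, w2, c):
    # Discriminator: estimated probability that x came from the real data.
    return sigmoid(w1 * x + w2 * x * x + c)


def train_gan(real_mean=1.0, real_std=0.5, steps=300, k=2, batch=32,
              lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0            # generator G(z) = a * z + b
    w1 = w2 = c = 0.0          # discriminator parameters
    for _ in range(steps):
        # Inner loop: k discriminator updates against the current generator,
        # descending -log D(x_real) - log(1 - D(x_fake)).
        for _ in range(k):
            g1 = g2 = gc = 0.0
            for _ in range(batch):
                xr = rng.gauss(real_mean, real_std)
                xf = a * rng.gauss(0.0, 1.0) + b
                gr = -(1.0 - disc(xr, w1, w2, c))  # d loss / d logit (real)
                gf = disc(xf, w1, w2, c)           # d loss / d logit (fake)
                g1 += gr * xr + gf * xf
                g2 += gr * xr * xr + gf * xf * xf
                gc += gr + gf
            w1 -= lr * g1 / batch
            w2 -= lr * g2 / batch
            c -= lr * gc / batch
        # Outer loop: one generator update on the non-saturating loss
        # -log D(G(z)), differentiated through the fixed discriminator.
        ga = gb = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            xf = a * z + b
            dx = -(1.0 - disc(xf, w1, w2, c)) * (w1 + 2.0 * w2 * xf)
            ga += dx * z
            gb += dx
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b


if __name__ == "__main__":
    a, b = train_gan()
    print("generator scale a = %.3f, shift b = %.3f" % (a, b))
```

Whether a and b settle near the real standard deviation and mean depends on lr, k, and the number of steps; toy adversarial dynamics like this may oscillate rather than converge, which is a small-scale version of the worry that the inner loop cannot keep up.
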
So it was more about the training process. My skepticism would be: I'm sure you could train it, but the thing it would converge to would not be able to generate anything with any kind of reasonable realism.

Yeah, so part of what all of us were thinking about
link |
when we had this conversation was deep Bolton machines,
link |
which a lot of us in the lab, including me,
link |
were a big fan of deep Bolton machines at the time.
link |
They involved two separate processes running at the same time.
link |
One of them is called the positive phase
link |
where you load data into the model
link |
and tell the model to make the data more likely.
link |
The other one is called the negative phase
link |
where you draw samples from the model
link |
and tell the model to make those samples less likely.
link |
In a deep Boltzmann machine, it's not trivial
link |
to generate a sample.
link |
You have to actually run an iterative process
link |
that gets better and better samples
link |
coming closer and closer to the distribution
link |
the model represents.
link |
So during the training process,
link |
you're always running these two systems at the same time.
link |
One that's updating the parameters of the model
link |
and another one that's trying to generate samples from the model.
link |
And they worked really well on things like MNIST,
link |
but a lot of us in the lab, including me,
link |
had tried to get deep Boltzmann machines to scale past MNIST
link |
to things like generating color photos,
link |
and we just couldn't get the two processes
link |
to stay synchronized.
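The positive and negative phases can be illustrated with the simplest relative of a deep Boltzmann machine. This is a minimal sketch, assuming a binary restricted Boltzmann machine (one hidden layer, biases omitted) trained with one step of contrastive divergence (CD-1); that is a swapped-in, simpler model and training rule, not the deep Boltzmann machine setup discussed here, and the sizes, learning rate, and toy data are arbitrary.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sample_bernoulli(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(v, W):
    # p(h_j = 1 | v) for every hidden unit j; W has shape n_visible x n_hidden.
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(W[0]))]


def visible_probs(h, W):
    # p(v_i = 1 | h) for every visible unit i.
    return [sigmoid(sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(W))]


def cd1_update(v0, W, lr, rng):
    # Positive phase: clamp a training example and record visible-hidden co-activations.
    ph0 = hidden_probs(v0, W)
    h0 = sample_bernoulli(ph0, rng)
    # Negative phase: one Gibbs step (v0 -> h0 -> v1) gives a rough sample from the model.
    v1 = sample_bernoulli(visible_probs(h0, W), rng)
    ph1 = hidden_probs(v1, W)
    for i in range(len(W)):
        for j in range(len(W[0])):
            # Push toward the data's statistics and away from the model sample's statistics.
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    return W


rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(3)] for _ in range(4)]  # 4 visible, 3 hidden units
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # two toy binary "images"
for _ in range(200):
    for v in data:
        cd1_update(v, W, 0.1, rng)
```

Here a single Gibbs step stands in for the iterative sampling process; with deeper models and richer data, that sampler has to keep pace with weights that change on every update.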
link |
So when I had the idea for GANs,
link |
a lot of people thought that the discriminator
link |
would have more or less the same problem
link |
as the negative phase in the Boltzmann machine,
link |
that trying to train the discriminator in the inner loop,
link |
you just couldn't get it to keep up
link |
with the generator in the outer loop.
link |
And that would prevent it from
link |
converging to anything useful.
link |
Yeah, I share that intuition.
link |
But it turns out not to be the case.
link |
A lot of the time with machine learning algorithms,
link |
it's really hard to predict ahead of time
link |
how well they'll actually perform.
link |
You have to just run the experiment
link |
and see what happens.
link |
And I would say that even today I still don't have one factor
link |
I can put my finger on and say,
link |
this is why GANs worked for photo generation
link |
and deep Boltzmann machines didn't.
link |
There are a lot of theory papers showing that
link |
under some theoretical settings,
link |
the GAN algorithm does actually converge.
link |
But those settings are restricted enough
link |
that they don't necessarily explain the whole picture
link |
in terms of all the results that we see in practice.
link |
So taking a step back,
link |
in the same way as we talked about deep learning,
link |
can you tell me what generative adversarial networks are?
link |
Yeah, so generative adversarial networks
link |
are a particular kind of generative model.
link |
A generative model is a machine learning model
link |
that can train on some set of data.
link |
Like say you have a collection of photos of cats
link |
and you want to generate more photos of cats,
link |
or you want to estimate a probability distribution
link |
over cats so you can ask how likely it is
link |
that some new image is a photo of a cat.
link |
GANs are one way of doing this.
link |
Some generative models are good at creating new data.
link |
Other generative models are good
link |
at estimating that density function
link |
and telling you how likely particular pieces of data are
link |
to come from the same distribution as the training data.
link |
GANs are more focused on generating samples
link |
rather than estimating the density function.
link |
There are some kinds of GANs, like Flow-GAN, that can do both,
link |
but mostly GANs are about generating samples,
link |
generating new photos of cats that look realistic.
link |
And they do that completely from scratch.
link |
It's analogous to human imagination
link |
when a GAN creates a new image of a cat.
link |
It's using a neural network to produce a cat
link |
that has not existed before.
link |
It isn't doing something like compositing photos together.
link |
You're not literally taking the eye off of one cat
link |
and the ear off of another cat.
link |
It's more of this digestive process
link |
where the neural net trains on a lot of data
link |
and comes up with some representation
link |
of the probability distribution
link |
and generates entirely new cats.
link |
There are a lot of different ways
link |
of building a generative model.
link |
What's specific to GANs is that we have a two-player game
link |
in the game-theoretic sense.
link |
And as the players in this game compete,
link |
one of them becomes able to generate realistic data.
link |
The first player is called the generator.
link |
It produces output data, such as images, for example.
link |
And at the start of the learning process,
link |
it'll just produce completely random images.
link |
The other player is called the discriminator.
link |
The discriminator takes images as input
link |
and guesses whether they're real or fake.
link |
You train it both on real data,
link |
so photos that come from your training set,
link |
actual photos of cats.
link |
And you train it to say that those are real.
link |
You also train it on images
link |
that come from the generator network.
link |
And you train it to say that those are fake.
link |
As the two players compete in this game,
link |
the discriminator tries to become better
link |
at recognizing whether images are real or fake.
link |
And the generator becomes better
link |
at fooling the discriminator into thinking
link |
that its outputs are real.
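The alternating game can be sketched on a one-dimensional toy problem. This is a minimal sketch under stated assumptions, none of which come from the conversation: the "training set" is samples from a Gaussian with mean 4, the generator is the hypothetical linear map G(z) = a*z + b, the discriminator is logistic regression D(x) = sigmoid(w*x + c), gradients are derived by hand, and the generator uses the common "non-saturating" objective (maximize log D(G(z))).

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.05, batch=64, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a*z + b, fed with z ~ N(0, 1)
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w*x + c), its belief that x is real
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # toy "training set" samples
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # Discriminator step: gradient ascent on mean log D(real) + mean log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += (1.0 - d) * x
            gc += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            gw -= d * x
            gc -= d
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator step: gradient ascent on mean log D(G(z)) through the updated discriminator.
        ga = gb = 0.0
        for zi in z:
            d = sigmoid(w * (a * zi + b) + c)
            ga += (1.0 - d) * w * zi
            gb += (1.0 - d) * w
        a += lr * ga / batch
        b += lr * gb / batch
    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generator after training: G(z) = {a:.2f} * z + {b:.2f}")
```

Because this discriminator is linear in x, it can only detect a difference in means, so it pushes the generator's offset b toward the data mean (possibly oscillating around it) while giving no useful signal about the spread a; real GANs use far more expressive networks on both sides.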
link |
And you can analyze this through the language of game theory
link |
and find that there's a Nash equilibrium
link |
where the generator has captured
link |
the correct probability distribution.
link |
So in the cat example,
link |
it makes perfectly realistic cat photos.
link |
And the discriminator is unable to do better
link |
than random guessing,
link |
because all the samples coming from both the data
link |
and the generator look equally likely
link |
to have come from either source.
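For reference, the game is usually written as the minimax value function from the original 2014 GAN paper, together with the optimal discriminator for a fixed generator. When the generator's distribution matches the data distribution, the optimal discriminator outputs 1/2 everywhere, which is the "no better than random guessing" condition described above.

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad
p_g = p_{\text{data}} \;\Rightarrow\; D^*_G(x) = \tfrac{1}{2}, \quad V(D^*_G, G) = -\log 4
```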
link |
So do you ever sit back
link |
and does it just blow your mind that this thing works?
link |
So it's able to estimate the density function
link |
well enough to generate realistic images.
link |
I mean, do you ever sit back and think,
link |
how does this even work? This is quite incredible,
link |
especially where GANs have gone in terms of realism.
link |
Yeah, and not just to flatter my own work,
link |
but generative models,
link |
all of them have this property
link |
that if they really did what we asked them to do,
link |
they would do nothing but memorize the training data.
link |
Models that are based on maximizing the likelihood,
link |
the way that you obtain the maximum likelihood
link |
for a specific training set
link |
is you assign all of your probability mass
link |
to the training examples and nowhere else.
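The memorization point can be checked on a tiny discrete example. In this minimal sketch the string labels standing in for cat photos are hypothetical: for data over a finite set, the empirical distribution (all probability mass on the training examples) attains the highest possible training log-likelihood, by Gibbs' inequality, while a smoothed distribution that reserves mass for unseen examples scores lower.

```python
import math
from collections import Counter


def log_likelihood(data, q):
    # Sum of log q(x) over the examples; -inf if any example gets zero probability.
    total = 0.0
    for x in data:
        if q.get(x, 0.0) == 0.0:
            return float("-inf")
        total += math.log(q[x])
    return total


def empirical_distribution(data):
    # All probability mass on the observed training examples, none anywhere else.
    counts = Counter(data)
    n = len(data)
    return {x: k / n for x, k in counts.items()}


def smoothed_distribution(data, vocabulary, alpha=1.0):
    # Additive smoothing: reserves some mass for examples never seen in training.
    counts = Counter(data)
    denom = len(data) + alpha * len(vocabulary)
    return {x: (counts[x] + alpha) / denom for x in vocabulary}


train = ["cat_a", "cat_b", "cat_a", "cat_c"]
vocab = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e"]
q_emp = empirical_distribution(train)
q_smooth = smoothed_distribution(train, vocab)
print(log_likelihood(train, q_emp) > log_likelihood(train, q_smooth))  # True
print(q_emp.get("cat_d", 0.0))  # 0.0: a new, unseen "cat" is treated as impossible
```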
link |
For GANs, the game is played using a training set.
link |
So the way that you become unbeatable in the game
link |
is you literally memorize training examples.
link |
One of my former interns wrote a paper,
link |
his name is Vaishnavh Nagarajan,
link |
and he showed that it's actually hard
link |
for the generator to memorize the training data,
link |
hard in a statistical learning theory sense,
link |
that you can actually give reasons
link |
for why it would require quite a lot of learning steps
link |
and a lot of observations of different latent variables
link |
before you could memorize the training data.
link |
That still doesn't really explain
link |
why, when you produce samples that are new,
link |
you get compelling images
link |
rather than just garbage that's different
link |
from the training set.
link |
And I don't think we really have a good answer for that,
link |
especially if you think about
link |
how many possible images are out there
link |
and how few images the generative model sees during training.
link |
It seems just unreasonable
link |
that generative models create new images
link |
as well as they do, especially considering
link |
that we're basically training them to memorize
link |
rather than generalize.
link |
I think part of the answer is there's a paper
link |
called Deep Image Prior where they show
link |
that you can take a convolutional net
link |
and you don't even need to learn the parameters of it at all.
link |
You just use the model architecture.
link |
And it's already useful for things like inpainting images.
link |
I think that shows us that the convolutional network
link |
architecture captures something really important
link |
about the structure of images.
link |
And we don't need to actually use learning
link |
to capture all the information
link |
coming out of the convolutional net.
link |
That would imply that it would be much harder
link |
to make generative models in other domains.
link |
So far, we're able to make reasonable speech models
link |
and things like that.
link |
But to be honest, we haven't actually explored
link |
a whole lot of different data sets all that much.
link |
We don't, for example, see a lot of deep learning models
link |
of like biology data sets
link |
where you have lots of microarrays
link |
measuring the amount of different enzymes
link |
and things like that.
link |
So we may find that some of the progress
link |
that we've seen for images and speech turns out
link |
to really rely heavily on the model architecture.
link |
And we were able to do what we did for vision
link |
by trying to reverse engineer the human visual system.
link |
And maybe it'll turn out that we can't just
link |
use that same trick for arbitrary kinds of data.
link |
Right, so there's an aspect of the human vision system,
link |
the hardware of it that makes it,
link |
without learning, without cognition,
link |
just makes it really effective at detecting the patterns
link |
we see in the visual world.
link |
Yeah, that's really interesting.
link |
In a big-picture, quick overview, in your view,
link |
what types of GANs are there
link |
and what other generative models besides GANs are there?
link |
Yeah, so it's maybe a little bit easier to start
link |
with what kinds of generative models
link |
are there other than GANs.
link |
So most generative models are likelihood based
link |
where to train them, you have a model
link |
that tells you how much probability it assigns
link |
to a particular example,
link |
and you just maximize the probability assigned
link |
to all the training examples.
link |
It turns out that it's hard to design a model
link |
that can create really complicated images
link |
or really complicated audio waveforms
link |
and still have it be possible to estimate
link |
the likelihood function from a computational point of view.
link |
Most interesting models that you would just write
link |
down intuitively, it turns out that it's almost impossible
link |
to calculate the amount of probability
link |
they assign to a particular point.
link |
So there are a few different schools of generative models
link |
in the likelihood family.
link |
One approach is to very carefully design the model
link |
so that it is computationally tractable
link |
to measure the density it assigns to a particular point.
link |
So there are things like autoregressive models,
link |
like PixelCNN. Those basically break down
link |
the probability distribution into a product
link |
over every single feature.
link |
So for an image, you estimate the probability of each pixel
link |
given all of the pixels that came before it.
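Writing an image as pixels x_1, ..., x_n in some fixed order (raster order is assumed here) and x_{<i} for the pixels before position i, this factorization is the chain rule of probability:

```latex
p(\mathbf{x}) = \prod_{i=1}^{n} p\left(x_i \mid x_{<i}\right),
\qquad
\log p(\mathbf{x}) = \sum_{i=1}^{n} \log p\left(x_i \mid x_{<i}\right)
```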
link |
There are tricks where if you want to measure
link |
the density function, you can actually calculate
link |
the density for all these pixels more or less in parallel.
link |
Generating the image still tends to require you
link |
to go one pixel at a time, and that can be very slow.
link |
But there are, again, tricks for doing this
link |
in a hierarchical pattern where you can keep
link |
the runtime under control.
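The asymmetry between measuring density and generating shows up even in a toy model. This is a minimal sketch, assuming binary "pixels" and a hypothetical logistic-regression conditional for each position instead of PixelCNN's masked convolutions: `log_prob` only reads pixels that are already observed, so its conditionals do not depend on one another and could be computed in parallel, while `sample` has to fill pixels in one at a time.

```python
import itertools
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class TinyAutoregressive:
    """p(x) = prod_i p(x_i | x_<i) over binary pixels; each conditional is a logistic regression."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.bias = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # weights[i][j] is only read for j < i, which enforces the autoregressive ordering.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

    def conditional(self, i, x):
        # p(x_i = 1 | x_1..x_{i-1}); looks only at pixels before position i.
        return sigmoid(self.bias[i] + sum(self.weights[i][j] * x[j] for j in range(i)))

    def log_prob(self, x):
        # Density: every pixel is observed, so these n terms are independent of each other
        # and could be evaluated in parallel (this loop is just the simplest way to write it).
        total = 0.0
        for i in range(self.n):
            p = self.conditional(i, x)
            total += math.log(p if x[i] == 1 else 1.0 - p)
        return total

    def sample(self, rng):
        # Generation: pixel i depends on the pixels sampled before it, so this is sequential.
        x = [0] * self.n
        for i in range(self.n):
            x[i] = 1 if rng.random() < self.conditional(i, x) else 0
        return x


model = TinyAutoregressive(n=4)
total = sum(math.exp(model.log_prob(list(x))) for x in itertools.product([0, 1], repeat=4))
print(round(total, 6))  # 1.0: the factorized model is a properly normalized distribution
print(model.sample(random.Random(1)))
```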
link |
Is the quality of the images it generates,
link |
putting runtime aside, pretty good?
link |
They're reasonable, yeah.
link |
I would say a lot of the best results
link |
are from GANs these days, but it can be hard to tell
link |
how much of that is based on who's studying
link |
which type of algorithm, if that makes sense.
link |
The amount of effort invested in it.
link |
Yeah, or the kind of expertise.
link |
So a lot of people who've traditionally been excited
link |
about graphics or art and things like that
link |
have gotten interested in GANs.
link |
And to some extent, it's hard to tell,
link |
are GANs doing better because they have a lot of
link |
graphics and art experts behind them?
link |
Or are GANs doing better because
link |
they're more computationally efficient?
link |
Or are GANs doing better because
link |
they prioritize the realism of samples
link |
over the accuracy of the density function?
link |
I think all of those are potentially
link |
valid explanations, and it's hard to tell.
link |
So can you give a brief history of GANs
link |
from the 2014 paper onward?
link |
Yeah, so a few highlights.
link |
In the first paper, we just showed that
link |
GANs basically work.
link |
If you look back at the samples we had then,
link |
they look terrible.
link |
On the CIFAR-10 data set, you can't even
link |
see the objects in them.
link |
Your paper, sorry, you used CIFAR-10?
link |
We used MNIST, which is little handwritten digits.
link |
We used the Toronto Face Database,
link |
which is small grayscale photos of faces.
link |
We did have recognizable faces.
link |
My colleague Bing Xu put together
link |
the first GAN face model for that paper.
link |
We also had the CIFAR-10 data set,
link |
which is things like very small 32x32 pixel images
link |
of cars and cats and dogs.
link |
For that, we didn't get recognizable objects,
link |
but all the deep learning people back then
link |
were really used to looking at these failed samples
link |
and kind of reading them like tea leaves.
link |
And people who are used to reading the tea leaves
link |
recognize that our tea leaves at least look different.
link |
Maybe not necessarily better,
link |
but there was something unusual about them.
link |
And that got a lot of us excited.
link |
One of the next really big steps was LAPGAN
link |
by Emily Denton and Soumith Chintala at Facebook AI Research,
link |
where they actually got really good high resolution photos
link |
working with GANs for the first time.
link |
They had a complicated system
link |
where they generated the image starting at low res
link |
and then scaling up to high res,
link |
but they were able to get it to work.
link |
And then in 2015, I believe later that same year,
link |
Alec Radford and Soumith Chintala and Luke Metz
link |
published the DCGAN paper,
link |
which stands for Deep Convolutional GAN.
link |
It's kind of a nonunique name
link |
because these days basically all GANs
link |
and even some before that were deep and convolutional,
link |
but they just kind of picked a name for a really great recipe
link |
where, using only one model
link |
instead of a multistep process,
link |
they were able to actually generate realistic images of faces and things like that.
link |
That was sort of like the beginning
link |
of the Cambrian explosion of GANs.
link |
Once you had animals that had a backbone,
link |
you suddenly got lots of different versions of fish
link |
and four legged animals and things like that.
link |
So DCGAN became kind of the backbone
link |
for many different models that came out.
link |
Used as a baseline even still.
link |
And so from there, I would say there are some interesting things we've seen.
link |
There's a lot you can say about how just
link |
the quality of standard image generation GANs has increased,
link |
but what's maybe more interesting on an intellectual level
link |
is how the things you can use GANs for have also changed.
link |
One thing is that you can use them to learn classifiers
link |
without having to have class labels for every example
link |
in your training set.
link |
So that's called semi supervised learning.
link |
My colleague at OpenAI, Tim Salimans, who's at Brain now,
link |
wrote a paper called
link |
Improved Techniques for Training GANs.
link |
I'm a coauthor on this paper,
link |
but I can't claim any credit for this particular part.
link |
One thing he showed in the paper is that
link |
you can take the GAN discriminator and use it as a classifier
link |
that actually tells you this image is a cat,
link |
this image is a dog, this image is a car,
link |
this image is a truck.
link |
And so not just to say whether the image is real or fake,
link |
but if it is real to say specifically what kind of object it is.
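The discriminator-as-classifier idea can be sketched as an output layer. This minimal sketch assumes the K+1-way formulation described in Improved Techniques for Training GANs: the network emits K class logits, the extra "fake" class has its logit fixed at zero, and so the probability that an input is real is Z(x)/(Z(x)+1), where Z(x) is the sum of the exponentiated class logits. The function name and the example logits are hypothetical.

```python
import math


def semi_supervised_outputs(logits):
    """Turn K class logits into (class probabilities given real, probability of real).

    Assumes the K+1-way setup: the K real classes use the given logits and the
    extra "fake" class has its logit fixed at 0.
    """
    m = max(max(logits), 0.0)  # subtract the largest logit for numerical stability
    exp_real = [math.exp(l - m) for l in logits]
    exp_fake = math.exp(0.0 - m)
    z = sum(exp_real)
    p_real = z / (z + exp_fake)              # equals Z(x) / (Z(x) + 1), Z = sum_k exp(l_k)
    class_probs = [e / z for e in exp_real]  # p(class k | x, x is real)
    return class_probs, p_real


# Hypothetical logits for the classes (cat, dog, car) on one image.
probs, p_real = semi_supervised_outputs([2.0, 0.0, -1.0])
print([round(p, 3) for p in probs], round(p_real, 3))
```

Labeled images would train `class_probs` with an ordinary cross-entropy loss, while unlabeled real images and generated images only train `p_real`; that second, label-free signal is what lets the labeled set stay small.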
link |
And he found that you can train these classifiers
link |
with far fewer labeled examples
link |
than traditional classifiers.
link |
So if you supervise based on
link |
not just your discrimination ability,
link |
but also your ability to classify,
link |
you're going to converge much faster
link |
to being effective at being a discriminator.
link |
So for example, for the MNIST dataset,
link |
you want to look at an image of a handwritten digit
link |
and say whether it's a zero, a one, or two, and so on.
link |
To get down to less than 1% error,
link |
we required around 60,000 examples
link |
until maybe about 2014 or so.
link |
In 2016, with this semi supervised GAN project,
link |
Tim was able to get below 1% error
link |
using only 100 labeled examples.
link |
So that was about a 600x decrease
link |
in the amount of labels that he needed.
link |
He's still using more images than that,
link |
but he doesn't need to have each of them labeled as,
link |
you know, this one's a one, this one's a two,
link |
this one's a zero, and so on.
link |
Then for GANs
link |
to be able to generate recognizable objects,
link |
so objects from a particular class,
link |
you still need labeled data,
link |
because you need to know
link |
what it means to be a particular class, cat or dog.
link |
How do you think we can move away from that?
link |
Yeah, some researchers at Brain Zurich
link |
actually just released a really great paper
link |
on semi supervised GANs,
link |
where their goal isn't to classify,
link |
but to make recognizable objects
link |
despite not having a lot of labeled data.
link |
They were working off of DeepMind's BigGAN project,
link |
and they showed that they can match
link |
the performance of BigGAN
link |
using only 10%, I believe, of the labels.
link |
BigGAN was trained on the ImageNet data set,
link |
which is about 1.2 million images,
link |
and had all of them labeled.
link |
This latest project from Brain Zurich
link |
shows that they're able to get away with
link |
having about 10% of the images labeled.
link |
They do that essentially using a clustering algorithm,
link |
where the discriminator learns to assign
link |
the objects to groups,
link |
and then this understanding that objects can be grouped
link |
into similar types,
link |
helps it to form more realistic ideas
link |
of what should be appearing in the image,
link |
because it knows that every image it creates
link |
has to come from one of these archetypal groups,
link |
rather than just being some arbitrary image.
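One generic way to get "archetypal groups" without labels is to cluster feature vectors and use the cluster ids as stand-in labels. The sketch below is plain k-means with a deterministic farthest-point start; it is a swapped-in illustration, not the Brain Zurich paper's procedure, and the two-dimensional feature vectors are hypothetical (in practice they would come from a learned network such as the discriminator).

```python
import math


def kmeans_pseudo_labels(points, k, iters=20):
    """Give each feature vector a cluster id to use as a stand-in class label."""
    # Deterministic farthest-point start: begin at the first point, then repeatedly
    # add the point that is farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
print(kmeans_pseudo_labels(features, k=2))  # [0, 0, 1, 1]
```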
link |
If you train a GAN with no class labels,
link |
you tend to get things that look sort of like
link |
grass or water or brick or dirt,
link |
but without necessarily a lot going on in them.
link |
I think that's partly because if you look
link |
at a large ImageNet image,
link |
the object doesn't necessarily occupy the whole image,
link |
and so you learn to create realistic sets of pixels,
link |
but you don't necessarily learn
link |
that the object is the star of the show,
link |
and you want it to be in every image you make.
link |
Yeah, I've heard you talk about the horse-to-zebra
link |
CycleGAN mapping,
link |
and how it turns out, again,
link |
thought provoking that horses are usually on grass,
link |
and zebras are usually on drier terrain,
link |
so when you're doing that kind of generation,
link |
you're going to end up generating greener horses or whatever.
link |
So those are connected together.
link |
You're not able to segment,
link |
to be able to generate in a segmented way.
link |
So are there other types of games you come across
link |
in your mind that neural networks can play with each other
link |
to be able to solve problems?
link |
Yeah, the one that I spend most of my time on is in security.
link |
You can model most interactions as a game
link |
where there are attackers trying to break your system
link |
and defenders trying to build a resilient system.
link |
There's also domain adversarial learning,
link |
which is an approach to domain adaptation
link |
that looks really a lot like GANs.
link |
The authors had the idea before the GAN paper came out.
link |
Their paper came out a little bit later,
link |
and they were very nice and cited the GAN paper,
link |
but I know that they actually had the idea before it came out.
link |
Domain adaptation is when you want to train a machine learning model
link |
in one setting called a domain,
link |
and then deploy it in another domain later,
link |
and you would like it to perform well in the new domain,
link |
even though the new domain is different from how it was trained.
link |
So, for example, you might want to train
link |
on a really clean image dataset like ImageNet,
link |
but then deploy on users' phones,
link |
where the user is taking pictures in the dark
link |
and pictures while moving quickly
link |
and just pictures that aren't really centered
link |
or composed all that well.
link |
When you take a normal machine learning model,
link |
it often degrades really badly when you move to the new domain
link |
because it looks so different from what the model was trained on.
link |
Domain adaptation algorithms try to smooth out that gap,
link |
and the domain adversarial approach is based on
link |
training a feature extractor,
link |
where the features have the same statistics
link |
regardless of which domain you extracted them from.
link |
So, in the domain adversarial game,
link |
you have one player that's a feature extractor
link |
and another player that's a domain recognizer.
link |
The domain recognizer wants to look at the output
link |
of the feature extractor and guess which of the two domains
link |
the features came from.
link |
So, it's a lot like the real versus fake discriminator in GANs.
link |
And then the feature extractor,
link |
you can think of as loosely analogous to the generator in GANs,
link |
except what it's trying to do here
link |
is both fool the domain recognizer
link |
into not knowing which domain the data came from
link |
and also extract features that are good for classification.
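One common way to write this game, the formulation used for domain-adversarial neural networks (DANN), uses feature-extractor parameters θ_f, label-predictor parameters θ_y, domain-recognizer parameters θ_d, a classification loss L_y on the labeled data, a domain-classification loss L_d computed on both domains, and a trade-off weight λ:

```latex
E(\theta_f, \theta_y, \theta_d) = \mathcal{L}_y(\theta_f, \theta_y) - \lambda\, \mathcal{L}_d(\theta_f, \theta_d)

(\hat\theta_f, \hat\theta_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat\theta_d),
\qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f, \hat\theta_y, \theta_d)
```

Maximizing E over θ_d is the same as minimizing L_d, so the domain recognizer gets better at telling the domains apart, while the feature extractor keeps L_y low and pushes L_d up. In practice this is often implemented with a gradient reversal layer that multiplies the gradient flowing from the domain recognizer into the features by -λ.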
link |
So, at the end of the day, in the cases where it works out,
link |
you can actually get features that work about the same in both domains.
link |
Sometimes this has a drawback where,
link |
in order to make things work the same in both domains,
link |
it just gets worse at the first one.
link |
But there are a lot of cases where it actually
link |
works out well on both.
link |
So, do you think of GANs being useful in the context
link |
of data augmentation?
link |
Yeah, one thing you could hope for with GANs
link |
is you could imagine,
link |
I've got a limited training set
link |
and I'd like to make more training data
link |
to train something else like a classifier.
link |
You could train the GAN on the training set
link |
and then create more data
link |
and then maybe the classifier would perform better
link |
on the test set after training on this bigger GAN generated data set.
link |
So, that's the simplest version
link |
of something you might hope would work.
link |
I've never heard of that particular approach working,
link |
but I think there's some closely related things
link |
that I think could work in the future
link |
and some that actually already have worked.
link |
So, if we think a little bit about what we'd be hoping for
link |
if we use the GAN to make more training data,
link |
we're hoping that the GAN will generalize
link |
to new examples better than the classifier would have
link |
generalized if it was trained on the same data.
link |
And I don't know of any reason to believe
link |
that the GAN would generalize better than the classifier would.
link |
But what we might hope for is that the GAN
link |
could generalize differently from a specific classifier.
link |
So, one thing I think is worth trying
link |
that I haven't personally tried, but someone could try is
link |
what if you trained a whole lot of different generative models
link |
on the same training set,
link |
create samples from all of them
link |
and then train a classifier on that.
link |
Because each of the generative models
link |
might generalize in a slightly different way,
link |
they might capture many different axes of variation
link |
that one individual model wouldn't.
link |
And then the classifier can capture all of those ideas
link |
by training on all of their data.
link |
So, it'd be a little bit like making an ensemble of classifiers.
link |
An ensemble of GANs in a way.
link |
I think that could generalize better.
link |
The other thing that GANs are really good for
link |
is not necessarily generating new data
link |
that's exactly like what you already have,
link |
but generating new data that has different properties
link |
from the data you already had.
link |
One thing that you can do is you can create
link |
differentially private data.
link |
So, suppose that you have something like medical records
link |
and you don't want to train a classifier on the medical records
link |
and then publish the classifier
link |
because someone might be able to reverse engineer
link |
some of the medical records you trained on.
link |
There's a paper from Casey Greene's lab
link |
that shows how you can train a GAN using differential privacy.
link |
And then the samples from the GAN
link |
still have the same differential privacy guarantees
link |
as the parameters of the GAN.
link |
So, you can make fake patient data
link |
for other researchers to use
link |
and they can do almost anything they want with that data
link |
because it doesn't come from real people.
link |
And the differential privacy mechanism
link |
gives you clear guarantees on how much
link |
the original people's data has been protected.
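The conversation doesn't say which training mechanism that paper uses. One widely used way to train a network with differential privacy is DP-SGD: clip every per-example gradient to a norm bound C, sum the clipped gradients, add Gaussian noise with standard deviation σC, and average. The sketch below shows only that aggregation step, with no privacy accounting, so it is an illustration under assumptions rather than the paper's method. The claim that GAN samples keep the same guarantee follows from the post-processing property of differential privacy: anything computed from a differentially private output, including samples drawn from the trained generator, is covered by the same guarantee.

```python
import math
import random


def clip(grad, max_norm):
    # Scale one per-example gradient so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add N(0, (noise_multiplier * max_norm)^2) noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical per-example gradients computed from real (sensitive) records.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Setting `noise_multiplier=0.0` reduces this to the clipped average, which is handy for checking the clipping; a real run uses a positive value and a privacy accountant to report the resulting privacy budget.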
link |
That's really interesting, actually.
link |
I haven't heard you talk about that before.
link |
In terms of fairness,
link |
I've seen your talk from AAAI.
link |
How can adversarial machine learning
link |
help models be more fair
link |
with respect to sensitive variables?
link |
Yeah. So, there's a paper from Amos Storkey's lab
link |
about how to learn machine learning models
link |
that are incapable of using specific variables.
link |
So, say, for example, you wanted to make predictions
link |
that are not affected by gender.
link |
It isn't enough to just leave gender
link |
out of the input to the model.
link |
You can often infer gender from a lot of other characteristics.
link |
Like, say that you have the person's name,
link |
but you're not told their gender.
link |
Well, if their name is Ian, they're kind of obviously a man.
link |
So, what you'd like to do is make a machine learning model
link |
that can still take in a lot of different attributes
link |
and make a really accurate informed prediction,
link |
but be confident that it isn't reverse engineering gender
link |
or another sensitive variable internally.
link |
You can do that using something very similar
link |
to the domain adversarial approach,
link |
where you have one player that's a feature extractor
link |
and another player that's a feature analyzer.
link |
And you want to make sure that the feature analyzer
link |
is not able to guess the value of the sensitive variable
link |
that you're trying to keep private.
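The "feature analyzer" side of this game also suggests a simple check: train an adversary to predict the sensitive variable from the extracted features and see whether it beats chance. This is a minimal sketch, assuming a logistic-regression adversary fit by gradient descent and hypothetical one-dimensional features; training the feature extractor against the adversary is not shown.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def adversary_accuracy(features, sensitive, steps=200, lr=0.5):
    """Fit a logistic-regression adversary for a 0/1 sensitive attribute; return its accuracy."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, s in zip(features, sensitive):
            err = s - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        # Gradient ascent on the mean log-likelihood of the sensitive attribute.
        w = [wi + lr * g / n for wi, g in zip(w, gw)]
        b += lr * gb / n
    preds = [1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0 for x in features]
    return sum(p == s for p, s in zip(preds, sensitive)) / n


sensitive = [0, 0, 1, 1]
leaky = [[-1.0], [-1.0], [1.0], [1.0]]    # feature encodes the sensitive variable
censored = [[1.0], [-1.0], [1.0], [-1.0]]  # feature varies, but independently of it
print(adversary_accuracy(leaky, sensitive), adversary_accuracy(censored, sensitive))  # 1.0 0.5
```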
link |
Right. Yeah, I love this approach.
link |
So, from the features, you're not able to infer
link |
the sensitive variables.
link |
It's brilliant. It's quite brilliant and simple, actually.
link |
Another way I think that GANs in particular
link |
could be used for fairness would be
link |
to make something like a CycleGAN,
link |
where you can take data from one domain
link |
and convert it into another.
link |
We've seen CycleGAN turning horses into zebras.
link |
We've seen other unsupervised GANs made by Ming-Yu Liu
link |
doing things like turning day photos into night photos.
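For the CycleGAN case, with a generator G from domain X to domain Y (say, horses to zebras) and a reverse generator F from Y to X, the cycle-consistency loss from the CycleGAN paper asks each translation to be undoable, and it is added to the two adversarial losses with a weight λ:

```latex
\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
```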
link |
I think for fairness, you could imagine
link |
taking records for people in one group
link |
and transforming them into analogous people in another group
link |
and testing to see if they're treated equitably
link |
across those two groups.
link |
There are a lot of things that would be hard to get right,
link |
like making sure that the conversion process itself is fair.
link |
And I don't think it's anywhere near something
link |
that we could actually use yet.
link |
But if you could design that conversion process very carefully,
link |
it might give you a way of doing audits
link |
where you say, what if we took people from this group,
link |
converted them into equivalent people in another group?
link |
Does the system actually treat them how it ought to?
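The shape of such an audit can be written as a small harness. In this minimal sketch, `convert` stands in for the carefully designed group-to-group conversion (a CycleGAN-style model in the scenario above); the `swap_group` converter, the record fields, and both scoring models are hypothetical. A plain attribute swap is exactly the kind of naive conversion that is hard to get right, so this only shows the structure of the check, not a valid fairness audit.

```python
def counterfactual_audit(records, convert, model, tolerance=1e-6):
    """Compare the model's score on each record with its score on the converted counterpart."""
    gaps = [abs(model(r) - model(convert(r))) for r in records]
    flagged = [r for r, g in zip(records, gaps) if g > tolerance]
    return max(gaps), flagged


def swap_group(record):
    # Hypothetical stand-in converter: keeps every other field fixed and flips the group.
    other = {"A": "B", "B": "A"}[record["group"]]
    return {**record, "group": other}


def fair_model(r):
    return 0.01 * r["income"] + 0.1 * r["years"]


def biased_model(r):
    return fair_model(r) - (0.2 if r["group"] == "B" else 0.0)


records = [{"income": 50, "years": 3, "group": "A"}, {"income": 80, "years": 1, "group": "B"}]
print(counterfactual_audit(records, swap_group, fair_model)[0])         # 0.0
print(len(counterfactual_audit(records, swap_group, biased_model)[1]))  # 2
```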
link |
That's also really interesting.
link |
You know, in popular press
link |
and in general, in our imagination,
link |
you think, well, GANs are able to generate data
link |
and you start to think about deep fakes
link |
or being able to sort of maliciously generate data
link |
that fakes the identity of other people.
link |
Is this something of a concern to you?
link |
Is this something, if you look 10, 20 years into the future,
link |
is that something that pops up in your work,
link |
in the work of the community that's working on generative models?
link |
I'm a lot less concerned about 20 years from now
link |
than the next few years.
link |
I think there will be a kind of bumpy cultural transition
link |
as people encounter this idea
link |
that there can be very realistic videos and audio that aren't real.
link |
I think 20 years from now,
link |
people will mostly understand that you shouldn't believe
link |
something is real just because you saw a video of it.
link |
People will expect to see that it's been cryptographically signed
link |
or have some other mechanism to make them believe
link |
that the content is real.
link |
There's already people working on this,
link |
like there's a startup called Truepic
link |
that provides a lot of mechanisms for authenticating
link |
that an image is real.
link |
They're maybe not quite up to having a state actor
link |
try to evade their verification techniques,
link |
but it's something that people are already working on
link |
and I think will get right eventually.
link |
So you think authentication will eventually win out?
link |
So being able to authenticate that this is real and this is not?
link |
As opposed to GANs just getting better and better
link |
or generative models being able to get better and better
link |
to where the nature of what is real is normal.
link |
I don't think we'll ever be able to look at the pixels of a photo
link |
and tell you for sure that it's real or not real,
link |
and I think it would actually be somewhat dangerous
link |
to rely on that approach too much.
link |
If you make a really good fake detector
link |
and then someone's able to fool your fake detector
link |
and your fake detector says this image is not fake,
link |
then it's even more credible
link |
than if you've never made a fake detector in the first place.
link |
What I do think we'll get to is systems
link |
that we can kind of use behind the scenes
link |
to make estimates of what's going on
link |
and maybe not use them in court for a definitive analysis.
link |
I also think we will likely get better authentication systems
link |
where, imagine that every phone cryptographically
link |
signs everything that comes out of it.
link |
You wouldn't be able to conclusively tell
link |
that an image was real,
link |
but you would be able to tell that somebody who knew
link |
the appropriate private key for this phone
link |
was actually able to sign this image
link |
and upload it to this server at this time stamp.
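The sign-then-verify flow can be sketched with textbook RSA and the Python standard library. This is a toy under loud assumptions: two small Mersenne primes, no padding, a SHA-256 hash of the image bytes plus a timestamp reduced modulo N, and a made-up timestamp format. It is insecure by design; a real system would use a standard signature scheme from a vetted cryptography library, with the private key held in hardware as described above.

```python
import hashlib

# TOY ONLY: textbook RSA with tiny, unpadded keys, used to show the flow, not for security.
P, Q = 2**61 - 1, 2**89 - 1  # two Mersenne primes
N = P * Q                    # public modulus
E = 65537                    # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (stays on the phone); needs Python 3.8+


def digest(image_bytes, timestamp):
    # Hash the image together with the capture/upload timestamp, reduced modulo N.
    h = hashlib.sha256(image_bytes + b"|" + timestamp.encode()).digest()
    return int.from_bytes(h, "big") % N


def sign(image_bytes, timestamp):
    # Only a holder of the private exponent D can produce this value.
    return pow(digest(image_bytes, timestamp), D, N)


def verify(image_bytes, timestamp, signature):
    # Anyone with the public key (N, E) can check the signature.
    return pow(signature, E, N) == digest(image_bytes, timestamp)


photo = b"example image bytes"
stamp = "2020-01-01T00:00:00Z"
sig = sign(photo, stamp)
print(verify(photo, stamp, sig))            # True
print(verify(photo + b"edit", stamp, sig))  # False: any change to the image breaks the check
```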
link |
You could imagine making phones that have the private keys embedded in hardware. If a state security agency really wants to infiltrate the company, they could probably plant a private key of their choice, or break open the chip and learn the private key, or something like that. But it would make it a lot harder for an adversary with fewer resources to fake things. For most of us, it would be okay.
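As a rough illustration of the phone-signing idea Ian describes, here is a minimal Python sketch using only the standard library. It swaps in Lamport one-time signatures, a simple hash-based public-key scheme, because Python's standard library has no asymmetric signature API; a real device would more likely use an elliptic-curve scheme such as ECDSA or Ed25519 with the key held in secure hardware. The record layout (image hash followed by a timestamp) and all function names are hypothetical, and each Lamport key pair may safely sign only one message.

```python
import hashlib
import secrets

# Lamport one-time signatures built from SHA-256 (standard library only).
# Each private key must be used to sign at most ONE message.
HASH_BITS = 256


def keygen():
    """Return (private_key, public_key): 256 pairs of secrets and their hashes."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(HASH_BITS)]
    public_key = [(hashlib.sha256(s0).digest(), hashlib.sha256(s1).digest())
                  for s0, s1 in private_key]
    return private_key, public_key


def digest_bits(message):
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    digest = hashlib.sha256(message).digest()
    return [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]


def sign(private_key, message):
    """Reveal one secret from each pair, selected by the message digest's bits."""
    return [pair[bit] for pair, bit in zip(private_key, digest_bits(message))]


def verify(public_key, message, signature):
    """Accept only if each revealed secret hashes to the matching public value."""
    if len(signature) != HASH_BITS:
        return False
    checks = zip(signature, public_key, digest_bits(message))
    return all(hashlib.sha256(secret).digest() == pair[bit]
               for secret, pair, bit in checks)


def capture_record(image_bytes, timestamp):
    """Hypothetical record: SHA-256 of the image followed by a UTC timestamp."""
    return hashlib.sha256(image_bytes).digest() + timestamp.encode("utf-8")


if __name__ == "__main__":
    device_private, device_public = keygen()  # imagine this lives in the phone's chip
    record = capture_record(b"raw sensor bytes", "1970-01-01T00:00:00Z")
    signature = sign(device_private, record)
    print(verify(device_public, record, signature))  # True
    edited = capture_record(b"edited sensor bytes", "1970-01-01T00:00:00Z")
    print(verify(device_public, edited, signature))  # False
```

As Ian notes, this proves only that someone holding the device's private key signed that record; it says nothing about whether the scene in front of the camera was real.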
You mentioned the beer and the bar and the new ideas. You were able to come up with this new idea pretty quickly and implement it pretty quickly. Do you think there are still many such groundbreaking ideas in deep learning that could be developed so quickly?
Yeah, I do think that there are a lot of ideas that can be developed really quickly. GANs were probably a bit of an outlier in terms of the one-hour time scale.
But for low-resource ideas, where you do something really different on a small scale and get a big payback, I think it's less likely that you'll see that in core machine learning technologies like a better classifier, a better reinforcement learning algorithm, or a better generative model.
If I had the GAN idea today, it would be a lot harder to prove that it was useful than it was back in 2014, because I would need to get it running on something like ImageNet or CelebA at high resolution. Those take a while to train. You couldn't train it in an hour and know that it was something really new and exciting. Back in 2014, training on MNIST was enough. But there are other areas of machine learning where I think a new idea could actually be developed really quickly with low resources.
What's your intuition about what areas of machine learning are ripe for this?
Yeah, so I think fairness and interpretability are areas where we just really don't have any idea how anything should be done yet. For interpretability, I don't think we even have the right definitions. Even just defining a really useful concept, without running any experiments, could have a huge impact on the field.
We've seen that, for example, in differential privacy: Cynthia Dwork and her collaborators made a technical definition of privacy where before a lot of things were really mushy, and with that definition you could actually design randomized algorithms for accessing databases and guarantee that they preserved individual people's privacy in a mathematical, quantitative sense.
Right now, we all talk a lot about how interpretable different machine learning algorithms are, but it's really just people's opinion, and everybody probably has a different idea of what interpretability means in their head. If we could define some concept related to interpretability that's actually measurable, that would be a huge leap forward, even without a new algorithm that increases that quantity.
And once we had the definition of differential privacy, it was fast to get the algorithms that guaranteed it. So you could imagine that once we have definitions of good concepts in interpretability, we might be able to provide algorithms with interpretability guarantees quickly, too.
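To make the differential privacy point concrete: under Dwork and colleagues' definition, a randomized algorithm M is ε-differentially private if, for any two databases D and D' that differ in one person's record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The sketch below shows the standard Laplace mechanism for a counting query, one of the simple algorithms that follow from that definition. It is a textbook illustration, not production code: the example records are hypothetical, and real libraries also guard against floating-point subtleties in noise sampling.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale): the difference of two i.i.d. exponentials
    with mean `scale` has exactly this distribution."""
    rate = 1.0 / scale  # random.expovariate takes a rate (1 / mean)
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Epsilon-differentially private count via the Laplace mechanism.

    Adding or removing one person's record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 58, 62, 19, 44]  # hypothetical database records
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    # Note: each release spends privacy budget; under basic composition,
    # these three releases together cost the sum of their epsilons.
    for epsilon in (0.1, 1.0, 10.0):
        print(epsilon, private_count(ages, lambda age: age >= 40, epsilon, rng))
```

Here the true count is 4, and each released answer is 4 plus noise whose average absolute size is 1/ε, which is the quantitative privacy/accuracy trade-off the definition makes explicit.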
What do you think it takes to build a system with human-level intelligence, as we quickly venture into the philosophical? So, artificial general intelligence: what do you think it takes?
I think that it definitely takes better environments than we currently have for training agents, since we want them to have a really wide diversity of experiences. I also think it's going to take really a lot of computation. It's hard to imagine exactly how much.
So you're optimistic about simulation, simulating a variety of environments, as the path forward, as opposed to operating in the real world?
I think it's a necessary ingredient. I don't think that we're going to get to artificial general intelligence by training on fixed datasets or by thinking really hard about the problem. I think that the agent really needs to interact and have a variety of experiences within the same lifespan.
Today we have many different models that can each do one thing, and we tend to train them on one dataset or one RL environment. There are actually some papers about getting one set of parameters to perform well in many different RL environments, but we don't really have anything like an agent that goes seamlessly from one type of experience to another and really integrates all the different things that it does over the course of its life.
When we do see multiagent environments, they tend to be similar environments; all of them are playing an action-based video game. We don't really have an agent that goes from playing a video game to reading the Wall Street Journal to predicting how effective a molecule will be as a drug, or something like that.
What do you think is a good test for intelligence, in your view? There have been a lot of benchmarks, starting with Alan Turing's, natural conversation being a good benchmark for intelligence. What would a system have to accomplish to make you, Ian Goodfellow, sit back and be really damn impressed?
Something that doesn't take a lot of glue from human engineers. Imagine that instead of having to go to the CIFAR website, download CIFAR-10, and then write a Python script to parse it and all that, you could just point an agent at the CIFAR-10 problem, and it downloads and extracts the data, trains a model, and starts giving you predictions. I feel like something that doesn't need to have every step of the pipeline assembled for it definitely understands what it's doing.
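For a sense of the "glue" Ian means, here is a minimal sketch of the hand-written parsing step for the CIFAR-10 Python version, assuming the archive has already been downloaded and extracted to a local directory. It follows the format described on the CIFAR-10 page: each batch file is a pickled dict whose b"data" rows hold 3,072 uint8 values (a 32x32 red plane, then green, then blue) and whose b"labels" entry lists the class indices. Unpickling the real files requires NumPy to be installed, and the helper names are hypothetical.

```python
import pickle
from pathlib import Path

SIDE = 32
PLANE = SIDE * SIDE  # 1,024 values per color channel


def split_channels(row):
    """Split one 3,072-value CIFAR-10 row into (red, green, blue) planes.

    Each plane is 1,024 bytes holding a 32x32 image in row-major order.
    """
    flat = bytes(row)  # accepts a NumPy uint8 row or a plain bytes object
    if len(flat) != 3 * PLANE:
        raise ValueError(f"expected {3 * PLANE} values, got {len(flat)}")
    return flat[:PLANE], flat[PLANE:2 * PLANE], flat[2 * PLANE:]


def load_batch(path):
    """Load one pickled batch file and return (images, labels)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")  # dict keys come back as bytes
    images = [split_channels(row) for row in batch[b"data"]]
    return images, list(batch[b"labels"])


def load_training_set(root):
    """Concatenate data_batch_1 through data_batch_5 from the extracted folder."""
    images, labels = [], []
    for i in range(1, 6):
        batch_images, batch_labels = load_batch(Path(root) / f"data_batch_{i}")
        images.extend(batch_images)
        labels.extend(batch_labels)
    return images, labels
```

Pointed at the extracted cifar-10-batches-py directory, load_training_set should return the 50,000 training images and labels; every step here (where to download, which files to read, how the bytes are laid out) is knowledge a human had to supply, which is exactly the scaffolding Ian would like an agent to work out for itself.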
Is AutoML moving in that direction, or are you thinking even bigger?
AutoML has mostly been moving toward: once we've built all the glue, can the machine learning system design the architecture really well?
I'm more saying that if something knows how to preprocess the data so that it successfully accomplishes the task, then it would be very hard to argue that it doesn't truly understand the task in some fundamental sense. I don't necessarily know that that's the philosophical definition of intelligence, but it's something that would be really cool to build, that would be really useful, and that would impress me and convince me that we've made a step forward in real AI.
You give it the URL for Wikipedia and then, the next day, expect it to be able to solve CIFAR-10. Or you type in a paragraph explaining what you want it to do, and it figures out what web searches it should run and downloads all the necessary ingredients.
So you have a very clear, calm way of speaking, no ums, easy to edit. I've seen comments in which both you and I have been identified as potentially being robots. If you had to prove to the world that you are indeed human, how would you do it?
I can understand thinking that I'm a robot. It's the flip side of the Turing test, I think.
Yeah, the prove-you're-human test. Intellectually, is there something that's truly unique in your mind, or does it go back to just natural language again, just being able to talk your way out of it?
So proving that I'm not a robot with today's technology, that's pretty straightforward. My conversation today hasn't veered off into talking about the stock market or something because of my training data. But I guess more generally, trying to prove that something is real from the content alone is incredibly hard. That's one of the main things I've gotten out of my GAN research: you can simulate almost anything, and so you have to really step back to a separate channel to prove that something is real. So I guess I should have had myself stamped on a blockchain when I was born or something, but I didn't do that. So according to my own research methodology, there's just no way to know at this point.
So, last question: what problem stands out for you that you're really excited about tackling in the near future?
I think resistance to adversarial examples, figuring out how to make machine learning secure against an adversary who wants to interfere with it and control it, is one of the most important things researchers today could solve.
In all domains: image, language, driving, and everything?
I guess I'm most concerned about domains we haven't really encountered yet. Imagine 20 years from now, when we're using advanced AIs to do things we haven't even thought of yet. If you had asked people in 2002 what the important problems in phone security were, I don't think we would have anticipated that we'd be using phones for nearly as many things as we're using them for today. I think it's going to be like that with AI: you can kind of try to speculate about where it's going, but the business opportunities that end up taking off will be hard to predict ahead of time.
What you can predict ahead of time is that, for almost anything you can do with machine learning, you would like to make sure that people can't get it to do what they want rather than what you want just by showing it a funny QR code or a funny input pattern.
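The "funny input pattern" problem can be shown with the fast gradient sign method (FGSM) from Goodfellow, Shlens and Szegedy (2014). This minimal sketch applies it to a hand-set logistic-regression model rather than a deep network; the weights and input are made-up numbers chosen so the effect is easy to see. For cross-entropy loss, the gradient with respect to the input is (p - y) * w, and FGSM moves every feature by ε in the sign of that gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model (w, b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign_of(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Fast gradient sign method against a logistic-regression model.

    With cross-entropy loss, d(loss)/dx = (p - y) * w, so each feature is
    moved by epsilon in whichever direction increases the loss.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign_of(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [1.0, -2.0, 0.5, 3.0], 0.0  # hypothetical trained weights
    x, y = [0.2, -0.1, 0.4, 0.1], 1    # an input the model gets right
    x_adv = fgsm(w, b, x, y, epsilon=0.3)
    print(predict_proba(w, b, x) > 0.5)      # True: classified as 1
    print(predict_proba(w, b, x_adv) > 0.5)  # False: flipped to 0
```

With these numbers the clean score w·x + b is 0.9; FGSM changes no feature by more than 0.3, yet it lowers the score by ε times the L1 norm of w, 0.3 × 6.5 = 1.95, to -1.05, so the prediction flips.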
Do you think the methodology to do that can be bigger than any one domain?
It's not a specific methodology, but one category of solutions that I'm excited about today is making dynamic models that change every time they make a prediction.
Right now, we tend to train models and then, after they're trained, we freeze them. We just use the same rule to classify everything that comes in from then on. That's really a sitting duck from a security point of view.
If you always output the same answer for the same input, then people can just run inputs through until they find a mistake that benefits them, and then they use the same mistake over and over and over again. I think having a model that updates its predictions so that it's harder to predict what you're going to get will make it harder for an adversary to really take control of the system and make it do what they want it to do.
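One very simple instance of the "dynamic model" category, offered as an illustration rather than the specific approach Ian has in mind, is to answer each query with a randomly chosen member of an ensemble. The toy sketch below uses hypothetical one-feature threshold classifiers to show the effect he describes: a mistake found against a frozen model replays every time, while against the randomized one it only works some of the time. Randomization alone is not a robust defense, since an attacker can often average over the randomness.

```python
import random


class ThresholdModel:
    """Toy one-feature classifier: predicts 1 when x exceeds its threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x > self.threshold)


class RandomizedEnsemble:
    """Answers each query with a randomly chosen member, so repeating the
    same input does not guarantee the same output."""

    def __init__(self, members, rng):
        self.members = list(members)
        self.rng = rng

    def predict(self, x):
        return self.rng.choice(self.members).predict(x)


def replay_success_rate(model, x, target, trials=1000):
    """Fraction of repeated submissions of x that return the attacker's target."""
    return sum(model.predict(x) == target for _ in range(trials)) / trials


if __name__ == "__main__":
    # Suppose the true label of x = 0.55 is 0, and the attacker wants 1.
    frozen = ThresholdModel(0.5)
    ensemble = RandomizedEnsemble(
        [ThresholdModel(t) for t in (0.4, 0.5, 0.6, 0.7)], random.Random(0))
    print(replay_success_rate(frozen, 0.55, target=1))    # 1.0: mistake always replays
    print(replay_success_rate(ensemble, 0.55, target=1))  # roughly 0.5
```

Only two of the four members place 0.55 above their threshold, so the replayed mistake succeeds about half the time instead of every time.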
Yeah, models that maintain a bit of a sense of mystery about them, because they always keep changing.
Ian, thanks so much for talking today. It was awesome.
Thank you for coming in. It's great to see you.