
Gavin Miller: Adobe Research | Lex Fridman Podcast #23



link |
00:00:00.000
The following is a conversation with Gavin Miller, he's the head of Adobe Research.
link |
00:00:04.720
Adobe has empowered artists, designers, and creative minds from all professions
link |
00:00:09.040
working in the digital medium for over 30 years with software such as Photoshop,
link |
00:00:13.680
Illustrator, Premiere, After Effects, InDesign, Audition, software that work with images,
link |
00:00:19.440
video, and audio. Adobe Research is working to define the future evolution of these products
link |
00:00:25.200
in a way that makes the life of creatives easier, automates the tedious tasks, and gives more and
link |
00:00:30.560
more time to operate in the idea space instead of pixel space. This is where the cutting edge,
link |
00:00:36.560
deep learning methods of the past decade can really shine more than perhaps any other application.
link |
00:00:42.160
Gavin is the embodiment of combining tech and creativity. Outside of Adobe Research,
link |
00:00:47.840
he writes poetry and builds robots, both things that are near and dear to my heart as well.
link |
00:00:53.600
This conversation is part of the Artificial Intelligence Podcast. If you enjoy it,
link |
00:00:58.080
subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman spelled F R I D.
link |
00:01:05.360
And now, here's my conversation with Gavin Miller.
link |
00:01:10.400
You're head of Adobe Research, leading a lot of innovative efforts and applications of AI,
link |
00:01:15.280
creating images, video, audio, language, but you're also yourself an artist, a poet,
link |
00:01:22.640
a writer, and even a roboticist. So, while I promised to everyone listening that I will not
link |
00:01:29.280
spend the entire time we have together reading your poetry, which I love, I have to sprinkle it
link |
00:01:34.880
in at least a little bit. So, some of them are pretty deep and profound and some are light and
link |
00:01:40.000
silly. Let's start with a few lines from the silly variety. You write in Je Ne Vinaigrette Rien,
link |
00:01:49.520
a poem that beautifully parodies both Edith Piaf's Je Ne Regrette Rien and My Way by
link |
00:01:56.000
Frank Sinatra. So, it opens with, and now dessert is near. It's time to pay the final total.
link |
00:02:04.960
I've tried to slim all year, but my diets have been anecdotal. So,
link |
00:02:12.720
where does that love for poetry come from for you? And if we dissect your mind,
link |
00:02:16.720
how does it all fit together in the bigger puzzle of Dr. Gavin Miller?
link |
00:02:22.160
Oh, well, interesting you chose that one. That was a poem I wrote when I'd been to my doctor
link |
00:02:27.920
and he said you really need to lose some weight and go on a diet. And whilst the rational part
link |
00:02:32.960
of my brain wanted to do that, the irrational part of my brain was protesting and sort of
link |
00:02:37.520
embraced the opposite idea. I regret nothing hence.
link |
00:02:40.560
Yes, exactly. Taken to an extreme, I thought it would be funny. Obviously, it's a serious
link |
00:02:44.640
topic for some people. But I think for me, I've always been interested in writing since I was in
link |
00:02:52.560
high school, as well as doing technology and invention. And sometimes there are parallel
link |
00:02:57.360
strands in your life that carry on. And one is more about your private life and one's more about
link |
00:03:02.720
your technological career. And then at sort of happy moments along the way, sometimes the two
link |
00:03:09.120
things touch. One idea informs the other. And we can talk about that as we go.
link |
00:03:14.880
Do you think your writing, the art, the poetry contribute
link |
00:03:17.760
indirectly or directly to your research, to your work in Adobe?
link |
00:03:21.920
Well, sometimes it does if I say, imagine a future in a science fiction kind of way. And
link |
00:03:28.160
then once it exists on paper, I think, well, why shouldn't I just build that?
link |
00:03:31.760
There was an example where when realistic voice synthesis first started in the 90s at Apple,
link |
00:03:38.000
where I worked in research, it was done by a friend of mine. I sort of sat down and started
link |
00:03:43.360
writing a poem, where I would enter each line into the voice synthesizer and see how it sounded,
link |
00:03:48.640
and sort of wrote it for that voice. And at the time, the agents weren't very sophisticated. So
link |
00:03:55.280
they'd sort of add random intonation. And I kind of made up the poem to sort of
link |
00:04:00.160
match the tone of the voice. And it sounded slightly sad and depressed. So I pretended
link |
00:04:05.120
it was a poem written by an intelligent agent, sort of telling the user to go home and leave
link |
00:04:11.040
them alone. But at the same time, they were lonely and wanted to have company and learn
link |
00:04:14.400
from what the user was saying. And at the time, it was way beyond anything that AI could possibly
link |
00:04:19.520
do. But since then, it's becoming more within the bounds of possibility. And then
link |
00:04:28.000
at the same time, I had a project at home where I did sort of a smart home. This was probably 93,
link |
00:04:34.000
94. And I had the talking voice who'd remind me when I walked in the door of what things I had
link |
00:04:39.440
to do. I had buttons on my washing machine because I was a bachelor and I'd leave the clothes in
link |
00:04:43.920
there for three days and they got moldy. So as I got up in the morning, it would say,
link |
00:04:47.760
don't forget the washing and so on. I made photo albums that use light sensors to know which page
link |
00:04:55.600
you were looking at and would send that over wireless radio to the agent, who would then play sounds that
link |
00:05:00.400
match the image you were looking at in the book. So I was kind of in love with this idea of magical
link |
00:05:04.960
realism and whether it was possible to do that with technology. So that was a case where the sort
link |
00:05:10.480
of the agent sort of intrigued me from a literary point of view and became a personality. I think
link |
00:05:17.200
more recently, I've also written plays, and with plays you write dialogue, and obviously
link |
00:05:23.120
you write a fixed set of dialogue that follows a linear narrative. But with modern agents,
link |
00:05:28.240
as you design a personality or a capability for conversation, you're sort of thinking of,
link |
00:05:33.120
I kind of have imaginary dialogue in my head. And then I think, what would it take not only to have
link |
00:05:38.000
that be real, but for it to really know what it's talking about. So it's easy to fall into the
link |
00:05:43.600
uncanny valley with AI where it says something it doesn't really understand, but it sounds good to
link |
00:05:48.640
the person. But you rapidly realize that it's kind of just stimulus response. It doesn't really have
link |
00:05:55.360
real world knowledge about the thing it's describing. And so when you get to that point,
link |
00:06:01.520
it really needs to have multiple ways of talking about the same concept. So it sounds as though it
link |
00:06:05.760
really understands it. Now, what really understanding means is in the eye of the beholder, right? But
link |
00:06:11.360
if it only has one way of referring to something, it feels like it's a canned response. But if it
link |
00:06:15.760
can reason about it, or you can go at it from multiple angles and give a similar kind of
link |
00:06:20.720
response that people would, then it starts to seem more like there's something there that's sentient.
link |
00:06:28.080
You can say the same thing, multiple things from different perspectives. I mean, with the automatic
link |
00:06:33.600
image captioning that I've seen the work that you're doing, there's elements of that, right?
link |
00:06:37.680
Being able to generate different kinds of statements about the same picture.
link |
00:06:40.480
Right. So in my team, there's a lot of work on turning a medium from one form to another, whether it's auto tagging imagery or making up full sentences about what's in the image,
link |
00:06:52.080
then changing the sentence, finding another image that matches the new sentence or vice versa.
link |
00:06:58.000
And in the modern world of GANs, you sort of give it a description and it synthesizes an asset that matches the description.
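To make that text-to-image matching idea concrete, here is a minimal sketch that ranks images by similarity to a sentence in a joint text-image embedding. It is illustrative only; the sentence-transformers CLIP model and the solid-color stand-in images are assumptions, not Adobe's system.

```python
# Sketch of matching a sentence to images via a joint text-image embedding.
# Illustrative only; this is not Adobe's implementation. Assumes the
# sentence-transformers package with a CLIP-style model is installed.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # maps text and images into one embedding space

# Stand-in images (in practice these would be real photos or stock assets).
images = {
    "blue_field": Image.new("RGB", (224, 224), (40, 90, 200)),
    "green_field": Image.new("RGB", (224, 224), (40, 160, 60)),
}
image_embs = model.encode(list(images.values()))

query_emb = model.encode(["a clear blue sky"])
scores = util.cos_sim(query_emb, image_embs)[0]        # similarity per image
names = list(images.keys())
print(names[int(scores.argmax())])                     # likely "blue_field"; flat colors are a weak test
```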
link |
00:07:06.400
So I've sort of gone on a journey. My early days in my career were about 3D computer graphics, the sort of pioneering work, sort of before movies had special effects done with 3D graphics,
link |
00:07:17.600
and sort of rode that revolution. And that was very much like the Renaissance where people would model light and color and shape and everything.
link |
00:07:25.840
And now we're kind of in another wave where it's more impressionistic and it's sort of the idea of something can be used to generate an image directly, which is
link |
00:07:34.400
sort of the new frontier in computer image generation using AI algorithms.
link |
00:07:43.040
So the creative process is more in the space of ideas or becoming more in the space of ideas versus in the raw pixels?
link |
00:07:50.000
Well, it's interesting. It depends. I think at Adobe, we really want to span the entire range from really, really good, what you might call low level tools, and by low level I mean as close to, say, analog workflows as possible.
link |
00:08:02.240
So what we do there is we make up systems that do really realistic oil paint and watercolor simulations.
link |
00:08:08.720
So if you want every bristle to behave as it would in the real world and leave a beautiful analog trail of water and then flow after you've made the brushstroke, you can do that.
link |
00:08:18.800
And that's really important for people who want to create something really expressive or really novel because they have complete control.
link |
00:08:26.640
And then as certain other tasks become automated, it frees the artists up to focus on the inspiration and less of the perspiration.
link |
00:08:35.520
So thinking about different ideas, obviously. Once you finish the design, there's a lot of work to, say, do it for all the different aspect ratios of phones or websites and so on.
link |
00:08:48.880
And that used to take up an awful lot of time for artists.
link |
00:08:51.920
It still does for many, what we call content velocity. And one of the targets of AI is actually to reason, from the first example, about what the likely intent is for these other formats.
link |
00:09:03.600
Maybe if you change the language to German and the words are longer, how do you reflow everything so that it looks nicely artistic in that way?
link |
00:09:12.160
And so the person can focus on the really creative bit in the middle, which is what is the look and style and feel and what's the message and what's the story and the human element?
link |
00:09:21.840
So I think creativity is changing. So that's one way in which we're trying to just make it easier and faster and cheaper to do so that there can be more of it, more demand because it's less expensive.
link |
00:09:33.520
So everyone wants beautiful artwork for everything from a school website to a Hollywood movie.
link |
00:09:39.360
On the other side, as some of these things have automatic versions of them, people will possibly change role from being the hands on artisan to being either the art director or the conceptual artist.
link |
00:09:53.680
And then the computer will be a partner to help create polished examples of the idea that they're exploring.
link |
00:09:59.520
Let's talk about Adobe products, AI and Adobe products.
link |
00:10:02.880
Just so you know where I'm coming from, I'm a huge fan of Photoshop for images, Premiere for video, Audition for audio.
link |
00:10:12.480
I'll probably use Photoshop to create the thumbnail for this video, Premiere to edit the video, Audition to do the audio.
link |
00:10:19.680
That said, everything I do is really manual, and I set up, I use this old school Kinesis keyboard and I have AutoHotkey that just, it's really about optimizing the flow.
link |
00:10:32.640
Of just making sure there's as few clicks as possible, so just being extremely efficient, something you started to speak to.
link |
00:10:39.520
So before we get into the fun sort of awesome deep learning things, where does AI, if you could speak a little more to it, AI or just automation in general,
link |
00:10:50.160
do you see in the coming months and years, or in general, fitting into making the life, the low level pixel workflow, easier?
link |
00:11:04.000
Yeah, that's a great question.
link |
00:11:05.040
So we have a very rich array of algorithms already in Photoshop, just classical procedural algorithms as well as ones based on data.
link |
00:11:14.560
In some cases, they end up with a large number of sliders and degrees of freedom.
link |
00:11:20.160
So one way in which AI can help is just an auto button, which comes up with default settings based on the content itself rather than default values for the tool.
link |
00:11:29.120
At that point, you then start tweaking.
link |
00:11:31.840
So that's a very kind of make life easier for people whilst making use of common sense from other example images.
link |
00:11:39.520
So like smart defaults.
link |
00:11:40.960
Smart defaults, absolutely.
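As a toy illustration of a content-based smart default (not Photoshop's actual auto algorithm), the "auto" value can be derived from the image's own histogram rather than from a fixed constant; the percentile cutoffs below are arbitrary choices.

```python
# Minimal sketch of a content-aware "auto" default: instead of fixed slider
# values, derive black/white points from the image's own histogram
# (percentile-based auto-levels). Illustrative only, not Photoshop's algorithm.
import numpy as np

def auto_levels(image, low_pct=1.0, high_pct=99.0):
    """Return the image stretched so the given percentiles map to 0 and 255."""
    img = image.astype(np.float32)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:                      # flat image; nothing to stretch
        return image.copy()
    out = (img - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: defaults come from the content, then the user tweaks from there.
example = (np.random.rand(64, 64, 3) * 128 + 32).astype(np.uint8)
stretched = auto_levels(example)
print(stretched.min(), stretched.max())
```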
link |
00:11:42.480
Another one is something we've spent a lot of work over the last 20 years I've been at Adobe, or 19, thinking about selection, for instance,
link |
00:11:53.040
where, you know, with quick select, you would look at color boundaries and figure out how to sort of flood fill into regions that you thought were physically connected in the real world.
link |
00:12:03.920
But that algorithm had no visual common sense about what a cat looks like or a dog.
link |
00:12:08.720
It would just do it based on rules of thumb, which were applied to graph theory.
link |
00:12:12.880
And it was a big improvement over the previous work where you had sort of almost click everything by hand.
link |
00:12:19.120
Or if it just did similar colors, it would do little tiny regions that wouldn't be connected.
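A minimal sketch of that rule-of-thumb style of selection: grow a region from a clicked pixel to neighbors whose color is within a tolerance. The tolerance value and 4-connectivity here are arbitrary assumptions, not the quick select algorithm itself.

```python
# Sketch of a quick-select-style flood fill: grow a selection from a clicked
# pixel to neighbors whose color is close to the seed color. Illustrative of
# the "rules of thumb" approach described, not Photoshop's actual algorithm.
from collections import deque
import numpy as np

def flood_select(image, seed, tol=20.0):
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_color = image[seed].astype(np.float32)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if np.linalg.norm(image[ny, nx].astype(np.float32) - seed_color) <= tol:
                    mask[ny, nx] = True
                    queue.append((ny, nx))
    return mask

img = np.zeros((32, 32, 3), dtype=np.uint8)
img[8:24, 8:24] = 200                      # a bright square on a dark background
print(flood_select(img, (16, 16)).sum())   # selects the square, not the background
```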
link |
00:12:24.480
But in the future, using neural nets to actually do a great job with, say, a single click or even in the case of well known categories like people or animals,
link |
00:12:34.080
no click where you just say select the object and it just knows the dominant object is a person in the middle of the photograph.
link |
00:12:40.960
Those kinds of things are really valuable if they can be robust enough to give you good quality results or they can be a great start for like tweaking it.
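A hedged sketch of the neural, "no-click" version: run a pretrained semantic segmentation model and keep the person class as the selection mask. The torchvision model below is an off-the-shelf stand-in, not Adobe's Select Subject, and it assumes a recent torchvision install; the input file name is hypothetical.

```python
# Sketch of "no-click" subject selection: run a pretrained semantic
# segmentation model and keep the 'person' class as the mask. An off-the-shelf
# torchvision model is used as a stand-in for the real feature.
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def person_mask(path):
    img = Image.open(path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        out = model(batch)["out"][0]           # (num_classes, H, W) logits
    labels = out.argmax(0)
    return (labels == 15).cpu().numpy()        # 15 = 'person' in the Pascal VOC label set

# mask = person_mask("portrait.jpg")           # hypothetical input photo
```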
link |
00:12:51.920
So, for example, background removal.
link |
00:12:54.240
Correct.
link |
00:12:54.480
Like one thing I'll, in a thumbnail, I'll take a picture of you right now and essentially remove the background behind you.
link |
00:13:01.520
And I want to make that as easy as possible.
link |
00:13:04.080
You don't have flowing, like, rich hair at the moment.
link |
00:13:08.480
I had it in the past.
link |
00:13:10.480
It may come again in the future.
link |
00:13:13.680
So that sometimes makes it a little more challenging to remove the background.
link |
00:13:17.360
How difficult do you think is that problem for AI for basically making the quick selection tool smarter and smarter and smarter?
link |
00:13:25.040
Well, we have a lot of research on that already.
link |
00:13:26.960
If you want a sort of quick, cheap and cheerful, look, I'm pretending I'm in Hawaii, but it's sort of a joke, then you don't need perfect boundaries.
link |
00:13:36.240
And you can do that today with a single click with the algorithms we have.
link |
00:13:40.320
We have other algorithms where with a little bit more guidance on the boundaries, like you might need to touch it up a little bit.
link |
00:13:48.560
We have other algorithms that can pull a nice matte from a crude selection.
link |
00:13:53.200
So we have combinations of tools that can do all of that.
link |
00:13:57.440
And at our recent Max conference at Adobe Max, we demonstrated how very quickly, just by drawing a simple polygon around the object of interest,
link |
00:14:08.080
we could not only do it for a single still, but we could pull a matte, well, pull at least a selection mask from a moving target,
link |
00:14:16.880
like a person dancing in front of a brick wall or something.
link |
00:14:19.760
And so it's going from hours to a few seconds for workflows that are really nice, and then you might go in and touch up a little.
link |
00:14:28.560
So that's a really interesting question.
link |
00:14:30.480
You mentioned the word robust.
link |
00:14:31.520
You know, there's like a journey for an idea, right?
link |
00:14:36.240
And what you presented probably at Max has elements of just sort of, it inspires the concept, it can work pretty well in a majority of cases.
link |
00:14:45.680
But how do you make something that works, well, in a majority of cases, how do you make something that works maybe in all cases, or becomes a robust tool that can...
link |
00:14:56.240
Well, there are a couple of things.
link |
00:14:57.600
So that really touches on the difference between academic research and industrial research.
link |
00:15:02.960
So in academic research, it's really about who's the person to have the great new idea that shows promise.
link |
00:15:09.360
And we certainly love to be those people too.
link |
00:15:12.320
But we have sort of two forms of publishing.
link |
00:15:15.040
One is academic peer review, which we do a lot of, and we have great success there as much as some universities.
link |
00:15:22.880
But then we also have shipping, which is a different type of...
link |
00:15:26.640
And then we get customer review, as well as, you know, product critics.
link |
00:15:30.800
And that might be a case where it's not about being perfect every single time, but perfect enough of the time,
link |
00:15:39.440
plus a mechanism to intervene and recover where you do have mistakes.
link |
00:15:43.280
So we have the luxury of very talented customers.
link |
00:15:46.000
We don't want them to be overly taxed doing it every time.
link |
00:15:50.640
But if they can go in and just take it from 99 to 100 with the touch of a mouse or something,
link |
00:15:58.960
then for the professional end, that's something that we definitely want to support as well.
link |
00:16:03.840
And for them, it went from having to do that tedious task all the time to much less often.
link |
00:16:09.840
So I think that gives us an out. If it had to be 100% automatic all the time,
link |
00:16:15.920
then that would delay the time at which we could get to market.
link |
00:16:19.760
So on that thread, maybe you can untangle something.
link |
00:16:23.760
Again, I'm sort of just speaking to my own experience.
link |
00:16:28.960
Maybe that is the most useful.
link |
00:16:30.400
Absolutely.
link |
00:16:30.900
So I think Photoshop, as an example, or Premiere, has a lot of amazing features that I haven't touched.
link |
00:16:41.940
And so in terms of AI helping make my life or the life of creatives easier,
link |
00:16:52.740
this collaboration between human and machine, how do you learn to collaborate better?
link |
00:16:57.220
How do you learn the new algorithms?
link |
00:16:58.980
Is it something where you have to watch tutorials and you have to watch videos and so on?
link |
00:17:03.860
Or do you think about the experience itself through exploration, being the teacher?
link |
00:17:10.100
We absolutely do.
link |
00:17:11.220
So I'm glad that you brought this up.
link |
00:17:15.940
We sort of think about two things.
link |
00:17:17.860
One is helping the person in the moment to do the task that they need to do,
link |
00:17:21.060
but the other is thinking more holistically about their journey learning a tool.
link |
00:17:24.900
And it's like, think of it as Adobe University, where if you use the tool long enough, you become an expert.
link |
00:17:30.020
And not necessarily an expert in everything.
link |
00:17:32.100
It's like living in a city.
link |
00:17:33.140
You don't necessarily know every street, but you know the important ones you need to get to.
link |
00:17:38.180
So we have projects in research, which actually look at the thousands of hours of tutorials online
link |
00:17:42.900
and try to understand what's being taught in them.
link |
00:17:46.100
And then we had one publication at CHI where it was looking at,
link |
00:17:50.100
given the last three or four actions you did, what did other people in tutorials do next?
link |
00:17:54.900
So if you want some inspiration for what you might do next, or you just want to watch the tutorial and see,
link |
00:18:00.340
learn from people who are doing similar workflows to you,
link |
00:18:02.980
you can without having to go and search on keywords and everything.
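The "given your last few actions, what did people in tutorials do next" idea can be sketched as a toy n-gram lookup over recorded action sequences. The published system described here is much richer; the action names below are made up for illustration.

```python
# Toy sketch of "given the last few actions, what did people in tutorials do
# next?": an n-gram lookup over recorded action sequences. Illustrative only.
from collections import Counter, defaultdict

def build_model(sequences, n=3):
    nxt = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n):
            nxt[tuple(seq[i:i + n])][seq[i + n]] += 1
    return nxt

def suggest(model, recent, n=3, k=2):
    return [action for action, _ in model[tuple(recent[-n:])].most_common(k)]

tutorials = [
    ["open", "crop", "levels", "sharpen", "export"],
    ["open", "crop", "levels", "dodge", "export"],
    ["open", "select_subject", "mask", "levels", "export"],
]
model = build_model(tutorials)
print(suggest(model, ["open", "crop", "levels"]))   # e.g. ['sharpen', 'dodge']
```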
link |
00:18:06.580
So really trying to use the context of your use of the app to make intelligent suggestions,
link |
00:18:13.540
either about choices that you might make,
link |
00:18:16.340
or in a more assistive way, where it could say, if you did this next, we could show you.
link |
00:18:21.060
And that's basically the frontier that we're exploring now, which is,
link |
00:18:25.300
if we really deeply understand the domain in which designers and creative people work,
link |
00:18:30.660
can we combine that with AI and pattern matching of behavior to make intelligent suggestions,
link |
00:18:37.460
either through, you know, verbal,
link |
00:18:41.380
possibilities, or just showing the results of if you try this. And that's really the sort of,
link |
00:18:47.460
you know, I was in a meeting today thinking about these things.
link |
00:18:50.020
Well, it's still a grand challenge. You know, we'd all love an artist over one shoulder and a teacher over the other, right?
link |
00:18:57.060
And we hope to get there. And the right thing to do is to give enough at each stage that it's useful in itself,
link |
00:19:05.140
but it builds a foundation for the next stage.
link |
00:19:07.620
Give enough at each stage that it's useful in itself, but it builds a foundation for the next
link |
00:19:12.340
level of expectation.
link |
00:19:14.340
Are you aware of this gigantic medium of YouTube that's creating
link |
00:19:20.900
just a bunch of creative people, both artists and teachers of different kinds?
link |
00:19:26.180
Absolutely. And the more we can understand those media types, both visually and in terms of transcripts and
link |
00:19:32.660
words, the more we can bring the wisdom that they embody into the guidance that's embedded in the tool.
link |
00:19:38.100
That would be brilliant, to remove the barrier of having to type in the keywords yourself, searching and so on.
link |
00:19:45.220
Absolutely. And then in the longer term, an interesting discussion is, does it ultimately
link |
00:19:51.860
not just assist with learning the interface we have, but does it modify the interface to be simpler?
link |
00:19:56.820
Or do you fragment into a variety of tools, each of which has a different level of visibility of
link |
00:20:02.820
the functionality? I like to say that if you add a feature to a GUI, you have to have
link |
00:20:08.820
yet more visual complexity confronting the new user. Whereas if you have an assistant with a new skill,
link |
00:20:15.620
if you know they have it, so you know to ask for it, then it's sort of additive without being
link |
00:20:20.740
more intimidating. So we definitely think about new users and how to onboard them.
link |
00:20:25.380
Many actually value the idea of being able to master that complex interface and keyboard shortcuts
link |
00:20:31.780
like you were talking about earlier, because with great familiarity, it becomes a musical instrument
link |
00:20:37.060
for expressing your visual ideas. And other people just want to get something done quickly
link |
00:20:43.220
in the simplest way possible. And that's where a more assistive version of the same technology
link |
00:20:48.180
might be useful, maybe on a different class of device, which is more in context for capture, say.
link |
00:20:55.940
Whereas somebody who's in a deep post production workflow may want to be on a laptop or a big
link |
00:21:01.700
screen desktop and have more knobs and dials to really express the subtlety of what they want to do.
link |
00:21:12.180
So there's so many exciting applications of computer vision and machine learning
link |
00:21:16.260
that Adobe is working on, like scene stitching, sky replacement, foreground, background removal,
link |
00:21:23.300
spatial object based image search, automatic image captioning, like we mentioned, project cloak,
link |
00:21:28.980
project deep fill, filling in parts of the images, project scribbler, style transfer for video, style transfer for faces in video with project puppetron, best name ever. Can you talk through a favorite
link |
00:21:44.820
or some of them or examples that popped in mind? I'm sure I'll be able to provide links to other
link |
00:21:52.420
ones we don't talk about because there's visual elements to all of them that are exciting.
link |
00:21:58.900
Why they're interesting for different reasons might be a good way to go. So I think sky replace
link |
00:22:03.620
is interesting because we talked about selection being sort of an atomic operation. It's almost
link |
00:22:08.820
like if you think of an assembly language, it's like a single instruction. Whereas sky replace is
link |
00:22:15.700
a compound action where you automatically select the sky, you look for stock content that matches
link |
00:22:21.540
the geometry of the scene. You try to have variety in your choices so that you do coverage of different
link |
00:22:27.220
moods. It then mattes in the sky behind the foreground. But then importantly, it uses the
link |
00:22:34.980
foreground of the other image that you just searched on to recolor the foreground of the
link |
00:22:39.780
image that you're editing. So if you say go from a midday sky to an evening sky, it will actually
link |
00:22:47.540
add sort of an orange glow to the foreground objects as well.
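A crude, runnable sketch of that compound workflow, illustrative only and not the shipping sky replacement feature: estimate a sky mask, composite a new sky, then shift the foreground toward the new sky's tone. The brightness and saturation thresholds and the 0.2 tint strength are arbitrary assumptions.

```python
# Rough sketch of the compound "sky replace" action: estimate a sky mask,
# matte the new sky in behind the foreground, and recolor the foreground
# toward the new sky's tone (the "orange glow"). Real sky replacement uses
# learned segmentation and matting; this only illustrates the workflow.
import numpy as np

def replace_sky(photo, new_sky):
    # Very rough sky mask: bright, low-saturation pixels (a stand-in for a
    # learned sky segmenter).
    brightness = photo.mean(axis=2)
    saturation = photo.max(axis=2) - photo.min(axis=2)
    sky_mask = (brightness > 160) & (saturation < 40)

    out = photo.astype(np.float32).copy()
    out[sky_mask] = new_sky[sky_mask]                       # matte the new sky in

    # Shift the foreground colors toward the new sky's average tone.
    tint = new_sky.reshape(-1, 3).mean(axis=0) - photo.reshape(-1, 3).mean(axis=0)
    out[~sky_mask] = np.clip(out[~sky_mask] + 0.2 * tint, 0, 255)
    return out.astype(np.uint8)

photo = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
evening_sky = np.zeros_like(photo)
evening_sky[..., 0] = 230                                   # warm, orange-ish sky
evening_sky[..., 1] = 120
print(replace_sky(photo, evening_sky).shape)
```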
link |
00:22:51.620
I was a big fan in college of Magritte and he has a number of paintings where it's surrealism
link |
00:22:57.380
because he'll like do a composite, but the foreground building will be at night and the
link |
00:23:01.940
sky will be during the day. There's one called The Empire of Light, which was on my wall in college.
link |
00:23:06.500
And we're trying not to do surrealism. It can be a choice, but we'd rather have it be natural by
link |
00:23:13.380
default rather than it looking fake. And then you have to do a whole bunch of post production to
link |
00:23:17.620
fix it. So that's a case where we're kind of capturing an entire workflow into a single action
link |
00:23:23.460
and doing it in about a second rather than a minute or two. And when you do that, you can
link |
00:23:29.220
not just do it once, but you can do it for say like 10 different backgrounds. And then you're
link |
00:23:34.420
almost back to this inspiration idea of I don't know quite what I want, but I'll know it when I
link |
00:23:39.060
see it. And you can just explore the design space as close to final production value as possible.
link |
00:23:45.940
And then when you really pick one, you might go back and slightly tweak the selection mask just
link |
00:23:49.620
to make it perfect and do that kind of polish that professionals like to bring to their work.
link |
00:23:54.340
So then there's this idea of, you mentioned the sky, replacing it to different stock images of
link |
00:24:00.980
the sky. But in general, you have this idea. Or it could be on your disc or whatever.
link |
00:24:04.820
Disc, right. But making even more intelligent choices about ways to search stock images,
link |
00:24:10.900
which is really interesting. It's kind of spatial.
link |
00:24:13.700
Absolutely. Right. So that was something we called concept canvas. So normally when you do
link |
00:24:19.540
a, say, an image search, you would, I'm assuming it's just based on text, give the keywords
link |
00:24:26.260
of the things you want to be in the image, and it would find the nearest one that had those tags.
link |
00:24:32.660
For many tasks, you really want, you know, to be able to say I want a big person in the middle, or a dog to the right and an umbrella above to the left, because you want to leave space for the text or whatever. And so concept canvas lets you assign spatial regions to the keywords.
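A toy sketch of the spatial matching behind that idea: each indexed image stores a rough region per concept, and candidate images are ranked by how well those regions overlap the regions the user drew. The index format, box coordinates, and scoring below are invented for illustration.

```python
# Toy sketch of spatial concept search: images are pre-indexed with rough
# regions per concept, and a query assigns concepts to canvas regions; images
# are ranked by how well their regions overlap the query's. Illustrative only.
def overlap(a, b):
    # a, b are (x0, y0, x1, y1) boxes in normalized [0, 1] canvas coordinates
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def score(image_regions, query_regions):
    return sum(overlap(image_regions.get(concept, (0, 0, 0, 0)), box)
               for concept, box in query_regions.items())

index = {
    "img_001.jpg": {"person": (0.3, 0.2, 0.7, 1.0), "dog": (0.7, 0.6, 0.95, 1.0)},
    "img_002.jpg": {"person": (0.0, 0.0, 0.3, 0.8)},
}
query = {"person": (0.35, 0.2, 0.65, 1.0), "dog": (0.7, 0.5, 1.0, 1.0)}
print(max(index, key=lambda name: score(index[name], query)))   # -> img_001.jpg
```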
link |
00:24:47.540
And then we've already pre indexed the images to know where the important concepts are in the
link |
00:24:53.060
picture. So we then go through that index matching to assets. And even though it's just another form
link |
00:25:00.020
of search, because you're doing spatial design or layout, it starts to feel like design, you sort of
link |
00:25:05.700
feel oddly responsible for the image that comes back, as if you invented it. Yeah. So
link |
00:25:12.340
it's a good example where giving enough control starts to make people have a sense of ownership
link |
00:25:18.740
over the outcome of the event. And then we also have technologies in Photoshop, we physically can
link |
00:25:23.540
move the dog in post as well. But for concept canvas, it was just a very fast way to sort of
link |
00:25:29.460
loop through and be able to lay things out. And in terms of being able to remove objects from a
link |
00:25:38.100
scene and fill in the background, right, automatically. So that's extremely exciting. And that's where neural networks are stepping in. I just talked this week with Ian Goodfellow, so GANs for doing that is definitely one approach. So is that
link |
00:25:56.660
a really difficult problem? Is it as difficult as it looks, again, to take it to a robust
link |
00:26:01.940
product level? Well, there are certain classes of image for which the traditional algorithms
link |
00:26:07.540
like content aware fill work really well, like if you have a naturalistic texture, like a gravel
link |
00:26:12.500
path or something, because it's patch based, it will make up a very plausible looking intermediate
link |
00:26:17.860
thing and fill in the hole. And then we use some algorithms to sort of smooth out the lighting so
link |
00:26:23.220
you don't see any brightness contrast in that region, or you gradually ramp from dark to light if it straddles the boundary. Where it gets complicated is if you have to infer invisible structure behind the person in front. And that really requires a common sense
link |
00:26:40.420
knowledge of the world to know what, you know, if I see three quarters of a house, do I have a rough
link |
00:26:45.460
sense of what the rest of the house looks like? If you just fill it in with patches, it can end up
link |
00:26:49.780
sort of doing things that make sense locally, but you look at the global structure, and it looks
link |
00:26:53.860
like it's just sort of crumpled or messed up. And so what GANs and neural nets bring to the table is
link |
00:27:00.820
this common sense learned from the training set. And the challenge right now is that the generative
link |
00:27:08.740
methods that can make up missing holes using that kind of technology are still only stable at low
link |
00:27:14.340
resolutions. And so you either need to then go from a low resolution to a high resolution using
link |
00:27:19.220
some other algorithm, or we need to push the state of the art and it's still in research to
link |
00:27:23.860
get to that point. Of course, if you show it something, say it's trained on houses,
link |
00:27:29.860
and then you show it an octopus, it's not going to do a very good job of showing common sense about
link |
00:27:35.940
octopuses. So again, you're asking about how you know that it's ready for primetime. You really
link |
00:27:44.980
need a very diverse training set of images. And ultimately, that may be a case where you put it
link |
00:27:52.020
out there with some guardrails where you might do a detector which looks at the image and sort of
link |
00:28:01.540
estimates its own competence, of how good a job this algorithm could do. So eventually, there
link |
00:28:07.620
may be this idea of what we call an ensemble of experts where any particular expert is specialized
link |
00:28:13.220
in certain things. And then there's sort of a, either they vote to say how confident they are
link |
00:28:17.380
about what to do, this is sort of more future looking, or there's some dispatcher which says
link |
00:28:22.500
you're good at houses, you're good at trees. So I mean, all this adds up to a lot of work
link |
00:28:29.940
because each of those models will be a whole bunch of work. But I think over time, you'd
link |
00:28:34.580
gradually fill out the set and initially focus on certain workflows and then sort of branch out as
link |
00:28:40.660
you get more capable. You mentioned workflows, and have you considered maybe looking far into
link |
00:28:48.100
the future? First of all, using the fact that there is a huge amount of people that use Photoshop,
link |
00:28:57.700
for example, and have certain workflows, being able to collect the information by which they,
link |
00:29:05.380
you know, basically get information about their workflows, about what they need, the
link |
00:29:10.260
ways to help them, whether it is houses or octopus that people work on more, you know,
link |
00:29:15.940
like basically getting a beat on what kind of data is needed to be annotated and collected for people
link |
00:29:23.380
to build tools that actually work well for people. Right, absolutely. And this is a big
link |
00:29:27.780
topic in the whole world of AI is what data can you gather and why? Right. At one level,
link |
00:29:33.700
a way to think about it is we not only want to train our customers in how to use our products,
link |
00:29:39.620
but we want them to teach us what's important and what's useful. At the same time, we want to
link |
00:29:44.580
respect their privacy. And obviously, we wouldn't do things without their explicit permission.
link |
00:29:52.820
And I think the modern spirit of the age around this is you have to demonstrate to somebody how
link |
00:29:57.620
they're benefiting from sharing their data with the tool. Either it's helping in the short term
link |
00:30:02.980
to understand their intent, so you can make better recommendations, or if they're friendly to your
link |
00:30:08.500
cause, or your tool, or they want to help you evolve quickly, because they depend on you for
link |
00:30:12.900
their livelihood, they may be willing to share some of their workflows or choices with the data
link |
00:30:21.700
set to be then trained. There are technologies for looking at learning without necessarily
link |
00:30:29.060
storing all the information permanently, so that you can sort of learn on the fly, but not
link |
00:30:33.940
keep a record of what somebody did. So we're definitely exploring all of those possibilities.
link |
00:30:38.420
And I think Adobe exists in a space where Photoshop, like if I look at the data I've
link |
00:30:44.660
created and own, you know, I'm less comfortable sharing data with social networks than I am with
link |
00:30:49.940
Adobe, because there's a, just exactly as you said, there's an obvious benefit for sharing the data that I use to create in Photoshop, because it's helping improve
link |
00:31:04.580
the workflow in the future, as opposed to it's not clear what the benefit is in social networks.
link |
00:31:10.020
It's nice for you to say that. I mean, I think there are some professional workflows where
link |
00:31:14.020
people might be very protective of what they're doing, such as if I was preparing
link |
00:31:18.180
evidence for a legal case, I wouldn't want any of that, you know, phoning home to help train
link |
00:31:24.420
the algorithm or anything. There may be other cases where people are, say, having a trial version,
link |
00:31:29.700
or they're doing some, I'm not saying we're doing this today, but there's a future scenario where
link |
00:31:33.860
somebody has a more permissive relationship with Adobe, where they explicitly say, I'm fine,
link |
00:31:39.220
I'm only doing hobby projects, or things which are non confidential. And in exchange for some
link |
00:31:46.260
benefit, tangible or otherwise, I'm willing to share very fine grained data. So another possible
link |
00:31:53.380
scenario is to capture relatively crude, high level things from more people, and then more
link |
00:31:59.300
detailed knowledge from people who are willing to participate. We do that today with explicit
link |
00:32:03.620
customer studies where, you know, we go and visit somebody and ask them to try the tool and we
link |
00:32:09.060
human observe what they're doing. In the future, to be able to do that enough to be able to train
link |
00:32:15.060
an algorithm, we'd need a more systematic process. But we'd have to do it very consciously, because
link |
00:32:20.260
one of the things people treasure about Adobe is a sense of trust. And we don't want to endanger
link |
00:32:26.340
that through overly aggressive data collection. So we have a chief privacy officer. And it's
link |
00:32:32.500
definitely front and center of thinking about AI rather than an afterthought.
link |
00:32:37.460
Well, when you start that program, sign me up.
link |
00:32:40.020
Okay, happy to.
link |
00:32:42.900
Are there other projects that you wanted to mention that I didn't, perhaps, that pop into mind? Well, you covered a number. I think you mentioned Project Puppetron,
link |
00:32:51.860
I think that one is interesting, because it's, you might think of Adobe as only thinking in 2d.
link |
00:32:59.780
And that's a good example where we're actually thinking more three dimensionally about how to
link |
00:33:04.820
assign features to faces so that we can, you know. So what Puppetron does is it takes
link |
00:33:10.500
either a still or a video of a person talking, and then it can take a painting of somebody else
link |
00:33:16.740
and then apply the style of the painting to the person who's talking in the video. And it's
link |
00:33:24.500
unlike a sort of screen door post filter effect that you sometimes see online, it really looks
link |
00:33:31.060
as though it's sort of somehow attached or reflecting the motion of the face. And so
link |
00:33:37.060
that's the case where even to do a 2d workflow, like stylization, you really need to infer more
link |
00:33:42.340
about the 3d structure of the world. And I think, as 3d computer vision algorithms get better,
link |
00:33:48.580
initially, they'll focus on particular domains, like faces, where you have a lot of prior knowledge
link |
00:33:53.540
about structure, and you can maybe have a parameterized template that you fit to the image.
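As a toy version of "a parameterized template that you fit to the image," the sketch below aligns a 2D landmark template to observed landmarks with a similarity transform (scale, rotation, translation) solved by least squares. Real face fitting uses 3D morphable models; the landmark points here are made up.

```python
# Toy sketch of fitting a parameterized template to an image: a 2D landmark
# template (e.g. a mean face) is aligned to observed landmarks with a
# similarity transform via an orthogonal Procrustes solution. Illustrative
# only; real face fitting uses richer 3D parameterized models.
import numpy as np

def fit_similarity(template, observed):
    """Return scale s, rotation R, translation t so that s*R@p + t maps template points to observed."""
    mu_t, mu_o = template.mean(0), observed.mean(0)
    T, O = template - mu_t, observed - mu_o
    U, S, Vt = np.linalg.svd(T.T @ O)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:                  # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    s = S.sum() / (T ** 2).sum()
    t = mu_o - s * (R @ mu_t)
    return s, R, t

template = np.array([[0, 0], [2, 0], [1, 1], [1, 2.5]], dtype=float)            # toy "mean face"
rot90 = np.array([[0, -1], [1, 0]], dtype=float)
observed = 1.5 * template @ rot90.T + np.array([3.0, 4.0])                      # rotated, scaled, shifted copy
s, R, t = fit_similarity(template, observed)
print(round(float(s), 3))   # ~1.5, the scale used to generate the observations
```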
link |
00:33:58.580
But over time, this should be possible for more general content. And it might even be invisible to
link |
00:34:04.340
the user that you're doing 3d reconstruction, but under the hood, but it might then let you
link |
00:34:10.020
do edits much more reliably or correctly than you would otherwise.
link |
00:34:15.780
And, you know, the face is a very important application, right?
link |
00:34:20.580
Absolutely.
link |
00:34:20.820
So making things work.
link |
00:34:22.500
And a very sensitive one. If you do something uncanny, it's very disturbing.
link |
00:34:26.500
That's right. You have to get it right. So in the space of augmented reality and virtual reality,
link |
00:34:36.900
what do you think is the role of AR and VR and in the content we consume as people, as consumers,
link |
00:34:43.220
and the content we create as creators?
link |
00:34:45.300
Now, that's a great question. We think about this a lot, too. So I think VR and AR serve
link |
00:34:51.540
slightly different purposes. So VR can really transport you to an entire immersive world,
link |
00:34:57.300
no matter what your personal situation is. To that extent, it's a bit like a really,
link |
00:35:02.740
really widescreen television, where it sort of snaps you out of your context and
link |
00:35:06.340
puts you in a new one. And I think it's still evolving in terms of the hardware.
link |
00:35:12.500
I actually worked on VR in the 90s trying to solve the latency and sort of nausea problem,
link |
00:35:16.980
which we did, but it was very expensive and a bit early. There's a new wave of that now,
link |
00:35:22.580
I think. And increasingly, those devices are becoming all in one rather than something
link |
00:35:26.740
that's tethered to a box. I think the market seems to be bifurcating into things for consumers
link |
00:35:33.380
and things for professional use cases, like for architects and people designing where your
link |
00:35:38.580
product is a building and you really want to experience it better than looking at a scale
link |
00:35:43.060
model or a drawing, I think, or even than a video. So I think for that, where you need a
link |
00:35:48.900
sense of scale and spatial relationships, it's great. I think AR holds the promise of
link |
00:35:55.380
sort of taking digital assets off the screen and putting them in context in the real world
link |
00:36:01.940
on the table in front of you, on the wall behind you. And that has the corresponding need that the
link |
00:36:08.660
assets need to adapt to the physical context in which they're being placed. I mean, it's a bit
link |
00:36:13.620
like having a live theater troupe come to your house and put on Hamlet. My mother had a friend
link |
00:36:19.140
who used to do this at stately homes in England for the National Trust. And they would adapt the scenes, and they'd even walk the audience through the rooms to see the action, based on the country
link |
00:36:31.300
house they found themselves in for two days. And I think AR will have the same issue that,
link |
00:36:36.500
you know, if you have a tiny table and a big living room or something, it'll try to figure
link |
00:36:40.100
out what can you change and what's fixed. And there's a little bit of a tension between fidelity
link |
00:36:47.460
where if you captured, say, Nureyev doing a fantastic ballet, you'd want it to be sort of
link |
00:36:53.540
exactly reproduced. And maybe all you could do is scale it down. Whereas somebody telling you a
link |
00:36:59.300
story might be walking around the room doing some gestures and that could adapt to the room in which
link |
00:37:05.940
they were telling the story. And do you think fidelity is that important in that space or is
link |
00:37:10.820
it more about the storytelling? I think it may depend on the characteristic of the media. If it's
link |
00:37:16.820
a famous celebrity, then it may be that you want to catch every nuance and they don't want to be
link |
00:37:21.300
reanimated by some algorithm. It could be that if it's really, you know, a lovable frog telling you
link |
00:37:28.660
a story and it's about a princess and a frog, then it doesn't matter if the frog moves in a
link |
00:37:33.780
different way. I think a lot of the ideas that have sort of grown up in the game world will
link |
00:37:39.460
now come into the broader commercial sphere once they're needing adaptive characters in AR.
link |
00:37:45.940
Are you thinking of engineering tools that allow creators to create in
link |
00:37:50.820
the augmented world, basically making a Photoshop for the augmented world?
link |
00:37:56.020
Well, we have shown a few demos of sort of taking a Photoshop layer stack and then expanding it into
link |
00:38:02.500
3D. That's actually been shown publicly as one example in AR. Where we're particularly excited
link |
00:38:08.580
at the moment is in 3D. 3D design is still a very challenging space. And we believe that it's a
link |
00:38:17.140
worthwhile experiment to try to figure out if AR or immersive makes 3D design more spontaneous.
link |
00:38:23.220
Can you give me an example of 3D design, just like applications?
link |
00:38:26.980
Literally, a simple one would be laying out objects, right? So on a conventional screen,
link |
00:38:32.020
you'd sort of have a plan view and a side view and a perspective view, and you'd sort of be
link |
00:38:35.380
dragging it around with a mouse. And if you're not careful, it would go through the wall and all that.
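The "don't let it go through the wall" constraint can be illustrated with the simplest possible check, clamping an object's bounding box to the room's bounds; real layout tools use proper collision detection, and the room and object sizes below are made up.

```python
# Trivial sketch of keeping a dragged object inside the room when laying out
# 3D objects: clamp an axis-aligned box to the room bounds. Illustrative only.
def clamp_to_room(position, half_size, room_min, room_max):
    """Clamp an object's center so its box stays inside the room on each axis."""
    return tuple(
        min(max(p, lo + h), hi - h)
        for p, h, lo, hi in zip(position, half_size, room_min, room_max)
    )

room_min, room_max = (0.0, 0.0, 0.0), (5.0, 3.0, 4.0)
sofa_half = (1.0, 0.5, 0.4)
dragged_to = (5.5, 0.5, -1.0)                       # user dragged it through two walls
print(clamp_to_room(dragged_to, sofa_half, room_min, room_max))   # (4.0, 0.5, 0.4)
```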
link |
00:38:39.460
Whereas if you were really laying out objects, say, in a VR headset, you could literally move
link |
00:38:46.420
your head to see a different viewpoint. They'd be in stereo. So you'd have a sense of depth
link |
00:38:50.740
because you're already wearing the depth glasses, right? So it would be
link |
00:38:55.300
those sort of big gross motor move things around kind of skills seem much more spontaneous,
link |
00:39:00.340
just like they are in the real world. The frontier for us, I think, is whether
link |
00:39:06.420
that same medium can be used to do fine grained design tasks, like very accurate constraints on,
link |
00:39:12.660
say, a CAD model or something that may be better done on a desktop, but it may just be a matter
link |
00:39:17.780
of inventing the right UI. So we're hopeful that because there will be this potential explosion
link |
00:39:26.020
of demand for 3D assets driven by AR and more real time animation on conventional screens,
link |
00:39:33.220
that those tools will also help with, or those devices will help with designing the content as
link |
00:39:40.500
well. You've mentioned quite a few interesting sort of new ideas. And at the same time, there's
link |
00:39:45.700
old timers like me that are stuck in their old ways and are...
link |
00:39:49.700
Well, I think I'm the old timer.
link |
00:39:51.300
Okay. All right. All right. But they oppose all change at all costs.
link |
00:39:55.540
Yes.
link |
00:39:57.540
When you're thinking about creating new interfaces, do you feel the burden of just
link |
00:40:02.660
this giant user base that loves the current product? So anything new you do, any new idea
link |
00:40:11.700
comes at a cost that you'll be resisted?
link |
00:40:13.700
Well, I think if you have to trade off control for convenience, then our existing user base would
link |
00:40:19.860
definitely be offended by that. I think if there are some things where you have more convenience
link |
00:40:26.180
and just as much control, that may be more welcome. We do think about not breaking well known
link |
00:40:32.740
metaphors for things. So things should sort of make sense. Photoshop has never been a static
link |
00:40:39.140
target. It's always been evolving and growing. And to some extent, there's been a lot of brilliant
link |
00:40:45.140
thought along the way of how it works today. So we don't want to just throw all that out.
link |
00:40:50.420
If there's a fundamental breakthrough, like a single click is good enough to select an object
link |
00:40:54.100
rather than having to do lots of strokes, that actually fits in quite nicely to the existing
link |
00:41:00.340
toolset, either as an optional mode or as a starting point. I think where we're looking at
link |
00:41:06.420
radical simplicity, where you could encapsulate an entire workflow with a much simpler UI, then
link |
00:41:13.060
sometimes that's easier to do in the context of either a different device, like a mobile device,
link |
00:41:18.100
where the affordances are naturally different. Or in a tool that's targeted at a different workflow,
link |
00:41:24.580
where it's about spontaneity and velocity rather than precision. And we have projects like Rush,
link |
00:41:30.820
which can let you do professional quality video editing for a certain class of media output that
link |
00:41:39.940
is targeted very differently in terms of users and the experience. And ideally, people would go,
link |
00:41:47.300
if I'm feeling like doing Premiere, big project, I'm doing a four part television series, that's
link |
00:41:54.580
definitely a Premiere thing. But if I want to do something to show my recent vacation, maybe I'll
link |
00:41:59.220
just use Rush because I can do it in the half an hour I have free at home rather than the four
link |
00:42:04.740
hours I need to do it at work. And for the use cases, which we can do well, it really is much
link |
00:42:11.860
faster to get the same output. But the more professional tools obviously have a much richer
link |
00:42:16.660
toolkit and more flexibility in what they can do. And then at the same time with the flexibility
link |
00:42:22.020
and control, I like this idea of smart defaults, of using AI to coach you, like what Google has, the I'm Feeling Lucky button. Or one button that kind of gives you a pretty good set of settings. And then
link |
00:42:38.020
that's almost an educational tool to show. Because sometimes when you have all this control,
link |
00:42:45.700
you're not sure about the correlation between the different bars that control different elements of
link |
00:42:51.780
the image and so on. And sometimes there's a degree of, you don't know what the optimal is.
link |
00:42:59.140
And then some things are sort of on demand, like help, right? Where I'm stuck, I need to know what
link |
00:43:05.060
to look for. I'm not quite sure what it's called. And something that was proactively making helpful
link |
00:43:10.420
suggestions or, you could imagine a make a suggestion button where you'd use all of that
link |
00:43:17.380
knowledge of workflows and everything to maybe suggest something to go and learn about or just
link |
00:43:21.700
to try or show the answer. And maybe it's not one intelligent default, but it's like a variety of
link |
00:43:28.580
defaults. And then you go, I like that one. Yeah. Yeah. Several options. So back to poetry.
link |
00:43:36.740
Ah, yes. We're going to interleave. So first few lines of a recent poem of yours before I ask the
link |
00:43:44.340
next question. This is about the smartphone. Today I left my phone at home and went down to the sea.
link |
00:43:53.860
The sand was soft, the ocean glass, but I was still just me. This is a poem about you leaving
link |
00:44:00.980
your phone behind and feeling quite liberated because of it. So this is kind of a difficult
link |
00:44:08.100
topic and let's see if we can talk about it, figure it out. But so with the help of AI more and more,
link |
00:44:14.500
we can create sort of versions of ourselves, versions of reality that are in some ways more
link |
00:44:20.660
beautiful than actual reality. And some of the creative ways that we can do that,
link |
00:44:29.540
some of the creative effort there is part of creating this illusion.
link |
00:44:36.260
So of course this is inevitable, but how do you think we should adjust as human beings to live in
link |
00:44:41.620
this digital world that's partly artificial, that's better than the world that we lived in
link |
00:44:49.540
a hundred years ago when you didn't have Instagram and Facebook versions of ourselves and the online
link |
00:44:56.340
Oh, this is sort of showing off better versions of ourselves. We're using the tooling of modifying
link |
00:45:02.420
the images or even with artificial intelligence ideas of deep fakes and creating adjusted or
link |
00:45:10.660
fake versions of ourselves and reality. I think it's an interesting question. To take a sort of historical bent on this, I actually wonder if 18th century aristocrats who commissioned famous
link |
00:45:23.380
painters to paint portraits of them had portraits that were slightly nicer than they actually looked
link |
00:45:28.660
in practice. So human desire to put your best foot forward has always been true.
link |
00:45:37.460
I think it's interesting. You sort of framed it in two ways. One is if we can imagine alternate
link |
00:45:42.260
realities and visualize them, is that a good or bad thing? In the old days, you do it with
link |
00:45:47.300
storytelling and words and poetry, which still resides sometimes on websites, but we've become
link |
00:45:54.500
a very visual culture in particular. In the 19th century, we were very much a text based culture.
link |
00:46:02.180
People would read long tracks, political speeches were very long.
link |
00:46:06.660
Nowadays, everything's very kind of quick and visual and snappy.
link |
00:46:10.100
I think it depends on how harmless your intent is. A lot of it's about intent. So if you have a
link |
00:46:18.180
somewhat flattering photo that you pick out of the photos that you have in your inbox to say,
link |
00:46:22.740
this is what I look like, it's probably fine. If someone's going to judge you by how you look,
link |
00:46:31.860
then they'll decide soon enough when they meet you whether the reality, you know.
link |
00:46:35.940
Yeah, right.
link |
00:46:40.420
I think where it can be harmful is if people hold themselves up to an impossible standard,
link |
00:46:46.100
which they then feel bad about themselves for not meeting. I think that definitely can be an issue.
link |
00:46:55.540
But I think the ability to imagine and visualize an alternate reality,
link |
00:46:58.900
which sometimes you then go off and build later, can be a wonderful thing too. People can imagine
link |
00:47:06.100
architectural styles, which they then, you know, have a startup, make a fortune,
link |
00:47:10.420
and then build a house that looks like their favorite video game. Is that a terrible thing?
link |
00:47:17.140
I think I used to worry about exploration, actually, that part of the joy of going to the
link |
00:47:23.860
moon. When I was a tiny child, I remember it in grainy black and white, was to know what it would
link |
00:47:30.100
look like when you got there. And I think now we have such good graphics for visualizing the
link |
00:47:35.140
experience before it happens, that I slightly worry that it may take the edge off actually
link |
00:47:40.580
wanting to go, you know what I mean? Because we've seen it on TV. We kind of, oh, you know,
link |
00:47:44.820
by the time we finally get to Mars, we'll go, yeah, yeah, so it's Mars. That's what it looks like.
link |
00:47:48.260
But then, you know, the outer exploration, I mean, I think Pluto was a fantastic recent
link |
00:47:56.420
discovery where nobody had any idea what it looked like. And it was just breathtakingly
link |
00:48:00.740
varied and beautiful. So I think expanding the ability of the human toolkit to imagine and
link |
00:48:07.860
communicate on balance is a good thing. I think there are abuses, we definitely take them seriously
link |
00:48:13.380
and try to discourage them. I think there's a parallel side where the public needs to know
link |
00:48:21.140
what's possible through events like this, right? So that you don't believe everything you read in
link |
00:48:27.620
print anymore. And it may over time become true of images as well. Or you need multiple sets of
link |
00:48:34.340
evidence to really believe something rather than a single media asset. So I think it's a constantly
link |
00:48:39.220
evolving thing. It's been true forever. There's a famous story about Anne of Cleves and Henry VIII
link |
00:48:45.380
where luckily for Anne, they didn't get married, right? Or they got married and broke it up.
link |
00:48:53.780
What's the story?
link |
00:48:54.580
Oh, so Holbein went and painted a picture and then Henry VIII wasn't pleased and,
link |
00:48:58.900
you know, history doesn't record whether Anne was pleased, but I think she was pleased not to
link |
00:49:04.020
be married more than a day or something. So, I mean, this has gone on for a long time, but
link |
00:49:08.180
I think it's just a part of the magnification of human capability.
link |
00:49:14.660
You've kind of built up an amazing research environment here, research culture, research lab,
link |
00:49:21.380
and you've written that the secret to a thriving research lab is interns.
link |
00:49:24.660
Can you unpack that a little bit?
link |
00:49:26.180
Oh, absolutely. So a couple of reasons. As you see looking at my personal history,
link |
00:49:33.940
there are certain ideas you bond with at a certain stage of your career and you tend to
link |
00:49:37.540
keep revisiting them through time. If you're lucky, you pick one that doesn't just get solved
link |
00:49:43.060
in the next five years and then you're sort of out of luck. So I think a constant influx of new
link |
00:49:48.340
people brings new ideas with it. From the point of view of industrial research, because a big
link |
00:49:55.060
part of what we do is really taking those ideas to the point where they can ship as very robust
link |
00:49:59.620
features, you end up investing a lot in a particular idea. And if you're not careful,
link |
00:50:06.660
people can get too conservative in what they choose to do next, knowing that the product teams
link |
00:50:10.660
will want it. And interns let you explore the more fanciful or unproven ideas in a relatively
link |
00:50:18.420
lightweight way, ideally leading to new publications for the intern and for the researcher.
link |
00:50:24.340
And it then gives you a portfolio from which to draw: which idea am I going to try to take
link |
00:50:29.380
all the way through to being robust in the next year or two to ship. So it sort of becomes part
link |
00:50:35.140
of the funnel. It's also a great way for us to identify future full time researchers. Many of
link |
00:50:40.740
our greatest researchers were former interns. It builds a bridge to university departments so we
link |
00:50:46.660
can get to know and build an enduring relationship with the professors, whom we often give academic
link |
00:50:52.660
funds to as well, as an acknowledgement of the value the interns add in their own
link |
00:50:57.540
collaborations. So it's sort of a virtuous cycle. And then the long term legacy of a great research
link |
00:51:04.580
lab hopefully will be not only the people who stay, but the ones who move through and then go
link |
00:51:09.620
off and carry that same model to other companies. And so we believe strongly in industrial research
link |
00:51:16.260
and how it can complement academia. And we hope that this model will continue to propagate and
link |
00:51:21.460
be invested in by other companies, which makes it harder for us to recruit, of course, but that's a
link |
00:51:27.300
sign of success. And a rising tide lifts all ships in that sense. And where's the idea born
link |
00:51:34.260
with the interns? Is there brainstorming? Are there discussions about, you know, like what?
link |
00:51:42.340
Where do the ideas come from?
link |
00:51:43.860
Yeah. As I'm asking the question, I realize how dumb it is, but I'm hoping you have a better
link |
00:51:48.820
answer. It's a question I ask at the beginning of every summer. So what will happen is we'll send out a
link |
00:51:57.460
call for interns. We'll have a number of resumes come in. People will contact the
link |
00:52:02.900
candidates, talk to them about their interests. They'll usually try to find somebody who
link |
00:52:08.020
has a reasonably good match to what they're already doing, or just has a really interesting
link |
00:52:12.820
domain that they've been pursuing in their PhD. And we think we'd love to do one of those projects
link |
00:52:17.940
too. And then the intern stays in touch with the mentor, as we call them. And then they come and
link |
00:52:26.340
at the end of two weeks, they have to decide. So they'll often have a general sense by the time
link |
00:52:31.380
they arrive. And we'll have internal discussions about what are all the general ideas that we're
link |
00:52:37.700
wanting to pursue to see whether two people have the same idea, and maybe they should talk and all
link |
00:52:41.860
that. But then once the intern actually arrives, sometimes the idea goes linearly. And sometimes
link |
00:52:47.620
it takes a giant left turn. And we go, that sounded good. But when we thought about it,
link |
00:52:51.460
there's this other project, or it's already been done. And we found this paper, we were scooped.
link |
00:52:55.780
But we have this other great idea. So it's pretty, pretty flexible at the beginning. One of the
link |
00:53:02.260
questions for research labs is who's deciding what to do? And then who's to blame if it goes wrong?
link |
00:53:08.260
Who gets the credit if it goes right? And so in Adobe, we push the needle very much towards
link |
00:53:15.540
freedom of choice of projects by the researchers and the interns. But then we reward people based
link |
00:53:22.900
on impact, that is, whether the projects ultimately end up impacting the products and having papers and so on.
link |
00:53:28.740
And so your alternative model, just to be clear, is that you have one lab director who thinks he's
link |
00:53:34.420
a genius and tells everybody what to do, takes all the credit if it goes well, blames everybody
link |
00:53:38.740
else if it goes badly. So we don't want that model. And this helps new ideas percolate up.
link |
00:53:45.460
The art of running such a lab is that there are strategic priorities for the company.
link |
00:53:49.860
And there are areas where we do want to invest and pressing problems. And so it's a little bit
link |
00:53:55.300
of trickle down and filter up meeting in the middle. And so you don't tell people you have
link |
00:54:00.660
to do X, but you say X would be particularly appreciated this year. And then people reinterpret
link |
00:54:06.980
X through the filter of things they want to do and they're interested in. And miraculously,
link |
00:54:11.780
it usually comes together very well. One thing that really helps is Adobe has a really broad
link |
00:54:17.380
portfolio of products. So if we have a good idea, there's usually a product team that is intrigued
link |
00:54:24.180
or interested. So it means we don't have to qualify things too much ahead of time.
link |
00:54:30.260
Once in a while, the product teams sponsor an extra intern, because they have a particular problem
link |
00:54:35.460
that they really care about, in which case it's a little bit more, we really need one of these.
link |
00:54:40.420
And then we sort of say, great, I get an extra intern, we find an intern who thinks that's a
link |
00:54:44.340
great problem. But that's not the typical model. That's sort of the icing on the cake as far as
link |
00:54:48.580
the budget is concerned. And all of the above end up being important. It's really hard to predict
link |
00:54:55.140
at the beginning of the summer. We all have high hopes for all of the intern projects, but
link |
00:55:00.260
ultimately, some of them pay off and some of them sort of are a nice paper, but don't turn into a
link |
00:55:04.660
feature. Others turn out not to be as novel as we thought, so they'd be a great feature,
link |
00:55:09.700
but not a paper. And then others, we make a little bit of progress and we realize how much
link |
00:55:15.700
we don't know. And maybe we revisit that problem several years in a row until,
link |
00:55:20.660
finally we have a breakthrough and then it becomes more on track to impact a product.
link |
00:55:26.180
Jumping back to a big overall view of Adobe research, what are you looking forward to
link |
00:55:32.900
in 2019 and beyond? What is, you mentioned there's a giant suite of products,
link |
00:55:38.580
a giant suite of ideas, new interns, a large team of researchers.
link |
00:55:49.940
What do you think the future holds?
link |
00:55:52.260
In terms of the technological breakthroughs?
link |
00:55:54.420
Technological breakthroughs, especially ones that will make it into product,
link |
00:56:00.180
that will get to impact the world.
link |
00:56:01.620
So I think the creative or the analytics assistants that we talked about, where
link |
00:56:05.940
they're constantly trying to figure out what you're trying to do and how they can be helpful
link |
00:56:10.100
and make useful suggestions, are a really hot topic. And it's very unpredictable as to when
link |
00:56:15.620
it'll be ready, but I'm really looking forward to seeing how much progress we make against that.
link |
00:56:20.260
I think some of the core technologies like generative adversarial networks are immensely
link |
00:56:28.180
promising and seeing how quickly those become practical for mainstream use cases at high
link |
00:56:34.020
resolution with really good quality is also exciting. And they also have this sort of
link |
00:56:38.740
strange quality where even the things they do oddly are odd in an interesting way. So it can look
link |
00:56:43.540
like dreaming or something. So that's fascinating. I think internally, we have a Sensei platform,
link |
00:56:52.820
which is a way in which we're pulling our neural nets and other intelligence models
link |
00:56:59.060
into a central platform, which can then be leveraged by multiple product teams at once.
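To make that pattern concrete, here is a small, hypothetical sketch, assuming a toy registry and made-up model and product names; none of these identifiers are Adobe's actual Sensei API. The point is simply that a model is stood up once behind a standard interface and then reused by any product, rather than being hand-integrated per product.

```python
# Hypothetical sketch of a shared model platform (not Adobe's actual Sensei API).
from typing import Callable, Dict


class ModelPlatform:
    """Central registry: models are registered once and accessed in a standard way."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable] = {}

    def register(self, name: str, model_fn: Callable) -> None:
        # A research team ships its model once, behind a standard interface.
        self._models[name] = model_fn

    def run(self, name: str, inputs):
        # Any product team calls any registered model in the same way.
        return self._models[name](inputs)


platform = ModelPlatform()
platform.register("auto_mask", lambda image: f"mask for {image}")

# Two different "products" reuse the same model without bespoke integration.
print(platform.run("auto_mask", "photoshop_layer.png"))
print(platform.run("auto_mask", "premiere_frame_0042.png"))
```

The economy of scale in the conversation comes from exactly this shape: the cost of making a model robust is paid once, and every caller after the first gets it essentially for free.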
link |
00:57:05.060
So we're in the middle of transitioning from, once you have a good idea, you pick a product team to
link |
00:57:10.180
work with and they sort of hand design it for that use case, to a more sort of Henry Ford model, standing it
link |
00:57:17.380
up in a standard way, which can be accessed in a standard way, which should mean that the time
link |
00:57:21.620
between a good idea and impacting our products will be greatly shortened. And when one product
link |
00:57:27.380
has a good idea, many of the other products can just leverage it too. So it's sort of an economy
link |
00:57:33.060
of scale. So that's more about the how than the what. But that combination of this sort of
link |
00:57:37.780
renaissance in AI, there's a comparable one in graphics with real time ray tracing and other
link |
00:57:43.220
really exciting emerging technologies. And when these all come together, you'll sort of basically
link |
00:57:48.900
be dancing with light, right, where you'll have real time shadows and reflections, as if it's a
link |
00:57:55.060
real world in front of you. But then with all these magical properties brought by AI, where it
link |
00:57:59.140
sort of anticipates or modifies itself in ways that make sense based on how it understands the
link |
00:58:04.500
creative task you're trying to do. That's a really exciting future for creatives, for myself as a
link |
00:58:11.300
creator. So first of all, I work in autonomous vehicles. I'm a roboticist. I love robots.
link |
00:58:16.180
And I think you have a fascination with snakes, both natural and artificial robots. I share your
link |
00:58:22.260
fascination. I mean, their movement is beautiful, adaptable. The adaptability is fascinating.
link |
00:58:28.580
There are, I looked it up, 2,900 species of snakes in the world.
link |
00:58:33.300
Wow.
link |
00:58:33.860
875 venomous. Some are tiny, some are huge. I saw that there's one that's 25 feet in some cases. So
link |
00:58:41.620
what's the most interesting thing that you connect with in terms of snakes, both natural and
link |
00:58:49.140
artificial? What was the connection with robotics AI and this particular form of a robot?
link |
00:58:56.340
Well, it actually came out of my work in the 80s on computer animation, where I started doing
link |
00:59:01.060
things like cloth simulation and other kind of soft body simulation. And you'd sort of drop it
link |
00:59:06.740
and it would bounce and then it would just sort of stop moving. And I thought, well, what if you
link |
00:59:10.020
animate the spring lengths and simulate muscles? And the simplest object I could do that for was
link |
00:59:15.380
an earthworm. So I actually did a paper in 1988 called The Motion Dynamics of Snakes and Worms.
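For context on the idea being described, animating spring rest lengths so they act as muscles, here is a minimal, hypothetical sketch rather than the actual 1988 model: a one-dimensional mass-spring chain whose rest lengths carry a travelling contraction wave, plus direction-dependent ground friction, which is enough to make the chain crawl.

```python
# Minimal sketch (not Miller's 1988 model): a chain of masses joined by springs
# whose rest lengths are animated like muscles; asymmetric ground friction
# rectifies the oscillation into worm-like forward crawling.
import numpy as np

N = 12          # number of point masses along the body
k = 40.0        # spring stiffness
c = 2.0         # damping between neighbouring masses
dt = 0.002      # integration time step
rest0 = 1.0     # nominal segment rest length

x = np.arange(N) * rest0   # positions along a line
v = np.zeros(N)            # velocities

def rest_lengths(t):
    # Travelling contraction wave: each "muscle" segment shortens in turn.
    i = np.arange(N - 1)
    return rest0 * (1.0 - 0.3 * np.sin(2 * np.pi * (0.5 * t - i / (N - 1))))

for step in range(20000):
    t = step * dt
    L = rest_lengths(t)
    f = np.zeros(N)
    # Spring plus damper forces between neighbouring masses.
    stretch = (x[1:] - x[:-1]) - L
    fs = k * stretch + c * (v[1:] - v[:-1])
    f[:-1] += fs
    f[1:] -= fs
    # Anisotropic friction: sliding backwards is harder than sliding forwards,
    # a crude stand-in for an earthworm's bristles gripping the ground.
    mu = np.where(v < 0, 8.0, 0.5)
    f -= mu * v
    v += f * dt
    x += v * dt

print("net displacement of body centre:", x.mean() - (N - 1) * rest0 / 2)
```

The real snake and worm animations were of course far richer, but the core trick is the one stated here: you animate the rest lengths rather than applying forces directly, and locomotion emerges from the simulation.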
link |
00:59:21.060
And I read the physiology literature on both how snakes and worms move and then did some of the
link |
00:59:27.300
early computer animation examples of that. And so your interest in robotics came out of simulation
link |
00:59:35.860
and graphics. When I moved from Alias to Apple, we actually did a movie called Her Majesty's
link |
00:59:42.020
Secret Serpent, which is about a secret agent snake that parachutes in and captures a film
link |
00:59:47.140
canister from a satellite, which tells you how old fashioned we were thinking back then. Sort
link |
00:59:51.140
of classic 1950s or 60s Bond movie kind of thing. And at the same time, I'd always made radio
link |
00:59:58.660
controlled ships when I was a child, and from scratch. And I thought, well, how hard can it be to
link |
01:00:03.940
build a real one? And so then started what turned out to be like a 15 year obsession with trying to
link |
01:00:10.100
build better snake robots. And the first one that I built just sort of slithered sideways,
link |
01:00:15.140
but didn't actually go forward. Then I added wheels and building things in real life makes
link |
01:00:20.100
you honest about the friction. The thing that appeals to me is I love creating the illusion
link |
01:00:26.180
of life, which is what drove me to animation. And if you have a robot with enough degrees of
link |
01:00:31.540
coordinated freedom that move in a kind of biological way, then it starts to cross the
link |
01:00:36.580
uncanny valley and to seem like a creature rather than a thing. And I certainly got that with the
link |
01:00:42.580
early snakes. By S3, I had it able to sidewind as well as go directly forward. My wife-to-be
link |
01:00:50.980
suggested that it would be the ring bearer at our wedding. So it actually went down the aisle
link |
01:00:54.740
carrying the rings and got in the local paper for that, which was really fun. And this was all done
link |
01:01:02.980
as a hobby. And then, at the time, the onboard compute was incredibly limited. It was
link |
01:01:07.860
sort of... Yeah. So you should explain that with these things, the whole idea is that you're
link |
01:01:12.100
trying to run it autonomously. Autonomously, onboard, right. And so the very first one,
link |
01:01:20.580
I actually built the controller from discrete logic because I used to do LSI, you know, circuits
link |
01:01:26.340
and things when I was a teenager. And then the second and third one, the eight bit microprocessors
link |
01:01:32.020
were available with like the whole 256 bytes of RAM, which you could just about squeeze in. So
link |
01:01:37.780
they were radio controlled rather than autonomous and really were more about the physicality and
link |
01:01:43.380
coordinated motion. I've occasionally taken a sidestep into, if only I could make it cheaply
link |
01:01:51.060
enough, it would make a great toy, which has been a lesson in how clockwork is its own magical realm that you
link |
01:01:59.380
venture into and learn things about backlash and other things you don't take into account
link |
01:02:03.540
as a computer scientist, which is why what seemed like a good idea doesn't work. So it was quite
link |
01:02:07.540
humbling. And then more recently I've been building S9, which is a much better engineered version of
link |
01:02:14.580
S3 where the motors wore out and it doesn't work anymore. And you can't buy replacements,
link |
01:02:18.340
which is sad given that it was such a meaningful one. S5 was about twice as long and looked much
link |
01:02:26.260
more biologically inspired. Unlike the typical roboticist, I taper my snakes. There are good
link |
01:02:33.940
mechanical reasons to do that, but it also makes them look more biological, although it means every
link |
01:02:38.180
segment's unique rather than a repetition, which is why most engineers don't do it. It actually
link |
01:02:44.820
saves weight and leverage and everything. And that one is currently on display at the International
link |
01:02:50.820
Spy Museum in Washington, DC. Not that it's done any spying. It was on YouTube and it got its own
link |
01:02:57.780
conspiracy theory where people thought that it wasn't real because I work at Adobe, it must be
link |
01:03:01.380
fake graphics. And people would write to me telling me it's not real. You know, they say the background
link |
01:03:06.180
doesn't move, and it's like, it's on a tripod, you know? So with that one, you can see the real thing,
link |
01:03:12.340
so it really is true. And then the latest one is the first one where I could put a Raspberry Pi,
link |
01:03:18.900
which leads to all sorts of terrible jokes about Pythons and things. But this one can have on board
link |
01:03:25.700
compute. And then where my hobby work and my work work are converging is you can now add vision
link |
01:03:33.300
accelerator chips, which can evaluate neural nets and do object recognition and everything. So both
link |
01:03:38.820
for the snakes and more recently for the spider that I've been working on, having, you know,
link |
01:03:44.660
desktop level compute is now opening up a whole world of true autonomy with onboard compute,
link |
01:03:51.060
onboard batteries, and still having that sort of biomimetic quality that appeals to
link |
01:03:58.980
children in particular. They are really drawn to them and adults think they look creepy,
link |
01:04:02.820
but children actually think they look charming. And I gave a series of lectures at Girls Who Code
link |
01:04:10.500
to encourage people to take an interest in technology. And at the moment, I'd say they're
link |
01:04:16.180
still more expensive than the value that they add, which is why they're a great hobby for me,
link |
01:04:20.660
but they're not really a great product. It makes me think about doing that very early thing I did
link |
01:04:27.940
at Alias with changing the muscle rest lengths. If I could do that with a real artificial muscle
link |
01:04:33.300
material, then the next snake ideally would use that rather than motors and gearboxes and
link |
01:04:39.140
everything. It would be lighter, much stronger, and more continuous and smooth. So, I like
link |
01:04:47.460
to say being in research is a license to be curious. And I have the same feeling with my
link |
01:04:51.540
hobby. It forced me to read biology and be curious about things that otherwise would have just been,
link |
01:04:58.180
you know, a National Geographic special. Suddenly I'm thinking, how does that snake move? Can I copy
link |
01:05:02.500
it? I look at the trails that sidewinding snakes leave in sand and see if my snake robots would
link |
01:05:07.860
do the same thing. So out of something inanimate, I like the way you put it, you try to bring life into it
link |
01:05:13.300
and beauty. Absolutely. And then ultimately give it a personality, which is where the intelligent
link |
01:05:18.260
agent research will converge with the vision and voice synthesis to give it a sense of having,
link |
01:05:25.060
not necessarily human level intelligence. I think the Turing test is such a high bar. It's
link |
01:05:30.500
a little bit self defeating, but having one that you can have a meaningful conversation with,
link |
01:05:36.100
especially if you have a reasonably good sense of what you can say. So not trying to have it so a
link |
01:05:43.380
stranger could walk up and have one, but so as a pet owner or a robot pet owner, you could know
link |
01:05:49.780
what it thinks about and what it can reason about. Or sometimes just the meaningful interaction. If
link |
01:05:55.860
you have the kind of interaction you have with the dog, sometimes you might have a conversation,
link |
01:06:00.260
but it's usually one way. Absolutely. And nevertheless, it feels like a meaningful
link |
01:06:04.340
connection. And one of the things that I'm trying to do in the sample audio that
link |
01:06:10.660
I'll play for you is beginning to get towards the point where the reasoning system can explain
link |
01:06:16.580
why it knows something or why it thinks something. And that again, creates the sense that it really
link |
01:06:21.700
does know what it's talking about, but it's also useful for debugging as you get more and more elaborate
link |
01:06:29.140
behavior, it's like, why did you decide to do that? You know, how do you know that? I think
link |
01:06:36.020
the robot's really my muse for helping me think about the future of AI and what to invent next.
link |
01:06:42.580
So even at Adobe, that's mostly operating in the digital world. Correct. Do you ever,
link |
01:06:49.060
do you see a future where Adobe even expands into the more physical world perhaps? So bringing life
link |
01:06:55.460
not into animations, but bringing life into physical objects with, whether it's, well,
link |
01:07:03.300
I'd have to say at the moment, it's a twinkle in my eye. I think the more likely thing is that we
link |
01:07:08.180
will bring virtual objects into the physical world through augmented reality and many of the ideas
link |
01:07:15.620
that might take five years to build a robot to do, you can do in a few weeks with digital assets. So
link |
01:07:22.580
I think when really intelligent robots finally become commonplace, they won't be that surprising
link |
01:07:29.300
because we'll have been living with those personalities in the virtual sphere for
link |
01:07:33.300
a long time. And then they'll just say, Oh, it's, you know, Siri with legs or Alexa,
link |
01:07:38.340
on hooves or something. So I can see that world coming. And for now, it's still an adventure,
link |
01:07:46.740
still an adventure. And we don't know quite what the experience will be like. And it's really
link |
01:07:52.340
exciting to sort of see all of these different strands of my career converge. Yeah. In interesting
link |
01:07:58.420
ways. And it is definitely a fun adventure. So let me end with my favorite poem, the last few
link |
01:08:07.060
lines of my favorite poem of yours that ponders mortality and in some sense, immortality, you know,
link |
01:08:13.140
as our ideas live through the ideas of others, through the work of others, it ends with do not
link |
01:08:19.060
weep or mourn. It was enough. The little enemies permitted just a single dance, scattered them as
link |
01:08:25.540
deep as your eyes can see. I'm content. They'll have another chance sweeping more centered parts
link |
01:08:31.940
along to join a jostling lifting throng as others danced in me. Beautiful poem. Beautiful way to
link |
01:08:40.420
end it. Gavin, thank you so much for talking today. And thank you for inspiring and empowering millions
link |
01:08:45.540
of people like myself for creating amazing stuff. Oh, thank you. Great conversation.