
Gavin Miller: Adobe Research | Lex Fridman Podcast #23



link |
00:00:00.000
The following is a conversation with Gavin Miller, he's the head of Adobe Research.
link |
00:00:04.720
Adobe has empowered artists, designers, and creative minds from all professions,
link |
00:00:08.960
working in the digital medium for over 30 years with software such as Photoshop, Illustrator,
link |
00:00:14.320
Premiere, After Effects, InDesign, Audition, Software that work with images, video, and audio.
link |
00:00:21.200
Adobe Research is working to define the future evolution of these products in a way
link |
00:00:25.920
that makes the life of creatives easier, automates the tedious tasks, and gives more and more time
link |
00:00:31.360
to operate in the idea space instead of pixel space. This is where the cutting edge, deep
link |
00:00:36.880
learning methods of the past decade can really shine more than perhaps any other application.
link |
00:00:42.240
Gavin is the embodiment of combining tech and creativity. Outside of Adobe Research,
link |
00:00:47.840
he writes poetry and builds robots, both things that are near and dear to my heart as well.
link |
00:00:53.600
This conversation is part of the Artificial Intelligence Podcast. If you enjoy it, subscribe
link |
00:00:59.200
on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F R I D.
link |
00:01:06.000
And now here's my conversation with Gavin Miller.
link |
00:01:11.120
You're head of Adobe Research, leading a lot of innovative efforts and applications of AI,
link |
00:01:15.920
creating images, video, audio, language, but you're also yourself an artist, a poet,
link |
00:01:23.200
a writer, and even a roboticist. So while I promise to everyone listening,
link |
00:01:28.640
that I will not spend the entire time we have together reading your poetry, which I love,
link |
00:01:33.440
I have to sprinkle it in at least a little bit. So some of them are pretty deep and profound,
link |
00:01:39.200
and some are light and silly. Let's start with a few lines from the silly variety.
link |
00:01:43.520
You wrote a beautiful parody of both Edith Piaf and My Way by Frank Sinatra.
link |
00:01:56.800
So it opens with, and now dessert is near. It's time to pay the final total. I've tried to slim
link |
00:02:06.400
all year, but my diets have been anecdotal. So where does that love for poetry come from
link |
00:02:14.800
for you? And if we dissect your mind, how does it all fit together in the bigger puzzle of Dr.
link |
00:02:20.880
Gavin Miller? Well, interesting you chose that one. That was a poem I wrote when I'd been to
link |
00:02:27.440
my doctor and he said, you really need to lose some weight and go on a diet. And whilst the
link |
00:02:32.400
rational part of my brain wanted to do that, the irrational part of my brain was protesting and
link |
00:02:37.200
sort of embraced the opposite idea. I regret nothing, hence. Yes, exactly. Taken to an extreme,
link |
00:02:42.400
I thought it would be funny. Obviously, it's a serious topic for some people. But I think,
link |
00:02:49.600
for me, I've always been interested in writing since I was in high school, as well as doing
link |
00:02:53.920
technology and invention. And sometimes there are parallel strands in your life that carry on,
link |
00:02:58.960
and one is more about your private life and one's more about your technological career.
link |
00:03:05.680
And then at sort of happy moments along the way, sometimes the two things touch, one idea informs
link |
00:03:10.640
the other. And we can talk about that as we go. Do you think your writing, the art, the poetry,
link |
00:03:17.040
contributes indirectly or directly to your research, to your work at Adobe? Well, sometimes it does if
link |
00:03:23.440
I say, imagine a future in a science fiction kind of way. And then once it exists on paper,
link |
00:03:30.000
I think, well, why shouldn't I just build that? There was an example where when realistic voice
link |
00:03:37.520
synthesis first started in the 90s at Apple, where I worked in research. It was done by a friend of mine.
link |
00:03:44.000
I sort of sat down and started writing a poem where each line I would enter into the voice
link |
00:03:48.640
synthesizer and see how it sounded and sort of wrote it for that voice. And at the time,
link |
00:03:55.040
the agents weren't very sophisticated. So they'd sort of add random intonation. And I kind of made
link |
00:04:00.160
up the poem to sort of match the tone of the voice. And it sounded slightly sad and depressed. So I
link |
00:04:06.560
pretended it was a poem written by an intelligent agent, sort of telling the user to go home and
link |
00:04:12.720
leave them alone. But at the same time, they were lonely and wanted to have company and learn from
link |
00:04:16.560
what the user was saying. And at the time, it was way beyond anything that AI could possibly do.
link |
00:04:21.760
But, you know, since then, it's becoming more within the bounds of possibility.
link |
00:04:29.040
And then at the same time, I had a project at home where I did sort of a smart home. This was
link |
00:04:34.800
probably 93, 94. And I had the talking voice who'd remind me when I walked in the door of what
link |
00:04:40.960
things I had to do. I had buttons on my washing machine because I was a bachelor and I'd leave
link |
00:04:45.600
the clothes in there for three days and they'd go moldy. So as I got up in the morning, it would say,
link |
00:04:49.920
don't forget the washing and so on. I made photo albums that used light
link |
00:04:56.640
sensors to know which page you were looking at and would send that over wireless radio to the agent,
link |
00:05:01.040
who would then play sounds that matched the image you were looking at in the book. So I was kind of
link |
00:05:05.760
in love with this idea of magical realism and whether it was possible to do that with technology.
link |
00:05:10.480
So that was a case where the sort of the agent sort of intrigued me from a literary point of
link |
00:05:16.080
view and became a personality. I think more recently, I've also written plays, and with
link |
00:05:22.880
plays you write dialogue and obviously you write a fixed set of dialogue that follows a linear
link |
00:05:27.440
narrative. But with modern agents, as you design a personality or a capability for conversation,
link |
00:05:33.360
you're sort of thinking of, I kind of have imaginary dialogue in my head. And then I think,
link |
00:05:37.760
what would it take not only to have that be real, but for it to really know what it's talking about.
link |
00:05:44.240
So it's easy to fall into the uncanny valley with AI where it says something it doesn't really
link |
00:05:49.440
understand, but it sounds good to the person. But you rapidly realize that it's kind of just
link |
00:05:55.520
stimulus response. It doesn't really have real world knowledge about the thing it's describing.
link |
00:06:00.640
And so when you get to that point, it really needs to have multiple ways of talking about
link |
00:06:06.320
the same concept. So it sounds as though it really understands it. Now, what really understanding
link |
00:06:10.560
means is in the eye of the beholder, right? But if it only has one way of referring to something,
link |
00:06:16.160
it feels like it's a canned response. But if it can reason about it, or you can go at it from
link |
00:06:21.200
multiple angles and give a similar kind of response that people would, then it starts to
link |
00:06:26.400
seem more like there's something there that's sentient.
link |
00:06:31.040
You can say the same thing multiple ways, from different perspectives. I mean, with the
link |
00:06:35.600
automatic image captioning, I've seen the work that you're doing, and there's elements of that,
link |
00:06:40.000
right? Being able to generate different kinds of... Right. So in my team, there's a lot of work on
link |
00:06:46.640
turning a medium from one form to another, whether it's auto tagging imagery or making up full
link |
00:06:52.000
sentences about what's in the image, then changing the sentence, finding another image that matches
link |
00:06:57.840
the new sentence or vice versa. And in the modern world of GANs, you sort of give it a description
link |
00:07:04.720
and it synthesizes an asset that matches the description. So I've sort of gone on a journey.
link |
00:07:11.360
My early days in my career were about 3D computer graphics, the sort of pioneering work sort of
link |
00:07:16.560
before movies had special effects done with 3D graphics and sort of rode that revolution. And
link |
00:07:22.400
that was very much like the renaissance where people would model light and color and shape
link |
00:07:26.720
and everything. And now we're kind of in another wave where it's more impressionistic and it's
link |
00:07:32.160
sort of the idea of something can be used to generate an image directly, which is sort of the
link |
00:07:38.240
new frontier in computer image generation using AI algorithms. So the creative process is more in
link |
00:07:45.520
the space of ideas or becoming more in the space of ideas versus in the raw pixels?
link |
00:07:50.720
Well, it's interesting. It depends. I think at Adobe, we really want to span the entire range
link |
00:07:55.280
from really, really good, what you might call low level tools, where by low level I mean as close to, say, analog
link |
00:08:01.040
workflows as possible. So what we do there is we make up systems that do really realistic oil
link |
00:08:07.040
paint and watercolor simulation. So if you want every bristle to behave as it would in the real
link |
00:08:11.600
world and leave a beautiful analog trail of water and then flow after you've made the brushstroke,
link |
00:08:17.760
you can do that. And that's really important for people who want to create something
link |
00:08:22.720
really expressive or really novel because they have complete control. And then certain other
link |
00:08:28.160
tasks become automated. That frees the artists up to focus on the inspiration and less on the perspiration.
link |
00:08:35.600
So thinking about different ideas, obviously, once you finish the design, there's a lot of work to
link |
00:08:43.920
say do it for all the different aspect ratios of phones or websites and so on. And that used to
link |
00:08:49.840
take up an awful lot of time for artists. It still does for many, what we call content velocity.
link |
00:08:55.040
And one of the targets of AI is actually to reason, from the first example, about what the
link |
00:09:01.360
likely intent is for these other formats, maybe if you change the language to German and the words
link |
00:09:06.960
are longer, how do you reflow everything so that it looks nicely artistic in that way.
link |
00:09:12.160
And so the person can focus on the really creative bit in the middle, which is what is the look and
link |
00:09:17.360
style and feel and what's the message and what's the story and the human element.
link |
00:09:21.200
So I think creativity is changing. So that's one way in which we're trying to just make it easier
link |
00:09:27.920
and faster and cheaper to do so that there can be more of it, more demand, because it's less
link |
00:09:32.880
expensive. So everyone wants beautiful artwork for everything from a school website to a Hollywood movie.
link |
00:09:40.800
On the other side, as some of these things have automatic versions of them, people will
link |
00:09:46.480
possibly change role from being the hands on artist to being either the art director or
link |
00:09:52.480
the conceptual artist. And then the computer will be a partner to help create polished examples of
link |
00:09:57.920
the idea that they're exploring. Let's talk about Adobe products versus AI and Adobe products.
link |
00:10:04.000
Just so you know where I'm coming from, I'm a huge fan of Photoshop for images, Premiere for video,
link |
00:10:11.200
Audition for audio. I'll probably use Photoshop to create the thumbnail for this video, Premiere
link |
00:10:17.200
to edit the video, Audition to do the audio. That said, everything I do is really manual. And I
link |
00:10:24.800
set up, I use this old school Kinesis keyboard and I have AutoHotkey, and it's really about
link |
00:10:30.640
optimizing the flow of just making sure there's as few clicks as possible. So just being extremely
link |
00:10:37.280
efficient. It's something you started to speak to. So before we get into the fun, sort of awesome
link |
00:10:43.920
deep learning things, where does AI, if you could speak a little more to it, AI or just
link |
00:10:48.880
automation in general, do you see in the coming months and years, or in general in 2018,
link |
00:10:58.560
fitting into making the life of creatives, the low level pixel workflow, easier?
link |
00:11:03.840
Yeah, that's a great question. So we have a very rich array of algorithms already in Photoshop,
link |
00:11:10.160
just classical procedural algorithms as well as ones based on data. In some cases, they end up
link |
00:11:17.600
with a large number of sliders and degrees of freedom. So one way in which AI can help is just
link |
00:11:22.560
an auto button which comes up with default settings based on the content itself rather than
link |
00:11:27.520
default values for the tool. At that point, you then start tweaking. So that's a very kind of
link |
00:11:34.080
make life easier for people whilst making use of common sense from other example images.
link |
00:11:39.600
So like smart defaults.
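As a concrete illustration of that kind of content-aware "auto" button, here is a minimal sketch in Python, assuming only NumPy; the histogram heuristics are illustrative stand-ins, not Adobe's actual logic.

```python
# A minimal sketch of a content-aware "auto" button: instead of fixed default
# slider values, derive exposure and contrast settings from the image itself.
# Illustrative toy (simple histogram statistics), not Adobe's method.
import numpy as np

def auto_defaults(image: np.ndarray) -> dict:
    """Suggest starting slider values from the content of an RGB image in [0, 255]."""
    luma = image.astype(np.float32).mean(axis=2)   # rough per-pixel luminance
    lo, hi = np.percentile(luma, [2, 98])          # robust dark/bright points
    exposure = 118.0 - luma.mean()                 # push the mean toward mid-gray
    contrast = 255.0 / max(hi - lo, 1.0)           # stretch the usable range
    return {"exposure": float(exposure), "contrast": float(contrast)}

# The user still gets the same sliders; these values are just a smarter starting
# point to tweak, rather than a fixed factory default.
demo = (np.random.rand(64, 64, 3) * 180).astype(np.uint8)  # stand-in image
print(auto_defaults(demo))
```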
link |
00:11:40.960
Smart defaults, absolutely. Another one is something we've spent a lot of work over the last
link |
00:11:47.760
20 years (I've been at Adobe for 19) thinking about selection, for instance, where
link |
00:11:53.360
you know, with a quick select, you would look at color boundaries and figure out how to sort of
link |
00:11:58.720
flood fill into regions that you thought were physically connected in the real world.
link |
00:12:03.360
But that algorithm had no visual common sense about what a cat looks like or a dog. It would just do
link |
00:12:08.080
it based on rules of thumb, which were applied to graph theory. And it was a big improvement over
link |
00:12:14.080
the previous work, where you had to sort of almost click everything by hand, or if it just did similar
link |
00:12:19.600
colors, it would do little tiny regions that wouldn't be connected. But in the future,
link |
00:12:24.880
using neural nets to actually do a great job with say a single click or even in the case of
link |
00:12:31.120
well known categories like people or animals, no click, where you just say select the object and
link |
00:12:36.160
it just knows the dominant object is a person in the middle of the photograph. Those kinds of things
link |
00:12:41.440
are really valuable if they can be robust enough to give you good quality results.
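For contrast with the learned, one-click approach described here, a toy version of the older color-based rule of thumb might look like the following sketch (NumPy only; not the Photoshop implementation).

```python
# A toy version of the "rules of thumb" selection described above: flood fill
# outward from a clicked pixel, keeping neighbors whose color is close enough.
# It has no idea what a cat or a person is; a neural selector would replace the
# color test with learned semantics. Illustrative only.
from collections import deque
import numpy as np

def quick_select(image: np.ndarray, seed: tuple, tol: float = 30.0) -> np.ndarray:
    """Return a boolean mask of pixels reachable from `seed` with similar color."""
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_color = image[seed].astype(np.float32)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if np.linalg.norm(image[y, x].astype(np.float32) - seed_color) > tol:
            continue
        mask[y, x] = True
        queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return mask

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
print(quick_select(img, (16, 16)).sum(), "pixels selected")
```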
link |
00:12:47.520
Or they can be a great start for like tweaking it.
link |
00:12:50.720
So for example, background removal. Like one thing I'll do in a thumbnail,
link |
00:12:56.080
I'll take a picture of you right now and essentially remove the background behind you.
link |
00:13:00.240
And I want to make that as easy as possible. You don't have flowing hair, like rich at the
link |
00:13:06.480
moment. Rich sort of. I had it in the past, it may come again in the future, but for now.
link |
00:13:12.320
So that sometimes makes it a little more challenging to remove the background.
link |
00:13:16.160
How difficult do you think is that problem for AI for basically making the quick selection tool
link |
00:13:23.680
smarter and smarter and smarter? Well, we have a lot of research on that already.
link |
00:13:28.400
If you want a sort of quick, cheap and cheerful, look, I'm pretending I'm in Hawaii,
link |
00:13:34.160
but it's sort of a joke, then you don't need perfect boundaries. And you can do that today
link |
00:13:38.400
with a single click for the algorithms we have. We have other algorithms where with a little bit
link |
00:13:44.960
more guidance on the boundaries, like you might need to touch it up a little bit.
link |
00:13:49.920
We have other algorithms that can pull a nice mat from a crude selection. So we have combinations
link |
00:13:56.480
of tools that can do all of that. And at our recent MAX conference, Adobe MAX, we demonstrated how
link |
00:14:04.880
very quickly just by drawing a simple polygon around the object of interest, we could not
link |
00:14:09.680
only do it for a single still, but we could pull at least a selection mask from a moving target,
link |
00:14:17.920
like a person dancing in front of a brick wall or something. And so it's going from hours to
link |
00:14:23.360
a few seconds for workflows that are really nice. And then you might go in and touch up a little.
link |
00:14:30.240
So that's a really interesting question. You mentioned the word robust.
link |
00:14:33.040
You know, there's like a journey for an idea, right? And what you presented probably at Max
link |
00:14:41.360
has elements of just sort of it inspires the concept, it can work pretty well in a majority
link |
00:14:46.320
of cases. But how do you make something that works well in a majority of cases, how do you make
link |
00:14:51.520
something that works maybe in all cases, or that becomes a robust tool?
link |
00:14:56.480
There are a couple of things. So that really touches on the difference between academic research
link |
00:15:01.760
and industrial research. So in academic research, it's really about who's the person to have the
link |
00:15:06.640
great new idea that shows promise. And we certainly love to be those people too. But
link |
00:15:13.120
we have sort of two forms of publishing. One is academic peer review, which we do a lot of,
link |
00:15:17.840
and we have great success there as much as some universities. But then we also have shipping,
link |
00:15:24.720
which is a different type of publishing, and then we get customer review, as well as, you know, product
link |
00:15:30.160
critics. And that might be a case where it's not about being perfect every single time, but
link |
00:15:37.840
perfect enough of the time, plus a mechanism to intervene and recover where you do have mistakes.
link |
00:15:43.280
So we have the luxury of very talented customers. We don't want them to be
link |
00:15:48.720
overly taxed doing it every time. But if they can go in and just take it from 99 to 100,
link |
00:15:55.200
with the touch of a mouse or something, then for the professional end, that's something
link |
00:16:01.440
that we definitely want to support as well. And for them, it went from having to do that
link |
00:16:06.320
tedious task all the time to much less often. So I think that gives us an out. If it had to be
link |
00:16:13.360
100% automatic all the time, then that would delay the time at which we could get to market.
link |
00:16:18.640
So on that thread, maybe you can untangle something. Again, I'm sort of just speaking to
link |
00:16:26.000
my own experience. Maybe that is the most useful idea. So I think Photoshop, as an example, or Premiere,
link |
00:16:37.680
has a lot of amazing features that I haven't touched. So, in terms of AI
link |
00:16:44.240
helping make my life or the life of creatives easier, how does this collaboration between human
link |
00:16:54.080
and machine work? How do you learn to collaborate better? How do you learn the new algorithms?
link |
00:17:00.000
Is it something that where you have to watch tutorials and you have to watch videos and so
link |
00:17:04.240
on? Or do you ever think, do you think about the experience itself through exploration being
link |
00:17:10.400
the teacher? We absolutely do. So I'm glad that you brought this up. We sort of think about
link |
00:17:18.320
two things. One is helping the person in the moment to do the task that they need to do. But
link |
00:17:22.080
the other is thinking more holistically about their journey learning a tool. And when it's like,
link |
00:17:26.960
think of it as Adobe University, where if you use the tool long enough, you become an expert.
link |
00:17:31.120
And not necessarily an expert in everything. It's like living in a city. You don't necessarily
link |
00:17:34.960
know every street, but you know, the important ones you need to get to. So we have projects in
link |
00:17:40.160
research, which actually look at the thousands of hours of tutorials online and try to understand
link |
00:17:45.360
what's being taught in them. And then we had one publication at CHI where it was looking at,
link |
00:17:52.560
given the last three or four actions you did, what did other people in tutorials do next?
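A hedged sketch of that "what did people do next" idea, using simple n-gram counts over hypothetical action names; the published CHI system is more sophisticated than this.

```python
# Count, over a corpus of tutorial action sequences, which action most often
# follows the user's last few actions. Toy n-gram counting for illustration.
from collections import Counter, defaultdict

def build_next_action_model(sessions, context=3):
    model = defaultdict(Counter)
    for actions in sessions:
        for i in range(len(actions) - context):
            key = tuple(actions[i:i + context])
            model[key][actions[i + context]] += 1
    return model

def suggest_next(model, recent, context=3, k=3):
    counts = model.get(tuple(recent[-context:]), Counter())
    return [action for action, _ in counts.most_common(k)]

# Hypothetical tutorial sessions, named for illustration only.
tutorials = [
    ["open", "crop", "levels", "sharpen", "export"],
    ["open", "crop", "levels", "saturation", "export"],
    ["open", "select_subject", "mask", "levels", "export"],
]
model = build_next_action_model(tutorials)
print(suggest_next(model, ["open", "crop", "levels"]))  # e.g. ['sharpen', 'saturation']
```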
link |
00:17:57.360
So if you want some inspiration for what you might do next, or you just want to watch the
link |
00:18:01.680
tutorial and learn from people who are doing similar workflows to you, you can, without having
link |
00:18:06.800
to go and search on keywords and everything. So really trying to use the context of your use of
link |
00:18:13.360
the app to make intelligent suggestions, either about choices that you might make,
link |
00:18:20.800
or in a more assistive way where it could say, if you did this next, we could show you. And that's
link |
00:18:26.480
basically the frontier that we're exploring now, which is, if we really deeply understand the
link |
00:18:31.360
domain in which designers and creative people work, can we combine that with AI and pattern
link |
00:18:38.480
matching of behavior to make intelligent suggestions, either through verbal possibilities or just
link |
00:18:47.520
showing the results of if you try this. And that's really the sort of, I was in a meeting today
link |
00:18:53.840
thinking about these things. So it's still a grand challenge. We'd all love
link |
00:18:58.880
an artist over one shoulder and a teacher over the other, right? And we hope to get there. And
link |
00:19:05.920
the right thing to do is to give enough at each stage that it's useful in itself, but it builds
link |
00:19:10.640
a foundation for the next level of expectation. Are you aware of this gigantic medium of YouTube
link |
00:19:19.120
that's creating just a bunch of creative people, both artists and teachers of different kinds?
link |
00:19:26.240
Absolutely. And the more we can understand those media types, both visually and in terms of
link |
00:19:31.440
transcripts and words, the more we can bring the wisdom that they embody into the guidance that's
link |
00:19:37.200
embedded in the tool. That would be brilliant, to remove the barrier of having to yourself type
link |
00:19:42.960
in the keywords, searching, and so on. Absolutely. And then in the longer term, an interesting
link |
00:19:49.600
discussion is, does it ultimately not just assist with learning the interface we have,
link |
00:19:54.000
but does it modify the interface to be simpler? Or do you fragment into a variety of tools,
link |
00:19:59.200
each of which has a different level of visibility of the functionality? I like to say that if you
link |
00:20:05.600
add a feature to a GUI, you have to have yet more visual complexity confronting the new user.
link |
00:20:12.640
Whereas if you have an assistant with a new skill, if you know they have it, so you know
link |
00:20:17.120
to ask for it, then it's sort of additive without being more intimidating. So we definitely think
link |
00:20:23.280
about new users and how to onboard them. Many actually value the idea of being able to master
link |
00:20:29.120
that complex interface and keyboard shortcuts, like you were talking about earlier, because
link |
00:20:35.520
with great familiarity, it becomes a musical instrument for expressing your visual ideas.
link |
00:20:40.480
And other people just want to get something done quickly in the simplest way possible,
link |
00:20:45.840
and that's where a more assistive version of the same technology might be useful,
link |
00:20:50.400
maybe on a different class of device, which is more in context for capture, say,
link |
00:20:55.920
whereas somebody who's in a deep post production workflow may want to be on a laptop or a big
link |
00:21:01.680
screen desktop and have more knobs and dials to really express the subtlety of what they want to do.
link |
00:21:12.160
So there's so many exciting applications of computer vision and machine learning
link |
00:21:16.320
that Adobe is working on, like scene stitching, sky replacement, foreground,
link |
00:21:21.920
background removal, spatial object based image search, automatic image captioning,
link |
00:21:26.880
like we mentioned, Project Cloak, Project Deep Fill for filling in parts of the images,
link |
00:21:31.920
Project Scribbler, style transfer for video, style transfer of faces in video with Project Puppetron,
link |
00:21:38.640
best name ever. Can you talk through a favorite or some of them or examples that pop in mind?
link |
00:21:49.040
I'm sure I'll be able to provide links to other ones we don't talk about because there's visual
link |
00:21:54.800
elements to all of them that are exciting. Why they're interesting for different reasons might
link |
00:22:00.640
be a good way to go. So I think sky replace is interesting because we talked about selection
link |
00:22:06.640
being sort of an atomic operation. It's almost like a, if you think of an assembly language,
link |
00:22:11.440
it's like a single instruction. Whereas sky replace is a compound action where you automatically
link |
00:22:17.840
select the sky, you look for stock content that matches the geometry of the scene.
link |
00:22:24.000
You try to have variety in your choices so that you do coverage of different moods.
link |
00:22:28.160
It then mats in the sky behind the foreground, but then importantly it uses the foreground
link |
00:22:35.600
of the other image that you just searched on to recolor the foreground of the image that
link |
00:22:40.560
you're editing. So if you say go from a midday sky to an evening sky, it will actually add
link |
00:22:47.840
sort of an orange glow to the foreground objects as well. I was a big fan in college of Magritte
link |
00:22:53.760
and he has a number of paintings where it's surrealism because he'll like do a composite,
link |
00:22:59.600
but the foreground building will be at night and the sky will be during the day. There's one
link |
00:23:03.440
called The Empire of Light which was on my wall in college and we're trying not to do surrealism.
link |
00:23:09.120
It can be a choice, but we'd rather have it be natural by default rather than it looking fake
link |
00:23:15.680
and then you have to do a whole bunch of post production to fix it. So that's a case where
link |
00:23:19.760
we're kind of capturing an entire workflow into a single action and doing it in about a second
link |
00:23:25.120
rather than a minute or two. And when you do that, you can not just do it once, but you can do it
link |
00:23:30.720
for say like 10 different backgrounds and then you're almost back to this inspiration idea of
link |
00:23:36.960
I don't know quite what I want, but I'll know it when I see it. And you can just explore the
link |
00:23:41.840
design space as close to final production value as possible. And then when you really pick one,
link |
00:23:47.200
you might go back and slightly tweak the selection mask just to make it perfect and
link |
00:23:51.120
do that kind of polish that professionals like to bring to their work.
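To make that compound workflow concrete, here is a deliberately crude sketch of the whole loop in Python with NumPy; the sky mask, compositing, and relighting steps are toy stand-ins for the real selection, stock search, matting, and color transfer.

```python
# A toy end-to-end sketch of a compound "sky replace" action: (1) crude sky mask,
# (2) composite a new sky, (3) nudge foreground colors toward the new sky's tint.
# Every step is simplistic on purpose; not the shipping feature.
import numpy as np

def crude_sky_mask(img: np.ndarray) -> np.ndarray:
    r, g, b = [img[..., i].astype(np.float32) for i in range(3)]
    return (b > r + 10) & (b > g + 10)              # "blue-ish" pixels as sky

def replace_sky(img: np.ndarray, new_sky: np.ndarray, tint_strength=0.2) -> np.ndarray:
    mask = crude_sky_mask(img)
    out = img.astype(np.float32).copy()
    out[mask] = new_sky.astype(np.float32)[mask]    # drop in the new sky
    tint = new_sky.reshape(-1, 3).mean(axis=0)      # average color of the new sky
    fg = ~mask
    out[fg] = (1 - tint_strength) * out[fg] + tint_strength * tint  # relight foreground
    return out.clip(0, 255).astype(np.uint8)

day = (np.random.rand(48, 64, 3) * 255).astype(np.uint8)
sunset = np.full((48, 64, 3), (230, 140, 60), dtype=np.uint8)   # stand-in stock sky
print(replace_sky(day, sunset).shape)
```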
link |
00:23:54.320
So then there's this idea of, as you mentioned, the sky, replacing it with different stock images of
link |
00:24:01.040
the sky. In general, you have this idea or it could be on your disk or whatever. But making even
link |
00:24:07.040
more intelligent choices about ways to search stock images which is really interesting. It's
link |
00:24:12.400
kind of spatial. Absolutely. Right. So that was something we called Concept Canvas. So normally
link |
00:24:19.120
when you do say an image search, I assume it's just based on text. You would give the keywords
link |
00:24:26.240
of the things you want to be in the image and it would find the nearest one that had those tags.
link |
00:24:32.720
For many tasks, you really want to be able to say I want a big person in the middle or a
link |
00:24:37.200
dog to the right and an umbrella above on the left because you want to leave space for the text or
link |
00:24:41.280
whatever. And so Concept Canvas lets you assign spatial regions to the keywords. And then we've
link |
00:24:48.640
already preindexed the images to know where the important concepts are in the picture. So we then
link |
00:24:54.560
go through that index matching to assets. And even though it's just another form of search,
link |
00:25:01.200
because you're doing spatial design or layout, it starts to feel like design. You sort of feel
link |
00:25:06.480
oddly responsible for the image that comes back as if you invented it a little bit. So it's a good
link |
00:25:13.120
example where giving enough control starts to make people have a sense of ownership over the
link |
00:25:18.960
outcome of the event. And then we also have technologies in Photoshop where you physically
link |
00:25:23.280
can move the dog in post as well. But for Concept Canvas, it was just a very fast way to sort of
link |
00:25:29.440
loop through and be able to lay things out. In terms of being able to remove objects from a scene
link |
00:25:38.560
and fill in the background automatically. So that's extremely exciting. And that's
link |
00:25:45.920
so neural networks are stepping in there. I just talked this week with Ian Goodfellow.
link |
00:25:51.200
So GANs for doing that is definitely one approach.
link |
00:25:55.360
So is that a really difficult problem? Is it as difficult as it looks,
link |
00:25:59.440
again, to take it to a robust product level? Well, there are certain classes of image for
link |
00:26:06.080
which the traditional algorithms like Content Aware Fill work really well. Like if you have
link |
00:26:10.800
a naturalistic texture like a gravel path or something, because it's patch based, it will
link |
00:26:15.200
make up a very plausible looking intermediate thing and fill in the hole. And then we use some
link |
00:26:20.960
algorithms to sort of smooth out the lighting so you don't see any brightness contrasts in that
link |
00:26:25.200
region or you've gradually ramped from dark to light if it straddles the boundary.
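A small sketch of the patch-based idea described above, assuming only NumPy; real Content-Aware Fill is multi-scale and far more efficient, so this is only to show the mechanics.

```python
# Patch-based hole filling in spirit: for each missing pixel (effectively working
# inward from the hole boundary), find a known patch elsewhere in the image that
# best matches its known neighborhood and copy that patch's center. Toy only.
import numpy as np

def patch_fill(img, hole, patch=5, candidates=200, seed=0):
    rng = np.random.default_rng(seed)
    img = img.astype(np.float32).copy()
    hole = hole.copy()
    r = patch // 2
    h, w, _ = img.shape
    while hole.any():
        progress = False
        ys, xs = np.where(hole)
        for y, x in zip(ys, xs):
            y0, y1, x0, x1 = max(y-r, 0), min(y+r+1, h), max(x-r, 0), min(x+r+1, w)
            known = ~hole[y0:y1, x0:x1]
            if not known.any():                      # interior pixel, no known context yet
                continue
            target = img[y0:y1, x0:x1]
            best, best_cost = None, np.inf
            for _ in range(candidates):              # random candidate source patches
                cy, cx = rng.integers(r, h-r), rng.integers(r, w-r)
                if hole[cy-r:cy+r+1, cx-r:cx+r+1].any():
                    continue
                cand = img[cy-(y-y0):cy+(y1-y), cx-(x-x0):cx+(x1-x)]
                cost = ((cand - target)[known] ** 2).sum()   # compare on known pixels only
                if cost < best_cost:
                    best, best_cost = img[cy, cx].copy(), cost
            if best is not None:
                img[y, x] = best
                hole[y, x] = False
                progress = True
        if not progress:
            break                                    # give up on degenerate inputs
    return img.astype(np.uint8)

img = (np.random.rand(40, 40, 3) * 255).astype(np.uint8)
missing = np.zeros((40, 40), dtype=bool)
missing[15:22, 15:22] = True                         # a small hole to fill
filled = patch_fill(img, missing)
```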
link |
00:26:29.680
Where it gets complicated is if you have to infer invisible structure behind the person in front.
link |
00:26:37.600
And that really requires a common sense knowledge of the world to know what,
link |
00:26:42.480
if I see three quarters of a house, do I have a rough sense of what the rest of the house looks
link |
00:26:47.040
like? If you just fill it in with patches, it can end up sort of doing things that make sense
link |
00:26:51.840
locally. But you look at the global structure and it looks like it's just sort of crumpled or messed
link |
00:26:56.480
up. And so what GANs and neural nets bring to the table is this common sense learned from the
link |
00:27:02.800
training set. And the challenge right now is that the generative methods that can make up
link |
00:27:10.640
missing holes using that kind of technology are still only stable at low resolutions.
link |
00:27:15.520
And so you either need to then go from a low resolution to a high resolution using some other
link |
00:27:19.680
algorithm or we need to push the state of the art and it's still in research to get to that point.
link |
00:27:24.720
Right. Of course, if you show it something, say it's trained on houses and then you show it an
link |
00:27:30.720
octopus, it's not going to do a very good job of showing common sense about octopuses. So
link |
00:27:39.360
again, you're asking about how you know that it's ready for prime time. You really need a very
link |
00:27:45.600
diverse training set of images. And ultimately, that may be a case where you put it out there
link |
00:27:52.880
with some guard rails where you might do a detector which looks at the image and
link |
00:28:00.960
sort of estimates its own competence, of how good a job this algorithm could do.
link |
00:28:05.920
So eventually, there may be this idea of what we call an ensemble of experts where
link |
00:28:11.120
any particular expert is specialized in certain things, and then
link |
00:28:15.440
either they vote to say how confident they are about what to do, which is sort of more future
link |
00:28:19.360
looking, or there's some dispatcher which says you're good at houses, you're good at trees.
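A minimal sketch of those two routing strategies, with hypothetical expert functions standing in for real models.

```python
# An "ensemble of experts" sketch: each specialized model reports its own
# confidence, and either the most confident one wins or a dispatcher routes by
# predicted category. Experts here are hypothetical stand-ins.
from typing import Any, Callable, Dict, Tuple

Expert = Callable[[Any], Tuple[Any, float]]   # returns (result, confidence in [0, 1])

def run_by_vote(experts: Dict[str, Expert], image: Any):
    """Let every expert self-assess; keep the most confident result."""
    results = {name: fn(image) for name, fn in experts.items()}
    best = max(results, key=lambda name: results[name][1])
    return best, results[best][0]

def run_by_dispatch(classify: Callable[[Any], str], experts: Dict[str, Expert], image: Any):
    """Alternative: a dispatcher predicts the category and picks that expert."""
    category = classify(image)
    return category, experts[category](image)[0]

# Hypothetical experts for illustration only.
experts = {
    "houses": lambda img: ("house_fill", 0.9 if "house" in img else 0.1),
    "trees":  lambda img: ("tree_fill",  0.9 if "tree" in img else 0.1),
}
print(run_by_vote(experts, "photo_with_house"))
print(run_by_dispatch(lambda img: "trees" if "tree" in img else "houses", experts, "tree_photo"))
```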
link |
00:28:27.120
So I mean, all this adds up to a lot of work because each of those models will be a whole
link |
00:28:31.520
bunch of work. But I think over time, you'd gradually fill out the set and initially focus
link |
00:28:38.320
on certain workflows and then sort of branch out as you get more capable.
link |
00:28:41.520
So you mentioned workflows and have you considered maybe looking far into the future?
link |
00:28:50.000
First of all, using the fact that there is a huge amount of people that use Photoshop,
link |
00:28:57.680
for example, they have certain workflows, being able to collect the information by which
link |
00:29:04.880
they basically get information about their workflows, about what they need,
link |
00:29:09.440
what are the ways to help them, whether it is houses or octopuses that people work on more.
link |
00:29:16.000
Basically getting a bead on what kind of data needs to be annotated and collected for people
link |
00:29:23.440
to build tools that actually work well for people.
link |
00:29:26.320
Absolutely. And this is a big topic in the whole world of AI: what data can you gather and why.
link |
00:29:33.200
At one level, the way to think about it is we not only want to train our customers in how to use
link |
00:29:39.120
our products, but we want them to teach us what's important and what's useful. At the same time,
link |
00:29:44.160
we want to respect their privacy and obviously we wouldn't do things without their explicit permission.
link |
00:29:52.800
And I think the modern spirit of the age around this is you have to demonstrate to somebody
link |
00:29:57.440
how they're benefiting from sharing their data with the tool. Either it's helping in the short
link |
00:30:02.720
term to understand their intent so you can make better recommendations or if they're
link |
00:30:07.120
friendly to your cause or your tool, or they want to help you evolve quickly because
link |
00:30:11.840
they depend on you for their livelihood, they may be willing to share some of their
link |
00:30:17.360
workflows or choices with the dataset to then be trained on.
link |
00:30:24.720
There are technologies for looking at learning without necessarily storing all the information
link |
00:30:30.560
permanently so that you can learn on the fly but not keep a record of what somebody did.
link |
00:30:36.080
So, we're definitely exploring all of those possibilities.
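One generic pattern for that kind of on-the-fly learning, sketched in Python; the feature names are hypothetical and this is not a description of Adobe's telemetry.

```python
# "Learn on the fly without keeping a record": fold each usage event into an
# aggregate (or a model update), then discard the event itself. Only the
# aggregate survives. Generic online-learning pattern, for illustration only.
class OnlineUsageModel:
    def __init__(self):
        self.counts = {}          # aggregate feature counts
        self.n = 0                # number of events seen

    def update(self, event_features):
        """Fold one usage event into the aggregate, then let the event go."""
        self.n += 1
        for feature in event_features:
            self.counts[feature] = self.counts.get(feature, 0) + 1
        # the raw event is not stored anywhere after this point

    def frequency(self, feature):
        return self.counts.get(feature, 0) / max(self.n, 1)

model = OnlineUsageModel()
model.update({"used_auto_select", "used_crop"})   # hypothetical feature names
model.update({"used_crop"})
print(round(model.frequency("used_crop"), 2))     # 1.0, without keeping either session
```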
link |
00:30:38.720
And I think Adobe exists in a space where Photoshop, if I look at the data I've created
link |
00:30:45.520
and own, I'm less comfortable sharing data with social networks than I am with Adobe because
link |
00:30:51.840
there's just exactly as you said, there's an obvious benefit for sharing the data that I use
link |
00:31:01.360
to create in Photoshop because it's helping improve the workflow in the future.
link |
00:31:05.440
Right.
link |
00:31:06.080
As opposed to it's not clear what the benefit is in social networks.
link |
00:31:10.000
It's nice of you to say that. I mean, I think there are some professional workflows where
link |
00:31:14.000
people might be very protective of what they're doing such as if I was preparing
link |
00:31:18.240
evidence for a legal case, I wouldn't want any of that, you know,
link |
00:31:22.640
phoning home to help train the algorithm or anything.
link |
00:31:26.560
There may be other cases where people say having a trial version or they're doing some,
link |
00:31:30.720
I'm not saying we're doing this today, but there's a future scenario where somebody has a more
link |
00:31:35.280
permissive relationship with Adobe where they explicitly say, I'm fine, I'm only doing hobby
link |
00:31:40.400
projects or things which are non confidential and in exchange for some benefit tangible or
link |
00:31:48.000
otherwise, I'm willing to share very fine grain data.
link |
00:31:51.840
So, another possible scenario is to capture relatively crude high level things from more
link |
00:31:57.920
people and then more detailed knowledge from people who are willing to participate.
link |
00:32:02.160
We do that today with explicit customer studies where, you know, we go and visit somebody and
link |
00:32:07.280
ask them to try the tool and we observe, as humans, what they're doing.
link |
00:32:12.000
In the future, to be able to do that enough to be able to train an algorithm,
link |
00:32:16.320
we'd need a more systematic process, but we'd have to do it very consciously because
link |
00:32:21.200
one of the things people treasure about Adobe is a sense of trust
link |
00:32:24.560
and we don't want to endanger that through overly aggressive data collection.
link |
00:32:28.880
So, we have a Chief Privacy Officer and it's definitely front and center of thinking about AI
link |
00:32:35.520
rather than an afterthought. Well, when you start that program, sign me up.
link |
00:32:39.920
Okay, happy to.
link |
00:32:42.800
Are there other projects that you wanted to mention that I didn't, perhaps, that pop into mind?
link |
00:32:48.640
Well, you covered a number of them. I think you mentioned Project Puppetron.
link |
00:32:51.840
I think that one is interesting because you might think of Adobe as only thinking in 2D
link |
00:32:59.760
and that's a good example where we're actually thinking more three dimensionally about how to
link |
00:33:04.800
assign features to faces so that we can, you know, if you take, so what Puppetron does,
link |
00:33:09.440
it takes either a still or a video of a person talking and then it can take a painting of somebody
link |
00:33:16.320
else and then apply the style of the painting to the person who's talking in the video.
link |
00:33:20.160
And it's unlike a sort of screen door post filter effect that you sometimes see online.
link |
00:33:30.320
It really looks as though it's sort of somehow attached or reflecting the motion of the face.
link |
00:33:36.080
And so that's the case where even to do a 2D workflow like stylization,
link |
00:33:40.320
you really need to infer more about the 3D structure of the world.
link |
00:33:44.160
And I think as 3D computer vision algorithms get better,
link |
00:33:48.560
initially they'll focus on particular domains like faces where you have a lot of
link |
00:33:52.960
prior knowledge about structure and you can maybe have a parameterized template that you fit to the
link |
00:33:57.680
image. But over time, this should be possible for more general content. And it might even be
link |
00:34:03.600
invisible to the user that you're doing 3D reconstruction but under the hood, but it might
link |
00:34:09.360
then let you do edits much more reliably or correctly than you would otherwise.
link |
00:34:15.200
And you know, the face is a very important application, right?
link |
00:34:20.800
So making things work.
link |
00:34:22.480
And a very sensitive one. If you do something uncanny, it's very disturbing.
link |
00:34:26.640
That's right. You have to get it. You have to get it right.
link |
00:34:30.080
So in the space of augmented reality and virtual reality, what do you think is the role of AR and
link |
00:34:39.040
VR in the content we consume as consumers and the content we create as creators?
link |
00:34:45.360
No, that's a great question. I think about this a lot too.
link |
00:34:48.720
So I think VR and AR serve slightly different purposes. So VR can really transport you to an
link |
00:34:55.360
entire immersive world, no matter what your personal situation is. To that extent, it's a bit like
link |
00:35:02.400
a really, really widescreen television where it sort of snaps you out of your context and puts you
link |
00:35:06.720
in a new one. And I think it's still evolving in terms of the hardware. I actually worked on
link |
00:35:13.360
VR in the 90s, trying to solve the latency and sort of nausea problem, which we did,
link |
00:35:17.680
but it was very expensive and a bit early. There's a new wave of that now, I think,
link |
00:35:23.600
and increasingly those devices are becoming all in one rather than something that's tethered to a
link |
00:35:27.600
box. I think the market seems to be bifurcating into things for consumers and things for professional
link |
00:35:34.480
use cases, like for architects and people designing where your product is a building and you really
link |
00:35:40.080
want to experience it better than looking at a scale model or a drawing, I think,
link |
00:35:45.920
or even than a video. So I think for that, where you need a sense of scale and spatial
link |
00:35:50.320
relationships, it's great. I think AR holds the promise of sort of taking digital assets off the
link |
00:35:59.600
screen and putting them in context in the real world on the table in front of you on the wall
link |
00:36:03.600
behind you. And that has the corresponding need that the assets need to adapt to the physical
link |
00:36:10.400
context in which they're being placed. I mean, it's a bit like having a live theater troupe come
link |
00:36:15.680
to your house and put on Hamlet. My mother had a friend who used to do this at Stately Homes in
link |
00:36:20.880
England for the National Trust. And they would adapt the scenes and even they'd walk the audience
link |
00:36:26.640
through the rooms to see the action based on the country house they found themselves in for two
link |
00:36:32.880
days. And I think AR will have the same issue that if you have a tiny table in a big living room
link |
00:36:38.720
or something, it'll try to figure out what can you change and what's fixed. And there's a little
link |
00:36:44.960
bit of a tension between fidelity, where if you captured Senior Eye of doing a fantastic ballet,
link |
00:36:52.560
you'd want it to be sort of exactly reproduced. And maybe all you could do is scale it down.
link |
00:36:56.640
Whereas somebody telling you a story might be walking around the room doing some gestures
link |
00:37:03.760
and that could adapt to the room in which they were telling the story.
link |
00:37:07.760
And do you think fidelity is that important in that space or is it more about the storytelling?
link |
00:37:12.800
I think it may depend on the characteristic of the media. If it's a famous celebrity,
link |
00:37:17.840
then it may be that you want to catch every nuance and they don't want to be reanimated by some
link |
00:37:22.480
algorithm. It could be that if it's really a loveable frog telling you a story and it's
link |
00:37:30.800
about a princess and a frog, then it doesn't matter if the frog moves in a different way.
link |
00:37:35.520
I think a lot of the ideas that have sort of grown up in the game world will
link |
00:37:39.520
now come into the broader commercial sphere once they're needing adaptive characters in AR.
link |
00:37:45.120
Are you thinking of engineering tools that allow creators to create in the augmented world,
link |
00:37:52.480
basically making a Photoshop for the augmented world?
link |
00:37:57.360
Well, we have shown a few demos of sort of taking a Photoshop layer stack and then expanding it into
link |
00:38:02.560
3D. That's actually been shown publicly as one example in AR. Where we're particularly excited
link |
00:38:08.640
at the moment is in 3D. 3D design is still a very challenging space. We believe that it's
link |
00:38:17.120
a worthwhile experiment to try to figure out if AR or immersive makes 3D design more spontaneous.
link |
00:38:23.360
Can you give me an example of 3D design just like applications?
link |
00:38:26.960
Well, literally, a simple one would be laying out objects, right? On a conventional screen,
link |
00:38:32.080
you'd sort of have a plan view and a side view and a perspective view, and you'd sort of be dragging
link |
00:38:35.680
it around with the mouse and if you're not careful, it would go through the wall and all that.
link |
00:38:39.440
Whereas if you were really laying out objects, say in a VR headset, you could literally move
link |
00:38:46.400
your head to see a different viewpoint. They'd be in stereo, so you'd have a sense of depth
link |
00:38:50.720
because you're already wearing the depth glasses, right? So it would be those sort of big gross
link |
00:38:57.040
motor, move things around, kind of skills seem much more spontaneous just like they are in the
link |
00:39:01.120
real world. The frontier for us, I think, is whether that same medium can be used to do fine
link |
00:39:08.320
grain design tasks, like very accurate constraints on, say, a CAD model or something. That may be
link |
00:39:14.880
better done on a desktop, but it may just be a matter of inventing the right UI.
link |
00:39:20.160
So we're hopeful that because there will be this potential explosion of demand for 3D assets
link |
00:39:27.600
that's driven by AR and more real time animation on conventional screens,
link |
00:39:34.880
those tools will also help with, or those devices will help with designing the content as well.
link |
00:39:40.800
You've mentioned quite a few interesting sort of new ideas. At the same time, there's old
link |
00:39:46.480
timers like me that are stuck in their old ways. I think I'm the old timer.
link |
00:39:51.280
Okay. All right. But they're opposed to all change at all costs. When you're thinking about
link |
00:39:58.560
creating new interfaces, do you feel the burden of just this giant user base that loves the
link |
00:40:05.440
current product? So anything new you do, any new idea, comes at a cost in that it'll be resisted?
link |
00:40:13.760
Well, I think if you have to trade off control for convenience, then our existing user base
link |
00:40:21.200
would definitely be offended by that. I think if there are some things where you have more convenience
link |
00:40:27.680
and just as much control, that may be more welcome. We do think about not breaking well known
link |
00:40:34.160
metaphors for things. So things should sort of make sense. Photoshop has never been a static
link |
00:40:40.640
target. It's always been evolving and growing. And to some extent, there's been a lot of brilliant
link |
00:40:46.640
thought along the way of how it works today. So we don't want to just throw all that out.
link |
00:40:51.840
If there's a fundamental breakthrough, like a single click is good enough to select an object
link |
00:40:55.600
rather than having to do lots of strokes, that actually fits in quite nicely to the existing
link |
00:41:01.760
tool set, either as an optional mode or as a starting point. I think where we're looking at
link |
00:41:07.840
radical simplicity where you could encapsulate an entire workflow with a much simpler UI,
link |
00:41:14.000
then sometimes that's easier to do in the context of either a different device like a
link |
00:41:18.640
mobile device where the affordances are naturally different or in a tool that's targeted at a
link |
00:41:25.120
different workflow where it's about spontaneity and velocity rather than precision. And we have
link |
00:41:31.040
projects like Rush, which can let you do professional quality video editing for a certain class of
link |
00:41:36.720
media output that is targeted very differently in terms of users and the experience. And ideally,
link |
00:41:47.440
people would go, if I'm feeling like doing Premiere, big project, I'm doing a four part
link |
00:41:54.160
television series. That's definitely a premier thing. But if I want to do something to show my
link |
00:41:59.200
recent vacation, maybe I'll just use Rush because I can do it in the half an hour I have free at
link |
00:42:05.200
home rather than the four hours I need to do it at work. And for the use cases which we can do well,
link |
00:42:12.480
it really is much faster to get the same output. But the more professional tools obviously have
link |
00:42:16.960
a much richer toolkit and more flexibility in what they can do.
link |
00:42:21.520
And then at the same time, with the flexibility and control, I like this idea of smart defaults,
link |
00:42:27.040
of using AI to coach you, like what Google has, the I'm Feeling Lucky button.
link |
00:42:33.520
Right. Or one button kind of gives you a pretty good
link |
00:42:38.160
set of settings. And then that's almost an educational tool.
link |
00:42:42.000
Absolutely. Yeah.
link |
00:42:43.360
To show, because sometimes when you have all this control, you're not sure about the
link |
00:42:51.040
correlation between the different bars that control different elements of the image and so on.
link |
00:42:55.600
And sometimes there's a degree of, you don't know what the optimal is.
link |
00:43:00.400
And then some things are sort of on demand like help, right?
link |
00:43:05.360
Help, yeah.
link |
00:43:06.160
I'm stuck. I need to know what to look for. I'm not quite sure what it's called.
link |
00:43:10.400
And something that was proactively making helpful suggestions or,
link |
00:43:14.800
you know, you could imagine a make a suggestion button where you'd use all of that knowledge
link |
00:43:20.640
of workflows and everything to maybe suggest something to go and learn about or just to try
link |
00:43:25.120
or show the answer. And maybe it's not one intelligent default, but it's like a variety
link |
00:43:31.280
of defaults. And then you go, oh, I like that one.
link |
00:43:33.760
Yeah. Yeah.
link |
00:43:34.960
Several options.
link |
00:43:37.200
So back to poetry.
link |
00:43:39.520
Ah, yes.
link |
00:43:40.640
We're going to interleave.
link |
00:43:43.440
So first few lines of a recent poem of yours before I ask the next question.
link |
00:43:48.000
Yeah. This is about the smartphone. Today I left my phone at home and went down to the sea.
link |
00:43:57.120
The sand was soft, the ocean glass, but I was still just me.
link |
00:44:02.480
So this is a poem about you leaving your phone behind and feeling quite liberated because of it.
link |
00:44:08.960
So this is kind of a difficult topic and let's see if we can talk about it, figure it out.
link |
00:44:14.800
But so with the help of AI, more and more, we can create versions of ourselves, versions of
link |
00:44:21.120
reality that are in some ways more beautiful than actual reality. And some of the creative effort
link |
00:44:31.920
there is part of creating this illusion. So of course, this is inevitable, but how do you think
link |
00:44:38.880
we should adjust as human beings to live in this digital world that's partly artificial,
link |
00:44:44.320
that's better than the world that we lived in a hundred years ago when you didn't have
link |
00:44:51.520
Instagram and Facebook versions of ourselves online.
link |
00:44:55.840
Oh, this is sort of showing off better versions of ourselves.
link |
00:44:58.880
We're using the tooling of modifying the images or even with artificial intelligence
link |
00:45:04.880
ideas of deep fakes and creating adjusted or fake versions of ourselves and reality.
link |
00:45:14.080
I think it's an interesting question. I have a sort of historical bent on this.
link |
00:45:19.360
I actually wonder if 18th century aristocrats who commissioned famous painters to paint portraits
link |
00:45:24.720
of them had portraits that were slightly nicer than they actually looked in practice.
link |
00:45:28.960
Well played, sir.
link |
00:45:29.680
So human desire to put your best foot forward has always been true.
link |
00:45:37.440
I think it's interesting. You sort of framed it in two ways. One is if we can imagine alternate
link |
00:45:42.240
realities and visualize them, is that a good or bad thing? In the old days, you do it with
link |
00:45:47.280
storytelling and words and poetry, which still resides sometimes on websites. But
link |
00:45:52.560
we've become a very visual culture in particular. In the 19th century, we were very much a text
link |
00:46:00.640
based culture. People would read long tracks. Political speeches were very long. Nowadays,
link |
00:46:07.280
everything's very kind of quick and visual and snappy. I think it depends on how harmless your
link |
00:46:15.280
intent is. A lot of it's about intent. So if you have a somewhat flattering photo that you pick
link |
00:46:23.120
out of the photos that you have in your inbox to say, this is what I look like, it's probably fine.
link |
00:46:32.160
If someone's going to judge you by how you look, then they'll decide soon enough when they meet
link |
00:46:37.040
you what the reality is. I think where it can be harmful is if people hold themselves up to an
link |
00:46:45.120
impossible standard, which they then feel bad about themselves for not meeting. I think that's
link |
00:46:49.520
definitely can be an issue. But I think the ability to imagine and visualize an alternate
link |
00:46:58.400
reality, which sometimes which you then go off and build later, can be a wonderful thing too.
link |
00:47:04.800
People can imagine architectural styles, which they then have a startup, make a fortune and then
link |
00:47:10.720
build a house that looks like their favorite video game. Is that a terrible thing? I think
link |
00:47:18.720
I used to worry about exploration actually, that part of the joy of going to the moon
link |
00:47:24.560
when I was a tiny child, I remember it in grainy black and white, was to know what it would look
link |
00:47:30.320
like when you got there. And I think now we have such good graphics for knowing, for visualizing
link |
00:47:35.120
experience before it happens, that I slightly worry that it may take the edge off actually wanting
link |
00:47:40.960
to go. Because we've seen it on TV, we kind of, oh, by the time we finally get to Mars,
link |
00:47:46.160
we're going, oh yeah, it's Mars, that's what it looks like. But then the outer exploration,
link |
00:47:53.200
I mean, I think Pluto was a fantastic recent discovery where nobody had any idea what it
link |
00:47:58.480
looked like and it was just breathtakingly varied and beautiful. So I think expanding
link |
00:48:04.800
the ability of the human toolkit to imagine and communicate on balance is a good thing.
link |
00:48:10.800
I think there are abuses, we definitely take them seriously and try to discourage them.
link |
00:48:17.440
I think there's a parallel side where the public needs to know what's possible through events like
link |
00:48:22.960
this, right? So that you don't believe everything you read in print anymore, and it may over time
link |
00:48:30.720
become true of images as well. Or you need multiple sets of evidence to really believe
link |
00:48:35.440
something rather than a single media asset. So I think it's a constantly evolving thing.
link |
00:48:40.240
It's been true forever. There's a famous story about Anne of Cleves and Henry VIII where,
link |
00:48:47.760
luckily for Anne, they didn't get married, right? So, or they got married and
link |
00:48:53.840
What's the story?
link |
00:48:54.560
Oh, so Holbein went and painted a picture and then Henry VIII wasn't pleased and, you know,
link |
00:48:59.200
history doesn't record whether Anne was pleased, but I think she was pleased not
link |
00:49:03.520
to be married more than a day or something. So I mean, this has gone on for a long time,
link |
00:49:07.920
but I think it's just part of the magnification of human capability.
link |
00:49:14.640
You've kind of built up an amazing research environment here, research culture, research
link |
00:49:20.560
lab, and you've written that the secret to a thriving research lab is interns. Can you unpack
link |
00:49:25.600
that a little bit? Oh, absolutely. So a couple of reasons.
link |
00:49:31.360
As you can see, looking at my personal history, there are certain ideas you bond with at a certain
link |
00:49:36.080
stage of your career and you tend to keep revisiting them through time. If you're lucky,
link |
00:49:40.880
you pick one that doesn't just get solved in the next five years, and then you're sort of out of
link |
00:49:44.880
luck. So I think a constant influx of new people brings new ideas with it.
link |
00:49:49.840
From the point of view of industrial research, because a big part of what we do is really taking
link |
00:49:56.800
those ideas to the point where they can ship as very robust features, you end up investing a lot
link |
00:50:03.360
in a particular idea. And if you're not careful, people can get too conservative in what they
link |
00:50:08.640
choose to do next, knowing that the product teams will want it. And interns let you explore the more
link |
00:50:15.280
fanciful or unproven ideas in a relatively lightweight way, ideally leading to new publications for
link |
00:50:22.160
the intern and for the researcher. And it gives you then a portfolio from which to draw which idea
link |
00:50:28.080
am I going to then try to take all the way through to being robust in the next year or two to ship.
link |
00:50:34.000
So it sort of becomes part of the funnel. It's also a great way for us to
link |
00:50:38.160
identify future full time researchers, many of our greatest researchers were former interns.
link |
00:50:42.640
It builds a bridge to university departments so we can get to know and build an enduring relationship
link |
00:50:48.800
with the professors, and we often give academic gift funds as well, as an acknowledgement of the
link |
00:50:53.840
value the interns add and their own collaborations. So it's sort of a virtuous cycle. And then the
link |
00:51:01.360
long term legacy of a great research lab hopefully will be not only the people who stay but the ones
link |
00:51:06.960
who move through and then go off and carry that same model to other companies.
link |
00:51:11.520
And so we believe strongly in industrial research and how it can complement academia and
link |
00:51:17.440
we hope that this model will continue to propagate and be invested in by other companies,
link |
00:51:21.920
which makes it harder for us to recruit, of course, but you know, that's the sign of success
link |
00:51:26.800
and a rising tide lifts all ships in that sense.
link |
00:51:31.040
And where's the idea born with the interns? Is there brainstorming? Are there discussions
link |
00:51:38.080
about, you know, like where the ideas come from? Yeah, as I'm asking the question, I
link |
00:51:46.240
realize how dumb it is, but I'm hoping you have a better answer. It's a question I ask at the
link |
00:51:51.760
beginning of every summer. So what will happen is we'll send out a call for interns.
link |
00:52:00.000
we'll have a number of resumes come in, people will contact the candidates, talk to them about
link |
00:52:04.240
their interests. They'll usually try to find somebody who has a reasonably good match to what
link |
00:52:09.920
they're already doing, or just has a really interesting domain that they've been pursuing in
link |
00:52:14.320
their PhD. And we think we'd love to do one of those projects too. And then the intern stays in
link |
00:52:20.640
touch with the mentors, as we call them. And then they come, and at the end of the first two weeks,
link |
00:52:27.360
they have to decide. So they'll often have a general sense by the time they arrive.
link |
00:52:32.000
And we'll have internal discussions about what are all the general
link |
00:52:36.800
ideas that we're wanting to pursue to see whether two people have the same idea and maybe they
link |
00:52:41.120
should talk and all that. But then once the intern actually arrives, sometimes the idea goes linearly
link |
00:52:47.040
and sometimes it takes a giant left turn and we go, that sounded good. But when we thought about
link |
00:52:51.120
it, there's this other project, or it's already been done and we found this paper, so we were
link |
00:52:54.880
scooped. But we have this other great idea. So it's pretty flexible at the beginning. One of the
link |
00:53:02.240
questions for research labs is who's deciding what to do, and then who's to blame if it goes
link |
00:53:08.080
wrong, who gets the credit if it goes right. And so in Adobe, we push the needle very much towards
link |
00:53:15.600
freedom of choice of projects by the researchers and the interns. But then we reward people based
link |
00:53:22.960
on impact. So if the projects ultimately end up impacting the products and having papers and so
link |
00:53:28.000
on. And so your alternative model just to be clear is that you have one lab director who thinks he's
link |
00:53:34.400
a genius and tells everybody what to do, takes all the credit if it goes well, blames everybody
link |
00:53:38.720
else if it goes badly. So we don't want that model. And this helps new ideas percolate up.
link |
00:53:45.440
The art of running such a lab is that there are strategic priorities for the company
link |
00:53:49.840
and there are areas where we do want to invest in pressing problems. And so it's a little bit of a
link |
00:53:55.520
trickle down and filter up meets in the middle. And so you don't tell people you have to do X,
link |
00:54:01.360
but you say X would be particularly appreciated this year. And then people reinterpret X through
link |
00:54:07.360
the filter of things they want to do and they're interested in. And miraculously, it usually comes
link |
00:54:12.720
together very well. One thing that really helps is Adobe has a really broad portfolio of products.
link |
00:54:18.640
So if we have a good idea, there's usually a product team that is intrigued or interested.
link |
00:54:26.000
So it means we don't have to qualify things too much ahead of time. Once in a while, the product
link |
00:54:31.520
teams sponsor an extra intern because they have a particular problem that they really care about,
link |
00:54:36.880
in which case it's a little bit more, we really need one of these. And then we sort of say,
link |
00:54:41.440
great, I get an extra intern. We find an intern who thinks that's a great problem. But that's not
link |
00:54:45.920
the typical model. That's sort of the icing on the cake as far as the budget's concerned.
link |
00:54:51.440
And all of the above end up being important. It's really hard to predict at the beginning of the
link |
00:54:55.920
summer. We all have high hopes for all of the intern projects, but ultimately, some of them
link |
00:55:01.200
pay off and some of them sort of are a nice paper, but don't turn into a feature. Others turn out
link |
00:55:06.480
not to be as novel as we thought, but they'd be a great feature, but not a paper. And then others,
link |
00:55:12.080
we make a little bit of progress and we realize how much we don't know. And maybe we revisit that
link |
00:55:16.400
problem several years in a row until finally we have a breakthrough. And then it becomes more
link |
00:55:22.320
on track to impact a product. Jumping back to a big overall view of Adobe Research, what are you
link |
00:55:30.320
looking forward to in 2019 and beyond? What is, you mentioned there's a giant suite of products,
link |
00:55:37.360
a giant suite of ideas, new interns, a large team of researchers.
link |
00:55:46.960
Where do you think the future holds? In terms of the technological breakthroughs?
link |
00:55:54.400
Technological breakthroughs, especially ones that will make it into product will get to
link |
00:56:00.960
impact the world. So I think the creative or the analytics assistants that we talked about, where
link |
00:56:05.920
they're constantly trying to figure out what you're trying to do and how they can be helpful and make
link |
00:56:10.320
useful suggestions is a really hot topic. And it's very unpredictable as to when it'll be ready,
link |
00:56:16.160
but I'm really looking forward to seeing how much progress we make against that. I think
link |
00:56:22.480
some of the core technologies like generative adversarial networks are immensely promising
link |
00:56:28.640
and seeing how quickly those become practical for mainstream use cases at high resolution with
link |
00:56:34.720
really good quality is also exciting. And they also have this sort of strange way of even the
link |
00:56:39.840
things they do oddly are odd in an interesting way. So it can look like dreaming or something.
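For readers who want to see the core mechanic behind generative adversarial networks, here is a minimal, purely illustrative training loop: a generator tries to fool a discriminator while the discriminator learns to separate real samples from fakes. The toy data, network sizes, and hyperparameters below are invented for brevity and are not anything Adobe ships.

```python
# Minimal GAN sketch: a generator learns to fool a discriminator,
# while the discriminator learns to tell real samples from fakes.
import torch
import torch.nn as nn

LATENT, DATA = 16, 32  # toy sizes: noise vector and "image" vector

generator = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, DATA))
discriminator = nn.Sequential(nn.Linear(DATA, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss = nn.BCELoss()

for step in range(1000):
    real = torch.randn(8, DATA) * 0.5 + 1.0   # placeholder for real training data
    noise = torch.randn(8, LATENT)
    fake = generator(noise)

    # Discriminator step: push real samples toward 1, fakes toward 0.
    d_opt.zero_grad()
    d_loss = loss(discriminator(real), torch.ones(8, 1)) + \
             loss(discriminator(fake.detach()), torch.zeros(8, 1))
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator call fakes real.
    g_opt.zero_grad()
    g_loss = loss(discriminator(fake), torch.ones(8, 1))
    g_loss.backward()
    g_opt.step()
```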
link |
00:56:46.160
So that's fascinating. I think internally we have a Sensei platform, which is a way in which
link |
00:56:55.040
we're pooling our neural net and other intelligence models into a central platform, which can then be
link |
00:57:01.760
leveraged by multiple product teams at once. So we're in the middle of transitioning from a,
link |
00:57:07.040
you know, once you have a good idea, you pick a product team to work with and you sort of hand
link |
00:57:11.040
design it for that use case, to a more sort of Henry Ford model: stand it up in a standard way, which
link |
00:57:17.520
can be accessed in a standard way, which should mean that the time between a good idea and impacting
link |
00:57:22.880
our products will be greatly shortened. And when one product has a good idea, many of the other
link |
00:57:28.960
products can just leverage it too. So it's sort of an economy of scale. So that's more about the
link |
00:57:34.240
how than the what, but that combination of this sort of renaissance in AI, there's a comparable
link |
00:57:39.680
one in graphics with real time ray tracing and other really exciting emerging technologies.
link |
00:57:45.280
And when these all come together, you'll sort of basically be dancing with light, right? Where
link |
00:57:49.920
you'll have real time shadows, reflections, and as if it's a real world in front of you, but then
link |
00:57:56.080
with all these magical properties brought by AI where it sort of anticipates or modifies itself
link |
00:58:00.880
in ways that make sense based on how it understands the creative task you're trying to do.
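One hypothetical way to picture the "stand it up in a standard way" idea is a shared registry where models are registered once behind a common predict interface and any product team can call them the same way. This is only a guess at the pattern being described, not the actual Sensei API; every name below is made up.

```python
# Hypothetical sketch of a shared model platform: researchers register models
# once, and any product team can call them through the same interface.
from typing import Any, Callable, Dict

class ModelRegistry:
    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, predict_fn: Callable[[Any], Any]) -> None:
        """Stand a model up once under a stable name."""
        self._models[name] = predict_fn

    def predict(self, name: str, inputs: Any) -> Any:
        """Access any registered model in a standard way."""
        return self._models[name](inputs)

# A research team registers a model once...
registry = ModelRegistry()
registry.register("auto-mask", lambda image: {"mask": f"mask for {image}"})

# ...and any product team can leverage it without a bespoke integration.
print(registry.predict("auto-mask", "portrait.png"))
```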
link |
00:58:06.320
That's a really exciting future for creatives, for myself too as a creator. So first of all,
link |
00:58:12.320
I work in autonomous vehicles. I'm a roboticist. I love robots. And I think you have a fascination
link |
00:58:17.760
with snakes, both natural and artificial robots. I share your fascination. I mean, their movement
link |
00:58:23.920
is beautiful, adaptable. The adaptability is fascinating. There are, I looked it up, 2900
link |
00:58:31.760
species of snakes in the world. Wow. 175 are venomous, some are tiny, some are huge.
link |
00:58:38.880
I saw that there's one that's 25 feet long in some cases. So what's the most interesting thing
link |
00:58:44.560
that you connect with in terms of snakes, both natural and artificial? Why, what was the connection
link |
00:58:52.000
with robotics AI in this particular form of a robot? Well, it actually came out of my work
link |
00:58:58.000
in the 80s on computer animation, where I started doing things like cloth simulation and
link |
00:59:02.880
other kind of soft body simulation. And you'd sort of drop it and it would bounce,
link |
00:59:07.360
then it would just sort of stop moving. And I thought, well, what if you animate the spring
link |
00:59:10.880
lengths and simulate muscles? And the simplest object I could do that for was an earthworm.
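A rough sketch of that animated-rest-length idea, under the assumption of a simple 1D mass-spring chain (not the actual method or constants from the 1988 paper): a contraction wave travels down the body by modulating each spring's rest length over time. Without directional friction against the ground it only wiggles in place, which hints at why the real thing is harder.

```python
# Sketch: a chain of masses joined by springs whose rest lengths are animated
# by a traveling sine wave, mimicking muscle contraction in an earthworm.
import math

N = 10                                  # number of point masses along the worm
positions = [float(i) for i in range(N)]
velocities = [0.0] * N
STIFFNESS, DAMPING, DT = 40.0, 2.0, 0.01

def rest_length(segment: int, t: float) -> float:
    """Animated rest length: a contraction wave travels down the body."""
    return 1.0 + 0.3 * math.sin(2.0 * math.pi * (t - 0.2 * segment))

def step(t: float) -> None:
    forces = [0.0] * N
    for i in range(N - 1):
        stretch = (positions[i + 1] - positions[i]) - rest_length(i, t)
        f = STIFFNESS * stretch
        forces[i] += f
        forces[i + 1] -= f
    for i in range(N):
        # Semi-implicit Euler integration; a real worm also needs anisotropic
        # ground friction to turn this wiggling into forward motion.
        velocities[i] += (forces[i] - DAMPING * velocities[i]) * DT
        positions[i] += velocities[i] * DT

t = 0.0
for _ in range(2000):
    step(t)
    t += DT
print([round(p, 2) for p in positions])
```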
link |
00:59:16.160
So I actually did a paper in 1988 called The Motion Dynamics of Snakes and Worms. And I
link |
00:59:21.680
read the physiology literature on how both snakes and worms move, and then did some of the early
link |
00:59:27.760
computer animation examples of that. So your interest in robotics started with graphics?
link |
00:59:34.640
Came out of simulation and graphics. When I moved from Alias to Apple, we actually did a
link |
00:59:40.960
movie called Her Majesty's Secret Serpent, which is about a secret agent snake that parachutes in
link |
00:59:46.160
and captures a film canister from a satellite, which tells you how old fashioned we were thinking
link |
00:59:50.320
back then, sort of a classic 1950s or 60s Bond movie kind of thing. And at the same time,
link |
00:59:57.760
I'd always made radio controlled ships from scratch when I was a child. And I thought, well,
link |
01:00:03.120
how hard can it be to build a real one? And so then started what turned out to be like a 15 year
link |
01:00:08.800
obsession with trying to build better snake robots. And the first one that I built just sort of
link |
01:00:14.320
slithered sideways but didn't actually go forward, then I added wheels. And building things in real
link |
01:00:19.520
life makes you honest about the friction. The thing that appeals to me is I love creating the
link |
01:00:25.840
illusion of life, which is what drove me to animation. And if you have a robot with
link |
01:00:30.800
enough degrees of coordinated freedom that move in a kind of biological way, then it starts to
link |
01:00:36.320
cross the uncanny valley and seem like a creature rather than a thing. And I certainly got
link |
01:00:42.080
that with the early snakes. By S3, I had it able to sidewind as well as go directly forward. My
link |
01:00:50.320
wife to be suggested that it would be the ring bearer at our wedding. So it actually went down
link |
01:00:54.240
the aisle carrying the rings and got in the local paper for that, which was really fun.
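For readers curious how coordinated degrees of freedom become sidewinding, a common approach, sketched here as a guess rather than a description of how S3 was actually programmed, is to drive the horizontal joints with a traveling sine wave and add a vertical wave roughly 90 degrees out of phase so the body lifts and replants itself diagonally.

```python
# Sketch: generating snake-robot gaits by driving each joint with a sine wave.
# Lateral undulation uses only horizontal joints; sidewinding adds a vertical
# wave about 90 degrees out of phase so the body lifts and plants itself.
import math

def joint_angles(t: float, n_joints: int, sidewind: bool = False):
    amp = 0.6                        # peak joint angle in radians (assumed)
    spatial = 2.0 * math.pi / 8.0    # phase offset between adjacent joints
    speed = 2.0                      # temporal frequency of the wave
    angles = []
    for i in range(n_joints):
        horizontal = amp * math.sin(speed * t + spatial * i)
        vertical = 0.0
        if sidewind:
            vertical = 0.5 * amp * math.sin(speed * t + spatial * i + math.pi / 2)
        angles.append((horizontal, vertical))
    return angles

# Print the first few joint commands for a 12-joint snake sidewinding.
for t in (0.0, 0.1, 0.2):
    print([tuple(round(a, 2) for a in j) for j in joint_angles(t, 12, sidewind=True)[:4]])
```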
link |
01:01:00.160
And this was all done as a hobby. And at the time, the onboard compute was incredibly
link |
01:01:07.200
limited. It was sort of... Yes, you should explain that with these snakes, the whole idea is that
link |
01:01:11.840
you're trying to run them autonomously. Autonomously, with onboard compute, right. And so
link |
01:01:19.760
the very first one, I actually built the controller from discrete logic, because I used to do LSI,
link |
01:01:25.280
you know, circuits and things when I was a teenager. And then the second and third one,
link |
01:01:30.640
the 8 bit microprocessors were available with like a whole 256 bytes of RAM,
link |
01:01:36.000
which you could just about squeeze in. So they were radio controlled rather than autonomous.
link |
01:01:39.840
And really, they were more about the physicality and coordinated motion.
link |
01:01:46.560
I've occasionally taken a side step into if only I could make it cheaply enough,
link |
01:01:51.520
it would make a great toy, which has been a lesson in how clockwork is its own magical realm that
link |
01:01:59.040
you venture into and learn things about backlash and other things you don't take into account as
link |
01:02:03.680
a computer scientist, which is why what seemed like a good idea doesn't work. So it's quite
link |
01:02:07.600
humbling. And then more recently, I've been building S9, which is a much better engineered
link |
01:02:14.160
version of S3 where the motors wore out and it doesn't work anymore. And you can't buy
link |
01:02:17.760
replacements, which is sad given that it was such a meaningful one. S5 was about twice as long and
link |
01:02:25.760
looked much more biologically inspired. Unlike the typical roboticist, I taper my snakes.
link |
01:02:33.520
There are good mechanical reasons to do that, but it also makes them look more biological,
link |
01:02:37.040
although it means every segment's unique rather than a repetition, which is why most engineers
link |
01:02:43.280
don't do it. It actually saves weight and leverage and everything. And that one is currently on
link |
01:02:49.840
display at the International Spy Museum in Washington, DC. None of it has done any spying.
link |
01:02:56.080
It was on YouTube and it got its own conspiracy theory where people thought that it wasn't real
link |
01:03:00.000
because I work at Adobe, it must be fake graphics. And people would write to me, and I'd tell them
link |
01:03:04.160
it's real. They'd say the background doesn't move, and it's like, it's on a tripod. So that one,
link |
01:03:11.200
but you can see the real thing. So it really is true. And then the latest one is the first one
link |
01:03:16.960
where I could put a Raspberry Pi, which leads to all sorts of terrible jokes about pythons and
link |
01:03:21.280
things. But this one can have onboard compute. And then where my hobby work and my work are
link |
01:03:29.920
converging is you can now add vision accelerator chips, which can evaluate neural nets and do
link |
01:03:36.400
object recognition and everything. So both for the snakes and more recently for the spider that
link |
01:03:41.600
I've been working on, having desktop level compute is now opening up a whole world of
link |
01:03:49.200
true autonomy with onboard compute, onboard batteries, and still having that sort of
link |
01:03:54.880
biomimetic quality that appeals to children in particular. They are really drawn to them and
link |
01:04:01.680
adults think they look creepy, but children actually think they look charming. And I gave a
link |
01:04:08.960
series of lectures at Girls Who Code to encourage people to take an interest in technology. And
link |
01:04:14.880
at the moment, I'd say they're still more expensive than the value that they add,
link |
01:04:19.200
which is why they're a great hobby for me, but they're not really a great product.
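To give a flavor of what onboard neural-net evaluation with a vision accelerator can look like, here is a sketch using the TensorFlow Lite runtime with an Edge TPU style delegate. The model file, delegate library, and input handling are placeholders and assumptions, not a description of his actual robots.

```python
# Sketch: onboard object recognition on a Raspberry Pi with a vision
# accelerator, via the TensorFlow Lite runtime. Paths and model are
# placeholders; a real robot would feed in camera frames.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="detect.tflite",                                  # placeholder model file
    experimental_delegates=[load_delegate("libedgetpu.so.1")],   # accelerator delegate
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Placeholder frame; in practice this would come from the Pi camera.
frame = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
detections = interpreter.get_tensor(out["index"])
print("raw detection tensor shape:", detections.shape)
```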
link |
01:04:22.560
It makes me think about doing that very early thing I did at Alias with changing the muscle
link |
01:04:30.000
rest lengths. If I could do that with a real artificial muscle material, then the next snake
link |
01:04:35.760
ideally would use that rather than motors and gearboxes and everything. It would be lighter,
link |
01:04:40.960
much stronger, and more continuous and smooth. So I like to say being in research is a license
link |
01:04:49.200
to be curious, and I have the same feeling with my hobby. It forces me to read biology and
link |
01:04:54.640
be curious about things that otherwise would have just been a National Geographic special.
link |
01:04:59.680
Suddenly, I'm thinking, how does that snake move? Can I copy it? I look at the trails that
link |
01:05:04.320
side winding snakes leave in sand and see if my snake robots would do the same thing.
link |
01:05:10.240
Out of something inanimate, I like how you try to bring life into it, and beauty.
link |
01:05:13.840
Absolutely. And then ultimately, give it a personality, which is where the intelligent
link |
01:05:18.240
agent research will converge with the vision and voice synthesis to give it a sense of having
link |
01:05:25.040
not necessarily human level intelligence. I think the Turing test is such a high bar, it's
link |
01:05:30.480
a little bit self defeating, but having one that you can have a meaningful conversation with,
link |
01:05:36.880
especially if you have a reasonably good sense of what you can say. So not trying to have it so
link |
01:05:43.040
a stranger could walk up and have one, but so as a pet owner or a robot pet owner, you could
link |
01:05:50.080
know what it thinks about and what it can reason about.
link |
01:05:53.200
Or sometimes just meaningful interaction. If you have the kind of interaction you have with a dog,
link |
01:05:58.800
sometimes you might have a conversation, but it's usually one way.
link |
01:06:02.080
Absolutely.
link |
01:06:02.640
And nevertheless, it feels like a meaningful connection.
link |
01:06:06.800
And one of the things that I'm trying to do in the sample audio that I'll play you is beginning
link |
01:06:12.720
to get towards the point where the reasoning system can explain why it knows something or
link |
01:06:17.680
why it thinks something. And that again, creates the sense that it really does know what it's
link |
01:06:22.320
talking about, but also for debugging. As you get more and more elaborate behavior, it's like,
link |
01:06:29.840
why did you decide to do that? How do you know that? I think the robot's really
link |
01:06:37.120
my muse for helping me think about the future of AI and what to invent next.
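As a toy illustration of a reasoning system that can answer "how do you know that?", one simple approach, sketched here as a guess at the general idea rather than the system in the sample audio, is a forward-chaining rule engine that records which rule and which premises produced each derived fact.

```python
# Sketch: a tiny forward-chaining reasoner that records, for every derived
# fact, which rule and premises produced it, so it can explain its beliefs.
facts = {"sees(ball)", "ball_is(red)"}
rules = [
    ("if I see an object and it is red, it is probably a toy",
     {"sees(ball)", "ball_is(red)"}, "toy(ball)"),
    ("toys are safe to approach",
     {"toy(ball)"}, "safe_to_approach(ball)"),
]

explanations = {}  # derived fact -> (rule description, premises used)

changed = True
while changed:
    changed = False
    for description, premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            explanations[conclusion] = (description, premises)
            changed = True

def explain(fact: str) -> str:
    if fact not in explanations:
        return f"I was told {fact} directly."
    rule, premises = explanations[fact]
    return (f"I believe {fact} because {rule}, and I know "
            + " and ".join(sorted(premises)) + ".")

print(explain("safe_to_approach(ball)"))
```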
link |
01:06:42.560
So even at Adobe, that's mostly operating in the digital world.
link |
01:06:47.040
Correct.
link |
01:06:47.440
Do you ever, do you see a future where Adobe even expands into the more physical world perhaps?
link |
01:06:54.480
So bringing life not just into animations, but bringing life into physical objects, whether it's,
link |
01:07:02.080
well, I'd have to say at the moment it's a twinkle in my eye. I think the more likely thing is that
link |
01:07:08.640
we will bring virtual objects into the physical world through augmented reality.
link |
01:07:14.240
And many of the ideas that might take five years to build a robot to do, you can do in a few weeks
link |
01:07:21.360
with digital assets. So I think when really intelligent robots finally become commonplace,
link |
01:07:29.120
they won't be that surprising because we'll have been living with those personalities in the virtual
link |
01:07:34.000
sphere for a long time. And then they'll just say, oh, it's Siri with legs, or
link |
01:07:39.520
Alexa on hooves or something. So I can see that coming. And for now, it's still an adventure
link |
01:07:47.760
and we don't know quite what the experience will be like. And it's really exciting to sort of see
link |
01:07:53.760
all of these different strands of my career converge. Yeah, in interesting ways. And it is
link |
01:07:59.680
definitely a fun adventure. So let me end with the last few lines of my favorite
link |
01:08:07.920
poem of yours that ponders mortality. And in some sense, immortality, as our ideas live through
link |
01:08:14.560
the ideas of others through the work of others, it ends with, do not weep or mourn. It was enough
link |
01:08:21.120
the little atomies permitted just a single dance. Scatter them as deep as your eyes can see. I'm
link |
01:08:28.000
content they'll have another chance, sweeping more centered parts along to join a jostling,
link |
01:08:34.320
lifting throng as others danced in me. Beautiful poem. Beautiful way to end it. Gavin, thank you
link |
01:08:41.440
so much for talking today. And thank you for inspiring and empowering millions of people
link |
01:08:45.920
like myself to create amazing stuff. Oh, thank you. It's been a great conversation.