Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Lex Fridman Podcast #21
link |
The following is a conversation with Chris Lattner.
link |
Currently, he's a senior director
link |
at Google working on several projects, including CPU, GPU,
link |
TPU accelerators for TensorFlow, Swift for TensorFlow,
link |
and all kinds of machine learning compiler magic
link |
going on behind the scenes.
link |
He's one of the top experts in the world
link |
on compiler technologies, which means he deeply
link |
understands the intricacies of how hardware and software come
link |
together to create efficient code.
link |
He created the LLVM compiler infrastructure project
link |
and the Clang compiler.
link |
He led major engineering efforts at Apple,
link |
including the creation of the Swift programming language.
link |
He also briefly spent time at Tesla
link |
as vice president of Autopilot software
link |
during the transition from Autopilot hardware 1
link |
to hardware 2, when Tesla essentially
link |
started from scratch to build an in-house software
link |
infrastructure for Autopilot.
link |
I could have easily talked to Chris for many more hours.
link |
Compiling code down across the levels of abstraction
link |
is one of the most fundamental and fascinating aspects
link |
of what computers do, and he is one of the world's
link |
experts in this process.
link |
It's rigorous science, and it's messy, beautiful art.
link |
This conversation is part of the Artificial Intelligence
link |
If you enjoy it, subscribe on YouTube, iTunes,
link |
or simply connect with me on Twitter at Lex Fridman.
link |
And now, here's my conversation with Chris Lattner.
link |
What was the first program you've ever written?
link |
Back, and when was it?
link |
I think I started as a kid, and my parents
link |
got a basic programming book.
link |
And so when I started, it was typing out programs
link |
from a book, and seeing how they worked,
link |
and then typing them in wrong, and trying
link |
to figure out why they were not working right,
link |
that kind of stuff.
link |
So BASIC, what was the first language
link |
that you remember yourself maybe falling in love with,
link |
like really connecting with?
link |
I mean, I feel like I've learned a lot along the way,
link |
and each of them have a different special thing
link |
So I started in BASIC, and then went like GW-BASIC,
link |
which was the thing back in the DOS days,
link |
and then upgraded to QBASIC, and eventually QuickBASIC,
link |
which are all slightly more fancy versions of Microsoft
link |
Made the jump to Pascal, and started
link |
doing machine language programming and assembly
link |
in Pascal, which was really cool.
link |
Turbo Pascal was amazing for its day.
link |
Eventually got into C, C++, and then kind of did
link |
lots of other weird things.
link |
I feel like you took the dark path, which is the,
link |
you could have gone Lisp.
link |
You could have gone higher level sort
link |
of functional philosophical hippie route.
link |
Instead, you went into like the dark arts of the C.
link |
It was straight into the machine.
link |
Straight to the machine.
link |
So I started with BASIC, Pascal, and then Assembly,
link |
and then wrote a lot of Assembly.
link |
And I eventually did Smalltalk and other things like that.
link |
But that was not the starting point.
link |
But so what is this journey to C?
link |
Is that in high school?
link |
Is that in college?
link |
That was in high school, yeah.
link |
And then that was really about trying
link |
to be able to do more powerful things than what Pascal could
link |
do, and also to learn a different world.
link |
So C was really confusing to me with pointers
link |
and the syntax and everything, and it took a while.
link |
But Pascal's much more principled in various ways.
link |
C is more, I mean, it has its historical roots,
link |
but it's not as easy to learn.
link |
With pointers, there's this memory management thing
link |
that you have to become conscious of.
link |
Is that the first time you start to understand
link |
that there's resources that you're supposed to manage?
link |
Well, so you have that in Pascal as well.
link |
But in Pascal, like the caret instead of the star,
link |
there's some small differences like that.
link |
But it's not about pointer arithmetic.
link |
And in C, you end up thinking about how things get
link |
laid out in memory a lot more.
link |
And so in Pascal, you have allocating and deallocating
link |
and owning the memory, but just the programs are simpler,
link |
and you don't have to.
link |
Well, for example, Pascal has a string type.
link |
And so you can think about a string
link |
instead of an array of characters
link |
which are consecutive in memory.
link |
So it's a little bit of a higher level abstraction.
link |
So let's get into it.
link |
Let's talk about LLVM, Clang, and compilers.
link |
So can you tell me first what LLVM and Clang are?
link |
And how is it that you find yourself
link |
the creator and lead developer of one
link |
of the most powerful compiler optimization systems
link |
So I guess they're different things.
link |
So let's start with what is a compiler?
link |
Is that a good place to start?
link |
What are the phases of a compiler?
link |
Where are the parts?
link |
So what is even a compiler used for?
link |
So the way I look at this is you have a two sided problem of you
link |
have humans that need to write code.
link |
And then you have machines that need to run
link |
the program that the human wrote.
link |
And for lots of reasons, the humans
link |
don't want to be writing in binary
link |
and don't want to think about every piece of hardware.
link |
And so at the same time that you have lots of humans,
link |
you also have lots of kinds of hardware.
link |
And so compilers are the art of allowing
link |
humans to think at a level of abstraction
link |
that they want to think about.
link |
And then get that program, get the thing that they wrote,
link |
to run on a specific piece of hardware.
link |
And the interesting and exciting part of all this
link |
is that there's now lots of different kinds of hardware,
link |
chips like x86 and PowerPC and ARM and things like that.
link |
But also high performance accelerators
link |
for machine learning and other things like that
link |
are also just different kinds of hardware, GPUs.
link |
These are new kinds of hardware.
link |
And at the same time, on the programming side of it,
link |
you have BASIC, you have C, you have JavaScript,
link |
you have Python, you have Swift.
link |
You have lots of other languages
link |
that are all trying to talk to the human in a different way
link |
to make them more expressive and capable and powerful.
link |
And so compilers are the thing
link |
that goes from one to the other.
link |
End to end, from the very beginning to the very end.
link |
And so you go from what the human wrote
link |
and programming languages end up being about
link |
expressing intent, not just for the compiler
link |
and the hardware, but the programming language's job
link |
is really to capture an expression
link |
of what the programmer wanted
link |
that then can be maintained and adapted
link |
and evolved by other humans,
link |
as well as interpreted by the compiler.
link |
So when you look at this problem,
link |
you have, on the one hand, humans, which are complicated.
link |
And you have hardware, which is complicated.
link |
And so compilers typically work in multiple phases.
link |
And so the software engineering challenge
link |
that you have here is try to get maximum reuse
link |
out of the amount of code that you write,
link |
because these compilers are very complicated.
link |
And so the way it typically works out
link |
is that you have something called a front end or a parser
link |
that is language specific.
link |
And so you'll have a C parser, and that's what Clang is,
link |
or C++ or JavaScript or Python or whatever.
link |
That's the front end.
link |
Then you'll have a middle part,
link |
which is often the optimizer.
link |
And then you'll have a late part,
link |
which is hardware specific.
link |
And so compilers end up,
link |
there's many different layers often,
link |
but these three big groups are very common in compilers.
link |
And what LLVM is trying to do
link |
is trying to standardize that middle and last part.
link |
And so one of the cool things about LLVM
link |
is that there are a lot of different languages
link |
that compile through to it.
link |
And so things like Swift, but also Julia, Rust,
link |
Clang for C, C++, Objective-C,
link |
like these are all very different languages
link |
and they can all use the same optimization infrastructure,
link |
which gets better performance,
link |
and the same code generation infrastructure
link |
for hardware support.
link |
And so LLVM is really that layer that is common,
link |
that all these different specific compilers can use.
link |
And is it a standard, like a specification,
link |
or is it literally an implementation?
link |
It's an implementation.
link |
And so I think there's a couple of different ways
link |
of looking at it, right?
link |
Because it depends on which angle you're looking at it from.
link |
LLVM ends up being a bunch of code, okay?
link |
So it's a bunch of code that people reuse
link |
and they build compilers with.
link |
We call it a compiler infrastructure
link |
because it's kind of the underlying platform
link |
that you build a concrete compiler on top of.
link |
But it's also a community.
link |
And the LLVM community is hundreds of people
link |
that all collaborate.
link |
And one of the most fascinating things about LLVM
link |
over the course of time is that we've managed somehow
link |
to successfully get harsh competitors
link |
in the commercial space to collaborate
link |
on shared infrastructure.
link |
And so you have Google and Apple,
link |
you have AMD and Intel,
link |
you have Nvidia and AMD on the graphics side,
link |
you have Cray and everybody else doing these things.
link |
And all these companies are collaborating together
link |
to make that shared infrastructure really, really great.
link |
And they do this not out of the goodness of their heart,
link |
but they do it because it's in their commercial interest
link |
of having really great infrastructure
link |
that they can build on top of
link |
and facing the reality that it's so expensive
link |
that no one company, even the big companies,
link |
no one company really wants to implement it all themselves.
link |
Expensive or difficult?
link |
That's a great point because it's also about the skill sets.
link |
And the skill sets are very hard to find.
link |
How big is the LLVM?
link |
It always seems like with open source projects,
link |
the kind, and LLVM is open source?
link |
Yes, it's open source.
link |
It's about, it's 19 years old now, so it's fairly old.
link |
It seems like the magic often happens
link |
within a very small circle of people.
link |
At least at their early birth and whatever.
link |
Yes, so the LLVM came from a university project,
link |
and so I was at the University of Illinois.
link |
And there it was myself, my advisor,
link |
and then a team of two or three research students
link |
in the research group,
link |
and we built many of the core pieces initially.
link |
I then graduated and went to Apple,
link |
and at Apple brought it to the products,
link |
first in the OpenGL graphics stack,
link |
but eventually to the C compiler realm,
link |
and eventually built Clang,
link |
and eventually built Swift and these things.
link |
Along the way, building a team of people
link |
that are really amazing compiler engineers
link |
that helped build a lot of that.
link |
And so as it was gaining momentum
link |
and as Apple was using it, being open source and public
link |
and encouraging contribution,
link |
many others, for example, at Google,
link |
came in and started contributing.
link |
And in some cases, Google effectively owns Clang now
link |
because it cares so much about C++
link |
and the evolution of that ecosystem,
link |
and so it's investing a lot in the C++ world
link |
and the tooling and things like that.
link |
And so likewise, NVIDIA cares a lot about CUDA.
link |
And so CUDA uses Clang and uses LLVM
link |
for graphics and GPGPU.
link |
And so when you first started as a master's project,
link |
I guess, did you think it was gonna go as far as it went?
link |
Were you crazy ambitious about it?
link |
It seems like a really difficult undertaking, a brave one.
link |
Yeah, no, no, no, it was nothing like that.
link |
So my goal when I went to the University of Illinois
link |
was to get in and out with a non-thesis master's in a year
link |
and get back to work.
link |
So I was not planning to stay for five years
link |
and build this massive infrastructure.
link |
I got nerd sniped into staying.
link |
And a lot of it was because LLVM was fun
link |
and I was building cool stuff
link |
and learning really interesting things
link |
and facing both software engineering challenges,
link |
but also learning how to work in a team
link |
and things like that.
link |
I had worked at many companies as interns before that,
link |
but it was really a different thing
link |
to have a team of people that are working together
link |
and try and collaborate in version control.
link |
And it was just a little bit different.
link |
Like I said, I just talked to Don Knuth
link |
and he believes that 2% of the world population
link |
have something weird with their brain,
link |
that they're geeks, they understand computers,
link |
they're connected with computers.
link |
He put it at exactly 2%.
link |
He's a specific guy.
link |
It's very specific.
link |
Well, he says, I can't prove it,
link |
but it's very empirically there.
link |
Is there something that attracts you
link |
to the idea of optimizing code?
link |
And it seems like that's one of the biggest,
link |
coolest things about LLVM.
link |
Yeah, that's one of the major things it does.
link |
So I got into that because of a person, actually.
link |
So when I was in my undergraduate,
link |
I had an advisor, or a professor named Steve Vegdahl.
link |
And he, I went to this little tiny private school.
link |
There were like seven or nine people
link |
in my computer science department,
link |
students in my class.
link |
So it was a very tiny, very small school.
link |
It was kind of a wart on the side of the math department
link |
kind of a thing at the time.
link |
I think it's evolved a lot in the many years since then.
link |
But Steve Vegdahl was a compiler guy.
link |
And he was super passionate.
link |
And his passion rubbed off on me.
link |
And one of the things I like about compilers
link |
is that they're large, complicated software pieces.
link |
And so one of the culminating classes
link |
that many computer science departments,
link |
at least at the time, did was to say
link |
that you would take algorithms and data structures
link |
and all these core classes.
link |
But then the compilers class was one of the last classes
link |
you take because it pulls everything together.
link |
And then you work on one piece of code
link |
over the entire semester.
link |
And so you keep building on your own work,
link |
which is really interesting.
link |
And it's also very challenging because in many classes,
link |
if you don't get a project done, you just forget about it
link |
and move on to the next one and get your B or whatever it is.
link |
But here you have to live with the decisions you make
link |
and continue to reinvest in it.
link |
And I really like that.
link |
And so I did an extra study project
link |
with him the following semester.
link |
And he was just really great.
link |
And he was also a great mentor in a lot of ways.
link |
And so from him and from his advice,
link |
he encouraged me to go to graduate school.
link |
I wasn't super excited about going to grad school.
link |
I wanted the master's degree, but I
link |
didn't want to be an academic.
link |
But like I said, I kind of got tricked into staying
link |
and was having a lot of fun.
link |
And I definitely do not regret it.
link |
What aspects of compilers were the things you connected with?
link |
So LLVM, there's also the other part
link |
that's really interesting if you're interested in languages
link |
is parsing and just analyzing the language,
link |
breaking it down, parsing, and so on.
link |
Was that interesting to you, or were you
link |
more interested in optimization?
link |
For me, it was more so I'm not really a math person.
link |
I understand some bits of it when I get into it.
link |
But math is never the thing that attracted me.
link |
And so a lot of the parser part of the compiler
link |
has a lot of good formal theories
link |
that Don, for example, knows quite well.
link |
I'm still waiting for his book on that.
link |
But I just like building a thing and seeing what it could do
link |
and exploring and getting it to do more things
link |
and then setting new goals and reaching for them.
link |
And in the case of LLVM, when I started working on that,
link |
my research advisor that I was working for was a compiler guy.
link |
And so he and I specifically found each other
link |
because we were both interested in compilers.
link |
And so I started working with him and taking his class.
link |
And a lot of LLVM initially was, it's
link |
fun implementing all the standard algorithms and all
link |
the things that people had been talking about
link |
and were well known.
link |
And they were in the curricula for advanced studies
link |
And so just being able to build that was really fun.
link |
And I was learning a lot by, instead of reading about it,
link |
And so I enjoyed that.
link |
So you said compilers are these complicated systems.
link |
Can you even just with language try
link |
to describe how you turn a C++ program into code?
link |
Like, what are the hard parts?
link |
Why is it so hard?
link |
So I'll give you examples of the hard parts along the way.
link |
So C++ is a very complicated programming language.
link |
It's something like 1,400 pages in the spec.
link |
So C++ by itself is crazy complicated.
link |
Can we just pause?
link |
What makes the language complicated in terms
link |
of what's syntactically?
link |
So it's what they call syntax.
link |
So the actual how the characters are arranged, yes.
link |
It's also semantics, how it behaves.
link |
It's also, in the case of C++, there's
link |
a huge amount of history.
link |
C++ is built on top of C. You play that forward.
link |
And then a bunch of suboptimal, in some cases, decisions
link |
were made, and they compound.
link |
And then more and more and more things
link |
keep getting added to C++, and it will probably never stop.
link |
But the language is very complicated
link |
from that perspective.
link |
And so the interactions between subsystems
link |
is very complicated.
link |
There's just a lot there.
link |
And when you talk about the front end,
link |
one of the major challenges, which
link |
Clang as a project, the C, C++ compiler that I built,
link |
I and many people built, one of the challenges we took on
link |
was we looked at GCC.
link |
GCC, at the time, was a really good industry standardized
link |
compiler that had really consolidated
link |
a lot of the other compilers in the world and was a standard.
link |
But it wasn't really great for research.
link |
The design was very difficult to work with.
link |
And it was full of global variables and other things
link |
that made it very difficult to reuse in ways
link |
that it wasn't originally designed for.
link |
And so with Clang, one of the things that we wanted to do
link |
is push forward on better user interface,
link |
so make error messages that are just better than GCC's.
link |
And that's actually hard, because you
link |
have to do a lot of bookkeeping in an efficient way
link |
to be able to do that.
link |
We want to make compile time better.
link |
And so compile time is about making it efficient,
link |
which is also really hard when you're keeping
link |
track of extra information.
link |
We wanted to make new tools available,
link |
so refactoring tools and other analysis tools
link |
that GCC never supported, also leveraging the extra information
link |
we kept, but enabling those new classes of tools
link |
that then get built into IDEs.
link |
And so that's been one of the areas that Clang has really
link |
helped push the world forward in,
link |
is in the tooling for C and C++ and things like that.
link |
But C++ and the front end piece is complicated.
link |
And you have to build syntax trees.
link |
And you have to check every rule in the spec.
link |
And you have to turn that back into an error message
link |
to the human that the human can understand
link |
when they do something wrong.
link |
But then you start doing what's called lowering,
link |
so going from C++ and the way that it represents
link |
code down to the machine.
link |
And when you do that, there's many different phases
link |
Often, there are, I think LLVM has something like 150
link |
different what are called passes in the compiler
link |
that the code passes through.
link |
And these get organized in very complicated ways,
link |
which affect the generated code and the performance
link |
and compile time and many other things.
link |
What are they passing through?
link |
So after you do the Clang parsing, what's the graph?
link |
What does it look like?
link |
What's the data structure here?
link |
Yeah, so in the parser, it's usually a tree.
link |
And it's called an abstract syntax tree.
link |
And so the idea is you have a node for the plus
link |
that the human wrote in their code.
link |
Or the function call, you'll have a node for call
link |
with the function that they call and the arguments they pass,
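To make the syntax tree idea concrete, here is a minimal sketch in Python. The node classes (`Num`, `Var`, `Add`, `Call`) and the `show` helper are hypothetical names chosen for illustration; they are not Clang's actual AST types.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    # Node for the "+" that the human wrote; operands are nested subtrees.
    left: "Expr"
    right: "Expr"


@dataclass
class Call:
    # Node for a function call: the callee name and the argument subtrees.
    callee: str
    args: List["Expr"]


Expr = Union[Num, Var, Add, Call]


def show(e: Expr) -> str:
    """Walk the tree and print it back out as source-like text."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return f"({show(e.left)} + {show(e.right)})"
    if isinstance(e, Call):
        return f"{e.callee}({', '.join(show(a) for a in e.args)})"
    raise TypeError(f"unknown node: {e!r}")


# Tree for the expression: x + f(1, y)
tree = Add(Var("x"), Call("f", [Num(1), Var("y")]))
print(show(tree))
```

The nesting of the objects mirrors the nesting of the source expression, which is the property that the later, flatter representation gives up.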
link |
This then gets lowered into what's
link |
called an intermediate representation.
link |
And intermediate representations are like LLVM has one.
link |
And there, it's what's called a control flow graph.
link |
And so you represent each operation in the program
link |
as a very simple, like this is going to add two numbers.
link |
This is going to multiply two things.
link |
Maybe we'll do a call.
link |
But then they get put in what are called blocks.
link |
And so you get blocks of these straight line operations,
link |
where instead of being nested like in a tree,
link |
it's straight line operations.
link |
And so there's a sequence and an ordering to these operations.
link |
So within the block or outside the block?
link |
That's within the block.
link |
And so it's a straight line sequence of operations
link |
And then you have branches, like conditional branches,
link |
And so when you write a loop, for example, in a syntax tree,
link |
you would have a for node, like for a for statement
link |
in a C like language, you'd have a for node.
link |
And you have a pointer to the expression
link |
for the initializer, a pointer to the expression
link |
for the increment, a pointer to the expression
link |
for the comparison, a pointer to the body.
link |
And these are all nested underneath it.
link |
In a control flow graph, you get a block
link |
for the code that runs before the loop, so the initializer
link |
And you have a block for the body of the loop.
link |
And so the body of the loop code goes in there,
link |
but also the increment and other things like that.
link |
And then you have a branch that goes back to the top
link |
and a comparison and a branch that goes out.
link |
And so it's more of an assembly level kind of representation.
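As a rough sketch of that shape, the loop `for (i = 0; i < n; i++) total += i;` could be laid out as blocks like the following. The block names, the op tuples, and the small `run` interpreter are all assumptions made for this illustration; this is not LLVM IR, and putting the comparison in its own `header` block is just one possible layout.

```python
# Each block is a straight-line list of simple ops followed by one terminator
# (a jump, a conditional branch, or a return).
cfg = {
    # Code that runs before the loop: the initializer.
    "entry": {"ops": [("set", "i", 0), ("set", "total", 0)],
              "term": ("jump", "header")},
    # The comparison, with a branch into the body or out of the loop.
    "header": {"ops": [],
               "term": ("branch_lt", "i", "n", "body", "exit")},
    # The loop body, plus the increment, then a branch back to the top.
    "body": {"ops": [("add", "total", "total", "i"), ("addi", "i", "i", 1)],
             "term": ("jump", "header")},
    "exit": {"ops": [], "term": ("ret", "total")},
}


def run(cfg, env):
    """Execute the control flow graph starting at the 'entry' block."""
    label = "entry"
    while True:
        block = cfg[label]
        for op in block["ops"]:
            if op[0] == "set":        # dest = constant
                env[op[1]] = op[2]
            elif op[0] == "add":      # dest = var + var
                env[op[1]] = env[op[2]] + env[op[3]]
            elif op[0] == "addi":     # dest = var + constant
                env[op[1]] = env[op[2]] + op[3]
        term = block["term"]
        if term[0] == "jump":
            label = term[1]
        elif term[0] == "branch_lt":
            label = term[3] if env[term[1]] < env[term[2]] else term[4]
        elif term[0] == "ret":
            return env[term[1]]


print(run(cfg, {"n": 5}))
```

Nothing in the blocks says "for loop" anymore; the loop exists only as a branch that goes back to an earlier block.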
link |
But the nice thing about this level of representation
link |
is it's much more language independent.
link |
And so there's lots of different kinds of languages
link |
with different kinds of, you know,
link |
JavaScript has a lot of different ideas of what
link |
is false, for example.
link |
And all that can stay in the front end.
link |
But then that middle part can be shared across all those.
link |
How close is that intermediate representation
link |
to neural networks, for example?
link |
Are they, because everything you describe
link |
is a kind of echoes of a neural network graph.
link |
Are they neighbors or what?
link |
They're quite different in details,
link |
but they're very similar in idea.
link |
So one of the things that neural networks do
link |
is they learn representations for data
link |
at different levels of abstraction.
link |
And then they transform those through layers, right?
link |
So the compiler does very similar things.
link |
But one of the things the compiler does
link |
is it has relatively few different representations.
link |
Where a neural network often, as you get deeper, for example,
link |
you get many different representations
link |
in each layer or set of ops.
link |
It's transforming between these different representations.
link |
In a compiler, often you get one representation
link |
and they do many transformations to it.
link |
And these transformations are often applied iteratively.
link |
And for programmers, there's familiar types of things.
link |
For example, trying to find expressions inside of a loop
link |
and pulling them out of a loop so they execute fewer times.
link |
Or find redundant computation.
link |
Or find constant folding or other simplifications,
link |
turning two times x into x shift left by one.
link |
And things like this are all the examples
link |
of the things that happen.
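Two of those simplifications, constant folding and turning two times x into a shift, can be sketched on a toy expression format. The tuple encoding `(op, left, right)` and the `simplify` function are hypothetical, and the shift rewrite assumes integer arithmetic.

```python
def simplify(expr):
    """Simplify a nested (op, left, right) tuple expression bottom-up."""
    if not isinstance(expr, tuple):
        return expr  # a constant (int) or a variable name (str)
    op, a, b = expr
    a, b = simplify(a), simplify(b)
    # Constant folding: both operands are known, so compute the result now.
    if isinstance(a, int) and isinstance(b, int):
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        if op == "<<":
            return a << b
    # Strength reduction: 2 * x (or x * 2) becomes x << 1 for integers.
    if op == "*" and a == 2:
        return ("<<", b, 1)
    if op == "*" and b == 2:
        return ("<<", a, 1)
    return (op, a, b)


print(simplify(("*", 2, "x")))
print(simplify(("+", ("*", 3, 4), "y")))
print(simplify(("*", "x", ("+", 1, 1))))
```

The last call shows why these transformations tend to be applied iteratively: folding `1 + 1` into `2` is what exposes the multiply to the shift rewrite.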
link |
But compilers end up getting a lot of theorem proving
link |
and other kinds of algorithms that
link |
try to find higher level properties of the program that
link |
then can be used by the optimizer.
link |
So what's the biggest bang for the buck with optimization?
link |
Well, no, not even today.
link |
At the very beginning, the 80s, I don't know.
link |
Yeah, so for the 80s, a lot of it
link |
was things like register allocation.
link |
So the idea of in a modern microprocessor,
link |
what you'll end up having is you'll
link |
end up having memory, which is relatively slow.
link |
And then you have registers that are relatively fast.
link |
But registers, you don't have very many of them.
link |
And so when you're writing a bunch of code,
link |
you're just saying, compute this,
link |
put in a temporary variable, compute this, compute this,
link |
compute this, put in a temporary variable.
link |
I have some other stuff going on.
link |
Well, now you're running on an x86,
link |
like a desktop PC or something.
link |
Well, it only has, in some cases, some modes,
link |
And so now the compiler has to choose what values get
link |
put in what registers at what points in the program.
link |
And this is actually a really big deal.
link |
So if you think about, you have a loop, an inner loop
link |
that executes millions of times maybe.
link |
If you're doing loads and stores inside that loop,
link |
then it's going to be really slow.
link |
But if you can somehow fit all the values inside that loop
link |
in registers, now it's really fast.
link |
And so getting that right requires a lot of work,
link |
because there's many different ways to do that.
link |
And often what the compiler ends up doing
link |
is it ends up thinking about things
link |
in a different representation than what the human wrote.
link |
Well, the compiler thinks about that as four different values,
link |
each which have different lifetimes across the function
link |
And each of those could be put in a register or memory
link |
or different memory or maybe in some parts of the code
link |
recomputed instead of stored and reloaded.
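One way to picture values with lifetimes is to compute, for straight-line code, where each value is defined and where it is last used. The sketch below assumes every value is assigned exactly once, and the names `live_intervals` and `max_pressure` are invented for this example; a real register allocator handles branches, spilling, and much more.

```python
def live_intervals(instrs):
    """instrs is a list of (dest, [sources]).

    Returns {name: (index where defined, index of last use)}.
    Assumes each name is defined exactly once, before any use.
    """
    intervals = {}
    for i, (dest, srcs) in enumerate(instrs):
        for s in srcs:
            start, _ = intervals[s]
            intervals[s] = (start, i)  # extend the lifetime to this use
        intervals[dest] = (i, i)       # a new value is born here
    return intervals


def max_pressure(intervals, n):
    """Largest number of values that are live between two instructions."""
    return max(
        sum(1 for start, end in intervals.values() if start <= p < end)
        for p in range(n)
    )


instrs = [
    ("a", []),           # 0: a = ...
    ("b", []),           # 1: b = ...
    ("c", ["a", "b"]),   # 2: c = a op b   (last use of b)
    ("d", ["c", "a"]),   # 3: d = c op a   (last use of a)
    ("e", ["d", "c"]),   # 4: e = d op c
]
intervals = live_intervals(instrs)
print(intervals)
print(max_pressure(intervals, len(instrs)))
```

In this example five values are computed, but no more than two are ever live at once, so two registers would be enough to hold them without going to memory.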
link |
And there are many of these different kinds of techniques
link |
So it's adding almost like a time dimension to it's
link |
trying to optimize across time.
link |
So it's considering when you're programming,
link |
you're not thinking in that way.
link |
And so the RISC era made things.
link |
So RISC chips, R I S C. The RISC chips,
link |
as opposed to CISC chips.
link |
The RISC chips made things more complicated for the compiler,
link |
because what they ended up doing is ending up
link |
adding pipelines to the processor, where
link |
the processor can do more than one thing at a time.
link |
But this means that the order of operations matters a lot.
link |
So one of the classical compiler techniques that you use
link |
is called scheduling.
link |
And so moving the instructions around
link |
so that the processor can keep its pipelines full instead
link |
of stalling and getting blocked.
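A toy version of scheduling can be sketched as follows. The machine model here is an assumption made for illustration: one instruction issues per cycle, in order, and an instruction stalls until its inputs are ready. The latencies and the greedy `list_schedule` heuristic are invented for the example and do not describe any particular processor or LLVM's scheduler.

```python
def run_in_order(order, latency, deps):
    """Return the cycle at which the last result is ready."""
    ready = {}   # name -> cycle when its result becomes available
    cycle = 0    # earliest cycle at which the next instruction may issue
    for name in order:
        start = max([cycle] + [ready[d] for d in deps[name]])
        ready[name] = start + latency[name]
        cycle = start + 1
    return max(ready.values())


def list_schedule(latency, deps):
    """Greedy reordering: always issue the instruction that can start soonest."""
    ready, order, cycle = {}, [], 0
    remaining = list(latency)
    while remaining:
        # Only instructions whose inputs are already scheduled are candidates.
        cands = [n for n in remaining if all(d in ready for d in deps[n])]

        def start(n):
            return max([cycle] + [ready[d] for d in deps[n]])

        # Earliest start wins; ties go to the longer-latency instruction.
        best = min(cands, key=lambda n: (start(n), -latency[n]))
        s = start(best)
        ready[best] = s + latency[best]
        cycle = s + 1
        order.append(best)
        remaining.remove(best)
    return order


latency = {"load_a": 3, "use_a": 1, "load_b": 3, "use_b": 1}
deps = {"load_a": [], "use_a": ["load_a"], "load_b": [], "use_b": ["load_b"]}

naive = ["load_a", "use_a", "load_b", "use_b"]
better = list_schedule(latency, deps)
print(run_in_order(naive, latency, deps), run_in_order(better, latency, deps))
```

Moving the second load up so that it overlaps with the first takes this toy program from 8 cycles to 5 without changing what it computes.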
link |
And so there's a lot of things like that that
link |
are kind of bread and butter compiler techniques
link |
that have been studied a lot over the course of decades now.
link |
But the engineering side of making them real
link |
is also still quite hard.
link |
And you talk about machine learning.
link |
This is a huge opportunity for machine learning,
link |
because many of these algorithms are full of these
link |
hokey, hand rolled heuristics, which
link |
work well on specific benchmarks that don't generalize,
link |
and full of magic numbers.
link |
And I hear there's some techniques that
link |
are good at handling that.
link |
So what would be the, if you were to apply machine learning
link |
to this, what's the thing you're trying to optimize?
link |
Is it ultimately the running time?
link |
You can pick your metric, and there's running time,
link |
there's memory use, there's lots of different things
link |
that you can optimize for.
link |
Code size is another one that some people care about
link |
in the embedded space.
link |
Is this like the thinking into the future,
link |
or has somebody actually been crazy enough
link |
to try to have machine learning based parameter
link |
tuning for the optimization of compilers?
link |
So this is something that is, I would say, research right now.
link |
There are a lot of research systems
link |
that have been applying search in various forms.
link |
And using reinforcement learning is one form,
link |
but also brute force search has been tried for quite a while.
link |
And usually, these are in small problem spaces.
link |
So find the optimal way to code generate a matrix
link |
multiply for a GPU, something like that,
link |
where you say, there, there's a lot of design space of,
link |
do you unroll loops a lot?
link |
Do you execute multiple things in parallel?
link |
And there's many different confounding factors here
link |
because graphics cards have different numbers of threads
link |
and registers and execution ports and memory bandwidth
link |
and many different constraints that interact
link |
in nonlinear ways.
link |
And so search is very powerful for that.
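A brute force search over a small design space can be sketched like this. The cost model `toy_cost` is entirely made up for the illustration (its constants are not measurements of any real GPU); a real system would compile and time each candidate instead. It is only meant to show two knobs interacting: more threads leave fewer registers per thread, which limits how far a loop can be unrolled before values spill.

```python
import itertools


def toy_cost(unroll, threads, register_file=8192, regs_per_iter=6):
    """Made-up cost model, lower is better. Not derived from real hardware."""
    regs_per_thread = register_file // threads
    regs_needed = unroll * regs_per_iter
    spill_penalty = 5 * max(0, regs_needed - regs_per_thread)
    loop_overhead = 100 / unroll      # unrolling amortizes loop overhead
    parallel_time = 1000 / threads    # more threads, less work per thread
    contention = 0.1 * threads        # but more threads also contend more
    return loop_overhead + parallel_time + contention + spill_penalty


def brute_force(unrolls, thread_counts):
    """Try every combination and keep the cheapest one."""
    return min(itertools.product(unrolls, thread_counts),
               key=lambda cfg: toy_cost(*cfg))


best = brute_force([1, 2, 4, 8, 16], [32, 64, 128, 256])
print(best, toy_cost(*best))
```

Under this invented model the best unroll factor depends on the thread count, so tuning each knob on its own would miss the best combination, which is the situation where search earns its keep.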
link |
And it gets used in certain ways,
link |
but it's not very structured.
link |
This is something that we need,
link |
we as an industry need to fix.
link |
So you said 80s, but like, so have there been like big jumps
link |
in improvement and optimization?
link |
Yeah, since then, what's the coolest thing?
link |
It's largely been driven by hardware.
link |
So, well, it's hardware and software.
link |
So in the mid nineties, Java totally changed the world,
link |
And I'm still amazed by how much change was introduced
link |
by the way or in a good way.
link |
So like reflecting back, Java introduced things like,
link |
all at once introduced things like JIT compilation.
link |
None of these were novel, but it pulled it together
link |
and made it mainstream and made people invest in it.
link |
JIT compilation, garbage collection, portable code,
link |
safe code, like memory safe code,
link |
like a very dynamic dispatch execution model.
link |
Like many of these things,
link |
which had been done in research systems
link |
and had been done in small ways in various places,
link |
really came to the forefront,
link |
really changed how things worked
link |
and therefore changed the way people thought
link |
about the problem.
link |
JavaScript was another major world change
link |
based on the way it works.
link |
But also on the hardware side of things,
link |
multi core and vector instructions really change
link |
the problem space and are very,
link |
they don't remove any of the problems
link |
that compilers faced in the past,
link |
but they add new kinds of problems
link |
of how do you find enough work
link |
to keep a four wide vector busy, right?
link |
Or if you're doing a matrix multiplication,
link |
how do you do different columns out of that matrix
link |
And how do you maximally utilize the arithmetic compute
link |
that one core has?
link |
And then how do you take it to multiple cores?
link |
How did the whole virtual machine thing change
link |
the compilation pipeline?
link |
Yeah, so what the Java virtual machine does
link |
is it splits, just like I was talking about before,
link |
where you have a front end that parses the code,
link |
and then you have an intermediate representation
link |
that gets transformed.
link |
What Java did was they said,
link |
we will parse the code and then compile to
link |
what's known as Java byte code.
link |
And that byte code is now a portable code representation
link |
that is industry standard and locked down and can't change.
link |
And then the back part of the compiler
link |
that does optimization and code generation
link |
can now be built by different vendors.
link |
And Java byte code can be shipped around across the wire.
link |
It's memory safe and relatively trusted.
link |
And because of that, it can run in the browser.
link |
And that's why it runs in the browser, right?
link |
And so that way you can be in,
link |
again, back in the day, you would write a Java applet
link |
and as a web developer, you'd build this mini app
link |
that would run on a webpage.
link |
Well, a user of that is running a web browser
link |
on their computer.
link |
You download that Java byte code, which can be trusted,
link |
and then you do all the compiler stuff on your machine
link |
so that you know that you trust that.
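The front end and back end split described above can be sketched in a few lines of Python. This is only an analogy, not how the JVM works: the instruction names (`PUSH`, `ADD`, `SUB`, `MUL`) and the functions `front_end` and `back_end` are invented for the example, and the "back end" here is a simple interpreter rather than an optimizing code generator.

```python
import ast

def front_end(source):
    """Parse an arithmetic expression and emit portable stack bytecode."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)    # operands first, then the operator (postfix order)
            emit(node.right)
            code.append((ops[type(node.op)], None))
        else:
            raise ValueError("unsupported syntax")

    emit(ast.parse(source, mode="eval").body)
    return code

def back_end(code):
    """One possible 'vendor' back end: a simple stack interpreter."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append({"ADD": left + right,
                          "SUB": left - right,
                          "MUL": left * right}[op])
    return stack.pop()

bytecode = front_end("2 * (3 + 4) - 1")  # this list is what would be shipped over the wire
print(bytecode)
print(back_end(bytecode))
```

Because the bytecode format is the only thing the two halves share, a different back end could be swapped in without touching the parser.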
link |
Now, is that a good idea or a bad idea?
link |
It's a great idea.
link |
I mean, it's a great idea for certain problems.
link |
And I'm very much a believer that technology is itself
link |
neither good nor bad.
link |
It's how you apply it.
link |
You know, this would be a very, very bad thing
link |
for very low levels of the software stack.
link |
But in terms of solving some of these software portability
link |
and transparency, or portability problems,
link |
I think it's been really good.
link |
Now, Java ultimately didn't win out on the desktop.
link |
And like, there are good reasons for that.
link |
But it's been very successful on servers and in many places,
link |
it's been a very successful thing over decades.
link |
So what have been LLVM's and C lang's improvements
link |
in optimization throughout its history?
link |
What are some moments where you sat back
link |
and were really proud of what's been accomplished?
link |
Yeah, I think that the interesting thing about LLVM
link |
is not the innovations in compiler research.
link |
It has very good implementations
link |
of various important algorithms, no doubt.
link |
And a lot of really smart people have worked on it.
link |
But I think that the thing that's most profound about LLVM
link |
is that through standardization, it made things possible
link |
that otherwise wouldn't have happened, okay?
link |
And so interesting things that have happened with LLVM,
link |
for example, Sony has picked up LLVM
link |
and used it to do all the graphics compilation
link |
in their movie production pipeline.
link |
And so now they're able to have better special effects
link |
That's kind of cool.
link |
That's not what it was designed for, right?
link |
But that's the sign of good infrastructure
link |
when it can be used in ways it was never designed for
link |
because it has good layering and software engineering
link |
and it's composable and things like that.
link |
Which is where, as you said, it differs from GCC.
link |
Yes, GCC is also great in various ways,
link |
but it's not as good as infrastructure technology.
link |
It's really a C compiler, or it's a Fortran compiler.
link |
It's not infrastructure in the same way.
link |
Now you can tell I don't know what I'm talking about
link |
because I keep saying C lang.
link |
You can always tell when a person has a clue
link |
by the way they pronounce something.
link |
I don't think, have I ever used C lang?
link |
Entirely possible, have you?
link |
Well, so you've used code it's generated, probably.
link |
So Clang and LLVM are used to compile
link |
all the apps on the iPhone effectively and the OSs.
link |
It compiles Google's production server applications.
link |
It's used to build GameCube games and PlayStation 4
link |
and things like that.
link |
So as a user, I have, but just everything I've done
link |
that I experienced with Linux has been,
link |
I believe, always GCC.
link |
Yeah, I think Linux still defaults to GCC.
link |
And is there a reason for that?
link |
Or is it because, I mean, is there a reason for that?
link |
It's a combination of technical and social reasons.
link |
Many Linux developers do use Clang,
link |
but the distributions, for lots of reasons,
link |
use GCC historically, and they've not switched, yeah.
link |
Because it's just anecdotally online,
link |
it seems that LLVM has either reached the level of GCC
link |
or superseded on different features or whatever.
link |
The way I would say it is that they're so close,
link |
it doesn't matter.
link |
Like, they're slightly better in some ways,
link |
slightly worse in others,
link |
but it doesn't actually really matter anymore, at that level.
link |
So in terms of optimization breakthroughs,
link |
it's just been solid incremental work.
link |
Yeah, yeah, which describes a lot of compilers.
link |
The hard thing about compilers, in my experience,
link |
is the engineering, the software engineering,
link |
making it so that you can have hundreds of people
link |
collaborating on really detailed, low level work
link |
And that's really hard.
link |
And that's one of the things I think LLVM has done well.
link |
And that kind of goes back to the original design goals
link |
with it to be modular and things like that.
link |
And incidentally, I don't want to take all the credit
link |
I mean, some of the best parts about LLVM
link |
is that it was designed to be modular.
link |
And when I started, I would write, for example,
link |
a register allocator, and then somebody much smarter than me
link |
would come in and pull it out and replace it
link |
with something else that they would come up with.
link |
And because it's modular, they were able to do that.
link |
And that's one of the challenges with GCC, for example,
link |
is replacing subsystems is incredibly difficult.
link |
It can be done, but it wasn't designed for that.
link |
And that's one of the reasons that LLVM's been
link |
very successful in the research world as well.
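To make the modularity point concrete, here is a small Python sketch in which a register allocator sits behind a narrow interface and can be replaced without touching the caller. Nothing here reflects LLVM's actual code: the function names, the live range format (`name -> (start, end)`), and the simplified linear scan (unlimited registers, no spilling) are assumptions for illustration.

```python
def naive_allocator(live_ranges):
    """Give every value its own register."""
    return {name: i for i, name in enumerate(live_ranges)}

def linear_scan_allocator(live_ranges):
    """Reuse a register once the value that held it is dead."""
    assignment, active, free, next_reg = {}, [], [], 0
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1][0]):
        # Expire values whose live range ended before this one starts.
        for other_end, reg in list(active):
            if other_end < start:
                active.remove((other_end, reg))
                free.append(reg)
        if free:
            reg = free.pop()
        else:
            reg, next_reg = next_reg, next_reg + 1
        assignment[name] = reg
        active.append((end, reg))
    return assignment

def compile_function(live_ranges, allocator=naive_allocator):
    """The rest of the 'compiler' only depends on the allocator's interface."""
    assignment = allocator(live_ranges)
    return assignment, len(set(assignment.values()))

ranges = {"a": (0, 2), "b": (1, 3), "c": (3, 5), "d": (4, 6)}
print(compile_function(ranges))                         # original allocator
print(compile_function(ranges, linear_scan_allocator))  # drop-in replacement
```

The caller never changes; only the component passed in does, which is the property being described.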
link |
But in a community sense, Guido van Rossum, right,
link |
from Python, just retired from, what is it?
link |
Benevolent Dictator for Life, right?
link |
So in managing this community of brilliant compiler folks,
link |
is there, did it, for a time at least,
link |
fall on you to approve things?
link |
Oh yeah, so I mean, I still have something like
link |
an order of magnitude more patches in LLVM
link |
than anybody else, and many of those I wrote myself.
link |
But you still write, I mean, you're still close to the,
link |
to the, I don't know what the expression is,
link |
to the metal, you still write code.
link |
Yeah, I still write code.
link |
Not as much as I was able to in grad school,
link |
but that's an important part of my identity.
link |
But the way that LLVM has worked over time
link |
is that when I was a grad student, I could do all the work
link |
and steer everything and review every patch
link |
and make sure everything was done
link |
exactly the way my opinionated sense
link |
felt like it should be done, and that was fine.
link |
But as things scale, you can't do that, right?
link |
And so what ends up happening is LLVM
link |
has a hierarchical system of what's called code owners.
link |
These code owners are given the responsibility
link |
not to do all the work,
link |
not necessarily to review all the patches,
link |
but to make sure that the patches do get reviewed
link |
and make sure that the right thing's happening
link |
architecturally in their area.
link |
And so what you'll see is you'll see that, for example,
link |
hardware manufacturers end up owning
link |
the hardware specific parts of their hardware.
link |
That's very common.
link |
Leaders in the community that have done really good work
link |
naturally become the de facto owner of something.
link |
And then usually somebody else is like,
link |
how about we make them the official code owner?
link |
And then we'll have somebody to make sure
link |
that all the patches get reviewed in a timely manner.
link |
And then everybody's like, yes, that's obvious.
link |
And then it happens, right?
link |
And usually this is a very organic thing, which is great.
link |
And so I'm nominally the top of that stack still,
link |
but I don't spend a lot of time reviewing patches.
link |
What I do is I help negotiate a lot of the technical
link |
disagreements that end up happening
link |
and making sure that the community as a whole
link |
makes progress and is moving in the right direction
link |
So we also started a nonprofit six years ago,
link |
seven years ago, time's gone away.
link |
And the LLVM Foundation nonprofit helps oversee
link |
all the business sides of things and make sure
link |
that the events that the LLVM community has
link |
are funded and set up and run correctly
link |
and stuff like that.
link |
But the foundation very much stays out
link |
of the technical side of where the project is going.
link |
Right, so it sounds like a lot of it is just organic.
link |
Yeah, well, LLVM is almost 20 years old,
link |
which is hard to believe.
link |
Somebody pointed out to me recently that LLVM
link |
is now older than GCC was when LLVM started, right?
link |
So time has a way of getting away from you.
link |
But the good thing about that is it has a really robust,
link |
really amazing community of people that are
link |
in their professional lives, spread across lots
link |
of different companies, but it's a community
link |
of people that are interested in similar kinds of problems
link |
and have been working together effectively for years
link |
and have a lot of trust and respect for each other.
link |
And even if they don't always agree, we're able
link |
to find a path forward.
link |
So then in a slightly different flavor of effort,
link |
you started at Apple in 2005 with the task
link |
of making, I guess, LLVM production ready.
link |
And then eventually 2013 through 2017,
link |
leading the entire developer tools department.
link |
We're talking about LLVM, Xcode, Objective C to Swift.
link |
So in a quick overview of your time there,
link |
what were the challenges?
link |
First of all, leading such a huge group of developers,
link |
what was the big motivator, dream, mission
link |
behind creating Swift, the early birth of it
link |
from Objective C and so on, and Xcode,
link |
what are some challenges?
link |
So these are different questions.
link |
Yeah, I know, but I wanna talk about the other stuff too.
link |
I'll stay on the technical side,
link |
then we can talk about the big team pieces, if that's okay.
link |
So, to really oversimplify many years of hard work:
link |
LLVM started, joined Apple, became a thing,
link |
became successful and became deployed.
link |
But then there's a question about
link |
how do we actually parse the source code?
link |
So LLVM is that back part,
link |
the optimizer and the code generator.
link |
And LLVM was really good for Apple
link |
as it went through a couple of harder transitions.
link |
I joined right at the time of the Intel transition,
link |
for example, and 64 bit transitions,
link |
and then the transition to ARM with the iPhone.
link |
And so LLVM was very useful
link |
for some of these kinds of things.
link |
But at the same time, there's a lot of questions
link |
around developer experience.
link |
And so if you're a programmer pounding out
link |
at the time Objective C code,
link |
the error message you get, the compile time,
link |
the turnaround cycle, the tooling and the IDE,
link |
were not great, were not as good as they could be.
link |
And so, as I occasionally do, I'm like,
link |
well, okay, how hard is it to write a C compiler?
link |
And so I'm not gonna commit to anybody,
link |
I'm not gonna tell anybody, I'm just gonna just do it
link |
nights and weekends and start working on it.
link |
And then I built up in C,
link |
there's this thing called the preprocessor,
link |
which people don't like,
link |
but it's actually really hard and complicated
link |
and includes a bunch of really weird things
link |
like trigraphs and other stuff like that
link |
that are really nasty,
link |
and it's the crux of a bunch of the performance issues
link |
Started working on the parser
link |
and kind of got to the point where I'm like,
link |
ah, you know what, we could actually do this.
link |
Everybody's saying that this is impossible to do,
link |
but it's actually just hard, it's not impossible.
link |
And eventually told my manager about it,
link |
and he's like, oh, wow, this is great,
link |
we do need to solve this problem.
link |
Oh, this is great, we can get you one other person
link |
to work with you on this, you know?
link |
And slowly a team is formed and it starts taking off.
link |
And C++, for example, huge, complicated language.
link |
People always assume that it's impossible to implement
link |
and it's very nearly impossible,
link |
but it's just really, really hard.
link |
And the way to get there is to build it
link |
one piece at a time incrementally.
link |
And that was only possible because we were lucky
link |
to hire some really exceptional engineers
link |
that knew various parts of it very well
link |
and could do great things.
link |
Swift was kind of a similar thing.
link |
So Swift came from, we were just finishing off
link |
the first version of C++ support in Clang.
link |
And C++ is a very formidable and very important language,
link |
but it's also ugly in lots of ways.
link |
And you can't implement C++ without thinking
link |
there has to be a better thing, right?
link |
And so I started working on Swift, again,
link |
with no hope or ambition that it would go anywhere,
link |
just let's see what could be done,
link |
let's play around with this thing.
link |
It was me in my spare time, not telling anybody about it,
link |
kind of a thing, and it made some good progress.
link |
I'm like, actually, it would make sense to do this.
link |
At the same time, I started talking with the senior VP
link |
of software at the time, a guy named Bertrand Serlet.
link |
And Bertrand was very encouraging.
link |
He was like, well, let's have fun, let's talk about this.
link |
And he was a little bit of a language guy,
link |
and so he helped guide some of the early work
link |
and encouraged me and got things off the ground.
link |
And eventually told my manager and told other people,
link |
and it started making progress.
link |
The complicating thing with Swift
link |
was that the idea of doing a new language
link |
was not obvious to anybody, including myself.
link |
And the tone at the time was that the iPhone
link |
was successful because of Objective C.
link |
Not despite of or just because of.
link |
And you have to understand that at the time,
link |
Apple was hiring software people that loved Objective C.
link |
And it wasn't that they came despite Objective C.
link |
They loved Objective C, and that's why they got hired.
link |
And so you had a software team that the leadership,
link |
in many cases, went all the way back to NeXT,
link |
where Objective C really became real.
link |
And so they, quote unquote, grew up writing Objective C.
link |
And many of the individual engineers
link |
all were hired because they loved Objective C.
link |
And so this notion of, OK, let's do a new language
link |
was kind of heretical in many ways.
link |
Meanwhile, my sense was that the outside community wasn't really
link |
in love with Objective C. Some people were,
link |
and some of the most outspoken people were.
link |
But other people were hitting challenges
link |
because it has very sharp corners
link |
and it's difficult to learn.
link |
And so one of the challenges of making Swift happen that
link |
was totally non technical is the social part of what do we do?
link |
If we do a new language, which at Apple, many things
link |
happen that don't ship.
link |
So if we ship it, what are the metrics of success?
link |
Why would we do this?
link |
Why wouldn't we make Objective C better?
link |
If Objective C has problems, let's file off
link |
those rough corners and edges.
link |
And one of the major things that became the reason to do this
link |
was this notion of safety, memory safety.
link |
And the way Objective C works is that a lot of the object system
link |
and everything else is built on top of pointers in C.
link |
Objective C is an extension on top of C.
link |
And so pointers are unsafe.
link |
And if you get rid of the pointers,
link |
it's not Objective C anymore.
link |
And so fundamentally, that was an issue
link |
that you could not fix safety or memory safety
link |
without fundamentally changing the language.
link |
And so once we got through that part of the mental process
link |
and the thought process, it became a design process
link |
of saying, OK, well, if we're going to do something new,
link |
How do we think about this?
link |
And what do we like?
link |
And what are we looking for?
link |
And that was a very different phase of it.
link |
So what are some design choices early on in Swift?
link |
Like we're talking about braces, are you
link |
making a typed language or not, all those kinds of things.
link |
Yeah, so some of those were obvious given the context.
link |
So a typed language, for example,
link |
Objective C is a typed language.
link |
And going with an untyped language
link |
wasn't really seriously considered.
link |
We wanted the performance, and we
link |
wanted refactoring tools and other things
link |
like that that go with typed languages.
link |
Quick, dumb question.
link |
Was it obvious, I think this would be a dumb question,
link |
but was it obvious that the language
link |
has to be a compiled language?
link |
Yes, that's not a dumb question.
link |
Earlier, I think late 90s, Apple had seriously
link |
considered moving its development experience to Java.
link |
But Swift started in 2010, which was several years
link |
It was when the iPhone was definitely
link |
on an upward trajectory.
link |
And the iPhone was still extremely,
link |
and is still a bit memory constrained.
link |
And so being able to compile the code
link |
and then ship it and then having standalone code that
link |
is not JIT compiled is a very big deal
link |
and is very much part of the Apple value system.
link |
Now, JavaScript's also a thing.
link |
I mean, it's not that this is exclusive,
link |
and technologies are good depending
link |
on how they're applied.
link |
But in the design of Swift, saying,
link |
how can we make Objective C better?
link |
Objective C is statically compiled,
link |
and that was the contiguous, natural thing to do.
link |
Just skip ahead a little bit, and we'll go right back.
link |
Just as a question, as you think about today in 2019
link |
in your work at Google, TensorFlow and so on,
link |
is, again, compilations, static compilation still
link |
Yeah, so the funny thing after working
link |
on compilers for a really long time is that,
link |
and this is one of the things that LLVM has helped with,
link |
is that I don't look at compilation as
link |
being static or dynamic or interpreted or not.
link |
This is a spectrum.
link |
And one of the cool things about Swift
link |
is that Swift is not just statically compiled.
link |
It's actually dynamically compiled as well,
link |
and it can also be interpreted.
link |
Though, nobody's actually done that.
link |
And so what ends up happening when
link |
you use Swift in a workbook, for example in Colab or in Jupyter,
link |
is it's actually dynamically compiling the statements
link |
as you execute them.
link |
And so this gets back to the software engineering problems,
link |
where if you layer the stack properly,
link |
you can actually completely change
link |
how and when things get compiled because you
link |
have the right abstractions there.
link |
And so the way that a Colab workbook works with Swift
link |
is that when you start typing into it,
link |
it creates a process, a Unix process.
link |
And then each line of code you type in,
link |
it compiles it through the Swift compiler, the front end part,
link |
and then sends it through the optimizer,
link |
JIT compiles machine code, and then
link |
injects it into that process.
link |
And so as you're typing new stuff,
link |
it's like squirting in new code and overwriting and replacing
link |
and updating code in place.
link |
And the fact that it can do this is not an accident.
link |
Swift was designed for this.
link |
But it's an important part of how the language was set up
link |
and how it's layered, and this is a nonobvious piece.
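The notebook flow described above (a long-lived process, with each cell compiled and injected into it) can be imitated in Python as a rough analogy. The `Session` class is hypothetical, and Python's `compile` produces bytecode rather than JIT-compiled machine code, so this only mirrors the shape of the mechanism, not the Swift implementation.

```python
class Session:
    """A toy notebook session: one long-lived namespace, fed one cell at a time."""

    def __init__(self):
        self.namespace = {}

    def run_cell(self, source):
        # 'Front end + code generation': compile just this cell.
        code = compile(source, "<cell>", "exec")
        # 'Inject' the result into the live session state.
        exec(code, self.namespace)

session = Session()
session.run_cell("def area(r):\n    return 3 * r * r")
session.run_cell("first = area(2)")
# Re-running a cell with a new definition replaces the old code in place.
session.run_cell("def area(r):\n    return 3.14159 * r * r")
session.run_cell("second = area(2)")
print(session.namespace["first"], session.namespace["second"])
```

Values computed before the redefinition keep their old results, while later cells see the updated function.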
link |
And one of the things with Swift that
link |
was, for me, a very strong design point
link |
is to make it so that you can learn it very quickly.
link |
And so from a language design perspective,
link |
the thing that I always come back to
link |
is this UI principle of progressive disclosure
link |
And so in Swift, you can start by saying print, quote,
link |
hello world, quote.
link |
And there's no slash n, just like Python, one line of code,
link |
no main, no header files, no public static class void,
link |
blah, blah, blah, string like Java has, one line of code.
link |
And you can teach that, and it works great.
link |
Then you can say, well, let's introduce variables.
link |
And so you can declare a variable with var.
link |
So var x equals 4.
link |
What is a variable?
link |
You can use x, x plus 1.
link |
This is what it means.
link |
Then you can say, well, how about control flow?
link |
Well, this is what an if statement is.
link |
This is what a for statement is.
link |
This is what a while statement is.
link |
Then you can say, let's introduce functions.
link |
And many languages like Python have
link |
had this kind of notion of let's introduce small things,
link |
and then you can add complexity.
link |
Then you can introduce classes.
link |
And then you can add generics, in the case of Swift.
link |
And then you can build in modules
link |
and build out in terms of the things that you're expressing.
link |
But this is not very typical for compiled languages.
link |
And so this was a very strong design point,
link |
and one of the reasons that Swift, in general,
link |
is designed with this factoring of complexity in mind
link |
so that the language can express powerful things.
link |
You can write firmware in Swift if you want to.
link |
But it has a very high level feel,
link |
which is really this perfect blend, because often you
link |
have very advanced library writers that
link |
want to be able to use the nitty gritty details.
link |
But then other people just want to use the libraries
link |
and work at a higher abstraction level.
link |
It's kind of cool that I saw that you can just
link |
I don't think I pronounced that word enough.
link |
But you can just drag in Python.
link |
It's just strange.
link |
You can import, like I saw this in the demo.
link |
How do you make that happen?
link |
What's up with that?
link |
Is that as easy as it looks, or is it?
link |
Yes, as easy as it looks.
link |
That's not a stage magic hack or anything like that.
link |
I don't mean from the user perspective.
link |
I mean from the implementation perspective to make it happen.
link |
So it's easy once all the pieces are in place.
link |
The way it works, so if you think about a dynamically typed
link |
language like Python, you can think about it
link |
in two different ways.
link |
You can say it has no types, which
link |
is what most people would say.
link |
Or you can say it has one type.
link |
And you can say it has one type, and it's the Python object.
link |
And the Python object gets passed around.
link |
And because there's only one type, it's implicit.
link |
And so what happens with Swift and Python talking
link |
to each other, Swift has lots of types.
link |
It has arrays, and it has strings, and all classes,
link |
and that kind of stuff.
link |
But it now has a Python object type.
link |
So there is one Python object type.
link |
And so when you say import NumPy, what you get
link |
is a Python object, which is the NumPy module.
link |
And then you say np.array.
link |
It says, OK, hey, Python object, I have no idea what you are.
link |
Give me your array member.
link |
And it just uses dynamic stuff, talks to the Python interpreter,
link |
and says, hey, Python, what's the .array member
link |
in that Python object?
link |
It gives you back another Python object.
link |
And now you say parentheses for the call and the arguments
link |
you're going to pass.
link |
And so then it says, hey, a Python object
link |
that is the result of np.array, call with these arguments.
link |
Again, calling into the Python interpreter to do that work.
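The lookup-then-call sequence just described can be modeled in Python itself, as a sketch of the idea rather than of the Swift code. `DynamicObject` is a made-up name standing in for the single opaque object type: member access and calls are both resolved at run time and always hand back another wrapped object. The standard `math` module is used instead of NumPy so the example has no dependencies.

```python
import math

class DynamicObject:
    """Hypothetical stand-in for the single 'Python object' type described above."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Dynamic member lookup: ask the wrapped object at run time.
        return DynamicObject(getattr(self._wrapped, name))

    def __call__(self, *args):
        # Dynamic call: unwrap arguments, call, and wrap the result again.
        raw = [a._wrapped if isinstance(a, DynamicObject) else a for a in args]
        return DynamicObject(self._wrapped(*raw))

    def unwrap(self):
        return self._wrapped

module = DynamicObject(math)  # like importing a module and getting one opaque object
sqrt = module.sqrt            # member lookup returns another opaque object
result = sqrt(16)             # calling it returns yet another one
print(result.unwrap())
```

The statically typed side only ever sees one type; everything else is deferred to the interpreter behind it.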
link |
And so right now, this is all really simple.
link |
And if you dive into the code, what you'll see
link |
is that the Python module in Swift
link |
is something like 1,200 lines of code or something.
link |
It's written in pure Swift.
link |
It's super simple.
link |
And it's built on top of the C interoperability
link |
because it just talks to the Python interpreter.
link |
But making that possible required
link |
us to add two major language features to Swift
link |
to be able to express these dynamic calls
link |
and the dynamic member lookups.
link |
And so what we've done over the last year
link |
is we've proposed, implemented, standardized, and contributed
link |
new language features to the Swift language
link |
in order to make it so it is really trivial.
link |
And this is one of the things about Swift
link |
that is critical to the Swift for TensorFlow work, which
link |
is that we can actually add new language features.
link |
And the bar for adding those is high,
link |
but it's what makes it possible.
link |
So you're now at Google doing incredible work
link |
on several things, including TensorFlow.
link |
So TensorFlow 2.0 or whatever leading up to 2.0 has,
link |
by default, in 2.0, has eager execution.
link |
And yet, in order to make code optimized for GPU or TPU
link |
or some of these systems, computation
link |
needs to be converted to a graph.
link |
So what's that process like?
link |
What are the challenges there?
link |
Yeah, so I am tangentially involved in this.
link |
But the way that it works with AutoGraph
link |
is that you mark your function with a decorator.
link |
And when Python calls it, that decorator is invoked.
link |
And then it says, before I call this function,
link |
you can transform it.
link |
And so the way AutoGraph works is, as far as I understand,
link |
is it actually uses the Python parser
link |
to go parse that, turn it into a syntax tree,
link |
and now apply compiler techniques to, again,
link |
transform this down into TensorFlow graphs.
link |
And so you can think of it as saying, hey,
link |
I have an if statement.
link |
I'm going to create an if node in the graph,
link |
like you say tf.cond.
link |
You have a multiply.
link |
Well, I'll turn that into a multiply node in the graph.
link |
And it becomes this tree transformation.
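A minimal sketch of that tree transformation, using Python's standard `ast` module. This is not AutoGraph's implementation: the node labels (`cond`, `multiply`, `add`) and the `GraphBuilder` class are invented here, and the example only records which graph nodes would be created instead of building a real TensorFlow graph.

```python
import ast

class GraphBuilder(ast.NodeVisitor):
    """Walk a Python syntax tree and record graph-style nodes (names are made up)."""

    def __init__(self):
        self.nodes = []

    def visit_If(self, node):
        self.nodes.append("cond")          # an if statement becomes a conditional node
        self.generic_visit(node)

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Mult):
            self.nodes.append("multiply")  # a * becomes a multiply node
        elif isinstance(node.op, ast.Add):
            self.nodes.append("add")
        self.generic_visit(node)

def to_graph(source):
    builder = GraphBuilder()
    builder.visit(ast.parse(source))
    return builder.nodes

SOURCE = """
def f(x, y):
    if x > 0:
        return x * y
    return x + y
"""

print(to_graph(SOURCE))
```

A real converter would also have to handle loops, variable scoping, and side effects, which is where most of the difficulty lies.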
link |
So where does the Swift for TensorFlow
link |
come in, which is parallels?
link |
For one, Swift is an interface.
link |
Like, Python is an interface to TensorFlow.
link |
But it seems like there's a lot more going on in just
link |
a different language interface.
link |
There's optimization methodology.
link |
So the TensorFlow world has a couple
link |
of different what I'd call front end technologies.
link |
And so Swift and Python and Go and Rust and Julia
link |
and all these things share the TensorFlow graphs
link |
and all the runtime and everything that's later.
link |
And so Swift for TensorFlow is merely another front end
link |
for TensorFlow, just like any of these other systems are.
link |
There's a major difference between, I would say,
link |
three camps of technologies here.
link |
There's Python, which is a special case,
link |
because the vast majority of the community effort
link |
is going to the Python interface.
link |
And Python has its own approaches
link |
for automatic differentiation.
link |
It has its own APIs and all this kind of stuff.
link |
There's Swift, which I'll talk about in a second.
link |
And then there's kind of everything else.
link |
And so the everything else are effectively language bindings.
link |
So they call into the TensorFlow runtime,
link |
but they usually don't have automatic differentiation
link |
or they usually don't provide anything other than APIs
link |
that call the C APIs in TensorFlow.
link |
And so they're kind of wrappers for that.
link |
Swift is really kind of special.
link |
And it's a very different approach.
link |
Swift for TensorFlow, that is, is a very different approach.
link |
Because there we're saying, let's
link |
look at all the problems that need
link |
to be solved in the full stack of the TensorFlow compilation
link |
process, if you think about it that way.
link |
Because TensorFlow is fundamentally a compiler.
link |
It takes models, and then it makes them go fast on hardware.
link |
That's what a compiler does.
link |
And it has a front end, it has an optimizer,
link |
and it has many back ends.
link |
And so if you think about it the right way,
link |
or if you look at it in a particular way,
link |
And so Swift is merely another front end.
link |
But it's saying, and the design principle is saying,
link |
let's look at all the problems that we face as machine
link |
learning practitioners and what is the best possible way we
link |
can do that, given the fact that we can change literally
link |
anything in this entire stack.
link |
And Python, for example, where the vast majority
link |
of the engineering and effort has gone into,
link |
is constrained by being the best possible thing you
link |
can do with a Python library.
link |
There are no Python language features
link |
that are added because of machine learning
link |
that I'm aware of.
link |
They added a matrix multiplication operator
link |
with @, but that's as close as you get.
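The matrix multiplication operator mentioned here is `@`, added to Python by PEP 465 and dispatched through the `__matmul__` method. The tiny `Matrix` class below is a made-up example to show the hook; libraries such as NumPy provide their own implementations.

```python
class Matrix:
    """Tiny list-of-lists matrix, only to show the @ hook (PEP 465)."""

    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        cols = list(zip(*other.rows))  # columns of the right-hand matrix
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])

product = Matrix([[1, 2], [3, 4]]) @ Matrix([[5, 6], [7, 8]])
print(product.rows)
```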
link |
And so with Swift, it's hard, but you
link |
can add language features to the language.
link |
And there's a community process for that.
link |
And so we look at these things and say, well,
link |
what is the right division of labor
link |
between the human programmer and the compiler?
link |
And Swift has a number of things that shift that balance.
link |
So because it has a type system, for example,
link |
that makes certain things possible for analysis
link |
of the code, and the compiler can automatically
link |
build graphs for you without you thinking about them.
link |
That's a big deal for a programmer.
link |
You just get free performance.
link |
You get clustering and fusion and optimization,
link |
things like that, without you as a programmer
link |
having to manually do it because the compiler can do it for you.
link |
Automatic differentiation is another big deal.
link |
And I think one of the key contributions of the Swift
link |
for TensorFlow project is that there's
link |
this entire body of work on automatic differentiation
link |
that dates back to the Fortran days.
link |
People doing a tremendous amount of numerical computing
link |
in Fortran used to write these what they call source
link |
to source translators, where you take a bunch of code,
link |
shove it into a mini compiler, and it would push out
link |
more Fortran code.
link |
But it would generate the backwards passes
link |
for your functions for you, the derivatives.
link |
And so in that work in the 70s, a tremendous number
link |
of optimizations, a tremendous number of techniques
link |
for fixing numerical instability,
link |
and other kinds of problems were developed.
link |
But they're very difficult to port into a world
link |
where, in eager execution, you get one op at a time.
link |
You need to be able to look at an entire function
link |
and be able to reason about what's going on.
link |
And so when you have a language integrated automatic
link |
differentiation, which is one of the things
link |
that the Swift project is focusing on,
link |
you can open up all these techniques
link |
and reuse them in familiar ways.
link |
But the language integration piece
link |
has a bunch of design room in it, and it's also complicated.
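To make "generating the backwards pass" concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python. Note that it uses the op-by-op recording style that the answer above contrasts with whole-function approaches; it is not the Fortran source-to-source technique and not Swift's language-integrated design. The `Var` class and the `sin` and `backward` functions are hypothetical names invented for this illustration.

```python
import math


class Var:
    """Scalar value that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Backward pass: walk the recorded ops in reverse and apply the chain rule."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # inputs are appended before their consumers

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + sin(x)      # forward pass records each op as it runs
backward(z)             # derivative pass over the recorded ops
print(x.grad, y.grad)   # dz/dx = y + cos(x), dz/dy = x
```

Because this version only ever sees one operation at a time, it cannot apply the whole-function optimizations and stability fixes discussed above, which is the motivation given for moving differentiation into the compiler.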
link |
The other piece of the puzzle here that's kind of interesting
link |
is TPUs at Google.
link |
So we're in a new world with deep learning.
link |
It constantly is changing, and I imagine,
link |
without disclosing anything, I imagine
link |
you're still innovating on the TPU front, too.
link |
So how much interplay is there between software and hardware
link |
in trying to figure out how to together move
link |
towards an optimized solution?
link |
There's an incredible amount.
link |
So we're on our third generation of TPUs,
link |
which are now 100 petaflops in a very large liquid cooled box,
link |
virtual box with no cover.
link |
And as you might imagine, we're not out of ideas yet.
link |
The great thing about TPUs is that they're
link |
a perfect example of hardware software co design.
link |
And so it's about saying, what hardware
link |
do we build to solve certain classes of machine learning problems?
link |
Well, the algorithms are changing.
link |
The hardware takes, in some cases, years to produce.
link |
And so you have to make bets and decide
link |
what is going to happen and what is the best way to spend
link |
the transistors to get the maximum performance per watt
link |
or area per cost or whatever it is that you're optimizing for.
link |
And so one of the amazing things about TPUs
link |
is this numeric format called bfloat16.
link |
bfloat16 is a compressed 16 bit floating point format,
link |
but it puts the bits in different places.
link |
And in numeric terms, it has a smaller mantissa
link |
and a larger exponent.
link |
That means that it's less precise,
link |
but it can represent larger ranges of values,
link |
which in the machine learning context
link |
is really important and useful because sometimes you
link |
have very small gradients you want to accumulate
link |
and very, very small numbers that
link |
are important to move things as you're learning.
link |
But sometimes you have very large magnitude numbers as well.
link |
And bfloat16 is not as precise.
link |
The mantissa is small.
link |
But it turns out the machine learning algorithms actually
link |
want to generalize.
link |
And so there's theories that this actually
link |
increases the ability for the network
link |
to generalize across data sets.
link |
And regardless of whether it's good or bad,
link |
it's much cheaper at the hardware level to implement
link |
because the area and time of a multiplier
link |
is n squared in the number of bits in the mantissa,
link |
but it's linear with size of the exponent.
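A small sketch of the trade-off just described. bfloat16 uses 1 sign bit, 8 exponent bits, and 7 mantissa bits, so it is the top half of an IEEE float32; IEEE float16 uses 5 exponent bits and 10 mantissa bits. The conversion below simply truncates the low 16 bits, which is an assumption made for brevity (real hardware typically rounds), and the cost table just squares the significand width to illustrate the "n squared" remark rather than to model any real chip. All function names are hypothetical.

```python
import struct


def to_bfloat16_bits(x):
    """Keep the top 16 bits of the float32 encoding of x (truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bfloat16_bits(bits16):
    """Expand 16 bfloat16 bits back to a Python float by zero-filling the low bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def roundtrip(x):
    return from_bfloat16_bits(to_bfloat16_bits(x))


# Range: 1e30 is far beyond the float16 maximum (about 65504) but fits in bfloat16.
big = roundtrip(1e30)
# Precision: with only 7 stored mantissa bits, 1.001 collapses to 1.0.
small_step = roundtrip(1.001)

# Rough relative multiplier cost, assuming area grows with the square of the
# significand width (stored mantissa bits plus the implicit leading 1).
SIGNIFICAND_BITS = {"float32": 24, "float16": 11, "bfloat16": 8}
relative_cost = {name: bits * bits for name, bits in SIGNIFICAND_BITS.items()}

print(big, small_step, relative_cost)
```

Under that squared-width assumption a bfloat16 multiplier comes out at roughly half the cost of a float16 one (64 versus 121) and about a ninth of a float32 one (64 versus 576), while keeping the same exponent range as float32.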
link |
And you're connected to both efforts
link |
here both on the hardware and the software side?
link |
Yeah, and so that was a breakthrough
link |
coming from the research side and people
link |
working on optimizing network transport of weights
link |
across the network originally and trying
link |
to find ways to compress that.
link |
But then it got burned into silicon.
link |
And it's a key part of what makes TPU performance
link |
so amazing and great.
link |
Now, TPUs have many different aspects that are important.
link |
But the co design between the low level compiler bits
link |
and the software bits and the algorithms
link |
is all super important.
link |
And it's this amazing trifecta that only Google can do.
link |
Yeah, that's super exciting.
link |
So can you tell me about the MLIR project, previously
link |
the secretive one?
link |
Yeah, so MLIR is a project that we
link |
announced at a compiler conference three weeks ago
link |
or something at the Compilers for Machine Learning workshop.
link |
Basically, again, if you look at TensorFlow as a compiler stack,
link |
it has a number of compiler algorithms within it.
link |
It also has a number of compilers
link |
that get embedded into it.
link |
And they're made by different vendors.
link |
For example, Google has XLA, which
link |
is a great compiler system.
link |
NVIDIA has TensorRT.
link |
There's a number of these different compiler systems.
link |
And they're very hardware specific.
link |
And they're trying to solve different parts of the problems.
link |
But they're all kind of similar in a sense of they
link |
want to integrate with TensorFlow.
link |
Now, TensorFlow has an optimizer.
link |
And it has these different code generation technologies.
link |
The idea of MLIR is to build a common infrastructure
link |
to support all these different subsystems.
link |
And initially, it's to be able to make it
link |
so that they all plug in together
link |
and they can share a lot more code and can be reusable.
link |
But over time, we hope that the industry
link |
will start collaborating and sharing code.
link |
And instead of reinventing the same things over and over again,
link |
that we can actually foster some of that working together
link |
to solve common problem energy that
link |
has been useful in the compiler field before.
link |
Beyond that, MLIR is some people have joked
link |
that it's kind of LLVM 2.
link |
It learns a lot from what LLVM has been good at
link |
and what LLVM has done wrong.
link |
And it's a chance to fix that.
link |
And also, there are challenges in the LLVM ecosystem as well,
link |
where LLVM is very good at the thing it was designed to do.
link |
But 20 years later, the world has changed.
link |
And people are trying to solve higher level problems.
link |
And we need some new technology.
link |
And what's the future of open source in this context?
link |
So it is not yet open source.
link |
But it will be hopefully in the next couple months.
link |
So you still believe in the value of open source
link |
in these kinds of contexts?
link |
And I think that the TensorFlow community at large
link |
fully believes in open source.
link |
So I mean, there is a difference between Apple,
link |
where you were previously, and Google now,
link |
in spirit and culture.
link |
And I would say the open source in TensorFlow
link |
was a seminal moment in the history of software,
link |
because here's this large company releasing
link |
a very large code base as open source.
link |
What are your thoughts on that?
link |
Were you happy or not to see that kind
link |
of degree of open sourcing?
link |
So between the two, I prefer the Google approach,
link |
if that's what you're saying.
link |
The Apple approach makes sense, given the historical context
link |
that Apple came from.
link |
But that was 35 years ago.
link |
And I think that Apple is definitely adapting.
link |
And the way I look at it is that there's
link |
different kinds of concerns in the space.
link |
It is very rational for a business
link |
to care about making money.
link |
That fundamentally is what a business is about.
link |
But I think it's also incredibly realistic to say,
link |
it's not your string library that's
link |
the thing that's going to make you money.
link |
It's going to be the amazing UI product differentiating
link |
features and other things like that that you built on top
link |
of your string library.
link |
And so keeping your string library
link |
proprietary and secret and things
link |
like that is maybe not the important thing anymore.
link |
Where before, platforms were different.
link |
And even 15 years ago, things were a little bit different.
link |
But the world is changing.
link |
So Google strikes a very good balance.
link |
And I think that TensorFlow being open source really
link |
changed the entire machine learning field
link |
and caused a revolution in its own right.
link |
And so I think it's amazingly forward looking
link |
because I could have imagined, and I wasn't at Google
link |
at the time, but I could imagine a different context
link |
and different world where a company says,
link |
machine learning is critical to what we're doing.
link |
We're not going to give it to other people.
link |
And so that decision is a profoundly brilliant insight
link |
that I think has really led to the world being
link |
better and better for Google as well.
link |
And has all kinds of ripple effects.
link |
I think it is really, I mean, you
link |
can't overstate how profound Google's decision was.
link |
Well, and again, I can understand the concern
link |
about if we release our machine learning software,
link |
our competitors could go faster.
link |
But on the other hand, I think that open sourcing TensorFlow
link |
has been fantastic for Google.
link |
And I'm sure that decision was very nonobvious at the time,
link |
but I think it's worked out very well.
link |
So let's try this real quick.
link |
You were at Tesla for five months
link |
as the VP of autopilot software.
link |
You led the team during the transition from hardware
link |
one to hardware two.
link |
I have a couple of questions.
link |
So one, first of all, to me, that's
link |
one of the bravest engineering undertakings, really,
link |
ever in the automotive industry to me, software wise,
link |
starting from scratch.
link |
It's a really brave engineering decision.
link |
So my one question there is, what was that like?
link |
What was the challenge of that?
link |
Do you mean the career decision of jumping
link |
from a comfortable good job into the unknown, or?
link |
That combined, so at the individual level,
link |
you making that decision.
link |
And then when you show up, it's a really hard engineering problem.
link |
So you could just stay, maybe slow down,
link |
say hardware one, or those kinds of decisions.
link |
Just taking it full on, let's do this from scratch.
link |
What was that like?
link |
Well, so I mean, I don't think Tesla
link |
has a culture of taking things slow and seeing how it goes.
link |
And one of the things that attracted me about Tesla
link |
is it's very much a gung ho, let's change the world,
link |
let's figure it out kind of a place.
link |
And so I have a huge amount of respect for that.
link |
Tesla has done very smart things with hardware one.
link |
And the hardware one design was originally
link |
designed to be very simple automation features
link |
in the car for like traffic aware cruise control and things like that.
link |
And the fact that they were able to effectively feature creep
link |
it into lane holding and a very useful driver assistance
link |
feature is pretty astounding, particularly given
link |
the details of the hardware.
link |
Hardware two built on that in a lot of ways.
link |
And the challenge there was that they
link |
were transitioning from a third party provided vision stack
link |
to an in house built vision stack.
link |
And so the first step, which I mostly helped with,
link |
was getting onto that new vision stack.
link |
And that was very challenging.
link |
And it was time critical for various reasons,
link |
and it was a big leap.
link |
But it was fortunate that it built
link |
on a lot of the knowledge and expertise and the team
link |
that had built hardware one's driver assistance features.
link |
So you spoke in a collected and kind way
link |
about your time at Tesla, but it was ultimately not a good fit.
link |
Elon Musk, we've talked on this podcast,
link |
several guests to the course, Elon Musk
link |
continues to do some of the most bold and innovative engineering
link |
work in the world, at times at the cost
link |
of some of the members of the Tesla team.
link |
What did you learn about working in this chaotic world?
link |
Yeah, so I guess I would say that when I was at Tesla,
link |
I experienced and saw the highest degree of turnover
link |
I'd ever seen in a company, which was a bit of a shock.
link |
But one of the things I learned and I came to respect
link |
is that Elon's able to attract amazing talent because he
link |
has a very clear vision of the future,
link |
and he can get people to buy into it
link |
because they want that future to happen.
link |
And the power of vision is something
link |
that I have a tremendous amount of respect for.
link |
And I think that Elon is fairly singular
link |
in the world in terms of the things
link |
he's able to get people to believe in.
link |
And there are many people that stand on the street corner
link |
and say, ah, we're going to go to Mars, right?
link |
But then there are a few people that
link |
can get others to buy into it and believe and build the path
link |
and make it happen.
link |
And so I respect that.
link |
I don't respect all of his methods,
link |
but I have a huge amount of respect for that.
link |
You've mentioned in a few places,
link |
including in this context, working hard.
link |
What does it mean to work hard?
link |
And when you look back at your life,
link |
what were some of the most brutal periods
link |
of having to really put everything
link |
you have into something?
link |
Yeah, good question.
link |
So working hard can be defined a lot of different ways,
link |
so a lot of hours, and so that is true.
link |
The thing to me that's the hardest
link |
is both being short term focused on delivering and executing
link |
and making a thing happen while also thinking
link |
about the longer term and trying to balance that.
link |
Because if you are myopically focused on solving a task
link |
and getting that done and only think
link |
about that incremental next step,
link |
you will miss the next big hill you should jump over to.
link |
And so I've been really fortunate that I've
link |
been able to kind of oscillate between the two.
link |
And historically at Apple, for example, that
link |
was made possible because I was able to work with some really
link |
amazing people and build up teams and leadership
link |
structures and allow them to grow in their careers
link |
and take on responsibility, thereby freeing up
link |
me to be a little bit crazy and thinking about the next thing.
link |
And so it's a lot of that.
link |
But it's also about with experience,
link |
you make connections that other people don't necessarily make.
link |
And so I think that's a big part as well.
link |
But the bedrock is just a lot of hours.
link |
And that's OK with me.
link |
There's different theories on work life balance.
link |
And my theory for myself, which I do not project onto the team,
link |
but my theory for myself is that I
link |
want to love what I'm doing and work really hard.
link |
And my purpose, I feel like, and my goal is to change the world
link |
and make it a better place.
link |
And that's what I'm really motivated to do.
link |
So last question, the LLVM logo is a dragon.
link |
You explain that this is because dragons have connotations
link |
of power, speed, intelligence.
link |
It can also be sleek, elegant, and modular,
link |
though you remove the modular part.
link |
What is your favorite dragon related character
link |
from fiction, video, or movies?
link |
So those are all very kind ways of explaining it.
link |
Do you want to know the real reason it's a dragon?
link |
So there is a seminal book on compiler design
link |
called The Dragon Book.
link |
And so this is a really old book on compilers now.
link |
And so the dragon logo for LLVM came about because at Apple,
link |
we kept talking about LLVM related technologies
link |
and there's no logo to put on a slide.
link |
And so we're like, what do we do?
link |
And somebody's like, well, what kind of logo
link |
should a compiler technology have?
link |
And I'm like, I don't know.
link |
I mean, the dragon is the best thing that we've got.
link |
And Apple somehow magically came up with the logo.
link |
And it was a great thing.
link |
And the whole community rallied around it.
link |
And then it got better as other graphic designers got involved.
link |
But that's originally where it came from.
link |
Are there dragons from fiction that you
link |
connect with, like Game of Thrones, Lord of the Rings,
link |
that kind of thing?
link |
Lord of the Rings is great.
link |
I also like role playing games and things
link |
like computer role playing games.
link |
And so dragons often show up in there.
link |
But really, it comes back to the book.
link |
Oh, no, we need a thing.
link |
And hilariously, one of the funny things about LLVM
link |
is that my wife, who's amazing, runs the LLVM Foundation.
link |
And she goes to Grace Hopper and is
link |
trying to get more women involved in the.
link |
She's also a compiler engineer.
link |
So she's trying to get other women
link |
to get interested in compilers and things like this.
link |
And so she hands out the stickers.
link |
And people like the LLVM sticker because of Game of Thrones.
link |
And so sometimes culture has this helpful effect
link |
to get the next generation of compiler engineers
link |
engaged with the cause.
link |
Chris, thanks so much for talking with us.
link |
It's been great talking with you.