Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70

The following is a conversation with Jim Keller, legendary microprocessor engineer who has worked at AMD, Apple, Tesla, and now Intel. He's known for his work on the AMD K7, K8, K12, and Zen microarchitectures, the Apple A4 and A5 processors, and for being a co-author of the specification for the x86-64 instruction set and the HyperTransport interconnect. He's a brilliant first-principles engineer, an out-of-the-box thinker, and just an interesting and fun human being to talk to.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, follow on Spotify, support it on Patreon, or simply connect with me on Twitter: Lex Fridman, spelled F-R-I-D-M-A-N.

I recently started doing ads at the end of the introduction. I'll do one or two minutes after introducing the episode, and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store. I personally use Cash App to send money to friends, but you can also use it to buy, sell, and deposit Bitcoin in just seconds. Cash App also has a new investing feature. You can buy fractions of a stock, say one dollar's worth, no matter what the stock price is. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC.

I'm excited to be working with Cash App to support one of my favorite organizations, called FIRST, best known for their FIRST Robotics and Lego competitions. They educate and inspire hundreds of thousands of students in over 110 countries and have a perfect rating on Charity Navigator, which means that donated money is used to maximum effectiveness. When you get Cash App from the App Store or Google Play and use code LEX, you'll get $10, and Cash App will also donate $10 to FIRST, which again is an organization that I've personally seen inspire girls and boys to dream of engineering a better world.

And now, here's my conversation with Jim Keller.
What are the differences and similarities between the human brain and a computer, with the microprocessor at its core? Let's start with the philosophical question, perhaps.

Well, people don't actually understand how human brains work, I think that's true. So it's hard to compare them. Computers are... you know, there's really two things. I think in the human brain, everything's a mesh, a mess that's combined together. I don't know that the understanding of that is super deep.
What is a microprocessor, what is a microarchitecture, what's an instruction set architecture?

Computers are built out of transistors. On top of that, we build logic gates, right, and then functional units, like an adder, a subtractor, an instruction parsing unit, and then we assemble those into processing elements. Modern computers are built out of, you know, probably 10 to 20 locally organized, coherent processing elements, and then that runs computer programs. Right? So there's abstraction layers, and then software: there's an instruction set you run, and then there's assembly language, C, C++, Java, JavaScript. There's abstraction layers essentially from the atom to the data center. Right? So when you build a computer, first there's a target, like what's it for, how fast does it have to be, and today there's a whole bunch of metrics about what that is. And then in an organization of, you know, a thousand people who build a computer, there's lots of different disciplines that you have to operate on. Does that make sense?
So there's a bunch of levels of abstraction, in an organization as far as I can tell, and in your own vision, and there's a lot of brilliance that comes in at every one of those layers. Some of it is science, some of it is engineering, some of it is art. If you could pick favorites, what's the most important, your favorite layer in these layers of abstraction? Where does the magic enter this hierarchy?
I don't really care. That's the fun, you know; I'm somewhat agnostic to that. So I would say, for relatively long periods of time, instruction sets are stable: the x86 instruction set, the ARM instruction set.

What's an instruction set?

It says how you encode the basic operations: load, store, multiply, add, subtract, conditional branch. There aren't that many interesting instructions. Look, if you look at a program and it runs, you know, 90% of the execution is on 25 opcodes, 25 instructions, and those are stable. Right?

What does it mean, stable?

The Intel architecture has been around for 25 years. It works. And that's because the basics were defined a long time ago. Right? Now, the way an old computer ran is you fetched instructions and you executed them in order: do the load, do the add, do the compare. The way a modern computer works is you fetch large numbers of instructions, say 500, and then you find the dependency graph between the instructions, and then you execute those little micro-graphs in independent units. So a modern computer... like, people like to say computers should be simple and clean. But it turns out the market for simple, clean, slow computers is zero. Right? We don't sell any simple, clean computers. Now, how you build it can be clean, but the computer people want to buy, let's say in a phone or a data center, fetches a large number of instructions, computes the dependency graph, and then executes it in a way that gets the right answers.
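The fetch-and-find-dependencies idea above can be sketched in a few lines. This is a hedged toy model, not how real hardware schedulers are built: the instruction names, register names, and the tuple format are invented for illustration, and it ignores write-after-write and write-after-read hazards.

```python
# Toy out-of-order scheduler: group instructions into "waves" where every
# instruction in a wave has all of its source registers already produced.
instructions = [
    ("i0", "r1", []),            # r1 = load
    ("i1", "r2", []),            # r2 = load
    ("i2", "r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("i3", "r4", ["r1"]),        # r4 = r1 * 2
    ("i4", "r5", ["r3", "r4"]),  # r5 = r3 - r4
]

def schedule(insns):
    """Return waves of mutually independent instructions, in issue order."""
    done, waves, pending = set(), [], list(insns)
    while pending:
        # Everything whose sources are already computed can issue together.
        wave = [i for i in pending if all(s in done for s in i[2])]
        waves.append([name for name, _, _ in wave])
        done |= {dest for _, dest, _ in wave}
        pending = [i for i in pending if i not in wave]
    return waves

print(schedule(instructions))  # [['i0', 'i1'], ['i2', 'i3'], ['i4']]
```

Five serially written instructions collapse into three waves: the two loads issue together, then the add and the multiply, then the subtract that depends on both.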
And optimize that graph somehow?

Yeah, they run deeply out of order. And then there's semantics around how memory ordering works and how other things work. So the computer sort of has a bunch of bookkeeping tables that say what order these operations should finish in, or appear to finish in. But to go fast, you have to fetch a lot of instructions and find all the parallelism. Now, there's a second kind of computer, which we call GPUs today. And here's the difference: there's found parallelism, like you have a program with a lot of dependent instructions, you fetch a bunch, and then you figure out the dependency graph and you issue instructions out of order. That's because you have one serial narrative to execute, which in fact can be done out of order.

Did you call it a narrative?

Yeah.

Wow.

So yeah, humans think in serial narrative. So read a book, right? There's a sentence after sentence after sentence, and there's paragraphs. Now, you could diagram that. Imagine you diagrammed it properly and you said, which sentences could be read in any order without changing the meaning, right?

That's a fascinating question to ask of a book.

Yeah. You could do that, right? So some paragraphs could be reordered, some sentences can be reordered. You could say, he is tall and smart and X, right? And it doesn't matter the order of tall and smart. But if you say, the tall man is wearing a red shirt... what color is the shirt? You know, like, you can create dependencies, right? And so GPUs, on the other hand, run simple programs on pixels, but you're given a million of them. And to first order, the screen you're looking at doesn't care which order you do it in. So I call that given parallelism: simple narratives around large numbers of things where you can just say it's parallel because you told me it was.

So found parallelism, where the narrative is sequential but you discover little pockets of parallelism, versus...

Turns out, large pockets of parallelism.

Large. So how hard is it to discover?
Well, how hard is it? That's just transistor count, right? So once you crack the problem, you say, here's how you fetch 10 instructions at a time, here's how you calculate the dependencies between them, here's how you describe the dependencies... these are the pieces, right?

And once you describe the dependencies, then it's just a graph, sort of. It's an algorithm that finds... what is that? I'm sure there's a graph-theoretical answer here that's solvable. In general, in modern programs that human beings write, how much found parallelism is there in them?

About 10x.

What does 10x mean?

Well, if you execute it in order...

Versus, yeah.

You would get what's called cycles per instruction, and it would be about three cycles per instruction, because of the latency of the operations and stuff. And a modern computer executes at like 0.2, 0.25 cycles per instruction. So it's about... we today find 10x. And there's two things. One is the found parallelism in the narrative, right? And the other is the predictability of the narrative. So certain operations do a bunch of calculations, and if greater than one, do this, else do that. That decision is predicted in modern computers to high-90s percent accuracy. And branches happen a lot. So imagine you have a decision to make every six instructions, which is about the average, right? But you want to fetch 500 instructions, figure out the graph, and execute them all in parallel. That means, let's say, if you fetch 600 instructions and it's every six, you have to predict 99 out of 100 branches correctly for that window to be effective.
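The arithmetic behind that claim is easy to check: with a branch every six instructions, a 600-instruction window contains about 100 branches, and the chance the whole window was fetched down the right path is the per-branch accuracy raised to that power. A quick sketch, using only the numbers from the conversation:

```python
# Probability that an N-instruction fetch window survives with no mispredict,
# given one branch every `per` instructions and per-branch accuracy `p`.
def window_survival(n_instructions, per, p):
    branches = n_instructions // per
    return p ** branches

# 600 instructions with a branch every 6 = 100 branch predictions in a row.
print(window_survival(600, 6, 0.99))  # ~0.366: even 99% accuracy is marginal
print(window_survival(600, 6, 0.85))  # ~9e-8: last-outcome accuracy is hopeless
```

This is why large out-of-order windows only became worthwhile once predictors reached very high accuracy.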
Okay. So for parallelism, can you not parallelize branches, or can you?

You can predict.

What does predicting a branch mean? What's predicted?

So imagine you do a computation over and over. You're in a loop: while n is greater than one, do. And you go through that loop a million times. So every time you look at the branch, you say, it's probably still greater than one.

And you're saying you can do that accurately?

Very accurately, in modern computers.

My mind is blown. How the heck do you do that? Wait a minute.

Well, you want to know? This is really sad. Twenty years ago...

Yes.

You simply recorded which way the branch went last time and predicted the same thing.

Right. Okay. What's the accuracy of that?

85%. So then somebody said, hey, let's keep a couple of bits and have a little counter: when it goes one way, we count up, and when it goes the other way, we count down. So say you have a two-bit counter, you count up and you count down, and you can use the top bit as a sign bit, so you have a signed two-bit number. If it's greater than or equal to zero, you predict taken, and if it's less than zero, you predict not taken, whatever the thing is. And that got us to 92%.

Oh.

Okay, you know, it's better. Then somebody realized this branch depends on how you got there. So if you came down the code one way, you're talking about Bob and Jane, right? And then the question, does Bob like Jane, went one way. But if you're talking about Bob and Jill, then does Bob like Jane goes a different way, right? So that's called history. So you take the history and a counter.

That's cool.

But that's not how anything works today. They use something that looks a little like a neural network. So, modern: you take all the execution flows, and then you do basically deep pattern recognition of how the program is executing, and you do that multiple different ways, and you have something that chooses what the best result is. There's a little supercomputer inside the computer that calculates which way branches go, so the effective window that it's worth finding graphs in gets bigger.

Why was that going to make me sad? Because that's amazing.

It's amazingly complicated. Oh, well, here's the funny thing. To get to 85% took a thousand bits. To get to 99% takes tens of megabits. So this is one of those things where, to get from a window of, say, 50 instructions to 500, it took three or four orders of magnitude more bits.

Now, if you get the prediction of a branch wrong, what happens then?

You flush the pipe.

So it's just a performance cost.

But it gets even better. Yeah.
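The progression he describes, last-outcome, then a saturating counter, then counter-plus-history, can be sketched as toy predictors. This is a hedged illustration: the table size and the XOR indexing are my own choices in the spirit of a gshare-style scheme, not a description of any shipping design.

```python
# Toy branch predictors, in increasing sophistication.

class LastOutcome:
    """Predict whatever this branch did last time (~85% in old machines)."""
    def __init__(self):
        self.last = {}
    def predict(self, pc):
        return self.last.get(pc, True)
    def update(self, pc, taken):
        self.last[pc] = taken

class TwoBitCounter:
    """Signed two-bit saturating counter per branch (~92%)."""
    def __init__(self):
        self.ctr = {}  # value in -2..+1; >= 0 means predict taken
    def predict(self, pc):
        return self.ctr.get(pc, 0) >= 0
    def update(self, pc, taken):
        c = self.ctr.get(pc, 0)
        self.ctr[pc] = min(c + 1, 1) if taken else max(c - 1, -2)

class GShare:
    """Counters indexed by PC XOR recent global history (the 'history' idea)."""
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [0] * (1 << bits)  # saturating counters, as above
        self.history = 0                # last `bits` branch outcomes
    def predict(self, pc):
        return self.table[(pc ^ self.history) & self.mask] >= 0
    def update(self, pc, taken):
        i = (pc ^ self.history) & self.mask
        self.table[i] = min(self.table[i] + 1, 1) if taken else max(self.table[i] - 1, -2)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

On a loop branch taken a million times in a row, all three are nearly perfect; the differences show up on correlated branches like the Bob-and-Jane example, where the same branch goes different ways depending on the path that reached it.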
So we're starting to look at stuff that says: you executed down this path, and then you had two ways to go, but far, far away there's something that doesn't depend on which path you took. So you missed, you took the wrong path, you executed a bunch of stuff. Then you hit the mispredicted branch and backed it up, but you remembered all the results you'd already calculated. Some of those are just fine. Like, if you read a book and you misunderstand a paragraph, your understanding of the next paragraph sometimes is invariant to that misunderstanding. Sometimes it depends on it.

And you can kind of anticipate that invariance?

Yeah, well, you can keep track of whether the data changed. And so when you come back to a piece of code, should you calculate it again, or is it the same thing?
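That reuse idea, keep a result and recompute only when its inputs changed, is essentially memoization with invalidation. A hedged, software-level analogy; real hardware does this with tables tracking register and memory versions, not dictionaries:

```python
# Reuse a computed result unless the data it depended on has changed.
class ReuseCache:
    def __init__(self, fn):
        self.fn = fn
        self.cache = {}  # inputs -> previously computed result

    def __call__(self, *args):
        if args in self.cache:        # data unchanged: reuse the old work
            return self.cache[args]
        result = self.fn(*args)       # data changed: calculate it again
        self.cache[args] = result
        return result

calls = []
def expensive(x, y):
    calls.append((x, y))
    return x * y + 1

f = ReuseCache(expensive)
f(3, 4); f(3, 4); f(5, 6)
print(len(calls))  # 2: the repeated (3, 4) call reused the stored result
```

Python's standard library offers the same pattern as `functools.lru_cache`; the point here is only the decision rule: recompute if the inputs changed, otherwise do the same thing.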
Okay, how much of this is art and how much of it is science? Because it sounds pretty complicated.

Well, how do you describe a situation? So imagine you come to a point in the road where you have to make a decision, and you have a bunch of knowledge about which way to go. Maybe you have a map. So do you want to go the shortest way, or do you want to go the fastest way, or do you want to take the nicest road? So there's some set of data. Now imagine you're doing something complicated like building a computer, and there's hundreds of decision points, all with hundreds of possible ways to go, and the ways you pick interact in a complicated way. Right? And then you have to pick the right spot.

Right. So is that art or science? I don't know.

You avoided the question. You just described the Robert Frost poem, "The Road Not Taken."

I described the Robert Frost problem. That's what we do as computer designers. It's all poetry.

Okay. Great.

Yeah, I don't know how to describe that, because some people are very good at making those intuitive leaps with these combinations of things. Some people are less good at it, but they're really good at evaluating the alternatives. And everybody has a different way to do it. And some people can't make those leaps, but they're really good at analyzing it.
So when you see it, computers are designed by teams of people who have very different skill sets, and a good team has lots of different kinds of people. I suspect you would describe some of them as artistic.

Right. But not very many.

Unfortunately. Or fortunately.

Well, you know, computer design is hard. It's 99% perspiration, and the 1% inspiration is really important.

But you still need the 99.
Yeah, you gotta do a lot of work. And then there are interesting things to do at every level of that stack.

So at the end of the day, if you run the same program multiple times, does it always produce the same result? Is there some room for fuzziness there?

That's a math problem. If you run a correct C program, the definition is that every time you run it, you get the same answer.

Yeah, well, that's a math statement.

But that's a language definitional statement. So for years, when we first did 3D acceleration of graphics, you could run the same scene multiple times and get different answers. Right? And some people thought that was okay, and some people thought it was a bad idea. And then when the HPC world used GPUs for calculations, they thought it was a really bad idea. Okay? Now, in modern AI stuff, people are looking at networks where the precision of the data is low enough that the data is somewhat noisy. And the observation is, the input data is unbelievably noisy, so why should the calculation not be noisy? And people have experimented with algorithms that can get faster answers by being noisy. Like, as the network starts to converge, if you look at the computation graph, it starts out really wide and it gets narrower, and you can say, is that last little bit that important, or should I start the graph on the next rev before we whittle it all the way down to the answer? Right? So you can create algorithms that are noisy. Now, if you're developing something and every time you run it you get a different answer, it's really annoying. And so most people think even today, every time you run the program, you get the same answer.

No, I know. But the question is, that's the formal definition of the programming language.

There are definitions of languages that don't give the same answer, but people who use those always want a deterministic mode, because when you get a bad answer, you're wondering: is it because of something in the algorithm, or because of the noise? And so everybody wants a little switch that says, no matter what, do it deterministically. And it's really weird, because almost everything going into modern calculations is noisy, yet the answers have to be so clear.
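There is a simple software analogy for getting a deterministic answer out of nondeterministic execution: a parallel integer sum finishes its chunks in whatever order the thread scheduler picks, yet because integer addition is associative and commutative, the total is identical on every run. A hedged sketch; the worker and chunk counts are arbitrary choices:

```python
# The chunks are summed by a thread pool whose scheduling varies run to run,
# but integer addition is associative and commutative, so the final answer
# is the same every time.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=8, chunk=1000):
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))

data = list(range(100_000))
results = {parallel_sum(data) for _ in range(5)}  # five runs, varied timing
print(results)  # one unique answer: {4999950000}
```

Floating-point addition, by contrast, is not associative, which is exactly why reduction order had to be pinned down before the HPC world trusted GPUs.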
Right. So where do you stand?

I design computers for people who run programs. So if somebody says, I want a deterministic answer... like, most people want that.

Can you deliver a deterministic answer, I guess, is the question. Like, when you...

Yeah, hopefully, sure. What people don't realize is you get a deterministic answer even though the execution flow is very nondeterministic. So you run this program a hundred times, and it never runs the same way twice, ever.

And the answer...

It arrives at the same answer.

It gets the same answer every time. It's just amazing. Okay. You've achieved, in the eyes of many people, legend status as a chip architect. What design creation are you most proud of? Perhaps because it was challenging, because of its impact, or because of the set of brilliant ideas that were involved in it.

Well, I find that description odd, and I have two small children, and I promise you they think it's hilarious, this question.

Yeah.

So I do it for them. So, I'm really interested in building computers, and I've worked with really, really smart people. I'm not unbelievably smart. I'm fascinated by how they go together, both as a thing to do and as an endeavor that people do.

How people and computers go together?

Yeah, like how people think and build a computer. And I find sometimes that the best computer architects aren't that interested in people, or the best people managers aren't that good at designing computers.

So the whole stack of human beings is fascinating. The managers, the individual engineers...

Yeah. I realized after a lot of years of building computers, where you sort of build them out of transistors, logic gates, functional units, computational elements, that you could think of people the same way. So people are functional units. And then you can think of organizational design as a computer architecture problem.
And then it's like, oh, that's super cool, because the people are all different, just like the computational elements are all different, and they like to do different things. And so I had a lot of fun reframing how I think about organizations.

Just like with computers, where we were talking about execution paths, you can have a lot of different paths that end up at the same good destination. So what have you learned about the human abstractions, from individual functional human units to the broader organization? What does it take to create something special?

Well, most people don't think simple enough.
All right. So do you know the difference between a recipe and understanding? There's probably a philosophical description of this. So imagine you're going to make a loaf of bread. The recipe says: get some flour, add some water, add some yeast, mix it up, let it rise, put it in a pan, put it in the oven. It's a recipe. Right? Understanding bread: you can understand biology, supply chains, you know, grain grinders, yeast, physics, thermodynamics... there's so many levels of understanding there. And when people build and design things, they frequently are executing some stack of recipes. Right? And the problem with that is the recipes all have limited scope. Look, if you have a really good recipe book for making bread, it won't tell you anything about how to make an omelet. Right? But if you have a deep understanding of cooking, then bread, omelets, sandwiches... there's a different way of viewing everything. And most people, when you get to be an expert at something, you're hoping to achieve deeper understanding, not just a large set of recipes to go execute. And it's interesting to watch groups of people, because executing recipes is unbelievably efficient, if it's what you want to do. If it's not what you want to do, you're really stuck. And that difference is crucial. And everybody has a balance of, let's say, deeper understanding and recipes, and some people are really good at recognizing when the problem is to understand something deeply. Does that make sense?

It totally makes sense. Is deep understanding needed on the team at every stage of development?

Oh, this goes back to the art versus science question.

Sure.

If you constantly unpacked everything for deeper understanding, you'd never get anything done. Right? And if you don't unpack understanding when you need to, you'll do the wrong thing. And then at every juncture... like, human beings are these really weird things, because everything you tell them has a million possible outputs, right? And then they all interact in a hilarious way. And then having some intuition about what you tell them, what you do, when do you intervene, when do you not... it's complicated.
Right. So it's, you know, essentially computationally unsolvable.

Yeah, it's an intractable problem. Sure. Humans are a mess.

But by deep understanding, do you mean also sort of fundamental questions, things like: what is a computer? Or the why questions, like why are we even building this? Of purpose? Or do you mean more like going toward the fundamental limits of physics, really getting into the core of the science?

Well, in terms of building a computer, think a little simpler. So common practice is, you build a computer, and then when somebody says, I want to make it 10% faster, you'll go in and say, all right, I need to make this buffer bigger, and maybe I'll add an add unit. Or, you know, I have this thing that's three instructions wide, I'm going to make it four instructions wide.
And what you see is each piece gets incrementally more complicated. Right? And then at some point, you hit this limit, like adding another feature or buffer doesn't seem to make it any faster. And then people will say, well, that's because it's a fundamental limit. And then somebody else will look at it and say, well, actually, the way you divided the problem up and the way the different features are interacting is limiting you, and it has to be rethought, rewritten. Right? So then you refactor and rewrite it, and what people commonly find is the rewrite is not only faster, but half as complicated.

From scratch?

Yes.

So how often in your career, or just as you've seen it as needed, maybe more generally, do you have to throw the whole thing out?

This is where I'm on one end of it: every three to five years.

Which end are you on?

Rewrite more often.

And three to five years is...

If you want to really make a lot of progress in computer architecture, every five years you should do one from scratch.
So where does the x86-64 standard come in? How often do you...

I was the co-author of that spec in '98. That's 20 years ago.

Yeah. So that's still around.

The instruction set itself has been extended quite a few times.

Yes.

And instruction sets are less interesting than the implementation underneath. On the x86 architecture, Intel's designed a few and AMD's designed a few very different microarchitectures. And I don't want to go into too much of the detail about how often, but there's a tendency to rewrite it every 10 years, and it really should be every five.

So you're saying you're an outlier in that sense, in the...

Rewrite more often.
Well, and here's...

Isn't that scary?

Yeah, of course. Well, scary to who?

To everybody involved. Because, like you said, repeating the recipe is efficient. Companies want to make money... no, individual engineers want to succeed, so you want to incrementally improve, increase the buffer from three to four.

Well, this is where you get into diminishing return curves. I think Steve Jobs said this, right? So you have a project, and you start here, and it goes up, and you have diminishing returns. And to get to the next level, you have to do a new one, and the initial starting point will be lower than the old optimization point, but it'll get higher. So now you have two kinds of fear: short-term disaster and long-term disaster.

And you're...

So, grown-ups, right? Like, people with a quarter-by-quarter business objective are terrified about changing everything, and people who are trying to run a business or build a computer for a long-term objective know that the short-term limitations block them from long-term success. So if you look at leaders of companies that had really good long-term success, every time they saw that they had to redo something, they did.

And so somebody has to speak up?

Or you do multiple projects in parallel. Like, you optimize the old one while you build a new one.

But the marketing guys are always like, promise me the new computer is faster on every single thing.

And the computer architect says, well, the new computer will be faster on average, but there's a distribution of results and performance, and you'll have some outliers that are slower. And that's very hard, because there's always one customer who cares about that one.
So speaking of the long term, for over 50 years now, Moore's Law has served, for me and millions of others, as an inspiring beacon of what kind of amazing future brilliant engineers can build.

I'm just making your kids laugh all day today.

That's great. So first, in your eyes, what is Moore's Law, if you could define it for people who don't know?

Well, the simple statement, from Gordon Moore, was: double the number of transistors every two years, something like that. And then my operational model is: we increase the performance of computers by 2x every two or three years. And it's wiggled around substantially over time. And also, how we deliver performance has changed. But the foundational idea was 2x the transistors every two years. The current cadence is something like... they call it a shrink factor, like 0.6 every two years, which is not 0.5.

But that's referring strictly, again, to the original definition of...

Yeah, of transistor count.

A shrink factor, just making them smaller and smaller and smaller?

Well, for a constant chip area, if you shrink the transistors by 0.6, then you get one over 0.6 more transistors.
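The arithmetic can be made concrete: if 0.6 is the shrink per two-year generation, each generation fits 1/0.6 ≈ 1.67x as many transistors in the same area, versus 2x for a true Moore's-Law halving, and the gap compounds. A quick sketch of that compounding, using only the 0.6 and 0.5 factors from the conversation:

```python
# Transistor-count multiplier for a constant chip area after `gens`
# two-year generations, given an area shrink factor per generation.
def density_gain(shrink, gens):
    return (1 / shrink) ** gens

# Five generations = ten years.
print(round(density_gain(0.5, 5), 1))  # 32.0: the classic 2x every 2 years
print(round(density_gain(0.6, 5), 1))  # 12.9: the current ~0.6 cadence
```

Over a decade, the 0.6 cadence delivers roughly 13x density rather than 32x: still exponential, just a slower exponent.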
So can you linger on that a little longer? What do you think should be the broader definition of Moore's Law, given how you mentioned thinking about performance? Just broadly, what's a good way to think about Moore's Law?
Well, first of all, I've been aware of Moore's Law for 30 years... well, I've been designing computers for 40.

You're just watching it before your eyes, kind of thing.

And somewhere around when I became aware of it, I was also informed that Moore's Law was going to die in 10 to 15 years. And I thought that was true at first, but then after 10 years, it was going to die in 10 to 15 years. And then at one point, it was going to die in five years, and then it went back up to 10 years. And at some point, I decided not to worry about that particular prognostication for the rest of my life, which is fun. And then I joined Intel, and everybody said Moore's Law is dead, and I thought, that's sad, because it's the Moore's Law company. And it's not dead. And it's always been going to die. And humans like these apocalyptic kinds of statements: we'll run out of food, or we'll run out of air, or run out of room, or run out of something.

Right. But it's still incredible that it's lived for as long as it has. And yes, there are many people who believe now that Moore's Law is dead.

You know, they can join the last 50 years of people who had the same idea.

Yeah, there's a long tradition.
But why do you think, if you can try to understand it, why do you think it's not dead?
link |
Well, first, let's just think, people think Moore's Law is one thing.
link |
Transistors get smaller.
link |
But actually under the sheets, there's literally thousands of innovations.
link |
And almost all those innovations have their own diminishing return curves.
link |
So if you graph it, it looks like a cascade of diminishing return curves.
link |
I don't know what to call that.
link |
But the result is an exponential curve.
link |
But at least it has been.
link |
And we keep inventing new things.
link |
So if you're an expert in one of the things on a diminishing return curve,
link |
right, and you can see its plateau, you will probably tell people,
link |
well, this is done.
link |
Meanwhile, some other pile of people are doing something different.
link |
So that's just normal.
link |
So then there's the observation of how small could a switching device be?
link |
So a modern transistor is something like 1,000 by 1,000 by 1,000 atoms, right?
link |
And you get quantum effects down around 2 to 10 atoms.
link |
So you can imagine a transistor as small as 10 by 10 by 10.
link |
So that's a million times smaller.
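That size comparison, spelled out with the round numbers from the conversation:

```python
# A modern transistor: roughly 1,000 atoms on a side.
# Quantum effects show up around 2-10 atoms, so imagine a
# 10 x 10 x 10 atom device as a rough lower bound.
modern_atoms = 1_000 ** 3   # ~10**9 atoms
limit_atoms = 10 ** 3       # ~10**3 atoms
print(modern_atoms // limit_atoms)  # a million times smaller by volume
```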
link |
And then the quantum computational people are working away at how to use quantum effects.
link |
So 1,000 by 1,000 by 1,000.
link |
That's a really clean way of putting it.
link |
Well, a fin, like a modern transistor, if you look at the fin, it's like 120 atoms wide.
link |
But we can make that thinner.
link |
And then there's a gate wrapped around it and then there's spacing.
link |
There's a whole bunch of geometry.
link |
And a competent transistor designer could count the atoms in every single direction.
link |
Like there's techniques now to already put down atoms in a single atomic layer.
link |
And you can place atoms if you want to.
link |
It's just, from a manufacturing process, if placing an atom takes 10 minutes,
link |
and you need to put 10 to the 23rd atoms together to make a computer, it would take a long time.
link |
So the methods are both shrinking things.
link |
And then coming up with effective ways to control what's happening.
link |
Manufacture stably and cheaply.
link |
So the innovation stock's pretty broad.
link |
There's equipment.
link |
There's chemistry.
link |
There's material science.
link |
There's metallurgy. There's lots of ideas about when you put different materials together,
link |
how do they interact?
link |
Is it stable over temperature?
link |
Like are they repeatable?
link |
There's literally thousands of technologies involved.
link |
But just for the shrinking, you don't think we're quite yet close to the fundamental limits of physics.
link |
I did a talk on Moore's Law and I asked for a roadmap to a path of about 100.
link |
And after two weeks, they said, we only got to 50.
link |
We only got to 50.
link |
And they said, why don't you give us another two weeks?
link |
Well, here's the thing about Moore's Law.
link |
So I believe that the next 10 or 20 years of shrinking is going to happen.
link |
Now, as a computer designer, you have two stances.
link |
You think it's going to shrink, in which case you're designing and thinking about architecture
link |
in a way that you'll use more transistors, or conversely, not be swamped by the complexity
link |
of all the transistors you get.
link |
You have to have a strategy.
link |
So you're open to the possibility and waiting for the possibility of a whole new army of
link |
transistors ready to work.
link |
I'm expecting more transistors every two or three years by a number large enough
link |
that how you think about design, how you think about architecture has to change.
link |
Like, imagine you build buildings out of bricks, and every year the bricks are half the size,
link |
or every two years.
link |
Well, if you kept building bricks the same way, so many bricks per person per day,
link |
the amount of time to build a building would go up exponentially.
link |
But if you said, I know that's coming, so now I'm going to design equipment.
link |
I move bricks faster, use them better, because maybe you're getting something out of the smaller
link |
bricks, more strength, thinner walls, you know, less material, efficiency out of that.
link |
So once you have a roadmap with what's going to happen, transistors, they're going to get,
link |
we're going to get more of them, then you design all this collateral around it to take advantage
link |
of it, and also to cope with it.
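A minimal sketch of the brick analogy (all the numbers are invented for illustration): with a fixed bricks-per-day productivity, halving the brick's side each generation multiplies the brick count, and hence the build time, by 8.

```python
def bricks_needed(building_volume: float, brick_side: float) -> float:
    """Number of cubic bricks of side `brick_side` to fill the building."""
    return building_volume / brick_side ** 3

def build_days(building_volume: float, brick_side: float, bricks_per_day: float) -> float:
    """Build time if the crew lays a fixed number of bricks per day."""
    return bricks_needed(building_volume, brick_side) / bricks_per_day

# Same building, same crew, bricks halving in size each generation:
# the build time grows 8x per generation unless the methods change.
for generation in range(4):
    side = 1.0 / 2 ** generation
    print(generation, build_days(1000.0, side, bricks_per_day=100.0))
```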
link |
Like, that's the thing people don't understand.
link |
It's like, if I didn't believe in Moore's Law, and then Moore's Law transistors showed up,
link |
my design teams would have all drowned.
link |
So what's the hardest part of this influx of new transistors?
link |
I mean, even if you just look historically throughout your career,
link |
what's the thing, what fundamentally changes when you add more transistors
link |
in the task of designing an architecture?
link |
Well, there's two constants, right?
link |
One is people don't get smarter.
link |
By the way, there's some science showing that we do get smarter because of nutrition, whatever.
link |
Sorry to bring that up.
link |
Yeah, familiar with it.
link |
Nobody understands it.
link |
Nobody knows if it's still going on.
link |
Or whether it's real or not.
link |
But yeah, I sort of...
link |
Anyway, but not exponentially.
link |
I would believe for the most part, people aren't getting much smarter.
link |
The evidence doesn't support it.
link |
And then teams can't grow that much.
link |
So human beings, we're really good in teams of 10, up to teams of 100.
link |
They can know each other.
link |
Beyond that, you have to have organizational boundaries.
link |
Those are pretty hard constraints.
link |
So then you have to divide and conquer.
link |
Like as the designs get bigger, you have to divide it into pieces.
link |
The power of abstraction layers is really high.
link |
We used to build computers out of transistors.
link |
Now we have a team that turns transistors into logic cells,
link |
and our team that turns them into functional units.
link |
Another one that turns them into computers.
link |
So we have abstraction layers in there.
link |
And you have to think about when do you shift gears on that.
link |
We also use faster computers to build faster computers.
link |
So some algorithms run twice as fast on new computers,
link |
but a lot of algorithms are n squared.
link |
So on a computer with twice as many transistors,
link |
it might take four times as long to run.
link |
So you have to refactor the software.
link |
Like simply using faster computers
link |
to build bigger computers doesn't work.
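The n-squared point in miniature (a hypothetical all-pairs pass, just to show the scaling):

```python
def all_pairs_ops(n: int) -> int:
    """Operation count for an O(n**2) algorithm, e.g. checking
    every element of a design against every other element."""
    ops = 0
    for i in range(n):
        for j in range(n):
            ops += 1
    return ops

# Twice the transistors to process -> four times the work,
# so a 2x faster computer still loses ground without refactoring.
print(all_pairs_ops(2_000) // all_pairs_ops(1_000))
```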
link |
So you have to think about all these things.
link |
So in terms of computing performance
link |
and the exciting possibilities that more powerful computers bring,
link |
is shrinking, the thing we've just been talking about,
link |
for you one of the biggest exciting possibilities
link |
of advancement in performance?
link |
Or are there other directions that you're interested in?
link |
Like in the direction of sort of enforcing given parallelism,
link |
or doing massive parallelism in terms of many, many CPUs,
link |
stacking CPUs on top of each other, that kind of parallelism,
link |
or any kind of parallelism?
link |
Well, think about it in a different way.
link |
So old computers, slow computers,
link |
you said A equals B plus C times D.
link |
Pretty simple, right?
link |
And then we made faster computers with vector units,
link |
and you can do proper equations and matrices, right?
link |
And then modern like AI computations,
link |
or like convolutional neural networks,
link |
where you convolve one large data set against another.
link |
And so there's sort of this hierarchy of mathematics,
link |
you know, from simple equation to linear equations
link |
to matrix equations to deeper kind of computation.
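That hierarchy, sketched in plain Python with toy sizes (no libraries, purely illustrative):

```python
# The hierarchy of computation described above, in miniature:
# scalar -> vector -> matrix -> convolution.

# Scalar: a = b + c * d
a = 2 + 3 * 4

# Vector: the same operation elementwise over whole arrays at once
v = [b + c * d for b, c, d in zip([1, 2], [3, 4], [5, 6])]

# Matrix: every row combined against every column
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Convolution: slide one data set (a kernel) across another
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

print(a)
print(v)
print(matmul([[1, 2]], [[3], [4]]))
print(conv1d([1, 2, 3, 4], [1, 0, -1]))
```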
link |
And the data sets are getting so big
link |
that people are thinking of data as a topology problem.
link |
You know, data is organized in some immense shape.
link |
And then the computation, which sort of wants to be
link |
get data from immense shape and do some computation on it.
link |
So what computers have allowed people to do
link |
is have algorithms go much, much further.
link |
So that paper you referenced, the Sutton paper,
link |
they talked about, you know, like when AI started,
link |
it was apply rule sets to something.
link |
That's a very simple computational situation.
link |
And then when they did the first chess thing,
link |
they solved it with deep searches.
link |
So have a huge database of moves and results, deep search,
link |
but it's still just a search, right?
link |
Now we take large numbers of images,
link |
and we use it to train these weight sets
link |
that we convolve across.
link |
It's a completely different kind of phenomena.
link |
And now they're doing the next generation.
link |
And if you look at it,
link |
they're going up this mathematical graph, right?
link |
And then computations, both computation and data sets,
link |
support going up that graph.
link |
Yeah, the kind of computation though might,
link |
I mean, I would argue that all of it is still a search, right?
link |
Just like you said, a topology problem of data sets,
link |
you're searching the data sets for valuable data
link |
and also the actual optimization of neural networks
link |
is a kind of search for the...
link |
If you had looked at the inner layers of finding a cat,
link |
it's not a search.
link |
It's a set of endless projections.
link |
So a projection, here's a shadow of this phone, right?
link |
And then you can have a shadow of that on something,
link |
and a shadow of that on something else.
link |
And if you look in the layers,
link |
you'll see this layer actually describes pointy ears
link |
and round eyedness and fuzziness and...
link |
But the computation to tease out the attributes is not search.
link |
Like the inference part might be search,
link |
but the training's not search.
link |
Okay, well, technically...
link |
And then in deep networks, they look at layers
link |
and they don't even know it's represented.
link |
And yet, if you take the layers out, it doesn't work.
link |
So I don't think it's search.
link |
All right, well...
link |
But you'll have to talk to my mathematician
link |
about what that actually is.
link |
Well, we could disagree, but it's just semantics, I think.
link |
But it's certainly not...
link |
I would say it's absolutely not semantics, but...
link |
All right, well, if you want to go there.
link |
So optimization, to me, is search.
link |
And we're trying to optimize the ability
link |
of a neural network to detect cat ears.
link |
And the difference between chess and the space,
link |
the incredibly multidimensional,
link |
100,000 dimensional space that, you know,
link |
networks are trying to optimize over,
link |
is nothing like the chess board database.
link |
So it's a totally different kind of thing.
link |
And, okay, in that sense, you can say...
link |
It loses the meaning.
link |
I can see how you might say.
link |
The funny thing is, it's the difference between
link |
given search space and found search space.
link |
Yeah, maybe that's the different way to describe it.
link |
That's a beautiful way to put it, okay.
link |
But you're saying, what's your sense in terms of the basic
link |
mathematical operations and the architectures,
link |
the computer hardware, that enable those operations?
link |
Do you see the CPUs of today still being a really core part
link |
of executing those mathematical operations?
link |
Well, the operations, you know,
link |
continue to be add, subtract, load, store,
link |
compare, and branch.
link |
So it's interesting that the building blocks
link |
of, you know, computers or transistors,
link |
and, you know, under that atoms.
link |
So you've got atoms, transistors, logic gates, computers,
link |
right, you know, functional units and computers.
link |
The building blocks of mathematics at some level
link |
are things like adds and subtracts and multiplies.
link |
But the space mathematics can describe is,
link |
I think, essentially infinite.
link |
But the computers that run the algorithms
link |
are still doing the same things.
link |
Now, a given algorithm may say, I need sparse data,
link |
or I need 32 bit data, or I need, you know,
link |
like a convolution operation that naturally takes
link |
8 bit data, multiplies it and sums it up a certain way.
link |
So the, like the data types in TensorFlow imply
link |
an optimization set.
link |
But when you go right down and look at the computers,
link |
it's AND and OR gates still, and adds and multiplies.
link |
Like, that hasn't changed much.
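For instance, the convolution operation mentioned above reduces, at the bottom, to a multiply-accumulate over narrow data. Here's a sketch (the 8-bit widths are from the conversation; keeping the running sum in a wide accumulator is my assumption about how such a unit avoids overflow):

```python
def mac8(activations: list[int], weights: list[int]) -> int:
    """8-bit multiply-accumulate: the inner loop of a convolution unit.
    Inputs are narrow (uint8 data, int8 weights); the running sum is
    kept wide so it never overflows."""
    assert all(0 <= x < 256 for x in activations)   # uint8 range
    assert all(-128 <= w < 128 for w in weights)    # int8 range
    acc = 0  # wide accumulator
    for x, w in zip(activations, weights):
        acc += x * w
    return acc

print(mac8([255, 255], [127, 127]))  # 64770: already wider than 16 bits
```

This is why a data type in a framework implies an optimization set: the hardware can pack many narrow multipliers where one wide one would fit.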
link |
Now, the quantum researchers think they're going
link |
to change that radically.
link |
And then there's people who think about analog computing,
link |
because you look in the brain, and it seems to be more analogish.
link |
You know, that may be just a way to do that more efficiently.
link |
But we have a million X on computation.
link |
And I don't know the...
link |
The relationship between computational, let's say,
link |
intensity and ability to hit mathematical abstractions.
link |
I don't know anybody's described that, but just like you saw in AI,
link |
you went from rule sets, the simple search, the complex search,
link |
to, say, found search.
link |
Like, those are, you know, orders of magnitude,
link |
more computation to do.
link |
And as we get to the next two orders of magnitude,
link |
like a friend, Raja Koduri, said,
link |
like every order of magnitude changes the computation.
link |
Fundamentally changes what the computation is doing.
link |
Fundamentally changes what the computation is doing.
link |
Oh, you know, the expression of the difference in quantity
link |
is a difference in kind.
link |
You know, the difference between ant and anthill, right?
link |
Or neuron and brain.
link |
You know, there's an indefinable place where the quantity changed
link |
the quality, right?
link |
And we've seen that happen in mathematics multiple times.
link |
And, you know, my guess is it's going to keep happening.
link |
So your sense is, yeah, if you focus head down
link |
and shrinking the transistor.
link |
Well, it's not just head down.
link |
We're aware of the software stacks
link |
that are running in the computational loads.
link |
And we're kind of pondering, what do you do
link |
with a petabyte of memory that wants to be accessed
link |
in a sparse way and have, you know,
link |
the kind of calculations AI programmers want?
link |
So there's a dialogue and interaction.
link |
But when you go in the computer chip,
link |
you know, you find adders, subtractors, and multipliers.
link |
And so if you zoom out, then with, as you mentioned,
link |
the idea that most of the development
link |
in the last many decades in AI research
link |
came from just leveraging computation
link |
and just simple algorithms waiting
link |
for the computation to improve.
link |
Well, software guys have a thing that they call
link |
the problem of early optimization.
link |
So you write a big software stack
link |
and if you start optimizing like the first thing you write,
link |
the odds of that being the performance limit is low.
link |
But when you get the whole thing working,
link |
can you make it 2x faster by optimizing the right things?
link |
Sure. While you're optimizing that,
link |
could you have written a new software stack,
link |
which would have been a better choice?
link |
Maybe. Now you have creative tension.
link |
But the whole time as you're doing the writing,
link |
the, that's the software we're talking about,
link |
the hardware underneath gets faster and faster.
link |
Well, it goes back to the Moore's Law.
link |
If Moore's Law is going to continue,
link |
then your AI research should expect that to show up.
link |
And then you make a slightly different set of choices than
link |
we've hit the wall, nothing's going to happen.
link |
And from here, it's just us rewriting algorithms.
link |
Like that seems like a failed strategy
link |
for the last 30 years of Moore's Law's death.
link |
So can you just linger on it?
link |
I think you've answered it,
link |
but I'll just ask the same dumb question over and over.
link |
So why do you think Moore's Law is not going to die?
link |
Which is the most promising, exciting possibility
link |
of why it won't die in the next 5, 10 years?
link |
So is it the continued shrinking of the transistor,
link |
or is it another S curve that steps in,
link |
and a totally sort of.
link |
Well, shrinking the transistor is literally thousands of innovations.
link |
So there's a whole bunch of S curves just kind of running
link |
their course and being reinvented and new things.
link |
The semiconductor fabricators and technologists
link |
have all announced what's called nanowires.
link |
So they took a fin, which had a gate around it
link |
and turned that into a little wire.
link |
So you have better control of that, and they're smaller.
link |
And then from there, there are some obvious steps
link |
about how to shrink that.
link |
So the metallurgy around wire stacks and stuff
link |
has very obvious abilities to shrink.
link |
And there's a whole combination of things there to do.
link |
Your sense is that we're going to get a lot
link |
of this innovation from just that shrinking.
link |
Yeah, like a factor of a hundred's a lot.
link |
Yeah, I would say that's incredible.
link |
And it's totally unknown.
link |
It's only 10 or 15 years.
link |
Now, you're smarter, you might know,
link |
but to me, it's totally unpredictable what that 100x
link |
would bring in terms of the nature of the computation.
link |
Like, people may be familiar with Bell's Law.
link |
So for a long time, it was mainframes,
link |
minis, workstations, PCs, mobile.
link |
Moore's law drove faster, smaller computers.
link |
And then when we were thinking about Moore's law,
link |
Raja Koduri said every 10x generates a new computation.
link |
So scalar, vector, matrix, topological computation.
link |
And if you go look at the industry trends,
link |
there was mainframes and many computers and PCs.
link |
And then the internet took off and then we got mobile devices.
link |
And now we're building 5G wireless with one millisecond latency.
link |
And people are starting to think about the smart world
link |
where everything knows you, recognizes you.
link |
Like the transformations are gonna be unpredictable.
link |
How does it make you feel that you're one of the key architects
link |
of this kind of future?
link |
So you're not, we're not talking about the architects
link |
of the high level, the people who build the Angry Birds apps and Snapchat.
link |
Who knows, maybe that's the whole point of the universe.
link |
I'm gonna take a stand on that.
link |
And the attention distracting nature of mobile phones.
link |
I'll take a stand.
link |
But anyway, in terms of the side effects of smartphones
link |
or the attention distraction, which part?
link |
Well, who knows, you know, where this is all leading.
link |
It's changing so fast.
link |
Wait, so back to the...
link |
My parents used to yell at my sisters for hiding in the closet
link |
with a wired phone with a dial on it.
link |
Stop talking to your friends all day.
link |
Now, my wife yells at my kids for talking to their friends.
link |
It looks the same to me.
link |
It's always, it echoes at the same time.
link |
Okay, but you are one of the key people architecting
link |
the hardware of this future.
link |
How does that make you feel?
link |
Do you feel responsible?
link |
Do you feel excited?
link |
So we're in a social context.
link |
So there's billions of people on this planet.
link |
There are literally millions of people working on technology.
link |
I feel lucky to be, you know, doing what I do
link |
and getting paid for it.
link |
And there's an interest in it.
link |
But there's so many things going on in parallel.
link |
It's like the interactions are so unpredictable.
link |
If I wasn't here, somebody else would do it.
link |
The vectors of all these different things
link |
are happening all the time.
link |
You know, there's a, I'm sure some philosopher,
link |
a metaphilosopher is, you know, wondering about how we transform our world.
link |
So you can't deny the fact that these tools
link |
are changing our world.
link |
So do you think it's changing for the better?
link |
Somebody, I read this thing recently.
link |
It said the two disciplines with the highest GRE scores in college
link |
are physics and philosophy, right?
link |
And they're both sort of trying to answer the question,
link |
why is there anything, right?
link |
And the philosophers, you know, are on the kind of theological side
link |
and the physicists are obviously on the, you know, the material side.
link |
And there's a hundred billion galaxies with a hundred billion stars.
link |
It seems, well, repetitive at best.
link |
So, you know, there's on our way to 10 billion people.
link |
I mean, it's hard to say what it's all for, if that's what you're asking.
link |
Yeah, I guess, I guess I am.
link |
I mean, things do tend to significantly increase in complexity.
link |
And I'm curious about how computation, like our world, our physical world,
link |
inherently generates mathematics.
link |
It's kind of obvious, right?
link |
So we have XYZ coordinates, you take a sphere, you make it bigger,
link |
you get a surface that, you know, grows by R squared.
link |
Like, it generally generates mathematics and the mathematicians
link |
and the physicists have been having a lot of fun talking to each other for years.
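The sphere example, concretely: surface area is 4πr², so scaling the radius by k scales the surface by k².

```python
import math

def sphere_surface(r: float) -> float:
    """Surface area of a sphere of radius r: 4 * pi * r**2."""
    return 4.0 * math.pi * r ** 2

# Doubling the radius quadruples the surface: growth goes as r**2.
print(sphere_surface(2.0) / sphere_surface(1.0))
```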
link |
And computation has been, let's say, relatively pedestrian.
link |
Like, computation in terms of mathematics has been doing binary algebra,
link |
while those guys have been galavanting through the other realms of possibility, right?
link |
Now, recently, the computation lets you do mathematical computations that are sophisticated
link |
enough that nobody understands how the answers came out, right?
link |
Right, it used to be, you get a data set, you guess at a function.
link |
The function is considered physics if it's predictive of new functions,
link |
In modern times, you can take a large data set with no intuition about what it is
link |
and use machine learning to find a pattern that has no function, right?
link |
And it can arrive at results that I don't know if they're completely
link |
mathematically describable.
link |
So computation has kind of done something interesting compared to A equal B plus C.
link |
There's something reminiscent of that step from the basic operations of addition
link |
to taking a step towards neural networks that's reminiscent of what life on earth
link |
at its origins was doing.
link |
Do you think we're creating sort of the next step in our evolution
link |
in creating artificial intelligence systems that will?
link |
I mean, there's so much in the universe already, it's hard to say.
link |
Where we stand in this whole thing.
link |
Are human beings working on additional abstraction layers and possibilities?
link |
Yeah, it appears so.
link |
Does that mean that human beings don't need dogs?
link |
Like there's so many things that are all simultaneously interesting and useful.
link |
What you've seen throughout your career, you've seen greater and greater levels of
link |
abstraction built into artificial machines, right?
link |
When you look at humans, you think of all life on earth as a single organism building
link |
this thing, this machine with greater and greater levels of abstraction.
link |
Do you think humans are the peak, the top of the food chain in this long arc of history
link |
Or do you think we're just somewhere in the middle?
link |
Are we the basic functional operations of a CPU?
link |
Are we the C++ program, the Python program, or the neural network?
link |
People have calculated like how many operations does the brain do.
link |
I've seen the number 10 to the 18th a bunch of times, arrived at in different ways.
link |
So could you make a computer that did 10 to the 20th operations?
link |
We're going to do that.
link |
Now, is there something magical about how brains compute things?
link |
My personal experience is interesting because you think you know how you think and then you
link |
have all these ideas and you can't figure out how they happened.
link |
And if you meditate, what you can be aware of is interesting.
link |
So I don't know if brains are magical or not.
link |
The physical evidence says no.
link |
Lots of people's personal experience says yes.
link |
So what would be funny is if brains are magical and yet we can make brains with more computation.
link |
You know, I don't know what to say about that, but.
link |
Do you think magic is an emergent phenomenon?
link |
I have no explanation for it.
link |
Let me ask, Jim Keller, what in your view is consciousness?
link |
With consciousness?
link |
Yeah, like what consciousness, love, things that are these deeply human things that seems
link |
to emerge from our brain?
link |
Is that something that we'll be able to make in code in chips that get faster and faster
link |
and faster and faster?
link |
That's like a 10 hour conversation.
link |
Nobody really knows.
link |
Can you summarize it in a couple of sentences?
link |
Many people have observed that organisms run at lots of different levels, right?
link |
If you had two neurons, somebody said you'd have one sensory neuron and one motor neuron,
link |
So we move towards things and away from things and we have physical integrity and safety or not,
link |
And then if you look at the animal kingdom, you can see brains that are a little more complicated.
link |
And at some point there's a planning system and then there's an emotional system that's,
link |
you know, happy about being safe or unhappy about being threatened, right?
link |
And then our brains have massive numbers of structures, you know, like planning and movement
link |
and thinking and feeling and drives and emotions.
link |
And we seem to have multiple layers of thinking systems.
link |
And we have a brain, a dream system that nobody understands whatsoever, which I find completely
link |
hilarious and you can think in a way that those systems are more independent and you can observe,
link |
you know, the different parts of yourself can observe them.
link |
I don't know which one's magical.
link |
I don't know which one's not computational.
link |
So is it possible that it's all computation?
link |
Is there a limit to computation?
link |
Do you think the universe is a computer?
link |
It's a weird kind of computer, because if it was a computer, right, like when they do calculations
link |
on how much calculation it takes to describe quantum effects, it's unbelievably high.
link |
So if it was a computer, you'd think it would have been built out of something that was easier to compute,
link |
right? That's, that's a funny, it's a funny system.
link |
But then the simulation guys have pointed out that the rules are kind of interesting.
link |
Like when you look really close, it's uncertain and the speed of light says you can only look
link |
so far and things can't be simultaneous except for the odd entanglement problem where they seem
link |
to be like the rules are all kind of weird.
link |
And somebody said physics is like having 50 equations with 50 variables to define 50 variables.
link |
Like, you know, it's, you know, like physics itself has been a shit show for thousands of years.
link |
It seems odd when you get to the corners of everything, you know, it's either
link |
uncomputable or undefinable or uncertain.
link |
It's almost like the designers of the simulation are trying to prevent us from understanding it
link |
But, but also the things that require calculation require so much calculation that our idea of
link |
the universe as a computer is absurd, because every single little bit of it takes all the
link |
computation in the universe to figure out.
link |
That's a weird kind of computer.
link |
You know, you say the simulation is running in the computer, which has, by definition,
link |
infinite computation.
link |
Oh, you mean if the universe is infinite?
link |
Yeah, well, every little piece of our universe seems to take infinite computation to figure out.
link |
Well, a lot, some pretty big number.
link |
Computing a little teeny spot takes all the mass in the local one light year by one light year.
link |
It's close enough to infinite.
link |
Oh, it's a heck of a computer if it is one.
link |
I know it's, it's, it's a weird, it's a weird description because the simulation description
link |
seems to break when you look closely at it.
link |
But the rules of the universe seem to imply something's up.
link |
That seems a little arbitrary.
link |
The whole, the universe, the whole thing, the laws of physics.
link |
It just seems like, like, how did it come out to be the way it is?
link |
Well, lots of people talk about that.
link |
It's, you know, it's, like I said, the two smartest groups of humans are working on the problem.
link |
From different aspects.
link |
From different aspects, and they're both complete failures.
link |
So that's, that's kind of cool.
link |
They might succeed eventually.
link |
Well, after 2000 years, the trend isn't good.
link |
Oh, 2000 years is nothing in the span of the history of the universe.
link |
We have some time.
link |
But the next 1000 years doesn't look good either.
link |
So that's what everybody says at every stage.
link |
But with Moore's law, as you've just described, not being dead, the exponential
link |
growth of technology, the future seems pretty incredible.
link |
Well, it'll be interesting.
link |
So what are your thoughts on Ray Kurzweil's sense that exponential improvement and
link |
technology will continue indefinitely?
link |
That, is that how you see Moore's law?
link |
Do you see Moore's law more broadly in the sense that technology of all kinds has a way
link |
of stacking S curves on top of each other, where it'll be exponential, and then we'll
link |
What does an exponential of a million mean?
link |
That's a pretty amazing number.
link |
And that's just for a local little piece of silicon.
link |
Now, let's imagine you, say, decided to get 1000 tons of silicon to collaborate in one
link |
computer at a million times the density.
link |
Now you're talking, I don't know, 10 to the 20th more computation power than our
link |
current already unbelievably fast computers.
link |
Nobody knows what that's going to mean.
link |
The sci-fi guys call it computronium.
link |
Like when a local civilization turns the nearby star into a computer.
link |
I don't know if that's true.
link |
So just even when you shrink a transistor, that's only one dimension.
link |
The ripple effects of that.
link |
People tend to think about computers as a cost problem, right?
link |
So computers are made out of silicon and minor amounts of metals.
link |
And you know, this and that, none of those things cost any money.
link |
Like there's plenty of sand.
link |
Like you could just turn the beach and a little bit of ocean water into computers.
link |
So all the cost is in the equipment to do it.
link |
And the trend on equipment is once you figure out how to build the equipment,
link |
the trend of cost is zero.
link |
Elon said, first you figure out what configuration you want the atoms in
link |
and then how to put them there.
link |
Because, well, when you hear that, you know, his, his great insight is people are how constrained.
link |
I have this thing.
link |
I know how it works.
link |
And then little tweaks to that will generate something as opposed to what
link |
do I actually want and then figure out how to build it.
link |
It's a very different mindset and almost nobody has it, obviously.
link |
Well, let me ask on that topic.
link |
You were one of the key early people in the development of autopilot,
link |
at least in the hardware side.
link |
Elon Musk believes that autopilot and vehicle autonomy, if you just look at that problem,
link |
can follow this kind of exponential improvement in terms of the,
link |
the how question that we're talking about.
link |
There's no reason why you can't.
link |
What are your thoughts on this particular space of vehicle autonomy?
link |
And you're a part of it and Elon Musk's and Tesla's vision for the computer you need to build
link |
was straightforward.
link |
And you could argue, well, does it need to be two times faster, or five times, or 10 times?
link |
But that's just a matter of time or price in the short run.
link |
So that's, that's not a big deal.
link |
You don't have to be especially smart to drive a car.
link |
So it's not like a super hard problem.
link |
I mean, the big problem of safety is attention, which computers are really good at, not skills.
link |
Well, let me push back on one.
link |
You say everything you said is correct, but we as humans tend to,
link |
tend to take for granted how, how incredible our vision system is.
link |
So you can drive a car with 20/50 vision, and you can train a neural network to extract the
link |
distance of any object and the shape of any surface from video data.
link |
But that's really simple.
link |
No, it's not simple.
link |
That's a simple data problem.
link |
It's not, it's not simple.
link |
It's because you, because it's not just detecting objects.
link |
It's understanding the scene and it's being able to do it in a way that doesn't make errors.
link |
So the, the beautiful thing about the human vision system and our entire brain around the
link |
whole thing is we're able to fill in the gaps.
link |
It's not just about perfectly detecting cars.
link |
It's inferring the occluded cars.
link |
It's trying to, it's, it's understanding the statistics.
link |
I think that's mostly a data problem.
link |
So you think it's a data problem, with improvement of computation?
link |
Well, there's a, you know, when you're driving a car and somebody cuts you off,
link |
your brain has theories about why they did it.
link |
You know, they're a bad person, they're distracted, they're dumb.
link |
You know, you can listen to yourself, right?
link |
So, you know, if you think that narrative is important to be able to successfully drive
link |
a car, then current autopilot systems can't do it.
link |
But if cars are ballistic things with tracks and probabilistic changes of speed and direction
link |
and roads are fixed and given, by the way, they don't change dynamically, right?
link |
Right, you can map the world really thoroughly.
link |
You can place every object really thoroughly, right?
link |
You can calculate trajectories of things really thoroughly, right?
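The "cars as ballistic things with probabilistic changes of speed and direction" view can be made concrete with a toy predictor. This is a sketch added for illustration only; the function name and the acceleration-noise figure are my assumptions, not anything from the conversation:

```python
def predict_trajectory(x, y, vx, vy, dt=0.1, steps=20, accel_noise=1.0):
    """Propagate a car forward as a near-ballistic object:
    constant velocity, plus a positional uncertainty that grows with
    the horizon to stand in for probabilistic changes of speed and
    direction (assumed ~1 m/s^2 of acceleration noise)."""
    track = []
    for i in range(1, steps + 1):
        t = i * dt
        px, py = x + vx * t, y + vy * t       # constant-velocity prediction
        sigma = 0.5 * accel_noise * t * t     # 1-sigma position spread, meters
        track.append((px, py, sigma))
    return track

# a car doing 15 m/s along the road, predicted 2 seconds out
path = predict_trajectory(0.0, 0.0, 15.0, 0.0)
print(path[-1])  # roughly 30 m ahead, give or take a couple of meters
```

The quadratic growth of `sigma` is the whole point of the sketch: short-horizon prediction is nearly trivial, and the difficulty is concentrated in how fast the uncertainty balloons with the horizon.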
link |
But everything you said about really thoroughly has a different degree of difficulty.
link |
And you could say at some point, computer autonomous systems will be way better at
link |
things that humans are lousy at.
link |
Like, they'll be better at attention.
link |
They'll always remember there was a pothole on the road that humans keep forgetting about.
link |
They'll remember that this set of roads has these weirdo lines on it that the computers
link |
figured out once and especially if they get updates so that somebody changes a given.
link |
Like, the key to robots and stuff somebody said is to maximize the givens.
link |
So having a robot pick up this bottle cap is way easier if you put a red dot on the top, because
link |
otherwise you'd have to figure out, you know, and if you want to do a certain thing with it,
link |
you know, maximize the givens is the thing.
link |
And autonomous systems are happily maximizing the givens.
link |
Like humans, when you drive someplace new, you remember it because you're processing it the
link |
whole time and after the 50th time you drove to work, you get to work, you don't know how you got there.
link |
You're on autopilot, right?
link |
Autonomous cars are always on autopilot, but the cars have no theories about why they got cut off
link |
or why they're in traffic.
link |
So they never stop paying attention.
link |
So I tend to believe you do have to have theories of mind, mental models of other people,
link |
especially with pedestrian cyclists, but also with other cars.
link |
So everything you said is like, is actually essential to driving.
link |
Driving is a lot more complicated than people realize.
link |
I think so sort of to push back slightly, but to cut into traffic, right?
link |
You can't just wait for a gap.
link |
You have to be somewhat aggressive.
link |
You'll be surprised how simple the calculation for that is.
link |
Maybe on that particular point, but there's a, maybe I do have to push back.
link |
I would be surprised.
link |
Yeah, I'll just say where I stand.
link |
I would be very surprised, but I think it's, you might be surprised how complicated it is.
link |
I'd say, I tell people: progress disappoints in the short run and surprises in the long run.
link |
It's very possible.
link |
Yeah, I suspect in 10 years it'll be just like taken for granted.
link |
But you're probably right.
link |
It's going to be a $50 solution that nobody cares about.
link |
It's like GPS is like, wow, GPS.
link |
We have satellites in space that tell you where your location is.
link |
It was a really big deal.
link |
Now everything has a GPS in it.
link |
Yeah, that's true.
link |
But I do think that systems that involve human behavior are more complicated than we give them
link |
credit for. So we can do incredible things with technology that don't involve humans.
link |
I think humans are less complicated than people, you know, frequently assume.
link |
We tend to operate out of large numbers of patterns and just keep doing it over and over.
link |
But I can't trust you because you're a human.
link |
That's something, something a human would say.
link |
But my hope is on the point you've made is even if, no matter who's right,
link |
there, I'm hoping that there's a lot of things that humans aren't good at that machines are
link |
definitely good at.
link |
Like you said, attention and things like that.
link |
Well, they'll be so much better that the overall picture of safety and autonomy will be,
link |
like, obviously cars will be safer, even if they're not as good.
link |
No, I'm a big believer in safety.
link |
I mean, there are already the current safety systems like cruise control that doesn't let
link |
you run into people and lane keeping.
link |
There are so many features that you just look at the Pareto of accidents and knocking off
link |
like 80% of them is super doable.
link |
Just to linger on the autopilot team and the efforts there, the...
link |
It seems to be that there is a very intense scrutiny by the media and the public in terms
link |
of safety, the pressure, the bar put before autonomous vehicles.
link |
What are your, sort of, as a person there working on the hardware and trying to build
link |
a system that builds a safe vehicle and so on, what was your sense about that pressure?
link |
Is it expected of new technology?
link |
Yeah, it seems reasonable.
link |
I talked to both American and European regulators and I was worried that the regulations would
link |
write into the rules, technology solutions like modern brake systems imply hydraulic brakes.
link |
So if you read the regulations to meet the letter of the law for brakes, it sort of has
link |
to be hydraulic, right?
link |
And the regulator said they're interested in the use cases like a head on crash, an offset
link |
crash, don't hit pedestrians, don't run into people, don't leave the road, don't run a red
link |
light or a stoplight. They were very much into the scenarios and they had all the data about
link |
which scenarios injured or killed the most people and for the most part those conversations were
link |
like what's the right thing to do to take the next step.
link |
Now Elon's very interested in also in the benefits of autonomous driving or freeing
link |
people's time and attention as well as safety. And I think that's also an interesting thing but
link |
building autonomous systems so they're safe and safer than people seemed right.
link |
Since the goal is to be 10x safer than people, having the bar to be safer than people and
link |
scrutinizing accidents seems philosophically correct.
link |
So I think that's a good thing.
link |
What are, it's different than the things that you worked on at AMD and
link |
Apple. With autopilot chip design and hardware design, what are interesting or challenging
link |
aspects of building this specialized kind of computing system in the automotive space?
link |
I mean there's two tricks to building like an automotive computer. One is the software team,
link |
the machine learning team is developing algorithms that are changing fast. So as you're building the
link |
accelerator you have this you know worry or intuition that the algorithms will change enough
link |
that the accelerator will be the wrong one, right? And there's the generic thing which is if you
link |
build a really good general purpose computer say its performance is one and then GPU guys will
link |
deliver about 5x the performance for the same amount of silicon because instead of discovering
link |
parallelism you're given parallelism. And then special accelerators get another 2 to 5x on top
link |
of a GPU because you say I know the math is always 8 bit integers into 32 bit accumulators
link |
and the operations are a subset of the mathematical possibilities. So, you know, AI accelerators
link |
have a claimed performance benefit over GPUs because in the narrow math space
link |
you're nailing the algorithm. Now you still try to make it programmable but the AI field is changing
link |
really fast. So there's a you know there's a little creative tension there of I want the
link |
acceleration afforded by specialization without being over specialized so that the new algorithm is
link |
so much more effective that you would have been better off on a GPU. So there's a tension there
link |
to build a good computer for an application like automotive. There's all kinds of sensor inputs
link |
and safety processors and a bunch of stuff. So one of Elon's goals was to make it super affordable.
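The narrow-math bet he describes, always 8-bit integers accumulating into 32 bits, is easy to sketch. A toy illustration of that constraint, not any real chip's datapath:

```python
def int8_dot(a, b):
    """Dot product the way a fixed-function AI accelerator commits to it:
    8-bit signed inputs, accumulated in a 32-bit register. A toy sketch
    of the specialization, not Tesla's or anyone's actual design."""
    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
    acc = 0  # the 32-bit accumulator
    for x, w in zip(a, b):
        assert -128 <= x <= 127 and -128 <= w <= 127, "inputs must fit in int8"
        acc += x * w
        # real hardware would wrap or saturate; saturate here
        acc = max(INT32_MIN, min(INT32_MAX, acc))
    return acc

# activations and weights quantized to int8
print(int8_dot([127, -5, 3], [2, 10, -1]))  # 254 - 50 - 3 = 201
```

Committing to one numeric format like this is exactly where the 2x to 5x over a GPU comes from, and also exactly the over-specialization risk: if the next algorithm wants a different format, the fixed datapath is the wrong one.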
link |
So every car gets an autopilot computer. So some of the recent startups you look at and
link |
they have a server in the trunk because they're saying, I'm going to build this autopilot computer
link |
that replaces the driver. So their cost budget is $10,000 or $20,000, and Elon's constraint was, I'm going
link |
to put one in every car whether people buy autonomous driving or not. So the cost
link |
constraint he had in mind was great. Right. And to hit that you had to think about the system design
link |
that's complicated. It's fun. You know, it's like, it's craftsman's work, like a violin maker,
link |
right. You can say a Stradivarius is this incredible thing. The musicians are incredible, but the guy
link |
making the violin, you know, picked wood and sanded it, and then he cut it, you know, and he glued it,
link |
you know, and he waited for the right day so that when he put the finish on it, it didn't, you know,
link |
do something dumb. That's craftsman's work, right. You may be a genius craftsman because you have
link |
the best techniques and you discover a new one, but most engineering is craftsman's work, and humans
link |
really like to do that. Smart humans? No, everybody. All humans. I don't know. I dug ditches
link |
when I was in college. I got really good at it. Satisfying? Digging ditches is also craftsman's
link |
work. Yeah, of course. So there's an expression called complex mastery behavior. So when you're
link |
learning something, that's fun, because you're learning something. When you do something that's
link |
rote and simple, it's not that satisfying, but if the steps that you have to do are complicated
link |
and you're good at them it's satisfying to do them. And then if you're intrigued by it all as
link |
you're doing them, you sometimes learn new things with which you can raise your game, but craftsman's work
link |
is good. And engineers like engineering is complicated enough that you have to learn a lot
link |
of skills and then a lot of what you do is then craftsman's work which is fun.
link |
Autonomous driving, building a very resource constrained computer, a computer that has to be cheap enough
link |
to put in every single car? That essentially boils down to craftsman's work. It's engineering.
link |
You know there's thoughtful decisions and problems to solve and tradeoffs to make. You
link |
need 10 camera inputs or eight. You know, you're building for the current car or the next one.
link |
You know how do you do the safety stuff. You know there's a whole bunch of details
link |
but it's fun. But it's not like I'm building a new type of neural network which has a new
link |
mathematics and a new computer to work. You know, that's like, there's more invention in that.
link |
But the reduction to practice: once you pick the architecture, you look inside, and what do you see?
link |
Adders and multipliers and memories and you know the basics. So computers is always this weird set
link |
of abstraction layers of ideas and thinking that reduction to practice is transistors and wires and
link |
you know, pretty basic stuff, and that's an interesting phenomenon. By the way, like factory work,
link |
like lots of people think factory work is rote assembly stuff. I've been on the assembly line.
link |
Like, the people who work there really like it. It's a really great job. It's really complicated; putting
link |
cars together is hard right and the cars moving and the parts are moving and sometimes the parts are
link |
damaged and you have to coordinate putting all the stuff together and people are good at it.
link |
They're good at it. And I remember one day I went to work and the line was shut down for some reason
link |
and some of the guys sitting around were really bummed because they had reorganized a bunch of
link |
stuff and they were going to hit a new record for the number of cars built that day and they
link |
were all gung ho to do it, and these are big tough buggers, and you know, but what they did was complicated
link |
and you couldn't do it. Yeah and I mean well after a while you could but you'd have to work your way
link |
up because, you know, like putting the brights, what's called the brights, the trim, on a car
link |
on a moving assembly line, where it has to be attached in 25 places in a minute and a half,
link |
is unbelievably complicated, and human beings can do it. They're really good at it. I think that's
link |
harder than driving a car, by the way. Putting cars together, working in a factory? Two smart people
link |
can disagree. Yeah. I think driving a car... We'll get you in the factory someday and then we'll see
link |
how you do. No, no, for us humans driving a car is easy. I'm saying building a machine
link |
that drives a car is not easy. Okay. Okay. Driving a car is easy for humans because
link |
we've been evolving for billions of years. To drive cars? Yeah, I noticed, the Paleolithic cars are
link |
super cool. No. Now you've joined the rest of the internet in mocking me. Okay. I wasn't mocking you. I was
link |
just intrigued by your you know your anthropology. Yeah. I have to go dig into that. There's some
link |
inaccuracies there. Yes. Okay. But in general what have you learned in terms of thinking about
link |
passion, craftsmanship, tension, chaos. Jesus. The whole mess of it. What have you learned
link |
and have taken away from your time working with Elon Musk working at Tesla which is known to be
link |
a place of chaos, innovation, craftsmanship, and all those things. I really liked the way he thought.
link |
Like you think you have an understanding about what first principles of something is and then
link |
you talk to Elon about it and you didn't scratch the surface. You know he has a deep
link |
belief that no matter what you do, it's a local maximum. Right. I had a friend, he invented a
link |
better electric motor and it was like a lot better than what we were using and one day he came by
link |
he said you know I'm a little disappointed because you know this is really great and you didn't seem
link |
that impressed and I said you know when the super intelligent aliens come are they gonna be looking
link |
for you. Like where is he? The guy built the motor. Yeah. Probably not you know like but doing
link |
interesting work that's both innovative and let's say craftsman's work on the current thing is really
link |
satisfying and it's good and that's cool and then Elon was good at taking everything apart
link |
and like, what's the deep first principle? What's really, you know, really going on? That
link |
ability to look at it without assumptions and without how constraints
link |
is super wild. You know, he built a rocket ship and an electric car and, you know, everything,
link |
and that's super fun, and he's into it too. Like when they first landed two SpaceX rockets, at Tesla
link |
we had a video projector in the big room and like 500 people came down and when they landed
link |
everybody cheered and some people cried. It was so cool. All right, but how did you do that? Well,
link |
it was super hard. And then people say, well, it's chaotic. Really? To get out of all your
link |
assumptions, you think that's not going to be unbelievably painful? And it was. Was Elon tough?
link |
Yeah, probably. The people look back on it and say, boy, I'm really happy I had that experience to go
link |
take apart that many layers of assumptions sometimes super fun sometimes painful.
link |
So it could be emotionally and intellectually painful that whole process is just stripping away
link |
assumptions. Yeah imagine 99% of your thought process is protecting your self conception
link |
and 98% of that's wrong. Yeah. Now you got the math right. How do you think you're feeling when
link |
you get back into that one bit that's useful and now you're open and you have the ability to do
link |
something different. I don't know if I got the math right it might be 99.9 but it ain't 50.
link |
Imagining it at 50% is hard enough. Yeah. Now, for a long time I've suspected you could get better,
link |
like you can think better you can think more clearly you can take things apart
link |
and there's lots of examples of that people who do that.
link |
So and Elon is an example of that apparently you are an example so I don't know if I am
link |
I'm fun to talk to certainly I've learned a lot of stuff right well here's the other thing is
link |
like I joke like like I read books and people think oh you read books well no I've read a
link |
couple of books a week for 55 years, well, maybe 50, because I didn't learn to read until I was
link |
eight or something and it turns out when people write books they often take 20 years of their life
link |
where they passionately did something reduce it to 200 pages that's kind of fun and then
link |
you go online and you can find out who wrote the best books and who like you know that's kind of
link |
wild so there's this wild selection process and then you can read it and for the most part understand
link |
it and then you can go apply it like I went to one company I thought I haven't managed much before
link |
so I read 20 management books and I started talking to them. Basically, compared to all the
link |
VPs running around, I'd read 19 more management books than anybody else. It
link |
wasn't even that hard. Yeah. And half the stuff worked, like, first time. It wasn't even rocket science.
link |
but at the core of that is questioning the assumptions or sort of entering the thinking
link |
first principles thinking, sort of looking at the reality of the situation, and using
link |
that knowledge applying that knowledge so yeah so I would say my brain has this idea that you can
link |
question first assumptions and but I can go days at a time and forget that and you have to kind of
link |
like circle back to that observation, because it is, because it's hard. Well, it's hard to just keep it
link |
front and center because you know you're you operate on so many levels all the time and
link |
you know getting this done takes priority or you know being happy takes priority or you know
link |
you know screwing around takes priority like like like how you go through life is complicated
link |
and then you remember oh yeah I could really uh think first principles you know shit that's
link |
that's tiring you know but you do for a while and that's kind of cool
link |
so just as a last question in your sense from the big picture from the first principles
link |
do you think you kind of answered already but do you think autonomous driving is something
link |
we can solve on a timeline of years so one two three five ten years as opposed to a century
link |
yeah definitely just to linger on it a little longer where's the confidence coming from is it
link |
the fundamentals of the problem the fundamentals of building the hardware and the software
link |
Well, as a computational problem, understanding ballistics, roads, topography, it seems pretty
link |
solvable I mean and you can see this you know like like speech recognition for a long time
link |
people were doing, you know, frequency domain analysis and all kinds of stuff, and that
link |
didn't work at all, right? And then they did deep learning about it and it worked great,
link |
and it took multiple iterations and you know autonomous driving is way past the frequency
link |
analysis point you know use radar don't run into things and the data gathering is going up and the
link |
computation is going up and the algorithm understanding is going up and there's a whole
link |
bunch of problems getting solved like that the data side is really powerful but I disagree with
link |
both you and Elon, and I'll tell you once again, as I did before, that when you add human beings
link |
into the picture, it's no longer a ballistics problem, it's something more complicated. But I
link |
could be very well proven wrong and cars are highly damped in terms of rate of change like the
link |
steering and the steering system is really slow compared to a computer the acceleration the acceleration
link |
is really slow yeah on a certain time scale on a ballistics time scale but human behavior I don't
link |
know yet. I, I should say, the human beings are really slow too. We weirdly, we operate, you know, half a
link |
second behind reality. Nobody really understands that one either. It's pretty funny. Yeah, yeah. So,
link |
yeah, I would, I very well could be surprised. And I think with the rate of improvement in all
link |
aspects on both the compute and the the software and the hardware there's going to be pleasant
link |
surprises all over the place yeah speaking of unpleasant surprises many people have worries
link |
about a singularity in the development of AI. Forgive me for such questions. Yeah. When AI improves
link |
exponentially and reaches a point of superhuman level general intelligence, you know, beyond the
link |
point there's no looking back do you share this worry of existential threats from artificial
link |
intelligence from computers becoming superhuman level intelligent no not really you know like we
link |
already have a very stratified society and then if you look at the whole animal kingdom of capabilities
link |
and abilities and interests and you know smart people have their niche and you know normal people
link |
have their niche, and craftsmen have their niche, and, you know, animals have their niche. I
link |
suspect that the domains of interest for things that are, you know, astronomically different, like the
link |
whole, something got 10 times smarter than us and wanted to track us all down because, what,
link |
we like to have coffee at Starbucks? Like, it doesn't seem plausible. Now, is there an existential problem
link |
that how do you live in a world where there's something way smarter than you and you based
link |
your kind of self esteem on being the smartest local person? Well, there's, what, 0.1% of the population
link |
who thinks that because the rest of the population has been dealing with it since they were born
link |
So the, the breadth of possible experience that can be interesting is really big,
link |
and you know superintelligence seems likely although we still don't know if we're magical
link |
but I suspect we're not and it seems likely that will create possibilities that are interesting
link |
for us, and its interests will be interesting for, for whatever it is. It's not obvious why
link |
its interests would somehow want to fight over some square foot of dirt or you know whatever
link |
you know the usual fears are about so you don't think you'll inherit some of the darker aspects
link |
of human nature depends on how you think reality is constructed so for for whatever reason human
link |
beings are in let's say creative tension and opposition with both our good and bad forces
link |
like there's lots of philosophical understanding of that right I don't know why that would be
link |
different so you think the evil is is necessary for the good I mean the the tension I don't know
link |
about evil but like we live in a competitive world where your good is somebody else's
link |
you know evil you know there's there's the malignant part of it but that seems to be
link |
self limiting although occasionally it's it's super horrible
link |
but yes there's a debate over ideas and some people have different beliefs and that that
link |
debate itself is a process of arriving at something. Yeah. And why wouldn't that continue?
link |
yeah just you but you don't think that whole process will leave humans behind in a way that's
link |
painful, emotionally painful? Yes, for the, for the 0.1%. But, you know, why isn't
link |
it already painful for a large percentage of the population? And it is. I mean, society does have a
link |
lot of stress in it about the one percent and the about to this and about to that but you know
link |
everybody has a lot of stress in a life about what they find satisfying and
link |
and you know know yourself seems to be the proper dictum and pursue something that makes
link |
your life meaningful seems proper and there's so many avenues on that like there's so much
link |
unexplored space at every single level. You know, I'm, I'm somewhat of, my nephew called me a jaded
link |
optimist. And, you know, so... There's a beautiful tension in that label. But if you were to
link |
look back at your life and could relive a moment a set of moments because there were the happiest
link |
times of your life outside of family what would that be I don't want to relive any moments I like
link |
that I like that situation where you have some amount of optimism and then the the anxiety of the
link |
unknown. So you love the unknown, the, the mystery of it? I don't know about the mystery. It sure gets
link |
your blood pumping. What do you think is the meaning of this whole thing, of life on this
link |
pale blue dot it seems to be what it does like the universe for whatever reason makes atoms
link |
which makes us which we do stuff and we figure out things and we explore things and that's
link |
just what it is. It's not just...? Yeah, it is. You know, Jim, I don't think there's a better place to end
link |
it. It's a huge honor. Well, that was super fun. Thank you so much for talking today. All right,
link |
great. Thanks for listening to this conversation, and thank you to our presenting sponsor, Cash App.
link |
Download it, use code LEXPODCAST, and you'll get $10, and $10 will go to FIRST, a STEM education
link |
nonprofit that inspires hundreds of thousands of young minds to become future leaders and
link |
innovators. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple Podcasts,
link |
follow on Spotify, support it on Patreon, or simply connect with me on Twitter. And now, let me leave
link |
you with some words of wisdom from Gordon Moore: if everything you try works, you aren't trying hard
link |
enough. Thank you for listening, and hope to see you next time.