
Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70



link |
00:00:00.000
The following is a conversation with Jim Keller,
link |
00:00:03.020
legendary microprocessor engineer
link |
00:00:05.560
who has worked at AMD, Apple, Tesla, and now Intel.
link |
00:00:10.160
He's known for his work on AMD K7, K8, K12,
link |
00:00:13.520
and Zen microarchitectures, Apple A4 and A5 processors,
link |
00:00:18.120
and coauthor of the specification
link |
00:00:20.080
for the x86-64 instruction set
link |
00:00:23.040
and HyperTransport interconnect.
link |
00:00:26.120
He's a brilliant first principles engineer
link |
00:00:28.440
and out of the box thinker,
link |
00:00:30.040
and just an interesting and fun human being to talk to.
link |
00:00:33.480
This is the Artificial Intelligence Podcast.
link |
00:00:36.480
If you enjoy it, subscribe on YouTube,
link |
00:00:38.860
give it five stars on Apple Podcast,
link |
00:00:40.840
follow on Spotify, support it on Patreon,
link |
00:00:43.500
or simply connect with me on Twitter,
link |
00:00:45.600
at Lex Fridman, spelled F R I D M A N.
link |
00:00:49.560
I recently started doing ads
link |
00:00:51.040
at the end of the introduction.
link |
00:00:52.600
I'll do one or two minutes after introducing the episode
link |
00:00:55.560
and never any ads in the middle
link |
00:00:57.100
that can break the flow of the conversation.
link |
00:00:59.400
I hope that works for you
link |
00:01:00.780
and doesn't hurt the listening experience.
link |
00:01:04.040
This show is presented by Cash App,
link |
00:01:06.160
the number one finance app in the App Store.
link |
00:01:08.640
I personally use Cash App to send money to friends,
link |
00:01:11.440
but you can also use it to buy, sell,
link |
00:01:13.200
and deposit Bitcoin in just seconds.
link |
00:01:15.600
Cash App also has a new investing feature.
link |
00:01:18.480
You can buy fractions of a stock, say $1 worth,
link |
00:01:21.440
no matter what the stock price is.
link |
00:01:23.540
Broker services are provided by Cash App Investing,
link |
00:01:26.480
a subsidiary of Square and member SIPC.
link |
00:01:29.740
I'm excited to be working with Cash App
link |
00:01:32.040
to support one of my favorite organizations called FIRST,
link |
00:01:35.440
best known for their FIRST Robotics and Lego competitions.
link |
00:01:38.960
They educate and inspire hundreds of thousands of students
link |
00:01:42.240
in over 110 countries and have a perfect rating
link |
00:01:45.360
at Charity Navigator,
link |
00:01:46.720
which means that donated money
link |
00:01:48.000
is used to maximum effectiveness.
link |
00:01:50.760
When you get Cash App from the App Store or Google Play
link |
00:01:53.480
and use code LEXPODCAST,
link |
00:01:56.280
you'll get $10 and Cash App will also donate $10 to FIRST,
link |
00:02:00.280
which again is an organization
link |
00:02:02.120
that I've personally seen inspire girls and boys
link |
00:02:04.920
to dream of engineering a better world.
link |
00:02:08.060
And now here's my conversation with Jim Keller.
link |
00:02:12.560
What are the differences and similarities
link |
00:02:14.520
between the human brain and a computer
link |
00:02:17.200
with the microprocessor at its core?
link |
00:02:19.260
Let's start with the philosophical question perhaps.
link |
00:02:22.260
Well, since people don't actually understand
link |
00:02:25.400
how human brains work, I think that's true.
link |
00:02:29.200
I think that's true.
link |
00:02:30.560
So it's hard to compare them.
link |
00:02:32.600
Computers are, you know, there's really two things.
link |
00:02:37.260
There's memory and there's computation, right?
link |
00:02:40.480
And to date, almost all computer architectures
link |
00:02:43.920
are global memory, which is a thing, right?
link |
00:02:47.600
And then computation where you pull data
link |
00:02:49.360
and you do relatively simple operations on it
link |
00:02:52.440
and write data back.
link |
00:02:53.900
So it's decoupled in modern computers.
link |
00:02:57.760
And you think in the human brain,
link |
00:02:59.840
everything's a mesh, a mess that's combined together?
link |
00:03:02.600
What people observe is there's, you know,
link |
00:03:04.840
some number of layers of neurons
link |
00:03:06.500
which have local and global connections
link |
00:03:09.120
and information is stored in some distributed fashion
link |
00:03:13.700
and people build things called neural networks in computers
link |
00:03:18.280
where the information is distributed
link |
00:03:21.200
in some kind of fashion.
link |
00:03:22.840
You know, there's a mathematics behind it.
link |
00:03:25.520
I don't know that the understanding of that is super deep.
link |
00:03:29.220
The computations we run on those
link |
00:03:31.160
are straightforward computations.
link |
00:03:33.440
I don't believe anybody has said
link |
00:03:35.520
a neuron does this computation.
link |
00:03:37.880
So to date, it's hard to compare them, I would say.
link |
00:03:44.120
So let's get into the basics before we zoom back out.
link |
00:03:48.800
How do you build a computer from scratch?
link |
00:03:51.020
What is a microprocessor?
link |
00:03:52.760
What is a microarchitecture?
link |
00:03:54.120
What's an instruction set architecture?
link |
00:03:56.640
Maybe even as far back as what is a transistor?
link |
00:04:01.040
So the special charm of computer engineering
link |
00:04:05.040
is there's a relatively good understanding
link |
00:04:08.400
of abstraction layers.
link |
00:04:10.480
So down at the bottom, you have atoms
link |
00:04:12.280
and atoms get put together in materials like silicon
link |
00:04:15.480
or doped silicon or metal, and we build transistors.
link |
00:04:19.440
On top of that, we build logic gates, right?
link |
00:04:23.680
And then functional units, like an adder or a subtractor
link |
00:04:27.360
or an instruction parsing unit.
link |
00:04:28.800
And then we assemble those into processing elements.
link |
00:04:32.320
Modern computers are built out of probably 10 to 20
link |
00:04:37.240
locally organized processing elements
link |
00:04:40.960
or coherent processing elements.
link |
00:04:42.640
And then that runs computer programs, right?
link |
00:04:46.640
So there's abstraction layers and then software,
link |
00:04:49.800
there's an instruction set you run
link |
00:04:51.760
and then there's assembly language, C, C++, Java, JavaScript.
link |
00:04:56.440
There's abstraction layers,
link |
00:04:58.680
essentially from the atom to the data center, right?
link |
00:05:02.520
So when you build a computer,
link |
00:05:06.760
first there's a target, like what's it for?
link |
00:05:08.560
Like how fast does it have to be?
link |
00:05:09.960
Which today there's a whole bunch of metrics
link |
00:05:12.200
about what that is.
link |
00:05:13.840
And then in an organization of 1,000 people
link |
00:05:17.040
who build a computer, there's lots of different disciplines
link |
00:05:22.240
that you have to operate on.
link |
00:05:24.120
Does that make sense?
link |
00:05:25.480
And so...
link |
00:05:27.120
So there's a bunch of levels of abstraction
link |
00:05:30.780
in an organization like Intel and in your own vision,
link |
00:05:35.720
there's a lot of brilliance that comes in
link |
00:05:37.600
at every one of those layers.
link |
00:05:39.700
Some of it is science, some of it is engineering,
link |
00:05:41.680
some of it is art, what's the most,
link |
00:05:45.440
if you could pick favorites,
link |
00:05:46.380
what's the most important, your favorite layer
link |
00:05:49.440
on these layers of abstractions?
link |
00:05:51.100
Where does the magic enter this hierarchy?
link |
00:05:55.360
I don't really care.
link |
00:05:57.120
That's the fun, you know, I'm somewhat agnostic to that.
link |
00:06:00.740
So I would say for relatively long periods of time,
link |
00:06:05.520
instruction sets are stable.
link |
00:06:08.040
So the x86 instruction set, the ARM instruction set.
link |
00:06:12.000
What's an instruction set?
link |
00:06:13.360
So it says, how do you encode the basic operations?
link |
00:06:16.120
Load, store, multiply, add, subtract, conditional, branch.
link |
00:06:20.140
You know, there aren't that many interesting instructions.
link |
00:06:23.800
Look, if you look at a program and it runs,
link |
00:06:26.160
you know, 90% of the execution is on 25 opcodes,
link |
00:06:29.840
you know, 25 instructions.
link |
00:06:31.680
And those are stable, right?
link |
00:06:33.900
What does it mean, stable?
link |
00:06:35.460
Intel architecture's been around for 25 years.
link |
00:06:38.120
It works.
link |
00:06:38.960
It works.
link |
00:06:39.800
And that's because the basics, you know,
link |
00:06:42.520
are defined a long time ago, right?
link |
00:06:45.280
Now, the way an old computer ran is you fetched
link |
00:06:49.480
instructions and you executed them in order.
link |
00:06:52.960
Do the load, do the add, do the compare.
link |
00:06:57.140
The way a modern computer works is you fetch
link |
00:06:59.760
large numbers of instructions, say 500.
link |
00:07:03.240
And then you find the dependency graph
link |
00:07:06.240
between the instructions.
link |
00:07:07.920
And then you execute in independent units
link |
00:07:12.300
those little micrographs.
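The fetch-then-find-the-graph idea can be sketched in software. This is a toy model only, nothing like real renaming and scheduling hardware: treat each instruction as a destination register plus its source registers, and group instructions into issue levels, where everything in a level depends only on results from earlier levels and so could execute together.

```python
# Toy model of "found parallelism": take a short instruction
# trace, derive dependencies from register reads/writes, and
# group instructions into levels that could issue in parallel.
# (Ignores register renaming, memory dependencies, etc.)

def schedule(instrs):
    # instrs: list of (dest, [sources]) tuples, in program order.
    # An instruction's level is 1 + the deepest level it reads from.
    level_of = {}   # last level each register was written at
    levels = []
    for i, (dest, srcs) in enumerate(instrs):
        lvl = 1 + max((level_of[s] for s in srcs if s in level_of),
                      default=0)
        level_of[dest] = lvl
        while len(levels) < lvl:
            levels.append([])
        levels[lvl - 1].append(i)
    return levels

# r1 = load; r2 = load; r3 = r1+r2; r4 = load; r5 = r3+r4
trace = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]),
         ("r4", []), ("r5", ["r3", "r4"])]
print(schedule(trace))  # → [[0, 1, 3], [2], [4]]
```

Five sequential instructions collapse into three parallel steps: the three loads are independent even though one of them appears later in the serial narrative.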
link |
00:07:15.280
So a modern computer, like people like to say,
link |
00:07:17.760
computers should be simple and clean.
link |
00:07:20.720
But it turns out the market for simple,
link |
00:07:22.400
clean, slow computers is zero, right?
link |
00:07:26.240
We don't sell any simple, clean computers.
link |
00:07:29.560
No, you can, how you build it can be clean,
link |
00:07:33.560
but the computer people want to buy,
link |
00:07:36.680
that's, say, in a phone or a data center,
link |
00:07:40.440
fetches a large number of instructions,
link |
00:07:42.680
computes the dependency graph,
link |
00:07:45.600
and then executes it in a way that gets the right answers.
link |
00:07:49.160
And optimizes that graph somehow.
link |
00:07:50.880
Yeah, they run deeply out of order.
link |
00:07:53.520
And then there's semantics around how memory ordering works
link |
00:07:57.580
and other things work.
link |
00:07:58.420
So the computer sort of has a bunch of bookkeeping tables
link |
00:08:01.960
that says what order should these operations finish in
link |
00:08:05.520
or appear to finish in?
link |
00:08:07.800
But to go fast, you have to fetch a lot of instructions
link |
00:08:10.720
and find all the parallelism.
link |
00:08:12.720
Now, there's a second kind of computer,
link |
00:08:15.480
which we call GPUs today.
link |
00:08:17.560
And I call it the difference.
link |
00:08:19.640
There's found parallelism, like you have a program
link |
00:08:21.880
with a lot of dependent instructions.
link |
00:08:24.120
You fetch a bunch and then you go figure out
link |
00:08:26.120
the dependency graph and you issue instructions out of order.
link |
00:08:29.400
That's because you have one serial narrative to execute,
link |
00:08:32.960
which, in fact, can be done out of order.
link |
00:08:35.840
Did you call it a narrative?
link |
00:08:37.080
Yeah.
link |
00:08:37.920
Oh, wow.
link |
00:08:38.760
Yeah, so humans think of serial narrative.
link |
00:08:40.700
So read a book, right?
link |
00:08:42.960
There's a sentence after sentence after sentence,
link |
00:08:45.760
and there's paragraphs.
link |
00:08:46.840
Now, you could diagram that.
link |
00:08:49.360
Imagine you diagrammed it properly and you said,
link |
00:08:52.680
which sentences could be read in any order,
link |
00:08:55.640
any order without changing the meaning, right?
link |
00:08:59.960
That's a fascinating question to ask of a book, yeah.
link |
00:09:02.520
Yeah, you could do that, right?
link |
00:09:04.400
So some paragraphs could be reordered,
link |
00:09:06.280
some sentences can be reordered.
link |
00:09:08.400
You could say, he is tall and smart and X, right?
link |
00:09:15.640
And it doesn't matter the order of tall and smart.
link |
00:09:19.840
But if you say the tall man is wearing a red shirt,
link |
00:09:22.920
what colors, you can create dependencies, right?
link |
00:09:28.440
And so GPUs, on the other hand,
link |
00:09:32.000
run simple programs on pixels,
link |
00:09:35.320
but you're given a million of them.
link |
00:09:36.880
And the first order, the screen you're looking at
link |
00:09:40.160
doesn't care which order you do it in.
link |
00:09:42.200
So I call that given parallelism.
link |
00:09:44.480
Simple narratives around the large numbers of things
link |
00:09:48.280
where you can just say,
link |
00:09:49.400
it's parallel because you told me it was.
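The given-parallelism case can be shown with a toy per-pixel function (a stand-in for a real shader): because the pixels are declared independent, any execution order produces the same image, so no dependency analysis is needed.

```python
# "Given parallelism": a simple program applied to many
# independent pixels. The order of evaluation doesn't matter,
# by construction.
def shade(p):
    return min(255, p * 2)  # toy brightness doubling

pixels = list(range(8))
forward = [shade(p) for p in pixels]
backward = [shade(p) for p in reversed(pixels)][::-1]
assert forward == backward  # any order gives the same screen
```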
link |
00:09:52.320
So found parallelism where the narrative is sequential,
link |
00:09:57.680
but you discover like little pockets of parallelism versus.
link |
00:10:01.800
Turns out large pockets of parallelism.
link |
00:10:03.980
Large, so how hard is it to discover?
link |
00:10:05.880
Well, how hard is it?
link |
00:10:06.960
That's just transistor count, right?
link |
00:10:08.800
So once you crack the problem, you say,
link |
00:10:11.160
here's how you fetch 10 instructions at a time.
link |
00:10:13.440
Here's how you calculate the dependencies between them.
link |
00:10:16.360
Here's how you describe the dependencies.
link |
00:10:18.480
Here's, you know, these are pieces, right?
link |
00:10:20.660
So once you describe the dependencies,
link |
00:10:25.580
then it's just a graph.
link |
00:10:27.580
Sort of, it's an algorithm that finds,
link |
00:10:31.140
what is that?
link |
00:10:31.960
I'm sure there's a graph theoretical answer here
link |
00:10:34.620
that's solvable.
link |
00:10:35.860
In general, programs, modern programs
link |
00:10:40.700
that human beings write,
link |
00:10:42.220
how much found parallelism is there in them?
link |
00:10:45.820
What does 10X mean?
link |
00:10:47.260
So if you execute it in order, you would get
link |
00:10:52.180
what's called cycles per instruction,
link |
00:10:53.940
and it would be about, you know,
link |
00:10:57.140
three instructions, three cycles per instruction
link |
00:11:00.020
because of the latency of the operations and stuff.
link |
00:11:02.780
And in a modern computer, it executes at,
link |
00:11:05.220
but like 0.2, 0.25 cycles per instruction.
link |
00:11:08.700
So it's about, we today find 10X.
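Using the numbers quoted here, the "10X" is just the ratio of the two cycles-per-instruction figures:

```python
# CPI ratio from the discussion: ~3 cycles per instruction
# executing in order vs ~0.25 out of order.
in_order_cpi = 3.0
out_of_order_cpi = 0.25
print(in_order_cpi / out_of_order_cpi)  # → 12.0, roughly the "10X"
```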
link |
00:11:11.820
And there's two things.
link |
00:11:13.020
One is the found parallelism in the narrative, right?
link |
00:11:17.380
And the other is the predictability of the narrative, right?
link |
00:11:21.380
So certain operations say, do a bunch of calculations,
link |
00:11:25.540
and if greater than one, do this, else do that.
link |
00:11:30.380
That decision is predicted in modern computers
link |
00:11:33.180
to high 90% accuracy.
link |
00:11:36.220
So branches happen a lot.
link |
00:11:38.740
So imagine you have a decision
link |
00:11:40.420
to make every six instructions,
link |
00:11:41.780
which is about the average, right?
link |
00:11:43.740
But you want to fetch 500 instructions,
link |
00:11:45.440
figure out the graph, and execute them all in parallel.
link |
00:11:48.420
That means you have, let's say,
link |
00:11:51.580
if you fetch 600 instructions and it's every six,
link |
00:11:54.980
you have to predict
link |
00:11:56.940
99 out of 100 branches correctly
link |
00:12:00.260
for that window to be effective.
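The window math here is worth spelling out: a 600-instruction window with a branch every six instructions contains about 100 branches, and the whole window only pays off if every one of them is predicted correctly, a probability that falls off as accuracy raised to the 100th power.

```python
# Probability that a window of ~100 branches has zero
# mispredicts, at the accuracy levels mentioned in the
# conversation (85%, 92%, 99%).
branches = 600 // 6  # one branch about every six instructions
for acc in (0.85, 0.92, 0.99):
    print(f"accuracy {acc:.2f}: window survives {acc**branches:.2e}")
```

Only the ~99% predictor keeps a 100-branch window useful; at 85% or 92% accuracy the chance of a clean window is essentially zero.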
link |
00:12:02.340
Okay, so parallelism, you can't parallelize branches.
link |
00:12:06.860
Or you can.
link |
00:12:07.700
No, you can predict.
link |
00:12:08.660
You can predict.
link |
00:12:09.500
What does predicted branch mean?
link |
00:12:10.580
What does predicted branch mean?
link |
00:12:11.420
So imagine you do a computation over and over.
link |
00:12:13.580
You're in a loop.
link |
00:12:14.940
So while n is greater than one, do.
link |
00:12:19.420
And you go through that loop a million times.
link |
00:12:21.220
So every time you look at the branch,
link |
00:12:22.660
you say, it's probably still greater than one.
link |
00:12:25.740
And you're saying you could do that accurately.
link |
00:12:27.820
Very accurately.
link |
00:12:28.660
Modern computers.
link |
00:12:29.500
My mind is blown.
link |
00:12:30.500
How the heck do you do that?
link |
00:12:31.460
Wait a minute.
link |
00:12:32.620
Well, you want to know?
link |
00:12:33.820
This is really sad.
link |
00:12:35.500
20 years ago, you simply recorded
link |
00:12:38.700
which way the branch went last time
link |
00:12:40.620
and predicted the same thing.
link |
00:12:42.780
Right.
link |
00:12:43.620
Okay.
link |
00:12:44.460
What's the accuracy of that?
link |
00:12:46.140
85%.
link |
00:12:48.100
So then somebody said, hey, let's keep a couple of bits
link |
00:12:51.780
and have a little counter so when it predicts one way,
link |
00:12:54.980
we count up, and then it pins.
link |
00:12:56.740
So say you have a three bit counter.
link |
00:12:58.060
So you count up and then you count down.
link |
00:13:00.740
And you can use the top bit as the signed bit
link |
00:13:03.260
so you have a signed two bit number.
link |
00:13:05.020
So if it's greater than one, you predict taken.
link |
00:13:07.500
And less than one, you predict not taken, right?
link |
00:13:11.460
Or less than zero, whatever the thing is.
link |
00:13:14.100
And that got us to 92%.
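The counter scheme described here is the classic saturating counter; a minimal sketch of the textbook two-bit version, with an assumed loop-branch pattern for illustration:

```python
# A two-bit saturating counter, one per branch: count up on
# taken, down on not-taken, clamped to [0, 3]; predict taken
# when the counter is in the upper half. A single noisy outcome
# doesn't flip the prediction, which is where the gain over
# "predict the same as last time" comes from.

class TwoBitCounter:
    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken nine times then falling through, repeated:
ctr = TwoBitCounter()
outcomes = ([True] * 9 + [False]) * 100
hits = 0
for taken in outcomes:
    hits += ctr.predict() == taken
    ctr.update(taken)
print(hits / len(outcomes))  # → 0.9
```

On this pattern the counter only mispredicts the loop exit, once per trip, instead of twice as a last-outcome predictor would.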
link |
00:13:16.140
Oh.
link |
00:13:17.300
Okay, no, it gets better.
link |
00:13:19.540
This branch depends on how you got there.
link |
00:13:22.900
So if you came down the code one way,
link |
00:13:25.540
you're talking about Bob and Jane, right?
link |
00:13:28.420
And then said, does Bob like Jane?
link |
00:13:30.460
It went one way.
link |
00:13:31.300
But if you're talking about Bob and Jill,
link |
00:13:32.900
does Bob like Jane?
link |
00:13:33.940
You go a different way.
link |
00:13:35.540
Right, so that's called history.
link |
00:13:36.940
So you take the history and a counter.
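The history-plus-counter idea has a textbook form, often called gshare; a sketch under that assumption (real predictors are far more elaborate, as the rest of the conversation makes clear): recent branch outcomes are folded into the table index, so the same branch gets different counters depending on how you got there.

```python
# gshare-style sketch: XOR the global outcome history with the
# branch address to index a table of two-bit counters, so
# context ("how you got here") selects the counter.

TABLE_BITS = 10
MASK = (1 << TABLE_BITS) - 1
table = [2] * (1 << TABLE_BITS)  # two-bit counters, weakly taken
history = 0                      # recent outcomes, newest in bit 0

def predict(pc):
    return table[(pc ^ history) & MASK] >= 2

def update(pc, taken):
    global history
    idx = (pc ^ history) & MASK
    table[idx] = min(3, table[idx] + 1) if taken else max(0, table[idx] - 1)
    history = ((history << 1) | taken) & MASK

# An alternating branch defeats "same as last time", but with
# history in the index each phase trains its own counter:
hits = 0
for i in range(200):
    taken = (i % 2 == 0)
    if i >= 100:  # count hits after warmup
        hits += predict(0x40) == taken
    update(0x40, taken)
print(hits)  # → 100
```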
link |
00:13:40.020
That's cool, but that's not how anything works today.
link |
00:13:43.420
They use something that looks a little like a neural network.
link |
00:13:48.060
So modern, you take all the execution flows.
link |
00:13:52.260
And then you do basically deep pattern recognition
link |
00:13:56.140
of how the program is executing.
link |
00:13:59.940
And you do that multiple different ways.
link |
00:14:03.740
And you have something that chooses what the best result is.
link |
00:14:07.620
There's a little supercomputer inside the computer.
link |
00:14:10.460
That's trying to predict branching.
link |
00:14:11.860
That calculates which way branches go.
link |
00:14:14.340
So the effective window that it's worth finding graphs
link |
00:14:17.300
in gets bigger.
link |
00:14:19.260
Why was that gonna make me sad?
link |
00:14:21.860
Because that's amazing.
link |
00:14:22.940
It's amazingly complicated.
link |
00:14:24.420
Oh, well.
link |
00:14:25.260
Well, here's the funny thing.
link |
00:14:27.100
So to get to 85% took 1,000 bits.
link |
00:14:31.740
To get to 99% takes tens of megabits.
link |
00:14:38.860
So this is one of those, to get the result,
link |
00:14:42.700
to get from a window of say 50 instructions to 500,
link |
00:14:47.780
it took three orders of magnitude
link |
00:14:49.500
or four orders of magnitude more bits.
link |
00:14:52.700
Now if you get the prediction of a branch wrong,
link |
00:14:55.460
what happens then?
link |
00:14:56.300
You flush the pipe.
link |
00:14:57.380
You flush the pipe, so it's just the performance cost.
link |
00:14:59.540
But it gets even better.
link |
00:15:00.820
Yeah.
link |
00:15:01.660
So we're starting to look at stuff that says,
link |
00:15:03.860
so they executed down this path,
link |
00:15:06.700
and then you had two ways to go.
link |
00:15:09.260
But far away, there's something that doesn't matter
link |
00:15:12.500
which path you went.
link |
00:15:14.660
So you took the wrong path.
link |
00:15:17.660
You executed a bunch of stuff.
link |
00:15:20.580
Then you had the misprediction.
link |
00:15:21.700
You backed it up.
link |
00:15:22.540
You remembered all the results you already calculated.
link |
00:15:25.500
Some of those are just fine.
link |
00:15:27.660
Like if you read a book and you misunderstand a paragraph,
link |
00:15:30.260
your understanding of the next paragraph
link |
00:15:32.500
sometimes is invariant to that understanding.
link |
00:15:35.740
Sometimes it depends on it.
link |
00:15:38.540
And you can kind of anticipate that invariance.
link |
00:15:43.260
Yeah, well, you can keep track of whether the data changed.
link |
00:15:47.380
And so when you come back through a piece of code,
link |
00:15:49.220
should you calculate it again or do the same thing?
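In software terms, the "should you calculate it again?" question is memoization: cache a result keyed by its inputs and recompute only when the inputs actually changed. This is just an analogy for the hardware value-reuse idea being described, not how the hardware does it.

```python
# Software analogue of value reuse: skip the recomputation when
# the inputs are unchanged.
calls = 0
cache = {}

def expensive(x, y):
    global calls
    calls += 1
    return x * y + 1  # stand-in for real work

def compute(x, y):
    if (x, y) not in cache:
        cache[(x, y)] = expensive(x, y)
    return cache[(x, y)]

print(compute(6, 7), compute(6, 7), calls)  # → 43 43 1
```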
link |
00:15:51.860
Okay, how much of this is art and how much of it is science?
link |
00:15:55.620
Because it sounds pretty complicated.
link |
00:15:59.100
Well, how do you describe a situation?
link |
00:16:00.660
So imagine you come to a point in the road
link |
00:16:02.620
where you have to make a decision, right?
link |
00:16:05.140
And you have a bunch of knowledge about which way to go.
link |
00:16:07.060
Maybe you have a map.
link |
00:16:08.940
So you wanna go the shortest way,
link |
00:16:11.580
or do you wanna go the fastest way,
link |
00:16:13.180
or do you wanna take the nicest road?
link |
00:16:14.820
So there's some set of data.
link |
00:16:17.860
So imagine you're doing something complicated
link |
00:16:19.660
like building a computer.
link |
00:16:21.820
And there's hundreds of decision points,
link |
00:16:24.340
all with hundreds of possible ways to go.
link |
00:16:27.760
And the ways you pick interact in a complicated way.
link |
00:16:32.220
Right.
link |
00:16:33.480
And then you have to pick the right spot.
link |
00:16:35.700
Right, so that's.
link |
00:16:36.520
So that's art or science, I don't know.
link |
00:16:37.580
You avoided the question.
link |
00:16:38.940
You just described the Robert Frost problem
link |
00:16:41.380
of road less taken.
link |
00:16:43.660
I described the Robert Frost problem?
link |
00:16:45.760
That's what we do as computer designers.
link |
00:16:49.480
It's all poetry.
link |
00:16:50.420
Okay.
link |
00:16:51.260
Great.
link |
00:16:52.100
Yeah, I don't know how to describe that
link |
00:16:54.220
because some people are very good
link |
00:16:56.440
at making those intuitive leaps.
link |
00:16:57.940
It seems like just combinations of things.
link |
00:17:00.560
Some people are less good at it,
link |
00:17:02.180
but they're really good at evaluating the alternatives.
link |
00:17:05.580
Right, and everybody has a different way to do it.
link |
00:17:09.260
And some people can't make those leaps,
link |
00:17:11.860
but they're really good at analyzing it.
link |
00:17:14.300
So when you see computers are designed
link |
00:17:16.020
by teams of people who have very different skill sets.
link |
00:17:19.260
And a good team has lots of different kinds of people.
link |
00:17:24.460
I suspect you would describe some of them
link |
00:17:26.260
as artistic, but not very many.
link |
00:17:30.420
Unfortunately, or fortunately.
link |
00:17:32.060
Fortunately.
link |
00:17:33.680
Well, you know, computer design's hard.
link |
00:17:36.460
It's 99% perspiration.
link |
00:17:40.380
And the 1% inspiration is really important.
link |
00:17:44.140
But you still need the 99.
link |
00:17:45.860
Yeah, you gotta do a lot of work.
link |
00:17:47.340
And then there are interesting things to do
link |
00:17:50.780
at every level of that stack.
link |
00:17:52.760
So at the end of the day,
link |
00:17:55.720
if you run the same program multiple times,
link |
00:17:58.880
does it always produce the same result?
link |
00:18:01.460
Is there some room for fuzziness there?
link |
00:18:04.720
That's a math problem.
link |
00:18:06.720
So if you run a correct C program,
link |
00:18:08.560
the definition is every time you run it,
link |
00:18:11.480
you get the same answer.
link |
00:18:12.480
Yeah, well that's a math statement.
link |
00:18:14.480
But that's a language definitional statement.
link |
00:18:17.440
So for years when people did,
link |
00:18:19.800
when we first did 3D acceleration of graphics,
link |
00:18:24.600
you could run the same scene multiple times
link |
00:18:27.280
and get different answers.
link |
00:18:28.760
Right.
link |
00:18:29.760
Right, and then some people thought that was okay
link |
00:18:32.360
and some people thought it was a bad idea.
link |
00:18:34.560
And then when the HPC world used GPUs for calculations,
link |
00:18:39.240
they thought it was a really bad idea.
link |
00:18:41.200
Okay, now in modern AI stuff,
link |
00:18:44.440
people are looking at networks
link |
00:18:48.120
where the precision of the data is low enough
link |
00:18:51.040
that the data is somewhat noisy.
link |
00:18:53.640
And the observation is the input data is unbelievably noisy.
link |
00:18:57.280
So why should the calculation be not noisy?
link |
00:19:00.240
And people have experimented with algorithms
link |
00:19:02.200
that say can get faster answers by being noisy.
link |
00:19:05.920
Like as a network starts to converge,
link |
00:19:08.240
if you look at the computation graph,
link |
00:19:09.560
it starts out really wide and then it gets narrower.
link |
00:19:12.160
And you can say is that last little bit that important
link |
00:19:14.440
or should I start the graph on the next rev
link |
00:19:17.680
before we whittle it all the way down to the answer, right?
link |
00:19:21.280
So you can create algorithms that are noisy.
link |
00:19:24.040
Now if you're developing something
link |
00:19:25.440
and every time you run it, you get a different answer,
link |
00:19:27.440
it's really annoying.
link |
00:19:29.280
And so most people think even today,
link |
00:19:33.920
every time you run the program, you get the same answer.
link |
00:19:36.720
No, I know, but the question is
link |
00:19:38.360
that's the formal definition of a programming language.
link |
00:19:42.400
There is a definition of languages
link |
00:19:44.520
that don't get the same answer,
link |
00:19:45.760
but people who use those, you always want something
link |
00:19:49.520
because you get a bad answer and then you're wondering
link |
00:19:51.600
is it because of something in the algorithm
link |
00:19:54.440
or because of this?
link |
00:19:55.360
And so everybody wants a little switch that says
link |
00:19:57.140
no matter what, do it deterministically.
link |
00:20:00.280
And it's really weird because almost everything
link |
00:20:02.400
going into modern calculations is noisy.
link |
00:20:05.320
So why do the answers have to be so clear?
link |
00:20:08.240
Right, so where do you stand?
link |
00:20:09.600
I design computers for people who run programs.
link |
00:20:12.500
So if somebody says I want a deterministic answer,
link |
00:20:16.880
like most people want that.
link |
00:20:18.400
Can you deliver a deterministic answer,
link |
00:20:20.180
I guess is the question.
link |
00:20:21.440
Like when you.
link |
00:20:22.280
Yeah, hopefully, sure.
link |
00:20:24.040
What people don't realize is you get a deterministic answer
link |
00:20:27.280
even though the execution flow is very nondeterministic.
link |
00:20:31.100
So you run this program 100 times,
link |
00:20:33.100
it never runs the same way twice, ever.
link |
00:20:36.080
And the answer, it arrives at the same answer.
link |
00:20:37.960
But it gets the same answer every time.
link |
00:20:39.200
It's just amazing.
link |
00:20:42.000
Okay, you've achieved, in the eyes of many people,
link |
00:20:49.600
legend status as a chip architect.
link |
00:20:53.000
What design creation are you most proud of?
link |
00:20:56.400
Perhaps because it was challenging,
link |
00:20:59.440
because of its impact, or because of the set
link |
00:21:01.820
of brilliant ideas that were involved in bringing it to life?
link |
00:21:06.840
I find that description odd.
link |
00:21:10.080
And I have two small children, and I promise you,
link |
00:21:14.360
they think it's hilarious.
link |
00:21:15.960
This question.
link |
00:21:16.800
Yeah.
link |
00:21:17.620
I do it for them.
link |
00:21:18.460
So I'm really interested in building computers.
link |
00:21:23.320
And I've worked with really, really smart people.
link |
00:21:27.640
I'm not unbelievably smart.
link |
00:21:30.040
I'm fascinated by how they go together,
link |
00:21:32.100
both as a thing to do and as an endeavor that people do.
link |
00:21:38.260
How people and computers go together?
link |
00:21:40.000
Yeah.
link |
00:21:40.840
Like how people think and build a computer.
link |
00:21:44.180
And I find sometimes that the best computer architects
link |
00:21:47.800
aren't that interested in people,
link |
00:21:49.200
or the best people managers aren't that good
link |
00:21:51.780
at designing computers.
link |
00:21:54.400
So the whole stack of human beings is fascinating.
link |
00:21:56.840
So the managers, the individual engineers.
link |
00:21:58.840
Yeah, yeah.
link |
00:21:59.920
Yeah, I said I realized after a lot of years
link |
00:22:02.360
of building computers, where you sort of build them
link |
00:22:04.400
out of transistors, logic gates, functional units,
link |
00:22:06.960
computational elements, that you could think of people
link |
00:22:09.760
the same way, so people are functional units.
link |
00:22:12.640
And then you could think of organizational design
link |
00:22:14.560
as a computer architecture problem.
link |
00:22:16.920
And then it was like, oh, that's super cool,
link |
00:22:19.280
because the people are all different,
link |
00:22:20.680
just like the computational elements are all different.
link |
00:22:23.680
And they like to do different things.
link |
00:22:25.440
And so I had a lot of fun reframing
link |
00:22:29.200
how I think about organizations.
link |
00:22:31.300
Just like with computers, we were saying execution paths,
link |
00:22:35.980
you can have a lot of different paths that end up
link |
00:22:37.820
at the same good destination.
link |
00:22:41.660
So what have you learned about the human abstractions
link |
00:22:45.840
from individual functional human units
link |
00:22:48.920
to the broader organization?
link |
00:22:51.920
What does it take to create something special?
link |
00:22:55.080
Well, most people don't think simple enough.
link |
00:23:00.320
All right, so the difference between a recipe
link |
00:23:02.800
and the understanding.
link |
00:23:04.160
There's probably a philosophical description of this.
link |
00:23:09.160
So imagine you're gonna make a loaf of bread.
link |
00:23:11.480
The recipe says get some flour, add some water,
link |
00:23:14.040
add some yeast, mix it up, let it rise,
link |
00:23:16.800
put it in a pan, put it in the oven.
link |
00:23:19.400
It's a recipe.
link |
00:23:21.320
Understanding bread, you can understand biology,
link |
00:23:24.720
supply chains, grain grinders, yeast, physics,
link |
00:23:29.720
thermodynamics, there's so many levels of understanding.
link |
00:23:37.240
And then when people build and design things,
link |
00:23:40.220
they frequently are executing some stack of recipes.
link |
00:23:45.160
And the problem with that is the recipes
link |
00:23:46.920
all have limited scope.
link |
00:23:48.880
Like if you have a really good recipe book
link |
00:23:50.640
for making bread, it won't tell you anything
link |
00:23:52.280
about how to make an omelet.
link |
00:23:54.840
But if you have a deep understanding of cooking,
link |
00:23:57.320
right, then bread, omelets, you know, sandwiches,
link |
00:24:03.680
you know, there's a different way of viewing everything.
link |
00:24:07.680
And most people, when you get to be an expert at something,
link |
00:24:13.020
you know, you're hoping to achieve deeper understanding,
link |
00:24:16.380
not just a large set of recipes to go execute.
link |
00:24:20.860
And it's interesting to watch groups of people
link |
00:24:22.800
because executing recipes is unbelievably efficient
link |
00:24:27.600
if it's what you want to do.
link |
00:24:30.500
If it's not what you want to do, you're really stuck.
link |
00:24:34.800
And that difference is crucial.
link |
00:24:36.600
And everybody has a balance of, let's say,
link |
00:24:39.480
deeper understanding and recipes.
link |
00:24:40.960
And some people are really good at recognizing
link |
00:24:43.760
when the problem is to understand something deeply.
link |
00:24:47.720
Does that make sense?
link |
00:24:49.040
It totally makes sense. But at every stage of development,
link |
00:24:52.800
is deep understanding on the team needed?
link |
00:24:55.560
Oh, this goes back to the art versus science question.
link |
00:24:58.640
Sure.
link |
00:24:59.480
If you constantly unpack everything
link |
00:25:01.240
for deeper understanding, you never get anything done.
link |
00:25:04.200
And if you don't unpack understanding when you need to,
link |
00:25:06.880
you'll do the wrong thing.
link |
00:25:09.480
And then at every juncture, like human beings
link |
00:25:12.040
are these really weird things because everything you tell them
link |
00:25:15.240
has a million possible outputs, right?
link |
00:25:18.320
And then they all interact in a hilarious way.
link |
00:25:21.080
Yeah, it's very interesting.
link |
00:25:21.920
And then having some intuition about what you tell them,
link |
00:25:24.240
what you do, when do you intervene, when do you not,
link |
00:25:26.680
it's complicated.
link |
00:25:28.720
Right, so.
link |
00:25:29.760
It's essentially computationally unsolvable.
link |
00:25:33.200
Yeah, it's an intractable problem, sure.
link |
00:25:36.640
Humans are a mess.
link |
00:25:37.960
But with deep understanding,
link |
00:25:41.800
do you mean also sort of fundamental questions
link |
00:25:44.560
of things like what is a computer?
link |
00:25:51.360
Or why, like the why questions,
link |
00:25:55.000
why are we even building this, like of purpose?
link |
00:25:58.760
Or do you mean more like going towards
link |
00:26:02.200
the fundamental limits of physics,
link |
00:26:04.280
sort of really getting into the core of the science?
link |
00:26:07.480
In terms of building a computer, think a little simpler.
link |
00:26:11.360
So common practice is you build a computer,
link |
00:26:14.640
and then when somebody says, I wanna make it 10% faster,
link |
00:26:17.760
you'll go in and say, all right,
link |
00:26:19.240
I need to make this buffer bigger,
link |
00:26:20.840
and maybe I'll add an add unit.
link |
00:26:23.000
Or I have this thing that's three instructions wide,
link |
00:26:25.360
I'm gonna make it four instructions wide.
link |
00:26:27.600
And what you see is each piece
link |
00:26:30.480
gets incrementally more complicated, right?
link |
00:26:34.240
And then at some point you hit this limit,
link |
00:26:37.080
like adding another feature or buffer
link |
00:26:39.040
doesn't seem to make it any faster.
link |
00:26:41.200
And then people will say,
link |
00:26:42.040
well, that's because it's a fundamental limit.
link |
00:26:45.400
And then somebody else will look at it and say,
link |
00:26:46.960
well, actually the way you divided the problem up
link |
00:26:49.440
and the way the different features are interacting
link |
00:26:52.000
is limiting you, and it has to be rethought, rewritten.
link |
00:26:56.280
So then you refactor it and rewrite it,
link |
00:26:58.160
and what people commonly find is the rewrite
link |
00:27:00.960
is not only faster, but half as complicated.
link |
00:27:03.600
From scratch? Yes.
link |
00:27:05.080
So how often in your career, or maybe more generally,
link |
00:27:08.920
how often have you seen that it's needed
link |
00:27:11.560
to just throw the whole thing out and start over?
link |
00:27:14.280
This is where I'm on one end of it,
link |
00:27:17.040
every three to five years.
link |
00:27:19.120
Which end are you on?
link |
00:27:21.120
Rewrite more often.
link |
00:27:22.720
Rewrite, and three to five years is?
link |
00:27:25.200
If you wanna really make a lot of progress
link |
00:27:27.000
on computer architecture, every five years
link |
00:27:28.960
you should do one from scratch.
link |
00:27:31.960
So where does the x86-64 standard come in?
link |
00:27:36.920
How often do you?
link |
00:27:38.960
I was the coauthor of that spec in 98.
link |
00:27:42.360
That's 20 years ago.
link |
00:27:43.880
Yeah, so that's still around.
link |
00:27:45.880
The instruction set itself has been extended
link |
00:27:48.280
quite a few times.
link |
00:27:50.000
And instruction sets are less interesting
link |
00:27:52.520
than the implementation underneath.
link |
00:27:54.760
There's been, on x86 architecture, Intel's designed a few,
link |
00:27:58.680
AMD designed a few very different architectures.
link |
00:28:02.520
And I don't wanna go into too much of the detail
link |
00:28:06.520
about how often, but there's a tendency
link |
00:28:10.640
to rewrite it every 10 years,
link |
00:28:12.560
and it really should be every five.
link |
00:28:15.200
So you're saying you're an outlier in that sense.
link |
00:28:17.880
Rewrite more often.
link |
00:28:19.080
Rewrite more often.
link |
00:28:20.080
Well, and here's the problem.
link |
00:28:20.920
Isn't that scary?
link |
00:28:22.120
Yeah, of course.
link |
00:28:23.680
Well, scary to who?
link |
00:28:25.200
To everybody involved, because like you said,
link |
00:28:28.200
repeating the recipe is efficient.
link |
00:28:30.680
Companies wanna make money.
link |
00:28:34.160
No, individual engineers wanna succeed,
link |
00:28:36.360
so you wanna incrementally improve,
link |
00:28:39.000
increase the buffer from three to four.
link |
00:28:41.280
Well, this is where you get
link |
00:28:42.720
into the diminishing return curves.
link |
00:28:45.440
I think Steve Jobs said this, right?
link |
00:28:46.920
So every, you have a project, and you start here,
link |
00:28:49.880
and it goes up, and you have diminishing return.
link |
00:28:52.360
And to get to the next level, you have to do a new one,
link |
00:28:54.760
and the initial starting point will be lower
link |
00:28:57.640
than the old optimization point, but it'll get higher.
link |
00:29:01.840
So now you have two kinds of fear,
link |
00:29:03.560
short term disaster and long term disaster.
link |
00:29:07.520
And you're, you're haunted.
link |
00:29:08.600
So grown ups, right, like, you know,
link |
00:29:12.160
people with a quarter by quarter business objective
link |
00:29:15.240
are terrified about changing everything.
link |
00:29:17.840
And people who are trying to run a business
link |
00:29:21.040
or build a computer for a long term objective
link |
00:29:23.960
know that the short term limitations block them
link |
00:29:27.200
from the long term success.
link |
00:29:29.360
So if you look at leaders of companies
link |
00:29:32.720
that had really good long term success,
link |
00:29:35.200
every time they saw that they had to redo something, they did.
link |
00:29:39.000
And so somebody has to speak up.
link |
00:29:41.040
Or you do multiple projects in parallel,
link |
00:29:43.080
like you optimize the old one while you build a new one.
link |
00:29:46.720
But the marketing guys are always like,
link |
00:29:48.200
promise me that the new computer
link |
00:29:49.960
is faster on every single thing.
link |
00:29:52.720
And the computer architect says,
link |
00:29:53.920
well, the new computer will be faster on the average,
link |
00:29:56.720
but there's a distribution of results and performance,
link |
00:29:59.480
and you'll have some outliers that are slower.
link |
00:30:01.920
And that's very hard,
link |
00:30:02.760
because they have one customer who cares about that one.
link |
00:30:05.280
So speaking of the long term, for over 50 years now,
link |
00:30:08.960
Moore's Law has served, for me and millions of others,
link |
00:30:12.880
as an inspiring beacon of what kind of amazing future
link |
00:30:16.640
brilliant engineers can build.
link |
00:30:18.040
Yep.
link |
00:30:19.360
I'm just making your kids laugh all of today.
link |
00:30:21.880
That was great.
link |
00:30:23.480
So first, in your eyes, what is Moore's Law,
link |
00:30:27.560
if you could define for people who don't know?
link |
00:30:29.920
Well, the simple statement was, from Gordon Moore,
link |
00:30:34.280
was double the number of transistors every two years.
link |
00:30:37.880
Something like that.
link |
00:30:39.320
And then my operational model is,
link |
00:30:43.240
we increase the performance of computers
link |
00:30:45.840
by two X every two or three years.
link |
00:30:48.520
And it's wiggled around substantially over time.
link |
00:30:51.400
And also, how we deliver performance has changed.
link |
00:30:55.160
But the foundational idea was
link |
00:31:00.480
two X the transistors every two years.
link |
00:31:02.920
The current cadence is something like,
link |
00:31:05.760
they call it a shrink factor, like 0.6 every two years,
link |
00:31:10.040
which is not 0.5.
link |
00:31:11.920
But that's referring strictly, again,
link |
00:31:13.800
to the original definition of just.
link |
00:31:15.360
A transistor count.
link |
00:31:16.680
A shrink factor's just getting them
link |
00:31:18.060
smaller and smaller and smaller.
link |
00:31:19.040
Well, it's for a constant chip area.
link |
00:31:21.760
If you make the transistors smaller by 0.6,
link |
00:31:24.200
then you get one over 0.6 more transistors.
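The shrink arithmetic here is simple enough to sketch. A hedged illustration, treating the 0.6 as an area scale per two-year node, as described above (the node count is an invented example, not Intel's roadmap):

```python
# Shrink factor of 0.6 per node, interpreted as area per transistor.
# For a constant chip area, transistor count grows by 1 / 0.6 per node.
shrink = 0.6
density_gain = 1 / shrink            # transistors per unit area, per node
print(round(density_gain, 2))        # ~1.67x per node, versus 2x for a 0.5 shrink

# Compounding over several two-year nodes:
nodes = 5                            # a decade of two-year cadence
print(round(density_gain ** nodes, 1))  # ~12.9x transistors in ten years
```

So a 0.6 shrink compounds noticeably slower than the classic 0.5, which is the point Jim is making about the current cadence.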
link |
00:31:27.200
So can you linger on it a little longer?
link |
00:31:29.140
What's a broader, what do you think should be
link |
00:31:31.680
the broader definition of Moore's Law?
link |
00:31:33.920
When you mentioned how you think of performance,
link |
00:31:37.920
just broadly, what's a good way to think about Moore's Law?
link |
00:31:42.360
Well, first of all, I've been aware
link |
00:31:45.600
of Moore's Law for 30 years.
link |
00:31:48.160
In which sense?
link |
00:31:49.100
Well, I've been designing computers for 40.
link |
00:31:52.920
You're just watching it before your eyes kind of thing.
link |
00:31:56.040
And somewhere where I became aware of it,
link |
00:31:58.160
I was also informed that Moore's Law
link |
00:31:59.800
was gonna die in 10 to 15 years.
link |
00:32:02.240
And then I thought that was true at first.
link |
00:32:03.940
But then after 10 years, it was gonna die in 10 to 15 years.
link |
00:32:07.320
And then at one point, it was gonna die in five years.
link |
00:32:09.800
And then it went back up to 10 years.
link |
00:32:11.320
And at some point, I decided not to worry
link |
00:32:13.440
about that particular prognostication
link |
00:32:16.680
for the rest of my life, which is fun.
link |
00:32:19.640
And then I joined Intel and everybody said
link |
00:32:21.560
Moore's Law is dead.
link |
00:32:22.840
And I thought that's sad,
link |
00:32:23.720
because it's the Moore's Law company.
link |
00:32:25.640
And it's not dead.
link |
00:32:26.920
And it's always been gonna die.
link |
00:32:29.200
And humans like these apocryphal kind of statements,
link |
00:32:33.360
like we'll run out of food, or we'll run out of air,
link |
00:32:36.280
or we'll run out of room, or we'll run out of something.
link |
00:32:39.960
Right, but it's still incredible
link |
00:32:41.920
that it's lived for as long as it has.
link |
00:32:44.640
And yes, there's many people who believe now
link |
00:32:47.640
that Moore's Law is dead.
link |
00:32:50.180
You know, they can join the last 50 years
link |
00:32:52.840
of people who had the same idea.
link |
00:32:53.680
Yeah, there's a long tradition.
link |
00:32:55.400
But why do you think, if you can try to understand it,
link |
00:33:00.840
why do you think it's not dead?
link |
00:33:03.080
Well, let's just think, people think Moore's Law
link |
00:33:06.600
is one thing, transistors get smaller.
link |
00:33:09.160
But actually, under the sheet,
link |
00:33:10.200
there's literally thousands of innovations.
link |
00:33:12.520
And almost all those innovations
link |
00:33:14.120
have their own diminishing return curves.
link |
00:33:17.360
So if you graph it, it looks like a cascade
link |
00:33:19.400
of diminishing return curves.
link |
00:33:21.440
I don't know what to call that.
link |
00:33:22.660
But the result is an exponential curve.
link |
00:33:26.480
Well, at least it has been.
link |
00:33:27.940
So, and we keep inventing new things.
link |
00:33:30.920
So if you're an expert in one of the things
link |
00:33:32.960
on a diminishing return curve, right,
link |
00:33:35.920
and you can see its plateau,
link |
00:33:38.480
you will probably tell people, well, this is done.
link |
00:33:42.220
Meanwhile, some other pile of people
link |
00:33:43.640
are doing something different.
link |
00:33:46.400
So that's just normal.
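The "cascade of diminishing return curves" picture can be sketched numerically. Everything below is invented for illustration (curve shapes, start times, ceilings are not real semiconductor data); it only shows how saturating curves can sum to roughly exponential progress:

```python
import math

def tech_curve(t, start, ceiling):
    """One technology: contributes nothing before `start`, then
    saturates toward `ceiling` with diminishing returns."""
    if t < start:
        return 0.0
    return ceiling * (1 - math.exp(-(t - start)))

# Each new technology starts later but has a higher ceiling (assumption).
techs = [(i * 2, 2 ** i) for i in range(10)]  # (start_year, ceiling) pairs

def total(t):
    return sum(tech_curve(t, s, c) for s, c in techs)

for year in (4, 8, 12, 16):
    print(year, round(total(year), 1))
# Each reading is roughly 4x the previous one four years earlier,
# i.e. ~2x every two years, even though every individual curve flattens.
```

Each expert sitting on one of these curves correctly sees a plateau; the envelope keeps climbing anyway.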
link |
00:33:48.280
So then there's the observation of
link |
00:33:50.400
how small could a switching device be?
link |
00:33:54.060
So a modern transistor is something like
link |
00:33:55.760
a thousand by a thousand by a thousand atoms, right?
link |
00:33:59.900
And you get quantum effects down around two to 10 atoms.
link |
00:34:04.680
So you can imagine the transistor
link |
00:34:06.280
as small as 10 by 10 by 10.
link |
00:34:08.240
So that's a million times smaller.
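The volume comparison works out as stated; a quick check of the numbers in the quote:

```python
# A modern transistor at roughly 1000 x 1000 x 1000 atoms versus a
# hypothetical 10 x 10 x 10 one, near where quantum effects dominate.
modern = 1000 ** 3      # ~1e9 atoms
limit = 10 ** 3         # ~1e3 atoms
print(modern // limit)  # 1000000 -- "a million times smaller" by volume
```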
link |
00:34:12.080
And then the quantum computational people
link |
00:34:14.500
are working away at how to use quantum effects.
link |
00:34:17.480
So.
link |
00:34:20.000
A thousand by a thousand by a thousand.
link |
00:34:21.920
Atoms.
link |
00:34:23.740
That's a really clean way of putting it.
link |
00:34:26.640
Well, a fin, like in a modern transistor,
link |
00:34:28.840
if you look at the fin, it's like 120 atoms wide,
link |
00:34:32.060
but we can make that thinner.
link |
00:34:33.360
And then there's a gate wrapped around it,
link |
00:34:35.700
and then there's spacing.
link |
00:34:36.600
There's a whole bunch of geometry.
link |
00:34:38.800
And a competent transistor designer
link |
00:34:42.040
could count the atoms in every single direction.
link |
00:34:48.000
Like there's techniques now to already put down atoms
link |
00:34:50.480
in a single atomic layer.
link |
00:34:53.080
And you can place atoms if you want to.
link |
00:34:55.840
It's just from a manufacturing process,
link |
00:34:59.600
if placing an atom takes 10 minutes
link |
00:35:01.320
and you need to put 10 to the 23rd atoms together
link |
00:35:05.640
to make a computer, it would take a long time.
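A hedged back-of-envelope for the manufacturing point above: placing atoms one at a time is hopeless at chip scale. The 10-minutes-per-atom figure is Jim's illustrative number, not a real process parameter:

```python
atoms = 10 ** 23                  # atoms in a computer, per the quote
minutes_per_atom = 10             # illustrative placement time
minutes_per_year = 60 * 24 * 365

years = atoms * minutes_per_atom / minutes_per_year
print(f"{years:.2e} years")       # ~1.9e18 years, vastly longer than the
                                  # age of the universe
```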
link |
00:35:08.800
So the methods are both shrinking things
link |
00:35:13.340
and then coming up with effective ways
link |
00:35:15.060
to control what's happening.
link |
00:35:17.900
Manufacture stably and cheaply.
link |
00:35:20.060
Yeah.
link |
00:35:21.400
So the innovation stack's pretty broad.
link |
00:35:23.840
There's equipment, there's optics, there's chemistry,
link |
00:35:26.880
there's physics, there's material science,
link |
00:35:29.240
there's metallurgy, there's lots of ideas
link |
00:35:31.960
about when you put different materials together,
link |
00:35:33.720
how do they interact, are they stable,
link |
00:35:35.520
is it stable over temperature, like are they repeatable?
link |
00:35:40.880
There's like literally thousands of technologies involved.
link |
00:35:45.000
But just for the shrinking, you don't think
link |
00:35:46.960
we're quite yet close to the fundamental limits of physics?
link |
00:35:50.960
I did a talk on Moore's Law and I asked for a roadmap
link |
00:35:53.800
to a path of 100 and after two weeks,
link |
00:35:56.560
they said we only got to 50.
link |
00:35:58.880
100 what, sorry?
link |
00:35:59.720
100 X shrink.
link |
00:36:00.560
100 X shrink?
link |
00:36:01.940
We only got to 50.
link |
00:36:02.780
And I said, why don't you give it another two weeks?
link |
00:36:05.720
Well, here's the thing about Moore's Law, right?
link |
00:36:09.680
So I believe that the next 10 or 20 years
link |
00:36:14.180
of shrinking is gonna happen, right?
link |
00:36:16.360
Now, as a computer designer, you have two stances.
link |
00:36:20.920
You think it's going to shrink, in which case
link |
00:36:23.040
you're designing and thinking about architecture
link |
00:36:26.160
in a way that you'll use more transistors.
link |
00:36:29.020
Or conversely, not be swamped by the complexity
link |
00:36:32.880
of all the transistors you get, right?
link |
00:36:36.120
You have to have a strategy, you know?
link |
00:36:39.320
So you're open to the possibility and waiting
link |
00:36:42.100
for the possibility of a whole new army
link |
00:36:44.160
of transistors ready to work.
link |
00:36:45.960
I'm expecting more transistors every two or three years
link |
00:36:50.380
by a number large enough that how you think about design,
link |
00:36:54.360
how you think about architecture has to change.
link |
00:36:57.200
Like, imagine you build buildings out of bricks,
link |
00:37:01.080
and every year the bricks are half the size,
link |
00:37:04.520
or every two years.
link |
00:37:05.880
Well, if you kept building bricks the same way,
link |
00:37:08.360
so many bricks per person per day,
link |
00:37:11.280
the amount of time to build a building
link |
00:37:13.600
would go up exponentially, right?
link |
00:37:16.980
But if you said, I know that's coming,
link |
00:37:19.200
so now I'm gonna design equipment that moves bricks faster,
link |
00:37:22.360
uses them better, because maybe you're getting something
link |
00:37:24.440
out of the smaller bricks, more strength, thinner walls,
link |
00:37:27.520
you know, less material, efficiency out of that.
link |
00:37:30.360
So once you have a roadmap with what's gonna happen,
link |
00:37:33.260
transistors, we're gonna get more of them,
link |
00:37:36.520
then you design all this collateral around it
link |
00:37:38.760
to take advantage of it, and also to cope with it.
link |
00:37:42.440
Like, that's the thing people don't understand.
link |
00:37:43.760
It's like, if I didn't believe in Moore's Law,
link |
00:37:46.120
and then Moore's Law transistors showed up,
link |
00:37:48.760
my design teams would all drown.
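The brick analogy above can be made concrete. The rates and sizes here are purely illustrative assumptions:

```python
# If bricks halve in linear size every period but you lay the same number
# of bricks per day, build time for the same wall volume grows 8x per period.
bricks_per_day = 1000           # constant human laying rate (assumption)
wall_volume = 1.0               # fixed building size
brick_side = 1.0

for period in range(4):
    bricks_needed = wall_volume / brick_side ** 3
    days = bricks_needed / bricks_per_day
    print(period, days)
    brick_side /= 2             # bricks halve in linear size each period
# Days: 0.001, 0.008, 0.064, 0.512 -- an 8x jump per period, exponential.
```

Hence the need to redesign the equipment and methods around the roadmap, not just keep laying bricks the old way.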
link |
00:37:50.440
So what's the hardest part of this inflow
link |
00:37:56.180
of new transistors?
link |
00:37:57.380
I mean, even if you just look historically,
link |
00:37:59.500
throughout your career, what's the thing,
link |
00:38:03.740
what fundamentally changes when you add more transistors
link |
00:38:06.980
in the task of designing an architecture?
link |
00:38:10.800
Well, there's two constants, right?
link |
00:38:12.500
One is people don't get smarter.
link |
00:38:16.100
By the way, there's some science showing
link |
00:38:17.300
that we do get smarter because of nutrition or whatever.
link |
00:38:21.260
Sorry to bring that up.
link |
00:38:22.100
The Flynn effect.
link |
00:38:22.940
Yes.
link |
00:38:23.760
Yeah, I'm familiar with it.
link |
00:38:24.600
Nobody understands it, nobody knows if it's still going on.
link |
00:38:26.300
So that's a...
link |
00:38:27.180
Or whether it's real or not.
link |
00:38:28.540
But yeah, it's a...
link |
00:38:30.220
I sort of...
link |
00:38:31.300
Anyway, but not exponentially.
link |
00:38:32.140
I would believe for the most part,
link |
00:38:33.480
people aren't getting much smarter.
link |
00:38:35.500
The evidence doesn't support it, that's right.
link |
00:38:37.540
And then teams can't grow that much.
link |
00:38:40.100
Right.
link |
00:38:40.940
Right, so human beings, you know,
link |
00:38:43.380
we're really good in teams of 10,
link |
00:38:45.780
you know, up to teams of 100, they can know each other.
link |
00:38:48.180
Beyond that, you have to have organizational boundaries.
link |
00:38:50.840
So you're kind of, you have,
link |
00:38:51.940
those are pretty hard constraints, right?
link |
00:38:54.680
So then you have to divide and conquer,
link |
00:38:56.420
like as the designs get bigger,
link |
00:38:57.940
you have to divide it into pieces.
link |
00:39:00.260
You know, the power of abstraction layers is really high.
link |
00:39:03.220
We used to build computers out of transistors.
link |
00:39:06.120
Now we have a team that turns transistors into logic cells
link |
00:39:08.900
and another team that turns them into functional units,
link |
00:39:10.700
another one that turns them into computers, right?
link |
00:39:13.180
So we have abstraction layers in there
link |
00:39:16.100
and you have to think about when do you shift gears on that.
link |
00:39:21.380
We also use faster computers to build faster computers.
link |
00:39:24.340
So some algorithms run twice as fast on new computers,
link |
00:39:27.820
but a lot of algorithms are N squared.
link |
00:39:30.460
So, you know, a computer with twice as many transistors
link |
00:39:33.600
and it might take four times as long to run.
link |
00:39:36.540
So you have to refactor the software.
link |
00:39:39.380
Like simply using faster computers
link |
00:39:41.040
to build bigger computers doesn't work.
link |
00:39:44.180
So you have to think about all these things.
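The N-squared point can be sketched as a toy model. The constant and sizes are invented; only the scaling matters:

```python
# Design tools with O(N^2) algorithms lose ground to Moore's Law even
# when they run on the newer, faster hardware.
def runtime(n_transistors, speedup, c=1e-9):
    """Time for an O(N^2) design algorithm on hardware `speedup`x faster."""
    return c * n_transistors ** 2 / speedup

base = runtime(1e9, speedup=1)
next_gen = runtime(2e9, speedup=2)   # 2x the transistors, on a 2x faster machine
print(next_gen / base)               # 2.0 -- still twice as slow, net
```

Which is exactly why the software has to be refactored rather than just re-run on faster machines.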
link |
00:39:46.260
So in terms of computing performance
link |
00:39:47.900
and the exciting possibility
link |
00:39:49.300
that more powerful computers bring,
link |
00:39:51.580
is shrinking the thing which you've been talking about,
link |
00:39:57.020
for you, one of the biggest exciting possibilities
link |
00:39:59.880
of advancement in performance?
link |
00:40:01.540
Or are there other directions that you're interested in,
link |
00:40:03.940
like in the direction of sort of enforcing given parallelism
link |
00:40:08.940
or like doing massive parallelism
link |
00:40:12.180
in terms of many, many CPUs,
link |
00:40:15.020
you know, stacking CPUs on top of each other,
link |
00:40:17.660
that kind of parallelism or any kind of parallelism?
link |
00:40:20.780
Well, think about it a different way.
link |
00:40:22.220
So old computers, you know, slow computers,
link |
00:40:25.220
you said A equal B plus C times D, pretty simple, right?
link |
00:40:30.580
And then we made faster computers with vector units
link |
00:40:33.480
and you can do proper equations and matrices, right?
link |
00:40:38.480
And then modern like AI computations
link |
00:40:41.080
or like convolutional neural networks,
link |
00:40:43.400
where you convolve one large data set against another.
link |
00:40:47.080
And so there's sort of this hierarchy of mathematics,
link |
00:40:51.140
you know, from simple equation to linear equations,
link |
00:40:54.060
to matrix equations, to deeper kind of computation.
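The hierarchy Jim sketches can be illustrated with a toy example. Pure Python, and the data and kernel are invented for illustration:

```python
# Scalar: A = B + C * D -- what old, slow computers did one at a time.
a = 2 + 3 * 4                                      # 14

# Vector: the same equation elementwise over arrays
# (what vector units accelerate).
b, c, d = [1, 2], [3, 4], [5, 6]
v = [bi + ci * di for bi, ci, di in zip(b, c, d)]  # [16, 26]

# Convolution: slide a kernel across data, multiplying and summing
# at each position (the core of convolutional neural networks).
data = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
conv = [sum(k * x for k, x in zip(kernel, data[i:i + 3]))
        for i in range(len(data) - 2)]             # [-2, -2, -2]
print(a, v, conv)
```

Each rung up this ladder multiplies the arithmetic per statement, which is the "going up the mathematical graph" the conversation turns to next.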
link |
00:40:58.760
And the data sets are getting so big
link |
00:41:00.600
that people are thinking of data as a topology problem.
link |
00:41:04.360
You know, data is organized in some immense shape.
link |
00:41:07.960
And then the computation, which sort of wants to be,
link |
00:41:11.160
get data from immense shape and do some computation on it.
link |
00:41:15.320
So what computers have allowed people to do
link |
00:41:18.120
is have algorithms go much, much further.
link |
00:41:22.480
So that paper you reference, the Sutton paper,
link |
00:41:26.640
they talked about, you know, like when AI started,
link |
00:41:29.120
it was apply rule sets to something.
link |
00:41:31.860
That's a very simple computational situation.
link |
00:41:35.780
And then when they did the first chess thing,
link |
00:41:37.840
they solved deep searches.
link |
00:41:39.880
So have a huge database of moves and results, deep search,
link |
00:41:44.680
but it's still just a search, right?
link |
00:41:48.140
Now we take large numbers of images
link |
00:41:51.140
and we use it to train these weight sets
link |
00:41:54.360
that we convolve across.
link |
00:41:56.240
It's a completely different kind of phenomena.
link |
00:41:58.880
We call that AI.
link |
00:41:59.960
Now they're doing the next generation.
link |
00:42:02.440
And if you look at it,
link |
00:42:03.800
they're going up this mathematical graph, right?
link |
00:42:07.560
And then computations, both computation and data sets
link |
00:42:11.200
support going up that graph.
link |
00:42:13.940
Yeah, the kind of computation that might,
link |
00:42:15.480
I mean, I would argue that all of it is still a search,
link |
00:42:18.720
right?
link |
00:42:20.000
Just like you said, a topology problem with data sets,
link |
00:42:22.780
you're searching the data sets for valuable data
link |
00:42:27.040
and also the actual optimization of neural networks
link |
00:42:30.000
is a kind of search for the...
link |
00:42:33.040
I don't know. If you look at the inner layers
link |
00:42:34.760
of finding a cat, it's not a search.
link |
00:42:39.100
It's a set of endless projections.
link |
00:42:41.120
So, you know, a projection,
link |
00:42:42.760
here's a shadow of this phone, right?
link |
00:42:45.640
And then you can have a shadow of that on the something
link |
00:42:47.680
and a shadow on that of something.
link |
00:42:49.240
And if you look in the layers, you'll see
link |
00:42:51.440
this layer actually describes pointy ears
link |
00:42:53.580
and round eyeness and fuzziness.
link |
00:42:56.560
But the computation to tease out the attributes
link |
00:43:02.000
is not search.
link |
00:43:03.700
Like the inference part might be search,
link |
00:43:05.960
but the training's not search.
link |
00:43:07.440
And then in deep networks, they look at layers
link |
00:43:10.760
and they don't even know it's represented.
link |
00:43:14.340
And yet, if you take the layers out, it doesn't work.
link |
00:43:16.640
So I don't think it's search.
link |
00:43:18.940
But you'd have to talk to a mathematician
link |
00:43:21.040
about what that actually is.
link |
00:43:22.960
Well, we could disagree, but it's just semantics,
link |
00:43:27.000
I think, it's not, but it's certainly not...
link |
00:43:29.160
I would say it's absolutely not semantics, but...
link |
00:43:31.920
Okay, all right, well, if you want to go there.
link |
00:43:37.060
So optimization to me is search,
link |
00:43:39.020
and we're trying to optimize the ability
link |
00:43:42.960
of a neural network to detect cat ears.
link |
00:43:45.800
And the difference between chess and the space,
link |
00:43:51.060
the incredibly multidimensional,
link |
00:43:54.100
100,000 dimensional space that neural networks
link |
00:43:57.360
are trying to optimize over is nothing like
link |
00:44:00.200
the chessboard database.
link |
00:44:02.200
So it's a totally different kind of thing.
link |
00:44:04.320
And okay, in that sense, you can say it loses the meaning.
link |
00:44:07.720
I can see how you might say, if you...
link |
00:44:11.240
The funny thing is, it's the difference
link |
00:44:12.800
between given search space and found search space.
link |
00:44:16.520
Right, exactly.
link |
00:44:17.360
Yeah, maybe that's a different way to describe it.
link |
00:44:18.800
That's a beautiful way to put it, okay.
link |
00:44:19.960
But you're saying, what's your sense
link |
00:44:21.720
in terms of the basic mathematical operations
link |
00:44:24.800
and the architectures, computer hardware
link |
00:44:27.800
that enables those operations?
link |
00:44:29.920
Do you see the CPUs of today still being
link |
00:44:33.000
a really core part of executing
link |
00:44:36.000
those mathematical operations?
link |
00:44:37.640
Yes.
link |
00:44:38.560
Well, the operations continue to be add, subtract,
link |
00:44:42.280
load, store, compare, and branch.
link |
00:44:44.640
It's remarkable.
link |
00:44:46.120
So it's interesting, the building blocks
link |
00:44:48.840
of computers are transistors, and under that, atoms.
link |
00:44:52.760
So you got atoms, transistors, logic gates, computers,
link |
00:44:56.360
functional units of computers.
link |
00:44:58.360
The building blocks of mathematics at some level
link |
00:45:01.000
are things like adds and subtracts and multiplies,
link |
00:45:04.440
but the space mathematics can describe
link |
00:45:08.360
is, I think, essentially infinite.
link |
00:45:11.240
But the computers that run the algorithms
link |
00:45:14.080
are still doing the same things.
link |
00:45:16.680
Now, a given algorithm might say, I need sparse data,
link |
00:45:20.320
or I need 32 bit data, or I need, you know,
link |
00:45:24.800
like a convolution operation that naturally takes
link |
00:45:27.800
eight bit data, multiplies it, and sums it up a certain way.
link |
00:45:31.680
So like the data types in TensorFlow
link |
00:45:35.200
imply an optimization set.
link |
00:45:38.240
But when you go right down and look at the computers,
link |
00:45:40.480
it's AND and OR gates doing adds and multiplies.
link |
00:45:42.920
Like that hasn't changed much.
link |
00:45:46.280
Now, the quantum researchers think
link |
00:45:48.600
they're going to change that radically,
link |
00:45:50.000
and then there's people who think about analog computing
link |
00:45:52.280
because you look in the brain, and it
link |
00:45:53.840
seems to be more analogish.
link |
00:45:55.880
You know, that maybe there's a way to do that more
link |
00:45:58.040
efficiently.
link |
00:45:59.120
But we have a million X on computation,
link |
00:46:03.520
and I don't know the relationship
link |
00:46:07.760
between computational, let's say,
link |
00:46:09.680
intensity and ability to hit mathematical abstractions.
link |
00:46:15.440
I don't know any way to describe that, but just like you saw
link |
00:46:19.320
in AI, you went from rule sets to simple search
link |
00:46:23.000
to complex search to, say, found search.
link |
00:46:26.480
Like those are orders of magnitude more computation
link |
00:46:30.080
to do.
link |
00:46:31.600
And as we get the next two orders of magnitude,
link |
00:46:34.720
like a friend, Raja Koduri, said,
link |
00:46:36.480
like every order of magnitude changes the computation.
link |
00:46:40.240
Fundamentally changes what the computation is doing.
link |
00:46:42.720
Yeah.
link |
00:46:44.760
Oh, you know the expression the difference in quantity
link |
00:46:46.880
is the difference in kind.
link |
00:46:49.560
You know, the difference between ant and anthill, right?
link |
00:46:53.080
Or neuron and brain.
link |
00:46:56.000
You know, there's this indefinable place
link |
00:46:58.920
where the quantity changed the quality, right?
link |
00:47:02.520
And we've seen that happen in mathematics multiple times,
link |
00:47:05.040
and you know, my guess is it's going to keep happening.
link |
00:47:08.720
So your sense is, yeah, if you focus head down
link |
00:47:12.280
and shrinking the transistor.
link |
00:47:14.920
Well, it's not just head down, we're aware of the software
link |
00:47:18.000
stacks that are running in the computational loads,
link |
00:47:20.400
and we're kind of pondering what do you
link |
00:47:22.360
do with a petabyte of memory that wants
link |
00:47:24.880
to be accessed in a sparse way and have, you know,
link |
00:47:28.200
the kind of calculations AI programmers want.
link |
00:47:32.720
So there's a dialogue interaction,
link |
00:47:34.760
but when you go in the computer chip,
link |
00:47:38.120
you know, you find adders and subtractors and multipliers.
link |
00:47:43.120
So if you zoom out then, as you mentioned with the Sutton paper,
link |
00:47:46.960
the idea that most of the development in the last many
link |
00:47:50.160
decades in AI research came from just leveraging computation
link |
00:47:54.320
and just simple algorithms waiting for the computation
link |
00:47:59.160
to improve.
link |
00:48:00.040
Well, software guys have a thing that they call
link |
00:48:03.760
the problem of early optimization.
link |
00:48:07.080
So you write a big software stack,
link |
00:48:09.160
and if you start optimizing like the first thing you write,
link |
00:48:12.360
the odds of that being the performance limiter are low.
link |
00:48:15.400
But when you get the whole thing working,
link |
00:48:17.000
can you make it 2x faster by optimizing the right things?
link |
00:48:19.760
Sure.
link |
00:48:21.040
While you're optimizing that, could you
link |
00:48:22.760
have written a new software stack, which
link |
00:48:24.480
would have been a better choice?
link |
00:48:26.000
Maybe.
link |
00:48:27.080
Now you have creative tension.
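Keller's point about optimizing "the right things" is essentially Amdahl's law. A small sketch, with illustrative numbers (not from the conversation):

```python
# Amdahl-style check of where optimization pays off (illustrative numbers).
def overall_speedup(fraction, local_speedup):
    """Speedup of the whole program when `fraction` of runtime
    is accelerated by `local_speedup`x (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Optimizing the first thing you wrote (say 5% of runtime) 10x barely helps:
print(round(overall_speedup(0.05, 10), 3))   # ~1.047
# Optimizing the real hot spot (say 60% of runtime) 10x roughly doubles speed:
print(round(overall_speedup(0.60, 10), 3))   # ~2.174
```

The asymmetry is why profiling the whole working system first, as Keller describes, beats optimizing the first thing you wrote.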
link |
00:48:29.440
So.
link |
00:48:30.200
But the whole time as you're doing the writing,
link |
00:48:33.080
that's the software we're talking about.
link |
00:48:34.880
The hardware underneath gets faster and faster.
link |
00:48:36.840
Well, this goes back to the Moore's law.
link |
00:48:38.600
If Moore's law is going to continue, then your AI research
link |
00:48:43.680
should expect that to show up, and then you
link |
00:48:46.200
make a slightly different set of choices then.
link |
00:48:48.680
We've hit the wall.
link |
00:48:49.800
Nothing's going to happen.
link |
00:48:51.440
And from here, it's just us rewriting algorithms.
link |
00:48:55.200
That seems like a failed strategy for the last 30
link |
00:48:57.440
years of predictions of Moore's law's death.
link |
00:49:00.120
So can you just linger on it?
link |
00:49:03.240
I think you've answered it, but I'll just
link |
00:49:05.280
ask the same dumb question over and over.
link |
00:49:06.960
So why do you think Moore's law is not going to die?
link |
00:49:12.480
Which is the most promising, exciting possibility
link |
00:49:15.680
of why it won't die in the next 5, 10 years?
link |
00:49:17.960
So is it the continued shrinking of the transistor,
link |
00:49:20.640
or is it another S curve that steps in and it totally sort
link |
00:49:25.440
of matches up?
link |
00:49:26.080
Shrinking the transistor is literally
link |
00:49:28.160
thousands of innovations.
link |
00:49:30.200
Right, so there's stacks of S curves in there.
link |
00:49:33.280
There's a whole bunch of S curves just kind
link |
00:49:35.280
of running their course and being reinvented
link |
00:49:38.680
and new things.
link |
00:49:41.720
The semiconductor fabricators and technologists have all
link |
00:49:45.880
announced what's called nanowires.
link |
00:49:47.360
So they took a fin, which had a gate around it,
link |
00:49:51.120
and turned that into little wires
link |
00:49:52.640
so you have better control of that, and they're smaller.
link |
00:49:55.280
And then from there, there are some obvious steps
link |
00:49:57.240
about how to shrink that.
link |
00:49:59.680
The metallurgy around wire stacks and stuff
link |
00:50:03.640
has very obvious abilities to shrink.
link |
00:50:07.160
And there's a whole combination of things there to do.
link |
00:50:11.000
Your sense is that we're going to get a lot
link |
00:50:13.480
of innovation from just that, shrinking.
link |
00:50:16.680
Yeah, like a factor of 100 is a lot.
link |
00:50:19.440
Yeah, I would say that's incredible.
link |
00:50:22.120
And it's totally unknown.
link |
00:50:23.720
It's only 10 or 15 years.
link |
00:50:25.120
Now, you're smarter, you might know,
link |
00:50:26.560
but to me it's totally unpredictable
link |
00:50:28.160
of what that 100x would bring in terms
link |
00:50:30.160
of the nature of the computation that people would be doing.
link |
00:50:34.440
Yeah, are you familiar with Bell's law?
link |
00:50:37.280
So for a long time, it was mainframes, minis, workstation,
link |
00:50:40.720
PC, mobile.
link |
00:50:42.480
Moore's law drove faster, smaller computers.
link |
00:50:46.200
And then when we were thinking about Moore's law,
link |
00:50:49.520
Raja Koduri said, every 10x generates a new computation.
link |
00:50:53.280
So scalar, vector, matrix, topological computation.
link |
00:51:01.120
And if you go look at the industry trends,
link |
00:51:03.840
there was mainframes, and then minicomputers, and then PCs,
link |
00:51:07.440
and then the internet took off.
link |
00:51:08.920
And then we got mobile devices.
link |
00:51:10.760
And now we're building 5G wireless
link |
00:51:12.680
with one millisecond latency.
link |
00:51:14.880
And people are starting to think about the smart world
link |
00:51:17.120
where everything knows you, recognizes you.
link |
00:51:23.200
The transformations are going to be unpredictable.
link |
00:51:27.440
How does it make you feel that you're
link |
00:51:29.560
one of the key architects of this kind of future?
link |
00:51:35.200
So we're not talking about the architects
link |
00:51:37.160
of the high level people who build the Angry Birds apps,
link |
00:51:42.320
and Snapchat.
link |
00:51:43.880
Angry Birds apps.
link |
00:51:44.720
Who knows?
link |
00:51:45.240
Maybe that's the whole point of the universe.
link |
00:51:47.120
I'm going to take a stand on that,
link |
00:51:48.840
and the attention distracting nature of mobile phones.
link |
00:51:52.800
I'll take a stand.
link |
00:51:53.760
But anyway, in terms of the side effects of smartphones,
link |
00:52:01.240
or the attention distraction, which part?
link |
00:52:03.680
Well, who knows where this is all leading?
link |
00:52:06.120
It's changing so fast.
link |
00:52:08.200
My parents used to yell at my sisters
link |
00:52:09.720
for hiding in the closet with a wired phone with a dial on it.
link |
00:52:13.120
Stop talking to your friends all day.
link |
00:52:15.840
Now my wife yells at my kids for talking to their friends
link |
00:52:18.640
all day on text.
link |
00:52:20.480
It looks the same to me.
link |
00:52:21.760
It's always echoes of the same thing.
link |
00:52:23.560
But you are one of the key people
link |
00:52:26.640
architecting the hardware of this future.
link |
00:52:29.120
How does that make you feel?
link |
00:52:30.520
Do you feel responsible?
link |
00:52:33.560
Do you feel excited?
link |
00:52:36.040
So we're in a social context.
link |
00:52:38.080
So there's billions of people on this planet.
link |
00:52:40.920
There are literally millions of people working on technology.
link |
00:52:45.320
I feel lucky to be doing what I do and getting paid for it,
link |
00:52:50.840
and there's an interest in it.
link |
00:52:52.800
But there's so many things going on in parallel.
link |
00:52:56.480
The actions are so unpredictable.
link |
00:52:58.360
If I wasn't here, somebody else would do it.
link |
00:53:01.200
The vectors of all these different things
link |
00:53:03.400
are happening all the time.
link |
00:53:06.120
You know, there's a, I'm sure, some philosopher
link |
00:53:10.240
or metaphilosopher is wondering about how
link |
00:53:12.600
we transform our world.
link |
00:53:16.200
So you can't deny the fact that these tools are
link |
00:53:22.960
changing our world.
link |
00:53:24.440
That's right.
link |
00:53:25.320
Do you think it's changing for the better?
link |
00:53:29.640
I read this thing recently.
link |
00:53:31.280
It said the two disciplines with the highest GRE scores in college
link |
00:53:36.280
are physics and philosophy.
link |
00:53:39.560
And they're both sort of trying to answer the question,
link |
00:53:41.880
why is there anything?
link |
00:53:43.960
And the philosophers are on the kind of theological side,
link |
00:53:47.680
and the physicists are obviously on the material side.
link |
00:53:52.640
And there's 100 billion galaxies with 100 billion stars.
link |
00:53:56.920
It seems, well, repetitive at best.
link |
00:54:01.000
So you know, we're on our way to 10 billion people.
link |
00:54:06.240
I mean, it's hard to say what it's all for,
link |
00:54:08.160
if that's what you're asking.
link |
00:54:09.560
Yeah, I guess I am.
link |
00:54:11.280
Things do tend to significantly increase in complexity.
link |
00:54:16.240
And I'm curious about how computation,
link |
00:54:21.280
like our physical world inherently
link |
00:54:24.480
generates mathematics.
link |
00:54:25.880
It's kind of obvious, right?
link |
00:54:26.920
So we have x, y, z coordinates.
link |
00:54:28.640
You take a sphere, you make it bigger.
link |
00:54:30.120
You get a surface that grows by r squared.
link |
00:54:34.040
Like, it generally generates mathematics.
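The sphere example can be checked directly: surface area goes as r squared, so doubling the radius quadruples the surface.

```python
import math

def sphere_surface(r):
    # Surface area of a sphere of radius r: 4 * pi * r^2
    return 4.0 * math.pi * r ** 2

# Doubling the radius quadruples the surface: area scales as r squared.
ratio = sphere_surface(2.0) / sphere_surface(1.0)
print(ratio)  # 4.0
```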
link |
00:54:36.360
And the mathematicians and the physicists
link |
00:54:38.720
have been having a lot of fun talking to each other for years.
link |
00:54:41.280
And computation has been, let's say, relatively pedestrian.
link |
00:54:46.080
Like, computation in terms of mathematics
link |
00:54:48.520
has been doing binary algebra, while those guys have
link |
00:54:52.760
been gallivanting through the other realms of possibility.
link |
00:54:58.040
Now recently, the computation lets
link |
00:55:01.200
you do mathematical computations that
link |
00:55:04.880
are sophisticated enough that nobody understands
link |
00:55:07.520
how the answers came out.
link |
00:55:10.000
Machine learning.
link |
00:55:10.760
Machine learning.
link |
00:55:12.000
It used to be you get data set, you guess at a function.
link |
00:55:16.800
The function is considered physics
link |
00:55:18.920
if it's predictive of new functions, new data sets.
link |
00:55:23.000
Modern, you can take a large data set
link |
00:55:28.320
with no intuition about what it is
link |
00:55:29.920
and use machine learning to find a pattern that
link |
00:55:31.960
has no function, right?
link |
00:55:34.240
And it can arrive at results that I
link |
00:55:37.160
don't know if they're completely mathematically describable.
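A minimal sketch of "finding a pattern with no function": a nearest-neighbor predictor (toy data, hypothetical names) makes predictions from stored examples alone, with no closed-form formula ever written down.

```python
# A minimal illustration of prediction without an explicit function:
# nearest-neighbor regression predicts purely from stored (x, y) pairs.
def knn_predict(data, x, k=3):
    """Predict y at x by averaging the k nearest stored points."""
    nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# The data happens to come from y = x*x, but the predictor never sees
# that form; it only interpolates from examples.
data = [(x, x * x) for x in range(-5, 6)]
print(knn_predict(data, 2.5, k=2))  # 6.5, near the true 2.5**2 = 6.25
```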
link |
00:55:39.920
So computation has kind of done something interesting compared
link |
00:55:44.560
to a equal b plus c.
link |
00:55:47.160
There's something reminiscent of that step
link |
00:55:49.640
from the basic operations of addition
link |
00:55:54.760
to taking a step towards neural networks that's
link |
00:55:56.880
reminiscent of what life on Earth at its origins was doing.
link |
00:56:01.040
Do you think we're creating sort of the next step
link |
00:56:03.440
in our evolution in creating artificial intelligence
link |
00:56:06.520
systems that will?
link |
00:56:07.920
I don't know.
link |
00:56:08.680
I mean, there's so much in the universe already,
link |
00:56:11.040
it's hard to say.
link |
00:56:12.560
Where we stand in this whole thing.
link |
00:56:14.000
Are human beings working on additional abstraction
link |
00:56:17.000
layers and possibilities?
link |
00:56:18.480
Yeah, it appears so.
link |
00:56:20.280
Does that mean that human beings don't need dogs?
link |
00:56:22.960
You know, no.
link |
00:56:24.120
Like, there's so many things that
link |
00:56:26.120
are all simultaneously interesting and useful.
link |
00:56:30.400
Well, you've seen, throughout your career,
link |
00:56:32.480
you've seen greater and greater level abstractions built
link |
00:56:35.720
in artificial machines, right?
link |
00:56:39.520
Do you think, when you look at humans,
link |
00:56:41.280
do you think of the whole of life on Earth
link |
00:56:44.040
is a single organism building this thing,
link |
00:56:46.880
this machine with greater and greater levels of abstraction?
link |
00:56:49.880
Do you think humans are the peak,
link |
00:56:52.720
the top of the food chain in this long arc of history
link |
00:56:57.400
on Earth?
link |
00:56:58.440
Or do you think we're just somewhere in the middle?
link |
00:57:00.600
Are we the basic functional operations of a CPU?
link |
00:57:05.280
Are we the C++ program, the Python program,
link |
00:57:09.280
or the neural network?
link |
00:57:10.480
Like, somebody's, you know, people
link |
00:57:12.200
have calculated, like, how many operations does the brain do?
link |
00:57:14.920
Something, you know, I've seen the number 10 to the 18th
link |
00:57:17.680
a bunch of times, arrived at different ways.
link |
00:57:20.600
So could you make a computer that
link |
00:57:22.080
did 10 to the 20th operations?
link |
00:57:23.760
Yes.
link |
00:57:24.360
Sure.
link |
00:57:24.880
Do you think?
link |
00:57:25.720
We're going to do that.
link |
00:57:27.040
Now, is there something magical about how brains compute things?
link |
00:57:31.640
I don't know.
link |
00:57:32.960
You know, my personal experience is interesting,
link |
00:57:35.240
because, you know, you think you know how you think,
link |
00:57:37.760
and then you have all these ideas,
link |
00:57:39.160
and you can't figure out how they happened.
link |
00:57:41.520
And if you meditate, you know, what you can be aware of
link |
00:57:47.040
is interesting.
link |
00:57:48.760
So I don't know if brains are magical or not.
link |
00:57:51.720
You know, the physical evidence says no.
link |
00:57:54.800
Lots of people's personal experience says yes.
link |
00:57:57.840
So what would be funny is if brains are magical,
link |
00:58:01.280
and yet we can make brains with more computation.
link |
00:58:04.600
You know, I don't know what to say about that.
link |
00:58:07.080
But do you think magic is an emergent phenomena?
link |
00:58:11.080
Could be.
link |
00:58:12.080
I have no explanation for it.
link |
00:58:13.840
Let me ask you, Jim Keller: what in your view is consciousness?
link |
00:58:19.240
With consciousness?
link |
00:58:20.640
Yeah, like what, you know, consciousness, love,
link |
00:58:25.520
things that are these deeply human things that
link |
00:58:27.960
seems to emerge from our brain, is that something
link |
00:58:30.280
that we'll be able to make encode in chips that get
link |
00:58:36.280
faster and faster and faster and faster?
link |
00:58:38.120
That's like a 10 hour conversation.
link |
00:58:40.160
Nobody really knows.
link |
00:58:41.000
Can you summarize it in a couple of sentences?
link |
00:58:45.320
Many people have observed that organisms run
link |
00:58:48.840
at lots of different levels, right?
link |
00:58:51.320
If you had two neurons, somebody said
link |
00:58:52.840
you'd have one sensory neuron and one motor neuron, right?
link |
00:58:56.880
So we move towards things and away from things.
link |
00:58:58.800
And we have physical integrity and safety or not, right?
link |
00:59:03.200
And then if you look at the animal kingdom,
link |
00:59:05.680
you can see brains that are a little more complicated.
link |
00:59:08.320
And at some point, there's a planning system.
link |
00:59:10.320
And then there's an emotional system
link |
00:59:11.960
that's happy about being safe or unhappy about being threatened.
link |
00:59:17.240
And then our brains have massive numbers of structures,
link |
00:59:21.920
like planning and movement and thinking and feeling
link |
00:59:25.680
and drives and emotions.
link |
00:59:27.920
And we seem to have multiple layers of thinking systems.
link |
00:59:31.160
And we have a dream system that nobody understands whatsoever,
link |
00:59:35.240
which I find completely hilarious.
link |
00:59:37.520
And you can think in a way that those systems are
link |
00:59:44.480
more independent.
link |
00:59:45.720
And you can observe the different parts of yourself
link |
00:59:47.880
can observe them.
link |
00:59:49.600
I don't know which one's magical.
link |
00:59:51.440
I don't know which one's not computational.
link |
00:59:55.360
So.
link |
00:59:56.800
Is it possible that it's all computation?
link |
00:59:58.880
Probably.
link |
01:00:00.120
Is there a limit to computation?
link |
01:00:01.560
I don't think so.
link |
01:00:03.200
Do you think the universe is a computer?
link |
01:00:06.240
It seems to be.
link |
01:00:07.480
It's a weird kind of computer.
link |
01:00:09.600
Because if it was a computer, like when
link |
01:00:13.120
they do calculations on how much calculation
link |
01:00:16.560
it takes to describe quantum effects, it's unbelievably high.
link |
01:00:20.960
So if it was a computer, wouldn't you
link |
01:00:22.560
have built it out of something that was easier to compute?
link |
01:00:26.240
That's a funny system.
link |
01:00:29.560
But then the simulation guys pointed out
link |
01:00:31.320
that the rules are kind of interesting.
link |
01:00:32.920
When you look really close, it's uncertain.
link |
01:00:35.160
And the speed of light says you can only look so far.
link |
01:00:37.720
And things can't be simultaneous,
link |
01:00:39.200
except for the odd entanglement problem where they seem to be.
link |
01:00:42.760
The rules are all kind of weird.
link |
01:00:45.120
And somebody said physics is like having
link |
01:00:47.960
50 equations with 50 variables to define 50 variables.
link |
01:00:55.440
Physics itself has been a shit show for thousands of years.
link |
01:00:59.080
It seems odd when you get to the corners of everything.
link |
01:01:02.040
It's either uncomputable or undefinable or uncertain.
link |
01:01:07.240
It's almost like the designers of the simulation
link |
01:01:09.360
are trying to prevent us from understanding it perfectly.
link |
01:01:12.840
But also, the things that require calculations
link |
01:01:16.160
require so much calculation that our idea
link |
01:01:18.480
of the universe of a computer is absurd,
link |
01:01:20.840
because every single little bit of it
link |
01:01:23.160
takes all the computation in the universe to figure out.
link |
01:01:26.640
So that's a weird kind of computer.
link |
01:01:28.400
You say the simulation is running
link |
01:01:29.760
in a computer, which has, by definition, infinite computation.
link |
01:01:34.520
Not infinite.
link |
01:01:35.440
Oh, you mean if the universe is infinite?
link |
01:01:37.680
Yeah.
link |
01:01:38.200
Well, every little piece of our universe
link |
01:01:40.720
seems to take infinite computation to figure out.
link |
01:01:43.240
Not infinite, just a lot.
link |
01:01:44.240
Well, a lot.
link |
01:01:44.840
Some pretty big number.
link |
01:01:46.040
To compute this little teeny spot takes all the mass
link |
01:01:50.320
in the local one light year by one light year space.
link |
01:01:53.440
It's close enough to infinite.
link |
01:01:54.960
Well, it's a heck of a computer if it is one.
link |
01:01:56.840
I know.
link |
01:01:57.520
It's a weird description, because the simulation
link |
01:02:01.040
description seems to break when you look closely at it.
link |
01:02:04.880
But the rules of the universe seem to imply something's up.
link |
01:02:08.800
That seems a little arbitrary.
link |
01:02:10.880
The universe, the whole thing, the laws of physics,
link |
01:02:14.920
it just seems like, how did it come out to be the way it is?
link |
01:02:20.120
Well, lots of people talk about that.
link |
01:02:22.640
Like I said, the two smartest groups of humans
link |
01:02:24.440
are working on the same problem.
link |
01:02:26.120
From different aspects.
link |
01:02:27.120
And they're both complete failures.
link |
01:02:29.560
So that's kind of cool.
link |
01:02:32.160
They might succeed eventually.
link |
01:02:34.800
Well, after 2,000 years, the trend isn't good.
link |
01:02:37.680
Oh, 2,000 years is nothing in the span
link |
01:02:39.640
of the history of the universe.
link |
01:02:40.920
That's for sure.
link |
01:02:41.560
We have some time.
link |
01:02:42.800
But the next 1,000 years doesn't look good either.
link |
01:02:46.720
That's what everybody says at every stage.
link |
01:02:48.360
But with Moore's law, as you've just described,
link |
01:02:50.840
not being dead, the exponential growth of technology,
link |
01:02:54.680
the future seems pretty incredible.
link |
01:02:57.360
Well, it'll be interesting, that's for sure.
link |
01:02:59.160
That's right.
link |
01:03:00.120
So what are your thoughts on Ray Kurzweil's sense
link |
01:03:03.640
that exponential improvement in technology
link |
01:03:05.640
will continue indefinitely?
link |
01:03:07.120
Is that how you see Moore's law?
link |
01:03:09.920
Do you see Moore's law more broadly,
link |
01:03:12.720
in the sense that technology of all kinds
link |
01:03:15.960
has a way of stacking S curves on top of each other,
link |
01:03:20.320
where it'll be exponential, and then we'll see all kinds of...
link |
01:03:24.440
What does an exponential of a million mean?
link |
01:03:27.600
That's a pretty amazing number.
link |
01:03:29.400
And that's just for a local little piece of silicon.
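For scale, a million-fold improvement is about twenty doublings; at a hypothetical Moore's-law cadence of one doubling every two years, that is roughly forty years of compounding:

```python
import math

# A million-fold improvement expressed as Moore's-law doublings.
doublings = math.log2(1_000_000)
print(round(doublings, 1))       # ~19.9 doublings
# At a (hypothetical) doubling every two years:
print(round(2 * doublings, 1))   # ~39.9 years
```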
link |
01:03:32.160
Now let's imagine you, say, decided
link |
01:03:35.080
to get 1,000 tons of silicon to collaborate in one computer
link |
01:03:41.520
at a million times the density.
link |
01:03:44.720
Now you're talking, I don't know, 10 to the 20th more
link |
01:03:47.840
computation power than our current, already unbelievably
link |
01:03:51.720
fast computers.
link |
01:03:54.200
Nobody knows what that's going to mean.
link |
01:03:55.760
The sci-fi guys call it computronium,
link |
01:03:58.960
like when a local civilization turns the nearby star
link |
01:04:02.720
into a computer.
link |
01:04:05.120
I don't know if that's true, but...
link |
01:04:06.720
So just even when you shrink a transistor, the...
link |
01:04:11.520
That's only one dimension.
link |
01:04:12.560
The ripple effects of that.
link |
01:04:14.280
People tend to think about computers as a cost problem.
link |
01:04:17.600
So computers are made out of silicon and minor amounts
link |
01:04:20.560
of metals and this and that.
link |
01:04:24.800
None of those things cost any money.
link |
01:04:27.520
There's plenty of sand.
link |
01:04:30.080
You could just turn the beach and a little bit of ocean water
link |
01:04:32.320
into computers.
link |
01:04:33.360
So all the cost is in the equipment to do it.
link |
01:04:36.720
And the trend on equipment is once you
link |
01:04:39.120
figure out how to build the equipment,
link |
01:04:40.640
the trend of cost is zero.
link |
01:04:41.800
Elon said, first you figure out what
link |
01:04:44.160
configuration you want the atoms in,
link |
01:04:47.560
and then how to put them there.
link |
01:04:50.320
His great insight is people are "how" constrained.
link |
01:04:56.480
I have this thing, I know how it works,
link |
01:04:58.720
and then little tweaks to that will generate something,
link |
01:05:02.320
as opposed to what do I actually want,
link |
01:05:05.160
and then figure out how to build it.
link |
01:05:07.080
It's a very different mindset.
link |
01:05:09.280
And almost nobody has it, obviously.
link |
01:05:12.840
Well, let me ask on that topic,
link |
01:05:15.760
you were one of the key early people
link |
01:05:18.080
in the development of autopilot, at least in the hardware
link |
01:05:21.040
side, Elon Musk believes that autopilot
link |
01:05:24.480
and vehicle autonomy, if you just look at that problem,
link |
01:05:26.720
can follow this kind of exponential improvement.
link |
01:05:29.480
In terms of the how question that we're talking about,
link |
01:05:32.600
there's no reason why you can't.
link |
01:05:34.680
What are your thoughts on this particular space
link |
01:05:37.320
of vehicle autonomy, and your part of it
link |
01:05:42.320
and Elon Musk's and Tesla's vision for vehicle autonomy?
link |
01:05:45.280
Well, the computer you need to build is straightforward.
link |
01:05:48.760
And you could argue, well, does it need to be
link |
01:05:51.160
two times faster or five times or 10 times?
link |
01:05:54.520
But that's just a matter of time or price in the short run.
link |
01:05:58.440
So that's not a big deal.
link |
01:06:00.240
You don't have to be especially smart to drive a car.
link |
01:06:03.280
So it's not like a super hard problem.
link |
01:06:05.720
I mean, the big problem with safety is attention,
link |
01:06:07.960
which computers are really good at, not skills.
link |
01:06:11.120
Well, let me push back on one.
link |
01:06:15.280
You see, everything you said is correct,
link |
01:06:17.160
but we as humans tend to take for granted
link |
01:06:24.320
how incredible our vision system is.
link |
01:06:26.880
So you can drive a car with 20/50 vision,
link |
01:06:30.640
and you can train a neural network to extract
link |
01:06:33.080
the distance of any object and the shape of any surface
link |
01:06:36.480
from a video and data.
link |
01:06:38.560
Yeah, but that's really simple.
link |
01:06:40.200
No, it's not simple.
link |
01:06:42.120
That's a simple data problem.
link |
01:06:44.400
It's not, it's not simple.
link |
01:06:46.320
It's because it's not just detecting objects,
link |
01:06:50.480
it's understanding the scene,
link |
01:06:52.280
and it's being able to do it in a way
link |
01:06:54.320
that doesn't make errors.
link |
01:06:56.600
So the beautiful thing about the human vision system
link |
01:07:00.040
and our entire brain around the whole thing
link |
01:07:02.600
is we're able to fill in the gaps.
link |
01:07:05.520
It's not just about perfectly detecting cars.
link |
01:07:08.200
It's inferring the occluded cars.
link |
01:07:09.960
It's trying to, it's understanding the physics.
link |
01:07:12.400
I think that's mostly a data problem.
link |
01:07:14.600
So you think it's a data problem that would improve
link |
01:07:17.680
with improvement of computation
link |
01:07:19.220
with improvement in collection of data?
link |
01:07:20.800
Well, there is a, you know, when you're driving a car
link |
01:07:22.640
and somebody cuts you off, your brain has theories
link |
01:07:24.760
about why they did it.
link |
01:07:26.160
You know, they're a bad person, they're distracted,
link |
01:07:28.640
they're dumb, you know, you can listen to yourself, right?
link |
01:07:32.820
So, you know, if you think that narrative is important
link |
01:07:37.040
to be able to successfully drive a car,
link |
01:07:38.840
then current autopilot systems can't do it.
link |
01:07:41.640
But if cars are ballistic things with tracks
link |
01:07:44.360
and probabilistic changes of speed and direction,
link |
01:07:47.320
and roads are fixed and given, by the way,
link |
01:07:50.200
they don't change dynamically, right?
link |
01:07:53.280
You can map the world really thoroughly.
link |
01:07:56.320
You can place every object really thoroughly.
link |
01:08:01.040
Right, you can calculate trajectories
link |
01:08:03.040
of things really thoroughly, right?
link |
01:08:06.400
But everything you said about really thoroughly
link |
01:08:09.840
has a different degree of difficulty, so.
link |
01:08:13.120
And you could say at some point,
link |
01:08:15.080
computer autonomous systems will be way better
link |
01:08:17.640
at things that humans are lousy at.
link |
01:08:20.040
Like, they'll be better at attention,
link |
01:08:22.480
they'll always remember there was a pothole in the road
link |
01:08:25.040
that humans keep forgetting about,
link |
01:08:27.360
they'll remember that this set of roads
link |
01:08:29.440
has these weirdo lines on it
link |
01:08:31.200
that the computers figured out once,
link |
01:08:32.800
and especially if they get updates,
link |
01:08:35.160
so if somebody changes a given,
link |
01:08:38.000
like, the key to robots and stuff somebody said
link |
01:08:41.280
is to maximize the givens, right?
link |
01:08:44.360
Right.
link |
01:08:45.200
So having a robot pick up this bottle cap
link |
01:08:47.960
is way easier if you put a red dot on the top,
link |
01:08:51.000
because then you don't have to figure it out,
link |
01:08:52.680
and if you wanna do a certain thing with it,
link |
01:08:54.840
maximize the givens is the thing.
link |
01:08:57.160
And autonomous systems are happily maximizing the givens.
link |
01:09:01.040
Like, humans, when you drive someplace new,
link |
01:09:04.160
you remember it, because you're processing it
link |
01:09:06.200
the whole time, and after the 50th time you drove to work,
link |
01:09:08.920
you get to work, you don't know how you got there, right?
link |
01:09:11.480
You're on autopilot, right?
link |
01:09:14.840
Autonomous cars are always on autopilot.
link |
01:09:17.800
But the cars have no theories about why they got cut off,
link |
01:09:20.360
or why they're in traffic.
link |
01:09:22.140
So they also never stop paying attention.
link |
01:09:24.720
Right, so I tend to believe you do have to have theories,
link |
01:09:28.000
meta models of other people,
link |
01:09:30.000
especially with pedestrian cyclists,
link |
01:09:31.420
but also with other cars.
link |
01:09:32.840
So everything you said is actually essential to driving.
link |
01:09:38.920
Driving is a lot more complicated than people realize,
link |
01:09:41.760
I think, so to push back slightly, but to...
link |
01:09:44.640
So to cut into traffic, right?
link |
01:09:46.480
Yep.
link |
01:09:47.320
You can't just wait for a gap,
link |
01:09:48.460
you have to be somewhat aggressive.
link |
01:09:50.280
You'll be surprised how simple a calculation for that is.
link |
01:09:53.840
I may be on that particular point,
link |
01:09:55.540
but there's, maybe I actually have to push back.
link |
01:10:00.360
I would be surprised.
link |
01:10:01.640
You know what, yeah, I'll just say where I stand.
link |
01:10:03.080
I would be very surprised,
link |
01:10:04.280
but I think you might be surprised how complicated it is.
link |
01:10:10.080
I tell people, progress disappoints in the short run,
link |
01:10:12.640
and surprises in the long run.
link |
01:10:13.960
It's very possible, yeah.
link |
01:10:15.600
I suspect in 10 years it'll be just taken for granted.
link |
01:10:19.000
Yeah, probably.
link |
01:10:19.880
But you're probably right, not look like...
link |
01:10:22.080
It's gonna be a $50 solution that nobody cares about.
link |
01:10:25.080
It's like GPSes, like, wow, GPSes.
link |
01:10:27.280
We have satellites in space
link |
01:10:29.460
that tell you where your location is.
link |
01:10:31.120
It was a really big deal, now everything has a GPS in it.
link |
01:10:33.480
Yeah, that's true, but I do think that systems
link |
01:10:36.040
that involve human behavior are more complicated
link |
01:10:39.880
than we give them credit for.
link |
01:10:40.820
So we can do incredible things with technology
link |
01:10:43.520
that don't involve humans, but when you...
link |
01:10:45.560
I think humans are less complicated than people,
link |
01:10:48.440
you know, frequently ascribe.
link |
01:10:50.560
Maybe I feel...
link |
01:10:51.400
We tend to operate out of large numbers of patterns
link |
01:10:53.720
and just keep doing it over and over.
link |
01:10:55.820
But I can't trust you because you're a human.
link |
01:10:58.040
That's something a human would say.
link |
01:11:00.760
But my hope is on the point you've made is,
link |
01:11:04.600
even if, no matter who's right,
link |
01:11:08.840
I'm hoping that there's a lot of things
link |
01:11:10.660
that humans aren't good at
link |
01:11:11.880
that machines are definitely good at,
link |
01:11:13.460
like you said, attention and things like that.
link |
01:11:15.640
Well, they'll be so much better
link |
01:11:17.680
that the overall picture of safety and autonomy
link |
01:11:21.000
will be, obviously cars will be safer,
link |
01:11:22.880
even if they're not as good at understanding.
link |
01:11:24.720
I'm a big believer in safety.
link |
01:11:26.400
I mean, there are already the current safety systems,
link |
01:11:29.640
like cruise control that doesn't let you run into people
link |
01:11:32.040
and lane keeping.
link |
01:11:33.360
There are so many features
link |
01:11:34.680
that you just look at the parade of accidents
link |
01:11:37.760
and knocking off like 80% of them is super doable.
link |
01:11:42.480
Just to linger on the autopilot team
link |
01:11:44.680
and the efforts there,
link |
01:11:48.000
it seems to be that there's a very intense scrutiny
link |
01:11:51.720
by the media and the public in terms of safety,
link |
01:11:54.320
the pressure, the bar put before autonomous vehicles.
link |
01:11:58.000
What are your, sort of as a person there
link |
01:12:01.760
working on the hardware and trying to build a system
link |
01:12:03.900
that builds a safe vehicle and so on,
link |
01:12:07.240
what was your sense about that pressure?
link |
01:12:08.960
Is it unfair?
link |
01:12:09.920
Is it expected of new technology?
link |
01:12:12.320
Yeah, it seems reasonable.
link |
01:12:13.540
I was interested, I talked to both American
link |
01:12:15.440
and European regulators,
link |
01:12:17.280
and I was worried that the regulations
link |
01:12:21.240
would write into the rules technology solutions,
link |
01:12:25.120
like modern brake systems imply hydraulic brakes.
link |
01:12:30.040
So if you read the regulations,
link |
01:12:32.160
to meet the letter of the law for brakes,
link |
01:12:35.100
it sort of has to be hydraulic, right?
link |
01:12:37.800
And the regulator said they're interested in the use cases,
link |
01:12:42.060
like a head on crash, an offset crash,
link |
01:12:44.360
don't hit pedestrians, don't run into people,
link |
01:12:47.100
don't leave the road, don't run a red light or a stoplight.
link |
01:12:50.400
They were very much into the scenarios.
link |
01:12:53.160
And they had all the data about which scenarios
link |
01:12:56.920
injured or killed the most people.
link |
01:12:59.320
And for the most part, those conversations were like,
link |
01:13:04.040
what's the right thing to do to take the next step?
link |
01:13:08.800
Now, Elon's very interested also in the benefits
link |
01:13:12.000
of autonomous driving or freeing people's time
link |
01:13:14.160
and attention, as well as safety.
link |
01:13:18.600
And I think that's also an interesting thing,
link |
01:13:20.340
but building autonomous systems so they're safe
link |
01:13:25.160
and safer than people seemed,
link |
01:13:27.400
since the goal is to be 10X safer than people,
link |
01:13:30.160
having the bar to be safer than people
link |
01:13:32.200
and scrutinizing accidents seems philosophically correct.
link |
01:13:39.260
So I think that's a good thing.
link |
01:13:41.000
What is different, compared to the things you worked on at,
link |
01:13:46.000
Intel, AMD, Apple, with autopilot chip design
link |
01:13:51.600
and hardware design, what are interesting
link |
01:13:54.320
or challenging aspects of building this specialized
link |
01:13:56.680
kind of computing system in the automotive space?
link |
01:14:00.300
I mean, there's two tricks to building
link |
01:14:01.640
like an automotive computer.
link |
01:14:02.780
One is the software team, the machine learning team
link |
01:14:07.320
is developing algorithms that are changing fast.
link |
01:14:10.640
So as you're building the accelerator,
link |
01:14:14.280
you have this, you know, worry or intuition
link |
01:14:16.920
that the algorithms will change enough
link |
01:14:18.520
that the accelerator will be the wrong one, right?
link |
01:14:22.640
And there's the generic thing, which is,
link |
01:14:25.000
if you build a really good general purpose computer,
link |
01:14:27.240
say its performance is one, and then GPU guys
link |
01:14:31.440
will deliver about 5X the performance
link |
01:14:34.280
for the same amount of silicon,
link |
01:14:35.720
because instead of discovering parallelism,
link |
01:14:37.640
you're given parallelism.
link |
01:14:39.240
And then special accelerators get another two to 5X
link |
01:14:43.720
on top of a GPU, because you say,
link |
01:14:46.040
I know the math is always eight bit integers
link |
01:14:49.040
into 32 bit accumulators, and the operations
link |
01:14:52.200
are the subset of mathematical possibilities.
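The narrow math Jim describes, eight-bit integers multiplied and summed into 32-bit accumulators, can be sketched as follows. This is a hypothetical illustration of the general idea, not any particular chip's design:

```python
import numpy as np

def int8_dot(a: np.ndarray, b: np.ndarray) -> np.int32:
    """Dot product of two int8 vectors with an int32 accumulator.

    Each int8 * int8 product fits in 16 bits, so an int32
    accumulator can safely sum tens of thousands of products
    before overflow -- the kind of fixed guarantee a specialized
    accelerator exploits instead of general floating point.
    """
    assert a.dtype == np.int8 and b.dtype == np.int8
    return np.sum(a.astype(np.int32) * b.astype(np.int32), dtype=np.int32)

a = np.array([127, -128, 3], dtype=np.int8)
b = np.array([2, 2, 2], dtype=np.int8)
print(int8_dot(a, b))  # 254 - 256 + 6 = 4
```

Because the datatypes and operations are fixed up front, the hardware doing this needs no floating-point units and far less silicon per operation, which is where the claimed gain over a GPU comes from.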
link |
01:14:55.200
So AI accelerators have a claimed performance benefit
link |
01:15:00.920
over GPUs because in the narrow math space,
link |
01:15:05.080
you're nailing the algorithm.
link |
01:15:07.100
Now, you still try to make it programmable,
link |
01:15:10.040
but the AI field is changing really fast.
link |
01:15:13.280
So there's a, you know, there's a little
link |
01:15:15.760
creative tension there of, I want the acceleration
link |
01:15:18.520
afforded by specialization without being over specialized
link |
01:15:22.160
so that the new algorithm is so much more effective
link |
01:15:25.600
that you'd have been better off on a GPU.
link |
01:15:27.960
So there's a tension there.
link |
01:15:30.000
To build a good computer for an application
link |
01:15:33.000
like automotive, there's all kinds of sensor inputs
link |
01:15:36.240
and safety processors and a bunch of stuff.
link |
01:15:39.120
So one of Elon's goals is to make it super affordable.
link |
01:15:42.240
So every car gets an autopilot computer.
link |
01:15:44.840
So some of the recent startups you look at,
link |
01:15:46.520
and they have a server in the trunk,
link |
01:15:48.360
because they're saying, I'm gonna build
link |
01:15:49.680
this autopilot computer that replaces the driver.
link |
01:15:52.540
So their cost budget's 10 or $20,000.
link |
01:15:55.240
And Elon's constraint was, I'm gonna put one in every car,
link |
01:15:58.780
whether people buy autonomous driving or not.
link |
01:16:01.720
So the cost constraint he had in mind was great, right?
link |
01:16:05.260
And to hit that, you had to think about the system design.
link |
01:16:08.400
That's complicated, and it's fun.
link |
01:16:09.880
You know, it's like, it's like, it's craftsman's work.
link |
01:16:12.560
Like, you know, a violin maker, right?
link |
01:16:14.240
You can say, Stradivarius is this incredible thing,
link |
01:16:16.800
the musicians are incredible.
link |
01:16:18.480
But the guy making the violin, you know,
link |
01:16:20.480
picked wood and sanded it, and then he cut it,
link |
01:16:24.000
you know, and he glued it, you know,
link |
01:16:25.960
and he waited for the right day
link |
01:16:27.920
so that when he put the finish on it,
link |
01:16:29.520
it didn't, you know, do something dumb.
link |
01:16:31.640
That's craftsman's work, right?
link |
01:16:33.880
You may be a genius craftsman
link |
01:16:35.520
because you have the best techniques
link |
01:16:36.840
and you discover a new one,
link |
01:16:38.840
but most engineers, craftsman's work.
link |
01:16:41.960
And humans really like to do that.
link |
01:16:44.320
You know the expression?
link |
01:16:45.140
Smart humans.
link |
01:16:45.980
No, everybody.
link |
01:16:46.820
All humans.
link |
01:16:47.660
I don't know.
link |
01:16:48.480
I used to, I dug ditches when I was in college.
link |
01:16:50.360
I got really good at it.
link |
01:16:51.440
Satisfying.
link |
01:16:52.620
Yeah.
link |
01:16:53.460
So.
link |
01:16:54.280
Digging ditches is also craftsman's work.
link |
01:16:55.480
Yeah, of course.
link |
01:16:56.960
So there's an expression called complex mastery behavior.
link |
01:17:00.920
So when you're learning something,
link |
01:17:02.080
that's fine, because you're learning something.
link |
01:17:04.080
When you do something, it's relatively simple.
link |
01:17:05.760
It's not that satisfying.
link |
01:17:06.700
But if the steps that you have to do are complicated
link |
01:17:10.360
and you're good at them, it's satisfying to do them.
link |
01:17:14.640
And then if you're intrigued by it all,
link |
01:17:16.880
as you're doing them, you sometimes learn new things
link |
01:17:19.520
that you can raise your game.
link |
01:17:21.600
But craftsman's work is good.
link |
01:17:23.760
And engineers, like engineering is complicated enough
link |
01:17:27.080
that you have to learn a lot of skills.
link |
01:17:28.800
And then a lot of what you do is then craftsman's work,
link |
01:17:32.360
which is fun.
link |
01:17:33.480
Autonomous driving, building a very resource
link |
01:17:37.040
constrained computer.
link |
01:17:37.880
So a computer has to be cheap enough
link |
01:17:39.520
to put in every single car.
link |
01:17:41.100
That essentially boils down to craftsman's work.
link |
01:17:45.040
It's engineering, it's innovation.
link |
01:17:45.880
Yeah, you know, there's thoughtful decisions
link |
01:17:47.680
and problems to solve and trade offs to make.
link |
01:17:50.560
Do you need 10 camera ports or eight?
link |
01:17:52.480
You know, are you building for the current car
link |
01:17:54.520
or the next one.
link |
01:17:56.000
You know, how do you do the safety stuff?
link |
01:17:57.880
You know, there's a whole bunch of details.
link |
01:18:00.600
But it's fun.
link |
01:18:01.440
It's not like I'm building a new type of neural network,
link |
01:18:04.760
which has new mathematics and needs a new computer to work.
link |
01:18:08.040
You know, that's like, there's more invention than that.
link |
01:18:12.400
But the reduction to practice,
link |
01:18:14.120
once you pick the architecture, you look inside
link |
01:18:16.120
and what do you see?
link |
01:18:17.080
Adders and multipliers and memories and, you know,
link |
01:18:20.360
the basics.
link |
01:18:21.200
So computers are always this weird set of abstraction layers
link |
01:18:25.640
of ideas and thinking that reduction to practice
link |
01:18:29.360
is transistors and wires and, you know, pretty basic stuff.
link |
01:18:33.800
And that's an interesting phenomenon.
link |
01:18:37.080
By the way, like factory work,
link |
01:18:38.800
like lots of people think factory work
link |
01:18:40.600
is rote assembly stuff.
link |
01:18:42.280
I've been on the assembly line.
link |
01:18:44.160
Like the people who work there really like it.
link |
01:18:46.280
It's a really great job.
link |
01:18:47.880
It's really complicated.
link |
01:18:48.760
Putting cars together is hard, right?
link |
01:18:50.920
And the car is moving and the parts are moving
link |
01:18:53.440
and sometimes the parts are damaged
link |
01:18:55.000
and you have to coordinate putting all the stuff together
link |
01:18:57.560
and people are good at it.
link |
01:18:59.080
They're good at it.
link |
01:19:00.360
And I remember one day I went to work
link |
01:19:01.760
and the line was shut down for some reason
link |
01:19:03.920
and some of the guys sitting around were really bummed
link |
01:19:06.760
because they had reorganized a bunch of stuff
link |
01:19:09.240
and they were gonna hit a new record
link |
01:19:10.720
for the number of cars built that day.
link |
01:19:12.720
And they were all gung ho to do it.
link |
01:19:14.160
And these were big, tough buggers.
link |
01:19:15.680
And, you know, but what they did was complicated
link |
01:19:19.200
and you couldn't do it.
link |
01:19:20.200
Yeah, and I mean.
link |
01:19:21.360
Well, after a while you could,
link |
01:19:22.760
but you'd have to work your way up
link |
01:19:24.200
because, you know, like putting the bright,
link |
01:19:27.240
what's called the brights, the trim on a car
link |
01:19:30.960
on a moving assembly line
link |
01:19:32.600
where it has to be attached 25 places
link |
01:19:34.560
in a minute and a half is unbelievably complicated.
link |
01:19:39.200
And human beings can do it, it's really good.
link |
01:19:42.480
I think that's harder than driving a car, by the way.
link |
01:19:45.240
Putting together, working at a.
link |
01:19:47.040
Working in a factory.
link |
01:19:48.560
Two smart people can disagree.
link |
01:19:51.360
Yay.
link |
01:19:52.200
I think driving a car.
link |
01:19:54.440
We'll get you in the factory someday
link |
01:19:56.120
and then we'll see how you do.
link |
01:19:57.400
No, no, for us humans, driving a car is easy.
link |
01:19:59.480
I'm saying building a machine that drives a car
link |
01:20:03.040
is not easy.
link |
01:20:04.280
No, okay.
link |
01:20:05.120
Okay.
link |
01:20:05.960
Driving a car is easy for humans
link |
01:20:07.400
because we've been evolving for billions of years.
link |
01:20:10.800
To drive cars.
link |
01:20:11.640
Yeah, I noticed that.
link |
01:20:13.280
The Paleolithic cars were super cool.
link |
01:20:16.600
No, now you join the rest of the internet
link |
01:20:18.720
and mocking me.
link |
01:20:19.840
Okay.
link |
01:20:20.680
I wasn't mocking, I was just.
link |
01:20:22.840
Yeah, yeah.
link |
01:20:23.680
Intrigued by your anthropology.
link |
01:20:26.800
Yeah, it's.
link |
01:20:27.640
I'll have to go dig into that.
link |
01:20:28.960
There's some inaccuracies there, yes.
link |
01:20:31.080
Okay, but in general,
link |
01:20:35.360
what have you learned in terms of
link |
01:20:39.640
thinking about passion, craftsmanship,
link |
01:20:44.000
tension, chaos.
link |
01:20:47.200
Jesus.
link |
01:20:48.040
The whole mess of it.
link |
01:20:50.880
What have you learned, or taken away, from your time
link |
01:20:54.240
working with Elon Musk, working at Tesla,
link |
01:20:57.000
which is known to be a place of chaos, innovation,
link |
01:21:02.600
craftsmanship, and all of those things.
link |
01:21:03.640
I really like the way you thought.
link |
01:21:06.000
You think you have an understanding
link |
01:21:07.680
about what first principles of something is,
link |
01:21:10.000
and then you talk to Elon about it,
link |
01:21:11.640
and you didn't scratch the surface.
link |
01:21:15.480
He has a deep belief that no matter what you do,
link |
01:21:18.360
it's a local maximum, right?
link |
01:21:21.200
And I had a friend, he invented a better electric motor,
link |
01:21:24.280
and it was a lot better than what we were using.
link |
01:21:26.960
And one day he came by, he said,
link |
01:21:28.080
I'm a little disappointed, because this is really great,
link |
01:21:31.920
and you didn't seem that impressed.
link |
01:21:33.280
And I said, when the super intelligent aliens come,
link |
01:21:37.280
are they going to be looking for you?
link |
01:21:38.960
Like, where is he?
link |
01:21:39.800
The guy who built the motor.
link |
01:21:41.920
Yeah.
link |
01:21:42.760
Probably not.
link |
01:21:43.600
You know, like, but doing interesting work
link |
01:21:48.320
that's both innovative and, let's say,
link |
01:21:49.840
craftsman's work on the current thing
link |
01:21:51.800
is really satisfying, and it's good.
link |
01:21:54.200
And that's cool.
link |
01:21:55.120
And then Elon was good at taking everything apart,
link |
01:21:59.000
and like, what's the deep first principle?
link |
01:22:01.640
Oh, no, what's really, no, what's really?
link |
01:22:03.920
You know, that ability to look at it without assumptions
link |
01:22:08.920
and without constraints is super wild.
link |
01:22:13.680
You know, he built a rocket ship, and an electric car,
link |
01:22:17.240
and you know, everything.
link |
01:22:19.480
And that's super fun, and he's into it, too.
link |
01:22:21.280
Like, when SpaceX first landed two rockets, at Tesla
link |
01:22:25.600
we had a video projector in the big room,
link |
01:22:27.440
and like, 500 people came down,
link |
01:22:29.280
and when they landed, everybody cheered,
link |
01:22:30.760
and some people cried.
link |
01:22:32.120
It was so cool.
link |
01:22:34.160
All right, but how did you do that?
link |
01:22:35.720
Well, it was super hard, and then people say,
link |
01:22:40.760
well, it's chaotic, really?
link |
01:22:42.560
To get out of all your assumptions,
link |
01:22:44.160
you think that's not gonna be unbelievably painful?
link |
01:22:47.720
And is Elon tough?
link |
01:22:49.640
Yeah, probably.
link |
01:22:50.960
Do people look back on it and say,
link |
01:22:52.840
boy, I'm really happy I had that experience
link |
01:22:57.080
to go take apart that many layers of assumptions?
link |
01:23:02.440
Sometimes super fun, sometimes painful.
link |
01:23:04.920
So it could be emotionally and intellectually painful,
link |
01:23:07.920
that whole process of just stripping away assumptions.
link |
01:23:10.880
Yeah, imagine 99% of your thought process
link |
01:23:13.360
is protecting your self conception,
link |
01:23:16.600
and 98% of that's wrong.
link |
01:23:20.160
Now you got the math right.
link |
01:23:22.640
How do you think you're feeling
link |
01:23:23.680
when you get back into that one bit that's useful,
link |
01:23:26.840
and now you're open,
link |
01:23:27.760
and you have the ability to do something different?
link |
01:23:30.680
I don't know if I got the math right.
link |
01:23:33.640
It might be 99.9, but it ain't 50.
link |
01:23:38.680
Imagining it, the 50% is hard enough.
link |
01:23:44.200
Now, for a long time, I've suspected you could get better.
link |
01:23:48.400
Like you can think better, you can think more clearly,
link |
01:23:50.720
you can take things apart.
link |
01:23:52.960
And there's lots of examples of that, people who do that.
link |
01:23:56.400
And Elon is an example of that, you are an example.
link |
01:24:02.600
I don't know if I am, I'm fun to talk to.
link |
01:24:06.520
Certainly.
link |
01:24:07.360
I've learned a lot of stuff.
link |
01:24:09.000
Well, here's the other thing, I joke, like I read books,
link |
01:24:12.960
and people think, oh, you read books.
link |
01:24:14.560
Well, no, I've read a couple of books a week for 55 years.
link |
01:24:20.640
Well, maybe 50,
link |
01:24:21.520
because I didn't learn to read until I was eight or something.
link |
01:24:24.640
And it turns out when people write books,
link |
01:24:28.480
they often take 20 years of their life
link |
01:24:31.240
where they passionately did something,
link |
01:24:33.280
reduce it to 200 pages.
link |
01:24:36.080
That's kind of fun.
link |
01:24:37.440
And then you go online,
link |
01:24:38.960
and you can find out who wrote the best books
link |
01:24:41.080
and who liked, you know, that's kind of wild.
link |
01:24:43.360
So there's this wild selection process,
link |
01:24:45.200
and then you can read it,
link |
01:24:46.040
and for the most part, understand it.
link |
01:24:49.840
And then you can go apply it.
link |
01:24:51.920
Like I went to one company,
link |
01:24:53.000
I thought, I haven't managed much before.
link |
01:24:55.080
So I read 20 management books,
link |
01:24:57.280
and I started talking to them,
link |
01:24:58.720
and basically compared to all the VPs running around,
link |
01:25:01.400
I'd read 19 more management books than anybody else.
link |
01:25:05.360
It wasn't even that hard.
link |
01:25:08.600
And half the stuff worked, like first time.
link |
01:25:11.160
It wasn't even rocket science.
link |
01:25:13.520
But at the core of that is questioning the assumptions,
link |
01:25:16.960
or sort of entering the thinking,
link |
01:25:20.000
first principles thinking,
link |
01:25:21.760
sort of looking at the reality of the situation,
link |
01:25:24.880
and using that knowledge, applying that knowledge.
link |
01:25:28.240
So that's.
link |
01:25:29.080
So I would say my brain has this idea
link |
01:25:31.400
that you can question first assumptions.
link |
01:25:35.280
But I can go days at a time and forget that,
link |
01:25:38.320
and you have to kind of like circle back to that observation.
link |
01:25:42.520
Because it is emotionally challenging.
link |
01:25:45.200
Well, it's hard to just keep it front and center,
link |
01:25:47.360
because you operate on so many levels all the time,
link |
01:25:50.440
and getting this done takes priority,
link |
01:25:53.480
or being happy takes priority,
link |
01:25:56.560
or screwing around takes priority.
link |
01:25:59.400
Like how you go through life is complicated.
link |
01:26:03.080
And then you remember, oh yeah,
link |
01:26:04.400
I could really think first principles.
link |
01:26:06.600
Oh shit, that's tiring.
link |
01:26:09.600
But you do for a while, and that's kind of cool.
link |
01:26:12.760
So just as a last question in your sense,
link |
01:26:16.200
from the big picture, from the first principles,
link |
01:26:19.480
do you think, you kind of answered it already,
link |
01:26:21.520
but do you think autonomous driving is something
link |
01:26:25.000
we can solve on a timeline of years?
link |
01:26:28.720
So one, two, three, five, 10 years,
link |
01:26:32.240
as opposed to a century?
link |
01:26:33.880
Yeah, definitely.
link |
01:26:35.400
Just to linger on it a little longer,
link |
01:26:37.440
where's the confidence coming from?
link |
01:26:40.120
Is it the fundamentals of the problem,
link |
01:26:42.640
the fundamentals of building the hardware and the software?
link |
01:26:46.420
As a computational problem, understanding ballistics,
link |
01:26:50.680
roads, topography, it seems pretty solvable.
link |
01:26:56.800
And you can see this, like speech recognition,
link |
01:26:59.760
for a long time people were doing frequency
link |
01:27:01.720
domain analysis, and all kinds of stuff,
link |
01:27:04.400
and that didn't work at all, right?
link |
01:27:07.280
And then they did deep learning about it,
link |
01:27:09.360
and it worked great.
link |
01:27:11.400
And it took multiple iterations.
link |
01:27:13.520
And autonomous driving is way past
link |
01:27:18.160
the frequency analysis point.
link |
01:27:21.040
Use radar, don't run into things.
link |
01:27:23.900
And the data gathering's going up,
link |
01:27:25.440
and the computation's going up,
link |
01:27:26.840
and the algorithm understanding's going up,
link |
01:27:28.640
and there's a whole bunch of problems
link |
01:27:30.020
getting solved like that.
link |
01:27:32.000
The data side is really powerful,
link |
01:27:33.520
but I disagree with both you and Elon.
link |
01:27:35.760
I'll tell Elon once again, as I did before,
link |
01:27:38.600
that when you add human beings into the picture,
link |
01:27:42.400
it's no longer a ballistics problem.
link |
01:27:45.680
It's something more complicated,
link |
01:27:47.480
but I could be very well proven wrong.
link |
01:27:50.360
Cars are highly damped in terms of rate of change.
link |
01:27:53.880
Like the steering system's really slow
link |
01:27:56.640
compared to a computer.
link |
01:27:57.640
The acceleration of the acceleration's really slow.
link |
01:28:01.000
Yeah, on a certain timescale, on a ballistics timescale,
link |
01:28:04.160
but human behavior, I don't know.
link |
01:28:07.340
I shouldn't say.
link |
01:28:08.180
Human beings are really slow too.
link |
01:28:09.780
Weirdly, we operate half a second behind reality.
link |
01:28:13.960
Nobody really understands that one either.
link |
01:28:15.300
It's pretty funny.
link |
01:28:16.440
Yeah, yeah.
link |
01:28:20.400
We very well could be surprised,
link |
01:28:23.600
and I think with the rate of improvement
link |
01:28:25.160
in all aspects on both the compute
link |
01:28:26.880
and the software and the hardware,
link |
01:28:29.680
there's gonna be pleasant surprises all over the place.
link |
01:28:34.680
Speaking of unpleasant surprises,
link |
01:28:36.720
many people have worries about a singularity
link |
01:28:39.520
in the development of AI.
link |
01:28:41.680
Forgive me for such questions.
link |
01:28:43.160
Yeah.
link |
01:28:44.460
When AI improves exponentially
link |
01:28:46.040
and reaches a point of superhuman level
link |
01:28:48.360
general intelligence, beyond which point
link |
01:28:52.040
there's no looking back.
link |
01:28:53.320
Do you share this worry of existential threats
link |
01:28:56.120
from artificial intelligence,
link |
01:28:57.380
from computers becoming superhuman level intelligent?
link |
01:29:01.920
No, not really.
link |
01:29:04.600
We already have a very stratified society,
link |
01:29:07.540
and then if you look at the whole animal kingdom
link |
01:29:09.400
of capabilities and abilities and interests,
link |
01:29:12.560
and smart people have their niche,
link |
01:29:15.280
and normal people have their niche,
link |
01:29:17.760
and craftsmen have their niche,
link |
01:29:19.640
and animals have their niche.
link |
01:29:22.520
I suspect that the domains of interest
link |
01:29:26.000
for things that are astronomically different,
link |
01:29:29.440
like the whole something got 10 times smarter than us
link |
01:29:32.280
and wanted to track us all down because what?
link |
01:29:34.680
We like to have coffee at Starbucks?
link |
01:29:36.920
Like, it doesn't seem plausible.
link |
01:29:38.880
No, is there an existential problem
link |
01:29:40.680
that how do you live in a world
link |
01:29:42.520
where there's something way smarter than you,
link |
01:29:44.080
and you base your kind of self-esteem
link |
01:29:46.400
on being the smartest local person?
link |
01:29:48.880
Well, there's what, 0.1% of the population who thinks that?
link |
01:29:52.520
Because the rest of the population's been dealing with it
link |
01:29:54.840
since they were born.
link |
01:29:56.720
So the breadth of possible experience
link |
01:30:00.940
that can be interesting is really big.
link |
01:30:03.660
And, you know, superintelligence seems likely,
link |
01:30:11.100
although we still don't know if we're magical,
link |
01:30:14.200
but I suspect we're not.
link |
01:30:16.320
And it seems likely that it'll create possibilities
link |
01:30:18.820
that are interesting for us,
link |
01:30:20.900
and its interests will be interesting for that,
link |
01:30:24.500
for whatever it is.
link |
01:30:26.800
It's not obvious why its interests would somehow
link |
01:30:30.060
want to fight over some square foot of dirt,
link |
01:30:32.360
or, you know, whatever the usual fears are about.
link |
01:30:37.660
So you don't think it'll inherit
link |
01:30:38.980
some of the darker aspects of human nature?
link |
01:30:42.140
Depends on how you think reality's constructed.
link |
01:30:45.180
So for whatever reason,
link |
01:30:48.020
human beings are in, let's say,
link |
01:30:50.540
creative tension and opposition
link |
01:30:52.300
with both our good and bad forces.
link |
01:30:55.340
Like, there's lots of philosophical understanding of that.
link |
01:30:58.180
I don't know why that would be different.
link |
01:31:03.180
So you think the evil is necessary for the good?
link |
01:31:06.700
I mean, the tension.
link |
01:31:08.180
I don't know about evil,
link |
01:31:09.080
but like we live in a competitive world
link |
01:31:11.620
where your good is somebody else's evil.
link |
01:31:16.660
You know, there's the malignant part of it,
link |
01:31:19.280
but that seems to be self limiting,
link |
01:31:22.720
although occasionally it's super horrible.
link |
01:31:26.280
But yes, there's a debate over ideas,
link |
01:31:29.980
and some people have different beliefs,
link |
01:31:32.340
and that debate itself is a process.
link |
01:31:34.580
So the arriving at something.
link |
01:31:37.580
Yeah, and why wouldn't that continue?
link |
01:31:39.360
Yeah.
link |
01:31:41.580
But you don't think that whole process
link |
01:31:43.140
will leave humans behind in a way that's painful?
link |
01:31:47.420
Emotionally painful, yes.
link |
01:31:48.660
For the 0.1%, it will be.
link |
01:31:51.060
Why isn't it already painful
link |
01:31:52.340
for a large percentage of the population?
link |
01:31:54.060
And it is.
link |
01:31:54.900
I mean, society does have a lot of stress in it,
link |
01:31:57.860
about the 1%, and about the this, and about the that,
link |
01:32:00.660
but you know, everybody has a lot of stress in their life
link |
01:32:03.740
about what they find satisfying,
link |
01:32:05.220
and you know, know yourself seems to be the proper dictum,
link |
01:32:10.780
and pursue something that makes your life meaningful
link |
01:32:14.200
seems proper, and there's so many avenues on that.
link |
01:32:18.700
Like, there's so much unexplored space
link |
01:32:21.100
at every single level, you know.
link |
01:32:25.500
I'm somewhat of, my nephew called me a jaded optimist.
link |
01:32:29.640
And you know, so it's.
link |
01:32:33.820
There's a beautiful tension in that label,
link |
01:32:37.140
but if you were to look back at your life,
link |
01:32:40.940
and could relive a moment, a set of moments,
link |
01:32:45.780
because there were the happiest times of your life,
link |
01:32:49.220
outside of family, what would that be?
link |
01:32:54.660
I don't want to relive any moments.
link |
01:32:56.680
I like that.
link |
01:32:58.020
I like that situation where you have some amount of optimism
link |
01:33:01.340
and then the anxiety of the unknown.
link |
01:33:06.260
So you love the unknown, the mystery of it.
link |
01:33:10.100
I don't know about the mystery.
link |
01:33:11.220
It sure gets your blood pumping.
link |
01:33:14.060
What do you think is the meaning of this whole thing?
link |
01:33:17.100
Of life, on this pale blue dot?
link |
01:33:21.740
It seems to be what it does.
link |
01:33:25.260
Like, the universe, for whatever reason,
link |
01:33:29.260
makes atoms, which makes us, which we do stuff.
link |
01:33:34.340
And we figure out things, and we explore things, and.
link |
01:33:38.020
That's just what it is.
link |
01:33:39.820
It's not just.
link |
01:33:41.580
Yeah, it is.
link |
01:33:44.540
Jim, I don't think there's a better place to end it.
link |
01:33:46.880
It's a huge honor, and...
link |
01:33:50.100
Well, that was super fun.
link |
01:33:51.180
Thank you so much for talking today.
link |
01:33:52.520
All right, great.
link |
01:33:54.060
Thanks for listening to this conversation,
link |
01:33:56.180
and thank you to our presenting sponsor, Cash App.
link |
01:33:59.360
Download it, use code LexPodcast.
link |
01:34:02.020
You'll get $10, and $10 will go to FIRST,
link |
01:34:04.820
a STEM education nonprofit that inspires hundreds
link |
01:34:07.620
of thousands of young minds to become future leaders
link |
01:34:10.780
and innovators.
link |
01:34:12.180
If you enjoy this podcast, subscribe on YouTube.
link |
01:34:15.020
Give it five stars on Apple Podcast.
link |
01:34:17.020
Follow on Spotify, support it on Patreon,
link |
01:34:19.660
or simply connect with me on Twitter.
link |
01:34:22.320
And now, let me leave you with some words of wisdom
link |
01:34:24.780
from Gordon Moore.
link |
01:34:26.880
If everything you try works,
link |
01:34:28.780
you aren't trying hard enough.
link |
01:34:30.920
Thank you for listening, and hope to see you next time.