Jim Keller: The Future of Computing, AI, Life, and Consciousness | Lex Fridman Podcast #162
The following is a conversation with Jim Keller,
his second time on the podcast.
Jim is a legendary microprocessor architect
and is widely seen as one of the greatest
engineering minds of the computing age.
In a peculiar twist of space time in our simulation,
Jim is also a brother-in-law of Jordan Peterson.
We talk about this and about computing,
artificial intelligence, consciousness, and life.
Quick mention of our sponsors.
Athletic Greens All-in-One Nutrition Drink,
Brooklinen Sheets, ExpressVPN,
and Belcampo Grass-Fed Meat.
Click the sponsor links to get a discount
and to support this podcast.
As a side note, let me say that Jim is someone who,
on a personal level, inspired me to be myself.
There was something in his words, on and off the mic,
or perhaps that he even paid attention to me at all,
that almost told me, you're all right, kid.
A kind of pat on the back that can make the difference
between a mind that flourishes
and a mind that is broken down
by the cynicism of the world.
So I guess that's just my brief few words
of thank you to Jim, and in general,
gratitude for the people who have given me a chance
on this podcast, in my work, and in life.
If you enjoy this thing, subscribe on YouTube,
review on Apple Podcasts, follow on Spotify,
support on Patreon, or connect with me
on Twitter, @lexfridman.
And now, here's my conversation with Jim Keller.
What's the value and effectiveness
of theory versus engineering, this dichotomy,
in building good software or hardware systems?
Well, good design is both.
I guess that's pretty obvious.
By engineering, do you mean reduction to practice?
And then science is the pursuit of discovering things
that people don't understand.
Or solving unknown problems.
Definitions are interesting here,
but I was thinking more in theory,
constructing models that kind of generalize
about how things work.
And engineering is actually building stuff.
The pragmatic, like, okay, we have these nice models,
but how do we actually get things to work?
Maybe economics is a nice example.
Like, economists have all these models
of how the economy works,
and how different policies will have an effect,
but then there's the actual, okay,
let's call it engineering,
of like, actually deploying the policies.
So computer design is almost all engineering.
And reduction to practice of known methods.
Now, because of the complexity of the computers we built,
you know, you could think you're,
well, we'll just go write some code,
and then we'll verify it, and then we'll put it together,
and then you find out that the combination
of all that stuff is complicated.
And then you have to be inventive
to figure out how to do it, right?
So that definitely happens a lot.
And then, every so often, some big idea happens.
But it might be one person.
And that idea is in the space of engineering,
or is it in the space of...
Well, I'll give you an example.
So one of the limits of computer performance
is branch prediction.
So, and there's a whole bunch of ideas
about how good you could predict a branch.
And people said, there's a limit to it,
it's an asymptotic curve.
And somebody came up with a better way
to do branch prediction, it was a lot better.
And he published a paper on it,
and every computer in the world now uses it.
And it was one idea.
So the engineers who build branch prediction hardware
were happy to drop the one kind of training array
and put it in another one.
So it was a real idea.
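The conversation does not name the paper or the method, so what follows is only a generic illustration of why a predictor that looks at recent branch history can do much better than a simple one. It is a minimal Python sketch, assuming a hypothetical branch that repeats taken, taken, not taken. The function names, the 4-bit history length, and the table layout are illustrative assumptions, not the technique Jim refers to.

```python
from typing import Iterable, List


def two_bit_accuracy(outcomes: Iterable[bool]) -> float:
    """Predict every branch with a single 2-bit saturating counter (0..3)."""
    counter = 2  # start at "weakly taken"
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = counter >= 2  # 2 or 3 means "predict taken"
        correct += prediction == taken
        total += 1
        # Move the counter toward the observed outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / total


def history_accuracy(outcomes: Iterable[bool], history_bits: int = 4) -> float:
    """Index a table of 2-bit counters by the last `history_bits` outcomes."""
    table = [2] * (1 << history_bits)
    mask = (1 << history_bits) - 1
    history = 0
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = table[history] >= 2
        correct += prediction == taken
        total += 1
        if taken:
            table[history] = min(3, table[history] + 1)
        else:
            table[history] = max(0, table[history] - 1)
        # Shift the newest outcome into the history register.
        history = ((history << 1) | int(taken)) & mask
    return correct / total


# Hypothetical branch behavior: taken twice, then not taken, repeating.
pattern: List[bool] = [True, True, False] * 200

baseline = two_bit_accuracy(pattern)
with_history = history_accuracy(pattern)
print(f"single 2-bit counter: {baseline:.2%}")
print(f"history-indexed table: {with_history:.2%}")
```

On this pattern the single counter keeps mispredicting the not-taken branch, while the history-indexed table learns which recent outcomes precede it. Swapping one table organization for another, as described above, changes the accuracy without changing the rest of the machine.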
And branch prediction is one of the key problems
underlying all of sort of the lowest level of software.
It boils down to branch prediction.
Boils down to uncertainty.
Computers are limited by...
Single thread computer is limited by two things.
The predictability of the path of the branches
and the predictability of the locality of data.
So we have predictors that now predict
both of those pretty well.
So memory is a couple hundred cycles away,
local cache is a couple cycles away.
When you're executing fast,
virtually all the data has to be in the local cache.
So a simple program says,
add one to every element in an array,
it's really easy to see what the stream of data will be.
But you might have a more complicated program
that says, get an element of this array,
look at something, make a decision,
go get another element, it's kind of random.
And you can think, that's really unpredictable.
And then you make this big predictor
that looks at this kind of pattern and you realize,
well, if you get this data and this data,
then you probably want that one.
And if you get this one and this one and this one,
you probably want that one.
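The contrast between the two access patterns can be sketched in a few lines. This is a rough Python model, not how any real prefetcher is built: the "stride" guesser, the pair-history table, the array size, and the use of random indices to stand in for data-dependent accesses are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def stride_accuracy(addresses: List[int]) -> float:
    """Guess each next address as the last address plus the last stride."""
    correct = 0
    guesses = 0
    for i in range(2, len(addresses)):
        stride = addresses[i - 1] - addresses[i - 2]
        predicted = addresses[i - 1] + stride
        correct += predicted == addresses[i]
        guesses += 1
    return correct / guesses


def pair_history_accuracy(addresses: List[int]) -> float:
    """Guess the next address from a table keyed by the previous two."""
    table: Dict[Tuple[int, int], int] = {}
    correct = 0
    guesses = 0
    for i in range(2, len(addresses)):
        key = (addresses[i - 2], addresses[i - 1])
        # "If you get this data and this data, you probably want that one."
        correct += table.get(key) == addresses[i]
        guesses += 1
        table[key] = addresses[i]  # remember what actually followed this pair
    return correct / guesses


n = 1024

# "Add one to every element in an array": indices 0, 1, 2, ...
sequential = list(range(n))

# "Get an element, make a decision, go get another element": modeled here
# as an irregular index sequence that the program walks four times.
rng = random.Random(0)
irregular = [rng.randrange(n) for _ in range(256)]
repeated_walk = irregular * 4

stride_on_sequential = stride_accuracy(sequential)
stride_on_walk = stride_accuracy(repeated_walk)
history_on_walk = pair_history_accuracy(repeated_walk)

print(f"stride guess, sequential array: {stride_on_sequential:.2%}")
print(f"stride guess, irregular walk:   {stride_on_walk:.2%}")
print(f"pair history, irregular walk:   {history_on_walk:.2%}")
```

The sequential stream is trivially predictable, the irregular walk defeats the stride guess, and the pair-history table recovers most of it once the walk repeats, which is the kind of pattern learning described above.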
And is that theory or is that engineering?
Like the paper that was written,
was it asymptotic kind of discussion
or is it more like, here's a hack that works well?
It's a little bit of both.
Like there's information theory in it, I think somewhere.
Okay, so it's actually trying to prove some kind of stuff.
But once you know the method,
implementing it is an engineering problem.
Now there's a flip side of this,
which is in a big design team,
what percentage of people think
their plan or their life's work is engineering
versus inventing things?
So lots of companies will reward you for filing patents.
Some, many big companies get stuck
because to get promoted,
you have to come up with something new.
And then what happens is everybody's trying
to do some random new thing,
99% of which doesn't matter.
And the basics get neglected.
Or there's a dichotomy, they think like the cell library
and the basic CAD tools or basic software validation methods,
that's simple stuff.
They wanna work on the exciting stuff.
And then they spend lots of time
trying to figure out how to patent something.
And that's mostly useless.
But the breakthrough is on simple stuff.
No, no, you have to do the simple stuff really well.
If you're building a building out of bricks,
you want great bricks.
So you go to two places that sell bricks.
So one guy says, yeah, they're over there in an ugly pile.
And the other guy is like lovingly tells you
about the 50 kinds of bricks and how hard they are
and how beautiful they are and how square they are.
Which one are you gonna buy bricks from?
Which is gonna make a better house?
So you're talking about the craftsman,
the person who understands bricks,
who loves bricks, who loves the varieties.
That's a good word.
Good engineering is great craftsmanship.
And when you start thinking engineering is about invention
and you set up a system that rewards invention,
the craftsmanship gets neglected.
Okay, so maybe one perspective is the theory,
the science overemphasizes invention
and engineering emphasizes craftsmanship.
And therefore, so it doesn't matter what you do,
theory, engineering. Well, everybody does.
Like, read the tech rags, they're always talking
about some breakthrough or innovation
and everybody thinks that's the most important thing.
But the number of innovative ideas
is actually relatively low.
We need them, right?
And innovation creates a whole new opportunity.
Like when some guy invented the internet, right?
Like that was a big thing.
The million people that wrote software against that
were mostly doing engineering software writing.
So the elaboration of that idea was huge.
I don't know if you know Brendan Eich,
he wrote JavaScript in 10 days.
That's an interesting story.
It makes me wonder, and it was famously for many years
considered to be a pretty crappy programming language.
It's been improving sort of consistently.
But the interesting thing about that guy is,
you know, he doesn't get any awards.
You don't get a Nobel Prize or a Fields Medal or...
For inventing a crappy piece of, you know, software code.
That is currently the number one programming language
in the world and runs,
now is increasingly running the backend of the internet.
Well, does he know why everybody uses it?
Like that would be an interesting thing.
Was it the right thing at the right time?
Cause like when stuff like JavaScript came out,
like there was a move from, you know,
writing C programs and C++ to what they call
managed code frameworks,
where you write simple code, it might be interpreted,
it has lots of libraries, productivity is high,
and you don't have to be an expert.
So, you know, Java was supposed to solve
all the world's problems.
It was complicated.
JavaScript came out, you know,
after a bunch of other scripting languages.
I'm not an expert on it.
But was it the right thing at the right time?
Or was there something, you know, clever?
Cause he wasn't the only one.
There's a few elements.
And maybe if he figured out what it was,
then he'd get a prize.
Yeah, you know, maybe his problem is he hasn't defined this.
Or he just needs a good promoter.
Well, I think there was a bunch of blog posts
written about it, which is like,
wrong is right, which is like doing the crappy thing fast.
Just like hacking together the thing
that answers some of the needs.
And then iterating over time, listening to developers.
Like listening to people who actually use the thing.
This is something you can do more in software.
But the right time, like you have to sense,
you have to have a good instinct
of when is the right time for the right tool.
And make it super simple.
And just get it out there.
The problem is, this is true with hardware.
This is less true with software.
Is there's backward compatibility
that just drags behind you as, you know,
as you try to fix all the mistakes of the past.
There's something about that.
And it wasn't accidental.
You have to like give yourself over to the,
you have to have this like broad sense
of what's needed now.
Both scientifically and like the community.
And just like this, it was obvious that there was no,
the interesting thing about JavaScript
is everything that ran in the browser at the time,
like Java and I think other like Scheme,
other programming languages,
they were all in a separate external container.
And then JavaScript was literally
just injected into the webpage.
It was the dumbest possible thing
running in the same thread as everything else.
And like it was inserted as a comment.
So JavaScript code is inserted as a comment in the HTML code.
And it was, I mean, there's,
it's either genius or super dumb, but it's like.
Right, so it had no apparatus for like a virtual machine
and container, it just executed in the framework
of the program that's already running.
Yeah, that's cool.
And then because something about that accessibility,
the ease of its use resulted in then developers innovating
of how to actually use it.
I mean, I don't even know what to make of that,
but it does seem to echo across different software,
like stories of different software.
PHP has the same story, really crappy language.
They just took over the world.
I always have a joke that the random length instructions,
variable length instructions, those have always won,
even though they're obviously worse.
Like nobody knows why.
x86 is arguably the worst architecture on the planet.
It's one of the most popular ones.
It's one of the most popular ones.
Well, I mean, isn't that also the story of RISC versus,
I mean, is that simplicity?
There's something about simplicity that us
in this evolutionary process is valued.
If it's simple, it spreads faster, it seems like.
Or is that not always true?
Yeah, it could be simple is good, but too simple is bad.
So why did RISC win, you think, so far?
In the long arc of history.
So who's gonna win?
What's RISC, what's CISC, and who's gonna win in that space
in these instruction sets?
AI software's gonna win, but there'll be little computers
that run little programs like normal all over the place.
But we're going through another transformation, so.
But you think instruction sets underneath it all will change?
Yeah, they evolve slowly.
They don't matter very much.
They don't matter very much, okay.
I mean, the limits of performance are predictability
of instructions and data.
I mean, that's the big thing.
And then the usability of it is some quality of design,
quality of tools, availability.
Like right now, x86 is proprietary with Intel and AMD,
but they can change it any way they want independently.
ARM is proprietary to ARM,
and they won't let anybody else change it.
So it's like a sole point.
And RISC-V is open source, so anybody can change it,
which is super cool.
But that also might mean it gets changed
too many random ways that there's no common subset of it
that people can use.
Do you like open or do you like closed?
Like if you were to bet all your money on one
or the other, RISC-V versus it?
It's case dependent?
Well, x86, oddly enough, when Intel first started
developing it, they licensed like seven people.
So it was the open architecture.
And then they moved faster than others
and also bought one or two of them.
But there was seven different people making x86
because at the time there was 6502 and Z80s and 8086.
And you could argue everybody thought Z80
was the better instruction set,
but that was proprietary to one place.
So there's like four or five different microprocessors.
Intel went open, got the market share
because people felt like they had multiple sources from it,
and then over time it narrowed down to two players.
So why, you as a historian, why did Intel win for so long
with their processors?
Their process development was great.
Oh, so it's just looking back to JavaScript
and what I like is Microsoft and Netscape
and all these internet browsers.
Microsoft won the browser game
because they aggressively stole other people's ideas
like right after they did it.
You know, I don't know
if Intel was stealing other people's ideas.
They started making.
In a good way, stealing in a good way just to clarify.
They started making RAMs, random access memories.
And then at the time
when the Japanese manufacturers came up,
you know, they were getting out-competed on that
and they pivoted to microprocessors
and they made the first, you know,
integrated microprocessor that ran programs.
It was the 4004 or something.
Who was behind that pivot?
That's a hell of a pivot.
Andy Grove and he was great.
That's a hell of a pivot.
And then they led the semiconductor industry.
Like they were just a little company, IBM,
all kinds of big companies had boatloads of money
and they out-innovated everybody.
Out-innovated, okay.
So it's not like marketing, it's not any of that stuff.
Their processor designs were pretty good.
I think the, you know, Core 2 was probably the first one
I thought was great.
It was a really fast processor and then Haswell was great.
What makes a great processor in that?
Oh, if you just look at it,
it's performance versus everybody else.
It's, you know, the size of it, the usability of it.
So it's not specific,
some kind of element that makes it beautiful.
It's just like literally just raw performance.
Is that how you think about processors?
It's just like raw performance?
It's like a horse race.
The fastest one wins.
You don't care how.
Just as long as it wins.
Well, there's the fastest in the environment.
Like, you know, for years you made the fastest one you could
and then people started to have power limits.
So then you made the fastest at the right power point.
And then when we started doing multiprocessors,
like if you could scale your processors
more than the other guy,
you could be 10% faster on like a single thread,
but you have more threads.
So there's lots of variability.
And then ARM really explored,
like, you know, they have the A series
and the R series and the M series,
like a family of processors
for all these different design points
from like unbelievably small and simple.
And so then when you're doing the design,
it's sort of like this big palette of CPUs.
Like they're the only ones with a credible,
you know, top to bottom palette.
What do you mean a credible top to bottom?
Well, there's people who make microcontrollers
that are small, but they don't have a fast one.
There's people who make fast processors,
but don't have a medium one or a small one.
Is that hard to do that full palette?
That seems like a...
Yeah, it's a lot of different.
So what's the difference in the ARM folks and Intel
in terms of the way they're approaching this problem?
Well, Intel, almost all their processor designs
were, you know, very custom high end,
you know, for the last 15, 20 years.
So the fastest horse possible.
In one horse race.
Yeah, and then architecturally they're really good,
but the company itself was fairly insular
to what's going on in the industry with CAD tools and stuff.
And there's this debate about custom design
versus synthesis and how do you approach that?
I'd say Intel was slow on getting to synthesized processors.
ARM came in from the bottom and they generated IP,
which went to all kinds of customers.
So they had very little say
on how the customer implemented their IP.
So ARM is super friendly to the synthesis IP environment.
Whereas Intel said,
we're gonna make this great client chip or server chip
with our own CAD tools, with our own process,
with our own, you know, other supporting IP
and everything only works with our stuff.
So is that, is ARM winning the mobile platform space
in terms of process?
And so in that, what you're describing
is why they're winning.
Well, they had lots of people doing lots
of different experiments.
So they controlled the processor architecture and IP,
but they let people put in lots of different chips.
And there was a lot of variability in what happened there.
Whereas Intel, when they made their mobile,
their foray into mobile,
they had one team doing one part, right?
So it wasn't 10 experiments.
And then their mindset was PC mindset,
Microsoft software mindset.
And that brought a whole bunch of things along
that the mobile world and the embedded world don't do.
Do you think it was possible for Intel to pivot hard
and win the mobile market?
That's a hell of a difficult thing to do, right?
For a huge company to just pivot.
I mean, it's so interesting to,
because we'll talk about your current work.
It's like, it's clear that PCs were dominating
for several decades, like desktop computers.
And then mobile, it's unclear.
It's a leadership question.
Like Apple under Steve Jobs, when he came back,
they pivoted multiple times.
You know, they built iPads and iTunes and phones
and tablets and great Macs.
Like who knew computers should be made out of aluminum?
But they're great.
Like they pivoted multiple times.
And you know, the old Intel, they did that multiple times.
They made DRAMs and processors and processes
and I gotta ask this,
what was it like working with Steve Jobs?
I didn't work with him.
Did you interact with him?
I said hi to him twice in the cafeteria.
He said, hey fellas.
He was wandering around and with somebody,
he couldn't find a table because the cafeteria was packed
and I gave him my table.
But I worked for Mike Culbert who talked to,
like Mike was the unofficial CTO of Apple
and a brilliant guy and he worked for Steve for 25 years,
maybe more and he talked to Steve multiple times a day
and he was one of the people who could put up with Steve's,
let's say, brilliance and intensity
and Steve really liked him and Steve trusted Mike
to translate the shit he thought up
into engineering products that work
and then Mike ran a group called Platform Architecture
and I was in that group.
So many times I'd be sitting with Mike
and the phone would ring and it'd be Steve
and Mike would hold the phone like this
because Steve would be yelling about something or other.
And then he would translate.
And he'd translate and then he would say,
Steve wants us to do this.
Was Steve a good engineer or no?
He was a great idea guy.
And he's a really good selector for talent.
Yeah, that seems to be one of the key elements
of leadership, right?
And then he was a really good first principles guy.
Like somebody would say something couldn't be done
and he would just think, that's obviously wrong, right?
But you know, maybe it's hard to do.
Maybe it's expensive to do.
Maybe we need different people.
You know, there's like a whole bunch of,
if you want to do something hard,
you know, maybe it takes time.
Maybe you have to iterate.
There's a whole bunch of things you could think about
but saying it can't be done is stupid.
How would you compare?
So it seems like Elon Musk is more engineering centric
but is also, I think he considers himself a designer too.
He has a design mind.
Steve Jobs feels like he's much more idea space,
design space versus engineering.
Just make it happen.
Like the world should be this way.
Just figure it out.
But he used computers.
You know, he had computer people talk to him all the time.
Like Mike was a really good computer guy.
He knew what computers could do.
Computer meaning computer hardware?
Like hardware, software, all the pieces.
And then he would have an idea about
what could we do with this next.
That was grounded in reality.
It wasn't like he was just finger painting on the wall
and wishing somebody would interpret it.
So he had this interesting connection
because he wasn't a computer architect or designer
but he had an intuition from the computers we had
to what could happen.
And it's interesting you say intuition
because it seems like he was pissing off a lot of engineers
in his intuition about what can and can't be done.
Those, like the, what is all these stories
about like floppy disks and all that kind of stuff.
Yeah, so in Steve, the first round,
like he'd go into a lab and look at what's going on
and hate it and fire people or ask somebody
in the elevator what they're doing for Apple.
When he came back, my impression was
is he surrounded himself
with a relatively small group of people
and didn't really interact outside of that as much.
And then the joke was you'd see like somebody moving
a prototype through the quad with a black blanket over it.
And that was because it was secret, partly from Steve
because they didn't want Steve to see it until it was ready.
Yeah, the dynamic with Jony Ive and Steve is interesting.
It's like you don't wanna,
he ruins as many ideas as he generates.
It's a dangerous kind of line to walk.
If you have a lot of ideas,
like Gordon Bell was famous for ideas, right?
And it wasn't that the percentage of good ideas
was way higher than anybody else.
It was, he had so many ideas
and he was also good at talking to people about it
and getting the filters right.
And seeing through stuff.
Whereas Elon was like, hey, I wanna build rockets.
So Steve would hire a bunch of rocket guys
and Elon would go read rocket manuals.
So Elon is a better engineer, in a sense, like,
or like more like a love and passion for the manuals.
The details, the craftsmanship too, right?
Well, I guess Steve had craftsmanship too,
but of a different kind.
What do you make of the,
just to stay in there for just a little longer,
what do you make of like the anger
and the passion and all of that?
The firing and the mood swings and the madness,
the being emotional and all of that, that's Steve.
And I guess Elon too.
So what, is that a bug or a feature?
So there's a graph, which is Y axis productivity,
X axis at zero is chaos,
and infinity is complete order, right?
So as you go from the origin,
as you improve order, you improve productivity.
And at some point, productivity peaks,
and then it goes back down again.
Too much order, nothing can happen.
But the question is, how close to the chaos is that?
No, no, no, here's the thing,
is once you start moving in the direction of order,
the force vector to drive you towards order is unstoppable.
Oh, so it's a slippery slope.
And every organization will move to the place
where their productivity is stymied by order.
So the question is, who's the counter force?
Because it also feels really good.
As you get more organized, the productivity goes up.
The organization feels it, they orient towards it, right?
They hired more people.
They got more guys who can run process,
you get bigger, right?
And then inevitably, the organization gets captured
by the bureaucracy that manages all the processes.
All right, and then humans really like that.
And so if you just walk into a room and say,
guys, love what you're doing,
but I need you to have less order.
If you don't have some force behind that,
nothing will happen.
I can't tell you on how many levels that's profound, so.
So that's why I'd say it's a feature.
Now, could you be nicer about it?
I don't know, I don't know any good examples
of being nicer about it.
Well, the funny thing is to get stuff done,
you need people who can manage stuff and manage people,
because humans are complicated.
They need lots of care and feeding that you need
to tell them they look nice and they're doing good stuff
and pat them on the back, right?
I don't know, you tell me, is that needed?
Do humans need that?
I had a friend, he started to manage a group and he said,
You have to praise them before they do anything.
I was waiting until they were done.
And they were always mad at me.
Now I tell them what a great job they're doing
while they're doing it.
But then you get stuck in that trap,
because then when they're not doing something,
how do you confront these people?
I think a lot of people that had trauma
in their childhood would disagree with you,
successful people, that you need to first do the rough stuff
and then be nice later.
Okay, but engineering companies are full of adults
who had all kinds of range of childhoods.
You know, most people had okay childhoods.
Well, I don't know if...
Lots of people only work for praise, which is weird.
You mean like everybody.
I'm not that interested in it, but...
Well, you're probably looking for somebody's approval.
I should think about that.
Maybe somebody who's no longer with us kind of thing.
I used to call up my dad and tell him what I was doing.
He was very excited about engineering and stuff.
You got his approval?
Like, he decided I was smart and unusual as a kid
and that was okay when I was really young.
So when I did poorly in school, I was dyslexic.
I didn't read until I was third or fourth grade.
My parents were like, oh, he'll be fine.
Is he still with us?
He had Parkinson's and then cancer.
His last 10 years were tough and it killed him.
Killing a man like that's hard.
Well, it's pretty good.
Parkinson's causes slow dementia
and the chemotherapy, I think, accelerated it.
But it was like hallucinogenic dementia.
So he was clever and funny and interesting
and it was pretty unusual.
Do you remember conversations?
Like, do you have fond memories of the guy?
Anything come to mind?
A friend told me one time I could draw a computer
on the whiteboard faster than anybody he'd ever met.
I said, you should meet my dad.
Like, when I was a kid, he'd come home and say,
I was driving by this bridge and I was thinking about it
and he pulled out a piece of paper
and he'd draw the whole bridge.
He was a mechanical engineer.
And he would just draw the whole thing
and then he would tell me about it
and then tell me how he would have changed it.
And he had this idea that he could understand
and conceive anything.
And I just grew up with that, so that was natural.
So when I interview people, I ask them to draw a picture
link |
of something they did on a whiteboard
link |
and it's really interesting.
link |
Like, some people draw a little box
link |
and then they'll say, and then this talks to this
link |
and I'll be like, oh, this is frustrating.
link |
I had this other guy come in one time, he says,
link |
well, I designed a floating point in this chip
link |
but I'd really like to tell you how the whole thing works
link |
and then tell you how the floating point works inside of it.
link |
Do you mind if I do that?
link |
And he covered two whiteboards in like 30 minutes.
link |
Like, he was great.
link |
This is craftsman.
link |
I mean, that's the craftsmanship to that.
link |
Yeah, but also the mental agility
link |
to understand the whole thing,
link |
put the pieces in context,
link |
real view of the balance of how the design worked.
link |
Because if you don't understand it properly,
link |
when you start to draw it,
link |
you'll fill up half the whiteboard
link |
with like a little piece of it
link |
and like your ability to lay it out in an understandable way
link |
takes a lot of understanding, so.
link |
And be able to, so zoom into the detail
link |
and then zoom out to the big picture.
link |
Zoom out really fast.
link |
What about the impossible thing?
link |
You see, your dad believed that you can do anything.
link |
That's a weird feature for a craftsman.
link |
It seems that that echoes in your own behavior.
link |
Well, it's not that anybody can do anything right now, right?
link |
It's that if you work at it, you can get better at it
link |
and there might not be a limit.
link |
And he did funny things like,
link |
like he always wanted to play piano.
link |
So at the end of his life, he started playing the piano
link |
when he had Parkinson's and he was terrible.
link |
But he thought if he really worked at it in this life,
link |
maybe the next life he'd be better at it.
link |
He might be onto something.
link |
Yeah, he enjoyed doing it.
link |
It's pretty funny.
link |
Do you think the perfect is the enemy of the good
link |
in hardware and software engineering?
link |
It's like we were talking about JavaScript a little bit
link |
and the messiness of the 10 day building process.
link |
Yeah, you know, creative tension, right?
link |
So creative tension is you have two different ideas
link |
that you can't do both, right?
link |
And, but the fact that you wanna do both
link |
causes you to go try to solve that problem.
link |
That's the creative part.
link |
So if you're building computers,
link |
like some people say we have the schedule
link |
and anything that doesn't fit in the schedule we can't do.
link |
And so they throw out the perfect
link |
because they have a schedule.
link |
Then there's other people who say
link |
we need to get this perfectly right.
link |
And no matter what, you know, more people, more money,
link |
And there's a really clear idea about what you want.
link |
Some people are really good at articulating it, right?
link |
So let's call that the perfect, yeah.
link |
All right, but that's also terrible
link |
because they never ship anything.
link |
You never hit any goals.
link |
So now you have your framework.
link |
You can't throw out stuff
link |
because you can't get it done today
link |
because maybe you'll get it done tomorrow
link |
or the next project, right?
link |
You can't, so you have to,
link |
I work with a guy that I really like working with,
link |
but he over filters his ideas.
link |
He'd start thinking about something
link |
and as soon as he figured out what was wrong with it,
link |
he'd throw it out.
link |
And then I start thinking about it
link |
and you come up with an idea
link |
and then you find out what's wrong with it.
link |
And then you give it a little time to set
link |
because sometimes you figure out how to tweak it
link |
or maybe that idea helps some other idea.
link |
So idea generation is really funny.
link |
So you have to give your ideas space.
link |
Like spaciousness of mind is key.
link |
But you also have to execute programs and get shit done.
link |
And then it turns out computer engineering is fun
link |
because it takes 100 people to build a computer,
link |
200 or 300, whatever the number is.
link |
And people are so variable about temperament
link |
and skill sets and stuff.
link |
That in a big organization,
link |
you find the people who love the perfect ideas
link |
and the people that want to get stuff done yesterday
link |
and people like to come up with ideas
link |
and people like to, let's say shoot down ideas.
link |
And it takes the whole, it takes a large group of people.
link |
Some are good at generating ideas, some are good at filtering ideas.
link |
And then all in that giant mess, you're somehow,
link |
I guess the goal is for that giant mess of people
link |
to find the perfect path through the tension,
link |
the creative tension.
link |
But like, how do you know when you said
link |
there's some people good at articulating
link |
what perfect looks like, what a good design is?
link |
Like if you're sitting in a room
link |
and you have a set of ideas
link |
about like how to design a better processor,
link |
how do you know this is something special here?
link |
This is a good idea, let's try this.
link |
Have you ever brainstormed an idea
link |
with a couple of people that were really smart?
link |
And you kind of go into it and you don't quite understand it
link |
and you're working on it.
link |
And then you start talking about it,
link |
putting it on the whiteboard, maybe it takes days or weeks.
link |
And then your brain starts to kind of synchronize.
link |
It's really weird.
link |
Like you start to see what each other is thinking.
link |
And it starts to work.
link |
Like you can see work.
link |
Like my talent in computer design
link |
is I can see how computers work in my head, like really well.
link |
And I know other people can do that too.
link |
And when you're working with people that can do that,
link |
like it is kind of an amazing experience.
link |
And then every once in a while you get to that place
link |
and then you find the flaw, which is kind of funny
link |
because you can fool yourself.
link |
The two of you kind of drifted along
link |
in the direction that was useless.
link |
Like you have to, because the nice thing
link |
about computer design is there's always reduction to practice.
link |
Like you come up with your good ideas
link |
and I know some architects who really love ideas
link |
and then they work on them and they put it on the shelf
link |
and they go work on the next idea and put it on the shelf
link |
and they never reduce it to practice.
link |
So they find out what's good and bad.
link |
Because almost every time I've done something really new,
link |
by the time it's done, like the good parts are good,
link |
but I know all the flaws, like.
link |
Would you say your career, just your own experience,
link |
is your career defined mostly by flaws or by successes?
link |
Again, there's great tension between those.
link |
If you haven't tried hard, right?
link |
And done something new, right?
link |
Then you're not gonna be facing the challenges
link |
when you build it.
link |
Then you find out all the problems with it.
link |
But when you look back, do you see problems?
link |
Oh, when I look back?
link |
What do you remember?
link |
I think earlier in my career,
link |
like EV5 was the second Alpha chip.
link |
I was so embarrassed about the mistakes,
link |
I could barely talk about it.
link |
And it was in the Guinness Book of World Records
link |
and it was the fastest processor on the planet.
link |
So it was, and at some point I realized
link |
that was really a bad mental framework
link |
to deal with doing something new.
link |
We did a bunch of new things
link |
and some worked out great and some were bad.
link |
And we learned a lot from it.
link |
And then the next one, we learned a lot.
link |
That EV6 also had some really cool things in it.
link |
I think the proportion of good stuff went up,
link |
but it had a couple of fatal flaws in it that were painful.
link |
You learned to channel the pain into like pride.
link |
Not pride, really.
link |
You know, just a realization about how the world works
link |
or how that kind of idea set works.
link |
Life is suffering.
link |
That's the reality.
link |
Well, I know the Buddha said that
link |
and a couple other people are stuck on it.
link |
No, it's, you know, there's this kind of weird combination
link |
of good and bad, you know, light and darkness
link |
that you have to tolerate and, you know, deal with.
link |
Yeah, there's definitely lots of suffering in the world.
link |
Depends on the perspective.
link |
It seems like there's way more darkness,
link |
but that makes the light part really nice.
link |
What computing hardware or just any kind,
link |
even software design, do you find beautiful
link |
from your own work, from other people's work?
link |
You're just, we were just talking about the battleground
link |
of flaws and mistakes and errors,
link |
but things that were just beautifully done.
link |
Is there something that pops to mind?
link |
Well, when things are beautifully done,
link |
usually there's a well thought out set of abstraction layers.
link |
So the whole thing works in unison nicely.
link |
And when I say abstraction layer,
link |
that means two different components
link |
when they work together, they work independently.
link |
They don't have to know what the other one is doing.
link |
So that decoupling.
link |
So the famous one was the network stack.
link |
Like there's a seven layer network stack,
link |
you know, data transport and protocol and all the layers.
link |
And the innovation was,
link |
is when they really wrote, got that right.
link |
Cause networks before that didn't define those very well.
link |
The layers could innovate independently.
link |
And occasionally the layer boundary would,
link |
the interface would be upgraded.
link |
And that let the design space breathe.
link |
And you could do something new in layer seven
link |
without having to worry about how layer four worked.
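The layering idea in this exchange can be sketched in code. This is a minimal illustration, not a real network stack: the names `Transport`, `LoopbackTransport`, `ChecksumTransport`, and `EchoApplication` are hypothetical, and two layers stand in for the seven mentioned above. The point it tries to show is that the upper layer depends only on the interface, so the lower layer can be swapped without touching it.

```python
from typing import Protocol


class Transport(Protocol):
    """The only thing the upper layer may assume about the layer below."""

    def send(self, payload: bytes) -> bytes: ...


class LoopbackTransport:
    """Hypothetical lower layer: returns the payload unchanged."""

    def send(self, payload: bytes) -> bytes:
        return payload


class ChecksumTransport:
    """A different lower layer: frames the payload with a one-byte checksum
    and verifies it before handing the body back."""

    def send(self, payload: bytes) -> bytes:
        framed = payload + bytes([sum(payload) % 256])
        body, check = framed[:-1], framed[-1]
        if sum(body) % 256 != check:
            raise ValueError("corrupted frame")
        return body


class EchoApplication:
    """Upper layer: written once, works over any Transport."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def request(self, text: str) -> str:
        return self.transport.send(text.encode("utf-8")).decode("utf-8")


old_stack = EchoApplication(LoopbackTransport())
new_stack = EchoApplication(ChecksumTransport())
```

Both `old_stack.request("hello")` and `new_stack.request("hello")` go through the same application code, even though the layer underneath changed.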
link |
And so good design does that.
link |
And you see it in processor designs.
link |
When we did the Zen design at AMD,
link |
we made several components very modular.
link |
And, you know, my insistence at the top was
link |
I wanted all the interfaces defined
link |
before we wrote the RTL for the pieces.
link |
One of the verification leads said,
link |
if we do this right,
link |
I can test the pieces so well independently
link |
when we put it together,
link |
we won't find all these interaction bugs
link |
cause the floating point knows how the cache works.
link |
And I was a little skeptical,
link |
but he was mostly right.
link |
That the modularity of the design
link |
greatly improved the quality.
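The verification argument above, that a piece can be tested well on its own once its interfaces are fixed, can be sketched in software terms. This is an analogy rather than RTL, and every name in it (`Cache`, `FakeCache`, `FloatingPointUnit`, `multiply_add`) is an assumption made for illustration. The floating point unit sees only the agreed interface, so it can be exercised against a stand-in before any real cache exists.

```python
from typing import Dict, Protocol


class Cache(Protocol):
    """Interface agreed on before either side is implemented."""

    def load(self, address: int) -> float: ...


class FakeCache:
    """Test double: a dictionary standing in for the real cache."""

    def __init__(self, contents: Dict[int, float]) -> None:
        self.contents = contents

    def load(self, address: int) -> float:
        return self.contents[address]


class FloatingPointUnit:
    """Knows only the Cache interface, not how any cache works inside."""

    def __init__(self, cache: Cache) -> None:
        self.cache = cache

    def multiply_add(self, a_addr: int, b_addr: int, c_addr: int) -> float:
        a = self.cache.load(a_addr)
        b = self.cache.load(b_addr)
        c = self.cache.load(c_addr)
        return a * b + c


# Exercise the unit in isolation: 2.0 * 3.0 + 1.0
fpu = FloatingPointUnit(FakeCache({0: 2.0, 1: 3.0, 2: 1.0}))
result = fpu.multiply_add(0, 1, 2)
```

Because `FloatingPointUnit` never reaches into the cache's internals, replacing `FakeCache` with a real implementation should not introduce the kind of interaction bug described above.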
link |
Is that universally true in general?
link |
Would you say about good designs,
link |
the modularity is like usually modular?
link |
Well, we talked about this before.
link |
Humans are only so smart.
link |
Like, and we're not getting any smarter, right?
link |
But the complexity of things is going up.
link |
So, you know, a beautiful design can't be bigger
link |
than the person doing it.
link |
It's just, you know, their piece of it.
link |
Like the odds of you doing a really beautiful design
link |
of something that's way too hard for you is low, right?
link |
If it's way too simple for you,
link |
it's not that interesting.
link |
It's like, well, anybody could do that.
link |
But when you get the right match of your expertise
link |
and, you know, mental power to the right design size,
link |
that's cool, but that's not big enough
link |
to make a meaningful impact in the world.
link |
So now you have to have some framework
link |
to design the pieces so that the whole thing
link |
is big and harmonious.
link |
But, you know, when you put it together,
link |
it's, you know, sufficiently interesting to be used.
link |
And, you know, so that's what a beautiful design is.
link |
Matching the limits of that human cognitive capacity
link |
to the module that you can create
link |
and creating a nice interface between those modules
link |
and thereby, do you think there's a limit
link |
to the kind of beautiful complex systems
link |
we can build with this kind of modular design?
link |
It's like, you know, if we build increasingly
link |
more complicated, you can think of like the internet.
link |
Okay, let's scale it down.
link |
Or you can think of like social network,
link |
like Twitter as one computing system.
link |
But those are little modules, right?
link |
But it's built on so many components
link |
nobody at Twitter even understands.
link |
So if an alien showed up and looked at Twitter,
link |
he wouldn't just see Twitter as a beautiful,
link |
simple thing that everybody uses, which is really big.
link |
You would see the network, it runs on the fiber optics,
link |
the data is transported to the computers.
link |
The whole thing is so bloody complicated,
link |
nobody at Twitter understands it.
link |
And so that's what the alien would see.
link |
So yeah, if an alien showed up and looked at Twitter
link |
or looked at the various different network systems
link |
that you could see on Earth.
link |
So imagine they were really smart
link |
and they could comprehend the whole thing.
link |
And then they sort of evaluated the human
link |
and thought, this is really interesting.
link |
No human on this planet comprehends the system they built.
link |
No individual, well, would they even see individual humans?
link |
Like we humans are very human centric, entity centric.
link |
And so we think of us as the central organism
link |
and the networks as just the connection of organisms.
link |
But from a perspective of an alien,
link |
from an outside perspective, it seems like.
link |
We're the ants and they'd see the ant colony.
link |
The ant colony, yeah.
link |
Or the result of production of the ant colony,
link |
which is like cities and it's,
link |
in that sense, humans are pretty impressive.
link |
The modularity that we're able to,
link |
and how robust we are to noise and mutation
link |
and all that kind of stuff.
link |
Well, that's because it's stress tested all the time.
link |
You know, you build all these cities with buildings
link |
and you get earthquakes occasionally
link |
and, you know, wars, earthquakes.
link |
Viruses every once in a while.
link |
You know, changes in business plans
link |
or, you know, like shipping or something.
link |
Like as long as it's all stress tested,
link |
then it keeps adapting to the situation.
link |
So that's a curious phenomenon.
link |
Well, let's go, let's talk about Moore's Law a little bit.
link |
So the broad view of Moore's Law
link |
was just exponential improvement of computing capability.
link |
Like OpenAI, for example, recently published
link |
this kind of papers looking at the exponential improvement
link |
in the training efficiency of neural networks
link |
for like ImageNet and all that kind of stuff.
link |
We just got better on this purely software side,
link |
just figuring out better tricks and algorithms
link |
for training neural networks.
link |
And that seems to be improving significantly faster
link |
than the Moore's Law prediction, you know.
link |
So that's in the software space.
link |
What do you think if Moore's Law continues
link |
or if the general version of Moore's Law continues,
link |
do you think that comes mostly from the hardware,
link |
from the software, some mix of the two,
link |
some interesting, totally,
link |
so not the reduction of the size of the transistor
link |
kind of thing, but more in the,
link |
in the totally interesting kinds of innovations
link |
in the hardware space, all that kind of stuff.
link |
Well, there's like a half a dozen things
link |
going on in that graph.
link |
So one is there's initial innovations
link |
that had a lot of headroom to be exploited.
link |
So, you know, the efficiency of the networks
link |
has improved dramatically.
link |
And then the decomposability of those and the use going,
link |
you know, they started running on one computer,
link |
then multiple computers, then multiple GPUs,
link |
and then arrays of GPUs, and they're up to thousands.
link |
And at some point, so it's sort of like
link |
they were consumed, they were going from
link |
like a single computer application
link |
to a thousand computer application.
link |
So that's not really a Moore's Law thing.
link |
That's an independent vector.
link |
How many computers can I put on this problem?
link |
Because the computers themselves are getting better
link |
on like a Moore's Law rate,
link |
but their ability to go from one to 10
link |
to 100 to a thousand, you know, was something.
link |
And then multiplied by, you know, the amount of compute
link |
it took to resolve like AlexNet to ResNet to transformers.
link |
It's been quite, you know, steady improvements.
link |
But those are like S curves, aren't they?
link |
That's exactly the kind of S curves
link |
that are underlying Moore's Law from the very beginning.
link |
So what's the biggest, what's the most productive,
link |
rich source of S curves in the future, do you think?
link |
Is it hardware, is it software, or is it?
link |
So hardware is going to move along relatively slowly.
link |
Like, you know, double performance every two years.
link |
I like how you call that slowly.
link |
Yeah, that's the slow version.
link |
The snail's pace of Moore's Law.
link |
Maybe we should trademark that one.
link |
Whereas the scaling by number of computers, you know,
link |
can go much faster, you know.
link |
I'm sure at some point Google had a, you know,
link |
their initial search engine was running on a laptop,
link |
And at some point they really worked on scaling that.
link |
And then they factored the indexer from, you know,
link |
this piece and this piece and this piece,
link |
and they spread the data on more and more things.
link |
And, you know, they did a dozen innovations.
link |
But as they scaled up the number of computers on that,
link |
it kept breaking, finding new bottlenecks
link |
in their software and their schedulers,
link |
and made them rethink.
link |
Like, it seems insane to do a scheduler
link |
across 1,000 computers to schedule parts of it
link |
and then send the results to one computer.
link |
But if you want to schedule a million searches,
link |
that makes perfect sense.
link |
So there's the scaling by just quantity
link |
is probably the richest thing.
link |
But then as you scale quantity,
link |
like a network that was great on 100 computers
link |
may be completely the wrong one.
link |
You may pick a network that's 10 times slower
link |
on 10,000 computers, like per computer.
link |
But if you go from 100 to 10,000, it's 100 times.
link |
So that's one of the things that happened
link |
when we did internet scaling.
link |
This efficiency went down, not up.
link |
The future of computing is inefficiency, not efficiency.
link |
But scale, inefficient scale.
link |
It's scaling faster than inefficiency bites you.
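The arithmetic behind that trade can be worked through directly. This sketch uses only the figures quoted above (100 computers versus 10,000, with a network that is 10 times slower per computer); the function name `total_throughput` and the unit of "work per second" are assumptions for illustration.

```python
def total_throughput(computers: int, slowdown_per_computer: float) -> float:
    """Aggregate work per second, in units of one full-speed computer."""
    return computers / slowdown_per_computer


# Efficient network on a small cluster: full speed on each of 100 computers.
small = total_throughput(computers=100, slowdown_per_computer=1)

# Less efficient network on a large cluster: each computer is 10 times slower.
large = total_throughput(computers=10_000, slowdown_per_computer=10)

# 100 times more computers, divided by a 10 times per-computer penalty.
speedup = large / small
```

Here `speedup` works out to 10: per-computer efficiency fell by a factor of 10, yet total throughput still rose by a factor of 10, because the computer count grew faster than the inefficiency.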
link |
And as long as there's, you know, dollar value there,
link |
like scaling costs lots of money.
link |
But Google showed, Facebook showed, everybody showed
link |
that the scale was where the money was at.
link |
It was, and so it was worth the financial.
link |
Do you think, is it possible that like basically
link |
the entirety of Earth will be like a computing surface?
link |
Like this table will be doing computing.
link |
This hedgehog will be doing computing.
link |
Like everything really inefficient,
link |
dumb computing will be leveraged.
link |
The science fiction books, they call it computronium.
link |
We turn everything into computing.
link |
Well, most of the elements aren't very good for anything.
link |
Like you're not gonna make a computer out of iron.
link |
Like, you know, silicon and carbon have like nice structures.
link |
You know, we'll see what you can do with the rest of it.
link |
Like people talk about, well, maybe we can turn the sun
link |
into computer, but it's hydrogen and a little bit of helium.
link |
What I mean is more like actually just adding computers
link |
So you're just converting all the mass of the universe
link |
It'd be ironic from the simulation point of view.
link |
It's like the simulator built mass to simulate.
link |
Yeah, I mean, yeah.
link |
So, I mean, ultimately this is all heading
link |
towards a simulation.
link |
Yeah, well, I think I might've told you this story.
link |
At Tesla, they were deciding,
link |
so they wanna measure the current coming out of the battery
link |
and they decided between putting the resistor in there
link |
and putting a computer with a sensor in there.
link |
And the computer was faster than the computer
link |
I worked on in 1982.
link |
And we chose the computer
link |
because it was cheaper than the resistor.
link |
So, sure, this hedgehog costs $13
link |
and we can put an AI that's as smart as you
link |
in there for five bucks.
link |
So computers will be everywhere.
link |
I was hoping it wouldn't be smarter than me because.
link |
Well, everything's gonna be smarter than you.
link |
But you were saying it's inefficient.
link |
I thought it was better to have a lot of dumb things.
link |
Well, Moore's law will slowly compact that stuff.
link |
So even the dumb things will be smarter than us.
link |
The dumb things are gonna be smart
link |
or they're gonna be smart enough to talk to something
link |
that's really smart.
link |
You know, it's like.
link |
Well, just remember, like a big computer chip.
link |
You know, it's like an inch by an inch
link |
and, you know, 40 microns thick.
link |
It doesn't take very much, very many atoms
link |
to make a high power computer.
link |
And 10,000 of them can fit in a shoebox.
link |
But, you know, you have the cooling and power problems,
link |
but, you know, people are working on that.
link |
But they still can't write compelling poetry or music
link |
or understand what love is or have a fear of mortality.
link |
So we're still winning.
link |
Neither can most of humanity, so.
link |
Well, they can write books about it.
link |
So, but speaking about this,
link |
this walk along the path of innovation
link |
towards the dumb things being smarter than humans,
link |
you are now the CTO of Tenstorrent as of two months ago.
link |
They build hardware for deep learning.
link |
How do you build scalable and efficient deep learning?
link |
This is such a fascinating space.
link |
Yeah, yeah, so it's interesting.
link |
So up until recently,
link |
I thought there was two kinds of computers.
link |
There are serial computers that run like C programs,
link |
and then there's parallel computers.
link |
So the way I think about it is, you know,
link |
parallel computers have given parallelism.
link |
Like, GPUs are great because you have a million pixels,
link |
and modern GPUs run a program on every pixel.
link |
They call it the shader program, right?
link |
So, or like finite element analysis.
link |
You build something, you know,
link |
you make this into little tiny chunks,
link |
you give each chunk to a computer,
link |
so you're given all these chunks,
link |
you have parallelism like that.
link |
But most C programs, you write this linear narrative,
link |
and you have to make it go fast.
link |
To make it go fast, you predict all the branches,
link |
all the data fetches, and you run that.
link |
More parallel, but that's found parallelism.
link |
AI is, I'm still trying to decide how fundamental this is.
link |
It's a given parallelism problem.
link |
But the way people describe the neural networks,
link |
and then how they write them in PyTorch, it makes graphs.
link |
Yeah, that might be fundamentally different
link |
than the GPU kind of.
link |
Parallelism, yeah, it might be.
link |
Because when you run the GPU program on all the pixels,
link |
you're running, you know, it depends,
link |
this group of pixels say it's background blue,
link |
and it runs a really simple program.
link |
This pixel is, you know, some patch of your face,
link |
so you have some really interesting shader program
link |
to give you the impression of translucency.
link |
But the pixels themselves don't talk to each other.
link |
There's no graph, right?
link |
So you do the image, and then you do the next image,
link |
and you do the next image,
link |
and you run eight million pixels,
link |
eight million programs every time,
link |
and modern GPUs have like 6,000 thread engines in them.
link |
So, you know, to get eight million pixels,
link |
each one runs a program on, you know, 10 or 20 pixels.
link |
And that's how they work, but there's no graph.
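The "given parallelism" of pixel shading can be sketched as follows. This is a toy, not how a GPU is actually programmed: the `shade` function, the 8 by 4 frame, and the four worker threads standing in for thousands of thread engines are all assumptions for illustration. What it tries to show is that each pixel's result depends only on that pixel, so the work can be split across engines in any way without the pixels communicating.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4   # tiny stand-in for a multi-megapixel frame
ENGINES = 4            # stand-in for thousands of thread engines


def shade(x: int, y: int) -> tuple:
    """Hypothetical shader: the colour depends only on this pixel."""
    if y < HEIGHT // 2:
        return (0, 0, 255)                      # simple "background blue"
    return (255 * x // (WIDTH - 1), 128, 64)    # a slightly richer gradient


def shade_chunk(chunk):
    """One engine runs the same program over its share of the pixels."""
    return [shade(x, y) for x, y in chunk]


pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
chunks = [pixels[i::ENGINES] for i in range(ENGINES)]  # 8 pixels per engine

with ThreadPoolExecutor(max_workers=ENGINES) as pool:
    shaded_chunks = list(pool.map(shade_chunk, chunks))

# Reassemble the frame; no chunk ever needed a result from another chunk.
frame = {}
for chunk, colours in zip(chunks, shaded_chunks):
    frame.update(zip(chunk, colours))
```

The way `pixels` is divided into `chunks` is arbitrary, which is the sense in which there is no graph here: no pixel's program consumes another pixel's output.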
link |
But you think graph might be a totally new way
link |
to think about hardware.
link |
So Raja Koduri and I have been having this conversation
link |
about given versus found parallelism.
link |
And then the kind of walk,
link |
because we got more transistors,
link |
like, you know, computers way back when
link |
did stuff on scalar data.
link |
Now we did it on vector data, famous vector machines.
link |
Now we're making computers that operate on matrices, right?
link |
And then the category we said that was next was spatial.
link |
Like, imagine you have so much data
link |
that, you know, you want to do the compute on this data,
link |
and then when it's done, it says,
link |
send the result to this pile of data on some software on that.
link |
And it's better to think about it spatially
link |
than to move all the data to a central processor
link |
and do all the work.
link |
So spatially, you mean moving in the space of data
link |
as opposed to moving the data.
link |
Yeah, you have a petabyte data space
link |
spread across some huge array of computers.
link |
And when you do a computation somewhere,
link |
you send the result of that computation
link |
or maybe a pointer to the next program
link |
to some other piece of data and do it.
link |
But I think a better word might be graph.
link |
And all the AI neural networks are graphs.
link |
Do some computations, send the result here,
link |
do another computation, do a data transformation,
link |
do a merging, do a pooling, do another computation.
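A neural network viewed as a graph of computations can be sketched like this. It is a toy interpreter under stated assumptions: the node names, the tiny 3 by 2 matrix, and the `run` helper are invented for illustration, and `graphlib.TopologicalSorter` from the Python standard library (3.9 or later) is used only to pick a valid execution order. Unlike the per-pixel case, each node here consumes results that other nodes send to it.

```python
from graphlib import TopologicalSorter


def matvec(matrix, vector):
    """Plain matrix-vector multiply."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Each node: (operation, names of the nodes whose results it consumes).
graph = {
    "input":  (lambda: [1.0, 2.0], []),
    "linear": (lambda x: matvec([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], x),
               ["input"]),
    "relu":   (lambda x: [max(0.0, v) for v in x], ["linear"]),
    "double": (lambda x: [2.0 * v for v in x], ["linear"]),
    "merge":  (lambda a, b: [p + q for p, q in zip(a, b)], ["relu", "double"]),
    "pool":   (lambda x: max(x), ["merge"]),
}


def run(graph):
    """Execute every node once its inputs are available."""
    dependencies = {name: deps for name, (_, deps) in graph.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        operation, deps = graph[name]
        results[name] = operation(*(results[d] for d in deps))
    return results


results = run(graph)
```

In this sketch the "send the result here" step is just a dictionary lookup; on hardware built for graphs, the same edges would correspond to data moving between processing elements.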
link |
Is it possible to compress and say
link |
how we make this thing efficient,
link |
this whole process efficient, this different?
link |
So first, the fundamental elements in the graphs
link |
are things like matrix multiplies, convolutions,
link |
data manipulations, and data movements.
link |
So GPUs emulate those things with their little singles,
link |
you know, basically running a single threaded program.
link |
And then there's, you know, and NVIDIA calls it a warp
link |
where they group a bunch of programs
link |
that are similar together.
link |
So for efficiency and instruction use.
link |
And then at a higher level, you kind of,
link |
you take this graph and you say this part of the graph
link |
is a matrix multiplier, which runs on these 32 threads.
link |
But the model at the bottom was built
link |
for running programs on pixels, not executing graphs.
link |
So it's emulation, ultimately.
link |
So is it possible to build something
link |
that natively runs graphs?
link |
Yes, so that's what Tenstorrent did.
link |
Where are we on that?
link |
How, like, in the history of that effort,
link |
are we in the early days?
link |
Tenstorrent was started by a friend of mine,
link |
Ljubisa Bajic, and I was his first investor.
link |
So I've been, you know, kind of following him
link |
and talking to him about it for years.
link |
And in the fall when I was considering things to do,
link |
I decided, you know, we held a conference last year
link |
with a friend, organized it,
link |
and we wanted to bring in thinkers.
link |
And two of the people were Andrej Karpathy and Chris Lattner.
link |
And Andrej gave this talk, it's on YouTube,
link |
called Software 2.0, which I think is great.
link |
Which is, we went from programmed computers,
link |
where you write programs, to data program computers.
link |
You know, like the future of software is data programs,
link |
And I think that's true.
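The contrast between a written program and a data-programmed one can be shown with a very small example. This is a deliberately simple stand-in: the talk referred to above concerns neural networks, whereas this sketch fits a straight line by ordinary least squares, and the temperature-conversion task, the `handwritten` function, and the `fit_line` helper are all assumptions chosen for illustration.

```python
def handwritten(celsius: float) -> float:
    """Classic software: a person writes the rule down."""
    return 1.8 * celsius + 32.0


def fit_line(xs, ys):
    """Data-programmed: recover slope and intercept from examples
    using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# The "program" is now this data rather than the formula above.
examples = [(-40.0, -40.0), (0.0, 32.0), (37.0, 98.6), (100.0, 212.0)]
slope, intercept = fit_line([c for c, _ in examples], [f for _, f in examples])


def learned(celsius: float) -> float:
    return slope * celsius + intercept
```

Changing the behavior of `learned` means changing `examples`, not editing code, which is the shift the passage describes.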
link |
And then Chris has been working,
link |
he worked on LLVM, the low level virtual machine,
link |
which became the intermediate representation
link |
for all compilers.
link |
And now he's working on another project called MLIR,
link |
which is mid level intermediate representation,
link |
which is essentially under the graph
link |
about how do you represent that kind of computation
link |
and then coordinate large numbers
link |
of potentially heterogeneous computers.
link |
And I would say technically, Tenstorrent is,
link |
you know, two pillars of those two ideas,
link |
software 2.0 and mid level representation.
link |
But it's in service of executing graph programs.
link |
The hardware is designed to do that.
link |
So it's including the hardware piece.
link |
And then the other cool thing is,
link |
for a relatively small amount of money,
link |
they did a test chip and two production chips.
link |
So it's like a super effective team.
link |
And unlike some AI startups,
link |
where if you don't build the hardware
link |
to run the software that they really want to do,
link |
then you have to fix it by writing lots more software.
link |
So the hardware naturally does matrix multiply,
link |
convolution, the data manipulations,
link |
and the data movement between processing elements
link |
that you can see in the graph,
link |
which I think is all pretty clever.
link |
And that's what I'm working on now.
link |
So the, I think it's called the Grayskull processor.
link |
Introduced last year.
link |
It's, you know, there's a bunch of measures of performance.
link |
We're talking about horses.
link |
It does 368 trillion operations per second.
link |
It seems to outperform NVIDIA's Tesla T4 system.
link |
So these are just numbers.
link |
What do they actually mean in real world performance?
link |
Like what are the metrics for you
link |
that you're chasing in your horse race?
link |
Like what do you care about?
link |
Well, first, so the native language of,
link |
you know, people who write AI network programs
link |
is PyTorch now, PyTorch, TensorFlow.
link |
There's a couple others.
link |
Do you think PyTorch has won over TensorFlow?
link |
I'm not an expert on that.
link |
I know many people who have switched
link |
from TensorFlow to PyTorch.
link |
And there's technical reasons for it.
link |
Both are still awesome.
link |
Both are still awesome.
link |
But the deepest love is for PyTorch currently.
link |
Yeah, there's more love for that.
link |
And that may change.
link |
So the first thing is when they write their programs,
link |
can the hardware execute it pretty much as it was written?
link |
Right, so PyTorch turns into a graph.
link |
We have a graph compiler that makes that graph.
link |
Then it fractions the graph down.
link |
So if you have big matrix multiply,
link |
we turn it into right size chunks
link |
to run on the processing elements.
link |
It hooks all the graph up.
link |
It lays out all the data.
link |
There's a couple of mid level representations of it
link |
that are also simulatable.
link |
So that if you're writing the code,
link |
you can see how it's gonna go through the machine,
link |
which is pretty cool.
link |
And then at the bottom, it schedules kernels,
link |
like math, data manipulation, data movement kernels,
link |
which do this stuff.
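The step described above, turning a big matrix multiply into right size chunks, can be sketched as a blocked (tiled) multiply. This is plain Python for illustration and makes no claim about how the actual graph compiler works: the function names, the `tile` parameter, and the idea that one tile-sized block product maps to one processing element are assumptions.

```python
def naive_matmul(a, b):
    """Reference result: one dot product per output element."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def tiled_matmul(a, b, tile):
    """Same product, computed as a sequence of tile-sized block products."""
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, inner, tile):
                # One chunk of work: a block of A times a block of B,
                # accumulated into the matching block of C.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, inner)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


a = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]]
b = [[1.0, 0.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 2.0, 1.0],
     [3.0, 0.0, 1.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 1.0, 1.0]]

tiled = tiled_matmul(a, b, tile=2)
```

The three outer loops enumerate independent blocks of work, and the `min(...)` bounds handle matrices whose sizes are not multiples of the tile size, as with the 3 by 4 and 4 by 5 inputs here.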
link |
So we don't have to write a little program
link |
to do matrix multiply,
link |
because we have a big matrix multiplier.
link |
There's no SIMD program for that.
link |
But there is scheduling for that, right?
link |
So one of the goals is,
link |
if you write a piece of PyTorch code
link |
that looks pretty reasonable,
link |
you should be able to compile it, run it on the hardware
link |
without having to tweak it
link |
and do all kinds of crazy things to get performance.
link |
There's not a lot of intermediate steps.
link |
It's running directly as written.
link |
Like on a GPU, if you write a large matrix multiply naively,
link |
you'll get five to 10% of the peak performance of the GPU.
link |
Right, and then there's a bunch of people
link |
who've published papers on this,
link |
and I read them, about what steps you have to do.
link |
And it goes from pretty reasonable,
link |
well, transpose one of the matrices.
link |
So you do row ordered, not column ordered,
link |
block it so that you can put a block of the matrix
link |
on different SMs, groups of threads.
link |
But some of it gets into little details,
link |
like you have to schedule it just so,
link |
so you don't have register conflicts.
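The first of those steps, transposing one matrix so both operands are read row by row, can be shown without any GPU. This is a plain Python sketch of the idea only; it says nothing about real GPU performance, and the function names are made up for the example.

```python
def matmul_naive(a, b):
    """Textbook triple loop: walks b column by column (strided access)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_transposed(a, b):
    """Transpose b once, then every inner product reads two contiguous rows."""
    bt = [list(col) for col in zip(*b)]  # bt[j] is column j of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]
```

Both functions return the same result; the difference is only the order in which memory is touched, which is the kind of detail the optimization papers mentioned above spend their time on.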
link |
So they call them CUDA ninjas.
link |
CUDA ninjas, I love it.
link |
To get to the optimal point,
link |
you either use a prewritten library,
link |
which is a good strategy for some things,
link |
or you have to be an expert
link |
in micro architecture to program it.
link |
Right, so the optimization step
link |
is way more complicated with the GPU.
link |
So our goal is if you write PyTorch,
link |
that's good PyTorch, you can do it.
link |
Now there's, as the networks are evolving,
link |
they've changed from convolutional to matrix multiply.
link |
People are talking about conditional graphs,
link |
they're talking about very large matrices,
link |
they're talking about sparsity,
link |
they're talking about problems
link |
that scale across many, many chips.
link |
So the native data item is a packet.
link |
So you send a packet to a processor, it gets processed,
link |
it does a bunch of work,
link |
and then it may send packets to other processors,
link |
and they execute in like a data flow graph
link |
kind of methodology.
link |
We have a big network on chip,
link |
and then the second chip has 16 ethernet ports
link |
to hook lots of them together,
link |
and it's the same graph compiler across multiple chips.
link |
So that's where the scale comes in.
link |
So it's built to scale naturally.
link |
Now, my experience with scaling is as you scale,
link |
you run into lots of interesting problems.
link |
So scaling is the mountain to climb.
link |
So the hardware is built to do this,
link |
and then we're in the process of.
link |
Is there a software part to this
link |
with ethernet and all that?
link |
Well, the protocol at the bottom,
link |
we sent, it's an ethernet PHY,
link |
but the protocol basically says,
link |
send the packet from here to there.
link |
It's all point to point.
link |
The header bit says which processor to send it to,
link |
and we basically take a packet off our on chip network,
link |
put an ethernet header on it,
link |
send it to the other end, strip the header off,
link |
and send it to the local thing.
link |
It's pretty straightforward.
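A minimal sketch of that wrap-and-strip step, using Python's standard `struct` module. The header layout here (a 16-bit chip id followed by a 16-bit processor id) is a hypothetical one invented for the example; the conversation does not describe the real header format.

```python
import struct

# Hypothetical header: destination chip and destination processor,
# two unsigned 16-bit fields in network byte order.
HEADER = struct.Struct("!HH")


def wrap(dest_chip, dest_core, payload):
    """Put a routing header in front of an on-chip packet."""
    return HEADER.pack(dest_chip, dest_core) + payload


def unwrap(frame):
    """Strip the header at the far end and recover the original packet."""
    dest_chip, dest_core = HEADER.unpack_from(frame)
    return dest_chip, dest_core, frame[HEADER.size:]
```

The payload is untouched in transit, so the receiving chip can hand it straight to its local network once the header is removed.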
link |
Human to human interaction is pretty straightforward too,
link |
but when you get a million of us,
link |
we could do some crazy stuff together.
link |
Yeah, it's gonna be fun.
link |
So is the goal scale?
link |
So like, for example, I've been recently
link |
doing a bunch of robots at home
link |
for my own personal pleasure.
link |
Am I going to ever use Tenstorrent, or is this more for?
link |
There's all kinds of problems.
link |
Like, there's small inference problems,
link |
or small training problems, or big training problems.
link |
What's the big goal?
link |
Is it the big training problems,
link |
or the small training problems?
link |
Well, one of the goals is to scale
link |
from 100 milliwatts to a megawatt, you know?
link |
So like, really have some range on the problems,
link |
and the same kind of AI programs
link |
work at all different levels.
link |
So that's the goal.
link |
The natural, since the natural data item
link |
is a packet that we can move around,
link |
it's built to scale, but so many people have small problems.
link |
But the, you know.
link |
Like, inside that phone is a small problem to solve.
link |
So do you see Tenstorrent potentially being inside a phone?
link |
Well, the power efficiency of local memory,
link |
local computation, and the way we built it is pretty good.
link |
And then there's a lot of efficiency
link |
on being able to do conditional graphs and sparsity.
link |
I think it's, for complicated networks
link |
that wanna go in a small form factor, it's gonna be quite good.
link |
But we have to prove that, that's all.
link |
It's a fun problem.
link |
And that's the early days of the company, right?
link |
It's a couple years, you said?
link |
But you think, you invested, you think they're legit.
link |
And so you joined.
link |
That's a really interesting place to be.
link |
Like, the AI world is exploding, you know.
link |
And I looked at some other opportunities
link |
like build a faster processor, which people want.
link |
But that's more on an incremental path
link |
than what's gonna happen in AI in the next 10 years.
link |
So this is kind of, you know,
link |
an exciting place to be part of.
link |
Yeah, the revolutions will be happening
link |
in the very space that Tenstorrent is.
link |
And then lots of people are working on it,
link |
but there's lots of technical reasons why some of them,
link |
you know, aren't gonna work out that well.
link |
And, you know, that's interesting.
link |
And there's also the same problem
link |
about getting the basics right.
link |
Like, we've talked to customers about exciting features.
link |
And at some point we realized that,
link |
Ljubisa and I were realizing they want to hear first
link |
about memory bandwidth, local bandwidth,
link |
compute intensity, programmability.
link |
They want to know the basics, power management,
link |
how the network ports work, what are the basics,
link |
do all the basics work.
link |
Because it's easy to say, we've got this great idea,
link |
you know, to crack GPT3, but the people we talked to
link |
want to say, if I buy the, so we have a PCI Express card
link |
with our chip on it, if you buy the card,
link |
you plug it in your machine, you download the driver,
link |
how long does it take me to get my network to run?
link |
You know, that's a real question.
link |
It's a very basic question.
link |
Is there an answer to that yet,
link |
or is it trying to get to that?
link |
Our goal is like an hour.
link |
When can I buy a Tenstorrent?
link |
Or my, for the small case training.
link |
Yeah, pretty soon.
link |
I love the idea of you inside the room
link |
with Karpathy, Andrej Karpathy, and Chris Lattner.
link |
Very, very interesting, very brilliant people,
link |
very out of the box thinkers,
link |
but also like first principles thinkers.
link |
Well, they both get stuff done.
link |
They not only get their own projects done.
link |
They talk about it clearly.
link |
They educate large numbers of people,
link |
and they've created platforms for other people
link |
to go do their stuff on.
link |
Yeah, the clear thinking that's able to be communicated
link |
is kind of impressive.
link |
It's kind of remarkable to, yeah, I'm a fan.
link |
because I talk to Chris actually a lot these days.
link |
He's been one of the, just to give him a shout out,
link |
he's been so supportive as a human being.
link |
So everybody's quite different.
link |
Like great engineers are different,
link |
but he's been like sensitive to the human element
link |
in a way that's been fascinating.
link |
Like he was one of the early people
link |
on this stupid podcast that I do to say like,
link |
don't quit this thing,
link |
and also talk to whoever the hell you want to talk to.
link |
That kind of from a legit engineer to get like props
link |
and be like, you can do this.
link |
That was, I mean, that's what a good leader does, right?
link |
To just kind of let a little kid do his thing,
link |
like go do it, let's see what turns out.
link |
That's a pretty powerful thing.
link |
But what do you, what's your sense about,
link |
he used to be, now I think he's stepped away from Google, right?
link |
He's at SiFive, I think.
link |
What's really impressive to you
link |
about the things that Chris has worked on?
link |
Because we mentioned the optimization,
link |
the compiler design stuff, the LLVM,
link |
then there's, at Google he also worked on the TPU stuff.
link |
He's obviously worked on Swift,
link |
so the programming language side.
link |
Talking about people that work in the entirety of the stack.
link |
What, from your time interacting with Chris
link |
and knowing the guy, what's really impressive to you
link |
that just inspires you?
link |
Well, like LLVM became the de facto platform
link |
for compilers.
link |
And it was good code quality, good design choices.
link |
He hit the right level of abstraction.
link |
There's a little bit of the right time, the right place.
link |
And then he built a new programming language called Swift,
link |
which after, let's say some adoption resistance
link |
became very successful.
link |
I don't know that much about his work at Google,
link |
although I know that that was a typical,
link |
they started TensorFlow stuff and it was new.
link |
They wrote a lot of code and then at some point
link |
it needed to be refactored to be,
link |
because its development slowed down,
link |
while PyTorch started a little later and then passed it.
link |
So he did a lot of work on that.
link |
And then his idea about MLIR,
link |
which is what people started to realize
link |
is the complexity of the software stack above
link |
the low level IR was getting so high
link |
that forcing the features of that into the low level
link |
was putting too much of a burden on it.
link |
So he's splitting that into multiple pieces.
link |
And that was one of the inspirations for our software stack
link |
where we have several intermediate representations
link |
that are all executable and you can look at them
link |
and do transformations on them before you lower the level.
link |
So that was, I think we started before MLIR
link |
really got far enough along to use,
link |
but we're interested in that.
link |
He's really excited about MLIR.
link |
That's his like little baby.
link |
So he, and there seems to be some profound ideas on that
link |
that are really useful.
link |
So each one of those things has been,
link |
as the world of software gets more and more complicated,
link |
how do we create the right abstraction levels
link |
to simplify it in a way that people can now work independently
link |
on different levels of it?
link |
So I would say all three of those projects,
link |
LLVM, Swift, and MLIR did that successfully.
link |
So I'm interested in what he's gonna do next
link |
in the same kind of way.
link |
On either the TPU or maybe the Nvidia GPU side,
link |
how does Tenstorrent, you think, or the ideas underlying it,
link |
it doesn't have to be Tenstorrent,
link |
Just this kind of graph focused,
link |
graph centric hardware, deep learning centric hardware,
link |
beat NVIDIA's? Do you think it's possible
link |
for it to basically overtake NVIDIA?
link |
What's that process look like?
link |
What's that journey look like, you think?
link |
Well, GPUs were built to run shader programs
link |
on millions of pixels, not to run graphs.
link |
So there's a hypothesis that says
link |
the way the graphs are built
link |
is going to be really
link |
inefficient on computing this.
link |
And then the primitive is not a SIMD program,
link |
it's matrix multiply, convolution.
link |
And then the data manipulations are fairly extensive about,
link |
like, how do you do a fast transpose with a program?
link |
I don't know if you've ever written a transpose program.
link |
They're ugly and slow, but in hardware,
link |
you can do really well.
link |
Like, I'll give you an example.
link |
So when GPU accelerators first started doing triangles,
link |
like, so you have a triangle
link |
which maps on a set of pixels.
link |
So you build, it's very easy,
link |
straightforward to build a hardware engine
link |
that'll find all those pixels.
link |
And it's kind of weird
link |
because you walk along the triangle to get to the edge,
link |
and then you have to go back down to the next row
link |
and walk along, and then you have to decide on the edge
link |
if the line of the triangle is like half on the pixel,
link |
what's the pixel color?
link |
Because it's half of this pixel and half the next one.
link |
That's called rasterization.
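For a sense of what a software rasterizer looks like, here is a minimal sketch. It uses the common edge-function test over a bounding box rather than the row-by-row edge walk described above, and it settles the edge question with a simple rule (a pixel is covered if its center is inside or on the triangle). Both choices are assumptions made for the example, not how any particular hardware does it.

```python
import math


def edge(a, b, p):
    """Signed area test: which side of the line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2):
    """Return the (x, y) pixels whose centers fall inside the triangle."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    covered = []
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w = (edge(v0, v1, p), edge(v1, v2, p), edge(v2, v0, p))
            # Accept either vertex winding order.
            if all(e >= 0 for e in w) or all(e <= 0 for e in w):
                covered.append((x, y))
    return covered
```

Even this simple version tests every pixel in the bounding box three times, which hints at why a dedicated hardware engine handles the job so much better than a general program.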
link |
And you're saying that could be done in hardware?
link |
No, that's an example of that operation
link |
as a software program is really bad.
link |
I've written a program that did rasterization.
link |
The hardware that does it has actually less code
link |
than the software program that does it,
link |
and it's way faster.
link |
Right, so there are certain times
link |
when the abstraction you have, rasterize a triangle,
link |
you know, execute a graph, you know, components of a graph.
link |
But the right thing to do in the hardware software boundary
link |
is for the hardware to naturally do it.
link |
And so the GPU is really optimized
link |
for the rasterization of triangles.
link |
Well, you know, that's just, well, like in a modern,
link |
you know, that's a small piece of modern GPUs.
link |
What they did is that they still rasterize triangles
link |
when you're running in a game, but for the most part,
link |
most of the computation in the area of the GPU
link |
is running shader programs.
link |
But they're single threaded programs on pixels, not graphs.
link |
I have to be honest, I'd say I don't actually know
link |
the math behind shader, shading and lighting
link |
and all that kind of stuff.
link |
I don't know what.
link |
They look like little simple floating point programs
link |
or complicated ones.
link |
You can have 8,000 instructions in a shader program.
link |
But I don't have a good intuition
link |
why it could be parallelized so easily.
link |
No, it's because you have 8 million pixels in every single.
link |
So when you have a light, right, that comes down,
link |
the angle, you know, the amount of light,
link |
like say this is a line of pixels across this table, right?
link |
The amount of light on each pixel is subtly different.
link |
And each pixel is responsible for figuring out what.
link |
So that pixel says, I'm this pixel.
link |
I know the angle of the light.
link |
I know the occlusion.
link |
I know the color I am.
link |
Like every single pixel here is a different color.
link |
Every single pixel gets a different amount of light.
link |
Every single pixel has a subtly different translucency.
link |
So to make it look realistic,
link |
the solution was you run a separate program on every pixel.
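The "separate program on every pixel" idea can be sketched as one small function applied independently to each pixel. The lighting model below is plain Lambert diffuse shading, chosen for brevity; the data layout (a dict with `color` and `normal`) is a hypothetical one for the example, and real shaders are far richer.

```python
def shade(base_color, normal, light_dir, intensity=1.0):
    """Per-pixel program: scale the color by how directly light hits it."""
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(c * n_dot_l * intensity for c in base_color)


def render(pixels, light_dir):
    """Run the same program on every pixel; no pixel depends on another."""
    return [shade(p["color"], p["normal"], light_dir) for p in pixels]
```

Since no call to `shade` reads another pixel's result, the loop in `render` could run in any order or all at once, which is what makes this workload so easy to spread across thousands of threads.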
link |
See, but I thought there's like reflection
link |
from all over the place.
link |
Every pixel. Yeah, but there is.
link |
So you build a reflection map,
link |
which also has some pixelated thing.
link |
And then when the pixel is looking at the reflection map,
link |
it has to calculate what the normal of the surface is.
link |
And it does it per pixel.
link |
By the way, there's boatloads of hacks on that.
link |
You know, like you may have a lower resolution light map,
link |
your reflection map.
link |
There's all these, you know, hacks they do.
link |
But at the end of the day, it's per pixel computation.
link |
And it so happens that you can map
link |
graph-like computation onto this pixel-centric computation.
link |
You can do floating point programs
link |
on convolutions and the matrices.
link |
And Nvidia invested for years in CUDA.
link |
First for HPC, and then they got lucky with the AI trend.
link |
But do you think they're going to essentially
link |
not be able to hardcore pivot out of their?
link |
That's always interesting.
link |
How often do big companies hardcore pivot?
link |
How much do you know about Nvidia, folks?
link |
Well, I'm curious as well.
link |
Who's ultimately, as a...
link |
Well, they've innovated several times.
link |
But they've also worked really hard on mobile.
link |
They've worked really hard on radios.
link |
You know, they're fundamentally a GPU company.
link |
Well, they tried to pivot.
link |
There's an interesting little game and play
link |
in autonomous vehicles, right?
link |
With, or semi autonomous, like playing with Tesla
link |
and so on and seeing that's dipping a toe
link |
into that kind of pivot.
link |
They came out with this platform,
link |
which is interesting technically.
link |
But it was like a 3000 watt, you know,
link |
3000 watt, $3,000 GPU platform.
link |
I don't know if it's interesting technically.
link |
It's interesting philosophically.
link |
Technically, I don't know if the execution,
link |
the craftsmanship is there.
link |
But I didn't get a sense.
link |
I think they were repurposing GPUs
link |
for an automotive solution.
link |
Right, it's not a real pivot.
link |
They didn't build a ground up solution.
link |
Like the chips inside Tesla are pretty cheap.
link |
Like Mobileye has been doing this.
link |
They're doing the classic work from the simplest thing.
link |
I mean, 40 square millimeter chips.
link |
And Nvidia, their solution had 800 millimeter chips
link |
and two 200 millimeter chips.
link |
And, you know, like boatloads of really expensive DRAMs.
link |
And, you know, it's a really different approach.
link |
And Mobileye fit the, let's say,
link |
automotive cost and form factor.
link |
And then they added features as it was economically viable.
link |
And Nvidia said, take the biggest thing
link |
and we're gonna go make it work.
link |
You know, and that's also influenced like Waymo.
link |
There's a whole bunch of autonomous startups
link |
where they have a 5,000 watt server in their trunk.
link |
But that's because they think, well, 5,000 watts
link |
and, you know, $10,000 is okay
link |
because it's replacing a driver.
link |
Elon's approach was that part has to be cheap enough
link |
to put it in every single Tesla,
link |
whether they turn on autonomous driving or not.
link |
Which, and Mobileye was like,
link |
we need to fit in the BOM and, you know,
link |
cost structure that car companies do.
link |
So they may sell you a GPS for 1500 bucks,
link |
but the BOM for that, it's like $25.
link |
Well, and for Mobileye, it seems like neural networks
link |
were not first class citizens, like the computation.
link |
They didn't start out as a...
link |
Yeah, it was a CV problem.
link |
And they did classic CV and found stoplights and lines.
link |
And they were really good at it.
link |
Yeah, and they never, I mean,
link |
I don't know what's happening now,
link |
but they never fully pivoted.
link |
I mean, it's like, it's the Nvidia thing.
link |
And then as opposed to,
link |
so if you look at the new Tesla work,
link |
it's like neural networks from the ground up, right?
link |
Yeah, and even Tesla started with a lot of CV stuff in it
link |
and Andrej's basically been eliminating it.
link |
Move everything into the network.
link |
So without, this isn't like confidential stuff,
link |
but you sitting on a porch, looking over the world,
link |
looking at the work that Andrej's doing,
link |
that Elon's doing with Tesla Autopilot,
link |
do you like the trajectory of where things are going
link |
on the hardware side?
link |
Well, they're making serious progress.
link |
I like the videos of people driving the beta stuff.
link |
I guess taking some pretty complicated intersections
link |
and all that, but it's still an intervention per drive.
link |
I mean, I have autopilot, the current autopilot,
link |
my Tesla, I use it every day.
link |
Do you have full self driving beta or no?
link |
So you like where this is going?
link |
They're making progress.
link |
It's taking longer than anybody thought.
link |
You know, my wonder is, you know, hardware three,
link |
is it enough computing? Off by two, off by five,
link |
off by 10, off by a hundred?
link |
And I thought it probably wasn't enough,
link |
but they're doing pretty well with it now.
link |
And one thing is the data set gets bigger,
link |
the training gets better.
link |
And then there's this interesting thing is you sort of train
link |
and build an arbitrary size network that solves the problem.
link |
And then you refactor the network down to the thing
link |
that you can afford to ship, right?
link |
So the goal isn't to build a network that fits in the phone.
link |
It's to build something that actually works.
link |
And then how do you make that most effective
link |
on the hardware you have?
link |
And they seem to be doing that much better
link |
than a couple of years ago.
link |
Well, the one really important thing is also
link |
what they're doing well is how to iterate that quickly,
link |
which means like it's not just about one time deployment,
link |
one building, it's constantly iterating the network
link |
and trying to automate as many steps as possible, right?
link |
And that's actually the principles of the Software 2.0,
link |
like you mentioned with Andrej, it's not just,
link |
I mean, I don't know what the actual,
link |
his description of Software 2.0 is.
link |
If it's just high level philosophical or there are specifics,
link |
but the interesting thing about what that actually looks like
link |
in the real world is what I think Andrej calls
link |
the data engine, it's like it's the iterative improvement
link |
You have a neural network that does stuff,
link |
fails on a bunch of things and learns from it
link |
over and over and over.
link |
So you're constantly discovering edge cases.
link |
So it's very much about like data engineering,
link |
like figuring out, it's kind of what you were talking about
link |
with Tenstorrent is you have the data landscape.
link |
And you have to walk along that data landscape
link |
in a way that is constantly improving the neural network.
link |
And that feels like that's the central piece of it.
link |
And there's two pieces of it.
link |
Like you find edge cases that don't work
link |
and then you define something that goes
link |
and gets you data for that.
link |
But then the other constraint is whether you have
link |
to label it or not.
link |
Like the amazing thing about like the GPT3 stuff
link |
is it's unsupervised.
link |
So there's essentially infinite amount of data.
link |
Now there's obviously infinite amount of data available
link |
from cars of people successfully driving.
link |
But the current pipelines are mostly running
link |
on labeled data, which is human limited.
link |
So when that becomes unsupervised,
link |
it'll create unlimited amount of data,
link |
which then they'll scale.
link |
Now the networks that may use that data
link |
might be way too big for cars,
link |
but then there'll be the transformation from now
link |
we have unlimited data, I know exactly what I want.
link |
Now can I turn that into something that fits in the car?
link |
And that process is gonna happen all over the place.
link |
Every time you get to the place where you have
link |
unlimited data, and that's what software 2.0 is about,
link |
unlimited data training networks to do stuff
link |
without humans writing code to do it.
link |
And ultimately also trying to discover,
link |
like you're saying, the self supervised formulation
link |
So the unsupervised formulation of the problem.
link |
Like in driving, there's this really interesting thing,
link |
which is you look at a scene that's before you,
link |
and you have data about what a successful human driver did
link |
in that scene one second later.
link |
It's a little piece of data that you can use
link |
just like with GPT3 as training.
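The pairing described here, a scene now matched with what the human driver did one second later, needs no human labels. A minimal sketch of building such pairs from a time-ordered drive log follows; the log layout of `(timestamp_s, frame, action)` tuples and the name `make_pairs` are assumptions for the example, not any company's actual pipeline.

```python
def make_pairs(log, horizon_s=1.0):
    """Pair each frame with the first recorded action at least horizon_s later.

    log: list of (timestamp_s, frame, action) tuples sorted by timestamp.
    """
    pairs = []
    j = 0
    for t, frame, _ in log:
        # Advance to the first entry at or past the target time.
        while j < len(log) and log[j][0] < t + horizon_s:
            j += 1
        if j == len(log):
            break  # no future action recorded for the remaining frames
        pairs.append((frame, log[j][2]))
    return pairs
```

Every recorded drive yields training pairs this way, which is why the supply of data is limited by collection and compute rather than by labeling effort.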
link |
Currently, even though Tesla says they're using that,
link |
it's an open question to me, how far can you,
link |
can you solve all of the driving
link |
with just that self supervised piece of data?
link |
And like, I think.
link |
Well, that's what Comma AI is doing.
link |
That's what Comma AI is doing,
link |
but the question is how much data.
link |
So what Comma AI doesn't have is as good
link |
of a data engine, for example, as Tesla does.
link |
That's where the, like the organization of the data.
link |
I mean, as far as I know, I haven't talked to George,
link |
but they do have the data.
link |
The question is how much data is needed,
link |
because we say infinite very loosely here.
link |
And then the other question, which you said,
link |
I don't know if you think it's still an open question is,
link |
are we on the right order of magnitude
link |
for the compute necessary?
link |
That is this, is it like what Elon said,
link |
this chip that's in there now is enough
link |
to do full self driving,
link |
or do we need another order of magnitude?
link |
I think nobody actually knows the answer to that question.
link |
I like the confidence that Elon has, but.
link |
There's another funny thing is you don't learn to drive
link |
with infinite amounts of data.
link |
You learn to drive with an intellectual framework
link |
that understands physics and color and horizontal surfaces
link |
and laws and roads and all your experience
link |
from manipulating your environment.
link |
Like, look, there's so many factors that go into that.
link |
So then when you learn to drive,
link |
like driving is a subset of this conceptual framework
link |
that you have, right?
link |
And so with self driving cars right now,
link |
we're teaching them to drive with driving data.
link |
You never teach a human to do that.
link |
You teach a human all kinds of interesting things,
link |
like language, like don't do that, watch out.
link |
There's all kinds of stuff going on.
link |
Well, this is where you, I think previous time
link |
we talked about where you poetically disagreed
link |
with my naive notion about humans.
link |
I just think that humans will make
link |
this whole driving thing really difficult.
link |
I said, humans don't move that slow.
link |
It's a ballistics problem.
link |
It's a ballistics, humans are a ballistics problem,
link |
which is like poetry to me.
link |
It's very possible that in driving
link |
they're indeed purely a ballistics problem.
link |
And I think that's probably the right way to think about it.
link |
But I still, they still continue to surprise me,
link |
those damn pedestrians, the cyclists,
link |
other humans in other cars and.
link |
Yeah, but it's gonna be one of these compensating things.
link |
So like when you're driving,
link |
you have an intuition about what humans are going to do,
link |
but you don't have 360 cameras and radars
link |
and you have an attention problem.
link |
So the self driving car comes in with no attention problem,
link |
360 cameras right now, a bunch of other features.
link |
So they'll wipe out a whole class of accidents, right?
link |
And emergency braking with radar
link |
and especially as it gets AI enhanced
link |
will eliminate collisions, right?
link |
But then you have the other problems
link |
of these unexpected things where
link |
you think your human intuition is helping,
link |
but then the cars also have a set of hardware features
link |
that you're not even close to.
link |
And the key thing of course is if you wipe out
link |
a huge number of kinds of accidents,
link |
then it might be just way safer than a human driver,
link |
even though, even if humans are still a problem,
link |
that's hard to figure out.
link |
Yeah, that's probably what will happen.
link |
Those autonomous cars will have a small number of accidents
link |
humans would have avoided, but they'll wipe,
link |
they'll get rid of the bulk of them.
link |
What do you think about like Tesla's Dojo efforts
link |
or it can be bigger than Tesla in general.
link |
It's kind of like Tenstorrent trying to innovate,
link |
like this is the dichotomy, like should a company
link |
try to from scratch build its own
link |
neural network training hardware?
link |
Well, first of all, I think it's great.
link |
So we need lots of experiments, right?
link |
And there's lots of startups working on this
link |
and they're pursuing different things.
link |
I was there when we started Dojo and it was sort of like,
link |
what's the unconstrained computer solution
link |
to go do very large training problems?
link |
And then there's fun stuff like, we said,
link |
well, we have this 10,000 watt board to cool.
link |
Well, you go talk to guys at SpaceX
link |
and they think 10,000 watts is a really small number,
link |
And there's brilliant people working on it.
link |
I'm curious to see how it'll come out.
link |
I couldn't tell you, I know it pivoted
link |
a few times since I left, so.
link |
So the cooling does seem to be a big problem.
link |
I do like what Elon said about it, which is like,
link |
we don't wanna do the thing unless it's way better
link |
than the alternative, whatever the alternative is.
link |
So it has to be way better than like racks of GPUs.
link |
Yeah, and the other thing is just like,
link |
you know, the Tesla autonomous driving hardware,
link |
it was only serving one software stack.
link |
And the hardware team and the software team
link |
were tightly coupled.
link |
You know, if you're building a general purpose AI solution,
link |
then you know, there's so many different customers
link |
with so many different needs.
link |
Now, something Andrej said is, I think this is amazing.
link |
10 years ago, like vision, recommendation, language,
link |
were completely different disciplines.
link |
He said, the people literally couldn't talk to each other.
link |
And three years ago, it was all neural networks,
link |
but very different neural networks.
link |
And recently, it's converging on one set of networks.
link |
They vary a lot in size, obviously, they vary in data,
link |
vary in outputs, but the technology has converged
link |
Yeah, these transformers behind GPT3,
link |
it seems like they could be applied to video,
link |
they could be applied to a lot of, and it's like,
link |
and they're all really simple.
link |
And it was like they literally replace letters with pixels.
link |
It does vision, it's amazing.
link |
And then size actually improves the thing.
link |
So the bigger it gets, the more compute you throw at it,
link |
the better it gets.
link |
And the more data you have, the better it gets.
link |
So then you start to wonder, well,
link |
is that a fundamental thing?
link |
Or is this just another step to some fundamental understanding
link |
about this kind of computation?
link |
Which is really interesting.
link |
Us humans don't want to believe that that kind of thing
link |
will achieve conceptual understandings, you were saying,
link |
like you'll figure out physics, but maybe it will.
link |
Well, it's worse than that.
link |
It'll understand physics in ways that we can't understand.
link |
I like your Stephen Wolfram talk where he said,
link |
you know, there's three generations of physics.
link |
There was physics by reasoning.
link |
Well, big things should fall faster than small things,
link |
And then there's physics by equations.
link |
Like, you know, but the number of programs in the world
link |
that are solved with a single equation is relatively low.
link |
Almost all programs have, you know,
link |
more than one line of code, maybe 100 million lines of code.
link |
So he said, then now we're going to physics by computation,
link |
which is his project, which is cool.
link |
I might point out there was two generations of physics
link |
before reasoning: habit.
link |
Like all animals, you know, know things fall
link |
and, you know, birds fly and, you know, predators know
link |
how to, you know, solve a differential equation
link |
to cut off an accelerating, you know, curving animal path.
link |
And then there was, you know, the gods did it, right?
link |
So there was, you know, there's five generations.
link |
Now, software 2.0 says programming things
link |
is not the last step.
link |
So there's going to be a physics past Stephen Wolfram's con.
link |
That's not explainable to us humans.
link |
And actually there's no reason that I can see
link |
well that even that's the limit.
link |
Like, there's something beyond that.
link |
I mean, they're usually, like, usually when you have
link |
this hierarchy, it's not like, well, if you have this step
link |
and this step and this step and they're all qualitatively
link |
different and conceptually different, it's not obvious why,
link |
you know, six is the right number of hierarchy steps
link |
and not seven or eight or.
link |
Well, then it's probably impossible for us to,
link |
to comprehend something that's beyond the thing
link |
that's not explainable.
link |
But the thing that, you know, understands the thing
link |
that's not explainable to us will conceive the next one.
link |
And like, I'm not sure why there's a limit to it.
link |
'Til your brain hurts.
link |
That's a sad story.
link |
If we look at our own brain, which is an interesting
link |
illustrative example in your work with Tenstorrent
link |
and trying to design deep learning architectures,
link |
do you think about the brain at all?
link |
Maybe from a hardware designer perspective,
link |
if you could change something about the brain,
link |
what would you change or do?
link |
Like, how would you do it?
link |
So your brain is really weird.
link |
Like, you know, your cerebral cortex where we think
link |
we do most of our thinking is what,
link |
like six or seven neurons thick?
link |
Like, that's weird.
link |
Like all the big networks are way bigger than that.
link |
So that seems odd.
link |
And then, you know, when you're thinking if it's,
link |
if the input generates a result you can use,
link |
it goes really fast.
link |
But if it can't, that generates an output
link |
that's interesting, which turns into an input
link |
and then your brain to the point where you mull things
link |
over for days and how many trips
link |
through your brain is that, right?
link |
Like it's, you know, 300 milliseconds or something
link |
to get through seven levels of neurons.
link |
I forget the number exactly.
link |
But then it does it over and over and over as it searches.
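
[Editor's aside: the rhetorical question above can be answered with quick arithmetic. The sketch below is only illustrative. It takes the roughly 300 milliseconds per pass that Jim mentions (a figure he says he does not remember exactly) and assumes, hypothetically, that passes run strictly back to back with no parallelism. The function name and the default latency are assumptions made for this example, not anything stated in the conversation.]

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def sequential_passes(duration_ms: int, pass_latency_ms: int = 300) -> int:
    """Count how many back-to-back passes fit into a time span.

    pass_latency_ms defaults to the ~300 ms figure quoted in the
    conversation; treat it as a rough assumption, not a measurement.
    """
    if pass_latency_ms <= 0:
        raise ValueError("pass latency must be positive")
    # Integer division keeps the arithmetic exact (no float rounding).
    return duration_ms // pass_latency_ms


if __name__ == "__main__":
    for days in (1, 3):
        print(days, "day(s):", sequential_passes(days * MS_PER_DAY), "passes")
```

[Under those assumptions, one day of mulling corresponds to 288,000 sequential passes, and three days to 864,000.]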
link |
And the brain clearly looks like some kind of graph
link |
because you have a neuron with connections
link |
and it talks to other ones
link |
and it's locally very computationally intense,
link |
but it also does sparse computations
link |
across a pretty big area.
link |
There's a lot of messy biological type of things
link |
and it's meaning like, first of all,
link |
there's mechanical, chemical and electrical signals.
link |
It's all that's going on.
link |
Then there's the asynchronicity of signals.
link |
And there's like, there's just a lot of variability
link |
that seems continuous and messy
link |
and just the mess of biology.
link |
And it's unclear whether that's a good thing
link |
or it's a bad thing, because if it's a good thing
link |
that we need to run the entirety of the evolution,
link |
well, we're gonna have to start with basic bacteria
link |
to create something.
link |
So imagine we could control,
link |
you could build a brain with 10 layers.
link |
Would that be better or worse?
link |
Or more connections or less connections,
link |
or we don't know to what level our brains are optimized.
link |
But if I was changing things,
link |
like you can only hold like seven numbers in your head.
link |
Like why not a hundred or a million?
link |
Never thought of that.
link |
And why can't we have like a floating point processor
link |
that can compute anything we want
link |
and see it all properly?
link |
Like that would be kind of fun.
link |
And why can't we see in four or eight dimensions?
link |
Because 3D is kind of a drag.
link |
Like all the hard math transforms
link |
are up in multiple dimensions.
link |
So you could imagine a brain architecture
link |
that you could enhance with a whole bunch of features
link |
that would be really useful for thinking about things.
link |
It's possible that the limitations you're describing
link |
are actually essential for like the constraints
link |
are essential for creating like the depth of intelligence.
link |
Like that, the ability to reason.
link |
because like your brain is clearly a parallel processor.
link |
10 billion neurons talking to each other
link |
at a relatively low clock rate.
link |
But it produces something
link |
that looks like a serial thought process.
link |
It's a serial narrative in your head.
link |
But then there are people famously who are visual thinkers.
link |
Like I think I'm a relatively visual thinker.
link |
I can imagine any object and rotate it in my head
link |
And there are people who say
link |
they don't think that way at all.
link |
And recently I read an article about people
link |
who say they don't have a voice in their head.
link |
But when they, you know, it's like,
link |
well, what are you thinking?
link |
No, they'll describe something that's visual.
link |
So that's curious.
link |
Now, if you're saying,
link |
if we dedicated more hardware to holding information,
link |
like, you know, 10 numbers or a million numbers,
link |
like would that distract us from our ability
link |
to form this kind of singular identity?
link |
Like it dissipates somehow.
link |
But maybe, you know, future humans
link |
will have many identities
link |
that have some higher level organization
link |
but can actually do lots more things in parallel.
link |
Yeah, there's no reason, if we're thinking modularly,
link |
there's no reason we can't have multiple consciousnesses
link |
Yeah, and maybe there's some way to make it faster
link |
so that the, you know, the area of the computation
link |
could still have a unified feel to it
link |
while still having way more ability
link |
to do parallel stuff at the same time.
link |
Could definitely be improved.
link |
Could be improved?
link |
Okay, well, it's pretty good right now.
link |
Actually, people don't give it enough credit.
link |
The thing is pretty nice.
link |
The, you know, the fact that the ride ends
link |
seem to be, give a nice, like,
link |
spark of beauty to the whole experience.
link |
I don't know if it can be improved easily.
link |
It could be more beautiful.
link |
I don't know how, I, what?
link |
What do you mean, what do you mean how?
link |
All the ways you can't imagine.
link |
No, but that's the whole point.
link |
I wouldn't be able to,
link |
the fact that I can imagine ways
link |
in which it could be more beautiful means.
link |
So do you know, you know, Iain Banks, his stories?
link |
So the super smart AIs there live,
link |
mostly live in the world of what they call infinite fun
link |
because they can create arbitrary worlds.
link |
So they interact in, you know, the story has it.
link |
They interact in the normal world and they're very smart
link |
and they can do all kinds of stuff.
link |
And, you know, a given mind can, you know,
link |
talk to a million humans at the same time
link |
because we're very slow and for reasons,
link |
you know, artificial to the story,
link |
they're interested in people and doing stuff,
link |
but they mostly live in this other land of thinking.
link |
My inclination is to think that the ability
link |
to create infinite fun will not be so fun.
link |
Well, there are so many things to do.
link |
Imagine being able to make a star move planets around.
link |
Yeah, yeah, but because we can imagine that
link |
is why life is fun, if we actually were able to do it,
link |
it would be a slippery slope
link |
where fun wouldn't even have a meaning
link |
because we just consistently desensitize ourselves
link |
by the infinite amounts of fun we're having.
link |
And the sadness, the dark stuff is what makes it fun.
link |
I think that could be the Russian.
link |
It could be the fun makes it fun
link |
and the sadness makes it bittersweet.
link |
Yeah, that's true.
link |
Fun could be the thing that makes it fun.
link |
So what do you think about the expansion,
link |
not through the biology side,
link |
but through the BCI, the brain computer interfaces?
link |
Yeah, you got a chance to check out the Neuralink stuff.
link |
It's super interesting.
link |
Like humans like our thoughts to manifest as action.
link |
You know, like as a kid, you know,
link |
like shooting a rifle was super fun,
link |
driving a mini bike, doing things.
link |
And then computer games, I think,
link |
for a lot of kids became the thing
link |
where they can do what they want.
link |
They can fly a plane, they can do this, they can do this.
link |
But you have to have this physical interaction.
link |
Now imagine, you could just imagine stuff and it happens.
link |
Like really richly and interestingly.
link |
Like we kind of do that when we dream.
link |
Like dreams are funny because like if you have some control
link |
or awareness in your dreams,
link |
like it's very realistic looking,
link |
or not realistic looking, it depends on the dream.
link |
But you can also manipulate that.
link |
And you know, what's possible there is odd.
link |
And the fact that nobody understands it, it's hilarious, but.
link |
Do you think it's possible to expand
link |
that capability through computing?
link |
Is there some interesting,
link |
so from a hardware designer perspective,
link |
is there, do you think it'll present totally new challenges
link |
in the kind of hardware required that like,
link |
so this hardware isn't standalone computing.
link |
Well, this is now working with the brain.
link |
So today, computer games are rendered by GPUs.
link |
Right, so, but you've seen the GAN stuff, right?
link |
Where trained neural networks render realistic images,
link |
but there's no pixels, no triangles, no shaders,
link |
no light maps, no nothing.
link |
So the future of graphics is probably AI, right?
link |
AI is heavily trained by lots of real data, right?
link |
So if you have an interface with an AI renderer, right?
link |
So if you say render a cat, it won't say,
link |
well, how tall's the cat and how big it,
link |
you know, it'll render a cat.
link |
And you might say, oh, a little bigger, a little smaller,
link |
you know, make it a tabby, shorter hair.
link |
You know, like you could tweak it.
link |
Like the amount of data you'll have to send
link |
to interact with a very powerful AI renderer
link |
But the question is brain computer interfaces
link |
would need to render not onto a screen,
link |
but render onto the brain and like directly
link |
so that there's a bandwidth.
link |
Well, it could do it both ways.
link |
I mean, our eyes are really good sensors.
link |
They could render onto a screen
link |
and we could feel like we're participating in it.
link |
You know, they're gonna have, you know,
link |
like the Oculus kind of stuff.
link |
It's gonna be so good when a projection to your eyes,
link |
you think it's real.
link |
You know, they're slowly solving those problems.
link |
And I suspect when the renderer of that information
link |
into your head is also AI mediated,
link |
they'll be able to give you the cues that, you know,
link |
you really want for depth and all kinds of stuff.
link |
Like your brain is partly faking your visual field, right?
link |
Like your eyes are twitching around,
link |
but you don't notice that.
link |
Occasionally they blank, you don't notice that.
link |
You know, there's all kinds of things.
link |
Like you think you see over here,
link |
but you don't really see there.
link |
It's all fabricated.
link |
Yeah, peripheral vision is fascinating.
link |
So if you have an AI renderer that's trained
link |
to understand exactly how you see
link |
and the kind of things that enhance the realism
link |
of the experience, it could be super real actually.
link |
So I don't know what the limits to that are,
link |
but obviously if we have a brain interface
link |
that goes inside your visual cortex
link |
in a better way than your eyes do, which is possible,
link |
it's a lot of neurons, maybe that'll be even cooler.
link |
Well, the really cool thing is that it has to do
link |
with the infinite fun that you were referring to,
link |
which is our brains seem to be very limited.
link |
And like you said, computations.
link |
It's also very plastic.
link |
Very plastic, yeah.
link |
Yeah, so it's an interesting combination.
link |
The interesting open question is the limits
link |
of that neuroplasticity, like how flexible is that thing?
link |
Because we haven't really tested it.
link |
We know about the experiments
link |
where they put like a pressure pad on somebody's head
link |
and had a visual transducer pressurize it
link |
and somebody slowly learned to see.
link |
Especially at a young age, if you throw a lot at it,
link |
like what can it, so can you like arbitrarily expand it
link |
with computing power?
link |
So connected to the internet directly somehow?
link |
Yeah, the answer's probably yes.
link |
So the problem with biology and ethics
link |
is like there's a mess there.
link |
Like us humans are perhaps unwilling to take risks
link |
into directions that are full of uncertainty.
link |
So it's like. No, no.
link |
90% of the population's unwilling to take risks.
link |
The other 10% is rushing into the risks
link |
unaided by any infrastructure whatsoever.
link |
And that's where all the fun happens in society.
link |
There's been huge transformations
link |
in the last couple thousand years.
link |
I got a chance to interact with this Matthew Johnson
link |
from Johns Hopkins.
link |
He's doing this large scale study of psychedelics.
link |
It's becoming more and more,
link |
I've gotten a chance to interact
link |
with that community of scientists working on psychedelics.
link |
But because of that, that opened the door to me
link |
to all these, what do they call it?
link |
Psychonauts, the people who, like you said,
link |
the 10% who are like, I don't care.
link |
I don't know if there's a science behind this.
link |
I'm taking this spaceship to,
link |
if I'm being the first on Mars, I'll be.
link |
Psychedelics are interesting in the sense
link |
that in another dimension, like you said,
link |
it's a way to explore the limits of the human mind.
link |
Like, what is this thing capable of doing?
link |
Because you kind of, like when you dream, you detach it.
link |
I don't know exactly the neuroscience of it,
link |
but you detach your reality from what your mind,
link |
the images your mind is able to conjure up
link |
and your mind goes into weird places and entities appear.
link |
Somehow Freudian type of trauma
link |
is probably connected in there somehow,
link |
but you start to have these weird, vivid worlds that like.
link |
So do you actively dream?
link |
I have like six hours of dreams a night.
link |
It's like really useful time.
link |
I know, I haven't, I don't for some reason.
link |
I just knock out and I have sometimes anxiety inducing
link |
kind of like very pragmatic nightmare type of dreams,
link |
but nothing fun, nothing.
link |
I try, I unfortunately have mostly have fun
link |
in the waking world, which is very limited
link |
in the amount of fun you can have.
link |
It's not that limited either.
link |
We'll have to talk.
link |
Yeah, I need instructions.
link |
There's like a manual for that.
link |
What would you dream?
link |
You know, years ago when I read about, you know,
link |
like, you know, a book about how to have, you know,
link |
become aware of your dreams.
link |
I worked on it for a while.
link |
Like there's this trick about, you know,
link |
imagine you can see your hands and look out
link |
and I got somewhat good at it.
link |
But mostly, when I'm thinking about things
link |
or working on problems, I prep myself before I go to sleep.
link |
It's like, I pull into my mind all the things
link |
I wanna work on or think about.
link |
And then that, let's say, greatly improves the chances
link |
that I'll work on that while I'm sleeping.
link |
And then I also, you know, basically ask to remember it.
link |
And I often remember very detailed.
link |
Or outside the dream.
link |
Well, to bring it up in my dreaming
link |
and then to remember it when I wake up.
link |
It's just, it's more of a meditative practice.
link |
You say, you know, to prepare yourself to do that.
link |
Like if you go to, you know, to sleep,
link |
still gnashing your teeth about some random thing
link |
that happened that you're not that really interested in,
link |
you'll dream about it.
link |
That's really interesting.
link |
But you can direct your dreams somewhat by prepping.
link |
Yeah, I'm gonna have to try that.
link |
It's really interesting.
link |
Like the most important, the interesting,
link |
not like what did this guy send in an email
link |
kind of like stupid worry stuff,
link |
but like fundamental problems
link |
you're actually concerned about.
link |
And interesting things you're worried about.
link |
Or books you're reading or, you know,
link |
some great conversation you had
link |
or some adventure you want to have.
link |
Like there's a lot of space there.
link |
And it seems to work that, you know,
link |
my percentage of interesting dreams and memories went up.
link |
Is there, is that the source of,
link |
if you were able to deconstruct like
link |
where some of your best ideas came from,
link |
is there a process that's at the core of that?
link |
Like, so some people, you know, walk and think,
link |
some people like in the shower, the best ideas hit them.
link |
If you talk about, like, Newton, the apple hitting him on the head.
link |
No, I found out a long time ago,
link |
I process things somewhat slowly.
link |
So like in college, I had friends who could study
link |
at the last minute, get an A the next day.
link |
I can't do that at all.
link |
So I always front loaded all the work.
link |
Like I do all the problems early, you know,
link |
for finals, like the last three days,
link |
I wouldn't look at a book because I want, you know,
link |
cause like a new fact day before finals may screw up
link |
my understanding of what I thought I knew.
link |
So my goal was to always get it in and give it time to soak.
link |
And I used to, you know,
link |
I remember when we were doing like 3D calculus,
link |
I would have these amazing dreams of 3D surfaces
link |
with normal, you know, calculating the gradient.
link |
And it's just like all come up.
link |
So it was like really fun, like very visual.
link |
And if I got cycles of that, that was useful.
link |
And the other is, is don't over filter your ideas.
link |
Like I like that process of brainstorming
link |
where lots of ideas can happen.
link |
I like people who have lots of ideas.
link |
But then there's a, yeah, I'll let them sit
link |
and let it breathe a little bit
link |
and then reduce it to practice.
link |
Like at some point you really have to, does it really work?
link |
Like, you know, is this real or not, right?
link |
But you have to do both.
link |
There's creative tension there.
link |
Like how do you be both open and, you know, precise?
link |
Have you had ideas that you just,
link |
that sit in your mind for like years before the?
link |
It's an interesting way to just generate ideas
link |
and just let them sit, let them sit there for a while.
link |
I think I have a few of those ideas.
link |
You know, that was so funny.
link |
Yeah, I think that's, you know,
link |
Creativity 101 or something.
link |
For the slow thinkers in the room, I suppose.
link |
As I, some people, like you said, are just like, like the.
link |
Yeah, it's really interesting.
link |
There's so much diversity in how people think.
link |
You know, how fast or slow they are,
link |
how well they remember or don't.
link |
Like, you know, I'm not super good at remembering facts,
link |
but processes and methods.
link |
Like in our engineering, I went to Penn State
link |
and almost all our engineering tests were open book.
link |
I could remember the page and not the formula.
link |
But as soon as I saw the formula,
link |
I could remember the whole method if I'd learned it.
link |
So it's just a funny, where some people could, you know,
link |
I'd watch friends like flipping through the book,
link |
trying to find the formula,
link |
even knowing that they'd done just as much work.
link |
And I would just open the book
link |
and I was on page 27, about half,
link |
I could see the whole thing visually.
link |
And you have to learn that about yourself
link |
and figure out what would function optimally.
link |
I had a friend who was always concerned
link |
he didn't know how he came up with ideas.
link |
He had lots of ideas, but he said they just sort of popped up.
link |
Like, you'd be working on something, you have this idea,
link |
like, where does it come from?
link |
But you can have more awareness of it.
link |
Like, how your brain works is a little murky
link |
as you go down from the voice in your head
link |
or the obvious visualizations.
link |
Like, when you visualize something, how does that happen?
link |
Yeah, that's right.
link |
You know, if I say, you know, visualize a volcano,
link |
it's easy to do, right?
link |
And what does it actually look like when you visualize it?
link |
I can visualize to the point where I don't see very much
link |
out of my eyes and I see the colors
link |
of the thing I'm visualizing.
link |
Yeah, but there's a shape, there's a texture,
link |
there's a color, but there's also conceptual visualization.
link |
Like, what are you actually visualizing
link |
when you're visualizing a volcano?
link |
Just like with peripheral vision,
link |
you think you see the whole thing.
link |
Yeah, yeah, yeah, that's a good way to say it.
link |
You know, you have this kind of almost peripheral vision
link |
of your visualizations, they're like these ghosts.
link |
But if, you know, if you work on it,
link |
you can get a pretty high level of detail.
link |
And somehow you can walk along those visualizations
link |
and come up with an idea, which is weird.
link |
But when you're thinking about solving problems,
link |
like, you're putting information in,
link |
you're exercising the stuff you do know,
link |
you're sort of teasing the area that you don't understand
link |
and don't know, but you can almost, you know,
link |
feel, you know, that process happening.
link |
You know, that's how I, like,
link |
like, I know sometimes when I'm working really hard
link |
on something, like, I get really hot when I'm sleeping.
link |
And, you know, it's like, the blankets get thrown,
link |
I wake up, all the blankets are on the floor.
link |
And, you know, every time it's, well,
link |
I wake up and think, wow, that was great.
link |
Are you able to reverse engineer
link |
what the hell happened there?
link |
Well, sometimes it's vivid dreams
link |
and sometimes it's just kind of, like you say,
link |
like shadow thinking that you sort of have this feeling
link |
you're going through this stuff, but it's not that obvious.
link |
Isn't that so amazing that the mind
link |
just does all these little experiments?
link |
I never, you know, I always thought it's like a river
link |
that you can't, you're just there for the ride,
link |
but you're right, if you prep it.
link |
No, it's all understandable.
link |
Meditation really helps.
link |
You gotta start figuring out,
link |
you need to learn the language of your own mind.
link |
And there's multiple levels of it, but.
link |
The abstractions again, right?
link |
It's somewhat comprehensible and observable
link |
and feelable or whatever the right word is.
link |
You know, you're not alone for the ride.
link |
I have to ask you, hardware engineer,
link |
working on neural networks now, what's consciousness?
link |
What the hell is that thing?
link |
Is that just some little weird quirk
link |
of our particular computing device?
link |
Or is it something fundamental
link |
that we really need to crack open
link |
if we're to build good computers?
link |
Do you ever think about consciousness?
link |
Like why it feels like something to be?
link |
I know, it's really weird.
link |
I mean, everything about it's weird.
link |
First, it's a half a second behind reality, right?
link |
It's a post hoc narrative about what happened.
link |
You've already done stuff
link |
by the time you're conscious of it.
link |
And your consciousness generally
link |
is a single threaded thing,
link |
but we know your brain is 10 billion neurons
link |
running some crazy parallel thing.
link |
And there's a really big sorting thing going on there.
link |
It also seems to be really reflective
link |
in the sense that you create a space in your head.
link |
Like we don't really see anything, right?
link |
Like photons hit your eyes,
link |
it gets turned into signals,
link |
it goes through multiple layers of neurons.
link |
It's so curious that that looks glassy
link |
and that looks not glassy.
link |
Like how the resolution of your vision is so high
link |
you have to go through all this processing.
link |
Where for most of it, it looks nothing like vision.
link |
Like there's no theater in your mind, right?
link |
So we have a world in our heads.
link |
We're literally just isolated behind our sensors.
link |
But we can look at it, speculate about it,
link |
speculate about alternatives, problem solve, what if.
link |
There's so many things going on
link |
and that process is lagging reality.
link |
And it's single threaded
link |
even though the underlying thing is like massively parallel.
link |
So it's so curious.
link |
So imagine you're building an AI computer.
link |
If you wanted to replicate humans,
link |
well, you'd have huge arrays of neural networks
link |
and apparently only six or seven deep, which is hilarious.
link |
They don't even remember seven numbers,
link |
but I think we can upgrade that a lot, right?
link |
And then somewhere in there,
link |
you would train the network to create
link |
basically the world that you live in, right?
link |
So like tell stories to itself
link |
about the world that it's perceiving.
link |
Well, create the world, tell stories in the world
link |
and then have many dimensions of like side shows to it.
link |
Like we have an emotional structure,
link |
like we have a biological structure.
link |
And that seems hierarchical too.
link |
Like if you're hungry, it dominates your thinking.
link |
If you're mad, it dominates your thinking.
link |
And we don't know if that's important
link |
to consciousness or not,
link |
but it certainly disrupts, intrudes in the consciousness.
link |
Like so there's lots of structure to that.
link |
And we like to dwell on the past.
link |
We like to think about the future.
link |
We like to imagine, we like to fantasize, right?
link |
And the somewhat circular observation of that
link |
is the thing we call consciousness.
link |
Now, if you created a computer system
link |
and did all things, create worldviews,
link |
create the future alternate histories,
link |
dwelled on past events, accurately or semi accurately.
link |
Will consciousness just spring up, like, naturally?
link |
Well, would that look and feel conscious to you?
link |
Like you seem conscious to me, but I don't know.
link |
Off of the external observer sense.
link |
Do you think a thing that looks conscious is conscious?
link |
Like do you, again, this is like an engineering
link |
kind of question, I think, because like.
link |
If we want to engineer consciousness,
link |
is it okay to engineer something
link |
that just looks conscious?
link |
Or is there a difference between something that is?
link |
Well, we evolved consciousness
link |
because it's a super effective way to manage our affairs.
link |
Yeah, this is a social element, yeah.
link |
Well, it gives us a planning system.
link |
We have a huge amount of stuff.
link |
Like when we're talking, like the reason
link |
we can talk really fast is we're modeling each other
link |
at a really high level of detail.
link |
And consciousness is required for that.
link |
Well, all those components together
link |
manifest consciousness, right?
link |
So if we make intelligent beings
link |
that we want to interact with that we're like
link |
wondering what they're thinking,
link |
looking forward to seeing them,
link |
when you interact with them, they're interesting,
link |
surprising, you know, fascinating, you know,
link |
they will probably feel conscious like we do
link |
and we'll perceive them as conscious.
link |
I don't know why not, but you never know.
link |
Another fun question on this,
link |
because from a computing perspective,
link |
we're trying to create something
link |
that's humanlike or superhumanlike.
link |
Let me ask you about aliens.
link |
Do you think there's intelligent alien civilizations
link |
out there and do you think their technology,
link |
their computing, their AI bots,
link |
their chips are of the same nature as ours?
link |
Yeah, I've got no idea.
link |
I mean, if there's lots of aliens out there
link |
they've been awfully quiet,
link |
you know, there's speculation about why.
link |
There seems to be more than enough planets out there.
link |
There's intelligent life on this planet
link |
that seems quite different, you know,
link |
like dolphins seem like plausibly understandable,
link |
octopuses don't seem understandable at all.
link |
If they lived longer than a year,
link |
maybe they would be running the planet.
link |
They seem really smart.
link |
And their neural architecture
link |
is completely different than ours.
link |
Now, who knows how they perceive things.
link |
I mean, that's the question is for us intelligent beings,
link |
we might not be able to perceive other kinds of intelligence
link |
if they become sufficiently different than us.
link |
Yeah, like we live in the current constrained world,
link |
you know, it's three dimensional geometry
link |
and the geometry defines a certain amount of physics.
link |
And, you know, there's like how time works seems to work.
link |
There's so many things that seem like
link |
a whole bunch of the input parameters to the, you know,
link |
another conscious being are the same.
link |
Yes, like if it's biological,
link |
biological things seem to be
link |
in a relatively narrow temperature range, right?
link |
Because, you know, organics aren't stable,
link |
too cold or too hot.
link |
Now, so if you specify the list of things that input to that,
link |
but as soon as we make really smart, you know, beings
link |
and they go solve about how to think
link |
about a billion numbers at the same time
link |
and how to think in n dimensions.
link |
There's a funny science fiction book
link |
where all the society had uploaded into this matrix.
link |
And at some point, some of the beings in the matrix thought,
link |
I wonder if there's intelligent life out there.
link |
So they had to do a whole bunch of work to figure out
link |
like how to make a physical thing
link |
because their matrix was self sustaining
link |
and they made a little spaceship
link |
and they traveled to another planet when they got there,
link |
there was like life running around,
link |
but there was no intelligent life.
link |
And then they figured out that there was these huge,
link |
you know, organic matrix all over the planet
link |
inside there where intelligent beings
link |
had uploaded themselves into that matrix.
link |
So everywhere intelligent life was,
link |
soon as it got smart, it upleveled itself
link |
into something way more interesting than 3D geometry.
link |
Yeah, it escaped whatever this,
link |
not escaped, uplevel is better.
link |
The essence of what we think of as an intelligent being,
link |
I tend to like the thought experiment of the organism,
link |
like humans aren't the organisms.
link |
I like the notion of like Richard Dawkins and memes
link |
that ideas themselves are the organisms,
link |
like that are just using our minds to evolve.
link |
So like we're just like meat receptacles
link |
for ideas to breed and multiply and so on.
link |
And maybe those are the aliens.
link |
Yeah, so Jordan Peterson has a line that says,
link |
you know, you think you have ideas, but ideas have you.
link |
Which, and then we know about the phenomenon of groupthink
link |
and there's so many things that constrain us.
link |
But I think you can examine all that
link |
and not be completely owned by the ideas
link |
and completely sucked into groupthink.
link |
And part of your responsibility as a human
link |
is to escape that kind of phenomenon,
link |
which isn't, it's one of the creative tension things again,
link |
you're constructed by it, but you can still observe it
link |
and you can think about it and you can make choices
link |
about to some level, how constrained you are by it.
link |
And it's useful to do that.
link |
And, but at the same time, and it could be by doing that,
link |
you know, the group and society you're part of
link |
becomes collectively even more interesting.
link |
So, you know, so the outside observer will think,
link |
wow, you know, all these Lexes running around
link |
with all these really independent ideas
link |
have created something even more interesting
link |
So, I don't know, those are lenses to look at the situation
link |
that'll give you some inspiration,
link |
but I don't think they're constraints.
link |
As a small little quirk of history,
link |
it seems like you're related to Jordan Peterson,
link |
like you mentioned.
link |
He's going through some rough stuff now.
link |
Is there some comment you can make
link |
about the roughness of the human journey, the ups and downs?
link |
Well, I became an expert in benzo withdrawal,
link |
like, which is, you take benzodiazepines,
link |
and at some point they interact with GABA circuits,
link |
you know, to reduce anxiety and do a hundred other things.
link |
Like there's actually no known list of everything they do
link |
because they interact with so many parts of your body.
link |
And then once you're on them, you habituate to them
link |
and you have a dependency.
link |
It's not like a drug dependency
link |
where you're trying to get high.
link |
It's a metabolic dependency.
link |
And then if you discontinue them,
link |
there's a funny thing called kindling,
link |
which is if you stop them and then go,
link |
you know, you'll have horrible withdrawal symptoms.
link |
And if you go back on them at the same level,
link |
you won't be stable.
link |
And that unfortunately happened to him.
link |
Because it's so deeply integrated
link |
into all the kinds of systems in the body.
link |
It literally changes the size and numbers
link |
of neurotransmitter sites in your brain.
link |
So there's a process called the Ashton protocol
link |
where you taper it down slowly over two years
link |
and people who go through that go through unbelievable hell.
link |
And what Jordan went through seemed to be worse
link |
because on advice of doctors, you know,
link |
well, stop taking these and take this.
link |
It was a disaster.
link |
And he got some, yeah, it was pretty tough.
link |
He seems to be doing quite a bit better intellectually.
link |
You can see his brain clicking back together.
link |
I spent a lot of time with him.
link |
I've never seen anybody suffer so much.
link |
Well, his brain is also like this powerhouse, right?
link |
So I wonder, does a brain that's able to think deeply
link |
about the world suffer more through these kinds
link |
of withdrawals, like?
link |
I've watched videos of people going through withdrawal.
link |
They all seem to suffer unbelievably.
link |
And, you know, my heart goes out to everybody.
link |
And there's some funny math about this.
link |
Some doctor said, as best he can tell, you know,
link |
there's the standard recommendations.
link |
Don't take them for more than a month
link |
and then taper over a couple of weeks.
link |
Many doctors prescribe them endlessly,
link |
which is against the protocol, but it's common, right?
link |
And then something like 75% of people, when they taper,
link |
it's, you know, half the people have difficulty,
link |
but 75% get off okay.
link |
20% have severe difficulty
link |
and 5% have life threatening difficulty.
link |
And if you're one of those, it's really bad.
link |
And the stories that people have on this
link |
is heartbreaking and tough.
link |
So you put some of the fault at the doctors.
link |
Do they just not know what the hell they're doing?
link |
No, no, it's hard to say.
link |
It's one of those commonly prescribed things.
link |
Like one doctor said, what happens is,
link |
if you're prescribed them for a reason
link |
and then you have a hard time getting off,
link |
the protocol basically says you're either crazy
link |
or dependent and you get kind of pushed
link |
into a different treatment regime.
link |
You're a drug addict or a psychiatric patient.
link |
And so like one doctor said, you know,
link |
I prescribed them for 10 years thinking
link |
I was helping my patients
link |
and I realized I was really harming them.
link |
And you know, the awareness of that is slowly coming up.
link |
The fact that they're casually prescribed to people
link |
is horrible and it's bloody scary.
link |
And some people are stable on them,
link |
but they're on them for life.
link |
Like once you, you know, it's another one of those drugs.
link |
But benzos long range have real impacts on your personality.
link |
People talk about the benzo bubble
link |
where you get disassociated from reality
link |
and your friends a little bit.
link |
It's really terrible.
link |
The mind is terrifying.
link |
We were talking about how the infinite possibility of fun,
link |
but like it's the infinite possibility of suffering too,
link |
which is one of the dangers of like expansion
link |
of the human mind.
link |
It's like, I wonder if all the possible experiences
link |
that an intelligent computer can have,
link |
is it mostly fun or is it mostly suffering?
link |
So like if you brute force expand the set of possibilities,
link |
like are you going to run into some trouble
link |
in terms of like torture and suffering and so on?
link |
Maybe our human brain is just protecting us
link |
from much more possible pain and suffering.
link |
Maybe the space of pain is like much larger
link |
than we could possibly imagine.
link |
The world's in a balance.
link |
You know, all the literature on religion and stuff is,
link |
you know, the struggle between good and evil
link |
is balanced, very finely tuned,
link |
for reasons that are complicated.
link |
But that's a long philosophical conversation.
link |
Speaking of balance that's complicated,
link |
I wonder because we're living through
link |
one of the more important moments in human history
link |
with this particular virus.
link |
It seems like pandemics have at least the ability
link |
to kill off most of the human population at their worst.
link |
And that's just fascinating
link |
because there's so many viruses in this world.
link |
There's so many, I mean, viruses basically run the world
link |
in the sense that they've been around a very long time.
link |
They're everywhere.
link |
They seem to be extremely powerful
link |
in the distributed kind of way.
link |
But at the same time, they're not intelligent
link |
and they're not even living.
link |
Do you have like high level thoughts about this virus
link |
that like in terms of you being fascinated or terrified
link |
or somewhere in between?
link |
So I believe in frameworks, right?
link |
So like one of them is evolution.
link |
Like we're evolved creatures, right?
link |
And one of the things about evolution
link |
is it's hyper competitive.
link |
And it's not competitive out of a sense of evil.
link |
It's competitive in the sense that there's endless variation
link |
and variations that work better win.
link |
And then over time, there's so many levels
link |
of that competition.
link |
Like multicellular life partly exists
link |
because of the competition
link |
between different kinds of life forms.
link |
And we know sex partly exists to scramble our genes
link |
so that we have genetic variation
link |
against the invasion of the bacteria and the viruses.
link |
Like I read some funny statistic,
link |
like the density of viruses and bacteria in the ocean
link |
And one third of the bacteria die every day
link |
because a virus is invading them.
link |
Like one third of them.
link |
Like I don't know if that number is true,
link |
but it was like the amount of competition
link |
and what's going on is stunning.
link |
And there's a theory as we age,
link |
we slowly accumulate bacteria and viruses
link |
and as our immune system kind of goes down,
link |
that's what slowly kills us.
link |
It just feels so peaceful from a human perspective
link |
when we sit back and are able
link |
to have a relaxed conversation.
link |
And there's wars going on out there.
link |
Like right now, you're harboring how many bacteria?
link |
And the ones, many of them are parasites on you
link |
and some of them are helpful
link |
and some of them are modifying your behavior
link |
and some of them are, it's just really wild.
link |
But this particular manifestation is unusual
link |
in the demographic, how it hit
link |
and the political response that it engendered
link |
and the healthcare response it engendered
link |
and the technology it engendered, it's kind of wild.
link |
Yeah, the communication on Twitter that it led to,
link |
all that kind of stuff, at every single level, yeah.
link |
But what usually kills life,
link |
the big extinctions are caused by meteors and volcanoes.
link |
That's the one you're worried about
link |
as opposed to human created bombs that we launch.
link |
Solar flares are another good one.
link |
Occasionally, solar flares hit the planet.
link |
Yeah, it's all pretty wild.
link |
On another historic moment, this is perhaps outside
link |
but perhaps within your space of frameworks
link |
that you think about that just happened,
link |
I guess a couple of weeks ago is,
link |
I don't know if you're paying attention at all,
link |
is the GameStop and WallStreetBets saga.
link |
So it's really fascinating.
link |
There's kind of a theme to this conversation today
link |
because it's like neural networks,
link |
it's cool how there's a large number of people
link |
in a distributed way, almost having a kind of fun,
link |
were able to take on the powerful elites,
link |
elite hedge funds, centralized powers and overpower them.
link |
Do you have thoughts on this whole saga?
link |
I don't know enough about finance,
link |
but it was like the Elon, Robinhood guy when they talked.
link |
Yeah, what'd you think about that?
link |
Well, Robinhood guy didn't know
link |
how the finance system worked.
link |
That was clear, right?
link |
He was treating like the people
link |
who settled the transactions as a black box.
link |
And suddenly somebody called him up and said,
link |
hey, black box calling you, your transaction volume
link |
means you need to put out $3 billion right now.
link |
And he's like, I don't have $3 billion.
link |
Like I don't even make any money on these trades.
link |
Why do I owe $3 billion? Well, you're sponsoring the trade.
link |
So there was a set of abstractions
link |
that I don't think either, like now we understand it.
link |
Like this happens in chip design.
link |
Like you buy wafers from TSMC or Samsung or Intel,
link |
and they say it works like this
link |
and you do your design based on that.
link |
And then chip comes back and doesn't work.
link |
And then suddenly you started having to open the black boxes.
link |
Do the transistors really work like they said,
link |
what's the real issue?
link |
So there's a whole set of things
link |
that created this opportunity and somebody spotted it.
link |
Now, people spot these kinds of opportunities all the time.
link |
So there's been flash crashes,
link |
there's always short squeezes are fairly regular.
link |
Every CEO I know hates the shorts
link |
because they're trying to manipulate their stock
link |
in a way that they make money
link |
and deprive value from both the company
link |
and the investors.
link |
So the fact that some of these stocks were so short,
link |
it's hilarious that this hasn't happened before.
link |
I don't know why, and I don't actually know why
link |
some serious hedge funds didn't do it to other hedge funds.
link |
And some of the hedge funds
link |
actually made a lot of money on this.
link |
So my guess is we know 5% of what really happened
link |
and that a lot of the players don't know what happened.
link |
And the people who probably made the most money
link |
aren't the people that they're talking about.
link |
Do you think there was something,
link |
I mean, this is the cool kind of Elon,
link |
you're the same kind of conversationalist,
link |
which is like first principles questions of like,
link |
what the hell happened?
link |
Just very basic questions of like,
link |
was there something shady going on?
link |
What, who are the parties involved?
link |
It's the basic questions everybody wants to know about.
link |
Yeah, so like we're in a very hyper competitive world,
link |
but transactions like buying and selling stock
link |
I trust the company represented themselves properly.
link |
I bought the stock because I think it's gonna go up.
link |
I trust that the regulations are solid.
link |
Now, inside of that, there's all kinds of places
link |
where humans over trust and this exposed,
link |
let's say some weak points in the system.
link |
I don't know if it's gonna get corrected.
link |
I don't know if we have close to the real story.
link |
Yeah, my suspicion is we don't.
link |
And listening to that guy, he was like a little wide eyed
link |
about and then he did this and then he did that.
link |
And I was like, I think you should know more
link |
about your business than that.
link |
But again, there's many businesses
link |
when like this layer is really stable,
link |
you stop paying attention to it.
link |
You pay attention to the stuff that's bugging you or new.
link |
You don't pay attention to the stuff
link |
that just seems to work all the time.
link |
You just, sky's blue every day, California.
link |
And every once in a while it rains
link |
and everybody's like, what do we do?
link |
Somebody go bring in the lawn furniture.
link |
You don't know why it's getting wet.
link |
Yeah, it doesn't always work.
link |
It was blue for like a hundred days and now it's, so.
link |
But part of the problem here with Vlad,
link |
the CEO of Robinhood is the scaling
link |
that we've been talking about is there's a lot
link |
of unexpected things that happen with the scaling