
Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming | Lex Fridman Podcast #224



link |
00:00:00.000
The following is a conversation with Travis Oliphant,
link |
00:00:03.600
one of the most impactful programmers
link |
00:00:05.520
and data scientists ever.
link |
00:00:07.900
He created NumPy, SciPy, and Anaconda.
link |
00:00:12.760
NumPy formed the foundation
link |
00:00:14.500
of tensor based machine learning in Python,
link |
00:00:17.060
SciPy formed the foundation
link |
00:00:18.880
of scientific programming in Python,
link |
00:00:20.960
and Anaconda, specifically with Conda,
link |
00:00:23.980
made Python more accessible to a much larger audience.
link |
00:00:27.620
Travis's life work across a large number of programming
link |
00:00:31.200
and entrepreneurial efforts has and will continue
link |
00:00:34.760
to have immeasurable impact on millions of lives
link |
00:00:38.440
by empowering scientists and engineers
link |
00:00:41.360
in big companies, small companies,
link |
00:00:43.600
and open source communities to take on difficult problems
link |
00:00:47.200
and solve them with the power of programming.
link |
00:00:50.520
Plus, he's a truly kind human being,
link |
00:00:53.440
which is something that when combined with vision
link |
00:00:56.000
and ambition makes for a great leader
link |
00:00:58.400
and a great person to chat with.
link |
00:01:01.160
To support this podcast,
link |
00:01:02.320
please check out our sponsors in the description.
link |
00:01:04.880
This is the Lex Fridman Podcast,
link |
00:01:06.960
and here is my conversation with Travis Oliphant.
link |
00:01:11.520
What was the first computer program you've ever written?
link |
00:01:14.480
Do you remember?
link |
00:01:15.320
Whoa, that's a good question.
link |
00:01:16.920
I think it was in fourth grade.
link |
00:01:18.380
Just a simple loop in BASIC.
link |
00:01:20.920
BASIC. BASIC, yeah, on an Atari 800,
link |
00:01:23.320
Atari 400, I think, or maybe it was an Atari 800.
link |
00:01:26.840
It was a part of a class,
link |
00:01:28.300
and we just wrote BASIC loops to print things out.
link |
00:01:32.560
Did you use GOTO statements?
link |
00:01:34.920
Yes, yes, we used GOTO statements.
link |
00:01:38.000
I remember in the early days,
link |
00:01:39.560
that's when I first realized
link |
00:01:41.160
there's like principles to programming,
link |
00:01:43.320
when I was told, don't use GOTO statements.
link |
00:01:45.720
Those are bad software engineering principles,
link |
00:01:48.080
like it goes against what great, beautiful code is.
link |
00:01:52.040
I was like, oh, okay, there's rules to this game.
link |
00:01:54.800
I didn't see that until high school
link |
00:01:56.240
when I took an AP computer science course.
link |
00:01:58.360
I did a lot of other kinds of just programming in TI,
link |
00:02:02.200
but finally, when I took an AP computer science course
link |
00:02:04.160
in Pascal.
link |
00:02:05.720
Wow.
link |
00:02:06.560
That's, yeah, it was Pascal.
link |
00:02:07.440
That's when I, oh, there are these principles.
link |
00:02:09.760
Not C or C++?
link |
00:02:11.320
No, I didn't take C until the next year in college.
link |
00:02:14.660
I had a course in C, but I haven't done much in Pascal,
link |
00:02:18.100
just that AP computer science course.
link |
00:02:20.160
Now, sorry for the romanticized question,
link |
00:02:23.480
but when did you first fall in love with programming?
link |
00:02:26.720
Oh, man, good question.
link |
00:02:27.880
I think actually when I was 10,
link |
00:02:30.280
my dad got us a TI Timex Sinclair,
link |
00:02:33.460
and he was excited about the spreadsheet capability,
link |
00:02:37.200
and then, but I made him get the BASIC,
link |
00:02:39.560
the add-ons so we could actually program in BASIC,
link |
00:02:41.840
and just being able to write instructions
link |
00:02:44.520
and have the computer do something.
link |
00:02:45.960
Then we got a TI-99/4A when I was about 12,
link |
00:02:50.080
and I would just, it had sprites and graphics and music.
link |
00:02:52.960
You could actually program it to do music.
link |
00:02:55.320
That's when I really sort of fell in love with programming.
link |
00:02:58.600
So this is a full, like a real computer
link |
00:03:01.060
with like, with memory and storage,
link |
00:03:04.120
processors and whatnot,
link |
00:03:05.240
because you say TI. Yeah, the Timex Sinclair
link |
00:03:07.360
was one of the very first, it was a cheap, cheap,
link |
00:03:09.680
like, I think it was, well, it was still expensive,
link |
00:03:12.760
but it was 2K of memory.
link |
00:03:14.440
We got the 16K add on pack,
link |
00:03:16.760
but yeah, it had memory, and you could program it.
link |
00:03:19.000
You had the, in order to store your programs,
link |
00:03:20.920
you had to attach a tape drive.
link |
00:03:22.880
Remember that old, the sound that would play
link |
00:03:24.400
when the modem would convert digital bits
link |
00:03:29.440
to audio stored on a tape drive.
link |
00:03:31.920
Still remember that sound, but that was the storage.
link |
00:03:34.760
And what was the programming language, do you remember?
link |
00:03:36.480
It was BASIC. It was BASIC.
link |
00:03:37.320
And then they had a VisiCalc,
link |
00:03:38.980
and so a little bit of spreadsheet programming
link |
00:03:40.600
in VisiCalc, but mostly just some BASIC.
link |
00:03:42.760
Do you remember what kind of things drew you to programming?
link |
00:03:46.340
Was it working with data, was it video games?
link |
00:03:50.360
Games, math, mathy stuff?
link |
00:03:52.600
Yeah, I've always loved math,
link |
00:03:54.800
and a lot of people think they don't like math
link |
00:03:58.080
because I think when they're exposed to it early,
link |
00:04:00.440
it's about memory.
link |
00:04:02.080
When you're exposed to math early,
link |
00:04:03.260
you have a good short term memory,
link |
00:04:04.280
can remember your times tables.
link |
00:04:05.920
And I do have a reasonably, I mean, not perfect,
link |
00:04:08.600
but a reasonably long little short term memory buffer.
link |
00:04:12.480
And so I did great at times tables.
link |
00:04:14.320
I said, oh, I'm good at math.
link |
00:04:15.840
But I started to really like math,
link |
00:04:17.360
just the problem solving aspect.
link |
00:04:20.320
And so computing was problem solving applied.
link |
00:04:25.040
And so that's always kind of been the draw,
link |
00:04:28.280
kind of coupled with the mathematics.
link |
00:04:30.480
Did you ever see the computer as like an extension
link |
00:04:33.920
of your mind, like something able to achieve?
link |
00:04:36.520
Not till later.
link |
00:04:37.760
Okay.
link |
00:04:38.600
Yeah, not then.
link |
00:04:39.440
It's just like a little set of puzzles
link |
00:04:40.880
that you can play with and you can play with math puzzles.
link |
00:04:43.520
Yeah, it was too rudimentary early on.
link |
00:04:46.120
Like it was sort of, yeah, it was a lot of work
link |
00:04:49.160
to actually take a thought you'd have
link |
00:04:51.440
and actually get it implemented.
link |
00:04:53.120
And that's still work, but it's getting easier.
link |
00:04:56.020
And so yeah, I would say that's definitely
link |
00:04:58.240
what attracted me to Python
link |
00:04:59.560
is that that was more real, right?
link |
00:05:02.140
I could think in Python.
link |
00:05:04.840
Speaking of foreign language,
link |
00:05:05.800
I only speak one other language fluently besides English,
link |
00:05:08.400
which is Spanish.
link |
00:05:09.220
And I remember the day when I would dream in Spanish
link |
00:05:11.720
and you start to think in that language.
link |
00:05:13.440
And then you actually, I do definitely believe
link |
00:05:15.340
that language limits or expands your thinking.
link |
00:05:19.640
There's some languages that actually lead you
link |
00:05:21.600
to certain thought processes.
link |
00:05:23.860
Yeah, like, so I speak Russian fluently
link |
00:05:27.280
and that's certainly a language that leads you
link |
00:05:30.960
down certain thought processes.
link |
00:05:33.240
Well, yeah, I mean, there's a history
link |
00:05:36.220
of the two world wars of millions of people starving
link |
00:05:41.220
to death or near to death throughout its history
link |
00:05:44.180
of suffering, of injustice, like this promise sold
link |
00:05:48.020
to the people and then the carpet
link |
00:05:50.900
or whatever is swept from under them.
link |
00:05:53.340
And it's like broken promises.
link |
00:05:54.660
And all of that pain and melancholy is in the language,
link |
00:05:58.100
the sad songs, the sad hopeful songs,
link |
00:06:01.700
the over romanticized, like, I love you, I hate you,
link |
00:06:05.260
the sort of the swings between all the various spectrums
link |
00:06:09.980
of emotion, so that's all within the language.
link |
00:06:13.740
The way it's twisted, there's a strong culture
link |
00:06:18.020
of rhyming poetry, so like the bards,
link |
00:06:20.380
like the sync, there's a musicality to the language too.
link |
00:06:24.740
Did Dostoevsky write in Russian?
link |
00:06:27.380
Yeah, so like Dostoevsky, Tolstoy, all the,
link |
00:06:32.100
all the.
link |
00:06:32.940
The ones that I know about, which are translated
link |
00:06:34.660
and I'm curious how the translations.
link |
00:06:36.340
So Dostoevsky did not use the musicality
link |
00:06:40.860
of the language too much.
link |
00:06:42.180
So it actually translates pretty well
link |
00:06:44.180
because it's so philosophically dense
link |
00:06:46.540
that the story does a lot of the work,
link |
00:06:48.460
but there's a bunch of things that are untranslatable.
link |
00:06:51.140
Certainly the poetry is not translatable.
link |
00:06:53.580
I actually have a few conversations coming up offline
link |
00:06:57.940
and also in this podcast with people
link |
00:06:59.980
who've translated Dostoevsky.
link |
00:07:01.940
And that's for people who worked, who work in this field,
link |
00:07:06.340
know how difficult that is.
link |
00:07:07.340
Sometimes you can spend months thinking
link |
00:07:10.660
about a single sentence, right?
link |
00:07:12.340
In context, like, cause there's just the magic
link |
00:07:15.220
captured by that sentence and how do you translate
link |
00:07:17.860
just in the right way?
link |
00:07:18.940
Because those words can be really powerful.
link |
00:07:22.380
There's a famous line,
link |
00:07:24.300
"Beauty will save the world," from Dostoevsky.
link |
00:07:27.140
You know, there's so many ways to translate that.
link |
00:07:29.500
And you're right, the language gives you the tools
link |
00:07:32.700
with which to tell the story,
link |
00:07:34.140
but it also leads your mind down certain trajectories
link |
00:07:37.260
and paths to where over time,
link |
00:07:39.660
as you think in that language,
link |
00:07:41.140
you become a different human being.
link |
00:07:42.740
Yes. Yeah.
link |
00:07:43.740
Yeah, that's a fascinating reality, I think.
link |
00:07:45.860
I know people have explored that,
link |
00:07:47.020
but it's just rediscovered.
link |
00:07:49.740
Well, we don't, we live in our own like little pockets.
link |
00:07:52.340
Like this is the sad thing is I feel like unfortunately,
link |
00:07:56.860
given time and given getting older,
link |
00:07:59.140
I'll never know China, the Chinese world,
link |
00:08:03.620
because I don't truly know the language.
link |
00:08:05.780
Same with Japanese, I don't truly know Japanese
link |
00:08:08.300
and Portuguese and Brazil,
link |
00:08:10.340
that whole South American continent.
link |
00:08:12.060
Like, yeah, I'll go to Brazil and Argentina,
link |
00:08:14.460
but will I truly understand the people
link |
00:08:17.100
if I don't understand the language?
link |
00:08:18.500
It's sad because I wonder how much,
link |
00:08:23.500
how many geniuses we're missing
link |
00:08:25.220
because so much of the scientific world,
link |
00:08:28.540
so much of the technical world is in English,
link |
00:08:31.460
and so much of it might be lost
link |
00:08:33.140
because it's just we don't have the common language.
link |
00:08:36.100
I completely agree.
link |
00:08:36.940
I'm very much in that vein of there's a lot of genius
link |
00:08:40.620
out there that we miss,
link |
00:08:41.780
and it's sort of fortunate when it bubbles up
link |
00:08:45.060
into something that we can understand or process,
link |
00:08:48.700
there's a lot we miss.
link |
00:08:50.420
So I tend to lean towards really loving democratization
link |
00:08:54.060
or things that empower people
link |
00:08:55.420
or being very resistant to authoritarian structures.
link |
00:09:00.140
Fundamentally for that reason,
link |
00:09:01.900
well, several reasons, but it just hurts us.
link |
00:09:04.740
We're soft.
link |
00:09:06.420
So speaking of languages that empower you,
link |
00:09:09.020
so Python was the first language for me
link |
00:09:11.820
that I really enjoyed thinking in, as you said.
link |
00:09:16.780
Sounds like you shared my experience too.
link |
00:09:18.500
So when did you first,
link |
00:09:19.620
do you remember when you first kind of connected with Python,
link |
00:09:21.900
maybe even fell in love with Python?
link |
00:09:23.740
It's a good question.
link |
00:09:24.580
It was a process.
link |
00:09:25.500
It took about a year.
link |
00:09:26.540
I first encountered Python in 1997.
link |
00:09:29.500
I was a graduate student studying biomedical engineering
link |
00:09:31.740
at the Mayo Clinic.
link |
00:09:32.980
And I had previously,
link |
00:09:34.700
I'd been involved in taking information from satellites.
link |
00:09:39.340
I was an electrical engineering student
link |
00:09:41.340
used to taking information
link |
00:09:42.660
and trying to get something out of it,
link |
00:09:44.060
doing some data processing, getting information out of it.
link |
00:09:46.140
And I'd done that in MATLAB.
link |
00:09:47.660
I'd done that in Perl.
link |
00:09:49.140
I'd done that in scripting on a VMS.
link |
00:09:52.540
There's actually a VAX VMS system,
link |
00:09:54.260
they had their own little scripting tools around Fortran.
link |
00:09:57.980
Done a lot of that.
link |
00:09:58.820
And then as a graduate student,
link |
00:10:00.820
I was looking for something and encountered Python.
link |
00:10:04.420
And because Python had an array,
link |
00:10:06.140
had two things that made me not filter it away.
link |
00:10:09.100
Because I was filtering a bunch of stuff,
link |
00:10:10.380
like Yorick, I looked at Yorick,
link |
00:10:11.700
I looked at a few other languages that are out there
link |
00:10:14.420
at the time in 1997, but it had arrays.
link |
00:10:17.700
There's a library called Numeric
link |
00:10:19.060
that had just been written in 95,
link |
00:10:20.860
like not very, not too much earlier.
link |
00:10:23.740
By an MIT alum, Jim Hugunin.
link |
00:10:26.980
You know, and I went back and read the mailing list
link |
00:10:29.100
to see the history of how it grew.
link |
00:10:30.300
And there was a very interesting,
link |
00:10:31.220
it's fascinating to do that actually,
link |
00:10:32.380
to see how this emergent cooperation,
link |
00:10:36.020
unstructured cooperation happens in the open source world
link |
00:10:39.500
that led to a lot of this collective programming,
link |
00:10:43.300
which is something maybe we might get into a little later,
link |
00:10:45.140
but what that looks like.
link |
00:10:46.100
What gap did Numeric fill?
link |
00:10:48.340
Numeric filled the gap of having an array object.
link |
00:10:50.260
There was no array object.
link |
00:10:51.580
There was no array.
link |
00:10:52.420
There was a one dimensional byte concept,
link |
00:10:55.380
but there was no n dimensional,
link |
00:10:57.580
two, three, four dimensional tensor they call it now.
link |
00:11:00.700
I'm still in the category that a tensor is another thing
link |
00:11:03.260
and it's just an ndarray we should call it,
link |
00:11:05.220
but kind of lost that battle.
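A minimal sketch of the distinction being described, using modern NumPy (Numeric's successor, not Numeric itself) and Python's standard-library `array` module as a stand-in for a flat, one-dimensional typed buffer:

```python
import array

import numpy as np

# The standard-library array module stores a flat, typed 1-D buffer.
flat = array.array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# An ndarray can present the same values as an n-dimensional object.
grid = np.array(flat).reshape(2, 3)

print(grid.ndim)   # number of dimensions
print(grid.shape)  # extent along each dimension

# Arithmetic applies to the whole array at once, with no explicit loop.
scaled = grid * 10
print(scaled)
```

The `ndim` and `shape` attributes carry exactly the n-dimensional structure that the flat buffer lacks.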
link |
00:11:08.340
There's many battles in this world,
link |
00:11:10.140
some of which we win, some we lose.
link |
00:11:12.060
That's exactly right.
link |
00:11:13.620
So, but it had no math to it.
link |
00:11:17.180
So Numeric had math and a basic way to think in arrays.
link |
00:11:20.820
So I was looking for that,
link |
00:11:21.820
and it had complex numbers, which
link |
00:11:24.980
a lot of programming languages lacked.
link |
00:11:26.380
And you can see it because,
link |
00:11:28.100
if you're just a computer scientist,
link |
00:11:29.500
you think, ah, complex numbers are just two floats.
link |
00:11:32.060
So you can, people can build that on.
link |
00:11:34.980
But in practice, a complex number
link |
00:11:36.740
is one of the significant algebras
link |
00:11:38.980
that helps connect a lot of physical
link |
00:11:40.740
and mathematical ideas,
link |
00:11:42.260
particularly FFT for an electrical engineer.
link |
00:11:45.100
And it's a really important concept
link |
00:11:48.160
and not having it means you have to develop it
link |
00:11:50.820
several times and those times may not share an approach.
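To illustrate why built-in complex support matters for FFT work, here is a small sketch assuming NumPy's `np.fft` module and an arbitrary 5 Hz test tone sampled at 64 Hz (both chosen only for illustration). The FFT output is an array of complex numbers, each carrying magnitude and phase together:

```python
import numpy as np

# One second of a 5 Hz cosine sampled at 64 samples per second.
fs = 64
t = np.arange(fs) / fs
signal = np.cos(2 * np.pi * 5 * t)

# The FFT returns complex coefficients.
spectrum = np.fft.fft(signal)
freqs = np.fft.fftfreq(signal.size, d=1 / fs)

# Find the strongest positive-frequency component from the magnitudes.
peak = freqs[np.argmax(np.abs(spectrum[: fs // 2]))]
print(spectrum.dtype, peak)
```

Because the complex type is shared, every library that consumes this spectrum agrees on what a complex number is, instead of each one inventing its own pair-of-floats convention.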
link |
00:11:54.300
One of the common things in programming,
link |
00:11:55.700
one of the things programming enables is abstractions.
link |
00:11:59.060
But when you have shared abstractions, it's even better.
link |
00:12:01.180
It sort of gets to the level of language
link |
00:12:02.980
of actually we all think of this the same way,
link |
00:12:05.540
which is both powerful and dangerous, right?
link |
00:12:07.940
Because powerful in that we now can quickly
link |
00:12:11.180
make bigger and higher level things
link |
00:12:13.340
on top of those abstractions, and dangerous
link |
00:12:14.800
because it also limits us as to the things
link |
00:12:17.100
we maybe left behind in producing that abstraction,
link |
00:12:20.500
which is at the heart of programming today
link |
00:12:21.900
and actually building around the programming world.
link |
00:12:24.420
I think it's a fascinating philosophical topic.
link |
00:12:26.580
Yeah, they will continue for many years, I think.
link |
00:12:28.380
They'll continue for many years.
link |
00:12:29.220
As we build more and more and more abstractions.
link |
00:12:31.260
Yes, I often think about, you know,
link |
00:12:32.340
we have a world that's built on these abstractions
link |
00:12:35.060
that were they the only ones possible?
link |
00:12:37.500
Certainly not, but they led to,
link |
00:12:39.860
you know, it's very hard to do it differently.
link |
00:12:42.300
Like there's an inertia that's very hard to,
link |
00:12:44.980
you know, push out, push away from.
link |
00:12:47.740
That has implications for things like,
link |
00:12:49.640
you know, the Julia language,
link |
00:12:50.720
which you have heard of, I'm sure.
link |
00:12:52.680
And I've met the creators and I liked Julia.
link |
00:12:55.700
It's a really cool language,
link |
00:12:56.580
but they struggled to kind of against the,
link |
00:12:59.300
just the tide of like this inertia of people using Python.
link |
00:13:03.420
And, you know, there's strategies to approach that,
link |
00:13:05.820
but nonetheless, it's a phenomenon.
link |
00:13:07.580
And sometimes, so I love complex numbers
link |
00:13:09.580
and I love arrays, so I looked at Python.
link |
00:13:12.260
And then I had the experience, I did some stuff in Python
link |
00:13:15.260
and I was just doing my PhD.
link |
00:13:16.380
So I was out, my focus was on,
link |
00:13:19.700
I was actually doing a combination of MRI and ultrasound
link |
00:13:22.180
and looking at a phenomenon called elastography,
link |
00:13:24.740
which is you push waves into the body
link |
00:13:27.020
and observe those waves, like you can actually measure them.
link |
00:13:30.300
And then you do mathematical inversion
link |
00:13:32.780
to see what the elasticity is.
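The conversation doesn't spell out which inversion was used. One textbook approach in elastography, assumed here purely for illustration, is algebraic inversion of the Helmholtz equation for a time-harmonic shear wave, mu * laplacian(u) = -rho * omega^2 * u, which gives mu = -rho * omega^2 * u / laplacian(u). The sketch simulates a 1-D wave with a known stiffness and recovers it; the density, frequency, and wave speed are made-up values, not data from this work:

```python
import numpy as np

# Illustrative values (assumptions, not from the conversation).
rho = 1000.0           # tissue density, kg/m^3
freq = 100.0           # driving frequency, Hz
c_true = 2.0           # shear-wave speed, m/s
omega = 2 * np.pi * freq
k = omega / c_true     # wavenumber, rad/m

# Simulated complex displacement u(x) = exp(i k x) along a 5 cm line.
x = np.linspace(0.0, 0.05, 501)
dx = x[1] - x[0]
u = np.exp(1j * k * x)

# Second derivative by central differences (interior points only).
lap = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2

# Algebraic Helmholtz inversion for the shear modulus at each point.
mu = (-rho * omega**2 * u[1:-1] / lap).real

# Should be close to rho * c_true**2 = 4000 Pa.
print(np.median(mu))
```

Real measurements are noisy, and second derivatives amplify noise, which is why practical inversions add filtering or fit over regions rather than dividing point by point.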
link |
00:13:35.220
And so that's the problem I was solving
link |
00:13:36.820
is how to do that with both ultrasound and MRI.
link |
00:13:39.780
I needed some tool to do that with.
link |
00:13:41.380
So I was starting to use Python in 97.
link |
00:13:44.260
In 98, I went back, looked at what I'd written
link |
00:13:47.340
and realized I could still understand it,
link |
00:13:49.560
which is not the experience I'd had
link |
00:13:50.900
when doing Perl in 95, right?
link |
00:13:53.660
I'd done the same thing and then I looked back
link |
00:13:55.620
and I'd forgotten what I was even saying.
link |
00:13:58.360
Now, you know, I'm not saying, so that may,
link |
00:14:00.700
hey, this may work, I like this.
link |
00:14:02.400
This is something I can retain
link |
00:14:04.980
without becoming an expert per se.
link |
00:14:07.620
And so that led me to go, I'm gonna push more into this.
link |
00:14:10.380
And then that 98 was kind of when I started
link |
00:14:14.820
to fall in love with Python, I would say.
link |
00:14:18.300
A few peculiar things about Python.
link |
00:14:20.900
So maybe compare it to Perl,
link |
00:14:22.940
compare it to some of the other languages.
link |
00:14:24.500
So there's no braces.
link |
00:14:26.320
Yeah.
link |
00:14:27.160
So space is used, indentation, I should say,
link |
00:14:31.960
is used as part of the language.
link |
00:14:33.980
Yeah, right.
link |
00:14:35.540
So did you, I mean, that's quite a leap.
link |
00:14:39.980
Were you comfortable with that leap
link |
00:14:41.180
or were you just very open minded?
link |
00:14:42.740
It's a good question.
link |
00:14:43.580
I was open minded, so I was cognizant of the concern.
link |
00:14:48.040
And it definitely has, it has specific challenges.
link |
00:14:52.060
You know, cut and pasting.
link |
00:14:53.520
For example, when you're cut and pasting code,
link |
00:14:55.460
and if your editors aren't supportive of that,
link |
00:14:57.220
if you're putting it into a terminal,
link |
00:14:58.980
and particularly in the past when terminals
link |
00:15:01.020
didn't necessarily have the intelligence to manage it.
link |
00:15:03.140
Now, IPython and Jupyter Notebooks
link |
00:15:05.100
handle that just fine, so there's really no problem.
link |
00:15:06.820
But in the past, it created some challenges,
link |
00:15:08.740
formatting challenges, also mixed tabs and spaces.
link |
00:15:12.460
If editors weren't, you weren't clear
link |
00:15:14.740
on what was happening, you would have these issues.
link |
00:15:16.860
So there were really concrete reasons about it
link |
00:15:19.180
that I heard and understood.
link |
00:15:20.400
I never really encountered a problem with it personally.
link |
00:15:23.960
Like, it was occasional annoyances,
link |
00:15:26.480
but I really liked the fact
link |
00:15:28.420
that it didn't have all these extra characters, right?
link |
00:15:31.060
That these extra characters didn't show up
link |
00:15:33.100
in my visual field when I was just trying
link |
00:15:35.420
to process understanding a snippet of code.
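A quick way to see that indentation really is part of Python's syntax, and why mixed tabs and spaces caused trouble, is to compile a snippet whose indentation is ambiguous. This sketch targets Python 3, which rejects such code with `TabError`:

```python
# One line indented with a tab, the next with eight spaces.
ambiguous = "if True:\n\tx = 1\n        y = 2\n"
consistent = "if True:\n    x = 1\n    y = 2\n"

try:
    compile(ambiguous, "<ambiguous>", "exec")
    print("compiled")
except TabError as err:
    # Python 3 refuses to guess how wide a tab is relative to spaces.
    print("TabError:", err.msg)

# Consistent indentation compiles without complaint.
code = compile(consistent, "<consistent>", "exec")
```

Editors that convert tabs to spaces, and interactive tools like IPython, remove most of this friction in practice.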
link |
00:15:38.000
Yeah, there's a cleanness to it.
link |
00:15:39.260
But, I mean, the idea is supposed to be
link |
00:15:41.140
that Perl also has a cleanness to it
link |
00:15:43.300
because of the minimalism of how many characters
link |
00:15:46.500
it takes to express a certain thing.
link |
00:15:48.420
So it's very compact.
link |
00:15:49.820
But what you realize with that compactness comes,
link |
00:15:53.560
there's a culture that prizes compactness,
link |
00:15:57.100
and so the code gets more and more compact
link |
00:15:58.900
and less and less readable to a point where it's like,
link |
00:16:03.600
like, to be a good programmer in Perl,
link |
00:16:05.420
you write code that's basically unreadable.
link |
00:16:07.820
There's a culture, like.
link |
00:16:09.100
Correct, and you're proud of it.
link |
00:16:10.860
Yeah, you're proud of it.
link |
00:16:12.460
Right, exactly, and it's like, feels good.
link |
00:16:14.140
And it's really selective.
link |
00:16:16.660
It means you have to be an expert in Perl to understand it.
link |
00:16:20.380
Whereas Python allowed you not to have to be an expert.
link |
00:16:22.980
You didn't have to take all this brain energy.
link |
00:16:24.740
You could leverage, what I say,
link |
00:16:25.660
you could leverage your English language center,
link |
00:16:28.180
which you're using all the time.
link |
00:16:29.980
I've wondered about other languages,
link |
00:16:31.180
particularly non Latin based languages.
link |
00:16:34.680
Latin based languages with the characters are at least similar.
link |
00:16:37.220
I think people have an easier time,
link |
00:16:38.620
but I don't know what it's like to be a Japanese
link |
00:16:41.300
or a Chinese person trying to learn different syntax.
link |
00:16:46.900
Like, what would computer programming look like in that?
link |
00:16:49.740
I haven't looked at that at all,
link |
00:16:50.780
but it certainly doesn't,
link |
00:16:52.020
you know, leveraging your Chinese language center,
link |
00:16:54.300
I'm not sure Python or any programming does that.
link |
00:16:57.060
But that was a big deal.
link |
00:16:58.140
The fact that it was accessible, I could be a scientist.
link |
00:17:00.340
What I really liked is many programming languages
link |
00:17:02.900
really demand a lot of you, and you can get a lot,
link |
00:17:04.740
you know, you do a lot if you learn it.
link |
00:17:07.200
But Python enables you to do a lot
link |
00:17:08.900
without demanding a lot of you.
link |
00:17:11.180
There's nuance to that statement,
link |
00:17:13.100
but it certainly was, it's more accessible.
link |
00:17:15.340
So more people could actually, as a scientist,
link |
00:17:18.040
as somebody who, or an engineer,
link |
00:17:19.860
who was trying to solve another problem
link |
00:17:21.460
besides programming itself,
link |
00:17:23.300
I could still use this language and get things done
link |
00:17:26.000
and be happy about it.
link |
00:17:27.340
And I was also comfortable in C at that time.
link |
00:17:30.100
And MATLAB, you did a little bit of that.
link |
00:17:31.340
And MATLAB, I did a lot before that, exactly.
link |
00:17:33.180
So I was comfortable in,
link |
00:17:34.900
those three languages were really the tools I used
link |
00:17:37.580
during my studies and schooling.
link |
00:17:40.540
But to your point about language helping you think,
link |
00:17:42.620
one of the big things about MATLAB was it was,
link |
00:17:44.580
and APL before it, I don't know if you remember APL.
link |
00:17:48.300
APL is actually the predecessor of array based programming,
link |
00:17:51.660
which I think is really an underappreciated,
link |
00:17:54.160
if I talk to people who are just steeped
link |
00:17:55.340
in computer programming, computer science,
link |
00:17:57.640
like most of the people that Microsoft has hired
link |
00:17:59.460
in the past, for example,
link |
00:18:01.140
Microsoft as a company generally did not understand
link |
00:18:03.900
array based programming.
link |
00:18:05.220
Like culturally, they didn't understand it.
link |
00:18:06.620
So they kept missing the boat,
link |
00:18:08.560
kept missing the understanding of what this was.
link |
00:18:11.580
They've gotten better,
link |
00:18:12.740
but there's still a whole culture of folks
link |
00:18:14.420
that doesn't, programming, that's systems programming
link |
00:18:17.980
or web programming or lists and maps.
link |
00:18:20.380
And what about an n dimensional array?
link |
00:18:22.520
Oh yeah, that's just an implementation detail.
link |
00:18:24.700
Well, you can think that,
link |
00:18:26.700
but then actually if you have that as a construct,
link |
00:18:28.800
you actually think differently.
link |
00:18:29.860
APL was the first language to understand that.
link |
00:18:31.660
And it was in the sixties, right?
link |
00:18:33.460
The challenge of APL is APL had very dense,
link |
00:18:36.780
not only glyphs, like new characters, new glyphs,
link |
00:18:39.340
but they even had a new keyboard
link |
00:18:40.480
because to produce those glyphs,
link |
00:18:42.340
this is back in the early days in computing
link |
00:18:43.980
when the QWERTY keyboard maybe wasn't as established,
link |
00:18:47.980
like, well, we can have a new keyboard, no big deal.
link |
00:18:50.780
But it was a big deal and it didn't catch on.
link |
00:18:52.860
And the language APL, very much like Perl,
link |
00:18:56.500
as people would pride themselves on how much,
link |
00:18:58.620
could they write the game of life
link |
00:18:59.740
in 30 characters of APL.
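This is not APL, but the same array-at-a-time style can be sketched in NumPy: one generation of Conway's Game of Life computed from shifted copies of the whole board rather than cell-by-cell loops. The wrap-around (toroidal) edges via `np.roll` are an assumption made to keep the sketch short:

```python
import numpy as np


def life_step(board: np.ndarray) -> np.ndarray:
    """Advance a 0/1 Game of Life board one generation (wrap-around edges)."""
    # Count live neighbours by summing the eight shifted copies of the board.
    neighbours = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Alive next step: exactly 3 neighbours, or 2 neighbours and already alive.
    alive = (neighbours == 3) | ((board == 1) & (neighbours == 2))
    return alive.astype(board.dtype)


# A "blinker": three cells in a row flip between horizontal and vertical.
board = np.zeros((5, 5), dtype=int)
board[2, 1:4] = 1
print(life_step(board))
```

There is no per-cell loop in the rule itself; the whole board is one object being shifted, summed, and compared.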
link |
00:19:03.100
APL has characters that mean summation
link |
00:19:06.060
and they have adverbs,
link |
00:19:08.180
they would have adjectives and these things called adverbs,
link |
00:19:10.060
which are like methods, like reduction,
link |
00:19:12.220
reduction would be an adverb on an add operator, right?
link |
00:19:15.320
So, but doing, using these tools you could construct
link |
00:19:18.660
and then you start to think at that level,
link |
00:19:20.880
you think in n dimensions is something I like to say,
link |
00:19:22.900
and you start to think differently about data at that point.
link |
00:19:25.500
Now you're, it really helps.
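NumPy carries a loose echo of the APL adverb idea in its ufunc methods: `np.add` is the operation, and `.reduce` or `.accumulate` change how it is applied across an axis. APL's `+/` corresponds roughly to `np.add.reduce`; the mapping is an analogy, not an exact equivalence:

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)  # rows: [1, 2, 3] and [4, 5, 6]

# The same "add" verb, modified by different "adverbs".
col_sums = np.add.reduce(a, axis=0)       # 5, 7, 9
row_sums = np.add.reduce(a, axis=1)       # 6, 15
running = np.add.accumulate(a, axis=1)    # running sums along each row
row_products = np.multiply.reduce(a, axis=1)  # 6, 120

print(col_sums, row_sums, running, row_products, sep="\n")
```

Swapping `np.add` for `np.multiply` reuses the same reduction machinery, which is the point of treating reduction as a modifier rather than a separate function per operation.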
link |
00:19:27.500
Yeah, I mean, outside of programming,
link |
00:19:30.100
if you really internalize linear algebra as a course,
link |
00:19:33.700
I mean, it's philosophically allows you
link |
00:19:35.580
to think of the world differently.
link |
00:19:37.220
It's almost like liberating, you don't have to,
link |
00:19:39.700
you don't have to think about the individual numbers
link |
00:19:42.100
in the n dimensional array.
link |
00:19:44.240
You could think of it as an object in itself
link |
00:19:46.140
and all of a sudden this world can open up.
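A small sketch of that shift in viewpoint, using arbitrary random data: the same matrix-vector product written once in terms of individual numbers and once treating the matrix and vector as whole objects:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# Element-by-element view: explicit loops over individual numbers.
y_loop = np.zeros(3)
for i in range(3):
    for j in range(3):
        y_loop[i] += A[i, j] * x[j]

# Whole-object view: the matrix acts on the vector in one expression.
y = A @ x

print(np.allclose(y, y_loop))
```

Both compute the same thing; the second form lets you reason about `A` as a linear map instead of a grid of entries.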
link |
00:19:48.500
You're saying MATLAB and APL were like the early C,
link |
00:19:52.660
I don't know if many languages got that right ever.
link |
00:19:54.980
No, no, no they didn't.
link |
00:19:56.860
Even still.
link |
00:19:57.700
Even still, I would say.
link |
00:19:58.820
I mean, NumPy is an inheritor of the traditions
link |
00:20:02.540
that I would say APL started; J was another version that was,
link |
00:20:06.580
what it did is not have the glyphs,
link |
00:20:08.340
just have short characters,
link |
00:20:09.700
but still a Latin keyboard could type them.
link |
00:20:11.740
And then Numeric inherited from that
link |
00:20:14.540
in terms of let's add arrays plus broadcasting
link |
00:20:17.660
plus methods, reduction,
link |
00:20:19.700
even some of the language like rank is a concept
link |
00:20:21.780
that was in Python and is still in Python
link |
00:20:24.660
for the number of dimensions, right?
link |
00:20:27.180
That's different than say the rank of a matrix
link |
00:20:29.460
which people think of as well.
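In today's NumPy the APL-style "rank" (number of dimensions) is exposed as `ndarray.ndim`, while linear-algebra rank is a separate computation, `np.linalg.matrix_rank`. A short sketch of the difference, plus broadcasting, using a made-up matrix whose second row is twice its first:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # second row is twice the first

dims = m.ndim                          # array rank: number of dimensions
lin_rank = np.linalg.matrix_rank(m)    # linear-algebra rank
print(dims, lin_rank)

# Broadcasting: a shape-(3,) row is stretched across both rows of m.
shifted = m + np.array([10.0, 20.0, 30.0])
print(shifted)
```

The array has two dimensions but only one linearly independent row, so the two notions of "rank" give different answers here.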
link |
00:20:31.140
So it came from that tradition,
link |
00:20:33.060
but NumPy is a very pragmatic, practical tool.
link |
00:20:37.980
NumPy inherited from numeric
link |
00:20:39.260
and we can get to where NumPy came from
link |
00:20:40.820
which is the current array,
link |
00:20:43.340
at least current as of 2015, 2017.
link |
00:20:46.100
Now there's a ton of them over the past two or three years.
link |
00:20:49.320
We can get into that too.
link |
00:20:50.320
So if we just linger on the early days
link |
00:20:52.780
of what was your favorite feature of Python?
link |
00:20:56.220
Do you remember like what?
link |
00:20:58.020
So it's so interesting to linger on like the,
link |
00:21:02.260
what really makes you connect with a language?
link |
00:21:06.300
I'm not sure it's obvious to introspect that.
link |
00:21:09.400
No, it isn't.
link |
00:21:10.240
And I've thought about that at some length.
link |
00:21:12.860
I think definitely the fact that I could read it later,
link |
00:21:16.460
that I could use it productively
link |
00:21:18.140
without becoming an expert.
link |
00:21:19.820
Other languages I had to put more effort into.
link |
00:21:22.180
That's like an empirical observation.
link |
00:21:23.940
Like you're not analyzing any one aspect of the language.
link |
00:21:26.500
It just seems time after time when you look back,
link |
00:21:29.460
it's somehow readable.
link |
00:21:30.580
It's somehow readable.
link |
00:21:31.420
Then it was sort of, I could take executable English
link |
00:21:35.380
and translate it to Python more easily.
link |
00:21:36.820
Like I didn't have to go, there was no translation layer.
link |
00:21:39.760
As an engineer or as a scientist,
link |
00:21:41.580
I could think about what I wanted to do.
link |
00:21:43.240
And then the syntax wasn't that far behind it, right?
link |
00:21:46.780
Now there are some warts there still.
link |
00:21:49.220
It wasn't perfect.
link |
00:21:50.600
Like there's some areas where I'm like,
link |
00:21:51.440
ah, it'd be better if this were different
link |
00:21:52.820
or if this were different.
link |
00:21:54.380
Some of those things got added to the language too.
link |
00:21:56.580
I was really grateful for some of the early pioneers
link |
00:21:58.580
in the Python ecosystem back then,
link |
00:22:00.220
because Python got written in '91.
link |
00:22:01.900
That's when the first version came out.
link |
00:22:03.140
But Guido was very open to users.
link |
00:22:06.540
And one of the sets of users were people like Jim Hugunin
link |
00:22:08.660
and David Ascher and Paul Dubois and Konrad Hinsen.
link |
00:22:13.460
These were people that were on the mailing list.
link |
00:22:15.380
And they were just asking for things like,
link |
00:22:16.860
hey, we really should have complex numbers in this language.
link |
00:22:19.220
So let's, you know, there's a j, there's a 1j, right?
link |
00:22:22.540
And the fact that they went the engineering route of j
link |
00:22:24.340
is interesting.
link |
00:22:26.660
I don't think that's entirely favoring engineers.
link |
00:22:28.620
I think it's because i is so often used
link |
00:22:30.460
as the index of a for loop.
link |
00:22:32.100
So I think that's actually why.
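In Python the imaginary suffix is written `j` (for example `1j`), which leaves the conventional loop index `i` free. A minimal sketch of the built-in complex type:

```python
# Python's built-in complex type; the literal suffix is j, not i.
z = 3 + 4j

print(abs(z))          # 5.0, the magnitude sqrt(3**2 + 4**2)
print(z.conjugate())   # (3-4j)
print(1j * 1j)         # (-1+0j), since j squared is -1

# Because the suffix is j, a loop index named i never clashes with it.
total = 0j
for i in range(4):
    total += z * i     # accumulates z * (0 + 1 + 2 + 3)
print(total)           # (18+24j)
```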
link |
00:22:34.260
Probably, I mean, there's a pragmatic aspect.
link |
00:22:36.740
But the fact that complex numbers were there, I love that.
link |
00:22:39.100
The fact that I could write in the array constructs
link |
00:22:41.460
and that reduction was there,
link |
00:22:42.820
very simple to write summations and broadcasting was there.
link |
00:22:46.540
I could do addition of whole arrays.
link |
00:22:49.440
So that was cool.
link |
00:22:50.380
Those are some things I loved about it.
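Whole-array arithmetic, broadcasting, and reductions, as described here, look like this in present-day NumPy (used as an assumption; the tools of that era were Numeric and the Python of the time):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
y = np.array([10, 20, 30])       # shape (3,)

# Whole-array addition: y is broadcast across each row of x, no loop needed.
z = x + y                        # [[10, 21, 32], [13, 24, 35]]

# Reductions collapse an axis with one call instead of an explicit summation loop.
total = x.sum()                      # 15
col_sums = x.sum(axis=0)             # [3, 5, 7]
row_sums = np.add.reduce(x, axis=1)  # [3, 12], the same kind of reduction via the ufunc's reduce method

print(z, total, col_sums, row_sums, sep="\n")
```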
link |
00:22:52.660
I don't know what to start talking to you about
link |
00:22:54.820
because you've created so many incredible projects
link |
00:22:57.860
that basically changed the whole landscape of programming.
link |
00:23:00.180
But okay, let's start with,
link |
00:23:02.380
let's go chronologically with SciPy.
link |
00:23:06.060
You created SciPy over two decades ago now?
link |
00:23:09.100
Yes, yes, I love to talk about SciPy.
link |
00:23:11.140
SciPy was really my baby.
link |
00:23:12.980
What is it?
link |
00:23:14.420
What was its goal?
link |
00:23:15.420
What is its goal?
link |
00:23:16.420
How does it work?
link |
00:23:17.260
Yeah, fantastic.
link |
00:23:18.100
So SciPy was effectively, here I am using Python
link |
00:23:21.580
to do stuff that I previously used MATLAB to do.
link |
00:23:24.980
And I was using Numeric, which is an array library
link |
00:23:26.860
that made a lot of it possible.
link |
00:23:28.300
But there's things that were missing.
link |
00:23:29.900
Like I didn't have an ordinary differential equation solver
link |
00:23:32.100
I could just call, right?
link |
00:23:33.460
I didn't have integration.
link |
00:23:35.260
Hey, I wanted to integrate this function.
link |
00:23:37.180
Okay, well, I don't have just a function
link |
00:23:38.780
I can call to do that.
link |
00:23:40.580
These are things I remember being critical things
link |
00:23:42.540
that I was missing.
link |
00:23:43.700
Optimization.
link |
00:23:44.580
I just wanna pass a function to an optimizer
link |
00:23:46.780
and have it tell me what the optimal value is.
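The "pass a function, get an answer" workflow described here is what SciPy's `scipy.integrate.quad` and `scipy.optimize.minimize` provide today. A minimal sketch with present-day SciPy (these are not the 1998-era module names, and the `bowl` function is a made-up test objective):

```python
import numpy as np
from scipy import integrate, optimize

# Integration: hand a function and limits to an integrator, get a number back.
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)  # exact answer is 2

# Optimization: hand a function and a starting guess to an optimizer.
def bowl(p):
    """Hypothetical objective with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

result = optimize.minimize(bowl, x0=[0.0, 0.0])
print(value, result.x)  # close to 2.0 and [1.0, -2.0]
```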
link |
00:23:50.100
Those are things I'm like, well,
link |
00:23:51.100
why don't we just write a library that adds these tools?
link |
00:23:54.340
And I started to post on the mailing list
link |
00:23:55.740
and there'd previously been, people have discussed,
link |
00:23:58.100
I remember Konrad Hinsen saying,
link |
00:23:59.140
wouldn't it be great if we had this optimizer library
link |
00:24:00.980
or David Ascher saying this stuff.
link |
00:24:02.580
And I was ambitious, well, ambitious is the wrong word,
link |
00:24:06.940
eager, and probably had more time than sense.
link |
00:24:11.340
I was a poor graduate student.
link |
00:24:13.620
My wife thinks I'm working on my PhD and I am,
link |
00:24:15.860
but part of the PhD that I loved
link |
00:24:17.220
was the fact that it's exploratory.
link |
00:24:19.180
You're not just taking orders,
link |
00:24:21.540
fulfilling a list of things to do,
link |
00:24:23.500
you're trying to figure out what to do.
link |
00:24:25.740
And so I thought, well, I'm writing tools
link |
00:24:27.900
for my own use in a PhD,
link |
00:24:29.140
so I'll just start this project.
link |
00:24:32.140
And so in '99, well, '98 was when I first started
link |
00:24:34.940
to write libraries for Python.
link |
00:24:36.620
Definitely when I fell in love with Python in '98,
link |
00:24:38.260
I thought, oh, well, there's just a few things missing.
link |
00:24:39.740
Like, oh, I need a reader to read DICOM files.
link |
00:24:42.700
I was in medical imaging and DICOM was a format
link |
00:24:44.580
that I wanted to be able to load into Python.
link |
00:24:46.940
Okay, how do I write a reader for that?
link |
00:24:48.180
So I wrote something called, it was an IO package, right?
link |
00:24:51.700
And that was my very first extension module, which is in C.
link |
00:24:55.140
So I wrote C code to extend Python
link |
00:24:57.060
so that in Python I could write things more easily.
link |
00:24:59.660
That combination kind of hooked me.
link |
00:25:02.260
It was the idea that I could,
link |
00:25:03.300
here's this powerful tool I can use as a scripting language
link |
00:25:05.700
and a high level language to think about,
link |
00:25:07.460
but that I can extend easily, easily in C,
link |
00:25:11.420
easily for me because I knew enough C.
link |
00:25:13.780
And then Guido had written a link.
link |
00:25:15.260
I mean, the only, the hard part of extending Python
link |
00:25:17.220
was the way memory management works,
link |
00:25:19.500
and you have to do reference counting.
link |
00:25:21.060
And so there's a tracking of reference counting
link |
00:25:23.820
you have to do manually.
link |
00:25:25.500
And if you don't, you have memory leaks.
link |
00:25:27.500
And so that's hard.
link |
00:25:29.020
Plus then C, you know, it's just much more,
link |
00:25:31.020
you have to put more effort into it.
link |
00:25:32.180
It's not just, I have to now think about pointers
link |
00:25:34.700
and I have to think about stuff that is different.
link |
00:25:37.620
I have to kind of,
link |
00:25:38.460
you're like putting a new cartridge in your brain.
link |
00:25:40.620
Like, okay, I'm thinking about MRI.
link |
00:25:42.380
Now I'm thinking about programming.
link |
00:25:43.580
And there are distinct modules
link |
00:25:45.340
you end up having to think about.
link |
00:25:46.620
So it's harder.
link |
00:25:47.460
And when I was just in Python,
link |
00:25:48.300
I could just think about MRI and high level writing,
link |
00:25:51.500
but I could do that.
link |
00:25:52.340
And that kind of, I liked it.
link |
00:25:54.020
I found that to be enjoyable and fun.
link |
00:25:55.780
And so I ended up, oh,
link |
00:25:57.220
well, let me just add a bunch of stuff to Python
link |
00:25:59.020
to do integration.
link |
00:26:00.580
Well, and the cool thing is,
link |
00:26:01.660
is that the power of the internet,
link |
00:26:03.060
just looking around and I found,
link |
00:26:04.300
oh, there's this Netlib,
link |
00:26:06.300
which has hundreds of Fortran routines
link |
00:26:08.860
that people have written in the 60s and the 70s and the 80s
link |
00:26:12.260
in Fortran 77. Fortunately, it wasn't Fortran 66.
link |
00:26:14.900
So it had been ported to Fortran 77.
link |
00:26:18.100
And Fortran 77 is actually a really great language.
link |
00:26:21.660
Fortran 90 probably is my favorite Fortran
link |
00:26:24.100
because it's also, it's got complex numbers,
link |
00:26:26.100
got arrays and it's pretty high level.
link |
00:26:27.700
Now, the problem with it
link |
00:26:28.980
is you'd never want to write a program in Fortran 90
link |
00:26:31.020
or Fortran 77,
link |
00:26:32.260
but it's totally fine to write a subroutine in, right?
link |
00:26:34.900
And so, and then Fortran kind of got a little off course
link |
00:26:37.660
when they tried to compete with C++.
link |
00:26:39.060
But at the time,
link |
00:26:40.580
I just wanted libraries to do something like,
link |
00:26:42.340
oh, here's an ordinary differential equation.
link |
00:26:43.940
Here's integration.
link |
00:26:44.900
Here's Runge-Kutta integration.
link |
00:26:46.780
Already done.
link |
00:26:47.620
I don't have to think about that algorithm.
link |
00:26:48.780
I mean, you could,
link |
00:26:49.620
but it's nice to have somebody who's already done one
link |
00:26:51.020
and tested it.
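For a sense of what a Runge-Kutta routine does, here is a minimal fixed-step, classical fourth-order version for a scalar equation. It is a textbook sketch, not the Netlib code mentioned above, which adds adaptive step sizes and error control; the helper names `rk4_step` and `rk4_solve` are hypothetical.

```python
import math

def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h (classical 4th-order Runge-Kutta)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rk4_solve(f, t0, y0, t_end, n_steps):
    """Take n_steps equal RK4 steps from t0 to t_end and return y(t_end)."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

# Example: y' = -y with y(0) = 1, whose exact solution is exp(-t).
approx = rk4_solve(lambda t, y: -y, 0.0, 1.0, 1.0, 100)
print(approx, math.exp(-1.0))
```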
link |
00:26:51.860
And so I sort of started this journey in 98, really.
link |
00:26:55.060
If you look back at the mailing list,
link |
00:26:55.980
there's sort of this productive era of me
link |
00:26:59.660
writing an extension module
link |
00:27:01.100
to connect Runge-Kutta integration to Python
link |
00:27:04.580
and making an ordinary differential equation solver.
link |
00:27:06.660
And then releasing that as a package.
link |
00:27:09.140
So we'd call it ODEPACK, I think I called it then.
link |
00:27:11.820
QUADPACK.
link |
00:27:12.660
And then I just made these packages.
link |
00:27:14.420
Eventually that became Multipack
link |
00:27:16.260
because they were originally modular.
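Present-day SciPy still wraps these Fortran libraries: `scipy.integrate.odeint` calls LSODA from ODEPACK, and `scipy.integrate.quad` uses QUADPACK. A minimal sketch with `odeint` on a harmonic oscillator whose exact solution is known (the `rhs` function is an illustrative assumption):

```python
import numpy as np
from scipy.integrate import odeint

# Harmonic oscillator x'' = -x, written as a first-order system in (x, v).
def rhs(state, t):
    x, v = state
    return [v, -x]

t = np.linspace(0.0, 2.0 * np.pi, 50)
sol = odeint(rhs, [1.0, 0.0], t, rtol=1e-10, atol=1e-12)  # columns: x(t), v(t)

# The exact solution is x = cos t, v = -sin t.
print(np.max(np.abs(sol[:, 0] - np.cos(t))))
```

Note that `odeint` expects `func(y, t)` argument order by default; the newer `solve_ivp` interface uses `fun(t, y)`.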
link |
00:27:17.580
You can install them separately.
link |
00:27:19.140
But a massive problem in Python
link |
00:27:20.700
was actually just getting your stuff installed.
link |
00:27:23.420
At the time, releasing software for me,
link |
00:27:25.820
like today, people think, what does that mean?
link |
00:27:27.580
Well, then it meant some poorly written webpage.
link |
00:27:30.780
I had some bad webpage up and I put a tarball,
link |
00:27:33.100
just a gzipped tarball of source code.
link |
00:27:35.780
That was the release.
link |
00:27:37.140
But okay, can we just linger on that?
link |
00:27:39.180
Because the community aspect
link |
00:27:43.060
of creating the package and sharing that, that's rare.
link |
00:27:47.820
That, to have, to both have the, at that time,
link |
00:27:50.940
so like the raw.
link |
00:27:51.780
Yeah, it was pretty early, yeah.
link |
00:27:52.740
Oh, well, not rare.
link |
00:27:54.660
Maybe you can correct me on this,
link |
00:27:57.020
but it seems like in the scientific community,
link |
00:27:59.660
so many people, you were basically solving the problems
link |
00:28:02.420
you needed to solve to process the particular application,
link |
00:28:07.100
the data that you need.
link |
00:28:08.540
And to also have the mind
link |
00:28:10.900
that I'm going to make this usable for others, that's.
link |
00:28:15.340
I would say I was inspired.
link |
00:28:16.500
I'd been inspired by Linux,
link |
00:28:18.060
been inspired by Linus and him making his code available.
link |
00:28:21.820
And I was starting to use Linux at the time.
link |
00:28:23.260
And I went, this is cool.
link |
00:28:24.460
So I'd kind of been previously primed that way.
link |
00:28:27.060
And generally I was into science
link |
00:28:29.180
because I liked the sharing notion.
link |
00:28:30.980
I liked the idea of, hey, let's,
link |
00:28:32.660
if collectively we build knowledge and share it,
link |
00:28:34.780
we can all be better off.
link |
00:28:35.740
Okay, so you were energized by that idea.
link |
00:28:37.420
So I was energized by that idea already, right?
link |
00:28:39.540
And I can't deny that I was.
link |
00:28:40.940
I sort of had this very,
link |
00:28:42.900
I liked that part of science, that part of sharing.
link |
00:28:45.700
And then all of a sudden, oh, wait, here's something.
link |
00:28:47.300
And here's something I could do.
link |
00:28:49.940
And then I slowly over years learned how to share better
link |
00:28:52.780
so that you could actually engage more people faster.
link |
00:28:55.100
One of the key things was actually giving people a binary
link |
00:28:57.100
they could install, right?
link |
00:28:58.980
So that it wasn't just your source code, good luck.
link |
00:29:01.460
Compile this and then.
link |
00:29:02.660
It's compiled, ready to install, just, you know.
link |
00:29:05.180
So in fact, a lot of the journey from 98,
link |
00:29:07.380
even through 2012 when I started Anaconda was about that.
link |
00:29:10.780
Like it's why, you know, it's really the key
link |
00:29:13.260
as to why a scientist with dreams of doing MRI research
link |
00:29:17.460
ended up starting a software company
link |
00:29:19.500
that installs software.
link |
00:29:22.260
I work with a few folks now that don't program
link |
00:29:26.700
like on the creative side and the video side,
link |
00:29:28.580
the audio side.
link |
00:29:29.620
And because my whole life is running on scripts,
link |
00:29:32.500
I have to try to get them,
link |
00:29:34.020
I have the task of teaching them
link |
00:29:35.900
how to do Python enough to run the scripts.
link |
00:29:39.220
And so I've been actually facing this,
link |
00:29:40.820
whether it's Anaconda or something else, with the task of
link |
00:29:44.220
how do I minimally explain basically to my mom
link |
00:29:46.780
how to write a Python script.
link |
00:29:48.900
And it's an interesting challenge.
link |
00:29:50.500
I have to, it's a to do item for me to figure out like,
link |
00:29:53.020
what is the minimal amount of information I have to teach?
link |
00:29:56.340
What are the tools you use that one, you enjoy it,
link |
00:29:59.700
two, you're effective at it.
link |
00:30:00.540
And they're related, those are two related questions.
link |
00:30:02.540
And then the debugging, like the iterative process
link |
00:30:05.500
of running the script to figure out what the error is,
link |
00:30:07.820
maybe even for some people to do the fix yourself.
link |
00:30:11.580
So do you compile it?
link |
00:30:12.660
Do you, like how do you distribute that code to them?
link |
00:30:15.620
And it's interesting because I think
link |
00:30:18.540
it's exactly what you're talking about.
link |
00:30:20.100
If you increase the circle of empathy,
link |
00:30:24.260
the circle of people that are able to use your programs,
link |
00:30:28.900
you increase its effectiveness and its power.
link |
00:30:32.900
And so you have to think, can I write scripts?
link |
00:30:37.020
Can I write programs that can be used by medical engineers,
link |
00:30:40.140
by all kinds of people that don't know programming
link |
00:30:43.900
and actually maybe plant a seed,
link |
00:30:46.900
have them catch the bug of programming
link |
00:30:48.380
so that they start on a journey.
link |
00:30:50.180
That's a huge responsibility.
link |
00:30:51.500
And ultimately it has to do with the Amazon one click buy.
link |
00:30:55.340
Like how frictionless can you make the early steps?
link |
00:30:58.780
Frictionless is actually really key.
link |
00:31:00.380
In any community, at any friction point,
link |
00:31:03.020
you're just gonna lose some people, right?
link |
00:31:05.180
Now sometimes you may wanna intentionally do that.
link |
00:31:09.060
If you're early enough on, you need a lot of help.
link |
00:31:11.620
You need people who have the skills.
link |
00:31:13.340
You might actually, it's helpful.
link |
00:31:14.740
You don't necessarily want too many users
link |
00:31:16.820
as opposed to contributors if you're early on.
link |
00:31:20.340
Anyway, SciPy started in '98,
link |
00:31:23.100
but it really emerged as this collection of modules
link |
00:31:25.740
that I was just putting on the net.
link |
00:31:27.340
People were downloading and I think I got 100 users, right?
link |
00:31:31.580
By the end of that year.
link |
00:31:32.660
But the fact that I got 100 users and more than that,
link |
00:31:35.660
people started to email me with fixes.
link |
00:31:39.420
And that was actually intoxicating, right?
link |
00:31:41.300
That was the, here I'm writing papers
link |
00:31:44.220
and I'm giving conferences and I get people to say hello,
link |
00:31:46.180
but yeah, good job.
link |
00:31:47.420
But mostly it was, you're viewed with,
link |
00:31:49.860
it's competitive, right?
link |
00:31:51.540
You publish a paper and people are like,
link |
00:31:52.900
oh, it wasn't my paper.
link |
00:31:55.900
I was starting to see that sense of academic life
link |
00:31:59.220
where it was so much,
link |
00:32:00.180
I thought there was this cooperative effort,
link |
00:32:01.460
but it sounds like we're here just to one up each other.
link |
00:32:04.940
And it's not true across the board,
link |
00:32:07.700
but a lot of that's there.
link |
00:32:08.580
But here in this world,
link |
00:32:09.660
I was getting responses from people all over the world.
link |
00:32:13.700
I remember Pearu Peterson in Estonia, right?
link |
00:32:16.060
Was one of the first people.
link |
00:32:17.340
And he sent me back this make file,
link |
00:32:18.740
'cause the first thing was, yeah, your build thing stinks
link |
00:32:21.220
and here's a better make file.
link |
00:32:23.020
Now it was a complex make file.
link |
00:32:24.380
I don't think I ever understood that make file, actually,
link |
00:32:26.580
but it worked and it did a lot more.
link |
00:32:29.220
And so I said, thanks, this is cool.
link |
00:32:30.980
And that was my first kind of engagement
link |
00:32:32.500
with community development.
link |
00:32:35.100
But the process was, he sent me a patch file.
link |
00:32:37.660
I had to upload a new tar ball.
link |
00:32:39.900
And I just found, I really love that.
link |
00:32:41.580
And the style back then was here's a mailing list.
link |
00:32:43.660
It's very, it wasn't as,
link |
00:32:45.740
there certainly weren't the tools that are available today.
link |
00:32:47.660
It was very early on, but I really started to,
link |
00:32:49.940
that's the whole year.
link |
00:32:50.780
I think I did about seven packages that year, right?
link |
00:32:54.580
And then by the end of the year,
link |
00:32:55.540
I collected them into a thing called multipack.
link |
00:32:57.840
So in 99, there was this thing called multipack.
link |
00:32:59.780
And that's when a high school student,
link |
00:33:01.820
no, he was a high school student at the time,
link |
00:33:03.060
guy named Robert Kern,
link |
00:33:04.780
took that package and made a Windows installer, right?
link |
00:33:09.700
And then of course, a massive increase of usage.
link |
00:33:12.700
So by the way, most of this development was under Linux.
link |
00:33:15.860
Yes, yes, it was on Linux.
link |
00:33:17.380
I was a Linux developer doing it on a Unix box.
link |
00:33:20.240
I mean, at the time I was actually getting into,
link |
00:33:23.020
I had a new hard drive,
link |
00:33:24.060
did some kernel programming to make the hard drive work.
link |
00:33:26.500
I mean, not programming, but modification to the kernel
link |
00:33:28.780
so I could actually get a hard drive working.
link |
00:33:31.180
I love that aspect of it.
link |
00:33:32.320
I was also in, at school, I was building a cluster.
link |
00:33:36.100
I took Mac computers and you put Yellow Dog Linux on them.
link |
00:33:40.940
At the Mayo Clinic, they were just,
link |
00:33:42.140
they had all these Macs that were older,
link |
00:33:43.520
they were just getting rid of.
link |
00:33:44.740
And so I kind of got permission to go grab them together.
link |
00:33:46.820
I put about 24 of them together in a cluster, in a cabinet,
link |
00:33:50.340
and put Yellow Dog Linux on them all.
link |
00:33:51.700
And I wrote a C++ program to do MRI simulation.
link |
00:33:56.240
That was what I was doing at the same time
link |
00:33:58.900
for my day job, so to speak.
link |
00:34:01.400
So I was loving the whole process.
link |
00:34:03.460
And the same time I was,
link |
00:34:04.300
oh, I need an ordinary differential equation solver.
link |
00:34:06.260
That's why ordinary differential equations were key
link |
00:34:08.160
was because that's the heart of a Bloch equation
link |
00:34:09.820
for simulating MRI, is an ODE solver.
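As a hedged illustration of why an ODE solver sits at the center of MRI simulation, here is the textbook free-precession form of the Bloch equations in the rotating frame (no RF pulse), solved with `scipy.integrate.solve_ivp`. This is not the C++ simulator described above; the relaxation times and off-resonance frequency are illustrative assumptions, and sign conventions for the precession term differ between texts.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (assumed values): seconds, and rad/s for the off-resonance.
T1, T2 = 0.8, 0.1
M0 = 1.0
d_omega = 2 * np.pi * 10.0  # 10 Hz off-resonance

def bloch(t, m):
    """Free-precession Bloch equations in the rotating frame."""
    mx, my, mz = m
    return [
        d_omega * my - mx / T2,   # transverse precession plus T2 decay
        -d_omega * mx - my / T2,
        (M0 - mz) / T1,           # longitudinal recovery toward M0
    ]

# Start just after an idealized 90-degree pulse: magnetization tipped onto the x axis.
sol = solve_ivp(bloch, (0.0, 5.0), [M0, 0.0, 0.0], rtol=1e-8, atol=1e-10)
mx, my, mz = sol.y[:, -1]
print(mx, my, mz)  # transverse parts have decayed; mz has nearly recovered to M0
```

For this setup the longitudinal component has the closed form `M0 * (1 - exp(-t / T1))`, which makes a convenient check on the solver.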
link |
00:34:12.420
And so that's, but I actually did that,
link |
00:34:15.720
it just happened at the same time.
link |
00:34:16.980
That's why it was kind of what you're working on
link |
00:34:18.540
and what you're interested in, they're coinciding.
link |
00:34:20.500
I was definitely scratching my own itch
link |
00:34:22.380
in terms of building stuff.
link |
00:34:24.060
And which helped in the sense that I was using it for me,
link |
00:34:27.040
so at least I had one user.
link |
00:34:28.540
I had one person who was like, well, no, this is better.
link |
00:34:30.360
I like this interface better.
link |
00:34:31.420
And I had the experience of MATLAB
link |
00:34:33.300
to guide some of what those APIs might look like.
link |
00:34:36.480
But you're just doing it yourself,
link |
00:34:37.720
you're building all this stuff.
link |
00:34:39.000
But with the Windows installer,
link |
00:34:40.060
it was the first time I realized, oh yeah,
link |
00:34:41.460
the binary installer really helps people.
link |
00:34:43.740
And so that led to spending more time
link |
00:34:46.980
on that side of things.
link |
00:34:49.100
So around 2000, so I graduated my PhD in 2000,
link |
00:34:52.780
end of year, end of 2000.
link |
00:34:53.780
So 99 doing a lot of work there,
link |
00:34:56.660
98 doing a lot of work there,
link |
00:34:57.740
99 kind of spending more time on my PhD,
link |
00:35:00.780
helping people use the tools,
link |
00:35:02.420
thinking about what do I want to go from here.
link |
00:35:04.060
There was a company, there was a guy actually,
link |
00:35:05.620
Eric Jones and Travis Vaught.
link |
00:35:07.620
They were two friends who founded a company called Enthought.
link |
00:35:11.080
It's here in Austin, still here.
link |
00:35:13.620
And they, Eric contacted me at the time
link |
00:35:16.060
when I was a graduate student still.
link |
00:35:19.380
And he said, hey, why don't you come down?
link |
00:35:20.860
We want to build a company.
link |
00:35:22.660
We're thinking of a scientific company
link |
00:35:25.720
and we want to take what you're doing
link |
00:35:27.560
and kind of add it to some stuff that he'd done.
link |
00:35:29.460
He'd written some tools.
link |
00:35:31.220
And then Pearu Peterson had done F2PY.
link |
00:35:32.820
Let's come together and build,
link |
00:35:34.380
pull this all together and call it SciPy.
link |
00:35:36.740
So that's the origin of the SciPy brand.
link |
00:35:39.480
It came from Multipack
link |
00:35:41.380
and a whole bunch of modules I'd written,
link |
00:35:42.580
plus a few things from some other folks
link |
00:35:44.500
and then pulled together in a single installer.
link |
00:35:47.580
SciPy was really a distribution of Python
link |
00:35:49.540
masquerading as a library.
link |
00:35:51.260
How did you think about SciPy in context of Python,
link |
00:35:54.340
in context of Numeric, like what?
link |
00:35:56.180
So we saw SciPy as a way to make an R&D environment
link |
00:35:59.020
for Python, like use Python, depended on Numeric.
link |
00:36:03.380
So Numeric was the array library we depended on.
link |
00:36:05.540
And then from there, extend it with a bunch of modules
link |
00:36:08.260
that allowed for, and at the time,
link |
00:36:10.340
the original vision of SciPy was to have plotting,
link |
00:36:13.180
was to have the REPL environment
link |
00:36:16.140
and kind of really a whole data environment
link |
00:36:19.500
that you could then install and get going with.
link |
00:36:21.020
And that was kind of the thinking.
link |
00:36:23.020
It didn't really evolve that way, right?
link |
00:36:25.020
It sort of had a, for one,
link |
00:36:27.580
it's really hard to do massive scale projects
link |
00:36:31.940
with open source collectives.
link |
00:36:34.300
Actually, there's sort of an intrinsic cooperation limit
link |
00:36:38.500
as to which, too many cooks in the kitchen,
link |
00:36:40.780
you can do amazing infrastructure work.
link |
00:36:42.780
When it comes down to bringing it all together
link |
00:36:44.220
into a single deliverable,
link |
00:36:45.860
that actually requires a little more product management
link |
00:36:49.660
that is not, that doesn't really emerge
link |
00:36:52.820
from the same dynamic.
link |
00:36:53.980
So it struggled, struggled to get almost too many voices.
link |
00:36:57.860
It's hard to have everybody agree.
link |
00:36:59.220
Consensus doesn't really work at that scale.
link |
00:37:02.100
You end up with politics,
link |
00:37:03.260
with the same kind of things that's happened
link |
00:37:05.220
in large organizations trying to decide
link |
00:37:07.100
what to do together.
link |
00:37:09.380
So consensus building was challenging at scale
link |
00:37:12.340
as more people came in, right?
link |
00:37:13.860
Early on, it's fine, because there's nobody there.
link |
00:37:15.700
So it works, but then as you get more successful
link |
00:37:17.740
and more people use it, all of a sudden,
link |
00:37:18.980
oh, there's this scale at which this doesn't work anymore
link |
00:37:22.300
and we have to come up with different approaches.
link |
00:37:23.980
So SciPy came out officially in 2001,
link |
00:37:26.700
was the first release, most of the time.
link |
00:37:28.900
I remember the days of getting that release ready.
link |
00:37:31.060
It was a Windows installer and there were bugs
link |
00:37:33.420
on how the Windows compiler handled complex numbers
link |
00:37:36.300
and you were chasing segmentation faults.
link |
00:37:38.540
And it was, it's a lot of work.
link |
00:37:40.420
There was a lot of effort had nothing to do
link |
00:37:43.140
with my area of study.
link |
00:37:45.540
And at the same time, I had just gotten an offer.
link |
00:37:47.500
So Eric wondered if I wanted to come down
link |
00:37:48.780
and help him start that company with his friend.
link |
00:37:51.460
And at the time I was like, I was intrigued,
link |
00:37:53.380
but I was pursuing a path, an academic path.
link |
00:37:56.620
And I had just got an offer to go and teach at my alma mater.
link |
00:37:59.980
So I took that tenure track position.
link |
00:38:02.420
And SciPy, and kind of, then I started to work on SciPy
link |
00:38:05.180
as a professor too.
link |
00:38:07.060
So that's, I left the Mayo Clinic,
link |
00:38:09.540
graduated, wrote my thesis using SciPy,
link |
00:38:11.700
wrote, you know, there's images that were created.
link |
00:38:15.500
Now the plotting tool I used was something
link |
00:38:17.300
from Yorick actually.
link |
00:38:18.660
It was a plotting, a PLT kind of a plotting language
link |
00:38:21.940
that I used.
link |
00:38:22.780
Yorick is a programming language?
link |
00:38:23.940
It was a programming language, had a plotting tool,
link |
00:38:26.340
DISLIN, it had integration to DISLIN.
link |
00:38:28.940
I ended up using DISLIN plus some of the plotting
link |
00:38:31.340
from Yorick linked to from Python.
link |
00:38:33.740
Anyway, it was, people don't plot that way now,
link |
00:38:37.180
but this is before, and SciPy was trying to add plotting.
link |
00:38:40.260
Yeah. Right?
link |
00:38:41.460
It didn't have much success.
link |
00:38:42.580
Really the success of plotting came from John Hunter,
link |
00:38:45.580
who had a similar experience to my experience,
link |
00:38:47.420
my kind of maverick experience as a person
link |
00:38:49.660
just trying to get stuff done and kind of having more time
link |
00:38:51.700
than money maybe, right?
link |
00:38:53.820
And John Hunter created what?
link |
00:38:55.300
Matplotlib.
link |
00:38:56.300
He's the creator of Matplotlib.
link |
00:38:57.140
Yeah, so John Hunter was, you know,
link |
00:38:59.140
he wasn't a student at the time, but he was an,
link |
00:39:00.580
he was working in the quant field and he said,
link |
00:39:02.120
we need better plotting.
link |
00:39:03.500
So he just went out and said, cool, I'll make a new project
link |
00:39:05.540
and we'll call it Matplotlib.
link |
00:39:06.580
And he released it in 2001,
link |
00:39:08.260
about the same time that SciPy came out
link |
00:39:09.920
and it was a separate library, a separate install,
link |
00:39:12.960
used Numeric; SciPy used Numeric.
link |
00:39:15.540
And so SciPy, you know, in 2001, we released SciPy
link |
00:39:18.980
and then Enthought created a conference called SciPy,
link |
00:39:22.380
which brought people together to talk about the space.
link |
00:39:25.460
And that conference is still ongoing.
link |
00:39:26.700
It's one of the favorite conferences of a lot of people
link |
00:39:28.460
because it's, you know, it's changed over the years,
link |
00:39:30.820
but early on it was, you know, a collection of 50 people
link |
00:39:33.740
who care about, scientists mostly, you know,
link |
00:39:36.700
practicing scientists who want, who care about coding
link |
00:39:39.300
and doing it well and not using MATLAB.
link |
00:39:42.140
And I remember being driven by, you know, I liked MATLAB,
link |
00:39:44.120
but I didn't like the fact that,
link |
00:39:46.420
so I'm not opposed to proprietary software.
link |
00:39:48.060
I'm actually not an open source zealot.
link |
00:39:50.220
I love open source for the, what it brings,
link |
00:39:52.660
but I also see the role for proprietary software.
link |
00:39:54.460
But what I didn't like was the fact that I would develop
link |
00:39:56.580
code and publish it and then effectively tell somebody,
link |
00:39:59.940
here, to run my code, you have to have
link |
00:40:01.420
this proprietary software.
link |
00:40:02.500
Right, and there's also culture around MATLAB as much,
link |
00:40:05.940
because I've talked to a few folks in,
link |
00:40:08.260
MathWorks creates MATLAB?
link |
00:40:09.820
Yeah.
link |
00:40:10.820
I mean, there's just a culture, they try really hard,
link |
00:40:13.900
but it just, there's this corporate IBM style culture
link |
00:40:16.820
that's like, or whatever.
link |
00:40:18.380
I don't want to say negative things about IBM or whatever,
link |
00:40:20.780
but there's a...
link |
00:40:22.260
No, it's really that connection.
link |
00:40:23.740
It's something I'm in the middle of right now
link |
00:40:24.940
is the business of open source.
link |
00:40:27.000
And how do you connect the ethos of cooperative development
link |
00:40:30.820
with the necessity of creating profits, right?
link |
00:40:34.780
And like right now today, I'm still in the middle of that.
link |
00:40:38.060
That was actually the early days of me exploring this question.
link |
00:40:42.260
Cause I was writing SciPy, I mean, as an aside,
link |
00:40:44.660
I also had, so I had three kids at the time.
link |
00:40:46.540
I have six kids now.
link |
00:40:47.860
I got married early, wanted a family.
link |
00:40:50.860
I had three kids and I remember reading,
link |
00:40:52.620
I read Richard Stallman's post and I was a fan of Stallman.
link |
00:40:55.540
I would read his work, I liked the collective ideas
link |
00:40:58.100
he would have.
link |
00:40:58.940
Certainly the ideas on IP law, I read a lot of his stuff.
link |
00:41:01.740
But then he said, okay, well,
link |
00:41:04.820
how do I make money with this?
link |
00:41:05.780
How do I make a living?
link |
00:41:06.700
How do I pay for my kids?
link |
00:41:07.740
All this stuff was in my mind,
link |
00:41:09.300
young graduate student making no money,
link |
00:41:10.640
thinking I got to get a job.
link |
00:41:12.060
And he said, well, I think just be like me
link |
00:41:14.540
and don't have kids, right?
link |
00:41:15.840
That's just, don't, don't.
link |
00:41:17.080
That's his take on it.
link |
00:41:18.540
That was what he said in that moment, right?
link |
00:41:20.860
That's the thing I read and I went,
link |
00:41:22.420
okay, this is a train I can't get on.
link |
00:41:24.960
There has to be a way to preserve the culture
link |
00:41:26.700
of open source and still be able to make sufficient money
link |
00:41:29.180
to feed your kids.
link |
00:41:30.020
Yes, exactly, there's gotta be.
link |
00:41:31.500
Well, so that actually led me to a study of economics.
link |
00:41:34.500
Because at the time I was ignorant and I really was.
link |
00:41:36.680
I'm actually, I'm embarrassed for our educational system
link |
00:41:39.420
that they could let me and I was valedictorian
link |
00:41:41.300
in my high school class and I did super well in college.
link |
00:41:43.720
And like academically I did great, right?
link |
00:41:47.620
But the fact that I could do that and then be clueless
link |
00:41:49.980
about this key part of life,
link |
00:41:52.740
it led me to go, there's a problem.
link |
00:41:54.400
Like I should have learned this in fifth grade.
link |
00:41:56.660
I should have learned this in eighth grade.
link |
00:41:58.380
Like everybody should come out
link |
00:41:59.220
with a basic knowledge of economics.
link |
00:42:01.700
You're an interesting example because you've created tools
link |
00:42:04.040
that change the lives of probably millions of people
link |
00:42:07.640
and the fact that you didn't understand at the time
link |
00:42:10.060
of the creation of those tools, the basic economics
link |
00:42:12.860
of how to build up a giant system, is the problem.
link |
00:42:15.260
Yeah, it's a problem.
link |
00:42:16.100
And so during my PhD at the same time,
link |
00:42:18.260
this is back in 98, 99 at the same time,
link |
00:42:20.720
I was in a library, I was reading books on capitalism,
link |
00:42:23.380
I was reading books on Marxism,
link |
00:42:24.700
I was reading books on what is this thing?
link |
00:42:27.700
What does it mean?
link |
00:42:29.700
And I encountered, basically I encountered a set of writings
link |
00:42:33.140
from people that said they were the inheritors of Adam Smith.
link |
00:42:35.500
I read Adam Smith for the first time, right?
link |
00:42:37.220
Which is The Wealth of Nations
link |
00:42:38.580
and kind of this notion of emergent societies
link |
00:42:42.460
and realized, oh, there's this whole world out here
link |
00:42:45.100
of people and the challenge of economics is also political.
link |
00:42:49.460
Like, cause economics, people, different parties
link |
00:42:53.940
running for office, they want their economic friends.
link |
00:42:58.080
They want their economists to back them up, right?
link |
00:43:00.040
Or to be their magicians, like the magicians
link |
00:43:03.700
in Pharaoh's court, right?
link |
00:43:04.660
The people that are kind of say, hey, this is,
link |
00:43:06.260
you should listen to me because I've got the expert
link |
00:43:08.100
who says this.
link |
00:43:09.420
And so it gets really muddled, right?
link |
00:43:11.540
But I was looking at it as a scientist, going,
link |
00:43:14.020
what is this space?
link |
00:43:14.860
What does this mean?
link |
00:43:15.680
How does Paris get fed?
link |
00:43:16.940
How does, what is money?
link |
00:43:18.420
How does it work?
link |
00:43:19.420
And I found a lot of writings that I really loved.
link |
00:43:21.580
I found some things that I really loved
link |
00:43:22.860
and I learned from that.
link |
00:43:23.980
It was writings from people like Ludwig von Mises.
link |
00:43:26.300
He wrote a paper in 1920 that still should be read
link |
00:43:29.060
more than it is.
link |
00:43:29.900
It was "Economic Calculation
link |
00:43:33.060
in the Socialist Commonwealth."
link |
00:43:34.560
It was basically in response
link |
00:43:35.420
to the Bolshevik Revolution in 1917.
link |
00:43:37.140
And his basic argument was it's not gonna work
link |
00:43:40.180
to not have private property.
link |
00:43:41.780
You're not gonna be able to come up with prices.
link |
00:43:43.420
The bureaucrats aren't gonna be able to determine
link |
00:43:45.200
how to allocate resources without a price system.
link |
00:43:47.620
And a price system emerges from people making trades.
link |
00:43:51.700
And they can only make trades if they have authority
link |
00:43:53.860
over the thing they're trading.
link |
00:43:55.460
And that creates information flow
link |
00:43:58.020
that you just don't have if you try to do it top down.
link |
00:44:01.300
Right.
link |
00:44:02.140
And it's like, huh, that's a really good point.
link |
00:44:04.780
Yeah, prices are a signal that's used.
link |
00:44:06.860
And it's important to have that signal
link |
00:44:09.400
when you're trying to build a community
link |
00:44:11.020
of productive people like you would
link |
00:44:12.580
in the software engineering space.
link |
00:44:13.700
Yeah, the prices are actually
link |
00:44:14.860
an important signaling mechanism.
link |
00:44:17.540
Right, and that money is just a bartering tool.
link |
00:44:20.820
Right, so this is the first time I've encountered
link |
00:44:22.540
any of these concepts, right, and the fact that,
link |
00:44:24.440
oh, this is actually really critical.
link |
00:44:26.600
Like it's so critical to our prosperity
link |
00:44:29.340
and that we're dangerously not learning about this,
link |
00:44:34.100
not teaching our children about this.
link |
00:44:36.140
So you had the three kids,
link |
00:44:37.260
you had to make some hard decisions.
link |
00:44:38.080
I had to make some money, right, had to figure it out.
link |
00:44:39.880
But I didn't really care.
link |
00:44:40.720
I mean, I've never been driven by money, just need it.
link |
00:44:43.260
Yeah, right, need to eat.
link |
00:44:45.200
So how did that resolve itself in terms of SciPy?
link |
00:44:49.100
So I would say it didn't really resolve itself.
link |
00:44:51.320
It sort of started a journey that I'm continuing on.
link |
00:44:53.420
I'm still on, I would say.
link |
00:44:54.740
I don't think it resolved itself.
link |
00:44:55.660
But I will say I went in eyes wide open.
link |
00:44:59.260
Like I knew that there were problems
link |
00:45:00.940
with giving stuff away and creating market externalities
link |
00:45:07.900
that the fact that, yeah, people might use it
link |
00:45:09.780
and I might not get paid for it
link |
00:45:10.820
and I'll have to figure something else out to get paid.
link |
00:45:13.060
Like at least I can say I'm not bitter
link |
00:45:14.940
that a lot of people have used stuff that I've written
link |
00:45:17.220
and I haven't necessarily benefited economically from it.
link |
00:45:20.240
I've heard other people be bitter about that
link |
00:45:22.300
when they write or they talk.
link |
00:45:23.300
Like, oh, I should've got more value out of this.
link |
00:45:24.900
And I'm also, I want to create systems
link |
00:45:27.740
that let people like me who might have these desires
link |
00:45:31.060
to do things, let them benefit.
link |
00:45:32.260
So it actually creates more of the same.
link |
00:45:34.700
Not to turn on your bitterness module,
link |
00:45:36.900
but there's some aspect, I wish there was mechanisms for me
link |
00:45:40.940
to reward whoever created SciPy and NumPy
link |
00:45:43.580
because it brought so much joy to my life.
link |
00:45:45.300
I appreciate that.
link |
00:45:46.140
You know what I mean?
link |
00:45:46.980
The tip jar notion was there.
link |
00:45:48.340
I appreciate that.
link |
00:45:49.180
But there should be a very frictionless mechanism.
link |
00:45:51.940
There should be a frictionless mechanism.
link |
00:45:52.760
I totally agree.
link |
00:45:53.600
I would love to talk about some of the ideas I have
link |
00:45:55.220
because I actually came across,
link |
00:45:56.220
I think I've come up with some interesting notions
link |
00:45:58.200
that could work, but they'll require anything that will work
link |
00:46:01.860
takes time to emerge, right?
link |
00:46:03.740
Like things don't just turn overnight.
link |
00:46:04.940
That's definitely one thing I've also understood
link |
00:46:06.340
and learned is any fixes, that's why it's kind of funny.
link |
00:46:10.120
We often give credit to, oh, this president gets elected
link |
00:46:12.940
and oh, look how great things have gone.
link |
00:46:14.420
And I saw that when I had a transition at Anaconda
link |
00:46:18.340
when a new CEO came in, right?
link |
00:46:19.520
And it's like the success that's happening,
link |
00:46:22.340
there's an inertia there.
link |
00:46:23.460
Yeah, and sometimes the decision you made
link |
00:46:25.740
like 10 years before is the reason for the success.
link |
00:46:28.980
Right, exactly.
link |
00:46:29.820
So we're sort of just running around taking credit
link |
00:46:31.560
for stuff.
link |
00:46:32.400
The credit assignment has like a delay to it
link |
00:46:35.140
that makes the credit assignment basically wrong
link |
00:46:38.320
more than right.
link |
00:46:39.160
Wrong more than right, exactly.
link |
00:46:40.320
And so I'm like, oh, this is, you know,
link |
00:46:42.140
that's the stuff I would read a ton about, you know,
link |
00:46:44.860
early on.
link |
00:46:45.700
So I don't, I feel like I'm with you.
link |
00:46:47.720
Like I want the same thing.
link |
00:46:48.780
I want to be able to, and honestly, not for me personally,
link |
00:46:50.900
I've been happy.
link |
00:46:51.740
I've been happy.
link |
00:46:52.720
I feel like I don't have any, I mean,
link |
00:46:53.980
we've done reasonably okay, but I've had to pursue it.
link |
00:46:56.920
Like that's really what started my trajectory away from academia
link |
00:47:01.380
is reading that stuff led me to say,
link |
00:47:02.940
oh, entrepreneurship matters.
link |
00:47:05.780
So I love software, but we need more entrepreneurs
link |
00:47:09.180
and I wanna understand that better.
link |
00:47:10.360
So once I kind of had that virus infect my brain,
link |
00:47:16.500
even though I was on a trajectory
link |
00:47:17.580
to go to a tenure track position at a university
link |
00:47:20.640
and I was there for six years,
link |
00:47:22.780
I was kind of already out the door when I started.
link |
00:47:26.060
And we can get into that, but.
link |
00:47:27.660
Well, can I just ask you a quick question on,
link |
00:47:30.340
is there some design principles
link |
00:47:32.740
that were in your mind around SciPy?
link |
00:47:34.740
Like, is there some key ideas
link |
00:47:36.460
that were just like sticking to you
link |
00:47:38.060
that these are the fundamental ideas?
link |
00:47:40.300
Yeah, I would say so.
link |
00:47:41.140
I would think it's basically accessibility to scientists,
link |
00:47:43.680
like give them, give scientists and engineers tools
link |
00:47:46.980
so that they don't have to think a lot about programming.
link |
00:47:48.380
So give them really good building blocks,
link |
00:47:50.300
give them functions that they wanna call
link |
00:47:51.860
and sort of just the right length of spelling.
link |
00:47:55.860
There's one tradition in programming where it's like,
link |
00:47:59.500
make very, very long names, right?
link |
00:48:01.880
And you can see it in some programming languages
link |
00:48:03.700
where the names get, take half the screen.
link |
00:48:06.460
And in the Fortran world, names had to be six characters
link |
00:48:11.540
early on, right?
link |
00:48:12.380
And that's way too little.
link |
00:48:14.340
But I was like, I liked to have names
link |
00:48:16.820
that were informative but short.
link |
00:48:18.940
So even though Python, well this is a different conversation,
link |
00:48:22.340
but documentation is doing some work there.
link |
00:48:25.860
So when you look at great scientific libraries
link |
00:48:29.180
and functions, there's a richness of documentation
link |
00:48:32.700
that helps you get into the details.
link |
00:48:34.820
The first glance at a function gives you the intuition
link |
00:48:37.620
of all it needs to do by looking at the headers and so on.
link |
00:48:40.540
But to get the depths of all the complexities involved,
link |
00:48:43.420
all the options involved,
link |
00:48:44.740
documentation does some of the work.
link |
00:48:45.580
Documentation is essential, yeah.
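To make the documentation point concrete: a widely used convention in the scientific Python ecosystem is the NumPy docstring style, with Parameters, Returns, and Examples sections under underlined headers. The sketch below is illustrative only; `moving_average` is a hypothetical function, not part of NumPy or SciPy.

```python
import numpy as np


def moving_average(x, window=3):
    """
    Compute the simple moving average of a 1-D sequence.

    Parameters
    ----------
    x : array_like
        Input samples.
    window : int, optional
        Number of consecutive samples to average. Default is 3.

    Returns
    -------
    ndarray
        Array of length ``len(x) - window + 1`` holding the averages.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], window=2)
    array([1.5, 2.5, 3.5])
    """
    x = np.asarray(x, dtype=float)
    if window < 1 or window > x.size:
        raise ValueError("window must be between 1 and len(x)")
    # Convolving with a flat kernel of weight 1/window averages each run of samples
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")
```

The short name carries the intuition at a glance; the docstring carries the details a caller needs.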
link |
00:48:47.380
So that was actually a, so we thought about several things.
link |
00:48:50.520
One is we wanted plotting.
link |
00:48:51.940
We wanted interactive environment.
link |
00:48:53.580
We wanted good documentation.
link |
00:48:54.860
These are things we knew we wanted.
link |
00:48:56.780
The reality is those took about 10 years to evolve, right?
link |
00:49:00.460
Given the fact that we didn't have a big budget,
link |
00:49:02.060
it was all volunteer labor.
link |
00:49:03.100
It was sort of, when Enthought got created
link |
00:49:06.980
and they started to try to find projects,
link |
00:49:10.060
people would pay for pieces
link |
00:49:11.080
and they were able to fund some of it.
link |
00:49:13.740
Not nearly enough to keep up with what was necessary.
link |
00:49:15.780
And no criticism, just simply the reality.
link |
00:49:18.860
I mean, it's hard to start a business
link |
00:49:21.180
and then do consulting and then also
link |
00:49:23.220
promote an open source project that's still fairly new.
link |
00:49:26.180
SciPy is fairly niche.
link |
00:49:27.780
We stayed connected all while I was a student,
link |
00:49:30.140
sorry, a professor.
link |
00:49:30.980
I went to BYU and started to teach.
link |
00:49:32.340
Electrical engineering, all the applied math courses.
link |
00:49:35.060
I loved teaching signal processing,
link |
00:49:36.980
probability theory, electromagnetism.
link |
00:49:39.180
I was, if you look at Rate My Professors,
link |
00:49:40.940
which my kids loved to do,
link |
00:49:42.500
I wasn't, I got some bad reviews because people.
link |
00:49:46.900
What was the criticism?
link |
00:49:48.580
I would speak at too high a level.
link |
00:49:50.920
Like I definitely had a calibration problem
link |
00:49:52.640
coming out of graduate work
link |
00:49:54.980
where I hate to be condescending to people.
link |
00:49:56.980
Like I really have a ton of respect for people fundamentally.
link |
00:49:59.300
Like my fundamental thing is I respect people.
link |
00:50:02.060
Sometimes that can lead to a,
link |
00:50:03.900
I was thinking they had more knowledge than they did.
link |
00:50:07.640
And so I would just speak at a very high level,
link |
00:50:10.100
assume they got it.
link |
00:50:11.060
But they need to rise to the standard that you set.
link |
00:50:14.340
I mean, that's one of the,
link |
00:50:15.260
some of the greatest teachers do that.
link |
00:50:17.180
And I agree.
link |
00:50:18.020
And that was kind of what was inspiring me.
link |
00:50:19.760
But you also have to,
link |
00:50:22.160
I cannot say I was as articulate
link |
00:50:24.820
as some of the greatest teachers, right?
link |
00:50:26.300
I was, like one classic example,
link |
00:50:28.540
when I first taught at BYU,
link |
00:50:30.420
my very first class, it was overheads,
link |
00:50:31.980
transparencies, overheads.
link |
00:50:34.100
Before projectors were really that common,
link |
00:50:35.940
I taught with transparencies.
link |
00:50:37.100
I'm writing my notes out.
link |
00:50:38.260
I go in, room's half dark.
link |
00:50:40.540
I'm just blazing through these transparencies.
link |
00:50:42.780
Here it is, here it is, here it is.
link |
00:50:44.900
And I did give a quiz after two weeks.
link |
00:50:47.480
No one knew anything.
link |
00:50:48.900
Nothing I had taught had gotten anywhere.
link |
00:50:50.940
And I realized, okay, I'm not, this is not working.
link |
00:50:54.140
So I put away the transparencies
link |
00:50:56.380
and I turned around and just started using the chalkboard.
link |
00:50:58.860
And what it did is it slowed me down, right?
link |
00:51:00.980
The chalkboard just slowed me down
link |
00:51:02.260
and gave people time to process and to think.
link |
00:51:04.440
And then that made me focus.
link |
00:51:06.080
My writing wasn't great on the chalkboard,
link |
00:51:07.900
but I really love that part of like the teaching.
link |
00:51:10.520
So that entered SciPy's world in terms of,
link |
00:51:12.500
we always understood that there's a didactic aspect
link |
00:51:14.860
of SciPy, kind of how do you take the knowledge
link |
00:51:17.740
and then produce it?
link |
00:51:18.640
The challenge we had was the scope.
link |
00:51:21.020
Like ultimately SciPy was everything, right?
link |
00:51:23.420
And so 2001, when it first came out,
link |
00:51:25.600
people were starting to use it.
link |
00:51:26.800
No, this is cool, this is a tool we actually use.
link |
00:51:29.580
At the same time, 2001 timeframe,
link |
00:51:31.400
there was a little bit of like the Hubble Space Telescope,
link |
00:51:33.940
the folks at Hubble that started to say,
link |
00:51:35.400
hey, Python, we're gonna use Python
link |
00:51:36.620
for processing images from Hubble.
link |
00:51:38.720
And so Perry Greenfield was a good friend
link |
00:51:40.820
and was running that program.
link |
00:51:42.420
And he had called me before I left for BYU and said,
link |
00:51:45.060
you know, we wanna do this,
link |
00:51:47.020
but Numeric actually has some challenges in terms of,
link |
00:51:50.020
you know, it's not, the array doesn't have enough types.
link |
00:51:52.700
We need more operations.
link |
00:51:54.280
You know, broadcasting needs to be a little more settled.
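Broadcasting is the rule that lets arrays of different shapes combine element-wise: shapes are compared from the trailing dimension, and a dimension of size 1 is stretched to match. A minimal sketch in today's NumPy (assuming NumPy 1.20 or later for `np.broadcast_shapes`), not a reproduction of the Numeric-era behavior discussed here:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,), treated as (1, 4)

# Size-1 dimensions are stretched, so the result has shape (3, 4)
table = col * 10 + row
print(np.broadcast_shapes(col.shape, row.shape))  # (3, 4)
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]
```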
link |
00:51:56.660
They wanted record arrays.
link |
00:51:57.960
They wanted, you know, record arrays are like a data frame,
link |
00:52:00.600
but a little bit different,
link |
00:52:02.220
but they wanted more structured data.
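In current NumPy, this kind of structured data is expressed with a compound dtype (named, typed fields per record), and `np.recarray` adds attribute-style access to those fields. A minimal sketch; the field names and the approximate catalog values for two stars are illustrative assumptions, not Hubble data:

```python
import numpy as np

# Each record has named, typed fields (illustrative, approximate values)
star = np.dtype([("name", "U10"), ("ra_deg", "f8"), ("dec_deg", "f8"), ("mag", "f4")])
stars = np.array(
    [("Vega", 279.23, 38.78, 0.03), ("Sirius", 101.29, -16.72, -1.46)],
    dtype=star,
)

bright = stars[stars["mag"] < 0]   # fields act like columns; a mask selects rows
print(bright["name"])              # ['Sirius']

rec = stars.view(np.recarray)      # record-array view: fields become attributes
print(rec.dec_deg)
```

Unlike a data frame, every record here has a fixed binary layout, which is what makes it cheap to map onto files or C structs.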
link |
00:52:03.820
So he had called me even early on then,
link |
00:52:06.020
and he said, you know, what,
link |
00:52:06.860
would you wanna work on something to make this work?
link |
00:52:08.300
And I said, yeah, I'm interested, but I'm going here,
link |
00:52:10.140
and I, you know, we'll see if I have time.
link |
00:52:12.100
So in the meantime, while I was teaching
link |
00:52:13.340
and SciPy was emerging, and I had a student,
link |
00:52:15.660
I was constantly, while I was teaching,
link |
00:52:16.840
trying to figure a way to fund this stuff.
link |
00:52:18.840
So I had a graduate student, my only graduate student,
link |
00:52:21.660
a Chinese fellow, Liu Hongze is his name, great guy.
link |
00:52:26.260
He wrote a bunch of stuff for iterative linear algebra,
link |
00:52:29.900
like got into writing some of the iterative
link |
00:52:31.380
linear algebra tools that are currently there in SciPy,
link |
00:52:34.340
and they've gotten better since,
link |
00:52:36.040
but this is in 2005, kept working on SciPy,
link |
00:52:39.260
but Perry had started working on a replacement
link |
00:52:43.060
to Numeric called NumArray.
link |
00:52:45.300
And in 2004, a package called ND Image,
link |
00:52:49.020
it was an image processing library
link |
00:52:50.740
that was written for NumArray,
link |
00:52:53.220
and it had in it a morphology tool.
link |
00:52:55.580
I don't know if you know what morphology is.
link |
00:52:56.740
It's opening, dilation, closing, you know,
link |
00:52:58.540
there was sort of this, as a medical imaging student,
link |
00:53:01.660
I knew what it was,
link |
00:53:02.500
because it was used in segmentation a lot.
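These morphology operations live on today in `scipy.ndimage`. A minimal sketch, assuming a current SciPy and a 3x3 square structuring element chosen for illustration: opening (erosion then dilation) removes a speck smaller than the element, and closing (dilation then erosion) fills a small hole, which is why both show up in segmentation cleanup.

```python
import numpy as np
from scipy import ndimage

structure = np.ones((3, 3), dtype=bool)  # 3x3 square structuring element

# A 5x5 square object plus one isolated "noise" pixel
noisy = np.zeros((9, 9), dtype=bool)
noisy[1:6, 1:6] = True
noisy[7, 7] = True

# Opening = erosion followed by dilation: removes specks smaller than the element
opened = ndimage.binary_opening(noisy, structure=structure)

# The same square with a one-pixel hole in the middle
holey = np.zeros((9, 9), dtype=bool)
holey[1:6, 1:6] = True
holey[3, 3] = False

# Closing = dilation followed by erosion: fills holes smaller than the element
closed = ndimage.binary_closing(holey, structure=structure)

print(opened[7, 7], opened.sum())   # False 25
print(closed[3, 3], closed.sum())   # True 25
```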
link |
00:53:04.420
And in fact, I'd wanted to do something like that
link |
00:53:06.460
in Python, in SciPy, but just had never gotten around to it.
link |
00:53:10.220
So when it came out, it worked only on NumArray,
link |
00:53:14.180
and SciPy needed Numeric,
link |
00:53:16.420
and so we effectively had the beginning of this split.
link |
00:53:20.040
And Numeric and NumArray didn't share data,
link |
00:53:22.500
they were just two, so you could have a gigabyte
link |
00:53:24.420
of NumArray data and a gigabyte of Numeric data,
link |
00:53:26.540
and they wouldn't share it.
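The difference between sharing memory and copying it is easy to see in modern NumPy, which can wrap memory owned by another object through Python's buffer protocol. This sketch is illustrative only: a standard-library `array.array` stands in for "another library's" data, and it does not reproduce how Numeric or NumArray actually stored arrays.

```python
import array

import numpy as np

# A plain Python buffer holding four doubles, standing in for foreign memory
store = array.array("d", [0.0] * 4)

shared = np.frombuffer(store, dtype=np.float64)  # wraps the same memory, no copy
private = np.array(shared)                       # np.array copies by default

store[0] = 3.5                                   # write through the original owner
print(shared[0], private[0])                     # 3.5 0.0
print(np.shares_memory(shared, private))         # False
```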
link |
00:53:27.380
And so you had these,
link |
00:53:28.500
then you had these scientific libraries written on top.
link |
00:53:31.300
I got really bugged by that.
link |
00:53:32.940
I got really like, oh man, this is not good,
link |
00:53:35.060
we're not cooperating now,
link |
00:53:36.300
we're sort of redoing each other's work,
link |
00:53:37.980
and we're just this young community.
link |
00:53:40.380
So that's what led me, even though I knew it was risky,
link |
00:53:43.940
because my, you know, I was on a tenure track position,
link |
00:53:47.140
In 2004 I got reviewed.
link |
00:53:48.540
They said, hey, things are going okay,
link |
00:53:49.540
you're doing well, papers are coming out,
link |
00:53:51.540
but you're kind of spending a lot of time
link |
00:53:52.460
doing this open source stuff, maybe do a little less of that,
link |
00:53:54.780
and a little more of the paper writing and grant writing,
link |
00:53:57.260
which was naive, but it was definitely the thinking.
link |
00:54:00.860
It still goes on.
link |
00:54:01.700
Still goes on.
link |
00:54:03.060
You're basically creating a thing
link |
00:54:05.120
which enables science in the 21st century.
link |
00:54:08.300
Right.
link |
00:54:09.340
Maybe don't emphasize that so much in your three-year tenure review.
link |
00:54:11.980
Right.
link |
00:54:13.460
It illustrates some of the challenges.
link |
00:54:14.860
Yes.
link |
00:54:15.700
It does, and it's, people mean well.
link |
00:54:18.220
Yes.
link |
00:54:19.060
Like, but we've gotten broken in a bunch of ways.
link |
00:54:22.340
Certain things, programming,
link |
00:54:23.660
understanding the role of software engineering,
link |
00:54:25.500
programming in society is a little bit lacking.
link |
00:54:27.860
Exactly.
link |
00:54:28.700
Now, I was in an electrical engineering position.
link |
00:54:30.020
Right.
link |
00:54:30.860
That's even worse there.
link |
00:54:33.140
Yeah, it was very, they were very focused,
link |
00:54:34.700
and so, you know, good people, and I had a great time,
link |
00:54:37.300
I loved my time, I loved my teaching,
link |
00:54:38.940
I loved all the things I did there.
link |
00:54:40.460
The problem was, the split was happening
link |
00:54:42.540
in this community that I loved, right?
link |
00:54:43.940
I saw people, and I went, oh my gosh,
link |
00:54:45.460
this is gonna be, this is not great,
link |
00:54:47.780
and so I happened, you know, fate,
link |
00:54:50.020
I had a class I had signed up for,
link |
00:54:52.620
it's a, I was trying to build an MRI system,
link |
00:54:54.860
so I had a kind of a radio, instead of a radio,
link |
00:54:58.300
a digital radio class, it was a digital MRI class.
link |
00:55:01.820
And I had people sign up, two people signed up,
link |
00:55:04.020
then they dropped, and so I had nobody in this class.
link |
00:55:06.660
So, and I didn't have any other courses to teach,
link |
00:55:08.820
and I thought, oh, I've got some time,
link |
00:55:10.940
and I'll just write, I'll just write a replace,
link |
00:55:13.100
a merger of Numeric and NumArray.
link |
00:55:14.820
Like, I'll basically take the Numeric code base
link |
00:55:16.980
add the features NumArray was adding,
link |
00:55:19.220
and then kind of come up with a single array library
link |
00:55:21.180
that everybody can use.
link |
00:55:22.460
So that's where NumPy came from,
link |
00:55:24.140
was my thinking, hey, I can do this,
link |
00:55:26.500
and who else is going to?
link |
00:55:27.860
Because at that point, I'd been around the community
link |
00:55:29.260
long enough, and I'd written enough C code,
link |
00:55:30.820
I knew, I knew the structures, and I,
link |
00:55:33.260
in fact, my first contribution to Numeric
link |
00:55:35.060
had been writing the C API documentation
link |
00:55:38.580
that went in the first documentation for NumPy,
link |
00:55:41.080
for Numeric, sorry. This is Paul Dubois,
link |
00:55:43.020
David Ascher, Konrad Hinsen, and myself.
link |
00:55:45.100
I got credit because I wrote this chapter,
link |
00:55:47.580
which is all the C API of Numeric, all the C stuff.
link |
00:55:51.260
So I said, I'm probably the one to do it,
link |
00:55:53.380
and nobody else is gonna do this.
link |
00:55:54.760
So it was sort of, out of a sense of duty and passion,
link |
00:55:58.340
knowing that, eh, I don't think my academic,
link |
00:56:01.460
I don't think the department here is gonna appreciate this,
link |
00:56:03.940
but it's the right thing to do.
link |
00:56:06.020
It was like.
link |
00:56:06.860
Can we just linger on that moment?
link |
00:56:08.660
Yeah, yeah.
link |
00:56:09.500
Because the importance of the way you thought
link |
00:56:11.740
and the action you took, I feel is understated
link |
00:56:16.360
and is rare and I would love to see so much more of it
link |
00:56:19.900
because what happens as the tools become more popular,
link |
00:56:24.820
there's a split that happens.
link |
00:56:27.180
And it's a truly heroic and impactful action
link |
00:56:30.940
in that early split,
link |
00:56:33.580
to step up and it's like great leaders throughout history,
link |
00:56:37.820
like, what is it, Braveheart,
link |
00:56:39.660
get on a horse and rally the troops
link |
00:56:42.500
because I think that can have, make a big difference.
link |
00:56:46.060
We have TensorFlow versus PyTorch
link |
00:56:48.180
in the machine learning community.
link |
00:56:49.100
We have the same problem today.
link |
00:56:50.380
Yeah, I wonder.
link |
00:56:51.780
It's actually bigger.
link |
00:56:52.620
I wonder if it's possible in the early days
link |
00:56:56.620
to rally the troops.
link |
00:56:58.220
It is possible, especially in the early days.
link |
00:57:00.020
The longer it goes, the harder, right?
link |
00:57:01.620
The more energy in the factions, the harder.
link |
00:57:03.940
But in the early days, it is possible
link |
00:57:05.700
and it's extremely helpful
link |
00:57:07.660
and there's a willingness there,
link |
00:57:09.100
but the challenge is there's just not a willingness
link |
00:57:11.740
to fund it.
link |
00:57:12.980
There's not a willingness to, you know,
link |
00:57:14.880
like I was literally walking into a field
link |
00:57:17.540
saying I'm going to do this
link |
00:57:18.620
and here I am, like, you know,
link |
00:57:20.140
I have five kids at home now.
link |
00:57:23.740
Pressure builds.
link |
00:57:24.820
Sometimes my wife hears these stories
link |
00:57:26.220
and she's like, you did what?
link |
00:57:29.020
I thought we were going to,
link |
00:57:29.860
I thought you were actually on a path
link |
00:57:31.460
to make sure we had resources and money, but,
link |
00:57:34.100
but again, there's a, there's an aspect,
link |
00:57:36.420
I'm a very hopeful person.
link |
00:57:37.860
I'm an optimistic person by nature.
link |
00:57:39.680
I love people.
link |
00:57:41.120
I learned that about myself later on.
link |
00:57:43.140
And part of my, my religious beliefs
link |
00:57:47.220
actually lead to that.
link |
00:57:48.380
And it's why I hold them dear
link |
00:57:49.880
because it's actually how I feel about,
link |
00:57:51.300
that's what leads me to these attitudes,
link |
00:57:53.420
sort of this hopefulness and this sense of,
link |
00:57:55.900
yeah, it may not work out for me financially
link |
00:57:58.600
or maybe, but that's not the ultimate game.
link |
00:58:00.600
Like that's a thing, but it's not,
link |
00:58:02.940
that's not the scorecard for me.
link |
00:58:05.540
And so I just wanted to be helpful
link |
00:58:07.060
and I knew, partly because of these SciPy conferences,
link |
00:58:09.280
because the maintenance conversations,
link |
00:58:10.860
I knew there was a lot of need for this, right?
link |
00:58:13.300
And so I had this, it wasn't like I was alone
link |
00:58:15.460
in terms of no feedback.
link |
00:58:16.460
I had these people who knew, but it was crazy.
link |
00:58:19.440
Like people who at the time said,
link |
00:58:20.700
yeah, we didn't think you'd be able to do it.
link |
00:58:22.340
We thought it was crazy.
link |
00:58:23.160
And also instructive, like practically speaking,
link |
00:58:26.720
that you had a cool feature
link |
00:58:28.700
that you were chasing the morphology, like the.
link |
00:58:30.820
Yes.
link |
00:58:31.660
Like it's not just like.
link |
00:58:32.500
There's an end result.
link |
00:58:33.460
It's not some visionary thing.
link |
00:58:35.140
I'm going to unite the community.
link |
00:58:36.820
You were like. Correct.
link |
00:58:38.060
You were actually practically,
link |
00:58:39.520
this is what one person actually could do
link |
00:58:42.100
and actually build.
link |
00:58:43.220
Cause that is important.
link |
00:58:44.220
Cause you can get over your skis.
link |
00:58:47.460
You can definitely get over your skis.
link |
00:58:49.060
And I had, in fact, this almost got me over my skis, right?
link |
00:58:52.140
I would say, well, in retrospect, I hate looking back.
link |
00:58:56.140
I can tell you all the flaws with NumPy, right?
link |
00:58:58.540
When I go into it, there's lots of stuff that I'm like,
link |
00:59:00.700
oh man, that's embarrassing.
link |
00:59:01.660
That was wrong.
link |
00:59:02.500
I wish I had somebody stop me with a wet fish there.
link |
00:59:04.300
Like I needed, like what I'd wished I'd had
link |
00:59:07.020
was somebody with more experience, certainly in library
link |
00:59:10.460
writing and array libraries.
link |
00:59:11.540
There's like, I wish I had me.
link |
00:59:12.780
I could go back in time and go do this, do that.
link |
00:59:14.520
There's a more important thing.
link |
00:59:15.480
Cause there's things we did that are still there
link |
00:59:18.100
that are problematic, that created challenges for later.
link |
00:59:20.940
And I didn't know it at the time.
link |
00:59:22.460
Didn't understand how important that was.
link |
00:59:24.420
And in many cases, didn't know what to do.
link |
00:59:26.460
Like there were pieces of the design of NumPy.
link |
00:59:29.060
I didn't know what to do until five years ago.
link |
00:59:31.340
Now I know what they should have been.
link |
00:59:32.860
But I didn't know at the time and nobody,
link |
00:59:33.960
and I couldn't get the help.
link |
00:59:35.380
Anyway, so I wrote it.
link |
00:59:36.660
It took about, it took four months to write
link |
00:59:38.780
the first version, then about 14 months to make it usable.
link |
00:59:43.360
But it was, it wasn't, it was that first four months
link |
00:59:45.860
of intense writing, coding, getting something out the door
link |
00:59:49.320
that worked that was, it was, it was definitely challenging.
link |
00:59:52.380
And then the big thing I did was create a new type object
link |
00:59:54.900
called dtype.
link |
00:59:56.100
That was probably the contribution.
link |
00:59:58.780
And then the fact that I added broad, not just broadcasting,
link |
01:00:01.900
but advanced indexing so that you could do masked indexing
link |
01:00:06.500
and indirect indexing instead of just slicing.
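For readers new to NumPy, here is a minimal sketch of the features mentioned here: a dtype object attached to every array, basic slicing, masked (boolean) indexing, indirect (integer-array) indexing, and broadcasting. The array contents are arbitrary and the calls are standard NumPy; this illustrates the user-facing behavior only, not how NumPy implements it internally.

```python
import numpy as np

# Each array carries a dtype object describing its elements.
a = np.arange(12, dtype=np.int32).reshape(3, 4)   # [[0..3], [4..7], [8..11]]
element_type = a.dtype                            # dtype('int32'), an np.dtype instance

# Basic slicing: rows 1 onward, every other column. Returns a view sharing a's memory.
view = a[1:, ::2]                                 # [[4, 6], [8, 10]]

# Masked (boolean) indexing: keep elements where the mask is True. Returns a copy.
mask = a % 3 == 0
multiples_of_three = a[mask]                      # [0, 3, 6, 9]

# Indirect (integer-array) indexing: gather rows 2 and 0, in that order. Returns a copy.
reordered_rows = a[[2, 0]]                        # [[8, 9, 10, 11], [0, 1, 2, 3]]

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) result.
col = np.array([[10], [20], [30]])
row = np.array([1, 2, 3, 4])
table = col + row                                 # table[i, j] == col[i, 0] + row[j]
```

The distinction that matters in practice is that slicing returns a view, while masked and indirect indexing return new arrays.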
link |
01:00:09.940
So for people who don't know, and maybe you can elaborate,
link |
01:00:13.020
NumPy, I guess the vision in the narrowest sense
link |
01:00:17.660
is to have this object that represents
link |
01:00:21.460
n dimensional arrays.
link |
01:00:23.180
And like at any level of abstraction you want,
link |
01:00:26.300
but basically it could be a black box
link |
01:00:28.220
that you can investigate in ways that you would naturally
link |
01:00:30.940
want to investigate such objects.
link |
01:00:33.340
Yes, exactly.
link |
01:00:34.180
So you could do math on it easily.
link |
01:00:35.740
Math on it easily, yeah.
link |
01:00:37.180
So it had an associated library of math operations
link |
01:00:39.860
and effectively SciPy became an even larger set
link |
01:00:43.220
of math operations.
link |
01:00:44.940
So the key for me was I was going to write NumPy
link |
01:00:48.020
and then move SciPy to depend on NumPy.
link |
01:00:50.340
In fact, early on, one of the initial proposals
link |
01:00:52.980
was that we would just write SciPy
link |
01:00:54.540
and it would have the Numeric object inside of it.
link |
01:00:56.660
And it'd be SciPy.array or something.
link |
01:00:59.780
That turned out to be problematic because Numeric
link |
01:01:02.180
already had a little mini library of linear algebra
link |
01:01:04.820
and some functions, and it had enough momentum,
link |
01:01:08.020
enough users that nobody wanted to,
link |
01:01:10.340
they wanted backward compatibility.
link |
01:01:12.060
One of the big challenges of NumPy
link |
01:01:13.740
was I had to be backward compatible
link |
01:01:14.980
with both Numeric and NumArray
link |
01:01:16.980
in order to allow both of those communities to come together.
link |
01:01:19.300
There was a ton of work in creating
link |
01:01:21.140
that backward compatibility
link |
01:01:22.580
that also created echoes in today's object.
link |
01:01:25.420
Like some of the complexity in today's object
link |
01:01:27.180
is actually from that goal of backward compatibility
link |
01:01:30.060
to these other communities,
link |
01:01:31.380
which if you didn't have that, you'd do something different,
link |
01:01:34.620
which is instructive because a lot of things are there.
link |
01:01:37.740
You think, what is that there for?
link |
01:01:38.940
It's like, well, it's a remnant.
link |
01:01:41.380
It's an artifact of its historical existence.
link |
01:01:45.220
By the way, I love the empathy
link |
01:01:46.780
and the lack of ego behind that
link |
01:01:48.460
because I feel, you see that in the split
link |
01:01:51.420
in the JavaScript framework, for example,
link |
01:01:53.340
the arbitrary branching.
link |
01:01:54.860
Right.
link |
01:01:56.980
I think in order to unite people,
link |
01:01:59.020
you have to kind of put your ego aside
link |
01:02:00.620
and truly listen to others.
link |
01:02:02.260
You do.
link |
01:02:03.100
What do you love about NumArray?
link |
01:02:04.820
What do you love about Numeric?
link |
01:02:06.020
Like actually get a sense,
link |
01:02:07.460
we were talking about languages earlier,
link |
01:02:08.860
sort of empathize with the culture,
link |
01:02:11.100
the people that love something about this particular API,
link |
01:02:14.660
some of the naming style
link |
01:02:18.100
or the actual usage patterns
link |
01:02:21.220
and truly understand them
link |
01:02:22.820
and so that you can create that same draw
link |
01:02:26.780
in the united thing. I completely agree.
link |
01:02:28.620
I completely agree.
link |
01:02:29.460
And you have to also have enough passion
link |
01:02:31.780
that you'll do it.
link |
01:02:32.620
It can't be just like a perfunctory,
link |
01:02:34.660
oh yes, I'll listen to you
link |
01:02:36.500
and then I'm not really that excited about it.
link |
01:02:38.380
So it really is an aspect,
link |
01:02:39.620
it's a philosophical, like there's a philia,
link |
01:02:42.260
there's a love of esteeming of others.
link |
01:02:44.260
It's actually at the heart of what,
link |
01:02:47.060
it's sort of a life philosophy for me, right?
link |
01:02:49.220
That I'm constantly pursuing and that helped,
link |
01:02:51.540
absolutely helped.
link |
01:02:52.660
Makes me wonder in a philosophical,
link |
01:02:54.260
like looking at human civilization as one object,
link |
01:02:57.460
it makes me wonder how we can copy and paste Travises
link |
01:02:59.980
in this book.
link |
01:03:00.820
Well, some aspects, maybe.
link |
01:03:03.300
Some aspects, right, right, exactly.
link |
01:03:05.220
Well, it's a good question.
link |
01:03:07.300
How do we teach this?
link |
01:03:08.140
How do we encourage it?
link |
01:03:09.300
How do we lift it?
link |
01:03:10.140
Because so much of the software world,
link |
01:03:12.700
it's giant communities, right?
link |
01:03:15.140
But it seems like so much is moved by,
link |
01:03:16.820
like little individuals.
link |
01:03:18.180
You talk about like Linus Torvalds.
link |
01:03:21.020
It's like, could you have not,
link |
01:03:23.380
could you have had Linux without him?
link |
01:03:25.980
Could you?
link |
01:03:26.820
Yeah, Guido and Python.
link |
01:03:28.140
Guido and Python.
link |
01:03:28.980
Guido and Python.
link |
01:03:29.820
Well, the SciPy community particularly,
link |
01:03:30.980
it's like I said, we wanted to build this big thing,
link |
01:03:32.820
but ultimately we didn't.
link |
01:03:33.780
What happened is we had mavericks and champions
link |
01:03:36.060
like John Hunter who created Matplotlib.
link |
01:03:37.780
We had Fernando Perez who created IPython.
link |
01:03:39.940
And so we sort of inspired each other,
link |
01:03:42.260
but then it kind of, there's sort of a culture
link |
01:03:43.980
of this selfless giving, the stewardship mentality,
link |
01:03:47.820
as opposed to ownership mentality,
link |
01:03:49.140
but stewardship and community focused,
link |
01:03:54.040
community focused, but intentional work.
link |
01:03:56.620
Like not waiting for everybody else to do the work,
link |
01:03:58.900
but you're doing it for the benefit of others
link |
01:04:00.700
and not worried about what you're gonna get.
link |
01:04:04.020
You're not worried about the credit.
link |
01:04:04.860
You're not worried about what you're gonna get.
link |
01:04:05.860
You're worried about, I later realized
link |
01:04:07.580
that I have to worry a little about credit,
link |
01:04:09.000
not because I want the credit,
link |
01:04:10.300
because I want people to understand
link |
01:04:11.380
what led to the results.
link |
01:04:13.020
Like, I don't, it's not about me.
link |
01:04:15.060
It's I want to understand this is what led to the result.
link |
01:04:17.540
So let's like, I think doing,
link |
01:04:18.980
and this is what had no impact on the result.
link |
01:04:21.100
Like let's promote, just like you said,
link |
01:04:23.420
I want to promote the attributes
link |
01:04:25.100
that help make us better off.
link |
01:04:26.520
How do we make more of Wes McKinney?
link |
01:04:28.820
Like Wes McKinney was critical to the success of Python
link |
01:04:31.620
because of his creation of pandas,
link |
01:04:33.420
which is the roots of that were all the way back
link |
01:04:36.420
in Numeric and NumArray and NumPy,
link |
01:04:40.260
where NumPy created an array of records.
link |
01:04:43.180
Wes started to use that almost like a data frame,
link |
01:04:45.980
except it's an array of records.
link |
01:04:47.840
And data frame, the challenge is,
link |
01:04:49.780
okay, if you want to augment it with another column,
link |
01:04:52.240
you have to insert, you have to do all this memory movement
link |
01:04:54.700
to insert a column.
link |
01:04:55.660
Whereas data frames became,
link |
01:04:57.180
oh, I'm going to have a loose collection of arrays.
link |
01:05:00.460
So it's a record of arrays that is a part of a data frame.
link |
01:05:03.980
And we thought about that back in the NumArray days,
link |
01:05:05.560
but Wes ended up doing the work to build it.
link |
01:05:08.940
And then also the operations that were relevant
link |
01:05:11.300
for data processing.
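A rough sketch of the layout difference described here, assuming a NumPy structured array stands in for the "array of records" and a plain dict of 1-D arrays stands in for the "record of arrays". Real pandas DataFrames manage their columns through their own internal structures, so this is a simplification. The helper `add_field_to_records` is hypothetical, written only to make the copying explicit.

```python
import numpy as np

# "Array of records": one contiguous buffer; each element packs every field.
records = np.array(
    [(1, 2.5), (2, 3.5), (3, 4.5)],
    dtype=[("id", np.int64), ("price", np.float64)],
)

def add_field_to_records(arr, name, values, dtype):
    """Hypothetical helper: return a copy of `arr` with one extra field.

    Adding a field changes the size and layout of every element, so a new
    buffer is allocated and each existing field is copied into it.
    """
    fields = [(n, arr.dtype[n]) for n in arr.dtype.names] + [(name, dtype)]
    out = np.empty(arr.shape, dtype=np.dtype(fields))
    for n in arr.dtype.names:
        out[n] = arr[n]
    out[name] = values
    return out

with_qty = add_field_to_records(records, "qty", [10, 20, 30], np.int64)

# "Record of arrays": independent 1-D columns held in a mapping.
columns = {"id": records["id"].copy(), "price": records["price"].copy()}
# Adding a column only inserts a new entry; the existing columns are not moved.
columns["qty"] = np.array([10, 20, 30])
```

NumPy also ships `numpy.lib.recfunctions.append_fields` for appending fields to structured arrays; the explicit version above simply makes the per-element copy visible.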
link |
01:05:12.620
What I noticed is just that each of these little things
link |
01:05:15.220
creates just another tick, another up.
link |
01:05:17.380
So NumPy ultimately took a little while,
link |
01:05:19.940
about six months in, people started to join me,
link |
01:05:22.700
Francesc Alted, Robert Kern, Charles Harris.
link |
01:05:27.300
And these people are many of the unsung heroes, I would say.
link |
01:05:30.300
People who are, you know,
link |
01:05:31.980
they sometimes don't get the credit they deserve
link |
01:05:34.100
because they were critical both to support,
link |
01:05:36.540
like, you know, it's hard and you want,
link |
01:05:38.260
you need some support, people need support.
link |
01:05:40.340
And I needed just encouragement.
link |
01:05:41.580
And they helped and encouraged me by contributing.
link |
01:05:43.860
And once, the big thing for me was when John Hunter,
link |
01:05:48.240
he had previously done kind of a simple thing
link |
01:05:50.180
called numerix to kind of, you know, between Numeric
link |
01:05:52.820
and NumArray, he had a little high-level tool
link |
01:05:55.100
that would just select each one for Matplotlib.
link |
01:05:57.900
In 2006, he finally said,
link |
01:06:00.420
we're gonna just make NumPy the dependency of Matplotlib.
link |
01:06:03.220
As soon as he did that,
link |
01:06:04.420
and I remember specifically when he did that,
link |
01:06:06.100
I said, okay, we've done it.
link |
01:06:07.900
Like, that was when I knew we were going to see success.
link |
01:06:11.260
Before then it was still unsure,
link |
01:06:13.620
but that kind of started a roller coaster.
link |
01:06:15.060
And then 2006 to 2009.
link |
01:06:17.900
And then I've been floored by what it's done.
link |
01:06:20.940
Like, I knew it would help.
link |
01:06:22.900
I had no idea how much it would help.
link |
01:06:25.380
Right, so.
link |
01:06:26.300
And it has to do with, again, the language thing.
link |
01:06:28.660
It just, people started to think in terms of NumPy.
link |
01:06:31.940
Yes.
link |
01:06:32.820
And that opened up a whole new way of thinking.
link |
01:06:36.460
And part of the story that you kind of mentioned,
link |
01:06:39.220
but maybe you can elaborate,
link |
01:06:42.980
is it seems like at some point in the story,
link |
01:06:46.320
Python took over science and data science.
link |
01:06:50.800
Yes.
link |
01:06:51.640
And bigger than that,
link |
01:06:54.800
the scientific community started to think like programmers
link |
01:07:00.160
or started to utilize the tools of computers to do,
link |
01:07:04.280
like at a scale that wasn't done with Fortran.
link |
01:07:06.640
Like at this gigantic scale,
link |
01:07:09.320
they started to open in their heart.
link |
01:07:10.760
And then Python was the thing.
link |
01:07:12.040
I mean, there's a few other competitors, I guess,
link |
01:07:14.280
but Python, I think, really, really took over.
link |
01:07:16.960
I agree.
link |
01:07:17.800
There's a lot of stories here
link |
01:07:18.620
that are kind of during this journey,
link |
01:07:19.720
because this is sort of the start of this journey in 2005, 2006.
link |
01:07:23.240
So my tenure committee, I applied for tenure in 2006, 2007.
link |
01:07:28.180
It came back, I split the department.
link |
01:07:29.780
I was very polarizing.
link |
01:07:31.300
I had some huge fans
link |
01:07:32.560
and then some people that said no way, right?
link |
01:07:34.380
So it was very, I was a polarizing figure in the department.
link |
01:07:36.840
It went all the way up to the university president.
link |
01:07:39.800
Ultimately, my department chair had the sway
link |
01:07:42.760
and they didn't say no.
link |
01:07:43.760
They said, come back in two years and do it again.
link |
01:07:46.360
And I went, eh, at that point, I was like,
link |
01:07:49.680
I mean, I had this interest in entrepreneurship,
link |
01:07:52.840
this interest in not the academic circles,
link |
01:07:56.400
not the, like, how do we make industry work?
link |
01:07:59.680
So I do have to give credit to that exploration of economics
link |
01:08:03.060
because that led me, oh, I had a lot of opinions.
link |
01:08:06.540
I was actually very libertarian at the time.
link |
01:08:09.520
And I still have some libertarian trends,
link |
01:08:11.840
but I'm more of a, I'm more of a collectivist libertarian.
link |
01:08:15.880
So you value broadly, philosophically freedom.
link |
01:08:18.720
I value broadly, philosophically freedom,
link |
01:08:20.360
but I also understand the power of communities,
link |
01:08:23.440
like the power of collective behavior.
link |
01:08:26.200
And so what's that balance, right?
link |
01:08:27.840
That makes sense.
link |
01:08:29.800
So by the time I was just,
link |
01:08:31.520
I gotta go out and explore this entrepreneur world.
link |
01:08:33.380
So I left academia.
link |
01:08:34.220
I said, no thanks, called my friend, Eric, here,
link |
01:08:37.820
who had, his company was going.
link |
01:08:39.560
I said, hey, could I join you and start this trend?
link |
01:08:43.120
And he, at that time they were using SciPy a lot.
link |
01:08:45.920
They were trying to get clients.
link |
01:08:47.120
And so I came down to Texas.
link |
01:08:48.760
And in Texas is where I sort of,
link |
01:08:51.160
it's my entrepreneur world, right?
link |
01:08:53.440
I left academia and went to entrepreneur world in 2007.
link |
01:08:57.360
So I moved here in 2007, kind of took a leap,
link |
01:08:59.920
knew nothing really about business,
link |
01:09:01.600
knew nothing about a lot of stuff there.
link |
01:09:05.100
There's, you know, for a long time,
link |
01:09:06.980
I've kept some connections to a lot of academics
link |
01:09:08.980
because I still value it.
link |
01:09:10.080
I still love the scientific tradition.
link |
01:09:12.520
I still value the essence and the soul and the heart
link |
01:09:15.240
of what is possible.
link |
01:09:17.320
I don't like a lot of the administration
link |
01:09:21.380
and the kind of, we can go into detail about why
link |
01:09:24.160
and where and how this happens,
link |
01:09:25.320
what are some of the challenges.
link |
01:09:26.520
I don't know, but I'm with you.
link |
01:09:28.480
So I'm still affiliated with MIT.
link |
01:09:31.840
I still love MIT because there's magic there.
link |
01:09:35.600
There's people I talk to, like researchers, faculty,
link |
01:09:40.320
in those conversations and the whiteboard
link |
01:09:43.120
and just the conversation, that's magic there.
link |
01:09:46.220
All the other stuff, the administration,
link |
01:09:48.120
all that kind of stuff seems to,
link |
01:09:52.020
you don't wanna too harshly criticize
link |
01:09:54.920
sort of bureaucracies, but there's a lag
link |
01:09:57.680
that seems to get in the way of the magic.
link |
01:10:00.800
And I still have a lot of hope
link |
01:10:03.800
that that can change because I don't often see
link |
01:10:08.320
that particular type of magic elsewhere in the industry.
link |
01:10:12.840
So like we need that and we need that flame going.
link |
01:10:15.800
And it's the same thing as exactly as you said,
link |
01:10:19.120
it has the same kind of elements
link |
01:10:20.560
like the open source community does.
link |
01:10:23.240
And, but then if you, like the reason I stepped away,
link |
01:10:27.160
the reason I'm here, just like you did in Austin is like,
link |
01:10:30.260
if I wanna build one robot, I'll stay at MIT.
link |
01:10:33.240
But if I wanna build millions and make enough money
link |
01:10:37.460
to where I can explore the magic of that, then you can't.
link |
01:10:41.000
And I think that dance is...
link |
01:10:44.160
That translational dance has been lost a bit, right?
link |
01:10:47.480
And there's a lot of reasons for that.
link |
01:10:48.640
I'm not, I'm certainly not an expert on this stuff.
link |
01:10:50.160
I can opine like anybody else,
link |
01:10:51.660
but I realized that I wanted to explore entrepreneurship,
link |
01:10:55.820
which I, and really figure out,
link |
01:10:57.720
and it's been a driving passion for 20 years, 25 years.
link |
01:11:01.560
How do we connect capital markets and companies?
link |
01:11:06.480
Cause again, I fell in love with the notion of,
link |
01:11:07.880
oh, profit seeking on its own is not a bad thing.
link |
01:11:11.160
It's actually a coordination mechanism
link |
01:11:13.520
for allocating resources that, you know,
link |
01:11:16.480
in an emergent way, right?
link |
01:11:18.000
That respects everybody's opinions, right?
link |
01:11:20.720
So this is actually powerful.
link |
01:11:21.880
So I say all the time, when I make a company
link |
01:11:25.320
and we do something that makes profit,
link |
01:11:27.260
what we're saying is, hey,
link |
01:11:28.100
we're collecting some of the world's resources
link |
01:11:29.800
and voluntarily people are asking us
link |
01:11:31.480
to do something that they like.
link |
01:11:33.000
And that's a huge deal.
link |
01:11:34.000
And so I really liked that energy.
link |
01:11:36.120
So that's what I came to do and to learn
link |
01:11:37.560
and to try to figure out.
link |
01:11:38.480
And that's what I've been kind of stumbling through
link |
01:11:40.120
for the past 14 years.
link |
01:11:40.960
And that's 2007.
link |
01:11:42.580
2007, yeah.
link |
01:11:43.420
And so you were still working on NumPy.
link |
01:11:44.960
So NumPy was just emerging.
link |
01:11:46.560
Just emerging.
link |
01:11:47.400
One of the things I've done,
link |
01:11:49.160
it's worth mentioning because it emphasizes
link |
01:11:51.480
the exploratory nature of my thinking at the time.
link |
01:11:53.840
I said, well, I don't know how to fund this thing.
link |
01:11:55.240
I've got a graduate student I'm paying for
link |
01:11:56.720
and I've got no funding for him.
link |
01:11:57.880
And I had done some fundraising from the public
link |
01:12:00.520
to try to get public fundraisers in my lab.
link |
01:12:02.800
I didn't really wanna go out
link |
01:12:03.880
and just do the fundraising circuit
link |
01:12:05.360
the way it's traditionally done.
link |
01:12:06.920
So I wrote a book and I said, I'm gonna write a book
link |
01:12:09.960
and I'm gonna charge for it.
link |
01:12:11.440
It was called Guide to NumPy.
link |
01:12:12.720
And so ultimately NumPy became
link |
01:12:14.040
documentation driven development
link |
01:12:15.960
because I basically wrote the book
link |
01:12:17.280
and made sure the stuff worked so the book would work.
link |
01:12:19.760
So it really helped actually make NumPy become a thing.
link |
01:12:23.040
So writing that book,
link |
01:12:25.800
and it's not a page turner.
link |
01:12:28.200
Guide to NumPy is not a book you pick up
link |
01:12:29.680
and go, oh, this is great, over the fire.
link |
01:12:31.520
But it's where you could find the details,
link |
01:12:33.640
like how'd all this work.
link |
01:12:34.720
And a lot of people love that book.
link |
01:12:36.520
And so a lot of people ended up,
link |
01:12:38.040
so I said, look, I need to, so I'm gonna charge for it.
link |
01:12:41.600
And I got some flack for that.
link |
01:12:42.760
Not that much, just probably five angry messages,
link |
01:12:45.920
people yelling at me saying I was a bad guy
link |
01:12:49.960
for charging for this book.
link |
01:12:51.360
Was one of them Richard Stallman?
link |
01:12:53.280
No. Just kidding.
link |
01:12:54.120
No, I haven't really had any interaction with him personally,
link |
01:12:56.920
like I said, but there were a few,
link |
01:12:59.840
but actually surprisingly not.
link |
01:13:01.280
There was actually a lot of people like,
link |
01:13:02.760
no, it's fine, you can charge for a book.
link |
01:13:04.240
That's no big deal.
link |
01:13:05.080
We know that's a way you can try to make money
link |
01:13:07.080
around open source.
link |
01:13:07.920
So what I did, I did it in an interesting way.
link |
01:13:10.160
I said, well, kind of my ideas around IP law and stuff.
link |
01:13:14.280
I love the idea you can share something, you can spread it.
link |
01:13:16.120
Like once it's, the fact that you have a thing
link |
01:13:18.280
and copying is free, but the creation is not free.
link |
01:13:21.640
So how do you fund the creation and allow the copying?
link |
01:13:25.600
And in software, it's a little more complicated than that
link |
01:13:27.040
because creation is actually a continuous thing.
link |
01:13:29.360
It's not like you build a widget and it's done.
link |
01:13:31.160
It's sort of a process of emerging
link |
01:13:32.640
and continuing to create.
link |
01:13:34.560
But I wrote the book
link |
01:13:35.520
and had this market determined price thing.
link |
01:13:37.520
I said, look, I need, I think I said 250,000.
link |
01:13:40.720
If I make 250,000 from this book, I'll make it free.
link |
01:13:44.280
So as soon as I get that much money,
link |
01:13:45.760
or I said five years, so there's a time limit.
link |
01:13:48.960
Like it's not forever.
link |
01:13:49.800
That's really cool.
link |
01:13:50.640
It's amazing.
link |
01:13:51.680
I released it on this.
link |
01:13:53.080
And it's actually interesting
link |
01:13:54.240
because one of the people
link |
01:13:55.800
who also thought that was interesting
link |
01:13:57.040
ended up being Chris White,
link |
01:13:58.600
who was the director of the DARPA project
link |
01:14:01.360
that we got funding through at Anaconda.
link |
01:14:02.920
And the reason he even called us back
link |
01:14:04.640
is because he remembered my name from this book
link |
01:14:06.720
and he thought that was interesting.
link |
01:14:08.080
And so even though we hadn't gone to the demo days,
link |
01:14:10.880
we applied and the people said, yeah,
link |
01:14:12.680
nobody ever gets this without coming to the demo day first.
link |
01:14:15.360
This is the first time I've seen it.
link |
01:14:16.320
But it's because I knew, you know,
link |
01:14:18.200
Chris had done this and had this interaction.
link |
01:14:19.640
So it did have impact.
link |
01:14:21.680
I was actually really, really pleased by the result.
link |
01:14:23.880
I mean, I ended up in three years, I made 90,000.
link |
01:14:27.360
So sold 30,000 copies by myself.
link |
01:14:29.480
I just put it up on, you know, use PayPal and sold it.
link |
01:14:33.000
And that was my first taste of kind of, okay,
link |
01:14:36.040
this can work to some degree.
link |
01:14:37.600
And I, you know, all over the world, right?
link |
01:14:40.320
From Germany to Japan to, it was actually, it did work.
link |
01:14:44.480
And so I appreciated the fact that PayPal existed
link |
01:14:47.040
and I had a way to get the money, the distribution was simple.
link |
01:14:51.200
This is pre-Amazon book stuff.
link |
01:14:53.480
So it was just publishing a website.
link |
01:14:55.320
It was the popularity of SciPy emerging
link |
01:14:57.120
and getting company usage.
link |
01:14:58.960
I ended up not letting it go the five years
link |
01:15:00.600
and not trying to make the full amount
link |
01:15:01.960
because, you know, a year and a half later,
link |
01:15:04.560
I was at Enthought.
link |
01:15:05.400
I had left academia for Enthought
link |
01:15:06.680
and I kind of had a full time job.
link |
01:15:07.880
And then actually what happened is the documentation people,
link |
01:15:10.000
there's a group that said, hey,
link |
01:15:10.840
we want to do documentation for SciPy as a collective.
link |
01:15:14.280
And they're essentially needing the stuff in the book, right?
link |
01:15:18.680
And so they kind of ask,
link |
01:15:20.360
hey, could we just use the stuff in your book?
link |
01:15:21.920
And at that point I said, yeah, I'll just open it up.
link |
01:15:24.160
So that's, but it has served its purpose.
link |
01:15:27.320
And the money that I made actually funded my grad student.
link |
01:15:31.040
Like it was actually, you know,
link |
01:15:32.160
I paid him 25,000 a year out of that money.
link |
01:15:35.440
So the funny thing is if you do a very similar
link |
01:15:37.440
kind of experiment now with NumPy or something like it,
link |
01:15:40.680
you could probably make a lot more.
link |
01:15:42.480
It's probably true.
link |
01:15:43.800
Because of the tooling and the community building.
link |
01:15:46.360
Yeah, I agree.
link |
01:15:47.200
Like the, and social media,
link |
01:15:48.680
that there's just a virality to that kind of idea.
link |
01:15:51.560
I agree.
link |
01:15:52.400
There'd be things to do.
link |
01:15:53.240
I've thought about that.
link |
01:15:54.080
And really I thought about a couple of books
link |
01:15:56.080
or a couple of things that could be done there.
link |
01:15:57.440
And I just haven't, right?
link |
01:15:58.960
Even, I tried to hire a ghostwriter this year too
link |
01:16:01.920
to see if that could help, but it didn't.
link |
01:16:04.160
But part of my problem is this,
link |
01:16:06.240
I've been so excited by a number of things
link |
01:16:08.080
that have stemmed from that.
link |
01:16:09.480
Like, so I came here, worked at Enthought for four years,
link |
01:16:13.040
graciously, Eric made me president.
link |
01:16:14.960
Then we started to work closely together.
link |
01:16:16.280
We actually helped him buy out his partner.
link |
01:16:19.440
It didn't end great.
link |
01:16:20.720
Like unfortunately Eric and I aren't real,
link |
01:16:22.880
aren't friends now.
link |
01:16:24.560
I still respect him.
link |
01:16:25.400
I have a lot, I wish we were,
link |
01:16:26.640
but he didn't like the fact that Peter and I
link |
01:16:30.240
started Anaconda, right?
link |
01:16:31.680
That was not, I mean, so there's two sides to that story.
link |
01:16:36.200
So I'm not gonna go into it, right?
link |
01:16:37.360
Sure.
link |
01:16:38.200
But you, as human beings
link |
01:16:40.600
and you wish you still could be friends.
link |
01:16:42.320
I do, I do.
link |
01:16:43.920
It saddens me.
link |
01:16:45.160
I mean, that's a story of great minds
link |
01:16:49.040
building great companies.
link |
01:16:51.480
Somehow it's sad that when there's that kind of.
link |
01:16:55.000
And I hold him in esteem.
link |
01:16:57.360
I'm grateful for him.
link |
01:16:58.200
I think Enthought still exists.
link |
01:17:00.320
They're doing great work helping scientists.
link |
01:17:02.520
They still run the SciPy conference.
link |
01:17:05.040
They have an R&D platform they're selling now
link |
01:17:07.320
that's a tool that you can go get today, right?
link |
01:17:10.080
So Enthought has played a role in the SciPy
link |
01:17:14.920
in supporting the community around SciPy, I would say.
link |
01:17:18.240
They ended up not being able to,
link |
01:17:20.560
they ended up building a tool suite
link |
01:17:22.040
to write GUI applications.
link |
01:17:24.040
Like that's where they could actually make
link |
01:17:25.440
that the business could work.
link |
01:17:26.680
And so supporting SciPy and NumPy itself
link |
01:17:29.480
wasn't as possible.
link |
01:17:30.560
Like they didn't, they tried.
link |
01:17:31.960
I mean, it was not just because,
link |
01:17:33.280
it was just because of the business aspect.
link |
01:17:34.480
So, and I wanted to build a company that could do,
link |
01:17:36.840
that could get venture funding, right?
link |
01:17:39.080
For better or worse.
link |
01:17:39.920
I mean, that's a longer story.
link |
01:17:41.040
We could talk a lot about that, but.
link |
01:17:42.400
And that's where Anaconda came to be.
link |
01:17:44.200
That's where Anaconda came to be.
link |
01:17:45.040
So let me ask you, it's a little bit for fun
link |
01:17:48.040
because you built this amazing thing.
link |
01:17:50.000
And so let's talk about like an old warrior
link |
01:17:54.640
looking over old battles.
link |
01:17:57.320
You've, you know, there's a sad letter in 2012
link |
01:18:01.480
that you wrote to the NumPy mailing list
link |
01:18:04.360
announcing that you're leaving NumPy.
link |
01:18:06.320
And some of the things you've listed
link |
01:18:08.560
as some of the things you regret
link |
01:18:10.720
or not regret necessarily, but some things to think about.
link |
01:18:14.440
If you could go back and you could fix stuff about NumPy
link |
01:18:17.640
or both sort of in a personal level,
link |
01:18:20.640
but also like looking forward,
link |
01:18:21.960
what kind of things would you like to see changed?
link |
01:18:24.560
Good question.
link |
01:18:25.400
So I think there's technical questions
link |
01:18:26.320
and social questions right there.
link |
01:18:29.680
First of all, you know, I wrote NumPy as a service
link |
01:18:33.400
and I spent a lot of time doing it.
link |
01:18:35.000
And then other people came to help make it happen.
link |
01:18:36.760
NumPy succeeded because of the work of a lot of people, right?
link |
01:18:39.840
So it's important to understand that.
link |
01:18:42.240
I'm grateful for the opportunity,
link |
01:18:43.880
the role I had, I could play
link |
01:18:45.080
and grateful that things I did had an impact,
link |
01:18:47.600
but they only had the impact they had
link |
01:18:49.200
because of the other people who came into the story.
link |
01:18:52.200
And so they were essential,
link |
01:18:53.440
but the way data types were handled,
link |
01:18:55.720
the way data types, we had array scalars, for example,
link |
01:18:59.280
that are really just a substitute for a type concept, right?
link |
01:19:04.080
So we had array scalars, which are actual Python objects,
link |
01:19:06.960
so that there's for every, for a 32-bit float
link |
01:19:09.520
or a 16-bit float or a 16-bit integer,
link |
01:19:13.160
Python doesn't have a natural,
link |
01:19:14.720
it's just one integer, there's one float.
link |
01:19:17.040
Well, what about these lower precision types,
link |
01:19:19.960
these larger precision types?
link |
01:19:21.600
So we had them in NumPy
link |
01:19:23.680
so that you could have a collection of them,
link |
01:19:25.320
but then have an object in Python that was one of them.
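A small illustration of the array scalars described here, using standard NumPy behavior: each dtype is paired with a scalar type, and pulling a single element out of a `float32` array gives an object of that NumPy type rather than one of Python's built-in numbers.

```python
import numpy as np

a = np.array([1.5, 2.5], dtype=np.float32)

# The array's dtype is paired with a scalar *type* for single elements.
scalar_type = a.dtype.type          # np.float32

# Indexing one element returns an array scalar of that type, not a built-in float.
x = a[0]
x_is_builtin_float = isinstance(x, float)    # False: np.float32 does not subclass float

# Python has a single float type (a C double); np.float64 happens to subclass it.
y = np.array([1.5], dtype=np.float64)[0]
y_is_builtin_float = isinstance(y, float)    # True
```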
link |
01:19:28.760
And there's questions about like in retrospect,
link |
01:19:31.440
I wouldn't have created those
link |
01:19:32.920
if I had improved the type system.
link |
01:19:34.880
And like made the type system actually a Python type system
link |
01:19:38.000
as opposed to currently,
link |
01:19:39.480
it's a Python one level type system.
link |
01:19:41.400
I don't know if you know the difference
link |
01:19:42.240
between Python one, Python two,
link |
01:19:43.200
it's kind of technical, kind of in-depth,
link |
01:19:44.880
but Python two, one of its big things that Guido did,
link |
01:19:47.320
it was really brilliant.
link |
01:19:48.160
In Python one, actually,
link |
01:19:51.640
all classes, all new objects, were one type.
link |
01:19:55.040
If you as a user wrote a class,
link |
01:19:56.880
it was an instance of a single Python type
link |
01:19:59.600
called the class type, right?
link |
01:20:02.000
In Python two, he used a meta typing hook
link |
01:20:06.240
to actually go, oh, we can extend this
link |
01:20:07.960
and have users write classes that are new types.
link |
01:20:10.960
So he was able to have your user classes be actual types
link |
01:20:13.320
and the Python type system got a lot more rich.
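A small Python 3 sketch of what the type/class unification made possible (the class names are hypothetical, and this illustrates the user-visible result rather than the CPython internals Guido changed): a user-defined class is itself an instance of `type`, it can subclass a built-in type such as `float`, and a metaclass, which is a subclass of `type`, can hook into class creation.

```python
# Since the Python 2.2 type/class unification (and for every class in
# Python 3), a user-defined class is itself a type.
class Point:
    pass

print(type(Point))              # <class 'type'>
print(isinstance(Point, type))  # True

# Because classes are types, a user class can subclass a built-in type.
class Celsius(float):
    def __repr__(self):
        return f"{float(self)} C"

# A metaclass is a subclass of `type` that customizes class creation;
# this hypothetical one records the name of every class it builds.
class Registry(type):
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls

class Sensor(metaclass=Registry):
    pass

print(repr(Celsius(21.5)))  # 21.5 C
print(Registry.created)     # ['Sensor']
```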
link |
01:20:16.480
I barely understood that at the time that NumPy was written.
link |
01:20:19.160
And so I essentially in NumPy created a type system
link |
01:20:22.480
that was Python one era.
link |
01:20:24.400
Every dtype was an instance of the same type
link |
01:20:29.240
as opposed to having new dtypes be really just Python types
link |
01:20:33.160
with additional metadata.
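For comparison, a short sketch of how NumPy's dtype objects look from Python (assumes NumPy is installed). Every dtype, whatever element type it describes, is an instance of `np.dtype`, and the differences are carried as instance metadata such as `kind` and `itemsize`. Recent NumPy versions have started giving each built-in dtype its own subclass of `np.dtype` as part of the NEP 41/42 redesign, so what `type(f32)` prints depends on your version; the `isinstance` check holds either way.

```python
import numpy as np

f32 = np.dtype("float32")
i16 = np.dtype("int16")

# Both are np.dtype instances; what distinguishes them is metadata
# stored on the instance (kind code, bytes per element, ...).
for dt in (f32, i16):
    print(isinstance(dt, np.dtype), dt.kind, dt.itemsize)
# True f 4
# True i 2
```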
link |
01:20:34.280
What's the cost of that?
link |
01:20:35.440
Is it efficiency, is it usability?
link |
01:20:37.200
It's usability primarily.
link |
01:20:38.840
The cost isn't really efficiency.
link |
01:20:40.320
It's the fact that it's clumsy to create new types.
link |
01:20:45.080
It's hard.
link |
01:20:45.920
And then one of the challenges,
link |
01:20:47.560
you wanna create new types.
link |
01:20:48.400
You want a quaternion type, or you want to add a new posit type
link |
01:20:52.600
or you wanna, so it's hard.
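One way to see the clumsiness from the user side: the closest a pure-Python user can easily get to a new "quaternion" element type is a structured dtype (the field names `w`, `x`, `y`, `z` are just an illustrative choice). It packs four doubles per element, but NumPy has no arithmetic loops for it, so even `+` fails; genuinely new numeric dtypes have traditionally required writing C code against NumPy's internals.

```python
import numpy as np

# A record of four float64 fields: storage only, no quaternion semantics.
quat = np.dtype([("w", "f8"), ("x", "f8"), ("y", "f8"), ("z", "f8")])

a = np.zeros(3, dtype=quat)
a["w"] = 1.0  # fields can be read and written by name

print(quat.itemsize)  # 32 (4 fields x 8 bytes)

try:
    a + a  # no ufunc loop is registered for this structured dtype
    added = True
except TypeError:
    added = False
print(added)
```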
link |
01:20:55.080
And now, if we had done that well,
link |
01:20:59.200
when Numba came on the scene
link |
01:21:00.440
where we could actually compile Python code,
link |
01:21:02.880
it would integrate with that type system much more cleanly.
link |
01:21:05.160
And now all of a sudden you could do gradual typing
link |
01:21:08.080
more easily.
link |
01:21:08.920
You could actually have Python when you add Numba
link |
01:21:10.560
plus better typing, could actually be a,
link |
01:21:14.720
you'd smooth out a lot of rough edges.
link |
01:21:16.800
But there's already, there's like,
link |
01:21:18.840
but are you talking about from the perspective
link |
01:21:20.960
of developers within NumPy or users of NumPy?
link |
01:21:23.840
Developers of NumPy, not really users of NumPy so much.
link |
01:21:27.080
It's the development of NumPy.
link |
01:21:28.800
So you're thinking about like how to design NumPy
link |
01:21:32.160
so that it's easier for contributors.
link |
01:21:33.880
Yeah, the contributors, it's easier.
link |
01:21:35.880
It's easier.
link |
01:21:36.720
It's less work to make it better and to keep it maintained.
link |
01:21:39.320
And where that's impacted things, for example,
link |
01:21:41.480
is the GPU.
link |
01:21:43.400
Like all of a sudden GPUs start getting added
link |
01:21:45.520
and we don't have them in NumPy.
link |
01:21:48.360
Like NumPy should just work on GPUs.
link |
01:21:50.560
The fact that we'd have to download a whole other object
link |
01:21:52.680
called CuPy to have arrays on GPUs
link |
01:21:54.800
is just an artifact of history.
link |
01:21:57.440
Like there's no fundamental reason for it.
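A common workaround today is to pick the array module at runtime. This is a hedged sketch, not an official pattern from either project: it assumes CuPy (whose API largely mirrors NumPy's) may or may not be installed, uses `cupy.cuda.runtime.getDeviceCount()` to check for a usable GPU, and otherwise falls back to NumPy. The helper name `pick_array_module` is hypothetical.

```python
import numpy as np

def pick_array_module():
    """Return cupy if it is installed and sees a CUDA device, else numpy."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() > 0:
            return cupy
    except Exception:  # CuPy missing, or no usable CUDA driver/device
        pass
    return np

xp = pick_array_module()

# The same NumPy-style code then runs on whichever backend was chosen.
x = xp.arange(5, dtype=xp.float32)
total = float((x * x).sum())  # 0 + 1 + 4 + 9 + 16
print(xp.__name__, total)
```

The need for a helper like this is exactly the historical split being described: the user, not the array library, decides where the data lives.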
link |
01:21:59.160
Well, that's really interesting.
link |
01:22:00.200
If we could sort of go on that tangent briefly
link |
01:22:02.520
is you have PyTorch and other libraries like TensorFlow
link |
01:22:07.800
that basically tried to mimic NumPy.
link |
01:22:11.840
Like you've created a sort of platonic form
link |
01:22:15.720
of multidimensional arrays. Basically, yeah.
link |
01:22:16.920
Yeah, exactly.
link |
01:22:17.760
Well, and the problem was I didn't realize that.
link |
01:22:19.800
Platonic form has a lot of edges.
link |
01:22:21.760
They're like, well, we should cut those out
link |
01:22:23.360
before we present it.
link |
01:22:24.200
So I wonder if you can comment,
link |
01:22:26.920
is there like a difference between their implementations?
link |
01:22:29.360
Do you wish that they were all using NumPy
link |
01:22:31.440
or like in this abstraction of GPU?
link |
01:22:34.040
And sorry to interrupt, but there's GPUs, ASICs.
link |
01:22:38.240
There might be other neuromorphic computing.
link |
01:22:40.040
There might be other kind of,
link |
01:22:41.600
or the aliens will come with a new kind of computer.
link |
01:22:43.920
Like an abstraction that NumPy should just operate nicely
link |
01:22:47.880
over the things that are more and more
link |
01:22:50.280
and smarter and smarter with these multidimensional arrays.
link |
01:22:54.200
Yeah, yeah.
link |
01:22:55.520
There's several comments there.
link |
01:22:56.920
We are working on something now called data-apis.org.
link |
01:23:00.360
data-apis.org, you can go there today.
link |
01:23:02.560
And it's our answer.
link |
01:23:04.480
It's my answer.
link |
01:23:05.320
It's not just me.
link |
01:23:06.160
It's me and Ralf and Athan and Aaron
link |
01:23:09.120
and a lot of companies are helping us at Quansight Labs.
link |
01:23:13.120
It's not unifying all the arrays.
link |
01:23:14.560
It's creating an API that is unified.
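The idea behind a unified API is that library code can be written once against a shared set of function names and then run on any conforming array library. A minimal sketch, assuming only that the namespace passed in provides `mean` and `std`, both of which the Array API standard specifies; NumPy stands in for a conforming library, and the function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(x, xp):
    """Scale x to zero mean and unit (population) standard deviation.

    Only functions named in the array API standard are used, so `xp` could
    be any conforming namespace; NumPy stands in for one here.
    """
    return (x - xp.mean(x)) / xp.std(x)

data = np.asarray([1.0, 2.0, 3.0, 4.0])
z = standardize(data, np)
print(z)
```

In practice, code written this way often obtains `xp` from the array itself through the standard's `__array_namespace__()` method, which NumPy 2.x arrays provide, rather than taking it as an argument.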
link |
01:23:17.200
So we do care about this
link |
01:23:19.360
and we're trying to work through it.
link |
01:23:21.280
I actually had the chance to go and meet
link |
01:23:22.560
with the TensorFlow team and the PyTorch team
link |
01:23:25.360
and talk to them after exiting Anaconda.
link |
01:23:29.120
Just talking about,
link |
01:23:29.960
because the first year after leaving Anaconda in 2018,
link |
01:23:33.960
I became deeply aware of this and realized that,
link |
01:23:36.000
oh, this split in the array community that exists today
link |
01:23:38.960
makes what I was concerned about in 2005 pretty parochial.
link |
01:23:44.160
It's a lot worse, right?
link |
01:23:45.880
Now there's a lot more people.
link |
01:23:47.280
So perhaps the industry can sustain more stacks, right?
link |
01:23:51.400
There's a lot of money,
link |
01:23:52.560
but it makes it a lot less efficient.
link |
01:23:54.120
I mean, but I've also learned to appreciate,
link |
01:23:56.720
it's okay to have some competition.
link |
01:23:58.440
It's okay to have different implementations,
link |
01:24:00.760
but it's better if you can at least refactor some parts.
link |
01:24:03.560
I mean, you're gonna be more efficient
link |
01:24:04.960
if you can refactor parts.
link |
01:24:07.000
It's nice to have competition over things,
link |
01:24:09.560
where it's nice to have competition.
link |
01:24:11.760
They're innovative.
link |
01:24:12.600
Yeah, innovative.
link |
01:24:13.440
And then maybe on the infrastructure,
link |
01:24:15.920
whatever, however you define infrastructure,
link |
01:24:18.120
that maybe it's nice to come together.
link |
01:24:21.400
Exactly, I agree.
link |
01:24:22.440
And I think, but it was interesting to hear the stories.
link |
01:24:24.600
I mean, TensorFlow came out of a C++ library,
link |
01:24:29.040
Jeff Dean wrote, I think,
link |
01:24:30.160
that was basically how they were doing inference, right?
link |
01:24:33.560
And then they realized, oh,
link |
01:24:34.400
we could do this TensorFlow thing.
link |
01:24:36.440
That C++ library, then what was interesting to me
link |
01:24:38.400
was the fact that both Google and Facebook did not,
link |
01:24:42.600
it's not like they supported Python or NumPy initially.
link |
01:24:44.960
They just realized they had to.
link |
01:24:47.200
They came to this world and then all the users were like,
link |
01:24:48.760
hey, where's the NumPy interface?
link |
01:24:50.680
Oh, and then they kind of came late to it
link |
01:24:52.560
and then they had these bolt ons.
link |
01:24:54.800
TensorFlow's bolt on, I don't mean to offend,
link |
01:24:57.280
but it was so bad.
link |
01:24:58.480
Yeah, it was bad.
link |
01:24:59.320
It's the first time that I'm usually,
link |
01:25:01.760
I mean, one of the challenges I have
link |
01:25:04.160
is I don't criticize enough in the sense
link |
01:25:07.000
that I don't give people input enough, you know, if.
link |
01:25:09.960
I think it's universally agreed upon
link |
01:25:11.680
that the bolt ons on TensorFlow were.
link |
01:25:13.640
But I went to, it was a talk given at Mallorca in Spain
link |
01:25:17.080
and a great guy came and gave a talk and I said,
link |
01:25:19.880
you should never show that API again
link |
01:25:21.400
at a PyData conference.
link |
01:25:23.040
Like that was, that's terrible.
link |
01:25:24.840
Like you're taking this beautiful system we've created
link |
01:25:27.080
and like you're corrupting all these poor Python people,
link |
01:25:29.440
forcing them to write code like that
link |
01:25:30.840
or thinking they should.
link |
01:25:32.640
Fortunately, you know, they adopted Keras as their,
link |
01:25:35.640
and Keras is better.
link |
01:25:36.760
And so Keras, TensorFlow is fine, is reasonable,
link |
01:25:40.360
but they bolted it on.
link |
01:25:42.680
Facebook did too.
link |
01:25:43.640
Like Facebook had their own C++ library for doing inference
link |
01:25:48.160
and they also had the same reaction, they had to do this.
link |
01:25:51.160
One big difference is Facebook,
link |
01:25:52.840
maybe because of the way it's situated as part of FAIR,
link |
01:25:55.240
part of the research library,
link |
01:25:56.600
TensorFlow is definitely used and, you know,
link |
01:25:58.880
they have to make, they couldn't just open it up
link |
01:26:00.720
and let the community, you know, change what that is.
link |
01:26:03.160
Cause I guess they were worried
link |
01:26:04.640
about disrupting their operations.
link |
01:26:06.880
Facebook's been much more open to having community input
link |
01:26:10.720
on the structure itself.
link |
01:26:12.400
Whereas Google and TensorFlow,
link |
01:26:14.240
they're really eager to have community users,
link |
01:26:16.000
people use it and build the infrastructure,
link |
01:26:17.520
but it's much more walled.
link |
01:26:18.840
Like it's harder to become a contributor to TensorFlow.
link |
01:26:21.600
And it's also, this is a very difficult question to answer
link |
01:26:24.760
and I don't mean to be throwing shade at anybody,
link |
01:26:27.080
but you have to wonder, it's the Microsoft question
link |
01:26:30.320
of when you have a tool like PyTorch or TensorFlow,
link |
01:26:33.920
how much are you tending to the hackers
link |
01:26:36.320
and how much are you tending to the big corporate clients?
link |
01:26:39.240
Correct.
link |
01:26:40.080
So like the ones that,
link |
01:26:42.560
do you tend to the millions of people
link |
01:26:44.160
that are giving you almost no money,
link |
01:26:46.440
or do you tend to the few
link |
01:26:48.360
that are giving you a ton of money?
link |
01:26:50.320
I tend to stand with the people.
link |
01:26:54.000
Right.
link |
01:26:54.840
Cause I feel like if you nurture the hackers,
link |
01:26:57.760
you will make the right decisions in the longterm
link |
01:27:00.200
that will make the companies happy.
link |
01:27:02.000
I lean that way too.
link |
01:27:03.280
I totally agree.
link |
01:27:04.120
But then you have to find the right dance.
link |
01:27:05.680
But it's a balance.
link |
01:27:07.080
Cause you can lean to the hackers and run out of money.
link |
01:27:08.960
Yeah, exactly.
link |
01:27:10.240
Exactly.
link |
01:27:11.440
Which has been some of the challenge I've faced
link |
01:27:13.760
in the sense that,
link |
01:27:14.680
like I would look at some of the experiments,
link |
01:27:17.040
like NumPy, the fact that we have this split
link |
01:27:19.040
is a result of the fact that I wasn't able to collect more money
link |
01:27:21.720
towards NumPy development.
link |
01:27:22.800
Yeah.
link |
01:27:23.640
Right?
link |
01:27:24.480
I mean, I didn't succeed in the early days
link |
01:27:26.480
of getting enough financial contribution to NumPy
link |
01:27:29.560
so that they could work on it.
link |
01:27:31.080
Right?
link |
01:27:31.920
I couldn't work on it full time.
link |
01:27:32.760
I had to just catch an hour here, an hour there.
link |
01:27:35.640
And I basically didn't like that.
link |
01:27:37.880
Like I've wanted to be able to do something about that
link |
01:27:39.920
for a long time and tried to figure out how,
link |
01:27:41.440
well, there's lots of ways.
link |
01:27:42.960
I mean, possibly one could say,
link |
01:27:44.640
we had an offer from Microsoft
link |
01:27:46.240
in the early days of Anaconda.
link |
01:27:48.240
2014, they offered to come buy us, right?
link |
01:27:51.160
The problem was the right people at Microsoft
link |
01:27:52.760
didn't offer to buy us.
link |
01:27:53.600
And they were still,
link |
01:27:54.880
they were, it was really a,
link |
01:27:56.440
we were like a second,
link |
01:27:58.040
they had really bought, they just bought R,
link |
01:27:59.680
the R company called,
link |
01:28:01.800
it was not RStudio,
link |
01:28:02.800
but it was another R company that was emergent.
link |
01:28:05.680
And it was kind of a,
link |
01:28:07.160
well, we should also get a Python play,
link |
01:28:09.360
but they were really doubling down on R.
link |
01:28:11.520
Right?
link |
01:28:12.360
And so it was like,
link |
01:28:13.200
it was where you would go to die.
link |
01:28:14.400
So it's not, it wasn't,
link |
01:28:15.440
it was before Satya was there.
link |
01:28:17.160
Satya had just started.
link |
01:28:18.680
Just started.
link |
01:28:19.520
Right?
link |
01:28:20.360
And the offer was coming from someone
link |
01:28:21.800
two levels down from him.
link |
01:28:23.080
Got you.
link |
01:28:23.920
Right?
link |
01:28:24.760
And if it had come from Scott Guthrie,
link |
01:28:26.640
so I got a chance to meet Scott Guthrie,
link |
01:28:28.320
great guy, I like him.
link |
01:28:29.760
If an offer had come from him,
link |
01:28:31.560
probably would be at Microsoft right now.
link |
01:28:33.200
That'd be fascinating.
link |
01:28:34.520
That would be really nice actually,
link |
01:28:36.160
especially given what Microsoft has since done
link |
01:28:38.720
for the open source community and all those things.
link |
01:28:40.200
Yes, I think they're doing well.
link |
01:28:41.640
I really like some of the stuff they've been doing.
link |
01:28:43.720
They're still working,
link |
01:28:45.200
and they've, you know,
link |
01:28:46.040
they've hired Guido now,
link |
01:28:46.880
and they've hired a lot of Python developers.
link |
01:28:47.720
Wait, Guido's at Microsoft?
link |
01:28:49.400
Yeah, he works at Microsoft.
link |
01:28:50.240
I need to.
link |
01:28:52.480
Which, he retired,
link |
01:28:53.600
then he came out of retirement,
link |
01:28:54.720
and he's working now.
link |
01:28:55.560
I was just talking to him,
link |
01:28:56.400
and he didn't mention this person.
link |
01:28:57.840
Well.
link |
01:28:58.680
I should investigate this further.
link |
01:29:01.280
Well.
link |
01:29:02.120
Because I know he loved Dropbox,
link |
01:29:02.960
but I wasn't sure what he was doing,
link |
01:29:04.000
what he was up to.
link |
01:29:05.160
Well, he was kind of saying he'd retire,
link |
01:29:06.560
but, and it's literally been five years
link |
01:29:09.640
since I last sat down and really talked to Guido.
link |
01:29:12.280
Right?
link |
01:29:13.640
Guido's a technology expert, right?
link |
01:29:16.000
He's a, so I came,
link |
01:29:17.480
I was excited because I'd finally figured out
link |
01:29:18.880
the type system for NumPy.
link |
01:29:20.720
I wanted to kind of talk about that with him,
link |
01:29:22.240
and I kind of overwhelmed him.
link |
01:29:23.960
Could you stay in that,
link |
01:29:25.080
just for a brief moment,
link |
01:29:26.640
because he's a fascinating person
link |
01:29:28.200
in the history of programming.
link |
01:29:29.440
He is a fascinating person.
link |
01:29:31.240
What have you learned from Guido
link |
01:29:34.200
about programming, about life?
link |
01:29:37.560
Yeah, yeah.
link |
01:29:38.400
A lot, actually.
link |
01:29:39.240
I've been a fan of Guido's.
link |
01:29:40.840
You know, we have a chance to talk.
link |
01:29:42.520
Some, I wouldn't say, you know,
link |
01:29:43.760
we talk all the time.
link |
01:29:44.840
Not at all.
link |
01:29:45.680
He may, but we talk enough to,
link |
01:29:47.520
I respect his,
link |
01:29:48.840
in fact, when I first started NumPy,
link |
01:29:49.880
one of the first things I did was I had a,
link |
01:29:51.520
I asked Guido for a meeting
link |
01:29:53.320
with him and Paul Dubois in San Mateo.
link |
01:29:55.400
And I went and met him for lunch.
link |
01:29:56.920
And basically, to say,
link |
01:29:58.000
maybe we can actually,
link |
01:29:59.200
part of the strategy for NumPy
link |
01:30:00.720
was to get it into Python 3,
link |
01:30:02.440
and maybe be part of Python.
link |
01:30:04.120
And so we talked about that.
link |
01:30:05.160
That's a cool conversation.
link |
01:30:06.000
And about that approach, right?
link |
01:30:06.920
I would have loved to be a fly on the wall.
link |
01:30:09.200
That was good.
link |
01:30:10.040
And over the years, from Guido,
link |
01:30:12.080
I learned,
link |
01:30:13.560
so he was open.
link |
01:30:14.840
Like, he was willing to listen to people's ideas.
link |
01:30:18.200
Right?
link |
01:30:19.040
And over the years,
link |
01:30:19.880
now generally, you know,
link |
01:30:20.920
I'm not saying universally that's been true,
link |
01:30:22.600
but generally that's been true.
link |
01:30:24.360
So he's willing to listen.
link |
01:30:25.680
He's willing to defer.
link |
01:30:27.240
Like on the scientific side,
link |
01:30:28.280
he would just kind of defer.
link |
01:30:29.120
He didn't really always understand
link |
01:30:30.160
what we were doing.
link |
01:30:31.000
Yeah.
link |
01:30:31.840
And he'd defer.
link |
01:30:32.800
One place where he didn't defer enough
link |
01:30:35.640
was we missed a matrix multiply operator.
link |
01:30:37.680
Like that finally got added to Python,
link |
01:30:39.600
but about 10 years later than it should have.
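For reference, the operator in question is `@`, added by PEP 465 in Python 3.5. A quick NumPy sketch of the before and after (assumes NumPy; the matrices are arbitrary):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

before = np.dot(A, B)   # how matrix products were spelled before Python 3.5
after = A @ B           # the PEP 465 infix operator, dispatched via __matmul__
elementwise = A * B     # `*` stays element-by-element for arrays

print(after)        # [[19 22]
                    #  [43 50]]
print(elementwise)  # [[ 5 12]
                    #  [21 32]]
```

Because `@` is dispatched through `__matmul__`, any library's array or matrix class can opt into the same syntax.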
link |
01:30:42.240
But the reason was because nobody,
link |
01:30:44.760
it takes a lot of effort.
link |
01:30:46.200
And I learned this while I was writing NumPy.
link |
01:30:48.160
I also contributed to Python.
link |
01:31:49.320
I began working with python-dev,
link |
01:30:50.160
and I added some pieces to Python.
link |
01:30:52.320
Like the memoryview object.
link |
01:30:53.400
I wanted to get the structure of NumPy into Python.
link |
01:30:55.680
So we didn't get NumPy into Python,
link |
01:30:56.960
but we got the basic structure of it into Python.
link |
01:30:59.480
Like, so you could build on it.
link |
01:31:01.000
Nobody did for a while,
link |
01:31:01.880
but eventually database authors started to.
link |
01:31:04.720
And it's a lot better.
link |
01:31:05.760
They did.
link |
01:31:06.600
And also Antoine Pitrou and Stefan Krah
link |
01:31:08.960
actually fixed the memoryview object.
link |
01:31:10.760
Cause I wrote the underlying infrastructure in C,
link |
01:31:13.280
but the Python exposure was terrible
link |
01:31:15.520
until they came in and fixed it.
link |
01:31:16.640
Partly because I was writing NumPy,
link |
01:31:18.080
and NumPy was the Python exposure.
link |
01:31:19.960
I didn't really care about
link |
01:31:21.240
if you didn't have NumPy installed.
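The piece of NumPy's structure that did land in Python is the extended buffer protocol (PEP 3118), with `memoryview` as its Python-level face. A small sketch using the standard library's `array` module, plus a NumPy step (assumes NumPy is installed) to show another library building on the same memory without copying:

```python
import array
import numpy as np

# array.array exposes its memory through the buffer protocol ('d' = C double).
buf = array.array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
m = memoryview(buf)
print(m.format, m.itemsize, m.shape)  # d 8 (6,)

# Reinterpret the same bytes as a 2x3 grid: cast to bytes first, then to
# doubles with a new shape (memoryview.cast needs a byte format on one side).
grid = m.cast("B").cast("d", shape=[2, 3])
print(grid.shape, grid[1, 2])  # (2, 3) 5.0

# NumPy can wrap the buffer without copying, so writes show up in `buf`.
arr = np.asarray(grid)
arr[0, 0] = 42.0
print(buf[0])  # 42.0
```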
link |
01:31:22.800
Anyway, Guido was open to ideas,
link |
01:31:25.360
technologically brilliant.
link |
01:31:27.280
Like really, I really got a lot of respect for him
link |
01:31:29.440
when I saw what he did
link |
01:31:30.360
with this type class merger thing.
link |
01:31:33.320
It was actually tricky, right?
link |
01:31:35.200
And then willing to share, willing to share his ideas.
link |
01:31:38.400
So the other thing early on in 1998,
link |
01:31:40.200
I said, I wrote my first extension module.
link |
01:31:42.240
The reason I could is because he'd written this blog post
link |
01:31:44.800
on how to do reference counting, right?
link |
01:31:47.360
And without it, I would have been lost, right?
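Reference counting is the bookkeeping every C extension author has to get right, via `Py_INCREF` and `Py_DECREF` in the C API. This Python sketch is CPython-specific and only makes the counter visible with `sys.getrefcount`; the reported number includes the temporary reference held by the call itself, so only the differences between readings are meaningful.

```python
import sys

obj = object()
base = sys.getrefcount(obj)  # includes the argument's temporary reference

alias = obj                   # one more reference, like Py_INCREF in C
after_alias = sys.getrefcount(obj)

del alias                     # reference released, like Py_DECREF in C
after_del = sys.getrefcount(obj)

print(after_alias - base, after_del - base)  # 1 0
```

In C nothing does this automatically: forgetting a `Py_DECREF` leaks the object, and an extra one frees it while something still points at it.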
link |
01:31:50.040
But he was willing to at least try to write this post.
link |
01:31:53.240
And so he was motivated early on with Python.
link |
01:31:56.080
There's Computer Programming for Everybody.
link |
01:31:58.200
He kind of had this early desire to,
link |
01:31:59.880
oh, maybe we should be pushing programming to more people.
link |
01:32:02.040
So he had this populist notion, I guess,
link |
01:32:04.560
or populist sense. And I learned that there's a certain skill,
link |
01:32:08.720
and I've seen it in other people too,
link |
01:32:10.560
of engaging with contributors sufficiently to,
link |
01:32:13.960
because when somebody engages with you
link |
01:32:15.640
and wants to contribute to you,
link |
01:32:16.480
if you ignore them, they go away.
link |
01:32:18.400
So building that early contributor base
link |
01:32:19.760
requires real engagement with other people.
link |
01:32:23.320
And he would do that.
link |
01:32:24.520
Can you also comment on this tragic stepping down
link |
01:32:29.080
from his position as the benevolent dictator for life
link |
01:32:32.880
over the wars, you know?
link |
01:32:35.640
The Walrus operator?
link |
01:32:36.560
The Walrus operator was the last battle.
link |
01:32:39.200
I don't know if that's the cause of it,
link |
01:32:40.880
but there's this, for people who don't know,
link |
01:32:43.640
you can look up, there's the Walrus operator,
link |
01:32:45.640
which looks like a colon and equal sign.
link |
01:32:49.560
Yeah, colon, equal sign.
link |
01:32:50.800
And it actually does maybe the thing
link |
01:32:54.680
that an equal sign should be doing.
link |
01:32:57.560
Yeah, maybe, right, exactly.
link |
01:33:00.240
But it's just historically,
link |
01:33:02.080
equal sign means something else.
link |
01:33:03.560
It just means assignment.
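For readers who haven't seen it, a short sketch of the difference (PEP 572, Python 3.8+): `=` is a statement, while `:=` is an assignment expression that binds a name and also evaluates to the value, so it can sit inside a condition or a comprehension. The sample data is made up.

```python
import re

line = "error code 404"

# Bind the match object and test it in one expression.
if (match := re.search(r"\d+", line)) is not None:
    code = int(match.group())

# Compute x * 2 once, filter on it, and keep it.
values = [3, 8, 1, 9]
doubled_big = [y for x in values if (y := x * 2) > 5]

print(code, doubled_big)  # 404 [6, 16, 18]

# By contrast, `if (match = re.search(...))` is a SyntaxError:
# plain `=` cannot appear inside an expression.
```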
link |
01:33:05.240
So he stepped down over this.
link |
01:33:07.280
What do you think about the pressure of leadership?
link |
01:33:10.360
It's something that, you mentioned the letter I wrote
link |
01:33:12.280
in NumPy at the time.
link |
01:33:13.640
That was a hard time, actually.
link |
01:33:15.240
I mean, there's been really hard times.
link |
01:33:17.080
It was hard.
link |
01:33:19.520
You get criticized, right?
link |
01:33:20.840
And you get pushed, and you get,
link |
01:33:22.800
not everybody loves what you do.
link |
01:33:23.800
Like anytime you do anything that has impact at all,
link |
01:33:26.880
you're not universally loved, right?
link |
01:33:28.560
You get some real critics.
link |
01:33:29.760
And that's an important energy,
link |
01:33:31.960
because it's impossible for you to do everything right.
link |
01:33:35.080
You need people to be pushing.
link |
01:33:37.160
But sometimes people can get mean, right?
link |
01:33:39.320
People can, I prefer to give people the benefit of the doubt.
link |
01:33:43.080
I don't immediately assume they have bad intentions.
link |
01:33:45.800
And maybe for other, maybe that doesn't happen for everybody.
link |
01:33:49.000
For whatever reason, their past,
link |
01:33:50.200
their experiences with people, they sometimes have bad,
link |
01:33:53.040
so they immediately attribute to you bad intentions.
link |
01:33:54.880
So you're like, where did this come from?
link |
01:33:56.080
I mean, I'm definitely open to criticism,
link |
01:33:57.760
but I think you're misinterpreting the whole point.
link |
01:34:00.520
Because I would get that, certainly when I started Anaconda.
link |
01:34:05.800
Sometimes I say to people,
link |
01:34:08.520
I care enough about entrepreneurship
link |
01:34:09.760
to make some open source people uncomfortable.
link |
01:34:12.240
And I care enough about open source
link |
01:34:13.520
to make investors uncomfortable.
link |
01:34:15.560
So I sort of, you create kind of doubters on both sides.
link |
01:34:19.880
So when you have, and this is just a plea
link |
01:34:23.840
to the listener and the public, I've noticed this too,
link |
01:34:27.680
that there's a tendency, and social media makes this worse,
link |
01:34:32.680
when you don't have perfect information about the situation,
link |
01:34:35.560
you tend to fill the gaps with the worst possible,
link |
01:34:39.280
or at least a bad story that fills those gaps.
link |
01:34:43.080
And I think it's good to live life,
link |
01:34:46.960
maybe not fully naively, but filling in the gaps
link |
01:34:49.760
with the good, with the best, with the positive,
link |
01:34:54.720
with the hopeful explanation of why you see this.
link |
01:34:57.280
So if you see somebody like you trying to make money
link |
01:35:00.280
on a book about NumPy,
link |
01:35:01.960
there's a million stories around that that are positive.
link |
01:35:04.880
And those are good to think about,
link |
01:35:07.840
to project positive intent on the people.
link |
01:35:10.600
Because for many reasons, usually because people are good
link |
01:35:13.960
and they do have good intent.
link |
01:35:15.560
And also when you project that positive intent,
link |
01:35:17.480
people will step up to that too.
link |
01:35:19.400
Yes.
link |
01:35:20.240
It's a great point.
link |
01:35:21.760
It has this kind of viral nature to it.
link |
01:35:24.320
And of course Twitter early on figured out,
link |
01:35:27.720
and Facebook too, that they can make a lot of money
link |
01:35:30.360
and engagement from the negative.
link |
01:35:32.280
Yes.
link |
01:35:33.120
So there's this, we're fighting this mechanism.
link |
01:35:35.440
I agree.
link |
01:35:36.280
Which is challenging.
link |
01:35:37.120
It's easier.
link |
01:35:37.940
It's just easier to be.
link |
01:35:38.780
To be negative.
link |
01:35:39.620
And then for some reason, something in our minds
link |
01:35:41.920
really enjoys sharing that and getting all excited
link |
01:35:45.280
about the negativity.
link |
01:35:46.280
We do, yeah.
link |
01:35:47.400
Some protective mechanism perhaps that we're gonna get eaten
link |
01:35:50.440
if we don't, yeah.
link |
01:35:51.280
Exactly.
link |
01:35:52.100
For us to be effective as a group of people
link |
01:35:53.200
in a software engineering project,
link |
01:35:54.600
you have to project positive intent, I think.
link |
01:35:56.860
I totally agree.
link |
01:35:57.820
Totally agree.
link |
01:35:58.660
And I think that's very,
link |
01:35:59.480
and so that happens in this space.
link |
01:36:01.640
But Python has done a reasonable job in the past,
link |
01:36:03.840
but here is a situation where I think it started
link |
01:36:05.920
to get this pressure where it didn't.
link |
01:36:07.840
I really didn't, I didn't know enough about what happened.
link |
01:36:10.440
I've talked to several people about it.
link |
01:36:12.160
And I know most of the steering committee members today,
link |
01:36:15.840
one person nominated me for that role,
link |
01:36:17.880
but it's the wrong role for me right now, right?
link |
01:36:20.880
I have a lot of respect for the Python developer space
link |
01:36:24.040
and the Python developers.
link |
01:36:25.440
I also understand the gap between computer science
link |
01:36:27.600
Python developers and array programming developers
link |
01:36:30.440
or science developers.
link |
01:36:31.440
And in fact, Python succeeds in the array space
link |
01:36:34.560
the more it has people in that boundary.
link |
01:36:36.520
And there's often very few.
link |
01:36:37.960
Like I was playing a role in that boundary
link |
01:36:39.440
and working like everything to try to keep up
link |
01:36:42.600
with even what Guido was saying, like I'm a C programmer,
link |
01:36:47.720
but not a computer scientist.
link |
01:36:49.080
Like I was an engineer and physicist and mathematician,
link |
01:36:52.600
and I didn't always understand
link |
01:36:54.840
what they were talking about
link |
01:36:56.360
and why they would have opinions the way they did.
link |
01:36:58.360
So, you know, you have to listen and try to understand.
link |
01:37:00.280
Then you also have to explain your point of view
link |
01:37:02.120
in a way they can understand.
link |
01:37:03.560
And that takes a lot of work.
link |
01:37:04.840
And that communication is always the challenge.
link |
01:37:07.920
And it's just what we're describing here
link |
01:37:09.200
about the negativity is just another form of that.
link |
01:37:11.520
Like how do we come together?
link |
01:37:12.560
And it does appear we're wired anyway
link |
01:37:14.520
to at least have a, there's a part of us
link |
01:37:16.560
that will enemy, you know, friend, enemy.
link |
01:37:18.880
And we see, yeah, it's like,
link |
01:37:21.360
why are we erring on the enemy front?
link |
01:37:23.520
So why are we pushing that?
link |
01:37:24.760
Why are we promoting that so deeply?
link |
01:37:26.680
Assume friend until proven otherwise.
link |
01:37:28.440
Yes, yes.
link |
01:37:30.000
So, cause you have such a fascinating mind in all of this.
link |
01:37:32.160
Let me just ask you these questions.
link |
01:37:34.160
So one interesting side on the Python history
link |
01:37:38.000
is the move from Python two to Python three.
link |
01:37:41.000
You mentioned move from Python one to Python two,
link |
01:37:43.720
but the move from Python two to Python three
link |
01:37:46.800
is a little bit interesting
link |
01:37:47.920
because it took a very long time.
link |
01:37:50.040
It broke, you know, backward compatibility
link |
01:37:53.520
in quite a small way, but even that small break
link |
01:37:56.280
seemed to have been very painful for people.
link |
01:37:58.680
Are there lessons you draw?
link |
01:38:00.640
Oh man, tons of lessons.
link |
01:38:01.480
From how long it took and how painful it seemed to be?
link |
01:38:05.520
Yeah, tons of lessons.
link |
01:38:07.000
Well, I mentioned here earlier
link |
01:38:08.240
that NumPy was written in 2005.
link |
01:38:11.840
It was in 2005 that I actually went to Guido
link |
01:38:15.520
to talk about getting NumPy into Python three.
link |
01:38:17.240
Like my strategy was to,
link |
01:38:18.880
oh, we were moving to Python three.
link |
01:38:19.960
Let's have that be, and it seems funny in retrospect
link |
01:38:22.200
because like, wait, Python three,
link |
01:38:23.360
that was in 2020, right?
link |
01:38:25.480
When we finally ended the support for Python two
link |
01:38:27.760
or at least 2017.
link |
01:38:29.000
The reason it took a long time,
link |
01:38:30.880
a lot of time, I think it was because one of the things is
link |
01:38:33.320
there wasn't much to like about Python three.
link |
01:38:36.240
3.0, 3.1, it really wasn't until 3.3.
link |
01:38:40.280
Like I consider Python 3.3 to be Python 3.0.
link |
01:38:43.600
But it wasn't until Python 3.3
link |
01:38:44.880
that I felt there's enough stuff in it
link |
01:38:47.200
to make it worth anybody using it, right?
link |
01:38:49.800
And then 3.4 started to be, oh yeah, I want that.
link |
01:38:52.600
And then 3.5 has the matrix multiply operator,
link |
01:38:54.880
and now it's like, okay, we gotta use that.
link |
01:38:56.520
Plus the libraries that started leveraging
link |
01:38:58.400
some of the features of Python three.
link |
01:38:59.600
Exactly.
link |
01:39:00.760
So it really, the challenge was it was,
link |
01:39:03.800
but it also illustrated a truism that, you know,
link |
01:39:07.400
when you have inertia,
link |
01:39:08.240
when you have a group of people using something,
link |
01:39:10.480
it's really hard to move them away from it.
link |
01:39:11.960
You can't just change the world on them.
link |
01:39:13.920
And Python three, you know, made some,
link |
01:39:15.440
I think it fixed some things Guido had always hated.
link |
01:39:17.240
I think he didn't like the fact
link |
01:39:18.440
that print was a statement.
link |
01:39:19.440
He wanted to make it a function.
link |
01:39:20.760
But in some sense, that's a bit of gratuitous change
link |
01:39:23.200
to the language.
link |
01:39:24.120
And you could argue, and people have,
link |
01:39:27.320
but one of the challenges was there wasn't enough features
link |
01:39:31.520
and too many just changes without features.
link |
01:39:34.960
And so the empathy for the end user
link |
01:39:37.440
as to why they would switch wasn't there.
link |
01:39:40.480
I think also it illustrated just the funding realities.
link |
01:39:42.960
Like Python wasn't funded.
link |
01:39:45.040
Like it was also a project
link |
01:39:46.160
with a bunch of volunteer labor, right?
link |
01:39:48.280
It had more people, so more volunteer labor,
link |
01:39:50.240
but it was still, it was funded in the sense
link |
01:39:52.240
that at least Guido had a job.
link |
01:39:53.480
And I've learned some of the behind the scenes on that now
link |
01:39:55.880
since talking to people who have lived through it
link |
01:39:57.840
and maybe not on air, we can talk about some of that.
link |
01:40:00.560
But it's interesting to see, but Guido had a job,
link |
01:40:03.640
but his full time job wasn't just work on Python.
link |
01:40:07.080
Like he had other things to do.
link |
01:40:08.880
Just wild.
link |
01:40:09.880
It is wild, isn't it?
link |
01:40:10.720
It's wild how few people are funded.
link |
01:40:13.320
Yes.
link |
01:40:14.160
And how much impact they have.
link |
01:40:15.200
Yes.
link |
01:40:16.160
Maybe that's a feature not a bug, I don't know.
link |
01:40:17.920
Maybe, yes, exactly.
link |
01:40:19.080
At least early on, like it's sort of, I know, yeah.
link |
01:40:21.840
It's like Olympic athletes are often severely underfunded,
link |
01:40:25.160
but maybe that's what brings out the greatness.
link |
01:40:27.360
Perhaps, yes, correct.
link |
01:40:28.520
No, exactly.
link |
01:40:29.680
Maybe this is the essential part of it.
link |
01:40:31.880
Because I do think about that in terms of,
link |
01:40:33.680
I currently have an incubator for open source startups.
link |
01:40:36.200
Like what I'm trying to do right now
link |
01:40:37.640
is create the environment I wished had existed
link |
01:40:40.480
when I was leaving academia with NumPy
link |
01:40:42.880
and trying to figure out what to do.
link |
01:40:44.120
I'm trying to create those opportunities and environments.
link |
01:40:46.120
So, and that's what drives me still,
link |
01:40:49.320
is how do I make the world easier
link |
01:40:50.760
for the open source entrepreneur?
link |
01:40:52.600
So let me stay, I mean, I could probably stay on NumPy
link |
01:40:55.960
for a long time, but this is a fun question.
link |
01:41:00.960
So Andrej Karpathy leads the Tesla Autopilot team,
link |
01:41:04.680
and he's also one of the most like legit programmers I know.
link |
01:41:10.720
It's like he builds stuff from scratch a lot,
link |
01:41:13.760
and that's how he builds intuition about how a problem works.
link |
01:41:16.200
He just builds it from scratch, and I always love that.
link |
01:41:18.320
And the primary language he uses is Python
link |
01:41:21.320
for the intuition building.
link |
01:41:23.080
But he posted something on Twitter saying
link |
01:41:27.600
that they got a significant improvement
link |
01:41:31.280
on some aspect of their like data loading, I think,
link |
01:41:35.640
by switching away from np.sqrt,
link |
01:41:39.840
so NumPy's implementation of square root,
link |
01:41:42.160
to math.sqrt, and then somebody else commented
link |
01:41:44.520
that you can get even a much greater improvement
link |
01:41:48.120
by using the vanilla Python square root, which is like.
link |
01:41:52.600
Power 0.5.
link |
01:41:53.640
Power 0.5.
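To check the scalar comparison being discussed on your own machine, here is a small timing sketch. It assumes only NumPy and the standard library. The input value and call count are arbitrary choices, and the ranking you see may differ by Python and NumPy version, so the code prints timings rather than asserting an order.

```python
import math
import timeit

import numpy as np

x = 12345.678  # a single plain Python float

# Three ways to take the square root of one scalar.
candidates = {
    "np.sqrt": lambda: np.sqrt(x),
    "math.sqrt": lambda: math.sqrt(x),
    "x ** 0.5": lambda: x ** 0.5,
}

# All three agree numerically (up to floating-point rounding).
results = {name: fn() for name, fn in candidates.items()}

# Time each one; absolute numbers depend on the machine and versions.
timings = {name: timeit.timeit(fn, number=100_000) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>10}: {seconds:.4f} s per 100k calls")
```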
link |
01:41:55.200
And it's fascinating to me, I just wanted to.
link |
01:41:58.640
So that was some shade throwing at some.
link |
01:42:02.080
No, no, and yes, we're talking about.
link |
01:42:04.640
It's a good way to ask about the trade off
link |
01:42:08.080
between usability and efficiency broadly in NumPy,
link |
01:42:12.080
but also on these specific weird quirks
link |
01:42:14.920
of like a single function.
link |
01:42:16.680
Yep, so on that point, if you use a NumPy math function
link |
01:42:21.360
on a scalar, it's gonna be slower
link |
01:42:25.000
than using a Python function on that scalar.
link |
01:42:27.960
That's because the math object in NumPy is more complicated,
link |
01:42:33.800
because you can also call that math object on an array.
link |
01:42:36.760
And so effectively, it goes through similar machinery.
link |
01:42:39.200
There aren't enough of those shortcuts, which you could add,
link |
01:42:45.960
like checks and fast paths.
link |
01:42:45.960
So yeah, if you're basically doing a list,
link |
01:42:48.800
if you run over a list, in fact,
link |
01:42:50.680
for problems that are less than 1,000,
link |
01:42:53.700
even maybe 10,000 is probably the,
link |
01:42:55.320
if you're going more than 10,000,
link |
01:42:56.900
that's where you definitely need to be using arrays.
link |
01:42:59.080
But if you're less than that, and for reading,
link |
01:43:01.200
if you're doing a reading process
link |
01:43:02.760
and essentially it's not compute bound, it's IO bound.
link |
01:43:05.600
And so you're really taking lists of 1,000 at a time
link |
01:43:08.480
and doing work on it.
link |
01:43:09.540
Yeah, you could be faster just using Python,
link |
01:43:11.680
straight up Python.
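The size rule of thumb above can be expressed as a tiny dispatch function. This is a sketch only: `sqrt_all` and `ARRAY_THRESHOLD` are hypothetical names. The 10,000 cutoff is taken loosely from the range mentioned in the conversation, and the real crossover point depends on the operation and the hardware.

```python
import math

import numpy as np

# Hypothetical cutoff, loosely based on the "1,000 to 10,000" range mentioned above.
ARRAY_THRESHOLD = 10_000


def sqrt_all(values):
    """Return square roots of `values` as a list, picking a strategy by size."""
    if len(values) < ARRAY_THRESHOLD:
        # Small input: a plain Python loop avoids array-creation overhead.
        return [math.sqrt(v) for v in values]
    # Large input: convert once, let the ufunc loop run in compiled code,
    # then convert back to plain Python floats.
    return np.sqrt(np.asarray(values, dtype=float)).tolist()


print(sqrt_all([1.0, 4.0, 9.0]))
```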
link |
01:43:12.740
See, but also, and this is an aside to the topic,
link |
01:43:16.640
there's the fundamental questions
link |
01:43:18.680
when you look at the long arc of history,
link |
01:43:21.240
it's very possible that np.sqrt is much faster.
link |
01:43:25.560
It could be.
link |
01:43:26.400
So like in terms of like, don't worry about it,
link |
01:43:29.480
it's the evils of over optimization or whatever,
link |
01:43:32.420
all the different quotes around that,
link |
01:43:34.040
is sometimes obsessing about this particular little quirk
link |
01:43:39.520
is not sufficient.
link |
01:43:41.720
For somebody like, if you're trying to optimize your path,
link |
01:43:45.220
I mean, I agree, premature optimization
link |
01:43:47.680
creates all kinds of challenges, right?
link |
01:43:49.320
Because now, but you may have to do it.
link |
01:43:51.840
I believe the quote is, it's the root of all evil.
link |
01:43:53.880
It's the root of all evil, right?
link |
01:43:55.560
That's Donald Knuth, I think,
link |
01:43:57.040
or is it somebody else?
link |
01:43:59.160
Well, Don Knuth is kind of like Mark Twain,
link |
01:44:00.800
people just attribute stuff to him, I don't know.
link |
01:44:02.880
And it's fine because he's brilliant.
link |
01:44:04.640
So, no, I was a LaTeX user myself,
link |
01:44:07.640
and so I have a lot of respect,
link |
01:44:09.280
and he did more than that, of course,
link |
01:44:10.820
but yeah, someone I really appreciate
link |
01:44:14.120
in the computer science space.
link |
01:44:15.640
Yeah, I don't, I think that's appropriate.
link |
01:44:17.080
There's a lot of little things like that,
link |
01:44:18.320
where people actually, if you understood it,
link |
01:44:20.120
you go, yeah, of course, that's the case.
link |
01:44:22.640
And the other part, the other part I didn't mention,
link |
01:44:25.040
and Numba was a thing we wrote early on,
link |
01:44:27.960
and I was really excited by Numba
link |
01:44:29.040
because it's something we wanted,
link |
01:44:30.040
it was a compiler for Python syntax,
link |
01:44:32.160
and I wanted it from the beginning of writing NumPy
link |
01:44:35.440
because of this function question,
link |
01:44:38.280
like taking, the power of arrays
link |
01:44:41.900
is really that you can write functions using all of it.
link |
01:44:45.120
It has implicit looping, right?
link |
01:44:47.000
So you don't worry about,
link |
01:44:47.840
I write this n dimensional for loop
link |
01:44:49.200
with for loops, four for statements.
link |
01:44:51.240
You just say, oh, big four dimensional array,
link |
01:44:53.600
I'm gonna do this operation, this plus, this minus,
link |
01:44:55.760
this reduction, and you get this,
link |
01:44:57.680
it's called vectorization in other areas,
link |
01:44:59.560
but you can basically think at a high level
link |
01:45:01.440
and get massive amounts of computation done
link |
01:45:03.640
with the added benefit of,
link |
01:45:06.200
oh, it can be parallelized easily.
link |
01:45:08.040
It can be put in parallel.
link |
01:45:09.040
You don't have to think about that.
link |
01:45:10.000
In fact, it's worse to go decompose your,
link |
01:45:12.720
you write the for loops
link |
01:45:14.160
and then try to infer parallelism from for loops.
link |
01:45:16.280
That's actually a harder problem
link |
01:45:17.600
than to take the array problem
link |
01:45:19.640
and just automatically parallelize that problem.
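As an illustration of the implicit-looping point, here is a sketch that contrasts four nested for statements with one array expression on a small four-dimensional array. The shapes and random data are arbitrary choices. The array form makes clear that every element is independent, which is what lets tools parallelize it. Plain NumPy generally still runs an elementwise ufunc like this on a single thread.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4, 5, 6))  # a small four-dimensional array
b = rng.random((3, 4, 5, 6))


def add_with_loops(a, b):
    # Explicit version: one for statement per dimension.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                for m in range(a.shape[3]):
                    out[i, j, k, m] = a[i, j, k, m] + b[i, j, k, m]
    return out


# Array version: the looping is implicit and runs in NumPy's compiled code.
vectorized = a + b
total = (a - b).sum()  # a reduction over every element, also without a loop
```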
link |
01:45:22.040
That's what, and so functions in NumPy
link |
01:45:25.320
are called universal functions, ufuncs.
link |
01:45:27.080
So square root is an example of a ufunc.
link |
01:45:29.000
There are others, sine, cosine, add, subtract.
link |
01:45:32.400
In fact, one of the first libraries in SciPy
link |
01:45:34.520
was something called scipy.special
link |
01:45:35.520
where I added Bessel functions
link |
01:45:36.920
and all these special functions that come up in physics
link |
01:45:40.240
and I added them as ufuncs so they could work on arrays.
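Here is a small sketch of what "being a ufunc" means in practice, using only NumPy so it runs without SciPy installed. The functions in `scipy.special`, such as the Bessel function `scipy.special.jv`, are ufuncs in the same sense. The sample values are arbitrary.

```python
import numpy as np

# Square root, sine, addition, and subtraction are all NumPy universal functions.
for f in (np.sqrt, np.sin, np.add, np.subtract):
    print(f.__name__, isinstance(f, np.ufunc), "inputs:", f.nin)

# A ufunc applies elementwise to arrays of any shape...
angles = np.array([[0.0, np.pi / 2], [np.pi, 3 * np.pi / 2]])
sines = np.sin(angles)

# ...and binary ufuncs carry extra methods such as reduce and accumulate.
values = np.array([1, 2, 3, 4])
total = np.add.reduce(values)        # same as values.sum()
running = np.add.accumulate(values)  # running sums
```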
link |
01:45:43.040
So I understood ufuncs very, very well
link |
01:45:44.720
from day one inside of Numeric.
link |
01:45:45.960
That was one of the things we tried to make better
link |
01:45:47.320
in NumPy was how do they work?
link |
01:45:49.120
Can they do broadcasting?
link |
01:45:50.360
What does broadcasting mean?
link |
01:45:51.960
But one of the problems is, okay,
link |
01:45:54.600
what do I do with a Python scalar?
link |
01:45:57.320
So what happens, the Python scalar gets broadcast
link |
01:45:59.800
to a zero dimensional array
link |
01:46:01.320
and then it goes through the whole same machinery
link |
01:46:02.800
as if it were a 10,000 dimensional array.
link |
01:46:05.080
And then it kind of unpacks the element
link |
01:46:07.640
and then does the addition.
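One visible trace of the scalar path described here is the return type. A minimal sketch follows, with an arbitrary input value. The comments simplify NumPy's internal steps and are not an exact account of its dispatch code.

```python
import math

import numpy as np

x = 2.0  # a plain Python float

# math.sqrt works directly on the C double inside the Python float.
r_math = math.sqrt(x)

# np.sqrt first converts x into a NumPy value, runs the generic ufunc
# machinery (type resolution, broadcasting, the inner loop), and returns a
# NumPy array scalar rather than a Python float.
r_np = np.sqrt(x)

# Passing an explicit zero-dimensional array goes through the same ufunc.
zero_d = np.array(x)
r_0d = np.sqrt(zero_d)

print(type(r_math).__name__, type(r_np).__name__, zero_d.ndim)
```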
link |
01:46:09.880
That's not to mention the function it calls
link |
01:46:12.600
in the case of square root
link |
01:46:13.640
is just the C library square root, right?
link |
01:46:15.960
In some cases, like Python's power,
link |
01:46:18.160
there's some optimizations they're doing
link |
01:46:20.360
that could be faster
link |
01:46:21.520
than just calling the C library square root.
link |
01:46:23.760
In the interpreter or in the?
link |
01:46:25.320
No, in the C code, in the Python runtime.
link |
01:46:27.640
In the Python runtime, so they really optimize it
link |
01:46:30.960
and they have the freedom to do that
link |
01:46:32.120
because they don't have to worry about.
link |
01:46:32.960
It's just a scalar.
link |
01:46:34.080
It's just a scalar.
link |
01:46:34.920
Right, they don't have to worry about the fact
link |
01:46:36.200
that, oh, this could be an object with many pieces.
link |
01:46:39.360
The ufunc machinery is also generic
link |
01:46:41.080
in the sense of typecasting and broadcasting,
link |
01:46:44.600
broadcasting's idea of I'm gonna go,
link |
01:46:46.160
I have a zero dimensional array,
link |
01:46:47.360
I have a scalar with a four dimensional array
link |
01:46:49.240
and I add them.
link |
01:46:50.480
Oh, I have to kind of coerce the shape of this guy
link |
01:46:54.640
to make it work against the whole four dimensional array.
link |
01:46:56.880
So it's the idea of I can do a one dimensional array
link |
01:46:59.680
against a two dimensional array and have it make sense.
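The broadcasting idea just described (a scalar or a one-dimensional array "stretched" to match a larger array) looks like this in NumPy. The shapes and values are illustrative.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])         # shape (3,)
scalar = 100                         # behaves like a 0-d array

# The 1-D row is stretched along the first axis to match shape (2, 3).
shifted = matrix + row

# The scalar is stretched to every position.
lifted = matrix + scalar

# The stretching is virtual: broadcast_to returns a view without copying,
# and its first stride is 0, so both rows share the same memory.
view = np.broadcast_to(row, matrix.shape)
print(view.shape, view.strides)
```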
link |
01:47:02.200
Well, that's what NumPy does is it challenges you
link |
01:47:04.040
to reformulate, rethink your problem
link |
01:47:07.040
as a multi dimensional array problem
link |
01:47:09.080
and move away from scalars completely.
link |
01:47:12.640
Right, exactly, exactly.
link |
01:47:14.240
In fact, that's where some of the edge-case boundaries are
link |
01:47:16.680
is that, well, they're still there
link |
01:47:18.960
and this is where array scalars are a particular problem.
link |
01:47:21.080
So array scalars are particularly bad
link |
01:47:23.120
in the sense that they were written
link |
01:47:24.360
so that you could optimize the math on them,
link |
01:47:26.840
but that hasn't happened.
link |
01:47:29.040
And so their default is to coerce the array scalar
link |
01:47:32.800
to a zero dimensional array
link |
01:47:33.760
and then use the NumPy machinery.
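A practical consequence of array scalars: iterating over an array hands you NumPy scalars, not Python floats. Below is a minimal sketch of the common workaround in pure-Python loops, converting once with `.tolist()`. The size of the speed gap varies by NumPy version, so time it rather than assume it.

```python
import numpy as np

data = np.linspace(0.0, 1.0, 5)

# Iterating over an array yields NumPy array scalars (np.float64 here),
# and arithmetic on them goes through NumPy's scalar machinery.
first = next(iter(data))

# .tolist() converts everything to plain Python floats in one step.
plain = data.tolist()

total_np = 0.0
for v in data:   # each v is an np.float64
    total_np += v * v

total_py = 0.0
for v in plain:  # each v is a Python float
    total_py += v * v
```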
link |
01:47:36.000
That's what, and you could specialize,
link |
01:47:38.200
but it doesn't happen all the time.
link |
01:47:39.960
So in fact, when we first wrote Numba,
link |
01:47:41.760
we'd do comparisons and say, look, it's a 1000X speedup.
link |
01:47:45.720
We were lying a little bit in the sense that,
link |
01:47:47.160
well, first you take the 40X slowdown
link |
01:47:50.240
of using the array scalars inside of a loop.
link |
01:47:52.280
'Cause if you'd just used Python scalars,
link |
01:47:53.560
you'd already be 10 times faster.
link |
01:47:56.200
But then we would get a hundred times faster
link |
01:47:58.080
over that using just compilation.
link |
01:48:00.320
But what we do is compile the loop
link |
01:48:01.600
out of the interpreter into machine code.
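To make "compile the loop out of the interpreter" concrete, here is a sketch using Numba's `njit` decorator. The `try`/`except` fallback is an addition for this example only, so it still runs as plain Python when Numba is not installed. It is not part of Numba itself. `sum_of_squares` is a hypothetical example function.

```python
import numpy as np

try:
    # numba.njit compiles the decorated function to machine code on first call.
    from numba import njit
except ImportError:
    # Fallback for this sketch only: leave the function as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit scalar loop: slow in CPython, fast once Numba compiles it.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1_000, dtype=np.float64)
result = sum_of_squares(data)
```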
link |
01:48:04.000
And then that's always been the power of Python
link |
01:48:06.280
is this extensibility so that you can,
link |
01:48:08.280
cause people say, oh, Python's so slow.
link |
01:48:09.680
Well, sure, if you do all your logic
link |
01:48:11.520
in the runtime of the Python interpreter, yeah.
link |
01:48:13.920
But the power is that you don't have to.
link |
01:48:15.800
You write all the logic,
link |
01:48:17.260
what you do in the high level is just high level logic.
link |
01:48:19.860
And the actual calls you're making
link |
01:48:21.920
could be on gigabyte arrays of data.
link |
01:48:24.400
And that's all done at compiled speeds.
link |
01:48:26.880
And the fact is that integration, one, can happen,
link |
01:48:30.320
but two, is separable.
link |
01:48:32.420
That's one of the, a language like Julia says,
link |
01:48:35.240
we're going to be all in one.
link |
01:48:36.380
You can do all of it together.
link |
01:48:37.400
And then there's, the jury's out, is that possible?
link |
01:48:39.880
I tend to think that you're going to,
link |
01:48:41.760
there's separate concerns there.
link |
01:48:43.280
You want to precompile.
link |
01:48:44.320
In fact, generally you will want to precompile your,
link |
01:48:47.560
some of your loops.
link |
01:48:48.400
Like SciPy has a compilation step.
link |
01:48:50.160
To install SciPy, it takes about two hours.
link |
01:48:53.240
If you have many machines,
link |
01:48:54.080
maybe you can get it down to one hour.
link |
01:48:55.440
But to compile those libraries takes about, takes a while.
link |
01:48:57.920
You don't want to do that at runtime.
link |
01:48:59.920
You don't want to do that all the time.
link |
01:49:00.800
You want to have this precompiled binary available
link |
01:49:02.720
that you're then just linking into.
link |
01:49:04.400
So there's real questions about the whole source code.
link |
01:49:09.040
Code is, running binary code is more than source code.
link |
01:49:11.840
It's creating object code, it's the linker, it's the loader,
link |
01:49:14.480
it's how that gets interpreted
link |
01:49:15.600
inside of virtual memory space.
link |
01:49:17.640
There's a lot of details there that actually
link |
01:49:19.160
I didn't understand for a long time
link |
01:49:20.520
until I read books on the topic.
link |
01:49:23.000
And it led to, the more you know, the better off you are
link |
01:49:27.060
and you can do more details,
link |
01:49:28.440
but sometimes it helps with abstractions too.
link |
01:49:31.280
Well, the problem, as we mentioned earlier
link |
01:49:33.480
with abstractions is you kind of sometimes assume
link |
01:49:37.700
that whoever implemented this thing
link |
01:49:41.520
had your case in mind and found the optimal solution.
link |
01:49:45.000
Yes.
link |
01:49:45.840
Or like you assume certain things.
link |
01:49:47.320
I mean, there's a lot of,
link |
01:49:48.160
Correct.
link |
01:49:49.000
One of the really powerful things to me early on,
link |
01:49:52.800
I mean, it sounds silly to say, but with Python,
link |
01:49:55.480
probably one of the reasons I fell in love with it
link |
01:49:58.440
is dictionaries.
link |
01:49:59.800
Yes.
link |
01:50:00.920
So obviously probably most languages
link |
01:50:03.680
have some mapping concept,
link |
01:50:06.440
but it felt like it was a first class citizen
link |
01:50:09.040
and it was just my brain was able to think in dictionaries.
link |
01:50:12.200
But then there's the thing that I guess I still use
link |
01:50:14.640
to this day is ordered dictionaries
link |
01:50:16.920
because that seems like a more natural way
link |
01:50:20.120
to construct dictionaries.
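A short note on ordered dictionaries: since Python 3.7, plain `dict` guarantees insertion order. `collections.OrderedDict` still offers order-aware extras. A minimal sketch, with illustrative keys:

```python
from collections import OrderedDict

# Since Python 3.7, plain dicts keep insertion order as a language guarantee.
config = {"alpha": 1, "beta": 2}
config["gamma"] = 3
keys_in_order = list(config)

# OrderedDict adds order-aware operations that dict lacks.
recent = OrderedDict(config)
recent.move_to_end("alpha")                 # reorder without delete/reinsert
oldest_key, _ = recent.popitem(last=False)  # pop from the front

# Equality between OrderedDicts is order-sensitive; between dicts it is not.
same_dict = {"a": 1, "b": 2} == {"b": 2, "a": 1}
same_ordered = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
```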
link |
01:50:21.680
Yeah.
link |
01:50:22.520
And from a computer science perspective,
link |
01:50:23.720
the running time cost is not that significant,
link |
01:50:26.000
but there's a lot of things to understand about dictionaries
link |
01:50:30.400
that the abstraction kind of
link |
01:50:33.800
doesn't necessarily incentivize you to understand.
link |
01:50:37.400
Right, do you really understand the notion of a hash map
link |
01:50:39.400
and how the dictionary is implemented?
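For readers who want the hash map idea spelled out, here is a deliberately simple sketch using separate chaining. `ToyHashMap` is a hypothetical teaching class. CPython's `dict` does not work this way internally: it uses open addressing with a compact, insertion-ordered layout.

```python
class ToyHashMap:
    """Toy hash map with separate chaining (not how CPython's dict works)."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]
        self._size = 0

    def _bucket(self, key):
        # hash() maps the key to an integer; modulo picks a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self._size

    def _resize(self):
        # Double the bucket count and rehash every entry so chains stay short.
        items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in items:
            self._bucket(k).append((k, v))


m = ToyHashMap()
for i in range(100):
    m[f"k{i}"] = i
```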
link |
01:50:41.080
But you're right.
link |
01:50:42.080
Dictionaries are a good example
link |
01:50:43.440
of an abstraction that's powerful.
link |
01:50:44.920
And I agree with you.
link |
01:50:46.000
I agree, I love dictionaries too.
link |
01:50:47.800
Took me a while to understand that once you do,
link |
01:50:49.160
you realize, oh, they're everywhere.
link |
01:50:50.280
And Python uses them everywhere too.
link |
01:50:52.760
Like it's actually constructed,
link |
01:50:54.240
one of the foundational things is dictionaries
link |
01:50:55.760
and it does everything with dictionaries.
link |
01:50:57.560
So it is, it's powerful.
link |
01:50:58.600
Ordered dictionaries came later,
link |
01:51:00.160
but it is very, very powerful.
link |
01:51:02.200
It took me a little while coming
link |
01:51:03.400
from just the array programming entirely
link |
01:51:05.960
to understand these other objects,
link |
01:51:07.360
like dictionaries and lists and tuples and binary trees.
link |
01:51:11.600
Like I said, I wasn't a computer scientist,
link |
01:51:13.360
I studied arrays first.
link |
01:51:15.120
And so I was very array centric.
link |
01:51:16.800
And you realize, oh, these others
link |
01:51:17.960
do have purposes and value actually.
link |
01:51:21.200
I agree.
link |
01:51:22.040
There's a friendliness about,
link |
01:51:24.320
like one way to think about arrays
link |
01:51:26.760
is arrays are just like full of numbers,
link |
01:51:31.920
but to make them accessible to humans
link |
01:51:35.000
and make them less error prone to human users,
link |
01:51:38.700
sometimes you want to attach names,
link |
01:51:41.480
human interpretable names
link |
01:51:43.120
that are sticky to those arrays.
link |
01:51:44.720
So that's how you start to think about dictionaries
link |
01:51:47.160
is you start to convert numbers
link |
01:51:50.520
into something that's human interpretable.
link |
01:51:52.120
And that's actually the tension I've had with NumPy
link |
01:51:55.320
because I've built so much tooling
link |
01:51:58.160
around human interpretability
link |
01:52:02.320
and also protecting me a year later
link |
01:52:07.960
from making the mistakes, by being,
link |
01:52:07.960
I wanted to force myself to use English versus numbers.
link |
01:52:12.880
Yes, so there's a project called Labeled Arrays.
link |
01:52:15.680
Like very early it was recognized that,
link |
01:52:18.040
oh, we're indexing NumPy with just numbers,
link |
01:52:21.320
all the columns and particularly the dimensions.
link |
01:52:23.640
I mean, if you have an image,
link |
01:52:25.520
you don't necessarily need to label each column or row,
link |
01:52:27.680
but if you have a lot of images
link |
01:52:29.160
or you have another dimension,
link |
01:52:30.440
you'd at least like to label the dimension
link |
01:52:31.640
as this is X, this is Y, this is Z,
link |
01:52:33.120
or give it some human meaning
link |
01:52:34.640
or some domain specific meaning.
link |
01:52:36.760
That was one of the impetuses for Pandas actually
link |
01:52:39.680
was just, oh, we do need to label these things.
link |
01:52:43.040
And Labeled Array was an attempt to add
link |
01:52:47.680
that, like a lighter weight version of that.
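The labeling idea can be sketched in a few lines: keep a NumPy array plus one name per axis, and translate names to integer axes. `LabeledArray` below is a hypothetical illustration, not the API of the Labeled Array project or of pandas. Libraries such as xarray develop this idea much further.

```python
import numpy as np


class LabeledArray:
    """Hypothetical minimal wrapper: a NumPy array plus one name per axis."""

    def __init__(self, data, dims):
        data = np.asarray(data)
        if data.ndim != len(dims):
            raise ValueError("need exactly one name per axis")
        self.data = data
        self.dims = tuple(dims)

    def _axis(self, name):
        # Translate a human-readable axis name into NumPy's integer axis.
        return self.dims.index(name)

    def mean(self, dim):
        axis = self._axis(dim)
        remaining = self.dims[:axis] + self.dims[axis + 1:]
        return LabeledArray(self.data.mean(axis=axis), remaining)


# A stack of two 2x3 "images" whose axes are named instead of numbered.
images = LabeledArray(np.arange(12).reshape(2, 2, 3), dims=("image", "y", "x"))
per_pixel = images.mean("image")  # average across images, keep y and x
```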
link |
01:52:47.680
And there's been, like, that's an example of something
link |
01:52:49.360
I think NumPy could add, could be added to NumPy,
link |
01:52:53.080
but one of the challenges again, how do you fund this?
link |
01:52:55.000
Like I said, one of the tragedies I think is that,
link |
01:52:58.280
so I never had the chance to,
link |
01:53:00.240
I was never paid to work on NumPy, right?
link |
01:53:02.360
So I've always just done it in my spare time,
link |
01:53:04.400
always taken from one thing,
link |
01:53:05.880
taken from another thing to do it.
link |
01:53:07.920
And at the time, I mean, today,
link |
01:53:09.800
it would be the wrong day and today,
link |
01:53:11.000
like paying me to work on NumPy now
link |
01:53:12.160
would not be a good use of effort,
link |
01:53:13.480
but we are finally at Quansight Labs,
link |
01:53:16.640
I'm actually paying people to work on NumPy and SciPy,
link |
01:53:19.440
which is I'm thrilled with, I'm excited by.
link |
01:53:22.000
I've wanted to do that.
link |
01:53:22.840
That's what I always wanted to do from day one.
link |
01:53:24.280
It just took me a while to figure out a mechanism to do that.
link |
01:53:27.640
Even like in the university setting,
link |
01:53:29.680
respecting that, like pushing students,
link |
01:53:33.840
young minds and young graduate students to contribute
link |
01:53:38.000
and then figuring out financial mechanisms
link |
01:53:41.160
that enable them to contribute
link |
01:53:43.280
and then sort of reward them
link |
01:53:45.280
for their innovative scientific journey,
link |
01:53:48.000
that would be nice.
link |
01:53:49.160
But then also just a better allocation of resources.
link |
01:53:53.360
It's the 20-year anniversary of 9/11
link |
01:53:55.760
and I was just looking, we spent over $6 trillion
link |
01:53:59.240
in the Middle East after 9/11 in the various efforts there.
link |
01:54:04.560
And sort of to put politics and all that aside,
link |
01:54:08.040
it's just, you think about the education system,
link |
01:54:10.120
all the other ways we could have
link |
01:54:11.320
possibly allocated that money.
link |
01:54:14.280
To me, to take it back,
link |
01:54:16.560
the amount of impact you would have
link |
01:54:21.200
by allocating a little bit of money to the programmers
link |
01:54:26.360
that build the tools that run the world is fascinating.
link |
01:54:30.600
It is.
link |
01:54:32.600
I don't know, I think, again,
link |
01:54:34.920
there is some aspect to being broke
link |
01:54:38.040
as somewhat of a feature, not a bug,
link |
01:54:40.240
that you make sure that you're valued.
link |
01:54:42.320
But you can still manage that.
link |
01:54:43.440
Right, no, I know.
link |
01:54:45.320
But I don't think that's a big part.
link |
01:54:47.040
So it's like, I think you can have enough money
link |
01:54:50.720
and actually be wealthy while maintaining your values.
link |
01:54:53.880
Agreed, agreed.
link |
01:54:55.520
There's an old adage that nations that trade together
link |
01:54:57.800
don't go to war together.
link |
01:54:59.440
I've often thought about nations that code together.
link |
01:55:01.680
Yeah, code together.
link |
01:55:02.520
Right?
link |
01:55:03.360
I love that.
link |
01:55:04.200
Because one of the things I love about open source
link |
01:55:05.360
is it's global, it's multinational.
link |
01:55:07.880
Like there aren't national boundaries.
link |
01:55:09.160
One of the challenges with business and open source
link |
01:55:10.760
is the fact that, well, business is national.
link |
01:55:12.800
Like businesses are entities
link |
01:55:13.960
that are recognized in legal jurisdictions, right?
link |
01:55:16.240
And have laws that are respected in those jurisdictions
link |
01:55:18.280
and hiring, and yet the open source ecosystem
link |
01:55:21.320
is not, it's not there.
link |
01:55:23.040
Like currently, one of the problems we're solving
link |
01:55:25.080
is hiring people all over the world, right?
link |
01:55:27.200
Because we, it's a global effort.
link |
01:55:29.600
And I've had the chance to work, and I've loved the chance.
link |
01:55:31.920
I've never been to like Iran,
link |
01:55:35.280
but I once had a conference
link |
01:55:36.800
where I was able to talk to people there, right?
link |
01:55:38.640
And talk to folks in Pakistan.
link |
01:55:40.920
I've never been there, but we had a call
link |
01:55:44.080
where there were people there,
link |
01:55:45.320
like just scientists and normal people.
link |
01:55:47.600
And there's a certain amount of humanizing, right?
link |
01:55:52.640
That gets away from the,
link |
01:55:54.360
like we often get the memes of society
link |
01:55:56.200
that bubble up and get discussed,
link |
01:55:58.560
but the memes are not even an accurate reflection
link |
01:56:00.760
of the reality of what people are.
link |
01:56:02.400
Well, if you look at the major power centers
link |
01:56:05.440
that are leading to something like cyber war
link |
01:56:08.240
in the next few decades,
link |
01:56:10.000
it's the United States, it's Russia, and China.
link |
01:56:13.320
And those three countries in particular
link |
01:56:16.080
have incredible developers.
link |
01:56:18.240
So if they work together, I think that's one way,
link |
01:56:21.360
the politicians can do their stupid bickering,
link |
01:56:23.360
but like there's a layer of infrastructure, of humanity.
link |
01:56:27.360
If they collaborate together,
link |
01:56:29.400
that I think can prevent major military conflict,
link |
01:56:34.080
which would, I think most likely happen at the cyber level
link |
01:56:37.840
versus the actual hot war level.
link |
01:56:39.800
You're right.
link |
01:56:40.640
You know, I think that's a good prediction.
link |
01:56:43.320
Nations that code together don't go to war together.
link |
01:56:46.560
Don't go to war together.
link |
01:56:47.880
That's a hope, right?
link |
01:56:48.720
That's one of the philosophical hopes, but yeah.
link |
01:56:52.360
So you mentioned the project of Numba,
link |
01:56:55.640
which is fascinating.
link |
01:56:58.520
So from the early days,
link |
01:56:59.720
there was kind of a pushback on Python that it's not fast.
link |
01:57:04.560
You know, you use C++,
link |
01:57:05.520
if you wanna write something that's fast,
link |
01:57:06.920
you use C++.
link |
01:57:08.240
If you wanna write something that's usable and friendly,
link |
01:57:11.320
but slow, you use Python.
link |
01:57:13.240
And so what is Numba?
link |
01:57:15.840
What is its goal?
link |
01:57:16.800
How does it work?
link |
01:57:17.640
Great, yeah.
link |
01:57:18.480
Yes, that was the argument.
link |
01:57:19.760
And the reality was people would write high-level code
link |
01:57:22.440
and use compiled code,
link |
01:57:23.440
but there's still user stories, use cases,
link |
01:57:25.240
where you want to write Python,
link |
01:57:27.440
but then have it still be fast.
link |
01:57:28.880
You still need to write a for loop.
link |
01:57:30.720
Like before Numba, it was always don't write a for loop.
link |
01:57:33.920
You know, write it in a vectorized way,
link |
01:57:35.800
you know, put it in an array.
link |
01:57:37.240
And often that can make a memory trade off.
link |
01:57:39.640
Like quite often you can do it,
link |
01:57:41.080
but then you maybe use more memory
link |
01:57:42.720
because you have to build this array of data
link |
01:57:44.920
that you don't necessarily need all the time.
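Here is a sketch of the memory trade-off just mentioned. A one-line vectorized expression may allocate a full-size temporary for each intermediate result. Reusing preallocated buffers through the ufuncs' `out=` argument avoids repeated allocation. The array size and the formula are arbitrary examples.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_000)

# Straightforward vectorized form: 2 * x and x ** 2 may each create a
# temporary array the size of x before the sum produces the result.
y = 2 * x + x ** 2

# Same arithmetic with two preallocated buffers reused via `out=`.
out = np.empty_like(x)
tmp = np.empty_like(x)
np.multiply(x, x, out=out)    # out = x ** 2
np.multiply(x, 2.0, out=tmp)  # tmp = 2 * x
np.add(out, tmp, out=out)     # out = x ** 2 + 2 * x
```

The explicit-buffer version pays off mostly when the computation repeats, for example inside a loop, because the buffers are allocated once.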
link |
01:57:46.680
So Numba was, it started from a desire to have
link |
01:57:50.960
kind of a vectorize that worked.
link |
01:57:52.840
Vectorize was a tool in NumPy; it was released.
link |
01:57:56.260
You give it a Python function
link |
01:57:57.800
and it gave you a universal function,
link |
01:57:59.680
a ufunc that would work on arrays.
link |
01:58:01.120
So you get the function that just worked on a scaler.
link |
01:58:03.640
Like you could make a,
link |
01:58:04.880
like the classic case was a simple function
link |
01:58:07.280
that had an if-then statement in it.
link |
01:58:08.280
So sine X over X function, sinc function.
link |
01:58:12.160
If X equals zero, return one, otherwise do sine X over X.
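The sinc example just described maps directly onto `np.vectorize`. Here is a minimal sketch. It shows the convenience and also why the result is slow: the wrapped Python function is still called once per element. The branch-free `np.where` version at the end is an alternative written for this example. Note that NumPy's own `np.sinc` uses the normalized definition sin(pi x)/(pi x), so it is a different function.

```python
import numpy as np


def sinc_scalar(x):
    # Scalar-only logic with a branch, so it can't take an array directly.
    if x == 0.0:
        return 1.0
    return np.sin(x) / x


# np.vectorize makes the scalar function accept arrays, but it still calls
# the Python function once per element: convenient, not fast.
sinc_vec = np.vectorize(sinc_scalar, otypes=[float])

xs = np.array([0.0, np.pi / 2, np.pi])
print(sinc_vec(xs))

# Branch-free array alternative (unnormalized sin(x)/x, with 1 at x == 0).
safe = np.where(xs == 0.0, 1.0, np.sin(xs) / np.where(xs == 0.0, 1.0, xs))
```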
link |
01:58:16.080
The challenge is you don't want that loop
link |
01:58:17.760
happening in Python.
link |
01:58:18.720
So you want a compiled version of that,
link |
01:58:21.480
but the ufunc, the vectorize in NumPy
link |
01:58:23.160
would just give you a Python function.
link |
01:58:24.840
So it would take the array of numbers
link |
01:58:26.720
and at every call do a loop back into Python.
link |
01:58:29.560
So it was very slow.
link |
01:58:30.440
It gave you the appearance of a ufunc,
link |
01:58:31.800
but it was very slow.
link |
01:58:32.840
So I always wanted a vectorize
link |
01:58:34.600
that would take that Python scaler function
link |
01:58:36.280
and produce a ufunc working on binary native code.
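This is what Numba's `vectorize` decorator provides today: it turns a scalar Python function into a ufunc backed by compiled code. The sketch below assumes Numba is installed. The `ImportError` fallback to `np.vectorize` is added for this example only, so the code still runs, slowly, without Numba.

```python
import math

import numpy as np

try:
    # numba.vectorize compiles a scalar function into a NumPy ufunc.
    from numba import vectorize
except ImportError:
    # Fallback for this sketch only: same calling convention, but it loops
    # in Python via np.vectorize, so it is slow.
    def vectorize(signatures):
        return lambda func: np.vectorize(func, otypes=[float])


@vectorize(["float64(float64)"])
def sinc(x):
    if x == 0.0:
        return 1.0
    return math.sin(x) / x


values = np.linspace(-10.0, 10.0, 101)
result = sinc(values)
```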
link |
01:58:39.480
So in fact, I had somebody work on that with PyPy
link |
01:58:42.800
and see if PyPy could be used to produce a ufunc like that
link |
01:58:45.640
early on in 2009 or something like that, 2010.
link |
01:58:50.560
They didn't work that well.
link |
01:58:51.480
It was kind of pretty bulky.
link |
01:58:52.880
But in 2012, Peter and I had just started Anaconda.
link |
01:58:57.000
We had, I just, I'd learned to raise money.
link |
01:59:00.680
That's a different topic,
link |
01:59:01.640
but I'd learned to raise money from friends, family,
link |
01:59:04.640
and fools, as they say.
link |
01:59:05.960
And.
link |
01:59:06.800
That's a good line.
link |
01:59:09.840
Oh, that's a good line.
link |
01:59:11.200
But, so we were trying to do something.
link |
01:59:13.440
We were trying to change the world.
link |
01:59:14.680
Peter and I are super ambitious.
link |
01:59:15.840
We wanted to make array computing
link |
01:59:17.600
and we had ideas for really what's still,
link |
01:59:19.480
it's still the energy right now.
link |
01:59:20.640
How do you do data science at scale?
link |
01:59:23.520
And we had a bunch of ideas there, but one of them,
link |
01:59:25.840
I had just talked to people about LLVM
link |
01:59:27.720
and I was like, there's a way to do this.
link |
01:59:30.040
I just, I went to my friend Dave Beazley's
link |
01:59:33.920
compiler course.
link |
01:59:33.920
So I was looking at compilers like,
link |
01:59:35.560
and I realized, oh, this is what you do.
link |
01:59:37.640
And so I wrote a version of Numba
link |
01:59:40.040
that just basically mapped Python bytecode to LLVM.
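To make "Python bytecode" concrete: the standard-library `dis` module shows the instructions CPython compiles a function into, which is the kind of instruction stream a bytecode-to-LLVM translator walks. The `axpy` function is an illustrative example, exact opcode names vary between CPython versions (for example `BINARY_ADD` before 3.11 and `BINARY_OP` after), and this is not Numba's own front end.

```python
import dis

def axpy(a, x, y):
    # Scalar arithmetic of the kind an early bytecode-to-LLVM compiler targeted
    return a * x + y

# Print each bytecode instruction CPython would execute for axpy
for instr in dis.get_instructions(axpy):
    print(instr.opname, instr.argrepr)

opnames = [instr.opname for instr in dis.get_instructions(axpy)]
```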
link |
01:59:45.640
Nice.
link |
01:59:46.480
Right, so, and the first version is like, this works
link |
01:59:49.200
and it produces code that's fast.
link |
01:59:50.840
This is cool for, you know,
link |
01:59:51.960
obviously a reduced subset of Python.
link |
01:59:53.440
I didn't support all the Python language.
link |
01:59:55.360
There had been efforts to speed up Python in the past,
link |
01:59:57.480
but those efforts were, I would say,
link |
01:59:59.200
not from the array computing perspective,
link |
02:00:00.840
not from the perspective of wanting to produce
link |
02:00:02.160
a vectorized improvement.
link |
02:00:03.560
They were from the perspective of speeding up
link |
02:00:05.120
the runtime of Python, which is fundamentally hard
link |
02:00:07.520
because Python allows for some constructs
link |
02:00:10.520
that aren't, you can't speed up.
link |
02:00:12.160
Like it's this generic, you know, when it does this variable.
link |
02:00:15.560
So I, from the start, did not try to replicate
link |
02:00:17.720
Python's semantics entirely.
link |
02:00:20.280
I said, I'm gonna take a subset of the Python syntax
link |
02:00:23.000
and let people write syntax in Python,
link |
02:00:25.080
but it's kind of a new language really.
link |
02:00:27.440
So it's almost like for loops, like focusing on for loops.
link |
02:00:30.480
For loops, scalar arithmetic, you know, typed,
link |
02:00:34.400
you know, really typed language, a typed subset.
link |
02:00:38.280
That was the key.
link |
02:00:39.360
So, but we wanted to add inference of types.
link |
02:00:41.880
So you didn't have to spell all the types out
link |
02:00:43.400
because when you call a function,
link |
02:00:45.840
so Python is typed, it's just dynamically typed.
link |
02:00:48.040
So you don't tell it what the types are,
link |
02:00:49.360
but when it runs, every time an object runs,
link |
02:00:52.080
there's a type for the variables.
link |
02:00:53.360
You know what it is.
link |
02:00:54.560
And so the design goals of Numba
link |
02:00:56.800
were to make it possible to write functions
link |
02:00:59.200
that could be compiled and have them used for NumPy arrays.
link |
02:01:03.440
Like they needed to support NumPy arrays.
link |
02:01:05.520
And so how does it work?
link |
02:01:07.040
Do you add a comment within Python that tells it to do,
link |
02:01:10.200
like how do you help out the compiler?
link |
02:01:11.880
Yeah, so there isn't much actually.
link |
02:01:15.860
You don't, it's kind of magical in the sense
link |
02:01:17.740
that it just looks at the type of the objects
link |
02:01:19.600
and then it uses type inference to determine
link |
02:01:21.320
any other variables it needs.
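The point that types are never written out but are known at call time can be sketched in plain Python: a decorator can read `type()` of each argument when the function is called and keep one entry per type signature. This `specialize_on_types` decorator is hypothetical and only records signatures; a JIT such as Numba would instead compile a native version per signature and infer the types of the remaining variables from those argument types.

```python
from functools import wraps

def specialize_on_types(func):
    """Toy dispatcher: keeps one 'specialization' per tuple of argument types.

    A real JIT would compile a new native version for each signature;
    here we only record the signature so the idea is visible.
    """
    cache = {}

    @wraps(func)
    def wrapper(*args):
        signature = tuple(type(arg) for arg in args)  # types known at call time
        if signature not in cache:
            # Stand-in for "infer the types of the locals and compile"
            cache[signature] = func
        return cache[signature](*args)

    wrapper.signatures = cache  # exposed so the recorded signatures can be inspected
    return wrapper

@specialize_on_types
def add(x, y):
    return x + y

add(1, 2)       # records (int, int)
add(1.5, 2.0)   # records (float, float)
add(3, 4)       # reuses (int, int)
print(list(add.signatures))
```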
link |
02:01:23.320
And then it was also, because we had a use case
link |
02:01:26.080
that could work early.
link |
02:01:28.280
Like one of the challenges of any kind of new development
link |
02:01:30.700
is if you have something that to make it work,
link |
02:01:32.280
it was gonna take you a long time,
link |
02:01:34.200
it's really hard to get it off the ground.
link |
02:01:35.960
If you have a project where there's some incremental story,
link |
02:01:39.200
it can start working today and solve a problem,
link |
02:01:42.300
then you can start getting it out there, getting feedback.
link |
02:01:44.640
Because Numba today, now Numba is nine years old today,
link |
02:01:48.160
the first two, three versions were not great, right?
link |
02:01:52.120
But they solved a problem and some people could try it
link |
02:01:54.120
and we could get some feedback on it.
link |
02:01:55.560
Not great in that it was very focused.
link |
02:01:57.520
Very fragile, the subset it would actually compile
link |
02:02:02.000
was small and so if you wrote Python code
link |
02:02:04.320
and said, so the way it worked is you write a function
link |
02:02:06.880
and you say @jit, using decorators.
link |
02:02:09.000
So decorators are just these little constructs
link |
02:02:11.040
that let you decorate code with an @ and then a name.
link |
02:02:15.040
The @jit would take your Python function
link |
02:02:17.760
and actually just compile it and replace the Python function
link |
02:02:20.240
with another function that interacts
link |
02:02:23.200
with this compiled function.
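A minimal sketch of the decorator workflow described here, assuming the `numba` package is installed (`njit` is Numba's shorthand for `@jit(nopython=True)`). The try/except fallback is an addition so the snippet still runs as plain Python without Numba, and `sum_of_squares` is illustrative.

```python
import numpy as np

try:
    from numba import njit  # replaces the function with a compiling dispatcher
except ImportError:  # fallback so the sketch still runs without Numba
    def njit(func):
        return func

@njit
def sum_of_squares(values):
    # A for loop plus scalar arithmetic: the typed subset Numba targets
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total

data = np.arange(4, dtype=np.float64)
print(sum_of_squares(data))  # with Numba, the first call triggers compilation
```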
link |
02:02:24.920
And it would just do that and we went from Python bytecode
link |
02:02:28.480
then we went to AST.
link |
02:02:29.400
I mean, writing compilers actually,
link |
02:02:31.200
I learned a lot about why computer science
link |
02:02:32.940
is taught the way it is because compilers
link |
02:02:35.560
can be hard to write.
link |
02:02:36.640
They use tree structures, they use all the concepts
link |
02:02:39.080
of computer science that are needed.
link |
02:02:40.520
It's actually hard to, it's easy to write a compiler
link |
02:02:44.600
and then have it be spaghetti code.
link |
02:02:46.000
Like the passes become challenging
link |
02:02:47.600
and we ended up with three versions of Numba, right?
link |
02:02:49.940
Numba got written three times.
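The remark about tree structures and compiler passes can be made concrete with the standard-library `ast` module: parse source into a tree, then run a pass that rewrites it. The `FoldConstants` pass below does constant folding for `+` and `*` only; it is a generic illustration of a single pass, not code from Numba.

```python
import ast

class FoldConstants(ast.NodeTransformer):
    """One compiler 'pass': replace arithmetic on two literals with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (post-order traversal)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            if isinstance(node.op, ast.Add):
                value = node.left.value + node.right.value
            elif isinstance(node.op, ast.Mult):
                value = node.left.value * node.right.value
            else:
                return node  # leave other operators alone in this sketch
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("x = 2 * 3 + y")
folded = ast.fix_missing_locations(FoldConstants().visit(tree))
print(ast.unparse(folded))
```

Each further pass (type inference, lowering) would be another transformer over the same tree, which is where the structure can turn into spaghetti if the passes are not kept separate.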
link |
02:02:51.540
What programming language is Numba written in?
link |
02:02:55.560
Python.
link |
02:02:56.440
Wait, okay.
link |
02:02:57.440
Yeah, Python.
link |
02:02:58.640
So.
link |
02:03:00.040
Really?
link |
02:03:00.860
That's fascinating.
link |
02:03:01.700
Yeah, so Python, but then the whole goal of Numba
link |
02:03:03.520
is to translate Python bytecode to LLVM.
link |
02:03:07.480
And so LLVM actually does the code generation.
link |
02:03:09.400
In fact, a lot of times they'd say,
link |
02:03:10.780
yeah, it's super easy to write a compiler
link |
02:03:12.780
if you're not writing the parser nor the code generator.
link |
02:03:15.880
Right?
link |
02:03:16.720
So for people who don't know, LLVM is a compiler itself.
link |
02:03:19.440
So your compiler.
link |
02:03:20.360
Yeah, it's really badly named low level virtual machine,
link |
02:03:22.680
which that part of it is not used.
link |
02:03:24.480
It's really low level.
link |
02:03:25.320
Chris, he doesn't mean that.
link |
02:03:26.160
Yeah, love Chris.
link |
02:03:29.280
But the name implies that the virtual machine
link |
02:03:31.640
is what it's all about.
link |
02:03:32.480
It's actually the IR and the library,
link |
02:03:34.520
the code generation.
link |
02:03:36.000
That's the real beauty of it.
link |
02:03:37.680
The fact that, what I love about LLVM
link |
02:03:39.360
was the fact that it was a platform you could collaborate on.
link |
02:03:43.200
Right?
link |
02:03:44.040
Instead of the internals of GCC
link |
02:03:45.880
or the internals of the Intel compiler,
link |
02:03:47.440
or like how do I extend that?
link |
02:03:49.120
And it was a place we could collaborate.
link |
02:03:51.020
And we were early.
link |
02:03:52.400
I mean, people had started before.
link |
02:03:54.000
It's a slow compiler.
link |
02:03:55.240
Like it's not a fast compiler.
link |
02:03:56.840
So for some kind of JITs,
link |
02:03:59.520
like JITs are common in languages
link |
02:04:01.040
because one, every browser has a JavaScript JIT.
link |
02:04:04.760
It does real time compilation
link |
02:04:06.560
of the JavaScript to machine code.
link |
02:04:09.080
For people who don't know, JIT is just in time compilation.
link |
02:04:11.520
Thank you.
link |
02:04:12.340
Yeah, just in time compilation.
link |
02:04:13.240
They're actually really sophisticated.
link |
02:04:14.840
In fact, I got jealous of how much effort
link |
02:04:17.100
was put into the JavaScript JITs.
link |
02:04:18.600
Yes, well, it's kind of incredible what they've done.
link |
02:04:20.800
Yes, I completely agree.
link |
02:04:22.760
I'm very impressed.
link |
02:04:24.760
But you know, Numba was an effort
link |
02:04:26.880
to make that happen with Python.
link |
02:04:29.320
And so we used some of the money
link |
02:04:30.960
we raised from Anaconda to do it.
link |
02:04:32.440
And then we also applied for this DARPA grant
link |
02:04:34.800
and used some of that money to continue the development.
link |
02:04:36.820
And then we used proceeds from service projects we would do.
link |
02:04:40.680
We'd get consulting projects
link |
02:04:41.800
that we would then use some of the profits
link |
02:04:44.480
to invest in Numba.
link |
02:04:45.400
So we ended up with a team of two or three people
link |
02:04:47.160
working on Numba.
link |
02:04:48.880
It was fits and starts, right?
link |
02:04:50.720
And ultimately, the fact that we had a commercial version
link |
02:04:53.560
of it also we were writing.
link |
02:04:54.720
So part of the way I was trying to fund Numba,
link |
02:04:56.640
was to say, well, let's do the free Numba
link |
02:04:58.560
and then we'll have a commercial version of Numba
link |
02:04:59.920
called Numba Pro.
link |
02:05:00.820
And what Numba Pro did is it targeted GPUs.
link |
02:05:03.240
So we had the very first CUDA JIT
link |
02:05:05.520
and the very first @jit compiler where, in 2012 or 2013,
link |
02:05:10.840
you could run not just a ufunc on CPU,
link |
02:05:14.140
but a ufunc on GPUs.
link |
02:05:15.640
And it would automatically parallelize it
link |
02:05:17.480
and get a 1000X speedup on it.
link |
02:05:18.840
And that's an interesting funding mechanism
link |
02:05:21.120
because large companies or larger companies
link |
02:05:26.860
care about speed in just this way.
link |
02:05:30.120
So it's exactly a really good way.
link |
02:05:33.140
Yeah, there's been a couple of things
link |
02:05:34.240
you know people will pay for.
link |
02:05:35.200
One, they'll pay for really good user interfaces, right?
link |
02:05:37.960
And so I'm always looking for what are the things
link |
02:05:40.160
people will pay for that you could actually adapt
link |
02:05:41.720
to the open source infrastructure?
link |
02:05:43.240
One is definitely user interfaces.
link |
02:05:45.560
The second is speed, like a better runtime, faster runtime.
link |
02:05:49.120
And then when you say people,
link |
02:05:50.000
you mean like a small number of people pay a lot of money,
link |
02:05:52.280
but then there's also this other mechanism that.
link |
02:05:54.440
That's true.
link |
02:05:55.280
A ton of people pay.
link |
02:05:56.400
That's true.
link |
02:05:57.220
A little bit.
link |
02:05:58.060
First, I gotta, we mentioned Anaconda,
link |
02:06:00.320
we mentioned friends, family, and fools.
link |
02:06:04.280
So Anaconda is yet another.
link |
02:06:06.800
So there's a company, but there's also a project.
link |
02:06:09.080
Correct.
link |
02:06:09.920
That is exceptionally impactful in terms of,
link |
02:06:14.600
for many reasons, but one of which is bringing
link |
02:06:16.880
a lot more people into the community
link |
02:06:21.960
of folks who use Python.
link |
02:06:23.640
So what is Anaconda?
link |
02:06:26.920
What are its goals?
link |
02:06:28.960
Maybe what is Conda versus Anaconda?
link |
02:06:31.540
Yeah, I'll tell you a little bit of the history of that.
link |
02:06:33.080
Cause Anaconda, we wanted to do,
link |
02:06:35.280
we wanted to scale Python.
link |
02:06:37.440
Cause we, you know, that was the goal.
link |
02:06:38.680
Peter and I had the goal of when we started Anaconda,
link |
02:06:40.720
we actually started as Continuum Analytics
link |
02:06:42.440
was the name of the company that started.
link |
02:06:44.000
It got renamed Anaconda in 2015.
link |
02:06:47.000
But we said, we want to scale analytics.
link |
02:06:49.880
NumPy is great, Pandas is emerging,
link |
02:06:52.680
but these need to run at scale with lots of machines.
link |
02:06:55.320
The other thing we wanted to do was make user interfaces
link |
02:06:57.920
that were web-based.
link |
02:06:59.360
We wanted to make sure the web did not pass
link |
02:07:01.320
by the Python community.
link |
02:07:02.920
That we had ways to translate your data science to the web.
link |
02:07:06.000
So those are the two kind of technical areas.
link |
02:07:07.720
We thought, oh, we'll build products in this space.
link |
02:07:09.920
And that was the idea.
link |
02:07:12.500
Very quickly in, but of course,
link |
02:07:13.640
the thing I knew how to do was to do consulting
link |
02:07:15.760
to make money and to make sure my family and friends
link |
02:07:18.920
and fools that had invested didn't lose their money.
link |
02:07:21.680
So it's a little different
link |
02:07:22.640
than if you take money from a venture fund.
link |
02:07:24.360
If you take money from a venture fund,
link |
02:07:25.520
the venture fund, they want you to go big or go home.
link |
02:07:27.720
And they're kind of like expecting nine out of 10 to fail
link |
02:07:30.280
or 99 out of 100 to fail.
link |
02:07:33.080
It's different.
link |
02:07:33.920
I was, I was going for a barbell strategy.
link |
02:07:35.480
I was like, I can't fail.
link |
02:07:37.280
I mean, I may not do super well,
link |
02:07:38.680
but I cannot lose their money.
link |
02:07:40.440
So I'm going to do something I know can return a profit,
link |
02:07:43.560
but I want to have exposure to an upside.
link |
02:07:46.320
So that's what happened at Anaconda.
link |
02:07:47.920
We didn't, there were lots of things we did not do well
link |
02:07:50.320
in terms of that structure.
link |
02:07:51.320
And I've since learned how to do it better.
link |
02:07:53.740
But we've, we did a really good job
link |
02:07:56.700
of kind of attracting the interest around the area
link |
02:07:59.140
to get good people working
link |
02:08:00.360
and then funnel some money
link |
02:08:01.700
on some interesting projects.
link |
02:08:03.080
Super excited about what came out of our energy there.
link |
02:08:05.200
Like a lot did.
link |
02:08:06.840
So what are some of the interesting projects?
link |
02:08:08.280
So Dask, Numba, Bokeh, Conda.
link |
02:08:12.120
There was Datashader, Panel, HoloViz.
link |
02:08:16.200
These are all tools that are extremely relevant
link |
02:08:19.040
in terms of helping you build applications,
link |
02:08:21.400
build tools, build, you know, faster code.
link |
02:08:25.060
There's a couple I'm forgetting.
link |
02:08:25.900
Oh, JupyterLab, JupyterLab came out of this too.
link |
02:08:28.680
And yeah.
link |
02:08:30.320
Okay, so Bokeh does plotting?
link |
02:08:32.700
Is that?
link |
02:08:33.540
Bokeh does plotting.
link |
02:08:34.360
So Bokeh was one of the foundational things to say,
link |
02:08:35.880
I want to plot in Python,
link |
02:08:37.360
but have the things show up in a web browser.
link |
02:08:39.140
Right, that's right.
link |
02:08:40.040
That's right, that's right.
link |
02:08:40.880
And plotting to me still,
link |
02:08:43.280
with all due respect to Matplotlib and Bokeh,
link |
02:08:46.480
it feels like still an unsolved problem,
link |
02:08:48.760
not a solved problem.
link |
02:08:50.260
It is, it's a big problem.
link |
02:08:52.160
Right, because you're, I mean, I don't know,
link |
02:08:55.640
it's visualization broadly, right?
link |
02:08:58.640
I think we've got a pretty good API story
link |
02:09:00.960
around certain use cases of plotting.
link |
02:09:03.440
But there's a difference between static plots
link |
02:09:04.920
versus interactive plots versus I'm an end user,
link |
02:09:07.800
I just want to write a simple,
link |
02:09:09.760
so Pandas started the idea of, here's a data frame,
link |
02:09:12.040
dot plot: I'm just going to attach plot
link |
02:09:14.200
as a method to my object,
link |
02:09:16.380
which was a little bit controversial, right?
link |
02:09:18.280
But works pretty well, actually,
link |
02:09:20.160
because there's a lot less you have to pass in, right?
link |
02:09:23.680
You can just say, here's my object, you know what you are,
link |
02:09:26.280
you tell the visualization what to do.
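In pandas this pattern is `df.plot()`, which by default uses the index as the x axis and draws one line per column. Because pandas and a plotting backend may not be installed, here is a dependency-free sketch of the same design idea; `TinyFrame` and the returned spec dictionary are hypothetical and only show why a `plot` method attached to the object needs so few arguments.

```python
from dataclasses import dataclass, field

@dataclass
class TinyFrame:
    """Hypothetical stand-in for a data frame: an index plus named columns."""
    index: list
    columns: dict = field(default_factory=dict)

    def plot(self, kind="line"):
        # The object already knows its own structure, so the caller passes
        # almost nothing: the index becomes x and each column becomes a series.
        return {
            "kind": kind,
            "x": list(self.index),
            "series": [{"label": name, "y": list(values)}
                       for name, values in self.columns.items()],
        }

df = TinyFrame(index=[2019, 2020, 2021],
               columns={"downloads": [10, 25, 40], "issues": [3, 5, 4]})
spec = df.plot()
print(spec["x"], [s["label"] for s in spec["series"]])
```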
link |
02:09:29.000
So that, and there's things like that
link |
02:09:31.320
that have not been super well developed entirely,
link |
02:09:33.720
but Bokeh was focused on interactive plotting.
link |
02:09:36.320
So you could, it's a short path
link |
02:09:38.400
between interactive plotting and application,
link |
02:09:41.080
dashboard application.
link |
02:09:42.680
And there's some incredible work that got done there, right?
link |
02:09:44.760
And it was a hard project,
link |
02:09:45.800
because then you're basically doing JavaScript and Python.
link |
02:09:49.440
So we wanted to tackle some of these hard problems
link |
02:09:51.560
and try to just go after them.
link |
02:09:53.440
We got some DARPA funding to help,
link |
02:09:54.920
and it was super helpful, funny story there,
link |
02:09:56.880
we actually did two DARPA proposals,
link |
02:09:58.320
but one we were five minutes late for.
link |
02:10:00.580
And DARPA has a very strict cutoff window.
link |
02:10:03.040
And so I, we had two proposals,
link |
02:10:04.760
one for the Bokeh and one for actually Numba
link |
02:10:06.720
and the other work.
link |
02:10:09.320
Which one were you late for?
link |
02:10:10.920
The foundational numerical work.
link |
02:10:12.920
So Bokeh got funded. Oh no.
link |
02:10:14.880
Fortunately, Chris let us use some of the money to fund
link |
02:10:17.120
still some of the other foundational work,
link |
02:10:19.320
but it wasn't as, yeah, his hands were tied,
link |
02:10:22.040
he couldn't do anything about it.
link |
02:10:23.880
That was a whole interesting story.
link |
02:10:25.880
So one of the incredible projects
link |
02:10:27.700
that you worked on is Conda.
link |
02:10:29.200
Yes.
link |
02:10:30.040
So what is Conda? So how that came about,
link |
02:10:31.400
yeah, Conda, it was early on, like I said, with SciPy.
link |
02:10:35.480
SciPy was a distribution masquerading as a library.
link |
02:10:37.880
And he said, he heard me talking about compiler issues
link |
02:10:40.320
and trying to get the stuff shipped
link |
02:10:41.480
and the fact that people can use your libraries
link |
02:10:43.320
if they have it.
link |
02:10:44.660
So for a long time,
link |
02:10:45.500
we'd understood the packaging problem in Python.
link |
02:10:47.800
And one of the first things we did at Continuum Analytics,
link |
02:10:50.680
which became Anaconda, was organize the PyData ecosystem
link |
02:10:54.240
in conjunction with NumFOCUS.
link |
02:10:56.160
We actually started NumFOCUS
link |
02:10:58.960
with some other folks in the community
link |
02:11:00.480
the same year we started Anaconda.
link |
02:11:02.880
I said, we're gonna build a corporation,
link |
02:11:04.200
but we're also gonna reify the community aspect
link |
02:11:07.040
and build a nonprofit.
link |
02:11:08.280
So we did both of those.
link |
02:11:09.400
Can we pause real quick and can you say what is PyPI,
link |
02:11:13.280
the Python package index,
link |
02:11:14.720
like this whole story of packaging in Python?
link |
02:11:19.300
Yeah, that's what I'm gonna get to actually.
link |
02:11:20.880
This is exactly the journey I'm on.
link |
02:11:22.240
It's to sort of explain packaging in Python.
link |
02:11:24.200
I think it's best expressed through the conversation
link |
02:11:26.080
I had with Guido at a conference,
link |
02:11:27.600
where I said, so packaging is kind of a problem.
link |
02:11:31.280
And Guido said, I don't ever care about packaging.
link |
02:11:34.080
I don't use it.
link |
02:11:34.920
I don't install new libraries.
link |
02:11:36.320
I'm like, I guess if you're the language creator
link |
02:11:38.200
and if you need something, you just put it in the distribution
link |
02:11:40.480
maybe you don't worry about packaging.
link |
02:11:42.520
But Guido has never really cared about packaging, right?
link |
02:11:45.200
And never really cared about the problem of distribution.
link |
02:11:47.400
It's somebody else's problem.
link |
02:11:48.480
And that's a fair position to take, I think,
link |
02:11:50.240
as a language creator.
link |
02:11:51.480
In fact, there's a philosophical question about
link |
02:11:54.160
should you have different development packaging managers?
link |
02:11:56.680
Should you have a package manager per language?
link |
02:11:58.400
Is that really the right approach?
link |
02:11:59.800
I think there are some answers of
link |
02:12:01.900
it is appropriate to have development tools.
link |
02:12:04.200
And there's an aspect of a development tool
link |
02:12:06.040
that is related to packaging.
link |
02:12:07.680
And every language should have some story there
link |
02:12:10.600
to help their developers create.
link |
02:12:12.120
So you should have language specific development tools.
link |
02:12:14.960
Development tools that relate to package managers.
link |
02:12:17.080
But then there's a very specific user story
link |
02:12:19.520
around package management
link |
02:12:20.680
that those language specific package managers
link |
02:12:22.240
have to interact with.
link |
02:12:23.560
And currently aren't doing a good job of that.
link |
02:12:25.920
That was one of the challenges
link |
02:12:27.000
that not seeing that difference,
link |
02:12:29.140
and it still exists in the difference today.
link |
02:12:31.720
Conda was always about the user:
link |
02:12:34.480
I'm gonna use Python to do data science.
link |
02:12:36.540
I'm gonna use Python to do something.
link |
02:12:38.240
How do I get this installed?
link |
02:12:39.560
It was always focused on that.
link |
02:12:41.160
So it didn't have a develop.
link |
02:12:43.880
Classic example is pip has a pip develop.
link |
02:12:45.960
It's like, I wanna install this
link |
02:12:47.480
into my current development environment today.
link |
02:12:50.280
Conda doesn't have that concept
link |
02:12:51.520
because it's not part of the story.
link |
02:12:52.840
For people who don't know,
link |
02:12:54.640
pip is a Python specific package manager.
link |
02:12:59.640
That's exceptionally popular.
link |
02:13:04.640
That's probably like the default thing you've learned.
link |
02:13:06.520
It's the default user.
link |
02:13:07.360
And so the story there emerged
link |
02:13:08.840
because what happened is in 2012,
link |
02:13:11.480
we had this meeting at the Googleplex
link |
02:13:13.760
and Guido was there to come talk about what we're gonna do,
link |
02:13:15.600
how we're gonna make things work better.
link |
02:13:17.240
And Wes McKinney, me, Peter,
link |
02:13:19.960
Peter has a great photo of me talking to Guido
link |
02:13:21.880
and he pretends we're talking about this story.
link |
02:13:23.560
Maybe we were, maybe we weren't.
link |
02:13:24.680
But we did at that meeting talk about it
link |
02:13:26.320
and asked Guido, we need to fix packaging in Python.
link |
02:13:29.920
People can't get the stuff.
link |
02:13:31.040
And he said, go fix it yourself.
link |
02:13:32.400
I don't think we're gonna do it.
link |
02:13:33.600
All right.
link |
02:13:35.720
The origin story right there.
link |
02:13:36.960
All right, you said, okay, you said to do this ourselves.
link |
02:13:39.440
So at the same time,
link |
02:13:41.640
people did start to work on the packaging story in Python.
link |
02:13:44.600
It just took a little longer.
link |
02:13:45.680
So in 2012, kind of motivated
link |
02:13:48.160
by our training courses we were teaching,
link |
02:13:49.600
like very similar to what you just mentioned
link |
02:13:51.440
about your mother.
link |
02:13:52.280
Like it was motivated by the same purpose.
link |
02:13:54.160
Like how do we get this into people's hands?
link |
02:13:56.040
It's this big, long process.
link |
02:13:57.120
It's too expensive.
link |
02:13:58.520
It was actually hurting NumPy development
link |
02:14:00.200
because I would hear people were saying,
link |
02:14:02.280
don't make that change to NumPy
link |
02:14:03.480
because I just spent a week getting my Python environment set up.
link |
02:14:05.480
And if you change NumPy, I have to reinstall everything.
link |
02:14:09.160
And reinstalling is such a pain, don't do it.
link |
02:14:10.880
I'm like, wait, okay.
link |
02:14:12.120
So now we're not making changes to a library
link |
02:14:14.640
because of the installation problem
link |
02:14:16.000
that it'll cause for end users.
link |
02:14:17.440
Okay, there's a problem with installation.
link |
02:14:19.400
We gotta fix this.
link |
02:14:20.520
So we said, we're gonna make a distribution of Python.
link |
02:14:23.760
And we'd previously done that.
link |
02:14:24.760
I'd previously done that at Enthought.
link |
02:14:26.920
I wanted to make one we would give away for free,
link |
02:14:28.520
that everyone could just get.
link |
02:14:29.840
Like that was critical that we could just get it.
link |
02:14:32.080
It wasn't tied to a product.
link |
02:14:33.880
It was just you could get it.
link |
02:14:35.360
And then we had constantly thought about,
link |
02:14:36.960
well, do we just leverage RPM?
link |
02:14:39.120
But the challenge had always been,
link |
02:14:40.400
we want a package manager that works on Windows,
link |
02:14:42.240
Mac OS X, and Linux the same, right?
link |
02:14:45.040
And it wasn't there.
link |
02:14:46.560
Like you don't have anything like that.
link |
02:14:47.960
You have...
link |
02:14:48.800
And for people who don't know,
link |
02:14:49.640
RPM is an operating system specific package manager.
link |
02:14:54.560
Correct, it's operating system specific.
link |
02:14:55.960
Yes, exactly.
link |
02:14:56.800
So the design question is, do you create
link |
02:15:00.160
do you create an umbrella package manager
link |
02:15:02.240
that works across operating systems?
link |
02:15:03.840
Yes, that was the decision.
link |
02:15:05.680
And in neighboring design questions,
link |
02:15:08.080
do you also create a package manager
link |
02:15:09.920
that spans multiple programming languages?
link |
02:15:11.840
Correct, exactly.
link |
02:15:12.760
That was the world we faced.
link |
02:15:14.280
And we decided to go multiple operating systems,
link |
02:15:17.080
multiple and programming language independent.
link |
02:15:19.220
Because even Python, and particularly what was important
link |
02:15:21.800
was SciPy has a bunch of Fortran in it, right?
link |
02:15:24.920
And scikit learn has links to a bunch of C++.
link |
02:15:27.760
There's a lot of compiled code.
link |
02:15:29.960
And the Python package managers, especially early on,
link |
02:15:32.920
didn't even support that.
link |
02:15:34.320
So in 2000, so we released Anaconda,
link |
02:15:38.520
which was just a distribution of libraries,
link |
02:15:39.960
but we started to work on Conda in 2012.
link |
02:15:42.480
First version of Conda came out in early 2013,
link |
02:15:44.680
summer of 2013, and it was a package manager.
link |
02:15:47.840
So you could say, Conda install scikit learn.
link |
02:15:49.560
In fact, scikit learn was a fantastic project that emerged.
link |
02:15:54.280
It was the classic example of the scikits.
link |
02:15:57.120
I talked to you earlier about SciPy being too big
link |
02:15:59.760
to be a single library.
link |
02:16:01.240
Well, what the community had done is said,
link |
02:16:02.680
let's make scikits.
link |
02:16:04.160
And there's scikit image, there's scikit learn,
link |
02:16:05.840
there's a lot of scikits.
link |
02:16:07.640
And it was a fantastic move that the community did.
link |
02:16:10.200
I didn't do it.
link |
02:16:11.460
I was like, okay, that's a good idea.
link |
02:16:12.560
I didn't like the name.
link |
02:16:13.540
I didn't like the fact you typed scikit image.
link |
02:16:15.500
I was like, that's gotta be simpler.
link |
02:16:17.400
That's scikit learn, we gotta make that smaller.
link |
02:16:19.800
I don't like typing all this stuff from imports.
link |
02:16:21.940
So I was kind of a pressure that way,
link |
02:16:23.220
but I love the energy and love the fact
link |
02:16:25.280
that they went out and they did it,
link |
02:16:26.200
and those people, Jarrod Millman, and then of course, Gaël,
link |
02:16:29.400
and there's people I'm not even naming.
link |
02:16:31.280
Scikit learn really emerged as a fantastic project.
link |
02:16:34.640
And the documentation around that is also incredible.
link |
02:16:36.680
And the documentation was incredible, exactly.
link |
02:16:37.840
I don't know who did that, but they did a great job.
link |
02:16:40.160
A lot of people in Inria, a lot of European contributors.
link |
02:16:45.120
There's some Andreas in the US.
link |
02:16:47.120
There's a lot of just people I just adore,
link |
02:16:48.920
I think are amazing people.
link |
02:16:51.180
Awesome use of SciPy, right?
link |
02:16:52.480
I love the fact that they were using SciPy effectively
link |
02:16:54.600
to do something I love, which is machine learning,
link |
02:16:57.160
but couldn't install it.
link |
02:16:58.980
Because there's so many pieces involved.
link |
02:17:00.600
So many dependencies, right?
link |
02:17:02.160
So our use case of Conda was Conda install scikit learn.
link |
02:17:06.080
Right, and it was the best way to install scikit learn
link |
02:17:09.440
in 2013 to really 2018, 17, 18, PIP finally caught up.
link |
02:17:14.440
I still think you should Conda install scikit learn
link |
02:18:16.440
over PIP install scikit learn,
link |
02:17:17.560
but you can PIP install scikit learn.
link |
02:17:19.360
The issue is the package they created was wheels
link |
02:17:21.840
and PIP does not handle the multi vendor approach.
link |
02:17:24.480
They don't handle the fact you have C++ libraries
link |
02:17:26.600
you're depending on.
link |
02:17:27.680
They just stop at the Python boundary.
link |
02:17:29.240
And so what you have to do in the wheel world
link |
02:17:31.280
is you have to vendor.
link |
02:17:33.200
You have to take all of the binary and vendor it.
link |
02:17:35.640
Now, if a change happens in an underlying dependency,
link |
02:17:38.480
you have to redo the whole wheel.
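A toy model of the vendoring trade-off, with hypothetical package and library names: when a native library is bundled into each wheel, a change to that library means rebuilding every wheel that bundles it, whereas a manager that packages the library separately (as Conda does for non-Python dependencies) rebuilds just that one package, assuming the new build stays binary compatible.

```python
# Hypothetical packages and the native libraries each one links against.
NATIVE_DEPS = {
    "pkg_a": {"libfoo"},
    "pkg_b": {"libfoo", "libbar"},
    "pkg_c": {"libbar"},
}

def rebuilds_when_vendored(changed_lib):
    """Wheel-style: each package bundles its own copy of the library,
    so every package containing it must be rebuilt and re-released."""
    return sorted(pkg for pkg, libs in NATIVE_DEPS.items() if changed_lib in libs)

def rebuilds_when_shared(changed_lib):
    """Shared-package style: the library is its own package,
    so only that package is rebuilt (assuming a compatible ABI)."""
    return [changed_lib]

print(rebuilds_when_vendored("libfoo"))
print(rebuilds_when_shared("libfoo"))
```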
link |
02:17:40.280
So TensorFlow, as you know,
link |
02:17:42.080
you should not PIP install TensorFlow.
link |
02:17:44.680
It's a terrible idea.
link |
02:17:45.520
People do it because of the popularity of PIP,
link |
02:17:48.640
many people think, oh, of course,
link |
02:17:49.480
that's how I install everything in Python.
link |
02:17:51.480
Yeah, this is one of the big challenges.
link |
02:17:53.960
You take a GitHub repository or just a basic blog post.
link |
02:17:57.920
The number of times PIP is mentioned over Conda
link |
02:18:00.840
is like 100 X to one.
link |
02:18:02.760
Correct, correct.
link |
02:18:03.600
So it just has to do with the.
link |
02:18:04.440
And that was increasing.
link |
02:18:05.280
It wasn't true early because PIP didn't exist.
link |
02:18:07.520
Like Conda came first.
link |
02:18:08.840
So but that's the problem.
link |
02:18:10.160
Like Conda came first, but that's like the long tail
link |
02:18:13.040
of user-generated internet documentation.
link |
02:18:15.840
So that like you think, how do I install Google?
link |
02:18:19.160
How do I install TensorFlow?
link |
02:18:20.400
You're just not gonna see Conda in that first page.
link |
02:18:23.000
Correct, exactly.
link |
02:18:24.120
And that.
link |
02:18:24.960
Not today, you would have in 2016, 2017.
link |
02:18:29.400
And it's sad because Conda solves
link |
02:18:32.760
a lot of usability issues.
link |
02:18:34.160
Correct.
link |
02:18:35.000
Especially for super challenging things.
link |
02:18:36.480
I don't know.
link |
02:18:37.320
One of the big pain points for me was
link |
02:18:39.520
just on the computer vision side, OpenCV installation.
link |
02:18:43.560
Perfect example.
link |
02:18:44.400
Yes.
link |
02:18:45.240
I think Conda, I don't know if Conda solved that one.
link |
02:18:47.400
Conda has an OpenCV package.
link |
02:18:49.080
I don't know.
link |
02:18:49.920
I certainly know PIP has not solved it.
link |
02:18:53.440
I mean, there's complexities there because.
link |
02:18:55.840
Right.
link |
02:18:56.680
I actually don't know.
link |
02:18:57.640
I should probably know a good answer for this,
link |
02:18:59.120
but if you compile OpenCV with certain dependencies,
link |
02:19:05.440
you'll be able to do certain things.
link |
02:19:07.440
So there's this kind of flexibility of what you,
link |
02:19:09.840
like what options you compile with.
link |
02:19:12.960
Yes.
link |
02:19:13.800
And I don't think it's trivial to do that with Conda or.
link |
02:19:17.840
So Conda has a notion of variants of a package.
link |
02:19:20.520
You can actually have different compilation versions
link |
02:19:23.120
of a package.
link |
02:19:23.960
So not just the version is different,
link |
02:19:24.800
but oh, this is compiled with these optimizations on.
link |
02:19:26.880
So Conda does have an answer.
link |
02:19:28.000
Has those flavors.
link |
02:19:28.840
Has flavors, basically.
link |
02:19:30.080
Well, PIP, as far as I know, does not have flavors.
link |
02:19:32.360
No, no.
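Conda's variants, or "flavors," can be pictured as several package records with the same name and version but different build strings. The toy model below is loosely modeled on conda's `name=version=build` match specifications with shell-style globs. `PackageRecord`, `select_variant`, and the channel contents are hypothetical illustrations, not conda's actual data structures.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class PackageRecord:
    name: str
    version: str
    build: str  # build string encoding how this variant was compiled


def select_variant(records: List[PackageRecord], name: str,
                   version: str = "*", build: str = "*") -> List[PackageRecord]:
    """Filter records by exact name and glob patterns on version and build."""
    return [
        r for r in records
        if r.name == name
        and fnmatchcase(r.version, version)
        and fnmatchcase(r.build, build)
    ]


# Hypothetical channel: one OpenCV version, two compiled flavors.
channel = [
    PackageRecord("opencv", "4.5.3", "py39_cpu_0"),
    PackageRecord("opencv", "4.5.3", "py39_cuda112_0"),
    PackageRecord("numpy", "1.21.2", "py39_openblas_0"),
]

print(select_variant(channel, "opencv", "4.5.*", "*cuda*"))
```

The point of the design is that the version alone does not identify the binary; the build string carries the compile-time options, which is the information a version-only scheme has no place for.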
link |
02:19:33.280
PIP generally hasn't thought deeply
link |
02:19:36.440
about the binary dependency problem, right?
link |
02:19:38.400
And that's why fundamentally it doesn't work
link |
02:19:41.840
for the SciPy ecosystem.
link |
02:19:43.640
It barely, you can sort of paper over it and duct tape
link |
02:19:46.120
and it kind of works until it doesn't
link |
02:19:48.040
and it falls apart entirely.
link |
02:19:49.560
So it's been a mixed bag.
link |
02:19:51.520
Like, and I've been having lots of conversations
link |
02:19:54.360
with people over the years because again,
link |
02:19:56.120
it's an area where if you understand some things,
link |
02:19:58.400
but not all the things,
link |
02:19:59.240
but they've done a great job of community appeal.
link |
02:20:02.200
This is an area where I think Anaconda as a company
link |
02:20:05.560
needed to do some things
link |
02:20:07.040
in order to make Conda more community centric, right?
link |
02:20:10.440
And this is a, I talk about this all the time.
link |
02:20:13.080
There's a balance between you have every project starts
link |
02:20:16.640
with what I call company-backed open source.
link |
02:20:18.280
Even if the company is yourself, it's just one person,
link |
02:20:20.320
just doing business as.
link |
02:20:23.360
But ultimately for products to succeed virally
link |
02:20:26.080
and become massive influencers,
link |
02:20:28.320
they have to create,
link |
02:20:29.160
they have to get community people on board.
link |
02:20:30.520
They have to get other people on board.
link |
02:20:32.120
So it has to become community driven.
link |
02:20:33.680
And a big part of that is engagement with those people.
link |
02:20:35.520
Empowering people, governance around it.
link |
02:20:38.600
And what happened with Conda in the early days,
link |
02:20:41.360
PIP emerged and we did do some good things.
link |
02:20:43.720
conda-forge, the conda-forge community
link |
02:20:46.400
is sort of the community recipe creation community.
link |
02:20:49.880
But Conda itself, I still believe,
link |
02:20:52.160
and Peter is CEO of Anaconda, he's my cofounder.
link |
02:20:55.120
I ran Anaconda until 2017, 2018.
link |
02:20:58.160
Is Peter still at Anaconda?
link |
02:20:59.000
Peter's still at Anaconda, right?
link |
02:21:00.000
And we're still great friends.
link |
02:21:01.360
We talk all the time.
link |
02:21:02.560
I love him to death.
link |
02:21:03.600
There's a long story there about like why and how
link |
02:21:06.080
and we can cover in some other podcast perhaps.
link |
02:21:08.640
Yeah.
link |
02:21:09.480
It's sort of a more, maybe a more business focused one.
link |
02:21:11.400
But this is one area where I think Conda
link |
02:21:15.160
should be more community driven.
link |
02:21:17.280
Like he should be pushing more
link |
02:21:18.960
to get more community contributors to Conda
link |
02:21:21.200
and let the, Anaconda shouldn't be fighting this battle.
link |
02:21:26.080
Yeah.
link |
02:21:26.920
Right?
link |
02:21:27.760
It's actually, it's really a developers.
link |
02:21:28.600
Like you said, like help the developers
link |
02:21:30.400
and then they'll actually move us the right direction.
link |
02:21:32.200
Well, the problem I have is that many
link |
02:21:34.040
of the cool kids I know don't use Conda.
link |
02:21:36.520
And that to me is confusing.
link |
02:21:38.880
It is confusing.
link |
02:21:39.800
It's really a matter of, Conda has some challenges.
link |
02:21:42.640
First of all, Conda still needs to be improved.
link |
02:21:44.120
There's lots of improvements to be made.
link |
02:21:45.320
And it's that aspect of wait, who's doing this?
link |
02:21:47.600
And then the PyPA really stepped up.
link |
02:21:50.960
Like they were not solving the problem at all.
link |
02:21:53.400
And now they kind of got to where they're solving it
link |
02:21:55.640
for the most part.
link |
02:21:56.720
And then effectively you could get,
link |
02:21:58.160
like Conda solved a problem that was there.
link |
02:22:00.360
And it still does.
link |
02:22:01.200
It's still, you know, there's still great things it can do.
link |
02:22:03.960
But, and we still use it all the time at Quansight
link |
02:22:06.920
and with other clients, but with,
link |
02:22:08.960
but you can kind of do similar things with PIP and Docker.
link |
02:22:12.160
Right?
link |
02:22:13.000
So especially with the web development community,
link |
02:22:15.280
that part of it, again, is this is the,
link |
02:22:17.080
there's a lot of different kinds of developers
link |
02:22:19.200
in the Python ecosystem.
link |
02:22:20.200
And there's still a lack of some clear understanding.
link |
02:22:23.720
I go to the Python conference all the time
link |
02:22:25.320
and there are only a few people in the PyPA who get it.
link |
02:22:28.280
And then others who are just massively trumpeting
link |
02:22:30.680
the power of PIP, but just do not understand the problem.
link |
02:22:32.840
Yeah.
link |
02:22:33.680
So one of the obvious things to me, from a,
link |
02:22:36.040
from a non programmer perspective,
link |
02:22:37.840
is usability across operating systems.
link |
02:22:41.760
That's much more natural.
link |
02:22:42.680
So there's people that use Windows and just,
link |
02:22:45.440
it seems much easier to recommend Conda there,
link |
02:22:49.080
but then it, you should also recommend it across the board.
link |
02:22:51.840
So I'll definitely sort of.
link |
02:22:53.520
But what I recommend now is a hybrid.
link |
02:22:55.320
I do.
link |
02:22:56.160
I mean, I have no problem.
link |
02:22:57.000
Is it possible to use?
link |
02:22:57.840
Oh, it is.
link |
02:22:58.660
It is.
link |
02:22:59.500
But like build the environment with PIP, with Conda,
link |
02:23:01.600
build an environment with Conda
link |
02:23:03.360
and then PIP install on top of that.
link |
02:23:04.600
That's fine.
link |
02:23:05.440
Be careful about PIP installing OpenCV or TensorFlow
link |
02:23:09.400
or because if somebody's allowed that,
link |
02:23:11.360
it's gonna be most surely done in a way
link |
02:23:13.320
that can't be updated that easily.
link |
02:23:15.120
So install like the big packages,
link |
02:23:17.680
the infrastructure with Conda and then the weirdos.
link |
02:23:21.000
Yeah.
link |
02:23:21.840
That like the weird like implementation for some.
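In the hybrid workflow described above (conda for the heavy infrastructure, pip on top for small packages), it helps to know which packages pip actually installed. Installers are expected to record their name in an `INSTALLER` file in each package's metadata directory, and pip writes `pip` there. Whether a conda-installed package carries such a file, and what it says, depends on how the package was built, so treat the result as a heuristic sketch rather than an authoritative audit.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installers(dists: Optional[Iterable[metadata.Distribution]] = None) -> Dict[str, str]:
    """Map each installed distribution's name to the tool named in its INSTALLER file."""
    if dists is None:
        dists = metadata.distributions()
    result: Dict[str, str] = {}
    for dist in dists:
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if not name:
            continue  # skip entries with broken or missing metadata
        installer = dist.read_text("INSTALLER")
        result[name] = installer.strip() if installer else "unknown"
    return result


if __name__ == "__main__":
    # Packages reporting "pip" are the ones layered on top of the conda environment.
    for name, tool in sorted(installers().items()):
        print(f"{tool:10} {name}")
```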
link |
02:23:24.720
I had a, there's a cool library I used
link |
02:23:28.440
that based on your location and time of day and date
link |
02:23:33.520
tells you the exact position of the sun
link |
02:23:35.640
relative to the earth.
link |
02:23:38.160
And it's just like a simple library,
link |
02:23:39.700
but it's very precise.
link |
02:23:41.360
And I was like, all right.
link |
02:23:42.200
But that was, that was, and it's like PIP.
link |
02:23:45.120
Well, the thing they did really well is Python developers
link |
02:23:48.600
who wanna get their stuff published,
link |
02:23:50.600
you have to have a PIP recipe.
link |
02:23:51.920
Yeah.
link |
02:23:52.760
Right?
link |
02:23:53.600
I mean, even if it's, you know, the challenge is,
link |
02:23:56.440
and there's a key thing that needs to be added to PIP,
link |
02:23:58.800
just simply add to PIP the ability to defer
link |
02:24:01.680
to a system package manager.
link |
02:24:03.440
Like, cause it's, you know,
link |
02:24:04.460
recognize you're not gonna solve all the dependency problems.
link |
02:24:07.280
So, like, give up and let the system package manager do its work.
link |
02:24:12.420
That way Anaconda is installed and it has PIP.
link |
02:24:15.140
It would default to Conda to install stuff,
link |
02:24:16.960
but Red Hat RPM would default to RPM
link |
02:24:19.240
to install some more things.
link |
02:24:20.600
Like that's the, that's a key, not difficult,
link |
02:24:23.480
but some work, feature that needs to be added.
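The speaker describes this as a feature pip lacks, so the following is a purely hypothetical sketch of the dispatch logic such a feature might use: try each system provider in priority order (conda in an Anaconda install, RPM on Red Hat) and fall back to pip only when none of them claims the package. `plan_install`, the provider tuples, and the channel contents are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider: a tool name plus a check for whether it can supply a package.
# These stand in for conda, RPM, etc.; they are not real pip, conda, or rpm APIs.
Provider = Tuple[str, Callable[[str], bool]]


def plan_install(requirements: List[str], providers: List[Provider]) -> Dict[str, str]:
    """Assign each requirement to the first provider that can supply it, else to pip."""
    plan: Dict[str, str] = {}
    for req in requirements:
        for tool, can_provide in providers:
            if can_provide(req):
                plan[req] = tool
                break
        else:
            # No system package manager claimed it: pip installs it as usual.
            plan[req] = "pip"
    return plan


# In an Anaconda install, conda would be the preferred system provider.
conda_channel = {"numpy", "scipy", "opencv", "tensorflow"}
providers: List[Provider] = [("conda", lambda name: name in conda_channel)]

print(plan_install(["numpy", "opencv", "some-pure-python-lib"], providers))
```

Keeping pip as the fallback means pure-Python packages published only as pip recipes still install normally, while binary-heavy packages go through the manager that understands their native dependencies.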
link |
02:24:25.960
That's an example of something like,
link |
02:24:27.440
I've known we need to do it.
link |
02:24:28.620
I mean, it's where I wish I had more money.
link |
02:24:30.920
I wish I was more successful in the business side,
link |
02:24:33.480
trying to get there, but I wish my, you know,
link |
02:24:35.060
my family, friends and full community that I know.
link |
02:24:37.280
Was larger.
link |
02:24:38.120
Was larger and had more money.
link |
02:24:39.320
Cause I know tons of things to do effectively
link |
02:24:42.680
with more resources, but you know,
link |
02:24:46.280
I have not yet been successful at channel.
link |
02:24:48.720
Tons of, you know, some, you know,
link |
02:24:49.960
I'm happy with what we've done.
link |
02:24:51.480
We created again at Quansight,
link |
02:24:54.840
what we created to get Anaconda started.
link |
02:24:56.480
We created community to get Anaconda started.
link |
02:24:58.160
Done it again with Quansight.
link |
02:24:59.280
Super excited by that.
link |
02:25:00.480
But it took three years to do it.
link |
02:25:02.200
What is Quansight?
link |
02:25:03.200
What is its mission?
link |
02:25:04.440
We've talked a few times about different fascinating
link |
02:25:06.920
aspects of it, but let's like big picture,
link |
02:25:08.920
what is Quansight?
link |
02:25:09.760
Big picture Quansight.
link |
02:25:10.600
Quansight is, its mission is to connect data
link |
02:25:13.480
to an open economy.
link |
02:25:14.480
So it's basically consulting for the PyData ecosystem,
link |
02:25:17.520
right?
link |
02:25:18.360
It's a consulting company.
link |
02:25:19.280
And what I've said when I started it was we're trying
link |
02:25:21.200
to create products, people, and technology.
link |
02:25:24.700
So it's divided into two groups.
link |
02:25:26.700
And a third one as well.
link |
02:25:28.300
The two groups are a consulting services company
link |
02:25:30.360
that just helps people do data science
link |
02:25:31.960
and data engineering and data management better
link |
02:25:35.080
and more efficiently.
link |
02:25:35.920
Like full stack, like full thing.
link |
02:25:36.760
Full stack data science, full thing.
link |
02:25:38.200
We'll help you build an infrastructure.
link |
02:25:40.020
If you're using Jupyter, we need,
link |
02:25:41.380
we do staff augmentation if you need more programmers,
link |
02:25:43.820
help you use Dask more effectively,
link |
02:25:44.900
help you use GPUs more effectively.
link |
02:25:46.520
Just basically a lot of people need help.
link |
02:25:48.400
So we do training as well to help people, you know,
link |
02:25:50.800
both immediate help and then get, learn from somebody.
link |
02:25:55.860
We've added a bunch of stuff too.
link |
02:25:57.080
We've kind of separated some of these other things
link |
02:25:58.600
into another company called OpenTeams
link |
02:26:00.120
that we recently started.
link |
02:26:01.760
One of the things I loved about what we did at Anaconda
link |
02:26:03.380
was creating a community innovation team.
link |
02:26:05.520
And so I wanted to replicate that.
link |
02:26:06.700
This time we did a lot of innovation at Anaconda.
link |
02:26:09.360
I wanted to do innovation,
link |
02:26:10.600
but also contribute to the projects that existed,
link |
02:26:13.680
like create a place where maintainers,
link |
02:26:16.440
so the SciPy and NumPy and Numba
link |
02:26:18.480
and all these projects we already started
link |
02:26:20.400
can pay people to work on them and keep them going.
link |
02:26:22.700
So that's Labs.
link |
02:26:23.540
Quansight Labs is a separate organization.
link |
02:26:25.960
It's a nonprofit mission.
link |
02:26:28.060
The profits of Quansight help fund it.
link |
02:26:29.940
And in fact, every project that we have at Quansight,
link |
02:26:33.240
a portion of the money goes directly to Quansight Labs
link |
02:26:36.040
to help keep it funded.
link |
02:26:37.060
So we've gotten several mechanisms
link |
02:26:38.280
that we keep Quansight Labs funded.
link |
02:26:40.040
And currently, so I'm really excited about Labs
link |
02:26:41.960
because it's been a mission for a long time.
link |
02:26:43.680
What kind of projects are within Labs?
link |
02:26:45.240
So Labs is working to make the software better,
link |
02:26:47.680
like make NumPy better, make SciPy better.
link |
02:26:49.760
It only works on open source.
link |
02:26:52.340
So if somebody wants to, so companies do,
link |
02:26:55.440
we have a thing called a community work order, we call it.
link |
02:26:57.480
If a company says, I wanna make Spyder better.
link |
02:27:00.020
Okay, cool.
link |
02:27:01.680
You can pay for a month of a developer of Spyder
link |
02:27:05.440
or a developer of NumPy or a developer of SciPy.
link |
02:27:08.400
You can't tell them what you want them to do.
link |
02:27:09.840
You can give them your priorities and things you wish existed
link |
02:27:12.800
and they'll work on those priorities with the community
link |
02:27:16.080
to get what the community wants
link |
02:27:17.560
and what emerges of what the community wants.
link |
02:27:18.880
Is there some aspect on the consulting side
link |
02:27:21.080
that is helping, as we were talking about morphology
link |
02:27:24.320
and so on, are there specific applications
link |
02:27:26.600
that are particularly like driving,
link |
02:27:29.120
sort of inspiring the need for updates to SciPy?
link |
02:27:32.000
Correct, absolutely, absolutely.
link |
02:27:33.360
GPUs are absolutely one of them.
link |
02:27:34.840
And new hardware beyond GPUs.
link |
02:27:36.840
I mean, Tesla's Dojo chip, I'm hoping we'll have a chance
link |
02:27:39.720
to work on that perhaps.
link |
02:27:42.320
Things like that are definitely driving it.
link |
02:27:43.840
The other thing that's driving it is scalable,
link |
02:27:45.520
like speed and scale.
link |
02:27:47.640
How do I write NumPy code or NumPy-like code
link |
02:27:50.360
if I want it to run across a cluster?
link |
02:27:52.520
That's Dask or maybe it's Ray.
link |
02:27:54.240
I mean, there's sort of ways to do that now.
link |
02:27:56.360
Or there's Modin, and there's, so pandas code,
link |
02:27:59.720
NumPy code, SciPy code, scikit-learn code
link |
02:28:02.080
that I want to scale.
link |
02:28:03.240
So that's one big area.
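Tools such as Dask scale array code by splitting an array into blocks, running the same operation on each block (possibly on different machines), and combining the partial results. The stdlib-only sketch below illustrates that blocked pattern for a mean. It is not Dask's or Ray's API; threads stand in for cluster workers, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def chunks(data: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split data into consecutive blocks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


def blocked_mean(data: Sequence[float], size: int, workers: int = 4) -> float:
    """Mean computed from per-block (sum, count) pairs combined at the end."""
    if not data:
        raise ValueError("mean of empty data")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task sees only its own block, as a worker on another machine would.
        partials = list(pool.map(lambda block: (sum(block), len(block)), chunks(data, size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(blocked_mean(list(range(10)), size=3))  # 4.5
```

Combining (sum, count) pairs rather than averaging the per-block means keeps the result correct when the last block is shorter than the others. With Dask itself, the analogous computation is roughly `dask.array.from_array(x, chunks=...).mean().compute()`.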
link |
02:28:04.880
Have you gotten a chance to chat with Andrej and Elon
link |
02:28:08.400
about particular, because like.
link |
02:28:09.840
No, I would love to, by the way.
link |
02:28:11.360
I have not, but I'd love to.
link |
02:28:12.280
I just saw their Tesla AI Day video.
link |
02:28:15.520
Super excited.
link |
02:28:16.360
That's one of the, you know, I love great engineering,
link |
02:28:18.600
software engineering teams and engineering teams in general.
link |
02:28:21.000
And they're doing a lot of incredible stuff with Python.
link |
02:28:23.040
They're like revolutionary.
link |
02:28:25.040
So many aspects of the machine learning pipeline.
link |
02:28:28.800
I agree.
link |
02:28:29.640
That's operating in the real world.
link |
02:28:30.600
And so much of that is Python.
link |
02:28:31.880
Like you said, the guy running, you know, Andrej Karpathy,
link |
02:28:35.000
running Autopilot is tweeting about optimization
link |
02:28:38.680
of NumPy versus.
link |
02:28:41.200
I would love to talk to him.
link |
02:28:42.920
In fact, at Quansight, we've been fortunate enough
link |
02:28:45.080
to work with Facebook on PyTorch directly.
link |
02:28:47.560
So we have about 13 developers at Quansight.
link |
02:28:49.880
Some of them are in labs working directly on PyTorch.
link |
02:28:52.560
On PyTorch.
link |
02:28:53.400
On PyTorch, right.
link |
02:28:54.240
So I basically started Quansight.
link |
02:28:55.680
I went to both TensorFlow and PyTorch and said,
link |
02:28:57.160
hey, I want to help connect what you're doing
link |
02:29:00.200
to the broader SciPy ecosystem.
link |
02:29:01.920
Because I see what you're doing.
link |
02:29:03.240
We have this bigger mission that we want to make sure
link |
02:29:04.760
we don't, you know, lose energy here.
link |
02:29:06.760
So, and Facebook responded really positively
link |
02:29:09.840
and I didn't get the same reaction from TensorFlow.
link |
02:29:12.400
Not yet, not yet.
link |
02:29:13.560
Not yet.
link |
02:29:14.400
So I really love the folks at TensorFlow, too.
link |
02:29:17.480
They're fantastic.
link |
02:29:18.480
I think it's the, just how it integrates
link |
02:29:21.120
with their business.
link |
02:29:21.960
I mean, like I said, there's a lot of reasons.
link |
02:29:23.800
Just the timing, the integration with their business,
link |
02:29:25.720
what they're looking for.
link |
02:29:27.160
They're probably looking for more users.
link |
02:29:28.760
And I was looking to kind of cut up some development effort
link |
02:29:31.600
and they couldn't receive that as easily, I think.
link |
02:29:33.840
So I'm hoping, I'm really hopeful
link |
02:29:36.040
and love the people there.
link |
02:29:37.640
What's the idea behind OpenTeams?
link |
02:29:39.800
So OpenTeams, I'm super excited about OpenTeams
link |
02:29:41.960
because it's one of the,
link |
02:29:43.400
I mentioned my idea for investing directly in open source.
link |
02:29:46.760
So that's a concept called fair OSS.
link |
02:29:48.880
But one of the things we, when we started Quansight,
link |
02:29:51.000
we knew we would do is we develop products and ideas
link |
02:29:53.680
and new companies might come out.
link |
02:29:55.440
At Anaconda, this was clear, right?
link |
02:29:57.680
Anaconda, we did so much innovation
link |
02:30:00.240
that like five or six companies could have come out of that.
link |
02:30:02.960
And we just didn't structure it so they could.
link |
02:30:05.000
But in fact, they have, you look at Dask,
link |
02:30:07.240
there are two companies that have come out of Dask.
link |
02:30:08.880
You know, Bokeh could be a company.
link |
02:30:10.080
There's like lots of companies that could exist
link |
02:30:11.720
off the work we did there.
link |
02:30:13.120
And so I thought, oh, here's a recipe for an incubation,
link |
02:30:16.400
a concept that we could actually spawn new companies
link |
02:30:19.480
and new innovations.
link |
02:30:20.800
And then the idea has always been,
link |
02:30:22.800
well, money they earn should come back
link |
02:30:24.680
to fund the open source projects.
link |
02:30:26.520
So labs is, you know, I think there should be
link |
02:30:29.240
a lot of things like Quansight Labs.
link |
02:30:30.720
I think this concept is one that scales.
link |
02:30:32.560
You could have a lot of open source research labs.
link |
02:30:35.080
Along the way, so in 2018, when the bigger idea came,
link |
02:30:37.480
how to make open source investable, I said,
link |
02:30:38.800
oh, I need to write, I need to create a venture fund.
link |
02:30:41.120
So we created a venture fund called Quansight Initiate
link |
02:30:43.840
at the same time.
link |
02:30:44.680
It's an angel fund, really.
link |
02:30:45.520
It's, you know, we started to learn that process.
link |
02:30:47.840
How do we actually do this?
link |
02:30:48.680
How do we get LPs?
link |
02:30:49.520
How do we actually go in this direction and build a fund?
link |
02:30:52.480
And I'm like, every venture fund should have
link |
02:30:54.280
an associated open source research lab,
link |
02:30:55.720
which is no reason.
link |
02:30:56.560
Like our venture fund, the carried interest,
link |
02:30:59.520
a portion of it goes to the lab.
link |
02:31:01.840
It directly will fund the lab.
link |
02:31:03.280
That's fascinating, brother.
link |
02:31:04.120
So you use the power of the organic formation of teams
link |
02:31:06.800
in the open source community, and then like naturally,
link |
02:31:10.680
that leads to a business that can make money.
link |
02:31:13.920
Yeah, correct.
link |
02:31:14.760
And then it always maintains and loops back
link |
02:31:16.680
to the open source.
link |
02:31:17.520
Loops back to open source, exactly.
link |
02:31:18.440
I mean, to me, it's a natural fit.
link |
02:31:19.640
There's something, there's absolutely
link |
02:31:20.960
a repeatable pattern there, and it's also beneficial
link |
02:31:23.640
because, oh, I have, I have natural connections
link |
02:31:26.800
to the open source if I have an open source research lab.
link |
02:31:29.200
Like, they'll always, they'll be out there
link |
02:31:31.160
talking to people, and so we've had a chance
link |
02:31:34.280
to talk to a lot of early stage companies.
link |
02:31:35.920
And we, and our fund focuses on the early stage.
link |
02:31:37.880
So Quansight has the services, the lab, the fund, right?
link |
02:31:41.880
In that process, a lot of stuff started to happen.
link |
02:31:44.200
They're like, oh, you know, we started to do recruiting
link |
02:31:46.320
and support and training, and I was starting
link |
02:31:48.600
to build a bigger sales team and marketing team
link |
02:31:50.960
and people besides just developers.
link |
02:31:52.880
And one of the challenges with that
link |
02:31:54.080
is you end up with different cultural aspects.
link |
02:31:55.960
You know, developers, you know, there's a,
link |
02:31:58.800
in any company you go to, you kind of go look,
link |
02:32:00.760
is this a business led company, a developer led company?
link |
02:32:03.080
Do they kind of coexist?
link |
02:32:04.280
Are they, what's the interface between them?
link |
02:32:06.120
There's always a bit of a tension there.
link |
02:32:07.280
Like we were talking about before.
link |
02:32:08.760
You know, what is the tension there?
link |
02:32:10.200
With OpenTeams, I thought, wait a minute,
link |
02:32:11.360
we can actually just create,
link |
02:32:13.160
like this concept of Quansight plus labs,
link |
02:32:15.560
it's, well, it's specific to the PyData ecosystem.
link |
02:32:18.480
The concept is general for all open source.
link |
02:32:20.800
So OpenTeams emerged as a, oh,
link |
02:32:22.640
we can create a business development company
link |
02:32:24.400
for many, many Quansights, like thousands of Quansights.
link |
02:32:28.440
And it can be a marketplace to connect,
link |
02:32:30.840
essentially be the enterprise software company
link |
02:32:33.440
of the future.
link |
02:32:34.440
If you look at what enterprise software wants
link |
02:32:36.760
from the customer side, and during this journey,
link |
02:32:38.640
I've had the chance to work and sell to lots of companies,
link |
02:32:42.360
Exxon and Shell and JPMorgan and Bank of America,
link |
02:32:45.240
like the Fortune 100,
link |
02:32:46.680
and talk to a lot of people in procurement
link |
02:32:48.240
and see what are they buying and why are they buying?
link |
02:32:50.400
So, you know, I don't know everything,
link |
02:32:51.760
but I've learned a lot about,
link |
02:32:52.720
oh, what are they really looking for?
link |
02:32:54.480
And they're looking for solutions.
link |
02:32:56.400
They're constantly given products
link |
02:32:58.160
from enterprise software.
link |
02:33:01.160
Here's open source, leave the enterprise software,
link |
02:33:02.560
now I buy it.
link |
02:33:03.400
And then they have to stitch it together into a solution.
link |
02:33:05.880
Open source is fantastic for gluing
link |
02:33:07.360
those solutions together.
link |
02:33:08.760
So, whereas they keep getting new platforms
link |
02:33:11.480
they're trying to buy,
link |
02:33:12.360
but most open source, what most enterprises want
link |
02:33:15.000
is tools that they can customize
link |
02:33:16.800
that are as inexpensive as they can.
link |
02:33:18.920
Yeah, and so you always want to maintain
link |
02:33:20.400
the connection to the open source
link |
02:33:21.560
because that's going to be the tools.
link |
02:33:22.400
Yes, so OpenTeams is about solving
link |
02:33:24.840
enterprise software problems.
link |
02:33:26.720
Brilliant, brilliant idea, by the way.
link |
02:33:28.120
With a connect, but we do it honoring the topology.
link |
02:33:30.960
We don't hire all the people.
link |
02:33:32.360
We are a network connecting the sales energy
link |
02:33:35.120
and the procurement energy,
link |
02:33:36.520
and we work on the business side,
link |
02:33:37.960
get the deals closed,
link |
02:33:39.080
and then have a network of partners
link |
02:33:40.560
like Quansight and others whom we hand the deals to,
link |
02:33:44.080
to actually do the work.
link |
02:33:44.920
And then we have to maintain,
link |
02:33:46.480
I feel like we have to maintain
link |
02:33:47.320
some level of quality control
link |
02:33:48.760
so that the client can rely on OpenTeams
link |
02:33:50.960
to ensure the delivery.
link |
02:33:52.080
It's not just, here's a lead, go figure that out.
link |
02:33:54.640
But no, we're going to make sure you get what you need.
link |
02:33:57.040
By the way, it's such a skill,
link |
02:33:58.840
and I don't know if I have the patience.
link |
02:34:00.640
I will have the patience to talk to the business people
link |
02:34:04.080
or more specific, I mean,
link |
02:34:05.600
there's all kinds of flavors of business people
link |
02:34:07.480
or like marketing people.
link |
02:34:11.960
There's a challenge.
link |
02:34:12.800
I hear what you're saying
link |
02:34:13.640
because I've had the same challenge.
link |
02:34:14.880
And it's true.
link |
02:34:15.720
There are times you think, okay, this is way overwrought.
link |
02:34:18.440
Yeah, but you have to become an adult
link |
02:34:20.240
and you have to, because the companies have needs.
link |
02:34:22.320
They have ways to make money
link |
02:34:24.320
and they also want to learn and grow,
link |
02:34:26.480
and it's your job to kind of educate them on the best way,
link |
02:34:28.960
like the value of open source, for example.
link |
02:34:31.000
Right, and I'm really grateful for all my experiences
link |
02:34:32.960
over the past 14 years, understanding that side of it
link |
02:34:35.720
and still learning for sure,
link |
02:34:37.160
but not just understanding from companies,
link |
02:34:38.640
but also dealing with marketing professionals
link |
02:34:40.560
and sales professionals
link |
02:34:41.600
and people that make a career out of that
link |
02:34:43.120
and understanding what they're thinking about
link |
02:34:44.360
and also understanding, well, let's make this better.
link |
02:34:46.840
We can really make a place.
link |
02:34:48.160
I see OpenTeams as the transmission layer
link |
02:34:50.480
between companies and open source communities
link |
02:34:53.720
producing enterprise software solutions.
link |
02:34:55.600
Eventually we want to,
link |
02:34:56.880
today we're taking on SAS and MATLAB
link |
02:34:59.320
and tools that we know we can replace for folks.
link |
02:35:01.720
Really, anytime you have a software tool at an organization
link |
02:35:04.560
where you have to do a lot of customization
link |
02:35:06.200
to make it work for you.
link |
02:35:07.360
It's not you're just buying this thing off the shelf
link |
02:35:09.000
and it works.
link |
02:35:09.840
It's like, okay, you buy this system
link |
02:35:11.080
and then you customize it a lot,
link |
02:35:12.840
usually with expensive consultants
link |
02:35:15.280
to actually make it work for you.
link |
02:35:17.200
All of those should be replaced by open source foundations
link |
02:35:19.760
with the same customization.
link |
02:35:20.600
You're doing such important work,
link |
02:35:22.360
such important work in these giant organizations
link |
02:35:25.440
that do exactly that,
link |
02:35:26.520
taking some proprietary software
link |
02:35:28.360
and hiring a huge team of consultants
link |
02:35:30.520
that customize it and then that whole thing
link |
02:35:32.760
gets outdated quick.
link |
02:35:33.680
Correct.
link |
02:35:34.520
And so, I mean, that's brilliant.
link |
02:35:36.760
So the one solution to that
link |
02:35:39.360
is kind of what Tesla's doing a little bit of,
link |
02:35:43.240
which is basically build up a software engineering team.
link |
02:35:46.680
Like build a team from scratch.
link |
02:35:48.320
Build a team from scratch.
link |
02:35:49.160
And companies are doing it well,
link |
02:35:50.000
that's what they're doing right now.
link |
02:35:50.840
Yeah, exactly.
link |
02:35:51.680
And that's okay.
link |
02:35:52.520
And you're creating a topology for some of that.
link |
02:35:54.360
You're right.
link |
02:35:55.200
You just don't have to do it.
link |
02:35:56.040
That's not the only answer, right?
link |
02:35:57.040
And so other companies can access this,
link |
02:35:58.880
be more accessible.
link |
02:35:59.880
We literally say,
link |
02:36:01.120
OpenTeams is the future of enterprise software.
link |
02:36:03.920
We're still early.
link |
02:36:04.760
Like this idea just percolated over the past year
link |
02:36:07.400
as we've kind of grown Quansight
link |
02:36:08.520
and realized the extensibility of it.
link |
02:36:10.440
We just finished our seed round
link |
02:36:13.240
to help get more sales people
link |
02:36:15.160
and then push the messaging correctly.
link |
02:36:17.640
And there's lots of tools we're building
link |
02:36:19.160
to make this easier.
link |
02:36:20.000
Like we wanna automate the processes.
link |
02:36:21.720
We feel like a lot of the power
link |
02:36:23.560
is the efficiency of the sales process.
link |
02:36:25.600
There's a lot of wasted energy in small teams
link |
02:36:29.400
and the sales energy to get into large companies
link |
02:36:31.640
and make a deal.
link |
02:36:32.680
There's a lot of money spent on that process.
link |
02:36:34.720
Creating the tools and processes for that sales.
link |
02:36:36.560
So make that super seamless.
link |
02:36:38.160
So a single company can go,
link |
02:36:39.680
oh, I've got my contract with OpenTeams.
link |
02:36:41.400
We've got a subscription they can get.
link |
02:36:43.040
They can make that procurement seamless.
link |
02:36:45.200
And then the fact they have access
link |
02:36:46.720
to the entire open source ecosystem.
link |
02:36:48.840
And we have a part of our work
link |
02:36:51.240
that's embracing open source ecosystems
link |
02:36:53.400
and making sure we're doing things useful for them
link |
02:36:55.080
or serving them.
link |
02:36:56.160
And then companies making sure
link |
02:36:57.560
they're getting solutions they care about.
link |
02:36:59.200
And then figuring out which targets we have.
link |
02:37:02.480
We're not taking on all of open source,
link |
02:37:04.720
all of enterprise software yet.
link |
02:37:06.040
But we're step by step.
link |
02:37:07.440
Well, this feels like the future.
link |
02:37:08.520
The idea and the vision is brilliant.
link |
02:37:10.600
Can I ask you, why do you think Microsoft bought GitHub
link |
02:37:14.440
and what do you think is the future of GitHub?
link |
02:37:16.560
Great point.
link |
02:37:17.400
I thought it was a brilliant move.
link |
02:37:18.220
I think they did because Microsoft has always
link |
02:37:20.620
had a developer centric culture.
link |
02:37:22.660
Like they always have.
link |
02:37:23.500
Like one of the things Microsoft's always done well
link |
02:37:25.160
is understand that their power is the developers.
link |
02:37:27.440
It's been, Ballmer didn't necessarily make a good meme
link |
02:37:31.600
about how he approached that.
link |
02:37:32.560
But they're broadening that.
link |
02:37:34.520
I think that's why.
link |
02:37:35.360
Because they recognize GitHub is where developers are at.
link |
02:37:38.080
Right?
link |
02:37:38.920
And so.
link |
02:37:39.740
But do they have a vision like an OpenTeams
link |
02:37:41.080
type of situation, right?
link |
02:37:41.920
I don't think so yet.
link |
02:37:43.600
Are they just basically throwing money at developers
link |
02:37:46.680
to show their support?
link |
02:37:47.960
I think so.
link |
02:37:48.800
Without a topology like you put it.
link |
02:37:50.840
Like a way to leverage that.
link |
02:37:53.280
Like to give developers actual money.
link |
02:37:55.480
Right.
link |
02:37:56.320
I don't think so.
link |
02:37:57.160
They're still, it's an enterprise software company.
link |
02:37:59.440
And they make a bunch of money.
link |
02:38:00.520
They make a bunch of games.
link |
02:38:01.360
They're a big company.
link |
02:38:02.640
They sell products.
link |
02:38:03.760
I think part of it is they know there's opportunity
link |
02:38:06.080
to make money from GitHub.
link |
02:38:07.760
Right?
link |
02:38:08.600
There's definitely a business there.
link |
02:38:09.440
You know, to sell to developers.
link |
02:38:11.340
Or to sell to people using development.
link |
02:38:13.280
I think there's part of that.
link |
02:38:14.240
I think part of it is also there's,
link |
02:38:15.880
they definitely wanted to recognize
link |
02:38:18.080
that you need to value open source
link |
02:38:20.560
to get great developers.
link |
02:38:21.920
Which is an important concept that was emerging
link |
02:38:24.000
over the past 10 years.
link |
02:38:25.000
Back at PyData, you know,
link |
02:38:28.000
We were able to convince J.P. Morgan
link |
02:38:29.880
to support PyData because of that fact.
link |
02:38:31.480
Right?
link |
02:38:32.320
That was the reason for them putting
link |
02:38:33.440
a couple hundred thousand into supporting PyData
link |
02:38:35.160
for several conferences: they want developers.
link |
02:38:37.800
And they realized that developers want
link |
02:38:39.480
to participate in open source.
link |
02:38:40.720
So enterprise software folks don't always understand
link |
02:38:43.200
how their software gets used.
link |
02:38:44.600
Having spent a lot of time on the floors
link |
02:38:46.560
at J.P. Morgan, at Shell, at ExxonMobil,
link |
02:38:49.600
you see, oh, these companies have large development teams.
link |
02:38:52.880
And then they're kind of dealing with
link |
02:38:55.280
what's being delivered to them.
link |
02:38:56.720
So I really feel kind of a privilege
link |
02:38:58.360
that I had a chance to learn from some of these people
link |
02:39:00.480
and see what they're doing.
link |
02:39:01.800
And even work alongside them, you know,
link |
02:39:04.000
as a consultant, using open source and trying to figure out,
link |
02:39:07.640
how do we make this work inside of our large organization?
link |
02:39:09.960
Some of it is actually, for a large organization,
link |
02:39:13.000
some of it is messaging to the world
link |
02:39:14.800
that you care about developers
link |
02:39:16.280
and that you're cool, that you care.
link |
02:39:18.840
Like, for example, like if Ford,
link |
02:39:21.040
cause I talked to them, like car companies, right?
link |
02:39:23.880
They want to attract, you know,
link |
02:39:26.680
you want to take on Tesla and autopilot.
link |
02:39:28.760
You want to take on, right?
link |
02:39:29.960
And so what do you do there?
link |
02:39:31.720
You show that you're cool.
link |
02:39:32.960
Like you try to show off that you care about developers
link |
02:39:36.480
and they have a lot of trouble doing that.
link |
02:39:39.040
And like one way, I think like Ford should have bought GitHub.
link |
02:39:42.720
Just to show off, like these old-school companies
link |
02:39:46.880
and it's in a lot of different industries.
link |
02:39:49.960
There's probably different ways.
link |
02:39:51.080
There's probably an art to showing that you care about developers.
link |
02:39:54.080
And the developers, it's exactly what you, like,
link |
02:39:57.920
for example, just spitballing here,
link |
02:40:00.520
but like Ford or somebody like that
link |
02:40:02.520
could give a hundred million dollars
link |
02:40:05.960
to the development of NumPy.
link |
02:40:07.880
And like literally look at like the top most popular projects
link |
02:40:13.200
in Python and just say, we're just going to give money.
link |
02:40:17.080
Like that's going to immediately make you cool.
link |
02:40:20.240
They could actually, yeah.
link |
02:40:21.600
And in fact, they set up NumFOCUS to make it easy.
link |
02:40:24.400
But the challenge was,
link |
02:40:26.080
is also that you have to have some business development.
link |
02:40:28.480
Like it's a bit of a seeding problem, right?
link |
02:40:31.280
And you look at how,
link |
02:40:32.120
I've talked to the folks at Linux Foundation,
link |
02:40:33.400
I know how they're doing it.
link |
02:40:34.360
I know how, from starting NumFOCUS,
link |
02:40:36.600
because we had two babies in 2012.
link |
02:40:39.400
One was Anaconda, one was NumFOCUS, right?
link |
02:40:41.120
And they were both important efforts.
link |
02:40:42.760
They had distinct journeys
link |
02:40:44.000
and super grateful that both existed
link |
02:40:46.200
and still grateful both exist.
link |
02:40:48.720
But there's a different energy in getting donations
link |
02:40:51.840
than there is in getting, "this is important to my business."
link |
02:40:55.320
Like I'm selling you something that this is a,
link |
02:40:58.680
I'm going to make money this way.
link |
02:41:00.280
Like if you can tie it,
link |
02:41:01.120
if you can tie the message to an ROI for the company,
link |
02:41:04.040
it becomes a no-brainer.
link |
02:41:04.880
That's more effective.
link |
02:41:05.720
It's much more effective, right?
link |
02:41:06.920
So, and there are rational arguments to make.
link |
02:41:09.520
I've tried to have conversations with marketing,
link |
02:41:11.120
especially marketing departments.
link |
02:41:12.240
Like very early on, it was clear to me that,
link |
02:41:14.840
oh, you could just take a fraction of your marketing budget
link |
02:41:18.160
and just spend it on open source development.
link |
02:41:20.240
And you get better results from your marketing.
link |
02:41:23.760
Like, because.
link |
02:41:24.600
How did those, can I, sorry,
link |
02:41:26.000
I'm going to try not to go on rants here.
link |
02:41:27.920
What have you learned from the interaction
link |
02:41:29.800
with the marketing folks on that kind of,
link |
02:41:31.440
because you gave a great example
link |
02:41:34.160
of something that will obviously be a much better investment
link |
02:41:37.240
in terms of marketing is supporting open source projects.
link |
02:41:40.360
The challenge is not dissimilar
link |
02:41:41.840
from the challenge you have in academia
link |
02:41:44.480
or the different colleges, right?
link |
02:41:46.520
Knowledge gets very specific and very channeled, right?
link |
02:41:50.000
And so people get,
link |
02:41:51.160
they get a lot of learning in the thing they know about.
link |
02:41:53.920
And it's hard then to bridge that
link |
02:41:56.200
and to get them to think differently enough
link |
02:41:58.160
to have a sense that you might have something to offer
link |
02:42:02.160
because it's different.
link |
02:42:03.000
It's like, well, how do I implement that?
link |
02:42:04.280
How do I, what do I do with that?
link |
02:42:05.840
Like, do I, which budget do I take from?
link |
02:42:07.840
Do I slow down my spend on Google ads
link |
02:42:10.320
or my spend on Facebook ads?
link |
02:42:11.600
Or do I not hire a content creator and say like,
link |
02:42:14.640
there's an operational aspect to that,
link |
02:42:16.160
that you have to be the CMO, right?
link |
02:42:19.080
Or the CEO, you have to get the right level.
link |
02:42:21.000
So you'll have to hire at a high position level
link |
02:42:24.360
where they care about this and this.
link |
02:42:25.720
Right, or they won't know how, right?
link |
02:42:27.640
And because you can also do it very clumsily, right?
link |
02:42:30.440
And I've seen it, cause you can,
link |
02:42:32.040
you absolutely have to honor and recognize
link |
02:42:33.760
the people you're going to and the fact
link |
02:42:36.640
that if you just throw money at them,
link |
02:42:37.800
it could actually create more problems.
link |
02:42:39.240
Can I just say, this is not you saying, can I just,
link |
02:42:41.320
cause I just need, I need to say this.
link |
02:42:44.360
I've been very surprised how often marketing people
link |
02:42:49.880
are terrible at marketing.
link |
02:42:51.760
I feel like the best marketing is doing something novel
link |
02:42:55.600
and unique that anticipates the future.
link |
02:42:58.240
It feels like so much of the marketing practice
link |
02:43:01.520
is like what they took in school,
link |
02:43:04.320
or maybe they're studying for what was the best thing
link |
02:43:06.680
that was done in the past decade,
link |
02:43:08.440
and they're just repeating that over and over,
link |
02:43:10.800
as opposed to innovating, like taking the risk.
link |
02:43:13.760
To me, marketing.
link |
02:43:14.600
That's a great point.
link |
02:43:15.440
Is taking the big risk.
link |
02:43:17.080
That's a great point.
link |
02:43:17.920
And being the first one to risk.
link |
02:43:18.800
Yeah, there's an aspect of data observation
link |
02:43:21.200
from that risk, right?
link |
02:43:22.160
That's, I think, shared what they're doing already.
link |
02:43:25.120
But it absolutely, it's about, I think it's content.
link |
02:43:27.680
Like there's this whole world on content marketing
link |
02:43:30.200
that you could almost say, well, yeah, it can get over,
link |
02:43:33.560
you can get inundated with stuff
link |
02:43:35.080
that's not relevant to you.
link |
02:43:36.400
Whereas what you're saying would be highly relevant
link |
02:43:39.160
and highly useful and highly beneficial.
link |
02:43:41.560
Yeah, but it's risk.
link |
02:43:42.960
I mean, that's why I sort of,
link |
02:43:44.600
there's a lot of innovative ways of doing that.
link |
02:43:46.240
Tesla's an example of people
link |
02:43:48.000
that basically don't do marketing.
link |
02:43:49.960
They do marketing in a very, like,
link |
02:43:52.800
let's say Elon hired a person who's just good at Twitter
link |
02:43:55.720
for running Tesla's Twitter account.
link |
02:43:57.520
No, right, right.
link |
02:43:59.120
I mean, that's exactly what you wanna be doing.
link |
02:44:00.840
You want it to be constantly innovating in the.
link |
02:44:03.120
Right, there's an aspect of telling.
link |
02:44:04.280
I mean, I've definitely seen people doing great work
link |
02:44:06.920
where you're not talking about it.
link |
02:44:08.400
Like, I would say that's actually a problem
link |
02:44:09.560
I have right now with Quansight Labs.
link |
02:44:11.360
Quansight Labs has been doing amazing work,
link |
02:44:12.720
really excited about it,
link |
02:44:13.560
but we have not been talking about it enough.
link |
02:44:15.480
We haven't been.
link |
02:44:16.320
And there's different ways to talk about it.
link |
02:44:17.880
There's different ways to,
link |
02:44:18.720
there's different channels through which to communicate.
link |
02:44:20.800
There's also, like, I'll just throw some shade
link |
02:44:25.600
at companies I love.
link |
02:44:27.880
So for example, iRobot,
link |
02:44:29.160
I just had a conversation with them.
link |
02:44:30.800
They make Roombas.
link |
02:44:31.840
Sure.
link |
02:44:32.680
And I think I love, they're incredible robots,
link |
02:44:35.440
but like every time they do like advertisement,
link |
02:44:38.960
not advertisement, but like marketing type stuff,
link |
02:44:41.880
it just looks so corporate.
link |
02:44:44.080
And to me, the incredible,
link |
02:44:47.640
I may be wrong in the case of iRobot, I don't know.
link |
02:44:50.280
But to me, when you're talking about engineering systems,
link |
02:44:54.000
it's really nice to show off the magic of the engineering
link |
02:44:57.000
and the software and all the geniuses behind this product
link |
02:45:02.000
and the tinkering and like the raw authenticity
link |
02:45:05.080
of what it takes to build that system
link |
02:45:06.800
versus the marketing people who want to have like
link |
02:45:09.960
pretty people, like standing there all pretty
link |
02:45:12.120
with the robots, like moving perfectly.
link |
02:45:14.600
So to me, there's some aspect,
link |
02:45:16.520
it's like speaking to the hackers,
link |
02:45:18.040
you have to throw some bones,
link |
02:45:21.040
some care towards the engineers, the developers,
link |
02:45:25.560
because there's some aspect, one, for the hiring,
link |
02:45:28.720
but two, there's an authenticity to that,
link |
02:45:31.000
authenticity to that kind of communication
link |
02:45:33.080
that's really inspiring to the end user as well.
link |
02:45:36.080
Like if they know that brilliant people,
link |
02:45:38.440
the best in the world are working at your company,
link |
02:45:40.680
they start to believe that that product
link |
02:45:42.640
that you're creating is really good.
link |
02:45:43.960
It's interesting, because your initial reaction would be,
link |
02:45:45.640
wait, there's different users here.
link |
02:45:46.760
Why would you do that to, you know,
link |
02:45:48.400
my wife bought a Roomba, and she loves developers,
link |
02:45:52.120
she loves me, but she doesn't care about that culture.
link |
02:45:56.560
So essentially what you said is actually the authenticity,
link |
02:45:59.600
because everyone has a friend, everyone knows people,
link |
02:46:01.160
there's word of mouth, I mean, if you.
link |
02:46:02.680
Word of mouth is so, so powerful.
link |
02:46:04.160
Yeah, exactly, that's interesting.
link |
02:46:05.640
Because I think it's the lack of that realization,
link |
02:46:07.560
there's this halo effect that influences
link |
02:46:09.840
your general marketing, interesting.
link |
02:46:11.720
For some stupid reason, I do have a platform,
link |
02:46:14.640
and it seems that the reason I have a platform,
link |
02:46:16.920
many others like me, millions of others,
link |
02:46:19.480
is like the authenticity,
link |
02:46:21.160
and like we get excited naturally about stuff.
link |
02:46:23.960
And like, I don't want to get excited
link |
02:46:25.760
about that iRobot video,
link |
02:46:27.800
because it's boring, it's marketing, it's corporate,
link |
02:46:30.760
as opposed to, I wanted to do something fun,
link |
02:46:33.600
this is me, like a shout out to iRobot,
link |
02:46:36.240
is they're not letting me get into the robot.
link |
02:46:39.360
Yeah, well there's an aspect of,
link |
02:46:40.920
that could be benefiting from a culture of modularity,
link |
02:46:44.840
like add ons, and that could actually dramatically help.
link |
02:46:47.840
You've seen that over history,
link |
02:46:49.300
I mean, Apple is an example of a company like that,
link |
02:46:51.160
or the, like, I can see what your point is,
link |
02:46:54.400
is that you have something that needs to be,
link |
02:46:56.920
it needs to be adopted broadly,
link |
02:46:58.240
the concept needs to be adopted broadly.
link |
02:47:00.040
And if you want to go beyond this one device,
link |
02:47:01.640
you need to engage this community.
link |
02:47:04.220
Yeah, and connecting to the open source that you said.
link |
02:47:07.560
I gotta ask you,
link |
02:47:09.960
you're a programmer,
link |
02:47:11.800
one of the most impactful programmers ever.
link |
02:47:14.840
You've led many programmers, you lead many programmers.
link |
02:47:18.560
What are some, from a programmer perspective,
link |
02:47:21.180
what makes a good programmer?
link |
02:47:23.360
What makes a productive programmer?
link |
02:47:25.000
Is there advice you can give
link |
02:47:27.140
to be a great programmer in this world?
link |
02:47:28.480
That's a great, great question.
link |
02:47:30.280
And there are times in my life
link |
02:47:31.640
I'd probably answer this even better
link |
02:47:32.920
than the answer I can give today.
link |
02:47:35.040
Because I've thought about this numerous times,
link |
02:47:36.700
like right now I've spent so much time
link |
02:47:38.280
recently hiring salespeople that,
link |
02:47:41.000
That your mind is a little bit on something else.
link |
02:47:43.440
On something else.
link |
02:47:44.280
But I reflected on the past,
link |
02:47:46.080
and also, you know, I have some really,
link |
02:47:48.160
the only way I can do this,
link |
02:47:49.000
is I have some really great programmers that I work with,
link |
02:47:51.440
who lead the teams that they lead.
link |
02:47:53.240
And my goal is to inspire them and hopefully help them,
link |
02:47:56.600
encourage them, and be,
link |
02:47:57.800
help them encourage their teams.
link |
02:47:59.620
I would say there's a number of things, couple things.
link |
02:48:01.200
One is curiosity.
link |
02:48:03.860
Like you, I think a programmer without curiosity
link |
02:48:07.700
is mundane.
link |
02:48:09.600
Like you'll lose interest, you won't do your best work.
link |
02:48:12.240
So it's sort of, it's an affect.
link |
02:48:13.640
It's sort of, are you,
link |
02:48:14.480
you have some curiosity about things.
link |
02:48:16.800
I think two, don't try to do everything at once.
link |
02:48:19.600
Recognize that you're, you know, we're limited as humans.
link |
02:48:21.960
You're limited as a human.
link |
02:48:23.200
And each one of us is limited in different ways.
link |
02:48:24.920
You know, we all have our different strengths and skills.
link |
02:48:26.600
So it's adapting the art of programming to your skills.
link |
02:48:29.880
One of the things that always works,
link |
02:48:31.240
is to limit what you're trying to solve.
link |
02:48:33.580
Right, so, if you're part of a team,
link |
02:48:36.640
usually maybe somebody else has put the architecture together
link |
02:48:38.920
and they've given a portion to you if you're young.
link |
02:48:41.720
If you're not part of a team,
link |
02:48:43.440
it's sort of breaking down the problem into smaller parts,
link |
02:48:46.640
is essential for you to make progress.
link |
02:48:48.620
It's very easy to take on a big project
link |
02:48:50.720
and try to do it all at once, and you get lost.
link |
02:48:52.800
And then you do it badly.
link |
02:48:53.680
And so thinking about, you know,
link |
02:48:57.700
very concretely what you're doing,
link |
02:48:59.400
defining the inputs and outputs,
link |
02:49:01.440
defining what you want to get done.
link |
02:49:03.960
Even just talking about that and like writing down
link |
02:49:07.280
before you write code, just what are you trying to accomplish?
link |
02:49:09.440
I mean, being very specific about it really, really helps.
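As a minimal sketch of that advice (the task, the function names, and the sample data below are hypothetical illustrations, not something described in the conversation), here is a small Python problem broken into pieces whose inputs and outputs are written down in the signatures and docstrings before the bodies are filled in:

```python
from statistics import mean


def parse_line(line: str) -> float:
    """Input: one text line like 'sensor_a,3.5'. Output: the numeric reading."""
    _name, value = line.split(",")
    return float(value)


def parse_all(text: str) -> list[float]:
    """Input: raw multi-line text. Output: one reading per non-blank line."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


def summarize(values: list[float]) -> dict[str, float]:
    """Input: a non-empty list of readings. Output: their min, max and mean."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Hypothetical sample input; the blank line is skipped by parse_all.
raw = "sensor_a,3.5\nsensor_b,1.5\n\nsensor_c,4.0\n"
print(summarize(parse_all(raw)))  # prints {'min': 1.5, 'max': 4.0, 'mean': 3.0}
```

Each function is small enough to test on its own, which is one way of keeping a bigger job from turning into the "try to do it all at once" situation described above.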
link |
02:49:12.800
I think using other people's work, right?
link |
02:49:17.000
Don't be afraid that somehow you're,
link |
02:49:20.000
like you should do it all.
link |
02:49:21.280
Like, nobody does.
link |
02:49:23.240
Stand on the shoulders of giants.
link |
02:49:25.240
And copy and paste from Stack Overflow.
link |
02:49:26.720
Copy and paste from Stack Overflow.
link |
02:49:28.200
But don't just copy and paste,
link |
02:49:30.040
this is particularly relevant in the era of Codex
link |
02:49:31.760
and the auto-generated code, which is essentially,
link |
02:49:34.960
I see as an indexing of Stack Overflow.
link |
02:49:36.760
Right, exactly.
link |
02:49:37.600
Secondly, it's like.
link |
02:49:38.440
It's a search engine.
link |
02:49:39.280
It's a search engine over Stack Overflow, basically.
link |
02:49:41.280
So it's not, I mean, we've had this for a while.
link |
02:49:43.480
But really, you want to cut and paste, but not blindly.
link |
02:49:47.300
Like, absolutely I've cut and pasted to understand,
link |
02:49:51.000
but then you understand.
link |
02:49:52.320
Oh, this is what this means.
link |
02:49:53.640
Oh, this is what it's doing.
link |
02:49:54.920
And understand as much as you can.
link |
02:49:56.680
So it's critical, that's where the curiosity comes in.
link |
02:49:59.080
If you're just blindly cutting and pasting,
link |
02:50:01.000
you're not gonna understand.
link |
02:50:02.240
So understand, and then be sensitive to hype cycles.
link |
02:50:08.600
Right, every so often there's always a,
link |
02:50:10.920
oh, test-driven development is the answer.
link |
02:50:12.520
Oh, object-oriented is the answer.
link |
02:50:13.800
Oh, there's always an answer.
link |
02:50:16.520
Agile is the answer.
link |
02:50:18.400
Be cautious of jumping onto a hype cycle.
link |
02:50:20.840
Like, likely there's signal.
link |
02:50:22.520
Like, there's a thing there
link |
02:50:23.440
that's actually valuable, you can learn from.
link |
02:50:25.320
But it's almost certainly not the answer
link |
02:50:27.720
to everything you need.
link |
02:50:28.960
What lessons do you draw
link |
02:50:30.160
from you having created NumPy and SciPy?
link |
02:50:34.100
Like, in service of sort of answering the question
link |
02:50:37.200
of what it takes to be a great programmer
link |
02:50:38.840
and giving advice to people.
link |
02:50:40.520
How can you be the next person to create a SciPy?
link |
02:50:42.960
Yeah, so one is listen.
link |
02:50:45.640
To?
link |
02:50:46.480
Listen.
link |
02:50:47.300
To who?
link |
02:50:48.140
To people that have a problem, right?
link |
02:50:51.440
Which is everybody, right?
link |
02:50:52.520
But listen, and listen to many.
link |
02:50:54.960
And then try to, and then do.
link |
02:50:57.460
Like, you're gonna have to do an experiment, you know?
link |
02:50:59.760
Do, fall down, don't be afraid to fall down.
link |
02:51:01.940
Don't be afraid, the first thing you do
link |
02:51:04.240
is probably gonna suck, and that's okay, right?
link |
02:51:07.600
It's honestly, I think iteration is the key to innovation.
link |
02:51:11.240
And it's almost that psychological hesitation we have
link |
02:51:16.240
to just iterate.
link |
02:51:18.520
Like, yeah, we know it's not great,
link |
02:51:20.560
but next time it'll be better.
link |
02:51:22.000
I mean, just keep learning and keep improving.
link |
02:51:25.560
So it's an attitude.
link |
02:51:27.700
And then it does take intense concentration, right?
link |
02:51:32.160
Good things don't happen just,
link |
02:51:34.560
it's not quite like TikTok or like Facebook, you know?
link |
02:51:38.200
You can't scroll your way to good programming, right?
link |
02:51:40.520
There are sincere hours of deep,
link |
02:51:44.720
don't be afraid of the deep problem.
link |
02:51:46.040
Like, often people will run away from something
link |
02:51:47.680
because, oh, I can't solve this.
link |
02:51:49.000
And you might be right, but give it an hour.
link |
02:51:51.360
Give it a couple of hours and see.
link |
02:51:53.360
Just five minutes is not gonna give you that.
link |
02:51:56.560
Was it lonely when you were building SciPy and NumPy?
link |
02:52:00.520
Hugely, yeah, absolutely lonely,
link |
02:52:02.520
in the sense of you had to have an inner drive,
link |
02:52:05.760
and that inner drive for me always comes from,
link |
02:52:08.000
I have to see that this is right in some angle.
link |
02:52:11.640
I have to believe it, that this is the right approach,
link |
02:52:13.360
the right thing to do.
link |
02:52:14.720
With SciPy, it was like, oh yeah,
link |
02:52:16.400
the world needs libraries in Python.
link |
02:52:19.080
Clearly Python's popular enough
link |
02:52:20.720
with enough influential people to start,
link |
02:52:22.960
and it needs more libraries.
link |
02:52:24.640
So that is a good in and of itself.
link |
02:52:26.600
So I'm gonna go do that good.
link |
02:52:28.360
So find a good, find a thing that you know is good
link |
02:52:30.360
and just work on it.
link |
02:52:33.040
So that has to happen, and it is.
link |
02:52:34.720
And you kind of have to have enough realization
link |
02:52:37.000
of your mission to be okay with the naysayers
link |
02:52:40.280
or the fact that not everybody joins you up front.
link |
02:52:42.200
In fact, one thing I've talked to people a lot,
link |
02:52:43.480
I've seen a lot of projects come, and some fail.
link |
02:52:45.480
Not everything I've done has actually worked perfectly.
link |
02:52:47.600
I've tried a bunch of stuff that, okay,
link |
02:52:49.160
that didn't really work, or this isn't working, and why.
link |
02:52:51.920
But you see the patterns, and one of the key things is
link |
02:52:55.800
you can't even know for six months.
link |
02:52:59.040
I say 18 months right now.
link |
02:53:00.200
If you're starting a new project,
link |
02:53:01.800
you gotta give it a good 18 month run
link |
02:53:03.200
before you even know if the feedback's there.
link |
02:53:05.920
You're not gonna know in six months.
link |
02:53:07.880
You might have the perfect thing,
link |
02:53:08.720
but six months in, it's still kind of emerging.
link |
02:53:11.480
So give it time, because you're dealing with humans,
link |
02:53:13.360
and humans have an inertial energy
link |
02:53:15.960
that just doesn't change that quickly, so.
link |
02:53:18.680
Let me ask a silly question, but like you said,
link |
02:53:23.560
you're focused on the sales side of things currently,
link |
02:53:26.120
but back when you were actively programming,
link |
02:53:28.960
maybe in the 90s, you talked about IDEs.
link |
02:53:31.680
What's a setup that you have that brings you joy?
link |
02:53:36.200
Keyboard, number of screens, Linux.
link |
02:53:39.640
I do still like to program some.
link |
02:53:40.920
It's not as much as I used to.
link |
02:53:42.160
I have two projects I'm super interested in,
link |
02:53:44.560
trying to find funding for them,
link |
02:53:45.640
trying to figure out teams for them,
link |
02:53:47.200
but I could talk about those.
link |
02:53:49.040
But what I, yeah, I'm an Emacs guy.
link |
02:53:51.960
Great, the superior editor, everybody.
link |
02:53:56.080
I've got, I don't often delete tweets,
link |
02:53:59.000
but one of the tweets I deleted
link |
02:54:00.600
was when I said Emacs was better than Vim,
link |
02:54:02.840
and then the hate I got from it.
link |
02:54:04.520
It is.
link |
02:54:05.360
I was like, I'm walking away from this.
link |
02:54:07.640
I do too, I don't push it.
link |
02:54:09.160
I mean, I'm not.
link |
02:54:10.000
I'm just joking, of course.
link |
02:54:11.080
Yeah, exactly, it's kind of like,
link |
02:54:12.160
but people do take the editor seriously, right?
link |
02:54:14.520
I did it as a joke.
link |
02:54:15.360
That's your life.
link |
02:54:16.200
It is, but there's something beautiful to me about Emacs,
link |
02:54:20.760
but for people that love Vim,
link |
02:54:22.360
there's something beautiful to them about that.
link |
02:54:23.200
There is.
link |
02:54:24.040
I mean, I do use Vim for quick editing.
link |
02:54:26.280
Like on the command line, if I need quick editing,
link |
02:54:27.880
I will still sometimes use it, but not much.
link |
02:54:30.280
Like, it's simple, quick, single-character editing.
link |
02:54:32.760
So when you were developing SciPy, you were using Emacs?
link |
02:54:34.920
Emacs, yeah.
link |
02:54:35.880
SciPy and NumPy were all written in Emacs on a Linux box.
link |
02:54:39.040
And CVS and then SVN, version control.
link |
02:54:43.160
Git came later.
link |
02:54:44.040
Like Git has, I love distributed branch stuff.
link |
02:54:48.080
I think Git is pretty complicated, but I love the concept.
link |
02:54:51.640
And also, of course, GitHub and then GitLab
link |
02:54:55.240
make Git definitely consumable, but that came later.
link |
02:54:59.440
Did you ever touch Lisp at all?
link |
02:55:00.880
Like what were your emotional feelings
link |
02:55:03.400
about all the parentheses?
link |
02:55:04.240
Yeah, so great question.
link |
02:55:05.440
So I find myself appreciating Lisp today
link |
02:55:08.240
much more than I did early on.
link |
02:55:09.680
Because when I came to programming, I knew programming,
link |
02:55:11.680
but I was a domain expert, right?
link |
02:55:13.000
And to me, the parentheses were in the way.
link |
02:55:15.720
It's like, wow, there's just all this,
link |
02:55:17.800
like it just gets in the way of my thinking
link |
02:55:19.320
about what I'm doing.
link |
02:55:20.160
So why would I have all these, right?
link |
02:55:22.440
That was my initial reaction to it.
link |
02:55:24.760
And now as I appreciate kind of the structure
link |
02:55:27.320
that kind of naturally maps to a logical thinking
link |
02:55:30.280
about a program, I can appreciate them, right?
link |
02:55:33.000
And why it's actually, you could create editors
link |
02:55:35.680
that make it not so problematic, right, honestly.
link |
02:55:40.720
So I actually have much more appreciation of Lisp
link |
02:55:43.000
and things like Clojure, and there's Hy,
link |
02:55:44.720
which is a Python Lisp that compiles to Python bytecode.
link |
02:55:48.560
I think it's challenging.
link |
02:55:50.280
Like typically these languages are,
link |
02:55:53.160
I even saw the whole data science programming system
link |
02:55:55.280
in Lisp that somebody created, which is cool.
link |
02:55:58.360
But again, I think it's the lack of recognition
link |
02:56:00.840
of the fact that there exists
link |
02:56:02.020
what I call occasional programmers.
link |
02:56:04.080
People that are never gonna be programmers for a living.
link |
02:56:05.840
They don't want to have all this cuteness in their head.
link |
02:56:08.440
They want just, it's why BASIC, you know,
link |
02:56:11.880
Microsoft had the right idea with BASIC
link |
02:56:14.480
in terms of having that be the language of Visual Basic,
link |
02:56:17.660
the language of Excel and SQL Server.
link |
02:56:21.280
They should have converted that to Python 10 years ago.
link |
02:56:23.520
Like the world would be a better place if they had, but.
link |
02:56:27.200
There's also, there's a beauty and a magic
link |
02:56:29.660
to the history behind a language like Lisp.
link |
02:56:31.640
You know, some of the most interesting people
link |
02:56:34.020
in the history of computer science
link |
02:56:35.880
and artificial intelligence have used Lisp.
link |
02:56:37.920
So you feel.
link |
02:56:40.000
Well, especially that language,
link |
02:56:41.200
when you have a language, you can think in it.
link |
02:56:43.440
And it helps you think better.
link |
02:56:44.280
And it attracts a certain kind of people
link |
02:56:45.640
that think in a certain kind of way.
link |
02:56:46.920
And then that's there.
link |
02:56:48.560
Okay, so what about, like, a small laptop with a tiny keyboard,
link |
02:56:52.140
or is it like three screens?
link |
02:56:55.000
You know, good question.
link |
02:56:55.840
I've never gotten into the big, many screens to be honest.
link |
02:56:58.080
I mean, and maybe it's because in my head,
link |
02:57:00.720
I kind of just, I just swap between windows.
link |
02:57:03.480
Like, partly because I guess I really can't process
link |
02:57:07.480
three screens at once anyway.
link |
02:57:09.200
Like, I just am looking at one and I just flip.
link |
02:57:12.560
You know, I flip an application open.
link |
02:57:14.460
So where it's really helpful is actually
link |
02:57:17.340
when I'm trying to do, you know,
link |
02:57:18.440
here's data and I want to input it from here.
link |
02:57:20.240
Like this is the only time I really need another screen.
link |
02:57:22.280
So now, because you're both a developer, a lead developer,
link |
02:57:25.960
but then there's also these businesses
link |
02:57:27.880
and there's salespeople and you're working
link |
02:57:30.120
with large companies.
link |
02:57:30.960
Operations people, hiring people, yeah.
link |
02:57:32.480
The whole thing.
link |
02:57:33.400
Which operating system is your favorite at this point?
link |
02:57:37.240
So Linux was the early days.
link |
02:57:38.960
So yeah, I love Linux as a server side.
link |
02:57:41.460
And in the early days I had my own Linux desktop.
link |
02:57:44.340
I've been on Mac laptops for 10 years now.
link |
02:57:47.800
Yeah, this is what leadership looks like.
link |
02:57:50.040
As you switch to Mac.
link |
02:57:52.800
Okay, great.
link |
02:57:53.800
Pretty much, I mean, just the fact that I had
link |
02:57:56.480
to do PowerPoints, I had to do presentations
link |
02:57:58.760
and you know, plug in, I just couldn't mess
link |
02:58:01.240
with plugging in laptops, it wouldn't project and yeah.
link |
02:58:04.440
So you mentioned also Quansight Labs and things like that.
link |
02:58:09.240
Can you give advice on how to hire great programmers
link |
02:58:13.640
and great people?
link |
02:58:14.600
Yeah, I would say, produce an open source project,
link |
02:58:19.400
get people contributing to it and hire those people.
link |
02:58:21.560
Yeah, I mean, you're doing it sort of,
link |
02:58:25.080
you may be perhaps a little biased,
link |
02:58:27.080
but that's probably 100% really good advice.
link |
02:58:30.320
I find it hard to hire.
link |
02:58:31.800
I still find it hard to hire, like in terms of,
link |
02:58:34.480
I don't think that it's hard to hire
link |
02:58:36.560
if I've worked with somebody for a couple of weeks,
link |
02:58:39.320
but an hour or two of interviews, I have no idea.
link |
02:58:43.600
So that instinct, that radar of knowing if someone's good
link |
02:58:47.880
or not, you've found that you're still not able to develop it.
link |
02:58:50.720
It's really hard, I mean, the resume can help,
link |
02:58:53.240
but again, the resume is like a presentation
link |
02:58:55.520
of the things they want you to see, not the reality of,
link |
02:58:58.840
and there's also, you have to understand
link |
02:59:02.800
what you're hiring for.
link |
02:59:03.960
There are different stages and different kinds of skills.
link |
02:59:06.800
And so it isn't just, one of the things I talk a lot about
link |
02:59:10.940
internally at my company is just that the whole idea
link |
02:59:14.440
of measuring ourselves against a single axis is flawed
link |
02:59:18.600
because we're not, it's a multidimensional space
link |
02:59:20.600
and how do you order a multidimensional space?
link |
02:59:22.120
There isn't one ordering.
link |
02:59:23.440
So this whole idea, you immediately get projected
link |
02:59:26.160
into a thing when you're talking about hiring
link |
02:59:28.200
or best or worst or better or not better.
link |
02:59:30.660
So what is the thing you're actually needing?
link |
02:59:33.500
And you can hire for that.
link |
02:59:35.960
There is such a thing, generally, I really value people
link |
02:59:39.040
who have the affect, that care about open source.
link |
02:59:42.920
Like so in some cases, their affinity to open source
link |
02:59:45.720
is simply kind of a filter of an affect.
link |
02:59:49.100
However, I have found this interesting dichotomy
link |
02:59:52.560
between open source contributors and product creation.
link |
02:59:58.520
There's, I don't know if it's fully true,
link |
03:00:00.560
but there does seem to be the more experienced,
link |
03:00:04.960
the more affect somebody has in an open source community,
link |
03:00:08.160
the less ability to actually produce product that they have.
link |
03:00:11.640
And the opposite is kind of true too.
link |
03:00:13.520
The more product focused they are, I find a lot of people,
link |
03:00:16.000
I've talked to a lot of people who produce
link |
03:00:17.020
really great products and they have a,
link |
03:00:19.400
they're looking over the open source communities,
link |
03:00:21.120
kind of wanting to participate and play,
link |
03:00:23.320
but they've played here and they do a great job here
link |
03:00:26.000
and then they don't necessarily have some of the same.
link |
03:00:29.520
Now I don't think that's entirely necessary.
link |
03:00:32.040
I think part of it is cultural, how they've emerged.
link |
03:00:34.880
Because one of the things that open source communities
link |
03:00:36.300
often lack is great product management,
link |
03:00:39.160
like some product management energy.
link |
03:00:41.000
That's brilliant, but you want both of those energies
link |
03:00:43.600
in the same place together.
link |
03:00:44.880
Yes, you really do.
link |
03:00:45.840
And so a lot of it's creating these teams of people
link |
03:00:48.120
that have these needed skills and attributes
link |
03:00:50.480
that are hard.
link |
03:00:51.880
And so one of the big things I look for is somebody
link |
03:00:55.120
that fundamentally recognizes their need to learn.
link |
03:00:57.800
Like one of the values that we have
link |
03:00:59.560
in all of the things we do is learning.
link |
03:01:01.400
Like if somebody thinks they know it all,
link |
03:01:04.560
they're gonna struggle.
link |
03:01:06.240
And some of that is just, there's more basic things
link |
03:01:09.440
like humility, just being humble in the face
link |
03:01:12.780
of all the things you don't know.
link |
03:01:14.400
And that's step one of learning.
link |
03:01:15.840
That's step one of learning, right?
link |
03:01:16.960
And I've spent a lot of time learning, right?
link |
03:01:20.840
Other people spend a lot more time,
link |
03:01:21.840
but I've spent a lot of time learning.
link |
03:01:23.280
My whole goal was to get a PhD because I love school
link |
03:01:26.320
and I wanted to be a scientist.
link |
03:01:28.240
And then what I found is what's been written about
link |
03:01:31.120
elsewhere as well is the more I learned,
link |
03:01:32.600
the more I didn't know.
link |
03:01:33.780
The more I realized, man, I know about this,
link |
03:01:37.680
but this is such a tiny thing in the global scope
link |
03:01:40.060
of what I might wanna know about.
link |
03:01:41.220
So I need to be listening a whole lot better
link |
03:01:43.840
than I am just talking.
link |
03:01:47.360
That's changed a little bit actually.
link |
03:01:48.840
My wife says that I used to be a better listener.
link |
03:01:50.600
Now that I'm so full of all these ideas I wanna do,
link |
03:01:52.880
she kind of says, you gotta give people time to talk.
link |
03:01:55.520
So you've succeeded on multiple dimensions.
link |
03:01:58.400
So one is being tenure-track faculty.
link |
03:02:01.680
The other is just creating all these products
link |
03:02:03.080
and building up the businesses,
link |
03:02:04.320
then working with businesses.
link |
03:02:06.880
Do you have advice for young people today
link |
03:02:09.240
in high school and college of how to live a life
link |
03:02:13.880
as nonlinear and as successful as yours,
link |
03:02:18.280
a life that they could be proud of?
link |
03:02:21.200
Well, that's a super compliment.
link |
03:02:22.960
I'm humbled by that actually.
link |
03:02:24.200
I would say a life they can be proud of.
link |
03:02:27.960
Honestly, one thing that I've said to people is first,
link |
03:02:31.560
find people you love and care about them.
link |
03:02:34.240
Like family matters to me a lot.
link |
03:02:36.040
And family means people you love and have committed to.
link |
03:02:39.640
So it can be whatever you mean by that,
link |
03:02:42.160
but you need to have a foundation.
link |
03:02:45.120
So find people you love and wanna commit to and do that.
link |
03:02:48.960
Because it anchors you in a way that nothing else can.
link |
03:02:52.200
And then you find other things.
link |
03:02:55.200
And then kind of from out there,
link |
03:02:56.640
you find other kinds of things you can commit to,
link |
03:02:58.800
whether it's ideas or people or groups of people.
link |
03:03:03.240
So, especially in high school,
link |
03:03:06.400
I would say don't settle on what you think you know.
link |
03:03:09.320
Like give yourself 10 years to think about the world.
link |
03:03:13.320
Like I see a lot of high school students
link |
03:03:15.440
who seem to know everything already.
link |
03:03:17.640
I think I did too.
link |
03:03:18.720
I think it's maybe natural,
link |
03:03:20.360
but recognize that the things you care about,
link |
03:03:23.160
you might change your perspective over time.
link |
03:03:26.520
I certainly have over time.
link |
03:03:28.600
I was really passionate about one specific thing
link |
03:03:30.640
and I've kind of softened.
link |
03:03:32.520
I was a big, I didn't like the Federal Reserve, right?
link |
03:03:35.760
And there's still, we could have a longer conversation
link |
03:03:38.480
about monetary policy and finances,
link |
03:03:40.120
but I'm a little more nuanced in my perspective
link |
03:03:46.000
at this point.
link |
03:03:48.000
But that's one area where you learn about something,
link |
03:03:50.160
go, ah, I wanna attack it.
link |
03:03:52.440
Build, don't destroy.
link |
03:03:55.160
Build, like so often the tendency is to not like something
link |
03:03:58.400
and wanna go attack it.
link |
03:04:00.000
Build something, build something to replace it.
link |
03:04:02.240
Yeah.
link |
03:04:03.080
Build up, attract people to your new thing.
link |
03:04:05.600
You'll be far better, right?
link |
03:04:08.800
You don't need to destroy something to build something else.
link |
03:04:12.600
So that's, I guess, generally.
link |
03:04:14.880
And then definitely like curiosity,
link |
03:04:19.120
follow your curiosity and let it,
link |
03:04:22.680
don't just follow the money.
link |
03:04:24.600
And all of that, like you said,
link |
03:04:25.800
is grounded in family, friendship, and ultimately love.
link |
03:04:30.160
Yes.
link |
03:04:31.200
Which is a great way to end it.
link |
03:04:34.640
Travis, you're one of the most impactful people
link |
03:04:37.080
in engineering and computer science,
link |
03:04:38.760
in the human world.
link |
03:04:39.920
So I truly appreciate everything you've done.
link |
03:04:43.520
And I really appreciate that you would spend
link |
03:04:45.800
your valuable time with me.
link |
03:04:46.960
It was an honor.
link |
03:04:47.800
It was a real pleasure for me.
link |
03:04:48.840
I appreciate that.
link |
03:04:50.520
Thanks for listening to this conversation
link |
03:04:52.080
with Travis Oliphant.
link |
03:04:54.000
To support this podcast,
link |
03:04:55.320
please check out our sponsors in the description.
link |
03:04:57.900
And now, let me leave you with something
link |
03:05:00.200
that in the programming world is called Hodgson's Law.
link |
03:05:04.960
Every sufficiently advanced Lisp application
link |
03:05:08.120
will eventually be reimplemented in Python.
link |
03:05:12.520
Thank you for listening and hope to see you next time.