Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming | Lex Fridman Podcast #224
The following is a conversation with Travis Oliphant, one of the most impactful programmers
and data scientists ever.
He created NumPy, SciPy, and Anaconda.
NumPy formed the foundation of tensor-based machine learning in Python, SciPy formed
the foundation of scientific programming in Python, and Anaconda, specifically with Conda,
made Python more accessible to a much larger audience.
Travis's life work across a large number of programming and entrepreneurial efforts
has and will continue to have immeasurable impact on millions of lives by empowering
scientists and engineers in big companies, small companies, and open source communities
to take on difficult problems and solve them with the power of programming.
Plus, he's a truly kind human being, which is something that when combined with vision
and ambition makes for a great leader and a great person to chat with.
To support this podcast, please check out our sponsors in the description.
This is the Lex Fridman Podcast, and here is my conversation with Travis Oliphant.
What was the first computer program you've ever written?
Whoa, that's a good question.
I think it was in fourth grade, just a simple loop in BASIC.
It was written on an Atari 400, I think, or maybe it was an Atari 800.
It was part of a class, and we were just writing basic loops to print things out.
Did you use goto statements?
Yes, we used goto statements.
I remember in the early days, that's when I first realized there are principles to programming
when I was told, don't use goto statements, those are bad software engineering.
It goes against what great, beautiful code is.
I was like, oh, okay, there's rules to this game.
I didn't see that until high school when I took an AP computer science course.
I did a lot of other kinds of programming on the TI, but finally, when I took an AP computer
science course, which was in Pascal.
That's when I, oh, there are these principles.
No, I didn't take C until the next year in college, I had a course in C, but I haven't
done much in Pascal, just that AP computer science course.
Now, sorry for the romanticized question, but when did you first fall in love with programming?
Oh, man, good question.
I think actually when I was 10, my dad got us a Timex Sinclair, and he was excited
about the spreadsheet capability, but I made him get BASIC, the add-ons, so we could
actually program in BASIC, and just being able to write instructions and have the computer
Then we got a TI-99/4A when I was about 12, and it had sprites and graphics
and music. You could actually program it to play music.
That's when I really sort of fell in love with programming.
So this is a full, like a real computer with like, with memory and storage and processors
so we're not going to say TI.
Yeah, the Timex Sinclair was one of the very first. It was cheap, well, it was
still expensive, but it had 2K of memory.
We got the 16K add-on pack, but yeah, it had memory and you could program it.
You had the, in order to store your programs, you had to attach a tape drive.
Remember that old, the sound that would play when you converted the modems would convert
digital bits to audio files, tape drive.
Still remember that sound, but that was the storage.
And what was the programming language, do you remember?
And then they had a VisiCalc.
And so a little bit of spreadsheet programming in VisiCalc, but mostly just some BASIC.
Do you remember what kind of things drew you to programming?
Was it working with data?
Was it video games and video games?
Yeah, I've always loved math and a lot of people think they don't like math because
I think when they're exposed to it early, they, it's about memory.
You know, when you're exposed to math early, you have a good short term memory, you remember
And I do have a reasonably, I mean, not perfect, but a reasonably long little short term memory
And so I did great at times tables and said, oh, I'm good at math, but I started to really
like math, just the problem-solving aspect.
And so computing was problem solving applied.
And so that's always kind of been the, the draw kind of coupled with the mathematics.
Did you ever see the computer as like an extension of your mind, like something able
You could play with it and you can, you could play with math puzzles and yeah, it was, it
was too rudimentary early on, like it was sort of, yeah, it was too, it was a lot of
work to actually take a thought you'd have and actually get it implemented.
And that's still work, but it's getting easier.
And so yeah, I would say that's definitely what attracted me to Python, that it
was more real, right?
I could think in Python, speaking a foreign language, I only speak another language fluently
besides English, which is Spanish, and I remember the day when I would dream in Spanish and
you start to think in that language.
And then you actually, I do definitely believe that language limits or expands your thinking.
There's some languages that actually lead you to certain thought processes.
Like, so I speak Russian fluently and that's certainly a language that leads you down certain
thought processes.
There's a, there's a history of the two world wars of the millions of people starving to
death or near to death throughout its history of suffering, of injustice, like this promise
sold to the people and then the carpet or whatever swept from under them and it's like
broken promises and all of that pain and melancholy is in the language, the sad songs, the sad
hopeful songs, the over romanticized, like, I love you, I hate you, the sort of the swings
between all the various spectrums of emotion.
So that's all within the language, the way it's twisted, poetry, there's a strong culture
of rhyming poetry, so like the bards, like the, there's a musicality to the language
Did Dostoevsky write in Russian?
Like Dostoevsky, Tolstoy, all the ones that I know about, which
are translated, and I'm curious how the translations.
So Dostoevsky did not use the musicality of the language too much, so they actually
translate pretty well because it's so philosophically dense that the story does a lot of the work,
but there's a bunch of things that are untranslatable.
Certainly the poetry is not translatable.
I actually have a few conversations coming up offline and also in this podcast with people
who have translated Dostoevsky and that's for people who worked, who work in this field
know how difficult that is.
Sometimes you can spend, you know, months thinking about a single sentence, right?
In context, like, because there's just a magic captured by that sentence and how do you translate
just in the right way because those words can be, can be really powerful.
There's a famous line, beauty will save the world from Dostoevsky.
You know, there's so many ways to translate that and you're right, the language gives
you the tools with which to tell the story, but it also leads your mind down certain trajectories
and paths to where over time, as you think in that language, you become a different human
Yeah, that's a fascinating reality, I think.
I know people have explored that, but it's, I guess, rediscovered.
Well, we don't, we live in our own like little pockets, like this is the sad thing is I feel
like, unfortunately, given time and given getting older, I'll never know the China, the Chinese
world because I don't truly know the language, same with Japanese, I don't truly know Japanese
and Portuguese and Brazil, that whole South American continent, like, yeah, I'll go to
Brazil and Argentina, but will I truly understand the people, if I don't understand the language?
It's sad because I wonder how much, how many geniuses we're missing because so much of
the scientific world, so much of the technical world is in English and so much of it might
be lost because they're, they just, we don't have the common language.
I completely agree.
I'm very much in that vein of, there's a lot of genius out there that we miss and it's
sort of, we're sort of fortunate when it, when it bubbles up into something that we
can understand or process, there's a lot we miss, so I tend to lean towards really loving
democratization or things that empower people or, you know, very resistant to sort of authoritarian
Fundamentally, for that reason, it, well, several reasons, but it just hurts us, we're
So speaking of languages that empower you, so Python was the first language for me that
I could, I really enjoyed thinking in, as you said.
Sounds like you shared my experience too.
So when did you first, do you remember when you first kind of connected with Python, maybe
even fell in love with Python?
It's a good question.
It was a process that took about a year.
I first encountered Python in 1997.
I was a graduate student studying biomedical engineering at the Mayo Clinic and I had previously,
I'd been involved in taking information from satellites.
I was an electrical engineering student used to taking information and trying to get something
out of it, doing some data processing information out of it.
And I'd done that in MATLAB, I'd done that in Perl, I'd done that in, you know, scripting
on a VMS, there's actually a VAX VMS system and they had their own little scripting tools
around Fortran, done a lot of that.
And then as a graduate student, I was looking for something and encountered Python, and because
Python had an array, had two things that made me not filter it away because I was filtering
I was Yorick, I looked at Yorick, I looked at a few other languages throughout there
at the time in 1997, but it had arrays, there's a library called Numeric that had just been
written in 95, like not very, not too much earlier.
By an MIT alum, Jim Hugunin, you know, and I went back and read the mailing list to see
the history of how it grew and there was a very interesting, it's fascinating to do that
actually to see how this emergent cooperation, unstructured cooperation happens in the open
source world that led to a lot of this collective programming, which is something maybe we might
get into a little later, but what that looks like.
What gap did Numeric fill?
Numeric filled the gap of having an array object.
There was no array object.
There was a one-dimensional byte concept, but there was no n-dimensional, two, three, four
dimensional tensor, as they call it now.
I'm still in the camp that a tensor is another thing and this is just an n-d array, which is what we
should call it, but I kind of lost that battle.
There are many battles in this world, some of which we win, some we lose.
That's exactly right.
So and, but it was, it had no math to it.
So Numeric had math and a basic way to think in arrays.
So I was looking for that and it had complex numbers, a lot of programming languages.
And you can see why, because, you know, if you're just a computer scientist, you think, ah,
a complex number is just two floats.
So people can build that on top, but in practice, the complex numbers are one
of the significant algebras that help connect a lot of physical and mathematical ideas,
particularly the FFT for an actual engineer.
And it's a really important concept and not having it means you have to develop it several
times and those times may not share an approach.
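As an illustration of why a built-in complex type matters for Fourier-style work, here is a minimal sketch of a naive discrete Fourier transform in plain Python. It is not how Numeric or NumPy implement the FFT; the function name `dft` and the test signal are assumptions made up for this example.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform (illustrative, not an FFT)."""
    n = len(samples)
    return [
        # Each output bin is a sum of samples times a complex exponential.
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# One full cycle of a cosine sampled at 8 points (hypothetical test signal).
n_points = 8
signal = [math.cos(2 * math.pi * t / n_points) for t in range(n_points)]

spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]
# The energy lands in bins 1 and 7, each with magnitude n_points / 2.
```

Because `complex` is part of the language, the whole transform is one expression; without it, every library would have to carry its own pair-of-floats representation.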
One of the common things in programming, one of the things programming enables is abstractions.
But when you have shared abstractions, it's even better.
It sort of gets to the level of language of actually we all think of this the same way,
which is both powerful and dangerous, right?
It's powerful in that we can now quickly make bigger and higher level things on top
of those abstractions. It's dangerous because it also limits us as to the things we
left behind in producing an abstraction, which is at the heart of programming today
and of actually building around the programming world.
So I think it's a fascinating philosophical topic.
Yeah, they will continue for many years, I think, as we build more and more and more
I often think about, you know, we have, we have a world that's built on these abstractions
that were they the only ones possible?
There's a lot, but they led to, you know, it's very hard to do it differently.
Like there's an inertia that's very hard to, you know, push out, push away from.
There's, that has implications for things like, you know, the Julia language, which
you have heard of, I'm sure.
And I've met the creators and I like Julia.
It's a really cool language, but they've struggled to kind of against the, just the
tide of like this inertia of people using Python and, and, you know, there's strategies
But nonetheless, it's a, it's a phenomenon and sometimes, so I love complex numbers and
So I, I looked at Python and then I had the experience.
I did some stuff in Python and I was just doing my PhD.
So I was out, my focus was on, I was actually doing a combination of MRI and ultrasound and
looking at a phenomenon called elastography, which is you push waves into the body and
observe those waves, like you can actually measure them.
And then you do mathematical inversion to see what the elasticity is.
And so that's the problem I was solving is how to do that with both ultrasound and MRI.
I needed some tool to do that with.
So I was starting to use Python in 97, in 98, I went back, looked at what I'd written
and realized I could still understand it, which is not the experience I'd had when doing
Perl in 95, right?
I'd done the same thing.
And then I looked back and I, I'd forgotten what I was even saying.
Now, you know, I'm not saying it.
So that made me think, hey, this may work.
This may be something I can retain without becoming an expert per se.
And so that led me to go, I'm going to push more into this.
And then that 98 was kind of the, when I started to fall in love with Python, I would say.
A few peculiar things about Python.
So maybe compared to Perl, compared to some of the other languages.
So there's no braces.
So space is used, indentation, I should say, is used as part of the language.
So did you, I mean, that's quite a leap.
Were you comfortable with that leap?
Or were you just very open minded?
It's a good question.
I was open minded.
So it, I was cognizant of the concern.
And it definitely has, it has specific challenges, you know, cut and pasting, for example, your
cut and pasting code.
And if your editors aren't supportive of that, if you're putting it into a terminal, and
particularly in the past when terminals didn't necessarily have the intelligence to manage
Now IPython and Jupyter notebooks handle it just fine.
So there's really no problem.
But in the past, it created some challenges, formatting challenges.
Also mixed tabs and spaces, if your, if editors weren't, you weren't clear on what was happening,
you would have these issues.
So there were really concrete reasons about it that I heard and understood.
I never really encountered a problem with it personally.
Like there were occasional annoyances, but I really liked the fact that it didn't have all these
extra characters, right?
That these extra characters didn't show up in my visual field when I was just trying
to process understanding a snippet of code.
Yeah, there's a cleanness to it.
But I mean, the idea is supposed to be that Perl also has a cleanness to it because of
the minimalism of like how many characters it takes to express a certain thing.
So it's very compact.
So you realize with that compactness comes, there's a culture that prizes compactness.
And so the code gets more and more compact and less and less readable to a point where
it's like, to be a good programmer in Perl, you write code that's basically unreadable.
There's a culture like.
And you're proud of it.
You're proud of it.
And it's like feels good.
And it's really selective.
And it's hard in Perl to understand it.
Whereas Python allowed you not to have to be an expert, you don't have to spend all
this brain energy.
You could leverage what I say, you could leverage your English language center, which you're
using all the time.
I've wondered about other languages, particularly non Latin based languages, you know, Latin
based languages with the characters are at least similar.
I think people have an easier time, but I don't know what it's like to be a Japanese or a
Chinese person trying to learn a different, different syntax.
Like what would computer programming look like in a, in that, I haven't looked at that
at all, but it certainly doesn't, you know, leveraging your, your Chinese language center.
I'm not sure Python or any programming language does that, but that was a big deal.
The fact that it was accessible, I could be a scientist.
What I really liked is many programming languages really demand a lot of you and you can get
a lot, you know, you do a lot if you learn it, but Python enables you to do a lot without
demanding a lot of you.
There's a, there's nuance to that statement, but it certainly was, it's more accessible.
So more people could actually, as a, as a scientist, as somebody or engineer who was
trying to solve another problem besides programming, I could still use this language
and get things done and, and be happy about it.
And I was also comfortable in C at that time.
And MATLAB you did a little bit of that.
And MATLAB I did a lot before that, exactly.
So I was comfortable in those three languages were really the tools I used during my studies
Um, but to your point about language helping you think, one of the big things about MATLAB
is it was, and APL before it, I don't know if you're a, you remember APL, APL is actually
the predecessor of array based programming, which I think is really an underappreciated,
if I talked to people who are just steeped in computer programming and computer science,
like most of the people that Microsoft has hired in the past, for example, you know, Microsoft
as a company generally did not understand array-based programming, like culturally they
So they kept missing the boat, kept missing the understanding of what this, what this was.
We've gotten better, but there's still a whole culture of folks that doesn't programming.
That's the, you know, that's, that's systems programming or web programming or lists and
maps and you know, what about an n-dimensional array?
That's just an implementation detail.
Well, you can think that, but then actually if you have that as a construct, you actually
think differently.
APL was the first language to understand that and it was in the sixties, right?
The challenge of APL is APL had very dense, not only glyphs, like new characters, new
glyphs, they even had a new keyboard because to produce those glyphs, this is back in the
early days of computing when, you know, the qwerty keyboard maybe wasn't as established
like, well, we can have a new keyboard, no big deal, but it was a big deal and it didn't
catch on, and with the language APL, very much like Perl, people would pride themselves on
how compactly they could write: the Game of Life in 30 characters of APL. APL has characters
that mean summation and they have adverbs, you know, they would have adjectives and
these things called adverbs, which are like methods. Reduction, for example, would be an adverb
on an add operator, right?
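The "reduction as an adverb on an operator" idea can be sketched in plain Python with `functools.reduce`: the binary operator stays the same, and reduce turns it into an operation over a whole sequence. This is only an analogy to APL, not APL semantics; NumPy exposes the same pattern as `np.add.reduce`.

```python
from functools import reduce
import operator

values = [1, 2, 3, 4]

# "Add" modified by "reduce": fold the binary operator across the sequence.
total = reduce(operator.add, values)      # 1 + 2 + 3 + 4

# The same modifier works on any binary operator, for example multiply.
product = reduce(operator.mul, values)    # 1 * 2 * 3 * 4
```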
But using these tools, you could construct and then you start to think at that level,
you think in n dimensions is something I like to say and you start to think differently
about data at that point, you know, now you're, it really helps.
Yeah, I mean, outside of programming, if you really internalize linear algebra as a course,
I mean, it's philosophically allows you to think of the world differently.
It's almost like liberating.
You don't have to, you don't have to think about the individual numbers in the n-dimensional
You could think of it as an object in itself and all of a sudden this world can open up.
You're saying MATLAB and APL were like the early, I don't know if many languages got
No, no, no, they didn't, even still, I would say. I mean, NumPy is an inheritor
of those traditions. Besides APL, J was another version; what it did was drop
the glyphs and just use short characters that a Latin keyboard could still type.
And then Numeric inherited from that in terms of, let's add arrays plus broadcasting plus
methods, reduction, even some of the language. Rank, for example, is a concept that was
in Python and is still in Python for the number of dimensions, right?
That's different than say the rank of a matrix, which people think of as well.
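To make the two meanings of "rank" concrete, here is a small sketch using present-day NumPy, where the number of dimensions is spelled `ndim` and the linear-algebra rank is `np.linalg.matrix_rank`. The array `a` is a made-up example.

```python
import numpy as np

# A 3x4 array of ones: two dimensions, but every row is identical.
a = np.ones((3, 4))

array_rank = a.ndim                          # number of dimensions (axes)
linear_rank = np.linalg.matrix_rank(a)       # number of independent rows

# The two notions disagree here: two axes, but only one independent row.
```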
So it's, it came from that tradition, but NumPy is a very pragmatic, practical tool.
NumPy inherited from Numeric and we can get to where NumPy came from, which is the current
array, at least current as of 2015, 2017, now there's a ton of them over the past two
We can get into that too.
So if we just sort of linger on the early days of what was your favorite feature of
Do you remember like what?
I think it's an interesting to linger on like the, what, what really makes you connect
I'm not sure it's obvious to introspect that.
And I've thought about that
at some length.
I'm not, I think definitely the fact that I could read it later, that I could use it
productively without becoming an expert.
And you, other languages I had to put more effort into.
That's like an empirical observation.
Like you're not analyzing any one aspect of language.
It just seems time after time, when you look back, it's somehow readable.
It's somehow readable.
And then it was sort of, I could take executable English and translate it to Python more easily.
Like I didn't have to go, there was no translation layer as an engineer or as a scientist.
I could think about what I wanted to do.
And then the syntax wasn't that far behind it.
Now there are some, there are some warts there still.
It wasn't perfect.
Like there's some areas where I'm like, it'd be better if this were different or if this
Some of those things got out of the language too.
I was really grateful for some of the early pioneers in the Python ecosystem back because
Python got written in 91 is when the first version came out.
But Guido was very open to users.
And one of the sets of users were people like Jim Hugunin and David Ascher and Paul Dubois
and Konrad Hinsen.
These were people that were on the mailing list.
And they were just asking for things like, hey, we really should have complex numbers
So, you know, there's a j, there's a 1j, right?
And the fact that they went the engineering route of j is interesting.
I don't think that's entirely to favor engineers.
I think it's because i is so often used as the index of a for loop.
I think that's actually why.
I mean, there's a pragmatic aspect.
But the complex numbers were there.
The fact that I could write in the array constructs and that reduction was there.
Very simple to write summations and broadcasting was there.
I could do addition of whole arrays.
That was something that I loved about it.
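The features listed here (the `1j` literal, whole-array addition, broadcasting, and reduction) can be sketched in a few lines. This uses today's NumPy rather than the original Numeric library, and the arrays `a` and `b` are made-up values.

```python
import numpy as np

# The imaginary unit is spelled 1j, following the engineering convention.
z = 1j * 1j                     # equals -1 as a complex number

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([10.0, 20.0, 30.0])

# Broadcasting: the 1-D array b is added to every row of the 2-D array a.
c = a + b

# Reduction: apply the add operator along an axis to get per-row sums.
row_sums = np.add.reduce(a, axis=1)
```

No explicit loop appears anywhere: the shape of the data, not the index arithmetic, drives the computation.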
I don't know what to start talking to you about because you've been, you've created
so many incredible projects that basically change the whole landscape of programming.
But okay, let's start with, let's go chronologically with SciPy.
You created SciPy over two decades ago now.
I love to talk about SciPy.
SciPy was really my baby.
What was its goal?
Well, practically, here I am using Python to do stuff that I previously used MATLAB
And I was using Numeric, which is an array library that made a lot of it possible.
But there's things that were missing.
Like I didn't have an ordinary differential equation solver I could just call.
I didn't have integration.
Yeah, I wanted to integrate this function.
Well, I don't have just a function I can call to do that.
These are things I remember being critical things that I was missing.
I just want to pass a function to an optimizer and have it tell me what the optimum value
Those things like, well, why don't we just write a library that adds these tools?
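The three missing tools named here (integrate a function, hand a function to an optimizer, solve an ordinary differential equation) each became a one-line call. The sketch below uses the present-day SciPy API, not the 1998-era packages, and the functions being integrated and minimized are made-up examples.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the area under sin(x) from 0 to pi is exactly 2.
area, _error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: pass a function and get back the location of its minimum.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)
best_x = result.x

# ODE solving: dy/dt = -y with y(0) = 1, whose exact solution is exp(-t).
solution = integrate.solve_ivp(
    lambda t, y: -y, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10
)
y_at_one = solution.y[0, -1]
```

In each case the caller supplies only a Python function and a few numbers; the numerical algorithm stays behind the call.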
And I started to post on the mailing list and there had previously been, you know, people
I remember Konrad Hinsen saying, wouldn't it be great if we had this optimizer library
or David Ascher would say this stuff and I'm, you know, ambitious, well, ambitious is the
wrong word, eager, and probably with more time than sense.
I was, you know, a poor graduate student.
My wife thinks I'm working on my PhD and I am, but part of the PhD that I loved was
the fact that it's exploratory, but you're not just, you know, taking orders fulfilling
a list of things to do.
You're trying to figure out what to do.
And so I thought, well, you know, I'm writing tools for my own use in a PhD.
So I'll just start this project.
And so in 99, well, 98 was when I first started to write libraries for Python, when I fell
in love with Python. In 98, I said, oh, well, there's just a few things missing.
Like, oh, I need a reader to read DICOM files. That was in medical imaging, and DICOM was
a format that I wanted to be able to load into Python.
How do I write a reader for that?
So I wrote something called, it was an IO package, right?
And that was my very first extension module, which is C. So I wrote C code to extend Python
so that then in Python, I could write things more easily. That combination
kind of hooked me.
It was the idea that I could, here's this powerful tool I can use as a scripting language
and a high level language to think in, but that I can extend easily in
C. Well, easily for me, because I knew enough C.
And then Guido had written a link, I mean, the hard part of extending Python
was the way memory management works, and you have to do reference counting.
And so there's, there's a tracking of reference counting you have to do manually.
And if you don't, you have, you have memory leaks.
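The counts that a C extension has to maintain by hand are visible from Python itself. The following is a small sketch using `sys.getrefcount`, which is specific to the CPython implementation; the variable names are made up for illustration.

```python
import sys

data = []                            # a fresh object with one named reference

before = sys.getrefcount(data)       # includes a temporary reference for the call
alias = data                         # binding a second name adds one reference
after = sys.getrefcount(data)

del alias                            # dropping the name releases that reference
restored = sys.getrefcount(data)
```

In pure Python the interpreter does this bookkeeping automatically. In a C extension the author makes the matching increment and decrement calls, and a missed decrement means the count never reaches zero, which is the memory leak described above.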
And so that's hard, plus then C, you know, it's just much more, you have to put more
It's not just I have to now think about pointers and I have to think about stuff that is different.
I have to kind of, you're like putting a new cartridge in your brain, like, okay, I'm
thinking about MRI, now I'm thinking about programming and, and they're distinct modules
you end up having to think about.
So it's harder when I was just in Python, I could just think about MRI and high level
But I could do that and that kind of, I liked it.
I found that to be enjoyable and fun.
And so I ended up, oh, well, let me just add a bunch of stuff to Python to do integration.
Well, and the cool thing is that, you know, with the power of the internet, I was just looking
around and I found, oh, there's this Netlib, which has hundreds of Fortran routines that people
had written in the 60s and the 70s and the 80s in Fortran 77. Fortunately, it wasn't Fortran
66, it had been ported to Fortran 77.
And Fortran 77 is actually a really great language.
Fortran 90 is probably my favorite Fortran because it's got complex numbers, got arrays
and it's pretty high level.
Now, the problem with it is you'd never want to write a whole program in Fortran 90 or Fortran 77,
but it's totally fine to write a subroutine in, right?
And then Fortran kind of got a little off course when they tried to compete with
C++, but at the time, I just wanted libraries that do something like, oh, here's an ordinary
differential equation, here's integration, here's Runge-Kutta integration, already done.
I don't have to think about that algorithm, I mean, you could, but it's nice to have somebody
who's already done one and tested it.
And so, I sort of started this journey in 98, really. If you look back at the mailing list, there's
sort of this productive era of me writing an extension module to connect Runge-Kutta integration
to Python and making an ordinary differential equation solver and then releasing that as
a package. So there was ODEPACK, I think I called it, then QUADPACK, you know, I just
made these packages. Eventually that became Multipack because they were originally modular,
you could install them separately, but a massive problem in Python was actually just getting
your stuff installed.
At the time, releasing software for me, like today it's, people think, what does that mean?
Well, then it meant some poorly written webpage. I had some bad webpage up and I put up a
tarball, just a gzipped tarball of source code, that was the release.
But okay, can we just stand that because that, the community aspect of creating the package
and sharing that, that's rare, that to have, to both have the, at that time, so like the
Yeah, it was pretty early, yeah.
Well, not rare, maybe you can correct me on this, but it seems like in the scientific
community, so many people, you were basically solving the problems you needed to solve,
to process the particular application, the data that you need, and to also have the mind
that I'm going to make this usable for others, that's...
I would say I was inspired, I'd been inspired by Linux, been inspired by, you know, Linus
link |
and him making his code available, and I was starting to use Linux at the time, and I went,
link |
So I'd kind of been previously primed that way, and generally, I was into science because
link |
I like the sharing notion, I like the idea of, hey, let's, if collectively we build
link |
knowledge and share it, we can all be better off.
link |
Okay, so you were energized by that idea?
link |
So I was energized by that idea already, right, and I can't deny that I was, I sort of had
link |
this very, I liked that part of science, that part of sharing, and then all of a sudden,
link |
oh wait, here's something, and here's something I could do, and then I slowly over years learned
link |
how to share better so that you could actually engage more people faster.
link |
One of the key things was actually giving people a binary they could install, right,
link |
so that wasn't just your source code, good luck.
link |
Compile this and then...
link |
It's compiled, ready to install, you just, you know, so, in fact, a lot of the journey
link |
from 98, even through 2012 when I started Anaconda was about that, like it's why, you
link |
know, it's really the key as to why a scientist with dreams of doing MRI research ended up
link |
starting a software company that installs software.
link |
I work with a few folks now that don't program, like on the creative side, the video side,
link |
the audio side, and because my whole life is running on scripts, I have to try to get
link |
them, I have now the task of teaching them how to do Python enough to run the scripts,
link |
and so I've been actually facing this, whether it's Anaconda or something, with the task of
link |
how do I minimally explain, basically to my mom, how to write a Python script, and it's
link |
an interesting challenge.
link |
It's a to do item for me to figure out, like, what is the minimal amount of information
link |
I have to teach, what are the tools you use, so that, one, you enjoy it, two, you're effective at it.
link |
And they're related.
link |
Those are two related questions.
link |
And then the debugging, like the iterative process of running the script to figure out
link |
what the error is, maybe even for some people to do the fix yourself.
link |
So do you compile it, like how do you distribute that code to them?
link |
And it's interesting because I think it's exactly what you're talking about, if you
link |
increase the circle of empathy, the circle of people that are able to use your programs,
link |
you increase its effectiveness and its power.
link |
And so you have to think, can I write scripts, can I write programs that can be used by medical
link |
engineers, by all kinds of people that don't know programming, and actually maybe plant
link |
the seed, have them catch the bug of programming so that they start on their journey.
link |
That's a huge responsibility.
link |
And ultimately it has to do with the Amazon one click buy, like how frictionless can you
link |
make the early steps?
link |
Frictionless is actually really key to growing any community, every, any friction point,
link |
you're just going to lose, you're going to lose some people, right?
link |
Now, sometimes you may want to intentionally do that, if you're early enough on that you
link |
need a lot of help, you need people who have the skills, you might actually, it's helpful,
link |
you don't necessarily have too many users as opposed to contributors if you're early
link |
Anyway, that's the SciPy start in 98, but it really emerged as this collection of modules
link |
that I was just putting on the net, people were downloading, and you know, I think I
link |
got 100 users by the end of that year, but the fact that I got 100 users and more than
link |
that, people started to email me with fixes, like, and that was actually intoxicating,
link |
That was the, that was the, you know, here I'm writing papers and I'm giving conferences
link |
and I get people would say hello, but yeah, good job, but mostly it was, you're reviewed
link |
with it's competitive, right?
link |
You publish a paper and people were like, oh, it wasn't my paper, you know, it was starting
link |
to see that sense of academic life where it was so much, I thought it was a cooperative
link |
effort, but it sounds like we're here just to one up each other.
link |
And you know, that's not true across the board, but a lot of that's there, but here
link |
in this world, I was getting responses from people all over the world.
link |
You know, I remember Pearu
link |
Peterson in Estonia, right, was one of the first people, and he sent me back this
link |
makefile because, you know, the first thing is, yeah, your build stinks and here's
link |
a better makefile.
link |
Now, it was a complex makefile, I don't think I ever understood that makefile actually,
link |
but it worked and it did a lot more, and so I said, thanks, this is cool.
link |
And that was my first kind of engagement with community development.
link |
But you know, the process was he sent me a patch file, I had to upload a new tarball,
link |
and I just found I really love that.
link |
And the style back then was, here's a mailing list, it's very, it wasn't as, there's certainly
link |
more of the tools that are available today, it was very early on, but I really started
link |
to, that's the whole year, I think I did about seven packages that year, right?
link |
And then by the end of the year, I collected them into a thing called Multipack.
link |
So 99, there was this thing called Multipack, and that's when a high school student, he was
link |
a high school student at the time, a guy named Robert Kern, took that package and made a
link |
Windows installer, right?
link |
And then of course, a massive increase of usage.
link |
So by the way, most of this development was under Linux.
link |
Yes, it was on Linux.
link |
I was a Linux developer doing it on a new box.
link |
I mean, at the time, I was actually getting into, I had a new hard drive, just some kernel
link |
programming to make the hard drive work, I mean, not programming, but modification to
link |
the kernel so I could actually get the hard drive working.
link |
I love that aspect of it.
link |
I was also in, at school, I was building a cluster, I took Mac computers, and you put
link |
Yellow Dog Linux on them.
link |
At the Mayo Clinic, they were just, all these Macs that were older, they were just getting
link |
rid of, and so I kind of got permission to go grab them together, I put about 24 of them
link |
together in a cluster, in a cabinet, and put Yellow Dog Linux on them all, and I wrote
link |
a C++ program to do MRI simulation.
link |
That was what I was doing at the same time for my day job, so to speak.
link |
So I was loving the whole process, and at the same time, I was, oh, I need an ordinary
link |
differential equation.
link |
That's why ordinary differential equations were key, was because that's the heart of
link |
the Bloch equation for simulating MRI is an ODE solver, and so that's, I actually did that,
link |
it just happened at the same time.
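To make the point about the Bloch equation and ODE solvers concrete, here is a minimal sketch in plain Python. It is not the code described in the conversation; the function names, the parameter values, and the choice of a hand-written fixed-step fourth-order Runge-Kutta scheme are all assumptions made for illustration. It integrates dM/dt = gamma (M x B) with T1 and T2 relaxation terms.

```python
import math


def bloch_rhs(m, b, gamma, t1, t2, m0):
    """Right-hand side of the Bloch equation for magnetization m in field b."""
    mx, my, mz = m
    bx, by, bz = b
    # Precession term: gamma * (M x B)
    cx = gamma * (my * bz - mz * by)
    cy = gamma * (mz * bx - mx * bz)
    cz = gamma * (mx * by - my * bx)
    # Transverse components decay with T2, longitudinal recovers to m0 with T1
    return (cx - mx / t2, cy - my / t2, cz - (mz - m0) / t1)


def rk4_step(f, m, dt):
    """One classical fourth-order Runge-Kutta step for dm/dt = f(m)."""
    k1 = f(m)
    k2 = f(tuple(mi + 0.5 * dt * ki for mi, ki in zip(m, k1)))
    k3 = f(tuple(mi + 0.5 * dt * ki for mi, ki in zip(m, k2)))
    k4 = f(tuple(mi + dt * ki for mi, ki in zip(m, k3)))
    return tuple(
        mi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for mi, a, b, c, d in zip(m, k1, k2, k3, k4)
    )


def simulate(m_start, b, gamma, t1, t2, m0, dt, n_steps):
    """Integrate the Bloch equation with a constant field b for n_steps steps."""
    def f(m):
        return bloch_rhs(m, b, gamma, t1, t2, m0)

    m = m_start
    for _ in range(n_steps):
        m = rk4_step(f, m, dt)
    return m


# Illustrative, unitless values: start tipped into the x axis, field along z.
final = simulate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                 gamma=1.0, t1=1.0, t2=0.5, m0=1.0, dt=0.01, n_steps=1000)
print("transverse magnitude:", math.hypot(final[0], final[1]))
print("longitudinal component:", final[2])
```

In practice a tested library routine, which is the whole point being made here, would replace the hand-rolled `rk4_step`.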
link |
That's why, kind of, what you're working on and what you're interested in, they're
link |
I was definitely scratching my own itch, in terms of building stuff, which helped in
link |
the sense that I was using it for me, so at least I had one user.
link |
I had one person who was like, well, no, this is better, I like this interface better, and
link |
I had the experience of MATLAB to guide some of what those APIs might look like, but you
link |
know, you're just doing yourself, you're building all this stuff.
link |
But the Windows installer, it was the first time I realized, oh, yeah, the binary installer
link |
really helps people.
link |
And so that led to spending more time on that side of things.
link |
So around 2000, so I graduated my PhD in 2000, end of year, end of 2000, so 99 doing a lot
link |
of work there, 98 doing a lot of work there, 99 kind of spending more time on my PhD, helping
link |
people use the tools, thinking about where do I want to go from here.
link |
There was a company, there was a guy actually, Eric Jones and Travis Vaught, they were two
link |
friends who founded a company called Enthought, it's here in Austin, still here.
link |
And they, Eric contacted me at the time when I was a graduate student still, and he said,
link |
hey, why don't you come down, we want to build a company, you know, we're thinking of, you
link |
know, a scientific company and we want to take what you're doing and kind of add it
link |
to some stuff that he'd done, he'd written some tools, and then Pearu Peterson had done
link |
f2py, let's come together and build, pull this all together and call it SciPy.
link |
So that's the origin of the SciPy brand, it came from, you know, Multipack and a whole
link |
bunch of modules I'd written, plus a few things from some other folks, and then pull
link |
together in a single installer.
link |
SciPy was really a distribution of Python masquerading as a library.
link |
How did you think of SciPy in the context of Python, in the context of Numeric, like what?
link |
We saw SciPy as a way to make an R&D environment for Python, like use Python, dependent on
link |
Numeric, so Numeric was the array library we depended on, and then from there extend
link |
it with a bunch of modules that allowed for, and at the time, the original vision of SciPy
link |
was to have plotting, was to have, you know, a REPL, you know, the REPL environment
link |
and kind of a whole, really a whole data environment that you could then install and get going
link |
And that was kind of the thinking, it didn't really evolve that way, right?
link |
It sort of had a, but one, it's really hard to do massive scale projects with open source
link |
Actually, there's sort of an intrinsic cooperation limit as to which, you know, too many cooks
link |
in the kitchen, you know, you can do amazing infrastructure work, when it comes down to
link |
bringing it all together into a single deliverable, that actually requires a little more product
link |
management that is not, that doesn't really emerge from the same dynamic.
link |
So it struggled, you know, struggled to get almost too many voices, it's hard to have
link |
everybody agree, you know, consensus doesn't really work at that scale, you end up with
link |
politics, you know, with the same kind of things that's happened in large organizations
link |
trying to decide on what to do together.
link |
The consensus building was still, was challenging at scale, as more people came in, right?
link |
Early on, it's fine, because there's nobody there, and so it works, but then as you get
link |
more successful, the more people use it, all of a sudden, oh, there's this scale at which
link |
this doesn't work anymore, and we have to come up with different approaches.
link |
So SciPy came out officially in 2001, was the first release, most of the time.
link |
I remember the days of getting that release ready, it was a Windows installer, and there
link |
were bugs on how, you know, the Windows compiler handled complex numbers, and you were chasing
link |
segmentation faults, and it was, it's a lot of work, there's a lot of effort had nothing
link |
to do with my area of study, and at the same time, I had just gotten an offer, so he wondered
link |
if I wanted to come down and help him start that, you know, start that company with his
link |
friend, and at the time I was like, I was intrigued, but I was pursuing a path, an academic
link |
path, and I just got an offer to go and teach at my alma mater, so I took that tenure track
link |
position, and SciPy, and kind of, then I started working on SciPy as a professor, too.
link |
So that's, I left, I graduated from the Mayo Clinic, wrote my thesis using SciPy, wrote,
link |
you know, there's images that were created, now the plotting tool I used was something
link |
from Yorick, actually, it was a plotting PLT, kind of a plotting language that I used.
link |
Yorick is a programming language.
link |
It was a programming language, it had a plotting tool, Dislin, it had integration to Dislin,
link |
I ended up using Dislin, plus some of the plotting from Yorick, linked to from Python.
link |
But it was a, people don't plot that way now, but this was before, and SciPy was trying
link |
to add plotting, right?
link |
It didn't have much success, really the success of plotting came from John Hunter, who had
link |
a similar experience to my experience, my kind of maverick experience as a person just
link |
trying to get stuff done, and kind of having more time than money, maybe, right?
link |
And John Hunter created what?
link |
He's the creator of Matplotlib.
link |
Yeah, so John Hunter was, you know, he wasn't a student at the time, but he was working
link |
in the quant field, and he said, we need better plotting.
link |
So he just went out and said, cool, I'll make a new project, and we'll call it Matplotlib.
link |
And he released in 2001, about the same time that SciPy came out.
link |
And it was a separate library, separate install, it used Numeric, SciPy used Numeric.
link |
And so SciPy, you know, in 2001, we released SciPy, and then Enthought created a conference
link |
called SciPy, which brought people together to talk about the space.
link |
And that conference is still ongoing, it's one of the favorite conferences of a lot of
link |
people because it's, you know, it's changed over the years.
link |
But early on, it was, you know, a collection of 50 people who care about scientists mostly,
link |
you know, practicing scientists who want to care about coding and doing it well and not
link |
And I remember being driven by, you know, I like MATLAB, but I didn't like the fact
link |
that, like, so I'm not opposed to proprietary software.
link |
I'm actually not an open source zealot.
link |
I love open source for the, what it brings, but I also see the role for proprietary software.
link |
But what I didn't like was the fact that I would develop code and publish it, and then
link |
effectively telling somebody here to run my code, you have to have this proprietary software.
link |
And there's also culture around MATLAB as much, because I've talked to a few folks
link |
at MathWorks, which creates MATLAB.
link |
I mean, there's just a culture, they try really hard, but it's just this corporate IBM
link |
style culture that's like, or whatever.
link |
I don't want to say negative things about IBM or whatever, but there's a, no, it's really
link |
It's something I'm in the middle of right now is the business of open source.
link |
And how do you connect the ethos of cooperative development with the necessity of creating
link |
And like right now today, you know, I'm still in the middle of that.
link |
That's actually the early days of me exploring this question.
link |
Because as I was writing SciPy, I mean, as an aside, I also had, so I had three kids
link |
I have six kids now.
link |
I got married early, wanted a family.
link |
I had three kids, and I remember reading, I read Richard Stallman's post, and I was
link |
a fan of Stallman.
link |
I would read his work, I liked these collective ideas he would have.
link |
Certainly the ideas on IP law, I read a lot of stuff.
link |
But then he said, you know, okay, well, how do I make money with this?
link |
How do I make a living?
link |
How do I pay for my kids?
link |
All this stuff was in my mind.
link |
Young graduate student making no money thinking I got to get a job.
link |
And he said, well, you know, I think just be like me and don't have kids, right?
link |
That's just don't, don't.
link |
That's his take on, is that his dad?
link |
That was the, that was the, what he said in that moment, right?
link |
That's the thing I read and I went, okay, this is a train I can't get on.
link |
There has to be a way to preserve the culture of open source and still be able to make sufficient
link |
money to feed your kids.
link |
There's got to be, well, so that actually led me to a study of economics, because at
link |
the time I was ignorant, and it really was, and I'm actually, I'm embarrassed for educational
link |
system that they could let me, and I was valedictorian of my high school class, and I did super well
link |
in college, and like academically I did great, right?
link |
But the fact that I could do that and then be clueless about this key part of life, it
link |
led me to go, there's a problem.
link |
Like I should have learned this in fifth grade, I should have learned this in eighth
link |
grade, like everybody should come out with a basic knowledge of economics.
link |
You're an interesting example, because you've created tools that change the lives of probably
link |
millions of people, and the fact that you don't understand at the time of the creation
link |
of those tools, the basic economics of how to build up a giant system is a problem.
link |
Yeah, it's a problem.
link |
And so, during my PhD, at the same time, this is back in 98, 99, at the same time, I was
link |
in a library, I was reading books on capitalism, I was reading books on Marxism, I was reading
link |
books on, you know, what is this thing, what does it mean?
link |
And I encountered a, basically what I, I encountered a set of writings from people that said they
link |
were the inheritors of Adam Smith, but Adam Smith for the first time, right?
link |
Which is The Wealth of Nations and kind of this notion of emergent, emergent societies,
link |
and realized, oh, there's this whole world out here of people.
link |
And the challenge of economics is also political, like, because economics, you know, people,
link |
different parties running for office, they'll, they want their economic friends, they want
link |
their economists to back them up, right, or to be their magicians, like the magicians
link |
in Pharaoh's court, right, the people that are kind of say, hey, this is, you should
link |
listen to me, because I've got the expert who says this.
link |
And so, it gets really muddled, right?
link |
I was looking at, as a scientist going, what is this space, what does this mean, how does
link |
Paris get fed, how does, how does money, how does it work?
link |
And I found a lot of writings I really loved.
link |
I found some things that I really loved, and I learned from that, it was writings from
link |
people like von Mises, he wrote a, he wrote a paper in 1920 that still should be read
link |
It's got, I mean, it was Economic Calculation in the Socialist Commonwealth.
link |
It was basically in response to the Bolshevik Revolution in 1917.
link |
And his basic argument was, it's not going to work to not have private property.
link |
You're not going to be able to come up with prices.
link |
The bureaucrats aren't going to be able to determine how to allocate resources without
link |
And a price system emerges from people making trades.
link |
And they can only make trades if they have authority over the thing they're trading.
link |
And that, that creates information flow that you just don't have if you try to top down
link |
It's like, huh, that's a really good point.
link |
The prices have a signal that's used, and it's important to have that signal when you're
link |
trying to build a community of productive people like you would in the software engineering
link |
The prices are actually an important signaling mechanism, right?
link |
And that money is just a bartering tool.
link |
So this is the first time I've encountered any of this concept, right?
link |
And the fact that, oh, this is actually really critical.
link |
Like it's so critical to our prosperity and that we're, we're dangerously not learning
link |
about this, not teaching our children about this.
link |
So you had the three kids, you had to make some hard decisions.
link |
You had to make some money, right, had to figure it out.
link |
But I didn't really care.
link |
I mean, I was never, I've never been driven by money.
link |
So what, how did that resolve itself in terms of SciPy?
link |
So I would say it didn't really resolve itself.
link |
It sort of started a journey that I'm continuing on.
link |
I'm still on, I would say.
link |
I don't think it resolved itself, but I will say I went in with eyes wide open.
link |
Like I knew that there were problems with, you know, giving stuff away and creating the,
link |
the market externalities, that the fact that, yeah, people might use it and I might not
link |
get paid for it and I'll have to figure something else out to get paid.
link |
Like at least I can say I'm not bitter that a lot of people have used stuff that I've
link |
written and I haven't necessarily benefited economically from it.
link |
Like I've heard other people be, you know, bitter about that when they write or they
link |
talk, they go, oh, I should have got more value out of this.
link |
And I'm also, I want to create systems that let people like me who might have these desires
link |
to do things, let them benefit so that actually creates more of the same.
link |
Not to turn on your bitterness module, but there's some aspect, I wish there was mechanisms
link |
for me to reward whoever created SciPy and NumPy because it brought so much joy
link |
I appreciate that.
link |
And I, the tip jar notion was there.
link |
I appreciate that.
link |
And I think there should be a very, there should be a frictionless mechanism.
link |
I would love to talk about some of the ideas I have because I actually came across, I think
link |
I've come up with some interesting notions that could work, but they'll require anything
link |
that will work takes time to emerge.
link |
And it doesn't just turn overnight.
link |
That's definitely one thing I've also understood and learned is any fixes.
link |
That's why it's kind of funny.
link |
We often give credit to, you know, oh, this president gets elected and, oh, look how great
link |
And I saw that when, when I had a transition at Anaconda, when a new CEO came in, right?
link |
And it's like the success that's happening.
link |
There's an inertia there.
link |
And sometimes the decision made like 10 years before is the reason why the success is
link |
So we're sort of just running around taking credit for stuff.
link |
Credit assignment has like a delay to it.
link |
That makes the credit assignment basically wrong more than right.
link |
Wrong more than right.
link |
And so I'm like, oh, this is, you know, that's the stuff I would, I would read a ton about,
link |
you know, early on.
link |
So I don't, I feel like I'm with you.
link |
Like I want the same thing.
link |
I want to be able to, and honestly not for personally, I've been happy.
link |
I've been, I've been happy.
link |
I feel like I don't have any, I mean, we've done reasonably okay, but I've had to
link |
Like that's, that's really what started my trajectory from academia is reading that
link |
stuff led me to say, oh, entrepreneurship matters.
link |
So I love software, but we need more entrepreneurs and I want to understand that better.
link |
So once I kind of had that, that virus infect my brain, it, even though I was on a trajectory
link |
to go to a tenure track position at a university and I was there for six years, I was kind
link |
of already out the door when I started and we can get into that.
link |
Yeah, what can I just ask a quick question on, is there some design principles that were
link |
in your mind around SciPy?
link |
Like, is there some key ideas that were just like sticking to you that this is, this is
link |
the fundamental ideas?
link |
Yeah, I would say so.
link |
I would think it's basically accessibility to scientists, like give them, give scientists
link |
and engineers tools that they don't have to think a lot about programming.
link |
So give them really good building blocks, give them functions that they want to call
link |
and sort of just the right length of spelling.
link |
You know, there's a one tradition in a programming where it's like, you know, make very, very
link |
long names, right?
link |
And you can see it in some programming languages where the names get, you know, take half the
link |
screen and I, and in the Fortran world, names would have to be six, six characters early on,
link |
And that's way too, too much, too, too little, but I was like, I like to have names that
link |
were informative, but short.
link |
So even though Python was a different conversation, but documentation is doing so well.
link |
There's some work there.
link |
So when you look at great scientific libraries and functions, there's, there's a richness
link |
of documentation that helps you get into the details.
link |
The first glance at a function gives you the intuition of all it needs to do by looking
link |
at the headers and so on.
link |
But to get the depths of all the complexities involved, all the options involved, documentation
link |
does some of the work.
link |
Documentation is essential.
link |
So that was actually, so we thought about several things.
link |
One is we wanted plotting.
link |
We wanted interactive environment.
link |
We wanted good documentation.
link |
These were things we knew, we wanted.
link |
The reality is those took about 10 years to evolve, right?
link |
Given the fact that we didn't have a big budget, it was all volunteer labor.
link |
It was sort of when Enthought got created and they started to, you know, try to find projects,
link |
people would pay for pieces and they were able to fund some of it, not nearly enough
link |
to keep up with what was necessary.
link |
And I'm, no, no criticism, just simply the reality.
link |
I mean, it's, it's hard to start a business and then do consulting and then also promote
link |
an open source project that's still fairly new.
link |
SciPy was fairly niche.
link |
We stayed connected all while I was a student, sorry, a professor.
link |
I went to BYU and started to teach, electrical engineering, all the applied math courses.
link |
I loved teaching, signal processing, probability theory, electromagnetism.
link |
I was the, if you look at Rate My Professors, which my kids loved to do, I wasn't, I got
link |
some bad reviews because people.
link |
What was the criticism?
link |
I would speak too high, too high of a level.
link |
Like I definitely had a calibration problem coming out of graduate work where I hate to
link |
be condescending to people.
link |
Like I really have a ton of respect for people fundamentally.
link |
Like my fundamental thing is I respect people.
link |
Sometimes that can lead to a, I was, I was thinking they were, they, they had more knowledge
link |
And so I would just speak at a very high level, assume they got it.
link |
But they need to rise to the standard that you set.
link |
I mean, that's one of the, some of the greatest teachers do that.
link |
And I agree, and that was kind of what was inspiring me, but, but you know, you also have to, I
link |
cannot say I was as articulate as some of the greatest teachers, right?
link |
I was, you know, like one, one classic example, when I first taught at BYU, my very first
link |
class, it was overheads, transparencies, overheads, before projectors were really that common.
link |
So transparencies, I'm writing my notes out, I go in, room's half dark, I'm just blaring through
link |
these transparencies.
link |
And I did, gave a quiz after two weeks, nobody knew anything, nothing I had taught, I had
link |
And I realized, okay, I'm not, this is not working.
link |
So I took, put away the transparencies and I turned around and just started using the
link |
And what it did is it slowed me down, right?
link |
The chalkboard just slowed me down and gave people time to process and to think and then
link |
that made me focus.
link |
My writing wasn't great on the chalkboard, but I really loved that part of like the teaching.
link |
So that, that entered SciPy's world in terms of, we always understood that there's
link |
a didactic aspect of SciPy kind of, how do you take the knowledge and then produce
link |
The challenge we had was the scope.
link |
Like ultimately SciPy was everything, right?
link |
And so 2001 when it first came out, people were starting to use it.
link |
This is a tool we actually use.
link |
At the same time, 2001 timeframe, there was a little bit of like the Hubble Space Telescope,
link |
the folks at Hubble had started to say, hey, Python, we're going to use Python for processing
link |
images from Hubble.
link |
And so Perry Greenfield was a good friend and running that program and he had called me
link |
before I left for BYU and said, you know, we want to do this, but Numeric actually has
link |
some challenges in terms of, you know, it's not, the array doesn't have enough types.
link |
We need more operations, you know, broadcast needs to be a little more settled.
link |
They wanted record arrays.
link |
They wanted, you know, record arrays are like a data frame, but a little bit different,
link |
but they want a more structured data.
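For readers unfamiliar with record arrays, the idea can be sketched with NumPy's present-day structured dtypes. This is only an illustration of the concept, not the API that existed in Numeric or Numarray at the time, and the field names and values below are invented for the example.

```python
import numpy as np

# Hypothetical catalog of detected sources: each element is a record
# with named, typed fields rather than a single number.
source_dtype = np.dtype([
    ("name", "U10"),    # short text label
    ("ra", "f8"),       # right ascension, degrees
    ("dec", "f8"),      # declination, degrees
    ("counts", "i4"),   # photon counts
])

catalog = np.array(
    [("src_a", 10.5, -3.2, 120), ("src_b", 11.0, 4.1, 75)],
    dtype=source_dtype,
)

# Fields are selected by name, and the usual array operations apply.
bright = catalog[catalog["counts"] > 100]

# Viewing the same memory as a recarray adds attribute-style access.
rec = catalog.view(np.recarray)

print(bright["name"])
print(rec.ra)
```

Unlike a data frame, every record here shares one fixed binary layout, which is what makes it convenient for instrument data read straight from files.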
link |
So he had called me even early on then and they said, yeah, would you want to work on
link |
something to make this work?
link |
And I said, yeah, I'm interested, but I'm going here and we'll see if I have time.
link |
So in the meantime, while I was teaching and SciPy was emerging and I had a student, I was
link |
constantly while I was teaching trying to figure a way to fund this stuff.
link |
So I had a graduate student, my only graduate student, a Chinese fellow, Lu Hongze is his
link |
He wrote a bunch of stuff for iterative linear algebra, like got into writing some of the
link |
iterative linear algebra tools that are currently there in SciPy and they've gotten better since,
link |
but this is in 2005, kept working on SciPy.
link |
But Perry had started working on a replacement for Numeric called Numarray, and in 2004, a package
link |
called nd_image, it was an image processing library that was written for Numarray.
link |
And it had in it a morphology tool.
link |
I don't know if you know what morphology is.
link |
It's opening, dilation.
link |
You know, there was sort of this, as a medical imaging student, I knew what it was because
link |
it was used in segmentation a lot.
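As a rough illustration of the morphology operations mentioned here, the sketch below implements binary dilation, erosion, and opening in plain Python on sets of pixel coordinates. It is a teaching sketch with invented names and data, not the nd_image implementation, which worked on arrays.

```python
def dilate(pixels, offsets):
    """Binary dilation: stamp the structuring element on every foreground pixel."""
    return {(r + dr, c + dc) for (r, c) in pixels for (dr, dc) in offsets}


def erode(pixels, offsets):
    """Binary erosion: keep positions where the whole element fits in the foreground."""
    return {
        (r, c)
        for (r, c) in pixels
        if all((r + dr, c + dc) in pixels for (dr, dc) in offsets)
    }


def opening(pixels, offsets):
    """Opening is erosion followed by dilation; it removes small specks."""
    return dilate(erode(pixels, offsets), offsets)


# A 2x2 structuring element, anchored at its top-left corner.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Foreground: a solid 3x3 block plus one isolated noise pixel.
block = {(r, c) for r in range(3) for c in range(3)}
image = block | {(6, 6)}

opened = opening(image, square)
print(sorted(opened))
```

In segmentation this is the typical use: the opening keeps the 3x3 block intact while the isolated pixel, which is too small to contain the structuring element, disappears.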
link |
And in fact, I'd wanted to do something like that in Python in SciPy, but just had never
link |
gotten around to it.
link |
So when it came out, it worked only on Numarray, and SciPy needed Numeric, and so we
link |
effectively had the beginning of this split, and Numeric and Numarray didn't share data.
link |
They were just two, so you could have a gigabyte of Numarray data and a gigabyte of Numeric data
link |
and they wouldn't share it.
link |
And so you had these, then you had these scientific libraries written on top.
link |
I got really bugged by that.
link |
I got really like, oh man, this is not good.
link |
We're not cooperating now.
link |
We're sort of redoing each other's work and we're just this young community.
link |
So that's what led me, even though I knew it was risky because my, I was on a tenure
link |
2004, I got reviewed.
link |
They said, hey, things are going okay.
link |
You're doing well.
link |
Paper's coming out.
link |
But you're kind of spending a lot of time on this open source stuff.
link |
Maybe do a little less of that and a little more of the paper writing and grant writing,
link |
which was naive, but it was definitely the time, the thinking that still goes on, still
link |
You're basically creating a thing which enables science in the 21st century.
link |
Maybe don't emphasize that so much in your tenure.
link |
It illustrates some of the challenges.
link |
And it's, people mean well, but we've gotten broken in a bunch of ways.
link |
Certain things, a programming, understanding the role of software engineering, programming
link |
in society is a little bit like, I guess.
link |
Now I was in an electrical engineering position.
link |
They were very focused.
link |
And so, good people, and I had a great time.
link |
I loved my teaching.
link |
I loved all the things I did there.
link |
The problem was, this split was happening, this community I loved.
link |
I saw people and I went, oh my gosh, this is going to be, this is not great.
link |
And so, I happened, fate, I had a class I signed up for, I was trying to build an MRI system.
link |
But I had a kind of a radio, a digital radio class, a digital MRI class.
link |
And I had people sign up, two people signed up, then they dropped, and so, I had nobody
link |
So, and I didn't have any other courses to teach, and I thought, oh, I've got some time.
link |
And I'll just write, I'll just write a merger of Numeric and Numarray.
link |
Like I'll basically take the Numeric code base, add the features Numarray was adding,
link |
and then kind of come up with a single array library that everybody can use.
link |
So that's where NumPy came from, was my thinking, hey, I can do this, and who else is going
link |
Because at that point, I'd been around the community long enough, and I'd written enough
link |
I knew, I knew the structures.
link |
And I, in fact, my first contribution to Numeric had been writing the C API documentation that
link |
went in the first documentation for NumPy, for Numeric, sorry.
link |
This is Paul Dubois, David Ascher, Konrad Hinsen and myself.
link |
I got credit because I wrote this chapter, which is all the C API of Numeric, all the
link |
So I said, ah, I'm probably the one to do it, and nobody else is going to do this.
link |
So it's sort of out of a sense of duty and passion, knowing that I don't think my academic,
link |
I don't think the department here is going to appreciate this, but it's the right thing
link |
Can we just linger on that moment because the importance of the way you thought and the
link |
action you took, I feel is understated and is rare, and I would love to see so much more
link |
of it because what happens as the tools become more popular, there's a split that happens.
link |
And it's a truly heroic and impactful action to in those early, in that early split to
link |
step up and you, it's like great leaders throughout history, like, what is it, Braveheart,
link |
like get on a horse and, and rally the troops because I think that can have, make a big
link |
We have TensorFlow versus PyTorch in the machine learning.
link |
We have the same problem today.
link |
I wonder, it's actually bigger.
link |
I wonder if it's possible in the early days to rally the troops.
link |
It is possible, especially in the early days.
link |
The longer it goes, the harder, right?
link |
And the more energy in the factions, the harder, but in the early days, it is possible and it's
link |
extremely helpful and there's a willingness there, but, but the, but the challenge is
link |
there's usually not a willingness to fund it.
link |
There's not a willingness to, you know, like I was literally walking into a field saying
link |
I'm going to do this and you know, here I am, like, you know, I have five kids at home
link |
Sometimes my wife hears these stories and she's like, you did what?
link |
I thought we were going to, I thought you were actually on a path to make sure we had
link |
resources and money, but, but again, there's a, there's an aspect, I'm, I'm a very hopeful
link |
I'm an optimistic person.
link |
I learned that about myself later on, uh, uh, part of my, uh, my religious beliefs actually
link |
And it's why I hold them dear because it's actually how I feel about, it's what, it's
link |
what leads me to this, to these attitudes, sort of this hopefulness and this sense of,
link |
yeah, it may not, it may not work out for me financially or maybe, but that's not the
link |
Like that's a thing, but it's not, you know, that's not the scorecard, uh, for me.
link |
And so I just wanted to be helpful and I knew, and partly because of these SciPy conferences,
link |
because of the mailing list conversations, I knew there was a lot of need for this.
link |
And so I had this, it wasn't like I was alone in terms of no feedback.
link |
I had these people who knew, but it was crazy.
link |
Like people who, at the time, said, yeah, we didn't think you'd be able to do it.
link |
We thought it was crazy.
link |
And also instructive, like practically speaking, that you had a cool feature that you were
link |
chasing in the morphology, like the, like it's, it's not just like, it's not some visionary
link |
I'm going to unite the community.
link |
You were like, you were actually practically, this is what one person actually can do, uh,
link |
and actually build.
link |
Cause that is important cause you can get over your skis.
link |
You can definitely get over your skis.
link |
And I had, in fact, this almost got me over my skis, right?
link |
I would say, well, in retrospect, I hate looking back, we can, I can tell you all the flaws
link |
We want to go into it.
link |
There's lots of stuff that I'm like, oh man, that's embarrassing.
link |
I wish I had somebody slap me with a wet fish there.
link |
Like I needed, like what I'd wished I'd had was somebody with more experience and certainly
link |
library writing and array library.
link |
Like I wish I had me, I could go back in time and go, do this, do that.
link |
There's an important thing.
link |
Cause there's things we did that are still there that are problematic that created challenges
link |
And, and I didn't know it at the time, didn't understand how important that was.
link |
And in many cases, didn't know what to do.
link |
Like there was pieces of the design of NumPy.
link |
I didn't know what to do until five years ago.
link |
Now I know what they should have been, but I didn't know at the time and nobody, and
link |
I couldn't get the help.
link |
It took about, it took four months to write the first version and then about 14 months
link |
to make it usable.
link |
But it was, it wasn't, it was that first four months of intense writing, coding, getting
link |
something out the door that worked.
link |
That was, it was, it was definitely challenging.
link |
And then the big thing I did was create a new type object called dtype.
link |
That was probably the single, the, the contribution.
link |
And then the fact that I added a broad, not just broadcasting, but advanced indexing.
link |
So that you could do, um, masked indexing and indirect indexing instead of just slicing
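To make the indexing terms concrete, here is a small sketch using current NumPy. The array values are made up for illustration, and "indirect indexing" is shown here as indexing with an integer array.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Plain slicing: a contiguous range, returned as a view on the same memory.
sliced = a[1:4]

# Masked indexing: a boolean array selects the elements where it is True.
mask = a > 25
masked = a[mask]

# Indirect indexing: an integer array picks elements in any order, with repeats.
idx = np.array([4, 0, 0, 2])
picked = a[idx]

# Broadcasting: a (2, 1) column combines with a (5,) row to give a (2, 5) table.
col = np.array([[0], [100]])
table = col + a

# Slices share memory with the original; advanced indexing makes a copy.
slice_is_view = np.shares_memory(sliced, a)
mask_is_view = np.shares_memory(masked, a)
```

One practical consequence of the design: writing into `sliced` changes `a`, while writing into `masked` or `picked` does not.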
link |
So for people who don't know, and maybe you can elaborate NumPy, I guess the vision in
link |
the narrowest sense is to have this object that represents n-dimensional arrays.
link |
And like at any level of abstraction you want, but basically it could be a black box that
link |
you can investigate in ways that you would naturally want to investigate such objects.
link |
So you could do math on it easily.
link |
Math on it easily.
link |
So it had an associated library of math operations.
link |
And effectively SciPy became an even larger set of math operations.
link |
So the key for me was I was going to write NumPy and then move SciPy to depend on NumPy.
link |
In fact, early on, one of the initial proposals was that we would just write SciPy and it would
link |
have the Numeric object inside of it and it'd be SciPy dot array or something.
link |
That turned out to be problematic because Numeric already had a little mini library of linear
link |
algebra and some functions and it had enough momentum, enough users that nobody wanted
link |
They wanted the backward compatibility.
link |
One of the big challenges of NumPy was I had to be backward compatible with both Numeric
link |
and Numarray in order to allow both of those communities to come together.
link |
There was a ton of work in creating that backward compatibility that also created echoes in
link |
Like some of the complexity in today's object is actually from that goal of backward compatibility
link |
to these other communities, which if you didn't have that, you'd do something different,
link |
which is instructive because a lot of things are there.
link |
You think, what is that there for?
link |
It's like, well, it's a remnant, it's an artifact of its historical existence.
link |
By the way, I love the empathy and the lack of ego behind that because I feel...
link |
You see that in the split, in the JavaScript frameworks, for example, the arbitrary branching.
link |
I think in order to unite people, you have to kind of put your ego aside and truly listen
link |
What do you love about Numeric?
link |
What do you love about Numarray?
link |
Actually get a sense.
link |
I was talking about languages earlier, sort of empathize to the culture of the people
link |
that love something about this particular API, the naming style, or the usage patterns,
link |
and truly understand them so that you can create that same draw in the united thing.
link |
I completely agree.
link |
You have to also have enough passion that you'll do it.
link |
It can't be just like a perfunctory, oh yes, I'll really listen to you, and then I'm not
link |
really that excited about it.
link |
So it really is an aspect, it's a philosophical, like there's a philia, there's a love and esteeming
link |
of others that's actually at the heart of what is sort of a life philosophy for me,
link |
right, that I'm constantly pursuing, and that helped, absolutely helped.
link |
Makes me wonder in a philosophical, like looking at human civilization as one object, it makes
link |
me wonder how we can copy and paste Travis's in the story.
link |
Well, in some aspects, maybe.
link |
Some aspects, right, right, exactly.
link |
Well, it's a good question, how do we teach this, how do we encourage it, how do we lift
link |
Because so much of the software world, it's giant communities, right, but it seems like
link |
so much is moved by little individuals.
link |
You talk about Linus Torvalds, it's like, could you have had Linux without him?
link |
Yeah, Guido and Python.
link |
I mean, in the SciPy community in particular, I said we wanted to build this big thing,
link |
but ultimately we didn't, what happened is we had mavericks and champions like John Hunter
link |
who created Matplotlib, we had Fernando Perez who created IPython, and so we sort of inspired
link |
each other, but in the credit, there's sort of a culture of this selfless, giving the
link |
stewardship mentality, as opposed to ownership mentality, but stewardship and community focused,
link |
community focused, but intentional work, like not waiting for everybody else to do the work,
link |
but you're doing it for the benefit of others and not worried about what you're going to
link |
You're not worried about the credit, you're not worried about what you're going to get,
link |
you're worried about, I later realized that I have to worry a little about credit, not
link |
because I want the credit, because I want people to understand what led to the results.
link |
It's not about me, it's I want to understand this is what led to the result, so I think
link |
doing, and this is what had no impact on the result, like let's promote, this is just like
link |
you said, I want to promote the attributes that help make us better off.
link |
How do we make more of Wes McKinney, like Wes McKinney was critical to the success
link |
of Python because of his creation of pandas, the roots of which were all the
link |
way back in Numeric, Numarray and NumPy, where NumPy created an array of records. Wes started
link |
to use that almost like a data frame, except it's an array of records, and data frame,
link |
the challenge is, okay, if you want to augment it, add another column, you have to insert, you
link |
have to do all this memory movement to insert a column, whereas data frames became, oh, I'm
link |
going to have a loose collection of arrays, so it's a record of arrays that is the heart
link |
And we thought about that back in the Numarray days, but Wes ended up doing the work to
link |
build it, and then also the operations that were relevant for data processing.
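The difference between an array of records and a record of arrays can be sketched as follows. This is an illustration in current NumPy with made-up field names, not how pandas is implemented; a plain dict stands in for the "loose collection of arrays".

```python
import numpy as np

# Array of records: one contiguous block, each element holds every field.
records = np.array([("a", 1.0), ("b", 2.0)],
                   dtype=[("name", "U8"), ("price", "f8")])

# Adding a column means building a wider dtype and copying every field over.
wider_dtype = records.dtype.descr + [("qty", "i4")]
wider = np.empty(records.shape, dtype=wider_dtype)
for field in records.dtype.names:
    wider[field] = records[field]
wider["qty"] = [3, 4]

# Record of arrays: each column is its own array.
frame = {
    "name": np.array(["a", "b"]),
    "price": np.array([1.0, 2.0]),
}
price_before = frame["price"]

# Adding a column touches none of the existing columns.
frame["qty"] = np.array([3, 4])
price_untouched = frame["price"] is price_before
```

The first layout keeps each row together in memory; the second makes adding or dropping a column cheap, which is the trade-off described above.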
link |
What I noticed is just that each of these little things creates just another tick, another
link |
up, so NumPy ultimately took a little while, about six months in, people started joining
link |
me, Francesc Alted, Robert Kern, Charles Harris, and these people are many of the unsung
link |
heroes, I would say.
link |
People who are, they sometimes don't get the credit they deserve, because they were critical
link |
both to support, it's hard and you need some support, people need support, and I needed
link |
just encouragement, and they were helping, encouraged by contributing.
link |
And once, the big thing for me was when John Hunter, he had previously done kind of a simple
link |
thing called numerix to kind of, you know, between Numeric and Numarray, he had a little
link |
high level tool that would just select each one for Matplotlib.
link |
In 2006, he finally said, we're going to just make NumPy the dependency of Matplotlib.
link |
As soon as he did that, and I remember specifically when he did that, I said, okay, we've done
link |
Like, that was when I knew we had seen success.
link |
Before then, it was still, you know, unsure, but that kind of started a roller coaster
link |
and then 2006 to 2009, and then I've been floored by what it's done.
link |
Like, I knew it would help, I had no idea how much it would help, right.
link |
And it has to do with, again, the language thing, it just, people started to think in
link |
And that opened up a whole new way of thinking.
link |
And part of the story that you kind of mentioned, but maybe you can elaborate, it seems like
link |
at some point in this story, Python took over science and data science, and not bigger
link |
than that, the scientific community started to think like programmers or started to utilize
link |
the tools of computers to do, like at a scale that wasn't done with Fortran, like at this
link |
gigantic scale, they started opening their heart, and then Python was the thing.
link |
I mean, there's a few other competitors, I guess, but Python, I think, really, really
link |
There's a lot of stories here that are kind of during this journey, because this is sort
link |
of the start of this journey in 2005, 2006.
link |
So my tenure committee, I applied for tenure in 2006, 2007, it came back, I split the department.
link |
I was very polarizing.
link |
I had some huge fans, and then some people said, no way, right.
link |
So it was very, I was a polarizing figure in the department.
link |
It went all the way up to the university president.
link |
Ultimately, my department chair had this way, and they didn't say no, they said, come back
link |
in two years and do it again.
link |
And I went, at that point, I was like, I had this interest in entrepreneurship, this interest
link |
in not the academic circles, not the, like, how do we make industry work?
link |
So I do have to give credit to that exploration of economics, because that led me, oh, I had
link |
a lot of opinions, I was actually very libertarian at the time.
link |
And I still have some libertarian trends, but I'm more of a, I'm more of a collectivist
link |
So you value broadly, philosophically, freedom?
link |
Value broadly, philosophically, freedom, but I also understand the power of communities,
link |
like the power of collective behavior.
link |
And so what's that balance, right, that makes sense?
link |
So by the time I was just, I got to go out and explore this entrepreneurial world.
link |
When I left academia, I said, no thanks, called my friend, Eric, here, who had, his company
link |
was going, I said, hey, could I join you and start this trend?
link |
And he, at that time, they were usually inspired by a lot, they were trying to get clients,
link |
and so I came down to Texas.
link |
And in Texas is where I sort of, it's my entrepreneur world, right?
link |
I left academia and went to entrepreneur world in 2007.
link |
So moved here in 2007, kind of took a leap, knew nothing really about business, knew nothing
link |
about a lot of stuff there.
link |
There's, you know, for a long time, I've kept some connections to a lot of academics, because
link |
I still love the scientific tradition, I still value the essence and the soul and the heart
link |
of what is possible.
link |
Don't like a lot of the administration and the kind of, we can go into detail about why
link |
and where and how this happens, what are the challenges?
link |
I mean, I don't know, but I'm with you, so I'm still affiliated with MIT, I still love
link |
MIT, because there's magic there, and there's people I talk to, like researchers, faculty,
link |
in those conversations and the white board and just the conversation, that's magic there.
link |
All the other stuff, the administration, all that kind of stuff, seems to, you don't want
link |
to say too harshly criticized sort of bureaucracies, but there's a lag that seems to get in the
link |
And I don't, I'm still have a lot of hope that that can change, because I don't often
link |
see that particular type of magic elsewhere in the industry.
link |
So like we need that, and we need that flame going, and it's the same thing as exactly
link |
as you said, it has the same kind of elements like the open source community does.
link |
But then if you, like the reason I stepped away, the reason I'm here, just like you did
link |
in Austin is like, if I want to build one robot, I'll stay at MIT, but if I want to
link |
build millions and make money enough to work and explore the magic of that, then you can't.
link |
And I think that dance is...
link |
The translational dance has been lost a bit, and there's a lot of reasons for that.
link |
I'm certainly not an expert on this stuff, I can opine like anybody else, but I realized
link |
that I wanted to explore entrepreneurship, and really figure out, and it's been a driving
link |
passion for 20 years, 20, 25 years, how do we connect capital markets and company, because
link |
again, I fell in love with the notion, oh, profit seeking on its own is not a bad thing.
link |
It's actually a coordination mechanism for allocating resources that in an emergent way,
link |
that respects everybody's opinions.
link |
So this is actually powerful.
link |
So I say all the time, when I make a company and we do something that makes profit, what
link |
we're saying is, hey, we're collecting of the world's resources and voluntarily people
link |
are asking us to do something that they like, and that's a huge deal.
link |
And so I really liked that energy, so that's what I came to do and to learn and to try
link |
I've been kind of stumbling through for the past 14 years.
link |
And so you were still working on NumPy.
link |
So NumPy was just emerging, right?
link |
One of the things I'd done, it's worth mentioning because it emphasized the exploratory nature
link |
of my thinking at the time.
link |
I said, well, I don't know how to fund this thing.
link |
I've got a graduate student I'm paying for, and I've got no funding for him.
link |
And I had done some fundraising from the public to try to get public fundraising for my lab.
link |
I didn't really want to go out and just do the fundraising circuit the way it's traditionally
link |
So I wrote a book, and I said, I'm going to write a book, and I'm going to charge for
link |
It was called Guide to NumPy.
link |
And so ultimately NumPy became documentation driven development because I basically wrote
link |
the book and made sure the stuff worked, so the book would work.
link |
So it really helped actually make NumPy become a thing.
link |
So writing that book, and it was not a page turner, Guide to NumPy is not a book you pick
link |
up and go, oh, this is great, over the fire.
link |
But it's where you could find the details, like how did all this work?
link |
And a lot of people love that book.
link |
And so a lot of people ended up, but I said, look, I need to, so I'm going to charge for
link |
And I got some flak for that, not that much.
link |
Just probably five angry messages, people yelling at me saying I was a bad guy for charging
link |
It was one of them which is dumb.
link |
No, I haven't really had any interaction with him personally, like I said.
link |
But there were a few, but I'm just surprisingly not, there were actually a lot of people
link |
like, no, it's fine.
link |
You can charge for a book.
link |
That's no big deal.
link |
That's the way you can try to make money around open source.
link |
So what I did, what I did in an interesting way, I said, well, you know, kind of my idea
link |
is around IP law and stuff.
link |
You can share something.
link |
You can spread it.
link |
Like once it's, the fact that you have a thing and copying is free, but the creation
link |
So how do we, how do you fund the creation and allow the copying, right?
link |
And the software is a little more complicated than that because creation is actually a continuous
link |
You know, it's not like you build a widget that's done, it's sort of a process of emerging
link |
and continuing to create.
link |
But I wrote the book and had this market determined price thing.
link |
I said, look, I need, I think I said 250,000.
link |
If I make 250,000 from this book, it's, it'll, I'll make it free.
link |
So as soon as I get that much money, or I said five years, right?
link |
So there's a time limit.
link |
I didn't know this story.
link |
So I released it on this.
link |
And it's actually interesting because one of the people who also thought that was interesting
link |
ended up being Chris White, who was the director of DARPA project that we got funding
link |
through at Anaconda, and the reason he even called us back is because he remembered my
link |
name from this book and he thought that was interesting.
link |
And so even though we hadn't gone to the demo days, we applied and the people said, yeah,
link |
nobody ever gets this without coming to the demo day first.
link |
It's the first time I've seen it, but it's because I knew, you know, Chris had done this
link |
and had this interaction.
link |
So it did have impact.
link |
I was actually really, really pleased by the result.
link |
I mean, I ended up, I ended up in three years, I made 90,000.
link |
So sold 30,000 copies by myself.
link |
I just put it up on, you know, used PayPal and sold it.
link |
And those are my first tastes of kind of, okay, this can work to some degree.
link |
And I, you know, all over the world, right?
link |
From Germany to Japan to, it was actually, it did work.
link |
And so I appreciated the fact that PayPal existed and had a way to make, to get the money.
link |
The distribution was simple.
link |
This is pre Amazon book stuff.
link |
So it was just publishing a website.
link |
It was the popularity of SciPy emerging and getting company usage.
link |
I ended up not letting it go the five years and not trying to make the full amount because,
link |
you know, a year and a half later, I was at Enthought.
link |
I had left academia, was at Enthought, and I kind of had a full time job.
link |
And then actually what happened is the documentation people, there's a group that said, hey, we
link |
want to do documentation for SciPy as a collective.
link |
And they were essentially needing the stuff in the book.
link |
And so they kind of asked, hey, can we just use the stuff in your book?
link |
And at that point I said, yeah, I'll just open it up.
link |
So that's, but it has served its purpose.
link |
And the money that I made actually funded my grad student.
link |
Like it was actually, you know, I paid him $25,000 a year out of that money.
link |
The funny thing is if you do a very similar kind of experiment now with NumPy or something
link |
like it, you could probably make a lot more.
link |
It's probably true.
link |
Because of the tooling and the community building.
link |
Like the, and social media, there's just a virality to that kind of idea.
link |
There'd be things to do.
link |
I've thought about that.
link |
I really had thought about a couple of books or a couple of things that could be done there.
link |
And I just haven't, right?
link |
I even, I tried to hire a ghostwriter this year too to see if that could help, but it
link |
Part of my problem is this, I've been so excited by a number of things that stemmed
link |
Like, so I came here, worked at Enthought for four years, graciously, you know, Eric
link |
made me president and we started to work closely together.
link |
We actually helped him buy out his partner.
link |
It didn't end great.
link |
Like unfortunately Eric and I aren't friends now.
link |
I still respect him.
link |
I have a lot, you know, I wish we were, but he didn't like the fact that Peter and I
link |
That was not, I mean, so there's two sides of that story, so I'm not going to go into
link |
But you, as human beings and you wish you still could be friends.
link |
I mean, that's a story of great minds building great companies.
link |
Somehow it's sad that when there's that kind of...
link |
And I hold him in esteem.
link |
I'm grateful for him.
link |
I think they're doing, you know, Enthought still exists.
link |
They're doing great work helping scientists.
link |
They still run the SciPy conference.
link |
They're in the, they have an R&D platform they're selling now that's a tool that you
link |
can go get today, right?
link |
So they've been, Enthought has played a role in the SciPy, in supporting the community
link |
around SciPy, I would say.
link |
They ended up not being able to, they ended up building a tool suite to write GUI applications.
link |
Like that's where they could actually make that the business could work.
link |
And so the supporting SciPy and NumPy itself wasn't as possible.
link |
Like they didn't, they tried.
link |
I mean, it was not just because, it was just because the business aspect.
link |
So, and then I wanted to build a company that could do, that could get venture funding,
link |
I mean, that's a longer story.
link |
We could talk a lot about that, but.
link |
And that's, that's where Anaconda came to be.
link |
That's where Anaconda came to be.
link |
So let me, let me ask you, it's a little bit for fun because you built this amazing thing.
link |
And so let's, let's talk about like an old warrior looking over old battles.
link |
You've, you know, there's a sad letter in 2012 that you wrote to the NumPy mailing list
link |
acknowledging that you're leaving NumPy and some of the things you've listed and some,
link |
some of the things you regret or not regret necessarily, but some things to think about.
link |
If you could go back and you could fix stuff about NumPy or both sort of in a personal
link |
level, but also like looking forward, what kind of things would you like to see changed?
link |
So I think there's technical questions and social questions right there.
link |
First of all, you know, I wrote NumPy as a service and I spent a lot of time doing it
link |
and then other people came to help make it happen.
link |
NumPy succeeded because of the work of a lot of people, right?
link |
So it's, it's important to be able to understand that I'm grateful for the opportunity and the
link |
role I had, I could play and grateful that things I did had an impact, but they only
link |
had the impact they had because of the other people that came to the, to the story.
link |
And so they were essential, but the way data types were handled, the way data types we
link |
had array scalars, for example, that are really just a substitute for a type concept, right?
link |
So we had array scalars that are actual Python objects so that there's for every, for a 32 bit float
link |
or a 16 bit float or a 16 bit integer, Python doesn't have a natural, it's just a one integer
link |
Well, what about these lower precision types, these larger precision types?
link |
So we had them in NumPy so that you could have a collection of them, but then have an object
link |
in Python that was one of them.
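A short sketch of what array scalars look like from the user's side, using current NumPy. The values are arbitrary; the point is only that pulling one element out of a typed array gives a Python object that remembers its precision.

```python
import numpy as np

arr = np.array([1.5, 2.5], dtype=np.float32)

# Indexing a single element returns an array scalar, not a built-in float.
x = arr[0]
x_type_is_float32 = type(x) is np.float32
x_is_builtin_float = isinstance(x, float)

# The scalar carries the same type information as the array it came from.
x_dtype = x.dtype
x_bytes = x.nbytes

# float64 is the one float width that also subclasses Python's float.
float64_is_builtin_float = issubclass(np.float64, float)

# Array scalars can also be created directly, e.g. a 16 bit integer.
small = np.int16(7)
small_dtype = small.dtype
```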
link |
And there's questions about like in retrospect, I wouldn't have created those; I would have improved
link |
the type system and like made the type system actually a Python type system, as opposed
link |
to currently it's a Python one level type system.
link |
I don't know if you know the difference between Python one, Python two, it's kind of technical
link |
kind of depth, but Python two, one of its big things that Guido did, it was really brilliant.
link |
In Python one, all classes, all new objects, were one type.
link |
So if a user wrote a class,
link |
It was an instance of a single Python type called the, called the class type, right?
link |
In Python two, he used a meta typing hook to actually go, oh, we can extend this and have
link |
users write classes that are new types.
link |
So he was able to have your user classes be actual types and the Python type system got
link |
I barely understood that at the time that NumPy was written.
link |
And so I essentially in Python and NumPy created a type system that was Python one era.
link |
It was every, every dtype is an instance of the same type, as opposed to having new
link |
dtypes be really just Python types with additional metadata.
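The distinction can be seen from Python itself. This is a sketch against current NumPy, where the internals have been reworked since the design described here, so it only checks the parts that still hold: a user class is a real Python type, while a dtype is an instance of `np.dtype` and not a class. The `Quaternion` class is a hypothetical placeholder.

```python
import numpy as np

# In Python 2 and later, a user-written class is itself a type.
class Quaternion:
    pass

user_class_is_type = isinstance(Quaternion, type)

# dtypes are objects describing memory layout, all instances of np.dtype.
f32 = np.dtype("float32")
i16 = np.dtype("int16")
point = np.dtype([("x", "f4"), ("y", "f4")])

all_are_dtype_instances = all(isinstance(d, np.dtype) for d in (f32, i16, point))

# A dtype is not a Python class, so it cannot be subclassed like one.
dtype_is_python_type = isinstance(f32, type)

# The array scalar class stands in as the Python-level type for a dtype.
scalar_class = f32.type
```

Adding something like a quaternion or posit element type therefore means working with the dtype machinery rather than simply writing a new Python class, which is the clumsiness described in the conversation.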
link |
What's the cost of that?
link |
Is it efficiencies or usability?
link |
It's usability primarily.
link |
The cost isn't really efficiency.
link |
It's the fact that it's clumsy to create new types.
link |
And then one of the challenges, you want to create new types, you want a quaternion type
link |
or you want to add a new, you know, posit type or you want to.
link |
Now in the, and now if we had done that well, when Numba came on the scene where we could
link |
actually compile Python code, it would integrate with that type system much cleaner.
link |
And now all of a sudden you could do gradual typing more easily.
link |
You could actually have Python when you add NumPy plus better typing, could actually be
link |
a, you'd smooth out a lot of rough edges.
link |
But there's already, there's like, but are you talking about from the perspective of
link |
developers within NumPy or users of NumPy?
link |
Developers of new, not really users of NumPy so much.
link |
It's the development of NumPy.
link |
So you're thinking about like how to design NumPy so that it's contributors.
link |
The contributors, it's easier.
link |
It's less work to make it better and to keep it maintained.
link |
And where that's impacted things, for example, is the GPU, like all of a sudden GPUs start
link |
getting added and we don't have them in NumPy.
link |
Like NumPy should just work on GPUs.
link |
The fact that we have to, you'd have to download a whole other object called CuPy to have arrays
link |
on GPUs is just an artifact of history.
link |
Like there's no, there's no fundamental reason for it.
link |
Well, that's really interesting if we could sort of go on that tangent briefly is you have
link |
PyTorch and other library like TensorFlow that basically tried to mimic NumPy.
link |
Like you've created a sort of platonic form of multi dimension.
link |
Well, and the problem was they didn't realize that.
link |
There were a lot of edges.
link |
They were like, well, we should cut those out before we present it.
link |
So I mean, I wonder if you can comment, is there like a difference between their implementations?
link |
Do you wish that they were all using NumPy over, like in this abstraction on GPU?
link |
And sorry to interrupt it, that there's GPUs, ASICs.
link |
There might be other neuromorphic computing.
link |
There might be other kind of, or the aliens will come with a new kind of computer, like
link |
an abstraction that NumPy should just operate nicely over the things that are more and more
link |
and smarter and smarter with this multi dimensional arrays.
link |
There's several comments there.
link |
We are working on something now called data-apis.org.
link |
You can go there today.
link |
And it's our answer.
link |
You know, it's not just me.
link |
It's me and Ralf and Athan and Aaron, and a lot of companies are helping us at Quansight.
link |
It's not unifying all the arrays, it's creating an API that is unified.
link |
So we do care about this and we're trying to work through it.
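To make the idea of a unified API (rather than a unified array type) concrete, here is a minimal sketch. The function name `standardize` and the `xp` parameter are illustrative assumptions, not something taken from the conversation or from any standard text; the only library calls used are `mean` and `std`, which NumPy provides.

```python
import numpy as np

def standardize(x, xp):
    """Shift and scale x using only functions looked up on the namespace xp."""
    # xp is whatever array library the caller passes in. Nothing in the
    # body names NumPy directly, so the same code could in principle run
    # on another library that offers compatible mean/std functions.
    return (x - xp.mean(x)) / xp.std(x)

data = np.array([1.0, 2.0, 3.0, 4.0])
result = standardize(data, np)  # here the namespace happens to be NumPy
```

The design point is that the function depends on an interface, not on one concrete array implementation.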
link |
I actually had the chance to go and meet with the TensorFlow team and the PyTorch team and
link |
talk to them after exiting Anaconda, just talking about, because the first year after
link |
leaving Anaconda in 2018, I became deeply aware of this and realized that, oh, the split
link |
in the array community that exists today makes what I was concerned about in 2005 pretty
link |
It's a lot worse than people, so perhaps the industry can sustain more stacks, right?
link |
There's a lot of money, but it makes it a lot less efficient.
link |
I mean, I've also learned to appreciate, it's okay to have some competitions, okay to have
link |
different implementations, but it's better if you can at least refactor some parts.
link |
I mean, you're going to be more efficient if you can refactor parts.
link |
It's nice to have competition over things, over what it's nice to have competition.
link |
They're innovative.
link |
And then maybe on the infrastructure, whatever, however you define infrastructure, maybe it's
link |
nice to have competition together.
link |
And I think, but it was interesting to hear the stories.
link |
I mean, TensorFlow came out of the C++ library.
link |
Jeff Dean wrote, I think, that was basically how they were doing inference, right?
link |
And then they realized, oh, we could do this TensorFlow thing.
link |
That C++ library, then what was interesting to me was the fact that both Google and Facebook
link |
did not, it's not like they supported Python or NumPy initially, they just realized they
link |
They came to this world and then all the users were like, hey, where's the NumPy interface?
link |
Oh, and then they kind of came late to it and then they had these bolt ons.
link |
TensorFlow's bolt on, I don't mean to offend, but it was so bad.
link |
It's the first time that I'm usually, I mean, one of the challenges I have is I don't criticize
link |
enough because in the sense that I don't give people input enough.
link |
I think it's universally agreed upon that the bolt ons on TensorFlow work.
link |
It was a talk given at a Py...
link |
Mallorca in Spain, and a guy, great guy, came and gave a talk, and I said, you should never
link |
show that API again at a PyData conference.
link |
That was terrible.
link |
You're taking this beautiful system we've created and you're corrupting all these poor
link |
Python people, forcing them to write code like that or thinking they should.
link |
Fortunately, they adopted Keras as their, and Keras is better.
link |
And so Keras, TensorFlow is fine, is reasonable, but they bolted it on.
link |
Facebook had their own C++ library for doing inference and they also had the same reaction.
link |
They had to do this.
link |
One big difference is Facebook, maybe because of the way it's situated in part of FAIR, part
link |
of their research library, TensorFlow is definitely used and they couldn't just open it up and
link |
let the community change what that is because I guess they were worried about disrupting
link |
Facebook's been much more open to having community input on the structure itself, whereas Google
link |
and TensorFlow, they're really eager to have community users.
link |
People use it and build the infrastructure, but it's much more walled.
link |
It's harder to become a contributor to TensorFlow itself.
link |
And it's also, this is a very difficult question to answer and don't mean to be throwing shade
link |
at anybody, but you have to wonder, it's the Microsoft question of when you have a tool
link |
like PyTorch or TensorFlow, how much are you tending to the hackers and how much are you
link |
tending to the big corporate clients?
link |
Do you tend to the millions of people that are giving you almost no money, or do you tend
link |
to the few that are giving you a ton of money?
link |
I tend to stand with the people.
link |
I feel like if you nurture the hackers, you will make the right decisions in the long
link |
term that will make the companies happy.
link |
I lean that way too.
link |
But then you have to find the right dance.
link |
But it's a balance, because you can lean to the hackers and run out of money.
link |
Which has been some of the challenge I've faced in the sense that I would look at some
link |
of the experiments, like NumPy, the fact that we have this split is a factor of I wasn't
link |
able to collect more money towards NumPy development, right?
link |
I mean, it didn't succeed in the early days of getting enough financial contribution to
link |
NumPy, so they didn't really work on it, right?
link |
I couldn't work on it full time.
link |
I had to just catch an hour here, an hour there, and I basically did not like that.
link |
I've wanted to be able to do something about that for a long time and try to figure out
link |
how, well, there's lots of ways, I mean, possibly one could say, you know, we had an offer from
link |
Microsoft in the early days of Anaconda, 2014, they offered to come buy us, right?
link |
The problem was the right people at Microsoft didn't offer to buy us, and they were still,
link |
it was really, we were like a second, they had really bought, they had just bought R,
link |
the R company called, it was not R Studio, but it was another R company that was emergent,
link |
and it was kind of a, well, we should also get a Python play, but they were really doubling
link |
down on R, right?
link |
And so it was like...
link |
It was where you would go to die, so it wasn't, it was before Satya was there.
link |
Satya had just started.
link |
And the offer was coming from someone two levels down from him.
link |
So it would come from Scott Guthrie, so I got a chance to meet Scott Guthrie, great
link |
If the offer had come from him, we'd probably be at Microsoft right now.
link |
That'd be fascinating.
link |
And that would be really nice, actually, especially given what Microsoft has since done for the
link |
open source community and all those things.
link |
Yes, I think they're doing well.
link |
I really like some of the stuff they've been doing.
link |
They're still working, and they've hired Guido now, and they've hired a lot of Python developers.
link |
Wait, Guido's now at Microsoft?
link |
He retired, then he came out of retirement, and he's working on...
link |
I was just talking to him, and he didn't mention this part.
link |
I should have better to get this further, because I know he loved Dropbox, but I wasn't
link |
sure what he was doing, or what he was up to.
link |
Well, he was kind of saying he'd retire, but it's literally been five years since I last
link |
sat down and really talked to Guido, right?
link |
Guido's a technology expert, right?
link |
So I came, I was excited because I'd finally figured out the type system for NumPy.
link |
I wanted to kind of talk about that with him, and I kind of overwhelmed him.
link |
Could you stay in that...
link |
Just for a brief moment, because you're a fascinating person in the history of programming.
link |
He is a fascinating person.
link |
What have you learned from Guido about programming, about life?
link |
I've been a fan of Guido's.
link |
We have a chance to talk.
link |
Some, I wouldn't say we talk all the time, not really at all, but we've talked enough
link |
In fact, when I first started NumPy, one of the first things I did was I asked Guido
link |
for a meeting with him and Paul Dubois in San Mateo, and I went and met him for lunch.
link |
Basically, to say, maybe we can actually...
link |
Part of the strategy for NumPy was to get it into Python 3, and maybe be part of Python.
link |
So we talked about that, and about that approach, right?
link |
That's a cool conversation.
link |
I would have loved to be a fly on the wall.
link |
Over the years for Guido, I learned...
link |
He was willing to listen to people's ideas, right, and over the years.
link |
Now, generally, I'm not saying universally that's been true, but generally that's been
link |
So he's willing to listen.
link |
He's willing to defer.
link |
Like on the scientific side, he would just defer.
link |
He didn't really always understand what we were doing, and he'd defer.
link |
One place where he didn't defer enough was that we missed a matrix multiply operator.
link |
Like that finally got added to Python, but about 10 years later than it should have.
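For readers who have not used it, the matrix multiply operator mentioned here is `@`. The following is a small sketch with made-up 2x2 values, contrasting it with elementwise `*` on NumPy arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

elementwise = a * b   # multiplies matching entries
matmul = a @ b        # matrix product via the @ operator (Python 3.5+)
```

Before `@` existed, the same product was typically spelled `np.dot(a, b)`.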
link |
The reason was because nobody... it takes a lot of effort, and I learned this while
link |
I was writing NumPy.
link |
I also wrote tools to...
link |
I became a Python dev, and I added some pieces to Python, like the memoryview object.
link |
I wanted the structure of NumPy into Python.
link |
So we didn't get NumPy into Python, but we got the basic structure of it into Python,
link |
so you could build on it.
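As a rough illustration of what "the basic structure" buys you, the built-in `memoryview` exposes another object's memory without copying it. This sketch uses only the standard library; the byte values are arbitrary.

```python
buf = bytearray(b"abcdef")
view = memoryview(buf)      # no copy: the view shares buf's memory
window = view[2:5]          # slicing a view is also copy-free
window[0] = ord("X")        # writes through to the underlying bytearray

layout = (view.format, view.shape)  # element type code and shape metadata
```

The `format` and `shape` attributes are the kind of array-style metadata that libraries can build on without importing NumPy.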
link |
Nobody did for a while, but eventually, database authors started to, and it's a lot better.
link |
And also, Antoine Pitrou and Stefan Krah actually fixed the memoryview object, because I wrote
link |
the underlying infrastructure in C, but the Python exposure was terrible until they came
link |
in and fixed it, partly because I was writing NumPy, and NumPy was the Python exposure.
link |
I didn't really care about if you didn't have NumPy installed.
link |
Anyway, Guido is open to ideas, technologically brilliant.
link |
I really got a lot of respect for him when I saw what he did with this type/class merger
link |
It was actually tricky, and then willing to share.
link |
Willing to share his ideas.
link |
So the other thing: early on, in 1998, I wrote my first extension module.
link |
The reason I could is because he wrote this blog post on how to do reference counting.
link |
And without it, I would have been lost.
link |
But he was willing to at least try to write this post.
link |
And so he's been motivated, early on with Python, there's a computer science for everybody.
link |
We're going to have this early on desire to, oh, maybe we should be pushing programming
link |
So he had this populist notion, I guess, or populist sense.
link |
So there's a certain skill, and I've seen it in other people, too, of engaging with
link |
contributors sufficiently to, because when somebody engages with you and wants to contribute
link |
to you, if you ignore them, they go away.
link |
So building that early contributor base requires real engagement with other people, and he
link |
Can you also comment on this tragic stepping down from his position as the benevolent
link |
dictator for life over the, you know, the walrus operator? The walrus operator was
link |
I don't know if that's the cause of it, but there's this, for people who don't know,
link |
you can look up, there's the walrus operator, which is, looks like a colon and equal sign.
link |
Yeah, colon, equal sign.
link |
And it actually does maybe the thing that you, that an equal sign should be doing.
link |
It's just that, historically, an equal sign means something else; it just means assignment.
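A short sketch of the walrus operator, for listeners who want to see it. The list of numbers and the threshold are invented for illustration.

```python
values = [3, 8, 15, 4]

# := assigns inside an expression, so the sum is computed once,
# bound to total, and tested in the same line (Python 3.8+).
if (total := sum(values)) > 20:
    message = f"total is {total}"
else:
    message = "small"
```

With a plain `=`, the assignment would have to be a separate statement before the `if`.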
link |
So he stepped down over this, what do you think about the pressure of leadership?
link |
It's something that you mentioned, the letter I wrote about NumPy at the time, that was a
link |
hard time actually, I mean, you know, there's been really hard times, it was hard, you know,
link |
you get criticized, right?
link |
And you get pushed, and you get, not everybody loves what you do, like anytime you do anything
link |
that has impact at all, you're not universally loved, right?
link |
You get some real critics, and that's an important energy because it's impossible for you to
link |
do everything right.
link |
You need people to be pushing, but sometimes people can get mean, right?
link |
People can, I prefer to give people the benefit of the doubt, I don't immediately assume they
link |
have bad intentions.
link |
And maybe for other, you know, maybe that doesn't happen for everybody, they, for whatever
link |
reason, their past, their experience of people, they sometimes have bad intentions, so they
link |
immediately attribute to you bad intentions, they're like, where did this come from?
link |
And I'm definitely open to criticism, but I think you're misinterpreting the whole point.
link |
Because I would get that, you know, sort of when I started Anaconda, you know, I've been,
link |
sometimes I say to people, I know I'm, I care enough about entrepreneurship to make some
link |
open source people uncomfortable, and I care enough about open source to make investors
link |
So I sort of, you know, create, you create kind of doubters on both sides.
link |
So when you have, and this is just a plea to the listener and the public, I've noticed
link |
this too, that there's a tendency, and social media makes this worse: when you don't have
link |
perfect information about the situation, you tend to fill the gaps with the worst possible,
link |
or at least a bad story that fills those gaps.
link |
And I think it's good to live life, maybe not fully naively, but filling in the gaps
link |
with the, with the good, with the best, with the positive, with the hopeful explanation
link |
of why you see this.
link |
So if you see somebody like you trying to make money on a book about NumPy, there's
link |
a million stories around that that are positive.
link |
And those are good to think about, to project positive intent on the people.
link |
Because for many reasons, usually because people are good, and they do have good intent.
link |
And also when you project that positive intent, people will step up to that.
link |
It's a great point.
link |
It has this kind of viral nature to it.
link |
And of course, what Twitter and Facebook figured out early on is that they can make a lot
link |
of money and engagement from the negative.
link |
And so like there's this, we're fighting this mechanism, which is challenging.
link |
It's just easier to be.
link |
And then for some reason, something in our mind really enjoys sharing that and getting,
link |
getting all excited about the negativity.
link |
It's a great mechanism perhaps that we're, we're going to eat and if we don't.
link |
For us to be effective as a group of people in a software engineering project, you have
link |
to project positive intent, I think.
link |
And I think that's very, and so that happens in this, in the space, but Python has done
link |
a reasonable job in the past, but here's a situation where I think it's, it started
link |
to get this pressure where it didn't.
link |
I was, I really didn't, I didn't know enough about what happened.
link |
I've, you know, talked to several people about it.
link |
And I know most of the steering committee members today, one, one person nominated me
link |
for that role, but it's the wrong role for me right now, right?
link |
I have a lot of respect for the Python developer space and the Python developers.
link |
I also understand the gap between computer science Python developers and array programming
link |
developers or science developers.
link |
And in fact, Python succeeds in the array space.
link |
The more it has people in that boundary and there's often very few, like I was playing
link |
a role in that boundary and, you know, working like everything to try to keep up with the,
link |
with even what Guido was saying.
link |
Like I'm a C programmer, but not a computer scientist, like I was an engineer and physicist
link |
and mathematician and I don't, I didn't always understand what they were talking about and
link |
why they would have opinions the way they did.
link |
So you know, you have to listen and try to understand, then you also have to explain
link |
your point of view in a way they can understand.
link |
And that takes a lot of work and that, that communication is always the challenge.
link |
And it's just what we're describing here about the negativity is just another form
link |
Like how do we come together?
link |
And it does appear we were wired anyway to at least have a, there's a part of us that
link |
will enemy, you know, friend, enemy and, and we see, yeah, it's like, why are we wiring
link |
on the enemy front?
link |
So, so why are we pushing that?
link |
Why are we promoting that so deeply?
link |
Assume friend until proven otherwise.
link |
So because you have such a fascinating mind and all this, let me just ask you these questions.
link |
So one interesting side on the Python history is the move from Python two to Python three.
link |
You mentioned move from Python one to Python two, but the move from Python two to Python
link |
three is a little bit interesting because it took a very long time.
link |
It broke in quite a small way, backward compatibility, but even that small way seemed to have been
link |
very painful for people.
link |
Are there lessons you draw from how long it took and how painful it seemed to
link |
Yeah, tons of lessons.
link |
I mentioned here earlier that NumPy was written in 2005.
link |
It was in 2005 that I actually went to Guido to talk about getting NumPy into Python three.
link |
Like my strategy was to, oh, we were moving to Python three.
link |
Let's have that be.
link |
And it seems funny in retrospect because like, wait, Python three, that was in 2020, right,
link |
when we finally ended support for Python two or at least 2017.
link |
The reason it took a long time, a lot of time, I think it was because one of the things is
link |
there wasn't much to like about Python three, 3.0, 3.1, it really wasn't until 3.3, like
link |
I consider Python 3.3 to be Python 3.0, but it wasn't until Python 3.3 that I felt there's
link |
enough stuff in it to make it worth anybody using it, right?
link |
And then 3.4 started to be, oh, yeah, I want that.
link |
And then 3.5 has the matrix multiply operator, and now it's like, okay, we got to use that.
link |
Plus the libraries that started leveraging some of the features of Python three.
link |
And then there was, it was, but it also illustrated a truism that, you know, it's, when you have
link |
inertia, when you have a group of people using something, it's really hard to move them away
link |
You can't just change the world on them.
link |
And Python three, you know, it made some, I think it fixed some things, Guido had always
link |
I think he didn't like the fact that print was a statement.
link |
He wanted to make it a function.
link |
But in some sense, that's a bit of gratuitous change to the language.
link |
And you could argue, and there's people have, but there was, one of the challenges was there
link |
wasn't enough features and too many just changes without features.
link |
And so the empathy for the end user as to why they would switch wasn't, wasn't there.
link |
I think also it illustrated just the funding realities.
link |
Like Python wasn't funded.
link |
Like it was also a project with a bunch of volunteer labor, right?
link |
It had more people, so more volunteer labor, but it was still, it was funded in the sense
link |
that at least Guido had a job.
link |
And I've learned some of the behind the scenes on that now since talking to people who have
link |
been through it, and maybe not on air, we can talk about some of that, but it's interesting
link |
to see, but Guido had a job, but his full time job wasn't just work on Python.
link |
Like he had other things to do.
link |
It is wild, isn't it?
link |
As wild as how few people are funded, and how much impact they have.
link |
Maybe that's a feature, not a bug.
link |
Maybe, yes, exactly.
link |
At least early on.
link |
It's sort of, I know.
link |
It's like Olympic athletes are often severely underfunded, but maybe that's what brings
link |
out the greatness.
link |
Maybe this is the essential part of it, because I do think about that in terms of, I currently
link |
have an incubator for open source startups.
link |
What I'm trying to do right now is create the environment I wish had existed when I was
link |
leaving academia with NumPy and trying to figure out what to do.
link |
I'm trying to create those opportunities and environments.
link |
And that's what drives me still, is how do I make the world easier for the open source
link |
Let me stay, I could probably stay in NumPy for a long time, but this is a fun question.
link |
So Andrej Karpathy leads the Tesla Autopilot team, and he's also one of the most legit programmers.
link |
He builds stuff from scratch a lot, and that's how he builds intuition about how a problem
link |
He builds it from scratch, and I always love that.
link |
And the primary language he uses is Python for the intuition building.
link |
But he posted something on Twitter saying that they got a significant improvement on
link |
some aspect of their data loading, I think, by switching away from np.sqrt, so
link |
NumPy's implementation of square root, to math.sqrt, and then somebody else commented that
link |
you can get even a much greater improvement by using the vanilla Python square root,
link |
And it's fascinating to me, I just wanted to...
link |
So that was some shade throwing at some...
link |
And yes, we're talking about...
link |
It's a good way to ask the trade off between usability and efficiency broadly in NumPy,
link |
but also in these specific weird quirks of a single function.
link |
So on that point, if you use a NumPy math function on a scalar, it's going to be slower
link |
than using a Python function on that scalar.
link |
Because the math object in NumPy is more complicated, because you can also call that
link |
math object on an array.
link |
And so effectively, it goes through a similar machine where there aren't enough of the...
link |
Which you could do, like checks and fast paths.
link |
So yeah, if you're basically doing a list, if you run over a list, in fact, for problems
link |
that are less than 1,000, even maybe 10,000 is probably the...
link |
If you're going more than 10,000, that's where you definitely need to be using arrays.
link |
But if you're less than that, and for reading, if you're doing a reading process, and essentially
link |
it's not compute bound, it's IO bound, and so you're really taking lists of 1,000 at
link |
a time and then doing work on it, yeah, you could be faster just using Python.
link |
Straight up Python.
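The scalar-versus-array point can be checked with a small timing sketch like the one below. The input value and the repetition count are arbitrary choices, and actual timings depend on the machine and library versions, so no numbers are quoted.

```python
import math
import timeit
import numpy as np

x = 2.0

# Same numeric answer either way ...
same = math.isclose(math.sqrt(x), float(np.sqrt(x)))

# ... but on a plain Python scalar the ufunc pays for its general
# array machinery on every call, while math.sqrt handles only floats.
t_math = timeit.timeit(lambda: math.sqrt(x), number=10_000)
t_numpy = timeit.timeit(lambda: np.sqrt(x), number=10_000)
```

Comparing `t_math` with `t_numpy` on your own machine shows how large the per-call overhead is for single values.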
link |
See, but also, this is the fundamental questions when you look at the long arc of history.
link |
It's very possible that np.sqrt is much faster.
link |
So in terms of don't worry about it, it's the evils of over optimization or whatever,
link |
all the different quotes around that is sometimes obsessing about this particular little quirk
link |
is not sufficient.
link |
For somebody like, if you're trying to optimize your path, I agree, premature optimization
link |
creates all kinds of challenges, right, because now, but you may have to do it.
link |
I believe the quote is the root of all evils.
link |
The root of all evils, right?
link |
I mean, let's give Don Knuth, I think, or let's give Don Knuth somebody else.
link |
Well, Don Knuth is kind of like Mark Twain, people just attribute stuff to him, I don't
link |
And it's fine because he's brilliant.
link |
No, I was a LaTeX user myself, and so I have a lot of respect, and he did more than
link |
that, of course, but yeah, someone I really appreciate in the computer science space.
link |
Yeah, I think that's appropriate.
link |
There's a lot of little things like that where people, actually, if you understood it, you
link |
go, yeah, of course, that's the case.
link |
And the other part I didn't mention, and Numba was a thing we wrote early on, and I was really
link |
excited by Numba because it's something we wanted, it was a compiler for Python syntax.
link |
I wanted it from the beginning of writing NumPy because of this function question, like
link |
taking, the power of arrays is really that you can write functions using all of it.
link |
It has implicit looping.
link |
So you don't worry about writing this n-dimensional for loop with four for loops, four for statements.
link |
You just say, oh, big four dimensional array, I'm going to do this operation, this plus,
link |
this minus, this reduction.
link |
And you get this, it's called vectorization in other areas, but you can basically think
link |
at a high level and get massive amounts of computation done.
link |
With the added benefit of, oh, it can be parallelized easily, it can be put in parallel, you don't
link |
have to think about that.
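The implicit looping described here can be seen side by side with explicit loops. This is a sketch on a small three-dimensional array with made-up contents; the same pattern applies to four or more dimensions.

```python
import numpy as np

a = np.arange(24.0).reshape(2, 3, 4)
b = np.ones((2, 3, 4))

# Explicit version: one Python-level for statement per dimension.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            looped[i, j, k] = a[i, j, k] + b[i, j, k]

# Array version: the loops are implicit in the + operation.
vectorized = a + b
total = vectorized.sum()   # a reduction over every element
```

Both versions compute the same array; the second states the operation once and leaves the iteration to NumPy.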
link |
In fact, it's worse to go decompose your, you write the for loops and then try to infer
link |
parallelism from for loops.
link |
That's actually harder problem than to take the array problem and just automatically parallelize
link |
That's what, and so functions in NumPy are called universal functions, ufunc.
link |
So square root is an example of a ufunc.
link |
There are others, sine, cosine, add, subtract.
link |
In fact, one of those first libraries in SciPy was something called special, where I added
link |
Bessel functions and all these special functions that come up in physics, and I added them as
link |
ufuncs so they could work on arrays.
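A brief sketch of what being a ufunc means in practice, using only functions that ship with NumPy. The sample inputs are arbitrary.

```python
import numpy as np

is_ufunc = isinstance(np.sqrt, np.ufunc)          # sqrt is a universal function
on_scalar = np.sqrt(9.0)                          # works on a single number
on_array = np.sqrt(np.array([1.0, 4.0, 9.0]))     # and elementwise on an array
summed = np.add.reduce(np.array([1, 2, 3, 4]))    # ufuncs also carry methods such as reduce
```

The same object handles both the scalar and the array call, which is the generality discussed in this part of the conversation.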
link |
So I understood ufuncs very, very well from day one inside of Numeric.
link |
That was one of the things we tried to make better in NumPy was how do they work?
link |
Can they do broadcasting?
link |
What does broadcasting mean?
link |
But one of the problems is, okay, what do I do with a Python scalar?
link |
So what happens, the Python scalar gets broadcast to a zero dimensional array and then it goes
link |
through the whole same machinery as if it were a 10,000 dimensional array and then it
link |
kind of unpacks the element and then does the addition.
link |
That's not to mention the function it calls in the case of square root is just the C lib
link |
square root, right?
link |
In some cases, like Python's power, there's some optimizations they're doing that can
link |
be faster than just calling the C lib square root.
link |
In the interpreter or in the C code?
link |
No, in the C code.
link |
In the Python runtime.
link |
So they really optimize it and they have the freedom to do that because they don't have
link |
to worry about it.
link |
It's just a scalar.
link |
It's just a scalar, right?
link |
They don't have to worry about the fact that, oh, this could be an object with many pieces.
link |
The ufunc machine is also generic in the sense that it does typecasting and broadcasting.
link |
Broadcasting is the idea of: I have a zero-dimensional array, I have a scalar
link |
with a four dimensional array and I add them.
link |
Oh, I have to kind of coerce the shape of this guy to make it work against the whole
link |
four dimensional array.
link |
So it's the idea of I can do a one dimensional array against a two dimensional array and
link |
have it make sense.
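Broadcasting a one-dimensional array against a two-dimensional one, and a scalar against both, looks like this in a minimal sketch. The shapes and values are invented for illustration.

```python
import numpy as np

grid = np.zeros((2, 3))              # two-dimensional, shape (2, 3)
row = np.array([10.0, 20.0, 30.0])   # one-dimensional, shape (3,)

shifted = grid + row   # row is stretched across both rows of grid
scaled = shifted * 2   # the scalar 2 is broadcast to every element
```

No data is explicitly copied by the caller to make the shapes line up; NumPy applies the shape coercion described above.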
link |
Well, that's what NumPy does is it challenges you to reformulate, rethink your problem as
link |
a multi dimensional array problem versus like move away from scalars completely.
link |
In fact, that's where some of the edge cases boundaries are is that, well, they're still
link |
there and this is where array scalars are particular.
link |
So array scalars are particularly bad in the sense that they were written so that you
link |
could optimize the math on them, but that hasn't happened, right?
link |
And so their default is to coerce the array scalar to a zero dimensional array
link |
and then use the NumPy machinery.
link |
That's what you could specialize, but it doesn't happen all the time.
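To see what an array scalar is, as opposed to a plain Python scalar, consider this small sketch. The array contents are arbitrary.

```python
import numpy as np

arr = np.array([1.5, 2.5])
elem = arr[0]            # indexing yields a NumPy array scalar
plain = elem.item()      # .item() converts it to a built-in Python float

kinds = (type(elem).__name__, type(plain).__name__)
```

Pulling elements out of an array inside a Python loop produces array scalars like `elem`, which is the situation the slowdown discussion refers to.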
link |
So in fact, when we first wrote Numba, we'd do comparisons and say, look, it's a thousand
link |
We're lying a little bit in the sense that, well, first there's the 40x slowdown of using the
link |
array scalars inside of a loop, because if you just used Python scalars, you'd already
link |
be 10 times faster.
link |
But then we would get a hundred times faster over that using just compilation.
link |
And what we do is compile the loop from out of the interpreter to machine code.
link |
And then that's always been the power of Python is this extensibility so you can, could people
link |
say, oh, Python's so slow?
link |
If you do all your logic in the runtime of the Python interpreter, yeah.
link |
But the power is that you don't have to.
link |
You write all the logic which you do in the high level is just high level logic.
link |
And the actual calls you're making could be on gigabyte arrays of data.
link |
And that's all done at compiled speeds.
link |
And the fact that integration is one can happen, but two is separable.
link |
That's one of the, the language like Julia says, we're going to be all in one.
link |
You can do all of it together.
link |
And then there's, the jury's out.
link |
I tend to think that you're going to, there's separate concerns there.
link |
You want to precompile.
link |
In fact, generally you will want to precompile your, some of your loops.
link |
Like SciPy has a compilation step to install SciPy.
link |
It takes about two hours.
link |
If you have many machines, maybe you can get it down to one hour, but to compile all those
link |
libraries takes about, takes a while.
link |
You don't want to do that at runtime.
link |
You don't want to do that all the time.
link |
You want to have this precompiled binary available that you're then just linking into.
link |
So there's real questions about the whole, you know, source code code is running binary
link |
code is more than source code.
link |
It's create an object code.
link |
It's the, how does that interpret it inside of a virtual memory space?
link |
There's a lot of details there that actually I didn't understand for a long time until
link |
I, you know, read books on the topic and it, and it led to the more you know, the better
link |
off you are and you can do more details, but sometimes it helps with abstractions too.
link |
Well the problem, as we mentioned earlier with abstractions is you kind of sometimes
link |
assume that whoever implemented this thing had your case in mind and found the optimal
link |
Or like you assume certain things, I mean, there's a lot of, one of the really powerful
link |
things to me early on, I mean, it sounds silly to say, but with Python probably one
link |
of the reasons I fell in love with it is dictionaries.
link |
So obviously probably most languages have some mapping concept, some mapping concept,
link |
but it felt like it was a first class citizen and it was just my brain was able to think
link |
But then there's the thing that I guess I still use to this day is ordered dictionaries
link |
because that seems like a more natural way to construct dictionaries and from a computer
link |
science perspective, the running time cost is not that significant, but there's a lot
link |
of things to understand about dictionaries that the abstraction kind of doesn't necessarily
link |
incentivize you to understand.
link |
You really understand the notion of a hash map and how the dictionary is implemented,
link |
Dictionaries are a good example of an abstraction that's powerful and I agree with you, I love
link |
It took me a while to understand that once you do you realize, oh, they're everywhere
link |
and Python uses them everywhere too.
link |
It's actually constructed with dictionaries as one of the foundational things, and it does everything
link |
with dictionaries.
link |
So it is, it's powerful.
link |
Ordered dictionaries came later, but it is very, very powerful.
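A minimal sketch of the points made about dictionaries here, using only the standard library. The `Point` class and the sample data are made up for illustration; the sketch shows that instance attributes live in a dictionary, that plain dictionaries keep insertion order (a language guarantee since Python 3.7), and one thing `OrderedDict` still adds.

```python
from collections import OrderedDict


class Point:
    """A tiny illustrative class, used only to show where attributes live."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# A dict is a hash map: each key is hashed to locate its slot.
ages = {"ada": 36, "grace": 85}
ages["alan"] = 41

# Instance attributes are themselves stored in a dictionary.
p = Point(1, 2)
attrs = vars(p)  # the same mapping as p.__dict__

# Plain dicts preserve insertion order.
order = list(ages)

# OrderedDict adds order-aware operations such as move_to_end.
od = OrderedDict(ages)
od.move_to_end("ada")
reordered = list(od)

print(attrs, order, reordered)
```

For most everyday uses a plain `dict` is now enough; `OrderedDict` is mainly useful when the order itself needs to be manipulated.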
link |
It took me a little while coming from just the array programming entirely to understand
link |
these other objects like dictionaries and lists and tuples and binary trees.
link |
Like I said, I wasn't a computer scientist, but I studied arrays first and so I was very
link |
array centric and you realize, oh, these others do have purposes and value actually.
link |
There's a friendliness about like one way to think about arrays is arrays are just like
link |
full of numbers, but to make them accessible to humans and make them less error prone to
link |
human users, sometimes you want to attach names, human interpretable names that are
link |
sticky to those arrays.
link |
So that's how you start to think about dictionaries is you start to convert numbers into something
link |
that's human interpretable and that's actually the tension I've had with NumPy because I've
link |
built so much tooling around human interpretability and also protecting me from a year later
link |
not making the mistakes by being, I wanted to force myself to use English versus numbers.
link |
So there's a, there's a project called label arrays.
link |
Like very early, it was recognized that, oh, we need, we're indexing NumPy with just numbers,
link |
all the columns and particularly the dimensions.
link |
I mean, if you have an image, you don't necessarily need to label each column or row, but if you
link |
have a lot of images or you have another dimension, you'd at least like to label the dimension
link |
as this is X, this is Y, this is Z, or give it some human meaning or some domain
link |
That was one of the impetuses for pandas actually was just, oh, we do need to label these things
link |
and labeled array was an attempt to add a lighter-weight version of that.
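To make the idea of labeling dimensions concrete, here is a small sketch. It is not the API of the labeled array project mentioned above or of pandas; the `dims` tuple and the `axis_of` helper are hypothetical names that only show how a human-readable name can stand in for an axis number.

```python
import numpy as np

# A stack of 4 images, each 3 rows by 5 columns (the values are arbitrary).
images = np.arange(4 * 3 * 5).reshape(4, 3, 5)

# Hypothetical labels: one human-readable name per axis.
dims = ("image", "y", "x")


def axis_of(name):
    """Translate a dimension name into the integer axis NumPy expects."""
    return dims.index(name)


# Average over the "image" dimension without remembering that it is axis 0.
mean_image = images.mean(axis=axis_of("image"))

# Sum along "x" for every image and every row.
row_sums = images.sum(axis=axis_of("x"))

print(mean_image.shape, row_sums.shape)
```

The code a year later reads `axis_of("image")` instead of a bare `0`, which is the kind of protection against mistakes described above.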
link |
And there's been, like that's an example of something I think NumPy could add, could
link |
be added to NumPy, but one of the challenges again, how do you fund this?
link |
Like, like I said, one of the tragedies I think is that, so I never had the chance to,
link |
I was never paid to work on NumPy, right?
link |
So I've always just done it in my spare time, always taken from one thing, taken from another
link |
And at the time, I mean, today, it would be the wrong day of today, like paying me to
link |
work on NumPy now would not be a good use of effort, but we are finally at Quansight
link |
I'm actually paying people to work on NumPy and SciPy, which is I'm thrilled with, I'm
link |
I've wanted to do that, that's why I always wanted to do it from day one, it just took
link |
me a while to figure out a mechanism to do that.
link |
Even like in the university setting, respecting that, like pushing students, young minds, the
link |
young graduate students to contribute and then figuring out financial mechanisms that
link |
enable them to contribute and then sort of reward them for their innovative scientific
link |
journey that that would be nice, but then also there's just a better allocation of resources.
link |
Well, you know, it's the 20 year anniversary of 9/11.
link |
And I was just looking, we spent over $6 trillion in the Middle East after 9/11 in the various
link |
efforts there and sort of to put politics and all that aside is just, you think about
link |
the education system, all the other ways we could have possibly allocated that money.
link |
To me, to take it back, the amount of impact you would have by allocating a little bit
link |
of money to the programmers that build the tools that run the world is fascinating.
link |
I mean, I don't know, I think again, there is some aspect to being broke as somewhat
link |
of a feature, not a bug, that you make sure that you manage that.
link |
Right. No, I know.
link |
But I don't think that's a big part.
link |
So it's like, I think you can have enough money and actually be wealthy while maintaining
link |
There's an old adage that nations that trade together don't go to war together.
link |
I've often thought about nations that code together.
link |
Because one of the things I love about open source is it's global, it's multinational.
link |
There aren't national boundaries.
link |
One of the challenges with business and open source is the fact that business is national.
link |
Like businesses are entities that are recognized in legal jurisdictions and have laws that
link |
are respected in those jurisdictions and hiring and yet the open source ecosystem is not there.
link |
Currently, one of the problems we're solving is hiring people all over the world.
link |
Because it's a global effort and I've had the chance to work and I've loved the chance.
link |
I've never been to Iran, but I once had a conference where I was able to talk to people
link |
there and talk to folks in Pakistan and we've been there, but we had a call where there
link |
are people there, like just scientists and normal people.
link |
There's a certain amount of humanizing that gets away from the...
link |
We often get the memes of society that bubble up and get discussed, but the memes are not
link |
even an accurate reflection of the reality of what people are.
link |
If you look at the major power centers that are leading to something like cyber war in
link |
the next few decades, it's the United States, it's Russia and China.
link |
Those three countries in particular have incredible developers.
link |
If they work together, I think that's one way the politicians can do their stupid bickering,
link |
but there's a layer of infrastructure, of humanity.
link |
If they collaborate together, that I think can prevent major conflict, which would,
link |
I think, most likely happen at the cyber level versus the actual hot war level.
link |
I think that's a good prediction.
link |
Nations that code together don't go to war together.
link |
That's one of the philosophical hopes.
link |
You mentioned the project of Numba, which is fascinating.
link |
From the early days, there was a pushback on Python that it's not fast.
link |
If you want to write something that's fast, you use C/C++.
link |
If you want to write something that's usable and friendly, but slow, you use Python.
link |
Yes, that was the argument.
link |
The reality was people would write high level code and use compiled code, but there's still
link |
user stories, cases where you want to write Python, but then have it still be fast.
link |
You still need to write a for loop.
link |
Before Numba, it was always don't write a for loop.
link |
Write it in a vectorized way, put it in an array, and often that can make a memory trade
link |
Quite often you can do it, but then you may use more memory because you have to build
link |
this array of data that you don't necessarily need all the time.
link |
Numba started from a desire to have a vectorize that worked.
link |
vectorize was a tool in NumPy. It was released, you give it a Python function, and it gave
link |
you a universal function, a ufunc, so it would work on arrays.
link |
So you give it a function that just worked on a scalar. The classic case was a simple
link |
function with an if-then statement in it, so the sine x over x function, the sinc function.
link |
If x equals 0, return 1, otherwise do sine x over x.
link |
The challenge is you don't want that loop going in Python, so you want a compiled version
link |
But the vectorize in NumPy would just give you a Python function.
link |
So it would take the array of numbers and at every call do a loop back into Python.
link |
So it was very slow.
link |
It gave you the appearance of a ufunc, but it was very slow.
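The sinc case described here can be sketched with `numpy.vectorize`. The function name `sinc_scalar` and the sample inputs are illustrative. As the NumPy documentation itself notes, `vectorize` is a convenience wrapper and still calls the Python function once per element, which is the slowness being described.

```python
import math

import numpy as np


def sinc_scalar(x):
    """sin(x)/x for a single number, with the x == 0 case handled explicitly."""
    if x == 0:
        return 1.0
    return math.sin(x) / x


# Wrap the scalar function so it accepts arrays. The loop over elements
# still runs through the Python function, so this is not a speedup.
sinc = np.vectorize(sinc_scalar)

xs = np.array([0.0, math.pi / 2, math.pi])
ys = sinc(xs)

print(ys)
```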
link |
So I always wanted a vectorize that would take that Python scalar function and produce
link |
a ufunc working on binary native code.
link |
So in fact, I had somebody work on that with PyPy and see if PyPy could be used to produce
link |
a ufunc like that early on in 2009 or something like that, 2010.
link |
It didn't work that well.
link |
It was kind of pretty bulky.
link |
But in 2012, Peter and I had just started Anaconda.
link |
We had, I had just, I'd learned to raise money.
link |
That's a different topic, but I'd learned to raise money from friends, family, and fools,
link |
That's a good line.
link |
Oh, that's a good line.
link |
So we were trying to do something.
link |
We were trying to change the world.
link |
Peter and I are super ambitious.
link |
We wanted to make array computing and we had ideas for really what's still, it's still
link |
the energy right now.
link |
How do you do data science at scale?
link |
We had a bunch of ideas there, but one of them, I had just talked to people about LLVM
link |
and I was like, there's a way to do this.
link |
I'd heard that my friend Dave Beazley had a compiler course.
link |
So I was looking at compilers like, and I realized, oh, this is what you do.
link |
And so I wrote a version of Numba that just basically mapped Python bytecode to LLVM.
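For readers who have not looked at Python bytecode, the standard library's `dis` module shows the instructions that such a translator would start from. This is only a sketch of the input side; it does not reproduce Numba's translation to LLVM, and `add_scaled` is a made-up example function.

```python
import dis


def add_scaled(a, b):
    return a + 2 * b


# Collect the opcode names CPython generated for the function body.
# The exact names differ between Python versions, so none are hard-coded.
opnames = [ins.opname for ins in dis.get_instructions(add_scaled)]

for name in opnames:
    print(name)
```

Each printed name is one stack-machine instruction; a bytecode-to-LLVM translator walks a list like this and emits the corresponding low-level operations.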
link |
So, and the first version is like, this works and it produces code that's fast.
link |
This is cool for, you know, obviously a reduced subset of Python.
link |
I didn't support all of the Python language.
link |
There had been efforts to speed up Python in the past, but those efforts were, I would
link |
say not from the array computing perspective, not from the perspective of wanting to produce
link |
a vectorize improvement.
link |
They were from a perspective of speeding up the runtime of Python, which is fundamentally
link |
hard because Python allows for some constructs that you can't speed up.
link |
It's generic, you know, when it does this variable.
link |
So I, from the start, did not try to replicate Python's semantics entirely.
link |
I said, I'm going to take a subset of the Python syntax and let people write syntax
link |
in Python, but it's kind of a new language, really.
link |
So it's almost like for loops, like focusing on for loops, scalar arithmetic, you know,
link |
typed, you know, really typed language, a type subset.
link |
So, but we wanted to add inference of types, so you didn't have to spell all the types
link |
out because when you call a function, so Python is typed, it's just dynamically typed.
link |
You don't tell it what the types are, but when it runs, every time an object runs, there's
link |
a type for the variables, you know what it is.
link |
And so the design goals of Numba were to make it possible to write functions that
link |
could be compiled and have them used for NumPy arrays, like the need to support NumPy arrays.
link |
And so how does it work?
link |
Do you add a comment within Python that tells it to do, like, how do you help out a compiler
link |
to know what to do?
link |
There isn't much, actually.
link |
It's kind of magical in a sense.
link |
It just looks at the type of the objects and then does type inference to determine any
link |
of the other variables it needs.
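A toy sketch of the idea of looking at argument types at call time. This is not Numba's implementation: the `specialize` decorator is hypothetical, and it only records one entry per argument-type signature where a real JIT would run type inference and generate machine code.

```python
def specialize(func):
    """Toy stand-in for a JIT: keep one entry per argument-type signature."""
    cache = {}

    def dispatcher(*args):
        # The types of the actual arguments decide which version to use.
        signature = tuple(type(a).__name__ for a in args)
        if signature not in cache:
            # A real JIT would infer the types of the local variables
            # and compile a specialized version here.
            cache[signature] = func
        return cache[signature](*args)

    dispatcher.signatures = cache
    return dispatcher


@specialize
def axpy(a, x, y):
    return a * x + y


first = axpy(2, 3, 4)          # first call with (int, int, int)
second = axpy(2.0, 3.0, 4.0)   # new signature: (float, float, float)
third = axpy(5, 6, 7)          # reuses the (int, int, int) entry
seen = sorted(axpy.signatures)

print(first, second, third, seen)
```

The point of the sketch is that no annotations are written by the user; the types are known when the function is called, and everything else follows from them.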
link |
And then it was also because we had a use case that could work early, like one of the challenges
link |
of any kind of new development is if you have something that to make it work, it was going
link |
to take you a long time, it's really hard to get it off the ground.
link |
If you have a project where there's some incremental story that can start working today and solve
link |
a problem, then you can start getting it out there, getting feedback.
link |
Because Numba today, now Numba is nine years old today, right?
link |
The first two, three versions were not great, right?
link |
But they solved a problem and some people could try it and we could get some feedback
link |
And it was very focused.
link |
Oh, the fragility.
link |
The subset it would actually compile was small.
link |
And so if you wrote Python code and said, so the way it worked is you write a function
link |
and you say @jit.
link |
So decorators are just these little constructs that let you decorate code with an at sign (@) and then
link |
The @jit would take your Python function and actually just compile it and replace the Python
link |
function with another function that interacts with this compiled function.
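A hedged sketch of what using the decorator looks like. It assumes NumPy is installed; Numba is optional here, and if it is missing a do-nothing stand-in decorator is used so the example still runs (without any speedup). The function name `sinc_sum` and the input data are illustrative.

```python
import math

import numpy as np

try:
    from numba import jit  # the real compiler, if Numba is installed
except ImportError:
    def jit(*args, **kwargs):
        """Fallback: a decorator factory that returns the function unchanged."""
        def decorate(func):
            return func
        return decorate


@jit(nopython=True)
def sinc_sum(values):
    """Explicit for loop with an if statement, the style Numba targets."""
    total = 0.0
    for i in range(len(values)):
        x = values[i]
        if x == 0.0:
            total += 1.0
        else:
            total += math.sin(x) / x
    return total


data = np.array([0.0, math.pi / 2, math.pi])
result = sinc_sum(data)

print(result)
```

With Numba present, the name `sinc_sum` no longer refers to the original Python function but to an object that compiles a version for the argument types on the first call and then dispatches to it.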
link |
So you could do that and we went from Python byte code, then we went to AST.
link |
Writing compilers, actually, I learned a lot about why computer science is taught the
link |
way it is because compilers can be hard to write.
link |
They use tree structures.
link |
They use all the concepts of computer science that are needed and it's actually hard to,
link |
it's easy to write a compiler and then have it be spaghetti code.
link |
The passes become challenging and we ended up with three versions of Numba, right?
link |
Numba got written three times.
link |
What programming language is Numba written in?
link |
That's fascinating.
link |
Yeah, so Python, but then the whole goal of Numba is to translate Python byte code to
link |
And so LLVM actually does the code generation.
link |
In fact, a lot of times they'd say, yeah, it's super easy to write a compiler if you're
link |
not writing the parser, nor the code generator, right?
link |
For people who don't know, LLVM is the compiler itself, so you're compiling it.
link |
It's really badly named low level virtual machine, which that part of it is not used.
link |
It's really low level.
link |
First, he doesn't mean that.
link |
But the name makes you infer that the virtual machine is what it's all about.
link |
It's actually the IR and the library that does the code generation, that's the real beauty
link |
What I love about LLVM was the fact that it was a platform you could collaborate
link |
Instead of the internals of GCC or the internals of the Intel compiler, how do I extend that?
link |
It was a place you could collaborate.
link |
I mean, people had started before.
link |
It's a slow compiler.
link |
It's not a fast compiler.
link |
For some kind of JITs, JITs are common in the language because, one, every browser has
link |
It does real time compilation of the JavaScript to machine code.
link |
For people who don't know, JIT is just in time compilation.
link |
Yeah, just in time compilation.
link |
They're actually really sophisticated.
link |
In fact, I got jealous of how much effort was put into the JavaScript JITs.
link |
Well, it's kind of incredible what they've done with JavaScript JITs.
link |
I completely agree.
link |
I'm very impressed.
link |
Numba was an effort to make that happen with Python.
link |
We used some of the money we raised from Anaconda to do it, and then we also applied for this
link |
DARPA grant and used some of that money to continue the development, and then we used
link |
proceeds from service projects we would do.
link |
We'd get consulting projects, and we would then use some of the profits to invest in Numba.
link |
We ended up with a team of two or three people working on Numba.
link |
It was a fits and starts, and ultimately, the fact that we had a commercial version
link |
of it, also we were writing.
link |
Part of the way I was trying to fund Numba was to say, let's do the free Numba, and then we'll
link |
have a commercial version of Numba called NumbaPro.
link |
Then what NumbaPro did is it targeted GPUs.
link |
We had the very first CUDA JIT, the very first @jit compiler, so that in 2013 you could
link |
run not just a ufunc on CPU, but a ufunc on GPUs, and it would automatically parallelize
link |
it and get a 1,000x speedup on it.
link |
That's an interesting funding mechanism because large companies or larger companies care about
link |
speed in just this way, so it's exactly a really good way to fund it.
link |
Yeah, there's been a couple of things you know people will pay for.
link |
One, they'll pay for really good user interfaces, and so I'm always looking for what are the
link |
other things people will pay for that you could actually adapt to the open source infrastructure.
link |
One is definitely user interfaces.
link |
The second is speed, like a better runtime, faster runtime.
link |
And then when you say people, you mean like a small number of people pay a lot of money,
link |
but then there's also this other mechanism that a ton of people pay a little bit.
link |
First, we mentioned Anaconda, we mentioned friends, family, and fools, so Anaconda is
link |
yet another, so there's a company, but there's also a project that is exceptionally impactful
link |
in terms of for many reasons, but one of which is bringing a lot more people into the community
link |
of folks who use Python.
link |
So what is Anaconda?
link |
What are its goals?
link |
Maybe what is Conda versus Anaconda?
link |
Yeah, I'll tell you a little bit of the history of that, because Anaconda, we wanted to do,
link |
we wanted to scale Python, because we, you know, Peter and I had the goal of when we
link |
started Anaconda, we actually started as Continuum Analytics was the name of the company that
link |
It got renamed to Anaconda in 2015, but we said we want to scale analytics.
link |
NumPy is great, pandas is emerging, but these need to run at scale with lots of machines.
link |
The other thing we wanted to do was make user interfaces that were web, we wanted to make
link |
sure the web did not pass by the Python community, that we had ways to translate your data
link |
science to the web.
link |
So those are the two kind of technical areas and we thought, oh, we'll build products in
link |
And that was the idea.
link |
Very quickly in, but of course, the thing I knew how to do was to do consulting to make
link |
money and to make sure my family and friends and the fools who had invested didn't lose their
link |
So it's a little different than if you take money from a venture fund, you take money
link |
from a venture fund.
link |
The venture fund, they want you to go big or go home.
link |
They're kind of like expecting 9 out of 10 to fail or 99 out of 100 to fail.
link |
It's different, I had a barbell strategy, I was like, I can't fail.
link |
I mean, I may not do super well, but I cannot lose their money.
link |
So I'm going to do something I know can return a profit, but I want to have exposure to an
link |
So that's what happened in Anaconda.
link |
There were lots of things we did not do well in terms of that structure, and I've
link |
learned since how to do it better.
link |
But we did a really good job of kind of attracting the interest around the area to get good people
link |
working and then funnel some money to some interesting projects.
link |
Super excited about what came out of our energy there.
link |
So what are some of the interesting projects?
link |
So Dask, Numba, Bokeh, Conda, there was Datashader, Panel, HoloViz.
link |
These are all tools that are extremely relevant in terms of helping you build applications,
link |
build tools, build faster code.
link |
Bokeh is plotting.
link |
There's a couple I'm forgetting.
link |
JupyterLab came out of this too.
link |
That's fascinating.
link |
So Bokeh does plotting.
link |
Bokeh does plotting.
link |
So Bokeh was one of the foundational things to say, I want to do plotting in Python, but have
link |
the things show up on the web.
link |
And plotting to me still, with all due respect to Matplotlib and Bokeh, it feels like still
link |
an unsolved problem.
link |
Not a solved problem.
link |
It's a big problem.
link |
Because you're, I mean, I don't know, it's visualization broadly, right?
link |
I think we've got a pretty good API story around certain use cases of plotting.
link |
But there's a difference between static plots versus interactive plots versus, I'm an end
link |
user, I just want to write a simple, for, you know, pandas started the idea of here's
link |
a data frame and a .plot, I'm just going to attach plot as a method to my object, which
link |
was a little bit controversial, right?
link |
But works pretty well actually, because there's a lot less you have to pass in, right?
link |
You can just say, here's my object, you know what you are, you tell the visualization what
link |
So that, and there's things like that that have not been, you know, super well developed
link |
entirely, but Bokeh was focused on interactive plotting.
link |
So you could, it's a short path between interactive plotting and application, dashboard application.
link |
And there's some incredible work that got done there, right?
link |
And it was a hard project because then you're basically doing JavaScript and Python.
link |
So we wanted to tackle some of these hard problems and try to just go after them.
link |
We got some DARPA funding to help, and it was super helpful.
link |
It's a funny story there, we actually did two DARPA proposals, but one we were five
link |
minutes late for, and DARPA has a very strict cutoff window.
link |
And so I, we had two proposals, one for the Bokeh and one for actually Numba and the other
link |
Which one were you late for?
link |
The foundational numerical work.
link |
So Bokeh got funded.
link |
Fortunately, Chris let us use some of the money to fund still some of the other foundational
link |
work, but it wasn't as, yeah, his hands were tied, he couldn't do anything about it.
link |
That was a whole interesting story.
link |
So one of the incredible projects that you worked on is Conda.
link |
So how did that come about?
link |
Yeah, Conda... Early on, like I said, it was SciPy.
link |
SciPy was a distribution masquerading as a library.
link |
And you said, you heard me talking about compiler issues and trying to get the stuff shipped
link |
and the fact that people can use your libraries if they have it.
link |
So for a long time, we'd understood the packaging problem in Python.
link |
And one of the first things we did at Continuum Analytics, before we came out with Conda, was organize
link |
the PyData ecosystem in conjunction with NumFOCUS.
link |
We actually started NumFOCUS with some other folks in the community the same year we started
link |
I said, we're going to build a corporation, but we also got to reify the community aspect
link |
and build a nonprofit.
link |
So we did both of those.
link |
Can we pause real quick and can you say what is PyPI, the Python Package Index, like this
link |
whole story of packaging in Python?
link |
Yeah, that's what I'm going to get to actually.
link |
This is exactly the journey I'm on.
link |
This is sort of explain packaging in Python.
link |
I think it's best expressed through the conversation I had with Guido at a conference where I said,
link |
so packaging is kind of a problem.
link |
And Guido said, I don't ever care about packaging.
link |
I don't install new libraries.
link |
I'm like, I guess if you're the language creator and if you need something, you just
link |
put it in the distribution.
link |
Maybe you don't worry about packaging.
link |
But Guido has never really cared about packaging, right?
link |
And never really cared about the problem of distribution.
link |
Somebody else's problem.
link |
And that's a fair position to take, I think, as a language creator.
link |
In fact, there's a philosophical question about should you have different development
link |
packaging managers?
link |
Should you have a package manager per language?
link |
Is that really the right approach?
link |
I think there are some answers of it is appropriate to have development tools.
link |
And there's an aspect of development tool that is related to packaging.
link |
And every language should have some story there to help their developers create.
link |
So you should have language specific development tools that relate to package managers.
link |
But then there's a very specific user story around package management that those language
link |
specific package managers have to interact with and currently aren't doing a good job
link |
That was one of the challenges of not seeing that difference and still exists in the difference
link |
Conda always was about the user: I'm going to use Python to do data science.
link |
I'm going to use Python to do something.
link |
How do I get this installed?
link |
It was always focused on that.
link |
So it didn't have a develop.
link |
Classic example is pip has a pip develop.
link |
It's like, I want to install this into my current development environment today.
link |
Now, Conda doesn't have that concept because it's not part of the story.
link |
For people who don't know, pip is a Python specific package manager.
link |
That's exceptionally popular.
link |
That's probably like the default thing you learn.
link |
It's the default user.
link |
So the story there emerged because what happened is in 2012, we had this meeting at the
link |
Googleplex and Guido was there to come talk about what we're going to do, how we're going to
link |
make things work better, and Wes McKinney, me, Peter, Peter has a great photo of me talking
link |
to Guido and he pretends we're talking about this story.
link |
But we did at that meeting talk about it and asked Guido, we need to fix packaging in Python.
link |
I'm like, people can't get this stuff.
link |
And he said, go fix it yourself.
link |
I don't think we're going to do it.
link |
The origin story right there.
link |
He said to do this ourselves.
link |
At the same time, people did start to work on the packaging story in Python.
link |
It just took a little longer.
link |
So in 2012, kind of motivated by our training courses we were teaching, like very similar
link |
to what you just mentioned about your mother, like it was motivated by the same purpose.
link |
Like, how do we get this into people's hands and it's this big, long process that takes
link |
It was actually hurting NumPy development because I would hear people were saying, don't make
link |
that change to NumPy because I just spent a week getting my Python environment.
link |
And if you change NumPy, you have to reinstall everything and reinstalling such a pain, don't
link |
So now we're not making changes to a library because of the installation problem that will
link |
cause for end users.
link |
There's a problem with pack.
link |
There's a problem with installation.
link |
We got to fix this.
link |
So we said, we're going to make a distribution of Python, and we'd previously
link |
done that at Enthought.
link |
I wanted to make one that we would give away for free, that one could just get, like those critical
link |
that we just get it, you know, it wasn't tied to a product, it was just you could get
link |
And then we had constantly thought about, well, do we just leverage RPM?
link |
But the challenge had always been, we want a package manager that works on Windows, Mac
link |
OS X and Linux the same, right?
link |
And it wasn't there.
link |
Like you don't have anything like that.
link |
And for people who don't know, RPM is an operating system specific package manager.
link |
It's operating system specific.
link |
So the design question is, do you create an umbrella package manager
link |
that's cross operating system? Yes, that was the decision. And a neighboring design question:
link |
Do you also create a package manager that spans multiple programming languages?
link |
That was the world we faced.
link |
And we decided to go multiple operating systems and programming language independent,
link |
because even with Python, and particularly what was important, SciPy has a bunch of Fortran
link |
And scikit-learn has links to a bunch of C++.
link |
There's a lot of compiled code.
link |
And the Python package manager, especially early on, didn't even support that.
link |
So in 2000, so we released Anaconda, which was just a distribution of libraries, but
link |
we started to work on conda in 2012.
link |
First version of conda came out in early 2013, summer of 2013.
link |
And it was a package manager.
link |
So you could say conda install scikit-learn.
link |
In fact, scikit-learn was a fantastic project that emerged.
link |
It was kind of the classic example of the scikits.
link |
I talked earlier about SciPy being too big to be a single library.
link |
Well, what the community had done is said, let's make scikits, and there's scikit-image
link |
There's a lot of scikits.
link |
And it was a fantastic move, you know, that the community did.
link |
I was like, okay, that's good idea.
link |
I didn't like the name.
link |
I didn't like the fact you type scikit image.
link |
I was like, that's going to be simpler.
link |
We got to make that smaller.
link |
I like typing all this stuff from imports.
link |
So I was kind of a pressure that way, but I love the energy and love the fact that they
link |
went out and they did it, and those people, Jarrod Millman, and then of course, Gaël, and there's
link |
people I'm not even naming. Scikit-learn really emerged as this fantastic project.
link |
And the documentation around that is also incredible.
link |
It was incredible.
link |
I don't know who did that.
link |
But they did a great job.
link |
A lot of people in Inria, a lot of people, a lot of European contributors, Andreas, there's
link |
some Andreas in the U.S. There's a lot of just people I just adore.
link |
I think are amazing people.
link |
Awesome use of SciPy.
link |
I love the fact that they were using SciPy effectively because of my love, which is machine
link |
learning, but people couldn't install it because there's so many, so many dependencies, right?
link |
So our use case for conda was conda install scikit-learn, right?
link |
And it was the best way to install scikit-learn from 2013 to really 2018. In '17, '18, pip finally
link |
I still think you should conda install scikit-learn over pip install scikit-learn, but
link |
you can pip install scikit-learn.
link |
The issue is the package they created was wheels, and pip does not handle the multi vendor
link |
They don't handle the fact you have C++ libraries you're depending on.
link |
They just stop at the Python boundary.
link |
And so what you have to do in the wheel world is you have to vendor.
link |
You have to take all of the binary and vendor it.
link |
Now if a change happens in an underlying dependency, you have to redo the whole wheel.
link |
So TensorFlow is a good example: you should not pip install TensorFlow.
link |
It's a terrible idea, and people do it because of the popularity of pip. Many people think,
link |
of course, that's how I install everything in Python.
link |
This is one of the big challenges.
link |
You take a GitHub repository or just a basic blog post.
link |
The number of times pip is mentioned over conda is like 100x to one.
link |
And that was increasing.
link |
It wasn't true early because pip didn't exist.
link |
Like the long tail of the internet documentation user generated, so that you think, how do
link |
I install, Google, how do I install TensorFlow, you're just not going to see conda in that
link |
And today you would have in 2016, 2017.
link |
And it's sad because conda solves a lot of usability issues.
link |
Especially super challenging thing.
link |
One of the big pain points for me was just on the computer vision side, OpenCV installation
link |
I don't know if conda solved that one.
link |
Conda has an OpenCV package.
link |
I certainly know pip has not solved...
link |
I mean, there's complexities there because...
link |
I actually don't know.
link |
I should probably know a good answer for this, but if you compile OpenCV with certain
link |
dependencies, you'll be able to do certain things.
link |
So there's this kind of flexibility of what options you compile with.
link |
And I don't think it's trivial to do that with conda or with...
link |
So conda has a notion of variants of a package.
link |
You can actually have different compilation versions of a package.
link |
So not just the versions different, but oh, this is compiled with these optimizations
link |
So conda does have an answer.
link |
Has those flavors.
link |
Has flavors, basically.
link |
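The idea of "flavors" can be sketched as several builds sharing one name and version but differing in how they were compiled. This is only a toy model: the `Build` class, the `pick_build` helper, the package name `imagelib`, and the option names are hypothetical and are not conda's actual data model or solver.

```python
# Toy model of package variants: same name and version, different builds.
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Build:
    name: str
    version: str
    build: str               # label that tells the variants apart
    options: FrozenSet[str]  # compile-time options this build was made with


def pick_build(builds: Iterable[Build], name: str, version: str,
               wanted: FrozenSet[str]) -> Optional[Build]:
    """Return the first build of name/version whose options cover `wanted`."""
    for candidate in builds:
        if (candidate.name == name and candidate.version == version
                and wanted <= candidate.options):
            return candidate
    return None


# Made-up catalog: one version of one package, compiled two ways.
catalog = [
    Build("imagelib", "1.0", "cpu_0", frozenset({"jpeg"})),
    Build("imagelib", "1.0", "gpu_0", frozenset({"jpeg", "cuda"})),
]

chosen = pick_build(catalog, "imagelib", "1.0", frozenset({"cuda"}))
print(chosen.build if chosen else "no matching build")
```

The point of the sketch is that the version number alone does not identify what you get; the compile-time options are part of the choice.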
Well, pip, as far as I know, does not have flavors.
You know, pip generally hasn't thought deeply about the binary dependency problem, right?
That's why, fundamentally, it doesn't work for the SciPy ecosystem.
You can sort of paper over it and duct tape it and it kind of works until it doesn't
and it falls apart entirely.
So it's been a mixed bag.
And I've been having lots of conversations with people over the years because, again,
it's an area where if you understand some things, but not all the things, but they've
done a great job of community appeal.
This is an area where, I think, Anaconda, as a company, needs to do some things in order
to make conda more community centric, right?
I talk about this all the time.
There's a balance between...
Every project starts with what I call company backed open source.
Even if the company is yourself, this is one person, just doing business as.
But ultimately for products to succeed virally and become massive influencers, they have
They have to get community people on board.
They have to get other people on board.
So it has to become community driven and a big part of that is engagement with those
people, empowering people, governance around it.
And what happened with conda in the early days, pip emerged, and we did do some good
The conda-forge community is sort of the community recipe creation community.
But conda itself, I still believe, and Peter is CEO of Anaconda, he's my cofounder.
I ran Anaconda until 2017, 2018.
Is Peter still at Anaconda?
Peter's still at Anaconda, right.
We're still great friends.
We talk all the time.
I love him to death.
There's a long story there about why and how, and we can cover in some other podcasts,
It's sort of a more, maybe a more business focused one.
But this is one area where I think conda should be more community driven.
He should be pushing more to get more community contributors to conda.
And Anaconda shouldn't be fighting this battle.
It's actually a...
It's really a developer.
You said help the developers, and then they'll actually move us in the right direction.
Well, the problem I have is that many of the cool kids I know don't use conda.
And that, to me, is confusing.
It's really a matter of, conda has some challenges, first of all.
Conda still needs to be improved.
There's lots of improvements to be made.
And it's that aspect of, wait, who's doing this?
And the fact that then the PyPA really stepped up.
They were not solving the problem at all.
And now they kind of got to where they're solving it for the most part.
And then, effectively, you could get conda solved a problem that was there.
And it still does.
There's still great things it can do.
And we still use it all the time at Quansight and with other clients.
But you can kind of do similar things with pip and Docker, right?
So especially with the web development community, part of it, again, is there's a lot of different
kinds of developers in the Python ecosystem.
And there's still a lack of some clear understanding.
I go to the Python conference all the time and there's only a few people in the PyPA
And then others who are just massively trumpeting the power of pip but just do not understand
So one of the obvious things to me from a mom, from a non-programmer perspective is the
cross-operating-system usability that's much more natural.
So they use Windows and just it seems much easier to recommend conda there.
But then you should also recommend it across the board.
So I'll definitely...
But what I recommend now is a hybrid.
I mean, I have no problem with pip.
Is it possible to use...
Like build the environment with pip with conda, build an environment with conda.
And then pip install on top of that.
Be careful about pip installing OpenCV or TensorFlow or...
Because if somebody's allowed that, it's going to be most surely done in a way that can't
be updated that easily.
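The hybrid approach described above can be sketched as a small planning helper: large binary packages go to conda first, and everything else is installed with pip on top. This is an illustrative sketch only. The `CONDA_FIRST` set, the `plan_install` helper, and the package name `small-sun-position-lib` are assumptions made up for the example; the helper only builds the command lists and does not run them.

```python
# Sketch of "conda for the big binary packages, pip on top for the rest".
import sys
from typing import Dict, Iterable, List

# Assumption for illustration: which packages count as heavy binary
# infrastructure is a judgment call, not something conda or pip defines.
CONDA_FIRST = {"tensorflow", "opencv", "numpy", "scipy"}


def plan_install(packages: Iterable[str]) -> Dict[str, List[str]]:
    """Split packages into a conda command and a pip command (not executed)."""
    requested = list(packages)
    conda_pkgs = [p for p in requested if p in CONDA_FIRST]
    pip_pkgs = [p for p in requested if p not in CONDA_FIRST]
    plan: Dict[str, List[str]] = {}
    if conda_pkgs:
        plan["conda"] = ["conda", "install", "--yes", *conda_pkgs]
    if pip_pkgs:
        # Use the current interpreter so pip installs into the active environment.
        plan["pip"] = [sys.executable, "-m", "pip", "install", *pip_pkgs]
    return plan


plan = plan_install(["tensorflow", "small-sun-position-lib"])
for tool, command in plan.items():
    print(tool, "->", " ".join(command))
```

Running the conda step before the pip step matters in this scheme, since pip then installs into an environment whose binary dependencies conda already manages.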
So install like the big packages, the infrastructure with conda and then the weirdos that like
the weird like implementation for some head of there's a cool library I used that based
on your location and time of day and date tells you the exact position of the sun relative
And it's just like a simple library.
But it's very precise.
And I was like, all right.
Well, the thing they did really well is Python developers who want to get their stuff published,
you have to have a pip recipe.
I mean, even if it's...
The challenge is...
And there's a key thing that needs to be added to pip.
Just simply add to pip the ability to defer to a system package manager because it's recognized
you're not going to solve all the dependency problem.
So let pip give up and allow the system package manager to work.
That way, if Anaconda is installed and it has pip,
it would default to conda to install stuff.
But Red Hat RPM would default to RPM to install more things.
Like that's a key feature, not difficult, but it takes some work.
It's a feature that needs to be added.
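The feature proposed above does not exist in pip as described in this conversation, so the following is purely a hypothetical sketch of the decision it implies. The `choose_installer` function and its parameters are invented for illustration; nothing here is a real pip API.

```python
# Hypothetical sketch: defer to the system package manager when it can
# provide a requirement, and fall back to pip otherwise.
from typing import FrozenSet, Optional, Tuple


def choose_installer(requirement: str,
                     system_manager: Optional[str],
                     system_handles: FrozenSet[str]) -> Tuple[str, str]:
    """Return (installer, requirement) for a single requirement."""
    if system_manager is not None and requirement in system_handles:
        # The surrounding system knows this package, so let it do the install.
        return (system_manager, requirement)
    # No system manager, or it does not provide this package: use pip.
    return ("pip", requirement)


# Inside an Anaconda install, the system manager would be conda.
print(choose_installer("opencv", "conda", frozenset({"opencv", "numpy"})))
# On a Red Hat system it would be rpm.
print(choose_installer("numpy", "rpm", frozenset({"numpy"})))
# With no system manager configured, pip handles everything itself.
print(choose_installer("opencv", None, frozenset()))
```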
That's an example of something like I've known we need to root and do it.
I mean, it's where I wish I had more money.
I wish I was more successful in the business side, trying to get there.
But I wish my family, friends and full community that I know...
...was larger and had more money because I know tons of things to do effectively with
But I have not yet been successful a channel.
I'm happy with what we've done.
We've created again at Quansight what we created to get Anaconda started.
We created community to get Anaconda started, done it again with Quansight.
Super excited by that.
It took three years to do it.
What is its mission?
We've talked a few times about different, fascinating aspects of it, but it's like
Big picture of Quansight.
Quansight's mission is to connect data to an open economy.
So it's basically consulting for the PyData ecosystem.
It's a consulting company and what I've said when I started it was we're trying to create
products, people and technology.
So it's divided into two groups and a third one as well.
The two groups are a consulting services company that just helps people do data science and
data engineering and data management better and more efficiently.
We'll help you build an infrastructure if you're using Jupyter.
We do staff augmentation, need more programmers, help you use Dask more effectively, help you
use GPUs more effectively.
Basically, a lot of people need help.
So we do training as well to help people, you know, both immediate help and then get
learned from somebody.
We've added a bunch of stuff too.
We kind of separated some of these other things into another company called OpenTeams
that we currently started.
One of the things I loved about what we did at Anaconda was creating a community innovation
So I wanted to replicate that.
This time, we did a lot of innovation at Anaconda.
I wanted to do innovation, but also contribute to the projects that existed.
Like create a place where maintainers, so that SciPy and NumPy and all these projects
we already started can pay people to work on them and keep them going.
Quansight Labs is a separate organization.
It's a nonprofit mission.
The profits of Quansight help fund it.
And in fact, every project that we have at Quansight, a portion of the money goes directly
to Quansight Labs to help keep it funded.
So we've got several mechanisms by which we keep Quansight Labs funded.
And currently, I'm really excited about labs because it's been a mission for a long time.
What kind of projects are within labs?
So labs is working to make the software better, like make NumPy better, make SciPy better.
It only works on open source.
So if somebody wants to, so companies do, we have a thing called a community work order,
If a company says, I want to make Spyder better, okay, cool.
You can pay for a month of a developer of Spyder or developer of NumPy or developer of SciPy.
You can't tell them what you want them to do.
You can give them your priorities and things you wish existed.
And they'll work on those priorities with the community to get what the community wants
and what emerges with the community wants.
Is there some aspect on the consulting side that is helping as we were talking about morphology
Are there specific applications that are particularly like driving, sort of inspiring the need for
GPUs are absolutely one of them.
And new hardware beyond GPUs.
I'm hoping we'll have a chance to work on that perhaps.
Things like that are definitely driving it.
The other thing that's driving it is scalability, like speed and scale.
How do I write NumPy code or NumPy-like code if I want it to run across a cluster?
Oh, that's Dask or maybe it's Ray.
I mean, there's sort of ways to do that now or there's Modin and there's, so pandas code,
NumPy code, SciPy code, scikit-learn code that I want to scale.
So that's one big area.
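The general idea behind scaling array code, splitting the data into blocks, computing on each block independently, and combining the partial results, can be sketched with the standard library alone. This is not the Dask, Ray, or Modin API; `blocked_mean` and its helpers are hypothetical names, and a thread pool on one machine stands in for a cluster.

```python
# Blocked computation sketch: partial results per chunk, combined at the end.
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence, Tuple


def chunks(data: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `data` with at most `size` items each."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def partial_stats(block: Sequence[float]) -> Tuple[float, int]:
    """Per-block work: the sum and the count are enough to rebuild the mean."""
    return sum(block), len(block)


def blocked_mean(data: Sequence[float], chunk_size: int = 1000,
                 workers: int = 4) -> float:
    if len(data) == 0:
        raise ValueError("cannot take the mean of empty data")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts: List[Tuple[float, int]] = list(
            pool.map(partial_stats, chunks(data, chunk_size)))
    total = sum(s for s, _ in parts)
    count = sum(n for _, n in parts)
    return total / count


print(blocked_mean(list(range(10)), chunk_size=3))
```

Libraries in this space apply the same split-and-combine pattern while keeping an interface close to NumPy or pandas, which is what lets existing code scale with few changes.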
Have you gotten a chance to chat with Andrej and Elon about, because like...
No, I would love to, by the way.
I would very much love to.
I just saw their Tesla AI Day video.
So this is one of the, you know, I love great engineering, software engineering teams and
engineering teams in general and they're doing a lot of incredible stuff with Python.
So many aspects of the machine learning pipeline.
That's operating in the real world and so much of that is Python.
Like you said, the guy running, you know, Andrej Karpathy, running Autopilot is tweeting
about optimization of NumPy versus...
I'd love to talk to him.
In fact, we have at Quansight, we've been fortunate enough to work with Facebook on
So we have about 13 developers at Quansight.
Some of them are in labs working directly on PyTorch.
So I basically started Quansight.
I went to both TensorFlow and PyTorch and said, hey, I want to help connect what you're
doing to the broader SciPy ecosystem because I see what you're doing.
We have this bigger mission that we want to make sure we don't lose energy here.
And Facebook responded really positively and I didn't get the same reaction.
I love the folks in TensorFlow.
I really love the folks in TensorFlow too.
They're fantastic.
I think it's just how it integrates with their business.
Like I said, there's a lot of reasons.
Just the timing, the integration with their business, what they're looking for.
They're probably looking for more users and I was looking to kind of kept some development
effort and they couldn't receive that as easily, I think.
I'm really hopeful and love the people there.
What's the idea behind OpenTeams?
So OpenTeams, I'm super excited about OpenTeams because it's one of the, I mentioned
my idea for investing directly in open source.
So that's a concept called FairOSS.
But one of the things we, when we started Quansight, we knew we would do is we develop
products and ideas and new companies might come out.
At Anaconda, this was clear, right?
At Anaconda, we did so much innovation that like five or six companies could have come
And we just didn't structure it so they could.
But in fact, they have.
You look at Dask, there's two companies coming out of Dask, Bokeh could be a company.
There's like lots of companies that could exist off the work we did there.
And so I thought, oh, here's a recipe for an incubation, a concept that we could actually
spawn new companies and new innovations.
And then the idea has always been, well, money they earn should come back to fund the open
So Labs is, I think there should be a lot of things like Quansight Labs.
I think this concept is one that scales.
You could have a lot of open source research labs.
Along the way, so in 2018, when the bigger idea came how to make open source investable,
I said, oh, I need to write it.
I need to create a venture fund.
So we created a venture fund called Quansight Initiate at the same time.
It's an angel fund, really.
We started to learn that process.
How do we actually do this?
How do we get LPs?
How do we go in this direction and build a fund?
And I'm like, every venture fund should have an associated open source research lab.
There's just no reason.
Like our venture fund, the carried interest portion of it goes to the lab.
It directly will fund the lab.
That's fascinating by the way.
So you use the power of the organic formation of teams in the open source community and
then naturally that leads to a business that can make a lot of money and then it always
maintains and loops back to the open source.
Loops back to open source.
There's a lot of fit.
There's absolutely a repeatable pattern there.
And it's also beneficial because, oh, I have natural connections to the open source.
If I have an open source research lab, they'll all be out there talking to people.
And so we've had a chance to talk to a lot of early stage companies and our fund focused
on the early stage.
So Quansight has the services, the lab, the fund.
In that process, a lot of stuff started to happen.
They're like, oh, we started to do recruiting and support and training.
And I was starting to build a bigger sales team and marketing team and people besides
And one of the challenges with that is you end up with different cultural aspects.
Developers, in any company you go to, you can go look, is this a business led company,
developer led company?
Are they, what's the interface between them?
There's always a bit of a tension there, like we were talking about before.
What is the tension there?
With OpenTeams, I thought, wait a minute.
We can actually just create this concept of Quansight plus labs.
While it's specific to the PyData ecosystem, the concept is general for all open source.
So OpenTeams emerged as a, oh, we can create a business development company for many, many
Quansights, like thousands of Quansights.
And it can be a marketplace to connect, essentially be the enterprise software company of the
If you look at what enterprise software wants from the customer side, and during this journey
I've had the chance to work and sell to lots of companies, Exxon and Shell and JP Morgan and Bank
of America, like the Fortune 100, and talk to a lot of people in procurement and see
what are they buying and why are they buying?
So I don't know everything, but I've learned a lot about, oh, what are they really looking
And they're looking for solutions.
They're constantly given products from enterprise software.
Here's open source, these enterprise software, now I buy it, and then they have to stitch
it together into a solution.
Open source is fantastic for gluing those solutions together.
So whereas they keep getting new platforms they're trying to buy, but most open source,
most enterprises want is tools that they can customize that are as inexpensive as they
And so you almost want to maintain the connection to the open source because that's going to
So OpenTeams is about solving enterprise software problems.
Brilliant idea, by the way.
With a connect, but we do it honoring the topology.
We don't hire all the people.
We are a network connecting the sales energy and the procurement energy, and we work on
the business side, get the deals closed, and then have a network of partners like Quansight
and others, who we hand the deals to, right, to actually do the work, and then we have
to maintain, I feel like we have to maintain some level of quality control so that the
client can rely on OpenTeams to ensure their deliveries.
It's not just, here's a lead, go figure that out, but no, we're going to make sure you
get what you need.
By the way, it's such a skill, and I don't know if I have the patience, I will have the
patience to talk to the business people, or more specifically, I mean, there's all kinds
of flavors of business people, or like marketing people.
There's a challenge.
I hear what you're saying, because I've had the same challenge, and it's true.
There's some times you think, okay, this is way over a lot.
You have to become an adult, and you have to, because the companies have needs.
They have ways to make money, and they also want to learn and grow, and yet it's your
job to kind of educate them in the best way, like the value of open source, for example.
And I'm really grateful for all my experiences over the past 14 years, understanding that
side of it, and still learning, for sure, but not just understanding from companies,
but also dealing with marketing professionals, and sales professionals, and people that make
a career out of that, and understanding what they're thinking about, and also understanding,
well, let's make this better.
We can really make a place, like OpenTeams I see as the transmission layer between companies
and open source communities, producing enterprise software solutions.
Eventually we want to, like today, we're taking on SAS and MATLAB and tools that we know
we can replace for folks.
Really anytime you have a software tool at an organization where you have to do a lot
of customization or make it work for you, like it's not just buying this thing off
the shelf and it works.
It's like, okay, you buy this system, and then you customize it a lot, usually with
expensive consultants, to actually make it work for you.
All of those should be replaced by open source foundations, with the same customization.
Really, you're doing such important work, such important work in these giant organizations
that are doing exactly that, taking some proprietary software and hiring a huge team of consultants
that customize it, and then that whole thing gets outdated quick.
I mean, that's brilliant.
The solution to that is kind of what Tesla is doing a little bit of, which is basically
build up a software engineering team, like build a team from scratch.
Build a team from scratch, and companies are doing it well.
That's what they're doing right now.
And you're creating a pathway for some of that.
You just don't have to do it.
That's not the only answer.
And so other companies can access this, be more accessible.
We really, let's really say, OpenTeams is the future of enterprise software.
We're still early.
This idea just percolated over the past year as we've kind of grown Quansight and realized
the extensibility of it.
We just finished our seed round to help get more salespeople and then push the messaging
And there's lots of tools we're building to make this easier, like we want to automate
We feel like a lot of the power is the efficiency of the sales process.
There's a lot of wasted energy in small teams and the sales energy to get into large companies
There's a lot of money spent on that process.
Creating the tools and processes for that sales.
So make that super seamless so a single company can go, oh, I've got my contract with open
We've got a subscription they can get.
They can make that procurement seamless.
And then the fact they have access to the entire open source ecosystem.
And we have a part of our work that's embracing open source ecosystems and making sure we're
doing things useful for them.
We're serving them.
And then companies making sure they're getting solutions they care about.
And then figuring out which targets we have.
We're not taking on all of open source, all of enterprise software yet.
Well, this feels like the future.
The idea and the vision is brilliant.
Kasko, why do you think Microsoft bought GitHub and what do you think is the future