Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153
The following is a conversation with Dmitry Korkin,
his second time on the podcast.
He's a professor of bioinformatics
and computational biology at WPI,
where he specializes in bioinformatics
of complex disease, computational genomics,
systems biology, and biomedical data analytics.
He loves biology, he loves computing,
plus he is Russian and recites a poem in Russian
at the end of the podcast.
What else could you possibly ask for in this world?
Quick mention of our sponsors: Brave Browser,
NetSuite Business Management Software,
Magic Spoon Low Carb Cereal,
and Eight Sleep Self Cooling Mattress.
So the choice is browsing privacy, business success,
healthy diet, or comfortable sleep.
Choose wisely, my friends, and if you wish,
click the sponsor links below
to get a discount and to support this podcast.
As a side note, let me say that to me,
the scientists that did the best,
apolitical, impactful, brilliant work of 2020
are the biologists who study viruses without an agenda,
without much sleep, to be honest,
just a pure passion for scientific discovery
and exploration of the mysteries within viruses.
Viruses are both terrifying and beautiful,
terrifying because they can threaten
the fabric of human civilization,
both biological and psychological.
Beautiful because they give us insights
into the nature of life on Earth
and perhaps even extraterrestrial life
of the not so intelligent variety
that might meet us one day
as we explore the habitable planets and moons
If you enjoy this thing, subscribe on YouTube,
review it on Apple Podcasts, follow on Spotify,
support on Patreon, or connect with me
on Twitter at Lex Fridman.
And now here's my conversation with Dmitry Korkin.
It's often said that proteins and the amino acid residues
that make them up are the building blocks of life.
Do you think of proteins in this way
as the basic building blocks of life?
So the protein indeed is the basic unit,
biological unit that carries out
important function of the cell.
However, through studying the proteins
and comparing the proteins across different species,
across different kingdoms,
you realize that proteins are actually
much more complicated.
So they have so-called modular complexity.
And so what I mean by that is an average protein
consists of several structural units.
So we call them protein domains.
And so you can imagine a protein as a string of beads
where each bead is a protein domain.
And in the past 20 years,
scientists have been studying the nature
of the protein domains.
Because we realized that it's the unit.
Because if you look at the functions, right?
So many proteins have more than one function.
And those protein functions are often carried out
by those protein domains.
So we also see that in the evolution,
those protein domains get shuffled.
So they act actually as a unit.
Also from the structural perspective, right?
So, you know, some people think of a protein
as a sort of a globular molecule.
But as a matter of fact, the globular part
of this protein is a protein domain.
So we often have this, you know,
again, the collection of these protein domains
aligned on a string as beads.
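The string-of-beads picture maps naturally onto a simple data structure. The sketch below, in Python, treats each protein as an ordered tuple of domain family names and looks for families shared between proteins, which is the kind of overlap that domain shuffling would leave behind. The protein names and their domain compositions are hypothetical and chosen only for illustration.

```python
# Each protein is modeled as an ordered "string of beads": a tuple of
# domain family names. Proteins and compositions below are hypothetical.
PROTEINS = {
    "protein_A": ("SH3", "SH2", "kinase"),
    "protein_B": ("SH2", "SH2", "phosphatase"),
    "protein_C": ("kinase",),  # a single-domain protein
}

def shared_domains(proteins, name_1, name_2):
    """Return the set of domain families present in both proteins."""
    return set(proteins[name_1]) & set(proteins[name_2])

def is_multidomain(proteins, name):
    """True if the protein has more than one bead on its string."""
    return len(proteins[name]) > 1

print(shared_domains(PROTEINS, "protein_A", "protein_B"))
print(shared_domains(PROTEINS, "protein_A", "protein_C"))
print(is_multidomain(PROTEINS, "protein_C"))
```

A tuple keeps the order of the beads, while the set intersection deliberately ignores order and repeats, since the question here is only which families two proteins have in common.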
And the protein domains are made up of amino acid residues.
So this is the basic,
so you're saying the protein domain
is the basic building block of the function
that we think about proteins doing.
So, of course, you can always talk about
different building blocks with turtles all the way down.
But there's a point, there is a point
in the hierarchy where it's the most, the cleanest
element block based on which you can put them together
in different kinds of ways to form complex function.
And you're saying protein domains,
why is that not talked about as often in popular culture?
Well, you know, there are several perspectives on this.
And one, of course, is the historical perspective, right?
So historically, scientists have been able
to structurally resolve, to obtain the 3D coordinates
of a protein for, you know, for smaller proteins.
And smaller proteins tend to be single-domain proteins.
So we have a protein equal to a protein domain.
And so because of that, the initial suspicion was
that the proteins are, they have globular shapes
and the more smaller proteins you obtain structurally,
the more you become convinced that that's the case.
And only later when we started having, you know,
alternative approaches.
So, you know, the traditional ones are X-ray crystallography
and NMR spectroscopy.
So these are sort of the two main techniques
that give us the 3D coordinates.
But nowadays, there's a huge breakthrough
in cryo-electron microscopy.
So the more advanced methods that allow us to, you know,
to get into the, you know, 3D shapes
of much larger molecules, molecular complexes,
just to give you one of the common examples for this year.
So the first experimental structure
of a SARS-CoV-2 protein was the cryo-EM structure
So the spike protein.
And so it was solved very quickly.
And the reason for that is the advancement
of this technology is pretty spectacular.
How many domains is the, is it more than one domain?
Oh, yes, I mean, so it's a very complex structure.
And we, you know, on top of the complexity
of a single protein, right?
So this structure is actually, is a complex, is a trimer.
So it needs to form a trimer in order to function properly.
So a complex is an agglomeration of multiple proteins.
And so we can have the same protein copied
in multiple, you know, made up in multiple copies
and forming something that we call a homo-oligomer.
Homo means the same, right?
So, so in this case, so the spike protein is the,
is an example of a homo-tetramer, homo-trimer, sorry.
So it means three copies, three copies in order to.
We have these three chains,
the three molecular chains coupled together
and performing the function.
That's why, when you look at this protein from the top,
you see a perfect triangle.
So, but other, you know, so other complexes are made up
of, you know, different proteins.
Some of them are completely different.
Some of them are similar, the hemoglobin molecule, right?
So it's actually, it's a protein complex.
It's made of four basic subunits.
Two of them are identical to each other.
Two others are identical to each other,
but they are also similar to each other,
which sort of gives us some ideas about the evolution
of this, you know, of this molecule.
And perhaps one of the hypotheses is that, you know,
in the past, it was just a homo-tetramer, right?
So four identical copies, and then it became, you know,
sort of modified, it became mutated over time
and became more specialized.
Can we linger on the spike protein for a little bit?
Is there something interesting
or like beautiful you find about it?
I mean, first of all, it's an incredibly challenging protein.
And so we, as a part of our sort of research
to understand the structural basis of this virus
to sort of decode, structurally decode
every single protein in its proteome,
which, you know, we've been working on the spike protein.
And one of the main challenges was that
cryo-EM data allows us to reconstruct
or to obtain the 3D coordinates
of roughly two thirds of the protein.
The rest of the one third of this protein,
it's a part that is buried into the membrane of the virus
and of the viral envelope.
And it also has a lot of unstable structures around it.
So it's chemically interacting somehow
with whatever the heck it's connecting?
Yeah, so people are still trying to understand.
So the nature of, and the role of this, you know,
of this one third,
because the top part, you know,
the primary function is to get attached to the, you know,
ACE2 receptor, human receptor.
There is also beautiful, you know,
mechanics of how this thing happens, right?
So because there are three different copies
of these chains, you know,
there are three different domains, right?
So we're talking about domains.
So these are the receptor binding domains, RBDs,
that get untangled
and get ready to get attached to the receptor.
And now they are not necessarily going in sync mode.
As a matter of fact.
So, and this is where, you know,
another level of complexity comes into play
because right now what we see is,
we typically see just one of the arms going out
and getting ready to be attached
to the ACE2 receptors.
However, there was a recent mutation
that people studied in that spike protein
and very recently,
a group from UMass Medical School
that we happen to collaborate with.
So this is a group of Jeremy Luban
and a number of other faculty.
They actually solved the mutated structure of the spike
and they showed that actually,
because of these mutations,
you have more than one arm opening up.
And so now, so the frequency of two arms going up
increased quite drastically.
How does that change the dynamics somehow?
It potentially can change the dynamics of,
because now you have two possible opportunities
to get attached to the ACE2 receptor.
It's a very complex molecular process,
mechanistic process.
But the first step of this process
is the attachment of this spike protein,
of the spike trimer to the human ACE2 receptor.
So this is a molecule that sits on the surface
of the human cell.
And that's essentially what triggers
the whole process of encapsulation.
If this was dating, this would be the first date.
So is it possible that the spike protein
is just, like, floating about on its own
or does it need that interactability with the membrane?
Yeah, so it needs to be attached,
at least as far as I know.
But when you get this thing attached on the surface,
there is also a lot of dynamics
on how it sits on the surface.
So for example, there was a recent work in,
again, where people use the cryo-electron microscopy
to get the first glimpse of the overall structure.
It's a very low res,
but you still get some interesting details
about the surface, about what is happening inside
because we had literally no clue
until recent work about how the capsid is organized.
What's the capsid?
So capsid is essentially the inner core
of the viral particle where there is the RNA of the virus
and it's protected by another protein, N protein,
that essentially acts as a shield.
But now we are learning more and more,
so it's actually, it's not just this shield,
it's potentially used for the stability
of the outer shell of the virus.
So it's pretty complicated.
And understanding all of this is really useful
for trying to figure out like developing a vaccine
or some kind of drug to attack any aspects of this, right?
So I mean, there are many different implications to that.
First of all, it's important to understand the virus itself.
So in order to understand how it acts,
what is the overall mechanistic process
of this virus replication
of this virus proliferation to the cell, right?
So that's one aspect.
The other aspect is designing new treatments.
So one of the possible treatments is, you know,
designing nanoparticles.
And so some nanoparticles that will resemble the viral shape
that would have the spike integrated
and essentially would act as a competitor to the real virus
by blocking the ACE2 receptors
and thus preventing the real virus from entering the cell.
Now, there are also, you know,
there is a very interesting direction
in looking at the membrane,
at the envelope portion of the protein
and attacking its M protein.
So there are, you know, to give you, you know,
sort of a brief overview,
there are four structural proteins
that these are the proteins that make up
the structure of the virus.
So spike, S protein, that acts as a trimer.
So it needs three copies.
E, envelope protein, that acts as a pentamer.
So it needs five copies to act properly.
M is a membrane protein.
It forms dimers and actually it forms a beautiful lattice.
And this is something that we've been studying
and we are seeing it in simulations.
It actually forms a very nice grid
or, you know, threads, you know,
of different dimers attached next to each other.
There's a bunch of copies of each other
and they naturally, when you have a bunch of copies
of each other, they form an interesting lattice.
And, you know, if you think about this, right?
So this complex, you know,
the viral shape needs to be organized somehow,
self-organized somehow, right?
So, you know, if it was a completely random process,
you know, you probably wouldn't have the envelope shell
of the ellipsoid shape.
You know, you would have something, you know,
pretty random, right, shape.
So there is some, you know, regularity
in how this, you know, how these M dimers
get to attach to each other in a very specific,
Is that understood at all?
It's not understood.
We are now, we've been working in the past six months
since, you know, we met.
Actually, this is where we started working
on trying to understand the overall structure
of the envelope and the key components
that make up this, you know, structure.
Wait, does the envelope also have the lattice structure
So the envelope is essentially the outer shell
of the viral particle.
The N, the nucleocapsid protein,
is something that is inside.
The N is likely to interact with M.
Does it go M and E?
Like where's the E and the N?
So E, those different proteins,
they occur in different copies on the viral particle.
So E, this pentamer complex,
we only have two or three maybe per each particle.
Okay, we have 1,000 or so of M dimers
that essentially make up the entire outer shell.
So most of the outer shell is the M.
M dimer and the M protein.
When you say particle, that's the virion, the virus,
the individual virus.
It's a single, yes.
Single element of the virus, single virus.
Single virus, right.
And we have about, you know, roughly 50 to 90 spike trimers.
Right, so when you show a...
Per virus particle.
Per virus particle.
So what did you say, 50 to 90?
So this is how this thing is organized.
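The copy numbers quoted here can be turned into a rough count of individual protein chains per viral particle. The following Python sketch does only that arithmetic; the figures are the approximate ones mentioned in the conversation (two or three E pentamers, about 1,000 M dimers, 50 to 90 spike trimers), and the dictionary layout and function name are assumptions made for illustration.

```python
# Copy numbers per viral particle as quoted in the conversation.
# These are rough figures; each range is stored as (low, high).
COMPLEXES = {
    "S (spike) trimer": {"chains_per_complex": 3, "complexes": (50, 90)},
    "E (envelope) pentamer": {"chains_per_complex": 5, "complexes": (2, 3)},
    "M (membrane) dimer": {"chains_per_complex": 2, "complexes": (1000, 1000)},
}

def chain_counts(complexes):
    """Return {name: (low, high)} individual protein chains per particle."""
    return {
        name: tuple(info["chains_per_complex"] * n for n in info["complexes"])
        for name, info in complexes.items()
    }

for name, (low, high) in chain_counts(COMPLEXES).items():
    print(f"{name}: {low} to {high} chains")
```

Under these figures the M protein dominates the particle by chain count, which matches the remark that most of the outer shell is M.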
And so now, typically, right,
so you see the antibodies that target spike protein,
you know, spike protein,
certain parts of the spike protein,
but there could be some, also some treatments, right?
So these are, you know, these are small molecules
that bind strategic parts of these proteins
disrupting its function.
So one of the promising directions,
it's one of the newest directions,
is actually targeting the M dimer of the protein.
Targeting the proteins that make up this outer shell.
Because if you're able to destroy the outer shell,
you're essentially destroying the viral particle itself.
So preventing it from, you know, functioning at all.
So that's, you think,
is from a sort of cybersecurity perspective,
virus security perspective,
that's the best attack vector is,
or like that's a promising attack vector.
So I mean, there's still tons of research that needs to be,
you know, to be done.
But yes, I think, you know, so.
There's more attack surface, I guess.
More attack surface,
but you know, from our analysis,
from other evolutionary analysis,
this protein is evolutionarily more stable
compared to the spike protein.
Stable means a more static target.
Well, yeah, so it doesn't change.
It doesn't evolve from the evolutionary perspective
so drastically as, for example, the spike protein.
There's a bunch of stuff in the news
about mutations of the virus in the United Kingdom.
I also saw in South Africa something,
maybe that was yesterday.
You just kind of mentioned about stability and so on.
Which aspects of this are mutatable
and which aspects, if mutated, become more dangerous?
And maybe even zooming out,
what are your thoughts and knowledge and ideas
about the way it's mutated,
all the news that we've been hearing?
Are you worried about it from a biological perspective?
Are you worried about it from a human perspective?
So I mean, you know, mutations are sort of a general way
for these viruses to evolve, right?
So it's, you know, it's essentially,
this is the way they evolve.
This is the way they were able to jump from, you know,
one species to another.
We also see, you know, some recent jumps.
There were some incidents of this virus jumping
from humans to dogs.
So, you know, there is some danger in those jumps
because, you know, every time it jumps, it also mutates, right?
So when it jumps to the species
and jumps back, right?
So it acquires some mutations that are sort of driven
by the environment of a new host, right?
And it's different from the human environment.
And so we don't know whether the mutations
that are acquired in the new species are neutral
with respect to the human host
or maybe, you know, maybe damaging.
Yeah, change is always scary, but so are you worried about,
I mean, it seems like because the spread is during winter,
now it seems to be exceptionally high.
And especially with a vaccine just around the corner,
already being actually deployed,
there's some worry that there's,
this puts evolutionary pressure,
selective pressure on the virus,
for it to, to mutate.
Is that a source of worry?
Well, I mean, there is always this thought, you know,
in the scientist's mind, you know, what happened,
what will happen, right?
So I know there've been discussions about sort
of the arms race between the, you know,
the ability of humanity to, you know,
to get vaccinated faster than the virus,
you know, essentially, you know, becomes, you know,
resistant to the vaccine.
I, I mean, I don't worry that much,
simply because, you know, there is not that much evidence
to that, to aggressive mutation around the vaccine.
Exactly, you know, obviously there are mutations
around the vaccine, you know, there are vaccines.
So the reason we get vaccinated every year
against the seasonal flu is the mutations, right?
But, you know, I think it's important to study it.
So, so I think one of the, you know, to me,
and again, I might be biased because, you know,
we've been trying to do that as well.
So, but one of the critical directions
in understanding the virus is to understand its evolution
in order to sort of understand the mechanisms,
the key mechanisms that lead the virus to jump,
you know, the novel viruses to jump from species,
from one species to another,
the mechanisms that lead the virus
to become resistant to vaccines, also to treatments, right?
And hopefully that knowledge will enable us
to sort of forecast the evolutionary traces,
the future evolutionary traces of this virus.
I mean, what, from a biological perspective,
this might be a dumb question,
but are there parts of the virus that if souped up
like through mutation could make it more effective
We're talking about the specific coronavirus, like,
because we were talking about the different,
like the membrane, the M protein, the E protein,
the N and the S, the spike.
And there are 20 or so more in addition to that.
But is that a dumb way to look at it?
Like, which of these, if mutated,
could have the greatest impact, potentially damaging impact
on the effectiveness of the virus?
So, it's actually, it's a very good question
because, and the short answer is we don't know yet,
but of course there is capacity of this virus
to become more efficient.
The reason for that is, you know,
so if you look at the virus, I mean, it's a machine, right?
So it's a machine that does a lot of different functions.
And many of these functions are sort of nearly perfect,
but they're not perfect.
And those mutations can make those functions more perfect.
For example, the attachment to ACE2 receptor, right,
of the spike, right?
So, you know, has this virus reached the efficiency
in which the attachment is carried out?
Or are there some mutations that are still to be discovered,
right, that will make this attachment sort of stronger,
or, you know, something in a way more efficient
from the point of view of this virus functioning.
That's sort of the obvious example,
but if you look at each of these proteins,
I mean, it's there for a reason, it performs a certain function.
And it could be that certain mutations will,
you know, enhance this function.
It could be that some mutations will make this function
much less efficient, right?
So that's also the case.
Let's, since we're talking about the evolutionary
history of a virus, let's zoom back out
and look at the evolution of proteins.
I glanced at this 2010 Nature paper on the, quote,
"ongoing expansion of the protein universe."
And then, you know, it kind of implies and talks about
that proteins started with a common ancestor,
which is kind of interesting.
It's interesting to think about like even just like
the first organic thing that started life on Earth.
And from that, there's now, you know,
what is it, 3.5 billion years later,
there's now millions of proteins and they're still evolving.
And that's, you know, in part, one of the things
that you're researching, is there something interesting
to you about the evolution of proteins
from this initial ancestor to today?
Is there something beautiful and insightful
about this long story?
So I think, you know, if I were to pick a single keyword
about protein evolution, I would pick modularity,
something that we talked about in the beginning.
And that's the fact that the proteins are no longer considered
as, you know, as a sequence of letters.
There are hierarchical complexities
in the way these proteins are organized.
And these complexities are actually going beyond
the protein sequence.
It's actually going all the way back to the gene,
to the nucleotide sequence.
And so, you know, again, these protein domains,
they are not only functional building blocks,
they are also evolutionary building blocks.
And so what we see in the sort of,
in the later stages of evolution,
I mean, once these stable, structural,
and functional building blocks were discovered,
they essentially stay, those domains stay as such.
So that's why if you start comparing different proteins,
you will see that many of them will have similar fragments.
And those fragments will correspond to something
that we call protein domain families.
And so, they are still different
because you still have mutations and, you know,
the, you know, different mutations are attributed
to, you know, diversification of the function
of this, you know, protein domain.
However, you don't, you very rarely see, you know,
the evolutionary events that would split this domain
into fragments because, and it's, you know,
once you have the domain split,
you actually, you know, you can completely cancel
out its function, or at the very least, you can reduce it.
And that's not, you know, efficient from the point of view
of the, you know, of the cell function.
So the protein domain level is a very important one.
Now, on top of that, right?
So if you look at the proteins, right?
So you have these structural units
and they carry out the function.
But then much less is known about things
that connect these protein domains.
Something that we call linkers, and those linkers
are completely flexible, you know, parts of the protein
that nevertheless carry out a lot of function.
It's like little tails or heads.
So, so, so we do have tails.
So they're called termini, C and N termini.
So these are things right on the, on one and another ends
of the protein sequence.
So they are also very important.
So they are attributed to very specific interactions
between the proteins.
But you're referring to the links between domains
that connect the domains.
And, you know, apart from the, just the simple perspective,
if you have, you know, a very short domain,
you have, sorry, a very short linker,
you have two domains next to each other.
They are forced to be next to each other.
If you have a very long one, you have the domains
that are extremely flexible and they carry out
a lot of sort of spatial reorganization, right?
That's so awesome.
But on top of that, right, just this linker itself,
because it's so flexible, it actually can adapt
to a lot of different shapes.
And therefore, it's a very good interactor
when it comes to interaction between this protein
and other protein, right?
So these things also evolve, you know,
and they, in a way, have different sort of laws of,
the driving laws that underlie the evolution
because they no longer need to preserve
certain structure, right, unlike protein domains.
And so on top of that, you have something
that is even less studied.
And this is something that is attributed to the concept
of alternative splicing.
So alternative splicing.
So it's a very cool concept, it's something
that we've been fascinated about for over a decade
in my lab and trying to do research with that.
But so typically, a simplistic perspective
is that one gene is equal to one protein product, right?
So you have a gene, you know, you transcribe it
and translate it and it becomes a protein.
In reality, when we talk about eukaryotes,
especially sort of more recent eukaryotes
that are very complex, the gene is no longer equal
to one protein, it actually can produce multiple
functionally active protein products.
And each of them is called an alternatively spliced product.
The reason it happens is that if you look at the gene,
it actually has, it has also blocks.
And the blocks, some of which,
and it's essentially, it goes like this.
So we have a block that will later be translated,
we call it exon, then we'll have a block
that is not translated, cut out, we call it intron.
So we have exon, intron, exon, intron, et cetera, et cetera,
So sometimes you can have, you know,
dozens of these exons and introns.
So what happens is during the process
when the gene is converted to RNA,
we have things that are cut out,
the introns that are cut out, and exons that now
get assembled together.
And sometimes we will throw out some of the exons.
And the remaining protein product will become,
still be the same, different, right?
So now you have fragments of the protein
that are no longer there.
They were cut out with the introns.
Sometimes you will essentially take one exon
and replace it with another one, right?
So there's some flexibility in this process.
So that creates a whole new level of complexity.
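To make the exon-skipping idea concrete, here is a minimal sketch in Python. It models a gene as an ordered list of exon and intron segments, removes every intron, and optionally skips chosen exons, so that one gene yields more than one product. The segment sequences, the `GENE` layout, and the `splice` function are invented for illustration and do not correspond to any real gene; the DNA alphabet is kept throughout for simplicity, and real splicing is regulated by far more than a list of exons to keep.

```python
# Toy model of alternative splicing: a gene is an ordered list of
# (kind, sequence) segments, where kind is "exon" or "intron".
# All sequences here are invented for illustration.
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGT"),
    ("exon", "GAATTC"),
    ("intron", "GTGAGT"),
    ("exon", "TTTTAA"),
]

def splice(gene, skip_exons=()):
    """Drop every intron, and drop the exons whose index (counting exons
    only, from 0) is listed in skip_exons. Returns the joined exons."""
    kept = []
    exon_index = 0
    for kind, seq in gene:
        if kind == "exon":
            if exon_index not in skip_exons:
                kept.append(seq)
            exon_index += 1
    return "".join(kept)

full_length = splice(GENE)                     # all three exons joined
exon_skipped = splice(GENE, skip_exons={1})    # middle exon thrown out
print(full_length)
print(exon_skipped)
```

Replacing one exon with another, the second case mentioned above, could be modeled the same way by swapping a segment in the list before calling `splice`.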
Is this random though?
We, and this is where I think now the appearance
of these modern single-cell, and before that tissue-level,
sequencing, next generation sequencing techniques,
such as RNA-seq, allows us to see that these are the events
that often happen in response in, it's a dynamic event
that happens in response to disease
or in response to certain developmental stage of a cell.
And this is an incredibly complex layer
that also undergoes, I mean, because it's at the gene level,
So it undergoes certain evolution, right?
And now we have this interplay between what is happening
in the protein world and what is happening
in the gene and RNA world.
And for example, it's often that we see that
the boundaries of these exons coincide
with the boundaries of the protein domains, right?
So there is, you know, close interplay to that.
It's not always, I mean, you know,
otherwise it would be too simple, right?
But we do see the connection
between those sort of machineries.
And obviously the evolution will pick up this complexity
and, you know, select for whatever is successful,
whatever is interesting function.
We see that complexity in play
and it makes this question, you know, more complex
but more exciting.
link |
As a small detour, I don't know if you think about this
link |
into the world of computer science.
link |
There's a Douglas Hustader, I think came up with a name
link |
of Quine, which are, I don't know if you're familiar
link |
with these things, but it's computer programs
link |
that have, I guess, exon and intron
link |
The whole purpose of the program is to copy itself.
link |
So it prints copies of itself
link |
but can also carry information inside of it.
link |
So it's a very kind of crude, fun exercise of,
link |
can we sort of replicate these ideas from cells?
link |
Or can we have a computer program
link |
that when you run it just prints itself,
link |
the entirety of itself
link |
and does it in different programming languages and so on.
link |
I've been playing around and writing them.
link |
It's a kind of fun little exercise.
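For readers who have not seen one, here is a minimal sketch of such a self-printing program in Python. It is one common textbook construction, not necessarily the kind written by either speaker. The string `s` is a template of the whole program, and `%r` drops the string's own quoted representation back into that template. The block carries no comments, because any extra character would have to be reproduced in the output as well.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running these two lines prints exactly these two lines.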
link |
You know, when I was a kid, so, you know,
link |
it was essentially one of the sort of main stages
link |
in Informatics Olympiad
link |
that you have to reach in order to be any good
link |
is you should be able to write a program
link |
that replicates itself.
link |
And so the task then becomes even, you know,
link |
sort of more complicated.
link |
So what is the shortest?
link |
What is the shortest, yeah.
link |
And of course, it's, you know,
link |
it's a function of a programming language.
link |
But yeah, I remember, you know,
link |
long, long, long time ago when we tried to, you know,
link |
to make it shorter and shorter and find the shortcuts.
link |
There's actually on Stack Exchange,
link |
there's an entire site called CodeGolf, I think,
link |
where the entirety is just a competition.
link |
People just come up with whatever task, I don't know,
link |
like, write code that reports the weather today.
link |
And the competition is about whatever programming language,
link |
what is the shortest program?
link |
And it makes you actually, people should check it out
link |
because it makes you realize there's some weird
link |
programming languages out there.
link |
But, you know, just to dig on that a little deeper,
link |
do you think, you know, in computer science,
link |
you don't often think about programs.
link |
There's like the machine learning world now
link |
that's still kind of basic programs.
link |
And then there's humans that replicate themselves, right?
link |
And there's these mutations and so on.
link |
Do you think we'll ever have a world
link |
where there's programs that kind of have
link |
an evolutionary process?
link |
So I'm not talking about evolutionary algorithms,
link |
but I'm talking about programs that kind of
link |
mate with each other and evolve and,
link |
like, on their own, replicate themselves.
link |
So this is kind of, the idea here is, you know,
link |
that's how you can have a runaway thing.
link |
So we think about machine learning as a system
link |
that gets smarter and smarter and smarter and smarter.
link |
At least the machine learning systems of today
link |
are like, it's a program that you can, like, turn off.
link |
As opposed to throwing a bunch of little programs out there
link |
and letting them, like, multiply and mate
link |
and evolve and replicate.
link |
Do you ever think about that kind of world, you know,
link |
when we jump from the biological systems
link |
that you're looking at to artificial ones?
link |
I mean, it's almost like you take the sort of
link |
the area of intelligent agents, right?
link |
Which are essentially the independent sort of codes
link |
that run and interact and exchange the information, right?
link |
So I don't see why not.
link |
I mean, I, you know, it could be sort of a natural evolution
link |
in this area of computer science.
link |
I think it's kind of an interesting possibility.
link |
It's terrifying, too.
link |
But I think it's a really powerful tool.
link |
Like, to have agents that, you know,
link |
we have social networks with millions of people
link |
and they interact.
link |
I think it's interesting to inject into that.
link |
Bots were already injected into that, right?
link |
But those bots are pretty dumb.
link |
You know, they're probably pretty dumb algorithms.
link |
You know, it's interesting to think that there might be bots
link |
that evolve together with humans.
link |
And there's the sea of humans and robots
link |
that are operating first in the digital space.
link |
And then you can also think, I love the idea.
link |
Some people worked, I think at Harvard, at Penn,
link |
there's robotics labs that, you know,
link |
take as a fundamental task to build a robot
link |
that, given extra resources,
link |
can build another copy of itself,
link |
like in the physical space,
link |
which is super difficult to do,
link |
but super interesting.
link |
I remember there's like research on robots
link |
that can build a bridge.
link |
So they make a copy of themselves
link |
and they connect themselves.
link |
And so it's like self building bridge
link |
based on building blocks.
link |
You can imagine like a building that self assembles.
link |
So it's basically self assembling structures
link |
from robotic parts,
link |
but it's interesting to, within that robot,
link |
add the ability to mutate and do all the interesting,
link |
like little things that you're referring to in evolution
link |
to go from a single origin protein building block
link |
to like this weird complexity.
link |
And if you think about this, I mean, you know,
link |
the bits and pieces are there, you know?
link |
So you mentioned the evolutionary algorithm, right?
link |
You know, so this is sort of,
link |
and maybe sort of the goal is in a way different, right?
link |
So the goal is to, you know, to essentially,
link |
to optimize your search, right?
link |
So, but sort of the ideas are there.
link |
So people recognize that, you know,
link |
that the recombination events
link |
lead to global changes in the search trajectories,
link |
the mutations event is a more refined step in the search.
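To make the two operators concrete, here is a minimal sketch of an evolutionary search in Python. Everything in it is an assumption chosen for illustration: the toy objective (count the ones in a bit string), the population size, the mutation rate, and the function names. Recombination splices two parents at a random cut point, which can move a candidate far in the search space, while mutation flips individual bits, which is the finer-grained step.

```python
import random


def fitness(bits):
    """Toy objective: the number of ones in the bit string."""
    return sum(bits)


def crossover(a, b, rng):
    """Recombination: take the head of one parent and the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rng, rate=0.02):
    """Mutation: flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=32, pop_size=30, generations=60, seed=0):
    """Run a small search and return the best individual and its fitness history."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # the fitter half gets to reproduce
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children  # keep the current best unchanged (elitism)
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


best, history = evolve()
```

Because the best individual is always carried over unchanged, the best fitness in `history` never decreases from one generation to the next.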
link |
Then you have, you know,
link |
other sort of nature inspired algorithm, right?
link |
So it's one of the reasons that, you know,
link |
I think it's one of the funnest one,
link |
is the slime mold based algorithm, right?
link |
So I think the first was introduced by the Japanese group
link |
where it was able to solve some pre, you know,
link |
So that's, you know, and then I think there are still
link |
a lot of things we've yet to, you know,
link |
borrow from the nature, right?
link |
So there are a lot of sort of ideas that nature,
link |
you know, gets to offer us that, you know,
link |
it's up to us to grab it and to, you know,
link |
get the best use of it.
link |
Including neural networks.
link |
You know, we have a very crude inspiration
link |
from nature on neural networks.
link |
Maybe there's other inspirations to be discovered
link |
in the brain or other aspects of the various systems,
link |
even like the immune system, the way it interplays.
link |
I recently started to understand that the immune system
link |
has something to do with the way the brain operates.
link |
Like there's multiple things going on in there,
link |
which all of which are not modeled
link |
in artificial neural networks.
link |
And maybe if you throw a little bit
link |
of that biological spice in there,
link |
you'll come up with something, something cool.
link |
I'm not sure if you're familiar with the Drake equation.
link |
That estimate, I just did a video on it yesterday
link |
because I wanted to give my own estimate of it.
link |
It's an equation that combines a bunch of factors
link |
to estimate how many alien civilizations
link |
are in the galaxy.
link |
I've heard about it, yes.
link |
So one of the interesting parameters,
link |
you know, it's like how many stars are born every year,
link |
how many planets are on average per star,
link |
for this, how many habitable planets are there.
link |
And then the one that starts being really interesting
link |
is the probability that life emerges on a habitable planet.
link |
So like, I don't know if you think about,
link |
you certainly think a lot about evolution,
link |
but do you think about the thing
link |
which evolution doesn't describe,
link |
which is like the beginning of evolution,
link |
the origin of life.
link |
I think I put the probability of life developing
link |
a habitable planet at 1%.
link |
This is very scientifically rigorous.
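For reference, the Drake equation in its usual form is a plain product of seven factors, which can be sketched in Python as below. The parameter names follow the standard statement of the equation. The only number taken from this conversation is the 1% for life emerging on a habitable planet (`f_l`); every other value in the example call is a placeholder assumption, not an estimate made by either speaker.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Estimated number of detectable civilizations in the galaxy.

    r_star   -- average rate of star formation (stars per year)
    f_p      -- fraction of stars that have planets
    n_e      -- habitable planets per star that has planets
    f_l      -- fraction of habitable planets on which life emerges
    f_i      -- fraction of those on which intelligence emerges
    f_c      -- fraction of those that become detectable
    lifetime -- years such a civilization remains detectable
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


# Only f_l=0.01 comes from the conversation; the rest are placeholders.
example = drake(r_star=2.0, f_p=1.0, n_e=0.5, f_l=0.01, f_i=0.1, f_c=0.1, lifetime=1000.0)
```

Since the result is a simple product, halving any one factor halves the estimate, which is why the uncertain biological terms dominate the argument.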
link |
Okay, well, first at a high level for the Drake equation,
link |
what would you put that percent that on Earth?
link |
And in general, do you have something,
link |
do you have thoughts about how life might have started?
link |
You know, like the proteins being the first kind of,
link |
one of the early jumping points?
link |
Yes, also, I think back in 2018,
link |
there was a very exciting paper published in Nature
link |
where they found one of the simplest amino acids, glycine.
link |
So this is, I apologize if I don't pronounce,
link |
it's a Russian-named comet.
link |
I think Churyumov-Gerasimenko.
link |
This is the comet where, and there was this mission
link |
to get close to this comet
link |
and get the stardust from its tail.
link |
And when scientists analyzed it,
link |
they actually found traces of glycine,
link |
which makes up, it's one of the 20 basic amino acids
link |
that makes up proteins, right?
link |
So that was kind of very exciting, right?
link |
But the question is very interesting, right?
link |
So what, if there is some alien life,
link |
is it gonna be made of proteins, right?
link |
Or maybe RNAs, right?
link |
So we see that the RNA viruses
link |
are certainly very well established
link |
sort of group of molecular machines, right?
link |
So yes, it's a very interesting question.
link |
What probability would you put?
link |
Like, how unlikely just on earth do you think
link |
this whole thing is that we got going?
link |
Like, are we really lucky or is it inevitable?
link |
Like, what's your sense when you sit back
link |
and think about life on earth?
link |
Is it higher or lower than 1%?
link |
Well, because 1% is pretty low,
link |
but it still is like, damn, that's a pretty good chance.
link |
Yes, it's a pretty good chance.
link |
I mean, I would personally, but again,
link |
I'm probably not the best person
link |
to do such estimations,
link |
but intuitively, I would probably put it lower.
link |
But still, I mean, you know, give up.
link |
We're really lucky here on earth.
link |
Or the conditions are really good.
link |
I mean, I think that there was everything was right
link |
So we still, the conditions were not like ideal
link |
if you try to look at what was, you know,
link |
several billions years ago when the life emerged.
link |
So there is something called the rare earth hypothesis
link |
that, you know, in counter to the Drake equation says
link |
that the, you know, the conditions of earth,
link |
if you actually were to describe earth,
link |
it's quite a special place.
link |
So special, it might be unique in our galaxy
link |
and potentially, you know, close to unique
link |
in the entire universe.
link |
Like it's very difficult to reconstruct
link |
those same conditions.
link |
And what the rare earth hypothesis argues
link |
is all those different conditions are essential for life.
link |
And so that's sort of the counter, you know,
link |
like all the things we, you know,
link |
thinking that earth is pretty average.
link |
I mean, I can't really, I'm trying to remember
link |
to go through all of them, but just the fact
link |
that it is shielded from a lot of asteroids,
link |
obviously the distance to the sun,
link |
but also the fact that it's like a perfect balance
link |
between the amount of water and land
link |
and all those kinds of things.
link |
I don't know, there's a bunch of different factors
link |
that I don't remember.
link |
There's a long list, but it's fascinating to think about
link |
if in order for something like proteins
link |
and then the DNA and RNA to emerge,
link |
you need, and basic living organisms,
link |
you need to be a very close and earth like planet,
link |
which would be sad or exciting.
link |
If you ask me, I, you know,
link |
in a way I put a parallel between, you know,
link |
between our own research and, I mean,
link |
from the intuitive perspective, you know,
link |
you have those two extremes
link |
and the reality very rarely falls into the extremes.
link |
The optimum is always reached somewhere
link |
So I would, and that's what I tend to think.
link |
I think that, you know, we're probably somewhere in between.
link |
So we're not unique, unique,
link |
but again, the chances are, you know, reasonably small.
link |
The problem is we don't know the other extreme ways.
link |
Like I tend to think that we don't actually understand
link |
the basic mechanisms of like what this is all originated from.
link |
Like it seems like we think of life as this distinct thing,
link |
maybe intelligence is a distinct thing,
link |
maybe the physics that from which planets and suns are born
link |
is a distinct thing, but that could be a very,
link |
it's like the Stephen Wolfram thing.
link |
It's like the, from simple rules,
link |
emerges greater and greater complexity.
link |
So, you know, I tend to believe that just life finds a way.
link |
It, like, we don't know the extreme of how common life is
link |
because it could be life is like everywhere.
link |
Like, like so everywhere that it's almost like laughable.
link |
Like that we're such idiots to think,
link |
or you, like it's like ridiculous to even like think.
link |
It's like ants thinking that their little colony
link |
is the unique thing and everything else doesn't exist.
link |
I mean, it's also very possible that that's the extreme.
link |
And we're just not able to maybe comprehend
link |
the nature of that life.
link |
I mean, just to stick on alien life
link |
for just a brief moment more,
link |
there are some signs of life on Venus in gaseous form.
link |
There's hope for life on Mars, probably extinct.
link |
We're not talking about intelligent life.
link |
Although that has been in the news recently.
link |
We're talking about basic, like, you know, bacteria.
link |
A lot of bacteria.
link |
And then also, I guess, there's a couple moons that I guess.
link |
Yeah, Europa, which is Jupiter's moon.
link |
I think there's another one.
link |
Are you, is that exciting?
link |
Or is it terrifying to you that we might find life?
link |
Do you hope we find life?
link |
I certainly do hope that we find life.
link |
I mean, it was very exciting to hear about, you know,
link |
this news about the possible life on Venus.
link |
It'd be nice to have hard evidence of something,
link |
which is what the hope is for Mars.
link |
And Europa, but do you think those organisms
link |
would be similar biologically?
link |
Or would they even be sort of carbon based?
link |
If we do find them?
link |
I would say they would be carbon based.
link |
It's a big question, right?
link |
So it's the moment we discover things outside Earth, right?
link |
Even if it's a tiny little single cell.
link |
I mean, there is so much.
link |
Just imagine that.
link |
I think that that would be another turning point
link |
for the science, you know?
link |
And if, especially if it's different
link |
in some very new way, that's exciting.
link |
Cause that says, that's a definitive statement,
link |
not a definitive, but a pretty strong statement
link |
that life is everywhere in the universe.
link |
To me, at least that's really exciting.
link |
You brought up Joshua Lederberg
link |
in an offline conversation.
link |
I think I'd love to talk to you about AlphaFold.
link |
And this might be an interesting way
link |
to enter that conversation because,
link |
so he won the 1958 Nobel Prize in Physiology or Medicine
link |
for discovering that bacteria can mate and exchange genes.
link |
But he also did a ton of other stuff,
link |
like we mentioned, helping NASA find life on Mars,
link |
and Dendral, the chemical expert system.
link |
Expert systems, remember those?
link |
Do you, what do you find interesting about this guy
link |
and his ideas about artificial intelligence in general?
link |
So I have a kind of personal story to share.
link |
So I started my PhD in Canada back in 2000.
link |
And so essentially my PhD was,
link |
so we were developing sort of a new language
link |
for symbolic machine learning.
link |
So it's different from the feature based machine learning.
link |
And one of the sort of cleanest applications
link |
of this approach, of this formalism,
link |
was to cheminformatics and computer-aided drug design.
link |
So essentially, as a part of my research,
link |
I developed a system that essentially looked
link |
at chemical compounds of, say,
link |
the same therapeutic category, male hormones, right?
link |
And tried to figure out the structural fragments
link |
that are the structural building blocks
link |
that are important that define this class
link |
versus structural building blocks
link |
that are there just because, to complete the structure.
link |
But they are not essentially the ones
link |
that make up the chemical,
link |
the key chemical properties of this therapeutic category.
link |
And for me, it was something new.
link |
I was trained as an applied mathematician
link |
with some machine learning background,
link |
but computer aided drug design
link |
was completely new territory.
link |
So because of that, I often find myself asking
link |
lots of questions on one of these sort of central forums.
link |
Back then, there were no Facebooks or stuff like that.
link |
There was a forum.
link |
It's a forum, it's essentially, it's like a bulletin board.
link |
Yeah, so essentially you have a bunch of people
link |
and you post the question
link |
and you get an answer from different people.
link |
And back then, one of the most popular forums was CCL.
link |
Think Computational Chemistry Library,
link |
not library, but something like that.
link |
But CCL, that was the forum.
link |
And there I asked a lot of dumb questions.
link |
Yes, I asked questions.
link |
I also shared some information about our formalism
link |
and how we do and whether whatever we do makes sense.
link |
And so, I remember that one of these posts,
link |
I mean, I still remember I would call it desperately
link |
looking for a chemist advice, something like that.
link |
And so I posed my question.
link |
I explained how our formalism is what it does
link |
and what kind of applications I'm planning to do.
link |
And it was in the middle of the night
link |
and I went back to bed.
link |
And next morning, have a phone call from my advisor
link |
who also looked at this forum.
link |
It's like, you won't believe who replied to you.
link |
And it's like, who?
link |
He said, well, there is a message to you
link |
from Joshua Lederberg.
link |
And my reaction was like, who is Joshua Lederberg?
link |
And your advisor hung up.
link |
So essentially, Joshua wrote me that we had conceptually
link |
similar ideas in the Dendral project.
link |
You may wanna look it up.
link |
And we should also, sorry, and it's a side comment say
link |
that even though he won the Nobel Prize at a really young age
link |
He was, I think he was what, 33.
link |
Yeah, it's just crazy.
link |
So anyway, so that's, so hence in the 90s,
link |
responding to young whippersnappers on the CCL forum.
link |
And so back then he was already very senior.
link |
I mean, he unfortunately passed away back in 2008.
link |
But back in 2001, he was a professor emeritus
link |
at Rockefeller University.
link |
And that was actually, believe it or not,
link |
one of the reasons I decided to join as a postdoc,
link |
the group of Andrej Sali, who was at Rockefeller University
link |
with the hope that I could actually have a chance
link |
to meet Joshua in person.
link |
And I met him very briefly, right?
link |
The, just because he was walking,
link |
you know, there's a little bridge that connects the sort
link |
of the research campus with the sort of skyscrapers
link |
that Rockefeller owns.
link |
where postdocs and faculty and graduate students live.
link |
And so, so I met him, you know,
link |
and I had a very short conversation, you know.
link |
But so I started, you know, reading about Dendral
link |
and I was amazed, you know, it's,
link |
we're talking about 1960, right?
link |
The ideas were so profound.
link |
Well, what's the fundamental ideas of it?
link |
The reason to make this is even crazier.
link |
So, so, so Lederberg wanted to make a system
link |
that would help him study the extraterrestrial molecules, right?
link |
So, so the idea was that, you know,
link |
the way you study the extraterrestrial molecules
link |
is you do the mass spec analysis, right?
link |
And so the mass spec gives you sort of bits,
link |
numbers about essentially gives you the ideas
link |
about the possible fragments or, you know, atoms,
link |
and, you know, and maybe a little fragments,
link |
pieces of this molecule that make up the molecule, right?
link |
So now you need to sort of to decompose this information
link |
and to figure out what was the whole
link |
before, you know, it became fragments, bits and pieces, right?
link |
So, so in order to make this, you know, to have this tool,
link |
the idea of Lederberg was to connect chemistry,
link |
computer science, and to design this so called expert system
link |
that looks, that takes into account,
link |
that takes as an input the mass spec data,
link |
the possible, the database of possible molecules,
link |
and essentially try to sort of induce
link |
the molecule that would correspond to this spectra,
link |
or, you know, essentially what this project ended up being
link |
was that, you know, it would provide a list of candidates
link |
that then a chemist would look at and make final decision.
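A very reduced sketch of that "list of candidates" idea, in Python: given one measured mass, enumerate the C/H/N/O formulas whose mass matches within a tolerance, and leave the final call to a chemist. This is not how Dendral itself worked; the real system reasoned about molecular structures and fragmentation patterns, while this toy only matches a single mass and ignores chemical valence rules. The function name, the tolerance, and the restriction to four elements are assumptions made for illustration.

```python
# Monoisotopic masses of the most common isotopes, in unified atomic mass units.
MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def candidate_formulas(target_mass, tol=0.005):
    """Return (C, H, N, O) atom counts whose total mass is within `tol` of the target."""
    hits = []
    for c in range(int((target_mass + tol) // MASS["C"]) + 1):
        rem_c = target_mass - c * MASS["C"]
        for n in range(int((rem_c + tol) // MASS["N"]) + 1):
            rem_n = rem_c - n * MASS["N"]
            for o in range(int((rem_n + tol) // MASS["O"]) + 1):
                rem_o = rem_n - o * MASS["O"]
                # Fill whatever mass is left with the nearest whole number of hydrogens.
                h = round(rem_o / MASS["H"])
                if h >= 0 and abs(rem_o - h * MASS["H"]) <= tol:
                    hits.append((c, h, n, o))
    return hits


# 75.0320 is approximately the monoisotopic mass of glycine, C2H5NO2.
glycine_candidates = candidate_formulas(75.0320)
```

Even this toy shows why a human expert stayed in the loop: a mass alone narrows the options but does not pin down a structure.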
link |
But the original idea is supposed to solve the entirety
link |
of this problem automatically.
link |
No, so he, you know, so he, back then he approached,
link |
yes, believe that, you know, it's amazing.
link |
I mean, it still blows my mind, you know, that it's,
link |
that's, and this was essentially the origin
link |
of the modern bioinformatics, cheminformatics,
link |
you know, back in the 60s.
link |
So that's, you know, so every time you deal
link |
with projects like this, with the, you know,
link |
research like this, you just, you know,
link |
so the power of the, you know, intelligence
link |
of these people is just, you know, overwhelming.
link |
Do you think about expert systems?
link |
Is there, and why they kind of didn't become successful,
link |
especially in the space of bioinformatics,
link |
where it does seem like there is a lot of expertise
link |
in humans and, you know, it's possible to see
link |
that a system like this could be made very useful.
link |
So it's actually, it's a great question.
link |
And this is something so, you know, so, you know,
link |
at my university, I teach artificial intelligence
link |
and, you know, we start the, my first two lectures
link |
are on the history of AI.
link |
And there we, you know, we try to, you know,
link |
go through the main stages of AI.
link |
And so, you know, the question of why expert systems failed
link |
or became obsolete is actually a very interesting one.
link |
And there are, you know, if you try to read the, you know,
link |
the historical perspectives, there are actually two lines
link |
One is that they were essentially not up to the expectations
link |
and so therefore they were replaced, you know,
link |
by, by other things.
link |
The other one was that completely opposite one
link |
that they were too good.
link |
And as a result, they essentially became sort of
link |
a household name and then essentially they got transformed.
link |
I mean, they, in both cases,
link |
sort of they were replaced by other things, right.
link |
I mean, they, in both cases, sort of the outcome
link |
was the same, they evolved into something.
link |
And that's what I, you know, if, if I look at this, right.
link |
So the modern machine learning, right.
link |
So there's echoes in the modern machine learning.
link |
Because, you know, if, if you think about this, you know,
link |
and how we design, you know, the most successful algorithms
link |
including AlphaFold, right.
link |
You built in the knowledge about the domain that you study.
link |
So, so you built in your expertise.
link |
So speaking of AlphaFold,
link |
so DeepMind's AlphaFold 2 recently was announced to have
link |
quote unquote, solved protein folding.
link |
How exciting is this to you?
link |
It seems to be one of the, one of the exciting things
link |
that have happened in 2020.
link |
It's incredible accomplishment from the looks of it.
link |
What part of it is amazing to you?
link |
What part would you say is overhyped or maybe misunderstood?
link |
It's definitely a very exciting achievement
link |
to give you a little bit of perspective, right.
link |
So, so in bioinformatics, we have several competitions.
link |
And so the way, you know, you often hear how those competitions
link |
have been explained to sort of to non bioinformaticians
link |
as they, you know, they call it bioinformatics Olympic games.
link |
And there are several disciplines, right.
link |
So, so the, the, the historical one of the first one
link |
was the discipline in predicting the protein structure,
link |
predicting the 3D coordinates of the protein,
link |
but there are some others.
link |
So the predicting protein functions,
link |
predicting effects of mutations on protein functions,
link |
then predicting a protein, protein interactions.
link |
So, so the original one was a CASP
link |
or Critical Assessment of protein Structure Prediction.
link |
And the, you know, typically what happens
link |
during these competitions is, you know, scientists,
link |
experimental scientists solve the, the structures,
link |
but don't put them into the protein data bank,
link |
which is the centralized database
link |
that contains all the 3D coordinates.
link |
Instead, they hold it and release protein sequences.
link |
And now the challenge of the community
link |
is to predict the 3D structures of these proteins
link |
and then use the experimentally solved structures
link |
to assess which one is the closest one, right.
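As a rough illustration of "assess which one is the closest", here is a sketch in Python that ranks submitted models by root-mean-square deviation (RMSD) from the withheld experimental coordinates. RMSD is used here as a simple stand-in; CASP's own assessment relies on more elaborate scores. The sketch also assumes the predicted and experimental structures are already superimposed and list the same atoms in the same order. The group names and coordinates are invented.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(pred) != len(ref):
        raise ValueError("structures must have the same number of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(ref))


def rank_predictions(predictions, reference):
    """Return group names ordered from closest to farthest from the reference."""
    return sorted(predictions, key=lambda name: rmsd(predictions[name], reference))


# Invented coordinates: the withheld experimental structure and two submissions.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predictions = {
    "group_b": [(x + 2.0, y, z) for x, y, z in reference],
    "group_a": [(x + 0.1, y, z) for x, y, z in reference],
}
ranking = rank_predictions(predictions, reference)
```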
link |
And this competition, by the way,
link |
just a bunch of different tangents.
link |
So maybe you can also say what is protein folding
link |
then this competition CASP competition is,
link |
has become the gold standard
link |
and that's what was used to say
link |
that protein folding was solved.
link |
So I just added a little, just a bunch.
link |
So if you can, whenever you say stuff,
link |
maybe throw in some of the basics for the folks
link |
that might be outside of the field.
link |
So, yeah, so, you know, so the reason it's, you know,
link |
it's relevant to our understanding of protein folding
link |
is because, you know, we, we've yet to learn
link |
how the folding mechanistically works, right.
link |
So there are different hypotheses
link |
what happens to this fold.
link |
For example, there is a hypothesis
link |
that the folding happens by, you know,
link |
also in the modular fashion, right.
link |
So that, you know, we have protein domains
link |
that get folded independently
link |
because the structure is stable
link |
and then the whole protein structure gets formed.
link |
But, you know, within those domains,
link |
we also have so called secondary structure,
link |
the small alpha helices, beta sheets.
link |
So these are, you know, elements that are structurally stable.
link |
And so, and the question is, you know,
link |
when they, when do they get formed?
link |
Because some of the secondary structure elements,
link |
you have to have, you know, a fragment in the beginning
link |
and say the fragment in the middle, right.
link |
So you cannot potentially start having the full fold
link |
from the get go, right.
link |
So it's still, you know, it's still a big enigma.
link |
What happens, we know that it's an extremely efficient
link |
and stable process, right.
link |
So there's this long sequence
link |
and the fold happens really quickly.
link |
So that's really weird, right.
link |
And it happens like the same way almost every time.
link |
That's really weird.
link |
That's freaking weird.
link |
It's, yeah, that's why it's such an amazing thing.
link |
But most importantly, right, so it's, you know,
link |
so when you see the, you know, the translation process, right.
link |
So when you don't have the whole protein translated,
link |
right, it's still being translated, you know,
link |
getting out from the ribosome,
link |
you already see some structural, you know, fragmentation.
link |
So, so folding starts happening
link |
before the whole protein gets produced, right.
link |
And so this is, this is obviously, you know,
link |
one of the biggest questions in, you know,
link |
in modern molecular biology.
link |
Not, not like maybe what happens.
link |
Like that's not, that's bigger than the question of folding.
link |
That's the question of like,
link |
so like deeper fundamental idea of folding.
link |
You know, so obviously if we are able to predict
link |
the end product of protein folding,
link |
we are one step closer to understanding
link |
sort of the mechanisms of the protein folding.
link |
Because we can then potentially look and start probing
link |
what are the critical parts of this process
link |
and what are not so critical parts of this process.
link |
So we can start decomposing this, you know,
link |
so in a way this protein structure prediction algorithm
link |
can be used as a tool, right.
link |
So you change the, you know, you modify the protein,
link |
you get back to this tool, it predicts,
link |
okay, it's completely unstable.
link |
Yeah, which aspects of the input
link |
will have a big impact on the output.
link |
So what happens is, you know, we typically have
link |
some sort of incremental advancement.
link |
You know, each stage of this CASP competition,
link |
you have groups with incremental advancement.
link |
And, you know, historically the top performing groups
link |
were, you know, they were not using machine learning.
link |
They were using very advanced biophysics
link |
combined with bioinformatics,
link |
combined with, you know, the data mining.
link |
And that was, you know, that would enable them
link |
to obtain protein structures of those proteins
link |
that don't have any structurally solved relatives.
link |
Because, you know, if we have another protein,
link |
say the same protein, but coming from a different species,
link |
we could potentially derive some ideas,
link |
and that's so called homology or comparative modeling,
link |
where we'll derive some ideas
link |
from the previously known structures.
link |
And that would help us tremendously in, you know,
link |
in reconstructing the 3D structure overall.
link |
But what happens when we don't have these relatives?
link |
This is when it becomes really, really hard, right?
link |
So that's the so-called de novo, you know,
link |
the de novo protein structure prediction.
link |
And in this case, those methods were traditionally very good.
link |
But what happened in the last year,
link |
the original AlphaFold came in,
link |
and all of a sudden it's much better than everyone else.
link |
Oh, the competition is only every two years, I think.
link |
And then, so, you know, it was sort of
link |
kind of a shockwave to the bioinformatics community
link |
that, you know, we have like a state of the art
link |
machine learning system that does, you know,
link |
structure prediction.
link |
And essentially what it does, you know,
link |
so, you know, if you look at this,
link |
it actually predicts the contacts.
link |
So, you know, so the process of reconstructing
link |
the 3D structure starts by predicting the contacts
link |
between the different parts of the protein.
link |
And the contacts essentially are the parts of the proteins
link |
that are in a close proximity to each other.
link |
So actually the machine learning part seems to be
link |
estimating, you can correct me if I'm wrong here,
link |
but it seems to be estimating the distance matrix,
link |
which is like the distance between the different parts.
link |
So we call it the contact map.
link |
So once you have the contact map,
link |
the reconstruction is becoming more straightforward.
link |
But so the contact map is the key.
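To make the distance matrix and contact map idea concrete, here is a minimal sketch that goes in the easy direction: from known 3D coordinates to a contact map. It is not how AlphaFold works internally. The 8 angstrom cutoff is an assumed, commonly used convention, and the function names and toy coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue positions (one 3D point per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(coords, threshold=8.0):
    """Binary matrix: 1 where two residues are within `threshold` (assumed angstroms), else 0."""
    return [[1 if d <= threshold else 0 for d in row] for row in distance_matrix(coords)]

# Hypothetical toy chain: four residues on a straight line, 3.8 units apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_contacts = contact_map(toy_coords)

for row in toy_contacts:
    print(row)
```

The prediction task discussed here is the reverse problem: estimate this matrix from sequence information alone, then rebuild coordinates that are consistent with it.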
link |
And so, you know, so that's what happened.
link |
And now we started seeing in this current stage, right?
link |
Well, in the most recent one,
link |
we started seeing the emergence of these ideas
link |
in other people's work, right?
link |
But yet here's, you know, AlphaFold 2
link |
that again outperforms everyone else.
link |
And also by introducing yet another wave
link |
of the machine learning ideas.
link |
There does seem to be also an incorporation.
link |
First of all, the paper is not out yet,
link |
but there's a bunch of ideas already out.
link |
There does seem to be an incorporation of this other thing.
link |
I don't know if it's something that you could speak to,
link |
which is like the incorporation of like other structures,
link |
like evolutionary similar structures that are used
link |
to kind of give you hints.
link |
So evolutionary similarity is something
link |
that we can detect at different levels, right?
link |
So we know, for example, that the structure of proteins
link |
is more conserved than the sequence.
link |
The sequence could be very different,
link |
but the structural shape is actually still very conserved.
link |
So that's sort of the intrinsic property
link |
that, you know, in a way related to protein folds,
link |
you know, to the evolution of the, you know,
link |
of the protein of proteins and protein domains, et cetera.
link |
I mean, there have been multiple studies.
link |
And, you know, ideally if you have structures,
link |
you know, you should use that information.
link |
However, sometimes we don't have this information.
link |
Instead, we have a bunch of sequences.
link |
Sequences we have a lot, right?
link |
So we have, you know, hundreds, thousands of,
link |
you know, different organisms sequenced, right?
link |
And by taking the same protein,
link |
but in different organisms and aligning it,
link |
so making it, you know, making the corresponding positions
link |
aligned, we can actually say a lot about sort of
link |
what is conserved in this protein.
link |
And therefore, you know, structurally more stable,
link |
what is diverse in this protein.
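As a rough illustration of reading conservation off an alignment, the sketch below scores each aligned column by Shannon entropy, so 0 means fully conserved and higher values mean more diverse. This is one common choice of measure, not necessarily the one used in any particular pipeline. The toy sequences and function names are hypothetical, and the alignment is assumed to be already computed.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conservation_profile(alignment):
    """Per-position entropy for a list of equal-length, pre-aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]

# Hypothetical toy alignment: the "same protein" from four organisms.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKAL"]
profile = conservation_profile(toy_alignment)

for position, entropy in enumerate(profile):
    print(position, round(entropy, 3))
```

In this toy case the first and last columns are fully conserved, while the third column varies the most.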
link |
So on top of that, we could provide sort of the information
link |
about the sort of the secondary structure
link |
of this protein, et cetera, et cetera.
link |
So this information is extremely useful.
link |
And it's already there.
link |
So while it's tempting to, you know,
link |
to do a complete ab initio,
link |
so you just have a protein sequence and nothing else,
link |
the reality is such that we are overwhelmed with this data.
link |
So why not use it?
link |
And so yeah, so I'm looking forward
link |
to reading this paper.
link |
It does seem to, like they've,
link |
in the previous version of AlphaFold,
link |
they didn't, for this evolutionary similarity thing,
link |
they didn't use machine learning for that.
link |
Or they, rather they used it as like the input
link |
to the entirety of the neural net,
link |
like the features derived from the similarity.
link |
It seems like there's some kind of quote, unquote,
link |
iterative thing where it seems to be part of the,
link |
part of the learning process is the incorporation
link |
of this evolutionary similarity.
link |
Yeah, I don't think there is a bioRxiv paper, right?
link |
There's a blog post that's written
link |
by a marketing team essentially,
link |
which, you know, it has some scientific similarity
link |
probably to the actual methodology used,
link |
but it could be, it's like interpreting scripture.
link |
It could be, it could be just poetic interpretations
link |
of the actual work as opposed
link |
to direct connection to the work.
link |
So now speaking about protein folding, right?
link |
So, so, so, you know, in order to answer the question,
link |
whether or not we have solved this, right?
link |
So we need to go back to the beginning
link |
of our conversation, you know,
link |
with the realization that, you know,
link |
an average protein is that typically what the CASP
link |
has been focusing on is the, you know,
link |
the, this competition has been focusing on the single,
link |
maybe two domain proteins that are still very compact.
link |
And even those ones are extremely challenging to solve, right?
link |
But now we talk about, you know,
link |
an average protein that has two, three protein domains.
link |
If you look at the proteins that are in charge
link |
of the, you know, of the process with the neural system, right?
link |
Perhaps one of the most recently evolved sort of systems
link |
in the organism, right?
link |
All of them, well, the majority of them
link |
are highly multi domain proteins.
link |
So they are, you know, some of them have five, six, seven,
link |
you know, and more domains, right?
link |
And, you know, we are very far away
link |
from understanding how these proteins are folded.
link |
So the complexity of the protein matters here,
link |
the complexity of the protein modules
link |
or the protein domains.
link |
So you're saying solved.
link |
So the definition of solved here
link |
is particularly the CASP competition,
link |
achieving human level, not human level,
link |
achieving experimental level performance
link |
on these particular sets of proteins
link |
that have been used in these competitions.
link |
Well, I mean, you know, I do think that, you know,
link |
especially with regards to AlphaFold, you know,
link |
it is able to, you know, to solve, you know,
link |
at the near experimental level,
link |
pretty big majority of the more compact proteins,
link |
like or protein domains, because again,
link |
in order to understand how the overall protein,
link |
you know, multi domain protein folds,
link |
we do need to understand the structure
link |
of its individual domains.
link |
I mean, unlike if you look at AlphaZero or like even MuZero,
link |
if you look at that work, you know, it's nice,
link |
reinforcement learning, self playing mechanisms are nice
link |
because it's all in simulation.
link |
So you can learn from just huge amounts,
link |
like you don't need data with like the problem with proteins,
link |
like the size, I forget how many 3D structures have been mapped,
link |
but the training data is very small, no matter what.
link |
It's like millions, maybe a one or two millions,
link |
something like that.
link |
But some very small number,
link |
but like it doesn't seem like that's scalable.
link |
There has to be, I don't know,
link |
it feels like you want to somehow 10x the data
link |
or 100x the data somehow.
link |
Yes, but we also can take advantage of homology models,
link |
right, so the models that are of very good quality
link |
because they are essentially obtained
link |
based on the evolutionary information.
link |
So there is a potential to enhance this information
link |
and use it again to empower the training set.
link |
And it's, I think, I am actually very optimistic.
link |
I think it's been one of these sort of, you know,
link |
turning-point events where you have a system
link |
that is, you know, a machine learning system
link |
that is truly better than the sort of the more conventional
link |
biophysics based methods.
link |
That's a huge leap.
link |
This is one of those fun questions,
link |
but where would you put it in the ranking
link |
of the greatest breakthroughs in artificial intelligence
link |
So like, okay, so let's see who's in the running.
link |
Maybe you can correct me.
link |
So you got like AlphaZero and AlphaGo beating, you know,
link |
beating the world champion at the game of Go.
link |
Thought to be impossible like 20 years ago,
link |
or at least the AI community was highly skeptical.
link |
Then you got like also Deep Blue, the original, Kasparov.
link |
You have deep learning itself,
link |
like the, maybe what would you say,
link |
the AlexNet ImageNet moment.
link |
So the first network achieving human level performance,
link |
super not, that's not true.
link |
Achieving like a big leap in performance
link |
on the computer vision problem.
link |
There is OpenAI, the whole like GPT-3,
link |
that whole space of transformers and language models,
link |
just achieving this incredible performance
link |
of application of neural networks to language models.
link |
Boston Dynamics, pretty cool.
link |
Like robotics, even though people are like, there's no AI.
link |
No, no, there's no machine learning currently,
link |
but AI is much bigger than machine learning.
link |
So that just the engineering aspect,
link |
I would say it's one of the greatest accomplishments
link |
in engineering side.
link |
Engineering meaning like mechanical engineering
link |
Then of course, autonomous vehicles,
link |
you can argue for Waymo,
link |
which is like the Google self driving car,
link |
or you can argue for Tesla,
link |
which is like actually being used
link |
by hundreds of thousands of people
link |
on the road today, a machine learning system.
link |
And I don't know if you can, what else is there?
link |
But I think that's it.
link |
So, and then AlphaFold,
link |
many people are saying is up there, potentially number one,
link |
would you put them at number one?
link |
Well, in terms of the impact on the science
link |
and on the society beyond,
link |
it's definitely to me would be one of the, you know.
link |
Top three, what do you want?
link |
Maybe, I mean, I'm probably not the best person
link |
to answer that, but I do have,
link |
I remember my, you know, back in, I think, 1997,
link |
when Deep Blue beat Kasparov,
link |
it was, I mean, it was a shock.
link |
I mean, it was, and I think for the, you know,
link |
for a pretty substantial part of the world,
link |
that especially people who have some, you know,
link |
some experience with chess, right?
link |
And realizing how incredibly human this game,
link |
how, you know, how much of a brainpower you need,
link |
you know, to reach those, you know,
link |
those levels of grandmasters, right, level.
link |
Yeah, I mean, it's probably one of the first time,
link |
and how good Kasparov was.
link |
And again, yeah, so Kasparov is actually one
link |
of the best ever, right?
link |
And you get a machine that beats him, right?
link |
So it's, you know.
link |
First time a machine probably beat a human at that scale
link |
of a thing, of anything.
link |
Yes, so that was, to me, that was like, you know,
link |
one of the groundbreaking events in the history of AI.
link |
Yeah, that's probably number one.
link |
I probably, like, we don't, it's hard to remember.
link |
It's like Muhammad Ali versus, I don't know,
link |
any other Mike Tyson, something like that.
link |
It's like, nah, you got to put Muhammad Ali
link |
at number one, same with Deep Blue,
link |
even though it's not machine learning based.
link |
Still, it uses advanced search,
link |
and search is the integral part of AI, right?
link |
So, as you said, it's...
link |
People don't think of it that way at this moment.
link |
In vogue currently, search is not seen
link |
as a fundamental aspect of intelligence,
link |
but it very well, and very likely, is.
link |
In fact, I mean, that's what neural networks are.
link |
They're just performing search on the space of parameters.
link |
And it's all search.
link |
All of intelligence is some form of search,
link |
and you just have to become cleverer and cleverer
link |
at that search problem.
link |
And I also have another one that you didn't mention
link |
that's one of my favorite ones is...
link |
So, you probably heard of this.
link |
It's, I think it's called Deep Rembrandt.
link |
It's the project where they trained...
link |
I think there was a collaboration between the experts
link |
in Rembrandt painting in Netherlands,
link |
and a group, an artificial intelligence group,
link |
where they train an algorithm to replicate the style
link |
of the Rembrandt, and they actually printed a portrait
link |
that never existed before in the style of Rembrandt.
link |
They, I think they printed it only sort of on the canvas
link |
that using pretty much same types of paints
link |
and stuff, to me, it was mind blowing.
link |
Yeah, and the space of art, that's interesting.
link |
There hasn't been, maybe that's it,
link |
but I think there hasn't been an ImageNet moment yet
link |
in the space of art.
link |
You haven't been able to achieve
link |
super human level performance in the space of art,
link |
even though there was a big famous thing
link |
where there was a piece of art was purchased,
link |
I guess, for a lot of money.
link |
But it's still, people are like in the space of music,
link |
at least, it's clear that human created pieces
link |
are much more popular.
link |
So there hasn't been a moment where it's like,
link |
oh, this is, where now, I would say in the space of music,
link |
what makes a lot of money?
link |
We're talking about serious money.
link |
It's music and movies, or like shows and so on,
link |
and entertainment.
link |
There hasn't been a moment where AI created,
link |
AI was able to create a piece of music,
link |
or a piece of cinema, or like Netflix show,
link |
that is sufficiently popular to make a ton of money.
link |
And that moment would be very, very powerful,
link |
because that's an AI system being used
link |
to make a lot of money.
link |
And like direct, of course, AI tools,
link |
like even Premiere, audio editing, all the editing,
link |
everything I do, to edit this podcast,
link |
there's a lot of AI involved.
link |
I won't, actually, this is a program,
link |
I wanna talk to those folks just
link |
because I wanna nerd out, it's called iZotope,
link |
I don't know if you're familiar with it.
link |
They have a bunch of tools of audio processing,
link |
and they have, I think they're Boston based.
link |
Just, it's so exciting to be, to use it,
link |
like on the audio here,
link |
because it's all machine learning.
link |
It's not, because most audio production stuff
link |
is like any kind of processing you do,
link |
it's very basic signal processing.
link |
And you're tuning knobs and so on.
link |
They have all of that, of course,
link |
but they also have all of this machine learning stuff,
link |
like where you actually give it training data,
link |
you select parts of the audio you train on,
link |
you train on it, and it figures stuff out, it's great.
link |
It's able to detect, like the ability of it
link |
to be able to separate voice and music, for example,
link |
or voice and anything is incredible.
link |
Like it just, it's clearly exceptionally good
link |
at applying these different neural networks models
link |
to separate the different kinds of signals from the audio.
link |
Okay, so that's really exciting.
link |
Photoshop, Adobe, people also use it,
link |
but to generate a piece of music
link |
that will sell millions, a piece of art, yeah.
link |
No, I agree, and that's, as I mentioned,
link |
I offer my AI class and an integral part of this
link |
is a project, right?
link |
So it's my favorite, ultimate favorite part
link |
because typically we have this project presentations
link |
the last two weeks of the classes,
link |
right before the Christmas break,
link |
and it adds this cool excitement.
link |
And every time, I'm amazed with some projects
link |
that people come up with.
link |
And so, and quite a few of them are actually,
link |
they have some link to arts.
link |
I mean, I think last year, we had a group
link |
who designed an AI producing haikus, Japanese poems.
link |
So, and some of them, so it got trained
link |
on the English-based haikus, haikus, right?
link |
So, and some of them, they get to present
link |
like the top selection, they were pretty good.
link |
I mean, of course, I'm not a specialist,
link |
but you read them and you see it.
link |
It seems profound.
link |
Yes, yeah, it seems reasonable.
link |
So it's kinda cool.
link |
We also had a couple of projects
link |
where people tried to teach AI
link |
how to play rock music, classical music,
link |
I think, and popular music.
link |
Interestingly enough, classical music
link |
was among the most difficult ones.
link |
And of course, if you look at the grandmasters of music,
link |
like Bach, right, so there's a lot of almost math.
link |
Yeah, well, he's very mathematical, right?
link |
So this is, I would imagine that at least some style
link |
of this music could be picked up,
link |
but then you have completely different spectrum
link |
of classical composers.
link |
And so, you know, and you know,
link |
I think it's a little bit of a challenge
link |
and so it's almost like you don't have to sort of
link |
look at the data, you just listen to it
link |
and say, nah, that's not it.
link |
Yeah, that's how I feel too.
link |
There's OpenAI has, I think, Open Muse
link |
or something like that, the system.
link |
It's cool, but it's like, yeah,
link |
it's not compelling for some reason.
link |
It could be a psychological reason too.
link |
Maybe we need to have a human being,
link |
a tortured soul behind the music.
link |
Yeah, no, absolutely, I completely agree.
link |
But yeah, whether or not we'll have,
link |
one day we'll have, you know,
link |
a song written by an AI engine
link |
to be in like in top charts.
link |
I wouldn't be surprised.
link |
I wouldn't be surprised.
link |
I wonder if we already have one.
link |
It just hasn't been announced.
link |
How hard is the multi protein folding problem?
link |
Is that kind of something you've already mentioned,
link |
which is baked into this idea
link |
of greater and greater complexity of proteins?
link |
Like multi domain proteins,
link |
is that basically become multi protein complexes?
link |
Yes, you got it right.
link |
So it's sort of, it has the components of both,
link |
of protein folding and protein, protein interactions.
link |
Because in order for these domains,
link |
I mean, many of these proteins actually,
link |
they never form a stable structure.
link |
One of my favorite proteins,
link |
and pretty much everyone who works in the,
link |
I know, who I know who works with proteins,
link |
they always have their favorite proteins, right?
link |
So one of my favorite proteins,
link |
probably my favorite protein,
link |
the one that I worked when I was a postdoc,
link |
is so-called postsynaptic density 95, PSD-95 protein.
link |
So it's one of the key actors
link |
in the majority of neurological processes
link |
at the molecular level.
link |
So it's essentially, it's a key player
link |
in the post synaptic density.
link |
So this is the crucial part of the synapse,
link |
where a lot of these chemical processes are happening.
link |
So it has five domains, right?
link |
So five protein domains, pretty large proteins,
link |
I think 600 something amino acids, but, you know,
link |
the way it's organized itself, it's flexible, right?
link |
So it acts as a scaffold.
link |
So it is used to bring in other proteins.
link |
So they start acting in the orchestrated manner, right?
link |
So, and the type of the shape of this protein,
link |
it's in a way, there are some stable parts of this protein,
link |
but there are some flexible.
link |
And this flexibility is built in into the protein
link |
in order to become sort of this multifunctional machine.
link |
So do you think that kind of thing is also learnable
link |
through the AlphaFold 2 kind of approach?
link |
I mean, the time will tell.
link |
Is it another level of complexity?
link |
Is it like, how big of a jump in complexity
link |
is that whole thing?
link |
To me, it's yet another level of complexity
link |
because when we talk about protein, protein interactions,
link |
and there is actually a different challenge for this
link |
And so this, that is focused specifically
link |
on macromolecular interactions,
link |
protein, protein, protein, DNA, et cetera.
link |
So, but it's, you know, there are different mechanisms
link |
that govern molecular interactions
link |
and that need to be picked up,
link |
say by a machine learning algorithm.
link |
Interestingly enough, we actually,
link |
we participated for a few years in this competition.
link |
We typically don't participate in competitions,
link |
I don't know, don't have enough time, you know,
link |
because it's very intensive process.
link |
But we participated back in, you know,
link |
about 10 years ago or so.
link |
And the way we enter this competition,
link |
so we design a scoring function, right?
link |
So the function that evaluates whether or not
link |
your protein, protein interaction
link |
is supposed to look like experimentally solved, right?
link |
So the scoring function is very critical part
link |
of the model prediction.
link |
So we designed it to be a machine learning one.
link |
And so it was one of the first machine learning
link |
based scoring function used in CAPRI.
link |
And, you know, we essentially, you know,
link |
learned what should contribute,
link |
what are the critical components contributing
link |
into the protein, protein interactions.
link |
So this could be converted into learning problem
link |
and thereby it could be learned.
link |
I believe so, yes.
link |
Do you think AlphaFold 2 or something similar to it
link |
from DeepMind or somebody else will be,
link |
will result in a Nobel Prize or multiple Nobel Prizes?
link |
So like, you know, obviously, maybe not so obviously,
link |
you can't give a Nobel Prize to a computer program.
link |
You, at least for now, give it to the designers
link |
But is, do you see one or multiple Nobel Prizes
link |
where AlphaFold 2 is like a large percentage
link |
of what that prize is given for?
link |
Would it lead to discoveries at the level of Nobel Prizes?
link |
I mean, I think we are definitely destined
link |
to see the Nobel Prize becoming sort of,
link |
to be evolving with the evolution of science.
link |
And the evolution of science is such
link |
that it now becomes like really multifaceted, right?
link |
So where you don't really have like a unique discipline,
link |
you have sort of the, a lot of cross disciplinary talks
link |
in order to achieve sort of, you know,
link |
really big advancements, you know.
link |
So I think, you know, the computational methods
link |
will be acknowledged in one way or another.
link |
And as a matter of fact, you know,
link |
they were first acknowledged back in 2013, right?
link |
Where, you know, the first three people were,
link |
you know, awarded the Nobel Prize for studying
link |
the protein folding, right, the principle.
link |
And, you know, I think all three of them
link |
are computational biophysicists, right?
link |
So, you know, that I think is, is unavoidable, you know.
link |
It will come with a time.
link |
The fact that, you know, AlphaFold
link |
and, you know, similar approaches,
link |
because again, it's a matter of time
link |
that people will embrace the, this, you know, principle
link |
and we'll see more and more such, you know,
link |
such tools coming into play.
link |
But, you know, these methods will be critical
link |
in a scientific discovery, no, no doubts about it.
link |
On the engineering side, maybe a dark question,
link |
but do you think it's possible
link |
to use these machine learning methods
link |
to start to engineer proteins?
link |
And the next question is something,
link |
quite a few biologists are against, some are for,
link |
for study purposes is to engineer viruses.
link |
Do you think machine learning, like something like AlphaFold
link |
could be used to engineer viruses?
link |
So, to answer the first question, you know,
link |
it has been, you know, a part of the research
link |
in the protein science, the protein design
link |
is, you know, is a very prominent area of research.
link |
Of course, you know, one of the pioneers
link |
is David Baker and Rosetta algorithm
link |
that essentially was doing the, the, the de novo design
link |
and was used to design new proteins, you know.
link |
And design of proteins means design of functions.
link |
So like when you design a protein, you can control,
link |
I mean, the whole point of a protein
link |
with the protein structure comes a function
link |
like it's doing something.
link |
So you can design different things.
link |
So you can, yeah, so you can, well, you can look
link |
at the proteins from the functional perspective.
link |
You can also look at the proteins
link |
from the structural perspective, right?
link |
So the structural building blocks.
link |
So if you want to have a building block of a certain shape,
link |
you can try to achieve it by, you know,
link |
introducing a new protein sequence
link |
and predicting, you know, how it will fold.
link |
So, so with that, I mean, it's, it's a natural,
link |
one of the, you know, natural applications
link |
of these algorithms.
link |
Now talking about engineering a virus.
link |
With machine learning.
link |
With machine learning, right?
link |
So, so, well, you know, so luckily for us,
link |
I mean, we don't have that much data, right?
link |
We actually, right now, one of the projects
link |
that we are carrying on in the lab
link |
is we're trying to develop a machine learning algorithm
link |
that determines the, whether or not
link |
the current strain is pathogenic.
link |
And the current strain of the coronavirus.
link |
I mean, so, so there are applications to coronaviruses
link |
because we have strains of SARS-CoV-2, also SARS-CoV,
link |
MERS that are pathogenic, but we also have strains
link |
of other coronaviruses that are, you know, not pathogenic.
link |
I mean, the common cold viruses and, you know,
link |
and some other ones, right?
link |
So, so, pathogenic, meaning spreading.
link |
Pathogenic means actually inflicting damage, correct.
link |
There are also some, you know, seasonal versus pandemic
link |
strains of influenza, right?
link |
And determining what are the molecular determinants, right?
link |
So that are built in into the protein sequence,
link |
into the gene sequence, right?
link |
So, and whether or not the machine learning can determine
link |
those components, right?
link |
So like using machine learning to,
link |
that's really interesting to, to, to, to given,
link |
give the input is like what the entire,
link |
the protein sequence, and then determine
link |
if this thing is going to be able to do damage
link |
to, to a biological system.
link |
So, so, I mean, so.
link |
It's a good machine learning part.
link |
You're saying we don't have enough data for that?
link |
We, I mean, for, for this specific one we do,
link |
we might actually, you know, have to back up on this.
link |
Cause we, we're still in the process.
link |
There was one work that appeared in bioRxiv
link |
by Eugene Koonin, who is one of these, you know,
link |
pioneers in, in, in evolutionary genomics.
link |
And they tried to look at this, but, you know,
link |
the methods were sort of standard, you know,
link |
supervised learning methods.
link |
And now the question is, you know,
link |
can you advance it further by, by using, you know,
link |
not so standard methods, you know,
link |
so there's obviously a lot of hope in, in transfer learning
link |
where you can actually try to transfer the information
link |
that the machine learning learns
link |
about the proper protein sequences, right?
link |
And, you know, so, so there is some promise
link |
in going this direction.
link |
But if we have this, it would be extremely useful
link |
because then we could essentially forecast
link |
the potential mutations that would make
link |
the current strain more or less pathogenic, right?
link |
Anticipate, anticipate them from a vaccine development
link |
for the treatment, anti, anti viral drug development.
link |
That would be a very crucial task.
link |
But you could also use that system to then say,
link |
how would we potentially modify this virus
link |
to make it more pathogenic?
link |
I mean, you know, the, again, the hope is,
link |
well, several things, right?
link |
So one is that, you know, it's,
link |
even if you design a, you know, a sequence, right?
link |
So to carry out the actual experimental biology
link |
to ensure that all the components working, you know,
link |
is a completely different matter.
link |
Then the, you know, we've seen in the past,
link |
there could be some regulation the moment
link |
the scientific community recognizes
link |
that it's now becoming no longer a sort of a fun puzzle
link |
to, you know, for machine learning.
link |
Could be a weapon.
link |
Yes. So then there might be some regulation.
link |
So I think back in what, 2015, there was, you know,
link |
there was an issue on regulating the research
link |
on influenza strains, right?
link |
That there were several groups, you know,
link |
use sort of the mutation analysis to determine
link |
whether or not this strain will jump
link |
from one species to another.
link |
And I think there was like a half a year moratorium
link |
on the research, on the paper published
link |
until, you know, scientists, you know, analyzed it
link |
and decided that it's actually safe.
link |
I forgot what that's called.
link |
Something of function, test of function.
link |
Yeah, gain of function, loss of function.
link |
It's like, let's watch this thing mutate for a while
link |
to see like, to see what kind of things we can observe.
link |
I guess I'm not so much worried about that kind of research
link |
if there's a lot of regulation
link |
and if it's done very well and with competence and seriously.
link |
I am more worried about kind of this, you know,
link |
the underlying aspect of this question
link |
is more like 50 years from now.
link |
Speaking to the Drake equation,
link |
one of the parameters in the Drake equation
link |
is how long civilizations last.
link |
And that seems to be the most important value actually
link |
for calculating if there's other alien intelligence
link |
civilizations out there.
link |
That's where there's most variability,
link |
assuming like if life, if that percentage
link |
that life can emerge is like not zero,
link |
like if we're super unique,
link |
then it's the how long we last
link |
is basically the most important thing.
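For reference, the Drake equation is a simple product of factors, N = R* · f_p · n_e · f_l · f_i · f_c · L, where L is how long a communicating civilization lasts. The sketch below shows that once the other factors are fixed, N scales linearly with L. Every numeric value in it is a placeholder assumption, not an estimate.

```python
def drake_equation(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Expected number of communicating civilizations, N, as the product of the Drake factors."""
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime

# Placeholder values only (assumptions for illustration, not real estimates).
fixed_factors = dict(r_star=1.0, f_p=0.5, n_e=2.0, f_l=0.5, f_i=0.1, f_c=0.1)

n_short = drake_equation(**fixed_factors, lifetime=1_000)
n_long = drake_equation(**fixed_factors, lifetime=100_000)

print(n_short, n_long)
```

With the other factors held fixed, a civilization lifetime that is 100 times longer gives 100 times as many civilizations, which is why L dominates the result when it is the least constrained term.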
link |
So from a selfish perspective,
link |
but also from a Drake equation perspective,
link |
I'm worried about our civilization lasting.
link |
And you kind of think about all the ways
link |
in which machine learning can be used
link |
to design greater weapons of destruction, right?
link |
And I mean, one way to ask that,
link |
if you look sort of 50 years from now,
link |
100 years from now,
link |
would you be more worried about natural pandemics
link |
or engineered pandemics?
link |
Like who's the better designer of viruses,
link |
nature or humans if we look down the line?
link |
I think in my view, I would still be worried
link |
about the natural pandemics simply because,
link |
I mean, the capacity of the nature producing this.
link |
It does a pretty good job, right?
link |
And the motivation for using virus,
link |
engineering viruses as a weapon is a weird one
link |
because maybe you can correct me on this,
link |
but it seems very difficult to target a virus, right?
link |
The whole point of a weapon, the way a rocket works,
link |
you have a starting point, you have an end point
link |
and you're trying to hit a target,
link |
to hit a target with a virus is very difficult.
link |
It's basically just, right?
link |
It's, the target would be the human species.
link |
Man, yeah, I have a hope in us,
link |
I'm forever optimistic that we will not,
link |
there's no, there's insufficient evil in the world
link |
to lead to that kind of destruction.
link |
Well, I also hope that, I mean, that's what we see.
link |
I mean, with the way we are getting connected,
link |
the world is getting connected.
link |
I think it helps for the world to become more transparent.
link |
So the information spread is,
link |
I think it's one of the key things for the society
link |
to become more balanced one way or another.
link |
This is something that people disagree with me on,
link |
but I do think that the kind of secrecy
link |
that governments have,
link |
so you're kind of speaking more to the other aspects,
link |
like research community being more open,
link |
companies are being more open,
link |
government is still like,
link |
we're talking about like military secrets.
link |
I think military secrets of the kind
link |
that could destroy the world
link |
will become also a thing of the 20th century.
link |
It'll become more and more open.
link |
Like I think nations will lose power
link |
in the 21st century,
link |
like lose sufficient power to our secrecy.
link |
Transparency is more beneficial than secrecy,
link |
but of course that's not obvious.
link |
Let's hope so that the governments will become
link |
So we last talked I think in March or April,
link |
what have you learned?
link |
How is your philosophical, psychological,
link |
biological worldview changed since then?
link |
Or you've been studying it nonstop
link |
from a computational biology perspective.
link |
How is your understanding and thoughts
link |
about this virus changed over those months
link |
from the beginning to today?
link |
One thing that I was really amazed
link |
at how efficient the scientific community was.
link |
I mean, and even just judging on this very narrow domain
link |
of protein structure, understanding the structural
link |
characterization of this virus
link |
from the components point of view,
link |
whole virus point of view.
link |
If you look at SARS, right?
link |
Something that happened less than 20,
link |
but close enough 20 years ago,
link |
and you see what, when it happened,
link |
what was sort of the response by the scientific community,
link |
you see that the structure characterizations did occur,
link |
but it took several years, right?
link |
Now the things that took several years,
link |
it's a matter of months, right?
link |
So we see that the research pop up.
link |
We are at the unprecedented level
link |
in terms of the sequencing, right?
link |
Never before have we had a single virus
link |
sequenced so many times, you know?
link |
So which allows us to actually, to trace very precisely
link |
the sort of the evolutionary nature of this virus,
link |
what happens, and it's not just this virus independently
link |
of everything, it's the sequence of this virus
link |
linked, anchored to the specific geographic place,
link |
to specific people because our genotype influences
link |
also the evolution of this.
link |
It's always a host pathogen evolution that occurs.
link |
It'd be cool if we also had a lot more data
link |
about the spread of this virus, not maybe,
link |
well, it'd be nice if we had it
link |
for like contact tracing purposes for this virus,
link |
but it'd be also nice if we had it for the study
link |
for future viruses to be able to respond and so on.
link |
But it's already nice that we have geographical data
link |
and the basic data from individual humans, yeah.
link |
Exactly, no, I think contact tracing is obviously
link |
a key component in understanding the spread of this virus.
link |
There is also, there is a number of challenges, right?
link |
So XPRIZE is one of them; we just recently
link |
took part in this competition.
link |
It's the prediction of the number of infections
link |
in different regions, so obviously the AI
link |
is the main topic in those predictions.
link |
Yeah, but it's still the data, I mean, that's a competition,
link |
but the data is weak on the training.
link |
It's great, it's much more than probably before,
link |
but it would be nice if it was really rich.
link |
I talked to Michael Mina from Harvard,
link |
I mean, he dreams that the community comes together
link |
with a weather map, as it were, of viruses,
link |
like really high resolution sensors on how,
link |
from person to person, the viruses that travel,
link |
all the different kinds of viruses,
link |
because there's a ton of them,
link |
and then you'll be able to tell the story
link |
that you've spoken about of the evolution of this virus.
link |
It's like day to day mutations that are occurring.
link |
I mean, that would be fascinating,
link |
just from a perspective of study,
link |
and from the perspective of being able
link |
to respond to future pandemics.
link |
That's ultimately what I'm worried about.
link |
People love books.
link |
Is there some three or whatever number of books,
link |
technical fiction, philosophical, that brought you joy in life,
link |
had an impact on your life,
link |
and maybe some that you would recommend others?
link |
So I'll give you three very different books,
link |
and I also have a special runner up, and a...
link |
Honorable mention.
link |
Yeah, I mean, it's an audiobook,
link |
and there's some specific reason behind it.
link |
So the first book is something that impacted my earlier stage of life,
link |
and I'm probably not going to be very original here.
link |
It's Bulgakov's Master and Margarita.
link |
So that's probably...
link |
Well, not for a Russian, maybe, it's not super original,
link |
but it's a really powerful book, even in English.
link |
So I read it in English.
link |
It is incredibly powerful, and I mean, it's the way it ends,
link |
right, so I still have goosebumps when I read the very last sort of...
link |
It's called Epilogue, where it's just so powerful.
link |
What impact did it have on you? What ideas?
link |
What insights did you get from it?
link |
I was just taken by the fact that you have those parallel lives
link |
apart from many centuries, right?
link |
And somehow they got sort of intertwined into one story.
link |
And that, to me, was fascinating.
link |
And of course, the romantic part of this book is not just romance,
link |
it's like the romance empowered by sort of magic, right?
link |
And maybe on top of that, you have some irony,
link |
which is unavoidable, right, because it was that Soviet time.
link |
But it's very deeply Russian, so that's the wit, the humor, the pain,
link |
the love, all of that is one of the books that kind of captures something
link |
about Russian culture that people outside of Russia should probably read.
link |
What's the second one?
link |
So the second one is, again, another one that it happened...
link |
I read it later in my life.
link |
I think I read it first time when I was a graduate student.
link |
And that's Solzhenitsyn's Cancer Ward.
link |
That is an amazingly powerful book.
link |
It's about, I mean, essentially based on...
link |
Solzhenitsyn was diagnosed with cancer when he was reasonably young
link |
and he made a full recovery, but so this is about a person
link |
who was sentenced for life in one of these camps.
link |
And he had some cancer, so he was transported back
link |
to one of these Soviet republics, I think, South Asian republics.
link |
And the book is about his experience being a prisoner,
link |
being a patient in the cancer clinic in a cancer ward
link |
surrounded by people, many of which die, right?
link |
But in the way it reads, I mean, first of all,
link |
later on I read the accounts of the doctors
link |
who describe the experiences in the book
link |
by the patient as incredibly accurate.
link |
So I read that there were some doctors saying that
link |
every single doctor should read this book
link |
to understand what the patient feels.
link |
But again, as with many of Solzhenitsyn's books,
link |
it has multiple levels of complexity.
link |
And obviously, if you look above the cancer and the patient,
link |
I mean, the tumor that was growing and then disappeared
link |
in his body with some consequences.
link |
I mean, this is allegorically the Soviet...
link |
And he actually, when he was asked, he said that
link |
this is what made him think about this, how to combine these experiences.
link |
Him being a part of the Soviet regime,
link |
also being someone sent to a Gulag camp, right?
link |
And also someone who experienced cancer in his life.
link |
The Gulag Archipelago and this book,
link |
these are the works that actually made him receive a Nobel Prize.
link |
But to me, I've read other books by Solzhenitsyn.
link |
This one, to me, is the most powerful one.
link |
And by the way, both this one and the previous one you read in Russian?
link |
So now the third book is an English book and it's completely different.
link |
So we're switching the gears completely.
link |
So this is the book, it's not even a book.
link |
It's an essay by John von Neumann called The Computer and the Brain.
link |
And that was the book he was writing,
link |
knowing that he was dying of cancer.
link |
So the book was released back, it's a very thin book, right?
link |
But the power, the intellectual power in this book, in this essay is incredible.
link |
I mean, you probably know that von Neumann is considered to be one of the biggest thinkers,
link |
So his intellectual power was incredible, right?
link |
And you can actually feel this power in this book where the person is writing, knowing
link |
The book actually got published only after his death, back in 1958.
link |
But so he tried to put as many ideas that he still hadn't realized.
link |
And so this book is very difficult to read because every single paragraph is just compact,
link |
is filled with these ideas and the ideas are incredible.
link |
So nowadays, so he tried to put the parallels between the brain computing power, the neural
link |
system and the computers as they were.
link |
That whole year he was working on it, it's like approximately '57.
link |
So that was right during his, when he was diagnosed with cancer and he was essentially...
link |
Yeah, he's one of those, there's a few folks people mentioned.
link |
I think Ed Witten is another, that everyone that meets them, they say he's just an intellectual
link |
Okay, so who's the honorable mention runner up?
link |
And this is, I mean, the reason I put it sort of in a separate section because this
link |
is a book that I reasonably recently listened to, so it's an audio book.
link |
And this is a book called Lab Girl by Hope Jahren.
link |
So Hope Jahren, she is a scientist, she's a geochemist that essentially studies the fossil
link |
And so she uses this fossil, the chemical analysis to understand what was the climate
link |
back thousands of years, hundreds of thousands of years ago.
link |
And so something that incredibly touched me about this book, it was narrated by the author.
link |
And it's an incredibly personal story, incredibly.
link |
So certain parts of the book, you could actually hear the author crying.
link |
And that to me, I mean, I never experienced anything like this, you know, reading the
link |
book, but it was like, you know, the connection between you and the author.
link |
And I think this is, you know, this is really a must read, but even better, a must listen
link |
to audio book for anyone who wants to learn about sort of, you know, academia, science,
link |
and research in general, because it's a very personal account about her becoming a scientist.
link |
So we're just before New Year's, you know, we talked a lot about some difficult topics
link |
of viruses and so on.
link |
Do you have some exciting things you're looking forward to in 2021?
link |
Some New Year's resolutions, maybe silly or fun, or something very important and fundamental
link |
to the world of science or something completely unimportant?
link |
Well, I'm definitely looking forward to, you know, things becoming normal.
link |
So, yes, so I really miss traveling.
link |
Every summer, I go to an international summer school, it's called the School for Molecular
link |
and Theoretical Biology, it's held in Europe, it's organized by very good friends of mine,
link |
and this is the school for gifted kids from all over the world, and they're incredibly
link |
It's like every time I go there, it's like, you know, it's a highlight of the year.
link |
And we couldn't make it this August, so we did this school remotely, but it's different.
link |
So I am definitely looking forward to next August coming there.
link |
I also, I mean, you know, one of my, you know, personal resolutions, I realized that, you
link |
know, being in house and working from home, you know, I realized that actually I apparently
link |
missed a lot, you know, spending time with my family, believe it or not, so you typically,
link |
you know, with all the research and, you know, and teaching and everything related to the
link |
academic life, I mean, you get distracted.
link |
And so, so, you know, you don't feel that, you know, the fact that you are away from
link |
your family doesn't affect you because you're, you know, naturally distracted by other things.
link |
And you know, this time, I realized that, you know, that that's so important, right?
link |
Spending your time with the family, with your kids, and so that would be my new year resolution
link |
in actually trying to spend as much time as possible.
link |
Even when the world opens up, yeah, that's a beautiful message, that's a beautiful reminder.
link |
I asked you if there's a Russian poem you could read that I could force you to read,
link |
and you said, okay, fine, sure.
link |
Do you mind reading?
link |
I mean, you said that no paper needed, so.
link |
So, yeah, so this poem was written by my namesake, another Dmitry, Dmitry Kemerfeldt, and is
link |
a, you know, it's a recent poem, and it's called Sorceress, Vedma, in Russian, or actually
link |
Koldunya, so that's sort of another connotation of sorceress or witch.
link |
And I really like it, and it's one of just a handful of poems I actually can recall by heart.
link |
I also have a very strong association when I read this poem with Master and Margarita, the
link |
main female character, Margarita.
link |
And also it's, you know, it's about, you know, it's happening about the same time we're talking
link |
now, so around New Year, around Christmas.
link |
Do you mind reading it in Russian?
link |
I'll give it a try.
link |
And then I'll come back.
link |
and twisted the world.
link |
So you took the eyes of your hero,
link |
that anyone who came down to bless
link |
was ready to give the devil the devil's soul
link |
without looking at this witch's connection.
link |
There was a thief hanging around in the bushes,
link |
but I, without any prejudices and rags,
link |
ran out to feel your exhaled breath on my lips,
link |
so that the skin, with the tongue, with the ribs,
link |
would be on the other side of the earth,
link |
like you flew over the earth,
link |
in Belayv Yugi, Belayzibi, Belangliya.
link |
To me, it has a lot of meaning about this,
link |
something that is happening, something that is far away,
link |
but still very close to you.
link |
And, yes, it's the winter.
link |
There's something magical about winter, isn't there?
link |
Well, I don't know.
link |
I don't know how to translate it,
link |
but a kiss in winter is interesting.
link |
Lips in winter and all that kind of stuff.
link |
It's beautifully...
link |
I mean, Russian has a way.
link |
There's a reason Russian poetry is just...
link |
I'm a fan of poetry in both languages,
link |
but English doesn't capture some of the magic
link |
that Russian seems to, so thank you for doing that.
link |
Dmitry, it's great to talk to you again.
link |
It's contagious how much you love what you do,
link |
how much you love life,
link |
so I really appreciate you taking the time to talk today.
link |
And thank you for having me.
link |
Thanks for listening to this conversation with Dmitry Korkin,
link |
and thank you to our sponsors, Brave Browser,
link |
NetSuite Business Management Software,
link |
Magic Spoon Low Carb Cereal,
link |
and Eight Sleep Self Cooling Mattress.
link |
So the choice is browsing privacy, business success,
link |
healthy diet, or comfortable sleep.
link |
Choose wisely, my friends,
link |
and if you wish, click the sponsor links below
link |
to get a discount and to support this podcast.
link |
And now, let me leave you with some words
link |
from Jeffrey Eugenides.
link |
Biology gives you a brain.
link |
Life turns it into a mind.
link |
Thank you for listening and hope to see you next time.